Wednesday, July 25, 2012

HPET Warnings in Messages

You know when you happen to be looking through logs for potential answers to a problem, and you run into messages that indicate another, completely different problem that likely needs your attention?

Yeah. That.

You may remember that part of some recent data center work I did involved replacing a server that had died. I booted it up using a Mondo Rescue image of the original server, and then the Dev dude copied specific data from another server using some script he whipped up at the time. One and done. 

Yesterday I SSH'd in to complete the setup by installing Nagios and OpenLDAP. No, we don't have Puppet or Chef in the mix yet. It was one of many things on my to-do list. Anyway, I got NRPE installed but received an error when doing a basic test on the machine to make sure it worked: 

user@server3:/usr/local/nagios/bin$ !99
sudo /usr/local/nagios/libexec/check_nrpe -H localhost
CHECK_NRPE: Error - Could not complete SSL handshake.

Since SSL isn't even enabled on this box, I wasn't quite sure what was causing it, so I checked /var/log/messages. I found this instead:

Jul 22 12:05:30 JXT3 kernel: [130848.552652] CE: hpet increasing min_delta_ns to 15000 nsec
Jul 22 12:05:30 JXT3 kernel: [130848.552726] CE: hpet increasing min_delta_ns to 22500 nsec

Now this was a few days ago and I don't see any other mentions of it, but it made me nervous so I checked it out. It seems to be a known bug that affects some versions of the Linux kernel, though I can't get an exact bead on which ones. We're running 2.6.32-28, which is usually a couple of minor revisions ahead of the versions reported by users online. Also, all of our servers are running the same kernel and I don't see this message in any of the other servers' logs.
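To keep an eye on whether the value keeps climbing, something like the following could pull the numbers out of the log. This is only a rough sketch: the function name hpet_min_delta_values is made up for illustration, and the pattern assumes the message format matches the two lines shown above.

```python
import re

# Assumed format, based on the two log lines above:
# "... CE: hpet increasing min_delta_ns to 15000 nsec"
HPET_PATTERN = re.compile(r"CE: hpet increasing min_delta_ns to (\d+) nsec")


def hpet_min_delta_values(lines):
    """Return the min_delta_ns values (in nanoseconds) found in the
    given log lines, in the order they appear."""
    values = []
    for line in lines:
        match = HPET_PATTERN.search(line)
        if match:
            values.append(int(match.group(1)))
    return values


# Example use against the real log (needs read access to the file):
#   with open("/var/log/messages", errors="replace") as log:
#       print(hpet_min_delta_values(log))
```

If the list stays short and the last value stays small, the box is probably fine for now. A steadily growing list would be the cue to act.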

HPET stands for High Precision Event Timer. According to Wikipedia, it's a hardware counter that runs at a frequency of at least 10 MHz and has at least 3 comparators, each of which can raise an interrupt when the counter reaches the value programmed into it. That explanation didn't do much towards letting me know what the purpose was. I skimmed a paper on Intel's website that made a little more sense (in the beginning at least) and gave me the general gist of things, which is that it's essentially a hardware timer the OS uses to schedule events.

Why is it giving errors? https://bugs.launchpad.net/ubuntu/+source/linux/+bug/575774 contains some info. The thread ends with the suggestion to disable HPET, which can apparently be done by adding "hpet=disable" to the kernel boot options in grub. I may try this if I continue to see these messages pop up, since others have described stability issues (freezing, crashing) once it reaches the 100ms mark.
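If I do go that route, I'd want to confirm after the reboot that the option actually made it onto the kernel command line. A minimal sketch, assuming the running kernel's options are readable from /proc/cmdline (the helper names read_cmdline and hpet_disabled are just ones I made up):

```python
def read_cmdline(path="/proc/cmdline"):
    """Return the kernel command line the system was booted with."""
    with open(path) as handle:
        return handle.read().strip()


def hpet_disabled(cmdline):
    """Return True if hpet=disable appears as its own boot option."""
    return "hpet=disable" in cmdline.split()


# Example use on the server after rebooting:
#   print(hpet_disabled(read_cmdline()))
```

This only shows that the option was passed to the kernel, not that the warnings are gone, so the log still needs watching afterwards.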
