June 30, 2012

Fallout from the Leap Second

From ServerFault:
Anyone else experiencing high rates of linux server crashes today?
Just today, Sat June 30th - starting soon after the start of the day GMT. We've had a handful of blades in different datacentres as managed by different teams all go dark - not responding to pings, screen blank.

They're all running Debian Squeeze - with everything from stock kernel to custom 3.2.21 builds. Most are Dell M610 blades, but I've also just lost a Dell R510 and other departments have lost machines from other vendors too. There was also an older IBM x3550 which crashed and which I thought might be unrelated, but now I'm wondering.
The problem was a library that wasn't able to handle the Leap Second -- they had to remove the pending leap second before the systems would load again:
THE WORKAROUND
Ok people, here's how I worked around it.
1. disabled ntp: /etc/init.d/ntp stop
2. created http://linux.brong.fastmail.fm/2012-06-30/fixtime.pl (code stolen from Marco, see blog posts in comments)
3. ran fixtime.pl without an argument to see that there was a leap second set
4. ran fixtime.pl with an argument to remove the leap second
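
For anyone who wants to see roughly what steps 3 and 4 amount to, here is a minimal Python sketch. It is not the fixtime.pl script itself. It assumes the check and the fix go through the Linux adjtimex(2) call, where a pending leap second shows up as the STA_INS (or STA_DEL) bit in the kernel's time status word. The Timex class and the helper names are made up for illustration; only the leading fields of struct timex are declared, and the rest is oversized padding.

```python
import ctypes
import ctypes.util

# Status bits and mode flag as defined in <linux/timex.h>.
STA_INS = 0x0010     # a leap second will be inserted at the end of the UTC day
STA_DEL = 0x0020     # a leap second will be deleted at the end of the UTC day
ADJ_STATUS = 0x0010  # "modes" bit asking the kernel to apply the status field


class Timex(ctypes.Structure):
    # Hypothetical partial layout: only the leading fields of struct timex
    # are named. The padding is deliberately larger than the real struct so
    # the kernel never writes past the end of this buffer.
    _fields_ = [
        ("modes", ctypes.c_uint),
        ("offset", ctypes.c_long),
        ("freq", ctypes.c_long),
        ("maxerror", ctypes.c_long),
        ("esterror", ctypes.c_long),
        ("status", ctypes.c_int),
        ("_rest", ctypes.c_char * 256),
    ]


def leap_pending(status):
    """Return True if the status word has a leap second scheduled."""
    return bool(status & (STA_INS | STA_DEL))


def clear_leap(status):
    """Return the status word with both leap second bits cleared."""
    return status & ~(STA_INS | STA_DEL)


def _adjtimex(tx):
    # Assumes the C library exports adjtimex(), as glibc does on Linux.
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    if libc.adjtimex(ctypes.byref(tx)) == -1:
        raise OSError(ctypes.get_errno(), "adjtimex failed")
    return tx.status


def read_status():
    """Step 3: read the status word (modes=0 changes nothing)."""
    tx = Timex()
    tx.modes = 0
    return _adjtimex(tx)


def write_status(status):
    """Step 4: write a new status word. Needs root."""
    tx = Timex()
    tx.modes = ADJ_STATUS
    tx.status = status
    return _adjtimex(tx)
```

On a Linux box, leap_pending(read_status()) would answer the question in step 3, and write_status(clear_leap(read_status())) run as root would correspond to step 4. As in the workaround above, ntp should be stopped first, otherwise it may simply set the flag again.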
Oopsie... Posted by DaveH at June 30, 2012 5:07 PM