[Clusterusers] Fly crashed again

Wm. Josiah Erikson wjerikson at hampshire.edu
Wed Apr 24 10:16:49 EDT 2013


I am currently running memtest86+ on the head node. If that passes, I
will replace the CPUs. If it does not pass, I will replace the offending
RAM. If it crashes again after that, I will try updating the kernel, in
case something we're now doing differently with the head node is
tickling a kernel bug. If none of that works, I will replace the
entire machine with a different one. If it still crashes after all of
that, well... I'll cross that bridge if we come to it.
Thanks for your patience,
     -Josiah




On 4/23/13 11:17 PM, Wm. Josiah Erikson wrote:
> Looks like it's time to do some more serious hardware debugging on the 
> head node. Probably run memtest86+ and/or replace the processor(s):
>
> http://clusterstatus.i3ci.hampshire.edu/2013/04/12/fly-crashed-again/#comment-697 
>
>
> It'll probably be down for the rest of the night - couldn't get it to 
> reboot remotely. I'll run some tests in the morning and determine the 
> best course of action. I'll keep you all updated.
>

-- 
Wm. Josiah Erikson
Assistant Director of IT, Infrastructure Group
System Administrator, School of CS
Hampshire College
Amherst, MA 01002
(413) 559-6091


