[Clusterusers] Fly crashed again
Wm. Josiah Erikson
wjerikson at hampshire.edu
Wed Apr 24 10:16:49 EDT 2013
I am currently running memtest86+ on the head node. If that passes, I
will replace the CPUs. If it does not pass, I will replace the offending
RAM. If it crashes again after that, I will try updating the kernel, in
case something we're now doing differently with the head node is
tickling a kernel bug. If none of that works, I will replace the
entire machine with a different one. If it still crashes after all of
that, well... we'll cross that bridge if we come to it.
Thanks for your patience,
-Josiah
On 4/23/13 11:17 PM, Wm. Josiah Erikson wrote:
> Looks like it's time to do some more serious hardware debugging on the
> head node. Probably run memtest86+ and/or replace the processor(s):
>
> http://clusterstatus.i3ci.hampshire.edu/2013/04/12/fly-crashed-again/#comment-697
>
> It'll probably be down for the rest of the night - couldn't get it to
> reboot remotely. I'll run some tests in the morning and determine the
> best course of action. I'll keep you all updated.
>
--
Wm. Josiah Erikson
Assistant Director of IT, Infrastructure Group
System Administrator, School of CS
Hampshire College
Amherst, MA 01002
(413) 559-6091