[Clusterusers] Fly crashed again

Lee Spector lspector at hampshire.edu
Wed Apr 24 10:49:46 EDT 2013


Thanks for your work on this, Josiah!

  -Lee

On Apr 24, 2013, at 10:16 AM, Wm. Josiah Erikson wrote:

> I am currently running memtest86+ on the head node. If that passes, I will replace the CPUs. If it does not pass, I will replace the offending RAM. If it crashes again after that, I will try updating the kernel, in case there's something different that we're doing with the head node now that is tickling a kernel bug. If none of that works, I will replace the entire machine with a different one. If it still crashes after all of that, well... I'll cross that bridge if we come to it.
> Thanks for your patience,
>    -Josiah
> 
> 
> 
> 
> On 4/23/13 11:17 PM, Wm. Josiah Erikson wrote:
>> Looks like it's time to do some more serious hardware debugging on the head node. Probably run memtest86+ and/or replace the processor(s):
>> 
>> http://clusterstatus.i3ci.hampshire.edu/2013/04/12/fly-crashed-again/#comment-697 
>> 
>> It'll probably be down for the rest of the night - couldn't get it to reboot remotely. I'll run some tests in the morning and determine the best course of action. I'll keep you all updated.
>> 
> 
> -- 
> Wm. Josiah Erikson
> Assistant Director of IT, Infrastructure Group
> System Administrator, School of CS
> Hampshire College
> Amherst, MA 01002
> (413) 559-6091
> 
> _______________________________________________
> Clusterusers mailing list
> Clusterusers at lists.hampshire.edu
> https://lists.hampshire.edu/mailman/listinfo/clusterusers

--
Lee Spector, Professor of Computer Science
Cognitive Science, Hampshire College
893 West Street, Amherst, MA 01002-3359
lspector at hampshire.edu, http://hampshire.edu/lspector/
Phone: 413-559-5352, Fax: 413-559-5438


