[Clusterusers] Mysterious node reboots

Wm. Josiah Erikson wjerikson at hampshire.edu
Thu Apr 24 13:30:44 EDT 2014


Hi guys,
     This is also up on the blog, but in case you don't read it (it's at 
http://clusterstatus.i3ci.hampshire.edu), here's what's up. Any thoughts 
or ideas are welcome!


Some strange reboots/reinstalls occurred, and they may or may not make any sense.

One day and 36 minutes ago, compute-1-18 rebooted. It's plugged into one of the white
2200VA UPSes, and no other nodes plugged into that UPS went down at that time.

Seventeen hours ago, the following nodes went down: 1-12, 1-14, and 2-22
through 2-25. 1-12 and 1-14 are plugged into a power strip (along with
many other nodes that did not go down) that is plugged into the 3000VA
UPS second from the bottom of the stack. That UPS has bad batteries, so
we took batteries from the bottom UPS, which is off, and built a new
battery pack. 2-22 through 2-26 are plugged into the top half of the 208V
PDU. All of these nodes were plugged into locations shared with plenty of
other nodes that didn't go down.

Two hours ago, the following nodes went down: 1-5, 1-10, and 2-1 through
2-4. These nodes are all plugged into the top 3000VA UPS in the stack.
However, compute-1-11 and 1-9 are also plugged into that UPS, and they
did not go down at that time. They're also fully loaded and have been
the whole time.

What gives? At this point it is entirely unclear to me.


-- 
Wm. Josiah Erikson
Assistant Director of IT, Infrastructure Group
System Administrator, School of CS
Hampshire College
Amherst, MA 01002
(413) 559-6091
