<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Yes, it was bad RAM. I found and removed the offending stick and
will RMA it. For now, compute-1-10 "only" has 24GB of RAM.<br>
</p>
<p> -Josiah</p>
<p><br>
</p>
<br>
<div class="moz-cite-prefix">On 1/17/17 2:41 PM, Wm. Josiah Erikson
wrote:<br>
</div>
<blockquote
cite="mid:a3ffaa94-c437-b043-38cf-4c0bcc8fa2b9@hampshire.edu"
type="cite">
<p> Not sure what exactly is going on there, but I'm suspicious
of bad memory. I'm NIMBYing it and will run a memory test on it
when the existing jobs are finished or errored out :)</p>
<p> Sending this to the list too so everyone knows why
compute-1-10 is NIMBYed.<br>
</p>
<p> -Josiah</p>
<p><br>
</p>
<br>
<div class="moz-cite-prefix">On 1/17/17 10:55 AM, Thomas Helmuth
wrote:<br>
</div>
<blockquote
cite="mid:CABgVVjeAQUjsfS+P+JrGeCFsxB_qvRx4EGTGZvDvDfELggLSMg@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>Sure! It looks like most or all of them threw our
old friend hs_err messages with a bunch of Java info.
I've attached a bunch of them. I remember trying to
get to the bottom of these years ago, I think with a
different compute node, and we ended up just ignoring
the error and taking the node off my tag.<br>
<br>
</div>
The errors printed to tractor are all over the place --
from just saying they aborted to null pointer exceptions
to array index out of bounds exceptions. I think they're
unrelated and were just caused by whatever threw the
hs_err. <br>
<br>
</div>
All of these errors occurred within 1-3 minutes of
starting the run.<br>
<br>
</div>
Thanks,<br>
</div>
Tom<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Jan 17, 2017 at 9:11 AM, Wm.
Josiah Erikson <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:wjerikson@hampshire.edu" target="_blank">wjerikson@hampshire.edu</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">My quick
and dirty look at the node doesn't turn up anything wrong with
it -<br>
can you send me a link to the weird error, or its text?<br>
<br>
-Josiah<br>
<div class="HOEnZb">
<div class="h5"><br>
<br>
<br>
On 1/13/17 3:51 PM, Thomas Helmuth wrote:<br>
> Hi Josiah,<br>
><br>
> I have some runs going on fly, and noticed that a
bunch of them on<br>
> compute-1-10 crashed with various weird error
messages. This was the<br>
> only node with weird crashes, so I'm wondering if
something is going<br>
> bad in that node. Any ideas? Would you be able to
either take that<br>
> node offline, or remove it from tag "tom" so my
runs don't use it?<br>
><br>
> Thanks,<br>
> Tom<br>
<br>
</div>
</div>
<span class="HOEnZb"><font color="#888888">--<br>
Wm. Josiah Erikson<br>
Assistant Director of IT, Infrastructure Group<br>
System Administrator, School of CS<br>
Hampshire College<br>
Amherst, MA 01002<br>
<a moz-do-not-send="true"
href="tel:%28413%29%20559-6091" value="+14135596091">(413)
559-6091</a><br>
<br>
</font></span></blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</blockquote>
<br>
</body>
</html>