[Clusterusers] compute-1-10
Wm. Josiah Erikson
wjerikson at hampshire.edu
Tue Jan 17 14:41:53 EST 2017
Not sure what exactly is going on there, but I'm suspicious of bad
memory. I'm NIMBYing it and will run a memory test on it when the
existing jobs are finished or errored out :)
Sending this to the list too so everyone knows why compute-1-10 is
NIMBYed.
-Josiah
On 1/17/17 10:55 AM, Thomas Helmuth wrote:
> Sure! It looks like most or all of them threw our old friend hs_err
> messages with a bunch of Java info. I've attached a bunch of them. I
> remember trying to get to the bottom of these years ago, I think with
> a different compute node, and we ended up just ignoring the error and
> taking the node off my tag.
>
> The errors printed to tractor are all over the place -- from just
> saying they aborted to null pointer exceptions to array index out of
> bounds exceptions. I think they're unrelated and were just caused by
> whatever threw the hs_err.
>
> All of these errors occurred within 1-3 minutes of starting the run.
>
> Thanks,
> Tom
>
> On Tue, Jan 17, 2017 at 9:11 AM, Wm. Josiah Erikson
> <wjerikson at hampshire.edu> wrote:
>
> My quick and dirty look at the node doesn't turn up anything wrong
> with it. Can you send me a link to, or the text of, the weird error?
>
> -Josiah
>
> On 1/13/17 3:51 PM, Thomas Helmuth wrote:
> > Hi Josiah,
> >
> > I have some runs going on fly, and noticed that a bunch of them on
> > compute-1-10 crashed with various weird error messages. This was the
> > only node with weird crashes, so I'm wondering if something is going
> > bad in that node. Any ideas? Would you be able to either take that
> > node offline, or remove it from tag "tom" so my runs don't use it?
> >
> > Thanks,
> > Tom
>
> --
> Wm. Josiah Erikson
> Assistant Director of IT, Infrastructure Group
> System Administrator, School of CS
> Hampshire College
> Amherst, MA 01002
> (413) 559-6091
>
>
--
Wm. Josiah Erikson
Assistant Director of IT, Infrastructure Group
System Administrator, School of CS
Hampshire College
Amherst, MA 01002
(413) 559-6091