[Clusterusers] compute-1-18 and 1-3

Lee Spector lspector at hampshire.edu
Tue Jan 31 11:23:43 EST 2017


Thanks Josiah,

FWIW I've been running jobs via ssh on compute-1-18 for the last few days... and it seems to have been working fine.

If you'd rather that I use another node then let me know. I chose it more or less at random, but I think it was lightly loaded when I did so... maybe because of the reboot?

Ideally I'd use the fastest nodes available for these "manual" runs, and dabbling on rack 4 leads me to believe that the nodes with large numbers of cores are a bad idea for this.

 -Lee

> On Jan 31, 2017, at 11:10 AM, Wm. Josiah Erikson <wjerikson at hampshire.edu> wrote:
> 
> These two nodes weren't running tractor - they had rebooted themselves
> and reinstalled. Uptime is 5 days, 11:39 on compute-1-18 and 1 day, 5:18
> on compute-1-3. Not sure why - neither has anything in the logs nor shows
> anything particularly suspicious in ganglia. I have restarted tractor
> and noted this occurrence to see if it's a pattern or random. Sometimes
> some jobs do just trigger random reboots if they randomly use up all the
> RAM, invoking the oom-killer, but usually that will leave something in
> the logs.
> 
> 
> -- 
> Wm. Josiah Erikson
> Assistant Director of IT, Infrastructure Group
> System Administrator, School of CS
> Hampshire College
> Amherst, MA 01002
> (413) 559-6091
> 
> _______________________________________________
> Clusterusers mailing list
> Clusterusers at lists.hampshire.edu
> https://lists.hampshire.edu/mailman/listinfo/clusterusers

--
Lee Spector, Professor of Computer Science
Director, Institute for Computational Intelligence
Hampshire College, Amherst, Massachusetts, USA
lspector at hampshire.edu, http://hampshire.edu/lspector/, 413-559-5352


