[Clusterusers] compute-1-18 and 1-3

Wm. Josiah Erikson wjerikson at hampshire.edu
Tue Jan 31 11:33:43 EST 2017


Ah, OK - yeah, it wasn't running tractor, so that's why. Unfortunately I
just screwed that up for you. Compute-1-18 is one of the faster nodes,
so you chose well. Four of Eva's jobs just got dispatched to that node,
but I think if you "eject and reschedule" them, it probably won't screw
things up for her? I have already NIMBYed the node. Sorry about that!

If you need me to do that because it says you don't have permission, let
me know.

    -Josiah



On 1/31/17 11:23 AM, Lee Spector wrote:
> Thanks Josiah,
>
> FWIW, I've been running jobs via ssh on compute-1-18 for the last few days... and it seems to have been working fine.
>
> If you'd rather that I use another node then let me know. I chose it more or less at random, but I think it was lightly loaded when I did so... maybe because of the reboot?
>
> Ideally I'd use the fastest nodes available for these "manual" runs, and dabbling on rack 4 leads me to believe that the nodes with large numbers of cores are a bad idea for this.
>
>  -Lee
>
>> On Jan 31, 2017, at 11:10 AM, Wm. Josiah Erikson <wjerikson at hampshire.edu> wrote:
>>
>> These two nodes weren't running tractor - they had rebooted themselves
>> and reinstalled. Compute-1-18 has an uptime of 5 days, 11:39 and
>> compute-1-3 of 1 day, 5:18. Not sure why - neither has anything in the
>> logs nor shows anything particularly suspicious in ganglia. I have
>> restarted tractor and noted this occurrence to see if it's a pattern or
>> random. Sometimes jobs do just trigger random reboots if they use up
>> all the RAM, invoking the oom-killer, but usually that will leave
>> something in the logs.
>>
>>
>> -- 
>> Wm. Josiah Erikson
>> Assistant Director of IT, Infrastructure Group
>> System Administrator, School of CS
>> Hampshire College
>> Amherst, MA 01002
>> (413) 559-6091
>>
>> _______________________________________________
>> Clusterusers mailing list
>> Clusterusers at lists.hampshire.edu
>> https://lists.hampshire.edu/mailman/listinfo/clusterusers
> --
> Lee Spector, Professor of Computer Science
> Director, Institute for Computational Intelligence
> Hampshire College, Amherst, Massachusetts, USA
> lspector at hampshire.edu, http://hampshire.edu/lspector/, 413-559-5352

-- 
Wm. Josiah Erikson
Assistant Director of IT, Infrastructure Group
System Administrator, School of CS
Hampshire College
Amherst, MA 01002
(413) 559-6091


