[Clusterusers] Re: cluster mailing lists & something funky on compute-1-3

Wm. Josiah Erikson wjerikson at hampshire.edu
Tue Oct 9 11:04:38 EDT 2007


I may want to look into how our kernels are allocating RAM, or possibly 
just make our swap space much smaller. That would eliminate the 
possibility of breve trying to run in swap; it would just die instead, 
which is probably better.
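
A per-process version of the "die instead of swapping" idea, as a rough 
sketch only. It assumes a Linux compute node with Python available. It does 
not change the kernel or the swap size; it swaps in a different technique, 
capping the address space of one job with RLIMIT_AS so that an oversized 
allocation fails right away. The name run_capped and the choice of cap are 
hypothetical, and RLIMIT_AS limits virtual address space rather than 
resident memory, so the cap would need some headroom.

```python
import resource
import subprocess


def run_capped(cmd, max_bytes):
    """Run cmd (a list of arguments) with its address space capped.

    Hypothetical helper: once the child would grow past max_bytes,
    its allocations fail and it normally exits with an error
    instead of pushing the node into swap.
    """
    def set_limit():
        # Executed in the child process just before the command starts.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    # Return the child's exit status; nonzero means it failed or was killed.
    return subprocess.run(cmd, preexec_fn=set_limit).returncode
```

A job script could call something like run_capped(["./my_simulation"], 
2 * 1024**3), where the command name and the 2 GB figure are placeholders, 
and treat a nonzero return value as "ran out of memory, restart with 
smaller settings".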

    -Josiah



Lee Spector wrote:
>
> Thanks!
>
> Interesting... If your theory is right then this will only happen when 
> the head node crashes, so maybe we should always check for that 
> whenever rebooting the head.
>
> Unfortunately, ballooning RAM consumption is probably always going to 
> be a possibility, not just with breve or with our particular 
> simulations, but in any evolutionary context with variable-sized 
> programs/representations and/or variable-sized populations. There are 
> ways to limit things, but sometimes you don't know what dimension will 
> grow or how much; it may depend on random seeds or other things, some 
> limits will prevent evolutionary progress, etc. So it's something that 
> we have to be able to recover from.
>
>  -Lee
>
>
>
>
> On Oct 9, 2007, at 9:17 AM, Wm. Josiah Erikson wrote:
>
>> uh... that's clusterusers at lists.hampshire.edu obviously :)
>>
>> compute-1-3 was stuck in permanent iowait with a breve process using 
>> up more RAM than the machine actually had (meaning it was in swap). I 
>> think it probably got stuck when the head node crashed and ended up 
>> with a stale file handle or something when the home directory got 
>> pulled out from under it.
>>
>> I rebooted it :)
>>
>>    -Josiah
>>
>>
>>
>> Wm. Josiah Erikson wrote:
>>> I have subscribed Adam; Brian was already subscribed, as are Kyle, 
>>> Lee, Jaime, Chris, me, Michael, and a few others. I think helga can 
>>> be removed from the discussion unless there is somebody else on that 
>>> list that should know what's up with the cluster.
>>>
>>> clusterusers at hermes doesn't work anymore - that's old and the 
>>> hostnames have changed. clusteruers at lists.hampshire.edu should be 
>>> the proper address. We'll see if this gets through :)
>>>
>>> I'll go check on compute-1-3 and report back.
>>>
>>>    -Josiah
>>>
>>>
>>>
>>> Lee Spector wrote:
>>>>
>>>> It seems like a lot of people are using/concerned with the status 
>>>> of the cluster these days, but maybe not all of them are on the 
>>>> cluster-users list. Is that right? It'd be nice to clear this up so 
>>>> that we're not having unsynced conversations among those of us on 
>>>> the cluster list, the helga list, and maybe no list, not to mention 
>>>> individual email side conversations. Can we take care of this by 
>>>> subscribing helga to cluster-users and making sure that all 
>>>> non-helga users and administrators are individually subscribed?
>>>>
>>>> On an actual cluster-related note: Does anyone know in what 
>>>> particular way compute-1-3 is currently hosed, such that it hangs 
>>>> my cluster-looped scripts? To see what I mean you can try 
>>>> 'cluster-fork hostname' (I'll include the output below since if 
>>>> someone fixes compute-1-3 you won't see what I mean...). 
>>>> Cluster-fork is smart enough to ignore some node pathologies (e.g. 
>>>> compute-0-10 is currently down, and is properly skipped, and 
>>>> compute-1-12 and compute-1-13 are refusing ssh connections, and are 
>>>> properly skipped), but compute-1-3 requires a command-C. Worse, and 
>>>> more important for me, is that command-C (or other cluster-fork 
>>>> options) won't work for many of my scripts, which use ./sh loops to 
>>>> run through the nodes.
>>>>
>>>> If someone has the access/knowledge/power to fix compute-1-3 then 
>>>> please do so and let us know. Or if you just have an idea what 
>>>> might be up with compute-1-3 and how to prevent it then please share.
>>>>
>>>> Thanks,
>>>>
>>>>  -Lee
>>>>
>>>> -----------
>>>>
>>>> [lspector at fly bin]$ cluster-fork hostname
>>>> compute-0-1:
>>>> compute-0-1.local
>>>> compute-0-2:
>>>> compute-0-2.local
>>>> compute-0-3:
>>>> compute-0-3.local
>>>> compute-0-4:
>>>> compute-0-4.local
>>>> compute-0-5:
>>>> compute-0-5.local
>>>> compute-0-6:
>>>> compute-0-6.local
>>>> compute-0-7:
>>>> compute-0-7.local
>>>> compute-0-8:
>>>> compute-0-8.local
>>>> compute-0-9:
>>>> compute-0-9.local
>>>> compute-0-10: down
>>>> compute-0-11:
>>>> compute-0-11.local
>>>> compute-0-12:
>>>> compute-0-12.local
>>>> compute-0-13:
>>>> compute-0-13.local
>>>> compute-0-14:
>>>> compute-0-14.local
>>>> compute-0-15:
>>>> compute-0-15.local
>>>> compute-0-16:
>>>> compute-0-16.local
>>>> compute-0-17:
>>>> compute-0-17.local
>>>> compute-0-18:
>>>> compute-0-18.local
>>>> compute-0-19:
>>>> compute-0-19.local
>>>> compute-0-20:
>>>> compute-0-20.local
>>>> compute-0-21:
>>>> compute-0-21.local
>>>> compute-0-22:
>>>> compute-0-22.local
>>>> compute-0-23:
>>>> compute-0-23.local
>>>> compute-1-1:
>>>> compute-1-1.local
>>>> compute-1-2:
>>>> compute-1-2.local
>>>> compute-1-3:             [ HANGS HERE, CTRL-C ALLOWS CONTINUATION ]
>>>> compute-1-4:
>>>> compute-1-4.local
>>>> compute-1-5:
>>>> compute-1-5.local
>>>> compute-1-6:
>>>> compute-1-6.local
>>>> compute-1-7:
>>>> compute-1-7.local
>>>> compute-1-8:
>>>> compute-1-8.local
>>>> compute-1-9:
>>>> compute-1-9.local
>>>> compute-1-10:
>>>> compute-1-10.local
>>>> compute-1-11:
>>>> compute-1-11.local
>>>> compute-1-12:
>>>> ssh: connect to host compute-1-12 port 22: Connection refused
>>>> compute-1-13:
>>>> ssh: connect to host compute-1-13 port 22: Connection refused
>>>> compute-1-14:
>>>> compute-1-14.local
>>>> compute-1-15:
>>>> compute-1-15.local
>>>> compute-1-16:
>>>> compute-1-16.local
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> Lee Spector, Professor of Computer Science
>>>> School of Cognitive Science, Hampshire College
>>>> 893 West Street, Amherst, MA 01002-3359
>>>> lspector at hampshire.edu, http://hampshire.edu/lspector/
>>>> Phone: 413-559-5352, Fax: 413-559-5438
>>>>
>>>
>>
>> -- 
>> Wm. Josiah Erikson
>> Computing Support
>> School of Cognitive Science
>> Hampshire College
>> Amherst, MA 01002
>> (413) 559-6091
>>
>> _______________________________________________
>> Clusterusers mailing list
>> Clusterusers at lists.hampshire.edu
>> http://lists.hampshire.edu/mailman/listinfo/clusterusers
>

-- 
Wm. Josiah Erikson
Computing Support
School of Cognitive Science
Hampshire College
Amherst, MA 01002
(413) 559-6091



