[Clusterusers] hex issues

Ryan Moore ryan at hampshire.edu
Wed Dec 15 08:09:49 EST 2004


Node 2 sounds like something I can fix. I'll have it back up before 
noon. -ryan
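
For the hanging diagnostic scripts described in the quoted message
below, one approach that might help is to put a timeout around each
per-node command, so a single wedged node cannot block the whole loop.
What follows is only a minimal Python sketch, not the scripts actually
in use on hex. It assumes the nodes are named n01 through n23 and are
reached with rsh. The function names, the 5-second timeout, and the
use of "uptime" as the probe command are all illustrative assumptions.

```python
import subprocess


def node_names(count=23):
    """Return node names n01..nNN (naming scheme assumed from the thread)."""
    return [f"n{i:02d}" for i in range(1, count + 1)]


def probe(node, timeout=5.0, make_cmd=None):
    """Run one short command for a node and classify the outcome.

    Returns "ok" if the command exits with status 0, "failed" if it exits
    non-zero or cannot be started, and "timeout" if it does not finish
    within `timeout` seconds. By default the command is `rsh <node> uptime`
    (an assumption); pass make_cmd to substitute another command.
    """
    cmd = make_cmd(node) if make_cmd else ["rsh", node, "uptime"]
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # the child is killed if it runs longer than this
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    except OSError:
        # The command itself could not be executed (e.g. rsh not installed).
        return "failed"
    return "ok" if result.returncode == 0 else "failed"


def probe_all(nodes, timeout=5.0, make_cmd=None):
    """Probe every node in turn; one hung node costs at most `timeout` seconds."""
    return {node: probe(node, timeout, make_cmd) for node in nodes}
```

Calling probe_all(node_names()) on the master node would return a
dictionary mapping each node name to "ok", "failed", or "timeout". A
node that hangs the way n02 does should then show up as "timeout"
instead of stalling the script.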


Lee Spector wrote:

>
> Fellow cluster users,
>
> I've recently resumed doing some runs on hex and have been having some 
> puzzling problems.
>
> The most puzzling is that my script to start lisp runs on all of the 
> nodes usually seems to get interrupted at some random point, so that 
> it starts my processes only on nodes n01-n??, where n?? is 
> occasionally 23 but sometimes 19, 15, or something else. One 
> possibility here is that the processes really are being started but 
> that they abort before they can do anything -- I thought this might 
> happen if the nodes were heavily loaded, but when I poked around with 
> w and top I didn't see much happening on the failing nodes... And 
> besides it seems unlikely that somebody's running short, intense 
> processes on the highest-numbered nodes, always in sequence from n23 
> down...
>
> In a possibly related problem I've been getting the following message 
> printed to my shell on the master node from time to time:
>
>      protocol failure in circuit setup
>
> I've googled this but only learned that it's an rsh error, which makes 
> sense in relation to the previous problem.
>
> Another problem is that n02 is unresponsive, and seemingly in a way 
> that's worse than a normal crashed node. Normally a crashed node 
> causes a hiccup and an error message ("unreachable node" or 
> something like that) from my scripts; now some of those scripts 
> hang instead. Incidentally, some of the scripts I use to do 
> diagnostics (and kill my processes!) are hanging this way, making it 
> hard to mess around much.
>
> Any leads or diagnostic suggestions?
>
> Thanks,
>
>  -Lee
>
> -- 
> Lee Spector
> Dean, Cognitive Science + Professor, Computer Science
> Cognitive Science, Hampshire College
> 893 West Street, Amherst, MA 01002-3359
> lspector at hampshire.edu, http://hampshire.edu/lspector/
> Phone: 413-559-5352, Fax: 413-559-5438
>
>
> _______________________________________________
> Clusterusers mailing list
> Clusterusers at lists.hampshire.edu
> http://lists.hampshire.edu/mailman/listinfo/clusterusers





