[Clusterusers] hex issues
Ryan Moore
ryan at hampshire.edu
Wed Dec 15 08:09:49 EST 2004
Node 2 sounds like something I can fix. I'll have it back up before
noon. -ryan
Lee Spector wrote:
>
> Fellow cluster users,
>
> I've recently resumed doing some runs on hex and have been having some
> puzzling problems.
>
> The most puzzling is that my script to start Lisp runs on all of the
> nodes usually seems to get interrupted at some random point,
> meaning that it starts my processes on nodes n01-n??, where n?? is
> occasionally 23 but sometimes 19, 15, or something else. One
> possibility here is that the processes really are being started but
> that they abort before they can do anything -- I thought this might
> happen if the nodes were heavily loaded, but when I poked around with
> w and top I didn't see much happening on the failing nodes... And
> besides, it seems unlikely that somebody's running short, intense
> processes on the highest-numbered nodes, always in sequence from n23
> down...
>
> In a possibly related problem I've been getting the following message
> printed to my shell on the master node from time to time:
>
> protocol failure in circuit setup
>
> I've googled this but only learned that it's an rsh error, which makes
> sense in relation to the previous problem.
>
> Another problem is that n02 is unresponsive, and seemingly in a way
> that's somehow worse than a normal crashed node. Normally a crashed
> node causes a hiccup and an error message ("unreachable node" or
> something like that) from my scripts, but some of them are currently
> hanging instead. Incidentally, some of the scripts I use to do
> diagnostics (and kill my processes!) are hanging this way, making it
> hard to mess around much.
>
> Any leads or diagnostic suggestions?
>
> Thanks,
>
> -Lee
>
> --
> Lee Spector
> Dean, Cognitive Science + Professor, Computer Science
> Cognitive Science, Hampshire College
> 893 West Street, Amherst, MA 01002-3359
> lspector at hampshire.edu, http://hampshire.edu/lspector/
> Phone: 413-559-5352, Fax: 413-559-5438
>
>
> _______________________________________________
> Clusterusers mailing list
> Clusterusers at lists.hampshire.edu
> http://lists.hampshire.edu/mailman/listinfo/clusterusers