[Clusterusers] hex issues
Ryan Moore
ryan at hampshire.edu
Wed Dec 15 11:43:21 EST 2004
1) rsh maxes out at 500 connections. That is the absolute, unchangeable
limit, but I don't think that's the issue...
2) xinetd is the TCP/IP 'super server' that controls rsh connections.
I increased its max instances parameter (instances) from 60 to 80, and
its 'connections per second' parameter (cps) from 25 to 30.
I've restarted xinetd, so feel free to check whether these changes fixed
the problem.
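For reference, here is a rough sketch of what that change looks like in the xinetd configuration. It assumes the two limits live in the "defaults" section of /etc/xinetd.conf; the message above does not say which file was edited, and they could equally be set per service. The second value on the cps line (seconds to wait before re-enabling a service that exceeded the rate) is an assumed placeholder, since only the first value was mentioned.

```
# /etc/xinetd.conf (assumed location, not stated in the message)
defaults
{
        # Maximum number of server processes that may be active at
        # once for a service. Raised from 60.
        instances       = 80

        # First value: incoming connections per second allowed before
        # xinetd temporarily disables the service. Raised from 25.
        # Second value: seconds to wait before re-enabling it
        # (assumed here, not given in the message).
        cps             = 30 30
}
```

xinetd has to be restarted (or told to reload its configuration) before new values take effect.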
If people want a longer explanation I don't mind giving a mini
networking lecture, but I'm off to lunch right now.
- Ryan
Lee Spector wrote:
>
> Thanks Jaime. That seems to be the same issue I'm experiencing, except
> that I never had the problem on fly OR on hex prior to the recent node
> additions. I haven't been doing enough runs on the new nodes to say if
> the problem cropped up immediately when they were installed or
> sometime thereafter. But I know that it never happened in thousands of
> rsh calls prior to the new nodes. And for me it's now happening only
> on the new nodes.
>
> One other thing is that the problem does, in my case, interrupt my
> script... either that or it always happens simultaneously on a run of
> nodes from n??-n23, which seems unlikely.
>
> Also -- for the ones that are failing to accept rsh they are sometimes
> doing it very persistently. I've had to resort to manually rshing to
> each such node (vanilla "rsh" works to get an interactive shell, but
> rsh with a command argument does not) to kill processes.
>
> This seems like something that we should fix rather than work around
> (though the loops are clever!). Ryan: could you do a little digging on
> that error message ("protocol failure in circuit setup") and see if
> there's a known fix?
>
> Thanks,
>
> -Lee