[Clusterusers] hex issues

Ryan Moore ryan at hampshire.edu
Wed Dec 15 11:43:21 EST 2004


1) rsh maxes out at 500. That is the absolute, unchangeable limit. But I
don't think that's the issue...

2) xinetd is the TCP/IP 'super server' that controls rsh connections.
I raised the maximum-instances parameter ('instances') from 60 to 80
and the connections-per-second parameter ('cps') from 25 to 30.

I've restarted xinetd, so feel free to check whether these changes
fixed the problem.

If people want a longer explanation, I don't mind giving a mini
networking lecture, but I'm off to lunch right now.

- Ryan

Lee Spector wrote:

>
> Thanks, Jaime. That seems to be the same issue I'm experiencing, except
> that I never had the problem on fly OR on hex prior to the recent node
> additions. I haven't been doing enough runs on the new nodes to say
> whether the problem cropped up immediately when they were installed or
> sometime thereafter. But I know that it never happened in thousands of
> rsh calls prior to the new nodes, and for me it now happens only on the
> new nodes.
>
> One other thing: in my case, the problem does interrupt my script.
> Either that, or it always happens simultaneously on a run of nodes
> from n??-n23, which seems unlikely.
>
> Also, the nodes that fail to accept rsh sometimes do so very
> persistently. I've had to resort to manually rshing to each such node
> (vanilla "rsh" works to get an interactive shell, but rsh with a
> command argument does not) to kill processes.
>
> This seems like something that we should fix rather than work around 
> (though the loops are clever!). Ryan: could you do a little digging on 
> that error message ("protocol failure in circuit setup") and see if 
> there's a known fix?
>
> Thanks,
>
>  -Lee
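The retry loops mentioned in the quoted message are a workaround rather than a fix, but for reference here is a minimal Python sketch of one. It assumes that a failed rsh call shows up either as a nonzero exit status of the local rsh process or as the "protocol failure in circuit setup" message on stderr. The function name rsh_with_retries, its parameters, and the retry counts are hypothetical and are not taken from anyone's actual scripts.

```python
import subprocess
import time
from typing import Callable, Sequence

# The error message quoted in this thread.
FAILURE_TEXT = "protocol failure in circuit setup"


def rsh_with_retries(
    node: str,
    command: Sequence[str],
    attempts: int = 5,
    delay: float = 2.0,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Run `rsh node command...`, retrying when the call looks like it failed.

    Assumption: a failed connection shows up as a nonzero exit status of the
    local rsh process or as FAILURE_TEXT on stderr. `runner` is injectable
    so the retry logic can be exercised without a cluster.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    argv = ["rsh", node, *command]
    result = None
    for attempt in range(1, attempts + 1):
        result = runner(argv, capture_output=True, text=True)
        if result.returncode == 0 and FAILURE_TEXT not in result.stderr:
            return result  # looks like a clean run
        if attempt < attempts:
            time.sleep(delay)  # back off before trying the same node again
    return result  # last failed attempt; the caller decides what to do
```

For example, rsh_with_retries("n23", ["uptime"]) would try up to five times, two seconds apart, before handing back the last failed result. A loop like this only hides the symptom, which is why fixing the underlying rsh/xinetd problem is the better route.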





