[Clusterusers] cluster planning

Lee Spector lspector at hampshire.edu
Mon May 15 11:51:03 EDT 2006


Josiah,

This all sounds fantastic to me. On the IP/access/firewall stuff --  
if I'll be able to ssh/scp into the cluster head node from the  
outside world I'll be very happy, assuming we won't be putting the  
cluster at unreasonable risk.

I expect that my next major use of the cluster will involve breve,  
and possibly a new build, for which we'll have to talk to Jon.

I do have a sequence of probably ignorant questions, which can be  
summarized as: Will I still be able to run/stop cluster-wide lisp  
jobs with my shell scripts or some reasonable replacements? I guess  
there are a couple of parts to this, on the possibly-silly assumption  
that I WOULD still be using shell scripts:

- Can I still get a list of all of the node names and stick them in a  
local file with something like:

/usr/bin/gstat -l -1 | grep n | cut -f 1 -d ' ' | sort > ~/pssc/rshnodes
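
(I'm guessing the node names will switch to ROCKS's usual  
compute-X-Y convention; since ROCKS ships Ganglia, gstat should  
still be around, and I'd presumably just adjust the grep, e.g.

/usr/bin/gstat -l -1 | grep compute | cut -f 1 -d ' ' | sort > ~/pssc/rshnodes

-- but correct me if the naming will be different.)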

- Can I still then fork processes on all of the nodes (AT THE SAME  
TIME) with something like:

forkall "/opt/bin/cmucl -dynamic-space-size 1600 -load /home/lspector/ 
$1/load -quiet > /tmp/output"

where my "forkall" is defined as:

#!/bin/sh
# run the given command on every node listed in ~/pssc/rshnodes, all in
# parallel; ssh reads from /dev/null so it doesn't eat the node list
while read nodename ; do
         ssh $nodename "$@" < /dev/null &
done < ~/pssc/rshnodes
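
(Presumably the first thing I'd try on the new cluster is a harmless  
smoke test to confirm that passwordless ssh and the node list both  
work, something like

forkall "uptime"

before launching any real lisp runs.)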

- Can I still then kill all of my lisp processes across the cluster  
with something like:

forkall source ~/bin/kill-lisps

where my kill-lisps is:

# the [l]isp pattern keeps grep from matching (and killing) itself
kill -9 `ps -ax | grep '[l]isp' | gawk '{ print $1 }'`
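
(If the compute nodes have pkill -- I believe recent procps includes  
it, though I haven't checked what ROCKS installs -- kill-lisps could  
presumably shrink to a one-liner:

# kill anything on this node whose command line mentions lisp
pkill -9 -f lisp

though I'd want to make sure nothing else with "lisp" in its command  
line gets caught.)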

If the answer is something like "NO, you can't do any of that in such  
a goofy, neanderthal sort of way, but there are perfectly good and in  
fact simpler ways to do this with ROCKS" then of course I'd be fine  
with that, although I'll need some pointers about the new way to do  
it. If, on the other hand, there's some major snag in doing this sort  
of thing in any way, then I'm worried and we need to talk.
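
(If the "simpler ROCKS way" turns out to be cluster-fork, which I  
gather ships with ROCKS, I'd guess my two cases look roughly like

cluster-fork "uptime"              # run a command on every compute node
cluster-fork "pkill -9 -f lisp"    # kill all the lisp jobs at once

-- unverified sketches on my part, and I don't know whether  
cluster-fork hits the nodes in parallel or one at a time, which  
matters for my start-everything-at-once runs.)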

Thanks,

  -Lee





On May 15, 2006, at 10:32 AM, Wm. Josiah Erikson wrote:

> Hello all,
>    For anybody who doesn't already know:
>
>    Right now, fly's head node is serving most of fly AND most of
> hex (a couple of dead motherboards and hard drives are the only
> reason it isn't ALL - replacement hardware is on order), proving
> that non-identical hardware can coexist in the same cluster just
> fine with ROCKS, the clustering software I'm currently using and
> have fallen in love with.
>      I'm going to go forward with cluster planning assuming that  
> everybody thinks it would be great if both clusters were  
> permanently the same, as that's the universal response I got from  
> everybody when we first started talking about getting fly back up  
> and running. This is possible and even relatively simple with  
> ROCKS, as I have just proven :)
>    I'm going to put new hard drives, in a RAID 1, in hex's current
> master node - 250GB drives, an upgrade from the current 120GB
> drives, which are well past their life expectancy and make me
> nervous. Hex's master node will serve all 40 compute nodes. The
> question is: should I put hex at fly's IP (directly accessible from
> the outside world via SSH and HTTP), or at hex's IP (only accessible
> from inside Hampshire's network)? I would argue for the former... I
> also think fly is a cooler name than hex, but I don't actually care
> :) I can keep fly up-to-date security-wise for the services that are
> exposed, and the rest are firewalled at the kernel level as well as
> at the edge of Hampshire's network, so I don't think we're exposing
> ourselves to anything big and scary by giving ourselves a globally
> valid IP.
>
>    In short: I plan to combine both clusters into one, use hex's
> current master node as the head node, call it fly, and make it
> globally accessible. Does anybody have a problem with this?
>
>    Here are the services/programs that I know need to be installed.  
> Please add to this list:
>
>    -breve
>    -Maya and Pixar license servers
>    -Pixar stuff (RenderManProServer and rat)
>    -MPI
>    -build tools (gcc, etc - v4 as well as v3)
>    -X on the head node
>    -Maya
>    -cmucl
>    -All the standard ROCKS stuff that is now on fly. This means the
> new combined cluster will look very much like fly does currently -
> nearly identical, in fact, except for the addition of gcc 4.0 and
> the license servers, and a faster, RAIDed head node. We're working
> on a backup scheme for the homedirs; until then, I'd continue to use
> the FireWire drive that is currently used for backup.
>
>    I will take everything that people have on both their hex and  
> their fly homedirs and put them into the new cluster, if necessary.  
> It would be nice if people would clean up what isn't needed before  
> then.
>
>    Also, please tell me when you next plan to use the cluster and  
> for what, so that I can plan what should be up and working and how  
> at that point.
>
>    Thanks - send any comments, concerns, or "what the hell do you  
> think you're doing"'s my way.
>
>    -Josiah
>
>
>
>
>
> _______________________________________________
> Clusterusers mailing list
> Clusterusers at lists.hampshire.edu
> http://lists.hampshire.edu/mailman/listinfo/clusterusers

--
Lee Spector, Professor of Computer Science
School of Cognitive Science, Hampshire College
893 West Street, Amherst, MA 01002-3359
lspector at hampshire.edu, http://hampshire.edu/lspector/
Phone: 413-559-5352, Fax: 413-559-5438



