[Clusterusers] cluster planning
Wm. Josiah Erikson
wjerikson at hampshire.edu
Mon May 15 13:16:26 EDT 2006
I just read this myself, and it makes things sound much more complex
than they have to be, depending on what you're trying to do....
"ssh compute-0-1 ls" will still launch "ls" on compute-0-1 (n01 in the
old nomenclature) (note ssh instead of rsh)
"cluster-fork ls" will launch "ls" on all the nodes. cluster-fork has
all kinds of options:
Cluster Fork - version 4.1
Usage: cluster-fork [-hm] [-d database] [-u username] [-p password]
       [-q sql-expr] [-n nodes] [--help] [--list-rcfiles]
       [--list-project-info] [--bg] [--verbose] [--rcfile arg]
       [--db database] [--host host] [--user username]
       [--password password] [--query sql-expr]
       [--nodes encoded node list] [--pe-hostfile sge machinefile]
       command
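For example (just a sketch - the exact node names depend on how ROCKS
numbered the compute nodes, and I'm going from memory on the encoded
node list syntax, so don't take these literally):

# run uptime on every compute node, one after another
cluster-fork uptime

# same thing, but don't wait for each node to finish before moving on
cluster-fork --bg uptime

# restrict the run to a subset of nodes with an encoded node list
cluster-fork --nodes "compute-0-%d:0-9" uptime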
We will also have alfserver running on all of the nodes, with an
alfred maitre d' running on the head node, so we could use that to
launch any kind of job we like. We could also use mpich or linpack;
I haven't used either and am not familiar with them, but there's
plenty of documentation out there for both.
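If we do end up using mpich, I'd guess a run looks something like the
following (a sketch only - I haven't tried it, and the machines file
and the hello_mpi binary are placeholders of mine, not anything that
exists on the cluster yet):

# machines: a plain text file with one compute node name per line
mpirun -np 8 -machinefile ~/machines ./hello_mpi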
-Josiah
Wm. Josiah Erikson wrote:
> Yes, this is all still possible, though slightly different (and
> possibly, or probably... better). You might want to read this:
>
> http://fly.hampshire.edu/rocks-documentation/4.1/start-computing.html
>
> (or
> http://www.rocksclusters.org/rocks-documentation/4.1/start-computing.html
> )
>
> -Josiah
>
>
> Lee Spector wrote:
>
>>
>> Josiah,
>>
>> This all sounds fantastic to me. On the IP/access/firewall stuff --
>> if I'll be able to ssh/scp into the cluster head node from the
>> outside world I'll be very happy, assuming we won't be putting the
>> cluster at unreasonable risk.
>>
>> I expect that my next major use of the cluster will involve breve,
>> and possibly a new build, for which we'll have to talk to Jon.
>>
>> I do have a sequence of probably ignorant questions, which can be
>> summarized as: Will I still be able to run/stop cluster-wide lisp
>> jobs with my shell scripts or some reasonable replacements? I guess
>> there are a couple of parts to this, on the possibly-silly
>> assumption that I WOULD still be using shell scripts:
>>
>> - Can I still get a list of all of the node names and stick them in
>> a local file with something like:
>>
>> /usr/bin/gstat -l -1 | grep n | cut -f 1 -d ' ' | sort > ~/pssc/rshnodes
>>
>> - Can I still then fork processes on all of the nodes (AT THE SAME
>> TIME) with something like:
>>
>> forkall "/opt/bin/cmucl -dynamic-space-size 1600 -load
>> /home/lspector/$1/load -quiet > /tmp/output"
>>
>> where my "forkall" is defined as:
>>
>> #!/bin/sh
>> # run "$@" on every node listed in ~/pssc/rshnodes, backgrounded;
>> # feed ssh /dev/null so it doesn't swallow the node list on stdin
>> while read nodename ; do
>>     ssh $nodename "$@" < /dev/null &
>> done < ~/pssc/rshnodes
>>
>> - Can I still then kill all of my lisp processes across the cluster
>> with something like:
>>
>> forkall source ~/bin/kill-lisps
>>
>> where my kill-lisps is:
>>
>> kill -9 `ps -ax | grep lisp | gawk '{ print $1 }'`
>>
>> If the answer is something like "NO, you can't do any of that in
>> such a goofy, neanderthal sort of way, but there are perfectly good
>> and in fact simpler ways to do this with ROCKS" then of course I'd
>> be fine with that, although I'll need some pointers about the new
>> way to do it. If, on the other hand, there's some major snag in
>> doing this sort of thing in any way, then I'm worried and we need to
>> talk.
>>
>> Thanks,
>>
>> -Lee
>>
>>
>>
>>
>>
>> On May 15, 2006, at 10:32 AM, Wm. Josiah Erikson wrote:
>>
>>> Hello all,
>>> For anybody who doesn't already know:
>>>
>>> Right now, fly's head node is serving most of fly AND most of
>>> hex (a couple of dead motherboards and hard drives are the only
>>> reason it isn't ALL - hardware is on order to remedy this),
>>> proving that non-identical hardware can coexist in the same cluster
>>> just fine with ROCKS, the clustering software I'm currently using
>>> and have fallen in love with.
>>> I'm going to go forward with cluster planning on the assumption
>>> that everybody thinks it would be great if the two clusters were
>>> permanently combined, since that's the universal response I got
>>> when we first started talking about getting fly back up and
>>> running. This is possible and even relatively simple with ROCKS,
>>> as I have just proven :)
>>> I'm going to put new hard drives, in a RAID 1, in hex's current
>>> master node - 250GB drives, an upgrade from the current 120GB
>>> ones, which are well past their life expectancy and make me
>>> nervous. Hex's master node will serve all 40 compute nodes. The
>>> question is: should I put hex at fly's IP (directly accessible
>>> from the outside world via SSH and HTTP), or at hex's IP (only
>>> accessible directly from inside Hampshire's network)? I would
>>> argue for the former... I also think fly is a cooler name than hex,
>>> but I don't actually care :) I can keep fly up-to-date
>>> security-wise for those services that are available, and the rest
>>> are firewalled at the kernel level as well as at the edge of
>>> Hampshire's network, so I don't think we're exposing ourselves to
>>> anything big and scary by giving ourselves a globally valid IP.
>>>
>>> In short: I plan on putting both clusters together into one,
>>> using hex's current master node as the head node, calling it
>>> fly, and making it globally accessible. Does anybody have a problem
>>> with this?
>>>
>>> Here are the services/programs that I know need to be installed.
>>> Please add to this list:
>>>
>>> -breve
>>> -Maya and Pixar license servers
>>> -Pixar stuff (RenderManProServer and rat)
>>> -MPI
>>> -build tools (gcc, etc - v4 as well as v3)
>>> -X on the head node
>>> -Maya
>>> -cmucl
>>> -All the standard ROCKS stuff that is now on fly. This means
>>> that the new combined cluster will look very much like fly does
>>> currently - nearly identical, in fact, except for the addition of
>>> gcc 4.0, the license servers, and a faster, RAIDed head node.
>>> We're working on a backup scheme for the homedirs, and until then
>>> I would continue to use the FireWire drive that is currently being
>>> used for backup.
>>>
>>> I will take everything that people have on both their hex and
>>> their fly homedirs and put them into the new cluster, if necessary.
>>> It would be nice if people would clean up what isn't needed before
>>> then.
>>>
>>> Also, please tell me when you next plan to use the cluster and
>>> for what, so that I can plan what should be up and working and how
>>> at that point.
>>>
>>> Thanks - send any comments, concerns, or "what the hell do you
>>> think you're doing"'s my way.
>>>
>>> -Josiah
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Lee Spector, Professor of Computer Science
>> School of Cognitive Science, Hampshire College
>> 893 West Street, Amherst, MA 01002-3359
>> lspector at hampshire.edu, http://hampshire.edu/lspector/
>> Phone: 413-559-5352, Fax: 413-559-5438
>>
>
>
>