[Clusterusers] fly - "New" cluster details
Wm. Josiah Erikson
wjerikson at hampshire.edu
Fri May 26 10:49:22 EDT 2006
Hello all,
This is a long email describing the setup details of the new
cluster. I will try to keep the most important information at the
top, and push the geekier, less essential details further down.
If you are receiving this email directly, instead of just through
the clusterusers list, that means that I have activated your account,
and you should be able to log in to fly.hampshire.edu (directly, from
outside or wherever) with your old hex account info. On first login it
will ask you to generate a public/private key pair; accept the
defaults, and leave the passphrase empty if you don't want to be asked
for a password every time you SSH into a node.
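For reference, accepting the defaults looks roughly like this (these
are the standard OpenSSH ssh-keygen prompts; "yourname" is a
placeholder for your username):

    Enter file in which to save the key (/home/yourname/.ssh/id_rsa): [Enter]
    Enter passphrase (empty for no passphrase): [Enter]
    Enter same passphrase again: [Enter]

Hitting Enter at each prompt gives you a key with no passphrase, so
SSHing between nodes won't prompt you for a password.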
If you want your old account activated and you didn't get this email
directly (check the To: list), just ask and you shall receive. My
intention was to activate all the accounts of people I'd heard from
recently, but I probably forgot somebody. I'm trying to prune unused
accounts, not trying to shut anybody out who wants to use the cluster.
Anything that was in your old hex homedir is now in your new fly homedir
- I moved them over wholesale. That means your old .bashrc and
.bash_profile came along too; since the old settings may break things on
the new system, you may prefer to grab the defaults from /etc/skel.
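If you'd rather start from the defaults, something like this should do
it (back up your old files first, in case you want your settings back;
the .hex suffix is just an example):

    $ cd ~
    $ mv .bashrc .bashrc.hex
    $ mv .bash_profile .bash_profile.hex
    $ cp /etc/skel/.bashrc /etc/skel/.bash_profile .

Then log out and back in (or source the new files) to pick up the
changes.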
OK, now for some details on the software setup of the cluster,
followed by some hardware details:
The home directories are cross-mounted to every node, of course, as
is /share/apps.
The compute nodes from hex are named compute-0-1 through compute-0-23.
The compute nodes from old fly are named compute-1-1 through
compute-1-16.
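Putting the naming and the cross-mounts together: you can hop onto any
node and see the same home directory. For example (compute-0-5 is just
an arbitrary pick):

    $ ssh compute-0-5
    $ df -h /home /share/apps    # both should show up as mounts from the head node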
gcc and gcc4 are both installed; breve will be installed soon (in
/share/apps/breve/<version>); cmucl is installed (in /share/apps); Maya
(for its libraries) and the Pixar software are installed locally on
every node, with the appropriately optimized version of
RenderManProServer. If prman and friends are not in your path, you can
fix that by grabbing the default .bashrc and .bash_profile from
/etc/skel, as mentioned above (assuming bash is your shell).
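A quick sanity check from your shell (names as described above; if
prman doesn't resolve, the /etc/skel fix is the first thing to try):

    $ gcc --version       # the stock compiler
    $ gcc4 --version      # the newer gcc 4.x build
    $ ls /share/apps      # cmucl now, breve soon
    $ which prman         # empty output means your PATH needs fixing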
All of the default ROCKS stuff is installed as well, though I only
installed the first two OS CDs. fly is ROCKS 4.1, which is based on
CentOS 4.2, which is based on RHEL 4... check
http://www.rocksclusters.org for details and documentation (yay
well-documented, open source software!)
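If the default install includes the SGE roll (ROCKS usually ships it,
but check with qstat rather than taking my word for it), submitting a
quick batch job to the nodes looks something like:

    $ cat > hello.sh <<'EOF'
    #!/bin/bash
    #$ -cwd
    hostname
    EOF
    $ qsub hello.sh
    $ qstat                 # output lands in hello.sh.o<jobid> when it finishes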
If there is anything that you want installed that isn't, just ask.
Most of the usual GNU tools are present.
Hardware details, for the morbidly curious (you can check
http://fly.hampshire.edu/ganglia for even more details, but here's a
quick rundown):
We have 40 machines total, 62 processors, and close to 100GB of RAM
in this cluster.
The head node is hex's old head node, a dual-P4 2.4GHz with 4GB of
RAM. It has two 250GB Seagate Barracudas in a RAID 1 and an external
FireWire drive for backup (we're working on a better backup solution).
It's connected to the rest of the campus network with a 10/100 Ethernet
card, and to the rest of the cluster with gigabit. A 24-port gigabit
switch connects the head node and compute-0-* to each other, with the
exception of compute-0-1, which is on 10/100 so that I could connect
the 10/100 switch that serves fly's old nodes.
compute-0-1 through compute-0-13 are dual 2.4GHz P4s with 2GB of
RAM each.
compute-0-14 through compute-0-23 are dual 3.0GHz P4s with 4GB of
RAM each.
I have disabled hyperthreading, as most people seem to agree that it
degrades performance in most situations common to a cluster, and it
also confuses things slightly (the OS reports twice as many CPUs as are
physically present).
compute-1-* are 1200MHz Athlon (Thunderbird, 100MHz FSB) machines
with 768MB of RAM, with the following exceptions:
compute-1-1 is a 750MHz Athlon, which will shortly be "upgraded" to
a 1200MHz machine again
compute-1-13 is a Sempron 3000+ (which means it runs at 1826MHz or
so, in this case), with 512K of cache and 1GB of RAM
compute-1-16 is a newer (Thoroughbred B, I believe) Athlon, with SSE
instructions, running at 1150MHz
All the nodes have 2GB of swap and a 10GB / partition; the rest of
each drive is mounted at /state/partition1, currently doing nothing.
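If your jobs do heavy local I/O, that space makes a decent per-node
scratch area. A sketch, assuming /state/partition1 is writable by
users on the nodes (verify before relying on it):

    $ ssh compute-1-3                      # any node; compute-1-3 is arbitrary
    $ mkdir -p /state/partition1/$USER     # per-user scratch directory
    $ cd /state/partition1/$USER           # local disk: fast, but not shared or backed up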
Please feel free to email with questions, concerns, feature
requests, comments, "Hey, you left this really important thing out!"'s,
etc....
Have a great Memorial Day weekend!
-Josiah