[Clusterusers] fly - "New" cluster details
Wm. Josiah Erikson
wjerikson at hampshire.edu
Fri May 26 10:49:22 EDT 2006
Hello all,
This is a long email describing the setup details of the new
cluster. I will try to keep the most important information at the
top, and push the geekier, less essential details further down.
If you are receiving this email directly, instead of just through
the clusterusers list, that means that I have activated your account,
and you should be able to log in to fly.hampshire.edu (directly, from
outside or wherever) with your old hex account info. On first login it
will ask you to generate a public/private key pair; accept the
defaults, and leave the passphrase empty if you don't want to be asked
for a password every time you SSH into a node.
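For reference, accepting the defaults looks roughly like this (these
are the standard OpenSSH ssh-keygen prompts; "yourname" is a
placeholder for your username):

    Enter file in which to save the key (/home/yourname/.ssh/id_rsa): [Enter]
    Enter passphrase (empty for no passphrase): [Enter]
    Enter same passphrase again: [Enter]

Hitting Enter at each prompt gives you a key with no passphrase, so
SSHing between nodes won't prompt you for a password.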
If you want your old account activated and you didn't get this email
directly (check the To: list), just ask and you shall receive. My
intention was to activate all the accounts of people I'd heard from
recently, but I probably forgot somebody. I'm trying to prune unused
accounts, not trying to shut anybody out who wants to use the cluster.
Anything that was in your old hex homedir is now in your new fly homedir
- I moved them over wholesale. That means your old .bashrc and
.bash_profile came along too; since the old settings may break things on
the new system, you may prefer to grab the defaults from /etc/skel.
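If you'd rather start from the defaults, something like this should do
it (back up your old files first, in case you want your settings back;
the .hex suffix is just an example):

    $ cd ~
    $ mv .bashrc .bashrc.hex
    $ mv .bash_profile .bash_profile.hex
    $ cp /etc/skel/.bashrc /etc/skel/.bash_profile .

Then log out and back in (or source the new files) to pick up the
changes.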
OK, now for some details on the software setup of the cluster,
followed by some hardware details:
The home directories are cross-mounted to every node, of course, as
is /share/apps.
The compute nodes from hex are named compute-0-1 through compute-0-23.
The compute nodes from old fly are named compute-1-1 through
compute-1-16.
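Putting the naming and the cross-mounts together: you can hop onto any
node and see the same home directory. For example (compute-0-5 is just
an arbitrary pick):

    $ ssh compute-0-5
    $ df -h /home /share/apps    # both should show up as mounts from the head node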
gcc and gcc4 are both installed; breve will be installed soon (in
/share/apps/breve/<version>); cmucl is installed (in /share/apps); Maya
(for its libraries) and the Pixar software are installed locally on
every node, with the appropriately optimized version of
RenderManProServer. If prman and friends are not in your path, you can
fix that by grabbing the default .bashrc and .bash_profile from
/etc/skel, as mentioned above (assuming bash is your shell).
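A quick sanity check from your shell (names as described above; if
prman doesn't resolve, the /etc/skel fix is the first thing to try):

    $ gcc --version       # the stock compiler
    $ gcc4 --version      # the newer gcc 4.x build
    $ ls /share/apps      # cmucl now, breve soon
    $ which prman         # empty output means your PATH needs fixing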
All of the default ROCKS stuff is installed as well, though I only
installed the first two OS CDs. fly is ROCKS 4.1, which is based on
CentOS 4.2, which is based on RHEL 4... check
http://www.rocksclusters.org for details and documentation (yay
well-documented, open source software!)
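If the default install includes the SGE roll (ROCKS usually ships it,
but check with qstat rather than taking my word for it), submitting a
quick batch job to the nodes looks something like:

    $ cat > hello.sh <<'EOF'
    #!/bin/bash
    #$ -cwd
    hostname
    EOF
    $ qsub hello.sh
    $ qstat                 # output lands in hello.sh.o<jobid> when it finishes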
If there is anything that you want installed that isn't, just ask.
Most of the usual GNU tools are present.
Hardware details, for the morbidly curious (you can check
http://fly.hampshire.edu/ganglia for even more details, but here's a
quick rundown):
We have 40 machines total, 62 processors, and close to 100GB of RAM
in this cluster.
The head node is hex's old head node, a dual-P4 2.4GHz with 4GB of
RAM. It has two 250GB Seagate Barracudas in a RAID 1 and an external
FireWire drive for backup (we're working on a better backup solution).
It's connected to the rest of the campus network with a 10/100 Ethernet
card, and to the rest of the cluster with gigabit. A 24-port gigabit
switch connects the head node and compute-0-* to each other, with the
exception of compute-0-1, which is on 10/100 so that I could connect
the 10/100 switch that serves fly's old nodes.
compute-0-1 through compute-0-13 are dual 2.4GHz P4s with 2GB of
RAM each.
compute-0-14 through compute-0-23 are dual 3.0GHz P4s with 4GB of
RAM each.
I have disabled hyperthreading, as most people seem to agree that it
degrades performance in most situations common to a cluster, and it
also confuses things slightly (the OS reports twice as many CPUs as are
physically present).
compute-1-* are 1200MHz Athlon (Thunderbird, 100MHz FSB) machines
with 768MB of RAM, with the following exceptions:
compute-1-1 is a 750MHz Athlon, which will shortly be "upgraded" to
a 1200MHz machine again
compute-1-13 is a Sempron 3000+ (which means it runs at 1826MHz or
so, in this case), with 512K of cache and 1GB of RAM
compute-1-16 is a newer (Thoroughbred B, I believe) Athlon, with SSE
instructions, running at 1150MHz
All the nodes have 2GB of swap and a 10GB / partition; the rest of
each drive is mounted at /state/partition1, currently doing nothing.
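If your jobs do heavy local I/O, that space makes a decent per-node
scratch area. A sketch, assuming /state/partition1 is writable by
users on the nodes (verify before relying on it):

    $ ssh compute-1-3                      # any node; compute-1-3 is arbitrary
    $ mkdir -p /state/partition1/$USER     # per-user scratch directory
    $ cd /state/partition1/$USER           # local disk: fast, but not shared or backed up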
Please feel free to email with questions, concerns, feature
requests, comments, "Hey, you left this really important thing out!"'s,
etc....
Have a great Memorial Day weekend!
-Josiah