[Clusterusers] Changes to cluster for more fair usage

Wm. Josiah Erikson wjerikson at hampshire.edu
Fri Jun 28 14:52:21 EDT 2013


So, I've made the following changes:

1. I've removed the "tom" tag from compute-4-2 through compute-4-4, 
since he cannot use those nodes as well as others can. This will leave a 
large number of slots always open for shorter-running tasks.
2. I've increased the number of slots on compute-4-2 through compute-4-5 
to half the number of cores they have. I tried going higher, but weird 
things started happening with thread creation (I had to restart one of 
your jobs because of it, Jaime - if you were wondering why that 
happened, that's why). This will also enable Jaime to launch 100 
concurrent tasks, which is best for his runs. Bassam should also, at the 
same time, be able to take up one slot per node, which should provide 
maximum usage of Rack 4. Blender is currently set to only launch one 
task per blade, and we will probably keep it that way.
3. I've changed our dispatching scheme to:

  # "P+ATCL" -- Active Task Count Leveling, this mode also first
  # sorts jobs by strict priority, then within a group of jobs with
  # the same priority it prefers to assign available blades to jobs
  # with the fewest active tasks.  This mode tends to allocate roughly
  # the same number of blades to each job, while favoring older jobs
  # over newly spooled ones.  Given roughly equal numbers of blades,
  # jobs with short-running tasks will finish sooner than jobs with
  # long-running tasks, under this scheme.
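
To make the ordering described above concrete, here is a minimal Python 
sketch of "strict priority first, then fewest active tasks, then oldest 
job". It is an illustration only, not the dispatcher's actual code: the 
Job class, its field names, and the assumption that a larger priority 
number means more urgent are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int      # assumption: a larger number means more urgent
    active_tasks: int  # tasks currently running on blades
    spool_time: float  # when the job was spooled; smaller means older


def pick_next_job(jobs):
    """Return the job that should get the next free blade, or None."""
    if not jobs:
        return None
    # Sort key: highest priority, then fewest active tasks, then oldest job.
    return min(jobs, key=lambda j: (-j.priority, j.active_tasks, j.spool_time))


def assign_blades(jobs, free_blades):
    """Hand out free blades one at a time, re-ranking after each one."""
    order = []
    for _ in range(free_blades):
        job = pick_next_job(jobs)
        if job is None:
            break
        job.active_tasks += 1
        order.append(job.name)
    return order


if __name__ == "__main__":
    jobs = [
        Job("long-running", priority=100, active_tasks=6, spool_time=1.0),
        Job("short-tasks", priority=100, active_tasks=0, spool_time=2.0),
    ]
    # The newer job catches up first, then blades alternate between the two.
    print(assign_blades(jobs, free_blades=8))
```

In this toy run, the job with no active tasks receives blades until it has 
as many as the older job; after that, ties go to the older job, so the two 
stay roughly level. A job with a higher priority would still take every 
free blade ahead of both.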

I was wrong, by the way - we were using P+RR before.

Once Tom's jobs that are currently on Rack 4 finish, we should start 
seeing more fair usage, I think.

We may need to tweak this again if/when another use-case comes along.

All the best,

-- 
Wm. Josiah Erikson
Assistant Director of IT, Infrastructure Group
System Administrator, School of CS
Hampshire College
Amherst, MA 01002
(413) 559-6091


