[Clusterusers] A Little More Cluster Time

Wm. Josiah Erikson wjerikson at hampshire.edu
Fri Jan 24 23:24:18 EST 2014


Everything looks like it's working as planned: the nodes that Tom 
doesn't use are the only ones taking the render jobs, and they're fast 
at rendering, so even though only one or two of them are running at 
once for any given job, the frames are moving through quickly. I also 
increased the BladeMax for Maya to 3, which seems to be working fine.
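
For anyone curious about the mechanics: as Chris explains below, Tractor 
essentially hands each free proc to the highest-priority ready task it is 
allowed to run. Here's a toy sketch of that idea in Python -- not Tractor's 
actual code, and the priorities on the render jobs are made up:

    import heapq

    # Toy model: each ready task carries a priority (higher wins).
    # Python's heapq is a min-heap, so priorities are stored negated.
    ready = []
    heapq.heappush(ready, (-121, "tom-run", "task-1"))      # Tom's run at priority 121
    heapq.heappush(ready, (-100, "lilyd3",  "frame-0101"))  # render job (priority made up)
    heapq.heappush(ready, (-100, "tube",    "frame-0042"))  # render job (priority made up)

    def dispatch(free_proc):
        """When a proc frees up, give it the highest-priority ready task."""
        if ready:
            neg_prio, job, task = heapq.heappop(ready)
            print(f"{free_proc}: assigned {job}/{task} at priority {-neg_prio}")

    # A proc finishing a render frame picks up the higher-priority work first.
    dispatch("node-1")
    dispatch("node-2")
    dispatch("node-3")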

I added 4 more nodes today, and will add another 4 on Monday, so that 
should help as well.

On that note, I was looking over the logs for the past year, and noticed 
this rather astonishing fact:

Over the past year, we have more than doubled both the amount of RAM and 
the number of processors in the cluster. Total cost? Under $10K. Thanks, 
eBay (and whoever bought all those C6100s and then unloaded them so 
cheaply to all of those eBay sellers).

On Monday, we will surpass 1 TB of RAM in the cluster and pass 700 CPU 
"cores" (the Nehalem nodes count as two logical cores per physical core 
thanks to Hyper-Threading, so that's cheating a bit). Neat. Not bad for 
a $5K/year budget.
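
If you want to sanity-check that kind of tally yourself, it's just 
threads-per-core x cores-per-socket x sockets, summed over the nodes. A 
rough sketch, with a made-up inventory standing in for the real node list:

    # Rough core/RAM tally.  The inventory below is illustrative only --
    # it is NOT the real node list.  Hyper-Threaded Nehalem nodes simply
    # count two logical cores (threads) per physical core.
    inventory = [
        # (node count, sockets, cores/socket, threads/core, GiB RAM/node)
        (20, 2, 4, 2, 48),   # hypothetical C6100 sleds, HT on
        (10, 2, 6, 1, 24),   # hypothetical older nodes, no HT
    ]

    logical_cores  = sum(n * s * c * t for n, s, c, t, _ in inventory)
    physical_cores = sum(n * s * c     for n, s, c, _, _ in inventory)
    ram_gib        = sum(n * r         for n, *_, r     in inventory)

    print(f"physical cores: {physical_cores}")
    print(f"logical cores:  {logical_cores}")
    print(f"RAM: {ram_gib} GiB (~{ram_gib / 1024:.2f} TiB)")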

:)

     -Josiah


On 1/24/14 10:50 PM, Chris Perry wrote:
>
> For future reference:
> In the web interface, click on your job to select it. The pane below 
> the job list will show various job features. If you click on the 
> priority # you can just type in a new one!
>
> - chris
>
>
> On Jan 24, 2014, at 10:48 PM, Thomas Helmuth wrote:
>
>> Hi Chris,
>>
>> That is perfect, thanks! I'm not sure how to change the priority of 
>> my current runs, but that doesn't seem to be an issue; they're off 
>> and running now that you lowered your priority. I don't expect to 
>> need to start any new jobs before the deadline, but if I do I'll 
>> make sure to spool them at priority 121. Thanks for being flexible!
>>
>> -Tom
>>
>>
>> On Fri, Jan 24, 2014 at 10:00 PM, Chris Perry <perry at hampshire.edu> wrote:
>>
>>
>>     It seems there's an easy solution: just lower the priority of our
>>     jobs (or raise the priority of yours). Whenever a task finishes on
>>     one of our jobs, Tractor will give your higher-priority tasks the
>>     procs first, and as long as your higher-priority job is running it
>>     will get the procs before the lower-priority jobs. Will this work
>>     for the time being?
>>
>>     I just lowered the priority of the tube and lilyd3 jobs that are
>>     running. I also did 'retry running tasks' on lilyd3, which killed
>>     the running renders and respooled them; given the lower priority,
>>     they will automatically go only to the nodes that your jobs are
>>     not running on. As Bassam's frames finish (they render pretty
>>     fast, and the already-running ones should be done within
>>     minutes), new ones will only queue behind your jobs.
>>
>>     Tom - moving forward during this crunch period, just go ahead and
>>     spool at priority 121. That will run your jobs at an even higher
>>     priority than our single-frame renders, which means your jobs will
>>     always get first claim on the procs you can run on, and ours will
>>     take whatever's left.
>>
>>     Anyone worried about this approach?  Seems to be exactly what the
>>     priority system is built for.
>>
>>     - chris
>>
>>
>>     On Jan 24, 2014, at 9:29 PM, Thomas Helmuth wrote:
>>
>>     > Hi fly cluster users,
>>     >
>>     > Lee and I have a paper deadline coming up on January 29th, and
>>     > I am hoping to get in a few sets of runs before then. If it
>>     > isn't too much of an inconvenience, would it be possible to
>>     > pause other task launches until all of my runs have started?
>>     >
>>     > If it helps, I never use certain nodes because I have had
>>     > problems with crashed runs on them, so you'd still be able to
>>     > use those. I'm not sure exactly which ones I don't use -- Josiah
>>     > has set up the service tag "tom" for them. I'm sure I'm not
>>     > using any asheclass nodes, and I believe I also avoid a few
>>     > nodes on racks 1 and 4. Maybe Josiah could even set up a service
>>     > tag that includes the nodes I don't use? I'm not sure if this
>>     > would help with already-spooled jobs, but it could with future
>>     > jobs.
>>     >
>>     > Thanks, and I'll let you know when all my runs have nodes!
>>     > Tom

-- 
-----
Wm. Josiah Erikson
Head, Systems and Networking
Hampshire College
Amherst, MA 01002
