[Clusterusers] A Little More Cluster Time

Thomas Helmuth thelmuth at cs.umass.edu
Mon Jan 27 09:36:40 EST 2014


I'm done with my heavy use of the cluster. Thanks for the extra processing
time. Now back to your regularly scheduled programming!

-Tom


On Fri, Jan 24, 2014 at 11:24 PM, Wm. Josiah Erikson <
wjerikson at hampshire.edu> wrote:

>  Everything looks like it's working as planned - the nodes that Tom
> doesn't use are the only ones taking the render jobs... and they're really
> good at render jobs, so even though only 1 or 2 nodes are rendering at once
> for any given job, they're moving through quickly. I increased the BladeMax
> for Maya to 3, which seems to be working fine, too.
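
For anyone unfamiliar with the setting, a per-blade limit like BladeMax caps
how many tasks of a given kind a single node will accept at once. A minimal
toy sketch of the idea in Python (the names and numbers are made up; this is
not Tractor's actual limits configuration):

    # Toy sketch of a per-blade concurrency cap like "BladeMax" (made-up
    # names; not Tractor's real limits.config).
    from collections import Counter

    BLADE_MAX = {"maya": 3}   # accept at most 3 Maya tasks per node at once
    running = Counter()       # (node, limit_tag) -> active task count

    def can_start(node, limit_tag):
        cap = BLADE_MAX.get(limit_tag)
        return cap is None or running[(node, limit_tag)] < cap

    def start(node, limit_tag):
        if not can_start(node, limit_tag):
            return False
        running[(node, limit_tag)] += 1
        return True

    # The fourth Maya task offered to the same node is refused.
    print([start("compute-2-05", "maya") for _ in range(4)])
    # -> [True, True, True, False]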
>
> I added 4 more nodes today, and will add another 4 on Monday, so that
> should help as well.
>
> On that note, I was looking over the logs for the past year, and noticed
> this rather astonishing fact:
>
> Over the past year, we have more than doubled both the amount of RAM and
> the number of processors in the cluster. Total cost? Under $10K. Thanks,
> eBay (and whoever bought all those C6100s and then unloaded them so cheaply
> to all of those eBay sellers).
>
> On Monday, we will surpass 1TB of RAM in the cluster, and pass 700 CPU
> "cores" (Nehalem cores count double, since Hyper-Threading presents two
> logical cores per physical core - kinda cheating). Neat. Not bad for a
> $5K/year budget.
>
> :)
>
>     -Josiah
>
>
>
> On 1/24/14 10:50 PM, Chris Perry wrote:
>
>
>  For future reference:
> In the web interface, click on your job to select it. The pane below the
> job list will show various job features. If you click on the priority # you
> can just type in a new one!
>
>  - chris
>
>
>  On Jan 24, 2014, at 10:48 PM, Thomas Helmuth wrote:
>
>   Hi Chris,
>
>  That is perfect, thanks! I'm not sure how to change the priority of my
> current runs, but that doesn't seem to be an issue - they're off and
> running now that you lowered your priority. I don't expect to need to
> start any new jobs before the deadline, but if I do I'll make sure to spool
> them at priority 121. Thanks for being flexible!
>
>  -Tom
>
>
> On Fri, Jan 24, 2014 at 10:00 PM, Chris Perry <perry at hampshire.edu> wrote:
>
>>
>> It seems there's an easy solution, which is to just lower the priority of
>> our jobs (or raise the priority of yours). This should mean that whenever
>> a task finishes on one of our jobs, Tractor will give your higher-priority
>> tasks the attention they need first. And as long as your higher-priority
>> job is running, it will get the procs before the lower-priority jobs. Will
>> this work okay for the time being?
>>
>> I just lowered the priority for the tube and lilyd3 jobs that are
>> running. I also did 'retry running tasks' on lilyd3, which killed the
>> running renders and respooled them; given the lower priority, they will
>> automatically be sent only to the nodes that your jobs are not running on.
>> As Bassam's frames finish (they seem pretty fast, and the already-running
>> ones should be done within minutes), new ones will only spool behind your
>> jobs.
>>
>> Tom - moving forward during this crunch period, you should just go ahead
>> and spool at priority 121. This will cause your jobs to run at an even
>> higher priority than our single-frame renders, which means that your jobs
>> will always get priority on the procs you can run on, and ours will receive
>> whatever's left.
>>
>> Anyone worried about this approach?  Seems to be exactly what the
>> priority system is built for.
>>
>> - chris
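
A rough illustration of the dispatch behavior described above: when a proc
frees up, it goes to the highest-priority job that still has pending tasks
and is allowed on that node, and lower-priority jobs get whatever is left.
This is only a toy sketch in Python with made-up job names, tags, and counts;
it is not Tractor's actual scheduler or API.

    # Toy model of priority-based dispatch (illustrative only).
    class Job:
        def __init__(self, name, priority, pending_tasks, service_keys):
            self.name = name
            self.priority = priority          # higher number wins
            self.pending = pending_tasks      # tasks still waiting to run
            self.service_keys = service_keys  # node tags this job may use

    def dispatch(freed_node_tags, jobs):
        """Give each freed proc to the highest-priority job that can use it."""
        assignments = []
        for tag in freed_node_tags:
            candidates = [j for j in jobs
                          if j.pending > 0 and tag in j.service_keys]
            if not candidates:
                continue
            winner = max(candidates, key=lambda j: j.priority)
            winner.pending -= 1
            assignments.append((tag, winner.name))
        return assignments

    jobs = [
        Job("tom-gp-runs", 121, pending_tasks=50,  service_keys={"tom"}),
        Job("lilyd3",      100, pending_tasks=200, service_keys={"tom", "render"}),
    ]
    # Two procs free up on "tom"-tagged nodes and one on a render-only node:
    # the priority-121 job takes the first two; lilyd3 gets the leftover.
    print(dispatch(["tom", "tom", "render"], jobs))

The point is simply that nothing has to be paused: priority alone decides who
gets the next free proc.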
>>
>>
>> On Jan 24, 2014, at 9:29 PM, Thomas Helmuth wrote:
>>
>> > Hi fly cluster users,
>> >
>> > Lee and I have a paper deadline coming up on January 29th, and I am
>> > hoping to get in a few sets of runs before the deadline. If it isn't too
>> > much of an inconvenience, I was wondering if it would be possible to
>> > pause other task launches until all of my runs have started.
>> >
>> > If it helps, I never use certain nodes because I have had problems with
>> > crashed runs on them, so you'd still be able to use those. I'm not sure
>> > exactly which ones I don't use -- Josiah has set up the service tag "tom"
>> > for them. I'm sure I'm not using any asheclass nodes, and I believe a few
>> > nodes on racks 1 and 4 as well. Maybe Josiah could even set up a service
>> > tag that includes the nodes I don't use? I'm not sure if this would help
>> > with already-spooled jobs, but it could with future jobs.
>> >
>> > Thanks, and I'll let you know when all my runs have nodes!
>> > Tom
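
The service-tag idea Tom raises can be pictured like this: a tag is just a
label a blade advertises, and a job that requests a tag is only dispatched to
blades carrying it, so tagging the complement ("nodes Tom doesn't use") would
steer renders away from his machines. A toy Python sketch with hypothetical
node and tag names (not Tractor's actual blade configuration):

    # Illustrative only: made-up blade names and tags, not a real config.
    blades = {
        "compute-1-01": {"render", "tom"},
        "compute-1-02": {"render", "tom"},
        "compute-4-07": {"render"},        # a node Tom's runs never touch
        "asheclass-03": {"render"},        # ditto
    }

    def blades_for(requested_tag, blades):
        """Blades a job may run on if it requests this service tag."""
        return sorted(n for n, tags in blades.items() if requested_tag in tags)

    # A hypothetical "nottom" tag could be added to every blade lacking "tom";
    # render jobs requesting it would then stay off Tom's nodes.
    for tags in blades.values():
        if "tom" not in tags:
            tags.add("nottom")

    print(blades_for("tom", blades))     # nodes Tom's runs target
    print(blades_for("nottom", blades))  # nodes left free for renders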
>
>
> --
> -----
> Wm. Josiah Erikson
> Head, Systems and Networking
> Hampshire College
> Amherst, MA 01002
>
>
> _______________________________________________
> Clusterusers mailing list
> Clusterusers at lists.hampshire.edu
> https://lists.hampshire.edu/mailman/listinfo/clusterusers
>
>