[Clusterusers] ...and another
Lee Spector
lspector at hampshire.edu
Thu Jul 28 13:03:06 EDT 2016
I like the "ups" tag as well. Our runs are long (days or weeks), and losing them to an outage can sometimes be even worse than losing the runs that actually die on their own, because even if we re-run the interrupted ones, the stats for the entire experiment are messed up.
(Short story: even if the outage has nothing to do with the run itself, longer-running runs are more likely to get hit by random outages, and so more likely to be restarted. If there is a correlation between longer running time and eventual failure -- which there probably is -- then restarting those runs will bias averages toward success.)
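To make the bias concrete, here is a minimal Monte Carlo sketch. All the numbers (base success rate, durations, outage probability) are made-up illustrative values, not anything measured on our cluster; the only assumptions are that failing runs take longer than succeeding ones and that a killed run is restarted with a fresh random outcome:

```python
import random

random.seed(42)

TRUE_SUCCESS_RATE = 0.5      # hypothetical base rate of a run succeeding
SUCCESS_DURATION = 5         # days; successful runs finish sooner (assumed)
FAILURE_DURATION = 30        # days; failing runs grind on longer (assumed)
DAILY_OUTAGE_PROB = 0.02     # chance per day that an unprotected node goes down

def completed_run():
    """Simulate one run to completion, restarting it (with a fresh
    random outcome) every time an outage kills it mid-run."""
    while True:
        success = random.random() < TRUE_SUCCESS_RATE
        duration = SUCCESS_DURATION if success else FAILURE_DURATION
        # Longer runs accumulate more days of exposure to random outages,
        # so failures are disproportionately interrupted and resampled.
        hit = any(random.random() < DAILY_OUTAGE_PROB for _ in range(duration))
        if not hit:
            return success

N = 100_000
observed = sum(completed_run() for _ in range(N)) / N
print(f"true success rate:     {TRUE_SUCCESS_RATE}")
print(f"observed success rate: {observed:.3f}")  # noticeably above the true rate
```

Because the long (failing) runs keep getting knocked out and replaced by fresh draws, the completed-run average drifts well above the true success rate, which is exactly why restarting interrupted runs skews the experiment's stats.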
So if we can target things only toward nodes that are protected then that would be helpful.
-Lee
> On Jul 28, 2016, at 10:49 AM, Bassam Kurdali <bassam at urchn.org> wrote:
>
> That makes sense! my renders are relatively short (duration), so I'd
> be fine with the occasional outage.
>
> On Thu, 2016-07-28 at 09:43 -0400, Wm. Josiah Erikson wrote:
>> ...or I could add another service tag, "ups", and runs that need to be
>> able to run for long periods of time could be submitted with that tag.
>> I think that's a better solution. I'll do that, and let you all know
>> when it's done.
>>
>> -Josiah
>>
>>
>>
>> On 7/28/16 9:42 AM, Wm. Josiah Erikson wrote:
>>>
>>> I hear from the power company that we can expect regular brownouts
>>> until this heat wave passes... should I just leave the cluster as-is
>>> in that case (the ones that are UPS protected) and just turn the rest
>>> off so that runs can actually finish?
>>>
>>>
>>
> _______________________________________________
> Clusterusers mailing list
> Clusterusers at lists.hampshire.edu
> https://lists.hampshire.edu/mailman/listinfo/clusterusers
--
Lee Spector, Professor of Computer Science
Director, Institute for Computational Intelligence
Hampshire College, Amherst, Massachusetts, USA
lspector at hampshire.edu, http://hampshire.edu/lspector/, 413-559-5352