[Clusterusers] How bad would power outages tomorrow be?

Wm. Josiah Erikson wjerikson at hampshire.edu
Mon May 1 14:26:46 EDT 2017


I honestly don't know, and I don't know whether anyone would listen if we
begged them not to.

    -Josiah



On 5/1/17 1:59 PM, Lee Spector wrote:
> Ah, okay. The first node I checked that I care about (compute-2-6) doesn't list ups, but the run on it should finish by this evening. I'll have to poke around a bit more, and I have to do something else for a while first. I'm guessing this has to be decided now...? Or is 4:00 okay?
>
>
>> On May 1, 2017, at 1:58 PM, Wm. Josiah Erikson <wjerikson at hampshire.edu> wrote:
>>
>> Well, it's a bit complicated, since it's NODES that are on UPS, of
>> course. If you click an individual node, there's a thing in the lower
>> pane that says "Service Keys". If that list includes "ups", that node is
>> on UPS. For some reason, the rack4 nodes aren't listing their service
>> keys. They're all on UPS.
>>
>>    -Josiah
>>
>>
>>
>> On 5/1/17 1:48 PM, Lee Spector wrote:
>>> Sure, but can you first remind me how to see which current runs are on UPS?
>>>
>>> Thanks,
>>>
>>> -Lee
>>>
>>>
>>>
>>>> On May 1, 2017, at 1:49 PM, Wm. Josiah Erikson <wjerikson at hampshire.edu> wrote:
>>>>
>>>>   Can I leave this on you? I'm going to do nothing unless you tell me
>>>> you'd like me to send it up the chain. Todd said we would have to bring
>>>> it to the President if we wanted to postpone it.
>>>>
>>>>   -Josiah
>>>>
>>>>
>>>>
>>>> On 5/1/17 10:28 AM, Lee Spector wrote:
>>>>> It might... but I don't remember how to tell which are which. Can you remind me?
>>>>>
>>>>> For the suites of runs, if any run in a suite goes down, that compromises the value of the whole suite's data, and we'd probably have to re-run the entire suite. (The reason, briefly: Suppose you have a setup that finds solutions quickly on half of its runs and, on the other half, runs for a long time without ever finding a solution, eventually just hitting the generation limit. A power outage might kill all and only the runs that would never have succeeded. If we restart those afresh later, then probably half of the restarted runs will succeed. So it'll look like a 75% success rate when the real success rate is 50%.)
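A minimal Python sketch of the arithmetic in the parenthetical above, under a deliberately simplified, hypothetical model: every run succeeds independently with the same true probability p, all successful runs have already finished when the power drops, every run still going at that moment is a would-be failure that gets killed, and killed runs are restarted from scratch. The function names observed_success_rate and simulate are illustrative only, not part of any cluster tooling.

```python
import random


def observed_success_rate(p: float) -> float:
    """Apparent success rate after an outage-and-restart, in closed form.

    Hypothetical model: runs succeed independently with true probability p,
    successes finish before the outage, and every run still going when the
    power drops (a would-be failure) is killed and restarted from scratch.
    """
    finished_successes = p            # runs that already succeeded
    killed = 1.0 - p                  # would-be failures cut off by the outage
    restarted_successes = killed * p  # restarted runs that now succeed
    return finished_successes + restarted_successes


def simulate(p: float, n_runs: int, seed: int = 0) -> float:
    """Monte Carlo check of the same hypothetical model."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_runs):
        if rng.random() < p:
            # Fast success: finished before the outage.
            successes += 1
        elif rng.random() < p:
            # Would-be failure, killed by the outage; the fresh restart succeeds.
            successes += 1
    return successes / n_runs


if __name__ == "__main__":
    print(observed_success_rate(0.5))  # 0.75
    print(simulate(0.5, 100_000))
```

With p = 0.5 the closed form gives 0.5 + 0.5 × 0.5 = 0.75, matching the 75%-versus-50% figure. In general the apparent rate is p(2 − p), which overstates the true rate for any 0 < p < 1, so the bias is not specific to the half-and-half case.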
>>>>>
>>>>>
>>>>>
>>>>>> On May 1, 2017, at 10:24 AM, Wm. Josiah Erikson <wjerikson at hampshire.edu> wrote:
>>>>>>
>>>>>> Remember it should only kill runs that aren't on "UPS" nodes. Does that
>>>>>> change anything, or were you already considering that?
>>>>>>
>>>>>>  -Josiah
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 5/1/17 10:18 AM, Lee Spector wrote:
>>>>>>> Well... it wouldn't be good. It's not quite as awful as an interruption over this last weekend or today would have been, since we have a bunch of publication deadlines tonight (actually 7:59am tomorrow) and some of those runs might produce results that lead to last-minute changes. But it will invalidate some experiments involving suites of runs that have already consumed several days of CPU time on many nodes, and kill a few runs that have been going for a couple of weeks whose outcomes I'm still curious about.
>>>>>>>
>>>>>>> Tom also has some experiments involving large suites of runs that may have to be re-started if this happens.
>>>>>>>
>>>>>>> That said, solar=good! I don't know how to make the call!
>>>>>>>
>>>>>>> -Lee
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On May 1, 2017, at 9:59 AM, Wm. Josiah Erikson <wjerikson at hampshire.edu> wrote:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> There are a lot of runs going on the cluster right now, so a heads-up:
>>>>>>>> I just got an announcement that there will be testing tomorrow morning
>>>>>>>> to get our solar arrays online. It may cause some power disruptions,
>>>>>>>> which could take some of our nodes offline. How bad would this be, and
>>>>>>>> should I get my boss to push back to the President on it?
>>>>>>>> Lee in particular...
>>>>>>>>
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> Wm. Josiah Erikson
>>>>>>>> Assistant Director of IT, Infrastructure Group
>>>>>>>> System Administrator, School of CS
>>>>>>>> Hampshire College
>>>>>>>> Amherst, MA 01002
>>>>>>>> (413) 559-6091
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Clusterusers mailing list
>>>>>>>> Clusterusers at lists.hampshire.edu
>>>>>>>> https://lists.hampshire.edu/mailman/listinfo/clusterusers
>>>>>>> --
>>>>>>> Lee Spector, Professor of Computer Science
>>>>>>> Director, Institute for Computational Intelligence
>>>>>>> Hampshire College, Amherst, Massachusetts, USA
>>>>>>> lspector at hampshire.edu, http://hampshire.edu/lspector/, 413-559-5352



