[Clusterusers] More keg rebuilding

Thomas Helmuth thelmuth at cs.umass.edu
Wed Jan 15 11:10:17 EST 2014


Everything looks good on my end so far! I'll let you know if I have any
troubles.

-Tom


On Wed, Jan 15, 2014 at 11:08 AM, Wm. Josiah Erikson <
wjerikson at hampshire.edu> wrote:

>  OK, first keg reboot done and was a success. It doesn't look like
> anything was interrupted or crashed - at least, not that I've noticed yet :)
>     -Josiah
>
>
> On 1/15/14 11:01 AM, Thomas Helmuth wrote:
>
>  Hi Josiah,
>
>  I'm now ready for the keg reboot whenever you are. It would be "nice" to
> have the last few runs finish, but it's definitely not necessary if the reboot
> kills them. Let me know when it's done, since I have some more runs I want
> to start.
>
>  Thanks,
> Tom
>
>
> On Tue, Jan 14, 2014 at 8:55 PM, Thomas Helmuth <thelmuth at cs.umass.edu> wrote:
>
>>  Hi Josiah,
>>
>> I'm hoping most of my runs should be done by tomorrow, either early or
>> late morning. Even if a few are still going, they aren't as critical as
>> ones that finished today. So, I think it would be a great time to reboot
>> keg if it works for you. I'll take a look at what's still running in the
>> morning and will be in touch.
>>
>>  -Tom
>>
>>
>> On Mon, Jan 13, 2014 at 3:28 PM, Wm. Josiah Erikson <
>> wjerikson at hampshire.edu> wrote:
>>
>>>  OK, let me know. What I'm doing is long-term maintenance, not
>>> time-sensitive research :)
>>>     -Josiah
>>>
>>>
>>> On 1/13/14 3:24 PM, Thomas Helmuth wrote:
>>>
>>>  Well, that would certainly work. Currently, I'd really like all of my
>>> green runs to finish, which I'm hoping will be within a day or two, but
>>> after that it might actually be a nice break point to do the reboot.
>>> We'll see if it works out well to reboot sometime tomorrow or Wednesday.
>>>
>>>  -Tom
>>>
>>>
>>> On Mon, Jan 13, 2014 at 3:19 PM, Wm. Josiah Erikson <
>>> wjerikson at hampshire.edu> wrote:
>>>
>>>>  You know, I could just wait on all of this until February, too, if
>>>> you're feeling the crunch. Would that be better?
>>>>
>>>>     -Josiah
>>>>
>>>>
>>>> On 1/13/14 9:27 AM, Thomas Helmuth wrote:
>>>>
>>>>  Sure. I doubt the runs I have currently will be done by the end of
>>>> the day. The most important are the Pagie runs; the others can be paused
>>>> until after the keg reboot. I'm guessing they'll be done sometime tomorrow,
>>>> but it's hard to say.
>>>>
>>>>  -Tom
>>>>
>>>>
>>>> On Mon, Jan 13, 2014 at 9:24 AM, Wm. Josiah Erikson <
>>>> wjerikson at hampshire.edu> wrote:
>>>>
>>>>>  Keg is the license server for the Pixar stuff, so if keg goes down
>>>>> for too long, the runs sometimes crash due to license checkout failures for
>>>>> tractor.
>>>>>
>>>>> We should coordinate about rebooting keg - maybe when your current
>>>>> runs finish, before you start new ones? Hopefully at the end of the day or
>>>>> something.
>>>>>
>>>>>     -Josiah
>>>>>
>>>>>
>>>>>
>>>>> On 1/13/14 9:17 AM, Thomas Helmuth wrote:
>>>>>
>>>>>  Hi Josiah,
>>>>>
>>>>>  I don't really know what keg is (maybe the HDDs for fly?), but I'll
>>>>> assume you have everything under control. I do have a paper deadline coming
>>>>> up at the end of January, and the runs currently going are very important
>>>>> for it, so I guess be extra careful that runs/data aren't lost.
>>>>>
>>>>>  -Tom
>>>>>
>>>>>
>>>>> On Mon, Jan 13, 2014 at 9:05 AM, Wm. Josiah Erikson <
>>>>> wjerikson at hampshire.edu> wrote:
>>>>>
>>>>>> Hi guys,
>>>>>>     So I'm going to try again with resizing keg's RAID array. I am
>>>>>> going to be a little more careful this time. The first thing I'm going to
>>>>>> do, today, is to remove the fourth drive from the RAID array, rebuild its
>>>>>> partition table with GPT, which can handle partitions larger than 2TB, and
>>>>>> then re-add it to the array and let it rebuild. Then I'm going to reboot
>>>>>> keg and make sure everything comes back up fine. I will wait to reboot,
>>>>>> though, until Tom's jobs are done, if that would be helpful. It looks like
>>>>>> they are relatively short-running jobs, Tom (I mean, for you... heh)?
>>>>>>     I'll ping you again before I reboot it, which "shouldn't" hose
>>>>>> your jobs, but things haven't gone as planned twice recently, so I'm less
>>>>>> confident about that :)
>>>>>>     Of course, now that I'm saying that, everything will go as
>>>>>> planned. It's like bringing an umbrella to ensure it doesn't rain, probably
>>>>>> :)
>>>>>>
>>>>>> --
>>>>>> Wm. Josiah Erikson
>>>>>> Assistant Director of IT, Infrastructure Group
>>>>>> System Administrator, School of CS
>>>>>> Hampshire College
>>>>>> Amherst, MA 01002
>>>>>> (413) 559-6091
>>>>>>
>>>>>> _______________________________________________
>>>>>> Clusterusers mailing list
>>>>>> Clusterusers at lists.hampshire.edu
>>>>>> https://lists.hampshire.edu/mailman/listinfo/clusterusers
>>>>>>
>>>>>
>>>>
>>>
>>
>