[Clusterusers] fly's hardware problem

Jaime J Davila jjdCCS at hampshire.edu
Tue Dec 22 11:12:06 EST 2015


Hi Josiah,

Thank you so much for asking, and for the time you're dedicating to this. I actually have two paper deadlines coming up, one on January 15 and another one on February 3. I have some interesting results right now, but need to run more experiments to confirm and expand on stuff. While life happens, and I'll deal with whatever comes up, I could really use CPU time right now.

Thanks again,

Jaime

On December 22, 2015 10:13:50 AM EST, "Wm. Josiah Erikson" <wjerikson at hampshire.edu> wrote:
>Hey folks,
>    I'm trying to get an idea of how much people were planning on using
>the cluster over the holidays, i.e. how much effort I should put into
>this. Are there folks who were planning to do important work over the
>break whose plans will be totally screwed up if it's not back until
>after the new year?
>    I will come in tomorrow and probably be able to get it back up
>anyway, just want to know what kind of priority to give it in case of
>mishap.
>    Thanks and happy holidays!
>    -Josiah
>
>
>On 12/22/15, 8:47 AM, Wm. Josiah Erikson wrote:
>> I got some time this morning and came in on a vacation day...
>>
>> After some mucking around with BIOS and CPUs and memory config, things
>> deteriorated quickly...
>>
>> Well, the machine won't even POST now, so I'll have to move everything
>> over to a new machine and get a new license file from Pixar. I'll try to
>> get this done tomorrow morning, but this may mean that the cluster will
>> be down until January if everything doesn't go smoothly.
>>
>> Apologies!
>>     -Josiah
>>
>>
>> On 12/21/15 9:28 PM, Wm. Josiah Erikson wrote:
>>> ...or not. It appears to be happening again. I'm out tomorrow, but I
>>> should be in briefly on Wednesday and will make another attempt.
>>>     -Josiah
>>>
>>>
>>> On 12/21/15, 11:04 AM, Wm. Josiah Erikson wrote:
>>>> One of the power supplies was probably bad (dim/flickering lights, and
>>>> Google searches turned up other people experiencing this problem with a
>>>> bad power supply). I replaced it with a spare I had sitting around
>>>> (thanks, Mt. Holyoke!), and hopefully that will be the end of that. Of
>>>> course, it could be something else too :)
>>>> For now, fly's back up. Hopefully for good, but no guarantees.
>>>>     -Josiah
>>>>
>>>>
>>>> On 12/21/15 10:35 AM, Wm. Josiah Erikson wrote:
>>>>> Well, it just kernel panicked. Neat. That's new. So it's down now until
>>>>> I get it back up :) I'll keep you updated...
>>>>>     -Josiah
>>>>>
>>>>>
>>>>> On 12/21/15 10:08 AM, Wm. Josiah Erikson wrote:
>>>>>> Hello all,
>>>>>>     I rebooted fly again; it is not fixed yet, though it is up and
>>>>>> running currently. I will probably take it down later today to try
>>>>>> removing one of the CPUs to see if it is the problem. This should
>>>>>> result in less than 10 minutes of downtime, and jobs should resume
>>>>>> where they left off, so no need to hold off launching renders or
>>>>>> whatever. My current theory is that CPU #2 is faulty. It could also be
>>>>>> a bad power supply. I'll keep you updated.
>>>>>>
>
>-- 
>-----
>Wm. Josiah Erikson
>Head, Systems and Networking
>Hampshire College
>Amherst, MA 01002
>
>_______________________________________________
>Clusterusers mailing list
>Clusterusers at lists.hampshire.edu
>https://lists.hampshire.edu/mailman/listinfo/clusterusers

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.