[Clusterusers] fly's hardware problem

Wm. Josiah Erikson wjerikson at hampshire.edu
Tue Dec 22 08:47:02 EST 2015


I got some time this morning and came in on a vacation day...

After some mucking around with the BIOS, CPUs, and memory config, things
deteriorated quickly...

Well, the machine won't even POST now, so I'll have to move everything
over to a new machine and get a new license file from Pixar. I'll try to
get this done tomorrow morning, but the cluster may be down until
January if things don't go smoothly.

Apologies!
    -Josiah


On 12/21/15 9:28 PM, Wm. Josiah Erikson wrote:
> ...or not. It appears to be happening again. I'm out tomorrow, but I
> should be in briefly on Wednesday and will make another attempt.
>     -Josiah
>
>
> On 12/21/15, 11:04 AM, Wm. Josiah Erikson wrote:
>> One of the power supplies was probably bad (dim/flickering lights, and
>> Google searches turned up other people hitting this problem with a bad
>> power supply). I replaced it with a spare I had sitting around (thanks,
>> Mt. Holyoke!) and hopefully that will be the end of that. Of course, it
>> could be something else too :)
>> For now, fly's back up. Hopefully for good, but no guarantees.
>>     -Josiah
>>
>>
>> On 12/21/15 10:35 AM, Wm. Josiah Erikson wrote:
>>> Well, it just kernel panicked. Neat. That's new. So it's down now until
>>> I get it back up :) I'll keep you updated...
>>>     -Josiah
>>>
>>>
>>> On 12/21/15 10:08 AM, Wm. Josiah Erikson wrote:
>>>> Hello all,
>>>>     I rebooted fly again - it is not fixed yet, though it is up and
>>>> running currently. I will probably take it down later today to try
>>>> removing one of the CPUs to see if it is the problem. This should
>>>> result in less than 10 minutes of downtime, and jobs should resume where
>>>> they left off, so no need to hold off launching renders or whatever. My
>>>> current theory is that CPU #2 is faulty. It could also be a bad power
>>>> supply. I'll keep you updated.
>>>>

-- 
Wm. Josiah Erikson
Assistant Director of IT, Infrastructure Group
System Administrator, School of CS
Hampshire College
Amherst, MA 01002
(413) 559-6091


