<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
-----BEGIN PGP SIGNED MESSAGE-----<br>
Hash: SHA512<br>
<br>
Quite so - thanks for pointing that out. Fixed!<br>
-Josiah<br>
<br>
<br>
On 12/23/15 10:02 AM, Jaime Davila wrote:<br>
<span style="white-space: pre;">> Thanks Josiah, much
appreciated.<br>
><br>
> BTW, it looks like the clock on fly is out of sync. I don't
think it's<br>
> an issue, I just mention it in case that affects tractor in
some way. My<br>
> jobs get listed initially as having been running for 4+
hours, and then<br>
> their run time gets corrected when they finish. They seem to
be running<br>
> fine, but I don't know if internally tractor looks at those
times for<br>
> anything like load balancing, etc. As far as I can tell my
processes are<br>
> not being affected by any of it.<br>
><br>
> Thanks again,<br>
><br>
> Jaime<br>
><br>
><br>
> On 12/23/2015 06:41 AM, Wm. Josiah Erikson wrote:<br>
>> Hi all,<br>
>> Thanks for all the feedback and kind words. As far as I
can tell,<br>
>> everything went well. I swapped the disks, controller,
memory, power<br>
>> supplies, and extra dual-interface network card into a
new machine, and<br>
>> it came up with a few tweaks. Luckily eth0 was on the
card I swapped<br>
>> over, so no need for a new license file!<br>
>> Hopefully the problem was with the motherboard or
CPUs on the old<br>
>> machine, otherwise I just swapped the problem over...
only time will tell.<br>
>> Enjoy, and tell me if you have any issues.<br>
>> All the best,<br>
>> -Josiah<br>
>><br>
>><br>
>> On 12/22/15 10:13 AM, Wm. Josiah Erikson wrote:<br>
>>> Hey folks,<br>
>>> I'm trying to get an idea of how much people were
planning on using<br>
>>> the cluster over the holidays, i.e. how much effort I
should put into<br>
>>> this. Are there folks who were planning to do
important work over the<br>
>>> break whose plans will be totally screwed up if it's
not back until<br>
>>> after the new year?<br>
>>> I will come in tomorrow and probably be able to
get it back up<br>
>>> anyway, just want to know what kind of priority to
give it in case of<br>
>>> mishap.<br>
>>> Thanks and happy holidays!<br>
>>> -Josiah<br>
>>><br>
>>><br>
>>> On 12/22/15, 8:47 AM, Wm. Josiah Erikson wrote:<br>
>>>> I got some time this morning and came in on a
vacation day...<br>
>>>><br>
>>>> After some mucking around with BIOS and CPUs and
memory config, things<br>
>>>> are deteriorating quickly...<br>
>>>><br>
>>>> Well, the machine won't even POST now, so I'll
have to move everything<br>
>>>> over to a new machine and get a new license file
from Pixar. I'll try to<br>
>>>> get this done tomorrow morning, but this may mean
that the cluster will<br>
>>>> be down until January if everything doesn't go
smoothly.<br>
>>>><br>
>>>> Apologies!<br>
>>>> -Josiah<br>
>>>><br>
>>>><br>
>>>> On 12/21/15 9:28 PM, Wm. Josiah Erikson wrote:<br>
>>>>> ...or not. It appears to be happening again.
I'm out tomorrow, but I<br>
>>>>> should be in briefly on Wednesday and will
make another attempt.<br>
>>>>> -Josiah<br>
>>>>><br>
>>>>><br>
>>>>> On 12/21/15, 11:04 AM, Wm. Josiah Erikson
wrote:<br>
>>>>>> One of the power supplies was probably
bad (dim/flickering lights, and<br>
>>>>>> Google searches had other people
experiencing this problem with a bad<br>
>>>>>> power supply). I replaced it with a spare
I had sitting around (thanks,<br>
>>>>>> Mt. Holyoke!) and hopefully that will be
the end of that. Of course, it<br>
>>>>>> could be something else too :)<br>
>>>>>> For now, fly's back up. Hopefully for
good, but no guarantees.<br>
>>>>>> -Josiah<br>
>>>>>><br>
>>>>>><br>
>>>>>> On 12/21/15 10:35 AM, Wm. Josiah Erikson
wrote:<br>
>>>>>>> Well, it just kernel panicked. Neat.
That's new. So it's down now until<br>
>>>>>>> I get it back up :) I'll keep you
updated...<br>
>>>>>>> -Josiah<br>
>>>>>>><br>
>>>>>>><br>
>>>>>>> On 12/21/15 10:08 AM, Wm. Josiah
Erikson wrote:<br>
>>>>>>>> Hello all,<br>
>>>>>>>> I rebooted fly again - it is
not fixed yet, though it is up and<br>
>>>>>>>> running currently. I will
probably take it down later today to try<br>
>>>>>>>> removing one of the CPUs to see
if it is the problem. This should<br>
>>>>>>>> result in less than 10 minutes of
downtime, and jobs should resume where<br>
>>>>>>>> they left off, so no need to hold
off launching renders or whatever. My<br>
>>>>>>>> current theory is that CPU #2 is
faulty. It could also be a bad power<br>
>>>>>>>> supply. I'll keep you updated.<br>
>>>>>>>><br>
>><br>
><br>
><br>
><br>
> _______________________________________________<br>
> Clusterusers mailing list<br>
> <a class="moz-txt-link-abbreviated" href="mailto:Clusterusers@lists.hampshire.edu">Clusterusers@lists.hampshire.edu</a><br>
> <a class="moz-txt-link-freetext" href="https://lists.hampshire.edu/mailman/listinfo/clusterusers">https://lists.hampshire.edu/mailman/listinfo/clusterusers</a></span><br>
<br>
- -- <br>
Wm. Josiah Erikson<br>
Assistant Director of IT, Infrastructure Group<br>
System Administrator, School of CS<br>
Hampshire College<br>
Amherst, MA 01002<br>
(413) 559-6091<br>
-----BEGIN PGP SIGNATURE-----<br>
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)<br>
Comment: GPGTools - <a class="moz-txt-link-freetext" href="https://gpgtools.org">https://gpgtools.org</a><br>
<br>
iQEcBAEBCgAGBQJWerlBAAoJEEhWIsC/LdC33qQH/1Cy/LSR0njniUVcaMLr3p+X<br>
4Nq2sX2nFICfxXnNimy8gbArJYs8BXtSS3KG1OPjncOMEvvRLLKihe0yIp+p0DjE<br>
2B7GCseDnm6pgNuRMp5K4oO6t5ITBlZj9zbwD90Mh17cnviVzQCbNxNMN7Y6Xy3H<br>
fHKOghYY/QLJi4xNHtcVHs7GTUGuaH4ZDNNhfdJFRsYT9cnqjO149DavPSjD+oiA<br>
ZjakVHynObRydd+Y21FmT4WFUXFX6XtN5h4d+KCthnFn/cAe0CBitxpoGa+Jlliv<br>
dDrUz1i45FFDzuG2BNA+O7a1i12YCHIkWSNvcRiTbT8VCJWbUV/D/JC5xVZsQkw=<br>
=TC8s<br>
-----END PGP SIGNATURE-----<br>
<br>
</body>
</html>