[Clusterusers] fly's hardware problem

Bassam Kurdali bassam at urchn.org
Tue Dec 22 11:12:41 EST 2015


I'm fine to wait till Jan too. I was more thinking of pushing some
renders since other people might not be using the cluster over the
hols, not for a specific deadline.
cheers
B
On Tue, 2015-12-22 at 10:21 -0500, Thomas Helmuth wrote:
> Hi Josiah,
> 
> As far as I know, neither Lee nor I plan on using the cluster for GP runs
> over the break. On the other hand, we are working on some papers with
> deadlines in late January that have a bunch of data on fly that we'll
> need. Will we not even be able to access that data? That could be a
> problem.
> 
> Basically, I don't think we'll need tractor or access to the nodes,
> but it would be good to be able to access and gather data. If that's
> not possible, it's not a huge deal since we have some initial data,
> but we'd need access to it ASAP after break.
> 
> Thanks!
> Tom
> 
> On Tue, Dec 22, 2015 at 10:13 AM, Wm. Josiah Erikson wrote:
> > Hey folks,
> >     I'm trying to get an idea of how much people were planning on
> > using the cluster over the holidays, i.e. how much effort I should
> > put into this. Are there folks who were planning to do important
> > work over the break whose plans will be totally screwed up if it's
> > not back until after the new year?
> >     I will come in tomorrow and probably be able to get it back up
> > anyway, just want to know what kind of priority to give it in case
> > of mishap.
> >     Thanks and happy holidays!
> >     -Josiah
> > 
> > 
> > On 12/22/15, 8:47 AM, Wm. Josiah Erikson wrote:
> > > I got some time this morning and came in on a vacation day...
> > >
> > > After some mucking around with BIOS and CPUs and memory config,
> > > things deteriorating quickly...
> > >
> > > Well, the machine won't even POST now, so I'll have to move
> > > everything over to a new machine and get a new license file from
> > > Pixar. I'll try to get this done tomorrow morning, but this may
> > > mean that the cluster will be down until January if everything
> > > doesn't go smoothly.
> > >
> > > Apologies!
> > >     -Josiah
> > >
> > >
> > > On 12/21/15 9:28 PM, Wm. Josiah Erikson wrote:
> > >> ...or not. It appears to be happening again. I'm out tomorrow,
> > >> but I should be in briefly on Wednesday and will make another
> > >> attempt.
> > >>     -Josiah
> > >>
> > >>
> > >> On 12/21/15, 11:04 AM, Wm. Josiah Erikson wrote:
> > >>> One of the power supplies was probably bad (dim/flickering
> > >>> lights, and Google searches had other people experiencing this
> > >>> problem with a bad power supply). I replaced it with a spare I
> > >>> had sitting around (thanks, Mt. Holyoke!) and hopefully that will
> > >>> be the end of that. Of course, it could be something else too :)
> > >>> For now, fly's back up. Hopefully for good, but no guarantees.
> > >>>     -Josiah
> > >>>
> > >>>
> > >>> On 12/21/15 10:35 AM, Wm. Josiah Erikson wrote:
> > >>>> Well, it just kernel panicked. Neat. That's new. So it's down
> > >>>> now until I get it back up :) I'll keep you updated...
> > >>>>     -Josiah
> > >>>>
> > >>>>
> > >>>> On 12/21/15 10:08 AM, Wm. Josiah Erikson wrote:
> > >>>>> Hello all,
> > >>>>>     I rebooted fly again - it is not fixed yet, though it is
> > >>>>> up and running currently. I will probably take it down later
> > >>>>> today to try removing one of the CPUs to see if it is the
> > >>>>> problem. This should result in less than 10 minutes of
> > >>>>> downtime, and jobs should resume where they left off, so no
> > >>>>> need to hold off launching renders or whatever. My current
> > >>>>> theory is that CPU #2 is faulty. It could also be a bad power
> > >>>>> supply. I'll keep you updated.
> > >>>>>
> > 
> > --
> > -----
> > Wm. Josiah Erikson
> > Head, Systems and Networking
> > Hampshire College
> > Amherst, MA 01002
> > 
> > _______________________________________________
> > Clusterusers mailing list
> > Clusterusers at lists.hampshire.edu
> > https://lists.hampshire.edu/mailman/listinfo/clusterusers
> > 