[Clusterusers] Keg's NFS share down

Wm. Josiah Erikson wjerikson at hampshire.edu
Thu Jan 30 11:12:09 EST 2014


Update: after backing up all the data from keg that Chris, Bassam, and I 
determined to be important (the backup finished with no errors), I 
installed a newer kernel, reconfigured GRUB, and rebooted... and now 
it's checking its root filesystem, which is a good sign! It hasn't found 
any errors yet, but who knows how long it will take...
     -Josiah



On 1/29/14 1:51 PM, Wm. Josiah Erikson wrote:
> Bassam and I just went in and looked at this together, after I 
> discovered that the RAID 5 had resized itself spontaneously and then 
> resynced to approximately 2TB, when it's supposed to be ~9TB. However, 
> the FILESYSTEM still thinks it's the right size. So we told the RAID 5 
> to resize itself again, and it responded appropriately, is resyncing, 
> and when we mount the filesystem, we no longer get the input/output 
> errors we did before. Cross your fingers - if this works, it will be 
> one of the more harrowing near escapes I've ever had.
>
> We'll know tomorrow, as I'm leaving today before it will finish 
> resyncing.
>
>     -Josiah
>
>
> On 1/29/14 12:46 PM, Bassam Kurdali wrote:
>> On Wed, 2014-01-29 at 10:36 -0500, Wm. Josiah Erikson wrote:
>>> Well... the reshape finished, and I figured out why the NFS errors were
>>> happening, and then the online resize finished successfully as well, so
>>> I rebooted to re-seat the RAM, which all worked fine and we're back to
>>> 6GB, but then when it came back up, I had lots of input/output errors on
>>> the filesystem. It's currently booted from a rescue CD, and resyncing
>>> again (why? I don't know), and then I'll fsck the filesystem and see if
>>> I can rescue it. If not, we're looking at several days of downtime,
>>> rebuilding from backups, and losing any renders that people didn't have
>>> backed up.
>> Eek! I hope that doesn't happen either. We have our renders backed up to a
>> certain point, probably a month or two old by now.
>> What about svn and the stuff in git / SparkleShare? We don't have a backup
>> of those repos (now I feel stupid).
>>
>>> I hope that doesn't happen - I'll keep you updated.
>>>       -Josiah
>>>
>>>
>>> On 1/28/14 11:04 PM, Wm. Josiah Erikson wrote:
>>>> In the middle of the reshape, I got some really strange NFS errors
>>>> that were making me nervous, so in the interest of not corrupting any
>>>> data, I have taken keg's NFS offline. It will be back online tomorrow
>>>> morning when the reshape finishes. This will render a bunch of things
>>>> inoperable, but mainly nobody will be able to render.
>>>>
>

-- 
Wm. Josiah Erikson
Assistant Director of IT, Infrastructure Group
System Administrator, School of CS
Hampshire College
Amherst, MA 01002
(413) 559-6091

