[Clusterusers] Keg's NFS share down

Wm. Josiah Erikson wjerikson at hampshire.edu
Thu Jan 30 14:59:40 EST 2014


     Keg is back up, its filesystem having passed all checks, and 
appears to be fully functional - all checks are green (literally: it's 
got a whole bunch of Nagios checks, and they're all green... hehe).
     Spool away!
     -Josiah
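
     For anyone who wants to catch the kind of size mismatch described
further down the thread (array at ~2TB, filesystem still believing it
is ~9TB), here is a minimal sketch of a Nagios-style check. It is an
illustration only, not one of the checks actually running on keg. The
function names and the command-line arguments are hypothetical, and it
assumes a Linux host where /sys/block/<name>/size reports the device
size in 512-byte sectors.

```python
import os
import sys

# Conventional Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def device_bytes(name):
    """Size of a block device via sysfs (the 'size' file counts 512-byte sectors)."""
    with open(f"/sys/block/{name}/size") as f:
        return int(f.read()) * 512


def filesystem_bytes(mountpoint):
    """Total size the mounted filesystem reports for itself."""
    st = os.statvfs(mountpoint)
    return st.f_blocks * st.f_frsize


def check(dev_bytes, fs_bytes):
    """Return (exit_code, message); CRITICAL if the filesystem claims more space than the device has."""
    if fs_bytes > dev_bytes:
        return CRITICAL, f"CRITICAL: filesystem ({fs_bytes} B) is larger than its device ({dev_bytes} B)"
    return OK, f"OK: filesystem ({fs_bytes} B) fits within its device ({dev_bytes} B)"


# Hypothetical usage: check_fs_fits.py <block-device-name> <mountpoint>
if __name__ == "__main__" and len(sys.argv) == 3:
    status, message = check(device_bytes(sys.argv[1]), filesystem_bytes(sys.argv[2]))
    print(message)
    sys.exit(status)
```

     The comparison is kept in its own function so it can be tried
with made-up numbers, without touching a real array.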



On 1/30/14 11:12 AM, Wm. Josiah Erikson wrote:
> Update: after backing up all the data that Chris, Bassam, and I 
> determined to be important from keg (which finished with no errors), I 
> installed a newer kernel, reconfigured GRUB, and rebooted... and now 
> it's checking its root filesystem, which is a good sign! It hasn't 
> found any errors yet, but who knows how long it will take...
>     -Josiah
>
>
>
> On 1/29/14 1:51 PM, Wm. Josiah Erikson wrote:
>> Bassam and I just went in and looked at this together, after I 
>> discovered that the RAID 5 had resized itself spontaneously and then 
>> resynced to approximately 2TB, when it's supposed to be ~9TB. 
>> However, the FILESYSTEM still thinks it's the right size. So we told 
>> the RAID 5 to resize itself again; it responded appropriately and is 
>> now resyncing, and when we mount the filesystem, we no longer get the 
>> input/output errors we did before. Cross your fingers - if this 
>> works, it will be one of the more harrowing narrow escapes I've ever had.
>>
>> We'll know tomorrow, as I'm leaving today before it will finish 
>> resyncing.
>>
>>     -Josiah
>>
>>
>> On 1/29/14 12:46 PM, Bassam Kurdali wrote:
>>> On Wed, 2014-01-29 at 10:36 -0500, Wm. Josiah Erikson wrote:
>>>> Well... the reshape finished, and I figured out why the NFS errors
>>>> were happening, and then the online resize finished successfully as
>>>> well, so I rebooted to re-seat the RAM, which all worked fine and
>>>> we're back to 6GB, but then when it came back up, I had lots of
>>>> input/output errors on the filesystem. It's currently booted from a
>>>> rescue CD, and resyncing again (why? I don't know), and then I'll
>>>> fsck the filesystem and see if I can rescue it. If not, we're
>>>> looking at several days of downtime, rebuilding from backups, and
>>>> losing any renders that people didn't have backed up.
>>> Eek! I hope that doesn't happen either. We have our renders backed
>>> up to a certain point, probably a month or two old by now.
>>> What about svn and the stuff in git / sparkleshare? We don't have a
>>> backup of those repos (now I feel stupid).
>>>
>>>> I hope that doesn't happen - I'll keep you updated.
>>>>       -Josiah
>>>>
>>>>
>>>> On 1/28/14 11:04 PM, Wm. Josiah Erikson wrote:
>>>>> In the middle of the reshape, I got some really strange NFS errors
>>>>> that were making me nervous, so in the interest of not corrupting any
>>>>> data, I have taken keg's NFS offline. It will be back online tomorrow
>>>>> morning when the reshape finishes. This will render a bunch of things
>>>>> inoperable, but mainly nobody will be able to render.
>>>>>
>>>
>>> _______________________________________________
>>> Clusterusers mailing list
>>> Clusterusers at lists.hampshire.edu
>>> https://lists.hampshire.edu/mailman/listinfo/clusterusers
>>
>

-- 
Wm. Josiah Erikson
Assistant Director of IT, Infrastructure Group
System Administrator, School of CS
Hampshire College
Amherst, MA 01002
(413) 559-6091


