[Clusterusers] Stress-testing the cluster tonight

Bassam Kurdali bassam at urchn.org
Sat May 31 12:06:02 EDT 2014


Is there something we can do to goose it while you're away, like rebooting (if it goes down again)?

 "Wm. Josiah Erikson" <wjerikson at hampshire.edu> wrote:

>This was that eth2 problem. I hope the kernel update I just did will 
>keep it from happening again... but the kernel parameter line I added 
>before was supposed to fix it even without the kernel update. Keep your 
>fingers crossed...
>     -Josiah
>
>On 5/30/14 5:06 PM, Bassam Kurdali wrote:
>> Hmm, maybe fly is having problems as a result? tractor just 'went away'.
>> I can still SSH into fly, though. Is there anything I can do to restart
>> tractor? Otherwise we are down for a week :(
>> On Thu, 2014-05-29 at 23:05 -0400, Wm. Josiah Erikson wrote:
>>> I am ferreting out any weak nodes by running dnetc on anything that
>>> isn't doing anything else. My goal is to get rid of/fix any nodes that
>>> are not reliable. Please let me know if you feel like this is causing
>>> any problems for you. I will stop this tomorrow morning - it's just
>>> overnight, as I'm going on vacation for a week starting Saturday. My
>>> tentative belief at this point is that all of the nodes are fully
>>> functional and reliable.... testing that belief :)
>>>
>>
>> _______________________________________________
>> Clusterusers mailing list
>> Clusterusers at lists.hampshire.edu
>> https://lists.hampshire.edu/mailman/listinfo/clusterusers
>
>-- 
>-----
>Wm. Josiah Erikson
>Head, Systems and Networking
>Hampshire College
>Amherst, MA 01002
>

