[Clusterusers] Stress-testing the cluster tonight

Wm. Josiah Erikson wjerikson at hampshire.edu
Sat May 31 17:15:11 EDT 2014


Note: fly1.hampshire.edu doesn't ping from off-campus, ever :)
     -Josiah


On 5/31/14 5:14 PM, Wm. Josiah Erikson wrote:
> Though, you don't even have to do that, actually. If 
> fly1.hampshire.edu stops pinging but fly.hampshire.edu is still 
> pinging, then just log into fly, do a "sudo ifdown eth2" and then 
> "sudo ifup eth2" and that should fix it.
>     -Josiah
>
>
> On 5/31/14 5:09 PM, Wm. Josiah Erikson wrote:
>> Yeah, you can reboot it and then run
>> "sudo /etc/init.d/tractor-engine start" on the head node.
>>     -Josiah
>>
>>
>> On 5/31/14 12:06 PM, Bassam Kurdali wrote:
>>> Is there something we can do to goose it while you're away, like
>>> rebooting it if it goes down again?
>>>
>>>   "Wm. Josiah Erikson" <wjerikson at hampshire.edu> wrote:
>>>
>>>> This was that eth2 problem. I hope the kernel update I just did will
>>>> keep it from happening again... but the kernel parameter line I added
>>>> before was supposed to fix it even without the kernel update. Keep your
>>>> fingers crossed...
>>>>      -Josiah
>>>>
>>>> On 5/30/14 5:06 PM, Bassam Kurdali wrote:
>>>>> Hmm, maybe fly is having problems as a result? Tractor just 'went
>>>>> away'. I can still SSH into fly, though. Is there anything I can do to
>>>>> restart tractor? Otherwise we are down for a week :(
>>>>> On Thu, 2014-05-29 at 23:05 -0400, Wm. Josiah Erikson wrote:
>>>>>> I am ferreting out any weak nodes by running dnetc on anything that
>>>>>> isn't doing anything else. My goal is to get rid of or fix any nodes
>>>>>> that are not reliable. Please let me know if you feel this is causing
>>>>>> any problems for you. I will stop this tomorrow morning; it's just
>>>>>> overnight, as I'm going on vacation for a week starting Saturday. My
>>>>>> tentative belief at this point is that all of the nodes are fully
>>>>>> functional and reliable... testing that belief :)
>>>>>>
>>>>> _______________________________________________
>>>>> Clusterusers mailing list
>>>>> Clusterusers at lists.hampshire.edu
>>>>> https://lists.hampshire.edu/mailman/listinfo/clusterusers
>>>> -- 
>>>> -----
>>>> Wm. Josiah Erikson
>>>> Head, Systems and Networking
>>>> Hampshire College
>>>> Amherst, MA 01002
>>>>
>>
>

-- 
-----
Wm. Josiah Erikson
Head, Systems and Networking
Hampshire College
Amherst, MA 01002


