[Clusterusers] breve segmentation faults on fly

Lee Spector lspector at hampshire.edu
Tue Oct 30 05:53:09 EDT 2007


Ah -- right. Yes, that gives me a pretty good idea, and by looking
directly at the individual nodes in ganglia I can actually see
exactly where the run starts and where the CPU activity stops. And...
nothing to see there, really. I've got one segfault in my current run,
which started around 14:00 yesterday, and I can see all of the nodes
ramping up, but then compute-1-9, without any other apparent surge of
activity, drops off around 20:00.
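
Since the seg fault messages themselves aren't time stamped, one option
for future runs might be to launch each job through a small wrapper that
records when it started and how it ended. The following is only a rough
sketch, not something in use on the cluster: the names describe_exit and
run_and_stamp and the log format are made up for illustration, and it
assumes the job can be started from Python on a POSIX system.

```python
import datetime
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative code.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_and_stamp(cmd, log=None):
    """Run cmd (a list of strings) and log timestamped start/end lines."""
    log = log if log is not None else sys.stderr
    start = datetime.datetime.now().isoformat(timespec="seconds")
    print(f"{start} start: {' '.join(cmd)}", file=log)
    proc = subprocess.run(cmd)
    end = datetime.datetime.now().isoformat(timespec="seconds")
    print(f"{end} {describe_exit(proc.returncode)}", file=log)
    return proc.returncode
```

Wrapping the breve_cli invocation this way (with whatever arguments the
run normally uses) would leave a "killed by SIGSEGV" line with a time
next to it, which could then be lined up against the ganglia graphs.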

  -Lee

On Oct 29, 2007, at 9:42 PM, Chris Perry wrote:

>
> Maybe your log/output files for those runs stop when they segfault?
> That might be a timestamp you can use.
>
> - chris
>
> On Oct 29, 2007, at 9:19 PM, Lee Spector wrote:
>
>>
>> Interesting idea but not trivial to tell, since the seg fault  
>> messages aren't time stamped and I'm not sure exactly when they  
>> happened.... but this might be worth looking into more carefully...
>>
>>  -Lee
>>
>> On Oct 29, 2007, at 9:06 PM, Chris Perry wrote:
>>
>>>
>>> I know we've been rendering fairly hard recently - do the graphs
>>> on ganglia show a correlation between your seg fault
>>> times/machines and a lack of RAM on those machines? I would expect
>>> a different error than seg fault in that case, but it's worth
>>> seeing if there's a possible connection.
>>>
>>> - chris
>>>
>>> On Oct 29, 2007, at 2:09 PM, Lee Spector wrote:
>>>
>>>>
>>>> Great -- I didn't see any core files, but where would I find  
>>>> them from a run like this?
>>>>
>>>>  -Lee
>>>>
>>>> On Oct 29, 2007, at 1:55 PM, jon klein wrote:
>>>>
>>>>>
>>>>> I can't really say why that build of breve is crashing for you,  
>>>>> but it is pretty old at this point and due to be updated soon.   
>>>>> The latest builds do have some potential memory savings, so it  
>>>>> could help out.  I'll make a new build soon and let you know  
>>>>> when it's installed.
>>>>>
>>>>> Did the crashes leave any core files behind?  That would be the  
>>>>> only way to get a hint of why they might have crashed.
>>>>>
>>>>> -- jon klein
>>>>>
>>>>>
>>>>> On Oct 29, 2007, at 12:41 PM, Lee Spector wrote:
>>>>>
>>>>>> On Oct 29, 2007, at 11:24 AM, Wm. Josiah Erikson wrote:
>>>>>>> Hum. Well, uh, unless you tell me something that seems like
>>>>>>> evidence to the contrary, I'll assume the ball is in some
>>>>>>> court other than mine for the time being?
>>>>>>
>>>>>> Josiah: I guess so. The OS memory limit seemed like such a  
>>>>>> good theory that I'm having a hard time letting go of it :-),  
>>>>>> but if the system is back to where it was previously then I  
>>>>>> don't see what could be causing this on your end.
>>>>>>
>>>>>> Jon: I'm using  /share/apps/breve/dev/bin/breve_cli, which  
>>>>>> looks to be a version that has been there since May:
>>>>>>
>>>>>> $ ls -l  /share/apps/breve/dev/bin/breve_cli
>>>>>> -rwxr-xr-x  1 1000 1000 305 May 26 07:22 /share/apps/breve/dev/bin/breve_cli
>>>>>>
>>>>>> Might that have something to do with this, and should it be  
>>>>>> updated?
>>>>>>
>>>>>> I can kill my current run at any point if I should do that for  
>>>>>> upgrading breve.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>  -Lee
>>>>>>
>>>>
>>>> --
>>>> Lee Spector, Professor of Computer Science
>>>> School of Cognitive Science, Hampshire College
>>>> 893 West Street, Amherst, MA 01002-3359
>>>> lspector at hampshire.edu, http://hampshire.edu/lspector/
>>>> Phone: 413-559-5352, Fax: 413-559-5438
>>>>
>>>> _______________________________________________
>>>> Clusterusers mailing list
>>>> Clusterusers at lists.hampshire.edu
>>>> http://lists.hampshire.edu/mailman/listinfo/clusterusers




