[Clusterusers] breve segmentation faults on fly

Chris Perry perry at hampshire.edu
Mon Oct 29 21:42:25 EDT 2007


Maybe your log/output files for those runs stop being written when they
segfault? If so, the last-modified time of each file might be a
timestamp you can use.
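
A rough sketch of that idea, in Python: list each run's output file with
its last-modified time, oldest first, so the times can be compared by eye
against the ganglia graphs. The "*.log" pattern, the directory argument,
and the function names are assumptions made for illustration, not
anything from the actual runs. The last write time only approximates the
crash time if a run keeps writing output until it dies.

```python
import sys
import time
from pathlib import Path


def last_write_times(log_dir, pattern="*.log"):
    """Return (path, mtime) pairs for matching files, oldest first."""
    entries = [
        (p, p.stat().st_mtime)
        for p in Path(log_dir).glob(pattern)
        if p.is_file()
    ]
    return sorted(entries, key=lambda entry: entry[1])


def format_report(entries):
    """Render each entry as 'YYYY-MM-DD HH:MM:SS  filename' in local time."""
    lines = []
    for path, mtime in entries:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        lines.append(f"{stamp}  {path.name}")
    return lines


if __name__ == "__main__":
    # Hypothetical usage: pass the directory holding the run output.
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for line in format_report(last_write_times(log_dir)):
        print(line)
```

A file whose last write falls well before the others finished would be a
candidate for a run that died early.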

- chris

On Oct 29, 2007, at 9:19 PM, Lee Spector wrote:

>
> Interesting idea but not trivial to tell, since the seg fault  
> messages aren't time stamped and I'm not sure exactly when they  
> happened.... but this might be worth looking into more carefully...
>
>  -Lee
>
> On Oct 29, 2007, at 9:06 PM, Chris Perry wrote:
>
>>
>> I know we've been rendering fairly hard recently - do the graphs  
>> on ganglia show a correlation between your seg fault times/ 
>> machines and a lack of RAM on those machines? I would expect a  
>> different error than seg fault in that case, but it's worth seeing  
>> if there's a possible connection.
>>
>> - chris
>>
>> On Oct 29, 2007, at 2:09 PM, Lee Spector wrote:
>>
>>>
>>> Great -- I didn't see any core files, but where would I find them  
>>> from a run like this?
>>>
>>>  -Lee
>>>
>>> On Oct 29, 2007, at 1:55 PM, jon klein wrote:
>>>
>>>>
>>>> I can't really say why that build of breve is crashing for you,  
>>>> but it is pretty old at this point and due to be updated soon.   
>>>> The latest builds do have some potential memory savings, so it  
>>>> could help out.  I'll make a new build soon and let you know  
>>>> when it's installed.
>>>>
>>>> Did the crashes leave any core files behind?  That would be the  
>>>> only way to get a hint of why they might have crashed.
>>>>
>>>> -- jon klein
>>>>
>>>>
>>>> On Oct 29, 2007, at 12:41 PM, Lee Spector wrote:
>>>>
>>>>> On Oct 29, 2007, at 11:24 AM, Wm. Josiah Erikson wrote:
>>>>>> Hum. Well, uh, unless you tell me something that seems like  
>>>>>> evidence to the contrary, I'll assume the ball is in some  
>>>>>> court other than mine for the time being?
>>>>>
>>>>> Josiah: I guess so. The OS memory limit seemed like such a good  
>>>>> theory that I'm having a hard time letting go of it :-), but if  
>>>>> the system is back to where it was previously then I don't see  
>>>>> what could be causing this on your end.
>>>>>
>>>>> Jon: I'm using  /share/apps/breve/dev/bin/breve_cli, which  
>>>>> looks to be a version that has been there since May:
>>>>>
>>>>> $ ls -l  /share/apps/breve/dev/bin/breve_cli
>>>>> -rwxr-xr-x  1 1000 1000 305 May 26 07:22 /share/apps/breve/dev/bin/breve_cli
>>>>>
>>>>> Might that have something to do with this, and should it be  
>>>>> updated?
>>>>>
>>>>> I can kill my current run at any point if I should do that for  
>>>>> upgrading breve.
>>>>>
>>>>> Thanks,
>>>>>
>>>>>  -Lee
>>>>>
>>>
>>> --
>>> Lee Spector, Professor of Computer Science
>>> School of Cognitive Science, Hampshire College
>>> 893 West Street, Amherst, MA 01002-3359
>>> lspector at hampshire.edu, http://hampshire.edu/lspector/
>>> Phone: 413-559-5352, Fax: 413-559-5438
>>>
>>> _______________________________________________
>>> Clusterusers mailing list
>>> Clusterusers at lists.hampshire.edu
>>> http://lists.hampshire.edu/mailman/listinfo/clusterusers



