[Clusterusers] breve segmentation faults on fly

Lee Spector lspector at hampshire.edu
Mon Oct 29 21:19:53 EDT 2007


Interesting idea but not trivial to tell, since the seg fault  
messages aren't time stamped and I'm not sure exactly when they  
happened.... but this might be worth looking into more carefully...

  -Lee
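
One way to get time stamps for future runs would be to launch each job through a small wrapper that records when the command ended and with what exit status. The following is only a sketch: the function name `run_stamped` and the log path are hypothetical, not anything installed on the cluster. It relies on the usual shell convention that a process killed by signal N reports status 128+N, so a segmentation fault (SIGSEGV, signal 11) normally shows up as 139.

```shell
#!/bin/sh
# Hypothetical wrapper (not an existing cluster tool): run a command,
# then append one time-stamped line with host, exit status, and command.
# Usage: run_stamped LOGFILE COMMAND [ARGS...]
run_stamped() {
    log=$1
    shift
    "$@"
    status=$?
    # A status of 139 (128 + 11) usually means the command died from SIGSEGV.
    printf '%s host=%s status=%s cmd=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$(uname -n)" "$status" "$*" >> "$log"
    return "$status"
}
```

Each line in the log could then be matched by host and time against the memory graphs on ganglia.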

On Oct 29, 2007, at 9:06 PM, Chris Perry wrote:

>
> I know we've been rendering fairly hard recently - do the graphs on  
> ganglia show a correlation between your seg fault times/machines  
> and a lack of RAM on those machines? I would expect a different  
> error than seg fault in that case, but it's worth seeing if there's  
> a possible connection.
>
> - chris
>
> On Oct 29, 2007, at 2:09 PM, Lee Spector wrote:
>
>>
>> Great -- I didn't see any core files, but where would I find them  
>> from a run like this?
>>
>>  -Lee
>>
>> On Oct 29, 2007, at 1:55 PM, jon klein wrote:
>>
>>>
>>> I can't really say why that build of breve is crashing for you,  
>>> but it is pretty old at this point and due to be updated soon.   
>>> The latest builds do have some potential memory savings, so it  
>>> could help out.  I'll make a new build soon and let you know when  
>>> it's installed.
>>>
>>> Did the crashes leave any core files behind?  That would be the  
>>> only way to get a hint of why they might have crashed.
>>>
>>> -- jon klein
>>>
>>>
>>> On Oct 29, 2007, at 12:41 PM, Lee Spector wrote:
>>>
>>>> On Oct 29, 2007, at 11:24 AM, Wm. Josiah Erikson wrote:
>>>>> Hum. Well, uh, unless you tell me something that seems like  
>>>>> evidence to the contrary, I'll assume the ball is in some  
>>>>> court other than mine for the time being?
>>>>
>>>> Josiah: I guess so. The OS memory limit seemed like such a good  
>>>> theory that I'm having a hard time letting go of it :-), but if  
>>>> the system is back to where it was previously then I don't see  
>>>> what could be causing this on your end.
>>>>
>>>> Jon: I'm using  /share/apps/breve/dev/bin/breve_cli, which looks  
>>>> to be a version that has been there since May:
>>>>
>>>> $ ls -l  /share/apps/breve/dev/bin/breve_cli
>>>> -rwxr-xr-x  1 1000 1000 305 May 26 07:22 /share/apps/breve/dev/bin/breve_cli
>>>>
>>>> Might that have something to do with this, and should it be  
>>>> updated?
>>>>
>>>> I can kill my current run at any point if I should do that for  
>>>> upgrading breve.
>>>>
>>>> Thanks,
>>>>
>>>>  -Lee
>>>>
>>
>> --
>> Lee Spector, Professor of Computer Science
>> School of Cognitive Science, Hampshire College
>> 893 West Street, Amherst, MA 01002-3359
>> lspector at hampshire.edu, http://hampshire.edu/lspector/
>> Phone: 413-559-5352, Fax: 413-559-5438
>>
>> _______________________________________________
>> Clusterusers mailing list
>> Clusterusers at lists.hampshire.edu
>> http://lists.hampshire.edu/mailman/listinfo/clusterusers

--
Lee Spector, Professor of Computer Science
School of Cognitive Science, Hampshire College
893 West Street, Amherst, MA 01002-3359
lspector at hampshire.edu, http://hampshire.edu/lspector/
Phone: 413-559-5352, Fax: 413-559-5438



