[Clusterusers] breve segmentation faults on fly
Chris Perry
perry at hampshire.edu
Mon Oct 29 21:42:25 EDT 2007
Maybe your log/output files for those runs stop when they segfault?
That might be a timestamp you can use.
- chris
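
A rough sketch of that idea in Python: list each run's output file with its last-modified time, oldest first. The `runs/*.log` pattern is a hypothetical location, not a path from this thread. The sketch assumes the files were not touched after the run died and that each run wrote output fairly regularly, so the last write only approximates the crash time (the crash happened at or after it).

```python
import glob
import os
from datetime import datetime


def last_write_times(pattern):
    """Return (path, last-modified datetime) pairs, oldest first."""
    entries = []
    for path in glob.glob(pattern):
        mtime = os.path.getmtime(path)  # seconds since the epoch
        entries.append((path, datetime.fromtimestamp(mtime)))
    return sorted(entries, key=lambda entry: entry[1])


if __name__ == "__main__":
    # "runs/*.log" is a hypothetical pattern; substitute the real output files.
    for path, when in last_write_times("runs/*.log"):
        print(when.strftime("%Y-%m-%d %H:%M:%S"), path)
```

The resulting times could then be compared by eye against the ganglia graphs for the same machines.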
On Oct 29, 2007, at 9:19 PM, Lee Spector wrote:
>
> Interesting idea but not trivial to tell, since the seg fault
> messages aren't time stamped and I'm not sure exactly when they
> happened.... but this might be worth looking into more carefully...
>
> -Lee
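
For future runs, one way to get time-stamped seg fault messages is to start each job through a small wrapper. This is only a sketch: the wrapper and the way it is invoked are assumptions, not how the runs in this thread were launched. It relies on the POSIX behavior that Python's `subprocess` reports a child killed by signal N as return code -N.

```python
import signal
import subprocess
import sys
from datetime import datetime


def run_and_report(cmd):
    """Run cmd; return (return code, message), where message is None unless it segfaulted."""
    result = subprocess.run(cmd)
    # On POSIX, a return code of -N means the child was killed by signal N.
    if result.returncode == -signal.SIGSEGV:
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        return result.returncode, f"{stamp} segfault: {' '.join(cmd)}"
    return result.returncode, None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python wrapper.py <command> [args...]
    status, message = run_and_report(sys.argv[1:])
    if message:
        print(message, file=sys.stderr)
    sys.exit(1 if status < 0 else status)
```

The time recorded is when the wrapper noticed the child had exited, which should be within moments of the crash itself.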
>
> On Oct 29, 2007, at 9:06 PM, Chris Perry wrote:
>
>>
>> I know we've been rendering fairly hard recently - do the graphs
>> on ganglia show a correlation between your seg fault times/
>> machines and a lack of RAM on those machines? I would expect a
>> different error than seg fault in that case, but it's worth seeing
>> if there's a possible connection.
>>
>> - chris
>>
>> On Oct 29, 2007, at 2:09 PM, Lee Spector wrote:
>>
>>>
>>> Great -- I didn't see any core files, but where would I find them
>>> from a run like this?
>>>
>>> -Lee
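
On Linux, a crashing process only writes a core file if its core-file size limit is non-zero, and by default the file lands in the process's working directory (the name and location can be changed through the kernel's `core_pattern` setting). How the cluster nodes are configured is not stated in this thread. A minimal check of the limit, run in the same environment as the job, might look like this:

```python
import resource


def core_dumps_enabled():
    """True if the soft core-file size limit allows a core file to be written."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    # A soft limit of 0 suppresses core files; RLIM_INFINITY means no size cap.
    return soft == resource.RLIM_INFINITY or soft > 0


if __name__ == "__main__":
    print("core files enabled:", core_dumps_enabled())
```

If this reports False, the absence of core files says nothing about whether the crashes would otherwise have produced them.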
>>>
>>> On Oct 29, 2007, at 1:55 PM, jon klein wrote:
>>>
>>>>
>>>> I can't really say why that build of breve is crashing for you,
>>>> but it is pretty old at this point and due to be updated soon.
>>>> The latest builds do have some potential memory savings, so it
>>>> could help out. I'll make a new build soon and let you know
>>>> when it's installed.
>>>>
>>>> Did the crashes leave any core files behind? That would be the
>>>> only way to get a hint of why they might have crashed.
>>>>
>>>> -- jon klein
>>>>
>>>>
>>>> On Oct 29, 2007, at 12:41 PM, Lee Spector wrote:
>>>>
>>>>> On Oct 29, 2007, at 11:24 AM, Wm. Josiah Erikson wrote:
>>>>>> Hum. Well, uh, unless you tell me something that seems like
>>>>>> evidence to the contrary, I'll assume the ball is in some
>>>>>> court other than mine for the time being?
>>>>>
>>>>> Josiah: I guess so. The OS memory limit seemed like such a good
>>>>> theory that I'm having a hard time letting go of it :-), but if
>>>>> the system is back to where it was previously then I don't see
>>>>> what could be causing this on your end.
>>>>>
>>>>> Jon: I'm using /share/apps/breve/dev/bin/breve_cli, which
>>>>> looks to be a version that has been there since May:
>>>>>
>>>>> $ ls -l /share/apps/breve/dev/bin/breve_cli
>>>>> -rwxr-xr-x 1 1000 1000 305 May 26 07:22 /share/apps/breve/dev/bin/breve_cli
>>>>>
>>>>> Might that have something to do with this, and should it be
>>>>> updated?
>>>>>
>>>>> I can kill my current run at any point if I should do that for
>>>>> upgrading breve.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -Lee
>>>>>
>>>
>>> --
>>> Lee Spector, Professor of Computer Science
>>> School of Cognitive Science, Hampshire College
>>> 893 West Street, Amherst, MA 01002-3359
>>> lspector at hampshire.edu, http://hampshire.edu/lspector/
>>> Phone: 413-559-5352, Fax: 413-559-5438
>>>
>>> _______________________________________________
>>> Clusterusers mailing list
>>> Clusterusers at lists.hampshire.edu
>>> http://lists.hampshire.edu/mailman/listinfo/clusterusers