[Clusterusers] breve segmentation faults on fly

Lee Spector lspector at hampshire.edu
Mon Oct 29 11:08:00 EDT 2007


Geez -- now that I look more carefully I see that it died this way on  
almost all of the nodes in a run over the weekend. So I don't think it  
has to do with differences between subsets of the nodes. (I was going  
to send you the node lists, but in one run 36 out of 39 nodes had  
segmentation faults, so I won't bother.)
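
For anyone who wants to rebuild the node list from the logs, here is a
minimal sketch in Python. It assumes one log file per node, named
<node>.log, all in a single directory; that layout is hypothetical and
may not match how the runs on fly are actually logged. The function
names are illustrative too. The pattern only looks for the shell's
"Segmentation fault" report in the form quoted further down this thread.

```python
import re
from pathlib import Path

# Matches the shell's report of a child process killed by a segmentation
# fault, e.g. "... line 13: 20607 Segmentation fault      $DIRECTORY/breve_ex $*"
SEGFAULT_RE = re.compile(r"line \d+:\s+\d+\s+Segmentation\s+fault")


def segfaulted(log_text):
    """Return True if the log text contains a shell segfault report."""
    return SEGFAULT_RE.search(log_text) is not None


def tally(log_dir):
    """Split node names into (faulted, clean) lists, each sorted by name.

    Assumption: one log file per node, named <node>.log (hypothetical layout).
    """
    faulted, clean = [], []
    for path in sorted(Path(log_dir).glob("*.log")):
        text = path.read_text(errors="replace")
        (faulted if segfaulted(text) else clean).append(path.stem)
    return faulted, clean
```

Calling tally() on a run's log directory returns the nodes that faulted
and the nodes that did not, which is enough to check whether the
failures line up with any particular subset of nodes.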

  -Lee


On Oct 29, 2007, at 10:09 AM, Wm. Josiah Erikson wrote:

> I actually removed those memory limits because I was  
> troubleshooting something. The only node that still has a memory  
> limit is the head node, at 500MB.
>
> Which nodes is it dying on? Is it consistently some subset? There  
> are currently four subsets of nodes; the nodes within each subset  
> are identical to each other:
>
> - compute-0-1 through compute-0-13
> - compute-0-14 through compute-0-23
> - compute-1-6, compute-1-7, and compute-1-9
> - the rest of the compute-1-x
>
> It's probably not anything to do with hardware, but if it were, it  
> would probably die consistently on some subset of the nodes...
>
> Just throwing out ideas.
>
>    -Josiah
>
>
>
> Lee Spector wrote:
>>
>> In the last couple of days I've been having a lot of my breve runs  
>> on fly dying, after running correctly for a while, with the  
>> following in their log files:
>>
>> /share/apps/breve/dev/bin/breve: line 13: 20607 Segmentation fault      $DIRECTORY/breve_ex $*
>>
>> I don't think I've seen this previously. Could it be the new  
>> memory allocation limits? Or something in a recent breve build? I  
>> guess it's possible that it's ultimately due to a change in my  
>> code, but it's hard for me to see how -- I've changed little and  
>> what I've changed seems harmless.
>>
>> I'm not sure which file that "line 13" refers to -- it couldn't be my  
>> simulation source, since line 13 there is just a @define ...
>>
>> Hitting a hard memory limit -- perhaps what Josiah recently set up  
>> -- makes sense to me, though I'm not sure how to verify it after  
>> the fact...
>>
>> Any other ideas?
>>
>> Thanks,
>>
>>  -Lee
>>
>>
>>
>> -- 
>> Lee Spector, Professor of Computer Science
>> School of Cognitive Science, Hampshire College
>> 893 West Street, Amherst, MA 01002-3359
>> lspector at hampshire.edu, http://hampshire.edu/lspector/
>> Phone: 413-559-5352, Fax: 413-559-5438
>>
>> _______________________________________________
>> Clusterusers mailing list
>> Clusterusers at lists.hampshire.edu
>> http://lists.hampshire.edu/mailman/listinfo/clusterusers
>
> -- 
> Wm. Josiah Erikson
> Computing Support
> School of Cognitive Science
> Hampshire College
> Amherst, MA 01002
> (413) 559-6091
>

--
Lee Spector, Professor of Computer Science
School of Cognitive Science, Hampshire College
893 West Street, Amherst, MA 01002-3359
lspector at hampshire.edu, http://hampshire.edu/lspector/
Phone: 413-559-5352, Fax: 413-559-5438



