<html><head><meta http-equiv="Content-Type" content="text/html; charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Yes -- all of this sounds great -- thanks!!</div><div class=""><br class=""></div><div class=""> -Lee</div><div class=""><br class=""></div><br class=""><div><blockquote type="cite" class=""><div class="">On Jul 19, 2017, at 11:13 AM, Thomas Helmuth <<a href="mailto:trhtom@gmail.com" class="">trhtom@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="auto" class="">Sounds great, thanks!</div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Jul 19, 2017 10:55 AM, "Wm. Josiah Erikson" <<a href="mailto:wjerikson@hampshire.edu" class="">wjerikson@hampshire.edu</a>> wrote:<br type="attribution" class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF" class=""><p class="">I just saw one of Tom's jobs do the same thing (run rack0 nodes out of RAM - not sure if any of the jobs actually crashed yet, but still...), so I did two things:</p><p class="">1. Set the per-blade "sh" limit to 8 (previously unlimited)<br class="">
</p><p class="">2. Set the number of slots on Rack 0 nodes to 8 (previously 16)<br class="">
</p><p class=""> -Josiah</p><p class=""><br class="">
</p>
<br class="">
<div class="m_7154915287393964308moz-cite-prefix">On 7/19/17 9:40 AM, Wm. Josiah Erikson wrote:<br class="">
</div>
<blockquote type="cite" class=""><p class="">Also, some additional notes:</p><p class="">1. I've set the "bash" limit tag to 8 per node. Tom seems to use sh instead of bash, so this won't affect him, I think.<br class="">
</p><p class="">2. If you want to do 4 per node instead, just add the limit tag "4pernode" to a RemoteCmd in the .alf file, like this:</p><p class=""> RemoteCmd {bash -c "env blah blah"} -tags "4pernode" -service "tom"</p><p class="">3. There's also a "2pernode" tag.</p><p class=""> -Josiah</p><p class=""><br class="">
</p>
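<p class="">A RemoteCmd line carrying a limit tag, as described above, could be generated with a small helper like the one below. This is only a sketch: it assumes the .alf file accepts Tcl-style braces around each argument (so that double quotes inside the command need no escaping), and the function name <code>remote_cmd</code> is hypothetical.</p>
<pre class="">
```python
def remote_cmd(command, tags, service):
    """Format one RemoteCmd line for an .alf file.

    Each argument is wrapped in braces (an assumption about the .alf
    quoting rules) so double quotes inside the command stay literal.
    """
    return "RemoteCmd {%s} -tags {%s} -service {%s}" % (
        command,
        " ".join(tags),
        service,
    )


# Same command, limit tag, and service tag as in the message above.
print(remote_cmd('bash -c "env blah blah"', ["4pernode"], "tom"))
```
</pre>
<p class="">Swapping "4pernode" for "2pernode" in the tag list would select the stricter limit.</p>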
<br class="">
<div class="m_7154915287393964308moz-cite-prefix">On 7/19/17 9:30 AM, Wm. Josiah Erikson wrote:<br class="">
</div>
<blockquote type="cite" class=""><p class="">Yeah, it ran out of RAM - it probably tried to take 16 jobs simultaneously. It has 64GB of RAM, so it should be able to handle quite a few. It's currently running at under 1/3 RAM usage with 3 jobs, so 8 jobs should be safe. I'll limit jobs with the "bash" limit tag (which
yours default to, since you didn't set one explicitly and bash is the command you're running) to 8 per node. I could lower it to 4 if you prefer. Let me know.<br class="">
</p><p class=""> -Josiah</p><p class=""><br class="">
</p>
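<p class="">The estimate above can be checked with a few lines of arithmetic. The sketch below assumes memory use scales linearly with the number of jobs, which is a simplification (JVM heaps can grow over the course of a run), and it treats "under 1/3" as exactly 1/3 so the result is an upper bound. The function names are illustrative only.</p>
<pre class="">
```python
def projected_usage_gb(jobs, observed_jobs, observed_used_gb):
    """Scale an observed memory footprint linearly to a different job count."""
    return jobs * observed_used_gb / observed_jobs


def fits(jobs, total_gb, observed_jobs, observed_used_gb):
    """True if the projected usage stays within the node's physical RAM."""
    return total_gb >= projected_usage_gb(jobs, observed_jobs, observed_used_gb)


TOTAL_GB = 64                    # RAM per node, from the message above
OBSERVED_JOBS = 3                # jobs running when usage was checked
OBSERVED_USED_GB = TOTAL_GB / 3  # "under 1/3", so this is an upper bound

for jobs in (4, 8, 16):
    used = projected_usage_gb(jobs, OBSERVED_JOBS, OBSERVED_USED_GB)
    ok = fits(jobs, TOTAL_GB, OBSERVED_JOBS, OBSERVED_USED_GB)
    print(jobs, "jobs:", round(used, 1), "GB, fits:", ok)
```
</pre>
<p class="">Under these assumptions 8 jobs project to about 56.9 GB of the 64 GB available, while 16 jobs project to about 113.8 GB, which is consistent with the node running out of RAM at 16.</p>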
<br class="">
<div class="m_7154915287393964308moz-cite-prefix">On 7/19/17 8:38 AM, Thomas Helmuth wrote:<br class="">
</div>
<blockquote type="cite" class="">
<div dir="auto" class="">Moving today, so I'll be brief.
<div dir="auto" class=""><br class="">
</div>
<div dir="auto" class="">This sounds like the memory problems we've had on rack 4. There, there are just too many cores for the amount of memory, so if too many runs start, none get enough memory. That's why rack 4 isn't on the Tom or big go service tags. Probably means
we should remove rack 0 from those tags for now until we figure it out.</div>
<div dir="auto" class=""><br class="">
</div>
<div dir="auto" class="">I've done runs on rack 4 by limiting each run to 1GB of memory.</div>
<div dir="auto" class=""><br class="">
</div>
<div dir="auto" class="">Tom</div>
</div>
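<p class="">One way to limit each run to 1GB, as mentioned above, is the JVM's <code>-Xmx</code> maximum-heap option. The sketch below only builds the command line; the jar name is a placeholder, and how the option actually reaches the JVM in these runs (for example through the launch script) is not shown in this thread. Note that <code>-Xmx</code> caps the Java heap only, so the whole process can use somewhat more than the cap.</p>
<pre class="">
```python
def java_command(jar_path, max_heap_mb=1024):
    """Build a java invocation whose heap is capped with -Xmx."""
    return ["java", "-Xmx%dm" % max_heap_mb, "-jar", jar_path]


def worst_case_heap_gb(concurrent_runs, max_heap_mb=1024):
    """Total heap, in GB, if every concurrent run reaches its cap."""
    return concurrent_runs * max_heap_mb / 1024


# "my-run.jar" is a hypothetical placeholder, not a file from this thread.
print(" ".join(java_command("my-run.jar")))

# With a 1GB cap, 16 simultaneous runs need at most 16 GB of heap.
print(worst_case_heap_gb(16))
```
</pre>
<p class="">With a per-run cap, the worst case for a node is simply the cap times the number of slots, which makes it easier to choose a slot count that cannot exhaust RAM.</p>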
<div class="gmail_extra"><br class="">
<div class="gmail_quote">On Jul 19, 2017 7:46 AM, "Lee Spector" <<a href="mailto:lspector@hampshire.edu" target="_blank" class="">lspector@hampshire.edu</a>> wrote:<br type="attribution" class="">
<blockquote class="m_7154915287393964308quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br class="">
I launched some runs this morning (a couple of hours ago -- I'm in Europe) that behaved oddly. I don't know if this is related to recent upgrades/reconfiguration, or what, but I thought I'd share them here.<br class="">
<br class="">
First, there were weird delays between launches and the generation of the initial output. I'm accustomed to delays of a couple of minutes at this point, which I think we've determined (or just guessed) are due to the necessity of re-downloading dependencies
on fly nodes, which doesn't occur when running on other machines. But these took much longer, maybe 30-45 minutes or longer (not sure of the exact times, but they were on that order). And weirder still, there were long stretches during which many runs had launched
but only one was producing output and making progress, with the others really kicking in only after the early one(s) finished. I'd understand this if the non-producing ones weren't launched at all, because they were waiting for free slots.... but that's
not what was happening.<br class="">
<br class="">
And then one crashed, on compute-0-1 (this was launched with the "tom" service tag, using the run-fly script in Clojush), with nothing in the .err file (from standard error) but the following in the .out file (from standard output):<br class="">
<br class="">
Producing offspring...<br class="">
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000006f0e00000, 1059061760, 0) failed; error='Cannot allocate memory' (errno=12)<br class="">
#<br class="">
# There is insufficient memory for the Java Runtime Environment to continue.<br class="">
# Native memory allocation (malloc) failed to allocate 1059061760 bytes for committing reserved memory.<br class="">
# An error report file with more information is saved as:<br class="">
# /home/lspector/runs/july19a-9235/Clojush/hs_err_pid57996.log<br class="">
<br class="">
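<p class="">To see how large the failed allocation was, the warning line above can be parsed and the byte count converted to MiB. This is a minimal sketch: the regular expression is written against the one line quoted here and may not match other JVM versions' wording, and the function name is illustrative.</p>
<pre class="">
```python
import re

# Pattern written against the single warning line quoted in this message.
COMMIT_RE = re.compile(
    r"os::commit_memory\((0x[0-9a-fA-F]+), (\d+), \d+\) failed; "
    r"error='([^']*)' \(errno=(\d+)\)"
)


def parse_commit_failure(line):
    """Return the details of a failed os::commit_memory call, or None."""
    match = COMMIT_RE.search(line)
    if match is None:
        return None
    size = int(match.group(2))
    return {
        "address": match.group(1),
        "bytes": size,
        "mib": size / 2**20,
        "error": match.group(3),
        "errno": int(match.group(4)),
    }


SAMPLE = (
    "Java HotSpot(TM) 64-Bit Server VM warning: INFO: "
    "os::commit_memory(0x00000006f0e00000, 1059061760, 0) failed; "
    "error='Cannot allocate memory' (errno=12)"
)

info = parse_commit_failure(SAMPLE)
print(info["mib"], "MiB,", info["error"])
```
</pre>
<p class="">For the line above this gives 1010 MiB, so the JVM was refused roughly 1GB of additional memory by the operating system.</p>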
-Lee<br class="">
<br class="">
--<br class="">
Lee Spector, Professor of Computer Science<br class="">
Director, Institute for Computational Intelligence<br class="">
Hampshire College, Amherst, Massachusetts, 01002, USA<br class="">
<a href="mailto:lspector@hampshire.edu" target="_blank" class="">lspector@hampshire.edu</a>,
<a href="http://hampshire.edu/lspector/" rel="noreferrer" target="_blank" class="">
http://hampshire.edu/lspector/</a><wbr class="">, <a href="tel:413-559-5352" value="+14135595352" target="_blank" class="">
413-559-5352</a><br class="">
<br class="">
______________________________<wbr class="">_________________<br class="">
Clusterusers mailing list<br class="">
<a href="mailto:Clusterusers@lists.hampshire.edu" target="_blank" class="">Clusterusers@lists.hampshire.e<wbr class="">du</a><br class="">
<a href="https://lists.hampshire.edu/mailman/listinfo/clusterusers" rel="noreferrer" target="_blank" class="">https://lists.hampshire.edu/ma<wbr class="">ilman/listinfo/clusterusers</a><br class="">
</blockquote>
</div>
<br class="">
</div>
<br class="">
</blockquote>
<br class="">
<pre class="m_7154915287393964308moz-signature" cols="72">--
Wm. Josiah Erikson
Assistant Director of IT, Infrastructure Group
System Administrator, School of CS
Hampshire College
Amherst, MA 01002
<a href="tel:(413)%20559-6091" value="+14135596091" target="_blank" class="">(413) 559-6091</a>
</pre>
<br class="">
</blockquote>
<br class="">
</blockquote>
<br class="">
</div>
</blockquote></div></div>
</div></blockquote></div><br class=""><div class="">
<div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant-ligatures: normal; font-variant-position: normal; font-variant-caps: normal; font-variant-numeric: normal; font-variant-alternates: normal; font-variant-east-asian: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-stroke-width: 0px;"><span class="Apple-style-span" style="font-size: 12px;"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" 
class=""><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-position: normal; font-variant-caps: normal; font-variant-numeric: normal; font-variant-alternates: normal; font-variant-east-asian: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-stroke-width: 0px;"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-position: normal; font-variant-caps: normal; font-variant-numeric: normal; font-variant-alternates: normal; font-variant-east-asian: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-stroke-width: 0px;"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-ligatures: normal; font-variant-position: normal; font-variant-caps: normal; font-variant-numeric: normal; font-variant-alternates: normal; font-variant-east-asian: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; border-spacing: 0px; -webkit-text-decorations-in-effect: none; 
-webkit-text-stroke-width: 0px;"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><p style="margin: 0px;" class=""></p><div class=""><div apple-content-edited="true" style="orphans: auto; widows: auto;" class="">--</div><div apple-content-edited="true" style="orphans: auto; widows: auto;" class="">Lee Spector, Professor of Computer Science</div><div apple-content-edited="true" style="orphans: auto; widows: auto;" class="">Director, Institute for Computational Intelligence<br class="">Hampshire College, Amherst, Massachusetts, 01002, USA<br class=""><a href="mailto:lspector@hampshire.edu" class="">lspector@hampshire.edu</a>, <a href="http://hampshire.edu/lspector/" class="">http://hampshire.edu/lspector/</a>, 413-559-5352</div></div></div></div></span></div></span></div></span></div></span></span></div></div></div></div>
</div>
<br class=""></body></html>