<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>Also, some additional notes:</p>

    <p>1. I've set the "bash" limit tag to 8 per node. Tom seems to use

      sh instead of bash, so this won't affect him, I think.<br>

    </p>

    <p>2. If you want to do 4 per node instead, just add the limit tag

      "4pernode" to a RemoteCmd in the .alf file, like this:</p>

    <p>    RemoteCmd {bash -c "env blah blah"} -tags {4pernode} -service

      {tom}</p>

    <p>3. There's also a "2pernode" tag.</p>
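    <p>In case it helps with note 2, here is a rough sketch of generating
      that line from a script. The helper name remote_cmd_line is made up
      for illustration, and the brace quoting assumes the .alf file takes
      Tcl-style syntax, so check it against your own job scripts before
      relying on it:</p>

```python
def remote_cmd_line(command, service, limit_tag=None):
    """Build one RemoteCmd line for a .alf job script (hypothetical helper)."""
    # Assumption: arguments are brace-quoted, Tcl style, so the inner
    # double quotes of the shell command need no escaping.
    parts = ["RemoteCmd", "{" + command + "}"]
    if limit_tag is not None:
        # Optional limit tag, e.g. "4pernode" or "2pernode".
        parts += ["-tags", "{" + limit_tag + "}"]
    parts += ["-service", "{" + service + "}"]
    return " ".join(parts)


line = remote_cmd_line('bash -c "env blah blah"', service="tom", limit_tag="4pernode")
print(line)
```

    <p>Leaving limit_tag unset drops the -tags part, in which case the job
      falls back to whatever limit tag it gets by default.</p>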

    <p>    -Josiah</p>

    <p><br>

    </p>

    <br>

    <div class="moz-cite-prefix">On 7/19/17 9:30 AM, Wm. Josiah Erikson

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:53149407-f293-d666-8194-a9cdb4f44b52@hampshire.edu">


      <p>Yeah, it ran out of RAM - probably because it tried to run 16 jobs

        simultaneously. It has 64GB of RAM, so it should be able to handle

        quite a few. It's currently running at under 1/3 RAM usage with

        3 jobs, so 8 jobs should be safe. I'll limit jobs with the "bash"

        limit tag (which yours default to, since you didn't set one explicitly

        and bash is the command you're running) to 8 per node. I could

        lower it to 4 if you prefer. Let me know.<br>

      </p>
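      <p>The arithmetic behind that estimate, as a rough sketch. It assumes
        memory use scales linearly with the number of jobs, which is only
        an approximation, and it treats "under 1/3" as an upper bound; the
        names are just for illustration:</p>

```python
TOTAL_RAM_GB = 64          # RAM on the node, from the message above
OBSERVED_JOBS = 3          # jobs running when usage was checked
OBSERVED_FRACTION = 1 / 3  # upper bound: usage was "under 1/3"


def estimated_ram_gb(n_jobs):
    """Upper-bound RAM estimate for n_jobs, assuming linear scaling per job."""
    per_job = TOTAL_RAM_GB * OBSERVED_FRACTION / OBSERVED_JOBS
    return n_jobs * per_job


for n in (4, 8, 16):
    need = estimated_ram_gb(n)
    fits = TOTAL_RAM_GB >= need
    print(f"{n:2d} jobs: about {need:.1f} GB needed, fits in {TOTAL_RAM_GB} GB: {fits}")
```

      <p>Under those assumptions 8 jobs come to roughly 57 GB, which fits,
        while 16 jobs come to roughly 114 GB, which does not.</p>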

      <p>    -Josiah</p>

      <p><br>

      </p>

      <br>

      <div class="moz-cite-prefix">On 7/19/17 8:38 AM, Thomas Helmuth

        wrote:<br>

      </div>

      <blockquote type="cite"

cite="mid:CABgVVjfn0FW28918bU4GDky0NCX-4kyv4+uApHwUcBzhWWd8NA@mail.gmail.com">

        <div dir="auto">Moving today, so I'll be brief.

          <div dir="auto"><br>

          </div>

          <div dir="auto">This sounds like the memory problems we've had

            on rack 4. There are just too many cores there for the

            amount of memory, so if too many runs start, none get enough

            memory. That's why rack 4 isn't on the Tom or big go service

            tags. It probably means we should remove rack 0 from those tags

            until we figure it out.</div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">I've done runs on rack 4 by limiting each run

            to 1GB of memory.</div>

          <div dir="auto"><br>

          </div>

          <div dir="auto">Tom</div>

        </div>

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On Jul 19, 2017 7:46 AM, "Lee

            Spector" &lt;<a href="mailto:lspector@hampshire.edu"

              moz-do-not-send="true">lspector@hampshire.edu</a>&gt;

            wrote:<br type="attribution">

            <blockquote class="quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

              I launched some runs this morning (a couple of hours ago

              -- I'm in Europe) that behaved oddly. I don't know if this

              is related to recent upgrades/reconfiguration, or what,

              but I thought I'd share the details here.<br>

              <br>

              First, there were weird delays between launches and the

              generation of the initial output. I'm accustomed to delays

              of a couple of minutes at this point, which I think we've

              determined (or just guessed) are due to the necessity of

              re-downloading dependencies on fly nodes, which doesn't

              occur when running on other machines. But these took much

              longer, maybe 30-45 minutes or longer (not sure of the

              exact times, but they were on that order). And weirder

              still, there were long stretches during which many runs had

              launched but only one was producing output and making

              progress, with the others really kicking in only after

              the early one(s) finished. I'd understand this if the

              non-producing ones hadn't launched at all because they

              were waiting for free slots... but that's not what was

              happening.<br>

              <br>

              And then one crashed, on compute-0-1 (this was launched

              with the "tom" service tag, using the run-fly script in

              Clojush), with nothing in the .err file (from standard

              error) but the following in the .out file (from standard

              output):<br>

              <br>

              Producing offspring...<br>

              Java HotSpot(TM) 64-Bit Server VM warning: INFO:

              os::commit_memory(0x00000006f0e00000, 1059061760, 0)

              failed; error='Cannot allocate memory' (errno=12)<br>

              #<br>

              # There is insufficient memory for the Java Runtime

              Environment to continue.<br>

              # Native memory allocation (malloc) failed to allocate

              1059061760 bytes for committing reserved memory.<br>

              # An error report file with more information is saved as:<br>

              # /home/lspector/runs/july19a-9235/Clojush/hs_err_pid57996.log<br>

              <br>
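              For anyone scanning .out files for the same failure, here is a
              rough sketch that pulls the failed allocation size out of the
              warning line above. The function name and the pattern are
              illustrative only, not part of any JVM tooling, and the pattern
              has only been matched against this one line:<br>

```python
import re

# The warning line from the .out file, joined onto one line.
LINE = (
    "Java HotSpot(TM) 64-Bit Server VM warning: INFO: "
    "os::commit_memory(0x00000006f0e00000, 1059061760, 0) failed; "
    "error='Cannot allocate memory' (errno=12)"
)

# Captures the address, the requested size in bytes, and the errno value.
PATTERN = re.compile(
    r"os::commit_memory\((0x[0-9a-fA-F]+), (\d+), \d+\) failed; .*\(errno=(\d+)\)"
)


def parse_commit_failure(line):
    """Return details of a failed commit_memory call, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    size = int(match.group(2))
    return {
        "address": match.group(1),
        "bytes": size,
        "mib": size / 2**20,  # bytes to mebibytes
        "errno": int(match.group(3)),
    }


info = parse_commit_failure(LINE)
print(info)
```

              The request works out to 1010 MiB, so the JVM was refused
              about 1 GB; errno 12 is ENOMEM on Linux.<br>
              <br>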

               -Lee<br>

              <br>

              --<br>

              Lee Spector, Professor of Computer Science<br>

              Director, Institute for Computational Intelligence<br>

              Hampshire College, Amherst, Massachusetts, 01002, USA<br>

              <a href="mailto:lspector@hampshire.edu"

                moz-do-not-send="true">lspector@hampshire.edu</a>, <a

                href="http://hampshire.edu/lspector/" rel="noreferrer"

                target="_blank" moz-do-not-send="true">http://hampshire.edu/lspector/</a><wbr>,

              <a href="tel:413-559-5352" value="+14135595352"

                moz-do-not-send="true">413-559-5352</a><br>

              <br>

              ______________________________<wbr>_________________<br>

              Clusterusers mailing list<br>

              <a href="mailto:Clusterusers@lists.hampshire.edu"

                moz-do-not-send="true">Clusterusers@lists.hampshire.<wbr>edu</a><br>

              <a

                href="https://lists.hampshire.edu/mailman/listinfo/clusterusers"

                rel="noreferrer" target="_blank" moz-do-not-send="true">https://lists.hampshire.edu/<wbr>mailman/listinfo/clusterusers</a><br>

            </blockquote>

          </div>

          <br>

        </div>


      </blockquote>

      <br>

      <pre class="moz-signature" cols="72">-- 

Wm. Josiah Erikson

Assistant Director of IT, Infrastructure Group

System Administrator, School of CS

Hampshire College

Amherst, MA 01002

(413) 559-6091

</pre>


    </blockquote>


  </body>

</html>