<div dir="ltr">Sounds good!<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Apr 22, 2015 at 7:59 PM, Wm. Josiah Erikson <span dir="ltr">&lt;<a href="mailto:wjens@hampshire.edu" target="_blank">wjens@hampshire.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF">
    I've changed MinRAM to 3GB. I think actually, on second thought,
    that 4GB might prevent machines that only have 8GB from taking jobs
    when they should be able to. I'll watch and tweak as necessary.<br>
    Wow, the cluster is getting hammered. Fun :)<span class="HOEnZb"><font color="#888888"><br>
        -Josiah</font></span><div><div class="h5"><br>
    <br>
    <br>
    <div>On 4/22/15 7:54 PM, Thomas Helmuth
      wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">Ok, good, glad it's not something in Clojush doing
        the crashing! I'm guessing somewhere in the range of 4GB would
        do the trick. So, let's try that?<br>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Wed, Apr 22, 2015 at 7:52 PM, Wm.
          Josiah Erikson <span dir="ltr">&lt;<a href="mailto:wjens@hampshire.edu" target="_blank">wjens@hampshire.edu</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I have
            rebooted compute-1-17. That should fix that. (Apologies to
            Piper, who had a maxwell job running. You'll have to retry
            that one.)<br>
            <br>
            The other problem is that some nodes are actually out of
            memory because of the insanely intensive rendering jobs that
            are running on them. We could play with the minimum free memory
            required to take new jobs. How much RAM do you think is
            necessary? Right now it's set to 1GB. Maybe I should try
            changing it to 2GB or 4GB?<span><font color="#888888"><br>
                    -Josiah</font></span>
            <div>
              <div><br>
                <br>
                <br>
                On 4/22/15 7:22 PM, Thomas Helmuth wrote:<br>
                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                  Hi Josiah,<br>
                  <br>
                  I just started my first runs on fly since the hard
                  drive switch. I'm getting some weird errors. The first
                  is that every run started on compute-1-17 crashed with
                  the following message:<br>
                  <br>
                  ====[2015/04/22 19:06:03 /J1504220051/T10/C10/thelmuth
                  on compute-1-17 ]====<br>
                   /bin/sh: line 0: cd: /home/thelmuth/ClusteringBench/:
                  Not a directory<br>
                   /bin/sh:
                  /home/thelmuth/Results/clustering-bench/determin-decim/ratio-0.25/replace-space-with-newline/tourney-7/logs/log9.txt:
                  Stale file handle<br>
                   /bin/sh: line 0: cd:
                  /home/thelmuth/Results/clustering-bench/determin-decim/ratio-0.25/replace-space-with-newline/tourney-7/csv/:
                  Not a directory<br>
                  ...<br>
                  <br>
                  It sounds like 1-17 for some reason cannot access my
                  homedir. When I tried to SSH to compute-1-17, it asked
                  for my password, which it doesn't do for other nodes. So,
                  it sounds like there's something amiss there.<br>
                  <br>
                  I had some other runs, it looks like all on rack 2
                  nodes, that crashed printing the following to the
                  output log files:<br>
                  <br>
                  Java HotSpot(TM) 64-Bit Server VM warning: INFO:
                  os::commit_memory(0x00000007e5500000, 447741952, 0)
                  failed; error='Cannot allocate memory' (errno=12)<br>
                  #<br>
                  # There is insufficient memory for the Java Runtime
                  Environment to continue.<br>
                  # Native memory allocation (malloc) failed to allocate
                  447741952 bytes for committing reserved memory.<br>
                  # An error report file with more information is saved
                  as:<br>
                  # /home/thelmuth/ClusteringBench/hs_err_pid21904.log<br>
                  <br>
                  This looks like an error we used to get with Java but
                  that I haven't seen recently. I checked, and I don't think
                  anything has changed in our code regarding memory
                  management.<br>
                  <br>
                  Is it possible that one or both of these errors are
                  caused by something in the move, or an upgrade to
                  tractor or something? I can go back and look for
                  similar errors to the second one to see if we figured
                  out what went wrong there.<br>
                  <br>
                  Tom<br>
                </blockquote>
                <br>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br></div>