<div dir="ltr">Go for it.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Jun 22, 2015 at 4:38 PM, Wm. Josiah Erikson <span dir="ltr"><<a href="mailto:wjerikson@hampshire.edu" target="_blank">wjerikson@hampshire.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
I've got a couple of nodes (compute-1-4 and compute-2-17) that won't
remount /helga. It looks, Tom, like your processes are still running
and won't let go of /helga because they were in fact writing logs to
it, which I did not anticipate. If I can reboot those nodes, we
should be all set:<br>
<br>
<pre>
[root@fly install]# ssh compute-1-4
Rocks Compute Node
Rocks 6.1.1 (Sand Boa)
Profile built 09:02 08-Jun-2015

Kickstarted 09:09 08-Jun-2015
[root@compute-1-4 ~]# ls /helga
ls: cannot access /helga: Stale file handle
[root@compute-1-4 ~]# umount -f /helga
umount2: Device or resource busy
umount.nfs: /helga: device is busy
umount2: Device or resource busy
umount.nfs: /helga: device is busy
[root@compute-1-4 ~]# ps aux | grep max
root       723  0.0  0.0 105312   864 pts/0    S+   16:31   0:00 grep max
root      3717  0.1  0.0 349216 13068 ?        Sl   Jun08  38:24 /opt/pixar/tractor-blade-1.7.2/python/bin/python-bin /opt/pixar/tractor-blade-1.7.2/blade-modules/tractor-blade.py --dyld_framework_path=None --ld_library_path=/opt/pixar/tractor-blade-1.7.2/python/lib:/opt/gridengine/lib/linux-x64:/opt/openmpi/lib:/opt/python/lib:/share/apps/maxwell-3.1:/share/apps/maxwell64-3.1 --debug --log /var/log/tractor-blade.log -P 8000
[root@compute-1-4 ~]# ps aux | grep Max
root       725  0.0  0.0 105312   864 pts/0    S+   16:32   0:00 grep Max
[root@compute-1-4 ~]# lsof | grep /helga
lsof: WARNING: can't stat() nfs4 file system /helga
      Output information may be incomplete.
python-bi  3717  root    5w  unknown  /helga/public_html/tractor/cmd-logs/thelmuth/J1506210001/T2.log (stat: Stale file handle)
python-bi  3717  root    8w  unknown  /helga/public_html/tractor/cmd-logs/thelmuth/J1506210001/T9.log (stat: Stale file handle)
python-bi  3717  root   12w  unknown  /helga/public_html/tractor/cmd-logs/thelmuth/J1506210001/T14.log (stat: Stale file handle)
python-bi  3717  root   16w  unknown  /helga/public_html/tractor/cmd-logs/thelmuth/J1506210001/T24.log (stat: Stale file handle)
[root@compute-1-4 ~]# top

top - 16:32:23 up 14 days,  7:21,  1 user,  load average: 33.77, 35.33, 35.73
Tasks: 230 total,   1 running, 229 sleeping,   0 stopped,   0 zombie
Cpu(s): 50.2%us,  0.2%sy,  0.0%ni, 49.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  15928760k total,  8015840k used,  7912920k free,   238044k buffers
Swap:  2047992k total,        0k used,  2047992k free,  1094888k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 8428 thelmuth  20   0 6089m 1.5g  11m S 254.9 10.1   2809:37 java
 8499 thelmuth  20   0 6089m 1.5g  11m S 243.6 10.1   2829:15 java
 8367 thelmuth  20   0 6089m 1.5g  11m S 236.0 10.0   2829:42 java
 8397 thelmuth  20   0 6089m 1.5g  11m S  28.3 10.1   2787:46 java
  737 root      20   0 15036 1220  840 R   1.9  0.0   0:00.01 top
 1536 root      20   0     0    0    0 S   1.9  0.0  10:24.07 kondemand/6
    1 root      20   0 19344 1552 1232 S   0.0  0.0   0:00.46 init
    2 root      20   0     0    0    0 S   0.0  0.0   0:00.02 kthreadd
    3 root      RT   0     0    0    0 S   0.0  0.0   0:00.53 migration/0
    4 root      20   0     0    0    0 S   0.0  0.0   0:02.85 ksoftirqd/0
    5 root      RT   0     0    0    0 S   0.0  0.0   0:00.00 migration/0
    6 root      RT   0     0    0    0 S   0.0  0.0   0:01.50 watchdog/0
    7 root      RT   0     0    0    0 S   0.0  0.0   0:00.49 migration/1
    8 root      RT   0     0    0    0 S   0.0  0.0   0:00.00 migration/1
    9 root      20   0     0    0    0 S   0.0  0.0   0:01.85 ksoftirqd/1
   10 root      RT   0     0    0    0 S   0.0  0.0   0:01.21 watchdog/1
   11 root      RT   0     0    0    0 S   0.0  0.0   0:00.50 migration/2
   12 root      RT   0     0    0    0 S   0.0  0.0   0:00.00 migration/2
   13 root      20   0     0    0    0 S   0.0  0.0   0:02.12 ksoftirqd/2
   14 root      RT   0     0    0    0 S   0.0  0.0   0:01.06 watchdog/2
   15 root      RT   0     0    0    0 S   0.0  0.0   0:01.81 migration/3
   16 root      RT   0     0    0    0 S   0.0  0.0   0:00.00 migration/3
   17 root      20   0     0    0    0 S   0.0  0.0   0:00.50 ksoftirqd/3
   18 root      RT   0     0    0    0 S   0.0  0.0   0:00.87 watchdog/3
   19 root      RT   0     0    0    0 S   0.0  0.0   0:01.76 migration/4
   20 root      RT   0     0    0    0 S   0.0  0.0   0:00.00 migration/4
   21 root      20   0     0    0    0 S   0.0  0.0   0:04.57 ksoftirqd/4
   22 root      RT   0     0    0    0 S   0.0  0.0   0:01.03 watchdog/4
   23 root      RT   0     0    0    0 S   0.0  0.0   0:00.51 migration/5
   24 root      RT   0     0    0    0 S   0.0  0.0   0:00.00 migration/5
   25 root      20   0     0    0    0 S   0.0  0.0   0:02.27 ksoftirqd/5
[root@compute-1-4 ~]# umount -f /helga
umount2: Device or resource busy
umount.nfs: /helga: device is busy
umount2: Device or resource busy
umount.nfs: /helga: device is busy
[root@compute-1-4 ~]# umount -f /helga
umount2: Device or resource busy
umount.nfs: /helga: device is busy
umount2: Device or resource busy
umount.nfs: /helga: device is busy
[root@compute-1-4 ~]# mount -o remount /helga
mount.nfs: Stale file handle
[root@compute-1-4 ~]# mount -o remount /helga
mount.nfs: Stale file handle
[root@compute-1-4 ~]# top

top - 16:33:19 up 14 days,  7:22,  1 user,  load average: 36.17, 35.76, 35.86
Tasks: 229 total,   1 running, 228 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.6%us,  0.4%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  15928760k total,  8016408k used,  7912352k free,   238044k buffers
Swap:  2047992k total,        0k used,  2047992k free,  1094888k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 8499 thelmuth  20   0 6089m 1.5g  11m S 209.2 10.1   2831:08 java
 8397 thelmuth  20   0 6089m 1.5g  11m S 198.9 10.1   2789:37 java
 8367 thelmuth  20   0 6089m 1.5g  11m S 196.2 10.0   2831:37 java
 8428 thelmuth  20   0 6089m 1.5g  11m S 192.9 10.1   2811:28 java
  764 root      20   0 15164 1352  952 R   0.7  0.0   0:00.03 top
 1532 root      20   0     0    0    0 S   0.3  0.0  20:31.35 kondemand/2
 1534 root      20   0     0    0    0 S   0.3  0.0  12:32.23 kondemand/4
 1536 root      20   0     0    0    0 S   0.3  0.0  10:24.14 kondemand/6
 1537 root      20   0     0    0    0 S   0.3  0.0  17:00.03 kondemand/7
 3717 root      20   0  341m  12m 3000 S   0.3  0.1  38:24.92 python-bin
    1 root      20   0 19344 1552 1232 S   0.0  0.0   0:00.46 init
    2 root      20   0     0    0    0 S   0.0  0.0   0:00.02 kthreadd
    3 root      RT   0     0    0    0 S   0.0  0.0   0:00.53 migration/0
    4 root      20   0     0    0    0 S   0.0  0.0   0:02.85 ksoftirqd/0
    5 root      RT   0     0    0    0 S   0.0  0.0   0:00.00 migration/0
    6 root      RT   0     0    0    0 S   0.0  0.0   0:01.50 watchdog/0
    7 root      RT   0     0    0    0 S   0.0  0.0   0:00.49 migration/1
    8 root      RT   0     0    0    0 S   0.0  0.0   0:00.00 migration/1
    9 root      20   0     0    0    0 S   0.0  0.0   0:01.85 ksoftirqd/1
   10 root      RT   0     0    0    0 S   0.0  0.0   0:01.21 watchdog/1
   11 root      RT   0     0    0    0 S   0.0  0.0   0:00.50 migration/2
   12 root      RT   0     0    0    0 S   0.0  0.0   0:00.00 migration/2
   13 root      20   0     0    0    0 S   0.0  0.0   0:02.12 ksoftirqd/2
   14 root      RT   0     0    0    0 S   0.0  0.0   0:01.06 watchdog/2
   15 root      RT   0     0    0    0 S   0.0  0.0   0:01.81 migration/3
   16 root      RT   0     0    0    0 S   0.0  0.0   0:00.00 migration/3
   17 root      20   0     0    0    0 S   0.0  0.0   0:00.50 ksoftirqd/3
   18 root      RT   0     0    0    0 S   0.0  0.0   0:00.87 watchdog/3
   19 root      RT   0     0    0    0 S   0.0  0.0   0:01.76 migration/4
   20 root      RT   0     0    0    0 S   0.0  0.0   0:00.00 migration/4
   21 root      20   0     0    0    0 S   0.0  0.0   0:04.57 ksoftirqd/4
[root@compute-1-4 ~]# umount -f /helga
umount2: Device or resource busy
umount.nfs: /helga: device is busy
umount2: Device or resource busy
umount.nfs: /helga: device is busy
[root@compute-1-4 ~]# mount /helga
mount.nfs: Stale file handle
[root@compute-1-4 ~]# logout
Connection to compute-1-4 closed.
[root@fly install]# ssh compute-2-17
Rocks Compute Node
Rocks 6.1.1 (Sand Boa)
Profile built 23:42 25-Dec-2014

Kickstarted 23:58 25-Dec-2014
[root@compute-2-17 ~]# ls /helga
ls: cannot access /helga: Stale file handle
[root@compute-2-17 ~]# umount -f /helga
umount2: Device or resource busy
umount.nfs: /helga: device is busy
umount2: Device or resource busy
umount.nfs: /helga: device is busy
[root@compute-2-17 ~]# lsof | grep /helga
lsof: WARNING: can't stat() nfs4 file system /helga
      Output information may be incomplete.
python-bi  28615  root    8w  unknown  /helga/public_html/tractor/cmd-logs/hz12/J1504210040/T3.log (stat: Stale file handle)
[root@compute-2-17 ~]#
</pre>
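For what it's worth, if rebooting those two ever turns out to be a pain, the
usual fallback would be to kill whatever still holds open descriptors on the
stale mount, lazy-unmount it, and remount. A rough, untested sketch (it
assumes the only holder is the tractor-blade process, PID 3717 on
compute-1-4 above, and that the blade can be restarted afterward however it
normally gets started on these nodes):<br>
<pre>
# On the affected node, as root.

# 1. Confirm who still has files open under the stale mount. lsof warns
#    that it can't stat the nfs4 filesystem, but (as in the transcript)
#    it still lists the open descriptors; fuser -vm is another check.
lsof | grep /helga
fuser -vm /helga

# 2. Stop the holder -- here the tractor-blade python process, PID 3717
#    in the compute-1-4 transcript (the PID will differ per node).
kill 3717

# 3. umount -f didn't work above because the descriptors were still
#    open; -l (lazy) detaches the mount point now and cleans up once
#    the last reference is gone, after which a fresh mount can succeed.
umount -l /helga
mount /helga
</pre>
That said, a reboot gets the same result and restarts the blade cleanly at
the same time, so it's probably the simpler fix here.<br>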
<div><div class="h5">
<br>
<br>
<br>
<div>On 6/22/15 4:33 PM, Wm. Josiah Erikson
wrote:<br>
</div>
<blockquote type="cite">
Uh... it shouldn't have been. I wonder if it was because I yanked its
log directory out from under it? I did not expect that to happen!
It's back up now, but it looks like all of your runs errored out. I'm
so sorry!<br>
-Josiah<br>
<br>
<br>
<div>On 6/22/15 4:00 PM, Thomas Helmuth
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>
<div>Very cool!<br>
<br>
</div>
It looks like tractor is down. Is that related to this move?<br>
<br>
</div>
Tom<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Jun 22, 2015 at 1:50 PM, Wm.
Josiah Erikson <span dir="ltr"><<a href="mailto:wjerikson@hampshire.edu" target="_blank">wjerikson@hampshire.edu</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Keg will
be going down shortly, and coming back up as a harder,
faster,<br>
better, stronger version shortly as well, I hope :)<br>
<span><font color="#888888"> -Josiah<br>
</font></span><span><br>
<br>
On 6/11/15 9:53 AM, Wm. Josiah Erikson wrote:<br>
> Hi all,<br>
> I'm pretty sure there is 100% overlap between
people who care<br>
> about fly and people who care about keg at this
point (though there<br>
> are some people who care about fly and not so much
keg, like Lee and<br>
> Tom - sorry), so I'm sending this to this list.<br>
> I have a new keg all ready to go: a 32TB 14+2 RAID6 with 24GB of RAM,<br>
> and it's superfast (way faster than gigabit)! I would like to bring it up<br>
> on Monday, June 22nd. It would be ideal if rendering were NOT happening<br>
> at that time, to make my rsyncing life easier :) Any objections?<br>
> -Josiah<br>
><br>
<br>
</span>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
<pre cols="72">--
Wm. Josiah Erikson
Assistant Director of IT, Infrastructure Group
System Administrator, School of CS
Hampshire College
Amherst, MA 01002
<a href="tel:%28413%29%20559-6091" value="+14135596091" target="_blank">(413) 559-6091</a>
</pre>
<br>
<fieldset></fieldset>
<br>
<pre>_______________________________________________
Clusterusers mailing list
<a href="mailto:Clusterusers@lists.hampshire.edu" target="_blank">Clusterusers@lists.hampshire.edu</a>
<a href="https://lists.hampshire.edu/mailman/listinfo/clusterusers" target="_blank">https://lists.hampshire.edu/mailman/listinfo/clusterusers</a>
</pre>
</blockquote>
<br>
<pre cols="72">--
Wm. Josiah Erikson
Assistant Director of IT, Infrastructure Group
System Administrator, School of CS
Hampshire College
Amherst, MA 01002
<a href="tel:%28413%29%20559-6091" value="+14135596091" target="_blank">(413) 559-6091</a>
</pre>
</div></div></div>
<br>_______________________________________________<br>
Clusterusers mailing list<br>
<a href="mailto:Clusterusers@lists.hampshire.edu">Clusterusers@lists.hampshire.edu</a><br>
<a href="https://lists.hampshire.edu/mailman/listinfo/clusterusers" rel="noreferrer" target="_blank">https://lists.hampshire.edu/mailman/listinfo/clusterusers</a><br>
<br></blockquote></div><br></div>