<div dir="ltr">I agree with the others that it should be pretty easy to do everything you need. I can also walk you through how to do it in Clojush with the Python script I use to launch runs.<br><div><br>I'm not sure if I'd get more or less performance with one slot per node, but I'm guessing less -- I often have multiple runs going on a node, so I'd need a 4x or 8x speedup from multithreading within a single run to get the same amount of work done. But maybe this would just happen automatically with how we have it set up in Clojush.<br><br></div><div>Tom<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 20, 2015 at 4:19 PM, Chris Perry <span dir="ltr"><<a href="mailto:perry@hampshire.edu" target="_blank">perry@hampshire.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
I just wrote a post to the tractor developer forum asking about the one piece of this that I’m not 100% sure about, but if we go with Josiah’s recommendation of one slot per blade then my post will be irrelevant and YES we can do exactly what you want.<br>
<br>
And Josiah, I do think that you’re right about one slot per node, not only because of the great multithreading that everyone’s doing, but because our fileserver can’t keep up with too many concurrent reads/writes anyway.<br>
<span class="HOEnZb"><font color="#888888"><br>
- chris<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
On Apr 20, 2015, at 3:55 PM, Wm. Josiah Erikson <<a href="mailto:wjerikson@hampshire.edu">wjerikson@hampshire.edu</a>> wrote:<br>
<br>
><br>
><br>
> On 4/20/15 3:43 PM, Lee Spector wrote:<br>
>> Maybe it's finally time for me to do this. As Tom points out, I should have a better handle on it before he leaves. But the runs that I do are a little different from what he usually does.<br>
>><br>
>> Is it now possible to do this with all of these properties?:<br>
>><br>
>> - My run starts immediately.<br>
> Yes. If you spool with a high enough priority, it will actually<br>
> terminate and re-spool other jobs that are in your way.<br>
>> - It runs on a node that I specify.<br>
> Yes. The service key would just be the node you want.<br>
>> - No other jobs run on that node until my run is done, even if mine temporarily goes to low cpu utilization.<br>
> We would have to make it so that your node had one slot, and then you<br>
> did something like this if you had more than one command:<br>
> <a href="http://fly.hampshire.edu/docs/Tractor/scripting.html#sharing-one-server-check-out-among-several-commands" target="_blank">http://fly.hampshire.edu/docs/Tractor/scripting.html#sharing-one-server-check-out-among-several-commands</a>.<br>
> I think that most things that we are doing these days are multithreaded<br>
> enough that one slot per node might be a valid choice cluster-wide... at<br>
> least at the moment with Maxwell and prman. What say others?<br>
>> - I can see stderr as well as stdout, which I'll be piping to a file (which has to be somewhere I can grep it).<br>
> Yes. Just redirect each one to whatever file you want.<br>
>> - I can easily and instantly kill and restart my runs.<br>
> Yes. Right-click and restart, interrupt or delete.<br>
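<br>
The stdout/stderr point above can be sketched in Python. This is only a minimal illustration using the standard library, not the script Tom mentions and not anything Tractor-specific; the function name <code>run_with_logs</code>, the log directory, and the file naming are assumptions made up for the example.<br>

```python
import subprocess
import sys
from pathlib import Path


def run_with_logs(argv, log_dir, name):
    """Run argv, sending stdout and stderr to separate files in log_dir.

    Hypothetical helper: returns (exit_code, stdout_path, stderr_path).
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    out_path = log_dir / f"{name}.out"
    err_path = log_dir / f"{name}.err"
    with open(out_path, "w") as out, open(err_path, "w") as err:
        # The child process writes straight into these files, so they can
        # be grepped during the run once the child flushes its output.
        proc = subprocess.run(argv, stdout=out, stderr=err)
    return proc.returncode, out_path, err_path


if __name__ == "__main__":
    # Stand-in command; a real run would launch the actual job instead.
    demo = [sys.executable, "-c", "print('generation 0')"]
    print(run_with_logs(demo, "logs", "demo-run"))
```

Keeping the two streams in separate files means errors can be searched without wading through the regular run output.<br>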
>><br>
>> If so then yes, I guess I should try to switch to a tractor-based workflow for my work on fly.<br>
>><br>
>> -Lee<br>
>><br>
>><br>
>>> On Apr 20, 2015, at 3:26 PM, Chris Perry <<a href="mailto:perry@hampshire.edu">perry@hampshire.edu</a>> wrote:<br>
>>><br>
>>><br>
>>> I have a different ideal solution to propose: use tractor!<br>
>>><br>
>>> What you are looking to do with your NIMBY mechanism is to pull machines out of tractor temporarily so they can run your job(s). But this is exactly what tractor does: it gives a job to an idle machine then keeps that machine from running new jobs until the first job is killed or completed. If you spool with high enough priority, tractor kills and respools a lower priority job so as to free up the machine, so you don’t have to wait if immediate access is a concern of yours.<br>
>>><br>
>>> Not to mention, the tractor interface will show running jobs, so there would be a “Lee’s job” item in green on the list. This might make you less likely to lose track in the way you describe happening now.<br>
>>><br>
>>> Worth considering? I’m sure there are a few kinks to figure out (such as how to tag your jobs so that they have the full machine, guaranteed) but I feel confident that we can do this.<br>
>>><br>
>>> - chris<br>
>>><br>
>>> On Apr 20, 2015, at 2:31 PM, Lee Spector <<a href="mailto:lspector@hampshire.edu">lspector@hampshire.edu</a>> wrote:<br>
>>><br>
>>>> Some of these were probably my doing, but I only recall nimbying 1-4 and 4-5 in the recent past.<br>
>>>><br>
>>>> It's not a problem with a node that causes me to do this, it's an interest in having total control of a node for a compute-intensive multithreaded run, with no chance that any other processes will be allocated on it. Sometimes I'll start one of these, check in on it regularly for a while, and then check in less frequently if it's not doing anything really interesting but I'm not ready to kill it in case it might still do something good. Then sometimes I lose track. Right now I have nothing running, and any nimbying that I've done can be unnimbyed, although I'm not 100% sure what I may have left nimbyed.<br>
>>>><br>
>>>> Ideally, I guess, we'd have a nimby command that records who nimbyed a node and maybe periodically asks them if they still want it nimbyed. I'm not sure how difficult that would be. If somebody is inspired to do this, it would also be nice if the command for nimbying/unnimbying were more straightforward than the current one (which takes a 1 or a 0 as an argument, which is a little confusing), if it returned a value that made the new status clearer, and if there were a simple way just to check the status (which maybe there is now? If so, I don't know it).<br>
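<br>
The wished-for behavior above (record who nimbyed a node, report a clear status, and find entries that may have been forgotten) could look roughly like the following Python sketch. Everything here is hypothetical: the class <code>NimbyRegistry</code>, its method names, and the node names are invented for illustration, and it does not call or replace the cluster's actual nimby command.<br>

```python
import time


class NimbyRegistry:
    """Hypothetical in-memory record of which nodes are NIMBYed and by whom."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so the age check can be tested
        self._entries = {}       # node name -> (user, time it was NIMBYed)

    def nimby(self, node, user):
        """Mark node as NIMBYed by user and return the new status."""
        self._entries[node] = (user, self._clock())
        return self.status(node)

    def unnimby(self, node):
        """Clear the NIMBY on node (a no-op if none) and return the new status."""
        self._entries.pop(node, None)
        return self.status(node)

    def status(self, node):
        """Return a readable status string instead of a bare 1 or 0."""
        if node not in self._entries:
            return f"{node}: available"
        user, _ = self._entries[node]
        return f"{node}: NIMBYed by {user}"

    def stale(self, max_age_seconds):
        """Return sorted (node, user) pairs NIMBYed longer than max_age_seconds."""
        now = self._clock()
        return sorted(
            (node, user)
            for node, (user, since) in self._entries.items()
            if now - since > max_age_seconds
        )


if __name__ == "__main__":
    reg = NimbyRegistry()
    print(reg.nimby("compute-1-4", "lee"))
    print(reg.status("compute-1-2"))
    print(reg.unnimby("compute-1-4"))
```

A periodic job could call <code>stale()</code> and email each listed user to ask whether the node is still needed; that reminder step is left out of the sketch.<br>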
>>>><br>
>>>> -Lee<br>
>>>><br>
>>>><br>
>>>>> On Apr 20, 2015, at 1:36 PM, Wm. Josiah Erikson <<a href="mailto:wjerikson@hampshire.edu">wjerikson@hampshire.edu</a>> wrote:<br>
>>>>><br>
>>>>> Hi all,<br>
>>>>> Why are nodes 1-17, 1-18, 1-2, 1-4, and 1-9 NIMBYed? If you are<br>
>>>>> having a problem with a node that causes you to need to NIMBY it, please<br>
>>>>> let me know, because maybe it just means I screwed up the service keys<br>
>>>>> or something. I'm not omniscient :)<br>
>>>>><br>
>>>>> --<br>
>>>>> Wm. Josiah Erikson<br>
>>>>> Assistant Director of IT, Infrastructure Group<br>
>>>>> System Administrator, School of CS<br>
>>>>> Hampshire College<br>
>>>>> Amherst, MA 01002<br>
>>>>> <a href="tel:%28413%29%20559-6091" value="+14135596091">(413) 559-6091</a><br>
>>>>><br>
>>>>> _______________________________________________<br>
>>>>> Clusterusers mailing list<br>
>>>>> <a href="mailto:Clusterusers@lists.hampshire.edu">Clusterusers@lists.hampshire.edu</a><br>
>>>>> <a href="https://lists.hampshire.edu/mailman/listinfo/clusterusers" target="_blank">https://lists.hampshire.edu/mailman/listinfo/clusterusers</a><br>
>>>> --<br>
>>>> Lee Spector, Professor of Computer Science<br>
>>>> Director, Institute for Computational Intelligence<br>
>>>> Cognitive Science, Hampshire College<br>
>>>> 893 West Street, Amherst, MA 01002-3359<br>
>>>> <a href="mailto:lspector@hampshire.edu">lspector@hampshire.edu</a>, <a href="http://hampshire.edu/lspector/" target="_blank">http://hampshire.edu/lspector/</a><br>
>>>> Phone: <a href="tel:413-559-5352" value="+14135595352">413-559-5352</a>, Fax: <a href="tel:413-559-5438" value="+14135595438">413-559-5438</a><br>
>>>><br>
><br>
</div></div></blockquote></div><br></div>