[Cs254f11] slurp

Lee Spector lspector at hampshire.edu
Tue Oct 25 10:56:15 EDT 2011


> What I still don't get is why it was repeatable (and still is, I just did it again) that when I used the mmap version of slurp to pull the file in, it was then around 10 times faster to spit it out to the REPL. Weird.


I think that the mmap version doesn't "pull the file in" into the same kind of thing, at some level, because memory maps are something like lazy data structures (http://en.wikipedia.org/wiki/Mmap) and presumably this can have performance implications for whatever clooj is doing en route to the output pane. Not that I understand what's going on here in detail, but I think that's the source of what you're seeing.
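
One way to probe this from a REPL is to map a file with java.nio directly and look at what kind of object comes back. The sketch below is only an illustration under stated assumptions: mmap-chars is a hypothetical helper, not the clojure.contrib.mmap implementation, and whether mmap/slurp returns the same kind of value is something to check rather than assume.

```clojure
(import '(java.io RandomAccessFile)
        '(java.nio.channels FileChannel FileChannel$MapMode)
        '(java.nio.charset Charset))

;; mmap-chars is a hypothetical helper for illustration only; it is not
;; the clojure.contrib.mmap implementation.
(defn mmap-chars
  "Memory-maps the file at path read-only and decodes it as UTF-8.
  Returns a java.nio.CharBuffer rather than a String."
  [path]
  (with-open [raf (RandomAccessFile. (str path) "r")]
    (let [^FileChannel ch (.getChannel raf)
          ;; The OS pages the file in on demand; the mapping stays valid
          ;; after the channel is closed.
          buf (.map ch FileChannel$MapMode/READ_ONLY 0 (.size ch))]
      ;; Decoding reads through the whole mapped buffer.
      (.decode (Charset/forName "UTF-8") buf))))
```

Comparing (class (slurp path)) with (class (mmap-chars path)) for the same file shows whether the two values are the same kind of object. If they are not, whatever formats the value for the output pane may well take a different code path for each, which would fit the timing difference described above.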

 -Lee


On Oct 25, 2011, at 10:24 AM, Wm. Josiah Erikson wrote:

> Lee is quite right. Using either regular slurp or mmap/slurp, I can, using many utility functions I have written (and they finally work right! Yay! Maybe now I can start doing evolution!), do this right off the bat:
> 
> (get (get_notes (get_note_lines (nth (map parse_song (map get_songpath reference-scored-songs)) 0))) 3000)
> 
> And it returns an integer. Immediately. What I still don't get is why it was repeatable (and still is, I just did it again) that when I used the mmap version of slurp to pull the file in, it was then around 10 times faster to spit it out to the REPL. Weird.
> 
>   -Josiah
> 
> 
> 
> Lee Spector wrote:
>> I should have made the bottom lines more clear for the class more generally:
>> 
>> - slurp isn't the culprit here, and it's perfectly fine and the simplest thing to use for file input in most applications -- so slurp away.
>> 
>> - Don't spew humongous amounts of text to the output pane in clooj. It'll get bogged down trying to format it and you'll be sorry. But the data sizes that cause a problem here probably aren't things you really want to see spewed anyway, so usually this won't matter. If you really do want to spew humongous amounts of text then you can run your code using leiningen (and there are some simple ways to redirect the leiningen output to a file if you want to -- ask me how) or, depending on your application, you might want to explicitly direct your output to a file from within your code.
>> 
>> -Lee
>> 
>> 
>> On Oct 25, 2011, at 9:01 AM, Lee Spector wrote:
>> 
>>  
>>> [cc-ing the class since this may be of broader interest]
>>> 
>>> The slowness here seems to be the clooj output pane, which just doesn't seem to be good at spewing large volumes of data (maybe because the code it uses to format output wasn't written with large volumes in mind).
>>> 
>>> When I do:
>>> 
>>> (def testsong (slurp "/Users/leespector/Code/clojure/play/src/LovelyRita.csv"))
>>> 
>>> (first testsong)
>>> 
>>> (last testsong)
>>> 
>>> each of these returns instantly. The "last" call was to ensure that the whole thing had been read (even though that should be the case according to the spec because slurp returns a string, which isn't lazy -- but I wanted to make sure).
>>> 
>>> The delay comes when you try to print the value of testsong to the output pane, and since I don't think you really want to ever do this anyway I don't think this should be a problem. Just don't do it :-).
>>> 
>>> BTW if I do all of this from a leiningen repl then the spewing does take some time -- there's a fair amount of data there -- but it starts spewing instantly and it doesn't get all gummed up like clooj does.
>>> 
>>> -Lee
>>> 
>>> 
>>> 
>>> On Oct 24, 2011, at 2:45 PM, Wm. Josiah Erikson wrote:
>>> 
>>>    
>>>> So I've attached the file I'm slurping. Tell me if you get different results. If I do something very very simple like:
>>>> 
>>>> (def testsong (slurp "/Users/josiah/Documents/cs254/LovelyRita.csv"))
>>>> 
>>>> and then type "testsong" into the REPL, it takes forever, sometimes as much as 2 minutes, before returning.
>>>> 
>>>> However, if I use the mmapped version:
>>>> 
>>>> (require '[clojure.contrib.mmap :as mmap])
>>>> (def testsong (mmap/slurp "/Users/josiah/Documents/cs254/LovelyRita.csv"))
>>>> 
>>>> and then type "testsong" into the REPL, it takes oh, 3-5 seconds.
>>>> 
>>>> Said file is under 1MB, and the real thing I want to do with it is:
>>>> 
>>>> (require '[clojure.contrib.string :as string])
>>>> 
>>>> (defn parse_song
>>>> "Takes a filename, assumed to be a MIDI file converted to .csv format, and returns a vector of lists.
>>>> Each list has a number of elements, split by the comma in the .csv"
>>>> [file]
>>>> (vec (map #(string/split #"," %) (string/split-lines (mmap/slurp file)))))
>>>> 
>>>> (def song (parse_song "/Users/josiah/Documents/cs254/LovelyRita.csv"))
>>>> 
>>>> Then if you type "song" into the REPL, you get a long delay as well, before it spits out the answer. Maybe it's just the REPL? I'm going to keep coding anyway.
>>>> 
>>>> Before I found the mmapped version of slurp and was using a definition of parse_song with the normal one, parse_song would sometimes never return at all, or at least not before I ran out of patience and killed clooj :)
>>>> 
>>>> 
>>>> -- 
>>>> -----
>>>> Wm. Josiah Erikson
>>>> Network Engineer
>>>> Hampshire College
>>>> Amherst, MA 01002
>>>> 
>>>>      
>>> --
>>> Lee Spector, Professor of Computer Science
>>> Cognitive Science, Hampshire College
>>> 893 West Street, Amherst, MA 01002-3359
>>> lspector at hampshire.edu, http://hampshire.edu/lspector/
>>> Phone: 413-559-5352, Fax: 413-559-5438
>>> 
