[Cs254f11] My plan

Tue Oct 18 11:24:53 EDT 2011

On Oct 18, 2011, at 9:35 AM, Wm. Josiah Erikson wrote:

> Right, yes, I think that randomly GENERATING the genomes to make each critic will be the key, but they will stay the same after they have been generated; otherwise evolution will be impossible. I agree with and understand you here.

Not only should the critics stay the same after they have been generated, but they should also *act* the same each time if applied to the same MIDI file multiple times. That is, there should be no randomness in what a critic does, once it has been created.
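A minimal sketch of that distinction in Clojure, assuming a critic is simply a function from the lines of one of the CSV-converted MIDI files to a number. `make-critic`, `field->long`, and the column-mean scoring rule are hypothetical, chosen only to show where randomness may go (at creation) and where it may not (at scoring time):

```clojure
(ns critic-sketch
  (:require [clojure.string :as string]))

;; The first line is the one printed later in this thread;
;; the second is made up for illustration.
(def sample-lines
  ["11, 6864, Note_on_c, 9, 42, 0"
   "11, 6900, Note_off_c, 9, 42, 0"])

(defn field->long
  "Parses a CSV field as a long, or returns nil if it is not an integer."
  [s]
  (when (re-matches #"-?\d+" s)
    (Long/parseLong s)))

(defn make-critic
  "Hypothetical critic constructor. Any random choice (here, which column to
  look at) is made once, by the caller, and baked into the returned function.
  The returned critic never calls rand: it returns the mean of the integer
  values in column `col` (0 if there are none)."
  [col]
  (fn [lines]
    (let [vals (keep (fn [line]
                       (let [fields (string/split line #",\s*")]
                         (when (< col (count fields))
                           (field->long (nth fields col)))))
                     lines)]
      (if (empty? vals)
        0
        (/ (reduce + vals) (count vals))))))

;; Randomness is used only when the critic is created...
(def critic (make-critic (rand-int 6)))

;; ...so scoring the same file twice always gives the same answer.
(= (critic sample-lines) (critic sample-lines)) ; always true
```

If `make-critic` instead called `rand-int` inside the returned `fn`, two calls on the same file could disagree, and so would the fitness computed from them.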

> Where I don't understand you, I think, is where you suggest that I should have each genome BE a critic. I'm not sure how I would then breed the critics together, since I was thinking of a genome as being the smallest possible unit that did anything useful (and the combination of these genomes would be what made a critic unique). However, as I go to code this, I may discover what you mean, as perhaps this makes no sense in practical terms. I suppose one could breed the genomes together by giving the arguments that one genome passed to a function to the function in another genome, but I don't think that will create useful evolution. I was thinking of each genome as being a random function (I would write a set of them that performed operations on the file) and random argument(s) within meaningful parameters (like length of the line, or number of lines in the file). This doesn't really make for tree-based GP, though, I suppose.... hmm... not sure anything other than a depth of one would work properly or usefully in this context, though. Maybe I'm boxing myself in.

On this: What Thom said (while I was typing this)!

Maybe more simply: Whatever is "inside" one of your "scoring genomes" (I'm not sure, and maybe you aren't yet either), you were going to combine them by "adding all of their results up." The GP perspective is: Why do you think that adding them all up is the best thing to do? Maybe it'd be better to add two of them, multiply by a third, divide that result by 3.141592, and then square the result of that. Or something else. Why not let all of that evolve? So the idea is to make your critics programs that do the whole job, including all of the stuff that you were going to put inside of scoring genomes AND the ways that all of their results are combined. And let *all* of that evolve.
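One way to read this suggestion, sketched in Clojure: a critic is a single expression tree whose leaves are primitive measurements of the file and whose internal nodes are arithmetic operators, so the way measurements are combined evolves along with the measurements themselves. The terminal set, the protected-division convention, and every function name below are assumptions for illustration, not a prescribed design:

```clojure
(ns tree-critic-sketch)

;; Hypothetical primitive measurements, standing in for whatever would have
;; gone inside a "scoring genome". Each maps a file's lines to a double.
(def terminals '[line-count mean-length max-length pi])

(defn measure [t lines]
  (case t
    line-count  (double (count lines))
    mean-length (if (empty? lines)
                  0.0
                  (/ (double (reduce + (map count lines))) (count lines)))
    max-length  (double (reduce max 0 (map count lines)))
    pi          3.141592))

(defn pdiv
  "Protected division, a common GP convention: returns 1.0 on division by zero."
  [a b]
  (if (zero? b) 1.0 (/ a b)))

(def op-symbols '[+ - * /])
(def ops {'+ +, '- -, '* *, '/ pdiv})

(defn evaluate
  "Deterministically evaluates an expression tree on a file's lines."
  [tree lines]
  (if (seq? tree)
    (let [[op a b] tree]
      ((ops op) (evaluate a lines) (evaluate b lines)))
    (measure tree lines)))

(defn random-tree
  "Builds a random tree at most `depth` operators deep. This is the only place
  randomness is used; pass a seeded java.util.Random for reproducibility."
  [^java.util.Random rng depth]
  (if (or (zero? depth) (< (.nextDouble rng) 0.3))
    (nth terminals (.nextInt rng (count terminals)))
    (list (nth op-symbols (.nextInt rng (count op-symbols)))
          (random-tree rng (dec depth))
          (random-tree rng (dec depth)))))

(defn fitness
  "Total absolute deviation from the human's scores (lower is better).
  `songs` is a seq of [lines human-score] pairs."
  [tree songs]
  (reduce + (for [[lines human] songs]
              (Math/abs (double (- (evaluate tree lines) human))))))

;; "Add two of them, multiply by a third, divide by 3.141592, then square".
;; The sketch has no square operator, so the subtree is multiplied by itself.
(def example
  '(* (/ (* (+ line-count mean-length) max-length) pi)
      (/ (* (+ line-count mean-length) max-length) pi)))
```

Crossover and mutation would then swap or regenerate subtrees of such trees; because `evaluate` never draws random numbers, a given tree's fitness is the same every time it is measured.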

 -Lee

> 
>    -Josiah
> 
> 
> 
> On 10/17/11 8:15 PM, Lee Spector wrote:
>> [Re-adding the list to the cc: -- hope that's okay! I do think that others will also benefit from this.]
>> 
>> Comments below...
>> 
>> 
>> On Oct 17, 2011, at 4:22 PM, Wm. Josiah Erikson wrote:
>>>    OK, so each critic is made up of a number of "scoring genomes" that are either generated randomly or pulled randomly from a large, predetermined soup of things you could do to get a score. Add all the scoring genomes together and you get a critic, which will then evaluate each song; the total deviation of its scores from my own will be that critic's fitness. Then I will breed together the x best critics in each generation of y members (adjustable) and run it again.
>> It sounds from the below like each "scoring genome" can itself be a pretty complicated beast, do math and comparisons etc. So why couldn't *one* of these be able to do all of the things that you're thinking that a bunch of them could do summed together? You're saying that a critic will be the sum of a bunch of things, each of which can do a bunch of things including making sums of bunches of things... right? If I've got that right then I think it'll be simpler and probably just as good to make each critic *be* just *one* of these scoring genomes.
>> 
>>>    Now each scoring genome could both DO random things (like pull two random characters out from two random lines and compare them to each other, or pull a random character from a random line and compare it to the most common character in the whole file, or a million other possibilities) and could also assign scores to those things, positive, negative, who knows. It could also be that having a particular genome hard-coded with the actual positions in the files that it pulls from, once it's been randomly generated, or whatever, would be helpful and create more consistent results when breeding the resulting critic with another one.
>> My advice is to avoid any randomness in the actions or calculations of the scoring genomes. In other words, the same genome, if evaluated twice on the same file, should produce exactly the same score.
>> 
>> It's not that I can't imagine it being useful to do random stuff (or, as you suggest below, grab random data from a file) as part of calculating a score, but rather that I predict that randomness here will make evolution impossible.
>> 
>> If your scores are nondeterministic (giving different scores on different applications to the same data) then your fitnesses (the differences between the scores and your own personal scores) will also be nondeterministic. This means that the "selection" performed in the evolutionary loop will be basing its selections on luck to a fairly large extent, and that this luck factor may well be as important as actual quality in determining who gets selected. This would mean that the evolutionary loop will not be able to amplify quality over generations.
>> 
>> If you really want to have scoring genomes that involve randomness then I think you'd have to take pretty serious countermeasures, like testing each one by running it a large number of times and averaging the results. But I think it'd be simpler and better just to leave out the randomness altogether.
>> 
>> Of course, if you're not going to be using a random number you'll have to have a non-random number and that will have to come from somewhere. Presumably there could just be numbers in the genome itself, or some functions could operate on *all* lines in a file or all lines of a particular type, etc.
>> 
>> 
>>> It will be interesting to see what tools I can come up with to try to help this evolve towards something useful, i.e. what to put in the soup. The ideas I have so far involve
>>> 
>>> Fetching:
>>>    -set of characters, same random position on each line of the file (that position would be in the genome and would get passed on from generation to generation)
>>>    -getting the most common character in the file
>>>    -random number of random characters from the file on different lines
>>>    -same as above except same line
>>>    -random initial number determines starting number of the position on the line, increase for each line until out of characters
>>> 
>>> Operators:
>>>    -modulus division, addition, subtraction, multiplication, division, mode, mean, median, and mixtures of all of these with comparison
>>> 
>>>    Maybe this is all too heterogeneous and I need to make everything take two operators or something. I'll see as I start actually implementing this. I'm going to start with a few simple genomes :)
>> All of this looks good except for the randomness...
>> 
>>>    I figured out the slurp problem - for some reason when I stick it in a vector of lists, it works fine:
>>> 
>>> (def rita (vec (string/split-lines (slurp "/Volumes/cs254/group_storage/josiah/LovelyRita.csv"))))
>>> 
>>> (get rita (rand-int (count rita)))
>>> ;"11, 6864, Note_on_c, 9, 42, 0"
>>> 
>> 
>> Well... I don't know, but this does have half of the smell of a laziness problem, since calling vec on something lazy forces it to be fully realized. But the doc string on split-lines says it's not lazy, and neither should slurp be. So this doesn't fully make sense. Another thing that doesn't is that you said you had the same problem with the slurp example in clojinc with Jabberwocky.txt, and I haven't experienced any such problem... it returns instantly for me.
>> 
>>  -Lee
>> 
>> 
>> 
>> 
>> 
>> --
>> Lee Spector, Professor of Computer Science
>> Cognitive Science, Hampshire College
>> 893 West Street, Amherst, MA 01002-3359
>> lspector at hampshire.edu, http://hampshire.edu/lspector/
>> Phone: 413-559-5352, Fax: 413-559-5438
>> 
> 
> -- 
> Wm. Josiah Erikson
> Network Engineer
> Hampshire College
> Amherst, MA 01002
> (413) 559-6091
> 