27 May 2007

Attempt to write a GA

This is really just a note-to-self:

This week I started playing with my MacBook's XCode IDE by putting together a simple genetic algorithm. Melanie Mitchell (1997) suggests the following programming exercise: put something together that ends up evolving binary strings of all 1s (i.e. a typical chromosome might start out as "0110001010110011" and after 20 generations of fitness-proportionate selection, roulette-wheel sampling, and 70% crossover, look like "1111011111111111." What fancy terminology.)

Paraphrasing Mitchell's description of a simple GA, the steps are along the lines of:

  1. Start with a randomly generated population of n l-bit chromosomes.
  2. Calculate the fitness f(x) of each chromosome x in the population.
  3. Repeat the following until n offspring are created:
    1. Pick two "parents," with so-called roulette-wheel sampling so that it's more likely to pick fitter parents. (Where fitness is the number of ones in the chromosome.)
    2. With some probability PROB_XOVR (like, 70%) cross over at a randomly-selected junction within the two chromosomes. These are two offspring.
    3. With some probability PROB_MUT (like, 0.1%) mutate (flip) each spot in each offspring chromosome.
  4. Replace the current population with the new population.
  5. Go to step 2. (You can go 20, 100, or however long it takes to evolve a fit population.)

Anyhow, it worked! I was surprised by what an impact crossover has, but it's probably the caffeine making me think fuzzy.

With a population of 100 and a chromosome length of 20, the average fitness (the number of 1s at the beginning of the universe) was, say, 9.8. By the 48th generation the average fitness was about 17, and by the next generation chromosomes started to appear with all 1s.

Here's another run. A snippet of Generation 0 shows what you'd expect; each of the length-20 chromosomes has about 10 zeros and 10 ones (in this run, 9.86):

totalfitness = 986. avgfitness = 9.860000.
Generation 0.
11111000001110110110
11111000111100000001
11001110011100100001
11100001111111000110
10111010110001001100
10001100011111111111
01110011000011011101


By generation 23 the population had become much more 1-laden. The average fitness climbed to 17.3:

Generation 23.
Found a gene with all 1s at generation 23.
Found a gene with all 1s at generation 23.
totalfitness = 1730. avgfitness = 17.299999.
Generation 23.
11101111111111111011
11111111111111111010
11111110111111111011
10110111111111111100
11111110111111110111
01101111111111111011
10111110111111111011
11111101111110110111
11110111101111111011


Hooray.

I guess the next step is plotting population fitness as a function of generation. That would be an excuse to re-learn file I/O.

g-fav

No comments: