24 July 2009

Watching simulated chromosomes mutate as they try to reach a "goal"

Hi -

What might a time-lapse of evolution look like, at the genetic level, watching genes win and lose on the basis of how well the organism was able to survive? In this blog post, we'll look at a really dumbed-down version of this that's still, I think, conceptually compelling.


How is a recipe different from an evolving organism? What if real-world problems, like designing bridges or electronic circuits, were solved by a computer that hunted around for a good solution rather than at the explicit, exacting direction of a team of engineers?

Genetic algorithms are computational "recipes" that function in this manner. Broadly, it's been possible to program a computer to program itself, using a technique that has some analogies with real-life evolution. This won't surprise any computer scientists, since the technique has been around since the 1950s and gained popularity in the 1970s. But I thought you might be interested in GAs because we can peer (diagrammatically) into an accelerated world of simulated, simplified "evolution" - and, also, it's a fun programming exercise that lets me get a little more experience with languages like Mathematica and C.

(If this is new to you, see the brief overview that I blogged about in 2007.)

In short, we give our evolution-simulating program a few things:
  • A population of organisms (usually called chromosomes), each chromosome being composed of a bunch of genes... and each gene can be as simple as a 0 or 1 -- or as complicated as a mathematical function like Sin(x). Usually the population is random, a bunch of Frankensteins.
  • A goal to achieve, like "figure out what math function best approximates this stock-ticker," or "figure out how to place 100 Lego blocks to make a really long bridge that doesn't collapse under its own weight." Usually the goal is presented in terms of a test called a fitness function. We test each organism (or chromosome's) ability to pass the test. The chromosomes that do better survive - pairs of chomosomes (parents) breed, make children, the kids are mutated, and the weaker parents are killed. Repeat.
  • And you need to let the simulation know how likely it is that genes will mutate or that breeding parents will swap chromosomal segments ("crossover").
As a prelude to a larger project, I put together a simple Mathematica notebook that would try out a problem in a textbook on the subject, Melanie Mitchell's An Introduction to Genetic Algorithms. The test?

Turning random "binary" chromosomes into a freaky population that loves the number 1

Start with a bunch of random chromosomes, like:

{0,0,1,0,1,1,1,0} (chromosome 1)
{1,0,0,1,1,0,0,0} (chromosome 2)

...and let them breed and mutate. The chromosomes that looked most like {1,1,1,1,1,1,1,1} were not killed off.

Our population had 40 chromosomes, each with 10 genes valued 0 or 1. The universe runs for 100 time-steps.

Here's a chromosome from the randomized primordial ooze, our friend chromosome 5:


And here is its universe, textually:


..and graphically (Mr. Five is in the fifth row from the top):


A few simple experiments: does it work?

Of course, I implemented the basic GA stuff, like a function for "fitness-proportionate selection using roulette wheel sampling," a fancy way of saying "a way to favor the parents that most look like a bunch of ones." At the end I'll link to the notebook so you can use it, laugh at my code, and offer suggestions.

Here I'll show you a few plots that show how the population's fitness improves (hopefully) over 100 sessions of selection, mating with crossover, and "honey, I mutated the kids."


Above, we use the recommended parameters for mutation (0.001 = 0.1% likely) and crossover (70%). As you can see, the population approaches the theoretical maximum fitness (400, which is the number of chromosomes multiplied by the number of genes each) at about step 20.

The curve goes up, meaning that the "fitness" of the chromosomes is improving.  A plot of the chromosomes at ten snapshots in their history are provided right under that.  They're getting more black - which means that they're becoming filled with 1s.  (Under that is what the first ten generations looked like.)

It bounces around, too. That's because the genes mutate now and then.

What if we go outside without sunblock?

What if we turn the mutation probability up to a whopping 10%?


Augh! The population never has a chance to improve its fitness; just as it tries to get healthy, we flip bits and really screw things up.

One other quick experiment: do what extent does the crossover probability matter? Let's turn it down to 0.5%, so that crossover happens about 1 in 200 matings. We'll increase the number of generations to 500:


Neat! Maybe I'm reading too much into this, but it's as if improvements in fitness happen in punctuated jumps.

Oh, what does the life of one chromosome over the whole history look like? Going back to our original parameters for mutation and crossover, Mr. Five indeed tends toward all ones (or "all black," reading downward.)




Well, that's it. If you find this interesting, you might want to code some of this stuff up on your own. I suggest a few video clips of neat stuff (e.g. synthetic organisms learning to crawl, Lego bridges...) and good books on these blog posts:

  • g-fav's first attempt to write a GA, in C (May 2007)
  • "virtual schadenfraude" - turn on your speakers! (Oct 2008)
  • Lego bridges and truck backer-uppers (May 2009)
The Mathematica notebook (do whatever you want with this, was written for fun and not error-tested, not responsible for damages...) max_ones.nb - free Wolfram Mathematica Player

Have a good weekend!

Gregg





04 July 2009

Maps on FedEx shipping labels

Hi - 

Among Kevin Kelly's numerous thoughtful blog posts about network effects, the Web, and various design-issues is this brief post about FedEx's use of road maps printed directly on packages.  Check it out.  (One commenter noted that it would be more useful in the cab of the truck on a clipboard, instead of in the back of the truck stuck to a package, but still...)

g