‘Genetic’ is one of those words that make things sound cool; some others are ‘topological’, ‘chaos’ (or ‘chaotic’), ‘fractal’, ‘quantum’ and ‘recursive’. That was not intended to be an exhaustive list. So for a while I have wanted to have a go at making a genetic algorithm, and I have finally gotten around to it.
How they work
First let me refer you to the wiki; then I’ll quickly talk about the things I think are important. The main idea is to use a search algorithm inspired by evolution to solve a problem.
There are four core ideas.
Parametrization of the Solution
To begin with, we assume that a solution takes the form of a binary vector of length n. Each component of a solution vector can be thought of as a gene, which can be on or off (1 or 0 respectively).
These vectors I will call individuals, or creatures, or candidate solutions, or just vectors. The population is a set of individuals; to begin with, we initialize the population randomly, or however you like.
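A minimal sketch of this setup in Python (the vector length and population size here are illustrative choices of mine, not anything from the post):

```python
import random

N_GENES = 16    # length n of each solution vector (illustrative choice)
POP_SIZE = 50   # number of individuals (illustrative choice)

def random_individual(n=N_GENES):
    """A candidate solution: a list of n genes, each on (1) or off (0)."""
    return [random.randint(0, 1) for _ in range(n)]

# Initialize the population randomly.
population = [random_individual() for _ in range(POP_SIZE)]
```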
We also assume that we have a function f that measures an individual’s fitness. Fitness is how well the solution, parametrized by the binary vector, solves your problem. We will assume that bigger numbers mean better solutions.
Once we have a population, we assess their fitness. Now we want to create new individuals, hopefully with better fitness. We do this by combining different parent solutions to create child solutions. A function taking solutions to solutions is called a genetic operator, and a binary genetic operator is called a crossover. One simple crossover is uniform crossover: the child’s ith gene is equal to the father’s or the mother’s with equal probability (alert! heteronormative language in use! please evacuate the building!). But not everybody gets to be a parent; ideally you want the chance that a creature has a child to depend in some way on its fitness.
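Uniform crossover might be sketched like this (the function name is my own):

```python
import random

def uniform_crossover(mother, father):
    """Each of the child's genes is copied from the mother or the
    father with equal probability."""
    return [random.choice(genes) for genes in zip(mother, father)]
```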
One way of selecting parents is called tournament selection. The idea is that you grab k creatures from your population, and the one with the highest fitness is selected. The parameter k then controls selection pressure.
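Tournament selection is only a few lines; as a rough sketch, assuming a fitness function that maps an individual to a number:

```python
import random

def tournament_select(population, fitness, k=3):
    """Draw k individuals at random and return the fittest of them.
    Larger k means stronger selection pressure."""
    contestants = random.sample(population, k)
    return max(contestants, key=fitness)
```

With k equal to the population size the best individual always wins, while with k = 1 selection is uniformly random, ignoring fitness entirely.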
‘Mutation’ also happens to be one of the cool words. In this context it is a unary genetic operator that we think of as perturbing a solution vector. There are many ways to do this; perhaps the simplest is to flip each gene from on to off, or vice versa, with some probability p, called the mutation rate.
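The per-gene bit-flip mutation could look like:

```python
import random

def bitflip_mutate(individual, rate=0.01):
    """Flip each gene independently with probability `rate`
    (the mutation rate)."""
    return [1 - gene if random.random() < rate else gene
            for gene in individual]
```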
The problem with this can be that it encourages genes to turn on when few are on, and to turn off when many are on. If the number of on genes is part of what is interesting (say, if your vector represents variable selection in a machine learning model), then you do not want the mutation to be biased towards a certain number of on genes.
This means that ideally you should use a kind of mutation suited to your problem. But even if you don’t make the best choice, natural selection will combat biases in the mutation, more or less successfully.
Since I want the expected number of on genes to be the same after mutation, my approach is to, with probability 1/2, pick an on gene at random and turn it off, and otherwise pick an off gene at random and turn it on, repeated a fixed number of times.
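A sketch of that mutation, assuming a fair coin and treating the number of repeats as a parameter:

```python
import random

def balanced_mutate(individual, repeats=1):
    """With probability 1/2 turn a random on gene off, otherwise turn a
    random off gene on; repeat `repeats` times. Away from the all-on and
    all-off edge cases, the expected number of on genes is unchanged."""
    child = list(individual)
    for _ in range(repeats):
        on = [i for i, g in enumerate(child) if g == 1]
        off = [i for i, g in enumerate(child) if g == 0]
        if random.random() < 0.5 and on:
            child[random.choice(on)] = 0
        elif off:
            child[random.choice(off)] = 1
    return child
```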
OK, I have talked enough about the theory; let’s make one! First we need a problem to solve.
I will randomly choose an integer x between 0 and 2^n − 1, so that it can be written as a binary vector of length n. Your job is to guess what x is.
The information you are given is that after each guess g, I will select m random integers from the same range, write each as a binary vector, and then tell you how often either:
1. The random integer is both ≤ x and ≤ g coordinate-wise; or
2. The random integer is both ≥ x and ≥ g coordinate-wise.
So I will tell you an integer from 0 to m that describes the size of the subset for which x and g are upper bounds, plus the size of the subset for which x and g are lower bounds.
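Under this reading, the feedback oracle might be sketched as follows (the function names, the vector length n, and the sample count m are all my own illustrative choices):

```python
import random

def to_bits(x, n):
    """Write an integer as a binary vector of length n (MSB first)."""
    return [(x >> (n - 1 - i)) & 1 for i in range(n)]

def leq(a, b):
    """Coordinate-wise a <= b for binary vectors."""
    return all(ai <= bi for ai, bi in zip(a, b))

def score(guess, secret, n=8, m=20):
    """Out of m random draws, count those that the guess and the secret
    both bound from above, plus those they both bound from below."""
    g, x = to_bits(guess, n), to_bits(secret, n)
    total = 0
    for _ in range(m):
        r = to_bits(random.randrange(2 ** n), n)
        if (leq(r, g) and leq(r, x)) or (leq(g, r) and leq(x, r)):
            total += 1
    return total
```

The score is noisy, but its average grows as the guess gets closer to the secret, which is what makes it usable as a fitness function.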