Neural Networks Part 1: Feedingforward and Backpropagation


Last time we looked at the perceptron model for classification, which is similar to logistic regression, in that it uses a nonlinear transformation of a linear combination of the input features to give an output class.

A neural network generalizes this idea by conceiving of a network of nodes which each take as an input a linear combination of the outputs of other nodes, then use some activation function to turn this into an output, such as:

A Feedforward Neural Network

A neural network is just a directed graph. We will be looking at feedforward neural networks, which are acyclic.

Each node is called a neuron. There are three main kinds of neurons:

  • Input neurons, one neuron for each feature, so if our input were vectors in \mathbb{R}^3, we would have 3 input neurons. These neurons have trivial activation functions and no incoming edges.
  • Output neurons, one neuron for each output feature, so if our so if our output were vectors in \mathbb{R}^3, we would have 3 output neurons. These will typically have a nonlinear activation function, chosen so that the output range matches our training data. These neurons have no outgoing edges.
  • Hidden neurons are in between input and output neurons and have both incoming and outgoing edges.

Since the graph is acyclic, we can think of it as being composed of layers, as in the following picture taken from wikipedia:

although note there can be many hidden layers. Each edge has a weight, a real number.

The neural network makes a prediction by feeding forward the input. To describe this we develop some notation. The discussion that follows is indebted to this very helpful page.


Let N be a neural network and G index the vertices of N so that each N_i is a vertex for i \in G.

For each i \in G let T_i denote the set of indices of the targets of N_i. That is the set of indices X \subset G such for each j \in X there exists a directed edge e_{ij} from N_i \to N_j. We may also conflate T_i with the set of indexed neurons if the context makes this unambiguous.

Similarly, for each i \in G let S_i denote the set of indices of the sources of N_i.

For each index i, let f_i: \mathbb{R} \to \mathbb{R} denote the activation function of N_i.

For each index i, let O_i(x) \in \mathbb{R}:=f_i(x) be the output of N_i. When the argument x is understood, we may simply write O_i.

For each directed edge e_{ij} define w_{ij} \in \mathbb{R} to be the weight of the edge.

Where the outputs are understood, for each index i, define the input I_i to be \sum \limits _{j \in S_i}w_{ji}O_j, the weighted sum of the outputs of the previous layer.

Feeding Forward

Feeding forward refers to how an input becomes an output or prediction.

As you might have guessed: feeding forward an input vector x, the input and output to each input neuron is just the corresponding component of the vector x.

From there the input to a neuron in the first hidden layer is simply the sum of the components of x, weighted by the weights of the incoming edges to that hidden node.

You repeat this for each hidden node. Then the inputs to the hidden nodes are transformed via their activation functions to their outputs. The weighted sums of these outputs are then passed on as inputs to the next layer, and so on.

This is probably best understood through a simple example.

nn1Here we have 4 neurons. The input xis two dimensional, so we have two input neurons shown in blue. We have a single hidden layer with just one hidden neuron, shown in green. Finally we have a single output neuron shown in orange. Hence the output y is one dimensional.

Choosing the Weights: Backpropagation

So we have seen how a neural network can make predictions, but how can we choose the weights so that the predictions are any good? The answer is the same as in logistic regression, we use gradient descent. In order to do this we need to use differentiable activation functions and choose an appropriate error function to minimize.

If x \in \mathbb{R}^k is the input vector, and y \in \mathbb{R}^n is the target vector, let f_N: \mathbb{R}^k \to \mathbb{R}^n be the prediction of the neural network. Then we define the error to be e(x)= \frac{1}{2}(f_N(x)-y)^2. This is the natural Euclidean norm of the difference between the target and the prediction. The factor of a half is there for aesthetical reasons, since we will be differentiating and gaining a factor of 2.

The trick to the algorithm is to figure out how the weights contribute to the error, and backpropagation is an elegant way of assigning credit/blame to each node and edge. We fix an input x, and write e for e(x), and for each each edge e_{ij} we are seeking \frac{\partial e}{\partial w_{ij}}.  Once we have found this, we set w_{ij} \mapsto w_{ij} - \lambda \frac{\partial e}{\partial w_{ij}}, where \lambda is the learning rate.  So we are doing stochastic gradient descent as in the perceptron.

It is clear that an edge e_{ij} and its weight w_{ij} contribute to the error through the target N_j‘s input. So it makes sense to divide and conquer using the chain rule, that is

\frac{\partial e}{\partial w_{ij}} = \frac{\partial e}{\partial I_j} \cdot \frac{\partial I_j}{\partial w_{ij}}.

The first part of the product is worth giving a name,  we denote it by E_j:=\frac{\partial e}{\partial I_j} and call it the error at the node.

The second part of the product is simple to calculate:

\frac{\partial E_j}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}} \sum \limits _{k \in S_j}w_{kj}O_k = O_i

Returning to E_j, we can calculate it recursively, assuming the errors for the neurons in T_j have already been calculated.

Observe that e can be thought of as a function of the variables O_k for k indexing any layer, and so using the chain rule we have

E_j = \frac{\partial e}{\partial I_j}

= \sum \limits_{k \in T_j} \frac{\partial e}{\partial I_k}\cdot \frac{\partial I_k}{\partial I_j}

= \sum \limits_{k \in T_j} E_k \cdot \frac{\partial I_k}{\partial I_j}

=\sum \limits_{k \in T_j} E_k \cdot \frac{\partial I_k}{\partial O_j} \cdot \frac{\partial O_j}{\partial I_j}

=f_j'(I_j)\sum \limits_{k \in T_j} E_k \cdot \frac{\partial I_k}{\partial O_j}

=f_j'(I_j)\sum \limits_{k \in T_j} E_k \cdot \left ( \frac{\partial}{\partial O_j} \sum \limits _{l \in S_k}w_{lk}O_l \right )

=f_j'(I_j)\sum \limits_{k \in T_j} E_k \cdot w_{jk}

notice that if we assume that each layer is fully-connected to the next, then we could write the above succinctly as an inner product, and inner products can be calculated very efficiently, for instance using a GPU, so it is often faster to have more connections!

Together this gives

\frac{\partial e}{\partial w_{ij}}=O_i \cdot f_j'(I_j)\cdot\sum \limits_{k \in T_j} E_k \cdot w_{jk}.

Since we have calculated the error at a node recursively, we had better deal with the base case –  the output neurons. But this follows simply from the definition of  e to be E_i=f'(I_i)(O_i-y_i), with the factor of 2 cancelling with the half as promised.

Bias Neurons

The final piece of the puzzle are called bias neurons, which I had left out before to simplify the discussion. Each layer, except the output layer, has a bias neuron. A bias neuron has no inputs, and its output is always 1. Effectively it just adds the weight of its outgoing edges to the inputs of the next layer.

These behave just like regular neurons, only when updating their weights, we note that since their input is effectively constant, they have \frac{\partial I_j}{\partial w_{ij}}=1, and so we need only calculate the error at a bias neuron to update its weights.

Next Time

In the next installment I will show how to compactly represent the process of feeding forward and backpropagation using vector and matrix notation assuming the layers are fully connected. We will then use this to give a Python implementation.

Enter the Perceptron


Some time ago in my post on logistic regression, I talked about the problem where you have a n \times p matrix X representing n data points with p features, and a vector y \in \{0,1 \}^n of classifications.

In linear regression you would produce a vector \beta \in \mathbb{R}^p so that X\beta \approx y, but the problem here is that our output could be any real number, not \{0,1\}. The solution we investigated was using the logistic link function p(x) = \frac{1}{1+\exp(-x)} which turns any real number into something we interpret as a probability.

That was one solution, another even simpler one is the function defined s(x) = \left \{\begin{array}{cc} 1 & x \ge 0 \\ -1 &x<0\end{array} \right..  It is convenient in this case to think of the response as taking values in \{-1,1 \} not \{0,1\}.

Previously we added a column of ones to X in order to create an intercept term, here we will write it separately so so that X \mapsto X\beta + b where b \in \mathbb{R}^n is called the bias.

This post will take the book ‘The Elements of Statistical Learning’ as its guide (the section starting on p. 130), which is freely available online, but there is also a great blog post about it here.

The Algorithm

So define f(x):=\beta^Tx+b, we classify a point as being in the positive class if f(x)>0 and negative if f(x) <0. Geometrically we have an affine hyperplane L:=\{x:f(x)=0 \}, and points are classified depending on whether they are ‘above’ or ‘below’ the hyperplane.

The unit normal u to L is given by \beta/ \Vert \beta \Vert, points above L then mean points with positive projection onto this vector.

The signed distance of a point x to L is found by projecting the vector x-x_0 onto u, where x_0 is the ‘origin’ of the hyperplane (since an affine hyperplane is a translation of a linear subspace, there is a point corresponding to the translation of the origin). So we get

u^T(x-x_0)= \beta^T/ \Vert \beta \Vert (x-x_0) = (\beta^Tx+b)/ \Vert \beta \Vert = f(x)/ \Vert \beta \Vert = f(x)/ \Vert f'(x) \Vert,

notice this means that f(x) is proportional to the signed distance of x from L.

If M is the set of misclassified points, then we can measure the error as the sum of the distances (up to a constant of proportionality), of the misclassified points to L, that is

D(\beta, b):=-\sum \limits _{x \in M}y_x f(x),

where y_x is the response of x, notice that this makes each summand negative, which is reversed by the minus sign outside.

If we fix M we can differentiate with respect to \beta and b, giving

\frac{\partial}{\partial \beta}D(\beta, b)= -\sum \limits _{x \in M}y_xx,


\frac{\partial}{\partial b}D(\beta, b)= -\sum \limits _{x \in M}y_x.

Just like in logistic regression, we can use these partial derivatives to search for a minimum of D. The algorithm purposed by Frank Rosenblatt, the inventor of the perceptron, is to modify the hyperplane after looking at each individual misclassified item in M as it comes, that is (\beta, b) \mapsto (\beta,b) + \lambda(y_xx,y_x), where \lambda >0 is the learning rate. This has the advantage of being much faster than computing the global gradient informed by all of the data, and is an example of stochastic gradient descent.

Of course by changing the hyperplane as we go we may change M, and so what we actually do is iterate over each datapoint in a fixed order, and check if it is misclassified under the latest hyperplane, then update and move onto the next point. This is then very easy to code.

The Code

First let’s recycle some code we used for logistic regression to get some test data.

import numpy as np
from scipy import linalg
import random as rd
import matplotlib.pyplot as plt
def test_set(a,b, no_samples, intercept, slope, variance, offset):
    #First we grab some random x-values in the range [a,b].
    X1 = [rd.uniform(a,b) for i in range(no_samples)]
    #Sorting them seems to make the plots work better later on.
    #Now we define our linear function that will underly our synthetic dataset: the true decision   boundary.
    def g(x):
        return intercept + slope*x
    #We create two classes of points R,B. R points are given by g(x) plus a Gaussian noise term with positive mean,
    #B points are given by g(x) minus a Gaussian norm term with positive mean.
    X2=[] #This will hold the y-coordinates of our data.
    Y=[]  #This will hold the classification of our data.
    for x in X1:
        if R_turn:
            x2 += rd.gauss(offset, variance)
            x2 -= rd.gauss(offset, variance)
        R_turn=not R_turn
    #Now we combine the input data into a single matrix.
    X1 = np.array([X1]).T
    X2 = np.array([X2]).T
    X = np.hstack((X1, X2))
    #Now we return the input, output, and true decision boundary.
    return [ X,np.array(Y).T, map(g, X1)]

Fairly straightforward stuff I hope! If not then it should make sense when we plot it.

X,Y,g = test_set(a=0, b=5, no_samples=200, intercept=10, slope=0, variance=0.5, offset=1)
fig, ax = plt.subplots()
R =  X[::2] #Even terms
B =  X[1::2] #Odd terms

ax.scatter(R[:,0],R[:,1],marker='o', facecolor='red', label="Outcome 1")
ax.scatter(B[:,0],B[:,1],marker='o', facecolor='blue', label="Outcome 0")
ax.plot(X[:,0], g, color='green', label="True Decision Boundary")
ax.legend(loc="upper left", prop={'size':8})

This will produce something like:


Now to implement the algorithm.

def perceptron(X,y, learning_rate=0.1, max_iter=1000):
    #Expecting X = n x m inputs, y = n x 1 outputs taking values in -1,1
    weight = np.zeros((m))
    bias = 0
    index = list(range(n))
    while n_iter <= max_iter:
        for row in index:
            if (X[row,:].dot(weight)+bias)*y[row] <=0: #Misclassified point.
                weight += learning_rate*X[row,:]*y[row]
                bias += learning_rate*y[row]
    return {'weight':weight, 'bias':bias}

we shuffle the order we take the data points in at each step to get rid of any peculiarities in the order we update. Now let’s try it on our test data.

result= perceptron(X,Y,0.1)
def decision_boundary(x, weight, bias):
    return (-bias - x*weight[0])/weight[1]

fig, ax = plt.subplots()
R =  X[::2] #Even terms
B =  X[1::2] #Odd terms

ax.scatter(R[:,0],R[:,1],marker='o', facecolor='red', label="Outcome 1")
ax.scatter(B[:,0],B[:,1],marker='o', facecolor='blue', label="Outcome -1")
ax.plot(X[:,0], g, color='green', label="True Decision Boundary")
ax.plot(X[:,0], [decision_boundary(x, result['weight'], result['bias']) for x in X[:,0]],
        label='Predicted Decision Boundary')
ax.legend(loc="upper left", prop={'size':8})


perceptron_test_resultsNot bad! Get the code here.

Snakes on a Manifold: Part 2

Welcome back. Last time we implemented the main components of the game. This time we will implement the main game loop. The job of the loop is to: get keyboard input, update the game state, update the display.

We will now be using pygame, so you might want to check out a tutorial before proceeding, but I think the best way to learn is just to dive right in. You will however need to install pygame before running the code. If you haven’t already, you should download the code from last time too. Finally if you want to skip to playing with the code yourself, here it is.

The Code
Ok here it is.

import pygame
import snake as sk
from pygame.locals import *

class Game(object):
    def __init__(self, size, topology, speed=6):
        #Resolution in pixels.
        self.screen_resolution =600
        self.speed = speed #Controls how fast the snake moves.
        self.screen = pygame.display.set_mode((self.screen_resolution,
        pygame.display.set_caption('SNAKES ON A MANIFOLD!')

        self.board = sk.Board(size, topology)
        #Put the snake in the center.
        self.snake = sk.Snake([size/2, size/2], self.board.opposite_direction,
        #Number of grid cells.
        #Width in pixels of a grid cell.
        #Now we name some colors in RGB for ease of use. = (0,0,0) = (255,0,0) = (0,255,0),0,255)
        self.snake_colour =

        #Initialises screen to background color.

        #Then we draw the boundary cells.

        #This command updates the changes we have made to the screen.

        #Main game loop.

    def main(self):
        while True: # main game loop
            #Displays score in window caption.
            pygame.display.set_caption('Topology: '+self.board.topology+'  |  Score: '+ str(score))
            #Gets keyboard input
            for event in pygame.event.get():
                if event.type == QUIT:
                    #User has closed window.
                if event.type==pygame.KEYDOWN:
                    #User has pressed key.
                    if event.key==K_q:
                    if event.key==K_UP:
                    if event.key==K_DOWN:
                    if event.key==K_LEFT:
                    if event.key==K_RIGHT:
            if Quit:
            #To see if anything has happened, we check where the head is.
            head_loc = self.snake.head_position
            #Now we find out what, apart from the head, is in the grid cell.
            head_on = self.board.grid[head_loc[0], head_loc[1]]

            if (head_on == 1) or (head_on==2):
                #The snake has hit the boundary or part of itself.
            if head_on==3:
                #The snake has found an apple.

            if first:
                #When the game starts, the snake stands still waiting
                #for user input to start the game.
                while True:
                    for event in pygame.event.get():
                        if event.type == QUIT:
                        if event.type==pygame.KEYDOWN:
                            if event.key==K_q:
                            if event.key==K_UP:
                            if event.key==K_DOWN:
                            if event.key==K_LEFT:
                            if event.key==K_RIGHT:
                    if Quit:
                    elif got_direction:
                if Quit:

            #This pauses the game to slow it down, adjust to make faster/slower.
        #Waiting to quit after died.
        while True: # main game loop
        #Gets keyboard input

            for event in pygame.event.get():
                if event.type == QUIT:
                if event.type==pygame.KEYDOWN:
                    if event.key==K_q:
                    if event.key==K_r:
            if Quit:
            if restart:
        if restart:
            G=Game(self.size, self.topology)

    def draw_boundary(self):
        #Draws any boundary grid cells.
        for row in range(self.size):
            for col in range(self.size):
                if self.board.grid[row, col]==1:
                    rect=pygame.Rect(x,y, self.square_size, self.square_size)
                    pygame.draw.rect(self.screen,self.boundary_colour, rect)

    def draw_snake(self):
        #We add a snake colored square where the head has moved,
        #and add a background colored square where the tail was.

        background_rect=pygame.Rect(left,top, self.square_size, self.square_size)
        pygame.draw.rect(self.screen, self.background_colour,background_rect)

        snake_rect=pygame.Rect(left,top, self.square_size, self.square_size)
        pygame.draw.rect(self.screen, self.snake_colour,snake_rect)

    def draw_apple(self, x, y):
                               self.square_size, self.square_size)
        pygame.draw.rect(self.screen, self.apple_colour, apple_rect)

    def update_board(self):
        #Updates the numpy array of the board.
        old_tail_position = self.snake.segments[0].previous_position
        self.board.grid[old_tail_position[0], old_tail_position[1]]=0
        self.board.grid[new_head_position[0], new_head_position[1]]=2

And we’re done! Now you can play a game by typing something like this.

G=Game(20, 'Mobius Strip')

Please let me know what you think. The code is here. If anyone is feeling ambitious, you could add a menu screen where you could select a speed and topology.

Snakes on a Manifold: Part 1

Have you every wondered what it would be like to live on a Mobius Strip? I was playing around with some polygonal schemas for various topologies awhile back and I was reminded of the game ‘Snake’ which I used to play on my dad’s phone when I was a kid. I realized that the game was set on a torus. So for fun, and to illustrate a talk I was giving, I decided to implement this game in python using a module called pygame where you could play with a variety of topologies.

It looks like this:

In this post we will implement the underlying logic of the game, and in the next post we will implement the input/output part with pygame. If you just want the code, click here.

The Code

We will create a Snake class, and a Snake will be composed of Snake_Segment objects. Each Snake_Segment is responsible for its behind, the Snake_Segment immediately behind it, so we have a sort of linked list.

Let’s start with the Snake_Segment.

class Snake_Segment(object):
    def __init__(self, initial_position, behind=None):
        self.position=initial_position #Position on the board.
        self.previous_position = initial_position #Keep track of to tell tail to move there.
        self.direction=None #This is for the head segment.
        if behind == None:
            self.behind=self #The tail of the snake.

    def move(self, target_coords):
        self.previous_position = self.position
        self.position = target_coords
        #Once the segment moves, it passes along the message to its behind, along with its
        #previous coordinates.
        if not self.behind==self: #Checks that it is not the tail of the snake.

Now we can build a Snake. Its actions are to move and grow.

class Snake(object):
    def __init__(self, initial_position, opposite_directions, get_next_coords):
        #It is important to keep track of opposite_directions
        #because unless the Snake always moves forwards.
        #The head is the segment at the end of self.segments.
        #The snake keeps track of the head's board coordinates.
        self.opposite_direction = opposite_directions
        #We pass the snake a function from the Board class to tell it where to move next.
        self.get_next_coords =get_next_coords

    def move(self, direction=None):
        if direction==None:
            #This is the default action, the snake keeps going in the same direction.
            direction = self.direction
        elif self.direction==None:
            #This happens at the beginning before its first movement.
        elif direction == self.opposite_direction[self.direction]:
            #Do nothing in this case because the snake only moves forwards.
            direction = self.direction
            #The snake turns.
        #Now we get new coordinates for the head based on the direction.
        target_coords, self.direction = self.get_next_coords(self.head_position,
        #We then tell the head to move, and the command propagates down to the tail.
        self.head_position = target_coords

    def grow(self):
        #We grow by adding a segment to the front of the snake,
        #becoming the new head.
        self.length +=1

So now we have a Snake it needs somewhere to sliver, what we will call a Board. This is the most complicated part of the program because it is tasked with incorporating the topological information. This is handled by its get_next_coord method. Its other duty is to generate apples.

class Board(object):
    def __init__(self, size, topology):
        self.size =size
        #The board will be self.size x self.size squares.
        self.topology = topology #A string like 'Disc', 'Torus', etc.
        self.opposite_direction = {'UP':'DOWN', 'DOWN':'UP', 'LEFT':'RIGHT',
    def init_topology(self):
        #The topological information is contained in self.edges, which tells the board
        #which edges are boundary, which edges are identified, and with what orientations.
        #self.edges is a dictionary with keys 'LEFT', 'RIGHT' , 'UP', 'DOWN',
        #and having values of the form ('edge name or boundary', -1/0/1) where -1 represents
        #identification with reversed orientation, 1 identification with orientation preserved and
        #0 for the boundary case.

        if self.topology =='Disc':
            self.edges ={'LEFT':('BOUNDARY',0), 'RIGHT':('BOUNDARY',0),
                         'UP':('BOUNDARY',0), 'DOWN':('BOUNDARY',0)}
        elif self.topology=='Torus':
            self.edges ={'LEFT':('RIGHT',1), 'RIGHT':('LEFT',1),
                         'UP':('DOWN',1), 'DOWN':('UP',1)}
        elif self.topology=='Sphere':
            self.edges ={'LEFT':('UP',1), 'RIGHT':('DOWN',1),
                         'UP':('LEFT',1), 'DOWN':('RIGHT',1)}
        elif self.topology=='Mobius Strip':
            self.edges ={'LEFT':('BOUNDARY',0), 'RIGHT':('BOUNDARY',0),
                         'UP':('DOWN',-1), 'DOWN':('UP',-1)}
        elif self.topology=='Klein Bottle':
            self.edges ={'LEFT':('RIGHT',-1), 'RIGHT':('LEFT',-1),
                         'UP':('DOWN',1), 'DOWN':('UP',1)}
        elif self.topology=='Projective Plane':
            self.edges ={'LEFT':('RIGHT',-1), 'RIGHT':('LEFT',-1),
                         'UP':('DOWN',-1), 'DOWN':('UP',-1)}

    def set_grid(self, coords, value):
        self.grid[coords[0], coords[1]]=value

    def grid_init(self):
        grid=np.zeros(shape=(self.size, self.size))
        #On the grid, 0s represent empty, 1s represent boundary, 2s represent snake and 3s apples.
        if self.edges['LEFT'][0]=='BOUNDARY':
            grid[::,0] = np.ones(shape=self.size)
        if self.edges['RIGHT'][0]=='BOUNDARY':
            grid[::,-1] = np.ones(shape=self.size)
        if self.edges['UP'][0]=='BOUNDARY':
            grid[0] = np.ones(shape=self.size)
        if self.edges['DOWN'][0]=='BOUNDARY':
            grid[-1] = np.ones(shape=self.size)

    def generate_apple(self):
        x=random.randint(0, self.size-1)
        y=random.randint(0, self.size-1)
        if self.grid[x,y]==0: #Makes sure the square is not already occupied.
            return [x,y]
            return self.generate_apple()

    def get_next_coord(self, coord, direction):
        #First handles how to move in a direction without worrying about boundaries.
        if direction == 'UP':
            change_coord = [-1,0]
        elif direction =='DOWN':
            change_coord = [1,0]
        elif direction == 'LEFT':
            change_coord = [0,-1]
        elif direction == 'RIGHT':
            change_coord = [0,1]
            print 'Invalid direction' #Just in case.
        #Calculate new coordinates optimistically.
        potential_new_coords = [coord[i] + change_coord[i] for i in range(2)]

        #First check to see if we are trying to travel through an edge.
        if -1 in potential_new_coords:
        elif self.size in potential_new_coords:
        if not check:
            #We are in the interior so no problems.
            return [potential_new_coords, direction]

        #Now we know we are traveling through some edge,
        #and the direction tells us which.
        edge, orientation = self.edges[direction]
        #If the snake emerges from the LEFT edge, it will then be moving in the RIGHT direction,
        #and so on.
        new_direction = self.opposite_direction[edge]
        #We automatically know one coordinate because we know the new edge.
        if edge=='BOUNDARY':
            print 'Error, boundary case should not happen here.'
        elif edge == 'LEFT':
        elif edge =='RIGHT':
        elif edge == 'UP':
        elif edge == 'DOWN':
            print 'Edge invalid'
        #Now we figure out the other coordinate, which depends on either
        #the old row or column, depending on whether the new edge is opposite
        #or perpendicular.
        if edge == self.opposite_direction[direction]:
        #Find out whether we want the old row or old column.
        if left_or_right:
            if opposite:
                coord_component = coord[0]
                coord_component = coord[1]
            if opposite:
                coord_component = coord[1]
                coord_component = coord[0]
        #Now we find out whether the the orientation is flipped.
        if orientation==-1:
            coord_component = self.size-1-coord_component
        if left_or_right:
        return [potential_new_coords, new_direction]

And we’re done! Next time we will make this pretty using pygame. If you want to test it before then, you can print the board grid to the console.
The source code is here.

Computing Homology

Last time we introduced abstract simplicial complexes (abs) and how to represent them in Python. Today we will look at the homology groups of an abs.

For an excellent primer on homology I recommend checking out this post , I will just quickly recap the basics.

Homology: A quick reminder

Let X be an abs.

Write X_i for the i^{th}dimensional skeleton of X, formed by restricting to those elements of X with dimension \le i.

If \sigma \in X is a p+1-simplex with vertices labelled 0, \dots, p we may write \sigma = [0, \dots, p] to denote this. To denote the p-simplex consisting of the vertices of \sigma except the i^{th} vertex we write [0, \dots, \hat{i}, \dots p].

For each dimension p we write N_p for the set of p-dimensional simplices, and write n_p = |N_p| for its cardinality.

For each dimension p we write C_p for the pdimensional chain group given by the group presentation C_p =<N_p |[x,y], x=x^{-1},\ \forall \, x,y \in N_p >. What this says is that the generators of the group are the p-simplices, the group is abelian and each element is its own inverse.

An alternative way to think of this is as a vector space, where C_p has basis N_p and is over the field \mathbb{Z}_2 (mod 2 arithmetic). So every element can be represented as a binary vector of length n_p, and addition is bitwise xor.

Now we define boundary maps \delta_{p+1}: C_{p+1} \to C_p. These are both group homomorphisms and linear maps. For a p+1-simplex \sigma = [0,1, \dots, p] define:

\delta_{p+1}\sigma = \sum \limits _{i=0}^{p}[0, \dots, \hat{i}, \dots p], the sum of its p-dimensional faces.

Since N_{p+1} is a a basis for C_{p+1} this determines a linear mapping.

Define Z_{p+1} := ker \, \delta_{p+1} called the p+1dimensional cycle group (subspace).

Define B_p: = Im \, \delta_{p+1} called the pdimensional boundary group (subspace).

Now we can define the pdimensional homology group H_p:= \frac{Z_p}{B_p}.

This is both a quotient group and a quotient vector space.

Finally define the p^{th} Betti number B_p:= \text{rank }H_p = \text{rank } Z_p - \text{rank }B_p.


We are going to be interested in the Betti numbers as these can be thought of as counting the p-dimensional holes in our space. For B_0 this just means the number of connected components.

In order to calculate the ranks of interest, we must construct the matrices representing the boundary mappings. In order to do this we must agree upon an ordering of the simplices. So we simply use the one generated by the data structure, which is why each simplex has an index attribute.

To find the rank of C_p, which we have called n_p, we simply count the number of p-simplices. So we can add that feature to our SimplicialComplex class easily.

    def chain_rank(self,p):
        return len(self.n_simplex_dict[p])

Now the matrix of \delta_p: C_p \to C_{p-1} will be have n_{p-1} rows, one for each p-1-simplex, and n_p columns, one for each p-simplex. Then the i^{th} column of the matrix will contain the coefficients of the image of the i^{th} \ p-simplex expressed in terms of our basis for C_{p-1}.

So to find the i^{th} column vector, we ask each of a simplex’s boundary faces for their index. These will be the positions of the 1s in the column.

    def boundary_column(self,simplex):
        column_size = self.chain_rank(simplex.dimension-1)
        faces = simplex.faces
        column = [0 for i in range(column_size)]
        for face in faces:
        return column

Now we simply combine these columns to form the boundary matrix. Note that for p greater than the dimension of the simplicial complex, C_p=0 the trivial group, and similarly for negative p. We don’t have to calculate anything in this case.

This also means that the kernel of \delta_0 is C_0. This is handled by us having the empty set as a kind of -1-dimensional simplex.

    def get_boundary_matrix(self,p, skeleton=None):
        if skeleton==None:
        if ((p>skeleton)or(p<0)):
            return np.asarray([])
        mat=[self.boundary_column(simplex) for simplex in self.n_simplex_dict[p]]
        return np.asarray(mat, dtype=int).transpose()

The skeleton argument is used when you want to only consider simplices up to a certain dimension. For instance if X is given by [0,1,2] then X is a representation of a triangle. Then the zeroth Betti number B_0=1 as there is one connected component. However the zeorth Betti number of X_0 is 3, one for each vertex.

Let’s test what we’ve got so far. In this post Jeremy Kun computes the homology of the tetrahedron with an ‘arm’, he provides a nice picture of it here. If we label the vertices of the tetrahedron 0,1,2,3, then this structure can be specified with the maximal simplices [0,1,2,3], [0,4], [2,4 then restricting to the 2-skeleton, since we want there to be a hole in the tetrahedron.

Let’s see what the boundary matrices are:

def jeremy_example():
    maximal_simplices = [[0,1,2,3],[0,4],[2,4]]
    S = SimplicialComplex(maximal_simplices)
    S.plot(direction='up', label_color='blue')
    for i in range(3):
        print "Boundary matrix of dimension ", i
        print S.get_boundary_matrix(i,skeleton=2)


Which produced the following plot (of the 3-skeleton):


And outputted to the console:

Boundary matrix of dimension 0
[[1 1 1 1 1]]
Boundary matrix of dimension 1
[[1 1 1 0 0 0 1 0]
[1 0 0 1 1 0 0 0]
[0 1 0 1 0 1 0 1]
[0 0 1 0 1 1 0 0]
[0 0 0 0 0 0 1 1]]
Boundary matrix of dimension 2
[[1 1 0 0]
[1 0 1 0]
[0 1 1 0]
[1 0 0 1]
[0 1 0 1]
[0 0 1 1]
[0 0 0 0]
[0 0 0 0]]

If you check his post, you will see that the matrices are the same, up to a reordering of basis, and noticing that 1 \equiv -1 \ (mod \ 2)

So far so good. Now all that remains is to calculate the rank and nullity of each matrix. But this is easily done, we simply diagonalize the matrix and count the pivots. For \delta_{p+1} the number of pivots is equal to \text{rank } B_p and n_{p+1} - \text{rank }B_p = \text{ rank } Z_{p+1} by the rank-nullity theorem.

The form we will put the matrix in is called Smith-Normal form I won’t go through the details of how the algorithm works, they are on the wiki page. If you have seen Gaussian elimination before this is basically that.

    def reduce_matrix(self, matrix):
        #Returns [reduced_matrix, rank, nullity]
        if np.size(matrix)==0:
            return [matrix,0,0]

        def _reduce(x):
            #We recurse through the diagonal entries.
            #We move a 1 to the diagonal entry, then
            #knock out any other 1s in the same  col/row.
            #The rank is the number of nonzero pivots,
            #so when we run out of nonzero diagonal entries, we will
            #know the rank.
            #Searching for a nonzero entry then moving it to the diagonal.
            for i in range(x,m):
                for j in range(x,n):
                    if matrix[i,j]==1:
                if nonzero:
            #Knocking out other nonzero entries.
            if nonzero:
                for i in range(x+1,m):
                    if matrix[i,x]==1:
                        matrix[i,:] = np.logical_xor(matrix[x,:], matrix[i,:])
                for i in range(x+1,n):
                    if matrix[x,i]==1:
                        matrix[:,i] = np.logical_xor(matrix[:,x], matrix[:,i])
                #Proceeding to next diagonal entry.
                return _reduce(x+1)
                #Run out of nonzero entries so done.
                return x
        return [matrix, rank, n-rank]

    def boundary_rank(self,p, skeleton=None):
        if skeleton==None:
        if (p >= skeleton) or (p<0):
            #Trivial case.
            return 0
            return self.reduce_matrix(self.get_boundary_matrix(p+1, skeleton))[1]

    def cycle_rank(self, p, skeleton=None):
        if skeleton==None:
        if p==0:
            #Mapping to trivial group so kernel is everything in this case.
            return self.chain_rank(0)
        elif (p>skeleton) or (p<0):
            #Trivial case.
            return 0
            return self.reduce_matrix(self.get_boundary_matrix(p, skeleton))[2]

    def betti_number(self, p, skeleton=None):
        if skeleton==None:
        return self.cycle_rank(p,skeleton) - self.boundary_rank(p, skeleton)

Now let’s just test it out on the same example. We expect B_0=1 since there is one connected component, B_1=1 from the 1-dimensional hole made by the arm and B_2=1 for the ball inside the tetrahedron’s shell.

def jeremy_example(skeleton=2):
    maximal_simplices = [[0,1,2,3],[0,4],[2,4]]
    S = SimplicialComplex(maximal_simplices)
    S.plot(direction='up', label_color='blue')
    for i in range(4):
        print "Boundary matrix of dimension", i
        print S.get_boundary_matrix(i,skeleton)
        print "Reduced boundary matrix of dimension", i
        print S.reduce_matrix(S.get_boundary_matrix(i,skeleton))[0]
        print "Betti number",i,"=", S.betti_number(i,skeleton)


The console outputs:

Boundary matrix of dimension 0
[[1 1 1 1 1]]
Reduced boundary matrix of dimension 0
[[1 0 0 0 0]]
Betti number 0 = 1
Boundary matrix of dimension 1
[[1 1 1 0 0 0 1 0]
[1 0 0 1 1 0 0 0]
[0 1 0 1 0 1 0 1]
[0 0 1 0 1 1 0 0]
[0 0 0 0 0 0 1 1]]
Reduced boundary matrix of dimension 1
[[1 0 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0]
[0 0 1 0 0 0 0 0]
[0 0 0 1 0 0 0 0]
[0 0 0 0 0 0 0 0]]
Betti number 1 = 1
Boundary matrix of dimension 2
[[1 1 0 0]
[1 0 1 0]
[0 1 1 0]
[1 0 0 1]
[0 1 0 1]
[0 0 1 1]
[0 0 0 0]
[0 0 0 0]]
Reduced boundary matrix of dimension 2
[[1 0 0 0]
[0 1 0 0]
[0 0 1 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]]
Betti number 2 = 1
Boundary matrix of dimension 3
Reduced boundary matrix of dimension 3
Betti number 3 = 0

Just as expected. So we’re done. Get the code here.

Representing an Abstract Simplicial Complex in Python

What’s an abstract simplicial complex?

So what is an abstract simplicial complex? It is a collection of finite sets that are closed under taking subsets.  So if X is an abstract simplicial complex (asc), and S \in X , then every subset of S is also in X.

These are meant to be combinatorial models of geometric simplicial complexes, which are topological spaces given by collections of simplices, like a point, a line segment, a triangle or a tetrahedron. An n-simplex is the convex hull of n+1 affinely independent points in \mathbb{R}^n.

These are important in topology and geometry because they are very easy to understand and compute with, but can be pieced together in clever ways to approximate more complicated objects.

Some terminology

Let X be an abs.

Definition: the vertices of X, written V, are the set of all points in at least one element of X. Writing this formally: V = \bigcup \limits _{S \in X}S.

Definition: An element of X is called a simplex.

Definition: If \sigma is a simplex, and \tau is a subset of \sigma, then we say that \tau is a face of \sigma or that \sigma is a coface of \tau and denote this by \tau \le \sigma.

Definition: If \sigma \in X then the dimension of \sigma is |\sigma|-1.

Definition: If \sigma \in X and the only coface of \sigma is itself, then we call \sigma a maximal simplex.

Notice that an abs is completely determined by its set of maximal simplices.

Representation in Python

We want a convenient way to represent and generate these objects. Firstly we want to create one from a list of maximal simplices given as lists of integers. Secondly we want to map the (co)face relations, and enable us to easily iterate over simplices of a given dimension.

So stage one will be unwrapping the maximal simplices into simplices, placing them in a tree. Stage two will be filling in the face relations, representing the abs as a Hasse diagram, where each node represents a simplex.

Stage 1

In our data structure a node represents a simplex. So we create a new node class for the job.

import tigraphs2 as tig
class Simplex(tig.BasicNode, object):
    def __init__(self, vertices, **kwargs):
        super(Simplex, self).__init__(**kwargs)

        self.vertices = vertices
        self.dimension = len(self.vertices)-1
        #These are used for initializing the complex:
        self._cfaces =[]
        self.__parent = None

Now we want to create a class for a complex, which will inherit from the tree class since the initial structure will be a tree.

class SimplicialComplex(tig.return_tree_class(), object):
    def __init__(self, maximal_simplices, Vertex=Simplex, **kwargs):
        super(SimplicialComplex, self).__init__(Vertex=Vertex, **kwargs)
        self.maximal_simplices = maximal_simplices
        self.dimension = max(map(len,self.maximal_simplices))-1
        self.n_simplex_dict={} #Keeps track of simplexes by dimension.
        for i in range(-1,self.dimension+1):
        #Create the empty set 'root' of the structure.
        #Initialize complex from maximal simplices
        for ms in self.maximal_simplices:

Now we actually need to write the methods, starting with the create_simplex method:

    def create_simplex(self, vertices):
        vertex = self.vertices[-1]
        #It is convienient for the simplex to know its index for later on
        #when we create 'vectors' of simplices.
        index = len(self.n_simplex_dict[vertex.dimension])-1

The way we will form the initial tree is by building up from the empty set, so we need a way of building up sets.

    def create_child(self, simplex, vertex):
        #This method creates a new simplex which is an immediate coface to the argument
        #simplex. It does so by appending the new vertex to the old simplex's vertices.

        #_cfaces keeps track of which of the simplex's cofaces have already been added.

        if vertex in simplex._cfaces:
        child_vertices=[v for v in simplex.vertices]
        self.create_edge(ends=[simplex, child])

We are now ready to write the add_maximal_simplex method. It is a recursive procedure, starting with the list representing the maximal simplex, say [0,1,2], we add create cofaces to the root [] with vertices [0], [1] and [2]. To each of these new simplices, we pass on a list of vertices still to be added. To [0] we pass [1,2], to [1] we pass [2], and to [2] we pass [].

In general at each stage we use each vertex remaining in turn to create a child, and pass to that child the vertices yet to be used, that is the vertices remaining greater in value.

    def add_maximal_simplex(self,vertices, simplex=None, first=True):
        #We start at the root.
        if first:
            simplex = self.get_root()

        if len(vertices)>=1: #Condition to terminate recursion.
            for index in range(len(vertices)):
                vertex = vertices[index]
                self.create_child(simplex, vertex)

That’s stage 1 complete. Let’s improve the default plot method for the graph class and see if the code is working.

    def plot(self,margin=60):
        A = self.get_adjacency_matrix_as_list()
        g = ig.Graph.Adjacency(A, mode='undirected')
        for vertex in self.vertices:
            if vertex.label !=None:
                #print vertex.label
        #The layout is designed for trees, so we tell it to set the empty set as the root.
        layout = g.layout("rt", root=0)
        visual_style = {}
        visual_style["vertex_size"] = 20
        visual_style["vertex_label_dist"] =1
        #We want the root at the bottom of the plot.
        ig.plot(g, layout=layout, margin=margin, **visual_style)

Great! Let’s try it out on the tetrahedron. This defined by the maximal simplex [0,1,2,3].

def test(n):
    maximal_simplices = [list(range(n))]
    Sn = SimplicialComplex(maximal_simplices)

Looking good! Notice that the levels of the tree correspond to simplices of similar dimension and that each simplex’s vertices are in numeric order.

Also notice that many of the (co)face relations are missing, so now we fill them in.

Stage 2
First we add a useful method for accessing a simplex by its list of vertices. This is why we have arranged it so that the vertices are always given in ascending order, so we know exactly what each simplex’s address is.

    def get_simplex(self, address, first=True,simplex=None):
        #We recurse up from the root to the desired simplex.
        if first:
            simplex = self.get_root()
        if len(address)==0:
            return simplex
        if address[0] not in simplex._cfaces:
            return None
        return self.get_simplex(address=address[1:],first=False,

Now we need to add in missing faces and cofaces. To do this we add a method to find a given simplex’s immediate faces, its boundary.

    def _boundary(self,simplex):
        vertices = simplex.vertices
        n = len(vertices)
        #Each boundary simplex is found by discarding a single vertex.
        for index in range(n):
        return boundary

Next we add a simple method for creating a (co)face relation.

    def add_face_coface(self, face, coface):
        #Remember that since the tree was constructed from the empty set up,
        #cofaces are the children in the graph.

        if coface._parent != face:
            self.create_edge([face, coface])

Now we simply combine these method to add all of the relations. Recall that in a Hasse diagram, we demonstrate inclusions through transitivity where possible.

    def update_adjacency_simplex(self, simplex):
        boundary_faces = self._boundary(simplex)
        boundary_faces = map(self.get_simplex, boundary_faces)
        for face in boundary_faces:

    def update_adjacency(self):
        for i in range(self.dimension+1):
            for simplex in self.n_simplex_dict[i]:

And we’re done! Phew! Now let’s see what it looks like.

def test(n):
    maximal_simplices = [list(range(n))]
    Sn = SimplicialComplex(maximal_simplices)



Lovely. You can download the code here, remember you will need igraph for plotting, and my tigraph module from here. Next time we will look at computing the homology of an abs.

My First Genetic Algorithm

‘Genetic’ is one of those words that make things cool, some others are topological, chaos or chaotic, fractal, quantum and recursive. That was not intended to be an exhaustive list. So for a while I have wanted to have a go at making a genetic algorithm and I have finally gotten around to it.


How they work

First let me refer you to the wiki, then I’ll quickly talk about the things I think are important. The main idea is to use an search algorithm inspired by evolution to solve a problem.

There are four core ideas.

Parametrization of Solution

To begin with we assume that a solution takes the form of a binary vector of length m. Each component of a solution vector can be thought of as a gene, which can be on or off (1 or 0 resp.).

The vectors I will call individuals, or creatures, or candidate solutions, or vectors. The population is a set of individuals, to begin with we initialize the population randomly or however you like.


We also assume that we have a function f: 2^n: \to \mathbb{R}, this measures an individual’s fitness. Fitness is how well the solution, parametrized by the binary vector, solves your problem. So we will assume that bigger numbers mean better solutions.


Once we have a population, we assess their fitness.  Now we want to create new individuals, hopefully with better fitness. We do this by combing different parent solutions to create child solutions. One simple way to do this is called uniform crossover , a function taking solutions to solutions is called a genetic operator, a binary genetic operator is a crossover. Uniform crossover simply means that the child’s ith gene is equal to the father’s or mothers with equal probability (alert! hetronormative language in use! please evacuate the building! ). But not everybody get’s to be a parent, ideally you want the chance that a creature has a child to depend in some way on its fitness.

One way of selecting parents is called tournament selection. The idea is that you grab k creatures from your population, and the one with the highest fitness is selected. The parameter k then controls selection pressure.


‘Mutation’ also happens to be one of the cool words. In this context it is an unary genetic operator, that we think of as perturbing a solution vector. There are many ways to do this, perhaps to the simplest is to flip each gene from on to off or vice versa with probability p called the mutation rate.

The problem with this can be that it encourages genes to be on where there are few on and off when there are many on. If the number of on genes is part of what is interesting, say if your vector represented variable selection in a machine learning model, then you do not want the mutation to be biased towards a certain number of on genes.

This means that ideally you should use a kind of mutation suitable for your problem, but even if you don’t make the best choice, natural selection will combat biases in the mutation, more or less successfully.

Since I want the expected number of on genes to be the same after mutation, my approach is to, with probability p pick an on gene at random and turn it off and with probability p pick an off gene and turn it on, repeated n times.

Ok I have talked enough about the theory, let’s make one! First we need a problem to solve.

The Problem

I will randomly choose an integer k between 0 and 2^n. Your job is to guess what k is.

The information you are given is that after each guess \phi, I will select N random integers from the same range,  write it as a binary vector, and then I will then tell you how often either:

1. The random integer is both \le \phi and \le k coordinate-wise; or

2. The random integer is both not \le \phi and  not  \le k coordinate-wise.

So I will tell you an integer from 0, \dots, N that describes the size of the subset for which \phi, k are upper bounds and the size of the subset for which \phi, k are lower bounds.

The Code

You can view it statically here, or download the notebook and play with it yourself, or just get the source – happy evolutions!