
Category Archives: python

Understanding scan() in Theano

This is just a short post since the meat is in an IPython notebook. I discuss the scan function in Theano, used for looping, iterating over tensors and expressing recurrences. If you are a complete beginner to theano, check out my previous post. scan seemed a bit obscure when I first started trying to understand it, so I thought I would share my learning process and hopefully speed up yours!

My sources for learning were these two great resources: this and that.

OK check it out here and let me know what you think.

Theano, Autoencoders and MNIST

Recently I have finally upgraded my ancient laptop. Other than the ability for me to play the occasional video game, this means that I now have a dedicated Nvidia graphics card, in particular one that supports something called CUDA.

Now that I have the hardware, I thought that it would be fun to check out this python library theano. As I understand it, it has two very interesting aspects.

One is that it does symbolic computation: it represents variables as in mathematics, along with the manipulations you can perform on them.

A variable in computer science is something like an integer, a string or a list: a piece of information stored somewhere in memory with a name made up by the programmer, where the compiler takes care of associating the human-readable name with the address of the information in memory.

A variable in mathematics is like x in x^2 + x + 1, an indeterminate quantity that can be manipulated using arithmetic operations, or using calculus and so on. So what in mathematics we call a variable corresponds roughly to a function in computer science.

The difference between symbolic variables and regular functions is that the programming language is aware of the mathematical nature of a variable, so that when you apply operations to variables to create new variables, it can do this in an efficient way. The best way to understand this is with a simple example. Let’s suppose we have two functions f(x) = x and g(x) = -x. Now we can combine these functions by adding them to make a new function h(x) = f(x) + g(x).

Now in python say, when you call h with an argument, it will pass this argument to both f and g and then add the result and return it.

If we are doing symbolic computation, and f, g were variables with f = -g, then the variable h = f + g is identically zero. And so the computer does no work at all when using h, because it knows that it must always be zero.

So one important aspect is that symbolic computation can simplify and hence often optimize functions automatically.
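
To see this concretely in theano itself, here is a minimal sketch (assuming theano is installed; exactly how far the optimizer simplifies the graph can depend on the version, but the idea is the same):

import theano as th
from theano import tensor as T
from theano import printing

#Two symbolic expressions that cancel each other out.
x = T.dscalar('x')
f = x
g = -x
h = f + g

#When the function is compiled, theano's optimizer can simplify x + (-x),
#so evaluating h does essentially no work.
h_fn = th.function([x], h)
print h_fn(3.0)            #0.0
printing.debugprint(h_fn)  #inspect the (optimized) graph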

Another is that theano can automagically differentiate variables with respect to variables! This might not sound like a big deal, but it is. In machine learning, we often create a complicated function, like in logistic regression or neural networks, and then derive a helpful form for the derivative to do gradient descent to optimize it. In theano, all you have to do is specify the objective function, and it will let you do gradient descent automatically.
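
As a small taste of this, here is a sketch (the variable names are made up, and this isn't code used later in the post) of getting the gradient of a logistic regression negative log-likelihood with a single call to T.grad:

import theano as th
from theano import tensor as T

#Symbolic data, labels and weights for logistic regression.
X = T.dmatrix('X')
y = T.dvector('y')
w = T.dvector('w')

#Probability of class 1 and the negative log-likelihood.
p = T.nnet.sigmoid(T.dot(X, w))
nll = -T.mean(y*T.log(p) + (1 - y)*T.log(1 - p))

#theano derives the gradient for us; no pencil-and-paper differentiation needed.
grad_w = T.grad(nll, w)
grad_fn = th.function([X, y, w], grad_w)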

The second amazing thing about theano is that it compiles its theano functions into C, and so it often runs much much faster than native python or numpy. It can also make it easier to run computations on a gpu which can mean a 100x speedup for the right cases!

Hopefully I’ve made you curious, so click here for an installation guide. On my Linux machine it was very simple. I am dual booting Windows 8.1 and Ubuntu at the moment; it is very tricky to get the GPU aspect to work on Windows, but here is a guide for the brave. To check that the GPU aspect is working, run the test code here.

Introduction to Theano

I recommend that you read the tutorials here, and also this excellent ipython worksheet here.

Let’s play a little with the basic functionality.

import theano as th
from theano import tensor as T

#To make a variable, specify the type of variable from the tensor
#library.

#We can have scalars.
x=T.dscalar(name='scalar_x')

#Note that the name is just a string that theano will use
#to communicate to us about the variable.
print x
print x.type
#We can also have vectors.
v = T.dvector(name='vector_v')
print v
print v.type
#And matrices.
A = T.dmatrix(name='matrix_A')
print A
print A.type

#We can also make new variables using standard mathematical operations.
x_2 = x*x
print x_2
print x_2.type

#We can also make python functions which return new variables.
def power(variable, n):
    return variable**n
x_10 = power(x,10)
print x_10
print x_10.type

#We can of course do standard linear algebra operations also.
Av = T.dot(A,v)
print Av
print Av.type

#Of course when a variable is actually evaluated, you must 
#ensure that the dimensions are correct, or you will
#get an error.

#To see the value of a variable for a particular value of the variables
#comprising it, we make a theano function.
f = th.function([A,v], [Av])
#The syntax is a list of input variables, and then a list of output variables.

#The python code takes a little longer to run initially, because theano
#compiles the function into C, but thereafter it will run extremely fast.

#Let's try using the function.
import numpy as np
m=np.ones((2,2))
print m
vec = np.asarray([1,2])
print vec
print f(m,vec)

#Now we can try computing gradients.
#First let's make a scalar variable by taking an inner product.
w=T.dvector(name='w_vector')
vTw = T.dot(v,w)

#Now we take the gradient with respect to w.
vTw_grad = T.grad(vTw,w)
print vTw_grad
print vTw_grad.type

#Now let's test it.
vec1 = np.asarray([1,2])
vec2 = np.asarray([0,0])

#To evaluate a variable given inputs, there is another syntax
#in addition to creating a theano function.
print vTw_grad.eval({w:vec1,v:vec2})


#Next we discuss theano's shared variables,
#these differ from regular theano variables in that
#theano variables only have a value within a theano function
#whereas shared theano variables have a value independent of 
#being called in a function.
w=th.shared(name='shared_matrix', value=np.ones((2,3)))
print w
print w.type
print w.get_value()

#You can also set a shared variable's value.
w.set_value(np.zeros((2,3)))
print w.get_value()

#You can also have theano functions update shared variables,
#we illustrate this with a silly example that updates a matrix
#like in gradient descent.
x=T.dvector('x')
wx=T.dot(w,x)
cost = wx.mean()
cost_grad = T.grad(cost, w)

f=th.function(inputs=[x], outputs=[wx, cost], updates=[(w, w-0.1*cost_grad)])
#Notice the syntax of the updates argument: it should be a list
#of 2-tuples of the form (variable_to_be_updated, updated_variable).
print f([1,1,1])
print w.get_value()

If you want to tinker with this, get the source code here.

Autoencoders

I don’t want to go into too much detail here, but an autoencoder is a feedforward neural network with a single hidden layer with the twist being that its target output is exactly the same as its input. If the number of hidden neurons is less than the number of input/output neurons, then the network will be forced to learn some sort of compression.

Even when the number of hidden neurons is large, there are various tricks in cost functions you can use to ensure that the network does something more interesting than learning the identity function.

The hope of an autoencoder is that it learns some structure in the data, so that the hidden layer can be thought of as a feature space that one can use for a supervised learning algorithm.

Basically then we have x \mapsto f_1(W_1x+b_1) =h the hidden layer, then h \mapsto f_2(W_2h+b_2), where b_1, b_2 are bias vectors, W_1, W_2 are matrices, and f_1, f_2 are element-wise non-linearities like the logistic function.

A simplification that is often made is to enforce W_1 = W_2^T. This gives the model fewer free parameters to learn, and so is a form of regularization. It also uses less memory. This is referred to as an autoencoder with tied weights.
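
Before the theano version, here is a minimal numpy sketch of a single forward pass with tied weights, just to pin down the shapes; the sizes and random initialization are made up purely for illustration:

import numpy as np

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))

#Toy sizes: n visible units, hidden_size hidden units.
n, hidden_size = 6, 3
rng = np.random.RandomState(0)

W = rng.normal(size=(n, hidden_size))  #encoder weights; the decoder reuses W.T (tied)
b1 = np.zeros(hidden_size)
b2 = np.zeros(n)

x = rng.uniform(size=n)                #a single input vector
h = sigmoid(np.dot(x, W) + b1)         #hidden representation f_1(W_1 x + b_1)
x_hat = sigmoid(np.dot(h, W.T) + b2)   #reconstruction f_2(W_2 h + b_2)
print x
print x_hat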

Autoencoders with Theano

Thanks to automatic differentiation, backpropagation is effortless with theano, and thanks to gpu computation, much faster too! I recommend having a look at my python implementation of backpropagation just so you can see the effort saved. This implementation benefited greatly from this tutorial.

import numpy as np
import theano as th
from theano import tensor as T
from numpy import random as rng

class AutoEncoder(object):
    def __init__(self, X, hidden_size, activation_function,
                 output_function):
        #X is the data, an m x n numpy matrix
        #where rows correspond to datapoints
        #and columns correspond to features.
        assert type(X) is np.ndarray
        assert len(X.shape)==2
        self.X=X
        self.X=th.shared(name='X', value=np.asarray(self.X, 
                         dtype=th.config.floatX),borrow=True)
        #The config.floatX and borrow=True stuff is to get this to run
        #fast on the gpu. I recommend just doing this without thinking about
        #it until you understand the code as a whole, then learning more
        #about gpus and theano.
        self.n = X.shape[1]
        self.m = X.shape[0]
        #Hidden_size is the number of neurons in the hidden layer, an int.
        assert type(hidden_size) is int
        assert hidden_size > 0
        self.hidden_size=hidden_size
        initial_W = np.asarray(rng.uniform(
                 low=-4 * np.sqrt(6. / (self.hidden_size + self.n)),
                 high=4 * np.sqrt(6. / (self.hidden_size + self.n)),
                 size=(self.n, self.hidden_size)), dtype=th.config.floatX)
        self.W = th.shared(value=initial_W, name='W', borrow=True)
        self.b1 = th.shared(name='b1', value=np.zeros(shape=(self.hidden_size,),
                            dtype=th.config.floatX),borrow=True)
        self.b2 = th.shared(name='b2', value=np.zeros(shape=(self.n,),
                            dtype=th.config.floatX),borrow=True)
        self.activation_function=activation_function
        self.output_function=output_function
                    
    def train(self, n_epochs=100, mini_batch_size=1, learning_rate=0.1):
        index = T.lscalar()
        x=T.matrix('x')
        params = [self.W, self.b1, self.b2]
        hidden = self.activation_function(T.dot(x, self.W)+self.b1)
        output = T.dot(hidden,T.transpose(self.W))+self.b2
        output = self.output_function(output)
        
        #Use cross-entropy loss.
        L = -T.sum(x*T.log(output) + (1-x)*T.log(1-output), axis=1)
        cost=L.mean()       
        updates=[]
        
        #Return gradient with respect to W, b1, b2.
        gparams = T.grad(cost,params)
        
        #Create a list of 2 tuples for updates.
        for param, gparam in zip(params, gparams):
            updates.append((param, param-learning_rate*gparam))
        
        #Train given a mini-batch of the data.
        train = th.function(inputs=[index], outputs=[cost], updates=updates,
                            givens={x:self.X[index:index+mini_batch_size,:]})
                            

        import time
        start_time = time.clock()
        for epoch in xrange(n_epochs):
            print "Epoch:",epoch
            for row in xrange(0,self.m, mini_batch_size):
                train(row)
        end_time = time.clock()
        print "Average time per epoch=", (end_time-start_time)/n_epochs
                   
    def get_hidden(self,data):
        x=T.dmatrix('x')
        hidden = self.activation_function(T.dot(x,self.W)+self.b1)
        transformed_data = th.function(inputs=[x], outputs=[hidden])
        return transformed_data(data)
    
    def get_weights(self):
        return [self.W.get_value(), self.b1.get_value(), self.b2.get_value()]

And that is it! Now we should test this puppy out, so let’s find some data.

MNIST

The MNIST data set is famous in machine learning circles; it consists of images of single handwritten digits. Nearly every paper on neural networks and the like tests its contribution on this data. The task is to classify the digits, but we will just test our autoencoder. First we will steal a very nice helper function to load the data set, which will download it for you if you don’t have it already. You can find it here.

import cPickle
import gzip
import os

def load_data(dataset):
    ''' Loads the dataset

    :type dataset: string
    :param dataset: the path to the dataset (here MNIST)
    '''

    #############
    # LOAD DATA #
    #############

    # Download the MNIST dataset if it is not present
    data_dir, data_file = os.path.split(dataset)
    if data_dir == "" and not os.path.isfile(dataset):
        # Check if dataset is in the data directory.
        new_path = os.path.join(os.path.split(__file__)[0], "..", "data", dataset)
        if os.path.isfile(new_path) or data_file == 'mnist.pkl.gz':
            dataset = new_path

    if (not os.path.isfile(dataset)) and data_file == 'mnist.pkl.gz':
        import urllib
        origin = 'http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz'
        print 'Downloading data from %s' % origin
        urllib.urlretrieve(origin, dataset)

    print '... loading data'

    # Load the dataset
    f = gzip.open(dataset, 'rb')
    train_set, valid_set, test_set = cPickle.load(f)
    f.close()
    #train_set, valid_set, test_set format: tuple(input, target)
    #input is a numpy.ndarray of 2 dimensions (a matrix)
    #whose rows each correspond to an example. target is a
    #numpy.ndarray of 1 dimension (a vector) that has the same
    #length as the number of rows in the input. It gives the
    #target for the example with the same index in the input.

    return (train_set, valid_set, test_set)
        

path="/home/harri/Dropbox/Work/Blogs/triangleinequality/Theano/data/mnist.pkl.gz"

data=load_data(path)

We also want a plotting function to visualize the features learned. I won’t explain this here, but if you google “visualizing features of neural network” you can learn plenty about it.

def plot_first_k_numbers(X,k):
    from matplotlib import pyplot
    m=X.shape[0]
    k=min(m,k)
    j = int(round(k / 10.0))
    
    fig, ax = pyplot.subplots(j,10)
   
    for i in range(k):

        w=X[i,:]

        
        w=w.reshape(28,28)
        ax[i/10, i%10].imshow(w,cmap=pyplot.cm.gist_yarg,
                      interpolation='nearest', aspect='equal')
        ax[i/10, i%10].axis('off')

    
    pyplot.tick_params(\
        axis='x',          # changes apply to the x-axis
        which='both',      # both major and minor ticks are affected
        bottom='off',      # ticks along the bottom edge are off
        top='off',         # ticks along the top edge are off
        labelbottom='off')
    pyplot.tick_params(\
        axis='y',          # changes apply to the x-axis
        which='both',      # both major and minor ticks are affected
        left='off', 
        right='off',    # ticks along the top edge are off
        labelleft='off')
    
    fig.show()

Now we put it together.

def m_test(data):
    X=data[0][0]
    activation_function = T.nnet.sigmoid
    output_function=activation_function
    A = AutoEncoder(X, 500, activation_function, output_function)
    A.train(20,20)
    W=np.transpose(A.get_weights()[0])
    plot_first_k_numbers(W, 100)

m_test(data) 

Running this, with 500 hidden units for 20 epochs with minibatch size of 20, the following features were learned:

[image: features_learned_500h20_20]

Nothing too interesting yet; in order to get more useful features we could change the cost function to encourage sparsity, introduce noise into the inputs, or use a different activation function. Hopefully this has been helpful for those just starting out with theano. Here is the source code.

Binary Search Trees

In preparation for starting a PhD in Machine Learning/Data Science at the University of Edinburgh, I have been watching some of the lectures on algorithms and data structures at MIT here.

One of the data structures discussed is the binary search tree (BST), and so in this post I will explain what they are and give a python implementation.

Basics

As you might expect, a BST is a way of organizing data in a binary tree. It consists of a collection of nodes, each of which has a key, corresponding to a piece of data, say an integer. Importantly the keys must be comparable, so that for any two nodes A, B either A.key \le B.key or B.key \le A.key.

Each node also has (at most) three pointers: parent, left child and right child.

Most importantly we have the invariant of the BST. An invariant is, of course, something that stays the same: a property that is not altered by any of the data structure’s allowable operations. The invariant, or search property, is that for any given node A, all of the nodes in the left subtree rooted at A have keys less than A.key, and all of the nodes in the right subtree have keys greater than A.key.

Any useful data structure has operations. Some obvious things to want to do are to insert and delete. Of course in order to do these things we have to take care not to mess up the search property. To insert we begin at the root, and at each stage we ask whether the key to be inserted is less than, greater than or equal to the current node’s key, in which case we move left, right or nowhere respectively. It’s easiest just to see an example:

[diagram: page1]

 

Now I think you get the idea, so let’s start coding this up. First we will make the BST node.

class TreeNode(object):
    #A tree node has a key, which can be compared with other keys,
    #and possibly 'pointers' to a parent, left child or right child.
    def __init__(self, key, parent=None, left_child=None,
                 right_child=None):
        self.parent=parent
        self.left_child=left_child
        self.right_child=right_child
        self.key=key

Next we will start making the BST. The BST remembers the root, and the pointers do the rest of the work. We begin by implementing a helpful method called ‘find’, which returns the node with a given key if it exists, and otherwise a leaf that could (but doesn’t yet) have a child with that key.

class BinarySearchTree(object):
    def __init__(self, root=None):
        self.root=root

    def find(self, key):
        #Returns a node corresponding to a key.
        #If key is the key of some node, returns that node.
        #If tree is empty, returns None.
        #Otherwise returns a leaf that could accept
        #a node with that key as a child.
        #This function wraps a call to a recursive function self._find.
        if self.root is None:
            return None
        else:
            return self._find(key, self.root)

    def _find(self, key, node):
        #This is a recursive function that does all the work for
        #self.find.
        if key == node.key:
            return node
        elif key < node.key:
            if node.left_child is None:
                return node
            else:
                return self._find(key, node.left_child)
        else:
            if node.right_child is None:
                return node
            else:
                return self._find(key, node.right_child)

Using ‘find’ we can then implement ‘insert’.

    def insert(self, key):
        #Inserts a node with given key.
        #If key already in tree, then returns the node with that key.
        #Otherwise creates a new node and returns it.
        #This takes time of order the height of the tree.
        node = self.find(key)
        if node is None:
            self.root=TreeNode(key)
            return self.root
        elif key == node.key:
            return node
        elif key < node.key:
            left_child = TreeNode(key, parent=node)
            node.left_child=left_child
            return left_child
        else:
            right_child = TreeNode(key, parent=node)
            node.right_child=right_child
            return right_child
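
Here is a quick usage sketch, assuming the methods above are collected into the BinarySearchTree class; it also shows what find does when the key is absent:

B = BinarySearchTree()
for key in [5, 2, 8, 1, 3]:
    B.insert(key)

print B.find(3).key   #3 is in the tree, so we get its node back.
print B.find(6).key   #6 is absent; find returns the leaf (here 8) that could accept it.
print B.root.key      #5, the first key inserted.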

So far so good!

The next thing we want to do is be able to delete keys.

If a key belongs to a leaf node then this is simple: we just delete the node.

If the node has one child, then when we delete the node we must connect its parent (if it has one) to the child of the deleted node.

If the node has two children things are a bit more tricky. If we delete the node then we break the tree. What we need to do instead is replace the node’s key with something else in the tree. What we replace it with is the leftmost node of the subtree rooted at the node’s right child; as we shall see later this is the successor of the node in the BST. The following diagram should explain:

[diagram: page2]

So we swap the node’s key for this leftmost node’s, then delete the leftmost node. Since it is leftmost it cannot have a left child, and so deleting it falls into one of our base cases for the deletion operation. If this is still not clear to you, do some examples on paper and you will soon see the idea. Let’s implement this.

    def delete(self, key):
        #Delete key from tree.
        #If key is not in BST does nothing.
        #Otherwise it calls a semi-recursive function _delete.
        node = self.find(key)
        if (node is None) or (node.key != key):
            return
        else:
            self._delete(node)

    def _delete(self, node):
        #If the node has less than two children we can delete the node
        #directly by removing it and then gluing the tree back together.
        #If node has two children, it swaps a key lower down in the tree to
        #replace the deleted node, and deletes the lower down node.
        if (node.left_child is None) and (node.right_child is not None):
            #Has one right child.
            right_child = node.right_child
            parent = node.parent
            if parent is not None:
                #Attach the child on the same side of the parent
                #that the deleted node occupied.
                if parent.left_child is node:
                    parent.left_child=right_child
                else:
                    parent.right_child=right_child
                right_child.parent=parent
            else:
                right_child.parent=None
                if node is self.root:
                    self.root=right_child
        elif (node.left_child is not None) and (node.right_child is None):
            #Has one left child.
            left_child = node.left_child
            parent = node.parent
            if parent is not None:
                #Attach the child on the same side of the parent
                #that the deleted node occupied.
                if parent.left_child is node:
                    parent.left_child=left_child
                else:
                    parent.right_child=left_child
                left_child.parent=parent
            else:
                left_child.parent=None
                if node is self.root:
                    self.root=left_child
        elif node.left_child is None:
            #Has no children.
            parent = node.parent
            if parent is None:
                self.root = None
            else:
                if parent.left_child is not None:
                    left = parent.left_child is node
                else:
                    left = False
                if left:
                    parent.left_child =None
                else:
                    parent.right_child=None
        else:
            #Two children: copy in the successor's key, then delete the successor.
            successor = self.find_leftmost(node.right_child)
            node.key = successor.key
            self._delete(successor)

    def find_rightmost(self, node):
        if node.right_child is None:
            return node
        else:
            return self.find_rightmost(node.right_child)

    def find_leftmost(self, node):
        if node.left_child is None:
            return node
        else:
            return self.find_leftmost(node.left_child)

Sorting

Ok great, we have now covered the basics of how to modify a BST. Now we want to know how to extract information from it. In particular given a key, we want to be able to find the next largest or smallest key in the BST, if any. Calling the successor (predecessor)  operation repeatedly, starting from the left(right)-most  node, will give us the sorted keys in ascending(descending) order.

I will explain how to find the successor; the predecessor case is of course very similar. Let’s suppose we want to find the successor of a node A with right child B. In this case we choose the left-most node of the subtree rooted at B, which we will call S.

Note that in a subtree rooted at some node, the left-most node is the smallest element in the subtree, and the right-most node is the largest.

So in the subtree rooted at A, the right-subtree rooted at B contains all the elements in the subtree greater than A. The left-most element of the right-subtree is then the smallest element in the subtree greater than A and is a good candidate for a successor. Let’s draw a picture and try to see why this must in fact be the correct node.

[diagrams: page3, page4]

 

OK, but what if A has no right child? In this case we move up through the ancestors of A, searching for the first node on that path (starting with A itself) that is a left child of its parent; the successor is then that node’s parent. If there is no such node, i.e. we reach the root, then A is the right-most, and hence largest, element in the tree, and has no successor. To see that this is correct, employ a similar argument to the previous case.

Let’s code that up.

    def find_next(self, key):
        node = self.find(key)
        if (node is None) or (node.key != key):
            #Key not in tree.
            return
        else:
            ret= self._find_next(node)
            if ret is None:
                #Key is largest in tree.
                return ret
            else:
                return ret.key

    def _find_next(self, node):
        right_child = node.right_child
        if right_child is not None:
            return self.find_leftmost(right_child)
        else:
            parent = node.parent
            while(parent is not None):
                if node is parent.left_child:
                    break
                node = parent
                parent = node.parent
            return parent

    def find_prev(self, key):
        node = self.find(key)
        if (node is None) or (node.key != key):
            #Key not in tree.
            return
        else:
            ret= self._find_prev(node)
            if ret is None:
                #Key is smallest in tree.
                return ret
            else:
                return ret.key

    def _find_prev(self, node):
        left_child = node.left_child

        if left_child is not None:
            return self.find_rightmost(left_child)
        else:
            parent = node.parent
            while(parent is not None):
                if node is parent.right_child:
                    break
                node = parent
                parent = node.parent
            return parent
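
To see the sorting claim above in action, here is a sketch of an extra method you could add to the class (sorted_keys is my name for it, not part of the original code): starting from the leftmost node and repeatedly calling find_next visits the keys in ascending order.

    def sorted_keys(self):
        #Returns the keys in ascending order by repeatedly taking successors.
        #Each find_next call costs O(h), so the whole walk is O(nh).
        keys = []
        if self.root is None:
            return keys
        key = self.find_leftmost(self.root).key
        while key is not None:
            keys.append(key)
            key = self.find_next(key)
        return keys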

By now I think we deserve to see the fruits of our hard work, so let’s whip up a plotting routine using igraphs. Don’t worry about the details: since I can’t find a binary tree plotting style in igraphs, I have colored the root green, left children red and right children blue.

In order to preserve information about left and right children, the plotting routine builds the graph in igraphs using a queue.

    def plot(self):
        #Builds a copy of the BST in igraphs for plotting.
        #Since exporting the adjacency lists loses information about
        #left and right children, we build it using a queue.
        import igraph as igraphs
        G = igraphs.Graph()
        if self.root is not None:
            G.add_vertices(1)
        queue = [[self.root,0]]
        #Queue has a pointer to the node in our BST, and its index
        #in the igraphs copy.
        index=0
        not_break=True
        while(not_break):
            #At each iteration, we label the head of the queue with its key,
            #then add any children into the igraphs graph,
            #and into the queue.

            node=queue[0][0] #Select front of queue.
            node_index = queue[0][1]
            G.vs[node_index]['label']=node.key
            if index ==0:
                #Label root green.
                G.vs[node_index]['color']='green'
            if node.left_child is not None:
                G.add_vertices(1)
                G.add_edges([(node_index, index+1)])
                queue.append([node.left_child,index+1])
                G.vs[index+1]['color']='red' #Left children are red.
                index+=1
            if node.right_child is not None:
                G.add_vertices(1)
                G.add_edges([(node_index, index+1)])
                G.vs[index+1]['color']='blue'
                queue.append([node.right_child, index+1])
                index += 1 

            queue.pop(0)
            if len(queue)==0:
                not_break=False
        layout = G.layout_reingold_tilford(root=0)
        igraphs.plot(G, layout=layout)

Now let’s test it!

def test():
    lst= [1,4,2,5,1,3,7,11,4.5]
    B = BinarySearchTree()
    for item in lst:
        B.insert(item)
    B.plot()
    B.delete(5)
    B.delete(1)
    B.plot()
    print B.root.key
    print B.find_next(3)
    print B.find_prev(7)
    print B.find_prev(1)

test()

This produced the following plots:

[plot: test1]

[plot: test2]

and printed

4
4
4.5
None

to the console, as desired.

You can probably see that the operations take O(h) time where h is the height of the tree. This means that we can sort a length n list in O(nh) time. On average h = O(\log n), but if given a monotonic list we will get a linear tree with h=n. So next time we will see how to ‘balance’ the tree using tree rotations to achieve fast operations.

Please feel free to leave any comments, suggestions or corrections! Here is the source code.

Neural Networks Part 2: Python Implementation

Ok so last time we introduced the feedforward neural network. We discussed how input gets fed forward to become output, and the backpropagation algorithm for learning the weights of the edges.

Today we will begin by showing how the model can be expressed using matrix notation, under the assumption that the neural network is fully connected, that is each neuron is connected to all the neurons in the next layer.

Once this is done we will give a Python implementation and test it out.

Matrix Notation For Neural Networks

Most of  this I learned from here.

In what follows, vectors are always thought of as columns, and so a transposed vector is a row.

So first off we have x \in \mathbb{R}^k, our input vector, and y \in \mathbb{R}^m our output vector.

Our neural network N will have n layers L_1, \dots , L_n, with L_1 the input layer and L_n the output layer.

For each layer L_i:

  • write n_i:=|L_i| for the number of neurons in that layer (not including the bias neurons);
  • write f_i: \mathbb{R}^{n_i} \to \mathbb{R}^{n_i} for the layer’s activation function;
  • write f_i': \mathbb{R}^{n_i} \to \mathbb{R}^{n_i} for vector of derivatives of the activation functions;
  • write I_i \in \mathbb{R}^{n_i} for the vector of inputs to the ith layer;
  • write b_i \in \mathbb{R}^{n_{i+1}} for the vector of biases used in passing from L_i to L_{i+1};
  • write O_i \in \mathbb{R}^{n_i}= f_i(I_i) for the vector of outputs from the ith layer;
  • write O_i' \in \mathbb{R}^{n_i}= f_i'(I_i);
  • write W_i for the n_{i+1} \times n_i real matrix containing the weights of the edges going from L_i \to L_{i+1} (not including bias neurons), so that W_iO_i + b_i = I_{i+1}.

Feeding forward in vector notation

So to feedforward the input we have

x \mapsto W_1x +b_1 \mapsto f_2(W_1x+b_1) \mapsto W_2f_2(W_1x+b_1)+b_2 \mapsto \dots \mapsto f_n(W_{n-1} f_{n-1}( \dots f_2(W_1x+b_1) \dots )+b_{n-1}).

And we call this output f_N(x).

Backpropagation in vector notation

Now write E_i \in \mathbb{R}^{n_i} for the error at layer i, defined E_i:=\frac{\partial e}{\partial I_i}, where e is the error of the network on the current training example.

Then E_n = O_n' \cdot (O_n-y), and E_{i-1} = \left (W_{i-1}^TE_i\right ) \cdot O_{i-1}', where the multiplication \cdot is component wise.

Write \triangle W_i for the matrix with entries (\frac{\partial e}{\partial w_{ij}})_{ij}, for non-bias weights.

Then we have \triangle W_i = E_{i+1} O_i^T, the outer product of E_{i+1} and O_i.

And for the biases write \triangle b_i \in \mathbb{R}^{n_{i+1}} for the vector with jth component \frac{\partial e}{\partial (b_i)_j}.

Then \triangle b_i =E_{i+1}.
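
To make the notation concrete, here is a small numpy sketch of one gradient-descent step for a network with a single hidden layer, using exactly the formulas above; the sizes and random data are made up, and I index the input layer as 1, the hidden layer as 2 and the output layer as 3:

import numpy as np

def f(z):                         #logistic activation
    return 1.0/(1.0 + np.exp(-z))

def fprime(z):                    #its derivative, evaluated at the layer input
    return f(z)*(1.0 - f(z))

rng = np.random.RandomState(0)
x = rng.uniform(size=(3, 1))      #input, n_1 = 3
y = rng.uniform(size=(2, 1))      #target, n_3 = 2
W1 = rng.normal(size=(4, 3))      #weights from layer 1 to layer 2
b1 = np.zeros((4, 1))
W2 = rng.normal(size=(2, 4))      #weights from layer 2 to layer 3
b2 = np.zeros((2, 1))

#Feed forward.
I2 = np.dot(W1, x) + b1
O2 = f(I2)
I3 = np.dot(W2, O2) + b2
O3 = f(I3)

#Backpropagate the errors (taking e = 0.5*||O_n - y||^2).
E3 = fprime(I3)*(O3 - y)              #E_n = O_n' . (O_n - y)
E2 = fprime(I2)*np.dot(W2.T, E3)      #E_{i-1} = (W_{i-1}^T E_i) . O_{i-1}'

#Gradient-descent step with learning rate 0.1.
W2 -= 0.1*np.dot(E3, O2.T)            #Delta W_i = E_{i+1} O_i^T
b2 -= 0.1*E3                          #Delta b_i = E_{i+1}
W1 -= 0.1*np.dot(E2, x.T)
b1 -= 0.1*E2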

Python Implementation

We will create a class NeuralNetwork and perform our calculations using matrices in numpy.

import numpy as np

class NeuralNetwork(object):
    def __init__(self, X, y, parameters):
        #Input data
        self.X=X
        #Output data
        self.y=y
        #Expect parameters to be a tuple of the form:
        #    ((n_input,0,0), (n_hidden_layer_1, f_1, f_1'), ...,
        #     (n_hidden_layer_k, f_k, f_k'), (n_output, f_o, f_o'))
        self.n_layers = len(parameters)
        #Counts number of neurons without bias neurons in each layer.
        self.sizes = [layer[0] for layer in parameters]
        #Activation functions for each layer.
        self.fs =[layer[1] for layer in parameters]
        #Derivatives of activation functions for each layer.
        self.fprimes = [layer[2] for layer in parameters]
        self.build_network()

    def build_network(self):
        #List of weight matrices taking the output of one layer to the input of the next.
        self.weights=[]
        #Bias vector for each layer.
        self.biases=[]
        #Input vector for each layer.
        self.inputs=[]
        #Output vector for each layer.
        self.outputs=[]
        #Vector of errors at each layer.
        self.errors=[]
        #We initialise the weights randomly, and fill the other vectors with 1s.
        for layer in range(self.n_layers-1):
            n = self.sizes[layer]
            m = self.sizes[layer+1]
            self.weights.append(np.random.normal(0,1, (m,n)))
            self.biases.append(np.random.normal(0,1,(m,1)))
            self.inputs.append(np.zeros((n,1)))
            self.outputs.append(np.zeros((n,1)))
            self.errors.append(np.zeros((n,1)))
        #There are only n-1 weight matrices, so we do the last case separately.
        n = self.sizes[-1]
        self.inputs.append(np.zeros((n,1)))
        self.outputs.append(np.zeros((n,1)))
        self.errors.append(np.zeros((n,1)))

    def feedforward(self, x):
        #Propagates the input from the input layer to the output layer.
        k=len(x)
        x.shape=(k,1)
        self.inputs[0]=x
        self.outputs[0]=x
        for i in range(1,self.n_layers):
            self.inputs[i]=self.weights[i-1].dot(self.outputs[i-1])+self.biases[i-1]
            self.outputs[i]=self.fs[i](self.inputs[i])
        return self.outputs[-1]

    def update_weights(self,x,y):
        #Update the weight matrices for each layer based on a single input x and target y.
        output = self.feedforward(x)
        self.errors[-1]=self.fprimes[-1](self.inputs[-1])*(output-y)

        n=self.n_layers-2
        for i in xrange(n,0,-1):
            self.errors[i] = self.fprimes[i](self.inputs[i])*self.weights[i].T.dot(self.errors[i+1])
            self.weights[i] = self.weights[i]-self.learning_rate*np.outer(self.errors[i+1],self.outputs[i])
            self.biases[i] = self.biases[i] - self.learning_rate*self.errors[i+1]
        self.weights[0] = self.weights[0]-self.learning_rate*np.outer(self.errors[1],self.outputs[0])
        self.biases[0] = self.biases[0] - self.learning_rate*self.errors[1]

    def train(self,n_iter, learning_rate=1):
        #Updates the weights after comparing each input in X with y
        #repeats this process n_iter times.
        self.learning_rate=learning_rate
        n=self.X.shape[0]
        for repeat in range(n_iter):
            #We shuffle the order in which we go through the inputs on each iter.
            index=list(range(n))
            np.random.shuffle(index)
            for row in index:
                x=self.X[row]
                y=self.y[row]
                self.update_weights(x,y)

    def predict_x(self, x):
        return self.feedforward(x)

    def predict(self, X):
        n = len(X)
        m = self.sizes[-1]
        ret = np.ones((n,m))
        for i in range(len(X)):
            ret[i,:] = self.feedforward(X[i])
        return ret

And we’re done! Now to test it we generate some synthetic data and supply the network with some activation functions.

def logistic(x):
    return 1.0/(1+np.exp(-x))

def logistic_prime(x):
    ex=np.exp(-x)
    return ex/(1+ex)**2

def identity(x):
    return x

def identity_prime(x):
    return 1

First we will try to get it to approximate a sine curve.

#expit is a fast way to compute logistic using precomputed exp.
from scipy.special import expit
def test_regression(plots=False):
    #First create the data.
    n=200
    X=np.linspace(0,3*np.pi,num=n)
    X.shape=(n,1)
    y=np.sin(X)
    #We make a neural net with 2 hidden layers, 20 neurons in each, using logistic activation
    #functions.
    param=((1,0,0),(20, expit, logistic_prime),(20, expit, logistic_prime),(1,identity, identity_prime))
    #Set learning rate.
    rates=[0.05]
    predictions=[]
    for rate in rates:
        N=NeuralNetwork(X,y,param)
        N.train(4000, learning_rate=rate)
        predictions.append([rate,N.predict(X)])
    import matplotlib.pyplot as plt
    fig, ax=plt.subplots(1,1)
    if plots:
        ax.plot(X,y, label='Sine', linewidth=2, color='black')
        for data in predictions:
            ax.plot(X,data[1],label="Learning Rate: "+str(data[0]))
        ax.legend()
test_regression(True)

When I ran this it produced the following:
[plot: NN_approx_sine]

Next we will try a classification problem with a nonlinear decision boundary, how about being above and below the sine curve?

def test_classification(plots=False):
    #Number samples
    n=700

    n_iter=1500
    learning_rate=0.05

    #Samples for true decision boundary plot
    L=np.linspace(0,3*np.pi,num=n)
    l = np.sin(L)

    #Data inputs, training
    X = np.random.uniform(0, 3*np.pi, size=(n,2))
    X[:,1] *= 1/np.pi
    X[:,1]-= 1

    #Data inputs, testing
    T = np.random.uniform(0, 3*np.pi, size=(n,2))
    T[:,1] *= 1/np.pi
    T[:,1] -= 1

    #Data outputs
    y = np.sin(X[:,0]) <= X[:,1]

    #Fitting
    param=((2,0,0),(30, expit, logistic_prime),(30, expit, logistic_prime),(1,expit, logistic_prime))
    N=NeuralNetwork(X,y, param)
    #Training
    N.train(n_iter, learning_rate)
    predictions_training=N.predict(X)
    predictions_training= predictions_training <0.5
    predictions_training= predictions_training[:,0]
    #Testing
    predictions_testing=N.predict(T)
    predictions_testing= predictions_testing <0.5
    predictions_testing= predictions_testing[:,0]

    #Plotting
    import matplotlib.pyplot as plt
    fig, ax=plt.subplots(2,1)

    #Training plot
    #We plot the predictions of the neural net blue for class 0, red for 1.
    ax[0].scatter(X[predictions_training,0], X[predictions_training,1], color='blue')
    not_index = np.logical_not(predictions_training)
    ax[0].scatter(X[not_index,0], X[not_index,1], color='red')
    ax[0].set_xlim(0, 3*np.pi)
    ax[0].set_ylim(-1,1)
    #True decision boundary
    ax[0].plot(L,l, color='black')
    #Shade the areas according to how to they should be classified.
    ax[0].fill_between(L, l,y2=-1, alpha=0.5)
    ax[0].fill_between(L, l, y2=1, alpha=0.5, color='red')

    #Testing plot
    ax[1].scatter(T[predictions_testing,0], T[predictions_testing,1], color='blue')
    not_index = np.logical_not(predictions_testing)
    ax[1].scatter(T[not_index,0], T[not_index,1], color='red')
    ax[1].set_xlim(0, 3*np.pi)
    ax[1].set_ylim(-1,1)
    ax[1].plot(L,l, color='black')
    ax[1].fill_between(L, l,y2=-1, alpha=0.5)
    ax[1].fill_between(L, l, y2=1, alpha=0.5, color='red')    

test_classification()

When I ran this it looked like this.
[plot: NN_sine_classification]
The top plot shows the performance on the training data, and the bottom the performance on the testing data. The points are colored according to the net’s predictions, whereas the areas are shaded according to the true classifications.

As you can see, neural networks are capable of giving very good models, but the number of iterations and hidden nodes required may be large. Fewer iterations are possible with a good configuration of the network, in terms of the sizes and number of hidden layers, and a well-chosen learning rate, but it is difficult to know how to choose these well.

It ended up being quicker to choose a large number of hidden nodes, a large number of iterations and a small learning rate than to spend time experimenting to find a good configuration. In the future I may write about how to use momentum to speed up training.

The code for this post is available here, please comment if anything is unclear or incorrect!

Snakes on a Manifold: Part 2

Welcome back. Last time we implemented the main components of the game. This time we will implement the main game loop. The job of the loop is to: get keyboard input, update the game state, update the display.

We will now be using pygame, so you might want to check out a tutorial before proceeding, but I think the best way to learn is just to dive right in. You will however need to install pygame before running the code. If you haven’t already, you should download the code from last time too. Finally if you want to skip to playing with the code yourself, here it is.

The Code
Ok here it is.

import pygame
import snake as sk
from pygame.locals import *

class Game(object):
    def __init__(self, size, topology, speed=6):
        pygame.init()
        #Resolution in pixels.
        self.screen_resolution =600
        self.topology=topology
        self.speed = speed #Controls how fast the snake moves.
        self.screen = pygame.display.set_mode((self.screen_resolution,
                                               self.screen_resolution))
        pygame.display.set_caption('SNAKES ON A MANIFOLD!')

        self.board = sk.Board(size, topology)
        #Put the snake in the center.
        self.snake = sk.Snake([size/2, size/2], self.board.opposite_direction,
                         self.board.get_next_coord)
        #Number of grid cells.
        self.size=size
        #Width in pixels of a grid cell.
        self.square_size=int(self.screen_resolution/self.size)
        #Now we name some colors in RGB for ease of use.
        self.black = (0,0,0)
        self.red = (255,0,0)
        self.green = (0,255,0)
        self.blue=(0,0,255)
        self.darkgrey=(40,40,40)

        self.background_colour=self.black
        self.boundary_colour=self.blue
        self.snake_colour = self.green
        self.apple_colour=self.red

        #Initialises screen to background color.
        self.screen.fill(self.background_colour)

        #Then we draw the boundary cells.
        self.draw_boundary()

        #This command updates the changes we have made to the screen.
        pygame.display.update()

        #Main game loop.
        self.main()

    def main(self):
        score=0
        Quit=False
        first=True
        while True: # main game loop
            #Displays score in window caption.
            pygame.display.set_caption('Topology: '+self.board.topology+'  |  Score: '+ str(score))
            #Gets keyboard input
            direction=None
            for event in pygame.event.get():
                if event.type == QUIT:
                    #User has closed window.
                    Quit=True
                    break
                if event.type==pygame.KEYDOWN:
                    #User has pressed key.
                    if event.key==K_q:
                        Quit=True
                        break
                    if event.key==K_UP:
                        direction='UP'
                    if event.key==K_DOWN:
                        direction='DOWN'
                    if event.key==K_LEFT:
                        direction='LEFT'
                    if event.key==K_RIGHT:
                        direction='RIGHT'
            if Quit:
                break
            #To see if anything has happened, we check where the head is.
            head_loc = self.snake.head_position
            #Now we find out what, apart from the head, is in the grid cell.
            head_on = self.board.grid[head_loc[0], head_loc[1]]

            if (head_on == 1) or (head_on==2):
                #The snake has hit the boundary or part of itself.
                Quit=True
                break
            if head_on==3:
                #The snake has found an apple.
                self.snake.grow()
                x,y=self.board.generate_apple()
                self.draw_apple(x,y)
                score+=1

            self.update_board()
            if first:
                #When the game starts, the snake stands still waiting
                #for user input to start the game.
                x,y=self.board.generate_apple()
                first=False
                self.draw_apple(x,y)
                self.draw_snake()
                pygame.display.update()
                while True:
                    got_direction=False
                    for event in pygame.event.get():
                        if event.type == QUIT:
                            Quit=True
                            break
                        if event.type==pygame.KEYDOWN:
                            if event.key==K_q:
                                Quit=True
                                break
                            if event.key==K_UP:
                                direction='UP'
                                got_direction=True
                                break
                            if event.key==K_DOWN:
                                direction='DOWN'
                                got_direction=True
                                break
                            if event.key==K_LEFT:
                                direction='LEFT'
                                got_direction=True
                                break
                            if event.key==K_RIGHT:
                                direction='RIGHT'
                                got_direction=True
                                break
                    if Quit:
                        break
                    elif got_direction:
                        break
                if Quit:
                    break

            self.snake.move(direction)
            self.draw_snake()
            pygame.display.update()
            #This pauses the game to slow it down, adjust to make faster/slower.
            pygame.time.Clock().tick(self.speed)
        #Waiting to quit after died.
        Quit=False
        restart=False
        while True: # main game loop
        #Gets keyboard input

            for event in pygame.event.get():
                if event.type == QUIT:
                    Quit=True
                    break
                if event.type==pygame.KEYDOWN:
                    if event.key==K_q:
                        Quit=True
                        break
                    if event.key==K_r:
                        restart=True
                        break
            if Quit:
                break
            if restart:
                break
        if restart:
            G=Game(self.size, self.topology)
        pygame.quit()

    def draw_boundary(self):
        #Draws any boundary grid cells.
        for row in range(self.size):
            for col in range(self.size):
                if self.board.grid[row, col]==1:
                    y=row*self.square_size
                    x=col*self.square_size
                    rect=pygame.Rect(x,y, self.square_size, self.square_size)
                    pygame.draw.rect(self.screen,self.boundary_colour, rect)

    def draw_snake(self):
        #We add a snake colored square where the head has moved,
        #and add a background colored square where the tail was.
        old_tail_position=self.snake.segments[0].previous_position
        new_head_position=self.snake.head_position

        top=old_tail_position[0]*self.square_size
        left=old_tail_position[1]*self.square_size
        background_rect=pygame.Rect(left,top, self.square_size, self.square_size)
        pygame.draw.rect(self.screen, self.background_colour,background_rect)

        top=new_head_position[0]*self.square_size
        left=new_head_position[1]*self.square_size
        snake_rect=pygame.Rect(left,top, self.square_size, self.square_size)
        pygame.draw.rect(self.screen, self.snake_colour,snake_rect)

    def draw_apple(self, x, y):
        apple_rect=pygame.Rect(y*self.square_size,x*self.square_size,
                               self.square_size, self.square_size)
        pygame.draw.rect(self.screen, self.apple_colour, apple_rect)

    def update_board(self):
        #Updates the numpy array of the board.
        old_tail_position = self.snake.segments[0].previous_position
        new_head_position=self.snake.head_position
        self.board.grid[old_tail_position[0], old_tail_position[1]]=0
        self.board.grid[new_head_position[0], new_head_position[1]]=2

And we’re done! Now you can play a game by typing something like this.

G=Game(20, 'Mobius Strip')

Please let me know what you think. The code is here. If anyone is feeling ambitious, you could add a menu screen where you could select a speed and topology.

Snakes on a Manifold: Part 1

Have you ever wondered what it would be like to live on a Mobius Strip? I was playing around with some polygonal schemas for various topologies a while back, and I was reminded of the game ‘Snake’ which I used to play on my dad’s phone when I was a kid. I realized that the game was set on a torus. So for fun, and to illustrate a talk I was giving, I decided to implement this game in python using a module called pygame, where you can play with a variety of topologies.

It looks like this:

In this post we will implement the underlying logic of the game, and in the next post we will implement the input/output part with pygame. If you just want the code, click here.

The Code

We will create a Snake class, and a Snake will be composed of Snake_Segment objects. Each Snake_Segment is responsible for its behind, the Snake_Segment immediately behind it, so we have a sort of linked list.

Let’s start with the Snake_Segment.

class Snake_Segment(object):
    def __init__(self, initial_position, behind=None):
        self.position=initial_position #Position on the board.
        self.previous_position = initial_position #Keep track of to tell tail to move there.
        self.direction=None #This is for the head segment.
        if behind == None:
            self.behind=self #The tail of the snake.
        else:
            self.behind=behind

    def move(self, target_coords):
        self.previous_position = self.position
        self.position = target_coords
        #Once the segment moves, it passes along the message to its behind, along with its
        #previous coordinates.
        if not self.behind==self: #Checks that it is not the tail of the snake.
            self.behind.move(self.previous_position)

Now we can build a Snake. Its actions are to move and grow.

class Snake(object):
    def __init__(self, initial_position, opposite_directions, get_next_coords):
        #It is important to keep track of opposite_directions
        #because the snake only ever moves forwards: it cannot reverse.
        #The head is the segment at the end of self.segments.
        self.segments=[]
        self.segments.append(Snake_Segment(initial_position))
        self.direction=None
        self.length=1
        #The snake keeps track of the head's board coordinates.
        self.head_position=initial_position
        self.opposite_direction = opposite_directions
        #We pass the snake a function from the Board class to tell it where to move next.
        self.get_next_coords =get_next_coords

    def move(self, direction=None):
        if direction==None:
            #This is the default action, the snake keeps going in the same direction.
            direction = self.direction
        elif self.direction==None:
            #This happens at the beginning before its first movement.
            self.direction=direction
        elif direction == self.opposite_direction[self.direction]:
            #Do nothing in this case because the snake only moves forwards.
            direction = self.direction
        else:
            #The snake turns.
            self.direction=direction
        #Now we get new coordinates for the head based on the direction.
        target_coords, self.direction = self.get_next_coords(self.head_position,
                                                             direction)
        #We then tell the head to move, and the command propagates down to the tail.
        self.segments[-1].move(target_coords)
        self.head_position = target_coords

    def grow(self):
        #We grow by adding a segment to the front of the snake,
        #becoming the new head.
        self.segments.append(Snake_Segment(self.head_position,
                                           self.segments[self.length-1]))
        self.length +=1

So now we have a Snake; it needs somewhere to slither, which we will call a Board. This is the most complicated part of the program, because it is tasked with incorporating the topological information. This is handled by its get_next_coord method. Its other duty is to generate apples.

import numpy as np
import random

class Board(object):
    def __init__(self, size, topology):
        self.size =size
        #The board will be self.size x self.size squares.
        self.grid=[]
        self.topology = topology #A string like 'Disc', 'Torus', etc.
        self.init_topology()
        self.grid_init()
        self.opposite_direction = {'UP':'DOWN', 'DOWN':'UP', 'LEFT':'RIGHT',
                                   'RIGHT':'LEFT'}
    def init_topology(self):
        #The topological information is contained in self.edges, which tells the board
        #which edges are boundary, which edges are identified, and with what orientations.
        #self.edges is a dictionary with keys 'LEFT', 'RIGHT' , 'UP', 'DOWN',
        #and having values of the form ('edge name or boundary', -1/0/1) where -1 represents
        #identification with reversed orientation, 1 identification with orientation preserved and
        #0 for the boundary case.

        if self.topology =='Disc':
            self.edges ={'LEFT':('BOUNDARY',0), 'RIGHT':('BOUNDARY',0),
                         'UP':('BOUNDARY',0), 'DOWN':('BOUNDARY',0)}
        elif self.topology=='Torus':
            self.edges ={'LEFT':('RIGHT',1), 'RIGHT':('LEFT',1),
                         'UP':('DOWN',1), 'DOWN':('UP',1)}
        elif self.topology=='Sphere':
            self.edges ={'LEFT':('UP',1), 'RIGHT':('DOWN',1),
                         'UP':('LEFT',1), 'DOWN':('RIGHT',1)}
        elif self.topology=='Mobius Strip':
            self.edges ={'LEFT':('BOUNDARY',0), 'RIGHT':('BOUNDARY',0),
                         'UP':('DOWN',-1), 'DOWN':('UP',-1)}
        elif self.topology=='Klein Bottle':
            self.edges ={'LEFT':('RIGHT',-1), 'RIGHT':('LEFT',-1),
                         'UP':('DOWN',1), 'DOWN':('UP',1)}
        elif self.topology=='Projective Plane':
            self.edges ={'LEFT':('RIGHT',-1), 'RIGHT':('LEFT',-1),
                         'UP':('DOWN',-1), 'DOWN':('UP',-1)}

    def set_grid(self, coords, value):
        self.grid[coords[0], coords[1]]=value

    def grid_init(self):
        grid=np.zeros(shape=(self.size, self.size))
        #On the grid, 0s represent empty, 1s represent boundary, 2s represent snake and 3s apples.
        if self.edges['LEFT'][0]=='BOUNDARY':
            grid[::,0] = np.ones(shape=self.size)
        if self.edges['RIGHT'][0]=='BOUNDARY':
            grid[::,-1] = np.ones(shape=self.size)
        if self.edges['UP'][0]=='BOUNDARY':
            grid[0] = np.ones(shape=self.size)
        if self.edges['DOWN'][0]=='BOUNDARY':
            grid[-1] = np.ones(shape=self.size)
        self.grid=grid

    def generate_apple(self):
        x=random.randint(0, self.size-1)
        y=random.randint(0, self.size-1)
        if self.grid[x,y]==0: #Makes sure the square is not already occupied.
            self.grid[x,y]=3
            return [x,y]
        else:
            return self.generate_apple()

    def get_next_coord(self, coord, direction):
        #First handles how to move in a direction without worrying about boundaries.
        if direction == 'UP':
            change_coord = [-1,0]
        elif direction =='DOWN':
            change_coord = [1,0]
        elif direction == 'LEFT':
            change_coord = [0,-1]
        elif direction == 'RIGHT':
            change_coord = [0,1]
        else:
            print 'Invalid direction' #Just in case.
        #Calculate new coordinates optimistically.
        potential_new_coords = [coord[i] + change_coord[i] for i in range(2)]

        #First check to see if we are trying to travel through an edge.
        check=False
        if -1 in potential_new_coords:
            check=True
        elif self.size in potential_new_coords:
            check=True
        if not check:
            #We are in the interior so no problems.
            return [potential_new_coords, direction]

        #Now we know we are traveling through some edge,
        #and the direction tells us which.
        edge, orientation = self.edges[direction]
        #If the snake emerges from the LEFT edge, it will then be moving in the RIGHT direction,
        #and so on.
        new_direction = self.opposite_direction[edge]
        #We automatically know one coordinate because we know the new edge.
        left_or_right=False
        if edge=='BOUNDARY':
            print 'Error, boundary case should not happen here.'
        elif edge == 'LEFT':
            left_or_right=True
            potential_new_coords[1]=0
        elif edge =='RIGHT':
            left_or_right=True
            potential_new_coords[1]=self.size-1
        elif edge == 'UP':
            potential_new_coords[0]=0
        elif edge == 'DOWN':
            potential_new_coords[0]=self.size-1
        else:
            print 'Edge invalid'
        #Now we figure out the other coordinate, which depends on either
        #the old row or column, depending on whether the new edge is opposite
        #or perpendicular.
        if edge == self.opposite_direction[direction]:
            opposite=True
        else:
            opposite=False
        #Find out whether we want the old row or old column.
        if left_or_right:
            if opposite:
                coord_component = coord[0]
            else:
                coord_component = coord[1]
        else:
            if opposite:
                coord_component = coord[1]
            else:
                coord_component = coord[0]
        #Now we find out whether the orientation is flipped.
        if orientation==-1:
            coord_component = self.size-1-coord_component
        if left_or_right:
            potential_new_coords[0]=coord_component
        else:
            potential_new_coords[1]=coord_component
        return [potential_new_coords, new_direction]

And we’re done! Next time we will make this pretty using pygame. If you want to test it before then, you can print the board grid to the console.
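If you do want to poke at it, a minimal sketch like the one below should do. I'm assuming here that the class is called Board and that it is constructed as Board(size, topology); check the source for the real names and signature.

board = Board(10, 'Torus')
board.generate_apple()
print board.grid #0 is empty, 1 boundary, 2 snake, 3 apple.

#On a torus, walking LEFT off the left edge re-enters on the right edge, still heading LEFT.
print board.get_next_coord([5, 0], 'LEFT') #Expect [[5, 9], 'LEFT'].

#On a Klein bottle the left and right edges are glued with a flip (orientation -1),
#so the row index is reflected when the snake wraps around.
kboard = Board(10, 'Klein Bottle')
print kboard.get_next_coord([2, 0], 'LEFT') #Expect [[7, 9], 'LEFT'].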
The source code is here.
 

Computing Homology

Last time we introduced abstract simplicial complexes (abs) and how to represent them in Python. Today we will look at the homology groups of an abs.

For an excellent primer on homology I recommend checking out this post; here I will just quickly recap the basics.

Homology: A quick reminder

Let X be an abs.

Write X_i for the i^{th} skeleton of X, formed by restricting to those elements of X with dimension \le i.

If \sigma \in X is a p-simplex with vertices labelled 0, \dots, p, we may write \sigma = [0, \dots, p] to denote this. To denote the (p-1)-simplex consisting of the vertices of \sigma except the i^{th} vertex, we write [0, \dots, \hat{i}, \dots, p].

For each dimension p we write N_p for the set of p-dimensional simplices, and write n_p = |N_p| for its cardinality.

For each dimension p we write C_p for the p-dimensional chain group, given by the group presentation C_p = \langle N_p \mid [x,y],\ x=x^{-1}\ \forall \, x,y \in N_p \rangle. What this says is that the generators of the group are the p-simplices, the group is abelian and each element is its own inverse.

An alternative way to think of this is as a vector space, where C_p has basis N_p and is over the field \mathbb{Z}_2 (mod 2 arithmetic). So every element can be represented as a binary vector of length n_p, and addition is bitwise xor.
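To make the mod 2 arithmetic concrete, here is a tiny numpy sketch (the two chains are made up purely for illustration):

import numpy as np

#Two 1-chains on a complex with n_1 = 4 edges, written as binary vectors in the basis N_1.
c1 = np.array([1, 0, 1, 0], dtype=int)
c2 = np.array([1, 1, 0, 0], dtype=int)

#Addition in C_1 is bitwise xor: the shared simplex cancels, since x = x^{-1}.
print np.logical_xor(c1, c2).astype(int) #[0 1 1 0]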

Now we define boundary maps \delta_{p+1}: C_{p+1} \to C_p. These are both group homomorphisms and linear maps. For a (p+1)-simplex \sigma = [0,1, \dots, p+1] define:

\delta_{p+1}\sigma = \sum \limits _{i=0}^{p+1}[0, \dots, \hat{i}, \dots, p+1], the sum of its p-dimensional faces.

Since N_{p+1} is a basis for C_{p+1}, this determines a linear mapping.
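For example, for the 2-simplex [0,1,2], a filled triangle, we get \delta_2 [0,1,2] = [1,2] + [0,2] + [0,1], the sum of its three edges. Applying \delta_1 to this sum hits each vertex exactly twice, which is 0 in mod 2 arithmetic. This is the usual fact that \delta_p \circ \delta_{p+1} = 0: boundaries are cycles, which is what makes the quotient defined below make sense.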

Define Z_{p+1} := \ker \, \delta_{p+1}, called the (p+1)-dimensional cycle group (subspace).

Define B_p := \text{Im} \, \delta_{p+1}, called the p-dimensional boundary group (subspace).

Now we can define the p-dimensional homology group H_p:= \frac{Z_p}{B_p}.

This is both a quotient group and a quotient vector space.

Finally define the p^{th} Betti number \beta_p := \text{rank }H_p = \text{rank } Z_p - \text{rank }B_p (written \beta_p rather than B_p to avoid a clash with the boundary group).

Computation

We are going to be interested in the Betti numbers, as these can be thought of as counting the p-dimensional holes in our space. For \beta_0 this just means the number of connected components.
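For example, a hollow triangle (the 1-skeleton of [0,1,2]) has \beta_0 = 1 and \beta_1 = 1, one connected component and one loop, whereas the filled triangle has \beta_1 = 0 because the loop bounds.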

In order to calculate the ranks of interest, we must construct the matrices representing the boundary mappings. In order to do this we must agree upon an ordering of the simplices. So we simply use the one generated by the data structure, which is why each simplex has an index attribute.

To find the rank of C_p, which we have called n_p, we simply count the number of p-simplices. So we can add that feature to our SimplicialComplex class easily.

    def chain_rank(self,p):
        return len(self.n_simplex_dict[p])

Now the matrix of \delta_p: C_p \to C_{p-1} will have n_{p-1} rows, one for each (p-1)-simplex, and n_p columns, one for each p-simplex. Then the i^{th} column of the matrix will contain the coefficients of the image of the i^{th} p-simplex expressed in terms of our basis for C_{p-1}.

So to find the i^{th} column vector, we ask each of a simplex’s boundary faces for their index. These will be the positions of the 1s in the column.

    def boundary_column(self,simplex):
        column_size = self.chain_rank(simplex.dimension-1)
        faces = simplex.faces
        column = [0 for i in range(column_size)]
        for face in faces:
            column[face.index]=1
        return column
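For instance, if the complex has five vertices and the two faces of the edge [0,4] happen to have been assigned indices 0 and 4 (the actual numbers depend on the order in which the data structure from the previous post handed out indices), then the boundary column for [0,4] is [1,0,0,0,1]. This is exactly why we lean on each simplex's index attribute rather than on the vertex labels themselves.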

Now we simply combine these columns to form the boundary matrix. Note that for p greater than the dimension of the simplicial complex, C_p = 0, the trivial group, and similarly for negative p, so we don't have to calculate anything in these cases.

This also means that we take the kernel of \delta_0 to be all of C_0; in the code below this is handled as a special case for p = 0 (the empty set sits in the data structure as a kind of -1-dimensional simplex, which is what gives \delta_0 its single row).

    def get_boundary_matrix(self,p, skeleton=None):
        if skeleton==None:
            skeleton=self.dimension
        if ((p>skeleton)or(p<0)):
            return np.asarray([])
        mat=[self.boundary_column(simplex) for simplex in self.n_simplex_dict[p]]
        return np.asarray(mat, dtype=int).transpose()

The skeleton argument is used when you only want to consider simplices up to a certain dimension. For instance if X is given by [0,1,2] then X is a representation of a (filled) triangle, and the zeroth Betti number \beta_0 = 1 as there is one connected component. However the zeroth Betti number of X_0 is 3, one for each vertex.

Let’s test what we’ve got so far. In this post Jeremy Kun computes the homology of the tetrahedron with an ‘arm’, and he provides a nice picture of it here. If we label the vertices of the tetrahedron 0,1,2,3, then this structure can be specified with the maximal simplices [0,1,2,3], [0,4], [2,4], then restricting to the 2-skeleton, since we want there to be a hole in the tetrahedron.

Let’s see what the boundary matrices are:

def jeremy_example():
    maximal_simplices = [[0,1,2,3],[0,4],[2,4]]
    S = SimplicialComplex(maximal_simplices)
    S.plot(direction='up', label_color='blue')
    for i in range(3):
        print "Boundary matrix of dimension ", i
        print S.get_boundary_matrix(i,skeleton=2)

jeremy_example()

Which produced the following plot (of the 3-skeleton):

[Plot of the complex: jeremy_example]

And outputted to the console:

Boundary matrix of dimension 0
[[1 1 1 1 1]]
Boundary matrix of dimension 1
[[1 1 1 0 0 0 1 0]
[1 0 0 1 1 0 0 0]
[0 1 0 1 0 1 0 1]
[0 0 1 0 1 1 0 0]
[0 0 0 0 0 0 1 1]]
Boundary matrix of dimension 2
[[1 1 0 0]
[1 0 1 0]
[0 1 1 0]
[1 0 0 1]
[0 1 0 1]
[0 0 1 1]
[0 0 0 0]
[0 0 0 0]]

If you check his post, you will see that the matrices are the same, up to a reordering of the basis and noting that 1 \equiv -1 \ (mod \ 2).

So far so good. Now all that remains is to calculate the rank and nullity of each matrix. But this is easily done: we simply diagonalize the matrix and count the pivots. For \delta_{p+1} the number of pivots is equal to \text{rank } B_p, and n_{p+1} - \text{rank } B_p = \text{rank } Z_{p+1} by the rank-nullity theorem.

The form we will put the matrix in is called Smith normal form. I won't go through the details of how the algorithm works; they are on the wiki page. If you have seen Gaussian elimination before, this is basically that.

    def reduce_matrix(self, matrix):
        #Returns [reduced_matrix, rank, nullity]
        if np.size(matrix)==0:
            return [matrix,0,0]
        m=matrix.shape[0]
        n=matrix.shape[1]

        def _reduce(x):
            #We recurse through the diagonal entries.
            #We move a 1 to the diagonal entry, then
            #knock out any other 1s in the same  col/row.
            #The rank is the number of nonzero pivots,
            #so when we run out of nonzero diagonal entries, we will
            #know the rank.
            nonzero=False
            #Searching for a nonzero entry then moving it to the diagonal.
            for i in range(x,m):
                for j in range(x,n):
                    if matrix[i,j]==1:
                        matrix[[x,i],:]=matrix[[i,x],:]
                        matrix[:,[x,j]]=matrix[:,[j,x]]
                        nonzero=True
                        break
                if nonzero:
                    break
            #Knocking out other nonzero entries.
            if nonzero:
                for i in range(x+1,m):
                    if matrix[i,x]==1:
                        matrix[i,:] = np.logical_xor(matrix[x,:], matrix[i,:])
                for i in range(x+1,n):
                    if matrix[x,i]==1:
                        matrix[:,i] = np.logical_xor(matrix[:,x], matrix[:,i])
                #Proceeding to next diagonal entry.
                return _reduce(x+1)
            else:
                #Run out of nonzero entries so done.
                return x
        rank=_reduce(0)
        return [matrix, rank, n-rank]

    def boundary_rank(self,p, skeleton=None):
        if skeleton==None:
            skeleton=self.dimension
        if (p >= skeleton) or (p<0):
            #Trivial case.
            return 0
        else:
            return self.reduce_matrix(self.get_boundary_matrix(p+1, skeleton))[1]

    def cycle_rank(self, p, skeleton=None):
        if skeleton==None:
            skeleton=self.dimension
        if p==0:
            #Mapping to trivial group so kernel is everything in this case.
            return self.chain_rank(0)
        elif (p>skeleton) or (p<0):
            #Trivial case.
            return 0
        else:
            return self.reduce_matrix(self.get_boundary_matrix(p, skeleton))[2]

    def betti_number(self, p, skeleton=None):
        if skeleton==None:
            skeleton=self.dimension
        return self.cycle_rank(p,skeleton) - self.boundary_rank(p, skeleton)
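As a quick sanity check of reduce_matrix: fed the one-row matrix [[1 1 1 1 1]] that appeared above, it finds the 1 at position (0,0) already on the diagonal, xors the first column into the other four to clear them, runs out of nonzero entries, and returns the reduced matrix [[1 0 0 0 0]] with rank 1 and nullity 4.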

Now let’s just test it out on the same example. We expect \beta_0 = 1 since there is one connected component, \beta_1 = 1 from the 1-dimensional hole made by the arm, and \beta_2 = 1 for the 2-dimensional void enclosed by the tetrahedron’s shell (remember we restricted to the 2-skeleton, so the tetrahedron is hollow).

def jeremy_example(skeleton=2):
    maximal_simplices = [[0,1,2,3],[0,4],[2,4]]
    S = SimplicialComplex(maximal_simplices)
    S.plot(direction='up', label_color='blue')
    for i in range(4):
        print "Boundary matrix of dimension", i
        print S.get_boundary_matrix(i,skeleton)
        print "Reduced boundary matrix of dimension", i
        print S.reduce_matrix(S.get_boundary_matrix(i,skeleton))[0]
        print "Betti number",i,"=", S.betti_number(i,skeleton)

jeremy_example()

The console outputs:

Boundary matrix of dimension 0
[[1 1 1 1 1]]
Reduced boundary matrix of dimension 0
[[1 0 0 0 0]]
Betti number 0 = 1
Boundary matrix of dimension 1
[[1 1 1 0 0 0 1 0]
[1 0 0 1 1 0 0 0]
[0 1 0 1 0 1 0 1]
[0 0 1 0 1 1 0 0]
[0 0 0 0 0 0 1 1]]
Reduced boundary matrix of dimension 1
[[1 0 0 0 0 0 0 0]
[0 1 0 0 0 0 0 0]
[0 0 1 0 0 0 0 0]
[0 0 0 1 0 0 0 0]
[0 0 0 0 0 0 0 0]]
Betti number 1 = 1
Boundary matrix of dimension 2
[[1 1 0 0]
[1 0 1 0]
[0 1 1 0]
[1 0 0 1]
[0 1 0 1]
[0 0 1 1]
[0 0 0 0]
[0 0 0 0]]
Reduced boundary matrix of dimension 2
[[1 0 0 0]
[0 1 0 0]
[0 0 1 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]]
Betti number 2 = 1
Boundary matrix of dimension 3
[]
Reduced boundary matrix of dimension 3
[]
Betti number 3 = 0
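Reading the ranks off the reduced matrices: \delta_1 has 4 pivots, so \text{rank }B_0 = 4 and \text{rank }Z_1 = 8 - 4 = 4; \delta_2 has 3 pivots, so \text{rank }B_1 = 3 and \text{rank }Z_2 = 4 - 3 = 1; and \delta_3 is trivial, so \text{rank }B_2 = 0. Hence \beta_0 = 5 - 4 = 1 (using the special case \text{rank }Z_0 = n_0 = 5), \beta_1 = 4 - 3 = 1 and \beta_2 = 1 - 0 = 1.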

Just as expected. So we’re done. Get the code here.