Home » machine learning » Neural Networks Part 2: Python Implementation

Neural Networks Part 2: Python Implementation

Ok so last time we introduced the feedforward neural network. We discussed how input gets fed forward to become output, and the backpropagation algorithm for learning the weights of the edges.

Today we will begin by showing how the model can be expressed using matrix notation, under the assumption that the neural network is fully connected, that is each neuron is connected to all the neurons in the next layer.

Once this is done we will give a Python implementation and test it out.

Matrix Notation For Neural Networks

Most of  this I learned from here.

In what follows, vectors are always thought of as columns, and so the transpose a row.

So first off we have x \in \mathbb{R}^k, our input vector, and y \in \mathbb{R}^m our output vector.

Our neural network N will have N layers L_1, \dots , L_n with L_1   the input layer, and L_n the output layer.

For each layer L_i:

  • write n_i:=|L_i| for the number of neurons in that layer (not including the bias neurons);
  • write f_i: \mathbb{R}^{n_i} \to \mathbb{R}^{n_i} for the layer’s activation function;
  • write f_i': \mathbb{R}^{n_i} \to \mathbb{R}^{n_i} for vector of derivatives of the activation functions;
  • write I_i \in \mathbb{R}^{n_i} for the vector of inputs to the ith layer;
  • write b_i \in \mathbb{R}^{n_{i}} for the vector of biases;
  • write O_i \in \mathbb{R}^{n_i}= f_i(I_i) for the vector of outputs from the ith layer;
  • write O_i' \in \mathbb{R}^{n_i}= f_i'(I_i);
  • write W_i for the n_{i+1} \times n_i real matrix containing the weights of the edges going from L_i \to L_{i+1} (not including bias neurons), so that W_iO_i + b_i = I_{i+1}.

Feeding forward in vector notation

So to feedforward the input we have

x \mapsto W_1x +b_1 \mapsto f_2(W_1x+b_1) \mapsto W_2f_2(W_1x)+b_2 \mapsto \dots \mapsto f_n(W_{n-1} f_{n-1}( \dots W_2f_2(W_1x) )).

And we call this output f_N(x).

Backpropagation in vector notation

Now write E_i \in \mathbb{R}^{n_i} for the error at the layer i defined E_i:=\frac{\partial e}{\partial I_i} .

Then E_n = O_n' \cdot (O_n-y), and E_{i-1} = \left (W_{i-1}^TE_i\right ) \cdot O_{i-1}', where the multiplication \cdot is component wise.

Write \triangle W_i for the matrix with entries (\frac{\partial e}{\partial w_{ij}})_{ij}, for non-bias weights.

Then we have \triangle W_i = \cdot E_{i+1} \cdot O_i^T, the outer product of E_{i+1} and O_i.

And for the biases write \triangle b_i \in \mathbb{R}^{n_i} for the vector with jth component \frac{\partial e}{\partial (b_i)_j}.

Then \triangle b_i =E_{i+1}.

Python Implementation

We will create a class NeuralNetwork and perform our calculations using matrices in numpy.

import numpy as np

class NeuralNetwork(object):
    def __init__(self, X, y, parameters):
        #Input data
        #Output data
        #Expect parameters to be a tuple of the form:
        #    ((n_input,0,0), (n_hidden_layer_1, f_1, f_1'), ...,
        #     (n_hidden_layer_k, f_k, f_k'), (n_output, f_o, f_o'))
        self.n_layers = len(parameters)
        #Counts number of neurons without bias neurons in each layer.
        self.sizes = [layer[0] for layer in parameters]
        #Activation functions for each layer.
        self.fs =[layer[1] for layer in parameters]
        #Derivatives of activation functions for each layer.
        self.fprimes = [layer[2] for layer in parameters]

    def build_network(self):
        #List of weight matrices taking the output of one layer to the input of the next.
        #Bias vector for each layer.
        #Input vector for each layer.
        #Output vector for each layer.
        #Vector of errors at each layer.
        #We initialise the weights randomly, and fill the other vectors with 1s.
        for layer in range(self.n_layers-1):
            n = self.sizes[layer]
            m = self.sizes[layer+1]
            self.weights.append(np.random.normal(0,1, (m,n)))
        #There are only n-1 weight matrices, so we do the last case separately.
        n = self.sizes[-1]

    def feedforward(self, x):
        #Propagates the input from the input layer to the output layer.
        for i in range(1,self.n_layers):
        return self.outputs[-1]

    def update_weights(self,x,y):
        #Update the weight matrices for each layer based on a single input x and target y.
        output = self.feedforward(x)

        for i in xrange(n,0,-1):
            self.errors[i] = self.fprimes[i](self.inputs[i])*self.weights[i].T.dot(self.errors[i+1])
            self.weights[i] = self.weights[i]-self.learning_rate*np.outer(self.errors[i+1],self.outputs[i])
            self.biases[i] = self.biases[i] - self.learning_rate*self.errors[i+1]
        self.weights[0] = self.weights[0]-self.learning_rate*np.outer(self.errors[1],self.outputs[0])
        self.biases[0] = self.biases[0] - self.learning_rate*self.errors[1] 
    def train(self,n_iter, learning_rate=1):
        #Updates the weights after comparing each input in X with y
        #repeats this process n_iter times.
        for repeat in range(n_iter):
            #We shuffle the order in which we go through the inputs on each iter.
            for row in index:

    def predict_x(self, x):
        return self.feedforward(x)

    def predict(self, X):
        n = len(X)
        m = self.sizes[-1]
        ret = np.ones((n,m))
        for i in range(len(X)):
            ret[i,:] = self.feedforward(X[i])
        return ret

And we’re done! Now to test it we generate some synthetic data and supply the network with some activation functions.

def logistic(x):
    return 1.0/(1+np.exp(-x))

def logistic_prime(x):
    return ex/(1+ex)**2

def identity(x):
    return x

def identity_prime(x):
    return 1

First we will try to get it to approximate a sine curve.

#expit is a fast way to compute logistic using precomputed exp.
from scipy.special import expit
def test_regression(plots=False):
    #First create the data.
    #We make a neural net with 2 hidden layers, 20 neurons in each, using logistic activation
    param=((1,0,0),(20, expit, logistic_prime),(20, expit, logistic_prime),(1,identity, identity_prime))
    #Set learning rate.
    for rate in rates:
        N.train(4000, learning_rate=rate)
    import matplotlib.pyplot as plt
    fig, ax=plt.subplots(1,1)
    if plots:
        ax.plot(X,y, label='Sine', linewidth=2, color='black')
        for data in predictions:
            ax.plot(X,data[1],label="Learning Rate: "+str(data[0]))

When I ran this it produced the following:

Next we will try a classification problem with a nonlinear decision boundary, how about being above and below the sine curve?

def test_classification(plots=False):
    #Number samples


    #Samples for true decision boundary plot
    l = np.sin(L)

    #Data inputs, training
    X = np.random.uniform(0, 3*np.pi, size=(n,2))
    X[:,1] *= 1/np.pi
    X[:,1]-= 1

    #Data inputs, testing
    T = np.random.uniform(0, 3*np.pi, size=(n,2))
    T[:,1] *= 1/np.pi
    T[:,1] -= 1

    #Data outputs
    y = np.sin(X[:,0]) <= X[:,1]

    param=((2,0,0),(30, expit, logistic_prime),(30, expit, logistic_prime),(1,expit, logistic_prime))
    N=NeuralNetwork(X,y, param)
    N.train(n_iter, learning_rate)
    predictions_training= predictions_training <0.5
    predictions_training= predictions_training[:,0]
    predictions_testing= predictions_testing <0.5
    predictions_testing= predictions_testing[:,0]

    import matplotlib.pyplot as plt
    fig, ax=plt.subplots(2,1)

    #Training plot
    #We plot the predictions of the neural net blue for class 0, red for 1.
    ax[0].scatter(X[predictions_training,0], X[predictions_training,1], color='blue')
    not_index = np.logical_not(predictions_training)
    ax[0].scatter(X[not_index,0], X[not_index,1], color='red')
    ax[0].set_xlim(0, 3*np.pi)
    #True decision boundary
    ax[0].plot(L,l, color='black')
    #Shade the areas according to how to they should be classified.
    ax[0].fill_between(L, l,y2=-1, alpha=0.5)
    ax[0].fill_between(L, l, y2=1, alpha=0.5, color='red')

    #Testing plot
    ax[1].scatter(T[predictions_testing,0], T[predictions_testing,1], color='blue')
    not_index = np.logical_not(predictions_testing)
    ax[1].scatter(T[not_index,0], T[not_index,1], color='red')
    ax[1].set_xlim(0, 3*np.pi)
    ax[1].plot(L,l, color='black')
    ax[1].fill_between(L, l,y2=-1, alpha=0.5)
    ax[1].fill_between(L, l, y2=1, alpha=0.5, color='red')    


When I ran this it looked like this.
The top plot shows the performance on the training data, and the bottom the performance of on the testing data. The points are colored according to the net’s predictions, whereas the areas are shaded according to the true classifications.

As you can see neural networks are capable of giving very good models, but the number of iterations and hidden nodes may be large. Less iterations are possible using good configurations of the network i terms of sizes and numbers of hidden layers, and choosing the learning rate well, but it is difficult to know how to choose these well.

It ended up being quicker to choose a larger number of hidden nodes, a large number of iterations, and a small learning rate, than to experiment finding a good choice. In the future I may write about how to use momentum to speed up training.

The code for this post is available here, please comment if anything is unclear or incorrect!


  1. […] with theano, and thanks to gpu computation, much faster also! I recommend having a look at my python implementation of backpropagation just so you can see the effort saved. This implementation benefited greatly from this […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: