Home » maths » graph theory » A First Taste of Graph Theory

# A First Taste of Graph Theory

1. Motivation

Welcome back. So far we’ve learnt how to clean data, and do some basic plotting. Now we really want to move on to some machine learning proper and make some models.

If you recall we were looking at the titanic data, and trying to split the data using sex, class, age etc, and the name of the game is to have these splits consist of mostly survivors or non-survivors, that way when we predict that a particular group will survive, because the majority in that group do, we will be mostly right.

Consider that if ${51 \% }$ of women survived compared with ${49 \% }$ for men, when we make the prediction, reducing these probabilities to classes, we are throwing away a lot of information.

What we have been groping towards is a kind of crude decision tree learning and this, with some more advanced variations, is what I want to implement.

But what is a decision tree? What is a tree? What is a graph? How are we to proceed if we know no graph theory? Well we could probably get away without any. But we’re better than that. So today I will give a primer on graph theory, will this eventual goal in mind. Once we know something about graphs, we can learn about making graphs with Python, and once we’ve done that, we can try them out on the titanic data.

2. The Definition of a Graph

Here is a picture of a graph:

The red filled circles we call ‘vertices’ or sometimes ‘nodes’. The lines connecting the vertices we call ‘edges’.

A graph is basically a way of showing relationships between things, for instance, imagine that the vertices above were people, and the edges represented the relationship of holding hands, awww.

If you have come across the idea of relations before, you may be thinking of reflexivity, transitivity and symmetry.

A ‘reflexive’ edge is what we call a loop, like this:

A ‘symmetric’ edge is what we call undirected, and it just looks like a line, here’s what directed edges look like:

The beginning vertex of a directed edge is called the source, and the ending vertex is called the target.

Transitivity – Homer taller than Bart and Bart taller than Lisa implies Homer taller than Lisa – you could visualise as a triangle:

Now that we have a visual idea of graphs, let’s define them formally.

Definition 1 An undirected graph ${G}$ is a triple ${(V,E, ends)}$, where ${V}$ is a set of vertices, ${E}$ is a set of edges and ${ends}$ is a mapping from ${E}$ to subsets of ${V}$ of size ${1}$ or ${2}$.

And for directed graphs:

Definition 2 A directed graph ${G =(V,E, source, target)}$, where ${V,E}$ are sets of vertices and edges respectively, and ${source, target: E \rightarrow V.}$

If you’ve studied maths, then you’ll know that one of the key questions to ask about a new mathematical object is: what are the morphisms? Meaning, what are the functions from graphs to graphs that preserve structure.

A graph homomorphism is a function ${f: G \rightarrow H}$ between two graphs, taking vertices to vertices and edges, that ‘respects’ the structure of the graph, in symbols: ${ends \circ f = f \circ ends}$, the function commutes with the ${ends}$ maps.

Notice that the notation suppresses the difference between the ${ends}$ mapping in ${G}$ and ${H}$, but we’re all grown-ups here, we get context.

As you can probably guess, for directed graphs we require ${f \circ source = source \circ f}$ and ${f \circ target = target \circ f}$

A subgraph of a graph is a subset of its vertices, a subset of its edges, with the same mappings but that it is still a graph.

The degree of a vertex is the number of edges connected to it.

3. Types of Graphs

Now we’ll see some of the standard classes of graphs. These will all be simple graphs, meaning that there are no loops, finite numbers of vertices and edges, at most one edge between any two vertices and undirected.

3.1. Linear Graphs

These just look like a line:

3.2. Cyclic Graphs

We’ve already seen two of these, you walk around a cycle and come back to the beginning. We denote the cyclic graph with ${n}$ vertices by ${C_n}$.

3.3. Complete Graphs

This is the situation where everything is connected to everything else, the most edges possible. This is denoted ${K_n}$

3.4. Trees

A tree is connected, meaning there is only one ‘piece’ of the graph, and has no cycles, meaning no parts that look a cyclic graph.

The degree 1 vertices are called leaves, and an n-ary tree is one where each non-leaf has n ‘children’.

4. Representing Graphs

One very natural way to encode the layout of a graph is an adjacency matrix. If ${|V|=n}$ then the matrix will be ${n \times n}$, where each row represents a vertex as a source, and the columns are vertices as targets, and the entry is the number of edges with that source and target.

So if there is a ${2}$ in row ${1}$ and column ${5}$, there are two edges from vertex ${1}$ to vertex ${5}$. Of course the matrix will depend on how you choose to order the vertices. For example, the adjacency matrix of ${C_3 = \left ( \begin{array}{ccc} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1& 0 \end{array} \right )}$, and the graph looks like this:

Notice that if there are no loops, the diagonal will be ${0}$s, and if undirected then the matrix is symmetric about the main diagonal. Another neat property is that if ${A}$ is the adjacency matrix of a simple graph ${G}$, then ${A^n_{i,j}}$ is the number of paths of length ${n}$ from vertex ${i \mapsto j}$. Prove it yourself!

The problem with adjacency matrices as a data structure, is that it takes a lot of work to change it when you add or delete a vertex. Another way of doing it is with an adjacency list. This is where you have a list of the vertices, with each vertex ‘pointing’ to a list of vertices that are its targets. This can easily be implemented using dictionaries in Python. This makes it easy to find adjacent vertices, but it doesn’t really treat edges with the respect they deserve. We’ll be implementing something where both edges and vertices are objects that know about who they are connected to, this won’t be as fast as an adjacency list, but it will allow us to keep track of a lot of information.

5. Next Time

This was just to give you an idea of graphs are, and some of terminology needed to discuss them. There is loads more stuff to learn about graphs, you can colour edges and vertices, talk about paths and walks on a graph, planar graphs… but that’s all for now. Next time: object orientated programming with Python.

1. […] time we are going to combine the lessons learned about objects and decorators in Python, and about graph theory, to represent graphs as […]

2. […] Welcome back! Last time we introduced the Vertex, Edge, Graph, DirGraph and UnDirGraph classes, now we are going to have some fun and introduce the special graphs like linear, cycle, complete and so on. If you want to refresh your memory on the special graphs, we discussed them here. […]

3. […] So now you know something about Python, and also something about the Titanic data set. Our next goal is to produce a decision tree model for the data. Prior to that I created some Python code to represent graphs, to learn the very basics of graph theory you can click here. […]

4. Great blog so far!!

Represents a cyclic relationship rather than a transitive relationship. You may want to correct that.

Cheers.

5. […] to this problem in this post, which includes graph theory (so it would be very nice if you go and read the basics of graph theory before continuing, although it is not that necessary since I assume you […]