Convolutional Neural Networks

Kernels

An introductory video to the concept

Filtering an image means centring a kernel over each pixel of the image in turn. At each position, we multiply every pixel under the kernel by the corresponding kernel value, sum the products, and divide by the sum of the kernel’s values.

TL;DR: each output pixel is a weighted average of the pixels under the kernel, with the kernel values as the weights.
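The filtering step described above can be sketched in NumPy. This is a minimal illustration rather than a definitive implementation: the function name `apply_kernel` is made up for this example, the image is assumed to be 2-D greyscale, the kernel is assumed to have odd dimensions and a non-zero sum, and edge pixels are repeated at the borders (the text does not say how borders are handled).

```python
import numpy as np

def apply_kernel(image, kernel):
    """Filter a 2-D greyscale image, normalising by the sum of the kernel."""
    kh, kw = kernel.shape
    pad_y, pad_x = kh // 2, kw // 2
    # Assumption: repeat edge pixels so the kernel centre can sit on border pixels.
    padded = np.pad(image, ((pad_y, pad_y), (pad_x, pad_x)), mode="edge")
    total = kernel.sum()
    out = np.zeros(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            # The window is the part of the image currently under the kernel.
            window = padded[y:y + kh, x:x + kw]
            out[y, x] = (window * kernel).sum() / total
    return out

# A 5x5 image that is black except for one bright pixel in the centre.
image = np.zeros((5, 5))
image[2, 2] = 9.0

mean_kernel = np.ones((3, 3))
blurred = apply_kernel(image, mean_kernel)
```

Here the single bright pixel is spread evenly over the 3 by 3 neighbourhood around it, so each of those nine output pixels becomes 1 and the rest stay 0.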

Kernel Examples

Mean Kernel

A mean blur kernel is the simplest form of kernel: every value in the kernel is 1, so each pixel under the kernel contributes equally. The pixels are summed and divided by the number of kernel values (9 for a \(3\times 3\) kernel).

\[ \begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix} \]
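As a worked example, assume a hypothetical \(3\times 3\) patch of pixels with the values 1 to 9. Every kernel value is 1, so the output for the centre pixel is just the plain average:

\[ \frac{1+2+3+4+5+6+7+8+9}{1+1+1+1+1+1+1+1+1} = \frac{45}{9} = 5 \]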

Gaussian Kernel

A more complex kernel is one whose values follow a normal distribution. A Gaussian blur kernel is a sampled 2-dimensional normal distribution.

It might look something like this:

\[ \begin{bmatrix} 0.25 & 0.5 & 0.25\\ 0.5 & 1 & 0.5\\ 0.25 & 0.5 & 0.25 \end{bmatrix} \]

A Gaussian kernel gives the most weight to the pixels nearest the centre of the kernel.
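A kernel like the one above can be generated by sampling a 2-D Gaussian on a grid. The sketch below is illustrative only: `gaussian_kernel` and its parameters are names chosen for this example, and the value of `sigma` was picked so that the sampled kernel matches the example matrix once both are normalised to sum to 1.

```python
import numpy as np

def gaussian_kernel(size=3, sigma=1.0):
    """Sample a 2-D Gaussian on a size x size grid centred on the middle cell."""
    offsets = np.arange(size) - (size - 1) / 2   # e.g. [-1, 0, 1] for size 3
    xx, yy = np.meshgrid(offsets, offsets)
    kernel = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return kernel / kernel.sum()                 # normalise so the weights sum to 1

# The example kernel from the text; its values sum to 4.
example = np.array([[0.25, 0.5, 0.25],
                    [0.5,  1.0, 0.5],
                    [0.25, 0.5, 0.25]])
normalised_example = example / example.sum()

# With this sigma, a pixel one step from the centre gets half the centre weight.
sigma = 1 / np.sqrt(2 * np.log(2))
sampled = gaussian_kernel(3, sigma)
```

Dividing by the kernel's sum here has the same effect as the "divide by the total value of the kernel" step in the filtering procedure.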

Kernels in Neural Networks

Explaining a simple Convolutional Neural Network

When using kernels in Neural Networks, the values in each kernel start out completely random. This is because each value in a kernel is treated as a weight, which the network adjusts as it trains.

\[ \begin{bmatrix} w_{1,1} & w_{1,2} & \dots & w_{1,n}\\ w_{2,1} & w_{2,2} & \dots & w_{2,n}\\ \vdots & \vdots & \ddots & \vdots\\ w_{m,1} & w_{m,2} & \dots & w_{m,n} \end{bmatrix} \]

A single kernel
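As a rough sketch of what random kernel weights look like in practice, the snippet below draws a small bank of kernels from a normal distribution. The number of kernels, their size, and the scale of 0.1 are arbitrary values chosen for illustration; they are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical sizes: 8 kernels, each m x n = 3 x 3.
num_kernels, m, n = 8, 3, 3

# Every entry is an independent random weight w_{i,j}.
kernels = rng.normal(loc=0.0, scale=0.1, size=(num_kernels, m, n))

first_kernel = kernels[0]   # one m x n matrix of weights
```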

Feed-Forward

To provide an image to a network (for example, a multi-layer perceptron), we have to provide every Red, Green, and Blue value of each pixel as an input node. For a 2MP image, that’s 3 \(\times\) 2,000,000 = 6,000,000 inputs, which isn’t realistic for a simple neural network to compute.
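The arithmetic can be made concrete with a quick count. The hidden layer size and the number of kernels below are hypothetical, and the sketch assumes each kernel holds one \(3\times 3\) set of weights per colour channel.

```python
pixels = 2_000_000          # a 2MP image
channels = 3                # Red, Green, Blue

inputs = pixels * channels  # 6,000,000 input nodes

# Hypothetical fully connected hidden layer with 100 nodes:
hidden_nodes = 100
dense_weights = inputs * hidden_nodes        # 600,000,000 weights

# Hypothetical convolutional layer with 32 kernels of 3x3 weights per channel:
num_kernels = 32
kernel_weights = num_kernels * 3 * 3 * channels   # 864 weights
```

The kernel weights are reused at every position in the image, which is why their count does not grow with the image's resolution.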

Instead, we pass the image through multiple different kernels, each one returning another image. Together, these images form our first feature map, so called because it’s a map of the features that the neural network considers important. In essence, it’s the first layer of the convolutional network.

We can reduce the resolution between layers of the feature map through a process called Max Pooling. How much the resolution drops depends on the max pool’s dimensions and stride.

We run each layer through another set of convolutions, increasing the number of feature images in the feature map but reducing each image’s resolution. We repeat this until each image in the feature map is \(1\times 1\).
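To see how the shapes might evolve, the sketch below traces a feature map through repeated convolution and pooling. All the numbers are assumptions made for illustration: a 32 by 32 input, 8 kernels in the first layer, a doubling of the kernel count per layer, convolutions padded so they keep the resolution, and a \(2\times 2\) max pool with stride 2 that halves it.

```python
def trace_shapes(size=32, channels=8, pool=2):
    """Return the (images, height, width) shape of the feature map after each layer."""
    shapes = []
    while size > 1:
        size = size // pool            # max pooling halves the resolution
        shapes.append((channels, size, size))
        channels *= 2                  # the next layer uses twice as many kernels
    return shapes

shapes = trace_shapes()
# [(8, 16, 16), (16, 8, 8), (32, 4, 4), (64, 2, 2), (128, 1, 1)]
```

Under these assumptions the network ends with 128 feature images, each \(1\times 1\).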

These final features can now be represented as nodes, which are fully connected to an output node that returns a single value (or a few values depending on your aims).
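A fully connected output node is a weighted sum of the final features. The sketch below assumes 128 final features and a single output value; the feature values and weights are random placeholders rather than the result of a trained network.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

features = rng.random(128)                  # one value per 1x1 feature image (128 is hypothetical)
weights = rng.normal(scale=0.1, size=128)   # one weight per connection to the output node
bias = 0.0

# The output node sums every feature multiplied by its connection weight.
output = float(features @ weights + bias)
```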

Max-Pooling

We create an \(n \times n\) filter that will be our max pooling filter, and then define a stride. We slide the filter across the convolved image as before, but instead of stepping by one pixel each time, we step by the value of our stride.

At each position, we store the maximum value inside the filter in the corresponding location in the output image.

This process is essentially subsampling:

Max Pooling applied to an image of a bird
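The max pooling procedure can be sketched as follows. The function name `max_pool` and the small input image are made up for this example, and the sketch assumes the filter never hangs over the edge of the image.

```python
import numpy as np

def max_pool(image, size=2, stride=2):
    """Slide a size x size filter over the image in steps of `stride`, keeping each maximum."""
    out_h = (image.shape[0] - size) // stride + 1
    out_w = (image.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=image.dtype)
    for y in range(out_h):
        for x in range(out_w):
            window = image[y * stride:y * stride + size,
                           x * stride:x * stride + size]
            out[y, x] = window.max()
    return out

image = np.array([[1, 3, 2, 0],
                  [4, 2, 1, 5],
                  [7, 0, 3, 2],
                  [1, 6, 4, 8]])

pooled = max_pool(image)   # [[4, 5], [7, 8]]
```

With a \(2\times 2\) filter and a stride of 2, the windows do not overlap and the output has half the width and height of the input.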

Back-Propagation

This is the process of adjusting the weights in each kernel based on a prediction of what will reduce the error. It is similar to how a multi-layer perceptron performs back-propagation, using derivatives of the error with respect to each weight.
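As a minimal sketch of the idea, the code below performs one gradient descent update on a single kernel. It is a simplification under stated assumptions, not a full back-propagation implementation: there is one kernel and no pooling or activation function, the error is taken to be half the sum of squared differences from a target image, and the helper names, target, and learning rate are all hypothetical.

```python
import numpy as np

def convolve_valid(image, kernel):
    """Slide the kernel over the image without padding or normalisation."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            out[y, x] = (image[y:y + kh, x:x + kw] * kernel).sum()
    return out

def loss(image, kernel, target):
    """Half the sum of squared errors between the output and the target."""
    return 0.5 * ((convolve_valid(image, kernel) - target) ** 2).sum()

def kernel_gradient(image, kernel, target):
    """Derivative of the loss with respect to each kernel weight."""
    error = convolve_valid(image, kernel) - target
    grad = np.zeros_like(kernel)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            # Weight (i, j) touched these pixels, one per output position.
            patch = image[i:i + error.shape[0], j:j + error.shape[1]]
            grad[i, j] = (error * patch).sum()
    return grad

rng = np.random.default_rng(seed=0)
image = rng.random((5, 5))
# Hypothetical target: the output of a kernel we would like the network to learn.
target = convolve_valid(image, np.array([[0.0, 1.0], [1.0, 0.0]]))

kernel = rng.normal(scale=0.1, size=(2, 2))   # random starting weights
learning_rate = 0.01

loss_before = loss(image, kernel, target)
kernel = kernel - learning_rate * kernel_gradient(image, kernel, target)
loss_after = loss(image, kernel, target)      # smaller than loss_before
```

Repeating the update many times moves the kernel's weights towards values that reduce the error further.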

Advantages of Convolutional Neural Networks

Each node on a hidden layer is connected through fewer parameters
You can add more layers because each one has fewer inputs than the previous
Convolutions can be repeated as many times as necessary to obtain more accurate results

Glossary

Convolutional Network: A neural network where nodes aren’t fully connected to every node in the previous layer; each node is connected only to the small region covered by a kernel.
Kernel (filter): A 2-dimensional array containing numerical values, to be multiplied with a subsection of an image.
Kernel Dimensions: The width and height of the kernel, for example \(3\times 2\)
Kernel Stride: How many pixels the kernel moves between one position and the next. A stride greater than 1 reduces the resolution of the output image.
Feature Map: A 3D matrix consisting of images that represent distinct features of an input
Max Pooling: The process of reducing the resolution of a feature map by keeping only the maximum value from each region the pooling filter covers.
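The effect of kernel dimensions and stride on resolution can be expressed as a small helper. The function name `output_size` is chosen for this example, and it assumes no padding, so the kernel must fit entirely inside the image.

```python
def output_size(input_size, kernel_size, stride=1):
    """Number of kernel positions along one axis when no padding is used."""
    return (input_size - kernel_size) // stride + 1

conv_width = output_size(28, kernel_size=3, stride=1)   # 26
pool_width = output_size(28, kernel_size=2, stride=2)   # 14
```

The same formula applies separately to the width and the height of the image.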