Perceptrons & MLPs

Feed Forward

Sigmoid Function

The sigmoid function provides a smooth, differentiable alternative to the hard step function:

The step function vs the sigmoid function

The step function only gives a useful training signal when the activation crosses the threshold exactly; everywhere else its gradient is zero, so near-misses are not rewarded. With the sigmoid, the output changes smoothly, so results that are close to the target still produce a gradient that can guide learning.
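As a minimal NumPy sketch (the function names and sample points are illustrative, not from these notes), comparing the two activations and the sigmoid's gradient:

```python
import numpy as np

def step(x):
    # Hard threshold: 1 if the activation is positive, else 0.
    return (x > 0).astype(float)

def sigmoid(x):
    # Smooth "soft step": sigma(x) = 1 / (1 + e^(-x)).
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative sigma'(x) = sigma(x) * (1 - sigma(x)); nonzero everywhere,
    # so activations close to the target still give a learning signal.
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.linspace(-5, 5, 11)
print(step(x))          # jumps from 0 to 1 at x = 0
print(sigmoid(x))       # moves smoothly from ~0 to ~1
print(sigmoid_grad(x))  # largest near 0, never exactly zero
```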

Gradient Descent

Article on Gradient Descent Methods.

Batch Update

Batch update accumulates the gradient over the whole training set and applies a single weight change per pass. It converges to a local minimum in fewer, smoother updates than sequential gradient descent, but because every step follows the same averaged gradient it is also more likely to get stuck in a local minimum. A sketch of this update rule follows below.
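A minimal NumPy sketch of a batch update for a single sigmoid neuron; the toy data, learning rate, and epoch count are assumptions chosen for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy data: 4 examples, 2 inputs each (illustrative values only).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0., 1., 1., 1.])           # targets (an OR-like task)

w = np.zeros(2)
b = 0.0
eta = 0.5                                 # learning rate (assumed)

for epoch in range(5000):
    # Forward pass over the *whole* training set.
    y = sigmoid(X @ w + b)
    err = y - t
    # Accumulate the gradient over all examples, then apply ONE update.
    grad_w = (err * y * (1 - y)) @ X / len(X)
    grad_b = np.mean(err * y * (1 - y))
    w -= eta * grad_w
    b -= eta * grad_b

print(np.round(sigmoid(X @ w + b), 2))    # outputs move toward 0, 1, 1, 1
```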

Stochastic Update

Stochastic (online) update provides a performance improvement: instead of waiting for a full pass over the data, the weights are adjusted after each individual training example. The noise this adds to each step makes updates cheaper and can help the search escape shallow local minima. A common compromise is to split the data into small mini-batches and average the gradient over each batch before updating; see the sketch below. (online explanation)
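For comparison with the batch version above, a sketch of the stochastic (per-example) update on the same toy problem; the data and hyperparameters are again illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0., 1., 1., 1.])

rng = np.random.default_rng(0)
w = np.zeros(2)
b = 0.0
eta = 0.5

for epoch in range(5000):
    # Visit the examples in a random order and update after EACH one.
    for i in rng.permutation(len(X)):
        y = sigmoid(X[i] @ w + b)
        delta = (y - t[i]) * y * (1 - y)   # gradient for this single example
        w -= eta * delta * X[i]
        b -= eta * delta

print(np.round(sigmoid(X @ w + b), 2))     # outputs move toward 0, 1, 1, 1
```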

Glossary