The sigmoid function provides a smoother step than the step function:
The step function only gives useful feedback when the activation is exactly 0 (it is flat everywhere else); with the sigmoid's smooth step, we can also reward results that are close to the target.
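A minimal NumPy sketch of the two activations (the function names are mine, not from the text):

```python
import numpy as np

def step(x):
    # Hard threshold: 1 once the activation crosses 0, else 0.
    # Being "almost right" looks exactly like being completely wrong.
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    # Smooth step: 1 / (1 + e^-x) rises gradually through 0.5 at x = 0,
    # so outputs close to the target still produce a useful error signal.
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.1, 0.1, 2.0])
print(step(x))     # [0. 0. 1. 1.]
print(sigmoid(x))  # roughly [0.12 0.48 0.52 0.88]
```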
Article on Gradient Descent Methods.
Batch Update (computing the gradient over the whole training set before each weight change) converges to a local minimum much faster than sequential gradient descent. However, it is also more likely to get stuck in that local minimum.
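A rough sketch of one Batch Update step for a simple linear model with squared error (the model, loss, and names here are assumptions for illustration):

```python
import numpy as np

def batch_update(w, X, y, lr=0.1):
    # Compute the average gradient over the ENTIRE training set,
    # then apply a single weight update per full pass through the data.
    preds = X @ w
    grad = X.T @ (preds - y) / len(y)
    return w - lr * grad
```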
Single Stochastic Update provides a performance improvement: it splits the data up into smaller batches, runs training on one batch at a time, and updates the weights after each batch using that batch's average gradient rather than waiting for a full pass over the data. (online explanation)
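For contrast, a sketch of that batched stochastic variant under the same illustrative linear model and loss:

```python
import numpy as np

def stochastic_epoch(w, X, y, lr=0.1, batch_size=32):
    # Shuffle, split into smaller batches, and update after every batch;
    # the gradient is averaged within the batch, not over the whole data set.
    idx = np.random.permutation(len(y))
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]
        preds = X[b] @ w
        grad = X[b].T @ (preds - y[b]) / len(b)
        w = w - lr * grad
    return w
```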