The sigmoid function provides a smoother step than the step function:
The step function only gives useful feedback when the activation is exactly 0 (it is flat everywhere else); with the sigmoid's smooth step, we can also reward results that are close to the target.
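A minimal NumPy sketch of the two activations (the function names are mine, not from the text):

```python
import numpy as np

def step(x):
    # Hard threshold: 1 once the activation crosses 0, else 0.
    # Being "almost right" looks exactly like being completely wrong.
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    # Smooth step: 1 / (1 + e^-x) rises gradually through 0.5 at x = 0,
    # so outputs close to the target still produce a useful error signal.
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, -0.1, 0.1, 2.0])
print(step(x))     # [0. 0. 1. 1.]
print(sigmoid(x))  # roughly [0.12 0.48 0.52 0.88]
```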
Article on Gradient Descent Methods.
Batch Update (computing the gradient over the whole training set before each weight change) converges to a local minimum much faster than sequential gradient descent. However, it is also more likely to get stuck in that local minimum.
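A rough sketch of one Batch Update step for a simple linear model with squared error (the model, loss, and names here are assumptions for illustration):

```python
import numpy as np

def batch_update(w, X, y, lr=0.1):
    # Compute the average gradient over the ENTIRE training set,
    # then apply a single weight update per full pass through the data.
    preds = X @ w
    grad = X.T @ (preds - y) / len(y)
    return w - lr * grad
```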
Single Stochastic Update provides a performance improvement: it splits the data up into smaller batches, runs training on one batch at a time, and updates the weights after each batch using that batch's average gradient rather than waiting for a full pass over the data. (online explanation)
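For contrast, a sketch of that batched stochastic variant under the same illustrative linear model and loss:

```python
import numpy as np

def stochastic_epoch(w, X, y, lr=0.1, batch_size=32):
    # Shuffle, split into smaller batches, and update after every batch;
    # the gradient is averaged within the batch, not over the whole data set.
    idx = np.random.permutation(len(y))
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]
        preds = X[b] @ w
        grad = X[b].T @ (preds - y[b]) / len(b)
        w = w - lr * grad
    return w
```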