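- Perform gradient descent on randomly sampled batches (mini-batches) of the training data
- A small batch size makes each update far cheaper than a full-batch gradient step, so many more updates fit into one pass over the data; this usually gives a large practical speedup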
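
A minimal sketch of mini-batch gradient descent, assuming a linear least-squares model and NumPy. The function name `minibatch_sgd`, the synthetic data, and the values of `batch_size`, `lr`, and `epochs` are illustrative assumptions, not taken from the notes.

```python
import numpy as np

def minibatch_sgd(X, y, batch_size=32, lr=0.1, epochs=50, seed=0):
    """Fit weights w minimizing mean squared error ||Xw - y||^2 / n
    with gradient steps computed on shuffled mini-batches."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Reshuffle the training data at the start of every epoch.
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient of the mean squared error on this batch only.
            grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)
            w -= lr * grad
    return w

# Hypothetical synthetic data: y = X @ true_w + small noise.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * data_rng.normal(size=1000)

w_hat = minibatch_sgd(X, y)
print(w_hat)
```

With `batch_size=32` and 1000 samples, each epoch performs 32 weight updates instead of the single update a full-batch step would make, at roughly the same total cost per epoch.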