Backpropagation
The algorithm that makes deep learning work.
The Story
Gradient descent needs the gradient. For a single weight, numerical gradient works. But for millions of weights, it is too slow. Backpropagation computes all gradients efficiently using the chain rule.
The Chain Rule
If a function is a composition of simple functions, its gradient is a product of simple gradients:
df/dx = df/dg * dg/dx
Forward Then Backward
- Forward pass: compute the output from input to output, saving intermediate values.
- Backward pass: starting from the loss, propagate gradients backward through each layer using the chain rule.
Backpropagation is not a different algorithm. It is gradient descent + the chain rule + clever bookkeeping. Once you see this, the magic dissolves and you understand.
Exercises
Loading Python runtime (Pyodide)...
Primary Source: 3Blue1Brown — Backpropagation (YouTube)