Gradient Descent
How a network learns to improve itself.
The Story
We have a loss function. We need to adjust the weights to minimize it. Gradient descent is the algorithm that does this — it is the engine of learning in every neural network.
The Intuition
Imagine you are on a foggy mountainside and want to reach the valley. You can't see the bottom, but you can feel which direction is downhill under your feet. You take a step downhill. Then another. That is gradient descent.
The Math
The gradient tells us the direction of steepest ascent. We go the opposite way:
w_new = w_old - learning_rate * gradient
The learning rate controls how big each step is. Too big and you overshoot. Too small and it takes forever.
Numerical Gradient
Before doing calculus, you can approximate the gradient numerically:
def numerical_gradient(f, x, h=0.0001):
return (f(x + h) - f(x - h)) / (2 * h)
This is slow but correct — and it teaches you what a gradient is before you learn the chain rule.
Exercises
Primary Source: Andrej Karpathy — micrograd (Video 1)