Gradient Descent

How a network learns to improve itself.

The Story

We have a loss function. We need to adjust the weights to minimize it. Gradient descent is the algorithm that does this — it is the engine of learning in every neural network.

The Intuition

Imagine you are on a foggy mountainside and want to reach the valley. You can't see the bottom, but you can feel which direction is downhill under your feet. You take a step downhill. Then another. That is gradient descent.

The Math

The gradient tells us the direction of steepest ascent. We go the opposite way:

w_new = w_old - learning_rate * gradient

The learning rate controls how big each step is. Too big and you overshoot. Too small and it takes forever.

Numerical Gradient

Before doing calculus, you can approximate the gradient numerically:

def numerical_gradient(f, x, h=0.0001):
    return (f(x + h) - f(x - h)) / (2 * h)

This is slow but correct — and it teaches you what a gradient is before you learn the chain rule.

Exercises

Loading Python runtime (Pyodide)...

Primary Source: Andrej Karpathy — micrograd (Video 1)