What Happens to a Neural Network's Geometry When You Change How It Learns?

Or: the same architecture + the same data + different learning algorithms = radically different internal structure

A gap in the Platonic Representation Hypothesis

The Platonic Representation Hypothesis (Huh et al., ICML 2024) claims that different neural networks converge toward the same internal representation of reality. The authors tested this across dozens of architectures — CNNs, ViTs, language models — and found increasing alignment as models get bigger. It's a compelling result. But every single model they tested was trained with backpropagation. ...

April 2, 2026 · 11 min · Austin T. O'Quinn

Gradient Descent, Backpropagation, and the Misconceptions That Tripped Me Up

This post starts from the ordinary derivative and builds up to gradient descent for neural networks. If you already know multivariable calculus, you can skip to Why the Gradient is Steepest. If you're here for the ML connection, skip to Applied to Machine Learning. But I'd encourage reading the whole thing — several of the "obvious" steps are where my own misconceptions lived.

Starting from One Dimension

The Derivative as a Rate

You have a function $f(x)$. The derivative $f'(x)$ tells you: if you nudge $x$ by a tiny amount $\Delta x$, how much does $f$ change? ...
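The derivative-as-rate idea can be checked numerically with a forward finite difference: nudge $x$ by a small $\Delta x$ and divide the change in $f$ by $\Delta x$. A minimal sketch, using an example function and step size of my own choosing (not from the post):

```python
def f(x):
    # Example function (an assumption for illustration): f(x) = x^2, so f'(x) = 2x.
    return x ** 2

def numerical_derivative(f, x, dx=1e-6):
    # Forward difference: (f(x + dx) - f(x)) / dx approximates f'(x) for small dx.
    return (f(x + dx) - f(x)) / dx

print(numerical_derivative(f, 3.0))  # close to f'(3) = 2 * 3 = 6
```

Shrinking `dx` improves the approximation until floating-point cancellation starts to dominate, which is one reason libraries prefer analytic gradients (backpropagation) over finite differences.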

January 15, 2025 · 36 min · Austin T. O'Quinn