Gradient Descent, Backpropagation, and the Misconceptions That Tripped Me Up
This post starts from the ordinary derivative and builds to gradient descent for neural networks. If you already know multivariable calculus, you can skip to Why the Gradient is Steepest. If you're here for the ML connection, skip to Applied to Machine Learning. But I'd encourage reading the whole thing: several of the "obvious" steps are where my own misconceptions lived.

Starting from One Dimension

The Derivative as a Rate

You have a function $f(x)$. The derivative $f'(x)$ tells you: if you nudge $x$ by a tiny amount $\Delta x$, how much does $f$ change? ...
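The derivative-as-rate idea can be sanity-checked numerically: nudge $x$ by a small $\Delta x$ and compare the resulting change in $f$ to $f'(x)\,\Delta x$. Here is a minimal sketch of that check; the function $f(x) = x^2$ and the helper name `derivative_estimate` are my own illustrative choices, not from the post.

```python
def f(x):
    # An assumed example function: f(x) = x^2, so f'(x) = 2x.
    return x ** 2

def derivative_estimate(f, x, dx=1e-6):
    # Finite-difference approximation: (f(x + dx) - f(x)) / dx
    # approaches f'(x) as dx shrinks.
    return (f(x + dx) - f(x)) / dx

print(derivative_estimate(f, 3.0))  # close to the exact value f'(3) = 6
```

Shrinking `dx` further improves the approximation up to the point where floating-point rounding starts to dominate.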