Loss Functions, Decision Boundaries, Activation Spaces, and Why MSE

Prior reading: Gradient Descent and Backpropagation

Three Ways to Look at a Model

Loss surface: the landscape over parameter space. What the optimizer sees.
Decision boundary: the surface in input space that separates classes. What the user sees.
Activation space: the internal geometry of learned representations. What the model "thinks."

These are different views of the same object, but they behave differently.

Which Are Data-Dependent?

Loss surface: entirely data-dependent. Change the data, change the landscape.
Decision boundary: data-dependent through training, but fixed at inference.
Activation space: shaped by data and architecture jointly. The architecture constrains which representations are possible; the data selects among them.

How They Relate

The loss function defines the objective. Gradient descent reshapes the decision boundary to minimize loss. The activation space is the intermediate computation that makes the decision boundary expressible. ...
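The claim that the loss surface is entirely data-dependent can be sketched with a toy one-parameter model. Everything here (the linear model, the datasets, the function names) is illustrative, not taken from the post:

```python
import numpy as np

# Hypothetical 1-parameter model y_hat = w * x with MSE loss.
# The "surface" over a single parameter is just a curve, but the
# point carries over: the landscape is determined by the data.
def mse_loss(w, x, y):
    return np.mean((w * x - y) ** 2)

ws = np.linspace(-2.0, 4.0, 601)  # grid over the parameter w

x = np.array([1.0, 2.0, 3.0])
y_a = 2.0 * x    # dataset A: loss is minimized near w = 2
y_b = -1.0 * x   # dataset B: loss is minimized near w = -1

best_a = ws[np.argmin([mse_loss(w, x, y_a) for w in ws])]
best_b = ws[np.argmin([mse_loss(w, x, y_b) for w in ws])]
print(best_a, best_b)  # the minimum moves because the data changed
```

Same architecture, same optimizer's-eye view of a landscape over w; only the data differs, and the minimum moves with it.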

February 12, 2025 · 3 min · Austin T. O'Quinn

Gradient Descent, Backpropagation, and the Misconceptions That Tripped Me Up

This post starts from the ordinary derivative and builds to gradient descent for neural networks. If you already know multivariable calculus, you can skip to Why the Gradient is Steepest. If you're here for the ML connection, skip to Applied to Machine Learning. But I'd encourage reading the whole thing; several of the "obvious" steps are where my own misconceptions lived.

Starting from One Dimension

The Derivative as a Rate

You have a function $f(x)$. The derivative $f'(x)$ tells you: if you nudge $x$ by a tiny amount $\Delta x$, how much does $f$ change? ...
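The nudge-and-measure reading of the derivative can be checked numerically with a forward difference. This is a toy sketch (the function and names are my own, not from the post):

```python
def f(x):
    return x ** 2

def rate(f, x, dx):
    # Forward-difference approximation: nudge x by dx, measure the change
    # in f, and divide. As dx shrinks, this approaches f'(x).
    return (f(x + dx) - f(x)) / dx

# True derivative of x**2 at x = 3 is f'(3) = 6.
print(rate(f, 3.0, 0.1))     # about 6.1
print(rate(f, 3.0, 0.0001))  # about 6.0001, closer to 6
```

Shrinking $\Delta x$ makes the ratio converge on the true rate, which is exactly the limit definition the post builds from.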

January 15, 2025 · 36 min · Austin T. O'Quinn