Decision Theory: What an Agent Will Do and Why

Prior reading: Game Theory: Why Coordination on Safety Is So Hard

Game theory is about interactions between agents. Decision theory is about the reasoning inside one agent. This post builds up the framework for understanding what a rational agent will do given its beliefs and preferences — and why that framework has direct, sometimes alarming, consequences for AI safety.

Starting from a Single Choice

Suppose you're an agent — human, AI, doesn't matter — and you have to choose between actions. You're uncertain about the state of the world. Each action produces an outcome that depends on both your choice and the state. How should you decide? ...
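The setup in the excerpt — choosing an action under uncertainty about the state — is standard expected-utility maximization. A minimal sketch; the states, probabilities, and payoffs here are invented for illustration:

```python
# Hypothetical example: beliefs over states and a utility table
# indexed by (action, state). The rational choice maximizes
# expected utility under the agent's beliefs.
beliefs = {"rain": 0.3, "sun": 0.7}
utility = {
    ("umbrella", "rain"): 5, ("umbrella", "sun"): 2,
    ("no_umbrella", "rain"): -10, ("no_umbrella", "sun"): 4,
}

def expected_utility(action):
    # Sum utility over states, weighted by the probability of each state.
    return sum(p * utility[(action, s)] for s, p in beliefs.items())

best = max(["umbrella", "no_umbrella"], key=expected_utility)
# EU(umbrella) = 0.3*5 + 0.7*2 = 2.9; EU(no_umbrella) = 0.3*(-10) + 0.7*4 = -0.2
```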

April 9, 2025 · 20 min · Austin T. O'Quinn

Game Theory: Why Coordination on Safety Is So Hard

This post builds up the game theory you need to reason precisely about AI safety dynamics — races, coordination failures, and regulatory design. If you already know Nash equilibria, you can skip to Games That Show Up in AI Safety. If you're here for solutions, skip to Mechanism Design. But I'd recommend reading the whole thing, because the point isn't any single definition — it's seeing why the structure of the interaction, not the intentions of the players, drives the outcome. ...

March 26, 2025 · 26 min · Austin T. O'Quinn

Why Sparsity, and How Do We Get It?

Prior reading: Gradient Descent and Backpropagation

What Is Sparsity?

A sparse representation has mostly zeros. Instead of every neuron firing for every input, only a small subset activates.

Why Sparsity Is Good

- Interpretability: If only 50 of 10,000 features fire, you have a chance of understanding what they represent.
- Efficiency: Sparse computations are cheaper.
- Generalization: Sparse models tend to overfit less — with fewer active parameters, they can't memorize as easily.
- Disentanglement: Sparse features tend to correspond to more independent, meaningful concepts.

How We Achieve Sparsity

Architectural: ...
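One common architectural route to sparsity (not necessarily the one the full post covers) is a top-k activation: keep only the k largest-magnitude entries and zero the rest. A NumPy sketch with an invented function name:

```python
import numpy as np

def topk_sparsify(x, k):
    # Keep only the k largest-magnitude activations; zero everything else.
    idx = np.argsort(np.abs(x))[:-k]  # indices of all but the top-k magnitudes
    out = x.copy()
    out[idx] = 0.0
    return out

x = np.array([0.1, -3.0, 0.5, 2.0])
sparse_x = topk_sparsify(x, 2)  # only -3.0 and 2.0 survive
```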

March 12, 2025 · 1 min · Austin T. O'Quinn

What Is RL, Why Is It So Hard, and How Does It Go Wrong?

Prior reading: Gradient Descent and Backpropagation | Loss Functions and Spaces

The Setup

An agent takes actions in an environment, receives rewards, and learns a policy $\pi(a|s)$ that maximizes expected cumulative reward.

$$\max_\pi \mathbb{E}\left[\sum_{t=0}^T \gamma^t r_t\right]$$

Where Are the Neural Nets?

Neural networks serve as function approximators for things that are too complex to represent exactly:

- Policy network: $\pi_\theta(a|s)$ — maps states to action probabilities
- Value network: $V_\phi(s)$ — estimates expected future reward from a state
- Q-network: $Q_\psi(s,a)$ — estimates expected future reward from a state-action pair
- World model (optional): $p_\omega(s'|s,a)$ — predicts next state

Without neural nets, RL only works for tiny state spaces (tabular methods). Neural nets let it scale to images, language, and continuous control. ...
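The objective above is a discounted sum of rewards. The per-timestep return $G_t = \sum_{k} \gamma^k r_{t+k}$ that value networks are trained against can be computed in one backward pass; a small sketch:

```python
def discounted_return(rewards, gamma=0.99):
    # Backward recursion: G_t = r_t + gamma * G_{t+1}, with G after the
    # final step equal to 0. Returns the list of G_t for every timestep.
    G = 0.0
    returns = []
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    return list(reversed(returns))

# With gamma = 0.5: G_2 = 1, G_1 = 1.5, G_0 = 1.75
returns = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```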

February 26, 2025 · 3 min · Austin T. O'Quinn

Loss Functions, Decision Boundaries, Activation Spaces, and Why MSE

Prior reading: Gradient Descent and Backpropagation

Three Ways to Look at a Model

- Loss surface: The landscape over parameter space. What the optimizer sees.
- Decision boundary: The surface in input space that separates classes. What the user sees.
- Activation space: The internal geometry of learned representations. What the model "thinks."

These are different views of the same object, but they behave differently.

Which Are Data-Dependent?

- Loss surface: Entirely data-dependent. Change the data, change the landscape.
- Decision boundary: Data-dependent through training, but fixed at inference.
- Activation space: Shaped by data and architecture jointly. The architecture constrains which representations are possible; the data selects among them.

How They Relate

The loss function defines the objective. Gradient descent reshapes the decision boundary to minimize loss. The activation space is the intermediate computation that makes the decision boundary expressible. ...
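The claim that the loss surface is entirely data-dependent is easy to see concretely: the same one-parameter model $f(x) = wx$ has a different MSE landscape over $w$ for different datasets. A toy sketch (the names and data are mine):

```python
import numpy as np

def mse_surface(w_grid, X, y):
    # MSE of the one-parameter model f(x) = w * x, evaluated at each w
    # in the grid. This is a 1-D slice of a "loss surface".
    return [float(np.mean((w * X - y) ** 2)) for w in w_grid]

X = np.array([1.0, 2.0])
grid = [0.0, 1.0, 2.0, 3.0]
surface_a = mse_surface(grid, X, 2.0 * X)  # targets y = 2x: minimum at w = 2
surface_b = mse_surface(grid, X, 3.0 * X)  # targets y = 3x: minimum at w = 3
# Same model, same grid — changing the data moves the whole landscape.
```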

February 12, 2025 · 3 min · Austin T. O'Quinn

Linear Algebra Proofs of Optimality for Gradient Descent

Prior reading: Gradient Descent and Backpropagation

When Gradient Descent Is Provably Optimal

For convex functions, gradient descent converges to the global minimum. For strongly convex functions, it converges exponentially fast. The proofs are clean linear algebra.

The Convex Case

A function $f$ is convex if for all $x, y$ and all $\lambda \in [0, 1]$:

$$f(\lambda x + (1-\lambda)y) \leq \lambda f(x) + (1-\lambda) f(y)$$

This means every local minimum is a global minimum. Gradient descent can't get stuck. ...
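The defining inequality can be checked numerically: for a convex $f$, the chord value minus the function value is nonnegative at every interpolation point. A small sketch (the functions and test points are my own, not from the post):

```python
def convexity_gap(f, x, y, lam):
    # lam*f(x) + (1-lam)*f(y) - f(lam*x + (1-lam)*y).
    # Nonnegative at every (x, y, lam) when f is convex: the chord
    # between (x, f(x)) and (y, f(y)) lies on or above the graph.
    chord = lam * f(x) + (1 - lam) * f(y)
    return chord - f(lam * x + (1 - lam) * y)

gap_sq = convexity_gap(lambda t: t * t, 0.0, 2.0, 0.5)   # x^2: gap = 2 - 1 = 1
gap_abs = convexity_gap(lambda t: abs(t), -1.0, 1.0, 0.5)  # |x|: gap = 1 - 0 = 1
```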

January 28, 2025 · 2 min · Austin T. O'Quinn

Gradient Descent, Backpropagation, and the Misconceptions That Tripped Me Up

This post starts from the ordinary derivative and builds to gradient descent for neural networks. If you already know multivariable calculus, you can skip to Why the Gradient is Steepest. If you're here for the ML connection, skip to Applied to Machine Learning. But I'd encourage reading the whole thing — several of the "obvious" steps are where my own misconceptions lived.

Starting from One Dimension

The Derivative as a Rate

You have a function $f(x)$. The derivative $f'(x)$ tells you: if you nudge $x$ by a tiny amount $\Delta x$, how much does $f$ change? ...
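The "nudge" definition is directly testable with a finite difference; a sketch using a central difference (the step size is my choice):

```python
def numerical_derivative(f, x, h=1e-6):
    # Central-difference approximation of f'(x):
    # nudge x by +h and -h, and measure how much f changes per unit nudge.
    return (f(x + h) - f(x - h)) / (2 * h)

# For f(x) = x^2, f'(3) = 6; the finite difference should agree closely.
slope = numerical_derivative(lambda t: t ** 2, 3.0)
```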

January 15, 2025 · 36 min · Austin T. O'Quinn