Mesa-Optimization and the Optimization Pressure Spectrum

Prior reading: Gradient Descent and Backpropagation | Decision Theory for AI Safety | What Is RL? The Question: Why does an AI system behave the way it does? And why do optimizers keep creating sub-optimizers with misaligned goals? Three frameworks give different (complementary) answers. But first, some terminology that trips people up. Mesa vs. Meta: I've seen these two confused often enough that it's worth being explicit. I had to look up what "mesa" even meant the first time I encountered it. ...

August 6, 2025 · 13 min · Austin T. O'Quinn

Decision Theory: What an Agent Will Do and Why

Prior reading: Game Theory: Why Coordination on Safety Is So Hard. Game theory is about interactions between agents. Decision theory is about the reasoning inside one agent. This post builds up the framework for understanding what a rational agent will do given its beliefs and preferences, and why that framework has direct, sometimes alarming, consequences for AI safety. Starting from a Single Choice: Suppose you're an agent (human, AI, doesn't matter) and you have to choose between actions. You're uncertain about the state of the world. Each action produces an outcome that depends on both your choice and the state. How should you decide? ...

April 9, 2025 · 20 min · Austin T. O'Quinn