Mesa-Optimization and the Optimization Pressure Spectrum

Prior reading: Gradient Descent and Backpropagation | Decision Theory for AI Safety | What Is RL?
The Question
Why does an AI system behave the way it does? And why do optimizers keep creating sub-optimizers with misaligned goals? Three frameworks give different (complementary) answers. But first, some terminology that trips people up.
Mesa vs. Meta
I've seen these two confused often enough that it's worth being explicit. I had to look up what "mesa" even meant the first time I encountered it. ...

August 6, 2025 · 13 min · Austin T. O'Quinn

Decision Theory for AI Safety

Prior reading: Game Theory for AI Safety
Why Decision Theory Matters
Every AI system that takes actions is implicitly using a decision theory: a framework for choosing among options given beliefs and preferences. The choice of decision theory determines whether the AI cooperates or defects in strategic situations; whether it's manipulable or manipulation-resistant; whether it takes catastrophic gambles or plays it safe; and how it reasons about its own future behavior.
Decision theory isn't just philosophy. It's the operating system of agency. ...

April 9, 2025 · 5 min · Austin T. O'Quinn