Decision-Theory

Prior reading: Game Theory: Why Coordination on Safety Is So Hard

Game theory is about interactions between agents. Decision theory is about the reasoning inside one agent. This post builds up the framework for understanding what a rational agent will do given its beliefs and preferences, and why that framework has direct, sometimes alarming, consequences for AI safety.

Starting from a Single Choice

Suppose you're an agent (human, AI, doesn't matter) and you have to choose between actions. You're uncertain about the state of the world. Each action produces an outcome that depends on both your choice and the state. How should you decide? ...
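To make the setup concrete, here is a minimal sketch of a single choice under uncertainty. The excerpt is cut off before it says how to decide, so the decision rule used here (pick the action with the highest expected utility) is an assumption: it is one standard answer, not necessarily the one the post gives. The states, actions, probabilities, and utilities are all hypothetical toy values.

```python
from typing import Dict

# Hypothetical beliefs: probability the agent assigns to each state of the world.
beliefs: Dict[str, float] = {"rain": 0.25, "sun": 0.75}

# Hypothetical preferences: utility of the outcome for each (action, state) pair.
# The outcome depends on both the chosen action and the state.
utility: Dict[str, Dict[str, float]] = {
    "take_umbrella": {"rain": 4.0, "sun": 2.0},
    "leave_umbrella": {"rain": -8.0, "sun": 4.0},
}


def expected_utility(
    action: str,
    beliefs: Dict[str, float],
    utility: Dict[str, Dict[str, float]],
) -> float:
    """Probability-weighted average utility of one action over all states."""
    return sum(p * utility[action][state] for state, p in beliefs.items())


def best_action(
    beliefs: Dict[str, float],
    utility: Dict[str, Dict[str, float]],
) -> str:
    """Assumed decision rule: choose the action with the highest expected utility."""
    return max(utility, key=lambda a: expected_utility(a, beliefs, utility))


if __name__ == "__main__":
    for action in utility:
        print(action, expected_utility(action, beliefs, utility))
    print("chosen:", best_action(beliefs, utility))
```

With these toy numbers the two actions score 2.5 and 1.0, so the sketch picks `take_umbrella`. Change the beliefs or the utilities and the choice can flip, which is the sense in which the decision is a function of both.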

Mesa-Optimization and the Optimization Pressure Spectrum

Decision Theory: What an Agent Will Do and Why