Competitive Dynamics, Policy, and the Race to the Bottom

Prior reading: Game Theory for AI Safety | The AI Threat Landscape | P-Hacking and Benchmarks

The Core Problem

AI safety is short-term costly and long-term valuable. Every actor faces pressure to defect. This post makes two claims. First, the motivation to prioritize safety is structurally lacking: everyone has reasons to cut corners. Second, even if we could fix motivation entirely, the regulatory problem is so hard that good intentions wouldn't be enough. Both have to be true for the situation to be as bad as it is. Unfortunately, both are. ...

February 11, 2026 · 8 min · Austin T. O'Quinn

The AI Threat Landscape: What 'Safe' Means and What We're Afraid Of

Prior reading: Mesa-Optimization and Three Lenses | Game Theory for AI Safety

Part I: What Does "Safe" Even Mean?

"Make AI safe" is meaningless without specifying: safe for whom, against what threat, under what conditions?

Who Is the User?

Public: Lowest common denominator. Must handle naive, careless, and adversarial users simultaneously.
Internal / enterprise: Can assume some training, access controls, and monitoring.
Knowledgeable human: Researchers, developers. Different failure modes matter.

Who Is the Adversary?

No adversary: Accidental misuse, honest mistakes. The easiest case.
Casual adversary: Jailbreaking for fun, social engineering. Medium difficulty.
Sophisticated adversary: State actors, determined attackers with resources. The hard case.

What Are We Protecting?

Users from the model: Preventing harmful outputs.
The model from users: Preventing extraction, manipulation, prompt injection.
Society from the system: Preventing large-scale harms (economic disruption, disinfo).
The future from the present: Preventing lock-in, power concentration, existential risk.

Safety claims without a threat model are empty. A system "safe" for internal research may be wildly unsafe for public deployment. ...

January 21, 2026 · 4 min · Austin T. O'Quinn