Policy

The Case for AI as a Public Service

Prior reading: Competitive Dynamics, Policy, and the Race to the Bottom
Who Built This Thing?
There's a story the AI industry tells about itself. It goes something like: brilliant researchers at well-funded labs, armed with novel architectures and massive compute budgets, built the most capable information-processing systems in history. This story isn't wrong, exactly. It's just incomplete in a way that matters. The optimization environment that produced modern large language models is not a product of any lab. It is a civilization-scale effort. ...

Why AI Policy Is Hard Even When Everyone Agrees

Prior reading: Competitive Dynamics and Safety | P-Hacking and Benchmarks
The Premise
The competitive-dynamics post explains why motivation for AI safety is lacking. This post assumes the opposite: everyone is motivated. Every nation, every company, every researcher genuinely wants to regulate AI well. It's still incredibly hard. Here's why.
The Object You're Regulating Keeps Changing
Capability Jumps Are Unpredictable
Regulation assumes you can define what you're regulating. But AI capabilities change discontinuously. A model goes from "can't do X" to "can do X fluently" between training runs, sometimes between scale thresholds no one predicted. ...

Competitive Dynamics, Policy, and the Race to the Bottom

Prior reading: Game Theory for AI Safety | The AI Threat Landscape | P-Hacking and Benchmarks
The Core Problem
AI safety is short-term costly and long-term valuable. Every actor faces pressure to defect. This post makes two claims. First, the motivation to prioritize safety is structurally lacking: everyone has reasons to cut corners. Second, even if we could fix motivation entirely, the regulatory problem is so hard that good intentions wouldn't be enough. Both have to be true for the situation to be as bad as it is. Unfortunately, both are. ...

The AI Threat Landscape: What 'Safe' Means and What We're Afraid Of

Prior reading: Mesa-Optimization and Three Lenses | Game Theory for AI Safety
Part I: What Does "Safe" Even Mean?
"Make AI safe" is meaningless without specifying: safe for whom, against what threat, under what conditions?
Who Is the User?
Public: Lowest common denominator. Must handle naive, careless, and adversarial users simultaneously.
Internal / enterprise: Can assume some training, access controls, and monitoring.
Knowledgeable human: Researchers, developers. Different failure modes matter.
Who Is the Adversary?
No adversary: Accidental misuse, honest mistakes. The easiest case.
Casual adversary: Jailbreaking for fun, social engineering. Medium difficulty.
Sophisticated adversary: State actors, determined attackers with resources. The hard case.
What Are We Protecting?
Users from the model: Preventing harmful outputs.
The model from users: Preventing extraction, manipulation, prompt injection.
Society from the system: Preventing large-scale harms (economic disruption, disinfo).
The future from the present: Preventing lock-in, power concentration, existential risk.
Safety claims without a threat model are empty. A system "safe" for internal research may be wildly unsafe for public deployment. ...

What Is RL, Why Is It So Hard, and How Does It Go Wrong?

Prior reading: Gradient Descent and Backpropagation | Loss Functions and Spaces
The Setup
An agent takes actions in an environment, receives rewards, and learns a policy $\pi(a|s)$ that maximizes expected cumulative reward.
$$\max_\pi \mathbb{E}\left[\sum_{t=0}^T \gamma^t r_t\right]$$
Where Are the Neural Nets?
Neural networks serve as function approximators for things that are too complex to represent exactly:
Policy network: $\pi_\theta(a|s)$, maps states to action probabilities
Value network: $V_\phi(s)$, estimates expected future reward from a state
Q-network: $Q_\psi(s,a)$, estimates expected future reward from a state-action pair
World model (optional): $p_\omega(s'|s,a)$, predicts next state
Without neural nets, RL only works for tiny state spaces (tabular methods). Neural nets let it scale to images, language, and continuous control. ...
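To make the objective in The Setup concrete, here is a minimal Python sketch of the quantity inside the expectation, $\sum_{t=0}^T \gamma^t r_t$, for one finite episode, plus a plain sample average standing in for the expectation. The function names `discounted_return` and `average_return` and the example reward lists are illustrative assumptions, not anything taken from the post.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * rewards[t] over one finite episode (t starts at 0)."""
    total = 0.0
    # Work backwards so each step is one multiply and one add:
    # G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def average_return(episodes, gamma):
    """Sample-average estimate of the expected discounted return.

    `episodes` is a non-empty list of reward lists, one per rollout
    collected under the same policy (hypothetical data format).
    """
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)


if __name__ == "__main__":
    # Three steps with reward 1 each and gamma = 0.5: 1 + 0.5 + 0.25
    print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1.75
```

The maximization over $\pi$ is the hard part and is not shown here; this only evaluates the objective for rewards that have already been collected.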