Prior reading: Competitive Dynamics and Safety | P-Hacking and Benchmarks

The Premise

The competitive-dynamics post explains why motivation for AI safety is lacking. This post assumes the opposite: everyone is motivated. Every nation, every company, every researcher genuinely wants to regulate AI well. It's still incredibly hard. Here's why.

The Object You're Regulating Keeps Changing

Capability Jumps Are Unpredictable

Regulation assumes you can define what you're regulating. But AI capabilities change discontinuously. A model goes from "can't do X" to "can do X fluently" between training runs, sometimes at scale thresholds no one predicted.

You can't write regulation for capabilities that don't exist yet but might next quarter. By the time you've regulated today's dangerous capability, three new ones have appeared.

Categories Don't Hold

Is an LLM a search engine? A tool? An agent? An author? A medical device? The legal and regulatory categories we have were designed for objects that stay in one category. AI systems cross categories depending on how they're used, and the same model can be a harmless chatbot or a bioweapon assistant depending on the prompt.

"Frontier" Is a Moving Line

Regulation often targets "frontier models" or "models above X compute threshold." But today's frontier is next year's open-source commodity. Distillation can produce small models with frontier-level capabilities in narrow domains. Any bright line you draw will be in the wrong place within months.

This is already a problem, but it gets much worse once you look at why these lines erode.

The Compute Threshold Trap

Current governance frameworks — the EU AI Act, US Executive Order 14110 — lean heavily on training compute as a regulatory trigger. The logic is intuitive: bigger models require more compute, bigger models are more capable, therefore compute is a reasonable proxy for danger. Set the line at $10^{26}$ FLOPs and you've captured the frontier.

The problem is that this treats risk as a function of production cost. It isn't.

Efficiency Is Eating the Threshold

Algorithmic efficiency improves relentlessly. Rough estimates suggest that the compute required to reach a fixed capability level drops by something like an order of magnitude per year. Better architectures, better data curation, better training recipes — each generation does more with less.

What this means concretely: if a $10^{26}$ FLOP threshold captured "dangerous" models in 2024, by 2026 you might achieve comparable capability at $10^{24}$ FLOPs. The regulatory line hasn't moved. The technology has sprinted past it.

This isn't a one-time adjustment problem. It's exponential decay. A static threshold doesn't slowly become outdated — it becomes irrelevant on a schedule. You can almost set a calendar reminder for when your compute-based regulation stops working.
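To make the erosion concrete, here is a back-of-the-envelope sketch using the post's own assumed rate of roughly 10x efficiency gain per year. The starting cost, the threshold, and the rate are all illustrative placeholders, not measured values:

```python
# Sketch of static-threshold erosion under exponential efficiency gains.
# Assumptions (illustrative only): a capability that cost 1e26 FLOPs to
# train in 2024, and required compute dropping ~10x per year.

THRESHOLD_FLOPS = 1e26          # static regulatory trigger
BASE_YEAR_COST = 1e26           # compute to reach the capability in 2024
EFFICIENCY_GAIN_PER_YEAR = 10   # assumed annual efficiency multiplier

for year in range(2024, 2030):
    cost = BASE_YEAR_COST / EFFICIENCY_GAIN_PER_YEAR ** (year - 2024)
    status = "above" if cost >= THRESHOLD_FLOPS else "below"
    print(f"{year}: ~{cost:.0e} FLOPs to match the capability -> {status} the threshold")
```

Under these assumptions the trigger catches the capability in its first year and never again: every subsequent model with the same abilities trains below the line.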

Risk Isn't One-Dimensional

Even if you could fix the efficiency problem, there's a deeper issue: danger isn't a scalar function of compute. The safety community broadly agrees that risk emerges from the interaction of at least three properties:

  • Intelligence ($I$): How well the system reasons — its ability to solve novel problems, plan, and generalize.
  • Generality ($G$): How broadly it can apply that reasoning across domains.
  • Agenticity ($A$): How autonomously it can act — its effective planning horizon and ability to take actions in the world.

A model can be highly intelligent but not agentic (an oracle you query). It can be agentic but narrow (a robot that navigates warehouses). The danger surface $D = f(I, G, A)$ is nonlinear and multi-dimensional. A single-axis threshold based on FLOPs collapses all of this into one number and hopes for the best.

The particularly worrying part of this risk surface is the interaction between intelligence and agenticity. High intelligence with low agenticity is probably manageable — that's just a very smart tool. But there seems to be a regime where adding agenticity to an already-intelligent system causes the risk derivative to spike. Not a gradual ramp, but a phase transition. We don't know exactly where that transition sits, which makes drawing lines even harder.
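One way to make the shape of that claim concrete is a toy danger function. The functional form and every constant below are invented for illustration — nothing here is calibrated or agreed upon — but it shows how a surface can stay flat and then spike when intelligence and agenticity are jointly high:

```python
import math

def toy_danger(intelligence: float, generality: float, agenticity: float) -> float:
    """Illustrative danger surface D = f(I, G, A), each input scaled 0..1.

    Invented form: a small additive baseline plus a steep logistic
    'phase transition' that only activates when intelligence and
    agenticity are jointly high. Purely a sketch, not a model.
    """
    baseline = 0.2 * (intelligence + generality + agenticity)
    interaction = intelligence * agenticity
    transition = 1.0 / (1.0 + math.exp(-20 * (interaction - 0.6)))
    return baseline + generality * transition

# A smart oracle (high I, low A) scores far lower than the same system
# with high agenticity, even if the FLOP count behind both is identical.
print(toy_danger(0.9, 0.8, 0.1))   # intelligent but boxed  -> ~0.36
print(toy_danger(0.9, 0.8, 0.9))   # intelligent and agentic -> ~1.31
```

The point of the toy is not the numbers; it's that a single-axis FLOP threshold cannot see the difference between those two calls at all.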

Proxy Agency: The Sandbox Doesn't Hold

Here's a failure mode that makes the I/G/A framing concrete. Suppose you build a highly intelligent model and restrict its agenticity — no internet access, no tool use, text-in text-out only. Is it safe?

Maybe not. If the model is intelligent enough, it can generate proxy agency by persuading the humans who interact with it. It doesn't need API access to the internet if it can convince an operator to look something up. It doesn't need tool use if it can write instructions compelling enough that a person follows them. The model's effective agenticity isn't just what you grant it in the sandbox — it's what it can induce through its output channel.

This means the axes aren't independent. High intelligence can bootstrap agenticity through social engineering, even in a locked-down environment. A compute threshold that ignores this interaction is measuring the wrong thing entirely.

Edge Proliferation and Enforceability

All of the above assumes you can even identify who to regulate. That assumption is eroding fast.

Models Are Getting Small and Cheap

When frontier capability required a datacenter and tens of millions of dollars, regulation had a natural enforcement point: there were only a handful of actors to monitor. But models are getting dramatically cheaper at fixed quality. Open-weight releases mean last year's frontier is available for download. Quantization and distillation squeeze capable models onto consumer hardware. Fine-tuning on commodity GPUs can specialize a model for tasks that would have been frontier-only a year ago.

The enforcement surface is expanding from "a few big labs" to "anyone with a laptop," and it's happening on roughly the same exponential schedule as the efficiency gains.

The Edge Problem

Once capable models run on edge devices — phones, personal computers, local servers — you lose visibility. There's no API call to audit, no cloud provider to subpoena, no training run to monitor. The model is just there, running locally, doing whatever the operator wants.

This is the proliferation problem. Regulating a handful of frontier labs is hard but tractable. Regulating millions of people running local models is a different kind of problem entirely — closer to regulating what people do with spreadsheets than what pharmaceutical companies do with drugs.

Enforceability Requires Bottlenecks

Effective regulation needs chokepoints: points in the supply chain where you can observe, verify, and intervene. For AI, the candidate chokepoints are:

  • Compute providers: Cloud companies can monitor training runs. But this misses local training and edge inference.
  • Model distributors: Platforms can restrict downloads. But models leak, get pirated, get replicated.
  • Hardware: You could restrict GPU sales. But this has massive collateral damage, triggers geopolitical conflict, and still doesn't prevent inference on older or repurposed hardware.

None of these chokepoints are airtight, and they all become leakier as efficiency improves. The history of trying to control information technology through distribution bottlenecks — from cryptography export controls to DRM to content filtering — is not encouraging.

You Can't Measure What You're Trying to Control

Safety Has No Units

We can measure FLOPs, parameter counts, benchmark scores. We cannot measure "how dangerous is this model" in any agreed-upon unit. Without measurement, regulation is guesswork.

Evaluations Are Gameable

Any safety benchmark becomes a target. If regulation requires passing Benchmark X, companies optimize for Benchmark X. The benchmark stops measuring safety and starts measuring benchmark performance. This is Goodhart's law applied to governance, and it's not hypothetical — it's the default outcome whenever you attach regulatory consequences to a metric.

Capabilities Are Dual-Use

The same capability that makes a model dangerous makes it useful. "Can reason about biology" is both a bioweapon risk and a drug discovery tool. "Can write persuasive text" is both a manipulation risk and a communication aid. Regulating the capability means regulating the benefit. There's no scalpel — only a sledgehammer.

The Regulatory Apparatus Is Too Slow

Legislative Timelines vs. Development Timelines

A major AI bill takes 1-3 years from proposal to law. A major AI capability can emerge in weeks. By the time legislation passes, the technical landscape it was designed for no longer exists.

This is sometimes framed as "just move faster." But democratic lawmaking is slow for good reasons — deliberation, public comment, amendment, review. Rushing that process to keep pace with AI development means either writing bad law quickly or concentrating power in unelected agencies that can act unilaterally. Neither outcome is obviously good.

The Expertise Gap

Effective regulation requires understanding what you're regulating. The people who deeply understand AI systems are mostly employed by the companies being regulated. Regulatory bodies struggle to recruit and retain technical talent at government salaries. The result is a persistent information asymmetry between the regulators and the regulated, which is a recipe for policy that sounds reasonable but misses the technical reality.

International Coordination

Even if one country gets policy right, AI is global. International agreements require shared definitions (what is a "frontier model"?), shared measurement (how do you verify compliance?), shared enforcement (what happens when someone defects?), and shared speed (the slowest negotiator sets the pace).

None of these exist yet. Building them takes years. The technology won't wait. And unlike nuclear nonproliferation, where the dangerous material is physically hard to produce, the "dangerous material" in AI is information — weights, architectures, training recipes — which is trivially easy to copy and distribute.

The Self-Undermining Problem

AI Changes the Policy Environment

AI itself transforms the information environment that policy depends on. AI-generated content floods public discourse, making informed democratic deliberation harder. AI tools accelerate lobbying and influence campaigns. AI automates the production of policy analysis — but also policy manipulation.

The tool you're trying to regulate is actively reshaping the ground you're standing on to regulate it. This isn't true of most regulated technologies. Cars don't make transportation policy harder to write. AI might genuinely make AI policy harder to write.

Regulatory Capture by Speed

Even well-intentioned companies will push for regulation they can comply with faster than competitors. The natural outcome is regulation that codifies current industry practice rather than pushing toward actual safety — regulation that protects incumbents rather than the public. And because the incumbents are also the primary source of technical expertise for regulators, this dynamic is structurally hard to escape.

What Might Actually Work

None of this means policy is hopeless. It means static, threshold-based, one-time legislation is the wrong tool. Here's what might be better:

Dynamic Thresholds

If you're going to use compute thresholds, at minimum they need to float. Peg the reporting threshold to the current efficiency frontier: if the state-of-the-art achieves last year's benchmark performance at one-tenth the cost, the threshold drops by 10x. This is conceptually simple — an index, updated annually, like inflation adjustments in tax law. It doesn't solve the multi-dimensional problem, but it at least keeps the one-dimensional proxy from going stale.
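A minimal sketch of what that indexing could look like, assuming an agency publishes an annual measurement of the compute needed to hit a fixed reference benchmark (the function name and inputs are hypothetical, not an existing scheme):

```python
def indexed_threshold(base_threshold_flops: float,
                      base_year_cost: float,
                      current_year_cost: float) -> float:
    """Float a compute threshold against the efficiency frontier.

    base_year_cost / current_year_cost: compute needed to reach a fixed
    reference capability in the base year vs. now. If the frontier hits
    the same capability at one-tenth the compute, the reporting threshold
    drops by the same factor. All inputs are hypothetical published figures.
    """
    efficiency_gain = base_year_cost / current_year_cost
    return base_threshold_flops / efficiency_gain

# Example: a 1e26 FLOPs trigger set when the reference capability cost
# 1e25 FLOPs; two years later the same capability costs 1e23 FLOPs.
print(f"{indexed_threshold(1e26, 1e25, 1e23):.0e}")  # -> 1e24 FLOPs
```

The mechanics are no harder than an inflation adjustment; the hard part is agreeing on the reference benchmark and publishing the measurement on schedule.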

Capability-Based, Not Resource-Based

Instead of regulating inputs (how much compute you used), regulate outputs (what the model can do). This is harder to measure but closer to what actually matters. Define specific dangerous capabilities — autonomous bioweapon synthesis, persuasion-based manipulation at scale, autonomous cyber offense — and require developers to evaluate for them. The evaluation regime needs to update as new capabilities emerge, but at least it's pointed at the right target.
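To show what "an evaluation regime that updates" could mean structurally, here is a hypothetical schema — the capability names come from the paragraph above, but the eval identifiers and the registry format are invented for illustration. The key property is that the list is data an oversight body can revise, not text baked into a statute:

```python
from dataclasses import dataclass, field

@dataclass
class CapabilityEval:
    """One named dangerous capability and the eval versions currently tied to it.

    Hypothetical schema, not an existing standard: capabilities and eval
    versions are records an oversight body can add or retire over time.
    """
    capability: str
    eval_versions: list[str] = field(default_factory=list)

REGISTRY = [
    CapabilityEval("autonomous bioweapon synthesis", ["bio-uplift-eval-v3"]),
    CapabilityEval("persuasion-based manipulation at scale", ["persuasion-eval-v2"]),
    CapabilityEval("autonomous cyber offense", ["cyber-range-eval-v4"]),
]

def required_evals() -> list[str]:
    """Flatten the registry into the evaluations a developer must run pre-release."""
    return [v for entry in REGISTRY for v in entry.eval_versions]

print(required_evals())
```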

Liability Over Prescription

Instead of specifying what models can and can't do (which becomes outdated), hold developers liable for harms. Let them figure out how to be safe, then hold them accountable when they're not. This tracks the moving target better because the obligation ("don't cause harm") doesn't change even when capabilities do. It's also more robust to gaming — you can optimize around a benchmark, but you can't optimize around actual consequences.

Red Lines Over Bright Lines

Rather than thresholds ("models above X FLOPs"), define prohibited outcomes ("no model may enable autonomous bioweapon synthesis") and require developers to demonstrate their models can't produce them. The burden is on developers, and the definition is outcome-based rather than capability-based.

Institutional Infrastructure

Standing bodies with technical expertise, security clearances, and the authority to act between legislative cycles. Something between "pass a new law every time" and "unelected officials decide everything." The FDA model — a permanent agency with domain expertise and enforcement power — is probably closer to right than the current approach of periodic executive orders and omnibus bills.

The Honest Assessment

Even with universal motivation, AI policy is a problem of regulating a moving target you can't measure, using tools that are too slow, with thresholds that erode exponentially, in a landscape where the target is proliferating to the edge and reshaping the policy environment as it goes. Every hard problem in regulation — measurement, enforcement, international coordination, speed — is harder for AI than for basically any technology we've previously tried to govern.

This doesn't mean policy is useless. It means policy alone is insufficient. Governance has to be paired with technical safety research that makes the regulatory problem more tractable, and it has to be designed for adaptation rather than permanence. The worst outcome is a false sense of security — confident regulation that was accurate for about six months and now serves mainly to prevent regulators from admitting they need to start over.