Prior reading: Competitive Dynamics, Policy, and the Race to the Bottom

Who Built This Thing?

There's a story the AI industry tells about itself. It goes something like: brilliant researchers at well-funded labs, armed with novel architectures and massive compute budgets, built the most capable information-processing systems in history. This story isn't wrong, exactly. It's just incomplete in a way that matters.

The optimization environment that produced modern large language models is not the product of any lab. It is the output of a civilization-scale effort.

Start with the obvious layer. The researchers who build these systems didn't conjure their expertise from nothing. They were educated in publicly funded universities, raised in societies with functioning infrastructure, fed by agricultural systems they had no hand in building. Every AI lab sits atop a civilization's worth of logistics, education, and economic surplus. This is true of any technology company, and it's not a particularly novel observation.

But the dependency goes deeper — and this is where it starts to matter specifically for AI. The data required to train a capable language model is not a corporate asset that was independently produced. It is an aggregate of billions of individual contributions: every forum post, product review, blog entry, uploaded photo, solved CAPTCHA, and inadvertent click through a data-collection consent dialog. The person arguing about politics on Reddit in 2014 was, without knowing it, providing training signal for a system that wouldn't exist for another decade. The grandmother who labeled crosswalks to prove she wasn't a robot was annotating a self-driving car dataset for free.

No individual contribution matters much. In aggregate, they are the entire foundation. Without the accumulated output of ordinary people living ordinary lives online, there is no GPT, no Claude, no Gemini. The labs provided the architecture and the compute. Society provided the substance.

The Commons Is Eating Itself

Here's the problem. That societal substrate — the web-scale corpus of human thought, creativity, error, and argument — is degrading. And the thing degrading it is the product it was used to build.

By some measures, AI-generated content now accounts for roughly half of all new English-language articles on the web. The term people have settled on is slop: text that is fluent, superficially plausible, and empty. SEO spam, synthetic product reviews, generated blog posts, AI-completed forum answers. The signal-to-noise ratio of the open web is collapsing in real time.

If this were just an annoyance, it wouldn't merit a structural argument. But Shumailov et al. demonstrated in Nature that when generative models are trained on data contaminated with their own outputs, they undergo model collapse — a progressive loss of distributional diversity where the tails of the original data vanish and the model converges on increasingly narrow, homogenized output.[1] Rare but valid patterns get pruned. Each generation of recursive training compounds the error. The model doesn't just get worse — it forgets what the full range of human expression looked like.
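The mechanism is easy to see in miniature. The sketch below is a toy stand-in, not the paper's experimental setup: each "generation" fits a simple Gaussian model to samples drawn from the previous generation's fit, then trains the next generation on that synthetic output. All names and parameters here are illustrative.

    # Toy sketch of model collapse (illustrative only, not the paper's setup):
    # each generation fits a Gaussian to samples drawn from the previous
    # generation's fit. Finite-sample error compounds; the tails vanish.
    import random
    import statistics

    def fit_and_resample(data, n):
        """Fit a Gaussian by MLE, then draw n 'synthetic' samples from it."""
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)  # biased MLE estimate; shrinks in expectation
        return [random.gauss(mu, sigma) for _ in range(n)]

    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(100)]  # the original "human" data
    for gen in range(1001):
        if gen % 200 == 0:
            print(f"generation {gen:4d}: std ~ {statistics.pstdev(data):.4f}")
        data = fit_and_resample(data, 100)

Run it and the reported standard deviation falls by orders of magnitude. Nothing adversarial happens; finite-sample estimation error alone erases the tails. Frontier training pipelines are incomparably more complex, but the compounding dynamic Shumailov et al. describe has the same shape.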

Now think about who this hurts.

The labs that trained their foundation models on pre-2023 web scrapes — before the flood — hold something new entrants cannot easily replicate: a clean dataset. The open web that a startup would scrape today is a fundamentally different, lower-quality resource than the one that trained the current generation of frontier models. And it's getting worse on a schedule.

Those same incumbents are now signing exclusive licensing deals with publishers — Reddit, the Associated Press, News Corp — to secure authenticated human-generated content as the open web degrades. The cost of these deals is trivial for a company with billions in funding and prohibitive for anyone trying to compete.

What you get is bifurcation. Not a market with many competitors jostling for position, but a structural split between a small number of actors who got in early enough to secure clean data and everyone else. That "everyone else" includes not just startups but entire nations that didn't move fast enough.

Why Nobody Pushes Back

Capitalism's theoretical engine is competition. New entrants challenge incumbents, markets self-correct through rivalry, no position is permanently secure. But if the primary input to your product is a shared commons, and your product poisons that commons for everyone who comes after you, you've built a moat that digs itself. You don't even have to be anticompetitive on purpose. The structural effect is the same.

So why doesn't the society that built this commons demand a say in how it's used?

Because everyone is bought in. Workers depend on the economy that AI companies are reshaping. Pension funds hold AI stocks. Small businesses use AI tools to stay competitive with larger ones that adopted first. The competitive dynamics that pressure nations and companies to defect on safety operate on individuals too — you can't opt out of a system you depend on for your livelihood, even when that system is consuming the commons you contributed to.

This is the bind. The people who collectively produced the training data are the same people whose economic survival is now tangled up in the products built from it. Resistance isn't just psychologically hard — it's materially costly. The system doesn't need to suppress dissent. It just needs to make everyone a stakeholder in its continuation.

What Public Service Changes

If AI's foundational input is a public product — the accumulated data and labor of a society — then there's a case that its output should be too. Not as charity, but as structural logic. Public utilities exist when a resource is collectively produced, naturally monopolistic, and too important to leave to market concentration. AI is starting to check all three boxes.

Consider what a public-service framing actually changes.

It takes competitive self-interest out of the safety equation. The race to the bottom in AI safety exists because companies face competitive pressure to cut corners. A public service doesn't need to ship faster than a rival. It doesn't need to capture market share or satisfy quarterly earnings calls. The entire incentive structure that makes safety a competitive disadvantage disappears — not because the technical problems get easier, but because the institutional pressure to ignore them does. You don't have to convince a public utility to slow down for safety the way you have to convince a company whose competitors won't.

It makes governing deployment tractable. One of the hardest problems in AI governance is that the regulatory surface keeps expanding — more actors, more edge devices, more jurisdictions. A public service consolidates the deployment surface. You're not trying to regulate thousands of companies and millions of individual operators. You're governing one institution with clear accountability, democratic oversight, and no profit motive to resist regulation. This doesn't make governance easy, but it makes it structurally possible in a way that governing a fragmented private market may not be.

It addresses the ownership problem directly. If the training data is a collective product, then the licensing deals that are currently privatizing it represent an enclosure of the commons. A public service doesn't need to buy access to data that society produced — the alignment between producer and beneficiary is built into the structure.

What It Doesn't Solve

This isn't a silver bullet. Public institutions can be slow, underfunded, captured by political interests, and bad at innovation. The history of government-run technology projects is not uniformly inspiring. A public AI service would still face the measurement problems that make safety hard to evaluate, the capability jumps that make planning difficult, and the international coordination challenges that no single nation's policy can address.

There's also a real tension with capability development. Private competition, for all its pathologies, drives rapid progress. A public monopoly might be safer but slower — and in a world where other nations' private sectors aren't slowing down, "slower" has its own risks. The same international competitive dynamics that pressure companies would pressure nations running public AI services.

And public doesn't automatically mean democratic. A government-controlled AI service in an authoritarian state is a surveillance tool, not a public good. The case for public service is really a case for democratically governed public service, which is a much harder thing to build and sustain.

The Question

The commons that built AI is being simultaneously privatized and degraded. The people who produced it can't opt out of the system consuming it. Competition — the mechanism that's supposed to keep markets honest — is being structurally undermined by the very dynamics of how AI training data works.

You can look at all of this and conclude that better regulation is the answer — and maybe it is. But it's worth asking whether the thing we're trying to regulate into behaving like a public good should maybe just be one.


[1] Shumailov, I., Shumaylov, Z., Zhao, Y., et al. "AI models collapse when trained on recursively generated data." Nature 631, 755–759 (2024).