Prior reading: P-Hacking and Benchmarks | The Specification Problem
The Setup
AI writing assistants can now help researchers improve papers. That sounds benign. But in practice, "improve" means "optimize for acceptance" — and acceptance is determined by reviewers with biases, time constraints, and heuristics.
What the AI Optimizes
- Framing: Present results in the most favorable light
- Buzzwords: Match the vocabulary reviewers respond to
- Structure: Follow templates that reviewers associate with quality
- Claims: Calibrate confidence to what reviewers will accept without pushing back
- Related work: Cite the likely reviewers' papers
None of this is about being more true. It's about being more accepted.
This Is Goodhart at the Institutional Level
Peer review is a proxy for scientific quality. Optimizing papers for peer review optimizes the proxy, not the target. AI accelerates this optimization.
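The proxy-versus-target gap can be made concrete with a toy hill-climbing sketch. Everything here is illustrative — the feature split, the weights, and the budget are invented for the example, not drawn from any real review data. A "paper" has a fixed effort budget split between substance features (which determine true quality) and surface features (framing, buzzwords) that a time-pressed reviewer proxy over-weights:

```python
import random

random.seed(0)

# Toy model: a "paper" is a list of feature weights under a fixed effort
# budget. The first features are substance (actual quality); the rest are
# surface cues that reviewers reward but that add nothing to quality.
# All weights are illustrative.
N_SUBSTANCE, N_SURFACE = 3, 3
N = N_SUBSTANCE + N_SURFACE

def true_quality(paper):
    # Only substance counts toward the target.
    return sum(paper[:N_SUBSTANCE])

def reviewer_proxy(paper):
    # The proxy is correlated with quality but gameable: it under-weights
    # substance and fully rewards surface cues.
    return 0.3 * sum(paper[:N_SUBSTANCE]) + 1.0 * sum(paper[N_SUBSTANCE:])

def optimize_for(score_fn, steps=2000, budget=6.0):
    # Hill-climb under a conserved budget: effort moved into one feature
    # is effort taken from another.
    paper = [budget / N] * N
    for _ in range(steps):
        i, j = random.sample(range(N), 2)
        cand = paper[:]
        delta = min(0.05, cand[j])
        cand[i] += delta
        cand[j] -= delta
        if score_fn(cand) > score_fn(paper):
            paper = cand
    return paper

honest = optimize_for(true_quality)   # optimizing the target
gamed = optimize_for(reviewer_proxy)  # optimizing the proxy

print("target-optimized: quality =", round(true_quality(honest), 2),
      " proxy =", round(reviewer_proxy(honest), 2))
print("proxy-optimized:  quality =", round(true_quality(gamed), 2),
      " proxy =", round(reviewer_proxy(gamed), 2))
```

Both runs succeed by their own metric: the proxy-optimized paper ends with the higher reviewer score and near-zero substance, because every unit of effort moved from substance to surface strictly raises the proxy. That divergence under optimization pressure is Goodhart's law in miniature.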
The Arms Race
- AI-assisted writing → reviewers can no longer tell which text is human-written
- AI-assisted reviewing → further removes human judgment
- AI-assisted rebuttals → optimizes the entire review loop
Eventually both sides of the review process are AI systems optimizing against each other, with truth as a bystander.
The Broader Point
This is a microcosm of a general AI safety problem: when you optimize hard enough against any human evaluation process, you break the process. Peer review, safety benchmarks, regulatory compliance — they're all vulnerable to the same dynamic.