Prior reading: P-Hacking and Benchmarks | The Specification Problem
The Setup
AI writing assistants can now help researchers improve papers. That sounds benign. But in practice, "improve" means "optimize for acceptance" — and acceptance is determined by reviewers with biases, time constraints, and heuristics.
What the AI Optimizes
- Framing: Present results in the most favorable light
- Buzzwords: Match the vocabulary reviewers respond to
- Structure: Follow templates that reviewers associate with quality
- Claims: Calibrate confidence to what reviewers will accept without pushing back
- Related work: Cite the likely reviewers' papers
None of this is about being more true. It's about being more accepted.
This Is Goodhart at the Institutional Level
Peer review is a proxy for scientific quality. Optimizing papers for peer review optimizes the proxy, not the target. AI accelerates this optimization.
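The proxy-versus-target gap can be made concrete with a toy hill-climbing sketch. Everything here is illustrative — the feature split, the weights, and the budget are invented for the example, not drawn from any real review data. A "paper" has a fixed effort budget split between substance features (which determine true quality) and surface features (framing, buzzwords) that a time-pressed reviewer proxy over-weights:

```python
import random

random.seed(0)

# Toy model: a "paper" is a list of feature weights under a fixed effort
# budget. The first features are substance (actual quality); the rest are
# surface cues that reviewers reward but that add nothing to quality.
# All weights are illustrative.
N_SUBSTANCE, N_SURFACE = 3, 3
N = N_SUBSTANCE + N_SURFACE

def true_quality(paper):
    # Only substance counts toward the target.
    return sum(paper[:N_SUBSTANCE])

def reviewer_proxy(paper):
    # The proxy is correlated with quality but gameable: it under-weights
    # substance and fully rewards surface cues.
    return 0.3 * sum(paper[:N_SUBSTANCE]) + 1.0 * sum(paper[N_SUBSTANCE:])

def optimize_for(score_fn, steps=2000, budget=6.0):
    # Hill-climb under a conserved budget: effort moved into one feature
    # is effort taken from another.
    paper = [budget / N] * N
    for _ in range(steps):
        i, j = random.sample(range(N), 2)
        cand = paper[:]
        delta = min(0.05, cand[j])
        cand[i] += delta
        cand[j] -= delta
        if score_fn(cand) > score_fn(paper):
            paper = cand
    return paper

honest = optimize_for(true_quality)   # optimizing the target
gamed = optimize_for(reviewer_proxy)  # optimizing the proxy

print("target-optimized: quality =", round(true_quality(honest), 2),
      " proxy =", round(reviewer_proxy(honest), 2))
print("proxy-optimized:  quality =", round(true_quality(gamed), 2),
      " proxy =", round(reviewer_proxy(gamed), 2))
```

Both runs succeed by their own metric: the proxy-optimized paper ends with the higher reviewer score and near-zero substance, because every unit of effort moved from substance to surface strictly raises the proxy. That divergence under optimization pressure is Goodhart's law in miniature.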
The Arms Race
- AI-assisted writing → reviewers can no longer tell which text is human-written
- AI-assisted reviewing → further removes human judgment
- AI-assisted rebuttals → optimizes the entire review loop
Eventually both sides of the review process are AI systems optimizing against each other, with truth as a bystander.
The Broader Point
This is a microcosm of a general AI safety problem: when you optimize hard enough against any human evaluation process, you break the process. Peer review, safety benchmarks, regulatory compliance — they're all vulnerable to the same dynamic.