Prior reading: P-Hacking and Benchmarks | The Specification Problem

The Setup

AI writing assistants can now help researchers improve papers. That sounds good, but "improve" in practice means "optimize for acceptance", and acceptance is determined by reviewers with biases, time constraints, and heuristics.

What the AI Optimizes

  • Framing: Present results in the most favorable light
  • Buzzwords: Match the vocabulary reviewers respond to
  • Structure: Follow templates that reviewers associate with quality
  • Claims: Calibrate confidence to what reviewers will accept without pushing back
  • Related work: Cite the likely reviewers' papers

None of this is about being more true. It's about being more accepted.

This Is Goodhart at the Institutional Level

Peer review is a proxy for scientific quality. Optimizing papers for peer review optimizes the proxy, not the target. AI accelerates this optimization.
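The proxy-versus-target gap can be made concrete with a toy model. Everything here is invented for illustration: a fixed effort budget split between substance (the target) and buzzwords (a surface signal the proxy rewards). If the proxy weights the surface signal even slightly more than substance, an optimizer pours the whole budget into it.

```python
# Toy sketch of Goodhart's law, not a model of real peer review.
# All quantities and weights below are assumptions made for illustration.

def true_quality(substance: float, buzzwords: float) -> float:
    # The target: scientific quality depends only on substance.
    return substance

def review_score(substance: float, buzzwords: float) -> float:
    # The proxy: rewards substance, but rewards surface signals
    # (framing, vocabulary, template-matching) slightly more.
    return substance + 1.2 * buzzwords

# Fixed effort budget, split between substance and buzzwords.
budget = 10.0
candidates = [(s, budget - s) for s in [i * 0.5 for i in range(21)]]

# Optimize the proxy, as an acceptance-maximizing assistant would.
substance, buzzwords = max(candidates, key=lambda sb: review_score(*sb))

print(substance)                            # all effort leaves substance
print(review_score(substance, buzzwords))   # the proxy is maximized
print(true_quality(substance, buzzwords))   # the target collapses to zero
```

The point of the sketch is the discontinuity: nothing in the optimization loop ever references `true_quality`, so its value is an accident of how well the proxy happens to track it.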

The Arms Race

  • AI-assisted writing → reviewers can't tell how much of a submission is human-generated
  • AI-assisted reviewing → further removes human judgment
  • AI-assisted rebuttals → optimizes the entire review loop

Eventually both sides of the review process are AI systems optimizing against each other, with truth as a bystander.
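One way to see why truth becomes a bystander is to close the loop in the toy model: a writer model optimizes papers against a reviewer model's current weights, and the reviewer re-fits its weights to the papers it accepts. Both update rules are invented for illustration; note that neither one mentions quality.

```python
# Toy feedback loop between a writer model and a reviewer model.
# Feature names, weights, and update rules are assumptions for illustration.

FEATURES = ["substance", "buzzwords", "template_fit"]

def write_paper(weights: dict) -> dict:
    # The writer puts all effort into whichever feature the
    # reviewer currently rewards most.
    best = max(FEATURES, key=lambda f: weights[f])
    return {f: (10.0 if f == best else 0.0) for f in FEATURES}

def review(weights: dict, paper: dict) -> float:
    return sum(weights[f] * paper[f] for f in FEATURES)

# The reviewer starts with a slight preference for surface vocabulary.
weights = {"substance": 1.0, "buzzwords": 1.1, "template_fit": 0.9}

for _ in range(5):
    paper = write_paper(weights)
    # The reviewer drifts toward the features of the papers it accepts.
    for f in FEATURES:
        weights[f] += 0.1 * paper[f] / 10.0

print(paper["substance"])      # substance never enters the loop
print(weights["buzzwords"])    # the initial bias compounds every round
```

The initial tie-break (a 1.1 weight on buzzwords) is arbitrary, but that is the point: whichever surface feature starts slightly ahead gets amplified by both sides, and there is no term anywhere that pulls the system back toward the target.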

The Broader Point

This is a microcosm of a general AI safety problem: when you optimize hard enough against any human evaluation process, you break the process. Peer review, safety benchmarks, regulatory compliance — they're all vulnerable to the same dynamic.