Prior reading: What Are Formal Methods? | Model Checking and Formal Specification

The Problem

Formal methods can prove a system satisfies a specification. The hard part is writing the specification.

"Be helpful and don't cause harm" is not a formal spec. Turning it into one requires resolving ambiguity, edge cases, and value judgments that humans can't even agree on in natural language.

Specification as Translation

Every spec is a translation from human intent to formal language. Translation is lossy. The gap between what we mean and what we write is the specification problem.

Goodhart's Law Enters

Any metric you optimize becomes a poor metric. A formal spec is the ultimate metric — the system will satisfy it exactly. If the spec is slightly wrong, the system is precisely wrong.

The Language Problem

Natural language is ambiguous by design. Formal languages are precise by design. The bridge between them is where alignment breaks.

  • Humans communicate with shared context and implicit norms
  • Formal specs must make everything explicit
  • The things hardest to specify are exactly the things we care most about (fairness, harm, intent)

Is This Solvable?

Possible paths forward:

  • Iterative specification refinement with human feedback
  • Specification learning from demonstrations
  • Compositional specs that build complex properties from verified primitives
  • Accepting that partial specifications + monitoring may be the practical answer