The Specification and Language Problem

Prior reading: What Are Formal Methods? | Model Checking and Formal Specification

The Problem

Formal methods can prove a system satisfies a specification. The hard part is writing the specification.

"Be helpful and don't cause harm" is not a formal spec. Turning it into one requires resolving ambiguity, edge cases, and value judgments that humans can't even agree on in natural language.

Specification as Translation

Every spec is a translation from human intent to formal language. Translation is lossy. The gap between what we mean and what we write is the specification problem.

Goodhart's Law Enters

Any metric you optimize becomes a poor metric. A formal spec is the ultimate metric — the system will satisfy it exactly. If the spec is slightly wrong, the system is precisely wrong.

The Language Problem

Natural language is ambiguous by design. Formal languages are precise by design. The bridge between them is where alignment breaks.

Humans communicate with shared context and implicit norms
Formal specs must make everything explicit
The things hardest to specify are exactly the things we care most about (fairness, harm, intent)

Is This Solvable?

Possible paths forward:

Iterative specification refinement with human feedback
Specification learning from demonstrations
Compositional specs that build complex properties from verified primitives
Accepting that partial specifications + monitoring may be the practical answer

The Problem#

Specification as Translation#

Goodhart's Law Enters#

The Language Problem#

Is This Solvable?#

The Problem

Specification as Translation

Goodhart's Law Enters

The Language Problem

Is This Solvable?