What Are Formal Methods?

Formal methods give mathematical guarantees about system behavior. Instead of testing examples and hoping they generalize, you prove properties hold for all inputs in a defined space.
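The difference can be made concrete with a toy example. This is an illustrative sketch (the `saturating_add` function is hypothetical, not from the post): testing checks a few sampled inputs, while verification over a small finite input space checks every one.

```python
def saturating_add(a, b):
    """8-bit saturating addition: result is clamped to at most 255."""
    return min(a + b, 255)

# Empirical testing: a handful of cases, no guarantee beyond them.
assert saturating_add(200, 100) == 255
assert saturating_add(3, 4) == 7

# Exhaustive verification: the bounds property holds for ALL
# 65,536 input pairs in the defined space, not just the tested ones.
assert all(0 <= saturating_add(a, b) <= 255
           for a in range(256) for b in range(256))
```

Exhaustive enumeration only works because this input space is tiny; the rest of the section is about what happens when it isn't.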

How Other Safety Approaches Work

  • Empirical testing: Run inputs, check outputs. Only covers what you test.
  • Red-teaming: Adversarial search for failures. Better coverage, still incomplete.
  • RLHF / Constitutional AI: Shape behavior through the training signal. No guarantees on out-of-distribution inputs.
  • Interpretability: Understand what the model learned. Doesn't directly constrain behavior.

What Formal Methods Add

  • Soundness: If the proof says "this property holds," it holds. Period.
  • Completeness trade-offs: Sound methods may be conservative — they might say "I can't prove it's safe" when it actually is.
  • Scalability: The core challenge. Formal methods work beautifully on small systems. Neural networks have millions or billions of parameters.
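Interval arithmetic gives a minimal sketch of the soundness/conservatism trade-off (this is a generic illustration, not a method the post prescribes). Bounding f(x) = x − x² over [0, 1] interval-by-interval yields a range that provably contains every true output (sound), but is much wider than the true range (conservative).

```python
def interval_sub(a, b):
    """[a_lo, a_hi] - [b_lo, b_hi], sound for all points in the intervals."""
    return (a[0] - b[1], a[1] - b[0])

def interval_sq(a):
    """Square of a non-negative interval [a_lo, a_hi] with 0 <= a_lo."""
    return (a[0] ** 2, a[1] ** 2)

x = (0.0, 1.0)
bound = interval_sub(x, interval_sq(x))   # bounds f(x) = x - x^2
# bound == (-1.0, 1.0)
# Sound: the true range [0, 0.25] lies inside the computed bound.
# Conservative: we cannot certify f(x) <= 0.5 from this bound,
# even though it is actually true for every x in [0, 1].
```

The verifier never lies, but it does shrug: "I can't prove it" is a permitted answer even when the property holds.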

The Gap

Why aren't we just doing formal verification on everything? Because:

  1. We don't know how to write specifications for "aligned"
  2. Neural networks are hard to verify (non-linear, high-dimensional)
  3. The tools are young
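Point 2 has a concrete mechanism behind it. In this hypothetical sketch, naive interval propagation through a ReLU loses the fact that two terms depend on the same input, so even a function that is identically zero gets a loose bound:

```python
def relu_interval(lo, hi):
    """Propagate an interval through ReLU: max(x, 0)."""
    return (max(lo, 0.0), max(hi, 0.0))

# f(x) = relu(x) - relu(x) is identically 0, but interval propagation
# treats the two occurrences of relu(x) as independent:
x_lo, x_hi = -1.0, 1.0
a = relu_interval(x_lo, x_hi)   # (0.0, 1.0)
lo = a[0] - a[1]                # -1.0
hi = a[1] - a[0]                #  1.0
# Computed bound: [-1, 1]; true range: [0, 0].
```

Each ReLU layer in a real network creates such dependencies by the thousand, and recovering the lost precision is exactly what makes neural network verification expensive.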

Where This Series Goes

This post sets up the rest of the formal methods section: reachability analysis, model checking, and the specification problem.