What Are Formal Methods?
Formal methods give mathematical guarantees about system behavior. Instead of testing examples and hoping they generalize, you prove properties hold for all inputs in a defined space.
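The contrast with testing can be made concrete. The sketch below (a hypothetical toy, not any particular tool) checks a bit-trick identity for *every* pair of 8-bit values, rather than a handful of samples. The function names are illustrative, not from any library.

```python
# Toy illustration: exhaustive verification over a small, finite input space.
# Testing samples a few points; verification covers all of them.
def avg_bittrick(a: int, b: int) -> int:
    # Claimed overflow-free average of two unsigned ints.
    return (a & b) + ((a ^ b) >> 1)

def verify_all_8bit() -> bool:
    # "All inputs in a defined space": every pair of 8-bit values.
    return all(avg_bittrick(a, b) == (a + b) // 2
               for a in range(256) for b in range(256))

print(verify_all_8bit())  # True: the property holds for every input pair
```

Enumeration only works for tiny finite spaces; real formal tools (SMT solvers, abstract interpreters) get the same all-inputs guarantee symbolically, without enumerating.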
How Other Safety Approaches Work
- Empirical testing: Run inputs, check outputs. Only covers what you test.
- Red-teaming: Adversarial search for failures. Better coverage, still incomplete.
- RLHF / Constitutional AI: Shape behavior through training signal. No guarantees on out-of-distribution inputs.
- Interpretability: Understand what the model learned. Doesn't directly constrain behavior.
What Formal Methods Add
- Soundness: If the proof says "this property holds," it holds. Period. (Relative, of course, to the model of the system you verified and the correctness of the verifier.)
- Completeness trade-offs: Sound methods may be conservative — they might say "I can't prove it's safe" when it actually is.
- Scalability: The core challenge. Formal methods work beautifully on small systems. Modern neural networks have millions or billions of parameters, and the guarantee has to hold across all of them.
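The soundness/completeness trade-off shows up clearly in interval bound propagation, one of the simplest neural-network verification techniques. The toy below (an assumed example, not a real tool) pushes an input box through one layer y = relu(Wx + b). The output intervals are sound for every x in the box, but they can be looser than the true output range, and that slack compounds with depth, which is exactly the scalability problem.

```python
# Sketch: interval bound propagation through y = relu(W x + b).
# Sound for EVERY x in the input box; possibly loose (incomplete).
def affine_bounds(W, b, lo, hi):
    # For each output neuron, pick the interval endpoint that
    # minimizes/maximizes the dot product, weight sign by weight sign.
    out = []
    for row, bias in zip(W, b):
        l = bias + sum(w * (lo[j] if w >= 0 else hi[j]) for j, w in enumerate(row))
        u = bias + sum(w * (hi[j] if w >= 0 else lo[j]) for j, w in enumerate(row))
        out.append((l, u))
    return out

def relu_bounds(bounds):
    # ReLU is monotone, so clamping both endpoints at 0 stays sound.
    return [(max(0.0, l), max(0.0, u)) for l, u in bounds]

W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, -0.25]
lo, hi = [0.0, 0.0], [1.0, 1.0]   # input box: x in [0,1] x [0,1]
print(relu_bounds(affine_bounds(W, b, lo, hi)))  # [(0.0, 1.0), (0.0, 0.75)]
```

If the resulting intervals satisfy your safety property, the property provably holds for the whole box. If not, you learn nothing: either the property fails, or the bounds were just too conservative.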
The Gap
Why aren't we just doing formal verification on everything? Because:
- We don't know how to write specifications for "aligned"
- Neural networks are hard to verify (non-linear and high-dimensional; even simple input-output properties of ReLU networks are NP-complete to check)
- The tools are young
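The first of those gaps is worth making concrete. A formal specification is just a precise predicate over system behavior. Some safety properties reduce to one; "aligned" does not, and that asymmetry is the specification problem. The predicate names below are hypothetical, invented for illustration.

```python
# Sketch, assuming outputs are plain lists of floats.
# Some specs are easy to state precisely...
def spec_actuator_limits(outputs, limit=1.0):
    # Verifiable spec: every control output stays within actuator limits.
    return all(abs(o) <= limit for o in outputs)

# ...and some we cannot state at all.
def spec_aligned(model_output):
    # The specification problem in one line: nobody knows
    # what predicate belongs here.
    raise NotImplementedError("'aligned' has no agreed formal definition")

print(spec_actuator_limits([0.3, -0.9]))  # True
print(spec_actuator_limits([1.5, 0.0]))   # False
```

Verification tools can only prove predicates you can write down. Until "aligned" compiles to something like the first function, formal methods apply to narrower, proxy properties.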
Where This Series Goes
This post sets up the rest of the formal methods section: reachability analysis, model checking, and the specification problem.