What Are Formal Methods?
Formal methods give mathematical guarantees about system behavior. Instead of testing examples and hoping they generalize, you prove properties hold for all inputs in a defined space.
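The contrast with testing can be made concrete. The sketch below (a hypothetical toy, not any particular tool) checks a bit-trick identity for *every* pair of 8-bit values, rather than a handful of samples. The function names are illustrative, not from any library.

```python
# Toy illustration: exhaustive verification over a small, finite input space.
# Testing samples a few points; verification covers all of them.
def avg_bittrick(a: int, b: int) -> int:
    # Claimed overflow-free average of two unsigned ints.
    return (a & b) + ((a ^ b) >> 1)

def verify_all_8bit() -> bool:
    # "All inputs in a defined space": every pair of 8-bit values.
    return all(avg_bittrick(a, b) == (a + b) // 2
               for a in range(256) for b in range(256))

print(verify_all_8bit())  # True: the property holds for every input pair
```

Enumeration only works for tiny finite spaces; real formal tools (SMT solvers, abstract interpreters) get the same all-inputs guarantee symbolically, without enumerating.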
How Other Safety Approaches Work
- Empirical testing: Run inputs, check outputs. Only covers what you test.
- Red-teaming: Adversarial search for failures. Better coverage, still incomplete.
- RLHF / Constitutional AI: Shape behavior through training signal. No guarantees on out-of-distribution inputs.
- Interpretability: Understand what the model learned. Doesn't directly constrain behavior.
What Formal Methods Add
- Soundness: If the proof says "this property holds," it holds. Period. (Relative, of course, to the model of the system you verified and the correctness of the verifier.)
- Completeness trade-offs: Sound methods may be conservative — they might say "I can't prove it's safe" when it actually is.
- Scalability: The core challenge. Formal methods work beautifully on small systems. Modern neural networks have millions or billions of parameters, and the guarantee has to hold across all of them.
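The soundness/completeness trade-off shows up clearly in interval bound propagation, one of the simplest neural-network verification techniques. The toy below (an assumed example, not a real tool) pushes an input box through one layer y = relu(Wx + b). The output intervals are sound for every x in the box, but they can be looser than the true output range, and that slack compounds with depth, which is exactly the scalability problem.

```python
# Sketch: interval bound propagation through y = relu(W x + b).
# Sound for EVERY x in the input box; possibly loose (incomplete).
def affine_bounds(W, b, lo, hi):
    # For each output neuron, pick the interval endpoint that
    # minimizes/maximizes the dot product, weight sign by weight sign.
    out = []
    for row, bias in zip(W, b):
        l = bias + sum(w * (lo[j] if w >= 0 else hi[j]) for j, w in enumerate(row))
        u = bias + sum(w * (hi[j] if w >= 0 else lo[j]) for j, w in enumerate(row))
        out.append((l, u))
    return out

def relu_bounds(bounds):
    # ReLU is monotone, so clamping both endpoints at 0 stays sound.
    return [(max(0.0, l), max(0.0, u)) for l, u in bounds]

W = [[1.0, -1.0], [0.5, 0.5]]
b = [0.0, -0.25]
lo, hi = [0.0, 0.0], [1.0, 1.0]   # input box: x in [0,1] x [0,1]
print(relu_bounds(affine_bounds(W, b, lo, hi)))  # [(0.0, 1.0), (0.0, 0.75)]
```

If the resulting intervals satisfy your safety property, the property provably holds for the whole box. If not, you learn nothing: either the property fails, or the bounds were just too conservative.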
The Gap
Why aren't we just doing formal verification on everything? Because:
- We don't know how to write specifications for "aligned"
- Neural networks are hard to verify (non-linear and high-dimensional; even simple input-output properties of ReLU networks are NP-complete to check)
- The tools are young
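The first of those gaps is worth making concrete. A formal specification is just a precise predicate over system behavior. Some safety properties reduce to one; "aligned" does not, and that asymmetry is the specification problem. The predicate names below are hypothetical, invented for illustration.

```python
# Sketch, assuming outputs are plain lists of floats.
# Some specs are easy to state precisely...
def spec_actuator_limits(outputs, limit=1.0):
    # Verifiable spec: every control output stays within actuator limits.
    return all(abs(o) <= limit for o in outputs)

# ...and some we cannot state at all.
def spec_aligned(model_output):
    # The specification problem in one line: nobody knows
    # what predicate belongs here.
    raise NotImplementedError("'aligned' has no agreed formal definition")

print(spec_actuator_limits([0.3, -0.9]))  # True
print(spec_actuator_limits([1.5, 0.0]))   # False
```

Verification tools can only prove predicates you can write down. Until "aligned" compiles to something like the first function, formal methods apply to narrower, proxy properties.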
Where This Series Goes
This post sets up the rest of the formal methods section: reachability analysis, model checking, and the specification problem.