Layers of Safety: From Data to Deployment

Prior reading: What Are Formal Methods? | What Is RL?

Safety Is a Stack

No single technique makes AI safe. Safety comes from layers, each catching different failure modes.

The Layers

1. Clean Data. Garbage in, garbage out. Data quality determines the prior the model learns from. Biased, toxic, or incorrect data embeds those properties in the model.
2. Good Specifications and Benchmarks. What are we optimizing for? Goodhart's Law applies: once the benchmark becomes the target, it stops being a good benchmark. Benchmark gaming is optimization against your spec. ...

August 20, 2025 · 2 min · Austin T. O'Quinn

What Are Formal Methods in AI Safety?

What Are Formal Methods?

Formal methods give mathematical guarantees about system behavior. Instead of testing examples and hoping they generalize, you prove that properties hold for all inputs in a defined space.

How Other Safety Approaches Work

- Empirical testing: Run inputs, check outputs. Only covers what you test.
- Red-teaming: Adversarial search for failures. Better coverage, still incomplete.
- RLHF / Constitutional AI: Shape behavior through the training signal. No guarantees out of distribution.
- Interpretability: Understand what the model learned. Doesn't directly constrain behavior.

What Formal Methods Add

- Soundness: If the proof says "this property holds," it holds. Period.
- Completeness trade-offs: Sound methods may be conservative; they might say "I can't prove it's safe" when it actually is.
- Scalability: The core challenge. Formal methods work beautifully on small systems. Neural networks have millions of parameters.

The Gap

Why aren't we just doing formal verification on everything? Because: ...
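The testing-versus-verification contrast above can be sketched in a few lines. This is a toy illustration, not a real verifier: the "system" is a hypothetical saturating-add function over an 8-bit input space, chosen so the whole space can be enumerated. Sampling checks only the inputs you pick; the exhaustive pass checks every input in the defined space, which is the spirit of a formal guarantee.

```python
def clamp_add(x):
    """Toy 'system' under verification: saturating add of 16 on an 8-bit value."""
    return min(x + 16, 255)

def property_holds(x):
    """The spec: output always stays within 0..255."""
    return 0 <= clamp_add(x) <= 255

# Empirical testing: a handful of samples. Only covers what you test.
samples_ok = all(property_holds(x) for x in [0, 100, 255])

# Exhaustive check over the entire defined input space 0..255.
# For a finite space this IS a proof; real formal methods reason
# symbolically because real input spaces are far too large to enumerate.
verified = all(property_holds(x) for x in range(256))

print(samples_ok, verified)  # True True
```

Real tools (SMT solvers, abstract interpretation, neural-network verifiers) replace the brute-force loop with symbolic reasoning, which is exactly where the scalability challenge described above comes from.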

April 23, 2025 · 1 min · Austin T. O'Quinn