Prior reading: Transistor Timing Side Channels | What Are Formal Methods?
The Containment Problem
If you have an AI system that might be dangerous, you want to run it in a box where it can't affect the outside world except through controlled channels. Simple in concept. The problem: how do you prove the box has no leaks?
Software containment is hard because the attack surface is enormous. Operating systems have millions of lines of code, thousands of syscalls, and a long history of privilege escalation exploits. A sufficiently capable AI looking for a way out of a sandbox is searching software written by humans who couldn't anticipate every escape path.
Why seL4
seL4 is a formally verified microkernel. "Formally verified" means:
- A mathematical proof exists that the kernel's C implementation correctly implements its abstract specification
- The specification guarantees isolation between components: a process cannot access memory, capabilities, or channels it hasn't been explicitly granted
- On verified platforms (e.g. the Arm build), the proof chain extends down to the compiled binary running on hardware, not just a model of the software
This is not "we tested it a lot." This is "we have a machine-checked mathematical proof that these properties hold for every possible execution."
What seL4 Proves
- Functional correctness: The C code does exactly what the specification says. No undefined behavior, no buffer overflows, no uninitialized memory.
- Integrity: A component cannot modify another component's memory or state without possessing an explicit capability to do so.
- Confidentiality: Information cannot flow from a high-security component to a low-security component except through explicitly authorized channels.
- Authority confinement: A component cannot grant more authority than it possesses.
What seL4 Does NOT Prove
- Timing channels: seL4's proofs do not cover covert timing channels (→ see transistor-timing post). Information can leak through timing even when the kernel is functionally correct.
- Hardware correctness: The proofs assume the hardware behaves according to its specification. Hardware bugs (Spectre, Meltdown, Rowhammer) are out of scope.
- Application correctness: seL4 guarantees the kernel is correct. Whatever you run on top of seL4 has its own bugs.
Capability-Based Restrictions
What Capabilities Are
In seL4, a capability is an unforgeable token that grants a specific right to a specific resource. A process can only:
- Access memory it has a capability for
- Send messages to endpoints it has a capability for
- Create child processes with at most the capabilities it possesses
No ambient authority. No root. No privilege escalation by default. Every action requires an explicit, verifiable authorization.
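A toy model can make this concrete. The sketch below is illustrative Python, not seL4's actual API: a capability is an unforgeable (resource, right) token, every action requires holding the token, and a child can be created with at most the parent's capabilities.

```python
from dataclasses import dataclass

# Toy model of capability-based access (illustrative only -- not seL4's API).
# A capability is an unforgeable token pairing a resource with a right.

@dataclass(frozen=True)
class Capability:
    resource: str   # e.g. "frame_0x1000", "endpoint_in"
    right: str      # e.g. "read", "write", "send"

class Process:
    def __init__(self, name, caps=()):
        self.name = name
        self.caps = frozenset(caps)

    def can(self, resource, right):
        # No ambient authority: the ONLY way to act is to hold the token.
        return Capability(resource, right) in self.caps

    def spawn(self, name, caps):
        # A child receives at most the parent's capabilities.
        granted = frozenset(caps)
        if not granted <= self.caps:
            raise PermissionError("cannot grant capabilities the parent lacks")
        return Process(name, granted)

ai = Process("ai", {Capability("endpoint_in", "recv"),
                    Capability("endpoint_out", "send")})
print(ai.can("endpoint_out", "send"))   # True
print(ai.can("network", "send"))        # False: no token, no access
```

There is no code path that checks "is this process root?" -- authority exists only as tokens held, which is what makes the capability graph auditable.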
Capabilities for AI Containment
Design the containment architecture as a capability graph. (Diagram omitted: the AI process connects only to its input and output endpoints.)
The AI process has no capability for:
- Network access
- File system access
- Spawning processes
- Accessing hardware devices
- Communicating with any process except through the two authorized channels
These aren't firewall rules that can be misconfigured or bypassed. They're structural properties of the capability graph, enforced by the verified kernel.
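Because the capability graph is explicit data, "the AI has no network capability" can be checked directly. The sketch below uses invented process and resource names; it is a direct audit of a hypothetical graph, not a runtime firewall.

```python
# Hypothetical encoding of the containment capability graph (names invented
# for illustration). Each process maps to the set of (resource, right) pairs
# it holds; the check is a static audit of the graph, not a firewall rule.

cap_graph = {
    "ai":            {("ep_ai_in", "recv"), ("ep_ai_out", "send")},
    "input_filter":  {("ext_in", "recv"), ("ep_ai_in", "send")},
    "output_filter": {("ep_ai_out", "recv"), ("ext_out", "send")},
}

FORBIDDEN_FOR_AI = {"network", "filesystem", "device", "spawn"}

def audit(graph, process, forbidden_resources):
    """Return any directly held capabilities on forbidden resources."""
    return {c for c in graph[process] if c[0] in forbidden_resources}

print(audit(cap_graph, "ai", FORBIDDEN_FOR_AI))  # set() -- nothing forbidden
```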
Capability Composition
Capabilities compose predictably:
- A process with caps A and B can grant A or B to a child, but cannot synthesize cap C that it doesn't hold
- Authority only decreases down the process tree — never increases
- This is provably true as a consequence of seL4's verification
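The "authority only decreases" rule can be sketched as a derivation operation that may only shrink a capability's rights. Again a toy model, not seL4's CNode API:

```python
# Toy sketch of capability derivation (illustrative, not seL4's CNode API).
# Rights are modeled as frozensets; derivation may only shrink them, so
# authority is monotonically non-increasing down the process tree.

def derive(cap, rights):
    resource, held = cap
    rights = frozenset(rights)
    if not rights <= held:
        raise PermissionError("cannot derive rights absent from the parent cap")
    return (resource, rights)

rw = ("frame_A", frozenset({"read", "write"}))
ro = derive(rw, {"read"})           # fine: authority shrinks
print(ro)                           # ('frame_A', frozenset({'read'}))

try:
    derive(ro, {"read", "write"})   # amplification attempt
except PermissionError as e:
    print("blocked:", e)
```

In seL4 this invariant is not a convention the code happens to follow; it is part of the proved authority-confinement theorem.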
Provable Lack of Homomorphic Paths
The Key Property
A homomorphic path to a restricted outcome is an indirect sequence of permitted operations that achieves an effect equivalent to a restricted operation. Example: a process without network access that manipulates shared memory to cause another process (with network access) to send a message on its behalf.
In a capability system, we want to prove: there is no composition of the AI's granted capabilities that produces an effect equivalent to a capability it was not granted.
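The shared-memory example above is a confused-deputy pattern, and it shows why reachability must be computed transitively. The sketch below (invented names) conservatively over-approximates: it assumes any process the AI can message might exercise all of its own capabilities on the AI's behalf.

```python
# Toy sketch of a "homomorphic path" (confused deputy). Names are invented.
# The AI holds no network capability, but a deputy with network access that
# acts on the AI's messages still lets the AI cause a network send.

caps = {
    "ai":     {("ep_d", "send")},              # can only message the deputy
    "deputy": {("ep_d", "recv"), ("net", "send")},
}

def reachable_effects(start, caps):
    """Over-approximate reach: BFS over processes the start can message,
    assuming each might exercise every capability it holds."""
    seen, frontier, effects = {start}, [start], set(caps[start])
    while frontier:
        p = frontier.pop()
        for (res, right) in caps[p]:
            if right == "send" and res.startswith("ep_"):
                for q, qcaps in caps.items():   # who receives on this endpoint?
                    if (res, "recv") in qcaps and q not in seen:
                        seen.add(q)
                        frontier.append(q)
                        effects |= qcaps
    return effects

print(("net", "send") in reachable_effects("ai", caps))  # True: indirect path
```

The direct audit of the AI's own capabilities finds nothing; only the transitive analysis exposes the path.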
How seL4 Enables This
seL4's authority confinement property means:
- The AI's capabilities are exactly what it was granted — no more
- It cannot forge, amplify, or delegate capabilities it doesn't possess
- The only information flows are through explicitly created channels
To prove no homomorphic paths exist:
- Enumerate the AI's capabilities: This is finite and explicit in seL4's capability space.
- Define the restricted outcomes: e.g., "any message reaches the external network," "any file is written to persistent storage."
- Prove that no composition of granted capabilities can produce a restricted outcome: This follows from seL4's information flow proofs if the capability graph is correctly configured and the restricted outcome requires capabilities not in the AI's transitive authority set.
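The three steps above can be sketched as a static check over a hypothetical capability graph. It over-approximates reach (assuming any process the AI can message might exercise all of its own capabilities); if even the over-approximation misses every restricted outcome, no homomorphic path exists.

```python
# Sketch of the static check (hypothetical graph; over-approximates reach by
# assuming any process the AI can message may exercise all of its own caps).

graph = {
    "ai":            {("ep_in", "recv"), ("ep_out", "send")},
    "input_filter":  {("ext_in", "recv"), ("ep_in", "send")},
    "output_filter": {("ep_out", "recv"), ("ext_out", "send")},
}

RESTRICTED = {("net", "send"), ("disk", "write")}   # step 2: restricted outcomes

def transitive_authority(start, graph):
    """Step 1 + closure: every capability reachable from start via channels."""
    seen, frontier, authority = {start}, [start], set(graph[start])
    while frontier:
        p = frontier.pop()
        for (res, right) in graph[p]:
            if right != "send":
                continue
            for q, qcaps in graph.items():
                if (res, "recv") in qcaps and q not in seen:
                    seen.add(q)
                    frontier.append(q)
                    authority |= qcaps
    return authority

# Step 3: reach(C_AI) ∩ O_restricted must be empty.
leak = transitive_authority("ai", graph) & RESTRICTED
print("containment holds" if not leak else f"leak via {leak}")  # containment holds
```

Note that the authorized output channel (`ext_out`) does appear in the AI's transitive authority; that is by design, which is why defining the restricted-outcome set precisely is as important as the reachability computation.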
Formal Argument Sketch
Let $\mathcal{C}_{\text{AI}}$ be the set of capabilities granted to the AI. Let $\mathcal{O}_{\text{restricted}}$ be the set of restricted outcomes. Let $\text{reach}(\mathcal{C})$ be the set of all outcomes reachable by any composition of capabilities in $\mathcal{C}$.
Claim: $\text{reach}(\mathcal{C}_{\text{AI}}) \cap \mathcal{O}_{\text{restricted}} = \emptyset$
Proof sketch: By seL4's authority confinement, $\text{reach}(\mathcal{C})$ is bounded by the transitive closure of $\mathcal{C}$ under the kernel's capability operations (grant, revoke, derive). If no capability in $\mathcal{C}_{\text{AI}}$ (or derivable from it) intersects with the capabilities required for any outcome in $\mathcal{O}_{\text{restricted}}$, the intersection is empty.
This is verifiable by static analysis of the capability graph — no runtime monitoring needed.
The Gaps
Even with this proof:
- Covert channels: Timing, power, cache — side channels outside the kernel's model (→ see transistor-timing post)
- Human operators: The AI might convince a human to change its capability set. Social engineering bypasses formal verification.
- Hardware attacks: Rowhammer, voltage glitching — physical attacks that violate the hardware assumptions seL4's proofs rely on.
- Output channel content: The AI has a text output channel. What it writes through that channel is unconstrained by the kernel. If the output reaches a human or another system that acts on it, the AI can influence the outside world through persuasion, not capability violation.
Architecture for Contained AI
A Practical Design
(Architecture diagram omitted: external input → input filter → AI process → output filter → external output, with a monitor observing all channels.)
- AI Process: Isolated. Can only read from input endpoint and write to output endpoint.
- Input Filter: Receives external input, sanitizes, forwards to AI. Has capability to external input and AI's input endpoint — nothing else.
- Output Filter: Receives AI output, applies rate limiting and content filtering, forwards to external output. Cannot modify AI's memory or state.
- Monitor: Can read all channels (for logging/analysis) and can kill the AI process. Cannot modify any process's capabilities.
Each component has the minimum capabilities needed for its role. No component has all capabilities. Compromise of any single component doesn't compromise the containment.
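Least privilege in this design is itself statically checkable: compare each component's actual capabilities against a declared per-role policy. The sketch below uses hypothetical names for the endpoints and the monitor's kill capability.

```python
# Hypothetical audit of the four-component design: compare each component's
# actual capabilities against a declared least-privilege policy.

policy = {
    "ai":            {("ep_in", "recv"), ("ep_out", "send")},
    "input_filter":  {("ext_in", "recv"), ("ep_in", "send")},
    "output_filter": {("ep_out", "recv"), ("ext_out", "send")},
    "monitor":       {("ep_in", "read"), ("ep_out", "read"), ("ai_tcb", "kill")},
}

def check_least_privilege(actual, policy):
    """Flag any capability a component holds beyond its declared role."""
    return {p: actual[p] - policy[p] for p in actual if actual[p] - policy[p]}

actual = {p: set(c) for p, c in policy.items()}   # a correctly built system
print(check_least_privilege(actual, policy))      # {} -- nothing extra

# No single component holds every capability in the system:
all_caps = set().union(*policy.values())
assert all(policy[p] != all_caps for p in policy)
```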
What This Buys You
- Provable isolation: The AI cannot access anything outside its granted capabilities. This is a theorem, not a test result.
- Auditable authority: The capability graph is finite, explicit, and statically analyzable.
- Minimal TCB (Trusted Computing Base): Only seL4 (~10,000 lines of verified C) needs to be trusted. Everything else is constrained by the kernel's guarantees.
- Defense in depth: Input filter, output filter, and monitor are independent containment layers, each with its own minimal capabilities.
What This Doesn't Buy You
- Protection against the AI writing convincing text through the output channel (the persuasion problem)
- Protection against covert timing channels between the AI and external observers
- Protection against hardware-level attacks
- Any guarantee about the AI's behavior — only about its authority
The Bigger Point
seL4 containment gives you the strongest possible structural guarantee: the AI literally cannot do things it wasn't given capability to do, and this is mathematically proven. But structural containment is necessary, not sufficient. An AI contained by seL4 with a text output channel is like a prisoner in an escape-proof cell with a telephone — the cell works perfectly, but the telephone might be enough.
The value of formal containment is that it eliminates one class of risk entirely (capability-based escape), letting you focus your safety efforts on the remaining classes (persuasion, side channels, hardware) instead of trying to defend against everything simultaneously.