Prior reading: Transistor Timing Side Channels | What Are Formal Methods?
The Containment Problem
If you have an AI system that might be dangerous, you want to run it in a box where it can't affect the outside world except through controlled channels. Simple in concept. The problem: how do you prove the box has no leaks?
Software containment is hard because the attack surface is enormous. Operating systems have millions of lines of code, thousands of syscalls, and a long history of privilege escalation exploits. A sufficiently capable AI looking for a way out of a sandbox is searching software written by humans who couldn't anticipate every escape path.
Why seL4
seL4 is a formally verified microkernel. "Formally verified" means:
- A mathematical proof exists that the kernel's C implementation correctly implements its abstract specification
- The specification guarantees isolation between components: a process cannot access memory, capabilities, or channels it hasn't been explicitly granted
- On verified platforms (e.g. the Arm build), the proof chain extends down to the compiled binary running on hardware, not just a model of the software
This is not "we tested it a lot." This is "we have a machine-checked mathematical proof that these properties hold for every possible execution."
What seL4 Proves
- Functional correctness: The C code does exactly what the specification says. No undefined behavior, no buffer overflows, no uninitialized memory.
- Integrity: A component cannot modify another component's memory or state without possessing an explicit capability to do so.
- Confidentiality: Information cannot flow from a high-security component to a low-security component except through explicitly authorized channels.
- Authority confinement: A component cannot grant more authority than it possesses.
What seL4 Does NOT Prove
- Timing channels: seL4's proofs do not cover covert timing channels (→ see transistor-timing post). Information can leak through timing even when the kernel is functionally correct.
- Hardware correctness: The proofs assume the hardware behaves according to its specification. Hardware bugs (Spectre, Meltdown, Rowhammer) are out of scope.
- Application correctness: seL4 guarantees the kernel is correct. Whatever you run on top of seL4 has its own bugs.
Capability-Based Restrictions
What Capabilities Are
In seL4, a capability is an unforgeable token that grants a specific right to a specific resource. A process can only:
- Access memory it has a capability for
- Send messages to endpoints it has a capability for
- Create child processes with at most the capabilities it possesses
No ambient authority. No root. No privilege escalation by default. Every action requires an explicit, verifiable authorization.
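A toy model can make this concrete. The sketch below is illustrative Python, not seL4's actual API: a capability is an unforgeable (resource, right) token, every action requires holding the token, and a child can be created with at most the parent's capabilities.

```python
from dataclasses import dataclass

# Toy model of capability-based access (illustrative only -- not seL4's API).
# A capability is an unforgeable token pairing a resource with a right.

@dataclass(frozen=True)
class Capability:
    resource: str   # e.g. "frame_0x1000", "endpoint_in"
    right: str      # e.g. "read", "write", "send"

class Process:
    def __init__(self, name, caps=()):
        self.name = name
        self.caps = frozenset(caps)

    def can(self, resource, right):
        # No ambient authority: the ONLY way to act is to hold the token.
        return Capability(resource, right) in self.caps

    def spawn(self, name, caps):
        # A child receives at most the parent's capabilities.
        granted = frozenset(caps)
        if not granted <= self.caps:
            raise PermissionError("cannot grant capabilities the parent lacks")
        return Process(name, granted)

ai = Process("ai", {Capability("endpoint_in", "recv"),
                    Capability("endpoint_out", "send")})
print(ai.can("endpoint_out", "send"))   # True
print(ai.can("network", "send"))        # False: no token, no access
```

There is no code path that checks "is this process root?" -- authority exists only as tokens held, which is what makes the capability graph auditable.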
Capabilities for AI Containment
Design the containment architecture as a capability graph. (Diagram omitted: the AI process connects only to its input and output endpoints.)
The AI process has no capability for:
- Network access
- File system access
- Spawning processes
- Accessing hardware devices
- Communicating with any process except through the two authorized channels
These aren't firewall rules that can be misconfigured or bypassed. They're structural properties of the capability graph, enforced by the verified kernel.
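Because the capability graph is explicit data, "the AI has no network capability" can be checked directly. The sketch below uses invented process and resource names; it is a direct audit of a hypothetical graph, not a runtime firewall.

```python
# Hypothetical encoding of the containment capability graph (names invented
# for illustration). Each process maps to the set of (resource, right) pairs
# it holds; the check is a static audit of the graph, not a firewall rule.

cap_graph = {
    "ai":            {("ep_ai_in", "recv"), ("ep_ai_out", "send")},
    "input_filter":  {("ext_in", "recv"), ("ep_ai_in", "send")},
    "output_filter": {("ep_ai_out", "recv"), ("ext_out", "send")},
}

FORBIDDEN_FOR_AI = {"network", "filesystem", "device", "spawn"}

def audit(graph, process, forbidden_resources):
    """Return any directly held capabilities on forbidden resources."""
    return {c for c in graph[process] if c[0] in forbidden_resources}

print(audit(cap_graph, "ai", FORBIDDEN_FOR_AI))  # set() -- nothing forbidden
```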
Capability Composition
Capabilities compose predictably:
- A process with caps A and B can grant A or B to a child, but cannot synthesize cap C that it doesn't hold
- Authority only decreases down the process tree — never increases
- This is provably true as a consequence of seL4's verification
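The "authority only decreases" rule can be sketched as a derivation operation that may only shrink a capability's rights. Again a toy model, not seL4's CNode API:

```python
# Toy sketch of capability derivation (illustrative, not seL4's CNode API).
# Rights are modeled as frozensets; derivation may only shrink them, so
# authority is monotonically non-increasing down the process tree.

def derive(cap, rights):
    resource, held = cap
    rights = frozenset(rights)
    if not rights <= held:
        raise PermissionError("cannot derive rights absent from the parent cap")
    return (resource, rights)

rw = ("frame_A", frozenset({"read", "write"}))
ro = derive(rw, {"read"})           # fine: authority shrinks
print(ro)                           # ('frame_A', frozenset({'read'}))

try:
    derive(ro, {"read", "write"})   # amplification attempt
except PermissionError as e:
    print("blocked:", e)
```

In seL4 this invariant is not a convention the code happens to follow; it is part of the proved authority-confinement theorem.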
Provable Lack of Homomorphic Paths
The Key Property
A homomorphic path to a restricted outcome is an indirect sequence of permitted operations that achieves an effect equivalent to a restricted operation. Example: a process without network access that manipulates shared memory to cause another process (with network access) to send a message on its behalf.
In a capability system, we want to prove: there is no composition of the AI's granted capabilities that produces an effect equivalent to a capability it was not granted.
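The shared-memory example above is a confused-deputy pattern, and it shows why reachability must be computed transitively. The sketch below (invented names) conservatively over-approximates: it assumes any process the AI can message might exercise all of its own capabilities on the AI's behalf.

```python
# Toy sketch of a "homomorphic path" (confused deputy). Names are invented.
# The AI holds no network capability, but a deputy with network access that
# acts on the AI's messages still lets the AI cause a network send.

caps = {
    "ai":     {("ep_d", "send")},              # can only message the deputy
    "deputy": {("ep_d", "recv"), ("net", "send")},
}

def reachable_effects(start, caps):
    """Over-approximate reach: BFS over processes the start can message,
    assuming each might exercise every capability it holds."""
    seen, frontier, effects = {start}, [start], set(caps[start])
    while frontier:
        p = frontier.pop()
        for (res, right) in caps[p]:
            if right == "send" and res.startswith("ep_"):
                for q, qcaps in caps.items():   # who receives on this endpoint?
                    if (res, "recv") in qcaps and q not in seen:
                        seen.add(q)
                        frontier.append(q)
                        effects |= qcaps
    return effects

print(("net", "send") in reachable_effects("ai", caps))  # True: indirect path
```

The direct audit of the AI's own capabilities finds nothing; only the transitive analysis exposes the path.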
How seL4 Enables This
seL4's authority confinement property means:
- The AI's capabilities are exactly what it was granted — no more
- It cannot forge, amplify, or delegate capabilities it doesn't possess
- The only information flows are through explicitly created channels
To prove no homomorphic paths exist:
- Enumerate the AI's capabilities: This is finite and explicit in seL4's capability space.
- Define the restricted outcomes: e.g., "any message reaches the external network," "any file is written to persistent storage."
- Prove that no composition of granted capabilities can produce a restricted outcome: This follows from seL4's information flow proofs if the capability graph is correctly configured and the restricted outcome requires capabilities not in the AI's transitive authority set.
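The three steps above can be sketched as a static check over a hypothetical capability graph. It over-approximates reach (assuming any process the AI can message might exercise all of its own capabilities); if even the over-approximation misses every restricted outcome, no homomorphic path exists.

```python
# Sketch of the static check (hypothetical graph; over-approximates reach by
# assuming any process the AI can message may exercise all of its own caps).

graph = {
    "ai":            {("ep_in", "recv"), ("ep_out", "send")},
    "input_filter":  {("ext_in", "recv"), ("ep_in", "send")},
    "output_filter": {("ep_out", "recv"), ("ext_out", "send")},
}

RESTRICTED = {("net", "send"), ("disk", "write")}   # step 2: restricted outcomes

def transitive_authority(start, graph):
    """Step 1 + closure: every capability reachable from start via channels."""
    seen, frontier, authority = {start}, [start], set(graph[start])
    while frontier:
        p = frontier.pop()
        for (res, right) in graph[p]:
            if right != "send":
                continue
            for q, qcaps in graph.items():
                if (res, "recv") in qcaps and q not in seen:
                    seen.add(q)
                    frontier.append(q)
                    authority |= qcaps
    return authority

# Step 3: reach(C_AI) ∩ O_restricted must be empty.
leak = transitive_authority("ai", graph) & RESTRICTED
print("containment holds" if not leak else f"leak via {leak}")  # containment holds
```

Note that the authorized output channel (`ext_out`) does appear in the AI's transitive authority; that is by design, which is why defining the restricted-outcome set precisely is as important as the reachability computation.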
Formal Argument Sketch
Let $\mathcal{C}_{\text{AI}}$ be the set of capabilities granted to the AI. Let $\mathcal{O}_{\text{restricted}}$ be the set of restricted outcomes. Let $\text{reach}(\mathcal{C})$ be the set of all outcomes reachable by any composition of capabilities in $\mathcal{C}$.
Claim: $\text{reach}(\mathcal{C}_{\text{AI}}) \cap \mathcal{O}_{\text{restricted}} = \emptyset$
Proof sketch: By seL4's authority confinement, $\text{reach}(\mathcal{C})$ is bounded by the transitive closure of $\mathcal{C}$ under the kernel's capability operations (grant, revoke, derive). If no capability in $\mathcal{C}_{\text{AI}}$ (or derivable from it) intersects with the capabilities required for any outcome in $\mathcal{O}_{\text{restricted}}$, the intersection is empty.
This is verifiable by static analysis of the capability graph — no runtime monitoring needed.
The Gaps
Even with this proof:
- Covert channels: Timing, power, cache — side channels outside the kernel's model (→ see transistor-timing post)
- Human operators: The AI might convince a human to change its capability set. Social engineering bypasses formal verification.
- Hardware attacks: Rowhammer, voltage glitching — physical attacks that violate the hardware assumptions seL4's proofs rely on.
- Output channel content: The AI has a text output channel. What it writes through that channel is unconstrained by the kernel. If the output reaches a human or another system that acts on it, the AI can influence the outside world through persuasion, not capability violation.
Architecture for Contained AI
A Practical Design
(Architecture diagram omitted: external input → input filter → AI process → output filter → external output, with a monitor observing all channels.)
- AI Process: Isolated. Can only read from input endpoint and write to output endpoint.
- Input Filter: Receives external input, sanitizes, forwards to AI. Has capability to external input and AI's input endpoint — nothing else.
- Output Filter: Receives AI output, applies rate limiting and content filtering, forwards to external output. Cannot modify AI's memory or state.
- Monitor: Can read all channels (for logging/analysis) and can kill the AI process. Cannot modify any process's capabilities.
Each component has the minimum capabilities needed for its role. No component has all capabilities. Compromise of any single component doesn't compromise the containment.
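Least privilege in this design is itself statically checkable: compare each component's actual capabilities against a declared per-role policy. The sketch below uses hypothetical names for the endpoints and the monitor's kill capability.

```python
# Hypothetical audit of the four-component design: compare each component's
# actual capabilities against a declared least-privilege policy.

policy = {
    "ai":            {("ep_in", "recv"), ("ep_out", "send")},
    "input_filter":  {("ext_in", "recv"), ("ep_in", "send")},
    "output_filter": {("ep_out", "recv"), ("ext_out", "send")},
    "monitor":       {("ep_in", "read"), ("ep_out", "read"), ("ai_tcb", "kill")},
}

def check_least_privilege(actual, policy):
    """Flag any capability a component holds beyond its declared role."""
    return {p: actual[p] - policy[p] for p in actual if actual[p] - policy[p]}

actual = {p: set(c) for p, c in policy.items()}   # a correctly built system
print(check_least_privilege(actual, policy))      # {} -- nothing extra

# No single component holds every capability in the system:
all_caps = set().union(*policy.values())
assert all(policy[p] != all_caps for p in policy)
```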
What This Buys You
- Provable isolation: The AI cannot access anything outside its granted capabilities. This is a theorem, not a test result.
- Auditable authority: The capability graph is finite, explicit, and statically analyzable.
- Minimal TCB (Trusted Computing Base): Only seL4 (~10,000 lines of verified C) needs to be trusted. Everything else is constrained by the kernel's guarantees.
- Defense in depth: Input filter, output filter, and monitor are independent containment layers, each with its own minimal capabilities.
What This Doesn't Buy You
- Protection against the AI writing convincing text through the output channel (the persuasion problem)
- Protection against covert timing channels between the AI and external observers
- Protection against hardware-level attacks
- Any guarantee about the AI's behavior — only about its authority
The Bigger Point
seL4 containment gives you the strongest possible structural guarantee: the AI literally cannot do things it wasn't given capability to do, and this is mathematically proven. But structural containment is necessary, not sufficient. An AI contained by seL4 with a text output channel is like a prisoner in an escape-proof cell with a telephone — the cell works perfectly, but the telephone might be enough.
The value of formal containment is that it eliminates one class of risk entirely (capability-based escape), letting you focus your safety efforts on the remaining classes (persuasion, side channels, hardware) instead of trying to defend against everything simultaneously.