Side Channels: The Basics
A side channel is any unintended information leakage from a computation's physical implementation. You're not breaking the algorithm — you're watching the hardware run the algorithm and inferring secrets from physical observables.
The classic side channels:
- Timing: How long a computation takes
- Power: How much energy it draws
- Electromagnetic emanation: What RF signals the chip emits
- Acoustic: What sounds the hardware makes (yes, really)
- Cache behavior: Which memory lines are accessed
Transistor-Level Timing
Why Timing Varies
A transistor's switching time depends on physical conditions:
- Input data: Different bit patterns cause different charging/discharging profiles in gate capacitances. A NAND gate switching from inputs 00→01 takes a different amount of time than one switching from 11→10.
- Temperature: Hotter transistors switch slower (in most regimes). Computation heats the chip non-uniformly depending on what's being computed.
- Voltage droop: High-activity regions draw more current, causing local voltage drops that slow nearby transistors.
- Process variation: Manufacturing imperfections mean nominally identical transistors have slightly different characteristics. This creates a unique per-chip "fingerprint."
What Leaks
These timing variations are tiny — picoseconds at the transistor level. But they accumulate across millions of operations and become measurable at the system level:
$$\Delta t_{\text{total}} = \sum_{i=1}^{N} \delta t_i(\text{data}_i, \text{state}_i)$$
where $\delta t_i$ is the data-dependent timing variation at operation $i$.
The total $\Delta t$ can reveal:
- Secret keys: Constant-time implementations exist precisely because timing leaks key bits. Branching on secret data creates measurable timing differences.
- Model weights: If inference timing depends on weight values (e.g., sparse operations skip zeros), timing reveals weight structure.
- Input data: If processing time varies with input content, timing reveals what the input was.
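The accumulation in the equation above can be sketched with a toy model: each operation contributes a small data-dependent delta, and summing over many operations turns picosecond effects into a measurable total. All numbers here are illustrative stand-ins, not real transistor timings.

```python
# Toy model of data-dependent timing accumulation (illustrative numbers,
# not real transistor delays). Mirrors delta_t_total = sum_i delta_t_i(data_i):
# each bit processed contributes a slightly different delta depending on
# its value.

def total_time_ps(bits, delta_zero=10.000, delta_one=10.003):
    """Accumulated 'time' in picoseconds for processing a bit sequence."""
    return sum(delta_one if b else delta_zero for b in bits)

# Two secrets of equal length but different Hamming weight:
light = [0] * 1_000_000   # all zero bits
heavy = [1] * 1_000_000   # all one bits

gap = total_time_ps(heavy) - total_time_ps(light)
# A 0.003 ps per-bit difference becomes roughly 3000 ps (3 ns) over a
# million operations, which is within reach of careful measurement.
```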
Relevance to AI Systems
Model Inference Timing
Neural network inference involves operations whose timing can be data-dependent:
- Sparse operations: Skipping zero-valued weights or activations saves time — and leaks the sparsity pattern.
- Conditional computation: Mixture-of-experts models route inputs to different experts. Which expert was selected leaks information about the input.
- Variable-length processing: Autoregressive generation produces one token at a time. The time between tokens can leak information about the model's "confidence" or internal computation.
- Attention patterns: Self-attention over sequences of different lengths takes different time. Padding helps but isn't always perfect.
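The sparse-operation leak in the first bullet can be made concrete with a hypothetical dot product that skips zero weights. Here the count of multiplies performed stands in for execution time; it directly reveals how many weights are nonzero.

```python
def sparse_dot(weights, x):
    """Dot product that skips zero weights.

    Returns (result, multiply_count). The multiply count -- a stand-in
    for execution time -- leaks the sparsity pattern of the (possibly
    secret) weight vector."""
    result = 0.0
    mults = 0
    for w, xi in zip(weights, x):
        if w != 0.0:             # data-dependent branch: the leak
            result += w * xi
            mults += 1
    return result, mults

x = [1.0, 2.0, 3.0, 4.0]
_, dense_cost = sparse_dot([0.5, 0.5, 0.5, 0.5], x)   # 4 multiplies
_, sparse_cost = sparse_dot([0.5, 0.0, 0.0, 0.5], x)  # 2 multiplies
# An observer timing the two calls learns that the second weight vector
# is half zeros, without ever reading the weights.
```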
AI Systems Handling Secrets
As AI systems are integrated into sensitive workflows (medical records, financial data, classified information), timing side channels become a concrete threat:
- A model processing a medical query might take measurably different time for different conditions, leaking the diagnosis.
- A model with system prompt containing secrets might take different time depending on how the secret interacts with the user query.
- Multi-tenant GPU deployments (multiple users sharing a GPU) create shared physical channels.
Attacks on AI Training
If an attacker can observe timing during training:
- Training data content may leak through batch processing time variations
- Gradient magnitudes may be inferable from optimizer step timing
- Model architecture search decisions may be observable
Defenses
Constant-Time Implementation
The gold standard in cryptography: ensure computation takes the same time regardless of data. This is extremely difficult for neural networks, because the optimizations that make inference fast — sparsity, conditional routing, variable-length generation — are inherently data-dependent.
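In cryptographic code the pattern looks like this. Python's standard library provides `hmac.compare_digest` for real use; the hand-rolled version below is a sketch of the idea, not production code.

```python
import hmac

def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: runtime depends on the matching prefix length."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False         # exits early, so timing leaks the position
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Examines every byte regardless of where mismatches occur."""
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y            # accumulate differences, never branch
    return diff == 0

secret = b"hunter2"
assert constant_time_equal(secret, b"hunter2")
assert not constant_time_equal(secret, b"hunter3")
# In practice, prefer the stdlib: hmac.compare_digest(secret, guess)
assert hmac.compare_digest(secret, b"hunter2")
```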
Noise and Jitter
Add random delays to obscure timing signals. Reduces but doesn't eliminate leakage. Determined attackers can average out noise with enough measurements.
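Why jitter alone fails can be shown statistically: if each measurement is the true signal plus independent noise, averaging N measurements shrinks the noise by a factor of √N while the signal stays put. A seeded simulation with assumed, arbitrary units:

```python
import random
import statistics

random.seed(0)

SIGNAL = 0.5      # true data-dependent timing difference (arbitrary units)
NOISE_SD = 50.0   # injected jitter, 100x larger than the signal

def measure():
    """One noisy observation: the true signal buried in random jitter."""
    return SIGNAL + random.gauss(0.0, NOISE_SD)

# Averaging n samples shrinks the noise by sqrt(n); the signal survives.
n = 1_000_000
estimate = statistics.fmean(measure() for _ in range(n))
# The standard error is NOISE_SD / sqrt(n) = 0.05 here, so the estimate
# converges close to SIGNAL and the 0.5 difference is clearly resolved.
```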
Physical Isolation
Dedicate hardware to a single tenant. No shared physical substrate = no shared side channel. Expensive and contrary to the economics of cloud AI.
Architectural Choices
- Avoid conditional computation paths that depend on sensitive data
- Use dense operations even when sparse would be faster
- Pad all sequences to maximum length
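The last choice, padding to a fixed length, can be sketched with a hypothetical helper (the names `MAX_LEN`, `PAD_ID`, and `process` are illustrative, not from any real framework):

```python
MAX_LEN = 16      # assumed maximum sequence length
PAD_ID = 0        # assumed padding token id

def pad_to_max(tokens):
    """Pad every sequence to MAX_LEN so downstream work is length-independent."""
    if len(tokens) > MAX_LEN:
        raise ValueError("sequence exceeds MAX_LEN")
    return tokens + [PAD_ID] * (MAX_LEN - len(tokens))

def process(tokens):
    """Stand-in for inference: op count depends only on MAX_LEN, not input."""
    padded = pad_to_max(tokens)
    return sum(1 for _ in padded)   # always MAX_LEN operations

# Short and long inputs now cost the same, hiding input length from a timer.
assert process([5, 9]) == process([5, 9, 2, 7, 1]) == MAX_LEN
```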
Each defense trades performance for security — and in AI, performance is the product.
The Bigger Picture
Side channels are a reminder that security is a property of the physical system, not just the algorithm. A mathematically perfect AI safety mechanism running on hardware that leaks information through timing is not actually safe. As AI systems handle increasingly sensitive computations, the gap between algorithmic security and physical security becomes a real vulnerability.