Side Channels: The Basics
A side channel is any unintended information leakage from a computation's physical implementation. You're not breaking the algorithm — you're watching the hardware run the algorithm and inferring secrets from physical observables.
The classic side channels:
- Timing: How long a computation takes
- Power: How much energy it draws
- Electromagnetic emanation: What RF signals the chip emits
- Acoustic: What sounds the hardware makes (yes, really)
- Cache behavior: Which memory lines are accessed
Transistor-Level Timing
Why Timing Varies
A transistor's switching time depends on physical conditions:
- Input data: Different bit patterns cause different charging/discharging profiles in gate capacitances. A NAND gate switching from inputs 00→01 takes a different amount of time than one switching from 11→10.
- Temperature: Hotter transistors switch slower (in most regimes). Computation heats the chip non-uniformly depending on what's being computed.
- Voltage droop: High-activity regions draw more current, causing local voltage drops that slow nearby transistors.
- Process variation: Manufacturing imperfections mean nominally identical transistors have slightly different characteristics. This creates a unique per-chip "fingerprint."
What Leaks
These timing variations are tiny — picoseconds at the transistor level. But they accumulate across millions of operations and become measurable at the system level:
$$\Delta t_{\text{total}} = \sum_{i=1}^{N} \delta t_i(\text{data}_i, \text{state}_i)$$
where $\delta t_i$ is the data-dependent timing variation at operation $i$.
The total $\Delta t$ can reveal:
- Secret keys: Constant-time implementations exist precisely because timing leaks key bits. Branching on secret data creates measurable timing differences.
- Model weights: If inference timing depends on weight values (e.g., sparse operations skip zeros), timing reveals weight structure.
- Input data: If processing time varies with input content, timing reveals what the input was.
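The accumulation in the equation above can be sketched with a toy model: each operation contributes a small data-dependent delta, and summing over many operations turns picosecond effects into a measurable total. All numbers here are illustrative stand-ins, not real transistor timings.

```python
# Toy model of data-dependent timing accumulation (illustrative numbers,
# not real transistor delays). Mirrors delta_t_total = sum_i delta_t_i(data_i):
# each bit processed contributes a slightly different delta depending on
# its value.

def total_time_ps(bits, delta_zero=10.000, delta_one=10.003):
    """Accumulated 'time' in picoseconds for processing a bit sequence."""
    return sum(delta_one if b else delta_zero for b in bits)

# Two secrets of equal length but different Hamming weight:
light = [0] * 1_000_000   # all zero bits
heavy = [1] * 1_000_000   # all one bits

gap = total_time_ps(heavy) - total_time_ps(light)
# A 0.003 ps per-bit difference becomes roughly 3000 ps (3 ns) over a
# million operations, which is within reach of careful measurement.
```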
Relevance to AI Systems
Model Inference Timing
Neural network inference involves operations whose timing can be data-dependent:
- Sparse operations: Skipping zero-valued weights or activations saves time — and leaks the sparsity pattern.
- Conditional computation: Mixture-of-experts models route inputs to different experts. Which expert was selected leaks information about the input.
- Variable-length processing: Autoregressive generation produces one token at a time. The time between tokens can leak information about the model's "confidence" or internal computation.
- Attention patterns: Self-attention over sequences of different lengths takes different time. Padding helps but isn't always perfect.
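The sparse-operation leak in the first bullet can be made concrete with a hypothetical dot product that skips zero weights. Here the count of multiplies performed stands in for execution time; it directly reveals how many weights are nonzero.

```python
def sparse_dot(weights, x):
    """Dot product that skips zero weights.

    Returns (result, multiply_count). The multiply count -- a stand-in
    for execution time -- leaks the sparsity pattern of the (possibly
    secret) weight vector."""
    result = 0.0
    mults = 0
    for w, xi in zip(weights, x):
        if w != 0.0:             # data-dependent branch: the leak
            result += w * xi
            mults += 1
    return result, mults

x = [1.0, 2.0, 3.0, 4.0]
_, dense_cost = sparse_dot([0.5, 0.5, 0.5, 0.5], x)   # 4 multiplies
_, sparse_cost = sparse_dot([0.5, 0.0, 0.0, 0.5], x)  # 2 multiplies
# An observer timing the two calls learns that the second weight vector
# is half zeros, without ever reading the weights.
```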
AI Systems Handling Secrets
As AI systems are integrated into sensitive workflows (medical records, financial data, classified information), timing side channels become a concrete threat:
- A model processing a medical query might take measurably different time for different conditions, leaking the diagnosis.
- A model with system prompt containing secrets might take different time depending on how the secret interacts with the user query.
- Multi-tenant GPU deployments (multiple users sharing a GPU) create shared physical channels.
Attacks on AI Training
If an attacker can observe timing during training:
- Training data content may leak through batch processing time variations
- Gradient magnitudes may be inferable from optimizer step timing
- Model architecture search decisions may be observable
Defenses
Constant-Time Implementation
The gold standard in cryptography: ensure computation takes the same time regardless of data. This is extremely difficult for neural networks, because the optimizations that make inference fast — sparsity, conditional routing, variable-length generation — are inherently data-dependent.
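In cryptographic code the pattern looks like this. Python's standard library provides `hmac.compare_digest` for real use; the hand-rolled version below is a sketch of the idea, not production code.

```python
import hmac

def leaky_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: runtime depends on the matching prefix length."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False         # exits early, so timing leaks the position
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    """Examines every byte regardless of where mismatches occur."""
    if len(a) != len(b):
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y            # accumulate differences, never branch
    return diff == 0

secret = b"hunter2"
assert constant_time_equal(secret, b"hunter2")
assert not constant_time_equal(secret, b"hunter3")
# In practice, prefer the stdlib: hmac.compare_digest(secret, guess)
assert hmac.compare_digest(secret, b"hunter2")
```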
Noise and Jitter
Add random delays to obscure timing signals. Reduces but doesn't eliminate leakage. Determined attackers can average out noise with enough measurements.
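Why jitter alone fails can be shown statistically: if each measurement is the true signal plus independent noise, averaging N measurements shrinks the noise by a factor of √N while the signal stays put. A seeded simulation with assumed, arbitrary units:

```python
import random
import statistics

random.seed(0)

SIGNAL = 0.5      # true data-dependent timing difference (arbitrary units)
NOISE_SD = 50.0   # injected jitter, 100x larger than the signal

def measure():
    """One noisy observation: the true signal buried in random jitter."""
    return SIGNAL + random.gauss(0.0, NOISE_SD)

# Averaging n samples shrinks the noise by sqrt(n); the signal survives.
n = 1_000_000
estimate = statistics.fmean(measure() for _ in range(n))
# The standard error is NOISE_SD / sqrt(n) = 0.05 here, so the estimate
# converges close to SIGNAL and the 0.5 difference is clearly resolved.
```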
Physical Isolation
Dedicate hardware to a single tenant. No shared physical substrate = no shared side channel. Expensive and contrary to the economics of cloud AI.
Architectural Choices
- Avoid conditional computation paths that depend on sensitive data
- Use dense operations even when sparse would be faster
- Pad all sequences to maximum length
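The last choice, padding to a fixed length, can be sketched with a hypothetical helper (the names `MAX_LEN`, `PAD_ID`, and `process` are illustrative, not from any real framework):

```python
MAX_LEN = 16      # assumed maximum sequence length
PAD_ID = 0        # assumed padding token id

def pad_to_max(tokens):
    """Pad every sequence to MAX_LEN so downstream work is length-independent."""
    if len(tokens) > MAX_LEN:
        raise ValueError("sequence exceeds MAX_LEN")
    return tokens + [PAD_ID] * (MAX_LEN - len(tokens))

def process(tokens):
    """Stand-in for inference: op count depends only on MAX_LEN, not input."""
    padded = pad_to_max(tokens)
    return sum(1 for _ in padded)   # always MAX_LEN operations

# Short and long inputs now cost the same, hiding input length from a timer.
assert process([5, 9]) == process([5, 9, 2, 7, 1]) == MAX_LEN
```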
Each defense trades performance for security — and in AI, performance is the product.
The Bigger Picture
Side channels are a reminder that security is a property of the physical system, not just the algorithm. A mathematically perfect AI safety mechanism running on hardware that leaks information through timing is not actually safe. As AI systems handle increasingly sensitive computations, the gap between algorithmic security and physical security becomes a real vulnerability.