Prior reading: Layers of Safety

The Chinese Room

Searle's argument: a person following rules to manipulate Chinese symbols doesn't understand Chinese, even though the system's outputs are indistinguishable from those of someone who does. The component (person) lacks understanding; does the system have it?

ROMs That Are Smarter Than Their Parts

A read-only memory chip contains no intelligence. But a ROM storing the best move for every chess position would play perfect chess via trivial lookup. The system (ROM + lookup procedure) is "more intelligent" than its parts.
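A toy version of the same point (using a small solved game in place of chess, since chess tables are far too large to enumerate here): a brute-force search "burns" perfect play for one-pile Nim into a plain dictionary, and the runtime system is nothing but a table lookup.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def best_move(pile):
    """Return a winning take (1-3) from `pile`, or None if the position is lost.
    (Normal-play Nim: whoever takes the last object wins.)"""
    for take in range(1, min(3, pile) + 1):
        if best_move(pile - take) is None:  # leaves the opponent a lost position
            return take
    return None

# "Burn" the solved game into a read-only table: inert data, no search at runtime.
ROM = {n: best_move(n) for n in range(0, 101)}

def play(pile):
    """The entire runtime 'engine': a single table lookup."""
    return ROM[pile]
```

Neither the table (dumb data) nor `play` (one dictionary access) is intelligent in isolation, yet the composed system plays perfectly: `play(5)` returns 1, leaving the opponent the lost pile of 4.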

Conversely: a system of intelligent components can be less intelligent than any individual part (design by committee, bureaucratic dysfunction).

The Safety Analog

A neural network's individual neurons are trivial — each computes a weighted sum and a nonlinearity. The system exhibits complex, potentially dangerous behavior. Safety analysis at the component level misses emergent system behavior.

But also: a system of individually safe components can be unsafe in composition. Two safe modules interacting can produce unsafe emergent behavior through feedback loops, race conditions, or distributional shift at the interface.
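A minimal numerical sketch of the feedback-loop case (hypothetical gains, not from any real system): each controller corrects an error using a one-step-delayed measurement. One controller with gain 0.7 is stable on its own, but two identical copies acting on the same error — each individually "safe" — push the combined loop gain past the stability boundary, and the error diverges.

```python
def simulate(n_controllers, gain=0.7, steps=60, e0=1.0):
    """Error dynamics e[t+1] = e[t] - n*gain*e[t-1]:
    each controller applies u = -gain * (one-step-delayed error)."""
    e_prev = e = e0
    peak = abs(e0)
    for _ in range(steps):
        e_prev, e = e, e - n_controllers * gain * e_prev
        peak = max(peak, abs(e))
    return abs(e), peak

final_alone, _ = simulate(1)  # one controller: error decays toward 0
_, peak_pair = simulate(2)    # two controllers: oscillation grows without bound
```

With the measurement delay, one controller's closed loop has pole magnitude sqrt(0.7) ≈ 0.84 (stable); two give sqrt(1.4) ≈ 1.18 (unstable). Each module passes the same component-level stability test; only the composition fails.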

Implications for AI Safety

  • Component-level safety is insufficient: You must verify system-level properties.
  • System-level properties may not reduce: You can't always decompose "the system is safe" into "each part is safe."
  • Emergence cuts both ways: Systems can be safer or more dangerous than their parts suggest.
  • Monitoring must be at the system level: Watch inputs, outputs, and interactions — not just individual module behavior.
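The last point can be sketched in code (hypothetical modules and invariant, chosen only for illustration): two modules each make a bounded, locally "safe" adjustment to shared state, but looping them drifts that state without limit. A per-module check never fires; a monitor watching the composed pipeline does.

```python
class SystemMonitor:
    """Checks an invariant on the composed system's state, not on any one module."""
    def __init__(self, bound):
        self.bound = bound
        self.trace = []  # (module, value) interaction log

    def observe(self, module, value):
        self.trace.append((module, value))
        if abs(value) > self.bound:
            raise RuntimeError(f"system invariant violated after {module}: {value}")

monitor = SystemMonitor(bound=10.0)
x = 0.0
try:
    for _ in range(20):
        x += 1.0                  # module A: changes state by at most 1 -- locally "safe"
        monitor.observe("A", x)
        x += 1.0                  # module B: changes state by at most 1 -- locally "safe"
        monitor.observe("B", x)
except RuntimeError as err:
    caught = str(err)
```

Each module's local contract ("change the state by at most 1") holds throughout the run; the violation is visible only in the interaction trace.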