Specification

The Specification and Language Problem

Prior reading: What Are Formal Methods? | Model Checking and Formal Specification

The Problem
Formal methods can prove a system satisfies a specification. The hard part is writing the specification. "Be helpful and don't cause harm" is not a formal spec. Turning it into one requires resolving ambiguity, edge cases, and value judgments that humans can't even agree on in natural language.

Specification as Translation
Every spec is a translation from human intent to formal language. Translation is lossy. The gap between what we mean and what we write is the specification problem. ...

Model Checking and Formal Specification

Prior reading: What Are Formal Methods?

What Is Model Checking?
Given a model of a system and a property expressed in formal logic, model checking exhaustively explores the model's reachable states to decide whether the property holds on every execution.

Temporal Logic Crash Course
LTL (Linear Temporal Logic): "Eventually the system reaches a safe state" (a liveness property). $\Diamond \text{safe}$
CTL (Computation Tree Logic): "On all paths, the system never enters an unsafe state" (a safety property). $\forall\Box \neg \text{unsafe}$
These let you express liveness, safety, and fairness properties.

Classical Model Checking
Works brilliantly for finite-state systems: protocols, hardware, control software. Tools like SPIN, NuSMV, and PRISM are mature. ...
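To make the idea of exhaustive state exploration concrete, here is a minimal sketch of an explicit-state check for a safety property of the form $\forall\Box \neg \text{unsafe}$. It is not how SPIN, NuSMV, or PRISM work internally. The two-process toy model and the names `check_invariant`, `make_successors`, and `mutual_exclusion` are assumptions invented for this illustration.

```python
from collections import deque

def check_invariant(initial, successors, is_safe):
    """Breadth-first search over all reachable states.

    Returns None if every reachable state satisfies is_safe, otherwise a
    shortest trace (list of states) from the initial state to a violation.
    """
    parent = {initial: None}          # also serves as the visited set
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not is_safe(state):
            trace = []
            while state is not None:  # walk parents back to the start
                trace.append(state)
                state = parent[state]
            return trace[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None

# Hypothetical toy model: two processes cycling idle -> trying -> critical.
NEXT = {"idle": "trying", "trying": "critical", "critical": "idle"}

def make_successors(guarded):
    def successors(state):
        for i in (0, 1):
            target = NEXT[state[i]]
            other = state[1 - i]
            if guarded and target == "critical" and other == "critical":
                continue  # entry is blocked while the other process is inside
            new = list(state)
            new[i] = target
            yield tuple(new)
    return successors

def mutual_exclusion(state):
    """Safety property: the two processes are never both critical."""
    return state != ("critical", "critical")

INITIAL = ("idle", "idle")
unguarded_trace = check_invariant(INITIAL, make_successors(False), mutual_exclusion)
guarded_trace = check_invariant(INITIAL, make_successors(True), mutual_exclusion)
print("unguarded:", unguarded_trace)
print("guarded:", guarded_trace)
```

The unguarded model yields a counterexample trace ending in both processes being critical, while the guarded model yields `None`, meaning the invariant holds in every reachable state. The search terminates only because the state space is finite, which is the setting where classical model checking applies.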

What Is RL, Why Is It So Hard, and How Does It Go Wrong?

Prior reading: Gradient Descent and Backpropagation | Loss Functions and Spaces

The Setup
An agent takes actions in an environment, receives rewards, and learns a policy $\pi(a|s)$ that maximizes the expected discounted cumulative reward, where $r_t$ is the reward at step $t$ and $\gamma \in [0, 1]$ is the discount factor:
$$\max_\pi \mathbb{E}\left[\sum_{t=0}^T \gamma^t r_t\right]$$

Where Are the Neural Nets?
Neural networks serve as function approximators for things that are too complex to represent exactly:
Policy network: $\pi_\theta(a|s)$ maps states to action probabilities
Value network: $V_\phi(s)$ estimates the expected discounted future reward from a state
Q-network: $Q_\psi(s,a)$ estimates the expected discounted future reward from a state-action pair
World model (optional): $p_\omega(s'|s,a)$ predicts the next state
Without function approximation, RL is limited to small state spaces, where tabular methods can store a separate value for every state. Neural nets let it scale to images, language, and continuous control. ...
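As a small illustration of the objective and of what "tabular" means, the sketch below computes a discounted return and runs tabular Q-learning on a tiny chain. The five-state environment, the function names, and the hyperparameters are all assumptions chosen for this example, and Q-learning is used here only as one familiar tabular method.

```python
import random

def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one episode."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Hypothetical toy environment: states 0..4 on a line, state 4 is terminal.
N_STATES, GOAL = 5, 4
LEFT, RIGHT = 0, 1

def step(state, action):
    """Deterministic transition; reward 1 only on reaching the goal."""
    nxt = max(state - 1, 0) if action == LEFT else state + 1
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # The whole "function approximator" is a table: one entry per (s, a).
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice((LEFT, RIGHT))  # uniform random exploration
            nxt, reward, done = step(state, action)
            # Bootstrapped target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q_table = q_learning()
# Greedy action in each non-terminal state under the learned table.
greedy = [max((LEFT, RIGHT), key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(discounted_return([1, 1, 1], 0.5))
print(greedy)
```

The table needs one row per state, which is fine for five states and hopeless for images or text. In those settings a network such as $Q_\psi(s,a)$ replaces the table.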