2026  7

April  4

Depth-Robust Safety: What Happens When You Truncate a Language Model

April 2, 2026 · 8 min · Austin T. O'Quinn

What Happens to a Neural Network's Geometry When You Change How It Learns?

April 2, 2026 · 11 min · Austin T. O'Quinn

Geometric Similarity Is Blind to Computational Structure

April 1, 2026 · 11 min · Austin T. O'Quinn

Perfect Shields Create Unsafe Policies

April 1, 2026 · 9 min · Austin T. O'Quinn

February  1

Probabilistic Security: Great Against Accidents, Useless Against Attackers

February 4, 2026 · 3 min · Austin T. O'Quinn

January  2

The AI Threat Landscape: What 'Safe' Means and What We're Afraid Of

January 21, 2026 · 4 min · Austin T. O'Quinn

Detectability of Testing

January 7, 2026 · 2 min · Austin T. O'Quinn

2025  24

December  2

Stability of Safety

December 17, 2025 · 2 min · Austin T. O'Quinn

Jailbreaking: Transference, Universality, and Why Defenses May Be Impossible

December 3, 2025 · 7 min · Austin T. O'Quinn

November  1

Safety Training as Capability Elicitation

November 12, 2025 · 4 min · Austin T. O'Quinn

October  3

When Safety Training Backfires

October 29, 2025 · 4 min · Austin T. O'Quinn

A Survey of Alignment Techniques and Their Trade-Offs

October 15, 2025 · 7 min · Austin T. O'Quinn

P-Hacking as Optimization: Implications for Safety Benchmarking

October 1, 2025 · 2 min · Austin T. O'Quinn

September  2

Chain of Thought Is Hackable (By the Model)

September 17, 2025 · 2 min · Austin T. O'Quinn

Systems vs. Components: The Chinese Room and ROMs

September 3, 2025 · 2 min · Austin T. O'Quinn

August  2

Layers of Safety: From Data to Deployment

August 20, 2025 · 2 min · Austin T. O'Quinn

Mesa-Optimization and the Optimization Pressure Spectrum

August 6, 2025 · 13 min · Austin T. O'Quinn

July  2

Mechanistic Interpretability: Circuits, Superposition, and Sparse Autoencoders

July 16, 2025 · 8 min · Austin T. O'Quinn

Platonic Forms in Near-Capacity Models

July 2, 2025 · 2 min · Austin T. O'Quinn

June  2

Probing: What Do Models Actually Know?

June 18, 2025 · 2 min · Austin T. O'Quinn

The Specification and Language Problem

June 4, 2025 · 2 min · Austin T. O'Quinn

May  2

Model Checking and Formal Specification

May 21, 2025 · 1 min · Austin T. O'Quinn

Reachability Analysis for Neural Networks

May 7, 2025 · 1 min · Austin T. O'Quinn

April  2

What Are Formal Methods in AI Safety?

April 23, 2025 · 1 min · Austin T. O'Quinn

Decision Theory for AI Safety

April 9, 2025 · 5 min · Austin T. O'Quinn

March  2

Game Theory for AI Safety

March 26, 2025 · 5 min · Austin T. O'Quinn

Why Sparsity, and How Do We Get It?

March 12, 2025 · 1 min · Austin T. O'Quinn

February  2

What Is RL, Why Is It So Hard, and How Does It Go Wrong?

February 26, 2025 · 3 min · Austin T. O'Quinn

Loss Functions, Decision Boundaries, Activation Spaces, and Why MSE

February 12, 2025 · 3 min · Austin T. O'Quinn

January  2

Linear Algebra Proofs of Optimality for Gradient Descent

January 28, 2025 · 2 min · Austin T. O'Quinn

Gradient Descent, Backpropagation, and the Misconceptions That Tripped Me Up

January 15, 2025 · 36 min · Austin T. O'Quinn