Geometric Similarity Is Blind to Computational Structure

This post starts with a simple question (how would you tell if two neural networks learned the same thing?) and builds to a case where the standard answer is dangerously wrong. Suppose you train two neural networks on the same task from different random initializations, and both reach 99% accuracy. Did they learn the same thing? You can't just compare the raw activation values. To see why, consider a simpler example. Imagine two spreadsheets tracking student performance. One has columns [math_score, reading_score]. The other has columns [total_score, score_difference]. Both contain the same information, since you can convert between them with simple arithmetic, but the raw numbers look completely different. A student with (90, 80) in the first spreadsheet would be (170, 10) in the second. ...
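The spreadsheet example can be made concrete: the two column schemes are related by an invertible linear map, so each can be recovered exactly from the other even though the raw numbers disagree. A minimal sketch (the matrix and student values below are just the example from the text):

```python
import numpy as np

# total_score      = math + reading
# score_difference = math - reading
A = np.array([[1.0,  1.0],
              [1.0, -1.0]])

student = np.array([90.0, 80.0])   # [math_score, reading_score]
transformed = A @ student          # [total_score, score_difference]
print(transformed)                 # [170.  10.]

# The map is invertible, so no information is lost:
recovered = np.linalg.solve(A, transformed)
print(recovered)                   # [90. 80.]
```

Comparing `student` and `transformed` entrywise would say the two representations are completely different, which is exactly the failure mode the post is about.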

April 1, 2026 · 11 min · Austin T. O'Quinn

Probing: What Do Models Actually Know?

Prior reading: Gradient Descent and Backpropagation | Why Sparsity?

What Is Probing? Train a simple classifier (usually linear) on a model's internal representations to test whether specific information is encoded there. If a linear probe can extract "is this sentence toxic?" from layer 12 activations, the model represents toxicity at that layer.

How It Works

1. Freeze the model
2. Extract activations at a chosen layer for a labeled dataset
3. Train a linear (or shallow) classifier on those activations
4. Measure accuracy

High accuracy → the information is linearly accessible in the representation. ...
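The recipe above can be sketched end to end in a few lines. Everything here is synthetic and assumed for illustration: `activations` stands in for frozen layer activations and `labels` for the probed property (e.g. toxicity); the probe itself is plain logistic regression trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Steps 1-2: the model is frozen, so its activations are just fixed inputs.
# Here we fake 500 examples of a 64-dim layer with a linearly encoded label.
activations = rng.normal(size=(500, 64))
w_true = rng.normal(size=64)
labels = (activations @ w_true > 0).astype(float)

X_tr, y_tr = activations[:400], labels[:400]
X_te, y_te = activations[400:], labels[400:]

# Step 3: train a linear (logistic-regression) probe with gradient descent.
w = np.zeros(64)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X_tr @ w)))        # sigmoid predictions
    w -= 0.1 * X_tr.T @ (p - y_tr) / len(y_tr)   # logistic-loss gradient

# Step 4: measure accuracy on held-out activations.
acc = np.mean(((X_te @ w) > 0) == (y_te > 0.5))
print(f"probe accuracy: {acc:.2f}")
```

Because the label is linearly encoded by construction, the probe scores near-perfectly here; on real activations, the same pipeline's accuracy is the evidence for (or against) linear accessibility.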

June 18, 2025 · 2 min · Austin T. O'Quinn