Or: the same architecture + the same data + different learning algorithms = radically different internal structure

A gap in the Platonic Representation Hypothesis

The Platonic Representation Hypothesis (Huh et al., ICML 2024) claims that different neural networks converge toward the same internal representation of reality. They tested this across dozens of architectures — CNNs, ViTs, language models — and found increasing alignment as models get bigger.

It's a compelling result. But every single model they tested was trained with backpropagation.

That's a big asterisk. What if you change the learning algorithm entirely? Not just SGD vs Adam, but fundamentally different credit assignment mechanisms — evolutionary strategies that never compute a gradient, the Forward-Forward algorithm where each layer learns on its own, feedback alignment that uses random matrices in the backward pass?

I'm a PhD student with a Raspberry Pi 5B (8GB RAM, no GPU). I decided to find out.

The experiment: same everything except the learning rule

The setup is dead simple. Take the exact same MLP (784→256→128→10), the exact same MNIST data, the exact same random initialization. Train it four different ways:

A quick orientation on these four, since two are less common:

  • Backpropagation — the standard. Compute the loss at the output, then propagate error gradients backward through every layer using the chain rule. Each weight gets a precise signal about how it contributed to the overall error. It's global — every layer's update depends on every other layer.
  • Feedback Alignment — like backprop, but the backward pass uses fixed random matrices instead of the actual weight transposes. The gradient direction is approximately right, the magnitude is wrong. Surprisingly, this still works (Lillicrap et al., 2016).
  • Evolutionary Strategies — no gradients at all. Randomly perturb all the weights, run the network forward, see if the loss got better or worse. Repeat many times. The average direction of improvement approximates the gradient. Slow, but simple.
  • Forward-Forward (Hinton, 2022) — each layer learns independently by maximizing a local "goodness" score (the sum of squared activations, $\sum_i h_i^2$) for real data and minimizing it for fake data. No backward pass, no global error signal. Each layer only knows about its own inputs and outputs.
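Of the four, feedback alignment is the smallest code change, so it's worth seeing concretely. Here's a minimal sketch (my own construction, not the paper's code) of an FA linear layer as a custom autograd function — the forward pass is a normal linear map, but the backward pass routes the error through a fixed random matrix `B` instead of `W.T`:

```python
import torch

class FALinear(torch.autograd.Function):
    """Linear layer whose backward pass uses a fixed random matrix B
    instead of the weight transpose (Lillicrap et al., 2016)."""

    @staticmethod
    def forward(ctx, x, W, B):
        ctx.save_for_backward(x, W, B)
        return x @ W.t()                 # ordinary forward pass

    @staticmethod
    def backward(ctx, grad_out):
        x, W, B = ctx.saved_tensors
        grad_x = grad_out @ B            # random feedback, NOT grad_out @ W
        grad_W = grad_out.t() @ x        # usual local weight gradient
        return grad_x, grad_W, None      # B is fixed: no gradient for it
```

The only difference from exact backprop is that one line in `backward` — which is what makes the CIFAR-10 result later in this post so striking.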
# Same architecture, same init, same data
torch.manual_seed(42)
init_weights = SimpleMLP().state_dict()

# Train four copies with different algorithms
model_bp = SimpleMLP(); model_bp.load_state_dict(init_weights)
train_backprop(model_bp, data)       # Global, exact gradient

model_fa = SimpleMLP(); model_fa.load_state_dict(init_weights)
train_feedback_alignment(model_fa, data)  # Global, random feedback

model_es = SimpleMLP(); model_es.load_state_dict(init_weights)
train_evolutionary(model_es, data)   # Gradient estimated via perturbation

model_ff = SimpleMLP(); model_ff.load_state_dict(init_weights)
train_forward_forward(model_ff, data)  # Layer-local goodness, no backprop

Then measure 25 structural metrics on the resulting networks. Not just accuracy — the actual geometry of the weight matrices and activations. Stable rank, spectral structure, pairwise neuron angles, intrinsic dimensionality, neural collapse metrics, topological connectivity.

I did this for 10 random seeds, 3 datasets, and 2 model widths — over 70 models total. Then I ran it on CIFAR-10 with a CNN to make sure it wasn't just an MNIST quirk.

All on a Raspberry Pi. It took about 24 hours.

The punchline: a "what–how" separation

I found something clean. Really clean.

WHAT a network learns — which samples are hard, which classes get confused with each other — is mostly determined by the data. Different algorithms agree on this. Cross-algorithm difficulty concordance: $W = 0.72$ (Kendall's coefficient of concordance). Class-level representational structure correlation: 0.98.
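Kendall's W falls straight out of the per-sample difficulty rankings. A small sketch (my own helper, not the paper's code), where each row of `scores` is one algorithm's per-sample loss:

```python
import numpy as np
from scipy.stats import rankdata

def kendalls_w(scores):
    """Kendall's coefficient of concordance.

    scores: (m, n) array -- m raters (algorithms) scoring n items (samples).
    Returns 1.0 for perfect agreement on the ranking, ~0 for none.
    """
    m, n = scores.shape
    ranks = np.vstack([rankdata(row) for row in scores])  # rank within each rater
    R = ranks.sum(axis=0)                                 # rank sum per item
    S = ((R - R.mean()) ** 2).sum()                       # spread of rank sums
    return 12 * S / (m**2 * (n**3 - n))
```

Feed it the per-sample losses from each trained model and you get a single agreement number in [0, 1].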

HOW it encodes — the geometry of the weights, the sparsity pattern, the spectral structure, the loss landscape curvature — is almost entirely determined by the learning algorithm. ANOVA says the algorithm explains 73–100% of the variance in every weight-level metric I measured. $F$-statistics up to 38,408. All $p < 10^{-8}$.
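The "algorithm explains X% of the variance" number is eta-squared from a one-way ANOVA: between-group sum of squares over total sum of squares, with algorithms as groups. A sketch with SciPy — the example values here are illustrative stand-ins, not the actual measurements:

```python
import numpy as np
from scipy.stats import f_oneway

def eta_squared(*groups):
    """Fraction of a metric's variance explained by group (algorithm) identity."""
    all_vals = np.concatenate(groups)
    ss_total = ((all_vals - all_vals.mean()) ** 2).sum()
    ss_between = sum(len(g) * (np.mean(g) - all_vals.mean()) ** 2 for g in groups)
    return ss_between / ss_total

# illustrative: stable rank for a few seeds under three algorithms
bp, es, ff = [33.5, 34.0, 34.5], [104.0, 105.3, 106.1], [3.0, 3.1, 3.2]
F, p = f_oneway(bp, es, ff)          # F-statistic and p-value
explained = eta_squared(bp, es, ff)  # close to 1.0 when groups barely overlap
```

When the per-algorithm clusters are tight and far apart — as they are here — eta-squared saturates near 1 and the F-statistic explodes, which is what the huge F values in the paper reflect.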

Let that sink in. The data determines the content. The algorithm determines the container.

How different are we talking?

Vocab sheet — the metrics I'm about to throw at you
  • Stable rank — a continuous measure of how many dimensions a weight matrix actually uses. A 256×784 matrix could in principle use all 256 dimensions, but if most of the energy is concentrated in a few singular values, the stable rank is low. Formally: $\|W\|_F^2 / \|W\|_2^2$. A stable rank of 3 means the matrix is effectively 3-dimensional.
  • Neuron angle — the average angle between pairs of neuron weight vectors. 90° means neurons are orthogonal (pointing in completely different directions, encoding independent features). 0° means they're all pointing the same way (redundant).
  • Class selectivity — what fraction of a neuron's total activation is driven by its preferred class. 100% means each neuron responds to exactly one digit. 30% means each neuron responds to several classes, with a mild preference for one.
  • Sparsity — fraction of neurons that are effectively dead (activation near zero across the dataset). High sparsity means the network is only using a handful of its neurons.
  • Flatness score — how much the loss changes when you randomly perturb the weights. A flat minimum (low score) means the network is robust to small weight perturbations. A sharp minimum (high score) means it's sitting on a knife edge. Related to generalization — flatter minima tend to generalize better.
  • Intrinsic dimensionality — how many independent directions of variation exist in the network's internal representations. Estimated from the eigenspectrum of the activation covariance matrix.
  • Spectral structure — the distribution of singular values of the weight matrices. Tells you whether the matrix spreads its energy evenly across dimensions or concentrates it in a few.
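Several of these metrics are only a few lines of NumPy each. A sketch of three of them (function names are mine; conventions as defined above):

```python
import numpy as np

def stable_rank(W):
    """||W||_F^2 / ||W||_2^2 -- effective dimensionality of a weight matrix."""
    s = np.linalg.svd(W, compute_uv=False)
    return (s ** 2).sum() / s[0] ** 2

def mean_neuron_angle(W):
    """Average pairwise angle (degrees) between row (neuron) weight vectors."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)
    cos = Wn @ Wn.T
    iu = np.triu_indices(W.shape[0], k=1)                 # upper triangle, no diagonal
    return np.degrees(np.arccos(np.clip(cos[iu], -1, 1))).mean()

def sparsity(H, tol=1e-6):
    """Fraction of dead neurons: activation ~0 over the whole dataset (rows = samples)."""
    return (np.abs(H).max(axis=0) < tol).mean()
```

Sanity checks: an identity matrix has stable rank equal to its size and 90° mean neuron angle; a rank-1 matrix has stable rank 1.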

Here's the encoding taxonomy I found, at matched accuracy:

                     Backprop   Evol. Strategies   Forward-Forward
Accuracy                96.0%              70.0%             86.3%
Stable rank              34.0              105.3               3.1
Neuron angle            89.9°              90.0°             50.5°
Class selectivity       29.7%              28.5%             99.8%
Sparsity                39.8%              53.4%             97.1%
Flatness score          4,478              1,503               3.3

Forward-Forward compresses everything into ~3 effective dimensions. Its neurons are separated by only 50.5° on average (vs. ~90° for backprop — nearly orthogonal). It kills 248 of 256 neurons. The few survivors are clean — each responds to exactly one digit class.

Backprop spreads information across all 256 neurons, each responding to 3–4 classes. The neurons stay orthogonal, using the full representational capacity.

And here's the thing: these differences aren't because of accuracy. I trained backprop with a low learning rate to converge at the same 86% accuracy as Forward-Forward. Still 23× different in stable rank. Still 937× different in flatness. The learning algorithm, not the accuracy level, determines the geometry.

The mechanism: angular collapse in one epoch

Okay, so the encodings are different. But why?

I tracked the geometry during training, epoch by epoch. This is the plot that made me sit up straight:

           Epoch 0   Epoch 1   Epoch 5   Epoch 15   Epoch 25
Backprop     90.0°     89.6°     89.9°      89.9°      89.9°
ES           90.0°     90.0°     90.0°      90.0°      90.0°
FF           90.0°     53.4°     52.9°      52.3°      51.7°

Look at Forward-Forward. At epoch 0, all neurons start orthogonal (90° — that's just random initialization in high dimensions). After a single epoch — before the model has learned anything, still at 10% accuracy (chance!) — the neurons have collapsed to 53°.

The geometry is set before learning begins.

Backprop? 89.9° for 25 straight epochs. Rock stable. ES? 90.0°. Doesn't budge.

Why does this happen?

I formalized this as a proposition in the paper, but the intuition is simple:

Forward-Forward maximizes "goodness" $\sum_i h_i^2$ independently per neuron. Every neuron in a layer receives the same input and optimizes the same objective. Without any coordination between neurons, they all converge to the same solution — the dominant direction of the data. It's parallel PCA. Independent agents solving the same problem find the same answer.
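You can watch this happen in a toy setting. The sketch below (my construction, not the paper's code) runs per-neuron gradient ascent on $\mathbb{E}[(w \cdot x)^2]$ with renormalization — which is exactly power iteration on the data covariance — for eight uncoordinated "neurons." They all converge to the data's dominant direction, and the pairwise angles collapse:

```python
import numpy as np

rng = np.random.default_rng(0)
# anisotropic data: one dominant direction (first coordinate scaled up)
X = rng.normal(size=(1000, 20)) * np.r_[5.0, np.ones(19)]
C = X.T @ X / len(X)                             # sample covariance

W = rng.normal(size=(8, 20))                     # 8 "neurons", random init
W /= np.linalg.norm(W, axis=1, keepdims=True)    # start ~orthogonal pairwise

for _ in range(50):
    # per-neuron ascent on E[(w.x)^2], renormalized each step:
    # this is power iteration on C, run independently per row -- no coordination
    W = W @ C
    W /= np.linalg.norm(W, axis=1, keepdims=True)

cos = np.abs(W @ W.T)                            # pairwise |cosine| after "training"
```

After 50 steps every off-diagonal cosine is essentially 1: independent agents solving the same problem found the same answer.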

Backprop is different. The chain rule creates competition. When neuron A improves its contribution to the global loss, that changes what's optimal for neuron B. This implicit repulsion keeps neurons diverse and orthogonal. It's an active process — I confirmed this by explicitly trying to force alignment on backprop with a penalty term. It barely moved: 89.9° → 89.7°. Backprop fights back.

Does this hold on a "real" task?

Fair question. MNIST is a toy. So I ran the same experiment on CIFAR-10 with a CNN (3 conv layers + 2 FC, ~320K parameters), adding feedback alignment as a third algorithm.

                      Acc.    Angle   Stable rank   Sparsity   Class sel.   Intr. dim.
Backprop             83.9%    89.4°          30.6      74.3%        56.6%         18.0
Feedback Alignment   84.0%    89.4°          28.5      73.0%        56.0%         17.9
Forward-Forward      10.0%    68.6°           1.5      99.5%        56.6%          2.2

Two things jumped out.

FA and backprop are geometrically identical. Same neuron angles (89.4° for both). Same stable rank (within 7%). Same intrinsic dimensionality (18.0 vs 17.9). Same per-sample difficulty ranking ($\rho = 0.953$). Even the convolutional filter angles match (87.4° vs 86.9°).

Feedback alignment uses random matrices where backprop uses the weight transpose. The gradient direction is approximately right (sign concordance) but the magnitude is completely wrong. And yet the geometry is identical. Gradient direction appears to matter. Magnitude doesn't — at least for this comparison.

FF fails on CIFAR-10 (10% = chance) but still collapses. The encoding paradigm is a property of the credit assignment mechanism, not of whether the model works. Local goodness collapses geometry regardless.

I tried to fix FF's collapse

If angular collapse is the problem, can we just... prevent it?

# Orthogonality regularizer for FF: penalize cosine similarity between neurons
# (assumes `layer` is the FF layer being trained; F is torch.nn.functional)
Wn = F.normalize(layer.weight, dim=1)                 # unit-norm weight rows
ortho_loss = (Wn @ Wn.T - torch.eye(Wn.shape[0])).pow(2).mean()
loss = ff_goodness_loss + 0.5 * ortho_loss

Results:

                     Accuracy    Angle   Stable rank   Class sel.
FF (baseline)           85.5%    50.9°           3.0         100%
FF + orthogonality      91.5%    87.7°           3.7        96.1%
Backprop                96.0%    89.9°          35.4        31.8%

The orthogonality constraint brought angles back to 87.7° and accuracy jumped from 85.5% to 91.5% — the best Forward-Forward model I found. Angular collapse actually hurts FF's performance.

But notice: stable rank barely moved (3.0 → 3.7). Preventing angular collapse didn't prevent spectral concentration. There are two mechanisms at work — angular alignment and magnitude concentration — and they're partially independent.

So what does this mean?

Here's how I think about it now.

The data determines capabilities. The algorithm determines geometry. These are largely independent axes. You might reach for an analogy like "same book, different font" — but that undersells it. Fonts are cosmetic. The geometry here is load-bearing. It determines which attacks succeed, which interpretability tools work, and whether you can fingerprint the training algorithm from the weights alone. A better analogy: same blueprints, different construction materials. The building stands either way, but one is fireproof and the other isn't.

Interpretability. The same architecture produces 30% to 100% class-selective neurons depending on the algorithm. Interpretability tools calibrated for backprop's distributed representations may behave completely differently on networks trained other ways.

Adversarial robustness. FF retains 82% accuracy under FGSM at $\varepsilon=0.30$ while backprop drops to 2.5%. That sounds amazing until you try transfer attacks from a backprop surrogate — then FF drops to 27%. It's gradient masking, not real robustness. But gradient-free attacks fail completely. The attack surface depends on the geometry, which depends on the algorithm.
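For context, FGSM is a single gradient step on the input, which is why gradient masking defeats it. A minimal sketch (the function name and the [0, 1] clamp range are my assumptions for image inputs):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.30):
    """One-step FGSM: push each input eps along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # sign() discards gradient magnitude -- only direction matters here,
    # so a network that masks its gradients (like FF) blunts this attack
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()
```

A transfer attack just runs this against a backprop-trained surrogate and feeds the resulting inputs to the target model — which is the variant that cuts FF down to 27%.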

Algorithm fingerprinting. At least on small networks, you can identify the training algorithm from the weights alone. In our experiments, a single number — the stable rank of the first layer — achieved 100% classification accuracy across 30 models. Whether this extends to larger architectures is an open question.

Why this matters beyond the lab

You might think: "Nobody uses Forward-Forward or evolutionary strategies in production. Who cares?"

Fair, but non-backprop training is already here:

  • MeZO (Malladi et al., NeurIPS 2023) fine-tunes LLMs at 30B parameters using only forward passes — zeroth-order optimization, no backward pass at all.
  • Surrogate gradients train spiking neural networks on Intel's Loihi 2 and IBM's NorthPole — shipping neuromorphic hardware.
  • Federated learning transmits noisy, compressed, delayed gradients by design.

These methods span a spectrum of gradient fidelity from exact to absent. My results suggest — at least for small networks — that the dividing line isn't gradient accuracy but whether you have any global error signal at all. Feedback alignment, with completely random feedback, produces the same geometry as exact backprop. The challenge for bio-plausible learning isn't computing exact gradients. It's maintaining global error coordination.

What I'm doing next

The obvious limitation: I ran this on MNIST MLPs and a small CIFAR-10 CNN. A Raspberry Pi can only do so much. I'm scaling to:

  1. GPT-2 + MeZO (~15 hours on a 3090): Compare backprop vs. zeroth-order geometry on a transformer. This is the experiment that would test the what–how separation at LLM scale.
  2. ResNet-18 on CIFAR-100 (~3 days): Standard benchmark with 100 classes. Feedback alignment vs. backprop on a real architecture.
  3. Gradient noise sweep: BP with controlled noise at multiple levels, mapping out whether geometry degrades smoothly or has a phase transition.

The one-paragraph version

We trained identical neural networks with fundamentally different learning algorithms and measured their internal geometry. We found that what a network learns is determined by the data, but how it encodes is determined by the learning algorithm. The mechanism is weight angular collapse: local learning rules align neurons within a single epoch, while backprop actively maintains orthogonality. On CIFAR-10 CNNs, feedback alignment with random feedback produces geometry identical to exact backprop, suggesting gradient direction — not magnitude — is the critical factor. The credit assignment mechanism is the primary geometric force shaping neural representations.


Paper: The Geometry of Credit Assignment (workshop paper, NeurIPS format)

Code: All experiments reproduce on a Raspberry Pi 5B. 25 Python scripts, 70+ models, 25 metrics. Everything runs on CPU.

Data: Every number in this post traces to a JSON file which traces to a Python script which runs from a single reproduce.sh.


Written by Austin T. O'Quinn. If something here helped you or you think I got something wrong, I'd like to hear about it — oquinn.18@osu.edu.