
The Platonic Representation Hypothesis

April 29, 2026

Paper: https://phillipi.github.io/prh/#hypothesis

AI systems are becoming homogeneous in both their architectures and their capabilities
representational convergence: a growing similarity in how datapoints are represented in different neural networks, spanning differing model architectures, training objectives, and data modalities.
Questions the paper tries to answer:
- what caused this?
- will it continue and why?
the hypothesis: different models are all developing representations of the same thing, trying to produce a representation of reality itself.
- in other words: a representation of the joint distribution over events in the world that generate the data we observe
representation learning algorithms all learn from data derived from the same underlying reality Z, so better approximations of Z cause the representations to become aligned
- more data and more tasks require representations that capture more information about (i.e., better approximate) Z itself
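A toy sketch I find useful for the "better approximations of Z means more alignment" argument. This is my own setup, not the authors': the latent "reality" Z is a random Gaussian matrix, each hypothetical "modality" sees Z through a random rotation plus noise, and the noise level stands in for how poorly a model approximates Z. Alignment is scored with a simplified mutual nearest-neighbor metric (mean overlap of the two k-NN sets per point, divided by k), which is the kind of metric the paper uses; the cosine k-NN details, `noisy_view`, and all sizes are assumptions.

```python
import numpy as np


def knn_indices(feats: np.ndarray, k: int) -> np.ndarray:
    """Indices of each row's k nearest neighbors under cosine similarity (self excluded)."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # a point is not its own neighbor
    return np.argsort(sims, axis=1)[:, -k:]  # last k columns = the k most similar points


def mutual_knn_alignment(feats_a: np.ndarray, feats_b: np.ndarray, k: int = 10) -> float:
    """Mean size of the intersection of the two k-NN sets for each point, divided by k."""
    nn_a, nn_b = knn_indices(feats_a, k), knn_indices(feats_b, k)
    return float(np.mean([len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]))


def noisy_view(z: np.ndarray, noise: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical 'modality': a random rotation of Z plus Gaussian noise (noise ~ model error)."""
    rotation, _ = np.linalg.qr(rng.normal(size=(z.shape[1], z.shape[1])))
    return z @ rotation + noise * rng.normal(size=z.shape)


rng = np.random.default_rng(0)
z = rng.normal(size=(500, 32))  # 500 samples of a toy latent "reality" Z
for noise in [4.0, 1.0, 0.25, 0.0]:
    image_like, text_like = noisy_view(z, noise, rng), noisy_view(z, noise, rng)
    print(f"noise={noise}: alignment={mutual_knn_alignment(image_like, text_like):.2f}")
```

As the noise shrinks, the two views' neighborhoods agree more. With zero noise a rotation leaves cosine similarities unchanged, so the two views induce the same k-NN sets and alignment reaches 1.0, even though the raw coordinates of the two "modalities" look nothing alike.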
Convergent Realism:
- an analogous idea in the philosophy of science, i.e., that science is converging on truth (Newton-Smith, 1981; Putnam, 1982; Doppelt, 2007; Hardin & Rosenberg, 1982)
Representation learning literature
- (e.g., Tian et al. (2020a); Zimmermann et al. (2021); Richens & Everitt (2024); Cao & Yamins (2024)).
Anna Karenina Scenario (Bansal et al., 2021): all well-performing neural nets represent the world in the same way
PRH: the Anna Karenina Scenario, plus the claim that the converged representation is an accurate statistical model of reality

Representations are Converging

This section walks through key observations of how these representations are aligning.

Different models can have aligned representations, despite having different architectures and being trained on different loss functions (objectives)
1. Rosetta Neurons: neurons activated by the same patterns across a range of vision models, forming a common dictionary independently discovered by all of the models
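To make "the same neuron shows up in different models" concrete, a rough sketch: run the same inputs through two models, correlate every neuron of one with every neuron of the other, and keep pairs that are each other's best match. This is a simplified correlation-matching idea of my own, not the procedure from the Rosetta Neurons work; the function names, the `min_corr` threshold, and the toy activations are all hypothetical.

```python
import numpy as np


def neuron_correlations(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Pearson correlation of every neuron in model A with every neuron in model B.

    acts_a is (n_inputs, n_neurons_a), acts_b is (n_inputs, n_neurons_b); rows are the same inputs.
    """
    za = (acts_a - acts_a.mean(axis=0)) / acts_a.std(axis=0)
    zb = (acts_b - acts_b.mean(axis=0)) / acts_b.std(axis=0)
    return za.T @ zb / acts_a.shape[0]


def mutual_best_matches(acts_a: np.ndarray, acts_b: np.ndarray, min_corr: float = 0.5):
    """Neuron pairs (i, j) that are each other's most correlated partner, above min_corr."""
    corr = neuron_correlations(acts_a, acts_b)
    best_b_for_a = corr.argmax(axis=1)
    best_a_for_b = corr.argmax(axis=0)
    return [
        (i, int(j))
        for i, j in enumerate(best_b_for_a)
        if best_a_for_b[j] == i and corr[i, j] >= min_corr
    ]


# Toy data: two hypothetical "models" whose first neurons encode the same 5 patterns
# in a different order, plus model-specific neurons that encode nothing shared.
rng = np.random.default_rng(0)
n_inputs = 1000
patterns = rng.normal(size=(n_inputs, 5))
perm_a, perm_b = rng.permutation(5), rng.permutation(5)
acts_a = np.hstack([patterns[:, perm_a], rng.normal(size=(n_inputs, 3))])
acts_b = np.hstack([patterns[:, perm_b], rng.normal(size=(n_inputs, 4))])
acts_a += 0.3 * rng.normal(size=acts_a.shape)  # per-model activation noise
acts_b += 0.3 * rng.normal(size=acts_b.shape)
print(mutual_best_matches(acts_a, acts_b))
```

Each returned pair links the two neurons that encode the same shared pattern; the model-specific neurons find no partner above the threshold, which is the "common dictionary" picture in miniature.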
Alignment increases as models become more capable (in scale and performance)
1. They analyze the transfer performance of 78 vision models and find that models with high transfer performance form a tightly clustered set of representations, while models with weaker transfer performance have more variable representations
2. Concludes that models which are competent will represent data very similarly
3. "All strong models are alike; each weak model is weak in its own way."
4. Open Questions:
  1. Does the convergence extend to model weights?
  2. Models with different architectures might not have compatible weight spaces, but there is evidence that models sharing the same architecture converge to the same basin of weights
Convergence is occurring across modalities
1. They find that the better an LLM is at language modeling, the better it aligns with vision models. The converse also holds.
Models are even aligning with brains
This raises an open question: does increased alignment predict downstream performance?

The paper restricts itself to vector embeddings as representations
kernel of a representation: the similarity structure it induces (Kornblith et al., 2019; Klabunde et al., 2023)
- need to read more about CKA (centered kernel alignment)
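A minimal sketch of the kernel idea and of linear CKA as I understand it from Kornblith et al. (2019), assuming plain inner-product kernels. The kernel K[i, j] = <f(x_i), f(x_j)> is an n x n matrix over the same n inputs, so two models with different embedding sizes can still be compared; linear CKA is the cosine similarity between the two double-centered kernels. Function names and the toy data are mine.

```python
import numpy as np


def kernel(feats: np.ndarray) -> np.ndarray:
    """Inner-product kernel K[i, j] = <f(x_i), f(x_j)>: the similarity structure of a representation."""
    return feats @ feats.T


def center(k: np.ndarray) -> np.ndarray:
    """Double-center a kernel matrix: H K H with H = I - (1/n) * ones."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ k @ h


def cka(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Linear CKA: Frobenius cosine similarity between the two centered kernels."""
    ka, kb = center(kernel(feats_a)), center(kernel(feats_b))
    return float(np.sum(ka * kb) / (np.linalg.norm(ka) * np.linalg.norm(kb)))


rng = np.random.default_rng(0)
x = rng.normal(size=(200, 64))                  # toy embeddings from a hypothetical "model A"
q, _ = np.linalg.qr(rng.normal(size=(64, 64)))  # random rotation of the embedding space
print(cka(x, 3.0 * x @ q))                      # same geometry, rotated and rescaled: ~1.0
print(cka(x, rng.normal(size=(200, 16))))       # unrelated 16-d representation: much lower
```

Because the score only looks at the kernel, it ignores rotations and uniform rescaling of the embedding space. That is what makes it a reasonable way to say two models "represent the data the same way" even when their raw coordinates differ.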

This section tries to uncover which mechanism is causing models to converge. It outlines a few contenders:

Task Generality
- Each data point and task places an additional constraint on the model. The more tasks there are (and the more complex they are), the smaller the set of representations that can satisfy all of the constraints.
- General tasks demand specific solutions
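A toy way to see the shrinking-solution-set argument, under a big simplifying assumption of mine: treat a candidate "solution" as an entire Boolean function on 3-bit inputs (there are 2^8 = 256 of them) and each observed data point as one constraint. Every new distinct data point rules out half of the remaining candidates. `count_consistent` and the majority target are hypothetical, just for illustration.

```python
from itertools import product


def all_boolean_functions(n_bits: int):
    """Yield every function from n-bit inputs to {0, 1}, as a dict input -> output."""
    inputs = list(product([0, 1], repeat=n_bits))
    for outputs in product([0, 1], repeat=len(inputs)):
        yield dict(zip(inputs, outputs))


def count_consistent(constraints, n_bits: int = 3) -> int:
    """Number of candidate functions that agree with every (input, output) constraint."""
    return sum(
        all(f[x] == y for x, y in constraints)
        for f in all_boolean_functions(n_bits)
    )


def majority(x) -> int:
    """Hypothetical 'true' task: output 1 if at least two of the three bits are 1."""
    return int(sum(x) >= 2)


observed = []
print(0, count_consistent(observed))  # no constraints yet: all 256 functions fit
for x in product([0, 1], repeat=3):
    observed.append((x, majority(x)))
    print(len(observed), count_consistent(observed))  # halves with each new data point
```

The count goes 256, 128, 64, ..., 1. Real models search a continuous space and real tasks are far richer than single points, so this only illustrates the direction of the argument: more (and more varied) constraints leave fewer solutions that fit them all.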
Model Capacity
Simplicity Bias

If we believe that models are in fact converging, then to what end? What is the final destination?

The central hypothesis of the paper: the representation we are converging on is an accurate statistical model of the underlying reality that generates our observations
- I find the wording here interesting. "The underlying reality that generates our observations" seems verbose; why not just say "the world"? There must be something specific the authors are trying to convey here.
Ha, the next section addresses exactly this.

scale is sufficient but not efficient
- yes, scale will get us there, but some methods will get us there faster than others
Training data can be shared across modalities
1. the PRH claims that there exists some modality-agnostic underlying representation of (observable) reality, and therefore training on both image and language data should help find it
  1. this extends to all forms of data that approximate reality: audio, sensor data, etc.