The hierarchy pays off at the right altitude
2026-06-25 · win · experiment E
The question
We had just learned a painful lesson: combining higher-level experts had failed to lower the cost of predicting the next character. The diagnosis was that character prediction saturates, so phrase and topic structure barely move it. If that diagnosis was right, the same experts should pay off handsomely when measured at the level they actually operate on. So we tested the prediction directly. Predict the next word, not the next character.
What we tried
We built four word-level experts: a word's own frequency, the previous word (a phrase), the previous two words (a longer phrase), and a recency cache for topic. We combined them as a product of experts, multiplying their opinions together with a learned weight on each and renormalizing, so that any expert with nothing useful to say can quietly abstain.
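A minimal sketch of that combination, assuming count-based experts with add-alpha smoothing, hand-set weights, and a toy corpus. Every name and parameter here (`make_experts`, `product_of_experts`, `alpha`, `cache_size`, the weight values) is hypothetical, not the experiment's code. Each expert returns a distribution over the vocabulary or `None` to abstain, and the product renormalizes over whichever experts spoke.

```python
import math
from collections import Counter, defaultdict

def make_experts(train, vocab, cache_size=50, alpha=0.1):
    """Build four hypothetical word-level experts from a training word list.

    Each expert maps a history (list of previous words) to a dict
    word -> probability, or returns None to abstain.
    """
    unigram = Counter(train)
    bigram, trigram = defaultdict(Counter), defaultdict(Counter)
    for a, b in zip(train, train[1:]):
        bigram[a][b] += 1
    for a, b, c in zip(train, train[1:], train[2:]):
        trigram[(a, b)][c] += 1

    def smooth(counts):
        # Add-alpha smoothing over the whole vocabulary, so no word gets probability 0.
        total = sum(counts.values()) + alpha * len(vocab)
        return {w: (counts[w] + alpha) / total for w in vocab}

    def frequency(history):       # a word's own frequency
        return smooth(unigram)

    def previous_word(history):   # phrase: abstain on an unseen previous word
        if not history or history[-1] not in bigram:
            return None
        return smooth(bigram[history[-1]])

    def previous_two(history):    # longer phrase: abstain on an unseen word pair
        key = tuple(history[-2:])
        if len(key) < 2 or key not in trigram:
            return None
        return smooth(trigram[key])

    def cache(history):           # topic: recency cache over the last cache_size words
        recent = Counter(history[-cache_size:])
        return smooth(recent) if recent else None

    return [frequency, previous_word, previous_two, cache]

def product_of_experts(experts, weights, history, vocab):
    # p(w) is proportional to prod_i p_i(w) ** weight_i over the non-abstaining experts.
    log_score = {w: 0.0 for w in vocab}
    for expert, weight in zip(experts, weights):
        dist = expert(history)
        if dist is None:
            continue  # abstaining leaves every word's score unchanged
        for w in vocab:
            log_score[w] += weight * math.log(dist[w])
    top = max(log_score.values())  # subtract the max before exp for numerical stability
    unnorm = {w: math.exp(s - top) for w, s in log_score.items()}
    z = sum(unnorm.values())
    return {w: u / z for w, u in unnorm.items()}

def perplexity(experts, weights, words, vocab):
    # 2 raised to the average number of bits spent per word.
    bits = 0.0
    for i, word in enumerate(words):
        p = product_of_experts(experts, weights, words[:i], vocab)
        bits -= math.log2(p[word])
    return 2 ** (bits / len(words))

train = "the cat sat on the mat the cat ate the rat".split()
vocab = sorted(set(train))
experts = make_experts(train, vocab)
frequency_only = perplexity(experts, [1.0, 0.0, 0.0, 0.0], train, vocab)
with_phrases = perplexity(experts, [0.2, 1.0, 1.0, 0.3], train, vocab)
print(f"frequency only: {frequency_only:.2f}, with phrase experts: {with_phrases:.2f}")
```

Perplexity here is measured on the training text itself, so the printout only illustrates the mechanics, not a result. Skipping an abstaining expert is equivalent to letting it vote uniformly, since a constant factor cancels in the renormalization.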
What happened
It compounded, cleanly.
| measure | before | after |
|---|---|---|
| perplexity | 476 | 247 |
| bits per word | 8.90 | 7.95 |
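The two rows are the same quantity on two scales: perplexity is 2 raised to the average bits per word. A short check of that arithmetic, using a helper name of our own (`bits_per_word`):

```python
import math

def bits_per_word(perplexity):
    # Perplexity = 2 ** (average bits per word), so the conversion is a base-2 log.
    return math.log2(perplexity)

before, after = 476, 247
saved = bits_per_word(before) - bits_per_word(after)
print(f"{bits_per_word(before):.2f} -> {bits_per_word(after):.2f} bits per word")
print(f"saved {saved:.2f} bits per word; perplexity ratio {2 ** -saved:.3f}")
```

log2 476 is about 8.89 and log2 247 about 7.95, matching the table to within rounding. The drop of roughly 0.95 bits per word multiplies perplexity by about 2^-0.95, or 0.52, which is why it nearly halved.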
Perplexity nearly halved. And the learned combination weights told us where the value lived: almost all the mass landed on the phrase experts, the previous word and the previous two words. The phrase structure that was invisible at the character level was doing real work at the word level.
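One plausible way to learn such weights online (an assumption here, not necessarily what the experiment did) is stochastic gradient ascent on the log-probability of each observed word. For p(w) ∝ ∏ p_i(w)^λ_i, the gradient with respect to λ_i is log p_i(observed) minus the expectation of log p_i under the combined distribution. The weights stay plain numbers you can print, which is what keeps the learned mass inspectable. `combine`, `online_update`, the learning rate, and the toy experts are all hypothetical.

```python
import math

def combine(dists, weights, vocab):
    # p(w) proportional to prod_i p_i(w) ** weight_i; a None entry means that expert abstains.
    scores = {w: sum(lam * math.log(d[w]) for d, lam in zip(dists, weights) if d is not None)
              for w in vocab}
    top = max(scores.values())
    unnorm = {w: math.exp(s - top) for w, s in scores.items()}
    z = sum(unnorm.values())
    return {w: u / z for w, u in unnorm.items()}

def online_update(dists, weights, observed, vocab, lr=0.1):
    """One gradient-ascent step on log p(observed) with respect to the expert weights."""
    p = combine(dists, weights, vocab)
    new = []
    for d, lam in zip(dists, weights):
        if d is None:
            new.append(lam)  # an abstaining expert gets no gradient this step
            continue
        expected = sum(p[w] * math.log(d[w]) for w in vocab)
        grad = math.log(d[observed]) - expected
        new.append(max(0.0, lam + lr * grad))  # this sketch keeps weights non-negative
    return new

vocab = ["a", "b", "c"]
frequency = {"a": 0.5, "b": 0.3, "c": 0.2}  # toy context-free expert

def phrase(target):
    # Idealized toy phrase expert that puts 0.8 on the word that actually comes next.
    return {w: 0.8 if w == target else 0.1 for w in vocab}

weights = [0.5, 0.5]  # [frequency, phrase]
for observed in ["c", "a", "b"] * 10:
    weights = online_update([frequency, phrase(observed)], weights, observed, vocab)
print(weights)
```

In this toy run the phrase expert, which always favors the word that comes next, gains weight faster than the context-free frequency expert, the same kind of signal the experiment read off its learned weights.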
The lesson
Each concept level helps predict the level it operates on. Word concepts help characters; phrases and topics help words. Measure at the right altitude and the hierarchy compounds.
This is the through-line of the whole project, confirmed by the cleanest experiment for it. The architecture works: multi-level concepts, combined by a product of experts, learned online and open to inspection. The earlier failure was never the architecture. It was measuring a word-level idea with a character-level ruler.
Lineage
Grew from word concepts, which proved the first rung, and from the voting loss, which said to measure higher ideas at the right altitude.
Led to one part repeated, the Column that folds these experts into one part.
Thread: the hierarchy, each level helping predict the level it operates on.