When the whole room agrees on a topic

2026-06-26 · qualified result · experiment T

The question

A reader holds a sense of what a passage is about, and that sense colors the next word. The brain may do this by ignition: a higher region commits to one global state and broadcasts it to everyone below. We asked the small version of that. If a higher level picks one topic and hands it down, do the lower predictors do better?

It matters because topic is the most concrete lever we have on the thing we cannot yet do: keeping a passage coherent over long spans.

What we tried

We read text with real article boundaries (a slice of a Wikipedia dump). As the model read, it tracked which of 128 topics it was in, switching only when one topic clearly took over and holding steady otherwise. That committed topic, call it G, was broadcast to every position. The lower predictor then estimated the next token from counts conditioned on both its local context and G, falling back to plain local counts when it had never seen that pair.
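The setup above can be sketched roughly as follows. This is a minimal illustration, not the experiment's code: the names `commit_topic` and `TopicBackoffCounter`, the ratio-based `margin` rule for "clearly took over", and the use of raw relative frequencies are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def commit_topic(current, scores, margin=2.0):
    """Sticky topic choice (hypothetical rule): switch only when the best
    topic's score is at least `margin` times the current topic's score."""
    best = max(scores, key=scores.get)
    if current is None:
        return best
    if best != current and scores[best] >= margin * scores.get(current, 0.0):
        return best
    return current  # hold steady otherwise


class TopicBackoffCounter:
    """Count-based next-token predictor keyed on (local context, G),
    backing off to local-only counts when that pair was never seen."""

    def __init__(self, use_topic=True):
        self.use_topic = use_topic
        self.local = defaultdict(Counter)  # context -> next-token counts
        self.joint = defaultdict(Counter)  # (context, G) -> next-token counts

    def update(self, context, topic, token):
        self.local[context][token] += 1
        self.joint[(context, topic)][token] += 1

    def predict(self, context, topic):
        """Return relative frequencies, or {} if the context is unseen."""
        cell = self.joint.get((context, topic)) if self.use_topic else None
        if not cell:  # unseen pair, or topic switched off: use local counts
            cell = self.local.get(context, Counter())
        total = sum(cell.values())
        return {tok: n / total for tok, n in cell.items()} if total else {}


model = TopicBackoffCounter()
for ctx, g, tok in [("th", 3, "e"), ("th", 3, "e"), ("th", 5, "a")]:
    model.update(ctx, g, tok)

seen_pair = model.predict("th", 3)    # uses the (context, G) cell
unseen_pair = model.predict("th", 7)  # backs off to local counts
```

With `use_topic=False` the same class ignores G entirely, which is one way to get a baseline that shares all of its machinery with the topic model.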

The two models shared the exact same machinery, so turning G off recovered the plain baseline bit for bit. A fair fight.

What happened

At the character level, the topic hurt.

level	without topic	with topic	effect
characters	1.86	2.08	hurts
words, overall	12.99	12.97	helps a little
words, where local context ran out	12.62	12.27	clearly helps

(Numbers are bits; lower is better.)

Spelling a word does not depend on the topic, so folding topic into the character counts only made each cell rarer and the prediction worse. We confirmed the machinery was sound by feeding it a shuffled topic, which was worse still. For characters, real topic beats random topic. No topic beats any topic.
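A toy illustration of why an extra key makes each cell rarer, and of how a shuffled-topic control could be built. The helper names `cell_sizes` and `shuffled` and the tiny data are hypothetical; the experiment's real control may have been constructed differently.

```python
import random
from collections import Counter


def cell_sizes(contexts, topics=None):
    """Count observations per cell, keyed on context alone or (context, topic)."""
    keys = contexts if topics is None else list(zip(contexts, topics))
    return Counter(keys)


def shuffled(topics, seed=0):
    """Control: same topic labels, order permuted, so the labels no longer
    line up with the text they were assigned to."""
    out = list(topics)
    random.Random(seed).shuffle(out)
    return out


contexts = ["th"] * 8          # one character context seen 8 times
topics = [0, 1, 2, 3] * 2      # four topics spread across those 8 sightings

plain = cell_sizes(contexts)            # one cell holding all 8 observations
keyed = cell_sizes(contexts, topics)    # four cells of 2 observations each
control = shuffled(topics)
```

The same eight observations support one well-populated cell without the topic key and four thin cells with it. If the key carries no information about the next character, the split only costs data.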

At the word level the story flipped. Overall the gain was tiny, because the local word context already nails most words. But on the 5.5% of words where the local context had never been seen, the broadcast topic saved about a third of a bit (12.62 down to 12.27). Diluted across all words, that third of a bit accounts for the entire overall gain.
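The dilution claim can be checked against the table with a few lines of arithmetic. This sketch assumes the topic contributes nothing on the other 94.5% of words, which is the reading the paragraph above suggests rather than a separately reported figure.

```python
# Figures taken from the results table (bits).
frac_backoff = 0.055                 # share of words with unseen local context
gain_on_backoff = 12.62 - 12.27      # saving on those words
overall_reported = 12.99 - 12.97     # saving over all words

# Assumption: zero gain wherever the local context had been seen.
diluted_gain = frac_backoff * gain_on_backoff

print(f"gain on backoff words: {gain_on_backoff:.2f} bits")
print(f"diluted over all words: {diluted_gain:.3f} bits")
print(f"reported overall gain:  {overall_reported:.2f} bits")
```

To the two decimals the table reports, the diluted figure and the overall gain agree.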

The lesson

Top-down topic helps where, and only where, local prediction has failed, and the altitude decides whether that ever happens. Characters never run out of local context, so topic can only fragment them. Words sometimes do, and there the global state carries real information.

So the design is clear. Let the word level commit a topic, and mix it in softly as a prior on the word distribution, used only when the local model backs off. Never push it down onto characters as an extra key. This is the global-workspace idea, landed at the right altitude.
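One way the proposed design could look, as a sketch under stated assumptions: the function name `word_distribution`, the linear mix with a unigram distribution, and the `weight` parameter are illustrative choices, not something the experiment specified.

```python
def word_distribution(local_probs, topic_prior, unigram, weight=0.5):
    """Use the local model when it has seen the context; otherwise back off
    to a soft mix of the committed topic's word prior and a unigram model.

    local_probs: dict word -> prob, or empty/None if the context is unseen.
    topic_prior: dict word -> prob under the committed topic (hypothetical).
    unigram:     dict word -> prob, topic-independent fallback.
    weight:      how much of the backoff mass goes to the topic prior.
    """
    if local_probs:
        return dict(local_probs)  # topic is ignored when local context holds
    vocab = set(topic_prior) | set(unigram)
    mixed = {
        w: weight * topic_prior.get(w, 0.0) + (1 - weight) * unigram.get(w, 0.0)
        for w in vocab
    }
    z = sum(mixed.values())
    return {w: p / z for w, p in mixed.items()} if z else {}


unigram = {"the": 0.5, "cell": 0.25, "goal": 0.25}
biology = {"cell": 0.8, "the": 0.2}  # made-up prior for one topic

seen = word_distribution({"the": 1.0}, biology, unigram)  # local model wins
backed_off = word_distribution(None, biology, unigram)    # topic mixed in
```

The topic never enters as a count key here, so it cannot fragment the local cells. It only reshapes the distribution at the moment the local model has nothing to say.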

Lineage

Grew from the scorecard, which named global coherence as the frontier, and from the source mining. The flat fourth level is what sent us looking for a level past local context.

Led to the design above, a word-level topic prior used only on backoff. That design went into the heterogeneous stack as the slow topic level and into the fair rematch as the topic prior. It also led to the event model, which re-confirms this altitude law by a different road: a topic discovered from belief jolts.

Thread: global coherence, the open frontier, with topic as the most concrete lever on it.
