When the letters lie, it leans on the idea

2026-06-26 · qualified result · experiment Y

The question

We have a stack that reads text at four altitudes at once: letters, words, phrases, and topic. A gate decides, token by token, which altitude to trust, by watching how confident each one has been lately. The lower levels carry leaky-evidence pooling and the dynamic confidence router from the work before this.

So we asked the question a reader faces every day. What happens when the input is wrong? You read a smudged page, a typo, a misheard word, and you still follow the meaning. We wanted to see the stack do the same, and to see where it goes when the surface stops being trustworthy.

What we tried

We poured noise on the text the model reads, at perception time and on the input only. The target it is scored against stays clean. Given a corrupted view of the past, predict the true future.

Two corruptions. A surface scramble at probability p swaps and substitutes letters inside each word, keeping the spaces, so the letters lie but the word boundaries hold. A word corruption substitutes or drops whole words. The gate is never told the input is noisy. It only ever sees each level's running confidence.
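A minimal sketch of the two corruptions, under assumptions the text does not settle: here p is applied per letter (or per word), the choice between swap and substitute (or substitute and drop) is a coin flip, and substitutes are drawn uniformly. The function names, the `vocab` argument, and the 50/50 splits are illustrative, not the experiment's actual code.

```python
import random
import string


def scramble_surface(text, p, rng):
    """Corrupt letters inside each word; spaces never move.

    Assumption: each letter is hit independently with probability p,
    then either swapped with its right neighbour in the same word or
    replaced by a random lowercase letter.
    """
    out = []
    for word in text.split(" "):
        chars = list(word)
        for i in range(len(chars)):
            if rng.random() >= p:
                continue  # this letter stays honest
            if i + 1 < len(chars) and rng.random() < 0.5:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]  # swap
            else:
                chars[i] = rng.choice(string.ascii_lowercase)  # substitute
        out.append("".join(chars))
    return " ".join(out)


def corrupt_words(text, p, vocab, rng):
    """Substitute or drop whole words.

    Assumption: each word is hit with probability p, then either
    replaced by a random word from `vocab` or dropped.
    """
    out = []
    for word in text.split(" "):
        if rng.random() >= p:
            out.append(word)  # word survives
        elif rng.random() < 0.5:
            out.append(rng.choice(vocab))  # substitute
        # otherwise the word is dropped
    return " ".join(out)


rng = random.Random(0)
clean = "the quick brown fox jumps"
noisy_letters = scramble_surface(clean, 0.3, rng)
noisy_words = corrupt_words(clean, 0.3, ["cat", "tree", "runs"], rng)
```

Because the scramble only swaps or substitutes within a word, the noisy string has the same length as the clean one and every space sits where it did, which is what lets the clean target line up with the corrupted input.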

What happened

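The concept stack degrades gently where the flat model collapses. Scored against clean next characters, in bits, lower is better. The p of 0.0 row is the absolute clean cost, and the rows below it give the rise over that clean cost: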

surface noise `p`	flat bigram	full stack
0.0	2.92	4.80
0.2	+1.14	+0.61
0.4	+1.98	+0.73

The bigram has the lowest clean cost. It is a sharp local memorizer, and exact letter pairs are precisely what the scramble breaks, so it collapses fastest. The full stack starts worse in absolute terms, but by p of 0.4 its cost has risen about 2.7 times less than the bigram's (0.73 bits against 1.98), and past p of 0.2 it is nearly flat. It trades absolute level for robustness.
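The arithmetic behind that comparison can be checked directly from the table. This sketch assumes the signed rows are increases over the p of 0.0 row; the variable names are ours.

```python
# Numbers copied from the table above, in bits.
clean = {"flat bigram": 2.92, "full stack": 4.80}
rise = {
    0.2: {"flat bigram": 1.14, "full stack": 0.61},
    0.4: {"flat bigram": 1.98, "full stack": 0.73},
}

# Absolute cost under noise, assuming each signed entry is added to the clean row.
absolute = {
    p: {model: round(clean[model] + delta, 2) for model, delta in row.items()}
    for p, row in rise.items()
}

# How many times more slowly the stack's cost has risen by p = 0.4.
slowdown = round(rise[0.4]["flat bigram"] / rise[0.4]["full stack"], 2)

# How much the stack's cost moves between p = 0.2 and p = 0.4.
late_rise = round(rise[0.4]["full stack"] - rise[0.2]["full stack"], 2)

print(absolute[0.4])  # {'flat bigram': 4.9, 'full stack': 5.53}
print(slowdown)       # 2.71
print(late_rise)      # 0.12
```

Under this reading the stack is still the more expensive model in absolute bits at p of 0.4 (5.53 against 4.90). The robustness claim is about slope, not level.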

The headline: when the letters lie, it leans on the idea. As the surface noise rises, the share of prediction the gate routes to the concept levels climbs.

surface noise `p`	mass on letters	mass on concepts
0.0	13.6%	86.4%
0.3	4.6%	95.4%

The gate is not told the input is noisy. It routes on confidence alone. When the letters start lying, the letter level's confidence falls, and the gate quietly hands prediction up to the slow topic level. The shift is monotone and large, and the topic level absorbs almost all of the migrated mass.
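One way to picture routing on confidence alone is sketched below. It is not the router used in the experiment: the class name, the decay of 0.9, the use of "probability given to the symbol that actually arrived" as the confidence signal, and the plain normalization into weights are all assumptions made for illustration.

```python
class ConfidenceGate:
    """Hypothetical gate: weights each level by its recent confidence."""

    def __init__(self, levels, decay=0.9, init=0.25, eps=1e-9):
        self.decay = decay  # how slowly old evidence leaks away
        self.eps = eps      # floor so no level's weight is exactly zero
        self.conf = {name: init for name in levels}

    def weights(self):
        """Normalize running confidences into routing mass."""
        total = sum(c + self.eps for c in self.conf.values())
        return {name: (c + self.eps) / total for name, c in self.conf.items()}

    def mix(self, dists):
        """Blend each level's next-symbol distribution by its weight."""
        w = self.weights()
        mixed = {}
        for name, dist in dists.items():
            for symbol, prob in dist.items():
                mixed[symbol] = mixed.get(symbol, 0.0) + w[name] * prob
        return mixed

    def update(self, dists, observed):
        """Leaky average of the probability each level gave the observed symbol."""
        for name, dist in dists.items():
            hit = dist.get(observed, 0.0)
            self.conf[name] = self.decay * self.conf[name] + (1 - self.decay) * hit


gate = ConfidenceGate(["letters", "topic"])
# A stretch where the letter level keeps guessing wrong and the topic level holds up.
for _ in range(20):
    dists = {
        "letters": {"a": 0.1, "b": 0.9},
        "topic": {"a": 0.8, "b": 0.2},
    }
    gate.update(dists, observed="a")
routing = gate.weights()  # mass has moved toward "topic"
```

Nothing in the gate refers to noise. The letter level's running confidence falls because its guesses stop landing, and the normalization moves the mass to whichever level is still landing them.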

The lesson

A stack that reads at several altitudes reorganizes itself toward abstraction exactly when the surface stops being trustworthy, with no signal telling it the input is noisy. It leans on the idea because the letters lost their confidence, not because anyone told it to.

Two honest negatives sit next to the win. First, we tried training on noised text to see if it forced abstraction the way dropout does for a gradient net. It hurt clean rare-context accuracy, which fell from 0.239 to 0.136. Count tables are not gradient nets. They do not overfit in the way denoising is meant to cure, so corrupting the train stream mostly deletes signal. The right form is consistency, counting a clean view and a noisy view into the same concept, and that is the next step. Second, under word noise the slow topic level is on track to overtake the word level but had not crossed over at the highest noise level we tested. A promising trend, parked.
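The consistency idea has not been built yet, so the following is only one possible reading of it, with every name hypothetical. It assumes the clean context itself names the concept, that both the clean and the noisy surface form are registered as pointing at that concept, and that the clean target is counted once under the concept rather than once per damaged string.

```python
from collections import Counter, defaultdict


class ConsistentCounts:
    """Hypothetical count table where clean and noisy views share a concept."""

    def __init__(self):
        # surface form -> votes for which concept it belongs to
        self.surface_to_concept = defaultdict(Counter)
        # concept -> counts of the clean next symbol
        self.concept_next = defaultdict(Counter)

    def observe(self, clean_ctx, noisy_ctx, next_symbol):
        concept = clean_ctx  # assumption: the clean view names the concept
        for view in (clean_ctx, noisy_ctx):
            self.surface_to_concept[view][concept] += 1
        # The target is counted once, under the shared concept.
        self.concept_next[concept][next_symbol] += 1

    def predict(self, ctx):
        votes = self.surface_to_concept.get(ctx)
        if not votes:
            return None  # surface form never seen in either view
        concept = votes.most_common(1)[0][0]
        return self.concept_next[concept].most_common(1)[0][0]


table = ConsistentCounts()
table.observe("cat", "cta", "s")
table.observe("cat", "cat", "s")
guess = table.predict("cta")  # routed through the "cat" concept
```

The contrast with training on noised text is that the noisy view here adds a route into existing counts instead of spreading the counts across damaged keys.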

Lineage

Grew from a vote that remembers, the leaky-evidence pooling the letter level uses, and from the heterogeneous stack, the gated four-altitude stack this pours noise on.

Led to the consistency idea: count a clean and a noisy view into the same concept, rather than train on damaged text.

Thread: surprise and robustness, the system reaching for the altitude that still holds when a lower one fails.
