Finding where one word ends
2026-06-25 · qualified win · experiment A
The question
The first experiment, and the foundation for everything after. The architecture mints concepts at moments of surprise. Before trusting that, we had to know it works at the very bottom. Reading raw characters, with no spaces and no labels, can prediction error alone find where one word ends and the next begins?
And a sharper sub-question. A 2023 paper argued that transient Bayesian surprise is the brain's event-boundary signal. Does it beat the plain surprise measures here?
What we tried
We streamed raw characters and watched a signal called branching entropy: when the model becomes suddenly unsure what comes next, that ambiguity marks a likely boundary. We measured how well it recovers true word boundaries. Then we put Bayesian surprise head to head with it at the character level.
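To make the branching-entropy idea concrete, here is a minimal sketch. It is not the experiment's code: it swaps in a simple character n-gram count table for whatever predictive model was actually used, and the names `train_successors`, `branching_entropy`, and `boundaries_by_entropy_rise` are hypothetical. The "rise" rule (cut wherever entropy is higher than at the previous position) is also an assumption; a real run would likely use a tuned threshold.

```python
import math
from collections import Counter, defaultdict


def train_successors(stream, order=2):
    """Count which character follows each context of `order` characters."""
    table = defaultdict(Counter)
    for i in range(order, len(stream)):
        table[stream[i - order:i]][stream[i]] += 1
    return table


def branching_entropy(table, context):
    """Shannon entropy (bits) of the next-character distribution after `context`.

    Returns 0.0 for a context that was never seen with a successor.
    """
    counts = table.get(context)  # .get avoids creating empty entries
    if not counts:
        return 0.0
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def boundaries_by_entropy_rise(stream, table, order=2):
    """Propose a cut at offset i when entropy rises relative to the previous offset.

    Assumption: a bare "higher than the previous position" rule, with no threshold.
    """
    cuts = []
    prev = None
    for i in range(order, len(stream) + 1):
        h = branching_entropy(table, stream[i - order:i])
        if prev is not None and h > prev:
            cuts.append(i)
        prev = h
    return cuts
```

A typical use would train the table on one unsegmented stream and call `boundaries_by_entropy_rise` on held-out text; the returned offsets are character positions where a word is guessed to end.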
What happened
Boundaries are recoverable from prediction alone. The branching-entropy signal hit an F1 of 0.775 against true word boundaries, a quality the literature respects, with no labels and no spaces given.
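For readers unfamiliar with boundary F1, the sketch below shows one common way to score predicted cut positions against true ones. The source does not say exactly how its 0.775 was computed, so this assumes exact-position matching over interior boundaries only; `boundary_positions` and `boundary_f1` are hypothetical helper names.

```python
def boundary_positions(words):
    """Interior boundary offsets for words concatenated without spaces.

    ["the", "cat", "sat"] becomes "thecatsat" with boundaries at offsets 3 and 6.
    """
    positions, offset = set(), 0
    for word in words[:-1]:  # the end of the stream is not counted as a boundary
        offset += len(word)
        positions.add(offset)
    return positions


def boundary_f1(predicted, gold):
    """Harmonic mean of precision and recall over exact boundary offsets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Precision penalizes cuts placed inside words, and recall penalizes missed word edges, so a signal that fires mid-word loses on both counts.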
And one fashionable signal failed. Bayesian surprise, the signal the paper championed, scored below random at the character level. The reason is altitude: it is a signal for semantic events, and at the character scale it peaks in the middle of a word, not at its edge.
| signal | works at the character level? |
|---|---|
| branching-entropy rise | yes, F1 0.775 |
| Bayesian surprise | no, below random |
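The contrast in the table is easier to see with the two quantities side by side. The sketch below is an illustration under stated assumptions, not the experiment's formulation, which the source does not give: it treats Bayesian surprise as the KL divergence between the next-character predictive distribution after and before one observation, using add-alpha smoothed counts as a stand-in for the model's beliefs. All function names are hypothetical.

```python
import math


def predictive(counts, alphabet, alpha=1.0):
    """Add-alpha smoothed next-character distribution from raw counts."""
    total = sum(counts.get(ch, 0) for ch in alphabet) + alpha * len(alphabet)
    return {ch: (counts.get(ch, 0) + alpha) / total for ch in alphabet}


def kl_divergence(p, q):
    """KL(p || q) in bits; q must be nonzero wherever p is nonzero."""
    return sum(pv * math.log2(pv / q[ch]) for ch, pv in p.items() if pv > 0)


def surprisal(counts, observed, alphabet, alpha=1.0):
    """Plain surprise: how improbable the observed character was, in bits."""
    return -math.log2(predictive(counts, alphabet, alpha)[observed])


def bayesian_surprise(counts, observed, alphabet, alpha=1.0):
    """How far one observation moves the predictive distribution.

    Assumption: KL(updated predictive || previous predictive) as a proxy
    for the shift in beliefs caused by the observation.
    """
    before = predictive(counts, alphabet, alpha)
    updated = dict(counts)  # copy so the caller's counts are not mutated
    updated[observed] = updated.get(observed, 0) + 1
    after = predictive(updated, alphabet, alpha)
    return kl_divergence(after, before)
```

Branching entropy asks how uncertain the model is about what comes next, while this quantity asks how much the model changed its mind after seeing it. The two need not peak at the same position.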
The lesson
The boundary signal is level-dependent. Branching entropy carves letters into words; Bayesian surprise belongs higher up, at sentences and themes.
The core bet held at the bottom: ambiguity marks a boundary. And the negative result was a gift. It told us the right signal for each altitude, so we did not waste the cheap surprise measure where it does not work. The ambiguity signal carries forward into every later experiment that carves a hierarchy, all the way up to discovering phrases. We also dogfooded our own platform here, persisting the discovered words as a Prism document, the first piece of an inspectable mind.
Lineage
Grew from: the start. This is the first experiment.
Led to: finding phrases one level up, a memory of change, and a vote that remembers.
Thread: surprise, the one signal that runs through boundaries, attention, and learning.