What an agent learns while it dreams

2026-06-26 · qualified win · experiment AA

The question

A model that learns by counting never forgets. That is its gift, and its problem. It keeps every context it ever saw, including the ones it saw once, in a moment of noise. The memory grows, and a lot of it is junk.

A recent idea from Letta, Towards Agents that Learn, frames the cure. An agent improves by refining its memory, not its weights. It does this offline, while idle, in a pass they call agent dreaming: replay what you saw, and tidy what you learned. The catch they report is sharp. Refine too much and the memory goes generic and lossy. It forgets the specific in favor of the vague.

In our world this is almost literal. Each Column is a little memory-agent, and its count table is its memory. So we asked the two questions the idea forces. Can a model improve by tidying its memory, with no new data? And does tidying too much break it the way Letta warns?

What we tried

We built a sleep pass: one offline pass over the count tables, replaying a bounded buffer of recent text. It does three count-only things. It prunes contexts seen too few times to trust. It distills: where a long, specific context predicts exactly what its shorter backoff already predicts, it drops the specific copy, losing nothing. And it tries to promote recurring patterns into shared concepts.

No gradients, no batch optimization. Pruning is a count threshold, distilling is a comparison of two counted distributions, promotion is one-pass clustering. The honest caveat: sleep is a second pass, so it is offline, not the single streaming pass of the pure online model. But the learning rule stays local and counted. Brains sleep and replay. This is replay and bookkeeping, not training.
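
The prune and distill steps can be sketched in a few lines. This is a minimal illustration, not our actual code. It assumes the table is a dict from context string to a Counter of next-symbol counts, and that the backoff of a context is the same string with its oldest character dropped. The names sleep_once, min_count and tol are hypothetical.

```python
from collections import Counter

def normalize(counter):
    """Turn raw next-symbol counts into a probability distribution."""
    total = sum(counter.values())
    return {sym: n / total for sym, n in counter.items()}

def same_prediction(a, b, tol=1e-9):
    """True when two count tables predict the same distribution (within tol)."""
    pa, pb = normalize(a), normalize(b)
    return all(abs(pa.get(s, 0.0) - pb.get(s, 0.0)) <= tol for s in set(pa) | set(pb))

def sleep_once(table, min_count=3, tol=1e-9):
    # Prune: a plain count threshold on total evidence per context.
    kept = {ctx: c for ctx, c in table.items() if sum(c.values()) >= min_count}
    # Distill: longest contexts first, drop any that only echo their backoff.
    for ctx in sorted(kept, key=len, reverse=True):
        backoff = ctx[1:]  # assumed backoff: drop the oldest character
        if ctx and backoff in kept and same_prediction(kept[ctx], kept[backoff], tol):
            del kept[ctx]
    return kept

table = {
    "": Counter({"a": 50, "b": 50}),
    "e": Counter({"a": 6, "b": 2}),
    "he": Counter({"a": 3, "b": 1}),  # same 0.75 / 0.25 split as "e"
    "the": Counter({"a": 1}),         # seen once, below the threshold
    "xe": Counter({"b": 4}),          # disagrees with "e", so it stays
}
slept = sleep_once(table)
print(sorted(slept))
```

In this toy table "the" is pruned for lack of evidence and "he" is distilled away because "e" already says the same thing. The sketch only compares a context with its immediate backoff, which is a simplification.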

We trained on 16 MB of text8, then slept, and scored a held-out tail. We split every number by whether the model had much evidence for a context (common) or little (rare).
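
One way to compute that split, as a sketch under stated assumptions: each scored character is recorded as a pair of (evidence, probability), where evidence is how many times the model had seen the context and probability is what it assigned to the true next character. The cutoff rare_below is a made-up parameter; the post does not state the threshold we used.

```python
import math

def split_bpc(events, rare_below=10):
    """Average bits per character overall, and split by evidence.

    events: iterable of (evidence, p) pairs, with p > 0.
    rare_below: hypothetical cutoff between rare and common contexts.
    """
    buckets = {"all": [], "rare": [], "common": []}
    for evidence, p in events:
        bits = -math.log2(p)  # cost in bits of the true character
        buckets["all"].append(bits)
        buckets["rare" if evidence < rare_below else "common"].append(bits)
    return {
        name: (sum(vals) / len(vals) if vals else float("nan"))
        for name, vals in buckets.items()
    }

# Toy events, not real measurements.
events = [(100, 0.5), (100, 0.25), (2, 0.125), (1, 0.125)]
result = split_bpc(events)
print(result)
```

The overall number is a weighted average of the two buckets, so it always lies between the rare and common figures.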

What happened

One gentle sleep cut the memory by more than a third and made the model better, not worse.

	held-out bpc	rare-context bpc	memory
before sleep	1.8773	2.592	3.26 M entries
after one sleep	1.8666	2.456	2.06 M entries

The win lands on the rare tail. Pruning throws away contexts seen once or twice, the noise the online pass had no way to filter. Distilling drops specific contexts that only echoed their backoff. The memory shrinks 37%, and the rare-context cost falls four times more than the common cost rises. A clear gain, from tidying alone, with no new data.
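
The headline figures can be checked directly from the before/after table. This snippet only redoes the arithmetic on the reported numbers; the variable names are ours for illustration.

```python
# Figures copied from the before/after table above.
before = {"bpc": 1.8773, "rare_bpc": 2.592, "entries_m": 3.26}
after = {"bpc": 1.8666, "rare_bpc": 2.456, "entries_m": 2.06}

# Relative memory reduction, in percent.
shrink_pct = 100 * (before["entries_m"] - after["entries_m"]) / before["entries_m"]
# Absolute drops in bits per character (positive means better).
bpc_gain = before["bpc"] - after["bpc"]
rare_gain = before["rare_bpc"] - after["rare_bpc"]

print(f"memory shrink: {shrink_pct:.1f}%")
print(f"held-out bpc drop: {bpc_gain:.4f}")
print(f"rare-context bpc drop: {rare_gain:.3f}")
```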

Then we kept sleeping, harder each time, and watched Letta's failure mode arrive on schedule.

cycle	held-out bpc	rare-context bpc	common-context bpc
0	1.8773	2.592	1.735
1 (best)	1.8638	2.422	1.781
5	1.9140	2.454	1.869
10	2.2867	2.203	2.289

Quality peaks after one cycle, then falls for nine straight. The signature is clear. Rare-context cost ends far lower than it began (2.592 to 2.203), with a small bump at cycle 5. Common-context cost gets worse at every step shown (1.735 to 2.289). Refinement pours the specific memories into the generic backoff. The tail is happy, because the generic answer was always its best guess. The common case is ruined, because its specific count was real signal, now blurred into mush. That is "generic and lossy," reproduced in counts.
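
The practical guard is to stop at the first cycle where held-out cost stops improving. A minimal sketch, assuming two hypothetical callables: sleep(table, cycle) returns a refined table, and score(table) returns held-out bpc. Neither name comes from our codebase, and the toy scores below are invented to exercise the loop, not measured.

```python
def dream_with_early_stop(table, sleep, score, max_cycles=10):
    """Run sleep cycles, keeping the best table seen; stop when cost rises."""
    best_table, best_score, best_cycle = table, score(table), 0
    current = table
    for cycle in range(1, max_cycles + 1):
        current = sleep(current, cycle)
        s = score(current)
        if s >= best_score:
            break  # held-out cost stopped improving: keep the previous best
        best_table, best_score, best_cycle = current, s, cycle
    return best_table, best_score, best_cycle

# Toy stand-ins: the "table" is just the cycle index, scores are made up.
toy_scores = [3.0, 2.0, 2.5, 4.0]
best_table, best_score, best_cycle = dream_with_early_stop(
    0,
    sleep=lambda table, cycle: cycle,
    score=lambda table: toy_scores[table],
    max_cycles=3,
)
print(best_cycle, best_score)
```

Stopping on the overall held-out number matters here, because the rare-context number alone would keep rewarding further refinement.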

Promotion, the concept step, lost outright (bpc 2.10 against 1.87). A concept keyed by exact context strings cannot fire for a context it has never seen, so it compresses without generalizing. We parked it with a reason.
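
The failure is easy to see in miniature. In this sketch the concept map and the context strings are invented for illustration; the only point is that a lookup keyed on the exact string has nothing to return for a string it never stored.

```python
# Hypothetical concept assignments, keyed by exact context string.
concept_of = {
    "the c": 0,
    "the d": 0,  # promoted into the same concept as "the c"
    "a ca": 1,
}

def lookup_concept(context):
    """Return the concept id for a context, or None if it was never seen."""
    return concept_of.get(context)

seen = lookup_concept("the c")    # stored, so the concept fires
unseen = lookup_concept("the m")  # similar string, but never stored
print(seen, unseen)
```

Merging "the c" and "the d" saves an entry, which is compression. It does nothing for "the m", which is where generalization would have to show up.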

The lesson

Sleep over a count memory works once. One offline pass of pruning and lossless distilling shrinks the memory by a third and sharpens the rare-context tail, with no new data. A second, harder pass starts grinding the specific memories into generic mush. The rare contexts keep gaining and the common contexts keep losing, and over-refinement is the moment the loss overtakes the gain.

The keeper for the cortex is plain. Dream once. Prune the contexts you cannot trust, distill the specifics that only repeat their backoff, and stop. You get a smaller memory that predicts as well or better and reads the rare tail more clearly. Do not keep dreaming. The model that refines forever forgets what made it sharp, exactly as the idea warned.

Lineage

Grew from the JEPA representations that taught us a counted latent cannot collapse, from the vote that remembers and its bounded accumulation, and from the one part repeated whose count tables are the memory we slept over. The sleep idea itself is Letta's, from Towards Agents that Learn: refine memory, not weights, and beware the generic-and-lossy drift.

Led to the open follow-up: route a context through the online concept cluster, not its raw string, so promotion can reach contexts it has never seen.

Thread: representations and online learning. It sits on the source-mining branch that grew out of the flat fourth level, the result that sent us looking past fixed local levels.
