How we work

We publish the failures

A research blog that only shows wins is a sales brochure. This one publishes every experiment, including the ones that lost. The raytracing idea that lost to a bigram has a post. The topic cache that helped a little and then hurt coherence has a post. The boundary signal that scored below random has a post.

The losses carry as much information as the wins. A clean negative result closes a door, names why it closed, and points at the door worth trying next. We write each one with the same care: the question, the attempt, the number, the lesson.

We judge an idea on the right axis

The search space here is enormous. A new idea almost always arrives weak. "No better than a bigram" is the normal first result of a real idea, not a verdict on it. So we do not kill an idea on the headline metric. We check the other dials first.

The pattern repeats across these posts. An idea ties on top-one accuracy but cuts perplexity threefold. An idea loses on clean text but wins the moment the input is noisy. An idea loses overall but wins decisively on the slice where the local context has run out. Each of those would have been thrown away if we had only read the headline number.

This is the rule we hold ourselves to: before shelving an idea, measure calibration, robustness, generalization, rare-context behavior, and transfer. A fragile idea usually wins first on a dimension you were not headlining.
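To make the "other dials" concrete, here is a minimal sketch of how two models can tie on top-one accuracy and still differ sharply on perplexity. The function names and the toy probability tables are assumptions made up for illustration; they are not numbers from any post.

```python
import math


def top_one_accuracy(dists, targets):
    """Fraction of positions where the highest-probability token is the target."""
    hits = sum(max(d, key=d.get) == t for d, t in zip(dists, targets))
    return hits / len(targets)


def perplexity(dists, targets, floor=1e-9):
    """exp of the mean negative log-probability given to the true tokens."""
    nll = -sum(math.log(max(d.get(t, 0.0), floor)) for d, t in zip(dists, targets))
    return math.exp(nll / len(targets))


targets = ["a", "b"]

# Two hypothetical models. Both rank "a" first at each position, so both get
# the first token right and the second wrong.
overconfident = [{"a": 0.6, "b": 0.4}, {"a": 0.99, "b": 0.01}]
calibrated = [{"a": 0.6, "b": 0.4}, {"a": 0.6, "b": 0.4}]

for name, dists in [("overconfident", overconfident), ("calibrated", calibrated)]:
    acc = top_one_accuracy(dists, targets)
    ppl = perplexity(dists, targets)
    print(f"{name}: top-1 = {acc:.2f}, perplexity = {ppl:.2f}")
```

Both toy models score the same top-one accuracy, but the one that spreads its probability more honestly pays a much smaller perplexity penalty for its miss. A headline-only comparison would call them equal.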

We give fragile ideas room to grow

We borrow a stance from Jony Ive on the fragility of ideas. An idea, by definition, is unresolved. If it were finished, it would not be an idea. So we are tender with one in development. We judge it by trajectory, not by level. We set a budget of ten to twenty real variations before any decision to stop, because in a high-dimensional space the wins compound late.

A shelved idea goes to a graveyard, not a trash can. We record why it stopped and the step it died at, so it can come back when a complementary piece arrives. Three ideas sit in that graveyard now, each parked with a note on the axis it might still win.

Everything learns online, by counting

There is one hard rule across every experiment. No gradient descent. No backprop. No batch optimization that revisits the data.

Every model here is a single streaming pass of counters. It sees a piece of text once, updates its counts, and predicts by looking them up. This is the property a transformer with frozen weights cannot match: the counter model learns from every sentence as it arrives, it never retrains, and a sparse update for new material barely touches what it already knew, so it does not forget.
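As a rough illustration of "a single streaming pass of counters", here is a minimal sketch using a plain bigram table. The class name `StreamingBigram` and its methods are hypothetical; the models in the posts may be structured quite differently.

```python
from collections import Counter, defaultdict


class StreamingBigram:
    """Hypothetical count-based next-token model: one pass, no gradients."""

    def __init__(self):
        # counts[prev][nxt] = how many times nxt followed prev
        self.counts = defaultdict(Counter)

    def update(self, tokens):
        """See a piece of text once and bump only the counters it touches."""
        for prev, nxt in zip(tokens, tokens[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, prev):
        """Predict by lookup: the most frequent follower, or None if unseen."""
        followers = self.counts.get(prev)
        if not followers:
            return None
        return followers.most_common(1)[0][0]


model = StreamingBigram()
model.update("the cat sat on the mat".split())
model.update("the cat ran".split())
print(model.predict("the"))  # cat
```

In this sketch, new text only increments the counters for the token pairs it contains. Counts for contexts that do not appear in the new text are left exactly as they were, which is the sense in which a sparse update barely touches prior knowledge.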

It also keeps the whole mind inspectable. The counts, the discovered words, the concepts, all of it lives in a Prism document you can open and read. That is the bet underneath the blog: continual, compositional, inspectable learning is the thing a frozen function cannot give you, and it may be the truer road.

The post template

Every post follows the same shape, so each one reads in a couple of minutes:

The question, and why it mattered: what we wanted to know, in plain words.
What we tried: the setup, without the plumbing.
What happened: the one number, or the small table, that carries the result.
The lesson: the honest takeaway, the negatives included, and what it changes about where we go next.
