Karpathy’s Wiki Idea Is Right. It Just Doesn’t Tell You How Deep to Go.

Andrej Karpathy posted on April 3rd about a shift in how he uses AI: away from generating code on demand, toward building a self-maintaining personal knowledge base. Something that grows over time, stays readable, and doesn’t reset every time you open a new session. The post went viral within days. Then MemPalace dropped (I’ve already written about that one), and suddenly everyone was talking about AI memory at once. This post is about Karpathy’s idea specifically, because I think he got the core of it right, and I also think there’s a gap he didn’t address.

The core idea: if you’ve actually processed something — thought it through, made decisions with it — a compiled record of that thinking beats starting from scratch every time. The AI reads what you’ve already built instead of guessing. Nothing disappears between sessions. For someone working on one deep research topic, it’s a clean and honest design.

My problem is that I don’t have one topic. I have six workstreams running simultaneously, and they don’t behave the same way.

There’s a freelance business with clients, pricing decisions, and operational history that compounds over time. A personal blog. A YouTube channel. An Indonesian blog. A LinkedIn publishing queue. A book that doesn’t exist yet but will. Each one needs a different kind of memory. Some need deep accumulated context — decisions that reference earlier decisions, strategy that builds on itself, history that matters when a client questions a call you made three months ago. Others just need to answer “where did I leave off.” If I give all of them the same depth, I either load irrelevant context into every session, or I build a maintenance surface I’ll quietly abandon after a few weeks.

There’s another layer to this that Karpathy’s setup doesn’t account for: I don’t have one agent doing the work. I’ve written about this separately. Claude assists my writing: fact-checking, claim-checking, helping me think through structure faster than I could alone. I decide what goes up and what doesn’t. Codex handles the building side. The roles are written down and don’t drift. That division changes the memory problem. It’s not just about how much context to load. It’s about which agent needs what, and when. Claude helping me work through a post for this blog needs a different surface than Codex building out a client workflow. Loading the full freelance business wiki into a writing session is noise. Loading only the publishing context into a build session leaves out what the build actually depends on.
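To make the "which agent needs what" point concrete, here is a minimal sketch of that routing. Everything named in it is an assumption for illustration: the surface names (`publishing_context`, `freelance_wiki`), the task labels, and the idea that a build session loads the freelance wiki on top of the publishing context. It is not a description of the real setup, just one way the rule could be written down.

```python
from dataclasses import dataclass

# Hypothetical names throughout: these surfaces and task labels are
# illustrative assumptions, not real files from the setup described here.
SURFACES_BY_TASK = {
    "writing": ["publishing_context"],
    "build": ["publishing_context", "freelance_wiki"],
}

# Roles are written down and fixed: each agent only takes its own kind of task.
TASKS_BY_AGENT = {
    "claude": {"writing"},
    "codex": {"build"},
}


@dataclass(frozen=True)
class Session:
    agent: str  # which agent is doing the work
    task: str   # what kind of work the session is for


def surfaces_for(session: Session) -> list[str]:
    """Return the memory surfaces to load for this agent and task."""
    allowed = TASKS_BY_AGENT.get(session.agent, set())
    if session.task not in allowed:
        raise ValueError(f"{session.agent} does not handle {session.task} tasks")
    # Return a copy so callers cannot mutate the shared table.
    return list(SURFACES_BY_TASK[session.task])


if __name__ == "__main__":
    print(surfaces_for(Session("claude", "writing")))
    print(surfaces_for(Session("codex", "build")))
```

The design choice worth noticing is that the lookup is keyed on the task, and the agent only gates which tasks are allowed. That keeps the roles from drifting: an agent asking for a surface outside its role fails loudly instead of quietly loading extra context.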

So I tiered everything by how much the work actually demands, and by who’s doing the work.

The freelance business gets the deepest structure. Repeated judgment calls, procedures that evolve, strategy that accumulates — when that context disappears you end up remaking decisions you already made. A book project will need the same depth when it starts, for a different reason: you can’t pressure-test an argument across sessions without a record of where the argument has already been. Both fit exactly the use case Karpathy was describing, and both are heavy enough to justify a properly maintained wiki.

The content work is different. YouTube, the personal blog, LinkedIn — what these need is continuity, not depth. What’s in progress, what’s been published, what angles are already used up. I don’t need months of context to work through a YouTube script. I need to know what the last three videos covered.

Then there’s the operating system itself, the thing I’m building all of this inside. It’s getting a lot of my attention right now because it’s still under construction. Once it stabilizes, it becomes a light reference layer: a few durable notes, nothing that needs regular upkeep.

LinkedIn surprised me. I post twice a week and had been treating it as too lightweight to bother organizing. But low-stakes and high-frequency turned out to be its own problem. Without any structure I kept losing the thread — what angles I’d already run, what was sitting half-finished, what I’d told myself I’d get to two weeks ago. The fix wasn’t a deep knowledge base. It was a running list and a status tracker. Frequency creates its own continuity problem, and a heavy system would have been the wrong answer for it.
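The tiers described above can be sketched as a small lookup table. This is a rough illustration under stated assumptions, not the actual Lowrobb OS layout: the tier names, the workstream keys, and the note types in `TIER_LOADS` are all made up for the example. The Indonesian blog is left out because its tier isn't stated here, and the operating system is shown in the tier it is expected to settle into once it stabilizes.

```python
from enum import Enum


class Tier(Enum):
    DEEP = "deep"              # maintained wiki: decisions, procedures, strategy
    CONTINUITY = "continuity"  # running list plus a status tracker
    REFERENCE = "reference"    # a few durable notes, no regular upkeep


# Hypothetical workstream keys; the tier assignments follow the text above.
WORKSTREAM_TIERS = {
    "freelance_business": Tier.DEEP,
    "book": Tier.DEEP,
    "youtube": Tier.CONTINUITY,
    "personal_blog": Tier.CONTINUITY,
    "linkedin": Tier.CONTINUITY,
    "operating_system": Tier.REFERENCE,  # its tier once it stabilizes
}

# Hypothetical note types each tier would load into a session.
TIER_LOADS = {
    Tier.DEEP: ["decision_log", "procedures", "strategy_notes"],
    Tier.CONTINUITY: ["in_progress", "published", "used_angles"],
    Tier.REFERENCE: ["durable_notes"],
}


def load_plan(workstream: str) -> list[str]:
    """Return the note types to load for a workstream, based on its tier."""
    tier = WORKSTREAM_TIERS[workstream]
    return list(TIER_LOADS[tier])


if __name__ == "__main__":
    for name in WORKSTREAM_TIERS:
        print(name, "->", load_plan(name))
```

Moving a workstream up a tier is a one-line change in `WORKSTREAM_TIERS`, which fits the principle of adding depth only once the work demands it.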

Karpathy built the right thing for a specific shape of work. What I had to figure out was how to apply the same underlying principle — earn structure before you build it, don’t add depth until the work genuinely demands it — across workstreams that aren’t shaped the same way, handled by agents that aren’t interchangeable. The tier model is on paper now. Building it out across all six is what comes next. That’s the Lowrobb OS case study, and I’ll write about it when there’s something real to show.