From a Chat Window to an Operating System

30 May 2026 Ai

I did not set out to build an operating system. I set out to stop rewriting the same prompt every Monday.

What I have now is a set of skills, knowledge-base notes, scheduled rhythms, and a couple of bots that run a meaningful slice of my week as a product executive. It looks, from the outside, like it was designed. It was not. It accreted, one frustration at a time, over about a year. Each step solved the problem the previous step had created, and only in hindsight do the steps line up into something that deserves the word "system."

This is the honest version of that path: twelve steps, the things that broke along the way, and what each one was actually worth. If you are somewhere on the early part of it, I hope this saves you a few of the wrong turns.

Six layers of the operating system, stacked from bottom to top: a chat window, a memory, live tools, skills, a self-maintaining layer, and work that runs outside the chat window. Each layer is paired with the rule that made it hold. — Each phase added a layer, and each layer needed one rule to stop it falling over. The whole essay is the story of those six layers and six rules.

Phase one: learning to talk to it

The first thing I did was the thing everyone does. I opened a chat window and asked questions. Summarise this. Draft that. What are the trade-offs here. It was useful in the way a sharp intern is useful: quick, broad, and completely without context about my actual situation.

The first real improvement was unglamorous. I stopped starting from scratch every time and put standing context into the instructions: who I am, what I care about, how I like things written, what "good" looks like for me. The quality jump from that single change was larger than any clever prompt I have written since. Most people under-invest here for years. The model is not the bottleneck; the context you give it is.

Then I started attaching files. A strategy doc, a planning sheet, a transcript. The answers got specific instead of generic, and I learned the lesson that governs everything downstream: the model is only ever as good as what it can see. A brilliant model reasoning over nothing produces a confident, plausible, useless answer.

The fourth step was a shift in how I asked. Instead of broad questions, I brought specific problems with the relevant material attached. "Here is the doc, here is the decision I am trying to make, here is what I am worried about." That is when it stopped feeling like a search engine and started feeling like a colleague. The work was no longer prompting. The work was assembling the right context and pointing it at the right problem.

The cost of this phase was that none of it persisted. Every conversation started cold. I was re-explaining my world every single time, and the better the answers got, the more that re-explaining grated.

Phase two: giving it a memory

So I built it a memory. Not a clever one. A knowledge base, in an ordinary docs tool, holding the stable facts of my world: how our metrics are defined, what the strategy is, who owns what, the history behind the decisions that keep getting questioned. The things I was tired of re-typing went into notes, and the conversations started from those notes.

This was the first thing that felt like infrastructure rather than usage, and it broke almost immediately in a way I did not expect. As the knowledge base grew, the same fact ended up in two notes, the two copies drifted, and I got two different answers to the same question depending on which note got read. The memory was now actively misleading me.

The fix was a rule I have since come to treat as load-bearing: every fact gets exactly one home, and everywhere else points at it instead of restating it. (I wrote up why that rule matters more than it looks in an essay on the canonical-home rule.) Alongside it I built an index: a single note that says where everything lives, so a conversation could find the right note without me naming it. The index turned a pile of notes into something navigable. It also, much later, became the thing that let me add new capabilities without rewiring the old ones, which I did not appreciate at the time but now consider one of the most important pieces of the whole setup.

The value of this phase was that the system finally knew things between conversations. The cost was that I now had a knowledge base to maintain, and maintenance is where most personal systems quietly die.

Phase three: giving it hands

A knowledge base holds what is stable. It does not hold what is true today: the live churn number, this week's delivery status, what a customer said on Tuesday. For a while I was copying that data in by hand, which defeated the point.

So I connected tools. Live sources, wired in directly: the product analytics, the customer-success platform, the ticketing system, the meeting-notes app, chat. Suddenly a question like "how is this priority actually doing" could be answered against real data instead of my fuzzy recollection of a dashboard I half-remembered.

This created a new and specific kind of confusion. With seven or eight live sources connected, the system did not always know which one owned which question, and would occasionally answer an adoption question from the wrong place, or reach for the knowledge base when it should have pulled live data. The answers were worse in a way that was hard to spot, because they still looked confident.

The fix was to write down the routing: which kind of question goes to which source, and which file types and data types live where. A small map that says product-usage goes here, customer health goes there, strategy and definitions stay in the knowledge base. Boring to write, and it removed a whole category of quietly-wrong answers. The principle underneath it is the one I keep relearning: a system gets more useful when you connect more to it, and less trustworthy at exactly the same time, unless you also tell it how to choose.

Phase four: turning prompts into skills

By this point I had a handful of long prompts I was reusing constantly. A meeting-prep prompt that pulled context from the calendar, the notes app, and the knowledge base. A document-review prompt that ran a doc through a fixed rubric before I approved it. They worked, but they lived in a text file I copied from, and I tweaked them slightly every time, which meant they slowly diverged and I could never remember which version was the good one.

Turning them into skills fixed that. A skill is just a saved, named capability: the prompt, the steps, the sources it reads, packaged so I invoke it by name instead of pasting it. Meeting prep became a thing I asked for rather than a thing I assembled. Document triage became a four-check rubric I ran on everything, not just the docs that looked important enough to bother with.

The value here was that refinement stuck. A prompt I had sharpened over twenty uses stopped being something I had to remember and became something the system just did, the same way, every time. The work I had put into getting the prompt right stopped evaporating between uses.

The issue was the obvious one in hindsight. Each skill restated my tone, my formatting rules, my output preferences, because I had pasted them in. When I wanted to change how things were written, I had to change it in nine places, and I would miss one.

Phase five: the system that maintains itself

Two problems had been building the whole time. The knowledge base needed tending (stale notes, duplicated facts, indexes pointing at notes that no longer existed), and I was not tending it. And the skills were drifting, each one carrying its own slightly-different copy of rules that should have been shared.

So I built skills whose job was to maintain the system itself. A daily structural check on the knowledge base that catches broken index pointers and orphaned notes within a day instead of letting them rot until I tripped over them. A monthly content audit that hunts for duplication and staleness. Scans that check the knowledge base has not accumulated anything sensitive or anything that looks like a planted instruction. These are unglamorous and they are the difference between a system that compounds and one that decays.

Then I went one level further and built a skill that helps me build and edit skills, with a check wired into it that refuses to ship a skill which restates shared rules, hardcodes an ID that should be looked up, or collides with another skill's write target. That last one matters more than it sounds: I cannot personally remember, while writing the twelfth skill, every note the previous eleven already write to. A check that sees the whole set can. I wrote about why that gate is the thing that keeps the quality from sliding here.

This is the phase where it stopped being a collection of tools and became a system, because it had started looking after itself. The value was that I could keep adding to it without the maintenance burden growing faster than the usefulness. The cost was conceptual: I had to accept that a good chunk of my building effort would go into plumbing nobody would ever see, rather than into shiny new capabilities.

Phase six: leaving the chat window

The last step took it out of the chat window entirely. I wanted a question answered without me being in the loop at all: a teammate asks something in chat, and a bot answers it from our public support material, accurately, without me or anyone else fielding it.

Building that meant moving from a chat interface to writing actual software with an AI coding tool. The bot reads from public support pages, finds the relevant answer, and replies in the channel. It was the steepest step on the curve and the one that taught me the most, because a bot that runs unattended has nowhere to hide. In a chat window, a weak answer just sits there and I quietly ignore it. A bot posts its weak answer to a channel full of colleagues. Everything I had learned about context, routing, and not letting a confident-but-wrong answer through suddenly had real stakes.

The value was a category shift. Up to here, the system made me faster. This was the first piece that did work instead of me, for other people, on its own. That is a different thing, and it changes what you are willing to trust the system with.

What the whole thing was actually worth

Stepping back, the gain was not any single skill. It was that a real slice of my recurring work, the Monday assembly of where things stand, the prep before meetings, the review before I approve a document, the end-of-day close that feeds a weekly reflection, now runs on rails I trust. The skills compose: small autonomous pieces run in the background and a Monday orchestration pulls them together before I sit down. I spend less time assembling context and more time on the judgment that context was always in service of.

If I were starting again, I would do it in the same order, because each step genuinely depended on the one before. But I would believe sooner that the boring steps are the important ones. The context, the single home for each fact, the routing map, the maintenance skills, the gate that holds quality when discipline slips. None of those are the part you show people. All of them are the part that decides whether you are still using the thing in six months.

The whole system is published, MIT-licensed, in the library, as templates you can fill in with your own world. It is not a product and it is not advice. It is one person's working notes, left out where someone a few steps behind might find them useful.

The skill that took the longest to build was the one that builds the other skills. That ordering tells you most of what I learned.

The shift that mattered was not a better chatbot. It was treating my own attention as the thing worth building an operating system around. The whole system is published, MIT-licensed, in the library. Find me on LinkedIn if any of this lands, and the rest of the series walks through the pieces.