AI OTR

From a Chat Window to an Operating System

Sat, 30 May 2026 00:00:00 GMT

I did not set out to build an operating system. I set out to stop rewriting the same prompt every Monday.

What I have now is a set of skills, knowledge-base notes, scheduled rhythms, and a couple of bots that run a meaningful slice of my week as a product executive. It looks, from the outside, like it was designed. It was not. It accreted, one frustration at a time, over about a year. Each step solved the problem the previous step had created, and only in hindsight do the steps line up into something that deserves the word "system."

This is the honest version of that path: twelve steps, the things that broke along the way, and what each one was actually worth. If you are somewhere on the early part of it, I hope this saves you a few of the wrong turns.

Phase one: learning to talk to it

The first thing I did was the thing everyone does. I opened a chat window and asked questions. Summarise this. Draft that. What are the trade-offs here? It was useful in the way a sharp intern is useful: quick, broad, and completely without context about my actual situation.

The first real improvement was unglamorous. I stopped starting from scratch every time and put standing context into the instructions: who I am, what I care about, how I like things written, what "good" looks like for me. The quality jump from that single change was larger than any clever prompt I have written since. Most people under-invest here for years. The model is not the bottleneck; the context you give it is.

Then I started attaching files. A strategy doc, a planning sheet, a transcript. The answers got specific instead of generic, and I learned the lesson that governs everything downstream: the model is only ever as good as what it can see. A brilliant model reasoning over nothing produces a confident, plausible, useless answer.

The fourth step was a shift in how I asked. Instead of broad questions, I brought specific problems with the relevant material attached. "Here is the doc, here is the decision I am trying to make, here is what I am worried about." That is when it stopped feeling like a search engine and started feeling like a colleague. The work was no longer prompting. The work was assembling the right context and pointing it at the right problem.

The cost of this phase was that none of it persisted. Every conversation started cold. I was re-explaining my world every single time, and the better the answers got, the more that re-explaining grated.

Phase two: giving it a memory

So I built it a memory. Not a clever one. A knowledge base, in an ordinary docs tool, holding the stable facts of my world: how our metrics are defined, what the strategy is, who owns what, the history behind the decisions that keep getting questioned. The things I was tired of re-typing went into notes, and the conversations started from those notes.

This was the first thing that felt like infrastructure rather than usage, and it broke almost immediately in a way I did not expect. As the knowledge base grew, the same fact ended up in two notes, the two copies drifted, and I got two different answers to the same question depending on which note got read. The memory was now actively misleading me.

The fix was a rule I have since come to treat as load-bearing: every fact gets exactly one home, and everywhere else points at it instead of restating it. (I wrote up why that rule matters more than it looks in an essay on the canonical-home rule.) Alongside it I built an index: a single note that says where everything lives, so a conversation could find the right note without me naming it. The index turned a pile of notes into something navigable. It also, much later, became the thing that let me add new capabilities without rewiring the old ones, which I did not appreciate at the time but now consider one of the most important pieces of the whole setup.
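A minimal sketch of the one-home rule and the index, assuming the index can be represented as a plain mapping from fact to note. The fact names, note paths, and both function names are hypothetical illustrations, not the layout of the published templates.

```python
# Hypothetical index: every fact has exactly one home note.
INDEX = {
    "churn_definition": "metrics/definitions.md",
    "current_strategy": "strategy/current.md",
    "team_ownership": "org/ownership.md",
}


def home_of(fact: str) -> str:
    """Return the single note that owns a fact, or fail loudly if none is registered."""
    if fact not in INDEX:
        raise KeyError(f"no canonical home registered for {fact!r}")
    return INDEX[fact]


def find_restatements(stated_facts: dict[str, set[str]]) -> dict[str, list[str]]:
    """Find notes that restate a fact instead of pointing at its home.

    `stated_facts` maps a note path to the facts that note states in full
    (links to the home note do not count as stating the fact).
    """
    violations: dict[str, list[str]] = {}
    for note, facts in stated_facts.items():
        for fact in facts:
            if fact in INDEX and INDEX[fact] != note:
                violations.setdefault(fact, []).append(note)
    return violations


example = {
    "metrics/definitions.md": {"churn_definition"},
    "reviews/quarterly.md": {"churn_definition"},  # a drifting second copy
}
print(find_restatements(example))
```

The second function is the part that catches the drift described above: any note other than the registered home that states the fact in full gets flagged, so the copy can be replaced with a pointer.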

The value of this phase was that the system finally knew things between conversations. The cost was that I now had a knowledge base to maintain, and maintenance is where most personal systems quietly die.

Phase three: giving it hands

A knowledge base holds what is stable. It does not hold what is true today: the live churn number, this week's delivery status, what a customer said on Tuesday. For a while I was copying that data in by hand, which defeated the point.

So I connected tools. Live sources, wired in directly: the product analytics, the customer-success platform, the ticketing system, the meeting-notes app, chat. Suddenly a question like "how is this priority actually doing" could be answered against real data instead of my fuzzy recollection of a dashboard I half-remembered.

This created a new and specific kind of confusion. With seven or eight live sources connected, the system did not always know which one owned which question, and would occasionally answer an adoption question from the wrong place, or reach for the knowledge base when it should have pulled live data. The answers were worse in a way that was hard to spot, because they still looked confident.

The fix was to write down the routing: which kind of question goes to which source, and which file types and data types live where. A small map that says product-usage goes here, customer health goes there, strategy and definitions stay in the knowledge base. Boring to write, and it removed a whole category of quietly-wrong answers. The principle underneath it is the one I keep relearning: a system gets more useful when you connect more to it, and less trustworthy at exactly the same time, unless you also tell it how to choose.
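The routing map can be as small as a lookup table. This is a sketch under assumptions: the question kinds and source names are hypothetical labels, and the delivery-status route is my guess at a plausible entry rather than something the real map is known to contain.

```python
# Hypothetical routing map: which kind of question goes to which source.
ROUTES = {
    "product_usage": "product_analytics",
    "customer_health": "customer_success_platform",
    "delivery_status": "ticketing_system",  # assumed entry, for illustration
    "strategy": "knowledge_base",
    "definitions": "knowledge_base",
}


def route(question_kind: str) -> str:
    """Return the source that owns this kind of question.

    Unknown kinds raise instead of falling back to a default source,
    so a missing route fails loudly rather than answering from the wrong place.
    """
    if question_kind not in ROUTES:
        raise LookupError(f"no route defined for {question_kind!r}")
    return ROUTES[question_kind]


print(route("product_usage"))
print(route("definitions"))
```

Refusing to guess is the design choice that matters here. A default source would bring back exactly the confident, quietly-wrong answers the map exists to remove.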

Phase four: turning prompts into skills

By this point I had a handful of long prompts I was reusing constantly. A meeting-prep prompt that pulled context from the calendar, the notes app, and the knowledge base. A document-review prompt that ran a doc through a fixed rubric before I approved it. They worked, but they lived in a text file I copied from, and I tweaked them slightly every time, which meant they slowly diverged and I could never remember which version was the good one.

Turning them into skills fixed that. A skill is just a saved, named capability: the prompt, the steps, the sources it reads, packaged so I invoke it by name instead of pasting it. Meeting prep became a thing I asked for rather than a thing I assembled. Document triage became a four-check rubric I ran on everything, not just the docs that looked important enough to bother with.
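To make "saved, named capability" concrete, here is a rough sketch of a skill as a small record plus a registry. The class, the field names, and the example skill are all hypothetical; real tools package skills in their own formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """A named capability: the prompt, the steps, and the sources it reads."""
    name: str
    prompt: str
    steps: tuple[str, ...]
    sources: tuple[str, ...]


REGISTRY: dict[str, Skill] = {}


def register(skill: Skill) -> None:
    """Save a skill under its name; refuse to silently overwrite an existing one."""
    if skill.name in REGISTRY:
        raise ValueError(f"skill {skill.name!r} already exists")
    REGISTRY[skill.name] = skill


def invoke(name: str, **inputs: str) -> str:
    """Render the saved prompt for this use, instead of pasting and tweaking a copy."""
    return REGISTRY[name].prompt.format(**inputs)


register(Skill(
    name="meeting_prep",
    prompt="Prepare me for the meeting '{meeting}' using the listed sources.",
    steps=("find the calendar entry", "pull prior notes", "summarise open questions"),
    sources=("calendar", "notes_app", "knowledge_base"),
))
print(invoke("meeting_prep", meeting="Roadmap review"))
```

Because the record is frozen and the registry rejects duplicates, there is only ever one version of each skill, which is the property the copied text file could not provide.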

The value here was that refinement stuck. A prompt I had sharpened over twenty uses stopped being something I had to remember and became something the system just did, the same way, every time. The work I had put into getting the prompt right stopped evaporating between uses.

The issue was the obvious one in hindsight. Each skill restated my tone, my formatting rules, my output preferences, because I had pasted them in. When I wanted to change how things were written, I had to change it in nine places, and I would miss one.

Phase five: the system that maintains itself

Two problems had been building the whole time. The knowledge base needed tending (stale notes, duplicated facts, indexes pointing at notes that no longer existed), and I was not tending it. And the skills were drifting, each one carrying its own slightly-different copy of rules that should have been shared.

So I built skills whose job was to maintain the system itself. A daily structural check on the knowledge base that catches broken index pointers and orphaned notes within a day instead of letting them rot until I tripped over them. A monthly content audit that hunts for duplication and staleness. Scans that check the knowledge base has not accumulated anything sensitive or anything that looks like a planted instruction. These are unglamorous and they are the difference between a system that compounds and one that decays.
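The daily structural check reduces to two set comparisons. This sketch assumes the index is a mapping from topic to note path and that the set of existing notes is available as a set of paths; the function name and the sample paths are hypothetical.

```python
def structural_check(index: dict[str, str], notes: set[str],
                     index_note: str = "index.md") -> dict[str, list[str]]:
    """Report index pointers to missing notes, and notes no index entry reaches."""
    targets = set(index.values())
    broken = sorted(t for t in targets if t not in notes)
    orphaned = sorted(n for n in notes if n not in targets and n != index_note)
    return {"broken_pointers": broken, "orphaned_notes": orphaned}


index = {
    "metrics": "metrics/definitions.md",
    "strategy": "strategy/current.md",  # this note has been deleted
}
notes = {"index.md", "metrics/definitions.md", "scratch/old-ideas.md"}

report = structural_check(index, notes)
print(report)
```

Running something like this on a schedule is what turns a broken pointer from a surprise weeks later into a line in the next morning's report.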

Then I went one level further and built a skill that helps me build and edit skills, with a check wired into it that refuses to ship a skill which restates shared rules, hardcodes an ID that should be looked up, or collides with another skill's write target. That last one matters more than it sounds: I cannot personally remember, while writing the twelfth skill, every note the previous eleven already write to. A check that sees the whole set can. I wrote separately about why that gate is the thing that keeps the quality from sliding.
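A simplified sketch of such a gate, covering the three refusals named above. Everything specific here is an assumption made for illustration: the marker strings that count as "restating shared rules", the 32-hex-character shape of an ID, and the dictionary layout of a skill.

```python
import re

# Assumed markers for rules that belong in one shared note, not in each skill.
SHARED_RULE_MARKERS = ("tone:", "formatting:", "output preferences:")
# Assumed ID shape: a 32-character hexadecimal string.
HARDCODED_ID = re.compile(r"\b[0-9a-f]{32}\b")


def gate(candidate: dict, existing: list[dict]) -> list[str]:
    """Return the reasons a skill must not ship; an empty list means it passes."""
    problems = []
    text = candidate["prompt"].lower()
    for marker in SHARED_RULE_MARKERS:
        if marker in text:
            problems.append(f"restates a shared rule ({marker!r})")
    if HARDCODED_ID.search(text):
        problems.append("hardcodes an ID that should be looked up")
    for other in existing:
        overlap = set(candidate["writes_to"]) & set(other["writes_to"])
        for target in sorted(overlap):
            problems.append(f"write target {target!r} collides with skill {other['name']!r}")
    return problems


existing = [{"name": "weekly_summary", "prompt": "", "writes_to": ["weekly/summary.md"]}]
bad = {
    "name": "friday_close",
    "prompt": "Tone: be brief. Write to page 0123456789abcdef0123456789abcdef.",
    "writes_to": ["weekly/summary.md"],
}
for problem in gate(bad, existing):
    print(problem)
```

The collision check is the one a person cannot do from memory, because it needs every existing skill's write targets in view at once. The other two are cheap pattern matches, and a real gate would likely need something more careful than substring markers.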

This is the phase where it stopped being a collection of tools and became a system, because it had started looking after itself. The value was that I could keep adding to it without the maintenance burden growing faster than the usefulness. The cost was conceptual: I had to accept that a good chunk of my building effort would go into plumbing nobody would ever see, rather than into shiny new capabilities.

Phase six: leaving the chat window

The last step took it out of the chat window entirely. I wanted a question answered without me being in the loop at all: a teammate asks something in chat, and a bot answers it from our public support material, accurately, without me or anyone else fielding it.

Building that meant moving from a chat interface to writing actual software with an AI coding tool. The bot reads from public support pages, finds the relevant answer, and replies in the channel. It was the steepest step on the curve and the one that taught me the most, because a bot that runs unattended has nowhere to hide. In a chat window, a weak answer just sits there and I quietly ignore it. A bot posts its weak answer to a channel full of colleagues. Everything I had learned about context, routing, and not letting a confident-but-wrong answer through suddenly had real stakes.
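The "nowhere to hide" point suggests one design rule worth sketching: an unattended bot should stay silent when its best match is weak. The example below is not the bot described here. It swaps in a deliberately crude word-overlap score, a hypothetical stopword list, and an arbitrary threshold, purely to show the refusal path.

```python
import re

# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"how", "do", "i", "my", "the", "a", "to", "is", "what", "your"}


def keywords(text: str) -> set[str]:
    """Lowercase word set with stopwords removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def answer(question: str, pages: dict[str, str], min_overlap: float = 0.5):
    """Reply from the best-matching page, or return None when the match is weak."""
    q = keywords(question)
    if not q:
        return None
    best_title, best_score = None, 0.0
    for title, body in pages.items():
        score = len(q & keywords(body)) / len(q)
        if score > best_score:
            best_title, best_score = title, score
    if best_title is None or best_score < min_overlap:
        return None  # say nothing rather than post a confident-but-wrong answer
    return f"See '{best_title}': {pages[best_title]}"


pages = {
    "Resetting a password": "To reset your password, open Settings and choose Reset password.",
    "Exporting data": "Export data from the Reports tab as CSV.",
}
print(answer("How do I reset my password?", pages))
print(answer("What is the refund policy?", pages))
```

In a chat window the second question would have produced some plausible paragraph. In a channel full of colleagues, returning nothing is the better failure.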

The value was a category shift. Up to here, the system made me faster. This was the first piece that did work instead of me, for other people, on its own. That is a different thing, and it changes what you are willing to trust the system with.

What the whole thing was actually worth

Stepping back, the gain was not any single skill. It was that a real slice of my recurring work (the Monday assembly of where things stand, the prep before meetings, the review before I approve a document, the end-of-day close that feeds a weekly reflection) now runs on rails I trust. The skills compose: small autonomous pieces run in the background and a Monday orchestration pulls them together before I sit down. I spend less time assembling context and more time on the judgment that context was always in service of.

If I were starting again, I would do it in the same order, because each step genuinely depended on the one before. But I would believe sooner that the boring steps are the important ones. The context, the single home for each fact, the routing map, the maintenance skills, the gate that holds quality when discipline slips. None of those are the part you show people. All of them are the part that decides whether you are still using the thing in six months.

The whole system is published, MIT-licensed, in the library, as templates you can fill in with your own world. It is not a product and it is not advice. It is one person's working notes, left out where someone a few steps behind might find them useful.

The skill that took the longest to build was the one that builds the other skills. That ordering tells you most of what I learned.

The shift that mattered was not a better chatbot. It was treating my own attention as the thing worth building an operating system around. Find me on LinkedIn if any of this lands; the rest of the series walks through the pieces.

The 80% Problem

Tue, 19 May 2026 00:00:00 GMT

Most executives end the year hitting most of their goals. Then the company quietly underperforms, and nobody can fully explain why.

The numbers, looked at one at a time, are fine. Eight of the ten OKRs landed green. Three of the four strategic bets paid off. The board pack tells a story that survives scrutiny. The annual review is calm.

And yet somewhere, and you can feel it before you can prove it, the year was worse than the scoreboard says.

This is the 80% problem.

The system is built to produce 80%

Quarterly goals are designed to be achievable. That's their job. A goal nobody hits demoralises the team; a goal everyone hits ceases to be a goal. So we settle on stretch targets that the best version of the team can hit with effort, which means the expected outcome is somewhere around eighty percent.

Multiply that across a function. Eighty percent of product on time, eighty percent of hiring filled, eighty percent of the strategic bets paying off, eighty percent of the customer commitments held. Each number, on its own, looks like a respectable quarter.

But the 20% that gets missed is not random. It's not evenly distributed across the company in a way that averages out. The misses cluster. They concentrate in the places where attention was thinnest, where ownership was murkiest, where the political cost of raising a hand was highest.

The misses are correlated, and the correlation is where the company actually lives.

The misses compound across functions

A 20% shortfall in onboarding shows up six months later as a 20% shortfall in retention. A 20% gap in sales enablement shows up two quarters later as a missed pipeline number. A 20% delay on the platform team shows up next year as a competitor catching up on something you thought you'd locked down.

None of these are visible in the quarter they occur. They show up downstream, and by the time they show up, the people who could have caught them earlier have moved on to the next thing.

The reviews you run will not catch this. Quarterly reviews look at the quarter, not the compounding. Annual reviews look at the year, by which point the compounding has already happened. You'd need a review cadence that lives between these two: close enough to the work to catch the early signal, distant enough to see across functions.

Most executives don't have one.

The miss is usually attention, not capability

When something underperforms, the instinct is to ask whether the team had the right people, the right resources, the right plan. Sometimes the answer is yes to all three, and the thing still missed.

The harder honest answer is usually that the leader didn't notice in time. The signal was there in week three. It was clearer in week six. By week ten it was obvious to everyone except the person who could still have changed the trajectory.

This isn't a failure of intelligence. It's a failure of attention allocation. The leader's calendar was full of legitimately important things, and the early signal of the underperforming bet was a quiet line on a Slack channel that didn't get read, or a hedge in a status update that didn't get probed, or a number that drifted a percent at a time across four weeks before anyone said anything.

You don't fix attention with more meetings. You fix it with a system for noticing.

Noticing is a skill

The executives I pay attention to, the ones who keep getting their bets right while the rest of us are still explaining why this year was unusual, have one thing in common. They've built, deliberately, a system for noticing.

It usually looks unglamorous. A weekly habit they hold without fail. A short list of questions they ask themselves on a Friday afternoon. A document they reread at the end of every month, not to review the team, but to review their own attention from the previous four weeks. A standing prompt, asked of themselves, about what's quietly going wrong that nobody has flagged yet.

The form varies. The function is the same: a forced cadence of looking at the things the rest of the calendar is designed to push out of view.

The surface markers of good leadership are the output of this practice. The calm meetings, the right call at the right moment, the bet that everyone retrospectively says was obvious. These are downstream of noticing. The noticing is the work.

What this isn't

It isn't journaling. Journaling is reflection without a job to do.

It isn't a productivity system. Productivity systems optimise output; this optimises what that output is aimed at.

It isn't coaching, in the sense the word usually gets used. Coaching is something someone else does to you, on a schedule that someone else sets. The practice I'm describing is something you do for yourself, daily and weekly, on a cadence you own.

It is, if I had to name it, an operating system for your own attention. And like any operating system, it's invisible when it works and catastrophic when it doesn't.

The quiet observation

The executives who catch the 80% problem earliest tend to have one thing in common. A system for noticing it.

The ones without one tend to find out at the annual review, in the same words, in different clothes, year after year.

Find me on LinkedIn if any of this lands. The system I run is published, MIT-licensed, in the library. The rest of the series picks up specific pieces of it.