Three Handoffs at Ninety Is Seventy-Three

A VP of growth at a thirty-person services firm walked me through a chain she’d built.

Three steps. Lead research at the top. Qualification in the middle. Personalized outreach at the bottom. Each step was an AI agent. The whole thing ran nightly on a list of new domains pulled from a directory.

She was proud of it. “Each step is solid,” she said. “I reviewed the outputs. Each one is, you know, ninety percent fine.”

The first batch of emails went out on a Tuesday. By Friday, the head of sales pulled her into a room. The pipeline notes were full of polite confusion. One prospect had been emailed about a problem they did not have. One had been addressed as the CEO of a subsidiary they had spun out two years earlier. One was a real-estate broker that the system had quietly classified as a manufacturing operations lead.

“The AI got it wrong,” she said in the room.

The AI did not get it wrong. Each step did exactly what it had been told to do, at the accuracy each step had been calibrated for. The chain consumed twenty-seven points of accuracy that nobody had budgeted. Nobody knew the budget existed.

The math that nobody is doing on a whiteboard

Here’s the thing. The math is not subtle. It is the kind of math an eighth-grader does in their head. Once you see it, you cannot stop seeing it.

Three steps. Each one is ninety percent fine. Plug it in.

0.9 × 0.9 × 0.9 = 0.729.

That is your final output’s accuracy. Seventy-three percent, not ninety. The “ninety percent fine” you reviewed at each step does not stack. It compounds in the wrong direction. Three handoffs at ninety percent is seventy-three percent. And four handoffs at ninety percent is sixty-six. And five handoffs at ninety percent is fifty-nine. The chain is consuming accuracy at a rate nobody is tracking, because the only place anybody looked was one step at a time.

The VP did not have a tool problem. The agents were fine. She had a budget problem. The accuracy budget for the final output was never named, and so the chain spent it invisibly across every handoff. By the time the email landed in the prospect’s inbox, the budget was overdrawn, and the only place the overdraft showed up was the room she got pulled into on Friday.

Where the orchestrator was supposed to live

This is where The Station Plan starts doing work.

The Station Plan describes an AI-native business as a working kitchen with three layers. There is a Chef at the center making judgment calls. There is a Line of stations, each one owning a specific dish. And there is the Pass between them. The Pass is the orchestrator. The Pass routes tickets between stations, manages handoffs, and most importantly, the Pass is the layer that knows what good enough looks like at the boundary.

When a station hands its work to the next station, the Pass is the one asking the question nobody else is asking. Is this output good enough to be used as input by the next station, given how much accuracy we have left to spend? That is the Pass’s whole job. It is not a routing layer. It is a tolerance-keeper.

In the chain the VP built, there was no Pass.

Each agent was a station. The handoffs were direct. Step one fed step two, step two fed step three, and nobody was watching the budget. There was no role in the system whose job was to look at step one’s output and say “this is ninety percent, the chain can survive that” or “this is ninety percent, but we only had three percent left to spend, throw it back.” The output flowed through.

The Pass is not optional. Pull the Pass and the kitchen still cooks. The plates just come back wrong, and the only place the wrongness shows up is at the customer table. Or in the prospect’s inbox.

Yeah. That is the trap.

What “fine” hides

Here’s the thing most leaders miss when they look at a single step in isolation.

“Ninety percent fine” is a fuzzy phrase that does two completely different jobs depending on what the next step is going to do with the output. If the next step is a human who will edit it, ninety percent fine is plenty. If the next step is another agent that is going to trust the output as ground truth and build on it, ninety percent fine is dangerous. The same output. Two completely different deployment standards.

The qualification step in the VP’s chain trusted the research step as ground truth. It assumed the company was the company the research step said it was. It then made a qualification decision on a hallucinated parent company. The output of qualification looked fine in isolation. “Yes, fit. Mid-market services.” Right call, given the input. Wrong input.

The outreach step then trusted qualification as ground truth. It wrote a personalized email referencing a pain point the qualification step had confidently named. The output of outreach looked fine in isolation. Sharp email. Specific reference. “Hi Dan, I noticed your team is dealing with the same intake bottleneck a lot of mid-market services firms run into right now.” Beautiful sentence. Wrong Dan. Wrong company. Wrong pain point.

Every step looked fine. Every step was fine, by the standard it had been graded against. The chain failed because nobody had translated “fine at this step” into “fine enough for the next step to build on.”

The habit underneath this

I want to be careful with how I frame this, because the VP was not careless. She had built something real. The chain worked. It scaled outbound by a factor she could not have hit any other way, and the unit economics were good even at seventy-three percent. The problem was not the building. The problem was the not-naming.

You are stacking tolerances you never named.

That is the operator habit. Not AI is unreliable. Not the agents are too dumb. The agents literally did exactly what they were calibrated to do. You did not calibrate them as a chain. You calibrated them as individuals. And then you chained them, and the chain consumed accuracy at compound rates while you reviewed each one in isolation.

The discipline is not adding a layer. The discipline is doing the math on a whiteboard before the chain runs.

Doing the math out loud

Here is what the math conversation actually looks like when an operator names tolerance budgets for the first time.

You start at the end. What accuracy does the final output need? For an outbound email that has to land warmly, you might say ninety-five percent. The wrong name on the wrong company gets you blocked, blacklisted, or laughed at. The cost of a miss is high, the reversibility is low, the budget for accuracy loss is two percent at most.

Then you walk backwards. Three steps. Two percent of total accuracy loss to spend. Each step gets about a third of a percent. Right? That means each step has to operate at roughly ninety-nine-point-three percent accuracy in isolation to land the chain at the threshold the final output needs.

Most operators look at that number and say “that is not achievable with the current agents.” Correct. That is what the math is telling you. The chain you have built cannot reliably produce ninety-five percent final-output accuracy with the agents you have. Either the agents need to be sharper, the chain needs to be shorter, or a human has to enter the loop at the highest-cost handoff to absorb the tolerance the agents cannot hold.

The math does not change. Three handoffs at ninety percent is seventy-three percent. Always.

What the Monday move actually is

Map any agent chain you have built. Write the steps on a whiteboard. At each step, write down what accuracy you actually think that agent is operating at. Not aspiration. The number you would defend if asked.

Multiply them.

If the product is below the accuracy threshold the final output needs, you have just found the cascade. The chain is consuming a budget nobody named. You now know which handoff to tighten first. Usually it is the one with the lowest score, but sometimes it is the one whose downstream stations are trusting it as ground truth and building on it. That second category is where the silent failures live, because the output looked fine to the human reviewing it in isolation, and then the next station treated it like the law.

Does that make sense?

Where the Pass earns its keep

The thing operators miss when they hear “orchestrator” is that the orchestrator is not a routing layer. It is the role that enforces tolerance budgets at every handoff. The Pass is not a connector. The Pass is the conscience of the chain. It is the role asking the only question that matters when one station hands work to the next. Given how much accuracy we have left to spend, is this output good enough to be trusted by the next station?

In agent chains today, that role almost never exists. Most chains are connectors. Step one’s output is piped to step two’s input, raw. No tolerance check, no budget question, no throwback to step one when the output is below the chain’s threshold.

The Pass is the layer most operators are missing in their chains right now. Not because it is hard to build. Because the math has not been done. And until the math is done, nobody knows what the Pass is supposed to enforce.

Three handoffs at ninety percent is seventy-three percent. Name your tolerances.

Original framework. Distilled from client work and the meeting where the concept clicked.

Framework spine: The Station Plan, specifically the role of the Pass as the layer that enforces tolerance budgets at every handoff between stations on the line. Read the full framework.

~ source material · Original framework. Distilled from client work and the meeting where the concept clicked.