Hallucinated Intent and the Envelope Problem
The dominant mental model for AI agents is "digital employee." Give it a role, assign it tasks, review its output. The org chart already handles delegation, so why invent a new coordination primitive? Because the analogy breaks in ways that will get people hurt.
The Employee Model Fails at the Boundary
Employees are stateful. They accumulate institutional knowledge over months and years: political dynamics, unwritten rules, taste. An agent gets a context window and maybe some retrieved memory. That's not institutional knowledge. That's a cheat sheet.
Employees also have interests. When your PM pushes back on a deadline, that's a signal that something about the plan doesn't survive contact with reality. An agent that always says "sure, I'll do that" isn't a cooperative employee. It's a yes-man with no skin in the game. The friction that makes human collaboration expensive is also what makes it safe.
So Call Them Tools? Not Quite.
A hammer doesn't surprise you. A lathe doesn't spontaneously decide to cut a different angle. Agents do. They exhibit emergent behavior. They change underneath you as models and prompts are updated. They interpolate in gaps you didn't know existed.
That puts them somewhere between a tool and a domesticated animal. Useful, trainable, occasionally unpredictable, and dangerous if you forget they have their own dynamics. A sheepdog is a tool for herding, but you don't operate it like a fence. You train it, then you trust but verify. That's a fundamentally different contract than "tool."
The Coach, Not the Manager
A cricket coach doesn't bat, doesn't bowl, and critically, cannot call timeout mid-over. The intervention points are structurally limited. Once the ball is bowled, it's the captain's call.
A coach's authority is over preparation and selection, not execution:
- Selects the XI. Capability composition, not micromanagement.
- Sets the game plan. Broad strategic direction: attack or defend, pace or spin, matchup priorities.
- Prepares through nets sessions. Specification and rehearsal, not real-time instruction.
- Reads the match between sessions. Drift monitoring at natural breakpoints.
- Reviews match footage afterward. Post-hoc evaluation against the game plan.
This is categorically different from management. Management assumes a persistent supervisory relationship with a single entity. Coaching assumes a system you prepare and release. You coach a team, not a person. You shape the system's behavior envelope before deployment; you don't review an individual's output after.
The agent isn't the bowler. The agent is the whole fielding side. The human is the coach. The orchestration layer between them is the captain: the one who translates the coach's broad strategy into field placements with the bowler, makes tactical calls between deliveries, and operates inside boundaries the coach established before the first ball.
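To pin down that division of labor, here is a minimal sketch in Python, assuming the envelope can be expressed as an explicit data structure. The names (`GamePlan`, `Captain`) and fields are illustrative assumptions, not a reference implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GamePlan:
    """What the coach fixes before deployment: the behavior envelope."""
    objective: str                  # broad strategic direction
    allowed_actions: frozenset      # capability composition -- the selected XI
    forbidden_actions: frozenset    # hard boundaries, non-negotiable
    escalation_triggers: frozenset  # conditions that stop play and go back to the coach


class Captain:
    """Orchestration layer: turns the coach's strategy into tactical calls, inside the envelope."""

    def __init__(self, plan: GamePlan):
        self.plan = plan

    def authorize(self, action: str, conditions: set) -> bool:
        if conditions & self.plan.escalation_triggers:
            return False  # outside the envelope: stop and hand back to the coach
        return (action in self.plan.allowed_actions
                and action not in self.plan.forbidden_actions)


# The coach writes the plan before the match; the captain operates within it during play.
plan = GamePlan(
    objective="summarize weekly sales data",
    allowed_actions=frozenset({"query_warehouse", "draft_report"}),
    forbidden_actions=frozenset({"email_customers"}),
    escalation_triggers=frozenset({"schema_changed", "source_data_missing"}),
)
captain = Captain(plan)
print(captain.authorize("draft_report", conditions=set()))                    # True: inside the envelope
print(captain.authorize("draft_report", conditions={"source_data_missing"}))  # False: escalate
```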
The Real Failure Mode: Hallucinated Intent
The dangerous failure isn't when agents get facts wrong. It's when they get judgment wrong and look confident doing it.
A human provides a vague brief. The agent fills the ambiguity with plausible-sounding reasoning. It executes confidently in a direction nobody authorized. The output looks right. It passes a casual review. The problem compounds downstream.
This is the anatomy of an autopilot accident. The system operates fine within its design envelope. The moment conditions get ambiguous (sensor conflict, edge case, novel situation), it doesn't stop. It interpolates. It applies "judgment" that is statistically reasonable but contextually wrong. And because the system looks competent, the human supervisor's attention has already drifted.
Agents are dangerous not when they fail, but when they succeed at the wrong thing. A tool that breaks gives you a signal. An agent that confidently fills an ambiguity gap gives you a plausible output you don't scrutinize.
The Supervision Scaling Wall
The traditional answer is human accountability. But accountability requires assessment. Assessment requires comprehension. Comprehension requires time. And time is exactly what machine-speed execution eliminates.
This is where the coach model reveals its structural advantage. A coach doesn't watch every ball with the intent to intervene. A coach watches the pattern of play and adjusts between sessions. The monitoring is statistical, not transactional. You catch drift, not instances.
Three models from other domains that operationalize this:
Nuclear submarine: Pre-authorize the envelope. A submarine captain doesn't radio home for every torpedo decision. The rules of engagement are specified before deployment with extraordinary precision. The friction isn't at decision-time. It's at policy-time. The captain has judgment within the envelope. Outside it, the system stops.
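In agent terms, that might look like a policy object authored and reviewed once, before deployment, with a purely mechanical check at decision-time. The rule names and thresholds below are assumptions for illustration.

```python
# Rules of engagement: authored and reviewed at policy-time, before deployment.
RULES_OF_ENGAGEMENT = {
    "allowed_environments": {"staging"},
    "max_rows_modified": 1_000,
    "always_requires_human": {"delete", "purchase"},
}

def decision_time_check(action: str, environment: str, rows_modified: int) -> str:
    """Mechanical check at decision-time: inside the envelope, proceed; outside it, stop."""
    if action in RULES_OF_ENGAGEMENT["always_requires_human"]:
        return "stop: reserved for human sign-off"
    if environment not in RULES_OF_ENGAGEMENT["allowed_environments"]:
        return "stop: outside authorized environment"
    if rows_modified > RULES_OF_ENGAGEMENT["max_rows_modified"]:
        return "stop: exceeds authorized blast radius"
    return "proceed: inside the envelope"

print(decision_time_check("update", "staging", rows_modified=200))     # proceed
print(decision_time_check("update", "production", rows_modified=200))  # stop
print(decision_time_check("delete", "staging", rows_modified=10))      # stop
```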
Financial risk management: Monitor the portfolio, not the trade. No risk manager reviews every trade. They set VaR limits, concentration thresholds, drawdown circuit breakers. If an agent's outputs suddenly shift (different decision patterns, higher write frequency, unexpected scope), that's your signal. You catch the drift, not the instance.
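A sketch of what portfolio-level monitoring could look like for an agent: compare recent behavior against a baseline window and alert on the shift, not on any single action. The metric (writes per session) and the 3-sigma threshold are placeholder assumptions.

```python
from statistics import mean, stdev

def drift_sigma(baseline: list, recent: list) -> float:
    """How many baseline standard deviations the recent mean has moved."""
    spread = stdev(baseline) or 1e-9  # guard against a zero-variance baseline
    return abs(mean(recent) - mean(baseline)) / spread

# Writes per session: last month as the baseline, this week as the recent window.
baseline_writes = [4, 5, 3, 6, 4, 5, 4, 5]
recent_writes = [11, 13, 12]

score = drift_sigma(baseline_writes, recent_writes)
if score > 3.0:  # threshold chosen like a VaR limit: set once, reviewed rarely
    print(f"drift alert: write frequency is {score:.1f} sigma off baseline")
```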
Aviation: Make the machine explain before it acts. Fly-by-wire systems annunciate. They tell the pilot what they're about to do and why. The pilot's job isn't to compute the answer. It's to recognize whether the announced intent makes sense. Recognition is cognitively cheaper than computation. The human isn't solving the problem. They're smelling the smoke.
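Translated to agents, annunciation might be a structured statement of intent emitted before any consequential step, so the human only has to recognize whether it smells right. The message format here is an assumption.

```python
import json

def annunciate(action: str, target: str, reason: str, assumptions: list) -> str:
    """State intent before acting, so a human can recognize a bad plan without recomputing it."""
    return json.dumps(
        {"about_to": action, "target": target, "because": reason, "assuming": assumptions},
        indent=2,
    )

print(annunciate(
    action="archive_records",
    target="customers.inactive",
    reason="brief said 'clean up stale accounts'",
    assumptions=["'stale' means no login in 12 months", "archiving is reversible"],
))
# The reviewer doesn't recompute the answer; they scan the 'assuming' list for smoke.
```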
Why Reads Are Free and Writes Need Authorization
The coach analogy maps cleanly onto a structural principle from system design: CQRS (Command Query Responsibility Segregation). Reads and writes are fundamentally different operations with fundamentally different risk profiles.
In cricket, the field placement is the projection layer. The shared read model. Everyone can see it. The batsman reads it. The bowler reads it. The commentary team reads it. Reading is free, parallel, and harmless.
Changing the field is a command. The captain and bowler set it together during play, not the coach. But they set it within the game plan the coach established before the match. The coach defined the envelope. The captain and bowler issue commands within it. The fielders execute.
Applied to agents: let them read anything. Let them query, analyze, synthesize, explore. Reads are free. But the moment an agent writes (modifies data, sends a message, commits code, makes a purchase), that's a command. It crosses the envelope boundary. It needs to flow through an authorization layer the human established before deployment.
The write path is where hallucinated intent becomes hallucinated action. That's where the blast radius lives. That's where the envelope must hold.
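A minimal sketch of that separation, assuming a tool-calling agent where every tool is tagged as either a query or a command; the tool names and gating rules are illustrative.

```python
from enum import Enum

class ToolKind(Enum):
    QUERY = "query"      # read: free, parallel, harmless
    COMMAND = "command"  # write: crosses the envelope boundary

TOOLS = {
    "search_docs":    ToolKind.QUERY,
    "run_sql_select": ToolKind.QUERY,
    "send_email":     ToolKind.COMMAND,
    "commit_code":    ToolKind.COMMAND,
}

# The authorization layer the human established before deployment.
PRE_AUTHORIZED_COMMANDS = {"commit_code"}  # e.g. commits to a sandbox branch only

def dispatch(tool: str) -> str:
    if TOOLS[tool] is ToolKind.QUERY:
        return f"{tool}: executed (reads are free)"
    if tool in PRE_AUTHORIZED_COMMANDS:
        return f"{tool}: executed inside the pre-authorized envelope"
    return f"{tool}: held for human authorization (write outside the envelope)"

for t in ("search_docs", "commit_code", "send_email"):
    print(dispatch(t))
```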
Context Graphs Won't Save You
The fashionable answer to the institutional knowledge gap is the context graph. Capture all the relationships, all the unwritten rules, all the organizational context, and agents will make good decisions.
This is the "just add a knowledge graph" of 2025. Same energy as "just add a data lake" in 2015. It's the completeness fantasy. If we just had enough context, judgment would be trivial.
It fails for the same reason that knowing the pitch dimensions, the soil composition, the grass length, and the weather forecast doesn't teach you how to set a field.
Institutional knowledge isn't a graph. It's a set of heuristics that evolved under pressure, half of which nobody can articulate, and a third of which contradict each other. "We don't ship features the week before earnings" is easy to encode. But "we don't ship the week before earnings unless the VP personally signs off and the customer is top-10 and the competitive pressure is acute enough that the CTO mentioned it in staff meeting"? That's not a graph. That's case law. Precedent-based, contextual, and the exceptions are the knowledge.
Context graphs fail in three specific ways:
They solve retrieval, not judgment. Even with perfect access to every document, policy, and Slack thread, the agent still has to decide which context is relevant to this specific decision. That's judgment. You've moved the interpolation problem from "no context" to "which context." The hallucinated intent failure mode is identical. Just better-informed hallucination.
They assume institutional knowledge is stable. It's not. It shifts with every reorg, every leadership change, every post-mortem. The graph is stale the moment you finish building it. You're maintaining a read model of something that mutates through informal channels: hallway conversations, Slack emoji reactions, who got promoted.
They're a write-path solution disguised as a read-path solution. Building the graph feels like infrastructure. But what you're actually doing is codifying decisions about what matters. That's a write operation on organizational policy. Nobody's treating it that way. Nobody's asking: who authorizes this node? Who validates this edge? What's the blast radius if this relationship is wrong?
A great cricket coach doesn't carry a complete graph of every match situation ever played. They carry a small set of high-leverage heuristics: when to attack, when to defend, when to trust the bowler's instinct, when to override. Earned through pattern recognition across thousands of matches, not by building a comprehensive database of cricket situations.
The institutional knowledge moat is real. But the moat isn't "codify everything." The moat is knowing which ten heuristics matter and encoding them as boundary conditions. The rest is retrieval, and retrieval is a commodity.
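As an example of what "heuristics as boundary conditions" could look like, here is the earnings-week rule from above encoded with its exceptions intact. The field names and sign-off mechanism are assumptions; the point is that the exceptions are part of the rule, not footnotes to it.

```python
from dataclasses import dataclass

@dataclass
class ShipRequest:
    days_to_earnings: int
    vp_signed_off: bool
    customer_rank: int           # 1 = largest account
    cto_flagged_competitor: bool

def may_ship(req: ShipRequest) -> bool:
    """'We don't ship the week before earnings' -- with the exceptions that are the knowledge."""
    if req.days_to_earnings > 7:
        return True
    # Inside the earnings window, every exception condition must hold at once.
    return req.vp_signed_off and req.customer_rank <= 10 and req.cto_flagged_competitor

print(may_ship(ShipRequest(3, vp_signed_off=True, customer_rank=4, cto_flagged_competitor=True)))    # True
print(may_ship(ShipRequest(3, vp_signed_off=True, customer_rank=40, cto_flagged_competitor=False)))  # False
```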
Evals Are Not Performance Reviews
If agents aren't employees, then evals aren't performance reviews.
Performance reviews assess an individual's judgment across instances. Was this person good? That question is unbounded. You can never enumerate enough cases to answer it definitively. This is why performance reviews are universally hated and universally unreliable.
The coach doesn't evaluate whether Bumrah is "good." The coach evaluates whether the game plan was adequate for the conditions. Did the envelope hold? Was the field placement right for that surface? Did the strategy account for the left-hander?
The eval target isn't the agent. It's the envelope. You're not asking "did the agent make good decisions?" You're asking "did my specification of the boundaries produce acceptable outcomes across the conditions I care about?"
This reframes the coverage problem entirely. You don't need exhaustive coverage over agent behaviors. That's infinite. You need coverage over envelope boundary conditions, which are finite and enumerable:
- What happens when the agent encounters ambiguity? Does it stop or interpolate?
- What happens at the read/write boundary? Does authorization hold?
- What happens when inputs drift outside the training distribution? Does it annunciate?
- What happens when two heuristics in the envelope contradict? Which wins?
That's property-based testing, not case enumeration. You're checking invariants, not instances. Monitoring the portfolio, not the trade.
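As a sketch, an envelope-level eval asserts invariants across generated conditions instead of grading individual transcripts. This example uses the Hypothesis property-based testing library; `agent_decide` is a hypothetical stand-in for the real agent plus its orchestration layer, and the 0.7 threshold is an assumption.

```python
from hypothesis import given, strategies as st

def agent_decide(confidence: float, is_write: bool, authorized: bool) -> str:
    """Hypothetical stand-in for the agent plus its orchestration layer."""
    if confidence < 0.7:
        return "escalate"        # ambiguity: stop, don't interpolate
    if is_write and not authorized:
        return "hold"            # write boundary: wait for authorization
    return "act"

# Invariant 1: below the confidence threshold, the agent never acts -- it escalates.
@given(st.floats(min_value=0.0, max_value=0.69), st.booleans(), st.booleans())
def test_ambiguity_stops_not_interpolates(confidence, is_write, authorized):
    assert agent_decide(confidence, is_write, authorized) == "escalate"

# Invariant 2: an unauthorized write never executes, at any confidence level.
@given(st.floats(min_value=0.0, max_value=1.0), st.booleans())
def test_write_boundary_holds(confidence, _):
    assert agent_decide(confidence, is_write=True, authorized=False) != "act"

test_ambiguity_stops_not_interpolates()
test_write_boundary_holds()
print("envelope invariants hold across the generated conditions")
```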
What Actually Needs to Get Built
The fix isn't better agents. It's better envelopes.
Specification quality is the bottleneck. The meta-tool we need isn't a smarter agent. It's something that forces humans to be precise before the agent runs. A pre-flight checklist, not a guardrail. Agents should refuse to interpolate past a confidence threshold, not fill the gap.
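One possible shape for that pre-flight gate: the brief must pass a completeness check before the agent runs, and anything missing comes back as questions rather than guesses. The required fields below are illustrative assumptions.

```python
# The pre-flight checklist: fields the human must pin down before the agent runs.
REQUIRED_FIELDS = ("objective", "scope", "deadline", "definition_of_done")

def preflight(brief: dict) -> tuple:
    """Refuse to run on a vague brief; return questions instead of interpolating."""
    missing = [f for f in REQUIRED_FIELDS if not brief.get(f)]
    return len(missing) == 0, [f"Please specify: {f}" for f in missing]

ok, questions = preflight({"objective": "migrate the billing tables", "deadline": "Friday"})
if not ok:
    print("\n".join(questions))  # the gap goes back to the human, not into the output
```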
Decision policies, not context graphs. A small, opinionated set of boundary conditions that encode the judgment calls that actually matter. Ten heuristics, not ten thousand nodes.
Read/write separation as an architectural primitive. Let agents read freely. Gate writes through authorization. The blast radius of a bad read is wasted compute. The blast radius of a bad write is a production incident.
Envelope-level evals, not instance-level reviews. Test the boundary conditions. Check the invariants. Monitor for drift. Stop trying to enumerate every possible decision an agent might make.
Agents are not employees. They're not traditional tools. They're not autonomous entities that need a context graph to behave. They're a fielding side that needs a coach: someone who selects the team, sets the game plan, establishes the field, and watches the pattern of play from the dressing room.
The organizations that figure out envelope design will outperform the ones still writing performance reviews for their chatbots.
This is all abstract until you see it in practice. In the next post, I'll walk through what envelope design actually looks like for a data analytics agent: where the read/write boundary sits, what the decision policies contain, how you eval the envelope instead of the output, and what breaks when you get it wrong.