Rocketships and Golf Carts: AI's Transportation Problem

Mihir Wagle · 10 min read
Tags: governance, catalog, search, cart, ai

The AI industry has a categorization problem dressed up as a technology debate.

Every new capability ships under the same label: "AI agent." An email summarizer and a protein structure predictor occupy the same product category, the same way a bicycle and a Saturn V both qualify as "transportation." The label is correct and useless at the same time. Buyers can't tell what's worth paying for, vendors paper over thin value with anthropomorphic metaphors, and the genuinely extraordinary capabilities get buried under the same marketing copy as the mundane ones.

The Transportation Framework

If the goal is to get from point A to point B, the right vehicle depends on distance, terrain, passenger volume, and path constraints. Nobody serious argues that every trip needs a car. Buses work for high-volume fixed routes. Trains work for predictable corridors. Bikes work for sparse networks. Walking handles everything else. Each mode reshapes the infrastructure around it. Highways created suburbs. Rails created downtowns.

AI capabilities sort into three tiers using the same logic.

Tier 1: Rocketships. Problems that were genuinely unsolvable before. The alternative to the rocketship isn't a cheaper vehicle. The alternative is not going to the moon. Nobody argues about the ROI here because the denominator is zero.

Tier 2: Economic displacement. Problems that humans solve today, where the cost curve or speed curve makes AI substitution rational. The task costs $400 in human time and $0.03 in compute, and the math is legible enough that nobody needs to invoke magic. The bus route with enough riders to justify the schedule.

Tier 3: Leave it alone. Problems where human judgment, relationship, or adaptability is the actual product. Forcing AI here is not just wasteful. It degrades the outcome. These are the walking paths. Nobody puts an engine on a pedestrian.

The useful question for any AI capability is not "does it use AI?" but "which tier is it in?" That question is almost never asked.

The AlphaFold Precedent

DeepMind did not launch AlphaFold by calling it a digital scientist colleague. They identified a Tier 1 problem: predicting protein structures from amino acid sequences. Experimental methods (X-ray crystallography, cryo-EM) took months to years per protein, cost millions, and had a backlog that would outlast civilization. Nothing else worked at this scale.

The adoption sequence followed the tiers exactly. AlphaFold 1 proved the approach on the hardest variant of the problem. AlphaFold 2 solved it so decisively that the field reorganized around it, and DeepMind released predicted structures for over 200 million proteins. AlphaFold 3 expanded to adjacent problems where the economics justified displacement.

Nobody called it a teammate. Nobody gave it memory. AlphaFold is a prediction engine that takes input, produces output, and hands the results to humans for validation. When it's wrong, the response is recalibration, not disappointment. The trust relationship is mechanical, which is exactly why it works.

The sequence matters: earn credibility at Tier 1, then expand to Tier 2 where the math works.

DeepMind also didn't try to make AlphaFold collaborate like a human team. The dominant pattern in multi-agent AI is to run agents independently and merge their outputs, like musicians in separate studios mixed by a producer who never heard them play together. AlphaFold skipped the collaboration theater entirely. It's a single system optimized for a single problem, designed to do what scientists cannot rather than to act like one.

The Anthropomorphization Tax

The industry took the opposite approach. Instead of starting with rocketships, most vendors deployed AI everywhere simultaneously and papered over the thin value with human metaphors. Agents have "memory." They "learn" your preferences. They're your "digital colleague."

This framing is not a harmless marketing choice. It actively degrades adoption by setting the wrong expectations at every layer.

It sets the wrong acceptance threshold. A colleague who forgets what you said yesterday is broken. A workflow engine that requires explicit state input each run is just a tool. The metaphor creates failure modes that don't exist in the underlying system. Nobody gets mad at grep for not remembering what you searched for last time.

It also imports the wrong trust model. You trust a colleague through track record and shared context. You trust a power tool through predictability and transparency. When you frame an agent as a colleague, users expect judgment, and when it exercises judgment badly, the disappointment feels personal rather than mechanical.

The anthropomorphization is not a cognitive error on the part of vendors. It's a pricing strategy. "Workflow automation tool" competes with Zapier at $20 a month. "AI teammate" competes with a headcount at $80K a year. The human metaphor justifies enterprise contracts. But it creates an expectations gap that eventually collapses, and every agent that hallucinates a client email accelerates the collapse.

Worse, the colleague frame obscures the actual safety architecture these systems need. I've written before about how the dangerous failure isn't when agents get facts wrong but when they get judgment wrong and look confident doing it. An agent that fills an ambiguity gap with plausible reasoning and executes in a direction nobody authorized is exhibiting hallucinated intent. The colleague metaphor makes this harder to catch, because colleagues are supposed to exercise judgment. A tool that oversteps its specification is obviously broken. A "teammate" that takes initiative looks like it's doing its job.

The fix has to be structural. Trust in agentic systems is a ledger, not a feeling. You separate reads from writes. Reads are free: let agents query, analyze, synthesize. Writes are commands: they cross the envelope boundary and need authorization the human established before deployment. This is CQRS applied to AI governance, and it works precisely because it treats agents as systems with specifications rather than colleagues with judgment. The anthropomorphic framing makes this separation invisible, because colleagues don't have a read/write boundary. They just "do things."
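The read/write boundary described above can be sketched in a few lines. This is a hypothetical illustration, not a real framework: the names (`Action`, `AuthorizationLedger`) and the scope strings are invented for the example. The point is structural: reads pass unconditionally, writes pass only if the human pre-authorized that scope before deployment.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    kind: str   # "read" or "write"
    scope: str  # e.g. "crm.notes", "email.outbox" (illustrative scopes)

class AuthorizationLedger:
    """Trust as a ledger: write authorizations granted before deployment."""

    def __init__(self, authorized_write_scopes):
        self._writes = set(authorized_write_scopes)

    def permits(self, action: Action) -> bool:
        if action.kind == "read":
            return True                      # reads are free
        return action.scope in self._writes  # writes need prior authorization

# The human authorized exactly one write scope before deployment.
ledger = AuthorizationLedger({"crm.notes"})

assert ledger.permits(Action("query_pipeline", "read", "crm.contacts"))
assert ledger.permits(Action("append_note", "write", "crm.notes"))
assert not ledger.permits(Action("send_email", "write", "email.outbox"))
```

Note what the colleague metaphor hides: the agent never decides whether an action is a read or a write. The classification is part of the action's specification, which is exactly what makes the boundary enforceable.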

The Rocketship Nobody Marketed

Here is a concrete example of a Tier 1 capability hiding inside a Tier 2 product.

Microsoft Copilot Search performs semantic retrieval across an enterprise's full data corpus (email, chat, files, meetings, calendars) using Microsoft Graph grounding, while simultaneously enforcing user permissions, sensitivity labels, DLP policies, retention rules, and conditional access policies from Microsoft Purview and Microsoft Information Protection.

The enforcement is not a single gate. It is a three-layer pipeline:

At index time, permissions are encoded into the semantic index itself. The search space is pre-scoped per user before a query ever runs.

At query time, conditional access and sensitivity labels filter further. The query executes inside a permission-aware boundary, not against the full corpus with post-hoc trimming.

At response time, DLP policies and sensitivity checks evaluate what gets returned. Even if a result passes the first two gates, the response can be suppressed or redacted based on policy.

The end user types a question and gets an answer. They have no idea that three independent enforcement layers just fired in sequence, each applying different governance primitives, across data spanning half a dozen services.
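The three-layer pattern can be sketched as a pipeline. This is a toy model, not Microsoft's implementation: the document fields, labels, and policy checks are all invented for illustration. What it shows is the shape of the architecture: each layer independently narrows the candidate set, and a result must survive all three to reach the user.

```python
# Toy corpus. In the real system the ACLs, labels, and DLP rules come
# from Graph, Purview, and MIP; here they are hard-coded stand-ins.
DOCS = [
    {"id": 1, "text": "Q3 revenue forecast", "acl": {"alice"},        "label": "confidential"},
    {"id": 2, "text": "Lunch menu",          "acl": {"alice", "bob"}, "label": "general"},
    {"id": 3, "text": "Merger term sheet",   "acl": {"alice"},        "label": "highly-confidential"},
]

def index_time_scope(user):
    # Layer 1: permissions encoded into the index itself.
    # The search space is pre-scoped per user before any query runs.
    return [d for d in DOCS if user in d["acl"]]

def query_time_filter(docs, clearance):
    # Layer 2: sensitivity labels and conditional access narrow further.
    allowed = {"general", "confidential"} if clearance == "standard" else {"general"}
    return [d for d in docs if d["label"] in allowed]

def response_time_dlp(docs):
    # Layer 3: a DLP-style policy can still suppress a result
    # that passed the first two gates.
    return [d for d in docs if "term sheet" not in d["text"].lower()]

def search(user, clearance, query):
    candidates = index_time_scope(user)
    candidates = query_time_filter(candidates, clearance)
    candidates = response_time_dlp(candidates)
    return [d for d in candidates if query.lower() in d["text"].lower()]
```

With this sketch, `search("bob", "standard", "lunch")` surfaces the menu, while `search("alice", "standard", "merger")` returns nothing at all, even though alice holds ACL access to the term sheet. The suppression is exactly the kind of invisible absence the rest of this piece is about: the user sees an empty result, not a denied request.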

This is a Tier 1 capability. No human can do this. Not slowly, not with a team, not with unlimited time. A compliance officer can audit one document against one policy. This system applies thousands of policies against millions of documents, per user, per query, in seconds. That is not acceleration of existing human work. That is a category of operation that did not previously exist.

Notice what the three-layer architecture actually is: a governance pipeline that doesn't require human-shaped intermediaries. I've argued elsewhere that the medallion architecture's Silver and Gold layers exist because humans cannot eat raw data. Agents don't share that limitation. The same logic applies here. Traditional enterprise search required human-curated taxonomies, hand-maintained permission matrices, and governance officers manually reviewing access policies. Copilot Search replaces those human intermediary layers with a machine-native enforcement pipeline. The governance isn't pre-chewed for human consumption. It's applied computationally, at scale, in a way no human intermediary layer could match.

Before this, enterprise knowledge was either accessible or governed. Pick one. Organizations that made information easy to find inevitably surfaced things they shouldn't have. Organizations that locked information down made it unfindable. The two objectives were in structural tension. The three-layer architecture resolves that tension at query time, per user, at scale.

And yet nobody markets it this way. The capability is buried inside the same "summarize my meetings" pitch as everything else. The rocketship shares a product page with the golf carts.

The Invisible Layer

Amazon's catalog is a mess. Millions of SKUs, duplicate listings, inconsistent metadata, seller-generated descriptions of wildly varying quality. Nobody notices. You type "USB-C cable 6ft" and get a reasonable result. Behind that query, Amazon is resolving synonyms, deduplicating listings, filtering counterfeit flags, applying your purchase history, enforcing geographic availability, and suppressing sellers with policy violations. The value is entirely in what you don't see.

This is what good infrastructure looks like. Sewage systems, road maintenance, trash collection, the electrical grid. Nobody thinks about them. Everyone depends on them. You only notice when they fail. The hallmark of successful infrastructure is invisibility.

Copilot Search's three-layer governance pipeline is the same kind of infrastructure. The sensitivity labels being checked, the DLP policies firing, the conditional access evaluation: none of it is visible to the user. They see an answer. The thousands of unauthorized results silently suppressed per query, per user, every time? Invisible. The governance is the sewage system of enterprise knowledge. It makes the mess navigable without requiring the mess to be cleaned up first.

This is why the rocketship doesn't get marketed as one. Rocketships are visible. Everyone watches the launch. Infrastructure is invisible by design. The AI industry rewards what's visible: a meeting summary, a drafted email, a generated slide deck. Things you can screenshot for a Forrester study. The invisible layer that makes enterprise knowledge simultaneously accessible and governed doesn't produce artifacts. It produces absence: the unauthorized results that never appear, the compliance violations that never happen, the data leaks that never occur. The industry has no framework for valuing absence, so it doesn't.

But absence is where the value compounds. Amazon's search isn't valuable because of what it shows you. It's valuable because of the millions of irrelevant, counterfeit, out-of-stock, and policy-violating results it filters before you see anything at all. The same way a city works not because of its landmarks but because someone maintains the sewage system and picks up the trash. The invisible layer is the substrate.

The Actual Framework

The AI industry needs a vehicle classification system, not better anthropomorphic metaphors.

For every AI capability, ask: what tier is this? If it solves a previously unsolvable problem, it's a rocketship. Price it like one, market it like one, and don't dilute it by bundling it with the golf carts. If it makes existing work cheaper or faster, it's Tier 2. Do the math. Show the bus route economics. Stop pretending it's magic. If the human is the product, leave it alone.

This is ultimately a measurement crisis. The same way organizations reward PMs for shipping features rather than killing bad ones, the AI industry rewards capabilities for looking impressive rather than being categorically new. Meeting summaries generate visible artifacts. Governed semantic search across a permission graph generates the same search box. The rocketship looks exactly like a golf cart from the outside. The value is invisible unless you know what to measure.

My cricket analogy: the best bowlers are measured not just by wickets taken but by dot balls bowled. The delivery where nothing happened was the plan. The pressure that led to the wicket three overs later traces back to the dots. Enterprise AI needs the same reframe. The value of Copilot Search isn't in the answers it surfaces. It's in the thousands of unauthorized results it silently suppresses, per query, per user, every time. That's the dot ball. That's the thing nobody is counting.

The vendors who learn to identify their rocketships, name the invisible layers, and measure the absence will build durable positioning. The vendors who spray "AI agent" across every surface and rely on anthropomorphic metaphors to justify the price will discover that the expectations gap has a half-life, and it's shorter than their enterprise contract terms.

The industry is arguing about whether agents should have memory. Meanwhile, the hardest problem in enterprise AI got solved quietly inside an invisible layer nobody is measuring correctly.
