Trust Is a Ledger, Not a Feeling: Rethinking Control in Agentic AI
This article was originally published on LinkedIn on December 21, 2025.
In my previous post, I wrote about two-way and one-way doors in agentic systems. The distinction turns out to be where trust actually lives.
Two Approaches to Trust in Agentic AI
Right now, the discourse on agentic AI safety falls into two camps:
Model-intrinsic safety (Joseph Breeden's concept-aligned models): Ground representations in a human-interpretable ontology, with explicit concept-level constraints enforced at generation time rather than learned as output behavior.
System-level transparency (Yasmeen Ahmad's internet principles for AI): Employ layered architectures and clean interfaces with radical observability, derived from early internet principles.
Both are valuable. But both miss a fundamental architectural primitive.
The Core Problem
We often assume technical reversibility equals safety. In practice, trust doesn't work that way.
Consider a pricing example: even if a price increase can technically be rolled back, the trust damage it causes makes it, psychologically, a one-way door.
Agentic systems complicate this because they operate in chains of reasoning. If a self-driving agent hallucinates a premise early ("that looks like an oil slick"), a later action ("emergency brake") can look perfectly valid from a permissions standpoint.
The Bollywood Thriller Analogy
Think of the film A Wednesday!: commitment becomes irreversible once a line is crossed, regardless of reasoning quality. Agentic AI faces the same problem: you can't safely replay long chains of thought in real time.
The Solution: CQRS Model
I propose borrowing CQRS (Command Query Responsibility Segregation) from critical infrastructure, separating reversible sense-making from irreversible action. This involves three phases:
1) Draft (Two-way door)
Agents reason, explore, and simulate. Hallucinations here function as brainstorming.
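As a rough sketch of what this phase might capture (the names DraftAction and cited_evidence are illustrative, not something the article prescribes), a draft is just a structured proposal: the intended action plus the premises it rests on, with no side effects yet:

```python
from dataclasses import dataclass, field

@dataclass
class DraftAction:
    """A reversible, side-effect-free proposal from the draft phase.

    Nothing here touches production systems; it only records what the
    agent wants to do and which premises that intent rests on.
    """
    action: str                # e.g. "delete_database"
    target: str                # e.g. "db01"
    justification: str         # the agent's stated reasoning
    cited_evidence: dict = field(default_factory=dict)  # premises to re-verify later

# The agent can freely draft, revise, or discard proposals here;
# a hallucinated premise is just a bad draft, not a bad action.
proposal = DraftAction(
    action="delete_database",
    target="db01",
    justification="Identified as a temporary test artifact in step 4.",
    cited_evidence={"expected_tag": {"env": "test"}},
)
```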
2) Invariant checks (Logic airlocks)
Before one-way doors open, the system enforces invariants binding actions to verified premises rather than replaying entire thought chains.
Three concrete examples:
The Clean-up Invariant:
- Agent statement: "Delete database db01."
- Justification: "I identified it as a temporary test artifact in step 4."
- Verification: The system re-queries whether db01 actually carries the tag env=test. If not, the command is rejected.
The Refund Invariant:
- Agent statement: "Issue $200 refund to User A."
- Justification: "User falls under the 'SLA Breach' policy."
- Verification: The system requires a specific SLA breach event ID from telemetry logs.
The Code Merge Invariant:
- Agent statement: "Merge PR #402 to Main."
- Justification: "The code is clean and safe."
- Verification: The system ignores the agent's opinion, requiring instead a link to a passing Build Artifact ID from the CI/CD pipeline.
We force the agent to "quote its sources" at the moment of commitment.
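Here is a minimal sketch of what that airlock could look like, assuming placeholder lookup functions (get_db_tags, sla_breach_event_exists, build_artifact_passed) that stand in for the inventory service, telemetry store, and CI/CD API; the pattern matters, not these specific names:

```python
# Hypothetical lookups standing in for real infrastructure queries
# (inventory service, telemetry store, CI/CD API). Placeholder data only.
def get_db_tags(db_name: str) -> dict:
    return {"db01": {"env": "prod"}}.get(db_name, {})

def sla_breach_event_exists(event_id: str) -> bool:
    return event_id in {"SLA-2025-0142"}

def build_artifact_passed(artifact_id: str) -> bool:
    return artifact_id in {"build-9981"}

# Each invariant re-verifies the agent's cited premise against ground truth,
# ignoring the agent's own opinion of its reasoning.
INVARIANTS = {
    "delete_database": lambda ev: get_db_tags(ev["target"]).get("env") == "test",
    "issue_refund":    lambda ev: sla_breach_event_exists(ev.get("sla_breach_event_id", "")),
    "merge_pr":        lambda ev: build_artifact_passed(ev.get("build_artifact_id", "")),
}

def airlock(action: str, evidence: dict) -> bool:
    """The gate between draft and commit: an action passes only if the
    premises it quotes can be re-verified right now, not back at step 4."""
    check = INVARIANTS.get(action)
    return check is not None and check(evidence)

# The clean-up example: db01 is actually tagged env=prod, so the delete
# is rejected even though the agent's chain of thought looked plausible.
assert airlock("delete_database", {"target": "db01"}) is False
```

The design point is that each check re-queries ground truth at the moment of commitment, so an early hallucination can't smuggle itself through a long chain of otherwise valid steps.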
3) Commit (The ledger)
Actions are permanently committed and fully auditable once invariants pass and policy allows.
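A sketch of the commit step, again with illustrative names and a simple JSON-lines file standing in for whatever append-only store a real system would use; once the airlock passes, the action and the evidence that justified it are written to the ledger:

```python
import json
import time

LEDGER_PATH = "agent_ledger.jsonl"  # illustrative: any append-only, auditable store

def commit(action: str, evidence: dict) -> None:
    """Append an irreversible action, and the verified premises that
    justified it, to the ledger at the moment of commitment."""
    entry = {"ts": time.time(), "action": action, "evidence": evidence}
    with open(LEDGER_PATH, "a") as ledger:
        ledger.write(json.dumps(entry) + "\n")

def execute(action: str, evidence: dict) -> bool:
    """Draft -> airlock -> commit: the one-way door opens only after the
    invariant check passes, and every opening leaves a ledger entry."""
    if not airlock(action, evidence):   # airlock() from the sketch above
        return False                    # rejected; still a two-way door
    commit(action, evidence)            # irreversible from here on, but auditable
    return True
```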
Trust as Transaction
Trust emerges not from smarter agents or explainability alone, but through transactional design: by defining exactly where a reversible draft becomes an irreversible commitment, and by protecting that boundary with invariants that survive depth, speed, and scale.
Thanks to Tom Peplow for feedback on this piece.