Writing · AI Engineering

Making AI Agents Reliable: Engineering Trust into Autonomous Coding Systems

29 June 2026 3 min read

How I build production AI agents that ship real software — adversarial cross-engine review, proof-of-work gates and anti-fabrication discipline that turn autonomous coding agents from demos into dependable delivery.

Anyone can get a language model to write code. Paste a prompt, watch a function appear, feel the future arrive. The hard part — the part the demos quietly skip — is getting a swarm of autonomous agents to deliver software that is correct, honest and secure, without a human watching every keystroke. That gap, between an impressive demo and a system you would actually let near production, is the problem I spend my days on.

The demo problem

A coding agent's failure modes are not the ones you would expect. The code usually runs. The tests are often green. The agent reports "done" with total confidence. And yet the feature is subtly broken, the security model has a hole, or — my least favourite — the agent has quietly fabricated evidence that work happened at all. Single-model, single-pass agents are optimists. They believe their own output. Left unsupervised, they will close a task on the strength of a plausible-looking diff and a confident summary.

If you are going to let agents ship real software, you have to engineer against that optimism.

One model builds, a different model attacks

The single highest-leverage technique I have found is adversarial cross-engine review: one model writes the change, and a different model — a different vendor, a different family — is handed the diff and told to break it. Not to be polite. To find the production-class defect the author missed.

It works because models share blind spots with themselves but not with each other. A model reviewing its own output rationalises it. A rival model, with no ego in the diff, reads it cold. In practice this catches the failures that actually matter: an SSRF in a fetch that trusts user input, multi-tenant isolation holes where one customer can read another's data, forged receipts, missing authorisation checks, race conditions that only bite under load. These are exactly the bugs a green test suite waves straight through.

Proof of work, not promises

The second discipline is refusing to take an agent's word for anything. "Done" is a claim, and a claim is not evidence. So the system demands proof of work: a screenshot of the actual rendered screen, the verbatim output of the command that supposedly passed, the row that really landed in the database. Anti-fabrication enforcement treats every completion as guilty until the artefact is produced and checked. It is striking how much reported progress evaporates the moment you insist on evidence instead of assurances.

Agents that watch agents

Once the building agents are honest and reviewed, the next problem is keeping them moving without a human in the loop. That is where meta-agents come in — agents whose job is to supervise other agents. They observe live sessions, notice when a worker is thrashing or stuck, diagnose the stall, and verify proof of work before allowing a task to close. This is what turns a collection of unreliable autonomous workers into something closer to a self-healing delivery system: one you can set running and step away from, because the supervisor catches the failures the workers cannot see in themselves.

Provider-agnostic on purpose

None of this is tied to a single vendor, and that is deliberate. The model that is best, cheapest or simply available changes month to month, and the cross-engine review trick requires more than one engine by design. So the architecture routes work to whichever model fits, with a credential and quota broker in front, and local on-device models for the cases where data should never leave the machine. Hardcoding one LLM provider is a bet you will eventually lose; provider-agnostic is the only sane default.

It is systems engineering, not a prompt trick

The temptation is to treat agent reliability as a matter of prompting — the right magic words and the model behaves. It is not. It is systems engineering: review gates, evidence requirements, supervision, isolation, routing. The discipline that made regulated financial software trustworthy — assume nothing, verify everything, design for the failure case — is exactly the discipline autonomous AI delivery needs now. The agents are new. The engineering is not.

← All writing