When an Agent Ships a Regression

A regression ships. You open the usual box — who wrote this, what were they thinking — and find no human author. An agent generated the change; a human waved it through under time pressure; it passed the checks and broke in production. The incident is entirely real. The reflexes built around it — the blame, the postmortem template, the “talk to the author” — were built for a world where the author was a person. Operating safely now means rebuilding those reflexes for a world where increasingly they are not.

Instability is now a first-class metric

DORA’s 2025 research made this explicit by elevating “instability” to a named performance category alongside throughput, tracked through change fail rate and a new deployment rework rate. The same research frames AI as an amplifier: it intensifies whatever practices already exist rather than fixing weak ones. Strong delivery practices compound; weak ones become more painful. That is the whole argument in one line — cheap generation does not make delivery safer, it makes your existing assurance (or lack of it) matter more, because there is far more change flowing through it.

The agent cannot be the safety net

The instinct is to ask the agent to catch its own regressions. It cannot — the same dynamic that makes models unable to review their own security makes them unreliable at recognising their own breakage. Safety has to come from outside the model. OWASP’s 2025 risk taxonomy names this directly as “Excessive Agency”: the fix is to scope an agent’s functionality, permissions and autonomy, and to require human approval before high-impact actions. Shipping to production is the canonical high-impact action. An agent that can merge and deploy unsupervised is not a productivity feature; it is an unscoped privilege.

Detect and contain, because volume is up

When the rate of change rises, the assurance question shifts from prevention to fast recovery. The machinery is not novel — it is the operational hygiene many teams skipped because change was slow enough to get away without it: progressive delivery (canaries, feature flags, gradual rollout) so a bad agent-authored change reaches a fraction of users first; observability tuned to surface regressions quickly and attribute them; and an instant, rehearsed rollback or kill-switch. None of this is AI-specific. What is new is that the volume and speed of machine-generated change remove your margin for not having it.

Blameless, but accountable

Blameless postmortems were always about the person’s intent, not the absence of ownership. With a machine author that distinction becomes load-bearing. There is no one to blame — and “the AI did it” is not a root cause, it is an abdication. The postmortem still needs a named human owner: the person who accepted the change into the system and is answerable for it (the accountable core). Blameless about intent; accountable for the outcome. The responsibility gap an agent opens is closed by a named owner, never by pointing at the model.

“The agent generated it” has the same standing in an incident review as “the compiler emitted it”: true, and beside the point. The first question that moves a postmortem forward is not which model wrote the change, but which human accepted it — and whether the path that let them accept it without seeing the regression is the thing to fix.
— Sanjeev Purohit, from our delivery work

Operating safely

Gate production-shipping behind human approval — the Excessive-Agency mitigation; agent autonomy is a scoped, revocable privilege, not a default.
Progressive delivery + instant rollback — canaries and flags so a machine-authored regression is contained to a blast radius you can absorb.
Attribution-aware observability — know not just that something broke, but what produced the change (provenance), so the trail is reconstructable.
Treat change fail rate and rework rate as the assurance dial — DORA’s instability metrics are where agent-introduced regressions become visible.
A named owner per change — blameless on intent, accountable for the result.

Agents will ship regressions; that is not the failure. The failure is not seeing it quickly, not being able to contain and reverse it, and not knowing who owns the result. Assurance is the difference between an agent-authored regression being a five-minute rollback and being a weekend.