Product Delivery in the Age of AI

Most organisations do not fail because they cannot build software. They fail because they cannot consistently turn business intent into production outcomes. That sentence has been the spine of how we think about delivery, and the arrival of capable AI in every part of the software lifecycle has not changed it. It has, if anything, sharpened it. AI is now woven through discovery, requirements, planning, build, test and documentation. What it has done is make the production of artefacts at every stage dramatically cheaper. What it has not done is make the translation of intent into the right artefacts any better.

The adoption debate is settled. DORA's 2024 research found roughly 76% of developers relying on AI for daily professional work, with around three-quarters reporting individual productivity gains; writing code was the most common use. Stack Overflow's 2025 survey puts the figure at 84% using or planning to use AI tools, with 51% of professionals using them daily. The interesting question is no longer whether engineers use AI. It is what happens to delivery when they do.

The productivity contradiction is not a contradiction

Two headline numbers seem to be in open conflict. McKinsey reports developers completing some tasks up to twice as fast with generative AI. METR's randomised controlled trial of sixteen experienced open-source developers on mature repositories found the opposite: they took 19% longer with early-2025 tooling, even though they had forecast a 24% speed-up and believed afterwards they had been 20% faster. Press coverage tends to set these against each other as if one must be wrong.

They are not in conflict. They are the same finding, sliced by task complexity and codebase context. McKinsey itself notes that the time savings shrink to under 10% on high-complexity tasks. METR's developers were working on large, mature, high-context codebases, precisely the conditions where the gains evaporate. The pattern is consistent and, once you see it, obvious: AI helps most where intent is already clear and the stakes are low, and helps least exactly where delivery is genuinely hard. It accelerates typing, not thinking.
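To make the METR perception gap concrete, here is a small arithmetic sketch in Python. It assumes a hypothetical task that takes 100 minutes without AI, and it reads each quoted percentage as a change in completion time. Both are our assumptions for illustration, not figures from the study.

```python
# Illustrative arithmetic only. The 100-minute baseline is hypothetical.
BASELINE_MINUTES = 100.0

# Fractional changes in completion time, as quoted in the text
# (negative means faster, positive means slower).
FORECAST_CHANGE = -0.24   # what developers expected beforehand
PERCEIVED_CHANGE = -0.20  # what developers believed afterwards
MEASURED_CHANGE = 0.19    # what the trial measured


def minutes_with_ai(baseline: float, change: float) -> float:
    """Completion time after applying a fractional change to the baseline."""
    return baseline * (1.0 + change)


forecast = minutes_with_ai(BASELINE_MINUTES, FORECAST_CHANGE)
perceived = minutes_with_ai(BASELINE_MINUTES, PERCEIVED_CHANGE)
measured = minutes_with_ai(BASELINE_MINUTES, MEASURED_CHANGE)

# Gap between what developers believed happened and what was measured.
perception_gap = measured - perceived

print(f"forecast:  {forecast:.0f} min")
print(f"perceived: {perceived:.0f} min")
print(f"measured:  {measured:.0f} min")
print(f"perception gap: {perception_gap:.0f} min")
```

On these assumptions the developer expects 76 minutes, remembers 80 and actually spends 119. The 39-minute gap between belief and measurement is why self-reported gains alone are a weak guide to delivery performance.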

The bottleneck moves downstream

Faster local output is not the same thing as better system-level delivery, and DORA's 2024 data captures the paradox cleanly. Individual productivity, flow and satisfaction all rise with AI adoption. Yet a 25% increase in adoption is associated with an estimated 1.5% drop in delivery throughput and a 7.2% drop in delivery stability. More code, produced faster, by happier engineers — and delivery gets slower and less stable. The constraint has not disappeared. It has moved downstream, into integration, review and acceptance.

This is the structural shift worth naming. When generation was expensive, production was the bottleneck. When generation becomes near-free, the bottleneck becomes acceptance: whether the output is correct, integrated, maintainable and actually aligned to the intent that prompted it. Marty Cagan's warning about teams becoming very efficient at building the wrong thing applies with new force — AI lets you do the wrong work faster, and produce convincing artefacts for the wrong problem at every stage of the lifecycle.
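A toy queue model shows why the constraint moves rather than disappears. The sketch below is a deliberate simplification with made-up numbers: `generated_per_week` and `review_capacity` are hypothetical parameters, not measurements from any study, and real review capacity is neither fixed nor uniform.

```python
def simulate(generated_per_week: int, review_capacity: int, weeks: int) -> tuple[int, int]:
    """Return (changes accepted, changes still waiting) after `weeks`.

    Each week new changes join the queue, and at most `review_capacity`
    of them are reviewed and accepted.
    """
    backlog = 0
    accepted = 0
    for _ in range(weeks):
        backlog += generated_per_week
        done = min(backlog, review_capacity)
        accepted += done
        backlog -= done
    return accepted, backlog


# Hypothetical team: generation matched to acceptance capacity.
before = simulate(generated_per_week=10, review_capacity=10, weeks=12)

# Same team after generation doubles while acceptance capacity stays put.
after = simulate(generated_per_week=20, review_capacity=10, weeks=12)

print("before:", before)
print("after: ", after)
```

In this model, doubling generation does not change the number of accepted changes over twelve weeks. It only builds a queue of 120 unreviewed changes, which is the downstream bottleneck in miniature.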

Acceptance debt has a measurable fingerprint

You can see the cost accumulating in the code itself. GitClear's analysis of more than 211 million lines of code found code churn rising from 4.5% in 2023 to 5.7% in 2024, cloned (copy-pasted) code climbing from 8.3% in 2021 to 12.3% in 2024, and refactoring activity falling sharply, to under 10% of changed lines. The pattern is telling: AI is good at generating plausible code and poor at reusing and consolidating what already exists. The result is more churn, more duplication and less consolidation: the measurable fingerprint of cheap generation without an acceptance discipline. That is rework being manufactured upstream and paid for later.
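The shifts look small in percentage points and larger in relative terms. The sketch below works through that arithmetic using only the churn and cloned-code figures quoted above. The helper name `relative_change` is ours.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Shares of changed lines, in percent, as quoted in the text.
churn_2023, churn_2024 = 4.5, 5.7
cloned_2021, cloned_2024 = 8.3, 12.3

churn_rel = relative_change(churn_2023, churn_2024)
cloned_rel = relative_change(cloned_2021, cloned_2024)

print(f"churn:  +{churn_2024 - churn_2023:.1f} points, +{churn_rel:.1f}% relative")
print(f"cloned: +{cloned_2024 - cloned_2021:.1f} points, +{cloned_rel:.1f}% relative")
# churn:  +1.2 points, +26.7% relative
# cloned: +4.0 points, +48.2% relative
```

Note that the two series cover different periods (one year for churn, three for cloned code), so the relative figures are not directly comparable with each other.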

Trust data tells the same story from the human side. Stack Overflow's 2025 survey found only 33% of developers trust AI accuracy against 46% who actively distrust it, with the single biggest frustration — cited by 66% — being solutions that are almost right but not quite. Three-quarters say they would still ask a person when they do not trust the AI's output. The human remains the acceptance gate, and the cost of AI has quietly shifted from writing to verifying.

When generation is nearly free, the scarce capability is no longer producing the artefact. It is deciding whether the artefact was worth producing.

Governance becomes a delivery concern, not a compliance afterthought

NIST's 2024 Generative AI Profile (NIST AI 600-1) names confabulation, information integrity, intellectual property and human-AI configuration as risks to be mapped, measured and managed. Read through a delivery lens, this is a formal acknowledgement that accelerated generation increases the need for verification and acceptance discipline. Governance stops being a downstream gate and becomes part of how delivery is designed, because the volume and confidence of AI-generated output outpace the human capacity to scrutinise it unless that scrutiny is built into the system.

The operating model has to invert

The practitioner consensus, from Cagan's SVPG to Thoughtworks, is that AI raises the bar on judgment, product thinking and context rather than lowering it. The product manager's and architect's job becomes harder, not redundant, and measuring value by lines of code or accepted suggestions is precisely the wrong instinct — it optimises the part AI already made cheap while ignoring the flow and stability metrics that actually move.

So the operating model must invert. The time AI frees up at the keyboard should not be reinvested in generating more code. It should be spent on the things AI cannot do: sharper discovery, deliberate architecture, and rigorous acceptance criteria. The scarce capability is still the translation layer between strategy, product, architecture, engineering and operations. AI makes that layer more important, because it removes the natural friction that used to slow bad intent down before it reached production.

Put plainly: AI compresses execution across the whole lifecycle, but it does not improve Intent Translation. The discipline of delivery, turning intent into executable systems, is the part that does not get automated, and the part that now determines whether all that accelerated output becomes outcomes or just churn. We have written about where this constraint lands most acutely in The Acceptance Gap, and about how to build the translation and acceptance capability deliberately in From Idea to Production: A Practical Product Delivery Lifecycle. For the broader thesis on how engineering itself changes when generation is cheap, start with What Is Agentic Engineering?