Generation got cheap. That much is settled. A senior engineer can now conjure a plausible service, a migration script or a week of boilerplate in an afternoon, and the marginal cost of the next thousand lines is trending towards zero. The board hears this and reaches for the obvious lever: developer headcount is the largest line in the engineering budget, generation is the bulk of what developers do, therefore AI savings should track headcount. That syllogism is wrong, and it is wrong in a way that costs real money.
The error is treating software cost as if it lives where the typing happens. It does not. Once code is near-free to produce, the cost does not vanish — it migrates downstream, to the work that turns a generated artefact into something an organisation can actually accept, integrate, review and own. That migration is the central fact of AI delivery economics, and almost nobody is modelling it on the P&L.
The savings are real — and they are local
Start with where the savings genuinely land. McKinsey's State of AI 2025 (analyst) reports that software engineering and IT functions cite the most concrete AI-driven cost reductions — tied specifically to code generation, automated testing and incident resolution. So the generation dividend is real, and it shows up exactly where you'd expect: at the keyboard. The problem is that the same study finds only about 39% of organisations report any enterprise-level EBIT impact from AI at all, and most of those attribute less than 5% of EBIT to it. Roughly 6% qualify as 'AI high performers'. Adoption, in other words, is racing ahead of anything visible on the bottom line.
The MIT NANDA report The GenAI Divide: State of AI in Business 2025 (analyst; primary source) puts the same gap in sharper relief: despite an estimated $30-40bn of enterprise generative-AI spend, roughly 95% of organisations see no measurable P&L return, and only about 5% of integrated pilots extract significant value. Crucially, NANDA names the binding constraint. It is not talent, infrastructure or model quality. It is the lack of learning, integration and contextual adaptation — which is to say, the downstream work, not the generation.
The slowdown nobody budgeted for
If generation is faster, why isn't delivery? Two findings should unsettle anyone forecasting linear savings. METR's randomised controlled trial (academic) followed 16 experienced open-source developers across 246 real repository issues and found that, when allowed to use early-2025 AI tools, they took 19% longer to complete an issue. The preprint is on arXiv as 2507.09089 for anyone who wants the methodology. The number that matters most for our purposes is the perception gap: developers predicted beforehand that AI would speed them up by 24%, and even after living through a 19% slowdown still believed it had sped them up by 20%. That is not a rounding error. It is a systematic over-estimation of velocity by the very people the savings model trusts to report it.
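To make the perception gap concrete, here is a minimal arithmetic sketch. It assumes each METR percentage can be read as a change in task completion time against a no-AI baseline, and the 10-hour baseline is a hypothetical figure chosen only for illustration. None of the names below come from the study.

```python
# Illustrative arithmetic only. BASELINE_HOURS is a hypothetical task size,
# not a figure from the METR study.
BASELINE_HOURS = 10.0

# Fractional changes in completion time (negative = faster), as quoted above.
FORECAST_CHANGE = -0.24   # what developers predicted before the trial
PERCEIVED_CHANGE = -0.20  # what developers believed after the trial
MEASURED_CHANGE = 0.19    # what the trial measured


def implied_hours(baseline_hours: float, time_change: float) -> float:
    """Hours a task takes after applying a fractional change in completion time."""
    return baseline_hours * (1.0 + time_change)


forecast_hours = implied_hours(BASELINE_HOURS, FORECAST_CHANGE)
perceived_hours = implied_hours(BASELINE_HOURS, PERCEIVED_CHANGE)
measured_hours = implied_hours(BASELINE_HOURS, MEASURED_CHANGE)

# Distance between belief and measurement, in percentage points.
gap_points = round((MEASURED_CHANGE - PERCEIVED_CHANGE) * 100, 1)

print(f"forecast:  {forecast_hours:.1f} h")
print(f"perceived: {perceived_hours:.1f} h")
print(f"measured:  {measured_hours:.1f} h")
print(f"perception gap: {gap_points:.0f} points")
```

On that hypothetical 10-hour task, a savings model fed the self-reported figure would book 8.0 hours where the measured figure implies 11.9, a 39-point gap between belief and measurement.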
Google's DORA 2024 Accelerate State of DevOps report (analyst) corroborates the team-level picture: a 25% increase in AI adoption was associated with an estimated 1.5% decrease in delivery throughput and a 7.2% decrease in delivery stability, and this in a survey where 75.9% of respondents already rely on AI for at least part of their work. Individual productivity, felt at the keyboard, simply did not translate into team-level delivery gains. The friction reappeared somewhere else.
Where? GitClear's AI Copilot Code Quality research (press; vendor-published, so read it as directional) analysed over 211 million lines and found that in 2024, copy-pasted or cloned code overtook refactored 'moved' code for the first time, with code-clone frequency rising roughly eightfold. That is the downstream cost made visible. Cheap generation produces more code, more duplication and more divergence — and every clone is a future review, a future bug surface, a future maintenance liability that lands on a team months after the generation 'saving' was booked.
The bill for cheap generation does not disappear. It is rebilled downstream — to review, integration and operational ownership — and it arrives after the savings have been celebrated.
The Acceptance Gap, now with a P&L
This is the Acceptance Gap rendered in money. The distance between code that exists and code an organisation will accept into production — verified, integrated, owned — is where the cost reaccumulates. Gartner (analyst) predicts at least 30% of generative-AI projects will be abandoned after proof of concept by end-2025, citing poor data quality, inadequate risk controls, escalating costs and unclear business value, and notes plainly that GenAI costs 'aren't as predictable as other technologies'. Unpredictable downstream cost is precisely what a headcount-based savings model fails to capture.
Governance turns acceptance into a fixed cost
And the downstream bill is increasingly non-optional, because regulators and standards bodies are codifying it. NIST released the Generative AI Profile (NIST AI 600-1) on 26 July 2024 (governance) as a cross-sectoral companion to the AI RMF 1.0, defining 12 GenAI-specific risk categories and over 200 suggested actions across the framework's govern, map, measure and manage functions. The profile is voluntary, but it formalises operational ownership as an explicit expectation. The EU AI Act (governance) entered into force on 1 August 2024, with high-risk obligations (risk management, data governance, technical documentation, human oversight and post-market monitoring) plus deployer duties applying from 2 August 2026. And the UK Government's AI Playbook (GDS, 10 February 2025; governance) sets ten principles including 'meaningful human control at key decision points' and a mandated AI Systems Inventory.
Read those three together and the conclusion is structural: review, human oversight and operational accountability are no longer discretionary engineering hygiene. They are becoming legal and regulatory fixtures. You cannot generate your way past them, and you cannot price them at zero.
What to actually budget
The correct unit of account is not lines generated or developers replaced. It is cost-to-acceptance per change: the fully-loaded expense of taking a generated artefact through verification, integration, review and into ownership under your governance obligations. Model that, and the AI business case stops looking like a headcount cut and starts looking like a reallocation — fewer hours producing, far more hours accepting. The organisations in NANDA's 5% are the ones that funded the downstream.
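As a minimal sketch of how cost-to-acceptance per change might be computed, the example below splits one change into stage hours and prices everything after generation at a loaded hourly rate. The stage names, the `ChangeEffort` type, the rate and every hour figure are hypothetical, chosen only to show the shape of the calculation. They are not drawn from any of the studies cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEffort:
    """Hours spent on one change, by stage. Stage names are illustrative."""
    generation: float
    verification: float
    integration: float
    review: float
    ownership: float  # e.g. documentation, monitoring, governance evidence

    def acceptance_hours(self) -> float:
        """Everything after generation: the downstream work."""
        return self.verification + self.integration + self.review + self.ownership

    def total_hours(self) -> float:
        return self.generation + self.acceptance_hours()


def cost_to_acceptance(effort: ChangeEffort, loaded_hourly_rate: float) -> float:
    """Cost of taking a generated artefact through to ownership."""
    return effort.acceptance_hours() * loaded_hourly_rate


def total_cost(effort: ChangeEffort, loaded_hourly_rate: float) -> float:
    return effort.total_hours() * loaded_hourly_rate


# Hypothetical figures: same change, before and after adopting AI generation.
RATE = 100.0  # assumed fully-loaded cost per engineer-hour
before = ChangeEffort(generation=6, verification=2, integration=2, review=1, ownership=1)
after = ChangeEffort(generation=1, verification=4, integration=3, review=3, ownership=2)

for label, effort in (("before", before), ("after", after)):
    accept = cost_to_acceptance(effort, RATE)
    total = total_cost(effort, RATE)
    print(f"{label}: cost-to-acceptance {accept:.0f}, total {total:.0f}, "
          f"downstream share {accept / total:.0%}")
```

In this invented scenario generation hours fall from 6 to 1, yet total cost per change rises because the downstream stages grow. A model that tracks only generation time would record a saving that the change as a whole never delivered.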
- If your AI savings case is anchored to developer headcount, you are budgeting the cheapest part of the problem and ignoring where cost actually lands.
- Treat reported velocity gains with suspicion. METR's developers believed they were 20% faster while the trial measured them 19% slower, so self-reported speedups are not reliable inputs to a P&L.
- Make cost-to-acceptance a measured line item, not an assumption. If you cannot measure it, you cannot prove the saving.
If you are pressure-testing an AI savings case before it reaches the board, start with The Acceptance Gap to see where the cost reaccumulates, then read Measuring AI Engineering to instrument cost-to-acceptance. If the question is whether to build, buy or generate the capability at all, Build, Buy or Generate frames the trade — and Why Product Transformations Fail explains what happens when the downstream work is left unfunded.