Walk into almost any large organisation in 2026 and you will find AI pilots — dozens of them. Walk in a year later and you will find most of them exactly where you left them: working, demoed, admired, and contributing nothing to the P&L. The uncomfortable truth is that the pilots usually succeed. It is the organisations that fail to do anything with them. The technology crosses the line; the value does not.
Adoption is not value
The funnel is steep. In one widely cited 2025 study of enterprise-grade GenAI tools, roughly 60% of organisations evaluated them, about 20% ran a pilot, and only around 5% reached production. That is a sharp drop-off, though one scoped to bought, task-specific tools rather than to all AI use. (We will not lean on the more lurid headline from the same report, the contested “95% get zero return” claim, but the direction is not seriously in dispute.) The deeper problem is what gets counted as success. Seat licences, usage dashboards and “people are using it” are adoption metrics. They are not value. Deloitte’s 2026 enterprise survey found only about a fifth of organisations already growing revenue from AI, against the roughly three-quarters that merely expect to.
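To make the drop-off concrete, the quoted shares can be turned into stage-to-stage conversion rates. This is a rough sketch only: the three percentages are the approximate figures cited above, and the stage labels and the `stage_conversions` helper are illustrative names, not anything taken from the study.

```python
# Approximate shares of organisations reaching each stage, as quoted in the text.
funnel = [("evaluated", 0.60), ("piloted", 0.20), ("in production", 0.05)]


def stage_conversions(stages):
    """Return (from_stage, to_stage, rate) for each adjacent pair of stages."""
    return [
        (a_name, b_name, b_share / a_share)
        for (a_name, a_share), (b_name, b_share) in zip(stages, stages[1:])
    ]


conversions = stage_conversions(funnel)
for src, dst, rate in conversions:
    print(f"{src} -> {dst}: {rate:.0%}")

# End-to-end rate: of those that evaluated a tool, how many reached production.
overall = funnel[-1][1] / funnel[0][1]
print(f"evaluated -> in production: {overall:.1%}")
```

On these rounded figures, about a third of evaluators ran a pilot, a quarter of pilots reached production, and roughly one evaluator in twelve got there end to end. The inputs are approximate, so the ratios are indicative rather than precise.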
The last mile no one owns
Ask why a specific pilot stalled and the answer is rarely “the model wasn’t good enough”. It is that the pilot was scoped to prove the technology worked, and then handed across a gap that no one’s job description covered: from the team that built it to the team that would have to live with it, inside a workflow that was never redesigned to use it. The engineering literature has said this for a decade: the model is a small fraction of a real production system, and the hardest part is the transition into operations, which most teams treat informally. McKinsey’s 2025 work identifies the other side of it: of the many things organisations could do, end-to-end workflow redesign is the single biggest driver of bottom-line impact from AI. The pilot is the easy part. The redesign is the hard part, and it is the part everyone skips.
It is organisational, not technical
This reframes the whole problem. RAND’s analysis of why AI projects fail puts the leading causes upstream and organisational — the wrong problem, miscommunicated intent, missing data and infrastructure — well before model capability. Google’s DORA research describes AI as an amplifier: it magnifies whatever discipline, or dysfunction, an organisation already has. Point a powerful amplifier at a broken operating model and you get a louder broken operating model, faster. None of this is a reason to slow down on the technology. It is a reason to stop pretending the technology is the hard part.
The Production Gap
We call the chasm between a pilot that works and an operating model that delivers value the Production Gap. There are really two gaps, not one. The first — from an idea to a working demo — AI has made almost trivial to cross; that is why pilots are everywhere. The second — from a working demo to changed, value-producing production — is exactly as hard as it has always been, because it is organisational, and it is where almost everyone stalls.
Underneath it is a scoping error. Most pilots are run as a Proof of Technology — can the model do the thing? — when the question that actually decides ROI is a Proof of Production: will this change how the work is done, who owns it, and what it earns? A Proof of Technology that passes tells you almost nothing about whether the value will ever arrive, because it never tested the part that was always going to be hard.
Not all stalling is failure
An honest caution, before anyone over-corrects into a kill-every-pilot panic. Some pilots stall because they should — they were never worth scaling, and stopping them is good portfolio discipline, not failure. (The companion to this argument is that, when building is cheap, deciding what not to build becomes the scarce skill.) Others lag rather than fail: the returns from a genuine operating-model change arrive on a curve, not in a quarter. The discipline is to tell the three apart — a healthy kill, a slow-burning bet, and a genuine absorption failure — and to be honest about which one you are looking at. The Production Gap is about the third: the worthwhile pilot that dies in the handoff.
The most dangerous sentence in an AI programme is “the pilot was a success.” A pilot that proved the model but changed no one’s working day has proved nothing that pays. The teams actually capturing value treat the pilot as a dress rehearsal for a changed operating model, not a technology demo — and they name an owner for the last mile before they start, not after it has already stalled.
Closing the gap
For the people accountable for the return — CIOs, CTOs, boards, transformation sponsors — the move is not another pilot. It is to scope the operating-model change into the pilot from the start: name the workflow it will redesign, the owner of the last mile, and the P&L line it is meant to move, and then measure the pilot against that path to production rather than against a usage dashboard. Treat senior ownership as non-negotiable — value tracks the leaders who own the change, not the technical teams left to push it uphill. The organisations that win the AI era will not be the ones that ran the most pilots. They will be the ones that crossed the second gap.
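One way to make that scoping discipline checkable is to record the three commitments alongside the pilot and refuse to call it a Proof of Production until all are filled in. The sketch below is purely illustrative: `PilotCharter`, its field names, and the example values are hypothetical, not an established template.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class PilotCharter:
    """Hypothetical scoping record; field names are illustrative only."""
    name: str
    workflow_to_redesign: Optional[str] = None  # the workflow the pilot will change
    last_mile_owner: Optional[str] = None       # who owns the move into operations
    pnl_line: Optional[str] = None              # the P&L line it is meant to move


def missing_commitments(charter: PilotCharter) -> List[str]:
    """List the scoping commitments that are still blank."""
    return [
        f.name
        for f in fields(charter)
        if f.name != "name" and not (getattr(charter, f.name) or "").strip()
    ]


def is_proof_of_production(charter: PilotCharter) -> bool:
    """True only when every scoping commitment has been named."""
    return not missing_commitments(charter)


# Invented example values, for illustration only.
demo_only = PilotCharter(name="Invoice triage assistant")
scoped = PilotCharter(
    name="Invoice triage assistant",
    workflow_to_redesign="Accounts-payable exception handling",
    last_mile_owner="Head of Finance Operations",
    pnl_line="Cost of finance operations",
)

print(missing_commitments(demo_only))
print(is_proof_of_production(scoped))
```

A check like this only confirms that the commitments were named at scoping. Whether the owner is senior enough and the P&L line is actually moving remains a judgment for the sponsor.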
Frequently asked
- Why do most AI pilots fail to deliver ROI?
- Usually not because the model is inadequate, but because the pilot was scoped to prove the technology and then handed across a last mile no one owns, into a workflow that was never redesigned. The blocker is organisational — operating-model change — not model capability.
- Isn’t a successful pilot proof of value?
- No. A working pilot is a Proof of Technology; usage and adoption are not value. ROI depends on a Proof of Production — a changed workflow, an owned last mile, and a moved P&L line — which a tech demo never tests.
- What separates the teams that actually scale AI?
- End-to-end workflow and operating-model redesign (McKinsey finds it the single biggest driver of bottom-line impact), senior leadership owning the change rather than delegating it, and narrow, high-value use cases measured against the P&L.
- How should we measure an AI pilot?
- Against a path to production and a specific P&L outcome, not a usage dashboard. Name the workflow it redesigns, the owner of the last mile, and the line it is meant to move — at scoping, not after.
- Is the high failure rate unique to AI?
- No. Read it against an already-high base rate for IT and data projects: some stalling is normal portfolio attrition, and some is healthy pruning of pilots that were never worth scaling. The concern here is the worthwhile pilot that dies in the handoff.