Very few AI initiatives fail suddenly. Most fail gradually. They begin as sensible experiments — a small team, a limited scope, a clear question. Early signals look promising, so more stakeholders take an interest. Funding expands. Expectations grow. New dependencies appear. Before long the organisation is discussing timelines, operating models and long-term ownership. Somewhere in that drift, the experiment became a pilot, the pilot became a POC, the POC became a programme — and nobody ever consciously decided it should. What changed along the way was not the quality of the evidence. It was the amount already invested. The question quietly shifted from “should we continue?” to “how do we make this work?”, and the original question simply disappeared.
The biggest AI risk is not deploying the wrong thing. It is being unable to stop the wrong thing.
Most ideas shouldn’t ship — and that’s the base rate, not a scandal
Start with a fact that has nothing to do with AI and rests on two decades of evidence. Across controlled experiments at Microsoft, Google, Netflix, Booking and Amazon, only about a third of deliberately built, carefully considered, believed-useful ideas actually move the metric they were designed to improve. Another third do nothing; the rest make things worse. That is already a failure rate of roughly two in three, and in experimentation-mature organisations it is higher still: in some heavily optimised domains 80–96% of ideas fail. Intuition, in other words, is a poor predictor of value even among smart teams building things they were sure would work. “Most ideas shouldn’t ship” is not a sign of incompetence. It is the base rate.
The AI-specific numbers point the same way, though they should be read more cautiously — they are recent, self-reported and often vendor-adjacent. Gartner forecast that at least 30% of generative-AI projects would be abandoned after proof of concept (a figure it later pushed toward half); survey data has the share of organisations scrapping most of their AI initiatives jumping from roughly a sixth to over 40% in a year. Treat the exact figures as dated and directional. The point is not the precise percentage. It is that a high abandonment rate is the normal, healthy shape of a portfolio under uncertainty — and the organisations in trouble are usually the ones abandoning too little, too late.
It isn’t the model — it’s that we can’t stop
If most ideas shouldn’t ship, the interesting question is why organisations keep funding the ones that aren’t working. The answer is one of the most robust findings in management research, and it has nothing to do with AI: escalation of commitment. Once we have invested in a course of action, we persist in it beyond the point the evidence justifies — driven by sunk cost, self-justification, a bias toward completing what we started, optimism and the illusion of control, and plain entrapment. It has been replicated for fifty years. It is why good people keep pouring money into bad projects. So the failure mode of most AI programmes is not the model. It is governance. Most AI failures are not technical failures; they are governance failures disguised as technical programmes.
Couldn’t scale versus shouldn’t scale
It is worth separating two things that look alike from the outside. A worthwhile pilot that works but never makes it into the business has hit the Production Gap — it could not cross into a changed operating model, and that is an absorption failure to fix. This piece is about the other half: the use case that should not cross at all. Most stalled AI projects are not tragedies of execution. They are decisions that were never made — ideas that, on the evidence, did not deserve the next stage, kept alive because stopping felt like failure. Couldn’t scale is an engineering problem. Shouldn’t scale is a judgement problem, and the judgement usually goes missing.
The discipline: Default Off, and a high Kill Rate
The fix is two things working together — a philosophy and a measure. The philosophy is Default Off: production is not the automatic destination of every promising experiment; it is an outcome a use case has to earn. Continuation is not the default. Every transition — experiment to pilot, pilot to production, production to scale — is treated as a fresh investment decision that new evidence must justify, not an automatic promotion bought with enthusiasm. Which gives the sharpest operational test in the piece: a pilot without a kill criterion is not a pilot. It is a delayed programme.
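One way to picture Default Off is as a gate function whose fall-through answer is "stop". The sketch below is illustrative only: the stage names follow the transitions described above, but `GateReview`, its fields and the `decide` function are hypothetical names invented for this example, not an established framework.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    EXPERIMENT = 1
    PILOT = 2
    PRODUCTION = 3
    SCALE = 4


@dataclass
class GateReview:
    """Evidence presented at one stage transition (all fields are hypothetical)."""
    kill_criterion: Optional[str]  # written down before the stage started
    criterion_met: Optional[bool]  # True means the kill condition was triggered
    approved_by: Optional[str]     # named owner of the continue/stop decision


def decide(current: Stage, review: GateReview) -> str:
    """Default Off: anything short of explicit, evidenced approval stops."""
    if current is Stage.SCALE:
        return "already at final stage"
    if not review.kill_criterion:
        return "stop: no kill criterion defined, so this is a delayed programme"
    if review.criterion_met is None:
        return "stop: no evidence presented against the kill criterion"
    if review.criterion_met:
        return "stop: kill criterion triggered"
    if not review.approved_by:
        return "stop: nobody owns the decision to continue"
    # Continuation is earned one stage at a time, never skipped.
    return f"continue to {Stage(current.value + 1).name}"


print(decide(Stage.PILOT, GateReview("under 10% handling-time gain", False, "portfolio owner")))
print(decide(Stage.PILOT, GateReview(None, None, None)))
```

The design choice worth noticing is the ordering: every branch that lacks a criterion, evidence or an owner returns a stop, and continuation is the last line reached, not the first.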
The measure is the Kill Rate: the share of use cases you deliberately stop, tracked over time the way you track delivery throughput. It sounds negative; it is the opposite. A portfolio with a healthy kill rate is one where judgement is actually operating — where the base rate is being respected and escalation is being beaten. A team that almost never kills anything is not disciplined or unusually wise; it has simply let continuation become the default, and is accumulating zombie initiatives that drain funding, attention and credibility while never quite being cancelled. Ask the question on a real portfolio — what is our kill rate this quarter, and which initiatives have crossed a stage without earning it? — and the zombies surface fast.
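As a rough sketch of how the two portfolio questions could be answered from a simple register, the example below computes a kill rate and lists use cases that crossed a stage without a recorded gate decision. The `UseCase` fields, the function names and the four sample entries are all invented for illustration; a real portfolio would define "reviewed" and "stopped" in its own terms.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    stage_changes: int   # stage transitions made this quarter
    gate_decisions: int  # recorded gate decisions this quarter
    stopped: bool        # deliberately stopped at a gate this quarter


def kill_rate(portfolio: list[UseCase]) -> float:
    """Share of use cases in the portfolio deliberately stopped in the period."""
    if not portfolio:
        return 0.0
    return sum(u.stopped for u in portfolio) / len(portfolio)


def zombie_candidates(portfolio: list[UseCase]) -> list[str]:
    """Live use cases with more stage transitions than recorded decisions."""
    return [
        u.name
        for u in portfolio
        if not u.stopped and u.stage_changes > u.gate_decisions
    ]


# Hypothetical sample data, for illustration only.
portfolio = [
    UseCase("claims triage", stage_changes=1, gate_decisions=1, stopped=False),
    UseCase("contract summariser", stage_changes=0, gate_decisions=1, stopped=True),
    UseCase("sales copilot", stage_changes=2, gate_decisions=1, stopped=False),
    UseCase("forecast assistant", stage_changes=0, gate_decisions=0, stopped=False),
]

print(f"kill rate this quarter: {kill_rate(portfolio):.0%}")  # 25%
print("crossed a stage without earning it:", zombie_candidates(portfolio))
```

In this made-up register one of four use cases was stopped, and "sales copilot" surfaces because it moved two stages on a single recorded decision. Tracking the same two numbers each quarter is what turns the kill rate into a trend, not a one-off audit.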
An initiative rarely becomes a programme because someone decided it deserved to. It accumulates momentum: a workshop becomes a pilot, a pilot a roadmap item, a roadmap item a programme, and at each step the cost of stopping grows. The strongest organisations treat every transition as a new investment decision; the evidence has to earn the next stage. The ones that struggle treat continuation as the default, and by the time the evidence is questioned, the initiative is being defended, not evaluated.
This is not an argument against experimenting
The obvious objection is that this sounds like an excuse to be timid — to pre-judge ideas and stop trying things. It is the opposite. Because the win rate is low, you should run more experiments, not fewer; experimentation is precisely how you find the third that works without betting the company on intuition. The discipline is about where you stop, not whether you start. Kill on evidence, at a gate — never pre-emptively, and never by quietly starving something instead of deciding. The purpose of a pilot is not to prove that something works. It is to discover whether it deserves to continue.
Make “no” legitimate and owned
None of this works if stopping is career-ending. The transformation-leadership job is to make “no” a respected outcome of good governance rather than an admission of failure: to give someone explicit ownership of the kill decision, to celebrate a well-judged stop the way you celebrate a launch, and to report the kill rate to the board as a sign of health, not embarrassment. It completes a sequence the rest of this work has been building toward: can we build it, should we build it, and (the question almost no one owns) should we keep funding it? The fastest AI project is the one you stop before it becomes a programme. Looking back, the most expensive initiatives are rarely the ones that failed. They are the ones that should have stopped earlier but instead quietly accumulated enough momentum that nobody felt able to be the person who said no.
Frequently asked
- Isn’t killing a project just admitting failure?
  No. It is the system working. Across two decades of controlled experiments only about a third of believed-useful ideas move the metric; most shouldn’t ship. A deliberate stop, made on evidence, is good portfolio discipline, not an error. The expensive mistake is the project that should have stopped and didn’t.
- Why do organisations keep funding AI projects that aren’t working?
  Escalation of commitment: a decades-replicated behavioural pattern driven by sunk cost, self-justification, completion bias, optimism and entrapment. The failure mode is governance, not model quality: most AI failures are governance failures disguised as technical programmes.
- Doesn’t this contradict “experiment more”?
  No. Because the win rate is low you should run more cheap experiments, not fewer. The discipline is where you stop: kill on evidence at a gate, not pre-emptively. A pilot without a kill criterion is not a pilot; it is a delayed programme.
- What is the one metric to watch?
  A deliberate kill rate (or no-build rate): the share of use cases you stop, tracked like delivery throughput. A portfolio that almost never kills anything has let continuation become the default and is accumulating zombie initiatives.
- How is this different from a pilot that stalled?
  A pilot that couldn’t cross into the business is the Production Gap, an absorption failure to fix. This is the other half: the use case that shouldn’t scale at all. Couldn’t scale is an engineering problem; shouldn’t scale is a judgement problem.