
Transformation Leadership · 11 min read · Updated 2026-06-20

Why Most AI Use Cases Should Never Reach Production

Most AI failures are not technical failures — they are governance failures disguised as technical programmes. Most ideas simply should not ship, and the real risk is not deploying the wrong thing but being unable to stop it. The highest-leverage AI capability is the ability to stop: Default Off as the philosophy, the Kill Rate as the measure.

By Priyanka Pandey · Founder & Editorial Lead

Reviewed and challenged by Sanjeev Purohit · Principal, Decision Architecture

Built from

  • Field experience
  • Independent research
  • Original framework
  • Reviewed with field experience

Last substantively reviewed · 2026-06-20

In brief

Most AI failures are governance failures disguised as technical programmes: most ideas structurally shouldn’t ship, the real failure mode is escalation of commitment (not model quality), and the highest-leverage capability is the ability to stop — Default Off as philosophy, the Kill Rate as the measure.

  • Most ideas shouldn’t ship, and that is the base rate, not a scandal: only ~1/3 of deliberately built, believed-useful ideas move the metric, and in some optimised domains of experimentation-mature organisations 80–96% fail. Intuition is a poor predictor of value.
  • The failure mode is escalation of commitment / sunk-cost, not model quality — organisations fund losing projects past the evidence. Most AI failures are governance failures disguised as technical programmes.
  • Initiatives rarely fail suddenly; they escalate gradually (experiment → pilot → POC → programme) without anyone deciding — what changes is the amount invested, not the evidence.
  • Default Off (philosophy): production is earned, not the default destination; every transition is a fresh investment decision new evidence must justify.
  • The Kill Rate (measure): the share of use cases you deliberately stop, tracked like throughput — a healthy kill rate means judgement is operating; a near-zero one means continuation became the default (zombie projects).
  • Couldn’t scale (the Production Gap, an absorption failure) ≠ shouldn’t scale (the subject of this piece: a judgement that was never made).
  • Completes the sequence: can we build it? → should we build it? (Conviction Gap) → should we keep funding it? (Kill Rate).

Very few AI initiatives fail suddenly. Most fail gradually. They begin as sensible experiments — a small team, a limited scope, a clear question. Early signals look promising, so more stakeholders take an interest. Funding expands. Expectations grow. New dependencies appear. Before long the organisation is discussing timelines, operating models and long-term ownership. Somewhere in that drift, the experiment became a pilot, the pilot became a POC, the POC became a programme — and nobody ever consciously decided it should. What changed along the way was not the quality of the evidence. It was the amount already invested. The question quietly shifted from “should we continue?” to “how do we make this work?”, and the original question simply disappeared.

The biggest AI risk is not deploying the wrong thing. It is being unable to stop the wrong thing.

Most ideas shouldn’t ship — and that’s the base rate, not a scandal

Start with a finding that long predates the current AI wave. Across two decades of controlled experiments at Microsoft, Google, Netflix, Booking and Amazon, only about a third of deliberately built, carefully considered, believed-useful ideas actually move the metric they were designed to improve. Another third do nothing; the rest make things worse. In experimentation-mature organisations the failure rate is higher still: in some optimised domains, 80–96% of ideas fail. Intuition, in other words, is a poor predictor of value, even among smart teams building things they were sure would work. “Most ideas shouldn’t ship” is not a sign of incompetence. It is the base rate.

The AI-specific numbers point the same way, though they should be read more cautiously: they are recent, self-reported and often vendor-adjacent. Gartner forecast that at least 30% of generative-AI projects would be abandoned after proof of concept (a figure it later pushed toward half), and an S&P survey found the share of organisations scrapping most of their AI initiatives jumping from roughly a sixth to over 40% in a year. Treat the exact figures as dated and directional. The point is not the precise percentage. It is that a high abandonment rate is the normal, healthy shape of a portfolio under uncertainty, and the organisations in trouble are usually the ones abandoning too little, too late.

The base rate: most deliberately-built ideas don’t move the metric. A high stop rate is the healthy shape, not a failure.

It isn’t the model — it’s that we can’t stop

If most ideas shouldn’t ship, the interesting question is why organisations keep funding the ones that aren’t working. The answer is one of the most robust findings in management research, and it has nothing to do with AI: escalation of commitment. Once we have invested in a course of action, we persist in it beyond the point the evidence justifies. The drivers are sunk cost, self-justification, a bias toward completing what we started, optimism and the illusion of control, and plain entrapment. The finding has been replicated repeatedly for about fifty years. It is why good people keep pouring money into bad projects. So the failure mode of most AI programmes is not the model. It is governance. Most AI failures are not technical failures; they are governance failures disguised as technical programmes.

Escalation of commitment: each transition adds sunk cost, so stopping gets harder exactly as the evidence gets weaker.

Couldn’t scale versus shouldn’t scale

It is worth separating two things that look alike from the outside. A worthwhile pilot that works but never makes it into the business has hit the Production Gap — it could not cross into a changed operating model, and that is an absorption failure to fix. This piece is about the other half: the use case that should not cross at all. Most stalled AI projects are not tragedies of execution. They are decisions that were never made — ideas that, on the evidence, did not deserve the next stage, kept alive because stopping felt like failure. Couldn’t scale is an engineering problem. Shouldn’t scale is a judgement problem, and the judgement usually goes missing.

The discipline: Default Off, and a high Kill Rate

The fix is two things working together: a philosophy and a measure. The philosophy is Default Off: production is not the automatic destination of every promising experiment; it is an outcome a use case has to earn. Continuation is not the default. Every transition (experiment to pilot, pilot to production, production to scale) is treated as a fresh investment decision that new evidence must justify, not an automatic promotion bought with enthusiasm. This gives the sharpest operational test in the piece: a pilot without a kill criterion is not a pilot. It is a delayed programme.

Default Off: each stage is earned at a gate with a pre-agreed kill criterion — continuation is never the default.
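The gate logic above can be sketched in a few lines. This is a minimal, hypothetical illustration in Python, not a tool the piece prescribes: `Gate`, `Decision`, `decide`, the `uplift_pct` metric and the 2.0 threshold are all invented for the example. It encodes three rules from the text: the kill criterion is fixed before the stage starts, missing evidence never counts as a pass, and a use case with no kill criterion cannot be promoted at all.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    PROMOTE = "promote"  # evidence earned the next stage
    KILL = "kill"        # evidence did not earn it; stop deliberately
    BLOCKED = "blocked"  # no pre-agreed kill criterion, so no gate can be held


@dataclass
class Gate:
    """A hypothetical stage gate, e.g. pilot -> production.

    The threshold is agreed before the stage starts, so it cannot be
    renegotiated once money has been spent.
    """
    metric: str
    kill_below: Optional[float]  # None means no kill criterion was ever set


def decide(gate: Gate, observed: Optional[float]) -> Decision:
    # Default Off: without a pre-agreed kill criterion there is nothing
    # to earn against, so the use case does not advance.
    if gate.kill_below is None:
        return Decision.BLOCKED
    # Missing evidence is treated as failing the gate, never as a pass.
    if observed is None or observed < gate.kill_below:
        return Decision.KILL
    return Decision.PROMOTE


# Illustrative usage with an invented metric and threshold.
pilot_gate = Gate(metric="uplift_pct", kill_below=2.0)
print(decide(pilot_gate, 3.1))                      # Decision.PROMOTE
print(decide(pilot_gate, 0.4))                      # Decision.KILL
print(decide(Gate("uplift_pct", None), 5.0))        # Decision.BLOCKED
```

The function deliberately takes no input for money already spent. Sunk cost is exactly the signal escalation of commitment smuggles into the decision, so leaving it out of the gate's signature is the point of the sketch.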

The measure is the Kill Rate: the share of use cases you deliberately stop, tracked over time the way you track delivery throughput. It sounds negative; it is the opposite. A portfolio with a healthy kill rate is one where judgement is actually operating — where the base rate is being respected and escalation is being beaten. A team that almost never kills anything is not disciplined or unusually wise; it has simply let continuation become the default, and is accumulating zombie initiatives that drain funding, attention and credibility while never quite being cancelled. Ask the question on a real portfolio — what is our kill rate this quarter, and which initiatives have crossed a stage without earning it? — and the zombies surface fast.
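The Kill Rate itself is simple arithmetic once gate decisions are recorded. The sketch below assumes a hypothetical log format (`GateOutcome`) and a made-up portfolio; it also assumes the denominator is every use case that reached a gate in the quarter, which is one reasonable reading of “the share of use cases you deliberately stop”, not a definition from the piece. The second helper addresses the companion question, which initiatives crossed a stage without earning it, by listing promotions recorded without gate evidence.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GateOutcome:
    """One gate decision for one use case (hypothetical record format)."""
    use_case: str
    quarter: str  # e.g. "2026-Q2"
    outcome: str  # "killed", "promoted" or "promoted_without_evidence"


def kill_rate_by_quarter(log: list[GateOutcome]) -> dict[str, float]:
    # Assumption: denominator = use cases that reached a gate that quarter;
    # numerator = those deliberately stopped at that gate.
    decided: dict[str, int] = defaultdict(int)
    killed: dict[str, int] = defaultdict(int)
    for row in log:
        decided[row.quarter] += 1
        if row.outcome == "killed":
            killed[row.quarter] += 1
    return {q: killed[q] / decided[q] for q in sorted(decided)}


def unearned_promotions(log: list[GateOutcome]) -> list[str]:
    # Candidate zombies: use cases that crossed a stage without the gate
    # evidence that should have justified it (deduplicated, sorted).
    return sorted({r.use_case for r in log if r.outcome == "promoted_without_evidence"})


# A made-up portfolio, for illustration only.
log = [
    GateOutcome("invoice-triage", "2026-Q1", "promoted"),
    GateOutcome("churn-copilot", "2026-Q1", "promoted_without_evidence"),
    GateOutcome("contract-summary", "2026-Q2", "killed"),
    GateOutcome("churn-copilot", "2026-Q2", "promoted_without_evidence"),
    GateOutcome("sales-forecast", "2026-Q2", "killed"),
    GateOutcome("invoice-triage", "2026-Q2", "promoted"),
]
print(kill_rate_by_quarter(log))  # {'2026-Q1': 0.0, '2026-Q2': 0.5}
print(unearned_promotions(log))   # ['churn-copilot']
```

Read the two outputs together: a quarter with a near-zero kill rate and a persistent list of unearned promotions is the zombie pattern the section describes.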

An initiative rarely became a programme because someone decided it deserved to — it accumulated momentum: a workshop became a pilot, a pilot a roadmap item, a roadmap item a programme, and at each step the cost of stopping grew. The strongest organisations treated every transition as a new investment decision; the evidence had to earn the next stage. The ones that struggled treated continuation as the default — and by the time the evidence was questioned, the initiative was being defended, not evaluated.
Sanjeev Purohit, from our delivery work

This is not an argument against experimenting

The obvious objection is that this sounds like an excuse to be timid — to pre-judge ideas and stop trying things. It is the opposite. Because the win rate is low, you should run more experiments, not fewer; experimentation is precisely how you find the third that works without betting the company on intuition. The discipline is about where you stop, not whether you start. Kill on evidence, at a gate — never pre-emptively, and never by quietly starving something instead of deciding. The purpose of a pilot is not to prove that something works. It is to discover whether it deserves to continue.

Make “no” legitimate and owned

None of this works if stopping something is treated as career-ending. The transformation-leadership job is to make “no” a respected outcome of good governance rather than an admission of failure: give someone explicit ownership of the kill decision, celebrate a well-judged stop the way you celebrate a launch, and report the kill rate to the board as a sign of health, not embarrassment. It completes a sequence the rest of this work has been building toward: can we build it, should we build it, and (the question almost no one owns) should we keep funding it. The fastest AI project is the one you stop before it becomes a programme. Looking back, the most expensive initiatives are rarely the ones that failed. They are the ones that should have stopped earlier and quietly accumulated enough momentum that nobody felt able to be the person who said no.

Frequently asked

Isn’t killing a project just admitting failure?
No. It is the system working. Across two decades of controlled experiments, only about a third of believed-useful ideas move the metric; most shouldn’t ship. A deliberate stop, made on evidence, is good portfolio discipline, not an error. The expensive mistake is the project that should have stopped and didn’t.
Why do organisations keep funding AI projects that aren’t working?
Escalation of commitment — a decades-replicated behavioural pattern driven by sunk cost, self-justification, completion bias, optimism and entrapment. The failure mode is governance, not model quality: most AI failures are governance failures disguised as technical programmes.
Doesn’t this contradict “experiment more”?
No. Because the win rate is low you should run more cheap experiments, not fewer. The discipline is where you stop: kill on evidence at a gate, not pre-emptively. A pilot without a kill criterion is not a pilot — it is a delayed programme.
What is the one metric to watch?
A deliberate kill rate (or no-build rate): the share of use cases you stop, tracked like delivery throughput. A portfolio that almost never kills anything has let continuation become the default and is accumulating zombie initiatives.
How is this different from a pilot that stalled?
A pilot that couldn’t cross into the business is the Production Gap — an absorption failure to fix. This is the other half: the use case that shouldn’t scale at all. Couldn’t scale is an engineering problem; shouldn’t scale is a judgement problem.

Our perspective

The common view

Disappointing AI outcomes mean the technology or the use cases aren’t ready yet; the answer is better models and more or better pilots.

The Ivaaya view

Most ideas structurally shouldn’t ship (the base rate), and the reason failing ones survive is escalation of commitment, not model quality — a governance failure. The fix is Default Off (continuation must be earned at a gate) measured by a deliberate Kill Rate. The highest-leverage AI capability is the ability to stop.

This is just an excuse to be timid and stop trying things.
The opposite — because the win rate is low you should run MORE cheap experiments, not fewer. The discipline is where you stop: kill on evidence at a gate, never pre-emptively. The purpose of a pilot is to discover whether it deserves to continue.
Killing too early destroys option value / ignores the J-curve.
Which is why you kill on evidence at a pre-agreed gate, not by gut or by quietly starving a project. A kill criterion set at the start is what protects option value from both premature death and zombie survival.
Isn’t this the same as a pilot that stalled?
No. Couldn’t scale is the Production Gap (absorption failure to fix); shouldn’t scale is a judgement that was never made. Different problems, different fixes.
  • Make production opt-in: every stage transition is a fresh, evidence-gated decision, not an automatic promotion.
  • Set a kill criterion at the start of every pilot; a pilot without one is a delayed programme.
  • Track a deliberate kill / no-build rate like delivery throughput; report it to the board as health.
  • Give someone explicit ownership of the kill decision and make a well-judged stop a celebrated outcome.

What we’ve observed

  • Two decades of controlled experiments (Microsoft/Google/Netflix/Booking/Amazon): ~1/3 of ideas move the metric (1/3 flat, 1/3 negative); mature orgs >50% fail, some domains 80–96%.
  • Escalation of commitment is a ~50-year replicated finding (Staw 1976 → Sleesman et al. meta-analysis): sunk cost, self-justification, completion bias, optimism/illusion of control, entrapment.
  • AI-attrition (bounded/dated): Gartner forecast ≥30% (later ≥50%) of GenAI projects abandoned post-POC; S&P survey scrapping most initiatives 17%→42% YoY (~46% of POCs); MIT 60/20/5 production funnel.
  • The POC that became a programme because no one would say stop; the steering committee funding a loser to avoid admitting sunk cost.
  • A successful workshop → pilot → roadmap item → programme, each step adding cost and audience, none a conscious decision.

How certain are we?

  • Only ~1/3 of deliberately built ideas move the metric (base rate) · established: observed repeatedly across delivery programmes.
  • Escalation of commitment is the dominant reason failing projects survive · established: observed repeatedly across delivery programmes.
  • A deliberate kill rate is a leading indicator of a healthy AI portfolio · emerging: still early, but increasingly visible.
  • AI-specific POC-abandonment percentages (forecast/self-reported, time-sensitive) · emerging: still early, but increasingly visible.


About the author

Priyanka Pandey

Founder & Editorial Lead

Priyanka Pandey founded Ivaaya and leads its editorial voice, translating real delivery experience into practical thinking on AI-native engineering, decision-making and technology leadership. Her work focuses on helping senior leaders make sense of the changes reshaping software delivery without adding to the noise.

Reviewed and challenged by

Sanjeev Purohit

Principal, Decision Architecture

Sanjeev works across enterprise architecture, product strategy and AI-native delivery. The ideas in this article have been challenged against real programmes, production systems and organisational decision-making before publication.

Compare notes

If you have an AI initiative that quietly became a programme without anyone deciding it should, the issue is usually governance, not the model. Tell us where one is hard to stop — we are comparing notes with teams making “no” a legitimate, owned decision.
