Why are AI product gross margins lower than SaaS?

Because inference is a per-use cost of goods, not a near-zero marginal cost. Every query re-runs the model and consumes compute, so cost scales with usage. Classic SaaS sits around 70–90% gross margin; AI-native products commonly run materially lower (roughly low-50s to mid-60s gross margin, dated and directional), with inference alone a large share of an AI product’s cost base.

Won’t falling token prices fix margins on their own?

Usually not. When unit cost falls, teams spend the saving on more usage — deeper reasoning, bigger context, agentic loops that multiply tokens per task — so aggregate spend rises even as price-per-token drops. Cheaper per token does not mean cheaper overall; the saving gets reinvested rather than banked.

Should we still use per-seat pricing?

Rarely, for usage-heavy AI. Per-seat pricing breaks when cost scales with usage: heavy users on a flat fee erode margin. Price to the cost driver — usage or outcome — so revenue and cost move together. The market is shifting from seats toward usage/hybrid models for exactly this reason.
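Here is a minimal Python sketch of the per-seat problem. Every figure in it is a hypothetical assumption, not data from this article: a $30 monthly seat, $0.02 of inference per request, and a usage price of $0.05 per request. The function names are illustrative and do not belong to any billing system.

```python
# Every number here is a hypothetical assumption for illustration only.
SEAT_PRICE = 30.00         # flat monthly fee per seat, dollars
COST_PER_REQUEST = 0.02    # inference cost of one request, dollars
PRICE_PER_REQUEST = 0.05   # usage price charged per request, dollars


def gross_margin(revenue: float, cogs: float) -> float:
    """Gross margin as a fraction of revenue: (revenue - COGS) / revenue."""
    return (revenue - cogs) / revenue


def seat_margin(requests: int) -> float:
    """One user on a flat seat fee: revenue is fixed while cost grows with use."""
    return gross_margin(SEAT_PRICE, requests * COST_PER_REQUEST)


def usage_margin(requests: int) -> float:
    """One user on per-request pricing: revenue and cost grow together."""
    return gross_margin(requests * PRICE_PER_REQUEST, requests * COST_PER_REQUEST)


for label, requests in [("light", 200), ("typical", 1_000), ("heavy", 5_000)]:
    print(f"{label:>7}: seat {seat_margin(requests):8.1%}   usage {usage_margin(requests):6.1%}")
```

With these assumed numbers the light user is worth more on the seat plan (about 87% margin against 60%), the typical user falls to about 33%, and the heavy user goes deeply margin-negative (about -233%). Usage pricing holds 60% for everyone because revenue moves with cost. Real plans often blend the two, for example a seat fee with a usage allowance.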

What is the Margin Floor?

The structural gross-margin level an AI product settles toward once inference is a real cost of goods, held down because efficiency gains get reinvested into more capability. You raise it by design — pricing to the cost driver, gating features by margin, and driving COGS down — not by waiting for tokens to get cheaper.

How do we measure AI product economics?

Treat the token as the unit of cost and attribute it: cost-per-request, gross-margin-by-feature, and gross-margin-by-customer — not just a company-level number. That is what surfaces the features and customers sitting below the Margin Floor before they quietly sink the business.
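Attributing token cost can start as simply as tagging every request with its customer and feature. The Python sketch below rests on assumptions throughout: a hypothetical request log, hypothetical prices ($3 per million input tokens, $15 per million output tokens), and a per-request revenue allocation. How you allocate revenue to individual requests is an accounting choice this sketch does not settle, and the `Request`, `cost_per_request` and `margin_by` names are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in dollars per token; real prices vary by model and vendor.
INPUT_PRICE = 3.00 / 1_000_000
OUTPUT_PRICE = 15.00 / 1_000_000


@dataclass
class Request:
    customer: str
    feature: str
    input_tokens: int
    output_tokens: int
    revenue: float  # revenue allocated to this request (an assumed allocation)


def cost_per_request(req: Request) -> float:
    """Token cost of a single request, priced separately for input and output."""
    return req.input_tokens * INPUT_PRICE + req.output_tokens * OUTPUT_PRICE


def margin_by(log: list[Request], key: str) -> dict[str, float]:
    """Gross margin grouped by a request attribute, e.g. 'feature' or 'customer'."""
    revenue: dict[str, float] = defaultdict(float)
    cogs: dict[str, float] = defaultdict(float)
    for req in log:
        group = getattr(req, key)
        revenue[group] += req.revenue
        cogs[group] += cost_per_request(req)
    return {group: (revenue[group] - cogs[group]) / revenue[group] for group in revenue}


log = [
    Request("acme", "summarise", 2_000, 500, revenue=0.05),
    Request("acme", "agent", 40_000, 8_000, revenue=0.10),
    Request("globex", "summarise", 1_000, 300, revenue=0.05),
]
print("by feature: ", {k: f"{v:.0%}" for k, v in margin_by(log, "feature").items()})
print("by customer:", {k: f"{v:.0%}" for k, v in margin_by(log, "customer").items()})
```

In this made-up log the summarisation feature runs at about 79% margin while the agentic feature loses money on every request (about -140%), and that one feature pulls the customer acme below zero (about -69%) while globex sits at 85%. A single company-level number would blur which feature and which customer are responsible.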

Product Ventures · 11 min read · Updated 2026-06-20

The Unit Economics of an AI Product

Classic software was almost free to run; an AI product pays real money every time a customer uses it. Inference behaves more like cost of goods sold than infrastructure — and falling token prices do not automatically fix margins, because consumption expands to absorb the savings. AI changed the economics of software, not just the mechanics. We call the line you now have to design for the Margin Floor.

By Priyanka Pandey · Founder & Editorial Lead

Reviewed and challenged by Sanjeev Purohit · Principal, Decision Architecture

Built from: Field experience · Independent research · Data-backed · Original framework · Reviewed with field experience

Last substantively reviewed · 2026-06-20

In brief

Inference behaves more like cost of goods sold than infrastructure, and falling token prices do not automatically improve margins because consumption expands to absorb the savings — so an AI product’s margin is a design problem (the Margin Floor), not a deflation problem to wait out.

Classic software had ~zero marginal cost (75–85% gross margin); an AI product pays for compute and tokens on every use, and inference is often the largest variable line in COGS, scaling with usage.
AI changed the economics of software, not just the mechanics. Intelligence became abundant; margin became scarce.
Treating inference as infrastructure (fixed, amortised) rather than COGS (per-unit) is the category error that hides the cost that grows when you succeed.
Falling token prices do not fix margins: efficiency gains are usually reinvested into more usage (deeper reasoning, bigger context, agentic loops), so aggregate spend rises; cheaper per token ≠ cheaper overall.
The Margin Floor: the structural gross-margin level an AI product settles toward; you raise it only by design, on three levers — price to the cost driver, gate features by margin, drive COGS down.
Per-seat pricing breaks when cost scales with usage. The AI P&L has two halves: the cost to build and deliver, and the cost to run per customer.

For two decades, software was a wonderful business for one quiet reason: once you had written it, serving the next customer cost almost nothing. Marginal cost trended to zero, gross margins settled at 75–85%, and those margins funded everything else. AI breaks that premise at the foundation. An AI product pays real money — compute, tokens, energy — every single time a customer uses it. The marginal cost is no longer trivial; it is often the largest variable line in the cost of goods, and it scales with usage rather than melting away at scale. This is not a tooling change. It is an economic one.

Intelligence became abundant. Margin became scarce.

The counter-intuitive core: cost per use falls, usage rises to meet it, and margin barely moves. Cheaper AI ≠ higher margin.

The end of zero-marginal-cost software

The numbers, taken carefully, tell a consistent story. Classic SaaS gravitates to roughly 70–90% gross margin. AI-native products run materially lower — venture snapshots in early 2026 put the average in the low-50s (ICONIQ’s 2026 survey projects around 52%), with the best LLM-native companies nearer 65% (Bessemer), and inference alone eating something like a fifth of an AI product’s cost base. (Treat any single figure as dated, self-reported and directional — these are venture-sample projections, not audited actuals, and they move — but the gap is not in dispute.) The reason is structural, not temporary: inference is a per-transaction variable cost. It behaves more like cost of goods sold than like infrastructure. And that distinction is the whole game.

Infrastructure is a fixed platform cost you pay to be in business and then amortise across everyone. Cost of goods sold is a cost you pay on every unit you sell. Treating inference as the former when it is really the latter is the category error that quietly destroys AI-product margins — because it hides the one cost that grows precisely when you succeed.
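The difference between a fixed platform cost and a per-use cost shows up in a few lines of arithmetic. This Python sketch uses invented figures (a $20 monthly price, a $50,000 monthly platform cost, $2 of other serving cost per user, and 600 queries per user at $0.015 each) purely to show the shape of the curves. The SaaS margin climbs toward 90% as the fixed cost is amortised, the AI margin tops out near 45% because inference grows with every user, and booking inference as a fixed launch-time budget reports a margin that is not really there.

```python
# Invented figures, chosen only to show the shape of the margin curves.
PRICE_PER_USER = 20.0              # monthly subscription, dollars
FIXED_PLATFORM_COST = 50_000.0     # monthly platform/hosting cost, dollars
SERVING_COST_PER_USER = 2.0        # other per-user cost of serving, dollars
QUERIES_PER_USER = 600             # monthly queries per user
INFERENCE_COST_PER_QUERY = 0.015   # dollars per query


def saas_margin(users: int) -> float:
    """Classic SaaS: the fixed platform cost is amortised away as users grow."""
    revenue = users * PRICE_PER_USER
    cogs = FIXED_PLATFORM_COST + users * SERVING_COST_PER_USER
    return (revenue - cogs) / revenue


def ai_margin(users: int) -> float:
    """AI product: the same costs plus inference paid on every query (true COGS)."""
    revenue = users * PRICE_PER_USER
    inference = users * QUERIES_PER_USER * INFERENCE_COST_PER_QUERY
    cogs = FIXED_PLATFORM_COST + users * SERVING_COST_PER_USER + inference
    return (revenue - cogs) / revenue


def ai_margin_booked_as_fixed(users: int, launch_users: int = 10_000) -> float:
    """The category error: inference budgeted once at launch volume and
    amortised like infrastructure, so its growth with usage never shows up."""
    revenue = users * PRICE_PER_USER
    inference_budget = launch_users * QUERIES_PER_USER * INFERENCE_COST_PER_QUERY
    cogs = FIXED_PLATFORM_COST + users * SERVING_COST_PER_USER + inference_budget
    return (revenue - cogs) / revenue


for users in (10_000, 100_000, 1_000_000):
    print(f"{users:>9,} users: SaaS {saas_margin(users):.1%}  AI {ai_margin(users):.1%}  "
          f"AI booked as fixed {ai_margin_booked_as_fixed(users):.1%}")
```

At 100,000 users the sketch gives 87.5% for the SaaS case, 42.5% for the AI case, and 83% if inference were booked as a fixed budget sized for 10,000 users: the gap between the last two is the cost that grows precisely when you succeed.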

Inference sits in COGS, not infrastructure — a wedge that grows with every query, not a fixed platform line.

But token prices are collapsing

The obvious objection is that this fixes itself. Inference is getting cheaper at a startling rate: for a fixed capability, the price per token has fallen steeply — independent analysis from Epoch AI puts it at roughly five-to-ten times cheaper per year on the cost-performance frontier, and faster still at the task level, so a given capability that cost real money two years ago now costs a fraction of it. If the cost of the thing in your COGS is collapsing, surely the margin problem solves itself. It is a reasonable assumption. It is also usually wrong.
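Annual price declines compound, which makes it easy to confuse a per-year factor with a cumulative drop. A small Python sketch (the function names are illustrative) converts between the two: at five to ten times cheaper per year, prices fall 25× to 100× over two years, whereas a 90% drop spread over two years implies only about 3.2× per year.

```python
def cumulative_factor(per_year: float, years: float) -> float:
    """How many times cheaper a fixed capability gets after `years`,
    if its price falls by `per_year` times each year."""
    return per_year ** years


def per_year_factor(total_drop: float, years: float) -> float:
    """Annualised 'times cheaper per year' implied by a fractional price drop.

    total_drop=0.9 means the price fell 90%, i.e. it is 10x lower overall.
    """
    if not 0.0 <= total_drop < 1.0:
        raise ValueError("total_drop must be in [0, 1)")
    return (1.0 / (1.0 - total_drop)) ** (1.0 / years)


for rate in (5.0, 10.0):
    print(f"{rate:.0f}x per year -> {cumulative_factor(rate, 2):.0f}x cheaper after two years")
print(f"90% over two years -> {per_year_factor(0.9, 2):.2f}x per year")
```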

The savings never stay savings

Here is the part that catches people out. When the unit cost of a capability falls, organisations almost never bank the saving. They spend it — on more context, deeper reasoning, more steps, more personalisation, more automation, more of the things that were previously too expensive to do. Agentic workflows are the clearest case: a single task that once took one model call now takes a loop of them, consuming roughly an order of magnitude more tokens — Anthropic reports its own multi-agent systems use around fifteen times the tokens of a single chat. So per-token price falls while tokens-per-task rises, and aggregate spend climbs rather than falls. Cheaper per token does not mean cheaper overall. The margin you expected the price drop to hand you was quietly consumed by the extra usage that same price drop unlocked.
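The arithmetic behind that claim is a single product: spend equals tasks, times tokens per task, times price per token. This Python sketch applies it to a hypothetical workload. The 10× price fall and the roughly 15× token multiplier echo the figures above; the starting volume, the starting price and the doubling of task volume are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    tasks: int                # tasks per month
    tokens_per_task: int      # total tokens one task consumes
    price_per_million: float  # dollars per million tokens


def monthly_spend(w: Workload) -> float:
    """Aggregate spend = tasks x tokens per task x price per token."""
    return w.tasks * w.tokens_per_task * w.price_per_million / 1_000_000


# Hypothetical starting point: one model call per task.
before = Workload(tasks=100_000, tokens_per_task=2_000, price_per_million=10.0)

# After a year (assumed): tokens are 10x cheaper, tasks run as multi-agent loops
# using ~15x the tokens of a single chat, and cheaper AI doubles task volume.
after = Workload(
    tasks=before.tasks * 2,
    tokens_per_task=before.tokens_per_task * 15,
    price_per_million=before.price_per_million / 10,
)

print(f"price per million tokens: ${before.price_per_million:.2f} -> ${after.price_per_million:.2f}")
print(f"monthly spend:            ${monthly_spend(before):,.0f} -> ${monthly_spend(after):,.0f}")
```

Under these assumptions the monthly bill triples, from $2,000 to $6,000, even though each token became ten times cheaper.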

Every time the cost of a capability fell, the saving was rarely banked — it was reinvested into something previously unaffordable: more context, more automation, more personalisation. Capacity expanded to consume the efficiency, so cost per unit fell while expectations rose. Efficiency rarely arrives as profit on its own; somebody has to decide to keep the saving rather than spend it.
— Sanjeev Purohit, from our delivery work

	Classic software	AI product
Cost to serve	Near-zero marginal cost	Real money on every use — behaves like cost of goods sold
What a price cut does	Margin expands as you scale	Margin stays flat — usage expands to absorb the saving
The line to design for	A gross margin you can assume	The Margin Floor you have to engineer

AI changed the economics of software, not just the mechanics — design for the Margin Floor.

So margin is a design problem: the Margin Floor

Put the two forces together — falling unit price, rising usage — and you get a gross-margin floor that does not lift on its own. We call it the Margin Floor: the structural level your gross margin settles toward once inference is a real cost of goods, held down by the fact that every efficiency gain gets reinvested into more capability. You do not reach a healthy margin by waiting for tokens to get cheaper. You reach it by design, on three levers.
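A deliberately simple Python sketch of that mechanism, with hypothetical inputs throughout: revenue per task stays fixed, tokens get cheaper by a constant factor each year, and a chosen share of each year's saving is spent on more tokens per task (the reinvestment described above). It illustrates why the floor holds; it is not a method for estimating your own floor, and `margin_path` is an illustrative name.

```python
def margin_path(revenue_per_task: float, cost_per_task: float,
                price_drop_per_year: float, reinvest_share: float,
                years: int) -> list[float]:
    """Gross margin per task, year by year, when part of each saving is reinvested.

    price_drop_per_year: e.g. 5.0 means tokens get 5x cheaper each year.
    reinvest_share: fraction of the saving spent on more tokens per task
                    (deeper reasoning, bigger context, agentic loops).
    """
    margins = []
    cost = cost_per_task
    for _ in range(years + 1):
        margins.append((revenue_per_task - cost) / revenue_per_task)
        cheaper = cost / price_drop_per_year       # same tokens at the new price
        saving = cost - cheaper
        cost = cheaper + reinvest_share * saving   # part of the saving buys more tokens
    return margins


for share in (0.0, 0.9, 1.0):
    path = margin_path(1.00, 0.45, price_drop_per_year=5.0, reinvest_share=share, years=3)
    print(f"reinvest {share:.0%}: " + "  ".join(f"{m:.1%}" for m in path))
```

Starting from a 55% margin with tokens five times cheaper each year, banking every saving pushes margin above 99% within three years, reinvesting all of it leaves margin exactly where it started, and reinvesting 90% lifts it only to about 65%. In this toy model, what sets the floor is how much of the saving gets reinvested, not how fast tokens get cheaper.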

The Margin Floor: features sit above or below it; three levers raise the floor and lift features above it.

Price to the cost driver. Per-seat pricing breaks when cost scales with usage — your heaviest users, on a flat fee, are exactly the ones losing you money. The market is already moving from seats to usage and outcome pricing for this reason. Align what you charge with what drives your cost, or your best customers become your worst accounts.
Gate features by margin. Every feature now carries a marginal cost, so every feature has to earn it. This is where deciding what to build meets deciding what to run: a capability that was cheap to build can still be a feature you cannot afford to serve. Measure gross margin by feature and by customer, not just at the company line.
Drive COGS down deliberately. Route easy work to small models and escalate only when needed; cache repeated context; distil; and, at scale, consider owning inference rather than renting it. These are real levers — but they are engineering work with trade-offs, not a free lunch.
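The routing lever can be sized before it is built. This Python sketch assumes one simple design, a cascade in which every request tries a small model first and a fraction is escalated to a large model, so escalated requests pay for both calls. The per-request costs are hypothetical, and a real router also has to be judged on answer quality, which this sketch ignores.

```python
def blended_cost(small_cost: float, large_cost: float, escalation_rate: float) -> float:
    """Expected cost per request for a try-small-then-escalate cascade."""
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be between 0 and 1")
    return small_cost + escalation_rate * large_cost


def break_even_escalation(small_cost: float, large_cost: float) -> float:
    """Escalation rate above which the cascade costs more than always using the large model."""
    return 1.0 - small_cost / large_cost


SMALL, LARGE = 0.001, 0.02   # hypothetical dollars per request
for rate in (0.1, 0.2, 0.5):
    cost = blended_cost(SMALL, LARGE, rate)
    print(f"escalate {rate:.0%}: ${cost:.4f} per request vs ${LARGE:.4f} always-large")
print(f"break-even escalation rate: {break_even_escalation(SMALL, LARGE):.0%}")
```

At a 20% escalation rate the cascade costs $0.005 per request, a quarter of the always-large cost, and it only stops paying once more than 95% of requests escalate. Routing up front with a classifier avoids paying twice, at the price of misrouting some hard requests.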

Can we afford to succeed with it?

The AI P&L now has two halves: what it costs to build and deliver the software, and what it costs to run it for every customer, every day. The second half is new, and it is the one founders underprice. The discipline is not pessimism about AI. It is treating inference as the cost of goods it is, designing the Margin Floor instead of waiting for it, and replacing “can we afford to build this?” with a sharper question: can we afford to succeed with it? Because the hardest part of an AI product is not getting the answer. It is getting the economics to work.

Our perspective

The common view

AI margins are disappointing now but will recover automatically as inference (tokens/GPUs) keeps getting cheaper — it is a temporary cost problem.

The Ivaaya view

Inference is COGS, not infrastructure, and falling prices will not rescue margins because consumption expands to absorb the savings (the Jevons paradox: when a resource becomes cheaper to use, total use of it tends to rise). Margin is a structural floor you raise by design (pricing to the cost driver, gating features by margin, and driving COGS down), not a deflation you can wait out. AI changed the economics of software, not just the mechanics.

“Token prices are collapsing (~10× a year), so the margin problem solves itself.”: Unit price falls but usage rises to meet it; agentic loops multiply tokens per task and teams reinvest savings into more capability, so aggregate spend climbs. The saving is consumed, not banked; cheaper per token ≠ cheaper overall.
“Inference is just infrastructure, a platform cost you amortise like servers.”: Infrastructure is fixed and shared across all users; inference is paid on every transaction and scales with usage. It belongs in COGS. Mislabelling it hides the one cost that grows exactly when the product succeeds.
“Then AI products are just structurally bad businesses.”: No. They are different businesses. The Margin Floor is raisable by design: price to the cost driver (not seats), gate features by margin, and drive COGS down (routing, caching, distillation, owning inference). Founders who treat economics as part of product design build durable margins.
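Owning inference is ultimately a break-even calculation: a fixed monthly cost for dedicated capacity against a per-token rental price. The Python sketch below uses hypothetical numbers ($40,000 a month of fixed cost, $5 per million tokens rented, $1 per million tokens of marginal cost when owned) and ignores capacity limits, utilisation and engineering effort, all of which matter in practice. The function names are illustrative.

```python
def break_even_millions(fixed_monthly: float, api_per_million: float,
                        own_marginal_per_million: float) -> float:
    """Monthly volume (millions of tokens) at which owning costs the same as renting."""
    if api_per_million <= own_marginal_per_million:
        raise ValueError("owning cannot break even if renting is cheaper per token")
    return fixed_monthly / (api_per_million - own_marginal_per_million)


def cheaper_option(millions: float, fixed_monthly: float, api_per_million: float,
                   own_marginal_per_million: float) -> str:
    """'own' if dedicated capacity is strictly cheaper at this volume, else 'rent'."""
    rent = millions * api_per_million
    own = fixed_monthly + millions * own_marginal_per_million
    return "own" if own < rent else "rent"


FIXED, API, OWN = 40_000.0, 5.0, 1.0  # hypothetical dollars per month / per million tokens
print(f"break-even: {break_even_millions(FIXED, API, OWN):,.0f}M tokens per month")
for volume in (5_000, 20_000):
    print(f"{volume:>6,}M tokens: {cheaper_option(volume, FIXED, API, OWN)}")
```

At these assumed prices, owning only pays off above about 10 billion tokens a month; below that, renting is cheaper.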

Account for inference in COGS and model gross-margin-by-feature and by-customer, not just at the company line.
Price to the cost driver (usage/outcome), not per seat.
Make economics part of product design — a feature must earn its marginal cost, not just its build cost.
Treat falling token prices as fuel for more capability, not as automatic margin.

What we’ve observed

Classic SaaS ~70–90% gross margin vs AI-native products materially lower (early-2026 venture snapshots: ~low-50s average, ICONIQ projects ~52%; best LLM-native ~65% per Bessemer; inference ~20–23% of an AI product’s cost base, not of its revenue). Self-reported, dated and directional.
For a fixed capability, inference price per token has fallen roughly 5–10× per year (90% or more within two years); yet aggregate AI spend rose (enterprise ~$1.2M in 2024 → ~$7M in 2026).
Agentic workflows consume ~10–500× more tokens per task than a single chatbot call; ~72% of production AI cost sits outside the model invoice (orchestration, retrieval, retries, observability).
Reported negative per-unit margins while scaling (e.g. Copilot, Anthropic, Cursor) before pricing/inference changes — press-reported, bounded.

A demo that looked cheap until cost-per-call was multiplied by real usage.
A feature shipped on a flat plan that a power user turned margin-negative.

How certain are we?

AI products carry structurally lower gross margins than classic SaaS due to inference COGS — observed: Seen consistently in our own work.
Falling token prices do not automatically improve margins because usage expands to absorb savings — observed: Seen consistently in our own work.
The Margin Floor is raisable by pricing, margin-gating and COGS reduction — emerging: Still early, but increasingly visible.
Exact margin/cost figures (time-sensitive; some vendor/press-reported) — emerging: Still early, but increasingly visible.

About the author

Priyanka Pandey

Founder & Editorial Lead

Priyanka Pandey founded Ivaaya and leads its editorial voice, translating real delivery experience into practical thinking on AI-native engineering, decision-making and technology leadership. Her work focuses on helping senior leaders make sense of the changes reshaping software delivery without adding to the noise.

Reviewed and challenged by

Sanjeev Purohit

Principal, Decision Architecture

Sanjeev works across enterprise architecture, product strategy and AI-native delivery. The ideas in this article have been challenged against real programmes, production systems and organisational decision-making before publication.

Compare notes

If your AI product’s margin gets thinner the more people use it, the issue is usually that inference is sitting in the wrong place on the P&L. Tell us where the economics are biting — we are comparing notes with teams designing for the Margin Floor rather than waiting for tokens to get cheaper.
