For two decades, software was a wonderful business for one quiet reason: once you had written it, serving the next customer cost almost nothing. Marginal cost trended to zero, gross margins settled at 75–85%, and those margins funded everything else. AI breaks that premise at the foundation. An AI product pays real money — compute, tokens, energy — every single time a customer uses it. The marginal cost is no longer trivial; it is often the largest variable line in the cost of goods, and it scales with usage rather than melting away at scale. This is not a tooling change. It is an economic one.
Intelligence became abundant. Margin became scarce.
The end of zero-marginal-cost software
The numbers, taken carefully, tell a consistent story. Classic SaaS gravitates to roughly 70–90% gross margin. AI-native products run materially lower — venture snapshots in early 2026 put the average in the low-50s (ICONIQ’s 2026 survey projects around 52%), with the best LLM-native companies nearer 65% (Bessemer), and inference alone eating something like a fifth of an AI product’s cost base. (Treat any single figure as dated, self-reported and directional — these are venture-sample projections, not audited actuals, and they move — but the gap is not in dispute.) The reason is structural, not temporary: inference is a per-transaction variable cost. It behaves more like cost of goods sold than like infrastructure. And that distinction is the whole game.
Infrastructure is a fixed platform cost you pay to be in business and then amortise across everyone. Cost of goods sold is a cost you pay on every unit you sell. Treating inference as the former when it is really the latter is the category error that quietly destroys AI-product margins — because it hides the one cost that grows precisely when you succeed.
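A minimal sketch of that distinction, using made-up numbers: the price, the fixed platform cost, and the per-customer inference cost below are all hypothetical and chosen only so the two cases start at the same margin. It compares the gross margin when the cost is a fixed amount amortised across customers with the margin when the same kind of cost is paid again for every customer served.

```python
def gross_margin(revenue: float, cogs: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cogs) / revenue

# All three figures are hypothetical, for illustration only.
PRICE_PER_CUSTOMER = 100.0      # monthly price charged to each customer
FIXED_PLATFORM_COST = 20_000.0  # paid once, amortised across everyone
INFERENCE_PER_CUSTOMER = 40.0   # paid again for every customer served

def margin_if_fixed(customers: int) -> float:
    """Margin when the cost behaves like infrastructure."""
    revenue = customers * PRICE_PER_CUSTOMER
    return gross_margin(revenue, FIXED_PLATFORM_COST)

def margin_if_per_unit(customers: int) -> float:
    """Margin when the cost behaves like cost of goods sold."""
    revenue = customers * PRICE_PER_CUSTOMER
    return gross_margin(revenue, customers * INFERENCE_PER_CUSTOMER)

for n in (500, 5_000, 50_000):
    print(n, round(margin_if_fixed(n), 3), round(margin_if_per_unit(n), 3))
```

Under these assumptions both cases start at a 60% margin with 500 customers. The fixed-cost margin climbs to 96% and then 99.6% as customers grow, while the per-unit margin stays at 60% however large the business gets.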
But token prices are collapsing
The obvious objection is that this fixes itself. Inference is getting cheaper at a startling rate: for a fixed capability, the price per token has fallen steeply — independent analysis from Epoch AI puts it at roughly five-to-ten times cheaper per year on the cost-performance frontier, and faster still at the task level, so a given capability that cost real money two years ago now costs a fraction of it. If the cost of the thing in your COGS is collapsing, surely the margin problem solves itself. It is a reasonable assumption. It is also usually wrong.
The savings never stay savings
Here is the part that catches people out. When the unit cost of a capability falls, organisations almost never bank the saving. They spend it — on more context, deeper reasoning, more steps, more personalisation, more automation, more of the things that were previously too expensive to do. Agentic workflows are the clearest case: a single task that once took one model call now takes a loop of them, consuming roughly an order of magnitude more tokens — Anthropic reports its own multi-agent systems use around fifteen times the tokens of a single chat. So per-token price falls while tokens-per-task rises, and aggregate spend climbs rather than falls. Cheaper per token does not mean cheaper overall. The margin you expected the price drop to hand you was quietly consumed by the extra usage that same price drop unlocked.
The pattern holds every time the cost of a capability falls: the saving is reinvested into something that was previously unaffordable. Capacity expands to consume the efficiency, so cost per unit falls while expectations rise. Efficiency rarely arrives as profit on its own; somebody has to decide to keep the saving rather than spend it.
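The arithmetic is simple enough to sketch. Spend per task is price per token multiplied by tokens per task, so the change in spend is the token growth divided by the price drop. The example below plugs in the two figures quoted above (tokens five to ten times cheaper, a multi-agent system using about fifteen times the tokens). Pairing them is an illustrative assumption: they come from different measurements and are not claimed to describe the same product or the same time window.

```python
def spend_multiplier(price_drop: float, token_growth: float) -> float:
    """Ratio of new spend per task to old spend per task.

    price_drop:   how many times cheaper a token became (10 means 10x cheaper).
    token_growth: how many times more tokens a task now consumes.
    """
    return token_growth / price_drop

# Illustrative pairing of figures quoted in the text.
for drop in (5, 10):
    multiplier = spend_multiplier(drop, 15)
    print(f"{drop}x cheaper tokens, 15x tokens per task -> {multiplier:.1f}x spend per task")
```

With these inputs, spend per task rises 3.0x at a fivefold price drop and 1.5x at a tenfold one. Spend only falls when the price drop outpaces the growth in tokens per task.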
| | Classic software | AI product |
|---|---|---|
| Cost to serve | Near-zero marginal cost | Real money on every use — behaves like cost of goods sold |
| What a price cut does | Margin expands as you scale | Margin stays flat — usage expands to absorb the saving |
| The line to design for | A gross margin you can assume | The Margin Floor you have to engineer |
So margin is a design problem: the Margin Floor
Put the two forces together — falling unit price, rising usage — and you get a gross-margin floor that does not lift on its own. We call it the Margin Floor: the structural level your gross margin settles toward once inference is a real cost of goods, held down by the fact that every efficiency gain gets reinvested into more capability. You do not reach a healthy margin by waiting for tokens to get cheaper. You reach it by design, on three levers.
- Price to the cost driver. Per-seat pricing breaks when cost scales with usage — your heaviest users, on a flat fee, are exactly the ones losing you money. The market is already moving from seats to usage and outcome pricing for this reason. Align what you charge with what drives your cost, or your best customers become your worst accounts.
- Gate features by margin. Every feature now carries a marginal cost, so every feature has to earn it. This is where deciding what to build meets deciding what to run: a capability that was cheap to build can still be a feature you cannot afford to serve. Measure gross margin by feature and by customer, not just at the company line.
- Drive COGS down deliberately. Route easy work to small models and escalate only when needed; cache repeated context; distil; and, at scale, consider owning inference rather than renting it. These are real levers — but they are engineering work with trade-offs, not a free lunch.
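As one illustration of the third lever, here is a rough sketch of the expected cost of routing with escalation. The two models, their prices, and the escalation rates are hypothetical, and the sketch assumes an escalated request pays for both the small-model attempt and the large-model retry.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # hypothetical price, in dollars

# Hypothetical models and prices, for illustration only.
SMALL = Model("small", 0.0005)
LARGE = Model("large", 0.0100)

def request_cost(model: Model, tokens: int) -> float:
    """Cost of one request on one model."""
    return tokens / 1000 * model.cost_per_1k_tokens

def blended_cost_per_request(tokens: int, escalation_rate: float) -> float:
    """Expected cost when every request tries SMALL first and a fraction
    (escalation_rate) is retried on LARGE. Escalated requests pay for both."""
    return request_cost(SMALL, tokens) + escalation_rate * request_cost(LARGE, tokens)

always_large = request_cost(LARGE, 2_000)
for rate in (0.1, 0.2, 0.5, 1.0):
    routed = blended_cost_per_request(2_000, rate)
    print(f"escalation {rate:.0%}: routed ${routed:.4f} vs always-large ${always_large:.4f}")
```

Under these assumptions a 20% escalation rate cuts the cost per request to about a quarter of always using the large model. If nearly every request escalates, routing costs slightly more than skipping the small model, which is the kind of trade-off the lever carries.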
Can we afford to succeed with it?
The AI P&L now has two halves: what it costs to build and deliver the software, and what it costs to run it for every customer, every day. The second half is new, and it is the one founders underprice. The discipline is not pessimism about AI. It is treating inference as the cost of goods it is, designing the Margin Floor instead of waiting for it, and asking a sharper question than “can we afford to build this?”: can we afford to succeed with it? Because the hardest part of an AI product is not getting the answer. It is getting the economics to work.
Frequently asked
- Why are AI product gross margins lower than SaaS?
- Because inference is a per-use cost of goods, not a near-zero marginal cost. Every query re-runs the model and consumes compute, so cost scales with usage. Classic SaaS sits around 70–90% gross margin; AI-native products commonly run materially lower (roughly low-50s to mid-60s gross margin, dated and directional), with inference alone something like a fifth of an AI product’s cost base.
- Won’t falling token prices fix margins on their own?
- Usually not. When unit cost falls, teams spend the saving on more usage — deeper reasoning, bigger context, agentic loops that multiply tokens per task — so aggregate spend rises even as price-per-token drops. Cheaper per token does not mean cheaper overall; the saving gets reinvested rather than banked.
- Should we still use per-seat pricing?
- Rarely, for usage-heavy AI. Per-seat pricing breaks when cost scales with usage: heavy users on a flat fee erode margin. Price to the cost driver — usage or outcome — so revenue and cost move together. The market is shifting from seats toward usage/hybrid models for exactly this reason.
- What is the Margin Floor?
- The structural gross-margin level an AI product settles toward once inference is a real cost of goods, held down because efficiency gains get reinvested into more capability. You raise it by design — pricing to the cost driver, gating features by margin, and driving COGS down — not by waiting for tokens to get cheaper.
- How do we measure AI product economics?
- Treat the token as the unit of cost and attribute it: cost-per-request, gross-margin-by-feature, and gross-margin-by-customer — not just a company-level number. That is what surfaces the features and customers sitting below the Margin Floor before they quietly sink the business.
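The attribution described in the last answer can be sketched in a few lines. The request log, the blended token price, and the flat monthly fee are all hypothetical, with round numbers picked to keep the arithmetic easy to follow. The sketch reports cost by feature but not margin by feature, because feature-level margin also needs a rule for allocating revenue to features, which is left open here.

```python
from collections import defaultdict

# Hypothetical request log: (customer, feature, tokens used).
REQUESTS = [
    ("acme", "summarise", 4_000),
    ("acme", "agent_run", 60_000),
    ("acme", "agent_run", 80_000),
    ("birch", "summarise", 3_000),
    ("birch", "summarise", 5_000),
]
COST_PER_1K_TOKENS = 0.50                          # hypothetical blended price
MONTHLY_REVENUE = {"acme": 50.0, "birch": 50.0}    # hypothetical flat fee

def cost_of(tokens: int) -> float:
    """Inference cost of one request, with the token as the unit of cost."""
    return tokens / 1000 * COST_PER_1K_TOKENS

cost_by_customer: dict[str, float] = defaultdict(float)
cost_by_feature: dict[str, float] = defaultdict(float)
for customer, feature, tokens in REQUESTS:
    cost = cost_of(tokens)
    cost_by_customer[customer] += cost
    cost_by_feature[feature] += cost

# Gross margin per customer: (revenue - attributed cost) / revenue.
margin_by_customer = {
    c: (MONTHLY_REVENUE[c] - cost_by_customer[c]) / MONTHLY_REVENUE[c]
    for c in MONTHLY_REVENUE
}
cost_per_request = sum(cost_by_customer.values()) / len(REQUESTS)

print("cost per request:", round(cost_per_request, 2))
print("cost by feature:", dict(cost_by_feature))
print("margin by customer:", margin_by_customer)
```

On this toy data the two customers pay the same flat fee, yet one sits at a 92% margin and the other at minus 44%, driven almost entirely by the agentic feature. A company-level average would hide both facts.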