What’s the difference between an API model and an open-weight model?

With an API (proprietary) model — GPT, Claude/Opus, Gemini — you rent access; the model runs on the provider’s servers and the weights never leave them. With an open-weight model — Gemma, Llama, Mistral, Qwen — the provider publishes the file, so you can download it, run it on your own infrastructure (or a device), inspect it and usually fine-tune it.

Is “open weights” the same as “open source”?

No. Open weights means the trained model file is downloadable, but the training data and code are almost always withheld, so most “open” models aren’t open-source in the formal sense. Licences also differ a lot — some are permissive (Apache/MIT), others carry real commercial limits (e.g. Meta’s Llama caps free use at 700M monthly users and bans training rival models). Check the licence, not the label.

Is self-hosting cheaper than using an API?

Not usually, until volume is high and steady. The true cost of self-hosting is several times the raw GPU price once you include the engineering and reliability work, so for low or variable volume an API is typically cheaper. Self-hosting wins on economics only at scale — and on privacy, control, edge/offline and continuity at any scale.

API or open-weight — which should we choose?

Rent an API when you need frontier capability, speed with no operations, or low/variable volume. Own (run an open-weight model) when data cannot leave your boundary, you have very high steady volume, or you need edge/offline, deep customisation, or freedom from lock-in and model deprecation. Most serious setups are hybrid: frontier APIs for the hardest work, open/managed-open models for high-volume routine work.

Rent or Own: API Models vs Open Weights You Run Yourself

Two companies tell you they “use AI for customer support.” One sends each conversation to a provider over the internet and pays per use. The other runs a model it downloaded, on its own servers, where no customer data ever leaves the building. Same sentence, opposite architectures — and the difference shows up in their cost base, their compliance posture, and what happens to them the day a vendor changes its mind. This is the choice people most often flatten into “a model is a model.” It isn’t. There are really two questions hiding in one.

Two questions: rent or own — and where it runs.

Two questions, not one

The first question is what you’re allowed to have. With a proprietary model you rent access: you call it through an API, it runs on the provider’s hardware, and the model’s “weights” — the trained brain — never leave their building. You can’t download it, inspect it, or take it with you. With an open-weight model, the provider publishes the actual file; you can download it, run it, look inside, and usually fine-tune it. The second, separate question is where the computation happens: on someone else’s servers, inside your own cloud or data centre, or on a device in your hand. People collapse these into one axis, but they’re independent — and most of the interesting choices live in how you combine them.

The three points on the spectrum

In practice you’re choosing between three places to stand. First, proprietary frontier models via API — OpenAI’s GPT, Anthropic’s Claude and Opus, Google’s Gemini. You rent capability, pay per unit of text, and carry no infrastructure. Second, open-weight models you download — Google’s Gemma, Meta’s Llama, Mistral, Alibaba’s Qwen, DeepSeek. You own the file. Third — and this is where owning gets nuanced — where you actually run an open model: locally on a laptop or phone (data never leaves the device), inside your own cloud where the model and data stay within your security boundary, or via a managed-open host (AWS Bedrock, Google Vertex, Groq, Together) that runs the open model for you and bills per use, like an API. That last option matters: you can consume an open model through an API, so “open vs API” and “rent vs run-yourself” really are different axes. Most serious deployments end up hybrid — frontier APIs for the hardest work, cheaper open models for high-volume routine work.

“Open weights” is not “open source”

This is the single most misused phrase in the field, and it has bitten legal teams. “Open weights” means the trained file is downloadable — but the training data and training code are almost always withheld, so you can’t fully audit or reproduce how the model was built. Most “open” models are open-weight, not open-source in the formal sense. And the licence matters more than the label: some are genuinely permissive (Gemma now ships under Apache 2.0; many Mistral and Qwen releases are Apache/MIT — free commercial use, no strings), while others carry real limits. Meta’s Llama licence, for instance, is free to use commercially until you cross 700 million monthly users, at which point you must ask Meta’s permission, and it forbids using Llama’s outputs to train competing models. The useful instinct: before you build on an “open” model, read its licence the way you’d read any other supplier contract.

What matters to you	Rent it (proprietary API)	Own it (open-weight, self-run)
Raw capability	Still leads at the hard frontier	At/near parity for everyday tasks; lags slightly at the edge
Cost shape	Pay-per-use; cheap to start, scales with volume	Upfront + ops cost; cheaper per unit at very high steady volume
Data privacy / residency	Data leaves your boundary (providers commit not to train on it)	Data never leaves — often the only compliant path
Customisation	Limited to what the provider offers	Full — fine-tune, quantise, modify (within the licence)
Operational burden	Near zero — no GPUs, no scaling	Heavy — you run, scale and maintain it
Lock-in & continuity	Switching is real work; models get deprecated on a schedule	You own the file — it runs forever, you upgrade on your terms

Renting buys speed and frontier capability with no ops; owning buys control, privacy and continuity at the cost of running it yourself.

The rent-or-own decision is rarely about which model is smartest. It’s about where your data is allowed to go, what you can afford at the volume you’ll actually run, and whether you can live with someone else deprecating the thing you built on. Answer those, and the choice usually makes itself.
— Sanjeev Purohit, from our delivery work

Open weights ≠ open source — check the licence, not the label.

When to choose which

Renting (a frontier API) wins when you need the absolute best capability, want to move fast with no infrastructure, or have low and variable volume. Owning (open weights you run) wins when data cannot leave your boundary — regulated industries, data-residency rules — or when you have very high, steady volume where the economics flip, or when you need edge/offline operation, deep customisation, or freedom from deprecation and lock-in. As of 2026 the capability gap has closed for most everyday tasks (open models are “good enough” for a great deal of work) even as the frontier labs may still lead — and possibly pull ahead — on the hardest reasoning. The pragmatic default that follows is the hybrid one: rent the frontier for the hard 10%, own or use managed-open models for the routine 90%. The question was never “open or closed?” It was “what does this particular workload actually need?”

Frequently asked

What’s the difference between an API model and an open-weight model?: With an API (proprietary) model — GPT, Claude/Opus, Gemini — you rent access; the model runs on the provider’s servers and the weights never leave them. With an open-weight model — Gemma, Llama, Mistral, Qwen — the provider publishes the file, so you can download it, run it on your own infrastructure (or a device), inspect it and usually fine-tune it.
Is “open weights” the same as “open source”?: No. Open weights means the trained model file is downloadable, but the training data and code are almost always withheld, so most “open” models aren’t open-source in the formal sense. Licences also differ a lot — some are permissive (Apache/MIT), others carry real commercial limits (e.g. Meta’s Llama caps free use at 700M monthly users and bans training rival models). Check the licence, not the label.
Is self-hosting cheaper than using an API?: Not usually, until volume is high and steady. The true cost of self-hosting is several times the raw GPU price once you include the engineering and reliability work, so for low or variable volume an API is typically cheaper. Self-hosting wins on economics only at scale — and on privacy, control, edge/offline and continuity at any scale.
API or open-weight — which should we choose?: Rent an API when you need frontier capability, speed with no operations, or low/variable volume. Own (run an open-weight model) when data cannot leave your boundary, you have very high steady volume, or you need edge/offline, deep customisation, or freedom from lock-in and model deprecation. Most serious setups are hybrid: frontier APIs for the hardest work, open/managed-open models for high-volume routine work.