We hardened the build-time supply chain and left the runtime one wide open. Over the last decade, getting software safely into production became a supply-chain discipline: we learned to track where every dependency came from, to sign artefacts, to generate a bill of materials, to refuse a build whose provenance we could not establish. Then we wired up AI agents and quietly abandoned all of it. Yet the information actually feeding the agent — the documents it retrieves, the records it reads, the snippets assembled into its context at the moment it answers — often arrives with unknown origin, unknown freshness and unknown integrity. We pour it straight into the model and act surprised when the output is confidently wrong.
This is the runtime sibling of a question we have asked elsewhere. If governing what an agent is allowed to do is an access problem — the Agent Gateway — then governing what an agent is allowed to believe is a supply problem. Two questions sit underneath almost every serious enterprise AI failure: can the agent see it, and can the agent touch it? This article is about the first. And the uncomfortable truth it leads to is short enough to put on a wall: your agent is only as good as its worst context supplier.
Your agent is only as good as its worst context supplier.
The model is no longer the ceiling
The instinct, when an agent disappoints, is to reach for a better model. Increasingly that is the wrong lever. The dominant constraint on reliable output is moving upstream, to the quality and structure of the context the model is given. The peer-reviewed work on long contexts is blunt about it: models do not use a long input evenly. Performance is highest when the relevant information sits near the start or end and degrades sharply when it is buried in the middle — the "lost in the middle" effect — and merely changing the position of the same information, without changing a word of it, measurably changes the answer. A bigger window does not rescue this; it has been reproduced across model families and on systems explicitly built for long context. Grounding does not rescue it either. Even when a retrieval system supplies relevant, correct sources, models still introduce unsupported claims and contradictions: retrieved is not the same as true, and grounded is not the same as correct.
So the reflex that says "we will just use a bigger context window" misunderstands the problem. A bigger context window is a bigger receiving dock, not a better supply chain. It lets you take delivery of more material; it does nothing to tell you where that material came from, whether it is current, or whether it has been tampered with. (The easy case has genuinely been solved — a frontier model can now find a single fact in a million tokens almost perfectly — but real workloads are not single-fact lookups. They are many competing documents, partial matches and multi-step reasoning, which is exactly where the quality of the supply chain reasserts itself.)
Context deserves a supply chain
The word "supply chain" is doing real work here, not decoration. The moment you say it, decades of manufacturing and data-engineering intuition arrive for free: there are suppliers of varying quality, there are defects, there is provenance and chain of custody, there is freshness and spoilage, there are audits and recalls. Every one of those concepts has an exact analogue in how context reaches an agent — and almost none of them is currently being applied. The Context Supply Chain is the deliberate treatment of that flow as something to be governed end to end, on three questions of trust:
- Provenance — can I trust where this came from? The origin, authority and lineage of every piece of context: which source, which version, who owns it, whether it is allowed to be here at all.
- Freshness — can I trust that it is current? Whether the document, the index and the embedding reflect the world as it is now, or as it was before the policy changed, the price moved, or the record was superseded.
- Integrity — can I trust that it hasn’t been corrupted? Whether the content is what its source intended, or whether it has been poisoned, injected, duplicated or quietly degraded somewhere along the chain.
Provenance — can I trust where this came from?
Most retrieval systems make a quiet, dangerous assumption: that whatever is in the knowledge base is reliable. It is not. Even the most curated corpora carry error — the medical literature indexed in PubMed contains fraudulent and retracted papers, and tens of thousands of later works cite them as if they were sound. An enterprise corpus is worse, not better: it mixes drafts with published material, superseded policies with current ones, authoritative records with someone’s exported notes. Provenance is the discipline of knowing, for every piece of context, where it originated, which version it is, how authoritative the source is, and whether it should be trusted — and of carrying that lineage through to the answer so it can be inspected later. This is where the data-engineering ideas of lineage and data contracts earn their place: a context source is a supplier, and a supplier without a contract is a liability.
Freshness — can I trust that it is current?
Context has a shelf life, and most systems have no idea what theirs is. An embedding computed last quarter, an index that has not been rebuilt since the policy changed, a cached document that was superseded weeks ago — each is a perfectly retrievable, perfectly confident piece of stale reality. Freshness is the discipline of treating right-time as seriously as right-text: knowing how old each piece of context is, setting freshness expectations the way you would set any service level, and re-indexing on change rather than on a calendar. The failure mode is quiet precisely because nothing breaks — the agent answers fluently from a world that no longer exists. (This is distinct from the length-driven degradation above; it is time-driven, and it is the part of "context rot" that has nothing to do with how many tokens you sent.)
Integrity — can I trust that it hasn’t been corrupted?
The third question is the one security has only half-answered. The industry has formalised the build-time supply chain — the OWASP guidance treats the integrity of training data, models and components as a first-class attack surface. But the runtime context supply chain, the content retrieved and assembled while the agent is actually working, sits in the gap. This is where indirect prompt injection lives: a malicious instruction planted in a document or a web page that the agent dutifully retrieves and then obeys, because to the model a retrieved instruction and a trusted one look identical. It is where data-source poisoning lives. Integrity is the discipline of least-trust on sources, of validating content before it becomes context, and of treating the runtime chain with the same suspicion we now apply to the build. Which is the bridge to its sibling: the Agent Gateway governs what the agent may touch; the Context Supply Chain governs what it may take in. Govern both, or govern neither.
Quality control you can actually run
None of this is governable by good intentions; it needs measurement at the gate. The retrieval-evaluation tools that have matured over the last two years are the supply chain’s quality control: you can score whether the retrieved context actually supports the answer (faithfulness) and whether the retriever ranked relevant material above noise (context precision), and you can run those scores against a golden set on every change. Two honest caveats keep this from becoming theatre. First, a faithfulness score measures whether an answer is grounded in the retrieved context — not whether that context is true; it cannot catch a trusted-but-wrong source, which is exactly why provenance and freshness still matter. Second, the most underrated control is observability: on a bad answer, can you reconstruct precisely what context the agent was fed? If you cannot, you are not running a supply chain; you are hoping. Because bad context is technical debt with a response time — it does not announce itself in a build log, it surfaces as a confident, plausible, wrong answer in front of a customer.
The agent was not failing because retrieval was broken — it was faithfully reflecting the confusion already in the source material: drafts beside published material, superseded policies beside current ones. The teams that recovered stopped treating it as a search problem and started treating it as a supply-chain problem: where did this originate, how fresh is it, which source is authoritative.
| Pillar | Trust question | What it checks |
|---|---|---|
| Provenance | Can I trust where it came from? | Origin, ownership and lineage of every source |
| Freshness | Can I trust it is current? | Age, change-rate and staleness of the data |
| Integrity | Can I trust it is uncorrupted? | Tampering, injection and contract conformance |
Where this is a bridge
This is deliberately a piece for two readers. For the architect, the Context Supply Chain is a governance question: context is an asset — we have argued before that the context layer is the architecture — and assets need supply chains, with owners, contracts and audits. For the engineer, it is a build: the pipeline of sourcing, ingestion, chunking, embedding, indexing, retrieval and assembly, with a quality gate at each stage and observability across the whole. Same article, two entry points. It is also the doctrine that sits above the more specific work to come — retrieval that does not rot, memory hierarchies, the context-layer reference architecture — each of which is one link in this chain. Trust is upstream; govern the supply, not just the model.
Stop upgrading the receiving dock
The organisations that win the next phase of enterprise AI will not be the ones with the biggest context window or the newest model. They will be the ones whose agents are fed the cleanest, freshest, best-sourced context — because they treated that context as a supply chain and governed it like one. Most hallucinations begin long before the model answers, in an upstream defect nobody owned. The fix is not a bigger receiving dock. It is provenance, freshness and integrity, applied to the flow of reality you hand the machine.
Frequently asked
- What is the context supply chain?
- The end-to-end pipeline by which context reaches an agent — upstream sources (documents, code, tickets, tools, the web, other agents) through ingestion, chunking, embedding, indexing, retrieval and assembly into the context window — treated as a supply chain to be governed for Provenance (where it came from), Freshness (whether it is current) and Integrity (whether it has been corrupted).
- Isn’t a bigger context window the fix?
- No. A bigger context window is a bigger receiving dock, not a better supply chain. "Lost in the middle" and length-driven degradation mean models do not use long inputs evenly, and a bigger window does nothing to establish origin, freshness or integrity. The easy single-fact lookup is largely solved on frontier models; real multi-document, multi-step work is where supply-chain quality reasserts itself.
- Does RAG or grounding stop hallucination?
- It reduces but does not eliminate it. Even with relevant, correct sources, models introduce unsupported claims and contradictions — retrieved is not true, and grounded is not correct. Retrieval quality bounds output quality, so noisy or untrusted context produces confidently wrong answers.
- How do you measure context quality?
- With retrieval-evaluation metrics — faithfulness (does the answer stay grounded in the retrieved context?) and context precision (did the retriever rank relevant material above noise?) — run against a golden set on every change. Caveat: faithfulness measures groundedness in the context, a proxy for hallucination, not factual truth; it cannot catch a trusted-but-wrong source.
- How does this relate to the Agent Gateway?
- They are the two runtime-trust questions. The Context Supply Chain governs what an agent is allowed to believe ("can the agent see it?"); the Agent Gateway governs what an agent is allowed to do ("can the agent touch it?"). Govern both, or govern neither.