Context Architecture · 13 min read · Updated 2026-06-20

The Context Supply Chain

We hardened the build-time supply chain and left the runtime one wide open. The information actually feeding an agent often arrives with unknown origin, freshness and integrity — and your agent is only as good as its worst context supplier. Context deserves a supply chain: provenance, freshness, integrity.

By Priyanka Pandey · Founder & Editorial Lead

Reviewed and challenged by Sanjeev Purohit · Principal, Decision Architecture

Built from

  • Field experience
  • Independent research
  • Data-backed
  • Original framework
  • Reviewed with field experience

Last substantively reviewed · 2026-06-20

In brief

The dominant lever on enterprise AI-agent output quality is increasingly the context fed to the model, not the model itself — so context should be governed end-to-end like a supply chain on three trust questions (Provenance: where from? Freshness: still current? Integrity: uncorrupted?), because your agent is only as good as its worst context supplier.

  • We hardened the build-time software supply chain (SBOMs, provenance, signed artefacts) and left the runtime context supply chain — the information actually feeding the agent — unsourced, unversioned, stale and unguarded.
  • The model is no longer the ceiling: lost-in-the-middle and length-driven degradation mean a bigger context window is a bigger receiving dock, not a better supply chain. Retrieved ≠ true; grounded ≠ correct.
  • The Context Supply Chain governs the flow of context on three trust questions: Provenance (where from?), Freshness (still current?), Integrity (uncorrupted?).
  • Your agent is only as good as its worst context supplier; bad context is technical debt with a response time.
  • This piece (X10) is the runtime data-sibling of the Agent Gateway piece (X09): the Context Supply Chain governs what an agent may BELIEVE ("can the agent see it?"); the Agent Gateway governs what it may DO ("can the agent touch it?").
  • Context is an asset (context-is-architecture); assets need supply chains — owners, contracts, audits, recalls.
  • Quality control is measurable at the gate (faithfulness, context precision) but faithfulness is a groundedness proxy, not a truth meter — observability of what context the agent actually used is the underrated control.

We hardened the build-time supply chain and left the runtime one wide open. Over the last decade, getting software safely into production became a supply-chain discipline: we learned to track where every dependency came from, to sign artefacts, to generate a bill of materials, to refuse a build whose provenance we could not establish. Then we wired up AI agents and quietly abandoned all of it. The information actually feeding the agent (the documents it retrieves, the records it reads, the snippets assembled into its context at the moment it answers) often arrives with unknown origin, unknown freshness and unknown integrity. We pour it straight into the model and act surprised when the output is confidently wrong.

This is the runtime sibling of a question we have asked elsewhere. If governing what an agent is allowed to do is an access problem — the Agent Gateway — then governing what an agent is allowed to believe is a supply problem. Two questions sit underneath almost every serious enterprise AI failure: can the agent see it, and can the agent touch it? This article is about the first. And the uncomfortable truth it leads to is short enough to put on a wall: your agent is only as good as its worst context supplier.

The model is no longer the ceiling

The instinct, when an agent disappoints, is to reach for a better model. Increasingly that is the wrong lever. The dominant constraint on reliable output is moving upstream, to the quality and structure of the context the model is given. The peer-reviewed work on long contexts is blunt about it: models do not use a long input evenly. Performance is highest when the relevant information sits near the start or end and degrades sharply when it is buried in the middle (the "lost in the middle" effect), and merely changing the position of the same information, without changing a word of it, measurably changes the answer. A bigger window does not rescue this: the effect has been reproduced across model families and on systems explicitly built for long context. Grounding does not rescue it either. Even when a retrieval system supplies relevant, correct sources, models still introduce unsupported claims and contradictions: retrieved is not the same as true, and grounded is not the same as correct.

So the reflex that says "we will just use a bigger context window" misunderstands the problem. A bigger context window is a bigger receiving dock, not a better supply chain. It lets you take delivery of more material; it does nothing to tell you where that material came from, whether it is current, or whether it has been tampered with. (The easy case has genuinely been solved — a frontier model can now find a single fact in a million tokens almost perfectly — but real workloads are not single-fact lookups. They are many competing documents, partial matches and multi-step reasoning, which is exactly where the quality of the supply chain reasserts itself.)

The context supply chain: sources → ingest → retrieve → assemble → agent, governed at every stage by Provenance, Freshness and Integrity. Defects compound downstream.
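A little arithmetic makes "defects compound downstream" concrete. The sketch below assumes, purely for illustration, that each of the four hand-offs before the agent passes context through clean 95% of the time and that the stages fail independently. Both the figure and the independence assumption are ours, not measurements.

```python
import math

# Hypothetical per-stage probability that a piece of context passes through
# without picking up a defect. The 0.95 figure is illustrative only.
STAGE_CLEAN_RATE = {
    "sources": 0.95,
    "ingest": 0.95,
    "retrieve": 0.95,
    "assemble": 0.95,
}


def end_to_end_clean_rate(stage_rates: dict[str, float]) -> float:
    """Probability that context survives every stage clean, assuming independence."""
    return math.prod(stage_rates.values())


if __name__ == "__main__":
    # 0.95 ** 4 = 0.81450625
    print(f"{end_to_end_clean_rate(STAGE_CLEAN_RATE):.3f}")
```

Under those assumptions, four stages that each look healthy at 95% deliver clean context to the agent only about 81% of the time.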

Context deserves a supply chain

The word "supply chain" is doing real work here, not decoration. The moment you say it, decades of manufacturing and data-engineering intuition arrive for free: there are suppliers of varying quality, there are defects, there is provenance and chain of custody, there is freshness and spoilage, there are audits and recalls. Every one of those concepts has an exact analogue in how context reaches an agent — and almost none of them is currently being applied. The Context Supply Chain is the deliberate treatment of that flow as something to be governed end to end, on three questions of trust:

  • Provenance — can I trust where this came from? The origin, authority and lineage of every piece of context: which source, which version, who owns it, whether it is allowed to be here at all.
  • Freshness — can I trust that it is current? Whether the document, the index and the embedding reflect the world as it is now, or as it was before the policy changed, the price moved, or the record was superseded.
  • Integrity — can I trust that it hasn’t been corrupted? Whether the content is what its source intended, or whether it has been poisoned, injected, duplicated or quietly degraded somewhere along the chain.

Three pillars, three questions of trust: Provenance (where from?), Freshness (still current?), Integrity (uncorrupted?).

Provenance — can I trust where this came from?

Most retrieval systems make a quiet, dangerous assumption: that whatever is in the knowledge base is reliable. It is not. Even the most curated corpora carry error — the medical literature indexed in PubMed contains fraudulent and retracted papers, and tens of thousands of later works cite them as if they were sound. An enterprise corpus is worse, not better: it mixes drafts with published material, superseded policies with current ones, authoritative records with someone’s exported notes. Provenance is the discipline of knowing, for every piece of context, where it originated, which version it is, how authoritative the source is, and whether it should be trusted — and of carrying that lineage through to the answer so it can be inspected later. This is where the data-engineering ideas of lineage and data contracts earn their place: a context source is a supplier, and a supplier without a contract is a liability.
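One way to picture "a supplier with a contract" is to attach a provenance record to every chunk and refuse anything that does not meet the contract. This is a minimal sketch, not a prescribed schema: the field names, the source names in `TRUSTED_SOURCES` and the status values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Provenance:
    source_id: str      # which supplier the content came from
    version: str        # which version of the document
    owner: str          # who is accountable for it
    status: str         # e.g. "published", "draft", "superseded"
    retrieved_on: date  # when it entered the chain


@dataclass(frozen=True)
class ContextChunk:
    text: str
    provenance: Provenance


# Hypothetical contract: only published material from registered suppliers
# with a named owner is allowed to become context.
TRUSTED_SOURCES = {"policy-portal", "product-catalogue"}
ALLOWED_STATUSES = {"published"}


def admit(chunk: ContextChunk) -> bool:
    """Return True only if the chunk satisfies the (illustrative) contract."""
    p = chunk.provenance
    return (
        p.source_id in TRUSTED_SOURCES
        and p.status in ALLOWED_STATUSES
        and bool(p.owner.strip())
    )


def lineage(chunks: list[ContextChunk]) -> list[str]:
    """Source and version labels to carry through to the answer."""
    return [f"{c.provenance.source_id}@{c.provenance.version}" for c in chunks]
```

The point of the design is that a draft, or an export with no owner, fails at the door rather than competing with the authoritative record at answer time.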

Freshness — can I trust that it is current?

Context has a shelf life, and most systems have no idea what theirs is. An embedding computed last quarter, an index that has not been rebuilt since the policy changed, a cached document that was superseded weeks ago — each is a perfectly retrievable, perfectly confident piece of stale reality. Freshness is the discipline of treating right-time as seriously as right-text: knowing how old each piece of context is, setting freshness expectations the way you would set any service level, and re-indexing on change rather than on a calendar. The failure mode is quiet precisely because nothing breaks — the agent answers fluently from a world that no longer exists. (This is distinct from the length-driven degradation above; it is time-driven, and it is the part of "context rot" that has nothing to do with how many tokens you sent.)
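As a rough sketch of both halves of that discipline, the snippet below checks age against a per-source freshness expectation and uses a content hash to decide whether a source has changed since it was indexed. The source kinds and the limits in `MAX_AGE` are hypothetical; real service levels would come from whoever owns each source.

```python
import hashlib
from datetime import datetime, timedelta

# Hypothetical freshness service levels, per kind of source.
MAX_AGE = {
    "pricing": timedelta(hours=1),
    "policy": timedelta(days=30),
}


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_stale(kind: str, indexed_at: datetime, now: datetime) -> bool:
    """True if the indexed copy is older than its freshness expectation."""
    return now - indexed_at > MAX_AGE[kind]


def needs_reindex(current_text: str, indexed_hash: str) -> bool:
    """Re-index on change: the source differs from what was embedded."""
    return content_hash(current_text) != indexed_hash
```

Comparing hashes means a re-index is triggered by the document changing, not by the calendar, while the age check still catches sources nobody has looked at in too long.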

Integrity — can I trust that it hasn’t been corrupted?

The third question is the one security has only half-answered. The industry has formalised the build-time supply chain — the OWASP guidance treats the integrity of training data, models and components as a first-class attack surface. But the runtime context supply chain, the content retrieved and assembled while the agent is actually working, sits in the gap. This is where indirect prompt injection lives: a malicious instruction planted in a document or a web page that the agent dutifully retrieves and then obeys, because to the model a retrieved instruction and a trusted one look identical. It is where data-source poisoning lives. Integrity is the discipline of least-trust on sources, of validating content before it becomes context, and of treating the runtime chain with the same suspicion we now apply to the build. Which is the bridge to its sibling: the Agent Gateway governs what the agent may touch; the Context Supply Chain governs what it may take in. Govern both, or govern neither.
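"Validating content before it becomes context" might look something like the sketch below, which runs two checks: whether the text still matches the digest recorded at ingest, and whether it contains instruction-like phrasing. The patterns are hypothetical and deliberately crude. Pattern matching is easy to evade, so we would treat it as one signal for quarantine and review, not as a defence against indirect injection on its own.

```python
import hashlib
import hmac
import re

# Illustrative patterns only; a real screen would need far more than this.
SUSPICIOUS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"disregard (the )?(system|developer) prompt", re.I),
]


def digest(text: str) -> str:
    """SHA-256 fingerprint recorded when the content is first ingested."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def unchanged_since_ingest(text: str, recorded_digest: str) -> bool:
    """True if the content still matches the digest recorded at ingest."""
    return hmac.compare_digest(digest(text), recorded_digest)


def looks_like_injection(text: str) -> bool:
    """Heuristic flag for text that reads like an instruction to the model."""
    return any(p.search(text) for p in SUSPICIOUS)


def validate(text: str, recorded_digest: str) -> list[str]:
    """Return reasons to quarantine the content; an empty list means admit."""
    reasons = []
    if not unchanged_since_ingest(text, recorded_digest):
        reasons.append("digest mismatch")
    if looks_like_injection(text):
        reasons.append("instruction-like content")
    return reasons
```

Returning reasons rather than a bare yes or no keeps the decision inspectable, which matters when someone later asks why a document never reached the agent.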

The runtime-trust pairing: can the agent SEE it? — the Context Supply Chain (believe); can the agent TOUCH it? — the Agent Gateway (do).

Quality control you can actually run

None of this is governable by good intentions; it needs measurement at the gate. The retrieval-evaluation tools that have matured over the last two years are the supply chain’s quality control: you can score whether the retrieved context actually supports the answer (faithfulness) and whether the retriever ranked relevant material above noise (context precision), and you can run those scores against a golden set on every change. Two honest caveats keep this from becoming theatre. First, a faithfulness score measures whether an answer is grounded in the retrieved context, not whether that context is true; it cannot catch a trusted-but-wrong source, which is exactly why provenance and freshness still matter. Second, the most underrated control is observability: on a bad answer, can you reconstruct precisely what context the agent was fed? If you cannot, you are not running a supply chain; you are hoping. Bad context is technical debt with a response time: it does not announce itself in a build log; it surfaces as a confident, plausible, wrong answer in front of a customer.
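To show what gating on those two scores can mean in practice, here is a minimal sketch. It uses one common formulation of context precision (the mean of precision@k over the ranks where a relevant chunk appears) and treats faithfulness as the share of answer claims judged to be supported by the retrieved context. The per-claim judgements are taken as given, because producing them (usually with a human or model judge) is outside this sketch, and the thresholds are hypothetical.

```python
def context_precision(relevance: list[bool]) -> float:
    """Mean precision@k over the ranks k at which a relevant chunk appears.

    `relevance` lists the retrieved chunks in rank order, True if relevant.
    """
    hits = 0
    total = 0.0
    for k, relevant in enumerate(relevance, start=1):
        if relevant:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0


def faithfulness(claims_supported: list[bool]) -> float:
    """Share of the answer's claims that the retrieved context supports."""
    if not claims_supported:
        return 0.0
    return sum(claims_supported) / len(claims_supported)


def passes_gate(
    cases: list[dict],
    min_precision: float = 0.8,      # hypothetical threshold
    min_faithfulness: float = 0.9,   # hypothetical threshold
) -> bool:
    """Average both scores over the golden set and compare to thresholds."""
    if not cases:
        return False
    precision = sum(context_precision(c["relevance"]) for c in cases) / len(cases)
    faithful = sum(faithfulness(c["claims_supported"]) for c in cases) / len(cases)
    return precision >= min_precision and faithful >= min_faithfulness
```

For example, a retrieval that ranks relevant, irrelevant, relevant scores (1/1 + 2/3) / 2, roughly 0.83, whereas the same two relevant chunks ranked first and second would score 1.0. Neither number says anything about whether the chunks themselves are true.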

The agent was not failing because retrieval was broken — it was faithfully reflecting the confusion already in the source material: drafts beside published material, superseded policies beside current ones. The teams that recovered stopped treating it as a search problem and started treating it as a supply-chain problem: where did this originate, how fresh is it, which source is authoritative.
Sanjeev Purohit, from our delivery work

Pillar     | Trust question                  | What it checks
Provenance | Can I trust where it came from? | Origin, ownership and lineage of every source
Freshness  | Can I trust it is current?      | Age, change-rate and staleness of the data
Integrity  | Can I trust it is uncorrupted?  | Tampering, injection and contract conformance
The Context Supply Chain — your agent is only as good as its worst context supplier.

Where this is a bridge

This is deliberately a piece for two readers. For the architect, the Context Supply Chain is a governance question: context is an asset — we have argued before that the context layer is the architecture — and assets need supply chains, with owners, contracts and audits. For the engineer, it is a build: the pipeline of sourcing, ingestion, chunking, embedding, indexing, retrieval and assembly, with a quality gate at each stage and observability across the whole. Same article, two entry points. It is also the doctrine that sits above the more specific work to come — retrieval that does not rot, memory hierarchies, the context-layer reference architecture — each of which is one link in this chain. Trust is upstream; govern the supply, not just the model.
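For the engineer's entry point, "a quality gate at each stage" can be sketched as a chain of (name, step, gate) triples that stops at the first gate that fails and says which stage it was. The three toy stages, their checks and the character budget below are hypothetical stand-ins; a real chain would add embedding, indexing and retrieval with gates of their own.

```python
from typing import Any, Callable

# A stage is a name, a transformation, and a gate that inspects its output.
Stage = tuple[str, Callable[[Any], Any], Callable[[Any], bool]]


class GateFailure(Exception):
    """Raised when a stage's output does not pass its quality gate."""

    def __init__(self, stage: str) -> None:
        super().__init__(f"quality gate failed at stage: {stage}")
        self.stage = stage


def run_chain(stages: list[Stage], payload: Any) -> Any:
    """Run each stage in order, stopping at the first failed gate."""
    for name, step, gate in stages:
        payload = step(payload)
        if not gate(payload):
            raise GateFailure(name)
    return payload


MAX_CONTEXT_CHARS = 200  # hypothetical assembly budget

STAGES: list[Stage] = [
    # Ingestion: normalise whitespace; reject empty documents.
    ("ingestion", lambda raw: raw.strip(), lambda text: len(text) > 0),
    # Chunking: split on blank lines; reject any empty chunk.
    ("chunking",
     lambda text: [c.strip() for c in text.split("\n\n")],
     lambda chunks: all(chunks)),
    # Assembly: join chunks; reject context that exceeds the budget.
    ("assembly",
     lambda chunks: "\n---\n".join(chunks),
     lambda ctx: len(ctx) <= MAX_CONTEXT_CHARS),
]
```

Calling `run_chain(STAGES, "  First policy.\n\nSecond policy.  ")` returns the assembled context, while a whitespace-only document raises `GateFailure` with `stage == "ingestion"`, so the defect is attributed to the stage that introduced it.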

Stop upgrading the receiving dock

The organisations that win the next phase of enterprise AI will not be the ones with the biggest context window or the newest model. They will be the ones whose agents are fed the cleanest, freshest, best-sourced context — because they treated that context as a supply chain and governed it like one. Most hallucinations begin long before the model answers, in an upstream defect nobody owned. The fix is not a bigger receiving dock. It is provenance, freshness and integrity, applied to the flow of reality you hand the machine.

Frequently asked

What is the context supply chain?
The end-to-end pipeline by which context reaches an agent — upstream sources (documents, code, tickets, tools, the web, other agents) through ingestion, chunking, embedding, indexing, retrieval and assembly into the context window — treated as a supply chain to be governed for Provenance (where it came from), Freshness (whether it is current) and Integrity (whether it has been corrupted).
Isn’t a bigger context window the fix?
No. A bigger context window is a bigger receiving dock, not a better supply chain. "Lost in the middle" and length-driven degradation mean models do not use long inputs evenly, and a bigger window does nothing to establish origin, freshness or integrity. The easy single-fact lookup is largely solved on frontier models; real multi-document, multi-step work is where supply-chain quality reasserts itself.
Does RAG or grounding stop hallucination?
It reduces hallucination but does not eliminate it. Even with relevant, correct sources, models introduce unsupported claims and contradictions: retrieved is not true, and grounded is not correct. Retrieval quality bounds output quality, so noisy or untrusted context produces confidently wrong answers.
How do you measure context quality?
With retrieval-evaluation metrics — faithfulness (does the answer stay grounded in the retrieved context?) and context precision (did the retriever rank relevant material above noise?) — run against a golden set on every change. Caveat: faithfulness measures groundedness in the context, a proxy for hallucination, not factual truth; it cannot catch a trusted-but-wrong source.
How does this relate to the Agent Gateway?
They are the two runtime-trust questions. The Context Supply Chain governs what an agent is allowed to believe ("can the agent see it?"); the Agent Gateway governs what an agent is allowed to do ("can the agent touch it?"). Govern both, or govern neither.

Our perspective

The common view

AI quality is a model problem: pick a better model and a bigger context window, retrieve some documents (RAG), and grounding will keep it honest. Context is a search/plumbing detail.

The Ivaaya view

The ceiling is increasingly upstream: the quality, provenance and freshness of the context, not the model. We hardened the build-time supply chain and left the runtime one wide open. Context should be governed end-to-end like a supply chain — Provenance, Freshness, Integrity — because your agent is only as good as its worst context supplier, and bad context is technical debt with a response time. This is the runtime data-sibling of the Agent Gateway: the Context Supply Chain governs what an agent may believe; the Gateway governs what it may do.

Bigger context windows / better models will make this go away.
No. Lost-in-the-middle and length-driven degradation mean a bigger window is a bigger receiving dock, not a better supply chain; the easy single-fact case is solved but multi-document, multi-step work is where supply-chain quality reasserts itself. The model can only reason with the reality you supply.
We use RAG, so the answers are grounded and therefore safe.
Grounded ≠ correct. Even with relevant sources models add unsupported claims, and most systems assume the knowledge base is trustworthy when it is full of drafts, duplicates, and superseded or occasionally poisoned content. Faithfulness scores measure groundedness in the context, not truth.
Isn’t this just context engineering / RAG tuning?
Context engineering designs the layer you own; the Context Supply Chain governs the whole flow that feeds the agent — including sources you do not own — for provenance, freshness and integrity. It is governance over a pipeline, not prompt or retrieval tuning.
  • Treat every context source as a supplier with provenance and a data contract; carry lineage through to the answer so it can be inspected.
  • Set freshness expectations like service levels; re-index on change, not on a calendar; detect stale embeddings/indexes.
  • Apply least-trust to runtime sources; validate content before it becomes context; defend against indirect injection / retrieval poisoning (the runtime sibling of access).
  • Gate the chain with retrieval/faithfulness evals against a golden set, and invest in context observability — on a bad answer, reconstruct exactly what the agent was fed.
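The last of these, reconstructing exactly what the agent was fed, can be as simple as writing a small trace per request. The sketch below is one possible shape, assuming each chunk arrives as a dict with hypothetical `source_id`, `version` and `text` keys; the in-memory `TraceStore` stands in for whatever append-only log a real system would use.

```python
import hashlib
import json
from datetime import datetime


def build_trace(request_id: str, chunks: list[dict], asked_at: datetime) -> dict:
    """Record which chunks were assembled, in what order, for one request."""
    return {
        "request_id": request_id,
        "asked_at": asked_at.isoformat(),
        "context": [
            {
                "position": i,
                "source_id": c["source_id"],
                "version": c["version"],
                # Hash rather than store the text, so the exact content
                # can still be verified against the source later.
                "sha256": hashlib.sha256(c["text"].encode("utf-8")).hexdigest(),
            }
            for i, c in enumerate(chunks)
        ],
    }


class TraceStore:
    """In-memory stand-in for an append-only trace log."""

    def __init__(self) -> None:
        self._lines: dict[str, str] = {}

    def record(self, trace: dict) -> None:
        self._lines[trace["request_id"]] = json.dumps(trace, sort_keys=True)

    def what_was_fed(self, request_id: str) -> list[str]:
        """Source and version of each chunk, in the order the agent saw them."""
        trace = json.loads(self._lines[request_id])
        return [f'{c["source_id"]}@{c["version"]}' for c in trace["context"]]
```

With a trace like this, a bad answer can be walked back to the specific source and version behind it, which turns "the model hallucinated" into a question about a named supplier.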

What we’ve observed

  • Lost-in-the-middle: models use long inputs unevenly (U-shaped); changing the position of identical information measurably changes the answer, even in long-context models — a bigger window does not fix it (Liu et al., TACL 2024 — peer-reviewed).
  • Grounding reduces but does not eliminate hallucination: even with relevant, correct sources, models add unsupported claims/contradictions (arXiv:2505.04847, EMNLP 2025).
  • Upstream source trust is real risk: most RAG assumes the knowledge base is reliable; even PubMed contains fraudulent/retracted papers with tens of thousands of downstream citations (arXiv:2510.09106 §3.3; Retraction Watch).
  • OWASP LLM03:2025 formalises the BUILD-TIME LLM supply chain (training data, models, components); the RUNTIME context supply chain (retrieved content, indirect injection) sits under LLM01 — under-governed by comparison.
  • The easy single-needle case is largely solved on a frontier model (Gemini 2.5 Flash) but does NOT generalise to multi-needle / multi-hop reasoning (arXiv:2511.05850).
  • Context engineering is maturing as a discipline (Thoughtworks moved it to Adopt, Apr 2026; Anthropic "attention budget" / finite resource) — but framed as engineering, not as a governed supply chain (naming white-space).
  • The RAG demo that dazzled on a curated folder and collapsed on the real corpus — drafts beside published, superseded policies beside current, duplicates competing — the model faithfully reflecting the confusion in the sources.
  • An agent confidently answering from a stale embedding / superseded policy because nobody measured freshness or versioned the source.

How certain are we?

  • Models use long context unevenly (lost-in-the-middle); position changes the answer; a bigger window does not fix it. Established: observed repeatedly across delivery programmes.
  • Grounding/RAG reduces but does not eliminate hallucination; retrieved ≠ true. Established: observed repeatedly across delivery programmes.
  • The runtime context supply chain is under-governed relative to the build-time supply chain. Observed: seen consistently in our own work.
  • Context quality is increasingly the dominant lever on agent output quality. Emerging: still early, but increasingly visible.
  • Context should be governed as a supply chain on Provenance/Freshness/Integrity (our argued framework). Emerging: still early, but increasingly visible.

About the author

Priyanka Pandey

Founder & Editorial Lead

Priyanka Pandey founded Ivaaya and leads its editorial voice, translating real delivery experience into practical thinking on AI-native engineering, decision-making and technology leadership. Her work focuses on helping senior leaders make sense of the changes reshaping software delivery without adding to the noise.

Reviewed and challenged by

Sanjeev Purohit

Principal, Decision Architecture

Sanjeev works across enterprise architecture, product strategy and AI-native delivery. The ideas in this article have been challenged against real programmes, production systems and organisational decision-making before publication.

Compare notes

If your agent dazzles on a curated demo and falls over on the real corpus, the model is rarely the problem — the context supply chain is. Tell us where the information feeding your agents arrives with unknown origin, freshness or integrity; we are comparing notes with teams governing context as a supply chain — provenance, freshness, integrity — rather than reaching for a bigger window.

Where does your context come from?