AI Engineering · 6 min read · Updated 2026-06-18

Context Engineering: Context Is the New Architecture

Most context-engineering advice optimises the ephemeral window. The durable asset is the context layer you own and version: the conventions, decisions and contracts that let a generated change be judged correct, fast.

By Priyanka Pandey · Founder & Editorial Lead

Reviewed and challenged by Sanjeev Purohit · Principal, Decision Architecture

Built from

Independent research
Original framework
Reviewed with field experience

Last substantively reviewed · 2026-06-18

Part of Agentic Engineering · The AI Engineering Maturity Model

In brief

Context engineering means owning and versioning the persistent context layer (conventions, decisions and contracts) that lets a generated change be judged correct quickly. That layer, not the per-prompt window, is the durable asset.

Prompt engineering was the instruction; context engineering is everything else around it.
More context is not better — relevant, structured, versioned context is.
The asset is the persistent layer you own and version, not the window your tool rebuilds each prompt.

Best for

Any team running agents against a real, evolving codebase

Not for

One-off prompts with no durability requirement

Once generation becomes abundant, the binding constraint moves to acceptance: the judgement that a change is correct, safe and worth shipping. We have argued this is the central economic fact of AI-native engineering. The natural follow-up question is operational. What, concretely, lets a team make that judgement quickly and confidently? Increasingly the answer is not a better model or a cleverer prompt. It is context.

In 2025 the industry quietly renamed its own problem. Tobi Lütke popularised the phrase 'context engineering' in June; Andrej Karpathy amplified it days later with '+1 for context engineering over prompt engineering', on the grounds that a prompt connotes a short task description while real systems require curating the full information payload. He described it as 'the delicate art and science of filling the context window with just the right information for the next step.' Within months the conversation had matured from intuition-led prompting into something with a name, a literature and a discipline.

Prompt engineering was the instruction. Context engineering is everything else.

The distinction matters because it changes who is accountable. Prompt engineering is writing instructions. Context engineering, as Anthropic puts it, is 'the set of strategies for curating and maintaining the optimal set of tokens during LLM inference' — instructions, retrieved knowledge, memory, tools and prior outputs, all of it. Anthropic, Thoughtworks, Karpathy, MIT Technology Review and a 1,400-paper arXiv survey have all converged on roughly the same boundary. This is rare alignment, and it tells you the field has found a real seam.

It also explains why 'vibe coding' faded. Thoughtworks' Rachel Laycock observed that the industry 'couldn't stop talking about vibe coding' at the start of 2025, then watched it 'practically disappear,' displaced by serious work on context, infrastructure and security. The romance of conjuring software from a hunch gave way to the engineering of what the model is allowed to see.

More context is not better

The most counter-intuitive finding, and the one most teams learn the expensive way, is that filling the window degrades reasoning. Anthropic calls it 'context rot': as token count rises, recall falls, because every token attends to every other token, so n tokens create n-squared pairwise relationships. There is a finite attention budget, and you can overspend it. Stanford's 'Lost in the Middle' study had reported a U-shaped recall curve two years earlier: models attend well to the start and end of a long context and lose the middle. Thoughtworks' response is 'progressive context disclosure': start with a lightweight index of what is available and let the agent pull in only what it needs, rather than front-loading everything.
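Progressive disclosure is easier to reason about with something concrete in hand. The sketch below is a minimal illustration, not Thoughtworks' implementation: `ContextDoc`, `build_index`, `disclose` and the sample documents are all hypothetical names invented here. It shows the agent receiving a short index first and the full text only for the keys it asks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextDoc:
    """One entry in a hypothetical context store."""
    key: str
    summary: str  # one-line description shown in the index
    body: str     # full text, loaded only on request


def build_index(docs: list[ContextDoc]) -> str:
    """Return the lightweight index the agent sees first."""
    return "\n".join(f"{d.key}: {d.summary}" for d in docs)


def disclose(docs: list[ContextDoc], requested: list[str]) -> str:
    """Return full bodies only for the keys the agent asked for."""
    by_key = {d.key: d for d in docs}
    return "\n\n".join(by_key[k].body for k in requested if k in by_key)


# Illustrative documents; the repeated text stands in for long bodies.
DOCS = [
    ContextDoc("naming", "How we name modules and tests", "Modules use snake_case. " * 40),
    ContextDoc("datastore", "Why we chose the current datastore", "Decision record text. " * 40),
    ContextDoc("billing-api", "Contract for the billing service", "Contract text. " * 40),
]

index = build_index(DOCS)                # what goes into the window up front
pulled = disclose(DOCS, ["billing-api"])  # what the agent pulls in on demand
everything = "\n\n".join(d.body for d in DOCS)  # the front-loading alternative
```

The index plus one pulled document is a fraction of the size of `everything`, which is the whole point: the window carries a map of what exists, and the territory is loaded only where the task needs it.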

Figure: Context rot, recall across a long window. Beginning: high. Middle: low. End: high. The middle is lost first.

The practical toolkit is now reasonably settled. Anthropic names compaction (summarise near the limit, then reinitialise), structured note-taking (persistent notes held outside the window), sub-agent architectures (clean windows that return tight 1,000-to-2,000-token summaries) and just-in-time retrieval (load data at runtime through tools rather than stuffing it in up front). These are good techniques. But notice what they have in common: every one of them optimises the ephemeral, per-prompt window your tool reconstructs on each turn.
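Of the four techniques, compaction is the simplest to sketch. The example below is an assumption-laden illustration rather than Anthropic's implementation: `estimate_tokens` is a crude four-characters-per-token stand-in for a real tokenizer, `summarise` is a placeholder where a real system would call a model, and the 80% threshold is an arbitrary choice made here.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: roughly four characters per token."""
    return max(1, len(text) // 4)


def summarise(turns: list[str]) -> str:
    """Placeholder summariser; a real system would call a model here."""
    return "SUMMARY: " + " | ".join(t[:20] for t in turns)


def compact(
    history: list[str],
    limit: int,
    threshold: float = 0.8,  # assumed trigger point, as a share of the limit
    keep_recent: int = 2,    # most recent turns are kept verbatim
) -> list[str]:
    """Summarise older turns once the history nears the token limit."""
    total = sum(estimate_tokens(t) for t in history)
    if total < int(limit * threshold) or len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    # The reinitialised window starts with the summary, then the recent turns.
    return [summarise(older)] + recent
```

Called as `compact(history, limit=600)` on five 400-character turns, the three oldest turns collapse into one summary entry and the two most recent survive unchanged. Note that nothing here persists between sessions: the summary lives and dies with the window.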

The asset is the layer, not the window

Here is the gap worth staking a claim on. Almost everyone treats context as a runtime problem: token budgets, retrieval, compaction, pipelines. Very few treat the durable context layer as a versioned, owned engineering asset with its own lifecycle. By the context layer we mean the persistent things your system reassembles the window from: your conventions, your architecture decision records, your interface contracts, your canonical examples, your shared team instructions. The window is reconstructed every prompt and thrown away. The layer is what you keep. Context is the new architecture, and the architecture is the thing you should be writing down, reviewing and owning.

Figure: Two layers, two lifecycles. Prompt window: ephemeral, discarded each turn. Context layer: owned, versioned and reviewed like code.

Most context-engineering advice optimises the window your tool rebuilds per prompt. The real asset is the persistent, version-controlled context layer you own, and it encodes your standing acceptance criteria.

This is where context engineering rejoins the acceptance gap directly. A change can be judged correct and safe quickly precisely when the criteria for correctness are already written down somewhere the system can reach. A well-engineered context layer encodes the standing acceptance criteria of your codebase: this is how we name things, this is why we chose this datastore, this is the contract this service must honour, this is what a good test looks like here. When that knowledge is captured, versioned and reviewed, acceptance gets cheaper for both the model and the human reviewing it. When it lives only in senior engineers' heads, every generated change reopens questions that should have been settled once.
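One way to picture 'criteria already written down somewhere the system can reach' is a convention expressed as data rather than folklore. The sketch below is purely illustrative: the `Convention` record, the two sample rules and the `review` function are hypothetical, and real conventions are usually richer than a path pattern. It shows a naming rule stored once, with its rationale, and then applied to the paths touched by a generated change.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Convention:
    """A written-down rule plus the reason it exists (hypothetical shape)."""
    name: str
    applies_to: str  # path prefix that puts a file in scope
    pattern: str     # regex an in-scope path must fully match
    rationale: str


# Sample rules for illustration only; a team would keep these in the repository.
CONVENTIONS = [
    Convention("test-file-naming", "tests/", r"tests/test_[a-z0-9_]+\.py",
               "Test modules are discovered by their test_ prefix."),
    Convention("module-naming", "src/", r"src/[a-z0-9_/]+\.py",
               "Modules use snake_case."),
]


def review(changed_paths: list[str]) -> list[str]:
    """Return one finding per in-scope path that breaks a convention."""
    findings = []
    for path in changed_paths:
        for c in CONVENTIONS:
            if path.startswith(c.applies_to) and not re.fullmatch(c.pattern, path):
                findings.append(f"{path}: violates {c.name} ({c.rationale})")
    return findings
```

For a change touching `src/billing/invoice.py` and `tests/InvoiceTest.py`, `review` flags only the second path and quotes the rationale alongside it. The same record can be shown to the model before generation and to the reviewer after it, which is what makes the criterion standing rather than rediscovered.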

Engineer it, version it, own it

Treating context as architecture has concrete consequences. Thoughtworks moved context engineering from 'Assess' in November 2025 to 'Adopt' in April 2026, calling it 'a foundational architectural concern' that must be built as 'a dynamic, precisely managed pipeline.' We would push one step further: the pipeline is the runtime; the source is the asset. Three disciplines follow.

Version it. Conventions, decision records and canonical examples belong in the repository, reviewed through pull requests, with a history you can blame and revert. If a model's behaviour shifts, you want a diff to point at.
Give it provenance. The NIST Generative AI Profile stresses documenting and tracing the information fed to models. Apply the same rigour to context that you apply to code: where did this instruction come from, who approved it, when did it last change. Provenance is the governance hook that turns 'context' from folklore into an auditable artefact.
Give it an owner. A context layer with no owner rots faster than the model does. Someone is accountable for keeping conventions current, retiring stale examples and ensuring the contracts the system relies on still hold.
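The three disciplines can be made checkable with very little machinery. The sketch below assumes a hypothetical manifest format invented for this illustration: the `ContextEntry` fields, the sample paths and names, and the 180-day review window are not drawn from NIST or any standard. It shows how ownership, approval and freshness become things a script can audit.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ContextEntry:
    """Provenance record for one file in the context layer (hypothetical)."""
    path: str
    source: str                 # where the content came from
    owner: Optional[str]        # who keeps it current
    approved_by: Optional[str]  # who signed it off
    last_reviewed: date


def audit(entries: list[ContextEntry], today: date, max_age_days: int = 180) -> list[str]:
    """Flag entries with no owner, no approver, or an overdue review."""
    problems = []
    for e in entries:
        if not e.owner:
            problems.append(f"{e.path}: no owner")
        if not e.approved_by:
            problems.append(f"{e.path}: no recorded approver")
        age = (today - e.last_reviewed).days
        if age > max_age_days:
            problems.append(f"{e.path}: last reviewed {age} days ago")
    return problems


# Illustrative entries; paths and names are made up.
ENTRIES = [
    ContextEntry("docs/conventions/naming.md", "team agreement",
                 "a.engineer", "b.reviewer", date(2026, 5, 1)),
    ContextEntry("docs/adr/0007-datastore.md", "architecture decision record",
                 None, "b.reviewer", date(2025, 1, 10)),
]
```

Run against a review date of 18 June 2026, `audit(ENTRIES, today=date(2026, 6, 18))` passes the first entry and flags the second twice, once for the missing owner and once for the overdue review. Version history itself still comes from the repository; a manifest like this only covers what a commit log does not record.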

MIT Technology Review and Thoughtworks both make the point that the binding constraint here is human judgement and design, not compute. Engineers remain critical as the architects of context: the people who do knowledge priming, build reference applications and curate shared instructions. Reliability now depends on context design rather than raw scale. This is the AI-native operating model in miniature: the senior engineer's job moves from doing the work to directing it, and the context layer is the medium through which that direction is expressed.

So the contrarian edge is simple. By all means tune your window. But do not mistake the ephemeral artefact your tool rebuilds each prompt for the asset. The asset is the persistent context layer you write, version and own. Build that well and acceptance gets faster, because correctness is already encoded. Neglect it and you will spend the next year optimising tokens while the real architecture stays in people's heads.

If you want to know where your organisation sits on this — whether your context layer is owned and versioned or reconstructed by accident — our five-stage maturity model is the place to start, and 'What is Agentic Engineering' sets out the broader operating model that context now serves.

Our perspective

The common view

Bigger context windows and better prompts solve AI quality.

The Ivaaya view

Quality comes from the persistent, version-controlled context layer you own — the window is ephemeral; the layer is the architecture.

“Larger context windows make this moot.” Bigger windows degrade as irrelevant tokens accumulate (lost-in-the-middle); curation and ownership matter more than raw capacity.

If you’re doing this tomorrow

Own and version the context layer (conventions, ADRs, contracts) like code.
Curate context for relevance; do not just add more tokens.

Where teams go wrong

Optimising the ephemeral prompt window
Dumping ever more context and degrading retrieval
Treating context as per-tool config rather than a versioned asset

At a glance

What: The persistent, owned context layer agents reason over.
Why: A generated change is only as good as the context used to judge it.
When: Any team running agents on a real codebase.
When not: One-off prompts with no durability need.


What we’ve observed

Stanford’s “Lost in the Middle” shows models use long contexts unevenly — adding tokens can degrade, not improve, retrieval.
In practice, a versioned context layer (conventions, decisions, contracts) is what lets a generated change be judged correct fast.

How certain are we?

A versioned, owned context layer is the durable asset — observed: Seen consistently in our own work.
More context is not better — established: Observed repeatedly across delivery programmes.

About the author

Priyanka Pandey

Founder & Editorial Lead

Priyanka Pandey founded Ivaaya and leads its editorial voice, translating real delivery experience into practical thinking on AI-native engineering, decision-making and technology leadership. Her work focuses on helping senior leaders make sense of the changes reshaping software delivery without adding to the noise.

Reviewed and challenged by

Sanjeev Purohit

Principal, Decision Architecture

Sanjeev works across enterprise architecture, product strategy and AI-native delivery. The ideas in this article have been challenged against real programmes, production systems and organisational decision-making before publication.


Compare notes

If the durable context layer — your conventions, decisions and contracts — is still living in people’s heads rather than versioned where a generated change can be judged against it, tell us where the gap shows. We’re trading approaches with teams making that layer an owned asset.
