Ask a delivery team a deceptively simple question about a recent production incident: who decided this, on what basis, and can you reconstruct it? A year ago the honest answer involved a named engineer, a pull request and a Slack thread. Today the chain runs through a human prompt, a retrieved snippet of internal documentation, a model's inference, an agent's tool call, a second model's review, and a human approval that may have been a reflexive click. The decision still happened. The pipeline that produced it has become opaque. That opacity is the problem provenance engineering exists to solve.
The intellectual root of this is older than the current agent wave. Jatinder Singh, Jennifer Cobbe and Chris Norval at Cambridge introduced the notion of decision provenance: applying provenance methods to expose the decision pipeline, meaning the chain of inputs to, the nature of, and the flow-on effects from decisions taken within complex systems-of-systems, as a technical means of increasing accountability in algorithmic decision-making. They were writing about automated decisions in distributed systems. The argument lands with far greater force now that those systems-of-systems include the engineering process itself.
Authorship traceability is already broken
The first uncomfortable finding is that we cannot reliably establish even the most basic fact: who, or what, wrote a given change. An empirical study of the AIDev dataset, spanning over 456,000 pull requests from five leading coding agents (OpenAI Codex, Devin, GitHub Copilot, Cursor and Claude Code) across 61,000 repositories and 47,000 developers, found striking inconsistency in how those agents record their own involvement. Devin, GitHub Copilot and Cursor explicitly indicate their authorship in commit metadata; OpenAI Codex and Claude Code attribute their contributions to human developers. The authors are blunt that this lack of authorship traceability undermines core principles of responsible AI: transparency, auditability and accountability.
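To make the inconsistency concrete, here is a minimal Python sketch of how a team might inventory what each commit declares about agent involvement. It assumes that agent participation shows up, if at all, in the author field or in Co-authored-by trailers. The AGENT_MARKERS list and the helper names are hypothetical, and the trailer parser is deliberately much stricter than Git's own trailer rules.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Hypothetical markers we assume an agent might leave in commit metadata.
# Real tools differ, and some leave nothing at all.
AGENT_MARKERS = ("copilot", "devin", "cursor", "codex", "claude")

# Simplified trailer syntax: "Key: value" with a hyphenated alphanumeric key.
TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*):\s*(.+)$")


@dataclass
class AuthorshipClaim:
    author: str
    co_authors: List[str] = field(default_factory=list)
    declared_agents: List[str] = field(default_factory=list)


def parse_trailers(message: str) -> List[tuple]:
    """Return (key, value) pairs from the last paragraph of a commit message.

    Stricter than git: the final paragraph must consist entirely of
    "Key: value" lines, and a message with no body has no trailers.
    """
    paragraphs = [p for p in message.strip().split("\n\n") if p.strip()]
    if len(paragraphs) < 2:
        return []
    trailers = []
    for line in paragraphs[-1].strip().splitlines():
        match = TRAILER_RE.match(line.strip())
        if match is None:
            return []  # the last paragraph is ordinary prose, not a trailer block
        trailers.append((match.group(1), match.group(2)))
    return trailers


def classify(author: str, message: str) -> AuthorshipClaim:
    """Record what the commit *declares* about agent involvement, not what happened."""
    co_authors = [value for key, value in parse_trailers(message)
                  if key.lower() == "co-authored-by"]
    identities = [author.lower()] + [c.lower() for c in co_authors]
    declared = [m for m in AGENT_MARKERS if any(m in ident for ident in identities)]
    return AuthorshipClaim(author, co_authors, declared)
```

An empty declared_agents list is only the absence of a claim. Given the AIDev finding that some agents attribute their work to human developers, it cannot be read as evidence that a human wrote the change.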
If under-attribution is one failure mode, over-attribution is the mirror image. In early March 2026, VS Code's Git extension (v1.110) was altered to auto-add a 'Co-authored-by: Copilot' trailer to commits even when developers had written every line themselves, touching roughly four million commits before Microsoft reversed the change amid a developer backlash. The episode is instructive precisely because it was not malicious — it was a default. Authorship metadata is itself becoming a supply-chain trust problem: a field that can be silently wrong in either direction is worse than no field at all, because it manufactures false confidence.
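One way to treat authorship as something to verify rather than trust is to reconcile the declared trailer against independent evidence. The sketch below assumes that your own tooling records a hypothetical GenerationEvent whenever an agent actually produces code. Neither Git nor VS Code is assumed to provide such a record by default, and all names here are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GenerationEvent:
    """Hypothetical event our own tooling records when an agent actually produces code."""
    commit_sha: str
    agent: str
    lines_generated: int


def audit_attribution(commit_sha: str, declared_agents: Iterable[str],
                      events: Iterable[GenerationEvent]) -> List[str]:
    """Compare declared co-authorship with recorded generation evidence.

    An empty result means claim and evidence agree. Mismatches are reported
    in both directions, because a trailer can be wrong either way.
    """
    recorded = {e.agent for e in events
                if e.commit_sha == commit_sha and e.lines_generated > 0}
    declared = set(declared_agents)
    findings = [f"over-attribution: {a} declared, no generation recorded"
                for a in sorted(declared - recorded)]
    findings += [f"under-attribution: {a} generated code, not declared"
                 for a in sorted(recorded - declared)]
    return findings
```

The important design choice here is reporting mismatches in both directions. A default that stamps every commit, like the one described above, then surfaces as a stream of over-attribution findings instead of passing silently.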
Regulation is converting this from hygiene to obligation
For teams working in regulated commerce and identity contexts, the timeline is no longer theoretical. Article 12 of the EU AI Act requires a queryable record of AI-driven decisions, with record-keeping and logging for high-risk systems to support traceability and monitoring. High-risk obligations reach full enforcement on 2 August 2026, and the GPAI Code of Practice, published on 10 July 2025, turns provenance tracking, data logging and watermarking monitoring into obligations for signatories. The bar an audit trail must clear is specific. As analysts at CX Today summarise it, an AI audit trail must capture which system made the decision, what data it relied on, what it produced and which controls shaped the outcome. Concretely, that means interaction IDs, timestamps, model, prompt and policy versions, input provenance references, output artefacts, human approvals or overrides, and tamper-evident storage, so that an organisation can recreate the decision in a form a third party would accept.
Read that requirement against the AIDev findings and the gap is stark. We are being asked to reconstruct, for a hostile third party, decisions whose authorship our tooling cannot even consistently record. This is the Acceptance Gap reframed as a compliance problem: generation is abundant, but a change is only defensible once we can attest to how it was accepted and by whom.
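The audit-trail fields listed above translate naturally into a typed record with a completeness check. This Python sketch rests on stated assumptions: the field names and the reconstruction_gaps helper are hypothetical, not a schema defined by the AI Act or by CX Today, and tamper-evident storage is left out of scope.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Field names are illustrative; they mirror the audit-trail list, not any
# schema mandated by the AI Act.
REQUIRED = ("interaction_id", "timestamp", "system", "model_version",
            "prompt_version", "policy_version", "input_refs", "output_refs")


@dataclass
class AuditRecord:
    interaction_id: str
    timestamp: str                      # ISO 8601 in UTC, e.g. "2026-08-02T09:00:00Z"
    system: str                         # which system made the decision
    model_version: str
    prompt_version: str
    policy_version: str                 # which controls shaped the outcome
    input_refs: List[str] = field(default_factory=list)   # what data it relied on
    output_refs: List[str] = field(default_factory=list)  # what it produced (digests or IDs)
    human_action: Optional[str] = None  # "approved", "overridden" or None
    approver: Optional[str] = None


def reconstruction_gaps(record: AuditRecord) -> List[str]:
    """List the reasons this record could not support a third-party reconstruction."""
    gaps = [f"missing {name}" for name in REQUIRED if not getattr(record, name)]
    if record.human_action is not None and not record.approver:
        gaps.append(f"{record.human_action} with no named approver")
    return gaps
```

For example, a record with an empty timestamp, no input references and an override with no named approver yields three gaps, each of which is a question a third party would ask and the record could not answer.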
The standards already exist — they need assembling
The encouraging news is that provenance engineering does not start from a blank sheet. The building blocks are mature; they have simply not been composed for the agentic SDLC. The W3C PROV data model gives a domain-agnostic vocabulary — Entities, Activities and Agents linked by relations such as wasGeneratedBy, used and wasAttributedTo — designed precisely so application-specific provenance can be translated into a common interchange model between provenance-aware systems. That triad maps with almost suspicious neatness onto co-produced software: an Activity is a generation or review step, an Agent is a human, a model or an autonomous agent, an Entity is the artefact produced.
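Here is a minimal sketch of that mapping. It emits one generation step in a shape that loosely follows PROV-JSON, a W3C Member Submission serialisation of PROV. It uses only the standard library. The ex: namespace, the identifiers and the ex:kind attribute are hypothetical. A production system would more likely use a dedicated PROV library and validate its output against the specification.

```python
import json
from typing import Dict, List


def prov_step(artefact: str, activity: str, agent: str, agent_kind: str,
              inputs: List[str]) -> Dict[str, dict]:
    """One co-production step as a PROV-JSON-shaped document.

    Identifiers use a hypothetical "ex:" namespace; "ex:kind" is our own
    attribute for telling humans, models and autonomous agents apart.
    """
    return {
        "prefix": {"ex": "https://example.org/sdlc#"},
        "entity": {eid: {} for eid in [artefact, *inputs]},
        "activity": {activity: {}},
        "agent": {agent: {"ex:kind": agent_kind}},
        "wasGeneratedBy": {"_:gen1": {"prov:entity": artefact, "prov:activity": activity}},
        "used": {f"_:use{i}": {"prov:activity": activity, "prov:entity": eid}
                 for i, eid in enumerate(inputs, start=1)},
        "wasAssociatedWith": {"_:assoc1": {"prov:activity": activity, "prov:agent": agent}},
        "wasAttributedTo": {"_:attr1": {"prov:entity": artefact, "prov:agent": agent}},
    }


if __name__ == "__main__":
    doc = prov_step("ex:patch-17", "ex:generate-17", "ex:codegen-agent", "model",
                    ["ex:prompt-9", "ex:runbook-snippet-3"])
    print(json.dumps(doc, indent=2))
```

A human review would be a second step of the same shape, with agent kind "human" and the patch among its inputs. That pairing is what lets a later query walk from an artefact to the model that generated it and on to the reviewer.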
From the supply-chain world come the cryptographic primitives. SLSA, an OpenSSF project whose v1.1 specification was stable as of 2024, defines verifiable provenance describing where, when and how artefacts were produced. It uses in-toto attestations wrapped in DSSE envelopes to capture builder identity, build instructions, parameters, environment variables and dependency digests. The in-toto framework underneath it (stewarded by Santiago Torres-Arias of Purdue with NYU's Secure Systems Lab and NJIT, and funded by DARPA, AFRL and NSF) cryptographically links each step through a signed envelope, a subject identified by cryptographic digest and a typed predicate, producing an unbroken chain of custody from source to production. The discipline these standards enforce for build steps is exactly what we now need one layer up, for decision and authorship steps.
Provenance is the connective tissue that makes agentic delivery defensible: not a log you write after the incident, but a typed, signed predicate you emit at the moment each decision is made.
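To show what a typed, signed predicate might look like in practice, the sketch below wraps an in-toto Statement v1 about a single decision in a DSSE envelope, including DSSE's pre-authentication encoding. Two assumptions matter here. First, the predicate type URL and its fields are hypothetical, since no standard decision predicate is assumed. Second, HMAC-SHA256 stands in for the asymmetric signatures a real deployment would use, for example via Sigstore or the in-toto tooling.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

PAYLOAD_TYPE = "application/vnd.in-toto+json"
# Hypothetical predicate type: no published decision predicate is assumed.
PREDICATE_TYPE = "https://example.org/decision-provenance/v0"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes a signature covers."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(payload), payload)


def decision_statement(name: str, artefact: bytes, predicate: dict) -> dict:
    """in-toto Statement v1 whose subject is the artefact, identified by digest."""
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": name,
                     "digest": {"sha256": hashlib.sha256(artefact).hexdigest()}}],
        "predicateType": PREDICATE_TYPE,
        "predicate": predicate,
    }


def sign(statement: dict, key: bytes, keyid: str) -> dict:
    payload = json.dumps(statement, sort_keys=True).encode("utf-8")
    # HMAC-SHA256 stands in for a real asymmetric signature (e.g. Ed25519)
    # so the sketch needs only the standard library.
    sig = hmac.new(key, pae(PAYLOAD_TYPE, payload), hashlib.sha256).digest()
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify(envelope: dict, key: bytes) -> Optional[dict]:
    """Return the statement if any signature verifies, otherwise None."""
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    for entry in envelope["signatures"]:
        if hmac.compare_digest(expected, base64.b64decode(entry["sig"])):
            return json.loads(payload)
    return None
```

Signing the pre-authentication encoding rather than the raw payload binds the payload type into the signature. Because the subject is bound by digest, a verifier that re-hashes the artefact will also notice any change to the artefact itself.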
Toward reasoning and decision provenance
The frontier work is extending these ideas inward, to the reasoning itself. Recent research formalises reasoning provenance for autonomous agents, distinguishing computational state persistence from structured behavioural analytics. It proposes Action Provenance Graphs that link prompts, plans, tool invocations, intermediate reasoning states and outcomes, so auditors can reconstruct causal pathways and trace responsibility across complex agent interaction chains. The pattern is already being proven in high-stakes domains. A clinical decision-support framework from Alu and Oluwadare (Frontiers in Artificial Intelligence, February 2026) constrains a model to reason only over verified retrieved sources. It also logs queries, retrieved document IDs, inference chains and outputs to a tamper-evident permissioned ledger, so any inference step can be replayed and examined after the fact. The framework is explicitly aligned with FDA GMLP (Good Machine Learning Practice) and EU AI Act expectations.
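A toy version of an action provenance graph shows the core query: walking back from an outcome to every prompt, retrieved source, plan and tool call that fed it. This is not the formalism from the cited research. The class, the node kinds and the breadth-first trace are simplifying assumptions chosen for illustration.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple


class ActionProvenanceGraph:
    """Toy graph in which every node lists the earlier nodes that caused it."""

    def __init__(self) -> None:
        self.kind: Dict[str, str] = {}
        self.causes: Dict[str, List[str]] = defaultdict(list)

    def record(self, node_id: str, kind: str, caused_by: Tuple[str, ...] = ()) -> None:
        if node_id in self.kind:
            raise ValueError(f"{node_id!r} already recorded")
        unknown = [c for c in caused_by if c not in self.kind]
        if unknown:
            raise KeyError(f"unknown causes: {unknown}")
        # Causes must already exist, so edges only point back in time and no cycle can form.
        self.kind[node_id] = kind
        self.causes[node_id].extend(caused_by)

    def trace(self, outcome: str) -> List[Tuple[str, str]]:
        """Breadth-first walk from an outcome back through everything that fed it."""
        seen, path, queue = {outcome}, [], deque([outcome])
        while queue:
            node = queue.popleft()
            path.append((node, self.kind[node]))
            for cause in self.causes[node]:
                if cause not in seen:
                    seen.add(cause)
                    queue.append(cause)
        return path
```

As a usage example, suppose you record a prompt and a retrieved document, then a plan caused by both, then a tool call caused by the plan, and finally an outcome caused by the tool call. Calling trace on the outcome returns the outcome, tool call, plan, prompt and document, in that order.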
Content provenance offers a cautionary note on rigour. C2PA, the Coalition for Content Provenance and Authenticity, was formed in 2021 by uniting Adobe's Content Authenticity Initiative with Project Origin, the effort led by Microsoft and the BBC. In 2025 it launched a conformance programme for its standard, with self-assertion and two assurance levels, yet peer-reviewed analysis argues that the current specifications fail to achieve their claimed security goals. The lesson for engineering teams is that a provenance claim is only as strong as its threat model. Tamper-evidence, cryptographic binding and an honest account of what is not protected are not optional polish; they are the difference between a record a third party accepts and a record that quietly launders unaccountable decisions.
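Tamper-evidence, and its limits, can be illustrated with a hash chain. This is one common way to build a tamper-evident log, not necessarily what the ledger-based systems mentioned earlier use. Each entry commits to its predecessor, so editing any historical record breaks verification from that point on. The chain does not protect against an attacker who rewrites the whole log and recomputes every hash. For that reason the sketch also checks the head against a value anchored out of band, for example one that was signed or published elsewhere. The function names are hypothetical.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64


def entry_hash(prev_hash: str, record: dict) -> str:
    # Canonical JSON, so identical content always produces an identical hash.
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: List[dict], record: dict) -> str:
    """Append a record chained to its predecessor and return the new head hash."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"record": record, "prev": prev, "hash": entry_hash(prev, record)}
    log.append(entry)
    return entry["hash"]


def first_broken(log: List[dict]) -> Optional[int]:
    """Index of the first entry that fails verification, or None if the chain holds."""
    prev = GENESIS
    for i, entry in enumerate(log):
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return i
        prev = entry["hash"]
    return None


def matches_anchor(log: List[dict], anchored_head: str) -> bool:
    """True only if the chain verifies AND ends at a head published out of band."""
    return first_broken(log) is None and bool(log) and log[-1]["hash"] == anchored_head
```

A fully recomputed forgery passes first_broken but fails matches_anchor. That gap is exactly the kind of "what is not protected" statement a threat model should make explicit.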
What this means for delivery
Provenance engineering is therefore best understood not as a tooling purchase but as a decision-quality discipline applied across the lifecycle. The practical agenda is concrete: treat authorship as data that must be correct by construction rather than by default; emit PROV-style attributions at every human, agent and model boundary; bind them with in-toto-grade signatures; and capture the retrieved context and reasoning chain, not merely the output. Done well, the same record that satisfies the auditor on 2 August 2026 also shortens your mean-time-to-understand during a 3am incident, because debugging a co-produced system is, at bottom, a provenance query.
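Here is a minimal sketch of that provenance query. It answers the opening question (who decided this, and on what basis?) from PROV-style edges. It assumes edges are stored as (relation, source, target) triples using PROV relation names. The Edge type and the explain helper are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Edge(NamedTuple):
    relation: str   # a PROV relation name such as "used" or "wasGeneratedBy"
    source: str
    target: str


def targets(edges: List[Edge], relation: str, source: str) -> List[str]:
    return [e.target for e in edges if e.relation == relation and e.source == source]


def explain(artefact: str, edges: List[Edge]) -> Dict[str, list]:
    """Who produced an artefact, what the producing step relied on, and who acted on it next."""
    answer: Dict[str, list] = {"generated_by": [], "inputs": [], "acted_on_by": []}
    for activity in targets(edges, "wasGeneratedBy", artefact):
        for agent in targets(edges, "wasAssociatedWith", activity):
            answer["generated_by"].append((activity, agent))
        answer["inputs"].extend(targets(edges, "used", activity))
    # Downstream activities (reviews, approvals) that used the artefact as input.
    for e in edges:
        if e.relation == "used" and e.target == artefact:
            for agent in targets(edges, "wasAssociatedWith", e.source):
                answer["acted_on_by"].append((e.source, agent))
    return answer
```

For a patch generated by a coding agent from a prompt and a runbook snippet, then reviewed by a human, explain returns three things: the generating activity and agent, both inputs, and the review activity with its reviewer.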
This is where Delivery Assurance and AI governance meet. If you have started codifying how AI-authored changes move through your pipeline, the next move is to make those policies enforceable and reviewable rather than aspirational — which is precisely the territory of AI Coding Governance That Enables, Not Forbids. Provenance is the evidence layer; governance is what turns that evidence into an accepted, defensible delivery.
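Enforceable policy can start as small as a merge gate that reads the provenance evidence and refuses changes that lack it. The rules below are illustrative assumptions, not recommendations. ChangeProvenance is a hypothetical summary that earlier pipeline stages, such as signature verification and approval capture, would populate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ChangeProvenance:
    change_id: str
    ai_generated: bool
    attestation_verified: bool     # e.g. a signed provenance envelope checked upstream
    model_version: Optional[str]   # recorded model version, if any
    approver: Optional[str]        # named human who approved the change
    approver_is_author: bool       # True if the approver also prompted or wrote it


def merge_gate(p: ChangeProvenance) -> List[str]:
    """Return every policy violation; an empty list means the change may merge."""
    violations = []
    if not p.attestation_verified:
        violations.append("provenance attestation missing or unverified")
    if p.ai_generated and not p.model_version:
        violations.append("AI-generated change does not record a model version")
    if not p.approver:
        violations.append("no recorded human approval")
    elif p.approver_is_author:
        violations.append("approver is also the author")
    return violations
```

The gate returns the full list of violations rather than failing on the first one, which keeps it reviewable. The person who is blocked sees every reason at once, and the same output can be logged as evidence.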