Delivery Assurance · 6 min read · Updated 2026-06-18

Provenance Engineering: Reconstructing Who Decided What When Humans, Agents and Models All Contributed

When software is co-produced by humans, agents and models, "who decided this, on what basis, and can we reconstruct it?" stops being a forensic luxury and becomes a first-class engineering requirement.

By Priyanka Pandey · Founder & Editorial Lead

Reviewed and challenged by Sanjeev Purohit · Principal, Decision Architecture

Built from

Independent research
Original framework
Reviewed with field experience

Last substantively reviewed · 2026-06-18

Part of Trust, Governance & the Economics of AI · The Governance-to-Value Ratio

In brief

When software is co-produced by humans, agents and models, “who decided this, on what basis, and can we reconstruct it?” becomes a first-class engineering requirement, not a forensic luxury.

Authorship traceability is already broken — agents are both under- and over-attributed.
Regulation is converting provenance from hygiene to obligation.
The standards (W3C PROV, in-toto/SLSA) exist; they need assembling for the SDLC.

Best for

Teams that must reconstruct who or what decided a co-produced change

Not for

Solo human work with clear authorship

Ask a delivery team a deceptively simple question about a recent production incident: who decided this, on what basis, and can you reconstruct it? A year ago the honest answer involved a named engineer, a pull request and a Slack thread. Today the chain runs through a human prompt, a retrieved snippet of internal documentation, a model's inference, an agent's tool call, a second model's review, and a human approval that may have been a reflexive click. The decision still happened. The pipeline that produced it has become opaque. That opacity is the problem provenance engineering exists to solve.

The intellectual root of this is older than the current agent wave. Jatinder Singh, Jennifer Cobbe and Chris Norval at Cambridge introduced the notion of decision provenance: applying provenance methods to expose the decision pipeline (the chain of inputs to, the nature of, and the flow-on effects from decisions taken within complex systems-of-systems) as a technical means of increasing accountability in algorithmic decision-making. They were writing about automated decisions in distributed systems. The argument lands with far greater force now that the systems-of-systems include the engineering process itself.

Authorship traceability is already broken

The first uncomfortable finding is that we cannot reliably establish even the most basic fact — who, or what, wrote a given change. An empirical study of AIDev, spanning over 456,000 pull requests from five leading coding agents (OpenAI Codex, Devin, GitHub Copilot, Cursor and Claude Code) across 61,000 repositories and 47,000 developers, found striking inconsistency. Devin, GitHub Copilot and Cursor explicitly indicate their authorship in commit metadata; OpenAI Codex and Claude Code attribute their contributions to human developers. The authors are blunt that this lack of authorship traceability undermines core principles of responsible AI — transparency, auditability and accountability.

If under-attribution is one failure mode, over-attribution is the mirror image. In early March 2026, VS Code's Git extension (v1.110) was altered to auto-add a 'Co-authored-by: Copilot' trailer to commits even when developers had written every line themselves, touching roughly four million commits before Microsoft reversed the change amid a developer backlash. The episode is instructive precisely because it was not malicious — it was a default. Authorship metadata is itself becoming a supply-chain trust problem: a field that can be silently wrong in either direction is worse than no field at all, because it manufactures false confidence.
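
A minimal sketch of reading what commit metadata declares, assuming Git's conventional "Co-authored-by: Name <email>" trailer format. The watch-list of agent names, the sample commit message and the example.com addresses are hypothetical. The sketch reports only what the trailers claim: it cannot detect an agent that attributes its work to a human, nor a trailer added by default, which is exactly the limitation described above.

```python
import re

# Hypothetical watch-list of agent names; not an official registry.
KNOWN_AGENTS = ("copilot", "devin", "cursor", "codex", "claude")

# Matches Git's conventional "Co-authored-by: Name <email>" trailer lines.
TRAILER = re.compile(
    r"^Co-authored-by:[ \t]*(?P<name>[^<\n]*?)[ \t]*<(?P<email>[^>\n]*)>[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def co_authors(message: str) -> list:
    """Return (name, email) pairs for every co-author trailer in a commit message."""
    return [(m.group("name"), m.group("email")) for m in TRAILER.finditer(message)]


def declared_agents(message: str) -> set:
    """Return the watch-listed agents that the trailers *claim* took part."""
    declared = set()
    for name, email in co_authors(message):
        haystack = f"{name} {email}".lower()
        declared.update(agent for agent in KNOWN_AGENTS if agent in haystack)
    return declared


# Hypothetical commit message used only for illustration.
message = """Fix retry backoff in payment client

Co-authored-by: Copilot <copilot@example.com>
Co-authored-by: Asha Rao <asha@example.com>
"""
print(co_authors(message))
print(declared_agents(message))
```

Substring matching on names is deliberately crude and will produce false positives on similar human names; a stronger design would compare the declared set against an independently signed record of which tools actually ran.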

Regulation is converting this from hygiene to obligation

For teams working in regulated commerce and identity contexts, the timeline is no longer theoretical. Article 12 of the EU AI Act requires high-risk systems to support the automatic recording of events (logs) over their lifetime, so that their operation can be traced and monitored; in practice that means a record of AI-driven decisions that can be queried after the fact. Most high-risk obligations are scheduled to apply from 2 August 2026, and the GPAI Code of Practice published on 10 July 2025 makes provenance tracking, data logging and watermarking monitoring obligations for signatories. The bar an audit trail must clear is specific. As analysts at CX Today summarise it, an AI audit trail must capture which system made the decision, what data it relied on, what it produced and which controls shaped the outcome (interaction IDs, timestamps, model, prompt and policy versions, input provenance references, output artefacts, human approvals or overrides, and tamper-evident storage) so an organisation can recreate the decision in a form a third party would accept.
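
The field list above can be sketched as a record type with a completeness check. This is an illustration under assumptions, not a schema taken from the Act or from any vendor: the class name, field names and sample values are all hypothetical, and tamper-evident storage is a property of where the record is kept, so it is not modelled here.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)
class DecisionRecord:
    """Hypothetical audit-trail entry; field names are illustrative only."""
    interaction_id: str
    timestamp: str                      # ISO 8601, UTC
    system: str                         # which system made the decision
    model_version: str
    prompt_version: str
    policy_version: str
    input_refs: Tuple[str, ...]         # input provenance references
    output_artefacts: Tuple[str, ...]   # what the decision produced
    human_action: str                   # "approved", "overridden" or "none"


def missing_fields(record: DecisionRecord) -> list:
    """Names of fields left empty, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


record = DecisionRecord(
    interaction_id="int-0001",
    timestamp="2026-06-18T09:30:00Z",
    system="review-agent",
    model_version="model-2026-05",
    prompt_version="prompt-v7",
    policy_version="policy-v3",
    input_refs=(),                      # nothing was recorded about the inputs
    output_artefacts=("patch:sha256:ab12",),
    human_action="approved",
)
print(missing_fields(record))
```

A check like this could run as a pipeline gate, so a change whose inputs were never recorded is flagged before merge rather than discovered during an audit.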

Read that requirement against the AIDev findings and the gap is stark. We are being asked to reconstruct, for a hostile third party, decisions whose authorship our tooling cannot even consistently record. This is the Acceptance Gap reframed as a compliance problem: generation is abundant, but a change is only defensible once we can attest to how it was accepted and by whom.

The standards already exist — they need assembling

The encouraging news is that provenance engineering does not start from a blank sheet. The building blocks are mature; they have simply not been composed for the agentic SDLC. The W3C PROV data model gives a domain-agnostic vocabulary — Entities, Activities and Agents linked by relations such as wasGeneratedBy, used and wasAttributedTo — designed precisely so application-specific provenance can be translated into a common interchange model between provenance-aware systems. That triad maps with almost suspicious neatness onto co-produced software: an Activity is a generation or review step, an Agent is a human, a model or an autonomous agent, an Entity is the artefact produced.
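
To make the mapping concrete, here is a small in-memory sketch that uses PROV relation names as plain triples. It is not a PROV serialisation and does not validate against the W3C model; the ProvGraph class, its methods and every node identifier are hypothetical. The wasAssociatedWith relation, which links an Activity to an Agent in PROV, is used alongside the three relations named above.

```python
class ProvGraph:
    """In-memory PROV-style graph: typed nodes plus named relations between them."""

    def __init__(self):
        self.kinds = {}       # node id -> "entity" | "activity" | "agent"
        self.relations = []   # (relation, subject, object) triples

    def add(self, kind, node_id):
        self.kinds[node_id] = kind

    def relate(self, relation, subject, obj):
        # Refuse relations that mention nodes nobody declared.
        for node_id in (subject, obj):
            if node_id not in self.kinds:
                raise KeyError(f"undeclared node: {node_id}")
        self.relations.append((relation, subject, obj))

    def objects(self, relation, subject):
        return [o for r, s, o in self.relations if r == relation and s == subject]

    def explain(self, entity):
        """Who is the entity attributed to, which activity made it, and from what?"""
        activities = self.objects("wasGeneratedBy", entity)
        return {
            "attributed_to": self.objects("wasAttributedTo", entity),
            "generated_by": activities,
            "inputs": [i for a in activities for i in self.objects("used", a)],
        }


g = ProvGraph()
g.add("entity", "prompt:42")
g.add("entity", "doc:runbook")
g.add("entity", "patch:1")
g.add("activity", "generate-patch")
g.add("agent", "human:dev")
g.add("agent", "model:assistant")

g.relate("used", "generate-patch", "prompt:42")
g.relate("used", "generate-patch", "doc:runbook")
g.relate("wasAssociatedWith", "generate-patch", "model:assistant")
g.relate("wasGeneratedBy", "patch:1", "generate-patch")
g.relate("wasAttributedTo", "patch:1", "model:assistant")
g.relate("wasAttributedTo", "prompt:42", "human:dev")

print(g.explain("patch:1"))
```

The point of the design is that the patch and the prompt carry separate attributions, so the question "who wrote this?" has a different answer from "who asked for it?".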

From the supply-chain world come the cryptographic primitives. SLSA, the OpenSSF project at v1.1 stable as of 2024, defines verifiable provenance describing where, when and how artefacts were produced, using in-toto attestations wrapped in DSSE envelopes to capture builder identity, build instructions, parameters, environment variables and dependency digests. The in-toto framework underneath it — stewarded by Santiago Torres-Arias of Purdue with NYU's Secure Systems Lab and NJIT, and funded by DARPA, AFRL and NSF — cryptographically links each step via a signed envelope, a subject identified by cryptographic digest and a typed predicate, producing an unbroken chain of custody from source to production. The discipline these standards enforce for build steps is exactly what we now need one layer up, for decision and authorship steps.
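
The envelope, subject and predicate can be sketched with the standard library alone. The statement layout and the pre-authentication encoding follow the published in-toto Statement and DSSE formats as I understand them; everything else is an assumption. In particular, the predicate type URL, the predicate fields and the key are hypothetical, and HMAC stands in for the asymmetric signature a real deployment would use, because a shared secret cannot prove to a third party which party signed.

```python
import base64
import hashlib
import hmac
import json

PAYLOAD_TYPE = "application/vnd.in-toto+json"
# Hypothetical predicate type for a decision step; not a registered in-toto predicate.
PREDICATE_TYPE = "https://example.com/decision-provenance/v0.1"


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the bytes that actually get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def make_statement(name: str, artefact: bytes, predicate: dict) -> dict:
    """in-toto Statement: a subject pinned by digest plus a typed predicate."""
    digest = hashlib.sha256(artefact).hexdigest()
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": name, "digest": {"sha256": digest}}],
        "predicateType": PREDICATE_TYPE,
        "predicate": predicate,
    }


def sign(statement: dict, key: bytes, keyid: str) -> dict:
    payload = json.dumps(statement, sort_keys=True).encode("utf-8")
    # ASSUMPTION: HMAC keeps the sketch stdlib-only; use an asymmetric signature in practice.
    sig = hmac.new(key, pae(PAYLOAD_TYPE, payload), hashlib.sha256).digest()
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [{"keyid": keyid, "sig": base64.b64encode(sig).decode("ascii")}],
    }


def verify(envelope: dict, key: bytes) -> dict:
    """Return the statement if any signature checks out, else raise ValueError."""
    payload = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], payload), hashlib.sha256).digest()
    for signature in envelope["signatures"]:
        if hmac.compare_digest(base64.b64decode(signature["sig"]), expected):
            return json.loads(payload)
    raise ValueError("no valid signature for this payload")


KEY = b"demo-key-do-not-reuse"
patch = b"--- a/client.py\n+++ b/client.py\n"
envelope = sign(
    make_statement("patch:1", patch, {"decidedBy": "human:dev", "action": "approved"}),
    KEY,
    keyid="demo",
)
statement = verify(envelope, KEY)
print(statement["subject"][0]["digest"]["sha256"] == hashlib.sha256(patch).hexdigest())
```

Because the signature covers the payload type as well as the payload, a verifier cannot be tricked into reading a signed decision record as some other kind of statement.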

Provenance is the connective tissue that makes agentic delivery defensible: not a log you write after the incident, but a typed, signed predicate you emit at the moment each decision is made.

Toward reasoning and decision provenance

The frontier work is extending these ideas inward, to the reasoning itself. Recent research formalises reasoning provenance for autonomous agents, distinguishing computational state persistence from structured behavioural analytics and proposing Action Provenance Graphs that link prompts, plans, tool invocations, intermediate reasoning states and outcomes, so auditors can reconstruct causal pathways and trace responsibility across complex agent interaction chains. The pattern is already being proven in high-stakes domains: a clinical decision-support framework from Alu and Oluwadare (Frontiers in Artificial Intelligence, February 2026) constrains a model to reason only over verified retrieved sources and logs queries, retrieved document IDs, inference chains and outputs to a tamper-evident permissioned ledger, so any inference step can be replayed and examined after the fact — explicitly aligned with FDA GMLP and EU AI Act expectations.
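
The idea of reconstructing a causal pathway can be illustrated with a toy graph. This is a sketch under assumptions and not the formal definition of an Action Provenance Graph from the research: the Step type, its field names, the step kinds and the sample run are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    step_id: str
    kind: str                         # e.g. "prompt", "retrieval", "plan", "tool_call", "outcome"
    summary: str
    caused_by: Tuple[str, ...] = ()   # ids of the steps this one depended on


def causal_path(steps, target_id):
    """Ids of every step the target depends on, causes first, ending with the target."""
    by_id = {s.step_id: s for s in steps}
    ordered, seen = [], set()

    def visit(step_id):
        if step_id in seen:
            return
        seen.add(step_id)
        for parent in by_id[step_id].caused_by:
            visit(parent)
        ordered.append(step_id)       # appended only after all of its causes

    visit(target_id)
    return ordered


# Hypothetical agent run: one chain leads to the deployed patch, one step does not.
run = [
    Step("p1", "prompt", "Fix the flaky retry test"),
    Step("r1", "retrieval", "Fetched internal runbook", ("p1",)),
    Step("pl1", "plan", "Patch backoff, rerun tests", ("p1", "r1")),
    Step("t1", "tool_call", "Edited client.py", ("pl1",)),
    Step("t2", "tool_call", "Listed open issues", ("pl1",)),
    Step("o1", "outcome", "Patch approved and merged", ("t1",)),
]
print(causal_path(run, "o1"))
```

Walking backwards from the outcome excludes the tool call that did not contribute, which is the difference between a log of everything that happened and a record of what caused the decision.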

Content provenance offers a cautionary note on rigour. The C2PA standard, formed in 2021 by uniting Adobe's Content Authenticity Initiative with Microsoft and BBC's Project Origin, launched its conformance programme in 2025 with self-assertion and two assurance levels — yet peer-reviewed analysis argues the current specifications fail to achieve their claimed security goals. The lesson for engineering teams is that a provenance claim is only as strong as its threat model. Tamper-evidence, cryptographic binding and an honest account of what is not protected are not optional polish; they are the difference between a record a third party accepts and a record that quietly launders unaccountable decisions.
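
A hash-chained log is one simple way to see both what tamper-evidence buys and what it leaves unprotected. The sketch below is an assumption-laden illustration, not the mechanism used by C2PA or by any ledger mentioned above; the function names and the entry layout are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64   # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, body: dict) -> str:
    data = json.dumps({"prev": prev_hash, "body": body}, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def append(log: list, body: dict) -> dict:
    """Add an entry whose hash covers its body and the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev, "body": body, "hash": _digest(prev, body)}
    log.append(entry)
    return entry


def first_broken_index(log: list):
    """Index of the first entry that fails verification, or None if the chain holds."""
    prev = GENESIS
    for index, entry in enumerate(log):
        if entry["prev"] != prev or entry["hash"] != _digest(entry["prev"], entry["body"]):
            return index
        prev = entry["hash"]
    return None


log = []
append(log, {"step": "query", "text": "dosage guidance"})
append(log, {"step": "retrieve", "doc_ids": ["doc-17", "doc-42"]})
append(log, {"step": "output", "summary": "recommendation drafted"})
print(first_broken_index(log))

log[1]["body"]["doc_ids"] = ["doc-99"]   # an after-the-fact edit
print(first_broken_index(log))
```

The honest threat model matters here. The chain detects edits to recorded entries, but it does not detect the most recent entries being deleted, and it does not stop someone who controls the store from rewriting the whole chain; both require the latest hash to be signed or anchored somewhere the writer cannot alter.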

What this means for delivery

Provenance engineering is therefore best understood not as a tooling purchase but as a decision-quality discipline applied across the lifecycle. The practical agenda is concrete: treat authorship as data that must be correct by construction rather than by default; emit PROV-style attributions at every human, agent and model boundary; bind them with in-toto-grade signatures; and capture the retrieved context and reasoning chain, not merely the output. Done well, the same record that satisfies the auditor on 2 August 2026 can also sharply cut your mean-time-to-understand during a 3am incident, because debugging a co-produced system is, at bottom, a provenance query.

This is where Delivery Assurance and AI governance meet. If you have started codifying how AI-authored changes move through your pipeline, the next move is to make those policies enforceable and reviewable rather than aspirational — which is precisely the territory of AI Coding Governance That Enables, Not Forbids. Provenance is the evidence layer; governance is what turns that evidence into an accepted, defensible delivery.

Our perspective

The common view

Git history and logs are enough provenance.

The Ivaaya view

Co-produced delivery needs decision and reasoning provenance — typed, signed predicates emitted at decision time — assembled from existing standards, not reconstructed after an incident.

“We have git blame.” Authorship is already unreliable for agent work; defensible provenance is a signed, decision-time record, not commit metadata.

If you’re doing this tomorrow

Emit signed provenance at decision time (who/what/context), not after the fact.
Assemble W3C PROV + in-toto/SLSA into the SDLC rather than inventing a bespoke scheme.

Where teams go wrong

Trusting authorship metadata that can be silently wrong
Reconstructing provenance after an incident
Capturing only build provenance, with no reasoning or decision provenance

At a glance

What: Decision and reasoning provenance for co-produced software.
Why: Authorship is broken and regulation now requires reconstruction.
When: Agent/model-co-produced delivery, especially regulated.
When not: Solo human work with clear authorship.

What we’ve observed

An empirical study of AI-authored pull requests (AIDev) finds inconsistent authorship attribution across leading coding agents; in 2026 a VS Code change mis-added co-author trailers to ~4M commits.
W3C PROV and in-toto/SLSA supply the vocabulary and signatures; EU AI Act record-keeping makes assembling them an obligation.

How certain are we?

Authorship traceability for AI work is currently unreliable — established: Observed repeatedly across delivery programmes.
Decision provenance is becoming a regulatory obligation — observed: Seen consistently in our own work.

About the author

Priyanka Pandey

Founder & Editorial Lead

Priyanka Pandey founded Ivaaya and leads its editorial voice, translating real delivery experience into practical thinking on AI-native engineering, decision-making and technology leadership. Her work focuses on helping senior leaders make sense of the changes reshaping software delivery without adding to the noise.

Reviewed and challenged by

Sanjeev Purohit

Principal, Decision Architecture

Sanjeev works across enterprise architecture, product strategy and AI-native delivery. The ideas in this article have been challenged against real programmes, production systems and organisational decision-making before publication.

Compare notes

Once humans, agents and models are all co-producing a change, the quiet question becomes whether you could reconstruct who decided what, and on what basis. We are comparing notes with teams trying to make that reconstructable on purpose — where does it currently break for you?
