Most governance documents for AI coding read like they were written to stop something. They lead with prohibition, treat the engineer as a liability to be contained, and end up doing the one thing they were meant to prevent: pushing the actual usage into the shadows, where nobody can see it, audit it, or improve it. Governance that works does the opposite. It exists to let more people use these tools, more safely, with the organisation able to stand behind what ships. The goal is not control for its own sake. The goal is raising acceptance without quietly raising risk.
That framing matters because the evidence on whether AI coding even helps is far messier than the marketing. METR's randomised controlled trial (Becker, Rush, Barnes and Rein) is the study to sit with. Sixteen experienced open-source developers, working on mature repositories they averaged five years of familiarity with, took on 246 real tasks. Allowing early-2025 AI tools increased completion time by 19 per cent. The developers had forecast a 20 to 24 per cent speed-up; experts predicted a 38 to 39 per cent reduction in completion time. And in METR's own write-up, those same developers self-reported a roughly 20 per cent speed-up after the fact, while the measured effect was a 19 per cent slowdown. Hold that gap in your mind, because it is the whole governance problem in one number: perceived productivity and actual productivity can point in opposite directions, and people cannot feel the difference.
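To make the size of that gap concrete, here is a minimal sketch of the arithmetic using the two figures quoted above. It rests on a simplifying assumption: a "20 per cent speed-up" is read as a 20 per cent reduction in completion time, so both numbers sit on the same scale. The class and function names are illustrative and do not come from the METR study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffectEstimate:
    """Change in task completion time, in per cent. Negative means faster."""
    label: str
    time_change_pct: float


def perception_gap(perceived: EffectEstimate, measured: EffectEstimate) -> float:
    """Percentage-point distance between what was measured and what was felt."""
    return measured.time_change_pct - perceived.time_change_pct


def same_direction(perceived: EffectEstimate, measured: EffectEstimate) -> bool:
    """True when perception and measurement agree on faster versus slower."""
    return (perceived.time_change_pct < 0) == (measured.time_change_pct < 0)


# Assumption: the self-reported speed-up is treated as a -20% change in time.
self_report = EffectEstimate("self-reported", -20.0)
measured = EffectEstimate("measured", 19.0)

print(perception_gap(self_report, measured))  # 39.0
print(same_direction(self_report, measured))  # False
```

The useful output is the second line rather than the first: a team can tolerate a noisy estimate, but an estimate with the wrong sign cannot be corrected by self-attestation alone.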
Why governance is an acceptance problem, not a compliance one
If practitioners systematically misjudge their own output, then a governance regime built purely on self-attestation and good intentions is building on sand. This is the continuation of a theme we keep returning to: the acceptance gap is not closed by enthusiasm, it is closed by evidence. The Stack Overflow 2025 Developer Survey shows the population is already past the adoption tipping point — 84 per cent use or plan to use AI tools, up from 76 per cent — but trust is falling, with 46 per cent distrusting AI accuracy against 33 per cent who trust it. Tellingly, the most experienced developers are the most cautious, and the top frustration, cited by 66 per cent, is the answer that is 'almost right but not quite'. That 'almost right' is precisely what governance has to catch, because it is the failure mode that slips through casual review.
The systemic picture reinforces it. DORA's 2024 Accelerate State of DevOps report found 75.9 per cent of respondents relying on AI for part of their work, yet a 25 per cent increase in AI adoption was associated with an estimated 1.5 per cent decrease in delivery throughput and a 7.2 per cent decrease in delivery stability. Individual gains, where they exist, are not translating into system-level delivery. Governance is the layer that decides whether they do.
The control points worth having
Strip away the theatre and effective AI coding governance comes down to five concrete control points: approved tools, data boundaries, IP and security posture, auditability, and human accountability. None of these need to be heavy. All of them need to be explicit.
Data boundaries are where the contractual detail earns its keep, and the honest move is to read the vendor terms yourself rather than the brochure. By GitHub's own account, Copilot Business and Enterprise data is not used to train models and prompts and suggestions are not retained, while its IP indemnity applies only when the duplication-detection filter is enabled and a suggestion is used unmodified. GitLab, in its own press commentary, frames the coming policy shift (Copilot Free and Pro interaction data used for training by default from April 2026 unless opted out, with Business and Enterprise exempt) as a 'governance wake-up call' on default data boundaries. The lesson is not which tool to pick; it is that the default tier and the default flag determine your IP and confidentiality position, and defaults change.
On security, treat vendor data as a signal to be triangulated, not as gospel. Veracode's 2025 GenAI Code Security Report, a vendor study covering 80 tasks and over 100 LLMs, reports that AI-generated code introduced security vulnerabilities in 45 per cent of cases, with Java failing 72 per cent of security checks and Cross-Site Scripting and Log Injection failing 86 and 88 per cent respectively, and, crucially, no improvement with newer or larger models. A security vendor has an interest in a frightening number, so the figure is best read directionally: generated code is not safe by default, and scale does not save you. That is consistent enough with the independent productivity findings to act on.
Governance that enables is governance that makes the safe path the easy path: approved tools wired in, scanning automatic, the audit trail a by-product of normal work rather than a tax on it.
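One way to keep the five control points explicit without making them heavy is to express them as a small automated check. The sketch below is illustrative only: the tool name `example-assistant`, the tier names, and every field on `ToolUsage` are hypothetical placeholders, not the settings of any real product.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical allow-list: tool name -> tiers the organisation has approved.
APPROVED_TOOLS = {"example-assistant": {"business", "enterprise"}}


@dataclass(frozen=True)
class ToolUsage:
    tool: str
    tier: str
    scanning_enabled: bool
    audit_logging_enabled: bool
    accountable_owner: str  # a named human, not a team alias


def check_usage(usage: ToolUsage) -> list[str]:
    """Return one finding per control point that is not satisfied."""
    findings: list[str] = []
    approved_tiers = APPROVED_TOOLS.get(usage.tool)
    if approved_tiers is None:
        findings.append(f"approved tools: '{usage.tool}' is not on the approved list")
    elif usage.tier not in approved_tiers:
        findings.append(f"data boundaries: tier '{usage.tier}' is not approved for '{usage.tool}'")
    if not usage.scanning_enabled:
        findings.append("security posture: automatic scanning is switched off")
    if not usage.audit_logging_enabled:
        findings.append("auditability: usage is not being logged")
    if not usage.accountable_owner.strip():
        findings.append("human accountability: no named owner")
    return findings


ok = ToolUsage("example-assistant", "business", True, True, "j.doe")
risky = ToolUsage("example-assistant", "free", True, False, "")

print(check_usage(ok))  # []
for finding in check_usage(risky):
    print(finding)
```

Returning findings rather than raising on the first failure is a deliberate choice here: the engineer sees everything that needs fixing in one pass, which keeps the safe path the easy one.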
Standards give you the shape, not the answers
You do not need to invent this from scratch. NIST's Generative AI Profile (NIST AI 600-1, July 2024) identifies twelve GenAI-specific risks, including IP concerns, data poisoning, prompt injection and over-reliance, and organises suggested actions around Govern, Map, Measure and Manage, with the Govern function treated as foundational. The EU AI Act, for high-risk systems, requires human oversight and automatic event logging across the lifecycle, with deployers keeping logs for at least six months; Article 5 prohibitions took effect in February 2025 and core high-risk obligations begin in August 2026. Whether or not those obligations bind you, the logging-and-oversight pattern is a sound default for auditability: capture what the system did, keep it long enough to investigate, and keep a named human in the loop.
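The logging-and-oversight pattern can be sketched in a few lines. This is an illustration of the shape, not a statement of what any regulation requires: the field names are hypothetical, and "six months" is approximated as 183 days purely for the example.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: six months approximated as 183 days for this sketch.
MIN_RETENTION = timedelta(days=183)


@dataclass(frozen=True)
class AuditEvent:
    timestamp: str          # ISO 8601, UTC
    tool: str               # which assistant or agent acted
    action: str             # what it did, e.g. "suggestion_accepted"
    change_ref: str         # commit or merge request identifier
    accountable_human: str  # the named person who owns the outcome


def record_event(tool: str, action: str, change_ref: str,
                 accountable_human: str,
                 now: Optional[datetime] = None) -> str:
    """Serialise one event as a JSON line; refuse events with no named human."""
    if not accountable_human.strip():
        raise ValueError("every event needs a named accountable human")
    moment = now or datetime.now(timezone.utc)
    event = AuditEvent(moment.isoformat(), tool, action, change_ref, accountable_human)
    return json.dumps(asdict(event), sort_keys=True)


def may_delete(event_json: str, now: datetime) -> bool:
    """True only once the event is older than the minimum retention period."""
    logged_at = datetime.fromisoformat(json.loads(event_json)["timestamp"])
    return now - logged_at >= MIN_RETENTION


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
line = record_event("example-assistant", "suggestion_accepted", "abc123", "j.doe", now=created)
print(line)
print(may_delete(line, created + timedelta(days=30)))   # False
print(may_delete(line, created + timedelta(days=200)))  # True
```

Emitting the record from the same code path that accepts the change is what makes the audit trail a by-product of normal work rather than a separate chore.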
The academic literature is candid about the limits here. A peer-reviewed paper on responsible GenAI governance (IEEE ISTAS 2025) argues for a balanced, risk-based approach that enables innovation and oversight together, but flags a real gap: there is limited empirical work on how governance frameworks actually perform in complex, distributed enterprises. So we should hold our own frameworks loosely and instrument them, rather than assume the document equals the outcome.
Human accountability is the load-bearing wall
Every control above resolves to one principle: a person, not a model, is accountable for what merges. This is why we treat acceptance as a reviewed event rather than a click, and why agentic code review becomes a governance instrument and not just a quality one. It is also why authority should be distributed deliberately. Borrowing Andrew Harmel-Law's advice process, the engineer closest to the change makes the call, having sought advice from those affected and those with expertise, which scales accountability far better than a central board that becomes a bottleneck. And it is worth distinguishing reversibility before reaching for heavy process at all: Jeff Bezos's distinction between one-way and two-way doors tells you that a sandboxed, easily reverted change does not warrant the same scrutiny as a schema migration in a payment path.
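The reversibility test can be written down as a small triage rule. The sketch below is one hedged reading of the paragraph above: the three boolean attributes and the review steps are hypothetical, and anything not clearly a two-way door is treated as one-way by default.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Door(Enum):
    TWO_WAY = "two-way"  # cheap to undo
    ONE_WAY = "one-way"  # costly or impossible to undo


@dataclass(frozen=True)
class Change:
    description: str
    sandboxed: bool
    easily_reverted: bool
    touches_critical_path: bool  # e.g. a payment path


def classify(change: Change) -> Door:
    """Two-way only when every signal says the change is cheap to undo."""
    if change.sandboxed and change.easily_reverted and not change.touches_critical_path:
        return Door.TWO_WAY
    return Door.ONE_WAY


def review_steps(change: Change) -> list[str]:
    """Illustrative scrutiny levels; the author stays accountable in both cases."""
    if classify(change) is Door.TWO_WAY:
        return ["author decides", "standard review before merge"]
    return [
        "seek advice from those affected",
        "seek advice from those with expertise",
        "author decides and records the decision",
    ]


experiment = Change("prototype in a sandbox", True, True, False)
migration = Change("schema migration in a payment path", False, False, True)

print(classify(experiment).value)  # two-way
print(classify(migration).value)   # one-way
```

Note that the heavier path still ends with the author deciding. Under the advice process, seeking advice is mandatory but the decision is not handed to a board.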
Where this leaves the cautious CTO
Gartner expects 90 per cent of enterprise software engineers to use AI code assistants by 2028, up from under 14 per cent in early 2024, while warning that more than 40 per cent of agentic AI projects could be cancelled by 2027 on unclear value, rising costs and weak governance — and recommends a cross-functional task force spanning engineering, architecture, security and legal. That is the right instinct, provided the task force ships enablement, not edicts. Approve a small set of tools on the right tier. Make scanning and logging automatic. Name who is accountable. Measure the real effect, because the METR study tells you nobody can feel it. Then expand the approved surface as the evidence comes in. Governance done this way is not the brake on adoption. It is the thing that makes adoption survivable, and eventually, genuinely faster.
If you are putting this in place, anchor it to where your organisation actually sits on the AI Engineering Maturity Model, and let that determine how much process is proportionate rather than copying a regime built for a different stage.