
Platform Engineering for Agentic Teams

When AI agents become contributors, the internal platform stops being plumbing and becomes the control plane that decides whether AI compounds your strengths or your dysfunction.

For most of its history the internal platform has been background infrastructure: pipelines, environments, a service catalogue, the paved road a developer takes to production. Useful, unglamorous, and largely invisible when it worked. AI coding agents end that quiet. The moment a non-human contributor can open a pull request, the platform becomes the thing that decides whether that contribution is safe to accept.

This is not a tooling upgrade. It is a change in who — and what — the platform now serves.

The evidence: AI amplifies, it does not fix

The most decisive finding of the year comes from DORA’s 2025 research: independent, longitudinal, and drawing on roughly 5,000 respondents (now published under Google Cloud). Its headline is blunt: “AI doesn’t fix a team; it amplifies what’s already there.” And the variable that decides which way the amplification runs is the platform. Where internal-platform quality is high, AI’s effect on organisational performance is strong and positive; where it is low, the effect is negligible. DORA also reports that 90% of organisations now run at least one internal platform, and names “quality internal platforms” as one of the seven capabilities in its new AI Capabilities Model, the layer that lets AI’s benefits scale securely.

Give a strong team AI and a good platform, and you compound strength. Give a struggling team AI and a weak platform, and you compound the mess — faster.

The platform is the unit of AI adoption

Most AI-adoption programmes are run as tool rollouts: licences, a training session, a Slack channel. DORA’s result says that is the wrong unit. The platform — not the seat — is the distribution-and-governance layer through which AI’s benefits actually reach production, or fail to. You do not adopt AI developer by developer; you adopt it platform by platform. That single reframing changes who owns the outcome, and it puts platform engineering at the centre of an AI strategy rather than underneath it.

What agents demand that humans didn’t

The strain shows up first at review. DORA found that AI lifts throughput and product performance but, without robust controls (strong automated testing, mature version control, fast feedback), continues to degrade delivery stability. Telemetry across thousands of teams tells the same story from the other side: time to first review is up by around 157%, and roughly a third more pull requests are merged with no review at all (Faros AI, ~22,000 developers; this is vendor data and should be read with that in mind, but it is directionally echoed by DORA and others). Generation got cheap; acceptance didn’t. The platform is where that gap is either absorbed or ignored.

So the platform acquires a new job description — not just to ship human work faster, but to admit non-human work safely. The convergent picture from the teams furthest along looks like this:

  • Identity for non-human contributors — every agent gets its own attributable identity and least-privilege, short-lived credentials. Microsoft’s Entra Agent ID and the SPIFFE standard are two routes to the same principle: no shared keys, no anonymous commits.
  • An agent registry — a single inventory of which agents exist and who owns them. As Microsoft puts it, you cannot govern agents you do not know exist; shadow agents are the new shadow IT.
  • Sandboxed, ephemeral environments by default — agents run isolated, with restricted filesystem, controlled network and bounded resources. Thoughtworks now treats sandboxing as a sensible default, not an optional extra.
  • Context distributed through golden paths — the conventions, decisions and instruction files (AGENTS.md, CLAUDE.md) an agent needs, baked into the service templates new repositories are scaffolded from, so every project inherits current guidance by default instead of each developer re-writing prompts.
  • Harnesses with feedback wired in — feedforward controls that set an agent up to be right first time, and feedback sensors (compilers, linters, type checkers, test suites) that catch failures and trigger self-correction before a human ever looks.
  • Durable state and automated verification — long-running agents start each session with no memory, so the platform provides the scaffolding for continuity and the end-to-end checks that confirm a change works as a user would actually experience it.
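The first two items in the list above can be made concrete with a small sketch. Every name here (AgentRegistry, AgentRecord, Credential, the 15-minute cap) is hypothetical and chosen for illustration; this is not the Entra Agent ID or SPIFFE API, only one way the principle of "known agent, named owner, least privilege, short lifetime" might look in code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class AgentRecord:
    """One row in the (hypothetical) agent inventory."""
    agent_id: str
    owner: str                      # the accountable human or team
    allowed_scopes: FrozenSet[str]  # the most this agent may ever be granted


@dataclass(frozen=True)
class Credential:
    agent_id: str
    scopes: FrozenSet[str]
    expires_at: datetime


class AgentRegistry:
    """Single inventory of agents; credentials are only issued to known agents."""

    def __init__(self, max_ttl: timedelta = timedelta(minutes=15)) -> None:
        self._agents: Dict[str, AgentRecord] = {}
        self._max_ttl = max_ttl  # assumed cap; pick whatever your policy allows

    def register(self, record: AgentRecord) -> None:
        if not record.owner.strip():
            raise ValueError("every agent needs a named owner")
        self._agents[record.agent_id] = record

    def issue(
        self,
        agent_id: str,
        scopes: Iterable[str],
        ttl: timedelta,
        now: Optional[datetime] = None,
    ) -> Credential:
        record = self._agents.get(agent_id)
        if record is None:
            # Shadow agents get nothing: no registry entry, no credential.
            raise PermissionError(f"unregistered agent: {agent_id}")
        requested = frozenset(scopes)
        extra = requested - record.allowed_scopes
        if extra:
            raise PermissionError(
                f"scopes not granted to {agent_id}: {sorted(extra)}"
            )
        issued_at = now if now is not None else datetime.now(timezone.utc)
        # Lifetime is clamped to the cap, however long the caller asked for.
        return Credential(agent_id, requested, issued_at + min(ttl, self._max_ttl))
```

In this sketch an agent that asks for an hour still receives a credential that expires after the cap, and a request from an agent missing from the inventory fails outright rather than falling back to a shared key.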

None of these is exotic. They are the concerns a good platform already had — identity, isolation, paved roads, automated gates — re-pointed at a contributor who never reads the wiki, never asks a colleague, and produces work far faster than the humans reviewing it.
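The harness idea, with feedback sensors that trigger self-correction before a human looks, can be sketched as a bounded loop. This is a minimal illustration under stated assumptions: `propose` stands in for whatever calls the agent, each sensor stands in for a compiler, linter, type checker or test run, and the names `run_harness`, `Sensor` and `HarnessResult` are invented here rather than taken from any real tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A sensor inspects a candidate change and returns an error message,
# or None when the change passes. (Hypothetical interface.)
Sensor = Callable[[str], Optional[str]]

# The agent call: given the previous round's failures, return a new candidate.
Proposer = Callable[[List[str]], str]


@dataclass(frozen=True)
class HarnessResult:
    accepted: bool        # True only if every sensor passed
    attempts: int         # how many proposals were made
    change: str           # the last candidate produced
    failures: List[str]   # outstanding sensor messages (empty when accepted)


def run_harness(
    propose: Proposer, sensors: List[Sensor], max_attempts: int = 3
) -> HarnessResult:
    feedback: List[str] = []
    change = ""
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        # Feed the last round's failures back so the agent can self-correct.
        change = propose(feedback)
        feedback = []
        for sensor in sensors:
            message = sensor(change)
            if message is not None:
                feedback.append(message)
        if not feedback:
            return HarnessResult(True, attempts, change, [])
    # Budget exhausted: hand the change and the open failures to a human.
    return HarnessResult(False, attempts, change, feedback)
```

The design choice worth noting is the attempt budget: the loop lets the agent fix what machines can detect, but it stops after a fixed number of rounds so that a change which keeps failing is escalated with its failure messages attached instead of retried indefinitely.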

Agent-readiness is the next platform-maturity tier

The established maturity models haven’t caught up. CNCF’s Platform Engineering Maturity Model assesses platforms across five aspects (investment, adoption, interfaces, operations, measurement); Microsoft’s capability model adds governance and provisioning. Neither yet names an agent-readiness tier. We think it is coming, and that it belongs beside the others: a platform’s maturity will increasingly be judged by how safely it can admit a non-human contributor. To be candid, this is a frontier: Thoughtworks still rates anchoring agents to a reference application as “Assess” and sandboxed execution as “Trial”. This is an emerging practice, not a settled playbook.

Agent-readiness is simply this: can you give a non-human contributor an identity, a context, a sandbox and a verdict — and is the only human step the verdict?
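That question can be turned into a simple check. The sketch below assumes a contribution path is described as a list of named steps, each marked automated or manual; the step names and the `readiness_gaps` function are hypothetical, and a real assessment would need far more nuance than four labels.

```python
from dataclasses import dataclass
from typing import List

# The four things the platform must hand a non-human contributor.
REQUIRED_STEPS = ("identity", "context", "sandbox", "verdict")


@dataclass(frozen=True)
class Step:
    name: str
    automated: bool  # False means a human has to do something


def readiness_gaps(steps: List[Step]) -> List[str]:
    """Return the reasons a path is not agent-ready; empty list means ready."""
    present = {step.name for step in steps}
    gaps = [f"missing: {name}" for name in REQUIRED_STEPS if name not in present]
    for step in steps:
        # The verdict is the one step that may stay with a human.
        if not step.automated and step.name != "verdict":
            gaps.append(f"manual step: {step.name}")
    return gaps
```

A path where a human still provisions the sandbox by hand, for example, would come back with a "manual step: sandbox" gap even though all four steps exist.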

But a platform is not a substitute for judgement

The temptation is to read all this as “buy the platform features and the agents will be safe”. DORA’s amplifier finding cuts the other way too: bolt agents onto a weak platform and you industrialise the weakness. An independent randomised trial by METR — not a vendor — found experienced developers were 19% slower using early-2025 AI tools on code they knew well, even while they felt faster. The tooling has moved on since; the lesson hasn’t. The platform and the practices around it, not the model, decide the outcome.

What to measure

Stop counting AI-generated lines of code. Measure the platform’s ability to admit non-human work safely: the share of repositories with anchored, current agent context; time to provision a clean ephemeral sandbox; the proportion of agent actions tied to an attributable identity; the pass rate of automated acceptance gates; and audit coverage of what agents actually did. These are platform metrics — and they are the ones that predict whether agents help or hurt.
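As a rough sketch of how those five measures might be computed, assume the platform can already export the raw counts and events. The record shape and field names below (`AgentAction`, `platform_metrics`, the metric keys) are assumptions made for illustration, not an established schema.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AgentAction:
    identity: Optional[str]  # None means the action could not be attributed
    audited: bool            # True if the action appears in the audit log


def share(hits: int, total: int) -> float:
    """Fraction in [0, 1]; NaN when there is nothing to measure yet."""
    return hits / total if total else float("nan")


def platform_metrics(
    repos_with_current_context: int,
    total_repos: int,
    sandbox_provision_seconds: List[float],
    actions: List[AgentAction],
    gate_results: List[bool],
) -> Dict[str, float]:
    return {
        # Share of repositories with anchored, current agent context.
        "context_coverage": share(repos_with_current_context, total_repos),
        # Typical time to provision a clean ephemeral sandbox.
        "median_sandbox_provision_s": (
            median(sandbox_provision_seconds)
            if sandbox_provision_seconds
            else float("nan")
        ),
        # Proportion of agent actions tied to an attributable identity.
        "attributed_action_share": share(
            sum(1 for a in actions if a.identity), len(actions)
        ),
        # Pass rate of automated acceptance gates.
        "gate_pass_rate": share(sum(gate_results), len(gate_results)),
        # Audit coverage of what agents actually did.
        "audit_coverage": share(sum(1 for a in actions if a.audited), len(actions)),
    }
```

The median is used for provisioning time here only because it is less sensitive to a single slow run than the mean; a percentile such as p95 may suit a team that cares more about the worst cases.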

Underneath all of it is the idea that runs through everything we build: intent has to survive the journey from a human’s head to production. The platform is where that intent becomes enforceable — where “how we build software here” stops being culture and becomes golden paths, guardrails and gates that a human or an agent inherits by default. That was always the promise of platform engineering. Agents just made it non-optional.