There is a feature factory hiding in plain sight. We learned, painfully, that measuring product teams by features shipped rewards motion over outcomes. Then we built platform teams and started measuring them by features shipped. Now AI arrives, and the instinct is to measure it by lines generated and pull requests opened. It is the same mistake in a new costume. The truer measure of a platform, and of AI, is not what it produces. It is the cognitive load it takes off the people doing the work.
Cognitive load is the binding constraint
Team Topologies makes the case plainly: cognitive load, the total mental effort being used in working memory, has a finite ceiling, and that ceiling applies to a team as a whole just as it does to a person. Exceed it and a team can no longer safely own or evolve the systems it has been handed; the team itself becomes the delivery bottleneck. That is why boundaries and architecture should be drawn to fit within a team’s available cognitive load, not the other way round. This is a practitioner framework rather than peer-reviewed science, and its appeal to Dunbar-style team-size limits is looser than it sounds, but the constraint is real to anyone who has watched a capable team drown in the surface area it was given.
A platform’s job is to remove load, not ship features
Team Topologies states it flatly: “the primary goal of all internal platforms is to reduce the cognitive load on its customers” — the developers using it. Martin Fowler puts it the same way: “the primary benefit of a platform is to reduce the cognitive load on stream-aligned teams.” That is the logic behind platform-as-product and the “thinnest viable platform” — build only as much as removes load, and no more. A platform measured by its feature count is therefore measuring the opposite of its purpose: every feature is surface area, and surface area is load.
A platform’s product is not features. It is the set of things a developer no longer has to think about.
Three kinds of load — only one deserves a developer’s attention
Cognitive-load theory (Sweller) splits the effort three ways: intrinsic load (the inherent difficulty of the problem), extraneous load (incidental mechanics: how do I deploy this, wire up that), and germane load (the high-value thinking that actually solves the customer’s problem). The job of a platform, and of good AI tooling, is to strip extraneous load, keep intrinsic load manageable with paved paths and sane defaults, and protect germane load, the only kind worth a senior engineer’s scarce attention. A golden path does this; an agent that absorbs boilerplate can too. The test is identical for both: did it free up thinking, or just add a new thing to learn?
Why output metrics mislead — the independent evidence
This is not only the platform crowd’s view. The DevEx research — Noda, Storey, Forsgren (a SPACE and DORA co-author) and Greiler, published in ACM Queue — distils developer productivity into three drivers: feedback loops, cognitive load and flow state, and argues measurement must shift “from what developers produce to how effectively they work.” It insists on combining workflow data with developers’ own perceptions, because the two diverge: a code review that looks fast in the data can still feel disruptive in the day. Counting features, pull requests or generated lines is precisely the output metric this research warns against.
Counting AI-generated lines is the feature factory in a new costume. Output was never the constraint.
Does AI actually lower cognitive load?
The promise is that AI removes toil — pure extraneous load. The evidence is thinner than the pitch, and we should be honest about it. DORA’s 2025 research frames AI as an amplifier: the returns come from the organisational system around the tool — quality platforms, small batches, version control — not the tool itself, and AI showed no measurable effect on friction or burnout. The widely-quoted “over 80% feel more productive” is a sentiment, not a cognitive-load measurement — and independent telemetry points the other way, with developers juggling more concurrent pull-request contexts and restarting work more often. That is the Acceptance Gap seen from inside the developer’s head: AI can remove the load of typing while adding the load of reviewing, verifying and context-switching. Whether it nets out to less load depends entirely on the platform around it.
The redistribution trap
There is a subtler failure even when load does fall. A platform can reduce a stream team’s load by absorbing it — moving the burden onto a central platform team rather than removing it from the system. The consuming team feels lighter; the organisation’s total load is unchanged, or worse, now concentrated in a few people and more fragile. “Reduced cognitive load” that is really “centralised cognitive load” is a metric lying to you. The honest question is always net, system-wide load — including the people running the platform and the ones reviewing the agents.
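The net, system-wide question can be made concrete with a little arithmetic. The sketch below is illustrative only: the team names, the before and after scores, and the scoring scale are all hypothetical, and it assumes self-reported load scores that can be compared across teams, which is itself a strong assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TeamLoad:
    team: str
    before: float  # hypothetical self-reported load score before the platform change
    after: float   # the same score after the change


def net_load_change(loads):
    """Sum of (after - before) across every team, platform team included."""
    return sum(t.after - t.before for t in loads)


def max_share(loads):
    """Largest share of the total 'after' load held by a single team."""
    total = sum(t.after for t in loads)
    return max(t.after for t in loads) / total if total else 0.0


# Invented numbers: two stream teams feel lighter, the platform team absorbs it.
loads = [
    TeamLoad("stream-a", before=8, after=5),
    TeamLoad("stream-b", before=7, after=5),
    TeamLoad("platform", before=4, after=9),
]

consumer_change = sum(t.after - t.before for t in loads if t.team != "platform")
print(consumer_change)         # -5: the consuming teams report less load
print(net_load_change(loads))  # 0: the system as a whole carries the same load
```

With these invented numbers the consumer-only view shows an improvement while the system-wide total is unchanged, and `max_share` rises because more of the load now sits with one team. That is the pattern the paragraph above warns about.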
What to measure instead
So retire the output dashboards (platform feature counts, AI pull requests merged, lines generated) and measure load directly, the way the DevEx research prescribes. Ask developers themselves for perceived load and for flow or focus time. Pair those reports with workflow signals such as time-to-onboard and time-to-first-safe-deploy. Read both across the consuming teams and across the platform and review teams. We are honest that there is no single validated “cognitive-load-removed” number yet; the field is early. The discipline is to measure load and effectiveness rather than reach for the output proxy because it happens to be easy to count.
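As a rough sketch of pairing perception with workflow data, the snippet below puts a survey score next to a workflow signal for each team and flags the cases where the two diverge. Everything in it is an assumption made for illustration: the 1 to 5 survey scale, the two thresholds, the `summarize` helper and the sample numbers. It is not a validated instrument.

```python
from statistics import median

# Hypothetical thresholds; calibrate against your own baseline.
HIGH_PERCEIVED_LOAD = 4.0   # on an assumed 1 (light) to 5 (overloaded) survey scale
FAST_FIRST_DEPLOY_DAYS = 5  # assumed cut-off for a "fast" first safe deploy


def summarize(team, perceived, first_deploy_days):
    """Pair a team's median survey score with its median time-to-first-safe-deploy."""
    p = median(perceived)
    d = median(first_deploy_days)
    return {
        "team": team,
        "perceived_load": p,
        "first_deploy_days": d,
        # Workflow looks fast, yet people report high load: the two views diverge.
        "diverges": p >= HIGH_PERCEIVED_LOAD and d <= FAST_FIRST_DEPLOY_DAYS,
    }


# Invented sample data for a consuming team and the platform team.
rows = [
    summarize("stream-a", [2, 3, 2, 3], [3, 4, 6]),
    summarize("platform", [4, 5, 4], [2, 3, 4]),
]

for row in rows:
    print(row)
```

A row where `diverges` is true is a prompt for a conversation rather than a verdict: the workflow data says things are fine, and the people doing the work disagree.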
The best capability a platform can ship — and the best thing an AI can do — is to let a good engineer not think about something that did not deserve their thinking. That is the number worth chasing. Everything else is the feature factory, counting motion and calling it progress.