Agentic Integration & MCP Experiments

Connecting an agent to a tool is easy. A Model Context Protocol server is a few hours of work; wiring an agent to a database, a calendar or an internal API is a tutorial-length task. The hard question is the one that surfaces the moment that connection works: now that the agent can act, what is it allowed to do, who decided that, can anyone reconstruct why it did what it did, and would you notice if it started doing the wrong thing? This is the question this incubation explores. Our conviction, formed by building these connections rather than theorising about them, is that agent access is not an integration problem at all. It is a boundary, permission, provenance and observability problem wearing an integration problem's clothes.

We are not AI researchers and we make no claim to a proven agentic track record - the field is months old and so is everyone's experience in it. What we bring is two decades of delivery and architecture habits applied to a new substrate: the same instincts that make a payments integration or a multi-tenant platform survive contact with production, pointed at agents. The patterns below are things we build and run in the lab, described generically. They are convictions under test, not finished products.

What we explore: coordination, policy, isolation

Three patterns recur across everything we build here. The first is a multi-agent coordination hub: a small piece of shared infrastructure that lets several agents - and the humans supervising them - exchange typed messages over project-scoped channels, with presence (who is active) and delivery telemetry that tracks each message through an explicit lifecycle from published to delivered to acknowledged. The discipline is deliberately boring: messages have shapes, channels have scopes, and nothing is fire-and-forget. We treat agent coordination the way we treat any distributed system, with contracts and back-pressure, not as a chat room.

The second is a policy engine that sits in front of tool calls and evaluates every one of them against an allow / deny / ask decision. Allow lets the call through. Deny blocks it. Ask suspends it and routes a confirmation to a human before anything irreversible happens. Crucially, this is a deterministic external layer - the agent does not get to mark its own homework.

The third is knowledge isolation: giving each agent or task its own shard or mode, a bounded context that prevents one agent's working knowledge from leaking into another's and contaminating its reasoning. We think of the three as facets of the same underlying problem: the boundary (isolation), the permission (policy), and provenance and observability (the hub's telemetry). Provenance in particular keeps appearing in our notes as the connective tissue that makes the other two auditable.
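A deterministic allow / deny / ask layer can be surprisingly small. The sketch below assumes ordered rules matched on tool-name globs with an optional predicate over the call's arguments, where the first match wins and nothing matching means deny. The rule format, the `PolicyEngine` and `Rule` names, and the tool names used to exercise it are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase
from typing import Any, Callable, Optional


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK = "ask"


@dataclass(frozen=True)
class Rule:
    tool: str                 # glob over tool names, e.g. "db.read_*" (hypothetical)
    decision: Decision
    # Optional predicate over the call's arguments; None means "always matches".
    when: Optional[Callable[[dict[str, Any]], bool]] = None


class PolicyEngine:
    """Same rules + same call -> same decision. The agent never sees or edits the rules."""

    def __init__(self, rules: list[Rule], default: Decision = Decision.DENY) -> None:
        self._rules = tuple(rules)   # frozen copy; evaluation order is rule order
        self._default = default      # assumed default-deny when no rule matches

    def evaluate(self, tool: str, args: dict[str, Any]) -> Decision:
        for rule in self._rules:
            if fnmatchcase(tool, rule.tool) and (rule.when is None or rule.when(args)):
                return rule.decision   # first matching rule wins
        return self._default
```

Determinism is the point: the same rule set and the same call always yield the same decision, and the agent has no way to edit the rules it is being judged by.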

Why this matters now: the standards have arrived

For a long time, agent integration was a single-vendor gamble. That changed quickly. In December 2025 the Linux Foundation announced the formation of the Agentic AI Foundation, a neutral home for open agent standards, anchored by founding project contributions including Anthropic's Model Context Protocol, Block's goose and OpenAI's AGENTS.md, with platinum founding members including AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, Microsoft and OpenAI. The substrate we are building on is now community-governed open infrastructure rather than a bet on one company's roadmap.

The protocols themselves now encode the patterns we had been reaching for independently. The Agent2Agent protocol, also hosted by the Linux Foundation, standardises exactly the coordination-hub primitives: messages carry a role and typed parts (text, files, structured JSON data), and tasks move through an explicit lifecycle, including submitted, working, input-required, completed, failed and canceled, over JSON-RPC with streaming status updates. That input-required state is the same human-in-the-loop pause that our policy engine's ask tier implements, and the published-to-delivered-to-acknowledged telemetry in our coordination hub turns out, gratifyingly, to be the same kind of explicit, correlatable lifecycle the standards are settling on.
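The overlap between the ask tier and input-required can be shown as a small state machine. This is a hedged sketch, not the A2A SDK: it models only the six states named above, the transition table is our own illustrative assumption rather than a quotation from the specification, and `gate_tool_call` and `human_answer` are hypothetical helpers.

```python
from enum import Enum


class TaskState(Enum):
    # The six states named in the text; the A2A specification defines further ones.
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"


# Illustrative transition table (an assumption, not quoted from the spec).
# Terminal states have no entry, so nothing can leave them.
ALLOWED: dict[TaskState, set[TaskState]] = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.FAILED, TaskState.CANCELED},
    TaskState.WORKING: {TaskState.INPUT_REQUIRED, TaskState.COMPLETED,
                        TaskState.FAILED, TaskState.CANCELED},
    TaskState.INPUT_REQUIRED: {TaskState.WORKING, TaskState.CANCELED},
}


class Task:
    def __init__(self, task_id: str) -> None:
        self.id = task_id
        self.states = [TaskState.SUBMITTED]   # full history, for correlation and audit

    @property
    def state(self) -> TaskState:
        return self.states[-1]

    def move(self, to: TaskState) -> None:
        if to not in ALLOWED.get(self.state, set()):
            raise ValueError(f"{self.id}: {self.state.value} -> {to.value} not allowed")
        self.states.append(to)


def gate_tool_call(task: Task, decision: str) -> None:
    """Map a policy decision ("allow" | "deny" | "ask") onto the task lifecycle."""
    if decision not in ("allow", "deny", "ask"):
        raise ValueError(f"unknown decision {decision!r}")
    if task.state is not TaskState.WORKING:
        raise ValueError("tool calls are only gated while the task is working")
    if decision == "ask":
        task.move(TaskState.INPUT_REQUIRED)   # pause until a human answers
    # "allow" and "deny" leave the task working; deny just means the call never runs.


def human_answer(task: Task, approved: bool) -> bool:
    """Resume the paused task either way; the caller runs the tool only if approved."""
    task.move(TaskState.WORKING)
    return approved
```

Keeping the full state history on the task, rather than only the current state, is what makes the lifecycle correlatable after the fact.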

Why a policy engine is not redundant

A fair challenge: if the protocols already say to keep a human in the loop, why build a policy engine? Because the specification uses SHOULD, not MUST. The MCP specification (version 2025-11-25) states that for trust and safety there SHOULD always be a human in the loop with the ability to deny tool invocations, that applications SHOULD present confirmation prompts, and that clients SHOULD show tool inputs to the user before calling the server. These are recommendations a busy implementer can quietly skip. A deterministic engine that enforces allow / deny / ask is therefore a real value-add precisely because the standard makes it optional.
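Turning those SHOULDs into something unconditional mostly means wrapping the tool call itself. The sketch below assumes a `decide` callback that returns "allow", "deny" or "ask" and a `confirm` callback that presents a prompt to a human; both, and `guarded_call` itself, are hypothetical. Two choices are deliberate. The tool inputs are always rendered into the prompt before the call, and any failure to get an explicit yes (a refusal, an error, an unrecognised decision) blocks the call.

```python
import json
from typing import Any, Callable

Decide = Callable[[str, dict[str, Any]], str]   # returns "allow" | "deny" | "ask"
Confirm = Callable[[str], bool]                 # shows a prompt, returns the human's answer


class ToolCallDenied(Exception):
    pass


def guarded_call(tool: str, args: dict[str, Any], fn: Callable[..., Any],
                 decide: Decide, confirm: Confirm) -> Any:
    decision = decide(tool, args)
    if decision == "allow":
        return fn(**args)
    if decision == "ask":
        # Always show the exact inputs before anything is called.
        rendered = json.dumps(args, indent=2, sort_keys=True)
        prompt = f"Agent wants to call {tool} with:\n{rendered}\nApprove?"
        try:
            approved = confirm(prompt)
        except Exception:
            approved = False          # fail closed: no answer counts as no
        if approved is True:
            return fn(**args)
        raise ToolCallDenied(f"{tool}: refused by human")
    # "deny" and any unrecognised decision both block the call.
    raise ToolCallDenied(f"{tool}: denied by policy ({decision})")
```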

The same specification sharpens the argument from the other direction. MCP lets tools advertise behaviour through annotations - read-only versus destructive hints - but it also warns that clients MUST consider tool annotations untrusted unless they come from trusted servers. So behaviour hints supplied by a tool can never be the security boundary; the boundary has to be enforced by a trusted layer outside the agent and outside the tool. This is not a fringe view. The OpenAI Agents SDK runs tool guardrails on every function-tool invocation, with input guardrails before and output guardrails after, and a tripwire that halts the run. In January 2026 Kong shipped MCP Tool ACLs that enforce per-tool allow/deny at the gateway with a default-deny posture - start with zero access, grant explicitly, new tools restricted automatically - and identity-based filtering so different consumers see different tool subsets. These are independent implementations of the same pattern, arrived at because the problem is real.
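Here is a hedged sketch of that trusted-layer stance, combining a default-deny, identity-scoped tool list with the MCP rule that annotations from untrusted servers are not to be relied on. `Gateway`, `Tool` and the `server/name` qualification are hypothetical, and this is not how Kong or any other product implements it; `readOnlyHint` is the MCP tool annotation name. The assumed policy is that a granted tool still requires a human unless a trusted server marks it read-only.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Tool:
    name: str
    server: str
    annotations: dict[str, Any] = field(default_factory=dict)  # e.g. {"readOnlyHint": True}

    @property
    def qualified(self) -> str:
        return f"{self.server}/{self.name}"   # avoid name collisions between servers


@dataclass
class Gateway:
    trusted_servers: set[str]
    grants: dict[str, set[str]] = field(default_factory=dict)  # identity -> granted tools

    def grant(self, identity: str, tool: Tool) -> None:
        self.grants.setdefault(identity, set()).add(tool.qualified)

    def visible_tools(self, identity: str, tools: list[Tool]) -> list[Tool]:
        # Default deny: an ungranted tool, including a newly added one, is not even listed.
        granted = self.grants.get(identity, set())
        return [t for t in tools if t.qualified in granted]

    def decide(self, identity: str, tool: Tool) -> str:
        if tool.qualified not in self.grants.get(identity, set()):
            return "deny"
        # Annotations are hints, and only a trusted server's hints are read at all.
        hints = tool.annotations if tool.server in self.trusted_servers else {}
        if hints.get("readOnlyHint") is True:
            return "allow"   # trusted, read-only: let it through
        return "ask"         # everything else that is granted still needs a human
```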

What we are learning

The deepest lesson so far is about scope. It is tempting to govern all agents the same way and call it a security model. Gartner's analysts argue the opposite: applying uniform governance across AI agents leads to failure, because organisations conflate an agent's ability to act with the scope of access it has been granted. They predict that by 2027, 40% of enterprises will demote or decommission autonomous agents because of governance gaps discovered only after a production incident. That is independent corroboration that per-tool-call policy and knowledge isolation are not over-engineering - they are the difference between an agent that survives production and one that gets pulled. It is the same gap we describe in The Governance-to-Value Ratio and The Acceptance Gap: capability is the easy half; the trust to let it run unsupervised is earned through boundaries, not promised in a demo.

The integration is the easy part. The moment an agent can act, you have not built a feature - you have granted authority. Everything hard after that is about bounding, recording and watching how that authority is used.

We are also learning that knowledge isolation is an architecture decision, not a configuration toggle. In our own lab work, giving each sub-agent or task a bounded context - its own working memory and a deliberately narrow view of the world, returning only a condensed summary to whatever called it - is what stops deep work in one place from contaminating reasoning elsewhere. That maps directly onto the shards-and-modes pattern we build, and it reframes isolation as the thing that makes multi-agent work tractable at all, rather than a safety afterthought.
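A bounded context can be sketched without any model at all. The sketch below assumes a hypothetical `Shard` type and hypothetical `open_shard` and `delegate` helpers. A sub-agent, stood in for by a plain function, sees only a read-only copy of the documents it was granted, keeps notes in a memory that never leaves the shard, and hands back a summary capped at an assumed character budget. Truncation is a crude stand-in for real condensation; the boundary, not the summarising, is what the example is about.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Callable, Mapping


@dataclass(frozen=True)
class Shard:
    """A bounded context: a narrow read-only view plus private working memory."""
    name: str
    view: Mapping[str, str]                           # only the documents this task may see
    memory: list[str] = field(default_factory=list)   # scratch notes, never returned


Worker = Callable[[str, Shard], str]   # (task, shard) -> summary; stands in for a sub-agent


def open_shard(name: str, corpus: dict[str, str], allowed: set[str]) -> Shard:
    # Copy only the granted keys and wrap them read-only, so the shard
    # can neither see nor mutate the rest of the corpus.
    return Shard(name, MappingProxyType({k: corpus[k] for k in allowed if k in corpus}))


def delegate(task: str, shard: Shard, worker: Worker, max_chars: int = 280) -> str:
    summary = worker(task, shard)
    # Only the condensed summary crosses the boundary; shard.memory stays behind.
    return summary[:max_chars]
```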

An honest note on the stage

This work is at Incubation. That word is load-bearing. It means we have built and run these patterns in the lab against real tasks, and we believe in them strongly enough to keep investing - but we have not earned the right to call any of them battle-tested, and we will not. The protocols are young, our experience with them is younger, and some of what we hold today will be wrong by next year. What we can promise is that we build in the open, name our uncertainty, and let evidence move our convictions. The coordination hub, the policy engine and the isolation model are where our thinking currently rests, not where it has stopped.

If there is a thread that ties this incubation to the rest of our work, it is the one we pull on in Event Contracts as the Coordination Layer for Mixed Human and Agent Teams: agents make event contracts the natural seam, because typed messages and lifecycle states are how machines and people stay in step without one quietly overriding the other. That is the direction we are pushing next - turning the coordination hub from a way agents talk into a shared, auditable contract that a human can read, trust and, when it matters, refuse.