The moment a team gets one agent working, two temptations appear. The first is to add more agents — a planner, a researcher, a critic, a whole org chart of them — on the theory that more must be smarter. The second is to take the human out of the loop and let it run. Both are dials you can turn, and both have a sensible default that is more modest than the hype suggests. Getting these two settings right is most of what separates an agent that helps from one that quietly causes trouble.
Default to one
Start with a single agent and add more only when you can name what the extra ones buy you. More agents do not add intelligence; they add communication — and communication between agents is where these systems break. A single capable agent with a good set of tools handles most tasks, and it has a decisive practical advantage: one line of reasoning to follow, one identity to govern, one place for things to go wrong. When you fan a task out across many agents, you also have to coordinate, secure and debug all of them. The burden of proof sits with “many,” not with “one.”
When many earns its keep
There are two situations where coordination is genuinely worth the cost. One is real parallelism — independent sub-tasks that can run at the same time, like searching ten sources at once, where wall-clock time matters and the pieces don’t depend on each other. The other is real specialisation — skills, contexts or permissions so different that one agent shouldn’t hold them all. Outside those, multi-agent usually adds failure modes without adding value, and the price is steep: a multi-agent setup can burn on the order of fifteen times the tokens of a single chat, small errors compound as they pass between agents, and the whole thing is harder to debug because no single trace tells you what happened. (The engineering detail of when coordination helps lives in our piece on one agent or many — the rule of thumb here is simply: reach for many only when you can name the parallelism or the isolation it buys.)
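The parallelism case can be made concrete with a minimal Python sketch, assuming each sub-task is an independent lookup. The `search_source` and `fan_out` names are hypothetical, and `asyncio.sleep` stands in for the latency of a real tool call or sub-agent. Because no sub-task waits on another, the ten run concurrently and finish in roughly the time of the slowest one, not the sum of all ten.

```python
import asyncio
import time

# Hypothetical stand-in for one independent sub-task (e.g. querying one source).
# A real system would call a search tool or a sub-agent here.
async def search_source(source: str, delay: float) -> str:
    await asyncio.sleep(delay)  # simulate network / model latency
    return f"results from {source}"

async def fan_out(sources: list[str], delay: float = 0.1) -> list[str]:
    # The sub-tasks don't depend on each other, so they can run concurrently.
    # asyncio.gather returns results in the same order as its inputs.
    return await asyncio.gather(*(search_source(s, delay) for s in sources))

if __name__ == "__main__":
    sources = [f"source-{i}" for i in range(10)]
    start = time.perf_counter()
    results = asyncio.run(fan_out(sources))
    elapsed = time.perf_counter() - start
    # Ten 0.1 s tasks take roughly 0.1 s of wall-clock time, not 1 s.
    print(len(results), f"{elapsed:.2f}s")
```

If the sub-tasks depended on each other, this shape would buy nothing: they would have to run in sequence, and you would still pay the coordination cost.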
How much leash: autonomy is a dial
The second dial is autonomy — how much the agent does without checking with you — and the key idea is that it is independent of how capable the agent is. A very capable agent can still be kept on a short leash, asking permission before each consequential step. A useful way to picture the range is the way we already think about self-driving cars, from fully hands-on to fully autonomous, with several stages in between:
| Setting | What the agent does | The human’s role |
|---|---|---|
| Operator | Nothing without you — you drive | You make every decision |
| Collaborator | Suggests and drafts; you act | You decide and execute |
| Consultant | Proposes a plan and waits | You approve before it acts |
| Approver | Acts, but pauses on the big moves | You sign off on consequential or irreversible steps |
| Observer | Runs on its own | You monitor and can step in |
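One way to see the dial as a setting separate from capability is to write it down. The sketch below is a hypothetical Python encoding of the table, not a standard API: `Autonomy`, `Gate`, `Action` and `gate` are illustrative names, and it assumes each action arrives already tagged as consequential or not.

```python
from dataclasses import dataclass
from enum import Enum

class Autonomy(Enum):
    OPERATOR = 1      # you drive; the agent does nothing on its own
    COLLABORATOR = 2  # the agent suggests and drafts; you act
    CONSULTANT = 3    # the agent proposes a plan and waits for approval
    APPROVER = 4      # the agent acts, but pauses on consequential steps
    OBSERVER = 5      # the agent runs on its own; you monitor

class Gate(Enum):
    HUMAN_ACTS = "human acts"      # the agent may only suggest
    ASK_APPROVAL = "ask approval"  # the agent must pause for sign-off
    PROCEED = "proceed"            # the agent may act without asking

@dataclass(frozen=True)
class Action:
    name: str
    consequential: bool  # spends money, changes a record, sends a message...

def gate(level: Autonomy, action: Action) -> Gate:
    # Nothing about the model's capability is an input here.
    if level in (Autonomy.OPERATOR, Autonomy.COLLABORATOR):
        return Gate.HUMAN_ACTS
    if level is Autonomy.CONSULTANT:
        return Gate.ASK_APPROVAL
    if level is Autonomy.APPROVER:
        return Gate.ASK_APPROVAL if action.consequential else Gate.PROCEED
    return Gate.PROCEED  # OBSERVER: you monitor and can step in

if __name__ == "__main__":
    refund = Action("issue refund", consequential=True)
    lookup = Action("look up order", consequential=False)
    print(gate(Autonomy.APPROVER, refund).value)  # ask approval
    print(gate(Autonomy.APPROVER, lookup).value)  # proceed
```

Turning the dial is a one-word change to `level`; how capable the agent is never enters the decision.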
The mechanism that keeps the dial where you want it is the human-in-the-loop checkpoint: the agent pauses before anything consequential or hard to reverse — spending money, changing a record, sending a message — and a person approves, edits, or rejects. The discipline is to put those checkpoints on exactly the actions you couldn’t comfortably undo, and to resist the pull to remove them just because the agent has been reliable lately.
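A human-in-the-loop checkpoint can be as small as a wrapper around the agent's actions. This is a minimal Python sketch under stated assumptions: `run_with_checkpoint`, `Action`, `Review` and the `execute` / `ask_human` hooks are hypothetical names, and it assumes each action has already been tagged as hard to undo or not. The person can approve, edit, or reject, matching the three outcomes described in the paragraph above.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional

@dataclass(frozen=True)
class Action:
    kind: str           # e.g. "payment", "update_record", "send_message"
    payload: dict
    hard_to_undo: bool  # spending money, changing a record, sending a message...

@dataclass(frozen=True)
class Review:
    decision: str                         # "approve", "edit" or "reject"
    edited_payload: Optional[dict] = None

# Hypothetical hooks: `execute` performs the action; `ask_human` puts it in
# front of a person (a UI prompt, a ticket, a chat message) and returns a Review.
Executor = Callable[[Action], str]
AskHuman = Callable[[Action], Review]

def run_with_checkpoint(action: Action, execute: Executor, ask_human: AskHuman) -> str:
    # Actions you could comfortably undo run straight through.
    if not action.hard_to_undo:
        return execute(action)
    # Everything else pauses for a person, however reliable the agent has been.
    review = ask_human(action)
    if review.decision == "approve":
        return execute(action)
    if review.decision == "edit":
        if review.edited_payload is None:
            raise ValueError("an edit must supply a revised payload")
        return execute(replace(action, payload=review.edited_payload))
    if review.decision == "reject":
        return f"rejected: {action.kind}"
    raise ValueError(f"unknown decision: {review.decision!r}")

if __name__ == "__main__":
    def execute(a: Action) -> str:
        return f"done: {a.kind} {a.payload}"

    refund = Action("payment", {"amount": 40}, hard_to_undo=True)
    # A person trims the amount before letting the agent proceed.
    print(run_with_checkpoint(refund, execute, lambda a: Review("edit", {"amount": 25})))
    # done: payment {'amount': 25}
```

The wrapper deliberately has no notion of the agent's recent track record, so a good run of approvals cannot quietly switch the checkpoint off.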
Where chatbot, copilot and agent sit
This dial also explains the familiar product words. A chatbot simply converses — it answers, it doesn’t act. A copilot or assistant helps a human who stays firmly in charge: every output passes through a person before it becomes an action — low autonomy by design. A true agent is goal-directed: it plans, uses tools, runs in a loop, and takes action on its own, pausing for a human only at the checkpoints you set. “Agentic” is therefore a spectrum, not a badge — and a lot of what is sold as an “agent” is really a copilot, which is often exactly what you want. (Telling the genuine article from the relabelled one is its own skill — see agent washing.)
The questions to ask aren’t “how many agents?” and “how autonomous?” but “what’s the fewest agents that can do this?” and “what’s the least the agent can do on its own and still be useful?” Start low on both dials and turn them up only for a reason you can name.
So what
Both dials have the same governing principle, and it is the through-line of this whole field: use the least that does the job. The fewest agents, the least autonomy, the humans kept where the consequences are. That is not timidity; it is how you get a system you can debug, afford, and defend. The impressive-looking choice — a swarm of fully autonomous agents — is usually the expensive, fragile one. The boring choice is usually the right one.
Frequently asked
- Is a multi-agent system better than a single agent?
- Usually not. More agents add coordination, not intelligence, and coordination is where these systems fail. A single capable agent with good tools handles most tasks and is far easier to debug, govern and afford. Multi-agent only earns its keep for genuinely parallel or genuinely specialised work — and it can cost roughly fifteen times the tokens of a single chat.
- What does “agent autonomy” mean?
- How much an agent does without checking with a human — a dial from “Operator” (you decide everything) through to “Observer” (it runs on its own), with stages like Approver (it acts but pauses on consequential moves) in between. Autonomy is independent of capability: a very capable agent can still be kept on a short leash.
- What’s the difference between a chatbot, a copilot and an agent?
- A chatbot converses but doesn’t act. A copilot/assistant helps a human who stays in charge — every output passes through a person before it becomes an action. A true agent is goal-directed: it plans, uses tools and acts in a loop on its own, pausing for a human only at set checkpoints. “Agentic” is a spectrum, and much that’s sold as an “agent” is really a copilot.
- How much autonomy should we give an AI agent?
- As little as will do the job. Set the dial deliberately and keep human-in-the-loop checkpoints on anything consequential or hard to reverse: spending, changing records, sending messages. Resist removing those checkpoints just because the agent has been reliable lately; they exist to catch the confident mistake on an action you can’t undo.