There is no best agent — only the right agent for one workflow, one data reality, and one risk appetite. This chapter is the design method: discovery, autonomy, architecture mapping, and a spec you can hand to a builder.
Discovery: ten questions before any code
Most failed agent projects were lost before engineering began — wrong workflow, fuzzy success criteria, or data that wasn't there. Run discovery as a working session with the people who do the job today, and leave with written answers to:
- What exact workflow, end to end? Walk a real example, not the org chart's version of it.
- How often does it run, and what does each run cost today in time and money?
- What does a failure cost — and is it reversible? (A wrong draft is cheap; a wrong payment is not.)
- Where does the knowledge live — systems, documents, or someone's head?
- Which systems must the agent read or write, and do APIs exist?
- What does 'done well' mean, measurably? This sentence becomes your eval rubric.
- Who reviews, who approves, who gets the escalations?
- What data may leave the building — and what must never? (Residency rules decide architecture.)
- What volume in 12 months if it works? Build for that, not for the demo.
- Who owns the agent after launch — its prompts, evals, and weekly flywheel?
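The frequency, cost and 12-month volume questions reduce to simple arithmetic that is worth writing down during the session. The following is a minimal sketch, not a costing method from this chapter: the class name, the field names, the flat cost-per-run assumption and every number are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class WorkflowBaseline:
    """Answers to the frequency and cost questions (all values hypothetical)."""
    runs_per_week: float
    minutes_per_run: float
    hourly_rate: float  # loaded cost of the person doing the work today

    def cost_per_run(self) -> float:
        # Time per run converted to hours, priced at the hourly rate.
        return self.minutes_per_run / 60 * self.hourly_rate

    def annual_cost(self, weeks_per_year: int = 52) -> float:
        return self.runs_per_week * weeks_per_year * self.cost_per_run()


def projected_annual_cost(baseline: WorkflowBaseline, growth_factor: float) -> float:
    """Annual cost at the 12-month volume, assuming cost per run stays flat."""
    return baseline.annual_cost() * growth_factor


# Hypothetical inputs: 300 runs a week, 12 minutes each, at 40 per hour.
baseline = WorkflowBaseline(runs_per_week=300, minutes_per_run=12, hourly_rate=40.0)
print(baseline.cost_per_run())                                # 8.0
print(baseline.annual_cost())                                 # 124800.0
print(projected_annual_cost(baseline, growth_factor=2.0))     # 249600.0
```

The projected figure, not today's, is the one to size the build against.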
Pick the autonomy level deliberately
Autonomy is a dial, not a binary, and the right setting comes from failure cost and trust earned — not ambition. Ship one level below where you think you belong, instrument everything, and earn your way up with eval evidence.
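The "one level below, then earn your way up" rule can be sketched as two small functions. The level names follow the ladder in Figure 12.1; the function names, the cap on irreversible work at Supervise, and the pass-bar parameter are assumptions made for illustration, not a policy this chapter prescribes.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    SCRIPTED = 0    # no model in the loop
    ASSIST = 1      # drafts and suggestions; human does the work
    APPROVE = 2     # agent acts after explicit human sign-off
    SUPERVISE = 3   # agent acts; human reviews samples and exceptions
    DELEGATE = 4    # agent owns the task; escalates by policy
    AUTONOMOUS = 5  # agent owns the outcome end to end


def launch_level(target: Autonomy, irreversible: bool) -> Autonomy:
    """Ship one level below the target; cap irreversible work at SUPERVISE (assumed cap)."""
    level = Autonomy(max(target - 1, Autonomy.SCRIPTED))
    if irreversible:
        level = min(level, Autonomy.SUPERVISE)
    return level


def next_level(current: Autonomy, eval_pass_rate: float, pass_bar: float) -> Autonomy:
    """Move up one rung only when eval evidence clears the bar (the bar is hypothetical)."""
    if eval_pass_rate >= pass_bar and current < Autonomy.AUTONOMOUS:
        return Autonomy(current + 1)
    return current


print(launch_level(Autonomy.DELEGATE, irreversible=False).name)   # SUPERVISE
print(launch_level(Autonomy.AUTONOMOUS, irreversible=True).name)  # SUPERVISE
print(next_level(Autonomy.APPROVE, eval_pass_rate=0.93, pass_bar=0.90).name)  # SUPERVISE
print(next_level(Autonomy.APPROVE, eval_pass_rate=0.80, pass_bar=0.90).name)  # APPROVE
```

Encoding the levels as an ordered enum keeps the promotion decision explicit and auditable: a level changes only through a function call that names the evidence.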
rising autonomy → rising blast radius → rising need for evals, budgets and audit
- L0 Scripted automation: no model in the loop
- L1 Assist: drafts & suggestions; human does the work
- L2 Approve: agent acts after explicit human sign-off
- L3 Supervise: agent acts; human reviews samples & exceptions
- L4 Delegate: agent owns the task; escalates by policy
- L5 Autonomous: agent owns the outcome end-to-end
Figure 12.1: The autonomy ladder. Most successful first deployments launch at L2-L3.
From answers to architecture
Discovery answers map almost mechanically onto the choices from Parts II and III:
| Discovery finding | Design consequence | Where |
| --- | --- | --- |
| Predictable process, steps known | workflow with LLM steps, not a free agent | Ch. 2 |
| Open-ended, branching, judgment-heavy | agent loop; add planning + reflection | Ch. 2 |
| Multiple systems to touch | MCP servers per system; typed tool contracts | Ch. 4 |
| Needs to remember users/cases over time | scoped memory layer + write policy | Ch. 5 |
| Pause for approvals; long-running | durable execution, checkpoints, HITL gates | Ch. 6 |
| Strict data residency | local/hybrid serving; self-hosted gateway & tracing | Ch. 7-8 |
| High volume, cost-sensitive | caching + cascade routing from day one | Ch. 9 |
| Irreversible or high-value actions | L2-L3 autonomy, approval gates, budgets | Ch. 10 |
| Quality disputes likely | eval suite + tracing before launch, not after | Ch. 11 |
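Because the mapping is close to mechanical, it can be kept as a plain lookup that turns discovery findings into a design checklist. In the sketch below the consequences and chapter references are copied from the mapping above; the short keys and the function name are hypothetical labels introduced for illustration.

```python
# Each hypothetical key maps to (design consequence, where it is covered).
DESIGN_MAP = {
    "predictable_process": ("workflow with LLM steps, not a free agent", "Ch. 2"),
    "open_ended": ("agent loop; add planning + reflection", "Ch. 2"),
    "multiple_systems": ("MCP servers per system; typed tool contracts", "Ch. 4"),
    "long_term_memory": ("scoped memory layer + write policy", "Ch. 5"),
    "approvals_long_running": ("durable execution, checkpoints, HITL gates", "Ch. 6"),
    "strict_residency": ("local/hybrid serving; self-hosted gateway & tracing", "Ch. 7-8"),
    "high_volume": ("caching + cascade routing from day one", "Ch. 9"),
    "irreversible_actions": ("L2-L3 autonomy, approval gates, budgets", "Ch. 10"),
    "quality_disputes": ("eval suite + tracing before launch, not after", "Ch. 11"),
}


def design_consequences(findings: list[str]) -> list[tuple[str, str]]:
    """Return the design consequence for each finding, in the order given."""
    unknown = [f for f in findings if f not in DESIGN_MAP]
    if unknown:
        # An unmapped finding means more discovery, not a silent default.
        raise KeyError(f"no mapping for findings: {unknown}")
    return [DESIGN_MAP[f] for f in findings]


for consequence, chapter in design_consequences(["multiple_systems", "high_volume"]):
    print(f"{chapter}: {consequence}")
```

Raising on an unknown finding is a deliberate choice in this sketch: a finding the table does not cover should send you back to discovery rather than fall through to a default architecture.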
Build, buy, or assemble
Buy a finished product when your workflow is genuinely commodity (generic meeting notes, first-line IT FAQ) and differentiation doesn't matter. Build on frameworks plus your own interfaces when the workflow is your business — your pricing logic, your service playbook, your data. The middle path, assembling vendor agents behind protocol seams (MCP for tools, A2A between agents), is increasingly the pragmatic default: buy the commodity edges, build the differentiating core. Whatever you choose, the evals, budgets and audit trail are always yours to own.
Worked spec — a real-estate lead qualifier
A brokerage receives hundreds of portal and WhatsApp enquiries weekly; its human agents waste hours on unqualified leads and respond slowly to good ones. Discovery says: high volume, modest failure cost (a misrouted lead), bilingual audience, CRM is the system of record, response speed is the KPI. The spec that falls out:
- Objective — respond to every enquiry in under 2 minutes, qualify against budget / area / timeline / financing, and book viewings for qualified leads.
- Autonomy — L3 — messages send automatically; pricing commitments and complaints escalate to a human within the same thread.
- Pattern — router + single agent loop; no multi-agent topology needed at this volume.
- Tools (via MCP) — CRM read/write, listings search, calendar booking, WhatsApp Business send — each schema-validated, send-rate budgeted.
- Memory — per-lead profile (facts + preferences) with 12-month decay; no cross-lead recall by policy.
- Models — budget model for classification and extraction; frontier model for negotiation-tone drafting; prompt caching on the listing-policy prefix.
- Evals — 40 labelled historical enquiries — qualification accuracy ≥ 90%, zero pricing commitments, Arabic quality spot-checked by a native speaker.
- Success metric — median response < 2 min; ≥ 25% more viewings booked per 100 enquiries within 8 weeks, at agreed cost per lead.
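The eval line in the spec above translates directly into a launch gate. The following is a minimal sketch under stated assumptions: the `EvalResult` fields and the `passes_eval` function are hypothetical names, the pricing-commitment flag is assumed to come from a reviewer or an automated check, and the Arabic spot-check stays a human step that this code does not model.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    expected_qualified: bool       # label from the historical enquiry
    predicted_qualified: bool      # the agent's decision on the same enquiry
    made_pricing_commitment: bool  # hypothetical flag set by a reviewer or check


def passes_eval(results: list[EvalResult], min_accuracy: float = 0.90) -> bool:
    """Pass only if accuracy meets the bar and no reply commits to a price."""
    if not results:
        return False  # an empty eval set is not evidence
    correct = sum(r.expected_qualified == r.predicted_qualified for r in results)
    accuracy = correct / len(results)
    commitments = sum(r.made_pricing_commitment for r in results)
    return accuracy >= min_accuracy and commitments == 0


# Illustrative run: 40 labelled enquiries, 37 classified correctly.
results = [EvalResult(True, True, False)] * 37 + [EvalResult(True, False, False)] * 3
print(passes_eval(results))  # True

# One pricing commitment fails the gate even though accuracy is higher.
flagged = results[:-1] + [EvalResult(True, True, True)]
print(passes_eval(flagged))  # False
```

Treating the pricing rule as a hard zero rather than folding it into the accuracy score mirrors the spec: a single commitment is a failure regardless of how well the agent qualifies.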
The one-page agent spec
One page, eight headings: Objective · Autonomy level · Pattern · Tools & data · Memory policy · Models & cost plan · Eval set & pass bar · Owner & escalation path. If you cannot fill all eight, you are not ready to build — you are ready for more discovery.
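The readiness test is easy to make mechanical. This sketch assumes the spec is kept as structured text with one field per heading; the class, field and function names are hypothetical, and the draft values are abbreviated from the lead-qualifier example.

```python
from dataclasses import dataclass, fields


@dataclass
class AgentSpec:
    """One field per heading of the one-page spec; an empty string means unanswered."""
    objective: str = ""
    autonomy_level: str = ""
    pattern: str = ""
    tools_and_data: str = ""
    memory_policy: str = ""
    models_and_cost_plan: str = ""
    eval_set_and_pass_bar: str = ""
    owner_and_escalation_path: str = ""


def missing_headings(spec: AgentSpec) -> list[str]:
    """Names of headings that are still blank or whitespace-only."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name).strip()]


def ready_to_build(spec: AgentSpec) -> bool:
    return not missing_headings(spec)


# A draft in which the owner has not been named yet (illustrative values).
draft = AgentSpec(
    objective="Respond to every enquiry in under 2 minutes and qualify it",
    autonomy_level="L3",
    pattern="router + single agent loop",
    tools_and_data="CRM read/write, listings search, calendar booking, WhatsApp send",
    memory_policy="per-lead profile, 12-month decay, no cross-lead recall",
    models_and_cost_plan="budget model for extraction, frontier model for drafting",
    eval_set_and_pass_bar="40 labelled enquiries, accuracy >= 90%",
)
print(missing_headings(draft))  # ['owner_and_escalation_path']
print(ready_to_build(draft))    # False
```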