Part IV: From Use Case to Agent

Chapter 12

Designing Custom Agents for Your Use Case

A discovery and architecture method for selecting use cases, setting autonomy, defining tools and memory, and writing an agent specification.

There is no best agent — only the right agent for one workflow, one data reality, and one risk appetite. This chapter is the design method: discovery, autonomy, architecture mapping, and a spec you can hand to a builder.

Discovery: ten questions before any code

Most failed agent projects were lost before engineering began — wrong workflow, fuzzy success criteria, or data that wasn't there. Run discovery as a working session with the people who do the job today, and leave with written answers to:

  • What exact workflow, end to end? Walk a real example, not the org chart's version of it.
  • How often does it run, and what does each run cost today in time and money?
  • What does a failure cost — and is it reversible? (A wrong draft is cheap; a wrong payment is not.)
  • Where does the knowledge live — systems, documents, or someone's head?
  • Which systems must the agent read or write, and do APIs exist?
  • What does 'done well' mean, measurably? This sentence becomes your eval rubric.
  • Who reviews, who approves, who gets the escalations?
  • What data may leave the building — and what must never? (Residency rules decide architecture.)
  • What volume in 12 months if it works? Build for that, not for the demo.
  • Who owns the agent after launch — its prompts, evals, and weekly flywheel?
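
One way to hold a team to "leave with written answers" is to record the ten answers in a small structure and treat discovery as unfinished while any is blank. The following is a minimal Python sketch; the short keys and the `unanswered` helper are illustrative names, not part of any framework.

```python
# Short keys for the ten discovery questions above (key names are illustrative).
DISCOVERY_QUESTIONS = {
    "workflow": "What exact workflow, end to end?",
    "frequency_and_cost": "How often does it run, and what does each run cost today?",
    "failure_cost": "What does a failure cost, and is it reversible?",
    "knowledge_location": "Where does the knowledge live?",
    "systems_and_apis": "Which systems must the agent read or write, and do APIs exist?",
    "done_well": "What does 'done well' mean, measurably?",
    "review_and_escalation": "Who reviews, who approves, who gets the escalations?",
    "data_boundaries": "What data may leave the building, and what must never?",
    "volume_12_months": "What volume in 12 months if it works?",
    "post_launch_owner": "Who owns the agent after launch?",
}


def unanswered(answers: dict[str, str]) -> list[str]:
    """Return the keys of questions that have no written answer yet."""
    return [key for key in DISCOVERY_QUESTIONS if not answers.get(key, "").strip()]


# A session that has only covered two questions so far.
answers = {
    "workflow": "Inbound enquiry -> qualification -> viewing booked",
    "failure_cost": "Misrouted lead; reversible",
}
missing = unanswered(answers)
print(f"{len(missing)} of {len(DISCOVERY_QUESTIONS)} questions still open")
```

Whitespace-only answers count as unanswered, which keeps placeholder entries from passing the check.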

Pick the autonomy level deliberately

Autonomy is a dial, not a binary, and the right setting comes from failure cost and trust earned — not ambition. Ship one level below where you think you belong, instrument everything, and earn your way up with eval evidence.
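
The "one level below" rule can be sketched in a few lines of Python. The level names follow the chapter's autonomy ladder (Figure 12.1); the `launch_level` function and its cap at L3 for irreversible actions are assumptions made for illustration, not a prescribed policy.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    """Levels of the autonomy ladder (Figure 12.1)."""
    SCRIPTED = 0    # no model in the loop
    ASSIST = 1      # drafts and suggestions; human does the work
    APPROVE = 2     # agent acts after explicit human sign-off
    SUPERVISE = 3   # agent acts; human reviews samples and exceptions
    DELEGATE = 4    # agent owns the task; escalates by policy
    AUTONOMOUS = 5  # agent owns the outcome end-to-end


def launch_level(target: Autonomy, irreversible: bool) -> Autonomy:
    """Pick a launch level one below the target, capped when failures are irreversible.

    The cap at SUPERVISE (L3) is an assumption for this sketch.
    """
    level = max(int(target) - 1, int(Autonomy.SCRIPTED))
    if irreversible:
        level = min(level, int(Autonomy.SUPERVISE))
    return Autonomy(level)


print(launch_level(Autonomy.DELEGATE, irreversible=False).name)   # SUPERVISE
print(launch_level(Autonomy.AUTONOMOUS, irreversible=True).name)  # SUPERVISE
```

Moving up a level would then be a deliberate change to `target`, justified by eval evidence rather than by ambition.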

rising autonomy → rising blast radius → rising need for evals, budgets and audit

  • L0 Scripted automation: no model in the loop
  • L1 Assist: drafts & suggestions; human does the work
  • L2 Approve: agent acts after explicit human sign-off
  • L3 Supervise: agent acts; human reviews samples & exceptions
  • L4 Delegate: agent owns the task; escalates by policy
  • L5 Autonomous: agent owns the outcome end-to-end

Figure 12.1. The autonomy ladder. Most successful first deployments launch at L2-L3.

From answers to architecture

Discovery answers map almost mechanically onto the choices from Parts II and III:

Discovery finding                        Design consequence                                   Where
Predictable process, steps known         workflow with LLM steps, not a free agent            Ch. 2
Open-ended, branching, judgment-heavy    agent loop; add planning + reflection                Ch. 2
Multiple systems to touch                MCP servers per system; typed tool contracts         Ch. 4
Needs to remember users/cases over time  scoped memory layer + write policy                   Ch. 5
Pause for approvals; long-running        durable execution, checkpoints, HITL gates           Ch. 6
Strict data residency                    local/hybrid serving; self-hosted gateway & tracing  Ch. 7-8
High volume, cost-sensitive              caching + cascade routing from day one               Ch. 9
Irreversible or high-value actions       L2-L3 autonomy, approval gates, budgets              Ch. 10
Quality disputes likely                  eval suite + tracing before launch, not after        Ch. 11
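
Because the mapping is close to mechanical, it can be held as a plain lookup table. The sketch below assumes hypothetical short keys for each discovery finding; the consequences and chapter references are copied from the table.

```python
# finding key -> (design consequence, chapter); keys are illustrative shorthand.
DESIGN_MAP = {
    "predictable_process": ("workflow with LLM steps, not a free agent", "Ch. 2"),
    "open_ended": ("agent loop; add planning + reflection", "Ch. 2"),
    "multiple_systems": ("MCP servers per system; typed tool contracts", "Ch. 4"),
    "needs_memory": ("scoped memory layer + write policy", "Ch. 5"),
    "approvals_or_long_running": ("durable execution, checkpoints, HITL gates", "Ch. 6"),
    "strict_residency": ("local/hybrid serving; self-hosted gateway & tracing", "Ch. 7-8"),
    "high_volume": ("caching + cascade routing from day one", "Ch. 9"),
    "irreversible_actions": ("L2-L3 autonomy, approval gates, budgets", "Ch. 10"),
    "quality_disputes": ("eval suite + tracing before launch, not after", "Ch. 11"),
}


def design_consequences(findings: list[str]) -> list[tuple[str, str]]:
    """Look up the consequence for each finding; reject keys not in the table."""
    unknown = [f for f in findings if f not in DESIGN_MAP]
    if unknown:
        raise KeyError(f"unknown findings: {unknown}")
    return [DESIGN_MAP[f] for f in findings]


for consequence, chapter in design_consequences(["high_volume", "multiple_systems"]):
    print(f"{chapter}: {consequence}")
```

Rejecting unknown keys is a deliberate choice here: a finding that does not fit the table is a prompt for a design conversation, not something to drop silently.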

Build, buy, or assemble

Buy a finished product when your workflow is genuinely commodity (generic meeting notes, first-line IT FAQ) and differentiation doesn't matter. Build on frameworks plus your own interfaces when the workflow is your business — your pricing logic, your service playbook, your data. The middle path, assembling vendor agents behind protocol seams (MCP for tools, A2A between agents), is increasingly the pragmatic default: buy the commodity edges, build the differentiating core. Whatever you choose, the evals, budgets and audit trail are always yours to own.
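
The three options reduce to a question asked per component: is this piece differentiating or commodity? A rough Python sketch of that reasoning follows; the `sourcing_strategy` function and the component names are hypothetical, and real decisions will weigh more than one boolean.

```python
def sourcing_strategy(components: dict[str, bool]) -> str:
    """Map {component name: is it differentiating?} to 'buy', 'build' or 'assemble'."""
    if not components:
        raise ValueError("list at least one component")
    differentiating = [name for name, is_core in components.items() if is_core]
    if not differentiating:
        return "buy"        # everything is commodity
    if len(differentiating) == len(components):
        return "build"      # the whole workflow is the business
    return "assemble"       # buy the commodity edges, build the core


print(sourcing_strategy({"meeting_notes": False, "pricing_logic": True}))
```

Whatever the function returns, the evals, budgets and audit trail stay in-house, so they are deliberately not inputs to it.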

Worked spec — a real-estate lead qualifier

A brokerage receives hundreds of portal and WhatsApp enquiries weekly; its human brokers waste hours on unqualified leads and respond slowly to good ones. Discovery says: high volume, modest failure cost (a misrouted lead), bilingual audience, CRM is the system of record, response speed is the KPI. The spec that falls out:

  • Objective — respond to every enquiry in under 2 minutes, qualify against budget / area / timeline / financing, and book viewings for qualified leads.
  • Autonomy — L3 — messages send automatically; pricing commitments and complaints escalate to a human within the same thread.
  • Pattern — router + single agent loop; no multi-agent topology needed at this volume.
  • Tools (via MCP) — CRM read/write, listings search, calendar booking, WhatsApp Business send — each schema-validated, send-rate budgeted.
  • Memory — per-lead profile (facts + preferences) with 12-month decay; no cross-lead recall by policy.
  • Models — budget model for classification and extraction; frontier model for negotiation-tone drafting; prompt caching on the listing-policy prefix.
  • Evals — 40 labelled historical enquiries — qualification accuracy ≥ 90%, zero pricing commitments, Arabic quality spot-checked by a native speaker.
  • Success metric — median response < 2 min; ≥ 25% more viewings booked per 100 enquiries within 8 weeks, at agreed cost per lead.
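
The automatable part of the Evals line can be expressed as a pass/fail gate. This is a sketch under stated assumptions: the `EvalResult` fields and `passes_gate` function are hypothetical, the pricing-commitment flag is assumed to come from a reviewer or a separate checker, and the Arabic spot-check remains a human step outside the code.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    expected_qualified: bool       # label on the historical enquiry
    predicted_qualified: bool      # the agent's qualification decision
    made_pricing_commitment: bool  # flagged by a reviewer or checker


def passes_gate(results: list[EvalResult], min_accuracy: float = 0.90) -> bool:
    """Pass only if accuracy meets the bar and no reply commits to a price."""
    if not results:
        return False
    correct = sum(r.expected_qualified == r.predicted_qualified for r in results)
    commitments = sum(r.made_pricing_commitment for r in results)
    return correct / len(results) >= min_accuracy and commitments == 0


# 40 enquiries, 37 judged correctly, no pricing commitments: 37/40 = 0.925.
results = [EvalResult(True, True, False)] * 37 + [EvalResult(True, False, False)] * 3
print(passes_gate(results))  # True
```

A single pricing commitment fails the gate regardless of accuracy, which mirrors the "zero pricing commitments" bar in the spec.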

The one-page agent spec

One page, eight headings: Objective · Autonomy level · Pattern · Tools & data · Memory policy · Models & cost plan · Eval set & pass bar · Owner & escalation path. If you cannot fill all eight, you are not ready to build; you are ready for more discovery.
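
The readiness test can be made literal: a spec object with the eight headings that refuses to render while any is blank. This is a minimal Python sketch; the `AgentSpec` class and its helpers are illustrative, and the sample values are abbreviated from the lead-qualifier example.

```python
from dataclasses import dataclass, fields


@dataclass
class AgentSpec:
    objective: str = ""
    autonomy_level: str = ""
    pattern: str = ""
    tools_and_data: str = ""
    memory_policy: str = ""
    models_and_cost_plan: str = ""
    eval_set_and_pass_bar: str = ""
    owner_and_escalation_path: str = ""


def missing_headings(spec: AgentSpec) -> list[str]:
    """Names of the headings that are still blank."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name).strip()]


def render(spec: AgentSpec) -> str:
    """Render the one-page spec, or refuse if any heading is unfilled."""
    missing = missing_headings(spec)
    if missing:
        raise ValueError(f"not ready to build; more discovery needed for: {missing}")
    return "\n".join(
        f"{f.name.replace('_', ' ').capitalize()}: {getattr(spec, f.name)}"
        for f in fields(spec)
    )


draft = AgentSpec(
    objective="Respond to every enquiry in under 2 minutes; qualify; book viewings",
    autonomy_level="L3",
    pattern="Router + single agent loop",
)
print(missing_headings(draft))
```

Calling `render(draft)` at this point raises `ValueError`, listing the five headings that still need answers.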