Scaling, Reliability & Safety Engineering | AI Agent Engineering Handbook

An agent that is 95% right per step is 60% right after ten steps. This chapter is about closing that gap, and making sure the failures that remain are cheap, contained, and recoverable.

The compounding-error problem

Reliability in agents is multiplicative. At 95% per-step accuracy, a ten-step run succeeds about 60% of the time (0.95^10 ≈ 0.60); at twenty steps it drops toward a coin flip. Production teams attack this from both sides: raise per-step accuracy (better tools, tighter prompts, structured outputs) and cut the number of unchecked steps (checkpoints, validations, early exits). Design for the math, not against it.

0.95^10 ≈ 60%

40%+

ten 95%-reliable steps, compounded
agentic projects Gartner expects cancelled by 2027
basic probability
Gartner, 2025

Scaling the boring, proven way

Agents scale like any other workload once you make them stateless: workers pull runs from a queue, every step reads and writes state in a store (the durable-execution pattern from Chapter 6), and any worker can resume any run. From there the standard toolkit applies, horizontal autoscaling, rate-limit-aware backpressure, and circuit breakers per provider. Three agent-specific additions matter:

Budgets on everything, max steps, max tokens, max wall-clock, max spend per run. A runaway loop should hit a wall in seconds, not show up on an invoice.
Idempotent tools, checkpoint IDs double as deduplication keys, so a retried step cannot send two refunds or two emails.
Graceful degradation, when a provider or tool fails, the agent should fall back, smaller model, cached answer, or a clean handoff to a human, rather than erroring out.

Security: prompt injection and the lethal trifecta

The defining security problem of agents is prompt injection: instructions hidden in content the agent reads, a web page, an email, a PDF, a tool result, that the model treats as commands. The highest-risk shape is what security researcher Simon Willison calls the lethal trifecta: one agent that combines access to private data, exposure to untrusted content, and

the ability to communicate externally. With all three, a poisoned input can exfiltrate whatever the agent can read. No reliable model-level fix exists as of 2026, so the answer is architectural: break the trifecta (does the email-reading agent really need outbound web access?), and layer defenses so no single failure is fatal.

Inputs
sanitise, tag untrusted content, strip secrets
Model & prompt
system-prompt hygiene, spotlighting untrusted spans
Tools
least privilege, allowlists, sandboxed execution, read-only by default
Outputs
schema validation, moderation, no raw HTML/SQL pass-through
Humans & limits
approval gates for irreversible acts; token, time and spend budgets

Figure 10.1. Defense in depth: every layer assumes the one above it can fail.

Treat all retrieved content as data, never instructions, tag or 'spotlight' untrusted spans so the model can tell them apart.
Least-privilege tools: read-only by default, allowlisted domains and tables, sandboxed code execution.
Schema-validate every tool call and every output; reject rather than repair on policy violations.
Human approval gates on irreversible or high-value actions, payments, deletions, external sends.
Log every step with inputs and outputs; an agent you cannot audit is an agent you cannot trust.

The capability budget

Write down, per agent: what it may read, what it may do, what it may spend, and who
approves the exceptions. If a capability is not on the list, the agent does not get it. Most
production incidents trace back to capabilities nobody remembered granting.

← Credit-Smart: Cost Optimization for Online APIs Evaluation & Observability →