Appendices

Chapter C

The Builder's Checklist

A production checklist for agent architecture, cost, safety, reliability, evaluation, launch, and operations.


Print this. Every line maps to a chapter; every production incident maps to a skipped line.

Before you build

One workflow chosen: painful, frequent, measurable (Ch. 12).
Baseline captured: today's cost, time, quality (Ch. 14).
One-page spec complete: all eight headings filled (Ch. 12).
Workflow-vs-agent decision made consciously (Ch. 1-2).
Autonomy level set one notch conservative (Ch. 12).

Architecture

Own interfaces wrap every framework and vendor call (Ch. 7).
Gateway in place; fallback chain configured; model versions pinned (Ch. 7).
Tools typed, least-privilege, idempotent; MCP seams where parts may swap (Ch. 4, 10).
Memory has scopes, a write policy, and decay (Ch. 5).
Long-running runs are durable: checkpoints plus resumability (Ch. 6).
Residency decided: cloud, local, or hybrid router (Ch. 8).
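
As an illustration of the gateway line above, here is a minimal sketch of a fallback chain over pinned model versions. It assumes each vendor call sits behind your own interface; the ModelRoute type, the model identifiers, and the two stand-in functions are hypothetical, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ModelRoute:
    # A pinned, dated identifier rather than a floating alias (names are hypothetical).
    model_id: str
    call: Callable[[str], str]


class AllRoutesFailed(RuntimeError):
    """Raised when every route in the chain has failed."""


def complete_with_fallback(prompt: str, routes: Sequence[ModelRoute]) -> tuple[str, str]:
    """Try each route in order; return (model_id, text) from the first that succeeds."""
    errors = []
    for route in routes:
        try:
            return route.model_id, route.call(prompt)
        except Exception as exc:  # a real gateway would catch narrower error types
            errors.append(f"{route.model_id}: {exc}")
    raise AllRoutesFailed("; ".join(errors))


def _primary(prompt: str) -> str:
    # Stand-in for a vendor outage.
    raise TimeoutError("primary timed out")


def _secondary(prompt: str) -> str:
    # Stand-in for a real vendor call.
    return f"echo: {prompt}"


ROUTES = [
    ModelRoute("vendor-a-large-2025-01-15", _primary),
    ModelRoute("vendor-b-medium-2025-02-01", _secondary),
]

used, text = complete_with_fallback("hello", ROUTES)
print(used, text)
```

Returning the model identifier alongside the text keeps the trace honest about which route actually served the request.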

Cost

Per-task cost tracing live before tuning (Ch. 9, 11).
Prompt caching on stable prefixes; prompts ordered stable-first (Ch. 9).
Cascade routing for high-volume paths; batch API for non-urgent jobs (Ch. 9).
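
Per-task cost tracing can start as simple arithmetic over logged token counts. The sketch below assumes every model call is logged with a task ID and token usage; the model names and per-million-token prices are placeholders, not real vendor rates.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; substitute your vendor's real rates.
PRICES = {
    "small-model": {"input": 0.50, "output": 1.50},
    "large-model": {"input": 5.00, "output": 15.00},
}


@dataclass(frozen=True)
class Call:
    task_id: str
    model: str
    input_tokens: int
    output_tokens: int


def call_cost(call: Call) -> float:
    """Cost of one model call in USD."""
    price = PRICES[call.model]
    total = call.input_tokens * price["input"] + call.output_tokens * price["output"]
    return total / 1_000_000


def cost_per_task(calls: list[Call]) -> dict[str, float]:
    """Sum call costs by task so spend is attributed to a unit of work."""
    totals: dict[str, float] = defaultdict(float)
    for call in calls:
        totals[call.task_id] += call_cost(call)
    return dict(totals)


calls = [
    Call("task-1", "small-model", 2_000, 500),
    Call("task-1", "large-model", 4_000, 1_000),
    Call("task-2", "small-model", 1_000, 200),
]
totals = cost_per_task(calls)
print(totals)
```

Grouping by task rather than by call is the point: tuning decisions such as caching or cascade routing are judged against what a finished task costs.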

Safety & reliability

Budgets on steps, tokens, time and spend per run (Ch. 10).
Untrusted content tagged as data; the lethal trifecta (private data, untrusted content, external communication) broken by design (Ch. 10).
Schema validation on every tool call and output (Ch. 10).
Human gates on irreversible or high-value actions (Ch. 10).
Kill-switch and rollback rehearsed (Ch. 11).
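
The budget line above can be sketched as one object that the agent loop charges on every step. This is a minimal illustration, assuming the loop reports tokens and spend per step; the RunBudget class and its limits are hypothetical.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a run goes past any of its limits."""


@dataclass
class RunBudget:
    max_steps: int
    max_tokens: int
    max_seconds: float
    max_spend_usd: float
    steps: int = 0
    tokens: int = 0
    spend_usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, tokens: int, spend_usd: float) -> None:
        """Record one step, then raise if any limit is now exceeded."""
        self.steps += 1
        self.tokens += tokens
        self.spend_usd += spend_usd
        elapsed = time.monotonic() - self.started_at
        checks = [
            ("steps", self.steps, self.max_steps),
            ("tokens", self.tokens, self.max_tokens),
            ("seconds", elapsed, self.max_seconds),
            ("spend_usd", self.spend_usd, self.max_spend_usd),
        ]
        for name, used, limit in checks:
            if used > limit:
                raise BudgetExceeded(f"{name} budget exceeded: {used} > {limit}")


budget = RunBudget(max_steps=3, max_tokens=10_000, max_seconds=60.0, max_spend_usd=0.50)
stopped_by = None
for _ in range(10):  # the loop asks for 10 steps; the budget allows 3
    try:
        budget.charge(tokens=1_000, spend_usd=0.01)
    except BudgetExceeded as exc:
        stopped_by = str(exc)
        break
print(stopped_by)
```

Checking all four limits in one place means a runaway loop stops on whichever limit it hits first, instead of relying on a single guard.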

Launch & operate

Eval suite ≥ 30 real cases with a pass bar wired into CI (Ch. 11).
Tracing on 100% of runs; dashboards for success, escalation, cost, p95 latency (Ch. 11).
Canary rollout vs. baseline; weekly trace-review booked (Ch. 11, 14).
Owner named for prompts, evals and the flywheel (Ch. 12).
Week-12 decision scheduled: scale, iterate, or stop (Ch. 14).
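
A pass bar wired into CI can be as small as one function that returns a verdict. The sketch below is an assumption-heavy illustration: the 0.9 bar, exact-match grading, the EvalCase type, and the toy agent are all hypothetical stand-ins for your own suite.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str


def gate(
    cases: Sequence[EvalCase],
    agent: Callable[[str], str],
    pass_bar: float = 0.9,  # assumed bar; set your own
    min_cases: int = 30,
) -> tuple[bool, float]:
    """Return (passed_gate, pass_rate); refuse to judge a suite that is too small."""
    if len(cases) < min_cases:
        raise ValueError(f"need at least {min_cases} cases, got {len(cases)}")
    # Exact-match grading is a simplification; real suites often need richer checks.
    passed = sum(1 for case in cases if agent(case.prompt).strip() == case.expected)
    rate = passed / len(cases)
    return rate >= pass_bar, rate


def toy_agent(prompt: str) -> str:
    # Stand-in for the real agent under test.
    return prompt.upper()


# 29 cases the toy agent gets right, plus one it gets wrong.
CASES = [EvalCase(f"case {i}", f"CASE {i}") for i in range(29)]
CASES.append(EvalCase("trick", "wrong"))

ok, rate = gate(CASES, toy_agent)
print(ok, round(rate, 3))
```

In a CI job, the script would exit non-zero when the gate fails, so a regression blocks the merge rather than reaching the canary.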
