Appendices

Chapter C

The Builder's Checklist

A production checklist for agent architecture, cost, safety, reliability, evaluation, launch, and operations.

Print this. Every line maps to a chapter; every production incident maps to a skipped line.

Before you build

  • One workflow chosen — painful, frequent, measurable (Ch. 12).
  • Baseline captured: today's cost, time, quality (Ch. 14).
  • One-page spec complete: all eight headings filled (Ch. 12).
  • Workflow-vs-agent decision made consciously (Ch. 1–2).
  • Autonomy level set one notch conservative (Ch. 12).

Architecture

  • Own interfaces wrap every framework and vendor call (Ch. 7).
  • Gateway in place; fallback chain configured; model versions pinned (Ch. 7).
  • Tools typed, least-privilege, idempotent; MCP seams where parts may swap (Ch. 4, 10).
  • Memory has scopes, a write policy, and decay (Ch. 5).
  • Long-running jobs are durable: checkpointed and resumable (Ch. 6).
  • Residency decided: cloud, local, or hybrid router (Ch. 8).
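
The "own interfaces" and "fallback chain" items above can be sketched together in a few lines. This is a minimal illustration, not a gateway implementation: `Route`, `complete`, `AllRoutesFailed`, and the model ID strings are hypothetical names, and each `call` stands in for whatever vendor SDK sits behind your own interface.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical interface: any callable that maps a prompt to text.
# Vendor SDK calls are wrapped to fit this shape, so they can be swapped.
ModelCall = Callable[[str], str]


@dataclass(frozen=True)
class Route:
    model_id: str  # a pinned version string (illustrative, not a real model name)
    call: ModelCall


class AllRoutesFailed(RuntimeError):
    """Raised when every route in the chain has failed."""


def complete(prompt: str, chain: Sequence[Route]) -> tuple[str, str]:
    """Try each route in order; return (model_id, text) from the first success."""
    errors = []
    for route in chain:
        try:
            return route.model_id, route.call(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{route.model_id}: {exc}")
    raise AllRoutesFailed("; ".join(errors))
```

Returning the model ID alongside the text keeps traces honest about which pinned version actually answered.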

Cost

  • Per-task cost tracing live before tuning (Ch. 9, 11).
  • Prompt caching on stable prefixes; prompts ordered stable-first (Ch. 9).
  • Cascade routing for high-volume paths; batch API for non-urgent jobs (Ch. 9).
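
One way to read the tracing and cascade items together: try the cheapest tier first, escalate only when confidence is low, and record what each task cost along the way. The sketch below assumes each tier returns a confidence score alongside its answer; `Tier`, `TaskTrace`, `cascade`, the flat per-call costs, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass
class Tier:
    name: str
    cost_per_call: float  # illustrative flat cost, not real pricing
    call: Callable[[str], tuple[str, float]]  # returns (answer, confidence in [0, 1])


@dataclass
class TaskTrace:
    task_id: str
    calls: list = field(default_factory=list)  # (tier name, cost) per call made

    @property
    def total_cost(self) -> float:
        return sum(cost for _, cost in self.calls)


def cascade(task_id: str, prompt: str, tiers: Sequence[Tier],
            min_confidence: float = 0.8) -> tuple[str, TaskTrace]:
    """Walk tiers cheapest-first; stop at the first confident answer."""
    trace = TaskTrace(task_id)
    answer = ""
    for tier in tiers:
        answer, confidence = tier.call(prompt)
        trace.calls.append((tier.name, tier.cost_per_call))
        if confidence >= min_confidence:
            break
    # If no tier is confident, the last tier's answer is returned as-is.
    return answer, trace
```

Because the trace is attached to a task ID, the cost of escalation shows up per task rather than being averaged away in a monthly bill.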

Safety & reliability

  • Budgets on steps, tokens, time, and spend per run (Ch. 10).
  • Untrusted content tagged as data; trifecta broken by design (Ch. 10).
  • Schema validation on every tool call and output (Ch. 10).
  • Human gates on irreversible or high-value actions (Ch. 10).
  • Kill-switch and rollback rehearsed (Ch. 11).
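
The budget item is the easiest of these to make concrete. As a rough sketch, assuming the agent loop reports token use and spend after each step, a single object can enforce all four limits; `RunBudget`, `BudgetExceeded`, and `charge` are hypothetical names.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a run crosses any of its limits."""


@dataclass
class RunBudget:
    # Limits are hypothetical; set them per workflow.
    max_steps: int
    max_tokens: int
    max_seconds: float
    max_spend: float
    steps: int = 0
    tokens: int = 0
    spend: float = 0.0
    started: float = field(default_factory=time.monotonic)

    def charge(self, tokens: int, spend: float) -> None:
        """Record one step, then stop the run if any limit is crossed."""
        self.steps += 1
        self.tokens += tokens
        self.spend += spend
        elapsed = time.monotonic() - self.started
        usage = {
            "steps": (self.steps, self.max_steps),
            "tokens": (self.tokens, self.max_tokens),
            "seconds": (elapsed, self.max_seconds),
            "spend": (self.spend, self.max_spend),
        }
        for name, (used, limit) in usage.items():
            if used > limit:
                raise BudgetExceeded(f"{name} budget exceeded: {used} > {limit}")
```

Calling `charge` once per step inside the agent loop means a runaway run fails loudly at the first crossed limit instead of surfacing later as an invoice.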

Launch & operate

  • Eval suite ≥ 30 real cases with a pass bar wired into CI (Ch. 11).
  • Tracing on 100% of runs; dashboards for success, escalation, cost, p95 (Ch. 11).
  • Canary rollout vs. baseline; weekly trace-review booked (Ch. 11, 14).
  • Owner named for prompts, evals, and the flywheel (Ch. 12).
  • Week-12 decision scheduled: scale, iterate, or stop (Ch. 14).
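
The eval item above can be wired into CI as a simple gate. The sketch below is illustrative only: it assumes exact-match grading, which real suites often replace with rubric or model-based graders, and the 0.9 pass bar, `pass_rate`, and `gate` are hypothetical. The 30-case minimum is the one from the checklist.

```python
from typing import Callable, Sequence

MIN_CASES = 30  # minimum suite size, from the checklist
PASS_BAR = 0.9  # hypothetical threshold; choose one per workflow

Case = tuple[str, str]  # (prompt, expected output)


def pass_rate(cases: Sequence[Case], agent: Callable[[str], str]) -> float:
    """Fraction of cases where the agent output matches exactly."""
    passed = sum(1 for prompt, expected in cases if agent(prompt) == expected)
    return passed / len(cases)


def gate(cases: Sequence[Case], agent: Callable[[str], str],
         pass_bar: float = PASS_BAR, min_cases: int = MIN_CASES) -> bool:
    """Return True if the suite is large enough and meets the pass bar."""
    if len(cases) < min_cases:
        raise ValueError(f"eval suite has {len(cases)} cases; need at least {min_cases}")
    return pass_rate(cases, agent) >= pass_bar
```

In a CI job, the script would exit non-zero when `gate` returns `False`, so a prompt or model change that drops below the bar blocks the merge.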