Print this. Every line maps to a chapter; every production incident maps to a skipped line.
Before you build
- One workflow chosen — painful, frequent, measurable (Ch. 12).
- Baseline captured: today's cost, time, quality (Ch. 14).
- One-page spec complete: all eight headings filled (Ch. 12).
- Workflow-vs-agent decision made consciously (Ch. 1-2).
- Autonomy level set one notch conservative (Ch. 12).
Architecture
- Own interfaces wrap every framework and vendor call (Ch. 7).
- Gateway in place; fallback chain configured; model versions pinned (Ch. 7).
- Tools typed, least-privilege, idempotent; MCP seams where parts may swap (Ch. 4, 10).
- Memory has scopes, a write policy, and decay (Ch. 5).
- Long-running runs are durable: checkpoints + resumability (Ch. 6).
- Residency decided: cloud, local, or hybrid router (Ch. 8).
Cost
- Per-task cost tracing live before tuning (Ch. 9, 11).
- Prompt caching on stable prefixes; prompts ordered stable-first (Ch. 9).
- Cascade routing for high-volume paths; batch API for non-urgent jobs (Ch. 9).
Safety & reliability
- Budgets on steps, tokens, time and spend per run (Ch. 10).
- Untrusted content tagged as data; trifecta broken by design (Ch. 10).
- Schema validation on every tool call and output (Ch. 10).
- Human gates on irreversible or high-value actions (Ch. 10).
- Kill-switch and rollback rehearsed (Ch. 11).
Launch & operate
- Eval suite ≥ 30 real cases with a pass bar wired into CI (Ch. 11).
- Tracing on 100% of runs; dashboards for success, escalation, cost, p95 (Ch. 11).
- Canary rollout vs. baseline; weekly trace-review booked (Ch. 11, 14).
- Owner named for prompts, evals and the flywheel (Ch. 12).
- Week-12 decision scheduled: scale, iterate, or stop (Ch. 14).