What Is an AI Agent (and What Isn't) | AI Agent Engineering Handbook

Working definitions, the anatomy of an agent, the control spectrum, and the discipline of knowing when a plain workflow beats an agent.

A working definition

Strip away the marketing and an AI agent is a simple thing: a system in which a large language model directs its own process toward a goal. The model decides which steps to take, which tools to call, how to interpret the results, and when the job is done. Given the goal 'refund this customer if the order qualifies', an agent reads the policy, queries the order system, weighs the evidence, executes the refund through an API, and writes the case note, choosing that sequence itself. Contrast this with the two things agents are most often confused with. A chatbot converses but does not act: it produces text, and a human does the rest. A workflow acts but does not decide: every branch was written by a developer in advance, and the model, if there is one, fills in a single step, like summarizing or classifying. Anthropic's engineering teams drew this line crisply in their widely cited guidance: workflows run on predefined code paths, while agents dynamically direct their own tool use. The distinction matters because it predicts cost, reliability, and how the system fails.

Scripted workflow: code decides every step
LLM-in-the-loop: model fills one step
Guided agent: model picks tools on rails
Autonomous agent: model owns the path

The spectrum runs from predictable, cheap, and rigid (scripted workflow) to flexible, costlier, and in need of guardrails (autonomous agent).

Figure 1.1. The control spectrum. Most production systems live in the middle, not at the edges.
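
To make the working definition concrete, here is a minimal sketch of the refund example as an agent loop. Everything in it is a hypothetical stand-in: `POLICY` and `ORDERS` replace real systems, the four tools are stubs, and `stub_model` is a hand-written function where a real agent would make an LLM call. The point is only the shape of the control flow: the chooser, not the surrounding code, picks the next step.

```python
# Hypothetical in-memory stand-ins for the policy store and the order system.
POLICY = {"refund_window_days": 30}
ORDERS = {
    "A-1001": {"days_since_delivery": 12, "amount": 49.00},
    "B-2002": {"days_since_delivery": 45, "amount": 120.00},
}

def read_policy(state):
    return POLICY

def query_order(state):
    return ORDERS[state["order_id"]]

def issue_refund(state):
    # A real tool would call a payments API here.
    return {"refunded": state["query_order"]["amount"]}

def write_case_note(state):
    outcome = "refunded" if "issue_refund" in state else "declined"
    return f"Order {state['order_id']}: {outcome}"

TOOLS = {
    "read_policy": read_policy,
    "query_order": query_order,
    "issue_refund": issue_refund,
    "write_case_note": write_case_note,
}

def stub_model(goal, state):
    """Stands in for an LLM call: returns the next tool name, or 'done'."""
    if "read_policy" not in state:
        return "read_policy"
    if "query_order" not in state:
        return "query_order"
    window = state["read_policy"]["refund_window_days"]
    qualifies = state["query_order"]["days_since_delivery"] <= window
    if qualifies and "issue_refund" not in state:
        return "issue_refund"
    if "write_case_note" not in state:
        return "write_case_note"
    return "done"

def run_agent(goal, order_id, choose_next=stub_model, max_steps=10):
    """Agent loop: the chooser picks each step; the loop only executes it."""
    state = {"order_id": order_id}
    trace = []
    for _ in range(max_steps):  # hard step budget so the loop always ends
        action = choose_next(goal, state)
        if action == "done":
            break
        state[action] = TOOLS[action](state)  # store the observation
        trace.append(action)
    return trace, state

goal = "refund this customer if the order qualifies"
print(run_agent(goal, "A-1001")[0])
print(run_agent(goal, "B-2002")[0])
```

The two orders produce different traces: the second falls outside the refund window, so the refund step never runs. In a scripted workflow that branch would be an `if` written by the developer; here it is a decision made by whatever sits behind `choose_next`.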

The anatomy of an agent

Every agent, in any framework, is assembled from the same six components around a model core. Frameworks differ in how much of this they hand you pre-built; the architecture underneath barely changes.

LLM core: reasons, plans, decides
Planning loop: decompose, decide, retry
Tools / actions: APIs, search, code, MCP
Memory: short-term + long-term
Knowledge: RAG, files, databases
Guardrails: permissions, limits, HITL
Environment: users, systems, the world

Figure 1.2. The anatomy of an agent: six components around a reasoning core.

LLM core: the reasoning engine. Model choice sets the ceiling on capability and the floor on cost.
Planning loop: the control flow that turns one model call into a sequence (decompose, decide, act, check, retry).
Tools: typed functions the model may call, such as search, databases, CRMs, code execution, and other agents.
Memory: what persists, both the working context of this task and long-term knowledge across sessions (Chapter 5).
Knowledge: reference material the agent retrieves rather than memorizes, such as documents, RAG indexes, and wikis.
Guardrails: permissions, budgets, output validation, and human approval gates that bound what 'autonomous' means.
Environment: the users, systems, and world the agent operates in.
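
The components listed above can be wired together roughly as follows. This is a sketch under stated assumptions, not any framework's API: `Agent`, `Guardrails`, `scripted_core`, and the tool names are all hypothetical, and the core is a fixed script standing in for a model call. It shows where each component sits and how guardrails bound the loop.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Guardrails:
    max_steps: int = 5                       # budget on loop iterations
    allowed_tools: frozenset = frozenset()   # permissions
    needs_approval: frozenset = frozenset()  # human approval gates

@dataclass
class Agent:
    core: Callable[[str, list], tuple]  # LLM core: (goal, memory) -> (tool, arg)
    tools: dict                         # tool name -> callable taking one argument
    knowledge: dict                     # reference material retrieved on demand
    guardrails: Guardrails
    memory: list = field(default_factory=list)  # short-term working context

    def __post_init__(self):
        # Expose the knowledge store through an ordinary retrieval tool.
        lookup = lambda key: self.knowledge.get(key, "not found")
        self.tools = {"lookup": lookup, **self.tools}

    def run(self, goal, approve=lambda tool, arg: False):
        """Planning loop: ask the core, check guardrails, act, remember."""
        for _ in range(self.guardrails.max_steps):
            tool, arg = self.core(goal, self.memory)
            if tool == "finish":
                return arg
            if tool not in self.guardrails.allowed_tools:
                observation = f"blocked: {tool} is not permitted"
            elif tool in self.guardrails.needs_approval and not approve(tool, arg):
                observation = f"blocked: {tool} needs human approval"
            else:
                observation = self.tools[tool](arg)
            self.memory.append((tool, arg, observation))
        return "stopped: step budget exhausted"

def scripted_core(goal, memory):
    """Stub for a model call: replays a fixed script, one entry per step."""
    script = [
        ("lookup", "refund_window_days"),
        ("delete_account", "42"),          # not in the allowlist
        ("send_email", "Refund approved"), # gated behind approval
        ("finish", "done"),
    ]
    return script[len(memory)]

agent = Agent(
    core=scripted_core,
    tools={"send_email": lambda text: f"sent: {text}"},
    knowledge={"refund_window_days": "30"},
    guardrails=Guardrails(
        allowed_tools=frozenset({"lookup", "send_email"}),
        needs_approval=frozenset({"send_email"}),
    ),
)
result = agent.run("tell the customer about their refund")
for step in agent.memory:
    print(step)
```

The `approve` callback and the tool callables are the agent's only contact with the environment in this sketch. By default nothing is approved, so the email step is recorded as blocked; passing `approve=lambda tool, arg: True` lets it through.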

When you should not build an agent

The most expensive mistake in this field is reaching for an agent when a workflow would do. Autonomy costs tokens (the model re-reads context every loop), latency (each decision is an inference), and predictability (the path varies run to run). A useful test before any build:

Can you draw the flowchart? If a competent operator follows the same steps every time, encode those steps. Use the model only for the fuzzy boxes: extraction, judgment, language.
Is the variance real? Agents earn their cost when inputs are genuinely unpredictable: ambiguous requests, changing environments, tasks where the next step depends on what was just discovered.
Can you afford the failure? An agent that is 95% right per step is roughly 60% right across ten steps (0.95^10 ≈ 0.60, assuming steps fail independently). If errors are costly and hard to detect, keep humans or deterministic checks in the loop.
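
The arithmetic behind the third question is simple compounding. The sketch below assumes every step succeeds independently with the same probability, which real agents only approximate; the function names are illustrative.

```python
def end_to_end_success(per_step: float, steps: int) -> float:
    """Probability that every one of `steps` independent steps succeeds."""
    return per_step ** steps

def max_steps_for_target(per_step: float, target: float) -> int:
    """Largest step count that keeps end-to-end success at or above `target`."""
    if not (0 < per_step < 1 and 0 < target <= 1):
        raise ValueError("expected 0 < per_step < 1 and 0 < target <= 1")
    steps = 0
    while per_step ** (steps + 1) >= target:
        steps += 1
    return steps

print(round(end_to_end_success(0.95, 10), 3))  # 0.599
print(max_steps_for_target(0.95, 0.90))        # 2
print(max_steps_for_target(0.99, 0.90))        # 10
```

Under these assumptions, a 95%-per-step agent can take only two unchecked steps before end-to-end success drops below 90%, while a 99%-per-step agent can take ten. That gap is why deterministic checks between steps pay for themselves.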

Field example: invoice intake vs. itinerary rescue

A document-intake pipeline (receive invoice, extract fields, validate against the PO, post to the ERP) is a workflow with one or two LLM steps; build it that way and it will be cheaper and more reliable. Re-planning a disrupted travel itinerary, where the next call depends on what the airline, hotel, and customer each say, is a genuine agent problem.
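
A minimal sketch of the invoice-intake side, to show what "a workflow with one or two LLM steps" looks like in code. All names, the `PURCHASE_ORDERS` table, and the list used as an ERP ledger are hypothetical. The extraction step is a plain parser here; in a real pipeline it is the one place a model call would go.

```python
from dataclasses import dataclass

@dataclass
class Invoice:
    po_number: str
    vendor: str
    total: float

# Hypothetical purchase-order records; a real system would query the ERP.
PURCHASE_ORDERS = {"PO-7731": {"vendor": "Acme Supply", "total": 1250.00}}

def extract_fields(raw_text: str) -> Invoice:
    """The one fuzzy step. This stub parses 'key: value' lines; a model
    call would replace it when invoices arrive as free-form documents."""
    pairs = (line.split(":", 1) for line in raw_text.strip().splitlines())
    fields = {key.strip().lower(): value.strip() for key, value in pairs}
    return Invoice(fields["po"], fields["vendor"], float(fields["total"]))

def validate_against_po(invoice: Invoice) -> list:
    """Deterministic check: returns a list of problems, empty if none."""
    po = PURCHASE_ORDERS.get(invoice.po_number)
    if po is None:
        return [f"unknown PO {invoice.po_number}"]
    problems = []
    if po["vendor"] != invoice.vendor:
        problems.append("vendor mismatch")
    if abs(po["total"] - invoice.total) > 0.01:
        problems.append("total mismatch")
    return problems

def intake(raw_text: str, ledger: list) -> str:
    """The whole path is fixed in code: extract, validate, post."""
    invoice = extract_fields(raw_text)
    problems = validate_against_po(invoice)
    if problems:
        return "rejected: " + "; ".join(problems)
    ledger.append(invoice)  # stand-in for posting to the ERP
    return "posted"

ledger = []
print(intake("PO: PO-7731\nVendor: Acme Supply\nTotal: 1250.00", ledger))
print(intake("PO: PO-7731\nVendor: Acme Supply\nTotal: 1300.00", ledger))
```

Every branch is visible in `intake`, so the same input always takes the same path, and a failure points to a specific step. The itinerary-rescue case has no such fixed path to write down, which is what makes it an agent problem.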

Why now: the 2026 inflection

Agents stopped being a research topic because three curves crossed. Models became reliable enough at tool selection to chain dozens of steps. A standard protocol layer (MCP, Chapter 4) collapsed the integration cost of connecting them to real systems. And the economics improved by an order of magnitude through caching, routing, and small models (Chapter 9). The result shows up in adoption research: Google Cloud and KPMG studies from late 2025 found roughly half of surveyed enterprises running agents in production, with about three quarters of those reporting measurable ROI within the first year. Gartner projects that 40% of enterprise applications will embed task-specific agents by the end of 2026, up from under 5% a year earlier, while also warning that a large share of agentic projects will be cancelled for unclear ROI. Both can be true. The difference between the two outcomes is engineering discipline, which is what the rest of this book is about.

52% of enterprises had agents in production by late 2025 (Google Cloud / KPMG research)

74% of deployers report ROI within the first year (same studies)

40% of enterprise apps to embed agents by end-2026 (Gartner forecast)
