Glossary of Agent Engineering Terms | AI Agent Engineering Handbook

Working definitions as used in this book.

A2A (Agent2Agent), open protocol for discovery and delegation between independent agents; Agent Cards describe capabilities.
Agent, an LLM that directs its own tool use in a loop toward a goal, within budgets and policies.
Agentic RAG, retrieval where the agent decides what to fetch, when, and whether to search again.
Cascade / routing, sending each task to the cheapest model likely to succeed and escalating to a stronger model when confidence is low.
Checkpoint, persisted run state allowing pause, resume, retry, and human approval gates.
Context engineering, deciding what enters the model's context window at each step: compaction, scratchpads, just-in-time (JIT) retrieval.
Context window, the model's working set for a single call: finite, priced per token, and not persistent memory.
Durable execution, running agents as resumable state machines so crashes and waits don't lose work.
Eval (evaluation), a repeatable test of agent behaviour: unit, trajectory, outcome, or LLM-as-judge.
Function / tool calling, the model emitting structured arguments for code your system executes.
Guardrails, input/output validation, policy checks and budgets wrapped around model behaviour.
HITL (human-in-the-loop), a design in which a person approves the agent's actions, reviews sampled outputs, or handles its escalations.
Idempotency, designing actions so a retried step cannot apply twice (no double refunds).
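Lethal trifecta, an agent that combines access to private data, exposure to untrusted content, and the ability to communicate externally; the worst case for prompt injection, because injected instructions can exfiltrate the data.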
LLM-as-judge, using a model to score outputs against a rubric; calibrate against human labels.
MCP (Model Context Protocol), an open standard for connecting agents to tools, resources and prompts through a client-server architecture.
Memory (agent), an engineered long-term store (working, episodic, semantic, procedural) with explicit write policies.
Multi-agent system, several agents coordinating via supervisor, pipeline, network, or hierarchy topologies.
Orchestration, the control layer sequencing steps, agents, tools, and approvals.
Prompt caching, provider-side reuse of an already-processed prompt prefix; cache reads are billed at a fraction of the normal input-token price.
Prompt injection, instructions hidden in content the agent reads (web pages, emails, tool results) that the model then treats as commands.
Quantization, storing model weights at lower precision (e.g. 4-bit) to cut memory use and speed up inference at a small cost in quality.
ReAct, the reason-act-observe loop pattern underlying most single-agent designs.
Semantic cache, answering near-duplicate requests (typically matched by embedding similarity) from stored responses without a model call.
Trace / trajectory, the recorded sequence of model calls, tool calls and results for one run.
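
The Cascade / routing entry can be sketched in a few lines. This is a minimal illustration, not a production router. Several details are hypothetical: the tier names, the integer cost units, and the assumption that each model reports a confidence in [0, 1].

```python
from typing import Callable, List, Tuple

# A tier is (name, cost per call in hypothetical units, call returning (answer, confidence)).
Tier = Tuple[str, int, Callable[[str], Tuple[str, float]]]

def cascade(task: str, tiers: List[Tier], threshold: float = 0.8) -> Tuple[str, str, int]:
    """Try tiers cheapest-first and escalate while confidence stays below threshold."""
    spent = 0
    answer, used = "", ""
    for name, cost, call in tiers:
        answer, confidence = call(task)
        spent += cost
        used = name
        if confidence >= threshold:
            break  # good enough: stop before paying for a larger model
    # If no tier clears the threshold, the last (strongest) tier's answer is returned.
    return answer, used, spent

# Hypothetical stand-ins for a small and a large model.
tiers: List[Tier] = [
    ("small", 1, lambda t: ("4", 0.95) if t == "2+2" else ("?", 0.3)),
    ("large", 20, lambda t: ("a detailed answer", 0.9)),
]
print(cascade("2+2", tiers))              # ('4', 'small', 1)
print(cascade("a hard question", tiers))  # ('a detailed answer', 'large', 21)
```

Self-reported confidence is often poorly calibrated. In practice the escalation signal might instead come from a verifier, a schema check, or token log-probabilities.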
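
The Checkpoint and Durable execution entries come down to two things: persist state after each step, and skip completed steps on resume. This is a minimal sketch under stated assumptions: a local JSON file serves as the store, and each step returns a string. `run_with_checkpoints` is a hypothetical helper; real systems would use a database or a durable-execution engine.

```python
import json
import os
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[], str]]

def run_with_checkpoints(steps: List[Step], path: str) -> Dict[str, str]:
    """Run named steps in order; a rerun after a crash resumes at the first unfinished step."""
    state: Dict[str, str] = {}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)  # resume: results of steps that already finished
    for name, fn in steps:
        if name in state:
            continue  # completed in an earlier attempt, so don't redo it
        state[name] = fn()
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
        # Rename over the old file so a crash mid-write leaves the previous checkpoint intact.
        os.replace(tmp, path)
    return state
```

A step that crashes after its side effect but before its checkpoint is written will run again on resume. This is why checkpointing is usually paired with idempotent actions.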
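
For the Function / tool calling entry, the key point is that the model only emits data, while your code decides what runs. Below is a minimal dispatcher. It assumes the model's tool call has already been extracted as a JSON string shaped like `{"name": ..., "arguments": {...}}`; real providers use varying formats. `get_weather` is a hypothetical tool.

```python
import json
from typing import Any, Callable, Dict

def get_weather(city: str) -> str:
    # Hypothetical tool; a real one would call a weather API.
    return f"Sunny in {city}"

# Only functions registered here can ever be executed.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}

def execute_tool_call(raw: str) -> Any:
    """Parse the model's structured tool call and run the matching registered function."""
    call = json.loads(raw)  # the model's output is treated as data, never evaluated as code
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return fn(**call["arguments"])

print(execute_tool_call('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))  # Sunny in Oslo
```

Validating the arguments against the tool's declared schema before the call is a natural place to attach guardrails.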
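
The Idempotency entry is commonly implemented with an idempotency key: a unique ID minted once per logical action and reused on every retry, so the side effect runs at most once per key. This is a minimal in-memory sketch; `RefundService` is hypothetical. A real store would need an atomic check-and-write, for example a unique constraint in a database.

```python
import uuid
from typing import Dict

class RefundService:
    """Toy side-effecting service guarded by idempotency keys (hypothetical)."""

    def __init__(self) -> None:
        self.balance = 0
        self._done: Dict[str, int] = {}  # idempotency key -> amount already refunded

    def refund(self, key: str, amount: int) -> int:
        if key in self._done:
            return self._done[key]  # a retry: return the first result, apply nothing
        self.balance += amount  # the real side effect, applied once per key
        self._done[key] = amount
        return amount

svc = RefundService()
key = str(uuid.uuid4())  # minted once when the step is planned, e.g. stored in the checkpoint
svc.refund(key, 50)
svc.refund(key, 50)  # retried after, say, a timeout
print(svc.balance)  # 50, not 100
```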
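
For the LLM-as-judge entry, "calibrate against human labels" needs an agreement measure. One common choice is Cohen's kappa, which corrects raw agreement for chance; it is assumed here rather than prescribed by the book. The sketch below handles binary pass/fail labels and computes kappa = (p_o - p_e) / (1 - p_e).

```python
from typing import Sequence

def cohens_kappa(judge: Sequence[bool], human: Sequence[bool]) -> float:
    """Chance-corrected agreement between judge and human pass/fail labels."""
    if len(judge) != len(human) or not judge:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(judge)
    p_o = sum(j == h for j, h in zip(judge, human)) / n  # observed agreement
    p_judge = sum(judge) / n  # judge's pass rate
    p_human = sum(human) / n  # humans' pass rate
    p_e = p_judge * p_human + (1 - p_judge) * (1 - p_human)  # agreement expected by chance
    if p_e == 1:
        return 1.0  # both raters gave the same single label to every item
    return (p_o - p_e) / (1 - p_e)

judge = [True, True, True, False, False, True, False, True]
human = [True, True, False, False, False, True, False, True]
print(cohens_kappa(judge, human))  # 0.75 (raw agreement 0.875, chance agreement 0.5)
```

If kappa comes out low, revise the rubric or the judge prompt before trusting the judge's scores at scale.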
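
The ReAct entry, together with the Agent entry's "loop ... within budgets", can be sketched as follows. `scripted_model` is a hypothetical stand-in for an LLM call so the loop runs without a provider. A real agent would parse the model's chosen tool call out of its response.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Action:
    """One model decision: call a tool, or finish with an answer."""
    tool: Optional[str]  # None means "finish"
    arg: str

def react_loop(model: Callable[[List[str]], Action],
               tools: Dict[str, Callable[[str], str]],
               goal: str,
               max_steps: int = 5) -> str:
    """Reason-act-observe loop with a hard step budget."""
    transcript = [f"Goal: {goal}"]
    for _ in range(max_steps):
        action = model(transcript)  # reason: choose the next action from the transcript
        if action.tool is None:
            return action.arg  # the model chose to finish
        observation = tools[action.tool](action.arg)  # act: your code runs the tool
        transcript.append(f"{action.tool}({action.arg}) -> {observation}")  # observe
    return "stopped: step budget exhausted"

def scripted_model(transcript: List[str]) -> Action:
    # Hypothetical stand-in for an LLM: look something up once, then answer from the observation.
    if len(transcript) == 1:
        return Action(tool="lookup", arg="capital of France")
    return Action(tool=None, arg=transcript[-1].split("-> ")[1])

tools = {"lookup": lambda q: "Paris" if "France" in q else "unknown"}
print(react_loop(scripted_model, tools, "Name the capital of France"))  # Paris
```

The transcript built here is a simple trace in the glossary's sense: the recorded sequence of tool calls and results for one run.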
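
A minimal Semantic cache sketch follows. A real cache would embed requests with an embedding model and search a vector index. To stay self-contained, this version assumes a toy bag-of-words `embed` and a linear scan. The 0.75 similarity threshold is an arbitrary illustration that would need tuning.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.75) -> None:
        self.threshold = threshold
        self.entries: List[Tuple[Counter, str]] = []

    def get(self, query: str) -> Optional[str]:
        q = embed(query)
        scored = [(cosine(q, vec), response) for vec, response in self.entries]
        best = max(scored, default=None)
        if best is not None and best[0] >= self.threshold:
            return best[1]  # hit: reuse the stored response, no model call
        return None  # miss: call the model, then put() the result

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))

cache = SemanticCache()
cache.put("What is the refund policy?", "Refunds within 30 days.")
print(cache.get("what is your refund policy"))  # Refunds within 30 days.
print(cache.get("how do I reset my password"))  # None
```

The threshold trades hit rate against the risk of returning a stored answer to a question that only looks similar.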
