Working definitions as used in this book.
- A2A (Agent2Agent) — open protocol for discovery and delegation between independent agents; Agent Cards describe capabilities.
- Agent — an LLM that directs its own tool use in a loop toward a goal, within budgets and policies.
- Agentic RAG — retrieval where the agent decides what to fetch, when, and whether to search again.
- Cascade / routing — sending each task to the cheapest model likely to succeed, escalating on low confidence.
- Checkpoint — persisted run state allowing pause, resume, retry, and human approval gates.
- Context engineering — deciding what enters the model's window at each step: compaction, scratchpads, just-in-time (JIT) retrieval.
- Context window — the model's working set per call: finite, priced per token, and not a substitute for memory.
- Durable execution — running agents as resumable state machines so crashes and waits don't lose work.
- Eval (evaluation) — a repeatable test of agent behaviour: unit, trajectory, outcome, or LLM-as-judge.
- Function / tool calling — the model emitting structured arguments for code your system executes.
- Guardrails — input/output validation, policy checks and budgets wrapped around model behaviour.
- HITL (human-in-the-loop) — a person approves, samples, or receives escalations from the agent.
- Idempotency — designing actions so a retried step cannot apply twice (no double refunds).
- Lethal trifecta — private data, untrusted content, and external communication combined in one agent; the prompt-injection worst case.
- LLM-as-judge — using a model to score outputs against a rubric; calibrate against human labels.
- MCP (Model Context Protocol) — open standard connecting agents to tools, resources and prompts via a client-server architecture.
- Memory (agent) — engineered long-term store (working, episodic, semantic, procedural) with write policies.
- Multi-agent system — several agents coordinating via supervisor, pipeline, network, or hierarchy topologies.
- Orchestration — the control layer sequencing steps, agents, tools, and approvals.
- Prompt caching — provider-side reuse of a processed prompt prefix; cached reads are billed at a fraction of the normal input price.
- Prompt injection — instructions hidden in content the agent reads, which the model then treats as commands.
- Quantization — compressing model weights to lower precision (e.g. 4-bit) to cut memory use and speed up inference, at a small cost in quality.
- ReAct — the reason-act-observe loop pattern underlying most single-agent designs.
- Semantic cache — answering near-duplicate requests from stored responses without a model call.
- Trace / trajectory — the recorded sequence of model calls, tool calls and results for one run.
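
Two of the entries above, Cascade / routing and Idempotency, are easier to see in code than in a one-line definition. The following is a minimal sketch, not a reference implementation. The names `Tier`, `route`, and `RefundTool`, the 0.8 confidence threshold, and the per-call costs are all hypothetical and chosen for illustration. The "models" are stub functions that return an answer with a self-reported confidence, an assumption that a real system would replace with its own confidence signal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Tier:
    """One model tier in a cascade (hypothetical structure)."""
    name: str
    cost_per_call: float
    # Stub for a model call: returns (answer, confidence in [0, 1]).
    run: Callable[[str], Tuple[str, float]]


def route(task: str, tiers: List[Tier], threshold: float = 0.8):
    """Try tiers cheapest-first and escalate while confidence is low.

    Returns the last answer produced, the total cost spent, and a
    small trace of (tier name, confidence) pairs.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    spent = 0.0
    trace = []
    answer = ""
    for tier in sorted(tiers, key=lambda t: t.cost_per_call):
        answer, confidence = tier.run(task)
        spent += tier.cost_per_call
        trace.append((tier.name, confidence))
        if confidence >= threshold:
            break  # good enough, stop escalating
    return answer, spent, trace


class RefundTool:
    """A tool whose action is safe to retry (hypothetical example).

    Each call carries an idempotency key. A repeated key returns the
    stored result instead of applying the refund a second time.
    """

    def __init__(self) -> None:
        self.applied: Dict[str, dict] = {}
        self.total_refunded = 0

    def refund(self, key: str, amount: int) -> dict:
        if key in self.applied:
            return self.applied[key]  # retry: no second refund
        self.total_refunded += amount
        result = {"status": "refunded", "amount": amount}
        self.applied[key] = result
        return result


# Usage with stub models: the small tier is unsure, so the task escalates.
small = Tier("small", 1.0, lambda task: ("maybe", 0.5))
large = Tier("large", 10.0, lambda task: ("yes", 0.95))
answer, spent, trace = route("Is this invoice a duplicate?", [large, small])

# Usage of the idempotent tool: the second call simulates a retried step.
tool = RefundTool()
first = tool.refund("order-42:refund", 30)
retry = tool.refund("order-42:refund", 30)
```

In this sketch the escalation costs the price of both tiers, which is the usual trade-off of a cascade: cheap on easy tasks, slightly more expensive than going straight to the large model on hard ones. The idempotency key here is held in memory; with durable execution it would need to live in the same persisted store as the checkpoint so that it survives a crash.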