Where the field is heading through 2027, what to bet on versus watch, and a concrete twelve-week plan from idea to measured pilot.
Five trajectories worth planning around
- Agents disappear into software — Gartner projects 40% of enterprise applications will embed task-specific agents by the end of 2026, up from under 5% in 2025. 'Agent' stops being a product category and becomes a feature of everything.
- Protocols become plumbing: with MCP under Linux Foundation governance, A2A handling cross-vendor delegation, and enterprise auth landing in both, integration moves from bespoke code to configuration. Bet on the seams, not the frameworks.
- Memory and learning mature — temporal knowledge graphs and self-editing memory move from papers to defaults; agents that improve from their own traces become the expectation.
- Governance hardens — the EU AI Act's high-risk obligations phase in, UAE and Saudi national AI programs formalise procurement standards, and audit trails shift from best practice to license to operate.
- The shakeout is real: Gartner, the same firm forecasting embedded agents, also forecasts that more than 40% of agentic AI projects will be cancelled by the end of 2027. The market punishes vague KPIs faster than weak models.
The 90-day playbook
Twelve weeks is enough to go from idea to a measured pilot — if scope stays narrow and measurement starts on day one:
- Weeks 1-2, Choose & baseline: pick one painful, measurable workflow; capture today's cost, time, quality
- Weeks 3-4, Workflow-first prototype: thin slice with explicit steps; real data; demo to the people who do the job
- Weeks 5-8, Evals & hardening: 30+ real eval cases; tracing, budgets, guardrails; cost pass (cache + route)
- Weeks 9-12, Pilot & decide: limited rollout vs. baseline; weekly flywheel; scale, iterate, or stop honestly
Figure 14.1 — Four phases, one workflow, honest numbers at the end.
The decision at week twelve is the whole point: scale it, iterate it, or stop it — based on the baseline you captured in week one. Teams that skip the baseline can never prove the win, and unprovable wins get cancelled in the next budget cycle.
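The week-twelve decision can be reduced to a small comparison against the week-one baseline. The following is a minimal sketch, not a prescribed method: the `WorkflowMetrics` fields, the `decide` function, and the thresholds `min_gain` and `max_quality_drop` are all hypothetical names and values chosen for illustration, and a real pilot would set them before the rollout starts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowMetrics:
    """Hypothetical per-task measurements, captured the same way in week 1 and week 12."""
    cost_per_task: float     # currency units per completed task
    minutes_per_task: float  # end-to-end handling time
    quality_score: float     # 0.0-1.0, e.g. share of outputs accepted by reviewers


def relative_gain(baseline: float, pilot: float) -> float:
    """Fractional reduction versus baseline (positive means the pilot is cheaper or faster)."""
    return (baseline - pilot) / baseline


def decide(
    baseline: WorkflowMetrics,
    pilot: WorkflowMetrics,
    min_gain: float = 0.15,          # assumed threshold: 15% improvement to justify scaling
    max_quality_drop: float = 0.02,  # assumed tolerance for quality regression
) -> str:
    """Return 'scale', 'iterate', or 'stop' from pilot results measured against the baseline."""
    cost_gain = relative_gain(baseline.cost_per_task, pilot.cost_per_task)
    time_gain = relative_gain(baseline.minutes_per_task, pilot.minutes_per_task)
    quality_delta = pilot.quality_score - baseline.quality_score
    best_gain = max(cost_gain, time_gain)

    if quality_delta < -max_quality_drop:
        # Quality regressed: a cost or time win alone is not enough to scale.
        return "iterate" if best_gain >= min_gain else "stop"
    if best_gain >= min_gain:
        return "scale"
    if best_gain > 0:
        return "iterate"
    return "stop"


if __name__ == "__main__":
    week_1 = WorkflowMetrics(cost_per_task=4.0, minutes_per_task=30.0, quality_score=0.90)
    week_12 = WorkflowMetrics(cost_per_task=2.8, minutes_per_task=12.0, quality_score=0.91)
    print(decide(week_1, week_12))  # prints "scale"
```

Without the `week_1` record there is nothing to pass as `baseline`, which is the practical form of the point above: the decision cannot be computed, only argued.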
Ten principles to keep
- Start with the workflow, not the technology — and pick one that is painful, frequent, and measurable.
- Use the simplest pattern that works; graduate to agents, then multi-agent, only when evals demand it.
- Own your interfaces, prompts and evals; rent everything else.
- Put protocol seams (MCP, A2A) wherever you may want to swap parts later.
- Treat memory as an engineered subsystem with a write policy — not a big context window.
- Design for the error math: budgets, checkpoints, idempotent tools, human gates on irreversible acts.
- Make cost a design input: cache, route, compress — measured per task.
- Keep data residency a first-class requirement, not a deployment afterthought.
- Trace everything; turn failures into eval cases weekly.
- Earn autonomy with evidence: ship at L2-L3 and let the data raise the dial.
The teams that win with agents are not the ones with the best model. They are the ones with the clearest workflow, the honest evals, and the discipline to ship small and measure.
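The "error math" in the principles above can be made concrete with a short calculation. This sketch assumes that every step succeeds independently with the same probability, which is a simplification (real agent steps are often correlated), and the function names `chain_success` and `max_steps_within_budget` are hypothetical. Under that assumption, end-to-end success is the per-step rate raised to the number of steps, which suggests how often a checkpoint or human gate is needed.

```python
import math


def chain_success(per_step_success: float, steps: int) -> float:
    """Probability that all steps succeed, assuming independent, equally reliable steps."""
    return per_step_success ** steps


def max_steps_within_budget(per_step_success: float, target_success: float) -> int:
    """Largest number of unchecked steps that keeps end-to-end success at or above the target."""
    if not 0.0 < per_step_success < 1.0:
        raise ValueError("per_step_success must be strictly between 0 and 1")
    if not 0.0 < target_success < 1.0:
        raise ValueError("target_success must be strictly between 0 and 1")
    # Solve p**n >= target for n: n <= log(target) / log(p) (both logs are negative).
    return math.floor(math.log(target_success) / math.log(per_step_success))


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, f"{chain_success(0.95, n):.2f}")
    # prints:
    # 5 0.77
    # 10 0.60
    # 20 0.36
    print(max_steps_within_budget(0.95, 0.80))  # prints 4
```

With a 95% per-step success rate, a 20-step run completes cleanly only about a third of the time under these assumptions, which is the argument for step budgets, checkpoints, and human gates on irreversible actions.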