From two serious options to twelve production-grade frameworks in eighteen months. What each one is actually for, how to choose, and how to avoid marrying one.
How we got here
Through 2024 the choice was effectively LangChain or roll-your-own. By mid-2026 the field looks completely different: industry surveys such as Uvik's 2026 comparison count a dozen production-viable Python frameworks, and the three big model vendors each shipped a first-party agent SDK within weeks of one another. Microsoft consolidated AutoGen and Semantic Kernel into a unified Agent Framework, moving classic AutoGen into maintenance (the community continues it as AG2). Meanwhile TypeScript builders got Mastra and the vendor SDKs' JS ports, and low-code platforms (n8n, Dify, Copilot Studio) made simple agents a configuration exercise.
Two findings from the field matter more than any ranking. First: there is no winner. Across one consultancy's twelve 2025-26 client engagements, no single framework appeared more than four times; the right answer tracked workflow complexity, vendor commitment, and appetite for abstraction. Second: composition is normal. Teams routinely run a LangGraph orchestration spine with CrewAI-style role agents inside it, or a vendor SDK agent that calls tools shared with everything else over MCP.
The twelve, in one table
| Framework | Core abstraction | Sweet spot | Watch out for |
|---|---|---|---|
| LangGraph | Graph / state machine with checkpoints | Complex routing, approvals, durable long-running flows; the production default for many teams | Steeper learning curve; graph thinking is mandatory |
| LangChain | Chains + integration library | Rapid prototyping atop the largest integration catalog | Abstraction churn; many teams graduate to LangGraph |
| CrewAI | Role-based crews (role, goal, backstory) | Content pipelines, research-write-review, role-shaped business processes; A2A delegation added in 2026 | Role metaphor strains on highly dynamic tasks |
| OpenAI Agents SDK | Lightweight agents + handoffs + guardrails | Fast builds inside the OpenAI ecosystem; clean tracing | Vendor-centric; portability needs discipline |
| Claude Agent SDK | The Claude Code harness as a library: files, terminal, computer use, sub-agents | Coding agents and desk-work automation with strong tool ergonomics; MCP-native | Anthropic-centric by design |
| Google ADK | Multi-agent hierarchies; native A2A with auto Agent Cards | Cross-vendor agent interop, Google Cloud estates | Heavier; assumes Google tooling |
| Pydantic AI | Typed agents, validated structured outputs | Teams that want compile-time-ish safety, testing, and clean dependency injection | Smaller ecosystem than the giants |
| smolagents | Minimal code-acting agents (~1K LOC core) | Hugging Face stack, research, learning the loop; agents that write code as their action format | Code-execution security needs sandboxing |
| Agent Framework (MS) | Unified successor to Semantic Kernel + AutoGen | .NET / Azure enterprises, compliance-heavy estates | Newest of the set; migration from SK/AutoGen ongoing |
| AG2 (AutoGen fork) | Conversational multi-agent chat | Research-style agent dialogues, code-execution loops | Classic AutoGen itself is in maintenance |
| LlamaIndex | Data-centric agents over indexes | Document workflows, agentic RAG, knowledge assistants | Less suited to general orchestration |
| Haystack (Deepset) | Composable pipelines | Production search + RAG with agent steps; strong eval tooling | Pipeline mindset, not free-form autonomy |

Honourable mentions: Mastra (TypeScript-first, batteries included), DSPy (programmatic prompt optimization rather than an agent runtime), and the low-code tier (n8n, Dify, Microsoft Copilot Studio), which is genuinely sufficient for linear, low-risk internal automations.
A decision guide
1. Complex branching, audits, pause/resume? yes -> LangGraph; no -> next question
2. Process maps to roles (research, write, review)? yes -> CrewAI; no -> next question
3. Committed to one model vendor, want speed? yes -> Vendor SDK (OpenAI / Claude / ADK); no -> next question
4. Type-safety and testability first? yes -> Pydantic AI; no -> next question
5. Microsoft / .NET enterprise estate? yes -> Agent Framework (SK); no -> next question
6. Document- and data-centric agents? yes -> LlamaIndex / Haystack
None of the above: start with a minimal loop (smolagents or ~100 lines of your own) and graduate only when you feel the ceiling.
Figure 3.1 — A pragmatic selection ladder. First 'yes' wins; composition across answers is legitimate.
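The ladder in Figure 3.1 is simple enough to write down as data plus a loop, which can help when a team wants to record its answers. The following is a minimal sketch, not a tool from any of the frameworks: the question keys (complex_branching, role_shaped_process, and so on) and the function names recommend and all_matches are hypothetical labels chosen for this illustration.

```python
# Rungs of the selection ladder, in order. Keys are hypothetical labels
# for the questions in Figure 3.1; values are the figure's answers.
LADDER = [
    ("complex_branching", "LangGraph"),
    ("role_shaped_process", "CrewAI"),
    ("single_vendor_speed", "Vendor SDK (OpenAI / Claude / ADK)"),
    ("type_safety_first", "Pydantic AI"),
    ("microsoft_estate", "Agent Framework"),
    ("data_centric", "LlamaIndex / Haystack"),
]
FALLBACK = "Minimal loop (smolagents or ~100 lines of your own)"


def recommend(answers: dict[str, bool]) -> str:
    """Return the framework for the first question answered 'yes'."""
    for key, framework in LADDER:
        if answers.get(key, False):
            return framework
    return FALLBACK


def all_matches(answers: dict[str, bool]) -> list[str]:
    """Return every 'yes' rung, as candidates for composition."""
    return [framework for key, framework in LADDER if answers.get(key, False)]


if __name__ == "__main__":
    answers = {"type_safety_first": True, "data_centric": True}
    print(recommend(answers))    # first 'yes' wins
    print(all_matches(answers))  # both rungs, if you intend to compose
```

Keeping all_matches alongside recommend reflects the caption's second clause: the first "yes" gives a starting point, while the full list shows where composition across answers might be worth considering.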
The lock-in question
Frameworks are the layer most likely to churn under you — abstractions get deprecated, pricing and licensing shift, a better fit appears. The teams that switch painlessly all did the same thing: they kept their own thin interfaces for 'agent', 'tool', and 'memory', and treated the framework as an implementation detail behind them. That hexagonal discipline costs a few hundred lines up front and buys you the right to change your mind. Chapter 7 turns this into a full portability playbook.
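One way to picture those thin interfaces is as three small protocols that application code depends on, with each framework wrapped in an adapter behind them. The sketch below is an assumption about what such a layer might look like, not a prescribed design: the names Tool, Memory, Agent, FunctionTool, ListMemory, StubAgent, and handle_request are all hypothetical, and StubAgent stands in for an adapter that would wrap a real framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class Tool(Protocol):
    """Anything with a name that maps a string argument to a string."""
    name: str
    def __call__(self, argument: str) -> str: ...


class Memory(Protocol):
    def append(self, role: str, content: str) -> None: ...
    def history(self) -> list[tuple[str, str]]: ...


class Agent(Protocol):
    def run(self, task: str) -> str: ...


@dataclass
class FunctionTool:
    """Wraps a plain function so it satisfies the Tool protocol."""
    name: str
    fn: Callable[[str], str]

    def __call__(self, argument: str) -> str:
        return self.fn(argument)


@dataclass
class ListMemory:
    """In-process memory; a real one might sit on a database."""
    _items: list[tuple[str, str]] = field(default_factory=list)

    def append(self, role: str, content: str) -> None:
        self._items.append((role, content))

    def history(self) -> list[tuple[str, str]]:
        return list(self._items)


@dataclass
class StubAgent:
    """Stand-in adapter. A real adapter would call a framework here."""
    tools: dict[str, Tool]
    memory: Memory

    def run(self, task: str) -> str:
        self.memory.append("user", task)
        # Toy routing: tasks look like "tool_name: argument".
        tool_name, _, argument = task.partition(":")
        tool = self.tools.get(tool_name.strip())
        answer = tool(argument.strip()) if tool else f"no tool for: {task}"
        self.memory.append("agent", answer)
        return answer


def handle_request(agent: Agent, task: str) -> str:
    """Application code sees only the Agent protocol, never a framework."""
    return agent.run(task)


if __name__ == "__main__":
    memory = ListMemory()
    agent = StubAgent({"upper": FunctionTool("upper", str.upper)}, memory)
    print(handle_request(agent, "upper: hello"))
    print(memory.history())
```

Because handle_request accepts anything that satisfies Agent, swapping frameworks would mean writing a new adapter class and leaving the calling code unchanged, which is the property the paragraph above describes.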
Selection method that works
Pick by elimination, not attraction. List your hard constraints (vendor commitments, language, compliance, team skills, durability needs) and strike frameworks that fail any of them. Then spend two days prototyping the riskiest slice of your real workload on whatever survives before committing. A framework that demos well on toy tasks can still fight you on yours.
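The elimination step can be sketched as a filter that also records why each candidate was struck, which makes the decision easier to defend later. This is an illustrative sketch only: the candidates are deliberately named framework_a, framework_b, and framework_c, and their attributes are placeholder values rather than claims about any real framework. The Candidate fields and the eliminate function are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Candidate:
    name: str
    languages: frozenset[str]
    vendor_neutral: bool
    durable_execution: bool


# A hard constraint is a label plus a pass/fail predicate.
Constraint = tuple[str, Callable[[Candidate], bool]]

# Placeholder candidates; fill these in from your own evaluation.
CANDIDATES = [
    Candidate("framework_a", frozenset({"python"}), True, True),
    Candidate("framework_b", frozenset({"python"}), False, True),
    Candidate("framework_c", frozenset({"typescript"}), True, False),
]

# Example hard constraints; yours will differ.
CONSTRAINTS: list[Constraint] = [
    ("language: python", lambda c: "python" in c.languages),
    ("vendor-neutral", lambda c: c.vendor_neutral),
    ("durable execution", lambda c: c.durable_execution),
]


def eliminate(
    candidates: list[Candidate], constraints: list[Constraint]
) -> tuple[list[str], dict[str, list[str]]]:
    """Strike any candidate that fails at least one hard constraint.

    Returns the surviving names and, for each struck candidate,
    the labels of the constraints it failed.
    """
    survivors: list[str] = []
    struck: dict[str, list[str]] = {}
    for candidate in candidates:
        failed = [label for label, passes in constraints if not passes(candidate)]
        if failed:
            struck[candidate.name] = failed
        else:
            survivors.append(candidate.name)
    return survivors, struck


if __name__ == "__main__":
    survivors, struck = eliminate(CANDIDATES, CONSTRAINTS)
    print("prototype next:", survivors)
    for name, reasons in struck.items():
        print(f"struck {name}: {', '.join(reasons)}")
```

The filter only narrows the field. The two-day prototype on the survivors is still what tells you whether a framework fits your workload.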