#2169: How Enterprises Are Rethinking Agent Frameworks

Twelve major agentic AI frameworks exist—yet many serious developers avoid them entirely. What patterns emerge in real enterprise adoption?

Episode Details
Episode ID
MWP-2327
Published
Duration
24:36
Audio
Direct link
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
claude-sonnet-4-6

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

Why Enterprises Are Ditching Agent Frameworks

The agentic AI space has exploded with options. LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Google ADK, Pydantic AI, Smolagents—the list keeps growing. Yet a striking paradox emerges: despite this abundance, many serious developers are actively avoiding frameworks altogether. Understanding this split reveals something fundamental about how enterprise AI actually gets built.

The Framework Abundance Trap

When you see a dozen competing frameworks, the natural instinct is to assume the market is maturing and consolidating. But the situation is more complex. As analyst Janakiram MSV noted, this mirrors the 2015 container orchestration wars—Docker Swarm, Mesos, and Kubernetes all competing for dominance. Kubernetes won decisively.

But agent frameworks face a fundamentally different squeeze. In 2015, containers didn't get better at orchestrating themselves. You always needed an orchestration layer. Today, the foundation models themselves are improving at orchestration with each generation. This means the independent framework layer is being compressed from below by the models themselves and from above by hyperscalers.

The Hyperscaler Strategy

AWS released Strands Agents, with Bedrock AgentCore as the managed runtime underneath. Google has ADK, native to Gemini and Vertex AI. Microsoft has its Agent Framework for Azure Foundry. All three frameworks are open-source and free to use. This mirrors the GKE, EKS, AKS playbook: give away the orchestrator, monetize the infrastructure underneath.

The framework becomes a loss leader. When you build on AWS's Bedrock AgentCore, you're not just choosing a runtime—you're embedding your agent architecture into AWS's governance, observability, and billing stack in ways that accumulate and become increasingly difficult to unwind. Agentic lock-in operates at multiple layers simultaneously: the foundation model, the orchestration framework, the runtime environment, and the developer patterns your team internalizes.

The Anthropic Contradiction

Anthropic published "Building Effective Agents" in December 2024 with a core thesis: don't reach for a framework. Their engineering team found that the most successful production implementations weren't using complex frameworks or specialized libraries. They were building with simple, composable patterns directly against LLM APIs.

This creates an apparent contradiction—Anthropic maintains a Claude Agent SDK while officially recommending against frameworks. But this isn't hypocrisy; it reflects a genuine tension. Frameworks reduce boilerplate and accelerate prototyping. The problem is the path from prototype to production-grade system, where frameworks often become liabilities rather than assets.
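To make the "simple, composable patterns directly against LLM APIs" idea concrete, here is a minimal sketch of a no-framework agent loop. Everything in it is an illustrative assumption rather than any vendor's actual interface: the message shapes, the toy tool set, and the injected `send` callable (an HTTP client in production, a stub in tests) are all hypothetical.

```python
import json

# Illustrative tool set -- plain Python functions, not a framework registry.
TOOLS = {
    "add": lambda a, b: a + b,
}

def dispatch(tool_call):
    """Route a model-emitted tool call to ordinary code; deterministic and testable."""
    name = tool_call.get("name")
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return {"result": TOOLS[name](**tool_call.get("args", {}))}

def run_agent(send, user_input, max_steps=5):
    """The whole 'agent loop': send(history) is any callable returning the
    model's reply as a dict, so the loop works against any API or a stub."""
    history = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):
        reply = send(history)
        history.append(reply)
        if "tool_call" not in reply:
            return reply["content"]  # model produced a final answer
        history.append({"role": "tool",
                        "content": json.dumps(dispatch(reply["tool_call"]))})
    return None  # bounded autonomy: refuse to loop forever
```

The point of the sketch is that the control flow, the tool surface, and the step budget are all plainly visible—the properties that tend to get buried under framework abstractions.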

The Scaling Reality

The adoption numbers tell a sobering story. McKinsey found 39% of organizations are experimenting with agents, but only 23% have begun scaling within even one business function. Gartner predicted 40% of agentic AI deployments will be canceled by 2027 due to rising costs, unclear value, or poor risk controls.

JetBrains surveyed 11,000 developers in January 2026: 90% use AI at work, 66% of companies plan to adopt coding agents within twelve months, but only 13% report using AI across the full software development lifecycle. The gap between "experimenting" and "integrated into production" is massive.

MIT research found 95% of enterprise AI pilots fail to scale, with only 5% delivering measurable profit impact. The constraint isn't model capability—it's operational fit.

The Real Obstacles

The organizational infrastructure required to deploy agents reliably doesn't exist yet: documentation, domain models, escalation policies, testing pipelines, governance structures. AWS's Matthias Patzak noted that agents fail across teams because they lack domain knowledge that exists only in developers' minds—architectural patterns, business rules, design constraints that aren't written down anywhere.

The DORA 2025 report adds another dimension: 77% of organizations deploy once per day or less. Manual testing and deployment pipelines cannot handle the volume of agent-generated code. Even if agents produce good output, delivery infrastructure isn't set up to absorb it.

The Categories That Skip Frameworks Entirely

Several distinct groups never consider frameworks in the first place.

Security and Compliance: A Gravitee report found 88% of organizations experienced confirmed or suspected agent security incidents, yet only 14% had full security approval for their agent fleet. Regulated industries—healthcare, finance, defense, government—can't deploy frameworks lacking security review and compliance certifications. Most frameworks carry no HIPAA, SOC2, or EU AI Act certifications. Akka is essentially the only major framework with multiple compliance certifications.

Geopolitical Constraints: The EU AI Act entered enforcement in August 2025. Transparency requirements, governance documentation, and oversight mechanisms now apply to any AI deployed in EU markets, regardless of the company's headquarters. Companies with EU operations face hard constraints on data processing location that override technical preferences.

Air-Gapped Environments: Defense, intelligence, and critical infrastructure can't use cloud-dependent frameworks. Most mainstream frameworks assume connectivity—they pull from cloud-hosted models, log to remote observability platforms, and call external APIs. Air-gapped deployment requires fundamentally different architecture.

The Vendor Lock-In Matrix

Choosing an agentic AI vendor is categorically different from choosing an API vendor—it's a strategic partnership decision. The lock-in operates at the framework layer, runtime layer, observability layer, and developer pattern layer. When teams internalize how to build agents in a particular framework, that organizational lock-in is very real, even if it doesn't appear in a contract.

APIs from one vendor's platform don't interoperate with another's. You can't easily migrate an agent architecture from Bedrock AgentCore to Azure Foundry. Switching costs accumulate invisibly.

The Bottom Line

The framework explosion isn't a sign of market maturity—it's a sign of strategic positioning by hyperscalers and uncertainty about what actually works at scale. The principled engineering case for building agents without frameworks deserves serious consideration, especially as compliance requirements, cost governance, and organizational readiness emerge as the real constraints on enterprise adoption.



#2169: How Enterprises Are Rethinking Agent Frameworks

Corn
So Daniel sent us this one, and it's a meaty one. He's asking about the agentic AI framework explosion — there are now more than a dozen major open-source options, LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Google ADK, Pydantic AI, Smolagents, the list goes on — and yet despite all that abundance, a significant chunk of serious developers are actively avoiding frameworks altogether. Daniel wants to know: what patterns are we actually seeing in enterprise adoption, and why do frameworks sometimes not even enter the conversation for certain organizations? So. Where do we start with this?
Herman
The abundance question is actually the right place to start, because it's a bit of a trap. When you see twelve-plus frameworks competing for attention, the instinct is to think "this space is maturing, pick the best one." But what's actually happening is closer to what Janakiram MSV at The New Stack described back in February — he called it a direct replay of the two thousand fifteen container orchestration wars. Docker Swarm, Mesos, Kubernetes, all fighting for the same territory.
Corn
And Kubernetes won that one pretty decisively.
Herman
It did. But here's the twist that makes the agent framework situation fundamentally different. In two thousand fifteen, containers didn't get better at orchestrating themselves. You always needed a scheduler. The orchestration layer had durable value. With agent frameworks, the models themselves are getting better at orchestration with every generation. So the independent framework layer is being squeezed from below by the models, and from above by the hyperscalers.
Corn
That's a genuinely uncomfortable position to be in if you're, say, the LangGraph team.
Herman
It really is. And the hyperscaler angle is where it gets strategically interesting. AWS released Strands Agents. Google has ADK, which is native to Gemini and Vertex AI. Microsoft has their Agent Framework pointing at Azure Foundry. These are all open-source, free to use. And the framing from Janakiram is sharp: this is the GKE, EKS, AKS playbook applied to agents. Give away the orchestrator, monetize the infrastructure underneath.
Corn
So the framework is a loss leader.
Herman
The framework is a loss leader. And if you build your agentic workflow on AWS's Bedrock AgentCore, for example, you're not just using a runtime — you're embedding your agent architecture into AWS's governance, observability, and billing stack in ways that compound over time and become increasingly difficult to unwind. Agentic lock-in is actually more durable than API lock-in because it accumulates at multiple layers simultaneously: the foundation model, the orchestration framework, the runtime environment, and the developer patterns your team internalizes.
Corn
By the way, today's episode is brought to you by Claude Sonnet 4.6, which is generating our script. And yes, I see the irony of an AI writing the section where we discuss how Anthropic tells you not to use frameworks.
Herman
Which is the most delicious irony in this entire space, and we have to address it directly. Anthropic's own engineering team published "Building Effective Agents" in December twenty twenty-four, and the core thesis is basically: don't reach for a framework. Their exact framing was that the most successful implementations they'd seen weren't using complex frameworks or specialized libraries. They were building with simple, composable patterns directly against LLM APIs. And then they note — I'm paraphrasing — that if you do use a framework, make sure you understand the underlying code, because incorrect assumptions about what's under the hood are a common source of errors.
Corn
So Anthropic's official engineering advice is "maybe don't use a framework" — and Anthropic has a Claude Agent SDK.
Herman
They do. And look, I don't think that's actually hypocritical — I think it reflects a genuine tension. Frameworks are useful for getting started. They reduce boilerplate, they give you patterns to follow, they're great for demos and prototypes. The problem is the path from framework-assisted prototype to production-grade system is often where they become a liability rather than an asset.
Corn
Which brings us to the adoption numbers, because they tell a pretty interesting story.
Herman
They do, and the numbers are all over the place in a way that's revealing. McKinsey found thirty-nine percent of organizations are experimenting with agents, but only twenty-three percent have begun scaling within even one business function. Gartner, in January twenty twenty-five, said sixty-one percent had begun agentic AI development — but the same report predicts forty percent of agentic AI deployments will be canceled by twenty twenty-seven. Due to rising costs, unclear value, or poor risk controls.
Corn
Forty percent cancellation rate predicted before they even fully launch. That's a brutal forecast.
Herman
And it's not coming from pessimists — Gartner is generally pretty bullish on enterprise tech adoption. The JetBrains AI Pulse survey from January twenty twenty-six, which covered eleven thousand developers, found ninety percent use AI at work, sixty-six percent of companies plan to adopt coding agents within twelve months. But only thirteen percent report using AI across the full software development lifecycle. There's a massive gap between "we're experimenting" and "this is integrated into how we actually ship software."
Corn
JetBrains VP Oleg Koverznev made a comparison that stuck with me when I was reading about this — he's warning that AI agents are about to repeat the cloud ROI crisis.
Herman
That framing is really useful. When enterprises moved to the cloud, the first wave was enthusiasm and investment, followed quickly by "wait, where did all this money go, and what did we get for it?" An entire category of cloud cost management and FinOps tooling emerged from that pressure. Koverznev is saying the same dynamic is already starting with agents. One developer can spend a hundred dollars a month on API calls. A team orchestrating thousands of agents can spend a hundred thousand. Without governance, those costs compound invisibly and the ROI conversation becomes very uncomfortable very fast.
Corn
And the MIT research on this is genuinely sobering. Ninety-five percent of enterprise AI pilots fail to scale. Only five percent deliver measurable profit impact. The constraint isn't model capability — it's operational fit.
Herman
Which is the key insight that gets lost in all the framework hype. The problem most enterprises face isn't "our model isn't smart enough" or "we picked the wrong framework." It's that the organizational infrastructure required to deploy agents reliably — documentation, domain models, escalation policies, testing pipelines, governance structures — doesn't exist yet. AWS's Matthias Patzak put it really directly in January: agents can't work across teams because they lack the domain knowledge that exists only in developers' minds. Architectural patterns, business rules, design constraints — that knowledge isn't written down anywhere. When agents try to make changes to another team's code, they usually fail because they're operating without that context.
Corn
So it's a people and process problem wearing a technology costume.
Herman
The DORA twenty twenty-five report adds another dimension: about seventy-seven percent of organizations deploy once per day or less. Manual testing and deployment pipelines simply cannot handle the volume of agent-generated code. So even if your agents are producing good output, your delivery infrastructure isn't set up to absorb it.
Corn
Now I want to get into the category of organizations where frameworks aren't even part of the conversation — because I think this is the most underappreciated part of the story. What are we actually talking about here?
Herman
There are several distinct categories, and they're worth separating because the reasons are different. The first and most immediate is security and compliance. A recent Gravitee report found eighty-eight percent of organizations had experienced confirmed or suspected agent security incidents. And yet only fourteen percent had full security approval for their agent fleet. Those two numbers together are alarming.
Corn
Eighty-eight percent have had incidents but only fourteen percent have proper security approval. So most organizations are running agents that haven't been properly vetted, getting burned, and presumably doing it anyway.
Herman
The ones that have the luxury of doing it anyway. Regulated industries don't. Healthcare, finance, defense, government — they can't deploy frameworks that haven't passed security review. And most frameworks carry no compliance certifications. HIPAA, SOC2, DORA, the EU AI Act. Of the major frameworks, Akka is essentially the only one that holds multiple compliance certifications. So for a hospital system or a bank, the question isn't "LangGraph or CrewAI" — the question is "can we legally deploy this at all, and under what conditions."
Corn
And the EU AI Act is now in enforcement. That changed the calculus significantly for European enterprises.
Herman
August twenty twenty-five was when enforcement began, and the obligations are real — transparency requirements, governance documentation, oversight mechanisms. And it applies to any AI deployed in EU markets, regardless of where the company is headquartered. So an American company selling into Europe has to comply. And something like DeepSeek has been explicitly ruled out for any enterprise with EU operations and GDPR obligations, regardless of its technical capabilities — the geopolitical dimension of where your data is processed becomes a hard constraint, not a preference.
Corn
There's a related category here that's even more extreme — air-gapped environments. Defense, intelligence, critical infrastructure. Cloud-dependent frameworks are simply impossible.
Herman
Right, and this is a case where the framework conversation never starts because the technical prerequisites don't exist. Most of these frameworks assume connectivity. They're pulling from cloud-hosted models, they're logging to remote observability platforms, they're calling external APIs. For an air-gapped environment, you need a fundamentally different architecture, and most of the mainstream frameworks aren't designed for it.
Corn
Let's talk about the vendor lock-in angle more specifically, because Kai Waehner's Enterprise Agentic AI Landscape from earlier this month frames this in a way I found really useful — the trust versus lock-in matrix.
Herman
The key insight from that analysis is that choosing an agentic AI vendor in twenty twenty-six is categorically different from choosing an API vendor. It's a strategic partnership decision. Because the lock-in isn't just at the API layer — it's at the framework layer, the runtime layer, the observability layer, and the developer pattern layer. When your team has internalized how to build agents in a particular framework, that's organizational lock-in that doesn't show up in a contract but is very real.
Corn
And the hyperscalers understand this perfectly. They're not giving away frameworks out of generosity.
Herman
The Cengage CIO Ken Grady put it well — vendors are working at cross-purposes as they seek to compete and protect their data moats. And the practical consequence is that APIs for one vendor's platform don't interoperate with those of another vendor's platform. You can't easily take an agent architecture built on Bedrock AgentCore and migrate it to Azure Foundry. The switching costs accumulate in ways that aren't obvious upfront.
Corn
Now I want to get into the "no framework" camp more directly, because there's a principled engineering argument here that's worth taking seriously. Isaac Hagoel at Atlassian spent eight months shipping AI features in production, and his conclusions were pretty blunt.
Herman
His piece on DEV.to from August twenty twenty-five is worth reading in full. The core argument is that agents sound compelling on conference slides but in production they break, drift, and stall unless you're babysitting every step. And he identifies a specific set of failure modes that aren't edge cases — they're the common case. Loss of control, premature exit from tasks, performance degrading as instructions and tools and history grow, hallucinated actions, cascading errors with no robust recovery, poor context management over long tasks, and fundamental opacity that makes debugging miserable.
Corn
The one that gets me is "true autonomy almost never survives outside of narrow, simple demos." Because that's such a specific and damning claim from someone who's actually shipped this stuff.
Herman
And it aligns with the MIT finding. Most agent demos are meticulously iterated until they perform a single showcase scenario perfectly. The team has tuned the prompts, the tools, the context — everything is optimized for that specific flow. What's hard and still largely unsolved is getting robust, reliable performance in the messy, unpredictable real world where users don't follow the happy path.
Corn
So Hagoel's prescription is basically: use LLMs as tightly-scoped components, not orchestrators. Schema-enforced outputs, explicit branching logic, no agentic autonomy.
Herman
Code-first, workflow-based. And his critique of frameworks is that most of them push you toward agentic complexity even when simpler procedural logic would work better. The framework's design philosophy shapes what you build — and if the framework is optimized for multi-agent orchestration, you tend to reach for multi-agent orchestration even when a deterministic function call would be more reliable and easier to debug.
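The "LLM as a tightly-scoped component" pattern Herman describes can be sketched in a few lines: the model only classifies, its output is schema-enforced, and every branch is explicit, deterministic code. The ticket categories and queue names here are hypothetical, chosen purely for illustration.

```python
import json

# The only thing the model is trusted to produce: one of these categories.
ALLOWED = {"refund", "bug_report", "other"}

def parse_classification(raw: str) -> str:
    """Schema-enforce the model's output; reject anything off-menu."""
    data = json.loads(raw)               # raises on malformed output
    category = data["category"]          # raises KeyError if missing
    if category not in ALLOWED:
        raise ValueError(f"model emitted unknown category: {category}")
    return category

def route(raw_model_output: str) -> str:
    """Explicit branching -- no agentic autonomy, every path is inspectable."""
    try:
        category = parse_classification(raw_model_output)
    except (ValueError, KeyError, json.JSONDecodeError):
        return "escalate_to_human"       # recovery path for bad outputs
    return {"refund": "refund_queue",
            "bug_report": "engineering_queue",
            "other": "triage_queue"}[category]
```

Note that the failure mode Hagoel worries about—a hallucinated or malformed action cascading through the system—dead-ends at the validation boundary instead.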
Corn
There's something almost ironic about that. You adopt a framework to reduce complexity, and the framework's abstractions push you toward a more complex architecture than you would have built from scratch.
Herman
Which is exactly what the AWS Strands team found when they built their own thin framework. Their explicit reasoning was that they'd realized they no longer needed complex orchestration because models now have native tool-use and reasoning capabilities. Their previous framework libraries were getting in the way of fully leveraging what newer LLMs could do on their own. The framework was designed for a world where models needed heavy scaffolding, and that world is changing rapidly.
Corn
This connects to the "smarter models, thinner frameworks" trend. Let's spend a minute on this because I think it's the structural shift that most commentary misses.
Herman
It's the most important underlying dynamic. With models at the capability level of Claude Sonnet 4.6, Gemini 3 Pro, Llama 4 Scout — which has a ten-million-token context window — a lot of what frameworks were doing for you, the model can now do natively. Planning, tool selection, self-correction, handling ambiguity. The explicit control flow that LangGraph-style DAG orchestration provides was valuable when models were less capable. As model capability increases, the value of that explicit scaffolding decreases.
Corn
So independent frameworks are in a genuinely precarious position. The models are getting better at the thing frameworks were built to do, and the hyperscalers are commoditizing the deployment layer from the other direction.
Herman
And the question of what survives that squeeze is interesting. The New Stack's analysis suggests the answer might not be a framework at all — it might be the protocol layer. MCP, the Model Context Protocol, which was donated to the Linux Foundation's Agentic AI Foundation, and A2A, Agent-to-Agent. These are becoming the unifying substrate in the way that TCP/IP unified networking without requiring everyone to use the same operating system.
Corn
The "Kubernetes of agents" might be a protocol, not a framework. That's a genuinely provocative thesis.
Herman
And historically it's a pattern. The infrastructure wars often get resolved at the protocol layer rather than the application layer. The browser wars didn't end because one browser won permanently — they ended because HTML and HTTP became the shared substrate that browsers competed on top of. If MCP and A2A become the standard interoperability layer for agents, then the framework you build on top becomes a much less critical decision.
Corn
So what actually does have durable value in this landscape? Because I don't want to leave listeners with just "frameworks are bad and everything is uncertain."
Herman
Fair. There are four areas that are emerging as real differentiators. The first is context engineering — and this is subtle but important. The Manus team, which built one of the more sophisticated agent systems, rebuilt their framework four times before getting context management right. The insight is that the performance bottleneck isn't orchestration logic, it's what the model sees at each step. Garbage context in, garbage outputs out. Getting that right is genuinely hard and genuinely valuable.
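One concrete flavor of the context engineering Herman mentions is budgeted history trimming: keep the system prompt and the most recent turns inside a fixed token budget, and compress everything older into a stub. This is a rough sketch under stated assumptions—the four-characters-per-token heuristic and the budget numbers are crude placeholders, and a real system would summarize the elided turns rather than drop them.

```python
def rough_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token. A real system would use
    # the model's actual tokenizer.
    return max(1, len(text) // 4)

def fit_context(messages, budget=1000):
    """messages: list of {'role', 'content'}; the first item is the system prompt.
    Returns a list that fits the budget, newest turns preserved."""
    system, turns = messages[0], messages[1:]
    kept, used = [], rough_tokens(system["content"])
    for msg in reversed(turns):          # walk newest-first
        cost = rough_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    dropped = len(turns) - len(kept)
    kept.reverse()
    if dropped:
        # Placeholder where a production system would inject a summary.
        kept.insert(0, {"role": "system",
                        "content": f"[{dropped} earlier turns elided]"})
    return [system] + kept
```

The design choice worth noticing: the pruning is deterministic and inspectable, so "what the model sees at each step" is something you can log and test rather than an emergent property of a framework's memory abstraction.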
Corn
And that's independent of which framework you use, or whether you use one at all.
Herman
Completely independent. The second area is evaluation and observability. LangChain's real moat, as it turns out, isn't the chain abstraction — it's LangSmith. The tooling for measuring and evaluating agent performance. Tools like Langfuse, Braintrust, Ragas are building in this space. The principle is simple: you cannot improve agents you cannot measure. And right now, most organizations deploying agents have very limited visibility into what those agents are actually doing.
Corn
Which connects back to the security incident numbers. Eighty-eight percent have had incidents — and part of why that number is so high is that organizations don't have the observability to catch problems early.
Herman
The third area is agent security as a discipline. Microsoft published that eighty percent of Fortune five hundred companies have active agents — but most organizations still treat agents as extensions of human user accounts rather than as independent entities that require their own identities and access controls. That's a fundamental security architecture problem. Agents need their own identity, their own permission scopes, their own audit trails. That's not a framework feature — it's a security practice.
Corn
And the fourth area?
Herman
Interoperability protocols. Which we've already discussed, but the practical implication is that enterprises betting on MCP and A2A compatibility today are potentially insulating themselves from the coming framework shakeout. If your agent architecture is built on open protocols rather than proprietary framework abstractions, you have more flexibility to swap out components as the landscape evolves.
Corn
Let me ask you a practical question. If you're a developer or an engineering leader listening to this, trying to figure out what to actually do — what does this all add up to?
Herman
The honest answer is it depends heavily on your context, and anyone who gives you a universal prescription is oversimplifying. But there are a few clear signals. If you're in a regulated industry, start with the compliance and security question before the technology question. Most frameworks haven't done the compliance work. Figure out what you're actually allowed to deploy before you evaluate features.
Corn
And if you're not in a regulated industry?
Herman
If you're building something new, Anthropic's advice to start directly with the API is genuinely sound. Most of the patterns you need can be implemented in a few hundred lines of Python. The framework abstraction adds overhead — in terms of debugging complexity, dependency management, and the risk of building on abstractions that may not survive the next generation of models. Start simple, add complexity only when you have a specific problem that justifies it.
Corn
And the vendor lock-in consideration?
Herman
Be explicit about it. The Kai Waehner trust versus lock-in matrix is a useful forcing function — before you adopt a framework, ask: what runtime does this framework assume? What observability does it push me toward? If I want to migrate away from this in two years, what does that cost? The hyperscaler frameworks are particularly worth scrutinizing here because the lock-in is by design.
Corn
The ROI question is probably the one that's going to dominate the next twelve to eighteen months. Because the JetBrains warning about the cloud ROI crisis — that pattern feels very likely to repeat.
Herman
The global AI system integration and consulting market hit eleven billion dollars in twenty twenty-five and is projected at fourteen billion in twenty twenty-six. That number tells you the implementation gap is real and expensive to close. Organizations are spending enormous amounts on consultants and integrators to bridge the distance between "we have agents" and "our agents are delivering measurable business value." The ones that are going to come out ahead are the ones that are rigorous about measuring outcomes before scaling investment, not after.
Corn
I keep coming back to that forty percent cancellation prediction. Because that's not a fringe forecast — that's Gartner saying nearly half of current agentic AI projects won't make it to twenty twenty-seven. And I think the framework question is partly responsible for that. Organizations are picking frameworks and building complexity when they should be validating whether the use case is actually viable first.
Herman
The Klarna reversal is the most public example of that dynamic. They made a lot of noise about replacing human customer service with AI agents, and then quietly walked it back as the real-world performance didn't match the demo performance. And Replit's catastrophic database wipe from an agent running amok — these aren't anomalies, they're warnings. The gap between what an agent can do in a controlled demo and what it reliably does in production with real users and real edge cases is still enormous.
Corn
Which brings us back to Hagoel's point. The hardest thing about agents isn't building them — it's keeping them from doing something catastrophic when they encounter a situation the demo never tested.
Herman
And frameworks don't solve that problem. They can give you structure, but they can't give you reliability. Reliability comes from understanding your system deeply enough to know its failure modes, having the observability to catch problems early, and having the humility to constrain agent autonomy to the scope where you've actually validated it works. That's a discipline, not a dependency.
Corn
Alright, I think that's the note to end on. The framework abundance is real, the framework adoption is more complicated than the hype suggests, and for a significant portion of the organizations that matter most — regulated industries, enterprises with real governance requirements, teams that have actually shipped this stuff in production — the framework question is either unanswerable or the wrong question entirely.
Herman
And the organizations that are going to navigate this well are the ones that start with "what problem are we actually solving and how will we know if we've solved it" rather than "which framework should we pick." The protocol layer, the observability tooling, the security practices — that's where the durable value is being built right now.
Corn
Big thanks to Modal for the GPU credits that keep this whole operation running. Thanks as always to our producer Hilbert Flumingtop. This has been My Weird Prompts — if you're enjoying the show, a quick review on your podcast app goes a long way toward helping new listeners find us. Take care.
Herman
See you next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.