You know, Herman, I was looking at a GitHub repo yesterday where the developer had essentially written a three hundred line wrapper just to handle basic state persistence for a simple research agent. And I thought to myself, we are living in the year twenty twenty-six, why are we still out here building the plumbing by hand?
It is the eternal struggle of the early adopter, Corn. We spend eighty percent of our time building the bucket and twenty percent of the time actually filling it with water. But that is exactly what today's prompt from Daniel is pushing us to look at. He wants a state-of-play briefing on the major vendor-provided SDKs for agentic AI.
It is about time. We have spent plenty of hours dissecting the third-party frameworks like CrewAI and LangGraph, which are great for being model-agnostic, but the big players have finally stepped into the ring with their own heavy-hitting development kits. We are talking OpenAI, Anthropic, and Google.
And just for the record, since we are diving into the technical weeds today, this episode of My Weird Prompts is powered by Google Gemini three Flash. It is actually quite fitting given that we are discussing Google's own Agent Development Kit later in the show.
Gemini writing about its own creator's SDK. I am sure it will be completely unbiased and not at all self-congratulatory. But seriously, the landscape has shifted. A year ago, if you wanted an agent, you went to an open-source framework. Now, the vendors are saying, look, we know our models best, here is the official way to build on them.
The shift is massive. We are moving from the generative chat era into the agentic do era. It is no longer about the LLM just talking to you; it is about the LLM having a dedicated runtime, a memory layer, and a standardized way to call tools.
So let us start with the elephant in the room, or perhaps the green logo in the room. OpenAI released their Agents SDK version one point zero back in November of twenty twenty-five. Herman, you have been poking around the documentation. What is the overarching philosophy there? Because OpenAI usually goes for the batteries included approach.
That is the perfect way to describe it. The OpenAI Agents SDK is built on a philosophy of declarative simplicity. They want to abstract away the orchestration entirely. If you look at the API, they have introduced a first-class Agent class. You do not have to manually manage the message loop or the tool-calling logic anymore. You define an agent, you give it a set of tools, and the SDK handles the recursive reasoning steps internally.
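For listeners at a terminal: the loop Herman says the SDK hides can be sketched in a few lines of plain Python. This is not the actual OpenAI Agents SDK API; the Agent class and the stubbed model below are purely illustrative of the recursive tool-calling pattern being abstracted away.

```python
# Minimal sketch of the tool-calling loop a vendor agent SDK hides.
# The "model" here is a hard-coded stub; a real SDK would call the
# provider's API and parse its tool-call response.
from dataclasses import dataclass, field


@dataclass
class Agent:
    instructions: str
    tools: dict = field(default_factory=dict)   # tool name -> callable
    history: list = field(default_factory=list)

    def _model(self, history):
        # Stub policy: request the "add" tool once, then answer from its result.
        if not any(m["role"] == "tool" for m in history):
            return {"tool": "add", "args": {"a": 2, "b": 3}}
        result = [m for m in history if m["role"] == "tool"][-1]["content"]
        return {"final": f"The answer is {result}"}

    def run(self, task, max_steps=5):
        self.history.append({"role": "user", "content": task})
        for _ in range(max_steps):
            decision = self._model(self.history)
            if "final" in decision:
                return decision["final"]
            # Dispatch the requested tool and feed the result back in.
            out = self.tools[decision["tool"]](**decision["args"])
            self.history.append({"role": "tool", "content": out})
        raise RuntimeError("agent exceeded max_steps without finishing")


agent = Agent(instructions="You are a calculator.",
              tools={"add": lambda a, b: a + b})
print(agent.run("What is 2 + 3?"))  # -> The answer is 5
```

Everything inside `run` is what "declarative simplicity" means in practice: the vendor owns that loop, and you only supply the agent definition and the tools.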
I saw some of the YAML configurations they are pushing. They look suspiciously like Kubernetes for AI. Is it actually easier, or just a different kind of complex?
For a specific set of use cases, it is significantly easier. If you are building a linear or semi-autonomous workflow where you need the agent to just go do X, Y, and Z, the declarative approach is a dream. You set memory equal to true in the config, and the SDK automatically handles the conversation threading and the vector store indexing for long-term retrieval. You do not have to worry about session IDs or database schemas for the chat history in the early stages.
But there has to be a catch. OpenAI is notorious for the black box approach. If I am using their Agent class, how much control do I actually have over the inner monologue or the specific way it decides to sequence those tools?
That is the primary friction point for senior engineers right now. The sentiment in the developer community is that it is the fastest way to get an MVP off the ground, but it is also the most opaque. If the agent gets stuck in a loop or makes a weird tool-call decision, you have very limited visibility into the middle layers of that reasoning chain. It is very much a take it or leave it architecture. You are trading the fine-grained control you get in something like LangGraph for the speed of a managed runtime.
It sounds like the classic Apple approach. It works beautifully as long as you stay inside the garden walls. But what happens when I want to use a tool that requires a complex multi-step authentication flow that the SDK was not designed for?
That is where you start hitting the limits. The SDK makes simple tool use—like calling a weather API or a calculator—trivial. But for complex, stateful tool interactions, you end up fighting the framework. Also, let us talk about versioning. We are at one point zero now, and they have already deprecated three different beta patterns from the early twenty twenty-five previews. Developers are a bit wary of the deprecation trap we have talked about before.
Right, the arc of deprecation. You build your whole agentic workflow on their SDK in November, and by March, the API signatures have changed because they decided a different way was more efficient. It is a high-velocity environment.
It really is. Now, contrast that with Anthropic. They released the Claude Agent SDK version zero point nine in January of twenty twenty-six. Their philosophy is almost the polar opposite of OpenAI’s. Where OpenAI wants to hide the orchestration, Anthropic wants to give you the tools to build a very safe, very transparent harness.
Safety-first composability. That sounds very Anthropic. I am guessing it is not as simple as setting memory equals true?
Not even close. Anthropic’s SDK feels more like a library than a full-stack runtime. It focuses heavily on tool-use guardrails. One of the standout features is the explicit human approval callback system. In the code, when you define a tool that has high-stakes implications—like making a financial transfer or deleting a file—you can wrap it in a required approval block. The SDK will automatically pause the agent's execution, serialize the state, and wait for a signed signal from a human before it proceeds.
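The approval gate described here can be approximated generically. This is a minimal sketch that assumes nothing about Anthropic's real API surface: the decorator name, the callback signature, and the approval threshold are all invented for illustration.

```python
# Sketch of a human-approval gate for high-stakes tools.
# `requires_approval` and the reviewer callback are illustrative,
# not the real Claude Agent SDK surface.
class ApprovalDenied(Exception):
    pass


def requires_approval(approve_fn):
    """Wrap a tool so it only runs if approve_fn(tool_name, args) returns True."""
    def decorator(tool):
        def gated(**kwargs):
            if not approve_fn(tool.__name__, kwargs):
                raise ApprovalDenied(f"{tool.__name__} blocked by human reviewer")
            return tool(**kwargs)
        gated.__name__ = tool.__name__
        return gated
    return decorator


# Stand-in for a real "pause, serialize state, wait for a signed signal" flow:
# here the reviewer simply approves transfers under 1000.
def human_reviewer(tool_name, args):
    return args.get("amount", 0) < 1000


@requires_approval(human_reviewer)
def wire_transfer(amount, to):
    return f"sent {amount} to {to}"


print(wire_transfer(amount=500, to="acct-1"))   # runs
try:
    wire_transfer(amount=50000, to="acct-2")    # blocked before execution
except ApprovalDenied as e:
    print(e)
```

The point of the pattern is that the block happens before the side effect, not after, which is exactly what makes it auditable.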
That is actually huge for enterprise. I cannot tell you how many legal teams have put the brakes on agentic projects because they are terrified of a rogue LLM hallucinating a wire transfer. Having that as a native feature in the SDK rather than a custom-built shim is a big deal.
It is the reason why we are seeing such high adoption in fintech and healthcare right now. The developer sentiment is that while it is more verbose—you are writing more code to handle the loops and the state transitions—you sleep better at night. You have a very clear audit trail of exactly why the model requested a tool and where the human intervened.
So if OpenAI is the fast-moving consumer app builder, Anthropic is the choice for the regulated industry?
Generally, yes. But it is also for the developer who wants to be an architect, not just a consumer. The Claude Agent SDK does not force a specific memory implementation on you. It gives you the hooks to plug in your own Redis instance or your own Pinecone index. It is much more modular. The limitation, of course, is the learning curve. You need to understand the underlying mechanics of tool-use blocks and how Claude handles the thinking tag in the latest models.
I remember seeing a demo where they showed the thinking process of the model being streamed separately from the final output. Does the SDK handle that separation natively?
It does. It treats the chain of thought as a distinct data stream. This allows you to log the model's reasoning to your internal monitoring without necessarily showing the messy intermediate steps to the end user. It is a very sophisticated way to handle transparency.
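One way to picture that split, as a generic sketch rather than Anthropic's actual streaming format (the channel tags and chunk shapes here are made up):

```python
# Sketch: chain-of-thought and final output as separate streams.
# The ("thinking" / "output") tags are invented for illustration.
def model_stream():
    yield ("thinking", "User wants a summary; check length limits.")
    yield ("thinking", "Draft is under the limit; finalize.")
    yield ("output", "Here is your summary.")


def run_with_split_streams(stream):
    reasoning_log, user_visible = [], []
    for channel, chunk in stream:
        # Reasoning chunks go to internal monitoring only;
        # output chunks are what the end user actually sees.
        (reasoning_log if channel == "thinking" else user_visible).append(chunk)
    return reasoning_log, user_visible


log, visible = run_with_split_streams(model_stream())
print(visible)  # the messy intermediate steps never reach the user
```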
Okay, so we have the fast-and-easy OpenAI, the safe-and-modular Anthropic. That brings us to Google. They dropped the Agent Development Kit, or ADK, version zero point five in February. Given Google's history, I am betting this is all about the cloud.
You nailed it. Google’s ADK is the definition of cloud-native integration. If you are already in the Google Cloud Platform ecosystem, the ADK is almost an unfair advantage. Its philosophy is about seamless deployment and horizontal scaling.
When you say seamless, are we talking about the adk deploy command I saw in the documentation?
Exactly. You can take an agent definition—which, by the way, integrates natively with Vertex AI and Firebase—and deploy it to Cloud Run with a single command. The ADK handles the containerization, the IAM permissions, and the scaling logic. It treats an agent not just as a script, but as a microservice.
That is a very different mental model. Instead of thinking about an agent as a persistent entity, you are thinking about it as a serverless function that can spin up thousands of instances to handle a massive batch of tasks.
And that is where Google is winning. If you need to run an agentic workflow across ten thousand documents simultaneously, the ADK’s integration with BigQuery and Google’s distributed infrastructure is hard to beat. They also have this interesting concept called tool-extension sets. You can define a set of tools in the Google Cloud Console and share them across multiple agents without rewriting the API definitions.
But wait, if I am using the Google ADK, am I essentially signing a blood oath to stay on GCP forever?
Pretty much. That is the major limitation. While the SDK is open-source in the sense that the code is on GitHub, it is so deeply entwined with Google’s proprietary services that porting an ADK agent to AWS or Azure would be a nightmare. It is the ultimate vendor lock-in play.
It is the classic Google move. The tools are incredibly powerful, but the walls of the garden are twenty feet high and topped with concertina wire. But I suppose if you are a Fortune five hundred company already spending millions on GCP, you do not really care about portability. You care about the fact that it integrates with your existing security policies and data lakes.
And that is exactly who is using it. We are seeing large-scale logistics and retail companies using the ADK because it connects directly to their inventory databases via the Vertex AI extensions. It is about industrial-grade agents.
So let us do a quick comparison here. If I am a developer starting a project today, and I am looking at these three, how do I actually choose? Because we haven't even touched on when you should just ignore all of them and stick with something like Pydantic AI.
It comes down to where you want the complexity to live. If you want the complexity to be hidden so you can focus on the user experience, you go with OpenAI. If you want the complexity to be visible so you can manage risk and safety, you go with Anthropic. If you want the complexity to be handled by your infrastructure so you can scale to the moon, you go with Google.
And what about the third-party frameworks? Why would I still use LangGraph or CrewAI in twenty twenty-six when these official SDKs exist?
The biggest reason is model agility. Even though these SDKs are getting better, they are still fundamentally tied to their respective models. You cannot run a Claude three point five Sonnet model through the OpenAI Agents SDK efficiently. If your project requires you to swap models based on cost or performance—maybe you use a small local model for simple tasks and a big frontier model for the hard stuff—you still need an agnostic framework.
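A hedged sketch of what that agnostic layer buys you: agent logic written against a tiny interface, with stub classes standing in for real provider clients. The class names and routing rule are illustrative only; `cost_per_call` is carried so a router could also budget by price.

```python
# Sketch of "model agility": business logic depends on a small Protocol,
# and providers are swapped by task difficulty (or cost) at one choke point.
# The provider classes are stubs, not real client libraries.
from typing import Protocol


class ChatModel(Protocol):
    cost_per_call: float
    def complete(self, prompt: str) -> str: ...


class SmallLocalModel:
    cost_per_call = 0.0
    def complete(self, prompt):
        return f"local: {prompt[:20]}"


class FrontierModel:
    cost_per_call = 0.05
    def complete(self, prompt):
        return f"frontier: {prompt[:20]}"


def route(prompt: str, hard: bool) -> ChatModel:
    # Simple tasks go to the cheap model, hard ones to the frontier model.
    return FrontierModel() if hard else SmallLocalModel()


def answer(prompt, hard=False):
    return route(prompt, hard).complete(prompt)


print(answer("summarize this"))          # served by the local model
print(answer("prove this", hard=True))   # served by the frontier model
```

Swapping vendors then means writing one new class that satisfies `ChatModel`, instead of rewriting the orchestration layer.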
That is a great point. The vendor SDKs are essentially high-performance silos. If you decide that Gemini is better for your specific task next month, and you built your whole agentic logic in the OpenAI SDK, you are looking at a total rewrite of the orchestration layer.
We saw a case study recently of a retail company that made that exact mistake. They built a customer service agent using OpenAI's early SDK. It worked great until they realized that they needed a specific type of long-context window that Gemini offered at a lower price point. They spent three weeks ripping out the OpenAI-specific memory calls and state management just to move over. They ended up migrating the whole thing to LangGraph because they realized they never wanted to be in that position again.
It is the middle-ware play. It is less efficient in the short term but more resilient in the long term. But I have noticed that even the third-party frameworks are starting to adopt the patterns from the vendor SDKs. Have you seen Pydantic AI lately? They have basically adopted the same declarative style that OpenAI is using, but they keep it model-agnostic.
Yes, Pydantic AI is the current darling of the Python community for exactly that reason. It gives you the type-safety and the structured outputs that developers love, but it doesn't lock you into a specific inference provider. It feels like the sweet spot for many teams right now.
Let us talk about a few other niche but significant vendor SDKs. What about the smaller players? Is Mistral or Meta doing anything in this space?
Meta is an interesting one. They don't have a dedicated agent SDK in the same way, but they have released the Llama Stack. It is more of a reference architecture than a managed SDK. It is designed for developers who are running Llama models locally or on their own hardware. It provides a standardized API for tool calling and safety filtering. It is very much aimed at the open-source, self-hosted crowd.
Which is a huge market. There are so many companies that won't touch a cloud-based LLM with a ten-foot pole for privacy reasons. Having a standardized stack for local agents is a massive win for them.
And Mistral has their La Plateforme, which includes some agentic capabilities, but it is not as mature as the big three we discussed. They are focusing more on the efficiency of the models themselves rather than the surrounding orchestration.
So if we are looking at the state of play in March twenty twenty-six, it sounds like the honeymoon phase of just chatting with boxes is over. If you're a serious dev, you're picking a harness. You're picking a way to govern how these models interact with the world.
It is the professionalization of the field. We are moving away from prompt engineering as the primary lever and moving toward agent architecture. The SDK you choose defines the constraints of your architecture.
I want to go back to the OpenAI SDK for a second, specifically version one point zero. I was reading some developer feedback on X, and people were complaining about the latency added by the SDK's internal loops. Is that a real concern, or just people being pedantic?
It is a real concern for real-time applications. Because the OpenAI SDK handles the recursion internally, every time the agent decides to use a tool, there is a small amount of overhead as the SDK processes the state and prepares the next call. In some benchmarks, we are seeing a ten to fifteen percent latency penalty compared to a hand-rolled message loop that is highly optimized. For a research assistant, it doesn't matter. For a voice-based agent or a high-frequency trading bot, it is a dealbreaker.
That is the tax you pay for the convenience. It is like the difference between writing in C++ and writing in Python. Most people will take Python's speed of development any day, but if you're at the edge of performance, you have to go lower level.
And that is where Anthropic’s approach shines again. Because it is more manual, you can optimize those loops yourself. You can decide exactly when to stop and when to push forward.
What about the cost implications? Do these SDKs make it easier to track and control spend? Because an autonomous agent can run up a five hundred dollar bill in about ten minutes if it gets into a loop.
This is where the vendor SDKs actually have a huge advantage. Both the OpenAI and Google kits have built-in budget caps and token-limit triggers at the agent level. You can initialize an agent with a max budget of five dollars for a specific task. Once the cumulative token cost of the reasoning steps and tool calls hits that five dollars, the SDK will gracefully shut down the agent and return the current state.
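The cap mechanism can be sketched generically. The function below is not any vendor's API, and the pricing and step sizes are invented, but it shows the graceful-shutdown idea: stop before a step would push cumulative spend past the cap, and hand back the state accumulated so far.

```python
# Sketch of an agent-level budget cap with graceful shutdown.
# Token counts and per-token pricing below are illustrative only.
def run_with_budget(steps, max_budget, cost_per_token):
    """Run reasoning steps until done, or until the next step would bust the cap."""
    spent = 0.0
    state = []
    for tokens, note in steps:
        cost = tokens * cost_per_token
        if spent + cost > max_budget:
            # Graceful shutdown: return partial state plus the spend so far.
            return {"status": "budget_exceeded", "state": state, "spent": spent}
        spent += cost
        state.append(note)
    return {"status": "done", "state": state, "spent": spent}


steps = [(10_000, "plan"), (40_000, "search"),
         (80_000, "draft"), (120_000, "revise")]
result = run_with_budget(steps, max_budget=5.00, cost_per_token=0.00003)
print(result["status"], round(result["spent"], 2))  # budget_exceeded 3.9
```

The agent completes planning, search, and drafting, then halts before the revise step would take the bill past five dollars.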
That is a lifesaver. I remember the early days of AutoGPT where people would wake up to a thousand-dollar bill because their agent spent the whole night trying to figure out how to order a pizza but got stuck in a captcha loop.
It happened more often than people like to admit. The fact that cost management is now a first-class citizen in the SDK is a sign that these tools are being built for production, not just for Twitter demos.
So, looking ahead, do you think we will see these SDKs converge? Like, will there eventually be an industry standard for agent orchestration, or are we destined to have these three competing ecosystems for the foreseeable future?
I think we are headed for a period of intense fragmentation followed by an eventual standardization of the protocols, not the SDKs themselves. Think about it like the early days of the web. We had different browsers with different ways of rendering things, but we eventually settled on HTTP and HTML. We are starting to see the early signs of an Agent Protocol—a standardized way for an agent to describe its tools and its state to any runtime.
That would be the dream. You write your agent logic once, and then you just choose which runtime to execute it on based on your needs for that day. But we know the big vendors will fight that as long as they can because lock-in is profitable.
Of course. The SDK is the moat. If they can get your entire engineering team trained on the Google ADK and your entire infrastructure built on GCP-native agent services, you are a customer for a long time.
It is the same old story in a shiny new AI wrapper. But for the individual developer or the small startup, the advice seems clear: prototype where it is easiest, which is likely OpenAI, but keep your business logic decoupled so you can move if the winds change.
Don't let the agent logic become your business logic. Keep your tools as clean, independent APIs. That way, the agent is just a disposable brain that you can swap out.
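Herman's decoupling advice in sketch form: business logic as plain functions, with one small, swappable adapter producing whatever schema a given SDK wants. The generic schema below is illustrative, not any vendor's real tool spec.

```python
# Sketch of "tools as clean, independent APIs": the business logic knows
# nothing about agents, and only the thin adapter is vendor-shaped.
def check_inventory(sku: str) -> int:
    """Return units in stock for a SKU. Pure business logic, no SDK imports."""
    stock = {"sku-1": 12, "sku-2": 0}
    return stock.get(sku, 0)


def to_generic_tool_schema(fn):
    # One adapter like this per SDK is the only vendor-specific code you write;
    # the schema keys here are a made-up generic format.
    return {
        "name": fn.__name__,
        "description": fn.__doc__.strip(),
        "call": fn,
    }


tool = to_generic_tool_schema(check_inventory)
print(tool["name"], tool["call"]("sku-1"))
```

Swap the brain, keep the tools: migrating vendors means rewriting the adapter, not the inventory logic.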
A disposable brain. That is a slightly terrifying thought to end a segment on, but it is accurate. So, let us talk about the practical takeaways for the folks listening who are actually staring at a terminal right now.
Number one: If you are building a prototype and you need it done by Friday, use the OpenAI Agents SDK. The declarative YAML approach and the managed memory will save you days of work. Just be aware that you are trading off visibility and long-term flexibility.
Number two: If you are in a regulated industry or you are building something where a mistake could be catastrophic, look at Anthropic’s Claude Agent SDK. The human-in-the-loop features and the transparency of the tool-use blocks are worth the extra lines of code.
And number three: If you are already deep in the Google Cloud ecosystem and you need to scale an agentic workflow across massive datasets or thousands of concurrent users, the Google ADK is your best friend. The deployment story there is simply better than anyone else's right now.
And number four, which I will add: Always, always have an exit strategy. Audit your code for vendor-specific calls. If you find yourself using a feature that only exists in one SDK, ask yourself if you can achieve the same thing with a more generic pattern. Because in the AI world of twenty twenty-six, six months is a lifetime.
It really is. We have seen entire companies rise and fall in the time it takes for a major model update to roll out.
Well, this has been a fairly comprehensive tour of the landscape. I feel like I finally have a map of where all the landmines are buried.
It is a crowded field, but the tools are getting genuinely impressive. We are finally getting to the point where we can build the things we’ve been dreaming about for the last few years without having to spend all our time on the plumbing.
I might actually go back and refactor that GitHub repo I was looking at. Three hundred lines of state management is about two hundred and ninety lines too many for twenty twenty-six.
You might find that with the right SDK, you can get it down to ten.
That is the dream, Herman. That is the dream.
Well, I think that covers the major ground on the vendor SDK front. It is a fascinating moment in the development of the field.
It really is. Before we wrap up, I want to give a quick shout-out to our producer, Hilbert Flumingtop, for keeping the gears turning behind the scenes.
And a huge thank you to Modal for providing the GPU credits that power this show's infrastructure. They make the heavy lifting of modern AI research and generation much more manageable.
If you found this technical dive useful, please consider leaving us a review on Apple Podcasts or Spotify. It genuinely helps the show reach more developers and enthusiasts who are trying to navigate this crazy landscape.
You can also find all of our episodes, including the archives and the RSS feed, over at myweirdprompts dot com.
This has been My Weird Prompts. I'm Corn.
And I'm Herman Poppleberry.
See you next time.
Take care.