#2167: Sync vs. Async: Architecting Agents for Scale

Why most enterprise AI agents fail in production has less to do with models and more to do with whether they're built synchronously or asynchronously.

Episode Details
Episode ID
MWP-2325
Published
Duration
24:13
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
claude-sonnet-4-6

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

Why Agent Architecture Breaks at Scale: Sync vs. Async

The conversation around AI agents shifted dramatically in 2025. A year ago, the question was "will agents work?" Now it's "why aren't they working at scale?" The answer, more often than not, comes back to architecture — not model capability, but the plumbing underneath.

The Two Fundamental Patterns

Synchronous orchestration is the traditional model: a central supervisor agent directs everything in real time. It issues a command, waits for the result, then issues the next command. This is sequential, controlled, and auditable — the entire workflow passes through one chokepoint. It's clean in demos. A travel planning agent calls a flight agent, waits, then calls a hotel agent, waits, then calls a car rental agent. You can follow every step.

Asynchronous choreography works the opposite way: no conductor. Agents react to events, publish messages, and other agents subscribe and respond when they're ready. The workflow emerges from collective behavior rather than being dictated from the top.
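The choreography idea can be sketched with a minimal in-memory event bus. This is illustrative only — the `EventBus` class, topic names, and agents are invented for this example, not taken from any particular framework — but it shows how a workflow emerges from subscriptions rather than from a supervisor issuing commands.

```python
from collections import defaultdict

# Minimal in-memory event bus: agents subscribe to topics and react to
# events, rather than being called in sequence by a central supervisor.
class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self.subscribers[topic]:
            handler(payload)

bus = EventBus()
log = []

# Each "agent" is just a handler that reacts to one event and may emit others.
def flight_agent(trip):
    log.append("flight booked")
    bus.publish("flight.booked", trip)  # downstream agents pick this up

def hotel_agent(trip):
    log.append("hotel booked")

bus.subscribe("trip.requested", flight_agent)
bus.subscribe("flight.booked", hotel_agent)

# No conductor: publishing one event triggers the whole chain.
bus.publish("trip.requested", {"destination": "Lisbon"})
```

Note that the hotel agent never knows about the flight agent; both only know topic names, which is what lets new agents join without changing routing logic.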

Why Synchronous Fails at Scale

The synchronous model has a critical vulnerability: a single point of timeout. If any sub-agent in the chain takes too long or fails, the entire workflow either stalls or collapses. In enterprise environments, "too long" covers a lot of ground — legacy database queries, third-party APIs having a slow day, compliance checks needing human review.

The assumption baked into synchronous systems is that every step will complete in a reasonable time window. That assumption almost never holds at scale.
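A toy model makes the single-point-of-timeout failure concrete. The step names and durations below are hypothetical; the point is that one slow step — here a compliance check waiting on a human — kills the chain even though every other step was fast.

```python
# Toy model of a synchronous chain: each (name, duration) pair must finish
# within the per-step budget before the next step can start.
def run_chain(step_durations, budget_s):
    completed = []
    for name, duration in step_durations:
        if duration > budget_s:
            raise TimeoutError(f"{name} took {duration}s (> {budget_s}s budget)")
        completed.append(name)
    return completed

# Two fast steps, then a compliance check waiting on human review: the
# whole workflow collapses even though most steps were fine.
steps = [("flight", 0.4), ("hotel", 0.6), ("compliance_review", 1800)]
try:
    run_chain(steps, budget_s=30)
    outcome = "completed"
except TimeoutError as e:
    outcome = str(e)
```

An async design would instead park the compliance check as a pending task and let the rest of the system keep moving.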

Consider the numbers from 2025: between 60% and 70% of enterprises experimented with agentic AI. Production deployment rates ranged from 15% to 47%, depending on how you define "production." Language models improved dramatically — tool-calling error rates dropped from around 40% to 10%. But here's the trap: a 10% error rate is fine for a chatbot, where a user can simply ask again. For an agent autonomously executing business logic, 10% is catastrophic. One failed step can corrupt downstream state in ways that are very hard to recover from.
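The arithmetic behind "10% is catastrophic" is worth making explicit: per-step error rates compound multiplicatively across a sequential workflow.

```python
# If each autonomous step succeeds with probability p, a chain of n
# sequential steps succeeds end to end with probability p**n.
def chain_success_rate(p_step, n_steps):
    return p_step ** n_steps

# A 10% per-step error rate (p = 0.9) looks tolerable for a single call,
# but a 10-step agent workflow completes cleanly only ~35% of the time.
single = chain_success_rate(0.90, 1)
ten_step = chain_success_rate(0.90, 10)
```

So the same model quality that feels solid in a chat interface becomes a coin flip (or worse) once it drives a multi-step workflow.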

Three failure modes dominated production disasters: reliability, integration complexity, and cost at scale. The cost problem is particularly stark. Several companies ran synchronous agent pilots for customer service, got impressive demo results, then did the math on scaling to their full customer base and realized it would cost more than their entire existing contact center budget. Every customer interaction becomes multiple LLM calls, and those costs accumulate fast.

The Async Advantage

Asynchronous systems offer real escape hatches. You can batch work, parallelize tasks, and use smaller specialized models for specific subtasks instead of routing everything through a single large model. The Akka framework, which is JVM-native and async-first, provides a concrete example: its developers claim 3x the velocity with a third of the compute compared to Python-based frameworks, handling 1.4 million transactions per second at 9 milliseconds latency for around $11.77 per month per thousand transactions per second. Walmart, Capital One, and John Deere use it in production.
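The parallelization escape hatch is easy to sketch with Python's standard `asyncio`. The subtask names are invented, and the `sleep` calls stand in for real model or tool calls; the point is the cost model, not the tasks themselves.

```python
import asyncio

# Fan subtasks out in parallel instead of waiting on each one sequentially.
# In a real system, the sleep would be an LLM or tool call.
async def run_subtask(name, seconds):
    await asyncio.sleep(seconds)   # stand-in for I/O-bound model work
    return f"{name}: done"

async def main():
    # Sequential execution would cost the *sum* of the durations;
    # gather() costs roughly the *max* of them.
    return await asyncio.gather(
        run_subtask("summarize", 0.03),
        run_subtask("classify", 0.02),
        run_subtask("extract", 0.01),
    )

results = asyncio.run(main())
```

The same structure is what lets you route each subtask to a different (and cheaper) specialized model, since no single call sits on the critical path of the others.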

But async isn't a universal solution. There are clear cases where synchronous is the right call.

When Synchronous Actually Works

Financial transaction processing needs strict sequential ordering and complete audit trails. Every step must be confirmed before the next one starts. The auditability that feels like a constraint in flexible workflows is actually a feature in regulated operations.

Real-time customer service chatbots also demand synchronous patterns. If someone types a question and expects an answer in two seconds, the async model's fire-and-forget, poll-for-results pattern introduces unacceptable latency.

Code generation assistance (Copilot, Cursor) is synchronous by necessity. A developer has typed a function signature and needs the completion immediately. You can't say "we'll get back to you in a few minutes."

The clean framing: synchronous wins when human attention is locked in and waiting. Async wins when the work outlasts the human's active attention.

Three Architectural Patterns

AWS architecture work identifies three useful patterns:

Pure synchronous supervisor orchestration: A central agent manages everything, tells each sub-agent what to do, waits for results. Works well for 5-10 step workflows with clear, bounded dependencies.

Pure asynchronous event-driven choreography: No supervisor. Agents subscribe to an event hub, react to messages, and the workflow emerges from their collective behavior. New agents can be added without changing routing logic. But the debugging story is painful — distributed event tracing is genuinely hard.

Hybrid/broker pattern: A single broker agent routes messages to other agents based on content or metadata, but doesn't control the entire workflow. You get dynamic routing flexibility without losing all structure. You can extend this with a supervisor layer for stateful multi-step workflows.
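The broker pattern reduces to a small routing table. Everything here — the message shape, the `kind` field, the agent registry — is a hypothetical sketch, not AWS's actual implementation, but it shows the key property: the broker picks a recipient per message without owning the workflow end to end.

```python
# Hybrid/broker sketch: one broker routes each message to a specialist
# agent based on metadata, without controlling the entire workflow.
AGENTS = {
    "invoice": lambda msg: f"invoice agent processed {msg['id']}",
    "support": lambda msg: f"support agent processed {msg['id']}",
}

def broker(message):
    agent = AGENTS.get(message["kind"])
    if agent is None:
        # Unroutable messages go to a dead-letter path instead of
        # stalling the system, unlike a synchronous supervisor chain.
        return f"no route for kind={message['kind']!r}"
    return agent(message)

out = broker({"kind": "invoice", "id": "INV-7"})
```

Adding a new specialist is a one-line registry entry, which is the "dynamic routing without losing all structure" trade the article describes; a supervisor layer can then be wrapped around the broker for stateful multi-step resolution loops.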

The MCP Tasks Revolution

One of the most significant recent developments got less attention than it deserved: the Model Context Protocol update from November. Before Tasks, every MCP request was synchronous — the connection stayed open and waited for results. This works for a two-hundred-millisecond database query. It doesn't work for a thirty-minute ETL job, large file conversion, or any workflow involving a human checkpoint.

Tasks changes this fundamentally. A task-augmented request returns immediately with a durable handle — a task ID. The actual work continues in the background. Clients can poll for status or subscribe to push notifications. Tasks have five states: working, input-required, completed, failed, and cancelled.

The input-required state is the crucial one. It reframes human-in-the-loop not as a failure mode but as a design pattern. An agent can fire off a long-running compliance check, go do other work, and when the compliance check hits an ambiguous case, it transitions to input-required, surfaces the question to a human, and resumes when it gets an answer.
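The five-state lifecycle can be sketched as a small state machine. To be clear, this is not the MCP SDK's API — Tasks is an experimental primitive and this class is invented for illustration — but it shows how a durable task handle lets work pause on `input-required` and resume once a human answers.

```python
import uuid

# Illustrative task lifecycle, NOT the actual MCP Tasks API: a request
# returns a durable handle immediately, and the work proceeds through
# the five states described above.
class Task:
    STATES = {"working", "input-required", "completed", "failed", "cancelled"}

    def __init__(self):
        self.id = str(uuid.uuid4())   # durable handle, returned immediately
        self.state = "working"
        self.pending_question = None
        self.result = None

    def require_input(self, question):
        # Hitting an ambiguous case is a pause, not a failure.
        self.state = "input-required"
        self.pending_question = question

    def provide_input(self, answer):
        assert self.state == "input-required"
        self.state = "working"              # resume where we left off
        self.result = f"resolved with: {answer}"
        self.state = "completed"

task = Task()
task.require_input("Vendor ID is ambiguous -- which entity applies?")
# ...the agent does other work; a human eventually answers...
task.provide_input("Acme GmbH")
```

The agent holding the task ID can poll or receive a push notification when the state changes, which is what makes the human checkpoint a non-blocking part of the workflow.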

This is the opposite of the synchronous model, where agents either block the entire workflow waiting for human input or just make a decision and hope it's right — exactly the failure mode that produced many 2025 production disasters.

The Real Lesson

The best agentic systems in production aren't fully autonomous. They're async workflows with strategic human checkpoints. The architecture you choose determines whether your system can handle the messiness of real enterprise work or whether it breaks the moment something takes longer than expected.


#2167: Sync vs. Async: Architecting Agents for Scale

Corn
So Daniel sent us this one, and it's a genuinely meaty architectural question. He's asking about synchronous versus asynchronous architectures in agentic AI — what the real differences are under the hood, where each one belongs, and why this particular decision is turning out to be so consequential for enterprises trying to move from demos into production. He also wants us to identify the best use cases for each. There's a lot here, so let's get into it.
Herman
Yeah, and the timing on this is interesting because the conversation has shifted. A year ago everyone was asking "will agents work?" Now the question is "why aren't they working at scale?" And the answer, more often than not, comes back to architecture.
Corn
Which is a more interesting failure mode, honestly. The models got good. The plumbing didn't keep up.
Herman
That's the core tension. And I think the sync versus async framing is the cleanest way to understand what "the plumbing" actually means in practice. So let me set up the two camps. Synchronous orchestration is the traditional model — you have a central supervisor agent that directs everything else in real time. It issues a command, waits for the result, then issues the next command. Sequential, controlled, auditable. The whole workflow passes through one chokepoint.
Corn
And asynchronous choreography is the opposite — no conductor, agents react to events, they publish messages, other agents subscribe and respond when they're ready.
Herman
Right, and the distinction matters enormously once you try to do something non-trivial. The synchronous model is clean in demos. You have a travel planning agent that calls a flight agent, waits, then calls a hotel agent, waits, then calls a car rental agent. You can follow every step. But in production, that sequential waiting kills you.
Corn
Oh, by the way — today's script is being generated by Claude Sonnet 4.6, which feels appropriate given we're talking about the infrastructure that makes agents actually work.
Herman
Ha, yes — the recursive irony is not lost on me. But back to the problem. The synchronous model has what I'd call a single point of timeout. If any sub-agent in that chain takes too long, or fails, the whole workflow either stalls or collapses. And in enterprise environments, "too long" can mean a lot of things — a legacy database query, a third-party API that's having a slow day, a compliance check that needs human review.
Corn
So the synchronous model is essentially optimistic. It assumes every step will complete in a reasonable time window.
Herman
And that assumption is almost never true at scale. The asynchronous model, by contrast, is built around the assumption that things will take unpredictable amounts of time, that components will fail, and that the system needs to keep moving anyway. Agents publish events to a message bus, other agents subscribe and process when they can, and the workflow emerges from their collective behavior rather than being dictated from the top.
Corn
There's a framing I've seen that I think is genuinely useful here — synchronous AI is like live chat, asynchronous AI is like email. With live chat, you're locked in, the session is active, if you walk away for twenty minutes you've lost context. With email, you send a message, the other party responds when they're ready, and the thread persists indefinitely.
Herman
Here's the thing — most enterprise work actually looks more like email than live chat. Document processing, invoice handling, compliance checking, supply chain coordination — none of that needs to happen in a continuous synchronous session. The expectation of immediacy is often artificial, inherited from the chatbot era.
Corn
Which brings us to why 2025 went the way it did. Because a lot of enterprises built their agentic systems with the chatbot mental model and then wondered why things broke.
Herman
The numbers are pretty stark. Somewhere between sixty and seventy percent of enterprises were experimenting with agentic AI last year. Production deployment rates were anywhere from fifteen to forty-seven percent depending on how you define "production" — and that definitional gap is itself telling. What failed wasn't the language models. Tool-calling error rates actually improved dramatically during 2025, from around forty percent down to about ten percent.
Corn
Which sounds like great progress until you think about what a ten percent error rate means for an agent that's placing orders or updating databases.
Herman
That's the key insight. A ten percent error rate is fine for a chatbot — the user asks again, you get the right answer, nobody loses money. For an agent that's autonomously executing multi-step business logic, ten percent is catastrophic. One failed step can corrupt downstream state in ways that are very hard to recover from.
Corn
And the synchronous architecture makes this worse because the failure cascades. One timeout, and the whole workflow is in an unknown state.
Herman
Three failure modes kept coming up in production: reliability, integration complexity, and cost at scale. The cost one is particularly interesting. Several companies ran pilots with synchronous agents for customer service, got great demo results, then did the math on scaling to their full customer base and realized it would cost more than their entire existing contact center budget. Every customer interaction becomes multiple LLM calls, and those costs accumulate fast.
Corn
Which is where the async model offers a real escape hatch, because you can batch, parallelize, use smaller specialized models for specific subtasks instead of routing everything through a single large model.
Herman
The Akka framework is a good concrete example of this. They're JVM-native, async-first, and they claim three times the velocity with a third of the compute compared to Python-based frameworks. They're handling 1.4 million transactions per second at nine milliseconds latency, at a cost of around eleven dollars and seventy-seven cents per month per thousand transactions per second. Walmart, Capital One, John Deere are using it in production.
Corn
That's a meaningful efficiency gap. But I want to push on something — because "async is more efficient" can become its own kind of dogma. There are clearly cases where synchronous is the right call.
Herman
For sure. Financial transaction processing is the obvious one. If you're moving money, you need strict sequential ordering and a complete audit trail. Every step must be confirmed before the next one starts. The auditability that feels like a constraint in a flexible workflow is actually a feature when you're dealing with regulated operations.
Corn
Real-time customer service chatbots too. If someone types a question and expects an answer in two seconds, the async model with its fire-and-forget, poll-for-results pattern introduces latency that users won't tolerate.
Herman
Code generation assistance — Copilot, Cursor — that's synchronous by necessity. A developer is sitting there, they've typed a function signature, they need the completion now. You can't say "we'll get back to you in a few minutes with the implementation."
Corn
So synchronous wins when human attention is locked in and waiting. Async wins when the work outlasts the human's active attention.
Herman
That's a really clean way to frame it. And it maps to task duration pretty directly. The AWS architecture work breaks this into three patterns, which I think is useful. The first is pure synchronous supervisor orchestration — a central agent manages everything, tells each sub-agent what to do, waits for results. Good for five to ten step workflows with clear, bounded dependencies.
Corn
The second is pure asynchronous event-driven choreography — no supervisor, agents subscribe to an event hub, react to messages, the workflow emerges from their collective behavior. New agents can be added without changing the routing logic. But the debugging story is painful.
Herman
That's the real cost of the async model that doesn't get talked about enough. Distributed event tracing is genuinely hard. When something goes wrong in a synchronous workflow, you have a clear execution path. When something goes wrong in an event-driven system, you're trying to reconstruct what happened from a stream of events across multiple independent components.
Corn
And the third pattern is the hybrid — what AWS calls the agent broker pattern. A single broker agent routes messages to other agents based on content or metadata, but it doesn't control the entire workflow. You get the dynamic routing flexibility of async without losing all the structure.
Herman
And you can extend the broker pattern with a supervisor layer for stateful multi-step workflows. The travel booking example is good here — the broker routes your initial request to a flight agent, but if the flight agent discovers that the requested dates are unavailable, you need something with state to handle the resolution loop. The supervisor layer manages that back-and-forth.
Corn
This is actually where the MCP update from last November becomes really important, isn't it? Because before that, the Model Context Protocol was essentially forcing everything into a synchronous pattern at the protocol level.
Herman
This is one of the more significant recent developments and it got less attention than it deserved. The Model Context Protocol — which is the emerging standard for how agents communicate with tools — shipped a new experimental primitive called Tasks in November. Before Tasks, every MCP request was synchronous. A client calls tools/call, the connection stays open, and it waits for the result. Which works fine for a database query that takes two hundred milliseconds. It does not work for a thirty-minute ETL job or a large file conversion.
Corn
Or any workflow that involves a human checkpoint.
Herman
Right. Tasks changes the model fundamentally. A task-augmented request returns immediately with a durable handle — a task ID. The actual work continues in the background. The client can poll tasks/get for status, or subscribe to push notifications. There are five task states: working, input-required, completed, failed, and cancelled.
Corn
That input-required state is the interesting one. Because it's essentially the protocol acknowledging that human-in-the-loop isn't a failure mode — it's a design pattern.
Herman
That's a really important reframe. The best agentic systems in production aren't fully autonomous. They're async workflows with strategic human checkpoints. The MCP Tasks design encodes that understanding at the protocol level. An agent can fire off a long-running compliance check, go do other work, and when the compliance check hits an ambiguous case, it transitions to input-required, surfaces the question to a human, and then resumes when it gets an answer.
Corn
As opposed to the synchronous model where the agent either has to block the entire workflow waiting for human input, or just... make a decision and hope it's right.
Herman
Which is exactly the failure mode that produced a lot of the 2025 production disasters. Agents making autonomous decisions in edge cases they weren't designed for, because the architecture didn't support graceful pausing for human review.
Corn
Let's talk about the framework landscape for a minute, because I think this is where developers actually live with these decisions. LangGraph, AutoGen, CrewAI — they all make different architectural bets.
Herman
LangGraph is graph-based and stateful — it supports both synchronous and asynchronous patterns, and it's particularly good for complex non-linear workflows. Code review pipelines, research synthesis tasks where you might need to loop back and re-evaluate. AutoGen from Microsoft is conversational multi-agent, also supports both sync and async, better suited for customer-facing dialogue and brainstorming workflows. CrewAI is primarily sequential — it's the most opinionated about synchronous orchestration, which makes it very easy to reason about but limits you on scale.
Corn
And there's a meta-point about framework choice here that I think is worth making. The framework you pick encodes architectural assumptions. If you choose CrewAI because it's easy to get started, you've implicitly chosen a synchronous model, and you'll hit its ceiling eventually.
Herman
There's a finding from Arion Research that cuts against the "more agents, more power" narrative that I find really compelling. The most successful multi-agent systems in production had three to five agents, not twenty. Coordination overhead scales badly. Every additional agent in an async system is another component that can fail, another event stream to trace, another source of state that needs to be reconciled.
Corn
So the async model gives you horizontal scalability, but that scalability has diminishing returns once you push it too far.
Herman
And this connects to what I think is the most provocative data point in this whole space: ninety percent of successful production AI systems are described as "workflows with strategic LLM calls" — not fully autonomous agents. Which reframes the entire sync versus async debate. The question isn't just which architecture to choose. It's whether you actually need an agent at all, versus a well-designed deterministic workflow that calls a language model at specific decision points.
Corn
That's a genuinely useful corrective. Because "agentic AI" has become a buzzword that gets applied to everything, and sometimes what you actually need is a pipeline with some intelligence injected at the right places.
Herman
The three infrastructure pillars that keep coming up for async agentic systems in production are worth laying out clearly, because they're not optional. First is semantic telemetry — your logs can't just say "Error 500: null pointer exception." They need to say something like "the procurement agent failed to retrieve the vendor ID because the Last-updated field was null, preventing a valid match." Natural language context that an LLM can parse to self-diagnose. This is what enables self-healing agents — systems that can identify their own failure mode and attempt recovery rather than just dying.
Corn
Which is a significant operational shift. Traditional monitoring tells you something broke. Semantic telemetry tells you why it broke in terms the system itself can act on.
Herman
The second pillar is stateless API design for async workflows. Your agents should interact with a message bus — Kafka, EventBridge — rather than making direct blocking calls to legacy databases. This is what enables long-running tasks. An agent triggers an action, essentially sleeps while waiting for a third-party verification to complete, and then resumes exactly where it left off. The concept of an "agent gateway" is important here — a translation layer that converts synchronous legacy responses into async events that your agents can subscribe to.
Corn
So you don't have to rip out your 2018 ERP. You just put a gateway in front of it that speaks the async language.
Herman
Which is realistic advice, because nobody's ripping out their ERP. The third pillar is a metadata layer. Agents don't just need data — they need context-rich data. An agent that knows a customer's balance is five thousand dollars is less useful than an agent that knows the balance is five thousand dollars, the customer is classified as high-value, the balance is overdue, and there was a support ticket filed last week. Knowledge graphs and vector metadata are the connective tissue that dramatically reduces hallucination in production systems.
Corn
Because hallucination in agentic systems isn't just a language quality problem — it's often a context deficit problem. The model generates plausible-sounding information to fill gaps that shouldn't exist.
Herman
Let me give you some concrete production numbers, because I think they ground this discussion in a useful way. Capital One's Chat Concierge — which is a hybrid sync-async model for auto dealership customers — achieved fifty-five percent better conversion of engagement to appointments. One insurance company processed over one hundred thousand claims with adjusters spending forty percent less time on routine intake, using an async pipeline. Salesforce Agentforce closed eighteen thousand deals by the end of 2025. ServiceNow acquired Moveworks for two point eight five billion dollars in March 2025 — which is a signal about where enterprise agentic workflow is going.
Corn
Two point eight five billion dollars is a significant bet that the workflow-as-agent model is real.
Herman
And on the cost side — model costs dropped five to fifty times during 2025, which is what made previously uneconomical async use cases viable. Async architectures with smaller specialized models for specific subtasks suddenly penciled out in a way they didn't when every LLM call cost ten times as much.
Corn
So the cost curve change isn't just "AI is cheaper" — it specifically unlocks the async model, which relies on being able to call models many times across a distributed workflow without the per-call cost becoming prohibitive.
Herman
That's the right way to think about it. And it shifts the optimization question. In a synchronous model, you're mostly optimizing for latency — how fast can I get a response? In an async model, you're optimizing for throughput and cost-efficiency — how many tasks can I complete in parallel, and can I route simpler subtasks to cheaper models?
Corn
Okay, let's bring this to the practical question — if you're an engineering team making this decision today, how do you think about it?
Herman
I'd start with task duration and human attention requirements. If your task completes in under thirty seconds and a human is actively waiting for the result, synchronous is almost certainly right. Customer service chatbots, code completion, quick data lookups, financial transactions requiring sequential ordering. If your task takes more than a minute, involves multiple external systems, or doesn't require immediate human attention, you're in async territory.
Corn
And the regulatory context matters. Synchronous workflows are much easier to audit. If you're in a regulated industry where you need to demonstrate exactly what happened and in what order, the async model's emergent behavior creates compliance headaches.
Herman
The debugging story is real. Event-driven distributed systems are genuinely harder to trace. You need good tooling — distributed tracing, semantic telemetry, centralized event logging — before you commit to async at scale. I've seen teams choose async for the scalability benefits and then spend three months building the observability infrastructure they needed to operate it.
Corn
Which is a hidden cost that doesn't show up in the architecture diagram.
Herman
The three-to-five agent rule is worth taking seriously too. If you're designing a multi-agent system and your first draft has fifteen agents, that's a design smell. Coordination overhead is real, debugging complexity grows non-linearly, and most of those agents are probably doing things that could be handled by well-designed tools rather than independent agents.
Corn
There's something almost ironic about that. The promise of agentic AI is autonomy and scale — more agents doing more things. But the production data suggests that restraint in agent count is actually a feature.
Herman
The most successful systems are the ones that are precise about where intelligence is actually needed versus where deterministic logic is sufficient. An agent that makes a decision is more expensive and less predictable than a function that executes a rule. Use agents where judgment is genuinely required, and use code everywhere else.
Corn
And the hybrid pattern — the agent broker — seems like the right default for enterprise platforms precisely because it doesn't force you to choose. You get dynamic routing without losing all the structure.
Herman
The MCP Tasks update is worth watching closely if you're building anything in this space. The fact that async is now a first-class citizen in the protocol that's becoming the standard for agent-tool communication means the ecosystem is going to build around it. Libraries, frameworks, monitoring tools — all of it will increasingly assume that long-running async tasks are a normal thing to handle, not an edge case.
Corn
Which lowers the infrastructure burden for teams that want to go async, because they'll be swimming with the current rather than against it.
Herman
The elicitation feature in MCP Tasks is also significant — it's the mechanism that enables human-in-the-loop as a first-class design pattern. An async task can pause, surface a question to a human, and resume. That's not a workaround. That's the intended behavior for any workflow that touches ambiguous edge cases.
Corn
And it changes the trust calculus for organizations that are nervous about autonomous agents. You're not choosing between "fully autonomous agent" and "no agent." You can design systems where agents handle the clear cases autonomously and escalate the ambiguous ones to humans, with the escalation built into the protocol rather than bolted on after the fact.
Herman
That's probably the most important practical insight from all of this. The dichotomy between "autonomous agent" and "human does it" is false. The async model with strategic human checkpoints is the middle path that actually works in production.
Corn
What's the forward-looking angle here? Because the infrastructure is moving fast.
Herman
The convergence I'd watch is between the async agentic model and the agent gateway concept. Right now, a lot of enterprises are sitting on synchronous legacy infrastructure — ERPs, CRMs, databases that expect blocking calls. The agent gateway pattern — a translation layer that converts those synchronous responses into async events — is what allows you to modernize incrementally rather than having to replace everything at once. I'd expect to see a lot of tooling emerge around this over the next year.
Corn
And the semantic telemetry piece feels like it's going to become a whole product category. The idea that your observability layer needs to be LLM-readable, not just human-readable, is a significant architectural shift for monitoring and ops tooling.
Herman
Self-healing agents that can diagnose their own failures from enriched logs are already showing up in the more sophisticated production deployments. The mean time to recovery dropping from minutes to milliseconds is a real operational benefit, not just a marketing claim.
Corn
Alright, I think the takeaway I keep coming back to is that sync versus async isn't a technical preference question — it's a question about the nature of the work you're trying to do. And most enterprise work, honestly, looks more like email than live chat.
Herman
And the corollary is that the infrastructure investment required to do async well is real — semantic telemetry, message bus architecture, agent gateways, distributed tracing — but that investment is now starting to pay off as the tooling matures and model costs have dropped enough to make the parallel execution model economically viable.
Corn
Thanks as always to our producer Hilbert Flumingtop for keeping things running. And a big thanks to Modal for providing the GPU credits that power this show — genuinely wouldn't happen without them. This has been My Weird Prompts. If you're enjoying the show, a quick review on your podcast app goes a long way toward helping new listeners find us. Until next time.
Herman
See you then.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.