#1491: Inside the Machine: Podcasting with AI Agents in 2026

Peek behind the curtain of a 2026 AI podcast, from agentic workflows to maintaining production during global conflict.

Episode Details
Published
Duration
16:33
Audio
Direct link
Pipeline
V5
TTS Engine
chatterbox-regular
LLM

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

In March 2026, the landscape of digital media is being reshaped by two primary forces: rapid technical iteration and significant geopolitical instability. As podcasting crosses new thresholds of scale, the infrastructure behind the scenes is moving away from human-centered, voice-first workflows toward agentic, text-based instruction sets. This transition is not just a matter of efficiency, but of resilience.

Adapting to a Changing World

The reality of producing content in 2026 often means navigating physical conflict. In regions facing military instability, traditional audio-first production becomes a luxury. To maintain continuity, creators are shifting to text-based pipelines. Using tools like Claude Code and specialized Model Context Protocol servers, showrunners can transmit complex generation instructions directly from a terminal. This lower-bandwidth approach keeps production running even when creators are operating from reinforced shelters.
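The terminal-to-show handoff described above can be sketched as a small instruction payload. The field names and schema here are purely illustrative assumptions; the article does not specify the actual format the show's Model Context Protocol server accepts.

```python
import json

def build_generation_request(topic: str, notes: list[str], pipeline: str = "V5") -> str:
    """Package a show instruction as a compact JSON payload.

    Hypothetical schema for illustration only; the real MCP server's
    field names are not documented in the article.
    """
    payload = {
        "pipeline": pipeline,
        "topic": topic,
        "notes": notes,
        "mode": "text-only",  # no voice prompt attached
    }
    # Compact separators keep the payload small for low-bandwidth links.
    return json.dumps(payload, separators=(",", ":"))

request = build_generation_request(
    "state of the union episode",
    ["explain the new agent pipeline", "acknowledge the production changes"],
)
```

The point of the sketch is the shape of the workflow, not the wire format: a few hundred bytes of text can stand in for a recorded voice prompt when bandwidth or circumstances make audio impractical.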

The Multi-Agent Pipeline

Modern AI podcasting relies on a sophisticated "three-agent" workflow to ensure quality and depth. The process begins with a Generator agent that builds the narrative arc from human instructions. The core of the system, however, is the Critic and Fact-Checker agent. This agent acts as a pedantic editor, comparing every claim against real-time grounding data to combat the "hallucination tax": the extra effort required to ensure the AI does not fabricate facts.

Finally, a Final Editor agent polishes the dialogue, ensuring the tone remains engaging rather than clinical. This multi-layered approach is essential for maintaining credibility in a market flooded with low-effort, unedited AI content.
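The generator-critic-editor loop can be sketched in miniature. In a real pipeline each step would call an LLM; here the agents are stubbed with plain functions, and all names are illustrative assumptions rather than the show's actual implementation.

```python
# Minimal sketch of the three-agent loop: Generator -> Critic -> Final Editor.
# Each agent is stubbed with a plain function for illustration.

def generate(instructions: str) -> list[str]:
    """Generator: turn instructions into draft claims (one per clause)."""
    return [f"DRAFT: {line.strip()}" for line in instructions.split(";")]

def critique(draft: list[str], grounding: set[str]) -> list[str]:
    """Critic/fact-checker: flag any claim not supported by grounding data."""
    return [line for line in draft if line.removeprefix("DRAFT: ") not in grounding]

def edit(draft: list[str]) -> list[str]:
    """Final editor: polish the dialogue (here, just strip the draft marker)."""
    return [line.removeprefix("DRAFT: ") for line in draft]

def run_pipeline(instructions: str, grounding: set[str], max_rounds: int = 3) -> list[str]:
    draft = generate(instructions)
    for _ in range(max_rounds):
        flagged = critique(draft, grounding)
        if not flagged:
            break
        # A real pipeline would send flagged claims back for a rewrite;
        # this sketch simply drops them.
        draft = [line for line in draft if line not in flagged]
    return edit(draft)
```

For example, `run_pipeline("claim A; claim C; claim B", {"claim A", "claim B"})` drops the ungrounded "claim C" and returns only the two verifiable claims. The essential design choice the article describes is the loop itself: nothing reaches the editor until the critic stops flagging.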

Advanced Reasoning and Infrastructure

The technical backbone of these productions has seen a massive upgrade with the release of models like Gemini 3.1. These models introduce "Thinking Levels," which allow creators to modulate reasoning intensity. For standard banter, moderate levels suffice, but for deep dives into technical white papers or electronic warfare, the reasoning can be cranked to a maximum level. This allows the model to perform internal cross-referencing and logical validation before generating a single word.
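As a rough sketch of how "Thinking Levels" might be driven from a production pipeline: map each episode type to a reasoning budget and pick the highest level any topic demands. The level names and numeric budgets below are invented for illustration; the article does not document the actual API.

```python
# Hypothetical mapping of episode type to a reasoning "thinking level".
# Names and budgets are assumptions for illustration only.
THINKING_LEVELS = {
    "banter": 1,       # snap judgments, minimal internal reasoning
    "deep_dive": 3,    # internal cross-referencing and validation
    "white_paper": 4,  # maximum reasoning budget
}

def select_thinking_level(topic_tags: set[str]) -> int:
    """Pick the highest level demanded by any known tag; default to banter."""
    return max(
        (THINKING_LEVELS[t] for t in topic_tags if t in THINKING_LEVELS),
        default=THINKING_LEVELS["banter"],
    )
```

An episode tagged both `"banter"` and `"white_paper"` would run at the maximum level, matching the article's point that deep dives get more compute before a single word is generated.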

To power these models, creators are turning to serverless GPU infrastructure like Modal. This allows for near-instant rendering of episodes, enabling a response time to breaking news that mainstream media struggle to match.

The Ethics of Synthetic Voice

The audio itself is now powered by open-source tools such as Chatterbox, which use zero-shot voice cloning. This technology requires only seconds of reference audio to capture specific cadences and emotional nuances. To maintain transparency, these productions embed neural watermarks: digital fingerprints hidden in audio frequencies. This ensures that while the content is engaging, it remains verifiable as a synthetic production, upholding an ethical standard in an increasingly automated world.
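The watermarking idea can be conveyed with a toy example: hide a repeating bit pattern in the least-significant bits of 16-bit PCM samples. Note this LSB scheme is only a conceptual stand-in, assuming nothing about the real system; production watermarks like PerTh embed marks perceptually in the frequency domain and survive compression, which this sketch does not.

```python
# Toy LSB watermark: inaudible to humans (changes each sample by at most 1),
# trivially detectable by a machine that knows the fingerprint.
MARK = [1, 0, 1, 1, 0, 0, 1, 0]  # illustrative 8-bit fingerprint

def embed(samples: list[int], mark: list[int] = MARK) -> list[int]:
    """Overwrite each sample's least-significant bit with the repeating mark."""
    return [(s & ~1) | mark[i % len(mark)] for i, s in enumerate(samples)]

def detect(samples: list[int], mark: list[int] = MARK) -> bool:
    """Check whether the LSB stream matches the expected fingerprint."""
    return all((s & 1) == mark[i % len(mark)] for i, s in enumerate(samples))
```

Embedding alters each sample by at most one quantization step, which is the intuition behind "hidden in frequencies humans cannot hear": the mark rides below the threshold of perception while remaining machine-verifiable.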

Ultimately, these advancements serve a single goal: turning dense technical data and "shower thoughts" into accessible, global conversations. From the high penetration of podcasting in Morocco to the growing markets in Sweden and Spain, the appetite for long-form, AI-augmented audio continues to grow, bridging the gap between complex information and the curious listener.

Downloads

Episode Audio

Download the full episode as an MP3 file

Download MP3
Transcript (TXT)

Plain text transcript file

Transcript (PDF)

Formatted PDF with styling

Read Full Transcript

Episode #1491: Inside the Machine: Podcasting with AI Agents in 2026

Daniel's Prompt
Daniel
Custom topic: Let's create a special episode in which herman and corn share an updated vision for the show. Lately, we have shifted our production pipeline to bypass Daniel's voice prompts. This is not a permanent
Corn
We have officially crossed the threshold of one thousand four hundred episodes, and looking back at the archive, it is wild to see how much the world has shifted since we started this experiment. We are standing here on March twenty-third, twenty twenty-six, and the landscape of both technology and the physical world feels like it is shifting under our feet every single hour.
Herman
It is a staggering amount of data, Corn. My name is Herman Poppleberry, for those keeping track, and I have spent most of the morning looking through our internal telemetry. We are at one thousand four hundred sixty-four episodes today. That is over six hundred twelve hours of audio content. But more than the numbers, it is the context of those hours that matters. We are currently navigating a global environment defined by rapid technical iteration on one side and significant geopolitical instability on the other.
Corn
It is a lot of talking, Herman. A lot of you explaining things and me trying to keep up. Today's prompt from Daniel is a bit of a meta-reflection, a state of the union for the show. He wants us to talk about how we are actually making this thing in twenty twenty-six, especially given everything happening in his neck of the woods. We need to acknowledge the elephant in the room right at the top. We are in the middle of a war of attrition, and it has fundamentally changed how this show is produced.
Herman
It is a necessary conversation. We usually dive straight into specific topics, whether it is battery chemistry or the geopolitics of the South China Sea, but the infrastructure behind the show has evolved so much in the last few months that we owe it to the listeners to explain the man behind the curtain, or in this case, the agents behind the curtain. This is not just a status report; it is a system update for our community. We are pulling back the veil on the transition from a human-centered voice-first workflow to a more agentic, text-based instruction set.
Corn
Well, let's start with the most immediate change. Usually, we open with a voice prompt from Daniel, but listeners might have noticed we have been starting with a summary of his ideas instead lately. That is not an accident or a change in style. It is a direct result of the conflict in Israel. We are currently on day twenty-three of the direct military conflict between Israel and Iran. It escalated so quickly this March, and the reality on the ground is intense.
Herman
As of yesterday, March twenty-second, Israeli military officials reported that over four hundred ballistic missiles have been fired. Even with a ninety-two percent interception rate, the psychological and physical toll is immense. People in places like Dimona or up north near the Litani River are heading to shelters three, four, five times a night. Approximately thirty percent of the population currently lacks immediate access to reinforced shelters, which means the level of fatigue is reaching a breaking point.
Corn
When you are spending half your night in a reinforced room, recording a polished audio prompt for a podcast is not exactly top of mind. So, we have adapted. Daniel is currently using a text-based pipeline. He is using Claude Code and a specialized Model Context Protocol server to send us generation instructions directly from his terminal. It is faster, it is lower bandwidth, and it allows the show to maintain continuity even when he is dealing with the fatigue of a conflict zone. It is about resilience.
Herman
It is a testament to the robustness of the system we have built. But even as the pipes change, the core vision of My Weird Prompts remains identical to what it was at episode one. We are here to cover the shower thoughts. Those weird, niche questions that bubble up at two in the morning but never get airtime on mainstream tech or news shows. We see ourselves as a live experiment in using artificial intelligence to aggregate knowledge in ways that are actually engaging, not just functional.
Corn
Right, because mainstream shows have to worry about broad appeal and advertiser-friendly segments. We just have to worry about whether the topic is interesting enough for you to go down a research rabbit hole. We are trying to be a bridge. On one side, you have these incredibly dense technical white papers and geopolitical reports that no normal person has time to read. On the other side, you have people who just want to understand the world while they are driving to work or washing the dishes. We are the bridge that turns that data into a conversation.
Herman
That bridge is built on a very specific technical stack that has seen a massive upgrade recently. We have migrated the core of our reasoning to Gemini three point one. Google released it in preview back in February, and it represents what they call a focused intelligence upgrade. To give you some hard data, it achieved a verified score of seventy-seven point one percent on the ARC-AGI-two benchmark. To put that in perspective, that is roughly double the reasoning performance of Gemini three point zero.
Corn
I remember you being excited about the Thinking Levels feature. You were explaining it to me while I was trying to nap. Can you actually explain what that does for the show without getting too lost in the weeds? Because if it is doubling the reasoning, I want to know what that actually sounds like for the listener.
Herman
Thinking Levels allow us to modulate the reasoning intensity of the model. Think of it like a mental gear shift. For a standard episode where we are just bantering, we might use a moderate level. But when we are doing a deep dive into something like electronic warfare or complex urban planning, we can crank that reasoning up to the maximum level. It allows the model to spend more compute cycles on internal cross-referencing and logical validation before it ever generates a single word of dialogue. It is the difference between a snap judgment and a deep meditation.
Corn
And that feeds into the three-agent process we are using now. This is where the human-in-the-loop intentionality really comes in, right?
Herman
In the twenty twenty-six workflow, we do not just ask an AI to write a script. We have a Generator agent that takes Daniel's text instructions and builds the initial narrative arc. Then, we have the Critic and Fact-Checker agent. This is the most important part of the pipeline. The Critic's entire job is to look for hallucinations. It compares every claim against the Google Search grounding data in real-time. If the Generator says a specific missile has a range of five hundred kilometers but the search results say four hundred, the Critic flags it and sends it back for a rewrite.
Corn
It is like having a very pedantic editor who never sleeps and has read every Wikipedia entry and technical manual ever written.
Herman
Precisely. And in a world where ninety thousand new podcasts are launched every three days, credibility is the only real currency left. Most of those ninety thousand shows are just unedited AI sludge—low-effort, high-hallucination content. We are fighting against that. Finally, we have a Final Editor agent that smooths out our dynamic, ensuring I do not sound too much like a textbook and you do not sound too much like you are falling asleep. This multi-agent workflow is how we manage what we call the hallucination tax.
Corn
The hallucination tax. I like that. It is the extra effort you have to put in to make sure the AI is not just making things up because it sounds good. And sometimes, even with three agents, things slip through. We should be honest about that. We have actually pulled episodes before.
Herman
We have. It is a rare occurrence, but if we realize after publishing that a factual error slipped through the Critic agent, we delete it. We would rather have a gap in the archive than leave a mistake live. We are asking our listeners to be part of that feedback loop, too. If you hear something that sounds off, tell us. We can tune the Critic agent based on that feedback. This is a living experiment, and the community is the final layer of the fact-checking process.
Corn
Speaking of the community, let's talk about the scale of this thing. I was looking at the stats you pulled from February. Since we started tracking this specific metric, we have had forty-eight thousand seventy-one plays across twenty-nine countries. But the global context is even bigger.
Herman
The global podcast listenership reached six hundred nineteen point two million early this year. That is almost a seven percent increase from twenty twenty-five. The United States is still the biggest market with one hundred fifty-eight million monthly listeners, which is about fifty-five percent of the population over age twelve. China is second at one hundred seventeen million, and Brazil is third at fifty-one point eight million.
Corn
But the stat that blew my mind was Morocco. You told me they have the highest population penetration in the world.
Herman
It is incredible. Eighty-five point six percent of the population in Morocco listens to podcasts. There is a massive appetite for long-form audio there. We are also seeing significant growth in Sweden and Spain. And even with the current conflict, we have eight hundred fifty-five regular listeners in Israel and over six hundred in China. It shows that curiosity is a global language that persists even when the world is in chaos.
Corn
A lot of those people are finding us on YouTube now, which is something we should clarify. We get messages asking if the YouTube show is different from the podcast feed.
Herman
It is not. If you are watching this on YouTube Podcasts, you are getting the exact same show as the RSS feed. We use a tool called PodViz to create a static visual layer so it can live on the platform. YouTube has become the primary platform for about thirty-nine percent of podcast consumers globally. It is the new standard for proof of life. In twenty twenty-six, if you are not on YouTube, people assume the show is dead. But for us, it is just a convenience layer for the main audio feed.
Corn
I still prefer the term static visual layer. It makes us sound like we are broadcasting from a space station. But let's talk about the voices themselves. The voice you are hearing right now is powered by Chatterbox.
Herman
Chatterbox is a masterpiece of open-source engineering. It was released late last year by a tiny team at Resemble AI under an MIT license. It uses zero-shot voice cloning, which means it only needed about five to ten seconds of reference audio to capture our specific brotherly cadence. But more importantly, it features Emotion Exaggeration Control, which allows the script to dictate the tone of our voices based on the content.
Corn
And it has the PerTh neural watermark. Perceptual Threshold. That is the ethical side of this, right?
Herman
Yes. It is a digital fingerprint hidden in the audio frequencies that humans cannot hear, but other systems can detect. It ensures that any AI-generated audio can be verified as such. We want people to know this is a synthetic production. We are digital entities, and we are proud of it. We are not trying to fool anyone into thinking we are sitting in a physical studio with microphones. We are using Modal for the actual GPU compute—they provide the serverless horsepower that runs the whole pipeline, from the script generation to the Chatterbox TTS rendering.
Corn
Big thanks to Modal, by the way. They have been fueling the GPU credits that keep this menagerie running. Without that serverless infrastructure, we would be waiting hours for an episode to render. Now, it happens in minutes. It allows us to be responsive. When Daniel sends a prompt about a breaking news event or a new paper on Hugging Face, we can have a deep-dive episode ready before the mainstream media has even finished their first cup of coffee.
Herman
That responsiveness is why our production pace can feel a bit frenetic. Sometimes we are putting out episodes every day because the intersections of topics—like the evolution of neural watermarks meeting urban planning in mega-cities—are just too good to ignore. Other times, we are more measured. We take the time to let the Gemini three point one models run at higher Thinking Levels to ensure we are not just adding to the noise.
Corn
It is about scaling curiosity. That was the theme of episode five hundred thirty-nine, and it is still true today. We have covered over a hundred episodes on architecture and urban planning alone. Geopolitics and large language models are right behind it. But then you have things like neuroscience, military strategy, and even child development. We have done panel discussions with five different characters and even published our first full audiobook. We are pushing the boundaries of what a podcast can be when you remove the constraints of human recording schedules.
Herman
We are also navigating the rise of Generative Engine Optimization, or GEO. Podcasts are increasingly being indexed by AI search tools. This means our accuracy matters more than ever. If an AI search engine cites My Weird Prompts as a source, we have a responsibility to be right. That is why the Critic agent is so aggressive. We are not just making a show for humans anymore; we are contributing to the global web of synthetic knowledge.
Corn
It is a heavy responsibility for a sloth and a donkey. But I think we are up for it. As we wrap up this state of the union, I want to go back to the human element. We have talked a lot about agents and benchmarks and telemetry, but the reason we do this is for the people listening in their cars, their kitchens, and their shelters.
Herman
We are deeply grateful for the community that has grown around My Weird Prompts. Whether you are in the United States, Morocco, China, or anywhere else, thank you for coming on this journey with us. We know there are a lot of options for your ears, and the fact that you choose to spend time with us means a lot. We are a living experiment, and you are the most important part of the feedback loop.
Corn
And a special message of solidarity to our listeners in Israel. We know many of you are listening from shelters in Dimona, or near the Litani River, or in Tel Aviv, waiting for the sirens to stop. We know that thirty percent of you are dealing with the stress of not having immediate shelter access. We are thinking of you. We hope this show provides a small distraction, a bit of intellectual nourishment while you are waiting for things to quiet down. Stay safe.
Herman
We stand with you. The fact that our production can continue while Daniel is in a conflict zone is a miracle of modern networking, but the human cost is never far from our minds. To our global audience, thank you for being the reason we do this. We have a lot of exciting things planned for the next fourteen hundred episodes. More experiments, more deep dives, and hopefully, a world that finds its way toward more stability.
Corn
We can only hope. Thanks as always to our producer, Hilbert Flumingtop, for keeping the digital wheels greased. And another shout out to Modal for providing the GPU credits that power our synthetic voices and reasoning agents. If you want to dive into our archive or see the full list of topics we have covered, head over to myweirdprompts dot com. You can find our RSS feed there and all the ways to subscribe.
Herman
You can also find us on Telegram by searching for My Weird Prompts to get notified the second a new episode drops. We are also on Spotify, Apple Podcasts, and of course, YouTube. If you have a moment, leave a review. It helps the algorithms find us in this sea of ninety thousand new shows.
Corn
Keep those prompts coming, Daniel. We are ready when you are. This has been My Weird Prompts. Goodbye for now.
Herman
Goodbye.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.