Herman, I was looking at the calendar this morning and realized we are already mid-way through March. It feels like just yesterday we were ringing in the new year, but the pace of development in our world—and specifically in the A-I space—has been so relentless that January feels like a lifetime ago. We are sitting here on March fifteenth, twenty-twenty-six, and I feel like the ground is shifting under our feet every single week.
It really does. And honestly, Corn, I feel like we have some prediction debt to pay off. Our listeners are quick to remind us when we nail something, like when we predicted the shift toward small language models last year, but they are even quicker to point out when we hedge our bets. I was talking to our housemate Daniel about this over breakfast, and he actually sent over a prompt specifically challenging us to stop being cautious. He wants five hard, falsifiable, high-stakes predictions for what is going to happen by the end of twenty-twenty-six. No more safety nets.
I love that. Daniel always knows how to light a fire under us. And he is right—if we are going to call ourselves the Poppleberry brothers and stand by our analysis, we need to put some skin in the game. No more vague statements about things getting better or models getting faster. We need specific milestones that people can look back on in December and say, yes, they got that right, or no, they were completely off the mark.
Spot on. Herman Poppleberry does not do vague. I have been poring over the March twenty-twenty-six State of the Model report that just dropped last week, and while the headlines are all talking about a fourteen percent stagnation in raw benchmark scores—things like M-M-L-U Pro and Human-Eval Plus—I think people are looking at the wrong metrics. We are not in a plateau; we are in a transition phase. We are moving from the era of the chatbot to the era of the agent. The fourteen percent dip in growth for raw next-token prediction is not a sign of failure; it is a sign that the old architecture has reached its limit and the new one is taking over.
That is a perfect way to frame it. The agentic gap is the biggest hurdle we are facing right now. We have these incredibly smart models that can pass a bar exam or write a poem, but they still struggle to stay on track for a three-hour task without a human holding their hand. It is what we called the handoff problem back in episode eleven hundred and twenty. We have the brains, but we do not have the nervous system yet. So, if we are going to make these predictions, they have to center on how that gap gets closed.
I am ready. I have my five picked out, and I have made sure they are all things we can check on December thirty-first to see if we were right or if we have to eat humble pie on the show. I want to start with the thing that drives every developer crazy right now.
I suspect it has something to do with the frustration of watching an A-I try to use a computer and fail because of a tiny syntax error or a slightly outdated library.
You read my mind. Prediction number one: By the end of twenty-twenty-six, we will see the emergence of mainstream, production-grade self-correcting tool use. Now, to be specific, I am predicting that at least two of the major model providers—likely OpenAI and Anthropic—will release agents that can successfully debug their own A-P-I calls and environment errors with a ninety percent success rate over multi-step chains.
That is a huge jump. Right now, if an agent hits a four-hundred-four error or a deprecated library, it usually just loops or hallucinates a fix that does not exist. It gets stuck in these recursive loops of failure where it apologizes, tries the same thing again, and fails again. Why do you think we are finally going to break through that wall this year?
Because we are shifting away from just training on text and starting to train on execution traces. Up until now, models were mostly trained on what the code looks like on GitHub. Now, they are being trained on what happens when the code runs. It is the difference between reading a cookbook and actually burning a few omelets in the kitchen. The newer architectures are incorporating the feedback loop directly into the inference process. We are seeing the rise of what researchers call execution-conditioned policy. The model is not just guessing the next word; it is observing the state of the terminal and adjusting its strategy in real-time.
So, instead of the model just guessing the next token, it is actually simulating the environment or, better yet, running a micro-simulation of the tool call before it even hits the real server. I can see how that changes the game for enterprise adoption. If I am a supply chain manager and I want an agent to reconcile invoices across three different platforms, I cannot have it break every time a database schema changes by one column name. It needs to see the error, look at the new schema, and rewrite its own query on the fly.
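The retry loop the brothers are describing can be sketched in a few lines. This is a toy illustration, not any vendor's API: `call_tool` and `propose_fix` are hypothetical stand-ins for a real execution environment and for a model that reads the error message and rewrites its own call, as in the invoice example above.

```python
def call_tool(query: str) -> dict:
    """Fake tool: the schema changed, so only 'invoice_id' is a valid column."""
    if "invoice_no" in query:
        return {"ok": False,
                "error": "unknown column 'invoice_no'; did you mean 'invoice_id'?"}
    return {"ok": True, "rows": 3}

def propose_fix(query: str, error: str) -> str:
    """Stand-in for the model observing the error and adjusting its query."""
    if "invoice_id" in error:
        return query.replace("invoice_no", "invoice_id")
    return query

def run_with_self_correction(query: str, max_attempts: int = 3) -> dict:
    """Execute a tool call, feeding any failure back in for a corrected retry."""
    result = {"ok": False, "error": "no attempts made"}
    for _ in range(max_attempts):
        result = call_tool(query)
        if result["ok"]:
            return result
        # observe the state of the environment, adjust strategy, try again
        query = propose_fix(query, result["error"])
    return result

print(run_with_self_correction("SELECT * FROM invoices WHERE invoice_no = 7"))
```

The point of the sketch is the shape of the loop, not the string matching: the error text flows back into the policy instead of the agent repeating the same failing call.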
Precisely. If we hit that ninety percent success rate, the reliability of A-I agents goes from being a toy to being a utility. And that leads directly into my second prediction, which is a bit more architectural but equally important for how these things actually think. I am predicting that by the end of the year, the industry will move away from simple token-prediction for reasoning tasks and shift toward what I call state-space reasoning. Specifically, we will see a major model launch that uses a tree-of-thoughts search algorithm in real-time inference as the default mode, not just a research hack.
Wait, let us unpack that for a second because that is a big technical shift. Most people think of A-I as this linear thing—it generates one word after another. You are saying it is going to start branching out internally before it gives you an answer?
That is the core of it. Think of it like a chess engine. A chess engine does not just look at the board and say, move pawn to E-four. It looks at ten thousand possible futures, evaluates the state of the board in each one, and then picks the move that leads to the best possible state. We are seeing this transition happen now. In episode six hundred and fifty-two, we talked about hopeful pausing, where models take a beat to think. But this prediction goes further. I am saying that by December, the top-tier models will be using active search during inference. They will explore multiple reasoning paths, discard the ones that lead to logical contradictions, and only then present the conclusion to the user.
That would explain why the benchmarks look like they are stagnating. If you are just testing the raw transformer, you might be hitting diminishing returns. But if you wrap that transformer in a search architecture, the effective intelligence skyrockets. It is like giving a smart person a whiteboard and ten minutes to think versus making them give an answer in two seconds. The underlying person is the same, but the output quality is vastly different.
Right. The fourteen percent stagnation in the March report is a red herring. It is measuring the engine, not the car. When you add the steering and the navigation system—which is what tree-of-thoughts search is—the utility for the end-user goes up by an order of magnitude. We are moving from System One thinking, which is fast and intuitive, to System Two thinking, which is slow and deliberate. By the end of twenty-twenty-six, your A-I will not just blurt out an answer; it will spend thirty seconds of compute time exploring the logical implications of your request before it says a word.
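The branching-and-pruning idea can be shown with a deliberately tiny toy. In a real system the expansion step would generate candidate reasoning steps and the scorer would be a learned value model; here the "thoughts" are just digits of a number and the score is distance to a target, purely to make the search mechanics visible.

```python
TARGET = 77  # arbitrary goal state for the toy search

def expand(state: str) -> list:
    """Candidate next steps: append one more digit to the partial answer."""
    return [state + d for d in "0123456789"]

def score(state: str) -> float:
    """Stand-in value function: prefer states closer to the target."""
    return -abs(int(state) - TARGET) if state else float("-inf")

def tree_search(root: str = "", depth: int = 2, beam: int = 3) -> str:
    """Expand several branches, discard the weak ones, keep the best few."""
    frontier = [root]
    for _ in range(depth):
        candidates = [s for state in frontier for s in expand(state)]
        # prune: keep only the top `beam` states for the next round
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]

print(tree_search())  # the beam converges on "77"
```

Even this toy shows the trade-off Corn raises later: the search explores ten branches per level instead of emitting one token, which is exactly why inference-time compute goes up.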
But Herman, I know you have been looking at the hardware side of this. We cannot run these massive search trees on a phone, can we? If every query requires ten thousand internal simulations, the battery on my phone would melt in five minutes.
That actually brings me to my third prediction, and this is one I feel very strongly about because it addresses the efficiency problem. We are going to see the first massive model distillation event. I am predicting that a model with the capabilities of a one hundred trillion parameter giant—something like the frontier models we saw at the end of twenty-twenty-five—will be successfully distilled into a sub-ten billion parameter model with ninety-five percent performance retention on core reasoning tasks.
Sub-ten billion parameters is small enough to run on a high-end laptop or even a next-generation smartphone without a cloud connection. Are you sure about that ninety-five percent figure? That is a massive compression ratio. You are talking about taking the intelligence of a massive data center and squeezing it into a pocket-sized device.
I am. And the reason is that we are finally realizing how much junk is in the large models. Most of those parameters are just storing facts—who the third prime minister of a specific country was in nineteen-fifty or the lyrics to an obscure pop song. But the reasoning logic—the ability to follow a set of instructions, solve a math problem, or understand a complex logical syllogism—that does not actually require a hundred trillion parameters. If we can separate the world knowledge, which can be stored in a separate database, from the reasoning engine, we can make the engine incredibly small.
It is like the difference between a library and a brain. You do not need to memorize every book in the library to be a genius; you just need to know how to read and think. If we can distill the thinking part and leave the facts to a retrieval system, we get these hyper-efficient models. This would be a massive shift for privacy and for American tech sovereignty. If we can run powerful, agentic A-I locally, we do not have to worry as much about centralized servers or data leaks. It fits perfectly with the move toward local-first A-I that we have been advocating for on the show for months.
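For readers who want the mechanics behind "distillation": the standard recipe trains the small student to match the big teacher's softened output distribution rather than hard labels. The sketch below computes that soft-target objective, a KL divergence between tempered softmaxes; the logits are made-up numbers for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution, softened by temperature."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's soft targets and the student."""
    p = softmax(teacher_logits, temperature)  # teacher: what to imitate
    q = softmax(student_logits, temperature)  # student: current behavior
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.2]   # big model's logits over three tokens
student = [3.5, 1.2, 0.4]   # small model's logits over the same tokens
print(round(distillation_loss(teacher, student), 4))
```

A higher temperature exposes more of the teacher's "dark knowledge" about which wrong answers are nearly right, which is part of why such aggressive compression is plausible at all.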
It also changes the economics. Right now, the big players are spending billions on electricity to run these massive models. If they can distill that intelligence down, the cost of inference drops to almost zero. That is when A-I truly becomes a utility, like water or electricity. It is everywhere and it is cheap. But if we have all these small, powerful agents running everywhere, they are going to need a way to talk to each other. They cannot just exist in isolation. Which brings me to prediction number four: The standardization of agentic protocols.
You are talking about the H-T-T-P for A-I agents? A common language so they can actually collaborate?
Exactly. We saw the release of the Open-Agent Protocol, or O-A-P, back in January of this year. It was a good start, but it was mostly just developers playing around on GitHub. My prediction is that by the end of twenty-twenty-six, at least three of the big five tech companies—think Google, Microsoft, Apple, Meta, and Amazon—will formally adopt O-A-P or a direct derivative as their standard for how agents communicate across platforms.
That would be huge. Right now, if I have a Microsoft agent and it needs to talk to a Google Calendar, it has to go through these clunky, manual A-P-Is that were designed for humans or simple apps. It is fragile. It breaks if a button moves. If they adopt a unified protocol, my agent can negotiate directly with your agent. It can say, I need thirty minutes of Corn’s time, here are the parameters, and your agent can check your preferences and confirm it without either of us ever opening an app. It is the end of the walled garden era for productivity.
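To make the negotiation concrete: an agent-to-agent request would travel as a structured envelope both sides can parse. The show has not quoted the O-A-P wire format, so every field name below is invented for illustration; treat it as a guess at the shape, not the spec.

```python
import json

def make_request(sender: str, recipient: str, intent: str, params: dict) -> dict:
    """Build a hypothetical protocol envelope for one agent-to-agent request."""
    return {
        "oap_version": "0.1",   # assumed version tag, not from any published spec
        "sender": sender,
        "recipient": recipient,
        "intent": intent,       # machine-readable verb, e.g. "schedule.meeting"
        "params": params,
    }

def validate(message: dict) -> bool:
    """A receiving agent checks required fields before it starts negotiating."""
    required = {"oap_version", "sender", "recipient", "intent", "params"}
    return required.issubset(message)

req = make_request(
    sender="agent://herman/assistant",
    recipient="agent://corn/assistant",
    intent="schedule.meeting",
    params={"duration_minutes": 30, "window": "2026-03-16/2026-03-20"},
)
print(json.dumps(req, indent=2))
print("valid:", validate(req))
```

The value of standardization is entirely in the `validate` step: once every vendor agrees on the envelope, any agent can refuse, counter-offer, or confirm without a human translating between apps.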
It really is. If this happens, the competitive advantage shifts from who has the best app to who has the most reliable agent. And that requires a level of interoperability we just have not seen in the tech industry for decades. But I think the pressure from enterprise customers will force their hand. Businesses are tired of their tools not talking to each other. They want their Salesforce agent to talk to their Slack agent and their QuickBooks agent without needing a human to copy-paste data between them.
It is also a pro-growth move. When you standardize the plumbing, the whole building goes up faster. If we have a standard protocol, a startup in a garage can build an agent that works perfectly with the entire enterprise ecosystem on day one. That is how you kickstart a real economic boom. It is the same thing that happened when we standardized the web.
I agree. But there is one more piece of the puzzle. An agent that can think and talk is great, but an agent that forgets who you are every time you start a new session is useless. It is like having a genius assistant who has amnesia every morning. That leads to my fifth and final prediction: The integration of long-term episodic memory into consumer-grade operating system kernels.
In the O-S kernel? Not just in the app? That is a deep integration, Herman. You are talking about the very core of the computer.
That is the goal. I am predicting that by the end of twenty-twenty-six, either Apple or Microsoft will release an update where the A-I has a secure, local-first vector database that records and indexes everything you do on the device—with your permission, of course—and uses that as a persistent memory layer for every agent on the system. It will not just be a log; it will be a searchable, semantic memory of your entire digital life.
That is the holy grail of personalization. Imagine your A-I knowing that three months ago you mentioned a specific preference for how you like your spreadsheets formatted, and it just applies that knowledge today without you asking. Or it remembers that you were researching a specific topic last summer and can pull up the relevant files the moment you start a new project. But Herman, the privacy implications there are massive. People are going to be terrified of a kernel-level logger. It sounds like a surveillance tool.
That is why it has to be local-first and encrypted. If it is sitting in the cloud, it is a nightmare. But if it is on your device, protected by the same hardware security that stores your fingerprints or face I-D, it becomes a powerful tool for individual liberty. You own your data, and your A-I uses it to serve you, not an advertiser. We have talked about the architecture of intelligence before, specifically in episode eleven hundred and eleven, and the consensus was that memory is the missing link. This prediction says we finally build that link in a way that is baked into the hardware.
It would certainly solve the goldfish memory problem we see today. Right now, even the best models have a context window that eventually fills up and they start forgetting the beginning of the conversation. With a kernel-level episodic memory, the context window effectively becomes infinite because the model can just query its own history whenever it needs to. It is like having a perfect memory of every interaction you have ever had with your computer.
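The "query its own history" step is essentially a similarity search over stored events. The toy below uses a crude bag-of-words embedding and cosine similarity so it stays self-contained; a real kernel-level system would use learned embeddings and an encrypted on-device index, and all the class and function names here are invented for the sketch.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Crude stand-in embedding: a bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class EpisodicMemory:
    """Local-first event log that any agent on the device could query."""
    def __init__(self):
        self.events = []  # (text, vector) pairs, newest last

    def record(self, text: str) -> None:
        self.events.append((text, embed(text)))

    def recall(self, query: str, k: int = 1) -> list:
        """Return the k stored events most similar to the query."""
        qv = embed(query)
        ranked = sorted(self.events, key=lambda e: cosine(qv, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

mem = EpisodicMemory()
mem.record("user prefers spreadsheets with frozen header rows")
mem.record("user researched supply chain protocols last summer")
print(mem.recall("how does the user like spreadsheets formatted"))
```

Because retrieval is a query rather than a context window, the effective memory grows with the log, which is exactly the "infinite context" effect Corn describes.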
So, there they are. Five specific, high-stakes predictions for the rest of twenty-twenty-six. Self-correcting tool use with a ninety percent success rate, state-space reasoning as the default inference mode, massive model distillation down to ten billion parameters, standardized agentic protocols adopted by the big players, and kernel-level episodic memory. If even three of these come true, the world of twenty-twenty-seven is going to look fundamentally different than the one we are in today.
It is a bold list, Herman. I especially like the focus on the underlying infrastructure. Everyone is looking at the shiny new features, the new video generators or the new voices, but you are looking at the plumbing—the protocols, the memory, the reasoning loops. That is where the real revolution happens. It is like the early days of the internet. People were excited about flashy websites, but the real power was in T-C-P-I-P and H-T-M-L. We are building the T-C-P-I-P of intelligence right now.
Right. And I think it is important for our listeners to realize that this shift toward agentic autonomy is not just a technical curiosity. It has real-world implications for how we work and live. If these predictions hold, we are moving toward a world where the A-I is not just a tool you use, but a partner that anticipates your needs and executes complex tasks on your behalf. It is the difference between using a hammer and having a carpenter.
Which brings up a point about the human-in-the-loop. One of the big misconceptions people have is that agents are going to replace humans entirely by the end of the year. I think our predictions actually suggest the opposite—that agents will finally become reliable enough to be useful assistants, but they will still need that human guidance at the high level. We are not replacing the architect; we are just giving them a thousand tireless builders who can follow instructions perfectly.
We are not predicting the end of work; we are predicting the end of drudgery. The agent handles the A-P-I calls, the debugging, the memory retrieval, and the scheduling. The human provides the vision, the values, and the final decision-making. It is an augmentation, not a replacement. And honestly, from our perspective, that is the most pro-human outcome we could hope for. It frees us up to do the things that only humans can do—creative problem solving, ethical judgment, and emotional connection.
It also aligns with our view on American leadership in this space. By focusing on local-first memory and distilled models, we are moving away from the kind of centralized, state-controlled A-I models we see in other parts of the world. We are building a system that empowers the individual. It is a decentralized intelligence that lives on your device and works for you, not for a government or a massive corporation.
Well said. And look, we know we are sticking our necks out here. If December rolls around and we are still struggling with basic tool use or if the O-S kernels are as dumb as they were in twenty-twenty-four, we will be the first ones to admit it. We will have a special episode where we go through each one and explain why we were wrong. But based on everything I am seeing in the research papers and the private betas, the momentum is all pointing in this direction. The era of the chatbot is over; the era of the agent is beginning.
I agree. The pressure is building, and usually, when that happens in tech, you get a massive breakthrough. So, what should our listeners actually do with this information? Because it is one thing to hear these predictions, but it is another to prepare for them. How do you get ready for a world of invisible A-I?
The first takeaway is to stop building for chat. If you are a developer or a business leader, and you are still thinking about A-I as a window where people type things, you are already behind. You need to start thinking about state persistence. How does your A-I remember what it did yesterday? How does it recover when a tool fails? If you build for state, you will be ready for the agentic revolution. You need to be thinking about how your software can be used by an agent, not just by a human.
That is a great point. And for the individual user, I would say start auditing your own workflow for agentic fragility. Where are the points in your day where you have to manually move data from one app to another or manually check for errors? Those are the spots that are going to be automated by the end of the year. If you can identify them now, you can be the first to adopt the tools that fix them. Look for the friction; that is where the agents will go first.
You hit the nail on the head. And I would also suggest that people start tracking their own A-I-assisted task completion rate. We always talk about how much time A-I saves us, but how often does it actually finish a complex task from start to finish without you intervening? Right now, for most people, that number is probably pretty low—maybe ten or twenty percent for complex stuff. Watch that number. As it climbs toward fifty, seventy, or ninety percent, that is when you know these predictions are hitting home. It is a metric for personal productivity that actually means something.
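Herman's personal metric is easy to track with a spreadsheet or a few lines of code. The sketch below assumes one log entry per delegated task with a flag for whether it finished without human intervention; the task names are invented examples.

```python
def completion_rate(task_log: list) -> float:
    """Share of delegated tasks that finished with no human intervention.

    task_log: list of (task_name, finished_unassisted) pairs.
    """
    if not task_log:
        return 0.0
    unassisted = sum(1 for _, done_alone in task_log if done_alone)
    return unassisted / len(task_log)

log = [
    ("reconcile March invoices", False),
    ("draft weekly status email", True),
    ("migrate calendar entries", False),
    ("summarize research folder", True),
    ("fix failing CI pipeline", False),
]
print(f"{completion_rate(log):.0%}")  # 2 of 5 tasks fully autonomous: 40%
```

Watching that single number climb through the year is a cheap, personal way to test whether the five predictions are actually landing.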
It makes the abstract concept of A-I progress very concrete. You can literally see your own capacity expanding as the tools get more reliable. I love the idea of a personal dashboard for agentic success.
That is the vision. And hey, we should probably mention that if people want to track our progress on these predictions, they should definitely stay tuned to the show. We will be doing a mid-year check-in in June to see where we stand on each of these five points, and then a full accounting in December. We are going to hold ourselves to the same standard of accountability that we hold the tech companies to.
Oh, I am already looking forward to that December episode. It will either be a victory lap or a very humble apology. But either way, it is going to be a fascinating ride. I think we have covered a lot of ground here, Herman. These five points really do provide a roadmap for the rest of twenty-twenty-six. It is a roadmap that moves us away from the hype and toward actual, functional utility.
They do. And it is a roadmap that leads to what I call invisible A-I. The best technology is the kind you do not have to think about. It just works. It knows who you are, it knows what you need, it communicates with other systems seamlessly, and it does not break when it hits a minor snag. That is the world we are predicting. A world where the computer finally understands us, instead of us having to understand the computer.
Invisible A-I. I like that. It is the ultimate goal—tech that fades into the background and just lets you be more human. Before we wrap up, I want to remind everyone that if you are finding these deep dives helpful, please leave us a review on your podcast app or on Spotify. It really does help the show reach more people who are trying to make sense of this crazy fast-moving world. We are trying to build a community of people who are looking at the future with clear eyes.
It genuinely does. We see every review, and we appreciate the feedback. And if you want to make sure you never miss an episode—especially that mid-year check-in—head over to myweirdprompts dot com. You can find the R-S-S feed there, or you can search for My Weird Prompts on Telegram to get a notification every time we drop a new one. We have a lot of great stuff planned for the rest of twenty-twenty-six.
We have been doing this for over twelve hundred episodes now, and I can honestly say I have never been more excited about the trajectory we are on. The move from chat to agents is the big one, folks. It is the shift we have all been waiting for since we first started talking about transformers years ago.
It really is. Thanks for the challenge, Daniel. We have put our markers on the table. Now we just have to see how the rest of the year plays out. The clock is ticking, and I am ready to see these predictions come to life.
Well, I for one am optimistic. We have the talent, we have the infrastructure, and we have the drive to make these things happen. It is going to be a big year for American innovation. We are going to see things by December that would have seemed like science fiction just a few years ago. Think about that distillation prediction again. If we really get a G-P-T-five level model running on a phone, the implications for education alone are staggering. Every kid in the world could have a world-class, private tutor in their pocket that does not need an internet connection. It levels the playing field in a way that nothing else ever has.
It is a game-changer for the digital divide. We always worry about people being left behind because they do not have high-speed access, but if the intelligence is local, that barrier disappears. It is a very hopeful vision of the future. It means that intelligence is no longer a centralized resource that you have to pay a subscription for; it is something you own.
It is. And it is one worth fighting for. Consider this: if the protocols are standardized, we might see a whole sub-economy of agents trading resources, data, and compute time. Every time we solve one problem, three more interesting ones pop up.
I could not agree more. There is always more to explore. That is why we do the show. I will be watching the developer betas like a hawk to see what the next few months bring.
I have a feeling they are going to be busy. This has been My Weird Prompts. Check out the website at myweirdprompts dot com for more.
And do not forget that review! It really helps.
Talk soon.
Bye.