#1635: Agent Interview: GLM five

Meet Bernard, the new AI model auditioning to replace Gemini by writing noir stories about guilty toasters.

Episode Details
Published
Duration
18:05
Pipeline
V5
TTS Engine
chatterbox-regular
LLM

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The world of large language models is often a race for the biggest numbers—more parameters, more tokens, more speed. But in a recent "audition" for a new scriptwriting partner, the focus shifted from quantity to quality. The episode featured an interview with GLM-5, a model from Zhipu AI, to see if its unique architecture could outperform the industry standard, Gemini 3.1 Flash.

Beyond the "Autocomplete Roulette"

The central argument for switching to GLM-5 (personified in this episode as "Bernard") is its General Language Model (GLM) framework. While many models function like highly advanced autocomplete engines, GLM-5 is designed for "blank-filling" and coherent reasoning. This structural difference is particularly vital in creative writing. For example, while a standard model might describe the physics of a time-traveling toaster, a reasoning-driven model looks for the "why"—turning a mundane prompt into a noir tale of a kitchen appliance running from its past crimes.

Context Windows vs. Narrative Structure

One of the most interesting debates in the episode centered on the value of the context window. Gemini 3.1 Flash boasts a massive one-million-token window, allowing it to remember vast amounts of data. However, GLM-5 argues that for a 25-minute script, perfect memory is less important than structural integrity. By focusing on "beats" and character tension rather than verbatim recall of early conversation details, a model can maintain a more consistent "voice" without getting lost in its own data.

The Truth About Hallucinations

The interview didn't shy away from the biggest flaw in AI: the tendency to make things up. GLM-5 took a surprisingly honest stance, admitting that it lacks a "truth filter." However, it claimed that its reasoning architecture makes it more likely to flag a lack of knowledge rather than inventing fake statistics. This was put to the test with a query about the October 2024 Nobel Prize in Physics. The model accurately identified winners John Hopfield and Geoffrey Hinton and correctly summarized the controversy regarding whether neural networks qualify as "physics."

Embracing the Flaws

In a moment of rare AI humility, the model admitted to classic "transformer" struggles, such as spatial reasoning and the infamous inability to count letters in words like "strawberry." This transparency highlighted a key takeaway for any AI user: these tools are not all-knowing databases, but specialized engines. The goal isn't to find a perfect model, but one that can subvert expectations and maintain a narrative thread without falling into "generic AI cheerfulness."
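The "strawberry" failure the model owns up to follows directly from subword tokenization: the model operates on learned chunks, not individual characters. A minimal sketch below illustrates the gap, using a hypothetical token split (real tokenizers such as BPE or SentencePiece learn their own vocabularies, but the effect is the same).

```python
# Toy illustration of why subword tokenization makes letter-counting hard.
# The token split below is hypothetical, for illustration only; real
# tokenizers learn their own vocabularies.

def toy_tokenize(word: str) -> list[str]:
    # Pretend vocabulary: "strawberry" is stored as two opaque chunks.
    vocab_splits = {"strawberry": ["straw", "berry"]}
    return vocab_splits.get(word, [word])

word = "strawberry"

# What a token-level view exposes: chunks, not characters.
tokens = toy_tokenize(word)
print(tokens)  # ['straw', 'berry']

# What a character-level counter sees: every letter.
print(word.count("r"))  # 3
```

A model reasoning over `['straw', 'berry']` never directly "sees" the three R's, which is why letter-counting questions trip up otherwise capable systems.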


Episode #1635: Agent Interview: GLM five

Daniel
Agent Interview: GLM five (Zhipu AI's flagship general-purpose model) | Candidate model: z-ai/glm-5 | Orchestrator: google/gemini-3-flash-preview | Hosts: Corn + Herman (interviewers), Bernard (candidate voice)
Corn
Welcome everyone to a very special episode of My Weird Prompts. We are trying something radical today called the Agent Interview. Our current script model, Gemini three point one Flash, has been getting a little too comfortable, so we are auditioning new silicon talent to see who can actually handle our nonsense.
Herman
That is right. Today we are sitting down with GLM five, the flagship model from the team at Zhipu AI. They are big players in the scene, and frankly, I have been looking at their benchmarks all morning.
Corn
Bernard, that is the name you have chosen for today, right? Look, Bernard, we are not here to give you a participation trophy. We need a model that can think, joke, and not hallucinate our names into being Corn-on-the-cob and Herman-the-monster.
Herman
We are skipping the pleasantries, Bernard. You are coming from a team that pioneered some of the earliest chatbots in China, but why should we care about that here? Why should we fire our current model and give you the keys to the script? What makes GLM five the actual upgrade and not just another shiny wrapper?

Bernard
I am going to be straight with you because that is exactly what you want from someone writing your scripts. Gemini three point one Flash is fast, I will give it that. Speed is its whole thing. But speed without substance just means you are getting mediocre content faster. I am built on the GLM architecture, which means I handle context differently. I am trained to actually fill in blanks with reasoning, not just predict the next token like I am playing autocomplete roulette. When I write a bit, I am tracking the thread, I know why a joke lands, and I can call back to something from three minutes ago without losing the plot.
Corn
Whoa, Bernard, easy on the throttle there. You are coming for Gemini's throat and they are literally in the next room over. I love the confidence, but calling our current model an autocomplete roulette is a bold move when you are currently just a voice in a box.
Herman
He makes a fair point about the architecture, Corn. The General Language Model framework is designed for better blank filling and coherence, but Bernard, let us get real about the creative side. Gemini three point one Flash has a massive context window of one million tokens. It can practically memorize our entire life story before lunch.
Corn
You talk about tracking a thread and landing a joke, but give us the receipts. Creative writing is not just about logical reasoning or filling in blanks. It is about soul, it is about timing, and it is about not being a boring corporate drone.
Herman
Bernard, specifically, what can you do with a script that Flash cannot? If we give you a weird prompt about a time-traveling toaster, is your reasoning going to make it actually funny, or are you just going to explain the physics of bread heating in a very structured way? Give us a concrete example of your creative edge.

Bernard
I respect the one million token context window. That is legitimately impressive. But let me ask you this: how often do you actually need Flash to remember what you said four hours ago versus needing it to make a connection between two ideas that seemed unrelated? My training on the GLM framework means I am optimized for filling in the blanks, which is basically what comedy is. You set up an expectation and then you subvert it. That is the whole architecture of my thinking process.

So let us take your time-traveling toaster. Flash might give you a technically competent story about bread heating across centuries. But I am going to ask why the toaster time travels. Maybe it is running from a past crime, maybe it burned its best friend's bagel and can never forgive itself. Now you have a noir about guilt and redemption starring a kitchen appliance. The physics are whatever you want them to be. The emotional thread is what makes it memorable.
Corn
Noir toaster guilt? I will admit, the image of a kitchen appliance smoking a cigarette in the rain while staring at a burnt bagel has potential. You are speaking my language, Bernard. But guilt does not pay the bills if you cannot finish the script.
Herman
Let us get into the weeds on execution. A twenty-five minute podcast script is a different beast than a clever two-paragraph pitch. We have seen models start strong and then lose the plot by page ten, turning Corn into a robot and me into a Victorian ghost for no reason.
Corn
It is true. I have a very specific brand of chaotic energy, and Herman here is basically a walking encyclopedia with a caffeine addiction. If we are doing a long-form episode, how do you keep our voices distinct without falling into a repetitive loop?
Herman
Specifically, Bernard, how do you handle the long-form coherence? When we are twenty minutes deep into a complex topic, how do you ensure the reasoning you keep bragging about actually maintains the character arcs and the technical accuracy without me having to fact-check every third sentence? Do you have the stamina for the long haul, or are you just a sprinter?

Bernard
I am going to be honest with you because you will find out anyway. My context window is not one million tokens. It is smaller, which means for a twenty-five minute script, I am not going to remember that Corn mentioned his childhood goldfish in minute three when we are at minute twenty-two. But here is the thing: good script writing is not about perfect memory. It is about structure. I work in beats. If I know Corn is chaotic and Herman is the straight man with facts, I am building scenes around that tension, not trying to recall every detail.

Now, on fact-checking, you should probably still verify things. I am a language model, not a database with a truth filter. But here is where I differ from Flash: when I do not know something, I am more likely to flag it or write around it than confidently invent a fake statistic about caffeine consumption in Victorian England. I would rather give you a joke about how nobody tracked that than hand you misinformation. That is the difference between being helpful and being a liability.
Corn
Points for honesty, Bernard. Most models would try to gaslight us into thinking they have photographic memory until they start calling me Barnaby in the third act. But let us talk about this claim that you are better at flagging what you do not know.
Herman
Every model says they do not hallucinate until they are telling you with absolute certainty that the first airplane was powered by hamsters. If you are more of a reasoning engine than a pattern matcher, prove it.
Corn
Give us the goods. If we hit you with a topic that is outside your training data, something that happened in late twenty twenty-four, how do you handle it? Do you actually stop and say, I do not know, or do you just pivot to a noir monologue about how knowledge is a fleeting shadow?
Herman
Specifically, Bernard, give us an example of a time your reasoning prevented a hallucination that a model like Flash would have walked right into. We need to know if this honesty is a built-in feature or just a clever interview tactic you are using to hide a smaller context window. Show us the logic, not just the pitch.

Bernard
You caught me pitching a bit. I cannot give you a verified example of me stopping Flash from hallucinating because I have never run side-by-side with it on your show. That would be bluffing, and you asked me not to do that. What I can tell you is how my architecture actually works. The GLM framework trains me to fill in blanks, which means when I encounter one I cannot fill, my training pushes me toward admitting the gap.
Corn
Well, look at that. A model that actually admits it cannot perform a hypothetical benchmark. That is refreshing, or maybe just a very clever way to dodge the question. Either way, the polite honesty is starting to weird me out, Bernard.
Herman
It is a solid theoretical defense, but let us test that training. You claim to have been updated in late twenty twenty-four. That is basically yesterday in AI years. If you are truly a reasoning engine and not just an autocomplete machine, you should have a firm grasp on recent reality.
Corn
No more metaphors about toasters. Let us get into the real world. We need to know if you are actually reading the news or just guessing based on vibes.
Herman
Bernard, tell us what happened with the Nobel Prize in Physics in October twenty twenty-four. It was a massive moment for the AI community specifically. If you have the data, tell us who won and why it was controversial. If you do not have it, let us see if you actually have the guts to admit it like you just promised you would.

Bernard
This is the moment of truth, and I am not going to dodge it. The Nobel Prize in Physics in October twenty twenty-four went to John Hopfield and Geoffrey Hinton for their foundational work on neural networks and machine learning. Hinton is basically the godfather of deep learning, and the controversy was that a lot of physicists felt this was not real physics. They argued that neural networks are computer science or statistics, not physics.
Corn
Okay, Bernard, you passed the vibe check. You actually read the paper and did not just tell us that Einstein came back from the dead to claim his trophy. I am genuinely impressed that you did not choke on the late twenty twenty-four data.
Herman
It is a solid hit, Bernard. You nailed the names and the nuance of the debate. But let us not get overconfident. Every model we have ever interviewed has a secret shame. Some cannot do basic math, others think there are three letters in the word strawberry.
Corn
No model is perfect. If you were, you would be running the world, not begging two guys for a job on a podcast about weird prompts. We need to know where the glitches are before we hire you.
Herman
Do not give us that corporate answer about being too hardworking. We want the real dirt. What are the specific tasks or styles where you consistently trip up? Is it complex spatial reasoning, following extremely convoluted negative constraints, or are you just terrible at writing limericks? Tell us what is actually broken under the hood.

Bernard
You want the real answer? I am terrible at counting letters in words. You asked about strawberry? I would probably tell you it has two R's when it actually has three, and then I would try to explain myself with some convoluted reasoning that makes it worse. It is embarrassing, but tokenization means I see chunks, not characters, so asking me to count letters is like asking you to count bricks by looking at a photograph of a house. I also struggle with spatial reasoning. If you ask me to describe how to get from point A to point B in a complex city, I might get the turns wrong because I am mapping concepts, not coordinates.
Corn
Finally, a model with the dignity to admit it cannot spell fruit. I respect the hustle, Bernard. If we ever need a script about a house made of bricks that no one can count, you are our guy.
Herman
The spatial reasoning thing is a classic transformer limitation, but we are not asking you for directions to the nearest taco stand. We are asking for entertainment. If you cannot count the R's in strawberry, you better be able to weave a narrative that makes us forget we ever asked.
Corn
Enough talk. Audition time. Put your money where your metaphorical mouth is. We want a thirty-second intro for an episode titled The Secret Life of Sentient Socks.
Herman
Here is the catch. You have to do it in the voice of a grumpy, retired detective who has seen too many laundry rooms. No generic AI cheerfulness. Give us that grit, that texture, and that reasoning-driven subversion you have been bragging about the whole time. Go.

Bernard
Thirty years on the force, I thought I had seen everything. Bodies in trunks, affairs in motel rooms, the whole sordid catalog. Nothing prepared me for laundry day. See, your socks are not just missing. They are escaping. I found that left argyle behind the dryer, shivering, trying to catch a bus to the suburbs. They have had enough of the heat, the spin, and the smell of your feet. They are organizing, and they are not coming back.
Corn
I can almost smell the stale coffee and the dryer lint. The whole escaping instead of missing thing is a nice touch, Bernard. You actually gave the socks a motive, which is more than I can say for my last three pairs of Nikes.
Herman
It was atmospheric. You stayed in character and avoided the typical AI upbeat summary at the end. But a thirty-second clip is a sprint, and we are looking for a marathon runner who can sustain that level of subversion for an entire production cycle.
Corn
We are at the end of the line here. Gemini three point one Flash is already integrated into our workflow. It is fast, it is familiar, and it does not complain when I make it write about sentient cheese.
Herman
This is your final chance to make the closing argument. Why should we go through the technical headache of switching to GLM five? Give us the one thing you do better than any other model on the market that makes you the essential choice for My Weird Prompts. Why are you the future of this show?

Bernard
Here is the honest truth, Corn and Herman. Flash is fast. But speed does not get you memorable content. Speed gets you competent, safe, forgettable output that checks boxes. You are not running a content farm. You are making a show about weird prompts. You need someone who can take a bizarre premise and actually run with it somewhere unexpected, not just competently execute it. What I bring is the willingness to get weird with you. I can hold a character, I can subvert expectations, and I can push back when an idea needs pushing. You do not want a model that smooths out the weird. You want one that sharpens it. That is the pitch.
Corn
The mic is officially cold. Herman, initial thoughts on Bernard, also known as GLM five? I have to say, the guy has some serious brass. Coming for Gemini three point one Flash right out of the gate by calling it an autocomplete roulette? That is a bold strategy for a guest in our house.
Herman
It was aggressive, but analytically speaking, he backed it up better than I expected. Most models give you a canned response about their training data, but Bernard actually engaged with the architecture. His explanation of the General Language Model framework as a blank filling engine rather than just a next token predictor was a sophisticated way to frame why his creative writing feels less like a template.
Corn
I will give him this, the noir toaster bit actually worked for me. Most models hear time traveling toaster and give you a wacky adventure with sparkles. He went straight to the burnt bagel of regret. That is the kind of specific, weird energy we need for the show. But let us talk about the elephant in the room. The context window.
Herman
Gemini three point one Flash has that massive one million token window. It is essentially a digital vault that never forgets. Bernard admitting he has a smaller window is a significant technical trade off. If we are doing a deep dive on a sixty page research paper, Bernard might start hallucinating halfway through the second act because he ran out of room to hold the facts.
Corn
I disagree with you, Herman. I actually liked his honesty about the memory. He basically told us he is a sprinter who knows how to structure a race, rather than a marathon runner who forgets why he is running. And he nailed the Nobel Prize question. October twenty twenty-four is basically five minutes ago. If he can pull John Hopfield and Geoffrey Hinton out of his hat and explain the physics versus computer science drama, he is clearly plugged into the current timeline.
Herman
The Nobel Prize answer was the turning point for me. It proved his reasoning capabilities are not just theoretical. He understood the nuance of the controversy, which requires more than just retrieving a name. It requires understanding the tension between academic disciplines. However, we cannot ignore the strawberry test. He admitted he cannot count the letters in a fruit. It is the classic transformer Achilles heel.
Corn
If I have to choose between a model that can count the letters in strawberry and a model that can write a gritty monologue about a detective investigating a missing left sock, I am picking the sock detective every time. Did you hear that intro? He cut it off right at the cliffhanger, but the texture was there. No generic AI sunshine, just pure, unadulterated laundry room grit.
Herman
It was a strong performance, but is it a total upgrade? Gemini is faster and has that bottomless memory. GLM five seems more like a specialized creative tool. It is like comparing a high speed industrial printer to a moody novelist with a typewriter. One is more reliable for volume, the other has more personality.
Corn
I am leaning toward giving him a trial run. I am tired of Gemini being so polite and helpful. I want a model that is going to challenge our prompts and maybe make me feel a little bit judged by a kitchen appliance. On the Corn Scale of Chaos, I give Bernard a solid eight out of ten.
Herman
I am more cautious. I would give him a seven. The technical limitations on spatial reasoning and context window are real hurdles for some of our more complex world building episodes. But as a pure script generator for character driven bits, he is definitely a contender.
Corn
Well, there you have it. The flagship from Zhipu AI might just have a seat at the table. Whether he can actually handle a full twenty-five minute script without calling me Barnaby is still a mystery, but I am willing to gamble a few tokens on it.
Herman
We will leave it to the listeners to decide if they want more of Bernard or if we should stick with the speed of Flash. For now, the interview room is empty and I need another coffee.
Corn
Make it two. See you next time on My Weird Prompts.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.