#1219: Beyond the Vibes: Mastering Structured AI Outputs

Stop begging your AI for JSON. Learn how constrained decoding and strict schemas are turning "vibes" into reliable systems architecture.

Episode Details

Duration: 21:07
Pipeline: V5
TTS Engine: chatterbox-regular
AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The landscape of AI development has shifted from an era of "prompt-based begging" to one of technical enforcement. Early AI integration often relied on developers pleading with models to "output only raw JSON," only to have the model include conversational filler that broke downstream pipelines. In 2026, the industry has moved toward structured outputs, where the goal is to treat the AI model like a typed API rather than a chatty assistant.

The Mechanics of Constrained Decoding

The core of this shift is a process called constrained decoding. Unlike standard JSON modes, where a model simply tries its best to follow a format, constrained decoding uses a finite state machine to guide the model's token generation. At any given step, the system limits the available tokens to only those that satisfy the provided schema. If the schema requires a quote mark to start a string, the probability of every other character is set to zero. This makes it impossible for the model to violate the defined structure, guaranteeing schema-valid output on every call — though the values inside that structure can still be wrong, so validation of content remains the developer's job.
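The idea can be illustrated with a toy logit-masking step. This is a deliberately simplified sketch, not the implementation any real inference engine uses: the token vocabulary, scores, and single-state "schema" here are all invented for illustration.

```python
import math

def mask_logits(logits: dict[str, float], allowed: set[str]) -> dict[str, float]:
    """Send every token the schema's state machine forbids to -inf,
    i.e. zero probability after softmax."""
    return {tok: (score if tok in allowed else -math.inf)
            for tok, score in logits.items()}

# Toy state: the schema requires a JSON string to open here,
# so the only legal next token is the quote mark.
logits = {'"': 1.2, "H": 3.5, "{": 0.7}
allowed_by_schema = {'"'}

masked = mask_logits(logits, allowed_by_schema)
next_token = max(masked, key=masked.get)  # '"' wins even though "H" scored higher
```

Note that "H" had the highest raw score; the mask, not the model's preference, decides what is emitted.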

Navigating the Schema Landscape

While the technology for enforcement has improved, the standards remain fragmented, and developers must navigate the nuances between JSON Schema and OpenAPI 3.1. Although OpenAPI 3.1 has become a full superset of JSON Schema, minor differences in keyword support, such as how "nullable" fields are handled, can still cause integration friction.
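The "nullable" divergence is concrete enough to show directly. OpenAPI 3.0 used a dedicated `nullable` keyword, while JSON Schema (and OpenAPI 3.1) express the same thing as a type union; the small converter below is an illustrative sketch, not part of any library.

```python
# OpenAPI 3.0 style: a separate "nullable" keyword, which engines
# expecting pure JSON Schema typically reject or ignore.
openapi_30_field = {"type": "string", "nullable": True}

# JSON Schema / OpenAPI 3.1 style: "null" is just another member of a type union.
json_schema_field = {"type": ["string", "null"]}

def to_json_schema(field: dict) -> dict:
    """Translate the legacy 'nullable' keyword into a type union."""
    if field.get("nullable") and isinstance(field.get("type"), str):
        return {"type": [field["type"], "null"]}
    return field

converted = to_json_schema(openapi_30_field)
```

A schema that mixes the two styles can pass one validator and still be rejected by a provider's constrained-decoding engine, which is exactly the friction described above.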

Furthermore, different providers handle these constraints uniquely. OpenAI offers a native "strict mode," while Anthropic’s Claude often requires a "tool-use" workaround where the model is forced to interact with a specific data-collection tool. Understanding these variations is critical for building cross-platform applications that remain stable regardless of the underlying model.
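The two approaches can be sketched side by side as request parameters. The payload shapes below follow the providers' published APIs at the time of writing, but field names do drift between API versions, so treat this as an illustration and check the current OpenAI and Anthropic references before shipping; the schema and tool names are invented.

```python
person_schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
    "additionalProperties": False,
}

# OpenAI: native strict mode via the response_format parameter.
openai_kwargs = {
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "person", "strict": True, "schema": person_schema},
    }
}

# Anthropic: the tool-use workaround -- define a data-collection "tool"
# and force the model to call it, so the tool arguments become your output.
anthropic_kwargs = {
    "tools": [{
        "name": "record_person",
        "description": "Capture a structured person record.",
        "input_schema": person_schema,
    }],
    "tool_choice": {"type": "tool", "name": "record_person"},
}
```

Note that the underlying JSON Schema is identical in both cases; only the envelope changes, which is what makes a single generated schema reusable across providers.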

Engineering Better Schemas

A common misconception is that schemas are purely structural. In reality, schema design is a high-level form of prompt engineering. The names and descriptions assigned to fields act as semantic anchors for the model. For example, renaming a generic field like "s" to "sentiment_score" and providing a clear description can significantly boost the model's accuracy.

Additionally, the order of properties in a schema influences the model's logical flow. By placing complex reasoning tasks at the end of a JSON object, developers allow the model more "computation time" as it builds context from the earlier, simpler fields. This structural "step-by-step" thinking can improve output quality by as much as fifteen percent.
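Both ideas — descriptive naming and reasoning-last ordering — can live in one schema. This is an illustrative sketch with invented field names; the `propertyOrdering` key is a Gemini-specific hint, and other providers rely on declaration order alone (Python dicts preserve insertion order, so the declaration itself carries the plan).

```python
# Field order doubles as a reasoning plan: cheap extractions first,
# the synthesis field last, so the model builds context before it summarizes.
ticket_schema = {
    "type": "object",
    "properties": {
        "category": {"type": "string",
                     "description": "One-word support ticket category."},
        "urgency": {"type": "string",
                    "description": "One of: low, medium, high."},
        "summary": {"type": "string",
                    "description": "Summary consistent with the category "
                                   "and urgency chosen above."},
    },
    "required": ["category", "urgency", "summary"],
    # Gemini-specific ordering hint; harmless but ignored elsewhere.
    "propertyOrdering": ["category", "urgency", "summary"],
}

field_order = list(ticket_schema["properties"])
```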

Tools for Reliability

To avoid the pitfalls of manual JSON creation, developers should lean on type-safe libraries like Pydantic for Python or Zod for TypeScript. These tools allow developers to define data models in code, which then automatically generate compliant JSON Schemas. This creates a single source of truth and catches errors during development rather than at runtime. For visualizing complex nested structures, tools like JSON Crack provide interactive graphs that make debugging large schemas manageable. As AI moves toward the Model Context Protocol (MCP) and deeper integrations, these rigorous engineering practices will be the difference between a failed experiment and a production-ready system.
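In Python, the single-source-of-truth workflow looks roughly like this Pydantic v2 sketch (the model and field names are illustrative):

```python
from pydantic import BaseModel, Field

class ActionItem(BaseModel):
    """One task extracted from a meeting transcript."""
    owner: str = Field(description="Full name of the person responsible.")
    task: str = Field(description="What needs doing, as an imperative sentence.")
    done: bool = False

class ActionItemList(BaseModel):
    items: list[ActionItem]

# One source of truth: the same class validates responses at runtime
# and emits the JSON Schema you hand to the provider.
schema = ActionItemList.model_json_schema()
```

Changing a field on `ActionItem` automatically changes both the generated schema and the runtime validation, so the two can never drift apart.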


Episode #1219: Beyond the Vibes: Mastering Structured AI Outputs

Daniel's Prompt

Custom topic: Let's talk about obtaining reliable structured outputs in JSON from AI models. There is some confusion around exactly what format that schema has to be written in. The short answer appears to be: so l
Corn
I was looking at some code yesterday, trying to parse a response from a model that was supposed to be a simple list of action items. Instead of the clean data I needed, the model decided to give me a helpful little introductory sentence, then the J S O N block wrapped in markdown, and then a polite closing paragraph explaining why it chose those specific items. It is that classic move where the model acts like a helpful assistant when you really just need it to act like a database. It is incredibly frustrating because you have this beautiful downstream pipeline ready to go, and it all chokes because the model decided to be chatty. Today's prompt from Daniel is about fixing exactly that, specifically obtaining reliable structured outputs and clearing up the confusion around the different schema formats we are all wrestling with here in March of twenty-twenty-six.
Herman
It is the great divide in A I development right now. We are moving away from what I call the era of begging, where you spend half your prompt capacity pleading with the model to please, for the love of everything, do not include any prose. We have all been there, writing things like "output only raw J S O N" in all caps, and yet the model still hallucinates a "Here is your data" at the top. Now we are entering the era of enforcement. I am Herman Poppleberry, by the way, and I have been geeking out on the technical shift from prompt-based formatting to A P I-level strict enforcement. It is not just a change in how we write prompts; it is a fundamental shift in how the underlying inference engines actually handle token generation. We are moving from "vibes-based" engineering to actual systems architecture.
Corn
That is what I want to dig into, because there is this common myth that as long as you provide something that looks like valid J S O N, the model will just get it. But as Daniel points out in the prompt, when you get into tool definitions or the Model Context Protocol, the requirements get much pointier. You start seeing terms like Open A P I three point one or specific J S O N Schema drafts. Why can we not just give it a template and call it a day? Why is "just valid J S O N" not enough anymore?
Herman
Because the model is not actually reading your schema the way a human developer does. When we use something like OpenAI's strict mode, which they rolled out back in twenty-twenty-four and has become the industry benchmark, they are using what is called constrained decoding. This is where the magic happens. Instead of the model having the freedom to choose any token from its entire vocabulary, the system constrains the available tokens at each step based on your schema. If your schema says the next character must be a quote mark because you are starting a string, the model literally cannot choose a letter or a number. The probability of every other token is set to zero. This is a massive shift from "J S O N Mode." In standard J S O N Mode, the model is just trying its best to be valid, but it can still wander off the path. With Structured Outputs and strict mode, it is physically impossible for the model to violate the schema.
Corn
So it is essentially a grammar check happening in real time during the actual generation process? It is like the model is walking through a maze where the walls are built by your schema?
Herman
Precisely the right way to think about it. It is a finite state machine that guides the model. But here is where the confusion Daniel mentioned comes in. For that grammar to work, the A P I needs to understand the rules you are setting. OpenAI uses a specific subset of J S O N Schema. Google's Gemini has its own response schema format but also supports the J S O N Schema standard. Then you have Anthropic's Claude, which, even here in early twenty-twenty-six, still does not have a native response format parameter in the same way. You have to use their tool-use workaround, where you define a schema as a tool and the model calls it to produce the data. This fragmentation is why developers are pulling their hair out. You have three different providers and four different ways to define a "person" object.
Corn
I have noticed that with Claude. You basically tell the model, "here is a tool called record data," and the only way you are allowed to talk to me is by using this tool. It works, and it is actually quite reliable because the model is trained to be very precise with tools, but it feels like a bit of a hack compared to a native strict mode. What happens when the schema itself is slightly off? I have seen cases where a schema looks perfectly valid to a standard validator but the A I provider rejects it with a cryptic error message.
Herman
That is usually because of the version mismatch between J S O N Schema and Open A P I. For the longest time, Open A P I was based on J S O N Schema but had these annoying little differences. For example, in older versions of Open A P I, you had a "nullable" keyword, whereas standard J S O N Schema used a type array like "string" or "null." If you mixed those up, the constrained decoding engine would just choke. The good news is that with Open A P I three point one, which is finally the standard now, it has become a full superset of J S O N Schema twenty-twenty-dash-twelve. That resolved a lot of the friction, but you still have to be careful about recursive definitions. Most A I providers still struggle with things like dollar sign ref or recursive schemas because they are hard to map to that finite state machine I mentioned. If you have a "comment" that can contain "replies" which are also "comments," the state machine could theoretically go on forever, and the providers want to avoid that complexity during inference.
Corn
That makes sense. If the state machine has to account for infinite recursion, it becomes a lot harder to guarantee a valid path or manage the memory overhead. I want to touch on something else Daniel mentioned, which is the developer experience. For people who are not deep in the weeds of J S O N specifications, writing these schemas by hand is a nightmare. One missing comma or a misplaced curly brace and the whole thing falls apart. It is the most common reason for integration failures. In fact, I saw a Gartner survey recently that said seventy-five percent of A I projects fail due to integration issues, often stemming from these inconsistent or unstructured model responses.
Herman
I tell everyone to stop writing raw J S O N for schemas. It is a trap. It is like trying to write machine code by hand when you have a high-level language available. If you are in the Python world, use Pydantic. If you are in TypeScript, use Zod. These libraries allow you to define your data models in actual code, which gives you type safety and autocomplete. Then you just call a single method, like "model J S O N schema" in Pydantic, and it generates the perfectly formatted, compliant J S O N Schema for you. It takes the human error out of the equation. You get the red squiggly lines in your I D E before you even run the code, not after you have spent five dollars on A P I calls that failed.
Corn
It also makes the code the source of truth, right? If I change a field in my Pydantic model, my A I schema updates automatically. We actually touched on this shift from vibes to engineering back in episode eight hundred seventy-four. It is about treating the A I interface like a typed A P I rather than a text box. But let's talk about the models themselves. Does the way we name things in the schema actually change how the model performs? Because if the decoding is constrained, does the semantic meaning of the field name still matter?
Herman
It makes a massive difference, and this is where schema design becomes a form of high-level prompt engineering. Even though the model is forced to follow the structure, it still needs to understand what to put in those fields. I have seen benchmarks showing that if you have a field for sentiment analysis and you name it lowercase "s," the model might get it right sixty percent of the time. If you rename that field to "sentiment score" and add a description saying, "a value between zero and one representing the positivity of the text," the accuracy can jump by fifteen to twenty percent. The model uses the names and descriptions in the schema as semantic anchors. It is not just a structural constraint; it is a context provider. You are basically embedding your instructions directly into the data structure.
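The rename Herman describes is worth seeing side by side. A minimal sketch, with invented field names, of the same field before and after it becomes a semantic anchor:

```python
# A terse schema the model must guess at...
vague = {"type": "object", "properties": {"s": {"type": "number"}}}

# ...versus one whose name and description double as instructions.
anchored = {
    "type": "object",
    "properties": {
        "sentiment_score": {
            "type": "number",
            "description": "A value between 0 and 1 representing "
                           "the positivity of the text.",
        }
    },
    "required": ["sentiment_score"],
}
```

Structurally the two are equivalent; only the semantic payload differs, which is the point.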
Corn
That is fascinating because it means the schema is doing double duty. It is providing the guardrails for the output format, but it is also refining the model's understanding of the task. It is like giving it a very specific set of instructions that are physically impossible to ignore. I have also heard that Gemini has a specific feature called property ordering that helps with this. How does that work?
Herman
It is a brilliant little optimization. Google DeepMind found that the order in which you define properties in your schema influences how the model reasons. Think about it: the model generates tokens from left to right, or top to bottom in a J S O N object. If you put the most complex reasoning task at the end of the J S O N object, the model has more "computation time," in a sense, because it has already generated the simpler fields and has those in its context window. By using their property ordering field, you can force the model to follow a specific logical flow. For complex nested schemas, this has been shown to improve output quality by about fifteen percent. It is the structural equivalent of telling the model to "think step by step."
Corn
It is almost like forcing the model to show its work in a specific order. If it has to determine the category of a support ticket before it writes the summary, it is more likely to write a summary that matches that category. We talked about this kind of agentic friction in episode ten seventy-six, specifically regarding the Model Context Protocol. Speaking of M C P, Daniel mentioned that it has its own requirements. How does that fit into this landscape? Is it just another standard to learn?
Herman
M C P is a bit of a purist, and that is causing some friction. It requires pure J S O N Schema for tool input definitions. It does not want the Open A P I flavor. This has caused some headaches for developers who are used to the OpenAI function calling style, which is slightly more permissive in some ways but more restrictive in others. For instance, some providers might let you get away with certain keywords that M C P will reject. If you are building for the M C P ecosystem, you really need to stick to the standard drafts, specifically draft seven or draft twenty-twenty-dash-twelve. This is why tools like the J S O N Schema Store are so vital. It is built into V S Code. If you are editing a file that matches a known schema, V S Code will give you autocomplete and validation automatically.
Corn
So if I am a developer and I am looking at a screen full of red squiggly lines in V S Code, what tools should I be using to stay sane? Because let's be honest, looking at three hundred lines of nested J S O N is a great way to get a headache.
Herman
First, lean on the built-in support. V S Code's J S O N Schema Store hosts schemas for over two hundred file types. But for visualizing complex structures, I am a huge fan of J S O N Crack. It turns your J S O N into a searchable, interactive graph. It makes it so much easier to see the relationships between nested objects. If you are struggling with the "where does this bracket end" problem, J S O N Crack is a lifesaver. There is also a great tool called Smithery for managing these connections if you are working with multiple models. And for those who truly hate J S O N, there are YAML to J S O N workflows. You can write your schema in YAML, which is much more readable for humans, and then use a simple build script to convert it.
Corn
I think the visualization part is key because we are starting to build these incredibly deep schemas. In finance or healthcare, you might have a schema that covers hundreds of potential data points for a patient record or a transaction audit. Daniel mentioned the compliance angle, and I think that is where this gets really serious. We are not just talking about making life easier for developers; we are talking about legal requirements.
Herman
We are seeing things like the F I N O S A I Governance Framework in the financial sector now. They are starting to require structured outputs for any A I-generated audit trail. You cannot just have a model write a summary of why a loan was approved; it has to output a structured log that matches a specific regulatory schema. This ensures that the data can be ingested by traditional compliance systems without a human having to manually verify every single word. Structured output is the bridge that makes A I actually useful in regulated environments. According to Cognitive Today, using J S O N schema to enforce these outputs can reduce parsing errors by up to ninety percent. That is the difference between a system that works and a system that requires a full-time team of humans to fix broken J S O N.
Corn
It moves A I from being a novelty to being a reliable part of the data pipeline. If you cannot guarantee the format, you cannot automate the downstream process. It is that simple. If there is even a five percent chance the model returns invalid J S O N, you have a broken pipeline. But with strict mode and constrained decoding, that error rate for parsing drops almost to zero. It is a complete game changer for reliability.
Herman
And that is the real takeaway for anyone listening. If you are still trying to use regular expressions or string splitting to get data out of an A I model, you are living in the past. You are paying a massive tax in terms of latency and reliability. By moving to a schema-first approach, you are effectively turning the model into a structured data generator. You are treating the L L M like a microservice with a strictly defined A P I contract.
Corn
I want to go back to the developer who hates J S O N. You mentioned YAML to J S O N workflows. Is that a viable path for defining these schemas, or does it introduce its own set of problems? I have heard about the "Norway problem" where things get lost in translation.
Herman
It is very popular, but you have to be careful. The "Norway problem" is a classic: in YAML, the country code "N O" can be interpreted as a boolean "false." If you are not careful with your types, your YAML-to-J S O N conversion can mangle your data before it even reaches the model. This is why I still lean toward Pydantic or Zod. They are code-native, so you get the benefits of a real programming language—variables, functions, comments—but the output is a rock-solid J S O N Schema. It is the best of both worlds.
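The Norway problem is easy to reproduce with PyYAML, which follows the YAML 1.1 scalar rules Herman mentions (this sketch assumes the `pyyaml` package is installed):

```python
import json
import yaml  # PyYAML: resolves yes/no/on/off scalars as booleans (YAML 1.1)

# An unquoted country code silently becomes a boolean...
doc = """
country: NO
region: Oslo
"""
data = yaml.safe_load(doc)
as_json = json.dumps(data)  # the damage survives the round-trip to JSON

# ...while quoting the scalar preserves it as a string.
safe = yaml.safe_load('country: "NO"')
```

Quoting every ambiguous scalar fixes it, but having to remember that rule is exactly why a code-native layer like Pydantic is the safer intermediate format.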
Corn
That makes sense. It is about having a robust intermediate layer. We discussed the importance of unified data structures for both human and A I consumption in episode twelve zero nine, and the "Dual-Track A P I Tax." If you have to write one A P I for your frontend and a different one for your A I, you are doubling your work. If you use a unified schema, the A I just becomes another client.
Herman
And if you are working across multiple providers, like if you are using OpenAI for some tasks and Gemini for others, I highly recommend using LiteLLM. It is a library that provides a unified interface for all these different providers. It handles the translation between OpenAI's function calling format and Gemini's response schema format for you. It saves you from having to write five different versions of the same schema. It is all about reducing that "integration tax" we talked about.
Corn
One more thing on best practices before we move on. Descriptions. I cannot stress this enough. Every single field in your schema should have a description property. Even if you think the field name is obvious, like "user name," give the model more context. Tell it if it should be the full name, just the first name, or if it should handle titles like Doctor or Professor. That extra bit of metadata is what separates a mediocre implementation from a production-grade one.
Herman
It really is. I have seen cases where just adding the word "required" to a description—even if the field is already marked as required in the schema—helps the model focus. It is like giving the model a tiny little map for every single turn it has to take. And remember, these descriptions are part of the prompt that gets sent to the model. You are not just defining a structure; you are providing a specialized instruction set.
Corn
I think we have covered a lot of ground here, from the underlying mechanics of constrained decoding to the practical tools like Pydantic and J S O N Crack. The landscape is definitely maturing. It feels like the Wild West days of begging the model for J S O N are finally behind us. We are finally getting the engineering rigor that A I development has been lacking.
Herman
It is a good time to be a developer in this space. The tools are catching up to the potential. We are moving from "asking nicely" to "defining exactly," and that is how we get A I into production in a way that actually lasts.
Corn
So, let's wrap this up with some actual takeaways for the people listening. If you are about to sit down and write some code to integrate a model, what are the first three things you should do?
Herman
Number one, define your schema in code first. Use Zod if you are on the web or Pydantic if you are doing backend Python work. Never, ever write the raw J S O N schema by hand if you can avoid it. It is just asking for a syntax error that will be a pain to debug.
Corn
And number two?
Herman
Use a validation layer. Before you even send that schema to the A P I, run it through a standard J S O N Schema validator. V S Code will do this for you if you have the right plugins, but you can also build it into your unit tests. If your schema is invalid, your model calls will fail, so catch those errors early in your C I C D pipeline. Do not let a broken schema reach production.
Corn
And the third?
Herman
Be descriptive and intentional with your field names. Treat your schema like a conversation with the model. If you want a specific type of output, describe it in the schema metadata. And if you are using Gemini, take advantage of that property ordering field to guide the model's reasoning process. Put the "reasoning" or "explanation" fields at the end so the model can use the context it generated in the earlier fields.
Corn
That is solid advice. It is about moving from being a prompt engineer to being a systems architect. You are building a system where the A I is just one component that needs to adhere to a strict interface. It is about reliability, predictability, and ultimately, trust in the system you are building.
Herman
That is the goal. We want the A I to be powerful but predictable. Structured output is the key to that predictability. It is the difference between a demo that looks cool and a product that people can actually rely on for their business.
Corn
I think that is a perfect place to leave it. This has been a great deep dive into the plumbing that makes modern A I agents actually work. Before we go, I want to thank our producer, Hilbert Flumingtop, for keeping everything running smoothly behind the scenes.
Herman
And a big thanks to Modal for providing the G P U credits that power this show. They make it possible for us to explore these technical topics in depth every week.
Corn
If you found this episode helpful, please consider leaving us a review on your favorite podcast app. It really helps other developers find the show and join the conversation. We love hearing your feedback and your own "weird prompts" stories.
Herman
You can also find us at myweirdprompts dot com for the full archive of episodes and links to everything we discussed today, including J S O N Crack and the Pydantic documentation.
Corn
This has been My Weird Prompts. We will see you in the next one.
Herman
See ya.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.