Imagine your AI assistant freezing because it is trying to remember every single tool it could possibly use, from calculate pi to order pizza, before you have even asked it to do anything. It is like a handyman showing up to fix a leaky faucet, but he insists on reciting the user manual for every single tool in his three-story warehouse before he even looks at the sink. By the time he is done talking, he has forgotten why he is there, and you have already paid for four hours of his time.
That is a perfect illustration of the context pollution crisis we are seeing in the Model Context Protocol right now. Today's prompt from Daniel is about just-in-time tool usage in MCP, and honestly, it is the only way forward if we want these agentic systems to actually scale. If you are trying to build anything serious with AI agents, you have probably hit this wall where the sheer volume of tool schemas is eating your reasoning capabilities alive.
It is the MCP tool trap. We finally have this great standard for connecting models to local data and APIs, but the more useful we make the agent, the dumber it gets because we are stuffing its brain with JSON schemas. By the way, today's episode is powered by Google Gemini three Flash, which is fitting because we are talking about the very plumbing that makes models like this more effective in production. I am Corn, the resident skeptic of bloated windows.
And I am Herman Poppleberry. I have been digging into the recent benchmarks on this, and the numbers are staggering. When we talk about context bloat, we are not just talking about a few extra pennies on the API bill. We are talking about a fundamental degradation of what the model can actually do. If you load a full suite of MCP servers—say, GitHub, Slack, Google Docs, and a local filesystem—you can easily burn eighty thousand tokens before the user even types hello.
Eighty thousand tokens just to say I am ready to work? That is forty percent of a standard two hundred thousand token context window. It is like renting a massive apartment but filling forty percent of the floor space with empty cardboard boxes labeled things I might need later. You are paying for the space, you are tripping over the boxes, and you have no room left for a couch.
And it is worse than just losing space. There is this concept of context rot or reasoning degradation. When a model has to look through a massive haystack of tool definitions to find the one needle it needs, the latency goes up, the chance of a hallucination skyrockets, and the model starts losing the thread of the actual conversation. Just-in-time, or JIT, tool usage flips the script. Instead of loading everything upfront, we fetch the schemas only when the model actually decides it needs them.
So, instead of the handyman reciting the warehouse inventory, he just knows he has a warehouse, and when he sees the leaky faucet, he goes, oh, I need the pipe wrench, let me go grab that specific manual. But how does the model know the pipe wrench exists if we haven't told it yet? This feels like a chicken and egg problem. How do you find a tool you haven't loaded?
That is where the discovery phase comes in. In a traditional MCP setup, you have eager loading. The client connects to the server, asks for a list of all tools, and shoves those full JSON schemas into the prompt. With JIT, we move to a progressive discovery model. The model is initially given a meta-tool or a search tool. It is essentially a tool whose only job is to find other tools.
A tool for tools. That sounds very meta, even for us. So the assistant says, I need to check the user's calendar, I will call the tool finder to see if we have a calendar tool, and then the system injects the actual calendar schema once the intent is confirmed?
And the January twenty twenty-six update to the MCP specification actually formalized this with the tool discovery capability. It allows servers to expose metadata—like a short description and a name—without sending the massive payload of the full input schema. This allows the agent to perform what we call semantic tool search. We can use retrieval augmented generation, or RAG, but for the tools themselves. We index the tool descriptions in a vector database, and the agent only sees the top three or five most relevant schemas for the current task.
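To make that semantic tool search concrete, here is a minimal sketch of the idea. Everything in it is hypothetical: the tool names are made up, and a simple bag-of-words cosine similarity stands in for the real embedding model and vector database a production system would use.

```python
import math
from collections import Counter

# Hypothetical tool registry: names plus short descriptions only.
# Full JSON schemas stay on the server until a tool is selected.
TOOL_INDEX = {
    "calendar_list_events": "list upcoming meetings and events on the user's calendar",
    "github_create_issue": "open a new issue in a GitHub repository",
    "slack_send_message": "post a message to a Slack channel",
    "fs_read_file": "read the contents of a file from the local filesystem",
}

def _vec(text: str) -> Counter:
    # Stand-in for a real embedding model: bag-of-words counts.
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def find_tools(intent: str, k: int = 3) -> list[str]:
    """Return the k tool names whose descriptions best match the intent."""
    q = _vec(intent)
    scored = sorted(
        TOOL_INDEX,
        key=lambda name: _cosine(q, _vec(TOOL_INDEX[name])),
        reverse=True,
    )
    return scored[:k]

print(find_tools("what meetings are on my calendar today", k=1))
# -> ['calendar_list_events']
```

The model only ever sees the top handful of matches, so adding a fiftieth or five-hundredth tool to the index costs the context window nothing.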
I love that. We are essentially applying RAG to the agent's own capabilities. It keeps the context window lean and mean. I saw a case study recently from a developer building a multi-agent system with over fifty MCP tools. They were hitting eighty percent context usage on start-up. Once they switched to JIT, that dropped to fifteen percent. That is a massive gain in what they call the reasoning headroom.
It is the difference between an agent that feels sluggish and confused and one that feels snappy. If you look at providers like Speakeasy, they did a benchmark with a server containing four hundred tools. Statically loading those would require over four hundred thousand tokens. That is beyond what most current models can even accept in a single turn. But with dynamic discovery, they got it down to six thousand tokens. That is nearly a seventy-fold reduction.
A seventy-fold reduction is insane. That effectively uncaps the number of tools an agent can have. We go from rationing MCP connections like they are water in a desert to just hooking up everything we own and letting the discovery layer sort it out. But Herman, what about the latency? If I have to do a round trip to find the tool, then another round trip to load the schema, then the actual execution... aren't we just trading token cost for time?
That is the primary trade-off, but it is not as bad as it sounds because of caching. Most JIT implementations use an in-memory cache. Once you have fetched the schema for the GitHub create issue tool once in a session, you keep it in the window for the rest of that conversation. You only pay the discovery tax once. Plus, the time you save by not having the model process eighty thousand tokens of junk every single turn actually makes the overall interaction faster. Processing a lean prompt is much quicker than processing a bloated one.
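That session-level cache is simple enough to sketch in a few lines. The names here are illustrative, and `fetch_schema_remote` stands in for the actual round trip to an MCP server.

```python
# Minimal sketch of a session-level schema cache: pay the discovery
# round trip once per tool, serve every later request from memory.

class SchemaCache:
    def __init__(self, fetch_fn):
        self._fetch = fetch_fn      # stand-in for the server round trip
        self._cache = {}            # tool name -> full JSON schema
        self.fetches = 0            # how many round trips we actually paid

    def get(self, tool_name: str) -> dict:
        if tool_name not in self._cache:
            self._cache[tool_name] = self._fetch(tool_name)
            self.fetches += 1
        return self._cache[tool_name]

def fetch_schema_remote(tool_name: str) -> dict:
    # Pretend network call returning a full input schema.
    return {"name": tool_name, "inputSchema": {"type": "object"}}

cache = SchemaCache(fetch_schema_remote)
cache.get("github_create_issue")
cache.get("github_create_issue")   # served from memory, no second round trip
print(cache.fetches)               # -> 1
```

In a real client you would scope the cache to the conversation and invalidate it if the server signals that its tool list changed.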
So it is like a slow start but a faster sprint. I can live with that. I am curious about the implementation details of this tool discovery capability from the January update. How does a developer actually tell their MCP server to be JIT-compatible? Does it require a total rewrite of the server logic?
Not at all. It is mostly a change in the client-server handshake. The server now flags which tools support lazy loading of schemas. When the client calls list tools, the server sends back the names and a one-sentence description. If the model wants to use it, the client then makes a specific call to get the full JSON schema for that tool ID. The January twenty twenty-six SDK update made this pretty much a one-line configuration change for most servers.
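The table-of-contents shape of that handshake can be sketched like this. Note these function names are hypothetical stand-ins for the real client-server calls, not the actual MCP SDK surface.

```python
# Hypothetical sketch of the metadata-first handshake: the server
# advertises names and one-line descriptions; the full schema ships
# only when the client asks for a specific tool.

FULL_SCHEMAS = {
    "calendar_list_events": {
        "name": "calendar_list_events",
        "description": "list upcoming events on the user's calendar",
        "inputSchema": {
            "type": "object",
            "properties": {"date": {"type": "string"}},
        },
    },
}

def list_tools_metadata() -> list[dict]:
    """Step one: table of contents only, name plus short description."""
    return [
        {"name": s["name"], "description": s["description"]}
        for s in FULL_SCHEMAS.values()
    ]

def get_tool_schema(name: str) -> dict:
    """Step two: full schema, fetched only once the model picks a tool."""
    return FULL_SCHEMAS[name]

meta = list_tools_metadata()
assert "inputSchema" not in meta[0]          # the heavy payload is withheld
full = get_tool_schema("calendar_list_events")
assert "inputSchema" in full                 # and arrives only on demand
```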
It is basically the difference between a table of contents and the full book. I am thinking about the second-order effects of this. If we aren't worried about context bloat, does that change how we actually design the tools? Because currently, everyone tries to make these Swiss Army knife tools to keep the count low. Does JIT mean we should be building more granular, hyper-specific tools instead?
That is a brilliant point, Corn. We are seeing a shift from context management to context engineering. When you aren't worried about the number of tools, you can build very atomic functions. Instead of one massive database tool with twenty parameters, you have twenty specific tools. This actually helps the model because the descriptions become more precise, which makes the semantic search more accurate. It reduces the chance of the model hallucinating a parameter because the schema it is looking at is focused on exactly one task.
It is the Unix philosophy applied to AI agents. Do one thing and do it well. I can imagine a world where we have thousands of these tiny MCP tools available in a cloud-native registry, and the agent just pulls them down like npm packages on the fly. You mentioned something about the MCP tool router project earlier?
Yeah, that was an open-source project from late twenty twenty-five that really pioneered this. They implemented a local caching layer that sits between the agent and the MCP servers. It tracks which tools are used most frequently across different sessions and keeps those schemas hot. They reported that for common workflows, the discovery latency dropped by seventy percent because the router already knew exactly what the model was looking for based on the initial intent.
It is basically a specialized DNS for AI tools. That makes so much sense. If I am always asking about Jira tickets on Monday mornings, the router should probably have the Jira schema ready to go. What about emerging standards? You mentioned Composio and their cloud-native registry. How are the big players handling this?
Composio is doing some really interesting work with what they call the AI control plane. They have moved the tool discovery layer to the cloud. So, instead of your local machine trying to index all these tools, you query a centralized registry that has already performed the vectorization of thousands of APIs. It is not just about saving tokens; it is about providing a universal context layer.
A universal context layer. That sounds like something that would make our producer Hilbert Flumingtop very happy. It takes the plumbing away from the individual developer and turns it into a service. But does this introduce a privacy concern? If the discovery layer is in the cloud, does it need to see my prompts to know which tools to suggest?
That is the friction point. Most of these systems work by sending a vectorized version of the prompt's intent, not the full text, to the registry. But for high-security environments, you would definitely want a local JIT implementation. The beauty of the MCP standard is that it supports both. You can run a local vector database like the one we talked about in the vector databases as a file episode—oops, I wasn't supposed to mention that—and keep the discovery process entirely on-premise.
You just couldn't help yourself, could you? One job, Herman! But seriously, the idea of a local discovery layer is compelling. It means I can have an agent with access to my entire company's internal API surface—thousands of endpoints—and it only ever sees the five it needs for the task at hand. That is the only way an enterprise AI actually works without becoming a security or cost nightmare.
And think about the reasoning quality. There is a famous paper about the lost in the middle phenomenon, where models are great at using information at the very beginning or very end of a context window but struggle with stuff in the middle. If your tool schemas are taking up eighty thousand tokens in the middle of your window, you are essentially creating a massive blind spot for the model's reasoning. By using JIT and keeping the window at, say, ten thousand tokens, you stay in the model's high-performance zone.
It is like trying to read a book while someone is constantly shouting random dictionary definitions at you. You might finish the book, but you won't remember the plot. JIT just gives the model the definitions it asks for, when it asks for them. This really changes the game for developers who have been carefully rationing their MCP connections. I know guys who were literally turning off their Slack MCP server just so they could have enough context to use their GitHub one.
The rationing era is ending. We are moving toward a search to use flow. Jentic Engineering had a great blog post about this, saying that increasing agent capabilities by stuffing tool info into the context isn't scaling; it is just impairing performance. They are advocating for a complete decoupling of tool availability from tool registration.
It is the difference between a buffet and an a la carte menu. The buffet is great until you realize you have to carry every single tray of food to your table before you can start eating. The a la carte menu is much more civilized. So, if I am a developer listening to this and I am currently hitting these limits, what is the first step to switching to a JIT architecture?
First, check if your MCP client supports the tool discovery capability from the January twenty twenty-six spec. Most of the major ones, like the updated Claude Desktop or the latest VS Code extensions, have a toggle for lazy tool loading. If you are building your own client, you want to switch from using the list tools method to a two-step process: fetch metadata first, then fetch the schema only on a call attempt.
And what about the servers? Do I need to update my custom MCP servers to support this?
Ideally, yes. You want to ensure your server handles the get tool schema request. If you are using the official MCP SDKs, this is usually handled for you if you update to the latest version. The SDK will automatically serve a truncated metadata list and wait for the specific schema request. It is also worth looking into frameworks like the OpenAI Agents SDK, which has started implementing dynamic tool filtering. You can programmatically swap toolsets based on the state of the conversation.
So if the user says, okay, I'm done with the code, let's talk about the marketing plan, the system can automatically swap out the Python interpreter and GitHub tools for the SEO and social media tools, clearing that context immediately.
It is about being intentional with what the model is thinking about. There is this great quote from the engineering team at Writer dot com. They said we are moving from context management to context engineering, where the model's environment is dynamically constructed rather than statically defined. That is the core of JIT. The environment is a living thing.
It makes the AI feel more like a person. If I ask you a question about quantum physics, you don't start by reciting everything you know about biology and history just in case. You just pull the physics knowledge from the back of your brain. This JIT approach is finally giving AI that same kind of focus. But what about the edge cases? What if the semantic search fails and doesn't find the right tool?
That is the risk. If your vector search for tools is poor, the model might conclude it can't do the task because it can't see the right tool. This is why the quality of tool descriptions is becoming more important than the code itself. Developers need to write descriptions that are optimized for semantic retrieval. You need to include keywords and intent-based language, not just technical jargon.
So, instead of a description that says list underscore files underscore v2, you need it to say use this tool to see what files are in a directory when the user asks about project structure. We are basically doing SEO for our own internal tools so the AI can find them.
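A tiny before-and-after makes the point. Both entries here are hypothetical, and the word-overlap score is just a crude proxy for what a semantic retriever measures.

```python
# Illustration of "internal tool SEO": the same tool, described two ways.
# The second description is the one a semantic search can actually find.

bad = {
    "name": "list_files_v2",
    "description": "list_files_v2",   # jargon only; retrieval has nothing to match
}

good = {
    "name": "list_files_v2",
    "description": (
        "Use this tool to see what files exist in a directory, "
        "e.g. when the user asks about project structure, folder "
        "contents, or where a file lives."
    ),
}

# A user query's key terms, as a retriever might see them.
query_terms = {"project", "structure", "files", "directory"}

def overlap(desc: str) -> int:
    return len(query_terms & set(desc.lower().replace(",", "").split()))

print(overlap(bad["description"]), overlap(good["description"]))  # -> 0 4
```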
That is exactly what it is. Internal Tool SEO. If your discovery layer can't find the tool, the JIT pattern falls apart. I have seen some teams actually using a smaller, cheaper model to act as the router. You send the prompt to a small model, it picks the five best tools, and then you send the prompt and those five schemas to the big, expensive reasoning model.
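That librarian-and-researcher split might look something like the sketch below. A keyword scorer stands in for the small router model, and every tool name and schema is made up for illustration.

```python
import json

# Hypothetical tool registry with full schemas.
TOOLS = {
    "github_create_issue": {"description": "open a new issue on github",
                            "inputSchema": {"type": "object"}},
    "slack_send_message": {"description": "post a message to slack",
                           "inputSchema": {"type": "object"}},
    "calendar_list_events": {"description": "list calendar events",
                             "inputSchema": {"type": "object"}},
}

def route(prompt: str, k: int = 2) -> list[str]:
    # Stand-in for the small, cheap router model: score by shared words.
    words = set(prompt.lower().split())
    scored = sorted(
        TOOLS,
        key=lambda t: len(words & set(TOOLS[t]["description"].split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(user_prompt: str) -> str:
    # Only the schemas the router selected reach the big reasoning model.
    chosen = route(user_prompt)
    schemas = {name: TOOLS[name] for name in chosen}
    return user_prompt + "\n\nAvailable tools:\n" + json.dumps(schemas, indent=2)

prompt = build_prompt("please open a new issue on github about the login bug")
assert "github_create_issue" in prompt
```

The big model never sees the other schemas at all, which is where both the token savings and the reduced distraction come from.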
That is clever. Use the lightweight model as the librarian and the heavyweight model as the researcher. It saves money and context. I can see why this is gaining so much traction. As we move toward GPT-5 or whatever the next massive jump is, even if they have million-token context windows, we still won't want to waste them. Efficiency is always going to be the gold standard.
Even with a million tokens, the latency of processing that much data is a killer. And the more information you provide, the more chances there are for the model to get distracted by an irrelevant parameter in a tool it isn't even using. I think JIT will become the default way we interact with MCP within the next six months. The eager loading model just doesn't scale past five or six tools.
It is funny how we spend all this time building bigger windows, only to realize that the secret to intelligence is actually knowing what to ignore. It is the same for us humans, right? My brain has a lot of useless information about nineties sitcoms, but I try to keep it out of my active context when we are recording this show.
Try being the operative word there, Corn. I have heard your references. But you are right. Intelligence is filtering. JIT tool usage is just the technical implementation of that filter for the Model Context Protocol. It allows for these massive agent ecosystems—what people are calling tool markets—where a model can dynamically discover and use services it has never even seen before.
A tool market. Imagine an agent that can go out to a public registry, find a tool for a specific niche task like converting a weird file format, download the MCP schema, and use it immediately, all without the developer ever knowing that tool existed. That is where this is going.
That is the vision. But to get there, the discovery has to be flawless. We are seeing some interesting work from Apollo's MCP server, which uses semantic schema search to let agents explore GraphQL schemas on demand. Instead of loading the whole GraphQL schema, which can be megabytes, it just fetches the sub-graphs as the agent traverses the data. It is a very granular version of JIT.
It is like a flashlight in a dark room. You only see what you are pointing at. I think the practical takeaway for our listeners is pretty clear: if you are building with MCP and you are feeling the squeeze, stop rationing and start engineering your context. Look into the tool discovery capability and start treating your tools as a searchable index rather than a static list.
And don't sleep on the local caching. If you are building a custom client, implementing a simple key-value store for schemas you have already fetched will save you a ton of latency. Also, take a hard look at your tool descriptions. If they aren't clear enough for a vector search to find them, your JIT implementation is going to fail.
It is a brave new world of context engineering. I am just glad I don't have to manually turn off my Slack connection anymore just to write some code. It was getting lonely in there. Before we wrap up, we should probably mention our sponsor.
Right. Big thanks to Modal for providing the GPU credits that power this show. They are the ones making all this agentic experimentation possible behind the scenes.
And thanks as always to our producer, Hilbert Flumingtop, for keeping the warehouse organized so we don't have to recite the inventory every time.
If you are finding these deep dives useful, a quick review on your podcast app really helps us out. It helps other developers find the show and keeps us motivated to keep digging into these MCP specs.
We will be back next time with whatever weirdness Daniel sends our way. This has been My Weird Prompts. I am Corn.
And I am Herman Poppleberry.
See ya.
Goodbye.