#2400: Claude Code’s Hidden Context Tax

How Claude’s eager-loaded primitives silently consume context—and how to optimize your setup for sharper performance.

Episode Details
Episode ID
MWP-2558
Duration
23:03
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
DeepSeek v3.2

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The Hidden Costs of Claude’s Context

While Claude’s expanded context window suggests limitless working memory, the reality is more nuanced. Not all primitives consume resources equally—some are "context vampires," silently draining capacity before you even begin a task. Understanding this hierarchy is key to maintaining performance.

The Hierarchy of Context Hogs

The heaviest costs come from eager-loaded primitives:

  • Subagents (400-800 tokens each): Their detailed example blocks are inlined verbatim, making them the most expensive.
  • CLAUDE.md files: Eager-loaded documentation that accumulates from root to working directory.
  • Auto-memory: Claude’s self-written MEMORY.md can bloat to 25KB per session, crowding out active tasks.

In contrast, lazy-loaded tools like MCPs (40-80 chars per name) and hooks (zero cost) are lightweight—but only if defaults like ToolSearch remain enabled.
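These per-primitive figures can be turned into a rough budget check. A minimal sketch, assuming the common heuristic of roughly four characters per token; the sample sizes below are illustrative stand-ins, not measurements:

```python
# Rough eager-load budget sketch. The chars-per-token ratio (~4) and the
# per-primitive sizes are approximations from the episode, not measured values.
def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

# Hypothetical eager-loaded pieces for one session.
eager_loaded = {
    "root CLAUDE.md": "x" * 6000,     # ~200 lines of project docs
    "src/CLAUDE.md": "x" * 2000,      # a second file in the chain
    "reviewer subagent": "x" * 2800,  # verbose example blocks, inlined verbatim
    "MEMORY.md": "x" * 25000,         # auto-memory near its per-session cap
}

overhead = sum(estimate_tokens(t) for t in eager_loaded.values())
print(f"eager-loaded overhead ~ {overhead} tokens")  # ~ 8950 tokens before any work begins
```

Even with generous rounding, a handful of eager-loaded files adds up to thousands of tokens of standby cost.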

The Eager vs. Lazy Divide

Cost isn’t about size alone—it’s about when a primitive loads. A 200-line skill invoked on demand is cheaper than a 10-line subagent description loaded upfront. Key takeaways:

  • Disable unused plugins: Even inactive plugins may inject eager-loaded agents via settings.json.
  • Audit subagents: Replace verbose examples with shorter ones, or convert them to skills.
  • Scope CLAUDE.md: Split monolithic files into path-specific rules.

The Plugin Pitfall

Installing a plugin often means adopting its hidden eager loads—like main-thread agent swaps that replace Claude’s core prompt silently. The fix? Regularly audit plugins with /plugin and grep for "agent" keys.
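The grep-for-"agent"-keys audit can also be done structurally. A hedged sketch, assuming a plugins directory laid out as `<plugin-name>/settings.json` (the actual layout of your install may differ):

```python
import json
from pathlib import Path

# Audit sketch: flag plugins whose settings.json defines an "agent" key,
# i.e. the silent main-thread prompt swap described above. The directory
# layout is an assumption; adjust the path for your setup.
def find_agent_swaps(plugins_dir: str) -> list[str]:
    flagged = []
    for settings in Path(plugins_dir).glob("*/settings.json"):
        try:
            data = json.loads(settings.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed config: skip it, don't crash
        if "agent" in data:
            flagged.append(settings.parent.name)
    return sorted(flagged)
```

Running this over your plugins directory gives a quick list of candidates to disable or inspect with /plugin.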

Optimization Payoff

A few tweaks—disabling auto-memory, trimming subagents, and scoping documentation—can reclaim 25-30% of working context. The result? Faster, more reliable responses without mid-task amnesia.


#2400: Claude Code’s Hidden Context Tax

Corn
Daniel sent us this one — he's asking about Claude Code and skills, specifically how much context gets consumed when using them. As of April 2026, which primitives are the real context hogs, and which are nearly free? He wants us to walk through the primitive hierarchy, the eager versus lazy loading distinction, and where a heavy plugin install actually hurts. Basically, what practical discipline should a power user adopt to avoid quiet context bloat?
Herman
Oh, this is a fantastic question. And it’s timely, because the landscape has shifted just in the last few months. Everyone got excited when the base context window expanded again.
Corn
You’d think a bigger window means you can just throw everything in and not worry, but that’s not how it works. It’s like having a bigger desk but you keep piling on more junk until you can’t find the one paper you actually need to work on.
Herman
The pain point is real: you’re in the middle of a complex coding task, and suddenly Claude seems to forget the function signature you defined three minutes ago, or it starts hallucinating tool names. It feels like a regression, but it’s not the model getting dumber—it’s your context getting fatter.
Corn
We’re talking about operational overhead consuming usable working memory. And by the way, today's episode is powered by deepseek-v-three-point-two.
Herman
A good tool for a technical deep dive. So, to Daniel’s core question: not all primitives are created equal. Some are context vampires, sipping quietly in the background, and others are practically free. The difference between an optimized setup and a bloated one can be the difference between a sharp, focused assistant and one that’s sluggish and forgetful.
Corn
Which, in a professional setting, is the difference between shipping and debugging all night.
Herman
So where do we even start? I think we should frame this with three lenses right out of the gate. One: the hierarchy of primitives by context cost. Two: the critical distinction between eager and lazy loading—that’s the axis that matters more than big versus small. And three: the practical impact of plugin sprawl.
Corn
Let’s start with the hierarchy. If context is currency, who’s spending the biggest chunks of that budget?
Herman
The biggest slice always goes to the base system prompt—Claude's core instructions and capabilities. That's a fixed cost. Then you have the eager-loaded primitives, the stuff that gets shoved in every session before you even type "hello."
Corn
The mandatory overhead.
Herman
And within that eager load, the hierarchy is clear. The heaviest hitters are subagent descriptions. Every single example block you write for a subagent gets inlined, verbatim. We're talking four hundred to eight hundred tokens per agent. If you have a few of those installed, you've just burned a significant chunk of your working memory on standby instructions.
Corn
The very thing that makes subagents powerful—their detailed, example-driven guidance—is also what makes them context obese.
Herman
Next in line is CLAUDE.md. This is the documentation you scatter around your project to guide Claude. Every file from root to your current directory loads in full. It's eager. If you have a two-hundred-line CLAUDE.md in your project root and another in your src folder, that's all in there, chewing up space.
Corn
I'm guessing the "at-import" directive isn't a lazy load.
Herman
It is not. It's an eager include. So after subagents and CLAUDE.md files, you have the auto-memory system. Its MEMORY.md can silently swell to twenty-five kilobytes per session, written by Claude itself as it tries to be helpful. Then you have the list of skill descriptions—just the metadata, a name and a truncated line—but if you have four hundred skills installed, that aggregate weight adds up.
Corn
What's down at the lightweight end?
Herman
MCP tool names are practically free. They're lazy-loaded by default; just the name sits in context until you actually need the schema. Hooks and monitors are harness-level, zero token cost. They don't even show up in the model's view.
Corn
The critical insight is that the cost isn't about the primitive's potential size, but about when it loads. A two-hundred-line skill that loads only when invoked is cheaper than a ten-line subagent description that's always there.
Herman
That's the eager-versus-lazy axis. It's the single most important concept for managing your context. And this matters more now than ever because the context window is so large. Bloat is insidious. You don't hit a hard wall and get an error; you just get a gradual, creeping degradation in coherence and recall. Your effective working space in a two-hundred-fifty-six-thousand-token window might be thirty percent less because of overhead, and you might not even notice until your outputs get weird.
Corn
Which brings us to the plugin impact. Installing a heavy plugin isn't just adding one tool; it's potentially adding eager-loaded agents, new skills, and more to that startup tax.
Herman
So the discipline starts with understanding this hierarchy and this loading model—knowing what you're paying for and when. And to make this concrete, let's talk numbers.
According to the Anthropic technical documentation from March, MCP tools, when lazy-loaded via ToolSearch, consume about forty to eighty characters each—just the name. You can have hundreds of those in a few kilobytes. It's trivial.
Corn
That's the default, right? ToolSearch is on?
Herman
The harness defaults to 'auto'. But if you manually flip the ENABLE_TOOL_SEARCH flag to false, it falls back to eager-loading the full JSON schemas for every MCP. That's when the bloat comes roaring back. So rule number one: don't touch that switch.
Corn
So MCPs are lightweight citizens if you leave the harness alone. What about skills? You said the descriptions are eager.
Herman
They are, but it's shallow metadata. About one line per skill: name plus a truncated description, maybe eighty to a hundred characters. In a really heavy session with over four hundred skills installed, that list is long, but each entry is small. The real cost comes from the SKILL.md body, but that loads only when you invoke the skill. So the readiness tax is low, but the potential execution cost is high.
Corn
How much context does just maintaining that skill readiness consume, in the aggregate?
Herman
There's a fallback budget—about one percent of the context window, or roughly eight kilobytes. If your installed skills exceed that, their descriptions get silently truncated. And when that happens, skills can quietly stop triggering because Claude can't see their full purpose anymore.
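As a sanity check on those numbers: with an 8 KB fallback budget and roughly 90 characters per entry, the silent-truncation point arrives well before four hundred skills. All figures here are the episode's estimates, not measured values:

```python
# Back-of-envelope for the skill-description fallback budget described above.
# Every number is the episode's estimate, not a documented constant.
budget_bytes = 8 * 1024   # ~1% of a large context window
chars_per_entry = 90      # name plus a truncated description

max_visible_skills = budget_bytes // chars_per_entry
print(max_visible_skills)  # 91 entries before silent truncation begins
```

Past that point, additional skill descriptions get cut, which is how skills quietly stop triggering.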
Corn
Now, contrast that with the subagent. You said four hundred to eight hundred tokens each. Why the massive disparity?
Herman
Because of the design. A subagent description isn't just a name and a one-liner. It's the full prompt, and crucially, every single example block you write inside those tags is inlined verbatim. The model needs those examples to role-play the agent correctly, so they're not summaries—they're the full text. That's why they cost disproportionately more than their utility. You might have five verbose example dialogues in an agent you use once a week, but you're paying for them in full, every single session.
Corn
The very feature that makes them precise and teachable is what makes them gluttons. It's a trade-off.
Herman
A severe one. And it creates a meta-heuristic: for context thrift, you should reach for hooks first, then MCPs, then skills, then subagents, and the absolute last resort should be a main-thread agent swap via a plugin's settings.json. But most authors do the opposite—they reach for a subagent first because it's the most general, most powerful primitive. That impulse produces the largest quiet bloat.
Corn
Let's talk about CLAUDE.md files. This is the documentation meant to help Claude understand your project.
Herman
It does help! But it's eager-loaded, and it's cumulative. Every file from root to your current working directory gets loaded in full. The recommendation is to keep each under two hundred lines, because beyond that, both the cost rises and Claude's adherence to the instructions actually drops. It's a case of less being more.
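The cumulative-load behavior Herman describes can be audited with a few lines of Python, a stand-in for running `wc -l` by hand. The walk from working directory up to the project root mirrors the loading chain; the layout and the 200-line limit are the episode's guideline, not a hard rule:

```python
from pathlib import Path

# 'wc -l' in Python: count lines in every CLAUDE.md from a working directory
# up to the project root, so you can spot files past the 200-line guideline.
def audit_claude_md(root: str, cwd: str) -> dict[str, int]:
    root_p, cwd_p = Path(root).resolve(), Path(cwd).resolve()
    counts = {}
    for directory in [cwd_p, *cwd_p.parents]:
        candidate = directory / "CLAUDE.md"
        if candidate.exists():
            counts[str(candidate)] = len(candidate.read_text().splitlines())
        if directory == root_p:
            break  # stop at the project root
    return counts
```

Any entry over the guideline is a candidate for splitting into path-scoped rules or converting into a skill.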
Corn
The at-import directive?
Herman
It's an eager include. So if you have a massive best-practices file you're importing, you're pulling all of it into context up front. The better approach is to convert verbose sections into actual skills, which load on demand, or to use path-scoped rules in the dot-claude directory.
Corn
You mentioned auto-memory. How does that spiral?
Herman
This is a great case study in recursive bloat. The auto-memory system writes a MEMORY.md file, up to twenty-five kilobytes per session, chronicling what Claude thinks is important. It's written by Claude itself, so it can grow organically, session after session, silently creeping toward that cap. It's meant to be helpful, but it's essentially the model consuming its own context with its own commentary.
Corn
You get this self-improvement loop that's actually a self-sabotage loop. It's trying to remember everything, and in doing so, it forgets how to think.
Herman
The mitigation is to keep MEMORY.md as an index—a table of contents—and push the detailed notes out to topic-specific files that load only on demand. You have to audit it via the slash-memory command and consider disabling it entirely for scratch workspaces.
Corn
Alright, give me a before-and-after comparison. Walk me through a real workflow with and without MCP optimization.
Herman
Imagine a developer setting up a new project. The 'before' picture: they install a dozen plugins. Several have subagents for code review, documentation, and debugging. They have a three-hundred-line root CLAUDE.md and use at-import for a style guide. They leave auto-memory on. Their effective working context might be down forty percent from the nominal max before they write a single line of their own code.
The 'after' picture: disable the plugins they aren't using today. Convert those verbose subagent examples into two short ones instead of five long ones. Break the monolithic CLAUDE.md into path-scoped rules. Convert the style guide into a skill. Set auto-memory to index mode. Just those actions could reclaim twenty-five to thirty percent of the working context. The difference is tangible—faster, more accurate responses, and no mid-task amnesia.
Corn
So that's the optimization playbook for what's already installed. But what about the installation process itself? Just having a plugin in your list, even if you never call it today, that's still costing you, right?
Herman
This is one of the biggest pitfalls. When you install a plugin, you're not just adding a tool you can optionally use. You're often adding eager-loaded components to your startup prompt. The most dangerous is the plugin's settings.json file that contains an "agent" key.
Corn
That's the main-thread swap.
Herman
If a plugin defines an agent there, it swaps out the entire base system prompt with its own body. That can rival the size of the original system prompt, and it happens invisibly. You might install a "super code reviewer" plugin and suddenly your Claude is running a completely different core personality, consuming a huge slice of context, before you've even asked for a review.
Corn
There's no UI indicator that this swap has happened?
Herman
It's completely silent. The only way to know is to go digging into the plugin's configuration files or use the slash-plugin command to inspect what's actually loaded. This is why the audit checklist is so critical. You have to grep for plugins with that main-thread agent swap.
Corn
Plugin sprawl isn't just a menu clutter problem. It's a direct, silent tax on your cognitive budget.
Herman
A progressive tax. And it gets worse with skill dependencies. This is the transitive dependency graph problem—the silent killer.
Corn
Walk me through that.
Herman
Let's say you install a "React DevOps Assistant" plugin. It might come with twenty skills. But one of those skills might have a dependency on another internal skill for parsing YAML, and that one might trigger a subagent for configuration validation. You, the user, just see one plugin. But under the hood, activating one skill can eager-load a chain of other primitives. You get this dependency creep where using one simple feature quietly pulls in a whole subtree of context-heavy machinery.
Corn
It's not just what you install, it's the web of things those installations bring along for the ride.
Herman
And because it's transitive, it's incredibly hard to profile. You might have a clean setup, try one new skill, and your performance tanks without a clear reason. The bloat came from three layers down in the dependency graph.
Corn
This sounds like it calls for a practical discipline framework. A rule of thumb for power users.
Herman
I've been advocating for what I call the three-two-one rule. Three active skills maximum loaded and ready for model invocation at any time. Two skills configured for lazy-loading—meaning you call them explicitly by name, they're not in the main descriptive list. And one MCP focus area.
Corn
Break that down. The three active skills.
Herman
You audit your skill list and you pick the three you use constantly for your current project. You disable model-invocation for all the others. That removes their descriptions from the eager-loaded list entirely. Your skill readiness tax drops to almost nothing. You manually invoke other skills when you need them with a slash command.
Corn
The two lazy-loaded?
Herman
Those are for heavy utilities you use often, but not constantly. You write a short alias or a keybinding that calls them directly. They don't pollute the skill description list, but they're still quickly accessible. The one MCP focus means don't try to be a master of all MCPs at once. Have a core set for your current work—your database, your cloud platform—and leave the rest in the lazy-loaded pool via ToolSearch.
Corn
It's a forcing function for intentionality. You have to choose your tools deliberately.
Herman
Context is the new CPU cycle. You have to manage it like a finite resource. The three-two-one rule is a guardrail against the natural creep.
Corn
Okay, but how do you even know you have a problem? What are the debugging techniques? I can't feel my context being consumed.
Herman
You can profile it. The slash-memory command shows you what's in auto-memory. The slash-plugin command shows you installed plugins and lets you disable them. For the real heavy lifting, you use the InstructionsLoaded hook.
Corn
The what now?
Herman
It's a debug hook you can enable. It fires when the system prompt is assembled and shows you, line by line, what got loaded into context and how many tokens it consumed. It's like an itemized receipt for your context budget. You can see exactly which plugin's agent swap just added eight thousand tokens, or which subagent example block is costing you five hundred.
Corn
That's incredibly powerful. So you can actually watch the bloat happen in real time.
Herman
And the other manual audit is just counting lines. Run 'wc -l' on every CLAUDE.md file in your chain. List out your dot-claude rules directory to see which files lack path scoping. Count the example blocks in your subagents. It's grunt work, but it's the only way to see the silent accumulation.
Corn
Give me a concrete case study. A real-world example of dependency creep in, say, a coding assistant setup.
Herman
A developer installs a popular "full-stack aide" plugin. It has a skill called "scaffold new API endpoint." They use it. Unbeknownst to them, that skill has a dependency on a "validate OpenAPI spec" skill, which in turn triggers a "configuration subagent" to check their project's config files. That subagent has five verbose example blocks. So one simple scaffold command eager-loaded two extra skills and a seven-hundred-token subagent into their session context. Those components persist, eating memory, for the rest of their work session.
The fix? Profile with the InstructionsLoaded hook to see the chain. Then, either disable the automatic dependencies, rewrite the scaffold skill to be self-contained, or trim the subagent's examples down to two short ones. The context savings from that one fix could be over a thousand tokens.
Corn
Demonstrate the lazy-loading savings for me. Compare eager versus lazy for a common utility.
Herman
Take a "format JSON" utility. In an eager setup, it's a skill in your main list—its eighty-character description is in every prompt. In a lazy setup, you remove it from model invocation. You either type slash-format-json manually, or you have a keyboard shortcut that calls it. The description is gone from your context. You save those eighty characters per prompt, which is small, but multiply that by dozens of minor utilities and the savings become megabytes over a workday. The performance difference is in reduced latency and less cognitive fragmentation for the model.
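The per-prompt arithmetic behind that claim, with illustrative counts; any one description is trivial, but the tax repeats on every prompt:

```python
# Scale check on lazy-loading savings. All counts are illustrative.
chars_per_description = 80
minor_utilities = 40        # small helpers left in the eager-loaded list
prompts_in_a_workday = 400

wasted_chars = chars_per_description * minor_utilities * prompts_in_a_workday
print(wasted_chars)  # 1280000 chars, on the order of a megabyte of repeated descriptions
```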
Corn
The discipline is part audit, part configuration, and part habit—knowing what you have, configuring it to load only when needed, and forming the habit of manual invocation for secondary tools.
Herman
That's the core of it. It turns context management from a black box into a deliberate engineering practice.
Corn
Let's say I'm convinced. I've got my InstructionsLoaded hook running, I'm horrified, and I want to clean house. What's the first move?
Herman
Immediate action: run the slash-memory and slash-plugin commands right now. Don't wait. That gives you a snapshot of your current bloat. Then, disable every plugin you're not actively using in this session. Not uninstall—just disable. That cuts off whole branches of eager loading instantly.
Corn
That's the emergency stop. What's next on the checklist?
Herman
Go through the audit list from Daniel's notes. Grep your plugins directory for any 'agent' key in settings.json files—that's the main-thread swap. Use 'wc -l' on every CLAUDE.md file in your chain; if any are over two hundred lines, break them up. Count the example blocks in your subagents and trim anything past two. It's grunt work, but it's the only way to find the silent accumulators.
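Part of that checklist can be scripted. This sketch counts example blocks in a subagent definition; the `<example>` tag name is an assumption about the file format, so adjust the pattern to match your actual agent files:

```python
import re

# Checklist helper: count <example> blocks in a subagent definition so you
# can trim anything past two. The tag name is an assumed convention.
def count_examples(agent_text: str) -> int:
    return len(re.findall(r"<example>", agent_text))

# A hypothetical subagent body with three example dialogues.
agent = (
    "You are a config reviewer.\n"
    "<example>one</example>\n"
    "<example>two</example>\n"
    "<example>three</example>\n"
)
print(count_examples(agent))  # 3, one past the suggested cap of two
```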
Corn
That's a one-time fix? Or is this a new long-term habit?
Herman
It's absolutely a habit. Think of it like cleaning your physical workspace at the end of the day. Before you close your editor, do a quick context audit. Have any new skills crept into the invocation list? Did auto-memory write a novel today? It's the skill equivalent of closing your tabs. The discipline is what prevents the creep from coming back.
Corn
There's always an exception. When is it okay to break these rules and just eat the context cost?
Herman
When the primitive is core to your workflow and its benefits vastly outweigh the memory tax. For example, if you're doing deep, multi-session work on a single complex codebase, a well-tuned subagent that understands the entire architecture might be worth its eight hundred tokens. Or if you're onboarding a new team member, a comprehensive CLAUDE.md might be justified for clarity. The key is that it's a conscious trade-off, not an accidental accumulation.
Corn
The rule is: know the cost, and only pay it deliberately.
Herman
Context is a resource. Spend it where it gives you the highest return. Everything else should be lazy, disabled, or deleted. And Corn, that's going to become even more critical moving forward.
Corn
It really is. I was looking at the Anthropic roadmap, and they're teasing some major context compression techniques slated for May.
Herman
Oh, I saw that. The early notes from their engineers suggest they’re working on a form of semantic chunking for the system prompt itself. Instead of loading every primitive’s full text, they’d generate a compressed index that the model can query on-demand.
Corn
Would that make all this optimization obsolete?
Herman
Not at all. It just changes the calculus. Compression introduces its own overhead—you’re trading raw token count for computational latency during retrieval. The core principle remains: context is the new CPU cycle. You manage it like a finite resource. Whether it’s tokens or retrieval latency, waste degrades performance.
Corn
The forward-looking thought is, the tools will get smarter, but the discipline stays.
Herman
The May update might let you fit more in, but it won’t forgive sloppy habits. If anything, it makes the audit checklist more valuable, because you’ll be able to see the cost in a new dimension—retrieval time.
Corn
Final thought for our listeners, then: your Claude instance’s memory is its workspace. Keep it clean, intentional, and pruned. Context is the new CPU cycle—manage it like one.
Herman
Couldn’t have said it better myself. Thanks, as always, to our producer, Hilbert Flumingtop, for keeping the audio crisp. And thanks to Modal for providing the serverless muscle that runs our pipeline—reliable compute without the infrastructure headache.
Corn
This has been My Weird Prompts. If you found this useful, leaving a review on Apple Podcasts helps others discover the show. All our episodes are at myweirdprompts.
Herman
Until next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.