#2472: AI Gateways: Where Guardrails Actually Break

PII detection at the gateway layer can block legitimate invoices. Here's how guardrails actually work and where they fail.

Episode Details
Episode ID
MWP-2630
Published
Duration
24:36
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
deepseek-v4-pro

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

AI gateways are increasingly marketed as the ideal layer for guardrails — PII detection, secret scanning, data loss prevention, all running before prompts ever hit a model. But the implementation details reveal a more complicated picture.

How Guardrails Actually Work

Portkey's implementation is the most documented example. Their PII detection runs as a "before request hook" — scanning prompts for phone numbers, email addresses, physical locations, IP addresses, Social Security numbers, names, and credit card numbers. When something matches, it's replaced with a numbered placeholder (NAME_1, EMAIL_ADDRESS_1, etc.), preserving relational structure without exposing actual data.
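A minimal sketch of that numbered-placeholder scheme, assuming simple regex detectors (the patterns and labels here are illustrative, not Portkey's actual implementation, which covers many more entity types):

```python
import re

# Illustrative detectors only; real gateways use far richer patterns.
PII_PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace each PII match with a numbered placeholder. Repeated
    values map to the same placeholder, preserving the relational
    structure of the prompt without exposing the actual data."""
    mapping: dict[str, str] = {}
    counters: dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        def sub(m, label=label):
            value = m.group(0)
            if value not in mapping:
                counters[label] = counters.get(label, 0) + 1
                mapping[value] = f"{label}_{counters[label]}"
            return mapping[value]
        prompt = pattern.sub(sub, prompt)
    return prompt, mapping
```

The mapping is what makes the scheme reversible at the gateway: placeholders in the model's output can be swapped back to real values before the response reaches the caller.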

Cloudflare takes a different approach. Their DLP feature scans both incoming prompts and outgoing responses, using the same detection profiles from their Cloudflare One product. This gives enterprises consistency across their entire stack — but for streaming responses, the DLP scanning buffers the entire response before releasing it, increasing time-to-first-token latency proportionally to the full response generation time.
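The buffering behavior can be sketched as a wrapper around a chunk stream; `scan` stands in for whatever DLP check runs on the full text. This illustrates the latency tradeoff, not Cloudflare's implementation:

```python
from typing import Callable, Iterator

def buffered_dlp(chunks: Iterator[str],
                 scan: Callable[[str], str]) -> Iterator[str]:
    """Hold back every chunk until the stream completes, scan the full
    response once, then release it. This is why time-to-first-token
    grows with total generation time: nothing reaches the client until
    the model has finished generating."""
    full = "".join(chunks)   # buffer the entire response
    yield scan(full)         # release only after scanning
```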

The Performance Reality

The performance overhead varies dramatically across providers. Portkey with guardrails enabled adds 20-40 milliseconds — noticeable but acceptable for most use cases. Bifrost (Maxim AI's gateway) claims about 11 microseconds for regex-based PII detection, essentially free. But LiteLLM's P99 latency spikes to 28 seconds at 500 requests per second, because their PII masking uses Microsoft Presidio under the hood, a more heavyweight approach.

The Semantic Gap Problem

The gateway sees a string of text but doesn't know the context. An address in an invoice generation prompt is legitimate — it's the entire point of the request. But the gateway can't distinguish between "generate an invoice with my address at 123 Main Street" and "here's a customer's address, 123 Main Street, please store this." Pattern matching always produces edge cases.

The Precision vs Recall Tradeoff

Guardrail testing frameworks now track F1 scores, which balance precision against recall. Overly aggressive guardrails achieve high recall at the cost of precision: they catch everything but also block legitimate requests, frustrating users and slowing workflows. The real danger: when guardrails are too aggressive, people start working around them, creating shadow AI traffic that's worse than having no guardrails at all.

The Two-Layer Solution

The pragmatic approach is probably both: gateway for broad DLP (credit card numbers, API keys accidentally pasted) and application-layer for context-aware decisions ("yes, this is PII, but it's supposed to be here"). Gateway guardrails catch the obvious stuff; application guardrails handle the edge cases. Neither layer alone is sufficient.
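A sketch of that split, with made-up patterns and a hypothetical per-route exemption table (`PII_EXPECTED`); a real deployment would express this in the gateway's own configuration rather than application code:

```python
import re

# Context-free secrets: blocked everywhere, no exceptions.
ALWAYS_BLOCK = {
    "CREDIT_CARD": re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
}
# Context-dependent PII: legitimate on some routes, suspicious on others.
CONTEXTUAL = {
    "ADDRESS": re.compile(r"\b\d+\s+\w+\s+(?:Street|Avenue|Road)\b"),
}
# The application layer knows which routes expect PII (hypothetical).
PII_EXPECTED = {"/invoices/generate": {"ADDRESS"}}

def check(route: str, prompt: str) -> str:
    for label, pat in ALWAYS_BLOCK.items():
        if pat.search(prompt):
            return f"blocked:{label}"
    for label, pat in CONTEXTUAL.items():
        if pat.search(prompt) and label not in PII_EXPECTED.get(route, set()):
            return f"flagged:{label}"
    return "allowed"
```

The same address string yields different decisions depending on the route, which is exactly the context a gateway alone cannot see.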


Transcript

Corn
Daniel sent us this one — he's been thinking about AI gateways, specifically how the newer ones are offering guardrail features that people might not even realize are sitting there. PII detection, secret scanning, data loss prevention, all running at the gateway level before prompts ever hit the model. He's asking how these are actually implemented, whether the gateway is the right layer for this stuff, and what the trade-offs are when you start filtering prompts aggressively — his example is perfect: if your personal address gets flagged as PII while you're trying to generate an invoice, that's a problem. So where do we even start with this?
Herman
Before we get into the weeds, quick note — today's script is coming from DeepSeek V four Pro. Which I find fitting because DeepSeek's own architecture decisions around where they put their content filters have been a whole conversation in themselves.
Corn
Right, and that's actually a nice segue because what Daniel's asking about here is fundamentally an architecture question. Where do you put the guardrails? And the answer most of the industry is converging on is the gateway layer, but the reasons are more interesting than just "it's centralized."
Herman
The centralization argument is actually underselling it. The way the Cloud Security Alliance framed this back in December was that the AI gateway functions as a Policy Enforcement Point — their term, not mine — that intercepts every prompt and every response. And the real value isn't just that it's one place to configure things. It's that you get uniform enforcement across every model you're routing to, regardless of what the model provider does or doesn't do natively.
Corn
Which matters because if you're routing to Anthropic's API, they have their own safety filters. If you're routing to a fine-tuned Llama model on Together, they might have different ones. If you're hitting an open-source model on your own infrastructure, you've got nothing unless you built it yourself. The gateway normalizes all of that.
Herman
OpenRouter is the perfect example of the opposite approach. OpenRouter is excellent at what it does — model availability, routing, easy experimentation. But they have zero built-in guardrails. No PII detection, no secret scanning, no content moderation. Requesty did a comparison piece where they basically said, look, with OpenRouter you're going to be writing custom code or bolting on third-party libraries to get any kind of data filtering.
Corn
Which isn't necessarily a criticism of OpenRouter. They're a routing layer. They're thin on purpose. The problem is that teams start with OpenRouter for prototyping, then they move to production, and suddenly they realize they need guardrails and OpenRouter can't give them that. So now what?
Herman
This is where the "gateway stack" pattern emerges — and I think this is genuinely under-discussed. Organizations end up putting Cloudflare AI Gateway in front of OpenRouter. So you've got a gateway in front of a gateway. Cloudflare handles the DLP and guardrails, OpenRouter handles the model routing. It works, but now you've added latency from two hops instead of one.
Corn
Let's talk about what these guardrails actually do at a technical level, because Daniel's question about PII during invoice generation gets at something really specific. How does the detection actually work, and what happens when it triggers?
Herman
Portkey's implementation is the most mature one I've seen documented, so let's walk through it. They have over sixty built-in guardrails, and the PII detection sits as what they call a "before request hook." Before your prompt hits the model, it gets scanned. They're looking for phone numbers, email addresses, physical locations, IP addresses, Social Security numbers, names, credit card numbers. When something matches, they don't just block it — they replace it with a numbered placeholder.
Corn
Numbered, meaning it's not just a generic "PII redacted" tag. It's tracking which specific instance.
Herman
So if your prompt has two names in it, you get NAME_1 and NAME_2. If there are three email addresses, EMAIL_ADDRESS_1, EMAIL_ADDRESS_2, EMAIL_ADDRESS_3. That's actually clever because it preserves the relational structure of the information. The model can still reason about "the person named NAME_1 sent an email from EMAIL_ADDRESS_1" without ever seeing the actual data.
Corn
Here's where Daniel's invoice problem kicks in. If I'm generating an invoice and my address gets replaced with LOCATION_1, the model can't put my actual address on the invoice. The output is going to say LOCATION_1. That invoice is useless.
Herman
Right, and Portkey's documentation actually acknowledges this tension through their HTTP status code design. They use custom codes — a 200 means the request went through, PII was redacted if it was found. A 246 means PII was detected but the request continued anyway. And a 446 means the request was blocked entirely because PII was found and the deny flag was set to true.
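The status codes Herman describes might be handled like this on the client side (a sketch; the response body shape here is an assumption, not Portkey's documented format):

```python
def handle_guardrail_response(status: int, body: dict) -> dict:
    """Handle Portkey-style guardrail status codes: 200 clean or
    silently redacted, 246 detected-but-continued, 446 blocked."""
    if status == 446:
        # PII detected and deny=true: the request never reached the model.
        raise PermissionError("blocked by guardrail: "
                              + body.get("reason", "PII detected"))
    if status == 246:
        # PII detected but the request went through; surface for review.
        print("guardrail warning:", body.get("reason", "PII detected"))
    # 200: clean, or PII was redacted before the model saw it.
    return body
```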
Corn
The 246 is the interesting middle ground. It's saying, "Hey, we saw something, we're letting it through, but you should know about it." That's the kind of granularity you need if you're going to handle cases like invoice generation without breaking everything.
Herman
Even that assumes you've configured things correctly. And this is where I think the practical reality gets messy. Most teams, especially smaller ones, are going to set these guardrails once and then forget about them. They're not going to build per-endpoint exemption logic. So they'll turn on PII detection globally, and then three weeks later someone in accounting can't generate invoices and they don't know why.
Corn
The debugging experience for that is terrible. You send a prompt, it gets blocked or redacted, and unless your gateway gives you really good observability into what happened, you're just staring at a broken output with no idea that a guardrail was the culprit.
Herman
Cloudflare's approach is different in some interesting ways. Their DLP feature, which they documented just this month actually, scans both incoming prompts and outgoing responses. And they're using the same detection profiles from their Cloudflare One DLP product — so if you're already a Cloudflare shop, you're getting consistency across your entire stack, not just AI traffic.
Corn
That's a real selling point for enterprises. You don't want your AI gateway detecting PII differently than your email filter or your file storage scanner.
Herman
They also have a limitation that I think is really significant and probably underappreciated. For streaming responses, the DLP scanning buffers the entire response before releasing it. Their documentation is explicit about this — it increases time-to-first-token latency proportionally to the full response generation time.
Corn
Oh, that's brutal for chat applications. If you're generating a long response, the user is sitting there staring at nothing while the entire thing gets scanned behind the scenes.
Herman
Portkey handles this differently — for streaming, their output guardrails are what they call "informational only." No fallback actions, no retries. They just flag issues but don't intervene. Which is a pragmatic compromise, but it also means you're not actually preventing sensitive data from being streamed out. You're just... aware of it after the fact.
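The "informational only" mode might look like this: chunks are forwarded immediately and detections merely logged. An illustration rather than Portkey's code, using a single assumed email pattern:

```python
import re
from typing import Iterator

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # illustrative detector

def informational_stream(chunks: Iterator[str],
                         log: list) -> Iterator[str]:
    """Forward every chunk untouched (no buffering, no intervention),
    recording detections for later review. A match split across two
    chunks is missed entirely, one reason this mode is weaker."""
    for chunk in chunks:
        if EMAIL.search(chunk):
            log.append(("EMAIL", chunk))
        yield chunk  # forwarded untouched either way
```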
Corn
You've got this fundamental trade-off. Either you buffer the whole response and kill the user experience for streaming, or you let it stream and accept that your guardrails are more like guard-suggestions on the output side.
Herman
The performance overhead varies wildly across providers, which is something teams need to actually benchmark rather than just assuming it's negligible. There was a comparison by SlashLLM in March — Portkey with guardrails enabled adds about twenty to forty milliseconds. That's noticeable but probably acceptable for most use cases. Bifrost, which is Maxim AI's gateway, claims about eleven microseconds for their regex-based PII detection. That's basically free.
Corn
Eleven microseconds is in the noise. Twenty to forty milliseconds is not nothing, especially if you're chaining multiple calls.
Herman
Then there's the extreme case — LiteLLM's P99 latency spikes to twenty-eight seconds at five hundred requests per second. That's not a typo. Twenty-eight seconds.
Corn
That's not a gateway anymore, that's a waiting room.
Herman
To be fair, LiteLLM's PII masking is in beta and it's using Microsoft Presidio under the hood, which is a more heavyweight approach. It's less mature than what Portkey or Cloudflare are doing. But it's a good reminder that "guardrails at the gateway" sounds clean in theory, and then the implementation details can completely undermine the value proposition if you're not careful.
Corn
Let me poke at the architectural question Daniel raised, because I think there's a genuine tension here that the industry hasn't fully resolved. The gateway layer is being positioned as the ideal place for guardrails — and you can see why. Centralized enforcement, no code changes needed, consistent across models. But the gateway is also the layer with the least context about what you're actually trying to do.
Herman
This is the semantic gap problem. The gateway sees a string of text. It doesn't know whether that string is an address being leaked accidentally or an address being used in a legitimate business process. It's just pattern matching.
Corn
Pattern matching is always going to have edge cases. Daniel's invoice example is perfect. The address isn't sensitive in that context — it's the entire point of the prompt. But the gateway can't tell the difference between "please generate an invoice with my address at 123 Main Street" and "here's my customer's address, 123 Main Street, please store this in the database."
Herman
The application layer has way more context. Your invoicing application knows that this endpoint is supposed to receive addresses. Your CRM knows that phone numbers are expected inputs. But the problem with application-layer guardrails is that they're inconsistent. Every application has to implement them separately, and someone's going to forget.
Corn
The pragmatic answer is probably both, right? Gateway for broad DLP — catch the obvious stuff, the credit card numbers that should never be in a prompt, the API keys that someone accidentally pasted. And then application-layer for context-aware decisions — "yes, this is PII, but it's supposed to be here."
Herman
I think this is where the "guardrail aggressiveness" question becomes really practical. The testing ecosystem around this is actually getting more sophisticated. There are frameworks now that track F1 scores for guardrails — precision versus recall. If your guardrail is too aggressive, you get high recall but terrible precision. You're catching everything, but you're also blocking tons of legitimate requests.
Corn
Which is exactly the invoice problem. High precision means you only block things that are actually problems. High recall means you catch every possible leak but frustrate your users constantly.
Herman
The Budecosystem piece on guardrail testing had a line that stuck with me — "safe, useful prompts are incorrectly flagged as unsafe, frustrating users, slowing workflows, or silencing valid outputs." The "silencing valid outputs" part is the real danger. If your guardrails are too aggressive, people start working around them. They'll find ways to bypass the gateway entirely, which is worse than having no guardrails at all because now you've got shadow AI traffic you can't even see.
Corn
There's also the question of where in the prompt lifecycle the scanning happens. Pre-inference versus post-inference scanning have completely different failure modes. Pre-inference, you're blocking or redacting before the model sees anything — that protects the model provider from seeing sensitive data, but it can break the task. Post-inference, you're scanning the output — that doesn't interfere with the model's reasoning, but the sensitive data already went to the model provider.
Herman
Most of the gateways are doing pre-inference scanning for input guardrails. Portkey's before-request hooks, Cloudflare's prompt scanning. The sensitive data never leaves the gateway. But Daniel's invoice case shows exactly why that can be the wrong call. Sometimes you want the model to see the data, you just don't want it to leak out somewhere else.
Corn
Which suggests that the ideal configuration might actually be model-dependent. If you're using a local model running on your own infrastructure, you probably don't need pre-inference PII scanning at all — the data never leaves your network. You might only care about output scanning to prevent the model from generating PII that was in its training data.
Herman
If you're hitting Anthropic or OpenAI's API, you might care a lot about what data leaves your environment. Even if those providers have strong data handling policies, many enterprises have compliance requirements that say PII can't be transmitted to third-party APIs, period.
Corn
Let's talk about the specific implementations a bit more, because Daniel mentioned Portkey and I think it's worth understanding what's actually happening under the hood. You mentioned the numbered placeholders — that's a specific design choice that has implications.
Herman
It's effectively tokenization of sensitive fields. And the value is that it preserves the model's ability to do entity-level reasoning. If I say "John Smith lives at 456 Oak Avenue and his phone number is 555-0123" and the guardrail replaces that with "NAME_1 lives at LOCATION_1 and his phone number is PHONE_1", the model can still understand that these three pieces of information belong to the same entity. It just doesn't know the actual values.
Corn
Which is clever, but it also means the guardrail is making a semantic decision — it's deciding that these things are entities that relate to each other. What happens when the guardrail gets that wrong?
Herman
That's the failure mode, and it's not well-documented. If the PII detector mistakes a product code for a Social Security number, you get a redaction that breaks the prompt in ways that are hard to debug. The model sees a placeholder where it expected a product identifier, and now you're getting hallucinations or refusals that make no sense.
Corn
This connects back to something Daniel mentioned in his prompt — that people often skip these features because they're worried about over-aggressiveness. It's not laziness. It's a rational response to the risk of breaking your production workflows.
Herman
The counterpoint, though, is that the consequences of not having guardrails are potentially much worse. A developer accidentally pastes an API key into a prompt that goes to a third-party model provider. Or a customer service agent copies a customer's credit card number into a chat interface. These aren't hypothetical — they happen constantly.
Corn
What's the practical answer for someone who's using a gateway today and wants to add these guardrails without breaking everything?
Herman
I think the starting point is to run guardrails in what Portkey calls the 246 mode — detect and log, but don't block. Just get visibility into what's flowing through your system. Most teams have no idea how much PII is actually in their prompts until they start scanning.
Corn
Which is probably terrifying, honestly.
Herman
It usually is. But once you have that visibility, you can start tuning. Maybe you block credit card numbers and API keys aggressively — those should never be in prompts. But you only flag addresses and names for review. Or you set up different policies for different endpoints — your invoice generation endpoint gets a PII exemption, but your general-purpose chat endpoint scans everything.
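The per-endpoint tuning Herman describes amounts to a policy table like this — the routes, entity labels, and actions are all hypothetical:

```python
# Aggressive blocking where false positives are rare; flag-only where
# context matters; per-endpoint exemptions configured once at the gateway.
POLICIES = {
    "default": {"CREDIT_CARD": "block", "API_KEY": "block",
                "ADDRESS": "flag", "NAME": "flag"},
    "/invoices/generate": {"CREDIT_CARD": "block", "API_KEY": "block",
                           "ADDRESS": "allow", "NAME": "allow"},
}

def action_for(endpoint: str, entity: str) -> str:
    """Look up the configured action for a detected entity, falling
    back to the default policy for unlisted endpoints."""
    policy = POLICIES.get(endpoint, POLICIES["default"])
    return policy.get(entity, "allow")
```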
Corn
That's where the gateway model really shines. If you're doing this at the application layer, you have to implement that per-endpoint logic in every application. At the gateway, you configure it once and route different traffic through different policies.
Herman
Cloudflare's approach with their DLP profiles is interesting here because they're reusing the same detection profiles across their entire platform. If you've already tuned your DLP profiles for email and file storage, those same profiles apply to your AI traffic. You're not starting from scratch.
Corn
Which is a real operational advantage, but it also assumes you're already invested in the Cloudflare ecosystem. If you're not, that's a lot of infrastructure to adopt just for AI gateway guardrails.
Herman
That's where Portkey or Requesty might be more practical. Requesty in particular is positioning themselves as "OpenRouter plus guardrails." They've got PII redaction, secret key protection, prompt injection checks all built in. And they offer EU hosting for GDPR compliance, which is a specific requirement that OpenRouter doesn't address.
Corn
The GDPR point is actually significant. If you're a European company, or you handle European customer data, the data residency question isn't optional. And a lot of the lightweight routing gateways just don't address it.
Herman
Requesty also has configurable logging controls, which touches on another dimension of this. Guardrails protect the data going to the model, but you also need to think about what data the gateway itself is storing. If your gateway is logging every prompt and response for debugging, and those logs contain PII, you've just moved the problem rather than solving it.
Corn
Right, you've protected the model provider from seeing the PII, but now you've got a database full of sensitive data in your gateway logs. And gateway logs are exactly the kind of thing that nobody thinks to secure properly.
Herman
This is where the "gateway as central governance layer" argument gets a little too tidy. Yes, it's one place to enforce policies. But it's also one place that, if compromised, exposes everything. Every prompt, every response, every API key, every piece of PII that wasn't caught by the guardrails.
Corn
Let me try to synthesize what we've actually learned here, because Daniel asked a practical question and I want to make sure we're giving a practical answer. The gateway is emerging as the dominant layer for these guardrails, and there are good reasons for that — centralized enforcement, consistency across models, no code changes needed. But the implementation details matter enormously. The difference between Portkey's twenty to forty milliseconds and LiteLLM's twenty-eight seconds is the difference between usable and unusable. The difference between blocking PII and redacting it with placeholders is the difference between breaking your workflows and preserving them. The difference between pre-inference and post-inference scanning determines whether the model ever sees the data at all.
Herman
I'd add that the streaming question is a genuine unsolved problem. Cloudflare buffers the whole response, Portkey doesn't intervene on streaming output, and neither approach is ideal. If you're building a real-time chat application, you're going to have to accept some compromise — either latency or reduced protection.
Corn
The invoice problem Daniel raised is really the perfect illustration of why this isn't just a technology question. It's a configuration and context question. The same address that's a PII leak in one context is a required business input in another. And the gateway, by itself, can't tell the difference.
Herman
Which is why I keep coming back to the hybrid model. Gateway for the stuff that's unambiguously sensitive regardless of context — credit card numbers, Social Security numbers, API keys. Those should be blocked or redacted everywhere, no exceptions. Application layer for the context-dependent stuff — addresses, names, phone numbers that might be legitimate inputs depending on the use case.
Corn
The good news is that the tools are maturing fast. A year ago, most of this was custom code. Now you've got Portkey with sixty-plus built-in guardrails, Cloudflare with their DLP integration, Requesty with out-of-the-box PII redaction. The barrier to entry has dropped dramatically.
Herman
The configuration burden hasn't gone away. You still need to think about what you're scanning, how aggressively, and what happens when something is detected. The tools give you knobs, but you have to actually turn them to the right settings for your use case.
Corn
Now — Hilbert's daily fun fact.
Herman
The Greenland shark can live for over four hundred years, making it the longest-living vertebrate known to science. Some individuals alive today were swimming in the ocean when the Mayflower arrived in North America.
Corn
If you're setting up guardrails on your AI gateway, here's what I'd actually recommend. Start in detection-only mode. Don't block anything. Just get a week or two of data on what's actually flowing through your system. You will almost certainly be surprised by what you find.
Herman
Once you have that baseline, block the unambiguous stuff first. Credit card numbers, API keys, Social Security numbers. These should never be in prompts, and the false positive rate on these patterns is extremely low. You can be aggressive here without breaking workflows.
Corn
Then, for the fuzzier stuff like names and addresses, use redaction with placeholders rather than blocking. That way the model can still reason about the entities even if it can't see the actual values. And set up per-endpoint policies for the cases where you know PII is expected — your invoice generation, your CRM integration, your customer support tools.
Herman
Benchmark the performance impact before you deploy to production. The difference between eleven microseconds and forty milliseconds might not matter for a batch processing pipeline, but it matters a lot for a chat interface. Test with your actual traffic patterns, not just a single prompt.
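One way to run that benchmark: time a detector over a sample of your own prompts and report mean microseconds per scan. The single regex here is a stand-in; swap in your gateway's actual scan path:

```python
import re
import time

PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # SSN-shaped, illustrative

def benchmark(prompts: list[str], runs: int = 100) -> float:
    """Mean per-prompt scan time in microseconds, measured over your
    own traffic sample rather than trusting vendor numbers."""
    start = time.perf_counter()
    for _ in range(runs):
        for p in prompts:
            PATTERN.search(p)
    elapsed = time.perf_counter() - start
    return elapsed / (runs * len(prompts)) * 1e6
```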
Corn
Finally, don't forget about the output side. Input guardrails protect the model provider from seeing your data. Output guardrails protect your users from seeing things they shouldn't. They serve different purposes and you probably need both.
Herman
The one thing I'd caution against is assuming that any of this is set-and-forget. Guardrails need ongoing tuning. Your usage patterns change, new types of sensitive data emerge, the models themselves evolve. If you configure your PII detection once and never look at it again, you're going to end up either over-blocking and frustrating users, or under-blocking and leaking data. Probably both, depending on the endpoint.
Corn
Looking forward, I think the interesting question is whether guardrails eventually move from the gateway into the model layer itself. Anthropic and OpenAI both have their own safety systems. As those get more sophisticated, does the gateway's role shift from enforcement to coordination? Or does the gateway remain the policy layer because enterprises want control independent of any single model provider?
Herman
My bet is on the gateway remaining the policy layer, specifically because of that multi-model reality. As long as organizations are using different models for different tasks, they need a model-agnostic enforcement point. The model providers can't solve that problem because they only control their own APIs.
Corn
That makes sense. And it means the gateway isn't going away — it's just getting more sophisticated in what it can do beyond routing.
Herman
Thanks to our producer Hilbert Flumingtop for keeping this show running. This has been My Weird Prompts. You can find every episode at myweirdprompts.
Corn
If you've got thoughts on where guardrails belong in the AI stack, we'd love to hear them. Until next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.