Daniel sent us this one — he's asking about SearXNG and self-hostable search APIs, and he zeroes in on something that's actually pretty sharp. The obvious question everyone asks is how these things possibly work without Google-scale infrastructure. But the real question he's asking is subtler: when an AI agent sends the query instead of a human, does the retrieval itself change? Is there a meaningful difference in what comes back?
The answer is yes, but not for the reasons most people assume. Also, quick note — DeepSeek V four Pro is writing our script today, so if anything sounds unusually coherent, that's probably why.
I'll take that as a compliment directed somewhere else. So let's start with the infrastructure question, because it's the one that trips everyone up. People hear "self-hosted search API" and they picture someone's home server crawling the entire web. That's not what's happening.
Right, and this is the core architectural insight that makes the whole thing possible. SearXNG is not a crawler. It does not build or maintain its own web index. What it actually does is act as a metasearch engine — it forwards your query to more than seventy upstream providers, including Google, Bing, DuckDuckGo, Brave, and a bunch of others, fetches their responses, strips out all the tracking and identifying information, then scores, ranks, deduplicates, and returns the aggregated results. You're piggybacking on existing indexes rather than building your own.
Which is either brilliant or parasitic, depending on which side of the ad revenue equation you're sitting on.
I'd say it's a privacy layer, but fair. The point is, indexing the internet at scale genuinely is a massive undertaking. Google spent decades and billions of dollars building that infrastructure. SearXNG sidesteps the entire problem by never touching an index. It's a query router with a really good result processor.
Walk me through what actually happens when I type something into a SearXNG instance and hit enter. What's the pipeline?
First, the instance receives your query — it's a standard HTTP GET or POST to the root or the search endpoint. Second, it translates your query and dispatches it to whichever upstream engines you've selected, using adapter modules. There's one adapter per engine, stored in the engines directory of the codebase. Third, it fetches the raw results using the httpx library and parses them with lxml. Fourth, it anonymizes everything — strips your IP, strips tracking parameters, scrubs the request so the upstream engines can't fingerprint you. Fifth, it scores, ranks, deduplicates, and formats the output as HTML, JSON, CSV, or RSS.
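For listeners who want to see the shape of that first step, here is a minimal client-side sketch of hitting a SearXNG instance's search endpoint. The parameter names (`q`, `format`, `engines`) follow SearXNG's documented search API, but the helper functions and the localhost URL are illustrative, not part of the project itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def build_search_url(base_url, query, engines=None, fmt="json"):
    """Assemble the GET request a SearXNG /search endpoint expects."""
    params = {"q": query, "format": fmt}
    if engines:
        # A comma-separated list restricts the query to specific upstream engines.
        params["engines"] = ",".join(engines)
    return f"{base_url.rstrip('/')}/search?{urlencode(params)}"

def search(base_url, query, engines=None, timeout=10):
    """Fire the query and parse the JSON response (requires a running instance)."""
    with urlopen(build_search_url(base_url, query, engines), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `build_search_url("http://localhost:8080", "coffee grinder", ["google", "brave"])` produces the same request a browser form submit would, which is exactly why the instance can't tell humans and agents apart at this layer.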
The adapters are doing the heavy lifting of translating between SearXNG's internal query format and whatever each engine expects. That's a maintenance nightmare, isn't it? Every time Google tweaks their results page markup, someone has to update the parser.
It is a constant cat-and-mouse game, and that's actually one of the major operational challenges of running a metasearch engine. The adapters break, the community patches them, upstream changes break them again. But the architecture isolates the problem — each engine has its own module, so when Google changes something, you update one file, and the other seventy-plus engines keep working.
The scoring — how does it decide which results to show first when it's aggregating from dozens of different sources?
This is where it gets interesting. The ranking algorithm is a weighted position sum. The score for each result is calculated as the sum of occurrences times weight divided by position, for each position that result appears in. Occurrences is simply how many engines returned that same URL. Weight is a per-engine multiplier you can configure — so you could give Google results more weight than Bing results, or vice versa, or keep everything at the default of one point zero. And position is just where it ranked in each engine's results.
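In code, that weighted position sum works out to something like the following sketch. The function name and data shape are mine; the arithmetic mirrors the formula as described, with occurrences as the count of engines returning the URL.

```python
def result_score(placements):
    """Score one URL given its (engine_weight, position) pairs.

    occurrences = how many engines returned this URL;
    each placement contributes occurrences * weight / position.
    """
    occurrences = len(placements)
    return sum(occurrences * weight / position for weight, position in placements)

# A URL ranked #1 by three engines (default weight 1.0) scores far higher
# than one ranked #20 by a single engine.
consensus_hit = result_score([(1.0, 1), (1.0, 1), (1.0, 1)])  # 9.0
obscure_hit = result_score([(1.0, 20)])                       # 0.05
```

Note how the occurrences term multiplies every placement, so consensus compounds: each additional agreeing engine raises the contribution of all the others.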
If a page shows up as the number one result on five different engines, it's going to score very high. If it shows up as the number twenty result on one engine, it barely registers.
The formula favors results that rank highly across many engines, which is a pretty good proxy for relevance and reliability. It's essentially a consensus mechanism — if Google, Bing, DuckDuckGo, and Brave all agree that this is the best result, SearXNG trusts that consensus.
Which is clever until you think about filter bubbles and the fact that most search engines are optimizing for similar things. You might just be amplifying the same biases across the board.
That's a valid critique, and it's one of the reasons some people prefer to weight engines differently or even use SearXNG exclusively with non-mainstream engines. But for most use cases, the consensus approach works remarkably well. And the JSON API response structure is refreshingly clean — when you append format equals json to the query, you get back an object with the query string, the number of results, an array of result objects each containing the URL, title, content snippet, published date, which engines returned it, the score, and a category. You also get answers, suggestions, corrections, infoboxes, and a list of unresponsive engines.
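Given that response shape, a consumer on the agent side might distill it like this. The field names (`results`, `url`, `title`, `content`, `score`, `engines`) match the structure just described; the helper itself is a sketch, not part of SearXNG.

```python
def extract_snippets(response, top_n=5):
    """Reduce a SearXNG JSON response to the fields an LLM agent would consume."""
    results = sorted(response.get("results", []),
                     key=lambda r: r.get("score", 0.0), reverse=True)
    return [
        {
            "url": r["url"],
            "title": r.get("title", ""),
            "snippet": r.get("content", ""),
            "engines": r.get("engines", []),
        }
        for r in results[:top_n]
    ]
```

An agent pipeline would typically feed the `snippet` fields straight into the model's context window, which is the zero-click consumption pattern discussed next.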
It's structured, parseable, and machine-readable by design. Which brings us to the second half of Daniel's question — is there a meaningful difference in how the retrieval is returned when the query comes from an AI agent versus a human user?
Let me separate this into two layers, because I think that's where the interesting analysis lives. At the API level, the response is identical. SearXNG doesn't know or care whether the HTTP request came from a browser, a Python script, or an AI agent. If you hit the same endpoint with the same parameters, you get the same JSON back. The difference isn't in what the API returns — it's in what happens before and after the query.
That's the key distinction. The retrieval mechanism itself is agnostic to the consumer. But the query formulation, the consumption pattern, and the downstream processing are radically different.
The numbers here are striking. Human Google searches average three to five words. They're getting longer and more conversational, but they're still fundamentally short keyword queries. AI agents, by contrast, generate twenty-plus-word structured prompts with chained reasoning and tool calls embedded in them. The query itself is a different species.
The agent isn't just searching for "best coffee grinder" — it's constructing something like "identify the top-rated burr coffee grinders under two hundred dollars from specialty coffee review sites, exclude Amazon affiliate listicles, prioritize sources that publish particle distribution analysis." That's a fundamentally different interaction with the search engine.
That changes the retrieval in practice, even though the API is returning the same structure. A more specific, better-constructed query is going to surface more relevant results from the upstream engines. The agent is essentially doing query engineering in a way that most humans don't bother with.
There's another dimension here that I think is even more interesting. Sixty percent of AI-powered searches end without a click to any website. The AI consumes the snippet or the result directly and moves on. That's not a bug — that's the intended behavior for an agent that's doing research on your behalf. But it's a complete inversion of how search engines were designed to work.
Search engines were built on the implicit contract that they'd surface links, you'd click through, and the website would get traffic and ad revenue. The zero-click search breaks that contract entirely. And what's fascinating is that humans are already doing this too — eighty-five percent of people double-check AI answers via traditional search, but the AI itself often never visits a page.
We're training a generation of users to expect answers without destinations. That has massive implications for the economic model of the web, but let me pull us back to the technical question. Given that AI agents consume search results differently, does SearXNG's scoring algorithm need to change to serve them better?
I think this is an open question and an important one. The current weighted position sum algorithm was designed for human-facing results. It assumes that what matters is consensus among engines and high ranking. But if the consumer is an LLM that's going to read snippets and synthesize an answer, you might want to optimize for different signals — maybe snippet quality, factual density, source diversity, or even the presence of structured data that the model can parse more reliably.
Or you might want to deprioritize results that are clearly designed for human click-through — the ones with emotional headlines, the ones that bury the actual information behind narrative framing. An AI agent doesn't need to be seduced into clicking.
And this connects to something I've been tracking in the Model Context Protocol ecosystem. SearXNG is increasingly being used as a backend for AI agents through MCP servers. There are now at least four distinct SearXNG MCP servers out there, ranging from the minimalist reuteras implementation — which is pure search, no caching, very auditable — to the crawl four AI RAG MCP server, which is a full pipeline with crawling, vectorization, and retrieval-augmented generation built in.
The ecosystem is already fragmenting around the question of how much processing should happen between the search API and the AI model. Do you want a simple, transparent building block, or do you want a powerful but opaque all-in-one solution?
This mirrors a broader tension in AI tooling right now. The minimalist approach gives you auditability and control — you can inspect exactly what queries were sent, what results came back, and why the model made the decision it did. The all-in-one approach gives you better results faster, but you're trusting a black box. For production use cases where liability matters, I suspect the minimalist approach wins.
I think you're right, but I also think most people will default to the all-in-one because it's easier. The privacy angle here is worth unpacking too, because it's subtly different when an AI agent is involved.
This is what I've been calling the privacy paradox of AI agents using SearXNG. SearXNG was originally designed to protect human users from tracking — you search for something, and Google never knows it was you. But when an AI agent queries SearXNG, the privacy benefit shifts. It's not about protecting the AI, which has no privacy interests or identity to protect. It's about protecting the human who owns the agent from leaking their interests, their research directions, their business intelligence to Google and Bing.
SearXNG becomes infrastructure for data sovereignty in agentic workflows. If my AI agent is researching competitors, or exploring a new market, or digging into a sensitive topic, I don't want that query history building up in Google's profile of me — or of my organization.
This is not theoretical. AI agent traffic grew seven thousand eight hundred fifty-one percent year over year in twenty twenty-five. It's still less than five percent of total search queries, but the trajectory is unmistakable. Gartner is predicting that traditional search volume could drop twenty-five percent as users shift to generative AI assistants.
That's a staggering number. A quarter of all search traffic potentially migrating from search engines to AI agents that consume results without ever visiting a website.
The web is not built for that. The entire economic model — content creation, journalism, SEO, advertising — assumes that search results lead to pageviews, and pageviews lead to revenue. If sixty percent of AI searches end without a click, and AI searches are growing exponentially, we're looking at a structural collapse in referral traffic.
Which creates a perverse incentive for websites to block AI agents or degrade their experience. We're already seeing this — some publishers are serving different content to known AI crawlers, or blocking them entirely. The arms race between content creators and content consumers is heating up.
Let me pull on a technical thread that I think connects all of this. SearXNG's current architecture is Flask-based and synchronous, using uWSGI workers. Each worker consumes about a hundred and fifty megabytes of memory. For a human user typing one query at a time, that's fine. But for an AI agent that might fire dozens of rapid, parallel queries — researching multiple angles simultaneously — the synchronous overhead becomes a real bottleneck.
Because each query ties up a worker while it waits for responses from potentially dozens of upstream engines. If your agent fires twenty queries at once, you need twenty workers, and you're consuming three gigabytes of memory just for the search layer.
The community is actively discussing a migration to async ASGI — probably using aiohttp — specifically to address this. The I/O-bound nature of querying seventy-plus upstream engines makes it a perfect candidate for async. You could handle hundreds of concurrent queries with a fraction of the resources.
This is where commercial alternatives like Tavily or the Brave Search API start to look attractive for production AI workloads. They've already solved the async scaling problem, and they're optimized for machine consumption from the ground up. SearXNG is playing catch-up on the infrastructure side.
The trade-off is control and privacy. When you use Tavily or Brave, you're trusting their infrastructure, their logging policies, their business model. With SearXNG, you can run it on your own hardware, audit every line of code, and know exactly what's happening with your data. For some use cases, that's worth the operational overhead.
Let's talk about the alternatives for a moment, because Daniel mentioned self-hostable search APIs broadly, not just SearXNG. What else is out there?
The landscape splits into three categories. First, you've got the metasearch engines — SearXNG is the dominant open-source option, but there's also Whoogle, which is lighter weight and more opinionated. Second, you've got site-specific indexers like Elasticsearch, Meilisearch, and Typesense — these are incredibly powerful, but they only search data you've explicitly indexed. They're not searching the web. Third, you've got the truly ambitious projects that actually try to index the internet at scale.
Those are the ones that actually require Google-scale infrastructure, which is why they're rare.
Apache Nutch, paired with Hadoop and Elasticsearch, is the classic example — it's a real web crawler and indexer, but running it at any meaningful scale requires serious hardware and engineering effort. Mwmbl is a newer community-driven project that's more practical — as of mid-twenty twenty-five, they'd indexed about five hundred million URLs using term-hash-based retrieval and compressed inverted indexes. They're aiming for billions.
Five hundred million URLs sounds impressive until you realize Google has indexed hundreds of billions. It's not even in the same order of magnitude.
That gap is exactly why metasearch is the pragmatic choice for most self-hosted use cases. You're never going to out-index Google on a home server, or even on a modest cluster. But you can out-aggregate them, and you can do it with privacy guarantees that Google will never offer.
There's another angle on the AI agent question that I want to explore. You mentioned the query structure difference — agents use longer, more structured prompts. But there's also a difference in how agents interact with the API at the HTTP level.
AI agents complete form fills and page navigation in fractions of a second, with unnaturally smooth mouse movements and non-standard user-agent strings. They're detectable as automated traffic unless they deliberately try to mimic human behavior. And that detectability matters because many search engines — including Google — actively try to block automated queries.
Which is why SearXNG's anonymization layer is so valuable for agent use cases. The upstream engines see the query coming from SearXNG, not from the agent. As long as SearXNG itself isn't rate-limited or blocked, the agent can operate freely behind it.
This creates an interesting dynamic where SearXNG instances become strategic infrastructure for AI agent operators. If you're running a fleet of agents that need web search capability, a self-hosted SearXNG instance gives you a reliable, private, and configurable search backend that isn't subject to the API pricing and rate limits of commercial alternatives.
Though you're still dependent on the upstream engines not blocking SearXNG. If Google decides to crack down on metasearch engines, the whole model falls apart.
That's the existential risk, and it's not hypothetical. Google has periodically tightened access, and the SearXNG community has had to adapt — rotating IP addresses, adjusting request patterns, updating parsers. It's a constant arms race. But the fact that SearXNG has survived and thrived for years suggests that completely blocking metasearch is harder than it looks, especially when the queries are well-distributed across many instances.
Let me pose a question that I think gets to the heart of what Daniel is asking. If you're building an AI agent that needs web search capability, and you're choosing between SearXNG and a commercial API like Brave or Tavily, what's the actual decision framework?
I'd break it down into four dimensions. First, privacy and data sovereignty — if you care about your query history not being logged and monetized, SearXNG wins. Second, result quality — commercial APIs often have cleaner, more structured responses because they're designed for machine consumption. SearXNG's JSON is good, but it's a layer on top of HTML scraping, which means occasional parsing failures. Third, operational complexity — SearXNG requires you to run and maintain a service. Commercial APIs are just an API key. Fourth, cost — SearXNG is free software, but you're paying for the server it runs on. Commercial APIs charge per query.
The fifth dimension that nobody talks about until it bites them: legal and terms of service exposure. When you use the Google API directly, you're bound by their terms, which almost certainly prohibit scraping or automated querying at scale. SearXNG sits in a gray area — you're not directly violating Google's terms because you're not the one making the queries.
Though that's a legal distinction that hasn't been fully tested. I wouldn't want to be the test case.
So where does this leave us? For most people building AI agents, SearXNG or something like it is the practical choice if you care about privacy and control. Commercial APIs are the practical choice if you care about reliability and don't want to manage infrastructure. And truly self-hosted web indexing remains a niche pursuit for people with unusual resources or unusual threat models.
I think that's a fair summary. But I want to add one more layer that connects back to something I mentioned earlier — the MCP ecosystem fragmentation. The fact that there are four different MCP servers for SearXNG tells you that this is an unsolved design problem. We don't yet know the right abstraction layer between search APIs and AI agents.
Is the right model a thin wrapper that just translates MCP calls to HTTP requests? Or is it a thick middleware that handles caching, result processing, content extraction, and vectorization? The answer probably depends on the use case, but the fragmentation suggests the community hasn't converged yet.
That fragmentation is healthy, honestly. It means people are experimenting. The thin wrapper approach — like the reuteras server — gives you maximum transparency. You can see exactly what SearXNG returned, and you can debug why your agent made a particular decision. The thick approach — like crawl four AI RAG — gives you better results faster, but you lose that audit trail.
For what it's worth, I'm in the thin wrapper camp. I'd rather have visibility into failures than optimized results with no way to understand edge cases.
I lean that way too, especially for anything production-facing. But I understand the appeal of the thick approach for prototyping and personal use. The question is whether the thick approach becomes a crutch that prevents people from understanding their own systems.
That's a broader AI problem, not specific to search. But it's real. Okay, let me try to synthesize what we've covered, because we've ranged pretty widely. SearXNG works by aggregating results from existing search engines, not by building its own index. The retrieval mechanism itself is identical whether the query comes from a human or an AI agent — same API, same response structure. But the query formulation, the consumption pattern, and the downstream processing are fundamentally different. AI agents write longer, more structured queries, they often consume results without clicking through to websites, and they fire queries at volumes and speeds that stress synchronous architectures.
That last point — the architectural stress — is going to drive a lot of the evolution in this space over the next few years. Async backends, better caching, smarter rate limiting. The tools that serve AI agents well are going to look different from the tools that served human search well.
Which is a good place to pivot to the practical question. If someone listening wants to set this up, what should they actually do?
The simplest path is to deploy SearXNG via Docker. The official image is well-maintained, and you can have an instance running in about ten minutes. The key configuration decisions are which engines to enable — I'd recommend starting with Google, Bing, DuckDuckGo, and Brave, then adjusting based on your needs — and whether to enable the JSON API, which is off by default and needs to be explicitly enabled in settings dot yaml.
If you're setting this up specifically as a backend for AI agents, you'll want to enable the JSON API and probably bump up the rate limiting defaults, because agents will hit your instance harder than human users would.
The default rate limits are calibrated for human browsing behavior. An AI agent doing research can easily exceed them. You'll also want to think about caching — if your agent is going to query similar things repeatedly, adding a caching layer in front of SearXNG can dramatically reduce load on the upstream engines and speed up responses.
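A minimal caching layer of that kind might look like the sketch below. The class is hypothetical: the actual fetch function is injected, so the time-to-live logic is independent of how you talk to SearXNG.

```python
import time

class TTLSearchCache:
    """Memoize (query -> response) pairs for a fixed time window."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch      # callable that actually hits the SearXNG instance
        self._ttl = ttl_seconds
        self._clock = clock      # injectable clock, which makes testing easy
        self._store = {}         # query -> (timestamp, response)

    def search(self, query):
        now = self._clock()
        hit = self._store.get(query)
        if hit and now - hit[0] < self._ttl:
            return hit[1]        # fresh cached response, zero upstream load
        response = self._fetch(query)
        self._store[query] = (now, response)
        return response
```

Wrap whatever function performs the HTTP query, and repeated agent queries for the same string are served locally instead of fanning out to seventy upstream engines again.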
If you don't want to self-host, there are public SearXNG instances out there. But you're trusting the operator of that instance with your query data, which kind of defeats the privacy purpose.
It does, but it's still better than querying Google directly in some threat models, because Google is building a persistent profile linked to your identity, while a random SearXNG operator probably isn't. Though you should assume they could be logging your queries.
The threat model question is actually a good way to frame the whole decision. What are you actually trying to protect against? If it's Google's profiling and ad targeting, any SearXNG instance helps. If it's nation-state surveillance, you need to self-host and you need to trust your hosting provider. If it's your own ISP or local network adversary, you need to add transport encryption and possibly route through a VPN or Tor.
If you're an AI agent operator, the threat model often includes competitors trying to infer your research directions from your search patterns. That's a business intelligence leak that most people aren't thinking about yet, but it's going to become a real concern as agent usage grows.
Alright, before we wrap, I believe I'm on fun fact duty today.
Go for it.
Now: Hilbert's daily fun fact. The Portuguese man o' war is not a single organism but a colonial organism made up of specialized individual animals called zooids, each performing one function — feeding, reproduction, defense — and none of them able to survive on its own. It is not a jellyfish, despite looking exactly like one.
For listeners who want to actually do something with this, here's what I'd suggest. First, if you're curious about self-hosted search, spin up a SearXNG Docker instance and just use it as your daily search engine for a week. You'll learn a lot about what you value in search results. Second, if you're building AI agents, think hard about the thin-versus-thick middleware question before you commit to a particular MCP server or integration pattern. The choice you make early will shape your debugging experience for months.
Third, pay attention to the zero-click search trend, even if you're not building anything. The shift from search engines sending traffic to websites to search engines being the destination is going to reshape the internet in ways that affect everyone who publishes anything online.
The open question I keep coming back to is whether SearXNG's consensus-based ranking algorithm is actually the right approach for AI consumption. If the consumer is a language model reading snippets, maybe we should be optimizing for different signals entirely — factual density, source diversity, absence of SEO manipulation. But nobody's really solved that yet, and the people who do solve it are going to build something very valuable.
This has been My Weird Prompts. Thanks to our producer Hilbert Flumingtop, and thanks to Daniel for the question. If you want more episodes like this one, you can find us at myweirdprompts dot com or on Spotify.
See you next time.