So Daniel sent us this one. He writes, quote, I was today years old when I learned that you can do vector embeddings with SQLite. Let's talk about sqlite-vec and when and how it can be used as a lightweight vector storage system, especially for prototypes. He wants us to explore that delightful contradiction — SQLite, the simplest database, now handling the complex data type of embeddings — and figure out when this simple solution actually beats a complex vector database.
That is a fantastic prompt. And by the way, today's episode is powered by DeepSeek V three point two.
I'm sure it has very strong opinions on vector databases. So, the 'today years old' feeling is real. You spend all this time hearing about Pinecone and Weaviate and Chroma, the whole specialized vector DB ecosystem, and then you find out the little database that powers your phone and your browser and probably your thermostat can just… do it.
It feels almost subversive. The entire premise of the last few years in AI infrastructure has been that embeddings are this exotic new data type requiring bespoke, scalable, cloud-native solutions. And then Alex Garcia comes along and says, well, actually, you can store them in a file. On your laptop.
Which is exactly the kind of elegant, pragmatic chaos I enjoy. So the core question Daniel's asking is the right one. When does the simplest possible thing win? When do you reach for sqlite-vec, and when is that actually the professional choice, not just the lazy one?
I think that's where the real insight lives. Because the instinct is to dismiss it as a toy. But if your problem fits inside the envelope of what SQLite is good at — and that envelope is surprisingly large — then adding vectors doesn't make it a toy. It makes it a vector database that happens to be incredibly simple.
Alright. Let's crack open the hood.
So, sqlite-vec is the successor to Alex Garcia's earlier project, sqlite-vss. The first stable release, v0.1.0, dropped in August of twenty twenty-four. The core mechanism is a virtual table extension.
Virtual tables being the SQLite feature that lets you define a table where the data lives somewhere else, or is computed on the fly.
Right. In this case, the data lives inside the SQLite database file, stored as B L O Bs in a compact, packed format that's efficient for vectors. But to your SQL queries, it looks like a normal table with a vector column. You create it with CREATE VIRTUAL TABLE vec_table USING vec0..., you insert your embeddings, and you query with the MATCH operator, which kicks off a K-nearest-neighbors search.
And K-nearest-neighbors is the key phrase there. It's brute-force K N N. No approximate nearest neighbor indexes, no fancy H N S W graphs. You ask for the five closest vectors, it calculates the distance to every single vector in the table.
That's the current limitation as of April twenty twenty-six. It supports cosine distance, L two distance, and inner product. But every query is a full table scan, computing those distances across your entire corpus.
Which sounds catastrophic until you run the numbers.
For a corpus of, say, a hundred thousand vectors with seven hundred sixty-eight dimensions each, a brute-force scan on modern hardware might take between a hundred and three hundred milliseconds. That's sub-second. For many interactive applications, especially prototypes or tools used by a single person, that's perfectly acceptable.
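To make that arithmetic concrete, here's a minimal stdlib-only Python sketch of what a brute-force scan does: compute a distance to every stored vector, sort, keep the top k. The corpus size and dimensions are shrunk here so it runs instantly; the cost scales linearly in both.

```python
import math
import random

def cosine_distance(a, b):
    """1 minus cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def knn(corpus, query, k):
    """Brute-force KNN: score every vector, sort, take the top k."""
    scored = [(cosine_distance(vec, query), rowid) for rowid, vec in corpus.items()]
    scored.sort()
    return [rowid for _, rowid in scored[:k]]

random.seed(0)
# A toy corpus: 1,000 vectors of 32 dimensions (vs. 100,000 x 768 in the episode).
corpus = {i: [random.gauss(0, 1) for _ in range(32)] for i in range(1000)}
query = corpus[42]  # querying with a stored vector: it should rank first
print(knn(corpus, query, 5)[0])  # → 42
```

This is exactly why query time grows linearly: a hundred times more vectors means a hundred times more distance computations, with no index to prune the search.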
So the thesis here isn't that sqlite-vec is going to compete with Pinecone on serving a billion vectors with millisecond latency. It's about enabling vector search in all the contexts where SQLite already thrives and dominates. The surprise isn't that it's the best vector database. The surprise is that the most ubiquitous database in the world can now do this trick at all.
Given that it can do the trick, walk me through the actual SQL. If I'm a developer sitting down with this for the first time, what am I typing?
You start by enabling the extension, which you can do via the command line or in your application code. Then, the first real command is creating the virtual table. Something like: CREATE VIRTUAL TABLE document_embeddings USING vec0(embedding float[768]). That defines a table with a single column called 'embedding' that holds seven hundred sixty-eight dimensional float vectors.
And then I can just insert into it like any other table.
Precisely. Your INSERT statement looks totally normal: INSERT INTO document_embeddings(embedding) VALUES (?), and you pass your vector. The magic is in the query. To find the nearest neighbors to a query vector, you use the MATCH operator in your WHERE clause. Something like: SELECT rowid, distance FROM document_embeddings WHERE embedding MATCH :query ORDER BY distance LIMIT 5.
That is… suspiciously straightforward. It reads like any other SQL query, the same shape as a full-text search.
That's the beauty of it. The learning curve is nearly flat if you already know SQL. There's no new query language, no gRPC endpoints to configure. It's just SQL. And when you run that query, the extension unpacks all those stored B L O Bs, calculates the cosine or L two distance to your query vector for every single row, sorts them, and returns the top K.
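If you want to see what the extension is doing with those B L O Bs without installing anything, here's a stdlib-only Python sketch that emulates the idea: pack float32 vectors into blobs with struct, store them in a plain SQLite table, and do the unpack-score-sort dance in Python. sqlite-vec does this in C against its own packed format; this is a sketch of the concept, not its implementation.

```python
import math
import sqlite3
import struct

def pack(vec):
    """Serialize a float vector into a little-endian float32 blob."""
    return struct.pack(f"<{len(vec)}f", *vec)

def unpack(blob):
    """Deserialize a float32 blob back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE embeddings (id INTEGER PRIMARY KEY, embedding BLOB)")
vectors = {1: [0.0, 0.0, 0.0, 1.0], 2: [0.0, 1.0, 0.0, 0.0], 3: [0.1, 0.9, 0.1, 0.0]}
db.executemany("INSERT INTO embeddings VALUES (?, ?)",
               [(vid, pack(v)) for vid, v in vectors.items()])

query = [0.0, 1.0, 0.0, 0.0]
# The brute-force scan: read every blob, unpack, score, sort, keep the top k.
rows = db.execute("SELECT id, embedding FROM embeddings").fetchall()
ranked = sorted((l2(unpack(blob), query), vid) for vid, blob in rows)
print([vid for _, vid in ranked[:2]])  # → [2, 3]
```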
Which brings us back to the brute-force reality. No indexes. So performance scales linearly with the number of vectors. Give me some concrete numbers. What does that linear curve actually look like on, say, a modern MacBook?
Alex Garcia's own benchmarks on the GitHub repo are the best source. For a dataset of a million vectors, each with fifteen hundred thirty-six dimensions, the size OpenAI's text-embedding-ada-002 produces, a single K N N query takes about one point two seconds. That's on an M three MacBook Air. Halve that dataset to five hundred thousand vectors, and you're down to around six hundred milliseconds.
So the hundred thousand vector corpus you mentioned earlier would be in the one to two hundred millisecond range.
Right. And that's the sweet spot. For a huge number of prototype and even production use cases, a couple hundred milliseconds for a semantic search is fine. Think of an internal tool for searching company documentation, or a personal app that helps you find notes. You're not serving thousands of queries per second. You're serving a user who typed something and is waiting for an answer. Sub-second is often perfectly acceptable.
This is where the single-file portability becomes the killer feature, not just a neat trick. Describe that workflow.
Imagine you're building a prototype R A G application for searching local documents. You write a script that chunks your PDFs, sends the chunks to an embedding API like OpenAI or a local Ollama instance, and inserts the chunks, their metadata, and their embeddings into your SQLite database. At the end of that process, you have one file: knowledge_base.db. That file contains your entire search index.
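A sketch of that ingestion script, with the moving parts stubbed out. The embed function here is a hypothetical stand-in for whatever API you'd actually call (OpenAI, Ollama, and so on), and the plain table mirrors what you'd do with a vec0 virtual table once the extension is loaded; names like knowledge_base.db and handbook.pdf are purely illustrative.

```python
import hashlib
import sqlite3
import struct

def chunk(text, size=200):
    """Naive fixed-size chunking; real pipelines split on document structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(chunk_text, dim=8):
    """Hypothetical stand-in for an embedding API call.
    Deterministic hash-derived floats, so the sketch runs offline."""
    digest = hashlib.sha256(chunk_text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

def pack(vec):
    """Serialize a float vector into a float32 blob for storage."""
    return struct.pack(f"<{len(vec)}f", *vec)

db = sqlite3.connect(":memory:")  # in a real run: sqlite3.connect("knowledge_base.db")
db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, source TEXT, "
           "body TEXT, embedding BLOB)")

document = "SQLite is a small, fast, self-contained SQL database engine. " * 20
for piece in chunk(document):
    db.execute("INSERT INTO chunks (source, body, embedding) VALUES (?, ?, ?)",
               ("handbook.pdf", piece, pack(embed(piece))))
db.commit()

# One file now holds the text, the metadata, and the vectors, side by side.
print(db.execute("SELECT COUNT(*) FROM chunks").fetchone()[0])
```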
And I can check that file into Git.
You can check it into Git. You can email it to a colleague. You can drag and drop it onto a USB drive. Your colleague can clone the repo, run your Python script, and immediately have a working semantic search over the exact same data, with zero infrastructure setup. No Docker containers to run, no cloud services to provision, no environment variables for database connection strings. It's just a file.
Contrast that with the client-server model. Even if I'm running something like Qdrant in a Docker container locally for development.
Night and day. With a client-server database, even one running locally, you have a separate process to manage. You have to ensure it's running, you have to know what port it's on, your application needs a connection string. You have to think about connection pooling, even if it's just one connection. There's network latency, even if it's localhost. All of that is complexity and friction. With SQLite, your application opens a file. That's it. The entire data layer is a direct function call within your process.
That makes it ideal for edge and embedded cases you mentioned. A Raspberry Pi collecting sensor data, doing some local anomaly detection with embeddings, and storing the results. It doesn't need a database server running; it just needs to write to a file on its SD card.
Or a mobile app. An offline-capable note-taking app that lets you semantically search your own notes. You bundle the sqlite-vec extension with your app, and the user's entire vector index lives in a file in the app's sandboxed storage. No cloud dependency, no privacy concerns about sending their notes to a server. It's local-first by default.
So the value proposition crystallizes. It's not about raw performance. It's about radical simplicity and portability. The vector database layer consumes essentially none of your project's complexity budget.
And that frees you up to focus on the actual hard parts: generating good embeddings, designing your chunking strategy, tuning your prompt. The storage and retrieval becomes a solved problem, implemented in about three lines of SQL.
Which feels almost heretical after years of being told this was a hard infrastructure problem. So the envelope is defined by that linear scan cost. Under a hundred thousand vectors, you're golden. Up to a million, you need to be okay with query times measured in seconds. Beyond that…
Beyond that, you're outside the comfort zone. The brute-force scan becomes the dominant bottleneck. That's when you start feeling the need for approximate indexes that can answer queries by looking at only a fraction of the data. That's the graduation line.
Right, and that graduation line maps directly onto Daniel's five ideal use cases. Prototypes are the obvious first win.
It's the perfect prototyping database because the iteration loop is so tight. You're not wrestling with infrastructure; you're wrestling with your actual problem. I can build a proof-of-concept recommendation engine for a podcast app in an afternoon. The entire data layer is a Python script and a .db file. I can show it to my co-founder, and they can run it on their machine with a git clone and a pip install. There's no "well, first you need to set up a Postgres instance" friction.
And when the prototype becomes a product with fifty thousand users, and those hundred-millisecond queries stretch to two seconds under load, that's your signal. Not a crisis, just a scheduled graduation.
That's the key insight. The pain point is specific and measurable: query latency crosses your acceptable threshold. For a web app, that's probably somewhere around one to two seconds. For an interactive tool, maybe five hundred milliseconds. When you hit that wall, you know it's time for an approximate nearest neighbor index.
Case study then. A startup uses sqlite-vec to prototype a "similar products" feature on their site. It works beautifully with their ten thousand product catalog. They launch, get traction, and their catalog grows to two hundred thousand products. Queries are now taking three seconds. What's the graduation path?
The most natural step, if they're already using a relational database for everything else, is pgvector. It's the same SQL-centric mindset, but it runs on Postgres. The beautiful part is that a lot of your application logic can stay the same. Your queries change shape, from sqlite-vec's syntax to pgvector's distance operators, something like ORDER BY embedding <=> your_query LIMIT five for cosine distance, but the mental model of "vectors are a column type I query with SQL" remains. You migrate your data, update your connection string and a few queries, and you've bought yourself scale and proper ANN indexes.
And if they're not on Postgres? Or if they want a dedicated vector service?
Then you're looking at the dedicated systems: Qdrant, Weaviate, Milvus. Or the fully-managed cloud option, Pinecone. The trade-off is complexity. Running Qdrant in a Docker container gives you fantastic performance and those HNSW indexes, but now you're managing a separate database process. You've left the single-file paradise. You have to monitor it, back it up, think about its resource consumption. That's the cost of scaling.
So the comparison isn't just sqlite-vec versus Pinecone. It's sqlite-vec versus running a local Qdrant container for development. The former is a file. The latter is a service, even if it's on your laptop.
And that distinction defines the second use case: edge and embedded. Think about a Raspberry Pi on a factory floor, doing visual anomaly detection. It's capturing images, generating embeddings, and checking them against a known-good set stored locally. It cannot depend on a network connection to a cloud vector database. Running a full Qdrant server on a Pi is possible but heavy. Sqlite-vec? It's just writing to the SD card. The operational profile is minimal.
Single-file apps. Electron, Tauri, mobile. This is where the model sings. You bundle the extension, and the user's data is a file in their app directory.
A note-taking app like we mentioned, or a personal knowledge management tool. Everything is local, private, and portable. The user can copy their database file to another device, and their entire semantic search index moves with it. You cannot do that with any client-server system.
Local-first, collaborative apps. This is a fascinating angle. Tools like Figma or multiplayer word processors that need to sync offline changes and merge them later.
The vector search becomes a feature of the local data store. You can semantically search your collaborative document history while on a plane, because the embeddings are in the SQLite file that's already syncing via CRDTs or other local-first protocols. The vector search isn't a separate cloud service; it's a query you run against your local replica.
And finally, small-to-mid corpora. The internal company wiki, the personal music library with audio embeddings for mood matching. These are bounded datasets. You're not indexing the entire internet; you're indexing a few thousand documents or songs. The hundred to three hundred thousand vector range is the sweet spot.
And this is where people get it wrong. They hear "brute-force scan" and think "toy." But for a corpus of two hundred thousand help articles, a three-hundred-millisecond search is a feature, not a bug. It's faster than most human-driven keyword searches through a clunky admin panel. The simpler architecture is a benefit, not a compromise.
So the hybrid pattern you mentioned. Start with sqlite-vec for development and initial user testing. But design with graduation in mind.
That's the professional approach. You write your data access layer as an interface. Initially, the implementation talks to your SQLite file. You get to market fast, you learn what your users actually need. When the scaling signals appear — latency, concurrent users, corpus size — you swap out the sqlite-vec implementation for a pgvector or Qdrant implementation. Your application logic barely notices. Sqlite-vec de-risks the initial experiment by making it almost free to try.
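One way to sketch that seam in Python. Everything here is illustrative: the VectorStore protocol, the in-memory day-one implementation, and the assumption that a later SqliteVecStore or PgVectorStore would implement the same two methods.

```python
from typing import Protocol

class VectorStore(Protocol):
    """The seam between app logic and storage: two methods, no backend details."""
    def add(self, doc_id: int, vector: list[float]) -> None: ...
    def nearest(self, query: list[float], k: int) -> list[int]: ...

class BruteForceStore:
    """Day-one, in-memory implementation. A hypothetical SqliteVecStore or
    PgVectorStore would expose the exact same interface later."""
    def __init__(self):
        self._vectors: dict[int, list[float]] = {}

    def add(self, doc_id, vector):
        self._vectors[doc_id] = vector

    def nearest(self, query, k):
        # Squared L2 distance; monotonic with L2, so ranking is identical.
        def dist(v):
            return sum((a - b) ** 2 for a, b in zip(v, query))
        ranked = sorted(self._vectors, key=lambda d: dist(self._vectors[d]))
        return ranked[:k]

def recommend(store: VectorStore, query, k=3):
    """App logic depends only on the protocol; swapping backends is one line."""
    return store.nearest(query, k)

store = BruteForceStore()
store.add(1, [1.0, 0.0])
store.add(2, [0.0, 1.0])
store.add(3, [0.9, 0.1])
print(recommend(store, [1.0, 0.0], k=2))  # → [1, 3]
```

When the scaling signals appear, only the class behind the protocol changes; recommend and everything above it never hears about the migration.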
It turns the vector database decision from a day-one, high-stakes architectural bet into an implementation detail you can change later. That's huge.
It completely flips the script. For years, the advice has been "if you're doing AI, you need a vector database from the start." Sqlite-vec says, "No, you need to answer a question with vectors. Start with the simplest thing that could possibly work. Prove the value. Then, and only then, invest in the complex infrastructure if you need it."
The weirdest prompt is often the simplest one. Don't assume you need a complex system before you've proven you need a complex system.
Let SQLite hold your vectors first. You might be surprised how long it's happy to do so.
And that's the straightforward, actionable takeaway: your next AI prototype's data layer can be a single SQLite file. Don't overthink it. Start there.
That's the first insight. The second is to use the numbers we've talked about as a rough heuristic. The hundred-thousand-vector, three-hundred-millisecond rule. If your corpus fits comfortably under that line, sqlite-vec isn't just viable; it's probably optimal. You're trading theoretical scale for tangible simplicity.
And the third insight is about workflow. The value isn't just in the query speed. It's in decoupling the embedding generation from the storage. You can use OpenAI's API, or run Ollama locally, or pull from Cohere—whatever model you want to generate the vectors. Storing and querying them is now a separate, simple decision. It's not a monolithic "AI stack" problem anymore.
Which means listeners can actually do something this week. Go to your terminal. Run pip install sqlite-vec. Or download one of the pre-built binaries from Alex Garcia's GitHub. Then open a SQLite shell, load the extension, and run the hello world.
Create a virtual table of embeddings, insert a few rows with dummy vectors, and run a K N N query with the MATCH operator. It'll take you ten minutes. And suddenly, the abstract concept of vector search becomes a concrete thing you just did on your laptop, with no servers, no cloud credits, no complexity.
That's the real democratization. It's not about making billion-vector searches free. It's about making ten-vector searches trivial. Once you've done that, you understand the whole paradigm. You can then reason about when you need to graduate to something more complex, because you've experienced the simple version.
So the prescription is almost medicinal. For your next side project, your next internal tool, your next experiment with RAG—prescribe yourself a single SQLite file. See how far it takes you. You might find the cure for over-engineering was in your file system all along.
And if you hit the limits, you graduate with honor. Not with a sense of failure, but with clear, data-driven signals. That's a much healthier way to build. The design is intentionally focused on exact search for now.
Which leaves the open question of whether Alex Garcia or someone else adds approximate nearest neighbor indexes to sqlite-vec in a future version. That would really blur the line.
It's technically possible. The virtual table interface could support creating an HNSW or IVF index as a separate virtual table that gets updated on insert. But then you're managing index files, you have build times, you're moving further from the pure simplicity of a single file. I'm not sure it's the right direction.
Part of the charm is its honest brutality. It does one thing, and you know exactly what it costs. Adding indexes makes it a different, more complex tool. The democratization we're talking about is about lowering the initial barrier, not necessarily matching the high-end features.
And that's the future implication. The most accessible database in the world now has this capability. It's not a niche tool for AI engineers anymore. It's a feature available to any developer who can write a SQL statement. That continues the trend of taking advanced AI primitives and baking them into the foundational tools we already use.
So the final pitch, to echo our own conclusion back to us, is this. The weirdest prompt is often the simplest one. Don't assume you need a complex system before you've proven you need a complex system. Let SQLite hold your vectors first.
Couldn't have said it better myself. A huge thanks to our producer, Hilbert Flumingtop, for making this conversation happen. And thanks to Modal, our sponsor, for providing the serverless GPU platform that runs our pipeline. If you're building something with vectors, check them out.
If you enjoyed this, leave us a review wherever you listen. It helps more people find the show. All our episodes are at myweirdprompts.com.
This has been My Weird Prompts.
Take your time.