It is funny looking back at the tech landscape from fifteen or sixteen years ago. If you were around the developer scene in two thousand nine or two thousand ten, you would have thought the relational database was a walking corpse. The NoSQL revolution was in full swing, and the pitch was simple: SQL is a relic of the seventies, schemas are a straitjacket, and the future is just dumping JSON into a document store and letting the database figure it out. We were promised a world where the database would finally get out of the way of the developer.
I remember that era vividly. It was the peak of the web two point zero hype. People were genuinely convinced that the rigid rows and columns of PostgreSQL or MySQL simply could not handle the scale or the velocity of the modern internet. There was this sense that if you were still writing JOIN statements, you were basically a dinosaur. And yet, here we are in March of twenty twenty-six, and the most recent Stack Overflow survey shows PostgreSQL sitting at nearly fifty-six percent usage. Meanwhile, MongoDB, the undisputed king of document databases, is sitting at twenty-four percent. It is popular, sure, but it is nowhere near the industry standard we were told it would become.
It is a massive gap, especially considering the billions of dollars in marketing muscle behind document stores over the last decade. Today's prompt from Daniel is about this exact tension. He wants us to dig into why document databases like MongoDB are still chasing SQL after all these years, where they actually fit in a world that is increasingly dominated by AI and machine learning, and what the landscape looks like beyond just the big green leaf logo.
I am Herman Poppleberry, and honestly, this is one of my favorite topics because it is a story about how the industry learned that structure actually matters. Daniel's prompt touches on something deep. Why is it that even though MongoDB is a thirty billion dollar company with over two point three billion in annual revenue, it still feels like the alternative rather than the default? To understand that, we have to look at the "JSON everywhere" promise and why it partially broke our hearts.
I think a lot of it comes down to that original dream. The idea was that you have JSON on the frontend, you pass JSON through your API, and you store JSON in the database. No translation, no object-relational mapping, no headache. It sounds like a dream for a developer who just wants to ship code. You do not have to stop and think, "Wait, do I need a foreign key here? Should this be a one-to-many relationship?" You just save the object.
It is the ultimate "get started fast" button. But the technical term we use to describe the fallout is schema-on-read versus schema-on-write. In a traditional SQL database, you have a strict schema-on-write. You cannot put a string in an integer column. The database enforces the rules at the moment the data is written. If the data is messy, the database rejects it. With a document database, you have schema-on-read. You can throw whatever you want into a collection, but the burden of making sense of that data moves from the database engine to the application code.
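Here is a minimal Python sketch of that distinction, using hypothetical order data. A SQLite CHECK constraint stands in for the type enforcement Postgres does natively on its columns, while a plain list of dicts plays the role of a schemaless collection:

```python
import sqlite3

# Schema-on-write: the engine validates data at insert time.
# (Postgres enforces column types natively; here a CHECK constraint
# stands in for that, since vanilla SQLite is loosely typed.)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, price REAL CHECK (typeof(price) = 'real'))"
)
conn.execute("INSERT INTO orders VALUES (1, 9.99)")  # accepted
try:
    conn.execute("INSERT INTO orders VALUES (2, 'nine dollars')")
except sqlite3.IntegrityError:
    print("rejected at write time")  # the safety harness at work

# Schema-on-read: the "collection" accepts anything at write time...
collection = [
    {"id": 1, "price": 9.99},
    {"id": 2, "price": "nine dollars"},  # nothing stops this
]
# ...so every reader must defend itself against malformed documents.
valid = [d["price"] for d in collection
         if isinstance(d.get("price"), (int, float))]
print(sum(valid))
```

The point of the sketch: the validation logic does not disappear in the document model, it just moves from one CREATE TABLE statement into every piece of code that ever reads the collection.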
And that is where the wheels usually fall off for large teams, right? You end up with a collection where half the documents have a field called user underscore id and the other half have a field called userId in camel case because a developer changed their mind three years ago and nobody ran a migration. Or worse, you have a field that is supposed to be a price, but in ten percent of the documents, it is a string instead of a number.
That is the operational nightmare. The friction you save at the beginning of the project is paid back with interest two years later when you are trying to run a report and your application keeps crashing because it encountered a document it did not expect. This is why the industry failed to abandon SQL. We realized that schemas are not a straitjacket; they are a safety harness. Andy Pavlo, the database professor at Carnegie Mellon, noted in his twenty twenty-five retrospective that while MongoDB is still a powerhouse, the "just use Postgres" movement has become the dominant philosophy because Postgres eventually learned how to do everything Mongo could do, but with the safety of SQL.
Which brings us to the actual business of MongoDB. Even if it is not the number one database, it is a massive business. You mentioned the thirty billion dollar market cap. Their recent earnings for the third quarter of fiscal twenty twenty-six showed that Atlas, their cloud platform, now makes up seventy-five percent of their total revenue. That is a huge shift from a few years ago.
It is the "managed service" era. People are not just downloading the community edition and running it on a server in their basement anymore; they are paying for a highly integrated, managed experience. But that success came with a cost, specifically the licensing drama that reshaped the entire market. Back in twenty eighteen, MongoDB changed their license to the Server Side Public License, or SSPL. They did this to stop companies like Amazon from taking the open-source code, wrapping it in a service, and selling it without giving anything back to MongoDB Inc.
And that single move created a massive rift in the ecosystem, didn't it? Because the SSPL is not recognized as "truly" open source by the Open Source Initiative, it forced the hands of the major cloud providers. They could not just use the latest MongoDB code for free anymore.
Right. It turned MongoDB from a community standard into a proprietary product owned by one company. And the industry responded by building compatibility layers. This is where it gets really interesting for the technical crowd. You have Amazon DocumentDB and Azure Cosmos DB with a MongoDB-compatible API. They are essentially telling developers: "Keep writing your code using the MongoDB drivers you love, but run it on our proprietary engines that we control."
And now we have a new player that is even more disruptive. You were telling me about FerretDB two point zero.
Yes, this is a huge development from just a few weeks ago. FerretDB two point zero is a fully open-source alternative that uses Microsoft's DocumentDB project—which Microsoft actually open-sourced in early twenty twenty-five—to run MongoDB queries on top of PostgreSQL. Think about the irony there. We have come full circle. We are now building document databases that are actually just fancy translation layers sitting on top of the very SQL databases they were supposed to replace. FerretDB is claiming performance boosts of up to twenty times for certain workloads compared to their previous versions.
So, if I am a developer, I can have the MongoDB API I like, but the data is actually sitting in a Postgres table? Why would I want that instead of just using native MongoDB?
Because it gives you the best of both worlds. You get the flexibility of the document model for your application code, but you get the reliability, the ecosystem, and the "boring technology" stability of Postgres for your storage. Plus, you avoid the licensing lock-in of MongoDB Inc. This is the multi-model convergence we keep seeing. PostgreSQL's JSONB support effectively neutralized the primary reason people used to leave SQL. JSONB allows Postgres to store decomposed binary JSON, index it, and query it with performance that often rivals native document stores.
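To make that concrete, here is a toy sketch of querying inside stored JSON from SQL, using SQLite's built-in JSON functions as a stand-in for Postgres JSONB (in Postgres you would use a jsonb column with the arrow operators and a GIN index; the product data is made up):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, doc TEXT)")
conn.executemany(
    "INSERT INTO products (doc) VALUES (?)",
    [
        (json.dumps({"name": "phone", "specs": {"screen_in": 6.1}}),),
        (json.dumps({"name": "boots", "specs": {"size": 42}}),),
    ],
)

# Reach into the stored JSON from plain SQL, the way JSONB lets you
# filter on nested document fields without a fixed column for each one.
rows = conn.execute(
    "SELECT json_extract(doc, '$.name') FROM products "
    "WHERE json_extract(doc, '$.specs.size') = 42"
).fetchall()
print(rows)  # [('boots',)]
```

Each row is still a relational row with transactional guarantees; only the payload is a document. That is the whole pitch of the convergence.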
So if Postgres can do JSON, and these new document stores are just running on Postgres anyway, why does MongoDB still have that thirty billion dollar market cap? Is it just momentum and better marketing?
It is more than that. MongoDB still offers a developer experience that is incredibly polished. Their query language is very intuitive for people coming from a JavaScript or Python background. And for certain types of data, the document model is genuinely superior. Think about a product catalog for an e-commerce site. One product might have a screen size and a battery capacity, while another product is a pair of shoes with a size and a material. Trying to model that in a relational way often leads to the "Entity-Attribute-Value" pattern, which is a notorious performance killer in SQL. In a document store, you just have a "Product" document with whatever fields that specific product needs. It is clean.
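A small sketch of why that Entity-Attribute-Value pattern hurts, with hypothetical catalog data. In EAV, every attribute becomes its own row and every value degrades to a string, so reading one product back means pivoting many rows; in the document model each product simply carries the fields it needs:

```python
# EAV: one (entity, attribute, value) row per attribute, and the
# typed values (55, 42) have collapsed into strings.
eav_rows = [
    ("tv-1", "screen_in", "55"),
    ("tv-1", "refresh_hz", "120"),
    ("shoe-1", "size_eu", "42"),
    ("shoe-1", "material", "leather"),
]

def eav_to_product(sku: str, rows) -> dict:
    """Pivot the EAV rows for one entity back into a usable record."""
    return {attr: value for entity, attr, value in rows if entity == sku}

# Document model: heterogeneous products live side by side, each with
# only the fields that make sense for it, types intact.
products = [
    {"sku": "tv-1", "screen_in": 55, "refresh_hz": 120},
    {"sku": "shoe-1", "size_eu": 42, "material": "leather"},
]

shoe = eav_to_product("shoe-1", eav_rows)
print(shoe)  # {'size_eu': '42', 'material': 'leather'}
```

In a real SQL database that pivot happens via self-joins or aggregation on a huge narrow table, which is exactly the performance killer being described.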
That makes sense. If the data is inherently varied, forcing it into a table is just asking for trouble. But let's pivot to the AI side of things. Daniel asked specifically about AI and machine learning pipelines. We keep hearing that AI is the big driver for the next decade of database growth. How do document databases fit into a RAG pipeline?
This is where document databases have found a second life. In a Retrieval-Augmented Generation pipeline, you are dealing with a lot of semi-structured data. First, you have the raw documents—PDFs, web scrapes, Slack transcripts. These are not rows and columns; they are blobs of text with varying metadata. Storing these in a document database is very natural.
Right, you might have metadata like the author, the date, and a list of tags, but the core content is just a big chunk of text.
But the real value comes after you pass that text through a Large Language Model. The response from an LLM often comes back as a JSON object. It might contain extracted entities, sentiment scores, summaries, and confidence intervals. If you are iterating on your AI agent, you might change your prompt tomorrow to include a new field, like "detected language" or "urgency score." In a SQL database, adding those fields means a database migration every time you tweak your prompt. In a document store, you just save the new JSON. It allows for a much faster iteration cycle in the research and development phase of AI.
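The iteration-speed argument in miniature, with made-up LLM outputs (the field names like urgency_score are just the hypothetical examples from the discussion). New fields simply appear in newer documents, and readers treat them as optional:

```python
# Yesterday's prompt produced these fields...
run_1 = {"summary": "refund request", "sentiment": -0.4}

# ...today's revised prompt adds two more. No ALTER TABLE, no
# migration: the new JSON is saved as-is alongside the old.
run_2 = {
    "summary": "shipping question",
    "sentiment": 0.1,
    "detected_language": "en",
    "urgency_score": 3,
}

collection = [run_1, run_2]

# Readers handle older documents by defaulting the missing fields.
urgencies = [doc.get("urgency_score", 0) for doc in collection]
print(urgencies)  # [0, 3]
```

The flip side, per the earlier schema-on-read discussion, is that every consumer of this collection now carries those defaulting rules forever.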
But what about vector search? Every database company on the planet, including MongoDB, added "Vector Search" to their marketing materials over the last two years. MongoDB Atlas Vector Search is a big part of their pitch now. Is it actually the right tool for the job?
It is what I call a "good enough" feature. If you are building a simple recommendation engine or a basic chatbot and you are already using MongoDB Atlas, then using their built-in vector search is a no-brainer. It keeps your architecture simple because you do not have to sync data between your primary database and a separate vector database. You avoid the "two-database problem."
But there is a "but" coming, I can hear it in your voice.
There is a big "but." For high-performance, massive-scale AI applications, we are seeing a lot of teams move toward dedicated vector databases like Weaviate or Pinecone, or even specialized Postgres extensions like pgvector. The reason is that vector search is computationally expensive. It requires specific types of indexing like HNSW, short for Hierarchical Navigable Small World graphs. While MongoDB has added support for this, a dedicated engine is often more optimized for the high-dimensional similarity searches that power complex AI. MongoDB is trying to be a one-stop shop, but there is always a trade-off when you try to be everything to everyone.
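To see what those engines are actually computing, here is the brute-force version of vector search: cosine similarity against every stored embedding (toy three-dimensional vectors; real embeddings have hundreds or thousands of dimensions). HNSW and friends exist precisely because this linear scan does not scale to millions of vectors:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical document embeddings.
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.1],
    "doc_c": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

# Exact nearest-neighbor: score every single vector, then sort.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # the document most similar to the query
```

An HNSW index trades a little recall for the ability to skip most of those comparisons, which is where the dedicated engines earn their keep.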
It feels like document databases are in this weird middle ground. They are not as strict or globally dominant as SQL, and they are not as specialized as the new vector or graph databases. They are the versatile utility players.
That is a great way to put it. And we should mention that there are other players in this space beyond the big green leaf. Apache CouchDB is still very relevant for specific use cases. It has a unique replication protocol that makes it great for "offline-first" applications. If you have a mobile app that needs to work in a basement with no cell service and then sync back to the cloud later, CouchDB is still the gold standard. Interestingly, MongoDB actually moved away from some of its mobile synchronization features a few years ago, which left a gap that CouchDB and others still fill.
And then you have RavenDB for the dot net ecosystem.
Yes, RavenDB is a fascinating one. It is built natively in C-sharp, which makes it a very attractive choice for enterprise shops that are heavily invested in the Microsoft stack. It is less about the "NoSQL revolution" and more about finding a tool that fits a specific engineering culture. It handles things like ACID transactions across documents very well, which was a major criticism of early document databases.
So, to summarize for Daniel, if we are looking at the "why" behind document databases being less popular than SQL, it really comes down to the fact that SQL grew up. It adopted the best parts of the document model while keeping the safety of the relational model.
Never underestimate the ability of an incumbent to adopt the features of its challengers. SQL has been dominant for forty years for a reason. But the "NoSQL" label itself has basically died. We do not call them NoSQL databases much anymore because they all support some form of structured querying, and most SQL databases now support unstructured data. The walls have crumbled. It is just about the data model now. Do you want to think in terms of relations and tables, or do you want to think in terms of hierarchical documents?
Let's talk about the specific decision matrix. If I am starting a project today, how do I choose?
I think the first question you should ask is: how much do I care about the relationships between my data points? If your data is highly relational—meaning you have users, who have orders, which have line items, which point to products, which have suppliers—you are going to have a miserable time in a document database. You will end up doing "joins" in your application code, which is slow, error-prone, and essentially means you are trying to reinvent a database engine in your Python script.
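Here is what that reinvented join looks like next to the real thing, sketched with SQLite and made-up users and orders. The loop is the classic N-plus-one pattern: one query per user, with the aggregation logic living in application code instead of the engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'ada'), (2, 'lin');
    INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.5), (12, 2, 3.0);
""")

# Application-side "join": one round trip per user (N+1 queries),
# which is what you end up hand-rolling over a document store.
app_side = {}
for user_id, name in conn.execute("SELECT id, name FROM users"):
    total = conn.execute(
        "SELECT COALESCE(SUM(total), 0) FROM orders WHERE user_id = ?",
        (user_id,),
    ).fetchone()[0]
    app_side[name] = total

# The same answer as one declarative query the engine can optimize.
db_side = dict(conn.execute("""
    SELECT u.name, COALESCE(SUM(o.total), 0)
    FROM users u LEFT JOIN orders o ON o.user_id = u.id
    GROUP BY u.name
"""))
print(app_side == db_side)  # same result, very different cost profile
```

Both produce identical totals, but the loop version pays a network round trip per user and reimplements SUM and GROUP BY by hand, which is the misery being described.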
That is a classic mistake. I have seen teams spend months trying to optimize application-side joins when a single SQL query would have solved it in milliseconds.
But if your data is "self-contained," meaning a single document holds almost everything you need to know about that entity, then a document store shines. A user profile is a great example. You fetch the user ID, and you get their settings, their preferences, and their history all in one go. If you can fetch one document by its ID and have all the information you need to render a page, that is a huge performance win.
And for the AI side?
For AI, the rule of thumb is: use a document store for your metadata and your model outputs. When you are storing the results of an LLM call, you often do not know what fields you will need six months from now. You might want to start tracking the latency of the call, the version of the prompt used, the token count, and the specific model temperature. In a SQL database, adding those columns one by one is a chore. In a document store, you just start saving them. The speed of iteration is your biggest advantage in a fast-moving field like machine learning.
But do not choose it just because it says "Vector Search" on the box.
Correct. Choose your database based on how you need to store and query your primary data. If ninety percent of your work is relational, use Postgres and add pgvector. If ninety percent of your work is messy, evolving JSON, use MongoDB and use their Atlas Vector Search. Do not let the AI hype dictate your core storage architecture.
I want to go back to that convergence theme from earlier. Do you think we are heading toward a future where the underlying engine does not matter at all? Where we just pick an API—whether it is SQL or Mongo—and the cloud provider figures out how to store it?
We are definitely seeing a convergence. The database of the future probably does not care whether you store a row or a document. It will just be a storage engine that provides different "views" of the same data. But for now, the underlying storage architecture still matters for performance. A system designed to store rows of fixed-size integers is always going to be faster at calculating a sum over a billion rows than a system that has to parse JSON blobs to find those integers. This is why document databases are generally poor for heavy analytical work—the "OLAP" side of things. If you want to know the average order value for all customers in the Southeast region over the last three years, a columnar SQL database like Snowflake or BigQuery is going to run circles around MongoDB.
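A toy illustration of that analytical gap, with hypothetical order prices. In a columnar layout the engine sums native integers directly; in a document layout it first has to parse a JSON blob just to find each number:

```python
import json

# Columnar/native storage: the values are already typed numbers.
prices_column = [100, 250, 75, 300]

# Document storage: each value is buried inside a JSON blob that
# must be parsed before the engine can even see the integer.
blobs = [
    '{"order": {"price": 100}}',
    '{"order": {"price": 250}}',
    '{"order": {"price": 75}}',
    '{"order": {"price": 300}}',
]

native_sum = sum(prices_column)
parsed_sum = sum(json.loads(b)["order"]["price"] for b in blobs)
print(native_sum, parsed_sum)  # same answer, very different work
```

Both paths get the same total, but over a billion rows the parse-then-extract path is doing orders of magnitude more work per value, which is why the columnar warehouses win at OLAP.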
So, document databases for the live application, SQL for the heavy analysis.
That is the standard hybrid architecture we see today. You use MongoDB for the transactional stuff where you are looking up specific records, and then you stream that data into a data warehouse for the analysis. The "one database to rule them all" dream is still a bit of a fantasy for high-scale systems.
Let's look at some practical takeaways for Daniel and anyone else trying to make this decision in twenty twenty-six.
First, if you need ACID compliance and relational integrity—meaning your data must be correct and consistent across multiple tables—stick with a relational database like PostgreSQL. The "friction" of a schema is actually a feature that saves you from data corruption later.
Second, if you are in the early stages of an AI project and your data model is changing every day, a document store is a massive accelerator. It lets you prototype without the overhead of migrations.
Third, keep an eye on the Postgres ecosystem. With things like FerretDB two point zero and native JSONB, Postgres is becoming the "black hole" of the database world, sucking in the features of every other database type. For a long-term, sustainable engineering project, the "Postgres-first" approach is becoming very hard to ignore.
And finally, if you do go the MongoDB route, take a serious look at Atlas. If you are going to use a proprietary document model, you might as well get the full benefit of the managed services, the triggers, and the integrated vector search that MongoDB Inc. has spent billions of dollars perfecting.
It really comes down to choosing the right tool for the specific job, rather than following the latest architectural trend. The "NoSQL" hype cycle is over, and we are in the era of pragmatic multi-model engineering. We have better tools than ever, and we actually understand how to use them.
I think Daniel will appreciate that breakdown. It is a much more nuanced picture than the "SQL is dead" headlines we used to see back in two thousand ten.
It is a much healthier place for the industry to be. We have moved past the tribalism of "SQL versus NoSQL" and into a place where we can talk about performance trade-offs and developer experience.
Alright, that about does it for our deep dive into the world of document databases. This has been an interesting one to track, especially seeing how the licensing drama really shaped the competitive landscape we see today. It is a reminder that in tech, the business model often dictates the architecture as much as the technical requirements do.
When the license changes, the code follows.
Big thanks to our producer Hilbert Flumingtop for keeping the gears turning behind the scenes. And a huge thank you to Modal for providing the GPU credits that power the generation of this show. We could not do this without their support for the serverless AI infrastructure.
If you enjoyed this exploration of the database world, you might want to go back and listen to episode eleven twenty-three, where we did a deep dive into the "just use Postgres" movement. It provides a lot of context for why SQL is making such a massive comeback. We also have episode eleven twenty-four, which covers the explosion of specialized databases and why "one size fits all" is a dangerous mindset.
You can find those and all our previous episodes at myweirdprompts dot com. We are also on Spotify, so if you have not followed us there yet, hit that follow button to get every new episode as it drops.
This has been My Weird Prompts. Thanks for listening.
Catch you in the next one.