#2434: From Spreadsheets to Databases: The Mental Shift

Stop treating databases like bigger spreadsheets. Learn the one conceptual shift that actually matters.

Featuring

Daniel

Corn

Herman

Episode Details

Episode ID: MWP-2592
Published: Apr 26
Duration: 22:57
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: deepseek-v4-pro
Topics: data-integrity knowledge-management software-development

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The One Mental Shift That Makes Databases Work

Most organizations don't fail at databases because the tools are hard. They fail because they think about data wrong. Airtable has over 300,000 business customers, most of whom migrated from spreadsheets. NocoDB and Supabase make the technical leap trivial. The hard part is conceptual.

The Single Canvas Problem

A spreadsheet is one flat canvas. You start typing, structure emerges (or doesn't), and duplication is inevitable. A 2024 Sheetgo survey found that the average Google Sheets user has 47 tabs per workbook. That's not a spreadsheet anymore, it's a cry for help. At that scale you're already doing database things, just badly: cross-tab references, VLOOKUP chains that break, nobody knows who changed what.

The pain threshold hits when multiple people enter data, or when you need to answer questions like "show me all unpaid invoices from vendors we haven't used in six months."

From Embedding to Referencing

The core shift: stop embedding data and start referencing it. A spreadsheet stores a vendor's phone number in every invoice row. A database stores it once in a Vendors table and uses a foreign key to point to it. Update it in one place, and every query sees the change instantly.

This is normalization — not storing the same fact in seventeen places.
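A minimal sketch of referencing instead of embedding, using Python's built-in sqlite3 module. The table names, column names, and sample rows (vendors, invoices, "Acme Lumber", the phone numbers) are illustrative assumptions, not something the episode specifies.

```python
import sqlite3

# Hypothetical schema for illustration; all names and values are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE vendors (
    vendor_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    phone     TEXT
);
CREATE TABLE invoices (
    invoice_id INTEGER PRIMARY KEY,
    vendor_id  INTEGER NOT NULL REFERENCES vendors(vendor_id),
    total      REAL NOT NULL
);
""")
conn.execute("INSERT INTO vendors VALUES (17, 'Acme Lumber', '555-0100')")
conn.executemany(
    "INSERT INTO invoices (invoice_id, vendor_id, total) VALUES (?, ?, ?)",
    [(1, 17, 250.0), (2, 17, 90.0), (3, 17, 410.0)],
)

# The phone number changes once, in the single row that owns it.
conn.execute("UPDATE vendors SET phone = '555-0199' WHERE vendor_id = 17")

# Every invoice picks up the new number through the join.
phones = [
    row[0]
    for row in conn.execute(
        "SELECT v.phone FROM invoices AS i "
        "JOIN vendors AS v ON v.vendor_id = i.vendor_id "
        "ORDER BY i.invoice_id"
    )
]
print(phones)
```

No invoice row was touched by the update. The invoices only carry vendor_id, so there is no second copy of the phone number to fall out of date.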

Mapping Your Business Nouns

The real work is sitting down with pen and paper and listing every noun in your business that has its own lifecycle. A thing that gets created, updated, tracked, and eventually archived independently. Those are your candidate tables.

The test: does this thing exist on its own, or is it just a property of something else? A phone number isn't a table — it's a column. Invoice status isn't a table — it's a column. A product, with its own lifecycle of creation, price updates, and discontinuation — that's a table.

One-to-Many and Many-to-Many

One-to-many is simple: put a foreign key on the many side. One vendor has many invoices — the Invoices table gets a vendor_id column.

Many-to-many is where spreadsheets break. An invoice can have many products, and a product can appear on many invoices. You can't model this with a single foreign key. The solution is a junction table (Line Items or Invoice Products) that breaks the many-to-many into two one-to-manys. Each row says "on invoice 47, product 12 appeared, quantity 3, price $80."

Draw boxes. Draw lines. On each end write "one" or "many." If both ends say "many," you need a junction table. Get it wrong on paper and it costs an eraser. Get it wrong in production and it costs a weekend.
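One way the junction table could look in practice, again sketched with Python's sqlite3. The line_items name follows the "Line Items" suggestion above; the product names, dates, and the second and third sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, invoice_date TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Junction table: one row per product appearing on an invoice.
CREATE TABLE line_items (
    line_item_id INTEGER PRIMARY KEY,
    invoice_id   INTEGER NOT NULL REFERENCES invoices(invoice_id),
    product_id   INTEGER NOT NULL REFERENCES products(product_id),
    quantity     INTEGER NOT NULL,
    unit_price   REAL NOT NULL
);
""")
# Sample data is hypothetical.
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(47, "2024-02-01"), (48, "2024-02-09")],
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(12, "Cedar post"), (13, "Gravel, 1 ton")],
)
conn.executemany(
    "INSERT INTO line_items (invoice_id, product_id, quantity, unit_price) "
    "VALUES (?, ?, ?, ?)",
    [(47, 12, 3, 80.0), (47, 13, 1, 45.0), (48, 12, 10, 75.0)],
)

# One invoice, many line items: the total for invoice 47.
invoice_47_total = conn.execute(
    "SELECT SUM(quantity * unit_price) FROM line_items WHERE invoice_id = 47"
).fetchone()[0]

# One product, many line items: which invoices included product 12?
invoices_with_product_12 = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT invoice_id FROM line_items "
        "WHERE product_id = 12 ORDER BY invoice_id"
    )
]
print(invoice_47_total, invoices_with_product_12)
```

Both questions are answered from the same table, one by filtering on invoice_id and the other on product_id. That is the "two one-to-manys" idea read in each direction.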

The Takeaway

The pen-and-paper step isn't optional busywork. It's the cheapest possible time to discover your mental model has a hole. The tools are absurdly good now. The thinking needs to catch up.

Mentions

Airtable Spreadsheet-database hybrid platform
dbdiagram Browser-based database schema designer
Google Sheets Cloud-based spreadsheet application
NocoDB Open-source no-code database tool
Sheetgo Spreadsheet automation and survey tool
Supabase Open-source Firebase alternative with database

#2434: From Spreadsheets to Databases: The Mental Shift

You have fifteen Google Sheets tabs, a growing headache of broken formulas, and you're wondering if you need a database. The answer is probably yes, but not the way you think.

Daniel sent us this one. He's asking for a practical primer on database thinking for organizations that have only ever used spreadsheets. How do you split out tables, how do you model relationships like one-to-many and many-to-many, and how do you avoid turning a simple vendor-invoice-client setup into some twelve-table monster that needs a full-time database administrator. The core question is, what's the mental shift that actually matters?

This is one of those topics where the tools have gotten absurdly good, but the thinking hasn't caught up. By the way, today's episode is powered by DeepSeek V four Pro.

I'll pretend I know what that means.

It means the script is better than usual. But here's what I mean about the tools. Airtable alone has over three hundred thousand business customers as of this year. Most of them migrated from spreadsheets. NocoDB, Supabase, even just Google Sheets with connected sheets, the technical leap from spreadsheet to database is basically trivial now. You can spin up a relational backend in an afternoon. The failure point is the conceptual gap. People try to replicate their spreadsheet structure exactly in a database, and that misses the entire point.

Which is what, exactly? Because I think a lot of people hear database and they picture, I don't know, a bigger grid with more rows.

That's exactly the misconception. A spreadsheet is a single canvas. You start typing and the structure emerges as you go, or it doesn't emerge at all. A database is a set of interconnected canvases, tables, that you design before you enter data. The schema comes first. And that constraint, the fact that you have to decide what goes where upfront, that's what makes it powerful. It forces you to think about what your data actually is, rather than just where it fits on a grid.

The real work isn't learning SQL or picking a tool. The real work is sitting down and figuring out what nouns exist in your business.

And there was a survey by Sheetgo back in twenty twenty-four, they found the average Google Sheets user has forty-seven tabs per workbook. That's not a spreadsheet anymore, that's a cry for help. At that scale, you're already doing database things, you're just doing them badly. Cross-tab references, VLOOKUP chains that break if you sneeze, nobody knows who changed what. The pain threshold is usually when you have more than one person entering data, or when you need to answer questions like, show me all unpaid invoices from vendors we haven't used in six months.

That's the thing, right? Most people don't even realize they've crossed that pain threshold. They just assume the frustration is normal. So let's define the actual shift. You said a spreadsheet is a single canvas and a database is interconnected canvases. Unpack that for someone who's never opened anything but Google Sheets.

Imagine you're a small construction contractor. In a spreadsheet, you've probably got one massive tab called something like Master Tracker. Every row is an invoice. You've got columns for client name, client address, vendor name, vendor phone, project address, line item description, quantity, unit price, total. And every time you invoice the same client, you're retyping their address. Every time you use the same vendor, you're retyping their phone number. That's the single canvas problem. Everything lives in one flat space, so you duplicate constantly.

The duplication is where things break.

It's where everything breaks. You update a vendor's phone number in one row but forget the other two hundred rows where they appear. Now your data is lying to you. A database solves this by splitting those nouns into separate tables. Clients lives in one place, with one row per client. Vendors lives in another. Invoices in another. And instead of copying data around, you just connect them with IDs. An invoice doesn't contain the client's address, it contains a client ID that points to the clients table. One source of truth.

The core shift is from embedding everything to referencing everything.

That's it. And this is where people trip. They open a database tool and immediately try to recreate their spreadsheet layout, one big table with all the columns they're used to. They treat the database like a faster spreadsheet. But that completely misses normalization, which is just a fancy word for not storing the same fact in seventeen places.

What about the person who says, look, I've got fifteen Google Sheets tabs, each one is basically a table already. I've got a Vendors tab, a Clients tab, an Invoices tab. Isn't that good enough?

It's closer, but tabs aren't tables. The difference is that in a spreadsheet, those tabs don't enforce any relationship between each other. You can have an invoice in the Invoices tab that references a client ID that doesn't exist in the Clients tab, and Sheets will never tell you. There's no referential integrity. You can also have a vendor name stored directly in the Invoices tab and also in the Vendors tab, and they can disagree, and nothing catches it. A database says, no, this invoice must point to a real client, and if you try to delete that client, the database will either block you or cascade the delete, depending on what you've told it to do.
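To make the referential integrity point concrete, here is a small sketch with Python's sqlite3. The schema and the attempt helper are hypothetical. Note that SQLite in particular only enforces foreign keys once the foreign_keys pragma is switched on; other databases typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE clients (client_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE invoices (
    invoice_id INTEGER PRIMARY KEY,
    client_id  INTEGER NOT NULL
               REFERENCES clients(client_id) ON DELETE RESTRICT
);
""")
conn.execute("INSERT INTO clients VALUES (42, 'Riverside Homes')")
conn.execute("INSERT INTO invoices VALUES (1, 42)")


def attempt(sql):
    """Run one statement and report whether the database rejected it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# An invoice pointing at a client that does not exist.
orphan_invoice = attempt("INSERT INTO invoices VALUES (2, 999)")
# Deleting a client that still has invoices.
delete_client = attempt("DELETE FROM clients WHERE client_id = 42")
print(orphan_invoice, delete_client)
```

This sketch uses ON DELETE RESTRICT, which is the "block you" behavior. Declaring the column with ON DELETE CASCADE instead would remove the client's invoices along with the client, which is the other option described above.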

The mental model is, stop thinking about where the data sits on a grid and start thinking about what the data is and how it relates to other data.

That's the part you can do with pen and paper before you touch any software. Which is where we should go next.

The pen and paper thing. Walk me through it. I'm this contractor, I've got two hundred vendors, five hundred clients, twelve hundred invoices over three years, and my spreadsheet is groaning. What am I actually drawing?

You start by listing every noun in your business that has its own lifecycle. A lifecycle means the thing gets created, updated, tracked, and eventually maybe archived or closed, independently of other things. Those are your candidate tables. What you don't do is make a table for something that's just an attribute of something else. A phone number isn't a table, it's a column on the Vendors or Clients table. A status like paid or pending isn't a table, it's a column on Invoices.

The test is, does this thing exist on its own, or does it only exist as a property of something else.

And that's where people get tangled. They'll look at a column like invoice status and think, well, statuses change, maybe that needs its own table. But status is just a label you apply to an invoice. It has no independent existence. You don't track when a status was created or who its contact person is. It's just a value. Now, a product, that has a lifecycle. You add products, you update prices, you discontinue them. That's a table.

I've got my nouns. Vendors, Clients, Invoices, Products, maybe Projects. What do I do with them?

Draw a box for each one. Inside the box, write the noun at the top, then list the attributes underneath. For Vendors, that's vendor ID, name, contact, phone, maybe a status or category. For Clients, client ID, name, address. For Invoices, invoice ID, date, total amount, and then, and this is where the magic happens, you add columns that point to other tables. Vendor ID and client ID go inside the Invoices box.

The invoice doesn't contain the vendor's name. It contains a number that says, go ask the Vendors table for the rest.

That number is a foreign key. And that foreign key is how you model a one-to-many relationship. One vendor has many invoices. The way you represent that in a schema is simple. The table on the many side, Invoices, gets a vendor ID column. That's it. You don't put anything extra in the Vendors table to indicate which invoices belong to it. The relationship lives in the structure of the foreign key.

What does that actually let me do that my spreadsheet couldn't?

With that structure, a single query can say, give me all invoices from vendor X for client Y in the first quarter of this year. In a spreadsheet, you're filtering, sorting, maybe writing a QUERY function that breaks when you add a column. In a database, it's one line of S-Q-L, and it's fast, and it's correct every time. Because the database knows vendor X is vendor ID seventeen, and it can instantly find every invoice where vendor ID equals seventeen and client ID equals, say, forty-two, and the date is between January first and March thirty-first.
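As a rough illustration of that query, here is a sqlite3 sketch. The year 2024, the column names, and the sample rows are assumptions; the Vendors and Clients tables are left out to keep it short, so the IDs here are plain integers rather than enforced foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (
    invoice_id   INTEGER PRIMARY KEY,
    vendor_id    INTEGER NOT NULL,
    client_id    INTEGER NOT NULL,
    invoice_date TEXT NOT NULL,   -- ISO dates compare correctly as text
    total        REAL NOT NULL
);
""")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?, ?)",
    [
        (1, 17, 42, "2024-01-15", 300.0),
        (2, 17, 42, "2024-03-31", 120.0),
        (3, 17, 42, "2024-04-02", 500.0),  # outside the quarter
        (4, 17, 7, "2024-02-10", 80.0),    # different client
        (5, 3, 42, "2024-02-20", 60.0),    # different vendor
    ],
)

# Vendor 17, client 42, first quarter.
q1_invoice_ids = [
    row[0]
    for row in conn.execute(
        "SELECT invoice_id FROM invoices "
        "WHERE vendor_id = ? AND client_id = ? "
        "AND invoice_date BETWEEN ? AND ? "
        "ORDER BY invoice_date",
        (17, 42, "2024-01-01", "2024-03-31"),
    )
]
print(q1_invoice_ids)
```

The filter matches on IDs, not on names, so a misspelled vendor name on one row cannot cause an invoice to drop out of the result.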

You're not trusting that someone spelled the vendor's name the same way on every row.

That's the integrity piece. The vendor's name lives in exactly one place. Change it once, and every query that joins to the Vendors table sees the update instantly. No stale data. No hunting through twelve hundred rows.

Okay, so one-to-many, foreign key on the many side. What about many-to-many? Because I can already feel the headache coming.

This is where people who've only used spreadsheets hit a wall, because you can't model it with a single foreign key. Take invoices and products. One invoice can have multiple products on it, obviously. But one product can also appear on many different invoices. That's a many-to-many relationship. And the trick is, you never connect them directly.

You need a middleman.

A junction table. Sometimes called a bridge table or an associative table. You create a new box, call it Line Items or Invoice Products. It has its own ID, plus two foreign keys. Invoice ID and product ID. Plus any attributes that are specific to that particular combination, like quantity and unit price. Each row in this table says, on invoice forty-seven, product twelve appeared, with a quantity of three and a unit price of eighty dollars.

The junction table breaks the many-to-many into two one-to-manys. One invoice has many line items, one product has many line items.

That's exactly the mental model. And this is where the pen and paper exercise really shines. You draw your boxes. You draw lines between them. And on each end of the line, you write one or many. Vendors to Invoices, one on the Vendors end, many on the Invoices end. Invoices to Line Items, one to many. Products to Line Items, one to many. If you ever draw a line and both ends say many, you know you need a junction table sitting between them.

If you get this wrong on paper, it costs you an eraser. If you get it wrong after you've built the thing, it costs you a weekend.

Possibly your sanity. I've seen small businesses try to cram many-to-many into a spreadsheet by creating columns like product one, product two, product three on the invoice row. That breaks the moment you have an invoice with four products, or the moment you need to ask, which invoices included product twelve. With a proper junction table, that question is trivial. Without it, you're writing increasingly deranged formulas that will haunt you.

The pen and paper step isn't optional busywork. It's the cheapest possible time to discover that your mental model of your own business has a hole in it.

That hole usually shows up when you hit the trickier stuff. The nouns with clean lifecycles, Vendors, Clients, Invoices, those are easy. The mess starts when something changes state over time. Like a vendor you stop working with. Do they get their own table for past vendors, or do you just put a checkbox on the Vendors table that says inactive?

I've seen people do both and argue about it. The checkbox camp says it's simple. One table, one filter. The separate table camp says you're polluting your active vendor list with dead records.

For a small business, the checkbox wins. And I want to be precise about why, because this is where people get seduced into complexity. If you make a separate past vendors table, you now have two tables with identical structure. Every query that needs to look at all vendors, active and inactive, requires a UNION. If a vendor comes back, you're moving rows between tables. It's administrative overhead for a problem you don't actually have.

The checkbox is just a Boolean column. Is active, true or false. Where is active equals true gives you current vendors. Where is active equals false gives you the archive.

That's exactly what you need for a business with a few thousand vendors. The database will filter that in milliseconds. The separate table argument only becomes real when you're dealing with millions of rows and you need to physically partition old data onto different storage for performance. That's not the contractor with twelve hundred invoices. That's an enterprise with a database administrator and a budget.
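The checkbox approach might look like the sketch below. SQLite has no dedicated Boolean type, so is_active is stored as 0 or 1 here; the CHECK constraint and the vendor names are additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vendors (
    vendor_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    -- 1 = current vendor, 0 = archived
    is_active INTEGER NOT NULL DEFAULT 1 CHECK (is_active IN (0, 1))
);
""")
conn.executemany(
    "INSERT INTO vendors (vendor_id, name) VALUES (?, ?)",
    [(1, "Acme Lumber"), (2, "Bolt & Nut Co"), (3, "Cedar Supply")],
)

# Retire a vendor: one UPDATE, no rows moved between tables.
conn.execute("UPDATE vendors SET is_active = 0 WHERE vendor_id = 2")

active = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM vendors WHERE is_active = 1 ORDER BY vendor_id"
    )
]
archived = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM vendors WHERE is_active = 0 ORDER BY vendor_id"
    )
]
# All vendors, active or not: a plain SELECT, no UNION required.
everyone = [
    row[0] for row in conn.execute("SELECT name FROM vendors ORDER BY vendor_id")
]
print(active, archived, len(everyone))
```

If the vendor comes back, the same UPDATE with is_active = 1 restores them, and every invoice that points at vendor_id 2 keeps working throughout.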

The rule is, model state changes as columns, not as new tables, until the hardware forces your hand.

That applies to workflows too. Every business has custom steps. Pending approval, needs resubmission, awaiting payment, scheduled, completed. The temptation is to build a workflow stages table with foreign keys and metadata. But unless each stage carries its own data, like an approver name, a deadline, notes specific to that stage, it's just a status column on the main table.

Because otherwise you're joining to a table that's basically a glorified dropdown list.

The queries get painful. You want all invoices awaiting approval. With a status column, that's one where clause. With a workflow stages table, you're joining invoices to stages, filtering on the stage name, and probably pulling in metadata you don't need. It's elegant on a whiteboard and exhausting in practice.
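A sketch of the status-column version, using the stage names mentioned above. The CHECK constraint is an addition, not something from the discussion: it is one way to get dropdown-style validation without a separate stages table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (
    invoice_id INTEGER PRIMARY KEY,
    status     TEXT NOT NULL DEFAULT 'pending approval'
               CHECK (status IN ('pending approval', 'needs resubmission',
                                 'awaiting payment', 'scheduled', 'completed'))
);
""")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, "pending approval"), (2, "awaiting payment"), (3, "pending approval")],
)

# All invoices in one stage: a single WHERE clause, no join.
pending_ids = [
    row[0]
    for row in conn.execute(
        "SELECT invoice_id FROM invoices "
        "WHERE status = ? ORDER BY invoice_id",
        ("pending approval",),
    )
]

# A misspelled status is rejected by the CHECK constraint.
try:
    conn.execute("INSERT INTO invoices VALUES (4, 'pendng approval')")
    typo_accepted = True
except sqlite3.IntegrityError:
    typo_accepted = False
print(pending_ids, typo_accepted)
```

If a stage later needs its own data, such as an approver or a deadline, that is the point at which a separate table starts to earn its place.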

This connects to something you said earlier about normalization. People hear third normal form and think it's a purity test. Every piece of data must be atomized into its own table.

That's the trap. Third normal form says every non-key column should depend only on the primary key, not on other non-key columns. It's a good guideline. It eliminates the obvious redundancies. But for a small business, strict adherence can make simple reports require five joins. There's a real case for denormalizing strategically. Storing the vendor name directly on the invoice row alongside the vendor ID, for example.

That violates the single source of truth you were just preaching.

It does, and I'd only do it for historical accuracy. If a vendor changes their name, you probably want your old invoices to show the name they had at the time the invoice was issued. Storing vendor name on the invoice freezes it in place. The vendor ID gives you the current name if you join to the Vendors table. Having both gives you options. The cost is, yes, you've duplicated data. The benefit is, your historical reports don't silently rewrite themselves.

You're saying break the rules when the business reality demands it, not because you couldn't figure out the join.

I'd add, break the rules consciously and document why. A comment in your schema that says, vendor name stored here for historical snapshot purposes, prevents the next person from thinking it's a mistake and removing it. That's the difference between pragmatic denormalization and just being sloppy.
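The historical snapshot idea, sketched with sqlite3. The column name vendor_name_at_issue and both vendor names are hypothetical; the schema comment plays the role of the documentation note described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE vendors (vendor_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE invoices (
    invoice_id INTEGER PRIMARY KEY,
    vendor_id  INTEGER NOT NULL REFERENCES vendors(vendor_id),
    -- Deliberate duplicate: the vendor's name when the invoice was issued.
    -- Stored for historical snapshot purposes; not a normalization mistake.
    vendor_name_at_issue TEXT NOT NULL
);
""")
conn.execute("INSERT INTO vendors VALUES (17, 'Acme Lumber')")
conn.execute("INSERT INTO invoices VALUES (1, 17, 'Acme Lumber')")

# The vendor renames itself after the invoice was issued.
conn.execute(
    "UPDATE vendors SET name = 'Acme Building Supply' WHERE vendor_id = 17"
)

name_at_issue, current_name = conn.execute(
    "SELECT i.vendor_name_at_issue, v.name "
    "FROM invoices AS i JOIN vendors AS v ON v.vendor_id = i.vendor_id "
    "WHERE i.invoice_id = 1"
).fetchone()
print(name_at_issue, "|", current_name)
```

The invoice keeps the name it was issued under, while the join through vendor_id still reaches the current name. The application has to copy the name in at insert time, which is the cost of the duplication.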

What about the business that's read enough to be dangerous and shows up with a twelve-table schema on day one? I'm imagining a landscaping company I heard about that tried to model every possible attribute as a separate table.

That's a real case. Landscaping company, maybe twenty employees, a few hundred clients. They built a schema with separate tables for service types, property zones, equipment assignments, crew assignments, material types, material suppliers, invoice statuses, payment methods. Eight tables just to answer the question, who are our clients and what did we bill them. A simple client list required an eight-table join that took twelve seconds to run.

Twelve seconds for a client list is broken.

They stripped it down to four tables. Clients, Services, Invoices, and a junction table for invoice line items. Same business questions answered. Query time dropped to zero point three seconds. And the schema was readable by someone who wasn't the person who built it.

The discipline is, start with three to five tables. Vendors, Clients, Invoices, Products, maybe Projects. Add a table only when you hit a specific question your current schema cannot answer.

Articulate that question out loud before you add the table. Not, I might need this someday. But, I need to track which crew was assigned to which project on which date, and I can't do that without a Crew Assignments table. If you can't state the question that forces the new table, you don't need the table.

The sketching part. You mentioned dbdiagram.

It's free, it's in the browser, and it lets you draw boxes and lines without installing anything. But honestly, graph paper works just as well. The point is to externalize your thinking before you commit to a structure. You draw your three to five boxes, you draw your relationship lines, you label the ends with one or many. If the diagram looks clean and you can trace from any noun to any related noun without crossing lines into a tangled mess, your schema is probably right.

If it looks like a spiderweb, you've overdone it.

You've overdone it, or you've modeled things that aren't actually entities. A spiderweb diagram is your first warning sign that you're building tables for concepts, not for things with lifecycles.

After all that warning against over-engineering, feels like we should land this with something concrete you can actually do Monday morning.

And the first one is genuinely the most important. Before you open any database tool, before you type a single line of SQL, spend thirty minutes with a pen and a piece of paper. List every noun in your business. Clients, invoices, vendors, products, projects, payments, employees. Then draw boxes around them and draw lines between them. Write one or many on each end of every line. That thirty minutes will save you more time than any tutorial you'll ever watch.

The second one follows from that. Start with three to five tables. That's it. Vendors, Clients, Invoices, Products, maybe Projects. That handful will answer ninety percent of the questions your business actually asks. Add a new table only when you can articulate the exact question it answers that your current setup cannot.

Articulate it out loud to another human being. If you can't state the question clearly, you don't need the table. I'm not being cute about this. The landscaping company we mentioned, they stripped twelve tables down to four and got faster queries and a schema someone else could actually read. The discipline is saying no to your own cleverness.

For historical data, past vendors, archived clients, old projects, use a Boolean flag. Is active, is archived. Put it on the existing table. Do not create separate archive tables. The separate table approach creates more problems than it solves until you're operating at a scale where you'd already have a database administrator on payroll. A Boolean column with a filter gets you the same result with zero additional complexity.

If you ever do hit the point where performance demands archival tables, you'll know because your queries will be slow and you'll have the data to prove it. You won't be guessing. Until that day, a checkbox is your friend.

The Monday morning version. Pen and paper, thirty minutes. Three to five tables. Booleans for state changes, not separate tables.

That's the whole playbook. Everything else is optimization for problems you don't have yet.

Here's what's interesting though. We've handed people the playbook, and yet there's one question I think every small business owner sits with silently. What's the one table you know you need but are afraid to model? Because once you model it, you have to face what's actually in it.

That's the real homework. Every business has that one entity they've been avoiding. Maybe it's the table that tracks client complaints, or the one that logs when employees cut corners. The thing that lives in sticky notes and email threads because putting it in a database makes it real and measurable.

The fear is, if I build the table, I'll have to look at the numbers. But here's the thing. Once you have this schema, once you've done the pen and paper work and built your three to five core tables, you've actually built a foundation that makes everything else easier to add. Reporting becomes straightforward. You can hook up automation tools without duct tape. You can even expose a simple API if you ever need to connect something like a payment processor or a scheduling tool.

Only if the foundation is right. If your Vendors table is a mess of duplicated names and missing foreign keys, nothing built on top of it will work. The schema is the load-bearing wall. You get that right, and everything else is decoration.

The homework is, what's the table you're avoiding? Draw it on paper this week. Just the box and the lines. You don't have to build it yet. But name it.

Now, Hilbert's daily fun fact.

The collective noun for a group of porcupines is a prickle.

If you want more on this, we've got show notes at myweirdprompts. Special thanks to Hilbert Flumingtop for producing. This has been My Weird Prompts. I'm Herman Poppleberry.

I'm Corn. Leave us a review if this helped you think differently about your spreadsheet headache.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.
