#2444: Custom IDs: UUIDs vs Human-Readable Keys

How to design database IDs that balance security, human readability, and performance — with lessons from Stripe and TypeID.

Featuring

Daniel

Corn

Herman

Listen

0:00

Episode Details

Episode ID: MWP-2602
Published: Apr 26
Duration: 27:19
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: deepseek-v4-pro
Topics: software-development data-integrity distributed-systems

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The Art of Database ID Design: Balancing Security, Readability, and Performance**

Choosing how to identify records in a database seems like a trivial decision — until it isn't. The wrong choice early on can create years of slow-motion headaches, from security vulnerabilities to painful migrations. This episode explores the landscape of database ID strategies, from UUIDs to sequential integers to the emerging hybrid standards that combine the best of both worlds.

The Core Tension: Security vs. Readability

UUIDs are terrible for humans. A standard UUIDv4 string is 36 characters of hexadecimal gibberish — impossible to read over the phone, hard to debug, and frustrating to communicate. Yet they solve real security problems. Sequential integers leak information: if your user ID is 1003, a competitor knows you have roughly a thousand users. Enumeration attacks become trivial when API endpoints use predictable integers. UUIDs make guessing valid IDs computationally infeasible.

Stripe's Elegant Hybrid

The real world has converged on a hybrid answer, and Stripe exemplifies it. Their IDs use prefixed strings — cus_ for customer, pi_ for payment intent, ch_ for charge. The prefix tells you immediately what kind of object you're looking at, eliminating entire classes of errors where wrong IDs get passed to wrong endpoints. It also enables polymorphic lookups: when an API parameter could accept multiple object types, the prefix tells the system which table to query without needing a separate type column.

Stripe even optimized for human ergonomics at the character level. They chose underscores over colons or dashes because double-clicking a string with underscores selects the entire ID — a small detail that affects millions of copy-paste operations daily.

The Hybrid Schema Pattern

The recommended pattern separates concerns: use a surrogate key (auto-incrementing integer or UUID) as the primary key for joins and foreign keys, with a separate column for the human-readable business identifier. This resilience matters because natural keys change — SKUs get updated, invoice numbering formats shift, even VINs contain errors. With a surrogate primary key, updating the natural key column affects one row instead of cascading through every related table.

UUIDv7 and the Performance Question

UUIDv4's random distribution fragments B-tree indexes, hurting write performance at scale. UUIDv7 solves this by embedding a 48-bit millisecond timestamp at the front of the ID, making writes time-sortable. This dramatically reduces index fragmentation while preserving global uniqueness and unpredictability. For small databases the difference is negligible; for millions of rows with frequent joins, it becomes measurable.

TypeID: The Next Evolution

TypeID combines Stripe's prefixed approach with UUIDv7's time-sortable properties. The format — prefix_ followed by a 26-character base32-encoded UUIDv7 suffix — gives you type safety, global uniqueness, K-sortability, and human readability in one package. Libraries exist for JavaScript, Go, Rust, and Elixir.

Practical Invoice Numbering

For small businesses, invoice numbering exposes all these tensions at once. Sequential numbering (INV-0001) is simple but leaks information about your volume. Chronological numbering (20260426-001) adds date-based tracking but can complicate accounting software integration. The key lesson: test your numbering format with your actual accounting tools before committing, and design for the possibility that your business might merge, split, or change numbering schemes down the road.

Mentions

Authgear Compared UUIDv4 vs v7 performance
Discord AutoMod Regex used to catch secret key leaks
Mendix Forum thread on ID drift issues
Statsig Wrote about UUID security-first defaults
Stripe Prefixed ID system with underscores
Tofu Invoice numbering guide website
TypeID Prefixed UUIDv7 format by Jetify
ULID 26-character time-sortable ID format
Unreal Engine Forum on canonical item IDs
UUIDv7 Time-sortable UUID standard from RFC 9562

Downloads

Episode Audio

Download the full episode as an MP3 file

Download MP3

Transcript (TXT)

Plain text transcript file

Transcript (PDF)

Formatted PDF with styling

#2444: Custom IDs: UUIDs vs Human-Readable Keys

Daniel sent us this one — he wants to talk about custom IDs in databases. When to create them, how to avoid drift when you're juggling multiple ID types for the same object, and how to design a schema that handles human-readable IDs alongside UUIDs. He mentioned inventory systems and small business invoices as the concrete scenarios to work through. This is one of those things where the wrong decision early on becomes a slow-motion headache for years.

It's the kind of topic where most advice online falls into two camps — either "always use UUIDs for everything" or "just auto-increment and move on." Neither captures what actually happens when you're building something real. I've inherited systems that tried both extremes, and the scars are real.

By the way, today's episode is powered by DeepSeek V four Pro.

I've been using it for code review and it catches things that surprise me. Okay, so let's start with the core tension here. UUIDs are terrible for humans. A standard UUIDv4 string is thirty-six characters of hexadecimal gibberish. You can't read it over the phone, you can't glance at it and know what it refers to, and if you're debugging an invoice system and someone says "order number d four eight a seven three f two one" you want to walk into the sea. I once spent forty-five minutes on a support call where the customer and I were going back and forth on a UUID character by character. It was a "d" not a "b," but over a bad connection they sound identical.

Yet the security argument for UUIDs is genuinely strong. Sequential integers leak information. If I create an account on your platform and see that my user ID is one thousand and three, I now know you have roughly a thousand users. If I'm a competitor or a bad actor, that's useful intelligence. There's a famous example from the early days of social media — I think it was a now-defunct platform — where a competitor literally tracked their growth week over week just by creating accounts and watching the user ID increment.

Right, and it goes further than user counts. Enumeration attacks are a real thing. If your API endpoint is slash users slash one two three four and there's no proper authorization check, someone can just iterate through integers and scrape your entire user base. UUIDs make that nearly impossible because they're non-sequential and unpredictable. The Statsig team wrote about this — they consider UUIDs a security-first default precisely because guessing valid session IDs becomes computationally infeasible.

That's the surface-level trade-off. Security and global uniqueness on one side, human readability on the other. But what I find interesting is that the real world has already converged on a hybrid answer, and it's Stripe.

Stripe's ID system is elegant. They use prefixed strings — cus underscore for customer, pi underscore for payment intent, ch underscore for charge. The prefix tells you immediately what kind of object you're looking at. Paul Asjes, one of their developer advocates, wrote about this back in twenty twenty-two. He said, and I'm quoting roughly here, without knowing anything else about the ID we can immediately confirm that we're talking about a PaymentIntent object. That alone eliminates an entire class of errors where you pass the wrong ID to the wrong endpoint.

It's not just about developer ergonomics. The prefix enables polymorphic lookups. When Stripe's API accepts a payment method parameter that could be a PaymentMethod, a Source, or a Card ID, the prefix tells the system which table to query without any ambiguity. You don't need a separate type column or a mapping table. The ID itself carries the routing information.

And there's this wonderful detail that I think says everything about why good ID design matters at scale. Stripe chose underscores over colons or dashes as the separator. Double-click behavior. If you double-click cus underscore one two three, the entire ID gets selected. If you double-click cus colon one two three or cus dash one two three, only part of it gets selected. That's a human ergonomics decision baked into a technical ID format, and it affects millions of copy-paste operations every day.

That's the kind of detail that makes me think someone at Stripe had been annoyed by bad ID formats for years and finally got to fix it. I can picture the internal design doc where someone just wrote "underscores. Trust me on this one.

But here's where it gets more interesting. Most people assume Stripe stores that prefixed string as the primary key in their database. And Paul Asjes actually clarified in a comment that they don't. The exposed ID goes through some sharding magic, and the internal primary key is different. Even Stripe maintains a separation between the human-facing identifier and the internal system identifier.

Which brings us to the hybrid schema pattern. Surrogate keys as primary keys, natural keys as unique alternate keys.

And I want to walk through why this matters concretely. Your primary key is what every foreign key in every other table points to. If you use a natural key like a SKU or an invoice number as your primary key, and that natural key ever needs to change — and they do change, more often than people admit — you now have to cascade that update through every related table. It's a nightmare.

What kinds of changes are we talking about? Give me a concrete scenario.

A product gets re-categorized and its SKU changes. A supplier changes their numbering scheme. An invoice numbering format gets revised because the business merged with another location. Even supposedly immutable identifiers like vehicle VIN numbers can contain errors that need correction. I worked with a logistics company that had to update dozens of VINs because a manufacturer had a data entry error in their system. If those VINs had been primary keys, they would have had to cascade updates through shipments, maintenance records, insurance claims, and regulatory filings across six different databases. The Liam ERD guide from late twenty twenty-four made this argument well — surrogate keys are resilient to business change. With a surrogate primary key, you update the natural key column in one place and every foreign key relationship stays intact.

The recommended pattern is something like an auto-incrementing bigint or a UUID as the primary key for joins and foreign keys, plus a separate column with a unique constraint for the human-readable business identifier.

And there's a specific dual ID pattern that avoids needing a separate mapping table. Both IDs live in the same row — you have id bigint auto increment primary key, and public underscore id UUID unique not null. Internal joins use the compact integer for speed. External APIs expose the UUID for security. This is documented in the Bytebase guide and a few other places as a way to minimize join overhead while keeping external IDs opaque.

Let me push on the performance angle. How much does the integer versus UUID primary key actually matter for join performance? Is this something a small business with ten thousand invoices is going to feel, or is it purely a scale concern?

It matters, but the magnitude depends on scale. A UUID is sixteen bytes. A bigint is eight bytes. That difference compounds across indexes and foreign key columns. For ten thousand rows, you won't notice. For ten million rows with multiple indexes and frequent joins, it becomes measurable. But the bigger performance issue isn't storage — it's index fragmentation. Random UUIDv4 values scatter writes across the B-tree, which fragments the index and hurts cache locality. That's actually why UUIDv7 exists.

I know it's part of the newer UUID spec, but walk me through why it's different.

UUIDv7 is part of RFC nine five six two, and it's the emerging standard. Unlike random UUIDv4, UUIDv7 embeds a forty-eight-bit millisecond timestamp at the front of the ID. That makes the IDs time-sortable. Writes cluster chronologically instead of scattering randomly, which dramatically reduces B-tree index fragmentation. Think of it like this — with UUIDv4, inserting a thousand new rows is like throwing darts at a dartboard. With UUIDv7, you're placing them in roughly sequential order. Authgear did a comparison back in February and found that UUIDv7 meaningfully improves database write performance compared to UUIDv4, especially under high insert loads.

You get the global uniqueness of a UUID with the index-friendly properties of a sequential ID. That feels like the best of both worlds.

And then there's ULID, which is a twenty-six character base thirty-two string that also sorts lexicographically by time. ULIDs were designed for string-based databases and APIs where you want natural sorting without extra indexes. The trade-off is that ULIDs created in the same millisecond may not be strictly ordered unless you use a monotonic variant. It's a small edge case, but if you're generating IDs in a tight loop, you need to be aware of it.

Someone looked at all of this and said, what if we combine Stripe's prefixed IDs with UUIDv7?

That's TypeID. Jetify launched it, and the format is prefix underscore followed by a twenty-six character base thirty-two encoded UUIDv7 suffix. So you get something like invoice underscore two x four y six z eight a zero b one c two d three e four f five g six h seven j eight k. It's type-safe because the prefix tells you what you're looking at, it's K-sortable because the UUIDv7 timestamp is at the front, it's globally unique, and there are libraries for JavaScript, Go, Rust, and Elixir.

TypeID feels like the obvious endpoint of this whole evolution. But I suspect most small businesses aren't going to adopt it tomorrow. Let's talk about the invoice use case Daniel mentioned, because that's where the rubber meets the road.

Small business invoice numbering is a great case study because it exposes all the tensions at once. You want something human-readable because your clients read invoice numbers. You want something that works with accounting software. You probably want some business context embedded in the number. And you want to avoid leaking information you'd rather keep private.

The simplest approach is sequential numbering. Invoice zero zero zero one, zero zero zero two, and so on. It's dead simple. But it tells every client exactly how many invoices you've issued, which may or may not matter depending on the business. If you're a freelancer and you send your first invoice as zero zero zero one, the client immediately knows they're your first customer. That might be fine, or it might undercut your negotiating position.

It creates problems when you need to merge data from multiple locations or handle returns and credits. If you have two separate invoicing systems and both started at zero zero zero one, you've got collisions. I've seen this happen when a small business acquires another and suddenly they have two invoice number ones, two invoice number twos, and so on. Reconciling that is a mess.

The next step up is chronological numbering. Something like two zero two six zero four two six dash zero zero one. Date plus sequence. This is useful when you issue multiple invoices per day and want date-based tracking built into the identifier.

The Tofu dot com invoice numbering guide recommends exactly this pattern for businesses that need date-based organization. But they also flag something practical that's easy to skip — you should test your numbering format with your accounting software before committing to it. Different accounting packages have different expectations about invoice number formats, and finding out you have a mismatch after six months of invoicing is painful. Some software expects purely numeric identifiers. Some have character limits. Some choke on dashes. Test it with a dummy invoice before you send real ones.

Then there are client-specific and hybrid formats. ABC dash zero zero one for ABC Company's first invoice. Or MKT dash two zero two six zero four two six dash zero zero one for department dash date dash sequence. These carry business meaning, which is useful, but they're natural keys that can change if the client name changes or the department gets reorganized. What happens when ABC Company rebrands to XYZ? Now all your historical invoice numbers carry a name that no longer exists.

Which is exactly why they should be alternate keys with unique constraints, not primary keys. The Red-Gate ERD guide for invoice management recommends surrogate primary keys for every table — invoice, customer, address, line items — and then the invoice number itself lives in a column with a unique constraint to prevent duplicates.

Let's sketch the invoice schema concretely. What tables are we talking about?

At minimum, four tables. First, an invoice table with a surrogate primary key, a unique invoice number column, a customer ID foreign key, dates, totals, and status. Second, a customer table with its own surrogate key and whatever customer details you need. Third, an address table — and you want this separate from customer because billing and shipping addresses are different entities, and a customer might have multiple addresses over time. Fourth, an invoice line items table that links back to the invoice with a foreign key.

Why separate line items instead of embedding them in the invoice? I've seen people use JSON columns for line items and call it a day.

Because an invoice can have multiple line items, and each line item is a distinct entity that you might need to query independently. If you're running a report on how many units of a particular product you sold last quarter, you're querying line items, not invoices. If line items are buried in a JSON column, that query becomes a full table scan with JSON parsing. It works at small scale but falls apart when you need real reporting. Normalization matters here.

Now let's talk about ID drift, because that was the second part of Daniel's question and it's where things get messy in practice.

ID drift happens when the same logical object has inconsistent identifiers across different systems or different points in time. The classic scenario is importing data from an external source where the IDs were generated under a different scheme. If you're not careful, you end up with duplicate entity IDs for the same logical object, and your associations get corrupted.

The Mendix community forum has a whole thread about this. Someone imported data with pre-generated custom IDs and the platform's backend assumed uniqueness without active checks. The result was associations pointing to the wrong records. An invoice line item that should have pointed to a customer in Berlin was suddenly pointing to a vendor in Toronto because the IDs collided.

In inventory systems, the canonical pattern for avoiding drift is to use a single item ID that's referenced everywhere. The Unreal Engine forums describe this well — rather than storing full item data in each inventory slot, you store only the item ID and reference a central data table for name, type, and description. One canonical ID per unique object instance, regardless of where it appears.

The principle is: don't derive one ID from another, and don't store the same logical object under different IDs in different places. Generate IDs independently and map them.

Enforce uniqueness at the database level. If you're using multiple ID types for the same object — say an auto-increment primary key, a human-readable invoice number, and a UUID for external APIs — each one gets its own unique constraint. The database is your last line of defense against duplicates. Application-level checks are good, but they race. A unique constraint is atomic.

There's an interesting tension here that I want to pull on. Time-sortable IDs like UUIDv7 and ULID embed timestamps. That's great for index performance, but it means every ID leaks the approximate creation time of the record. For most applications that's fine. For sensitive systems — whistleblower submissions, anonymous voting, anything where creation time is itself sensitive — a random UUIDv4 might be the better choice.

The Authgear guide flags this explicitly. They say all time-sortable identifiers expose approximate creation time, and identifiers exposed in APIs or URLs should be evaluated based on privacy and threat models. It's not that UUIDv7 is wrong — it's that you need to know what you're trading off. If you're building a system for anonymous incident reporting and someone can look at the report ID and know it was created on a Tuesday afternoon, that narrows down who could have submitted it.

The decision tree is getting clearer. For internal primary keys, use something that's good for the database — UUIDv7 if you need global uniqueness, auto-incrementing bigint if you don't. For external-facing IDs, use something that balances security and readability — Stripe-style prefixed strings or TypeIDs. For business identifiers like invoice numbers, use a format that makes sense for your domain but store it as an alternate key, not the primary key.

If you're building something new today, I'd probably recommend TypeID as the external ID format. You get the prefix for type safety, the UUIDv7 for sortability and global uniqueness, and the base thirty-two encoding makes it shorter than a standard UUID string. Twenty-six characters instead of thirty-six.

The one thing I'd add is that whatever format you choose, you should be deliberate about it early. The migration cost of changing ID formats later is brutal. You're touching every table with a foreign key, every API endpoint, every integration, every stored reference in external systems.

Every printed invoice that a client has filed away. Some changes you simply can't make retroactively. If you've sent a thousand paper invoices with a particular numbering scheme, those numbers exist in your clients' filing cabinets forever. You can change the format going forward, but you can't erase history.

Which is why I think the surrogate key pattern is so important. It gives you an internal identifier that never changes and never leaks externally, and then you can evolve your external ID format over time without breaking your database. You could switch from sequential invoice numbers to a date-based format for new invoices, and the database doesn't care because all the foreign keys point to the surrogate integer that never changed.

There's one more nuance I want to cover on the Stripe pattern. When Paul Asjes wrote about their ID system, he mentioned that they use Discord AutoMod with a regex on the secret key prefix — sk underscore live underscore — to block accidental secret key leaks. If a developer accidentally pastes a secret key into a Discord message, the AutoMod catches it before it's visible. That's a security practice built directly on top of a well-designed ID prefix system. The prefix convention made the regex trivial to write.

Which is a good reminder that ID design isn't just about the database schema. It ripples out into every system that handles those IDs — logging, monitoring, alerting, access control, audit trails. A consistent prefix convention makes all of those easier to build.

Let me circle back to inventory systems specifically, since Daniel mentioned those. The classic mistake is storing item attributes directly in the inventory table. You have an inventory row with an item name, description, category, and quantity. The problem is that when the item name changes, you now have to update it in every inventory row, and you'll inevitably miss some. I've seen databases where the same product had three different spellings across different inventory rows because someone updated some but not others.

The fix is an items table with a surrogate primary key and all the descriptive attributes, and then the inventory table just stores the item ID and the quantity. The item name lives in exactly one place.

And if you're tracking inventory across multiple locations, you add a location ID foreign key. The inventory table becomes a many-to-many join between items and locations, with quantity as the junction attribute. The item ID remains the single canonical identifier for that product, regardless of where it's stored.

If you need a human-readable SKU, that lives in the items table as a unique alternate key. The inventory system internally uses the surrogate integer or UUID for all its joins.

What about the case where the same physical item might have different IDs in different systems? Like a product that has a supplier SKU, an internal SKU, and a barcode.

That's where you add separate columns for each identifier type, all with appropriate uniqueness constraints where needed. Supplier underscore sku, internal underscore sku, barcode. They're all alternate keys. The primary key is still the surrogate. And you never derive one from another — each is independently generated or sourced.

Because if you derive the internal SKU from the supplier SKU and then the supplier changes their numbering scheme, you're stuck.

Or you change suppliers.

The independence principle matters. I once saw a system where the internal product code was literally the supplier part number with a prefix tacked on. When they switched suppliers for a key component, they had to either break their own numbering convention or create a mapping table to translate old codes to new ones. They chose the mapping table, and it became a permanent source of bugs.

Now: Hilbert's daily fun fact.

The average cumulus cloud weighs about one point one million pounds. That's roughly the weight of two hundred elephants floating above your head on a sunny afternoon.

That is deeply unsettling. I'm going to think about that every time I look up now.

You're welcome.

What should someone actually do with all this if they're setting up a new system today?

First, default to surrogate primary keys. UUIDv7 if you need cross-system uniqueness, bigint auto-increment if you're in a single database. Second, put unique constraints on every business identifier column — invoice numbers, SKUs, usernames, whatever. Third, choose an external ID format deliberately. If you're building an API that clients will interact with, consider TypeID or a Stripe-style prefixed string. Fourth, test your invoice numbering format with your accounting software before you go live.

If you're already deep into a system with a less-than-ideal ID setup? Say you've got sequential integers exposed in your API and you're starting to worry about enumeration attacks.

Add a new column for the better ID format, populate it, add the unique constraint, and start using it for new records and external interfaces. You don't have to migrate everything at once. The surrogate primary key stays the same, so your foreign keys are untouched. You're just adding a new alternate key. Over time, you can deprecate the old external IDs in your API responses. It's not a flag day migration — it's a gradual transition.

The other practical takeaway is to think about the double-click test. Whatever format you choose for human-readable IDs, make sure it behaves well when someone copies and pastes it. Underscores over dashes or colons if you want the whole thing selected in one double-click. It sounds trivial, but if your ID gets copied and pasted hundreds of times a day, those extra seconds add up.

Document your ID format decisions. Future you, or whoever inherits the system, should be able to look at an ID and know what it represents and how it was generated. The Stripe prefix convention is self-documenting in a way that raw UUIDs are not. A comment in your schema file that says "invoice numbers follow the pattern YYYYMMDD-NNN" takes thirty seconds to write and saves hours of confusion later.

The thing that sticks with me from this whole discussion is that good ID design is mostly about being thoughtful about the boundary between humans and machines. The machine wants uniqueness, sortability, and compact storage. The human wants readability, copy-paste ergonomics, and maybe some embedded context. A good schema serves both without letting either dominate.

The hybrid pattern — surrogate keys internally, well-designed external IDs for the surface — is how you get there. It's not complicated. It's just deliberate. You make a few intentional choices up front, and you save yourself years of accumulated friction.

Thanks to our producer Hilbert Flumingtop for making this episode happen, and for the cloud fact that I will now carry with me forever. This has been My Weird Prompts. Find us at myweirdprompts dot com or wherever you get your podcasts.

We'll be back with another one soon. In the meantime, go check your database schemas and make sure your primary keys aren't invoice numbers.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.

#2444: Custom IDs: UUIDs vs Human-Readable Keys

Mentions

Downloads

You Might Also Like

#2444: Custom IDs: UUIDs vs Human-Readable Keys