#data-storage
32 episodes
#3431: How YouTube Stores 500 Hours of Video Every Minute
YouTube's videos are shredded, replicated across global servers, and stored at a cost approaching zero. Here's how.
#3217: When a Truck Beats the Internet: Shipping Data at Scale
Why FedEx sometimes beats fiber for moving massive datasets across the country.
#3073: What 40,000-Year-Old Paint Teaches Us About Digital Storage
Cave paintings outlasted carved stone. Now engineers are using that chemistry to build millennium-proof discs.
#2685: Plugin Data Storage for AI Agents
How to separate user data from plugin code across Linux, macOS, and Windows in agentic AI environments.
#2571: How S3 Billing Actually Works (And Why R2 Is Different)
Storage is the decoy cost. The real surprises come from request charges, egress fees, and early deletion penalties.
#2475: Docker Volumes: Why They Can't Move and What To Do
Docker made apps portable but left your data stuck. Here's how to actually move volumes between hosts.
#2465: JSON-L vs Parquet: When Each Format Wins
How far can JSON-L scale before it breaks? And why does Parquet dominate for millions of rows?
#2438: The Folder Illusion: How Object Storage Fakes Hierarchy
Blobs, flat namespaces, and why those "folders" in cloud storage are complete illusions.
#2368: The Multi-Stage Pipeline Behind Netflix's Recommendations
Unpacking the multi-stage AI pipeline behind Netflix, Spotify, and Amazon’s "you might also like" suggestions—from candidate generation to real-tim...
#2271: Vector Search in a Single File
What if you could do vector search with just SQLite? We explore sqlite-vec, the extension that adds embeddings to the world's simplest database, an...
#2064: Why GPT-5 Is Stuck: The Data Wall Explained
The "bigger is better" era of AI is over. Here's why the industry hit a data wall and shifted to a new scaling law.
#2011: Saving AI Knowledge Beyond the Chat Window
We're brilliant at prompting AI, but terrible at saving the answers. Here's why that "digital masterpiece on a chalkboard" vanishes.
#2010: Building Better AI Memory Systems
We obsess over AI inputs but treat outputs like Snapchat messages. Here's why that's a massive blind spot.
#1989: Your Cloud Photos Vanish If You Miss a $5 Bill
Is your data safe in the cloud, or is it one missed payment away from oblivion?
#1988: The Eternal Storage That Can't Escape the Lab
Quartz glass promises 10,000-year data storage, but can it scale before 180 zettabytes make it obsolete?
#1983: Why Your Digital Photos Are Slowly Disappearing
Physical paper from the 1700s is more durable than a Word doc from 1994. Here's why digital data is fragile and how archivists fight bit rot.
#1920: InfluxDB vs. Postgres: The Time-Series Showdown
We compare specialized time-series databases like InfluxDB against traditional SQL options like Postgres with the TimescaleDB extension.
#1910: Our Podcast Is Now a Permanent Research Artifact
Why we're uploading every episode to CERN's Zenodo archive, giving our AI experiments a permanent DOI and a life beyond streaming platforms.
#1797: Why the Cloud Runs on Cassette Tapes
The cloud isn't just hard drives—it's millions of robotic cassette tapes holding petabytes of data for Google and NASA.
#1776: The Sync Trap: Why Your Backup Isn't Safe
Is your backup strategy a responsible habit or a full-blown compulsion? We explore the thin line between data safety and digital hoarding.
#1475: The Folder Illusion: Why Cloud Storage Breaks Your Mental Model
Folders are a lie in the cloud. Explore why Amazon S3 uses flat namespaces and "keys" instead of traditional file hierarchies.
#1233: Why "Just Use Postgres" Isn't Always Enough
Can one database do it all? Explore why hardware constraints and data geometry keep specialized databases like Snowflake and ClickHouse alive.
#1211: Escaping JOIN Hell: The SQL Developer’s Guide to Neo4j
Stop struggling with 15-deep JOINs. Learn how Neo4j turns relationships into first-class citizens for faster, more intuitive data modeling.
#1124: The Database Explosion: Why One Size No Longer Fits All
From vector stores to edge computing, discover why the world now has over 1,000 databases and why Postgres isn't always the answer.