#data-storage

32 episodes

#3431: How YouTube Stores 500 Hours of Video Every Minute

YouTube's videos are shredded, replicated across global servers, and stored at a cost approaching zero. Here's how.

data-storage, infrastructure, hardware-reliability

#3217: When a Truck Beats the Internet: Shipping Data at Scale

Why FedEx sometimes beats fiber for moving massive datasets across the country.

data-integrity, logistics, data-storage

#3073: What 40,000-Year-Old Paint Teaches Us About Digital Storage

Cave paintings outlasted carved stone. Now engineers are using that chemistry to build millennium-proof discs.

material-science, data-storage, cave-painting

#2685: Plugin Data Storage for AI Agents

How to separate user data from plugin code across Linux, macOS, and Windows in agentic AI environments.

data-storage, ai-agents, cross-platform

#2571: How S3 Billing Actually Works (And Why R2 Is Different)

Storage is the decoy cost. The real surprises come from request charges, egress fees, and early deletion penalties.

cloud-computing, data-storage, latency

#2475: Docker Volumes: Why They Can't Move and What To Do

Docker made apps portable but left your data stuck. Here's how to actually move volumes between hosts.

docker, backup-strategies, data-storage

#2465: JSON-L vs Parquet: When Each Format Wins

How far can JSON-L scale before it breaks? And why does Parquet dominate for millions of rows?

data-storage, data-integrity, jsonl

#2438: The Folder Illusion: How Object Storage Fakes Hierarchy

Blobs, flat namespaces, and why those "folders" in cloud storage are complete illusions.

data-storage, cloud-computing, distributed-systems

#2368: The Multi-Stage Pipeline Behind Netflix's Recommendations

Unpacking the multi-stage AI pipeline behind Netflix, Spotify, and Amazon’s "you might also like" suggestions—from candidate generation to real-tim...

ai-models, data-storage, ai-training

#2271: Vector Search in a Single File

What if you could do vector search with just SQLite? We explore sqlite-vec, the extension that adds embeddings to the world's simplest database, an...

vector-databases, edge-computing, data-storage

#2064: Why GPT-5 Is Stuck: The Data Wall Explained

The "bigger is better" era of AI is over. Here's why the industry hit a data wall and shifted to a new scaling law.

large-language-models, ai-training, data-storage

#2011: Saving AI Knowledge Beyond the Chat Window

We're brilliant at prompting AI, but terrible at saving the answers. Here's why that "digital masterpiece on a chalkboard" vanishes.

knowledge-management, ai-agents, data-storage

#2010: Building Better AI Memory Systems

We obsess over AI inputs but treat outputs like Snapchat messages. Here's why that's a massive blind spot.

ai-agents, rag, data-storage

#1989: Your Cloud Photos Vanish If You Miss a $5 Bill

Is your data safe in the cloud, or is it one missed payment away from oblivion?

data-storage, home-lab, supply-chain-security

#1988: The Eternal Storage That Can't Escape the Lab

Quartz glass promises 10,000-year data storage, but can it scale before 180 zettabytes make it obsolete?

data-storage, hardware-engineering, glass-storage

#1983: Why Your Digital Photos Are Slowly Disappearing

Physical paper from the 1700s is more durable than a Word doc from 1994. Here's why digital data is fragile and how archivists fight bit rot.

data-storage, digital-forensics, hardware-reliability

#1920: InfluxDB vs. Postgres: The Time-Series Showdown

We compare specialized time-series databases like InfluxDB against traditional SQL options like Postgres with Timescale extensions.

data-storage, distributed-systems, software-development

#1910: Our Podcast Is Now a Permanent Research Artifact

Why we're uploading every episode to CERN's Zenodo archive, giving our AI experiments a permanent DOI and a life beyond streaming platforms.

open-source, data-storage, digital-forensics

#1797: Why the Cloud Runs on Cassette Tapes

The cloud isn't just hard drives—it's millions of robotic cassette tapes holding petabytes of data for Google and NASA.

data-storage, hardware-engineering, security

#1776: The Sync Trap: Why Your Backup Isn't Safe

Is your backup strategy a responsible habit or a full-blown compulsion? We explore the thin line between data safety and digital hoarding.

data-storage, digital-privacy, human-factors

#1475: The Folder Illusion: Why Cloud Storage Breaks Your Mental Model

Folders are a lie in the cloud. Explore why Amazon S3 uses flat namespaces and "keys" instead of traditional file hierarchies.

cloud-computing, data-storage, cloud-repatriation

#1233: Why "Just Use Postgres" Isn't Always Enough

Can one database do it all? Explore why hardware constraints and data geometry keep specialized databases like Snowflake and ClickHouse alive.

data-storage, architecture, distributed-systems

#1211: Escaping JOIN Hell: The SQL Developer’s Guide to Neo4j

Stop struggling with 15-deep JOINs. Learn how Neo4j turns relationships into first-class citizens for faster, more intuitive data modeling.

graph-databases, architecture, data-storage

#1124: The Database Explosion: Why One Size No Longer Fits All

From vector stores to edge computing, discover why the world now has over 1,000 databases and why Postgres isn't always the answer.

vector-databases, data-storage, edge-computing