Forever Files: Data Permanence, Digital Preservation, and the Fight Against Bitrot

Data feels permanent. It isn’t. Files corrupt silently, services shut down, governments censor, companies delete archives, and physical media decays on timescales that vary from months to centuries depending on what you’re using and how you store it. These seven episodes explored what it actually takes to keep data alive — from the byte level up to civilizational scale.

The Long View

  • Data Forever: From Blockchains to Lunar Vaults opened with the most ambitious preservation project anyone has proposed: storing copies of critical human knowledge on the Moon, where it would be safe from the catastrophes that might affect Earth. The episode traced the full spectrum of data permanence approaches — blockchain-based storage, distributed systems, optical storage, and physical vaults — and examined the tradeoffs between redundancy, cost, accessibility, and what “permanent” actually means in engineering terms. The hosts were characteristically skeptical of grandiose claims while genuinely engaging with the technical challenges.

  • Preserving the Web: The Internet Archive and Arweave examined the organizations and protocols trying to solve the same problem at internet scale. The Internet Archive’s Wayback Machine has crawled billions of web pages, but it’s a nonprofit running on donations, it is legally exposed (as the copyright lawsuits revealed), and it holds only single copies of most material. Arweave takes a different approach: a decentralized storage network with a one-time payment model designed to fund perpetual storage. The episode examined the technical architectures, the economics, and the fragility of depending on either approach alone.
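The economic intuition behind "pay once, store forever" pricing can be sketched in a few lines: if the cost of storing a gigabyte for one year declines at a steady rate, the sum of all future years' costs is a geometric series that converges to a finite number you can charge up front. The starting cost and decline rate below are illustrative assumptions, not Arweave's actual parameters.

```python
# Toy model of endowment-style perpetual storage pricing.
# Assumes (hypothetically) that per-GB-year storage cost declines
# at a constant rate forever -- the real economics are messier.

def perpetual_storage_cost(first_year_cost: float, annual_decline: float) -> float:
    """Sum of first_year_cost * (1 - annual_decline)**t for t = 0, 1, 2, ...

    A geometric series that converges whenever 0 < annual_decline < 1,
    to first_year_cost / annual_decline.
    """
    if not 0 < annual_decline < 1:
        raise ValueError("cost must decline strictly each year for the sum to converge")
    return first_year_cost / annual_decline

# Illustrative numbers: $0.01/GB-year today, costs falling 10% per year
total = perpetual_storage_cost(0.01, 0.10)
print(f"one-time endowment per GB: ${total:.2f}")  # -> $0.10
```

The fragility the episode flags lives in that assumption: if storage costs ever stop declining, the series stops converging and the endowment runs out.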

The Physics of Decay

  • Digital Dust: Can NFC Tags Survive for Decades? used NFC tags as a lens for examining a broader question: how long does any digital storage medium actually last? The episode covered the mechanism of NFC tag data storage (EEPROM cells that hold charge), the failure modes (charge leakage, write cycle exhaustion, physical damage), and the realistic lifespan under different storage conditions. More broadly, it examined the concept of bitrot — the silent degradation of stored data across all media types — and the practices (migration, checksum verification, redundancy) that address it.
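The checksum-verification practice the episode describes can be sketched concretely: record a hash for every file once, then periodically re-hash and compare (often called "scrubbing"). Any mismatch means the bytes on disk changed. This is a minimal illustration, not a production tool; paths and layout are assumptions.

```python
# Minimal bitrot scrubber: build a checksum manifest, then re-verify later.
import hashlib
import pathlib

def sha256_of(path: pathlib.Path) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: pathlib.Path) -> dict[str, str]:
    """Record the current checksum of every file under root."""
    return {str(p): sha256_of(p) for p in sorted(root.rglob("*")) if p.is_file()}

def scrub(manifest: dict[str, str]) -> list[str]:
    """Return paths that are missing or whose bytes no longer match the manifest."""
    return [
        p for p, digest in manifest.items()
        if not pathlib.Path(p).is_file() or sha256_of(pathlib.Path(p)) != digest
    ]
```

Run `build_manifest` once after a backup, store the manifest alongside it, and run `scrub` on a schedule; a non-empty result is your cue to restore the flagged files from a redundant copy before the last good one rots too.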

  • Digital Fingerprints: The Secret Math Saving Your Data explained checksums and cryptographic hashes — the mathematical tools that detect when data has been corrupted. A checksum is a computed value derived from a file’s contents; if the file changes (even by a single bit), the checksum changes. This property makes checksums essential for detecting transmission errors, storage corruption, and tampering. The episode covered MD5, SHA-1, SHA-256, and the specific contexts where each is appropriate, and explained why file integrity verification is a practical skill, not just a theoretical concern.
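The single-bit sensitivity the episode highlights is easy to demonstrate: flipping one bit of the input produces a completely different SHA-256 digest, which is why a stored checksum reliably exposes even minimal corruption.

```python
# Flipping a single bit changes the SHA-256 digest entirely.
import hashlib

original = b"The quick brown fox jumps over the lazy dog"
corrupted = bytearray(original)
corrupted[0] ^= 0x01  # flip the lowest bit of the first byte

d1 = hashlib.sha256(original).hexdigest()
d2 = hashlib.sha256(bytes(corrupted)).hexdigest()
print(d1)
print(d2)
assert d1 != d2  # any change, however small, is detected
```

The same three lines of hashing work with `hashlib.md5` or `hashlib.sha1`; the episode's point is that those remain fine for detecting accidental corruption but, unlike SHA-256, are broken against deliberate tampering, since attackers can construct collisions for them.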

Sovereignty and Control

  • Digital Borders: The Rise of Data Sovereignty examined the growing tension between global internet infrastructure and national data sovereignty requirements. GDPR was the most prominent early example, but data localization requirements now exist in dozens of jurisdictions, each with different rules about where data must be stored, who can access it, and what audit rights apply. The episode used Cloudflare R2 (with its zero-egress-fee model and global point-of-presence network) as a case study in how infrastructure providers are adapting to this regulatory complexity.

  • Beyond the Blackout: Tech for Digital Survival addressed the hard end of the spectrum: what do you do when your government shuts off the internet? This isn’t a theoretical question — internet shutdowns happen regularly in dozens of countries during elections, protests, and emergencies. The episode covered the practical tools: Tor, Meshtastic (LoRa mesh networking), SMS-based information transfer, satellite internet options, and the social and operational practices that make them effective. The hosts were clear-eyed about the limitations and the threat models these tools address.

The Infrastructure Layer

  • The Plumbing of Data: From FAT32 to Self-Healing ZFS returned to the foundation: the file system layer that determines whether your storage hardware is being used intelligently. Copy-on-write file systems like ZFS don’t overwrite existing data — they write new copies and update pointers, so an interrupted or corrupted write never clobbers the last valid copy. And because ZFS checksums every block, it can detect and (with sufficient redundancy) automatically repair corruption that other file systems would silently pass through. The episode explained why your choice of file system is actually a data integrity decision, not just a performance one.
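The self-healing idea can be modeled in miniature. The sketch below is a toy, not ZFS: it stores each block on two mirrors alongside an expected checksum, verifies on read, and overwrites a corrupt copy with the good one. All class and method names are invented for illustration.

```python
# Toy model of checksummed, mirrored block storage with read-time repair.
import hashlib

def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

class MirroredStore:
    def __init__(self) -> None:
        self.mirrors: list[dict[int, bytes]] = [{}, {}]  # block_id -> bytes
        self.checksums: dict[int, str] = {}              # block_id -> expected digest

    def write(self, block_id: int, data: bytes) -> None:
        """Record the expected checksum, then write the block to both mirrors."""
        self.checksums[block_id] = checksum(data)
        for mirror in self.mirrors:
            mirror[block_id] = data

    def read(self, block_id: int) -> bytes:
        """Return a verified copy, healing any mirror whose copy is corrupt."""
        expected = self.checksums[block_id]
        for mirror in self.mirrors:
            if checksum(mirror[block_id]) == expected:
                for other in self.mirrors:  # repair bad copies from the good one
                    if checksum(other[block_id]) != expected:
                        other[block_id] = mirror[block_id]
                return mirror[block_id]
        raise IOError(f"block {block_id} corrupt on all mirrors")
```

Note what a non-checksumming file system cannot do here: with no expected digest, it has no way to tell which mirror holds the good copy, so it may happily return — or even propagate — the corrupt one.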

Data permanence isn’t solved by cloud backup alone. It requires understanding the failure modes at every layer — media physics, file system design, infrastructure economics, and regulatory geography. These episodes build that understanding.

Episodes Referenced