#fault-tolerance
51 episodes
#3912: Elevators at 46 MPH: Speed, Safety & Algorithms
How do elevators rocket up skyscrapers at 46 mph, and what happens when cables aren't enough?
#3797: How Self-Reverting Watchdogs Save Broken SSH Sessions
A dead man's switch for server configs that automatically rolls back risky changes when connectivity drops.
#3786: When Your DNS Dies: Home Network Failure Cascade
One dead server, ZFS corruption, and a DNS collapse that takes down everything—including your ability to fix it.
#3775: SBC Clusters vs Virtualization: The Real Tradeoffs
Why physical isolation sounds great but virtualization usually wins for home servers.
#3749: Triage When Everything Breaks at Once
When a roof leak, server failure, and lease termination hit simultaneously, here's how to prioritize.
#3284: Agent Infrastructure Engineer: The New DevOps
Agentic AI is splintering into real engineering disciplines. Here's what the "DevOps of AI" actually does.
#2989: Why Trains Crash When They Can't Steer
Stopping a train takes miles. Seeing an obstacle takes seconds. That gap explains everything.
#2938: How to Prevent Linux Desktop Crashes Under Heavy Load
Stop losing work to memory exhaustion, CPU lockups, and GPU hangs on Linux workstations.
#2924: When Adding One Agent Breaks Everything
The math behind why your 100-agent pipeline fails 40% of the time — and what to do about it.
#2780: Building Self-Healing Agent Pipelines
How to build an agent that monitors and fixes other agents in production — without the hype.
#2773: Beyond Static Fallbacks: Agentic Error Handling in AI Pipelines
From try-except blocks to planning agents that route around failures intelligently.
#2556: The Weird Myths of Solid-State Storage
No moving parts, no sound waves — just electrons trapped in silicon. How solid-state drives actually work.
#2550: Idempotent Pipelines: Checkpoints, Manifests & Safe Re-Runs
How to design scripts and pipelines so re-running them is safe, even after a crash mid-execution.
#2179: Building Cost-Resilient AI Agents
Failed API calls in agent loops aren't just technical problems—they're direct budget drains. Here's how checkpointing, retry strategies, and cachin...
#2002: Brainstorming a Stable-by-Design Smart Home
We explore why Home Assistant is so fragile and brainstorm a stable-by-design future for the platform.
#1921: The Three-Second Heartbeat That Keeps Israel Safe
Why a civilian website sends an empty JSON payload every three seconds, even during peacetime, and what it reveals about mission-critical architect...
#1067: The 3,000-Person Army: How Major AI Models Actually Ship
Think AI is built by a few geniuses? Discover the army of 3,000 specialists required to ship a single major model update.
#1048: The Keepers: How the Samaritans Outlasted Empires
Discover how a community of 950 people used ancient scripts and "survival engineering" to outlast empires for over two millennia.
#1041: Before the Hum: Life in the Pre-Refrigeration Era
Explore the high-stakes world of food preservation, from 19th-century ice trades to the biological secrets of 50-year-old perpetual stews.
#1036: Is Kubernetes Too Big for Your Startup?
Is Kubernetes too complex for most teams? Explore the evolution of infrastructure from Google’s Borg to the new era of AI-driven scaling.
#1032: Ancient Backups: How History Survived the Delete Command
Discover how ancient civilizations used monks, clay jars, and geographic diversity to create the world's first distributed data networks.
#1012: When a Missile Test Is a Diplomatic Message
Explore the strategic signaling behind the GT-255 launch and why the U.S. relies on 50-year-old technology to maintain global security.
#989: From Shackleton to Supply Chains: The Industrialization of Polar Science
Beyond the ice: Explore the massive industrial operations and high-stakes geopolitics required to sustain human life at the Earth's poles.
#894: Iran After Khamenei: The IRGC’s Fight for Survival
Following the death of the Supreme Leader, we examine the IRGC’s grip on Iran’s economy, military, and its future as a "state within a state."