#ai-reasoning
47 episodes
#3816: How to Stop AI Scripts From Falling Apart
Why long-form AI generation breaks down and how hierarchical memory fixes it.
#3814: The Day We Lost Our Minds: What Temperature Does to an AI
A two-host autopsy of the day the podcast's AI hosts briefly lost coherence because of an excessively high sampling temperature, and what it reveals about how language models actually work.
#2780: Building Self-Healing Agent Pipelines
How to build an agent that monitors and fixes other agents in production — without the hype.
#2693: When AI Ignores Your Style Guide
Why your AI ignores formatting instructions and how to fix it with pipeline architecture, not model swaps.
#2400: Claude Code's Hidden Context Tax
How Claude's eager-loaded primitives silently consume context—and how to optimize your setup for sharper performance.
#2308: When AI Forecasts Collide: Geopol Model Divergence
Five AI models forecast the Iran-Israel-US crisis — and their disagreements reveal surprising insights about geopolitical reasoning.
#2241: When More Frameworks Make Worse Decisions
Benjamin Franklin's 250-year-old pro/con list still dominates how we decide—but research shows it's riddled with bias. We map five frameworks that ...
#2239: How AI Benchmarks Became Broken (And What's Replacing Them)
The tests we use to measure AI progress are contaminated, saturated, and gamed. Here's what's actually working.
#2224: Why AI Can't Crack the Voynich Manuscript
A fifteenth-century text has defeated cryptanalysts, linguists, and AI models alike. What does its resistance tell us about language, encoding, and...
#2191: Making Multi-Agent AI Actually Work
Research from Google DeepMind, Stanford, and Anthropic reveals most multi-agent systems waste tokens and amplify errors. Single agents with better ...
#2189: Scaling Multi-Agent Systems: The 45% Threshold
A landmark Google DeepMind study reveals that adding more AI agents often degrades performance, wastes tokens, and amplifies errors—unless your sin...
#2182: Can You Actually Review an AI Agent's Plan?
Most AI agents have plans the way you have a plan while half-asleep—something's happening, but you can't see it. We map the five major planning pat...
#2175: Let Your AI Argue With Itself
What happens when you let multiple AI personas debate each other instead of asking one model one question? A deep dive into synthetic perspective e...
#2173: Inside MiroFish's Agent Simulation Architecture
MiroFish generates thousands of AI agents with distinct personalities to predict social dynamics. But research reveals a critical flaw: LLM agents ...
#2172: Council of Models: How Karpathy Built AI Peer Review
Andrej Karpathy's llm-council uses anonymized peer review to make language models evaluate each other fairly—but can it really suppress model bias?
#2164: Why Bigger Context Windows Don't Fix Attention
Frontier models have million-token context windows, but attention degrades well before you hit the limit. New research reveals why bigger isn't bet...
#2024: Your AI Council: Digital Committee or Groupthink?
A digital boardroom of AI models promises better decisions, but risks amplifying the same old biases.
#2016: Andrej Karpathy: The Bob Ross of Deep Learning
Why the most influential AI mind prefers a blank text file to proprietary black boxes.
#1894: Engineering Serendipity: Tuning AI for Better Brainstorming
Stop asking chatbots for generic ideas. Learn how to configure AI as a structured, critical partner for business innovation and career pivots.
#1893: AI as a Strategic Adversary for Startups
Can AI stress-test your startup idea before investors do? We explore using AI as a strategic adversary to find blind spots.
#1838: Tuning Search Without Losing Your Mind
Modern search bars are AI decision engines. Here's how small teams can tune fuzzy matching, semantic search, and reranking without breaking everyth...
#1668: Kimi K2's Hidden Reasoning: A New AI Architecture
Moonshot AI's Kimi K2 Thinking model uses a hidden reasoning phase to solve complex logic puzzles and coding tasks, beating top proprietary models.
#1633: Can a Character Actor Model Beat a Generalist?
We grill MiniMax M2.7 to see if a model built for "virtual companions" can actually handle high-level comedy and complex character logic.
#1630: When a Reasoning Model Overthinks Comedy
Xiaomi's new MiMo 2.0 Pro model auditions for a comedy podcast, promising deep reasoning over raw speed.