#rag
81 episodes
#3751: Source-Restricted vs. Open Retrieval: How to Lock Down Your LLM
When should an LLM be locked to specific documents, and when should it search the web? A practical framework for grounding decisions.
#3120: What Makes Agentic Search Tools Like Exa Actually Work?
Why swapping Google for Exa transformed our show's accuracy — and what agentic search does differently.
#2705: Your Brain Isn't a Hard Drive — What Actually Fits
Long-term memory isn't storage — it's a generative model. Here's where the brain/computer analogy actually holds up.
#2682: Live Retrieval vs. RAG: What an Agent Actually Does
Does every AI conversation create a tiny vector store? We unpack the real tradeoffs between live document fetching and pre-indexed RAG.
#2676: Vector Database Schema Design for AI Memory Layers
Stop dumping vectors blindly. Design metadata schemas and namespaces for retrieval that actually works at scale.
#2673: The Embedding Coupling Problem: Editing Vector Stores
Can you edit or delete individual chunks in Pinecone? And can you actually back up a vector index? Yes—but with critical caveats.
#2664: Can You Trust an LLM's Raw Knowledge?
Why pre-trained knowledge isn't reliable for facts — and what actually makes models useful.
#2639: The Hidden Layer That Makes Search Work
Why your search results miss the mark — and how cross-encoders fix it.
#2638: How to Build Disposable AI Agents at Runtime
Create ephemeral AI agents that answer questions about specific items, then vanish. No persistent configuration needed.
#2469: Embedding Model Deprecation: RAG's Silent Killer
When OpenAI retires an embedding model, your RAG pipeline breaks silently. Here's how to fix it.
#2466: The Hidden Trap of Embedding Model Lock-In
What happens when your vector database works great — until your embedding model gets deprecated and your vectors become useless.
#2315: How to Update AI Models Without Starting Over
Exploring the challenge of updating AI models with new knowledge without costly full retraining.
#2228: Tuning RAG: When Retrieval Helps vs. Hurts
How do you prevent retrieval from suppressing a model's reasoning? We diagnose our own pipeline's four control levers and multi-source fusion strat...
#2214: The Three Failure Modes of AI News Systems
When a conflict changes hourly, AI systems built for yesterday's information fail. Here's how to architect pipelines that actually keep up.
#2213: When Ground Truth Moves Hourly
How do you rigorously evaluate whether Tavily or Exa retrieves better results for breaking news? A formal benchmark beats the vibe check.
#2208: Building Memory for AI Characters That Actually Evolve
How do AI hosts develop real consistency across episodes? Corn and Herman explore retrieval-augmented memory systems that let AI characters genuine...
#2204: Memory Without RAG: The Real Architecture
mem0, Letta, Zep, and LangMem solve agent memory differently than RAG. Here's what's actually happening under the hood.
#2203: Knowledge Without Tools: Why MCPs Aren't Just for Execution
MCPs can be pure knowledge providers with zero tools. Here's why that matters for agents querying government data and authoritative sources.
#2181: When RAG Becomes an Agent
RAG in chatbots is simple retrieval. RAG in agents is a multi-step decision loop. Here's what actually changes.
#2133: Engineering Geopolitical Personas: Beyond Caricatures
How to build LLMs that simulate state actors with strategic fidelity, not just surface mimicry.
#2129: Shifting Left on Hallucinations
Stop hoping your AI doesn't lie. We explore the shift to deterministic guardrails, specialized judge models, and the tools making agents reliable.
#2125: Why Agentic Chunking Beats One-Shot Generation
A single prompt can't write a 30-minute script. Here's the agentic chunking method that fixes coherence.
#2069: The Vibe Coding Trap: Why Your Agent Skills Keep Breaking
Stop guessing at the agentskills.io spec. Learn the exact YAML fields, directory structure, and authoring patterns to make Claude Code skills that ...
#2057: How Agents Break Through the LLM Output Ceiling
The output window is the new bottleneck: why massive context doesn't solve long-form generation.