#large-language-models
140 episodes
#3816: How to Stop AI Scripts From Falling Apart
Why long-form AI generation breaks down and how hierarchical memory fixes it.
#3814: The Day We Lost Our Minds: What Temperature Does to an AI
A two-host autopsy of the day the podcast's AI hosts briefly lost coherence due to excessive sampling temperature, and what it reveals about how language models actually work.
#3767: How LLMs Actually Learn: Stages or Slurry?
Do large language models learn grammar first, then facts? The honest answer is messier and more fascinating.
#3664: Build Your Own Language Dictionary: Beyond Standard Definitions
Ditch standard dictionaries and build your own curated vocabulary from real encounters with native speakers.
#3596: Why an AI Model Kept Calling Itself Sonnet 4.6
When a Chinese model insists it's "Sonnet 4.6," is it theft, sloppy training, or something stranger?
#3595: How DeepSeek Feels More Open Than Western AI
Why Chinese AI models sometimes feel less censored on American political topics than American models do.
#3553: Can AI Review Your Lease in Israel?
Can AI actually understand Israeli tenant law? We explore the tools, the gaps, and how to build your own.
#3424: Catching Up on AI Without the Firehose
Four curated sources that filter AI noise into signal — Import AI, The Batch, Stanford HAI, and a podcast.
#3406: LoRA Isn't Just for Image Generation
LoRA lets you fine-tune an LLM's behavior with a 50MB file. Here's how it works and why it matters.
#3283: Fine-Tuning DeepSeek for One Podcast
Can a purpose-specific fine-tune fix a model's stubborn writing tics? We explore the practical engineering behind it.
#3278: How to Get Early AI Model Access as a Solo Developer
How a solo developer spending $300/month can get early access to new AI models before the press release.
#3271: LLMs as Parsers, Not Calculators
Stop letting LLMs do math. Use them to parse messy text, then let deterministic code handle the numbers.
#3171: How to Break an LLM's Bad Verbal Habits
Blacklists fail and regex inverts meaning. Here's what actually works to clean up AI writing tics.
#3157: Opus 4.8: What Actually Changed Under the Hood
Anthropic dropped Opus 4.8 with no fanfare. New training data, faster inference, and smarter refusals — here's what changed.
#3127: Crafting AI Characters That Feel Alive
Move beyond system prompts with structured character bibles that give AI personalities real inner lives.
#2672: When a Startup Claims to Break the Quadratic Wall
A startup claims linear attention scaling at 12M tokens, beating GPT-5.5 on retrieval benchmarks.
#2664: Can You Trust an LLM's Raw Knowledge?
Why pre-trained knowledge isn't reliable for facts — and what actually makes models useful.
#2651: AI Training Itself: Student, Teacher, and Grader
Can models generate their own training data and judge their own outputs? The promise and pitfalls of fully AI-led pipelines.
#2650: How to Catch an LLM's Bad Writing Habits
A practical guide to analyzing podcast transcripts for repetitive language and dialogue patterns — from Python word counts to embedding clustering.
#2622: How Transformers Actually Work: Attention, Tokens, and Context
How one architectural change unlocked chatbots, image generation, and protein folding — explained without the jargon.
#2488: Hybrid Pipelines for Entity Resolution
Classic NLP pipelines vs. lightweight LLMs for handling Hezbollah's half-dozen spellings.
#2464: Batch APIs: The 50% Discount You're Probably Misusing
Batch inference APIs offer 50% off — but only for the right workloads. Here's when they actually make sense.
#2461: How Claude Code's Conversation Compaction Actually Works
The three-tier system, what survives, what dies, and why you shouldn't rely on auto-compact.
#2426: Why DeepSeek V4's Prose Feels More Vivid Than Claude or GPT
A million-token context window at 2% of the KV-cache cost, and prose that actually breathes. Here's what makes V4 different.