#large-language-models
140 episodes
#2410: How Researchers Actually Measure Censorship in Chinese LLMs
Beyond headlines: the actual benchmarks, methodologies, and pitfalls in detecting political refusal in Chinese language models.
#2403: Choosing Your LLM Eval Framework
An architectural shootout of four major LLM evaluation harnesses — where each shines and where each breaks down.
#2374: How Granular Can MoE Experts Get?
Exploring the limits of expert granularity in Mixture of Experts models—how narrow can segmentation go before efficiency or accuracy suffers?
#2355: Why Open-Weight Models Are Winning
Discover how Cogito v2.1 leverages process supervision and MoE architecture to redefine reasoning efficiency in open-weight AI models.
#2314: One Model or Three? Inside Claude's Architecture
What makes Claude’s Haiku, Sonnet, and Opus different? Discover how architecture shapes their unique strengths and weaknesses.
#2311: Danish AI: Bridging the Localization Gap
How does AI handle Danish? Explore the challenges and progress in making AI tools work for small-language populations.
#2309: Blind Ranking AI's Best Podcast Scripts
How do 15 AI models handle controversial podcast prompts? We rank their scripts blind and reveal the surprising winners.
#2307: Inside Frontier LLM Training: Stages, Costs, and Checkpoints
Discover the multi-stage process of training frontier large language models, from pretraining to post-training, and why checkpoints are the key to ...
#2306: Can LLM Councils Truly Capture Diverse Worldviews?
Exploring whether LLM councils can achieve genuine worldview diversity or if alignment processes erase meaningful differences.
#2243: What Enterprise AI Pricing Actually Negotiates
Enterprise customers rarely get the deep discounts they expect from AI APIs. What they actually negotiate for—and why the ramp-up requirement exist...
#2242: AI as Your Ideation Blind Spot Spotter
How to use AI not to answer questions you already know to ask, but to surface possibilities your expertise has made invisible to you.
#2233: Who Actually Wants AI to Slow Down?
Daniel argues AI development should slow down for expertise and stability. But who in the industry actually shares this philosophy beyond the obvio...
#2214: The Three Failure Modes of AI News Systems
When a conflict changes hourly, AI systems built for yesterday's information fail. Here's how to architect pipelines that actually keep up.
#2190: Simulating Extreme Decisions With LLMs
LLMs fail at the exact problem wargaming was built to solve—simulating irrational, extreme decision-makers. A new study reveals why.
#2187: Why Claude Writes Like a Person (and Gemini Doesn't)
Claude produces prose that sounds human. Gemini reads like Wikipedia. The difference isn't capability—it's how they were trained to think about wri...
#2172: Council of Models: How Karpathy Built AI Peer Review
Andrej Karpathy's llm-council uses anonymized peer review to make language models evaluate each other fairly—but can it really suppress model bias?
#2171: How IQT Labs Built a Wargaming LLM (Then Archived It)
A deep code review of Snowglobe, IQT Labs' open-source LLM wargaming system that ran real national security simulations before being archived. What...
#2076: Is Pure NLP Dead? The Hidden Scaffolding of AI
Modern AI didn't appear from nowhere. Discover how decades of linguistic rules and statistical models built the foundation for today's LLMs.
#2066: The Transformer Trinity: Why Three Architectures Rule AI
Why did decoder-only models like GPT dominate AI, while encoders and encoder-decoders still hold critical niches?
#2064: Why GPT-5 Is Stuck: The Data Wall Explained
The "bigger is better" era of AI is over. Here's why the industry hit a data wall and shifted to a new scaling law.
#2063: That $500M Chatbot Is Just a Base Model
That polite chatbot? It started as a raw, chaotic autocomplete engine costing half a billion dollars to build.
#2062: How Transformers Learn Word Order: From Sine Waves to RoPE
Transformers can’t see word order by default. Here’s how positional encoding fixes that—from sine waves to RoPE and massive context windows.
#1839: AI's Data Kitchen: From Hoovering to Fine-Tuning
We go behind the curtain of the AI data pipeline, revealing the messy, multi-billion-dollar war over data curation.
#1812: When AI Gets a Truth Tether to the Talmud
Sefaria's new MCP server connects AI directly to 2,700 years of Jewish texts, transforming how scholars and curious learners study ancient literature.