#text-to-speech

13 episodes

#2618: Fixing Acronyms in TTS Pipelines

How to handle acronyms in text-to-speech pipelines using BERT models, lexicons, and layered preprocessing.

text-to-speech, speech-recognition, audio-processing

#2591: Can You Swap Our Podcast Voices?

How dynamic voice replacement could let listeners choose who narrates each host's lines.

voice-cloning, text-to-speech, audio-processing

#2534: Can AI Generate Diagrams Without Typo Disasters?

Why AI diagram tools still mangle text labels — and what to do about it today.

image-generation, prompt-engineering, text-to-speech

#2311: Danish AI: Bridging the Localization Gap

How does AI handle Danish? Explore the challenges and progress in making AI tools work for small-language populations.

speech-recognition, text-to-speech, large-language-models

#2303: Optimizing Podcast Pipelines: TTS Costs and Batch Processing

How batch processing and smart queue management can slash TTS costs for episodic podcast production.

text-to-speech, serverless-gpu, voice-cloning

#2192: How We Built a Podcast Pipeline

Hilbert reveals the complete technical architecture behind 2,000+ episodes—from voice memos to GPU-powered TTS, with Claude models, LangGraph workf...

prompt-engineering, speech-recognition, text-to-speech

#2027: Text-In, Text-Out: The Missing Photoshop for Words

Why is editing text with AI so clunky? We explore the "TITO" paradigm—using small, local models for fast, private text transformation.

local-ai, text-to-speech, speech-recognition

#1810: Why Your TTS Sounds Great in English, Terrible Everywhere Else

English AI voices are polished, but global languages hit a wall. Here's why text-to-speech breaks down for Hebrew, Hindi, and beyond.

text-to-speech, linguistics, data-integrity

#1809: The TTS Developer's Dilemma: Size vs. Speed

Stop guessing. We break down the critical trade-offs between model size, latency, and sample rate for production-ready voice apps.

text-to-speech, gpu-acceleration, edge-computing

#1808: The 82M Parameter Voice That Beat Billion-Dollar AI

How a model the size of a tweet outperforms billion-dollar giants in the race for perfect AI speech.

open-source-ai, small-language-models, text-to-speech

#1740: Chatterbox TTS: Open Source vs. ElevenLabs

We dissect Resemble AI's Chatterbox to see how its open-source TTS compares to commercial giants like ElevenLabs.

text-to-speech, open-source, prosody-control

#1715: Why Voice Agents Need Frameworks (Not Just APIs)

Raw APIs handle models, but who manages the audio plumbing? We break down Vapi, LiveKit, and Pipecat.

speech-recognition, text-to-speech, conversational-ai

#136: The Ghost in the Machine: Why AI Voices Hallucinate

Why does your AI suddenly start shouting or whispering like Darth Vader? Herman and Corn dive into the glitchy world of TTS hallucinations.

text-to-speech, hallucinations, autoregressive-models, audio-glitches, latent-space