#automatic-speech-recognition
6 episodes
#2754: Why Your Dictation Setup Might Be Wrong
Modern ASR is shockingly robust. The biggest predictor of accuracy? How well your audio matches the model's training data.
#2590: How Disfluency Detection Models Clean Up Speech
How transformer models distinguish "um" from meaningful speech — and why removing too much makes you sound like a robot.
#2486: Why Noise Reduction Can Ruin Transcription Accuracy
Cleaning audio before transcription can increase errors by up to 46%. Here's the right approach for your voice app.
#2337: How Speaker Diarization Powers Everything From Call Centers to Courts
Discover how PyAnnote and other tools tackle the critical task of identifying "who spoke when" in audio—and why it’s harder than it sounds.
#109: Teaching AI to Hear: Solving the Custom Dictionary Dilemma
Tired of AI mishearing brand names? Learn how to build efficient custom dictionaries for Gemini 1.5 without breaking the bank.
#10: How ASR Went From Frustration To ... Whisper Magic
Speech-to-text: from frustrating to fantastic. Uncover the magic behind its rapid rise and its connection to the AI boom!