#automatic-speech-recognition

6 episodes

Modern ASR is shockingly robust. The biggest predictor of accuracy? How well your audio matches the model's training data.

How transformer models distinguish "um" from meaningful speech — and why removing too much makes you sound like a robot.

Cleaning audio before transcription can increase errors by up to 46%. Here's the right approach for your voice app.

Discover how PyAnnote and other tools tackle the critical task of identifying "who spoke when" in audio—and why it’s harder than it sounds.

Tired of AI mishearing brand names? Learn how to build efficient custom dictionaries for Gemini 1.5 without breaking the bank.

Speech to text: from frustrating to fantastic. Uncover the magic behind its rapid rise and connection to the AI boom!
