
#automatic-speech-recognition

6 episodes

#2754: Why Your Dictation Setup Might Be Wrong

Modern ASR is shockingly robust. The biggest predictor of accuracy? How well your audio matches its training data.

automatic-speech-recognition, speech-recognition, audio-processing

#2590: How Disfluency Detection Models Clean Up Speech

How transformer models distinguish "um" from meaningful speech — and why removing too much makes you sound like a robot.

speech-recognition, audio-processing, automatic-speech-recognition

#2486: Why Noise Reduction Can Ruin Transcription Accuracy

Cleaning audio before transcription can increase errors by up to 46%. Here's the right approach for your voice app.

speech-recognition, audio-processing, automatic-speech-recognition

#2337: How Speaker Diarization Powers Everything From Call Centers to Courts

Discover how PyAnnote and other tools tackle the critical task of identifying "who spoke when" in audio, and why it's harder than it sounds.

audio-processing, speech-recognition, automatic-speech-recognition

#109: Teaching AI to Hear: Solving the Custom Dictionary Dilemma

Tired of AI mishearing brand names? Learn how to build efficient custom dictionaries for Gemini 1.5 without breaking the bank.

automatic-speech-recognition, custom-dictionaries, gemini-15, context-bloat, dynamic-hint-system

#10: How ASR Went From Frustration To ... Whisper Magic

Speech to text: from frustrating to fantastic. Uncover the magic behind its rapid rise and connection to the AI boom!

automatic-speech-recognition, speech-to-text, asr-technology