Speech-to-Text
Whisper, ASR, transcription, voice typing
14 episodes
Breaking the Voice Wall: The Future of Native Speech AI
Explore why native speech-to-speech AI is 20x more expensive than text pipelines and how "semantic VAD" is solving the awkward silence problem.
Teaching AI to Hear: Solving the Custom Dictionary Dilemma
Tired of AI mishearing brand names? Learn how to build efficient custom dictionaries for Gemini 1.5 without breaking the bank.
Unsung Hero: The Gooseneck Mic's AI Power
The gooseneck mic: a humble hero with surprising AI power. Discover its secret to crystal-clear speech-to-text accuracy!
From Lawyers in Limousines to Developers in Their PJs: The Voice Tech Revolution
From limo-riding lawyers to pajama-clad coders, voice tech is booming. Discover how AI is making it a force for good.
The Multimodal Audio Revolution: A Screen-Free Future?
Is multimodal audio the future? We explore if AI can truly displace traditional speech-to-text for a screen-free world.
Personalizing Whisper: The Voice Typing Revolution
Voice typing is changing everything. Join us as we explore the revolution of personalizing Whisper!
Mic Check: Mastering AI Dictation Hardware
Uncover the secrets to perfect AI dictation! Corn and Herman explore the ultimate speech-to-text hardware.
Building Custom ASR Tools
Ever wondered how to build your own ASR tools from scratch? Discover the why and how in this episode!
How To Fine Tune Whisper
Build your own AI transcription tool! We'll walk you through fine-tuning Whisper, from data to notebook.
Benchmarking Custom ASR Tools - Beyond The WER
Benchmarking custom ASR fine-tunes: We're diving deep beyond the WER to truly measure performance.
Fine-Tuning ASR For Maximal Usability
Fine-tuned ASR is just the start. Discover the next steps for deployment and maximizing usability.
How ASR Went From Frustration To ... Whisper Magic
Speech to text: from frustrating to fantastic. Uncover the magic behind its rapid rise and connection to the AI boom!
Safetensors or something else: STT inference formats explained
Unpacking ASR weight formats: Safetensors and beyond. Tune in to understand the distinctions.
Building Your Own Whisper
Ever wondered if you could build your own speech recognition tool? We dive deep into crafting custom ASR.