Audio & Speech

Speech recognition, TTS, voice cloning, audio engineering

18 episodes

The technology of voice and sound. From text-to-speech systems and voice cloning to speech recognition and audio engineering, this channel covers the cutting edge of how machines learn to speak, listen, and sound convincingly human.

#1752: Whisper Small Beats Whisper Large in Speed & Accuracy

A 4-GPU benchmark on Ubuntu shows the 1.5B-parameter Whisper Large is slower and less accurate than the much smaller Whisper Small.

speech-recognition, gpu-acceleration, latency

#1724: YouTube's Invisible AI Dubbing Machine

How does YouTube translate a video with one click? We explore the tech behind auto-dubbing, from sandwich models to voice cloning.

speech-to-speech, voice-cloning, multimodal-ai

#1555: Beyond Whisper: NVIDIA’s Real-Time Speech Revolution

Move over Whisper. NVIDIA's new models offer 10x speed increases and better accuracy for real-time speech-to-text.

#1218: Why Your Phone Still Can't Keep Up With Your Voice

Why does voice typing feel so clunky compared to recording a memo? We explore the technical hurdles of real-time AI transcription.

#947: Pro Audio in Acoustic Nightmares: Mobile Recording Tips

Learn how to turn a marble-floored room into a studio using your phone, simple blankets, and the right USB-C gear.

audio-engineering, mobile-recording, acoustic-treatment

#868: Beyond the Digital Sandwich: Pro Mobile Mics for AI

Stop holding your phone like a piece of toast. Explore the best mobile microphone setups for high-quality AI voice transcription.

telecommunications, audio-engineering, speech-recognition

#732: Mastering Your Sound: AI EQ and the Perfect Vocal Chain

Use AI to find your perfect EQ profile and build a pro vocal chain. Fix nasality, master de-essing, and sound your best on any device.

audio-engineering, audio-processing, audio-quality, computational-audio

#727: The Math of Immersion: How 360-Degree Sound Actually Works

Learn how object-based audio and clever math trick your brain into hearing 360-degree sound from even the smallest mobile devices.

sensory-processing, spatial-audio, computational-audio

#725: The Science of Sound: Choosing the Best Podcast Speaker

Stop listening to podcasts through tinny speakers. Learn how to choose hardware optimized for the human voice and clear, room-filling audio.

smart-home, audio-engineering, computational-audio

#720: Why Your Ears Prefer Imperfect Plastic to Perfect Pixels

Why do we still buy plastic discs in an age of neural-link streaming? Explore the science of analog warmth and the "ritual" of the record.

sensory-processing, analog-audio, digital-compression

#660: The Bit Rate Dilemma: How Much Audio Data Do You Need?

Herman and Corn explore the science of audio compression, psychoacoustics, and finding the perfect bit rate for podcasts and AI.

audio-processing, data-integrity, psychoacoustics

#647: From Bits to Beats: The Science of Digital-to-Analog

Why does digital data need to become analog? Explore the physics of sound and the critical role of the DAC in modern audio engineering.

audio-engineering, signal-processing, digital-to-analog

#598: Audio Engineering as Prompt Engineering: Better Sound, Better AI

Can better audio quality actually make an AI smarter? Discover how audio post-production functions as a new form of prompt engineering.

prompt-engineering, large-language-models, audio-engineering

#233: The Sound Spotlight: How Beamforming Redefines Audio

Discover how math and physics turn simple microphones into "sound spotlights" that can isolate a single voice in even the noisiest environments.

beamforming-technology, microphone-arrays, digital-signal-processing

#196: Beyond the Robot: The Science of Modern Voice Cloning

Herman and Corn dive into the mechanics of neural text-to-speech, exploring how AI masters human prosody and the "average voice" accent problem.

neural-text-to-speech, voice-cloning, generative-modeling

#99: Beyond the Headset: Pro Audio for AI Voice Control

Tired of headsets? Herman and Corn explore professional microphone setups for seamless, high-accuracy AI voice dictation from a distance.

voice-dictation, ai-accuracy, microphones, audio-quality, signal-to-noise-ratio

#58: Clean Audio, Messy Reality: Noise Removal for Voice-to-Text

Fussy baby, clean audio? We dive into noise removal for voice-to-text. Discover why cleaner audio can transcribe worse.

noise-removal, voice-to-text, audio-processing, signal-processing, real-time-audio

#57: From Lawyers in Limousines to Developers in Their PJs: The Voice Tech Revolution

From limo-riding lawyers to pajama-clad coders, voice tech is booming. Discover how AI is making it a force for good.

voice-technology, accessibility, productivity