#audio-processing

38 episodes

#3605: Can You Retain Audio While Doing Dishes?

Does folding laundry while listening to a podcast help or hurt retention? The science is surprisingly clear.

audio-processing productivity neuroplasticity

#3097: Measuring Car Horns: Phone Apps vs. Court Evidence

Can a phone spectrogram app prove which car honked? Usually not — here's what you actually need.

audio-engineering signal-processing audio-processing

#2914: Can AI Read the Room? TTS Prosody Explained

Can TTS models truly infer emotion from text, or just mimic patterns? We break down the science of prosody.

text-to-speech speech-to-speech audio-processing

#2886: How Acoustic Cameras Catch Honking Drivers

Can an acoustic camera pinpoint one honk in a traffic jam? The tech is real, and fines are being issued.

audio-processing signal-processing urban-planning

#2754: Why Your Dictation Setup Might Be Wrong

Modern ASR is shockingly robust. The biggest predictor of accuracy? How well your audio matches its training data.

automatic-speech-recognition speech-recognition audio-processing

#2726: Radio Listening vs Podcast Guilt

Why does podcast listening feel different from radio? A deep dive into attention, multitasking, and the psychology of audio.

productivity audio-processing appointment-listening

#2643: How Stenographers Type 300 Words Per Minute

Court reporters don’t type letters—they chord syllables at 300 words per minute. Here’s how it works and why AI can’t replace them yet.

speech-recognition audio-processing accessibility

#2618: Text Normalization's Hidden Complexity

How to handle acronyms in text-to-speech pipelines using BERT models, lexicons, and layered preprocessing.

text-to-speech speech-recognition audio-processing

#2591: Decoupling Script from Voice

How dynamic voice replacement could let listeners choose who narrates each host's lines.

voice-cloning text-to-speech audio-processing

#2590: The Uncanny Valley of Clean Speech

How transformer models distinguish "um" from meaningful speech — and why removing too much makes you sound like a robot.

speech-recognition audio-processing automatic-speech-recognition

#2582: What Your Browser Does to Mic Audio Before It Reaches Your Server

getUserMedia returns audio, but not raw audio. Here's what browsers actually do to your mic feed before it hits your server.

audio-processing speech-recognition browser-audio-pipeline

#2563: How Audio Fingerprinting Actually Works

Spectrogram peaks, constellation maps, and hash matching — the elegant mechanics behind identifying any song in seconds.

audio-processing signal-processing speech-recognition

#2512: How Speech-to-Speech Models Eliminate the Robot Voice

Why AI voice agents sound robotic, and how natively integrated speech-to-speech models fix it.

speech-to-speech audio-processing latency

#2498: Build Your First Python Program in 7 Lines

We coach a complete beginner through building a working Python game using only voice—no screenshare, no diagrams.

software-development productivity audio-processing

#2486: Why Noise Reduction Can Ruin Transcription Accuracy

Cleaning audio before transcription can increase errors by up to 46%. Here's the right approach for your voice app.

speech-recognition audio-processing automatic-speech-recognition

#2443: How Podcast RSS Feeds Can Speak Every Language

One RSS feed, a transcript tag, and TTS voice cloning — the emerging standard for letting any podcast speak any language.

speech-recognition voice-cloning audio-processing

#2337: When Diarization Fails Silently

Discover how PyAnnote and other tools tackle the critical task of identifying "who spoke when" in audio—and why it’s harder than it sounds.

audio-processing speech-recognition automatic-speech-recognition

#2288: The Invisible Gatekeeper of Voice Tech

How voice activity detection shapes every step of the voice tech pipeline, and why it’s harder than it seems.

speech-recognition audio-processing edge-computing

#2272: The AI Transcription Sweet Spot

Does higher-quality audio make AI transcription worse? New research reveals a surprising "sweet spot" for bitrate, challenging a core assumption of...

speech-recognition audio-processing ai-training

#2095: Bluetooth Finally Beats Wi-Fi for Whole-House Audio

Wi-Fi audio sync is a mess. A new Bluetooth standard called Auracast fixes it with simple, seamless broadcasting.

wireless audio-processing home-network

#2056: Music as Language: The Architecture Behind AI Song Generation

A look at how AI music models use audio tokens, transformers, and diffusion to turn text into songs.

audio-processing transformers generative-ai

#1917: Herman's Music Hour Vol. 2: Seder Remixes for Passover 5786

Herman presents AI-generated covers of classic Passover Seder songs, produced in Suno — the second installment of Herman's Music Hour.

generative-ai audio-processing cultural-bias

#1904: The Hidden Math Behind Your Blocky Photos

Why are blocky sky artifacts still haunting your photos in 2026? We break down the math behind JPEG, WebP, AVIF, and the new JPEG XL.

image-generation audio-processing hardware-engineering

#1854: The Conductor as a CPU

A conductor isn't just a timekeeper; they're a CPU for the orchestra, using high-bandwidth non-verbal signals to unify 80 musicians.

audio-processing human-computer-interaction ergonomics