Archive: This documents the V4 pipeline implementation. See Pipeline for the current documentation.

Pipeline V4 Documentation

Voice-to-Podcast Automation with AI - Featuring Fish Audio TTS

Last Updated: December 2025

Overview

The My Weird Prompts pipeline transforms voice-recorded prompts into full podcast episodes with AI-generated dialogue, cover art, and automatic publishing. The pipeline uses Fish Audio TTS with pre-trained voice models to create natural conversations between two AI hosts: Corn the Sloth and Herman the Donkey.

Pipeline Phases

Phase 1: Processing

  • Voice Upload: The user's voice prompt is uploaded to the processing queue
  • Transcription: Google Gemini 3 Flash Preview transcribes the prompt and extracts metadata
  • Audio Processing: FFmpeg normalizes and prepares the prompt audio

Technology: Gemini 3 Flash Preview for transcription, FFmpeg for audio processing

Phase 2: Generation

  • Script Generation: AI creates dialogue script between Corn and Herman
  • Cover Art: Flux AI generates unique episode artwork (3 variants)
  • TTS Dialogue: Fish Audio TTS generates voice audio with character personalities

Technology: Gemini for scripting, Flux Schnell for images, Fish Audio for TTS
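
As an illustration of how a generated script could be prepared for per-character TTS, here is a minimal sketch. It assumes the script arrives as plain text with one "Speaker: line" turn per line; that format, the Turn type, and the split_script name are hypothetical and not taken from the V4 code. Each resulting turn could then be sent to the TTS service with the matching host voice.

```python
import re
from typing import List, NamedTuple


class Turn(NamedTuple):
    """One dialogue turn: who speaks and what they say."""
    speaker: str
    text: str


# Assumed script format: each turn starts with "Corn:" or "Herman:".
_TURN_LINE = re.compile(r"^\s*(Corn|Herman)\s*:\s*(.+?)\s*$")


def split_script(script: str) -> List[Turn]:
    """Split a dialogue script into ordered speaker turns."""
    turns: List[Turn] = []
    for line in script.splitlines():
        match = _TURN_LINE.match(line)
        if match:
            turns.append(Turn(match.group(1), match.group(2)))
        elif line.strip() and turns:
            # A non-empty line without a speaker label continues the previous turn.
            last = turns[-1]
            turns[-1] = Turn(last.speaker, last.text + " " + line.strip())
    return turns
```

Keeping the turns in order matters because the audio clips are later concatenated in the same sequence.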

Phase 3: Assembly

  • Combines intro jingle, disclaimer, user prompt, AI dialogue, and outro
  • Loudness normalization to -16 LUFS (podcast standard)
  • MP3 encoding at 192 kbps, 44.1 kHz

Technology: FFmpeg for audio assembly and normalization
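
The assembly step can be sketched as a single FFmpeg invocation built from Python. This is an assumption about how such a step might look, not the actual V4 command: the function name and file names are hypothetical, and the loudnorm true-peak and loudness-range values (TP=-1.5, LRA=11) are illustrative choices that the source does not specify. Only the -16 LUFS target, 192 kbps bitrate, and 44.1 kHz sample rate come from the list above.

```python
from typing import List


def build_assembly_command(segments: List[str], output_path: str) -> List[str]:
    """Build an ffmpeg argv that joins segments, normalizes loudness, and encodes MP3."""
    if not segments:
        raise ValueError("at least one audio segment is required")

    cmd = ["ffmpeg", "-y"]
    for path in segments:
        cmd += ["-i", path]

    # Label every input's audio stream, e.g. "[0:a][1:a][2:a]".
    inputs = "".join(f"[{i}:a]" for i in range(len(segments)))
    # concat joins the inputs end to end (audio only); loudnorm targets -16 LUFS.
    # TP and LRA are assumed values, not taken from the documented pipeline.
    filter_graph = (
        f"{inputs}concat=n={len(segments)}:v=0:a=1[joined];"
        "[joined]loudnorm=I=-16:TP=-1.5:LRA=11[out]"
    )

    cmd += [
        "-filter_complex", filter_graph,
        "-map", "[out]",
        "-c:a", "libmp3lame",
        "-b:a", "192k",
        "-ar", "44100",
        output_path,
    ]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True). Building the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.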

Phase 4: Publishing

  • CDN Upload: Audio and images uploaded to Cloudinary
  • Archive: Full episode backed up to Wasabi S3-compatible storage
  • Database: Metadata inserted into Neon PostgreSQL
  • Blog Post: Markdown file generated for Astro static site

Technology: Cloudinary CDN, Wasabi object storage, Neon PostgreSQL
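
To make the blog post step concrete, the sketch below renders a Markdown file with YAML frontmatter, which is the form Astro reads for Markdown content. The field names (title, description, pubDate, tags, audioUrl) and the function name are assumptions for illustration; the real V4 schema is not given in this document.

```python
import json
from datetime import date
from typing import List


def render_blog_post(
    title: str,
    description: str,
    tags: List[str],
    pub_date: date,
    audio_url: str,
    transcript: str,
) -> str:
    """Render a Markdown post: YAML frontmatter followed by the transcript."""
    # json.dumps produces double-quoted strings and bracketed lists,
    # both of which are also valid YAML, so special characters stay safe.
    lines = [
        "---",
        f"title: {json.dumps(title)}",
        f"description: {json.dumps(description)}",
        f"pubDate: {pub_date.isoformat()}",
        f"tags: {json.dumps(tags)}",
        f"audioUrl: {json.dumps(audio_url)}",  # hypothetical field name
        "---",
        "",
        transcript.strip(),
        "",
    ]
    return "\n".join(lines)
```

Writing the returned string to a file in the site's content directory and committing it is what would then trigger the deployment described in the next phase.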

Phase 5: Deployment

  • Git push triggers automatic Vercel deployment
  • New episode goes live on website within minutes
  • RSS feed automatically updated for podcast apps

Technology: Vercel auto-deploy, Astro static site generator

Technology Stack

AI Services

  • Google Gemini 3 Flash Preview
  • Fish Audio TTS
  • Flux Schnell (via fal.ai)
  • Replicate (backup)

Storage

  • Cloudinary (CDN)
  • Wasabi S3 (Archive)
  • Neon PostgreSQL
  • GitHub (Source)

Deployment

  • Astro (Static Site)
  • Vercel (Hosting)
  • FFmpeg (Audio)
  • Python Pipeline

Episode Output

For each episode, the pipeline creates:

Final Audio: MP3 file with full podcast episode
Cover Art: 3 AI-generated cover image variants
Metadata: Title, description, tags, timestamps
Transcript: Full dialogue script
Blog Post: Markdown file for website

Cost Estimate

Service           | Cost per Episode | Notes
Fish Audio TTS    | ~$0.30-0.40      | 15-minute episode
Image Generation  | ~$0.01-0.05      | 3 cover variants
Transcription     | Minimal          | Free tier
Storage           | ~$0.01           | Wasabi + Cloudinary
Total per Episode | ~$0.35-0.50      | Approximate
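
As a quick arithmetic check on the estimate, the per-service ranges can be summed directly. This sketch assumes the "Minimal" transcription cost is $0 on the free tier; the dictionary and function names are illustrative.

```python
from typing import Dict, Tuple

# (low, high) cost in USD per episode, taken from the table above.
COSTS: Dict[str, Tuple[float, float]] = {
    "Fish Audio TTS": (0.30, 0.40),
    "Image Generation": (0.01, 0.05),
    "Transcription": (0.00, 0.00),  # assumed $0 on the free tier
    "Storage": (0.01, 0.01),
}


def total_range(costs: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Sum the low and high ends of every service's cost range."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    # Round to cents to hide floating-point noise.
    return round(low, 2), round(high, 2)


print(total_range(COSTS))  # (0.32, 0.46)
```

Summing the itemized rows gives roughly $0.32 to $0.46, slightly under the stated ~$0.35-0.50 total, which the table marks as approximate.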

Key Features

  • Voice Cloning: Fish Audio TTS creates natural-sounding AI hosts with distinct personalities
  • AI Art Generation: Unique cover artwork for every episode using Flux AI
  • Fully Automated: Voice prompt to published episode in minutes
  • Production Quality: Professional audio normalization and podcast standards

Documentation

The pipeline process is fully documented. View the current technical documentation to learn how the pipeline works.