An Experiment in Progress
My Weird Prompts is more than a podcast. It's a longitudinal record of AI-generated media — how language models write, how text-to-speech renders, and how both evolve over time.
The Research Angle
Every episode is produced by the same automated pipeline: a human records a short voice prompt, and AI handles everything else — transcription, script generation, fact-checking, voice synthesis, and publication. The entire process is documented, versioned, and archived.
Because the pipeline uses the same underlying architecture across hundreds of episodes, the archive functions as a natural experiment. The independent variable is time — and with it, the evolving capabilities of the AI models powering each stage. The dependent variables are the outputs: script quality, factual accuracy, voice naturalness, conversational flow, and production coherence.
The show launched in late 2025 and has produced over 1,000 episodes as of March 2026. Every episode includes structured metadata recording exactly which models, versions, and pipeline configurations were used — making it possible to trace changes in output quality back to specific model updates.
The Zenodo Archive
CERN-backed open-access repository with DOIs for every episode
Every episode is archived in the My Weird Prompts Zenodo community, providing permanent, citable records under a CC-BY-4.0 license. Zenodo is hosted by CERN and designed for long-term preservation of research data.
What's Included Per Episode
Episode Audio
The complete episode as an M4A file — the full AI-generated dialogue with intro, transitions, and outro.
Full Transcript
Plain text combining the original human voice prompt and the complete AI-generated script with speaker labels.
Production Metadata
Structured JSON with episode tags, categories, pipeline version, LLM model, TTS engine, duration, and links.
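As a sketch of what working with such a record looks like — the field names and values below are illustrative assumptions, not the actual MWP schema:

```python
import json

# Hypothetical metadata record. Field names mirror the categories the
# archive describes (tags, pipeline version, models, duration, links),
# but are NOT the real schema.
record = json.loads("""
{
  "episode": 412,
  "tags": ["science", "biology"],
  "category": "science",
  "pipeline_version": "2.3.0",
  "llm_model": "gemini-flash",
  "tts_engine": "example-tts-v2",
  "duration_seconds": 1620,
  "links": {"zenodo": "https://zenodo.org"}
}
""")

# Grouping episodes by llm_model or pipeline_version is what makes
# longitudinal comparisons possible.
print(record["llm_model"], record["duration_seconds"])
```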
What the Data Tracks
The metadata preserved with each episode makes it possible to study, across several dimensions, how AI-generated media changes over time.
LLM Evolution
Every episode records which Gemini model version generated the script. As Google updates Flash and other models, the archive captures how script quality, factual grounding, conversational naturalness, and creative range shift.
TTS Quality
The audio files themselves are a record of text-to-speech evolution. Combined with the transcript, researchers can study pronunciation accuracy, prosody, emotional range, and hallucination rates across engine versions.
Pipeline Architecture
The pipeline version field tracks structural changes — when new editing passes were added, when parallel TTS was introduced, when safety checks changed. Each version represents a different approach to automated production.
Topic Coverage
With structured tags, categories, and full-text transcripts across 1,000+ episodes, the archive is a broad-spectrum sample of what one person asked an AI to discuss over months of daily use.
Potential Research Applications
The dataset is openly available for academic and independent research. Here are some directions we think would be particularly interesting.
Longitudinal LLM Benchmarking
Use the transcripts to measure how Gemini's outputs change over model updates — vocabulary diversity, sentence complexity, factual accuracy, and reasoning depth across hundreds of episodes produced weeks or months apart.
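One simple starting point is lexical diversity per transcript, tracked by date. This is a minimal sketch with toy stand-in transcripts, not the real data:

```python
import re

def type_token_ratio(text: str) -> float:
    """Crude lexical-diversity measure: unique words / total words."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0

# Toy transcripts standing in for episodes produced months apart.
episodes = {
    "2025-11-01": "the model talks about the weather and the weather again",
    "2026-02-01": "the model compares orbital mechanics with tidal resonance effects",
}
for date, transcript in sorted(episodes.items()):
    print(date, round(type_token_ratio(transcript), 2))
```

Plotting this ratio against the `llm_model` field in each episode's metadata would show whether vocabulary diversity shifts with model updates.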
TTS Naturalness & Hallucination Analysis
Compare audio against transcripts to study TTS failure modes — mispronunciations, inserted words, repeated phrases, or tonal artifacts. Track whether these improve as TTS engines are updated.
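One way to surface these failure modes is a word-level alignment between the written script and an ASR pass over the audio: inserted words suggest TTS hallucinations, deleted words suggest dropped text. A minimal sketch using Python's standard difflib, with toy data:

```python
import difflib

def alignment_errors(script: str, heard: str):
    """Word-level diff between the written script and what an ASR
    transcription heard in the audio. Returns the non-matching spans."""
    s, h = script.lower().split(), heard.lower().split()
    ops = difflib.SequenceMatcher(a=s, b=h).get_opcodes()
    return [(tag, s[i1:i2], h[j1:j2])
            for tag, i1, i2, j1, j2 in ops if tag != "equal"]

# Toy example: the engine repeated a word (illustrative, not real data).
errors = alignment_errors(
    "welcome back to the show today we discuss prompts",
    "welcome back back to the show today we discuss prompts",
)
print(errors)
```

Aggregating insertion and deletion counts per `tts_engine` version would give a rough hallucination rate over time.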
AI Fact-Checking Effectiveness
The pipeline includes two automated fact-checking passes using Google Search grounding. Evaluate how often factual errors survive the review process and whether accuracy improves over time.
Prompt-to-Output Analysis
Each episode pairs a short human voice prompt (1-3 minutes) with a long AI-generated script (15-40 minutes). Study how models interpret, expand, and sometimes drift from the original intent.
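Two crude but easy starting metrics are the expansion ratio (script words per prompt word) and prompt-vocabulary coverage (how much of the prompt's wording survives into the script). A sketch with toy strings:

```python
def expansion_ratio(prompt: str, script: str) -> float:
    """Script words produced per prompt word."""
    return len(script.split()) / max(len(prompt.split()), 1)

def prompt_coverage(prompt: str, script: str) -> float:
    """Fraction of the prompt's vocabulary that reappears in the script --
    a rough proxy for how closely the script tracks the original intent."""
    p = set(prompt.lower().split())
    s = set(script.lower().split())
    return len(p & s) / len(p) if p else 0.0

prompt = "tell me about octopus camouflage"
script = "today we dig into octopus camouflage and how skin cells shift color"
print(expansion_ratio(prompt, script), prompt_coverage(prompt, script))
```

Low coverage on a long script is one signal of topic drift worth inspecting by hand.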
AI-Generated Dialogue Structure
Analyze conversational patterns in AI-written multi-speaker dialogue — turn-taking conventions, topic transitions, humor attempts, and how well the model maintains distinct character voices.
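Turn-taking statistics fall out of the speaker labels directly. This sketch assumes a `Name: utterance` line format — the archive's actual label convention may differ:

```python
from collections import Counter

def turn_stats(transcript: str):
    """Turns and average words per turn, per speaker.
    Assumes 'Name: utterance' lines; the real label format may differ."""
    turns, words = Counter(), Counter()
    for line in transcript.splitlines():
        if ":" in line:
            speaker, text = line.split(":", 1)
            turns[speaker.strip()] += 1
            words[speaker.strip()] += len(text.split())
    return {s: (turns[s], words[s] / turns[s]) for s in turns}

sample = (
    "Host: Welcome back to the show.\n"
    "Guest: Thanks, happy to be here.\n"
    "Host: Today we have a weird one."
)
print(turn_stats(sample))
```

Comparing turn-length distributions across pipeline versions would show whether the dialogue's balance between speakers changes as the models evolve.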
Corpus Linguistics & Content Analysis
With 1,000+ full-length transcripts spanning technology, politics, science, culture, and philosophy, the archive is a substantial corpus for studying AI-generated language at scale.
Additional Open Data
Hugging Face Dataset
Structured episode dataset with metadata, transcripts, and embeddings — ready for ML workflows.
Source Code
The complete pipeline is open source — from voice recording app to generation pipeline to website.
Technical White Paper
Full documentation of the pipeline architecture, cost analysis, safety mechanisms, and lessons learned.
Using This Data?
If you're working with the MWP dataset in a research or educational context, we'd love to hear about it. The archive is CC-BY-4.0 licensed — cite as you would any open dataset.