An Experiment in Progress
My Weird Prompts is more than a podcast. It's a longitudinal record of AI-generated media — how language models write, how text-to-speech renders, and how both evolve over time.
The Research Angle
Every episode is produced by the same automated pipeline: a human records a short voice prompt, and AI handles everything else — transcription, script generation, fact-checking, voice synthesis, and publication. The entire process is documented, versioned, and archived.
Because the pipeline uses the same underlying architecture across hundreds of episodes, the archive functions as a natural experiment. The independent variable is time — and with it, the evolving capabilities of the AI models powering each stage. The dependent variables are the outputs: script quality, factual accuracy, voice naturalness, conversational flow, and production coherence.
The show launched in late 2025 and has produced over 1,000 episodes as of March 2026. Every episode includes structured metadata recording exactly which models, versions, and pipeline configurations were used — making it possible to trace changes in output quality back to specific model updates.
The Zenodo Archive
CERN-backed open-access repository with DOIs for every episode
Every episode is archived in the My Weird Prompts Zenodo community, providing permanent, citable records under a CC-BY-4.0 license. Zenodo is hosted by CERN and designed for long-term preservation of research data.
What's Included Per Episode
Episode Audio
The complete episode as an M4A file — the full AI-generated dialogue with intro, transitions, and outro.
Full Transcript
Plain text combining the original human voice prompt and the complete AI-generated script with speaker labels.
Production Metadata
Structured JSON with episode tags, categories, pipeline version, LLM model, TTS engine, duration, and links.
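As a sketch of what working with such a record looks like — the field names and values below are illustrative assumptions, not the actual MWP schema:

```python
import json

# Hypothetical metadata record. Field names mirror the categories the
# archive describes (tags, pipeline version, models, duration, links),
# but are NOT the real schema.
record = json.loads("""
{
  "episode": 412,
  "tags": ["science", "biology"],
  "category": "science",
  "pipeline_version": "2.3.0",
  "llm_model": "gemini-flash",
  "tts_engine": "example-tts-v2",
  "duration_seconds": 1620,
  "links": {"zenodo": "https://zenodo.org"}
}
""")

# Grouping episodes by llm_model or pipeline_version is what makes
# longitudinal comparisons possible.
print(record["llm_model"], record["duration_seconds"])
```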
What the Data Tracks
The metadata preserved with each episode makes it possible to study, across several dimensions, how AI-generated media changes over time.
LLM Evolution
Every episode records which Gemini model version generated the script. As Google updates Flash and other models, the archive captures how script quality, factual grounding, conversational naturalness, and creative range shift.
TTS Quality
The audio files themselves are a record of text-to-speech evolution. Combined with the transcript, researchers can study pronunciation accuracy, prosody, emotional range, and hallucination rates across engine versions.
Pipeline Architecture
The pipeline version field tracks structural changes — when new editing passes were added, when parallel TTS was introduced, when safety checks changed. Each version represents a different approach to automated production.
Topic Coverage
With structured tags, categories, and full-text transcripts across 1,000+ episodes, the archive is a broad-spectrum sample of what one person asked an AI to discuss over months of daily use.
Potential Research Applications
The dataset is openly available for academic and independent research. Here are some directions we think would be particularly interesting.
Longitudinal LLM Benchmarking
Use the transcripts to measure how Gemini's outputs change over model updates — vocabulary diversity, sentence complexity, factual accuracy, and reasoning depth across hundreds of episodes produced weeks or months apart.
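One simple starting point is lexical diversity per transcript, tracked by date. This is a minimal sketch with toy stand-in transcripts, not the real data:

```python
import re

def type_token_ratio(text: str) -> float:
    """Crude lexical-diversity measure: unique words / total words."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0

# Toy transcripts standing in for episodes produced months apart.
episodes = {
    "2025-11-01": "the model talks about the weather and the weather again",
    "2026-02-01": "the model compares orbital mechanics with tidal resonance effects",
}
for date, transcript in sorted(episodes.items()):
    print(date, round(type_token_ratio(transcript), 2))
```

Plotting this ratio against the `llm_model` field in each episode's metadata would show whether vocabulary diversity shifts with model updates.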
TTS Naturalness & Hallucination Analysis
Compare audio against transcripts to study TTS failure modes — mispronunciations, inserted words, repeated phrases, or tonal artifacts. Track whether these improve as TTS engines are updated.
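One way to surface these failure modes is a word-level alignment between the written script and an ASR pass over the audio: inserted words suggest TTS hallucinations, deleted words suggest dropped text. A minimal sketch using Python's standard difflib, with toy data:

```python
import difflib

def alignment_errors(script: str, heard: str):
    """Word-level diff between the written script and what an ASR
    transcription heard in the audio. Returns the non-matching spans."""
    s, h = script.lower().split(), heard.lower().split()
    ops = difflib.SequenceMatcher(a=s, b=h).get_opcodes()
    return [(tag, s[i1:i2], h[j1:j2])
            for tag, i1, i2, j1, j2 in ops if tag != "equal"]

# Toy example: the engine repeated a word (illustrative, not real data).
errors = alignment_errors(
    "welcome back to the show today we discuss prompts",
    "welcome back back to the show today we discuss prompts",
)
print(errors)
```

Aggregating insertion and deletion counts per `tts_engine` version would give a rough hallucination rate over time.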
AI Fact-Checking Effectiveness
The pipeline includes two automated fact-checking passes using Google Search grounding. Evaluate how often factual errors survive the review process and whether accuracy improves over time.
Prompt-to-Output Analysis
Each episode pairs a short human voice prompt (1-3 minutes) with a long AI-generated script (15-40 minutes). Study how models interpret, expand, and sometimes drift from the original intent.
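Two crude but easy starting metrics are the expansion ratio (script words per prompt word) and prompt-vocabulary coverage (how much of the prompt's wording survives into the script). A sketch with toy strings:

```python
def expansion_ratio(prompt: str, script: str) -> float:
    """Script words produced per prompt word."""
    return len(script.split()) / max(len(prompt.split()), 1)

def prompt_coverage(prompt: str, script: str) -> float:
    """Fraction of the prompt's vocabulary that reappears in the script --
    a rough proxy for how closely the script tracks the original intent."""
    p = set(prompt.lower().split())
    s = set(script.lower().split())
    return len(p & s) / len(p) if p else 0.0

prompt = "tell me about octopus camouflage"
script = "today we dig into octopus camouflage and how skin cells shift color"
print(expansion_ratio(prompt, script), prompt_coverage(prompt, script))
```

Low coverage on a long script is one signal of topic drift worth inspecting by hand.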
AI-Generated Dialogue Structure
Analyze conversational patterns in AI-written multi-speaker dialogue — turn-taking conventions, topic transitions, humor attempts, and how well the model maintains distinct character voices.
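Turn-taking statistics fall out of the speaker labels directly. This sketch assumes a `Name: utterance` line format — the archive's actual label convention may differ:

```python
from collections import Counter

def turn_stats(transcript: str):
    """Turns and average words per turn, per speaker.
    Assumes 'Name: utterance' lines; the real label format may differ."""
    turns, words = Counter(), Counter()
    for line in transcript.splitlines():
        if ":" in line:
            speaker, text = line.split(":", 1)
            turns[speaker.strip()] += 1
            words[speaker.strip()] += len(text.split())
    return {s: (turns[s], words[s] / turns[s]) for s in turns}

sample = (
    "Host: Welcome back to the show.\n"
    "Guest: Thanks, happy to be here.\n"
    "Host: Today we have a weird one."
)
print(turn_stats(sample))
```

Comparing turn-length distributions across pipeline versions would show whether the dialogue's balance between speakers changes as the models evolve.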
Corpus Linguistics & Content Analysis
With 1,000+ full-length transcripts spanning technology, politics, science, culture, and philosophy, the archive is a substantial corpus for studying AI-generated language at scale.
Additional Open Data
Hugging Face Dataset
Structured episode dataset with metadata, transcripts, and embeddings — ready for ML workflows.
Source Code
The complete pipeline is open source — from voice recording app to generation pipeline to website.
Technical White Paper
Full documentation of the pipeline architecture, cost analysis, safety mechanisms, and lessons learned.
Using This Data?
If you're working with the MWP dataset in a research or educational context, we'd love to hear about it. The archive is CC-BY-4.0 licensed — cite as you would any open dataset.