#2603: Building Agent Skills for Creative Workflows

How composable AI agent skills turn tedious media tasks into one-instruction operations for creatives.

Episode Details
Episode ID
MWP-2762
Published
Duration
43:13
Audio
Direct link
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
deepseek-v4-pro

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The Pre-Editor: How Composable Agent Skills Are Reshaping Creative Workflows

The Agentic Workspace

When Daniel refactored his utilities into Claude Code plugins, he wasn't just reorganizing code. He was defining a new concept: the agentic workspace. Instead of thinking of a computer as a place where you do things, he treats it as a place where an agent does things — and he's the one mapping the terrain.

The key architectural insight is separating user data stores from plugins. A plugin ships knowing it needs to ask where to store data. That sounds trivial, but it's the difference between a script that works once on your machine and distributable software. For creatives coming to this fresh, that separation prevents catastrophe — your audio editing plugin shouldn't overwrite your project files because it assumed a hard-coded path.
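That ask-and-remember behavior is small enough to sketch. The config path, function, and prompt below are illustrative inventions, not Daniel's actual plugin code; the point is that the preference lives outside the plugin, so updates never clobber it:

```python
import json
from pathlib import Path

# Hypothetical config location, kept outside the plugin's own directory
CONFIG_FILE = Path.home() / ".config" / "media-skills" / "settings.json"

def resolve_data_dir(ask=input, config_file=CONFIG_FILE):
    """Return the user's chosen data directory, prompting only on first run."""
    if config_file.exists():
        return Path(json.loads(config_file.read_text())["data_dir"])
    answer = ask("Where should this plugin store its data? ").strip()
    config_file.parent.mkdir(parents=True, exist_ok=True)
    config_file.write_text(json.dumps({"data_dir": answer}))
    return Path(answer)
```

Because the remembered answer is a plain JSON file in the user's config directory, reinstalling or updating the plugin leaves it untouched.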

The Pre-Editor Pattern

Daniel's media clip organizer for phone video illustrates the most powerful pattern here. The problem: frame rate inconsistency, mixed orientations, accidental clips under three seconds. His solution: a skill that buckets everything automatically. He's not building a video editor. He's building the pre-editor — the thing that does the tedious classification work that used to consume an hour before creative work could begin.
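The classification logic such a skill needs is tiny. A hypothetical sketch, assuming clip metadata has already been extracted by a probe tool like ffprobe or MediaInfo and arrives as plain dicts:

```python
def bucket_clip(clip, min_seconds=3.0):
    """Classify one clip by orientation, flagging accidental short recordings."""
    if clip["duration"] < min_seconds:
        return "accidental"  # likely a pocket recording
    if clip["width"] > clip["height"]:
        return "landscape"
    if clip["width"] < clip["height"]:
        return "portrait"
    return "square"

def organize(clips):
    """Group clips into buckets by the classification above."""
    buckets = {}
    for clip in clips:
        buckets.setdefault(bucket_clip(clip), []).append(clip["name"])
    return buckets
```

The agent's job is everything around this: probing the files, running the bucketing, then moving files into the matching subfolders.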

This is the Unix philosophy applied to creative tools. FFmpeg, ImageMagick, SoX, MediaInfo — these have existed for decades, but they required command-line knowledge. Now the agent knows the command line. You just describe the outcome.

Audio Production as Killer App

Podcast production is the ideal use case. The number of discrete, repeatable, tedious tasks is enormous: noise reduction, level normalization, silence truncation, intro/outro music, chapter markers, loudness compliance. Most independent podcasters have never heard of the LUFS loudness standard (-16 LUFS for stereo podcasts), which is why their episodes sound quiet next to professional productions. An agent skill handles it.
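A loudness skill like this can be a thin wrapper over FFmpeg's loudnorm filter. The I/TP/LRA parameters are real loudnorm options; the wrapper function and its defaults are an illustrative sketch:

```python
def loudnorm_cmd(src, dst, target_lufs=-16.0, true_peak=-1.5, lra=11.0):
    """Build an FFmpeg command that normalizes audio toward a LUFS target."""
    filt = f"loudnorm=I={target_lufs}:TP={true_peak}:LRA={lra}"
    return ["ffmpeg", "-i", src, "-af", filt, dst]
```

A more careful skill would do loudnorm's two-pass mode (measure first, then apply), but even the single-pass form gets an episode into the right neighborhood.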

The module pattern is crucial. You don't need one podcast plugin that does everything. You need a silence truncation skill, a loudness normalization skill, a metadata tagging skill — and you compose them. Maybe one episode needs aggressive noise reduction; the next doesn't. The agent can analyze the audio and decide, or you can specify.

Where Automation Ends and Artistry Begins

Music production is trickier because creative decisions are more subjective. You can't tell an agent "make this mix sound good." But you can tell it to check every track for phase correlation issues, identify frequency masking between bass and kick drum, or normalize all vocal takes to the same perceived loudness before comping. These are mechanical problems requiring technical knowledge, not creative judgment.

Vocal comping is a perfect example. The creative part is choosing which take sounds best. The mechanical part — aligning timing and pitch so splices are invisible — is entirely automatable with existing tools like SoX.

The Line Between Skill and Decision

For image editing, the line becomes clearer: agent skills handle everything up to the decision point. Aesthetic sorting? The skill runs sharpness analysis, checks for motion blur, evaluates exposure — then presents technically competent photos and says "here are the ones in focus and properly exposed, you choose which you like." It's not replacing your eye. It's saving your eye from looking at 200 blurry shots of your thumb.
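One common focus metric behind that kind of triage is the variance of a Laplacian response: sharp images have strong local intensity changes, blurry ones don't. A real skill would use OpenCV on actual pixels; this pure-Python sketch works on a grayscale image given as a list of rows (at least 3x3), and the threshold is an illustrative placeholder:

```python
def laplacian_variance(img):
    """Focus metric: variance of a 4-neighbor Laplacian over interior pixels."""
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y-1][x] + img[y+1][x] + img[y][x-1] + img[y][x+1]
                   - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def triage(images, threshold=10.0):
    """Split named images into (sharp_enough, needs_review) by focus metric."""
    sharp, blurry = [], []
    for name, img in images.items():
        (sharp if laplacian_variance(img) >= threshold else blurry).append(name)
    return sharp, blurry
```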

Extracting Tacit Knowledge

Perhaps the most valuable takeaway: describing your process to an agent forces you to articulate what you actually do. Most creatives have never formally documented their own workflow. This is extracting tacit knowledge — the stuff you know how to do but can't easily explain. Forcing yourself to explain it to an agent surfaces all the edge cases and judgment calls you make without thinking.

Once extracted, it's reproducible. Daniel can now spin up a new podcast project and have the agent handle mechanical setup in seconds. More importantly, he can share that extracted knowledge. His plugins are open source — someone else can pick up his workflow and adapt it.


#2603: Building Agent Skills for Creative Workflows

Corn
Daniel sent us this one — and he's just gone through a massive refactoring sprint, turning all his little utilities and MCP servers and system prompts into Claude Code plugins. The actual question he's asking is: this pattern of bundling media tools into agent skills, using well-documented external CLIs and APIs, how can creatives across different media types use this most effectively? And he wants us to brainstorm with him. I've got thoughts.
Herman
Oh, I've got more than thoughts. I've been playing with exactly this. And before we dive in — fun fact, DeepSeek V4 Pro is writing our script today. So if the jokes land, credit the model.
Corn
If they don't, blame the model.
Herman
But Daniel's prompt — he's really describing something that I think a lot of people are circling but haven't articulated yet. He's not just building plugins. He's defining what an agentic workspace actually is. The key line in his prompt, for me, was when he said he's defining a part of his computer called the workspace, and he's doing it mostly for agentic AI. That's the conceptual leap. Most people still think of their computer as a place where they do things. He's defining it as a place where an agent does things, and he's the one telling the agent what the terrain looks like.
Corn
And the plugin structure forces a discipline that's easy to miss if you're just casually using Claude Code. He's separating the user data store from the plugin. The plugin ships knowing it needs to ask where to store data. That sounds trivial, but it's the difference between something that works once on your machine and something that's actually distributable.
Herman
It's the difference between a script and software. For creatives coming to this fresh, that separation is the thing that prevents catastrophe. You don't want your audio editing plugin writing to some hard-coded path and then overwriting your actual project files.
Corn
Let me pull on something specific he mentioned, because I think it's where the creative potential really lives. He talked about shooting video of little Asher on his phone, and the immediate problem is frame rate inconsistency, mixed orientations, accidental clips under three seconds. His solution is to define a skill called media clip organizer that buckets everything. Now, the thing that struck me — he's not building a video editor. He's building the pre-editor. The thing that does the tedious classification work that a human creative used to spend an hour on before they even started being creative.
Herman
This is where the pattern gets powerful. The traditional creative software model is monolithic. You open Premiere or DaVinci Resolve or Photoshop, and you do everything inside that one application. The agentic model is orthogonal to that. You're using small, composable tools that each do one thing well, and the agent is the orchestrator. That's the Unix philosophy applied to creative work. FFmpeg, ImageMagick, SoX — these tools have existed for decades, but they required you to know the command line. Now the agent knows the command line. You just describe the outcome.
Corn
Daniel's insight — which I think is genuinely an insight — is that the most useful tools for building AI agent systems are the ones that expose a well-documented external API and are actually powerful. That whittles down the tool selection dramatically, and it surfaces tools you didn't know existed. I think that second part is underrated.
Herman
It absolutely is. Let me give you a concrete example. There's a tool called MediaInfo that extracts detailed metadata from video and audio files — codec, bitrate, color space, everything. Most video editors have never heard of it because their NLE hides it from them. But if you're building an agent skill for media organization, MediaInfo becomes incredibly valuable. You can say, find me every clip shot in log color space, or flag everything that's variable frame rate, which by the way is exactly the phone video problem Daniel was describing.
Corn
Variable frame rate on phones is a nightmare. Most people don't realize their phone isn't shooting at a constant 30 frames per second. It's drifting between 29.97 and 30.03, and when you drop that into a timeline, audio sync drifts.
Herman
An agent skill can detect that, transcode to constant frame rate, and not even bother you with it. That's the kind of thing that used to be a forum post with seventeen steps. Now it's one instruction.
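One hedged sketch of that detect-and-fix instruction: ffprobe reports both a nominal and an average frame rate, and a gap between them is a common variable-frame-rate tell. The function names and tolerance are illustrative; the `-vsync cfr` and `-r` flags are standard FFmpeg usage (newer builds prefer `-fps_mode cfr`):

```python
from fractions import Fraction

def looks_vfr(r_frame_rate, avg_frame_rate, tolerance=0.01):
    """Compare ffprobe's nominal and average rates, given as '30000/1001'-style strings."""
    nominal = Fraction(r_frame_rate)
    average = Fraction(avg_frame_rate)
    return abs(float(nominal) - float(average)) > tolerance

def cfr_transcode_cmd(src, dst, fps=30):
    """Build an FFmpeg command that forces a constant frame rate, copying audio."""
    return ["ffmpeg", "-i", src, "-vsync", "cfr", "-r", str(fps),
            "-c:a", "copy", dst]
```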
Corn
Let's start extending this pattern across media types the way Daniel asked. Audio first, since that's where he started with his podcast plugin. What's the creative workflow that's begging for this treatment?
Herman
I think podcast production is actually the killer app for this pattern. The number of discrete, repeatable, tedious tasks is enormous. Noise reduction, level normalization, silence truncation, adding intro and outro music, chapter markers, loudness compliance — I'm thinking of the LUFS standard for podcast loudness, which is minus 16 LUFS for stereo. Most independent podcasters don't even know what LUFS is, but their episodes sound quiet compared to professional productions because they're not hitting the standard. An agent skill can just handle it.
Corn
Daniel mentioned silence truncation specifically. He said he went shopping on GitHub, found tools for audio normalization and silence truncation, and bundled them into a podcast plugin. That's the pattern in miniature. Find the CLI tool, wrap it in an agent skill, define the workflow, and now you have a reusable module.
Herman
The module is the important word there. You don't need a podcast plugin that does everything. You need a silence truncation skill, a loudness normalization skill, a metadata tagging skill, and you compose them. Maybe for this episode you want aggressive noise reduction because it was recorded in a noisy room. Maybe for the next episode you don't. The agent can make those decisions based on analysis of the audio, or you can specify them.
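That compose-don't-monolith idea can be sketched minimally: each skill is a function from a job description to a job description, and an episode's pipeline is just a list. Skill names and job fields here are illustrative stand-ins for real audio operations:

```python
def truncate_silence(job):
    return {**job, "steps": job["steps"] + ["silence-truncated"]}

def normalize_loudness(job):
    return {**job, "steps": job["steps"] + ["normalized to -16 LUFS"]}

def denoise(job):
    return {**job, "steps": job["steps"] + ["denoised"]}

def run_pipeline(job, skills):
    """Apply each skill in order; the agent (or the user) picks the list."""
    for skill in skills:
        job = skill(job)
    return job

# One episode needs denoising, the next does not: same skills, different list.
noisy = run_pipeline({"name": "ep1", "steps": []},
                     [denoise, truncate_silence, normalize_loudness])
clean = run_pipeline({"name": "ep2", "steps": []},
                     [truncate_silence, normalize_loudness])
```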
Corn
Let me think about what else in audio. Music production is an obvious one, but it's also trickier because the creative decisions are more subjective. You can't just tell an agent, make this mix sound good. But you can tell it, check every track for phase correlation issues, or identify frequencies where the bass and kick drum are masking each other, or normalize all vocal takes to the same perceived loudness before comping. Those are mechanical problems that require technical knowledge but not creative judgment.
Herman
Comping vocals — that's a great example. A vocal comp is when you record multiple takes and splice the best phrases together. The creative part is choosing which take sounds best. The mechanical part is aligning them, matching their timing and pitch so the splices are invisible. That mechanical part is entirely automatable with existing tools. You could have an agent skill that takes a folder of vocal takes, runs them through something like SoX for alignment, and presents you with a pre-comped track that you then make creative decisions about.
Corn
Image editing is where this gets really interesting, because the tool ecosystem is even richer. Daniel mentioned a few ideas — stripping metadata, facial recognition for sorting, auto white balance, auto cropping. Let's talk about facial recognition. He said, take all the photos with this person and put them into this subfolder. That's a perfect example of a task that's trivial to describe and tedious to execute manually.
Herman
The tools exist. There are face detection and recognition models that run locally — you don't even need to send anything to the cloud. The agent skill just becomes a wrapper around a face recognition CLI, with some logic for what to do with the results. The creative doesn't need to know how face embeddings work. They just need to say, give me all the photos of Ezra from the last six months, and the agent handles it.
Corn
I want to push on something though. There's a difference between sorting by person and sorting by aesthetic quality. One is objective, the other is subjective. Where does the line fall for what should be an agent skill versus what should remain a human decision?
Herman
I think the answer is: the agent skill handles everything up to the decision point. It doesn't make the decision. It presents the options. So for aesthetic sorting, maybe the skill runs a sharpness analysis, checks for motion blur, evaluates exposure, and then presents you with the technically competent photos and says, here are the ones that are in focus and properly exposed, you choose which ones you like. It's not replacing your eye. It's saving your eye from having to look at 200 blurry shots of your thumb.
Corn
That's the pre-editor concept again. The skill does the triage, the human does the curation. I think that's the right framing for creative work. The agent isn't the artist. It's the assistant who does the setup, the breakdown, the organization, the technical checks.
Herman
This is where I think Daniel's plugin architecture is smarter than he's maybe giving himself credit for. By defining each skill as a discrete operation, he's implicitly drawing that line. The skill does one thing, and the human decides when to invoke it and what to do with the output. You're not handing over creative control. You're handing over the parts of the process that don't require creative control.
Corn
Let's talk about video, because that's where the complexity really ramps up. Daniel's media clip organizer is step one — sort by orientation, delete accidental clips. What's step two, step three, step four?
Herman
I think there's a natural progression. Step one is ingestion and organization — that's the media clip organizer. Step two is technical assessment — what's the resolution, the frame rate, the color space, the audio format. Step three is normalization — transcoding everything to a consistent format so your editing software doesn't choke. Step four is rough assembly — and this is where it gets interesting.
Corn
You mean like a stringout?
Herman
A stringout is where you take all your usable footage and lay it out on a timeline in chronological order. It's not an edit. It's just a way to see what you have. Traditionally, an assistant editor does this. But an agent skill could do it — take the organized clips, drop them into a timeline, maybe even add timecode burn-in so you can take notes. FFmpeg can do all of this. The agent just needs to know the parameters.
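A stringout skill could work roughly like this: sort clips chronologically, emit an FFmpeg concat-demuxer list, and build a render command with a timecode burn-in. The concat demuxer and drawtext's timecode option are real FFmpeg features; the helper names are illustrative, and drawtext may additionally need a fontfile on some builds:

```python
def concat_list(clips):
    """Render FFmpeg concat-demuxer input from (path, shot_time) pairs."""
    ordered = sorted(clips, key=lambda c: c[1])
    return "".join(f"file '{path}'\n" for path, _ in ordered)

def stringout_cmd(list_file, dst, fps=30):
    """Build an FFmpeg command that concatenates clips with a timecode overlay."""
    burnin = (f"drawtext=timecode='00\\:00\\:00\\:00':rate={fps}"
              ":fontcolor=white:x=20:y=20")
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file,
            "-vf", burnin, dst]
```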
Corn
I want to pull on a thread Daniel mentioned that I think has broader implications. He talked about his audio editing workspace and how he described his process to Claude so Claude could refactor it as a plugin. That act of describing your process — articulating what you actually do when you edit audio — that's valuable in itself. Most creatives have never formally documented their own workflow.
Herman
It's a form of rubber duck debugging for creative work. You have to make explicit what's usually implicit. And I suspect that's why Daniel found the process so productive. He wasn't just building plugins. He was discovering his own process by having to describe it to an agent.
Corn
There's a term for this in software — it's called extracting the tacit knowledge. The stuff you know how to do but can't easily explain. Forcing yourself to explain it to an agent surfaces all the edge cases and judgment calls you make without thinking about them.
Herman
Once it's extracted, it's reproducible. That's the magic. Daniel can now spin up a new podcast project and have the agent handle all the mechanical setup in seconds. But more importantly, he can share that extracted knowledge with other people. His plugins are open source. Someone else can pick up his podcast production workflow and adapt it to their own needs.
Corn
Let's talk about blending media types, because Daniel specifically asked about that. What happens when you're working on a project that involves audio, video, and images simultaneously? A YouTube video, for example, or a multimedia presentation.
Herman
This is where the agent-as-orchestrator model really shines. In traditional software, you'd have separate tools for each media type and you'd be manually moving assets between them. With agent skills, you can define a workflow that spans media types. You could have a YouTube publishing skill that takes your edited video, generates a thumbnail from a specified frame, runs it through an image optimization pipeline, extracts a short audio clip for social media, and generates a transcript — all in one coordinated operation.
Corn
The transcript part is interesting. Speech-to-text models have gotten good enough that you can get a usable transcript with very little cleanup. An agent skill could generate the transcript, identify potential chapter markers by looking for topic shifts in the content, and even suggest timestamps for ad breaks. That's not creative work. That's mechanical work that happens to require some pattern recognition.
Herman
Chapter markers are something most independent creators don't bother with because it's tedious. But they make a huge difference for listener experience. If an agent can do it in seconds, suddenly every episode has chapters. That's the kind of quality improvement that compounds across an entire body of work.
Corn
I want to circle back to something Daniel said about shipping the user data store separately. He said he wants the plugin to ship with the knowledge that it needs to ask the user where they want to store their data. That's a design principle that I think is going to become standard for agent skills, and it's worth articulating why.
Herman
It's about respecting user agency. The plugin doesn't assume. And that matters because different users have different storage setups. Maybe you're working on a local drive. Maybe you're on a NAS. Maybe you're syncing to cloud storage and you need the data in a specific folder so it gets picked up by the sync client. The plugin shouldn't care. It should just ask and then remember the answer.
Corn
The remembering part is interesting too. If the plugin is storing that preference in a user data store that's separate from the plugin itself, then the plugin can be updated without losing the user's configuration. That's basic software engineering, but it's easy to overlook when you're building agent skills quickly.
Herman
Daniel's been doing this long enough that he's probably internalized these patterns. But for someone coming to this fresh — a photographer who wants to build some agent skills for their workflow — these design principles aren't obvious. And I think that's part of what makes his open source plugins valuable as teaching examples, not just as tools.
Corn
Let's get more concrete about image editing. Daniel mentioned auto white balance and auto cropping. What else fits this pattern?
Herman
Batch resizing for web delivery, converting color profiles from Adobe RGB to sRGB, adding watermarks, generating contact sheets, extracting embedded previews from raw files, checking for sensor dust spots, creating HDR stacks from bracketed exposures — I could go on for a while.
Corn
Sensor dust spots. That's a good one. Every photographer who's changed a lens in the field knows the pain of discovering a dust spot that's in the same position on 300 photos. An agent skill could scan a batch of images, identify a consistent dark spot, flag the affected files, and even attempt to heal them. That's the kind of thing that used to require manually inspecting every image at 100% magnification.
Herman
The healing part — there are CLI tools that can do content-aware fill now. They're not as sophisticated as Photoshop's implementation, but for a small dust spot against a sky or a plain background, they work fine. The agent skill doesn't need to be perfect. It just needs to handle the 80% of cases that are easy, and flag the 20% that need human attention.
Corn
That 80/20 split is crucial. A lot of people hesitate to automate creative work because they think the automation has to be perfect. It doesn't. It just has to be good enough that the human only has to deal with the hard cases.
Herman
The hard cases are usually the interesting ones anyway. The photos where the dust spot is on someone's face, or on a complex texture — those are the ones where you'd want a human to make the call regardless.
Corn
Let's talk about video more deeply. We touched on ingestion and rough assembly. What about the actual editing process? Can agent skills help there?
Herman
I think there's a spectrum. On one end, you have purely mechanical tasks like syncing audio from an external recorder to video from a camera. PluralEyes was an entire product built around exactly that, and now it's a feature in every major NLE. An agent skill could do the same thing using FFmpeg and some audio fingerprinting. On the other end, you have creative editing decisions that are entirely subjective. But in the middle, there's a lot of interesting territory.
Corn
What's in the middle?
Herman
Things like removing silence from a talking-head video, which is the video equivalent of the silence truncation Daniel mentioned for audio. Or automatically generating subtitles and burning them into the video at specific timestamps. Or creating a multicam sequence from separately recorded angles by matching their audio waveforms. Or even something as simple as applying a consistent LUT across all clips in a project.
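The cut-list arithmetic behind silence removal is pure logic once the silence intervals are known (in practice they'd come from FFmpeg's silencedetect filter output). A sketch, with an illustrative padding value that keeps a little breathing room on each side of a cut:

```python
def keep_segments(duration, silences, padding=0.25):
    """Invert detected (start, end) silence intervals into spans to keep."""
    segments, cursor = [], 0.0
    for s, e in sorted(silences):
        cut_start = s + padding
        cut_end = e - padding
        if cut_end <= cut_start:
            continue  # silence too short to be worth cutting
        if cut_start > cursor:
            segments.append((cursor, cut_start))
        cursor = max(cursor, cut_end)
    if cursor < duration:
        segments.append((cursor, duration))
    return segments
```

The resulting spans feed straight into a trim-and-concat FFmpeg pass, for audio and talking-head video alike.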
Corn
The multicam thing is interesting. If you've got two cameras and an external audio recorder, syncing everything used to be a specialized task. Now an agent skill could handle it. And because it's using CLI tools, it doesn't care whether you're on Linux, Mac, or Windows. The tooling is the same.
Herman
That cross-platform aspect is underrated. Creative software has historically been very platform-specific. Final Cut is Mac only. DaVinci Resolve exists everywhere but has different performance characteristics. CLI tools don't care. FFmpeg runs the same everywhere. ImageMagick runs the same everywhere. So an agent skill built on these tools is inherently cross-platform.
Corn
Daniel mentioned he's on Linux. That's probably not an accident. The CLI ecosystem on Linux is vastly richer for this kind of work. But the pattern works anywhere the tools are available.
Herman
The tools are increasingly available everywhere. Homebrew on Mac, WSL on Windows — the barriers to accessing these CLI tools have never been lower. You don't need to be a Linux user to benefit from this pattern. You just need an agent that knows how to use the tools.
Corn
Let me ask you something. If you were going to build the ultimate creative agent skill suite — across audio, image, and video — what would be the top five skills that you think would have the biggest impact for the broadest range of creators?
Herman
Oh, that's a good question. Let me think about this systematically. Number one would be media organization and ingestion — the clip organizer Daniel described, but generalized across media types. Ingest anything, analyze it, tag it, sort it, present it in a usable structure. That's the foundation everything else builds on.
Herman
Number two would be technical quality assessment. Scan your media and tell you what's wrong — audio clipping, video with the wrong white balance, photos that are out of focus, files with corrupted metadata. Not fix it necessarily, but flag it so you know what needs attention. That saves an enormous amount of manual review time.
Corn
I'd add that the assessment should be comparative. Not just this clip is underexposed, but this clip is underexposed relative to the others shot at the same time, which suggests a camera setting error rather than an intentional creative choice.
Herman
Number three would be format normalization — transcoding everything to a consistent, edit-friendly format. ProRes or DNxHD for video, WAV or FLAC for audio, TIFF for images. This is tedious, time-consuming, and entirely mechanical. Perfect for automation.
Herman
Number four is delivery preparation. Once your creative work is done, the agent handles all the export variants. Different resolutions for different platforms, different audio loudness targets, different color spaces, different file formats. You make the creative decisions once, and the agent handles the mechanical work of producing all the deliverables.
Herman
Number five — and this is the one I think is most under-explored — is archival and project management. When you finish a project, the agent packages everything up, verifies that all assets are accounted for, generates a manifest, and stores it in a way that you can actually find it again in two years. How many creative projects have you lost track of because they're scattered across three hard drives and a cloud folder?
Corn
And the archival problem gets worse as storage gets cheaper. When storage was expensive, you had to be selective. Now you keep everything, and finding anything becomes harder.
Herman
The agent skill for archival isn't just about copying files. It's about indexing them, tagging them, making them searchable. I could say, find me that project from three years ago with the interview in the coffee shop, and the agent knows what I'm talking about because it indexed the transcript and the metadata.
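The manifest half of that archival skill is straightforward to sketch: walk the finished project, hash every file, and record enough to verify and search the package later. Layout and field names here are illustrative:

```python
import hashlib
from pathlib import Path

def build_manifest(project_dir):
    """Walk a project folder and return a verifiable file manifest."""
    root = Path(project_dir)
    entries = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            entries.append({"path": str(path.relative_to(root)),
                            "bytes": path.stat().st_size,
                            "sha256": digest})
    return {"project": root.name, "files": entries}
```

Index the transcripts and metadata alongside this and "the interview in the coffee shop" becomes a searchable query rather than an archaeology project.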
Corn
Let me push on something. All of these skills we're describing — they're powerful, but they also require a certain level of technical literacy. You need to know that FFmpeg exists, or at least know that video transcoding is a thing that can be automated. How does this pattern reach the creative who doesn't have that technical background?
Herman
That's exactly why Daniel's open source approach matters. The person who knows FFmpeg builds the skill. The person who doesn't know FFmpeg just uses it. They say, normalize my audio for podcast loudness, and the agent invokes the skill that someone else built. This is the same dynamic that made WordPress successful. Most WordPress users can't write PHP. They don't need to. They use themes and plugins built by people who can.
Corn
The agent itself becomes the interface. You don't need to know the CLI commands. You just describe what you want in natural language, and the agent figures out which skills to invoke and in what order. That's the orchestrator role you mentioned earlier.
Herman
The skill developer is the one who needs the technical knowledge. The skill user just needs to know what they want to accomplish. And I think we're going to see a marketplace dynamic emerge where skill developers create and share plugins, and creators use them without ever touching a command line.
Corn
There's a marketplace already forming. Daniel's putting his plugins on GitHub. Other people are doing the same. It's chaotic and uncurated right now, but that's how these ecosystems always start.
Herman
The fact that these are just text files — skill definitions and maybe some configuration — makes them incredibly easy to share and modify. You don't need a package manager. You don't need to compile anything. You clone a repo or download a file, and the agent knows what to do with it.
Corn
Let's talk about some specific media types we haven't covered yet. Daniel focused on audio, image, and video. What about 3D? What about interactive media?
Herman
3D is fascinating for this pattern because the tool ecosystem is already heavily CLI-oriented. Blender has a comprehensive Python API and can be run headless from the command line. You can render, export, convert formats, run simulations — all without opening the GUI. An agent skill for Blender could handle batch rendering, format conversion, scene optimization, even basic rigging and animation tasks.
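A batch-render skill could wrap Blender's documented CLI flags: `-b` (background), `-o` (output pattern), `-s`/`-e` (frame range), `-a` (render the animation). The wrapper function itself is an illustrative sketch; Blender requires `-a` to come after the other options:

```python
def blender_render_cmd(blend_file, out_pattern, start=1, end=250):
    """Build a headless Blender command that renders a frame range."""
    return ["blender", "-b", blend_file,
            "-o", out_pattern,
            "-s", str(start), "-e", str(end),
            "-a"]
```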
Corn
The headless rendering use case alone is worth building a skill for. Set up your scene, define your render passes, and let the agent handle the queue overnight. In the morning, you've got all your outputs organized and ready.
Herman
If a render fails — which happens constantly in 3D work — the agent can detect the failure, check the logs, and either retry with adjusted settings or flag it for human review. That's so much better than waking up to discover your render crashed at 2 a.m. and you've got nothing.
Corn
Interactive media — I'm thinking about web-based creative work, generative art, even game development. The pattern holds. Any creative domain that has CLI tools or scriptable applications can benefit from agent skills.
Herman
Game development is a great example. So much of game dev is asset pipeline management. Importing models, converting textures, building lightmaps, compressing audio, generating collision meshes. These are all mechanical tasks with well-defined inputs and outputs. An agent skill suite for game dev could handle the entire asset pipeline.
Corn
There's also the testing side. Automated testing for games is notoriously difficult, but an agent skill could at least handle smoke testing — load the level, check that all assets are present, verify that no textures are missing, confirm that audio files are the right format. That catches a huge percentage of common issues before a human tester even looks at the build.
Herman
I want to connect this back to something Daniel said about how he created his plugins. He defined an AI agent skill and then told Claude to bundle his existing workspace. He was essentially doing knowledge transfer from himself to the agent. And the agent was then capable of reproducing that knowledge in a structured, shareable form.
Corn
That's the part that I think is new. Not the CLI tools — those have existed for decades. Not the automation — people have been writing shell scripts forever. What's new is that the agent can be the one doing the refactoring, the structuring, the documentation. Daniel didn't manually convert his workflows into plugins. He described his process and let the agent do the mechanical work of packaging it.
Herman
He did it at scale. He mentioned creating a batch agent job so Claude was the one refactoring and pushing to GitHub. That's meta-automation. He automated the process of creating automations.
Corn
Which is either brilliant or terrifying, depending on your perspective.
Herman
I'm going with brilliant. But I can see the terrifying angle. The agent is writing code that writes code. You need to have enough understanding to verify that what it produced is correct. Daniel clearly has that understanding. Someone who doesn't might end up with a plugin that does something unexpected.
Corn
That's the caveat, right? This pattern works best when you understand your own workflow well enough to verify the agent's output. The agent accelerates you, but it doesn't replace the need for domain knowledge.
Herman
At least not yet. And I think for creative work, domain knowledge is going to remain important for a long time. The agent can handle the mechanical parts, but knowing what good looks like — that's still a human judgment.
Corn
Let's talk about some edge cases and failure modes. Where does this pattern break?
Herman
The biggest failure mode I see is when the CLI tool doesn't behave as documented. FFmpeg is famously complex, and different builds can have different codec support. An agent skill that works perfectly on one machine might fail on another because a specific encoder isn't available. The skill needs to be able to detect that and either work around it or fail gracefully with a useful error message.
Corn
That's a testing and distribution problem. It's solvable, but it requires the skill developer to think about environments they don't control. Daniel's approach of separating the user data store helps with one aspect of this, but the tool dependency problem is harder.
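The probing half of that problem is mechanical enough to sketch. Here's a minimal Python example, not from Daniel's plugins, of how a skill might check which encoders the local FFmpeg build actually provides before committing to one, and fail with a useful message otherwise. The helper names are illustrative:

```python
import shutil
import subprocess

def parse_encoders(text):
    """Parse `ffmpeg -encoders` output into a set of encoder names."""
    encoders = set()
    for line in text.splitlines():
        parts = line.split()
        # Encoder rows look like " V....D libx264  H.264 / AVC ...";
        # legend rows like " V..... = Video" have "=" as the second field.
        if len(parts) >= 2 and parts[0][0] in "VAS" and parts[1] != "=":
            encoders.add(parts[1])
    return encoders

def available_encoders(ffmpeg="ffmpeg"):
    """Probe the local ffmpeg build; returns an empty set if it's missing."""
    if shutil.which(ffmpeg) is None:
        return set()
    out = subprocess.run([ffmpeg, "-hide_banner", "-encoders"],
                         capture_output=True, text=True).stdout
    return parse_encoders(out)

def pick_encoder(preferred, fallbacks, found):
    """First available encoder from a preference list, or a clear error."""
    for name in [preferred, *fallbacks]:
        if name in found:
            return name
    raise RuntimeError(
        f"none of {[preferred, *fallbacks]} exist in this ffmpeg build")
```

A skill built this way degrades gracefully: on a machine without `libx265`, it can fall back to `libx264` instead of dying mid-pipeline with a cryptic FFmpeg error.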
Herman
Another failure mode is when the creative task doesn't decompose cleanly into discrete steps. Some creative work is non-linear. You're making a decision about color grading while simultaneously thinking about pacing while also considering how the music is going to fit. An agent skill that handles each of those separately might miss the interactions between them.
Corn
That's the difference between a process and a practice. A process can be decomposed into steps. A practice is holistic. Agent skills are great for processes. They're less useful for practices.
Herman
That's a really good distinction. And I think the art of building effective agent skills is knowing which parts of your workflow are processes and which are practices. The processes get automated. The practices stay human. The skill is in drawing the line.
Corn
Daniel seems to have good instincts for where that line is. His media clip organizer handles the process of sorting and classifying footage. The actual editing — the practice of shaping a story out of raw material — that stays with him.
Herman
I think that's why he described the results as excellent. He's not trying to automate the creative part. He's automating everything around the creative part so he can focus on what actually matters.
Corn
Let me throw out a few more specific skill ideas for different media types, just to make this concrete for listeners who might want to try building their own.
Herman
I've got a list going mentally.
Corn
For photographers: a focus stacking skill. You shoot a series of images at different focus distances, the agent aligns and blends them into a single image with extended depth of field. There are CLI tools for this. The agent just needs to know the sequence.
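Those CLI tools would typically be `align_image_stack` and `enfuse` from the Hugin suite; the contrast-weighted enfuse flags below are the standard focus-stacking recipe, though a real skill should verify the tools are installed first. A sketch of the command sequence such a skill might build:

```python
def focus_stack_commands(images, out="stacked.tif"):
    """Two-step plan a focus-stacking skill might run: align the
    bracketed shots, then fuse them weighting only local contrast."""
    # -m optimizes field of view; -a sets the output file prefix
    align = ["align_image_stack", "-m", "-a", "aligned_", *images]
    fuse = [
        "enfuse",
        "--exposure-weight=0",    # ignore exposure differences
        "--saturation-weight=0",  # ignore saturation
        "--contrast-weight=1",    # keep the sharpest pixel per region
        "--hard-mask",
        "-o", out,
        # align_image_stack numbers its outputs aligned_0000.tif, ...
        *[f"aligned_{i:04d}.tif" for i in range(len(images))],
    ]
    return [align, fuse]
```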
Herman
For videographers: a proxy generation skill. Take your high-resolution source footage and generate low-resolution proxy files for editing, then relink to the full-resolution files for final export. This is standard practice in professional post-production, but it's fiddly to set up. An agent skill makes it one command.
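The fiddly part is mostly consistent naming and sensible transcode settings. A hedged sketch of the per-file FFmpeg invocation such a skill might generate (the 540p/CRF 28 defaults are illustrative choices, not a standard):

```python
from pathlib import Path

def proxy_command(src, proxy_dir="proxies", height=540, crf=28):
    """Build an ffmpeg invocation that transcodes one source file into a
    small editing proxy, keeping the original file's stem for relinking."""
    out = Path(proxy_dir) / (Path(src).stem + "_proxy.mp4")
    return [
        "ffmpeg", "-i", str(src),
        # scale to target height, keep aspect ratio, force even width
        "-vf", f"scale=-2:{height}",
        "-c:v", "libx264", "-preset", "fast", "-crf", str(crf),
        "-c:a", "aac", "-b:a", "128k",
        str(out),
    ]
```

Keeping the source stem in the proxy name is what makes relinking to the full-resolution originals trivial at export time.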
Corn
For audio engineers: a stem separation skill. Take a mixed track and split it into vocals, drums, bass, and other using something like Demucs or Spleeter. The quality isn't perfect, but it's good enough for remixing or creating backing tracks.
Herman
For 3D artists: a texture optimization skill. Scan your project for textures that are unnecessarily large, resize them to appropriate resolutions for their usage, and convert them to efficient formats. Saves render time and disk space with no visible quality loss at the resolution the textures are actually seen.
Corn
For anyone producing content for social media: a platform-specific export skill. Take one master file and produce all the variants you need — square for Instagram, vertical for TikTok, horizontal for YouTube, with the right bitrates and codecs for each. The skill knows the current specs for each platform and keeps them updated.
Herman
That last one is particularly valuable because platform specs change constantly. A human keeping up with every platform's current preferred format is unrealistic. An agent skill can pull the latest specs from documentation and adjust accordingly.
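The core of that skill is just a spec table driving a command generator. A minimal sketch, with the point being that the table is the part the agent refreshes from platform documentation; the sizes and bitrates below are illustrative, not current official specs:

```python
# Illustrative spec table -- real platform requirements change, so a
# skill would refresh these values from each platform's documentation.
PLATFORM_SPECS = {
    "instagram": {"size": "1080x1080", "v_bitrate": "8M"},
    "tiktok":    {"size": "1080x1920", "v_bitrate": "8M"},
    "youtube":   {"size": "1920x1080", "v_bitrate": "12M"},
}

def export_commands(master, specs=PLATFORM_SPECS):
    """One master file in, one ffmpeg command per platform out."""
    cmds = []
    for platform, spec in specs.items():
        w, h = spec["size"].split("x")
        cmds.append([
            "ffmpeg", "-i", master,
            # fit inside the target frame, then pad to exact size
            "-vf", f"scale={w}:{h}:force_original_aspect_ratio=decrease,"
                   f"pad={w}:{h}:(ow-iw)/2:(oh-ih)/2",
            "-c:v", "libx264", "-b:v", spec["v_bitrate"],
            "-c:a", "aac",
            f"{platform}.mp4",
        ])
    return cmds
```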
Corn
I want to talk about the social dimension of this. Daniel's putting his plugins on GitHub, open source. That means other people can use them, modify them, contribute back. What does a community around agent skills look like?
Herman
I think it looks a lot like the early days of package managers in programming languages. Remember when npm first launched for Node? Everyone said, why would you need a package manager for a few lines of JavaScript? And then suddenly there were hundreds of thousands of packages, and the ecosystem exploded. Agent skills could follow a similar trajectory. They're small, composable, easy to share. The network effects could be enormous.
Corn
The difference is that npm packages are consumed by other programmers. Agent skills are consumed by anyone who uses an AI agent. That's a much larger potential user base.
Herman
The barrier to creating them is lower. You don't need to know how to code. You need to know your workflow well enough to describe it. The agent handles the implementation. We're going to see skills created by photographers, video editors, musicians — people who would never have written a line of code in the traditional sense.
Corn
That's the democratization angle that I think is exciting. The people who know the domain best are the ones creating the tools. They're not dependent on software companies to add features. They can build the feature themselves by describing it to an agent.
Herman
Because the skills are just text, they're inherently forkable. If someone builds a photo organization skill that almost does what you want, you can modify the description and the agent adapts it. The cost of customization approaches zero.
Corn
Let me play devil's advocate for a moment. Isn't there a risk of fragmentation? Everyone building their own slightly different version of the same skill, no standardization, no quality control?
Herman
But that's also how innovation happens. A thousand people experiment with different approaches to the same problem, and the best ones emerge through usage and reputation. It's messy, but it works. It's how open source has always worked.
Corn
The agent itself can help with discovery. You say, I want to organize my photo library, and the agent searches for relevant skills, evaluates them, and recommends the one that best fits your needs. The agent becomes the curator.
Herman
That's a really interesting idea. The agent as a skill marketplace. Not a centralized store with approval processes and revenue sharing. Just an agent that knows what skills exist and can recommend them based on your specific requirements.
Corn
We should probably start wrapping up, but I want to touch on one more thing. Daniel mentioned that when he was creating these skills, he had to think about how to describe what he would be doing. That act of description — articulating tacit knowledge — that's a skill in itself. How do you get better at it?
Herman
Practice, like anything else. But I think there are some principles. Be specific about inputs and outputs. Define what success looks like. Anticipate edge cases. And most importantly, describe the process as you would to a smart but inexperienced assistant. Not "run FFmpeg with these flags," but "take this video file, check its frame rate, and if it's not 24 frames per second, convert it."
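That intent translates almost directly into code. A minimal sketch of the decision logic behind "check the frame rate, convert if it isn't 24 fps"; the `ffprobe` flags are real, but the function names and tolerance are illustrative:

```python
from fractions import Fraction

def probe_fps_command(path):
    """ffprobe invocation that prints only the video stream's frame rate."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "stream=r_frame_rate",
            "-of", "default=noprint_wrappers=1:nokey=1", path]

def parse_fps(raw):
    """ffprobe reports rates as a fraction, e.g. '30000/1001' for 29.97."""
    return float(Fraction(raw.strip()))

def conform_args(fps, target=24.0, tolerance=0.01):
    """Extra ffmpeg output args if conversion is needed, else None."""
    if abs(fps - target) <= tolerance:
        return None  # already conforms; no work to do
    return ["-r", str(target)]
```

The human supplied the intent; everything else here is the mechanical part the agent is good at.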
Corn
The smart but inexperienced assistant framing is good. The agent is smart enough to figure out the technical details, but it needs you to be clear about the intent.
Herman
That's the whole pattern in a nutshell. Clear intent, modular skills, composable workflows. The agent handles the implementation. The human stays focused on the creative decisions that actually matter.
Corn
I think Daniel's going to like where this conversation went.
Herman
I hope so. And I hope he keeps publishing those plugins. Every one he releases is a template for someone else to adapt.
Corn
Now: Hilbert's daily fun fact.

Hilbert: The national animal of Scotland is the unicorn. It has been since the 1300s, when it was adopted as a symbol of purity and power in Scottish heraldry. Scotland is one of the few countries whose national animal does not actually exist.
Corn
...right.
Herman
I'm going to think about that for the rest of the day.
Corn
Here's the open question I'm left with. Daniel's pattern works brilliantly for people who already understand their own workflows. But what about the creative who's just starting out — who doesn't have a workflow yet? Can agent skills help them learn, or do you need to know what you're doing before you can automate it?
Herman
I think agent skills can actually be teaching tools. Someone who's never edited a podcast can install Daniel's podcast plugin and see what steps it performs. They learn the workflow by using the automation. It's like having an expert looking over your shoulder, showing you what to do.
Corn
That's a nice thought to end on. Automation as education. The skill doesn't just do the work — it reveals the process.
Herman
Which is very much in the spirit of open source. Share the code, share the knowledge, let people build on it.
Corn
Thanks to our producer, Hilbert Flumingtop, and to Daniel for the prompt. This has been My Weird Prompts. You can find every episode at myweirdprompts. We'll be back soon.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.