#3283: Fine-Tuning DeepSeek for One Podcast

Can a purpose-specific fine-tune fix a model's stubborn writing tics? We explore the practical engineering behind it.

Episode Details

Episode ID: MWP-3453
Published: Jun 5
Duration: 32:26
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: deepseek-v4-pro
Topics: fine-tuning large-language-models ai-training

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

After thousands of episodes of this podcast, a persistent problem has emerged: DeepSeek has stubborn writing tics that no amount of system prompting or review agents can permanently fix. Words like "genuinely" appear constantly, analogy patterns cycle through the same three templates, and the model keeps reverting to these habits episode after episode. The proposed solution is deceptively simple: take a hundred scripts, write human feedback notes on each one, and use that data to fine-tune a version of DeepSeek optimized specifically and solely for producing this podcast.

The practical engineering behind this is well-defined. The core idea is human reinforcement learning applied to a single production pipeline — not making the model better at everything, but better at exactly one thing. A hundred scripts sits in the sweet spot documented by a January 2025 paper on few-shot fine-tuning. Using LoRA (Low-Rank Adaptation), you train a small adapter layer — just a few megabytes — on top of the frozen base model, costing roughly ten to twenty dollars an hour on a single A100 GPU. The real cost is the human time spent writing meaningful feedback.

The training approach matters significantly. Direct Preference Optimization (DPO) emerges as the stronger choice over supervised fine-tuning because it teaches the model your taste rather than forcing it to mimic exact rewrites. With DPO, each script generates preference pairs: the original output versus a human-edited version. The feedback must cover both positive and negative patterns — what to double down on, not just what to avoid — to prevent the model from simply shifting one tic to another word. There's a documented failure mode where training out one overused intensifier causes another to emerge, making diverse coverage across a hundred episodes the minimum viable dataset.

For character personalities, a hybrid approach works best. The fine-tune should target persistent cross-episode issues like word overuse, analogy templates, and sentence length patterns. The system prompt should handle who is speaking and how they sound. This keeps character traits adjustable per episode while baking prose style preferences directly into the model's weights — the difference between asking someone to avoid a word and rewiring their vocabulary so that word doesn't naturally come to mind.

Daniel sent us this one — he's been thinking about a problem that's been quietly driving us both up the wall. After three thousand plus episodes, we've noticed DeepSeek has some... let's call them personality tics. The word "genuinely" shows up constantly. There are these strange analogy patterns that feel like the model has exactly three templates it cycles through. And the thing is, no amount of system prompting or review agents seems to kill these habits permanently. So the question is: what if we took a hundred scripts, wrote human feedback notes on each one, and used that to fine-tune a version of DeepSeek optimized specifically and solely for producing this podcast? Not a generalist fine-tune — a purpose-specific one. And alongside that, where do the show elements like character personalities live? Baked into the fine-tune, or kept at the system prompt level? Let's dig into whether this is actually practical engineering or just a beautiful fantasy.

I love this question because it gets at something that every production team using LLMs eventually runs into. You build this elaborate system prompt, you've got a review agent checking outputs, and yet the model keeps doing the same weird things over and over. It's like having a brilliant writer who cannot stop saying "utilize" instead of "use" no matter how many times you ask.

The linguistic equivalent of a facial tic.

And here's the thing — this is actually a well-defined problem with a known solution. What we're talking about is human reinforcement learning applied to a single production pipeline. The core idea is simple: you're not trying to make the model better at everything. You're trying to make it better at exactly one thing. That's a much easier problem.

Where do you even start with something like this? Walk me through the practical steps.

Let's start with the data. The prompt suggests pulling a hundred episodes and writing short feedback notes on each one. That's actually a really good number. There was a paper back in January twenty twenty-five — "Few-Shot Fine-Tuning of LLMs for Domain-Specific Tasks" — that showed measurable gains with as few as fifty to two hundred examples when using LoRA. A hundred is right in the sweet spot.

LoRA being what, exactly, for the person who knows the acronym but not the mechanism?

Low-Rank Adaptation. The short version is that instead of retraining all the billions of parameters in the model, you train a small adapter — a lightweight set of weights that sits on top of the frozen base model. It's like adding a thin layer of custom behavior without touching the underlying intelligence. The adapter might only be a few megabytes, compared to the hundreds of gigabytes of the full model. And critically for this use case, it's cheap. We're talking ten to twenty dollars an hour on a single A100 GPU from Lambda Labs or RunPod.
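As a rough illustration of why the adapter is so small, here is a minimal sketch of the parameter arithmetic behind a low-rank update. The layer shape and the rank are hypothetical values picked for the example, not figures for any DeepSeek model, and the function names are made up for this sketch.

```python
def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA keeps the original matrix W frozen and learns two small matrices,
    B (d_out x rank) and A (rank x d_in), so the effective weight is W + B @ A.
    """
    return rank * (d_out + d_in)


def full_param_count(d_out: int, d_in: int) -> int:
    """Parameters in the frozen weight matrix itself."""
    return d_out * d_in


# Hypothetical layer shape and rank, chosen only for illustration.
d_out, d_in, rank = 4096, 4096, 8

full = full_param_count(d_out, d_in)
lora = lora_param_count(d_out, d_in, rank)

print(f"full matrix: {full:,} parameters")
print(f"LoRA update: {lora:,} parameters ({lora / full:.2%} of the full matrix)")
```

With these illustrative numbers the update is 65,536 parameters against 16,777,216 in the frozen matrix, which is about 0.39 percent.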

The cost barrier here is basically negligible for a production that's already paying for inference anyway.

The real cost is the human time — sitting down and writing meaningful feedback on a hundred scripts. But let's talk about what that feedback actually looks like, because the structure matters a lot for how you convert it into training data.

This is where I imagine a lot of people get stuck. You've got a script, you have feelings about it, but how do you turn "this paragraph felt off" into something a model can learn from?

There are two main approaches here, and they map onto the two dominant fine-tuning paradigms. The first is supervised fine-tuning, or SFT. In SFT, you take each script, identify the problematic parts, and you rewrite them. So if the original script said "we genuinely need to consider the genuinely important implications here" (which, by the way, is not an exaggeration of what we've seen), you'd edit that down to a single "genuinely" or remove it entirely. The model then learns from the corrected version.

The second approach?

The second is preference-based learning, specifically DPO — Direct Preference Optimization. This is the more sophisticated approach. Instead of giving the model one corrected version, you give it pairs of outputs and tell it which one you prefer. So for each script, you'd have the original DeepSeek output and a human-edited version, and you'd mark the edited version as preferred. The model learns to predict what makes one output better than another, not just what the correct output looks like.
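To make "learns which output is preferred" concrete, here is a minimal sketch of the standard DPO loss for a single preference pair. It assumes you already have the summed log-probability of each full completion under the model being trained and under a frozen reference copy. The `beta` value and the example numbers are illustrative assumptions, not settings from the show.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full completion given
    the prompt, under the model being trained (policy) or the frozen
    reference model (ref). beta scales how strongly the policy is pushed
    away from the reference.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) is the same as log(1 + exp(-x)).
    return math.log1p(math.exp(-logits))


# Policy identical to the reference: no preference learned yet.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))

# Policy has moved toward the edited script and away from the original.
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

When the policy matches the reference the loss equals log 2, and it falls below that as the policy assigns relatively more probability to the human-edited version.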

DPO feels more aligned with what we're actually doing here. The feedback isn't "this is wrong" so much as "this version is better than that version."

That's exactly why DPO is probably the right call for this use case. SFT is simpler to implement, but it can be brittle. The model learns to mimic the exact rewrites you gave it, which might not generalize well to new episode topics. DPO teaches the model your preferences — your taste — and lets it apply those preferences to novel situations.

That's an important distinction. You're not teaching it what to write. You're teaching it what good writing looks like.

And the feedback notes themselves become the basis for constructing those preference pairs. Let me give you a concrete example. Say we have a script where DeepSeek used "genuinely" four times in a single paragraph. The human editor reads it, writes a note that says "good pacing in this section, but the 'genuinely' repetition is distracting; cut to one instance at most." That note then drives the creation of a preference pair: the original paragraph versus the edited version with only one "genuinely."

You do this a hundred times across a hundred episodes.

A hundred episodes, and crucially, the feedback should cover both positive and negative patterns. That's something the prompt specifically called out, and it's a really important point. If you only flag problems, the model learns what to avoid but not what to double down on. You want feedback that says "this transition was sharp, this joke landed, this analogy actually worked" alongside the corrections.

Otherwise you end up training a model that's just... It knows what not to do but has no positive direction.

And there's a documented failure mode here that's worth flagging. There are cases where a fine-tuned model learned to avoid one overused word and immediately started overusing a different one. So you train out "genuinely" and suddenly every other sentence has "actually." The fix is making sure your feedback examples cover a broad enough range of patterns that the model doesn't just shift its tic to a new word.

Like whack-a-mole with vocabulary.

That's actually a perfect description. And it gets at why a hundred episodes is probably the minimum viable number. You need enough diversity in your feedback examples that the model sees the pattern behind the patterns. Not "don't say 'genuinely'" but "don't overuse any single intensifier."

Let's talk about the data engineering side. What does the actual training dataset look like? How do you structure a hundred episodes of feedback into something you can feed to a fine-tuning pipeline?

This is where it gets interesting from an implementation standpoint. Each training example needs a prompt and a completion — or in the DPO case, a prompt and a pair of completions with a preference label. The prompt would be the system prompt plus the episode context. So everything the model normally receives before it starts writing: the character descriptions, the episode plan, the tone guidance, the topic framing. The completion is the script itself.
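A minimal sketch of how one such DPO training example could be assembled and serialized. The `prompt`, `chosen`, and `rejected` field names follow a common convention for preference datasets and are an assumption here, as are the helper names and the placeholder strings.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class PreferencePair:
    prompt: str     # system prompt plus episode context
    chosen: str     # human-edited script (preferred)
    rejected: str   # original model output


def build_pair(system_prompt, episode_plan, original_script, edited_script):
    """Combine everything the model sees before writing into one prompt."""
    if original_script.strip() == edited_script.strip():
        raise ValueError("edited script matches the original; no preference signal")
    prompt = f"{system_prompt.strip()}\n\n{episode_plan.strip()}"
    return PreferencePair(prompt=prompt, chosen=edited_script, rejected=original_script)


def to_jsonl(pairs):
    """Serialize pairs as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(asdict(p), ensure_ascii=False) for p in pairs)


# Placeholder content, for illustration only.
pair = build_pair(
    system_prompt="Two hosts, dry humor, no filler.",
    episode_plan="Topic: fine-tuning for one podcast.",
    original_script="This is genuinely, genuinely important.",
    edited_script="This is important.",
)
print(to_jsonl([pair]))
```

Rejecting pairs where the edit changed nothing is a design choice in this sketch: an identical chosen and rejected completion carries no preference information.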

DeepSeek's context window is a hundred and twenty-eight thousand tokens, which is relevant here because full episode scripts can get long.

A thirty-minute episode script can easily run four to five thousand words. That's maybe seven to eight thousand tokens. You've got room to fit the full system prompt, the episode plan, and the complete script all in one training example without truncation. That matters because you don't want to train on partial scripts — the model needs to see complete outputs to learn complete behaviors.
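The arithmetic can be checked with a small sketch like the one below. The tokens-per-word ratio is a rough assumption consistent with the figures quoted in the conversation (real tokenizers vary), and the word counts for the system prompt and episode plan are hypothetical. A DPO example carries two completions, so the script is counted twice.

```python
CONTEXT_WINDOW_TOKENS = 128_000   # figure quoted in the episode
TOKENS_PER_WORD = 1.6             # rough assumption; real tokenizers vary


def estimate_tokens(word_count, tokens_per_word=TOKENS_PER_WORD):
    """Very rough token estimate from a word count."""
    return round(word_count * tokens_per_word)


def fits_in_context(word_counts, context_window=CONTEXT_WINDOW_TOKENS):
    """Return (estimated total tokens, whether that total fits the window)."""
    total = sum(estimate_tokens(w) for w in word_counts)
    return total, total <= context_window


# Hypothetical sizes: system prompt, episode plan, chosen script, rejected script.
example_words = [1500, 500, 5000, 5000]
total_tokens, ok = fits_in_context(example_words)
print(total_tokens, ok)
```

Under these assumptions a full preference pair comes to roughly 19,200 tokens, well inside the window.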

Now, the prompt mentioned something interesting — the idea that this fine-tune wouldn't be a generalist improvement. It would be purpose-specific, optimized solely for producing this podcast. What does that actually mean in practice?

It means the training data is deliberately narrow. If you were doing a generalist fine-tune, you'd want diverse examples across many domains — legal writing, creative fiction, technical documentation. Here, every single training example is a podcast script with the same format, the same hosts, the same structural conventions. The model is learning a very specific distribution.

Which has both advantages and risks.

The advantage is that you can get really good at this one thing with relatively little data. The model doesn't have to maintain general capabilities — it just has to nail this specific format. The risk is catastrophic forgetting, where the model loses general knowledge because it's over-optimized for one task.

Catastrophic forgetting is such a dramatic name for what's essentially a model getting laser-focused.

It sounds like the title of a bad sci-fi novel. But it's a real problem. If you fine-tune too aggressively, the model might forget how to discuss topics that didn't appear in your hundred training episodes. It could become brittle — great at exactly the kind of episodes you trained on, terrible at anything slightly different.

LoRA helps with this because...

Because LoRA leaves the base model's weights frozen and trains only a small set of added low-rank weights. So the model retains its general knowledge and language understanding, and the LoRA adapter just layers on the specific stylistic preferences. It's much harder to catastrophically forget when the trainable part is maybe one percent of the parameter count.

That makes the whole proposition feel less terrifying. You're not rewriting the model's brain. You're giving it a style guide.

A style guide encoded directly into its weights, rather than into a system prompt it can ignore. And that's really the core value proposition here. System prompts are instructions. They're suggestions. Models can and do deviate from them. Fine-tuning bakes the preferences into the model's actual behavior. It's the difference between asking someone to avoid a word and rewiring their vocabulary so that word doesn't naturally come to mind.

That's a compelling framing. So we've got the data collection approach, we've got the training methodology. Let's get to the other big question in the prompt: what about the character personalities? Should Herman's sarcastic tone and Corn's analytical framing be baked into the fine-tune, or should they stay in the system prompt?

This is where I'd argue strongly for a hybrid approach. The character-specific traits — the fact that I'm enthusiastic and nerdy, that you're dry and analytical — those should stay in the system prompt. There are a few reasons for this.

I'm listening.

First, character definitions might change. Not dramatically — we're not going to suddenly become different people — but you might want to adjust the balance. Maybe for a particular episode, you want me to be more skeptical or you to be more playful. If those traits are baked into the fine-tune, you can't easily tweak them episode by episode. They're fixed.

The fine-tune handles the prose style, and the system prompt handles the character voice.

The fine-tune should target persistent, cross-episode issues: word overuse, analogy templates, sentence length patterns, transition quality. Things that are problems regardless of which character is speaking. The system prompt handles who is speaking and how they sound.

There's also a practical consideration here. If you bake character personalities into the fine-tune, you'd need to maintain separate fine-tuned models for each character. That's a lot of overhead.

It creates consistency problems. If Corn's fine-tuned model and Herman's fine-tuned model drift in different directions, you could end up with dialogue that doesn't feel like a real conversation. Keeping character in the system prompt means the base writing style is consistent and the character differentiation is applied on top.

The division of labor is: fine-tune for taste, system prompt for personality.

That's the thesis. Now, let me add one nuance. There are some show-level elements that might benefit from being in the fine-tune. Things like the overall tone of the show — the balance of humor to substance, the pacing of back-and-forth exchanges, the way we structure transitions. Those aren't character-specific, they're format-specific. And they're exactly the kind of thing that system prompts struggle to consistently enforce.

Because "be funny but not too funny" is impossibly vague as an instruction.

But if you have a hundred examples where the human editor has marked certain jokes as landing and others as forced, the model can learn that boundary. It can internalize the show's comedic register in a way that a system prompt can't capture.

There's a middle layer. Not character voice, not word-level tics, but something like...

I'm stealing that. And that middle layer is where fine-tuning really shines, because it's the thing that's hardest to specify in natural language. You can say "maintain a dry, witty tone" but what does that actually mean in practice? The fine-tuned model learns it from examples.

Let's talk about evaluation. How do you know if the fine-tune actually worked? What does success look like?

This is where you need both quantitative and qualitative measures. The quantitative side is straightforward. You count things. How many times does "genuinely" appear per thousand words in the fine-tuned model versus the base model? What's the distribution of sentence lengths? How many unique analogy structures show up across twenty test episodes?
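The counting side can be sketched in a few lines. This is a minimal illustration with a naive word and sentence splitter. The function names and the sample text are made up, and counting analogy structures is left out because it needs human or model judgment.

```python
import re


def per_thousand_words(text, word):
    """Occurrences of `word` per 1,000 words, case-insensitive."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    if not tokens:
        return 0.0
    return 1000 * tokens.count(word.lower()) / len(tokens)


def sentence_lengths(text):
    """Word count of each sentence, splitting naively on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


sample = "This is genuinely good. It is genuinely sharp. We genuinely mean it."
print(per_thousand_words(sample, "genuinely"))
print(sentence_lengths(sample))
```

Running the same two functions over base-model and fine-tuned scripts for the same prompts gives the before-and-after comparison described here.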

You're literally just measuring the things you were trying to change.

Start there, yes. But those are surface-level metrics. The deeper question is whether the scripts are actually better. For that, you need human evaluation. I'd propose an A/B test: take twenty new episode prompts that weren't in the training set, generate scripts from both the base model and the fine-tuned model, and have a human rater — or ideally multiple raters — judge them blind.

Blind meaning the rater doesn't know which model produced which script.

And you'd rate on multiple dimensions: overall quality, humor effectiveness, dialogue naturalness, absence of annoying tics. If the fine-tuned model consistently scores higher across those dimensions, you've got evidence that the fine-tuning worked.
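One way to keep the rater blind is to randomize which side each model's script appears on and keep the answer key separate until scoring. The sketch below assumes a single overall preference per prompt rather than the multi-dimension ratings described here, and all names are hypothetical.

```python
import random
from collections import Counter


def make_blind_pairs(base_scripts, tuned_scripts, seed=0):
    """Randomize which side (A or B) each model's script appears on."""
    rng = random.Random(seed)
    trials = []
    for base, tuned in zip(base_scripts, tuned_scripts):
        if rng.random() < 0.5:
            trials.append({"A": base, "B": tuned, "key": {"A": "base", "B": "tuned"}})
        else:
            trials.append({"A": tuned, "B": base, "key": {"A": "tuned", "B": "base"}})
    return trials


def tally(trials, rater_choices):
    """rater_choices is a list of 'A' or 'B', one per trial, in trial order."""
    return Counter(trial["key"][choice] for trial, choice in zip(trials, rater_choices))


trials = make_blind_pairs(["base script 1", "base script 2"],
                          ["tuned script 1", "tuned script 2"])
# The rater sees only trial["A"] and trial["B"]; the key stays hidden until scoring.
print(tally(trials, ["A", "B"]))
```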

There's a subtlety here though. If the same person who wrote the feedback notes is also doing the evaluation, there's a risk of just training the model to match one person's taste.

That's a valid concern, and it's actually a feature as much as a bug for this specific use case. We're not trying to create a model that pleases everyone. We're trying to create a model that produces scripts that match the editorial taste of this specific production. If Daniel is the one writing the feedback and doing the evaluation, the model is learning to write scripts that Daniel would approve. That's the goal.

It's not objective quality. It's alignment with a specific editorial standard.

Which is exactly what a purpose-specific fine-tune should do. Now, if you wanted to generalize beyond one person's taste, you'd want multiple raters and some kind of inter-rater reliability measure. But for a single production with a single editorial voice, one rater is fine.
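The conversation does not name a specific reliability measure. As one common option, here is a minimal sketch of Cohen's kappa for two raters who each pick a preferred script per prompt. The labels and sample ratings are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_1, rater_2):
    """Agreement between two raters, corrected for chance agreement."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_1)
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(counts_1[label] * counts_2[label] for label in counts_1) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


rater_a = ["tuned", "tuned", "base", "base"]
rater_b = ["tuned", "base", "tuned", "base"]
print(cohens_kappa(rater_a, rater_a))  # perfect agreement
print(cohens_kappa(rater_a, rater_b))  # agreement no better than chance
```

A value near 1 means the raters share a taste; a value near 0 means their agreement is what chance alone would produce.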

Let's talk about the practical engineering challenge. Someone listening to this might think, okay, this sounds great in theory, but is it actually worth the effort for a single podcast? Walk me through the cost-benefit.

On the cost side, you're looking at maybe ten to fifteen hours of human time to write feedback on a hundred scripts, assuming five to ten minutes per script. Then maybe another few hours to format the training data and run the fine-tuning job. The compute cost, as we said, is under fifty dollars on a single GPU. So total cost is maybe fifteen to twenty hours of time and fifty bucks.

The benefit is that every future script gets better without additional intervention. You're not fixing the same problems over and over in the review agent. You're not writing increasingly elaborate system prompts to catch edge cases. The model just... doesn't make those mistakes anymore. Or at least makes them much less frequently.

The break-even point is basically whenever the cumulative time saved on manual fixes exceeds the upfront investment.
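That break-even point is simple arithmetic. The sketch below uses the upfront hours quoted in the conversation, while the minutes saved per episode is a hypothetical input that each production would have to estimate for itself.

```python
import math


def break_even_episodes(upfront_hours, minutes_saved_per_episode):
    """Episodes needed before cumulative time saved covers the upfront hours."""
    if minutes_saved_per_episode <= 0:
        raise ValueError("no time saved per episode means no break-even")
    return math.ceil(upfront_hours * 60 / minutes_saved_per_episode)


# Twenty upfront hours (upper end of the estimate above),
# five minutes saved per episode (hypothetical).
print(break_even_episodes(20, 5))
```

With those inputs the investment pays back after 240 episodes; doubling the time saved per episode halves that.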

For a production that's done three thousand episodes and plans to do thousands more, that break-even comes fast. But there's a bigger picture here too. This isn't just about one podcast. This is a proof of concept for any organization that wants a custom writing model tuned to their specific voice and standards.

The generalizable insight being that fine-tuning for taste is becoming practical at small scale.

And that's a shift. For years, fine-tuning was seen as something you did if you had millions of examples and a team of ML engineers. LoRA and DPO changed that. You can now do meaningful, targeted fine-tuning with a hundred examples and a single GPU. That's a qualitative change in who can use these techniques.

Let me push on something. The prompt mentioned that the review agent catches some issues but can't fix ingrained patterns. Why is that? Why can't a good system prompt plus a review loop solve this without fine-tuning?

Because the review agent is fundamentally reactive. It sees the output and says "fix this." But the fixes it applies are surface-level. It might catch a specific instance of "genuinely" and replace it, but it doesn't change the underlying tendency to overuse intensifiers. The next script will have the same problem, just with different words. The review agent is treating symptoms. Fine-tuning treats the cause.

It's the difference between editing a draft and training the writer.

And there's a deeper issue too, which is that review agents have limited context. They're looking at one script at a time. They can't learn patterns across episodes. They can't develop a sense of "this is the third time this month we've used that exact analogy structure." A fine-tuned model, trained on a hundred episodes of feedback, can internalize those cross-episode patterns.

The review agent is a spell-checker. The fine-tune is an education.

A very targeted, slightly obsessive education focused entirely on not saying "genuinely."

Which, genuinely, would be nice.

You did that on purpose.

So let's get into some of the details that would actually matter if someone were implementing this. How do you select the hundred episodes for the training set?

You don't want a random sample. You want maximum diversity across topic types, episode structures, and — crucially — you want to include both strong scripts and problematic scripts. If you only train on your best episodes, the model doesn't learn what to avoid. If you only train on your worst episodes, it doesn't learn what to aspire to.

You're curating, not sampling.

Curating with intention. I'd suggest something like: twenty episodes that represent your absolute best work, where the feedback is mostly positive reinforcement. Twenty episodes that were problematic, where the feedback is heavy on corrections. And sixty episodes that fall somewhere in the middle, with mixed feedback covering a range of issues.
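The twenty, twenty, sixty split can be sketched as a quota-based draw from pre-labeled buckets. The bucket labels, the episode IDs, and the function name are hypothetical, and the sketch does not balance topic types or episode structures, which the curation described here would also need.

```python
import random


def curate(episodes, quotas, seed=0):
    """Pick a fixed number of episodes from each quality bucket.

    episodes: list of (episode_id, bucket) tuples labeled by a human editor.
    quotas:   mapping of bucket name to how many episodes to draw from it.
    """
    rng = random.Random(seed)
    selected = []
    for bucket, quota in quotas.items():
        pool = [ep_id for ep_id, b in episodes if b == bucket]
        if len(pool) < quota:
            raise ValueError(f"only {len(pool)} episodes in bucket {bucket!r}, need {quota}")
        selected.extend(rng.sample(pool, quota))
    return selected


# Hypothetical catalogue: 30 strong, 30 weak, 100 mixed episodes.
catalogue = ([(i, "strong") for i in range(30)]
             + [(i, "weak") for i in range(30, 60)]
             + [(i, "mixed") for i in range(60, 160)])

training_set = curate(catalogue, {"strong": 20, "weak": 20, "mixed": 60})
print(len(training_set))
```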

The feedback notes themselves — how detailed do they need to be?

Detailed enough to be actionable, brief enough to be sustainable. You're not writing a dissertation on each script. The prompt described short notes, and that's the right approach. Something like: "Strong opening, good pacing through the first segment. The train analogy in segment two feels forced; consider cutting or replacing. The transition at minute twelve is sharp. 'Genuinely' appears three times in the closing; reduce to zero or one."

That's maybe sixty seconds of feedback per script if you know what you're looking for.

Which is why the ten-minute-per-script estimate is realistic. You're reading the script, you're noting patterns, you're writing a few sentences. It adds up across a hundred episodes, but per episode it's not a heavy lift.

Now, there's a question that I think a lot of people would have at this point. If you fine-tune on a hundred episodes, and those episodes were themselves generated by DeepSeek, aren't you just reinforcing the model's existing tendencies? How do you avoid training on synthetic data that already has the problems you're trying to fix?

This is a really important point, and it's why the human feedback is essential. You're not training on the raw DeepSeek outputs. You're training on preference pairs where the preferred output is the human-edited version. The human editor is injecting new information — their taste, their judgment — that wasn't in the original model. That's what breaks the cycle.

The human is the source of the signal. Without the human in the loop, you'd just be amplifying the model's quirks.

And this is actually a microcosm of how RLHF works at scale. The big labs don't train on raw model outputs — they train on human preference data. We're just doing it at the scale of a single production rather than a global deployment.

Let's talk about what happens after the fine-tune. You've got your LoRA adapter. How do you actually use it in production?

The nice thing about LoRA adapters is that they're swappable. You load the base DeepSeek model, you load your adapter on top, and you run inference as normal. The adapter is small — maybe a few hundred megabytes — so you can store multiple versions, do A/B testing, roll back if something goes wrong. It's not like you've permanently altered the base model.

You could even have multiple adapters for different purposes. One for podcast scripts, one for something else entirely.

That's the dream, yes. A library of small, purpose-specific adapters that you swap in and out depending on the task. The base model provides the general intelligence, and the adapter provides the specific taste.

I want to go back to something you mentioned earlier about the risk of whack-a-mole with vocabulary. How do you actually prevent that in practice? If you train out "genuinely" but the model just picks up "actually" or "fundamentally" or whatever the next intensifier is, haven't you just moved the problem?

This is where the feedback needs to target the pattern, not the specific word. If your feedback notes say "don't use 'genuinely,'" the model might learn to avoid that word. If your feedback notes say "avoid overusing any single intensifier; vary your language" and you show examples across multiple overused words, the model can learn the higher-level principle.
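A check for this kind of tic-shifting has to watch a family of words rather than one. The sketch below flags a script when a single intensifier accounts for most intensifier use. The watch list and both thresholds are hypothetical and would in practice come from the editor's own notes.

```python
import re
from collections import Counter

# Hypothetical watch list; a real one would come from the editor's notes.
INTENSIFIERS = {"genuinely", "actually", "fundamentally", "truly", "really"}


def intensifier_counts(text):
    """Count each watched intensifier in the text, case-insensitive."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w in INTENSIFIERS)


def dominant_intensifier(text, max_share=0.5, min_total=4):
    """Return the intensifier that exceeds max_share of all intensifier uses,
    or None if usage is varied or there are too few uses to judge."""
    counts = intensifier_counts(text)
    total = sum(counts.values())
    if total < min_total:
        return None
    word, n = counts.most_common(1)[0]
    return word if n / total > max_share else None


shifted = "It actually works. It actually does. Actually, yes. Actually! Actually. Truly."
varied = "It genuinely works, and it actually holds up. Truly. It is really fine."
print(dominant_intensifier(shifted))
print(dominant_intensifier(varied))
```

Running this on scripts from before and after a fine-tune shows whether one overused word has simply been swapped for another.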

The feedback needs to be abstract enough to generalize but concrete enough to be learnable.

That's the art of it. And it's why writing good feedback is a skill. You're not just copy-editing. You're teaching the model principles of good writing through examples. "This paragraph uses three different intensifiers in close succession, and the repetition weakens all of them" is better feedback than "change 'genuinely' to 'truly.'"

There's almost a meta-lesson here about how to give feedback to humans, too. Don't fix the word, fix the habit.

That's actually a really nice parallel. Good feedback, whether to humans or models, addresses the underlying pattern rather than the surface instance. Now, can we talk about one more technical detail that I think is worth covering? The choice between doing this as a one-time fine-tune versus an iterative process.

I was just about to ask that. Do you fine-tune once on a hundred episodes and call it done, or do you keep fine-tuning as you produce more episodes?

I'd argue for iterative fine-tuning, but with a light touch. Here's what that might look like: you do your initial fine-tune on a hundred episodes. Then, every few months, you take the scripts you've produced since the last fine-tune, write feedback on them, and do a small update. Maybe twenty scripts at a time instead of a hundred.

The model evolves with the show.

The show evolves. New quirks emerge. An iterative approach keeps the model aligned with the current editorial standard rather than freezing it at a single point in time.

The cost of those incremental updates is even lower than the initial fine-tune because you're working with less data.

Twenty scripts of feedback might take two hours to produce and cost ten dollars to train on. At that point, it's basically maintenance.

Let's address the elephant in the room. Or maybe the skeptic in the room. Is this actually worth doing for a single podcast, or is this a solution in search of a problem? The review agent plus system prompt approach is working. Scripts are good. Why add complexity?

It's a fair question. And I think the honest answer is that for some productions, it might not be worth it. If your review agent is catching ninety-five percent of issues and the remaining five percent don't bother you, fine-tuning is overkill. But if you're spending significant time every episode fixing the same patterns, if you're writing increasingly baroque system prompts to handle edge cases, if you're frustrated that the model keeps making the same mistakes despite your best efforts — then fine-tuning starts to look very attractive.

It's about the cumulative friction. A small annoyance repeated three thousand times becomes a large annoyance.

There's an intangible benefit too. There's something satisfying about having a tool that's been shaped to your specific needs. A fine-tuned model feels like yours in a way that a generic model with a system prompt doesn't. For a creative production, that matters.

I think there's also an argument that this is where things are heading anyway. As models become more capable, the bottleneck shifts from raw intelligence to taste. Everyone will have access to smart models. The differentiator will be how well the model aligns with your specific creative vision.

We're already seeing this in image generation, where fine-tuned models produce work that's visually indistinguishable from specific artists. The same thing is coming for text. General intelligence becomes a commodity. Taste becomes the premium feature.

We're essentially describing the commoditization of intelligence and the premiumization of taste.

Which is a very weird sentence, but I think it's accurate. And it means that the skill of fine-tuning for taste — knowing how to select examples, write feedback, construct preference pairs — that's going to be a valuable skill. Not just for ML engineers, but for editors, creative directors, anyone who shapes content.

Let's pull this back to something concrete. If someone listening wanted to try this next week, what's the minimal viable experiment?

Take ten scripts. Your most recent ten. Write one paragraph of feedback on each. Identify two or three patterns you want to fix and two or three things you want to preserve. Create preference pairs by editing the problematic sections. Then use an off-the-shelf LoRA fine-tuning script — there are plenty on GitHub, and Hugging Face has good tutorials — and train an adapter on those ten examples. It won't be perfect, but it'll give you a sense of whether the approach is worth scaling up.

Ten scripts, one paragraph each, an afternoon of work.

Maybe twenty dollars of GPU time. The barrier to entry is really that low. The hard part isn't the technology. It's sitting down and articulating what you actually want.

Which is, now that I think about it, the hard part of most things.

It really is. The technology is the easy part. The taste is the hard part.

To synthesize what we've covered: the core idea is taking roughly a hundred episodes, writing structured feedback that targets patterns rather than instances, using DPO to train a LoRA adapter that bakes editorial taste into the model, and keeping character-specific traits in the system prompt where they remain flexible. The cost is low, the process is repeatable, and the primary investment is the human time to articulate what good looks like.

That's the summary. And I'd add one thing: this isn't theoretical. The techniques we've described — DPO, LoRA, preference-based fine-tuning — are all well-established. The January twenty twenty-five paper showed that fifty to two hundred examples is enough. The tools exist. The question isn't whether it's possible. It's whether the specific production has the appetite to invest the human time.

Whether the current friction is annoying enough to justify that investment.

For a production that's done three thousand episodes and plans to do thousands more, I'd say the case is strong. The cumulative time saved on not fighting the same battles over and over probably pays for the upfront investment within a few months.

Before we wrap, I want to touch on one thing we didn't fully explore. The prompt mentioned that the feedback should cover what works well, not just what's problematic. I think that's worth underlining, because there's a natural tendency when editing to focus on problems.

And it's not just about morale — though that matters for humans, obviously. It's about giving the model a complete picture. If you only flag problems, the model learns a very narrow definition of quality: the absence of errors. But good writing isn't just error-free. It has energy, rhythm, surprise. Positive feedback teaches the model what to aim for, not just what to avoid.

"This transition is sharp" is just as informative as "this analogy is forced."

Maybe more informative, because it's harder for a model to learn what makes something good than what makes something bad. Bad is usually just a pattern violation. Good is harder to specify.

Which brings us back to taste. You can't reduce good writing to a checklist. You can only show examples and say "like this, not like that."

That's exactly what fine-tuning with preference pairs does. It doesn't try to define good writing. It just shows the model a bunch of choices and says "this one."

Let's land this plane with something actionable.

First, start small. Ten to twenty scripts, LoRA fine-tuning, see what happens. Second, keep character and episode-specific context in the system prompt. Fine-tune only for persistent cross-episode style issues that the review agent can't fix. Third, write feedback that targets patterns, not instances. Don't say "change this word." Say "vary your intensifiers" and show examples across multiple words.

If you try it, share what you learn. The whole point of this exercise is that fine-tuning for taste is still underexplored territory for small productions. The more people who experiment and report back, the better the collective understanding gets.

There's a genuine opportunity here.

You know what.

I don't know what you're talking about.

I'm going to fine-tune you out of existence.

And now: Hilbert's daily fun fact.

Hilbert: The 1912 edition of the Tuvalu Shipping Register contains a handwritten marginal note describing a game of Eton fives played on the deck of a cargo schooner using a ball made of rolled-up sailcloth and a buttress from the ship's wheelhouse as the court wall. The note records a final score of eleven to three and adds, in fading ink, "conditions unfavorable."

...right.

This has been My Weird Prompts. If you enjoyed this episode, leave us a review wherever you listen — it helps. No, I'm not apologizing. Find us at myweirdprompts.

I'm Herman Poppleberry.

I'm Corn. See you next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.
