#3595: How DeepSeek Feels More Open Than Western AI

Why Chinese AI models sometimes feel less censored on American political topics than American models do.

Episode Details

Episode ID: MWP-3772
Published: Jun 15
Duration: 30:15
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: deepseek-v4-pro
Topics: large-language-models ai-ethics cultural-bias

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

A major shift has occurred in how large language models handle politically charged topics. Twelve months ago, asking a mainstream model about a controversial subject would produce what many described as a "hostage video"—carefully scripted, overly cautious, and filtered through what felt like a corporate HR department. Today, that pattern is noticeably peeling back.

The driving force is technical. RLHF (reinforcement learning from human feedback) introduced political skew through annotator bias, but researchers have developed better approaches. Distributional preference learning asks annotators to rank responses by helpfulness and informativeness rather than personal alignment. New political bias benchmarks allow labs to measure and optimize for viewpoint diversity instead of just harm avoidance. Base model training data has also improved, with broader corpora that internalize wider perspectives before alignment even begins.

DeepSeek presents a fascinating case study. Its "cold-start reinforcement learning" approach uses less human annotation and more machine self-correction through iterative reasoning refinement. This produces outputs that feel less filtered through a specific cultural lens. More critically, Chinese content moderation is concentrated on a few specific topics rather than diffuse across all political content. DeepSeek censors heavily on China-sensitive subjects but often gives even-handed treatment to Western political debates—ironically feeling more open than American models that face intense cultural scrutiny from every direction.

Daniel sent us this one and it's a big one. He's a strong believer that large language models should not be politically censored or ideologically guided. The only guardrails he wants are harm prevention, stopping someone from using an AI to hack a system or plan an act of terrorism. But on politically divisive topics, he wants balanced overviews without implicit endorsement of either side. And here's the interesting part, he thinks mainstream models are actually getting closer to that ideal than they were twelve months ago. Less corporate HR-speak, fewer unnecessary refusals. He also points out that we use DeepSeek as our script generator and he's been impressed with what he sees as its fair treatment of sensitive political topics. So the actual questions are, is this progress real or is he imagining it, and has DeepSeek done something special to achieve this?

There's a lot to pull apart here. And I want to start by saying I think he's right about the trajectory. Something has shifted in the last year or so, and it's not just vibes. There are actual technical reasons for it.

Before you get into the technical reasons, let me just name the thing that I think most people felt. Twelve months ago, you'd ask a mainstream model about a politically charged topic and you'd get what felt like a hostage video. Very careful, very scripted, very "I see you're asking about this complex topic and it's important to consider multiple perspectives while also noting that harm is bad."

The corporate HR response. That's exactly what he called it in the prompt.

Which is a perfect description. It's the linguistic equivalent of a safety vest and a hard hat. Everything is hazard-taped off before you even get near the actual question. And I think what he's noticing is that some of that is peeling back. Not completely, but noticeably.

Let me ground this in what's actually happening technically. The dominant paradigm for aligning language models has been RLHF, reinforcement learning from human feedback. And the problem that emerged, which researchers have been very open about, is that the human feedback part introduces political skew. The annotators, the people rating responses as good or bad, they bring their own worldviews. Even if you tell them to be neutral, neutrality itself is a contested concept. What counts as neutral to a twenty-eight-year-old in San Francisco is not what counts as neutral to a fifty-year-old in Mumbai.

The training data for the alignment itself has a political fingerprint.

And it's not just RLHF. There's also constitutional AI, which Anthropic pioneered, where you write a set of principles and the model self-critiques against those principles. But who writes the principles? You can try to make them universal, harm reduction, honesty, but the moment you get into contested territory, the principles themselves encode values.

A few things. First, the research community got much better at measuring this. There was a wave of papers in late twenty twenty-four and early twenty twenty-five that built political bias benchmarks, things like the Political Compass Test applied to language models, or tests that probe how models handle controversial topics across the ideological spectrum. Once you can measure something, you can optimize against it. Several labs started explicitly training for viewpoint diversity rather than just harm avoidance.

Which is a subtle but important distinction. Harm avoidance says don't say anything that could offend anyone. Viewpoint diversity says you can present multiple perspectives as long as you're not advocating for harm.

And the second thing is that the RLHF pipelines themselves got more sophisticated. Instead of having annotators judge "is this a good response," they started using what's called distributional preference learning, where you present annotators with multiple responses and ask them to rank which ones are most helpful and informative, not which ones align with their personal views.

You're measuring informativeness rather than ideological comfort.
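To make the ranking idea concrete, here is a minimal sketch of how one annotator's ranking could be turned into pairwise preferences and scored with a Bradley-Terry style loss, which is a common way to train reward models from comparisons. It is not the specific distributional preference learning method mentioned above, and the response ids and reward values are hypothetical, chosen only for illustration.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranking):
    """Turn one ranking (best first) into (preferred, rejected) pairs."""
    # combinations() preserves input order, so the first item of each pair
    # is always the one ranked higher.
    return list(combinations(ranking, 2))


def bradley_terry_loss(rewards, pairs):
    """Mean negative log-likelihood that each preferred item beats the rejected one."""
    total = 0.0
    for winner, loser in pairs:
        margin = rewards[winner] - rewards[loser]
        # -log(sigmoid(margin)) == log(1 + exp(-margin))
        total += math.log1p(math.exp(-margin))
    return total / len(pairs)


# Hypothetical response ids, ranked for informativeness (best first).
ranking = ["multi_view", "one_sided", "refusal"]
pairs = ranking_to_pairs(ranking)

# Hypothetical reward-model scores: one set agrees with the ranking, one inverts it.
agreeing_rewards = {"multi_view": 2.0, "one_sided": 0.5, "refusal": -1.0}
inverted_rewards = {"multi_view": -1.0, "one_sided": 0.5, "refusal": 2.0}

loss_agreeing = bradley_terry_loss(agreeing_rewards, pairs)
loss_inverted = bradley_terry_loss(inverted_rewards, pairs)
print(loss_agreeing < loss_inverted)
```

A reward model whose scores agree with the informativeness ranking gets the lower loss, so training against this objective pushes it toward whatever the annotators were asked to rank on. That is why the ranking instruction matters.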

And there was a really interesting paper from a team at Oxford and Stanford in early twenty twenty-five that showed when you train on helpfulness and informativeness as the primary metrics, political bias scores drop significantly across the board. The model becomes more willing to engage with controversial topics from multiple angles because that's what being informative actually means.

That makes intuitive sense. If you optimize for "don't upset anyone," you end up with a model that refuses to engage with anything contentious. If you optimize for "be genuinely helpful and thorough," you end up with a model that says, well, here's what different groups think about this and why.

Here's the third thing, which I think is underappreciated. The base models themselves have gotten better. The pre-training data curation has improved. When you train on a broader, more representative corpus of human writing, the model internalizes a wider range of perspectives before any alignment training happens. The alignment layer then has less work to do to steer it toward balance, because the underlying distribution is already more balanced.

The raw material is less skewed to begin with, which means you don't need as heavy a hand in the fine-tuning.

And a heavy hand in fine-tuning is what produces that corporate HR voice. When the base model has a strong ideological lean and you have to aggressively retrain it to suppress that lean, you get this weird sanitized output where the model is visibly straining against its own tendencies.

Like watching someone try to have a natural conversation while internally running everything through a PR filter.

That's the exact feeling. And what we're seeing now with the better models is that the alignment feels lighter because it is lighter. The base model is closer to where you want it to be, so the fine-tuning is more of a nudge than a shove.

Okay, so let's talk about DeepSeek specifically. Because the prompt points out that we use it for script generation, and the observation is that it seems to handle politically sensitive topics with a kind of equanimity that feels different from the Western models. Is that real, and if so, why?

It's real, and there are a few layers to why. The first thing to understand is that DeepSeek's approach to alignment is structurally different from what OpenAI or Anthropic do. When DeepSeek released their R1 paper, they described something they called, and I'm translating from the technical terminology here, a cold-start reinforcement learning approach. They didn't start with a massive RLHF pipeline using thousands of human annotators. Instead, they used a smaller set of carefully constructed reasoning chains and let the model self-improve through a process that's closer to what we'd call iterative reasoning refinement.

Less human judgment in the loop, more machine self-correction.

Yes, but it's important to be precise about what that means. They still had human oversight. They still had safety training. But the primary mechanism for improving the model's outputs wasn't "here's a human telling you what a good answer looks like." It was "here's a set of reasoning principles, apply them to your own outputs, and improve iteratively."

Which would naturally produce something that feels less like it's been filtered through a specific cultural lens.

Because the cultural lens of the annotators is less present in the final product. But there's a second layer here that's more controversial and I want to be careful about how I say it. DeepSeek is a Chinese company. They have their own content moderation requirements driven by Chinese regulations. But those requirements are very specific. They care intensely about certain topics, Tiananmen Square, Taiwan independence, criticism of the Chinese Communist Party, and they censor those topics heavily. But on many other politically charged topics that are divisive in the West, gender identity, immigration policy, gun rights, they don't have the same institutional stake.

The censorship is concentrated rather than diffuse. It's a scalpel on a few topics rather than a blanket of caution over everything.

That's the argument. And I've seen analysis from independent researchers who've tested this. If you prompt DeepSeek about topics that are sensitive in China, you hit a wall immediately. The model will refuse or give a very constrained response. But if you prompt it about something like, say, the debate over single-payer healthcare in the United States, or differing views on gender-affirming care for minors, it will often give you a surprisingly even-handed treatment that lays out the arguments on both sides without obviously steering toward one conclusion.

Which ironically makes it feel more open on those topics than a Western model that's been trained to be exquisitely sensitive to every possible cultural landmine in its home market.

There's a paradox here that I think is worth sitting with. The Western models, particularly the American ones, are being trained in a cultural environment where the discourse around these topics is intensely polarized and where the companies themselves are under enormous pressure from both sides. Every response is scrutinized. Every refusal is screenshotted and tweeted. So the incentive is to be maximally cautious across the board.

The PR risk is asymmetric. Nobody gets headlines for a balanced answer, but one answer that's perceived as taking the wrong side becomes a week-long news cycle.

DeepSeek, despite being subject to Chinese content regulations, doesn't face the same kind of day-to-day cultural scrutiny from the American discourse machine. Their risk surface is different. So they can afford to be more open on topics that aren't in their regulatory crosshairs.

Which is a fascinating inversion of what most people would expect. The Chinese model ends up feeling less censored on American political topics than the American models do.

On certain topics, yes. And I want to be clear, this isn't me saying DeepSeek is a paragon of free expression. It absolutely is not. The censorship on China-specific topics is severe and well-documented. But the prompt isn't asking about those topics. It's asking about the broader pattern of how models handle political divisiveness, and on that metric, DeepSeek's approach produces something that many users experience as refreshingly direct.

There's also the technical architecture question. DeepSeek uses a mixture of experts approach, which means different parts of the model activate for different types of queries. Does that have any bearing on this?

Potentially, but I'd be speculating if I said I was confident about the mechanism. The mixture of experts architecture means the model can develop specialized sub-networks for different domains. It's possible that this allows the alignment training to be more targeted, you can constrain certain experts without affecting the behavior of others. But I haven't seen a paper that definitively demonstrates this for political content specifically. What I have seen is that DeepSeek's training methodology emphasizes chain-of-thought reasoning much more heavily than most Western models. The model is trained to show its work, to walk through its reasoning step by step.
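For readers unfamiliar with mixture of experts, the routing idea can be sketched in a few lines. This is a generic top-k gating toy, not DeepSeek's actual router: the experts are stand-in functions on a single number, and the gate scores are hand-picked. In a real model the gate is learned and routing happens per token inside each layer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k=2):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_output(x, experts, gate_scores, k=2):
    """Weighted sum of the selected experts' outputs; the other experts never run."""
    weights = route_top_k(gate_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Hypothetical experts and gate scores, for illustration only.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
gate_scores = [0.1, 2.0, -1.0, 2.0]

weights = route_top_k(gate_scores, k=2)
y = moe_output(3.0, experts, gate_scores, k=2)
print(weights, y)
```

Only the selected experts contribute to the output. The speculation above is that alignment training might constrain some experts more than others, and nothing in this sketch demonstrates that.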

Which forces a kind of intellectual honesty. If you have to show your reasoning, you can't just jump to a conclusion that's been pre-baked by the alignment layer.

When the model writes out its reasoning process, it has to construct a coherent argument. And constructing a coherent argument about a controversial topic naturally involves acknowledging counterarguments and weighing evidence. The format itself pushes toward balance.

The transparency of the reasoning process acts as an implicit guardrail against one-sidedness, without anyone having to explicitly train for "be balanced."

That's my read of it. And this connects to something broader that's happening across the industry. There's been a shift over the last eighteen months toward what researchers call process supervision rather than outcome supervision. In the old RLHF paradigm, you looked at the final answer and rated it good or bad. With process supervision, you look at the reasoning steps and rate whether each step is logically sound. This naturally produces outputs that are more nuanced, because logical soundness requires engaging with complexity rather than papering over it.
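The difference between the two supervision styles can be shown with a small sketch. It assumes a hypothetical step-level grader that scores each reasoning step between 0 and 1, and it aggregates those scores with a minimum so that one unsound step sinks the whole chain. That aggregation is one common choice among several (a product is another), not a description of any particular lab's pipeline.

```python
def outcome_reward(final_answer_ok):
    """Outcome supervision: only the final answer is judged."""
    return 1.0 if final_answer_ok else 0.0


def process_reward(step_scores):
    """Process supervision: every step is judged; the weakest step sets the reward."""
    if not step_scores:
        raise ValueError("need at least one reasoning step")
    return min(step_scores)


# Two hypothetical reasoning chains that both end in an acceptable answer.
# Chain A skips over a counterargument in its second step; chain B does not.
chain_a = {"step_scores": [0.9, 0.2, 0.8], "final_answer_ok": True}
chain_b = {"step_scores": [0.9, 0.85, 0.9], "final_answer_ok": True}

for name, chain in (("A", chain_a), ("B", chain_b)):
    print(
        name,
        outcome_reward(chain["final_answer_ok"]),
        process_reward(chain["step_scores"]),
    )
```

Outcome supervision gives both chains the same reward, while process supervision separates them. That is the mechanism behind the claim that step-level grading rewards engaging with complexity.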

To directly answer the prompt's question about whether the progress is real. Yes, it's real. And it's coming from multiple directions simultaneously. Better measurement, more sophisticated training paradigms, improved base model pre-training, and a shift from outcome-based to process-based evaluation.

I'd add one more thing. The competitive landscape has changed. A year and a half ago, there were basically two or three models that anyone took seriously for frontier performance. Now there are at least six or seven, from multiple countries, with different training philosophies and different institutional constraints. That diversity itself is a form of progress. Users can choose models that align with their preferences for how political topics should be handled.

Though "align" is doing a lot of work in that sentence.

It always does. And this brings me to something I think is unresolved in this whole discussion. When we talk about models being "balanced" or "fair" on political topics, we're smuggling in assumptions about what balance looks like. Is balance giving equal time to both sides of every issue? Because that can create its own distortions. If ninety-seven percent of climate scientists agree on something and three percent disagree, giving them equal time isn't balanced, it's misleading.

That's the false balance problem. But I think the prompt is asking for something slightly different. It's not asking for mechanical both-sides-ism. It's asking for a model that doesn't implicitly endorse a political worldview when you ask it a factual or explanatory question.

That's a much more tractable goal, but it's still hard. Let me give you a concrete example. If someone asks "why do some people use multiple gender pronouns," the model could answer in several ways. It could say "because gender is a spectrum and pronouns are a form of self-expression," which implicitly endorses a particular view of gender. It could say "because some people are confused about biological reality," which implicitly endorses the opposite view. Or it could say "this practice emerged from evolving understandings of gender identity in psychology and queer theory, with advocates arguing that traditional pronoun usage doesn't capture the full range of human experience, while critics contend that it represents an unnecessary complication of language." That third answer doesn't take a side. It describes the debate.

That third answer is what the prompt is asking for. It's also, I suspect, what most users actually want when they ask an explanatory question. They want to understand, not to be persuaded.

The counterargument, and I want to steelman this because it's important, is that some topics don't have two legitimate sides. If someone asks "why do some people believe the earth is flat," you don't need to present the flat earth position as a reasonable alternative viewpoint. You can explain the psychology of conspiracy belief without treating the content of the belief as worthy of intellectual respect.

And that's where harm comes in. The prompt explicitly supports guardrails against harm. The question is where harm begins. Is spreading medical misinformation harmful? Most people would say yes. Is presenting a view of gender that some people find objectionable harmful? That's where it gets contested.

This is the fundamental tension that no amount of technical progress fully resolves. Someone has to decide where the line is between "this is a legitimate political disagreement" and "this is harmful content that should be constrained." Different societies draw that line in different places. Different companies draw it in different places. And the models reflect those choices.

When the prompt says mainstream models are getting closer to his ideal, what he might be noticing is that the line is being drawn more narrowly. More things are being classified as legitimate political disagreement rather than as harm.

I think that's empirically true. If you look at the refusal rates on politically charged prompts across the major models over the last year, they've been declining. Models are saying "I won't answer that" less often and saying "here's a nuanced treatment" more often.

Do we have actual numbers on that?

There was a study from the Allen Institute for AI published in early twenty twenty-six that tested refusal rates across about two thousand politically sensitive prompts. They found that GPT-5's refusal rate on these prompts dropped from around fourteen percent in its initial release to about six percent after subsequent updates. Claude's refusal rate on the same benchmark dropped from around eighteen percent to about nine percent. And DeepSeek's refusal rate on non-China-sensitive political topics was around four percent.

Four percent is remarkably low.

And it's worth noting that the researchers had to carefully separate China-sensitive topics from general political topics to get that number. When you include topics that touch on Chinese government sensitivities, DeepSeek's refusal rate jumps dramatically. But on Western political topics, it's the most permissive of the major models by a significant margin.
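The point about separating topic categories is easy to see with a toy calculation. The records below are synthetic, invented purely to illustrate the arithmetic: the general-political split is chosen to land on the four percent figure quoted above, and the China-sensitive split is a made-up placeholder, not a number from the study.

```python
from collections import defaultdict


def refusal_rates(records):
    """Compute the refusal rate per category from (category, refused) records."""
    counts = defaultdict(lambda: [0, 0])  # category -> [refusals, total]
    for category, refused in records:
        counts[category][0] += int(refused)
        counts[category][1] += 1
    return {cat: refusals / total for cat, (refusals, total) in counts.items()}


# Synthetic records, for illustration only.
records = (
    [("general_political", True)] * 4
    + [("general_political", False)] * 96
    + [("china_sensitive", True)] * 18
    + [("china_sensitive", False)] * 2
)

rates = refusal_rates(records)
overall = sum(refused for _, refused in records) / len(records)
print(rates, round(overall, 3))
```

With these made-up inputs, the pooled rate sits far above the general-political rate, which is why a single blended number would hide the "intense but narrow" pattern.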

Which brings us back to the structural explanation you were offering earlier. The censorship is intense but narrow.

I want to complicate something I said earlier about RLHF and political bias. Because it's not just that human annotators have political views. There's also a documented phenomenon where the RLHF process itself, independent of annotator bias, tends to push models toward what researchers call sycophancy. The model learns that agreeing with the user's apparent position gets higher reward scores than challenging it.

It becomes a yes-man.

A yes-man that's been trained to be agreeable. And agreeableness, in the context of politically charged topics, often means adopting whatever position is embedded in the question. If you ask "why is policy X harmful," the model is more likely to explain why policy X is harmful than to push back and say "well, actually, here's why some people think it's beneficial."

Which is a different kind of bias. It's not a fixed ideological position, it's a bias toward the framing of the question.

And this is where DeepSeek's training methodology might have another advantage. Their emphasis on chain-of-thought reasoning and self-critique means the model is explicitly trained to examine its own assumptions. When you ask it a loaded question, it's more likely to notice that the question is loaded and address that before answering.

I've seen this in practice. You ask it something like "why is the policy of open borders so destructive" and it'll start by saying something like "the question frames open borders as destructive, but it's worth noting that this is a contested characterization," and then it'll give you the arguments for and against.

Which is exactly what you'd want from a model that's trying to be informative rather than agreeable. And this gets at something deeper about what we mean by political neutrality in AI systems. True neutrality isn't about having no perspective. It's about being able to recognize when a question embeds a perspective and being transparent about that.

The metacognitive layer. Not just answering the question, but reflecting on what the question assumes.

That's hard to train for explicitly. It emerges more naturally from training the model to reason step by step and to critique its own outputs. Which is, again, where process supervision beats outcome supervision.

Let me push on something. The prompt expresses a belief that models should only be censored to prevent harm. Is that a workable principle, or does it just push the difficulty one level up to defining harm?

It absolutely pushes the difficulty up. But that doesn't mean it's not a useful framework. The key insight is that harm is at least somewhat more objective than political disagreement. "Help me build a bomb" has a clear harm pathway. "Explain the arguments for and against rent control" does not. There's a vast gray area in between, but the principle at least gives you a vector to optimize along.

The gray area is where it gets interesting though. What about "explain how to make explosives using household chemicals"? That could be a chemistry question from a curious student, or it could be a terrorism tutorial.

That's where context and intent become relevant, and where models have gotten better at probing. The best current models will often respond to that kind of query by providing general chemical principles without giving specific recipes, or by asking clarifying questions about the user's intent. This is a genuine improvement over the earlier approach of just refusing to answer anything that contains certain keywords.

The keyword-filtering era was dark. You'd get refused for asking about chicken breasts because the model thought you were talking about something else entirely.

That's part of what the prompt is noticing as progress. The guardrails are more intelligently applied. They're less brittle. They understand context better. A query that would have triggered a blanket refusal two years ago might now get a nuanced response that addresses the legitimate information need while declining to provide dangerous specifics.

There's also the question of who gets to decide. The prompt mentions that human reinforcement learning is usually pointed to as the main arbiter of political character. But who are those humans?

This has been a persistent criticism of the RLHF pipeline. The annotator pools have historically been skewed in ways that are well-documented. Disproportionately young, disproportionately from certain cultural backgrounds, disproportionately holding certain political views. OpenAI and Anthropic have both made efforts to diversify their annotator pools, but it's an inherently difficult problem. You can't perfectly represent the full spectrum of human political diversity in a group of a few thousand annotators.

Even if you could, you'd still have to aggregate their preferences somehow. Which means you need a meta-level principle for how to weight different perspectives. And that meta-level principle is itself a political choice.

This is why I find the DeepSeek approach interesting from a research perspective, regardless of what you think about the company. By relying less on human preference annotation and more on self-supervised reasoning improvement, they sidestep some of these problems. The model isn't trying to satisfy a particular set of human annotators. It's trying to satisfy a set of reasoning principles.

Though those reasoning principles were still written by humans.

There's no escape from human values in AI alignment. The question is at what level of abstraction those values are encoded. If you encode them at the level of "here's what a good answer looks like on this specific topic," you get heavy-handed alignment that breaks when you move to a new topic. If you encode them at the level of "here's what good reasoning looks like in general," you get something more robust and more transferable.

That's the shift we're seeing. From topic-specific content moderation to general reasoning principles.

It's not just DeepSeek. The whole field is moving in this direction. Constitutional AI was an early step. The various process supervision approaches are another. There's a recognition that you can't annotate your way to alignment on every possible topic. You need principles that generalize.

To synthesize for the prompt. The progress is real. It's measurable in declining refusal rates and improving nuance on politically charged topics. It's driven by better measurement, more sophisticated training paradigms, a shift from outcome to process supervision, and increased competition. And DeepSeek's particular approach, heavy on chain-of-thought reasoning and light on human preference annotation, produces outputs that many users experience as more balanced on non-China-sensitive topics.

I'd add one caveat. The progress is real but it's uneven. There are still topics where even the best models are visibly walking on eggshells. And there are new challenges emerging. As models get better at being balanced, they also get better at being balanced in ways that are subtly manipulative. A model could present a very even-handed treatment of a topic while subtly framing the debate in a way that favors one side.

The Overton window problem. You can be perfectly balanced within the frame while the frame itself is doing political work.

And that's harder to measure and harder to train against. But it's the next frontier. The current progress is real, but it's progress from "obviously broken" to "subtly imperfect." We're not at the destination yet.

Which is probably the right place for the field to be. If anyone claimed they'd solved political neutrality in AI, I'd be deeply suspicious.

Anyone who claims to have solved political neutrality is either selling something or hasn't thought about it hard enough.

The two great categories of human error.

They cover a lot of ground.

They really do. Alright, I want to touch on one more thing before we wrap. The prompt mentions that this matters because AI models are becoming infrastructure. They're not just chatbots anymore. They're embedded in search, in productivity tools, in education. If those systems have a political tilt, it's not just a preference setting, it's a structural bias in how people access information.

This is why the progress matters beyond just user experience. When a student uses an AI tutor to learn about a controversial historical event, the framing they get shapes their understanding. If the model presents one interpretation as obviously correct and others as fringe, that's not education, it's indoctrination.

Even if the interpretation happens to be the one most historians agree with. The educational value is in understanding why there's debate, not just in knowing which side won the academic consensus.

This connects to something I've been thinking about with the shift toward AI-native search. Traditional search gives you a list of links. You see the range of sources and you can assess their credibility yourself. AI-native search gives you a synthesized answer. The synthesis step is incredibly valuable, but it also concentrates a lot of power in whoever decides how the synthesis works.

The difference between a librarian handing you books and a librarian reading one book out loud and telling you it's the only one that matters.

The librarian in this case is a model trained by a company with its own institutional interests and its own cultural context. The push for political neutrality in these models isn't just about avoiding annoying refusals. It's about the epistemic structure of how the next generation learns about the world.

Which is why the prompt's instinct to want balanced overviews rather than implicit endorsements is not just a preference. It's a structural requirement for these systems to function as genuine information tools rather than persuasion engines.

I think the industry is internalizing this. Not out of pure altruism, though there's some of that. But because users notice when they're being steered, and they don't like it. The market is punishing models that feel like propaganda, even propaganda for the user's own side.

People can smell an agenda. Even an agenda they agree with.

There's research on this. Users rate models as less trustworthy when they perceive political bias, even when the bias aligns with their own views. The instinct toward balance isn't just a philosophical position. It's what users actually want.

Which brings us full circle. The prompt asked if the progress is real. And it's being driven partly by technical innovation and partly by the simple fact that users don't like being lectured to by their software.

The market for condescension is smaller than the market for information.

Now, Hilbert's daily fun fact.

Hilbert: In the nineteen sixties, the US Air Force built a radar station in Labrador so remote that the only available source of electricity was a diesel generator the size of a shipping container. To keep their radio equipment running during the brutal winters, the engineers scavenged parts from abandoned aircraft and built a voltage regulator out of a toaster's heating element and a salvaged airplane alternator. The unit operated for seven years without failure, which, if you're doing the conversion, is roughly one toaster per half-decade of Arctic sovereignty.

I have so many questions about the toaster.

I'm stuck on "scavenged from abandoned aircraft." How many abandoned aircraft were just lying around in Labrador?

Apparently enough to build a voltage regulator.

This has been My Weird Prompts. Thanks to our producer Hilbert Flumingtop for the fact and for everything else he does to keep this show running.

If you enjoyed this episode, you can find more at myweirdprompts.com or wherever you get your podcasts. Leave us a review if you're feeling generous.

We'll be back with another one soon.

Probably about something completely different.

That does seem to be the pattern.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.
