Welcome to My Weird Prompts. Last Tuesday, a journalist in Manchester received a death threat on X. It named her children's school. She reported it using the platform's standard reporting tool. Within twelve minutes, she received an automated response: this did not violate X's community standards.
That's not even enough time to make a cup of tea properly.
Which tells you nobody human looked at it. And that's the thing I keep coming back to — a platform that has community standards but doesn't enforce them isn't just negligent. It's creating what I'd call a reassurance mirage. You see the reporting button, you think there's a safety net, and the net dissolves the moment you fall into it.
The timing on this, the reason we're talking about it now, is that May twenty twenty-six marks the enforcement deadline for the EU Digital Services Act. Platforms with more than forty-five million users in Europe now have to publish quarterly transparency reports showing their moderation accuracy rates, their appeal success rates, how many reports actually reach a human. And the first round of numbers is revealing a gap so wide you could drive a truck through it.
How wide are we talking?
X's first DSA transparency report, filed in March, showed a zero-point-four percent user appeal success rate. Meta's was twelve percent. That's a thirty-to-one ratio. Either X's moderation is near-perfect — which the Manchester case and about a thousand others would contradict — or their appeals process is a dead end.
The DSA is effectively blowing the whistle. These platforms have been allowed to mark their own homework for years, and now they're being forced to show their work, and the work is...
It's not just one platform. The disparities across the board are massive. TikTok's internal metrics, which leaked last year in something called Project Amplify, show a twenty-two percent false negative rate for hate speech in non-English languages. That means more than one in five pieces of hate speech just sails through. And these are the platforms with the most sophisticated moderation stacks on the planet.
Daniel sent us this one — he's asking us to look at how major social media platforms define and enforce their speech policies, why we're seeing this bifurcation between what you might call traditional networks and the alt-platforms, and whether any of them are actually taking this seriously. And there's a specific question buried in there that I think is the right one to start with: is having community standards you don't enforce worse than having none at all?
That's a genuinely uncomfortable question, because it forces you to reckon with the possibility that a platform like Rumble, which openly says "we don't moderate beyond what's illegal," might be more ethically coherent than a platform that maintains the appearance of moderation while letting almost everything through.
The musical equivalent of beige wallpaper, but the wallpaper occasionally doxxes you.
That's actually not a bad way to put it. The point is, this isn't just a policy debate. It's a technical engineering problem, it's a regulatory design problem, and it's a business model problem all tangled together. And the DSA enforcement moment we're in right now is forcing all of it into the open.
Let's step back and ask: what does "fair speech" even mean when the same phrase is legal in Texas and illegal in Berlin?
That legal layer is where the whole thing gets weird. You've got Section 230 in the US, which basically says platforms aren't liable for what users post — it's the reason the modern internet exists. Then you've got Germany's NetzDG, which says if you don't remove manifestly illegal hate speech within twenty-four hours, you face fines up to fifty million euros. And now the EU Digital Services Act sits on top of both, requiring systemic risk assessments and algorithmic transparency. A single post can be legally protected speech in one jurisdiction and a criminal offense in another, and the platform has to decide which law to follow.
There's no single definition of hate speech that a platform can just code into their systems and be done with it. They're navigating three or four legal frameworks simultaneously, and those frameworks don't agree with each other.
And that's before you even get to the technical problem of what hate speech actually looks like to a machine learning classifier. In a moderation context, "hate speech" isn't a moral category — it's a training dataset. It's labeled examples of text, images, and video that human annotators have tagged as violating a specific policy. The quality of your moderation is entirely dependent on the quality and coverage of that dataset.
Datasets have blind spots.
The twenty twenty-four ACLU report found that moderation AI has a thirty-four percent false negative rate for hate speech in African American Vernacular English dialects. That's not a bug in the code — it's a gap in the training data. The annotators who label the training examples don't speak those dialects, don't recognize the coded language, and so the model never learns to catch it. Same problem with dog whistles, in-group slang, phrases that sound innocuous unless you know the subcultural context.
The AI isn't neutral. It's just bad in patterned ways that reflect who built it and who labeled its training data.
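To make "false negative rate" concrete, here is a minimal sketch of how an auditor might measure it per language group from a hand-labeled sample. The group names and the sample rows are made up for illustration, and the function name is hypothetical; nothing here reproduces the methodology of the reports mentioned above.

```python
from collections import defaultdict

def false_negative_rate_by_group(samples):
    """samples: iterable of (group, is_violation, was_flagged) tuples.

    Returns {group: missed_violations / total_violations}. Groups with no
    true violations are left out, since the rate is undefined for them.
    """
    violations = defaultdict(int)
    missed = defaultdict(int)
    for group, is_violation, was_flagged in samples:
        if not is_violation:
            continue  # false negatives are only counted over true violations
        violations[group] += 1
        if not was_flagged:
            missed[group] += 1
    return {group: missed[group] / violations[group] for group in violations}

# Made-up audit sample, not real platform data.
audit = [
    ("dialect_a", True, True),
    ("dialect_a", True, True),
    ("dialect_a", True, True),
    ("dialect_a", True, False),
    ("dialect_b", True, True),
    ("dialect_b", True, False),
    ("dialect_b", True, False),
    ("dialect_b", False, False),
]

rates = false_negative_rate_by_group(audit)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} of violations missed")
```

The point of splitting by group is that a single platform-wide rate can look acceptable while one dialect or language is missed far more often than the rest.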
And that brings me to how I think we should actually evaluate these platforms. I'd propose a three-part framework. First, what's the stated policy — what do they say they'll remove? Second, enforcement consistency — do they actually remove it, and at what rate? Third, appeals transparency — when they get it wrong, can a user challenge the decision and actually get a human to reconsider?
That third one is where the DSA numbers really bite. If your appeal success rate is zero-point-four percent, your appeals process is a Potemkin village.
It's worse than useless — it's actively misleading. And this connects to the question the prompt raises about whether having unenforced rules is worse than having none. I think the answer is yes, because it creates what you called a reassurance mirage. Users behave as if there's a safety mechanism. They report content thinking it'll be reviewed. And that reporting is essentially a dead letter.
Which brings us to the obvious question: why not just hire more moderators?
The Stanford study from twenty twenty-five put the average human moderation decision at forty-seven seconds per item. At that rate, moderating X's daily content volume would require roughly two hundred thousand full-time moderators. The industry average cost per moderation decision is two dollars and forty-seven cents. The math just doesn't work — you'd need to spend more than the platform's entire revenue on moderation alone.
Full human review at scale is mathematically impossible. Which means every platform is making an engineering tradeoff between automation and accuracy, and the real question is how honest they are about where they've drawn that line.
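The headcount claim can be sanity-checked with back-of-the-envelope arithmetic. This is a rough sketch that assumes an eight-hour shift spent entirely on review, which is an assumption of the sketch and not a figure from the study; it works backwards from the quoted numbers to the daily volume they imply.

```python
import math

SECONDS_PER_DECISION = 47   # figure quoted in the episode
SHIFT_SECONDS = 8 * 3600    # assumption: an 8-hour shift, all of it spent reviewing
MODERATORS = 200_000        # figure quoted in the episode

def moderators_needed(daily_items: int) -> int:
    """Full-time moderators needed to review every item once per day."""
    review_seconds = daily_items * SECONDS_PER_DECISION
    return math.ceil(review_seconds / SHIFT_SECONDS)

decisions_per_shift = SHIFT_SECONDS / SECONDS_PER_DECISION
implied_daily_items = MODERATORS * decisions_per_shift

print(f"Decisions per moderator per day: {decisions_per_shift:.0f}")
print(f"Daily volume implied by 200,000 moderators: {implied_daily_items:,.0f}")
print(f"Moderators for one million items a day: {moderators_needed(1_000_000):,}")
```

Under those assumptions, two hundred thousand moderators corresponds to a little over a hundred and twenty million items a day. Breaks, training, and second-pass review would all push the headcount higher.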
That's where the central tension lives. Is a platform with lax enforcement that's transparent about it — say, Rumble, which essentially says "we'll remove illegal content and nothing else" — more ethical than a platform with strict written policies that are enforced so selectively that the policies become a kind of theater?
The answer to that question depends almost entirely on which platform you're looking at. Let's start with the most extreme case, because it's the one that illustrates the whole problem in microcosm.
The Manchester journalist and her twelve-minute rejection letter.
And she's not an outlier. In March of this year, a UK journalist — I won't name her, but the case was widely covered — received forty-seven death threats in a single week. One of them explicitly named her children, said where they went to school, and described what the sender would do to them. She reported all forty-seven. She received forty-seven automated responses saying no violation was found.
Forty-seven out of forty-seven. That's not a moderation failure. That's a moderation abdication.
To understand why it happens, you have to look at what actually occurred inside X after the twenty twenty-two acquisition. The trust and safety team went from about two thousand two hundred full-time employees to roughly four hundred and fifty by January twenty twenty-three. That's an eighty percent reduction. The people who built the moderation pipelines, who understood how the classifiers worked, who maintained relationships with law enforcement in multiple countries — most of them were gone within three months.
You lose institutional knowledge, you lose the humans who could override the machines, and you're left with a skeleton crew trying to manage a platform with five hundred and fifty million users.
The machine that's left running is what I'd call reporting theater. Here's how the pipeline actually works. When you hit that report button on X, your report goes into a triage queue managed by a series of machine learning classifiers. Those classifiers scan the reported content and assign a probability score — how likely is this to be a genuine violation? If the score is above a certain threshold, it gets escalated. If it's below, it gets auto-closed.
What percentage actually reaches a human?
Less than zero-point-zero-three percent. That's three in every ten thousand reports. Everything else is auto-adjudicated. Now, the classifiers themselves are trained on labeled datasets that, as we discussed, have massive blind spots. If a death threat uses indirect language — "someone should take care of you" — or emoji — a knife emoji followed by a house emoji — the classifier often doesn't recognize it as a threat. It's looking for explicit keywords, not semantic meaning.
The more sophisticated the threat, the less likely the machine is to catch it. The people who are dangerous know how to game the system, and the people who are just angry and type out explicit slurs are the only ones who get flagged.
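The score-and-threshold triage described above can be sketched in a few lines. This is a toy illustration, not X's actual pipeline: the keyword weights, the threshold value, and the function names are all hypothetical, and a real classifier would be a trained model, not a lookup table. It does show why a scorer that keys on explicit words lets indirect threats fall below the cut-off.

```python
from dataclasses import dataclass

# Hypothetical keyword weights standing in for a trained classifier.
KEYWORD_WEIGHTS = {"kill": 0.6, "die": 0.4, "hurt": 0.3}
ESCALATION_THRESHOLD = 0.5  # assumed cut-off, not a published value

@dataclass
class TriageResult:
    score: float
    action: str  # "escalate_to_human" or "auto_close"

def triage(report_text: str) -> TriageResult:
    """Score a reported post and decide whether a human ever sees it."""
    words = (w.strip(".,!?") for w in report_text.lower().split())
    score = min(1.0, sum(KEYWORD_WEIGHTS.get(w, 0.0) for w in words))
    action = "escalate_to_human" if score >= ESCALATION_THRESHOLD else "auto_close"
    return TriageResult(score, action)

explicit = triage("I will kill you")
indirect = triage("Someone should take care of you")

print(explicit)  # explicit wording clears the threshold
print(indirect)  # indirect wording scores zero and is auto-closed
```

Lowering the threshold sends more reports to humans but raises review cost, which is the tradeoff the rest of this discussion keeps returning to.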
And the twenty twenty-four API changes made this worse in a way that doesn't get enough attention. X effectively shut down third-party moderation tools by restricting API access. Before those changes, organizations like the Center for Countering Digital Hate could run their own analysis on the platform's data, track hate speech trends, and publish independent audits. After the API restrictions, that became functionally impossible.
You remove the external auditors, you gut the internal team, and you leave the whole thing running on autopilot. What does the output look like?
The CCDH published their final comprehensive report before the API cutoff, and the numbers are stark. Hate speech impressions on X increased roughly four hundred percent between twenty twenty-two and twenty twenty-five. Not posts — impressions. How many times hate speech was seen. That's the metric that matters for real-world harm, because it measures exposure, not just production.
In place of proactive moderation, X introduced Community Notes.
Which is an interesting system, crowd-sourced context that gets appended to misleading posts, but it's not a moderation tool. A Community Note doesn't remove a death threat. It doesn't protect the person being threatened. It might add a note saying "this post contains a threat," but that's like putting a sticky note on a ticking bomb. The threat is still there, still visible, still causing harm.
It's the moderation equivalent of a "wash hands" sign in a restaurant with no soap.
That brings us to the uncomfortable structural reality. X's current approach isn't a failure of moderation — it's a deliberate architecture that treats moderation as an afterthought. The reporting system exists because it has to exist, legally speaking, but it's designed to process reports at the lowest possible cost, which means almost entirely automated, and the automation is tuned to minimize false positives at the expense of massive false negatives.
Which is a business decision dressed up as a technical one. If you auto-remove too much, users complain about censorship. If you auto-remove too little, the worst that happens is... well, a journalist gets forty-seven death threats and nobody does anything.
Now let's contrast that with Meta. They've taken a fundamentally different approach, at least structurally. They created the Oversight Board in twenty twenty, an independent body funded by a trust that Meta can't revoke, with the authority to overrule Meta's content decisions. It's often described as a kind of Supreme Court for Facebook and Instagram.
How many cases does it actually handle?
Between forty and fifty per year. Out of more than a hundred million reported items annually. So you're talking about a mechanism that reviews roughly four in every ten million moderation decisions, or about zero-point-zero-zero-zero-zero-four percent.
It's a Supreme Court that hears forty cases a year while millions of people are being sentenced in absentia by an algorithm.
That's the critique. But here's the counterpoint: the Oversight Board's twenty twenty-five annual report showed that when it does review cases, it overturns Meta's original moderation decision seventy-three percent of the time. Nearly three out of four of the decisions Meta made in those cases were wrong, according to its own independent review body.
Which tells you two things. One, Meta's initial moderation is unreliable. Two, the Oversight Board is actually doing its job when it gets the chance. The problem is the scale mismatch.
The Board itself has been pushing to move from individual case review to systemic policy recommendations. In twenty twenty-five, they issued a major recommendation on how Meta handles hate speech against political figures, and Meta committed to implementing most of it. So you could argue the Board's real value isn't in the forty cases it reviews, but in the policy changes those cases trigger.
Or you could argue it's a forty-million-dollar-a-year PR exercise that gives Meta a credible-sounding answer when regulators ask about accountability.
Both things can be true. And I think that's the honest assessment — it's a genuine accountability mechanism that also happens to be excellent PR. The question is whether the genuine part outweighs the PR part, and I think the seventy-three percent overturn rate suggests it does, at least directionally.
What about TikTok? They're the platform everyone loves to hate, but I've heard their moderation stack is actually the most sophisticated in the industry.
TikTok runs a three-layer machine learning pipeline that processes over a hundred million videos daily. The first layer analyzes text — captions, comments, on-screen text via OCR. The second layer analyzes audio — speech-to-text, tone detection, even background music for copyrighted or prohibited content. The third layer analyzes the visual feed frame by frame. Each layer generates a risk score, and the scores are combined into a final moderation decision.
They're doing multimodal analysis in near real-time on a firehose of content.
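The episode doesn't say how the three per-layer scores are combined, so the following is only a sketch of one common option, a noisy-OR that treats the layers as independent signals. The function names and both thresholds are assumptions made for illustration and are not taken from TikTok or any leaked document.

```python
from math import prod

def combine_risk(text: float, audio: float, visual: float) -> float:
    """Noisy-OR: chance that at least one layer has found a real violation,
    assuming the three layer scores are independent probabilities."""
    return 1.0 - prod(1.0 - score for score in (text, audio, visual))

def decide(score: float, remove_at: float = 0.9, review_at: float = 0.5) -> str:
    """Map a combined score to an action. Both thresholds are hypothetical."""
    if score >= remove_at:
        return "remove"
    if score >= review_at:
        return "human_review"
    return "allow"

# Three moderately suspicious layers add up to a case for human review.
borderline = combine_risk(text=0.2, audio=0.3, visual=0.4)
print(f"{borderline:.3f} -> {decide(borderline)}")

# One very confident layer is enough on its own.
obvious = combine_risk(text=0.95, audio=0.0, visual=0.0)
print(f"{obvious:.3f} -> {decide(obvious)}")
```

A maximum or a learned weighting would be equally plausible choices. The noisy-OR is used here because it lets several weak signals add up, which is the main argument for multimodal analysis in the first place.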
They employ roughly forty thousand human moderators globally as of twenty twenty-five, which is more than any other platform by a significant margin. On paper, this should be the gold standard.
The leaked Project Amplify document said something different.
Twenty-two percent false negative rate for hate speech in non-English languages. That's more than one in five pieces of hate speech getting through. And Project Amplify revealed something even more troubling — TikTok deliberately reduced moderation of hate speech against political figures during election periods in twenty twenty-four and twenty twenty-five, on the grounds that political speech deserves wider latitude.
They have the technical capability to catch hate speech, but they're making editorial decisions about when to deploy it based on political calculations.
That's the core tension with TikTok. They have the best tools in the industry, but those tools are wielded according to priorities set by a company that answers to a government that doesn't exactly have a sterling record on free expression. The moderation stack is world-class. The governance of that stack is opaque and, per the leaked documents, sometimes actively permissive of content that serves strategic interests.
Across the three major incumbents, we've got X with a gutted team and an automated pipeline that catches almost nothing, Meta with a structurally independent oversight mechanism that touches a microscopic fraction of cases, and TikTok with the best technology in the world that gets selectively dialed down when it's politically inconvenient.
That's the landscape. And it explains why the alt-platforms even exist. If the incumbents are failing at moderation in three different ways, there's an obvious market opportunity for someone to do it differently.
The alt-platforms enter the market with a simple pitch — we'll handle moderation differently. And the most technically interesting of the bunch is Bluesky.
Bluesky's the one with the composable moderation, isn't it? Users can subscribe to third-party moderation services instead of relying on a central team.
They built something called Ozone, which is an open-source moderation tooling system. The idea is that instead of one company deciding what crosses the line, users can choose from community-created moderation lists. As of May twenty twenty-six, there are over twelve hundred of these lists, everything from "block all known spammers" to "hide posts containing anti-Semitic tropes" to "filter out unlabeled AI-generated images."
It's moderation as a marketplace. You pick your filters, you curate your own experience, and the platform itself doesn't have to make the hard calls.
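The "pick your filters" idea can be sketched as a feed that hides anything flagged by a list the user has subscribed to. This is a conceptual illustration only: the class, the list names, and the matching rules are hypothetical and do not reflect Ozone's or the AT Protocol's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class ModerationList:
    """A hypothetical third-party list: authors and terms its maintainers hide."""
    name: str
    hidden_authors: set = field(default_factory=set)
    hidden_terms: set = field(default_factory=set)

    def flags(self, author: str, text: str) -> bool:
        lowered = text.lower()
        return author in self.hidden_authors or any(
            term in lowered for term in self.hidden_terms
        )

def visible_feed(posts, subscriptions):
    """Keep only the posts that none of the subscribed lists flag."""
    return [
        (author, text)
        for author, text in posts
        if not any(lst.flags(author, text) for lst in subscriptions)
    ]

posts = [
    ("alice", "Lovely weather in Manchester today"),
    ("spambot42", "Buy now, limited offer"),
    ("bob", "You can buy now or wait for the sale"),
]
spam_authors = ModerationList("known-spammers", hidden_authors={"spambot42"})
sales_terms = ModerationList("no-sales-pitches", hidden_terms={"buy now"})

default_view = visible_feed(posts, [])                  # no subscriptions: everything shows
one_list = visible_feed(posts, [spam_authors])          # hides the spam account
two_lists = visible_feed(posts, [spam_authors, sales_terms])  # also hides bob's post
print(len(default_view), len(one_list), len(two_lists))
```

Two things fall out of even this toy version. A user with no subscriptions sees everything, and bob's harmless post disappears for subscribers of the second list with no notice to bob.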
Because Bluesky is relatively small — roughly fifteen million users — they can actually afford to put human eyes on every single hate speech report that comes through their first-party system. At that scale, full human review is economically viable. The math works.
Does the composable model actually work, or does it just shift the responsibility onto users who don't know what they're signing up for?
There was a revealing test in February twenty twenty-six. A Bluesky user subscribed to a third-party moderation list called, I believe, "Hate Speech Shield," which was maintained by a small team of volunteer moderators. That list flagged ninety-four percent of posts from a specific political figure — someone known for inflammatory rhetoric. It effectively made that account invisible to anyone subscribed to the list.
Which demonstrates both the power and the risk in one move. Power because the user took control of their own feed and it worked. Risk because a small group of volunteers can effectively de-platform someone for a large number of users with zero due process.
There's no appeals mechanism for the person being blocked. No transparency about why a particular post was flagged. The list maintainers are accountable to no one. You've traded centralized corporate power for decentralized, unaccountable power.
Like replacing a king with twelve hundred warlords.
That's the uncomfortable framing. But Bluesky's argument is that centralized moderation was already failing — the warlords are at least visible and opt-in. If you don't like how one list operates, you unsubscribe and choose another. The user has agency in a way they never did with Facebook or X.
Until you realize that most users will never configure anything. They'll stay on defaults, and defaults are designed for engagement, not safety.
Which is the exact critique the EFF made in their twenty twenty-six guide to platform reporting. Default settings on every major platform optimize for time-on-site, not user wellbeing. Composable moderation is powerful, but only for the minority who actively use it.
Bluesky is the best option for the kind of person who reads privacy policies and configures their own firewall rules. For everyone else, it's just another feed.
Now let's talk about the other end of the alt-platform spectrum.
The video platform that markets itself as the free speech alternative to YouTube.
Their twenty twenty-five financial disclosure told the whole story. Rumble spent zero dollars on content moderation staff. Not a reduced budget, not an outsourced team. Zero. Their entire moderation system consists of user reports and automated keyword filtering. No human review, no appeals process, no community guidelines beyond "nothing illegal."
Which is ethically coherent, I'll give them that. They're not pretending to moderate. They're not building a Potemkin village of community standards. They're saying "we're a pipe, not a publisher," and they mean it.
The market has delivered its verdict. Throughout twenty twenty-five and into twenty twenty-six, Rumble has been bleeding creators back to YouTube. Big names who made the switch during the twenty twenty-two exodus have been quietly returning.
Rumble couldn't build a viable ad revenue model because advertisers don't want to be associated with completely unmoderated platforms. No brand safety means no brand dollars. YouTube's moderation might be inconsistent, but it's consistent enough that Procter and Gamble will run ads there. They won't touch Rumble.
The free speech absolutism turns out to be a business liability. Creators want an audience, audiences attract advertisers, advertisers want guardrails. Remove the guardrails and the whole economic model collapses.
This is where the value proposition for these alt-platforms starts to look shaky. If X is already "anything goes" — effectively unmoderated for the vast majority of hate speech — what does Rumble offer that X doesn't? What does Truth Social offer?
Truth Social is an interesting case because they actually do have moderation, technically speaking. They outsourced it to a single vendor — WebPurify.
WebPurify reviews less than zero-point-zero-one percent of reported content on Truth Social, according to their own service-level disclosures. They're a keyword filtering company that offers a thin layer of human review as a premium add-on, and Truth Social appears to have purchased the bare minimum tier.
It's the illusion of moderation at an even lower tier than X. X at least has the remnants of a trust and safety team. Truth Social has a third-party keyword filter and a handful of contract reviewers who might glance at a post if it contains enough flagged terms.
This brings us to the critical mass paradox. Bluesky can afford human review on a hundred percent of reported hate speech because it has fifteen million users. X has five hundred and fifty million users and cannot. The cost per human moderation decision, per the twenty twenty-five Content Moderation Industry Report, is two dollars and forty-seven cents.
Two forty-seven per decision. Multiply that by the millions of reports a platform like X receives daily, and you're looking at a number that exceeds the GDP of small countries.
The math is brutal. If X received just ten million reports a day, a conservative estimate for a platform of that size, full human review would cost roughly twenty-five million dollars daily. That's about nine billion dollars a year. X's total revenue in twenty twenty-four was estimated at around three billion.
Full human moderation would cost three times the company's entire revenue. It's not a policy choice — it's mathematically impossible at that scale.
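The cost comparison is simple enough to reproduce. This is a minimal sketch using only the figures quoted in the conversation; the ten million reports a day is the speakers' hypothetical and not a measured volume, and the function name is made up for the example.

```python
COST_PER_DECISION = 2.47          # dollars per human decision, as quoted
REPORTS_PER_DAY = 10_000_000      # the hypothetical daily volume used above
ANNUAL_REVENUE = 3_000_000_000    # dollars, the rough revenue estimate quoted

def annual_review_cost(reports_per_day: int, cost_per_decision: float) -> float:
    """Yearly cost of putting every report in front of a human reviewer."""
    return reports_per_day * cost_per_decision * 365

daily_cost = REPORTS_PER_DAY * COST_PER_DECISION
annual_cost = annual_review_cost(REPORTS_PER_DAY, COST_PER_DECISION)
revenue_multiple = annual_cost / ANNUAL_REVENUE

print(f"Daily cost:  ${daily_cost:,.0f}")
print(f"Annual cost: ${annual_cost:,.0f}")
print(f"Multiple of annual revenue: {revenue_multiple:.1f}x")
```

Changing `REPORTS_PER_DAY` shows how sensitive the conclusion is to the volume assumption. The three-times-revenue figure holds only at ten million reports a day.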
This is the uncomfortable conclusion that the industry doesn't want to state plainly. Perfect moderation at scale is not just difficult or expensive — it is computationally and economically impossible. Every platform with more than roughly fifty million users is making a tradeoff between coverage and cost, and the tradeoff always, always favors cost.
Which means the platforms that claim to moderate effectively at scale are either lying or defining "effectively" in a way that would make a statistician weep.
The EU's Digital Services Act is forcing some honesty into this conversation. The systemic risk assessment requirements mean platforms over forty-five million users have to publish their moderation accuracy rates quarterly. X's first DSA transparency report, filed in March twenty twenty-six, showed that zero-point-four percent user appeal success rate we mentioned.
Zero-point-four. For context, Meta's was twelve percent. So either X's moderation is near-perfect, which every other piece of evidence contradicts, or the appeals process is a dead end.
It's the dead end. When your appeal success rate is one-thirtieth of your competitor's, you're not doing appeals. You're processing appeals tickets into a shredder and mailing back a form letter.
Which is worse than having no appeals process at all. At least Rumble is honest about what it is.
That's the question we keep circling back to. Is a platform that honestly admits it doesn't moderate more ethical than one that builds an elaborate machinery of fake moderation?
I think the answer is yes, with a caveat. The honesty is ethically preferable, but the outcome for users might be worse. On Rumble, you know there's no lifeguard, so you stay out of the deep end. On X, there's a lifeguard chair, a whistle, a rescue buoy — and nobody in the chair.
The reassurance mirage you named earlier.
It gets people hurt who might otherwise have been more careful.
Given all this, what can someone actually do? If you want to use social media without being exposed to a firehose of hate speech, where do you even start?
The EFF published a practical guide earlier this year on exactly this, and their number one recommendation is refreshingly blunt. Stop relying on defaults.
Defaults are the enemy.
On every major platform, default settings optimize for engagement, which means the algorithm will serve you whatever keeps you scrolling, and outrage keeps people scrolling. So step one is configuring your experience. On Bluesky, that means subscribing to at least two or three community moderation lists. On Meta's platforms, it means digging into the content preference controls, which are buried but functional. On X, frankly, it means muting and blocking aggressively and understanding that the report button is not a safety tool.
Which brings us to the second thing. The "report and forget" reflex — where you flag something, get the auto-reply, and move on — is ineffective everywhere. But there are escalation paths that actually trigger different review processes.
On X, for example, filing a legal complaint in your jurisdiction — not a standard report, but a formal legal notice citing applicable law — routes through a completely different pipeline. It goes to an actual legal team rather than the ML classifier. The EFF guide walks through the specific language to use.
In the EU, the DSA gives users a formal right to appeal moderation decisions to an out-of-court dispute settlement body. That's not a platform process — it's a regulatory one, and the platform has to engage with it.
Which connects to the third and maybe most important actionable point. We are in a critical regulatory window right now. The DSA's transparency requirements are forcing platforms to reveal failure rates they'd rather keep hidden, and that data is what makes accountability possible.
For listeners in the US, where there's no federal equivalent yet, state-level bills are the lever. California's AB twenty-two seventy-three mandates moderation accuracy reporting. Supporting those bills — contacting state representatives, showing up for comment periods — is how the DSA-style transparency framework gets built on this side of the Atlantic.
Transparency changes behavior. When X had to publicly disclose that zero-point-four percent appeal success rate, it became a story. It forced a conversation. Without the reporting requirement, that number stays buried in an internal dashboard.
The practical takeaway is three things. One, configure your moderation settings actively — defaults are not your friend. Two, use escalation paths, not standard reports, when you encounter serious threats. Three, support transparency legislation, because sunlight is the only thing that's ever made platforms squirm.
None of this solves the fundamental scale problem we've been talking about. But it's what you can do today, on the platforms that exist, without waiting for a regulatory miracle or a technological breakthrough that probably isn't coming.
Maybe the real question isn't which platform to use or how to configure it. It's whether the entire centralized model is fundamentally broken.
The fediverse argument.
If moderation at scale is mathematically impossible — and the numbers we've been looking at suggest it is — then maybe the answer isn't better moderation at scale. Maybe it's smaller communities where moderation is actually tractable.
Mastodon's been running on that model for years. Individual servers set their own rules, enforce them locally, and defederate from servers that don't meet their standards.
Bluesky's AT Protocol is a more sophisticated version of the same idea. You're not locked into one platform's moderation decisions. You can move your identity, your followers, your entire social graph to a different provider with different moderation policies.
Which sounds utopian until you remember that most people don't want to think about server selection or protocol migration. They want to open an app and scroll.
The fediverse has always struggled with that onboarding friction. But here's the thing — the EU's Digital Services Act might force the issue. X has already announced it will geoblock EU users entirely rather than comply with DSA moderation requirements.
Wait, they actually said that?
It was buried in their March twenty twenty-six transparency filing. The language was corporate-speak — "exploring geographic service adjustments to accommodate divergent regulatory frameworks" — but the meaning is clear. They're going to wall off Europe.
We're heading toward a two-tier internet. EU users get regulated platforms with mandated transparency and appeal rights. Everyone else gets whatever the platforms decide to serve them.
The twenty twenty-six US midterms will be the first major test of this split. American political discourse will run on effectively unmoderated platforms while European discourse runs on DSA-compliant ones. We're about to see what happens when the same election cycle plays out under two completely different content moderation regimes.
Which brings us back to the ethical question we keep circling. If the choice is between platforms that pretend to moderate and platforms that honestly don't, which is actually worse?
I keep coming back to Rumble and Truth Social. I disagree with their approach to moderation — or rather, their absence of one — but there's something almost refreshing about the honesty. Nobody on Rumble thinks they're protected. Nobody files a report expecting human review.
The reassurance mirage doesn't exist because the reassurance was never offered.
And I wonder if, in the long run, that honesty is less damaging than X's elaborate theater of community standards that don't actually get enforced.
It's a deeply uncomfortable conclusion. We're essentially saying that the platform with the worst moderation might be more ethical than the platform with the second-worst moderation, because at least it doesn't lie about what it is.
That's where we should leave it, I think. Not with an answer, but with a question that the industry is going to have to confront in the next few years. Can you build a social media platform at global scale that moderates effectively, transparently, and honestly? Or does scale itself make honesty impossible?
Now: Hilbert's daily fun fact.
Hilbert: In the 1780s, fishermen in the Aral Sea basin reported a species of blind cave fish that would swim directly into nets during the day but somehow avoid them completely at night, leading local communities to believe the fish could sense the nets through vibrations in the water — a behavior that wouldn't be scientifically documented in any species for another century.
We'll be back next week with an episode on the fediverse experiment — can decentralized moderation actually work when it leaves the white papers and hits the real world? Should be a lively one.
This has been My Weird Prompts. Thanks to our producer Hilbert Flumingtop. Find us at myweirdprompts dot com or wherever you get your podcasts.
Until next time.