#3017: Why Every Restaurant Has 4.6 Stars

Google Maps ratings are broken. Here's how four mechanisms inflate them — and what actually works instead.

Episode Details

Episode ID: MWP-3187
Published: May 23
Duration: 28:09
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: deepseek-v4-pro
Topics: social-engineering review-inflation rating-manipulation

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

Online restaurant ratings are systematically inflated, and the problem runs deeper than fake reviews. Four distinct mechanisms create this distortion. First, Google's automated review filter reportedly suppresses 15-20% of legitimate negative reviews as a side effect of optimizing for user engagement, since negative reviews make people close the app. Second, Google My Business lets owners flag reviews with one click, which can trigger automatic temporary removal while reviewers navigate a deliberately labyrinthine appeals process. Third, Israel's 2024 Amendment Eleven to the Defamation Law reversed the burden of proof, making negative reviews of commercial establishments presumptively defamatory and forcing reviewers to prove good faith. Fourth, review-for-reward programs inflate ratings: a Reichman University study found that 34% of five-star reviews for Israeli restaurants less than two years old came within 24 hours of the customer receiving an incentive.

The solution may lie in abandoning five-star scales altogether. MIT Media Lab research found that binary recommendation systems ("would you recommend this to a friend, yes or no") have 2.3x higher predictive validity for future satisfaction than five-star scales. Gett's Tel Aviv experiment offers one supporting data point: switching driver ratings to thumbs-up/thumbs-down reduced disputes by 40% without degrading service-quality signals. Waze's reliance on implicit driving data suggests a further step: behavioral signals such as return rate, dwell time, and ordering patterns could replace explicit evaluation entirely, avoiding both rating inflation and defamation risk.

#3017: Why Every Restaurant Has 4.6 Stars

Daniel sent us this one — he's been noticing what I think we've all felt lately. You open Google Maps, you're hungry, you see a place with four point six stars and three hundred reviews, you show up, and it's... The hummus is fine. The service is fine. And you sit there thinking, how did this place get four point six stars when it's the culinary equivalent of beige wallpaper? The question is whether crowdsourced review systems have actually become useless as quality indicators, and whether there's a better model — something built on pure recommendation rather than this broken star-rating paradigm. There's a lot to unpack here.

There really is. And you've put your finger on something that has a name in the research literature — it's called rating inflation, and it's not just your imagination. There was a cross-city analysis done last year comparing Google Maps ratings in Tel Aviv versus Tokyo, two cities with roughly comparable restaurant quality distributions by any objective measure. Tel Aviv restaurants average zero point six stars higher. Same quality of food, systematically higher ratings. That's not about the hummus being better. That's the system itself bending the numbers.

Zero point six stars is enormous on a five-point scale. That's the difference between "this place exists" and "I would bring my mother here."

And it gets worse when you look at the mechanisms producing that gap. Google has an automated review filtering system — it's a machine learning model that flags reviews as suspicious based on things like account age, location history, how many reviews you've left in a short period, whether you've actually visited the place according to your location data. The model is designed to catch spam and fake reviews. But here's the problem — independent audits, including a twenty twenty-five analysis by ReviewMeta, suggest that fifteen to twenty percent of legitimate negative reviews get caught in that filter and suppressed.

One in five genuine bad reviews just...

Without the reviewer ever knowing. Google doesn't notify you when your review is filtered. It still appears to you when you're logged in — it looks like it's published. But nobody else can see it. It's what they call a shadow ban.

Of course they call it that. The tech industry's gift to euphemism. So the algorithm isn't biased against businesses, it's biased against negative sentiment, and the mechanism is invisible to the person whose content just disappeared.

That's the key misconception people have. The assumption is that Google removes negative reviews because they're somehow in cahoots with business owners, or because they're afraid of defamation lawsuits. That's not the mechanism. The algorithm is optimized for user engagement. Negative reviews reduce engagement — they make users less likely to click through to a place, less likely to spend time on the platform, less likely to click ads. So the model learns, as a side effect of its optimization target, that suppressing negative reviews is good for the metrics. It's not malice. It's misaligned incentives.

The algorithm doesn't hate your honest opinion. It just hates anything that makes you close the app. Which is almost more infuriating, because there's no villain to point at — just a system that's been tuned to prefer pleasant fictions over useful truths.

That's only the first mechanism. The second one is the business owner toolkit. Google My Business gives owners the ability to flag and report reviews they don't like. A single owner complaint can trigger an automatic temporary removal while the review is "under investigation." The reviewer then has to navigate an appeals process that is, and I'm quoting from a twenty twenty-four analysis of this, deliberately labyrinthine. Multiple steps, unclear criteria, no human contact. Meanwhile the business owner can just keep reporting the same review.

The asymmetry is built in. The business has a one-click button to make your criticism disappear, and you have to fill out what amounts to a digital visa application to get it reinstated. Most people just...

Most people don't. And then we get to mechanism three, which is the one that's particularly relevant here in Israel. In twenty twenty-four, the Knesset passed Amendment Eleven to the Defamation Law. Before this amendment, if a business sued you for defamation over a negative review, they had to prove you acted with malice or reckless disregard for the truth. After the amendment, the burden shifted. Now, if you leave a negative review of a commercial establishment, it is presumptively defamatory, and you — the reviewer — must prove you acted in good faith.

You're guilty until you prove you weren't just being mean. That's a fundamental reversal of how defamation law normally works.

And it had an immediate chilling effect. The case everyone in legal circles talks about is HaMiznon versus Reviewer in the Tel Aviv Magistrate Court, twenty twenty-five. A customer left a three-star review of a hummus place — not a one-star rant, a three-star review that said the hummus was "fine but overpriced" and the service was "slow." The restaurant demanded eight thousand shekels in a settlement letter, citing the new amendment. The reviewer settled rather than fight.

Eight thousand shekels for "fine but overpriced." That's the "yuck" tax we talked about before, except now it's not just one case — it's structurally embedded in the legal system.

That's a three-star review. Imagine what happens to someone who leaves one star and says the food was bad. The chilling effect isn't theoretical. Law firms in Tel Aviv have started advertising "review defense" services to small businesses. The message is clear: leave a negative review, and you might get a lawyer's letter in your inbox.

You've got three layers now. Google's algorithm silently disappears your criticism. The business owner can flag it into oblivion. And if somehow it survives both of those, you might get sued under a law that presumes you're a bad actor. The rational response for any individual is to just... not leave negative reviews.

Which brings us to mechanism four, the one that fills the vacuum. Review-for-reward programs. A twenty twenty-five study from Reichman University analyzed Google Maps reviews for restaurants in Israel that were less than two years old. They found that thirty-four percent of five-star reviews were left within twenty-four hours of the customer receiving an incentive — free dessert, ten percent off the next visit, a small drink. The review isn't fake in the sense that a bot wrote it. A real person ate there. But the five stars were purchased for the price of a complimentary baklava.

Thirty-four percent. So more than a third of the glowing reviews for new restaurants are essentially paid endorsements dressed up as organic enthusiasm. And Google's official policy prohibits this, but enforcement is basically nonexistent because how do you prove someone got a free dessert?

You can't, at scale. And this gets to the structural problem. It's not that there are bad actors gaming the system. It's that the system incentivizes everyone to become a bad actor. The business owner who doesn't offer review incentives is at a competitive disadvantage. The customer who leaves an honest three-star review is taking on legal risk. The algorithm rewards positivity and punishes negativity. Every incentive points in the same direction — inflate, inflate, inflate.

We end up in a world where every shawarma joint has four point five stars and none of them are memorable. The signal has been completely drowned out by the noise of systematic incentive distortion. Which brings us to the core question from the prompt — is there an alternative? Can you build a review system that actually works on recommendations rather than trying to do everything?

Let's look at what's actually broken before we try to fix it. The five-star scale itself has a fundamental problem that most people don't think about. Different users anchor differently. Your four is my three. Your three is someone else's two. Research from the MIT Media Lab in twenty twenty-three found that binary recommendation systems — just "would you recommend this to a friend, yes or no" — have two point three times higher predictive validity for future satisfaction than five-star scales.

Two point three times. So asking people to assign a number between one and five is not just less informative than a simple yes or no — it's dramatically worse.

Because the granularity creates noise. When you ask someone to rate on a five-point scale, you're asking them to make multiple judgments simultaneously. How good was the food? How were my expectations set? What's my personal baseline for "four stars"? Am I comparing this to fine dining or to other shawarma places? Different people answer different questions when they see the same five stars. With a binary recommendation, you're asking one question: given your experience, would you send a friend here? That has much higher inter-rater reliability because it collapses all those sub-judgments into a single actionable signal.
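The anchoring argument can be illustrated with a toy model. This is only a sketch under stated assumptions: the latent `experience` score, the `personal_offset` values, and the 0.5 recommendation threshold are all invented for illustration, and the model assumes the yes/no answer depends only on the experience itself.

```python
# Toy model of rater anchoring. Every number here is an illustrative
# assumption, not data from any study mentioned in the episode.

def stars(experience: float, personal_offset: float) -> int:
    """Map a latent experience score (0..1) to 1-5 stars.

    personal_offset models the rater's own anchor: a generous rater
    shifts upward, a strict rater shifts downward.
    """
    raw = 1 + 4 * experience + personal_offset
    return max(1, min(5, round(raw)))


def would_recommend(experience: float, threshold: float = 0.5) -> bool:
    """Single question: was the visit good enough to send a friend?"""
    return experience >= threshold


experience = 0.6  # the same meal, judged by two different raters

generous = stars(experience, personal_offset=+0.7)
strict = stars(experience, personal_offset=-0.7)

print("generous rater:", generous, "stars")
print("strict rater:  ", strict, "stars")
print("recommend?     ", would_recommend(experience))
```

In this toy setup the two raters disagree by a full star on the same meal, while the binary question yields the same answer for both. Real raters will not share a threshold this neatly, so treat it as an intuition pump rather than evidence.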

It's the difference between asking someone to describe the weather on a scale of one to five and asking them "should I bring an umbrella." One is a subjective assessment, the other is a decision.

That's exactly the right framing. And we've seen this work in practice. Gett, the taxi platform, ran an experiment in Tel Aviv in twenty twenty-five where they switched from a five-star driver rating system to a simple thumbs-up thumbs-down. Disputes dropped by forty percent. Service quality signals didn't degrade at all. The simpler system was just as informative and generated far less friction.

Forty percent fewer disputes. So not only is the binary system more predictive, it's less contentious. People don't agonize over whether a driver was a four or a five. They just know if they'd ride with them again.

That's the recommendation paradigm. The prompt asks whether we could have a system that works on recommendations rather than trying to evaluate everything comprehensively. I think the answer is yes, and the Gett experiment is one data point. But there's a deeper insight here about what kind of signal we're actually trying to capture.

Think about Waze. Waze doesn't ask you to rate roads on a five-star scale. It doesn't ask you to write reviews of your commute. It collects implicit signals — your speed, your braking patterns, whether you deviate from the suggested route. From those signals, it infers where the traffic is, where the accidents are, where the potholes are. The quality signal emerges from behavior, not from explicit evaluation.

Apply that to restaurants. Instead of asking people to write reviews, track whether they come back. Track how long they stay. Track whether they order dessert — people order dessert when they're having a good time.

Return rate is the single strongest signal of restaurant quality, and almost no review platform captures it. If a place has a hundred five-star reviews but nobody ever goes twice, that's a different signal than a place with fifty reviews and an eighty percent return rate. Dwell time is another one — people linger at good restaurants. They rush through bad ones. These are behavioral signals that are much harder to fake than a star rating.
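A rough sketch of how return rate and dwell time might be computed from a visit log. The log format, the customer IDs, and the definition of "returning" (two or more visits) are assumptions made for this example, not something any platform named here is known to use.

```python
from collections import Counter
from statistics import median

# Hypothetical visit log: (customer_id, minutes_stayed). Invented data.
visits = [
    ("a", 70), ("b", 45), ("a", 80),
    ("c", 30), ("b", 60), ("a", 75),
]


def return_rate(log):
    """Share of distinct customers who visited at least twice."""
    counts = Counter(customer for customer, _ in log)
    if not counts:
        return None  # no visits, no signal
    returning = sum(1 for n in counts.values() if n >= 2)
    return returning / len(counts)


def median_dwell_minutes(log):
    """Median stay length; the median resists a few extreme visits."""
    if not log:
        return None
    return median(minutes for _, minutes in log)


print("return rate:", return_rate(visits))
print("median dwell (min):", median_dwell_minutes(visits))
```

Neither number requires anyone to write a sentence about the restaurant, which is the property the hosts are pointing at.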

They avoid the defamation problem entirely. You're not saying the food was bad. You're just... not coming back. The data point exists without anyone having to make a claim that could get them sued.

There's a restaurant in Jaffa that figured this out. It was a pop-up called No Stars — and that was literally the concept. They deliberately opted out of all review platforms. No Google Maps listing, no Yelp, no nothing. Reservations were through a private WhatsApp group. The only way to hear about it was through someone who'd already been. They sold out every night for six months.

The absence of a rating system became the signal. If you heard about No Stars, it meant someone you knew had been there and thought enough of it to tell you personally. That's a higher-trust recommendation than four point eight stars and eight hundred reviews could ever be.

That's the model I think is most promising — what I'd call a friend network recommendation system. Instead of seeing reviews from strangers whose taste you don't know, you only see recommendations from people whose taste profile matches yours. Letterboxd does this for movies. You follow people whose taste you trust, and their ratings shape your recommendations. The platform doesn't pretend to be objective. It's explicitly subjective, and it works because it embraces that subjectivity.

The problem, of course, is scale. Letterboxd works for film buffs who are already curating their taste. Would this model work for finding a plumber in a new city where you don't know anyone?

That's the tradeoff, and I don't want to pretend it's not real. A friend network model is inherently limited by the size and density of your network. It works brilliantly for some things — restaurants in your neighborhood, movies, books — and breaks down for others. But I think we're asking the wrong question when we say "will it scale to cover everything." The current system tries to cover everything, and it's failing at exactly that. Maybe the answer is multiple systems for multiple contexts rather than one universal review platform.

The friend network for restaurants and entertainment, and something else for plumbers. What's the something else?

Verified experience credentials. Instead of reviews, you get something closer to a professional certification that's tied to actual completed jobs. Did the plumber show up on time? Did they fix the problem? Was there a callback within thirty days? These are objective facts, not subjective evaluations. A platform could verify them through transaction records and scheduling data without anyone ever having to write "great plumber, five stars."

You're separating the objective from the subjective entirely. For things where taste matters — restaurants, movies, music — you use recommendation networks calibrated to your preferences. For things where competence matters — plumbers, doctors, lawyers — you use verified outcome data. And for neither do you ask strangers to assign numbers on a five-point scale.

There's an Israeli startup that's already doing something like this for a taste-driven category. Have you heard of Profile Wines?

I have not.

Founded in twenty twenty-four. Instead of asking users to rate wines on a five-star scale, they built what they call a taste genome. You rate wines on specific dimensions — tannin level, acidity, fruitiness, body. You're describing attributes, not making value judgments. The platform then matches you with wines that fit your taste profile. And here's the clever part — because you're describing characteristics rather than saying "this wine is bad," the defamation problem disappears entirely. You can't be sued for saying a wine has high tannins.
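One way attribute-based matching could work is nearest-neighbor search over attribute vectors. The sketch below is a guess at the general idea, not a description of how Profile Wines actually does it: the 0-10 scale, the wine names, the scores, and the choice of Euclidean distance are all assumptions.

```python
import math

# Hypothetical attribute scores on a 0-10 scale. Names and values invented.
wines = {
    "Wine A": {"tannin": 8, "acidity": 5, "fruitiness": 3, "body": 9},
    "Wine B": {"tannin": 2, "acidity": 7, "fruitiness": 8, "body": 3},
    "Wine C": {"tannin": 6, "acidity": 6, "fruitiness": 5, "body": 7},
}

# A user's preferred profile, built from the attributes they described.
user_profile = {"tannin": 7, "acidity": 5, "fruitiness": 4, "body": 8}


def distance(a, b):
    """Euclidean distance between two attribute dictionaries."""
    return math.sqrt(sum((a[key] - b[key]) ** 2 for key in a))


def rank_matches(profile, catalog):
    """Return wine names ordered from closest to farthest match."""
    return sorted(catalog, key=lambda name: distance(profile, catalog[name]))


print(rank_matches(user_profile, wines))
```

Nothing in the data says any wine is good or bad. The ranking comes entirely from how close each wine's described attributes sit to the user's profile.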

That's genuinely elegant. They've sidestepped the entire legal and incentive problem by changing what they're measuring. You're not evaluating quality, you're reporting attributes, and the quality signal emerges from the matching algorithm.

It connects to the broader point about what's broken in current review systems. The problem isn't that people are dishonest. It's that we're asking them to do something — assign a universal quality score — that they can't actually do in a consistent way, and then we're building a whole system of incentives around that broken measurement.

The metric becomes the target, and the target becomes corrupted. Goodhart's Law in action. So let's pull on the Amazon thread, because I think it illustrates the same dynamic in a different domain. Amazon reviews used to be useful. Now they're a wasteland of fake five-star reviews, review clubs, and AI-generated text that reads like it was written by a marketing intern who's being held hostage.

Amazon's been fighting this for years. In twenty twenty-five they rolled out an "AI verified purchase" badge to try to restore trust, and it basically hasn't worked. The problem is structural — a verified purchaser who got the product for free in exchange for a review is still a verified purchaser. The badge verifies the transaction, not the honesty of the evaluation.

The review clubs are the same dynamic as the free dessert. "Join our VIP reviewer program and get free products" — it's not illegal, it's not even against Amazon's terms of service in many cases, but it turns every review into an advertisement.

This is why I keep coming back to the same conclusion. Verification doesn't solve the incentive problem. Better moderation doesn't solve the incentive problem. The only thing that solves it is changing what you're asking people to do. Stop asking for universal quality scores and start asking for specific, verifiable things — would you recommend this, would you go back, what specific attributes did you notice.

If you were designing a review platform from scratch today, what would it look like?

First, binary recommendation instead of star ratings — thumbs up or thumbs down, would you recommend or wouldn't you. Second, implicit behavioral signals — return visits, dwell time, order frequency, all collected passively with user consent. Third, optional attribute tagging — describe specific characteristics rather than making overall quality judgments. Spicy, quiet, good for groups, fast service. These are factual claims that can be aggregated without the defamation risk.

The social layer? The friend network piece?

That's the fourth component, and it's the one that makes everything else work. By default, you see recommendations from your network, the people you've chosen to follow because you share taste. The aggregate data is still there if you want it, but it's secondary. The primary experience is "here's what your people think."
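A minimal sketch of how three of those components (binary recommendation, attribute tags, and a network-first view) might fit into one data structure. The class, field names, and sample users are hypothetical, and the passive behavioral signals are left out to keep the example short.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Recommendation:
    user: str
    recommends: bool                                 # thumbs up or down
    tags: list[str] = field(default_factory=list)    # e.g. "quiet", "spicy"


def summarize(recs, following):
    """Summarize one place for a viewer, putting their network first."""

    def share(items):
        # Fraction of thumbs-up; None when there is nothing to count.
        return sum(r.recommends for r in items) / len(items) if items else None

    network = [r for r in recs if r.user in following]
    return {
        "network_share": share(network),   # primary: people you follow
        "overall_share": share(recs),      # secondary: everyone
        "top_tags": Counter(t for r in recs for t in r.tags).most_common(3),
    }


# Invented sample data.
recs = [
    Recommendation("dana", True, ["quiet"]),
    Recommendation("omer", False, ["slow"]),
    Recommendation("noa", True, ["quiet", "spicy"]),
    Recommendation("stranger", True),
]
summary = summarize(recs, following={"dana", "noa"})
print(summary)
```

The design choice worth noticing is that the network share and the overall share are reported separately rather than blended into one score, so the viewer can see when their network disagrees with the crowd.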

Which is how humans have always made decisions about where to eat. You ask someone you know. The review platforms didn't replace that — they supplemented it, and then gradually the supplement became the main thing, and then the main thing became corrupted, and now we're trying to figure out how to get back to something that actually works.

There's a timing issue here that makes this conversation particularly urgent. In March of this year, OpenAI demonstrated GPT-five's capability to generate reviews that are indistinguishable from human-written ones. Not just grammatically correct — indistinguishable in blind tests. As AI-generated reviews flood the platforms, the entire premise of "authentic crowdsourced wisdom" becomes unsustainable.

The window for fixing review systems might be closing. If we don't move toward verified-experience-only models, we're going to be drowning in synthetic reviews that look more authentic than real ones.

The platforms have very little incentive to fix this. Google makes money when you search for restaurants and click on results. They don't make money when you have an authentic dining experience. The engagement metrics that drive their business are better served by a system where every restaurant looks great and you keep searching and clicking, than by a system where the top result is "this place is mediocre, go somewhere else."

The business model and the user need are fundamentally misaligned. Which is why I think the most interesting solutions are going to come from outside the existing platforms. Small, focused networks that don't need to monetize through engagement.

The No Stars model, but digitized. Imagine a network of private recommendation groups organized by neighborhood or by cuisine or by dietary preference. You join because someone vouches for you. The recommendations are visible only to members. There's no public rating, no star system, no platform that can be gamed or monetized. It's just people telling other people where to eat.

It doesn't scale to a billion users. But it might not need to. The current system scaled to a billion users and became useless. Maybe usefulness and scale are in tension here.

That's the tradeoff I keep coming back to. There's a paper from twenty twenty-three that looked at this directly — as review platforms grow, signal quality degrades along a predictable curve. The inflection point seems to be somewhere around fifty to a hundred thousand active users in a given geographic area. Beyond that, the incentive to game the system outweighs the value of additional contributions.

The optimal review platform might be smaller than Google Maps. That's a counterintuitive finding.

And it explains why niche platforms like Letterboxd for movies or Untappd for beer consistently deliver better recommendations than general-purpose platforms. They're smaller, the users share a common vocabulary and set of expectations, and the incentive to fake a beer review is much lower than the incentive to fake a restaurant review that could drive actual foot traffic.

Before we move to practical takeaways, I want to circle back to something the prompt mentioned — the perspective of the small business owner. Because you can understand why negative reviews feel like an attack. If you've poured your savings into a restaurant and someone leaves a one-star review because the wait was long on a busy night, that stings in a way that's hard to overstate.

The current system makes it worse by treating all reviews as equally valid and equally permanent. A one-star review from someone who had a bad experience because the restaurant was unusually crowded carries the same weight as a one-star review from someone who got food poisoning. There's no nuance, no context, no decay over time. The review sits there forever, dragging down the average.

Which is why business owners feel they have to fight every negative review. It's not pettiness — it's rational self-preservation in a system that punishes anything less than perfection.

That's where the recommendation model actually helps business owners too. If the system is "would you recommend this place, yes or no," a single bad night doesn't turn into a permanent scarlet letter. The recommendation percentage can recover. The implicit signals — return rate, dwell time — reflect the overall pattern, not the outlier experience.
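One possible way to let a recommendation percentage recover is to weight recent votes more heavily than old ones. The hosts do not specify a mechanism, so the exponential decay and the 90-day half-life below are assumptions chosen for illustration.

```python
def decayed_share(votes, half_life_days=90.0):
    """Recommendation share with exponentially decaying vote weights.

    votes: iterable of (age_in_days, recommends) pairs.
    A vote's weight halves every half_life_days (an assumed parameter).
    """
    weighted_yes = 0.0
    total_weight = 0.0
    for age_in_days, recommends in votes:
        weight = 0.5 ** (age_in_days / half_life_days)
        total_weight += weight
        if recommends:
            weighted_yes += weight
    return weighted_yes / total_weight if total_weight else None


# Invented history: one bad night six months ago, two good recent visits.
votes = [(180, False), (10, True), (5, True)]

plain = sum(r for _, r in votes) / len(votes)
print("unweighted share:", round(plain, 3))
print("decayed share:   ", round(decayed_share(votes), 3))
```

With these made-up votes the old thumbs-down counts for about a quarter of a fresh vote, so the decayed share sits above the unweighted two-thirds. A shorter half-life forgives faster, which also makes it easier for a business to bury a real problem, so the parameter is a policy decision as much as a technical one.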

It's not just better for consumers. It's fairer for businesses. The current system creates an adversarial relationship where every review is a potential threat. A recommendation system aligns incentives — the business wants you to come back and recommend them, which means they focus on consistent quality rather than review management.

Let's make this practical for listeners. If you're using Google Maps today, what should you actually do? My advice: ignore the star rating entirely. It's the least informative signal on the page. Instead, read the three-star reviews. They're the most honest.

Why three stars specifically?

Because one-star reviews are often emotional — someone had a terrible experience and they're venting. Five-star reviews are often incentivized or filtered for positivity. Three-star reviews are the sweet spot. The reviewer wasn't angry enough to rant, wasn't rewarded enough to rave. They're just describing what happened. And because three-star reviews are less likely to trigger the algorithmic filtering or the legal challenges, they survive at higher rates.

That's a useful heuristic. Ignore the number everyone else is looking at, and read the reviews nobody bothers to write unless they actually have something to say.

For small business owners, my advice is more radical. Consider opting out of Google Maps reviews entirely. Build a private recommendation network — a WhatsApp group, a Telegram channel, an email list. The No Stars model works because scarcity and word-of-mouth create higher trust signals than any public rating system. You'll get fewer customers, but they'll be better customers — people who came because someone they trust sent them.

For anyone designing a platform — or advocating for one — the Gett experiment is your proof of concept. Binary recommendation plus implicit signals. Thumbs up, thumbs down, return rate, dwell time. It's simpler to implement, harder to game, and produces better outcomes. The five-star scale was never a good idea. We just got used to it.

The meta-insight here is what I keep coming back to. The problem isn't bad actors. It's not fake reviewers or litigious restaurant owners or greedy platforms. The problem is that the system itself incentivizes dishonesty. Any review system that asks for a numerical rating on a public platform will eventually be gamed. The solution isn't better enforcement — it's changing what you're measuring.

The future of reviews isn't better algorithms. It's smaller, more honest networks. It's asking different questions. It's accepting that comprehensive, universal quality scores were always a fiction, and building systems that don't pretend otherwise.

Which brings us to the open question I keep wrestling with. Can a recommendation-only system scale beyond niche communities? Letterboxd works for film buffs. Untappd works for beer nerds. No Stars worked for one restaurant in Jaffa. But could you build a city-wide recommendation network for all restaurants that doesn't eventually succumb to the same incentive problems?

I think the answer might be that you don't try. You build lots of small networks with overlapping membership. A vegan recommendation group. A parents-with-young-kids group. A late-night-food group. Each one is small enough to maintain trust, and people belong to several. The platform provides the infrastructure, but the trust stays local.

That's an interesting model. And it maps to how trust actually works in the real world — you trust different people for different things. Your coworker who knows pizza, your neighbor who knows brunch spots, your cousin who's obsessive about coffee. A platform that formalizes those trust relationships without trying to aggregate them into a single universal score.

The best review you'll ever get is from a friend who knows your taste. Everything else is just noise with a number attached.

On that note, I think it's time for Hilbert's daily fun fact.

Now: Hilbert's daily fun fact.

Hilbert: In the nineteen eighties, linguists working with the last remaining speakers of the Nyawaygi language in Queensland, Australia, discovered that the language contained seventeen distinct kinship terms for cousins alone — differentiating by gender, relative age, and whether the parent connecting them was same-sex or opposite-sex sibling to the speaker's parent. The last fluent speaker died in two thousand nine, taking with her the ability to navigate a family tree with precision that English requires paragraphs to approximate.

Seventeen words for cousins.

I thought my family reunions were complicated.

This has been My Weird Prompts. Our producer is Hilbert Flumingtop. If you enjoyed this episode, please leave us a review — ideally a genuine one, though after today's discussion I'm not sure any of us will ever trust a star rating again. You can find all our episodes at myweirdprompts dot com.

Until next time, ask a friend where to eat.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.
