Ever notice how the average score given by a review show somehow tends to be above average?
If you take a stroll on professional game review websites, you will notice that scores tend to be in the 6.0 to 10.0 range, even if they're nominally using a ten-point scale. This is called the four point scale, which is also sometimes called the 7 to 9 scale. Two takes exist on why this is so.
The first view considers the four point scale to be a bad thing, and holds this as evidence of a website's lack of integrity (often toward mainstream outlets). The accusation is rarely leveled at the writers themselves, with the blame usually placed on a site's editors or Executive Meddling.
The game journalism industry, like all forms of journalism, thrives on access. Game magazines and websites need to get a steady flow of new games, previews, and promotional materials directly from the publishers in a timely manner, or they're irrelevant. Unfortunately, the game industry does not have to provide this access, and games review sites and magazines are far more reliant on the companies that produce the games than movie critics are on movie companies; indeed, since most websites are expected to provide their content for free, industry advertising is perhaps their most important source of income. There are murky tales of editorial mandates or outright bribery, but the whole system is set up so that providing a highly critical review of a company's triple-A title is akin to biting the hand that feeds you. This is especially true of previews, which tend to have an artificially positive tone since if a journalist pans a game the company didn't have to show him in the first place, he's unlikely to be invited back to see any of their other work. As such, you're unlikely to see major titles, even awful licensed crap, get panned too hard in for-profit publications.
In addition, there's the fact that many of these game review programs draw their audience by reviewing the most anticipated upcoming games; games which are anticipated due to their high degree of quality and polish. Because of this, many critics are incentivised to only review good games for fear of losing ratings. As such, many game reviewers will simply never get around to reviewing the lower quality, bargain bin, shovelware games in order to balance out the scale, hence skewing their score average upwards.
The other view considers the four point scale to be the result of a perfectly reasonable way to award and interpret review scores. This can be understood fairly easily by comparing with the way school assignments are graded. In any given class, people will usually get scores ranging from 60% to 100%, with the average being around 70-75%. This then leads people, both reviewer and reader, to expect scores to mean something similar to what they already encountered in real life. Getting ~60% means "this sucks, but it can still be considered a game", ~75% is "average", ~85% is "decent/solid" and anything above 90% is a mark of excellence.
Think of just how hard it is to actually get lower than 60% on an assignment. Even if you hand in complete crap for your essay on the Punic Wars, it will be hard to get much lower than 60%. For you to get under 60%, you pretty much have to turn in something that goes beyond "not being good". Unless you forget to include the last 3 pages of your essay, and accuse Napoleon Bonaparte of engineering the Punic Wars to cause the September 11 attacks, all written in another language with ink made from cat urine, you probably did enough to get a passing grade. Game developers that achieve this level of suck quickly go out of business, which in turn explains why games rarely ever get scores below 60%. This also explains why there are few games that get under 75%, as most game developers know that churning out sub-par products isn't good long-term business practice, and those who don't know it quickly learn the lesson.
The situation with the four point scale has led some reviewers to drop rating scores altogether, or favor an A/B/C/D grading system. Professional reviews tend to keep a rating system to reduce the chance of being misquoted or misinterpreted, as it will be evident that you did not mean the game was "excellent" if there's a big "6/10" or "D" at the end of the article.
The same basic concept applies to every industry; reviewers tend to place things in the upper half of whatever their reviewing scale happens to be, and for the same reasons.
Of course, if reviewers get too negative there's always the risk of fan backlash, because Reviews Are The Gospel. Contrast So Okay Its Average, where being just below this scale is acknowledged to have some quality, if not a lot. See also Broke The Rating Scale and F Minus Minus. See 8.8 for when fans freak out at reviewers for making use of too much of the review scale.
Examples (by subject):
This happens to an extent with fan reviews too. If you go to any site where shows can be rated (like Anime News Network) most shows will float above 6.0. Fan reviewers do tend to be, well, fans, which would tend to skew reviews positively. That, and they may pattern themselves after official reviews, even without meaning to. And sometimes the fan reviews "cheat" to bring the score closer to their desired number. The problem is with the way the scores are averaged, which encourages this kind of behaviour. By taking the median score or using a fancy formula, there are ways to make it so that an 8/10-rated movie is affected the same way by a 7/10 and a 1/10.
There's plain selection bias here; no one is forced to watch anime they remotely suspect they won't like. The some-eps rating vs the all-eps rating point spread and population ratio can be instructive.
Exception: Some anime series with exceptionally bad Macekre dubs will still have the original version rated highly, but the dub will get low ratings.
Fan reviews on video games also apply here. You'll find a mixture of reviews that are all perfect scores or close to it and reviews that give the game the lowest score possible.
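The point about averaging can be made concrete. What follows is a minimal Python sketch, not any site's actual formula: the vote list is made up for illustration, and `mean` and `median` come from the standard `statistics` module. It compares how one extra 7/10 vote and one extra 1/10 vote move a show that sits around 8/10.

```python
from statistics import mean, median

# Hypothetical fan votes for a show that sits around 8/10.
base_votes = [8, 8, 8, 9, 8, 7, 9, 8]

# One more voter arrives: either a mildly critical 7 or a protest 1.
with_seven = base_votes + [7]
with_one = base_votes + [1]

# Under the mean, the protest vote drags the score much further down.
mean_with_seven = mean(with_seven)
mean_with_one = mean(with_one)

# Under the median, both extra votes leave the score where it was.
median_with_seven = median(with_seven)
median_with_one = median(with_one)

print(f"mean:   {mean_with_seven:.2f} vs {mean_with_one:.2f}")
print(f"median: {median_with_seven} vs {median_with_one}")
```

With the mean, a voter who wants to pull the number down gets more leverage from a 1 than from an honest 7, which is the incentive to "cheat" described above. With the median, both votes count the same.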
Truth in Television: the whole business can also be justified in many cases by score entropy. Here's how it goes: you independently, objectively and honestly review game A. You give it, say, 95%. A year later, you review game B. Game B is pretty much game A, with the awesome cranked to eleven. Or with the same awesome, but all the miscellaneous suck ironed out. Watchagonnado? You objectively have to give it 96%. Cue next year. Some reviewers such as Gamespot claim that their standards rise as the average quality of what they review rises, averting this problem in theory but giving rise to a lot of Fan Dumb if actually followed.
This is the same concept behind why they have the Olympic favorites in events like Ice Skating do their routines last. If they did them first, and got a perfect score, but were then one-upped by an underdog, the judges can't score the underdog higher than perfect, and controversy erupts.
This trope can also be explained in basically all industries because, if you assume the scores are like grades in school, getting a 50% is absolutely terrible. This does lead to bizarre situations with user submitted reviews on sites where a person will give the game a 5 or 6 out of ten while claiming the game was average or somewhat above average, while somebody scoring the game a 7 claims the game was mediocre but without major flaws, or where somebody will give a game an 8.5 or 9/10 because "nothing can be perfect" or because it's not on the system the reviewer likes, while somebody else may score the game a ten saying it's the best game on the system by far despite a few minor flaws.
Lore on Potato Bugs: "'Fouler insect never swarmed or flew, nor creepy toad was gross as 'tato bug. Remove the cursed thing before I freak.' — Wm. Shakespeare, Betty and Veronica, Act 1, Scene 23. I can't even go into how nightmarish these vile little affronts to decency and aesthetics are. If I were having an Indiana Jones-style adventure, the Nazis would lock me in a crypt with a herd of potato bugs. And, I might add, I'd choke myself to death with my own whip right then and there rather than let a single evil little one of them touch my still-living body. They're still better than Scrappy-Doo, though. D-"
This gets to be true on just about any site where viewers can post their opinions as well. Simply put, the only people who vote are either going to 1) post glowing reviews and scores, because they loved it, or 2) post really bad ones, because they hated it. Genial appreciation, the response of the much larger majority, never gets factored in.
Any horoscope that rates the upcoming day on an alleged scale of one to ten will use a Four Point Scale.
A well known gun writer came right out and said that negative reviews were not allowed by the editorial staff. He went on to say that they simply wouldn't print reviews for bad guns, so if a new gun came out and none of the major industry mags was reviewing it, take a hint.
New car reviews in both magazines and newspapers. Even the Yugo received lukewarm reviews from the major car magazines; these publications are truly frightened at the thought of losing advertising revenue due to giving a poor review. This is doubly-true after General Motors pulled its advertising from the Los Angeles Times after one of GM's products was panned in print. Of course, this may be the only case where this trope is potentially justified, as compared to everything else on this list cars and other vehicles are very expensive, and if you buy one the dealer isn't inclined to take returns.
European motorcycle magazines seem to have a particular love for BMW motorcycles. A flat spot in the torque curve is a minus for any other marque, but the BMW is praised for having high end power. Or a test of three comparable motorcycles where the two Japanese cycles win on points in the summary, but the article still proclaims the BMW number 1. It's either Euro-chauvinism, or influence by the BMW advertising budget. It doesn't help that BMW routinely provides reviewers with bikes with all the optional extras. Reviewers will gush the entire review on the technological gew-gaws, and then mention in one sentence at the end that these are all optional and cost money. Guess what readers remember?
Jeremy Clarkson mentioned this trope frequently in his published reviews. He says that the best thing that happened to his car reviewing was television, because it meant that the previous power relationship was reversed - he was rich enough to say what he liked thanks to TV profiles, and his public profile was so great that car manufacturers could not not send him cars for review. Which he then reviewed honestly. Tellingly, despite years of saying that he despises all Asian cars except Hondas (because Honda was started by a Mr Honda who had a dream when he was a small boy, like BMW or Lotus, as opposed to simply being the automobile arm of a heavy industry company), firms like Daewoo still sent him cars, which would be savaged.
Consumer Reports has a policy against reviewing cars or household goods that they didn't buy incognito from a retailer. Nonetheless, most of its ratings are Good, Very Good or Excellent.
Cars at a car show or ones that are being appraised are scored on a scale of 1-6, with 1 being perfect and 6 being junk. Most are scored as a 2-3 because they are generally at a car show and most people don't take junk vehicles out to such things.
Go to TV.com. Pick a show you hate, any show. It's pretty much guaranteed that most of the ratings won't drop below 7 out of 10. In some cases, reviewers will rate an episode before it's aired, in a "I think this will be good" way.
For British television dramas, "average" is actually 77%. Even so, very few dramas go below 70 or over 90 (much was made over the Doctor Who Series 4 finale getting 91% for both parts).
As a reality TV example from Dancing with the Stars, you can trip, shuffle, and walk your way across the dance floor for two minutes and still get a four or five. Two and three are put in play extremely rarely, when the judges are trying to force an inferior dancer off the show. In ten seasons, no one has ever been given a one.
The Head Judge Glenn once gave an explanation of each of the ten scores, and getting on the floor and moving your feet grants you a 2. Being vaguely aware that there was music playing was a 3. Dancing mostly in time to said music gets a 4. To get a 1, you literally would have to not dance at all.
On Strictly Come Dancing, Craig Revel-Horwood, in particular, has been criticised for his "low" marking - he marks out of the full 10 (and isn't afraid to use 1s or 2s), while the other judges give out sub-6 scores so rarely that it tends to look like a personal insult when they do. This criticism ignores the fact that, logically, if you're using a ten-point scale then a five or six should be average and a seven or above should be good. Things get even worse once the season passes the quarter-final stage, when any mark lower than 9 tends to be roundly booed by the audience.
Ice Age (formerly Stars On Ice, Russian Dancing With The Stars on ice) uses standard figure skating scales: 0.0 to 6.0. To put things into perspective, the worst average score in the five-year history of the show, awarded to the worst pair on the very first day, was 4.8. It's becoming worse over the years: now the average score is 6.0, noticeable mistakes mean 5.9, and bad performance is as low as 5.8. To add insult to injury, judges sometimes complain about how they don't have enough grades to noticeably differentiate between performances of similar qualities, apparently ignoring the fact that they have 57 other grades at their disposal.
Sounds of Death, aka S.O.D., is infamous for this. In past years they would publish "reviews" of albums with copy taken straight from the record label's press releases, and in many cases run a glowing review of an album opposite a full-page ad for the same CD!
Allmusic zig-zags this:
It rarely rates an album below three stars, and never rates an album five stars when it comes out.
It isn't unheard of for them to go a little lower. Brooks & Dunn's and Kenny Chesney's discographies include at least a two-star and two-and-a-half star apiece. Kenny has two two-stars.
With certain artists it shifts the scale about one-and-a-half stars lower.
Allmusic also seems to have a strange hate for later "Weird Al" Yankovic albums, which are usually well-received by others.
Some of the reviews date from when Allmusic was still in book form, and in those cases, the stars don't always match up — so they might say an album is unremarkable yet give it four stars, or say it's great but only give it three.
In a similar vein, Country Weekly magazine has used a five-star rating in its albums reviews section since late 2003, a couple years after Chris Neal took over as primary reviewer. Almost everything seemed to get an automatic three-star or higher, with the occasional two-and-a-half at worst. Perhaps the only time he averted this trope was in one issue where a Kidz Bop-esque covers album got one star. Before the star-rating system, the mag's reviewers were even more unflinchingly favorable, both from Neal and his predecessors. When new reviewers took over the reviews in late 2009, they got a little more conservative with the stars; one gave an album only two-and-a-half stars, although the tone of the review didn't suggest that the album was even mediocre.
Robert Christgau used to be much more diverse in his ratings, which either ranged from E- to A+ (before 1990) or through a wide variety of grades including dud, "neither," honorable mention, and B+ to A+. Now that he no longer has the same encyclopedic approach to reviewing he once had, he only rates albums he likes as part of his "Expert Witness" blog, effectively limiting grades from B+ to A+ - literally only four different grades. Though limiting his effectiveness as a reviewer, the new scale makes him considerably more likable as a person.
Rolling Stone almost never gives out one or two stars for albums. Long time readers know a three star rating means the album is unlistenable.
This trope hits professional wrestling reviews hard. Virtually nobody is satisfied with any rating below four stars. Japanese wrestling reviewer Mike Campbell has gotten a reputation as a horribly biased negative critic simply because he averts this trope very hard while explaining the pros and cons of a wrestling match in meticulous detail.
The 10-point must system used for scoring various boxing and MMA bouts.
In boxing, judges award the winner of the round 10 points and the loser 9 points. Barring fouls, the only way to get fewer than 9 points is to get knocked down, which is rare and usually indicates that the boxer is about to lose. Scores of 7 or fewer would require the boxer to get knocked down several times in a 3-minute span. In that situation, the referee or the fighter's corner would usually stop the fight before the round ended. Rarely, rules are set in place in which the fight is automatically stopped if three knockdowns occur in a single round. Thus, in fights that go to decision, the scores are very large, but decided by only a few points. You get 108 points just by managing to not fall down for 12 rounds, and 120 points for winning every single round.
MMA also uses the 10-point must system, but has no knockdown rules. Therefore, if you lose the round, even by the narrowest of margins, you get 9 points. If you're utterly dominated from start to finish, you'll get 8 points. Barring fouls, there's basically no way to get fewer than 8, as a fighter who is performing that poorly would be rescued by the referee.
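The arithmetic behind those boxing totals can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not an official scoring rule: the function name `round_score` is hypothetical, each knockdown is assumed to cost the round's loser exactly one extra point, and fouls and even rounds are ignored.

```python
ROUNDS = 12  # the fight length used in the arithmetic above


def round_score(won_round, knockdowns_suffered=0):
    """Score one boxer for one round under a simplified 10-point must rule.

    Assumption for illustration: the round winner gets 10, the loser gets 9,
    and each knockdown suffered costs the loser one more point.
    """
    if won_round:
        return 10
    return 9 - knockdowns_suffered


# A boxer who loses every round but never goes down.
swept_card = sum(round_score(False) for _ in range(ROUNDS))

# A boxer who wins every round.
sweeper_card = sum(round_score(True) for _ in range(ROUNDS))

# Share of the maximum earned just by staying upright.
floor_share = swept_card / sweeper_card

print(swept_card, sweeper_card, floor_share)
```

Under these assumptions, the most lopsided decision without a knockdown still leaves the loser with 90% of the winner's total, which is why the cards look so close.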
In competitive debating tournaments:
In one scoring system, 75 is considered an average speech, and virtually all speaker scores fall between about 70 and 80, with 79 or 80 being a demigod level speech. Supposedly if someone simply gets up, repeats the topic of the debate, and sits down, that's about a 50. Getting enough judges for a debate can be a problem; often the judging forms are very specific to try to get around the fact that some judges may be, effectively, people who wandered in because they smelled coffee. There are forms where the judge is asked to circle a number from 1 to 5 on 20 different categories, then add the numbers up to give the final score. Since in some categories a 2 is roughly equivalent to "Did not mumble incomprehensible gibberish during the entirety of the debate," 40-50 is about the lowest score you can get if you even attempt to look like you're self-aware.
In other formats, each competitor's score is determined by adding the judges' individual scores, each one out of fifty points. Judges are instructed to both score and rank each competitor. Where the fun begins is that judges aren't allowed to give tied scores, and scores are only allowed to differ from each other by one point. The result being that first place, in every round, automatically carries a 50, second place a 49, and so on. Even if a competitor starts his piece over more than once (which automatically carries a ten-point penalty or worse, depending on the format) they're often just given the last place score. Few judges ever rock the vote; a judge who awards a first place a 49 (let alone, say, a 45) is regarded as being unfamiliar with the format. The dark irony hits when you realize that the most veteran judges are the ones willing to be tough; judges who don't know their way around the competition usually just punt it.
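Both debate scoring schemes described above compress the usable range in ways that are easy to check. The Python sketch below is illustrative only: the function names are hypothetical, and the six-competitor round is an assumed example rather than a rule of any particular format.

```python
def form_total(category_marks):
    """Add up a judging form with 20 categories, each marked from 1 to 5."""
    if len(category_marks) != 20:
        raise ValueError("the form described has exactly 20 categories")
    if any(not 1 <= mark <= 5 for mark in category_marks):
        raise ValueError("each category is marked from 1 to 5")
    return sum(category_marks)


def forced_scores(num_competitors, top=50):
    """Scores when ties are banned and neighbours must differ by one point.

    First place takes the top score, and each later place drops one point.
    """
    return [top - place for place in range(num_competitors)]


# A speaker who earns the "did not mumble gibberish" mark in every category.
barely_coherent = form_total([2] * 20)

# The lowest total the form can produce at all.
absolute_floor = form_total([1] * 20)

# An assumed round of six competitors under the one-point-apart rule.
round_of_six = forced_scores(6)

print(barely_coherent, absolute_floor, round_of_six)
```

In the first scheme the scale nominally runs to 100 but cannot go below 20, and a minimally coherent speaker already lands at 40. In the second, the scores carry no information beyond the ranking itself.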
Rivals.com, a football recruiting site, ranks prospects using the standard 1-5 star scale. Then they have a vague additional ranking system that ranks players on a 4.9-6.1 scale.
In ski jumping each jump is scored by five judges. They can award up to 20 points each for style based on keeping the skis steady during flight, balance, good body position, and landing. The highest and lowest style scores are disregarded, with the remaining three scores added to the distance score. However, anything below 18 is usually considered a slightly botched jump and scores below 14 are only ever seen when the jumper falls flat on his face upon landing.
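The style calculation described above is a trimmed sum, and a short Python sketch shows how it works. The function names and the sample marks are made up for illustration, and the distance score is passed in as a plain number because the text does not say how it is computed.

```python
def style_points(judge_marks):
    """Sum the three middle marks out of five, dropping the highest and lowest."""
    if len(judge_marks) != 5:
        raise ValueError("expected marks from exactly five judges")
    if any(not 0 <= mark <= 20 for mark in judge_marks):
        raise ValueError("each judge can award at most 20 points")
    ordered = sorted(judge_marks)
    return sum(ordered[1:-1])


def jump_total(judge_marks, distance_points):
    """Add the trimmed style score to a distance score supplied by the caller."""
    return style_points(judge_marks) + distance_points


# Hypothetical marks for a clean jump: everything sits between 17.5 and 19.
clean_jump = [18.5, 18.0, 19.0, 18.5, 17.5]

clean_style = style_points(clean_jump)
clean_total = jump_total(clean_jump, distance_points=60.0)

print(clean_style, clean_total)
```

Dropping the extremes means a single harsh or generous judge cannot move the result, which further narrows the range of style scores that show up in practice.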
In NCAA football, going through an NFL draft voids the remainder of your scholarship years, which often prevents players from finishing any degrees they have not completed. In order to "help" kids who were on the fence about declaring or staying in school, the NCAA allowed them to consult a panel that would predict where they would be drafted should they come out. However, this panel was notoriously optimistic, frequently telling hundreds of kids a year that they would be drafted in the first 3 rounds. (For reference, with 32 teams in the NFL, that equates to telling all of them they are one of the top 100 players in the draft pool that year.) This had very real consequences as many kids were lured by the promise of NFL riches, fell to late in the draft because they were raw players, and washed out of the NFL before developing.
Gymnastics is theoretically scored out of 10, but is really marked between 9 and 10. Anything below 9 pretty much means "fell off equipment".
In general, electronic products (and products in general) are rated based on their performance in a particular price segment, not overall performance against everything else. The reason is that an across-the-board comparison would be really unfair to the more affordable and sometimes more practical products.
As an example, a $100 graphics card that performs better than its competitors in this price category (typically ±10%) can receive a 9/10. But a $500 graphics card that can't match its competitors may receive a 7/10, even though the $500 graphics card will totally blow the $100 graphics card out of the water in performance alone.
Also, products should be rated based on the times. It seems silly that a 10/10 product from 10 years ago still holds any weight against an 8/10 product of today. Generally though, the user experience is what counts.
Generally the case with all electronics for people: never buy any product given less than an 8 on a 10 point scale. The reasons for this are complicated, but basically boil down to the following few reasons:
A lot of it has to do with usability. If a sample of reviewers generally agrees that the usability of the electronic gizmo sucks and thus gives it lower scores, then nobody will buy it, because who wants to buy an electronic gadget that's annoying to use?
Almost every complaint that you could make about most well known high-tech products is either based on taste (iOS vs Android, say) or is strongly counterbalanced by price (a top end graphics card against a $60 model). The few complaints that don't fall into those two tend towards nitpicking and are often only visible when sitting two things next to each other. So whatever problems you might find can't take too many points off if the device does what it is supposed to for that price.
Gadgets have some of the most vehement fanboys on the internet, and so a site that tries to cater to all of them has to hedge its scores to keep everyone happy, further pushing the scores closer together.
Finally, they have to keep the manufacturers happy too, because those smartphones, SLR cameras and 3D TVs aren't cheap. So they will almost always focus a review on the 'new' feature being touted by the manufacturer and how amazing it is, and then ignore the same feature on similar products whose makers are pushing a different part of their widget as being awesome.
Attack of the Show!'s Gadget Pr0n segment has never rated any reviewed item below 70%. Even a digital camera with grainy picture, difficult menus, unresponsive buttons, low battery life, insufficient storage space, and inadequate low light sensitivity that is several hundreds of dollars too expensive will still get the equivalent of a B+.
Zig-zagged by Mac|Life back when it was still called Mac Addict. At the time, they had three review sections: a generic one, one for interactive CD-ROMs and one for children's software. All three used a four-point scale with their mascot, Max: "Freakin' Awesome", "Spiffy", "Yeah, Whatever" and "Blech!".
The catch-all section had reviews written by a panel of reviewers, summarized with the corresponding four-point scale and a good news/bad news blurb. If they could find even one good thing to say about it, it usually got a "Spiffy" at worst. "Yeah, Whatever" was usually reserved for unspectacular products, and "Blech!" was all but nonexistent.
The interactive CD-ROM section, however, was just the opposite. It used a three-reviewer panel for each CD-ROM, and it was very rare that any of the three had anything good to say about any of the interactive CD-ROMs. You could pretty much guarantee at least one "Blech!" here.
And finally, the children's section used feedback from actual children, with a summary from a regular reviewer. The children's panel and the main reviewer were weighted to give the overall rating, but even then, you'd be hard-pressed to find a "Blech!"
All of this went out the window when the magazine repackaged itself as more staid and formal, going with a standard five-star scale (which has remained with the shift to Mac|Life).
Edge magazine is one publication that, over the years, has attempted to stick to a rating system where a score of 5 should ideally be perceived as average, not negative. However, their mean score is definitely skewed closer to 7, simply because the magazine is more likely to review relatively polished high-profile games than the bargain-bin budget titles that would balance out the weighting the other way. Edge has done quite a lot of self-analysis of its own reviewing/scoring practices over the years, with articles like E124's look at how reviewing practices vary across the gaming publications industry (how much time a reviewer should spend with a game before rating it, how styles of criticism and ratings criteria vary depending on the target audience, and so on). Up until a few years ago, they also did a lot to build up the prestige and mythology around their rarely-awarded Ten Out Of Ten score (see, for example their 10th anniversary issue (E128) retrospective look at the highly exclusive club of four games that had received that score up until that point).
Then in 2007, Halo 3, The Orange Box, and Super Mario Galaxy were awarded 10s three months running, and since then the score has been awarded a lot more frequently. (See this interview with the editor for a discussion of their reviewing philosophy from around that time.) In contrast to 10/10, they've only used the dread 1/10 score twice - for the godawful Kabuki Warriors, and Flat Out 3.
Shortly before becoming discontinued, Games for Windows: The Official Magazine (previously Computer Gaming World), switched to a letter grade system like that used in schools, precisely because of this problem. This system is now used on their corresponding website, 1up.com.
Computer Gaming World rather famously didn't have numerical / starred reviews for its first fifteen years or so, until the mid-Nineties, when readers who didn't want to actually read the whole article and just look at the score finally complained enough that they started giving out 0-5 stars. When they did start actually giving scores to their reviewed games, in most cases they were more than willing to use the entire scale. They even had an "unholy trinity" of games that were rated at zero (Postal 2, Mistmare, and Dungeon Lords).
The notorious game reviewer Jeff Gerstmann (who was responsible for the 8.8 trope) was fired by Gamespot for panning Kane and Lynch (a game heavily advertised on the site) with a 6.0. However, the site says he was fired for personal reasons. Also, he was not exactly alone among reviewers in scoring the game poorly. Of course, after this controversy, and his firing, Gerstmann started up Giant Bomb. Over there, Gerstmann and his crew use an X-Play-style review scale (1-5 stars, no half-stars), and they're more than willing to dish out 1 and 2 star reviews for bad games. He later reviewed the sequel Kane and Lynch: Dog Days, which he gave a 3 out of 5 (an average score).
Alex Navarro (a co-worker and supporter of Gerstmann's) often broke the four point scale when he reviewed games including Big Rigs, Robocop, and Land of the Dead.
Gamespot is partially guilty of the scale: browsing their reviews archive, almost 123 of their 233 pages so far score between 7 and 10 (and only seven have a perfect score, which take time to appear - the 4th in 2001, but the next two only in 2008).
Once upon a time, Gamespot had an excuse for this. A breakdown of their scoring system, long since removed from the site, revealed that being technically competent (a bug-free console release or a feature-complete PC release that would run on common system configurations at the time) automatically got a game a 6, and other factors built the score up from there. This page, and presumably the system, have been gone from the site for at least five years by now, though.
A non-review example of this occurs in the Guitar Hero games: You will never get fewer than 3 stars on anything, no matter how badly you do. It's just a question of whether you get 3, 4 or 5.
However, Rock Band averts this. As you build up to the base score, which is the score you'd get for hitting every single note if there was no combo system and no Overdrive, you go from 0 stars to 1, to 2, and finally to 3. With the combo system and Overdrive, however, getting 3 stars is still laughably easy on most songs. 4- and 5-starring songs is still just as hard (or easy, depending on the song) as it was in Guitar Hero. This all means that it's more than possible to complete songs with scores below three stars.
It's still not possible to get 0 stars—someone tested this with the song "Polly" by Nirvana. The song literally has only eight notes in its drum part, so it's possible not to hit any of them (and, thus, not to score any points) and still pass the song. The results screen? 0 points and 1 star.
Guitar Hero Metallica introduces a star meter somewhat similar to Rock Band's. The difference is, you still can't get less than three stars in GHM; until you have at least three stars, the star meter will "help" you fill it until you reach three, which sometimes entails, for example, automatically filling itself during sections with no notes.
Guitar Hero sort of justifies it, because "failed a song" means "got a bad review" and so if you get less than three stars you failed. It's more like a Hand Wave than a real justification, though.
The minimum 3 stars on Guitar Hero is only for the guitar. You can get 2 stars on the drums or the mic; the reasoning behind it is unclear, but it's possible.
The opposite end of the spectrum occurs for certain DDR clones. In The Groove 2? An "A" is somewhere around low 80%; after A+ is S-, S, S+, one star, two stars, three stars and four stars.
Independent review site WorthPlaying.com has a typical floor of 4.0 unless the game is flat-out broken (in the sense of significant glitches).
Hardcore Gamer Magazine has an interesting version of this. Each game is reviewed by two staffers; the first gives the in-depth review of the game and awards a score (0.0—5.0 scale), then the second comes in with a "second opinion" score, and gives usually a one or two sentence aside about the game. The two scores are averaged out. And while it's refreshing to see the two scores differing by about half a point, the real entertainment comes from watching the second opinion offering completely derail the score of the main reviewer.
RPGFan is notorious for this - with rare exceptions, even a game the reviewer will spend the entire piece criticizing will still get at least a 70. They posted an editorial about it, providing an explanation of their methods and somewhat admitting that the lower half of their scale is pointless, but sidestepped describing their reasoning, instead saying that you should focus on the text of their reviews.
RPGamer used to score on a scale of 1-10, but ultimately dropped this in favor of a 1-5 system because of this very trend. This led to their reviews since the change actually using the entire scale, with several 1s and 2s given to games that truly tortured the staff members reviewing them. While older scores on the older scales remain unchanged, the review scoring page provides a conversion scale that has led to many games experiencing a severe drop in score when converted to their latest scale.
Videogame magazine Electronic Gaming Monthly, or EGM, made a conscious effort to avert this: most (previously all) titles they featured were handled by three separate reviewers, and highly varying impressions were surprisingly common. Closer to the end of its run, they switched from a 1-10 scale to a 'grade' system (A, B, B+, etc.) for the purpose of avoiding the Four Point Scale trap entirely.
Towards the end of the mag's original run, they handed off the really awful games to internet personality Seanbaby, who wrote humorous reviews lambasting them for being so bad that nobody would - or should - ever play them (many of the reviews can be seen, in extended and uncensored forms, on his website).
Eventually this reached its ridiculous-yet-logical conclusion when EGM was denied a review copy of the Game Boy Advance The Cat In The Hat movie tie-in game, which the developer said was because they "didn't want Seanbaby to make fun of it". Or, to put it another way, they acknowledged right out the gate that their game was so bad it wouldn't even rate a 1 in the normal review section. Seanbaby obligingly went out and purchased a copy just so he could lambaste it.
There were letters from the editor talking about how some company or another wouldn't give them information about their games anymore because of the bad scores they handed out. This happened at least twice with Acclaim and once with Capcom. In their first encounter with Acclaim, EGM had handed out very low review scores to their Total Recall game for the NES; when Acclaim threatened to pull advertising if they didn't give the game a better review, editor-in-chief Ed Semrad wrote in an editorial column that they could go right ahead, because they were sticking by the review even if it cost them money, because journalistic integrity was more important than a paycheck. The second time this happened, it was because EGM had blasted BMX XXX (and rightfully so); this time, Acclaim threatened to never let them review another game of theirs ever again, to which EGM said "fine by us". Capcom's case was a somewhat different affair: it wasn't a review that got them angry, but instead EGM badmouthing the constant stream of "updates" to Street Fighter II; when Capcom asked EGM to apologize for the remarks in exchange for not pulling advertising, EGM again said that they would not retract the statements even if it cost them Capcom's money, because they felt honesty and independence in their publication was more important. In all three cases, Acclaim and Capcom pulled ads from the mag for a few months before buying adspace again.
It should also be noted that EGM's review system was heavily inspired by Famitsu's review system. The first issue of EGM, however, featured scores that ranged from 'miss' to 'DIRECT HIT!'.
Actually inverted by EGM in 1998, where they revised their review policy in order to give HIGHER scores, specifically 10s. There was a period from late 1994-mid 1998 where no reviewer had given out a single 10 (Sonic & Knuckles being the last one to receive one). After a slew of excellent high-profile games such as GoldenEye and Final Fantasy VII passed through in 1997 with 9.5s, the mag revised its policy in the summer of 1998. Previously, a 10 was only awarded if a reviewer believed the game to be "perfect". But as Crispin Boyer pointed out in his editorial discussing the change, "Since you can find flaws in any game if you wanted … there's really no point in having a 10-point scale if we're only using 9 of them." Thus, a 10 would be given out if the game was to be considered a gold standard of gaming and genre. The very next issue, Tekken 3 would break the 3+-year spell by receiving 10s from three of its four reviewers, and later that year, Metal Gear Solid and Ocarina of Time became the first games to receive 10s across the board in the magazine's long history.
EGM also received criticism from readers that some games would receive high scores one year, but the next year, a new-and-improved sequel or an extremely-similar-but-better game would come out to lower scores; alternately, a game that received high scores upon its original release may be ported to another system, or remade years later, to lower scores. Reader logic was that if Game B was better than Game A, objectively, Game B had to be rated higher on the numerical scale (see an entry above). This was addressed multiple times in the reader mail and editorial sections, where it was explained that they did not follow this rule, as long-running and generally high-scoring yearly sports series like Madden or Tony Hawk's Pro Skater would have hit the 10-point ceiling years ago due to improvements in each version. Furthermore, at least technically speaking, games will always be improving due to the more powerful consoles and computers that are released every few years. Finally, innovation naturally tended to score higher because of its originality than when all those ideas were incorporated into every game the next year. EGM explained that instead, they rated games based on the current marketplace, and specifically compared new releases to others within its own genre, while their level of standards would naturally increase into the future as games became more ambitious.
Dr. Ashen's review of Karting Grand Prix mocks this, with Ashen referring to the game as "irredeemably awful", then giving it a score of 73% "because I'm a fucking idiot."
In an earlier review on the Gamestation, a flea-market handheld game system resembling the original PlayStation, Dr. Ashen gives the system 7/10, saying that it's the lowest score one can give "before the company pulls their advertising".
And in yet another review he gives a product 8/10, but "only because it's made in China, and I'm terrified of their government."
He did give out a numerical score for Wolfenstein: two out of five stars, which is already an aversion of this trope. The likely reason he gave a rating at all, though, is that he did the review almost entirely in limerick form and just needed a rhyme.
It is also worth mentioning that, his lack of scores aside, Yahtzee subverts the whole reason for this trope in the first place (that is, reviewers not giving bad reviews more or less to keep their jobs). His job practically is to give bad reviews, and he often receives criticism when he praises a game.
British gaming magazine PC Zone's reviews run the whole gamut from 7%-98%. Similarly, a score of 80%+ does NOT automatically gain a "Highly Recommended" award; although these often ARE given out to high scoring games, on occasion they have not been awarded to games that are technically good, but are lacking in some kind of "soul" that the reviewer (and the Second Opinion reviewer) would have liked to see present.
This compilation of MetaCritic scores is this trope in all its glory. 70% is worth no points, 60% is -1, and anything below that is -2. For one, it doesn't really prove consistency: consistency would be measured by standard deviations, while this is a total of points. For another, putting negatives that high just makes the lower scorers look even worse. Talk about spin.
GameTrailers generally has very informative and reliable reviews that coherently explain the points they try to make as the review itself is going on, but the score at the end falls squarely into this trap, the lowest score they usually give being somewhere in the 4.7 to 5.0 range. It once gave a humorous "suicide review" of Ultimate Duck Hunting presented in the form of the reviewer having killed himself over the game and his review being his suicide note, and went on about how it was bad enough to push him over the edge at every turn, only to give it a 3.2.
Nintendo Power is usually good at averting this trope, but some of their reviews of games in popular franchises tend to be given high ratings by default.
With this magazine, what you have to watch for is not the score, but the number of pages of the review. The Nintendo blockbusters get two, three, even four page reviews, squishing out reviews for other games.
They also admitted in response to a letter that while they use a full ten-point scale, they won't put up a review for a game lower than a two, reasoning it's too bad to even bother with, and they only give out tens for the super-duper cream of the crop.
Amiga Computing gave 100% to Xenon 2. A reader called them out on this, asking if they'd give a higher score to an even better game. ("Yup.") They later gave out a score of 109%, and another 100% in the same issue.
The UK Official Dreamcast magazine aimed to avert this trope (back around the turn of the millennium even) by insisting on a rating scheme where 5/10 was strictly "Average". This led to a huge amount of complaints from fans who missed the intention behind the scheme and complained that a game they liked got a "harsh" score (The creators of Fur Fighter commented that the 7/10 they got from the magazine was the lowest score the game received). Eventually, the magazine staff made a phrase for each number and put it under each review score so the reader knew what the rating actually "meant". (For instance, any 7/10 rating had the word "good" under it. Shenmue was the only game that let us find out that the word under a 10/10 was "genius").
The Finnish gaming magazine Pelit uses this to a degree: they use a percentage scale for their game reviews, and they do use the entire gamut of their scoring system, but anything below 65 is still relatively rare. The magazine used to include an info box explaining that anything below 65% was below all standards, and that 50% and lower meant the game was truly atrocious. While the 50-or-lower reviews are amusing to read (such as their Fight Club review, where the entire review was just the phrase "Rule 1 of Fight Club: You do not talk about the Fight Club" with a 20% score), the staff hardly ever go out of their way to seek bad games to review, because they don't hate themselves that much. Instead, they pick games that they know they'll like, or ones that have interesting subject matter or are otherwise noteworthy. Their scoring system was originally chosen to maintain compatibility with other gaming magazines of the time, but by the early 2000s there were basically no other respectable magazines around that still used the same scale, and the staff have mentioned repeatedly that they would like to switch to a star-based system or no score at all.
Ars Technica has started reviewing video games on a three-point scale: Buy, Rent, and Skip. They expand a bit upon why they use that scale and why they aren't part of Metacritic.
Screw Attack has the same review system, with the exception of using "F' It" rather than "Skip." It's also the system used for the video game reviews in Boys' Life (the magazine of the Boy Scouts), under the names of "Buy," "Borrow," and "Bag," but not many people care about that.
Disney Adventures also used to use this rating system.
Nintendo Power uses a three-tier system for digital download reviews ("Recommended", "Hmmm...", and "Grumble Grumble").
Inside Pulse tried to avoid this, but got so many threatening letters from developers that it gave up on a numeric scale entirely, describing games with positive and negative adjectives instead.
When Assassin's Creed II was due for release, Ubisoft got caught in a major shitstorm when they announced that they wouldn't give the game out for testing unless the reviewer agreed in advance to give a positive review. Apparently, it didn't need the "boost".
Eidos also pulled this trick for Tomb Raider: Underworld.
Videogame review site actionbutton.net has been routinely lambasted for using a four point scale by fans who believe a game should have gotten five stars.
Spanish mag Nintendo Acción runs on this, to the point that some Pokémon fans complained when Pokémon Black and White got only a 94 when other games got scores of 96-98. Though in their defense, said review also lambasts the game's graphics, despite its great animated sprites and Scenery Porn, and yes, the previous games somehow scored better on graphics.
While Toonami hosted dozens of video game reviews over the course of the show, only a handful ever scored below 7 out of 10. No games ever scored lower than 6 on that scale either. The creators have admitted this is due to not having a professional reviewer in their group and only playing games they really like, not wanting to fill the air with needless negativity.
Believe it or not, during the mid-to-late '90s, IGN was actually pretty good about averting this trope. However, sometime around late 1998, it gradually cropped up more and more frequently on the site. For example, in 2000, they wrote a very critical (and angry!) review of the PC version of Final Fantasy VIII but still gave it a pretty solid 7.4/10.
A very notable exception to the rule is the VNDB (Visual Novel Data Base), which, as the name suggests, is a listing of (Japanese) visual novels on the market. When a user attempts to give a 10/10, the site actually warns them that this score is reserved for absolute perfection that is unlikely to ever be improved upon and as such, should be given only two or three times at most over one's lifetime. As a result, the list only has two entries over 9.00 and less than 50 entries over 8.00, out of a database of well over 10,000 titles. Since visual novels have fairly low requirements to function, as opposed to regular video games, their quality is almost entirely based around the story and therefore highly subjective. As such, even a game that scores around 7.00 can still be very enjoyable.
The defunct Game Player's magazine (now absorbed into several other publications) once had a major shakeup after realizing it had fallen into this trope, with even "terrible" games rating 50-60% scores. A new rating scale was devised to even out the score distribution, and was meant to be read in context with the review itself rather than be taken as an absolute. Under the new review system, even a game with a 50% score is probably still worth solid consideration by a fan of the game's genre, and a low-rated game could either be thoroughly underwhelming, or an excellent game for a very small audience of players. 90% and above, however, would be restricted only to games so fantastic that players outside of its genre might consider checking it out, and consequently, very few of these were given out through any particular year.
Chris Livingston, of Concerned fame, brings this up in his "Bullet Points" series on Crysis 2:
YouTube had a rating system that let people give a video a score of up to five stars, though hardly anyone gave less than three unless the video was particularly bad. This graph illustrates just how ridiculous it was. This led to a few widespread incidents of vote-bots giving dozens or hundreds of one-star ratings to people whose videos disagreed with the attackers' own political or religious beliefs, since a drop even to four stars would greatly reduce a video's traffic. YouTube has since dropped the 5-star system in favor of a simple like/dislike system.
Similarly, a web site that hosts community content for Left 4 Dead allows people to review the created content with scores ranging from 1-100. Trolls, or people who exaggerate how much they hate the custom content, will generally give a rating between 1-20. Anyone who wants to praise the author to the skies, or an author using an alt account, will give a score of 90-100. These high scorers will also ridicule others who give scores between 60 and 80, even if the content doesn't meet the standards for a high score. In other words, if the content is decent, you had better either give a high score or risk being flamed by the community for being too harsh or a troll.
The site then added a "I agree/I disagree" system to combat people who were abusing the score system, similar to many places that use a Like/Dislike feature. Unfortunately, it backfired since people could rate down a review enough to actually get a reviewer's score removed and be taken out of the overall average, which meant that a group of people could team up and vote down a review if the score wasn't a perfect 100. In turn, the reviewer could just reset their score to put it back up and defeat the purpose of the voting system. This system was then removed.
There is also a critic scoring system, which is displayed alongside the average score made by the community (example: a campaign can have a score of 85 by the community and a 60 by members who have critic status). However, since members with the critic status are just regular members who gave a lot of reviews, they are still open to spamming low or high scores.
Newgrounds is somewhat of an aversion to this; while the scale is only 0-5, it's an unspoken rule that if it's not up to snuff for the portal, it's a 0, if you just didn't like it or something along those lines you should vote 2, and if you love it vote 5. While 1, 3 and 4 are in there, hardly anyone uses them. Undoubtedly this is partially due to its "Blam"/"Protection" system which, generally, rewards you for relatively high ratings of content others have rated relatively high and low ratings for content others have rated low, in a blind system.
Though, as Retsupurae shows in their Newgrounds LPs, the comments will be filled with text reviews, and it's surprisingly common to see people complain about how bad a game or video was, then give it a 10/10.
Netflix allows you to rate movies, and aggregates all the user reviews into a star rating. Because there are people that will like something no matter how bad it is, and some people that will hate something no matter how good it is, 1 star and 5 star ratings are impossible. However, if a movie doesn't get above 1 and 1/2 stars, you should probably avoid it, and if it reaches 4 and 1/2 stars, it's probably worth watching. So the scale is skewed, but still relatively accurate.
A particularly interesting example of this trope occurs with brokerages. Brokerages have a quid pro quo relationship with the firms that they're supposed to be rating. Usually there's an informal understanding between the two that if the brokerage advises its investors to sell a particular firm's assets, that firm will stop providing the brokerage with information or other privileges. So brokerages almost never give firms a "sell" rating.
You can see a Four Point Scale in corporate credit ratings, where junk bonds and high risks get a B rating while better investments get A, AA, AAA, etc. In an ordinary education system, a B is a respectable grade and a C is a clear pass.
While there are independent rating agencies that are more honest, the big three (S&P, Moody's and Fitch) all receive contributions and payments from the companies they are rating. When the time comes to evaluate a company, the big three are generally the most listened-to voices. The incestuous relationship has been theorized to have greatly contributed to the 2008 Economic Meltdown.
It's worth noting that the lowest investment-grade (below which anything is "high yield" or "junk") is BBB-, and anything below AAA and above default can have a +/- modifier. Below BBB is BB, then B, then CCC, then CC, and a C rating is generally reserved for companies that are paying on time, but who have breached their collateral requirements and are in imminent danger of defaulting. A D is given if you actually default.
And now S&P has made the historic move of reducing the credit rating of the United States federal government from "triple A" to "double A plus." And if that sounds like some kind of joke … well, it is (see the Futurama example below). No word yet on whether the next grade down is plain "double A" or some increment like "double A plus minus."
AA+ is a completely valid rating that S&P was using long before giving it to the United States Government. Several other countries have AA+ rated sovereign debt, and corporations wear a AA+ rating as a badge of honor (it's EXTREMELY difficult for a corporation to obtain a AAA rating on its bond issues, as AAA implies no risk of default within the next 12 months). Thus, it would be perfectly respectable to have a AA+ rating but for the fact that it used to be AAA.
Related: Jim Cramer of Mad Money (featured in Arrested Development and Iron Man, as well as being an actual show) received a lot of flak from Jon Stewart of The Daily Show fame when it was revealed that he recommended buys and holds on stocks and companies that were, days later, revealed to be financially and ethically bankrupt. Further investigation revealed that Cramer and some business partners of his were using his show to artificially run up prices of stock that they owned by encouraging buys, then selling the stock, in a bizarre pump & dump scheme that he has never been prosecuted for.
This may explain the observed fact that you can indeed make money by following Cramer's stock picks specifically by selling them short the day after he touts them as a "buy" on his show.
Couchsurfing.com is a hosting website based around building up a reputation through a publicly visible vouching/feedback system. Negative "reviews" are so rare that many people will refuse to stay with or host people who have even one.
eBay only has a Positive-Neutral-Negative rating system, but it still skews very much toward positive. Some people leave neutral feedback for sellers when they really should give negative. Part of this is because eBay doesn't allow anonymous feedback, and a few sellers flip out and give the buyer negative feedback in retaliation.
The system itself actually discourages users from giving anything other than positive, making the user confirm that they have given the seller ample time, that they have tried to contact the seller about any problems, and that they understand what they're doing in order to give a neutral. This is more confirmation than one has to do to sign up to the system.
Now sellers are not even allowed to rate the buyers at all. This leads to ratings extortion; i.e. the buyer can withhold a positive rating after receiving the item unless you refund a portion of their money. To make matters worse, eBay now has "Detailed Seller Ratings", which are nominally based on a five-point scale. However, sellers will receive a warning (possibly followed by the withdrawal of certain selling privileges) if any of their ratings fall below 4.5. This means, in effect, that 4 out of 5 is considered a bad score and that it's actually better not to receive a rating at all than to receive one less than a perfect 5/5.
By and large, the lowest score you'll see for an album, book, or movie on amazon.com will be four stars. Similarly, on the seller's page, if any one of the ratings is under 4 stars, either the seller is flatly taking your money and making no effort to send the product or (more likely) somebody is overreacting to something out of the seller's hands, like the post office losing the package or the buyer not liking the product. It gets especially egregious when the comment says "Product was supposed to be like new but did not work at all" and the seller still gets a 3 out of 5.
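A rough sketch of that arithmetic, assuming (this is a guess, not eBay's documented method) that each Detailed Seller Rating is a plain average of the individual 1-5 scores and that the warning fires when that average drops below 4.5. The names `average` and `triggers_warning` are made up for illustration.

```python
from fractions import Fraction

# 4.5, the warning cutoff described above (exact, to avoid float rounding)
WARNING_THRESHOLD = Fraction(9, 2)

def average(ratings):
    """Plain arithmetic mean of a list of 1-5 scores (assumed scoring method)."""
    return Fraction(sum(ratings), len(ratings))

def triggers_warning(ratings):
    """True when the average falls below the assumed 4.5 cutoff."""
    return average(ratings) < WARNING_THRESHOLD

mostly_perfect = [5] * 9 + [4]      # nine 5s and a single 4
mixed = [5] * 5 + [4] * 6           # slightly more 4s than 5s

print(float(average(mostly_perfect)), triggers_warning(mostly_perfect))
print(float(average(mixed)), triggers_warning(mixed))

# Receiving a 4 is worse than receiving nothing at all:
print(average([5, 5, 5, 4]) < average([5, 5, 5]))
```

Under these assumptions, a seller whose buyers are split between "good" (4) and "perfect" (5) gets warned as soon as the 4s outnumber the 5s.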
The LSAT has a minimum score of 120, and a maximum of 180. The empty range is twice the size of the scored range.
Somewhat related, the Dutch Cito test at the end of primary school, which partially determines what kind of secondary education a pupil can/will take, has a range of 500-550. (The reason for this is to avoid the Cito results being misinterpreted as IQ.) The empty range is ten times the size of the scored range.
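Both "empty range" figures are easy to check. The sketch below assumes each scale nominally starts at zero and measures the scored range as maximum minus minimum; `empty_to_scored_ratio` is a made-up helper name.

```python
from fractions import Fraction

def empty_to_scored_ratio(minimum, maximum):
    """Ratio of the unused span (0 up to minimum) to the used span (minimum up to maximum)."""
    empty = minimum - 0          # points that can never be awarded
    scored = maximum - minimum   # points that actually separate test takers
    return Fraction(empty, scored)

lsat = empty_to_scored_ratio(120, 180)   # 120 / 60
cito = empty_to_scored_ratio(500, 550)   # 500 / 50

print("LSAT:", lsat)
print("Cito:", cito)
```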
If you're involved in humanities degrees in the British university system, you'll almost never see a mark below 35% or above 75%; forty points used on a hundred-point scale. Language marks tend to be capped at the top end to bring them in-line with humanities, since otherwise it would be quite possible to get 100% on a language test. And of course your final degree in any subject is awarded on a four-point scale, First/2:1/2:2/Third. The thresholds for those are usually 70/60/50/40% respectively.
While not quite as bad, the SAT I has a range from 600 - 2400 (a recently added section changed the grading from the previous score system of 400 - 1600.) Additionally, turning in a completely blank test (if it isn't discarded out of hand) will not result in the lowest possible score - the test taker has to actively answer questions incorrectly to get the lowest score.
The reason for this is to discourage guessing; if you don't know the correct answer, you leave it blank unless you can narrow it down to the point where you are on average gaining points.
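The expected value of a guess can be sketched as follows. This assumes the scoring rule used on the pre-2016 SAT's five-choice questions (+1 for a correct answer, -1/4 for a wrong one, 0 for a blank), which is not spelled out above; `expected_guess_value` is a hypothetical helper.

```python
from fractions import Fraction

def expected_guess_value(choices_left, penalty=Fraction(1, 4)):
    """Average raw points from guessing uniformly among the remaining choices.

    Assumes +1 for a correct answer and -penalty for a wrong one.
    """
    p_correct = Fraction(1, choices_left)
    return p_correct * 1 - (1 - p_correct) * penalty

# 5 choices left = blind guess; fewer = some answers already eliminated
for choices_left in (5, 4, 3, 2):
    print(choices_left, expected_guess_value(choices_left))
```

Under that assumption a blind guess among five choices averages out to exactly zero, the same as leaving the question blank, and guessing only starts to pay once at least one choice has been eliminated.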
In music festival ratings (mostly for high school choirs, orchestras and bands), you theoretically have 5 levels you can rate a performance. The scale is 5 = Poor, 4 = Fair, 3 = Good, 2 = Excellent, 1 = Superior. Very few groups get a 4 or 5, and 3's are what's given when something was terrible. The "Excellent" or "2" rating goes to groups that range from acceptable to very good. It's partially meant to be encouraging. You also have to sign your rating form and you want to be invited back - judges get paid.
The only groups that ever get '4's or '5's are those that are entering the competition for the first time. It may be that most judges consider a '4' to be below whatever they usually see in a music festival.
Many of these competitions have different "levels" of competition so that smaller schools and/or those without much band funding don't have to compete with programs with lots of support and a larger pool of students from which to draw. 4s and 5s often happen when a school enters themselves in too high of a level.
It is difficult to grade on a 100-point scale, as many ratings sites do, so even the best amateur critics tend to have a bimodal or trimodal distribution.
Competitive high school debate organizations use a different scoring system for each event, but a particularly egregious example of this trope can be seen in the Lincoln-Douglas event. Judges are asked to score competitors on a 30-point scale, but any score below 20 is to be reserved for extreme circumstances in which the judge must provide a written justification of why they gave a score lower than 20. Basically, as long as a contestant gets up, says enough words to fill the time limit, and doesn't use any foul language, they get at least a 20/30.
A (mostly Southern) California thing: restaurants are given a letter grade based on health and safety standards, which mostly comes down to how clean the place is. While the grades follow the usual A, B, C, D, F scheme, most restaurants have an A, and it's rare for a place to have a B (even in food courts where its neighbors have A's). People accuse this system of another kind of Rank Inflation, though, as an A has no real value when everyone has an A.
There is actually a legitimate reason for this: Since it's an official government statement on a restaurant's hygienic practices, anything below an A is a kiss of death — consumers tend to assume that even a B rated place is a plague pit, even though objectively that's still considered an acceptable rating. Most restaurants overhaul their practices very quickly to get back to an A rating or risk going bankrupt. Although this imbalance was not intended, it's generally seen as an overall good thing from a public health perspective.
The USDA beef grading. The grades that normal consumers have access to are (from lowest to highest) Select, Choice, and Prime. There are also five ranks below that, from lowest to highest: Canner, Cutter, Utility, Commercial, Standard. However, Kobe beef from Japan causes Rank Inflation. It's so good, it has its own grade above Prime.
However, note that if you're not eating it in Japan, it's almost certainly not actual Kobe beef, since the name has no legal protection outside its home country and can be slapped on dog food. "Kobe Beef" in U.S. restaurants is always "Prime with some extra spices and twice the price tag".
Angus beef may be in a similar situation.
Anyone watched the Olympics? Try the gymnastics events sometime. Despite being on a 10 point scale, it's rare for any competitor to get below a 9.5. Rank Inflation is so bad that critical flaws (such as a gymnast actually tripping and falling on their face) are worth only about a tenth of a point. Flaws that we viewers can't even distinguish? 1/100th of a point off. Scores generally range from 9.7 to 9.9.
Telephone customer service personnel will occasionally ask you to rate their level of service on a scale of 1-10. If you answer 9 or below, they'll ask for specific reasons why you didn't give them a 10. Customers who can't or don't care to name specific flaws in the service will probably amend their rating to 10. This makes a rating of 10 equivalent to acceptable service with no specific complaints, rather than service that is outstanding or beyond expectations.
People often rate appearance on a 4 point scale. Studies have shown that when asked to rate their own appearance a person will rate themselves somewhere in the 6 to 9 range on a scale of 10 (basically putting everyone above average in their own opinion) but will only rarely rate people lower than a 4 and even then most admit feeling guilty.
From 2005 to 2012 Ofsted school inspections in Britain graded schools on a scale of Outstanding, Good, Satisfactory, or Inadequate. Schools and their senior staff would invariably be criticised for being "Satisfactory". In 2012 "Satisfactory" was renamed "Requires Improvement", reflecting what the grade had already come to mean.
Enlisted Performance Reports in the military. On a 1-5 scale, a rating of "4" on any of the rating criteria could hurt a service member's next promotion, even though a "4" is listed as "Excellent".
Examples in media:
In My First IGN Interview (from the IGF Pirate Kart), you get the option to do a practice interview with an IGN applicant, who then asks you to rate how well she did. You have a choice between 10, 9, 8 or 7 out of 10, and if you pick 7 she gets as offended as if you had chosen 1. (This is obviously a subtle poke at IGN's game rating system.)
A mission in Borderlands 2's "Mr. Torgue's Campaign of Carnage" DLC involves the player characters being sent after a game reviewer who gave a negative review to a game Mr. Torgue really likes. The review: "Gameplay's pretty dull. It sucked. 6/10." Torgue is half upset because he thinks the game in question is very good, and half upset because by any logical standard a score of 6/10 is above average.
Parodied in the TV show The Critic. Jay is told by his boss that his job is to "rate movies on a scale from good to excellent." Jay himself is an inversion: he dislikes pretty much everything, and the best score he ever gave a film was a 7 out of 10.
In an episode of The Simpsons, a journalist who travels around America visiting locations to review visits Springfield. He's repeatedly tricked and abused by the residents and storms off to give Springfield the lowest rating he's given anywhere: 6/10.
In another, Homer becomes a food critic. At first, being Homer, he gives everything an excellent review. While his fellow critics eventually convince him to be crueler, he still won't give anything lower than "seven thumbs up".