World of trade-offs: What journalists think of rating scales in fact-checking

Rating systems are widely used among fact-checkers. According to Mark Stencel, co-director of the Duke Reporters’ Lab, 70% of the 188 fact-checking outlets worldwide employ rating scales.

But the idea has drawn backlash over the years, with critics alleging that rating systems reflect fact-checkers’ biases and oversimplify complex issues. Some journalists have also expressed frustration that a rating system limits their reach with certain audiences.

So why do some outlets choose to adopt, or not adopt, a rating system? And is there a way to improve the practice?

Interviews with professional fact-checkers and scholars present a seemingly polarized picture: much as Americans disapprove of Congress but approve of their own representative, journalists at different fact-checking outlets see flaws in ratings in general but defend their own rating scheme.

“Marketing gimmick”

When Michael Dobbs proposed the idea of the Fact Checker column in 2007, the then Washington Post reporter had in mind a rating system using Pinocchio noses to measure the degree of falsehood in political claims. The more egregiously wrong a claim was, he thought, the longer the nose would be. An editor revised it to a 1-4 scale of Pinocchios.

The design, when first adopted, drew much praise from within the journalism industry for offering a fun and digestible approach to exposing political lies.

Glenn Kessler, Dobbs’ successor at the Post’s Fact Checker, said the use of Pinocchios makes it easier to draw audiences in and grow The Post’s readership. Referring to an outlet that chose not to use a rating system, he said that “they’re not trying to sell newspapers.

“We’re in the news business,” he said. “News business means headlines, it means bottom-line assessments, it means easy-to-understand ways to find out what your bottom-line ruling is … I’ve always said it’s a marketing gimmick.”

Around the same time the Pinocchio Test was born, PolitiFact, the fact-checking arm of the Poynter Institute, put forward its own six-point Truth-O-Meter, ranging from True (for nothing but the truth) to Pants on Fire (for the most egregious and ridiculous claims).

The ratings, widely cited by politicians as well as everyday readers, also help with marketing, said PolitiFact managing editor Katie Sanders.

The phrase “Pants on Fire,” derived from the children’s rhyme “Liar, liar, pants on fire,” was PolitiFact’s effort to bring a sense of lightness to fact-checking, said PolitiFact reporter Jon Greenberg.

But it is exactly the catchiness and levity of those rating scales that sow doubt among fact-checkers who do not favor a rating system. Daniel Dale, who writes fact-checks for CNN, said enticing readers with the colorful labels built into a rating system was, to some readers, “treating political dishonesty with a kind of a lightness that it should not have.

“We’re turning the truth into … a kind of entertainment or game,” Dale said.

Quick fix or oversimplification? Fact-checking is not rocket science

Dale said he prefers custom-written summaries to simple labels: they engage readers while also capturing the nuanced and complicated analysis in each fact-check.

Many fact-checkers echoed Dale’s concern that it is hard for sophisticated political claims to fit perfectly into one of however many ratings a system offers.

Bill Adair, founder of PolitiFact and designer of the Truth-O-Meter, said the design of the rating scale adopted a “layered approach.” By presenting the rating first and a more nuanced analysis later in each fact-check, he said, the system offers readers a choice to either get a quick fix by glancing at the rating fact-checkers gave or go deeper into the analysis in each fact-check.

Lucas Graves, who wrote the book “Deciding What’s True: The Rise of Political Fact-Checking in American Journalism,” said in an interview that a strong suit of these rating systems is that they allow reporters and readers to easily track a speaker’s record of truth-telling. The practice is “only possible because (fact-checkers) make the effort to quantify sort of individual decisions that they reach about political statements,” Graves said. “Fact-checks can have a greater impact when they are easier to quote, easier to refer to and easier to tally up in that way.”

But Jim Drinkard, former reporter at USA Today and former editor at the Associated Press, perceives danger in quantifying the number of inaccurate claims. He says it allows readers without the necessary context to misuse the ratings in a way they were not intended to be used. “Whoever is the president at the time is going to be the most fact-checked person in the universe,” Drinkard said. “So, if you’re counting up the number of wrong things that a politician says, the president is always going to have the most because they do the most public speaking and they are the most attended to.”

As many fact-checkers acknowledge, it is hard to apply a perfect rating to each claim. At PolitiFact, trainees were told that the truth wasn’t always clear-cut, Graves (2016) noted. “The very philosophy behind the Truth-O-Meter is shades of gray,” he said. The system prompts reporters to think like judges as they defend their decisions on each rating, Graves (2016) said.

As part of a rigorous system, PolitiFact requires a panel of three of its journalists to review each rating before publication. The panel looks at similar claims and rulings from the past, Greenberg said, to keep ratings consistent.

Many professional fact-checkers, and many who have studied fact-checking extensively, stressed that fact-checking is not rocket science.

“The process of evaluating a claim and assigning a rating is as much an art as a science that requires judgments by actual people,” Graves said.

Improving the system

While some proposed ways to improve the rating system, many are generally resistant to the idea of a system overhaul.

When asked whether PolitiFact would add a rating, for example for claims that cannot be proven, Greenberg said changing the current rating scale would be “a big deal.”

“We don’t have such huge problems with what we’ve got right now that it is worth revisiting such a fundamental part of our operation,” he said.

The Post added a new rating, the “Bottomless Pinocchio,” in December, largely in response to repeated false claims by President Donald Trump, Kessler said. But beyond occasional refinement, Kessler said, the Fact Checker’s system is fine as is.

So is there a way to minimize the blowback while maximizing the benefits of a rating system? Some fact-checkers shared their insights.

Start by acknowledging and articulating that fact-checking is not scientific research

The key to dialing down the backlash, Graves said, is to lower the heat on ratings.

“[F]act checkers can be transparent about the methodologies,” he said. “They can be careful not to claim that their ratings reflect anything more than the judgment (of) seasoned journalists who are doing their best making a good-faith effort to apply verdicts in a consistent way.”

Sanders said she would stress the importance of the context of each fact-check when promoting PolitiFact’s content on social media. “We need to just preach a better understanding of the big picture,” Sanders said.

Stay consistent

Rating definitions should guide reporters through the fact-checking process, Greenberg said, and should be consulted to make sure the reporting remains consistent.

Kessler, who has been writing the Fact Checker for nine years, described an email exchange with a reader who wrote to tell him that a claim he had just rated was the same as one he had fact-checked years ago. “So with great trepidation, I opened (the email) up to make sure that the Pinocchio rating was exactly the same,” Kessler said. “And actually, it was.”

Check your own bias

Many fact-checkers mentioned that they have become more aware of their own biases when fact-checking Trump.

When faced with a president who frequently makes false claims, Greenberg said, the team grew more “attentive” to dubious claims from Trump. But at the same time, he said, there is a danger of growing numb to those false claims.

When the PolitiFact D.C. bureau was rating a claim made by Trump on polling numbers, Greenberg said, the team in Washington was leaning toward Mostly False. But an editor based in Florida suggested Mostly True.

“Because he said, look at this, his number is wrong, but it’s not ludicrously wrong. It doesn’t push you in the wrong direction,” Greenberg said. “And he argued our own jurisprudence on us. We said, you are right, and it came out as a Mostly True.”

Kessler shared a similar experience. “We once nearly gave Trump a One Pinocchio,” Kessler said, “and I was like, wait a second, if Obama said this, we would definitely give another two.”

Explore alternatives to present the facts

Continuing his endeavor to reach a wider audience, Adair said he and his colleagues at Duke University are researching a new way to present on-screen live fact-checks on TV to what he called “Do-It-Yourselfers.”

“They don’t want us to summarize the facts for them; they just want to know the facts,” Adair said of the Do-It-Yourselfers. Instead of presenting rated fact-checks, Adair said his research would explore two alternatives: 1) give the subjects facts and leave the decision up to the subjects themselves, and 2) show the subjects terms such as “Half True” or “False” without a meter.

That doesn’t mean the Truth-O-Meter needs to be revised, Adair said. “I’m just wondering if we can present fact-check content in different ways that may or may not include the Truth-O-Meter.”

Graves said outlets with a rating scale and those without complement each other: they recognize each other’s work and together form a diverse fact-checking environment.

“You want fact-checkers testing … the best ways to make a difference in public discourse, and make it easier for people to find the truth, and make it easier to discourage politicians from repeating false claims,” Graves said. “That’s more likely when you have lots of organizations trying out lots of different things than it is if you have a monoculture where everyone’s just doing the same thing.”

