Can flawed doctor ratings harm patients?

Can you trust online physician ratings? Many people believe you can.

But here’s what a recent article in the Harvard Business Review (HBR) had to say about the reliability of online user ratings of products. The authors concluded that online ratings were generally not to be trusted for three important reasons:

The reviews are usually based on a small sample of users. The authors said, “We can be more confident in an average star rating if the sample size is large and if the variability of the distribution of ratings is smaller (i.e., if different reviewers tend to agree).”

People who leave reviews are not randomly selected to do so. “Consumers with extreme opinions are more likely to post reviews, which is referred to as a ‘brag-and-moan’ bias.” The distribution of ratings is often clustered at 5-star and 1-star with only a few 2s, 3s and 4s.

Most people who post ratings have not purchased comparable products. Therefore, they have no way to test one product against another. “It is well-known that consumers’ quality evaluations are heavily biased by variables other than objective product performance, such as brand image, price, and physical appearance.”

These principles can be applied to physician ratings too.

The most important problem with online physician ratings is sample size. The HBR authors illustrated this with an example.

Take a product with an average rating of 4 stars. If the product was rated by at least 25 users, you can be 95 percent sure that the average is between 3.5 and 4.5.

So if two doctors have fewer than 25 ratings each, there may be no real difference between a doctor with a rating of 3.6 and one with a 4.4 score.
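To make the arithmetic concrete, here is a minimal sketch of the margin-of-error calculation. It is not the HBR authors' method. It assumes a normal approximation and a spread of about 1.28 stars between individual ratings, a figure I back-calculated so that 25 ratings give roughly plus or minus 0.5 stars, as in their example. The function names are hypothetical, and the intervals are not clipped to the 1-to-5 star scale.

```python
import math

Z_95 = 1.96        # two-sided 95 percent critical value of the normal distribution
ASSUMED_SD = 1.28  # assumption: spread of individual star ratings, back-calculated
                   # from the "25 ratings gives about +/- 0.5 stars" example


def margin_of_error(n_ratings, sd=ASSUMED_SD):
    """Half-width of an approximate 95 percent interval for an average rating."""
    if n_ratings < 1:
        raise ValueError("need at least one rating")
    return Z_95 * sd / math.sqrt(n_ratings)


def intervals_overlap(mean_a, n_a, mean_b, n_b, sd=ASSUMED_SD):
    """Rough check: do the two approximate 95 percent intervals overlap?"""
    moe_a = margin_of_error(n_a, sd)
    moe_b = margin_of_error(n_b, sd)
    low_a, high_a = mean_a - moe_a, mean_a + moe_a
    low_b, high_b = mean_b - moe_b, mean_b + moe_b
    return low_a <= high_b and low_b <= high_a


if __name__ == "__main__":
    # Margin of error shrinks only with the square root of the number of ratings.
    for n in (2, 25, 100):
        print(n, "ratings: +/-", round(margin_of_error(n), 2), "stars")

    # Two doctors rated 3.6 and 4.4, each with 25 ratings.
    print(intervals_overlap(3.6, 25, 4.4, 25))
```

Under these assumptions, the intervals for a 3.6 and a 4.4 overlap when each doctor has 25 ratings, so the gap could be noise. With only two ratings, the margin of error is well over a full star, and the normal approximation itself is shaky at that size, which only strengthens the point.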

What would you guess is the average number of ratings for physicians online?

According to the National Research Corporation in 2015, “Most online provider ratings are based on an average of only 2.4 reviews per physician.”

When I blogged about a 2014 paper claiming online doctor ratings were reliable, I noted that its authors looked at ratings over a 9-year period and found that only 21 percent of cardiac surgeons in Florida had been rated at all, an average of 1.9 times each. The 79 percent of surgeons who had no online ratings performed 79 percent of the total surgeries in 2012, the year the authors chose for examining patient outcomes.

Most of the surgeries were done by surgeons who had no online ratings. Comparing surgeons under those conditions is no better than flipping a coin.

What about online sites like ProPublica’s Surgeon Scorecard?

I and others have written about its many shortcomings, but not as succinctly and scientifically as patient safety guru Peter Pronovost and co-authors in the July 2016 issue of Annals of Surgery.

Since the full text of the three-page paper is available online, I will simply quote its conclusions:

“Patients and providers deserve valid and transparent performance measures, and hospitals and doctors should be accountable for the care they provide. Flawed measures, however, are not only meaningless but may actually harm patients, providers, and other stakeholders. The measurement enterprise must be held to the same high standards that we appropriately expect of health care providers.”

“Skeptical Scalpel” is a surgeon who blogs at his self-titled site, Skeptical Scalpel. This article originally appeared in Physician’s Weekly.
