The ProPublica Surgeon Scorecard: When journalists become scientists

Most of us would agree that there aren’t enough valid and meaningful health care quality measures to guide patients’ choices of hospitals and physicians. While the federal government has steadily expanded the number of publicly available measures on its Hospital Compare website, it still falls short of what many patients, payers, and providers would like. This is particularly true in the realm of outcomes such as infections and mortality rates, and in provider-level ratings.

Journalists and other organizations that produce ratings have recently attempted to fill the measurement chasm left by policymakers and health care professionals. In July, the nonprofit journalism organization ProPublica unveiled its Surgeon Scorecard, posting the "adjusted complication rates" for more than 16,000 physicians in eight inpatient procedures. The Scorecard's release set off an intense debate within the health care community about the validity of the measure and about the obligations of journalists when they act as scientists and create new measures. With the Surgeon Scorecard, ProPublica acted as judge and jury: it defined the measure, deemed it valid, and declared which surgeons were low quality. What assurances does the public have that such "vigilante" measures are scientifically sound? ProPublica says its work was "guided by experts," but that review was informal.

Shortly after the Scorecard was issued, some detractors on social media called for it to undergo peer review, a process that is typical for government-issued measures. That review was delivered on Friday, when several researchers in health care quality measurement, including me, published a critique on the RAND Corporation website. Our conclusion: Patients should not consider the Scorecard a valid or reliable predictor of any individual surgeon’s outcomes.

Among several concerns raised, we pointed out that the adjusted complication rate, which was based mostly on readmissions, was not a true complication rate. The measure didn't consider complications that occurred during a hospital admission and ignored many complications that are most meaningful to patients. For instance, erectile dysfunction is common after radical prostatectomy (affecting more than 50 percent of patients, according to some estimates), but it was not tracked in the ProPublica measure. We also found problems with the underlying data used by ProPublica: some surgical cases were attributed to non-surgeons or to surgeons in the wrong specialty, a finding that suggests the existence of other errors that are harder to detect.

Developing and vetting a valid new quality measure can be hard, tedious and controversial. Yet that process unearths weaknesses, improves the final product, and ultimately makes the measure more useful to patients and physicians. No matter who creates a measure — the government, journalists or nonprofit groups — we all have a duty to ensure it receives the highest level of scrutiny before it’s issued, not after the fact. When journalists act as scientists, they should be held to the standards of scientists.

Peter Pronovost is an anesthesiologist and director of the Armstrong Institute for Patient Safety and Quality. He blogs at Voices for Safer Care, where this article originally appeared.
