Why the Surgeon Scorecard is a journalistic low point for ProPublica

With much hype and fanfare, the independent investigative journalism outfit ProPublica recently released its Surgeon Scorecard, assessing individual specialist surgeons who perform elective knee and hip replacements, spinal surgery, prostate surgery, and gallbladder removal surgery.

I had blogged about the impending release. My trepidation about the idea of a non-medical, non-scientific organization analyzing complex surgical data concerned issues such as patient accrual, exclusion/inclusion criteria, definition of terms, and the method of analysis that would be utilized. Alas, none of these questions were satisfactorily answered. Nothing about the scorecard really works very well. It distorts reality, clouds data, confuses patients, and proffers no insight into how a surgeon might improve his/her results. It essentially presents meaningless, poorly powered raw numbers in the form of fancy statistics and charts (to be elucidated shortly). The collective response from the community of practicing surgeons has been, “What the hell is this?” Even some members of the journalism guild have questioned the validity of the findings, with one going so far as to assert that ProPublica committed journalistic malpractice and should retract the piece. That’s not for me to say. But let’s break a few things down.

First, there are loads of problems with the methodology. The entire database is composed of Medicare billing records for inpatient-only hospital stays from the years 2009 to 2013. This is a red flag from the start for two reasons. One, it excludes outpatient Medicare cases, Medicare patients admitted through the ER, and the vast numbers of patients with private health insurance. Two, and most glaringly, the analysis of “surgeon quality” was based entirely on billing records. There was no case-specific analysis. No chart review. This may not have been possible given HIPAA and availability of data to journalists, but it is a critical weakness. Conclusions were based solely on ICD and DRG codes, context-free. This is like determining the best baseball player in America by evaluating batting average alone, independent of any and all context, and finding out that the award has to be given, not to Mike Trout, but to an 11-year-old boy in Huntsville, Alabama who bats cleanup and plays shortstop for his summer travel team because he finished the season hitting at a .744 clip, with 14 homers to boot.

I can’t reiterate enough the paucity of data that is analyzed. Laparoscopic cholecystectomy (LC) is generally either an outpatient procedure performed on a patient between the ages of 20 and 60, or it is done semi-urgently on a patient admitted through the ER with acute cholecystitis. Both of these scenarios would be excluded from the analysis pool.

What’s left is a tiny sliver of the total LCs actually performed in this country as the basis upon which to judge and assess quality. It’s just silly. I’m a practicing general surgeon in Cleveland, Ohio, so of course I spent some time reviewing the data on LC in my area. What I found was both ridiculous and inexplicable. If you try to find the LC complication rates of surgeons who operate at the main campus of the Cleveland Clinic, you will find only one surgeon listed who qualifies for analysis (a minimum of 20 procedures performed over the five-year time period).

At University Hospitals’ main campus, there are zero surgeons who made the cut for analysis. So, at the mother ship hospitals for the two massive health care providers of Northeast Ohio, there is apparently only one surgeon who did enough LCs to qualify for the Surgeon Scorecard. I mean, didn’t an editor at ProPublica find this odd, that the Cleveland Clinic allegedly doesn’t do enough LCs to qualify for the scorecard? I used to operate a good bit at Hillcrest, a community hospital on the east side. The two busiest general surgeons there from 2009 to 2013 also don’t qualify. None of it makes any sense.

We also have to talk about boring statistical terms like “confidence intervals.” ProPublica uses a 95 percent confidence interval when presenting its data. Given the relatively low number of procedures performed, many surgeons’ ratings straddle two, and sometimes three, categories (low, medium, high) of complication rates. As ProPublica itself admits:

There is a possibility that a surgeon whose adjusted complication rate is “high” might be equivalent to a doctor listed in the “medium” category. The further apart the doctors’ rates stand, the less probability there is of an overlap.

When I reviewed my data, I found that my “risk-adjusted complication rate” was 4.2 percent. (For what it’s worth, my complication rate was “better” than all but one surgeon in the Cleveland area. Hooray, I guess.) I don’t really know what to make of that 4.2 as a raw number, but when you account for the 95 percent confidence interval, the shaded areas of the CI show that I could plausibly be a low, medium, or high complication surgeon.

So, I could be good or bad. I could be medium. In fact, all surgeons in the Cleveland area who perform LCs and qualify for assessment fall within a rather narrow complication rate band of 4.1 to 5.5 percent. But then the confidence intervals scatter the results of Cleveland surgeons all over the board of low, medium, high. What is a patient to do with such unreliable, discordant information? How does this help an anxious patient make an informed decision? Nothing is gained. Nothing is learned. It’s like you’re 19 again, and some girl broke your heart: All is meaningless, full of sound and fury, signifying nothing.
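To make the small-numbers problem concrete, here is a minimal sketch of how wide a 95 percent interval gets when a surgeon has only 20 cases. To be clear about the assumptions: this uses a plain Wilson score interval on raw counts, which is a textbook method and not ProPublica’s risk-adjustment model, and the low/high cutoffs (3 and 8 percent) and the case counts are numbers I made up for illustration.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical category cutoffs, chosen for illustration only.
# These are NOT the thresholds used by the Surgeon Scorecard.
LOW_CUTOFF = 0.03
HIGH_CUTOFF = 0.08


def categories_spanned(lo, hi):
    """List every category that the interval [lo, hi] touches."""
    cats = []
    if lo < LOW_CUTOFF:
        cats.append("low")
    if lo < HIGH_CUTOFF and hi >= LOW_CUTOFF:
        cats.append("medium")
    if hi >= HIGH_CUTOFF:
        cats.append("high")
    return cats


# Same observed 5 percent rate, very different sample sizes (made-up counts).
for events, n in [(1, 20), (25, 500)]:
    lo, hi = wilson_interval(events, n)
    print(f"{events}/{n}: {lo:.1%} to {hi:.1%} -> {categories_spanned(lo, hi)}")
```

With one complication in 20 cases, the interval runs from roughly 1 percent to 24 percent and touches all three hypothetical categories. With the same 5 percent rate over 500 cases, it narrows to roughly 3.4 to 7.3 percent and stays in the middle band. Same surgeon, same rate; the only thing that changed is how much data you have.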

Another troubling aspect of the scorecard is the rather arbitrary way the term “complication rate” is defined. Per ProPublica, a surgeon gets dinged if one of two things occurs: the patient dies during the same admission in which the surgery was performed, or the patient is readmitted within 30 days of surgery and a panel of doctors determines that the readmission was “related to the surgery.” This is terrible on multiple levels. The 30-day readmission criteria are not clarified. We don’t know what factors were considered. We are simply told that “a panel of physicians” determined whether a readmission was “related” to the recent surgery. The word “related” is doing a lot of work in that sentence.

So the 84-year-old patient three weeks out from hip replacement who is admitted through the ER with “increasing confusion” due to insomnia and overuse of narcotic pain meds is a red mark against the orthopedic surgeon. So is the urinary tract infection two weeks after spinal surgery in a patient with known BPH. The anxious 27-year-old lady readmitted at midnight two days after an LC because of refractory nausea. The 49-year-old male who develops chest pains ten days after lumbar fusion surgery. All these are reportable offenses that don’t necessarily have anything to do with the quality of the procedure performed.

Most appallingly, these minor events are categorized in the same vein as a freaking peri-op death when assessing individual surgeon quality. So a surgeon who has a tendency to operate on older patients and subsequently sees a higher percentage of his patients readmitted with tangentially related minor medical issues could conceivably have a higher “adjusted complication rate” than a true hack surgeon who kills a few otherwise healthy patients every year.
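The arithmetic behind that worry is simple enough to sketch. Assume, as described above, that an in-hospital death and a “related” 30-day readmission each count as exactly one complication. The two surgeons and all of their numbers below are invented, and this is a raw rate with no risk adjustment, so treat it as an illustration of the counting problem and not a reconstruction of the scorecard’s model.

```python
def composite_rate(deaths, readmissions, cases):
    """Raw complication rate when a death and a readmission each count as one event."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return (deaths + readmissions) / cases


# Hypothetical surgeons; every number here is made up for illustration.
# Surgeon A: older patients, several minor readmissions, no deaths.
surgeon_a = {"deaths": 0, "readmissions": 6, "cases": 100}
# Surgeon B: healthier patients, few readmissions, but three deaths.
surgeon_b = {"deaths": 3, "readmissions": 1, "cases": 100}

rate_a = composite_rate(**surgeon_a)
rate_b = composite_rate(**surgeon_b)

print(f"Surgeon A: {rate_a:.1%} complications, {surgeon_a['deaths']} deaths")
print(f"Surgeon B: {rate_b:.1%} complications, {surgeon_b['deaths']} deaths")
```

Under this counting rule, Surgeon A comes out at 6 percent and Surgeon B at 4 percent, so the surgeon with three deaths looks “better” than the one with none.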

Furthermore, why does ProPublica exclude all complications that occur during the surgical admission except death? Why is “return to OR” not there? What about post-op hemorrhage and the need for transfusion? What about a surgeon who all too regularly whacks a common bile duct and transfers the patient immediately to a tertiary care center where it is promptly repaired, and the patient never gets readmitted? What about an orthopod who is careless about post-op DVT prophylaxis and sees an unacceptable level of blood clots and pulmonary embolisms in his patients? What about surgical site infections? Why is death the only in-hospital metric deemed appropriate for quality assessment? It’s really an embarrassing lapse in judgment and methodology.

You see, surgeons across America are not afraid of transparency. Cardiac surgeons have had to publicly report their CABG results for years. The American College of Surgeons has made transparency and quality improvement a focus of inquiry. Justin Dimick at Michigan, Karl Bilimoria at Northwestern, and Conor Delaney at UH Case Medical Center are doing yeoman’s work getting some of this complicated data into peer-reviewed journals. Through initiatives such as NSQIP and PQRS reporting, the College has begun the long, arduous process of quality assessment to ensure that patients and payers are presented with data that are accurate, comprehensive, and fair to surgeons.

ProPublica calls out a urologist at Johns Hopkins, one of our elite tertiary care centers, for having a higher complication rate than some of his colleagues, without accounting for any mitigating factors. What’s his patient population? Does he tend to operate on sicker patients? How many did he do? What exactly were his so-called “complications”? Did he perform a lot of “re-do” or revisional surgery?

None of these critical, enlightening factors are considered. They wanted to get their story up online ASAP. They wanted to be first, which is a fundamental principle that drives a lot of modern journalism, but isn’t so useful when it comes to presenting highly complex, scientific data to the general public. You can’t just vomit up a thin sliver of data based on a select cohort of patients and arrogantly title your findings Surgeon Scorecard as some sort of definitive, go-to patient resource.

And as a companion piece to the Surgeon Scorecard, ProPublica released an article on Dr. Constantine Toumbis, a spinal surgeon in Florida who apparently has a higher than normal complication rate and was recently revealed to be an ex-felon, the result of a stabbing incident 20 years ago when he was a medical student. In doing so, ProPublica veers perilously close to the yellow journalism of William Randolph Hearst and Gawker and the New York Post. It’s a low moment for an otherwise esteemed investigative operation that has been deservedly recognized for its work in exposing corruption, deceit, and greed across a wide range of subject matter.

But this project is no good. The worst surgeons are either too slow or way too fast. ProPublica rushed this study to print without doing the due diligence of vetting it with actual surgeons who are actively attempting to perform the same task of assessing and improving surgical outcomes. There are no shortcuts to this. It will take some time. It’s complex. It will take some twisting of arms within the surgical community. But it’s coming. No longer will we as surgeons be able to hide behind our surgical masks or the “MD” certificate hanging on our office walls. We will have to demonstrate proficiency and excellence. I am confident that most board-certified surgeons in this country are unafraid of such a proposition. As long as it’s done the right way.

Jeffrey Parks is a general surgeon who blogs at his self-titled site, Jeffrey Parks, MD.
