Did the NEJM publish a bad study about checklists?

Recently, a study in the New England Journal of Medicine called into question the effectiveness of surgical checklists for preventing harm. Atul Gawande, one of the original researchers to demonstrate the effectiveness of such checklists and the author of a book on the subject, quickly wrote a rebuttal at The Incidental Economist. He writes, “I wish the Ontario study were better,” and I join him in that assessment, but I want to take it a step further.

Gawande first criticizes the study for being underpowered. I had a hard time swallowing this argument given that they looked at over 200,000 cases from about 100 hospitals, so I did the math. A quick calculation shows that, given the rates of death in their sample, they had only about 40% power (we conventionally aim for a power of 80% or better). Then I became curious about Gawande’s original study, which achieved better than 80% power with just over 7,500 cases. How is this possible?

The most important thing I keep in mind when I think about statistical significance, other than the importance of clinical significance, is that it depends not only on the sample size but also on the baseline prevalence and the magnitude of the difference you are looking for. In Gawande’s original study, the baseline prevalence of death was 1.5%. This is substantially higher than the 0.7% in the Ontario study. As your baseline prevalence approaches the extremes (i.e., 0% or 100%), you have to pump up the sample size to achieve statistical significance.

So, Gawande’s study achieved adequate power because its baseline rate was higher and the difference it found was bigger. The Ontario study would have needed a little over twice as many cases to achieve 80% power.
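The back-of-the-envelope power calculation above can be sketched in a few lines of Python. This is a minimal sketch using the normal approximation for a two-sided test of two proportions; the mortality rates and per-period sample sizes in the examples are approximate, rounded figures from the two papers, not numbers given in this post.

```python
# Post-hoc power of a two-sided z-test for a difference in proportions,
# using the normal approximation.
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power to detect a difference between proportion p1
    (in a group of size n1) and proportion p2 (in a group of size n2)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = abs(p1 - p2) / se
    # Two-sided power: probability the test statistic clears the
    # critical value in either direction.
    return nd.cdf(z - z_crit) + nd.cdf(-z - z_crit)

# Ontario study: ~0.71% vs ~0.65% mortality, roughly 109,000 and
# 106,000 cases before and after (approximate figures).
print(power_two_proportions(0.0071, 0.0065, 109000, 106000))  # roughly 0.4

# Gawande's original study: 1.5% vs ~0.8% mortality, roughly 3,700
# and 3,900 cases (approximate figures).
print(power_two_proportions(0.015, 0.008, 3700, 3900))  # roughly 0.8
```

With the rare outcome and tiny absolute difference in the Ontario data, the calculation lands near 40% power; with the higher baseline rate and larger drop in Gawande’s data, it clears 80% despite a sample less than a twentieth the size.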

This raises an important question: Why didn’t the Ontario study look at more cases?

The number of cases in a study is usually dictated by limitations in data collection. Studies are generally limited by the manpower they can afford to hire and by the realistic time constraints of conducting a study. Studies that use existing databases, however, are usually not subject to these limits. While creating queries to extract data is often tricky, once you have set up your extraction methodology it simply dumps the data into your study database. You can extend or contract the period of data collection simply by changing the parameters of your query. Modern computing power means there are few limitations on the size of these study databases or the statistical methodologies we can employ. Simply put, the Ontario study (which relied on “administrative health data,” read: “existing data”) easily could have doubled the number of cases in their study.
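To make that point concrete, here is a minimal sketch of why widening a window over existing data is cheap: the extraction logic stays fixed and only the date parameters change. The table and its contents are entirely hypothetical, made up for illustration.

```python
import sqlite3
from datetime import date

# Hypothetical miniature "administrative health data" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (surgery_date TEXT, died INTEGER)")
conn.executemany("INSERT INTO cases VALUES (?, ?)", [
    ("2010-01-15", 0), ("2010-04-20", 1),
    ("2010-07-05", 0), ("2010-11-30", 0),
])

def extract_cases(start, end):
    """Pull every case with a surgery date in [start, end].
    Widening the study window is nothing more than changing
    these two parameters -- the query itself never changes."""
    return conn.execute(
        "SELECT surgery_date, died FROM cases "
        "WHERE surgery_date BETWEEN ? AND ?",
        (start.isoformat(), end.isoformat()),
    ).fetchall()

print(len(extract_cases(date(2010, 1, 1), date(2010, 3, 31))))   # 1 case, 3-month window
print(len(extract_cases(date(2010, 1, 1), date(2010, 12, 31))))  # 4 cases, 12-month window
```

The same extraction run over a longer window returns more cases at essentially zero additional cost, which is exactly why a study built on administrative data has little excuse for being underpowered.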

Exactly how did they define their study group? As Gawande points out in his critique, the Ontario study relied on this bizarre 3-month window before and after checklist implementation at individual hospitals. Why 3 months? Why not 6 or 12 or 18? They even write in their methods: “We conducted sensitivity analyses using different periods for comparison.”

They never give the results of these sensitivity analyses or provide sound justification for their choice of a 3-month period. Three months not only keeps their power low but also fails to account for secular trends. Maybe something like influenza was particularly bad in the post-checklist period, leading to more deaths despite effective checklist use. Maybe a new surgical technique or tool was introduced, like the da Vinci robot, or many new, inexperienced surgeons were hired, driving mortality up. In discussing their limitations, they address this:

Since surgical outcomes tend to improve over time, it is highly unlikely that confounding due to time-dependent factors prevented us from identifying a significant improvement after implementation of a surgical checklist.

I will leave it to you to decide if you think this is an adequate explanation. I’m not buying it.

Gawande concludes that this study reflects a failure of checklist implementation rather than a failure of checklists themselves. I’m inclined to agree.

Ultimately, I don’t wonder why this study was published; bad studies are published all the time (hence the work of John Ioannidis). I wonder why it was published in the New England Journal of Medicine. NEJM is supposed to be the gold standard of academic medical research: if they print it, you should be able to be confident in the results and conclusions. Their editors and peer reviewers are supposed to be the best in the world. The Ontario study seems far below the standard I expect of NEJM.

I think their decision to accept the paper hinged on the fact that this was a large study that showed a negative finding on a subject that has been particularly hot over the past few years. Nobody seemed to care that this was not a particularly well-conducted study; this is the sadness that plagues the medical research community. Be a critical reader.

Josh Herigon is a medical student who blogs at mediio.


  • PoliticallyIncorrectMD

    I find the whole argument peculiar, however very reflective of the contemporary approach to medical “science.” Studies are labeled bad if they contradict whatever the particular belief of the particular researcher is. In the rest of science, it is done the other way around!

  • http://intellectualfollies.blogspot.com/ Vamsi Aribindi

    NEJM has lost much of its gold-standard status in my eyes.

    Among its sins is its publication of the Vioxx/rofecoxib study way back in 2004 (1). It apparently didn’t learn from this and went on to publish an Avandia study just a few years later (2).

    Finally, Dr. Drazen (editor of the NEJM) published a stunning editorial. In response to a study showing that most physicians distrusted trials sponsored by pharmaceutical companies regardless of their “quality,” Dr. Drazen exhorted doctors to “just believe the data” (3). Of course, soon after, clear evidence emerged that at least one pharma company (Novartis) had blatantly falsified valsartan data (4).

    The NEJM gets too much of its funding from pharma-company-ordered reprints of journal articles that pharma reps use to convince doctors to prescribe their drugs (5). I will grant that Dr. Drazen and the NEJM led the drive to establish clinicaltrials.gov, but still: there’s too much money and too much taint.

    (1), (2), (3): Whoriskey P. As drug industry’s influence over research grows, so does the potential for bias. Washington Post. 2012 November 24. http://www.washingtonpost.com/business/economy/as-drug-industrys-influence-over-research-grows-so-does-the-potential-for-bias/2012/11/24/bb64d596-1264-11e2-be82-c3411b7680a9_story.html

    (4): Lancet Editors. Retraction–Valsartan in a Japanese population with hypertension and other cardiovascular disease (Jikei Heart Study): a randomised, open-label, blinded endpoint morbidity-mortality study. Lancet. 2013 Sep 7;382(9895):843.

    (5): Dorsey et al. Finances of the publishers of the most highly cited US medical journals. J Med Libr Assoc. 2011 Jul;99(3):255-8.

  • http://onhealthtech.blogspot.com Margalit Gur-Arie

    Completely off topic, and just to add to Vamsi’s point, I am a bit troubled by NEJM’s stance on research from a slightly different perspective.
    Following the controversy regarding the nature of informed consent for the SUPPORT trials, NEJM, or rather Dr. Drazen, found it necessary to publish one of those stunning editorials, blasting the government for finding that consent was inappropriately obtained in this case.
    Drazen JM, Solomon CG, Greene MF. Informed consent and SUPPORT. N Engl J Med 2013;368:1929-1931
    Whether you agree or disagree with Dr. Drazen and the US branch of the SUPPORT trial, I don’t see why NEJM is reflexively advocating for fewer constraints on human research (or maybe I do…).
    Either way, I think we need to have a serious public debate on this before another cart leaves the barn in front of the horses (my take is here, including all the pertinent links to papers: http://onhealthtech.blogspot.com/2014/03/is-nuremberg-code-obsolete.html )