Consider every new scientific finding to be nothing more than a first draft

A recurring theme on e-Patients.net is the need for empowered, engaged patients to understand what they read about science. It’s true when researching treatments for one’s condition, it’s true when considering government policy proposals, it’s true when reading advice based on statistics. If you take any journal article at face value, you may get severely misled; you need to think critically.

Sometimes there’s corruption (e.g. the fraudulent vaccine/autism data reported recently, or “Dr. Reuben regrets this happened”), sometimes articles are retracted due to errors (see the Retraction Watch blog), sometimes scientists simply can’t reproduce a result that looked good in the early trials.

But an article in the New Yorker sent a chill down my spine tonight. It’ll chill you, too, if you believe the scientific method leads to certainty.

This sums it up:

Many results that are rigorously proved and accepted start shrinking in later studies.

This is disturbing. The whole idea of science is that once you’ve established a truth, it stays put: you don’t combine hydrogen and oxygen in a particular way and get water some of the time and chocolate cake the rest.

Reliable findings are how we’re able to shoot a rocket and have it land on the moon, or step on the gas and make a car move (predictably), or flick a switch and turn on the lights. Things that were true yesterday don’t just become untrue. Right?

Bad news: sometimes the most rigorous published findings erode over time. That’s what the New Yorker article is about.

I won’t try to teach here everything in the article; if you want to understand research and certainty, read it. (It’s longish, but great writing.) I’ll just paste in some quotes. All emphasis is added, and my comments are in [brackets].

All sorts of well-established, multiply confirmed findings have started to look increasingly uncertain. It’s as if our facts were losing their truth: claims that have been enshrined in textbooks are suddenly unprovable. … In the field of medicine, the phenomenon seems extremely widespread, affecting not only antipsychotics but also therapies ranging from cardiac stents to Vitamin E and antidepressants: Davis has a forthcoming analysis demonstrating that the efficacy of antidepressants has gone down as much as threefold in recent decades.
“This is a very sensitive issue for scientists,” [Schooler] says. “You know, we’re supposed to be dealing with hard facts, the stuff that’s supposed to stand the test of time. But when you see these trends you become a little more skeptical of things.”
[One factor is] publication bias, or the tendency of scientists and scientific journals to prefer positive data over null results, which is what happens when no effect is found. The bias was first identified by the statistician Theodore Sterling, in 1959, after he noticed that ninety-seven per cent of all published psychological studies with statistically significant data found the effect they were looking for.

[The point here is that naturally all you see published is a successful study. Lots of useful information can come from failed studies, but they never get published.]
[The problem is that anything can happen once, at random. That’s why it’s important that a result be replicable (repeatable by another scientist): like that light switch, if someone else tries it, you better get the same result. But the article points out that most published results are never tested by another researcher.]
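
[A rough way to see how those two comments fit together is to simulate them. The sketch below is mine, not from the article. It invents a small true effect, runs thousands of imaginary studies on it, "publishes" only the ones that clear a conventional significance bar, and then replicates each published study once. Every name and number in it (`run_simulation`, a true effect of 0.1, 20 measurements per study, noise with a known spread of 1.0) is an assumption chosen for illustration, not data from any real trial.]

```python
import random
import statistics


def simulate_study(true_effect, n, rng):
    """Observed average effect in one study of n noisy measurements."""
    return statistics.fmean(rng.gauss(true_effect, 1.0) for _ in range(n))


def run_simulation(true_effect=0.1, n=20, studies=5000, seed=0):
    """Compare all studies, 'published' studies, and their replications.

    All defaults are hypothetical values picked for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable

    # A study counts as "significant" here if its average is more than
    # 1.96 standard errors above zero (noise spread assumed known = 1.0).
    threshold = 1.96 / n ** 0.5

    all_results = [simulate_study(true_effect, n, rng) for _ in range(studies)]

    # Publication bias: only the significant, positive studies get printed.
    published = [r for r in all_results if r > threshold]

    # Each published study is repeated once, and the repeat is reported
    # whether or not it "works".
    replications = [simulate_study(true_effect, n, rng) for _ in published]

    return {
        "true_effect": true_effect,
        "threshold": threshold,
        "share_published": len(published) / studies,
        "mean_all": statistics.fmean(all_results),
        "mean_published": statistics.fmean(published),
        "mean_replications": statistics.fmean(replications),
    }


if __name__ == "__main__":
    for name, value in run_simulation().items():
        print(f"{name}: {value:.3f}")
```

[With these made-up settings, the published studies should average well above the true effect, while their replications drift back toward it. Nothing about the underlying effect changed; the only filter was which studies got published.]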

In recent years, publication bias has mostly been seen as a problem for clinical trials, since pharmaceutical companies are less interested in publishing results that aren’t favorable. But it’s becoming increasingly clear that publication bias also produces major distortions in fields without large corporate incentives, such as psychology and ecology.
[But publication bias] remains an incomplete explanation. For one thing, it fails to account for the initial prevalence of positive results among studies that never even get submitted to journals. [By this point, this article was driving me nuts.]
[Re another cause of this problem,] In a recent review article, Palmer summarized the impact of selective reporting on his field: “We cannot escape the troubling conclusion that some—perhaps many—cherished generalities are at best exaggerated in their biological significance and at worst a collective illusion nurtured by strong a-priori beliefs often repeated.”
[We had two posts in October, here and here, about an Atlantic article by Dr. John Ioannidis, who is quoted in this article:] “…even after a claim has been systematically disproven”—he cites, for instance, the early work on hormone replacement therapy, or claims involving various vitamins—“you still see some stubborn researchers citing the first few studies that show a strong effect. They really want to believe that it’s true.”
The current “obsession” with replicability distracts from the real problem, which is faulty design [of studies].

In a forthcoming paper, Schooler recommends the establishment of an open-source database, in which researchers are required to outline their planned investigations [before they do them] and document all their results. [Including those that fail!]
[Note: Pew Research publishes all its raw data, for other researchers to scrutinize or use in other ways.]

The corker that caps it off comes from John Crabbe, an Oregon neuroscientist, who designed an exquisite experiment in which mice were sent to three different labs with incredibly uniform conditions. Read the article for details. When these mice were injected with cocaine, the reactions of the three groups of relatives were radically different. Same biology, same circumstances, seven times greater effect in one of the groups.

If you’re a researcher and this has happened, and it’s time to “publish or perish,” what do you do? What is reality?

The article winds down:

The disturbing implication of the Crabbe study is that a lot of extraordinary scientific data are nothing but noise. The hyperactivity of those coked-up Edmonton mice wasn’t an interesting new fact—it was a meaningless outlier, a by-product of invisible variables we don’t understand.

Implications for e-patients

Wiser people than I will have more to say, but here are my initial takeaways.

Don’t presume anything you read is absolute certainty. There may be hope where people say there’s none; there may not be hope where people say there is. Question, question, question.
Be a responsible, informed partner in medical decision making.
- Don’t expect your physician to have perfect knowledge. How can s/he, when the “gold standard” research available may be flaky?
- Do expect your physician to know about this issue, and to have an open mind. The two of you may be in a glorious exploration of uncertainty. Make the most of it.
Expect your health journalists to know about this, too. Health News Review wrote about this article last week, and I imagine Retraction Watch will too. How about the science writers for your favorite news outlet? I can’t imagine reporting on any finding without understanding this. Write to them!

Mind you, all is not lost. Reliability goes way up when you can reproduce a result. But from this moment forward, I’m considering every new scientific finding to be nothing more than a first draft, awaiting replication by someone else.

Dave deBronkart, also known as e-Patient Dave, blogs at e-Patients.net and is the author of Laugh, Sing, and Eat Like a Pig: How an Empowered Patient Beat Stage IV Cancer and Let Patients Help!