The benefit of a drug cannot be measured by the law of averages

One of the topics that I have often thought about (especially in light of our seeming inability to develop zero-risk obesity drugs) is the problem of averages. Our entire medical philosophy of “evidence-based” medicine seems built on the “Gaussian” assumption that averages can reflect the true benefit (or risk) of a drug, when in real life (or medical practice) there is no such thing as the truly average patient.

Clearly, a drug that works in most cases may be entirely ineffective (or have rare but serious adverse effects) in a given patient. Similarly, a drug that is ineffective for most patients can potentially work miracles in a small set of individuals.

For those of you who like analogies, imagine wanting to treat every case of fever with penicillin. Yes, if you run your study during an epidemic of streptococcal infections, more people with fevers may respond than during other times. But even then you will need large numbers to cut through the “noise”, as many fevers will spontaneously resolve or continue unabated unto death (which is why we need a “control” group). Chances are, we may well find that treating all fevers with penicillin is not much better than placebo and we will likely nicely demonstrate that simply taking penicillin for fever has unacceptable individual risks (including deaths from anaphylactic shock). Clearly, penicillin should not be on the market given its potential for “abuse” by anyone who has a fever.

But as we take a closer look at the data we may find that while penicillin is not a great drug for everyone who comes down with a fever, there may be a subset of patients (strangely those who appear to have bacterial infections), in which penicillin does seem to sometimes work. Yes, some of these patients may also have severe anaphylactic responses, but on “average”, people with fever due to bacterial infections do seem to get better faster than people with other causes of fever.

As we look even more closely at the data it seems that even among those with bacterial infections not everyone is “average” – fever patients infected with a certain type of bacteria (interestingly those that stain positively with a certain dye) seem to respond well (albeit still with occasional anaphylactic responses), while those infected by non-staining bacteria (and even some of those that stain positive) seem entirely unresponsive.

You can see where I am going with this – as long as we treat fever as a uniform entity, our chances of finding a “cure” in a large randomized trial of patients presenting with fever are virtually zero unless we are dealing with a very common etiological cause of fever (as in a rare epidemic when most fevers in a population may just happen to be due to a penicillin-sensitive bug), or are running a massive study that allows drilling down to meaningful subgroups in post-hoc analyses (purists will likely object to this no matter the size of the study).
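The dilution described above is simple arithmetic, and a minimal sketch may make it concrete. The function name and every number here are hypothetical, chosen only for illustration: assume the drug shortens fever by three days in penicillin-sensitive infections and does nothing for any other cause of fever.

```python
def average_effect(responder_fraction, effect_in_responders, effect_in_others=0.0):
    """Mean benefit across a mixed population, as a simple weighted average."""
    return (responder_fraction * effect_in_responders
            + (1.0 - responder_fraction) * effect_in_others)


# Hypothetical figures: 3 days of benefit in sensitive infections, 0 days otherwise.
for fraction in (0.05, 0.15, 0.60):
    benefit = average_effect(fraction, 3.0)
    print(f"{fraction:.0%} sensitive -> mean benefit {benefit:.2f} days")
```

Under these assumptions the same drug looks nearly useless when few fevers are penicillin-sensitive and quite useful during an epidemic, even though its effect in the patients it actually helps never changes.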

In fact, our large randomized fever study will likely tell us that the risk/benefit of using penicillin to treat fever is entirely unacceptable (given that penicillin has the potential to kill) – clearly no regulator would ever consider allowing penicillin on the market, especially for a condition as common as fever. Imagine all the people “misusing” penicillin to treat their fevers – no benefit (on average) – huge risks (for individuals).

No doubt, a company hoping to develop penicillin as a new treatment for fever had better invest heavily in identifying the group of fever patients for whom penicillin does in fact work: patients in whom penicillin is so effective that even with the occasional death from anaphylactic shock, the “average” benefit remains indisputable. Clearly, simply taking 10,000 cases of fever off the street and treating them all with penicillin is unlikely to convince any regulator on the planet that this drug belongs on the market.

For readers who may perhaps argue that using penicillin for fever is a long stretch, I’d be happy to offer other analogies: try methotrexate for patients with malignancies, try allopurinol for patients with an inflamed joint, or try vitamin B12 injections for patients with anemia. While all of these treatments may well be highly effective in a subset of patients with these disorders, for the “average” patient with cancer, joint pain, or anemia, these treatments will harbor nothing but side effects.

So what about obesity? The notion that we can take the next best 10,000 people with excess weight off the street and treat them all with a given compound that will result in clinically meaningful weight loss with virtually no side effects is not only overly optimistic but also contrary to any current understanding of the complex nature of obesity.

From where do the companies developing these compounds get the notion that a compound that is indeed powerful enough to override one of nature’s most intricate and essential survival instincts will be both safe and effective for the “average” person who happens to find himself in a state of positive energy balance?

What is the biological rationale for hoping to find a drug that is as effective in reducing emotional (hedonic) eating as it is in overeating due to true hunger (homeostatic overeating), or perhaps overeating in social settings (as in peer pressure)? And how should this compound work in the person where clearly the problem is not overeating but undermoving (perhaps from that back injury, asthma, or lack of time)? Indeed, it would truly have to be a miracle drug if it could also override the hyperphagic response to a hypoglycemic agent or to an atypical antipsychotic drug.

If scientific rationale does not convince us that obesity is a remarkably heterogeneous condition, let us simply look at the results of our clinical trials with antiobesity drugs. Yes, the average response is modest (indeed some people even gain weight in obesity trials), but that should hardly be a surprise. The real surprise (or is this expected?) is that there is often a subset of patients (perhaps as few as 15% of the entire study population) who do remarkably well, losing not twice, but three times the amount of weight seen in the control group. Not only do these patients reap clear benefits, but strangely, they may even appear to tolerate the drug better than the rest. Are these patients “random” outliers or are these the very patients for whom this drug would truly be nothing short of a Godsend?

Regulators may well agree that such subgroups exist but would want to see data to support this. They may not care about the biological reason why these “super responders” respond so well, but would certainly want to know if there is a way that these patients can be identified (so as to reasonably limit the license to this population).

But predicting responders (as any prediction) can be a tricky business. Once we know that penicillin is only likely to control fever in people with gram-positive infections, we can certainly limit the use of penicillin to patients with evidence for such infections (or even better use actual resistance testing) – but when we have no such “rationale”, can we somehow still screen for responders?

What easier way to screen than to actually try the drug – albeit in a limited and controlled setting? If a drug is meant to produce weight loss but fails to do so, clearly it is not working and should be discontinued. Even the safest weight loss drug is unlikely to have any benefits in someone who does not lose or even continues to gain weight – in such a setting even the smallest risk will have an infinitely high risk/benefit ratio.

Fortunately, response to weight loss medications can be easily measured (on a simple office scale). All we need to ask are the following questions:

1) How long would it take to be reasonably sure that we are dealing with a “responder”?

2) What is the risk of exposing “non-responders” to this drug long enough to determine if they are indeed “non-responders”?

3) How likely will “non-responders” continue using the drug (despite not losing weight) thereby exposing themselves to unacceptable risk?

Most obesity experts will agree that the answer to the first question is probably 6-12 weeks. The answer to the second question will of course depend on the nature of the drug and its potential for serious (irreversible?) side effects with short-term treatment. The answer to the third question is, probably very few.
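For readers who prefer to see such a trial-of-therapy rule written out, here is a minimal sketch. The 12-week window comes from the range mentioned above; the function name and the 5% weight-loss threshold are assumptions made purely for illustration, not a recommendation for any particular drug.

```python
def should_continue(baseline_kg, current_kg, weeks_on_drug,
                    trial_weeks=12, min_loss_fraction=0.05):
    """Decide whether to keep a patient on a weight loss drug.

    trial_weeks and min_loss_fraction are illustrative assumptions.
    """
    if weeks_on_drug < trial_weeks:
        # Still inside the trial window: too early to call a non-responder.
        return True
    # Fraction of baseline weight lost so far (negative if weight was gained).
    loss_fraction = (baseline_kg - current_kg) / baseline_kg
    return loss_fraction >= min_loss_fraction


# Hypothetical patients, each starting at 100 kg.
print(should_continue(100.0, 99.0, weeks_on_drug=4))    # trial window still open
print(should_continue(100.0, 99.0, weeks_on_drug=12))   # little loss at 12 weeks
print(should_continue(100.0, 93.0, weeks_on_drug=12))   # clear loss at 12 weeks
```

How strict the threshold should be, and how long the window, would depend on the answer to the second question: the riskier the drug, the shorter the exposure one would want to grant a non-responder.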

Interestingly, this is exactly the way most drugs are actually used in the real world, i.e. outside of the highly artificial construct of randomized double-blind clinical trials.

In my clinical practice I routinely start patients on drugs for any number of complaints and conditions and, judging by my patient’s response (with regard to both efficacy and tolerability), I adjust the dose or discontinue the drug altogether (often only to switch to the next available agent or run additional tests to confirm my diagnosis). Never in clinical practice would I (or my patients) consider continuing patients on drugs that have no demonstrable effect or precipitate unacceptable side effects (cost alone would prove a remarkable deterrent).

Denying approval for compounds that have the potential to deliver important benefits to even a subgroup of patients, simply on the argument that the “average” patient may not benefit and would therefore face an unacceptable risk/benefit ratio, cannot be ethically justified to the patients who could well benefit from such compounds.

Obesity has high risks – killing an estimated 300,000 Americans every year. For those with medically relevant obesity the only evidence-based option today is bariatric surgery (surprisingly safe but definitely not without risk). If only a subset of obese patients (15%?) could be effectively and safely treated with existing or emerging anti-obesity compounds, is the potential for misuse by those who should not be taking these compounds enough of an ethical argument to deny this treatment to those who do benefit?

For those who choose to misuse or abuse these compounds, where is the role of personal responsibility, which we so readily call upon to justify ridiculously lax gun or gambling laws? (Inability to enforce these laws has certainly not convinced courts or legislatures of the need to reverse their decisions.)

On what legal precedents do regulators (and their advisors) base their recommendations to deny potentially safe and effective treatments to a few (for whom these treatments may well be safe and effective) in order to protect those who should clearly not be using these compounds in the first place?

If such compounds do exist, all I can say is, “restrictions, yes – denial, no”!

I firmly believe that as long as companies (and regulators) continue treating obesity as a homogeneous condition for which we can potentially find a drug that is both safe and effective for anyone with excess weight (irrespective of the cause), we will be unlikely to have safe and effective pharmacological treatments for ANY patients with obesity in the foreseeable future.

Arya M. Sharma is a Professor of Medicine at the University of Alberta who blogs at Dr. Sharma’s Obesity Notes.


  • Kevin N.

    As a psychiatrist, I immediately thought of how most of our diagnoses (e.g. “depression”) are more akin to “fever” than to “group b streptococcal infection.” We see the same disparities between our literature–that says SSRIs, for example, barely outperform placebo–and our anecdotal experience, which is that many patients don’t remit, while others respond robustly. The reasons for this are legion (including “noise” such as episodes naturally remitting on their own), but I imagine that, like “fever,” depressive disorders are syndromes with multiple possible etiologies, both exogenous and endogenous. The more I practice, the more I’m understanding the limits of EBM.

    • Ron Wolf


      What you are understanding is the limits of medicine and psychiatry. While they are very limited and while our understanding of the body and the mind is still in its infancy, without evidence, we have nothing but ego-driven quacks. There is huge variability in the effectiveness of individual practitioners. And the person least likely to fairly judge the effectiveness of a specific practitioner is that very practitioner. We all think that we are geniuses, that we know better. Well, studies that show that is not at all the case are easy to find and, IMO, quite convincing. For instance, many many Doctors prescribe off-label – a practice that on the whole is quite harmful. Yet so many Doctors think that they know best. Disappointing to see you falling into this same narcissistic trap.

  • Ron Wolf

    You cover several topics starting with the very general (the application of statistics to medicine) and moving to the very specific (drug protocols for weight management). In making that progression, you leave out an essential consideration. Not sure if you do this purposefully – to better make your argument that if a drug can help one person then it should be approved for use – or if you do it out of ignorance. I’ll assume the former and then suggest that you do not do yourself or medicine a boon by being yet another Doctor telling only half of the story. What am I babbling about you might wonder? I’m babbling about our living in a society that chronically overuses and misuses drugs. How can you ignore this? People do all sorts of things that harm themselves (including overeating, smoking, etc), why do you think they would stop taking pills that don’t have benefit? People demand antibiotics when they have a cold – and their Doctors comply, partially causing the resistance crisis. And yet this behavior continues. Indeed, the FDA (and similar) protocols MUST deny approval in these sorts of situations as Doctors and patients alike make judgment errors on a routine basis. If you somehow think that you and your practice are immune from these sorts of mistakes then you are self-deluding.

  • HJ

    Please explain how you get from the normal distribution to the Law of Averages.

  • SkeptVet

    As always when looking at the tension between scientific evidence derived from studies on groups of people and the complex, imperfectly predictable needs of individual patients, we need to keep in mind the benefits and risks of both approaches. Sure, the average response in a clinical trial is not going to give us the ability to predict perfectly the response of any single patient. On the other hand, centuries of just trying out therapies on individual patients and monitoring for apparent response led to far less effective medicine than has come from formal, scientific studies in the last century. And we are all prone to seeing benefits and harms which are not real if the controls of an RCT are not in place, which makes our clinical judgement and experience worth less than we would like it to be.

    The answer, then, is for the odds, as determined by controlled research, to inform but not absolutely dictate our clinical decision making. Playing the odds doesn’t always work, but ignoring them works no better in medicine than it works in Vegas. And as frustrating as it is for us and our patients, there is an inevitable uncertainty to decisions made concerning something as complex as health and disease, and we must constantly navigate the winding path between blind algorithm and foolish trust in our own beliefs and judgements. The degree of uncertainty, the quality of the available evidence about risks and benefits, and the urgency of acting must all be balanced in making the best possible clinical decisions. Evidence-based medicine is simply remembering that our decisions are better if the information behind them is the most reliable we can get, and this is seldom the subjective impression we have based on the cases we’ve seen individually.

  • IVF-MD

    Nowhere is the fallacy of always thinking in averages demonstrated better than in infertility where a 50% success rate does not result in everybody having half a baby. In some specific scenarios, the odds are not that great and there is only a 10% chance of success. But if you get ten couples and tell them that if they all participate, that one couple can expect to end up with a healthy baby, then it becomes more meaningful to them than giving them a statistic like 10%.

  • Never Again

    I am not in favor of using averages at all. Midazolam/Versed caused me enormous distress and later PTSD and anxiety disorders. Because the PRACTITIONERS like it so much, they deem it perfectly safe and claim that everybody just loves it, hence, nearly everybody gets it. It is hard to separate what the practitioners desire and what’s best for the patient. So for patients it appears that at least 50% dislike or intensely hate the drug Midazolam. It looks like 90% or so of the injectors of this drug LOVE it. We won’t have any studies about what this horrible drug does to brain function because it is so popular with PRACTITIONERS. FDA approved it, it is being used off label and in amounts not prescribed by the manufacturer and it is harming people…

  • Sara Stein MD

    Medicine has transferred the magic bullet mindset to bariatric surgery for obesity. Yet, 20-30% of surgery patients will have failed results (failure to lose weight, weight regain, development of transfer addictions such as alcoholism). Others will lose some weight and diabetes, but remain obese and deal with remaining serious endstage obesity diseases for life.

    The mindset of restrictive diet and exercise also fails in the majority of patients, yet it continues to be the most prescribed and touted cure for obesity, despite terrible statistics in practice.

    This magic bullet approach leads to indiscriminate prescribing of obesity medications that may have serious side effects (like surgery doesn’t!), and then utter dismissal of the treatment because it didn’t provide the highly promised Madison Avenue cure.

    Dr. Sharma is absolutely right. Obesity is a complex medical, psychosocial and spiritual condition and there is no magic bullet that cures it. There are many tools including medication and surgery and therapy that can help, however. We should be trying to expand our toolkits, and our provider education on toolkit selection for obesity, rather than searching for the obesity holy grail.

  • Tripp Wingate

    “We should be trying to expand our toolkits, and our provider education on toolkit selection for obesity, rather than searching for the obesity holy grail.”

    Sara, I am working on that (“expanding our toolkits”). The challenge is how do we develop an office based program that provides medical oversight of intensive lifestyle change that motivates patients to learn and live dietary and physical activity habits that are consistent with the evidence based US guidelines. There are multiple research protocols that have been shown successful, i.e. DPP, but none as of yet that has been rendered practical, affordable, reimbursable and most importantly efficient for office use. We have developed such a program in our Community Health Clinic that we will make available nationally once pilot testing is complete in late December. It will revolutionize how PC is done with chronic medical conditions, reinvigorate the patient physician relationship, reduce the need for medication and thus provide low cost high quality results. Will share this approach with the rest of the medical community when “the soup” is ready.

    • Sara Stein MD

      Tripp, I would be interested in learning more about your program – I work at a CHC in Cleveland – if you can share information –
      thanks, Sara
