Physicians’ reactions to ORBITA — a blinded, randomized controlled trial (RCT) from Britain with a sham arm comparing percutaneous coronary intervention (PCI) to placebo in patients with stable angina — are as fascinating as the cardiac cycle. There were murmurs, kicks and pulsating jugulars. Though many claimed to be surprised and many unsurprised by the null results of the trial, the responses were predictably predictable. Some basked in playful schadenfreude, and some became defensive and bisferious.
No shame in sham
The coverage of the trial in The New York Times was predictably jejune and hyperbolic. Predictably, the most nuanced and divergent viewpoints were curated by Larry Husten. Predictably, medical Twitter was set alight. The trial vindicated Vinay Prasad and Adam Cifu, who predicted in their popular book, “Ending Medical Reversal,” that PCI for stable angina would get placeboed. Prasad and Cifu are tireless advocates for using sham-controlled trials to judge the true efficacy of procedures, such as PCI in relieving symptoms, and reject the notion that invasive placebos are unethical. There’s no shame in sham, they say. They were right.
The Objective Randomized Blinded Investigation With Optimal Medical Therapy in Stable Angina (ORBITA) is an impressive trial. It enrolled 230 patients with stable angina and single-vessel stenosis greater than 70%. The vast majority had class 2 (59%) and class 3 (39%) angina. The majority of the patients, 70%, had LAD lesions. If you look in the appendix, which has pictures of catheter angiograms of all patients, you’ll see scary tight proximal LAD stenosis. Yes, even these patients had a 50% chance of getting sham. This takes balls. The trialists deserve applause, as do the Brits who volunteered. These were no snowflakes.
Unsurprisingly, the first sham control trial of PCI for stable angina was not from the U.S. Some will blame the incentive structure which favors protectionism in the U.S. But consider other explanations — Brits have become more frontier-spirited than the Americans, and American institutional review boards and regulators are more risk-averse than their European counterparts. Seven out of ninety-five patients in the sham arm faced major complications – this was no ethical joke.
The trial, designed to reduce panic in the participants, had a follow-up period of six weeks: short, though long enough for symptoms to improve. After a run-in of six weeks, in which the myocardium was pharmacologically armed to fight ischemia, patients were randomized to a drug-eluting stent (100), or a sham procedure in which a stent was all but deployed (95). The blinding was excellent. As an example, post procedure all patients got dual antiplatelet therapy (DAPT) for a month. Patients in the placebo arm could have received fake DAPT and would never have known, but the trialists, likely anticipating criticism from RCT purists that the placebo arm had an unfair advantage, kept the arms identical in every respect, except the deployment of a stent.
Endpoints
The primary endpoint was the change in exercise time between baseline (measured after the run-in) and six weeks after the procedure. The trial asked: does PCI improve exercise time over placebo after six weeks of medical therapy?
The mean increase in exercise time (ET) was 28 and 12 seconds in the PCI and placebo groups, respectively. The difference between the two groups of 16 seconds was not statistically significant: the probability of getting a difference at least this large, assuming the null hypothesis (that there is no difference between PCI and placebo) is true, was 20%, which is higher than the scriptural 5%.
Because the null results of a trial of a widely adopted procedure, which some believe is the poster child of overtreatment, are so consequential, ORBITA faces unusual, though not unreasonable, scrutiny.
The researchers used objective secondary endpoints to see if opening the blocked artery reduced ischemia, notably wall motion on dobutamine stress echo. The way dobutamine stress works is that the heart faces higher and higher doses of dobutamine, which makes the heart work harder and harder; dobutamine is like cracking the whip on trireme oarsmen. The idea is that the myocardial wall supplied by a stenotic coronary artery eventually buckles under pressure, a form of confession under torture. A better choice might have been myocardial perfusion scintigraphy, which measures the perfusion reserve of the myocardium: it’s more sensitive than dobutamine stress echo for ischemia, more easily quantifiable, a sharper instrument for small changes, and would more reliably have captured the ischemic reversal from PCI.
The researchers measured the fractional flow reserve (FFR), a physiological method of quantifying the severity of coronary stenosis in which pressures on either side of a stenosis are measured. The more the pressure drops across a stenosis, the more significant the stenosis. The mean FFR increased from 0.69 to 0.9 in the PCI group, indicating procedural success, though not necessarily a treatment effect. The standard deviation fell from 0.16 to 0.06: the technical success was uniform.
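For readers who like to see the arithmetic, FFR is conventionally the ratio of the mean pressure distal to the stenosis to the mean aortic pressure, measured under maximal hyperemia. The sketch below is illustrative only: the function name and the pressure values are made up, chosen so that the ratios match the mean FFRs quoted above, and the 0.80 cut-off is the commonly used threshold rather than anything taken from the trial protocol.

```python
def fractional_flow_reserve(distal_mmhg, aortic_mmhg):
    """Ratio of the pressure beyond the stenosis to the pressure before it."""
    if aortic_mmhg <= 0:
        raise ValueError("aortic pressure must be positive")
    return distal_mmhg / aortic_mmhg


# Hypothetical pressures, picked only to reproduce the quoted mean FFRs.
before_pci = fractional_flow_reserve(distal_mmhg=69, aortic_mmhg=100)
after_pci = fractional_flow_reserve(distal_mmhg=90, aortic_mmhg=100)

for label, ffr in (("before PCI", before_pci), ("after PCI", after_pci)):
    # 0.80 is an assumed, commonly used cut-off for a flow-limiting lesion.
    verdict = "flow-limiting" if ffr <= 0.80 else "not flow-limiting"
    print(f"{label}: FFR {ffr:.2f} ({verdict})")
```

A ratio of 1.0 means no pressure is lost across the lesion; the further below 1.0, the bigger the drop.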
Power: Modest expectations with great hopes
The main criticism was that the trial was underpowered. The power of a test is the probability of detecting that PCI improves symptoms beyond placebo when it truly does; traditionally set at 80%, it determines the sample size. To say a study is underpowered is to say that the sample size is insufficient, though power is meaningless without specifying the effect size.
The effect size is an assumption made by trialists. As a general rule, the larger the effect size, the smaller the sample size required to prove the effect. This is obvious. What may be less obvious is that researchers have an incentive to overestimate the effect so that the trial need recruit fewer patients, costs less and can be completed sooner. As I have quipped before, the principal aim of principal investigators of RCTs isn’t seeking the truth, but accruing patients to meet the sample size so they can get on with publishing their paper.
However, the ORBITA investigators didn’t overestimate the effect of PCI. The literature suggests improvement in exercise times as high as 96 seconds for PCI over medical therapy and 45 seconds for medical therapy over placebo. The researchers assumed an effect size of 30 seconds. They could have flattered PCI with a legitimately larger estimate, which would have reduced their sample size. Instead, they had lower expectations of PCI, allowing PCI to prove itself more easily. Further, the researchers assumed a between-patient standard deviation of 75 seconds, meaning that they were expecting wild success stories.
The sample size was 200 patients, which doesn’t sound like many, but the famed ACME trial, which showed that angioplasty works in stable angina, had a sample size of 212.
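The arithmetic behind that sample size can be sketched with the textbook normal-approximation formula for comparing two means. This is a back-of-the-envelope check, not the investigators’ actual calculation; the function name is mine, and I am assuming a two-sided 5% significance level and 80% power alongside the 30-second effect and 75-second standard deviation quoted above.

```python
from math import ceil
from statistics import NormalDist


def patients_per_arm(effect_s, sd_s, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a difference in means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) * sd_s / effect_s) ** 2)


# Assumed inputs: the trial's 30 s effect, then the two literature figures.
n_modest = patients_per_arm(effect_s=30, sd_s=75)
n_medium = patients_per_arm(effect_s=45, sd_s=75)
n_rosy = patients_per_arm(effect_s=96, sd_s=75)

print(f"30 s effect: {n_modest} per arm")
print(f"45 s effect: {n_medium} per arm")
print(f"96 s effect: {n_rosy} per arm")
```

Under these assumptions the formula asks for roughly 100 patients per arm, which squares with a trial of 200. Plug in the rosier 96 seconds and about ten patients per arm would do, which is exactly the incentive to overestimate.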
Quagmire of frequentist statistics
However, something very interesting happened which gets to the belly of the trial and the Achilles’ heel of the measuring instrument in stable angina. At baseline, the mean ET was 38 seconds higher in the PCI than placebo group. Though this is chance having a laugh, it doesn’t mean that the study was underpowered.
That the between-group difference in mean ET at baseline is greater than the effect size means that we’re in the event horizon of frequentist statistics. Now look at the standard deviation of ET, which is 179 seconds and 195 seconds at baseline, and 179 seconds and 191 seconds at follow-up, for the PCI and placebo groups, respectively.
Standard deviations six times the anticipated effect size mean we’re dealing with considerable heterogeneity, or spread, in the population called “stable angina.” One consequence of heterogeneity is that two groups can differ in baseline characteristics by chance alone.
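How easily can chance produce a 38-second gap at baseline? Here is a rough sketch, assuming the baseline standard deviations and arm sizes quoted above (100 and 95) and leaning on a normal approximation that lumpy ET data may not deserve. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def chance_of_gap(gap_s, sd_a, n_a, sd_b, n_b):
    """Standard error of a difference in means, and the two-sided chance
    of a gap at least this large if randomization is the only cause."""
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z_score = gap_s / se
    p = 2 * (1 - NormalDist().cdf(abs(z_score)))
    return se, p


# Assumed inputs: PCI arm (SD 179 s, n=100), placebo arm (SD 195 s, n=95).
se_gap, p_gap = chance_of_gap(38, sd_a=179, n_a=100, sd_b=195, n_b=95)
print(f"standard error of the baseline difference: {se_gap:.1f} s")
print(f"chance of a gap of 38 s or more: {p_gap:.2f}")
```

On these numbers a 38-second gap is about one and a half standard errors from zero, which is unremarkable for a coin toss.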
Of note, and again this is chance ruining the parade, the mean angina duration is 9.5 months in the PCI group and 8.4 months in the placebo group — the difference is small, but the standard deviations are a whopping 15.7 months in the former, and 7.5 months in the latter. Basically, there’s a bit of a spread. “Stable angina” is no cultural monolith — whether it’s a melting pot or salad bowl is debatable, though I favor salad bowl.
Is the 38-second difference in baseline ET between the two groups clinically significant? The tempting answer is “yes” because the difference is greater than the effect size. I don’t think it is. For the difference to be clinically significant, we’d have to assume non-linearity in the clinical significance of ET, which I think exists, and also that the accelerating slope of that non-linearity lies between an ET of 490 and 528 seconds, the mean ETs at baseline for the placebo and PCI groups, respectively. A simpler way of saying this: mean ETs of both 8 minutes 10 seconds (placebo) and 8 minutes 48 seconds (PCI) are pretty damn good, at least according to the cardiologists I’ve spoken to.
But I do think that the heterogeneity points to something important. The large standard deviation in ET should make us cautious of generalizing the results, because not all generalizations are equal. The mean ET in the PCI group is 8 minutes 48 seconds. One standard deviation either side of the mean, which includes 68% of the sample, spans ETs between 5 minutes 49 seconds and 11 minutes 47 seconds. Two standard deviations, which include 95% of the sample, span ETs between 2 minutes 50 seconds and 14 minutes 46 seconds. Don’t quibble about the bounds. I know the variance is unequal. I know the assumption, a big assumption, is that ETs have a normal distribution. I know that there’s little to learn, prognostically, beyond 10 minutes of exercise.
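The bounds are simple arithmetic on the PCI group’s mean of 528 seconds and standard deviation of 179 seconds. A small sketch, assuming (as the text does) that ETs are roughly normal; the helper names are mine.

```python
def as_min_sec(seconds):
    """Format a whole number of seconds as minutes and seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes} min {secs:02d} s"


def sd_band(mean_s, sd_s, k):
    """Lower and upper bounds k standard deviations either side of the mean."""
    return mean_s - k * sd_s, mean_s + k * sd_s


MEAN_ET_S = 528  # 8 minutes 48 seconds, PCI group at baseline
SD_ET_S = 179    # baseline standard deviation, PCI group

one_sd = sd_band(MEAN_ET_S, SD_ET_S, 1)
two_sd = sd_band(MEAN_ET_S, SD_ET_S, 2)

for label, (low, high) in (("1 SD", one_sd), ("2 SD", two_sd)):
    print(f"{label}: {as_min_sec(low)} to {as_min_sec(high)}")
```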
But my point is this: A patient with an ET of 5 minutes 49 seconds, i.e., low exercise capacity, has more to gain from PCI, indeed just more to gain, than a patient with an ET of 8 minutes 48 seconds. And a patient with an ET of 8 minutes 10 seconds doesn’t have much more, if at all more, to gain than a patient with an ET of 8 minutes 48 seconds. This is the law of diminishing returns: the lower the baseline ET, the more there is to gain. And the point I’m making is that the difference in the mean ET at baseline between the two groups isn’t significant. What’s significant is the spread.
This is also to say that the mean baseline ETs of both the placebo and PCI groups are enviable. Indeed, many cardiologists have been unable to reconcile a mean ET of 8 minutes with the fact that 40% of patients in the trial have class 3 angina. Recall, in class 3 angina patients can’t walk up a flight of steps at normal pace without limitations. There are a couple of explanations. First, the people who volunteered for the trial, indubitably brave, mostly weren’t terribly perturbed by their symptoms, which is why they volunteered. That is, the placebo-controlled nature of the trial selected the lowest risk patients, de facto. Of note, 83/368, roughly one in five of those deemed eligible, declined participation.
The second explanation is that the correlation between class of angina and ET isn’t perfect, and many patients who know they have narrowed coronary arteries underestimate what they can actually do, a sort of negative placebo effect, though ORBITA is not uniquely afflicted by this disconnect.
There is another problem with ET. Though a continuous variable, making it sound like temperature, pressure or height, it’s a lumpy metric. This is because there are many reasons for stopping an exercise test — some honestly objective, like depression of ST segment on ECG and others variably subjective — such as development of angina or fatigue. There’s asymmetry about the mean also in lumpiness. If you can make 10 minutes on the treadmill your chances in life are pretty good. If you stop at 2 minutes 50 seconds, it could be the angina, the arthritis, the fatigue, or that you just can’t be bothered to plough on. To slightly paraphrase Tolstoy, all ETs greater than 8 minutes are alike (happy prognosis) and all ETs less than 6 minutes are unhappy in their own way.
Quid est veritas?
That there is a placebo effect in PCI for stable angina is not in doubt. The question is how much. To believe that PCI is entirely placebo is just as wrong as believing there’s no placebo. Though placebo has sent ligation of the internal mammary artery (IMA) and transmyocardial laser revascularization to the graveyard, a similar fate for PCI for angina would be unfair, not least because PCI is more bioplausible (a term that’s getting unfair flak these days) than IMA ligation. Further, PCI has proven itself to be at least as good as medical therapy in the COURAGE trial. Ironically, in the same issue of the Lancet a trial found that drug-eluting stents have better outcomes than bare metal stents in elderly patients. As both devices had a placebo component, and the placebo effects canceled each other out, it’s fair to say that stents do work in some patients.
There are two questions: Does the trial favor or penalize PCI in its design or statistical analysis? Are the protocols and patients from the trial generalizable?
The medical therapy in the run-in period certainly took the wind out of the sails of PCI, perhaps by 45 seconds. Most patients were started on three anti-anginal agents rather rapidly. Though cardiologists would better comment, this does strike me as unrealistic in practice. The level of vigilance was high and the patients spoke to the cardiologists several times a week, which is more care than even private patients in the U.K. get and is the care one expects from a concierge practice in San Francisco. Suffice it to say, there’s placebo in vigilance. And if this degree of vigilance were the standard of care, the NHS would implode.
As I alluded, the patient population, either because they’re explicitly brave or subtly low risk, may not be generalizable.
The researchers used drug-eluting stents, so it can’t be said that the best option for PCI was forgone. But 29% of patients had FFR > 0.8, a group that might not have been revascularized in real practice settings.
The statistical analysis was two-tailed — not unusual but it assumes that PCI could either be better or worse than placebo. To make it clear — the carrier of placebo is unlikely to be worse than placebo at symptom-relief. Think about it, it’s illogical to believe otherwise.
Frequentism means that ORBITA is a null test and, as the accompanying editorial implied, we should start hammering the last nail in the coffin of PCI for stable angina. A wise Bayesian might disagree. The treatment effect is 16 seconds; that is, PCI increased exercise time 16 seconds more than placebo. The 95% confidence interval is -8.9 to 42 seconds. I’m going to ignore the negative numbers, because they mean that PCI worsened the ET compared to placebo and, as I mentioned before, that’s just silly non-judgmentalism.
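One can roughly back out the p-value from the interval. The sketch below assumes the interval runs from -8.9 to 42 seconds, treats the estimate as normally distributed, and uses a hypothetical helper name; it is an approximation, not the trial’s own analysis. It also shows what a one-tailed test, which takes seriously the idea that PCI can’t be worse than placebo, would have reported.

```python
from statistics import NormalDist


def p_values_from_ci(estimate, lower, upper, level=0.95):
    """Recover the standard error from a confidence interval, then the
    one-sided and two-sided p-values against a true effect of zero."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    se = (upper - lower) / (2 * z_crit)      # interval width is 2 * z_crit * se
    z_score = estimate / se
    one_sided = 1 - z.cdf(z_score)           # tests "PCI better" only
    two_sided = 2 * (1 - z.cdf(abs(z_score)))
    return se, one_sided, two_sided


se, p_one, p_two = p_values_from_ci(estimate=16, lower=-8.9, upper=42)
print(f"standard error: {se:.1f} s")
print(f"two-sided p: {p_two:.2f}")
print(f"one-sided p: {p_one:.2f}")
```

The two-sided figure lands close to the 20% quoted earlier. The one-sided figure is about half that, which is still short of the scriptural 5%.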
Also relevant is that the Duke Treadmill score — the higher the better — improved by 1.12. This, too, was struck down by the gods of statistical significance because the p value was 0.10. Statistical significance was designed to catch BS. It’s not designed to select the right patients.
Given the choppy nature of ET as a metric and likely non-linearity in ET, it’s fair to deduce that nearly all the true effect came from the patients flung on the left side of the mean, with further to the left meaning lower ET and greater gain from PCI. The patients on the right side of the mean, that is, with ETs greater than 8 minutes, were least affected by PCI. Thus, it’s not unreasonable to deduce that in the lower quartiles of ET, an improvement of 42 seconds in ET, or even more, can be imparted by PCI.
Keep calm and carry on stenting, judiciously
ORBITA and COURAGE aren’t death knells of PCI for stable angina. But they rightly question the incontinent adoption of PCI for stable angina. However, the matter isn’t easily resolved. It’s arguable and plausible, though not provable, that because PCI was used so widely for stable angina, the interventional cardiology community grew large enough to offer rapid door-to-balloon times in acute myocardial infarctions. That is, stents in stable angina are a classic example of the less sick saving the lives of the more sick, indirectly. It’s possible, but unlikely, that my conjecture is wrong.
ORBITA was designed to succeed. The beauty of its design isn’t that it mirrors real life practice but that it internalizes the clinical uncertainties cardiologists face. The duration of the run-in and follow-up gives an idea of the duration in which medical therapy should be encouraged and actively pursued. It also suggests that in patients with an exercise time greater than eight minutes, medical therapy should be pursued with the greatest vigor — many cardiologists say, “but we already knew that.” If this knowledge is so widespread then perhaps the AHA/ACC should relegate this specific group of patients to a lower division in its guidelines for PCI for stable angina.
Here’s a suggested algorithm for stable angina: cardiac CT to exclude left main disease, then optimal medical therapy for a short burst. If symptoms improve and patients are happy, great. Of note: in 39/230 patients, symptoms resolved after medical therapy. If symptoms don’t improve, do an exercise test. If the patient can manage more than 8 minutes, pat them on the back, reassure them that medical therapy is the way to go, and recommend a trip to Bryce Canyon. If the patient does less than 6 minutes, send the patient to an FFR-driven interventional cardiologist.
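The algorithm above can be written down as a sketch. This illustrates the logic in the text and is not clinical guidance; the function and argument names are hypothetical, and the text doesn’t say what to do when left main disease is found or when the patient lands between 6 and 8 minutes, so the code flags those cases for judgment rather than guessing.

```python
from typing import Optional


def suggest_next_step(left_main_disease: bool,
                      improved_on_medical_therapy: bool,
                      exercise_time_s: Optional[float] = None) -> str:
    """Walk the suggested steps in order and return the next one."""
    if left_main_disease:
        # Cardiac CT is there to exclude this; the text goes no further.
        return "left main disease: outside this algorithm"
    if improved_on_medical_therapy:
        return "continue medical therapy"
    if exercise_time_s is None:
        return "do an exercise test"
    if exercise_time_s > 8 * 60:
        return "reassure and continue medical therapy"
    if exercise_time_s < 6 * 60:
        return "refer to an FFR-driven interventional cardiologist"
    # Assumption: the text leaves 6 to 8 minutes unaddressed.
    return "grey zone (6 to 8 minutes): clinical judgment"


print(suggest_next_step(False, False, exercise_time_s=528))
print(suggest_next_step(False, False, exercise_time_s=300))
```

The grey zone is where the judgment discussed below earns its keep.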
Crucially, patients determine the intractability of their symptoms, and physicians determine the tractability of treatment. The dialectic between them is shared decision making. Still, judgment is paramount, because it’s easy for a patient to be free from angina simply by restricting physical activity, which would be self-defeating.
There’s a school of thought which says that unless patients know the true treatment effect of PCI they can’t make an informed decision. This premise gets you into endless rabbit holes. The obvious retort is that the aggregate doesn’t apply to any one patient and judgment is necessary, which leads to an intractable argument about burden of proof, philosophy of science, and epistemology. I’m partial to facts, but quantitative truths have diminishing returns.
There are two points to make here: First, it’s important to clarify the harms of PCI — the Lancet editorial quoted mortality of 0.65%. It also quoted rates of myocardial infarction and renal injury after PCI of 15% and 13%, respectively, which strike me as being more reflective of overdiagnosis of these conditions. Many interventional cardiologists struck back at these numbers saying the mortality from PCI is as low as 0.23%. It’s just as important to not exaggerate the harms of PCI as it is to not exaggerate the benefits of PCI. Perhaps the interventional cardiologists can write a joint piece with the Lancet editorialists about the true harms of elective PCI in stable angina.
The second point is that I doubt many patients ask about the true treatment effect, in quantitative terms, I mean. What does an improvement in ET of 42 seconds even mean? It means jack shit to me, and, allegedly, I have a little knowledge about this topic. Patients want answers to two questions. WILL PCI make me feel better? CAN PCI make me feel better? The answer to the second question is, no doubt, yes.
How one answers the first question is the crux of the debate. Hope is a placebo and is most in need when other treatment options have failed. An interventional cardiologist who doesn’t believe in stenting can’t offer hope. The best doctors may be the ones with the highest placebo.
Acknowledgments
Insights for this piece were derived from several Tweeps, including Anish Koka, Vinay Prasad, Anupam Singh, David Brown, John Mandrola, Robert Yeh, Frank Harrell and Eric Topol. My opinions, as always, are my own.
Saurabh Jha is a radiologist and can be reached on Twitter @RogueRad. This article originally appeared in the Health Care Blog.