Poor quality evidence, lack of affordability and uninformed patients suggest that an awful lot of research doesn’t actually matter. To make better decisions when presented with a piece of evidence, however, I use three questions to identify and weed out most research that doesn’t matter: 1) does this research apply to my patient; 2) is the research of sufficient length to inform the outcome given the clinical course of the disease; and 3) will this evidence make a difference to my patient’s outcome?
1. Does this research apply to my patient?
External validity is the extent to which we can generalize the results of a trial to the population of interest, whereas internal validity refers to the extent to which a study properly measures what it is meant to. The issue is that most interventions in clinical trials don’t apply to real-world patients and so have poor external validity.
An analysis of 20,000 Medicare patients with a principal diagnosis of heart failure reported that only 13–25% met the criteria for 3 of the pivotal RCTs. A further review of 52 studies, which compared baseline characteristics of RCT patients with real-world patients, found that many trials are highly selective and have lower risk profile patients than those seen in the real world: 37 (71%) of the studies concluded the populations were not representative. The patients we are often most interested in applying evidence to – the elderly and those with comorbidities – were most often excluded. In only 15 (29%) studies were the RCT samples generally representative of real-world populations. Furthermore, amongst 155 RCTs of drugs frequently used by elderly patients with chronic medical conditions, only three studies exclusively included elderly patients. Similar problems have also been observed in cancer trials; there have been recent calls to expand the inclusion criteria, not least to increase the number of participants and improve the generalisability.
A systematic review of the eligibility criteria of 283 RCTs published between 1994 and 2006 in high impact general medical journals reported that common medical conditions led to exclusions in 81% of trials and commonly prescribed medications in 54%. Similar problems have also been seen in Alzheimer’s trials: information on comorbidities and drugs is often lacking and, as a consequence, there is a significant difference between trial participants and the real-world populations with Alzheimer’s. Some of the blame is due to reporting bias: one cause of poor quality evidence, which is easily rectified.
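To make the first question concrete, here is a minimal Python sketch of the kind of check the reviews above perform: apply a trial's exclusion criteria to a real-world cohort and see what share would have qualified. Everything in it is an assumption for illustration. The `Patient` fields, the age and comorbidity thresholds, and the four patients are invented and are not taken from any of the studies cited.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: int
    comorbidities: int       # count of coexisting chronic conditions
    on_excluded_drug: bool   # takes a medication the trial excluded


def is_eligible(p: Patient, max_age: int = 75, max_comorbidities: int = 1) -> bool:
    """Apply hypothetical trial exclusion criteria to one patient."""
    return (
        p.age <= max_age
        and p.comorbidities <= max_comorbidities
        and not p.on_excluded_drug
    )


def eligible_fraction(patients: list[Patient]) -> float:
    """Share of a real-world cohort that would have qualified for the trial."""
    if not patients:
        raise ValueError("cohort is empty")
    return sum(is_eligible(p) for p in patients) / len(patients)


# Invented illustrative cohort, not real data.
cohort = [
    Patient(age=82, comorbidities=3, on_excluded_drug=True),
    Patient(age=68, comorbidities=0, on_excluded_drug=False),
    Patient(age=79, comorbidities=2, on_excluded_drug=False),
    Patient(age=71, comorbidities=1, on_excluded_drug=True),
]
print(f"{eligible_fraction(cohort):.0%} of this cohort would have been eligible")
```

If the patient in front of you looks like the ones this filter throws out, that is a reason to be cautious about applying the trial result to them.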
2. Is the research of sufficient length to inform the outcome given the clinical course of the disease?
There are two problems when it comes to trial length and informing outcomes: 1) trials that are stopped too early, and 2) trials of insufficient length that often use surrogate outcomes and therefore do not reflect the outcomes of interest for the real course of the disease.
Trials stopped early, on average, will overestimate treatment effects. These overestimates are larger in smaller trials. A review of 143 trials stopped early (STOP-IT 1) found they are on the increase (0.5% in 1990-1994 to 1.2% in 2000-2004; P<.001 for trend); they recruit on average 63% of the planned sample; often they do not report important trial features; and they report larger treatment effects – particularly when the number of events is small. A further comparison of RCTs stopped early (STOP-IT 2) with those that weren’t – in the same meta-analysis – found that the truncated RCTs also have greater effect sizes.
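Why stopping early inflates effects can be seen in a small simulation. The sketch below is my own toy model, not the STOP-IT method: every simulated trial has the same true effect, takes one interim look, and stops if the interim result crosses a benefit boundary. The true effect (0.2 SD), the sample sizes and the boundary (z > 2.5) are all assumed values chosen for illustration.

```python
import random
import statistics

TRUE_EFFECT = 0.2   # assumed true difference in means, in SD units
PLANNED_N = 200     # planned patients per arm
INTERIM_N = 50      # patients per arm at the single interim look
Z_STOP = 2.5        # assumed stopping boundary for benefit


def run_trial(rng: random.Random) -> tuple[bool, float]:
    """Simulate one two-arm trial; return (stopped_early, estimated_effect)."""
    treated = [rng.gauss(TRUE_EFFECT, 1.0) for _ in range(PLANNED_N)]
    control = [rng.gauss(0.0, 1.0) for _ in range(PLANNED_N)]

    # Interim analysis on the first INTERIM_N patients in each arm.
    interim = statistics.fmean(treated[:INTERIM_N]) - statistics.fmean(control[:INTERIM_N])
    standard_error = (2 / INTERIM_N) ** 0.5  # outcome variance is 1 in both arms
    if interim / standard_error > Z_STOP:
        return True, interim  # trial stops and reports the interim estimate

    # Otherwise the trial runs to its planned size.
    final = statistics.fmean(treated) - statistics.fmean(control)
    return False, final


def simulate(n_trials: int = 1000, seed: int = 1) -> tuple[list[float], list[float]]:
    """Return the effect estimates of stopped-early and completed trials."""
    rng = random.Random(seed)
    early: list[float] = []
    completed: list[float] = []
    for _ in range(n_trials):
        stopped, estimate = run_trial(rng)
        (early if stopped else completed).append(estimate)
    return early, completed


early, completed = simulate()
print(f"true effect: {TRUE_EFFECT:.2f}")
if early:
    print(f"mean estimate, {len(early)} trials stopped early: {statistics.fmean(early):.2f}")
print(f"mean estimate, {len(completed)} completed trials: {statistics.fmean(completed):.2f}")
```

In this setup a trial can only cross the boundary at the interim look if its observed effect is above 0.5, so every stopped trial reports more than double the true effect. Nothing about the treatment differs between trials; the inflation comes purely from selecting on a lucky early result.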
The excellent ‘Absolutely Maybe Blog,’ by Hilda Bastian, on ‘The Mess That Trials Stopped Early Can Leave Behind’ reports on a leukemia trial testing courses of treatment that was analysed every year. Figure 1 shows the annual analyses: whilst early results suggested significant benefits, these subsequently disappeared over time. The results of RCTs stopped early – particularly those with small sample sizes and a small number of events – should be viewed with a healthy dose of skepticism.
Figure 1. Attempts to optimize induction and consolidation treatment in acute myeloid leukemia: results of the MRC AML12 trial.
There is a growing body of trials that simply do not reflect the course of the disease, and as a consequence insufficient evidence exists around many current interventions to determine if they are effective. As an example, a 2009 Cochrane systematic review of antidepressants versus placebo for depression in primary care found 14 studies (10 examined TCAs, 2 SSRIs and 2 included both, all compared with placebo) including 1364 participants in the intervention group and 919 in the placebo group. Nearly all studies were of short duration, typically 6-8 weeks; there was no dose information on SSRIs; and the authors were unable to comment on the appropriate duration of treatment. Given the paucity of evidence you would think there would have been an increase in the number of longer-term trials. A 2015 updated systematic review including a total of 66 studies found most trials were poor quality ‘because there was a small number of studies with observation periods of longer than 12 weeks, reliable comparative analysis of long-term effects was not possible….and the effects size compared with placebo is frequently considered rather small.’
3. Will this evidence make a difference to my patient’s outcome?
When using evidence to inform patient care, what we mostly do is assess statistical significance. Only then do we consider the issue of clinical significance. But if, before looking at the results, we asked what effect size we would consider important enough to implement this treatment, we might discard a significant amount of evidence irrespective of its statistical significance.
This is sometimes referred to as the minimally clinically important difference (MCID), which is the smallest difference you would be willing to accept. In some areas there have been calls to develop a catalogue of MCIDs. As an example, in alcohol behavioural interventions there is a need to rethink relevant outcomes and the evidence that might contribute to recommendations for MCIDs. However, MCID measures may be too conservative as they reflect minimal values. In an analysis of 8931 rheumatoid patients < 65 years of age, improvement consistent with a “really important difference” (RID) was reported by patients to be 3 to 4 times greater than the MCID.
There are a number of available methods to develop MCIDs and, although no one method is better than another – and there are shortcomings – they are still useful and certainly better than nothing. A COPD symposium assessing MCID stated, ‘clinical opinion and patient subjective response should trump statistical theory,’ which fits with the definition and ethos of EBM and may help you when next using evidence to make a better decision.
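As a rough illustration of asking the MCID question before the significance question, the Python sketch below compares a 95% confidence interval for a treatment benefit against a threshold fixed in advance. It is one simple way to frame the comparison, not an established standard: the function name `interpret`, the category labels, and the assumption that the effect is measured so that higher means better are all mine.

```python
def interpret(ci_low: float, ci_high: float, mcid: float) -> str:
    """Classify a benefit estimate (higher = better) against a pre-specified MCID.

    ci_low and ci_high are the bounds of the confidence interval for the
    treatment effect; mcid is the smallest benefit worth acting on.
    """
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if mcid <= 0:
        raise ValueError("mcid must be positive")

    if ci_low >= mcid:
        # The whole interval lies at or above the threshold.
        return "statistically significant and clinically important"
    if ci_high < 0:
        # The whole interval lies below zero on a benefit scale.
        return "statistically significant harm"
    if ci_high < mcid:
        # The whole interval lies below the threshold.
        if ci_low > 0:
            return "statistically significant but smaller than the MCID"
        return "not significant and smaller than the MCID"
    # The interval contains the threshold.
    return "inconclusive: the interval includes the MCID"


# Hypothetical numbers: a benefit of 0.5 to 1.5 points when 2 points is the MCID.
print(interpret(0.5, 1.5, mcid=2.0))
```

The example prints "statistically significant but smaller than the MCID": a result that would pass a P value test yet would not change what I do for my patient.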
The next post in this series will look at why better decisions require fewer conflicts of interest.
Carl Heneghan is professor of EBM at the University of Oxford and director of CEBM, which co-organizes the EvidenceLive conference with The BMJ.
His research interests span chronic diseases, diagnostics, use of new technologies, improving the quality of research and investigative work with The BMJ on drugs and devices that you might stumble across in the media.
I declare that I have read and understood BMJ Policy on declaration of interests and I hereby declare the following interests: CEBM jointly runs the EvidenceLive conference with The BMJ and is one of the founders of the AllTrials campaign. He has received expenses and payments for his media work, has received expenses from the World Health Organization (WHO), and holds grant funding from the NIHR, the National School of Primary Care Research, the Wellcome Trust, and the WHO.