Dealing with heterogeneity of treatment effects: is the literature up to the challenge?



  • Trials (BioMed Central). Research, Open Access.

    Nicole B Gabler*1, Naihua Duan2, Diana Liao3, Joann G Elmore4,5, Theodore G Ganiats6 and Richard L Kravitz1,7

    Address: 1Center for Healthcare Policy and Research, University of California, Davis, California, USA, 2Columbia University, New York State Psychiatric Institute, New York, USA, 3University of California, Los Angeles Neuropsychiatric Institute, California, USA, 4University of Washington School of Medicine, Seattle, Washington, USA, 5Department of Medicine, Harborview Medical Center, Seattle, Washington, USA, 6Department of Family and Preventive Medicine, University of California, San Diego, California, USA and 7Department of Internal Medicine, University of California, Davis, California, USA


    * Corresponding author

    Abstract

    Background: Some patients will experience more or less benefit from treatment than the averages reported from clinical trials; such variation in therapeutic outcome is termed heterogeneity of treatment effects (HTE). Identifying HTE is necessary to individualize treatment. The degree to which heterogeneity is sought and analyzed correctly in the general medical literature is unknown. We undertook this literature sample to track the use of HTE analyses over time, examine the appropriateness of the statistical methods used, and explore the predictors of such analyses.

    Methods: Articles were selected through a probability sample of randomized controlled trials (RCTs) published in Annals of Internal Medicine, BMJ, JAMA, The Lancet, and NEJM during odd-numbered months of 1994, 1999, and 2004. RCTs were independently reviewed and coded by two abstractors, with adjudication by a third. Studies were classified as reporting: (1) HTE analysis, utilizing a formal test for heterogeneity or treatment-by-covariate interaction; (2) subgroup analysis only, involving no formal test for heterogeneity or interaction; or (3) neither. Chi-square tests and multiple logistic regression were used to identify variables associated with HTE reporting.

    Results: 319 studies were included. Ninety-two (29%) reported HTE analysis; another 88 (28%) reported subgroup analysis only, without examining HTE formally. Major covariates examined included individual risk factors associated with prognosis, responsiveness to treatment, or vulnerability to adverse effects of treatment (56%); gender (30%); age (29%); study site or center (29%); and race/ethnicity (7%). Journal of publication and sample size were significant independent predictors of HTE analysis (p < 0.05 and p < 0.001, respectively).

    Conclusion: HTE is frequently ignored or incorrectly analyzed. An iterative process of exploratory analysis followed by confirmatory HTE analysis will generate the data needed to facilitate an individualized approach to evidence-based medicine.

    Published: 19 June 2009

    Trials 2009, 10:43 doi:10.1186/1745-6215-10-43

    Received: 3 October 2008
    Accepted: 19 June 2009

    This article is available from:

    © 2009 Gabler et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


    Background

    Randomized controlled trials (RCTs) are the cornerstone of evidence-based medicine. Such trials rely on random assignment to alternative treatment groups to control for baseline patient factors that could affect outcomes. The resulting estimate of the average treatment effect is an average of the individual treatment effects (ITEs) for participants in the study. While estimates of the average treatment effect are generally useful, some treated individuals, both within and outside of clinical trials, will experience more or less benefit than the reported average. Such variation in treatment effect is termed heterogeneity of treatment effects (HTE) [1,2].
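    The relationship between the average treatment effect and the ITEs can be made concrete with a minimal sketch. The numbers below are hypothetical and are not taken from any trial; note also that ITEs are generally not directly observable in a parallel group trial, so this only illustrates the definition.

```python
from statistics import mean, pstdev

# Hypothetical individual treatment effects (ITEs) for five participants,
# in arbitrary outcome units. Positive values mean benefit, negative harm.
# These values are illustrative assumptions, not data from the article.
ites = [4.0, 3.0, 0.0, -1.0, 4.0]

# The average treatment effect is the mean of the ITEs.
average_treatment_effect = mean(ites)

# The spread of the ITEs around that mean is one way to picture HTE.
ite_spread = pstdev(ites)

print(f"average treatment effect: {average_treatment_effect}")
print(f"spread of ITEs (population SD): {ite_spread:.2f}")
```

    Here the average effect is positive even though one hypothetical participant is harmed and another gains nothing, which is the situation the term HTE describes.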

    HTE may be quantitative (subgroup effects in the same direction as the average effect but varying in magnitude) or qualitative (treatment effects in different directions in different subgroups, where treatment is beneficial in some subgroups and harmful in others). The prevalence of HTE is unknown and perhaps unknowable, but highly variable treatment response rates for many common conditions suggest it is substantial [3,4]. For example, Allen Roses, the vice president of genetics at GlaxoSmithKline, has stated, "Our drugs don't work on most patients" [5]. Several empirical demonstrations of HTE have recently been published, including studies of ischemic stroke [6], risk reduction by carotid endarterectomy [7,8], and diabetes [9]. While qualitative HTE may be uncommon, quantitative HTE should not be dismissed, because even modest variations in the magnitude of net treatment benefits may have important implications for patient care and cost-effectiveness.
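    The distinction between quantitative and qualitative HTE can be sketched as a simple rule on subgroup effect estimates. The function name `classify_hte` and the input values are assumptions made for illustration. The rule looks at point estimates only and ignores sampling error, so it restates the definitions and is not a method for detecting HTE.

```python
def classify_hte(subgroup_effects):
    """Label a list of subgroup effect estimates by the definitions above.

    Hypothetical helper: compares point estimates only and performs
    no statistical test, so it cannot establish that HTE is real.
    """
    has_benefit = any(effect > 0 for effect in subgroup_effects)
    has_harm = any(effect < 0 for effect in subgroup_effects)
    if has_benefit and has_harm:
        # Effects point in different directions across subgroups.
        return "qualitative"
    if len(set(subgroup_effects)) > 1:
        # Same direction, but the magnitude differs.
        return "quantitative"
    return "none"


# Illustrative, made-up effect estimates for two subgroups.
print(classify_hte([2.0, 5.0]))   # same direction, different size
print(classify_hte([3.0, -1.0]))  # opposite directions
```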

    HTE can be assessed in several ways. The most direct approach is the n-of-1 clinical trial, which assigns individual patients to receive alternative treatment in a randomly predetermined sequence [10,11]. Results from a series of such trials can be aggregated to assess heterogeneity in the population. However, n-of-1 trials are applicable to a relatively small subset of conditions and treatments [10,11] and are subject to random within-patient variability (thus requiring a careful design and repeated crossovers) [12,13]. A second approach is to stratify patients according to risk of disease-related adverse events [14,15]. A third approach, typically performed for purposes of hypothesis generation rather than testing, entails a careful examination of subgroups within RCTs.

    Subgroup analysis can be perilous. Real effects can be missed because of inadequate statistical power [16,17], and reported effects may be spurious because of the performance of multiple statistical tests [13-16] and/or due to random intra-individual variability [12,13]. Random intra-individual variability is especially problematic because it is not possible to estimate this variability in parallel group trials, the most common type of clinical trial design. In parallel group trials, participants are only randomized to one treatment and do not cross over to alternative treatments. As such, it is not possible to estimate any variation that occurs within a participant. In recognition of the drawbacks of subgroup analysis, the Consolidated Standards of Reporting Trials (CONSORT) statement warns that subgroup analyses, especially post hoc subgroup comparisons, "do not have great credibility" [18].

    On the other hand, it has been claimed that nearly everything we have learned from epidemiology resulted from subgroup analysis [19]. While this conclusion applies most obviously to observational studies, careful scrutiny of subgroup-specific effects in randomized trials has generated important new hypotheses and sometimes directly influenced practice. Nevertheless, subgroup analyses are not always performed correctly. Some studies (e.g. [20-23]) report results by subgroups, without any statistical testing or interval estimation for the difference across subgroups; these studies do not provide quantitative information on HTE per se. Other studies report p-values corresponding to each subgroup, subsequently claiming that the treatment effect differs across subgroups because it is statistically significant in one subgroup and not in another [24]. However, both treatment effect and sample size influence the p-value, such that similar effect sizes within each subgroup might generate markedly different p-values. Instead of comparing the p-values across subgroups, the appropriate way to identify significant HTE is to make statistical comparisons for treatment effects across subgroups, using a test for heterogeneity or interaction [7,17,18,25-27].

    The tension between needing to understand HTE and lacking the statistical power to properly examine it presents difficulties for researchers, clinicians, and patients. While some experts have offered general encouragement to perform more HTE analyses [28,29], the literature is relatively silent on how to manage the risks of over- and under-testing. Kraemer et al. [25,30,31] have suggested a sequential approach that could shed light on possible HTE. Defining treatment modifiers as factors that influence the treatment effect size across subgroups, they propose that all RCTs use exploratory interaction analyses as a method to generate hypotheses regarding moderators of treatment effects. The presence of strong moderator effects would encourage future researchers to perform adequately powered confirmatory studies stratified prospectively on these moderators. While the proposal of Kraemer et al. makes sense, several small reviews, most published well before the revised CONSORT statement, suggest that testing for HTE is reported in only 25% to 50% of RCTs [18,27,32-35].
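    The earlier caution against comparing subgroup p-values can be illustrated with a minimal sketch. It assumes two independent subgroups whose effect estimates are approximately normal with known standard errors; the function names and all numbers are hypothetical, and the z-test on the difference is only one simple form of interaction test, not necessarily the method used in any study cited here.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def subgroup_p(effect, se):
    """P-value for the treatment effect within a single subgroup."""
    return two_sided_p(effect / se)


def interaction_p(effect_a, se_a, effect_b, se_b):
    """P-value for the difference in treatment effect between two
    independent subgroups (a simple z-test for interaction)."""
    difference = effect_a - effect_b
    se_difference = math.sqrt(se_a ** 2 + se_b ** 2)
    return two_sided_p(difference / se_difference)


# Hypothetical estimates: the same effect (5.0) in both subgroups, but the
# smaller subgroup has a larger standard error (4.0 versus 2.0).
p_large = subgroup_p(5.0, 2.0)
p_small = subgroup_p(5.0, 4.0)
p_inter = interaction_p(5.0, 2.0, 5.0, 4.0)

print(f"large subgroup p = {p_large:.3f}")
print(f"small subgroup p = {p_small:.3f}")
print(f"interaction p    = {p_inter:.3f}")
```

    In this made-up case one subgroup crosses the conventional 0.05 threshold and the other does not, yet the estimated effects are identical and the interaction test gives no indication of HTE. The difference in p-values reflects subgroup size alone.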

    We undertook the current review of the prevalence of HTE analyses in a comparatively larger sample of articles pub-


    Figure: Article selection for systematic review of heterogeneity of treatment effect in RCTs published in five general medical journals.

