

STATISTICS IN MEDICINE
Statist. Med. 2002; 21:793–800 (DOI: 10.1002/sim.1098)

Comparing results of large clinical trials to those of meta-analyses

Lincoln E. Moses¹,∗, Frederick Mosteller² and John H. Buehler²

¹Division of Biostatistics, Department of Health Research and Policy, Stanford University School of Medicine, HRP Redwood Building, Stanford, CA 94305-5405, U.S.A.

²Department of Statistics, Harvard University, Cambridge, MA 02138, U.S.A.

SUMMARY

We consider methods for assessing agreement or disagreement between the results of a meta-analysis of small studies addressing a clinical question and the result of a large clinical trial (LCT) addressing the same clinical question. We recommend basing conclusions about agreement upon the difference between the two results (relative risk, log-odds ratio or similar summary statistic), in the light of the estimated standard error of that difference. To estimate the standard error of the meta-analytic result we recommend a random effects analysis, and where a between-studies variance component is found, that component of variance should be used twice: once in the estimated standard error for the meta-analytic result and again in the standard error of the LCT result (augmenting the internal standard error of that statistic). Such broadening of the standard error reduces the appearance of disagreement. We also offer a critique of a different published approach, which is based on consistency of findings of statistical significance, a matter of how the two results regard zero, which is a poor measure of how closely they agree with each other. Copyright © 2002 John Wiley & Sons, Ltd.

KEY WORDS: assessing agreement between estimates; fixed effects analysis; large clinical trial; meta-analysis; random effects analysis

INTRODUCTION

How well do the conclusions from a large clinical trial (LCT) agree with those from a well-executed meta-analysis of earlier, smaller clinical trials addressed to the same clinical question? This question was examined previously by Ioannidis et al. [1]. The question is important; if typically the agreement were sufficiently poor this would call for discounting the results of one method or the other. Typically high agreement would justify combining the results suitably, and arriving at a stronger overall conclusion.

∗Correspondence to: Lincoln E. Moses, Division of Biostatistics, Department of Health Research and Policy, Stanford University School of Medicine, HRP Redwood Building, Stanford, CA 94305-5405, U.S.A.

Received October 2000
Accepted July 2001


A RECOMMENDED APPROACH

A well-grounded, straightforward way to answer the question is at hand, at least where ‘many’ clinical trials are summarized by the meta-analysis. In that situation, the meta-analysis summary statistic, $\bar{X}$ (estimated relative risk (RR), or odds ratio (OR), or log of either), will be approximately normally distributed. The summary of many studies, $\bar{X}$, in a meta-analysis has the character of a weighted average, and thus, with sufficiently many independent contributions, must be nearly normally distributed, in the light of the central limit theorem. The corresponding statistic $X_0$ from the LCT will also be approximately normally distributed, as will then be the difference $d = X_0 - \bar{X}$. The LCT provides a summary statistic $X_0$ that is a smooth function of two binomial proportions. This suffices to ensure approximate normality of $X_0$ if there are sufficiently many observations entering into the binomial proportions. Both $\bar{X}$ and $X_0$ have associated estimated standard errors, $\mathrm{SE}(\bar{X})$ and $\mathrm{SE}(X_0)$; $\bar{X}$ and $X_0$ are statistically independent, being based on different observations. It follows that $\mathrm{SE}(d) = [\mathrm{SE}^2(\bar{X}) + \mathrm{SE}^2(X_0)]^{1/2}$, and that a 95 per cent confidence interval for $\delta$, the parameter (‘true value’) corresponding to $d = X_0 - \bar{X}$, is

$$d - 1.96\,\mathrm{SE}(d) \le \delta \le d + 1.96\,\mathrm{SE}(d) \tag{1}$$

If all the numbers in this interval are positive, we have high confidence that $\delta$ is positive, that is, that $X_0$, the LCT result, actually exceeds $\bar{X}$, the meta-analysis result, and that chance variation is not the source of the difference. If instead all values of $\delta$ in the interval (1) are negative, we have high confidence that the ‘true’ value is negative, that is, that $\bar{X}$, the meta-analysis result, systematically exceeds $X_0$, the LCT result. If instead of either of these possibilities, the left endpoint of the interval (1) is negative and the right endpoint is positive, we must allow that $\delta$ might be positive or it might be negative; it is plausible that $\bar{X}$ and $X_0$ differ only because each must carry its own ‘margin of error’, and their ‘true’ values may coincide; that is, we have no clear basis for choosing $\bar{X}$ over $X_0$ or vice versa. Indeed, $\bar{X}$ and $X_0$ may be combined in a suitably weighted average, producing a better-than-either estimate of the log(OR) (or other similar measures of clinical effect).

Computation of $\mathrm{SE}(d)$ requires careful consideration. We recommend that the meta-analysis use a ‘random effects’ model, which we regard as more realistic than a ‘fixed effects’ model. The key idea is this: if each ‘small’ trial were in fact very large we would expect each to arrive at somewhat different estimates of the clinical effect, since the various trials presumably differ in many respects, like patient referral and recruitment patterns, nutrition, aspects of diagnostic work-up, demographics etc. Why should they agree precisely in the magnitude of treatment effect? We prefer to allow for a possible component of between-study variation, in addition to within-study variation. Thus, we recommend that the meta-analysis be done with a model like (2), below:

$$X_i = \mu + u_i + e_i \tag{2}$$

in which $X_i$ is the treatment versus control value observed in the $i$th small clinical trial, and is represented as the sum of three components, none of which we can directly observe: (i) the ‘true’ value $\mu$ (which we are trying to estimate); (ii) a random disturbance $u_i$, particular to the $i$th study, embodying between-study variability and having mean zero and unknown variance $\sigma_u^2$; (iii) another random disturbance $e_i$, again with mean zero and with variance $\sigma_i^2$, which we know with serviceable accuracy, because it is simply the square of the within-study standard error of $X_i$, as reported for small trial $i$. Applying the random effects analysis to the suite of small trials results in three statistics: (i) an estimate, $\hat{\sigma}_u^2$, reflecting between-study variation; (ii) $\bar{X}$, a weighted average of the $X_i$ and an estimate of the treatment versus control overall effect; (iii) $[\mathrm{SE}(\bar{X})]^2$ or its equivalent, $\hat{\sigma}_{\bar{x}}^2$, which suitably combines $\hat{\sigma}_u^2$ and all the $\hat{\sigma}_i^2$, and which expresses the sampling variation of the estimate $\bar{X}$. Note that $\mathrm{SE}(\bar{X})$ is ordinarily larger than is the estimated standard error of $\bar{X}$ in a fixed effects analysis of the many small trials. Note also that when model (2) applies then

$$\operatorname{var}(X_i) = \sigma_u^2 + \sigma_i^2 \tag{3}$$
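A minimal sketch of a random effects analysis under model (2), written in Python and assuming the non-iterative method-of-moments estimator of DerSimonian and Laird [3]. The function and argument names are illustrative and do not come from the paper; the inputs are the per-trial estimates $X_i$ (for example log odds ratios) and their squared within-study standard errors $\hat{\sigma}_i^2$.

```python
def dersimonian_laird(effects, variances):
    """Random effects summary of k trials (method-of-moments sketch).

    effects   : per-trial estimates X_i
    variances : squared within-study standard errors of the X_i
    Returns (tau2, mean, var_random, var_fixed), where tau2 plays the
    role of the between-study variance estimate, mean is the random
    effects weighted average, var_random is its squared standard error,
    and var_fixed is the squared standard error a fixed effects
    analysis would report.
    """
    k = len(effects)
    # Fixed effects (inverse-variance) weights and weighted mean.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed_mean = sum(wi * xi for wi, xi in zip(w, effects)) / sw
    # Cochran's Q measures dispersion around the fixed effects mean.
    q = sum(wi * (xi - fixed_mean) ** 2 for wi, xi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Moment estimate, truncated at zero when no overdispersion appears.
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random effects weights add tau2 to every within-study variance,
    # in line with equation (3).
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    mean = sum(wi * xi for wi, xi in zip(w_star, effects)) / sw_star
    return tau2, mean, 1.0 / sw_star, 1.0 / sw
```

When the truncation sets the between-study estimate to zero, the random effects and fixed effects variances coincide, so their ratio is exactly 1; otherwise the ratio exceeds 1.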

Our preference for a random effects analysis is based on three considerations. First, as just remarked, it is natural to expect variation among the ‘true’ values for the various studies. Second, the random effects analysis will allow for a between-studies variance component only when the data provide evidence for its existence. Third, experience with meta-analyses often gives evidence of such variation; to ignore it is to underestimate the actual variability of the meta-analytic summary. Thus, we drew from Chalmers et al. [2] a stratified random sample of 50 meta-analyses evaluating the effects of care during pregnancy and childbirth. We randomly chose 25 meta-analyses which combined six to nine studies and 25 meta-analyses which combined 10 to 20 studies. To estimate the inter-study variation, we used a non-iterative method of moments estimator recommended by DerSimonian and Laird [3]. Our analysis found that 30 of the 50 meta-analyses called for random effects analysis because of inter-study variation. Of these 30, 15 occurred in the group combining six to nine studies, and 15 occurred in the group combining 10 to 20 studies (see Table I).

To express the discrepancy between $\bar{X}$, the meta-analysis estimate of treatment versus control effect, and $X_0$, the LCT estimate of that effect, compute $d = X_0 - \bar{X}$. The standard error for $d$ can be calculated from: (i) $\mathrm{SE}(X_0)$, the (internal) standard error reported for the LCT; (ii) the estimate $\hat{\sigma}_u^2$; (iii) $\mathrm{SE}(\bar{X})$, with the second and third coming from the random effects meta-analysis. Our reliance on model (2) tells us, because of equation (3), that for the LCT we need to take into account both its internal reported standard error and an allowance for the presence of $u_0$. This is quite obvious where the LCT has been chosen as the largest from the suite of trials producing the meta-analysis. If those other trials exhibit convincing evidence of $\sigma_u^2 > 0$, it is hard to argue that the one trial that is largest lacks such a component! Where the LCT comes later than the suite and is not part of it, the case for including an allowance for $u$ seems to retain its force, or be even stronger, because of additional possible problems with comparability of inclusion criteria.

Thus, the standard error of $d = X_0 - \bar{X}$ is calculated as

$$[\mathrm{SE}(d)]^2 = [\mathrm{SE}(X_0 - \bar{X})]^2 = [\mathrm{SE}^2(X_0) + \hat{\sigma}_u^2] + [\mathrm{SE}(\bar{X})]^2 \tag{4}$$

The first bracketed quantity, $[\mathrm{SE}^2(X_0) + \hat{\sigma}_u^2]$, contains the two sources of variation in $X_0$: the internal variance, based on binomial variation and reported for the LCT, and the additional between-study variance component estimated from the meta-analysis. The second bracketed quantity, $[\mathrm{SE}(\bar{X})]^2$, already includes the two sources of variation for $\bar{X}$, as calculated from the random effects meta-analysis.

Using the expression for $\mathrm{SE}(d)$ in equation (4) in the confidence interval (1) provides, in our opinion, a satisfactory way to answer the question ‘How well do the conclusions from a large clinical trial agree with those from a well-executed meta-analysis of earlier, smaller clinical trials addressed to the same clinical question?’

Table I. Comparison of random effects and fixed effects estimates of the appropriately weighted average in 50 randomly chosen meta-analyses, where R is the ratio of the random effects variance estimate to the fixed effects variance estimate.

Meta-analysis  Number of studies  Random effects  Fixed effects  R
1              6                  0.030           0.030          1.026
2              6                  0.049           0.008          6.411
3              6                  0.017           0.008          2.062
4              6                  0.116           0.116          1
5              6                  0.121           0.018          6.622
6              6                  0.035           0.035          1
7              6                  0.142           0.082          1.72
8              6                  0.185           0.010          19.365
9              6                  0.182           0.019          9.775
10             6                  0.445           0.445          1
11             6                  0.116           0.116          1
12             6                  0.215           0.028          7.789
13             6                  0.081           0.081          1
14             6                  0.066           0.066          1
15             7                  0.064           0.018          3.575
16             7                  0.017           0.017          1
17             7                  0.114           0.079          1.445
18             7                  0.014           0.011          1.236
19             7                  0.130           0.011          12.314
20             8                  0.227           0.227          1
21             8                  0.013           0.013          1
22             8                  0.205           0.026          8.029
23             8                  0.060           0.032          1.896
24             8                  0.107           0.052          2.061
25             9                  0.083           0.083          1
26             10                 0.037           0.008          4.735
27             10                 0.099           0.023          4.355
28             10                 0.016           0.016          1
29             10                 0.019           0.019          1
30             10                 0.064           0.062          1.035
31             10                 0.072           0.039          1.854
32             11                 0.020           0.020          1
33             11                 0.100           0.100          1
34             11                 0.022           0.013          1.671
35             11                 0.059           0.059          1
36             12                 0.066           0.048          1.352
37             12                 0.014           0.010          1.384
38             12                 0.164           0.030          5.392
39             12                 0.031           0.025          1.235
40             12                 0.031           0.015          2.034
41             12                 0.026           0.016          1.646
42             12                 0.047           0.031          1.512
43             14                 0.066           0.055          1.182
44             14                 0.020           0.020          1
45             15                 0.076           0.076          1
46             15                 0.045           0.016          2.84
47             16                 0.026           0.014          1.824
48             17                 0.013           0.013          1
49             17                 0.039           0.039          1
50             18                 0.012           0.012          1

Further, if $|X_0 - \bar{X}| < 1.96\,\mathrm{SE}(d)$, it is reasonable to combine $X_0$ and $\bar{X}$ in a weighted average:

$$\hat{\mu} = \frac{\hat{\sigma}_{\bar{x}}^2\, X_0 + (\hat{\sigma}_0^2 + \hat{\sigma}_u^2)\, \bar{X}}{\hat{\sigma}_{\bar{x}}^2 + (\hat{\sigma}_0^2 + \hat{\sigma}_u^2)}$$

$$\mathrm{SE}^2(\hat{\mu}) = \left[\frac{1}{\hat{\sigma}_{\bar{x}}^2} + \frac{1}{\hat{\sigma}_0^2 + \hat{\sigma}_u^2}\right]^{-1}$$

combining the information in both.

Cappelleri et al. [4] also recommend using a random effects analysis in the meta-analysis, both for arriving at a summary estimate and for calculating the standard error. Our treatment extends theirs by taking into account the effect of any observed inter-study variation upon the uncertainty of the LCT’s estimate of treatment effect. The LCT might have been analysed by random effects across sites. In that case the ‘borrowing’ of information from the meta-analysis could be forgone; the LCT’s standard error would already contain the needed component.
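The recommended comparison, equation (4) used in interval (1) followed by the weighted average when the two results are compatible, can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' code: the function name, argument names and the dictionary layout are hypothetical, and the LCT's internal variance (written with subscript 0 in the weighted average) is taken to be the square of its reported standard error, which the text implies but does not state outright.

```python
import math


def compare_lct_to_meta(x0, se_x0, xbar, se_xbar, tau2, z=1.96):
    """Compare an LCT result x0 with a random effects summary xbar.

    se_x0   : internal standard error reported for the LCT
    se_xbar : standard error of xbar from the random effects analysis
    tau2    : between-study variance estimate from the meta-analysis
    """
    d = x0 - xbar
    # First bracket of equation (4): LCT internal variance plus the
    # between-study component borrowed from the meta-analysis.
    var_lct = se_x0 ** 2 + tau2
    # Second bracket: already includes tau2 through the random effects fit.
    var_meta = se_xbar ** 2
    se_d = math.sqrt(var_lct + var_meta)
    result = {
        "d": d,
        "se_d": se_d,
        "ci": (d - z * se_d, d + z * se_d),  # interval (1)
        "compatible": abs(d) < z * se_d,
    }
    if result["compatible"]:
        # Inverse-variance weighted average of the two results.
        result["combined"] = (var_meta * x0 + var_lct * xbar) / (var_meta + var_lct)
        result["se_combined"] = math.sqrt(1.0 / (1.0 / var_meta + 1.0 / var_lct))
    return result
```

With made-up inputs such as `compare_lct_to_meta(-0.05, 0.05, -0.30, 0.10, tau2=0.02)` the interval covers zero, whereas the same call with `tau2=0.0` does not, which illustrates how the added variance component reduces the appearance of disagreement.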

A DIFFERENT APPROACH, NOT RECOMMENDED

The literature offers another way to appraise the agreement or disagreement of LCT with meta-analysis [5]. The idea is to regard the meta-analysis result as a prediction of the sign and statistical significance of the LCT. Posing the problem in this way entails paradoxical and unsatisfactory consequences, to be described below. The formulation we shall examine is defined thus. If the estimates of log(OR) (or log(RR)) from the meta-analysis and the LCT are of opposite sign, then they ‘disagree’; if they agree as to sign and statistical significance (both significantly different from zero or both not), then they ‘totally agree’; if they agree in sign, but one is significantly different from zero while the other is not, they ‘partially agree’.

This formulation contains some difficulties. Figure 1 shows three situations, each involving a pair of estimates (say log(OR)); one member of the pair is to be thought of as the estimated effect $X_0$ from the LCT, with its associated confidence interval, and the other member of the pair depicts the estimated effect $\bar{X}$ from the meta-analysis, with its associated confidence interval. In panel A, the two estimates are exactly equal, but the upper confidence interval lies entirely to the left of zero, meaning that $X_0$ is significantly different from zero, while the lower confidence interval, for $\bar{X}$, straddles zero, meaning that $\bar{X}$ is not significantly different from zero. Thus there is ‘partial agreement’ of these two identically equal estimates, $X_0$ and $\bar{X}$. We find this unsatisfactory. Though the intervals differ regarding the value zero, the estimates agree with each other exactly.

This panel provides another anomaly. In your mind’s eye, shift the lower confidence interval leftward until it lies clear of zero. Now there is a difference between the two estimates, but the pair has moved from being coded for ‘partial agreement’ to ‘total agreement’.

Consider panel B. Here the two estimates disagree as to sign and are coded ‘disagree’, but they resemble each other quite well; each lies well inside the other’s confidence interval, and they surely differ no more than chance easily explains.


Figure 1. Three comparisons of 95 per cent confidence intervals.

Panel C illustrates two other pathologies. First, the exact location of the right end of the lower confidence interval determines whether these ‘totally agree’. That is an undesirable feature. Second, and more seriously, there is no doubt that the difference between the two estimates is statistically significant; their confidence intervals are entirely disjoint; the two estimates are pointing at different effects and should not be said to ‘agree’. We believe similar objections apply to proposals advanced by LeLorier et al. [6], which also quantify agreement by reference to significance of the individual studies.

These examples illustrate that how closely two estimates agree with each other (and thus support, or contradict, one another) is one question, and a very different question is whether one or both is significantly different from zero.

The method we have proposed focuses only on the separation of the two estimates and takes into account their sampling variation.
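The contrast between the two approaches can be made concrete with a short Python sketch. The coding rule follows the verbal definition given in this section; the function names and the three numeric cases are hypothetical stand-ins chosen to mimic panels A, B and C, since the values plotted in Figure 1 are not available in the text. For simplicity the difference statistic uses no between-study component unless one is passed in.

```python
import math


def significance_coding(x0, se_x0, xbar, se_xbar, z=1.96):
    """Code a pair of estimates by sign and significance against zero."""
    if x0 * xbar < 0:
        return "disagree"
    lct_significant = abs(x0) > z * se_x0
    meta_significant = abs(xbar) > z * se_xbar
    if lct_significant == meta_significant:
        return "totally agree"
    return "partially agree"


def difference_z(x0, se_x0, xbar, se_xbar, tau2=0.0):
    """Standardized difference d / SE(d); tau2 is an optional
    between-study variance added to the LCT variance."""
    se_d = math.sqrt(se_x0 ** 2 + tau2 + se_xbar ** 2)
    return (x0 - xbar) / se_d


# Hypothetical (x0, SE(x0), xbar, SE(xbar)) values, one per panel.
PANELS = {
    "A": (-0.20, 0.05, -0.20, 0.15),  # equal estimates, unequal precision
    "B": (-0.05, 0.10, 0.05, 0.10),   # opposite signs, both near zero
    "C": (-1.00, 0.10, -0.30, 0.10),  # both significant, intervals disjoint
}

if __name__ == "__main__":
    for name, args in PANELS.items():
        print(name, significance_coding(*args), round(difference_z(*args), 2))
```

In these made-up cases the significance coding labels the identical pair in A as only partially agreeing and the clearly separated pair in C as totally agreeing, while the standardized difference is zero for A, small for B and large for C.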

DISCUSSION

Using the random effects analysis in the meta-analysis leads to increasing $\mathrm{SE}(X_0 - \bar{X})$ in two ways: once in the $\mathrm{SE}(\bar{X})$ for the meta-analysis and again in the $\mathrm{SE}(X_0)$. The latter effect is larger.

Enlarging $\mathrm{SE}(X_0 - \bar{X})$ reduces the prospects for $X_0$ and $\bar{X}$ to be found significantly different. It is natural to ask how great is the enlargement of $\mathrm{SE}(X_0 - \bar{X})$, growing out of using the random effects analysis.

Some light on that question emerges from our study of 50 meta-analyses, already referred to. Unfortunately our experience there relates to $\mathrm{SE}(\bar{X})$, rather than to $\mathrm{SE}(X_0)$ or $\mathrm{SE}(d)$. Of those 50, 20 experienced no increase at all, for no overdispersion appeared. In terms of the ratio R of Table I (random effects variance estimate over fixed effects variance estimate), another 15 were augmented by a factor lying between 1.0 and 2.0. The remaining 15 were more than doubled. Of these 15, 10 occurred in the 25 meta-analyses involving between 6 and 9 studies; the other 5 occurred among the 25 meta-analyses involving 10 to 18 studies.
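The tallies quoted above can be checked against the R column of Table I with a few lines of Python. The values are transcribed from the table; the variable and function names are illustrative. Since the table caption defines R as a ratio of variance estimates, the sketch also counts how often the square root of R, the corresponding ratio of standard errors, exceeds 2.

```python
import math

# R values from Table I, meta-analyses 1-25 (six to nine studies each).
R_SMALL = [1.026, 6.411, 2.062, 1, 6.622, 1, 1.72, 19.365, 9.775, 1,
           1, 7.789, 1, 1, 3.575, 1, 1.445, 1.236, 12.314, 1,
           1, 8.029, 1.896, 2.061, 1]
# R values from Table I, meta-analyses 26-50 (10 to 18 studies each).
R_LARGE = [4.735, 4.355, 1, 1, 1.035, 1.854, 1, 1, 1.671, 1,
           1.352, 1.384, 5.392, 1.235, 2.034, 1.646, 1.512, 1.182, 1, 1,
           2.84, 1.824, 1, 1, 1]


def tally(ratios):
    """Count ratios equal to 1, between 1 and 2, and above 2."""
    unchanged = sum(1 for r in ratios if r == 1)
    modest = sum(1 for r in ratios if 1 < r <= 2)
    doubled = sum(1 for r in ratios if r > 2)
    return unchanged, modest, doubled


def se_more_than_doubled(ratios):
    """Count cases where sqrt(R), the standard error ratio, exceeds 2."""
    return sum(1 for r in ratios if math.sqrt(r) > 2)


if __name__ == "__main__":
    print(tally(R_SMALL), tally(R_LARGE))
    print(se_more_than_doubled(R_SMALL), se_more_than_doubled(R_LARGE))
```

On R itself the counts are 20 unchanged, 15 between 1 and 2, and 15 above 2 (10 in the smaller group and 5 in the larger). On the square-root scale only 10 of the 50 exceed 2.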

CONCLUSION

We have proposed basing the judgement of agreement between $X_0$, the result of a large clinical trial, and $\bar{X}$, the result of a meta-analysis, upon the difference, $d = X_0 - \bar{X}$, and considering this difference in relation to its standard error. We recommend this approach over an alternative one which focuses on how well the statistical significance of one (of $\bar{X}$ and $X_0$) corresponds to the statistical significance of the other; that approach responds strongly to matters other than the closeness of $X_0$ and $\bar{X}$.

Estimating the standard error of $d$ calls for care; we favour using a random effects analysis in doing the meta-analysis, for it takes account of overdispersion among the randomized clinical trials entering into $\bar{X}$ when such overdispersion is present; when that occurs the standard error of $\bar{X}$ needs to be increased.

Further, when overdispersion of the studies in the meta-analysis is found, the standard error of $X_0$, the LCT result, needs to incorporate an allowance for that between-study variance component; this gives mathematical expression to the idea that if the suite of studies has shown erratic variation, it is prudent to take that record into account when estimating the precision of this LCT result, $X_0$.

In closing: two messages form the core of this report.

(i) We recommend that, to quantify the agreement between $X_0$ from the LCT and $\bar{X}$ from the meta-analysis, one use the difference $d = X_0 - \bar{X}$.

(ii) In estimating $\mathrm{SE}(d)$, a random effects analysis should be used to estimate $\bar{X}$, and if a positive between-studies variance component is found, that should increase the estimated $\mathrm{SE}(d)$ in two ways: once in calculating $\mathrm{SE}(\bar{X})$ and again in calculating $\mathrm{SE}(X_0)$.

We can imagine satisfactory variants of the particular random effects computations we have used, so long as they are consistent with the two messages above.

ACKNOWLEDGEMENTS

Professor Lincoln E. Moses delivered this paper in a public lecture at the Harvard School of Public Health on 2 June 2000 after receiving the 2000 Marvin Zelen Leadership Award in Statistical Science from the Department of Biostatistics at the Harvard School of Public Health.

REFERENCES

1. Ioannidis JP, Cappelleri JC, Lau J. Issues in comparisons between meta-analyses and large trials. Journal of the American Medical Association 1998; 279(14):1089–1093.

2. Chalmers I, Enkin M, Keirse MJNC (eds). Effective Care in Pregnancy and Childbirth, Vols. I and II. Oxford University Press: New York, 1989.

3. DerSimonian R, Laird NM. Meta-analysis in clinical trials. Controlled Clinical Trials 1986; 7:177–188.


4. Cappelleri JC, Ioannidis JP, Schmid CH, deFerranti SD, Aubert M, Chalmers TC, Lau J. Large trials vs meta-analysis of smaller trials. Journal of the American Medical Association 1996; 276(16):1332–1338.

5. Villar J, Carroli G, Belizan JM. Predictive ability of meta-analysis of randomized controlled trials. Lancet 1995; 345:772–776.

6. LeLorier J, Gregoire G, Benhaddad A, Lapierre J, Derderian F. Discrepancies between meta-analysis and subsequent large randomized controlled trials. New England Journal of Medicine 1997; 337:536–542.
