Meta-analysis III: “Advanced Topics”
Mary S. Beattie, MD, MAS
UCSF Women’s Health, Division of General Internal Medicine
April 27, 2006
Goals
1. Describe examples in your field of
a. Meta-analysis of diagnostic tests
b. Meta-regression
2. Understand methodology and stats for
a. Meta-analysis of diagnostic tests
b. Meta-regression
3. Critique meta-analyses of these 2 “advanced topics”
Clinical Cases
1. 65 y/o man with atypical chest pain…Which to order: stress echo, or stress MIBI?
2. 52 y/o woman with DUB… On ultrasound, what level of endometrial thickness could “rule out” uterine cancer?
Diagnostic Meta’s
• Examples from your field?
• Goals of a diagnostic meta?
– Improve statistical precision of estimates of accuracy
– Measure variability of test characteristics across different populations
Diagnostic meta’s differ from RCT and observational meta’s
• Different endpoints
– Sensitivity, specificity, likelihood ratios, ROC curve, diagnostic odds ratio
• Electronic literature searches more difficult
• Need for a “gold standard” – common and reliable
• Clinically relevant population
Practical Problems
• Potential for multiple outcomes
– Sens, spec, pos LR, neg LR, etc.
• Poor quality studies
• Different definitions for “positive” gold standard
• Different populations in each study
• Many potential biases, incomplete reporting
Sensitivity and Specificity
Sensitivity = TP/(TP + FN): positive in disease
Specificity = TN/(TN + FP): negative in health

            Gold Std +   Gold Std -
Test +      TP           FP
Test -      FN           TN
(+) Likelihood Ratio = Sensitivity / (1 - Specificity)
Pre-test odds of disease x LR+ = Post-test odds of disease
Diagnostic OR = LR+ / LR- = (TP x TN) / (FP x FN)
Sensitivity   Specificity   Pos LR   Neg LR   Diag OR
0.5           0.5           1        1        1
0.6           0.6           1.5      0.67     2.3
0.7           0.7           2.3      0.43     5.4
0.8           0.8           4        0.25     16
0.9           0.9           9        0.11     81
0.95          0.95          19       0.05     361
0.99          0.99          99       0.01     9801
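The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names, the 2x2 counts, and the 30% pre-test probability are hypothetical. The negative LR is taken as (1 - sensitivity) / specificity, the usual definition.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures from one 2x2 table (test result vs. gold standard).

    Assumes no cell is zero; otherwise some ratios are undefined.
    """
    sens = tp / (tp + fn)          # positive in disease
    spec = tn / (tn + fp)          # negative in health
    pos_lr = sens / (1 - spec)     # LR+
    neg_lr = (1 - sens) / spec     # LR-
    dor = pos_lr / neg_lr          # algebraically (tp * tn) / (fp * fn)
    return {"sens": sens, "spec": spec,
            "pos_lr": pos_lr, "neg_lr": neg_lr, "dor": dor}


def post_test_probability(pre_test_prob, likelihood_ratio):
    """Pre-test odds x LR = post-test odds, converted back to a probability."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical counts giving sensitivity = specificity = 0.9
m = diagnostic_metrics(tp=90, fp=10, fn=10, tn=90)
print(m)
print(post_test_probability(0.30, m["pos_lr"]))   # after a positive test
print(post_test_probability(0.30, m["neg_lr"]))   # after a negative test
```

With sensitivity and specificity both 0.9, the computed LR+, LR-, and diagnostic OR should agree with the 0.9 row of the table, up to floating-point rounding.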
Case 1: Chest Pain, Echo or SPECT?
• What is the perfect study design?
– Echo, SPECT, and gold std done in reproducible way and at the same time
– Gold std is 100% reliable and reproducible
– Blind all readers
• What is the perfect population?
– Consecutive
– “Gray zone” prior prob of having disease
• How do “real life” studies differ from ideal?
Meta-analysis: the Steps
1. Formulate a question, eligibility criteria
2. Perform a systematic literature search
3. Abstract the data
4. Perform a statistical analysis
5. Calculate the summary effect size
6. Calculate the summary effect size for subgroups
7. Check for heterogeneity/publication bias
Echo vs. SPECT
Fleischmann, JAMA 98
• Included:
– Exercise + echo or single-photon emission CT
– Cath as reference
– TP, TN, FP, FN available
• Excluded:
– Exclusively after MI, PTCA, CABG, unstable angina admission
• Data extracted:
– Study design, population, test characteristics; TP, TN, FP, FN; and ?verification bias
Bias Types
• Verification Bias: generally ↑sens ↓spec
– Partial: not everyone gets “gold standard”
– Differential: different “gold standards” based on results of test
• Incorporation Bias: over-estimates diagnostic accuracy
– “Gold std” dx based partially on results of test
• Spectrum Bias
– Choosing cases known to have disease ↑ sens
– Choosing healthy controls ↑ specificity
Spectrum Bias & Sensitivity
Sensitivity elevated in case-control studies

                                      Source of Samples (% of cases)
Disease Grade          Sensitivity    Case-Control   General Practice   Hospital
Early                  0.50           0              80                 20
Intermediate           0.75           20             15                 30
Advanced               1.00           80             5                  50
Observed Sensitivity                  0.95           0.56               0.83
Spectrum Bias & Specificity
Specificity elevated in case-control studies

                                      Controls in (% of controls)
Disease                Specificity    Case-Control   General Practice   Hospital
Alternative X          0.30           0              30                 75
Alternative Y          0.95           0              65                 25
Healthy adults         0.99           100            5                  0
Observed Specificity                  0.99           0.76               0.46
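The observed values in the two spectrum-bias tables behave like case-mix weighted averages of the per-group sensitivity or specificity. A minimal Python sketch of that arithmetic follows. It assumes the source columns are percentages of cases or controls, which is an interpretation of the slide rather than something it states, and the function name is hypothetical.

```python
def observed_accuracy(per_group_accuracy, mix_percent):
    """Weighted average of per-group accuracy, weighted by the sample mix."""
    total = sum(mix_percent)
    weighted = sum(a * w for a, w in zip(per_group_accuracy, mix_percent))
    return weighted / total


# Sensitivity by disease grade: early, intermediate, advanced
sens_by_grade = [0.50, 0.75, 1.00]
case_mix = {
    "Case-Control": [0, 20, 80],
    "General Practice": [80, 15, 5],
    "Hospital": [20, 30, 50],
}

# Specificity by control type: alternative X, alternative Y, healthy adults
spec_by_group = [0.30, 0.95, 0.99]
control_mix = {
    "Case-Control": [0, 0, 100],
    "General Practice": [30, 65, 5],
    "Hospital": [75, 25, 0],
}

observed_sens = {src: observed_accuracy(sens_by_grade, mix)
                 for src, mix in case_mix.items()}
observed_spec = {src: observed_accuracy(spec_by_group, mix)
                 for src, mix in control_mix.items()}

for src in case_mix:
    print(src, observed_sens[src], observed_spec[src])
```

The same test can therefore look very different depending only on where the cases and controls were sampled.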
Factors Leading to Bias
Feature                              Multiple of DOR   95% CI
Case-control vs. clinical cohort     3.0               2.0 to 4.5
Different reference standard         2.2               1.5 to 3.3
No description of test               1.7               1.1 to 2.7
No description of population         1.4               1.1 to 1.7
Assessors not blinded                1.3               1.0 to 1.9
No description of reference test     0.7               0.6 to 0.9
Based on analysis of 218 test evaluations from 18 separate meta-analyses
Lijmer et al. JAMA 282:1061-6, 1999
Summary Sensitivity and Specificity

Test Results    With disease              Without disease
Positive        True positive, TP(i)      False positive, FP(i)     Total positive
Negative        False negative, FN(i)     True negative, TN(i)      Total negative
Participants    Total w/disease (n1)      Total w/o disease (n2)

Summary sensitivity = ∑TP(i) / ∑[TP(i) + FN(i)]   (positives among those with disease)
Summary specificity = ∑TN(i) / ∑[TN(i) + FP(i)]   (negatives among those without disease)
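As a rough illustration of the summary formulas above, the sketch below pools counts across studies. The function name and the two sets of study counts are hypothetical. This simple pooling treats all participants as one large study, so it is reasonable only when the studies look homogeneous.

```python
def summary_sensitivity_specificity(studies):
    """Pool counts over studies: sum(TP)/sum(TP+FN) and sum(TN)/sum(TN+FP)."""
    tp = sum(s["tp"] for s in studies)
    fn = sum(s["fn"] for s in studies)
    tn = sum(s["tn"] for s in studies)
    fp = sum(s["fp"] for s in studies)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical per-study 2x2 counts
studies = [
    {"tp": 40, "fn": 10, "tn": 80, "fp": 20},
    {"tp": 18, "fn": 2, "tn": 45, "fp": 5},
]

summary_sens, summary_spec = summary_sensitivity_specificity(studies)
print(summary_sens, summary_spec)
```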
. twoway (scatter tpr fpr, sort)
[Figure: scatter plot of tpr (y-axis, .6 to 1) against fpr (x-axis, 0 to .8)]
[Figure: ROC Plot of SENSITIVITY vs. 1-SPECIFICITY, both axes 0 to 1; legend: Observed Data, Uninformative Test]
. twoway (scatter tpr fpr, sort msymbol(circle) mcolor(black) msize(medium)), ytitle(SENSITIVITY) yscale(range(0 1)) ylabel(0(.1)1) xtitle(1-SPECIFICITY) xlabel(0(.1)1) title(ROC Plot of SENSITIVITY vs. 1-SPECIFICITY, size(medium)) graphregion(margin(zero)) legend(pos(2) col(1) lab(1 "Observed Data") lab(2 "Uninformative Test"))
Case 2: DUB, Ultrasound Cut-off?
Smith-Bindman, 1998
Sensitivity and Specificity
[Figure: Healthy and Cancer groups separated by a cutpoint; labels: Sensitivity, 1–specificity]
ROC Curve for endometrial thickness
[Figure: ROC curve, sensitivity (true positive) vs. 1-specificity (false positive), both axes 0 to 1, with points labeled at cutoffs of 4, 5, 10, 15, 20, and 25 mm]
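Test for Homogeneity
For each study, measure the squared deviation from the pooled estimate, scaled by the variance:
Q = \sum_i (\theta_i - \bar{\theta})^2 / \mathrm{var}_i
\theta_i = estimate from the ith study
\bar{\theta} = pooled estimate
\mathrm{var}_i = variance of the estimate from the ith study
Q has a chi-square distribution with df = # studies - 1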
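A minimal Python sketch of the Q statistic is given below. It assumes the pooled estimate is the inverse-variance weighted mean, which is the usual choice although the slide does not say so. The function name and the study estimates are hypothetical.

```python
def cochran_q(estimates, variances):
    """Return (Q, df, pooled) for study estimates with known variances.

    Assumption: pooled estimate = inverse-variance weighted mean.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum((e - pooled) ** 2 / v for e, v in zip(estimates, variances))
    df = len(estimates) - 1
    return q, df, pooled


# Hypothetical log diagnostic odds ratios and their variances
log_dor = [2.1, 2.8, 1.5, 3.0]
var_log_dor = [0.10, 0.25, 0.15, 0.40]

q, df, pooled = cochran_q(log_dor, var_log_dor)
print(q, df, pooled)
```

Q is then compared with a chi-square distribution on df degrees of freedom; a large Q relative to df suggests more variation than chance alone would produce.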
Are Specificities Homogeneous?
Pooled specificity = .81
Expected TN = pooled specificity x (TN+FP), e.g. for Peril: 0.81 * 131 = 106
Chi-square, 19 df, p<0.001

Study          TN     TN+FP   exptd   (obs-exp)^2/exp
Abu            436    459     372     10.8
Auslender      132    138     112     3.6
Botsis         112    112     91      4.9
Cacciator      19     41      33      6.1
Chan           46     50      41      0.7
Dorum          55     85      69      2.8
Goldstein      14     27      22      2.9
Granberg       153    157     127     5.1
Hanggi         58     68      55      0.1
Karlsson       801    1015    824     0.6
Karlsson (b)   70     90      73      0.1
Klug           120    171     139     2.5
Manilova       55     61      50      0.6
Nasri          51     52      42      1.8
Nasri (b)      74     83      67      0.7
Peril          43     131     106     37.7
Taviani        30     39      32      0.1
Varner         11     13      11      0.0
Weigl          117    163     132     1.8
Wolman         41     50      41      0.0
Total          2439   3005            83.1
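The table's arithmetic can be reproduced with a short Python sketch. It follows the calculation as the table appears to do it: expected TN = pooled specificity x (TN+FP), with (obs-exp)^2/exp summed over studies for the TN cell only. The function name is hypothetical, and the counts are copied from the transcribed rows.

```python
# (study, TN, TN+FP) as transcribed in the table
STUDIES = [
    ("Abu", 436, 459), ("Auslender", 132, 138), ("Botsis", 112, 112),
    ("Cacciator", 19, 41), ("Chan", 46, 50), ("Dorum", 55, 85),
    ("Goldstein", 14, 27), ("Granberg", 153, 157), ("Hanggi", 58, 68),
    ("Karlsson", 801, 1015), ("Karlsson (b)", 70, 90), ("Klug", 120, 171),
    ("Manilova", 55, 61), ("Nasri", 51, 52), ("Nasri (b)", 74, 83),
    ("Peril", 43, 131), ("Taviani", 30, 39), ("Varner", 11, 13),
    ("Weigl", 117, 163), ("Wolman", 41, 50),
]


def specificity_homogeneity(studies):
    """Return pooled specificity, per-study contributions, statistic, and df."""
    total_tn = sum(tn for _, tn, _ in studies)
    total_n = sum(n for _, _, n in studies)
    pooled = total_tn / total_n
    contributions = {}
    for name, tn, n in studies:
        expected = pooled * n                      # expected true negatives
        contributions[name] = (tn - expected) ** 2 / expected
    statistic = sum(contributions.values())
    return pooled, contributions, statistic, len(studies) - 1


pooled_spec, contributions, chi_sq, df = specificity_homogeneity(STUDIES)
print(round(pooled_spec, 2), round(chi_sq, 1), df)

# Studies that contribute most to the heterogeneity
for name, value in sorted(contributions.items(), key=lambda kv: -kv[1])[:3]:
    print(name, round(value, 1))
```

Sorting the contributions shows which studies drive the heterogeneity; these are the ones to examine first for differences in design or population.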
Heterogeneity
• Not the last step, just the beginning…
– Needs thought, explanation, and further work
• Statistical heterogeneity: is the variation likely to have occurred by chance?
• Clinical heterogeneity: are studies similar in…
– Design?
– Population?
– Test and gold standard characteristics?
HRT: Source of Heterogeneity?
Tests for LR Heterogeneity
• Positive LRs heterogeneous
– Cochran’s Q = 187, df=19, p<0.001
• Negative LRs no evidence of heterogeneity
– Cochran’s Q = 28, df=19, p=0.09
Summary Likelihood Ratios for Uterine Disease
• Heterogeneity in Pos LRs makes estimation of Pr{D | T+} unreliable
• Homogeneity in Neg LRs allows
– Pr{D | T–} more reliable
– If test is negative, more comfortable ruling out disease
Meta-Regression: Step 8 or 9
• A way to potentially explain and quantify heterogeneity
• Each study is a subject with co-variates
– Each unit is one study
– Study-level covariates such as
  • Quality
  • Publication year
  • Mean age
  • Gender distribution
• Can also apply to multi-center trials
– Each unit is one center
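To make "each study is a subject" concrete, here is a minimal Python sketch of a meta-regression with one study-level covariate. It is a fixed-effect, inverse-variance weighted least-squares fit. It is not the REML random-effects model that Stata's metareg fits, so it ignores between-study variance. The function name and all study values are hypothetical.

```python
import math


def fixed_effect_meta_regression(effects, variances, covariate):
    """Weighted least squares of study effect on one covariate.

    Each study is one observation, weighted by 1/variance.
    Returns (intercept, slope, se_slope).
    """
    weights = [1.0 / v for v in variances]
    sw = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, covariate)) / sw
    y_bar = sum(w * y for w, y in zip(weights, effects)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, covariate))
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, covariate, effects))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)   # valid when within-study variances are known
    return intercept, slope, se_slope


# Hypothetical studies: log diagnostic OR, its variance, and mean age
ln_dor = [3.2, 2.9, 2.4, 2.0, 1.5]
var_ln_dor = [0.20, 0.15, 0.25, 0.10, 0.30]
mean_age = [54, 56, 59, 61, 64]

intercept, slope, se_slope = fixed_effect_meta_regression(
    ln_dor, var_ln_dor, mean_age)
print(slope, se_slope)
print(math.exp(slope))   # ratio of DORs per 1-year increase in mean age
```

Exponentiating the slope gives the multiplicative change in the diagnostic OR per unit of the covariate, which is how the eform output on later slides is read.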
Meta-Regression Cases
1. What characteristics of echo studies influenced its diagnostic odds ratio?
2. Fish oil: what are its effects on heart rate and who gets the most “bang for the buck” in ↓ heart rate from fish oil?
Echo, Sensitivity & Specificity
[Figure: per-study specificity and sensitivity for echo, each on an axis from 0.0 to 1.0]
Echo ROC Curve
[Figure: ROC plot, Sensitivity vs. 1-Specificity, both axes 0.0 to 1.0]
Summary LR+
[Forest plot; x-axis: Risk ratio, log scale from .01 to 100]
Study              Risk ratio (95% CI)     % Weight
Betesin            1.94 (1.34, 2.82)       9.0
Bjornstad          1.67 (0.92, 3.03)       3.1
Cohen              2.46 (1.41, 4.27)       3.9
Dagianti           5.88 (2.78, 12.44)      1.7
Galanti            12.98 (3.41, 49.37)     0.8
Jun                4.34 (1.83, 10.34)      2.0
Luotolahti         1.94 (1.15, 3.28)       5.0
Marangelli         6.53 (2.89, 14.76)      2.3
Marwick1           5.64 (3.25, 9.78)       4.0
Marwick2           2.59 (1.79, 3.75)       9.8
Marwick3           4.47 (2.90, 6.88)       5.1
Roger1             3.93 (2.16, 7.14)       4.5
Roger2             1.28 (1.06, 1.53)       25.5
Roger3             1.42 (0.91, 2.23)       7.1
Ryan               4.69 (3.09, 7.12)       10.1
Tawa               5.64 (1.59, 20.03)      1.2
Williams           7.25 (2.85, 18.46)      1.6
Crouse             7.02 (3.09, 15.93)      3.4
Overall (95% CI)   3.06 (2.72, 3.45)
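One common way to combine ratio measures such as these is inverse-variance pooling on the log scale, recovering each study's standard error from its 95% CI. The sketch below does this for the LR+ values listed above. This is not necessarily the weighting behind the slide's "Overall" row, whose % weights suggest a different scheme, so the pooled value need not match 3.06. The function name is hypothetical.

```python
import math

# (LR+, lower 95% limit, upper 95% limit) as listed on the slide
LR_POS = [
    (1.94, 1.34, 2.82), (1.67, 0.92, 3.03), (2.46, 1.41, 4.27),
    (5.88, 2.78, 12.44), (12.98, 3.41, 49.37), (4.34, 1.83, 10.34),
    (1.94, 1.15, 3.28), (6.53, 2.89, 14.76), (5.64, 3.25, 9.78),
    (2.59, 1.79, 3.75), (4.47, 2.90, 6.88), (3.93, 2.16, 7.14),
    (1.28, 1.06, 1.53), (1.42, 0.91, 2.23), (4.69, 3.09, 7.12),
    (5.64, 1.59, 20.03), (7.25, 2.85, 18.46), (7.02, 3.09, 15.93),
]


def pool_ratios_inverse_variance(rows, z=1.96):
    """Fixed-effect inverse-variance pooling of ratios on the log scale."""
    num = 0.0
    den = 0.0
    for estimate, lower, upper in rows:
        # SE of the log ratio, recovered from the width of the 95% CI
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        weight = 1.0 / se ** 2
        num += weight * math.log(estimate)
        den += weight
    pooled_log = num / den
    se_pooled = math.sqrt(1.0 / den)
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * se_pooled),
            math.exp(pooled_log + z * se_pooled))


pooled, ci_low, ci_high = pool_ratios_inverse_variance(LR_POS)
print(pooled, ci_low, ci_high)
```

Because precise studies dominate under inverse-variance weights, a single large study with a narrow CI can pull the pooled value strongly toward itself, which is one more reason to check heterogeneity before trusting a summary LR.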
Summary LR-
[Forest plot; x-axis: Risk ratio, log scale from .01 to 100]
Study              Risk ratio (95% CI)     % Weight
Betesin            0.06 (0.02, 0.18)       3.5
Bjornstad          0.16 (0.04, 0.74)       0.9
Cohen              0.10 (0.03, 0.42)       2.4
Dagianti           0.11 (0.03, 0.42)       3.6
Galanti            0.04 (0.01, 0.28)       3.8
Jun                0.04 (0.01, 0.31)       2.7
Luotolahti         0.06 (0.02, 0.20)       1.9
Marangelli         0.08 (0.03, 0.23)       5.3
Marwick1           0.33 (0.22, 0.49)       10.6
Marwick2           0.08 (0.03, 0.19)       6.5
Marwick3           0.19 (0.10, 0.36)       8.5
Roger1             0.48 (0.36, 0.63)       9.8
Roger2             0.46 (0.29, 0.75)       5.0
Roger3             0.64 (0.39, 1.03)       3.2
Ryan               0.13 (0.08, 0.19)       16.5
Tawa               0.07 (0.02, 0.29)       2.3
Williams           0.19 (0.09, 0.40)       4.8
Crouse             0.12 (0.07, 0.18)       8.8
Overall (95% CI)   0.21 (0.18, 0.24)
Homogeneity Tests, Echo
• Sensitivity: p = .43
• Specificity: p = .059
• + Likelihood Ratio: p = .018
• - Likelihood Ratio: p = .008
• ROC curve: p < .0001
• DOR: p < .0000001
. metareg lnor pmi, wsse(selnor) eform

Meta-regression                                       Number of studies = 18
Fit of model without heterogeneity (tau2=0): Q (16 df) = 278.123
                                             Prob > Q  = 0.000
Proportion of variation due to heterogeneity I-squared = 0.942
REML estimate of between-study variance:     tau2      = 1.1220
------------------------------------------------------------------------------
        lnor |     exp(b)   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pmi |     1.0103   .0131902     0.78   0.444     .9827211    1.038652
------------------------------------------------------------------------------

Prior MI & Diagnostic OR: Univariate Analysis
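The I-squared values in the metareg output are consistent with the usual relation I² = (Q - df) / Q, truncated at zero. A small Python check using the Q statistics reported on these slides (the function name is hypothetical):

```python
def i_squared(q, df):
    """Proportion of variation due to heterogeneity: max(0, (Q - df) / Q)."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)


# Q statistics (16 df each) from the three univariate models
for label, q in [("pmi", 278.123), ("men", 414.828), ("age", 115.32)]:
    print(label, round(i_squared(q, 16), 3))
```

Each result should agree with the I-squared line of the corresponding output to three decimals.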
. metareg lnor men, wsse(selnor) eform

Meta-regression                                       Number of studies = 18
Fit of model without heterogeneity (tau2=0): Q (16 df) = 414.828
                                             Prob > Q  = 0.000
Proportion of variation due to heterogeneity I-squared = 0.961
REML estimate of between-study variance:     tau2      = 1.1610
------------------------------------------------------------------------------
        lnor |     exp(b)   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         men |   1.008248   .0087052     0.95   0.356     .9899616    1.026872
------------------------------------------------------------------------------

Gender and Diagnostic OR: Univariate Analysis
. metareg lnor age, wsse(selnor) eform

Meta-regression                                       Number of studies = 18
Fit of model without heterogeneity (tau2=0): Q (16 df) = 115.32
                                             Prob > Q  = 0.000
Proportion of variation due to heterogeneity I-squared = 0.861
REML estimate of between-study variance:     tau2      = 0.5732
------------------------------------------------------------------------------
        lnor |     exp(b)   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .827714   .0432287    -3.62   0.002     .7409641    .9246203
------------------------------------------------------------------------------

Age and Diagnostic OR: Univariate Analysis
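Mean age = 59
Because eform was specified, the reported .8277 is already exp(b): for each 1-year increase in a study's mean age, the DOR is multiplied by about 0.83 (roughly a 17% decrease per year), consistent with the negative t statistic (-3.62)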
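Since the age model was run with eform, the age coefficient is reported as exp(b), a ratio of diagnostic ORs per year of mean age. The sketch below works through that reading from the reported numbers. The variable names are hypothetical, and the 5-year comparison is only an illustration.

```python
import math

exp_b = 0.827714        # reported exp(b) for age
se_exp_b = 0.0432287    # reported Std. Err. (on the exp(b) scale)

b = math.log(exp_b)                       # coefficient on the ln(DOR) scale
se_b = se_exp_b / exp_b                   # delta-method SE on the log scale
t_stat = b / se_b                         # should be close to the reported t

pct_change_per_year = (exp_b - 1) * 100   # percent change in DOR per year
ratio_5_years = exp_b ** 5                # DOR ratio for a 5-year difference

print(b, t_stat)
print(pct_change_per_year, ratio_5_years)
```

A value of exp(b) below 1 means the diagnostic OR falls as mean age rises. An exp(b) above 1, as for pmi and men, would mean it rises, although those two estimates are not statistically distinguishable from 1.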
Does Fish Oil Affect HR?
Mozaffarian, Circulation 2005
• Meta-analysis of 30 RPCT of fish oil that measured heart rate
– Including 2 trials for which unpublished HR data was obtained from authors
– Excluded: no HR measured, no placebo, < 2 weeks, organ transplants, non-blinded
• Abstracted study, intervention, population characteristics; measurement of HR, dropout, quality
Were the trials homogeneous?
• Q test, p < .0001
• Pre-specified characteristics to explore heterogeneity
– Design, age, health, CAD, baseline HR, dose, duration, HR measure, control oil, Delphi criteria
• After stratification (or univariate analysis), meta-regression can explore multiple variables at once
– Independent heterogeneity related to baseline HR (P for interaction = 0.04) and duration (P for interaction = 0.09)
• Among 9 trials with mean BL HR >68 & duration > 12 weeks, HR ↓ 2.9 (p<.001) and Q test p > .05
Summary
• Diagnostic meta-analysis follows the same steps as other meta-analyses
• Follow up heterogeneity by examining study design, test/gold standard, and population characteristics
• Meta-regression is a way to explore and quantify heterogeneity using multiple co-variates (each study is a “participant”)