meta-analysis iii: “advanced topics” mary s. beattie, md, mas ucsf women’s health division of...

Meta-analysis III: “Advanced Topics”

Mary S. Beattie, MD, MAS

UCSF Women’s Health

Division of General Internal Medicine

April 27, 2006

Goals

1. Describe examples in your field of

a. Meta-analysis of diagnostic tests

b. Meta-regression

2. Understand methodology and stats for

a. Meta-analysis of diagnostic tests

b. Meta-regression

3. Critique meta-analyses of these 2 “advanced topics”

Clinical Cases

1. 65 y/o man with atypical chest pain…Which to order: stress echo, or stress MIBI?

2. 52 y/o woman with DUB… On ultrasound, what level of endometrial thickness could “rule out” uterine cancer?

Diagnostic Meta’s

• Examples from your field?

• Goals of a diagnostic meta?

– Improve statistical precision of estimates of accuracy

– Measure variability of test characteristics across different populations

Diagnostics meta’s differ from RCT and observational meta’s

• Different endpoints

– Sensitivity, specificity, likelihood ratios, ROC curve, diagnostic odds ratio

• Electronic literature searches more difficult

• Need for a “gold standard” – common and reliable

• Clinically relevant population

Practical Problems

• Potential for multiple outcomes

– Sens, spec, pos LR, neg LR, etc.

• Poor quality studies

• Different definitions for “positive” gold standard

• Different populations in each study

• Many potential biases, incomplete reporting

Sensitivity and Specificity

Sensitivity = TP/(TP + FN): Positive in Disease

Specificity = TN/(TN + FP): Negative in Health

            Gold Std +    Gold Std -
Test +      TP            FP
Test -      FN            TN

(+) Likelihood Ratio = Sensitivity / (1 - Specificity)

Pre-test Odds of Disease x (+) LR = Post-test Odds of Disease
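The odds relationship above can be sketched in Python. The function name and the example inputs (a 30% pre-test probability and a likelihood ratio of 4) are illustrative assumptions, not values from the lecture.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)  # probability -> odds
    post_odds = pre_odds * likelihood_ratio           # pre-test odds x LR
    return post_odds / (1.0 + post_odds)              # odds -> probability


# Hypothetical patient: 30% pre-test probability, positive test with LR+ = 4
p = post_test_probability(0.30, 4.0)
print(f"post-test probability = {p:.2f}")
```

A likelihood ratio of 1 leaves the probability unchanged, which is why a test with sensitivity = specificity = 0.5 is uninformative.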

Diagnostic OR = +LR / -LR = (TP x TN) / (FP x FN)

Sensitivity   Specificity   Pos LR   Neg LR   Diag OR
0.5           0.5           1        1        1
0.6           0.6           1.5      0.67     2.3
0.7           0.7           2.3      0.43     5.4
0.8           0.8           4        0.25     16
0.9           0.9           9        0.11     81
0.95          0.95          19       0.05     361
0.99          0.99          99       0.01     9801
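The table rows can be recomputed from sensitivity and specificity alone. The following is a minimal sketch; the function name is an assumption, and the negative likelihood ratio uses the standard definition (1 - sensitivity) / specificity, which the slides do not spell out.

```python
def diagnostic_summary(sensitivity: float, specificity: float) -> dict:
    """Return LR+, LR- and the diagnostic odds ratio for one test."""
    pos_lr = sensitivity / (1.0 - specificity)
    neg_lr = (1.0 - sensitivity) / specificity
    return {"pos_lr": pos_lr, "neg_lr": neg_lr, "dor": pos_lr / neg_lr}


# Recompute each table row (sensitivity equals specificity in every row)
for value in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99):
    row = diagnostic_summary(value, value)
    print(f"{value:.2f}  LR+ {row['pos_lr']:.2f}  "
          f"LR- {row['neg_lr']:.2f}  DOR {row['dor']:.1f}")
```

The sketch does not guard against a specificity of exactly 1 or 0, where the ratios are undefined.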

Case 1: Chest Pain, Echo or SPECT?

• What is the perfect study design?

– Echo, SPECT, and gold std done in reproducible way and at the same time

– Gold std is 100% reliable and reproducible

– Blind all readers

• What is the perfect population?

– Consecutive

– “Gray zone” prior prob of having disease

• How do “real life” studies differ from ideal?

Meta-analysis: the Steps

1. Formulate a question, eligibility criteria

2. Perform a systematic literature search

3. Abstract the data

4. Perform a statistical analysis

5. Calculate the summary effect size

6. Calculate the summary effect size for subgroups

7. Check for heterogeneity/publication bias

Echo vs. SPECT

Fleischmann, JAMA 98

• Included:

– Exercise + echo or single-photon emission CT

– Cath as reference

– TP, TN, FP, FN available

• Excluded:

– Exclusively after MI, PTCA, CABG, unstable angina admission

• Data extracted:

– Study design, population, test characteristics; TP, TN, FP, FN; and possible verification bias

Bias Types

• Verification Bias: generally ↑sens ↓spec

– Partial: not everyone gets “gold standard”

– Differential: different “gold standards” based on results of test

• Incorporation Bias: over-estimates diagnostic accuracy

– “Gold std” dx based partially on results of test

• Spectrum Bias

– Choosing cases known to have disease ↑ sens

– Choosing healthy controls ↑ specificity

Spectrum Bias & Sensitivity

Sensitivity elevated in case-control studies

                                     Source of Samples (% of cases)
Disease Grade          Sensitivity   Case-Control   General Practice   Hospital
Early                  0.50          0              80                 20
Intermediate           0.75          20             15                 30
Advanced               1.00          80             5                  50
Observed Sensitivity                 0.95           0.56               0.83
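The observed sensitivities in the table are case-mix-weighted averages of the grade-specific sensitivities. The sketch below recomputes them from the table's own numbers; the variable and function names are assumptions made for illustration.

```python
# Sensitivity of the test within each disease grade (from the table)
GRADE_SENSITIVITY = {"early": 0.50, "intermediate": 0.75, "advanced": 1.00}

# Percentage of cases in each grade, by source of samples (from the table)
CASE_MIX = {
    "case-control": {"early": 0, "intermediate": 20, "advanced": 80},
    "general practice": {"early": 80, "intermediate": 15, "advanced": 5},
    "hospital": {"early": 20, "intermediate": 30, "advanced": 50},
}


def observed_sensitivity(mix: dict) -> float:
    """Case-mix-weighted average of the grade-specific sensitivities."""
    total = sum(mix.values())
    return sum(GRADE_SENSITIVITY[grade] * pct for grade, pct in mix.items()) / total


for source, mix in CASE_MIX.items():
    print(f"{source}: {observed_sensitivity(mix):.3f}")
```

The same test looks far more sensitive in the case-control sample only because that sample is dominated by advanced disease.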

Spectrum Bias & Specificity

Specificity elevated in case-control studies

                                     Controls in (% of controls)
Disease                Specificity   Case-Control   General Practice   Hospital
Alternative X          0.30          0              30                 75
Alternative Y          0.95          0              65                 25
Healthy adults         0.99          100            5                  0
Observed Specificity                 0.99           0.76               0.46

Factors Leading to Bias

Feature                               Multiple of DOR   95% CI
Case-control vs. clinical cohort      3.0               2.0 to 4.5
Different reference standard          2.2               1.5 to 3.3
No description of test                1.7               1.1 to 2.7
No description of population          1.4               1.1 to 1.7
Assessors not blinded                 1.3               1.0 to 1.9
No description of reference test      0.7               0.6 to 0.9

Based on analysis of 218 test evaluations from 18 separate meta-analyses

Lijmer et al. JAMA 282:1061-6, 1999

Summary Sensitivity and Specificity

                 Participants
Test Results     With disease             Without disease
Positive         True positive, TP(i)     False positive, FP(i)    Total positive
Negative         False negative, FN(i)    True negative, TN(i)     Total negative
Total            Total w/disease, n1      Total w/o disease, n2

Summary sensitivity = ∑TP(i) / ∑[TP(i) + FN(i)]   (positives / with disease)

Summary specificity = ∑TN(i) / ∑[TN(i) + FP(i)]   (negatives / without disease)
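A minimal sketch of these pooled estimates: sum the cells across studies, then take the ratio. The three 2x2 tables below are hypothetical counts invented for illustration, and the function names are assumptions.

```python
# Hypothetical (TP, FN, FP, TN) counts for three studies; not real data
STUDIES = [(45, 5, 10, 40), (30, 10, 5, 55), (80, 20, 30, 70)]


def summary_sensitivity(tables) -> float:
    """Sum of TP over sum of (TP + FN), i.e. positives among the diseased."""
    return sum(tp for tp, fn, fp, tn in tables) / sum(tp + fn for tp, fn, fp, tn in tables)


def summary_specificity(tables) -> float:
    """Sum of TN over sum of (TN + FP), i.e. negatives among the non-diseased."""
    return sum(tn for tp, fn, fp, tn in tables) / sum(tn + fp for tp, fn, fp, tn in tables)


print(f"summary sensitivity = {summary_sensitivity(STUDIES):.3f}")
print(f"summary specificity = {summary_specificity(STUDIES):.3f}")
```

Pooling this way gives larger studies more influence, and it is only meaningful if the per-study estimates are reasonably homogeneous.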

. twoway (scatter tpr fpr, sort)

[Figure: scatter plot of tpr (y-axis) vs. fpr (x-axis)]

[Figure: ROC Plot of SENSITIVITY vs. 1-SPECIFICITY; legend: Observed Data, Uninformative Test]

. twoway (scatter tpr fpr, sort msymbol(circle) mcolor(black) msize(medium)), ytitle(SENSITIVITY) yscale(range(0 1)) ylabel(0(.1)1) xtitle(1-SPECIFICITY) xlabel(0(.1)1) title(ROC Plot of SENSITIVITY vs. 1-SPECIFICITY, size(medium)) graphregion(margin(zero)) legend(pos(2) col(1) lab(1 "Observed Data") lab(2 "Uninformative Test"))

Case 2: DUB, Ultrasound Cut-off?

Smith-Bindman, 1998


[Figure: Sensitivity and Specificity; groups labeled Healthy and Cancer, separated by a cutpoint]

[Figure: ROC Curve for endometrial thickness; x-axis: 1-specificity (false positive); y-axis: Sensitivity (true positive); labeled cutpoints: 4 mm, 5 mm, 10 mm, 15 mm, 20 mm, 25 mm]

Test for Homogeneity

For each study, measure the squared deviation from the pooled estimate, scaled by the variance:

Q = \sum_i \frac{(\theta_i - \bar{\theta})^2}{\mathrm{var}_i}

\theta_i = estimate from ith study
\bar{\theta} = pooled estimate
\mathrm{var}_i = variance of estimate

Q has chi-square distribution with df = # studies - 1
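The Q statistic can be sketched as follows. The slide does not say how the pooled estimate is formed; this sketch assumes the usual inverse-variance weighted mean. The estimates and variances are hypothetical log odds ratios invented for illustration.

```python
def cochran_q(estimates, variances):
    """Return (Q, pooled estimate, degrees of freedom).

    Assumes an inverse-variance weighted pooled estimate.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    # Squared deviation from the pooled estimate, scaled by each study's variance
    q = sum((e - pooled) ** 2 / v for e, v in zip(estimates, variances))
    return q, pooled, len(estimates) - 1


# Hypothetical log odds ratios and their variances; not real data
q, pooled, df = cochran_q([0.2, 0.5, 0.9], [0.04, 0.09, 0.05])
print(f"Q = {q:.2f} on {df} df, pooled estimate = {pooled:.3f}")
```

Q is then compared with a chi-square distribution on df degrees of freedom; the p-value lookup is left out because the Python standard library has no chi-square distribution function.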

Are Specificities Homogeneous?

Pooled specificity = .81

Study          TN     TN+FP   exptd   (obs-exp)^2/exp
Abu            436    459     372     10.8
Auslender      132    138     112     3.6
Botsis         112    112     91      4.9
Cacciator      19     41      33      6.1
Chan           46     50      41      0.7
Dorum          55     85      69      2.8
Goldstein      14     27      22      2.9
Granberg       153    157     127     5.1
Hanggi         58     68      55      0.1
Karlsson       801    1015    824     0.6
Karlsson (b)   70     90      73      0.1
Klug           120    171     139     2.5
Manilova       55     61      50      0.6
Nasri          51     52      42      1.8
Nasri (b)      74     83      67      0.7
Peril          43     131     106     37.7    (exptd = 0.81 * 131)
Taviani        30     39      32      0.1
Varner         11     13      11      0.0
Weigl          117    163     132     1.8
Wolman         41     50      41      0.0
Total          2439   3005            83.1

Chi-square, 19 df, p<0.001
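The table's calculation can be retraced in a few lines: pool the specificity, compute each study's expected TN count, and sum the (obs-exp)^2/exp contributions. The counts are copied from the table; the variable and function names are assumptions.

```python
# (study, TN, TN+FP) copied from the table
SPECIFICITY_DATA = [
    ("Abu", 436, 459), ("Auslender", 132, 138), ("Botsis", 112, 112),
    ("Cacciator", 19, 41), ("Chan", 46, 50), ("Dorum", 55, 85),
    ("Goldstein", 14, 27), ("Granberg", 153, 157), ("Hanggi", 58, 68),
    ("Karlsson", 801, 1015), ("Karlsson (b)", 70, 90), ("Klug", 120, 171),
    ("Manilova", 55, 61), ("Nasri", 51, 52), ("Nasri (b)", 74, 83),
    ("Peril", 43, 131), ("Taviani", 30, 39), ("Varner", 11, 13),
    ("Weigl", 117, 163), ("Wolman", 41, 50),
]

# Pooled specificity: total TN over total participants without disease
pooled_spec = (sum(tn for _, tn, _ in SPECIFICITY_DATA)
               / sum(n for _, _, n in SPECIFICITY_DATA))


def contribution(tn: int, n: int, pooled: float) -> float:
    """One study's (observed - expected)^2 / expected, with expected = pooled * n."""
    expected = pooled * n
    return (tn - expected) ** 2 / expected


chi_square = sum(contribution(tn, n, pooled_spec) for _, tn, n in SPECIFICITY_DATA)
df = len(SPECIFICITY_DATA) - 1
print(f"pooled specificity = {pooled_spec:.2f}, chi-square = {chi_square:.1f} on {df} df")
```

Because the sketch uses unrounded expected counts, individual contributions can differ slightly from the rounded values printed in the table.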

Heterogeneity

• Not the last step, just the beginning…

– Needs thought, explanation, and further work

• Statistical heterogeneity: is the variation likely to have occurred by chance?

• Clinical heterogeneity: are studies similar in…

– Design?

– Population?

– Test and gold standard characteristics?


HRT: Source of Heterogeneity?



Tests for LR Heterogeneity

• Positive LRs heterogeneous

– Cochran’s Q = 187, df=19, p<0.001

• Negative LRs no evidence of heterogeneity

– Cochran’s Q = 28, df=19, p=0.09

Summary Likelihood Ratios for Uterine Disease

• Heterogeneity in Pos LRs makes estimation of Pr{D | T+} unreliable

• Homogeneity in Neg LRs allows

– Pr{D | T–} more reliable

– If test is negative, more comfortable ruling out disease

Meta-Regression: Step 8 or 9

• A way to potentially explain and quantify heterogeneity

• Each study is a subject with co-variates

– Each unit is one study

– Study-level covariates such as

• Quality

• Publication year

• Mean age

• Gender distribution

• Can also apply to multi-center trials

– Each unit is one center
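The idea that each study is one observation can be sketched as a weighted regression of the study effect on a study-level covariate. This is a simplified fixed-effect version with inverse-variance weights; it ignores between-study variance, so it is not the REML random-effects model that dedicated meta-regression software fits. All data values and names below are hypothetical.

```python
def weighted_meta_regression(effects, std_errors, covariate):
    """Weighted least squares of effect on one covariate; returns (intercept, slope).

    Weights are 1 / SE^2 (fixed-effect assumption, no between-study variance).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, covariate)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, effects)) / w_sum
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, covariate, effects))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, covariate))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical studies: log diagnostic OR, its standard error, and mean age
log_dor = [3.0, 2.6, 2.2, 1.8]
se = [0.5, 0.4, 0.5, 0.4]
mean_age = [55, 57, 59, 61]

intercept, slope = weighted_meta_regression(log_dor, se, mean_age)
print(f"change in log DOR per year of mean age = {slope:.3f}")
```

With only a handful of studies, the number of covariates that can be examined this way is small, which is one reason covariates are usually pre-specified.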

Meta-Regression Cases

1. What characteristics of echo studies influenced its diagnostic odds ratio?

2. Fish oil: what are its effects on heart rate and who gets the most “bang for the buck” in ↓ heart rate from fish oil?

[Figure: Echo, Sensitivity & Specificity; Specificity and Sensitivity each plotted on a 0.0 to 1.0 axis]

[Figure: Echo ROC Curve; x-axis: 1-Specificity; y-axis: Sensitivity]

Summary LR+

[Forest plot; x-axis: Risk ratio, ticks at .01, 1, 100]

Study            Risk ratio (95% CI)     % Weight
Betesin          1.94 (1.34, 2.82)       9.0
Bjornstad        1.67 (0.92, 3.03)       3.1
Cohen            2.46 (1.41, 4.27)       3.9
Dagianti         5.88 (2.78, 12.44)      1.7
Galanti          12.98 (3.41, 49.37)     0.8
Jun              4.34 (1.83, 10.34)      2.0
Luotolahti       1.94 (1.15, 3.28)       5.0
Marangelli       6.53 (2.89, 14.76)      2.3
Marwick1         5.64 (3.25, 9.78)       4.0
Marwick2         2.59 (1.79, 3.75)       9.8
Marwick3         4.47 (2.90, 6.88)       5.1
Roger1           3.93 (2.16, 7.14)       4.5
Roger2           1.28 (1.06, 1.53)       25.5
Roger3           1.42 (0.91, 2.23)       7.1
Ryan             4.69 (3.09, 7.12)       10.1
Tawa             5.64 (1.59, 20.03)      1.2
Williams         7.25 (2.85, 18.46)      1.6
Crouse           7.02 (3.09, 15.93)      3.4
Overall          3.06 (2.72, 3.45)

Summary LR-

[Forest plot; x-axis: Risk ratio, ticks at .01, 1, 100]

Study            Risk ratio (95% CI)     % Weight
Betesin          0.06 (0.02, 0.18)       3.5
Bjornstad        0.16 (0.04, 0.74)       0.9
Cohen            0.10 (0.03, 0.42)       2.4
Dagianti         0.11 (0.03, 0.42)       3.6
Galanti          0.04 (0.01, 0.28)       3.8
Jun              0.04 (0.01, 0.31)       2.7
Luotolahti       0.06 (0.02, 0.20)       1.9
Marangelli       0.08 (0.03, 0.23)       5.3
Marwick1         0.33 (0.22, 0.49)       10.6
Marwick2         0.08 (0.03, 0.19)       6.5
Marwick3         0.19 (0.10, 0.36)       8.5
Roger1           0.48 (0.36, 0.63)       9.8
Roger2           0.46 (0.29, 0.75)       5.0
Roger3           0.64 (0.39, 1.03)       3.2
Ryan             0.13 (0.08, 0.19)       16.5
Tawa             0.07 (0.02, 0.29)       2.3
Williams         0.19 (0.09, 0.40)       4.8
Crouse           0.12 (0.07, 0.18)       8.8
Overall          0.21 (0.18, 0.24)

Homogeneity Tests, Echo

• Sensitivity: p = .43

• Specificity: p = .059

• + Likelihood Ratio: p = .018

• - Likelihood Ratio: p = .008

• ROC curve: p < .0001

• DOR: p < .0000001

. metareg lnor pmi, wsse(selnor) eform

Meta-regression                                       Number of studies =    18

Fit of model without heterogeneity (tau2=0):          Q (16 df) = 278.123
                                                      Prob > Q  = 0.000
Proportion of variation due to heterogeneity          I-squared = 0.942
REML estimate of between-study variance:              tau2      = 1.1220
------------------------------------------------------------------------------
        lnor |     exp(b)   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pmi |     1.0103   .0131902     0.78    0.444    .9827211    1.038652
------------------------------------------------------------------------------

Prior MI & Diagnostic OR Univariate Analysis
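The I-squared value in the output follows from Q and its degrees of freedom as (Q - df) / Q. A minimal sketch, using the Q and df reported above; the function name is an assumption.

```python
def i_squared(q: float, df: int) -> float:
    """Proportion of total variation attributed to heterogeneity, floored at 0."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q)


# Q and df as reported in the metareg output above
print(round(i_squared(278.123, 16), 3))
```

When Q is smaller than its degrees of freedom, the raw ratio is negative, so the sketch reports 0 instead.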

. metareg lnor men, wsse(selnor) eform



REML estimate of between-study variance:              tau2      = 1.1610
------------------------------------------------------------------------------
        lnor |     exp(b)   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         men |   1.008248   .0087052     0.95    0.356    .9899616    1.026872
------------------------------------------------------------------------------

Gender and Diagnostic OR Univariate Analysis

. metareg lnor age, wsse(selnor) eform



REML estimate of between-study variance:              tau2      = 0.5732
------------------------------------------------------------------------------
        lnor |     exp(b)   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .827714   .0432287    -3.62    0.002    .7409641    .9246203
------------------------------------------------------------------------------

Age and Diagnostic OR Univariate Analysis

Mean age = 59

For each year of age above the mean, the DOR is multiplied by 0.8277 (the exp(b) reported under the eform option), i.e., it decreases by about 17% per year
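Because the eform option reports exp(b) and not b, the age coefficient of 0.827714 is a per-year multiplier on the diagnostic odds ratio. The sketch below shows how that multiplier compounds over several years; the function name and the 5-year example are assumptions.

```python
import math

EXP_B_AGE = 0.827714  # exp(b) for age, as reported in the metareg output


def dor_multiplier(exp_b: float, years: float) -> float:
    """Multiplicative change in the DOR for a covariate difference of `years`."""
    return exp_b ** years


print(f"log-scale coefficient b = {math.log(EXP_B_AGE):.3f}")
print(f"1 year older: DOR x {dor_multiplier(EXP_B_AGE, 1):.2f}")
print(f"5 years older: DOR x {dor_multiplier(EXP_B_AGE, 5):.2f}")
```

A multiplier below 1, together with the negative t statistic, indicates that studies with older populations tended to report lower diagnostic odds ratios.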

Does Fish Oil Affect HR?

Mozaffarian, Circulation 2005

• Meta-analysis of 30 RPCTs of fish oil that measured heart rate

– Including 2 trials for which unpublished HR data were obtained from authors

– Excluded: no HR measured, no placebo, < 2 weeks, organ transplants, non-blinded

• Abstracted study, intervention, and population characteristics; measurement of HR, dropout, quality

Were the trials homogeneous?

• Q test, p < .0001

• Pre-specified characteristics to explore heterogeneity

– Design, age, health, CAD, baseline HR, dose, duration, HR measure, control oil, Delphi criteria

• After stratification (or univariate analysis), meta-regression can explore multiple variables at once

– Independent heterogeneity related to baseline HR (P for interaction = 0.04) and duration (P for interaction = 0.09)

• Among 9 trials with mean baseline HR > 68 & duration > 12 weeks, HR ↓ 2.9 (p<.001) and Q test p > .05

Summary

• Diagnostic meta-analysis follows the same steps as other meta-analyses

• Follow up heterogeneity by examining study design, test/gold standard, and population characteristics

• Meta-regression is a way to explore and quantify heterogeneity using multiple co-variates (each study is a “participant”)
