mrcpsych - how to analyse diagnostic test studies (may09)

MRCPsych 2009

Critical Appraisal of Diagnostic Tests

Alex J Mitchell, Consultant in Liaison Psychiatry, University of Leicester

MRCPsych Teaching 2009

Studies of Accuracy, Validity, Screening & Case finding


Contents

1. Importance of understanding diagnostic tests

2. Concept of diagnostic tests: traits to diseases

3. Statistics of diagnostic tests

4. Clinical Value of diagnostic tests

5. Worked examples

6. Advanced techniques

1. Importance of understanding diagnostic tests


What Is a Diagnostic Test in Psychiatry?

• CT/MRI
• CSF
• Blood tests, e.g. TFTs
• SCAN/SCID/PSE/MINI
• Neuropsychological Testing
• MMSE
• HADS/BDI/CESD?
• Clinical Judgement
• Self-report

Why Is a HADS score not a diagnosis?


1. No core features
2. No symptom ranking
3. No functional assessment
4. Duration unclear
5. What if missing items?
6. Imprecise

Defining Diagnostic Testing

INTENTION
• Screening
  – The systematic application of a test or inquiry to identify individuals at sufficient risk of a specific disorder to warrant further action, among those who have not sought medical help for that disorder
• Case-Finding
  – The selected application of a test or inquiry to identify individuals with a suspected disorder and exclude those without a disorder, usually in those who have sought medical help for that disorder

APPLICATION
• Targeted (High Risk)
  – The highly selected application of a test or inquiry to identify individuals at high risk of a specific disorder by virtue of known risk factors
• Routine Screening
  – The systematic application of a test or inquiry to individuals without a known disorder (or who have not sought medical help for that disorder)

Adapted from Department of Health. Annual report of the National Screening Committee. London: DoH, 1997.

Defining Diagnostic Testing

COMPARATOR
• Accuracy
  – The degree of approximation (veracity) to a robust comparator
• Validity
  – The degree of approximation (veracity) to a criterion reference
• Precision
  – The degree of predictability (low SD) in the measure

Aims of Detection

• Screening
  – Short; easy; some false +ve accepted (lower Sp and PPV); few false –ve (high Se and NPV)
• Diagnosis (case-finding)
  – Accurate; few false +ve or –ve
• Rating
  – Simple; patient-rated; correlates with QoL and other outcomes

UK National Screening Committee Guidelines

• The condition should:
  – Be an important health issue
  – Have a well-understood history, with a detectable risk factor or disease marker
  – Have cost-effective primary preventions implemented
• The screening tool should:
  – Be a valid tool with a known cut-off
  – Be acceptable to the public
  – Have agreed diagnostic procedures
• The treatment should:
  – Be effective, with evidence of benefits of early intervention
  – Have adequate resources
  – Have appropriate policies as to who should be treated
• The screening program should:
  – Show evidence that the benefits of screening outweigh the risks
  – Be acceptable to public and professionals
  – Be cost effective (and have ongoing evaluation)
  – Have quality-assurance strategies in place

Adapted from: UK National Screening Committee. Criteria for appraising the viability, effectiveness and appropriateness of a screening programme. http://www.nsc.nhs.uk/pdfs/criteria.pdf

Development of Diagnostic Tests

Each entry gives: Stage (Type), Purpose, Description.

• Pre-clinical (Development)
  – Purpose: Development of the proposed tool or test
  – Description: Here the aim is to develop a screening method that is likely to help in the detection of the underlying disorder, either in a specific setting or in all settings. Issues of acceptability of the tool to both patients and staff must be considered in order for implementation to be successful.
• Phase I_screen (Diagnostic validity)
  – Purpose: Early diagnostic validity testing in a selected sample and refinement of tool
  – Description: The aim is to evaluate the early design of the screening method against a known (ideally accurate) standard known as the criterion reference. In early testing the tool may be refined, selecting the most useful aspects and deleting redundant aspects in order to make the tool as efficient (brief) as possible whilst retaining its value.
• Phase II_screen (Diagnostic validity)
  – Purpose: Diagnostic validity in a representative sample
  – Description: The aim is to assess the refined tool against a criterion (gold standard) in a real-world sample where the comparator subjects may comprise several competing conditions which may otherwise cause difficulty regarding differential diagnosis.
• Phase III_screen (Implementation)
  – Purpose: Screening RCT; clinicians using vs not using a screening tool
  – Description: This is an important step in which the tool is evaluated clinically in one group with access to the new method compared to a second group (ideally selected in a randomized fashion) who make assessments without the tool.
• Phase IV_screen (Implementation)
  – Purpose: Screening implementation studies using real-world outcomes
  – Description: In this last step the screening tool/method is introduced clinically but monitored to discover the effect on important patient outcomes such as new identifications, new cases treated and new cases entering remission.

Theory of Diagnostic Tests

[Figure: number of individuals against test result for Non-Depressed and Depressed groups, with a cut-off value marking the True –ve, False –ve, False +ve and True +ve regions.]

Low Prevalence (Se and Sp the same)

[Figure: number of individuals against test result for Non-Depressed and Mj Depression groups, with a cut-off value. False +ve: LARGE. False –ve: SMALL.]

High Prevalence (Se and Sp the same)

[Figure: number of individuals against test result for Non-Depressed and Mj+Mn Depression groups, with a cut-off value. False +ve: SMALL. False –ve: LARGE.]

2. Concepts of Diagnostic Tests: Trait / Syndrome / Disease

Can This Help establish a syndrome?

Example: A Clear Disease [#1]

[Figure: number of individuals against test result for No Disorder and Disorder groups, separated at a Point of Partial Rarity. Regions labelled True –ve, False +ve, False –ve and True +ve.]

Example: A Probable Syndrome [#2]

[Figure: No Disorder and Disorder groups plotted against MMSE Cognitive Score. Regions labelled False +ve, False –ve and True +ve.]

Example: A Normally Distributed Trait [#3]

[Figure: No Disorder and Disorder groups plotted against MMSE Cognitive Score. Regions labelled False +ve, False –ve and True +ve.]

Example: Dementia

Disease? Syndrome? Trait?

[Figure: MMSE scores for dementia (n=72) and non-dementia (n=2735). Huppert et al (2005) BMC Geriatrics]

Example: Depression

Disease / Syndrome / Trait

[Figure: Scores on the CES-D during Pregnancy, 3 and 12 months Post-partum in 947 Women. Series: Early Pregnancy; 3 months Post-Partum; 12 months Post-Partum. Score bands labelled Healthy, Mild Depression, Depressive Symptoms, and Moderate to Severe Depression. Mitchell, Coyne et al (2008)]

PHQ9 Linear distribution

[Figure: distribution of PHQ9 scores (x-axis: scores Zero to Eighteen; y-axis: 0 to 35). Series: PHQ9 (Major Depression); PHQ9 (Minor Depression); PHQ9 (Non-Depressed). Baker-Glen, Mitchell et al (2008)]

[Figure: distribution of scores (x-axis: scores Zero to Eighteen; y-axis: 0 to 3000). Thompson et al (2001), n=18,414]

3. Statistics of Diagnostic Tests: 2x2s

Accuracy 2x2 Table

               Depression PRESENT    Depression ABSENT
Test +ve       True +ve (TP)         False +ve (FP)        PPV
Test -ve       False -ve (FN)        True -ve (TN)         NPV
               Sensitivity           Specificity           Prevalence

               Reference Standard    Reference Standard
               Disorder Present      No Disorder
Test +ve       A                     B                     PPV = A / (A + B)
Test -ve       C                     D                     NPV = D / (C + D)
Total          Sn = A / (A + C)      Sp = D / (B + D)

Basic Measures of Accuracy

• Sensitivity (Se) = a/(a + c) = TP / (TP + FN)
  – A measure of accuracy defined as the proportion of patients with disease in whom the test result is positive
• Specificity (Sp) = d/(b + d) = TN / (TN + FP)
  – A measure of accuracy defined as the proportion of patients without disease in whom the test result is negative
• Positive Predictive Value (PPV) = a/(a + b) = TP / (TP + FP)
  – A measure of rule-in accuracy defined as the proportion of true positives among those who screen positive
• Negative Predictive Value (NPV) = d/(c + d) = TN / (TN + FN)
  – A measure of rule-out accuracy defined as the proportion of true negatives among those who screen negative
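The four basic measures can be sketched in a few lines of Python. The function name and the counts below are illustrative assumptions, not data from any study cited in these slides.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Return the four basic accuracy measures from 2x2 cell counts.

    tp = a (true +ve), fp = b (false +ve), fn = c (false -ve), tn = d (true -ve).
    """
    return {
        "sensitivity": tp / (tp + fn),  # a / (a + c)
        "specificity": tn / (tn + fp),  # d / (b + d)
        "ppv": tp / (tp + fp),          # a / (a + b)
        "npv": tn / (tn + fn),          # d / (c + d)
    }


# Hypothetical counts, chosen only to illustrate the arithmetic
measures = accuracy_measures(tp=40, fp=20, fn=10, tn=130)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

Note that sensitivity and specificity are calculated down the columns of the 2x2 table (within the reference standard groups), whereas PPV and NPV are calculated along the rows (within the test result groups).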


Accuracy in words

• Sensitivity
  – The chance of testing positive among those with the condition
  – The chance of rejecting the null hypothesis among those that do not satisfy the null hypothesis
• Specificity
  – The chance of testing negative among those without the condition
  – The chance of accepting the null hypothesis among those that satisfy the null hypothesis
• Positive Predictive Value
  – The chance of having the condition among those that test positive
  – The chance of not satisfying the null hypothesis among those that reject the null hypothesis
• Negative Predictive Value
  – The chance of not having the condition among those that test negative
  – The chance of satisfying the null hypothesis among those that accept the null hypothesis
• Type I Error or α (alpha) or p-Value or false positive rate
  – The chance of testing positive among those without the condition
  – The chance of rejecting the null hypothesis among those that satisfy the null hypothesis
• Type II Error or β (beta) or false negative rate
  – The chance of testing negative among those with the condition
  – The chance of accepting the null hypothesis among those that do not satisfy the null hypothesis
• False Discovery Rate or q-Value
  – The chance of not having the condition among those that test positive
  – The chance of satisfying the null hypothesis among those that reject the null hypothesis
• False Omission Rate
  – The chance of having the condition among those that test negative
  – The chance of not satisfying the null hypothesis among those that accept the null hypothesis

Rule-in Accuracy

               Depression PRESENT           Depression ABSENT
Test +ve       True +ve                     False +ve (type I error)    PPV (discrimination)
Test -ve       False –ve (type II error)    True -ve                    NPV
               Sensitivity (occurrence)     Specificity                 Prevalence

Rule-Out Accuracy

               Depression PRESENT           Depression ABSENT
Test -ve       False –ve (type II error)    True -ve                    NPV (discrimination)
               Sensitivity                  Specificity (occurrence)    Prevalence

Likelihood Ratios

• Likelihood Ratio for Positive Tests (LR+)
  – The chance of testing positive among those with the condition, divided by the chance of testing positive among those without the condition
  – Sensitivity / (1 – Specificity)
  – [ TP / (TP + FN) ] / [ FP / (FP + TN) ]
  – = post-test odds / pre-test odds = [ PPV / (1 – PPV) ] / [ Prevalence / (1 – Prevalence) ]
• Likelihood Ratio for Negative Tests (LR–)
  – The chance of testing negative among those with the condition, divided by the chance of testing negative among those without the condition
  – (1 – Sensitivity) / Specificity
  – [ FN / (FN + TP) ] / [ TN / (TN + FP) ]
  – = post-test odds / pre-test odds = [ (1 – NPV) / NPV ] / [ Prevalence / (1 – Prevalence) ]
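As a rough sketch of how likelihood ratios are used, the Python below converts a pre-test probability into a post-test probability through odds (post-test odds = pre-test odds × LR). The function names and the values of sensitivity, specificity and pre-test probability are illustrative assumptions only.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, lr):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Hypothetical test: Se = 0.80, Sp = 0.90, applied where prevalence is 20%
lr_pos, lr_neg = likelihood_ratios(0.80, 0.90)
p_after_positive = post_test_probability(0.20, lr_pos)  # equals the PPV
p_after_negative = post_test_probability(0.20, lr_neg)  # equals 1 - NPV
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"After a positive test: {p_after_positive:.2f}")
print(f"After a negative test: {p_after_negative:.2f}")
```

When the pre-test probability is set to the prevalence, the post-test probability after a positive result is the PPV, and the post-test probability after a negative result is 1 – NPV.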


Summary Measures

• Youden's J
  – Sensitivity + Specificity – 1
• Predictive Summary Index (PSI)
  – PPV + NPV – 1
• Overall accuracy (fraction correct)
  – (TP + TN) / (TP + FP + TN + FN)

Reciprocal Measures

• Number Needed to Diagnose (NND)
  – 1 / (Youden's J)
• Number Needed to Predict (NNP)
  – 1 / (PSI)
• Number Needed to Screen (NNS)
  – 1 / (FC – FiC), where FC is the fraction correct and FiC is the fraction incorrect
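The summary and reciprocal measures can be computed together from the four cell counts. This is a minimal sketch with hypothetical counts; the function name is an assumption, and the reciprocals are undefined when the corresponding index equals zero.

```python
def summary_and_reciprocal_measures(tp, fp, fn, tn):
    """Youden's J, PSI and fraction correct, with their reciprocal measures."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)

    youden = sensitivity + specificity - 1
    psi = ppv + npv - 1
    fraction_correct = (tp + tn) / n
    fraction_incorrect = (fp + fn) / n

    return {
        "youden": youden,
        "psi": psi,
        "fraction_correct": fraction_correct,
        "nnd": 1 / youden,  # Number Needed to Diagnose
        "nnp": 1 / psi,     # Number Needed to Predict
        "nns": 1 / (fraction_correct - fraction_incorrect),  # Number Needed to Screen
    }


# Hypothetical counts, for illustration only
result = summary_and_reciprocal_measures(tp=40, fp=20, fn=10, tn=130)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```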

Murphy JM, Berwick DM, Weinstein MC, Borus JF, Budman SH, Klerman GL 1987 : Performance of screening and diagnostic tests: Application of Receiver Operating Characteristic ROC analysis. Arch Gen Psychiatry 44:550-555

Receiver Operating Characteristic


Test vs Major Depression

               Depression PRESENT    Depression ABSENT    Total
Test +ve       500                   1500                 2000
Test -ve       500                   4500                 5000
Total          1000                  6000                 7000

Sensitivity 50% (500/1000)
Specificity 75% (4500/6000)
PPV 25% (500/2000)
NPV 90% (4500/5000)
Prevalence 14% (1000/7000)

Test vs Major + Min Depression

               Depression PRESENT    Depression ABSENT    Total
Test +ve       500                   1500                 2000
Test -ve       500                   500                  1000
Total          1000                  2000                 3000

Sensitivity 50% (500/1000)
Specificity 25% (500/2000)
PPV 25% (500/2000)
NPV 50% (500/1000)
Prevalence 33% (1000/3000)
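To see how predictive values move with prevalence when sensitivity and specificity are held fixed, the sketch below applies Bayes' theorem directly. The function name and the values (Se = Sp = 0.80, at prevalences of 5% and 50%) are hypothetical and chosen only for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test (Se = Sp = 0.80) in a low and a high prevalence setting
ppv_low, npv_low = predictive_values(0.80, 0.80, prevalence=0.05)
ppv_high, npv_high = predictive_values(0.80, 0.80, prevalence=0.50)
print(f"Prevalence  5%: PPV = {ppv_low:.2f}, NPV = {npv_low:.2f}")
print(f"Prevalence 50%: PPV = {ppv_high:.2f}, NPV = {npv_high:.2f}")
```

With these assumed values the same test rules out well but rules in poorly at low prevalence, and the PPV rises as prevalence increases.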

4. Clinical Value of Diagnostic Tests


Added Value

• Definition 1:
  – The additional ability of a test to rule-in or rule-out compared with the baseline rate
  – PPV minus prevalence
  – NPV minus prevalence
• Definition 2:
  – The additional ability of a test to rule-in or rule-out compared with the unassisted rate
  – PPV test minus PPV no test (assuming equal prevalence)
  – LR+ test minus LR+ no test
  – AUC test minus AUC no test

[Figure: proportion of individuals with each symptom (y-axis: 0.00 to 1.00). Series: All Case Proportion; Depressed Proportion; Non-Depressed Proportion. Symptoms shown, in order: Loss of energy; Diminished drive; Sleep disturbance; Concentration/indecision; Depressed mood; Anxiety; Diminished concentration; Insomnia; Diminished interest/pleasure; Psychic anxiety; Helplessness; Worthlessness; Hopelessness; Somatic anxiety; Thoughts of death; Anger; Excessive guilt; Psychomotor change; Indecisiveness; Decreased appetite; Psychomotor agitation; Psychomotor retardation; Decreased weight; Lack of reactive mood; Increased appetite; Hypersomnia; Increased weight.]

Mitchell, Zimmerman et al. MIDAS Database. Psychol Med 2007 (submitted)

[Figure: added value of each symptom (y-axis: -0.10 to 0.50). Series: Rule-In Added Value (PPV-Prev); Rule-Out Added Value (NPV-Prev). Symptoms shown, in order: Anger; Anxiety; Decreased appetite; Decreased weight; Depressed mood; Diminished concentration; Diminished drive; Diminished interest/pleasure; Excessive guilt; Helplessness; Hopelessness; Hypersomnia; Increased appetite; Increased weight; Indecisiveness; Insomnia; Lack of reactive mood; Loss of energy; Psychic anxiety; Psychomotor agitation; Psychomotor change; Psychomotor retardation; Sleep disturbance; Somatic anxiety; Thoughts of death; Worthlessness.]

Accuracy of Tests: Visual

[Figure: probability scale running from 0% (very unlikely) through 25% (unlikely) and 75% (likely) to 100% (very likely), with one bar per method. Methods labelled: 1 Question; 2 Questions; PHQ-2; WHO5 (1+3); Overall. Bar annotations as extracted (their pairing with the methods is not recoverable from the text):
  3% - (37) - 63% = 60%
  3% - (16) - 32% = 29%
  3% - (16) - 32% = 29%
  10% - (22) - 50% = 54%
  32% - (37) - 96% = 64%
Sources and reference standards: Henckel et al (2004) Eur Arch Psychiatry Clin Neurosci, CIDI (computer) Any Depression; Arroll B et al (2003) BMJ, CIDI (computer) Mj Depression.]

[Figure: post-test probability (y-axis, 0.00 to 1.00) against pre-test probability (x-axis, 0 to 1). Curves: Baseline Probability; Clinician Positive (Fallowfield et al, 2001); Clinician Negative (Fallowfield et al, 2001); HADS-D Positive (Meta-analysis); HADS-D Negative (Meta-analysis).]

[Figure: post-test probability (y-axis, 0.00 to 1.00) against pre-test probability (x-axis, 0 to 1). Curves: Prior Probability; Depression Present (Routine); Depression Absent (Routine); Depression Scales +ve (Median); Depression Scales -ve (Median). Annotations: PPV = 0.41 and NPV = 0.97 at a prevalence of 0.15.]

5. Worked Examples of diagnostic tests


Post-Stroke Mj Depression vs Non-Mj

• Clinician's diagnosis using DSMIV vs SCAN/PSE
• Using the SCAN:
  – 50 people with major depression
  – 150 healthy people
  – 50 with minor depression

Clinicians using DSMIV

• Clinicians diagnosed 52 cases with Mj depression
• The specificity of DSMIV was 95%

• Q. What was the sensitivity?
• Q. What was the prevalence?
• Q. What was the PPV?
• Q. What was the % correctly identified per every 100 screened?

Test vs Major Depression

                         Depression on SCAN    Depression ABSENT    Total
Test +ve (Clinician)     ?                     ?                    52
Test -ve                 ?                     ?
Total                    50                    200

Sensitivity ??%
Specificity 95%
PPV ??%
NPV ??%
Prevalence ??%
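One way to work through the questions is sketched below in Python. It assumes that the 50 patients with minor depression count as non-major (so 200 are "absent" on the reference standard, as in the table totals) and that the 95% specificity applies to those 200; the variable names are illustrative.

```python
# Given in the worked example
n_present = 50        # major depression on SCAN
n_absent = 150 + 50   # healthy + minor depression, both treated as non-major
test_positive = 52    # clinician (DSMIV) diagnoses of major depression
specificity = 0.95

# Fill in the 2x2 table from the margins
tn = round(specificity * n_absent)  # true negatives
fp = n_absent - tn                  # false positives
tp = test_positive - fp             # true positives
fn = n_present - tp                 # false negatives

n_total = n_present + n_absent
sensitivity = tp / n_present
prevalence = n_present / n_total
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
correct_per_100 = 100 * (tp + tn) / n_total

print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
print(f"Sensitivity {sensitivity:.0%}, Prevalence {prevalence:.0%}")
print(f"PPV {ppv:.0%}, NPV {npv:.0%}")
print(f"Correctly identified per 100 screened: {correct_per_100:.1f}")
```

Under these assumptions the completed cells are TP = 42, FP = 10, FN = 8 and TN = 190, giving a sensitivity of 84% and a prevalence of 20%.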


Symptoms               N dep   Dep+sx   Se     N non   Non-sx   Sp     PPV    NPV    UI+    UI-    II       NNS     NND     NNP
Persistent low mood    50      45       0.90   200     184      0.92   0.74   0.97   0.66   0.90   83.20    1.20    1.22    1.41
Loss of interest       50      48       0.96   200     156      0.78   0.52   0.99   0.50   0.77   63.20    1.58    1.35    1.96
Loss of drive          50      40       0.80   200     120      0.60   0.33   0.92   0.27   0.55   28       3.57    2.50    3.90
Low energy             50      49       0.98   200     20       0.10   0.21   0.95   0.21   0.10   -44.80   -2.23   12.50   6.01
Insomnia               50      35       0.70   200     136      0.68   0.35   0.90   0.25   0.61   36.80    2.72    2.63    3.93
Poor appetite          50      25       0.50   200     178      0.89   0.53   0.88   0.27   0.78   62.40    1.60    2.56    2.45
Suicidal thoughts      50      2        0.04   200     196      0.98   0.33   0.80   0.01   0.79   58.40    1.71    50      7.32
Poor concentration     50      28       0.56   200     114      0.57   0.25   0.84   0.14   0.48   13.60    7.35    7.69    11.93
Poor orientation       50      10       0.20   200     164      0.82   0.22   0.80   0.04   0.66   39.20    2.55    50      46.92
Anger                  50      17       0.34   200     172      0.86   0.38   0.84   0.13   0.72   51.20    1.95    5       4.61
DSMIV algorithm        50      42       0.84   200     190      0.95   0.81   0.96   0.68   0.91   85.60    1.17    1.27    1.30

Key to columns:
N dep = Post-Stroke Depression by reference standard
Dep+sx = Post-Stroke Depression with symptom
Se = Sensitivity
N non = No Post-Stroke Depression by reference standard
Non-sx = Non Depressed Stroke Patient without symptom
Sp = Specificity
UI+ = Positive Utility Index
UI- = Negative Utility Index
II = Identification Index (%)
NNS = Number Needed to Screen; NND = Number Needed to Diagnose; NNP = Number Needed to Predict
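The utility and identification indices in the table are not defined on the slides. The sketch below assumes the positive utility index is sensitivity × PPV, the negative utility index is specificity × NPV, and the identification index is fraction correct minus fraction incorrect; these assumptions reproduce the tabulated values for the DSMIV algorithm row (42 of 50 depressed patients detected, 190 of 200 non-depressed patients correctly excluded).

```python
def utility_indices(tp, fp, fn, tn):
    """Utility and identification indices under the assumptions stated above."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "ui_pos": sensitivity * ppv,  # assumed: Se x PPV
        "ui_neg": specificity * npv,  # assumed: Sp x NPV
        "identification_index": (tp + tn) / n - (fp + fn) / n,
    }


# DSMIV algorithm row: 50 depressed (42 detected), 200 non-depressed (190 excluded)
dsmiv = utility_indices(tp=42, fp=10, fn=8, tn=190)
print(f"UI+ = {dsmiv['ui_pos']:.2f}")
print(f"UI- = {dsmiv['ui_neg']:.2f}")
print(f"Identification index = {100 * dsmiv['identification_index']:.2f}%")
print(f"NNS = {1 / dsmiv['identification_index']:.2f}")
```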

6. Advanced Techniques

• sROC
• Real World Numbers
• NND; NNS
• Bivariate meta-analysis
• Economics

PPV DT Distress = 55%; PPV Other Methods 65%

ROC Plot

[Figure: ROC plot of sensitivity (y-axis, 0.00 to 1.00) against 1 – specificity (x-axis, 0.00 to 1.00). Plotted: Low Mood; DSMIV; Low mood & loss of interest.]
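A ROC plot is built by calculating sensitivity and 1 – specificity at every possible cut-off. The sketch below does this for made-up scale scores (not data from the slides) and estimates the area under the curve with the trapezoidal rule; the function names and the "score at or above cut-off is positive" convention are assumptions.

```python
def roc_points(case_scores, noncase_scores):
    """Return sorted (1 - specificity, sensitivity) points, one per cut-off."""
    cutoffs = sorted(set(case_scores) | set(noncase_scores))
    points = {(0.0, 0.0), (1.0, 1.0)}  # end points of the curve
    for cutoff in cutoffs:
        # A score at or above the cut-off counts as a positive test
        sens = sum(s >= cutoff for s in case_scores) / len(case_scores)
        fpr = sum(s >= cutoff for s in noncase_scores) / len(noncase_scores)
        points.add((fpr, sens))
    return sorted(points)


def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scale scores, for illustration only
cases = [12, 15, 9, 20, 7]
noncases = [3, 5, 8, 2, 10, 4]
points = roc_points(cases, noncases)
auc = area_under_curve(points)
print(f"AUC = {auc:.2f}")
```

A single symptom or a fixed algorithm, as in the plot above, gives one point on this graph, whereas a scale with many possible cut-offs traces a curve.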


Bivariate Diagnostic meta-analysis

Summary of measures

• Youden Index
  – Basic formula: sensitivity + specificity – 1
  – Strength: Relatively independent of prevalence
  – Weakness: Not clinically interpretable; requires application of criterion (gold) standard; does not assess ratio of false positives to negatives
  – Reciprocal absolute benefit: Number Needed to Diagnose
  – Reciprocal absolute benefit formula: NND = 1/Youden
• Predictive Summary Index
  – Basic formula: PPV + NPV – 1
  – Strength: Measures gain; clinically applicable
  – Weakness: Dependent on prevalence; places equal weight on rule-in and rule-out accuracy
  – Reciprocal absolute benefit: Number Needed to Predict
  – Reciprocal absolute benefit formula: NNP = 1/PSI
• Overall Accuracy (Fraction Correct)
  – Basic formula: (TP + TN) / (TP + FP + TN + FN)
  – Strength: Measures real number of correct identifications vs misidentifications; can be easily converted into a percentage
  – Weakness: Requires application of criterion (gold) standard
  – Reciprocal absolute benefit: Number Needed to Screen
  – Reciprocal absolute benefit formula: NNS = 1/Identification Index

Further Reading

• Grimes DA, Schulz KF. Uses and abuses of screening tests. Lancet 2002; 359: 881–84
• Deeks JJ, Altman DG. Diagnostic tests 4: likelihood ratios. BMJ 2004; 329 (17 July)
• Bossuyt PM, Irwig L, Craig J, Glasziou P. Comparative accuracy: assessing new tests against existing diagnostic pathways. BMJ 2006; 332: 1089–1092
• Reitsma JB et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. Journal of Clinical Epidemiology 2005; 58: 982–990
