mrcpsych - how to analyse diagnostic test studies (may09)
DESCRIPTION
Everything you wanted to know about diagnostic studies but were afraid to ask! MRCPsych 2009, Leicester
TRANSCRIPT
MRCPsych 2009
Critical Appraisal of Diagnostic Tests
Alex J Mitchell, Consultant in Liaison Psychiatry, University of Leicester
MRCPsych Teaching 2009
Studies of Accuracy, Validity, Screening & Case finding
Contents
1. Importance of understanding diagnostic tests
2. Concept of diagnostic tests: traits to diseases
3. Statistics of diagnostic tests
4. Clinical Value of diagnostic tests
5. Worked examples
6. Advanced techniques
1. Importance of understanding diagnostic tests
What Is a Diagnostic Test in Psychiatry?
• CT/MRI
• CSF
• Blood tests, e.g. TFTs
• SCAN/SCID/PSE/MINI
• Neuropsychological Testing
• MMSE
• HADS/BDI/CESD?
• Clinical Judgement
• Self-report
Why Is a HADS score not a diagnosis?
1. No core features
2. No symptom ranking
3. No functional assessment
4. Duration unclear
5. What if items are missing?
6. Imprecise
Defining Diagnostic Testing
• INTENTION
  • Screening
    – The systematic application of a test or inquiry to identify individuals at sufficient risk of a specific disorder to warrant further action, among those who have not sought medical help for that disorder
  • Case-Finding
    – The selected application of a test or inquiry to identify individuals with a suspected disorder and exclude those without a disorder, usually in those who have sought medical help for that disorder
• APPLICATION
  • Targeted (High Risk)
    – The highly selected application of a test or inquiry to identify individuals at high risk of a specific disorder by virtue of known risk factors
  • Routine Screening
    – The systematic application of a test or inquiry to individuals without a known disorder (or who have not sought medical help for that disorder)
Adapted from Department of Health. Annual report of the national screening committee. London: DoH, 1997.
Defining Diagnostic Testing
• COMPARATOR
  • Accuracy
    – The degree of approximation (veracity) to a robust comparator
  • Validity
    – The degree of approximation (veracity) to a criterion reference
  • Precision
    – The degree of predictability (low SD) in the measure
Aims of Detection
• Screening
  – Short; easy; some false +ve (low Sp, PPV), few false –ve (high Se, NPV)
• Diagnosis (case-finding)
  – Accurate; few false +ve or –ve
• Rating
  – Simple, patient rated, correlates with QoL and other outcomes
UK National Screening Committee Guidelines
• The condition should:
  • Be an important health issue
  • Have a well-understood history, with a detectable risk factor or disease marker
  • Have cost-effective primary preventions implemented.
• The screening tool should:
  • Be a valid tool with known cut-off
  • Be acceptable to the public
  • Have agreed diagnostic procedures.
• The treatment should:
  • Be effective, with evidence of benefits of early intervention
  • Have adequate resources
  • Have appropriate policies as to who should be treated.
• The screening program should:
  • Show evidence that the benefits of screening outweigh the risks
  • Be acceptable to public and professionals
  • Be cost effective (and have ongoing evaluation)
  • Have quality-assurance strategies in place.
Adapted from: UK National Screening Committee, Criteria for appraising the viability, effectiveness and appropriateness of a screening programme
http://www.nsc.nhs.uk/pdfs/criteria.pdf
Development of Diagnostic Tests

Stage: Pre-clinical
Type: Development
Purpose: Development of the proposed tool or test
Description: Here the aim is to develop a screening method that is likely to help in the detection of the underlying disorder, either in a specific setting or in all settings. Issues of acceptability of the tool to both patients and staff must be considered in order for implementation to be successful.

Stage: Phase I_screen
Type: Diagnostic validity
Purpose: Early diagnostic validity testing in a selected sample and refinement of tool
Description: The aim is to evaluate the early design of the screening method against a known (ideally accurate) standard known as the criterion reference. In early testing the tool may be refined, selecting the most useful aspects and deleting redundant aspects in order to make the tool as efficient (brief) as possible whilst retaining its value.

Stage: Phase II_screen
Type: Diagnostic validity
Purpose: Diagnostic validity in a representative sample
Description: The aim is to assess the refined tool against a criterion (gold standard) in a real-world sample where the comparator subjects may comprise several competing conditions which may otherwise cause difficulty regarding differential diagnosis.

Stage: Phase III_screen
Type: Implementation
Purpose: Screening RCT; clinicians using vs not using a screening tool
Description: This is an important step in which the tool is evaluated clinically in one group with access to the new method compared to a second group (ideally selected in a randomized fashion) who make assessments without the tool.

Stage: Phase IV_screen
Type: Implementation
Purpose: Screening implementation studies using real-world outcomes
Description: In this last step the screening tool or method is introduced clinically but monitored to discover the effect on important patient outcomes such as new identifications, new cases treated and new cases entering remission.
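Theory of Diagnostic Tests
[Figure: overlapping distributions of test results for non-depressed and depressed individuals. x-axis: test result; y-axis: number of individuals. A cut-off value marks out true -ve, false -ve, false +ve and true +ve regions.]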
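Low Prevalence (Se, Sp = same)
[Figure: distributions of test results for non-depressed individuals and those with major depression. x-axis: test result; y-axis: number of individuals. At the cut-off value the false +ve region is LARGE and the false –ve region is SMALL.]
High Prevalence (Se, Sp = same)
[Figure: distributions of test results for non-depressed individuals and those with major + minor depression. x-axis: test result; y-axis: number of individuals. At the cut-off value the false +ve region is SMALL and the false –ve region is LARGE.]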
2. Concepts of Diagnostic Tests: Trait / Syndrome / Disease
Can This Help establish a syndrome?
Example: A Clear Disease [#1]
[Figure: distributions for No Disorder and Disorder separated by a point of partial rarity. x-axis: test result; y-axis: number of individuals. True -ve, false +ve, false -ve and true +ve regions are marked.]
Example: A Probable Syndrome [#2]
[Figure: distributions for No Disorder and Disorder. x-axis: MMSE cognitive score; y-axis: number of individuals. True -ve, false +ve, false -ve and true +ve regions are marked.]
Example: A Normally Distributed Trait [#3]
[Figure: distributions for No Disorder and Disorder. x-axis: MMSE cognitive score; y-axis: number of individuals. True -ve, false +ve, false -ve and true +ve regions are marked.]
Example: Dementia
Disease? Syndrome? Trait?
Huppert et al (2005) BMC Geriatrics
MMSE scores for dementia (n=72) and non-dementia (n=2735)
Example: Depression
Disease / Syndrome / Trait
Mitchell, Coyne et al (2008)
[Figure: Scores on the CES-D during pregnancy and at 3 and 12 months post-partum in 947 women. Series: early pregnancy, 3 months post-partum, 12 months post-partum. Categories: healthy, depressive symptoms, mild depression, moderate to severe depression. Axis scale 0 to 110.]
PHQ9 Linear distribution
[Figure: distribution of PHQ9 scores from zero to eighteen (axis scale 0 to 35) for three groups: PHQ9 (Major Depression), PHQ9 (Minor Depression) and PHQ9 (Non-Depressed).]
Baker-Glen, Mitchell et al (2008)
Thompson et al (2001) n=18,414
[Figure: number of individuals (axis scale 0 to 3000) at each score from zero to eighteen.]
3. Statistics of Diagnostic Tests: 2x2s
Accuracy 2x2 Table

            Depression PRESENT   Depression ABSENT
Test +ve    True +ve             False +ve            PPV
Test -ve    False -ve            True -ve             NPV
            Sensitivity          Specificity          Prevalence

            Reference Standard   Reference Standard
            Disorder Present     No Disorder
Test +ve    A                    B                    PPV = A / (A + B)
Test -ve    C                    D                    NPV = D / (C + D)
Total       Sn = A / (A + C)     Sp = D / (B + D)
Accuracy 2x2 Table

            Depression PRESENT   Depression ABSENT
Test +ve    TP                   FP                   PPV
Test -ve    FN                   TN                   NPV
            Sensitivity          Specificity          Prevalence
Basic Measures of Accuracy
• Sensitivity (Se) = a/(a + c) = TP / (TP + FN)
  • A measure of accuracy defined as the proportion of patients with disease in whom the test result is positive
• Specificity (Sp) = d/(b + d) = TN / (TN + FP)
  • A measure of accuracy defined as the proportion of patients without disease in whom the test result is negative
• Positive Predictive Value = a/(a + b) = TP / (TP + FP)
  • A measure of rule-in accuracy defined as the proportion of true positives among those with a positive screening result
• Negative Predictive Value = d/(c + d) = TN / (TN + FN)
  • A measure of rule-out accuracy defined as the proportion of true negatives among those with a negative screening result
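The four basic measures can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `accuracy_measures` and the counts are hypothetical, chosen only to show the arithmetic on a 2x2 table.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Return the basic accuracy measures for one 2x2 table of counts."""
    return {
        "sensitivity": tp / (tp + fn),   # a / (a + c)
        "specificity": tn / (tn + fp),   # d / (b + d)
        "ppv": tp / (tp + fp),           # a / (a + b)
        "npv": tn / (tn + fn),           # d / (c + d)
        "prevalence": (tp + fn) / (tp + fp + fn + tn),
    }


# Illustrative counts only (not taken from any study).
example = accuracy_measures(tp=80, fp=30, fn=20, tn=170)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are computed within the columns of the table (those with and without the disorder), whereas PPV and NPV are computed within the rows (those testing positive and negative).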
Accuracy in words
• Sensitivity
  – The chance of testing positive among those with the condition
  – The chance of rejecting the null hypothesis among those that do not satisfy the null hypothesis
• Specificity
  – The chance of testing negative among those without the condition
  – The chance of accepting the null hypothesis among those that satisfy the null hypothesis
• Positive Predictive Value
  – The chance of having the condition among those that test positive
  – The chance of not satisfying the null hypothesis among those that reject the null hypothesis
• Negative Predictive Value
  – The chance of not having the condition among those that test negative
  – The chance of satisfying the null hypothesis among those that accept the null hypothesis
• Type I Error rate or α (alpha) or false positive rate (the threshold against which a p-value is compared)
  – The chance of testing positive among those without the condition
  – The chance of rejecting the null hypothesis among those that satisfy the null hypothesis
• Type II Error rate or β (beta) or false negative rate
  – The chance of testing negative among those with the condition
  – The chance of accepting the null hypothesis among those that do not satisfy the null hypothesis
• False Discovery Rate (the basis of the q-value)
  – The chance of not having the condition among those that test positive
  – The chance of satisfying the null hypothesis among those that reject the null hypothesis
• False Omission Rate
  – The chance of having the condition among those that test negative
  – The chance of not satisfying the null hypothesis among those that accept the null hypothesis
Rule-in Accuracy

            Depression PRESENT          Depression ABSENT           
Test +ve    True +ve                    False +ve (type I error)    PPV (discrimination)
Test -ve    False –ve (type II error)   True -ve                    NPV
            Sensitivity (occurrence)    Specificity                 Prevalence
Rule-Out Accuracy

            Depression PRESENT          Depression ABSENT           
Test +ve    True +ve                    False +ve                   PPV
Test -ve    False –ve (type II error)   True -ve                    NPV (discrimination)
            Sensitivity                 Specificity (occurrence)    Prevalence
Likelihood Ratios
• Likelihood Ratio for Positive Tests (LR+)
  • The chance of testing positive among those with the condition, divided by the chance of testing positive among those without the condition
  • Sensitivity / (1 - Specificity)
  • [ TP / (TP + FN) ] / [ FP / (FP + TN) ]
  • = [ PPV / (1 - PPV) ] / [ Prevalence / (1 - Prevalence) ], i.e. post-test odds divided by pre-test odds
• Likelihood Ratio for Negative Tests (LR-)
  • The chance of testing negative among those with the condition, divided by the chance of testing negative among those without the condition
  • (1 - Sensitivity) / Specificity
  • [ FN / (FN + TP) ] / [ TN / (TN + FP) ]
  • = [ (1 - NPV) / NPV ] / [ Prevalence / (1 - Prevalence) ], i.e. post-test odds after a negative result divided by pre-test odds
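A short sketch of how likelihood ratios are used in practice: they convert a pre-test probability into a post-test probability by way of odds. The function names and the input values (sensitivity 0.80, specificity 0.90, pre-test probability 0.20) are hypothetical, chosen only for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Illustrative values only.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.80, specificity=0.90)
after_positive = post_test_probability(0.20, lr_pos)
after_negative = post_test_probability(0.20, lr_neg)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"Post-test probability after +ve: {after_positive:.2f}")
print(f"Post-test probability after -ve: {after_negative:.2f}")
```

With the pre-test probability set to the prevalence, the probability after a positive result equals the PPV, and the probability after a negative result equals 1 - NPV.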
Summary Measures
• Youden's J
  – Sensitivity + Specificity – 1
• Predictive Summary Index (PSI)
  – PPV + NPV – 1
• Overall accuracy (fraction correct)
  – (TP + TN) / (TP + FP + TN + FN)
Reciprocal Measures
• Number Needed to Diagnose (NND)
  – 1 / (Youden's J)
• Number Needed to Predict (NNP)
  – 1 / (PSI)
• Number Needed to Screen (NNS)
  – 1 / (FC – FiC), where FC is the fraction correct and FiC the fraction incorrect
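The summary and reciprocal measures can be sketched together as follows. This is an illustration under stated assumptions: FC is read as the fraction correct and FiC as the fraction incorrect, the function name `summary_measures` is hypothetical, and the counts are invented. The reciprocals are undefined when the index they invert is zero.

```python
def summary_measures(tp, fp, fn, tn):
    """Summary indices and their reciprocals for one 2x2 table of counts."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)

    youden_j = sensitivity + specificity - 1
    psi = ppv + npv - 1
    fraction_correct = (tp + tn) / total
    fraction_incorrect = (fp + fn) / total   # assumed meaning of FiC

    return {
        "youden_j": youden_j,
        "psi": psi,
        "fraction_correct": fraction_correct,
        "nnd": 1 / youden_j,
        "nnp": 1 / psi,
        "nns": 1 / (fraction_correct - fraction_incorrect),
    }


# Illustrative counts only (not taken from any study).
measures = summary_measures(tp=80, fp=30, fn=20, tn=170)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```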
Murphy JM, Berwick DM, Weinstein MC, Borus JF, Budman SH, Klerman GL (1987). Performance of screening and diagnostic tests: application of Receiver Operating Characteristic (ROC) analysis. Arch Gen Psychiatry 44:550-555
Receiver Operating Characteristic
Test vs Major Depression

            Depression PRESENT   Depression ABSENT   Total
Test +ve    500                  1500                2000
Test -ve    500                  4500                5000
Total       1000                 6000                7000

Sensitivity 50% (500/1000)
Specificity 75% (4500/6000)
PPV 25% (500/2000)
NPV 90% (4500/5000)
Prevalence 14% (1000/7000)
Test vs Major + Min Depression

            Depression PRESENT   Depression ABSENT   Total
Test +ve    500                  1500                2000
Test -ve    500                  500                 1000
Total       1000                 2000                3000

Sensitivity 50% (500/1000)
Specificity 25% (500/2000)
PPV 25% (500/2000)
NPV 50% (500/1000)
Prevalence 33% (1000/3000)
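The dependence of predictive values on prevalence can be sketched directly from Bayes' theorem. The sketch below holds sensitivity at 50% and specificity at 75% (the values of the first table) and varies prevalence; the function name `predictive_values` and the list of prevalences are hypothetical choices for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Same test characteristics, different prevalences.
results = {}
for prevalence in (0.05, 1000 / 7000, 0.33, 0.50):
    ppv, npv = predictive_values(0.50, 0.75, prevalence)
    results[prevalence] = (ppv, npv)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

As prevalence rises with sensitivity and specificity held fixed, PPV rises and NPV falls, which is why the same tool rules in better in high-prevalence settings and rules out better in low-prevalence settings.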
4. Clinical Value of Diagnostic Tests
Added Value
• Definition 1:
  – The additional ability of a test to rule-in or rule-out compared with the baseline rate
  – PPV minus prevalence
  – NPV minus prevalence
• Definition 2:
  – The additional ability of a test to rule-in or rule-out compared with the unassisted rate
  – PPV test minus PPV no test (assuming equal prevalence)
  – LR+ test minus LR+ no test
  – AUC test minus AUC no test
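Definition 1 is simple enough to express as a one-line calculation. The sketch below follows the definition exactly as stated above (predictive value minus prevalence); the function name `added_value` and the input values are illustrative assumptions only.

```python
def added_value(ppv, npv, prevalence):
    """Definition 1 as stated above: predictive value minus prevalence."""
    return {
        "rule_in": ppv - prevalence,
        "rule_out": npv - prevalence,
    }


# Illustrative values only.
gain = added_value(ppv=0.41, npv=0.97, prevalence=0.15)
print(f"Rule-in added value:  {gain['rule_in']:.2f}")
print(f"Rule-out added value: {gain['rule_out']:.2f}")
```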
[Figure: proportion of individuals (0.00 to 1.00) reporting each symptom, shown as All Case Proportion, Depressed Proportion and Non-Depressed Proportion. Symptoms, in plotted order: loss of energy, diminished drive, sleep disturbance, concentration/indecision, depressed mood, anxiety, diminished concentration, insomnia, diminished interest/pleasure, psychic anxiety, helplessness, worthlessness, hopelessness, somatic anxiety, thoughts of death, anger, excessive guilt, psychomotor change, indecisiveness, decreased appetite, psychomotor agitation, psychomotor retardation, decreased weight, lack of reactive mood, increased appetite, hypersomnia, increased weight.]
Mitchell, Zimmerman et al MIDAS Database. Psychol Med 2007 Submitted
[Figure: Rule-In Added Value (PPV-Prev) and Rule-Out Added Value (NPV-Prev), on a scale from -0.10 to 0.50, for each symptom in alphabetical order: anger, anxiety, decreased appetite, decreased weight, depressed mood, diminished concentration, diminished drive, diminished interest/pleasure, excessive guilt, helplessness, hopelessness, hypersomnia, increased appetite, increased weight, indecisiveness, insomnia, lack of reactive mood, loss of energy, psychic anxiety, psychomotor agitation, psychomotor change, psychomotor retardation, sleep disturbance, somatic anxiety, thoughts of death, worthlessness.]
Accuracy of Tests: Visual
[Figure: ranges plotted on a probability scale from 0% (very unlikely) through 25% (unlikely) and 75% (likely) to 100% (very likely). Rows: 1 Question, 2 Questions, PHQ-2, WHO5 (1+3), Overall. Reference standards: CIDI (computer) Any Depression and CIDI (computer) Mj Depression. Sources: Henckel et al (2004) Eur Arch Psychiatry Clin Neurosci; Arroll B et al (2003) BMJ.]
Values shown on the figure (the row each belongs to is not recoverable from the transcript):
• 3% - (37) - 63% = 60%
• 3% - (16) - 32% = 29%
• 3% - (16) - 32% = 29%
• 10% - (22) - 50% = 54%
• 32% - (37) - 96% = 64%
[Figure: post-test probability (y-axis, 0.00 to 1.00) against pre-test probability (x-axis, 0 to 1). Curves: Clinician Positive (Fallowfield et al, 2001); Clinician Negative (Fallowfield et al, 2001); HADS-D Positive (meta-analysis); HADS-D Negative (meta-analysis); Baseline Probability.]
[Figure: post-test probability (y-axis, 0.00 to 1.00) against pre-test probability (x-axis, 0 to 1). Curves: Depression Present (Routine); Depression Absent (Routine); Depression Scales +ve (Median); Depression Scales -ve (Median); Prior Probability. Annotated at a prevalence of 0.15: PPV = 0.41, NPV = 0.97.]
5. Worked Examples of diagnostic tests
PostStroke Mj Depression vs NonMj
• Clinicians' diagnosis using DSMIV vs SCAN/PSE
• Using the SCAN:
  • 50 people with major depression
  • 150 healthy people
  • 50 with minor depression
Clinicians using DSMIV
• Clinicians diagnosed 52 cases with Mj depression
• The specificity of DSMIV was 95%
• Q. What was the sensitivity?
• Q. What was the prevalence?
• Q. What was the PPV?
• Q. What was the % correctly identified per every 100 screened?
Test vs Major Depression

                       Depression on SCAN   Depression ABSENT   Total
Test +ve (Clinician)   ?                    ?                   52
Test -ve               ?                    ?
Total                  50                   200

Sensitivity ??%
Specificity 95%
PPV ??%
NPV ??%
Prevalence ??%
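One way to work through the questions is to fill in the 2x2 cells from the stated specificity and the number of clinician-positive cases. The sketch below assumes that the 200 patients without major depression are the 150 healthy people plus the 50 with minor depression, and that every clinician-positive case is either a true or a false positive; variable names are hypothetical.

```python
# Counts given on the slides (SCAN is the reference standard).
cases = 50                 # major depression on SCAN
non_cases = 150 + 50       # healthy plus minor depression (assumed non-cases)
clinician_positive = 52    # clinician diagnoses of major depression
specificity = 0.95

# Fill in the 2x2 cells from the specificity and the test-positive total.
tn = round(specificity * non_cases)
fp = non_cases - tn
tp = clinician_positive - fp
fn = cases - tp

sensitivity = tp / cases
prevalence = cases / (cases + non_cases)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
correct_per_100 = 100 * (tp + tn) / (cases + non_cases)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Sensitivity {sensitivity:.2f}, prevalence {prevalence:.2f}")
print(f"PPV {ppv:.2f}, NPV {npv:.2f}")
print(f"Correctly identified per 100 screened: {correct_per_100:.1f}")
```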
Column key: Dep = post-stroke depression by reference standard; Dep+ = post-stroke depression with symptom; Se = sensitivity; NonDep = no post-stroke depression by reference standard; NonDep- = non-depressed stroke patients without symptom; Sp = specificity; UI+ = positive utility index; UI- = negative utility index; II = identification index.

Symptoms              Dep   Dep+   Se     NonDep   NonDep-   Sp     PPV    NPV    UI+    UI-    II       NNS     NND     NNP
Persistent low mood   50    45     0.90   200      184       0.92   0.74   0.97   0.66   0.90   83.20    1.20    1.22    1.41
Loss of interest      50    48     0.96   200      156       0.78   0.52   0.99   0.50   0.77   63.20    1.58    1.35    1.96
Loss of drive         50    40     0.80   200      120       0.60   0.33   0.92   0.27   0.55   28       3.57    2.50    3.90
Low energy            50    49     0.98   200      20        0.10   0.21   0.95   0.21   0.10   -44.80   -2.23   12.50   6.01
Insomnia              50    35     0.70   200      136       0.68   0.35   0.90   0.25   0.61   36.80    2.72    2.63    3.93
Poor appetite         50    25     0.50   200      178       0.89   0.53   0.88   0.27   0.78   62.40    1.60    2.56    2.45
Suicidal thoughts     50    2      0.04   200      196       0.98   0.33   0.80   0.01   0.79   58.40    1.71    50      7.32
Poor concentration    50    28     0.56   200      114       0.57   0.25   0.84   0.14   0.48   13.60    7.35    7.69    11.93
Poor orientation      50    10     0.20   200      164       0.82   0.22   0.80   0.04   0.66   39.20    2.55    50      46.92
Anger                 50    17     0.34   200      172       0.86   0.38   0.84   0.13   0.72   51.20    1.95    5       4.61
DSMIV algorithm       50    42     0.84   200      190       0.95   0.81   0.96   0.68   0.91   85.60    1.17    1.27    1.30
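The utility and identification indices in the table are not defined on the slides. The sketch below assumes the positive utility index is sensitivity × PPV, the negative utility index is specificity × NPV, and the identification index is 100 × (fraction correct − fraction incorrect). These formulas are inferred from the tabulated values rather than stated in the source, and the function name `row_indices` is hypothetical.

```python
def row_indices(tp, cases, tn, non_cases):
    """Indices for one table row, using formulas inferred from the table."""
    fn = cases - tp
    fp = non_cases - tn
    sensitivity = tp / cases
    specificity = tn / non_cases
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    fraction_correct = (tp + tn) / (cases + non_cases)
    return {
        "ui_pos": sensitivity * ppv,                 # assumed definition
        "ui_neg": specificity * npv,                 # assumed definition
        "identification_index": 100 * (2 * fraction_correct - 1),
    }


# (symptom-positive cases, cases, symptom-negative non-cases, non-cases)
rows = {
    "Persistent low mood": (45, 50, 184, 200),
    "Loss of interest": (48, 50, 156, 200),
    "DSMIV algorithm": (42, 50, 190, 200),
}
indices = {name: row_indices(*counts) for name, counts in rows.items()}
for name, values in indices.items():
    print(
        f"{name}: UI+ {values['ui_pos']:.2f}, UI- {values['ui_neg']:.2f}, "
        f"II {values['identification_index']:.2f}"
    )
```

Under these assumed definitions the three rows reproduce the tabulated values to two decimal places.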
6. Advanced Techniques
• sROC
• Real World Numbers
• NND; NNS
• Bivariate meta-analysis
• Economics
PPV DT Distress = 55%; PPV Other Methods 65%
ROC Plot
[Figure: ROC plot of sensitivity (y-axis, 0.00 to 1.00) against 1 - specificity (x-axis, 0.00 to 1.00). Plotted: Low mood; DSMIV; Low mood & loss of interest.]
Bivariate Diagnostic meta-analysis
Measure: Youden Index
Basic Formula: sensitivity + specificity – 1
Strength: Relatively independent of prevalence
Weakness: Not clinically interpretable; requires application of criterion (gold) standard; does not assess ratio of false positives to negatives
Reciprocal Absolute Benefit: Number Needed to Diagnose
Reciprocal Absolute Benefit Formula: NND = 1/Youden

Measure: Predictive Summary Index
Basic Formula: PPV + NPV – 1
Strength: Measures gain; clinically applicable
Weakness: Dependent on prevalence; places equal weight on rule-in and rule-out accuracy
Reciprocal Absolute Benefit: Number Needed to Predict
Reciprocal Absolute Benefit Formula: NNP = 1/PSI

Measure: Overall Accuracy (Fraction Correct)
Basic Formula: (TP + TN) / (TP + FP + TN + FN)
Strength: Measures real number of correct identifications vs misidentifications; can be easily converted into a percentage
Weakness: Requires application of criterion (gold) standard
Reciprocal Absolute Benefit: Number Needed to Screen
Reciprocal Absolute Benefit Formula: NNS = 1/Identification Index
Further Reading
• David A Grimes, Kenneth F Schulz. Uses and abuses of screening tests. Lancet 2002; 359: 881–84
• Jonathan J Deeks, Douglas G Altman. Diagnostic tests 4: likelihood ratios. BMJ, volume 329, 17 July 2004
• Patrick M Bossuyt, Les Irwig, Jonathan Craig and Paul Glasziou. Comparative accuracy: assessing new tests against existing diagnostic pathways. BMJ 2006; 332: 1089–1092
• Reitsma JB et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. Journal of Clinical Epidemiology 2005; 58: 982–990