Tests of Diagnostic Accuracy
TRANSCRIPT
1
Studies of Diagnostic Accuracy and their Reporting
Simba Takuva, MBChB, MSc, DipHIVMan
Data Analysis and Working Group Meeting
Epidemiology and Biostatistics Division, Clinical HIV Research Unit
2
Medicine is the science of uncertainty and the art of probability
…E. Mumford
3
1. Acknowledgements
2. Introduction
3. Sensitivity and Specificity
4. Receiver Operating Characteristic (ROC) curves
5. Predictive values (PV)
6. Prevalence
7. Likelihood Ratios (LR)
8. Logistic Regression
9. Clinical prediction rules
10. Bias in Studies of Diagnostic Accuracy
11. Reporting studies of diagnostic accuracy
12. References
Outline of presentation
4
Clinical Epidemiology: The Essentials. 4th edition. Fletcher & Fletcher. Lippincott Williams & Wilkins.
Thomas Newman: Lecture notes series - UCSF
Paul Rheeder: Lecture notes - UP
acknowledgements
5
Examples of studies of diagnostic accuracy
◦ Diagnostic accuracy of CD4 cell count increase for virologic response after initiating highly active antiretroviral therapy. Bisson GP, Gross R, Rollins C, Bellamy S, Weinstein R, Friedman H, et al. AIDS. 2006 Aug 1;20(12):1613-1619.
◦ Changes in total lymphocyte count as a surrogate for changes in CD4 count following initiation of HAART: implications for monitoring in resource-limited settings. Mahajan AP, Hogan JW, Snyder B, Kumarasamy N, Mehta K. J Acquir Immune Defic Syndr. 2004 May 1;36(1):567-75.
◦ Validation of a WHO algorithm with risk assessment for the clinical management of vaginal discharge in Mwanza, Tanzania. Mayaud P, ka-Gina G, Cornelissen J, Todd J, Kaatano G, reference?
Introduction
6
The Evolution of Diagnostic Reasoning
Patient either has the disease or not: D+ or D-
Test results are dichotomous
◦ Most tests have more than two possible answers
Disease states are dichotomous
◦ Many diseases occur on a spectrum
◦ There are many kinds of “nondisease”!
Evaluating diagnostic tests: reliability, accuracy, usefulness
introduction
7
Introduction - The Evolution of Diagnostic Reasoning
Sensitivity and specificity
Likelihood ratio
PPV and NPV
ROC curves
Logistic regression
(diagram from P. Rheeder: EBM notes)
8
introduction
AIM: to use clinical and non-clinical factors to cross these thresholds
Crossing the test / treatment threshold (diagram): the horizontal axis shows the likelihood of the target disorder from 0% to 100%. Below the test threshold: do not test, do not treat. Between the test and treatment thresholds: test, and treat on the basis of the test result. Above the treatment threshold: do not test, get on with treatment.
9
Figure 1: the relationship between a diagnostic test result and the occurrence of disease
introduction
             Disease present (+)    Disease absent (-)
Test (+)     True positives         False positives
Test (-)     False negatives        True negatives
10
Fig. 1 shows the relationship between a diagnostic test result and the occurrence of disease.
The goal of all studies aimed at describing the value of diagnostic tests should be to obtain data for all four cells shown in Fig. 1.
A test’s accuracy is considered in relation to some reference standard or ‘gold standard’.
Some issues with tests of diagnostic accuracy:
1. Lack of information on negative tests
2. Lack of information on test results in the nondiseased
3. Lack of objective standards for disease
All three issues lead to the concern that no new test can be shown to perform better than an established gold standard unless special strategies are used.
introduction
11
Sensitivity and specificity describe how often the test is correct in the diseased and non-diseased groups, respectively.
A sensitive (Sn) test has a high true positive ratio (TPR) and is good at detecting patients with the target disease.
A specific (Sp) test has a high true negative ratio (TNR) and is good at detecting patients without the target disease.
Sensitivity (Sn) and Specificity (Sp)
12
Sn = TPR = P(T+ | D+) = a / (a + c)
The proportion of patients with disease who test positive.
SNOUT: for a Sensitive test, a Negative result rules OUT the disease (high true positive ratio).
sensitivity (Sn) and specificity (Sp)
             Disease present (+)      Disease absent (-)
Test (+)     True positives (a)       False positives (b)
Test (-)     False negatives (c)      True negatives (d)
13
Sp = TNR = P(T- | D-) = d / (b + d)
The proportion of patients without disease who test negative.
SPIN: for a Specific test, a Positive result rules IN the disease (high true negative ratio).
sensitivity (Sn) and specificity (Sp)
             Disease present (+)      Disease absent (-)
Test (+)     True positives (a)       False positives (b)
Test (-)     False negatives (c)      True negatives (d)
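As a worked illustration of the Sn and Sp formulas above, here is a minimal Python sketch; the counts a, b, c, d are hypothetical and chosen purely for illustration.

```python
# Minimal sketch: sensitivity and specificity from a 2x2 table.
# The counts below are hypothetical, for illustration only.
a, b = 90, 20    # true positives, false positives
c, d = 10, 180   # false negatives, true negatives

sensitivity = a / (a + c)   # P(T+ | D+), true positive ratio
specificity = d / (b + d)   # P(T- | D-), true negative ratio

print(f"Sn = {sensitivity:.2f}, Sp = {specificity:.2f}")  # Sn = 0.90, Sp = 0.90
```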
14
It is obviously desirable to have a test that is both highly sensitive and highly specific.
Unfortunately, this is usually not possible; instead there is a trade-off between Sp and Sn.
This is especially true when test results take on a range of values; in that case, the location of the cut-off point is an arbitrary decision.
As a result, for any given test on a continuous scale, one characteristic (e.g. Sn) can only be increased at the expense of the other (e.g. Sp).
sensitivity (Sn) and specificity (Sp)
15
16
The ROC curve expresses the relationship between Sn and Sp. It is a popular summary measure of the discriminatory ability of a clinical marker that can be used when there is a gold standard.
The ROC curve plots Sn against 1 - Sp (true positive rate vs false positive rate) for all thresholds that could have been used to define ‘test positive’.
Discriminatory ability is assessed by measuring the area under the curve (AUC); the AUC ranges from 0.5 (no discriminatory ability) to 1.0 (perfect discriminatory ability).
Two diagnostic tests can be compared by calculating the difference between the areas under their two ROC curves.
The Receiver Operating Characteristic (ROC) curve
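To make the threshold-sweep construction concrete, here is a minimal Python sketch that builds the ROC points and computes the AUC with the trapezoidal rule; the marker values and disease labels are hypothetical, and a real analysis would normally rely on a statistical package.

```python
# Minimal sketch: ROC points from every possible cut-off of a continuous
# marker, plus the area under the curve (AUC). Data are hypothetical.
scores = [0.2, 0.35, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]   # marker values
labels = [0,   0,    1,   0,    1,   1,   0,   1]      # 1 = disease present

def roc_points(scores, labels):
    """Return (FPR, TPR) = (1 - Sp, Sn) for each threshold, highest first."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal rule over consecutive (FPR, TPR) points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

pts = roc_points(scores, labels)
print(f"AUC = {auc(pts):.2f}")   # 0.5 = no discrimination, 1.0 = perfect
```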
17
18
19
Characteristics
1. Shows how severe the trade-off between Sp and Sn is for a test
2. The best cut-off point is at or near the ‘shoulder’ of the curve
3. The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test
4. The closer the curve follows the 45-degree diagonal of the ROC space, the less accurate the test
5. The slope of the tangent line at a cut-off point gives the likelihood ratio (LR) for that value of the test
the ROC curve
20
Comparing diagnostic test performance: accuracy is measured by the AUC
◦ 0.90 to 1.00 = excellent
◦ 0.80 to 0.90 = good
◦ 0.70 to 0.80 = fair
◦ 0.60 to 0.70 = poor
◦ 0.50 to 0.60 = fail
the ROC curve
21
Clinicians are more concerned with the following question (rather than with Sp and Sn):
Does the patient have the disease, given the result of a test?
Predictive Values and Prevalence
22
The predictive value (PV) is the probability of disease, given the result of a test.
It is the only absolute measure of diagnostic accuracy.
It is also known as the posterior (posttest) probability: the probability of disease after the test result is known.
The Positive Predictive Value (PPV) is the probability of disease in a patient with a positive (abnormal) test result.
The Negative Predictive Value (NPV) is the probability of not having the disease when the test result is negative (normal).
predictive value and prevalence
23
PPV = a / (a + b)    NPV = d / (c + d)    P = (a + c) / (a + b + c + d)
             Disease present (+)      Disease absent (-)
Test (+)     True positives (a)       False positives (b)
Test (-)     False negatives (c)      True negatives (d)
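Continuing the hypothetical 2x2 counts used in the earlier sketch, the predictive values and prevalence follow directly from the same four cells.

```python
# Minimal sketch: predictive values and prevalence from a 2x2 table.
# a, b, c, d are the same hypothetical counts as before.
a, b, c, d = 90, 20, 10, 180

ppv = a / (a + b)                        # P(D+ | T+)
npv = d / (c + d)                        # P(D- | T-)
prevalence = (a + c) / (a + b + c + d)   # P(D+) in the sample

print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}, prevalence = {prevalence:.2f}")
```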
24
Prevalence (P) is the proportion of persons in a defined population who have the condition in question at a given point in time.
Prevalence is also known as the pretest (prior) probability.
predictive value and prevalence
25
Determinants of predictive value (PV)
The formula relating these concepts is derived from Bayes’ theorem of conditional probabilities:
PPV = (Sn × P) / [(Sn × P) + (1 - Sp) × (1 - P)]
predictive value and prevalence
26
Prevalence is an important determinant of the interpretation of the result of a diagnostic test.
When the prevalence of disease in the population tested is relatively high, the test performs well.
At lower prevalences, the PPV drops to nearly zero and the test is virtually useless.
As Sn and Sp fall, the influence of prevalence on the PV becomes even more pronounced!
predictive value and prevalence
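A short sketch of Bayes’ formula from the previous slide makes this point explicit; Sn, Sp and the prevalence values are hypothetical.

```python
# Minimal sketch: PPV from Bayes' theorem at different prevalences,
# holding hypothetical Sn = Sp = 0.95 fixed.
sn, sp = 0.95, 0.95

def ppv(sn, sp, prevalence):
    return (sn * prevalence) / ((sn * prevalence) + (1 - sp) * (1 - prevalence))

for p in (0.50, 0.10, 0.01):
    print(f"prevalence = {p:.2f}  ->  PPV = {ppv(sn, sp, p):.2f}")
# At 50% prevalence the PPV is high; at 1% it falls sharply,
# even though Sn and Sp have not changed.
```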
27
Pitfalls in the literature
Data in publications are often gathered in university teaching hospitals, where the prevalence of serious disease is relatively high; as a result, statements about the PPV of a test may be misleading when the test is applied in less highly selected settings.
Occasionally, authors compare the performance of a test in a number of diseased patients with an equal number of non-diseased patients; this is efficient for estimating Sn and Sp but means little for the PPV, because the investigators have already artificially set the prevalence of disease at 50%.
predictive value and prevalence
28
revisiting Baye’s theorem :
The posttest probability (PPV)of disease is related to the pretest probability (prev ) and the test characteristics
Baye’s formula makes use of 2 concepts 1. Odds2. Likelihood ratio
The Likelihood Ratio (LR) is the probability of having a positive test result when you have disease divided by the probability of having the same result when you do not have disease ( it is an odds ratio )
pretest odds x LR = posttest odds
Likelihood Ratios (LR)
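The odds form of Bayes’ theorem takes only a few lines; this sketch uses hypothetical values for Sn, Sp and the pretest probability, and does arithmetically what Fagan’s nomogram does graphically.

```python
# Minimal sketch: pretest odds x LR = posttest odds.
# Sn, Sp and the pretest probability are hypothetical.
sn, sp = 0.90, 0.80
lr_positive = sn / (1 - sp)              # LR+ = P(T+ | D+) / P(T+ | D-)

pretest_prob = 0.20
pretest_odds = pretest_prob / (1 - pretest_prob)
posttest_odds = pretest_odds * lr_positive
posttest_prob = posttest_odds / (1 + posttest_odds)

print(f"LR+ = {lr_positive:.1f}, post-test probability = {posttest_prob:.2f}")
```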
29
Advantages of LR
◦ More stable (depends on Sn and Sp, not on prevalence)
◦ Can use different cut-off values, i.e. not dependent on one cut-off value only
◦ Used in Bayesian reasoning
◦ Likelihood ratios can deal with tests that have more than two possible results (not just normal/abnormal)
Likelihood ratios (LR)
30
Fagan’s Normogram
NEJM 1975; 293: 257
31
ROC curves can be compared statistically to see whether added information is of any benefit.
Regression coefficients can also be turned into scores; such risk scores are used to predict the outcome (diagnosis).
Logistic Regression
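As a sketch only, and assuming scikit-learn is available, the following fits a logistic regression on two hypothetical markers, turns the fitted model into a predicted-probability risk score, and summarises its discrimination with the AUC; none of the variables correspond to the study data discussed here.

```python
# Minimal sketch: logistic regression as a diagnostic risk score.
# Both markers and the disease labels are simulated (hypothetical).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)                  # disease status
x1 = 1.5 * y + rng.normal(size=n)          # marker associated with disease
x2 = rng.normal(size=n)                    # uninformative marker
X = np.column_stack([x1, x2])

model = LogisticRegression().fit(X, y)
risk_score = model.predict_proba(X)[:, 1]  # predicted probability of disease

print("coefficients:", model.coef_.round(2))
print("AUC of the risk score:", round(roc_auc_score(y, risk_score), 2))
```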
32
Multiple tests
◦ Usually there is a need for multiple tests
1. Parallel testing
2. Serial testing
Clinical prediction rules
◦ These are rules used to “predict” the diagnostic outcome
◦ A modification of parallel testing, where a combination of multiple tests is used, some with positive and some with negative results
◦ Usually include history, physical examination and certain laboratory tests
The independence assumption
Clinical Prediction Rules
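A minimal sketch of parallel versus serial testing under the independence assumption, with hypothetical Sn and Sp for two tests:

```python
# Minimal sketch: combining two (assumed independent) tests.
# Sn and Sp for each test are hypothetical.
sn1, sp1 = 0.80, 0.90
sn2, sp2 = 0.85, 0.75

# Parallel testing: call positive if EITHER test is positive.
sn_parallel = 1 - (1 - sn1) * (1 - sn2)    # sensitivity rises
sp_parallel = sp1 * sp2                    # specificity falls

# Serial testing: call positive only if BOTH tests are positive.
sn_serial = sn1 * sn2                      # sensitivity falls
sp_serial = 1 - (1 - sp1) * (1 - sp2)      # specificity rises

print(f"parallel: Sn = {sn_parallel:.2f}, Sp = {sp_parallel:.2f}")
print(f"serial:   Sn = {sn_serial:.2f}, Sp = {sp_serial:.2f}")
```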
33
Background: Disseminated infections with Histoplasma capsulatum and Mycobacterium avium complex (MAC) in patients with AIDS are frequently difficult to distinguish clinically.
Methods: We retrospectively compared demographic information, other opportunistic
infections, medications, symptoms, physical examination findings and laboratory
parameters at the time of hospital presentation for 32 patients with culture documented
disseminated histoplasmosis and 58 patients with disseminated MAC infection.
Results: Positive predictors of histoplasma infection by univariate analysis included lactate dehydrogenase level, white blood cell (WBC) count, platelet count, alkaline phosphatase level, and CD4 cell count. By multivariate logistic regression analysis, those characteristics that remained significant included a lactate dehydrogenase value >500 U/L (risk ratio [RR], 42; 95% confidence interval [CI], 18.53–97.5; p < .001), alkaline phosphatase >300 U/L (RR, 9.35; 95% CI, 2.61–33.48; p = .008), WBC <4.5 × 10⁶/L (RR, 21.29; 95% CI, 6.79–66.75; p = .008), and CD4 cell count (RR, 0.958; 95% CI, 0.946–0.971; p = .001).
Conclusions: A predictive model for distinguishing disseminated histoplasmosis
from MAC infection was developed using lactate dehydrogenase and alkaline phosphatase
levels as well as WBC count. This model had a sensitivity of 83%, a specificity
of 91%, and a misclassification rate of 13%.
Clinical Prediction Model for Differentiation of Disseminated Histoplasma capsulatum and Mycobacterium avium Complex Infections in Febrile Patients With AIDS. Graviss E, Vanden Heuvel E, et al. JAIDS 2000;24:30-36.
34
Fig. 1
FIG. 1. Receiver operating characteristic (ROC) curve for individual variables and full model. The solid diagonal line indicates an area under the curve (AUC) of 0.5, which corresponds to a random chance at discrimination. LDH, lactate dehydrogenase; WBC, white blood cells; Alk Phos, alkaline phosphatase.
Copyright © 2009 JAIDS Journal of Acquired Immune Deficiency Syndromes. Published by Lippincott Williams & Wilkins.
Clinical Prediction Model for Differentiation of Disseminated Histoplasma capsulatum and Mycobacterium avium Complex Infections in Febrile Patients With AIDS
Graviss, Edward A.; Vanden Heuvel, Elizabeth A.; Lacke, Christine E.; Spindel, Steven A.; White, A. Clinton Jr; Hamill, Richard J.
JAIDS Journal of Acquired Immune Deficiency Syndromes. 24(1):30-36, May 1, 2000.
35
Overfitting Bias – “data-snooped” cut-offs take advantage of chance variations in the derivation set, making the test look falsely good.
Incorporation Bias – the index test is part of the gold standard (Sensitivity Up, Specificity Up).
Verification/Referral Bias – a positive index test increases referral for the gold standard (Sensitivity Up, Specificity Down).
Double Gold Standard – a positive index test leads to application of the definitive gold standard, while a negative index test results in clinical follow-up (Sensitivity Up, Specificity Up).
Spectrum Bias
◦ D+ are the sickest of the sick (Sensitivity Up)
◦ D- are the wellest of the well (Specificity Up)
Biases in Studies of Diagnostic Test Accuracy
Clinicians, Probability and EBM - T. Newman, MD
36
The STARD statement is to studies of diagnostic accuracy what CONSORT is to clinical trials and STROBE is to observational studies.
The objective is to improve the accuracy and completeness of reporting of studies of diagnostic accuracy, to allow readers to assess the potential for bias in the study (internal validity) and to evaluate its generalisability (external validity).
The STARD statement consists of a checklist of 25 items and recommends the use of a flow diagram describing the design of the study and the flow of patients.
Handouts attached. More at www.stard-statement.org
Reporting of Studies of Diagnostic Accuracy – The STARD Statement
37
These tools are very relevant in our setting, as we have many unanswered questions regarding alternative, cost-effective and optimal strategies for patient monitoring and treatment simplification.
Conclusion