
Page 1: Tests of diagnostic accuracy

Studies of Diagnostic Accuracy and their Reporting

Simba Takuva, MBChB, MSc, DipHIVMan
Data Analysis and Working Group Meeting

Epidemiology and Biostatistics Division
Clinical HIV Research Unit

Page 2: Tests of diagnostic accuracy


Medicine is the science of uncertainty and the art of probability

…E. Mumford

Page 3: Tests of diagnostic accuracy

1. Acknowledgements
2. Introduction
3. Sensitivity and Specificity
4. Receiver Operator Characteristic (ROC) curves
5. Predictive values (PV)
6. Prevalence
7. Likelihood Ratios (LR)
8. Logistic Regression
9. Clinical prediction rules
10. Bias in Studies of Diagnostic Accuracy
11. Reporting studies of diagnostic accuracy
12. References

Outline of presentation

Page 4: Tests of diagnostic accuracy

Clinical Epidemiology: The Essentials. 4th edition. Fletcher & Fletcher. Lippincott Williams & Wilkins.

Thomas Newman: Lecture notes series - UCSF
Paul Rheeder: Lecture notes - UP

Acknowledgements

Page 5: Tests of diagnostic accuracy

Examples of studies of diagnostic accuracy:

◦ Diagnostic accuracy of CD4 cell count increase for virologic response after initiating highly active antiretroviral therapy. Bisson GP, Gross R, Rollins C, Bellamy S, Weinstein R, Friedman H, et al. AIDS, August 1, 2006, 20(12):1613-1619.

◦ Changes in total lymphocyte count as a surrogate for changes in CD4 count following initiation of HAART: implications for monitoring in resource-limited settings. Mahajan AP, Hogan JW, Snyder B, Kumarasamy N, Mehta K. J Acquir Immune Defic Syndr. 2004 May 1;36(1):567-75.

◦ Validation of a WHO algorithm with risk assessment for the clinical management of vaginal discharge in Mwanza, Tanzania. Mayaud P, ka-Gina G, Cornelissen J, Todd J, Kaatano G, et al. [reference?]

Introduction

Page 6: Tests of diagnostic accuracy

The Evolution of Diagnostic Reasoning

Patient either has the disease or not: D+ or D-

Test results are dichotomous
◦ Most tests have more than two possible answers

Disease states are dichotomous
◦ Many diseases occur on a spectrum
◦ There are many kinds of “nondisease”!

Evaluating diagnostic tests:
◦ Reliability
◦ Accuracy
◦ Usefulness

Introduction

Page 7: Tests of diagnostic accuracy

Introduction - The Evolution of Diagnostic Reasoning

◦ Sensitivity and specificity
◦ Likelihood ratios
◦ PPV and NPV
◦ ROC curves
◦ Logistic regression

(diagram from P. Rheeder: EBM notes)

Page 8: Tests of diagnostic accuracy

Introduction

AIM: to use clinical and non-clinical factors to cross thresholds

Crossing the test/treatment thresholds:

[Diagram: the likelihood of the target disorder runs from 0% to 100%. Below the test threshold: do not test, do not treat. Between the test and treatment thresholds: test, and treat on the basis of the test result. Above the treatment threshold: do not test, get on with treatment.]

Page 9: Tests of diagnostic accuracy

Introduction

Figure 1: the relationship between a diagnostic test result and occurrence of disease

           Disease present (+)   Disease absent (-)
Test (+)   True positives        False positives
Test (-)   False negatives       True negatives

Page 10: Tests of diagnostic accuracy

Fig 1 shows the relationship between a diagnostic test result and the occurrence of disease.

The goal of all studies aimed at describing the value of diagnostic tests should be to obtain data for all four cells shown in Fig 1.

A test’s accuracy is considered in relation to some reference standard or ‘gold standard’.

Some issues with tests of diagnostic accuracy:
1. lack of information on negative tests
2. lack of information on test results in the nondiseased
3. lack of objective standards for disease

All three of the above issues lead to the concern that no new test can perform better than an established gold standard unless special strategies are used.

Introduction

Page 11: Tests of diagnostic accuracy

These describe how often the test is correct in the diseased and non-diseased groups respectively.

A sensitive (Sn) test has a high true positive rate (TPR) and is good at detecting patients with the target disease.

A specific (Sp) test has a high true negative rate (TNR) and is good at detecting patients without the target disease.

Sensitivity (Sn) and Specificity (Sp)

Page 12: Tests of diagnostic accuracy

Sn = TPR = p(T+|D+) = a/(a+c)

The proportion of patients with disease who test positive.

SnOut: a sensitive test, when negative, rules out disease (high true positive rate).

Sensitivity (Sn) and Specificity (Sp)

           Disease present (+)   Disease absent (-)
Test (+)   True positives (a)    False positives (b)
Test (-)   False negatives (c)   True negatives (d)

Page 13: Tests of diagnostic accuracy

Sp = TNR = p(T-|D-) = d/(b+d)

The proportion of patients without disease who test negative.

SpIn: a specific test, when positive, rules in disease (high true negative rate).

Sensitivity (Sn) and Specificity (Sp)

           Disease present (+)   Disease absent (-)
Test (+)   True positives (a)    False positives (b)
Test (-)   False negatives (c)   True negatives (d)
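As a quick illustration, here is a minimal Python sketch that reads Sn and Sp off the four cells a, b, c, d of the 2x2 table above. The counts are hypothetical, not data from any study cited in this presentation.

```python
# Minimal sketch (hypothetical counts): Sn and Sp from the 2x2 table cells.

def sensitivity(a, c):
    """Sn = TPR = a / (a + c): proportion of diseased patients who test positive."""
    return a / (a + c)

def specificity(b, d):
    """Sp = TNR = d / (b + d): proportion of non-diseased patients who test negative."""
    return d / (b + d)

# a = true positives, b = false positives, c = false negatives, d = true negatives
a, b, c, d = 90, 20, 10, 180
print(f"Sn = {sensitivity(a, c):.2f}")  # 0.90: the test detects 90% of the diseased
print(f"Sp = {specificity(b, d):.2f}")  # 0.90: 90% of the non-diseased test negative
```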

Page 14: Tests of diagnostic accuracy

It is obviously desirable to have a test that is both highly sensitive and highly specific.

Unfortunately, this is usually not possible; instead there is a trade-off between Sn and Sp.

This is especially true when the data take on a range of values; in this case, the location of a cut-off point is an arbitrary decision.

As a result, for any given test on a continuous scale, one characteristic (e.g. Sn) can only be increased at the expense of the other (e.g. Sp).

Sensitivity (Sn) and Specificity (Sp)

Page 15: Tests of diagnostic accuracy


Page 16: Tests of diagnostic accuracy

Expresses the relationship between Sn and Sp; it is a popular summary measure of the discriminatory ability of a clinical marker that can be used when there is a gold standard.

The ROC plots Sn against 1 − Sp (true positivity vs false positivity) for all thresholds that could have been used to define ‘test positive’.

It is assessed by measuring the area under the curve (AUC). The AUC ranges from 0.5 (no discriminatory ability) to 1.0 (perfect discriminatory ability).

Two diagnostic tests can be compared by calculating the difference between the areas under their two ROC curves.

The Receiver Operator Characteristic (ROC) curve
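The following is a minimal Python sketch of how an ROC curve is built: every observed marker value is tried as the ‘test positive’ threshold, each threshold yields one (1 − Sp, Sn) point, and the AUC is the trapezoidal area under those points. The marker scores and disease labels below are hypothetical.

```python
# Minimal sketch: an ROC curve and its AUC computed by hand.

def roc_points(scores, labels):
    """Return (FPR, TPR) = (1 - Sp, Sn) pairs for every candidate threshold."""
    thresholds = sorted(set(scores), reverse=True)
    pos = sum(labels)               # number of diseased (D+)
    neg = len(labels) - pos         # number of non-diseased (D-)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical marker values and disease status (1 = D+, 0 = D-).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1,   1,   0,   1,   1,    0,   0,   0  ]
print(f"AUC = {auc(roc_points(scores, labels)):.3f}")  # 0.875 for this toy data
```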

Page 17: Tests of diagnostic accuracy


Page 18: Tests of diagnostic accuracy


Page 19: Tests of diagnostic accuracy

Characteristics:
1. shows how severe the trade-off between Sp and Sn is for a test
2. the best cut-off point is at or near the ‘shoulder’ of the curve
3. the closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test
4. the closer the curve follows the 45-degree diagonal of the ROC space, the less accurate the test
5. the slope of the tangent line at a cut-off point gives the likelihood ratio (LR) for that value of the test

The ROC curve

Page 20: Tests of diagnostic accuracy

Comparing diagnostic test performance: accuracy is measured by the AUC.

◦ 0.90 to 1.0 = excellent
◦ 0.80 to 0.90 = good
◦ 0.70 to 0.80 = fair
◦ 0.60 to 0.70 = poor
◦ 0.50 to 0.60 = fail

The ROC curve

Page 21: Tests of diagnostic accuracy

Clinicians are more concerned with the following question (than with Sn and Sp):

Does the patient have the disease, given the result of a test?

Predictive Values and Prevalence

Page 22: Tests of diagnostic accuracy

The predictive value (PV) is the probability of disease, given the result of a test.

It is the only absolute measure of diagnostic accuracy.

It is also known as the posterior (posttest) probability: the probability of disease after the test result is known.

Positive Predictive Value (PPV) is the probability of disease in a patient with a positive (abnormal) test result.

Negative Predictive Value (NPV) is the probability of not having the disease when the test result is negative (normal).

Predictive value and prevalence

Page 23: Tests of diagnostic accuracy

PPV = a/(a+b)
NPV = d/(c+d)
P = (a+c)/(a+b+c+d)

           Disease present (+)   Disease absent (-)
Test (+)   True positives (a)    False positives (b)
Test (-)   False negatives (c)   True negatives (d)
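A minimal Python sketch of these three formulas, using the same hypothetical 2x2 counts as in the earlier sensitivity/specificity example:

```python
# Minimal sketch (hypothetical counts): PPV, NPV and prevalence from the 2x2 cells.

a, b, c, d = 90, 20, 10, 180            # TP, FP, FN, TN

ppv = a / (a + b)                       # P(disease | test positive)
npv = d / (c + d)                       # P(no disease | test negative)
prevalence = (a + c) / (a + b + c + d)  # proportion diseased in the study sample

print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}, prevalence = {prevalence:.2f}")
# PPV = 0.82, NPV = 0.95, prevalence = 0.33
```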

Page 24: Tests of diagnostic accuracy

Prevalence (P) is the proportion of persons in a defined population who have the condition in question at a given point in time.

Prevalence is also known as the pretest (prior) probability.

Predictive value and prevalence

Page 25: Tests of diagnostic accuracy

Determinants of predictive value (PV)

The formula relating these concepts is derived from Bayes’ theorem of conditional probabilities:

PPV = (Sn × P) / [(Sn × P) + (1 − Sp) × (1 − P)]

Predictive value and prevalence
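To see what this formula implies, here is a minimal sketch that evaluates it for a hypothetical test with Sn = Sp = 0.95 across a range of prevalences; it previews the point of the next slide, that PPV collapses as prevalence falls.

```python
# Minimal sketch: PPV via Bayes' theorem for a hypothetical test (Sn = Sp = 0.95).

def ppv(sn, sp, p):
    """PPV = Sn*P / (Sn*P + (1 - Sp)*(1 - P))."""
    return (sn * p) / (sn * p + (1 - sp) * (1 - p))

for p in (0.50, 0.10, 0.01, 0.001):
    print(f"prevalence = {p:6.3f}  ->  PPV = {ppv(0.95, 0.95, p):.3f}")
# prevalence =  0.500  ->  PPV = 0.950
# prevalence =  0.100  ->  PPV = 0.679
# prevalence =  0.010  ->  PPV = 0.161
# prevalence =  0.001  ->  PPV = 0.019
```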

Page 26: Tests of diagnostic accuracy

Prevalence is an important determinant of the interpretation of the result of a diagnostic test.

When the prevalence of disease in the population tested is relatively high, the test performs well.

At lower prevalences, the PPV drops to nearly zero, and the test is virtually useless.

As Sn and Sp fall, the influence of prevalence on PV becomes more pronounced!

Predictive value and prevalence

Page 27: Tests of diagnostic accuracy

Pitfalls in the literature

Data from publications are often gathered in university teaching hospitals, where the prevalence of serious disease is relatively high; as a result, statements about the PPV of a test are then applied in less highly selected settings.

Occasionally, authors compare the performance of a test in a number of diseased patients to an equal number of non-diseased patients. This is efficient for estimating Sn and Sp, but means little for PPV, because the investigators have artificially set the prevalence of disease at 50%.

Predictive value and prevalence

Page 28: Tests of diagnostic accuracy

Revisiting Bayes’ theorem:

The posttest probability (PPV) of disease is related to the pretest probability (prevalence) and the test characteristics.

Bayes’ formula makes use of two concepts:
1. Odds
2. Likelihood ratio

The Likelihood Ratio (LR) is the probability of having a given test result when you have the disease, divided by the probability of having the same result when you do not have the disease (a ratio of two probabilities).

pretest odds × LR = posttest odds

Likelihood Ratios (LR)
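A minimal Python sketch of this updating step, which is the same calculation Fagan’s nomogram (next slide) performs graphically. The Sn, Sp and pretest probability below are hypothetical.

```python
# Minimal sketch: Bayesian updating with likelihood ratios,
# via probability -> odds -> (odds x LR) -> probability.

def posttest_probability(pretest_p, lr):
    """Convert to odds, apply the LR, convert back to a probability."""
    pretest_odds = pretest_p / (1 - pretest_p)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)

sn, sp = 0.90, 0.80
lr_pos = sn / (1 - sp)          # LR+ = Sn / (1 - Sp) = 4.5
lr_neg = (1 - sn) / sp          # LR- = (1 - Sn) / Sp = 0.125

pretest = 0.30                  # hypothetical pretest probability
print(f"positive result: {posttest_probability(pretest, lr_pos):.2f}")  # ~0.66
print(f"negative result: {posttest_probability(pretest, lr_neg):.2f}")  # ~0.05
```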

Page 29: Tests of diagnostic accuracy

Advantages of the LR:
◦ More stable (depends on Sn and Sp, not on prevalence)
◦ Can use different cut-off values, i.e. not dependent on one cut-off value only
◦ Used in Bayesian reasoning
◦ Likelihood ratios can deal with tests with more than two possible results (not just normal/abnormal)

Likelihood ratios (LR)

Page 30: Tests of diagnostic accuracy

Fagan’s Nomogram

NEJM 1975; 293: 257

Page 31: Tests of diagnostic accuracy

ROC curves can be compared statistically to see if the added information is of any benefit.

Regression coefficients can also be turned into scores; such risk scores are used to predict the outcome (diagnosis).

Logistic Regression
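As a sketch of this idea, the following Python fragment fits a logistic regression on two candidate predictors, turns the fitted probabilities into a per-patient risk score, and summarises discrimination with the AUC. The data are simulated, not from any study cited here, and scikit-learn is assumed to be available.

```python
# Minimal sketch (simulated data, scikit-learn assumed): logistic regression
# as a diagnostic risk score, judged by its AUC.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)                        # e.g. a laboratory marker
x2 = rng.normal(size=n)                        # e.g. a clinical sign
logit = -0.5 + 1.2 * x1 + 0.8 * x2             # assumed true model
y = rng.random(n) < 1 / (1 + np.exp(-logit))   # simulated disease status

X = np.column_stack([x1, x2])
model = LogisticRegression().fit(X, y)
score = model.predict_proba(X)[:, 1]           # risk score per patient
print(f"AUC of the fitted model: {roc_auc_score(y, score):.2f}")
```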

Page 32: Tests of diagnostic accuracy

Multiple Tests
◦ Usually there is a need for multiple tests:
1. Parallel testing
2. Serial testing

Clinical prediction rules
◦ These are rules used to “predict” the diagnostic outcome.
◦ A modification of parallel testing, in which a combination of multiple tests is used, some with positive and some with negative results.
◦ Usually include history, physical examination and certain laboratory tests.

The independence assumption (see the sketch after this slide)

Clinical Prediction Rules
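Under the independence assumption, the combined Sn and Sp of two tests follow directly from the multiplication rule. A minimal sketch with hypothetical test characteristics:

```python
# Minimal sketch (hypothetical Sn/Sp): combining two independent tests.
# Parallel: either test positive counts as positive (Sn rises, Sp falls).
# Serial: both must be positive (Sp rises, Sn falls).

def parallel(sn1, sp1, sn2, sp2):
    sn = 1 - (1 - sn1) * (1 - sn2)   # missed only if both tests miss
    sp = sp1 * sp2                   # negative only if both are negative
    return sn, sp

def serial(sn1, sp1, sn2, sp2):
    sn = sn1 * sn2                   # positive only if both tests detect it
    sp = 1 - (1 - sp1) * (1 - sp2)   # false positive only if both tests err
    return sn, sp

print(parallel(0.80, 0.90, 0.85, 0.88))  # (0.97, 0.792): Sn up, Sp down
print(serial(0.80, 0.90, 0.85, 0.88))    # (0.68, 0.988): Sn down, Sp up
```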

Page 33: Tests of diagnostic accuracy

Background: Disseminated infection with Histoplasma capsulatum and Mycobacterium avium complex (MAC) in patients with AIDS are frequently difficult to distinguish clinically.

Methods: We retrospectively compared demographic information, other opportunistic infections, medications, symptoms, physical examination findings and laboratory parameters at the time of hospital presentation for 32 patients with culture-documented disseminated histoplasmosis and 58 patients with disseminated MAC infection.

Results: Positive predictors of histoplasma infection by univariate analysis included lactate dehydrogenase level, white blood cell (WBC) count, platelet count, alkaline phosphatase level, and CD4 cell count. By multivariate logistic regression analysis, the characteristics that remained significant included a lactate dehydrogenase value >500 U/L (risk ratio [RR], 42; 95% confidence interval [CI], 18.53–97.5; p < .001), alkaline phosphatase >300 U/L (RR, 9.35; 95% CI, 2.61–33.48; p = .008), WBC <4.5 × 10⁶/L (RR, 21.29; 95% CI, 6.79–66.75; p = .008), and CD4 cell count (RR, 0.958; 95% CI, 0.946–0.971; p = .001).

Conclusions: A predictive model for distinguishing disseminated histoplasmosis from MAC infection was developed using lactate dehydrogenase and alkaline phosphatase levels as well as WBC count. This model had a sensitivity of 83%, a specificity of 91%, and a misclassification rate of 13%.

Clinical Prediction Model for Differentiation of Disseminated Histoplasma capsulatum and Mycobacterium avium Complex Infections in Febrile Patients With AIDS. Graviss E, Vanden Heuvel E, et al. JAIDS 2000;24:30-36.

Page 34: Tests of diagnostic accuracy

FIG. 1. Receiver operating characteristic (ROC) curve for individual variables and full model. The solid diagonal line indicates an area under the curve (AUC) of 0.5, which corresponds to a random chance at discrimination. LDH, lactate dehydrogenase; WBC, white blood cells; Alk Phos, alkaline phosphatase.

Clinical Prediction Model for Differentiation of Disseminated Histoplasma capsulatum and Mycobacterium avium Complex Infections in Febrile Patients With AIDS. Graviss, Edward A.; Vanden Heuvel, Elizabeth A.; Lacke, Christine E.; Spindel, Steven A.; White, A. Clinton Jr; Hamill, Richard J. JAIDS Journal of Acquired Immune Deficiency Syndromes. 24(1):30-36, May 1, 2000.

Page 35: Tests of diagnostic accuracy

Overfitting Bias: “data-snooped” cutoffs take advantage of chance variations in the derivation set, making the test look falsely good.

Incorporation Bias: the index test is part of the gold standard (Sensitivity Up, Specificity Up).

Verification/Referral Bias: a positive index test increases referral to the gold standard (Sensitivity Up, Specificity Down).

Double Gold Standard: a positive index test causes application of the definitive gold standard, while a negative index test results in clinical follow-up (Sensitivity Up, Specificity Up).

Spectrum Bias:
◦ D+ are the sickest of the sick (Sensitivity Up)
◦ D- are the wellest of the well (Specificity Up)

Biases in Studies of Diagnostic Test Accuracy

Clinicians, probability and EBM - T. Newman MD

Page 36: Tests of diagnostic accuracy

The STARD statement is to studies of diagnostic accuracy what CONSORT is to clinical trials and what STROBE is to observational studies.

The objective is to improve the accuracy and completeness of reporting of studies of diagnostic accuracy, to allow readers to assess the potential for bias in the study (internal validity) and to evaluate its generalisability (external validity).

The STARD statement consists of a checklist of 25 items and recommends the use of a flow diagram describing the design of the study and the flow of patients.

Handouts attached. More on www.stard-statement.org

Reporting of Studies of Diagnostic Accuracy – The STARD Statement

Page 37: Tests of diagnostic accuracy

These tools are very relevant in our setting, as we have many unanswered questions regarding alternative, cost-effective and optimal strategies for patient monitoring and treatment simplification.

Conclusion