Tests of Diagnostic Accuracy
TRANSCRIPT
1
Studies of Diagnostic Accuracy and their Reporting
Simba Takuva, MBChB, MSc, DipHIVMan
Data Analysis and Working Group Meeting
Epidemiology and Biostatistics Division, Clinical HIV Research Unit
2
Medicine is the science of uncertainty and the art of probability
…E. Mumford
3
1. Acknowledgements
2. Introduction
3. Sensitivity and Specificity
4. Receiver Operating Characteristic (ROC) curves
5. Predictive values (PV)
6. Prevalence
7. Likelihood Ratios (LR)
8. Logistic Regression
9. Clinical prediction rules
10. Bias in Studies of Diagnostic Accuracy
11. Reporting studies of diagnostic accuracy
12. References
Outline of presentation
4
Clinical Epidemiology: The Essentials. 4th edition. Fletcher & Fletcher. Lippincott Williams & Wilkins.
Thomas Newman: Lecture notes series - UCSF
Paul Rheeder: Lecture notes - UP
acknowledgements
5
Examples of studies of diagnostic accuracy
◦ Diagnostic accuracy of CD4 cell count increase for virologic response after initiating highly active antiretroviral therapy. Bisson GP, Gross R, Rollins C, Bellamy S, Weinstein R, Friedman H, et al. AIDS. 2006 Aug 1;20(12):1613-1619.
◦ Changes in total lymphocyte count as a surrogate for changes in CD4 count following initiation of HAART: implications for monitoring in resource-limited settings. Mahajan AP, Hogan JW, Snyder B, Kumarasamy N, Mehta K. J Acquir Immune Defic Syndr. 2004 May 1;36(1):567-75.
◦ Validation of a WHO algorithm with risk assessment for the clinical management of vaginal discharge in Mwanza, Tanzania. Mayaud P, ka-Gina G, Cornelissen J, Todd J, Kaatano G, reference?
Introduction
6
The Evolution of Diagnostic Reasoning
Patient either has the disease or not: D+ or D-
Test results are dichotomous
◦ Most tests have more than two possible answers
Disease states are dichotomous
◦ Many diseases occur on a spectrum
◦ There are many kinds of “nondisease”!
Evaluating diagnostic tests: reliability, accuracy, usefulness
introduction
7
Introduction - The Evolution of Diagnostic Reasoning
Sensitivity and specificity
Likelihood ratio
PPV and NPV
ROC curves
Logistic regression
(diagram from P. Rheeder: EBM notes)
8
introduction
AIM: to use clinical and non-clinical factors to cross these thresholds
Crossing the test / treatment threshold (diagram): the horizontal axis shows the likelihood of the target disorder from 0% to 100%. Below the test threshold: do not test, do not treat. Between the test and treatment thresholds: test, and treat on the basis of the test result. Above the treatment threshold: do not test, get on with treatment.
9
Figure 1: the relationship between a diagnostic test result and the occurrence of disease
introduction
             Disease present (+)    Disease absent (-)
Test (+)     True positives         False positives
Test (-)     False negatives        True negatives
10
Fig. 1 shows the relationship between a diagnostic test result and the occurrence of disease.
The goal of all studies aimed at describing the value of diagnostic tests should be to obtain data for all four cells shown in Fig. 1.
A test’s accuracy is considered in relation to some reference standard or ‘gold standard’.
Some issues with tests of diagnostic accuracy:
1. Lack of information on negative tests
2. Lack of information on test results in the nondiseased
3. Lack of objective standards for disease
All three issues lead to the concern that no new test can be shown to perform better than an established gold standard unless special strategies are used.
introduction
11
Sensitivity and specificity describe how often the test is correct in the diseased and non-diseased groups, respectively.
A sensitive (Sn) test has a high true positive ratio (TPR) and is good at detecting patients with the target disease.
A specific (Sp) test has a high true negative ratio (TNR) and is good at detecting patients without the target disease.
Sensitivity (Sn) and Specificity (Sp)
12
Sn = TPR = P(T+ | D+) = a / (a + c)
The proportion of patients with disease who test positive.
SNOUT: for a Sensitive test, a Negative result rules OUT the disease (high true positive ratio).
sensitivity (Sn) and specificity (Sp)
             Disease present (+)      Disease absent (-)
Test (+)     True positives (a)       False positives (b)
Test (-)     False negatives (c)      True negatives (d)
13
Sp = TNR = P(T- | D-) = d / (b + d)
The proportion of patients without disease who test negative.
SPIN: for a Specific test, a Positive result rules IN the disease (high true negative ratio).
sensitivity (Sn) and specificity (Sp)
             Disease present (+)      Disease absent (-)
Test (+)     True positives (a)       False positives (b)
Test (-)     False negatives (c)      True negatives (d)
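As a worked illustration of the Sn and Sp formulas above, here is a minimal Python sketch; the counts a, b, c, d are hypothetical and chosen purely for illustration.

```python
# Minimal sketch: sensitivity and specificity from a 2x2 table.
# The counts below are hypothetical, for illustration only.
a, b = 90, 20    # true positives, false positives
c, d = 10, 180   # false negatives, true negatives

sensitivity = a / (a + c)   # P(T+ | D+), true positive ratio
specificity = d / (b + d)   # P(T- | D-), true negative ratio

print(f"Sn = {sensitivity:.2f}, Sp = {specificity:.2f}")  # Sn = 0.90, Sp = 0.90
```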
14
It is obviously desirable to have a test that is both highly sensitive and highly specific.
Unfortunately, this is usually not possible; instead there is a trade-off between Sp and Sn.
This is especially true when test results take on a range of values; in that case, the location of the cut-off point is an arbitrary decision.
As a result, for any given test on a continuous scale, one characteristic (e.g. Sn) can only be increased at the expense of the other (e.g. Sp).
sensitivity (Sn) and specificity (Sp)
15
16
The ROC curve expresses the relationship between Sn and Sp. It is a popular summary measure of the discriminatory ability of a clinical marker that can be used when there is a gold standard.
The ROC curve plots Sn against 1 - Sp (true positive rate vs false positive rate) for all thresholds that could have been used to define ‘test positive’.
Discriminatory ability is assessed by measuring the area under the curve (AUC); the AUC ranges from 0.5 (no discriminatory ability) to 1.0 (perfect discriminatory ability).
Two diagnostic tests can be compared by calculating the difference between the areas under their two ROC curves.
The Receiver Operating Characteristic (ROC) curve
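To make the threshold-sweep construction concrete, here is a minimal Python sketch that builds the ROC points and computes the AUC with the trapezoidal rule; the marker values and disease labels are hypothetical, and a real analysis would normally rely on a statistical package.

```python
# Minimal sketch: ROC points from every possible cut-off of a continuous
# marker, plus the area under the curve (AUC). Data are hypothetical.
scores = [0.2, 0.35, 0.4, 0.55, 0.6, 0.7, 0.8, 0.9]   # marker values
labels = [0,   0,    1,   0,    1,   1,   0,   1]      # 1 = disease present

def roc_points(scores, labels):
    """Return (FPR, TPR) = (1 - Sp, Sn) for each threshold, highest first."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal rule over consecutive (FPR, TPR) points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

pts = roc_points(scores, labels)
print(f"AUC = {auc(pts):.2f}")   # 0.5 = no discrimination, 1.0 = perfect
```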
17
18
19
Characteristics
1. Shows how severe the trade-off between Sp and Sn is for a test
2. The best cut-off point is at or near the ‘shoulder’ of the curve
3. The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test
4. The closer the curve follows the 45-degree diagonal of the ROC space, the less accurate the test
5. The slope of the tangent line at a cut-off point gives the likelihood ratio (LR) for that value of the test
the ROC curve
20
Comparing diagnostic test performance: accuracy is measured by the AUC
◦ 0.90 to 1.00 = excellent
◦ 0.80 to 0.90 = good
◦ 0.70 to 0.80 = fair
◦ 0.60 to 0.70 = poor
◦ 0.50 to 0.60 = fail
the ROC curve
21
Clinicians are more concerned with the following question (rather than with Sp and Sn):
Does the patient have the disease, given the result of a test?
Predictive Values and Prevalence
22
The predictive value (PV) is the probability of disease, given the result of a test.
It is the only absolute measure of diagnostic accuracy.
It is also known as the posterior (posttest) probability: the probability of disease after the test result is known.
The Positive Predictive Value (PPV) is the probability of disease in a patient with a positive (abnormal) test result.
The Negative Predictive Value (NPV) is the probability of not having the disease when the test result is negative (normal).
predictive value and prevalence
23
PPV = a / (a + b)    NPV = d / (c + d)    P = (a + c) / (a + b + c + d)
             Disease present (+)      Disease absent (-)
Test (+)     True positives (a)       False positives (b)
Test (-)     False negatives (c)      True negatives (d)
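Continuing the hypothetical 2x2 counts used in the earlier sketch, the predictive values and prevalence follow directly from the same four cells.

```python
# Minimal sketch: predictive values and prevalence from a 2x2 table.
# a, b, c, d are the same hypothetical counts as before.
a, b, c, d = 90, 20, 10, 180

ppv = a / (a + b)                        # P(D+ | T+)
npv = d / (c + d)                        # P(D- | T-)
prevalence = (a + c) / (a + b + c + d)   # P(D+) in the sample

print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}, prevalence = {prevalence:.2f}")
```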
24
Prevalence (P) is the proportion of persons in a defined population who have the condition in question at a given point in time.
Prevalence is also known as the pretest (prior) probability.
predictive value and prevalence
25
Determinants of predictive value (PV)
The formula relating these concepts is derived from Bayes’ theorem of conditional probabilities:
PPV = (Sn × P) / [(Sn × P) + (1 - Sp) × (1 - P)]
predictive value and prevalence
26
Prevalence is an important determinant of the interpretation of the result of a diagnostic test.
When the prevalence of disease in the population tested is relatively high, the test performs well.
At lower prevalences, the PPV drops to nearly zero and the test is virtually useless.
As Sn and Sp fall, the influence of prevalence on the PV becomes even more pronounced!
predictive value and prevalence
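A short sketch of Bayes’ formula from the previous slide makes this point explicit; Sn, Sp and the prevalence values are hypothetical.

```python
# Minimal sketch: PPV from Bayes' theorem at different prevalences,
# holding hypothetical Sn = Sp = 0.95 fixed.
sn, sp = 0.95, 0.95

def ppv(sn, sp, prevalence):
    return (sn * prevalence) / ((sn * prevalence) + (1 - sp) * (1 - prevalence))

for p in (0.50, 0.10, 0.01):
    print(f"prevalence = {p:.2f}  ->  PPV = {ppv(sn, sp, p):.2f}")
# At 50% prevalence the PPV is high; at 1% it falls sharply,
# even though Sn and Sp have not changed.
```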
27
Pitfalls in the literature
Data in publications are often gathered in university teaching hospitals, where the prevalence of serious disease is relatively high; as a result, statements about the PPV of a test may be misleading when the test is applied in less highly selected settings.
Occasionally, authors compare the performance of a test in a number of diseased patients with an equal number of non-diseased patients; this is efficient for estimating Sn and Sp but means little for the PPV, because the investigators have already artificially set the prevalence of disease at 50%.
predictive value and prevalence
28
revisiting Baye’s theorem :
The posttest probability (PPV)of disease is related to the pretest probability (prev ) and the test characteristics
Baye’s formula makes use of 2 concepts 1. Odds2. Likelihood ratio
The Likelihood Ratio (LR) is the probability of having a positive test result when you have disease divided by the probability of having the same result when you do not have disease ( it is an odds ratio )
pretest odds x LR = posttest odds
Likelihood Ratios (LR)
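The odds form of Bayes’ theorem takes only a few lines; this sketch uses hypothetical values for Sn, Sp and the pretest probability, and does arithmetically what Fagan’s nomogram does graphically.

```python
# Minimal sketch: pretest odds x LR = posttest odds.
# Sn, Sp and the pretest probability are hypothetical.
sn, sp = 0.90, 0.80
lr_positive = sn / (1 - sp)              # LR+ = P(T+ | D+) / P(T+ | D-)

pretest_prob = 0.20
pretest_odds = pretest_prob / (1 - pretest_prob)
posttest_odds = pretest_odds * lr_positive
posttest_prob = posttest_odds / (1 + posttest_odds)

print(f"LR+ = {lr_positive:.1f}, post-test probability = {posttest_prob:.2f}")
```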
29
Advantages of LR
◦ More stable (depends on Sn and Sp, not on prevalence)
◦ Can use different cut-off values, i.e. not dependent on one cut-off value only
◦ Used in Bayesian reasoning
◦ Likelihood ratios can deal with tests that have more than two possible results (not just normal/abnormal)
Likelihood ratios (LR)
30
Fagan’s Normogram
NEJM 1975; 293: 257
31
ROC curves can be compared statistically to see whether added information is of any benefit.
Regression coefficients can also be turned into scores; such risk scores are used to predict the outcome (diagnosis).
Logistic Regression
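As a sketch only, and assuming scikit-learn is available, the following fits a logistic regression on two hypothetical markers, turns the fitted model into a predicted-probability risk score, and summarises its discrimination with the AUC; none of the variables correspond to the study data discussed here.

```python
# Minimal sketch: logistic regression as a diagnostic risk score.
# Both markers and the disease labels are simulated (hypothetical).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, n)                  # disease status
x1 = 1.5 * y + rng.normal(size=n)          # marker associated with disease
x2 = rng.normal(size=n)                    # uninformative marker
X = np.column_stack([x1, x2])

model = LogisticRegression().fit(X, y)
risk_score = model.predict_proba(X)[:, 1]  # predicted probability of disease

print("coefficients:", model.coef_.round(2))
print("AUC of the risk score:", round(roc_auc_score(y, risk_score), 2))
```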
32
Multiple tests
◦ Usually there is a need for multiple tests
1. Parallel testing
2. Serial testing
Clinical prediction rules
◦ These are rules used to “predict” the diagnostic outcome
◦ A modification of parallel testing, where a combination of multiple tests is used, some with positive and some with negative results
◦ Usually include history, physical examination and certain laboratory tests
The independence assumption
Clinical Prediction Rules
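A minimal sketch of parallel versus serial testing under the independence assumption, with hypothetical Sn and Sp for two tests:

```python
# Minimal sketch: combining two (assumed independent) tests.
# Sn and Sp for each test are hypothetical.
sn1, sp1 = 0.80, 0.90
sn2, sp2 = 0.85, 0.75

# Parallel testing: call positive if EITHER test is positive.
sn_parallel = 1 - (1 - sn1) * (1 - sn2)    # sensitivity rises
sp_parallel = sp1 * sp2                    # specificity falls

# Serial testing: call positive only if BOTH tests are positive.
sn_serial = sn1 * sn2                      # sensitivity falls
sp_serial = 1 - (1 - sp1) * (1 - sp2)      # specificity rises

print(f"parallel: Sn = {sn_parallel:.2f}, Sp = {sp_parallel:.2f}")
print(f"serial:   Sn = {sn_serial:.2f}, Sp = {sp_serial:.2f}")
```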
33
Background: Disseminated infections with Histoplasma capsulatum and Mycobacterium avium complex (MAC) in patients with AIDS are frequently difficult to distinguish clinically.
Methods: We retrospectively compared demographic information, other opportunistic
infections, medications, symptoms, physical examination findings and laboratory
parameters at the time of hospital presentation for 32 patients with culture documented
disseminated histoplasmosis and 58 patients with disseminated MAC infection.
Results: Positive predictors of histoplasma infection by univariate analysis included lactate dehydrogenase level, white blood cell (WBC) count, platelet count, alkaline phosphatase level, and CD4 cell count. By multivariate logistic regression analysis, those characteristics that remained significant included a lactate dehydrogenase value >500 U/L (risk ratio [RR], 42; 95% confidence interval [CI], 18.53–97.5; p < .001), alkaline phosphatase >300 U/L (RR, 9.35; 95% CI, 2.61–33.48; p = .008), WBC <4.5 × 10⁶/L (RR, 21.29; 95% CI, 6.79–66.75; p = .008), and CD4 cell count (RR, 0.958; 95% CI, 0.946–0.971; p = .001).
Conclusions: A predictive model for distinguishing disseminated histoplasmosis
from MAC infection was developed using lactate dehydrogenase and alkaline phosphatase
levels as well as WBC count. This model had a sensitivity of 83%, a specificity
of 91%, and a misclassification rate of 13%.
Clinical Prediction Model for Differentiation of Disseminated Histoplasma capsulatum and Mycobacterium avium Complex Infections in Febrile Patients With AIDS. Graviss E, Vanden Heuvel E, et al. JAIDS 2000;24:30-36.
34
Fig. 1
FIG. 1. Receiver operating characteristic (ROC) curve for individual variables and full model. The solid diagonal line indicates an area under the curve (AUC) of 0.5, which corresponds to a random chance at discrimination. LDH, lactate dehydrogenase; WBC, white blood cells; Alk Phos, alkaline phosphatase.
Copyright © 2009 JAIDS Journal of Acquired Immune Deficiency Syndromes. Published by Lippincott Williams & Wilkins.
Clinical Prediction Model for Differentiation of Disseminated Histoplasma capsulatum and Mycobacterium avium Complex Infections in Febrile Patients With AIDS
Graviss, Edward A.; Vanden Heuvel, Elizabeth A.; Lacke, Christine E.; Spindel, Steven A.; White, A. Clinton Jr; Hamill, Richard J.
JAIDS Journal of Acquired Immune Deficiency Syndromes. 24(1):30-36, May 1, 2000.
35
Overfitting Bias – “data-snooped” cut-offs take advantage of chance variations in the derivation set, making the test look falsely good.
Incorporation Bias – the index test is part of the gold standard (Sensitivity Up, Specificity Up).
Verification/Referral Bias – a positive index test increases referral for the gold standard (Sensitivity Up, Specificity Down).
Double Gold Standard – a positive index test leads to application of the definitive gold standard, while a negative index test results in clinical follow-up (Sensitivity Up, Specificity Up).
Spectrum Bias
◦ D+ are the sickest of the sick (Sensitivity Up)
◦ D- are the wellest of the well (Specificity Up)
Biases in Studies of Diagnostic Test Accuracy
Clinicians, Probability and EBM - T. Newman, MD
36
The STARD statement is to studies of diagnostic accuracy what CONSORT is to clinical trials and STROBE is to observational studies.
The objective is to improve the accuracy and completeness of reporting of studies of diagnostic accuracy, to allow readers to assess the potential for bias in the study (internal validity) and to evaluate its generalisability (external validity).
The STARD statement consists of a checklist of 25 items and recommends the use of a flow diagram describing the design of the study and the flow of patients.
Handouts attached. More at www.stard-statement.org
Reporting of Studies of Diagnostic Accuracy – The STARD Statement
37
These tools are very relevant in our setting, as we have many unanswered questions regarding alternative, cost-effective and optimal strategies for patient monitoring and treatment simplification.
Conclusion