
Epidemiology and Biostatistics 679: Clinical Epidemiology

June 6-29, 2005

Instructors:

Dr. Jean Bourbeau ([email protected])

Dr. Dick Menzies ([email protected])

Dr. Kevin Schwartzman (course coordinator; [email protected])

Research Offices:

Respiratory Epidemiology and Clinical Research Unit

Montreal Chest Institute K1

3650 St. Urbain

Course Objectives

• The general objective of this 3-credit course is to provide students with a basic understanding of the methods of epidemiology, as applied to clinical practice and clinical research.

• Specifically, we will address key principles of testing and measurement in the clinical context, as well as study design, analysis, and inference in the clinical research setting.

• Students will be encouraged to apply concepts covered in the course to their own areas of interest.

Course Materials

• Textbook: Fletcher, Clinical Epidemiology: The Essentials, 3rd edition

• Course pack with supplemental readings from McGill bookstore

• Lecture notes, handouts, assignments from course website (www.mcgill.ca/epi-biostat/summer/courses)

• Journal articles on-line from Health Sciences Library (www.health.library.mcgill.ca)

Format

• Ten classroom sessions, from 1:30-4:45 Mondays, Wednesdays, and Fridays for four weeks (no class June 24 and July 1)

• Attendance at all sessions is mandatory.

• Students will be divided into teams of 3-4, for purposes of assignments and presentations (8 groups total)

Assignments

• Before each lecture, an assignment addressing key points of that day’s lecture will be distributed.

• During each classroom session, one team will give an oral presentation outlining its answers to the assignment on the topic of that day’s lecture. Over the month, all students will be expected to present in this fashion.

• The written assignments must be handed in (1 per team) at the beginning of the following lecture.

Assignments

• For lecture 1 (today: diagnostic tests and screening) the oral presentation of the assignment will be during lecture 2, with the written assignment due at the beginning of lecture 3

• For lecture 2 (Wednesday, June 8: measurement issues) the oral presentation will also be during that class, with the written assignment due at the beginning of lecture 3

• After that, there will be one oral presentation per classroom session, with the written assignment due at the beginning of the following session

Assignments

• The assignments will include questions about papers from the medical literature, which reflect issues addressed in the lectures

• With the exception of assignments 1 and 2, these papers will be selected by the group responsible for each oral presentation, and identified ahead of time so that all students in the class use the same paper.

• Papers should be available on-line through the health sciences library

• For example, the students responsible for the oral presentation on cohort studies will select a paper reporting a cohort study of interest to them.

Assignments

• For the final assignment, each group will hand in a summary (maximum 2 pages double-spaced) of an original proposed research protocol, addressing a clinical research question which group members consider relevant.

• Further details on content and format will be provided in class. These summaries will be presented by the groups in class on Monday, June 27 and handed in that day.

Final Exam

• A written final exam, in short-answer format, will be administered in class on Wednesday, June 29.

Grading

• Written homework assignments (8): 20%

• Oral presentation of homework assignment: 10%

• Written protocol summary: 20%

• Oral presentation of protocol summary: 10%

• Final exam: 30%

• Class participation: 10%

TOTAL 100%

Academic Integrity

• It is understood that assignments submitted by groups of students will include contributions of all group members; for such assignments, a single copy submitted with all group members’ names will be sufficient.

• However, we expect that each group will submit its own assignment, written separately from those of other groups.

• The same holds true for the protocol summaries.

• Where assignments cite others’ research work, appropriate references must be provided.

• Direct quotes from other writers should be indicated by quotation marks.

Academic Integrity

III. ACADEMIC OFFENCES

The integrity of University academic life and of the degrees the University confers is dependent upon the honesty and soundness of the teacher-student learning relationship and, as well, that of the evaluation process. Conduct by any member of the University community that adversely affects this relationship or this process must, therefore, be considered a serious offence.

15 Plagiarism

(a) No student shall, with intent to deceive, represent the work of another person as his or her own in any academic writing, essay, thesis, research report, project or assignment submitted in a course or program of study or represent as his or her own an entire essay or work of another, whether the material so represented constitutes a part or the entirety of the work submitted.

(b) Upon demonstration that the student has represented and submitted another person’s work as his or her own, it shall be presumed that the student intended to deceive; the student shall bear the burden of rebutting this presumption by evidence satisfying the person or body hearing the case that no such intent existed, notwithstanding Article 22 of the Charter of Student Rights.

(c) No student shall contribute any work to another student with the knowledge that the latter may submit the work in part or whole as his or her own. Receipt of payment for work contributed shall be cause for presumption that the student had such knowledge; the student shall bear the burden of rebutting this presumption by evidence satisfying the person or body hearing the case that no such intent existed (notwithstanding Article 22 of the Charter of Students’ Rights).

Downloaded and excerpted from A Handbook on Student Rights and Responsibilities, 2003, p. 17.

Available on-line at http://upload.mcgill.ca/secretariat/greenbookenglish.pdf

Additional information is available at www.mcgill.ca/integrity/




Course Schedule

#    Date          Topics                                                                                            Instructor(s)
1    Mon June 6    Introduction, course overview                                                                     All
                   Diagnostic tests, screening, prevention                                                           Schwartzman
2    Wed June 8    Measurement issues: precision, validity, responsiveness; clinical scales/scores                   Bourbeau
3    Fri June 10   From clinical observations to research: hierarchy of study designs                                Menzies
                   Planning and designing a first study                                                              Dr. S. Dial, MUHC
4    Mon June 13   Measures of disease occurrence, association; descriptive, cross-sectional and ecologic studies    Menzies
5    Wed June 15   Cohort studies, survival analysis, selection bias                                                 Menzies
6    Fri June 17   Clinical trials                                                                                   Bourbeau
7    Mon June 20   Case-control studies                                                                              Schwartzman/Menzies
                   Beginning your own clinical research
                   Peer review process; protocol assignment
8    Wed June 22   Confounding, matching; analysis                                                                   Schwartzman
                   Inference and hypothesis testing
     Fri June 24   HOLIDAY - NO CLASS
9    Mon June 27   Protocol summary presentations                                                                    All
                   Exam review
10   Wed June 29   Final exam                                                                                        All

Lecture 1

Topic: DIAGNOSTIC TESTS AND SCREENING

Objectives

Students will be able to:

1. Define and calculate the following: sensitivity, specificity, positive and negative predictive values of diagnostic tests

2. Illustrate the influence of prevalence and/or pre-test probability on predictive values

3. Define pre- and post-test probabilities in terms of Bayes’ theorem and likelihood ratios

4. Identify key elements of screening programs and evaluations of their impact

5. Describe the impact of misclassification on results of clinical research studies

Diagnostic Tests and Screening

Readings:

• Fletcher, chapters 1 (Introduction), 3 (Diagnosis), 8 (Prevention)

• Barry MJ, Prostate-specific antigen testing for early diagnosis of prostate cancer, N Engl J Med 2001; 344:1373-1377 [Clinical Practice]

• Hamm CW et al, Emergency room triage of patients with acute chest pain by means of rapid testing for cardiac troponin T or troponin I, N Engl J Med 1997; 337:1648-53 (for assignment)

Tests as diagnostic aids and screening tools - key element of clinical medicine and public health.

• Electrocardiogram, cardiac enzymes for diagnosis of myocardial infarction

• Murphy’s sign (right upper abdominal tenderness on inspiration) in diagnosis of acute cholecystitis

• Pap smear for detection of cervical cancer

Also essential in many epidemiologic studies where diagnostic criteria and/or tests are used to establish exposure, outcome status.

Goal is to minimize misclassification; yet some misclassification may be inevitable for logistical reasons

Diagnostic Tests


Definitive diagnosis/classification may be difficult or impossible to obtain.

“Gold standard” may be expensive, inappropriate (e.g. autopsy based) or unsuitable (e.g. clinical follow-up when immediate decision required).

Tests may serve as surrogates but this requires that they be appropriately validated against a suitable gold standard - and that their properties be documented.


We will focus largely on the situation where the diagnosis/outcome and the test result are both dichotomous, i.e.

Disease: Present vs. absent

Test: Positive vs. negative

We need to know how well the test separates those who have the disease of interest from those who do not.


We can use a 2x2 table to describe the various possibilities:

            Disease +        Disease -
Test +      True + (TP)      False + (FP)
Test -      False - (FN)     True - (TN)

True positive rate = P(T+ | D+) = TP/(TP+FN)

= Sensitivity: The probability that a diseased individual will be identified as such by the test

True negative rate = P(T- | D-) = TN/(TN+FP)

= Specificity: The probability that an individual without the disease will be identified as such by the test

Complementary probabilities:

False negative rate = FN/(TP+FN) = P(T- | D+) = 1-sensitivity

False positive rate = FP/(TN+FP) = P(T+ | D-) = 1-specificity


Example:

A researcher develops a new saliva pregnancy test. She collects samples from 100 women known to be pregnant by blood test (the gold standard) and 100 women known not to be pregnant, based on the same blood test.

The saliva test is “positive” in 95 of the pregnant women. It is also “positive” in 15 of the non-pregnant women. What are the sensitivity and specificity?


            Pregnant    Non-pregnant    Totals
Saliva +    95          15              110
Saliva -    5           85              90
Totals      100         100             200

Sensitivity = TP/(TP+FN) = 95/100 = 95%

Specificity = TN/(TN+FP) = 85/100 = 85%
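The calculation above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical (not from the course materials), and the counts are the ones in the saliva test table.

```python
def sensitivity(tp: int, fn: int) -> float:
    """P(T+ | D+): proportion of diseased individuals who test positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """P(T- | D-): proportion of disease-free individuals who test negative."""
    return tn / (tn + fp)


# Counts from the saliva pregnancy test validation example
tp, fn = 95, 5    # pregnant women: saliva positive / saliva negative
fp, tn = 15, 85   # non-pregnant women: saliva positive / saliva negative

print(f"Sensitivity = {sensitivity(tp, fn):.0%}")
print(f"Specificity = {specificity(tn, fp):.0%}")
```

The complementary false negative and false positive rates are simply one minus these two values.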


Is it more important that a test be sensitive or specific?

• It depends on its purpose. A cheap mass screening test should be sensitive (few cases missed). A test designed to confirm the presence of disease should be specific (few cases wrongly diagnosed).

• Note that sensitivity and specificity are two distinct properties. Where classification is based on a cutpoint along a continuum, there is a tradeoff between the two.


Example: The saliva pregnancy test detects progesterone.

A refined version is developed.

Suppose you add a drop of indicator solution to the saliva sample. It can stay clear (0 reaction) or turn green (1+), red (2+), or black (3+).

(For purposes of discussion we will ignore overlapping colors)


The researcher conducts a validation study and finds the following:

             Pregnant    Non-pregnant    Totals
Saliva 3+    85          5               90
Saliva 2+    10          10              20
Saliva 1+    3           17              20
Saliva 0     2           68              70
Totals       100         100             200


The sensitivity and specificity of the saliva test will depend on the definition of “positive” and “negative” used.

• If “positive” ≥ 1+, sensitivity = (85+10+3)/100 = 98%

specificity = 68/100 = 68%

• If “positive” ≥ 2+, sensitivity = (85+10)/100 = 95%

specificity = (68+17)/100 = 85%

• If “positive” = 3+, sensitivity = 85/100 = 85%

specificity = (68+17+10)/100 = 95%
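The tradeoff can also be tabulated programmatically. The sketch below is an illustration rather than part of the course materials; the dictionary layout and function name are assumptions, and the counts come from the validation table above.

```python
# Validation counts by reaction grade: grade -> (pregnant, non-pregnant)
counts = {3: (85, 5), 2: (10, 10), 1: (3, 17), 0: (2, 68)}


def sens_spec_at_cutpoint(counts, cutpoint):
    """Treat reaction grades >= cutpoint as 'positive', lower grades as 'negative'."""
    total_pregnant = sum(p for p, _ in counts.values())
    total_nonpregnant = sum(n for _, n in counts.values())
    tp = sum(p for grade, (p, _) in counts.items() if grade >= cutpoint)
    tn = sum(n for grade, (_, n) in counts.items() if grade < cutpoint)
    return tp / total_pregnant, tn / total_nonpregnant


for cutpoint in (1, 2, 3):
    sens, spec = sens_spec_at_cutpoint(counts, cutpoint)
    print(f"positive if >= {cutpoint}+: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Raising the cutpoint moves women from the "positive" to the "negative" group, so sensitivity falls as specificity rises.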


The choice of cutpoint depends on the relative adverse consequences of false-negatives vs. false-positives.

If it is most important not to miss anyone, use a cutpoint that gives higher sensitivity, at the cost of lower specificity.

If it is most important that people not be erroneously labeled as having the condition, use a cutpoint that gives higher specificity, at the cost of lower sensitivity.


In practice, the clinician or researcher needs to know how to interpret test results without the simultaneous gold standard measurement.

(If you already know the “gold standard” result, why would you obtain the other test?)

Hence we need to know:

1. How likely is a patient to have the condition of interest, given a “positive” test result?

This is P(D+ | T+), or the positive predictive value of the test [=TP/(TP+FP)]

2. How likely is a patient not to have the condition of interest, given a “negative” test result?

This is P(D- | T-), or the negative predictive value of the test [=TN/(TN+FN)]


Key point: The positive and negative predictive values depend on the pretest probability of the condition of interest - in addition to the sensitivity and specificity of the test.

This pretest probability is often the prevalence of the condition in the population of interest.

But it can also reflect restriction of this population based on clinical features and/or other test results.

For example, the pretest probability of pregnancy will be very different among young women using oral contraceptives from that among sexually active young women using no form of contraception.


Example: The saliva pregnancy test is administered 30 days after the first day of the last menstrual period to two groups of women who have thus far “missed” a period.

Group 1: 1000 sexually active young women using no contraception. Pretest probability of pregnancy 40% (hypothetical)

Based on sensitivity of 95%, expected TP = 400 x 0.95 = 380

expected FN = 400-380 = 20

Based on specificity of 85%, expected TN = 600 x 0.85 = 510

expected FP = 600-510 = 90

          Pregnant    Non-pregnant    Totals
Test +    380         90              470
Test -    20          510             530
Totals    400         600             1000


Positive predictive value = TP/(TP+FP) = 380/470 = 81%

In this context, a woman with a positive saliva test has an 81% chance of being pregnant.

Negative predictive value = TN/(TN+FN) = 510/530 = 96%

In this context, a woman with a negative saliva test has a 96% chance of not being pregnant (and a 4% chance of being pregnant)


Group 2: 1000 oral contraceptive users - pretest probability of pregnancy = 10% (hypothetical)

          Pregnant    Non-pregnant    Totals
Test +    95          135             230
Test -    5           765             770
Totals    100         900             1000

Using sensitivity = 95%, expected TP = 0.95 x 100 = 95

expected FN = 100-95 = 5

Using specificity = 85%, expected TN = 0.85 x 900 = 765

expected FP = 900-765 = 135


In this context, positive predictive value is only TP/(TP+FP) = 95/230 = 41%

Negative predictive value is TN/(TN+FN) = 765/770 = 99%


In which situation is the saliva test more helpful?

Group 1 (pretest probability 40%):

Test +: 81% probability of pregnancy

Test -: 4% probability of pregnancy

Group 2 (pretest probability 10%):

Test +: 41% probability of pregnancy

Test -: 1% probability of pregnancy


• Note that the same test would likely be used and interpreted very differently in these two contexts.

• This does not imply any difference in the characteristics of the test itself, i.e. sensitivity and specificity are not altered by the pretest probability of the condition of interest.

• Tests are most useful when the pretest probability is in a middle range. They are unlikely to be useful when the pretest probability is already very high or low.


Deriving predictive values (post-test probabilities) using a 2x2 table:

1. Fill in totals with/without disease based on pretest probabilities. In general these depend on external information about the population of interest and cannot be extrapolated from a validation study.

2. Fill in the true positives and false negatives using sensitivity.

- TP = Number with disease x sensitivity

- FN = Number with disease x (1-sensitivity)

3. Fill in true negatives and false positives using specificity.

- TN = Number free of disease x specificity

- FP = Number free of disease x (1-specificity)

4. Calculate PPV = TP/(TP+FP)

Calculate NPV = TN/(TN+FN)
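The four steps above can be sketched as a small Python routine. The function names are illustrative assumptions; the inputs (1000 women, sensitivity 95%, specificity 85%, pretest probabilities of 40% and 10%) are the hypothetical values from the two group examples.

```python
def expected_counts(n, pretest_prob, sensitivity, specificity):
    """Steps 1-3: expected 2x2 cell counts for n tested persons."""
    diseased = n * pretest_prob        # step 1: totals with/without disease
    disease_free = n - diseased
    tp = diseased * sensitivity        # step 2: split the diseased column
    fn = diseased - tp
    tn = disease_free * specificity    # step 3: split the disease-free column
    fp = disease_free - tn
    return tp, fp, fn, tn


def predictive_values(tp, fp, fn, tn):
    """Step 4: positive and negative predictive values."""
    return tp / (tp + fp), tn / (tn + fn)


for label, pretest in [("Group 1", 0.40), ("Group 2", 0.10)]:
    ppv, npv = predictive_values(*expected_counts(1000, pretest, 0.95, 0.85))
    print(f"{label}: PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

Only the pretest probability changes between the two calls; sensitivity and specificity are held fixed, which is why the predictive values differ while the test characteristics do not.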


Bayes’ theorem:

Allows us to calculate revised (“posterior” or post-test) probabilities, based on “prior” (pretest) probabilities and new information (here, test results).

General form:

P(B | A) = [P(A | B) x P(B)] / {[P(A | B) x P(B)] + [P(A | not B) x P(not B)]}

Note that "not B" is the complement of B, so P(not B) = 1 - P(B)

For positive predictive value,

P(D+ | T+) = [P(T+ | D+) x P(D+)] / {[P(T+ | D+) x P(D+)] + [P(T+ | D-) x P(D-)]}

Note this is identical to TP/(TP+FP)

For negative predictive value,

P(D- | T-) = [P(T- | D-) x P(D-)] / {[P(T- | D-) x P(D-)] + [P(T- | D+) x P(D+)]}

which is equal to TN/(TN+FN)


Example:

What would be the positive and negative predictive values for the saliva pregnancy test if the pretest probability of pregnancy is 20%?

(sensitivity = 95%, specificity = 85%)

P(pregnant | T+) = [P(T+ | pregnant) x P(pregnant)] / {[P(T+ | pregnant) x P(pregnant)] + [P(T+ | not pregnant) x P(not pregnant)]}

= (0.95 x 0.2) / [(0.95 x 0.2) + (0.15 x 0.8)] = 0.19 / (0.19 + 0.12) = 0.61 or 61%

P(not pregnant | T-) = [P(T- | not pregnant) x P(not pregnant)] / {[P(T- | not pregnant) x P(not pregnant)] + [P(T- | pregnant) x P(pregnant)]}

= (0.85 x 0.8) / [(0.85 x 0.8) + (0.05 x 0.2)] = 0.68 / (0.68 + 0.01) = 0.99 or 99%
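The same Bayes' theorem arithmetic can be checked with a short Python sketch. The function names are hypothetical; the formulas are the ones given for positive and negative predictive value, with P(T+ | D-) written as 1 - specificity and P(T- | D+) as 1 - sensitivity.

```python
def ppv_bayes(pretest, sensitivity, specificity):
    """P(D+ | T+) from Bayes' theorem."""
    true_pos = sensitivity * pretest
    false_pos = (1 - specificity) * (1 - pretest)
    return true_pos / (true_pos + false_pos)


def npv_bayes(pretest, sensitivity, specificity):
    """P(D- | T-) from Bayes' theorem."""
    true_neg = specificity * (1 - pretest)
    false_neg = (1 - sensitivity) * pretest
    return true_neg / (true_neg + false_neg)


# Saliva test with a 20% pretest probability of pregnancy
print(f"PPV = {ppv_bayes(0.20, 0.95, 0.85):.0%}")
print(f"NPV = {npv_bayes(0.20, 0.95, 0.85):.0%}")
```

Because these functions work with probabilities rather than counts, they give the same answers as a 2x2 table of any size built from the same inputs.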

Likelihood Ratios

• An alternative way of developing post-test probabilities (predictive values)

• Relationship between pre- and post-test odds, where

• Odds = [probability of x]/[1-probability of x]

– If pre-test probability of pregnancy is 20%, then odds of pregnancy = 0.2/(1-0.2) = 0.25

– Odds of no pregnancy = 0.8/(1-0.8) = 4 [the reciprocal]

• Probability = [odds of x]/[1+odds of x]

– If prior odds of pregnancy = 0.25, then pre-test probability of pregnancy = 0.25/(1+0.25) = 0.2


Likelihood Ratios

• Post-test odds = pre-test odds x likelihood ratio, where

• Likelihood ratio = P(test result | condition of interest) / P(test result | no condition of interest)


Likelihood Ratios

• Pregnancy example, saliva test as before

– Prior odds 0.25 (20% pre-test probability)

– Sensitivity 95%, specificity 85%

• Post-test odds with positive test

= 0.25 x (0.95/0.15)

= 0.25 x 6.33 = 1.58

• Post-test probability = 1.58/(1+1.58) = 61%

• This approach can be particularly useful for tests with multiple categories, and for serial testing
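A minimal Python sketch of the odds-based calculation follows. The function names are illustrative. The likelihood ratio for a negative result, (1 - sensitivity)/specificity, is not shown on the slide; it is included here as the standard counterpart of the positive likelihood ratio.

```python
def prob_to_odds(p):
    return p / (1 - p)


def odds_to_prob(odds):
    return odds / (1 + odds)


def lr_positive(sensitivity, specificity):
    """P(T+ | condition) / P(T+ | no condition)."""
    return sensitivity / (1 - specificity)


def lr_negative(sensitivity, specificity):
    """P(T- | condition) / P(T- | no condition)."""
    return (1 - sensitivity) / specificity


def post_test_prob(pretest, likelihood_ratio):
    """Post-test odds = pre-test odds x likelihood ratio, converted back to a probability."""
    return odds_to_prob(prob_to_odds(pretest) * likelihood_ratio)


sens, spec = 0.95, 0.85
print(f"After a positive test: {post_test_prob(0.20, lr_positive(sens, spec)):.0%}")
print(f"After a negative test: {post_test_prob(0.20, lr_negative(sens, spec)):.1%}")
```

For serial testing, the post-test probability from one test can be fed in as the pretest probability for the next, provided the tests can reasonably be treated as independent given disease status (an assumption that should be checked in practice).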


Pitfalls in assessments of diagnostic test performance

• Importance of pretest probability, as discussed.

• Pretest probability (and predictive values) cannot ordinarily be extrapolated from a validation study, since the proportions with and without disease are determined by the investigator - unless there is truly random sampling that reflects the context in which the test will be applied.


Was the test applied in a consistent fashion to all members of the validation sample?

e.g. was test interpretation properly blinded?

(unrelated to “true” presence or absence of disease or clues to it)

Was the gold standard applied in a consistent fashion to all members of the validation sample?

(again, blinded application not related to results of test(s) being evaluated)


Example: New diagnostic tests for pulmonary embolism

“Positive” results confirmed by pulmonary angiography (an invasive test with some risk)

“Negative” results confirmed by clinical follow-up, i.e. does the patient return with further symptoms or signs?

- this condition can resolve spontaneously and not recur


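Result: Good documentation of true and false positives

But true negatives are overestimated and false negatives underestimated (missed cases that resolve spontaneously are counted as true negatives)

Hence sensitivity of the test is overestimated

Specificity of the test is also overestimated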


Importance of the sample used for test validation:

• What was the spectrum of the condition evaluated?

• How similar is this to the situation in which the test will be used?

Example: saliva pregnancy test

Imagine that the test hinges on its ability to detect progesterone, a hormone whose level increases as pregnancy progresses

• If the test is validated by comparing women who are 3 months pregnant with young, non-pregnant women, it will perform very well as progesterone levels are very high by 3 months.


• On the other hand, the sensitivity may be much lower if the pregnant group consists of women who are only 1 month after their last menstrual period.

• Conversely, the estimated specificity of the test will be higher if the comparison group has very low progesterone levels (e.g. postmenopausal women).


You would reject results of a validation study involving women who are 3 months pregnant, or women who are postmenopausal

• by 3 months, pregnancy is usually relatively obvious by history and thus is unlikely to be the situation where the test will be used.

• the test would never be administered to post-menopausal women!


So:

Sensitivity and specificity estimates do not depend on the prevalence of the condition in question.

BUT their values and their validity depend on the context in which they were obtained, vis-a-vis the context in which they will be used.

This in turn will affect positive and negative predictive values, quite apart from the prevalence/prior probability of the condition.


Misclassification

The use of an imperfect diagnostic test leads to misclassification (assigning individuals to the wrong category). In research studies, it is most often non-differential.

• That is, the probability of misclassification is not associated with the exposure or intervention under study.

• For example, the use of an imperfect cardiac enzyme assay to define myocardial infarction in a primary prevention study with a novel anti-platelet agent.

• Another example: ascertaining the development of HIV infection based on a saliva test, comparing injection drug users who do vs. who do not clean their needles (in a cohort study).


• The effect of nondifferential misclassification is to dilute any association which may be present, i.e. the effect measure is biased toward the null value.

• Consider the extreme case where the cardiac enzyme assay is no better than flipping a coin. Then no effect of the antiplatelet drug will be detected, even if it is truly very beneficial.

• If the degree of misclassification is known, then corrected 2x2 tables and parameter estimates can be derived.
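The dilution effect can be illustrated numerically. The sketch below is an assumption-laden illustration, not a method from the course: the true risks (10% on placebo, 5% on the drug) and the assay sensitivity and specificity are invented for the example, and the function names are hypothetical. The observed risk in each group is the sum of correctly detected true events and false-positive events.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Expected proportion labeled as events when the outcome test is imperfect."""
    return sensitivity * true_risk + (1 - specificity) * (1 - true_risk)


def observed_risk_ratio(risk_treated, risk_control, sensitivity, specificity):
    """Risk ratio computed from misclassified outcomes (same error rates in both groups)."""
    return (observed_risk(risk_treated, sensitivity, specificity)
            / observed_risk(risk_control, sensitivity, specificity))


# Hypothetical true risks of myocardial infarction: 5% on drug, 10% on placebo (true RR = 0.5)
for label, sens, spec in [("perfect assay", 1.0, 1.0),
                          ("imperfect assay", 0.8, 0.9),
                          ("coin flip", 0.5, 0.5)]:
    rr = observed_risk_ratio(0.05, 0.10, sens, spec)
    print(f"{label}: observed risk ratio = {rr:.2f}")
```

With the same error rates in both arms, the observed risk ratio moves from 0.5 toward 1 as the assay degrades, and equals 1 in the coin-flip case.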


Differential misclassification implies that measurement error is associated with study group membership, i.e. it operates differentially between groups.

For example, imagine that the antiplatelet drug directly interferes with the cardiac enzyme assay, leading to underestimation of enzyme levels.

Here, the drug may appear to be protective even if in reality, it is no better than placebo.

Hence depending on the specific circumstances, differential misclassification may lead to under- or overestimation of the true association between exposure and outcome.

Screening

• “The identification of an unrecognized disease or risk factor by…[a] procedure that can be applied rapidly.” (Fletcher, p. 167)

• Screening is relevant only if disease is relatively common, testing is sensitive, specific, and cost-effective, and early treatment improves outcomes

Sensitivity may be calculated by

• Detection method:

Sensitivity = (Cases found by screening) / (Cases found by screening + those identified during followup of screened persons [interval cases])

• Incidence method:

Sensitivity = (Incidence among unscreened - interval incidence among screened) / (Incidence among unscreened)

Incidence method accounts for “overdiagnosis” of abnormalities that are not clinically important, e.g. prostate cancer
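The two ways of estimating the sensitivity of a screening test can be compared in a short Python sketch. All numbers below are invented for illustration (they do not come from the course or from any study), and the function names are hypothetical.

```python
def sensitivity_detection(screen_detected, interval_cases):
    """Detection method: screen-detected cases over all cases among screened persons."""
    return screen_detected / (screen_detected + interval_cases)


def sensitivity_incidence(incidence_unscreened, interval_incidence_screened):
    """Incidence method: proportion of expected incidence removed by screening."""
    return (incidence_unscreened - interval_incidence_screened) / incidence_unscreened


# Hypothetical programme: 80 cases found by screening, 20 interval cases
print(f"Detection method: {sensitivity_detection(80, 20):.0%}")

# Hypothetical rates per 1000 person-years: 5.0 if unscreened, 1.5 interval cases if screened
print(f"Incidence method: {sensitivity_incidence(5.0, 1.5):.0%}")
```

If screening picks up abnormalities that would never have surfaced clinically, they inflate the numerator of the detection method but do not enter the incidence method, so the detection-method estimate tends to be the higher of the two.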


Biases in performance of screening tests

(Does screening lead to better survival?)

1. Lead time bias

The earlier in its natural history an ultimately fatal disease is detected, the longer will be the survival from the time of diagnosis, even if there is no difference in treatment effect.

e.g.

Disease develops --(2 years)--> Detectable by screening --(3 years)--> Clinical symptoms --(5 years)--> Death

If two persons, A and B, develop the same disease at the same age but person A is diagnosed by screening, person A will live 3 more years than person B from time of diagnosis, even if neither is treated, though the chronological survival is equivalent
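The lead time arithmetic can be made explicit with a brief Python sketch. It assumes the timeline reads as 2 years from disease onset to detectability by screening, 3 years from detectability to clinical symptoms, and 5 years from symptoms to death, with no treatment effect; the constant and function names are illustrative.

```python
# Assumed natural history (years), identical for both persons and unaffected by treatment
ONSET_TO_DETECTABLE = 2
DETECTABLE_TO_SYMPTOMS = 3
SYMPTOMS_TO_DEATH = 5


def survival_from_diagnosis(diagnosed_by_screening: bool) -> int:
    """Years lived after diagnosis, depending on when in the natural history diagnosis occurs."""
    if diagnosed_by_screening:
        return DETECTABLE_TO_SYMPTOMS + SYMPTOMS_TO_DEATH
    return SYMPTOMS_TO_DEATH


def survival_from_onset() -> int:
    """Years lived after disease onset: the same regardless of how diagnosis was made."""
    return ONSET_TO_DETECTABLE + DETECTABLE_TO_SYMPTOMS + SYMPTOMS_TO_DEATH


person_a = survival_from_diagnosis(diagnosed_by_screening=True)
person_b = survival_from_diagnosis(diagnosed_by_screening=False)
print(f"A (screened): {person_a} years from diagnosis")
print(f"B (symptomatic): {person_b} years from diagnosis")
print(f"Lead time: {person_a - person_b} years; both live {survival_from_onset()} years from onset")
```

The apparent survival advantage equals the lead time exactly, which is why survival measured from diagnosis cannot by itself show that screening is beneficial.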


2. Length bias

The probability of detecting a disease during its preclinical period is proportional to the length of that period, which is inversely proportional to the rate of disease progression.

Hence cases diagnosed by screening may be “destined” for a more favourable evolution, regardless of treatment.


3. Overdiagnosis bias (a variant of length bias; courtesy of Dr. W. Black)

Screening may detect disease that would never have become clinically detectable, e.g. remains stable or regresses spontaneously.

It may also detect disease that would not have contributed to the patient’s death e.g. competing mortality risks among smokers with early-stage lung cancer, or men with early-stage prostate cancer detected by PSA screening.


4. Compliance bias

• Persons who comply with a screening intervention may, on average, be healthier and have healthier behaviours than non-compliers.

• Also likely to be healthier than an unscreened “control group,” which implicitly includes a mixture of persons who would and would not have complied, had they been offered screening.

• Leads to biases in observational (non-randomized) studies, and with analyses limited to “compliers” within randomized trials.

• Relevance of “intent to screen” analyses.
