

1

EPI-820 Evidence-Based Medicine (EBM)

LECTURE 2: MEDICAL MEASUREMENT

Mat Reeves BVSc, PhD

Department of Epidemiology

Michigan State University

2

Objectives:

• 1. Understand biological and measurement variation and their effects on precision and validity.
• 2. Understand the components of variability:
– biological and measurement
– between- and within-person/observer
• 3. Understand measures of variation and measures of agreement.
• 4. Understand the calculation and application of kappa (K).
• 5. Understand the consequences of variability in clinical data and possible remedies to ameliorate them.
• 6. Understand regression to the mean.

3

I. Variation in Clinical Data

• 1. Biologic Variation = variation in the actual entity being measured

• derives from the dynamic nature of physiology, homeostasis and pathophysiology.

• within-person (intra-person) biologic variability, and
• between-person (inter-person) biologic variability

4

Within-Person (day-to-day) and Between-Person Biological Variation: Coefficient of Variation (%) (see Winkel et al., 1974)

Variable      CV (Within)   CV (Between)
Na            0.7%          0.8%
K             4.3%          4.3%
Cl            2.1%          1.2%
Ca            1.7%          2.8%
BUN           12.3%         16.4%
Creatinine    4.3%          9.5%
Cholesterol   5.3%          13.6%
SGOT (AST)    24.2%         24.8%
TP            2.9%          5.7%

5

I. Variation in Clinical Data

• 2. Measurement Variation = variation due to the measurement process

• inaccuracy of the instrument (instrument error), and/or,

• inaccuracy of the person (operator error)

• can introduce both random error and bias

6

Analytical Variation - Coefficient of Variation (%) of Duplicate Samples

Variable      CV (Analytical)
Na            1.1%
K             2.6%
Cl            2.1%
Ca            2.1%
BUN           2.2%
Creatinine    3.4%
Cholesterol   3.1%
SGOT (AST)    7.3%
TP            1.7%

7

Validity

• Degree to which a measurement process measures what it is intended to measure, i.e., accuracy.

• Lack of systematic error or bias.

• A valid instrument will, on average, be close to the underlying true value.

• Assessment of validity requires a “gold standard” (a reference).

8

What if no gold standard? (e.g., pain, nausea or anxiety)

• Use an instrument or clinical scale to measure a specific phenomenon or construct.

• Criterion Validity - the degree to which the scale predicts a directly observable phenomenon, e.g., APGAR score and neonatal survival.

• Content Validity - the extent to which the instrument includes all of the dimensions of the construct being measured, e.g., does APGAR include all relevant pathophysiological parameters?

• Construct Validity - the degree to which the scale correlates with other known measures of the phenomenon, e.g., how well does a new “Neonatal assessment scale” correlate with APGAR score?

9

How do you measure validity?

• Dichotomous data: sensitivity, specificity, and predictive values.

• Continuous data: mean and standard deviation of the differences between the surrogate measure and the gold standard (see Bland and Altman, 1986), as sketched below.
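
For the continuous case, a minimal sketch of the Bland-Altman summary in Python; the paired readings are hypothetical illustration data, not from the lecture:

```python
import numpy as np

# Hypothetical paired readings: surrogate instrument vs. gold standard
gold      = np.array([102.0, 115.0,  98.0, 130.0, 121.0, 109.0, 140.0,  95.0])
surrogate = np.array([105.0, 113.0, 101.0, 128.0, 126.0, 111.0, 143.0,  99.0])

diff = surrogate - gold          # per-subject differences
bias = diff.mean()               # mean difference = systematic error (bias)
sd   = diff.std(ddof=1)          # SD of differences = random disagreement

# Bland-Altman "limits of agreement": bias +/- 1.96 SD
lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```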

10

Precision (or reliability or reproducibility)

• the extent to which repeated measurements of a phenomenon tend to yield the same results (regardless of their accuracy!).

• Precision refers to the lack of random error

• Precision ~ 1 / random error   

11

Hard versus Soft Data?

• Blood chloride level

• Left ventricular ejection volume

• Migraine severity

• 28-d stroke case-fatality rate 

• Indirect costs of school absenteeism 

• Direct costs of school absenteeism 

• Degree of depression 

• Alzheimer severity 

• Self-reported ability to do domestic chores 

• Self-reported ability to climb stairs 

• Patient preferences for induced labour 

• Self-reported assessment of health

12

Hard versus Soft Data

• No specific criteria define “hard” data; attributes include:
• Consistency: the ability to preserve basic evidence (repeated observations are consistent) (the most important attribute).

• Objectivity: observations are free of subjective influences.

• Quantifiable: the ability to express the result as a number.

13

Hard versus Soft Data

• Usually hard data are numeric measures, such as lab data, but not always (e.g., histology, cancer stage)

• Hard (numeric) data preferred to softer (qualitative) measures because they are more objective and reliable? (but see Feinstein AR et al, 1985, Will Rogers phenomenon)

14

Between and Within Person Variation

• Four categories of clinical variability:

• 1. Between-person biological variability
• 2. Within-person biological variability
• 3. Between-observer measurement variability
• 4. Within-observer measurement variability

15

ANOVA Model Conceptualization

• y_ijkl = μ_i + β_ij + γ_ik + ε_il

• where:
– y_ijkl = the observed measurement for individual i, measured at time j, by the kth observer, at the lth replication.
– μ_i = individual i’s usual true mean (between-person biological variation)
– β_ij = perturbation due to biological variation at time j (within-person biologic variation)
– γ_ik = perturbation due to measurement error by the kth observer (between-observer measurement variation)
– ε_il = perturbation due to measurement error at the lth replication (within-observer measurement variation)
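
A small simulation can make the four components concrete. This is a sketch under assumed variance values (the standard deviations below are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed standard deviations for each component (illustrative values only)
sd_between_person   = 10.0  # spread of true means mu_i across people
sd_within_person    = 4.0   # day-to-day biological fluctuation beta_ij
sd_between_observer = 2.0   # observer-specific measurement error gamma_ik
sd_within_observer  = 1.0   # replicate-to-replicate error eps_il

n_people, n_times, n_observers, n_reps = 50, 3, 2, 2

mu    = 100 + sd_between_person * rng.standard_normal(n_people)
beta  = sd_within_person * rng.standard_normal((n_people, n_times))
gamma = sd_between_observer * rng.standard_normal((n_people, n_observers))
eps   = sd_within_observer * rng.standard_normal(
            (n_people, n_times, n_observers, n_reps))

# y[i, j, k, l] = mu_i + beta_ij + gamma_ik + eps_ijkl
y = (mu[:, None, None, None]
     + beta[:, :, None, None]
     + gamma[:, None, :, None]
     + eps)

print("total variance of y:", round(y.var(ddof=1), 1))
print("sum of component variances:",
      sd_between_person**2 + sd_within_person**2
      + sd_between_observer**2 + sd_within_observer**2)  # 121.0
```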

16

II. Statistical aspects of variability

• A. Measures of Variation
• 1. Variance and Standard Deviation

• SD = the square root of the average squared difference of individual values from the overall mean:

• SD = √( Σ(x_i − x̄)² / (n − 1) )

• Normal distribution: ~68%, 95% and 99% of values lie within 1, 2 and 3 SDs of the mean.
• Example:
– Av. US Cholesterol = 220 mg/dl, SD = 15 mg/dl
– Indv. readings expected to vary 190-250 mg/dl (mean ± 2 SD)
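
A quick check of these quantities in NumPy; the sample readings are hypothetical:

```python
import numpy as np

# Hypothetical cholesterol readings (mg/dl)
x = np.array([195.0, 212.0, 230.0, 241.0, 208.0, 225.0, 199.0, 238.0])

mean = x.mean()
sd = x.std(ddof=1)       # n - 1 in the denominator, as in the formula above

# ~95% of individual readings are expected within mean +/- 2 SD
print(f"mean = {mean:.1f} mg/dl, SD = {sd:.1f} mg/dl")
print(f"expected 95% range: {mean - 2*sd:.0f} to {mean + 2*sd:.0f} mg/dl")
```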

17

A. Measures of Variation

• 2. Coefficient of Variation (CV)

• CV = (SD / x̄) × 100%

• represents the % variation of a set of measurements around their mean
• conceptualized as a “noise-to-signal ratio”
• a useful index for comparing the precision of different instruments, individuals and/or laboratories.
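
A minimal sketch comparing the precision of two hypothetical analyzers via CV:

```python
import numpy as np

def cv(values: np.ndarray) -> float:
    """Coefficient of variation: SD as a percentage of the mean."""
    return 100.0 * values.std(ddof=1) / values.mean()

# Repeated measurements of the same specimen on two hypothetical analyzers
analyzer_a = np.array([138.0, 141.0, 139.0, 140.0, 142.0])
analyzer_b = np.array([131.0, 147.0, 136.0, 145.0, 139.0])

# The analyzer with the lower CV is the more precise (less noisy) one
print(f"CV(A) = {cv(analyzer_a):.1f}%   CV(B) = {cv(analyzer_b):.1f}%")
```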

18

B. Measures of Agreement

• 1. Correlation (r)
• Pearson product-moment correlation and Spearman’s rank correlation

• measures the degree of linear relationship between two variables (-1 to +1)

• correlation between two sets of continuous measurements (= reliability) or the extent of replication

19

1. Correlation (Cont’d)

• Two observers, same time period = inter-rater reliability.

• Single observer, two time periods = intra-rater reliability (test-retest reliability).

• Can have very high values of r, but little direct agreement between raters or instruments.

• Can only be used as a test of validity if the actual true values are known.
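
The caveat about high r with little direct agreement is easy to demonstrate: a constant offset between raters leaves r ≈ 1 even though they never agree. A sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(1)

rater_a = rng.normal(100, 15, size=200)
rater_b = rater_a + 20        # rater B reads 20 units higher on every subject

r = np.corrcoef(rater_a, rater_b)[0, 1]
mean_diff = (rater_b - rater_a).mean()

print(f"r = {r:.3f}")                        # ~1.000: perfect linear relation
print(f"mean difference = {mean_diff:.1f}")  # 20.0: no actual agreement
```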

20

B. Measures of Agreement

2. Intra-class Correlation Coefficient (R, or reliability)

• a measure of reliability for continuous or quantitative data
• an observed value (X) consists of two parts:
• X = T + e
– where:
• T = the “True” unknown level or “error-free” score or “steady state” or “signal”
• e = error (whether “biologic” or “measurement” error)
• the true error-free value varies about some unknown mean (μ) with a variance of σ²_T.

21

2. R (Cont’d)

• the error term is regarded as iid (mean = 0, variance = σ²_e).
• Variance of X: σ²_X = σ²_T + σ²_e
• the relative size of the error variance (σ²_e) in relation to the variance of the true value (σ²_T) is a measure of the imprecision.

• R = σ²_T / (σ²_T + σ²_e)

• R = the proportion of the total variance due to subject-to-subject (or between-person) variability in the “true” value.
• As random error decreases, the value of R increases.
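
A sketch estimating R from replicate measurements under the X = T + e model; the variance values are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

var_T, var_e = 25.0, 5.0        # assumed true-score and error variances
n_subjects, n_reps = 500, 3

T = rng.normal(0, np.sqrt(var_T), size=n_subjects)                    # unobserved true scores
X = T[:, None] + rng.normal(0, np.sqrt(var_e), (n_subjects, n_reps))  # observed replicates

# Moment estimates from the observed replicates only:
est_var_e = X.var(axis=1, ddof=1).mean()   # within-subject variance -> sigma^2_e
est_var_X = X.var(ddof=1)                  # total variance -> sigma^2_T + sigma^2_e
est_R = 1 - est_var_e / est_var_X

print(f"theoretical R = {var_T / (var_T + var_e):.3f}")   # 0.833
print(f"estimated  R ~ {est_R:.3f}")
```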

22

3. Categorical data – Kappa (K)

• A measure of reliability for categorical or qualitative data.

• Kappa corrects for the degree of chance in the overall level of agreement, and is preferred over other measures (like overall percent agreement).

• K = (Po − Pe) / (1 − Pe) = actual agreement beyond chance / potential agreement beyond chance

• Po = the total proportion of observations on which there is agreement

• Pe = the proportion of agreement expected by chance alone.

23

Agreement matrix for kappa statistic (inter-rater agreement, 2 observers, dichotomous data)

                 OBSERVER B
OBSERVER A    Yes    No    TOTALS
Yes            a      b      f1
No             c      d      f2
TOTALS        n1     n2      N

24

Agreement matrix for kappa statistic (2 observers, dichotomous data)

                 OBSERVER B
OBSERVER A    Yes    No    TOTALS
Yes            69    15      84
No             18    48      66
TOTALS         87    63     150

25

K (Cont’d)

• Observed agreement (Po) = (69 + 48)/150 = 0.78 or 78%.

• Agreement expected due to chance (Pe) = 51%.

• Calculated from the expected counts in cells a and d, each the product of its marginal totals divided by N: 87 × 84/150 = 48.72 and 63 × 66/150 = 27.72.

• Then divide the sum (76.44) by 150 to get Pe = 0.51 or 51%.

26

K (Cont’d)  

• K = (Po − Pe)/(1 − Pe) = (0.78 − 0.51)/(1 − 0.51) = 0.27/0.49 = 0.55 or 55%

• Kappa varies from -1 to +1, with a value of zero denoting agreement no better than chance (negative values denote agreement worse than chance!)

• Value of K      Strength of agreement
  <0              Poor
  0 - 0.20        Slight
  0.21 - 0.40     Fair
  0.41 - 0.60     Moderate
  0.61 - 0.80     Substantial
  0.81 - 1.0      Almost perfect
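
A sketch reproducing the worked example from the 2×2 counts above:

```python
def kappa_2x2(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for two observers, dichotomous ratings.
    a, d = agreements (Yes/Yes, No/No); b, c = disagreements."""
    n = a + b + c + d
    p_o = (a + d) / n                                  # observed agreement
    # chance agreement: products of marginal proportions for Yes/Yes and No/No
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (p_o - p_e) / (1 - p_e)

# Counts from the agreement matrix above
print(f"kappa = {kappa_2x2(69, 15, 18, 48):.2f}")      # ~0.55
```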

27

K - Issue of Prevalence

• The prevalence of the condition affects the likelihood that observers will agree purely by chance - hence the importance of using kappa. Example:
• Observer A classified 120/150 patients as positive
• Observer B classified 130/150 patients as positive

• Pe is now (120/150)(130/150) + (30/150)(20/150) = 0.69 + 0.03 = 0.72, or 72%.
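
A small helper makes the prevalence effect visible; the marginal proportions below come from the two examples above:

```python
def chance_agreement(p_a: float, p_b: float) -> float:
    """Expected agreement by chance given each observer's 'Yes' proportion."""
    return p_a * p_b + (1 - p_a) * (1 - p_b)

print(f"Pe = {chance_agreement(87/150, 84/150):.2f}")    # ~0.51 (example above)
print(f"Pe = {chance_agreement(120/150, 130/150):.2f}")  # ~0.72 (high prevalence)
```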

28

K - More Complicated Scenarios

• Overall (summary) kappa:
– several observers or raters, and/or subjects classified into several different categories.

• Weighted kappa:
– measures the relative degree of disagreement when subjects are classified into several ordinal categories (e.g., normal, slightly abnormal and very abnormal).

• Maclure and Willett (1987):
– Use kappa for dichotomous data or nominal polytomous data only.
– For ordinal data use either Spearman’s rank correlation or R.

29

IV. Consequences of variability of clinical data

• A. Clinical impact
• Errors in diagnosis, prognosis and even treatment.
• Clinical disagreement between clinicians.

• B. Research impact
• Between-person biological variability is a prerequisite for etiologic studies.
• Random within-person variability (a form of unreliability) results in non-differential misclassification - with a resulting dilution or attenuation of effect.

30

B. Research impact

• Generally, imprecision has less impact in the research setting than in the individual clinical setting, because one can average over a large number of observations (but the measure must still be valid).

• Variability and misclassification result in the need for larger sample sizes (and increased costs).

• Measurement errors that do not occur at random can introduce bias - differential misclassification.

31

Regression Dilution Bias

• Example: MacMahon et al. (1990)

• imprecision resulting from a single measurement of diastolic blood pressure resulted in a 60% attenuation of RRs (for the effect of elevated blood pressure on stroke and MI).

• “regression dilution bias”.
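
The mechanism is easy to simulate: adding measurement noise to the exposure attenuates the fitted slope toward the null. The numbers below are illustrative, not MacMahon’s:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

true_dbp = rng.normal(90, 10, n)               # true diastolic BP
risk = 0.05 * true_dbp + rng.normal(0, 2, n)   # outcome rises with true BP

noisy_dbp = true_dbp + rng.normal(0, 10, n)    # single imprecise measurement

slope_true  = np.polyfit(true_dbp, risk, 1)[0]
slope_noisy = np.polyfit(noisy_dbp, risk, 1)[0]

# Attenuation factor = var(T) / (var(T) + var(e)) = 100/200 = 0.5 here
print(f"slope with true BP:  {slope_true:.3f}")   # ~0.050
print(f"slope with noisy BP: {slope_noisy:.3f}")  # ~0.025 (diluted)
```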

32

C. Regression towards the mean

• A group of individuals selected based on the results of an “abnormal” test can be divided into:
– a) those with a true underlying abnormal value, and
– b) those with a true underlying normal value (but random fluctuations resulted in an outlying [abnormal] value).

• On retesting, patients in group b are closer to their typical (normal) values, so, the overall mean is less extreme (= regression to the mean).

• Occurs when repeated observations are performed on a variable that is inherently variable.

33

C. RTTM

• Often interpreted as a sign of clinical improvement, regardless of the effectiveness of treatment (an important explanation for the placebo effect).

• If the first reading is d units higher than the true value (μ), then on average, the next value will be closer to the mean by d(1 − r) units,
– where r is the correlation between the two measurements
– RTTM increases if d is large and r is small.

• RTTM is a general tendency describing the average behaviour of a group, not necessarily individuals!!
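
A simulation of the retesting scenario, with assumed values: select everyone whose first reading exceeds a cutoff, then retest with no intervention at all:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

true_value = rng.normal(100, 10, n)          # stable underlying level
test1 = true_value + rng.normal(0, 10, n)    # first noisy measurement
test2 = true_value + rng.normal(0, 10, n)    # independent retest

abnormal = test1 > 120                       # selected on an "abnormal" first test

print(f"mean first test (selected): {test1[abnormal].mean():.1f}")  # ~127
print(f"mean retest (selected):     {test2[abnormal].mean():.1f}")  # closer to 100
# The drop on retest occurs without treatment: regression to the mean.
```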

34

V. Remedies for variability of clinical data

• A. Within-person biologic variation
• Standardized measurements: use a standard protocol, i.e., time of day, body position, etc.
• Average repeated tests, e.g., take several blood pressure readings.
• Use a less variable test, e.g., for diabetes use glycosylated Hb rather than blood glucose.
• Plot the data - what is the trend?
• Develop reference values for each individual - especially if:
– within-person variability <<< between-person variability
– this results in a wide (population) reference range, which makes it difficult to identify individual deviations
– e.g., body weight, PSA, EKG

35

B. Measurement Error

• Measurement imprecision is corrected by adjusting the machine or re-training the tester (or by averaging several values?).

• Measurement error that causes bias requires quality assurance testing. Fix by re-calibration (don’t average!!).

36

Sackett - Six strategies for preventing or minimizing clinical disagreements

• 1. Match the diagnostic environment to the diagnostic task.
• 2. Corroborate key findings by:
– repeating observations and questions
– confirming information with other sources (e.g., family members)
– confirming key findings using appropriate diagnostic tests
– seeking confirmation from “blinded” colleagues
• 3. Report actual findings, then report inferences.
• 4. Use appropriate technical aids to avoid imprecision (e.g., ruler).
• 5. “Blinded” assessments of diagnostic findings.
• 6. Apply the skills of the social sciences:
– establish understanding, follow a logical order, listen, observe, interrupt only where necessary.
