Phlebotomy Training for M-III Students: Statistical Analysis of Test Results

Richard A. McPherson, M.D., M.S.

Phlebotomy Training 2001-2008

• Exercise offered to third year medical students as part of orientation every year since 2001.

• This is the third year that phlebotomy training was mandatory.

• Other exercises offered in IV/Foley catheter placement.

Numbers of Students Submitting Blood Specimens Each Year

• 2001: 103
• 2002: 83
• 2003: 102
• 2004: 87
• 2005: 98
• 2006: 150
• 2007: 147
• 2008: 134
• Total: 904

[Figure: bar chart of student count by year, 2001 to 2008]

Phlebotomy Training 2008

• Wednesday, July 23, 2008, in three separate sessions held in the Medical Sciences Building at 1, 2 and 3 PM.

• A total of 153 students attended the exercise in which each student collected two tubes of blood on a partner.

• Specimens were successfully collected from 134 students and submitted to the laboratory for simple chemical and hematological measurements.

• The students’ own results were provided to them with a unique identifying number known only to each individual student.

Age: Frequencies (13 Levels)

[Figure: histogram of count by age in years]

Level      Count   Prob
22 Years   1       0.00746
23 Years   15      0.11194
24 Years   33      0.24627
25 Years   31      0.23134
26 Years   21      0.15672
27 Years   12      0.08955
28 Years   6       0.04478
29 Years   6       0.04478
30 Years   3       0.02239
31 Years   2       0.01493
32 Years   2       0.01493
37 Years   1       0.00746
39 Years   1       0.00746
Total      134     1.00000

Gender: Frequencies (2 Levels)

[Figure: bar chart of count by gender (F, M)]

Level   Count   Prob
F       66      0.49254
M       68      0.50746
Total   134     1.00000

Race: Frequencies (6 Levels)

[Figure: bar chart of count by race]

Level           Count   Prob
Asian-Pacific   30      0.22388
Black           4       0.02985
Hispanic        1       0.00746
Other           3       0.02239
Unknown         7       0.05224
White           89      0.66418
Total           134     1.00000

Specimens by Gender

Male Female Total

Chemistry 68 66 134

Hematology 67 62 129

Reasons to Test Student Specimens

• Courtesy to students for participation

• Teach interpretation of laboratory results (i.e., reference ranges) to students

• Evaluate current reference ranges for appropriateness

• Discover previously unknown medical condition
  – Students could opt out from testing blood.

• Demonstrate statistical applications

Goal 1. Descriptive Statistics

• Measure of Central Tendency
  – Mean
  – Median
  – Mode

• Measure of Dispersion
  – Standard deviation
  – Interquartile range (25th to 75th percentile range)
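The measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch: the hemoglobin values are hypothetical and chosen only for illustration, and the "inclusive" quartile method is one of several conventions, so other software may report slightly different quartiles.

```python
import statistics

# Hypothetical hemoglobin values in g/dL (illustrative only, not the students' data).
values = [12.6, 13.4, 13.4, 14.3, 14.3, 14.3, 15.1, 16.2, 16.9]

# Measures of central tendency.
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Measures of dispersion.
std_dev = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)

# quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"std dev={std_dev:.2f} IQR={iqr:.2f} ({q1} to {q3})")
```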

Before you get going with the analysis,

LOOK AT YOUR DATA!!!**#$!@$%

Strategies for Dealing with Non-normal Distributions

1. Check for outliers
  – Extreme cases from errors of recording or entering data
  – Individuals that clearly do not belong in the population sampled.

Example: Checking for Outliers

Four methods evaluated for Erythrocyte Mean Cell Volume on 131 blood specimens

[Figure: plots comparing the methods; most axes span 60 to 120, one spans 0 to 1100]

Method 1 vs Method 3

Method 3 clearly has data entry errors of 1000.0 and 18.9.

Quantiles            Distribution with entry errors   Other distribution
100.0%  maximum      1000.0                           123.00
99.5%                1000.0                           123.00
97.5%                117.3                            108.00
90.0%                99.5                             98.16
75.0%   quartile     93.6                             93.40
50.0%   median       89.5                             88.80
25.0%   quartile     84.4                             83.80
10.0%                74.0                             74.64
2.5%                 64.4                             64.98
0.5%                 18.9                             58.90
0.0%    minimum      18.9                             58.90

Method 3 edited to remove incorrect values; more normal in distribution

[Figure: distribution plots after editing; axes span roughly 50 to 130]

Outlier Trimming

• Remove the upper and lower tails of the data, for example 0.5% from each end, so that only values between the 0.5th and 99.5th percentiles are used

• Eliminates what is most likely to be severely atypical information or data entry error
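Percentile trimming of this kind can be sketched in a few lines. The function name `trim_central`, the linear-interpolation percentile rule, and the data are assumptions made for illustration; only the two erroneous values 1000.0 and 18.9 are taken from the outlier example in the slides.

```python
import math


def trim_central(values, lower_pct=0.5, upper_pct=99.5):
    """Keep only values between the lower and upper percentiles (inclusive)."""
    ordered = sorted(values)
    n = len(ordered)

    def percentile(p):
        # Linear interpolation between the two closest ranks.
        rank = (p / 100) * (n - 1)
        lo, hi = math.floor(rank), math.ceil(rank)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)

    low, high = percentile(lower_pct), percentile(upper_pct)
    return [v for v in values if low <= v <= high]


# Hypothetical MCV-like values between 80 and 100, plus the two entry errors.
mcv = [80 + (i % 21) for i in range(200)] + [1000.0, 18.9]
trimmed = trim_central(mcv)

print(len(mcv), "values before trimming,", len(trimmed), "after")
print("range after trimming:", min(trimmed), "to", max(trimmed))
```

Trimming is blunt: it removes the same fraction from each tail whether or not those values are errors, so it complements inspecting the data rather than replacing it.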

Serum ALT values trimmed for central 99 percent

[Figure: histograms of serum ALT on two scales, one axis spanning 0 to 2700 and one spanning 10 to 220]

Strategies for Dealing with Non-normal Distributions

2. If results are skewed, transform to a scale that is more nearly normal by logarithm, square root, etc.
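One way to see what a transformation does is to compare skewness before and after. The sketch below uses a simple moment-based skewness and hypothetical right-skewed ALT values; neither comes from the slides.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed ALT values (illustrative only).
alt = [12, 14, 15, 17, 18, 20, 22, 25, 28, 33, 40, 55, 80, 120]
log_alt = [math.log(v) for v in alt]
sqrt_alt = [math.sqrt(v) for v in alt]

for name, data in (("ALT", alt), ("SQRT ALT", sqrt_alt), ("Log ALT", log_alt)):
    print(f"{name:9s} skewness = {skewness(data):.2f}")
```

For data with a long right tail, the square root pulls the tail in moderately and the logarithm pulls it in more strongly, so the choice depends on how severe the skew is.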

ALT, Log ALT, SQRT ALT

[Figure: three normal quantile plots, one each for ALT, log ALT and square root of ALT]

2008 Student Hemoglobin Distribution

[Figure: histogram of hemoglobin (g/dL), x-axis 11 to 17, y-axis count]

Quantiles
100.0%  maximum    17.3
99.5%              17.3
97.5%              16.9
90.0%              16.2
75.0%   quartile   15.1
50.0%   median     14.3
25.0%   quartile   13.4
10.0%              12.6
2.5%               11.6
0.5%               10.8
0.0%    minimum    10.8

Moments
Mean             14.3
Std Dev          1.32
Std Err Mean     0.12
upper 95% Mean   14.5
lower 95% Mean   14.1
N                129

Assessment of Normality of Distribution: Normal Quantile Plot

[Figure: normal quantile plot above a histogram of hemoglobin (g/dL), x-axis 11 to 17]
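The points of a normal quantile plot can be generated by pairing each sorted observation with the standard-normal quantile of its rank. This is a sketch under stated assumptions: the plotting position `(i + 0.5) / n` is one common convention and statistical packages may use a different one, and the hemoglobin values are hypothetical.

```python
from statistics import NormalDist


def normal_quantile_points(values):
    """Pair each sorted value with the standard-normal quantile of its rank."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # (i + 0.5) / n is an assumed plotting position; other conventions exist.
    return [(std_normal.inv_cdf((i + 0.5) / n), v) for i, v in enumerate(ordered)]


# Hypothetical hemoglobin values in g/dL (illustrative only).
hgb = [10.8, 12.6, 13.4, 14.3, 14.3, 15.1, 16.2, 17.3]
points = normal_quantile_points(hgb)

for z, value in points:
    print(f"normal quantile {z:+.2f}  ->  {value}")
```

If the data are close to normal, the points fall near a straight line; curvature or stray points at the ends suggest skew or outliers.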

Parameters: Mean

• Formula for mean

$$\text{mean} = \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Parameters: Variance

• Formula for variance (variances are additive)

$$\text{variance} = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Parameters: Standard Deviation

• Formula for Std Dev

$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
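As a small worked example of the three parameters, take four hypothetical hemoglobin values of 13, 14, 15 and 16 g/dL (chosen for easy arithmetic, not taken from the student data), with $\bar{x}$ the mean, $s^2$ the variance and $s$ the standard deviation:

$$\bar{x} = \frac{13 + 14 + 15 + 16}{4} = 14.5$$

$$s^2 = \frac{(-1.5)^2 + (-0.5)^2 + (0.5)^2 + (1.5)^2}{4 - 1} = \frac{5.0}{3} \approx 1.67$$

$$s = \sqrt{1.67} \approx 1.29$$

The variance is in squared units (g/dL squared here), which is why the standard deviation, in the original units, is the figure usually reported alongside the mean.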

Goal 2. Comparative Statistics

• Parametric: uses a formula to describe distribution
  – Student t-test
  – One-way analysis of variance

• Non-parametric: assumes no particular distribution
  – Wilcoxon rank-order test

Comparison of Hgb in Females vs Males

[Figure: histograms of hemoglobin in females and in males, x-axis 10 to 18, y-axis count]

Assumptions for Use of t test

• Similar numbers in each group

• Similar variances in each group

• Individuals in each group are independent of one another (random selection, non-biased)

• Values are normally distributed.

You want to make a conclusion (inference) that is generalizable to a larger population than that which constitutes your sample. Accordingly the sample should be representative of the population.

Student’s t Test

• Student was the pseudonym of William Sealy Gosset [1876-1937], who developed statistical methods for solving problems in the brewery where he worked (Guinness in Dublin). He published his work in 1908 in the journal Biometrika. He did not publish under his own name, so that the nature of his work on optimizing production conditions could remain a trade secret.

Student’s t Test

• Principle: Compare the difference between means to the amount of noise (scatter) in measurements to judge if the difference in means could be due to chance alone.

$$t = \frac{\text{signal}}{\text{noise}} = \frac{\text{Difference between group means}}{\text{Variability of groups}} = \frac{(\text{mean of A}) - (\text{mean of B})}{\sqrt{\dfrac{\text{variance of A}}{\text{number of A}} + \dfrac{\text{variance of B}}{\text{number of B}}}}$$
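The signal-to-noise idea can be written out directly. This is a sketch only: the function name and the hemoglobin values are hypothetical, and it uses the unpooled form (each group's variance divided by its own size). A pooled-variance test, which the degrees of freedom reported later in the slides suggest, gives the same statistic when the two groups are equal in size.

```python
import math
import statistics


def t_statistic(group_a, group_b):
    """Difference between group means divided by the variability of the groups."""
    signal = statistics.mean(group_a) - statistics.mean(group_b)
    noise = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return signal / noise


# Hypothetical hemoglobin values in g/dL (illustrative only).
males = [15.0, 15.4, 14.8, 15.6, 15.2]
females = [13.1, 13.5, 13.0, 13.6, 13.3]

t = t_statistic(males, females)
print(f"t = {t:.2f}")
```

A large absolute value of t means the difference between means is large relative to the scatter; turning t into a p-value additionally requires the t distribution with the appropriate degrees of freedom.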

Comparative Statistics: Student’s t Test

• Hemoglobin mean values: females, 13.3 g/dL; males, 15.2 g/dL

• Are these mean values truly different from one another?

• Student t-value of 10.873, df=127, p-value <0.0001, or less than once in 10,000 times by chance alone.

Confidence Intervals on Means of the Groups being Compared: no overlap

[Figure: HGB by gender (F, M), y-axis 11 to 17]

Level    Number   Mean    Std Error   Lower 95% CI   Upper 95% CI
Female   62       13.33   0.1209      13.093         13.572
Male     67       15.16   0.1163      14.927         15.387
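The tabulated values allow a rough consistency check. The sketch below copies the numbers from the table, confirms that the two intervals do not overlap, and recomputes a t-value from the means and standard errors. Because the table is rounded, the result (about 10.9) is only expected to be close to the reported 10.873, not identical.

```python
import math

# Values copied from the confidence-interval table in the slides.
female = {"n": 62, "mean": 13.33, "se": 0.1209, "lower": 13.093, "upper": 13.572}
male = {"n": 67, "mean": 15.16, "se": 0.1163, "lower": 14.927, "upper": 15.387}

# Two intervals overlap only if each one starts before the other ends.
overlap = female["lower"] <= male["upper"] and male["lower"] <= female["upper"]

# Standard error of the difference, then t = difference / standard error.
se_diff = math.sqrt(female["se"] ** 2 + male["se"] ** 2)
t = (male["mean"] - female["mean"]) / se_diff


def implied_multiplier(group):
    """Half-width of the interval expressed in standard errors."""
    return (group["upper"] - group["lower"]) / 2 / group["se"]


print("intervals overlap:", overlap)
print(f"t recomputed from the table: {t:.2f}")
print(f"multiplier, female: {implied_multiplier(female):.3f}")
print(f"multiplier, male:   {implied_multiplier(male):.3f}")
```

Both multipliers come out near 1.98, slightly above the 1.96 of a normal distribution, as expected for a t-based 95% interval at this sample size.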

Comparison of WBC in Females vs Males

• WBC mean values: females, 7.1, males 6.6

• Are these mean values truly different from one another?

• Student t-value of 1.787, df=127, p-value = 0.0764, or about 1 in 13 times by chance alone.

[Figure: WBC by gender (F, M), y-axis 3 to 13]

If we suspect a gender-related difference, how can we show it to have statistical significance? Adjustables:

• Distance between means: use a more discriminating instrument, method, or principle of measurement.

• Noise level: use a more precise method with less scatter in measurement.

• Accept a higher type I error rate.

• Number of observations: increase N.

Goal 3. Power Analysis

Do a power analysis to find the N at which, under the conditions of a pilot study, significance (at the level specified) could be achieved, provided your estimates of the mean difference (delta) and variance (noise level) are accurate.

Definitions of Statistical Power

• The likelihood of finding a statistically significant difference when a true difference exists. (Online Learning Center)

• The power of a statistical test is the probability that the test will reject a false null hypothesis (that it will not make a Type II error). As power increases, the chances of a Type II error decrease. The probability of a Type II error is referred to as the false negative rate (β). Therefore power is equal to 1 − β. (Wikipedia)

Formula for calculating sample size

$$N = \frac{2\sigma^2 (Z_\alpha + Z_\beta)^2}{\Delta^2}$$

• $N$ = number of subjects in each group

• $Z_\alpha$ = parameter for chance of finding a difference by chance alone (usually set to 5 percent) = 1.96

• $Z_\beta$ = parameter indicating power of finding a difference (usually set to 80 percent) = 0.84

• $\Delta$ = the difference between group means (usually obtained from a pilot study or by an informed guess)

• $\sigma$ = common SD for both groups
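A sample-size calculation of this kind can be sketched as follows, assuming the standard two-sample form N = 2σ²(Zα + Zβ)²/Δ² per group with a two-sided alpha. The function name is hypothetical. The inputs are taken loosely from the WBC example in the slides: sigma is the value shown in the power analysis, and the difference of 0.5 is an assumption based on the rounded group means of 7.1 and 6.6.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Subjects needed in each group to detect a mean difference of delta."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    n = 2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2
    return math.ceil(n)


# Assumed inputs: difference from the rounded WBC means, sigma from the slides.
n_per_group = sample_size_per_group(delta=0.5, sigma=1.586712)
print("per group:", n_per_group, "total:", 2 * n_per_group)
```

With these inputs the per-group figure lands in the high 150s, so the total is a little under 320, in line with the conclusion of the power analysis. Halving the detectable difference quadruples the required N, because delta enters the formula squared.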

Power Analysis for WBC vs Gender

Alpha    Sigma      Delta      Number   Power
0.0500   1.586712   0.249608   129      0.4260
0.0500   1.586712   0.249608   139      0.4530
0.0500   1.586712   0.249608   149      0.4792
0.0500   1.586712   0.249608   159      0.5046
0.0500   1.586712   0.249608   169      0.5293
0.0500   1.586712   0.249608   179      0.5530
0.0500   1.586712   0.249608   189      0.5760
0.0500   1.586712   0.249608   199      0.5981
0.0500   1.586712   0.249608   209      0.6193
0.0500   1.586712   0.249608   219      0.6397
0.0500   1.586712   0.249608   229      0.6593
0.0500   1.586712   0.249608   239      0.6780
0.0500   1.586712   0.249608   249      0.6959
0.0500   1.586712   0.249608   259      0.7130
0.0500   1.586712   0.249608   269      0.7294
0.0500   1.586712   0.249608   279      0.7449
0.0500   1.586712   0.249608   289      0.7597
0.0500   1.586712   0.249608   299      0.7738
0.0500   1.586712   0.249608   309      0.7872
0.0500   1.586712   0.249608   319      0.7999
0.0500   1.586712   0.249608   329      0.8119
0.0500   1.586712   0.249608   339      0.8233
0.0500   1.586712   0.249608   349      0.8342

[Figure: Power Plot for Gender, power (0.00 to 1.00) versus Number (100 to 350); Alpha=0.05, Sigma=1.58671, Delta=0.24961]

Need a total of 320 subjects to show significance at the 0.05 level with 80% power
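The shape of the power curve can be approximated with a normal distribution. This is a rough sketch, not the calculation the statistics package performed: it assumes equal group sizes, a two-sided test, and a true difference of 0.5 between the group means (taken from the rounded WBC means of 7.1 and 6.6). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(total_n, delta, sigma, alpha=0.05):
    """Normal-approximation power for comparing two equally sized groups."""
    n_per_group = total_n / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic if the true difference is delta.
    expected_z = delta / (sigma * math.sqrt(2 / n_per_group))
    return NormalDist().cdf(expected_z - z_alpha)


for total_n in (129, 229, 320):
    power = approx_power(total_n, delta=0.5, sigma=1.586712)
    print(f"N = {total_n}: approximate power = {power:.2f}")
```

Under these assumptions the approximation tracks the tabulated values closely, rising from a little over 0.4 at the current N of 129 to about 0.8 near a total of 320.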

Goal 4. How to fit a line

• Least squares regression minimizes the sum of the squared (vertical) distances from the data points to the line (best fit)

• y = ax + b
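The least-squares slope and intercept have a closed form, sketched below in plain Python. The function names and the hemoglobin and hematocrit pairs are hypothetical, chosen to resemble the relationship examined in the slides.

```python
def least_squares_fit(xs, ys):
    """Return slope a and intercept b of y = a*x + b minimizing squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def r_squared(xs, ys, a, b):
    """Fraction of the variation in y accounted for by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical hemoglobin (g/dL) and hematocrit (%) pairs, illustrative only.
hgb = [12.0, 13.0, 14.0, 15.0, 16.0]
hct = [36.5, 38.4, 41.0, 43.1, 45.5]

a, b = least_squares_fit(hgb, hct)
print(f"Hct = {b:.2f} + {a:.2f} x Hgb, R^2 = {r_squared(hgb, hct, a, b):.4f}")
```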

[Figure: scatter plot of HCT (34 to 50) versus HGB (11 to 17) with fitted line]

Plot of residuals shows homoscedasticity (uniform scatter of the data over the entire range)

[Figure: residuals (-3 to 3) versus HGB (11 to 17)]

Hct = 8.890 + 2.284 × Hgb

• R² = 0.9069, so >90% of variation in Hct is predicted by Hgb

• Intercept = 8.890
  – t = 9.55, p<0.0001

• Slope = 2.284
  – t = 35.17, p<0.0001

• Is this a great fit or what?

• What if Hgb = 0? Then Hct should = 0, not 8.890

So force the line through the origin.

[Figure: scatter plot of HCT (34 to 50) versus HGB (11 to 17) with the line forced through the origin]

Hct = 0 + 2.901 × Hgb
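Forcing the line through the origin changes the slope formula to the sum of x·y divided by the sum of x². The sketch below shows that formula (the function name is hypothetical) and then compares the two fitted lines reported in the slides across the observed hemoglobin range.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope for y = a*x with the intercept fixed at zero."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def hct_with_intercept(hgb):
    """Fitted line from the slides: Hct = 8.890 + 2.284 x Hgb."""
    return 8.890 + 2.284 * hgb


def hct_through_origin(hgb):
    """Line from the slides forced through the origin: Hct = 2.901 x Hgb."""
    return 2.901 * hgb


for hgb in (11, 14, 17):
    print(
        f"Hgb {hgb}: with intercept {hct_with_intercept(hgb):.1f}, "
        f"through origin {hct_through_origin(hgb):.1f}"
    )
```

The two lines cross near a hemoglobin of about 14.4 g/dL, close to the middle of the data, and drift apart toward either end of the range. The constraint makes the line sensible at zero at the cost of a somewhat poorer fit where the observations actually lie.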

Normal Variants?

[Figure: scatter plot of MCV (60 to 100) versus RBC (4 to 6.5)]

Super Difference by Gender

[Figure: uric acid (1 to 11) by gender (F, M)]

Gender Different Analytes

[Figure: sodium and carbon dioxide by gender (F, M)]

[Figure: BUN and creatinine by gender (F, M)]

[Figure: calcium and magnesium by gender (F, M)]

[Figure: albumin and globulins by gender (F, M)]

[Figure: AST and ALT by gender (F, M)]

[Figure: Alk Phos and Bili Total by gender (F, M)]

Variation over Time: Platelets

[Figure: PLT (100 to 600) by year, 2001 to 2008]

Variation over Time: WBC

[Figure: WBC by year, 2001 to 2008]

Glucose 2008: postprandial (1 to 4 PM)

[Figure: histogram of glucose (mg/dL), x-axis 50 to 130, y-axis count]

Glucose over the Years

[Figure: glucose (100 to 300) by year, 2001 to 2008]

Acknowledgements

Pathology faculty
• Roger Riley, MD
• Kim Sanford, MD
• Samuel B. Hunter, MD

Resident
• Saud Rahman, MD

Nurse
• Jennifer Anderson, RN

Phlebotomists
• Linda Walker, MT, Supervisor
• Charity Delacruz, CPT
• Rogelio Inocencio, MLT
• Jean Merritt, CPT
• Shirley White, CPT

Test ordering, set-up, and processing
• Caroline Greene, MT
• Susan Handwerk
• June Lee, MT, Evening Supervisor
• Kristina Nilsen, MT
• Millicent Smith, MT
• Karen Tinsley, MT