Introduction to Biostatistics, Harvard Extension School

Descriptive Statistics, The Normal Distribution, and Standardization

© Scott Evans, Ph.D. and Lynne Peeples, M.S.

Posted 19-Dec-2015


Happy Valentine’s Day!

How many candy hearts in a box of NECCO Sweethearts?

1, 2, 3, 4, …, 40?


Big Picture revisited…

Sample: x̄, s, s²

Population: μ, σ, σ²

Step I, Step II, Step III: Statistical Inference (w/ Probability)


Step I: Take the Sample

SAMPLE = Boxes of Sweethearts (x̄, s, s²)

POPULATION = All boxes of Sweethearts (μ, σ, σ²); ~8 billion hearts are made each year at NECCO!

We want a representative sample of boxes (e.g., boxes from different batches, purchased at different stores in Cambridge and Boston).

The larger the sample, the better.


Step I: Take the Sample

SAMPLE = 12 boxes, with Sweetheart counts ranging from 27 to 36

x1 = 29, x2 = 31, x3 = 32, x4 = 27, x5 = 36, x6 = 35, x7 = 29, x8 = 30, x9 = 31, x10 = 29, x11 = 28, x12 = 33


Step II: Describe the Sample

Descriptive Statistics
• Measures of Central Tendency
• Measures of Variability
• Other Descriptive Measures

How can we describe our Sweetheart sample?


Measures of Central Tendency

• Measure the “center” of the data
• Examples: Mean, Median, Mode
• The choice of which to use depends…
• It is okay to report more than one. They are simply descriptive (not inferential).
• However, when space is limited (e.g., in a journal article), you may be forced to choose.


Mean

The “average”. If the data are made up of n observations:

x1, x2,…, xn, then the mean is given by the sum of the observations divided by the number of observations.

For example, if the data are: x1=1, x2=2, x3=3, then the mean is (1+2+3)/3=2.

Often denoted as $\bar{X}$:

$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$


Mean

The population mean is often denoted by μ. This is usually unknown (although we try to make inferences about this).

The sample mean is an estimator of the population mean.


Mean

What is the mean of our sample of Sweethearts?

$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = (29 + 31 + \dots + 28 + 33)/12 = 370/12 = 30.83$

≈ 31 Sweethearts
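The same calculation can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the data are the 12 box counts listed on the earlier slide.

```python
# Sweetheart counts for the 12 sampled boxes, as listed on the slide.
counts = [29, 31, 32, 27, 36, 35, 29, 30, 31, 29, 28, 33]

n = len(counts)            # number of observations
total = sum(counts)        # sum of the observations
sample_mean = total / n    # sample mean: sum divided by n

print(f"n = {n}, sum = {total}, mean = {sample_mean:.2f}")
```

The printed mean agrees with the hand calculation of 370/12.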


Median

The “middle observation” according to its rank in the data.

The median is:
• The observation with rank (n+1)/2 if n is odd. For example, if the data are {1,2,3}, then the median is 2.
• The average of the observations with rank n/2 and (n+2)/2 if n is even. For example, if the data are {1,2,3,4}, then the median is 2.5.


Median

What is the median of our sample of Sweethearts? Sort our 12 boxes in order by counts:

27, 28, 29, 29, 29, 30 | 31, 31, 32, 33, 35, 36

In our example, 30 and 31 are our middle numbers…So, the median = 30.5.
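The rank rule for the median can be sketched in Python as follows. The helper name `median_by_rank` is hypothetical (it is not from the slides); the standard library's `statistics.median` is used only as a cross-check.

```python
from statistics import median

def median_by_rank(values):
    """Median via the rank rule: rank (n+1)/2 if n is odd,
    otherwise the average of ranks n/2 and (n+2)/2 (ranks are 1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]
    lower = ordered[n // 2 - 1]          # observation with rank n/2
    upper = ordered[(n + 2) // 2 - 1]    # observation with rank (n+2)/2
    return (lower + upper) / 2

counts = [29, 31, 32, 27, 36, 35, 29, 30, 31, 29, 28, 33]
sweetheart_median = median_by_rank(counts)
print(sweetheart_median)
```

With 12 boxes, the ranks used are 6 and 7, which are the two middle values 30 and 31.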


Median

Another example: Income level with Bill Gates in the room.

• The median is more robust than the mean to extreme observations.
• If data are skewed to the right, then mean > median (in general). For example, if the data are {1,2,3,4,20}, then median=3 and mean=6.
• If data are skewed to the left, then mean < median (in general). For example, if the data are {1,15,16,18,20}, then median=16 and mean=14.
• If data are symmetric, then mean ≈ median.
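A quick check of the two skewed examples, as a sketch using Python's standard `statistics` module (the variable names are illustrative):

```python
from statistics import mean, median

right_skewed = [1, 2, 3, 4, 20]     # one large value pulls the mean up
left_skewed = [1, 15, 16, 18, 20]   # one small value pulls the mean down

for label, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(label, "mean =", mean(data), "median =", median(data))
```

In both cases the median stays near the bulk of the data while the mean moves toward the extreme observation.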


Mode

• The value that occurs the most often. For example, if data are {1,1,2,2,2,2,3,3}, the mode is 2.
• Good for ordinal or nominal data in which there are a limited number of categories.
• Not very useful for continuous data. For example, if data are {2,2,3,4,5,6,7,8,9}, the mode is 2 but is not a good measure of central tendency in this case.
• 29 appears the most often (3x) in our Sweetheart example.
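Counting frequencies makes the mode easy to verify. A minimal sketch with `collections.Counter` (variable names are illustrative):

```python
from collections import Counter

counts = [29, 31, 32, 27, 36, 35, 29, 30, 31, 29, 28, 33]

frequency = Counter(counts)                            # value -> number of occurrences
mode_value, mode_count = frequency.most_common(1)[0]   # most frequent value and its count

print(f"mode = {mode_value} (appears {mode_count} times)")
```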


Measures of Variability

• Measure the “spread” in the data
• Example: Age distribution in the Extension School vs. FAS college
• Some important measures: Variance, Standard Deviation, Range, Interquartile Range

The larger the value of these measures, the larger the spread and variability.


Variance

The sample variance (s²) may be calculated from the data. It is essentially the average of the squared deviations of the observations from the mean, with divisor n − 1:

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$

The population variance is often denoted by σ². This is usually unknown.


Variance

The deviations are squared because we are only interested in the size of the deviation rather than the direction (larger or smaller than the mean).

Note: $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$. Why?
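One way to answer the question is a short derivation from the definition of the sample mean, using only symbols already defined on these slides:

```latex
\sum_{i=1}^{n}\left(X_i-\bar{X}\right)
  = \sum_{i=1}^{n}X_i - n\bar{X}
  = n\bar{X} - n\bar{X}
  = 0,
\qquad\text{since } \bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_i .
```

Positive and negative deviations cancel exactly, so the raw deviations cannot measure spread on their own.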


Variance

The reason that we divide by n − 1 instead of n has to do with the number of “information units” in the variance. After estimating the sample mean, there are only n − 1 observations that are a priori unknown (degrees of freedom): once the mean and n − 1 of the observations are known, the last observation is determined.


Variance

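For our Sweetheart data…

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 = \frac{1}{12-1}\sum_{i=1}^{12}(X_i - 30.83)^2$

$= \frac{1}{11}\left[(29-30.83)^2 + (31-30.83)^2 + \dots + (33-30.83)^2\right]$

$= \frac{(-1.8)^2 + 0.2^2 + \dots + 2.2^2}{11} = \frac{3.24 + 0.04 + \dots + 4.84}{11} = \frac{83.67}{11} = 7.61$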
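The arithmetic can be reproduced step by step in Python. This is a sketch with illustrative variable names; `statistics.variance` (which also divides by n − 1) serves as a cross-check.

```python
from statistics import variance

counts = [29, 31, 32, 27, 36, 35, 29, 30, 31, 29, 28, 33]

n = len(counts)
xbar = sum(counts) / n                           # sample mean
sum_sq = sum((x - xbar) ** 2 for x in counts)    # sum of squared deviations
s2 = sum_sq / (n - 1)                            # sample variance, divisor n - 1

print(f"sum of squares = {sum_sq:.2f}, s^2 = {s2:.2f}")
```

Using the unrounded mean gives the same totals as the slide: a sum of squares of about 83.67 and a variance of about 7.61.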


Standard Deviation

• Square root of the variance
• s = sqrt(s²) = sample SD. Calculate from the data (see formula for s²).
• σ = sqrt(σ²) = population SD. Usually unknown.
• Expressed in the same units as the mean (instead of squared units like the variance)

In our Sweetheart example, $s = \sqrt{7.61} = 2.76$.

Now we have summarized the sample with just 2 numbers!


Range

Maximum − Minimum. Sweetheart example: 36 − 27 = 9

Very sensitive to extreme observations (outliers)


Interquartile Range

• IQR = Q3 − Q1, where Q1 is the first quartile and Q3 is the third quartile
• More robust than the range to extreme observations

In our example: 27, 28, 29 | 29, 29, 30 | 31, 31, 32 | 33, 35, 36

Q1 = 29 and Q3 = 32.5, so IQR = 32.5 − 29 = 3.5 Sweethearts
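A sketch of the quartile calculation, assuming the "median of each half" convention that matches the split shown in the example. The helper name `quartiles_by_halves` is hypothetical, and other quartile conventions (including some software defaults) can give slightly different values.

```python
from statistics import median

def quartiles_by_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted data.
    When n is odd, the middle observation is left out of both halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[n - half:]
    return median(lower_half), median(upper_half)

counts = [29, 31, 32, 27, 36, 35, 29, 30, 31, 29, 28, 33]
q1, q3 = quartiles_by_halves(counts)
iqr = q3 - q1
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```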


Other Descriptive Measures

• Minimum and Maximum: very sensitive to extreme observations
• Sample size (n), e.g., 12 boxes
• Percentiles. Examples: Median = 50th percentile; Q1, Q3 = 25th and 75th percentiles


Small Samples

For very small samples (e.g., <5 observations), summary statistics are not very meaningful (actually can be misleading).

Better to simply list the data.


Example – Firefighter CHD Study

Table 4: CHD Retirements versus Active Firefighters (Controls)

                                           CHD Retirements (n=277)   Active Firefighters (n=310)
                                           Mean (Median), % (n)      Mean (Median), % (n)
Age                                        54.2 (55.0)               39.3 (39.0)
Age ≥ 45 years old                         94% (261)                 21% (64)
Current Smoking                            30% (76)                  10% (31)
Hypertension                               59% (141)                 21% (65)
Cholesterol ≥ 5.18 mmol/L (200 mg/dl)      80% (169)                 63% (196)
Prior Diagnosis of CHD                     22% (48)                  1% (3)
BMI                                        30.3 (29.8)               28.9 (28.4)
Obesity, BMI ≥ 30                          41% (98)                  34% (104)


Example – A5095

                                           TZV (n=382)   Pooled EFV (n=765)
Male                                       81%           81%
Mean age, years                            38.0          38.0
Race or ethnic group
  Non-Hispanic White                       39%           41%
  Non-Hispanic Black                       37%           36%
  Hispanic                                 21%           21%
  Other                                    2%            <1%
Mean baseline HIV RNA, log10 c/mL          4.85          4.86
100,000 c/mL at screening                  43%           43%
Mean baseline CD4 count, cells/mm³         234           242


Random Variables

• Variable: a characteristic that can be measured, categorized, quantified, or qualified.
• Random variable: a variable whose value is determined by a random phenomenon (i.e., not determined by study design).
• Continuous random variable: can take on any value within a specified interval or continuum.


Probability Distributions

Every random variable has a corresponding probability distribution

A probability distribution describes the behavior of the random variable. It identifies possible values of the random variable and provides information about the probability that these values (or ranges of values) will occur.

A particularly important probability distribution is the Normal Distribution…


Normal Distribution

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$


Normal Distribution

• “Bell-shaped curve”
• Symmetric about its mean (μ)
• The closer that an observation is to the mean, the more frequently it occurs.
• Notation: X ~ N(μ, σ)


Location & Shape

μ = LOCATION
σ = SHAPE

Note that distributions may have the same mean but differ in their spread (shape).


Normal Distribution

The normal distribution, N(μ, σ), can be described by the following “density function”:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$
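The density function translates directly into code. A minimal sketch: the function name `normal_pdf` is illustrative, and `statistics.NormalDist` from the standard library is used only to cross-check the values.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, where sigma is the standard deviation."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

peak = normal_pdf(100, mu=100, sigma=15)       # height of the curve at the mean
left = normal_pdf(85, mu=100, sigma=15)        # one SD below the mean
right = normal_pdf(115, mu=100, sigma=15)      # one SD above the mean
print(peak, left, right)
```

The curve is highest at the mean, and the heights one SD below and one SD above the mean are equal, which reflects the symmetry about μ.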


Normal Distribution

• The area under this curve (function) is one.
• Probabilities may be calculated as the area under the curve (above the x-axis).
• Integration (calculus) can help quantify these areas (probabilities).
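These areas can be approximated numerically. The sketch below uses a simple trapezoid rule on the standard normal density; the function names and the integration limits of ±8 are assumptions chosen for illustration, not part of the slides.

```python
import math

def standard_normal_pdf(x):
    """Density of N(0, 1) at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def trapezoid_area(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h

total_area = trapezoid_area(standard_normal_pdf, -8, 8)     # practically the whole curve
within_one_sd = trapezoid_area(standard_normal_pdf, -1, 1)  # P(-1 < Z < 1)
print(round(total_area, 4), round(within_one_sd, 4))
```

The total area comes out very close to one, and roughly 68% of the area lies within one SD of the mean.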


Moving towards Step III…

Sample: x̄, s, s²

Population: μ, σ, σ²

Step I, Step II, Step III: Statistical Inference (w/ Probability)


Standard Normal Distribution

• A special normal distribution: N(0,1)
• Values from this distribution represent the number of SDs away from the mean (0).
• Known properties of this distribution: we can make probabilistic statements using the standard normal table.


Standard Normal Distribution

[Figure: normal curve with the Z axis (−4 to 4) aligned to the X axis (μ−2σ, μ−σ, μ, μ+σ, μ+2σ)]

• For any variable X, with mean μ and SD = σ: $Z = \frac{X - \mu}{\sigma}$
• Z now has mean 0 and SD = 1.
• This “standardization” creates a variable Z, such that values of this variable represent the number of SDs away from the mean (0).


Standard Normal Table


Standardization

Common mistake: “If X has mean μ and SD = σ, then Z = (X − μ)/σ ~ N(0,1).” This is NOT true!

It is true that Z has mean 0 and SD = 1 (standardization). However, Z is normal only if X is also normal.
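A small illustration of this point, as a sketch: it standardizes the right-skewed data set {1,2,3,4,20} from the median slide, using the sample mean and sample SD in place of μ and σ (an assumption made here because the population values are not available).

```python
from statistics import mean, median, stdev

data = [1, 2, 3, 4, 20]            # right-skewed example from the median slide

m = mean(data)                     # sample mean, standing in for mu
s = stdev(data)                    # sample SD, standing in for sigma
z_scores = [(x - m) / s for x in data]

print("mean of Z:", round(mean(z_scores), 10))
print("SD of Z:", round(stdev(z_scores), 10))
print("median of Z:", round(median(z_scores), 3))
```

The standardized values have mean 0 and SD 1, but their median is still below 0: the skew is unchanged, so Z is no more normal than X was.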


Standardization

However, if X~N(μ,σ), then Z=(X-μ)/σ ~N(0,1). We can then make probabilistic statements about X.

Thus we can make probabilistic statements about any variable with any normal distribution.


Example - IQ

IQ~N(100,15)

What’s the probability that a person chosen at random has an IQ>135?

$Z = \frac{IQ - 100}{15}$, so Z = (135 − 100)/15 = 2.33

[Figure: normal curve with the Z axis (−4 to 4) aligned to the IQ axis (70, 85, 100, 115, 130)]


Example - IQ

P(Z > 2.33) = 0.010
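The table lookup can be reproduced in software. A sketch using `statistics.NormalDist` from the Python standard library, with IQ ~ N(100, 15) as stated in the example; the variable names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)     # IQ ~ N(100, 15), sigma is the SD
standard_normal = NormalDist()        # N(0, 1)

z = (135 - 100) / 15                  # standardize the cutoff
p_table = 1 - standard_normal.cdf(round(z, 2))   # Z rounded to 2.33, as with a table
p_direct = 1 - iq.cdf(135)                       # same probability without rounding Z

print(round(z, 2), round(p_table, 3), round(p_direct, 3))
```

Both routes give a probability of about 0.010, so roughly 1 person in 100 has an IQ above 135 under this model.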


Example – IQ

What’s the probability that a person chosen at random has an IQ<90?

Z = (90 − 100)/15 = −0.67

By symmetry, P(Z < −0.67) = P(Z > 0.67)

Probabilities that a person chosen at random has an IQ between two values may also be obtained.

[Figure: normal curve with the Z axis (−4 to 4) aligned to the IQ axis (70, 85, 100, 115, 130)]


Example - IQ

P(Z > 0.67) = 0.251
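A sketch of the same lookup in Python, again using `statistics.NormalDist`. The interval from 85 to 115 in the last step is a hypothetical choice, added only to illustrate a probability between two values.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)      # IQ ~ N(100, 15)
standard_normal = NormalDist()         # N(0, 1)

z = round((90 - 100) / 15, 2)                     # -0.67, rounded as for a table
p_below_90 = standard_normal.cdf(z)               # P(Z < -0.67)
p_above_mirror = 1 - standard_normal.cdf(-z)      # P(Z > 0.67), equal by symmetry

# Probability between two values: difference of two cumulative probabilities.
p_between_85_and_115 = iq.cdf(115) - iq.cdf(85)   # hypothetical interval

print(round(p_below_90, 3), round(p_above_mirror, 3), round(p_between_85_and_115, 3))
```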


Central Limit Theorem

A very important result in statistics that permits use of the normal distribution for making inferences (hypothesis testing and estimation) concerning the population mean.

$\bar{X}_n \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$


Central Limit Theorem

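[Figure: from a population (any distribution) with mean μ and SD σ, repeated samples (Sample 1 through Sample 5), all of size n, yield sample means x̄1 through x̄5. The sample means follow $\bar{X}_n \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.]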


Central Limit Theorem

If the distribution of each observation in the population has mean μ and standard deviation σ, then, regardless of whether the distribution is normal or not:

1. The distribution of the sample means (from samples of size n taken from the population) has mean μ, identical to that of the population.

2. The standard deviation of this distribution is $\sigma_{\bar{X}_n} = \frac{\sigma}{\sqrt{n}}$. It decreases as n increases and increases as σ increases.

3. As n gets large, the shape of the distribution of the sample means is approximately that of a normal distribution.
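The three statements can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be exponential with rate 1 (so μ = 1 and σ = 1, and clearly not normal), and the sample size, number of samples, and random seed are arbitrary choices made for illustration.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(0)        # fixed seed so the run is reproducible
n = 25                        # size of each sample
num_samples = 5000            # number of repeated samples
mu, sigma = 1.0, 1.0          # mean and SD of an exponential(rate=1) population

# Draw many samples of size n and record each sample mean.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)          # should be close to mu
sd_of_means = stdev(sample_means)           # should be close to sigma / sqrt(n)
theoretical_sd = sigma / math.sqrt(n)

# Share of sample means within one theoretical SD of mu (about 0.68 if normal).
within_one_sd = sum(abs(m - mu) < theoretical_sd for m in sample_means) / num_samples

print(round(mean_of_means, 3), round(sd_of_means, 3), theoretical_sd, round(within_one_sd, 3))
```

Even though the population is strongly skewed, the sample means centre on μ, have spread close to σ/√n, and fall within one SD of μ about as often as a normal distribution would predict.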


Central Limit Theorem

Variable X, population mean = 100, SD = 15

Samples of size 25 (for example):
• Sample 1, mean = 90
• Sample 2, mean = 115
• Sample 3, mean = 101
• Sample 4, mean = 94
• …
• Sample 30, mean = 99


Central Limit Theorem

Plot the sample means (histogram):
• The sample means have mean 100.
• The sample means have an SD of $15/\sqrt{25} = 15/5 = 3$.
• The distribution of sample means would tend to be normal as n gets large.


Central Limit Theorem

Now we can combine this normality result from the CLT with standardization to make probabilistic statements about the population mean!
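As a sketch of how the two ideas combine, the code below standardizes a sample mean using σ/√n from the earlier example (population mean 100, SD 15, samples of size 25). The threshold of 106 is a hypothetical value chosen only for illustration.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 15, 25               # population mean, population SD, sample size
standard_error = sigma / math.sqrt(n)    # SD of the sample mean: 15 / 5 = 3

threshold = 106                          # hypothetical sample-mean cutoff
z = (threshold - mu) / standard_error    # standardize using sigma / sqrt(n), not sigma
p_mean_above = 1 - NormalDist().cdf(z)   # P(sample mean > threshold), by the CLT

print(standard_error, z, round(p_mean_above, 4))
```

A single observation of 106 is only 0.4 SDs above the mean, but a sample mean of 106 from 25 observations is 2 SDs above it, which is why the divisor matters.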


Sampling Distribution of Sweethearts

Assume μ = 30.83, $\sigma/\sqrt{12} = 2.71/3.46 = 0.78$

[Figure: Population Distribution (axis: 28.1, 30.8, 33.5) and Sampling Distribution of Means (axis: 30.0, 30.8, 31.6)]


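Sampling Distribution of Sweethearts

Population Distribution: $Z = \frac{X - 30.83}{2.73}$ (axis values 28.1, 30.8, 33.5 correspond to Z = −1, 0, 1)

Sampling Distribution of Means: $Z = \frac{\bar{X} - 30.83}{0.78}$ (axis values 30.0, 30.8, 31.6 correspond to Z = −1, 0, 1)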