Sampling Distributions Central Limit Theorem

Post on 21-Dec-2015


Page 1: Sampling Distributions Central Limit Theorem. Objectives Investigate the variability in sample statistics from sample to sample Find measures of central

Sampling Distributions

Central Limit Theorem


Objectives

• Investigate the variability in sample statistics from sample to sample

• Find measures of central tendency for distribution of sample statistics

• Find measures of dispersion for distribution of sample statistics.

• Find the pattern of variability for sample statistics


Distribution of Sample Mean


Overview

• A new sample mean can be calculated each time a new sample is taken

• In this way, the sample mean can be analyzed as a random variable

• Being able to calculate (approximately) the distribution of the sample mean is a critical tool for inference


Learning Objectives

• Understand the concept of a sampling distribution

• Describe the distribution of the sample mean for samples obtained from normal populations

• Describe the distribution of the sample mean for samples obtained from a population that is not normal


Statistical Inference

• Often the population is too large to perform a census … so we take a sample

• How do the results of the sample apply to the population?
– What’s the relationship between the sample mean and the population mean?
– What’s the relationship between the sample standard deviation and the population standard deviation?

• This is statistical inference


Estimation

• We want to use the sample mean \(\bar{X}\) to estimate the population mean μ

• If we want to estimate the mean height of eight-year-old girls, we can proceed as follows
– Randomly select 100 eight-year-old girls
– Compute the sample mean of the 100 heights
– Use that as our estimate

• This is using the sample mean to estimate the population mean

Sample Mean is a Variable

• Usually, we take just one sample to estimate a population parameter.

• However, suppose we take a series of different random samples from a target population
– Sample 1 – we compute the sample mean \(\bar{x}_1\)
– Sample 2 – we compute the sample mean \(\bar{x}_2\)
– Sample 3 – we compute the sample mean \(\bar{x}_3\)
– Etc.

• Each time we sample, we may get a different result

• The sample mean \(\bar{X}\) is a random variable!

Distribution of Sample Mean

• Because the sample mean is a random variable
– The sample mean has a probability distribution
– We can obtain the center and spread of the probability distribution of the sample mean

• This is called the distribution of the sample mean

• The sample mean is a sample statistic, and the distribution of a sample statistic is called a sampling distribution. So the distribution of the sample mean is also called the sampling distribution of the mean.

Example 1

Consider a population of a uniformly distributed variable X, with each of the values in the set {1, 2, 3, 4} equally likely:

1) Calculate the mean and standard deviation of the population.

2) Make a list of all samples of size 2 that can be drawn from this set (sampling with replacement).

3) Construct the sampling distribution of the sample mean for samples of size 2.

4) Calculate the center and spread of the sampling distribution of the sample mean.

5) Compare 1) and 4).

Example 1 (continued)

• Mean of the population distribution of the variable X, denoted by \(\mu_x\) or simply \(\mu\):

\[\mu_x = \sum x\,p(x) = 1\cdot\tfrac{1}{4} + 2\cdot\tfrac{1}{4} + 3\cdot\tfrac{1}{4} + 4\cdot\tfrac{1}{4} = 2.5\]

• Variance of the population distribution of the variable X, denoted by \(\sigma_x^2\) or simply \(\sigma^2\):

\[\sigma_x^2 = \sum x^2\,p(x) - \mu_x^2 = 1^2\cdot\tfrac{1}{4} + 2^2\cdot\tfrac{1}{4} + 3^2\cdot\tfrac{1}{4} + 4^2\cdot\tfrac{1}{4} - 2.5^2 = 1.25\]

• Standard deviation of the population distribution of the variable X, denoted by \(\sigma_x\) or simply \(\sigma\):

\[\sigma_x = \sqrt{1.25} \approx 1.118\]

Example 1 (continued)

This table lists all possible samples of size 2 (ordered, drawn with replacement), the mean of each sample, and the probability of each sample occurring (all equally likely):

Sample   Sample Mean   Probability
{1,1}    1.0           1/16
{1,2}    1.5           1/16
{1,3}    2.0           1/16
{1,4}    2.5           1/16
{2,1}    1.5           1/16
{2,2}    2.0           1/16
{2,3}    2.5           1/16
{2,4}    3.0           1/16
{3,1}    2.0           1/16
{3,2}    2.5           1/16
{3,3}    3.0           1/16
{3,4}    3.5           1/16
{4,1}    2.5           1/16
{4,2}    3.0           1/16
{4,3}    3.5           1/16
{4,4}    4.0           1/16

Example 1 (continued)

• Summarize the information in the previous table to obtain the sampling distribution of the sample mean \(\bar{x}\):

Sampling Distribution of the Sample Mean

\(\bar{x}\)   \(P(\bar{x})\)
1.0   1/16
1.5   2/16
2.0   3/16
2.5   4/16
3.0   3/16
3.5   2/16
4.0   1/16

[Histogram: Sampling Distribution of the Sample Mean; horizontal axis \(\bar{x}\) from 1.0 to 4.0, vertical axis \(P(\bar{x})\) from 0.00 to 0.25]

Notice that the sampling distribution of the sample mean is symmetric and mound-shaped, peaking at 2.5. It is not exactly normal (it is triangular), but it is already closer to a bell shape than the flat population distribution.

Example 1 (continued)

• Mean of the sampling distribution of the sample mean \(\bar{X}\), denoted by \(\mu_{\bar{x}}\), is:

\[\mu_{\bar{x}} = 1.0\cdot\tfrac{1}{16} + 1.5\cdot\tfrac{2}{16} + 2.0\cdot\tfrac{3}{16} + 2.5\cdot\tfrac{4}{16} + 3.0\cdot\tfrac{3}{16} + 3.5\cdot\tfrac{2}{16} + 4.0\cdot\tfrac{1}{16} = 2.5\]

• Variance of the sampling distribution of the sample mean \(\bar{X}\), denoted by \(\sigma_{\bar{x}}^2\), is:

\[\sigma_{\bar{x}}^2 = 1.0^2\cdot\tfrac{1}{16} + 1.5^2\cdot\tfrac{2}{16} + 2.0^2\cdot\tfrac{3}{16} + 2.5^2\cdot\tfrac{4}{16} + 3.0^2\cdot\tfrac{3}{16} + 3.5^2\cdot\tfrac{2}{16} + 4.0^2\cdot\tfrac{1}{16} - 2.5^2 = 0.625\]

• Standard deviation of the sampling distribution of the sample mean \(\bar{X}\), denoted by \(\sigma_{\bar{x}}\), is:

\[\sigma_{\bar{x}} = \sqrt{0.625} \approx 0.791\]

Example 1 (continued)

From the above example, we conclude that

• The sampling distribution of the sample mean tends to be bell-shaped.

• The mean of the sampling distribution of the sample mean is the same as the underlying population mean. That is, \(\mu_{\bar{x}} = \mu_x\)

• The standard deviation of the sampling distribution of the sample mean is less than the population standard deviation. In fact, \(\sigma_{\bar{x}} = \sigma_x / \sqrt{n}\)

Check: \(1.118 / \sqrt{2} \approx 0.791\)
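The calculations in Example 1 can be checked by brute force. The following is a minimal Python sketch, not part of the original slides, that enumerates all 16 ordered samples of size 2 from {1, 2, 3, 4} and compares the mean and variance of the sample mean with the population values. The helper name `mean_and_variance` is an illustrative choice.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import sqrt

population = [1, 2, 3, 4]   # equally likely values, as in Example 1
n = 2                       # sample size

def mean_and_variance(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum(x * x * p for x, p in dist.items()) - mu ** 2
    return mu, var

# Population distribution: each value has probability 1/4.
pop_dist = {Fraction(x): Fraction(1, len(population)) for x in population}

# Sampling distribution: enumerate all ordered samples (with replacement)
# and count how often each sample mean occurs.
samples = list(product(population, repeat=n))
counts = Counter(Fraction(sum(s), n) for s in samples)
sampling_dist = {xbar: Fraction(c, len(samples)) for xbar, c in counts.items()}

pop_mu, pop_var = mean_and_variance(pop_dist)
xbar_mu, xbar_var = mean_and_variance(sampling_dist)

print("population:    mean", float(pop_mu), "sd", round(sqrt(pop_var), 3))
print("sample mean:   mean", float(xbar_mu), "sd", round(sqrt(xbar_var), 3))
print("sigma/sqrt(n):", round(sqrt(pop_var) / sqrt(n), 3))
```

Exact fractions are used so that the comparison of the two variances does not depend on floating-point rounding.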


Example 2

• We have the data 1, 7, 11, 12, 17, 17, 17, 21, 21, 21, 22, 22 and we want to take samples of size n = 3

• First, a histogram of the entire data set


Example 2 (continued)

• A histogram of the entire data set

• Definitely skewed left … not bell shaped


Example 2 (continued)

• Taking some samples of size 3

• The first sample, 17, 21, 12, has a mean of 16.7

• The second sample, 17, 7, 17, has a mean of 13.7

• The third sample, 22, 11, 21, has a mean of 18


Example 2 (continued)

• More sample means from more samples

• We calculate the mean for each sample as shown below:


Example 2 (continued)

• Finally, a histogram of 20 sample means


Example 2 (continued)

• The original data set was highly left skewed, but the set of sample means is less skewed and closer to bell shaped


Example 2 (continued)

• Now take samples of size 5, repeated 20 times.

• Here is a histogram of the 20 sample means:

[Histogram: Sampling Distribution of Sample Mean (sample size = 5); horizontal axis: Sample Mean, in bins 8.1 - 10.0, 10.1 - 12.0, 12.1 - 14.0, 14.1 - 16.0, 16.1 - 18.0, 18.1 - 20.0, 20.1 - 22.0; vertical axis: Frequency, 0 to 8]

Observe that the empirical distribution of the sample means gets closer to bell shaped as the sample size increases.
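The repeated sampling in Example 2 can be imitated with a short simulation. This is a sketch only: the slides do not say whether their samples were drawn with or without replacement, so sampling with replacement is assumed here, and the function name `sample_means` and the fixed seed are illustrative choices. The numbers it prints will not match the slides' particular samples.

```python
import random
import statistics

data = [1, 7, 11, 12, 17, 17, 17, 21, 21, 21, 22, 22]  # data set from Example 2

def sample_means(values, n, reps, seed=0):
    """Draw `reps` random samples of size n (with replacement, an assumption)
    and return the list of their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.mean(rng.choices(values, k=n)) for _ in range(reps)]

for n in (3, 5):
    means = sample_means(data, n, reps=20)
    print(f"n={n}: mean of sample means = {statistics.mean(means):.2f}, "
          f"sd of sample means = {statistics.stdev(means):.2f}")
```

With only 20 samples the results are noisy; increasing `reps` makes the narrowing of the distribution for n = 5 relative to n = 3 easier to see.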


Distribution of Sample Mean

• In general, the closer the underlying population is to bell shaped (normally distributed), the closer the sampling distribution (i.e. the distribution of the sample mean) will be to bell shaped as well.

• In fact, if the population is normally distributed, the sampling distribution
– Will be normally distributed
– Will have a mean equal to the mean of the population
– Will have a standard deviation less than the standard deviation of the population

Distribution of Sample Mean

• Why does it have a smaller standard deviation?

• The population standard deviation \(\sigma_x\)
– Is a measure of the distance/deviation between an individual value and the population mean
– Is the standard deviation of the sample mean for n = 1

• The standard deviation of the sample mean \(\sigma_{\bar{x}}\)
– Is a measure of the distance/deviation between the sample mean and the mean of the sampling distribution (which is the same as the population mean)

• It makes sense that an estimate of the population mean using a sample mean is more accurate (closer to the population mean) if the sample contains more values (a larger n) from the population. Therefore, the larger the sample size, the smaller the typical deviation of the sample mean from the true population mean. So the standard deviation of the sample mean decreases as the sample size n increases (it is inversely proportional to \(\sqrt{n}\)).

Sampling Distribution of Sample Mean

• If a simple random sample of size n is drawn from a population, then the sampling distribution of the sample mean has
– Mean \(\mu_{\bar{x}} = \mu_x\) and
– Standard deviation \(\sigma_{\bar{x}} = \sigma_x / \sqrt{n}\)

• In addition, if the population is normally distributed, then
– The sampling distribution is normally distributed

Standard Error

• The standard deviation of the sample mean, \(\sigma_{\bar{x}}\), is also called the standard error

• The formula for \(\sigma_{\bar{x}}\) is \(\sigma_{\bar{x}} = \sigma_x / \sqrt{n}\)

• This is an extremely important formula

Example

• If the random variable X has a normal distribution with a mean of 20 and a standard deviation of 12
– If we choose samples of size n = 4, then the sample mean will have a normal distribution with a mean of 20 and a standard deviation of 6 (since \(12 / \sqrt{4} = 6\))
– If we choose samples of size n = 9, then the sample mean will have a normal distribution with a mean of 20 and a standard deviation of 4 (since \(12 / \sqrt{9} = 4\))

Note: if the underlying population distribution is normal, the sampling distribution of the sample mean will be normal regardless of whether the sample size is large or small.
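As a quick check of the standard error formula, here is a small Python sketch (not from the slides; the function name `standard_error` is an illustrative choice) that reproduces the two values in this example and shows how the standard error shrinks as n grows.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sigma / math.sqrt(n)

# Population standard deviation 12, as in the example above.
for n in (1, 4, 9, 36):
    print(f"n = {n:2d}: standard error = {standard_error(12, n):.2f}")
```

Note that quadrupling the sample size only halves the standard error, because n enters through its square root.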


Sampling Distribution of sample Mean

• This is great if our random variable X has a normal distribution

• However … what if the underlying population distribution of the random variable X is not normal?

• What can we say about the sampling distribution of the sample mean?
– Wouldn’t it be very nice if the sampling distribution of the sample mean were also normal?
– This is almost true …

Central Limit Theorem

• The Central Limit Theorem states: Regardless of the shape of the underlying population distribution (provided its standard deviation is finite), the sampling distribution of the sample mean becomes approximately normal as the sample size n increases.

• Thus
– If the random variable X is normally distributed, then the sampling distribution of the sample mean is also normally distributed, regardless of the size of the sample.
– For all other random variables X, the sampling distribution of the sample mean is approximately normal if the sample size is large enough.
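The theorem can be illustrated by simulation. The sketch below is an illustration rather than anything taken from the slides: it assumes a strongly right-skewed population (exponential with mean 1 and standard deviation 1, chosen only for convenience) and uses skewness, which is 0 for a symmetric distribution, as a rough indicator of how far the sample means are from bell shaped. The function names are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score; 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def simulate_sample_means(n, reps, rng):
    """Means of `reps` samples of size n from an exponential population
    (mean 1, sd 1), an assumed example of a non-normal population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (2, 10, 30):
    means = simulate_sample_means(n, reps=10_000, rng=rng)
    print(f"n={n:2d}: mean={statistics.fmean(means):.3f}  "
          f"sd={statistics.pstdev(means):.3f}  skewness={skewness(means):.2f}")
```

As n grows, the simulated mean should stay near 1, the standard deviation should track 1/√n, and the skewness should move toward 0, which is the pattern the theorem describes.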


Graphical Illustration of the Central Limit Theorem

[Figure: four panels showing the Original Population and the Distribution of \(\bar{x}\) for n = 2, n = 10, and n = 30; horizontal axes marked from 10 to 30]

How large is the sample?

• This approximation, of the sampling distribution being normal, is good for large sample sizes … large values of n

• How large does n have to be?

• A rule of thumb – if n is 30 or higher, this approximation is probably pretty good


Applications of the Central Limit Theorem

• When the sampling distribution of the sample mean is (exactly) normally distributed, or approximately normally distributed (by the CLT), we can answer probability questions using the standard normal distribution.


Example 1

• We’ve been told that the average weight of giraffes is 2400 pounds with a standard deviation of 300 pounds

• We randomly picked 50 giraffes, weighed them, and found that the sample mean was 2600 pounds

• Is our data consistent with what we’ve been told? (That is, is the observed sample mean of 2600 pounds consistent with the claim that the mean weight of the giraffe population is 2400 pounds?)

Example 1 (continued)

• We do not know the shape of the distribution of giraffe weights, but the sample size of 50 is large enough to justify using the Central Limit Theorem. The sampling distribution of the average weight of 50 giraffes is expected to be approximately normal with a mean of 2400 pounds (the same as the population) and a standard deviation of 300 / √50 = 42.4 pounds

• Using our calculations for the general normal distribution, 2600 is 200 pounds over 2400, and 200 pounds is 200 / 42.4 = 4.7 standard deviations. That is, the Z-score for the average weight of 2600 is 4.7, a value far out in the right tail of the standard normal distribution.

• From our normal calculator, the probability of obtaining an average at least this large by chance is less than 0.00001.

• Something is definitely strange … we’ll see what to do later in inferential statistics
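The giraffe calculation can be reproduced without a calculator. This is a minimal sketch using Python's standard-library `statistics.NormalDist` in place of the slides' normal calculator; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2400, 300, 50   # claimed mean, population sd, sample size
observed_mean = 2600

se = sigma / sqrt(n)                 # standard error of the sample mean
z = (observed_mean - mu) / se        # z-score of the observed sample mean
p_upper = 1 - NormalDist().cdf(z)    # P(Z >= z) under the standard normal

print(f"standard error = {se:.1f}")
print(f"z = {z:.2f}")
print(f"P(sample mean >= {observed_mean}) = {p_upper:.2e}")
```

The tail probability comes out far below 0.00001, in line with the conclusion above that such a sample mean would be very surprising if the claimed mean were correct.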


Example 2

Consider a normal population with μ = 50 and σ = 15. Suppose a sample of size 9 is selected at random. Find:

1) \(P(45 \le \bar{x} \le 60)\)

2) \(P(\bar{x} \le 47.5)\)

Solutions: Since the original population is normal, the distribution of the sample mean is also (exactly) normal with

1) \(\mu_{\bar{x}} = \mu = 50\)

2) \(\sigma_{\bar{x}} = \sigma / \sqrt{n} = 15 / \sqrt{9} = 15 / 3 = 5\)

3) Use a TI calculator to find the probabilities: normalcdf(45,60,50,5) = 0.8186 and normalcdf(-E99,47.5,50,5) = 0.3085

Example 2 (continued)

Or, use the Z-table to solve, with \(z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}\):

\[P(45 \le \bar{x} \le 60) = P\left(\frac{45 - 50}{5} \le z \le \frac{60 - 50}{5}\right) = P(-1.00 \le z \le 2.00) = 0.3413 + 0.4772 = 0.8185\]

[Figure: normal curve with the \(\bar{x}\) axis marked at 45, 50, 60 and the z axis marked at -1.00, 0, 2.00; shaded areas 0.3413 and 0.4772]

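Example 2 (continued)

Using the same z-score formula, \(z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}\):

\[P(\bar{x} \le 47.5) = P\left(\frac{\bar{x} - 50}{5} \le \frac{47.5 - 50}{5}\right) = P(z \le -0.50) = 0.5000 - 0.1915 = 0.3085\]

[Figure: normal curve with the \(\bar{x}\) axis marked at 47.5 and 50 and the z axis marked at -0.50 and 0; areas 0.1915 and 0.3085]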
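For readers without a TI calculator or Z-table, both probabilities in Example 2 can be computed with a short Python sketch. It uses `statistics.NormalDist` from the standard library as a stand-in for normalcdf; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 15, 9
# Sampling distribution of the sample mean: normal with mean 50 and sd 15/sqrt(9) = 5.
xbar_dist = NormalDist(mu, sigma / sqrt(n))

p_between = xbar_dist.cdf(60) - xbar_dist.cdf(45)   # P(45 <= xbar <= 60)
p_below = xbar_dist.cdf(47.5)                       # P(xbar <= 47.5)

print(f"P(45 <= xbar <= 60) = {p_between:.4f}")
print(f"P(xbar <= 47.5)     = {p_below:.4f}")
```

The Z-table answer of 0.8185 for the first probability differs from the calculator value of 0.8186 only because the table entries are rounded to four decimal places.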


Example 3

A recent report stated that the day-care cost per week in Boston is $109. Suppose this figure is taken as the mean cost per week and that the standard deviation is known to be $20.

1) Find the probability that a sample of 50 day-care centers would show a mean cost of $105 or less per week.

2) Suppose the actual sample mean cost for the sample of 50 day-care centers is $120. Is there any evidence to refute the claim of $109 presented in the report?

Solutions: The shape of the original distribution is unknown, but the sample size, n = 50, is large, so the CLT applies. The distribution of \(\bar{X}\) is approximately normal with

\[\mu_{\bar{x}} = \mu = 109, \qquad \sigma_{\bar{x}} = \sigma / \sqrt{n} = 20 / \sqrt{50} \approx 2.83\]

Example 3 (continued)

1) Use the Z-table, with \(z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}\):

\[P(\bar{x} \le 105) = P\left(z \le \frac{105 - 109}{2.83}\right) = P(z \le -1.41) = 0.5000 - 0.4207 = 0.0793\]

[Figure: normal curve with the \(\bar{x}\) axis marked at 105 and 109 and the z axis marked at -1.41 and 0; areas 0.4207 and 0.0793]

Or, use a TI calculator: normalcdf(-E99,105,109,2.83) = 0.0787

Example 3 (continued)

2) To investigate the claim, we need to examine how likely it is to observe a sample mean of $120 or more if the mean cost is really $109.

• Consider how far out in the tail of the distribution of the sample mean $120 is, with \(z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}\):

\[P(\bar{x} \ge 120) = P\left(z \ge \frac{120 - 109}{2.83}\right) = P(z \ge 3.89) = 0.5000 - 0.4999 = 0.0001\]

Or, using a TI calculator: normalcdf(120,E99,109,2.83) = 5.08E-5

• Since the probability is so small, this suggests the observation of $120 is very rare (if the mean cost is really $109)

• There is evidence (the sample) to suggest the claim of μ = $109 is likely wrong
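Both parts of Example 3 can be checked with a short Python sketch. It uses `statistics.NormalDist` in place of the calculator and keeps the unrounded standard error 20/√50, so its results may differ slightly in the last digits from values computed with the rounded 2.83. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 109, 20, 50
se = sigma / sqrt(n)               # standard error, about 2.83
xbar_dist = NormalDist(mu, se)     # approximate sampling distribution (by the CLT)

p_105_or_less = xbar_dist.cdf(105)        # part 1: P(xbar <= 105)
p_120_or_more = 1 - xbar_dist.cdf(120)    # part 2: P(xbar >= 120)

print(f"standard error  = {se:.2f}")
print(f"P(xbar <= 105)  = {p_105_or_less:.4f}")
print(f"P(xbar >= 120)  = {p_120_or_more:.2e}")
```

Note that the mean passed to the normal distribution is the claimed mean 109 in both parts; the observed value 120 is only the cutoff.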


Summary

• The sample mean is a random variable with a distribution called the sampling distribution
– If the sample size n is sufficiently large (30 or more is a good rule of thumb), then this distribution is approximately normal
– The mean of the sampling distribution is equal to the mean of the underlying population
– The standard error/deviation of the sampling distribution is equal to \(\sigma_x / \sqrt{n}\)

Distribution of the Sample Proportion


Learning Objective

• Describe the sampling distribution of a sample proportion

• Calculate probabilities of a sample proportion


Sample Proportion

• In an election, polling companies wish to estimate the percent of people who will vote for each candidate

• This clearly is a situation for sampling as it is impractical to contact every single voter

• The desired results are proportions, for example that 59% of the voters (a proportion of 0.59) said that they will vote for candidate A


Sampling Distribution of Sample Proportion

• We have the same questions for the sample proportion as we had for the sample mean
– What is the mean of the sampling distribution of the sample proportion?
– What is the standard deviation of the sampling distribution of the sample proportion?
– What is the distribution of the sample proportion?
– Can we apply the Central Limit Theorem to approximate these with normal distributions?

• The answer is yes …

Sample Proportions

• A random sample is taken
– Of size n
– Each individual either has or does not have a certain characteristic (dichotomous outcomes)
– In total, there are x individuals that have this characteristic

• Then the sample proportion p̂ (p hat), the proportion of individuals in the sample with this characteristic, is given by \(\hat{p} = x / n\)

Example

• If a polling company polled 800 people to see if they supported a certain issue and 475 did, then we have a sample proportion problem with
– n = 800
– x = 475
– and a sample proportion of \(\hat{p} = 475 / 800 \approx 0.59\)

Sampling Distribution of Sample Proportion

• If the population proportion is p, then the distribution of the sample proportion p̂ for a sample of size n
– Is approximately normal if np(1-p) ≥ 10
– Has a mean of \(\mu_{\hat{p}} = p\)
– Has a standard deviation of \(\sigma_{\hat{p}} = \sqrt{p(1-p)/n}\)
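These formulas can be checked by simulation. The following Python sketch is illustrative only: the values p = 0.80 and n = 100, the function name `simulate_p_hat`, and the fixed seed are assumptions made for the demonstration. It draws many samples of yes/no outcomes and compares the simulated mean and standard deviation of p̂ with the formulas above.

```python
import random
import statistics
from math import sqrt

def simulate_p_hat(p, n, reps, seed=0):
    """Simulate `reps` sample proportions, each from n independent yes/no draws
    where 'yes' has probability p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.80, 100   # illustrative values, assumed for this sketch
p_hats = simulate_p_hat(p, n, reps=5_000)

print("np(1-p) >= 10:", n * p * (1 - p) >= 10)
print("mean of p-hat: simulated", round(statistics.fmean(p_hats), 3), "formula", p)
print("sd of p-hat:   simulated", round(statistics.pstdev(p_hats), 3),
      "formula", round(sqrt(p * (1 - p) / n), 3))
```

The simulated values should land close to p and to √(p(1-p)/n), with small differences due to random variation.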


Example

• Assume that 80% of the people taking aerobics classes are female and a simple random sample of n = 100 students is taken
– What is the probability that at most 75% of the sample students are female?
– If the sample had exactly 90 female students, would that be unusual?

Example (continued)

• The sample proportion p̂ of aerobics students who are female
– Has an approximately normal distribution, since np(1-p) = 100(0.80)(0.20) = 16 ≥ 10
– Has a mean of 0.80 and a standard deviation of \(\sqrt{0.80(0.20)/100} = 0.04\)

• What is the probability that p̂ is 0.75 or less?
– 0.75 is 0.05 less than the mean of 0.80
– 0.05 is 1.25 standard deviations less than the mean (i.e. the z-score is –1.25)
– The normal probability P(z ≤ –1.25) = .1056
– Thus P(p̂ ≤ 0.75) = .1056

Note: To obtain P(p̂ ≤ 0.75) = .1056, use a TI graphing calculator with normalcdf(-E99,-1.25,0,1) or normalcdf(-E99,0.75,0.80,0.04) [or normalcdf(0,0.75,0.80,0.04), since a proportion can’t be less than zero].

Example (continued)

• The sample proportion p̂ of aerobics students who are female
– Has an approximately normal distribution
– Has a mean of 0.80 and a standard deviation of 0.04

• What is the probability that p̂ is 0.90 or more?

Notice that instead of finding the probability that p̂ is exactly 0.90 (which would be zero under a normal distribution), we judge whether 0.90 is unusually high by evaluating the probability that p̂ is at least as high as 0.90, since 0.90 is larger than the expected value of 0.80.

– 0.90 is 0.10 more than the mean of 0.80
– 0.10 is 2.5 standard deviations more than the mean (i.e. the z-score is 2.5)
– The normal probability P(z ≥ 2.5) = .0062
– Thus P(p̂ ≥ 0.90) = 0.0062 … pretty unlikely

Note: To obtain P(p̂ ≥ 0.90) = 0.0062, use a TI calculator with normalcdf(2.5,E99,0,1) = 0.0062 or normalcdf(0.9,E99,0.8,0.04) = 0.0062 [or normalcdf(0.9,1,0.8,0.04), since a proportion can only go up to 1].
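The two probabilities in the aerobics example can also be computed in Python. This sketch uses `statistics.NormalDist` as a stand-in for the TI normalcdf function and relies on the same normal approximation as the slides; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.80, 100
se = sqrt(p * (1 - p) / n)           # standard deviation of p-hat: 0.04
p_hat_dist = NormalDist(p, se)       # normal approximation to the distribution of p-hat

p_at_most_75 = p_hat_dist.cdf(0.75)          # P(p-hat <= 0.75)
p_at_least_90 = 1 - p_hat_dist.cdf(0.90)     # P(p-hat >= 0.90)

print(f"sd of p-hat      = {se:.2f}")
print(f"P(p-hat <= 0.75) = {p_at_most_75:.4f}")
print(f"P(p-hat >= 0.90) = {p_at_least_90:.4f}")
```

These are approximations: the exact distribution of the number of female students is binomial, and the normal curve is used here only because np(1-p) = 16 meets the rule of thumb above.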


Summary

• The sample proportion, like the sample mean, is a random variable
– If the sample size n is sufficiently large and the population proportion p isn’t close to either 0 or 1, then its sampling distribution is approximately normal
– The mean of the sampling distribution is equal to the population proportion p
– The standard deviation of the sampling distribution is equal to \(\sqrt{p(1-p)/n}\)

Summary of Sampling Distributions

Summary

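• The sample mean and the sample proportion can be considered as random variables

• For a sufficiently large sample, the sample mean is approximately normal with
– A mean equal to the population mean: \(\mu_{\bar{x}} = \mu_x\)
– A standard deviation equal to \(\sigma_{\bar{x}} = \sigma_x / \sqrt{n}\)

• For a sufficiently large sample, the sample proportion is approximately normal with
– A mean equal to the population proportion: \(\mu_{\hat{p}} = p\)
– A standard deviation equal to \(\sigma_{\hat{p}} = \sqrt{p(1-p)/n}\)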