sampling distribution models - anne gloag's math page...


Page 1: Sampling Distribution Models - Anne Gloag's Math Page …annegloag.weebly.com/uploads/2/2/9/9/22998796/chapter18... · 2019-12-01 · The score distribution shown in the table is

Sampling Distribution Models

Copyright © 2009 Pearson Education, Inc.


Rather than showing real repeated samples, imagine what would happen if we were to actually draw many samples.

The histogram we’d get if we could see all the proportions from all possible samples is called the sampling distribution of the proportions.


It turns out that the histogram is unimodal, symmetric, and centered at p.

More specifically, it’s an amazing and fortunate fact that a Normal model is just the right one for the histogram of sample proportions.


Model how sample proportions vary from sample to sample.

A sampling distribution model for how a sample proportion varies from sample to sample allows us to quantify that variation and how likely it is that we’d observe a sample proportion in any particular interval.


When working with proportions,

Mean = p

Standard deviation = $\sqrt{\dfrac{pq}{n}}$, where q = 1 – p

So, the distribution of the sample proportions is modeled with a probability model that is $N\left(p, \sqrt{\dfrac{pq}{n}}\right)$.

A picture of what we just discussed is as follows:


The Normal model says that about 95% of values are within two standard deviations of the mean.

So about 95% of polls give results that fall within two standard deviations of the mean, above or below it.

This is what we mean by sampling error. It’s not really an error at all, but just variability you’d expect to see from one sample to another.
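The claim above can be checked with a small simulation. The sketch below is only an illustration: the poll size, the true proportion, and the number of repeated polls are hypothetical values chosen for the example, not figures from the slides. It draws many samples, then compares the simulated mean and standard deviation of the sample proportions with p and $\sqrt{pq/n}$, and counts how many sample proportions fall within two model standard deviations of p.

```python
import math
import random


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


# Hypothetical setup: true proportion 0.5, 400 respondents, 2000 repeated polls.
p, n = 0.5, 400
props = simulate_sample_proportions(p, n, num_samples=2000)

model_sd = math.sqrt(p * (1 - p) / n)  # sqrt(pq/n) from the Normal model
sim_mean = sum(props) / len(props)
sim_sd = math.sqrt(sum((x - sim_mean) ** 2 for x in props) / (len(props) - 1))
share_within_2sd = sum(abs(x - p) <= 2 * model_sd for x in props) / len(props)

print(f"model mean {p:.3f}   simulated mean {sim_mean:.3f}")
print(f"model SD   {model_sd:.4f}  simulated SD   {sim_sd:.4f}")
print(f"share of polls within 2 SD of p: {share_within_2sd:.3f}")
```

The share within two standard deviations should come out close to 95%, though not exactly, because the sample proportion is discrete and the Normal model is an approximation.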


The Normal model becomes a better model for the distribution of sample proportions as the sample size gets bigger.


There are two assumptions in the case of the model for the distribution of sample proportions:

1. The Independence Assumption: The sampled values must be independent of each other.

2. The Sample Size Assumption: The sample size, n, must be large enough.


In practice, we check these assumptions through three conditions:

1. Randomization Condition: The sample should be a simple random sample of the population.

2. 10% Condition: If the sample is drawn without replacement, then the sample size, n, must be no larger than 10% of the population.

3. Success/Failure Condition: The sample size has to be big enough so that both np and nq are at least 10.
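The two numeric conditions can be checked mechanically. The helper below is a hypothetical sketch (the function name and return format are invented for illustration); the Randomization Condition depends on how the data were collected and cannot be verified from the numbers alone, so it is left out.

```python
def check_proportion_conditions(n, p, population_size=None):
    """Check the 10% and Success/Failure conditions for a sample proportion.

    n: sample size, p: assumed population proportion,
    population_size: optional; the 10% check is skipped when it is unknown.
    """
    q = 1 - p
    results = {
        # Both expected successes (np) and expected failures (nq) must be >= 10.
        "success_failure": n * p >= 10 and n * q >= 10,
    }
    if population_size is not None:
        # The sample should be no larger than 10% of the population.
        results["ten_percent"] = n <= 0.10 * population_size
    return results


# Hypothetical usage: 500 sampled loans out of a population of 20,000, p = 0.12.
print(check_proportion_conditions(500, 0.12, population_size=20_000))
```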


Sampling distribution models are important because

◦ they act as a bridge from the real world of data to the imaginary model of the statistic and

◦ enable us to say something about the population when all we have is data from the real world.


Proportions summarize categorical variables.

The Normal sampling distribution model looks like it will be very useful.

Can we do something similar with quantitative data?

We can indeed. Even more remarkable, not only can we use all of the same concepts, but almost the same model.


A sample mean also has a sampling distribution.

Let’s start with a simulation of 10,000 tosses of a die. A histogram of the results is:


Looking at the average of two dice after a simulation of 10,000 tosses:

The average of three dice after a simulation of 10,000 tosses looks like:


The average of 5 dice after a simulation of 10,000 tosses looks like:

The average of 20 dice after a simulation of 10,000 tosses looks like:


As the sample size (number of dice) gets larger, each sample average is more likely to be closer to the population mean.

The sampling distribution of a mean becomes Normal.
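The dice experiment is easy to rerun. The following sketch, with a hypothetical helper name and an arbitrary random seed, repeats the 10,000-toss simulation for 1, 2, 3, 5, and 20 dice and reports the mean and standard deviation of the averages. The averages stay centered near 3.5 (the mean of a fair die) while their spread shrinks as more dice are averaged.

```python
import random
import statistics


def simulate_dice_averages(num_dice, num_trials=10_000, seed=1):
    """Return num_trials averages, each taken over num_dice fair dice."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(num_trials)
    ]


summary = {}
for num_dice in (1, 2, 3, 5, 20):
    averages = simulate_dice_averages(num_dice)
    summary[num_dice] = (statistics.mean(averages), statistics.stdev(averages))
    mean, sd = summary[num_dice]
    print(f"{num_dice:>2} dice: mean of averages {mean:.3f}, SD of averages {sd:.3f}")
```

Plotting a histogram of each list of averages would show the shapes described on the slides; the printed summaries only capture center and spread.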


The sampling distribution of any mean becomes more nearly Normal as the sample size grows.

◦ All we need is for the observations to be independent and collected with randomization.

◦ We don’t even care about the shape of the population distribution!

The Fundamental Theorem of Statistics is called the Central Limit Theorem (CLT).


The Central Limit Theorem (CLT)

The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation will be.


The CLT requires essentially the same assumptions we saw for modeling proportions:

Independence Assumption: The sampled values must be independent of each other.

Sample Size Assumption: The sample size must be sufficiently large.


The Normal model for the sampling distribution of the mean has a mean equal to the population mean:

$E(\bar{y}) = \mu$

And a standard deviation equal to

$SD(\bar{y}) = \dfrac{\sigma}{\sqrt{n}}$

where σ is the population standard deviation.

Both of the sampling distributions we’ve looked at are Normal.

◦ For proportions: $SD(\hat{p}) = \sqrt{\dfrac{pq}{n}}$

◦ For means: $SD(\bar{y}) = \dfrac{\sigma}{\sqrt{n}}$

When we don’t know p or σ, we will use sample statistics to estimate these population parameters.

Whenever we estimate the standard deviation of a sampling distribution, we call it a standard error.


For a sample proportion, the standard error is

$SE(\hat{p}) = \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

For the sample mean, the standard error is

$SE(\bar{y}) = \dfrac{s}{\sqrt{n}}$
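As a rough illustration of the two standard errors, the sketch below computes each from sample data. The function names and the example data are hypothetical; `statistics.stdev` is used for s because it is the sample standard deviation (dividing by n – 1).

```python
import math
import statistics


def se_proportion(successes, n):
    """Standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    return math.sqrt(p_hat * (1 - p_hat) / n)


def se_mean(values):
    """Standard error of a sample mean: s / sqrt(n), with s the sample SD."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical data: 60 successes in 500 trials, and eight made-up measurements.
print(f"SE(p_hat) = {se_proportion(60, 500):.4f}")
print(f"SE(y_bar) = {se_mean([2, 4, 4, 4, 5, 5, 7, 9]):.4f}")
```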

Be careful! Now we have two distributions to deal with.


The first is the real world distribution of the sample, which we might display with a histogram.

The second is the math world sampling distribution of the statistic, which we model with a Normal model based on the Central Limit Theorem.

Don’t confuse the two!


There are two basic truths about sampling distributions:

1. Sampling distributions arise because samples vary. Each random sample will have different cases and so, a different value of the statistic.

2. Although we can always simulate a sampling distribution, the Central Limit Theorem saves us the trouble for means and proportions.


Don’t confuse the sampling distribution with the distribution of the sample.

◦ When you take a sample, you look at the distribution of the values, usually with a histogram, and you may calculate summary statistics.

◦ The sampling distribution is an imaginary collection of the values that a statistic might have taken for all random samples: the one you got and the ones you didn’t get.

Watch out for small samples from skewed populations.

◦ The more skewed the distribution, the larger the sample size we need for the CLT to work.


Based on past experience, a bank believes that 12% of people who receive loans will not make payments on time. The bank has recently approved 500 loans.

a) What are the mean and standard deviation of the proportion of clients in this group who may not make timely payments?

μ = p = 0.12

$SD = \sqrt{pq/n} = \sqrt{(0.12)(0.88)/500} = 0.015$

b) What is the probability that over 14% of these clients will not make payments on time?

$P(\hat{p} > 0.14)$ = 1 – Normdist(0.14, 0.12, 0.015, 1) = 0.091
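For readers without a spreadsheet, the same calculation can be sketched in Python. This assumes that Normdist(x, mean, sd, 1) on the slides is the spreadsheet cumulative Normal function, which corresponds to `NormalDist(mean, sd).cdf(x)` in Python's standard library. The helper name `proportion_model` is hypothetical.

```python
import math
from statistics import NormalDist


def proportion_model(p, n):
    """Normal sampling model N(p, sqrt(pq/n)) for a sample proportion."""
    return NormalDist(mu=p, sigma=math.sqrt(p * (1 - p) / n))


# Bank example: p = 0.12, n = 500 loans.
model = proportion_model(0.12, 500)
prob_over_14_percent = 1 - model.cdf(0.14)  # upper tail, so subtract from 1

print(f"SD = {model.stdev:.4f}")
print(f"P(p_hat > 0.14) = {prob_over_14_percent:.3f}")
```

Because the code keeps the unrounded standard deviation instead of rounding it to 0.015 first, its probability differs slightly in the third decimal place from a hand calculation with the rounded value.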


Just before a referendum on a school budget, a local newspaper polls 435 voters to predict whether the budget will pass. Suppose the budget has the support of 54% of the voters. What is the probability that the newspaper’s sample will lead it to predict defeat?

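Mean and standard deviation of the proportion:

μ = p = 0.54

$SD = \sqrt{pq/n} = \sqrt{(0.54)(0.46)/435} = 0.024$

The newspaper will predict defeat if fewer than 50% of the sampled voters support the budget:

$P(\hat{p} < 0.5)$ = Normdist(0.5, 0.54, 0.024, 1) = 0.048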


When a truckload of apples arrives at a packing plant, a random sample of 125 is selected and examined for bruises, discoloration, and other defects. The whole truckload is rejected if more than 10% of the sample is unsatisfactory. Suppose in fact that 12% of the apples on the truck do not meet the desired standard. What is the probability that the shipment will be accepted anyway?

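Mean and standard deviation of the proportion:

μ = p = 0.12

$SD = \sqrt{pq/n} = \sqrt{(0.12)(0.88)/125} = 0.029$

The shipment is accepted if no more than 10% of the sample is unsatisfactory:

$P(\hat{p} < 0.10)$ = Normdist(0.1, 0.12, 0.029, 1) = 0.245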


A new restaurant with 119 seats is being planned. Studies show that 63% of the customers demand a smoke-free area. How many seats should be in the non-smoking area in order to be very sure (μ + 3σ) of having enough seating there?

Mean and standard deviation of the proportion:

μ = p = 0.63

$SD = \sqrt{pq/n} = \sqrt{(0.63)(0.37)/119} = 0.044$

μ + 3σ = 0.63 + 3(0.044) ≈ 0.763

0.763 × 119 ≈ 90.8, so round up to about 91 seats


Assume that the duration of human pregnancies can be described by a normal model with mean 268 days and standard deviation 16 days.

a) What percentage of pregnancies should last between 255 and 270 days?

P(255 < x < 270) = normdist(270, 268, 16, 1) – normdist(255, 268, 16, 1) = 0.341 = 34.1%


b) At least how many days should the longest 30% of all pregnancies last?

P(x > ?) = 0.3

norminv(0.7,268,16) = 276.4


c) Suppose a certain obstetrician is currently providing prenatal care to 40 pregnant women. According to the CLT, what are the mean and standard deviation of the model for the mean duration of these pregnancies?

Mean = 268

$SD = \dfrac{\sigma}{\sqrt{n}} = \dfrac{16}{\sqrt{40}} = 2.53$

d) What is the probability that the mean duration of these patients’ pregnancies will be less than 274 days?

$P(\bar{y} < 274)$ = normdist(274, 268, 2.53, 1) = 0.991
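The four pregnancy calculations can be reproduced with Python's standard library, assuming normdist and norminv on the slides are the spreadsheet cumulative and inverse Normal functions (`cdf` and `inv_cdf` on `statistics.NormalDist`). The variable names below are chosen for this sketch only.

```python
import math
from statistics import NormalDist

# Individual pregnancies: Normal model with mean 268 days, SD 16 days.
pregnancy = NormalDist(mu=268, sigma=16)

# a) Share of pregnancies lasting between 255 and 270 days.
share_255_to_270 = pregnancy.cdf(270) - pregnancy.cdf(255)

# b) Cutoff for the longest 30%: the 70th percentile.
cutoff_longest_30 = pregnancy.inv_cdf(0.70)

# c) Sampling model for the mean of n = 40 pregnancies: SD shrinks to sigma/sqrt(n).
n = 40
mean_model = NormalDist(mu=268, sigma=16 / math.sqrt(n))

# d) Probability that the mean duration is below 274 days.
prob_mean_below_274 = mean_model.cdf(274)

print(f"a) {share_255_to_270:.3f}")
print(f"b) {cutoff_longest_30:.1f} days")
print(f"c) mean {mean_model.mean:.0f}, SD {mean_model.stdev:.2f}")
print(f"d) {prob_mean_below_274:.3f}")
```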


The score distribution shown in the table is for all students who took a yearly AP statistics exam. An AP statistics teacher had 46 students preparing to take the AP exam. He considered his students to be “typical” of all the national students.

| Score | Percent of students (%) |
|-------|-------------------------|
| 5     | 13.9                    |
| 4     | 22.5                    |
| 3     | 25.3                    |
| 2     | 17.2                    |
| 1     | 21.1                    |


What is the probability that his students will achieve an average score of at least 3?

1. Find the mean and standard deviation of the population.

$\mu = E(X) = \sum x \, P(x) = 2.909$

$\sigma = \sqrt{\sum (x - \mu)^2 \, P(x)} = 1.337$


2. Find the mean and standard deviation of the sampling distribution of the sample mean.

Mean = 2.909

SD = σ/sqrt(n) = 1.337/sqrt(46) = 0.197

3. Find probability:

$P(\bar{y} \ge 3)$ = 1 – normdist(3, 2.909, 0.197, 1) = 0.322
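The three steps can be recomputed directly from the score table, which is a useful check on the arithmetic. This is a sketch with variable names chosen for illustration; it treats the 46 students as a random sample from the national distribution, as the problem assumes.

```python
import math
from statistics import NormalDist

# Score table from the slides: score -> percent of students.
score_percent = {5: 13.9, 4: 22.5, 3: 25.3, 2: 17.2, 1: 21.1}
probs = {score: pct / 100 for score, pct in score_percent.items()}

# Step 1: population mean and standard deviation of a single score.
mu = sum(score * p for score, p in probs.items())
sigma = math.sqrt(sum((score - mu) ** 2 * p for score, p in probs.items()))

# Step 2: sampling model for the mean of n = 46 scores.
n = 46
sd_of_mean = sigma / math.sqrt(n)

# Step 3: probability that the class average is at least 3.
prob_avg_at_least_3 = 1 - NormalDist(mu, sd_of_mean).cdf(3)

print(f"mu = {mu:.3f}, sigma = {sigma:.3f}")
print(f"SD of the sample mean = {sd_of_mean:.3f}")
print(f"P(average >= 3) = {prob_avg_at_least_3:.3f}")
```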


The weight of potato chips in a large-size bag is stated to be 16 ounces. The amount that the packaging machine puts in these bags is believed to have a normal model with a mean of 16.3 ounces and a standard deviation of 0.21 ounces.

a) What fraction of all bags sold are underweight?

P(x < 16) = normdist(16, 16.3, 0.21, 1) = 0.0766

b) Some of the chips are sold in bargain packs of 5 bags. What is the probability that none of the 5 is underweight?

With X the number of underweight bags among the 5 (treated as independent), $P(X = 0) = p^0 q^5 = (1 - 0.0766)^5 = 0.6715$

c) What is the probability that the mean weight of the 5 bags is below the stated amount?

$P(\bar{y} < 16)$ = normdist(16, 16.3, 0.21/sqrt(5), 1) = 0.0007

d) What is the probability that the mean weight of a 30-bag case of potato chips is below 16 ounces?

$P(\bar{y} < 16)$ = normdist(16, 16.3, 0.21/sqrt(30), 1) ≈ 0.0000
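The contrast between a single bag, "none of five bags", and the mean of several bags is easy to lose track of, so here is a short sketch that computes all four answers side by side. It assumes the bags are filled independently and uses `statistics.NormalDist` in place of the spreadsheet function; the variable names are illustrative.

```python
import math
from statistics import NormalDist

MEAN_OZ, SD_OZ, STATED_OZ = 16.3, 0.21, 16

# a) One bag: probability that it is underweight.
p_underweight = NormalDist(MEAN_OZ, SD_OZ).cdf(STATED_OZ)

# b) Five independent bags: probability that none is underweight.
p_none_of_5 = (1 - p_underweight) ** 5

# c), d) Mean weight of n bags: same center, SD shrinks to sigma / sqrt(n).
p_mean_5_under = NormalDist(MEAN_OZ, SD_OZ / math.sqrt(5)).cdf(STATED_OZ)
p_mean_30_under = NormalDist(MEAN_OZ, SD_OZ / math.sqrt(30)).cdf(STATED_OZ)

print(f"a) {p_underweight:.4f}")
print(f"b) {p_none_of_5:.4f}")
print(f"c) {p_mean_5_under:.4f}")
print(f"d) {p_mean_30_under:.2e}")
```

Part d) prints in scientific notation because the probability is far too small to show in four decimal places.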


Suppose that the IQs of university A’s students can be described by a normal model with mean 130 and standard deviation 7 points. Also suppose that IQs of students from university B can be described by a normal model with mean 110 and standard deviation 12.

a) Select a student at random from university A. Find the probability that the student’s IQ is at least 125 points.

P(x > 125) = 1 – normdist(125, 130, 7, 1) = 0.762

b) Select a student at random from each school. Find the probability that the university A student’s IQ is at least 5 points higher than the university B student’s IQ.

Define Z = A – B

μ = 130 – 110 = 20

$\sigma = \sqrt{7^2 + 12^2} = 13.89$

P(Z > 5) = 1 – normdist(5, 20, 13.89, 1) = 0.860


c) Select 3 university B students at random. Find the probability that this group’s average IQ is at least 115 points.

$P(\bar{y} > 115)$ = 1 – normdist(115, 110, 12/sqrt(3), 1) = 0.235

d) Also select 3 university A students at random. What is the probability that their average IQ is at least 5 points higher than the average for the 3 university B students?

Define Z = Ā – B̄, the difference between the two group averages

μ = 130 – 110 = 20

$\sigma = \sqrt{7^2/3 + 12^2/3} = 8.02$

P(Z > 5) = 1 – normdist(5, 20, 8.02, 1) = 0.969
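Both difference questions use the same idea: for independent quantities the variances add, even though the means subtract. A minimal sketch, assuming the two schools' students are sampled independently (the helper name `difference_model` is hypothetical):

```python
import math
from statistics import NormalDist


def difference_model(mu_a, sd_a, mu_b, sd_b, n=1):
    """Normal model for (mean of n from A) - (mean of n from B), independent groups."""
    sd_diff = math.sqrt(sd_a ** 2 / n + sd_b ** 2 / n)  # variances add
    return NormalDist(mu=mu_a - mu_b, sigma=sd_diff)


# One student from each school.
single = difference_model(130, 7, 110, 12, n=1)
prob_single = 1 - single.cdf(5)

# Averages of three students from each school.
triple = difference_model(130, 7, 110, 12, n=3)
prob_triple = 1 - triple.cdf(5)

print(f"one student each:    SD {single.stdev:.2f}, P(Z > 5) = {prob_single:.3f}")
print(f"three students each: SD {triple.stdev:.2f}, P(Z > 5) = {prob_triple:.3f}")
```

Averaging three students per school narrows the spread of the difference, which is why the second probability is larger.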
