form groups of three. each group needs: 3 sampling distributions worksheets (one per person)

37

Upload: red

Post on 23-Feb-2016

45 views

Category:

Documents


0 download

DESCRIPTION

Form groups of three. Each group needs: 3 Sampling Distributions Worksheets (one per person) 5 six-sided dice. Chapter 8. Sampling Variability and Sampling Distributions. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)
Page 2: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

• Form groups of three. Each group needs: • 3 Sampling Distributions Worksheets (one

per person)• 5 six-sided dice

Page 3: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Chapter 8Sampling Variability and Sampling Distributions

Page 4: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Suppose we are interested in finding the true mean (m) fat content of quarter-pound hamburgers marketed by a national fast food chain. To learn something about m, we could obtain a sample of n = 50 hamburgers and determine the fat content of each one.

Would the sample mean be a good estimate of m?How close is the

sample mean to m?

Will other samples of n = 50 have the same sample mean?

To answer these questions, we will examine the sampling distribution,

which describes the long-run behavior of sample statistic.

Recall that the sample mean is a statistic

Page 5: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Statistic• A number that that can be computed

from sample data • Some statistics we will use include

x – sample means – standard deviationp – sample proportion

• The observed value of the statistic depends on the particular sample selected from the population and it will vary from sample to sample.

This variability is called sampling

variability

Page 6: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

The campus of Wolf City College has a fish pond. Suppose there are 20 fish in the pond. The lengths of the fish (in inches) are given below: 4.5 5.4 10.

37.9 8.5 6.6 11.

78.9 2.2 9.8

6.3 4.3 9.6 8.7 13.3

4.6 10.7

13.4

7.7 5.6We caught fish with lengths 6.3 inches, 2.2 inches, and 13.3 inches.x = 7.27 inches

Let’s catch two more

samples and look at the

sample means.

2nd sample - 8.5, 4.6, and 5.6 inches.x = 6.23 inches3rd sample – 10.3, 8.9, and 13.4 inches.x = 10.87 inches

Suppose we randomly catch a sample of 3 fish from this pond and measure their length. What would the mean length of the sample be?

This is an example

of sampling variability

This is a statistic!The true mean m =

8.Notice that some

sample means are closer and some

farther away; some above and some below the mean.

Page 7: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Fish Pond Continued . . .

4.5 5.4 10.3

7.9 8.5 6.6 11.7

8.9 2.2 9.8

6.3 4.3 9.6 8.7 13.3

4.6 10.7

13.4

7.7 5.6There are 1140 (20C3) different possible samples of size 3 from this population. If we were to catch all those different samples and calculate the mean length of each sample, we would have a distribution of all possible x.

This would be the sampling distribution of x.

Page 8: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Sampling Distributions of x• The distribution that would be

formed by considering the value of a sample statistic for every possible different sample of a given size from a population.

In this case, the sample statistic is the sample

mean x.

Page 9: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Fish Pond Revisited . . .Suppose there are only 5 fish in the pond. The lengths of the fish (in inches) are given below:

mx = 7.84sx = 3.262

What is the mean and standard

deviation of this population?

6.6 11.7

8.9 2.2 9.8 We will keep the population size

small so that we can find ALL the

possible samples.

Page 10: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Fish Pond Revisited . . .

What is the mean and standard

deviation of these sample

means?

6.6 11.7

8.9 2.2 9.8

Pairs 6.6 & 11.7

6.6 & 8.9

6.6 & 2.2

6.6 & 9.8

11.7 & 8.9

11.7 & 2.2

11.7 & 9.8

8.9 & 2.2

8.9 & 9.8

2.2 &

9.8x 9.15 7.75 4.4 8.2 10.3 6.95 10.7

55.55 9.35 6How many

samples of size 2 are possible?

mx = 7.84sx = 1.998

How do these values compare

to the population mean and standard

deviation?

mx = 7.84 and sx = 3.262

Let’s find all the samples of size

2.These values determine the sampling distribution of x

for samples of size 2.

Page 11: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Fish Pond Revisited . . .

6.6 11.7

8.9 2.2 9.8

Triples 6.6, 11.7, 8.9

6.6, 11.7, 2.2

6.6, 11.7, 9.8

6.6, 8.9, 2.2

6.6, 8.9, 9.8

6.6, 2.2, 9.8

11.7, 8.9, 2.2

11.7, 8.9, 9.8

11.7, 2.2, 9.8

8.9, 2.2, 9.8

x 9.067 6.833 9.367 5.9 8.433 6.2 7.6 10.133

7.9 6.967

How many samples of size 3

are possible?

mx = 7.84sx = 1.332

How do these values compare to

the population mean and standard

deviation?

mx = 7.84 and sx = 3.262

Now let’s find all the samples of size 3.

What is the mean and standard

deviation of these sample

means?

These values determine the sampling distribution of x

for samples of size 3.

Page 12: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

What do you notice?• The mean of the sampling

distribution EQUALS the mean of the population.

• As the sample size increases, the standard deviation of the sampling distribution decreases.

mx = m

as n sx

Page 13: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

General Properties of Sampling Distributions of x

Rule 1:

Rule 2:

This rule is exact if the population is infinite, and is approximately correct if the population is finite and no more than 10% of the population is included in the sample

mm x

nxss

Note that in the previous fish pond

examples this standard deviation

formula was not correct because the sample sizes were

more than 10% of the population.

Page 14: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

The paper “Mean Platelet Volume in Patients with Metabolic Syndrome and Its Relationship with Coronary Artery Disease” (Thrombosis Research, 2007) includes data that suggests that the distribution of platelet volume of patients who do not have metabolic syndrome is approximately normal with mean m = 8.25 and standard deviation s = 0.75.

We can use Minitab to generate random samples from this population. We will generate 500 random samples of n = 5 and compute the

sample mean for each.

Page 15: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Platelets Continued . . .Similarly, we will generate 500 random samples of n = 10, n = 20, and n = 30. The density histograms below display the resulting 500 x for each of the given sample sizes.

What do you notice about the means of these

histograms?

What do you notice about the

standard deviation of these

histograms?

What do you notice about the shape of these

histograms?

Page 16: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

General Properties Continued . . .

Rule 3: When the population distribution is normal, the sampling distribution of x is also normal for any sample size n.

Page 17: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

The paper “Is the Overtime Period in an NHL Game Long Enough?” (American Statistician, 2008) gave data on the time (in minutes) from the start of the game to the first goal scored for the 281 regular season games from the 2005-2006 season that went into overtime. The density histogram for the data is shown below.

Let’s consider these 281 values as a population. The distribution is strongly positively skewed with mean m = 13 minutes and with a median of 10 minutes.Using Minitab, we will

generate 500 samples of the following sample

sizes from this distribution:

n = 5, n = 10, n = 20, n = 30.

Page 18: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

These are the density histograms for the

500 samples

Are these histograms centered at

approximately m = 13?

What do you notice about the standard deviations of these

histograms?

What do you notice about the shape of these histograms?

Page 19: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

General Properties Continued . . .

Rule 4: Central Limit TheoremWhen n is sufficiently large, the sampling distribution of x is well approximated by a normal curve, even when the population distribution is not itself normal.

CLT can safely be applied if n exceeds 30.

How large is “sufficiently large” anyway?

Page 20: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

A soft-drink bottler claims that, on average, cans contain 12 oz of soda. Let x denote the actual volume of soda in a randomly selected can. Suppose that x is normally distributed with s = .16 oz. Sixteen cans are randomly selected, and the soda volume is determined for each one. Let x = the resulting sample mean soda. If the bottler’s claim is correct, then the sampling distribution of x is normally distributed with:

04.1616.

12

s

s

mm

nx

x

Page 21: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

.9772 - .1587 = .8185

Soda Problem Continued . . .

04.1616.

12

s

s

mm

nx

x

What is the probability that the sample mean soda volume is between 11.96 ounces and 12.08 ounces?P(11.96 < x < 12.08)

=

To standardized these endpoints, use

n

xxzx

xs

m

sm

104.1296.11*

a 204.

1208.12*

b

Look these up in the table and subtract the

probabilities.

Page 22: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

A hot dog manufacturer asserts that one of its brands of hot dogs has a average fat content of 18 grams per hot dog with standard deviation of 1 gram. Consumers of this brand would probably not be disturbed if the mean was less than 18 grams, but would be unhappy if it exceeded 18 grams. An independent testing organization is asked to analyze a random sample of 36 hot dogs. Suppose the resulting sample mean is 18.4 grams. Does this result suggest that the manufacturer’s claim is incorrect?

Since the sample size is greater than 30, the Central Limit Theorem

applies.So the distribution of x is approximately normal with

1667.361and18 sm xx

Page 23: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Hot Dogs Continued . . .

Suppose the resulting sample mean is 18.4 grams. Does this result suggest that the manufacturer’s claim is incorrect?

P(x > 18.4) =

1667.361and18 sm xx

40.21667.184.18

z

1 - .9918 = .0082

Values of x at least as large as 18.4 would be observed only about .82% of the time.

The sample mean of 18.4 is large enough to cause us to doubt that the manufacturer’s

claim is correct.

Page 24: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)
Page 25: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Let’s explore what happens with in distributions of sample proportions (p). Have students perform the following experiment.

•Toss a penny 20 times and record the number of heads. •Calculate the proportion of heads and mark it on the dot plot on the board.

What shape do you think the dot plot will have?

The dotplot is a partial graph of the sampling distribution of all sample

proportions of sample size 20.

This is a statistic!

What would happen to the dotplot if we flipped the penny 50 times and recorded the proportion of heads?

Page 26: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Sampling Distribution of pThe distribution that would be formed by considering the value of a sample statistic for every possible different sample of a given size from a population.

We will use:p for the population proportion

and p for the sample proportion

In this case, we will use

np sample the in successes of numberˆ

Page 27: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Suppose we have a population of six students: Alice, Ben, Charles, Denise, Edward, & Frank

We are interested in the proportion of females. This is calledWhat is the proportion of females?

Let’s select samples of two from this population.How many different samples are possible?

the parameter of interest

6C2 =15

1/3

We will keep the population small so that we can find ALL

the possible samples of a given size.

Page 28: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Find the 15 different samples that are possible and find the sample proportion of the number of females in each sample.

Alice & Ben .5

Alice & Charles .5Alice & Denise 1Alice & Edward .5Alice & Frank .5

Ben & Charles 0Ben & Denise .5Ben & Edward 0

Ben & Frank 0Charles & Denise .5Charles & Edward 0Charles & Frank 0Denise & Edward .5Denise & Frank .5Edward & Frank 0

Find the mean and standard deviation of these sample proportions.

29814.0and31

ˆˆ sm pp

How does the mean of the sampling distribution

compare to the population parameter

(p)?

Page 29: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

General Properties for Sampling Distributions of pRule 1:

Rule 2:

This rule is exact if the population is infinite, and is approximately correct if the population is finite and no more than 10% of the population is included in the sample

pp m ˆ

npp

p

)1(ˆ

s

Note that in the previous student

example this standard deviation

formula was not correct because the

sample size was more than 10% of the

population.

Page 30: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

In the fall of 2008, there were 18,516 students enrolled at California Polytechnic State University, San Luis Obispo. Of these students, 8091 (43.7%) were female. We will use a statistical software package to simulate sampling from this Cal Poly population. We will generate 500 samples of each of the following sample sizes: n = 10, n = 25, n = 50, n = 100 and compute the proportion of females for each sample.

The following histograms display the distributions of the sample proportions for the 500 samples of each sample size.

Page 31: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Are these histograms centered around the

true proportion p = .437?

What do you notice about the standard deviation of these

distributions?

What do you notice about the shape of these distributions?

Page 32: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

The development of viral hepatitis after a blood transfusion can cause serious complications for a patient. The article

“Lack of Awareness Results in Poor Autologous Blood Transfusions” (Health Care Management, May 15, 2003) reported that hepatitis occurs in 7% of patients who receive blood transfusions during heart surgery. We will simulate sampling from a population of blood recipients. We will generate 500 samples of each of the following sample sizes: n = 10, n = 25, n = 50, n = 100 and compute the proportion of people who contract hepatitis for each sample.The following histograms display the distributions of the sample proportions for the 500 samples of each sample size.

Page 33: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Are these histogra

ms centered around the true proportio

n p = .07?

What happens

to the shape of

these histograms as the sample

size increases?

Page 34: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

General Properties Continued . . .

Rule 3: When n is large and p is not too near 0 or 1, the sampling distribution of p is approximately normal.

The farther the value of p is from 0.5, the larger n must be for the sampling distribution of p to be approximately normal.A conservative rule of thumb:If np > 10 and n (1 – p) > 10, then a normal distribution provides a reasonable approximation to the sampling distribution of p.

Page 35: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Blood Transfusions Revisited . . .Let p = proportion of patients who

contract hepatitis after a blood transfusion

p = .07

Suppose a new blood screening procedure is believed to reduce the incident rate of hepatitis. Blood screened using this procedure is given to n = 200 blood recipients. Only 6 of the 200 patients contract hepatitis. Does this result indicate that the true proportion of patients who contract hepatitis when the new screening is used is less than 7%?

To answer this question, we must consider the sampling

distribution of p.

Page 36: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Blood Transfusions Revisited . . .Let p = .07 p = 6/200 = .03

Is the sampling distribution approximately normal?np = 200(.07) = 14 > 10n(1-p) = 200(.93) = 186 > 10

What is the mean and standard deviation of the sampling distribution?

Yes, we can use a normal approximation.

018.200)93(.07.

07.

ˆ

ˆ

s

m

p

p

Page 37: Form groups of three. Each group needs:  3  Sampling Distributions Worksheets (one per person)

Blood Transfusions Revisited . . .Let p = .07 p = 6/200 = .03

Does this result indicate that the true proportion of patients who contract hepatitis when the new screening is used is less than 7%?

018.200)93(.07.

07.

ˆ

ˆ

s

m

p

p

P(p < .03) =Assume the screening

procedure is not effective and p = .07.

22.2

200)93(.07.

07.03.

z

.0132This small probability

tells us that it is unlikely that a

sample proportion of .03 or smaller would be observed if the

screening procedure was ineffective.

This new screening procedure

appears to yield a smaller incidence rate for hepatitis.