chapter 7 sampling distributions. introduction in real life calculating parameters of populations is...

43
Chapter 7 Sampling Distributions

Upload: abel-gaines

Post on 02-Jan-2016

229 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Chapter 7

Sampling Distributions

Page 2: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

IntroductionIntroduction

• In real life calculating parameters of populations is prohibitive (difficult to find) because populations are very large.

• Rather than investigating the whole population, we take a sample, calculate a statisticstatistic related to the parameterparameter of interest, and make an inference.

• The sampling distributionsampling distribution of the statistic is the tool that tells us how close is the statistic to the parameter.

Page 3: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

population

2,

sample

inference

2, sx

Sampling Techniques

Statistical Procedures

Parameters

Statistics

Page 4: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

SamplingSamplingExample:Example:

• A pollster is sure that the responses to his “agree/disagree” question will follow a binomial distribution, but p, the proportion of those who “agree” in the population, is unknown.

• An agronomist believes that the yield per acre of a variety of wheat is approximately normally distributed, but the mean and the standard deviation of the yields are unknown.

If you want the sample to provide reliable information about the population, you must select your sample in a certain way!

Page 5: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Types of Sampling Methods/Techniques

Quota

Sampling

Non-Probability Samples

Judgement Chunk

Probability Samples

Simple Random

Systematic

Stratified

Cluster

The sampling plansampling plan or experimental designexperimental design determines the amount of information you can extract, and often allows you to measure the reliability of your inferencereliability of your inference.

Page 6: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Sampling Plan:1.1. Simple random samplingSimple random sampling is a method of sampling

that allows each possible sample of size n an equal probability of being selected.

Simple Random SamplingSimple Random Sampling

Page 7: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

ExampleExample•There are 89 students in a statistics class. The instructor wants to choose 5 students to form a project group. How should he proceed?

1. Give each student a number from 01 to 89.

2. Choose 5 pairs of random digits from the random number table.

3. If a number between 90 and 00 is chosen, choose another number.

4. The five students with those numbers form the group.

1. Give each student a number from 01 to 89.

2. Choose 5 pairs of random digits from the random number table.

3. If a number between 90 and 00 is chosen, choose another number.

4. The five students with those numbers form the group.

Page 8: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Other Sampling TechniquesOther Sampling Techniques• There are several other sampling plans that

still involve randomizationrandomization:

2. Stratified random sample: Divide the population into subpopulations or strata and select a simple random sample from each strata.

3. Cluster sample: Divide the population into subgroups called clusters; select a simple random sample of clusters and take a census of every element in the cluster.

4. 1-in-k systematic sample: Randomly select one of the first k elements in an ordered population, and then select every k-th element thereafter.

2. Stratified random sample: Divide the population into subpopulations or strata and select a simple random sample from each strata.

3. Cluster sample: Divide the population into subgroups called clusters; select a simple random sample of clusters and take a census of every element in the cluster.

4. 1-in-k systematic sample: Randomly select one of the first k elements in an ordered population, and then select every k-th element thereafter.

Page 9: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

ExamplesExamples

• Divide West Malaysia into states and take a simple random sample within each state.

• Divide West Malaysia into states and take a simple random sample of 5 states.

• Divide a city into city blocks, choose a simple random sample of 10 city blocks, and interview all wholive there.

• Choose an entry at random from the phone book, and select every 50th number thereafter.

Page 10: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Non-Random Sampling PlansNon-Random Sampling Plans• There are several other sampling plans that do not

involve randomizationrandomization. They should NOTNOT be used for statistical inference!

1. Convenience sample: A sample that can be taken easily without random selection.

• People walking by on the street

2. Judgment sample: The sampler decides who will and won’t be included in the sample.

3. Quota sample: The makeup of the sample must reflect the makeup of the population on some selected characteristic.

• Race, ethnic origin, gender, etc.

1. Convenience sample: A sample that can be taken easily without random selection.

• People walking by on the street

2. Judgment sample: The sampler decides who will and won’t be included in the sample.

3. Quota sample: The makeup of the sample must reflect the makeup of the population on some selected characteristic.

• Race, ethnic origin, gender, etc.

Page 11: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Types of SamplesTypes of Samples• Sampling can occur in two types of practical

situations:

1. Observational studies: The data existed before you decided to study it. Watch out for

Nonresponse: Are the responses biased because only opinionated people responded?

Undercoverage: Are certain segments of the population systematically excluded?

Wording bias: The question may be too complicated or poorly worded.

1. Observational studies: The data existed before you decided to study it. Watch out for

Nonresponse: Are the responses biased because only opinionated people responded?

Undercoverage: Are certain segments of the population systematically excluded?

Wording bias: The question may be too complicated or poorly worded.

Page 12: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

2. Experimentation: The data are generated by imposing an experimental condition or treatment on the experimental units.

Hypothetical populations can make random sampling difficult if not impossible.

Samples must sometimes be chosen so that the experimenter believes they are representative of the whole population.

Samples must behave like random samples!

2. Experimentation: The data are generated by imposing an experimental condition or treatment on the experimental units.

Hypothetical populations can make random sampling difficult if not impossible.

Samples must sometimes be chosen so that the experimenter believes they are representative of the whole population.

Samples must behave like random samples!

Types of SamplesTypes of Samples• Sampling can occur in two types of practical

situations:

Page 13: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Sampling Distributions

•Numerical descriptive measures calculated from the sample are called statisticsstatistics.

•Statistics vary from sample to sample and hence are random variables.

•The probability distributions for statistics are called sampling distributionssampling distributions.

•In repeated sampling, they tell us what values of the statistics can occur and how often each value occurs.

Page 14: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Possible samples

3, 5, 23, 5, 13, 2, 15, 2, 1

Possible samples

3, 5, 23, 5, 13, 2, 15, 2, 1

x

Sampling DistributionsSampling Distributions

Definition: The sampling distribution of a statisticsampling distribution of a statistic is the probability distribution for the possible values of the statistic that results when random samples of size n are repeatedly drawn from the population.

Population: 3, 5, 2, 1

Draw samples of size n = 3 without replacement

Population: 3, 5, 2, 1

Draw samples of size n = 3 without replacement

67.23/8

23/6

33/9

33.33/10

Each value of x-bar is equally

likely, with probability 1/4

x

p(x)

1/4

2 3

Page 15: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

1. Sampling Distribution of the Mean…•A fair die is thrown infinitely many times, with the random variable X = # of spots on any throw. The probability distribution of X is:

…and the mean and variance are:

x 1 2 3 4 5 6

P(x) 1/6 1/6 1/6 1/6 1/6 1/6

71.1)()(

5.3)6

1(6...)

6

1(2)

6

1(1

)(

2

xpx

xxp

71.1)()(

5.3)6

1(6...)

6

1(2)

6

1(1

)(

2

xpx

xxp

Page 16: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Sampling Distribution of Two Dice

•A sampling distribution is created by looking atall samples of size n=2 (i.e. two dice) and their means…

•While there are 36 possible samples of size 2, there are only 11 values for , and some (e.g. =3.5) occur more frequently than others (e.g. =1).

Page 17: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Sampling Distribution of Two Dice…•The sampling distribution of is shown below:

1.0 1/361.5 2/362.0 3/362.5 4/363.0 5/363.5 6/364.0 5/364.5 4/365.0 3/365.5 2/366.0 1/36

P( )

1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0

1/36

xp2/36

3/36

4/36

5/36

6/36

Page 18: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

ExampleExample

Thrown two fair dice. Based on all possible samples, the calculation of mean and standard deviation can also be done as..

21.12/71.12/

:Dev Std

5.3:Mean

21.12/71.12/

:Dev Std

5.3:Mean

Page 19: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

19

6)

5(5833.

5.35n

2x2

x

x

)10

(2917.

5.310n

2x2

x

x

)25

(1167.

5.325n

2x2

x

x

Sampling Distribution of the Mean

Page 20: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Sampling Distribution of the Mean

)5

(5833.

5.35n

2x2

x

x

)10

(2917.

5.310n

2x2

x

x

)25

(1167.

5.325n

2x2

x

x

Notice that is smaller than . The larger the sample size the

smaller . Therefore, tends to fall closer to , as the sample

size increases.

x2x

2x2

x

Page 21: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Central Limit Theorem: If random samples of n observations are drawn from a nonnormal population with finite and standard deviation , then, when n is large, the sampling distribution of the sample mean is approximately normally distributed, with mean and standard deviation . The approximation becomes more accurate as n becomes large.

Central Limit Theorem: If random samples of n observations are drawn from a nonnormal population with finite and standard deviation , then, when n is large, the sampling distribution of the sample mean is approximately normally distributed, with mean and standard deviation . The approximation becomes more accurate as n becomes large.

x

/ n

Central Limit TheoremCentral Limit Theorem

Page 22: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

How Large is Large?How Large is Large?If the population is normalnormal, then the sampling distribution of will also be normal, no matter what the sample size.

When the population is approximately symmetricsymmetric, the distribution becomes approximately normal for relatively small values of n.

When the population is skewedskewed, the sample size must be at least 30at least 30 before the sampling distribution of becomes approximately normal.

x

x

Page 23: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

A random sample of size n is selected from a population with mean and standard deviation

he sampling distribution of the sample mean will have mean and standard deviation .

If the original population is normalnormal, , the sampling distribution will be normal for any sample size.

If the original population is nonnormal, nonnormal, the sampling distribution will be normal when n is large.

The Sampling Distribution of the The Sampling Distribution of the Sample MeanSample Mean

The standard deviation of x-bar is sometimes called the STANDARD ERROR (SE).

xn/

Page 24: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Sampling Distribution of the MeanSampling Distribution of the Mean

Central Limit TheoremGiven population with and the sampling

distribution will have:

2

Mean

Variance

Standard Deviation = Standard Error (mean)

As n increases, the shape of the distribution becomes normal (whatever the shape of the population)

x

nx

22

nx

Page 25: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

ExampleExample

Page 26: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Finding Probabilities for Finding Probabilities for the Sample Meanthe Sample Mean

1587.8413.1)1(

)16/8

1012()12(

zP

zPxP

1587.8413.1)1(

)16/8

1012()12(

zP

zPxP

If the sampling distribution of is normal or approximately normal standardize or rescale the interval of interest in terms of

Find the appropriate area using Z Table.

If the sampling distribution of is normal or approximately normal standardize or rescale the interval of interest in terms of

Find the appropriate area using Z Table.

Example: Example: A random sample of size n = 16 from a normal distribution with = 10 and = 8.

x

/

xz

n

Page 27: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

ExampleExample

A soda filling machine is supposed to fill cans of soda with 12 fluid ounces. Suppose that the fills are actually normally distributed with a mean of 12.1 oz and a standard deviation of 0.2 oz. What is the probability that the average fill for a 6-pack of soda is less than 12 oz?

)12(xP

)6/2.

1.1212

/(

n

xP

1112.)22.1( zP

Page 28: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

ExerciseExerciseThe time that the laptop’s battery pack can function before recharging is needed is normally distributed with a mean of 6 hours and standard deviation of 1.8 hours. A random sample of 25 laptops with a type of battery pack is selected and tested. What is the probability that the mean until recharging is needed is at least 7 hours?

Page 29: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

The characteristics of the sampling distribution of a statistic:

•The distribution of values is obtained by means of repeated sampling

•The samples are all of size n

•The samples are drawn from the same population

Page 30: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

2. Sampling Distribution of a Proportion…

np

successes"" ofnumber ˆ

tois as , tois ˆ xpp

The proportion in the sample is denoted "p-hat"

The proportion in the population (parameter) is denoted p

Page 31: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Types of response variables

Quantitative Sums Averages

Categorical Counts Proportions

Response type

Prior chapters have focused on quantitative response variables. We now focus on categorical response variables.

Page 32: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

The Sampling Distribution of The Sampling Distribution of the Sample Proportionthe Sample Proportion

The Central Limit TheoremCentral Limit Theorem can be used to conclude that the binomial random variable X with mean np and standard deviation is approximately normal when n is large.

The sample proportion, is simply a rescaling of the binomial random variable X, dividing it by n.

From the Central Limit Theorem, the sampling distribution of will also be approximately approximately normal, normal, with a rescaled mean and standard deviation.

n

xp ˆ

pnp 1

Page 33: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Approximating Normal from the Binomial

Under certain conditions, a binomial random variable has a distribution that is approximately normal.

When n is large, and p is not too close to zero or one, areas under the normal curve with mean np and standard deviation can be used to approximate binomial probabilities.

Make sure that np and n(1-p) are both greater than 5 to avoid inaccurate approximations!

pnp 1

Page 34: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

The Sampling Distribution of The Sampling Distribution of the Sample Proportionthe Sample Proportion

The standard deviation of p-hat is sometimes called the STANDARD ERROR (SE) of p-hat.

A random sample of size n is selected from a binomial population with parameter p.

The sampling distribution of the sample proportion,

will have mean p and standard deviation

If n is large, and p is not too close to zero or one, the sampling distribution of will be approximately approximately normal.normal.

n

xp ˆ

n

pq

Page 35: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Finding Probabilities for Finding Probabilities for the Sample Proportionthe Sample Proportion

0207.9793.1)04.2(

)

100)6(.4.

4.5.()5.ˆ(

zP

zPpP

0207.9793.1)04.2(

)

100)6(.4.

4.5.()5.ˆ(

zP

zPpPExample: Example: A random sample of size n = 100 from a binomial population with p = 0.4.

If the sampling distribution of is normal or approximately normal, standardize or rescale the interval of interest in terms of

Find the appropriate area using Z Table.

If the sampling distribution of is normal or approximately normal, standardize or rescale the interval of interest in terms of

Find the appropriate area using Z Table.

p̂ pz

pq

n

If both np > 5 and

np(1-p) > 5

Page 36: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

ExampleExampleThe soda bottler in the previous example claims that only 5% of the soda cans are underfilled. A quality control technician randomly samples 200 cans of soda. What is the probability that more than 10% of the cans are underfilled?

)10.ˆ( pP

)24.3()

200)95(.05.

05.10.(

zPzP

0006.9994.1 This would be very

unusual, if indeed p = .05!

n = 200

U: underfilled can

p = P(U) = 0.05

q = 0.95

np = 10 nq = 190

n = 200

U: underfilled can

p = P(U) = 0.05

q = 0.95

np = 10 nq = 190OK to use the normal

approximation

Page 37: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

ExampleExample

Page 38: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

38

Sampling Distribution of the Difference Between Two Averages

• Theorem : If independent sample of size n1 and n2 are drawn at random from two populations,with means 1 and 2 and variances 1

2 and 22, respectively, then the sampling distribution of the

differences of means, is approximately normally distributed with mean and variance given by

variable.normal standard aely approximat is

)/()/(

)()( Hence

. and

2221

21

2121

2

22

1

212

21 2121

nn

XXZ

nnXXXX

,21 XX

Page 39: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

Example•Starting salaries for MBA grads at two universities are normally distributed with the following means and standard deviations. Samples from each school are taken…

University 1 University 2

Mean RM 62,000 /yr RM 60,000 /yr

Std. Dev. RM14,500 /yr RM18,300 /yr

sample size, n 50 60

• What is the sampling distribution of• What is the probability that a sample mean of U1 students will

exceed the sample mean of U2 students?

Page 40: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

mean: = 62,000 – 60,000 =2000and standard deviation: =

= 3128.3

60

300,18

50

500,14 22

212121xxxx

Example

7389.2389.5.)64.(

)3128

20000) - (()0(

2

22

1

21

212121

zP

nn

xxPxxP

Page 41: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

41

Sampling Distribution of the Difference Between Two Sample Proportions

• Theorem : If independent sample of size n1 and n2 are drawn at random from two populations, where the proportions of obs with the characteristic of interest in the two populations are p1 and p2

respectively, then the sampling distribution of the differences between sample proportions, is approximately normally distributed with mean and variance given by

1 2ˆ ˆ 1 2p p p p

1 2

1 2 1 2

ˆ ˆ

ˆ ˆ ( )

p p

p p p pZ

is approximately a standard normal variable

and

Then

1 2ˆ ˆp p

Page 42: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

42

Sampling Distribution of the Difference Between Two Sample Proportions

• Example : It is known that 16% of the households in Community A and 11% of the households in Community B have internets in their houses. If 200 households and 225 households are selected at random from Community A and Community B respectively, compute the probability of observing the difference between the two sample proportions at least 0.10?

Page 43: Chapter 7 Sampling Distributions. Introduction In real life calculating parameters of populations is prohibitive (difficult to find) because populations

43

Sampling Distribution of S2

• Theorem : If S2 is the variance of a random sample of size n taken from a normal population having the variance 2, then the statistic

• has a chi-squared distribution with v = n -1 degrees of freedom.

n

i

i XXSn

12

2

2

22 )()1(

167.2

067.14

7For

295.0

205.0

v