sampling
Sampling
The sampling errors are:
$|\bar{x} - \mu|$ for sample mean
$|s - \sigma|$ for sample standard deviation
$|\bar{p} - p|$ for sample proportion
$P(\bar{x} = \mu) = 0$
$P(s = \sigma) = 0$
$P(\bar{p} = p) = 0$
Example: St. Andrew’s
St. Andrew’s College receives 900 applications annually from prospective students. The application form contains a variety of information including the individual’s scholastic aptitude test (SAT) score and whether or not the individual desires on-campus housing.
The director of admissions would like to know the following information:
– Applicants’ average SAT score over the past 10 years
– The proportion of applicants who want to live on campus
We will now look at two alternatives for obtaining the desired information.
If the relevant data for the entire 9000 applicants were in the college’s database, the population parameters of interest could be calculated using the formulas presented in Chapter 3.
Conducting a census of all applicants over the last ten years (N = 9000) allows us to compute population parameters.
Selecting a sample of 30 from the 9000 current applicants allows us to compute the sample statistics.
Applicant   SAT         Wants on-campus   Sqrd. dev. from
Number      score       housing           SAT mean
1           1004        Yes               112
2           942         Yes               2643
3           890         Yes               10694
4           1032        no                1489
5           857         no                18608
6           1015        Yes               466
7           1063        Yes               4843
...
8999        1090        Yes               9329
9000        1094        no                10118
Total       8,940,700   6,480             57,642,979
Applicant   SAT         Wants on-campus   Sqrd. dev. from
Number      score       housing           SAT mean
1           1004        Yes               121
2           942         Yes               2601
3           890         Yes               10609
4           1032        no                1521
5           857         no                18496
6           1015        Yes               484
7           1063        Yes               4900
...
8999        1090        Yes               9409
9000        1094        no                10201
Total       8,940,700   6,480             57,642,979
In this version of the table the squared deviations are measured from the rounded mean, 993.
Conducting a Census
Population Mean SAT Score
$\mu = \frac{\sum x_i}{N} = \frac{8{,}940{,}700}{9000} = 993$
Population Standard Deviation for SAT Score
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} = \sqrt{\frac{57{,}642{,}979}{9000}} = 80$
Population Proportion Wanting On-Campus Housing
$p = \frac{6480}{9000} = .72$
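The census arithmetic can be checked in a few lines. This is only a sketch that starts from the column totals shown in the table; the variable names are illustrative and not part of the original slides.

```python
from math import sqrt

# Totals taken from the census table (N = 9000 applicants).
N = 9000
SUM_SAT = 8_940_700          # sum of all SAT scores
SUM_SQ_DEV = 57_642_979      # sum of squared deviations from the mean
N_WANT_HOUSING = 6_480       # applicants wanting on-campus housing

mu = SUM_SAT / N                  # population mean
sigma = sqrt(SUM_SQ_DEV / N)      # population standard deviation (divide by N)
p = N_WANT_HOUSING / N            # population proportion

print(round(mu), round(sigma), round(p, 2))
```

Note that the population standard deviation divides by N, not N – 1, because every applicant is included.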
data_sat_pop.xls
Suppose the data is stored in boxes off campus.
The Director of Admissions needs estimates of the population parameters for a meeting taking place in an hour.
She decides a sample of 30 applicants will be used.
The number of random samples (without replacement) of size 30 that can be drawn from a population of size 9000 is huge. Even for just this year's 900 applicants, it is
$C^{900}_{30} = \frac{900!}{30!\,(900-30)!} = \frac{900!}{30!\,870!} = 9.80 \times 10^{55}$
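As a quick check on the size of that count, the combination can be evaluated exactly with Python's standard library. This sketch assumes the 900 applicants and sample size of 30 quoted above.

```python
import math

# Figures from the example: 900 applicants this year, samples of size 30.
N_THIS_YEAR = 900
SAMPLE_SIZE = 30

# Number of distinct samples drawn without replacement (order ignored).
n_samples = math.comb(N_THIS_YEAR, SAMPLE_SIZE)

# Print in scientific notation to compare with the slide's figure.
print(f"{n_samples:.2e}")
```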
Simple Random Sampling
Taking a Sample of 30 Applicants
Step 1: Assign a random number to each of the 9000 current applicants.
Step 2: Select the 30 applicants corresponding to the 30 smallest random numbers.
Excel’s RAND function generates random numbers between 0 and 1
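The two steps above can be sketched outside Excel as well. The version below is a minimal illustration using Python's `random` module in place of RAND; the function name, the seed, and the use of applicant numbers 1 to 9000 are assumptions made for the example.

```python
import random

def simple_random_sample(population_ids, sample_size, rng):
    """Pick sample_size ids by the assign-and-sort method."""
    # Step 1: assign a random number in [0, 1) to every applicant.
    keyed = [(rng.random(), applicant) for applicant in population_ids]
    # Step 2: sort by the random numbers and keep the smallest ones.
    keyed.sort()
    return [applicant for _, applicant in keyed[:sample_size]]

rng = random.Random(42)            # arbitrary seed so the run is repeatable
applicant_ids = range(1, 9001)     # applicant numbers 1..9000
sample_ids = simple_random_sample(applicant_ids, 30, rng)
print(sample_ids)
```

Because every applicant receives an independent random number, each set of 30 applicants is equally likely to end up with the 30 smallest values.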
Applicant Number   Random number
1                  .987
2                  .567
3                  .867
4                  .124
5                  .345
6                  .103
7                  .698
...
8999               .432
9000               .211
Sort rows by the random numbers
The 30 applicant numbers with the smallest random numbers:
Applicant Number   Random number   SAT score   Wants on-campus housing
675                .001            985         Yes
34                 .001            1002        Yes
768                .002            913         Yes
1823               .003            987         No
8897               .008            1123        No
7837               .009            989         Yes
231                .009            912         Yes
...
701                .012            987         Yes
5065               .015            998         no
Total                              30,299      20
Applicant Number   SAT score   Wants on-campus housing   Sqrd. dev. from SAT mean
675                985         Yes                       623.5
34                 1002        Yes                       63.52
768                913         Yes                       9403.18
1823               987         no                        527.62
8897               1123        no                        12,775.78
7837               989         Yes                       439.74
231                912         Yes                       9598.12
...
701                987         Yes                       527.62
5065               998         no                        143.28
Total              30,299      20                        211,746.97
$\bar{x} = 1009.97$
Sample Mean SAT Score
$\bar{x} = \frac{\sum x_i}{n} = \frac{30{,}299}{30} = 1009.97$
Sample Standard Deviation for SAT Score
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{211{,}746.97}{29}} = 85.45$
Sample Proportion Wanting On-Campus Housing
$\bar{p} = \frac{20}{30} = .667$
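The same point estimates can be reproduced from the sample totals. The sketch below assumes only the totals in the sample table; the variable names are illustrative.

```python
from math import sqrt

# Totals taken from the sample table (n = 30 applicants).
n = 30
SUM_SAT = 30_299              # sum of the 30 sampled SAT scores
SUM_SQ_DEV = 211_746.97       # sum of squared deviations from the sample mean
N_WANT_HOUSING = 20           # sampled applicants wanting on-campus housing

x_bar = SUM_SAT / n                   # sample mean
s = sqrt(SUM_SQ_DEV / (n - 1))        # sample standard deviation (divide by n - 1)
p_bar = N_WANT_HOUSING / n            # sample proportion

print(round(x_bar, 2), round(s, 2), round(p_bar, 3))
```

Unlike the census calculation, the sample standard deviation divides by n – 1 rather than n.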
Simple Random Sampling
data_sampling.xls
Sampling Distribution of $\bar{x}$
The sampling distribution of $\bar{x}$ is the probability distribution of all possible values of the sample mean.
Expected Value of $\bar{x}$
$E(\bar{x}) = \mu$, where $\mu$ = the population mean
Standard Deviation of $\bar{x}$ from an infinite population is
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Under repeated sampling using random samples of size n, the sample means are normally distributed (approximately so in the second and third cases) with mean $\mu$ and variance $\sigma^2/n$ when any one of the following holds:
The data is normally distributed and $\sigma$ is known.
OR
The data is symmetric, n > 30, and $\sigma$ is known.
OR
The data is heavily skewed, n > 50, and $\sigma$ is known.
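A small simulation can make the repeated-sampling idea concrete. The sketch below is not from the slides: it assumes a heavily skewed exponential population (mean 1, standard deviation 1), a sample size of 60, and 5000 repetitions, all chosen purely for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)     # arbitrary seed for repeatability
POP_MEAN = 1.0             # an exponential(rate=1) population has mean 1
POP_SD = 1.0               # ... and standard deviation 1
n = 60                     # sample size (above the n > 50 guideline)
REPS = 5000                # number of repeated samples

# Draw REPS samples of size n and record each sample mean.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(REPS)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = POP_SD / math.sqrt(n)   # sigma / sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(theoretical_se, 3))
```

The average of the sample means should land close to the population mean, and their spread close to $\sigma/\sqrt{n}$, even though the individual observations are far from normal.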
[Figure: sampling distribution of $\bar{x}$, centered at $E(\bar{x}) = 993$ with $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{30}} = 14.6$]
What is the probability that a simple random sample of 30 applicants will provide an estimate of the population mean SAT score that is within 10 points of the actual population mean $\mu$? In other words, what is the probability that $\bar{x}$ will be between 983 and 1003?
Step 1: Calculate the z-value at the upper endpoint of the interval.
z = (1003 - 993)/14.6 = .68
z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
...
.5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
.6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
.7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
.8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
.9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
...
Step 2: Find the area under the curve to the left of the upper endpoint.
P(z < .68) = .7517, so P($\bar{x}$ < 1003) = .7517
[Figure: sampling distribution of $\bar{x}$ with mean 993 and $\sigma_{\bar{x}} = 14.6$; the area to the left of 1003 is .7517 and the area to the right is .2483]
Step 3: Calculate the z-value at the lower endpoint of the interval.
z = (983 - 993)/14.6 = -.68
Step 4: Find the area under the curve to the left of the lower endpoint.
P(z < -.68) = .2483, so P($\bar{x}$ < 983) = .2483
Step 5: Calculate the area under the curve between the lower and upper endpoints of the interval.
P(983 < $\bar{x}$ < 1003) = P(z < .68) - P(z < -.68) = .7517 - .2483 = .5034
[Figure: with n = 30 and $\sigma_{\bar{x}} = 14.6$, the area between 983 and 1003 is .5034, leaving .2483 in each tail]
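The five steps can be cross-checked without a z-table. The sketch below uses `statistics.NormalDist` from Python's standard library and assumes the rounded population values from the census ($\mu$ = 993, $\sigma$ = 80). Because it does not round z to two decimals, the result comes out slightly above the table-based .5034.

```python
from math import sqrt
from statistics import NormalDist

MU = 993.0       # population mean SAT score
SIGMA = 80.0     # population standard deviation
N = 30           # sample size

# Standard error of the sample mean: sigma / sqrt(n).
standard_error = SIGMA / sqrt(N)

# Sampling distribution of x-bar, treated as normal.
sampling_dist = NormalDist(mu=MU, sigma=standard_error)

# Area between the lower (983) and upper (1003) endpoints.
prob_within_10 = sampling_dist.cdf(1003) - sampling_dist.cdf(983)

print(round(standard_error, 1), round(prob_within_10, 4))
```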
If the sample had included 100 applicants instead of 30, $E(\bar{x})$ remains equal to 993, but the standard error falls:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{80}{\sqrt{100}} = 8.0$
With n = 30 and $\sigma_{\bar{x}} = 14.6$, P(983 < $\bar{x}$ < 1003) = .5034.
With n = 100 and $\sigma_{\bar{x}} = 8$, P(983 < $\bar{x}$ < 1003) = .7888.
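To see how the probability of landing within 10 points of $\mu$ grows with the sample size, the calculation can be wrapped in a small helper. The function name and the extra sample size of 400 are illustrative assumptions, not values from the slides.

```python
from math import sqrt
from statistics import NormalDist

def prob_within(margin, sigma, n):
    """P(|x_bar - mu| < margin) under a normal sampling distribution."""
    standard_error = sigma / sqrt(n)
    z = margin / standard_error
    std_normal = NormalDist()            # mean 0, standard deviation 1
    return std_normal.cdf(z) - std_normal.cdf(-z)

# sigma = 80 and a margin of 10 points, as in the example.
for n in (30, 100, 400):
    print(n, round(prob_within(10, 80, n), 4))
```

A larger n shrinks the standard error, so more of the sampling distribution falls inside the same interval.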
Sampling Distribution of $\bar{p}$
Expected value of $\bar{p}$:
$E(\bar{p}) = p$
Standard deviation of $\bar{p}$ from an infinite population is
$\sigma_{\bar{p}} = \frac{\sigma_D}{\sqrt{n}}$, where $\sigma_D$ = standard deviation of D, the 0/1 dummy variable coding the outcome
The sampling distribution of $\bar{p}$ is approximately normal when
np > 5 and n(1 – p) > 5
The sample proportion can be computed in the same way as the sample mean when a dummy variable is coded from a nominal-scaled binomial variable.
Vote for Obama   D
Yes              1
No               0
No               0
No               0
Yes              1
Yes              1
Yes              1
Yes              1
No               0
Yes              1
$\bar{p} = \frac{\sum D_i}{n} = \frac{6}{10} = .6$
$\sigma_D^2 = \frac{(1-.6)^2 + (0-.6)^2 + (0-.6)^2 + (0-.6)^2 + (1-.6)^2 + (1-.6)^2 + (1-.6)^2 + (1-.6)^2 + (0-.6)^2 + (1-.6)^2}{10}$
Since there are six 1s and four 0s,
$\sigma_D^2 = \frac{6(1-.6)^2 + 4(0-.6)^2}{10} = (.6)(.4)^2 + (.4)(.6)^2 = (.6)(.4)[(.4) + (.6)] = (.6)(.4) = .24$
$\sigma_D^2 = p(1 - p)$
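The dummy-variable shortcut is easy to confirm numerically. This sketch recodes the ten responses from the table and compares the divide-by-n variance with p(1 – p); the variable names are illustrative.

```python
from statistics import fmean, pvariance

# Responses from the table, coded Yes -> 1 and No -> 0.
votes = ["Yes", "No", "No", "No", "Yes", "Yes", "Yes", "Yes", "No", "Yes"]
D = [1 if v == "Yes" else 0 for v in votes]

p_bar = fmean(D)                   # proportion of 1s = mean of the dummy
var_divide_by_n = pvariance(D)     # variance using n in the denominator
shortcut = p_bar * (1 - p_bar)     # p(1 - p)

print(p_bar, round(var_divide_by_n, 2), round(shortcut, 2))
```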
The sampling distribution of $\bar{p}$ is the probability distribution of all possible values of the sample proportion.
We should have divided by n – 1 because the data came from a sample. In most cases involving sample proportions, n is very large. Hence, dividing by n or n – 1 yields roughly the same value.
$\sigma_D = \sqrt{p(1 - p)}$, so $\sigma_{\bar{p}} = \frac{\sigma_D}{\sqrt{n}} = \sqrt{\frac{p(1 - p)}{n}}$
Example: St. Andrew’s College
Recall that 72% of the prospective students applying to St. Andrew’s College desire on-campus housing. What is the probability that a simple random sample of 30 applicants will provide an estimate of the population proportion of applicants desiring on-campus housing that is within .05 of the actual population proportion?
$P(0.67 < \bar{p} < 0.77) = ?$
$\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{.72(1 - .72)}{30}} = .082$
Step 1: Convert the upper endpoint of the interval to z.
z1 = (.77 - .72)/.082 = .61
For this example, with n = 30 and p = .72, the normal distribution is an acceptable approximation because:
np = 30(.72) = 21.6 > 5 and n(1 - p) = 30(.28) = 8.4 > 5
[Figure: sampling distribution of $\bar{p}$ centered at .72; the area between .67 and .77 is the probability to be found]
z    .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
...
.5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
.6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
.7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
.8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
.9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
...
Step 2: Find the area under the curve to the left of the upper endpoint.
P(z < .61) = .7291, so P($\bar{p}$ < .77) = .7291
[Figure: sampling distribution of $\bar{p}$ with mean .72 and $\sigma_{\bar{p}} = .082$; the area to the left of .77 is .7291 and the area to the right is .2709]
Step 3: Calculate the z-value of the lower endpoint of the interval.
z0 = (.67 - .72)/.082 = -.61
Step 4: Find the area under the curve to the left of the lower endpoint.
P(z < -.61) = .2709, so P($\bar{p}$ < .67) = .2709
Step 5: Calculate the area under the curve between the lower and upper endpoints of the interval.
P(.67 < $\bar{p}$ < .77) = .7291 - .2709 = .4582
[Figure: sampling distribution of $\bar{p}$ with $\sigma_{\bar{p}} = .082$; the area between .67 and .77 is .4582, leaving .2709 in each tail]
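The proportion example follows the same pattern as the mean example and can be checked the same way. The sketch below assumes p = .72, n = 30, and a margin of .05 as stated in the example, and uses `statistics.NormalDist` in place of the z-table, so the last digit may differ slightly from the table-based .4582.

```python
from math import sqrt
from statistics import NormalDist

P = 0.72         # population proportion wanting on-campus housing
N = 30           # sample size
MARGIN = 0.05    # "within .05 of the actual population proportion"

# Rule-of-thumb check that the normal approximation is acceptable.
normal_ok = N * P > 5 and N * (1 - P) > 5

# Standard error of the sample proportion: sqrt(p(1 - p) / n).
se_p = sqrt(P * (1 - P) / N)

# Area between .67 and .77 under the approximating normal curve.
dist = NormalDist(mu=P, sigma=se_p)
prob = dist.cdf(P + MARGIN) - dist.cdf(P - MARGIN)

print(normal_ok, round(se_p, 3), round(prob, 4))
```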
Population Parameter                                           Parameter Value   Point Estimator                                             Point Estimate
$\mu$ = Population mean SAT score                              993               $\bar{x}$ = Sample mean SAT score                           1009.97
$\sigma$ = Population std. deviation for SAT score             80                s = Sample std. deviation for SAT score                     85.45
p = Population proportion wanting campus housing               .72               $\bar{p}$ = Sample proportion wanting campus housing        .667
Simple Random Sampling
data_sampling_dist.xls