7. Sampling & Sample Size Determination (LDR 280)
SAMPLING: Process of Selecting Your Observations
(Masoud Hemmasi, Ph.D.)
• QUESTION:
During presidential election campaigns, in a typical poll, of the roughly 100 million potential voters, how many would you say are contacted?
• History and Evolution of Political Polling
Types of Probability Sampling:
1. Simple (Unrestricted) Random Sampling
2. Complex (Restricted) Probability Sampling: sometimes offers more efficient alternatives to Simple Random Sampling
   a. Systematic Sampling
   b. Stratified Random Sampling
   c. Cluster Sampling
   d. Convenience Sampling (strictly a nonprobability technique, listed here for comparison)
   e. Double Sampling
Simple Random (or Unrestricted) Sampling
A sampling procedure in which every element in the population has a known and equal chance of being selected as a subject (e.g., drawing names out of a hat).
Advantage: Has the least bias and offers the most generalizability.
Disadvantage: At times, can be inefficient/expensive.
Systematic Sampling
If a sample size of n is desired from a population containing N elements, we might sample one element for every N/n elements in the population.
First, we randomly select one of the first N/n elements from the population list.
We then select every (N/n)th element that follows in the population list.
This method has the properties of a simple random sample, especially if the list of the population elements is a random ordering.
Advantage: The sample usually will be easier to identify than it would be if simple random sampling were used.
Example: Selecting every 100th listing in a telephone book after the first randomly selected listing.
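The systematic procedure can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample`, the `seed` parameter, and the list of 1,000 numbered listings are assumptions made for the example, and the sampling interval is taken as the integer part of N/n.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start among the first k = N // n elements, then take every k-th element."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    rng = random.Random(seed)
    k = len(population) // n           # sampling interval (integer part of N / n)
    start = rng.randrange(k)           # random start within the first k elements
    return population[start::k][:n]    # every k-th element after the start, trimmed to n

# Hypothetical population list: N = 1000 numbered telephone-book listings
listings = list(range(1, 1001))
sample = systematic_sample(listings, n=10, seed=1)
print(sample)
```

With N = 1000 and n = 10 the interval is 100, which matches the telephone-book example of taking every 100th listing after a random start.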
Stratified Random Sampling
The population is first divided into groups called strata with respect to salient/relevant characteristics (e.g., gender, age, race, department, location, industry, etc.).
Each element in the population belongs to one and only one stratum.
Best results are obtained when the elements within each stratum are as much alike as possible (i.e., a homogeneous group).
A simple random sample is taken from each stratum.
Advantage: If strata are homogeneous, this method is as "precise" as simple random sampling but with a smaller total sample size.
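A minimal sketch of stratified random sampling follows. The slides do not say how many elements to draw from each stratum; proportional allocation (the same fraction from every stratum) is an assumption made here, and the function name `stratified_sample` and the employee data are hypothetical.

```python
import random

def stratified_sample(elements, stratum_of, fraction, seed=None):
    """Group elements into strata, then draw a simple random sample from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for element in elements:
        strata.setdefault(stratum_of(element), []).append(element)
    sample = []
    for members in strata.values():
        # Proportional allocation (an assumption): same fraction from every stratum
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical population: 60 Sales and 40 IT employees, stratified by department
employees = [("emp%d" % i, "Sales" if i < 60 else "IT") for i in range(100)]
sample = stratified_sample(employees, stratum_of=lambda e: e[1], fraction=0.1, seed=1)
print(sample)
```

Each department contributes in proportion to its size, so no stratum can be missed by chance the way it could be under simple random sampling.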
Cluster Sampling
The population is first divided into separate groups called clusters.
Ideally, each cluster would be a small-scale version (representative) of the population.
A simple random sample of the clusters is then taken.
All elements within each selected cluster will make up the final sample.
Example: A primary application is area sampling, where clusters are city blocks or other well-defined areas (neighborhoods, precincts, school districts, etc.).
Advantage: The close proximity of elements can be cost and time effective (i.e., many sample observations can be obtained in a short time).
Disadvantage: This method generally requires a larger total sample size than simple or stratified random sampling.
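As a rough sketch of the cluster procedure: randomly pick whole clusters, then keep every element inside them. The function name `cluster_sample` and the city-block data below are hypothetical and only illustrate the idea.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every element inside the selected ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # simple random sample of cluster names
    elements = [e for name in chosen for e in clusters[name]]
    return chosen, elements

# Hypothetical area sample: 20 city blocks with 5 households each
blocks = {
    "block_%d" % b: ["house_%d_%d" % (b, h) for h in range(5)]
    for b in range(20)
}
chosen_blocks, sample = cluster_sample(blocks, n_clusters=3, seed=1)
print(chosen_blocks, len(sample))
```

The randomness applies only to the choice of blocks; within a chosen block no further sampling happens.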
Convenience Sampling
It is a nonprobability sampling technique. Items are included in the sample without known probabilities of being selected.
The sample is identified primarily by convenience.
Example: A professor conducting research might use student volunteers to constitute a sample.
Advantage: Sample selection and data collection are relatively easy.
Disadvantage: It is impossible to determine how representative of the population the sample is.
Sample Size Determination
Standard Deviation: What does it measure?
• Variations/differences in scores among members of a group with respect to a given characteristic (e.g., test scores for a class, income).
• Standard deviation represents the average distance of a group of numbers from their mean.
How do we calculate it? Hint: You can think of it as the average deviation from the norm/typical.
For a Population: \( \sigma_x = \sqrt{\sum (X - \mu_x)^2 / N} \)
For a Sample: \( S_x = \sqrt{\sum (X - \bar{x})^2 / (n - 1)} \)
Income level for a particular class like this:
Xs = Incomes of students in an MBA class: $6,000, $6,000, $15,000, $16,000, $39,000, $38,000, $50,000, $70,000
(Slide labels on the lower incomes: Grad Assistants, Part-Time Employed)
ΣX = $240,000
Average = \( \bar{x} \) = $240,000 / 8 = $30,000

X            \( X - \bar{x} \)     \( (X - \bar{x})^2 \)
6,000        -24,000       576,000,000
6,000        -24,000       576,000,000
15,000       -15,000       225,000,000
16,000       -14,000       196,000,000
39,000         9,000        81,000,000
38,000         8,000        64,000,000
50,000        20,000       400,000,000
70,000        40,000     1,600,000,000
Sum (Σ)      240,000             0     3,718,000,000  (columns: X, \( X - \bar{x} \), \( (X - \bar{x})^2 \))

Average = \( \bar{x} \) = 30,000
Variance = \( \sigma^2 \) = 3,718,000,000 / 8 = 464,750,000
Std. Dev. = \( \sigma \) = $21,558.06
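The table can be checked with a short Python sketch. The variable names are illustrative only. The slide divides by 8, so this is the population formula; the sample formula (dividing by n - 1) is shown alongside it purely for comparison.

```python
import math
import statistics

# Incomes of the eight MBA students from the slide
incomes = [6000, 6000, 15000, 16000, 39000, 38000, 50000, 70000]

mean = sum(incomes) / len(incomes)                  # x-bar
squared_devs = [(x - mean) ** 2 for x in incomes]   # (X - x-bar)^2 column
variance = sum(squared_devs) / len(incomes)         # population variance: divide by N
std_dev = math.sqrt(variance)                       # population standard deviation

# For comparison only: the sample formula divides by n - 1 instead of N
sample_std_dev = statistics.stdev(incomes)

print(f"mean = {mean:,.0f}")
print(f"sum of squared deviations = {sum(squared_devs):,.0f}")
print(f"variance = {variance:,.0f}")
print(f"population std. dev. = {std_dev:,.2f}")
print(f"sample std. dev. (n - 1) = {sample_std_dev:,.2f}")
```

The library call `statistics.pstdev(incomes)` gives the same population value directly.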
Suppose the frequency distribution of the life of light bulbs is normal, with \( \mu_x = 100 \) hrs and \( \sigma_x = 5 \) hrs.
[Figure: normal frequency curve of bulb life. Horizontal axis: X = hours (85, 90, 95, 100, 105, 110, 115); vertical axis: frequency. Each dot is one bulb, e.g., 3 bulbs lasted 108 hrs each.]
What can we say about the expected life of a randomly selected bulb (\( x_i \))?
Formula: \( X = \mu_x \pm Z\sigma_x \) (where Z is an index that reflects the level of confidence/certainty with which we wish to estimate X.)
Life of a randomly drawn light bulb: \( 100 - 5Z \le X \le 100 + 5Z \)
Z = 1 for 68% confidence, Z = 1.96 for 95% confidence, Z = 3 for 99% confidence (Z = 3 is a rounded, conservative value: it covers about 99.7% of a normal distribution, while exactly 99% corresponds to Z ≈ 2.58).
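To see what the Z values mean for the light-bulb example, the sketch below uses Python's standard-library `statistics.NormalDist` to compute how much of a normal distribution with mean 100 and standard deviation 5 falls inside 100 ± 5Z. The helper name `coverage` is an assumption for illustration.

```python
from statistics import NormalDist

# Bulb life assumed normal with mean 100 hrs and std. dev. 5 hrs (values from the slide)
bulbs = NormalDist(mu=100, sigma=5)

def coverage(z):
    """Share of bulbs whose life falls within 100 +/- 5Z hours."""
    return bulbs.cdf(100 + 5 * z) - bulbs.cdf(100 - 5 * z)

for z in (1, 1.96, 3):
    low, high = 100 - 5 * z, 100 + 5 * z
    print(f"Z = {z}: {low:.1f} to {high:.1f} hrs covers {coverage(z):.1%} of bulbs")

# Z value that leaves exactly 0.5% in each tail, i.e. exact 99% confidence
z_99 = NormalDist().inv_cdf(0.995)
print(f"Exact Z for 99% confidence: {z_99:.3f}")
```

The output shows why Z = 3 is a safe, slightly conservative stand-in for 99% confidence.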
Income Distribution for a Hypothetical Population
Incomes (\( x_i \)): $0, $1, $2, $3, $4, $5, $6, $7, $8, $9
True Population Mean = \( \mu_x = \sum x_i / N = 45 / 10 \) = $4.5
Population Standard Deviation: \( \sigma_x = \sqrt{\sum (x_i - \mu_x)^2 / N} = \sqrt{82.5 / 10} = 2.87 \)
Income of a randomly drawn person (\( X_i \)) = ?
This formula, \( X = \mu_x \pm Z\sigma_x \), is ONLY applicable when the population distribution is NORMAL.
What is the distribution of our hypothetical population?
[Figure: Distribution of the Hypothetical Population. One observation (*) at each income value from $0 to $9, i.e., a Uniform Distribution.]
• If (and only if) we know that our sample mean (\( \bar{x} \)) comes from a normally distributed population, the same formula can be modified and applied.
NOTE that X is the \( \bar{X} \) of a sample of size n = 1.
What is the generic formula for the mean (\( \bar{X} \)) of samples of any size (any n)? That is, what if instead of a single observation/case (X), we draw a random sample of a particular size from the population? Can we say something about the mean of that sample, \( \bar{X} \)?
Rather than \( X = \mu_x \pm Z\sigma_x \), use \( \bar{X} = \mu_{\bar{x}} \pm Z\sigma_{\bar{x}} \), where \( \sigma_{\bar{x}} \) is the Std. Error.
But, what does this statement mean?
Sampling Distribution = Frequency distribution of sample means
Sampling Distribution for Samples of Size n = 2 (from our earlier population)

Sample #   SAMPLE     MEAN (\( \bar{X} \))
1          $0 & $1    0.5
2          $0 & $2    1.0
3          $0 & $3    1.5
.          .          .
.          .          .
10         $1 & $2    1.5
11         $1 & $3    2.0
12         $1 & $4    2.5
.          .          .
.          .          .
18         $2 & $3    2.5
19         $2 & $4    3.0
.          .          .
.          .          .
43         $7 & $8    7.5
44         $7 & $9    8.0
45         $8 & $9    8.5

45 possible samples of size n = 2, thus 45 possible sample means.
The distribution of these 45 sample means is called the Sampling Distribution! See next slide!!!
Mean of all the 45 sample means = \( \mu_{\bar{x}} = \mu_x = 4.5 \) (i.e., the same as the mean of the original population).
\( \sigma_{\bar{x}} \) = Standard Error is the standard dev. of these \( \bar{X} \)s.
So, the earlier statement means: if these sample means are normally distributed, we can use the related formula.
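The 45 samples can be enumerated directly, which makes the sampling distribution concrete. The sketch below is illustrative, not part of the original slides. One detail goes beyond them: because each pair is drawn without replacement from a population of only 10, the standard deviation of the 45 means equals \( \sigma_x / \sqrt{n} \) multiplied by the finite population correction \( \sqrt{(N - n)/(N - 1)} \). For large populations that correction is close to 1.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

population = list(range(10))            # incomes $0 through $9
N, n = len(population), 2

# Every possible sample of size 2 (drawn without replacement) and its mean
samples = list(combinations(population, n))
sample_means = [sum(s) / n for s in samples]

mean_of_means = mean(sample_means)      # mean of the sampling distribution
standard_error = pstdev(sample_means)   # std. dev. of the 45 sample means

sigma = pstdev(population)              # population std. dev. (about 2.87)
fpc = math.sqrt((N - n) / (N - 1))      # finite population correction

print(len(samples), "possible samples")
print("mean of sample means:", mean_of_means)
print("std. dev. of sample means:", round(standard_error, 4))
print("sigma / sqrt(n) * fpc:", round(sigma / math.sqrt(n) * fpc, 4))
```

The mean of the sample means reproduces the population mean of 4.5, as the slide states.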
[Figure: Sampling Distribution of Samples of Size n = 2. Frequency plot of the 45 sample means: for example, one sample has \( \bar{x} \) = ($0+$1)/2 = $0.50, while two samples, ($0+$3)/2 = $1.50 and ($1+$2)/2 = $1.50, share the same mean. The plot is centered at \( \mu_{\bar{x}} \) with spread \( \sigma_{\bar{x}} \).]
So, if we know that the distribution of our sample means (i.e., the Sampling Distribution) is NORMAL, we will be able to say the following about the mean (\( \bar{x} \)) of a randomly selected sample: \( \bar{x} = \mu_{\bar{x}} \pm Z\sigma_{\bar{x}} \)
\( \mu_{\bar{x}} = \mu_x \)
\( \sigma_{\bar{x}} \) = Standard Error = \( \sigma_x / \sqrt{n} \)
Since \( \mu_{\bar{x}} = \mu_x \), substitute \( \mu_x \) for \( \mu_{\bar{x}} \): \( \bar{x} = \mu_x \pm Z\sigma_{\bar{x}} \)
QUESTION: What is the primary purpose of sampling?
Answer: To use sample characteristics (e.g., \( \bar{x} \)) as estimates of population characteristics (e.g., \( \mu_x \)).
QUESTION: What is the significance of this formula? \( \bar{x} = \mu_x \pm Z\sigma_{\bar{x}} \)
Answer: It shows the relationship between \( \bar{x} \) and \( \mu_x \). So, if \( \bar{x} \) comes from a normal distribution, we can rewrite the formula to estimate \( \mu_x \) based on the value of \( \bar{x} \).
Question: But, is the sampling distribution (i.e., the distribution of \( \bar{x} \)) always normal (so that we can use the above formula)? Let's see it!
[Figure: Distribution of Sample Means (\( \bar{X} \)s) for Different Population Distributions. Top row (n = 1): think of these as the distribution of the life of all individual light bulbs (X). Lower rows: think of these as the distribution of the average life of samples of n light bulbs (\( \bar{X} \)).]
As n increases, the sampling distribution (i.e., the distribution of \( \bar{X} \)s) will more and more resemble a normal distribution, so that for n > 30 the sampling distribution can be treated as normal for practical purposes, regardless of the distribution of the original population.
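A small simulation can illustrate this claim using the uniform $0 to $9 population from earlier. The sketch below is an assumption-laden illustration rather than a proof: it draws 5,000 samples of size n = 30 with replacement, uses a fixed random seed, and checks that about 95% of the sample means fall within 1.96 standard errors of the population mean, as a normal sampling distribution would predict.

```python
import math
import random
from statistics import mean, pstdev

rng = random.Random(42)                 # fixed seed so the run is repeatable
population = list(range(10))            # uniform (non-normal) population: $0 to $9
sigma = pstdev(population)              # about 2.87

n = 30                                  # sample size
replications = 5000                     # number of simulated samples

# Draw each sample with replacement and record its mean
sample_means = [sum(rng.choices(population, k=n)) / n for _ in range(replications)]

grand_mean = mean(sample_means)
observed_se = pstdev(sample_means)
expected_se = sigma / math.sqrt(n)      # standard error predicted by sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean
within = sum(abs(m - 4.5) <= 1.96 * expected_se for m in sample_means) / replications

print(f"mean of sample means: {grand_mean:.3f}")
print(f"observed std. error: {observed_se:.3f}, expected: {expected_se:.3f}")
print(f"share within 1.96 std. errors: {within:.1%}")
```

Rerunning with a smaller n, such as 2 or 5, shows how the agreement with the normal prediction changes.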
Conclusion?
Distribution of \( \bar{X} \)s for all samples of the same size (Sampling Distribution):
Mean of \( \bar{X} \)s = \( \mu_{\bar{x}} = \mu_x \)
Std. Dev. of \( \bar{X} \)s = Std. Error = \( \sigma_{\bar{x}} = \sigma_x / \sqrt{n} \)
[Figure: the variable of interest X is NOT normally distributed (mean \( \mu_x \), std. dev. \( \sigma_x \)). Samples of sizes \( n_1, n_2, n_3 > 30 \) yield sample means \( \bar{x}_1, \bar{x}_2, \bar{x}_3 \), each with its own sample std. dev. \( S_x \); the distribution of these \( \bar{X} \)s is normal.]
When X is not normally distributed, the sampling distribution is guaranteed to be (approximately) normal only when n ≥ 30 is used.
So, for samples of n ≥ 30:
\( \mu_x = \bar{x} \pm Z\sigma_{\bar{x}} \)
Standard Error = \( \sigma_{\bar{x}} = \sigma_x / \sqrt{n} \)
SO, \( \mu_x = \bar{x} \pm Z\sigma_x / \sqrt{n} \)
Now, let's examine the elements of this formula!
1) We are interested in estimating \( \mu_x \) from \( \bar{x} \).
2) Estimation involves a margin of error, that is:
3) Actual Score = Estimate ± Margin of Error
In \( \mu_x = \bar{x} \pm Z\sigma_x / \sqrt{n} \), \( \mu_x \) is the Actual Score, \( \bar{x} \) is the Estimate, and \( Z\sigma_x / \sqrt{n} \) is the Margin of Error; let's call it "E".
So, when using random samples of size n > 30, the margin of error in estimation would be:
\( E = Z\sigma_x / \sqrt{n} \)
\( E = Z\sigma_x / \sqrt{n} \)
Square both sides of the equation:
\( E^2 = Z^2\sigma_x^2 / n \)
Rewrite it to solve for n:
\( n = Z^2\sigma_x^2 / E^2 \)
• \( \sigma_x \) (population Std. Dev.) is often unknown. \( S_x \) (Std. Dev. of a sample) is a reasonable estimate (substitute) for it.
• \( S_x \) can be estimated based on previous studies or a pilot study.
\( n = Z^2 S_x^2 / E^2 \)
Sample size required for estimating a population mean (\( \mu_x \))*:
\( n = Z^2 S_x^2 / E^2 \)
n = Sample size required
E = Margin of error we are willing/able to tolerate in estimating the population characteristic (mean)
Z = An index reflecting the degree of confidence/certainty we wish to have in achieving the level of precision/accuracy represented by E above
S = An estimate of the Std. Dev. of the characteristic being estimated/studied
* The case of n for estimating a population proportion will be covered later.
An example: Suppose you were to use a random sample to estimate the average IQ of adult males. Suppose you know, from a pilot study, that the Std. Dev. of males' IQ is about 16 points. What size sample should you use if you wish to be 95% sure that your margin of error in estimating average IQ is no more than 3 points (that is, if you wish to be 95% sure that the estimate you will obtain from the sample would be within ±3 points of the actual/true average IQ of the adult male population)?
\( n = Z^2 S^2 / E^2 \)
Z = ? S = ? E = ?
Z = 2 (1.96 rounded up, for 95% confidence), S = 16, E = 3
\( n = 2^2 (16)^2 / 3^2 = 113.78 \), round up = 114
An Example: Suppose we were to use a random sample to estimate the average IQ of adult males. Further suppose that we have absolutely no basis for determining the Std. Dev. of males' IQ. But, we know that the IQ of the overwhelming majority of adult males ranges between 80 and 120. What size sample should we use if we wish to be 99% sure that our margin of error in estimating the average IQ is no more than 2 points (that is, if we wish to be 99% sure that the estimate we will obtain from the sample would be within ±2 points of the actual/true average IQ of the adult male population)?
Assuming worst case scenario when S is unknown:
\( n = Z^2 S^2 / E^2 \)
If no information is available on S, you can approximate it by setting S = ¼ of Range. This is a rule of thumb: in a roughly normal population the overwhelming majority of values fall within about ±2 standard deviations of the mean, so the range spans about 4 S.
Range = 120 - 80 = 40
S = 40/4 = 10, Z = 3, E = 2
\( n = 3^2 (10)^2 / 2^2 = 225 \)
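Both IQ examples can be reproduced with a small helper. This is a sketch only: the function name `sample_size_for_mean` is hypothetical, and the rounding step before `math.ceil` is an added guard against floating-point noise, not something the slides discuss.

```python
import math

def sample_size_for_mean(z, s, e):
    """Sample size n = Z^2 * S^2 / E^2, rounded up to the next whole observation."""
    n = z ** 2 * s ** 2 / e ** 2
    # Round away floating-point noise before taking the ceiling
    return math.ceil(round(n, 6))

# Example 1: 95% confidence (Z rounded to 2), S = 16 from a pilot study, E = 3 points
n_pilot = sample_size_for_mean(z=2, s=16, e=3)

# Example 2: S unknown, approximated as one quarter of the range 80 to 120
s_from_range = (120 - 80) / 4
n_range = sample_size_for_mean(z=3, s=s_from_range, e=2)

print(n_pilot)   # 113.78 rounded up
print(n_range)
```

Because E is squared in the denominator, halving the tolerated margin of error quadruples the required sample size.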
Assessing Resulting Accuracy/Precision of the Estimates, Given a Particular Sample Size:
• Suppose we used a survey with lots of 7-point scale items,
• Collected data from 225 respondents, and
• Descriptive statistics on the data show that the typical Std. Dev. on most items/variables is in the 1.3 to 1.5 range.
• What can we say about the precision/accuracy of our results, say, with 95% confidence/certainty?
\( n = Z^2 S_x^2 / E^2 \)
\( E^2 = Z^2 S^2 / n \)
\( E = Z S / \sqrt{n} \)
\( E = 2 (1.5) / \sqrt{225} = 3/15 = 0.2 \)
We can be 95% certain that the sample mean for a typical variable is not off from the true population mean by more than two-tenths of a point (e.g., if the reported sample mean on a given variable is 4.7, we can be 95% sure that the actual population mean is between 4.5 and 4.9).
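The same calculation can be run in reverse for any survey: given n and a typical standard deviation, compute the margin of error. The sketch below uses a hypothetical helper `margin_of_error` and evaluates both ends of the 1.3 to 1.5 range mentioned above, with Z rounded to 2 for 95% confidence as on the slide.

```python
import math

def margin_of_error(z, s, n):
    """Margin of error E = Z * S / sqrt(n) for a sample mean."""
    return z * s / math.sqrt(n)

n = 225                                       # respondents in the survey
e_high = margin_of_error(z=2, s=1.5, n=n)     # least precise items (S = 1.5)
e_low = margin_of_error(z=2, s=1.3, n=n)      # most precise items (S = 1.3)

print(f"E for S = 1.5: {e_high:.3f}")
print(f"E for S = 1.3: {e_low:.3f}")

# Interval for a reported sample mean of 4.7 on an item with S = 1.5
reported_mean = 4.7
print(f"{reported_mean - e_high:.1f} to {reported_mean + e_high:.1f}")
```

Using the upper end of the standard deviation range gives the more cautious of the two margins.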
Sample size determination for estimating Proportions (p):
EXAMPLE: Projecting the percentage of people who would be voting for a particular candidate in a presidential election.
In such cases, dispersion is measured by pq (instead of variance, \( s^2 \)), where
p = proportion of the population that is expected to have the attribute under study, and
q = (1 - p), the proportion of the population that is expected NOT to have that attribute.
So, the sample size formula will change to: \( n = Z^2 pq / E^2 \)
Or: \( n = Z^2 p(1-p) / E^2 \)
NOTE: If we have no basis for judging the expected value of p, we can assume maximum variability (i.e., err on the side of overestimating the required sample size) by setting p at p = 0.50 (see the example on the next slide).
EXAMPLE: Suppose you are to project the percentage of potential voters who would be expected to vote for the Republican candidate in the upcoming presidential election. Suppose you have no basis for estimating/guessing what the percentage could possibly be. Also, suppose that you want to be 99% confident/certain that your margin of error would be no more than 3% (i.e., 99% certain that your projection/estimate will be within ±3% of the actual number). What size sample will you need?
\( n = Z^2 p(1-p) / E^2 \)
Z = 3, p = 0.50, E = 0.03
\( n = 3^2 (0.5)(0.5) / 0.03^2 \)
\( n = 9 (0.25) / 0.0009 = 2500 \)
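The polling example can be checked the same way. The helper name `sample_size_for_proportion` is hypothetical. The rounding before `math.ceil` matters here, because 0.03 squared is not represented exactly in floating point and could otherwise push the result up by one.

```python
import math

def sample_size_for_proportion(z, p, e):
    """Sample size n = Z^2 * p * (1 - p) / E^2, rounded up to a whole respondent."""
    n = z ** 2 * p * (1 - p) / e ** 2
    # Round away floating-point noise before taking the ceiling
    return math.ceil(round(n, 6))

# Slide example: Z = 3 for 99% confidence, p = 0.50 (maximum variability), E = 0.03
n_poll = sample_size_for_proportion(z=3, p=0.50, e=0.03)
print(n_poll)

# p(1 - p) peaks at p = 0.5, so any other guess for p needs a smaller sample
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, sample_size_for_proportion(z=3, p=p, e=0.03))
```

The loop shows why p = 0.50 is the safe choice when nothing is known about the true proportion.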
QUESTIONS OR COMMENTS?