6 inference intervals sample size
TRANSCRIPT
-
7/28/2019 6 Inference Intervals Sample Size
1/48
Inference, confidence intervals and sample size determination
Dr James Abdey
Overview
Choosing a sample size
Estimation
Sampling distribution of X
Sampling distribution
properties
Sample size and sampling
fraction
Central Limit Theorem
Principle of confidence
intervals
Construction: CI for X
Variance Known
Construction: CI for X
Variance Unknown
Choosing sample size
Adjusting the statistically
determined sample size
Adjusting for non-response
Applied Marketing
(Market Research Methods)
Topic 6:
Inference, confidence intervals and sample size determination
Dr James Abdey
Overview
Here we consider sample size determination in simple random sampling
Properties of the sampling distribution are
discussed
We describe the required adjustments to statistically
determined sample sizes to account for incidence
and completion rates
Non-response issues in sampling are also covered,
with ways of improving response rates
Choosing a sample size
The question "How big a sample do I need to take?" is a common one when sampling data
The answer depends on the quality of inference that the researcher requires from the data
In the estimation context this can be expressed in terms of the accuracy of estimation
If the researcher requires that there should be a 95% chance that the estimation error is no bigger than d units (we refer to d as the tolerance), then this is equivalent to having a 95% confidence interval of width 2d
Note that here d represents the half-width of the confidence interval, since the point estimate is, by construction, at the centre of the confidence interval
Simple random sampling (SRS)
Recall a simple random sample is a sample selected by a process such that every possible sample (of the same size, n) has the same probability of selection
The selection process is left to chance, thus
eliminating the effect of selection bias
Due to the random selection mechanism, we do not
know (in advance) which sample will occur
Every population element has a known, non-zero
probability of selection in the sample but no element
is certain to appear
Simple random sampling (SRS)
Example
Consider a population of size N = 6 elements: A, B, C, D, E and F
We consider all possible samples of size n = 2 (without replacement)
There are 15 different, but equally likely, such
samples:
AB, AC, AD, AE, AF, BC, BD, BE,
BF, CD, CE, CF, DE, DF, EF
Since this is SRS, each sample has a probability of
selection of 1/15
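The enumeration above can be reproduced with a few lines of Python. This is only an illustrative sketch: the variable names are assumptions of this example, not part of the slides.

```python
from itertools import combinations

population = ["A", "B", "C", "D", "E", "F"]  # the N = 6 elements from the example
n = 2  # sample size, selected without replacement

# Each unordered pair of distinct elements is one possible simple random sample.
samples = ["".join(pair) for pair in combinations(population, n)]

# Under SRS every possible sample is equally likely.
selection_probability = 1 / len(samples)

print(samples)
print(len(samples), selection_probability)
```

The list starts at AB and ends at EF, matching the order used on the slide.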
Estimation
A population has particular characteristics of interest
such as the mean, variance etc.
Collectively we refer to these characteristics as
parameters
If we do not have population data, the parameter values will be unknown
Statistical inference is the process of estimating the (unknown) parameter values using the (known) sample data
We use a statistic (estimator) calculated from
sample observations to provide a point estimate
Estimation Example
Returning to our example, recall there are 15
different samples of size 2 from a population of size 6
Suppose the variable of interest is income
Individual          A   B   C   D   E   F
Income (in 000s)    3   6   4   9   7   7
If we seek the population mean, $\mu$, we will use the sample mean, $\bar{X}$, as our estimator:
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
For example, if the observed sample was AB, the
sample mean is (3000 + 6000)/2 = 4,500
Estimation Example
Clearly, different observed samples will lead to
different sample means
Consider $\bar{X}$ for all possible samples (in 000s):
Sample      AB    AC    AD    AE    AF    BC    BD    BE
Values      3, 6  3, 4  3, 9  3, 7  3, 7  6, 4  6, 9  6, 7
$\bar{X}$   4.5   3.5   6     5     5     5     7.5   6.5
Sample      BF    CD    CE    CF    DE    DF    EF
Values      6, 7  4, 9  4, 7  4, 7  9, 7  9, 7  7, 7
$\bar{X}$   6.5   6.5   5.5   5.5   8     8     7
So $\bar{X}$ values vary from 3.5 to 8, depending on the sample values
Sampling distribution of X
The previous slide showed all possible values of the estimator $\bar{X}$
Since we have the population data here, we can actually compute the population mean (in 000s):
$$\mu = \frac{1}{N} \sum_{i=1}^{N} X_i = \frac{3 + 6 + 4 + 9 + 7 + 7}{6} = 6$$
So even with SRS, we obtain some $\bar{X}$ values far from $\mu$
Here only one sample (AD) results in $\bar{X} = \mu$
Sampling distribution of X
Let's now consider the maximum $|\bar{X} - \mu|$
max $|\bar{X} - \mu|$   Number of samples   Probability
0                       1                   0.067
0.5                     6                   0.400
1                       10                  0.667
1.5                     12                  0.800
2                       14                  0.933
2.5                     15                  1.000
So, for example, there is an 80% chance of being within 1.5 units of $\mu$
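The cumulative probabilities in this table can be checked by brute force over all 15 samples. The sketch below assumes the income figures from the example; the helper name `prob_within` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Incomes (in 000s) for individuals A to F, taken from the example.
incomes = {"A": 3, "B": 6, "C": 4, "D": 9, "E": 7, "F": 7}
mu = Fraction(sum(incomes.values()), len(incomes))  # population mean

# Sample mean of every possible sample of size 2.
sample_means = [Fraction(incomes[a] + incomes[b], 2) for a, b in combinations(incomes, 2)]

def prob_within(d):
    """Proportion of equally likely samples whose mean lies within d of mu."""
    hits = sum(1 for m in sample_means if abs(m - mu) <= d)
    return Fraction(hits, len(sample_means))

for d in (Fraction(0), Fraction(1, 2), Fraction(1), Fraction(3, 2), Fraction(2), Fraction(5, 2)):
    print(float(d), round(float(prob_within(d)), 3))
```

Exact fractions are used so that comparisons such as "within 1.5 units" are not affected by floating-point rounding.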
Sampling distribution of X
We now represent this as a frequency distribution
That is, we record the frequency of each possible value of $\bar{X}$
$\bar{X}$   Frequency   Relative frequency
3.5         1           1/15 = 0.067
4.5         1           1/15 = 0.067
5.0         3           3/15 = 0.200
5.5         2           2/15 = 0.133
6.0         1           1/15 = 0.067
6.5         3           3/15 = 0.200
7.0         1           1/15 = 0.067
7.5         1           1/15 = 0.067
8.0         2           2/15 = 0.133
This is known as the sampling distribution of $\bar{X}$
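As a rough illustration, the same frequency table can be tallied programmatically. The names below (`incomes`, `sampling_distribution`) are assumptions of this sketch rather than anything defined in the slides.

```python
from collections import Counter
from itertools import combinations

# Incomes (in 000s) for individuals A to F, taken from the example.
incomes = {"A": 3, "B": 6, "C": 4, "D": 9, "E": 7, "F": 7}

# One sample mean per possible sample of size 2 (halves are exact in floating point).
means = [(incomes[a] + incomes[b]) / 2 for a, b in combinations(incomes, 2)]

# Tally how often each value of the sample mean occurs.
sampling_distribution = Counter(means)

for value in sorted(sampling_distribution):
    freq = sampling_distribution[value]
    print(f"{value:.1f}  {freq}  {freq / len(means):.3f}")
```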
Sampling distribution of X
The sampling distribution is a central and vital
concept in statistics
It can be used to evaluate how good an estimator is
Specifically, we care about how close the estimator
is to the population parameter of interest
As we have seen, different samples yield different $\bar{X}$ values, as a consequence of the random sampling procedure
Hence estimators (of which $\bar{X}$ is an example) are random variables
So, $\bar{X}$ is our estimator of $\mu$
The observed value of $\bar{X}$ is a point estimate
Sampling distribution properties
Like any distribution, we care about a sampling distribution's mean and variance
Together, these allow us to assess how good an estimator is
First, consider the mean: we seek an estimator which does not mislead us systematically
So the average (mean) value of an estimator, over
all possible samples, should be equal to the
population parameter
Sampling distribution properties
Returning to our example:
$\bar{X}$   Frequency   Product
3.5         1           3.5
4.5         1           4.5
5.0         3           15.0
5.5         2           11.0
6.0         1           6.0
6.5         3           19.5
7.0         1           7.0
7.5         1           7.5
8.0         2           16.0
Total       15          90.0
Hence the mean of this sampling distribution is 90/15 = 6
Sampling distribution properties
An important difference between a sampling distribution and other distributions is that the values in a sampling distribution are summary measures of whole samples (i.e. statistics/estimators) rather than individual observations
Formally, the mean of a sampling distribution is called the expected value of the estimator, denoted by $E[\cdot]$
Hence the expected value of the sample mean is $E[\bar{X}]$
An unbiased estimator has its expected value equal to the parameter being estimated
For our example, $E[\bar{X}] = 6 = \mu$
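A minimal check of unbiasedness for this example: average the sample mean over all equally likely samples and compare with the population mean. Variable names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Incomes (in 000s) for individuals A to F, taken from the example.
incomes = {"A": 3, "B": 6, "C": 4, "D": 9, "E": 7, "F": 7}
mu = Fraction(sum(incomes.values()), len(incomes))  # population mean

# Sample mean of every possible sample of size 2.
means = [Fraction(incomes[a] + incomes[b], 2) for a, b in combinations(incomes, 2)]

# Expected value of the estimator: mean over all equally likely samples.
expected_value = sum(means) / len(means)

print(expected_value, mu, expected_value == mu)
```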
Sampling distribution properties
Fortunately the sample mean $\bar{X}$ is always an unbiased estimator in SRS, regardless of:
the sample size, n
the distribution of the (parent) population
This is a good illustration of a population parameter, $\mu$, being estimated by its sample counterpart, $\bar{X}$
Sampling distribution properties
The unbiasedness of an estimator is clearly desirable, however we also need to take into account the dispersion of the estimator's sampling distribution
Ideally, the possible values of the estimator should
not vary much around the true parameter value
So, we seek an estimator with a small variance
Recall the variance is defined to be the mean of the
squared deviations about the mean of the distribution
In the case of sampling distributions, it is referred to
as the sampling variance
Sampling distribution properties
Returning to our example:
$\bar{X}$   $\bar{X} - \mu$   $(\bar{X} - \mu)^2$   Frequency   Product
3.5         -2.5              6.25                  1           6.25
4.5         -1.5              2.25                  1           2.25
5.0         -1.0              1.00                  3           3.00
5.5         -0.5              0.25                  2           0.50
6.0         0.0               0.00                  1           0.00
6.5         0.5               0.25                  3           0.75
7.0         1.0               1.00                  1           1.00
7.5         1.5               2.25                  1           2.25
8.0         2.0               4.00                  2           8.00
Total                                               15          24.00
Hence the sampling variance is 24/15 = 1.6
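The sampling variance can be verified directly as the mean squared deviation of the 15 sample means about the population mean. This is a sketch under the example's data; the variable names are assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Incomes (in 000s) for individuals A to F, taken from the example.
incomes = {"A": 3, "B": 6, "C": 4, "D": 9, "E": 7, "F": 7}
mu = Fraction(sum(incomes.values()), len(incomes))  # population mean

# Sample mean of every possible sample of size 2.
means = [Fraction(incomes[a] + incomes[b], 2) for a, b in combinations(incomes, 2)]

# Sum of squared deviations about mu, then divide by the number of samples.
sum_of_squares = sum((m - mu) ** 2 for m in means)
sampling_variance = sum_of_squares / len(means)

print(sum_of_squares, float(sampling_variance))
```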
Sampling distribution properties
The population itself has a variance: the population variance, $\sigma^2$
X       $X - \mu$   $(X - \mu)^2$   Frequency   Product
3       -3          9               1           9
6       0           0               1           0
4       -2          4               1           4
9       3           9               1           9
7       1           1               2           2
Hence the population variance is $\sigma^2 = 24/6 = 4$
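Sampling distribution properties
We now consider the relationship between $\sigma^2$ and the sampling variance
Intuitively, a larger $\sigma^2$ should lead to a larger sampling variance. Why?
For population size N and sample size n,
$$\mathrm{Var}(\bar{X}) = \frac{N-n}{N-1} \cdot \frac{\sigma^2}{n}$$
So for our example,
$$\mathrm{Var}(\bar{X}) = \frac{6-2}{6-1} \cdot \frac{4}{2} = 1.6$$
We use the term standard error to refer to the standard deviation of the sampling distribution,
$$\mathrm{S.E.}(\bar{X}) = \sqrt{\mathrm{Var}(\bar{X})} = \sqrt{\frac{N-n}{N-1} \cdot \frac{\sigma^2}{n}} = \sigma_{\bar{X}}$$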
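The formula can be applied to the example population as a quick check. The function name `sampling_variance` is a hypothetical helper for this sketch; note that the population variance divides by N, as on the earlier slide.

```python
import math
from fractions import Fraction

def sampling_variance(pop_variance, N, n):
    """Var of the sample mean under SRS without replacement: (N-n)/(N-1) * sigma^2/n."""
    return Fraction(N - n, N - 1) * Fraction(pop_variance) / n

incomes = [3, 6, 4, 9, 7, 7]  # population values (in 000s) from the example
N = len(incomes)
mu = Fraction(sum(incomes), N)

# Population variance: mean squared deviation about mu (divide by N, not N - 1).
sigma2 = sum((x - mu) ** 2 for x in incomes) / N

var_xbar = sampling_variance(sigma2, N, n=2)
standard_error = math.sqrt(var_xbar)

print(sigma2, float(var_xbar), round(standard_error, 4))
```

The result agrees with the value obtained earlier by enumerating all 15 samples.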
Sampling distribution properties
Implications:
as the sample size, n, increases, the sampling variance decreases, i.e. the precision increases [1]
provided the sampling fraction, n/N, is small, the term $\frac{N-n}{N-1} \approx 1$ so can be ignored: the precision depends effectively on n only
[1] Although greater precision is desirable, data collection costs will rise with n (remember why we sample in the first place!)
Sample size and sampling fraction
The larger the sample, the less variability there will be between samples
Frequency of each possible value of $\bar{X}$ for n = 2 and n = 4:
$\bar{X}$   n = 2   n = 4
3.50        1
4.50        1
5.00        3       2
5.25                1
5.50        2       1
5.75                3
6.00        1       1
6.25                2
6.50        3       3
6.75                1
7.00        1
7.25                1
7.50        1
8.00        2
Sample size and sampling fraction
There is a striking improvement in the precision of
the estimator
The variability has decreased considerably
Range of possible $\bar{X}$ values goes from 3.5 to 8.0 down to 5.0 to 7.25
The sampling variance is reduced from 1.6 to 0.4
Note precision in statistics refers to the inverse of
the sampling variance
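The comparison between n = 2 and n = 4 can be reproduced by enumerating every possible sample for each size. The helper `sampling_summary` is a hypothetical name used only in this sketch.

```python
from fractions import Fraction
from itertools import combinations

incomes = [3, 6, 4, 9, 7, 7]  # population values (in 000s) from the example
mu = Fraction(sum(incomes), len(incomes))

def sampling_summary(n):
    """Return (min, max, variance) of the sample mean over all SRS samples of size n."""
    means = [Fraction(sum(s), n) for s in combinations(incomes, n)]
    variance = sum((m - mu) ** 2 for m in means) / len(means)
    return min(means), max(means), variance

for n in (2, 4):
    smallest, largest, variance = sampling_summary(n)
    print(n, float(smallest), float(largest), float(variance))
```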
Sample size and sampling fraction
The factor $\frac{N-n}{N-1}$ decreases steadily as $n \to N$
When n = 1 the factor equals 1, and when n = N it equals 0
When sampling without replacement, increasing n must increase precision since less of the population is left out
In much practical sampling N is very large (e.g. several million), while n is comparatively small (e.g. at most 1,000, say)
Therefore in such cases the factor $\frac{N-n}{N-1}$ is close to 1 and its effect becomes negligible, hence
$$\mathrm{Var}(\bar{X}) = \frac{N-n}{N-1} \cdot \frac{\sigma^2}{n} \approx \frac{\sigma^2}{n} \quad \text{for small } n/N$$
Sample size and sampling fraction
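n/N is called the sampling fraction
When N is large, it is the sample size n which is important in determining precision, not the sampling fraction
Consider two populations: $N_1 = 3$ million and $N_2 = 200$ million, both with the same variance $\sigma^2$
We sample $n_1 = n_2 = 1{,}000$ from each population, then
$$\sigma^2_{\bar{X}_1} = \frac{N_1 - n_1}{N_1 - 1} \cdot \frac{\sigma^2}{n_1} = (0.999667) \frac{\sigma^2}{1000}$$
$$\sigma^2_{\bar{X}_2} = \frac{N_2 - n_2}{N_2 - 1} \cdot \frac{\sigma^2}{n_2} = (0.999995) \frac{\sigma^2}{1000}$$
So $\sigma^2_{\bar{X}_1} \approx \sigma^2_{\bar{X}_2}$, despite $N_1$ being far smaller than $N_2$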
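The two correction factors can be recomputed in a couple of lines. The function name `fpc` (finite population correction) is an assumption of this sketch, not a term used on the slides.

```python
def fpc(N, n):
    """Factor (N - n) / (N - 1) that multiplies sigma^2 / n in the sampling variance."""
    return (N - n) / (N - 1)

factor_1 = fpc(3_000_000, 1_000)      # population of 3 million
factor_2 = fpc(200_000_000, 1_000)    # population of 200 million

print(round(factor_1, 6), round(factor_2, 6))
```

Both factors are within 0.04% of 1, so the two sampling variances are practically identical even though the sampling fractions differ by a factor of about 67.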
Central Limit Theorem
When sampling from (almost) any non-normal distribution, for sufficiently large n, $\bar{X}$:
1. is approximately normally distributed
2. has mean $\mu$
3. has variance $\frac{\sigma^2}{n}$ and standard error $\frac{\sigma}{\sqrt{n}}$
The approximation is reasonable for n at least 30, as a rule-of-thumb
Though because this is an asymptotic approximation (i.e. as $n \to \infty$), the bigger n is, the better the normal approximation
Special case: if the population distribution is itself Normal, $\bar{X}$ will have an exact Normal distribution for any sample size n
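A small simulation can illustrate the mean and variance results. This sketch makes several assumptions that are not in the slides: the population is taken to be exponential with mean 1 and variance 1 (a clearly skewed choice), draws are independent (effectively an infinite population, so no finite population correction), and the seed and number of replications are arbitrary.

```python
import random
import statistics

random.seed(0)          # arbitrary seed, only for reproducibility
n = 30                  # sample size (the rule-of-thumb threshold)
replications = 5000     # number of simulated samples (arbitrary choice)

# Assumed population: exponential with rate 1, so mu = 1 and sigma^2 = 1.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.pvariance(sample_means)

print(round(mean_of_means, 3))      # should be close to mu = 1
print(round(variance_of_means, 4))  # should be close to sigma^2 / n = 1/30
```

A histogram of `sample_means` would look roughly bell-shaped, although the simulation alone does not prove normality.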
Central Limit Theorem
Below is the sampling distribution of $\bar{X}$ for small (red) and large (black) n
As n increases, the sampling variability of $\bar{X}$ decreases
[Figure: "Sampling Distribution of Sample Mean". Horizontal axis: sample mean; vertical axis: density]
Central Limit Theorem
Although the shape of the population distribution does not affect the generality of the CLT result, it does affect the speed of convergence of the sampling distribution of $\bar{X}$ to the Normal distribution
A symmetric population distribution would typically converge faster in n than a heavily skewed one
In practice, n = 30 is usually adequate to make the Normal approximation reasonable
Central Limit Theorem
Remember the CLT is based on SRS
Without probability sampling methods, there is
absolutely no basis for the use of the CLT
This is principally why we insist on probability
(random) sampling
Otherwise the whole structure of statistical inference
collapses!
Central Limit Theorem
The CLT also makes the use of the variance more
reasonable
The Normal distribution is completely characterised by its mean and variance
Hence it is sensible to focus attention on these two
characteristics of the sampling distribution
Principles of confidence intervals
A point estimate is our best guess of an unknown
population parameter based on sample data
But as it's based on a sample, there is some uncertainty/imprecision
Confidence intervals (CIs) communicate the level
of imprecision
Principles of confidence intervals
Formally, an x% confidence interval covers the unknown parameter with x% probability over repeated samples
The shorter the confidence interval, the more reliable
the estimate
As we shall see, this is achievable by:
reducing the level of confidence
increasing the sample size
We now look at how to construct CIs
Principles of confidence intervals
The general format (for our purposes) for a confidence interval is
statistic ± (multiplier coefficient) × standard error
Alternatively,
estimate ± margin of error
CI for $\mu$ (variance known)
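Point estimate for $\mu$ is calculated using
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Assuming the (population) variance $\sigma^2$ is known, the standard error of $\bar{X}$ is
$$\mathrm{S.E.}(\bar{X}) = \sigma_{\bar{X}} = \sqrt{\frac{N-n}{N-1} \cdot \frac{\sigma^2}{n}} \approx \frac{\sigma}{\sqrt{n}}$$
Hence a 95% confidence interval for $\mu$ is
$$\bar{X} \pm 1.96 \frac{\sigma}{\sqrt{n}} = \left( \bar{X} - 1.96 \frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}} \right)$$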
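A minimal sketch of the interval formula follows. The function name and the numbers used (sample mean 50, $\sigma$ = 10, n = 100) are hypothetical values chosen purely for illustration; they do not come from the slides. The sketch also assumes a small sampling fraction, so the standard error is taken as $\sigma/\sqrt{n}$.

```python
import math

def confidence_interval(sample_mean, sigma, n, multiplier=1.96):
    """Interval sample_mean +/- multiplier * sigma / sqrt(n), with sigma assumed known."""
    standard_error = sigma / math.sqrt(n)
    margin_of_error = multiplier * standard_error
    return sample_mean - margin_of_error, sample_mean + margin_of_error

# Hypothetical inputs, for illustration only.
lower, upper = confidence_interval(sample_mean=50.0, sigma=10.0, n=100)

print(round(lower, 2), round(upper, 2))
```

Here the standard error is 1, so the margin of error equals the multiplier and the interval has width 2 × 1.96.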
CI for $\mu$ (variance known)
This is a simple, but important result, forming a useful template
Note the above interval was for 95% confidence
Other levels of confidence pose no problem, but require a different multiplier coefficient
When the variance ($\sigma^2$) is known we obtain a multiplier from the standard normal distribution
For 90% confidence, use the multiplier 1.645
For 95% confidence, use the multiplier 1.96
For 99% confidence, use the multiplier 2.576
Hence a 99% confidence interval for $\mu$ is
$$\bar{X} \pm 2.576 \frac{\sigma}{\sqrt{n}} = \left( \bar{X} - 2.576 \frac{\sigma}{\sqrt{n}},\ \bar{X} + 2.576 \frac{\sigma}{\sqrt{n}} \right)$$
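The multipliers quoted above are quantiles of the standard normal distribution and can be recovered with Python's standard library (`statistics.NormalDist`, available from Python 3.8). The helper name `z_multiplier` is an assumption of this sketch.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided multiplier: the (1 + confidence) / 2 quantile of the standard normal."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_multiplier(level), 3))
```

For a two-sided interval the leftover probability is split equally between the tails, which is why the 95% multiplier is the 97.5% quantile.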
CI for $\mu$ (variance known)
So we see that a higher level of confidence (a good thing) leads to a larger multiplier coefficient, and hence a wider confidence interval (a bad thing)
Hence, other things equal, we face a trade-off between level of confidence and width of confidence interval
Since the width of a CI is part-determined by the standard error, by increasing n (costly) we will reduce the standard error, hence shorten the CI (a good thing)
CI for $\mu$ (variance unknown)
Unfortunately, the approach just discussed requires knowledge of the population variance, $\sigma^2$
This is because it is used in the standard error:
$\bar{X} \pm z \frac{\sigma}{\sqrt{n}}$
In practice, we are unlikely to know $\sigma^2$
After all, it is a population characteristic, and so if we do not know $\mu$, why would we know $\sigma^2$?
CI for $\mu$ (variance unknown)
Recall the sampling variance of $\bar{X}$ is
$\mathrm{Var}(\bar{X}) = \sigma^2_{\bar{X}} = \frac{N - n}{N - 1} \cdot \frac{\sigma^2}{n} \approx \frac{\sigma^2}{n}$
But if $\sigma^2$ is unknown we have a problem
It is not that we are fundamentally interested in $\sigma^2$, only that we need to estimate it because the precision of $\bar{X}$ depends on it
And there is little point having a point estimate if we know nothing about its precision
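To see why the finite population factor is usually dropped, the sketch below evaluates the sampling variance of the mean with and without the factor (N - n)/(N - 1). The function name and the input values are hypothetical.

```python
from typing import Optional


def sampling_variance(sigma2: float, n: int, N: Optional[int] = None) -> float:
    """Variance of the sample mean under simple random sampling.

    If a population size N is given, the finite population factor
    (N - n) / (N - 1) is applied; otherwise sigma^2 / n is returned.
    """
    var = sigma2 / n
    if N is not None:
        var *= (N - n) / (N - 1)
    return var


# Hypothetical values: sigma^2 = 9 and n = 100.
print(sampling_variance(9.0, 100))               # no correction
print(sampling_variance(9.0, 100, N=1_000_000))  # large population: almost identical
print(sampling_variance(9.0, 100, N=200))        # half the population sampled: much smaller
```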
CI for $\mu$ (variance unknown)
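Our estimate of $\mathrm{Var}(\bar{X})$ is
$s^2_{\bar{X}} = \frac{N - n}{N} \cdot \frac{s^2}{n} \approx \frac{s^2}{n}$
where
$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n - 1} \left( \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right)$
Our estimate of the standard error is thus
$s_{\bar{x}} = \sqrt{\frac{N - n}{N} \cdot \frac{s^2}{n}} \approx \frac{s}{\sqrt{n}}$
since typically $\frac{N - n}{N} \approx 1$ in the social sciences
Once we have estimated this, we proceed as before to construct a CI using the estimate of the standard error in place of the actual standard error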
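A minimal sketch of these estimates, assuming a small made-up data set. It computes the sample variance by both the definitional and the shortcut formula, then the estimated standard error, with the (N - n)/N factor applied only when a population size is supplied. All names and data values are hypothetical.

```python
from math import sqrt


def sample_variance(xs):
    """Definitional form: sum of squared deviations over (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Shortcut form: (sum of squares - n * mean^2) over (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return (sum(x * x for x in xs) - n * x_bar ** 2) / (n - 1)


def estimated_standard_error(xs, N=None):
    """Estimated standard error of the mean; N is the population size, if known."""
    n = len(xs)
    var = sample_variance(xs) / n
    if N is not None:
        var *= (N - n) / N
    return sqrt(var)


data = [4.0, 7.0, 6.0, 3.0, 5.0]  # made-up observations
print(sample_variance(data), sample_variance_shortcut(data))
print(estimated_standard_error(data))
```

The shortcut form is convenient by hand, but it can lose accuracy in floating point when the mean is large relative to the spread, so the definitional form is used inside `estimated_standard_error`.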
CI for $\mu$ (variance unknown)
So, for a 90% confidence interval we use
$\bar{x} \pm 1.645 \frac{s}{\sqrt{n}}$
Similarly, for a 95% confidence interval we use
$\bar{x} \pm 1.96 \frac{s}{\sqrt{n}}$
Finally, for a 99% confidence interval we use
$\bar{x} \pm 2.576 \frac{s}{\sqrt{n}}$
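The three intervals can be sketched for an actual sample as follows. The code uses the standard normal multipliers, as the slides do, which is an approximation best suited to large samples. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def ci_unknown_variance(xs, confidence=0.95):
    """CI for the mean using s / sqrt(n) as the estimated standard error."""
    n = len(xs)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * stdev(xs) / sqrt(n)  # stdev() divides by n - 1
    centre = mean(xs)
    return centre - margin, centre + margin


sample = [4.0, 7.0, 6.0, 3.0, 5.0]  # made-up observations
for level in (0.90, 0.95, 0.99):
    print(level, ci_unknown_variance(sample, level))
```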
Choosing sample size
Note the trade-off between accuracy and data cost
Solution: fix desired precision and find smallest n which achieves this
If we want the sample mean to be within a tolerance d of $\mu$ with a specified probability, then
$d = z \frac{\sigma}{\sqrt{n}} \implies n = \frac{z^2 \sigma^2}{d^2}$
n is the minimum sample size required to achieve the desired precision
n must be an integer, so always round up!
Choosing sample size: Example
A random sample is to be taken from a population with unknown mean $\mu$ and $\sigma = 3$
How big a sample size would be needed if there is to be a 95% chance of $\bar{X}$ being within 1 unit of $\mu$?
The sample size n required for a tolerance of 1 satisfies
$1 = 1.96 \times \frac{3}{\sqrt{n}} \implies n = 34.57 \implies n = 35$
Note that the required sample size in this type of calculation needs to be rounded up from a decimal fraction, since rounding down would result in a value not quite large enough!
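The calculation above can be sketched in a few lines. The function name is hypothetical, and the multiplier defaults to the 95% value of 1.96 used in the example.

```python
from math import ceil


def required_sample_size(sigma: float, d: float, z: float = 1.96) -> int:
    """Smallest n with z * sigma / sqrt(n) <= d, i.e. n >= (z * sigma / d)^2."""
    # ceil() enforces the "always round up" rule.
    return ceil((z * sigma / d) ** 2)


print(required_sample_size(sigma=3, d=1))    # the example: 34.57 rounds up to 35
print(required_sample_size(sigma=3, d=0.5))  # halving the tolerance roughly quadruples n
```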
Adjusting the statistically determined sample size
Incidence rate refers to the rate of occurrence, or the percentage, of persons eligible to participate in the study
In general, if there are k qualifying factors with an incidence of Q1, Q2, Q3, ..., Qk, each expressed as a proportion:
Incidence rate = Q1 × Q2 × Q3 × ... × Qk
The completion rate is the percentage of qualified respondents who complete the interview, enabling researchers to account for anticipated refusals by people who qualify
Initial sample size = Final sample size / (Incidence rate × Completion rate)
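A small sketch of this adjustment, assuming made-up qualifying factors and completion rates. The function names are hypothetical, and the result is rounded up on the same reasoning as for the statistically determined sample size.

```python
from math import ceil, prod


def incidence_rate(factors):
    """Product of the qualifying-factor incidences, each given as a proportion."""
    return prod(factors)


def initial_sample_size(final_n, factors, completion_rate):
    """Number of people to contact so that about final_n completed interviews remain."""
    return ceil(final_n / (incidence_rate(factors) * completion_rate))


# Hypothetical study: 100 completed interviews needed, two qualifying
# factors each met by half of the population, and a 50% completion rate.
print(initial_sample_size(100, [0.5, 0.5], 0.5))
```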
Adjusting for non-response
Sub-sampling of non-respondents: the researcher contacts a sub-sample of the non-respondents, usually by means of telephone or personal interviews
In replacement, the non-respondents in the current survey are replaced with non-respondents from an earlier, similar survey
The researcher attempts to contact these non-respondents from the earlier survey and administer the current survey questionnaire to them, possibly by offering a suitable incentive
Adjusting for non-response
In substitution, the researcher substitutes for non-respondents other elements from the sampling frame that are expected to respond
The sampling frame is divided into sub-groups that are internally homogeneous in terms of respondent characteristics but heterogeneous in terms of response rates
These sub-groups are then used to identify substitutes who are similar to particular non-respondents but dissimilar to respondents already in the sample
Adjusting for non-response
Subjective estimates: when it is no longer feasible to increase the response rate by sub-sampling, replacement, or substitution, it may be possible to arrive at subjective estimates of the nature and effect of non-response bias
This involves evaluating the likely effects of non-response based on experience and available information
Imputation
Imputation involves imputing, or assigning, the
characteristic of interest to the non-respondents
based on the similarity of the variables available for
both non-respondents and respondents
For example, a respondent who does not report
brand usage may be imputed the usage of a
respondent with similar demographic
characteristics
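One simple way to carry out such an imputation is to borrow the value from the most similar respondent. The sketch below is only an illustration under stated assumptions: the record layout, the field names (`age`, `income`, `brand_usage`), the data and the unscaled squared-distance similarity measure are all hypothetical, and the slides do not prescribe a particular method.

```python
def impute_brand_usage(records, keys=("age", "income")):
    """Fill missing 'brand_usage' from the most similar respondent who reported it."""
    donors = [r for r in records if r["brand_usage"] is not None]
    completed = []
    for r in records:
        if r["brand_usage"] is None:
            # Nearest donor by squared distance on the demographic variables.
            donor = min(donors, key=lambda d: sum((d[k] - r[k]) ** 2 for k in keys))
            r = {**r, "brand_usage": donor["brand_usage"]}  # copy, do not mutate
        completed.append(r)
    return completed


# Made-up respondents; income is in thousands.
respondents = [
    {"age": 25, "income": 30, "brand_usage": "heavy"},
    {"age": 52, "income": 70, "brand_usage": "light"},
    {"age": 27, "income": 32, "brand_usage": None},
]
print(impute_brand_usage(respondents)[2])
```

In practice the demographic variables would normally be put on a comparable scale before distances are computed, otherwise the variable with the largest units dominates the match.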