6 inference intervals sample size

Upload: eduson2013

Post on 03-Apr-2018


TRANSCRIPT

  • 7/28/2019 6 Inference Intervals Sample Size


Inference, confidence intervals and sample size determination

Dr James Abdey

Overview
Choosing a sample size
Estimation
Sampling distribution of $\bar{X}$
Sampling distribution properties
Sample size and sampling fraction
Central Limit Theorem
Principle of confidence intervals
Construction: CI for $\bar{X}$, variance known
Construction: CI for $\bar{X}$, variance unknown
Choosing sample size
Adjusting the statistically determined sample size
Adjusting for non-response

Applied Marketing (Market Research Methods)

Topic 6: Inference, confidence intervals and sample size determination

Dr James Abdey


Overview

Here we consider sample size determination in simple random sampling.

Properties of the sampling distribution are discussed.

We describe the adjustments required to statistically determined sample sizes to account for incidence and completion rates.

Non-response issues in sampling are also covered, along with ways of improving response rates.


Choosing a sample size

The question "How big a sample do I need to take?" is a common one when sampling data. The answer depends on the quality of inference that the researcher requires from the data.

In the estimation context this can be expressed in terms of the accuracy of estimation. If the researcher requires a 95% chance that the estimation error is no bigger than $d$ units (we refer to $d$ as the tolerance), then this is equivalent to having a 95% confidence interval of width $2d$.

Note that $d$ represents the half-width of the confidence interval, since the point estimate is, by construction, at the centre of the confidence interval.
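As a rough illustration of how the tolerance $d$ ties to $n$, the Python sketch below finds the smallest $n$ whose 95% interval half-width is at most $d$. It assumes the Normal approximation, a known population standard deviation $\sigma$, and ignores the finite population correction; the function `required_sample_size` and the numbers $\sigma = 2$, $d = 0.5$ are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma: float, d: float, confidence: float = 0.95) -> int:
    """Smallest n with Normal-approximation half-width z * sigma / sqrt(n) <= d.

    Assumes sigma is known and ignores the finite population correction.
    """
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Solve z * sigma / sqrt(n) <= d for n, then round up to a whole unit
    return math.ceil((z * sigma / d) ** 2)


n = required_sample_size(sigma=2.0, d=0.5)
print(n)  # 62
```

Rounding up matters: rounding down would give a half-width slightly larger than the requested tolerance.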


Simple random sampling (SRS)

Recall that a simple random sample is a sample selected by a process such that every possible sample (of the same size, $n$) has the same probability of selection.

The selection process is left to chance, thus eliminating the effect of selection bias.

Due to the random selection mechanism, we do not know (in advance) which sample will occur.

Every population element has a known, non-zero probability of selection in the sample, but no element is certain to appear.
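A minimal Python sketch of the SRS mechanism, using a hypothetical six-element population: `random.Random.sample` draws without replacement, and an exact enumeration with `itertools.combinations` confirms that every element has the same non-zero inclusion probability (here $n/N = 1/3$) while none is certain to appear. The helper names `draw_srs` and `inclusion_probability` are illustrative only.

```python
import random
from fractions import Fraction
from itertools import combinations


def draw_srs(population, n, rng):
    """Draw a simple random sample of size n without replacement."""
    # Random.sample picks n distinct elements; each subset of size n is equally likely
    return rng.sample(population, n)


def inclusion_probability(population, n, element):
    """Exact probability that `element` appears in an SRS of size n, by enumeration."""
    samples = list(combinations(population, n))
    hits = sum(1 for s in samples if element in s)
    return Fraction(hits, len(samples))


population = ["A", "B", "C", "D", "E", "F"]
rng = random.Random(2013)  # fixed seed only so the run is reproducible
print(draw_srs(population, 2, rng))
print(inclusion_probability(population, 2, "A"))  # 1/3
```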


Simple random sampling (SRS)

Example

Consider a population of size $N = 6$ elements: A, B, C, D, E and F.

We consider all possible samples of size $n = 2$ (without replacement).

There are $\binom{6}{2} = 15$ different, but equally likely, such samples:

AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, EF

Since this is SRS, each sample has a probability of selection of $1/15$.
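The 15 samples can be listed mechanically. This short Python sketch (variable names are illustrative) enumerates them with `itertools.combinations` and computes the common selection probability $1/\binom{6}{2}$ with `math.comb`.

```python
import math
from fractions import Fraction
from itertools import combinations

population = ["A", "B", "C", "D", "E", "F"]  # N = 6
n = 2

# All unordered samples of size n drawn without replacement
samples = ["".join(s) for s in combinations(population, n)]
# Under SRS every one of these samples is equally likely
prob_each = Fraction(1, math.comb(len(population), n))

print(len(samples))  # 15
print(samples)
print(prob_each)  # 1/15
```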


Estimation

A population has particular characteristics of interest, such as the mean and the variance. Collectively, we refer to these characteristics as parameters.

If we do not have population data, the parameter values will be unknown.

Statistical inference is the process of estimating the (unknown) parameter values using the (known) sample data.

We use a statistic (estimator) calculated from the sample observations to provide a point estimate.


Estimation example

Returning to our example, recall there are 15 different samples of size 2 from a population of size 6.

Suppose the variable of interest is income:

| Individual | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| Income (in 000s) | 3 | 6 | 4 | 9 | 7 | 7 |

If we seek the population mean, $\mu$, we will use the sample mean, $\bar{X}$, as our estimator:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

For example, if the observed sample was AB, the sample mean is $(3{,}000 + 6{,}000)/2 = 4{,}500$.


Estimation example

Clearly, different observed samples will lead to different sample means.

Consider $\bar{X}$ for all possible samples (in 000s):

| Sample | Values | $\bar{X}$ |
|---|---|---|
| AB | 3, 6 | 4.5 |
| AC | 3, 4 | 3.5 |
| AD | 3, 9 | 6.0 |
| AE | 3, 7 | 5.0 |
| AF | 3, 7 | 5.0 |
| BC | 6, 4 | 5.0 |
| BD | 6, 9 | 7.5 |
| BE | 6, 7 | 6.5 |
| BF | 6, 7 | 6.5 |
| CD | 4, 9 | 6.5 |
| CE | 4, 7 | 5.5 |
| CF | 4, 7 | 5.5 |
| DE | 9, 7 | 8.0 |
| DF | 9, 7 | 8.0 |
| EF | 7, 7 | 7.0 |

So the $\bar{X}$ values vary from 3.5 to 8, depending on the sample values.
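A minimal Python sketch reproducing the table: it applies $\bar{X} = \frac{1}{n}\sum X_i$ to every sample of size 2. The `income` dictionary simply encodes the income table (in 000s); the names are illustrative.

```python
from itertools import combinations

# Incomes in 000s, from the example table
income = {"A": 3, "B": 6, "C": 4, "D": 9, "E": 7, "F": 7}

sample_means = {}
for pair in combinations(income, 2):
    values = [income[k] for k in pair]
    # X-bar = (1/n) * sum of the sampled X_i
    sample_means["".join(pair)] = sum(values) / len(values)

print(sample_means["AB"])  # 4.5
print(min(sample_means.values()), max(sample_means.values()))  # 3.5 8.0
```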


Sampling distribution of $\bar{X}$

The previous slide showed all possible values of the estimator $\bar{X}$.

Since we have the population data here, we can actually compute the population mean (in 000s):

$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i = \frac{3 + 6 + 4 + 9 + 7 + 7}{6} = 6$$

So even with SRS, we obtain some $\bar{X}$ values far from $\mu$.

Here only one sample (AD) results in $\bar{X} = \mu$.
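The same calculation in Python, as an illustrative sketch: `statistics.fmean` gives $\mu$, and a scan over all 15 samples confirms that only AD hits it exactly.

```python
from itertools import combinations
from statistics import fmean

income = {"A": 3, "B": 6, "C": 4, "D": 9, "E": 7, "F": 7}  # in 000s
mu = fmean(income.values())  # population mean, (1/N) * sum of X_i

# Samples of size 2 whose mean equals the population mean exactly
exact_hits = [
    "".join(pair)
    for pair in combinations(income, 2)
    if fmean(income[k] for k in pair) == mu
]

print(mu)  # 6.0
print(exact_hits)  # ['AD']
```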


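Sampling distribution of $\bar{X}$

Let's now consider the maximum absolute estimation error, $\lvert\bar{X} - \mu\rvert$. The table counts the samples whose error is at most each value:

| Maximum $\lvert\bar{X} - \mu\rvert$ | Number of samples | Probability |
|---|---|---|
| 0 | 1 | 0.067 |
| 0.5 | 6 | 0.400 |
| 1 | 10 | 0.667 |
| 1.5 | 12 | 0.800 |
| 2 | 14 | 0.933 |
| 2.5 | 15 | 1.000 |

So, for example, there is an 80% chance of $\bar{X}$ being within 1.5 units of $\mu$.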
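The cumulative counts in the table can be checked with a short Python sketch. It uses exact `Fraction` arithmetic so that comparisons such as $\lvert\bar{X} - \mu\rvert \le 0.5$ are not affected by floating-point rounding; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

income = {"A": 3, "B": 6, "C": 4, "D": 9, "E": 7, "F": 7}  # in 000s
mu = Fraction(sum(income.values()), len(income))  # 6

# Absolute estimation error |X-bar - mu| for every sample of size 2
errors = [abs(Fraction(income[a] + income[b], 2) - mu) for a, b in combinations(income, 2)]

# Number of samples with error at most d, for tolerances 0, 0.5, ..., 2.5
tolerances = [Fraction(k, 2) for k in range(6)]
cumulative = {d: sum(1 for e in errors if e <= d) for d in tolerances}

for d, count in cumulative.items():
    print(f"{float(d):.1f}  {count:2d}  {count / len(errors):.3f}")
```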


Sampling distribution of $\bar{X}$

We now represent this as a frequency distribution, i.e. we record the frequency of each possible value of $\bar{X}$:

| $\bar{X}$ | Frequency | Relative frequency |
|---|---|---|
| 3.5 | 1 | 1/15 = 0.067 |
| 4.5 | 1 | 1/15 = 0.067 |
| 5.0 | 3 | 3/15 = 0.200 |
| 5.5 | 2 | 2/15 = 0.133 |
| 6.0 | 1 | 1/15 = 0.067 |
| 6.5 | 3 | 3/15 = 0.200 |
| 7.0 | 1 | 1/15 = 0.067 |
| 7.5 | 1 | 1/15 = 0.067 |
| 8.0 | 2 | 2/15 = 0.133 |

This is known as the sampling distribution of $\bar{X}$.
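A Python sketch building this frequency distribution with `collections.Counter` (names are illustrative; the incomes are those of the example).

```python
from collections import Counter
from itertools import combinations

income = {"A": 3, "B": 6, "C": 4, "D": 9, "E": 7, "F": 7}  # in 000s
means = [(income[a] + income[b]) / 2 for a, b in combinations(income, 2)]

freq = Counter(means)  # frequency of each possible value of X-bar
total = sum(freq.values())

for value in sorted(freq):
    print(f"{value:.1f}  {freq[value]}  {freq[value]}/{total} = {freq[value] / total:.3f}")
```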


Sampling distribution of $\bar{X}$

The sampling distribution is a central and vital concept in statistics.

It can be used to evaluate how good an estimator is; specifically, we care about how close the estimator is likely to be to the population parameter of interest.

As we have seen, different samples yield different $\bar{X}$ values, as a consequence of the random sampling procedure.

Hence estimators (of which $\bar{X}$ is an example) are random variables. So $\bar{X}$ is our estimator of $\mu$, and the observed value of $\bar{X}$ is a point estimate.


Sampling distribution properties

Like any distribution, we care about a sampling distribution's mean and variance. Together, they allow us to assess how good an estimator is.

First, consider the mean: we seek an estimator which does not mislead us systematically.

So the average (mean) value of an estimator, over all possible samples, should be equal to the population parameter.


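Sampling distribution properties

Returning to our example:

| $\bar{X}$ | Frequency ($f$) | Product ($\bar{X} f$) |
|---|---|---|
| 3.5 | 1 | 3.5 |
| 4.5 | 1 | 4.5 |
| 5.0 | 3 | 15.0 |
| 5.5 | 2 | 11.0 |
| 6.0 | 1 | 6.0 |
| 6.5 | 3 | 19.5 |
| 7.0 | 1 | 7.0 |
| 7.5 | 1 | 7.5 |
| 8.0 | 2 | 16.0 |
| Total | 15 | 90.0 |

Hence the mean of this sampling distribution is $90/15 = 6$.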
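In Python, the mean of the sampling distribution is just a frequency-weighted average of the table rows. This is a minimal sketch with the frequency table hard-coded from the example.

```python
# Sampling distribution of X-bar (n = 2): value -> frequency f
freq = {3.5: 1, 4.5: 1, 5.0: 3, 5.5: 2, 6.0: 1, 6.5: 3, 7.0: 1, 7.5: 1, 8.0: 2}

total = sum(freq.values())                          # number of samples
weighted_sum = sum(x * f for x, f in freq.items())  # sum of X-bar * f
expected = weighted_sum / total                     # mean of the sampling distribution

print(total, weighted_sum, expected)
```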


Sampling distribution properties

An important difference between a sampling distribution and other distributions is that the values in a sampling distribution are summary measures of whole samples (i.e. statistics/estimators) rather than individual observations.

Formally, the mean of a sampling distribution is called the expected value of the estimator, denoted by $E[\cdot]$.

Hence the expected value of the sample mean is $E[\bar{X}]$.

An unbiased estimator has its expected value equal to the parameter being estimated.

For our example, $E[\bar{X}] = 6 = \mu$.


Sampling distribution properties

Fortunately, the sample mean $\bar{X}$ is always an unbiased estimator of $\mu$ in SRS, regardless of the sample size, $n$, and regardless of the distribution of the (parent) population.

This is a good illustration of a population parameter, $\mu$, being estimated by its sample counterpart, $\bar{X}$.
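As an illustrative check (not a proof), this Python sketch averages $\bar{X}$ over every possible SRS of each size $n = 1, \dots, 6$ from the example population; exact `Fraction` arithmetic returns 6 every time. It only exercises one finite population, so it speaks to the sample-size part of the claim rather than to arbitrary parent distributions. The helper `expected_sample_mean` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

income = [3, 6, 4, 9, 7, 7]  # population values (in 000s)
mu = Fraction(sum(income), len(income))


def expected_sample_mean(values, n):
    """Average of X-bar over every SRS of size n (each sample equally likely)."""
    means = [Fraction(sum(s), n) for s in combinations(values, n)]
    return sum(means) / len(means)


for n in range(1, len(income) + 1):
    print(n, expected_sample_mean(income, n))
```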


Sampling distribution properties

The unbiasedness of an estimator is clearly desirable; however, we also need to take into account the dispersion of the estimator's sampling distribution.

Ideally, the possible values of the estimator should not vary much around the true parameter value. So we seek an estimator with a small variance.

Recall that the variance is defined to be the mean of the squared deviations about the mean of the distribution.

In the case of sampling distributions, it is referred to as the sampling variance.


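Sampling distribution properties

Returning to our example:

| $\bar{X}$ | $\bar{X} - \mu$ | $(\bar{X} - \mu)^2$ | Frequency | Product |
|---|---|---|---|---|
| 3.5 | -2.5 | 6.25 | 1 | 6.25 |
| 4.5 | -1.5 | 2.25 | 1 | 2.25 |
| 5.0 | -1.0 | 1.00 | 3 | 3.00 |
| 5.5 | -0.5 | 0.25 | 2 | 0.50 |
| 6.0 | 0.0 | 0.00 | 1 | 0.00 |
| 6.5 | 0.5 | 0.25 | 3 | 0.75 |
| 7.0 | 1.0 | 1.00 | 1 | 1.00 |
| 7.5 | 1.5 | 2.25 | 1 | 2.25 |
| 8.0 | 2.0 | 4.00 | 2 | 8.00 |
| Total | | | 15 | 24.00 |

Hence the sampling variance is $24/15 = 1.6$.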
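The same sampling variance by brute force, as a minimal Python sketch: square each sample mean's deviation from $\mu$ and average over the 15 equally likely samples.

```python
from fractions import Fraction
from itertools import combinations

income = [3, 6, 4, 9, 7, 7]  # in 000s
means = [Fraction(a + b, 2) for a, b in combinations(income, 2)]
mu = Fraction(sum(income), len(income))

# Mean of the squared deviations of X-bar about its mean (which equals mu here)
sampling_variance = sum((m - mu) ** 2 for m in means) / len(means)
print(sampling_variance, float(sampling_variance))  # 8/5 1.6
```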


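Sampling distribution properties

The population itself has a variance, the population variance, $\sigma^2$:

| $X$ | $X - \mu$ | $(X - \mu)^2$ | Frequency | Product |
|---|---|---|---|---|
| 3 | -3 | 9 | 1 | 9 |
| 6 | 0 | 0 | 1 | 0 |
| 4 | -2 | 4 | 1 | 4 |
| 9 | 3 | 9 | 1 | 9 |
| 7 | 1 | 1 | 2 | 2 |
| Total | | | 6 | 24 |

Hence the population variance is $\sigma^2 = 24/6 = 4$.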
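In Python, `statistics.pvariance` divides by $N$ and so matches this population variance; a one-line sketch:

```python
from statistics import pvariance

income = [3, 6, 4, 9, 7, 7]  # the whole population (in 000s)
sigma2 = pvariance(income)   # divides by N, i.e. the population variance
print(float(sigma2))  # 4.0
```

Note that `statistics.variance` would divide by $N - 1$ instead and give 4.8, which is not the population variance used here.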


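Sampling distribution properties

We now consider the relationship between $\sigma^2$ and the sampling variance.

Intuitively, a larger $\sigma^2$ should lead to a larger sampling variance. Why?

For population size $N$ and sample size $n$,

$$\mathrm{Var}(\bar{X}) = \frac{N-n}{N-1} \cdot \frac{\sigma^2}{n}$$

So for our example,

$$\mathrm{Var}(\bar{X}) = \frac{6-2}{6-1} \cdot \frac{4}{2} = 1.6$$

We use the term standard error to refer to the standard deviation of the sampling distribution:

$$\mathrm{S.E.}(\bar{X}) = \sqrt{\mathrm{Var}(\bar{X})} = \sqrt{\frac{N-n}{N-1} \cdot \frac{\sigma^2}{n}} = \sigma_{\bar{X}}$$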
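A Python sketch comparing the formula with brute-force enumeration for every $n$ from 1 to $N$ in the example population, then taking the square root for the standard error. The function names `var_xbar` and `var_xbar_enumerated` are illustrative; the formula is the one stated for SRS without replacement.

```python
import math
from fractions import Fraction
from itertools import combinations


def var_xbar(N, n, sigma2):
    """Sampling variance of X-bar under SRS without replacement."""
    return Fraction(N - n, N - 1) * sigma2 / n


def var_xbar_enumerated(values, n):
    """Sampling variance of X-bar by brute force over all samples of size n."""
    means = [Fraction(sum(s), n) for s in combinations(values, n)]
    centre = sum(means) / len(means)
    return sum((m - centre) ** 2 for m in means) / len(means)


income = [3, 6, 4, 9, 7, 7]  # population (in 000s)
N = len(income)
mu = Fraction(sum(income), N)
sigma2 = sum((x - mu) ** 2 for x in income) / N  # population variance, 4

# The formula agrees exactly with enumeration for every sample size
for n in range(1, N + 1):
    assert var_xbar(N, n, sigma2) == var_xbar_enumerated(income, n)

print(var_xbar(N, 2, sigma2))             # 8/5, i.e. 1.6
print(math.sqrt(var_xbar(N, 2, sigma2)))  # standard error, about 1.265
```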


Sampling distribution properties

Implications:

As the sample size, $n$, increases, the sampling variance decreases, i.e. the precision increases.¹

Provided the sampling fraction, $n/N$, is small, the term $\frac{N-n}{N-1} \approx 1$, so it can be ignored: the precision depends effectively on $n$ only.

¹ Although greater precision is desirable, data collection costs will rise with $n$ (remember why we sample in the first place!).


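Sample size and sampling fraction

The larger the sample, the less variability there will be between samples. The table gives the frequency of each value of $\bar{X}$ over all samples of size $n = 2$ and of size $n = 4$:

| $\bar{X}$ | $n = 2$ | $n = 4$ |
|---|---|---|
| 3.50 | 1 | |
| 4.50 | 1 | |
| 5.00 | 3 | 2 |
| 5.25 | | 1 |
| 5.50 | 2 | 1 |
| 5.75 | | 3 |
| 6.00 | 1 | 1 |
| 6.25 | | 2 |
| 6.50 | 3 | 3 |
| 6.75 | | 1 |
| 7.00 | 1 | |
| 7.25 | | 1 |
| 7.50 | 1 | |
| 8.00 | 2 | |
| Total | 15 | 15 |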
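To reproduce the comparison, this Python sketch (hypothetical helper `sampling_summary`) enumerates all samples of size 2 and of size 4 and reports the range and sampling variance of $\bar{X}$ for each.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

income = [3, 6, 4, 9, 7, 7]  # in 000s


def sampling_summary(values, n):
    """Frequency table, range and sampling variance of X-bar for SRS of size n."""
    means = [Fraction(sum(s), n) for s in combinations(values, n)]
    centre = sum(means) / len(means)
    variance = sum((m - centre) ** 2 for m in means) / len(means)
    return Counter(means), (min(means), max(means)), variance


for n in (2, 4):
    freq, (lo, hi), var = sampling_summary(income, n)
    print(f"n={n}: range {float(lo)} to {float(hi)}, sampling variance {float(var)}")
```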


Sample size and sampling fraction

There is a striking improvement in the precision of the estimator, as the variability has decreased considerably.

The range of possible $\bar{X}$ values narrows from [3.5, 8.0] for $n = 2$ to [5.0, 7.25] for $n = 4$.

The sampling variance is reduced from 1.6 to 0.4 (consistent with $\frac{6-4}{6-1} \cdot \frac{4}{4} = 0.4$).

Note: precision in statistics refers to the inverse of the sampling variance.


Sample size and sampling fraction

The factor $\frac{N-n}{N-1}$ decreases steadily as $n \to N$. When $n = 1$ the factor equals 1, and when $n = N$ it equals 0.

When sampling without replacement, increasing $n$ must increase precision, since less of the population is left out.

In much practical sampling $N$ is very large (e.g. several million), while $n$ is comparatively small (e.g. at most 1,000, say).

Therefore, in such cases the factor $\frac{N-n}{N-1}$ is very close to 1 and its effect becomes negligible, hence

$$\mathrm{Var}(\bar{X}) = \frac{N-n}{N-1} \cdot \frac{\sigma^2}{n} \approx \frac{\sigma^2}{n} \quad \text{for small } n/N$$
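A small Python sketch of the factor $\frac{N-n}{N-1}$: for the six-element example it falls from 1 at $n = 1$ to 0 at $n = N$, while for a hypothetical population of 5 million with $n = 1{,}000$ it is essentially 1. The function name `fpc` (finite population correction) is a label chosen here, not one used in the slides.

```python
from fractions import Fraction


def fpc(N, n):
    """Finite population correction factor (N - n) / (N - 1)."""
    return Fraction(N - n, N - 1)


# Small population from the example: the factor falls from 1 (n = 1) to 0 (n = N)
print([float(fpc(6, n)) for n in range(1, 7)])  # [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]

# Hypothetical large population with a modest sample: the factor is essentially 1
print(float(fpc(5_000_000, 1_000)))
```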


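Sample size and sampling fraction

$n/N$ is called the sampling fraction.

When $N$ is large, it is the sample size $n$ which is important in determining precision, not the sampling fraction.

Consider two populations, $N_1 = 3$ million and $N_2 = 200$ million, both with the same variance $\sigma^2$.

We sample $n_1 = n_2 = 1{,}000$ from each population; then

$$\sigma^2_{\bar{X}_1} = \frac{N_1 - n_1}{N_1 - 1} \cdot \frac{\sigma^2}{n_1} = (0.999667)\,\frac{\sigma^2}{1000}$$

$$\sigma^2_{\bar{X}_2} = \frac{N_2 - n_2}{N_2 - 1} \cdot \frac{\sigma^2}{n_2} = (0.999995)\,\frac{\sigma^2}{1000}$$

So $\sigma^2_{\bar{X}_1} \approx \sigma^2_{\bar{X}_2}$, despite $N_1 \ll N_2$.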
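The two correction factors can be checked numerically; a minimal Python sketch (the helper name `fpc` is illustrative):

```python
def fpc(N, n):
    """Finite population correction factor (N - n) / (N - 1)."""
    return (N - n) / (N - 1)


n = 1_000
for N in (3_000_000, 200_000_000):
    # Prints the factor to six decimal places, as in the slide
    print(f"N = {N:>11,}: factor = {fpc(N, n):.6f}")
```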


Central Limit Theorem

When sampling from (almost) any non-normal distribution, for sufficiently large $n$, $\bar{X}$:

1. is approximately normally distributed
2. has mean $\mu$
3. has variance $\sigma^2/n$ and standard error $\sigma/\sqrt{n}$.

The approximation is reasonable for $n$ at least 30, as a rule of thumb. However, because this is an asymptotic approximation (i.e. as $n \to \infty$), the bigger $n$ is, the better the normal approximation.

Special case: if the population distribution is itself Normal, $\bar{X}$ will have an exact Normal distribution for any sample size $n$.
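A simulation sketch of the CLT in Python, under stated assumptions: the population is taken to be Exponential with mean 1 (so $\mu = 1$, $\sigma^2 = 1$, and the distribution is strongly skewed), and draws are independent, which mimics sampling from a very large population where the finite population correction is negligible. With $n = 30$, the simulated sample means should have a mean close to 1 and a variance close to $1/30$. The seed, replication count and helper name `simulate_sample_means` are arbitrary choices.

```python
import random
from statistics import fmean, pvariance


def simulate_sample_means(n, reps, rng):
    """Means of `reps` independent samples of size n from an Exponential(1) population."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]


rng = random.Random(42)
n = 30
means = simulate_sample_means(n, reps=20_000, rng=rng)

# Exponential(1) has mu = 1 and sigma^2 = 1, so we expect a mean near 1 and variance near 1/n
print(round(fmean(means), 3), round(pvariance(means), 4), round(1 / n, 4))
```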


Central Limit Theorem

Below is the sampling distribution of $\bar{X}$ for small (red) and large (black) $n$. As $n$ increases, the sampling variability of $\bar{X}$ decreases.

[Figure: "Sampling Distribution of Sample Mean"; x-axis: sample mean; y-axis: density]


Central Limit Theorem

Although the shape of the population distribution does not affect the generality of the CLT result, it does affect the speed of convergence of the sampling distribution of $\bar{X}$ to the Normal distribution.

A symmetric population distribution would typically converge faster in $n$ than a heavily skewed one.

In practice, $n = 30$ is usually adequate to make the Normal approximation reasonable.
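One way to see the effect of shape is through skewness, a measure not used elsewhere in these slides. For $n$ independent draws, the skewness of $\bar{X}$ equals the population skewness divided by $\sqrt{n}$, so a skewed population needs a larger $n$ before $\bar{X}$ looks symmetric. This Python sketch tabulates that for an Exponential population (skewness 2) against a symmetric one (skewness 0). It assumes independent draws, and since skewness captures only one aspect of non-normality, it is an illustration rather than a full account of convergence; the function name is hypothetical.

```python
import math


def skewness_of_mean(population_skewness, n):
    """Skewness of X-bar for n independent draws: population skewness / sqrt(n)."""
    return population_skewness / math.sqrt(n)


# Exponential population (skewness 2) versus a symmetric population (skewness 0)
for n in (1, 10, 30, 100):
    print(n, round(skewness_of_mean(2.0, n), 3), skewness_of_mean(0.0, n))
```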


Central Limit Theorem

Remember that the CLT, as used here, is based on SRS.

Without probability sampling methods, there is absolutely no basis for the use of the CLT. This is principally why we insist on probability (random) sampling.

Otherwise, the whole structure of statistical inference collapses!


    Central Limit Theorem

    The CLT also justifies summarising the spread of the sampling distribution of $\bar{X}$ by its variance

    The Normal distribution is completely characterised by its mean and variance

    Hence it is sensible to focus attention on these two characteristics of the sampling distribution


    Principles of confidence intervals

    A point estimate is our best guess of an unknown population parameter based on sample data

    But as it is based on a sample, there is some uncertainty (imprecision) attached to it

    Confidence intervals (CIs) communicate the level of imprecision

    Principles of confidence intervals

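    Formally, an x% confidence interval covers the unknown parameter with x% probability over repeated samples: if we drew many samples and built an interval from each, about x% of those intervals would contain the parameter

    The shorter the confidence interval at a given level of confidence, the more precise the estimate

    As we shall see, a shorter interval is achievable by reducing the level of confidence or by increasing the sample size

    We now look at how to construct CIs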
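    The repeated-sampling interpretation can be checked by simulation. This minimal sketch assumes a Normal population with known $\sigma$; the values of $\mu$, $\sigma$, $n$ and the seed are arbitrary. It builds the interval $\bar{X} \pm z\sigma/\sqrt{n}$ for each simulated sample and counts how often the interval contains $\mu$.

```python
import random
import statistics
from statistics import NormalDist


def coverage(mu=10.0, sigma=2.0, n=25, level=0.95, reps=10000, seed=42):
    """Fraction of simulated samples whose known-variance CI contains mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + level) / 2)  # about 1.96 for 95%
    margin = z * sigma / n ** 0.5              # sigma is treated as known
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps


print(coverage())            # roughly 0.95
print(coverage(level=0.90))  # roughly 0.90
```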

    Principles of confidence intervals

    The general format (for our purposes) for a confidence interval is

    $$\text{statistic} \pm (\text{multiplier coefficient}) \times \text{standard error}$$

    Alternatively,

    $$\text{estimate} \pm \text{margin of error}$$

    CI for $\mu$ (variance known)

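    The point estimate for $\mu$ is calculated using $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

    Assuming the (population) variance $\sigma^2$ is known, the standard error of $\bar{X}$ is

    $$\text{S.E.}(\bar{X}) = \sigma_{\bar{X}} = \sqrt{\frac{N-n}{N-1}\cdot\frac{\sigma^2}{n}} \approx \frac{\sigma}{\sqrt{n}}$$

    Hence a 95% confidence interval for $\mu$ is

    $$\bar{X} \pm 1.96\,\frac{\sigma}{\sqrt{n}} = \left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right)$$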
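    As a sketch of the calculation, the hypothetical helper `z_interval` below computes the known-variance interval. When a population size $N$ is supplied, it also applies the finite population correction $(N-n)/(N-1)$. The function name and the example figures ($\bar{x} = 50$, $\sigma = 10$, $n = 100$) are illustrative assumptions.

```python
import math


def z_interval(xbar, sigma, n, z=1.96, N=None):
    """CI for mu when the population variance sigma^2 is known.

    If the population size N is given, the variance of the sample mean
    is multiplied by the finite population correction (N - n) / (N - 1);
    otherwise the approximation S.E. = sigma / sqrt(n) is used.
    """
    var_xbar = sigma ** 2 / n
    if N is not None:
        var_xbar *= (N - n) / (N - 1)
    margin = z * math.sqrt(var_xbar)  # multiplier x standard error
    return xbar - margin, xbar + margin


print(z_interval(50, 10, 100))          # approximately (48.04, 51.96)
print(z_interval(50, 10, 100, N=1000))  # slightly narrower
```

    With $N = 1000$ the sampling fraction is 10%, so the correction narrows the interval slightly. For much larger $N$ the two results are practically identical.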

    CI for $\mu$ (variance known)

    This is a simple but important result, forming a useful template

    Note the above interval was for 95% confidence

    Other levels of confidence pose no problem, but require a different multiplier coefficient

    When the variance ($\sigma^2$) is known we obtain the multiplier from the standard normal distribution

    For 90% confidence, use the multiplier 1.645

    For 95% confidence, use the multiplier 1.96

    For 99% confidence, use the multiplier 2.576

    Hence a 99% confidence interval for $\mu$ is

    $$\bar{X} \pm 2.576\,\frac{\sigma}{\sqrt{n}} = \left(\bar{X} - 2.576\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + 2.576\,\frac{\sigma}{\sqrt{n}}\right)$$
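    The multipliers listed above are standard normal quantiles. For confidence level $c$, the multiplier is the $(1+c)/2$ quantile, which leaves $(1-c)/2$ in each tail. A short sketch using Python's standard-library `statistics.NormalDist` recovers them; the helper name `z_multiplier` is hypothetical.

```python
from statistics import NormalDist


def z_multiplier(level):
    """Two-sided standard normal multiplier for a confidence level."""
    # (1 - level) / 2 goes in each tail, so take the (1 + level) / 2 quantile
    return NormalDist().inv_cdf((1 + level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_multiplier(level):.3f}")
# 90%: 1.645
# 95%: 1.960
# 99%: 2.576
```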

    CI for $\mu$ (variance known)

    So we see that a higher level of confidence (a good thing) leads to a larger multiplier coefficient, and hence a wider confidence interval (a bad thing)

    Hence, other things being equal, we face a trade-off between the level of confidence and the width of the confidence interval

    Since the width of a CI is partly determined by the standard error, increasing $n$ (which is costly) reduces the standard error and hence shortens the CI (a good thing)

    Because the standard error is proportional to $1/\sqrt{n}$, halving the width requires quadrupling the sample size (ignoring the finite population correction)
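    The two influences on interval width can be tabulated directly from width $= 2z\sigma/\sqrt{n}$, ignoring the finite population correction. In this sketch, $\sigma = 3$ and the sample sizes are arbitrary illustrative values, and `ci_width` is a hypothetical helper.

```python
from statistics import NormalDist


def ci_width(sigma, n, level):
    """Full width 2 * z * sigma / sqrt(n) of the known-variance CI
    (finite population correction ignored)."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return 2 * z * sigma / n ** 0.5


sigma = 3.0  # illustrative population standard deviation
levels = (0.90, 0.95, 0.99)
for n in (25, 100, 400):
    print(n, [round(ci_width(sigma, n, c), 2) for c in levels])
```

    Each printed row corresponds to one value of $n$. Within a row the width grows with the confidence level, and each fourfold increase in $n$ halves it.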

    CI for $\mu$ (variance unknown)

    Unfortunately, the approach just discussed requires knowledge of the population variance, $\sigma^2$

    This is because $\sigma$ is used in the standard error:

    $$\bar{X} \pm z\,\frac{\sigma}{\sqrt{n}}$$

    In practice, we are unlikely to know $\sigma^2$

    After all, it is a population characteristic, and so if we do not know $\mu$, why would we know $\sigma^2$?

    CI for $\mu$ (variance unknown)

    Recall the sampling variance of $\bar{X}$ is

    $$\text{Var}(\bar{X}) = \sigma^2_{\bar{X}} = \frac{N-n}{N-1}\cdot\frac{\sigma^2}{n} \approx \frac{\sigma^2}{n}$$

    But if $\sigma^2$ is unknown we have a problem

    It is not that we are fundamentally interested in $\sigma^2$, only that we need to estimate it because the precision of $\bar{X}$ depends on it

    And there is little point having a point estimate if we know nothing about its precision

    CI for $\mu$ (variance unknown)

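    Our estimate of $\text{Var}(\bar{X})$ is

    $$s^2_{\bar{X}} = \frac{N-n}{N}\cdot\frac{s^2}{n} \approx \frac{s^2}{n}$$

    where

    $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$$

    Our estimate of the standard error is thus

    $$s_{\bar{x}} = \sqrt{\frac{N-n}{N}\cdot\frac{s^2}{n}} \approx \frac{s}{\sqrt{n}}$$

    since in the social sciences the sampling fraction $n/N$ is typically small, so the finite population correction is close to 1

    Once we have estimated this, we proceed as before to construct a CI, using the estimated standard error in place of the actual standard error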
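    The estimators above translate directly into code. This sketch uses a small made-up dataset to check that the two expressions for $s^2$ agree with Python's `statistics.variance`, which uses the $n-1$ divisor. It then computes the estimated standard error with and without the factor $(N-n)/N$. The data, the population size $N = 100$ and the helper name `estimated_se` are assumptions for illustration.

```python
import math
import statistics


def estimated_se(sample, N=None):
    """Estimated standard error of the sample mean (variance unknown).

    Uses s^2 with divisor n - 1. If the population size N is supplied,
    the factor (N - n) / N is applied, as in the formula above.
    """
    n = len(sample)
    s2 = statistics.variance(sample)  # sum((x - xbar)^2) / (n - 1)
    var_xbar = s2 / n
    if N is not None:
        var_xbar *= (N - n) / N
    return math.sqrt(var_xbar)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
n, xbar = len(data), statistics.fmean(data)
# Both forms of s^2 agree (32 / 7 for these data)
s2_direct = sum((x - xbar) ** 2 for x in data) / (n - 1)
s2_shortcut = (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)
print(s2_direct, s2_shortcut, statistics.variance(data))
print(estimated_se(data), estimated_se(data, N=100))
```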

    CI for $\mu$ (variance unknown)

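    So, for a 90% confidence interval we use $\bar{x} \pm 1.645\,\frac{s}{\sqrt{n}}$

    Similarly, for a 95% confidence interval we use $\bar{x} \pm 1.96\,\frac{s}{\sqrt{n}}$

    Finally, for a 99% confidence interval we use $\bar{x} \pm 2.576\,\frac{s}{\sqrt{n}}$

    These normal multipliers give a good approximation for large $n$; for small samples from an approximately Normal population, a multiplier from Student's $t$ distribution with $n-1$ degrees of freedom is the standard choice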
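    The hypothetical helper `mean_ci` below computes these intervals from raw data, using the normal multipliers as on this slide. The data (the integers 1 to 40) are purely illustrative. The finite population correction is ignored, which assumes $n/N$ is small.

```python
import math
import statistics
from statistics import NormalDist


def mean_ci(sample, level=0.95):
    """Large-sample CI for mu with sigma unknown: xbar +/- z * s / sqrt(n).

    Uses the standard normal multiplier, an approximation that is
    reasonable for large n (a t multiplier is usual for small n).
    """
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    z = NormalDist().inv_cdf((1 + level) / 2)
    margin = z * s / math.sqrt(n)
    return xbar - margin, xbar + margin


data = list(range(1, 41))  # illustrative sample, n = 40
for level in (0.90, 0.95, 0.99):
    lo, hi = mean_ci(data, level)
    print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f})")
```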

    Choosing sample size

    Note the trade-off between accuracy and data cost

    Solution: fix the desired precision and find the smallest $n$ which achieves it

    If we want the sample mean to be within a tolerance $d$ of $\mu$ with a specified probability, then

    $$d = z\,\frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad n = \frac{z^2\sigma^2}{d^2}$$

    where $z$ is the standard normal multiplier for that probability (e.g. 1.96 for 95%)

    $n$ is the minimum sample size required to achieve the desired precision

    $n$ must be an integer, so always round up!

    Choosing sample size: example

    A random sample is to be taken from a population with unknown mean $\mu$ and $\sigma = 3$

    How big a sample size would be needed if there is to be a 95% chance of $\bar{X}$ being within 1 unit of $\mu$?

    The sample size $n$ required for a tolerance of 1 satisfies

    $$1 = 1.96 \times \frac{3}{\sqrt{n}} \quad\Rightarrow\quad n = (1.96 \times 3)^2 = 34.57 \quad\Rightarrow\quad n = 35$$

    Note that the required sample size in this type of calculation needs to be rounded up from a decimal fraction, since rounding down would result in a value not quite large enough!
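    The sample size formula and the round-up rule fit in a few lines. In this sketch the helper `required_n` is hypothetical. Its multiplier comes from `statistics.NormalDist`, so it uses $z \approx 1.95996$ rather than the rounded 1.96; for the example above, both give 35.

```python
import math
from statistics import NormalDist


def required_n(sigma, d, level=0.95):
    """Smallest integer n with z * sigma / sqrt(n) <= d,
    i.e. n >= (z * sigma / d) ** 2, where z is the normal multiplier."""
    z = NormalDist().inv_cdf((1 + level) / 2)
    return math.ceil((z * sigma / d) ** 2)  # always round up


print(required_n(sigma=3, d=1))  # 35
```

    Halving the tolerance to $d = 0.5$ roughly quadruples the requirement, giving 139.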

    Adjusting the statistically determined sample size

    Incidence rate refers to the rate of occurrence, or the percentage, of persons eligible to participate in the study

    In general, if there are $k$ qualifying factors with incidences $Q_1, Q_2, Q_3, \ldots, Q_k$, each expressed as a proportion, then (treating the factors as independent)

    $$\text{Incidence rate} = Q_1 \times Q_2 \times Q_3 \times \cdots \times Q_k$$

    The completion rate is the percentage of qualified respondents who complete the interview, enabling researchers to account for anticipated refusals by people who qualify

    $$\text{Initial sample size} = \frac{\text{Final sample size}}{\text{Incidence rate} \times \text{Completion rate}}$$
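    The following sketch implements this adjustment. Like the product formula, it assumes the qualifying factors are independent. The study figures are hypothetical: 500 completed interviews required, incidences of 0.75, 0.80 and 0.75, and an 80% completion rate. The result is rounded up, following the rule used for the statistically determined sample size.

```python
import math


def initial_sample_size(final_n, incidence_factors, completion_rate):
    """Initial sample size = final / (incidence rate * completion rate).

    The incidence rate is the product of the qualifying-factor incidences
    Q1, ..., Qk (each a proportion), which treats them as independent.
    """
    incidence = math.prod(incidence_factors)
    return math.ceil(final_n / (incidence * completion_rate))


# Hypothetical study figures
print(initial_sample_size(500, [0.75, 0.80, 0.75], 0.80))  # 1389
```

    Here the incidence rate is $0.75 \times 0.80 \times 0.75 = 0.45$. Multiplying by the 0.80 completion rate gives 0.36, and $500 / 0.36 \approx 1388.9$ rounds up to 1389.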

    Adjusting for non-response

    Sub-sampling of non-respondents: the researcher contacts a sub-sample of the non-respondents, usually by means of telephone or personal interviews

    In replacement, the non-respondents in the current survey are replaced with non-respondents from an earlier, similar survey

    The researcher attempts to contact these non-respondents from the earlier survey and administer the current survey questionnaire to them, possibly by offering a suitable incentive

    Adjusting for non-response

    In substitution, the researcher substitutes for non-respondents other elements from the sampling frame that are expected to respond

    The sampling frame is divided into sub-groups that are internally homogeneous in terms of respondent characteristics but heterogeneous in terms of response rates

    These sub-groups are then used to identify substitutes who are similar to particular non-respondents but dissimilar to respondents already in the sample

    Adjusting for non-response

    Subjective estimates: when it is no longer feasible to increase the response rate by sub-sampling, replacement, or substitution, it may be possible to arrive at subjective estimates of the nature and effect of non-response bias

    This involves evaluating the likely effects of non-response based on experience and available information

    Imputation

    Imputation involves imputing, or assigning, the characteristic of interest to the non-respondents based on the similarity of the variables available for both non-respondents and respondents

    For example, a respondent who does not report brand usage may be assigned the brand usage of a respondent with similar demographic characteristics
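    One simple way to implement this idea is nearest-neighbour (hot-deck) imputation, where each missing value is copied from the respondent whose observed characteristics are closest. The sketch below is one possible implementation, not necessarily the method intended in these slides. The field names and data are hypothetical. Distances are computed on unscaled variables, so in practice one would usually standardise the features first.

```python
import math


def nearest_neighbour_impute(records, features, target):
    """Fill missing `target` values from the most similar respondent.

    Similarity is Euclidean distance on the `features` columns, which
    must be numeric and observed for every record. Records are dicts;
    a missing value is represented by None. Returns a new list and
    leaves the input records unchanged.
    """
    donors = [r for r in records if r[target] is not None]
    result = []
    for r in records:
        if r[target] is None:
            donor = min(donors, key=lambda d: math.dist(
                [r[f] for f in features], [d[f] for f in features]))
            r = {**r, target: donor[target]}  # copy with the imputed value
        result.append(r)
    return result


# Hypothetical survey data: age, household size and weekly brand usage
survey = [
    {"age": 25, "hh_size": 1, "usage": 3},
    {"age": 52, "hh_size": 4, "usage": 1},
    {"age": 27, "hh_size": 2, "usage": None},  # item non-response
    {"age": 49, "hh_size": 4, "usage": None},  # item non-response
]
for row in nearest_neighbour_impute(survey, ["age", "hh_size"], "usage"):
    print(row)
```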
