Two-stage sampling plans for clustered population
Jiahua Chen
This is prepared for Stat 344, Chapter 8
2014
Jiahua Chen Week10a
Equal-sized clusters first
Assume that the finite population is made of N clusters of equal size M.
Consider a sampling plan that is carried out in two stages.
an SRSWOR of n clusters from the N clusters;
within each sampled cluster, an SRSWOR of m elements.
Observations are denoted as $y_{ij}$ for $j = 1, 2, \ldots, m$ and $i = 1, 2, \ldots, n$.
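The two stages can be sketched in Python as follows. This is only an illustration: the function name `two_stage_sample`, the seed, and the toy population are assumptions, not part of the course material.

```python
import random

def two_stage_sample(population, n, m, seed=None):
    """Draw an SRSWOR of n clusters, then an SRSWOR of m elements in each.

    `population` is a list of N clusters, each a list of M response values.
    Returns a list of (cluster_index, sampled_values) pairs.
    """
    rng = random.Random(seed)
    # Stage 1: SRSWOR of n cluster indices out of N.
    clusters = rng.sample(range(len(population)), n)
    # Stage 2: SRSWOR of m elements within each sampled cluster.
    return [(i, rng.sample(population[i], m)) for i in clusters]

# Hypothetical population: N = 4 clusters of equal size M = 5.
population = [[10 * i + j for j in range(5)] for i in range(4)]
sample = two_stage_sample(population, n=2, m=3, seed=1)
```

Each entry of `sample` holds the m observed values $y_{ij}$ of one sampled cluster.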
What is new here?
Unlike the “single stage” clustered sampling plan, we do not obtain response values on all M elements in a sampled cluster.
Instead, only a subset of the cluster is inspected and has its response values measured.
This is the reason why this sampling plan is a “two-stage” plan.
Inference under two-stage sampling plan
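Inference targets are the population cluster-level mean $\bar Y$ or the element-level mean $\bar{\bar Y} = \bar Y / M$.
We estimate $\bar{\bar Y}$ by the most straightforward estimator:
$$\bar{\bar y} = \sum_{i=1}^{n} \sum_{j=1}^{m} y_{ij}/(nm) = n^{-1} \sum_{i=1}^{n} \bar y_i$$
with $\bar y_i = m^{-1} \sum_{j=1}^{m} y_{ij}$.
We estimate $\bar Y$ by $M \bar{\bar y}$.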
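A minimal sketch of the two estimators, with made-up numbers. The function name `element_mean_estimate` and the value M = 20 are assumptions for illustration only.

```python
def element_mean_estimate(sample):
    """sample: list of n lists, each holding the m values y_ij of one sampled cluster."""
    cluster_means = [sum(ys) / len(ys) for ys in sample]   # the cluster sample means
    return sum(cluster_means) / len(cluster_means)         # their plain average

# Hypothetical data: n = 2 sampled clusters, m = 3 elements each, cluster size M = 20.
sample = [[4.0, 6.0, 5.0], [10.0, 8.0, 9.0]]
M = 20
element_level = element_mean_estimate(sample)   # estimate of the element-level mean
cluster_level = M * element_level               # estimate of the cluster-level mean
```

Here the cluster sample means are 5 and 9, so the element-level estimate is 7 and the cluster-level estimate is 140.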
Unbiasedness
We make use of the fact $E(\bar{\bar y}) = E\{E(\bar{\bar y} \mid C)\}$, where $C$ denotes the set of clusters selected at the first stage. This is a formula many of us may not remember.
Intuitively, this formula says that the “average value” of $\bar{\bar y}$ can be computed in two steps:
obtain the average of $\bar{\bar y}$ in each instance of $C$;
take the average of the above averages over instances of $C$.
Unbiasedness
Let the instance of $C$ be that the 9th cluster in the population is sampled. If so,
$$E(\bar y_9 \mid \text{the 9th cluster}) = M^{-1} \sum_{j=1}^{M} y_{9j} = \bar Y_9.$$
Any cluster in the population could be sampled and the chances are the same, hence, for any $i$,
$$E(\bar y_i) = N^{-1} \sum_{k=1}^{N} \bar Y_k = \bar{\bar Y}.$$
Unbiasedness
Because $\bar{\bar y}$ is the average of $\bar y_i$ over an SRSWOR of size $n$:
$$E(\bar{\bar y}) = n^{-1} \sum_{i=1}^{n} E(\bar y_i) = \bar{\bar Y}.$$
This gives us unbiasedness of $\bar{\bar y}$ as an estimator of $\bar{\bar Y}$ under the two-stage cluster sampling plan with equal cluster size.
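The unbiasedness claim can be checked by brute force on a tiny population. The sketch below enumerates every possible two-stage sample and averages the estimates. The population values and the choice n = m = 2 are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical population: N = 3 clusters, each of size M = 3.
population = [[1, 2, 6], [4, 4, 7], [0, 5, 10]]
n, m = 2, 2

def mean(xs):
    xs = list(xs)
    return Fraction(sum(xs), len(xs))

estimates = []
for clusters in combinations(range(len(population)), n):          # stage 1
    # All ways to pick m elements inside each chosen cluster (stage 2).
    stage2 = [list(combinations(population[i], m)) for i in clusters]
    for picks in product(*stage2):
        estimates.append(mean(mean(p) for p in picks))

# With equal cluster sizes every two-stage sample is equally likely,
# so the expectation is the plain average over all enumerated samples.
expected = mean(estimates)
true_mean = mean(y for cluster in population for y in cluster)
```

Both `expected` and `true_mean` come out as 13/3 for this population.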
Variance formula (8.4)
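We plan to use $\mathrm{Var}(\bar{\bar y}) = \mathrm{Var}(E\{\bar{\bar y} \mid C\}) + E\{\mathrm{Var}(\bar{\bar y} \mid C)\}$ to get $\mathrm{Var}(\bar{\bar y})$.
We have interpreted $E\{\bar{\bar y} \mid C\}$ as computing the conditional expectation of the cluster sample mean $\bar y_i$ given cluster $i$.
If this $i = 9$, then $E(\bar y_9 \mid \text{9th cluster}) = M^{-1} \sum_{j=1}^{M} y_{9j} = \bar Y_9$.
This consideration leads to
$$E\{\bar{\bar y} \mid C\} = E\Big(n^{-1} \sum_{i=1}^{n} \bar y_i \,\Big|\, C\Big) = n^{-1} \sum_{i=1}^{n} \bar Y_i.$$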
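Refreshing the purpose now: try to get variance formula (8.4)
Do not forget: we are using $\mathrm{Var}(\bar{\bar y}) = \mathrm{Var}(E\{\bar{\bar y} \mid C\}) + E\{\mathrm{Var}(\bar{\bar y} \mid C)\}$ to get $\mathrm{Var}(\bar{\bar y})$.
The first task is to compute $E\{\bar{\bar y} \mid C\}$, which has been done:
$E(\bar y_i \mid i\text{th cluster}) = \bar Y_i$.
$E(\bar{\bar y} \mid C) = n^{-1} \sum_{i=1}^{n} \bar Y_i$.
We now move to
$$\mathrm{Var}(E(\bar{\bar y} \mid C)) = \mathrm{Var}\Big(n^{-1} \sum_{i=1}^{n} \bar Y_i\Big).$$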
Variance of $E(\bar{\bar y} \mid \text{clusters})$
What is random in $E(\bar{\bar y} \mid C) = n^{-1} \sum_{i=1}^{n} \bar Y_i$?
The cluster means $\bar Y_i$ are not random;
which subset of clusters is included in $\sum_{i=1}^{n}$ is random.
Variance of $E(\bar{\bar y} \mid \text{clusters})$
Because the clusters in the sample are obtained by SRSWOR, we get
$$\mathrm{Var}\Big(n^{-1} \sum_{i=1}^{n} \bar Y_i\Big) = \Big(\frac{1}{n} - \frac{1}{N}\Big) S_1^2$$
where
$$S_1^2 = \frac{1}{N-1} \sum_{i=1}^{N} (\bar Y_i - \bar{\bar Y})^2$$
is the population variance of the cluster mean response.
More directly, $S_1^2$ defined above tells us how different the cluster means are in this population.
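As a small numerical illustration of this first component, the sketch below compares the formula with the exact variance obtained by listing every possible set of n clusters. The four cluster means are made up for the example.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical cluster means for a population of N = 4 clusters.
cluster_means = [Fraction(3), Fraction(5), Fraction(10), Fraction(6)]
N, n = len(cluster_means), 2

grand_mean = sum(cluster_means) / N
S1_sq = sum((c - grand_mean) ** 2 for c in cluster_means) / (N - 1)

# Exact variance of the mean of n sampled cluster means:
# every subset of n clusters is equally likely under SRSWOR.
subset_means = [sum(s) / n for s in combinations(cluster_means, n)]
center = sum(subset_means) / len(subset_means)
var_enum = sum((x - center) ** 2 for x in subset_means) / len(subset_means)

var_formula = (Fraction(1, n) - Fraction(1, N)) * S1_sq
```

For these values $S_1^2 = 26/3$, and both `var_enum` and `var_formula` equal 13/6.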
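Variance of $n^{-1} \sum \bar Y_i$
It is worth reminding ourselves again: we are using $\mathrm{Var}(\bar{\bar y}) = \mathrm{Var}(E\{\bar{\bar y} \mid C\}) + E\{\mathrm{Var}(\bar{\bar y} \mid C)\}$ to get $\mathrm{Var}(\bar{\bar y})$.
We have completed the first step:
$$\mathrm{Var}(E\{\bar{\bar y} \mid C\}) = \mathrm{Var}\Big(n^{-1} \sum_{i=1}^{n} \bar Y_i\Big) = \Big(\frac{1}{n} - \frac{1}{N}\Big) S_1^2.$$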
Next task
The next step is to compute $E\{\mathrm{Var}(\bar{\bar y} \mid C)\}$.
Recall that conditioning on $C$ means holding the sampled clusters fixed.
So $\mathrm{Var}(\bar{\bar y} \mid C)$ is the variation in $\bar{\bar y}$ when the clusters in the sample are fixed.
Variation given a single cluster
Suppose the 9th cluster is in the sample.
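How much is $\mathrm{Var}(\bar y_9)$?
We note $\bar y_9$ is the sample mean based on $m$ elements obtained by SRSWOR from the $M$ elements in the 9th cluster of the population.
This tells us
$$\mathrm{Var}(\bar y_9) = \Big(\frac{1}{m} - \frac{1}{M}\Big) S_{2,9}^2$$
where $S_{2,9}^2$ is the variance of the response values within the 9th cluster.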
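When clusters are given, the second-stage samples are drawn independently across the sampled clusters, so the variances add:
$$\mathrm{Var}(\bar{\bar y} \mid C) = \mathrm{Var}\Big(n^{-1} \sum_{i=1}^{n} \bar y_i \,\Big|\, C\Big) = n^{-2} \Big(\frac{1}{m} - \frac{1}{M}\Big) \sum_{i=1}^{n} S_{2,i}^2.$$
This is $\mathrm{Var}(\bar{\bar y} \mid C)$ in $\mathrm{Var}(\bar{\bar y}) = \mathrm{Var}(E\{\bar{\bar y} \mid C\}) + E\{\mathrm{Var}(\bar{\bar y} \mid C)\}$.
The final step is to take the average over all possible samples of $n$ clusters.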
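Taking the average over all possible samples of $n$ clusters,
$$E\{\mathrm{Var}(\bar{\bar y} \mid C)\} = E\Big\{n^{-2} \Big(\frac{1}{m} - \frac{1}{M}\Big) \sum_{i=1}^{n} S_{2,i}^2\Big\} = n^{-1} \Big(\frac{1}{m} - \frac{1}{M}\Big) \Big\{N^{-1} \sum_{i=1}^{N} S_{2,i}^2\Big\} = n^{-1} \Big(\frac{1}{m} - \frac{1}{M}\Big) S_2^2$$
where
$$S_2^2 = N^{-1} \sum_{i=1}^{N} S_{2,i}^2$$
is the population average of the within-cluster variations.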
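We obtained
$$\mathrm{Var}(E\{\bar{\bar y} \mid C\}) = \Big(\frac{1}{n} - \frac{1}{N}\Big) S_1^2$$
earlier, and from the last slide
$$E\{\mathrm{Var}(\bar{\bar y} \mid C)\} = n^{-1} \Big(\frac{1}{m} - \frac{1}{M}\Big) S_2^2.$$
The formula $\mathrm{Var}(\bar{\bar y}) = \mathrm{Var}(E\{\bar{\bar y} \mid C\}) + E\{\mathrm{Var}(\bar{\bar y} \mid C)\}$ leads to
$$\mathrm{Var}(\bar{\bar y}) = \Big(\frac{1}{n} - \frac{1}{N}\Big) S_1^2 + n^{-1} \Big(\frac{1}{m} - \frac{1}{M}\Big) S_2^2.$$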
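The two-component variance formula can be verified exactly on a small example. The sketch below computes the between-cluster and within-cluster pieces from a hypothetical population (N = 3, M = 3, n = 2, m = 2) and compares their sum with the variance of the estimator over all possible two-stage samples. The helper names `mean` and `variance` are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

population = [[1, 2, 6], [4, 4, 7], [0, 5, 10]]   # hypothetical: N = 3, M = 3
N, M, n, m = 3, 3, 2, 2

def mean(xs):
    xs = list(xs)
    return Fraction(sum(xs), len(xs))

def variance(xs, divisor):
    xs = list(xs)
    center = mean(xs)
    return sum((x - center) ** 2 for x in xs) / divisor

# Formula: between-cluster term plus within-cluster term.
S1_sq = variance([mean(c) for c in population], N - 1)
S2_sq = mean(variance(c, M - 1) for c in population)
var_formula = ((Fraction(1, n) - Fraction(1, N)) * S1_sq
               + Fraction(1, n) * (Fraction(1, m) - Fraction(1, M)) * S2_sq)

# Exact variance over all equally likely two-stage samples.
estimates = []
for clusters in combinations(range(N), n):
    stage2 = [list(combinations(population[i], m)) for i in clusters]
    for picks in product(*stage2):
        estimates.append(mean(mean(p) for p in picks))
var_enum = variance(estimates, len(estimates))
```

For this population $S_1^2 = 4/3$ and $S_2^2 = 35/3$, and both routes give 43/36.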
Variance estimation
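We estimate these $S^2$ by their sample versions: $s_1^2 = (n-1)^{-1} \sum_{i=1}^{n} (\bar y_i - \bar{\bar y})^2$ and $s_2^2 = n^{-1} \sum_{i=1}^{n} s_{2i}^2$, where $s_{2i}^2$ is the sample variance of the $m$ observations in the $i$th sampled cluster.
Because $s_1^2$ is computed from the sample means $\bar y_i$ rather than the true cluster means, it also picks up second-stage variation: $E(s_1^2) = S_1^2 + (1/m - 1/M) S_2^2$.
Allowing for this, an unbiased estimator of $\mathrm{Var}(\bar{\bar y})$ is given by
$$v(\bar{\bar y}) = \Big(\frac{1}{n} - \frac{1}{N}\Big) s_1^2 + N^{-1} \Big(\frac{1}{m} - \frac{1}{M}\Big) s_2^2.$$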
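A brute-force check of the variance estimator on a tiny hypothetical population (N = 3, M = 3, n = 2, m = 2). The sketch uses the factor 1/N on the within-cluster term, which is the choice for which the average of v over all possible samples matches the exact variance. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

population = [[1, 2, 6], [4, 4, 7], [0, 5, 10]]   # hypothetical: N = 3, M = 3
N, M, n, m = 3, 3, 2, 2

def mean(xs):
    xs = list(xs)
    return Fraction(sum(xs), len(xs))

def variance(xs, divisor):
    xs = list(xs)
    center = mean(xs)
    return sum((x - center) ** 2 for x in xs) / divisor

def v_hat(picks):
    """Variance estimate from one sample; picks[k] holds the m values of the k-th sampled cluster."""
    s1_sq = variance([mean(p) for p in picks], n - 1)
    s2_sq = mean(variance(p, m - 1) for p in picks)
    return ((Fraction(1, n) - Fraction(1, N)) * s1_sq
            + Fraction(1, N) * (Fraction(1, m) - Fraction(1, M)) * s2_sq)

# Enumerate every equally likely two-stage sample.
samples = []
for clusters in combinations(range(N), n):
    stage2 = [list(combinations(population[i], m)) for i in clusters]
    samples.extend(product(*stage2))

estimates = [mean(mean(p) for p in picks) for picks in samples]
true_var = variance(estimates, len(estimates))        # exact variance of the estimator
expected_v = mean(v_hat(picks) for picks in samples)  # exact expectation of v
```

The two quantities `true_var` and `expected_v` agree exactly for this population.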
Sample size determination
How do we determine the sample size required to meet some precision/cost requirements?
The answer: express the requirements as inequalities in $n$ and $m$ (for example, an upper bound on $\mathrm{Var}(\bar{\bar y})$) and solve them.
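As one possible illustration, suppose m is fixed in advance and the only requirement is an upper bound on the variance of the element-level mean estimator. Rearranging the variance formula gives a lower bound on n. The function name `clusters_needed`, the planning values for $S_1^2$ and $S_2^2$, and the target are all assumptions made for this sketch. A cost constraint would add a second inequality.

```python
import math

def var_mean(n, m, N, M, S1_sq, S2_sq):
    """Variance formula: (1/n - 1/N) S1^2 + (1/n)(1/m - 1/M) S2^2."""
    return (1 / n - 1 / N) * S1_sq + (1 / n) * (1 / m - 1 / M) * S2_sq

def clusters_needed(S1_sq, S2_sq, N, M, m, target_var):
    """Smallest n (capped at N) whose variance does not exceed target_var."""
    # Solving the inequality for n gives
    # n >= [S1^2 + (1/m - 1/M) S2^2] / (target_var + S1^2 / N).
    numerator = S1_sq + (1 / m - 1 / M) * S2_sq
    n = math.ceil(numerator / (target_var + S1_sq / N))
    return min(max(n, 1), N)

# Hypothetical planning values.
n_required = clusters_needed(S1_sq=4.0, S2_sq=25.0, N=200, M=50, m=5, target_var=0.25)
```

With these inputs the bound is about 31.5, so 32 clusters are needed. Sampling 31 clusters would leave the variance slightly above the target.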
Unequal cluster sizes
We now move to the unequal cluster size situation for the two-stage sampling plan.
The finite population is made of N clusters (primary sampling units).
The clusters have different numbers of elements (ultimate sampling units).
Two-stage sampling design
The two-stage sampling plan is as follows.
In the first stage, select an SRSWOR of $n$ primary units (regardless of their sizes).
In the second stage, select $m_i$ elements by SRSWOR from the $i$th cluster in the sample, which has $M_i$ elements.
Two-stage sampling design
We have implicitly assumed $M_i$ is known once this cluster is selected.
We have also assumed $m_i$ is somehow pre-determined.
Response values
The response value is again denoted as $y_{ij}$, for the $i$th cluster and $j$th element in that cluster.
At population level, $i = 1, 2, \ldots, N$ and $j = 1, 2, \ldots, M_i$.
At sampling level, $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m_i$.
Be aware of the abuse of notation: $y_{13}$ at the population level is not $y_{13}$ at the sampling level.
Population parameters
We use $Y_i = \sum_{j=1}^{M_i} y_{ij}$ for the cluster total. The population total is $Y = \sum_{i=1}^{N} Y_i$.
The average response of the $i$th cluster is $\bar Y_i = Y_i / M_i$.
The population mean of the cluster averages is $\bar Y = N^{-1} \sum_{i=1}^{N} \bar Y_i$.
The population mean of response at element level is
$$\bar{\bar Y} = \frac{\sum_{i=1}^{N} Y_i}{\sum_{i=1}^{N} M_i}.$$
Population parameters
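When the $M_i$'s are not all equal, $\bar{\bar Y}$ does not relate to the $\bar Y_i$ in a simple way; it is their size-weighted average,
$$\bar{\bar Y} = \frac{\sum_{i=1}^{N} M_i \bar Y_i}{\sum_{i=1}^{N} M_i}.$$
Within-cluster variations remain as
$$S_{2i}^2 = (M_i - 1)^{-1} \sum_{j=1}^{M_i} (y_{ij} - \bar Y_i)^2.$$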
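A short numerical illustration with a made-up population of three clusters of sizes 2, 4 and 3. It shows that the element-level mean is the size-weighted average of the cluster means and generally differs from their unweighted average. Variable names are illustrative.

```python
# Hypothetical population with unequal cluster sizes (N = 3).
population = [[2.0, 4.0], [1.0, 3.0, 5.0, 7.0], [10.0, 10.0, 13.0]]

sizes = [len(c) for c in population]                       # cluster sizes
totals = [sum(c) for c in population]                      # cluster totals
cluster_means = [t / M for t, M in zip(totals, sizes)]     # cluster means

element_mean = sum(totals) / sum(sizes)                    # element-level mean
weighted = sum(M * yb for M, yb in zip(sizes, cluster_means)) / sum(sizes)
unweighted = sum(cluster_means) / len(cluster_means)       # mean of cluster means

# Within-cluster variances, each with divisor (cluster size - 1).
within_var = [sum((y - yb) ** 2 for y in c) / (len(c) - 1)
              for c, yb in zip(population, cluster_means)]
```

Here the cluster means are 3, 4 and 11. The element-level mean is 55/9, about 6.11, while the unweighted mean of cluster means is 6.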
Summary statistics related to population parameters
We use
$$\bar y_i = m_i^{-1} \sum_{j=1}^{m_i} y_{ij}$$
for the sample mean of the responses in the $i$th sampling unit.
We estimate the population mean of elements $\bar{\bar Y}$ by
$$\hat{\bar{\bar Y}}_R = \frac{\sum_{i=1}^{n} M_i \bar y_i}{\sum_{i=1}^{n} M_i}.$$
This estimator estimates $Y_i$ by $M_i \bar y_i$.
It then divides the sum of these estimated totals by the number of elements in the sampled clusters to get an estimate of the population mean at the element level.
Summary statistics related to population parameters
We estimate $Y = \sum_{i=1}^{N} Y_i$ naturally by
$$\hat Y_R = \Big(\sum_{i=1}^{N} M_i\Big) \hat{\bar{\bar Y}}_R.$$
Notice that the subscript $R$ is used here because the estimator is a ratio-type estimator.
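A minimal sketch of the ratio-type estimators with made-up data. The function name `ratio_estimates`, the sample values, and the assumed known population size of 200 elements are all hypothetical.

```python
def ratio_estimates(sample, total_elements):
    """sample: list of (M_i, [y_i1, ..., y_im_i]) for the n sampled clusters.
    total_elements: sum of the cluster sizes over all N clusters, assumed known."""
    sizes = [M_i for M_i, _ in sample]
    means = [sum(ys) / len(ys) for _, ys in sample]             # cluster sample means
    est_totals = [M_i * yb for M_i, yb in zip(sizes, means)]    # estimated cluster totals
    mean_R = sum(est_totals) / sum(sizes)        # element-level mean estimate
    total_R = total_elements * mean_R            # population total estimate
    return mean_R, total_R

# Hypothetical sample: two clusters of sizes 10 and 30, with 2 and 3 observations.
sample = [(10, [3.0, 5.0]), (30, [6.0, 8.0, 10.0])]
mean_R, total_R = ratio_estimates(sample, total_elements=200)
```

The estimated cluster totals are 40 and 240, so the mean estimate is 280/40 = 7 and the total estimate is 1400.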
Variance formulas
The variance formulas are given as (8.9) and (8.10) in the textbook.
$$\mathrm{Var}(\hat Y_R) \approx N^2 (1/n - 1/N) \sum_{i=1}^{N} M_i^2 (\bar Y_i - \bar{\bar Y})^2 / (N-1) + (N/n) \sum_{i=1}^{N} M_i^2 (1/m_i - 1/M_i) S_{2i}^2$$
$$v(\hat Y_R) = N^2 (1/n - 1/N) \sum_{i=1}^{n} M_i^2 (\bar y_i - \hat{\bar{\bar Y}}_R)^2 / (n-1) + (N/n) \sum_{i=1}^{n} M_i^2 (1/m_i - 1/M_i) s_{2i}^2$$
First remark: there is no need to memorize them; they are not insightful.
The last bonus question of this course: why is the first formula an approximation while the second one is not?
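The variance estimate for the ratio-type total estimator can be computed term by term, transcribing the expression as printed above. The function name `v_ratio_total`, the sample, and N = 8 are assumptions made for illustration.

```python
def sample_variance(ys):
    center = sum(ys) / len(ys)
    return sum((y - center) ** 2 for y in ys) / (len(ys) - 1)

def v_ratio_total(sample, N):
    """sample: list of (M_i, [y_i1, ..., y_im_i]) for the n sampled clusters."""
    n = len(sample)
    sizes = [M_i for M_i, _ in sample]
    means = [sum(ys) / len(ys) for _, ys in sample]
    mean_R = sum(M_i * yb for M_i, yb in zip(sizes, means)) / sum(sizes)
    # First term: spread of the cluster sample means around the ratio estimate.
    between = sum(M_i ** 2 * (yb - mean_R) ** 2
                  for M_i, yb in zip(sizes, means)) / (n - 1)
    # Second term: within-cluster sampling variation.
    within = sum(M_i ** 2 * (1 / len(ys) - 1 / M_i) * sample_variance(ys)
                 for M_i, ys in sample)
    return N ** 2 * (1 / n - 1 / N) * between + (N / n) * within

# Hypothetical sample from a population of N = 8 clusters.
sample = [(10, [3.0, 5.0]), (30, [6.0, 8.0, 10.0])]
v = v_ratio_total(sample, N=8)
```

For this sample the first term is 24 × 1800 = 43200 and the second is 4 × 1160 = 4640, giving about 47840.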