Cluster Sampling Bijay Lal Pradhan

Post on 06-Sep-2020


Cluster Sampling

• The smallest units into which the population can be divided are known as elements. A group of such elements is known as a cluster. When the sampling units are clusters, the method of selecting the sample is called cluster sampling.

• In cluster sampling the population is first divided into a number of non-overlapping clusters. A simple random sample of clusters is selected, and all the elements belonging to the selected clusters are surveyed or studied.

• For a specified sample size, only a small number of clusters needs to be selected, which saves considerable time, money and labour. Clusters are constructed so that the elements within each cluster are as heterogeneous as possible, while the clusters themselves are as similar to one another as possible.

Procedure of cluster sampling

- Divide the whole population into clusters according to some well-defined rule.

- Treat the clusters as sampling units.

- Choose a sample of clusters according to some procedure.

- Carry out a complete enumeration of the selected clusters, i.e., collect information on all the elements in the selected clusters.
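The four steps can be sketched in Python. This is only an illustration under stated assumptions: the function name `cluster_sample`, the `cluster_of` rule and the household data are hypothetical and do not come from the slides, and SRSWOR is assumed as the selection procedure.

```python
import random

def cluster_sample(population, cluster_of, n_clusters, seed=None):
    """Draw n_clusters clusters by SRSWOR and enumerate each one completely."""
    # Step 1: divide the population into non-overlapping clusters by a rule.
    clusters = {}
    for unit in population:
        clusters.setdefault(cluster_of(unit), []).append(unit)
    # Steps 2-3: treat the clusters as sampling units and sample them without replacement.
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 4: complete enumeration, i.e. keep every element of each chosen cluster.
    return {label: clusters[label] for label in chosen}

# Hypothetical population: 20 households labelled 0..19, grouped into 5 villages of 4.
households = list(range(20))
sample = cluster_sample(households, cluster_of=lambda h: h // 4, n_clusters=2, seed=1)
print(sample)
```

Here N = 5, M = 4 and n = 2, so the sample always contains nM = 8 elements, whichever clusters are drawn.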

Case of equal clusters

• Suppose the population is divided into N clusters and each cluster is of size M.

• Select a sample of n clusters from the N clusters by SRS, generally without replacement (WOR).

• So the total population size is NM and the total sample size is nM.

Let $y_{ij}$ be the value of the characteristic under study for the jth element (j = 1, 2, ..., M) in the ith cluster (i = 1, 2, ..., N).

Equal Number of Units (M) in each cluster

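Notations

$Y_{ij}$ = jth element of the ith cluster in the population

$y_{ij}$ = jth element of the ith cluster in the sample

$Y_i = \sum_{j=1}^{M} Y_{ij}$ = total of the ith cluster (equal number of elements per cluster)

$Y_i = \sum_{j=1}^{M_i} Y_{ij}$ (unequal number of elements per cluster)

$Y = \sum_{i=1}^{N} Y_i = \sum_{i=1}^{N}\sum_{j=1}^{M} Y_{ij}$ = population total

$Y = \sum_{i=1}^{N}\sum_{j=1}^{M_i} Y_{ij}$ (unequal number of elements per cluster)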

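Notations

$\bar{Y}_i = \frac{\sum_{j=1}^{M} Y_{ij}}{M} = \frac{Y_i}{M}$ = mean per element of the ith cluster

$\bar{Y}_i = \frac{\sum_{j=1}^{M_i} Y_{ij}}{M_i} = \frac{Y_i}{M_i}$ (unequal clusters)

$\bar{\bar{Y}} = \frac{\sum_{i=1}^{N} \bar{Y}_i}{N} = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M} Y_{ij}}{NM}$ = population mean per element

$\bar{y}_i = \frac{\sum_{j=1}^{M} y_{ij}}{M}$ = sample mean per element for the ith cluster

$\bar{\bar{y}} = \frac{\sum_{i=1}^{n} \bar{y}_i}{n} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{M} y_{ij}}{nM}$ = sample mean per element

Here we assume an equal number of elements in each cluster.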
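The notation can be made concrete with a small numerical illustration. The population below (N = 3 clusters of M = 2 elements) is hypothetical and not taken from the slides; the variable names are likewise only illustrative.

```python
from fractions import Fraction

# Hypothetical population (not from the slides): N = 3 clusters, M = 2 elements each.
Y = [[1, 3], [2, 6], [5, 7]]
N, M = len(Y), len(Y[0])

cluster_totals = [sum(row) for row in Y]                   # Y_i
cluster_means = [Fraction(t, M) for t in cluster_totals]   # mean per element of cluster i
population_total = sum(cluster_totals)                     # Y
population_mean = Fraction(population_total, N * M)        # population mean per element

# With equal cluster sizes, the mean of the cluster means equals the mean per element.
mean_of_cluster_means = sum(cluster_means) / N
print(cluster_totals, population_total, population_mean, mean_of_cluster_means)
```

Exact fractions are used so that the two forms of the population mean can be compared without rounding error.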

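Notations

$S^2 = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2}{NM-1}$ = population mean square among elements

$S_b^2 = \frac{\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2}{N-1}$ = population mean square between cluster means

$S_w^2 = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{Y}_i)^2}{N(M-1)}$ = population mean square within clusters

$s^2 = \frac{\sum_{i=1}^{n}\sum_{j=1}^{M}(y_{ij}-\bar{\bar{y}})^2}{nM-1}$ = sample mean square among elements

$s_b^2 = \frac{\sum_{i=1}^{n}(\bar{y}_i-\bar{\bar{y}})^2}{n-1}$ = sample mean square between cluster means

$s_w^2 = \frac{\sum_{i=1}^{n}\sum_{j=1}^{M}(y_{ij}-\bar{y}_i)^2}{n(M-1)}$ = sample mean square within clusters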

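The relation between $S^2$, $S_b^2$ and $S_w^2$ is as follows.

We have

$S^2 = \frac{\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2}{NM-1}$

$(NM-1)S^2 = \sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2 = \sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{Y}_i+\bar{Y}_i-\bar{\bar{Y}})^2$

$(NM-1)S^2 = \sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{Y}_i)^2 + M\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2$

(the cross-product term vanishes because $\sum_{j=1}^{M}(Y_{ij}-\bar{Y}_i) = 0$ for every cluster)

$(NM-1)S^2 = N(M-1)S_w^2 + M(N-1)S_b^2$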
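The decomposition can be checked numerically. The sketch below uses a hypothetical population of N = 3 clusters with M = 2 elements each (the data are an assumption for illustration only) and exact fractions, so both sides can be compared for equality.

```python
from fractions import Fraction

# Hypothetical population (not from the slides): N = 3 clusters, M = 2 elements each.
Y = [[1, 3], [2, 6], [5, 7]]
N, M = len(Y), len(Y[0])
means = [Fraction(sum(row), M) for row in Y]   # cluster means
grand = sum(means) / N                         # population mean per element

# Population mean squares, following the definitions in the notation.
S2 = sum((y - grand) ** 2 for row in Y for y in row) / (N * M - 1)
Sb2 = sum((m - grand) ** 2 for m in means) / (N - 1)
Sw2 = sum((y - m) ** 2 for row, m in zip(Y, means) for y in row) / (N * (M - 1))

lhs = (N * M - 1) * S2
rhs = N * (M - 1) * Sw2 + M * (N - 1) * Sb2
print(S2, Sb2, Sw2, lhs, rhs)
```

For this population the total sum of squares (28) splits into a within-cluster part (12) and a between-cluster part (16).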

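In cluster sampling the sample mean is an unbiased estimator of the population mean: $E(\bar{\bar{y}}) = \bar{\bar{Y}}$

Proof:

$E(\bar{\bar{y}}) = E\left(\frac{1}{n}\sum_{i=1}^{n}\bar{y}_i\right) = \frac{1}{n}\sum_{i=1}^{n}E(\bar{y}_i) = \bar{\bar{Y}}$

since SRSWOR is used, so that $E(\bar{y}_i) = \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_i = \bar{\bar{Y}}$ for each selected cluster.

Thus $\bar{\bar{y}} = \bar{y}_{cl}$ is an unbiased estimator of $\bar{\bar{Y}}$.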
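Unbiasedness can be illustrated by listing every possible SRSWOR sample of clusters and averaging the resulting estimates. The population below is hypothetical (N = 3, M = 2, n = 2) and is assumed only for this sketch.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population (not from the slides): N = 3 clusters, M = 2 elements each.
Y = [[1, 3], [2, 6], [5, 7]]
N, M, n = len(Y), len(Y[0]), 2
cluster_means = [Fraction(sum(row), M) for row in Y]
population_mean = sum(cluster_means) / N

# Under SRSWOR every subset of n clusters is equally likely,
# so the expectation is the plain average over all subsets.
estimates = [sum(cluster_means[i] for i in s) / n for s in combinations(range(N), n)]
expected_value = sum(estimates) / len(estimates)
print(estimates, expected_value, population_mean)
```

The three possible samples give the estimates 3, 4 and 5, whose average equals the population mean per element.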

Theorem: In simple random sampling without replacement of n clusters from a population of N clusters, each containing M elements,

$V(\bar{\bar{y}}) = \frac{(1-f)S_b^2}{n}$

The variance of $\bar{\bar{y}}$ can be derived along the same lines as the variance of the sample mean in SRSWOR. The only difference is that in SRSWOR the sampling units are $y_1, y_2, \ldots, y_n$, whereas in cluster sampling the sampling units are the cluster means $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_n$.

$V(\bar{\bar{y}}) = E(\bar{\bar{y}}-\bar{\bar{Y}})^2 = \frac{N-n}{Nn}S_b^2 = \frac{(1-f)S_b^2}{n}$

where $f = n/N$ and $S_b^2 = \frac{1}{N-1}\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2$ is the mean sum of squares between the cluster means in the population.
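The variance formula can be compared with the variance obtained by enumerating all possible samples. The sketch assumes a hypothetical population (N = 3, M = 2) and n = 2; none of the numbers come from the slides.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population (not from the slides): N = 3 clusters, M = 2 elements each.
Y = [[1, 3], [2, 6], [5, 7]]
N, M, n = len(Y), len(Y[0]), 2
cluster_means = [Fraction(sum(row), M) for row in Y]
population_mean = sum(cluster_means) / N

# Variance from the theorem: (1 - f) * S_b^2 / n with f = n / N.
Sb2 = sum((m - population_mean) ** 2 for m in cluster_means) / (N - 1)
f = Fraction(n, N)
formula_variance = (1 - f) * Sb2 / n

# Variance by brute force over all equally likely SRSWOR samples of n clusters.
estimates = [sum(cluster_means[i] for i in s) / n for s in combinations(range(N), n)]
enumerated_variance = sum((e - population_mean) ** 2 for e in estimates) / len(estimates)
print(formula_variance, enumerated_variance)
```

Both routes give 2/3 for this population, as the theorem predicts.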

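Theorem: In simple random sampling without replacement of n clusters from a population of N clusters, each containing M elements,

$V(\bar{\bar{y}}) = \frac{(1-f)S_b^2}{n} = \frac{(1-f)(NM-1)S^2\{1+(M-1)\rho_{cl}\}}{M^2(N-1)n} \approx \frac{(1-f)S^2\{1+(M-1)\rho_{cl}\}}{Mn}$

where $f = \frac{n}{N}$.

Proof:

$V(\bar{\bar{y}}) = \frac{(1-f)S_b^2}{n} = \frac{(1-f)}{n(N-1)}\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2$

where $\bar{\bar{Y}} = \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_i$ and $S_b^2 = \frac{\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2}{N-1}$.

Now,

$\sum_{i=1}^{N}(\bar{Y}_i-\bar{\bar{Y}})^2 = \sum_{i=1}^{N}\left[\frac{1}{M}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})\right]^2$

$= \frac{1}{M^2}\left[\sum_{i=1}^{N}\sum_{j=1}^{M}(Y_{ij}-\bar{\bar{Y}})^2 + 2\sum_{i=1}^{N}\sum_{j<k}^{M}(Y_{ij}-\bar{\bar{Y}})(Y_{ik}-\bar{\bar{Y}})\right]$

$= \frac{1}{M^2}(NM-1)S^2 + \frac{1}{M^2}(M-1)(NM-1)\rho_{cl}S^2$

$= \frac{1}{M^2}(NM-1)S^2\{1+(M-1)\rho_{cl}\}$

Hence,

$V(\bar{\bar{y}}) = \frac{(1-f)}{M^2 n(N-1)}(NM-1)S^2\{1+(M-1)\rho_{cl}\}$

When the population size is very large, $NM-1 \approx NM$ and $M(N-1) \approx MN$. Therefore,

$V(\bar{\bar{y}}) \approx \frac{(1-f)S^2\{1+(M-1)\rho_{cl}\}}{Mn}$

Intraclass Correlation

$\rho_{cl} = \frac{E(Y_{ij}-\bar{\bar{Y}})(Y_{ik}-\bar{\bar{Y}})}{E(Y_{ij}-\bar{\bar{Y}})^2} = \frac{2\sum_{i=1}^{N}\sum_{j<k}^{M}(Y_{ij}-\bar{\bar{Y}})(Y_{ik}-\bar{\bar{Y}})}{(M-1)(NM-1)S^2}$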
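The intraclass correlation and both forms of the variance can be evaluated on a small example. The population (N = 3, M = 2), the sample size n = 2 and all variable names are assumptions made for this sketch. With such a small N the large-population approximation is expected to differ from the exact value.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population (not from the slides): N = 3 clusters, M = 2 elements each.
Y = [[1, 3], [2, 6], [5, 7]]
N, M, n = len(Y), len(Y[0]), 2
grand = Fraction(sum(map(sum, Y)), N * M)   # population mean per element
S2 = sum((y - grand) ** 2 for row in Y for y in row) / (N * M - 1)

# Sum of cross products over all pairs j < k within each cluster.
cross = sum((a - grand) * (b - grand) for row in Y for a, b in combinations(row, 2))
rho = 2 * cross / ((M - 1) * (N * M - 1) * S2)

f = Fraction(n, N)
Sb2 = sum((Fraction(sum(row), M) - grand) ** 2 for row in Y) / (N - 1)
exact = (1 - f) * (N * M - 1) * S2 * (1 + (M - 1) * rho) / (M ** 2 * (N - 1) * n)
approx = (1 - f) * S2 * (1 + (M - 1) * rho) / (M * n)
print(rho, exact, (1 - f) * Sb2 / n, approx)
```

Here the intraclass correlation is 1/7, the exact expression reproduces (1 − f)S_b²/n = 2/3, and the approximation gives 8/15, a gap that closes as N grows.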

Note: The efficiency of cluster sampling with respect to SRSWOR of nM elements from the whole population is

$E = \frac{\text{variance of random sampling}}{\text{variance of cluster sampling}} = \frac{(1-f)S^2/(nM)}{(1-f)S_b^2/n} = \frac{S^2}{MS_b^2}$

Using $(NM-1)S^2 = N(M-1)S_w^2 + M(N-1)S_b^2$,

$E = \frac{N(M-1)S_w^2 + M(N-1)S_b^2}{(NM-1)MS_b^2} = \frac{1}{NM-1}\left[\frac{N(M-1)S_w^2}{MS_b^2} + (N-1)\right]$

Thus the relative efficiency increases when $S_w^2$ is large and $S_b^2$ is small. So cluster sampling will be efficient if the clusters are formed so that the variation between cluster means is as small as possible while the variation within the clusters is as large as possible.
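The efficiency can be computed both directly and from the within and between mean squares. This sketch uses a hypothetical population (N = 3, M = 2) chosen only for illustration. The decomposed form divides by NM − 1, which follows from substituting (NM − 1)S² = N(M − 1)S_w² + M(N − 1)S_b² for S².

```python
from fractions import Fraction

# Hypothetical population (not from the slides): N = 3 clusters, M = 2 elements each.
Y = [[1, 3], [2, 6], [5, 7]]
N, M = len(Y), len(Y[0])
means = [Fraction(sum(row), M) for row in Y]
grand = sum(means) / N

S2 = sum((y - grand) ** 2 for row in Y for y in row) / (N * M - 1)
Sb2 = sum((m - grand) ** 2 for m in means) / (N - 1)
Sw2 = sum((y - m) ** 2 for row, m in zip(Y, means) for y in row) / (N * (M - 1))

# Efficiency of cluster sampling relative to SRSWOR of nM elements.
efficiency = S2 / (M * Sb2)
# Same quantity expressed through S_w^2 and S_b^2.
decomposed = (N * (M - 1) * Sw2 / (M * Sb2) + (N - 1)) / (N * M - 1)
print(efficiency, decomposed)
```

For this population the efficiency is 7/10, so cluster sampling is less precise than SRSWOR of the same number of elements. That is consistent with the note, since here the between-cluster variation is not small relative to the within-cluster variation.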

Merits and Demerits of Cluster Sampling

Merits

• It is useful when a sampling frame of elements is not readily available

• It is the most time-efficient and cost-efficient probability design for large geographical areas

• From a practical viewpoint, this method is easy to use, cheaper and faster

• A larger sample size can be used because prospective sample group members are more accessible

Demerits

• Requires group level information.

• The efficiency decreases as the size of the clusters increases

• Enumeration of the sampling units within a cluster is difficult when the population size is large

Questions

1. What is cluster sampling? When is cluster sampling more effective than other methods of sampling?

2. Derive the unbiased estimator of the population mean and its variance under cluster sampling.

3. Differentiate between simple random sampling and cluster sampling.

4. Prove that

$V(\bar{\bar{y}}) = \frac{(1-f)S_b^2}{n} = \frac{(1-f)(NM-1)S^2\{1+(M-1)\rho_{cl}\}}{M^2(N-1)n} \approx \frac{(1-f)S^2\{1+(M-1)\rho_{cl}\}}{Mn}$

where $f = \frac{n}{N}$.