university of massachusetts amherst · vi abstract use of random permutation model in rate...
TRANSCRIPT
USE OF RANDOM PERMUTATION MODEL
IN RATE ESTIMATION AND STANDARDIZATION
A Dissertation Presented
by
WENJUN LI
Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment
of the requirements for the degree of
DOCTOR OF PHILOSOPHY
February 2003
Public Health Biostatistics and Epidemiology
© Copyright by Wenjun Li 2003
All Rights Reserved
USE OF RANDOM PERMUTATION MODEL
IN RATE ESTIMATION AND STANDARDIZATION
A Dissertation Presented
by
WENJUN LI
Approved as to style and content by: ____________________________________
Edward J. Stanek III, Chair ____________________________________
Carol Biglow, Member ____________________________________
John P. Buonaccorsi, Member
_________________________________________ Philip C. Nasca, Department Chair Biostatistics and Epidemiology
DEDICATION
To my parents Li Beihua and Wu Dehua,
and my wife Lu Lu
v
ACKNOWLEDGMENTS
I would like to express appreciation on Professors Edward J. Stanek III, John P.
Buonaccorsi and Carol Biglow. I am deeply grateful to Professor Stanek, without his
encouragement and guidance, this dissertation research would not be completed
successfully. I thank Drs David Hosmer and John Griffith for their helpful comments.
vi
ABSTRACT
USE OF RANDOM PERMUTATION MODEL IN RATE ESTIMATION AND
STANDARDIZATION
FEBRUARY 2003
WENJUN LI, B.S., FUDAN UNIVERSITY
M.S., UNIVERSITY OF MASSACHUSETTS AMHERST
Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST
Directed by: Professor Edward J. Stanek, III
Through integrating techniques from several areas in survey sampling, we
develop an alternative method of deriving estimators using random permutation models
under (stratified) simple random sampling without replacement. The finite random
permutation model links the samples to the population. The joint permutation of
response and auxiliary variables is modeled using seemingly unrelated regression. We
use prediction theory from the super-population sampling literature to derive the linear
unbiased minimum variance predictors of population means under the design-based
framework using the finite estimating equation approach of Binder and Patak (1994).
The predictors have functional forms similar to those derived using design-based,
model-assisted and calibration approaches, but depend on neither superpopulation nor
regression model assumptions. We applied the results to standardization of multiple
rates, and illustrate how our methods account for the covariance of the standardized
rates, unlike conventional standardization methods.
vii
TABLE OF CONTENTS
Page
ACKNOWLEDGMENTS ................................................................................................ v ABSTRACT ..................................................................................................................... vi LIST OF TABLES ............................................................................................................ x LIST OF FIGURES ......................................................................................................... xi CHAPTER 1. INTRODUCTION ........................................................................................................ 1
1.1. Motivations .................................................................................................. 1 1.2. Rate estimation and adjustment in epidemiology ........................................ 2 1.3. Three major approaches in survey sampling ............................................... 7 1.4. Calibration estimation ................................................................................ 11 1.5. Direct adjustment and poststratification .................................................... 12 1.6. Sampling framework .................................................................................. 13 1.7. Seemingly unrelated regression models .................................................... 15 1.8. Functional form approach in survey sampling .......................................... 16 1.9. Optimization methods ................................................................................ 18
2. NOTATIONS, DEFINITIONS AND BASICS OF RANDOM PERMUTATION
MODELS ............................................................................................................ 20
2.1. Notation ..................................................................................................... 20 2.2. Definition of population and population parameters ................................. 22 2.3. Single stage random permutation .............................................................. 23 2.4. Simultaneous permutation of multiple variables ....................................... 24 2.5. Sampling and partition of random matrices ............................................... 26 2.6. Derivation of variance-covariance matrix of ( )vec U ............................... 27 2.7. Derivation of variance-covariance matrix in Section 2.4. ......................... 28 2.8. Derivation of variance-covariance matrix of the partitioned matrix ......... 32
3. GENERAL ESTIMATION STRATEGY .................................................................. 35
3.1. Parameters of interest ................................................................................. 37 3.2. Random permutation seemingly unrelated regression model .................... 37 3.3. Simultaneous estimation of 1p + population totals .................................. 39 3.4. Example: best linear unbiased estimator of a population total .................. 41 3.5. Estimation of a population total using auxiliary information .................... 42
viii
3.6. Enumeration and simulation ...................................................................... 45 3.7. Research objectives .................................................................................... 46
4. ESTIMATION OF POPULATION TOTALS ........................................................... 49
4.1. Motivations ................................................................................................ 49 4.2. Parameters of interest ................................................................................. 50 4.3. Formulation of a RPSUR model ................................................................ 50 4.4. Two classes of estimators for population total vector ............................... 51 4.5. Estimating multiple population totals ........................................................ 53 4.6. Remarks ..................................................................................................... 57 4.7. Derivations of results in Section 4.5.1. ...................................................... 58 4.8. Derivations of results in Section 4.5.2. ...................................................... 61 4.9. Derivations of results in Section 4.5.3. ...................................................... 64 4.10. Derivations of results in Section 4.5.4. ...................................................... 65
5. ESTIMATING POPULATION TOTALS BY REPARAMETERIZATION ............. 68
5.1. Motivation .................................................................................................. 68 5.2. Formulation of the model through reparameterization .............................. 68 5.3. Estimating ( )0T and ( )0μ with a known auxiliary population total ........... 70 5.4. Estimating ( )0T and ( )0μ with several known auxiliary totals .................. 74 5.5. Estimation of variance and confidence intervals ....................................... 78 5.6. Remarks ..................................................................................................... 82 5.7. Derivations of results in Section 5.3. ......................................................... 83 5.8. Derivations of results in Section 5.4. ......................................................... 87
6. ESTIMATORS BASED ON RANDOM PERMUTATION MODELS UNDER
STRATIFIED SIMPLE RANDOM SAMPLING .............................................. 90
6.1. Motivation .................................................................................................. 90 6.2. Definition and notations ............................................................................. 90 6.3. Independent permutation of one response variable across H strata ......... 92 6.4. Simultaneous permutations of one response variable and p auxiliary
variables across H strata ........................................................................... 93 6.5. Stratified random sampling and partition of random matrices .................. 95 6.6. Parameter of interest .................................................................................. 96 6.7. Formulation of the model when auxiliary variables are present ................ 96 6.8. Estimating ( )0T when one auxiliary variable is present ............................ 97 6.9. Derivations of results in Section 6.8.4. .................................................... 103 6.10. Derivations of results in Section 6.8.5. .................................................... 107
ix
7. APPLICATIONS TO RATE ESTIMATION .......................................................... 111
7.1. Motivations .............................................................................................. 111 7.2. Cases with single categorical auxiliary variable ...................................... 111 7.3. Cases with two categorical variables ....................................................... 125 7.4. Cases with single continuous auxiliary variable ...................................... 128 7.5. Derivations of results (7.1) and (7.2) in Section 7.2. ............................... 146 7.6. Derivations of results (7.3) and (7.4) in Section 7.2. ............................... 150
8. DIRECT RATE STANDARDIZATION USING RANDOM PERMUTATION
MODELS .......................................................................................................... 154
8.1. Motivations .............................................................................................. 154 8.2. Rate estimation with internal direct adjustment ...................................... 154 8.3. Rate comparison with internal direct adjustment .................................... 160 8.4. External direct adjustment ....................................................................... 162 8.5. Discussion ................................................................................................ 163 8.6. Derivations of the results in Section 8.2. ................................................. 164
9. CONCLUDING REMARKS ................................................................................... 172 BIBLIOGRAPHY ......................................................................................................... 177
x
LIST OF TABLES
Table Page
4.1. Two classes of unbiased estimators of population totals under two optimization criteria ...................................................................................... 53
7.1. Mean squared error (x 10,000) of four estimators by population size and sample size .................................................................................................. 116
7.2. Coverage rates and width of 95% confidence intervals .................................. 117
7.3. Percentage of left- and right-misses of the 95% confidence intervals, by population size, sample size and estimation methods ................................. 118
7.4. Two categorical auxiliary variable examples .................................................. 125
7.5. Parameters of hypothetical population Series A and B ................................... 133
7.6. MSE of design-based estimators adjusted to known continuous or categorical auxiliary variables, by population size and sample size (Series A) .................................................................................................... 134
7.7. MSE of design-based estimators adjusted to known continuous or categorical auxiliary variables, by population size and sample size (Series B) .................................................................................................... 135
7.8. Probability of encountering negative variance when sample size is small and age is used as a continuous or categorical auxiliary variable .............. 142
7.9. Coverage rates and width of 95% confidence interval using age as a continuous or categorical variable, by population size and sample size (Series A) .................................................................................................... 144
7.10. Coverage rates and width of 95% confidence interval using age as a continuous or categorical variable, by population and sample sizes (Series B) .................................................................................................... 145
xi
LIST OF FIGURES
Figure Page
3.1. General Estimation Strategy .............................................................................. 36
7.1. Mean squared error by population size and sample size ................................. 119
7.2. MSE reduction (%) of a design-based predictor (DPS or DPP ) by population and sample sizes ....................................................................... 120
7.3. MSE Ratio of design-based predictors to SRS estimator by population and sample sizes .......................................................................................... 121
7.4. Hypothetical relationship between smoking and age of population series A and B ....................................................................................................... 132
7.5. MSE and variance of design-based predictors adjusted for continuous or categorical age (Series A) ........................................................................... 137
7.6. MSE and variance ratios of estimators adjusted for age median, quartiles, octiles and sedeciles to estimators adjusted for age mean (Series A) ......... 138
7.7. MSE and variance of design-based predictors adjusted for continuous or categorical age (Series B) ........................................................................... 140
7.8. MSE and variance ratios of estimators adjusted for age median, quartiles, octiles and sedeciles to estimators adjusted for age mean (Series B) ......... 141
1
CHAPTER 1
INTRODUCTION
Estimation and comparison of disease prevalence based on survey samples is
one of the most common procedures in epidemiological investigation. Analysis of
survey data often involves “adjustment” or “control” for covariates. A methodological
issue that arises is how to adjust for the imbalance of covariates in a sample or to
remove “confounding”. For example, a sample may have a very different gender ratio
than that of the parent population. This issue is closely related to the theory of survey
sampling, and application of recently advanced methods for addressing this issue may
improve methodologies in Epidemiological surveys.
1.1. Motivations
Rate estimation and standardization in epidemiology and vital statistics, in
particular, is closely related to calibration, poststratification and generalized regression
estimation (GREG) in survey sampling. Under certain assumptions, directly
standardized rates can be viewed as estimators “calibrated” to known population or
subpopulation totals, or estimators based on poststratification, or GREG estimators
based on group mean models. These methods of estimation have experienced significant
theoretical and practical advances in the past twenty years. Application of these
techniques may lead to a new class of estimators that will improve rate estimation and
standardization procedures commonly used in epidemiology and vital statistics.
This dissertation research develops alternative methods of defining and
estimating population proportions (means) based on a functional form approach (Binder
2
and Patak 1994; Thompson 1997) and calibration methods (Deville and Särndal 1992;
Särndal, Swensson and Wretman 1992; Brewer 1995; Estevao and Särndal 2000).
Seemingly unrelated regression and finite random permutation model frameworks
(Stanek, Singer and Lencina 2003) are used as vehicles for mathematical deviation.
Research focuses on developing estimators for population totals and proportions in the
setting of simple random sampling without replacement (SRSWOR) and stratified
simple random sampling without replacement (STSRS).
In the following sections, I will briefly review current methods of rate
standardization and advances in several areas of survey sampling that are directly
relevant to the estimation methods proposed in this dissertation. This will lay a ground
for theoretical development of later Chapters.
1.2. Rate estimation and adjustment in epidemiology
Various methods of adjusting for the imbalance of covariates resulting from
complex sampling have been developed in the epidemiologic literature (Rothman 1986;
Breslow and Day 1987; Rosenbaum 1987; Esteve, Benhamou and Raymond 1994).
When applied to rate estimation, these methods are used to “control” or “adjust” for
confounding, and produce a single index that is “comparable” between populations or
subpopulations. For example, in epidemiological research, disease morbidity or
mortality experience (hereafter, referred to more generally as prevalence) is often
compared across geographical areas that have different distribution with respect to the
control variables (e.g., age). Direct comparison of crude prevalence rates may be
severely confounded by the difference in the population structure of the areas being
compared when the difference is related to the outcome of interest (e.g., mortality).
3
“Standardization” or “adjustment” needs to be made to remove such effects of
“confounding”. A typical example is the gender- and age-adjusted mortality rates in
vital statistics.
Adjustment for confounding can be made with or without a multiple regression
model (Kahn and Sempos 1989). Methods not based on regression models are
commonly referred to as “standardization” in literature. Briefly, a standardized rate is a
weighted average of the stratum specific prevalence rates in the study populations,
where weights are determined by the relative frequencies of the control variable in the
referred population. Standardization methods can be grossly classified into “direct” or
“indirect” standardization according to whether a same weighting system is applied to
different populations (Kahn and Sempos 1989). We will discuss these methods in more
detail in the subsequent sections.
Adjustment can be also made based on regression models (Kahn and Sempos
1989). For example, a categorical auxiliary variable can be used for poststratification; a
continuous variable strongly correlated with the response variable can be used to
improve the estimation using ratio or regression estimation. When all the covariates in
the regression model are categorical, they are analogous to directly standardized rates
but include the possibility of adjusting for many variables at one time (Feldstein 1966;
Grizzle, Starmer and Koch 1969; Forthofer and Lehnen 1981). Continuous covariates
can be included in the model in such cases, and it leads to generalized regression
estimators (GREG) and calibration estimators.
4
1.2.1. Rate standardization and finite sampling
Direct standardization of rates can be viewed as special cases of finite sampling.
A fundamental assumption is that the populations under comparison arise from the same
parent population or are similar with respect to factors associated with the event under
study (Fleiss 1981), which is often implied by the “null hypothesis” (Neison 1844). The
choice of a “standard population” is, in statistical term, to define the “parent
population” under study. In addition, the populations under comparison are considered
as realizations of random samples from that parent population. Under such assumptions,
directly standardized rates may be considered as special cases of calibration estimators
or generalized regression (GREG) estimators. The weight systems for standardization
are “calibrated” to the known population or group quantities of the covariates, and the
standardized rates are estimated (predicted) rates of the “parent population” based on
these samples.
Direct standardization of rates, under certain assumptions, is also related to the
class of estimators commonly referred to as “poststratified estimators” (Thomsen 1978;
Oh and Scheuren 1983; Särndal, Swensson and Wretman 1992). A presumption in this
methodology is that random sampling may result in an imbalance of auxiliary variables
but not the probability distribution of the study variable given the auxiliary values
(Mukhopadhyay 2000).
1.2.2. Direct standardization
The purpose of direct adjustment is to present an estimate of event prevalence
that characterizes a population if the distribution of the confounding variables were the
same as that in the standard population. The adjusted totals or rates are then
5
“comparable” between populations. There are two major approaches, i.e., internal and
external comparisons. The former involves only the populations under comparison, the
later involves an outside reference population with a known population structure. The
choice of a “standard population” can make a substantial difference in the standardized
rates and may have significant impact on the interpretation of study results. This
weakness has renewed interest in methodological research on the appropriateness and
validity of rate standardization to describe the discrepancy in event prevalence among
different populations (CDC 1999; Choi, de Guia and Walsh 1999; Goldman and
Brender 2000; Julious, Nicholl and George 2001).
Once the standard population (SP) is chosen, direct standardization methods
utilize the distribution of the confounders in this population to define weights that are
used in the weighted average of prevalence estimates for all populations under
comparison. To illustrate this method, we consider the following simple case. Suppose
that two populations, indexed as 1 2i , ,= are under comparison. Assume each
population has J age strata that are indexed as 1 2j , , ,J= … and that age is a
confounder of the study outcome. The Directly Standardized Rate (DSR) for population
i is defined as
1DSR J j
i ijjSP
Nr
N== ∑ ,
where ijr is the rate in the j-th stratum of the i -th population, j SPN N is the relative
frequency of stratum j in the standard population. Using DSRs, a Directly
Standardized Rate Ratio (DSRR) can be defined as
6
11
111
Predicted overall rate of SP withstratum-specific rates of population DSRR
Actual overall rate of SP
JSP j ij Jj
i j ijJ jSP j jj
N N ri w rN N R
−=
=−=
= = =∑ ∑∑
,
where ( ) 1
1
Jj j j jj
w N N R−
== ∑ . Notice that the weights ( jw ) are fixed and independent
of the populations under comparison (or not dependent on samples); therefore, the
weight system is “identical” for all populations under comparison. DSRR is commonly
referred as the Comparative Mortality Figure (CMF) or the Standardized Rate Ratio
(SRR) in the literature. Since we are interested in not only mortality rates but also other
prevalence rates, we use a more general term DSRR.
1.2.3. Indirect standardization, ISR and ISRR
Indirect standardization utilizes as its weighting system the distribution of the
confounder in the comparison (e.g., “exposed”) population. Thus, a separate
“Indirectly Standardized Rate Ratio” (ISRR) is produced for each population being
compared. ISRR is commonly referred as “Standardized Mortality Ratio” (SMR) in
vital statistics. The indirectly standardized rate for population i is defined as
[ ]
1 1
1 1
1
Observed overall rate in population ISR SP overall ratePredicted overall rate in population
using stratum-specific rates of SP
i
J Jij ij ijj j
J Jij j ijj j
Jij ijj
ii
N r NR
N R N
R v r
= =
= =
=
= ×
= ×
=
∑ ∑∑ ∑∑
where 1 1
J Jj j jj j
R N R N= =
= ∑ ∑ is the overall rate of the standard population, a known
constant; 1
Jij ij ij jj
v N N R=
= ∑ , is dependent on the size ( )ijN of group j in population
i , therefore each population under comparison will have different weighting factor
7
unless all corresponding group relative frequency distribution of the control variable
cross populations are equal.
The ISRR of population i is a rate ratio defined as
11
1
ISRISRRJ
ij ij Jjii ij ijJ j
ij jj
N rv r
R N R=
=
=
= = =∑
∑∑
.
ISRR is the most commonly used technique and has been used to inform allocation of
resources and health-related police-making (Julious, Nicholl and George 2001).
1.2.4. Other standardization methods
There are many other standardization methods in literature (Inskip et al. 1983;
Julious, Nicholl and George 2001). For example, Lee and Liaw proposed a new
weighting system to compute directly standardized rates, where the set of weights are
determined by the data rather than specified a priori, and optimized for stability while
ensuring the comparability of the rates between populations (Lee and Liaw 1999). This
method requires an assumption that the event counts follow a Poisson distribution, and
achieves its optimality by minimizing the total variance of the age-adjustment indices in
the logarithmic scale. Based on such optimized weights, (Lee 2002) proposed a new
standardized summary measure - “harmonically weighted ratio” that can be applied to
external and internal comparisons if the rate-ratio homogeneity assumption holds, i.e.,
rate-ratios of different groups are equal.
1.3. Three major approaches in survey sampling
Current sample survey methods include three major approaches: 1) design-
based, 2) model-based and 3) model-assisted approaches. What distinguishes them is
the choice of probability model that is used for statistical inference.
8
The design-based approach uses probability sampling (randomization) for both
sample selection and inference from sample data. The probability distribution
associated with the randomized sample selection provides the basis for probabilistic
inferences about the target population. The bias, variance and mean squared error
(MSE) are defined in terms of the expectation over all possible samples under the
sampling design. This approach leads to valid repeated sampling inference regardless of
the population structure, and is free from “model misspecification”. The design-based
framework established by (Horvitz and Thompson 1952) and (Godambe 1955) is a
typical example. For example, the population total may be written as 1N NT ×′=y 1 y , and
its Horvitz-Thompson estimator is 1HT n s sT −′= 1 Π y , where sy is an n-vector of the sample
values of y ; Π and sΠ are N N× and n n× diagonal matrices of inclusion
probabilities, respectively, and n′1 and N′1 indicate row vectors with all elements equal
to one. When auxiliary variables are available, the Horvitz-Thompson (HT) estimator
can be written as
( ) ( )( )
( )ˆ
ˆˆHT
HTRHT
TT T
T=
yy x
x
where ( )T x and ( )HTT x are population total of x and its HT estimator, respectively
(Brewer 1963). This estimator is asymptotically design-unbiased and design-consistent,
i.e., ( ) ( ) 0HTRˆE T y T y⎡ ⎤− →⎣ ⎦ and ( ) ( ) 0HTRPr T y T y ε⎡ ⎤− > →⎣ ⎦ when both sample and
population sizes become large ( ), n N→∞ →∞ (Brewer 1979; Särndal and Wright
1984).
9
The Model- or prediction-based approach assumes that the target population
follows a specified model (distribution), and the model distribution yields valid
inferences conditioning on the particular sample that has been drawn (Rao 1997;
Brewer 1999). The bias, variance and mean squared error (MSE) are defined in terms of
the expectation over all possible realizations of a stochastic model that connects the
variable of interest to a set of auxiliary variables (Brewer 1995). Model-dependent
methods can perform poorly as evaluated by the expected value of the model based
MSE over the design in large samples if the model is misspecified (Hansen, Madow and
Tepping 1983; Rao 1997). Model-based approaches have application when handling
nonsampling errors, such as measurement errors and non-response (Brewer 1963;
Royall 1970; Brewer 1995). In small-area estimation, model-based estimators can have
smaller MSE than design based estimators. The best linear unbiased predictor (BLUP)
is a typical model-based estimator (Ghosh and Rao 1994; Rao 1997).
The model-assisted approach is a hybrid of design-based and model-based
approaches. Plausible population models are often used to choose designs and efficient
design-consistent estimators, but the inference remains design-based. This means that
probability statements refer to the set of all possible samples that can be drawn with the
given probability sampling design. In the model-assisted approach, models may be used
to construct estimators, but randomization must be used to select the sample, and
statistical properties of estimators are computed with respect to the probability sampling
distribution (Särndal, Swensson and Wretman 1992; Rao 1999). The resulting estimator
is model-unbiased under the assumed model as well as design consistent (Särndal,
Swensson and Wretman 1992; Rao 1999). The model-assisted approach provides a
10
formal framework for using auxiliary information at the estimation stage. The popular
generalized regression estimator (GREG) is an example of this type (Cassel, Särndal
and Wretman 1977; Särndal, Swensson and Wretman 1992). For example, the general
form of GREG estimator of the population total is
( ) ( ) ( ) ( ) ˆ垐 �GREG HT HT GREGT T T T⎡ ⎤= + −⎣ ⎦y y X X β (1.1)
where X is an N p× matrix of auxiliary variables, ( ) NT ′=X 1 X is the 1 p× row vector
of its column sums, ( ) 1HT n s sT −′=X 1 Π X is the Horvitz-Thompson estimators of ( )T X ,
sX is the n p× matrix of known auxiliary variables of sampled subjects in a
hypothesized linear model linking y and X, e.g., y = Xβ+ ε , where ( )Eξ =ε 0 ,
( ) 2Eξ σ′ =εε A , A is a N N× diagonal matrix. Under this model, the estimator of the
p -vector regression parameter β is
( ) 11 1 1 1 1 1ˆGREG s s s s s s s s
−− − − − − −=β X A Π X X A Π y . (1.2)
This estimator is design-consistent and asymptotically design-unbiased. It is also an
estimator “calibrated” against all the auxiliary variables in X (Deville and Särndal
1992).
However, model-assisted approaches have also their limitations. When the
model is correctly specified, model-assisted inferences are inferior to those from model-
based approach (Rao 1997); when the model is misspecified, they are inferior to the
unconditional designed based approach (Rao 1997).
11
1.4. Calibration estimation
Calibration, or cosmetic calibration (Särndal and Wright 1984; Brewer 1999), is
another popular method of incorporating known auxiliary information to improve
estimation. This approach does not necessarily need a model at all (Thompson 1997).
The strategy is to identify a set of sample-dependent (or sometimes called adjusted)
weights, such that when they are applied to auxiliary variables, the “estimator” of the
auxiliary quantities will equate the known corresponding auxiliary quantities (Deville
and Särndal 1992; Särndal, Swensson and Wretman 1992). The rational behind
calibration methods is that a weight system should be good for a response variable if it
is good for a strongly correlated auxiliary variable (Deville and Särndal 1992; Särndal,
Swensson and Wretman 1992; Brewer 1999).
The idea can be illustrated using the following simple example. Suppose we are
interested in estimating the prevalence of teenager smoking in a population of size N ,
and the total number of male and female teenagers are known. It is believed that male
teenagers are much more likely to be smokers than their female peers. A SRSWOR
sample of teenagers of a predetermined size ( )n is drawn from the population. Without
using any auxiliary variable, one would use a simple expansion weight nNn
=a 1 to
estimate the prevalence. Using the calibration method, one can find a set of sample-
dependent weights ( )w by minimizing the squared distance between w and a subject
to a constraint such that when w is applied to “estimating” the total number of males or
females, the result must be equal to the known numbers of males and females. In
numerous publications, calibration estimators have been shown to be more efficient
12
than the sample expansion estimators (Deville and Särndal 1992; Särndal, Swensson
and Wretman 1992).
1.5. Direct adjustment and poststratification
In the statistical literature, direct adjustment and poststratification are often
referred to as the same estimation procedure (Cochran 1977; Holt and Smith 1979;
Little 1982; Oh and Scheuren 1983; Rosenbaum 1987). While “direct adjustment” in
epidemiology does not necessarily mean the same as in statistical literature, “direct
adjustment” in epidemiology can be viewed as a special case of poststratification in
survey sampling if the “standard population” is viewed as the parent population from
which the populations under comparison arise.
Poststratification has been discussed intensively in both model-based and
model-assisted approaches (Royall 1971; Holt and Smith 1979; Rosenbaum 1987;
Smith 1990; Särndal, Swensson and Wretman 1992). Using a model based approach
and assuming a distribution function of the response variable y and the auxiliary
variable x , i.e., under a superpopulation model, the post-stratified estimator of mean
(proportion or rate) is shown to be a maximum likelihood estimator of the population
mean (Jagers 1986; Mukhopadhyay 2000). Rosenbaum (1987) showed that sample
mean and poststratified or direct adjusted estimators are special cases of model-based
direct adjustment under two different extreme models for the subclass-specific selection
probabilities. The model-based poststratification approach is also used to improve
small-area estimations, by forming “synthetic estimators” (Holt, Smith and Tomberlin
1979).
13
From the perspective of a model-assisted approach, “poststratified estimators”
are a subclass of GREG estimators related to population groups (Särndal, Swensson and
Wretman 1992; Rao 1999). The underlying model is a group mean model, where group
membership is identified after sampling for sampled subjects, and the population group
sizes may or may not be known (Särndal, Swensson and Wretman 1992; Rao 1999).
Poststratification is particularly useful in multipurpose surveys because of its flexibility
in forming poststratra as compared to stratified simple random sampling.
Poststratification is a common tool for reducing nonresponse bias for survey
data (Oh and Scheuren 1983; Smith 1990; Zhang 1999). When non-response
probabilities are independent of the study outcome variable, poststratification can
efficiently reduce non-response bias.
1.6. Sampling framework
Making inference from samples drawn from a population is fundamental to
epidemiology surveys and statistics. In order to make inference, a probabilistic “link” is
needed to relate a sample to its parent population. This dissertation will adopt random
permutation models to provide such necessary “links” (Stanek, Singer and Lencina
2003). Permutation models were discussed more comprehensively in the earlier work of
Cassel, Särndal and Wretman (1977) and Rao and Bellhouse (1978). This framework,
similar to conventional design-based inference, does not require assumptions about
parametric distributions. The permuted population consists of random variables
corresponding to permutations of subjects in the finite population, where permutations
occur with equal probabilities. The population is represented as a non-stochastic vector
of fixed elements, whereas the permuted population is a similar vector of random
14
variables. Parameters in the population consist of the population values, and functions
of these parameters. Parameters of the permuted population are defined as the expected
value of functions of permuted population of random variables. The randomness is thus
attributable to random sampling (permutation). Inference can be made through
evaluating the connection of parameters in the population with parameters in the model
for the permuted population. Under this framework, the structural relationship between
the permuted population and the population can be explicitly represented using
matrices. This provides a convenient mathematical vehicle when regression models are
involved.
Briefly, random variables and a stochastic model are defined to represent the
random permutation of a finite real-valued population. Suppose that a finite population
consists of 1,2, ,s N= … subjects labeled by s , and associated with each subject is an
unknown but potentially observable value, sy . The population values are represented
with a 1N × column vector, namely, ( )1 2 Ny y y ′=y . Define a set of indicator
random variables isU for 1,2, ,i N= … that have a value of 1 if the thi subject in a
permutation is subject s , and 0 otherwise. Let the matrix
( )1 2, , ,N N N× =U U U U represent a matrix of indicator random variables, where
( )1 2s s s NsU U U ′=U . Then the vector of random variables representing a
permutation is given by
1N× =Y Uy , (1.3)
15
where ( )1 2 NY Y Y ′=Y , and i indicates the position in a permutation. The
random vector Y is a vector of linear combinations of random variables in the
matrix U because the vector of population values y is constant. It is obvious that
observing the random vector Y is equivalent to observing the random indicator matrix
U . Since any two subjects can not be at the same position in a permutation, it is easy
to show that
( ) ( )
1 when and
1 1 when and
0 otherwise.
is i s
N i i s s ,
E U U N N i i s s ,′ ′
′ ′≠ =⎧⎪⎪ ′ ′= − ≠ ≠⎨⎪⎪⎩
It immediately follows,
( ) y NE μ=Y 1 ,
( ) 2cov y Nσ=Y P ,
where 11
Ny ss
N yμ −=
= ∑ , ( ) 12 1y NNσ − ′= − y P y , and 1N N NN −= −P I J .
1.7. Seemingly unrelated regression models
Theories about seemingly unrelated regression models (SUR) are well
established. Finite sample properties have been discussed in a few occasions in
literature (Zellner 1963; Revankar 1974; 1976; Liu and Wang 1999), but these models
generally have a strong assumption about the error structure, typically in a form of that
assumes error vectors are independently and identically distributed, and multivariate
normal ( )MN 0,Σ . SUR models enable simultaneous estimation or testing of multiple
16
parameters across equations in the model. Applications of SUR in epidemiological
surveys are quite rare.
Theoretical considerations on integrating SUR models, survey sampling theory,
and random permutation models, have not been seen in literature. Combining these
three aspects may lead to alternative estimation methods for survey samples. The
feature of SUR of preserving covariance structures between covariates during the
estimation process is particularly useful for evaluating the properties of estimators
based on permutation models. With a SUR set-up, permuting response and auxiliary
variables jointly can be represented easily in matrix form, and the covariance structures
can be preserved during estimation.
1.8. Functional form approach in survey sampling
Many linear or non-linear finite population parameters under complex sampling
designs can be expressed as a smooth function of population totals, and their point
estimators are similar functions of the estimators of the totals (Thompson 1997). For
example, finite population parameters or functions, such as means, ratios and variances,
can be defined implicitly by a population equation of the form ( )1, , 0N
s s s Nsy xφ θ
==∑ ,
where sy and sx are values of observable variables, sφ are known real-valued functions,
and Nθ is a population parameter of interest. For example, the population ratio R of y
to x can be defined by ( )10N
s N ssy xθ
=− =∑ , that is ( )N y xg T T Tθ = = , where yT and
xT are the total of y and x , respectively. An estimators can be defined as a solution of
17
the sample estimating equation ( ) ( )1, , 0N
N s s s N ssy xφ θ φ θ π
== =∑s . For any Nθ , the
sample estimating function φs is design-consistent if ( ){ } ( )1, ,N
N s s s NsE y xφ θ φ θ
==∑s .
Linearization of the error as a function of the component total estimates
provides a fundamental basis for this approach. The properties of the point estimate
( )ˆ ˆgθ = T are evaluated using a Taylor series approximation to the error. The first-order
approximation of this type is a “linearization” of the error, for example, the linearized
error of population ratio is
( ) ( ) ( ) ( )垐 垐y y x x
y x
g gR R g g T T T T remainderT T
ε ∂ ∂= − = − = − + − +
∂ ∂T T .
Many estimators based on functional approach are definable through systems of
estimating equations rather than single estimating equation (Thompson 1997). Some
population parameters, such as variances or regression coefficients, can be defined in
terms of other population parameters and need a system of estimating functions for their
definition. For instance, the population variance ( )221
Ny s ys
y Nσ μ=
= −∑ is defined by
( )
( )
2 21
1
0
0
Ns y Ns
Ns ys
y
y
μ σ
μ
=
=
⎧ ⎡ ⎤− − =⎪ ⎢ ⎥⎣ ⎦⎨⎪ − =⎩
∑
∑.
Since SURs allow several variables be modeled simultaneously, their joint
distributions can be obtained. A combination of the functional approach and SUR
models may provide insight in estimation problems for sample surveys.
18
1.9. Optimization methods
Optimization criteria and methods may vary significantly according to the
nature of problems under consideration. The parameter estimation can be viewed as a
least square problem, which minimizes a specified distance function, say ( )Φ w ,
subject to a set of constraints. Commonly used distance functions include trace,
generalized mean squared error (GMSE) or other quadratic function of the variance-
covariance matrix of the estimators.
Often a set of linear constraints are present, e.g., ( ) =r w 0 , where ( )r w is a
vector of differentiable functions. Available optimization methods under linear
constraints include two major approaches, i.e., 1) reparameterization, and 2)
modification of ( )Φ w by introducing Lagrangian multipliers (Judge et al. 1985).
When linear equality constraints can be expressed in the form ( )* =w g w , the
restrictions can be introduced by reducing the dimension of the parameter space to
circumvent the singularity problem and thus a unique solution can be obtained. The
constrained optimum of the objective function ( )Φ w can be found by an unconstrained
minimization of ( ) ( )( )*AΦ = Φw g w . For example, if ( )1 2 3w w w ′=w is subject to
0′ =1 w , then the constraint can be represented as a reparameterization of w ,
( ) 1 3
2 3
1 0 11 1 0
*w ww w
−− ⎛ ⎞⎛ ⎞= = =⎜ ⎟⎜ ⎟ −−⎝ ⎠ ⎝ ⎠
g w w w
Minimization of ( )AΦ w subject to 0′ =1 w is equivalent to minimization of ( )*Φ w
without constraint.
19
When multiple constraints are present, it may be difficult to reparameterize the
optimization functions. Instead, a vector ( )1 2 Jλ λ λ ′=λ of Lagrangian
multipliers can be introduced and the constraints are incorporated in the minimization
procedure by defining an alternative optimization function
( ) ( ) ( )ˆ ,RΦ = Φ +w λ w r w λ (1.4)
A constrained minimum of ( )Φ w is equivalent to minimizing ( )ˆ ,RΦ w λ .
Differentiating ( )ˆ ,RΦ w λ with respect to elements of both w and λ , and setting the
derivatives to zeros, will lead to unique solutions for w .
20
CHAPTER 2
NOTATIONS, DEFINITIONS AND BASICS OF
RANDOM PERMUTATION MODELS
This Chapter sets up general notations used in this thesis (Section 2.1), defines
parameters of interest (Section 2.2), introduces the random permutation framework
(Section 2.3) and methods for simultaneously permuting multiple variables, and
establishes their relationship to seemingly unrelated regression models (Section 2.4). It
is shown that the sampling process can be represented as a partition of random matrices
(Section 2.5).
2.1. Notation
Random variables are represented with upper case Latin letters, for example, Y ,
and random vectors with bolded upper case letter, e.g., Y . The value of a random
variable is represented with lower case letter, e.g., y ; and a column vector of the values
with bolded lower case letter, e.g., y .
Population parameters are represented using Greek letters such as μ for mean,
Σ for variance-covariance matrix. Estimators of population parameters are denoted
using a hat over the letter of their corresponding Greek letters, for example, μ and Σ .
Letter “T” is reserved for population totals, such as ( )1T for the population total of a
response variable ( )1Y .
Matrices are represented using bolded upper or lower case letters. To simplify
notation, we define notation to represent matrices that are frequently used in
21
mathematical derivation (Harville 1997). A vector or matrix of zeros is represented with
0 . For any finite integer a and b ,
1) Summing Vector a′1 is a 1 a× row vector of ones,
2) Identity Matrix aI is an a a× identity matrix,
3) J-Matrix aJ is an a a× matrix of ones,
4) Centering Matrix 1a a aa= −P I J ,
5) Projection Matrix ( )−′ ′=X,WP X X WX X W , where X is any a b× matrix,
W is any a a× matrix and ( )−′X WX indicates a generalized inverse of
′X WX . When =W I , X,IP is simply denoted as XP .
6) Another frequently used matrix is ,1
n N n nN= −P I J .
For matrix operations, the transpose of a matrix is represented as ′A , and an
inverse of a matrix A is denoted as 1−A if it exists. The generalized inverse of A is
denoted as −A . A block diagonal matrix of iA , 1,2, ,i N= … , is represented using
1
N
ii=⊕A . Kronecker products of A and B are represented using ⊗A B ,
11 12 1
21 22 2
1 2
N
N
M N K L
M M MN
a a a
a a a
a a a
× ×
⎡ ⎤⎢ ⎥⎢ ⎥
⊗ = ⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦
B B B
B B BA B
B B B
.
The VEC operator is denoted as vec , for example,
22
[ ]
1
2
m n
n
vec ×
⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟=⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
a
aA
a
,
where ia , 1 2i , , ,n= … , is the -thi column of the m n× matrix A .
2.2. Definition of population and population parameters
A finite population P consists of N subjects with labels 1,2, ,s N= … , where
N is known. The labels are non-informative and serve only to identify the units
(Royall 1970). Associated with each subject (e.g., s ) is a non-stochastic ( )1 1p + ×
column vector sy , where ( ) ( ) ( ) ( )( )0 1 k ps s s s sy y y y ′=y , ( )0
sy is the response
variable under study and ( ) , 1,2, ,ksy k p= … are auxiliary variables.
The values of the -thk variable ( )ky , 0,1, ,k p= … , for all subjects are
represented using an 1N × column vector ( ) ( ) ( ) ( )( )1k k k k
s Ny y y ′=y . The
values of all 1p + variables for all N subjects can be represented with an ( )1N p× +
matrix y , such that
( )( ) ( ) ( ) ( )( )
0
0 111
k pN p
N
y y y y× +
′⎛ ⎞⎜ ⎟′⎜ ⎟= =⎜ ⎟⎜ ⎟⎜ ⎟′⎝ ⎠
yy
y
y
.
Further, the population values are alternatively represented as a ( )1 1p N+ × column
vector z , such that ( )vec=z y .
23
For any variable ( )ky , 0,1,2, ,k p= … , we define population parameters as
follows,
Mean: ( ) ( )1k kNN
μ ′= 1 y
Total: ( ) ( )k kNT ′= 1 y
Ratio of ( )ky to ( )*ky : ( ) ( )*
*,
kkk k
R T T=
Variance: ( ) ( )21 1 k kk N
NN N
σ−⎛ ⎞ ′=⎜ ⎟⎝ ⎠
y P y
Covariance between ( )ky and ( )*ky : ( ) ( )*
*,
1 1 kkNk k
NN N
σ−⎛ ⎞ ′=⎜ ⎟⎝ ⎠
y P y .
Variance-covariance matrix:
( ) ( )
20 01 0
210 1 1
1 1
20 1
p
pp p
p p p
σ σ σσ σ σ
σ σ σ
+ × +
⎛ ⎞⎜ ⎟⎜ ⎟= ⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
Σ .
2.3. Single stage random permutation
Suppose that a sample of n subjects is drawn from a population of size N via
sample random sampling without replacement (SRSWOR). A stochastic model can be
defined as arising from a single-stage random permutation of the population. In a
permutation, the position of a subject is indexed by 1,2, ,i N= … ; there will be a
different subject, s, at each position, i. The subjects at the first 1,2, ,i n= … positions in
a permutation constitute a sample from SRSWOR.
24
We define a set of random variables and a stochastic model to represent the
random permutation of the subjects in P . First, we define a set of indicator random
variables isU , 1,2, ,i N= … , that have a value of 1 if the subject in the -thi position in a
permutation is subject s , and 0 otherwise. Let ( )1 2N N N× • • •′′ ′ ′=U U U U
represent a matrix of indicator random variables, where ( )1 2i i i iNU U U•′=U .
Then the vector of random variables representing a permutation of ( )ky is given by
( ) ( )k k=Y Uy .
For a random permutation, ( ) 1NE
N=U J . The variance of ( )kY can be expressed in
terms of the variance and covariance of the elements in U . Through proper
arrangement in order, the variance for U can be written as
( ) 1var1 N Nvec
N= ⊗⎡ ⎤⎣ ⎦ −
U P P . (2.1)
Derivation of (2.1) is presented in Section 2.6.
Since ( ) ( ) ( )( ) ( )k k kN vec′= = ⊗Y Uy y I U , we have,
( )( )kn kE μ=Y 1 , and
( )( ) 2var kk Nσ=Y P .
2.4. Simultaneous permutation of multiple variables
We develop expressions for simultaneous permutation of multiple variables,
first considering the joint permutation of two variables. The results can be readily
25
extended to scenarios with multiple variables. The joint permutation of both ( )0y and
( )1y can be expressed as
( )
( ) ( )( )
( )
0 0
21 1
⎛ ⎞ ⎛ ⎞= ⊗⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟
⎝ ⎠ ⎝ ⎠
Y yI U
Y y. (2.2)
Since ( ) NEN
=JU , it is easy to show that
( )
( ) ( )0
021
1NE
μμ
⎛ ⎞ ⎛ ⎞= ⊗⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎝ ⎠⎝ ⎠
YI 1
Y, (2.3)
and
( )
( )
0
0,11 Ncov⎛ ⎞
= ⊗⎜ ⎟⎜ ⎟⎝ ⎠
YΣ P
Y, (2.4)
where 20 0,1
0,1 21,0 1
σ σσ σ⎛ ⎞
= ⎜ ⎟⎝ ⎠
Σ . Derivation of (2.4) is given in Section 2.7.
When there are 1p + covariates of interest, the matrix of random variables
representing a permutation is given by
( ) ( )1 1N NN p N p×× + × +=Y U y , (2.5)
where ( ) ( ) ( ) ( )( )0 1 k p ′=Y Y Y Y Y , ( )kY is the random variable
corresponding to variable ( )ky , 0,1, , .k p= … Denote a column vector ( )1 1p N+ ×Z as
( ) ( ) ( ) ( )1 1p pvec vec+ += = ⊗ = ⊗Z Y I U y I U z , (2.6)
where ( ) ( )1 1p N vec+ × =z y .
26
As an application of (2.3) and (2.4) when 1p = , it can be shown that the
expected value and variance-covariance matrix of Z are,
( ) ( )1p NE += ⊗Z I 1 μ , (2.7)
( ) Ncov = ⊗Z Σ P , (2.8)
where Σ is as defined in Section 2.2.
2.5. Sampling and partition of random matrices
A sample of size n , n N≤ , drawn by SRSWOR scheme from P is thus viewed
as the first n subjects in a random permutation. Therefore, a random vector Z can be
partitioned into a sample (indexed as I) and remaining (indexed as II) parts. The
elements of Z can be rearranged into the sampled and remaining portions through pre-
multiplication by a properly specified positioning matrix *K , i.e.,
* I
II
⎛ ⎞= = ⎜ ⎟
⎝ ⎠
*Z
Z K Z Z .
For example, if Z involves 2 variables,
( )( )( )( )
2*
2
n n N n
N nN n n
× −
−− ×
⎛ ⎞⊗⎜ ⎟= ⎜ ⎟⊗⎜ ⎟⎝ ⎠
I I 0K
I 0 I.
Similarly, if Z involves 1p + variables,
( )( )( )( )
1*
1
p n n N n
p N nN n n
+ × −
+ −− ×
⎛ ⎞⊗⎜ ⎟= ⎜ ⎟⊗⎜ ⎟⎝ ⎠
I I 0K
I 0 I. (2.9)
The expected values of *Z are,
27
( ) 1 0*
1 1
I p n
II p N nE E
μμ
+
+ −
⊗⎛ ⎞ ⎛ ⎞⎛ ⎞= =⎜ ⎟ ⎜ ⎟⎜ ⎟⊗ ⎝ ⎠⎝ ⎠ ⎝ ⎠
Z I 1Z Z I 1 .
Use the notation, we can derive the expression for the partitioned variance-
covariance matrix as follows. The partitioned variance-covariance matrix is defined by
,
,
I I I II
II II I II
cov⎛ ⎞ ⎛ ⎞
= =⎜ ⎟ ⎜ ⎟⎝ ⎠⎝ ⎠
Z V VV Z V V
, (2.10)
where ,I n N= ⊗V Σ P , ( ), ,1
I II II I n N nN × −⎛ ⎞′= = ⊗ −⎜ ⎟⎝ ⎠
V V Σ J and ( ),II N n N−= ⊗V Σ P .
Furthermore, it can be shown that
1 1 1I n nN n− − ⎛ ⎞= ⊗ +⎜ ⎟−⎝ ⎠
V Σ I J . (2.11)
Detailed derivations for the above expressions can be found in Section 2.8.
2.6. Derivation of variance-covariance matrix of ( )vec U
Since ( )1 2N N N× • • •′′ ′ ′=U U U U represent a matrix of indicator random
variables, where ( )1 2i i i iNU U U•′=U ,
( )
( ) ( ) ( )( ) ( ) ( )
( ) ( ) ( )
1 1 1 2 1
2 1 2 2 2
1 2
cov , cov , cov ,
cov , cov , cov ,cov
cov , cov , cov ,
N
N
N N N N
vec
⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟=⎡ ⎤⎣ ⎦ ⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
U U U U U U
U U U U U UU
U U U U U U
i i i i i i
i i i i i i
i i i i i i
. (2.12)
Since ( ) ( ) ( ) ( )* * *cov ,i i ii i iE E E′ ′= −U U U U U Ui i ii i i
, ( ) ( )*N
i iE E
N= =
1U Ui i, and
28
( )( ) ( )
*
*
*
1 , when ,
1 , when ,1
N
i i
N N
i iN
Ei i
N N
⎧ =⎪⎪′ = ⎨⎪− − ≠⎪ −⎩
I
U UI J
i i
it is readily shown that
( ) ( ) ( ) ( )( )
* * *
*
*
1 ,
cov ,1
1
N
i i ii i i
N
i iN
E E Ei i
N N
⎧ =⎪⎪′ ′= − = ⎨⎪− ≠⎪ −⎩
P
U U U U U UP
i i ii i i, (2.13)
where NN N N= −
JP I . Applying (2.13) to (2.12), we have
( ) 1cov1 N Nvec
N= ⊗⎡ ⎤⎣ ⎦ −
U P P .
2.7. Derivation of variance-covariance matrix in Section 2.4.
Simultaneous permutation of two variables can be expressed as
( )
( ) ( )( )
( )
0 0
21 1
⎛ ⎞ ⎛ ⎞= ⊗⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟
⎝ ⎠ ⎝ ⎠
Y yI U
Y y (2.14)
In order to get the variance-covariance matrix, we can rewrite (2.14) as (2.15),
( )
( )( ) ( )( )( ) ( )
00 1
2 21 N vec⎛ ⎞
′ ′= ⊗ ⊗⎜ ⎟⎜ ⎟⎝ ⎠
Yy y I I U
Y. (2.15)
Since ( ) ( ) ( ) ( )m n p q n qm p m n p qvec vec vec× × × ×⎡ ⎤⊗ = ⊗ ⊗ ⊗⎣ ⎦A B I K I A B , where
qmK is a vec-permutation matrix. The ( )1i n j th− +⎡ ⎤⎣ ⎦ row of qmK is the
( )1j m i th− +⎡ ⎤⎣ ⎦ row of mnI ( )1,2, , ; 1,2, ,i m j n= =… … (see Harville 1997, page 345,
349). Therefore,
29
( ) ( ) ( ) ( )
( )
( )
( )
( )
2 2 2 2
2 2
*2 2 ,
N N
N N
N N
vec vec vec
vec
vec
⊗ = ⊗ ⊗ ⊗⎡ ⎤⎣ ⎦
⎛ ⎞⎜ ⎟⎜ ⎟= ⊗ ⊗ ⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
= ⊗ ⊗
I U I K I I U
U0
I K I0
U
I K I U
(2.16)
where ( ) ( )( )* vec vec′′ ′=U U 0 0 U .
Apply (2.16) to (2.15), we have
( )
( )( ) ( )( )( )( )
( ) ( )( )( )( ){ }
00 1 *
2 2 21
0 1 *2 2 2 ,
N N N
N N
⎛ ⎞′ ′= ⊗ ⊗ ⊗ ⊗⎜ ⎟⎜ ⎟
⎝ ⎠
⎡ ⎤′ ′= ⊗ ⊗ ⊗⎢ ⎥⎣ ⎦
Yy y I I I K I U
Y
y y I I K I U
(2.17)
where 2NK is a 2 2N N× matrix, comprising N rows and 2 columns of 2N ×
dimensional blocks whose ij -th element is 1 and whose other ( )2 1N − elements are 0.
2NK can be written explicitly as
1 1 2 1
1 2 2 2
2
1 2
1 0 0 0 0 0
0 0 0 1 0 0
0 1 0 0 0 0
0 0 0 0 1 0
0 0 1 0 0 0
0 0 0 0 0 1
N
N N
⎛ ⎞⎜ ⎟⎜ ⎟
′ ′ ⎜ ⎟⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟′ ′⎜ ⎟⎜ ⎟⎜ ⎟= =⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟′ ′⎜ ⎟
⎝ ⎠ ⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
u e u e
u e u eK
u e u e
,
where iu is the i -th column of 2I , 1,2i = , and je is the j -th column of NI ,
1,2, ,j N= … . Observing that
30
( )( )( )
( )
( )
( )
( )
( )
( ) ( ) ( )
( ) ( ) ( )
( )
( )
( )( )
0 0 01 20
2 0 0 01 2
0 0 01 1 1 2 1
0 0 02 1 2 2 2
01 1 2 1 1
01 2 2 2 2
02 2
0 0 0
0 0 0N
N
N
N
N
N
N
y y y
y y y
⎛ ⎞′ ⎜ ⎟⊗ =
⎜ ⎟⎝ ⎠⎛ ⎞′ ′ ′′ ′ ′⎜ ⎟=⎜ ⎟′ ′ ′′ ′ ′⎝ ⎠⎛ ⎞′ ′ ′ ′⎛ ⎞⎜ ⎟= ⎜ ⎟′ ′ ′⎜ ⎟′ ⎝ ⎠⎝ ⎠
′ ′= ⊗
y I
y u e y u e y u e
y u e y u e y u e
y 0 e u e u e ue u e u e u0 y
I y K
and 2 2 2N N N′ =K K I , we can simplify (2.17) as follows(for matrix operation, see page
348 & p349 of Harville 1997),
( )
( )( )( ) ( )( ){ }
( )( ) ( )( ){ }( )( ) ( )( ){ }( )( ) ( )( ){ }
( )
( )
( )
( )
( ) ( )( )
00 1 *
2 2 2 21
0 1 *2 2 2 2 2 2
0 1 *2 2 2 2
0 1 *2 2
0 1*
0 1
0 1 *2 2
N N N
N N N N N
N N N
N
N
N
⎛ ⎞ ⎡ ⎤′ ′= ⊗ ⊗ ⊗⎜ ⎟ ⎢ ⎥⎜ ⎟ ⎣ ⎦⎝ ⎠
⎡ ⎤′ ′′ ′= ⊗ ⊗ ⊗⎢ ⎥⎣ ⎦
⎡ ⎤′ ′= ⊗ ⊗ ⊗⎢ ⎥⎣ ⎦
⎡ ⎤′ ′= ⊗ ⊗ ⊗⎢ ⎥⎣ ⎦
⎡ ⎤⎛ ⎞′ ′⎢ ⎥⎜ ⎟= ⊗⎜ ⎟⎢ ⎥′ ′⎝ ⎠⎣ ⎦⎡ ⎤′ ′= ⊗ ⊗ ⊗⎢ ⎥⎣ ⎦
Yy I K y I K I U
Y
I y K K I y K K I U
I y I I y I I U
I y I y I U
y 0 y 0I U
0 y 0 y
I y I y I U
( )
( )( ) ( )( ) ( )
( )
( )
0020 1 *
2 2 112
cov covN N
⎡ ⎤⎛ ⎞⎛ ⎞ ⊗⎡ ⎤′ ′ ⎢ ⎥⎜ ⎟= ⊗ ⊗ ⊗ ⊗⎜ ⎟ ⎢ ⎥⎜ ⎟ ⎣ ⎦ ⎜ ⎟⊗⎢ ⎥⎝ ⎠ ⎝ ⎠⎣ ⎦
I yYI y I y I U I
I yY
As shown in Section 2.6, the variance matrix for U is
( ) 1cov1 N Nvec
N= ⊗⎡ ⎤⎣ ⎦ −
U P P ,
where NN N N= −
JP I . Therefore,
31
( )( )
( )
( )( ) ( ) ( )
( ) ( ) ( )( )
*
cov cov ,
cov cov
cov , cov
vec vec vecvec
vec vec vec vec
⎛ ⎞⎡ ⎤⎛ ⎞ ⎣ ⎦⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟= = ⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎡ ⎤⎝ ⎠ ⎣ ⎦⎝ ⎠
U 0 0 U UU0 0 0 0 0
U0 0 0 0 0
U U U 0 0 U
( ) ( )
( ) ( )
( ) ( )
*
1 0 0 10 0 0 0
cov cov0 0 0 01 0 0 1
10
1 0 0 1 cov01
11 N N
vec
vec
N
⎛ ⎞⎜ ⎟⎜ ⎟= ⊗ ⎡ ⎤⎣ ⎦⎜ ⎟⎜ ⎟⎝ ⎠⎡ ⎤⎛ ⎞⎢ ⎥⎜ ⎟⎢ ⎥⎜ ⎟= ⊗ ⎡ ⎤⎣ ⎦⎢ ⎥⎜ ⎟⎢ ⎥⎜ ⎟⎢ ⎥⎝ ⎠⎣ ⎦
= ⊗ ⊗−
U U
U
LL' P P
(2.18)
where ( )1 0 0 1′ =L .
This is a patterned matrix, we can further simplify it as follows.
( )
( )( ) ( )( ) ( )( )
( )
( )
( )
( )
( )
( )
( )
( )
( )
( )
( )
0020 1
2 2 112
0
00 1
10 1
1
0
1cov1
11
11
N N N N
N N
N
N N
N
N
N
⎡ ⎤⎛ ⎞⎛ ⎞ ⊗⎡ ⎤′ ′ ⎢ ⎥⎜ ⎟= ⊗ ⊗ ⊗ ⊗ ⊗ ⊗⎜ ⎟ ⎢ ⎥⎜ ⎟ ⎣ ⎦ ⎜ ⎟− ⎢ ⎥⊗⎝ ⎠ ⎝ ⎠⎣ ⎦
⎧ ⎫⎛ ⎞⎛ ⎞⎪ ⎪⎜ ⎟⎜ ⎟⎛ ⎞′ ′⎪ ⎪⎜ ⎟⎪ ⎪⎜ ⎟⎜ ⎟= ⊗⎜ ⎟⎨ ⎬⎜ ⎟⎜ ⎟− ′ ′ ⎜ ⎟⎪ ⎪⎝ ⎠⎜ ⎟⎜ ⎟⎜ ⎟⎪ ⎪⎜ ⎟⎝ ⎠⎪ ⎪⎝ ⎠⎩ ⎭
′=
−
I yYI y I y I LL' P P I
I yY
y 0P 0 0 P0 y0 0 0 0y 0 y 0
P0 0 0 0 y 00 y 0 y
P 0 0 P 0 y
y P ( ) ( ) ( )
( ) ( ) ( ) ( )
0 1 0
0 1 1 1
20 0,1
20,1 1
0,1
N NN
N N
N
N
σ σσ σ
⎛ ⎞′⎜ ⎟ ⊗⎜ ⎟′ ′⎝ ⎠
⎛ ⎞= ⊗⎜ ⎟⎜ ⎟⎝ ⎠
= ⊗
y y P yP
y P y y P y
P
Σ P
32
where 20 0,1
0,1 20,1 1
σ σσ σ⎛ ⎞
= ⎜ ⎟⎝ ⎠
Σ .
2.8. Derivation of variance-covariance matrix of the partitioned matrix
( ) ( ) ( )( )0 1 p ′′ ′ ′=Y Y Y Y can be partitioned into a sample (indexed as I) and
remainder (indexed as II) parts. After sampling is completed, the values for the sampled
portion ( ) ( ) ( )0 1, , pI I IY Y Y will be known, and can be represented as realized values.
Through proper specification of permutation matrices K , we can rearrange the
elements of ( ) ( ) ( )0 1, , pI I IY Y Y into the sampled and remaining portions by pre-
multiplication by *K . We can rewrite
( )
( )
( )
( )( )( )( )
( )
( )
( )
0 0
1 11 -*
1 -
p n n N n I
IIp N nN n np p
+ ×
+ −×
⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎛ ⎞⊗ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟= = ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⊗ ⎝ ⎠⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
Y YI I 0 ZY Y
K ZI 0 IY Y
, (2.19)
where *K as defined in (2.9), ( ) ( ) ( )( )0 1 pI I I I
′′ ′ ′=Z Y Y Y and
( ) ( ) ( )( )0 1 pII II II II
′′ ′ ′=Z Y Y Y .
Use this notation, we can derive the expression for the partitioned variance-
covariance matrix as follows. The partitioned variance-covariance matrix is thus
( ), * *
,
cov covI I I II
II II I II
⎛ ⎞ ⎛ ⎞ ′= = =⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎝ ⎠⎝ ⎠
Z V VV K Y KZ V V
Since ( ) Ncov = ⊗Y Σ P , the above expression can be simplified as,
( )* *N
′= ⊗V K Σ P K . (2.20)
33
V can be simplified as follows,
( )( )( )( ) ( )
( )
( )
( )( )( )
( )( ) ( )
( )( )( )
1 - -1 1
-1 -
-- -
-
--
p n n N n n n N nN p p
N n n N np N nN n n
n n N nn N n Nn N n n N n
N n n N n
nN n NN n n
N n n
+ × ×+ +
× −+ −×
×× ×
× −
−××
⎛ ⎞⊗ ⎛ ⎞⎛ ⎞⎛ ⎞⎜ ⎟= ⊗ ⊗ ⊗⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟⊗ ⎝ ⎠ ⎝ ⎠⎜ ⎟ ⎝ ⎠⎝ ⎠
⎡ ⎤⎡ ⎤ ⎛ ⎞⎛ ⎞⊗ ⊗ ⎢ ⎥⎢ ⎥ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎢ ⎥⎢ ⎥⎝ ⎠ ⎝ ⎠⎣ ⎦ ⎣ ⎦
=⎛ ⎞
⊗ ⎜ ⎟⎜⎝ ⎠
I I 0 0IV Σ P I I0 II 0 I
0IΣ I 0 P Σ I 0 P0 I
IΣ 0 I P 0 ( ) ( )( ) ( )-
-
.n N n
NN n n N nN n
×× −
−
⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟
⎡ ⎤⎡ ⎤ ⎛ ⎞⎜ ⎟⊗ ⎢ ⎥⎢ ⎥ ⎜ ⎟⎜ ⎟⎟ ⎜ ⎟⎜ ⎟⎢ ⎥⎢ ⎥ ⎝ ⎠⎣ ⎦ ⎣ ⎦⎝ ⎠
0Σ 0 I P
I
Therefore,
( )
( )
,,
,,
1
1
n N n N nI I II
I II IIN n NN n n
N
N
× −
−− ×
⎛ ⎞⎛ ⎞⊗ ⊗ −⎜ ⎟⎜ ⎟⎛ ⎞ ⎝ ⎠⎜ ⎟= =⎜ ⎟′ ⎜ ⎟⎛ ⎞⎝ ⎠ ⊗ − ⊗⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
Σ P Σ JV V
V V VΣ J Σ P
, (2.21)
where ,I n N= ⊗V Σ P , ( ),1
I II n N nN × −⎛ ⎞= ⊗ −⎜ ⎟⎝ ⎠
V Σ J and ,II N n N−= ⊗V Σ P .
Further, ( ),1 1n n n n
n N n n n n fN n n N n
⎛ ⎞ ⎛ ⎞= − = − + − = + −⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
J J J JP I J I P , and
,1 N n
N n N N n N n N n fN N n
−− − − −= − = +
−JP I J P , where f n N= is the sampling fraction.
Consequently,
( )( ) ( )( )( )( ) ( )( )
( )( ) ( )( )( )( ) ( )( )
1 1
11
1 1
11
1
1,
n n n N n
N n N nN n n
n n N nn
N n N nN n n
f n N
N f N n
f n N
N f N n
− −× −
−−− −− ×
− −× −
−−− −− ×
⎛ ⎞⊗ + − ⊗ −⎜ ⎟= ⎜ ⎟⊗ − ⊗ + −⎜ ⎟⎝ ⎠
⎛ ⎞⊗ − ⊗ −⊗⎛ ⎞ ⎜ ⎟= +⎜ ⎟ ⎜ ⎟⎜ ⎟⊗ ⊗ − ⊗ −⎝ ⎠ ⎜ ⎟⎝ ⎠
Σ P J Σ JV
Σ J Σ P J
Σ J Σ JΣ P 00 Σ P Σ J Σ J
and
34
( )1 nI n f
n⎛ ⎞= ⊗ + −⎜ ⎟⎝ ⎠
JV Σ P ,
( ),1
I II n N nN × −= − ⊗V Σ J ,
N nII N n f
N n−
−⎛ ⎞= ⊗ +⎜ ⎟−⎝ ⎠
JV Σ P .
Furthermore, it is shown that,
1 1 1 1,
1I n N n nN n− − − − ⎛ ⎞= ⊗ = ⊗ +⎜ ⎟−⎝ ⎠
V Σ P Σ I J
and
1 1 1 1,
1II N n N N n N nn− − − −
− − −⎛ ⎞= ⊗ = ⊗ +⎜ ⎟⎝ ⎠
V Σ P Σ I J .
35
CHAPTER 3
GENERAL ESTIMATION STRATEGY
This Chapter proposes a general method for estimating population parameters
based on survey samples. This method connects permutation models, estimation
functions and seemingly unrelated regression. In later Chapters, we will apply this
method to develop various estimators for scenarios that are common in practical
settings. The basic strategy for mathematical derivation is to:
1) Represent the randomness arising from sampling and relate samples to the
population using a random permutation model;
2) Define target parameters as linear or nonlinear functions of population totals;
3) Represent joint permutations of response and auxiliary variables in a seemingly
unrelated regression framework;
4) Represent estimation of population totals as predictors of the remainder based
on linear functions of the sample;
5) Define unbiasedness constraints as parameter-wise or average unbiasedness;
6) Introduce auxiliary information as additional linear constraints to estimating
equations using reparameterization or additional Lagrangian functions;
7) Optimize estimation by minimizing the generalized mean squared errors, or
trace of variance-covariance matrix of the population totals subject to various
unbiasedness or calibration constraints.
Since many population parameters can be defined as functions of population
totals, estimating population totals is fundamental to the estimation of other parameters.
36
The following sections outline several strategies to obtain unbiased and minimum
variance estimators for population totals. We will limit our discussion to the settings of
simple random sampling without replacement.
Figure 3.1. General Estimation Strategy
Finite Population
Sample
Random Permutation
Define Target RandomVariables
Optimization
Criteria:1. GMSE, 2. Trace3. Penalty function
Methods:1. Reparameterization2. Lagrangian function
揃est? estimators
Auxiliary variablesResponse variable
(Measured w/o error)
RPSUR Model
Parameters representedas functions of totals
Sample + Remainder
SRSWOR
Constraints:1. Parameter-wise2. Average unbiasedness3. Auxiliary information
37
3.1. Parameters of interest
In Chapter 2, we have shown that the joint distribution of ( ) ( ) ( )0 1, , , pY Y Y… can
be obtained through simultaneous permutation of the underlying variables ( ( )0y , ( )1y ,
…, ( )py ). A ( )1 1p + × vector T of population totals of interest can be represented as a
set of 1p + linear combinations of the random variables arising from the permutations,
′=T L Z , where ′L is a ( ) ( )1 1p p N+ × + matrix, ( ) ( ) ( )( )0 1 p ′′ ′ ′=Z Y Y Y . The
elements of Z can be partitioned into two parts that represent the sample ( )IZ and the
remaining ( )IIZ by premultiplication by the permutation matrix *K given by (2.9),
( )*I II
′′ ′=K Z Z Z . Since * *′ =K K I , T can be represented using following identities,
* *I I II II
′′ ′ ′ ′= = = +T L Z L K K Z L Z L Z , (3.1)
where IL and IIL are ( )( )1p n q+ × and ( )( )( ) ( )1 1p N n p+ − × + matrices
respectively. Once IZ is observed after sampling, estimating T is equivalent to
predicting II II′L Z .
3.2. Random permutation seemingly unrelated regression model
Based on the explicit presentation of simultaneous permutation of multiple
variables from Section 2.4, a ( )1p + -equation simple location seemingly unrelated
regression model can be defined as
38
( )
( )
( )
00
11
N
N
p pN
μμ
μ
⎛ ⎞ ⎛ ⎞⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟= +⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ ⎝ ⎠⎝ ⎠⎝ ⎠
Y 1 0 00 1 0Y E
0 0 1Y
, (3.2)
where E is a ( )1 1p N+ × column vector of residuals. Since ( ) ( ) ( )0 1, , , pY Y Y… arise
from exactly the same permutation of ( ) ( ) ( )0 1, , , py y y… as U , these 1p + equations are
indeed related to each other and this relationship is captured through the identical
permutation represented by ( )vec U . Following conventional notation, Model (3.2) can
be expressed more compactly as,
= +Z Xμ E (3.3)
where ( ) ( )0 11 1 pp μ μ μ+ ×′=μ , 1p N+= ⊗X I 1 is the design matrix for the model.
Hereafter, Model (3.3) is referred as Random Permutation Seemingly Unrelated
Regression (RPSUR) model. Accordingly, the RPSUR model can be written in terms
of sample and remaining parts,
I I
II II
⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟
⎝ ⎠ ⎝ ⎠
Z Xμ E
Z X. (3.4)
It is readily shown that ( )I IE =Z X μ and ( )II IIE =Z X μ .
With (3.4), linear estimators of T can be derived under design-based framework
(Stanek, Singer and Lencina 2003) using a method similar to the prediction-based
approach (Royall 1976; Bolfarine and Zacks 1992; Valliant, Dorfman and Royall 2000).
We will illustrate several methods in the following sections.
39
3.3. Simultaneous estimation of 1p + population totals
The 1p + population totals of interest can be represented as a vector
( )0 1 pT T T ′=T . By definition, ( )1p N I I II II+ ′ ′ ′= ⊗ ≡ +T I 1 Z L Z L Z , where
1I p n+′ ′= ⊗L I 1 and 1II p N n+ −′ ′= ⊗L I 1 . We are interested in deriving an estimator for T
that is a linear function of IZ and unbiased for T while a chosen optimization criterion
is satisfied. A linear estimator of T can be defined as ,
( )ˆI I′ ′= +T L W Z , (3.5)
where W is a ( ) ( )1 1n p p+ × + matrix of weights (or coefficients). There are many
possible ways to define W . Usually, W is chosen to satisfy certain optimization
criteria, such as to minimize trace of the variance-covariance matrix, generalized mean
squared error or M-estimation criteria. An intuitive but arbitrary way to define W is to
define 1 *p+= ⊗W I w , where *w is 1n × column vector, that is, a single weight system
is applied to all variables of interest. Once W is defined, the corresponding estimation
error can be defined as
( )ˆI I I II II′ ′ ′ ′ ′− = + − = −T T L W Z L Z W Z L Z . (3.6)
The unbiasedness of T for T requires that
( ) ( )ˆI II IIE E ′ ′− = − =T T W Z L Z 0 . (3.7)
Since ( )I IE =Z X μ and ( )II IIE =Z X μ , for (3.7) to hold for any T , it is required that
I II II′ ′− =W X L X 0 , (3.8)
where 1I p n+= ⊗X I 1 and 1II p N n+ −= ⊗X I 1 .
40
The variance-covariance matrix of T under Model (3.4) is defined as
( ) ( )垐E ′− −T T T T , which can be represented in terms of the variance-covariance
matrices of the sample ( )IV and remaining parts ( )IIV , and the covariance matrix
between the sample and remaining parts ( ),I IIV ,
( ) ( ) ,
,
ˆcov I I IIII
II I II II
⎛ ⎞ ⎛ ⎞′ ′− = − ⎜ ⎟ ⎜ ⎟−⎝ ⎠⎝ ⎠
V V WT T W L
V V L. (3.9)
To find the best estimate for T is to find an estimate for W that satisfies (3.8)
and results in minimizing the expected value of a chosen function of ( )ˆcov −T T .
Commonly used optimization criteria include minimizing ( )ˆcov −T T in terms of its
1) trace, or
2) quadratic function ( )ˆcov′ −q T T q , or M-estimation; when 1p+=q 1 , this
criterion is referred to as the generalized mean squared error (GMSE).
For example, if minimizing the trace of ( )ˆcov −T T is the criterion, we can
define the optimization function as (3.8),
( ) ( ) ( ) ( )1
,1
2 2p
I I II II i I II IIi
trace trace+
=
′ ′ ′ ′ ′Φ = − + −∑W W V W W V L u W X L X λ , (3.10)
where iu is the i -th row of identity matrix 1p+I and λ is a ( )1 1p + × vector of
Lagrangian multipliers. Differentiating (3.10) with respect to elements of W and λ
and setting the derivatives to zero will yield two sets of estimation equations. The
estimation equations can then be used to identify W . We pursue this approach in
Chapter 4.
41
3.4. Example: best linear unbiased estimator of a population total
A special application of the method proposed in Section 3.3 is to derive a linear
unbiased minimum variance estimator (BLUE) of ( )0T when no auxiliary information is
used. By definition, ( )0I I II IIT ′ ′= +Z Zl l , where ( )I n′ ′= 1 0l and ( )II N n−′ ′= 1 0l . Using
(3.5), a linear estimator of ( )0T is defined as
( ) ( )0ˆI IT ′ ′= +w Zl , (3.11)
where w is a ( )1 1n p + × column vector. Following (3.6), the estimation error is
defined as ( ) ( )0 0ˆI II IIT T ′ ′= −- w Z Zl . Applying (3.8), the unbiasedness of ( )0T for ( )0T
becomes
I II II′ ′− =w X X 0l , (3.12)
where 1I p n+= ⊗X I 1 and 1II p N n+ −= ⊗X I 1 .
Under the unbiasedness constraint (3.12), the mean squared error of ( )0T is the
same as its variance. As an analogue to (3.10), minimizing ( )( )0ˆvar T subject to (3.12)
is equivalent to minimize
( ) ( ),2 2I I II II I II II′ ′ ′ ′Φ = − + −w w V w w V w X X λl l , (3.13)
where λ is a ( )1 1p + × vector of Lagrangian multipliers. Differentiating (3.10) and
setting the derivatives to zero results in a set of estimating equations,
,ˆˆ
I I I II II
I II II
⎛ ⎞⎛ ⎞ ⎛ ⎞=⎜ ⎟⎜ ⎟ ⎜ ⎟′ ′⎝ ⎠ ⎝ ⎠⎝ ⎠
wV X VX 0 Xλ
ll
, (3.14)
which has a unique solution,
42
( ) ( )( )11 1 1, ,ˆ I I II I I I I I I I II II
−− − −′ ′= − −w V V X X V X X V V I l . (3.15)
By applying (3.15) to (3.11), we have the BLUE for ( )0T ,
( )10 ,ˆ
I I II II I II I I IT −⎡ ⎤′ ′ ′= + + −⎣ ⎦Z X β V V Z X βl l , (3.16)
where ( ) 11 1I I I I I I
−− −′′=β X V X X V Z . The variance of ( )0T can be evaluated as
( )( ) ( )( ) ( ) ( )0ˆ 垐 �var var I I I I IT ′ ′= + = + +w Z w V wl l l . (3.17)
After algebraic simplification,
( ) ( ) ( )( ) ( ) ( ) ( )( )0 1 0 11 p pn n n Y Y Y
n′ ′′ ′ ′= =β 1 Y 1 Y 1 Y ,
and
( ) ( ) ( ) ( ) ( )0 0 0 0T nY N n Y NY= + − = ,
where ( ) ( )0 01n IY
n′= 1 Y is the sample mean of permuted response variable. The variance
of ( )0T is
( ) 2 20 0
1ˆvar fT Nn
σ−= ,
where f n N= is the sampling fraction.
3.5. Estimation of a population total using auxiliary information
In this section, we propose a method of using auxiliary information to improve
precision of linear unbiased estimators of population totals through a simple
reparameterization. We illustrate this method using a simple case that has one response
and one auxiliary variable and assuming the auxiliary total ( )1T is known.
43
Our target parameter is population total of the response variable,
( ) ( )0 0NT ′ ′= =1 Y Zl , (3.18)
where ( )1N N×′′= 1 0l . We are interested in deriving an estimator for ( )0T that is a
linear function of the sample data and unbiased for ( )0T , and has minimum mean
squared error.
We transform the vector of auxiliary variables ( )1Y by pre-multiplying
NN N NN
′= +1I 1 P , such that ( ) ( )( ) ( )1 1 1N
N NN′= +
1Y 1 Y P Y . The first term in this expression
is a constant vector ( )( )1Nμ 1 , and the second term is ( ) ( ) ( ) ( )1 * 1 1 1
N Nμ= = −Y P Y Y 1 . It is
clear that
( )
( )
( )
( )
( )
( ) ( )
0 0 0
11 1 1 *
N
NN NNN
μ
⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟= + = +⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟′⎜ ⎟⎝ ⎠ ⎝ ⎠⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠
0 0I 0 0Y Y Y10 P 10 1Y Y Y
(3.19)
Applying identity (3.19) to Model (3.2) (here 1p = ) leads to a reparameterized model,
( )
( )( )
0*
21 * N
⎛ ⎞⎜ ⎟ = ⊗ +⎜ ⎟⎝ ⎠
YI 1 μ E
Y, (3.20)
where ( )( )0* 0μ ′=μ . Since this transformation does not have impact on E , its
variance-covariance matrix remains the same as ( ) ( )( )0 1 *cov , N= ⊗Y Y Σ P , where Σ is
given in Section 2.2.
44
Using the same method as shown in previous section, ( ) ( )( )0 1 * ′′ ′Y Y can be
partitioned as ( ) ( )( ) ( )0 1 ** *I II
′ ′′ ′ ′ ′= =Z K Y Y Z Z , where *K is a permutation matrix
as given by (2.9), ( ) ( )( )0 1 *I I I
′′ ′=Z Y Y represents the sample and ( ) ( )( )0 1 *II II II
′′ ′=Z Y Y
represents the remaining part. It immediately follows that ( ) ( ) *2I nE = ⊗Z I 1 μ ,
( ) ( ) *2II N nE −= ⊗Z I 1 μ , ,I n N= ⊗V Σ P and ( ), ,
1I II II I n N nN × −
⎛ ⎞′= = ⊗ −⎜ ⎟⎝ ⎠
V V Σ J .
Similar to (3.5), ( )0T can be represented as a linear combination of the sampled
and remaining parts,
( )0I I II IIT ′ ′= +Z Zl l , (3.21)
where ( )1 0I n′= ⊗1l and ( )1 0II N n−
′= ⊗1l . Its linear estimator is defined as,
( ) ( )0ˆI IT ′ ′= + w Zl , (3.22)
where ( )0 1′′ ′=w w w is a 2 1n × vector. Using (3.21) and (3.22), the unbiasedness for
( )0T implies
( ) 0I II IIE ′ ′− =w Z Zl . (3.23)
Since ( ) ( ) *2I nE = ⊗Z I 1 μ , ( ) ( ) *
2II N nE −= ⊗Z I 1 μ and ( )( )0* 0μ ′=μ , (3.23) is
equivalent to ( )( ) ( )00 0n N n μ′ − − =1 w . The constraint is satisfied for any ( )0μ simply
by
( )0 0n N n′ − − =w 1 . (3.24)
45
The mean squared error of ( )0T is given as,
( )( ) ( ),0
,
ˆvarI I II
II
II I II II
T⎛ ⎞ ⎛ ⎞
′ ′ ⎜ ⎟ ⎜ ⎟= −⎜ ⎟⎜ ⎟ −⎝ ⎠⎝ ⎠
V V ww
V Vl
l. (3.25)
To minimize (3.25) subject to (3.24), we define the optimization function as
( ) ( ) ( ) ( )( )0
0 1 0 1 , 0
1
2 2I I II II n N n λ⎛ ⎞
′ ′ ′ ′ ′⎜ ⎟Φ = − + − −⎜ ⎟⎝ ⎠
ww w w V w w V w 1
wl , (3.26)
where λ is a Lagrangian multiplier. Differentiating (3.26) and setting the derivatives to
zero leads to a set of estimating equations that can be used to identify w .
Unlike the usual calibration technique of incorporating auxiliary information,
this approach leads a set of weights that do not depend on the sample, and yields a
linear unbiased minimum variance estimator of 0T . We will explore this approach
further in Chapter 5, and extend it to settings when multiple auxiliary totals are known.
3.6. Enumeration and simulation
Comparisons between candidate estimators are sometimes difficult through
mathematical derivation. When mathematical comparisons are not feasible, we will use
enumeration and simulation methods.
When the population size is small, it is possible to enumerate all possible
samples and compute the estimators based on each sample. The exact distribution of the
estimators can be obtained. Thus comparisons between various types of estimators are
possible. We will use this approach to illustrate the performance of estimators for small
samples from finite populations of small size.
46
When population size is large, the number of possible samples is so large that it
may be impossible to obtain the exact moments of the estimators. The exact values of
an expected value, the bias and the variance of estimator for population totals can not be
computed through enumeration. In such cases, Monte Carlo simulations can be used to
evaluate the properties of the estimates if theoretical derivation is not possible. The
finite population and the sampling design can be held constant, a large number of
samples can be repeatedly drawn from the finite population using the specified
sampling design. For every realized sample, the estimators can be calculated. If the
number of samples is large enough, the distributions of the calculated estimators will
closely approximate the exact sampling distribution to a given level of precision.
3.7. Research objectives
In this dissertation, we will primarily consider estimation of population
quantities based on samples drawn from a real-valued finite population of known size,
using simple random sampling without replacement (SRSWOR). The sample size is
assumed fixed. To simplify the problems without losing generality, we assume that
response for all sampled subjects can be measured without error.
Discussions will be restricted to the scenarios involving univariate response
variable that may be either continuous or binomial. In design-based approaches,
estimation of population totals or means of continuous and binomial response variable
are not notably different. Unless otherwise noted, we will consider binomial response
variables, and refer to the mean estimators as population rates (following the
terminology in Epidemiology). The problems involving multinomial or ordinal response
47
variables may be reduced to estimation using a series of binomial variables, which will
not be addressed in detail.
We will develop methods of improving estimation of population quantities
(totals, means or rates) by incorporating auxiliary information. The values of auxiliary
variables are known either for all population subjects, or at least some of its population
quantities (such as population totals) are known. We will consider several simple
problems that correspond to various scenarios that may arise according to the nature of
response and auxiliary variables, as well as number and information availability of
auxiliary variables.
We will define population totals as linear function of random variables arising
from permutations. To obtain estimators of target parameters for each scenario, we will
follow the strategies outlined in Section 3.3 through 3.5. First, we will derive estimators
of population totals and their variance-covariance matrix using a RPSUR model
described in 3.2. We optimize the estimators by minimizing either Generalized Mean
Squared Errors (GMSE) or the trace of the variance-covariance matrix of population
totals.
The properties of the derived estimators will be evaluated through mathematical
derivation whenever possible. We will try to represent the estimator in terms of a naïve
estimator (without incorporating auxiliary information) plus or multiplied by an
adjustment factor. When analytic derivations are not possible, we will evaluate their
properties using enumeration and simulation methods.
Each Chapter will start with a simple case that involves single binomial
response and single binomial auxiliary variable, therefore the parameters have
48
straightforward meaning. Results will be gradually extended to more complicated cases
that involves continuous response and auxiliary variables or multiple auxiliaries.
Finally, the derived results will be applied and related to direct and indirect
adjustment methods in Epidemiology and vital statistics.
49
CHAPTER 4
ESTIMATION OF POPULATION TOTALS
4.1. Motivations
Public health surveys often involve estimating one or more population totals,
such as the total number of persons who contracted a certain disease or the total number
of retirees having prescription drug coverage. There may also be interest in joint
estimates of multiple population totals when categorical outcomes are involved. For
example, the status of a chronic disease is classified as three categories according to
severity, that is, None, Stages I and II. Health policy researchers may be interested in
estimating the total numbers of persons who contracted the disease of Stages I or II, or
of those who did not contract the disease. One may question whether identical
estimation methods should be applied to all three categories, and whether the methods
are optimal for all three outcomes, or if methods can be identified to satisfy varying
precision requirements for different outcomes.
In this Chapter, we derive the best linear unbiased estimator for the vector T of
population totals of multiple variables, and its variance-covariance matrix, ( )ˆcov T
assuming no auxiliary totals are known. We derive two classes of linear unbiased
estimators based on either the minimum trace of ( )ˆcov T , or the minimum of a
specified quadratic function ( )ˆcov′q T q , where q is a ( )1 1p + × non-null column
vector.
50
4.2. Parameters of interest
Many estimators can be expressed as a linear function of estimated population
totals (Thompson 1999), which is equivalent to prediction of a linear combination of
random variables. The estimators for target parameters and their variances can be
readily computed based on the estimators and their variance-covariance matrix of
relevant population totals. Therefore, the estimation of population totals is the
cornerstone of estimating other target parameters.
Suppose that there are 1p + population totals of interest, the vector of
population totals can be represented as a set of 1p + linear combinations, ′=T L Z ,
where 1p N+= ⊗L I 1 and Z is a vector that represent the ( )1 1N p + × random variables
arising from joint permutation of the 1p + variables. After sampling is completed, Z
can be partitioned into sample and remaining parts such that ( )I II′′ ′=Z Z Z . Thus T
can be written as a set of linear combinations of IZ and IIZ that represent the random
variables for the sample and remaining parts,
( ) II II
II
⎛ ⎞′ ′= ⎜ ⎟⎜ ⎟
⎝ ⎠
ZT L L Z , (4.1)
where 1I p n+′ ′= ⊗L I 1 and 1II p N n+ −′ ′= ⊗L I 1 .
4.3. Formulation of a RPSUR model
We will derive estimators of population totals using a simple location seemingly
unrelated regression model based on a random permutation framework (RPSUR).
Similar to Model 3.4, we define a RPSUR model,
51
I I
II II
⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟
⎝ ⎠ ⎝ ⎠
Z Xμ E
Z X, (4.2)
where 1I p n+= ⊗X I 1 and 1II p N n+ −= ⊗X I 1 , ( ) 1I IE
N=Z X T and ( ) 1
II IIEN
=Z X T .
4.4. Two classes of estimators for population total vector
We are interested in developing two classes of estimators of population totals of
response variables that is,
1) A linear function of the sample, ˆI′=T A Z
2) Unbiased for T , i.e., ( )E =T - T 0 .
4.4.1. Two classes of linear estimators of T
A linear estimator of I I II II′ ′= +T L Z L Z is defined as a linear function of the
sample, e.g., ˆI′=T A Z or equivalently as,
( )ˆII I′ ′= +T L W Z , (4.3)
where I′ ′ ′= −W A L is a ( ) ( )1 1p n p+ × + matrix. In general, all elements of ′W may
differ. We limit our development to two classes of estimators, 1) 0
p
kk == ⊕W w , and 2)
*0
p
k== ⊕W w , where *w and kw , 0,1, ,k p= are 1n × vectors.
4.4.2. Unbiasedness for T
The estimation error is defined as ˆI II II′ ′− = −T T W Z L Z . The unbiasedness of
T implies that ( ) ( )ˆII IE E ′ ′= + − =⎡ ⎤⎣ ⎦T - T L W Z T 0 holds for any T . Since
52
( ) ( ) ( )1I I I II IIE N −′ ′ ′ ′+ − = −⎡ ⎤⎣ ⎦L W Z T W X L X T , unbiasedness will hold for all T if
and only if
I II II′ ′− =W X L X 0 . (4.4)
Constraint (4.4) is a set of 1p + constraints:
( )( )
( )
0
1
0,
0,
0.
n
n
p n
N n
N n
N n
′ − − =
′ − − =
′ − − =
w 1
w 1
w 1
For convenience, the set of constraints can be rewritten alternatively as
( ) 0i I II II′ ′ ′− =u W X L X ,
where iu is the i -th row of identity matrix 1p+I , 1,2, , 1i p= +… .
4.4.3. Optimization criteria
We find the “best” estimators of T by evaluating W such that it attains a
minimum for a specified optimization criteria. Many optimization criteria are proposed
in literature (Deville and Särndal 1992; Singh and Mohl 1996). In this Chapter, we will
consider the following optimization criteria:
1) Trace of ( )cov T ,
2) M-estimation, ( )cov′q T q , where q is an arbitrary column vector; when
1p+=q 1 , it is generalized mean squared error (GMSE), ( )1 1covp p+ +′1 T 1 .
53
4.5. Estimating multiple population totals
In this section we develop two classes of estimators using the two optimization
criteria outlined above. Four cases of interest are presented in Table 1.
Table 4.1. Two classes of unbiased estimators of population totals under two optimization criteria
Class Definition of W Optimization Criteria
1 0
p
kk== ⊕W w ( )( )ˆcovtrace T
1 0
p
kk== ⊕W w ( )ˆcov′q T q
2 *0
p
k == ⊕W w ( )( )ˆcovtrace T
2 *0
p
k == ⊕W w ( )ˆcov′q T q
4.5.1. Estimators with minimum ( )( )covtrace T when 0
p
kk== ⊕W w
Minimizing trace of ( )ˆcov T - T subject to the unbiasedness constraints (4.4) is
equivalent to minimizing the following Lagrangian function,
( ) ( ) ( ), ,0
p
I I II II II I II i I II IIi
tr=
′ ′ ′ ′ ′ ′ ′Φ = − − + −∑W W V W W V L L V W u W X L X λ , (4.5)
where i′u is the i -th row of 1p+I , ( )0 k pλ λ λ ′=λ , 0,1, ,k p= … , is a
( )1 1p + × column vector of Lagrangian multipliers. Differentiating (4.5) with respect
to elements of W and λ , setting the derivatives to zeros, results in the following
estimation equations,
,
1
ˆˆ
I III I
II II pI +
⎛ ⎞ ⎛ ⎞⎛ ⎞=⎜ ⎟ ⎜ ⎟⎜ ⎟ ′′⎝ ⎠ ⎝ ⎠⎝ ⎠
w dD XX L 1X 0 λ
. (4.6)
54
where 2,0
p
I k n Nkσ
=
⎛ ⎞= ⊕ ⊗⎜ ⎟⎝ ⎠
D P and 2, 0
p
I II k p nk
N nN
σ=
− ⎛ ⎞⎛ ⎞= − ⊕ ⊗⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠d 1 1 . Estimation
equations (4.6) have the following solutions,
( ) ( )11 1 1 1, , 1ˆ I I II I I I I I I I I II II II p
−− − − −+′ ′ ′= − −w D d D X X D X X D d X L 1 .
After algebraic simplification, the above solution is equivalent to
ˆ nN n
n−
=w 1 or ( )1ˆ
p nN n
n +−
= ⊗W I 1 . (4.7)
Consequently, the best linear unbiased estimator for T and its variance are
ˆN=T μ and ( ) ( )var
N N nn−
=T Σ ,
where ( ) ( ) ( )( )0 1ˆ pY Y Y ′=μ is the vector of sample means, ( ) ( )
1
1 nk k
ss
Y Yn =
= ∑ ,
0,1, ,k p= … .
Derivation of the above results is presented in Section 4.7.
4.5.2. Estimators with minimum ( )ˆcov′q T q when 0
p
kk== ⊕W w
Another optimization criterion is to minimize an arbitrary quadratic function of
( )ˆcov T , i.e., ( )ˆcov′q T q , where q is ( )1 1p + × column vector of rank 1. When
1p+=q 1 , ( )ˆcov′q T q is the same as generalized mean squared error (GMSE). Thus, an
optimization function ( )Φ w can be defined using ( )ˆcov′q T q and include the unbiased
constraint for T , such that
( ) ( ) ( )1
, ,1
p
I I II II II I II i I II IIi
+
=
′ ′ ′ ′ ′ ′ ′ ′Φ = − − + −∑w q W V W W V L L V W q u W X L X λ , (4.8)
55
where ( ) ( )0 11 1 pp λ λ λ+ ×′=λ is a ( )1 1p + × column vector of Lagrangian
multipliers. Differentiating (4.8) and setting both sets of derivatives to zero yields the
following estimation equations,
( )
,
1
ˆ
ˆqI II IIqI I
II II pI +
⎛ ⎞⎛ ⎞⎛ ⎞⎜ ⎟=⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟′′ ⎝ ⎠⎝ ⎠ ⎝ ⎠
V L qV X w
X L 1X 0 λ, (4.9)
where 1 1 1 1 1,0 0
p p
qI k k n Nk kq q− − − − −
= =
⎛ ⎞⎛ ⎞ ⎛ ⎞= ⊕ ⊕ ⊗⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠V Σ P , and ( ), 0
1p
qI II k n N nkq
N × −=
⎛ ⎞⎛ ⎞ ⎛ ⎞= ⊕ ⊗ −⎜ ⎟⎜ ⎟⎜ ⎟ ⎝ ⎠⎝ ⎠⎝ ⎠V Σ J .
Estimation equations (4.9) have the following unique solutions,
( ) ( )11 1 1 1, , 1ˆ qI qI II II qI qI I qI I I qI I II II II II p
−− − − −+′ ′ ′= − −w V V L q V X X V X X V V L q X L 1 , (4.10)
After some algebraic arrangement, (4.10) can be simplified as,
1ˆ p nN n
n+
−⎛ ⎞= ⊗ ⎜ ⎟⎝ ⎠
w 1 1 , (4.11)
and ˆ k nN n
n−
=w 1 , for 0,1, ,k p= … , ( )1ˆ
p nN n
n +−
= ⊗W I 1 . Thus, we have the
following expressions,
ˆ ˆN=T μ and ( ) ( )ˆvarN N n
n−
=T Σ .
Derivation of the above results is presented in Section 4.8.
4.5.3. Estimators with minimum ( )( )ˆcovtrace T when 1 *p+= ⊗W I w
Since 1 *p+= ⊗W I w , the unbiased constraints (4.4) reduce to,
( )* 0I N n′ − − =X w , (4.12)
where I n=X 1 . The Lagrangian function (4.5) is simplified to,
( ) ( ) ( )( ), , *I I II II II I II Itr N n λ′ ′ ′ ′ ′Φ = − − + − −W W V W W V L L V W X w . (4.13)
56
Differentiating (4.28) with respect to *w and λ and setting the derivatives to
equal zero yields the following estimating equations
*, , ,ˆˆ
tr I I tr I II I
I N nλ
⎛ ⎞⎛ ⎞ ⎛ ⎞=⎜ ⎟⎜ ⎟ ⎜ ⎟′ −⎝ ⎠⎝ ⎠⎝ ⎠
wV X VX 0
l, (4.14)
where ( ), ,tr I n Ntr=V Σ P , ( ), ,tr I IIN n tr
N−
= −V Σ and I n= 1l . The solutions of (4.14)
are
( ) ( )( )11 1 1 1* , , , , , , , ,ˆ tr I tr I II I tr I I I tr I I I tr I tr I II I N n
−− − − −′ ′= − − −w V V V X X V X X V Vl l . (4.15)
After algebraic simplification, (4.15) becomes
*ˆ nN n
n−
=w 1 ,
or equivalently,
( )1ˆ
p nN n
n +−
= ⊗W I 1 .
Consequently, ˆ ˆN=T μ and ( ) ( )varN N n
n−
=T Σ .
Detailed derivations are given in Section 4.9.
4.5.4. Estimators with minimum ( )ˆcov′q T q when 1 *p+= ⊗W I w
Using constraint (4.12), the Lagrangian function of this case can be defined as
( ) ( )( ), *2I I II II I N n λ′ ′ ′ ′ ′Φ = − + − −w q W V Wq q W V L q X w . (4.16)
Differentiating (4.16) and setting the derivatives to zero yields the following estimating
equations,
( )* , ,,
ˆˆ
q I II Iq I I
I N nλ
⎛ ⎞ ⎛ ⎞⎛ ⎞=⎜ ⎟ ⎜ ⎟⎜ ⎟′ −⎝ ⎠ ⎝ ⎠⎝ ⎠
w VV XX 0
l, (4.17)
57
where ( ), ,q I n N′=V q Σq P , ( ), ,q I IIN n
N− ′= −V q Σq , I n= 1l and I n=X 1 . The solutions
of (4.17) are
( ) ( )( )11 1 1 1* , , , , , , , ,ˆ q I q I II I q I I I q I I I q I q I II I N n
−− − − −′ ′= − − −w V V V X X V X X V Vl l . (4.18)
After simplification, (4.18) becomes,
*ˆ nN n
n−
=w 1 ,
or equivalently,
( )1ˆ
p nN n
n +−
= ⊗W I 1 .
Consequently,
1ˆ ˆn
p IN Nn+
′⎛ ⎞= ⊗ =⎜ ⎟⎝ ⎠
1T I Z μ , and ( ) ( )varN N n
n−
=T Σ .
Detailed derivations are given in Section 4.10.
4.6. Remarks
In this Chapter, we derived two classes of estimators that are defined by two
different sets of coefficients, i.e., 0
p
kk== ⊕W w and 1 *p+= ⊗W I w . The estimators were
optimized by minimizing either the trace of ( )ˆcov T or ( )ˆcov′q T q . Under SRSWOR
and unbiasedness constraints for every element of T , the estimators for all four cases
are identical and shown to be simple expansion estimators. It is apparent that neither
the method of defining estimators nor the optimization criterion had impact on the
estimation.
58
4.7. Derivations of results in Section 4.5.1.
This section provides proof of the results in Section 4.5.1. A class of estimators
defined by 0
p
kk== ⊕W w , where kw may or may not be equal to each other, is derived by
minimizing ( )( )ˆcovtrace T subject to the unbiasedness constraints for every element of
T .
As shown in Section 4.4.2., the unbiasedness constraints are
I II II′ ′− =W X L X 0 . (4.19)
Therefore, minimizing ( )( )ˆcovtrace T subject to unbiased constraints (4.19) is
equivalent to minimizing the following Lagrangian function,
( ) ( ) ( )1
, ,1
2p
I I II II II I II i I II IIi
tr+
=
′ ′ ′ ′ ′ ′Φ = − − + −∑W W V W W V L L V W u W X L X λ , (4.20)
where λ is a ( )1 1p + × vector of Lagrangian multipliers.
Since ,I n N= ⊗V Σ P , ,1
n N n nN= −P I J ,
0
p
kk == ⊕W w ,
( ),0 0
20 0 , 0 01 0 , 1 0 0 ,
210 1 , 0 2 1 , 1 1 1 ,
20 , 0 1 , 1 ,
.
p p
I k n N kk k
n N n N p n N p
n N n N p n N p
p p n N p p n N p p n N p
σ σ σ
σ σ σ
σ σ σ
= =
⎛ ⎞ ⎛ ⎞′ ′= ⊕ ⊗ ⊕⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
′ ′ ′⎛ ⎞⎜ ⎟⎜ ⎟′ ′ ′⎜ ⎟=⎜ ⎟⎜ ⎟⎜ ⎟′ ′ ′⎝ ⎠
W V W w Σ P w
w P w w P w w P w
w P w w P w w P w
w P w w P w w P w
It is obvious that ( ) 2,
1
p
I k k n N kk
trace σ=
′ ′= ∑W V W w P w , ( ) 2,2I k n N k
k
tr σ∂ ′ =∂
W V W P ww
and
59
( )
2,
10 200 , 0
2 2, 11 , 1 212 ,0
2,
2,
1
2
22
2
p
k k n N kk
n Np
pk k n N k n Nk
I k n Nk
pp n N pp
k k n N kkp
tr
σσ
σ σσ
σσ
=
==
=
⎛ ⎞∂ ′⎜ ⎟∂⎜ ⎟ ⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟ ⎜ ⎟∂⎜ ⎟ ⎜ ⎟′ ⎜ ⎟∂ ⎛ ⎞⎛ ⎞⎜ ⎟∂ ⎜ ⎟ ⎜ ⎟′ = = = ⊕ ⊗⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟∂ ⎜ ⎟⎝ ⎠⎝ ⎠⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎝ ⎠⎝ ⎠∂⎜ ⎟′⎜ ⎟∂⎝ ⎠
=
∑
∑
∑
w P ww
wP w
w P w wP wwW V W P
w
wP ww P w
w
( ),2
2 ,
n N
I
⊗
=
ΣD P w
D w
where 2
0
p
kkσ
== ⊕ΣD and ,I n N= ⊗ΣD D P .
Since ( ) 2,
0
p
I II II k k nk
N ntrN
σ=
−′ ′= − ∑W V L w 1 , ( ) 2,I II II k n
k
N ntrN
σ∂ −′ = −∂
W V L 1w
,
we have ( ) ( ),I II II p nN ntr
N∂ −′ = − ⊗∂ ΣW V L D 1 1w
. In addition,
( ) ( )( )
( )( )( )
( ) ( )( )( )
( ) ( )( )
1 1 1 1 10 0 0
1 ' 1 ' 1 ' 10 ' 0
1 ' 1 ' 1 ' 10 ' 0
1 ' 1 ' 1 ' 10 ' 0
p p p
k I k k k k n Ik k k
p p
k k k k n Ik k
p p
k k k k n Ik k
p p
k k k k nk k
tr tr
tr
+ + + + += = =
+ + + += =
+ + + += =
+ + + += =
⎛ ⎞⎛ ⎞∂ ∂′ ′ ′ ′ ′= ⊗⎜ ⎟⎜ ⎟∂ ∂ ⎝ ⎠⎝ ⎠
∂ ′ ′ ′= ⊗∂
∂′ ′ ′= ⊗∂
′ ′= ⊗
∑ ∑ ∑
∑∑
∑∑
∑∑
u W X λ u u w u u I X λw w
u u w u u I X λw
u u w u u I X λw
u u u u I
.
I
I
⎛ ⎞⎜ ⎟⎝ ⎠
=
X λ
X λ
Therefore, differentiating (4.20) with respect to elements of w and λ gives
( ) ,2 2 2I I II I∂Φ = − +
∂w D w D X λ
w, (4.21)
60
( ) ( )1
11
p
I II II i I II II pi
+
+=
∂ ′ ′ ′ ′Φ = − = −∂ ∑w X W X L u X w X L 1λ
, (4.22)
where ,I n N= ⊗ΣD D P and ( ),I II p nN n
N−
= − ⊗Σd D 1 1 . Setting both (4.21) and (4.22)
to zero results in the following estimating equations,
,
1
ˆˆ
I III I
II II pI +
⎛ ⎞ ⎛ ⎞⎛ ⎞=⎜ ⎟ ⎜ ⎟⎜ ⎟ ′′⎝ ⎠ ⎝ ⎠⎝ ⎠
w dD XX L 1X 0 λ
. (4.23)
The solutions to (4.23) are
( ) ( )11 1 1 1, , 1ˆ I I II I I I I I I I I II II II p
−− − − −+′ ′ ′= − −w D d D X X D X X D d X L 1 .
Since 1,
1n N n nN n− ⎛ ⎞= +⎜ ⎟−⎝ ⎠
P I J , 1,n N n n
NN n
− =−
P 1 1 , 1,n n N n
NnN n
−′ =−
1 P 1 ,
1, 1I I II p n
−+= − ⊗D d I 1 , we have
( ) ( ) ( ) ( ) ( ) ( )( )( ) ( )( )
111 1 1 1 1 1, 1 1 , 1
11 1 1, ,
11 ,
I I I I I n N p n p n n N p n
n N n n n N n
p nn
−−− − − − − −+ + +
−− − −
+
′ ′= ⊗ ⊗ ⊗ ⊗ ⊗
′= ⊗
⎛ ⎞= ⊗ ⎜ ⎟⎝ ⎠
Σ Σ
Σ Σ
D X X D X D P I 1 I 1 D P I 1
D D P 1 1 P 1
I 1
( )( )( )
( )( )( )
1 1 1, 1 1 ,
1 1 1
1 1 1 1 ,
I I I II II II p p n n N n
p N n p N n p
p p p p
N nN
N nn N n nn
− − −+ +
+ − + − +
+ + + +
−′ ′ ′− = − ⊗ ⊗ ⊗
′− ⊗ ⊗
−⎛ ⎞= − − − = − +⎜ ⎟⎝ ⎠
Σ ΣX D d X L 1 I 1 D P D 1
I 1 I 1 1
I 1 I 1
( ) ( )
( ) ( )
11 1 1 1, , 1
1 1 1 1
1
ˆ
.
I I II I I I I I I I I II II II p
p n p n p p
p n
N nn
N nn
−− − − −+
+ + + +
+
′ ′ ′= − −
−⎛ ⎞= − ⊗ + ⊗ +⎜ ⎟
⎝ ⎠−
= ⊗
w D d D X X D X X D d X L 1
I 1 I 1 I 1
1 1
61
Consequently, ( )1ˆ
p nN n
n +−
= ⊗W I 1 , and
( )1 1 1ˆ ˆn
p n p n I p IN n N N
n n+ + +
′− ⎛ ⎞⎛ ⎞′ ′= ⊗ + ⊗ = ⊗ =⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
1T I 1 I 1 Z I Z μ .
( ) ( )( ) ( ) ( ),var p n n N p n
N N nN n N nn n n
−− −⎛ ⎞ ⎛ ⎞′= ⊗ ⊗ ⊗ =⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
T I 1 Σ P I 1 Σ .
4.8. Derivations of results in Section 4.5.2.
This section presents the derivations for a class of estimators defined by
0
p
kk == ⊕W w and optimized by minimizing ( )ˆcov′q T q subject to the unbiasedness
constraints for every element of T , where q is an 1n × vector.
As shown in Section 4.4.2., the unbiasedness constraints are
I II II′ ′− =W X L X 0 . (4.24)
Minimizing ( )ˆcov′q T q subject to unbiased constraints (4.24) is equivalent to
minimizing the following Lagrangian function,
( ) ( ) ( ) ( )1
, ,1
2p
I I II II I II II i I II IIi
+
=
⎡ ⎤′′ ′ ′ ′ ′ ′Φ = − − + −⎢ ⎥⎣ ⎦∑W q W V W W V L V L W q u W X L X λ ,
where ( ) ( )0 11 1 pp λ λ λ+ ×′=λ is a ( )1 1p + × vector of Lagrangian multipliers.
For convenience, we denote ( )0 1 p′′ ′ ′=w w w w . It is clear that
10
p
k pk +=
⎛ ⎞= ⊕⎜ ⎟⎝ ⎠
w w 1 and ( )( )1
1
p
i i n ii
+
=
′ ′= ⊗∑W u u I wu , where iu and i′u are the i -th column
or row of identity matrix 1p+I . Therefore, ( ) ( )Φ = Φw W .
62
Since 0
p
k nkq
=
⎛ ⎞⎛ ⎞= ⊕ ⊗⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠Wq I w and
0
p
k nkq
=
⎛ ⎞⎛ ⎞′ ′ ′= ⊕ ⊗⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠q W w I ,
0 0
0 02 ,
p p
I k n I k nk k
p p
k n I k nk k
q q
q q
= =
= =
∂ ∂ ⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞′ ′ ′= ⊕ ⊗ ⊕ ⊗⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟∂ ∂ ⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞= ⊕ ⊗ ⊕ ⊗⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠
q W V Wq w I V I ww w
I V I w
( ), , ,0 0
p p
I II II k n I II II k n I II IIk kq q
= =
∂ ∂ ⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞′ ′ ′= ⊕ ⊗ = ⊕ ⊗⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟∂ ∂ ⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠q W V L q w I V L q I V L q
w w.
As shown in Section 4.9, ( ) ( )
1
11
p
i I II II I II IIpi
+
+=
′ ′ ′ ′ ′ ′− = −∑u W X L X λ w X λ 1 L X λ ,
( )1
1
p
i I II II Ii
+
=
∂ ′ ′ ′− =∂ ∑u W X L X λ X λw
, ( )1
11
p
i I II II I II II pi
+
+=
∂ ′ ′ ′ ′ ′− = −∂ ∑u W X L X λ X w X L 1λ
.
Consequently, the two sets of derivatives are,
,0 0 02 2 2
p p p
k n I k n k n I II II Ik k kq q q
= = =
∂Φ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞= ⊕ ⊗ ⊕ ⊗ − ⊕ ⊗ +⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟∂ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠ ⎝ ⎠I V I w I V L q X λ
w,
1I II II p+∂Φ ′ ′= −∂
X w X L 1λ
.
Setting both sets of derivatives to zeros, we have,
,00 0
1
ˆ
ˆ
pp p
k n I II IIk n I k n I kk k
II II pI
qq q== =
+
⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞ ⊕ ⊗⊕ ⊗ ⊕ ⊗ ⎛ ⎞ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎝ ⎠⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠ =⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠⎜ ⎟ ⎜ ⎟′′⎝ ⎠ ⎝ ⎠
I V L qI V I X w
λX L 1X 0
. (4.25)
Denote 0 0
p p
qI k n I k nk kq q
= =
⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞= ⊕ ⊗ ⊕ ⊗⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠V I V I and , ,0
p
qI II k n I IIkq
=
⎛ ⎞⎛ ⎞= ⊕ ⊗⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠V I V , the
solutions to (4.25) can be written as
( )( )
11 1, 1
11 1 1 1, 1 , 1
ˆ
ˆ
I qI I I qI qI II II II p
qI qI II II p qI I I qI I I qI I II II II p
−− −+
−− − − −+ +
⎧ ′ ′ ′⎡ ⎤= −⎣ ⎦⎪⎨
′ ′ ′⎡ ⎤⎪ = − −⎣ ⎦⎩
λ X V X X V V X L 1
w V V L 1 V X X V X X V V X L 1. (4.26)
63
The explicit expression for w can be derived as follows,
( ) 11 1 1 1, , 1ˆ qI qI II II qI qI I qI I I qI I II II II II p
−− − − −+′ ′ ′⎡ ⎤= − −⎣ ⎦w V V L q V X X V X X V V L q X L 1 ,
where
1 1 1 1 1 1 1 1 1, ,0 0 0 0
p p p p
qI k n n N k n k k n Nk k k kq q q q− − − − − − − − −
= = = =
⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎡ ⎤= ⊕ ⊗ ⊗ ⊕ ⊗ = ⊕ ⊕ ⊗⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎣ ⎦⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠ ⎝ ⎠V I Σ P I Σ P ,
( ), 0
1p
qI II k n N nkq
N × −=
⎛ ⎞⎛ ⎞ ⎛ ⎞= ⊕ ⊗ −⎜ ⎟⎜ ⎟⎜ ⎟ ⎝ ⎠⎝ ⎠⎝ ⎠V Σ J , ( )
1 1, 0
1p
qI qI II k n N nkq
N n− −
× −=
⎛ ⎞ ⎛ ⎞= ⊕ ⊗ −⎜ ⎟⎜ ⎟ −⎝ ⎠⎝ ⎠V V J , and
1 1 1 1
0 0
p p
qI I k k nk k
N q qN n
− − − −
= =
⎛ ⎞⎛ ⎞ ⎛ ⎞= ⊕ ⊕ ⊗⎜ ⎟ ⎜ ⎟⎜ ⎟− ⎝ ⎠ ⎝ ⎠⎝ ⎠V X Σ 1 ,
( ) ( ) ( )
( )
111 1 1 1 1
1 , 10 0
1 111 1 1 1 1 1 1
,0 0 0 0
p p
I qI I p n k k n N p nk k
p p p p
k k n n N n k kk k k k
q q
N nq q q qNn
−−− − − − −
+ += =
− −−− − − − − − −
= = = =
⎧ ⎫⎛ ⎞⎛ ⎞⎪ ⎪⎛ ⎞ ⎛ ⎞′ ′= ⊗ ⊕ ⊕ ⊗ ⊗⎨ ⎬⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠⎪ ⎪⎝ ⎠⎩ ⎭
−⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞′= ⊕ ⊕ = ⊕ ⊕⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠
X V X I 1 Σ P I 1
Σ 1 P 1 Σ
( ) ( )1 1 1
, 1 0 0
1p p
I qI qI II p n k k N nn N nk k
nq qN n N n
− − −+ −× −= =
⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞′ ′ ′= ⊗ ⊕ ⊗ − = − ⊕ ⊗⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟− −⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠X V V I 1 J 1 .
Therefore,
( ) ( )
( )
110
11 1 1 1 1 1
0 0 0 0
11 10
1ˆp
k p N nn N nk
p p p p
k k n k kk k k k
p
k N n p N n pk
qN n
N N nq q q qN n Nn
n qN n
−+ −× −=
−− − − − − −
= = = =
−− + − +=
⎛ ⎞⎛ ⎞⎛ ⎞= ⊕ ⊗ − ⊗⎜ ⎟ ⎜ ⎟⎜ ⎟−⎝ ⎠ ⎝ ⎠⎝ ⎠
⎛ ⎞ −⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞− ⊕ ⊕ ⊗ ⊕ ⊕⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟− ⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠
⎛ ⎞⎛ ⎞ ′× − ⊕ ⊗ ⊗ − ⊗⎜ ⎟⎜ ⎟− ⎝ ⎠⎝ ⎠
w J I 1 q
Σ 1 Σ
1 I 1 q I( )( )
( ) ( )
( ) ( )
1 1
1 11 10 0
1 1
1
,
N n p N n p
p p
k n p n k pk k
p n p n
q n q N nn
Nn
− + − +
− −+ += =
+ +
⎡ ⎤′ ⊗⎢ ⎥
⎣ ⎦⎛ ⎞ ⎡ ⎤⎛ ⎞ ⎛ ⎞= − ⊕ ⊗ − ⊗ − ⊕ − −⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎢ ⎥⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎣ ⎦
= − ⊗ + ⊗
1 I 1 1
1 q I 1 q 1
1 1 1 1
or simply,
64
( )1ˆ p nN n
n +
−= ⊗w 1 1 and 1
ˆp n
N nn+
−⎛ ⎞= ⊗ ⎜ ⎟⎝ ⎠
W I 1 .
4.9. Derivations of results in Section 4.5.3.
This section provides proof of the results in Section 4.5.3. A class of estimators
defined by 1 *p+= ⊗W I w is derived by minimizing trace of ( )ˆcov T subject to the
unbiasedness constraints for every element of T .
Since 1 *p+= ⊗W I w , the unbiasedness constraints are reduced to
( )* 0n N n′ − − =w 1 . (4.27)
Minimizing the trade of ( )ˆcov T subject to (4.27) is equivalent to minimizing the
following Lagrangian function,
( ) ( ) ( )( ), , *2I I II II II I II ntr N n λ′′ ′ ′ ′Φ = − − + − −W W V W W V L L V W w 1 , (4.28)
where λ is a Lagrangian multiplier.
Since ( ) ( )1 * 1 *I p I p+ +′′ = ⊗ ⊗W V W I w V I w , ,I n N= ⊗V Σ P ,
( ) ( )( )( ){ } ( )( )1 * , 1 * * , *I p n N p n Ntr tr tr+ +′ ′′ = ⊗ ⊗ ⊗ =W V W I w Σ P I w Σ w P w ,
( ) ( ) ( ) ( )* , * , ***
2I n N n Ntr tr tr∂ ∂ ′′ = =∂ ∂
W V W Σ w P w Σ P ww w
( ) ( ) ( ) ( )
( )
, 1 * 1
*
1I II II p p N nn N n
n
trace trN
N ntrN
+ + −× −
⎡ ⎤⎛ ⎞⎛ ⎞′′ = ⊗ ⊗ − ⊗⎢ ⎥⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠⎣ ⎦−⎛ ⎞′= −⎜ ⎟
⎝ ⎠
W V L I w Σ J I 1
w Σ 1,
( ) ( ),*
I II II nN ntr tr
N∂ −′ = −∂
W V L Σ 1w
,
65
( )( )**
n nN n λ λ∂ ′ − − =∂
w 1 1w
,
( ) ( )**n N n
λ∂ ′Φ = − −∂
w 1 w .
Therefore, differentiating (4.28) with respect to *w and λ and setting the
derivatives to equal zero gives the following estimating equations
*, , ,ˆˆ
tr I I tr I II I
I N nλ
⎛ ⎞⎛ ⎞ ⎛ ⎞=⎜ ⎟⎜ ⎟ ⎜ ⎟′ −⎝ ⎠⎝ ⎠⎝ ⎠
wV X VX 0
l, (4.29)
where ( ), ,tr I n Ntr=V Σ P , I n=X 1 , ( ), ,tr I IIN n tr
N−
= −V Σ and I n= 1l . The solutions of
(4.29) are
( ) ( )( )11 1 1 1* , , , , , , , ,ˆ tr I tr I II I tr I I I tr I I I tr I tr I II I N n
−− − − −′ ′= − − −w V V V X X V X X V Vl l . (4.30)
After algebraic simplification, (4.30) is equivalent to ( )*ˆ nN nn
= −1w or
( )1ˆ
p nN n
n +
−= ⊗W I 1 . Consequently,
1ˆ ˆn
p IN Nn+
′⎛ ⎞= ⊗ =⎜ ⎟⎝ ⎠
1T I Z μ , and ( ) ( )varN N n
n−
=T Σ .
4.10. Derivations of results in Section 4.5.4.
This section provides derivations for the class of estimators defined by
1 *p+= ⊗W I w and optimized by minimizing ( )ˆcov′q T q subject to the unbiasedness
constraints for every element of T , where q is an 1n × vector.
Since 1 *p+= ⊗W I w , the unbiasedness constraints reduce to
( )* 0n N n′ − − =1 w . (4.31)
66
Minimizing ( )ˆcov′q T q subject to (4.24) is equivalent to minimize the following
Lagrangian function,
( ) ( ) ( )( ), *2 2I I II II n N n λ′ ′ ′ ′ ′Φ = − + − −W q W V W q W V L q 1 w ,
where λ is a ( )1 1p + × column vector of Lagrangian multipliers. The Lagrangian
function can be written alternatively as,
( ) ( ) ( )( )* , *2 2I I II II n N n λ′ ′ ′ ′ ′Φ = − + − −w q W V Wq q W V L q 1 w .
Let us evaluate them by part.
( )( )( ) ( )
( )
* , * * , ** * *
, *2
I n N n N
n N
∂ ∂ ∂′ ′ ′ ′ ′ ′= ⊗ ⊗ ⊗ =∂ ∂ ∂
′=
q W V Wq q w Σ P q w q Σq w P ww w w
q Σq P w,
( ) ( )( )
( ) ( ) ( )
( ) ( )
( )
, 1 * ,* *
* 1*
**
1
I II II p I II II
p N nn N n
n
n
NN n
NN n
N
+
+ −× −
∂ ∂′ ′ ′ ′= ⊗∂ ∂
∂ ⎛ ⎞⎛ ⎞′ ′= ⊗ − ⊗ ⊗⎜ ⎟⎜ ⎟∂ ⎝ ⎠⎝ ⎠− ∂′ ′= −
∂− ′= −
q W V L q q I w V L qw w
q w Σ J I 1 qw
q Σq 1 ww
q Σq 1
,
( )( )*
*n nN n λ λ∂ ′ − − =
∂1 w 1
w,
( )( ) ( )* *n nN n N nλλ∂ ′ ′− − = − −∂
1 w 1 w .
Therefore,
( ) ( ) ( )* , **
2 2 2n N n nN n
Nλ∂ −′ ′Φ = − +
∂w q Σq P w q Σq 1 1
w (4.32)
( ) ( )* *n N nλ∂ ′Φ = − −∂
w 1 w (4.33)
67
Set both (4.32) and (4.33) to zero, we have,
( )* , ,,
ˆˆ
q I II Iq I I
I N nλ
⎛ ⎞ ⎛ ⎞⎛ ⎞=⎜ ⎟ ⎜ ⎟⎜ ⎟′ −⎝ ⎠ ⎝ ⎠⎝ ⎠
w VV XX 0
l, (4.34)
where ( ), ,q I n N′=V q Σq P , ( ), ,q I IIN n
N− ′= −V q Σq , I n= 1l and I n=X 1 . The solutions
of (4.34) are
( ) ( )( )11 1 1 1* , , , , , , , ,ˆ q I q I II I q I I I q I I I q I q I II I N n
−− − − −′ ′= − − −w V V V X X V X X V Vl l . (4.35)
After simplification, (4.35) becomes,
*ˆ nN n
n−
=w 1 ,
or equivalently,
( )1ˆ
p nN n
n +
−= ⊗W I 1 .
Consequently,
1ˆ ˆn
p IN Nn+
′⎛ ⎞= ⊗ =⎜ ⎟⎝ ⎠
1T I Z μ , and ( ) ( )varN N n
n−
=T Σ .
68
CHAPTER 5
ESTIMATING POPULATION TOTALS BY REPARAMETERIZATION
5.1. Motivation
In many practical situations, auxiliary information such as gender, age, income
and chronic disease-bearing history are completely or partially known either from a
sampling frame or publicly available census data. Although such information is
commonly used in survey design, its use in estimation after sampling is equally
important. In this Chapter, we demonstrate several methods of incorporating auxiliary
information in estimation through reparameterization. We will consider two scenarios,
i.e.,
1) Values of auxiliary variable are known for all units in the population,
2) Values of auxiliary variable are known only for all units in the sample while
its population total (or mean) is known for the auxiliary variable.
5.2. Formulation of the model through reparameterization
In Chapter 4, we defined a random permutation seemingly unrelated regression
model,
( )
( )( )
0
21 N
⎛ ⎞⎜ ⎟ = ⊗ +⎜ ⎟⎝ ⎠
YI 1 μ E
Y. (5.1)
In both scenarios considered here, the auxiliary total ( )1T (or the mean ( )1μ ) is known.
As a result, the N random variables representing the permutation of auxiliary variable
( )1Y is constrained by ( ) ( )1 1N T′ =1 Y . We transform the vector of auxiliary variables
69
( )1Y by pre-multiplying NN N NN
′= +1I 1 P , such that ( ) ( )( ) ( )1 1 1N
N NN′= +
1Y 1 Y P Y . The
first term in this expression is a vector of constant ( )( )1Nμ 1 . We define the second term
as
( ) ( ) ( ) ( )1 * 1 1 1N Nμ= = −Y P Y Y 1 .
This transformation can be represented jointly in terms of both ( )0Y and ( )1Y ,
( )
( )
( )
( ) ( )
0 0
11 1 *Nμ
⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟= + ⎜ ⎟
⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎝ ⎠⎝ ⎠ ⎝ ⎠
0Y YΠ
1Y Y,
where N
NN NN
⎛ ⎞⎛ ⎞ ⎜ ⎟= +⎜ ⎟ ⎜ ⎟⎜ ⎟ ′⎜ ⎟⎝ ⎠ ⎝ ⎠
0 0I 0Π 10 P 0 1
. Therefore, Model (5.1) becomes,
( )
( ) ( )
( )
( )
00
1 11 *
N
N N
μ
μ μ
⎛ ⎞⎛ ⎞ ⎛ ⎞⎜ ⎟⎜ ⎟ + = +⎜ ⎟
⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠⎝ ⎠ ⎝ ⎠
0 1 0YE
1 0 1Y,
or simply
( )
( )( )
0*
21 * N
⎛ ⎞⎜ ⎟ = ⊗ +⎜ ⎟⎝ ⎠
YI 1 μ E
Y, (5.2)
where ( )0
*
0
μ⎛ ⎞⎜ ⎟=⎜ ⎟⎝ ⎠
μ . As shown in A2.1.15, ( )
( )
0
1 *cov N
⎛ ⎞⎜ ⎟ = ⊗⎜ ⎟⎝ ⎠
YΣ P
Y, where
20 01
210 1
σ σ
σ σ
⎛ ⎞⎜ ⎟=⎜ ⎟⎝ ⎠
Σ .
The next step is to partition ( ) ( )( )0 1 * ′′ ′Y Y into two parts that represent sample
and the remaining parts,
70
( )
( )
0* *
1 *
I
II
⎛ ⎞ ⎛ ⎞⎜ ⎟= = ⎜ ⎟⎜ ⎟⎜ ⎟ ⎝ ⎠⎝ ⎠
ZYZ K
ZY,
where *K is a permutation matrix as given by (2.9) in Chapter 2, ( ) ( )( )0 1 *I I I
′′ ′=Z Y Y is
the sample and ( ) ( )( )0 1 *II II II
′′ ′=Z Y Y is the remaining part. It immediately follows that
( ) ( )( )0
20
I nEμ⎛ ⎞⎜ ⎟= ⊗⎜ ⎟⎝ ⎠
Z I 1 , ( ) ( )( )0
20
II N nEμ
−
⎛ ⎞⎜ ⎟= ⊗⎜ ⎟⎝ ⎠
Z I 1 , ,I n N= ⊗V Σ P and
( ), ,1
I II II I n N nN × −⎛ ⎞′= = ⊗ −⎜ ⎟⎝ ⎠
V V Σ J .
5.3. Estimating ( )0T and ( )0μ with a known auxiliary population total
In this Section , we will derive an estimator of response population total (and
mean) using the method outlined in Section 3.5. of Chapter 3. First, we briefly reiterate
the research question and then derive the estimator.
5.3.1. Parameters of interest and their linear estimators
The parameter of interest is the population total of the response variable,
( ) ( )0 0NT ′ ′= =1 Y Z*l , (5.3)
where ( )1N N×′′= 1 0l . We are interested in deriving an estimator for ( )0T that is a
linear function of the sample data and unbiased for ( )0T , and has minimum mean
squared error. After partitioning, ( )0T can be represented as a linear combination of the
sampled and remaining parts,
( )0I I II IIT ′ ′= +Z Zl l , (5.4)
71
where ( )1 0I n′= ⊗1l and ( )1 0II N n−
′= ⊗1l .
The linear estimator of ( )0T is defined as linear function of IZ ,
( ) ( )0ˆI IT ′ ′= + w Zl , (5.5)
where ( )0 1′′ ′=w w w is a 2 1n × vector.
5.3.2. Unbiasedness constraint and optimization criterion
As shown in Section 3.5, the unbiasedness for ( )0T implies constraint (3.24), or
namely,
( )0 0n N n′ − − =w 1 . (5.6)
Under this constraint, the mean square error of ( )0T is the same as it variance,
( )( ) ( ),0
,
ˆvarI I II
II
II I II II
T⎛ ⎞⎛ ⎞
′ ′ ⎜ ⎟⎜ ⎟= −⎜ ⎟⎜ ⎟ −⎝ ⎠⎝ ⎠
V V ww
V Vl
l. (5.7)
To minimize (3.25) subject to (3.24), we define the optimization function as
( ) ( ) ( ) ( )( )0
0 1 0 1 , 0
1
2 2I I II II n N n λ⎛ ⎞
′ ′ ′ ′ ′⎜ ⎟Φ = − + − −⎜ ⎟⎝ ⎠
ww w w V w w V w 1
wl , (5.8)
where λ is a Lagrangian multiplier, and IV is the expected variance-covariance matrix
of the sample. Differentiating (3.26) and setting the derivatives to zero leads to the
following estimating equations,
( )
( )
( )
0
,
1
ˆ1 11
ˆ0 0
ˆ1 0 0 1
n N n n
n
f
N fλ
⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⊗ ⊗ − − ⊗⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ =⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟′⊗ −⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠⎝ ⎠ ⎝ ⎠
wΣ P 1 Σ 1
w
1
,
where Σ is given in Section 2.2. This leads to the unique solution,
72
0
011
ˆ
ˆ
nn
n
Nn β
⎛ ⎞⎛ ⎞ ⎛ ⎞= − + ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟−⎝ ⎠⎝ ⎠ ⎝ ⎠
1w 1
1w 0,
where 201 01 1β σ σ= is the linear regression coefficient of ( )0y on ( )1y in the finite
population (Cochran 1977). Consequently, the estimator for ( )0T is,
( ) ( )( )( )
( )
( ) ( ) ( ) ( )
( ) ( ) ( ) ( ) ( )( )( ) ( ) ( )( )
00
1 0 1 1 *
0 0 1 *01
0 0 1 1 1 101 01
ˆ 垐
,
I
I
n nn I I I
T
NN nn N n n
nY N n Y Y n Y
β
β μ β μ
⎛ ⎞⎜ ⎟′ ′ ′= +⎜ ⎟⎝ ⎠
′ ′⎛ ⎞⎛ ⎞′= + − − ⎜ ⎟⎜ ⎟− ⎝ ⎠⎝ ⎠
= + − − − − −
Yw w
Y
1 11 Y Y Y
l
(5.9)
where ( ) ( )0 01n IY
n′= 1 Y and ( ) ( )1 11
n IYn
′= 1 Y are the sample means of the response and
auxiliary variables, respectively. Let us write (5.9) alternatively as,
( ) ( ) ( ) ( )( )0 1 0 101 01T N N Y Yβ μ β= + − . (5.10)
Identity (5.10) implies that, in estimating ( )0T , we would be estimating ( )0T as a
constant ( )( )101Nβ μ plus the finite population estimating function
( ) ( )( )0 101N μ β μ− ,
for which the best sample-based estimating function is
( ) ( )( )0 101N Y Yβ− .
The variance of ( )0T is
( )( ) ( )0 2 2 20 01
1ˆvar 1fT Nn
σ ρ−= − , (5.11)
73
where 2 2 201 01 0 1ρ σ σ σ= is the population correlation coefficient between the response
and auxiliary variable.
Similarly, the estimator of the mean for the response variable and its variance
are
( ) ( ) ( ) ( )( )0 0 1 101ˆ Y Yμ β μ= − − (5.12)
and
( ) ( )2 20 0 01
1ˆvar 1fn
μ σ ρ−= − , (5.13)
respectively.
Derivations of the above results are given in Section 5.7.
5.3.3. Special case: ( )0′′=w w 0
A special case is when 1w is forced to be a null vector, such that ( )0′′=w w 0 .
Using a similar derivation as above, it is shown that, 0ˆ nN n
n−
=w 1 , which is the weight
of the simple expansion estimator. Therefore, the estimator for ( )0T when
( )0′′=w w 0 is ( ) ( )0 0T NY= , and its variance is ( )( )0 2 2
01ˆvar .fT N
nσ−⎛ ⎞= ⎜ ⎟
⎝ ⎠
5.3.4. Remarks
Unlike the usual calibration technique of incorporating auxiliary information,
this approach leads to a set of weights that do not depend on sample, and yields a linear
unbiased minimum variance estimator of ( )0T . Estimator (5.10) is seen to be a
generalized regression estimator (GREG); this has been shown to be both design- and
model-unbiased (Särndal, Swensson and Wretman 1989; Särndal, Swensson and
74
Wretman 1992; Thompson 1997). That is, we have derived under a design-based
framework an estimator that is exactly the same as a GREG estimator. This estimator
depends on neither a superpopulation model nor an assumption about a regression
model about the response and auxiliary variables.
5.4. Estimating ( )0T and ( )0μ with several known auxiliary totals
In this section, we extend the results in Section 5.3 to scenarios with multiple
auxiliary variables. Similar to Model (5.1), we define a random permutation seemingly
unrelated regression model that includes p auxiliary variables,
( )
( )( )
( )
( )
0 0
1p N
μ•
+•
⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟= ⊗ +⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
YI 1 E
Y μ, (5.14)
where ( ) ( ) ( ) ( )( )1 2 p• ′′ ′ ′=Y Y Y Y is the vector of all p auxiliary variables and
( ) ( ) ( ) ( )( )1 2 pμ μ μ• ′=μ is the vector of auxiliary means.
Since the vector of auxiliary totals ( ) ( ) ( ) ( )( )1 2 pT T T• ′=T (and thus ( )•μ )
is known, the N random variables arising from permuting the k -th auxiliary variable
have a constraint ( ) ( )k kN T′ =1 Y , where 1,2, ,k p= … . We transform ( )•Y by pre-
multiplying Np N NN
⎛ ⎞′⊗ +⎜ ⎟⎝ ⎠
1I 1 P , such that,
( ) ( ) ( ) ( )Np N p NN
• • •⎛ ⎞′= ⊗ + ⊗⎜ ⎟⎝ ⎠
1Y I 1 Y I P Y , (5.15)
75
The first component of (5.15) is a vector of constants equal to ( )Np N
•⎛ ⎞⊗⎜ ⎟⎝ ⎠
1I T . We
denote the second component of (5.15) as
( ) ( ) ( ) ( ) ( ) ( )*p N p N
•• • •= ⊗ = − ⊗Y I P Y Y I 1 μ .
This transformation can be represented jointly in terms of both ( )0Y and ( )•Y ,
( )
( )
( )
( ) ( )
0 0
*
N
N Np N p N pN N
•• •
⎡ ⎤⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞⎢ ⎥⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ + = +⎢ ⎥⎜ ⎟ ⎜ ⎟⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟ ⎜ ⎟⊗ ′⊗ ⊗⎢ ⎥⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎣ ⎦
0 0 0I 0 Y Y
1 10 I P 0 I 1 I TY Y.
After this transformation, Model (5.14) becomes,
( )
( )( )
( )0 0
1* p N
μ+
•
⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟= ⊗ +⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
YI 1 E
Y 0. (5.16)
As shown in Chapter 2, Equation A2.1.15, ( )
( )
0
cov N•
⎛ ⎞⎜ ⎟ = ⊗⎜ ⎟⎝ ⎠
YΣ P
Y, where
( )
20 01 0
220 010 1 1
0
20 1
p
p
p p p
σ σ σ
σσ σ σ
σ σ σ
•
• •
⎛ ⎞⎜ ⎟
′⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟= =⎜ ⎟⎜ ⎟⎝ ⎠⎜ ⎟
⎜ ⎟⎝ ⎠
σΣ
σ Σ.
After partitioning into sample and remaining parts, ( )I II′′ ′=Z Z Z , where
( ) ( )( )0 *I I I
• ′′ ′=Z Y Y is the sample and ( ) ( )( )0 *II II II
• ′′ ′=Z Y Y is the remaining part. It
immediately follows that ( ) ( )( )0
1I p nEμ
+
⎛ ⎞⎜ ⎟= ⊗⎜ ⎟⎝ ⎠
Z I 10
, ( ) ( )( )0
1II p N nEμ
+ −
⎛ ⎞⎜ ⎟= ⊗⎜ ⎟⎝ ⎠
Z I 10
,
( )
20 0
,0
I n N
σ •
• •
′⎛ ⎞⎜ ⎟= ⊗⎜ ⎟⎝ ⎠
σV P
σ Σ and
( )( )
20 0
, ,0
1I II II I n N nN
σ •
× −• •
′⎛ ⎞⎛ ⎞⎜ ⎟′= = ⊗ −⎜ ⎟⎜ ⎟ ⎝ ⎠
⎝ ⎠
σV V J
σ Σ.
76
The population total of the response variable, the population parameter of our
primary interest, can be defined as,
( ) ( )( )
( )
( )
( )
0 00 0
*NT• •
⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟′ ′ ′= = =⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
Y Y1 Y
Y Yl l , (5.17)
where ( )( )1N p N×′′ ′= ⊗1 1 0l . After partitioning, ( )0T can be represented as a linear
combination of the sampled and remaining parts,
( )0I I II IIT ′ ′= +Z Zl l ,
where ( )11I p n×′= ⊗0 1l and ( )11II p N n× −
′= ⊗0 1l .
We derive an estimator for ( )0T that is a linear function of IZ , unbiased for ( )0T
and has minimum variance. The linear estimator of ( )0T is defined as linear function of
the sample,
( ) ( )0ˆI IT ′ ′= + w Zl , (5.18)
where ( )( )0 •′′ ′=w w w is a 1pn × vector of weights, and ( ) ( )1 2 p•
′′ ′ ′=w w w w .
The unbiasedness for ( )0T requires that ( ) 0I II IIE ′ ′− =w Z Zl . Since
( ) ( ) ( )( )01I p nE μ+
′= ⊗Z I 1 0 , ( ) ( ) ( )( )01II p N nE μ+ −
′= ⊗Z I 1 0 , this constraint is
equivalent to
( )( ) ( )( )
( )0 *0
1
0p
kn n
k
N n μ μ=
′ ′− − + =∑1 w 1 w i . (5.19)
Since ( )* 0kμ = for any 1,2, ,k p= … , (5.19) is satisfied for any ( )0μ simply by
( )0 0n N n′ − − =1 w . (5.20)
77
Similar to (3.25), the variance of ( )0T can be represented in terms of partitioned
matrices that include multiple auxiliary totals,
( )( )0,
ˆvar 2I I II II II II IIT ′ ′ ′= − +w V w w V L Vl l .
where ( )0 1 p′′ ′ ′=w w w w is an ( )1 1n p + × vector, ( )11II p N n× −
′= ⊗0 1l . Under
unbiasedness constraint (5.20), the Lagrangian function is defined as
( ) ( ){ }, 02 2I I II II n N n λ′ ′ ′Φ = − + − −w w V w w V w 1l , (5.21)
where λ is a Lagrangian multiplier. Differentiating (5.21) with respect to w and λ
results in the following estimating equations,
( )( )
( )1 , 1
1
I n I II N n
n N nλ−
⊗⎛ ⎞ ⊗⎛ ⎞⎛ ⎞=⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟′ ′⊗ −⎝ ⎠ ⎝ ⎠⎝ ⎠
V u 1 w V u 1u 1 0
, (5.22)
where ( )1 11 p×=u 0 is the first column of identity matrix 1p+I . The unique solution to
(5.22) is
( )( )11 11 1 1 1
10
ˆ
1,
n n
n n
NnNn
−− −
•
′= − ⊗ + ⊗
⎛ ⎞= − ⊗ + ⊗⎜ ⎟−⎝ ⎠
w u 1 Σ u u Σ u 1
u 1 1β
(5.23)
where 10 0.
−• • •=β Σ σ Consequently,
( ) ( ) ( ) ( )( )0 00
ˆ ,T NY N • ••′= − −β Y μ
( )( ) ( )( )0 2 2 200
1ˆvar 1 .fT Nn
ρ σ•
−⎛ ⎞⎛ ⎞= − ⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
where ( )•Y is a 1p × column vector of sample means of the p auxiliary variables, ( )•μ
is the column vector of known auxiliary means, and ( )1 2
0 0 00ρ σ−• • •• ′= σ Σ σ is the
78
multiple correlation coefficient of ( )0Y on ( )*•Y . This expression is similar to those
expression commonly seen in multiple linear regression models (Graybill 1976), except
its finite population correction factor. It is convenient to show
( ) ( ) ( )( )00 0ˆ ,Yμ • •
•′= − −β Y μ (5.24)
( ) ( )( )2 20 00
1ˆvar 1 fn
μ ρ σ•
−⎛ ⎞= − ⎜ ⎟⎝ ⎠
. (5.25)
Detailed derivations of the above results are summarized in Section 5.8.
5.5. Estimation of variance and confidence intervals
The estimators derived in this Chapter depend on the knowledge of the
covariance vector 0•σ between the response variable and the auxiliary variables, and the
variance-covariance matrix •Σ of auxiliary variables. They are often unknown to
surveyors in practice.
1) When auxiliary values for all units in the population are known, •Σ is
known, we only need to approximate 0•σ .
2) When individual auxiliary values are only known for the sampled units, we
may need to approximate both •Σ and 0•σ .
The problem of associating standard errors and confidence intervals with the
estimators can be assessed in several ways. One commonly used approach is to replace
0•σ and •Σ with their sample estimates, which is referred as methods of moments in
literature.
79
5.5.1. Normal approximation
To compute the variance of the estimators (5.10) and (5.24), one needs to know
20σ , 01σ (or 0•σ ), and 2
1σ (or •Σ ). Using the method of moments, one may simply use
their sample estimates, i.e.,
( ) ( )0 020
1 1ˆ1 I n n In n
σ ⎛ ⎞′= −⎜ ⎟− ⎝ ⎠Y I J Y ,
( ) ( )0 101
1 1ˆ1 I n n In n
σ ⎛ ⎞′= −⎜ ⎟− ⎝ ⎠Y I J Y ,
( ) ( )1 121
1 1ˆ1 I n n In n
σ ⎛ ⎞′= −⎜ ⎟− ⎝ ⎠Y I J Y .
However, if 21σ (or •Σ ) is provided or auxiliary values are known for all subjects, one
only need to use plug-in estimates for 20σ and 01σ (or 0•σ ). For example, using
methods of moment estimates for 20σ and 01σ , (5.10) leads to
( ) ( ) ( ) ( )( )( )0 0 1 101
ˆT N Y Yβ μ= + − ,
where 201 01 1
ˆ 垐β σ σ= if 21σ is unknown or 2
01 01 1ˆ ˆβ σ σ= if 2
1σ is known. The estimated
variance of ( )0T is,
( )( ) ( )0 2 2 20 01
1ˆ ˆˆvar 1fT Nn
σ ρ−= − ,
where 2 2 2 201 01 0 1ˆ 垐 �ρ σ σ σ= if 2
1σ is unknown or 2 2 2 201 01 0 1ˆ 垐ρ σ σ σ= if 2
1σ is known. An ad
hoc confidence interval based on appropriate normal-based prediction interval for ( )0T
would be
( ) ( )01 0varNY z Tα−± .
80
5.5.2. Jackknife variance
Another widely recommended variance estimation procedure is Jackknife
variance estimation (Royall and Cumberland 1978; Wolter 1985; Särndal, Swensson
and Wretman 1992; Stukel, Hidiroglou and Särndal 1996; Duchesne 2000; Valliant
2002). Suppose we are interested in estimating the variance of 0μ using (5.12); the
process of Jackknife variance estimation is as follows:
1) Partition the sample into B random groups of equal size m n B= , assuming
each of the m groups is a SRSWOR subsample from the sample;
2) For each group, compute a pseudo estimator ( )( )0 , 1, 2, ,b b mμ = … of ( )0μ
with the same function form as (5.12) based on the remaining units of the
sample after omitting the units in the group, ( )( )
( )( )
( )( ) ( )( )0 0 1 1
01ˆ b b bY Yμ β μ= − − .
We then define ( ) ( ) ( ) ( )( )0 0 0垐 �1b bm mμ μ μ= − − , where ( )0μ is computed with
(5.12) based on full sample.
Then the Jackknife estimator of ( )0μ is
( )0 0
1
1垐
m
JK bbm
μ μ=
= ∑
and the Jackknife variance estimator is defined as
( )
( ) ( )( )20 0
1
1ˆ 垐1
m
JK b JKb
Vm m
μ μ=
= −− ∑ .
An appropriate normal-based ( )1 100%α− × confidence interval for ( )0T would be
( )( )00 1
垐ˆ JKN z V Tαμ −± .
81
5.5.3. Bootstrap variance estimator and confidence intervals
Standard error and confidence intervals can be also computed using Bootstrap
method (Efron 1979; Efron and Tibshirani 1993). To apply bootstrap method in finite
sampling, modifications must be made (McCarthy and Snowden 1985). Suppose that a
sample ( ) ( )( )0 1I I I
′′ ′=Z Y Y has been drawn and observed. A bootstrap sample
( ) ( )( )0 1b bbI I I
′′ ′=Z Y Y is obtained by randomly sampling with replacement *n times
from IZ , where ( ) ( )1 1*n n f= − − , f n N= (McCarthy and Snowden 1985). In
matrix format, each bootstrap sampling process may be represented as
( )2b bI I= ⊗Z I Q Z ,
where ( )1 2 n
bm m m=Q u u u ,
imu is any im -th column of identity matrix of nI ,
*1,2, ,i n= … and *1 im n≤ ≤ . The bootstrap process begins by generating a large
number ( )B of independent bootstrap samples 1 2, , , , ,b BI I I IZ Z Z Z… … . For each
bootstrap replication bIZ , 1,2, ,b B= … , the estimator ( )0 bT is calculated using (5.10)
while replacing 01β , ( )0Y and ( )1Y with the estimates from bIZ . The bootstrap estimate
of variance of ( )0T is the variance of the bootstrap replications,
( )( ) ( ) ( )( )20 0 0
1
1垐1
Bb bb
b
T T TB =
= −− ∑V ,
where ( ) ( )0 01
1
Bb b
b
T B T−
=
= ∑ . To construct a ( )100 1 %α− confidence intervals, one may
use ( )( )0垐b TV and normal approximation, such that ( ) ( ) ( )( )0 1 0垐 �bT z Tα−± V .
82
The method by McCarthy and Snowden (1985) has a weakness that
( ) ( )1 1*n n f= − − needs to be approximated if it is not an integer, the impacts of such
approximation on estimation have not investigated. Rao and Wu (1988) developed a
method to the bootstrap estimation of the standard error of a population mean, which
overcomes the non-integer *n issue of McCarthy-Snowden method.
5.6. Remarks
The estimators presented in Section 5.3 and 5.4 are identical to the estimator by
(Cochran 1977) (page 191-192), and GREG estimators derived from model-assisted
approach (Särndal, Swensson and Wretman 1992)(page 273). In addition, the estimators
and their variances derived in Section 5.3 and 5.4 are the same as the asymptotically
design-unbiased estimator and its asymptotic design-variance derived from a model-
calibration approach based on a superpopulation framework (Wu and Sitter 2001).
They have a functional form similar to those of calibration estimators (Deville
and Särndal 1992) and estimators derived from model-based prediction approach
(Valliant, Dorfman and Royall 2000)(page 32); (Bolfarine and Zacks 1992) (page37).
When the variance-covariance of the auxiliary variables ( )•Σ and the covariance
between response and auxiliary variables ( )0•σ are unknown, we apply these results by
substituting their sample estimates (method of moments). The resulting estimators will
be the same as usual calibration estimators (Deville and Särndal 1992).
However, our results do not rely on model assumptions that are necessary for
superpopulation models and model-assisted approaches. In addition, these estimators
and their variance are exact but not approximated.
83
These estimators require the knowledge of both 01σ and 21σ , and their variances
are always smaller than variance of simple expansion estimator. The gain in the
precision of prediction is proportional to the squared (multiple) correlation coefficient
of the response variable and auxiliary variable(s).
5.7. Derivations of results in Section 5.3.
As shown in Section 5.2, the reparameterized model is
( )
( )( )
( )0 0
21 * 0N
μ⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟= ⊗ +⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
YI 1 E
Y.
The parameter of interest is population total of the response variable that is defined as,
( ) ( )( )
( )
00
1 *NT⎛ ⎞⎜ ⎟′ ′= =⎜ ⎟⎝ ⎠
YZ 1 0
Yl ,
where ( )1N N×′′1 0l = . After partitioning, we denote the partitioned vector as Z , and
the sample and remaining parts as IZ and IIZ respectively, such that ( )I II′′ ′=Z Z Z .
Thus, ( )0T can be expressed as ( )0I I II IIT ′ ′= +Z Zl l , where ( )1 0I n′ ′= ⊗1l and
( )1 0II N n−′ ′= ⊗1l .
We define a linear estimator for ( )0T as a linear function of IZ , such that
( ) ( )0ˆI IT ′ ′= + w Zl ,
where w can be defined in two possible ways, i.e., ( )0 1′′ ′=w w w and ( )0
′′=w w 0 ,
0w and 1w are vectors of weights corresponding to the response and auxiliary variable.
84
Scenario 1: ( )0 1′′ ′=w w w
The unbiasedness for ( )0T requires ( ) 0I II IIE ′ ′− =w Z Zl , equivalently,
( )( ) ( ) ( )0 1 *0 1 0n nN n μ μ′ ′− − + =1 w 1 w . Since ( )1 * 0μ = , this constraint is satisfied for any
( )0μ simply by 0 n N n′ = −w 1 . Therefore, the optimization function is
( ) ( ) ( )( )0
0 1 0 1 , 0
1
2 2I I II II n N n λ⎛ ⎞
′ ′ ′ ′ ′Φ = − + − −⎜ ⎟⎜ ⎟⎝ ⎠
ww w V w w V w 1
wl ,
which can be simplified to
( ) ( ) ( )( )
2 20 0 , 0 01 1 , 0 1 1 , 1
20 0 01 1 0
2
2 1 2 1
n N n N n N
n n nf N f
σ σ σ
σ σ λ
′ ′ ′Φ = + +
′ ′ ′+ − + + − −
w P w w P w w P w
w 1 w 1 w 1 (5.26)
Differentiating (5.26) with respect to 0w , 1w and λ yields following estimating
equations,
( )
( )
( )
2 20 , 01 , 0 0
201 , 1 , 1 01
1
1
1
n N n N n n
n N n N n
n
f
f
N f
σ σ σ
σ σ σ
λ
⎛ ⎞ ⎛ ⎞− −⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ = − −⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟′ −⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
P P 1 w 1
P P 0 w 1
1 0 0
,
or
( )
( )
( )
0
,
1
1 11
0 0
1 0 0 1
n N n n
n
f
N fλ
⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⊗ ⊗ − − ⊗⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ =⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟′⊗ −⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠⎝ ⎠ ⎝ ⎠
wΣ P 1 Σ 1
w
1
.
Solving these estimating equations leads to
85
( ) ( ) ( ) ( )
( )
1
1 1 1 1, ,
1
1
22 010 2
1
1 1ˆ 1 1 0 1 0
0 0
11 0
0
n n N n n n N nf N
N nn
N nn
λ
σσσ
−
− − − −
−
−
⎧ ⎫⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞⎪ ⎪⎜ ⎟′ ′⎜ ⎟ ⎜ ⎟= − − ⊗ ⊗ +⎜ ⎟ ⎜ ⎟⎨ ⎬⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎪ ⎪⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎩ ⎭⎝ ⎠
⎛ ⎞⎛ ⎞− ⎜ ⎟= − ⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠⎛ ⎞−
= − −⎜ ⎟⎝ ⎠
Σ 1 P 1 Σ Σ 1 P 1
Σ
0
01 011 2 2
1 1
1 1ˆ 1
ˆ 0n n n
fN Nn nσ σ
σ σ
−⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎜ ⎟ ⎜ ⎟
= − ⊗ + ⊗ = ⊗⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ − −⎝ ⎠ ⎜ ⎟ ⎜ ⎟⎝ ⎠⎝ ⎠ ⎝ ⎠
w1 1 1
w
( ) ( )( ) ( ) ( )( )
( )
( ) ( )
( )
00 01
0 1 2 1 *1
0 1 *0121
010 1 12
1
ˆ II I n n n n
I
n I n I
NTn
Nn
N y y
σσ
σσ
σ μσ
⎛ ⎞⎧ ⎫⎛ ⎞⎪ ⎪⎜ ⎟′ ′ ′ ′ ′ ′ ′= + = − + −⎨ ⎬⎜ ⎟ ⎜ ⎟⎪ ⎪⎝ ⎠⎩ ⎭⎝ ⎠
⎛ ⎞′ ′= −⎜ ⎟
⎝ ⎠
⎛ ⎞= − −⎜ ⎟
⎝ ⎠
Yw w Z 1 0 1 0 1 1
Y
1 Y 1 Y
l
( )( ) ( )( ) ( ) ( )( )
( ){ }( ) ( ){ }
( )
00 1 0 1
22 2
01 , 0 01
01
2 2 201 0
ˆvar cov
11
1 1 ,
I I I
n n N n
T
N N nNn n
fNn
β σ ββ
ρ σ
′ ′ ′ ′ ′= + +
⎧ ⎫⎛ ⎞ −⎪ ⎪⎛ ⎞ ′= − ⊗ ⊗ ⊗ = −⎜ ⎟⎨ ⎬⎜ ⎟ ⎜ ⎟⎝ ⎠ −⎪ ⎪⎝ ⎠⎩ ⎭
−⎛ ⎞= −⎜ ⎟⎝ ⎠
w w Z w w
1 Σ P 1
l l
where 201 01 1β σ σ= and
201
01 2 20 1
σρσ σ
= is the correlation coefficient between the
response and auxiliary variable.
86
Scenario 2: ( )0′′=w w 0
The unbiasedness for 0T requires ( ) 0I II IIE ′ ′− =w Z Zl , equivalently,
( )( ) ( ) ( )0 1 *0 0n nN n μ μ′ ′− − + =1 w 1 0 . This constraint is satisfied for any 0μ simply by
0 n N n′ = −w 1 . Therefore, the optimization function is
( ) ( ) ( )( )0 0 0 , 02 2n
n I n I II II n N n λ⎛ ⎞⎛ ⎞′ ′ ′⎜ ⎟Φ = − + − −⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
Iw I 0 V w w I 0 V w 1
0l ,
which can be simplified to
( ) ( )( )2 20 0 , 0 0 0 02 1 2n N n nf N nσ σ λ′ ′ ′Φ = − − + − −w P w w 1 w 1 (5.27)
Differentiating (5.26) with respect to 0w and λ and setting the derivatives to zero
yields following estimating equations,
( )
( )
2 20 , 0 01
0 1
n N n n
n
f
N f
σ σ
λ
⎛ ⎞ ⎛ ⎞− −⎛ ⎞⎜ ⎟ ⎜ ⎟=⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟′ −⎝ ⎠ ⎝ ⎠⎝ ⎠
P 1 w 1
1,
Solving these estimating equations leads to
0ˆ nN n
n−
=w 1 ,
which is the weight of simple expansion estimator. Therefore, the estimator for ( )0T is
( ) ( )0 00
ˆn I
NT Nyn
′= =1 Y ,
and its variance is
( )( )0 2 20
1ˆvar .fT Nn
σ−⎛ ⎞= ⎜ ⎟⎝ ⎠
87
5.8. Derivations of results in Section 5.4.
This section provides proof of the results in Section 5.4. The derivations are
analogues to those presented in Section 5.7. The population total of the response
variable that is defined as,
( ) ( )( )
( )
00
1 *NT•
⎛ ⎞⎜ ⎟′ ′ ′= = ⊗⎜ ⎟⎝ ⎠
YZ u 1
Yl ,
where 1 N′ ′⊗u 1l = and ( )1 11 p×′=u 0 . After partitioning, we denote the sample and
remaining parts as IZ and IIZ , and ( )0T can be expressed as ( )0I I II IIT ′ ′= +Z Zl l , where
1I n′ ′ ′= ⊗u 1l and 1II N n−′ ′ ′= ⊗u 1l .
The estimator for ( )0T as ( ) ( )0ˆI IT ′ ′= + w Zl , where ( )0 1 p
′′ ′ ′=w w w w ,
kw , 0,1, 2, ,k p= … are vectors of weights corresponding to the response and the p
auxiliary variables.
The unbiasedness for ( )0T requires ( ) 0I II IIE ′ ′− =w Z Zl , equivalently,
( )( ) ( ) ( )0 *0
1
0p
kn n k
k
N n μ μ=
′ ′− − + =∑1 w 1 w . Since ( )* 0kμ = for any 1, 2, ,k p= … , this
constraint is satisfied for any ( )0μ simply by 0 n N n′ = −w 1 .
Moreover, the variance of ( )0T can be represented as,
( )( )0,
ˆvar 2I I II II II II IIT ′ ′ ′= − +w V w w V Vl l l .
The Lagrangian function is defined as
( ) ( ) ( ){ }, 1 02 2I I II N n n N n λ−′ ′ ′Φ = − ⊗ + − −w w V w w V u 1 w 1 , (5.28)
88
where λ is a Lagrangian multiplier. Differentiating (5.28) with respect to w and λ
gives,
( ) ( ) ( )
( ) ( )
, 1 1
0
2 2 2I I II N n n
n N n
λ
λ
−
∂⎧ Φ = − ⊗ + ⊗⎪∂⎪⎨∂⎪ ′Φ = − −⎪ ∂⎩
w V w V u 1 u 1w
w w 1
. (5.29)
Setting both equations of (5.29) to zeros results in the following estimating equations
( )( )
( )1 , 1
1
I n I II N n
n N nλ−
⊗⎛ ⎞ ⊗⎛ ⎞⎛ ⎞=⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟′ ′⊗ −⎝ ⎠ ⎝ ⎠⎝ ⎠
V u 1 w V u 1u 1 0
(5.30)
The solutions of (5.30) are
( ) ( )
( ) ( )( ) ( ) ( ) ( )( )
1 1, 1 1
11 11 1 1 , 1
ˆˆ
ˆI I II N n I n
n I n n I I II N n N n
λ
λ
− −−
−− −−
⎧ = ⊗ − ⊗⎪⎨
′ ′ ′ ′= ⊗ ⊗ ⊗ ⊗ − −⎪⎩
w V V u 1 V u 1
u 1 V u 1 u 1 V V u 1 . (5.31)
Since 1 1 1,I n N
− − −= ⊗V Σ P , ( ),1
I II n N nN × −⎛ ⎞= ⊗ −⎜ ⎟⎝ ⎠
V Σ J , ( )1
, 11
I I II p n N nN n−
+ × −= − ⊗−
V V I J ,
1,n N n n
NN n
− =−
P 1 1 and 1,n n N n
NnN n
−′ =−
1 P 1 , (5.31) can be simplified as
( )( )11 1
ˆˆ n nN
N nλ−= − ⊗ − ⊗
−w u 1 Σ u 1 and ( ) 11
1 1ˆ N n
nλ
−−− ′= − u Σ u .
Therefore,
( )( )11 11 1 1 1ˆ n n
Nn
−− −′= − ⊗ + ⊗w u 1 Σ u u Σ u 1 , (5.32)
Since ( ) ( )
( ) ( )
1 11 2 1 2 220 0 0 0 0 0 0 01 0 0
1 12 2 20 0 0 0 0 0 0 0 0
σ σ σσ
σ σ σ
− −− − − −• • • • • • •− •
− −− − −• • • • • • • • •
⎛ ⎞′ ′ ′− − −′⎛ ⎞ ⎜ ⎟= =⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ′ ′− − −⎝ ⎠
σ Σ σ σ Σ σ σσΣ
σ Σ Σ σ σ σ Σ σ σ,
( ) 11 2 11 1 0 0 0σ
−− −• • •′ ′= −u Σ u σ Σ σ and
( )( )
12 10 0 01
1 12 20 0 0 0 0
σ
σ σ
−−• • •
−−− −
• • • •
⎛ ⎞′−⎜ ⎟= ⎜ ⎟′ ′− −⎜ ⎟⎝ ⎠
σ Σ σΣ u
Σ σ σ σ, we have
89
( )( ) 11 1 2 20 0 0 0 0 0 0
1ˆ
1n nNn σ σ
−−• • • • • • •
⎛ ⎞⎜ ⎟= − ⊗ + ⊗
′ ′− − −⎜ ⎟⎝ ⎠
w u 1 1σ Σ σ Σ σ σ σ
. (5.33)
Denote ( )( ) 11 2 20 0 0 0 0 0 0 01 σ σ
−−• • • • • • • •′ ′= − −β σ Σ σ Σ σ σ σ , it can be shown that
( )( )
( ) ( ){ }( )( )
11 2 20 0 0 0 0 0 0 0
11 2 1 20 0 0 0 0 0 0
11 2 1 2 10 0 0 0 0 0 0
10
1
.
p p
p p
σ σ
σ σ
σ σ
−−• • • • • • • •
−− −• • • • • • • •
−− − −• • • • • • • •
−• •
′ ′= − −
′ ′= − −
′ ′= − −
=
β σ Σ σ Σ σ σ σ
I Σ σ σ Σ I Σ σ σ σ
I Σ σ σ I Σ σ σ Σ σ
Σ σ
Therefore, (5.33) is simply
10
1ˆ n n
Nn •
⎛ ⎞= − ⊗ + ⊗⎜ ⎟−⎝ ⎠
w u 1 1β . (5.34)
Subsequently, the estimator for ( )0T is
( ) ( ) ( ) ( )( )0 00
ˆ ,T NY N • ••′= − −β Y μ (5.35)
which has variance of
( )( ) ( ){ } ( )
( )( )
20
00
2 2 200
1ˆvar 1 cov
11 .
n I nNTn
fNn
ρ σ
••
•
⎧ ⎫⎛ ⎞⎪ ⎪⎛ ⎞ ⎜ ⎟′= − ⊗ ⊗⎨ ⎬⎜ ⎟ ⎜ ⎟−⎝ ⎠ ⎪ ⎪⎝ ⎠⎩ ⎭
−⎛ ⎞⎛ ⎞= − ⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
β 1 Z 1β
where ( )1 2
0 0 00ρ σ−• • •• ′= σ Σ σ is the multiple correlation coefficient of ( )0Y on ( )*•Y .
90
CHAPTER 6
ESTIMATORS BASED ON RANDOM PERMUTATION MODELS
UNDER STRATIFIED SIMPLE RANDOM SAMPLING
6.1. Motivation
In this Chapter, we extend the results based on RPSUR for SRSWOR to settings
of stratified simple random sampling without replacement (STSRS). We consider two
scenarios where 1) no auxiliary information is available, and 2) one auxiliary variable is
known to some extent. For each scenario, we first derive general results when sampling
fractions of strata may or may not be equal, and then consider a special case when
sampling fractions are proportional to stratum sizes.
6.2. Definition and notations
Strata are H nonoverlapping subpopulations within the population of size N ,
each subpopulation, referred to as a stratum, has a known size of 1 2, , , HN N N… ,
respectively, and 1
H
hh
N N=
=∑ .
The stratified simple random sampling without replacement (hereafter, STSRS)
refers to the procedure that a sample of fixed sizes is drawn from each stratum
independently using SRSWOR scheme, and the H samples across strata constitute a
STSRS sample. The sample sizes within strata are denoted by 1 2, , , Hn n n… ,
respectively.
91
Following conventional notation for stratified sampling (Cochran 1977;
Bolfarine and Zacks 1992; Särndal, Swensson and Wretman 1992; Thompson 1997),
we adopt the following notations in this Chapter.
Suffix h , 1,2, ,h H= … , denotes the stratum. The following notations all refer
to stratum h .
hN total number of subjects
hn number of subjects in sample
( )khsy value of variable k of subject s
h ha N N= , stratum weight
h h hf n N= sampling fraction in the stratum
( ) ( )11
hNk kh h hss
N yμ −=
= ∑ stratum mean of the k -th variable
( ) ( )11
hnk kh h hii
Y n Y−=
= ∑ stratum sample mean of the k -th variable
( ) ( ) ( )( )0 1 ph h h hμ μ μ ′=μ vector of stratum means for all 1p + variables
hΣ stratum variance-covariance matrix
In addition, the vector of stratum means of the k -th variable is denoted as
( ) ( ) ( ) ( )( )1 2k k k k
Hμ μ μ ′=μ .
For matrix operations, we represent a block diagonal matrix as a direct sum,
1
H
hh=⊕ B , of matrices hB , 1,2, ,h H= … . A matrix consists of a column or row of matrices
92
or vectors is represented by { } ( )1 21
Hc h Hh=
′′ ′ ′=B B B B and
{ } ( )1 21
Hr h Hh=
=B B B B , respectively (Searle, Cassella and McCulloch 1992).
6.3. Independent permutation of one response variable across H strata
Applying results of Chapter 2, the random variables arising from permuting the
response variable y in the h -th stratum can be represented as
h h h=Y U y ,
where 1,2, ,h H= … , hU is of dimension h hN N× . The independent permutations of
H strata can be represented as
1 1 1
2 2 2
H H H
⎛ ⎞ ⎛ ⎞⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟=⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
Y U 0 0 y
Y 0 U 0 y
0
Y 0 0 U y
. (6.1)
Denote { } 1
Hhc h=
=Y Y and { } 1
Hhc h=
=y y , (6.1) can be simply represented as
1
H
hh=
⎛ ⎞= ⊕⎜ ⎟⎝ ⎠
Y U y . (6.2)
It immediately follows that,
( )1 1 h
H H
h Nh hE E
= =
⎧ ⎫⎛ ⎞ ⎛ ⎞= ⊕ = ⊕⎨ ⎬⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠⎩ ⎭
Y U y 1 μ ,
where ( )1 2 Hμ μ μ ′=μ .
Since Y representing the joint permutations of the H strata can be rewritten as
( ) ( ){ } 11 h
H Hh N c h hh
vec==
⎛ ⎞′= ⊕ ⊗⎜ ⎟⎝ ⎠
Y y I U , (6.3)
93
the variance-covariance matrix of Y is thus,
( ) ( ) ( ){ }( ) ( )11 1cov cov
h h
H HHh N c h h Nhh h
vec== =
′⎛ ⎞ ⎛ ⎞′= ⊕ ⊗ ⊕ ⊗⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
Y y I U y I .
Since the permutations of the H strata are independent, ( ) ( )( )cov , 0h hvec vec ′ =U U
when h h′≠ . Consequently, we have
( ){ }( ) ( )( ){ }1 1cov cov
HHc h hh hvec vec
= == ⊕U U
and thus
( ) ( )2
1cov
h
H
h Nhσ
== ⊕Y P ,
where 2 11 hh N
hNσ ′=
−y P y is the population variance of y in h -th stratum.
6.4. Simultaneous permutations of one response variable and p auxiliary
variables across H strata
The ( )1 hp N+ × random variables arising from permuting variables
( ) ( ) ( )0 1, , , py y y… in the h -th stratum can be represented as
( ){ } ( ) ( ){ }10 0
p pk kh p h hc ck k+= =
= ⊗Y I U y , (6.4)
where 1,2, ,h H= … , hU is of dimension h hN N× . We denote ( ){ }0
pkh hc k ==Y Y and
( ){ }0
pkh hc k ==y y and rewrite identity (6.4) as
( )1h p h h+= ⊗Y I U y . (6.5)
By applying (6.5) to (6.2) , the joint permutations of 1p + variables in all H strata can
be represented as
94
( )
1 1
2 2
11
H
p hh
H H
+=
⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎛ ⎞⎜ ⎟ ⎜ ⎟= ⊕ ⊗⎜ ⎟
⎝ ⎠⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
Y y
Y yI U
Y y
. (6.6)
Denote { } 1
Hhc h=
=Y Y and { } 1
Hhc h=
=y y , (6.6) can be simplified as
( )11
H
p hh +=
⎛ ⎞= ⊕ ⊗⎜ ⎟⎝ ⎠
Y I U y . (6.7)
It immediately follows that,
( ) ( ) ( )1 11 1 h
H H
p h p Nh hE E + += =
⎧ ⎫⎛ ⎞ ⎛ ⎞= ⊕ ⊗ = ⊕ ⊗⎨ ⎬⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠⎩ ⎭
Y I U y I 1 μ ,
where { } 1
Hhc h=
=μ μ , ( ) ( ) ( )( )0 1 ph h h hμ μ μ ′=μ , 1,2, ,h H= … .
Similar to (6.3), re-express the permutations of the response and auxiliary
variable in the h -th stratum as
( ){ } ( ){ } ( ) ( )110 0 h
ppk kh h h p hp Nc rk k
vec ++= =
⎛ ⎞′= = ⊗ ⊗⎜ ⎟⎝ ⎠
Y Y y I I U .
In Chapter 2, we have shown that
( )covhh h N= ⊗Y Σ P ,
where hΣ is the population variance-covariance matrix of the h -th stratum. Therefore,
the joint variance-covariance matrix for all H strata is
( ) ( ) ( )1 1
cov covh
H H
h h Nh j= == ⊕ = ⊕ ⊗Y Y Σ P .
95
6.5. Stratified random sampling and partition of random matrices
A sample of size hn , h hn N≤ , is drawn by SRSWOR from the h -th stratum,
where 1,2, ,h H= … , can be viewed as the first hn subjects in a random permutation
within the h -th stratum. The elements of random vector hY arising from the
permutation can be partitioned into a sample (indexed as I ) and the remaining (indexed
as II ) parts by pre-multiplication by a permutation matrix hK , i.e.,
hI
h h hhII
⎛ ⎞⎜ ⎟= =⎜ ⎟⎝ ⎠
ZZ K Y
Z,
where hK is defined similarly as (2.9) of Section 2.5, ( ) ( ) ( )( )0 1 phI hI hI hI
′′ ′ ′=Z Y Y Y
and ( ) ( ) ( )( )0 1 phII hII hII hII
′′ ′ ′=Z Y Y Y . The expected values of hZ are,
( )1
1
h
h h
hI p n
h hhII p N n
E E+
+ −
⊗⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟= =⎜ ⎟ ⎜ ⎟⊗⎝ ⎠ ⎝ ⎠
Z I 1Z μ
Z I 1.
Further, the partitioned variance-covariance matrix of the h -th stratum is
( )
( )
,
,
covhI hI h I II
hhII hIIh II I
⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟= =
⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
Z V VV
Z V V,
where ,h hhI h n N= ⊗V Σ P , ( ) ( ) ( ), ,1
h h hhh I II h II I n N nhN × −
⎛ ⎞′= = ⊗ −⎜ ⎟
⎝ ⎠V V Σ J ,
( ),h h hhII h N n N−= ⊗V Σ P .
The partitioning of all H strata can be represented jointly as
*
1
I I H
hhII II
=
⎛ ⎞ ⎛ ⎞⎛ ⎞⎜ ⎟ ⎜ ⎟= = ⊕⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎝ ⎠
⎝ ⎠ ⎝ ⎠
Z RZ K Y
Z R, (6.8)
96
where { }{ }
1
1
H
hIc hHI
hIc h
=
=
⎛ ⎞⎜ ⎟= ⎜ ⎟⎜ ⎟⎝ ⎠
YZ
Y,
{ }{ }
1
1
H
hIIc hHII
hIIc h
=
=
⎛ ⎞⎜ ⎟= ⎜ ⎟⎜ ⎟⎝ ⎠
YZ
Y, ( )( )1 11 h h h h
H
I p n p n N nh + + × −== ⊕ ⊗ ⊗R I I I 0
and ( )( )1 11 h hh h h
H
II p p N nN n nh + + −− ×== ⊕ ⊗ ⊗R I 0 I I . After algebraic simplification, we have
( ) ( )( )
( )11*
1
11
h
h h
H
I p nH hh Hh
IIp N nh
E E+=
=
+ −=
⎛ ⎞⊕ ⊗⎛ ⎞ ⎜ ⎟⎛ ⎞⎜ ⎟= ⊕ = ⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠ ⎜ ⎟⊕ ⊗⎜ ⎟⎝ ⎠ ⎝ ⎠
R I 1Z K Y μ
R I 1,
( ) ( ),
*
1 1,
cov covI I IIH H
h hh hII I II
= =
⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎜ ⎟′ ′= ⊕ ⊕ =⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
⎝ ⎠
V VZ R K Y K R
V V
where 1
H
I hIh== ⊕V V , ( ), , ,1
H
I II II I h I IIh=′= = ⊕V V V and
1
H
II hIIh== ⊕V V .
6.6. Parameter of interest
The population mean of the response variable can be define as a weighted
average of stratum means, for example,
( ) ( ) ( ) ( )( )( )( )
01 1 2 1 1
11
p p H p
p
a a aμ × × ×
×
=
′= ⊗
0 0 0 μ
a 0 μ,
where ( )1 2 Ha a a ′=a , h ha N N= is stratum weight of the h -th stratum,
1,2, ,h H= . The vector of population means of the 1p + variables can be written as
( ) ( ){ }1 0
p
kc k+ =′ ′= ⊗μ a u μi , where 1k+′u is the ( )1k + -th row of identity matrix 1p+I .
6.7. Formulation of the model when auxiliary variables are present
The general approach to analyze stratified sampling data with RPSUR models is
to treat the sampling process in each stratum as an independent permutation process.
97
When one response variable ( )0y and p auxiliary variables ( ) ( ) ( )1 2, , , py y y… are present,
a RPSUR model similar to that for the simple random sampling can be defined as,
{ } ( ) { }11 11 h
HH Hh p N hc ch hh += ==
⎛ ⎞= ⊕ ⊗ +⎜ ⎟⎝ ⎠
Y I 1 μ E , (6.9)
where ( ){ }1
pkh hc k ==Y Y is defined in (6.4). Model (6.9) can be written alternatively as
= +Y Xμ E , (6.10)
where { } 1
Hhc h=
=Y Y , ( )11 h
H
p Nh +== ⊕ ⊗X I 1 and { } 1
Hhc h=
=μ μ .
Suppose that a set of H samples of fixed size hn , 1,2, ,h H= … , are drawn with
SRSWOR scheme from the H strata. Using a similar derivation as in Section 4.3, we
can partition model (6.10) into sample and remaining proportions, and write the
partitioned model as
I I
II II
⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟
⎝ ⎠ ⎝ ⎠
Z Xμ EZ X , (6.11)
where { } 1
HI hIc h==Z Z and { } 1
HII hIIc h==Z Z as defined in (6.8), ( )11 h
H
I p nh +== ⊕ ⊗X I 1 and
( )11 h h
H
II p N nh + −== ⊕ ⊗X I 1 , and E is the vector of residuals. It is evident that
( )I IE =Z X μ and ( )II IIE =Z X μ .
6.8. Estimating ( )0T when one auxiliary variable is present
Without losing generality, we first consider a simple case when there are only
three strata ( )3H = and one auxiliary variable ( )1p = is known. We will consider two
possible scenarios of interest, i.e., 1) the stratum means of the auxiliary variable for all
98
H strata are known, 2) some but not all of the auxiliary variable stratum means are
known.
Based on Model (6.10), the population total ( )0T of the response variable can be
represented using the following identity,
( ) ( )01 2 3T ′ ′ ′ ′= =Y Yl l l l , (6.12)
where ( )1 2 3′′ ′ ′=l l l l , ( )1 0
hh N′= ⊗1l , 1,2,3h = . After sampling and partitioning
Z into IZ and IIZ , ( )0T can be written as
( )0I I II IIT ′ ′= +Z Zl l , (6.13)
where ( )1 2 3I I I I′′ ′ ′=l l l l , ( )( )1 0
hhI n′′= ⊗1l , and ( )1 2 3II II II II
′′ ′ ′=l l l l ,
( )( )1 0h hhII N n−
′′= ⊗1l .
The primary interest is to derive an estimator for ( )0T that is a linear function of
the sample, unbiased for ( )0T and has minimum mean squared error.
6.8.1. Linear estimator for ( )0T
We define the linear estimator as,
( ) ( )0ˆI IT ′ ′= + w Zl , (6.14)
where ( )
( )
0
1
1
H
h
hc h=
⎧ ⎫⎛ ⎞⎪ ⎪⎜ ⎟= ⎨ ⎬⎜ ⎟⎪ ⎪⎝ ⎠⎩ ⎭
ww
w, ( )0
hw and ( )1hw have dimension of 1hn × .
6.8.2. Unbiasedness constraints
The unbiasedness for ( )0T requires that ( ) 0I II IIE ′ ′− =w Z Zl for any possible
( )0T . This constraint is equivalent to,
99
( ) ( )( ) ( )( ) ( ) ( )
3 30 0 1 1
1 1
0h hh n h h h h n h
h h
N n μ μ= =
′ ′− − + =∑ ∑w 1 w 1 .
To satisfy this condition for all ( )0hμ and ( )1
hμ , it is required that both terms of the left-
hand side equal 0, i.e.,
( )0 :q ( ) ( ){ } ( )3 3 0
111 0 0
hn h hr hhN n
==
⎛ ⎞⎛ ⎞′′ ⊕ ⊗ − − =⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠w 1 μ ,
( )1 :q ( ) ( )31
10 1 0
hnh=
⎛ ⎞′′ ⊕ ⊗ =⎜ ⎟⎝ ⎠
w 1 μ ,
where ( ) ( ) ( ) ( )( )1 2 3k k k kμ μ μ ′=μ , 0,1k = .
To satisfy ( )0q for any ( )0μ , it is required that,
( ) ( ){ }3 3
111 0
hn h hc hhN n
==
⎛ ⎞′⊕ ⊗ − − =⎜ ⎟⎝ ⎠
1 w 0 . (6.15)
Constraint ( )1q can be satisfied by either forcing ( )1hw to be null vectors or
through proper transformation such that ( )1 0hμ = , where 1,2,3h = . Transformation of
random variables have been discussed in Chapter 5 in great detail, we apply those
results here.
Case 1: If all stratum means of auxiliary variable are known, ( )1q will be
satisfied by applying following transformation,
( )
( )
( )
( ) ( )
33 3
0 0
11 * 1
111
h
h
hh h
Nh h
Nh NN N h h cc hh
hc hN
μ==
=
⎧ ⎫⎛ ⎞ ⎧ ⎫ ⎧ ⎫⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎪ ⎪⎜ ⎟⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎜ ⎟ ⎜ ⎟ ⎜ ⎟= −⎨ ⎬ ⎨ ⎬ ⎨ ⎬⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟′+⎪ ⎪ ⎪ ⎪ ⎪ ⎪⎜ ⎟ ⎝ ⎠⎝ ⎠ ⎝ ⎠ ⎩ ⎭⎩ ⎭⎪ ⎪⎝ ⎠⎩ ⎭
I0Y Y
11P 1 Y Y
. (6.16)
100
Case 2: If some but not all stratum-specific auxiliary means are known while the
population auxiliary mean is unknown, such incomplete auxiliary information can also
be used to improve estimation. To illustrate this we consider a case where two stratum
auxiliary means ( )11μ and ( )1
2μ are known. We apply the following transformation to
incorporating the two known stratum auxiliary means, ( )
( )
( )
( ) ( )
( ) ( )
( )
1
2
* 1 1 11 1 1
* 1 1 12 2 2
* 1 13 3
N
N
μ
μ
⎛ ⎞ ⎛ ⎞−⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟= −⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
Y Y 1
Y Y 1
Y Y
. (6.17)
Therefore, constraint ( )1q is reduced to
( )( ) ( )3
1 13 3 0n μ′ =1 w ,
to hold this condition for any ( )13μ , it is required that ( )
3
13n′ =1 w 0 .
6.8.3. Estimators for ( )0T after reparameterization
Following the strategy outlined in Chapter 5, we transform the auxiliary variable
( )1Y to ( )* 1Y using either (6.16) or (6.17) whenever appropriate. As an analogue to
(5.2) of Chapter 5, it can be shown that ( )( ) ( )( )* 1 1cov cov=Y Y .
Based on Model (6.11) , the partitioned model can be written as
**
*II
IIII
⎛ ⎞ ⎛ ⎞= +⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎝ ⎠⎝ ⎠
XZμ EXZ
, (6.18)
where ( )
( )
30
*1 *
1
hII
hIc h=
⎧ ⎫⎛ ⎞⎪ ⎪= ⎜ ⎟⎨ ⎬⎜ ⎟⎪ ⎪⎝ ⎠⎩ ⎭
YZ
Y ,
( )
( )
30
*1 *
1
hIIII
hIIc h=
⎧ ⎫⎛ ⎞⎪ ⎪= ⎜ ⎟⎨ ⎬⎜ ⎟⎪ ⎪⎝ ⎠⎩ ⎭
YZ
Y, ( )
3
11 hI p nh +== ⊕ ⊗X I 1 ,
( )3
11 h hII p N nh + −== ⊕ ⊗X I 1 , and ( )( ) ( )( ) ( )( )( )0 0 0*
1 2 30 0 0μ μ μ′
=μ .
The variance of ( )0T is thus
101
( ) ( ),
0,
ˆvarI I II
III II II II
T⎛ ⎞⎛ ⎞⎜ ⎟′ ′= − ⎜ ⎟⎜ ⎟′⎜ ⎟ −⎝ ⎠⎝ ⎠
V V ww
V Vl
l.
6.8.4. Case 1: all stratum means of the auxiliary variable are known
When all stratum means of the auxiliary variable are known, we apply
transformation (6.16), and define the optimization function as
( ) ( ){ }{ }3*, 1
2 2I I II II I h hc hN n
=′′ ′ ′Φ = − + − −w w V w w V λ X wl ,
where ( ) ( )( )3 3
*
1 11 0 1 0
hI I nh h= =
⎛ ⎞ ⎛ ⎞′ ′ ′= ⊕ = ⊕ ⊗⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
X X 1 . Differentiating ( )Φ w with respect
to w and λ , and setting the derivatives to zero leads to the following solution,
3
1
1ˆ
h
hI n
hhc h
Nnβ
=
⎧ ⎫⎛ ⎞ ⎛ ⎞⎪ ⎪= − + ⊗⎜ ⎟⎨ ⎬⎜ ⎟⎜ ⎟− ⎝ ⎠⎪ ⎪⎝ ⎠⎩ ⎭w 1l , (6.19)
where 01,21,
hh
h
σβ
σ= is the linear regression coefficient of ( )0y on ( ) ( ) ( )1 * 1 1
Nμ= −y y 1 in
the h -th stratum of the finite population (Cochran 1977).
The estimator for ( )0T can be written as
( ) ( ) ( ) ( )( )3
3 30 0 1 1
1 11
1ˆ
h
hn I h h h h h
h hhhc h
NT N y N yn
β μβ = =
=
⎧ ⎫⎛ ⎞ ⎛ ⎞⎪ ⎪⎜ ⎟= ⊗ = − −⎨ ⎬⎜ ⎟⎜ ⎟− ⎝ ⎠⎪ ⎪⎝ ⎠⎩ ⎭∑ ∑1 Z ,
where ( )0hy and ( )1
hy are sample means of the response and auxiliary of the h -th stratum.
The variances of ( )0T is
( )( ) ( ) ( )3
0 2 2 20,
1
1ˆvar 1 ,hh h h
h h
fT N
nρ σ
=
−= −∑
102
where 01,
0, 1,
hh
h h
σρ
σ σ= is the correlation coefficient between the response variable and
auxiliary variable in the h -th stratum.
6.8.5. Case 2: some but not all stratum means of the auxiliary variable are known
When some but not all stratum means of the auxiliary variable are known, we
apply transformation (6.17) and define the optimization function as
( ) ( ){ }3
* 1,2 2 ,
0
h hc hI I II II I
N n=
⎧ ⎫⎛ ⎞−⎪ ⎪′ ⎜ ⎟′ ′ ′Φ = − + −⎨ ⎬⎜ ⎟⎪ ⎪⎝ ⎠⎩ ⎭w w V w w V λ X wl
where ( )
( )
( )( )( )( )( )31 2 3
33
1* 1
1 21 5
1 01 0
1
hnhhI I
nn n n
==
× + +×
⎛ ⎞⎛ ⎞ ′⊕ ⊗⊕ ⎜ ⎟⎜ ⎟′ ′= = ⎜ ⎟⎜ ⎟ ′⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
1X X
0 10. Differentiating ( )Φ w with
respect to w and λ and setting the derivatives to zero results in the following
estimating equations,
( ){ },
*3
* 1
ˆ
ˆ0
I II II
I I
h hc hI
N n=
⎛ ⎞⎜ ⎟⎛ ⎞⎛ ⎞⎜ ⎟⎛ ⎞⎜ ⎟⎜ ⎟ −= ⎜ ⎟⎜ ⎟′⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠ ⎜ ⎟⎝ ⎠ ⎜ ⎟⎝ ⎠⎝ ⎠
VwV X
λX 0
l
. (6.20)
Solving (6.20) leads to
( ) ( ){ }
1 1 *,
31
* 1 * * 1 1,
ˆˆ
ˆ .0
I I II II I I
h hc hI I I I I I II II
N n
− −
−− − =
⎧ = −⎪⎪ ⎛ ⎞⎛ ⎞⎨ −⎜ ⎟′ ′ ⎜ ⎟= −⎪
⎜ ⎟⎜ ⎟⎪ ⎝ ⎠⎝ ⎠⎩
w V V V X λ
λ X V X X V V
l
l
After algebraic simplification, we have
103
3
2
3
1
1 3
3
1
1ˆ
0 1
0
h
h
hn
h hc h
n
c hn
Nn
Nn
β=
=
⎛ ⎞⎧ ⎫⎛ ⎞⎪ ⎪⎜ ⎟⊗⎜ ⎟⎨ ⎬⎜ ⎟⎜ ⎟−⎧ ⎫ ⎪ ⎪⎝ ⎠⎛ ⎞⎛ ⎞ ⎩ ⎭⎜ ⎟⎪ ⎪⎜ ⎟= − ⊗ +⎜ ⎟⎨ ⎬ ⎜ ⎟⎜ ⎟⎜ ⎟⎪ ⎪ ⎛ ⎞⎝ ⎠ ⎜ ⎟⎝ ⎠⎩ ⎭⊗⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
1
w 1
1
, (6.21)
where 201, 1,h h hβ σ σ= , 1,2.h = Consequently, the estimator for 0T and its variance are
( ) ( ) ( )( ) ( )
( ) ( ) ( )( )( ) ( )
1
20 1 * 0 ** 3
0 31 3
20 1 1 03
31 3
ˆ ˆ 1h h
hI I n hI h n hI n I
h h
hh h h h
h h
N NTn n
N Ny y yn n
β
β μ
=
=
′′ ′ ′ ′= + = − +
= − − +
∑
∑
w Z Y 1 Y 1 Yl
( ) ( ) ( )( ) ( )2
* * 2 2 2 2 230 0, 3 0,3
1 3
1 1ˆ 垐var cov 1hI I I h h h
h h
f fT Z N Nn n
ρ σ σ=
− −′′= + + = − +∑w wl l .
6.9. Derivations of results in Section 6.8.4.
Case I: All stratum means of the auxiliary variable are known
As shown in Section 6.7, under the unbiasedness constraint of 0T for 0T and
after transformation (6.15), the optimization function is defined as
( ) ( ){ }{ }3*, 1
2 2 0I I II II I h hc hN n
=′′ ′ ′Φ = − + − − =w w V w w V λ X wl ,
where ( ) ( ) ( ) ( ) ( ) ( )( )0 1 0 1 0 11 1 2 2 3 3
′′ ′ ′ ′ ′ ′=w w w w w w w , ( )khw has dimension of 1hn × ;
( ) ( )( )3 3
*
1 11 0 1 0
hI I nh h= =
⎛ ⎞ ⎛ ⎞′ ′ ′= ⊕ = ⊕ ⊗⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
X X 1 . Differentiating ( )Φ w with respect to w
and λ leads to the following estimation equations
( ){ }
*,
3*1
I II III I
h hI c hN n
=
⎛ ⎞ ⎛ ⎞⎛ ⎞⎜ ⎟ ⎜ ⎟=⎜ ⎟⎜ ⎟′⎜ ⎟ ⎜ ⎟−⎝ ⎠ ⎝ ⎠⎝ ⎠
VV X w
X 0 λ
l
104
which has a unique solution,
( ) ( ){ }( )1 3* 1 * * 1, 1
1 1 *,
ˆ
ˆˆ
I I I I I I II II h hc h
I I II II I I
N n−
− −
=
− −
′ ′= − −
= −
λ X V X X V V
w V V V X λ
l
l
Since 3
1 1
1I hIh
− −
== ⊕V V , ( ), , ,1
H
I II II I h I IIh=′= = ⊕V V V and ( )
31 1
, ,1I I II hI h I IIh
− −
== ⊕V V V V , we have
( )( ) ( )( )( )( ) ( ) ( ){ }
( )( ) ( )( )( )( ) ( )( )
13 3 31
1 1 1
3 3 31, 11 1
131
1
31
,1
ˆ 1 0 1 0
1 0
1 0 1 0
1 0
h h
h
h h
h
n hI nh h h
n hI II h hh I II c hh h
n hI nh
n hI II hh I II ch
N n
N
−−
= = =
−
== =
−−
=
−
=
⎛ ⎞⎛ ⎞⎛ ⎞⎛ ⎞′′= ⊕ ⊗ ⊕ ⊕ ⊗⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠⎝ ⎠⎝ ⎠⎛ ⎞⎛ ⎞⎛ ⎞′× ⊕ ⊗ ⊕ − −⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠⎝ ⎠
⎛ ⎞⎛ ⎞′′= ⊕ ⊗ ⊗⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
⎛ ⎞′× ⊕ ⊗ −⎜ ⎟⎝ ⎠
λ 1 V 1
1 V V
1 V 1
1 V V
l
l ( ){ }3
1h hn
=
⎛ ⎞−⎜ ⎟⎝ ⎠
,
( ) ( )( )3 31 1
,1 1ˆˆ 1 0
hhI II hI nh I IIh h
− −
= =
⎛ ⎞ ⎛ ⎞′= ⊕ − ⊕ ⊗⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
w V V V 1 λl
Since ,h hhI h n N= ⊗V Σ P , 1 1 1,h hhI h n N
− − −= ⊗V Σ P , ( ) ( ) ( ), ,1
h h hhh I II h II I n N nhN × −
⎛ ⎞′= = ⊗ −⎜ ⎟
⎝ ⎠V V Σ J ,
( ) ( )1
2,1
h h hhI h I II n N nh hN n
−× −
⎛ ⎞= ⊗ −⎜ ⎟−⎝ ⎠
V V I J , and
3
1
1
0 h hII N n
c h
−
=
⎧ ⎫⎛ ⎞⎪ ⎪= ⊗⎜ ⎟⎨ ⎬⎜ ⎟⎪ ⎪⎝ ⎠⎩ ⎭1l ,
( )( )( ) ( )( )( ) ( ){ }
131 1
,1
3 3
11
ˆ 1 0 1 0
1 0
h h h h
h h
n h n N nh
hN n II h hc hh
h h
n N nN n
−− −
=
− ==
⎛ ⎞⎛ ⎞′′= ⊕ ⊗ ⊗ ⊗⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠⎛ ⎞⎛ ⎞⎛ ⎞⎛ ⎞
′⎜ ⎟× ⊕ ⊗ − − −⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟−⎝ ⎠⎝ ⎠⎝ ⎠⎝ ⎠
λ 1 Σ P 1
1 l
105
( ) ( )
( ) ( ){ }
( )
13 11 1
,1
33 3
111
31
1
1ˆ 1 0
0
11 0
0
11 0
0
h h h h
h h h h
h n n N nh
hN n N n h hc hh
h h c h
h hhh
h h
n N nN n
N nN n
−
−− −
=
− − ==
=
−
=
⎛ ⎞⎛ ⎞⎛ ⎞⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟ ′= ⊕ ⊗⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠⎝ ⎠⎝ ⎠⎛ ⎞⎧ ⎫⎛ ⎞⎛ ⎞⎛ ⎞⎛ ⎞ ⎪ ⎪⎜ ⎟′× ⊕ ⊗ − ⊗ − −⎜ ⎟⎜ ⎟⎜ ⎟ ⎨ ⎬⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟−⎜ ⎟⎝ ⎠ ⎪ ⎪⎝ ⎠⎝ ⎠ ⎝ ⎠⎩ ⎭⎝ ⎠
⎛ ⎞−= ⊕ ⎜ ⎟
⎝ ⎠
λ Σ 1 P 1
1 1
Σ ( ){ } ( ){ }( )
( )
13 3
1 1
31
1
13
201,2
0, 2 20, 1,
1
11 0
0
1
h h hc ch h
h hh
hc h
hh hh
h h hc h
n N n
N nn
N nn
σσ
σ σ
−
= =
−
−
=
=
⎛ ⎞⎛ ⎞⎛ ⎞⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟ − − −⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠⎝ ⎠⎝ ⎠
⎧ ⎫⎛ ⎞⎛ ⎞⎛ ⎞⎛ ⎞−⎪ ⎪⎜ ⎟⎜ ⎟= − ⎜ ⎟⎨ ⎬⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎪ ⎪⎝ ⎠⎝ ⎠⎝ ⎠⎩ ⎭
⎧ ⎫⎛ ⎞⎛ ⎞−⎪ ⎪= − −⎜ ⎟⎜ ⎟⎨ ⎬⎜ ⎟⎜ ⎟⎪ ⎪⎝ ⎠⎝ ⎠⎩ ⎭
Σ
( )
( )
3 31
21 1
3
1
31
1
11 ˆˆ0
1 1
0
1
0
hh h h
h hh h h
hII h nn N nh h
h h h h
N nn N nh hc h
hhh
h h
NN n N n
N n
NN n
−× −= =
−× −
=
−
=
⎛ ⎞⎛ ⎞⎛ ⎞⎛ ⎞⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟= ⊕ ⊗ − − ⊕ ⊗⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟− −⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠⎝ ⎠
⎧ ⎫⎛ ⎞ ⎛ ⎞⎪ ⎪= ⊗ −⎜ ⎟⎨ ⎬⎜ ⎟⎜ ⎟ −⎝ ⎠⎪ ⎪⎝ ⎠⎩ ⎭
⎛ ⎞⎛ ⎞− ⊕ ⎜⎜ ⎟ ⎜−⎝ ⎠ ⎝
w I J Σ 1 λ
J 1
Σ
l
( )
( ) ( )
31
1
1
3 1
1 1
1
11 0
0
1 1 11 0
0 0 0
h
h h
h hn h
hc h
hn h n h
hc h c
N nn
Nn
−
−
=
−
− −
=
⎧ ⎫⎛ ⎞⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞⎛ ⎞−⎪ ⎪⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⊗ −⎟ ⎜ ⎟⎨ ⎬⎜ ⎟⎜ ⎟⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠⎪ ⎪⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠ ⎝ ⎠⎩ ⎭
⎛ ⎞⎧ ⎫ ⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎛ ⎞⎪ ⎪ ⎜ ⎟⎜ ⎟ ⎜ ⎟= ⊗ − − − ⊗⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎨ ⎬ ⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠⎪ ⎪⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎩ ⎭ ⎝ ⎠
1 Σ
1 Σ 1 Σ
( )
3
1
31
1 1
1
1 11 0
0 0h
h
hI h n h
hc h
Nn
=
−
− −
=
⎧ ⎫⎛ ⎞⎪ ⎪⎜ ⎟⎨ ⎬⎜ ⎟⎜ ⎟⎪ ⎪⎝ ⎠⎩ ⎭
⎧ ⎫⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞⎪ ⎪⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟= − + − ⊗⎜ ⎟ ⎜ ⎟⎨ ⎬⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎪ ⎪⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠⎩ ⎭
Σ 1 Σl
106
( )
( )
31
1 1
1
31
1 1
1
1 1ˆ 1 0
0 0
1 11 0
0 0
h
h
hI h n h
hc h
hI h h n
hc h
Nn
Nn
−
− −
=
−
− −
=
⎧ ⎫⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞⎪ ⎪⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟= − + − ⊗⎜ ⎟ ⎜ ⎟⎨ ⎬⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎪ ⎪⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠⎩ ⎭
⎧ ⎫⎡ ⎤⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎪ ⎪⎢ ⎥⎜ ⎟⎜ ⎟= − + ⊗⎜ ⎟ ⎜ ⎟⎨ ⎬⎜ ⎟⎢ ⎥⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎝ ⎠⎪ ⎪⎝ ⎠ ⎝ ⎠⎝ ⎠⎢ ⎥⎝ ⎠⎣ ⎦⎩ ⎭
= −
w Σ 1 Σ
Σ Σ 1
l
l
l
3
01,21, 1
3
1
1
1,
h
h
hI nh
hhc h
hI n
hhc h
Nn
Nn
σσ
β
=
=
⎧ ⎫⎛ ⎞⎪ ⎪⎜ ⎟ ⎛ ⎞⎪ ⎪+ ⊗⎜ ⎟⎨ ⎬⎜ ⎟
− ⎝ ⎠⎜ ⎟⎪ ⎪⎜ ⎟⎪ ⎪⎝ ⎠⎩ ⎭
⎧ ⎫⎛ ⎞ ⎛ ⎞⎪ ⎪⎜ ⎟= − + ⊗⎨ ⎬⎜ ⎟⎜ ⎟− ⎝ ⎠⎪ ⎪⎝ ⎠⎩ ⎭
1
1l
where 01,21,
hh
h
σβ
σ= .
Therefore, the estimator for ( )0T can be written as
( ) ( )
( ) ( )
( ) ( )( )
3
0
1
3 30 * 1
1 1
3 30 1
1 1
ˆ 1
.
h
h h
hh n I
hr h
h hn hI h n hI
h hh h
h h h h hh h
NTn
N Nn n
N y N y
β
β
β μ
=
= =
= =
⎧ ⎫⎛ ⎞⎪ ⎪′= − ⊗⎨ ⎬⎜ ⎟⎪ ⎪⎝ ⎠⎩ ⎭
′ ′= −
= − −
∑ ∑
∑ ∑
1 Z
1 Y 1 Y
The variance of ( )0T is
( )( ) ( ) ( ) ( )
( ) ( )
( ) ( )
33
0
1 1
333
11 1
,
ˆvar 1 cov 1
1 1
1
h h
h h
h h h
h hh n I h n
h hr ch h
h hh n hI h nh
h hr ch h
hh n h n N
h
N NTn n
N Nn n
Nn
β β
β β
β
= =
== =
⎧ ⎫⎧ ⎫⎛ ⎞ ⎛ ⎞⎪ ⎪ ⎪ ⎪′′= − ⊗ − ⊗⎨ ⎬ ⎨ ⎬⎜ ⎟ ⎜ ⎟⎪ ⎪ ⎪ ⎪⎝ ⎠ ⎝ ⎠⎩ ⎭ ⎩ ⎭
⎧ ⎫⎧ ⎫⎛ ⎞ ⎛ ⎞⎪ ⎪ ⎪ ⎪⎛ ⎞ ′′= − ⊗ ⊕ − ⊗⎨ ⎬ ⎨ ⎬⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠⎪ ⎪ ⎪ ⎪⎝ ⎠ ⎝ ⎠⎩ ⎭ ⎩ ⎭
⎛ ⎞⎛ ⎞′= − ⊗ ⊗⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
1 Z 1
1 V 1
1 Σ P3
1
1h
hn
h hh
Nnβ=
⎧ ⎫⎛ ⎞⎛ ⎞ ⎛ ⎞⎪ ⎪⎜ ⎟⎜ ⎟ ⊗⎨ ⎬⎜ ⎟⎜ ⎟⎜ ⎟− ⎝ ⎠⎪ ⎪⎝ ⎠⎝ ⎠⎩ ⎭∑ 1
107
( )( ) ( ) ( )
( ) ( )
( ) ( )
( )
30
1
3
1
32 2 20, 01, 1,
1
201,2
0, 21 1,
1ˆvar 1
11
2
h h hh h
h h h
h h hh h
h h h
h h hh h h h h
h h
hh h hh
h h h
N N nT
n
N N nn
N N nn
N N nn
ββ
ββ
σ β σ β σ
σσ
σ
=
=
=
=
⎧ ⎫⎛ ⎞⎛ ⎞−⎪ ⎪⎜ ⎟⎜ ⎟= − Σ⎨ ⎬⎜ ⎟⎜ ⎟−⎪ ⎪⎝ ⎠⎝ ⎠⎩ ⎭⎧ ⎫⎛ ⎞⎛ ⎞−⎪ ⎪⎜ ⎟⎜ ⎟= − Σ⎨ ⎬⎜ ⎟⎜ ⎟−⎪ ⎪⎝ ⎠⎝ ⎠⎩ ⎭
−⎧ ⎫= − +⎨ ⎬
⎩ ⎭⎧ ⎫⎛ ⎞−⎪ ⎪= −⎜ ⎟⎨ ⎬⎜ ⎟⎪ ⎪⎝ ⎠⎩ ⎭
∑
∑
∑
( ) ( )
3
32 2 2
0,1
11 ,h
h h hh h
fN
nρ σ
=
−⎧ ⎫= −⎨ ⎬
⎩ ⎭
∑
∑
where 01,
0, 1,
hh
h h
σρ
σ σ= is the correlation coefficient between the response variable and
auxiliary variable in the h -th stratum.
6.10. Derivations of results in Section 6.8.5.
When some but not all stratum means for the auxiliary variable are known, after
application of transformation (6.16) to ( )1hY and partitioning into sample and remaining
parts IZ and IIZ . The optimization function is defined as
( ) ( ){ }3
* 1,2 2 ,
0
h hc hI I II II I
N n=
⎧ ⎫⎛ ⎞−⎪ ⎪′ ⎜ ⎟′ ′ ′Φ = − + −⎨ ⎬⎜ ⎟⎪ ⎪⎝ ⎠⎩ ⎭w w V w w V λ X wl
where ( )
( )
( )( )( )( )( )31 2 3
33
1* 1
1 21 5
1 01 0
1
hnhhI I
nn n n
==
× + +×
⎛ ⎞⎛ ⎞ ′⊕ ⊗⊕ ⎜ ⎟⎜ ⎟′ ′= = ⎜ ⎟⎜ ⎟ ′⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
1X X
0 10. Differentiating ( )Φ w with
respect to w and λ and setting the derivatives to zero results in the following
estimating equations,
108
( ){ },
*3
* 1
ˆ
ˆ0
I II II
I I
h hc hI
N n=
⎛ ⎞⎜ ⎟⎛ ⎞⎛ ⎞ ⎜ ⎟⎛ ⎞⎜ ⎟⎜ ⎟ −= ⎜ ⎟⎜ ⎟′⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠ ⎜ ⎟⎝ ⎠ ⎜ ⎟⎝ ⎠⎝ ⎠
VwV X
λX 0
l
. (6.22)
After simplification, ( )1 2
3 3
2
2 21*
2 2 2
1
0 hn n nhI
n n
+ ×=
×
⎛ ⎞⎛ ⎞⎛ ⎞⎜ ⎟⎜ ⎟⊕ ⊗⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟= ⎝ ⎠⎝ ⎠⎜ ⎟⎜ ⎟⊗⎝ ⎠
1 0X
0 I 1
. Solving (6.22) leads to
( ) ( ){ }
1 1 *,
31
* 1 * * 1 1,
ˆˆ
ˆ0
I I II II I I
h hc hI I I I I I II II
N n
− −
−− − =
⎧ = −⎪⎪ ⎛ ⎞⎛ ⎞⎨ −⎜ ⎟′ ′ ⎜ ⎟= −⎪
⎜ ⎟⎜ ⎟⎪ ⎝ ⎠⎝ ⎠⎩
w V V V X λ
λ X V X X V V
l
l.
Since 1
H
I hIh== ⊕V V , 1 1 1
,h hhI h n N− − −= ⊗V Σ P , ( ), , ,1
H
I II II I h I IIh=′= = ⊕V V V ,
1
H
II hIIh== ⊕V V ,
( ) ( ) ( ), ,1
h h hhh I II h II I n N nhN × −
⎛ ⎞′= = ⊗ −⎜ ⎟
⎝ ⎠V V Σ J , and
3
1
1
0 h hII N n
c h
−
=
⎧ ⎫⎛ ⎞⎪ ⎪= ⊗⎜ ⎟⎨ ⎬⎜ ⎟⎪ ⎪⎝ ⎠⎩ ⎭1l , we have,
( ) ( )1
2,1
h h hhI h I II n N nh hN n
−× −
⎛ ⎞= ⊗ −⎜ ⎟−⎝ ⎠
V V I J ,
3
1,
1
1
0 hI I II II n
c h
−
=
⎧ ⎫⎛ ⎞⎛ ⎞⎪ ⎪⎜ ⎟= − ⊗⎜ ⎟⎨ ⎬⎜ ⎟⎜ ⎟⎪ ⎪⎝ ⎠⎝ ⎠⎩ ⎭V V 1l , and
consequently,
( )( )( )
{ }3
31 2
22 3 1
2 2* 1 1,
322 2 1
11 01
00
h
h
hc hn nh
I I I II II n
cnn n h
n
n
=
×− =
× + =
⎛ ⎞⎛ ⎞ ⎜ ⎟⎧ ⎫⎛ ⎞′⊕ ⊗ ⎪ ⎪⎜ ⎟ ⎜ ⎟′ = − ⊗ = −⎜ ⎟⎨ ⎬ ⎛ ⎞⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟′ ⎪ ⎪ ⊗⎜ ⎟⊗ ⎝ ⎠⎩ ⎭ ⎜ ⎟⎝ ⎠ ⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
1 0X V V 1
0 I 1l ,
( ){ }{ }
( ){ } { }2
3 3 31
* 1 11 1,
3
100 0
0
hc hhh h h h cc c hh h
I I I II II
nNN n N n
n
=
− == =
⎛ ⎞⎜ ⎟⎛ ⎞ ⎛ ⎞ ⎛ ⎞− −⎜ ⎟′ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟− = − − = −⎛ ⎞⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟⊗⎜ ⎟ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
X V V l ,
109
( )( )
22 20, 01,1
1* 1 *
3 33
3 3
1h hh hh
h h
I I I
N nN n
N nN n
σ ρ=
−−
⎛ ⎞−⎛ ⎞⊕ −⎜ ⎟⎜ ⎟⎝ ⎠⎜ ⎟
′ = ⎜ ⎟−⎜ ⎟
⎜ ⎟⎜ ⎟⎝ ⎠
0
X V X0 Σ
,
( )2
2 20, 01,
1
23 30,3 01,3
3 20,3
1
ˆ 1
h hh h
hc h
N nn
N nn
σ ρ
σ σσ
=
⎛ ⎞−⎧ ⎫⎪ ⎪⎜ ⎟−⎨ ⎬⎜ ⎟⎪ ⎪⎩ ⎭⎜ ⎟⎜ ⎟= − ⎛ ⎞⎜ ⎟⎜ ⎟−⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
λ .
( ) ( )
( )
1 2
3 3
233 2 211 1
,1
12 2 2
2
2 20, 01,
1
20,33 3
3 01,3
11
0ˆ0
1
h
h h h
h
n n nhn h n Nh
c hn n
hh h
hc h
N nn
N nn
σ ρ
σ
σ
+ ×=− −
=
=×
=
⎛ ⎞⎛ ⎞⎛ ⎞⎜ ⎟⎜ ⎟⎧ ⎫ ⊕ ⊗⎜ ⎟⎛ ⎞⎛ ⎞⎪ ⎪ ⎛ ⎞ ⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟= − ⊗ − ⊕ ⊗⎜ ⎟ ⎝ ⎠⎨ ⎬ ⎜ ⎟ ⎝ ⎠⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎝ ⎠⎪ ⎪⎝ ⎠⎝ ⎠⎩ ⎭ ⎜ ⎟⎜ ⎟⊗⎝ ⎠
⎛ −⎧ ⎫⎪ ⎪⎜ − −⎨ ⎬⎜⎪ ⎪⎩ ⎭⎜×⎜ ⎛ ⎞−⎜ ⎜ ⎟−⎜ ⎜ ⎟⎜ ⎝ ⎠⎝
1 0w 1 Σ P
0 I 1
( ) ( )
( )( )3 3 3
22
1 1 2 2, 0, 01,13
1
20,31 11 3 3
3 ,3 01,3
11
01.
0
h
h h h
h
hh n N n h hh
hc h
n
c hn N n
N nn
N nn
σ ρ
σ
σ
− −
==
− −=
⎞⎟⎟⎟⎟⎟⎟⎟⎠
⎛ ⎞⎛ ⎞⎛ ⎞⎛ ⎞ −⎧ ⎫⎪ ⎪⎜ ⎟⎜ ⎟⎜ ⎟⊕ ⊗ −⎜ ⎟ ⎨ ⎬⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎪ ⎪⎩ ⎭⎝ ⎠⎝ ⎠⎧ ⎫ ⎝ ⎠⎛ ⎞⎛ ⎞ ⎜ ⎟⎪ ⎪⎜ ⎟= − ⊗ +⎜ ⎟⎨ ⎬ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎛ ⎞⎪ ⎪ ⎜ ⎟⎝ ⎠⎝ ⎠ −⎩ ⎭ ⎜ ⎟⊗⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
Σ P 1
1
Σ P 1
This can be further simplified as
110
3
2
301, 1
1 3
3
1
1ˆ
0 1
0
h
h
hn
h hc h
n
c hn
Nn
Nn
β=
=
⎛ ⎞⎧ ⎫⎛ ⎞⎪ ⎪⎜ ⎟⎜ ⎟ ⊗⎨ ⎬⎜ ⎟⎜ ⎟−⎧ ⎫ ⎪ ⎪⎛ ⎞ ⎝ ⎠⎛ ⎞ ⎩ ⎭⎜ ⎟⎪ ⎪⎜ ⎟= − ⊗ +⎜ ⎟⎨ ⎬ ⎜ ⎟⎜ ⎟⎜ ⎟⎪ ⎪ ⎜ ⎟⎛ ⎞⎝ ⎠⎝ ⎠⎩ ⎭⊗⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
1
w 1
1
,
where 201 01, 1,h h hβ σ σ= , 1,2.h =
111
CHAPTER 7
APPLICATIONS TO RATE ESTIMATION
7.1. Motivations
This Chapter discusses applications of the methods developed in Chapters 4, 5
and 6 to rate estimation that is common in epidemiologic surveys. We will consider
scenarios involving up to two auxiliary variables, either categorical or continuous.
7.2. Cases with single categorical auxiliary variable
We first consider the scenarios with single categorical auxiliary variable that has
only two categories (binomial), then extend the results to the case with more than two
categories.
7.2.1. Binomial auxiliary variable
Suppose that a sample survey is conducted to estimate the disease prevalence of
a population. A SRSWOR sample of size n is drawn from the population, and the
auxiliary variable is subjects’ gender. In practice, the following scenarios may occur,
1) Gender is known for all subjects, and
2) Only total numbers of males ( )mN and females ( )fN are known.
From Section 5.3 of Chapter 5, we know the estimator for population disease
prevalence can be written as
( )0 0 01 1 1r r r rβ= + − , (7.1)
112
where 0r is disease rate of the sample, 1r is the proportion of males in the sample, 1r is
the proportion of males in the population, and 201 01 1β σ σ= is the population regression
coefficient of disease status on gender.
When either the gender of all subjects or the total numbers of males (females) of
the population is known, 21σ is known, that is,
( )
21 1
m fN NN N
σ =−
.
For this scenario, knowledge of the total number of males or females is adequate.
By methods of moments, we replace other variance components, 20σ and 01σ ,
with their sample estimates,
( )( )
0 020ˆ
1n n nn n
σ−
=−
and ( )
0001ˆ
1m f fm
m f
n n nnn n n n
σ⎛ ⎞
= −⎜ ⎟⎜ ⎟− ⎝ ⎠,
where mn and fn are numbers of males and females in the sample, 0 0 0m fn n n= + , 0mn
and 0 fn are the numbers of male and female disease bearers in the sample,
respectively. Consequently, using 21σ ,
( )( )
0001
1ˆ1
m f fm
m f m f
n n nN N nn n N N n n
β⎛ ⎞⎛ ⎞−
= −⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟− ⎝ ⎠⎝ ⎠.
The estimator for disease prevalence is simplified to
00 00 2
1ˆ1
m f f fm m
f m m f
n n n nn N n nrn n n N N n n
⎛ ⎞⎛ ⎞⎛ ⎞−= + − −⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟− ⎝ ⎠⎝ ⎠⎝ ⎠
. (7.2)
Its variance is
113
( ) ( )( ) ( )
( )( )
( )( )
0 0 20 01
22 200 0 0
22
1 ˆˆ 11
11 1 ,1 1
m f fm
m f m f
n n nfV rn n n
n n nn n n N Nf f nn n n n N N n nn n
ρ⎛ ⎞−−
= −⎜ ⎟⎜ ⎟−⎝ ⎠
⎛ ⎞⎛ ⎞⎛ ⎞⎛ ⎞− −− −= − −⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟− −⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠
(7.3)
where ( )( ) ( )
2
02 001
0 0
1ˆ1
m f m f fm
m f m f
n n n n nN N nn n N N n n n n n
ρ⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞−
= −⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟− −⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠.
If we use ( ) ( )20 0 0ˆ 1n n n n nσ = − − , ( )2
1ˆ 1m fn n n nσ = − and
( )00
01ˆ1
m f fm
m f
n n nnn n n n
σ⎛ ⎞
= −⎜ ⎟⎜ ⎟− ⎝ ⎠, then using 2
1σ , 0001
ˆ fm
m f
nnn n
β = − , and
000
f fm m
m f
N nN nrN n N n
⎛ ⎞⎛ ⎞= + ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
. (7.4)
This is a “post-stratified” estimator of prevalence rate, which has been shown
using calibration (Deville and Särndal 1992) and model-assisted approaches (Särndal,
Swensson and Wretman 1992). Its variance is
( ) ( )
2200
0 01ˆ
1fm
pm f
nf nV r nn n n n
⎛ ⎞⎛ ⎞−= − +⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟− ⎝ ⎠⎝ ⎠
. (7.5)
Derivations of (7.2), (7.3), (7.4) and (7.5) are given in Section 7.5.
7.2.2. Relative performance of various estimation methods
We conducted a small Monte Carlo simulation to evaluate the performance of
estimators (7.2) and (7.5). We created a series of hypothetical populations of sizes 50,
100, 200, 400, 800 and 1,600; each consists of 50% men and 50% women. In all
populations, the prevalence of smokers is 60% for men and 20% for women. We use
114
the known gender proportions to improve our estimation based on estimators shown in
the previous section.
The simulation study was carried out as follows. From each population of size
25,50,100,200,400,800N = , a series of 5,000 SRSWOR samples of size n , where
25,50,100,200,400,800n = and n N< , were drawn. For each sample, we estimated
the proportion of smokers, and its variance using the following methods:
1) SEP: Simple Expansion Predictor without auxiliary information.
2) DPS: Design-based Predictor using Sample estimate of population variance
of gender ( )21σ and estimated covariance ( )01σ , i.e., (7.4) and (7.5).
3) DPP: Design-based Predictor using known Population variance of gender
( 21σ ), i.e., (7.2) and (7.3), respectively.
4) POP: Design-based predictor using known Population variance and
covariance, i.e., predictor (5.12).
5) JKF: The variance of design-based predictor (7.4) was estimated with
modified delete-one Jackknife method (Jones 1974) using functional form
(7.4).
The 95% confidence intervals (CIs) were computed using the variance estimators based
on asymptotic normal approximation. We then compared the overall performance of
these estimation methods in terms of mean squared error (MSE), coverage rates and
average length of the confidence intervals. Simulation results are summarized in the
following Tables 7.1, 7.2 and 7.3, and presented graphically in Figures 7.1, 7.2 and 7.3.
Based on these simulation data, we made the following observations.
115
1) As expected, these simulations confirm that all these estimators are
asymptotically unbiased, or at least the biases are negligible; especially when sample
size is not small.
2) We observed significant reduction in MSE of the estimators relative to
simple expansion estimators for small and large sample sizes (Table 7.1, Figure 7.2),
and thus narrower confidence intervals (Table 7.2).
3) From Figures 7.2 and 7.3, we concluded that DPP method is superior to DPS
method in terms of MSE reduction, especially when sample size is relatively small
(n<200).
4) The coverage rate of the confidence intervals constructed using the variance
estimators of the SRS, DPS, DPP, POP and JKF methods are all very close to their
nominal level (95%). When sample sizes are small ( )50n ≤ , DPP and DPS confidence
intervals tend to have highest coverage rates and the narrowest width. When sample
sizes are moderate or large, there are little differences in coverage rates between the
methods. However, the CIs based on Jackknife variance estimator tend to be much
wider than POP, DPP and DPS methods.
5) From Table 7.3, the CIs based on Jackknife variance and DPS method
appeared to have higher rate of right-miss for smaller sample sizes (n<=100).
116
Table 7.1. Mean squared error (x 10,000) of four estimators by population size and sample size
---------------------------------------------------------------------------------------
| Sample size and estimation method
Population | --------- 25 --------- --------- 50 --------- --------- 100 --------
size | SEP DPS DPP POP SEP DPS DPP POP SEP DPS DPP POP
-----------+---------------------------------------------------------------------------
50 | 48.3 42.6 39.7 39.7
100 | 71.6 64.3 60.6 60.6 24.4 21.2 20.5 20.5
200 | 85.2 75.2 70.8 70.8 35.5 30.4 29.5 29.5 12.0 10.1 9.9 9.9
400 | 89.1 76.8 73.0 73.0 40.2 34.9 34.0 34.0 17.8 15.1 15.0 15.0
800 | 96.3 85.0 81.0 81.0 44.9 38.7 37.7 37.7 21.5 18.0 17.8 17.8
1600 | 95.6 84.1 80.8 80.8 47.1 40.8 39.9 39.9 21.8 18.7 18.5 18.5
---------------------------------------------------------------------------------------
| --------- 200 -------- --------- 400 -------- --------- 800 --------
-----------+---------------------------------------------------------------------------
400 | 5.9 4.9 4.9 4.9
800 | 9.2 7.6 7.6 7.6 3.0 2.5 2.5 2.5
1600 | 10.8 9.1 9.0 9.0 4.6 3.8 3.8 3.8 1.5 1.2 1.2 1.2
---------------------------------------------------------------------------------------
SEP = Simple expansion predictor based SRSWOR, DPS = Estimates based on
sample variance and covariance, DPP = Estimates based on sample covariances and
population variance of the auxiliary variable, POP = Estimates based on population
variance and covariance.
117
Table 7.2. Coverage rates and width of 95% confidence intervals
---------------------------------------------------------------------------------------------------------
| Sample size and Estimation method
Population | ------------ 25 ------------ ------------ 50 ------------ ------------ 100 -----------
size | SEP DPS DPP POP JKF SEP DPS DPP POP JKF SEP DPS DPP POP JKF
-----------+---------------------------------------------------------------------------------------------
50 | 96.0 94.3 95.1 95.1 94.6
| 27.4 25.0 24.7 24.7 26.1
|
100 | 93.4 93.4 94.0 94.0 93.4 93.2 94.1 94.7 94.7 94.7
| 33.4 30.5 30.0 30.0 31.8 19.3 17.6 17.5 17.5 17.9
|
200 | 91.0 92.6 94.0 94.0 93.7 95.4 94.5 95.1 95.1 94.8 94.2 94.9 95.0 95.0 95.0
| 35.9 32.8 32.2 32.2 34.2 23.6 21.5 21.3 21.3 21.9 13.6 12.4 12.4 12.4 12.6
|
400 | 94.7 93.6 94.8 94.8 94.0 94.1 94.5 95.0 95.0 94.8 94.1 94.8 95.1 95.1 95.1
| 37.2 34.0 33.2 33.2 35.4 25.4 23.2 23.0 23.0 23.7 16.7 15.2 15.1 15.1 15.3
|
800 | 93.6 92.9 93.7 93.7 92.8 95.0 94.0 94.3 94.3 94.6 94.8 94.3 94.4 94.4 94.7
| 37.7 34.4 33.7 33.7 35.9 26.3 24.0 23.7 23.7 24.4 18.0 16.4 16.3 16.3 16.6
|
1600 | 93.6 93.0 94.1 94.1 93.3 94.2 93.8 94.1 94.1 94.0 94.2 94.7 95.0 95.0 94.9
| 38.1 34.8 34.0 34.0 36.3 26.7 24.4 24.1 24.1 24.9 18.6 17.0 16.9 16.9 17.1
---------------------------------------------------------------------------------------------------------
| ------------ 200 ----------- ------------ 400 ----------- ------------ 800 -----------
-----------+---------------------------------------------------------------------------------------------
400 | 95.0 95.6 95.8 95.8 95.7
| 9.6 8.8 8.8 8.8 8.8
|
800 | 94.2 94.7 94.7 94.7 94.6 94.6 94.7 94.6 94.6 94.6
| 11.8 10.7 10.7 10.7 10.8 6.8 6.2 6.2 6.2 6.2
|
1600 | 94.5 94.3 94.3 94.3 94.4 94.6 95.0 94.9 94.9 94.8 95.6 95.2 95.3 95.3 95.1
| 12.7 11.6 11.6 11.6 11.7 8.3 7.6 7.6 7.6 7.6 4.8 4.4 4.4 4.4 4.4
---------------------------------------------------------------------------------------------------------
SEP = Simple expansion predictor, DPS = DP based on sample variance and
covariances, DPP = DP using sample covariances and population variance of the auxiliary
variable, POP = Design-based predictor (DP) based on population variance and covariances,
JKF= CI based on Jackknife variance estimator.
118
Table 7.3. Percentage of left- and right-misses of the 95% confidence intervals, by population size, sample size and estimation methods
---------------------------------------------------------------------------------------------------------
| Sample size and Estimation method
Population | ------------ 25 ------------ ------------ 50 ------------ ------------ 100 -----------
size | SEP DPS DPP POP JKF SEP DPS DPP POP JKF SEP DPS DPP POP JKF
-----------+---------------------------------------------------------------------------------------------
50 | 2.0 2.3 2.4 2.4 2.0
| 2.0 3.5 2.5 2.5 3.4
|
100 | 1.7 2.2 3.3 3.3 2.7 3.2 2.4 2.7 2.7 2.4
| 4.9 4.4 2.7 2.7 3.9 3.5 3.5 2.6 2.6 2.9
| 200 | 2.6 2.6 3.3 3.3 2.7 1.3 2.2 2.4 2.4 2.2 3.0 2.6 2.9 2.9 2.5
| 6.5 4.8 2.7 2.7 3.6 3.3 3.4 2.5 2.5 3.1 2.8 2.4 2.1 2.1 2.5
|
400 | 2.7 2.4 3.0 3.0 2.5 2.0 2.5 2.7 2.7 2.4 2.4 2.4 2.7 2.7 2.3
| 2.6 4.1 2.2 2.2 3.5 3.9 3.0 2.3 2.3 2.8 3.5 2.8 2.2 2.2 2.6 |
800 | 3.3 2.6 3.7 3.7 2.7 2.8 2.6 3.2 3.2 2.4 2.1 2.6 3.1 3.1 2.4
| 3.1 4.5 2.6 2.6 4.4 2.2 3.4 2.4 2.4 2.9 3.0 3.1 2.5 2.5 2.9
|
1600 | 3.5 2.6 3.4 3.4 2.7 3.2 2.9 3.6 3.6 3.0 2.2 2.2 2.5 2.5 2.2 | 2.8 4.4 2.5 2.5 4.0 2.5 3.4 2.3 2.3 3.0 3.6 3.1 2.5 2.5 2.9
---------------------------------------------------------------------------------------------------------
| ------------ 200 ----------- ------------ 400 ----------- ------------ 800 -----------
-----------+---------------------------------------------------------------------------------------------
400 | 2.2 1.6 1.8 1.8 1.7
| 2.8 2.8 2.4 2.4 2.7
|
800 | 2.8 2.2 2.6 2.6 2.3 2.8 2.6 2.9 2.9 2.7 | 3.0 3.1 2.7 2.7 3.1 2.6 2.7 2.6 2.6 2.7
|
1600 | 3.0 2.8 3.2 3.2 2.9 2.8 2.4 2.7 2.7 2.5 2.1 2.1 2.2 2.2 2.2
| 2.5 2.9 2.5 2.5 2.7 2.7 2.6 2.4 2.4 2.7 2.3 2.7 2.4 2.4 2.7
---------------------------------------------------------------------------------------------------------
SEP = Simple expansion predictor, DPS = Design-based predictor (DP) based on sample variance and covariances, DPP = DP using sample covariances and population variance of the auxiliary variable, POP = DP based on population variance and covariances, JKF= CI based on Jackknife variance estimator.
119
Figure 7.1. Mean squared error by population size and sample size
Sample sizes (from top to bottom): n=25, 50, 100, 200, 400)Symbols: .=SRS, x=DPS, O=DPP, +=POP
MSE
x 1
0,00
0
Population size50 100 200 400 800 1600
0
10
20
30
40
50
60
70
80
90
100
0
10
20
30
40
50
60
70
80
90
100
Sample sizes (from top to bottom): n=25, 50, 100, 200, 400Symbols: . = SRS, x=DPS, O=DPP, +=POP
Var
ianc
e x
10,0
00
Population size50 100 200 400 800 1600
0
10
20
30
40
50
60
70
80
90
100
0
10
20
30
40
50
60
70
80
90
100
120
Figure 7.2. MSE reduction (%) of a design-based predictor (DPS or DPP ) by population and sample sizes
Estimation method: DPSSample size: .=25, o=50, x=100, +=200, O=400, D=800
MS
E re
duct
ion
%
Population size50 100 200 400 800 1600
10
12
14
16
18
20
10
12
14
16
18
20
Estimation method: DPPSample size: .=25, o=50, x=100, +=200, O=400, D=800
MS
E re
duct
ion
%
Population size50 100 200 400 800 1600
10
12
14
16
18
20
10
12
14
16
18
20
121
Figure 7.3. MSE Ratio of design-based predictors to SRS estimator by population and sample sizes
Estimation method: DPSSample size: .=25, o=50, x=100, +=200, O=400, D=800
MS
E R
atio
of D
PS
to S
RS
Population size50 100 200 400 800 1600
.8
.825
.85
.875
.9
.8
.825
.85
.875
.9
Estimation method: DPPSample size: .=25, o=50, x=100, +=200, O=400, D=800
MS
E R
atio
of D
PS
to S
RS
Population size50 100 200 400 800 1600
.8
.825
.85
.875
.9
.8
.825
.85
.875
.9
122
7.2.3. Extension to multinomial auxiliary variable
Let us consider the example of estimating the rate of drug coverage. When a
categorical variable has more than two categories, we may extend the above results by
replacing the categorical variable with a set of indicator variables. For example, we are
interested in estimating the disease prevalence by age categories ( ( )0 1sy = is subject s
has drug coverage, 0 otherwise) based on a fixed size sample survey. The subjects are
classified into three age categories (referred as subpopulations hereafter), e.g., ≤ 65 ,
>65 but <=75 and >75. The marginal totals of the three age categories ( )1 2 3, ,N N N are
known. The age categories may be represented using three indicator variables,
( )1 1 65.0 otherwise.
y≤⎧
= ⎨⎩
( )2 1 65< but 75,0 otherwise.
y≤⎧
= ⎨⎩
( )3 1 >75,0 otherwise.
y ⎧= ⎨⎩
To avoid singularity, we use ( )1y and ( )2y in the model. By applying results
derived in Section 5.4 of Chapter 5, we have
( ) ( )( )0 0 0r r • ••′= − −β Y μ , (7.6)
where 10 0,−• • •=β Σ σ 1 2N N
N N•
′⎛ ⎞= ⎜ ⎟⎝ ⎠
μ and ( ) ( ) ( )( )1 2Y Y• ′=Y .
Since 1N and 2N are known, we have
( )
( )( )
12 11 2
11 2
111
N N NN NN N NN N
−
• −
⎛ ⎞− −= ⎜ ⎟⎜ ⎟− −− ⎝ ⎠
Σ , and
123
( )( )
2 2 1 21
1 2 1 11 2 3
1 N N N N NNN N N N NN N N
−•
−⎛ ⎞−= ⎜ ⎟−⎝ ⎠
Σ .
It is shown that ( )( )
0 020ˆ
1n n nn n
σ−
=−
and ( )
01 0 10
02 0 2
1ˆ1
nn n nnn n nn n•
−⎛ ⎞= ⎜ ⎟−− ⎝ ⎠
σ . Therefore,
( )
( )
( )( ) ( )
01 0 11
0 3 0 023
02 0 22
111 1 1ˆ
1 11 1
nn n nNN N n n nn
n n n n Nnn n n
N
•
⎛ ⎞−⎜ ⎟ ⎛ ⎞− −⎜ ⎟= + − ⎜ ⎟⎜ ⎟ ⎜ ⎟− − ⎝ ⎠⎜ ⎟−⎜ ⎟⎝ ⎠
β
After algebraic simplification, we have
( )
30
0 0 01
1ˆ1
k kk
k k
n N n n nr n nn n n N N n=
⎧ ⎫⎛ ⎞− ⎪ ⎪⎛ ⎞= − − −⎨ ⎬⎜ ⎟⎜ ⎟− ⎝ ⎠⎪ ⎪⎝ ⎠⎩ ⎭∑ . (7.7)
Its variance is
( ) ( )2 20 0?
1ˆ垐var 1 frn
ρ σ−⎛ ⎞= − ⎜ ⎟⎝ ⎠
, (7.8)
where 2 1 20 0 0 0ˆ 垐 �ρ σ−• • • •= σ Σ σ .
When sample estimates of variance and covariance are used
It can be shown that this estimator is similar to poststratified estimators. If the
sample estimator of •Σ is used, we have ( )
( )2 2 1 21
1 2 1 11 2 3
1ˆ n n n n nnn n n n nn n n
−•
−⎛ ⎞−= ⎜ ⎟−⎝ ⎠
Σ ,
( )01 0 1
002 0 2
1ˆ1
nn n nnn n nn n•
−⎛ ⎞= ⎜ ⎟−− ⎝ ⎠
σ and thus
101 0
1 30 03 0
3202 0
2
111ˆ11
nn nn n nn n
n nnn nn n
•
⎛ ⎞⎛ ⎞−⎜ ⎟⎜ ⎟ ⎛ ⎞⎝ ⎠ ⎛ ⎞⎜ ⎟= − − ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠⎛ ⎞ ⎝ ⎠⎜ ⎟−⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
β .
124
Therefore,
3 3
0 00 0 0
1 1
1ˆ k k k k kk
k kk k k
n N n n n N nr n nn n n N N n N n= =
⎛ ⎞⎛ ⎞= − − − =⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
∑ ∑ . (7.9)
Its variance is
( )230
0 01
1 1ˆvar1
k
k k
f nr nn n n=
⎛ ⎞−= −⎜ ⎟− ⎝ ⎠
∑ . (7.10)
When 3p > categories are present
Results (7.7), (7.8), (7.9) and (7.10) can be readily extended to scenarios when
more than three categories are present.
A. When the population variance of the auxiliary variables are used:
( )
00 0 0
1
1ˆ1
pk k
kk k
n N n n nr n nn n n N N n=
⎧ ⎫⎛ ⎞− ⎪ ⎪⎛ ⎞= − − −⎨ ⎬⎜ ⎟⎜ ⎟− ⎝ ⎠⎪ ⎪⎝ ⎠⎩ ⎭∑ , (7.11)
and
( ) ( )( ) ( )
( )20 00 0 022
1
1 1 1ˆvar1 1
p
k kk k
n n nf Nr nn n nn n n Nn n =
⎧ ⎫−− −⎪ ⎪⎛ ⎞= − −⎨ ⎬⎜ ⎟ −⎝ ⎠ −⎪ ⎪⎩ ⎭∑ . (7.12)
B. When the sample variance of the auxiliary variables are used:
0 00 0 0
1 1
1ˆp p
k k k k kk
k kk k k
n N n n n N nr n nn n n N N n N n= =
⎛ ⎞⎛ ⎞= − − − =⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
∑ ∑ . (7.13)
and its variance is
( ) ( )20
0 01
1ˆvar1
pk
k k
f nr nn n n=
⎛ ⎞−= −⎜ ⎟− ⎝ ⎠
∑ . (7.14)
Derivations of the above results are given in Section 7.6.
125
7.3. Cases with two categorical variables
There are generally two scenarios when two categorical auxiliary variables are
involved, depending on whether the cell or marginal totals of the auxiliary variables are
known. Suppose in a health-related survey, we are interested in estimating the
prevalence of skipping medication behavior (binomial), and we have information about
the subjects’ gender and age from Center for Medicare and Medicaid Services (CMS)
administrative data. After a sample is taken, the gender and age of each sampled
subject will be known. There are several scenarios of practical interest, based on the
availability of information about subjects’ gender and age.
Table 7.4. Two categorical auxiliary variable examples
Age
Total Gender 65-80 >80
Female 1T 2T 1T i
Male 3T 4T 2T i
Total 1Ti 2Ti T
7.3.1. Cases with known cell totals
When all cell totals ( )1 2 3 4, , ,T T T T are known, we may create the following set of
indicator variables:
( )11 Female, Aged 65-80 years,
0 Otherwise.y
⎧⎪= ⎨⎪⎩
126
( )21 Female, Aged >80 years,
0 Otherwise.y
⎧⎪= ⎨⎪⎩
( )21 Male, Aged 65-80 years,
0 Otherwise.y
⎧⎪= ⎨⎪⎩
( )41 Male, Aged >80 years,
0 Otherwise.y
⎧⎪= ⎨⎪⎩
With them, results from Section 5.4 can thus be directly applied to this situation.
To avoid singularity, we may include any three of the four categorical variables into the
model. The estimator and its variance will be the same as (7.11) and (7.12), or (7.13)
and (7.14), respectively.
7.3.2. Cases with only known marginal totals
When only marginal totals are known, we may use two indicator variables to
improve precision of prediction, i.e., one for gender and the other for age category. We
define the two indicator variables as
( )11 Male,
0 Female,y
⎧⎪= ⎨⎪⎩
and
( )21 Aged 65-80,
0 Aged >80 years.y
⎧⎪= ⎨⎪⎩
We denote the total number of male and female subjects as mN , fN , respectively, the
number of persons aged 65-80 and >80 years as aN and bN .
We may apply results derived in Section 5.4. of Chapter 5 as follows,
127
( ) ( )( )0 0 0r r • ••′= − −β Y μ , (7.15)
where 10 0,−• • •=β Σ σ m aN N
N N•
′⎛ ⎞= ⎜ ⎟⎝ ⎠
μ and ( ) ( ) ( )( )1 2Y Y• ′=Y .
Since only the marginal totals are known, •Σ is undetermined because the
population covariance between ( )1Y and ( )2Y is unknown. We may use the sample
variance ˆ•Σ , the sample covariance between ( )0Y and ( )1Y , ( )2Y , 0ˆ •σ , and sample
variance of ( )0Y , ( ) ( )20 0 0ˆ 1n n n n nσ = − − . Since , , ,m a m a f m a m f ann n n n n n n− = − , we
have
( ) ( )
,,
,
, ,,
11ˆ
1 1
f am a
m fm f m a m a m f
m a m a a b f am a a b
m f m f
nnn nn n nn n n n n
nn n n n nn n n n nn n nn n n n
•
⎛ ⎞⎛ ⎞−⎜ ⎟⎜ ⎟⎜ ⎟− ⎜ ⎟⎛ ⎞ ⎝ ⎠= = ⎜ ⎟⎜ ⎟−− − ⎛ ⎞⎝ ⎠ ⎜ ⎟
−⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
Σ ,
Since 0, 0 0, 0,m m f m m fnn n n n n n n− = − and 0, 0 0, 0,a a b a a bnn n n n n n n− = − , we have
( ) ( )
0,0,
0, 00
0, 0 0, 0,
1 1ˆ1 1
fmm f
m fm m
a a a ba b
a b
nnn n
n nnn n nnn n nn n n n n n
n nn n
•
⎛ ⎞⎛ ⎞−⎜ ⎟⎜ ⎟⎜ ⎟− ⎜ ⎟⎛ ⎞ ⎝ ⎠= = ⎜ ⎟⎜ ⎟−− − ⎛ ⎞⎝ ⎠ ⎜ ⎟−⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
σ .
For simplicity, we denote ,,,
f am am a
m f
nnr
n n⎛ ⎞
Δ = −⎜ ⎟⎜ ⎟⎝ ⎠
, 0,0,0,
fmm
m f
nnr
n n⎛ ⎞
Δ = −⎜ ⎟⎜ ⎟⎝ ⎠
,
0, 0,0,
a ba
a b
n nr
n n⎛ ⎞
Δ = −⎜ ⎟⎝ ⎠
and ( )2
,
a b
a b m f m a
n nBn n n n r
=− Δ
. We have
128
( ) ,1
,
1ˆ
1
a bm a
m f
a bm a
n n rn n n nBn n
r
−•
⎛ ⎞−Δ− ⎜ ⎟= ⎜ ⎟⎜ ⎟−Δ⎝ ⎠
Σ and ( )
0,0
0,
1ˆ1
m f m
a b a
n n rn n rn n•
Δ⎛ ⎞= ⎜ ⎟⎜ ⎟Δ− ⎝ ⎠
σ , and thus
0, , 0,
0
0, , 0,
ˆm m a a
m fa m a m
a b
r r r
B n nr r r
n n•
Δ − Δ Δ⎛ ⎞⎜ ⎟
= ⎜ ⎟Δ − Δ Δ⎜ ⎟⎜ ⎟⎝ ⎠
β .
Therefore,
00 0, 0,
, 0, , 0,
ˆ
.
m m a am a
m fm m a am a a m a m
a b
n N n N nr B r B rn N n N n
n nN n N nB r r B r rN n N n n n
⎛ ⎞ ⎛ ⎞= + − Δ + − Δ⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
⎛ ⎞⎛ ⎞ ⎛ ⎞− − Δ Δ − − Δ Δ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠
The variance of 0r is
( ) ( )2 20 0?
1ˆ垐var 1 frn
ρ σ−⎛ ⎞= − ⎜ ⎟⎝ ⎠
,
where ( )( ) ( ) ( )( )
222 22
0? , 0, 0, 0, ,0 0
1ˆ 2m f m a b a m f m a m a
n nB n n r n n r n n r r r
n n nρ
−= Δ + Δ − Δ Δ Δ
−.
After algebraic simplification, we have
( ) ( )
( )
( )( ) ( ) ( )( )
0 00
2 2
0, 0, 0, 0, ,
1ˆvar1
1 1 2m f m a b a m f m a m a
n n nfrn n n
f n B n n r n n r n n r r r
⎛ ⎞−−= ⎜ ⎟⎜ ⎟−⎝ ⎠
− − − Δ + Δ − Δ Δ Δ
.
7.4. Cases with single continuous auxiliary variable
Consider a prevalence survey of skipping medications in a low-income senior
community, where an individual’s age is a strong predictor. However, the information
about an individual’s age may or may not be available before sampling. Assuming such
information of the sampled subjects will be available for analysis, the following
129
scenarios may happen, 1) only the population mean age is known; 2) ages are known
for all individuals in the population.
For cases where only the population mean age is known, we may directly apply
those results derived in Section 5.3 of Chapter 5 while using sample variance of age and
sample covariance between the response variable and age.
For cases where ages are known for all subjects in the population, we may apply
those results presented in Section 5.3. of Chapter 5 while using population variance of
age and sample covariance between the response variable and age.
When age is known for all subjects in the population, the distribution
information such as quartiles, octiles or deciles will also be known and can be used to
improve precision of estimation. A question of practical interest is how such
information should be used. An analyst may chose to,
Strategy I: Use age as a continuous variable, and follow the method of
Section 7.4.1;
Strategy II: Use distribution properties of age, such as quartiles, octiles or
deciles, and follow the strategies of Section 7.2.2.
Intuitively, Strategy II should give better estimators because more information is
incorporated into estimation. The immediate question is how much detail of the
distribution information should be used. We will answer this question by a series of
Monte Carlo simulations. We will compared estimators based on continuous age, its
median, quartiles, octiles and double-octiles.
130
7.4.1. Hypothetical populations
We generated two series of hypothetical populations of sizes 80, 160, 320, 640
and 1280, each series has a hypothetical relationship between smoking behavior and age
as illustrated in Figure 7.4. The two series of hypothetical populations were held as
fixed finite populations. In populations of Series A, the proportions of smokers increase
monotonically with increases of age. In populations of Series B, the relationship
between proportion of smokers and age is bell-shaped; the proportions of smokers
increases with age until its fifth octile (age 71-75), then gradually declines. Age was set
as uniformly distributed within each octiles. The characteristics of the two series of
populations are summarized in Table 7.5.
7.4.2. Monte Carlo simulations
In all the simulations, we used the population gender proportions to improve
estimation using methods developed in Sections 5.3 and 7.2. The study was carried out
as follows. From each fixed finite population, a series of 5,000 SRSWOR samples of
size n , where 20,40,80,160,320,640n = and n N< , were repeatedly drawn. For each
sample, we estimated the proportion of smokers with (7.11) and its variance estimate
with (7.12). The corresponding 95% confidence intervals (CIs) were constructed based
on normal approximation. Age was treated as, 1) a continuous variable (CNT), 2) a
binomial variable with cut-off point at its population median (MED), 3) a multinomial
variable with cut-off points at its population quartiles (QRT), 4) a multinomial variable
with cut-off points at its population octiles (OCT), and 5) a multinomial variable with
cut-off points at its population sedeciles (SED, quantiles based on division into 16
groups of equal frequency), respectively.
131
We compared estimation methods in terms of their average mean squared errors
and the overall performance of their 95% confidence intervals based the estimated
variance. We evaluated their confidence intervals in terms of the coverage rates,
average length. Simulation results are summarized in Tables 7.6, 7.7, 7.8, 7.9 and 7.10
and presented graphically in Figures 7.5, 7.6, 7.7 and 7.8. Based on these simulations,
we made the following observations.
7.4.3. Comparison between the age-adjusted and the simple expansion predictors
When the relationship between smoker proportion and age is monotonic
(population series A), as shown in Table 7.6, MSEs were significantly reduced by
adjustment for age mean (CNT), median (MED) or quartiles (QTR) even when sample
sizes were smaller (n=20, 40, 80). When sample sizes became ≥160, it became more
efficient to adjust for octiles or sedeciles than the simple expansion predictors (SEP).
Regardless of sample sizes, it was more efficient to adjust for age mean than for age
categories, however, the differences were gradually disappearing when sample size
became large ( )320n ≥ . When sample sizes were small (n≤40), adjusting for age
octiles or sedeciles resulted in larger MSE than the SEP.
When the relationship is not monotonic (population series B), as shown in Table
7.7, the estimators adjusted for age mean, median, octiles and sedeciles were less
efficient than the SEPs when sample sizes were ≤80. When sample sizes became 40≥ ,
estimators adjusted for age quartiles became more efficient; and when sample sizes
became 80≥ , the estimators adjusted for either octiles or sedeciles became more
efficient. Finally, when sample sizes became moderate (n≤160), the estimators adjusted
for age median, quartiles, octiles or sedeciles became more efficient than the SEP.
132
Figure 7.4. Hypothetical relationship between smoking and age of population series A and B
Population Series A P
ropo
rtion
of s
mok
ers
Age60 65 70 75 80 85 90 95 100
0.0
0.2
0.4
0.6
0.8
1.0
0.0
0.2
0.4
0.6
0.8
1.0
Population Series B
Pro
porti
on o
f sm
oker
s
Age60 65 70 75 80 85 90 95 100
0.0
0.2
0.4
0.6
0.8
0.0
0.2
0.4
0.6
0.8
133
Table 7.5. Parameters of hypothetical population Series A and B
Population Series A
Age
Population Size (N)
80 160 320 640 1280
Octiles Range n rate n rate n rate n rate n rate
1 60.0-62.0 10 .1 20 .1 40 .1 80 .1 160 .1
2 62.1-64.5 10 .2 20 .2 40 .2 80 .2 160 .2
3 64.6-67.5 10 .3 20 .3 40 .3 80 .3 160 .3
4 67.6-71.0 10 .4 20 .4 40 .4 80 .4 160 .4
5 71.1-75.0 10 .5 20 .5 40 .5 80 .5 160 .5
6 75.1-80.0 10 .6 20 .6 40 .6 80 .6 160 .6
7 80.1-85.0 10 .7 20 .7 40 .7 80 .7 160 .7
8 85.1-100 10 .8 20 .8 40 .8 80 .8 160 .8
Overall 60.0-62.0 .45 .45 .45 .45 .45
Age mean 73.0 73.2 73.0 73.2 73.1
Population Series B
Age
Population Size (N)
80 160 320 640 1280
Octiles Range n rate n rate n rate n rate n rate
1 60.0-62.0 10 .1 20 .1 40 .1 80 .1 160 .1
2 62.1-64.5 10 .1 20 .1 40 .1 80 .1 160 .1
3 64.6-67.5 10 .2 20 .15 40 .15 80 .15 160 .15
4 67.6-71.0 10 .4 20 .4 40 .4 80 .4 160 .4
5 71.1-75.0 10 .6 20 .6 40 .6 80 .6 160 .6
6 75.1-80.0 10 .4 20 .4 40 .4 80 .4 160 .4
7 80.1-85.0 10 .2 20 .2 40 .2 80 .2 160 .2
8 85.1-100 10 .1 20 .1 40 .1 80 .1 160 .1
Overall 60.0-62.0 .263 .268 .268 .268 .268
Age mean 73.0 73.1 73.1 73.2 73.1
134
Table 7.6. MSE of design-based estimators adjusted to known continuous or categorical auxiliary variables, by population size and sample size (Series A)
-----------------------------------------------------------------------------------
Sample size and | Estimation method
Population size | SEP CNT MED QRT OCT SED
-----------------+-----------------------------------------------------------------
20 80 | 96.3 81.5 86.0 93.0 114.8 143.7
160 | 109.7 96.5 97.4 105.6 129.5 172.2
320 | 117.7 102.3 104.5 113.5 138.0 188.6
640 | 119.3 108.0 106.9 117.7 143.4 195.5
1280 | 123.8 106.8 108.4 117.5 143.1 195.8
-----------------+-----------------------------------------------------------------
40 80 | 30.4 25.0 27.2 28.1 31.6 36.4
160 | 45.8 39.8 39.7 41.4 47.2 54.9
320 | 55.3 45.9 47.3 48.2 54.4 66.3
640 | 57.0 48.1 48.7 50.0 56.7 70.1
1280 | 58.0 48.6 50.4 51.3 57.0 69.5
-----------------+-----------------------------------------------------------------
80 160 | 15.3 12.7 13.1 13.1 13.8 14.6
320 | 23.3 19.1 19.6 19.4 20.6 23.1
640 | 26.6 22.1 22.8 22.6 24.1 27.3
1280 | 29.2 23.8 24.7 24.6 25.9 28.9
-----------------+-----------------------------------------------------------------
160 320 | 7.8 6.3 6.6 6.3 6.6 7.0
640 | 11.2 9.2 9.6 9.3 9.5 10.1
1280 | 13.3 11.0 11.4 11.1 11.4 12.0
-----------------+-----------------------------------------------------------------
320 640 | 3.9 3.2 3.3 3.2 3.2 3.3
1280 | 5.8 4.7 5.0 4.7 4.8 4.9
-----------------+-----------------------------------------------------------------
640 1280 | 1.9 1.6 1.6 1.6 1.6 1.6
-----------------------------------------------------------------------------------
SEP = Simple expansion predictor, CNT = Design-based predictor (DP) using
continuous age, MED = DP using median age, QRT = DP using quartiles of age,
OCT= DP using octiles of age, SED=DP using sedeciles (16-quantiles) of age.
135
Table 7.7. MSE of design-based estimators adjusted to known continuous or categorical auxiliary variables, by population size and sample size (Series B)
-----------------------------------------------------------------------------------
Sample size and | Estimation method
Population size | SEP CNT MED QRT OCT SED
-----------------+-----------------------------------------------------------------
20 80 | 73.6 78.9 76.9 77.9 92.1 113.0
160 | 87.5 94.8 90.3 90.8 107.9 142.1
320 | 93.1 100.8 95.7 96.2 115.9 153.6
640 | 96.8 104.1 99.7 100.2 117.4 158.9
1280 | 98.8 105.6 101.2 100.7 116.6 157.3
-----------------+-----------------------------------------------------------------
40 80 | 25.0 25.8 25.3 24.4 26.5 30.4
160 | 38.5 39.8 38.9 36.9 40.1 47.2
320 | 41.2 43.0 41.3 40.1 44.2 52.9
640 | 45.3 46.8 45.7 44.0 48.4 59.8
1280 | 45.7 47.5 46.3 44.3 48.3 58.8
-----------------+-----------------------------------------------------------------
80 160 | 12.4 12.7 12.4 11.6 12.1 13.3
320 | 18.5 18.8 18.6 17.1 17.6 19.5
640 | 21.1 21.4 20.9 19.1 19.9 22.1
1280 | 22.3 22.7 22.2 20.7 21.2 23.9
-----------------+-----------------------------------------------------------------
160 320 | 6.2 6.2 6.1 5.5 5.5 5.8
640 | 9.0 9.1 9.0 8.2 8.2 8.8
1280 | 10.4 10.5 10.3 9.6 9.5 10.1
-----------------+-----------------------------------------------------------------
320 640 | 3.2 3.2 3.1 2.8 2.8 2.9
1280 | 4.5 4.5 4.4 4.1 4.0 4.1
-----------------+-----------------------------------------------------------------
640 1280 | 1.5 1.5 1.5 1.4 1.3 1.4
-----------------------------------------------------------------------------------
SEP = Simple expansion predictor, CNT = Design-based predictor (DP) using
continuous age, MED = DP using median age, QRT = DP using quartiles of age,
OCT= DP using octiles of age, SED=DP using sedeciles (16-quantiles) of age.
136
7.4.4. MSEs and variances of the estimators when relationship between smoker
proportion and age is monotonic (Series A)
When the relationship between smoker proportions and age is monotonic, as
shown in Tables 7.6 and Figures 7.5 and 7.6, the estimators adjusted for age categories
had larger MSEs than the estimators adjusted for age mean (CNT, thereafter). The
more categories the age variable had, the larger their MSEs were. For n≤80, estimators
adjusted for age quartiles, octiles and sedeciles had larger MSEs than those adjusted for
mean or median age. When the sample became moderate or large ( )160n ≥ , the MSEs
of the estimators adjusted for age quartiles or octiles became smaller than those of
estimators adjusted for median age, and became close to MSEs of the estimators
adjusted for mean age.
Figure 7.6 presents ratios of MSE or variance of MED, QTR, OCT and SED
estimators to MSE or variance of CNT estimators, respectively. The MED estimators
had slightly larger MSEs and variances than CNT estimators, their ratios of MSE and
variance were between 1 and 1.05. However, the MSE ratios of QTR, OCT and SED
estimators were all larger than 1, regardless of smaller or larger sample sizes. It is
obvious that the smaller the sample sizes were, the greater the MSE ratios were. In
contrast, the smaller the sample sizes were, the smaller the variance ratios were, which
indicated the overall underestimated the variances of estimators using sample-based
variance estimators. The smaller the sample size was, the larger the difference was
between MSE and variance ratios. This phenomena raises questions about the validity
of the 95% confidence intervals constructed using sample-based variance estimators.
137
Figure 7.5. MSE and variance of design-based predictors adjusted for continuous or categorical age (Series A)
Sample sizes (from top to bottom): 40, 80, 160, 320Symbols: - =CNT, o=MED, x=QRT, +=OCT, D=SED
MS
E x
10,
000
Population size80 160 320 640 1280
0
10
20
30
40
50
60
70
80
0
10
20
30
40
50
60
70
80
Sample sizes (from top to bottom): 40, 80, 160, 320Symbols: - =SRS, o=CNT, x=MED, +=QTR, D=SED
Varia
nce
x 10
,000
Population size80 160 320 640 1280
0
10
20
30
40
50
60
0
10
20
30
40
50
60
138
Figure 7.6. MSE and variance ratios of estimators adjusted for age median, quartiles, octiles and sedeciles to estimators adjusted for age mean (Series A)
Sample Sizes: -= 20, o=40, x=80, +=160, T=320, S=640M
SE
ratio
Etimation method: Median
Population size80 160 320 640 1280
1
1.05
1.1
1.15
1.2
1
1.05
1.1
1.15
1.2
Estimation method: Quartiles
Population size80 160 320 640 1280
1
1.05
1.1
1.15
1.2
1
1.05
1.1
1.15
1.2
Estimation method: Octiles
Population size80 160 320 640 1280
1
1.25
1.5
1.75
2
1
1.25
1.5
1.75
2
Estimation method: Sedeciles
Population size80 160 320 640 1280
1
1.25
1.5
1.75
2
1
1.25
1.5
1.75
2
Sample Sizes: - = 20, o=40, x=80, +=160, T=320, S=640
Varia
nce
ratio
Etimation method: Median
Population size80 160 320 640 1280
.85
.9
.95
1
1.05
1.1
.85
.9
.95
1
1.05
1.1
Estimation method: Quartiles
Population size80 160 320 640 1280
.85
.9
.95
1
1.05
1.1
.85
.9
.95
1
1.05
1.1
Estimation method: Octiles
Population size80 160 320 640 1280
.3
.4
.5
.6
.7
.8
.9
1
.3
.4
.5
.6
.7
.8
.9
1
Estimation method: Sedeciles
Population size80 160 320 640 1280
.3
.4
.5
.6
.7
.8
.9
1
.3
.4
.5
.6
.7
.8
.9
1
139
7.4.5. MSEs and variances of the estimators when relationship between smoker
proportion and age is not monotonic (Series B)
When the relationship between smoker proportions and age is not monotonic, as
shown in Tables 7.7 and Figures 7.7 and 7.8, the estimators adjusted for age median
(MED) or quartiles (QTR) had smaller MSEs than the estimators adjusted for age mean
(CNT), regardless of population or sample sizes. When sample sizes became larger than
80, the estimators adjusted for age octiles (OCT) became more efficient than the CNT
estimators. When sample sizes were larger than 160, the estimators adjusted for age
sedeciles (SED) became more efficient than the CNT and MED estimators.
When sample sizes were small (n<160) or sampling fractions were high
( )25%f ≥ , the variance ratios of the QTR to the CNT estimators were much smaller
than the MSE ratios. Such effects were more pronounced for the variance ratios of the
OCT and SED to the CNT estimators. This indicated overall underestimation of the
variability of the estimators using sample-based variance estimators.
7.4.6. Negative variances
The variance estimates were calculated using (7.12). As shown in Table 7.8, the
probability of encountering negative variance estimates increased dramatically with the
number of categories the age variable had when sample sizes were small (n=20 or 40).
Since the sample covariance between response and auxiliary variable(s) were used for
estimation, the higher probabilities of encountering negative variance estimates might
be attributable to the larger variation of sample covariance when sample sizes were
small.
140
Figure 7.7. MSE and variance of design-based predictors adjusted for continuous or categorical age (Series B)
Sample sizes (from top to bottom): 40, 80, 160, 320Symbols: - =CNT, o=MED, x=QRT, +=OCT, D=SED
MS
E x
10,
000
Population size80 160 320 640 1280
0
10
20
30
40
50
60
0
10
20
30
40
50
60
Sample sizes (from top to bottom): 40, 80, 160, 320Symbols: .=SRS, o=CNT, x=MED, +=QTR, D=SED
Varia
nce
x 10
,000
Population size80 160 320 640 1280
0
10
20
30
40
50
60
0
10
20
30
40
50
60
141
Figure 7.8. MSE and variance ratios of estimators adjusted for age median, quartiles, octiles and sedeciles to estimators adjusted for age mean (Series B)
Sample Sizes: - = 20, o=40, x=80, +=160, T=320, D=640
MS
E ra
tio
Etimation method: Median
Population size80 160 320 640 1280
.85
.9
.95
1
.85
.9
.95
1
Estimation method: Quartiles
Population size80 160 320 640 1280
.85
.9
.95
1
.85
.9
.95
1
Estimation method: Octiles
Population size80 160 320 640 1280
.8
.9
1
1.1
1.2
1.3
1.4
1.5
.8
.9
1
1.1
1.2
1.3
1.4
1.5Estimation method: Sedeciles
Population size80 160 320 640 1280
.8
.9
1
1.1
1.2
1.3
1.4
1.5
.8
.9
1
1.1
1.2
1.3
1.4
1.5
Sample Sizes: - = 20, o=40, x=80, +=160, T=320, S=640
Varia
nce
ratio
Etimation method: Median
Population size80 160 320 640 1280
.7
.75
.8
.85
.9
.95
1
.7
.75
.8
.85
.9
.95
1
Estimation method: Quartiles
Population size80 160 320 640 1280
.7
.75
.8
.85
.9
.95
1
.7
.75
.8
.85
.9
.95
1
Estimation method: Octiles
Population size80 160 320 640 1280
.2
.3
.4
.5
.6
.7
.8
.9
1
.2
.3
.4
.5
.6
.7
.8
.9
1Estimation method: Sedeciles
Population size80 160 320 640 1280
.2
.3
.4
.5
.6
.7
.8
.9
1
.2
.3
.4
.5
.6
.7
.8
.9
1
142
Table 7.8. Probability of encountering negative variance when sample size is small and age is used as a continuous or categorical auxiliary variable
-----------------------------------------------------------------------------------
Population Series A
-----------------------------------------------------------------------------------
Sample size and | Estimation method
Population size | SEP CNT MED QRT OCT SED
-----------------+-----------------------------------------------------------------
20 80 | --- 0.4 --- 0.5 3.0 38.1
160 | --- 0.2 0.1 0.8 5.9 49.4
320 | --- 0.4 0.1 1.1 6.4 45.9
640 | --- 0.2 0.1 0.8 7.0 49.3
1280 | --- 0.5 0.1 0.9 6.9 50.1
-----------------+-----------------------------------------------------------------
40 80 | --- --- --- --- --- ---
160 | --- --- --- --- --- 1.3
320 | --- --- --- --- --- 1.7
640 | --- --- --- --- 0.1 2.7
1280 | --- --- --- --- 0.2 2.9
-----------------------------------------------------------------------------------
Population Series B
-----------------------------------------------------------------------------------
Sample size and | Estimation method
Population size | SEP CNT MED QRT OCT SED
-----------------+-----------------------------------------------------------------
20 80 | --- --- --- 0.1 1.7 28.9
160 | --- --- --- 0.2 3.4 31.6
320 | --- --- --- 0.2 3.9 35.2
640 | --- --- --- 0.2 4.4 36.1
1280 | --- --- --- 0.4 5.0 39.0
-----------------+-----------------------------------------------------------------
40 80 | --- --- --- --- --- ---
160 | --- --- --- --- --- 0.3
320 | --- --- --- --- --- 0.8
640 | --- --- --- --- 0.1 1.3
1280 | --- --- --- --- 0.1 1.6
-----------------+-----------------------------------------------------------------
--- = not apply or negligible (<0.1%). SEP = Simple expansion predictor, CNT =
Design-based predictor (DP) using continuous age, MED = DP using median of
age, QRT = DP using quartiles of age, OCT= DP using octiles of age, SED=DP
sedeciles (16-quantiles) of age. When sample size ≥ 80, the probability of
encountering negative variance is negligible (<0.1%).
143
7.4.7. Performance of 95% confidence intervals
The 95% confidence intervals were constructed using normal approximation
with variance estimates based on (7.12). Regardless of the functional form between
smoker proportion and age, the 95% confidence intervals (CI) of the estimators adjusted
for age mean and quantiles performed worse than simple expansion estimators when
sample sizes were very small (n=20). The more categories the variables had, the poorer
the coverage rates were. When sample size is 20, the 95% CIs constructed based on age
octiles or sedeciles had coverage rates as low as 70% or 30%, respectively. When
sample sizes became moderate (n=80), the 95% CIs for estimators adjusted for age
mean, median and quartiles had coverage rates close to its nominal level. Not until
sample sizes became larger (n=320 or 640), the 95% CIs for estimators adjusted for age
octiles or sedeciles had coverage rates close to their nominal level.
Results in Tables 7.9 and 7.10 suggested that the width of the confidence
intervals became increasingly narrower when the number of categories of age increased,
especially when sample sizes were small. When the relationship between smoker
proportions and age was not monotonic and sample size was moderate or large (n≥320),
the 95% CIs of the estimators adjusted for age quartiles, octiles or sedeciles were much
narrower than those adjusted for age mean or median, yet their coverage rates close to
their nominal level.
144
Table 7.9. Coverage rates and width of 95% confidence interval using age as a continuous or categorical variable, by population size and sample size (Series
A)
----------------------------------------------------------------------------------- Sample size and | Estimation method Population size | SEP CNT MED QRT OCT SED -----------------+----------------------------------------------------------------- 20 80 | 92.9 89.0 90.5 85.0 71.9 32.6 | 37.9 31.1 32.7 29.9 25.5 16.8 | 160 | 93.8 89.6 90.2 84.1 69.6 26.0 | 40.9 34.6 35.1 31.6 26.6 17.1 | 320 | 93.4 89.8 91.1 83.9 69.0 28.3 | 42.2 35.5 36.4 32.7 27.2 18.6 | 640 | 93.1 89.6 90.6 83.7 68.4 25.8 | 42.9 36.0 36.8 32.9 27.1 18.4 | 1280 | 92.3 89.5 90.5 84.7 69.1 26.5 | 43.2 36.1 37.0 33.3 27.3 18.9 -----------------+----------------------------------------------------------------- 40 80 | 95.8 92.9 92.9 91.4 87.9 78.3 | 21.9 18.8 19.6 18.8 17.9 15.2 | 160 | 96.2 93.5 93.6 91.1 86.8 72.9 | 26.8 23.6 23.9 22.6 21.2 16.7 | 320 | 93.4 92.4 92.9 90.5 85.6 72.0 | 28.9 25.1 25.7 24.2 22.4 18.6 | 640 | 93.5 93.1 93.4 91.2 85.9 71.0 | 29.9 26.0 26.5 24.9 22.8 18.6 | 1280 | 95.0 93.0 93.5 91.1 86.4 71.1 | 30.4 26.2 26.9 25.3 23.1 18.7 -----------------+----------------------------------------------------------------- 80 160 | 96.2 94.3 94.4 93.4 91.7 87.4 | 15.5 13.9 14.0 13.5 13.2 11.9 | 320 | 94.9 93.9 94.7 93.4 91.4 87.5 | 18.9 16.7 17.1 16.4 15.9 14.9 | 640 | 94.6 94.6 94.7 93.7 91.9 86.8 | 20.4 18.1 18.4 17.7 17.1 15.8 | 1280 | 95.4 94.1 93.9 93.1 90.9 85.7 | 21.1 18.6 19.1 18.3 17.6 16.2 -----------------+----------------------------------------------------------------- 160 320 | 94.3 94.8 94.5 94.4 93.8 91.7 | 10.9 9.7 9.9 9.7 9.5 9.3 | 640 | 95.2 95.1 94.9 94.1 93.5 91.7 | 13.4 12.0 12.2 11.8 11.6 11.2 | 1280 | 95.0 94.5 94.5 94.0 93.1 91.6 | 14.4 12.8 13.1 12.7 12.4 12.0 -----------------+----------------------------------------------------------------- 320 640 | 95.5 94.7 94.8 94.8 94.3 93.6 | 7.7 6.9 7.0 6.9 6.8 6.7 | 1280 | 95.1 94.8 95.0 94.8 94.2 93.2 | 9.4 8.4 8.6 8.4 8.3 8.1 -----------------+----------------------------------------------------------------- 640 1280 | 95.7 94.9 94.8 95.0 94.8 94.3 | 5.5 4.9 5.0 4.9 4.8 4.8 -----------------------------------------------------------------------------------
SEP = Simple expansion predictor, CNT = Design-based predictor (DP) using continuous age, MED = DP using median of age, QRT = DP using quartiles of age, OCT= DP using octiles of age, SED=DP using sedeciles (16-quantiles) of age.
145
Table 7.10. Coverage rates and width of 95% confidence interval using age as a continuous or categorical variable, by population and sample sizes (Series B)
----------------------------------------------------------------------------------- Sample size and | Estimation method Population size | SEP CNT MED QRT OCT SED -----------------+----------------------------------------------------------------- 20 80 | 94.7 92.5 91.6 86.7 75.7 36.0 | 33.3 32.0 31.4 27.9 23.9 15.3 | 160 | 92.8 91.6 91.0 86.3 73.5 34.8 | 36.1 34.6 34.0 30.1 25.2 16.1 | 320 | 91.7 90.5 89.7 85.0 71.4 32.9 | 37.3 35.7 35.1 30.9 25.6 17.0 | 640 | 91.4 90.2 90.1 85.2 72.0 32.3 | 37.9 36.2 35.6 31.2 25.9 17.0 | 1280 | 91.3 90.3 90.2 85.1 71.4 30.3 | 38.2 36.5 35.8 31.4 25.9 17.0 -----------------+----------------------------------------------------------------- 40 80 | 91.6 91.9 92.3 91.4 88.2 79.2 | 19.4 19.0 18.8 17.5 16.6 14.1 | 160 | 94.7 93.6 93.0 90.8 87.1 76.3 | 23.8 23.3 23.1 21.3 19.8 16.7 | 320 | 94.3 93.6 93.6 91.8 87.1 76.1 | 25.7 25.2 24.9 22.9 21.1 17.6 | 640 | 92.8 92.6 92.9 90.8 86.3 73.4 | 26.5 26.0 25.6 23.5 21.6 18.0 | 1280 | 92.4 92.6 92.4 90.9 86.2 74.1 | 26.9 26.3 26.0 23.9 21.7 18.0 -----------------+----------------------------------------------------------------- 80 160 | 94.5 94.4 94.2 94.4 92.6 87.6 | 13.8 13.7 13.5 12.7 12.3 11.6 | 320 | 93.9 93.8 93.8 92.9 91.2 87.8 | 16.8 16.6 16.5 15.5 14.8 13.9 | 640 | 93.9 94.2 94.2 94.1 92.1 88.7 | 18.2 18.0 17.8 16.7 15.9 14.9 | 1280 | 95.1 94.6 94.4 93.7 91.9 87.7 | 18.8 18.6 18.4 17.2 16.4 15.3 -----------------+----------------------------------------------------------------- 160 320 | 95.4 95.2 94.5 94.4 94.0 92.6 | 9.7 9.7 9.6 9.1 8.9 8.6 | 640 | 94.8 94.6 94.7 94.4 93.7 92.2 | 11.9 11.8 11.7 11.1 10.8 10.5 | 1280 | 95.4 94.6 94.8 94.6 93.8 91.8 | 12.9 12.8 12.7 11.9 11.6 11.2 -----------------+----------------------------------------------------------------- 320 640 | 94.9 94.8 94.6 94.6 94.2 93.5 | 6.9 6.9 6.8 6.4 6.3 6.2 | 1280 | 95.5 95.2 95.4 95.2 94.7 93.9 | 8.4 8.4 8.3 7.9 7.7 7.6 -----------------+----------------------------------------------------------------- 640 1280 | 95.2 95.2 95.2 95.1 94.7 94.4 | 4.9 4.9 4.8 4.6 4.5 4.5 -----------------------------------------------------------------------------------
SRS = Simple expansion predictor, CNT = Design-based predictor (DP) using continuous age, MED = DP using median of age, QRT = DP using quartiles of age, OCT= DP using octiles of age, SED=DP using sedeciles (16-quantiles) of age.
146
In conclusion, the estimators adjusted for age categories were more efficient
than the estimators adjusted for age mean when the functional form between smoker
proportions and age is not monotonic. When sample size became large, such efficiency
became more apparent. However, we caution the use of categorical auxiliary variable
to improve rate estimation when sample size is small due to the higher probability of
encountering negative variances and overall underestimate of the variances. For the
simulations with sample size less than 40, using more than 2 categories resulted in loss
of efficiency. When sample size is greater than 80 and less than 160, age quartiles may
be used. When the sample size is moderate (n=320 or 640), octiles may be used. As a
rough guild based on the limited simulations, the average number per group in the
sample should not be less than 20.
7.5. Derivations of results (7.1) and (7.2) in Section 7.2.
By applying results derived in Section 5.4 of Chapter 5, we have
( ) ( )( )0 0 0r r • ••′= − −β Y μ , (7.16)
where 10 0,−• • •=β Σ σ 1 2N N
N N•
′⎛ ⎞= ⎜ ⎟⎝ ⎠
μ and ( ) ( ) ( )( )1 2Y Y• ′=Y .
Since 1N and 2N are known, we have
( )
( )( )
12 11 2
11 2
111
N N NN NN N NN N
−
• −
⎛ ⎞− −= ⎜ ⎟⎜ ⎟− −− ⎝ ⎠
Σ ,
( )
( )2 2 1 21
1 2 1 11 2 3
1 N N N N NNN N N N NN N N
−•
−⎛ ⎞−= ⎜ ⎟−⎝ ⎠
Σ .
Since ( )( )
0 020ˆ
1n n nn n
σ−
=−
and ( )
01 0 10
02 0 2
1ˆ1
nn n nnn n nn n•
−⎛ ⎞= ⎜ ⎟−− ⎝ ⎠
σ , we have
147
( )( )
( )
( )( )( ) ( )
( ) ( )( )
( )
( )
( )
2 2 1 2 01 0 1
01 2 3 1 2 1 1 02 0 2
2 2 01 0 1 1 2 02 0 2
1 2 3 1 2 01 0 1 1 1 02 0 2
01 0 11
02 0 22
1 1ˆ1
1 11
1
11 1
N N N N N nn n nNn n N N N N N N N N nn n n
N N N nn n n N N nn n nNn n N N N N N nn n n N N N nn n n
nn n nNN
n nnn n n
N
•
− −⎛ ⎞⎛ ⎞− ⎜ ⎟⎜ ⎟=⎜ ⎟⎜ ⎟− − −⎝ ⎠⎝ ⎠
− − + −⎛ ⎞− ⎜ ⎟=⎜ ⎟− − + − −⎝ ⎠
⎛ ⎞−⎜ ⎟− ⎜ ⎟= ⎜ ⎟−
⎜ ⎟−⎜ ⎟⎝ ⎠
β
( ) ( )3 0 023
11 1 .1 1
N n n nnn n N
⎛ ⎞−+ − ⎜ ⎟
⎜ ⎟− ⎝ ⎠
Consequently,
( ) ( ) ( )
( ) ( ) ( )
1 1
0 01 0 1 02 0 2 3 0 020
2 21 2 3
1 1
01 0 1 02 0 2 3 0 02
2 21 2 3
1 1ˆ 1 11 1
1 1 1 11 1
n nn N nn n n nn n n N n n nnn nr
n nn n n N N n n Nn n
N NN nn n n nn n n N n n nnN N
N Nn n N N n n NN N
⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎛ ⎞− − − − −
= − −⎜ ⎟ ⎜ ⎟⎜ ⎟− −⎜ ⎟ ⎜ ⎟⎝ ⎠⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎛ ⎞− − − − −
+ +⎜ ⎟ ⎜ ⎟⎜ ⎟− −⎜ ⎟ ⎜ ⎟⎝ ⎠⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
After algebraic simplification, we have
( )
30
0 0 01
1ˆ1
k kk
k k
n N n n nr n nn n n N N n=
⎧ ⎫⎛ ⎞− ⎪ ⎪⎛ ⎞= − − −⎨ ⎬⎜ ⎟⎜ ⎟− ⎝ ⎠⎪ ⎪⎝ ⎠⎩ ⎭∑ . (7.17)
This is a “post-stratified” estimator. To obtain its variance, we first evaluate its 20ρ •
( ) ( )
( ) ( )( )
2 1 20 0 0 0 0
0 0 1 2 3
2 2 1 2 01 0 101 0 1 02 0 2
1 2 1 1 02 0 2
ˆ 垐 垐
1 1 11
Nn n n n n N N N
N N N N N nn n nnn n n nn n n
N N N N N nn n n
ρ σ−• • • • •= =
−=
− −
− −⎛ ⎞⎛ ⎞× − − ⎜ ⎟⎜ ⎟− −⎝ ⎠⎝ ⎠
σ Σ σ σ
148
( ) ( )
( ) ( )( )( )( ) ( )
( ) ( )( )( )
2 21 2 01 0 1 1 2 02 0 2
20? 2 01 0 1 02 0 2
0 0 1 2 3 2 21 3 02 0 2 2 3 01 0 1
21 2 01 0 1 02 0 2
20 0 1 2 3 1 3 02 0 2 2 3 01 0
1 1 1ˆ 21
1 1 11
N N nn n n N N nn n nN N N nn n n nn n n
n n n n n N N NN N nn n n N N nn n n
N N nn n n nn n nNn n n n n N N N N N nn n n N N nn n n
ρ
⎧ ⎫− + −⎪ ⎪− ⎪ ⎪= × + − −⎨ ⎬
− − ⎪ ⎪+ − + −⎪ ⎪⎩ ⎭
− + −−= ×
− − + − + −( )
( ) ( )
21
32
0 010 0
1 1 1 .1 k k
k k
N nn n nn nn n n N=
⎧ ⎫⎪ ⎪⎨ ⎬⎪ ⎪⎩ ⎭
⎛ ⎞−⎛ ⎞= −⎜ ⎟⎜ ⎟⎜ ⎟− −⎝ ⎠⎝ ⎠∑
Therefore, its variance is
( ) ( )( ) ( )
( )3
20 00 0 022
1
1 1 1 1ˆvar1 1 k k
k k
n n nf f Nr nn n nn n n n Nn n =
⎛ ⎞−− − −⎛ ⎞ ⎛ ⎞= − −⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟−⎝ ⎠ ⎝ ⎠ −⎝ ⎠∑ (7.18)
Results (7.17) and (7.18) can be readily extended to scenarios where 3p >
categories are present:
( )
00 0 0
1
1ˆ1
pk k
kk k
n N n n nr n nn n n N N n=
⎧ ⎫⎛ ⎞− ⎪ ⎪⎛ ⎞= − − −⎨ ⎬⎜ ⎟⎜ ⎟− ⎝ ⎠⎪ ⎪⎝ ⎠⎩ ⎭∑ , (7.19)
and
( ) ( )( ) ( )
( )20 00 0 022
1
1 1 1ˆvar1 1
p
k kk k
n n nf Nr nn n nn n n Nn n =
⎧ ⎫−− −⎪ ⎪⎛ ⎞= − −⎨ ⎬⎜ ⎟ −⎝ ⎠ −⎪ ⎪⎩ ⎭∑ . (7.20)
Further, if the sample estimator of •Σ is used, we have
( )( )
2 2 1 21
1 2 1 11 2 3
1ˆ n n n n nnn n n n nn n n
−•
−⎛ ⎞−= ⎜ ⎟−⎝ ⎠
Σ , and ( )
01 0 10
02 0 2
1ˆ1
nn n nnn n nn n•
−⎛ ⎞= ⎜ ⎟−− ⎝ ⎠
σ , and then
101 0
1 30 03 0
3202 0
2
111ˆ11
nn nn n nn n
n nnn nn n
•
⎛ ⎞⎛ ⎞−⎜ ⎟⎜ ⎟⎛ ⎞⎝ ⎠⎜ ⎟ ⎛ ⎞= − − ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠⎛ ⎞ ⎝ ⎠⎜ ⎟−⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
β . (7.21)
149
( ) ( )
( )
( )
0 0 0 0
1 10 1 2 301 0 02 0 03 0
2 21 2 3
1 11 2 301 0 02 0 03 0
2 21 2 3
ˆ
1 1 1 1 1 1 1
1 1 1 1 1 1 1
r r
n nn n n nn n n n n nn nn n n n n n n n n
N Nn n nn n n n n nN NN n n n n N n n
• •• •′ ′= − +
⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞= − − − + −⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠
⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞+ − − − −⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠
β Y β μ
0 1 1 2 2 3 301 0 02 0 03 0
1 2 3
1 1 2 2 3 301 0 02 0 03 0
1 2 3
0 1 201 0 02 0
1 1 1 1
1 1 1 1
1
n n n n n n nn n n n n nn n n n n n n n n n
N n N n N nn n n n n nn N n n N n n N n
n n nn n n nn n n n
⎧ ⎫⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎛ ⎞= − − + − + − −⎨ ⎬⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠⎩ ⎭
⎧ ⎫⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎛ ⎞+ − + − − − −⎨ ⎬⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠⎩ ⎭
⎛ ⎞ ⎛ ⎞= − − + −⎜ ⎟ ⎜⎝ ⎠ ⎝
303 0
1 1 2 2 3 301 0 02 0 03 0
1 2 3
3 30
0 0 0 00 0
1
1 1k k kk k
k k k
nn nn
N n N n N nn n n n n nN n n n n n n
n n N nn n n nn n n N n n= =
⎧ ⎫⎛ ⎞+ −⎨ ⎬⎟ ⎜ ⎟⎠ ⎝ ⎠⎩ ⎭
⎧ ⎫⎛ ⎞ ⎛ ⎞ ⎛ ⎞+ − + − + −⎨ ⎬⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎩ ⎭
⎧ ⎫⎛ ⎞ ⎛ ⎞= − − + −⎨ ⎬⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠⎩ ⎭
∑ ∑
After algebraic simplification, we have
3
00 0 0
0
1ˆ k k kk
k k k
n N n nnr n nn n n N N n=
⎛ ⎞ ⎛ ⎞= − − −⎜ ⎟ ⎜ ⎟⎝ ⎠⎝ ⎠
∑ . (7.22)
To obtain its variance, we first calculate 20ρ • ,
( ) ( ) ( )( )
2 1 20 0 0 0
2 2 1 2 01 0 101 0 1 02 0 2
1 2 1 1 02 0 21 2 3 0 0
ˆˆ 垐 �
1 1 n n n n n nn n nnn n n nn n n
n n n n n nn n nnn n n n n n
ρ σ−• • • •=
− −⎛ ⎞⎛ ⎞= − − ⎜ ⎟⎜ ⎟− −− ⎝ ⎠⎝ ⎠
σ Σ σ
This can be simplified as follows.
( )( )( ) ( )( )( )( ) ( )( )
( )( ) ( ) ( )
( )( )
2 01 2 01 0 1 1 02 1 02 0 220�
0 0 1 2 3 1 2 0 02 01 0 1 1 2 0 01 02 0 2
1 2 0 01 0 1 1 2 0 02 0 2 1 2 0 03 3 0
0 0 1 2 3 2 3 01 1 03 01 0 1 1 3 02
1 1ˆ
1 1
n n n n nn n n n n n n nn n nn n n n n n n n n n nn n n n n n n nn n n
n n n nn n n n n n nn n n n n n nn n nn n n n n n n n n n n nn n n n n n
ρ− − + − −⎧ ⎫⎪ ⎪= ⎨ ⎬
− − − − − − −⎪ ⎪⎩ ⎭− + − + −
=− + − − + ( )( )2 03 02 0 2n n nn n n
⎧ ⎫⎪ ⎪⎨ ⎬
− −⎪ ⎪⎩ ⎭
150
( ) ( )( ) ( )( ){ }
( ) { }
( )
( )
20? 3 01 1 03 01 0 1 1 3 02 2 03 02 0 2
0 0 1 2 3
2 2 2 22 3 01 1 3 02 1 2 03 1 2 3 0
0 0 1 2 3
2 2 2 201 02 03 0
0 0 1 2 3
23200
10 0
1 1ˆ
1 1
1
1 k
k k
n n n n n nn n n n n n n n nn n nn n n n n n
nn n n nn n n nn n n n n n nn n n n n n
n n nn n n nn n n n n n
nn nn n n n
ρ
=
= − − + − −−
= + + −−
⎧ ⎫= + + −⎨ ⎬
− ⎩ ⎭⎧ ⎫
= −⎨ ⎬− ⎩ ⎭
∑
Therefore, the variance of estimator (7.22) is
( ) ( )230
0 01
1ˆvar1
k
k k
f nr nn n n=
⎛ ⎞−= −⎜ ⎟− ⎝ ⎠
∑ . (7.23)
Analogously, these can be extend to situations where the subjects are classified
into more than three groups (i.e., 3p > ).
00 0 0
0
1ˆp
k k kk
k k k
n N n n nr n nn n n N N n=
⎛ ⎞⎛ ⎞= − − −⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
∑ . (7.24)
and its variance is
( ) ( )20
0 01
1ˆvar1
pk
k k
f nr nn n n=
⎛ ⎞−= −⎜ ⎟− ⎝ ⎠
∑ . (7.25)
7.6. Derivations of results (7.3) and (7.4) in Section 7.2
By applying results derived in Section 5.4 of Chapter 5, we have
( ) ( )( )0 0 0r r • ••′= − −β Y μ , (7.26)
where 10 0,−• • •=β Σ σ 1 2N N
N N•
′⎛ ⎞= ⎜ ⎟⎝ ⎠
μ and ( ) ( ) ( )( )1 2Y Y• ′=Y .
Since 1N and 2N are known, we have
151
( )
( )( )
12 11 2
11 2
111
N N NN NN N NN N
−
• −
⎛ ⎞− −= ⎜ ⎟⎜ ⎟− −− ⎝ ⎠
Σ ,
and
( )
( )2 2 1 21
1 2 1 11 2 3
1 N N N N NNN N N N NN N N
−•
−⎛ ⎞−= ⎜ ⎟−⎝ ⎠
Σ .
It is shown that ( )( )
0 020ˆ
1d n d
n nσ
−=
− and
( )01 0 1
002 0 2
1ˆ1
nn n nnn n nn n•
−⎛ ⎞= ⎜ ⎟−− ⎝ ⎠
σ . Therefore,
( )( )
( )
( )( )( ) ( )
( ) ( )( )
( )
( )
( )
2 2 1 2 01 0 1
01 2 3 1 2 1 1 02 0 2
2 2 01 0 1 1 2 02 0 2
1 2 3 1 2 01 0 1 1 1 02 0 2
01 0 11
02 0 22
1 1ˆ1
1 11
1
11 1
N N N N N nn n nNn n N N N N N N N N nn n n
N N N nn n n N N nn n nNn n N N N N N nn n n N N N nn n n
nn n nNN
n nnn n n
N
•
− −⎛ ⎞⎛ ⎞− ⎜ ⎟⎜ ⎟=⎜ ⎟⎜ ⎟− − −⎝ ⎠⎝ ⎠
− − + −⎛ ⎞− ⎜ ⎟=⎜ ⎟− − + − −⎝ ⎠
⎛ ⎞−⎜ ⎟− ⎜ ⎟= ⎜ ⎟−
⎜ ⎟−⎜ ⎟⎝ ⎠
β
( ) ( )3 0 023
11 11 1
N n n nnn n N
⎛ ⎞−+ − ⎜ ⎟⎜ ⎟− ⎝ ⎠
( ) ( )
( ) ( ) ( )
( ) ( ) ( )
0 0 0 0
1 1
0 01 0 1 02 0 2 3 0 02
2 21 2 3
1 1
01 0 1 02 0 2 3 0 02
2 21 2 3
ˆ
1 1 1 11 1
1 1 1 11 1
r rn n
n N nn n n nn n n N n n nnn nn nn n n N N n n Nn n
N NN nn n n nn n n N n n nnN N
N Nn n N N n n NN N
• •• •′ ′= − +
⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎛ ⎞− − − − −
= − −⎜ ⎟ ⎜ ⎟⎜ ⎟− −⎜ ⎟ ⎜ ⎟⎝ ⎠⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
⎛ ⎞ ⎛⎜ ⎟⎛ ⎞− − − − −
+ +⎜ ⎟⎜ ⎟− −⎜ ⎟⎝ ⎠⎜ ⎟⎝ ⎠ ⎝
β Y β μ
⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟
⎠
After algebraic simplification, we have
( )
30
0 0 01
1ˆ1
k kk
k k
n N n n nr n nn n n N N n=
⎧ ⎫⎛ ⎞− ⎪ ⎪⎛ ⎞= − − −⎨ ⎬⎜ ⎟⎜ ⎟− ⎝ ⎠⎪ ⎪⎝ ⎠⎩ ⎭∑ . (7.27)
This is a “post-stratified” estimator.
152
Further, if the sample estimator of •Σ is used, we have
( )( )
2 2 1 21
1 2 1 11 2 3
1ˆ n n n n nnn n n n nn n n
−•
−⎛ ⎞−= ⎜ ⎟−⎝ ⎠
Σ , and ( )
01 0 10
02 0 2
1ˆ1
nn n nnn n nn n•
−⎛ ⎞= ⎜ ⎟−− ⎝ ⎠
σ
( )( )
( )( ) ( )( ) ( )( )
( ) ( ){ }
2 2 1 2 01 0 1
01 2 3 1 2 1 1 02 0 2
2 2 01 0 1 1 2 02 0 2
1 2 3 1 2 01 0 1 1 1 02 0 2
2 1 01 0 1 02 0 2 3 01 0 1
1 2 3 1 2 01 0
1 1ˆ
1 1
1 1
n n n n n nn n n
n n n n n n n n n nn n n
n n n nn n n n n nn n n
n n n n n n nn n n n n n nn n n
n n nn n n nn n n n nn n n
n n n n n n nn n
•
− −⎛ ⎞⎛ ⎞⎜ ⎟⎜ ⎟=
⎜ ⎟⎜ ⎟− −⎝ ⎠⎝ ⎠
− − + −⎛ ⎞⎜ ⎟=⎜ ⎟− + − −⎝ ⎠
− + − + −=
−
β
( ) ( ){ }1 02 0 2 3 02 0 2
101 0
1 303 0
3202 0
2
111
11
n nn n n n nn n n
nn nn n nn n
n nnn nn n
⎛ ⎞⎜ ⎟⎜ ⎟+ − + −⎝ ⎠
⎛ ⎞⎛ ⎞−⎜ ⎟⎜ ⎟⎛ ⎞⎝ ⎠⎜ ⎟ ⎛ ⎞= − − ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠⎛ ⎞ ⎝ ⎠⎜ ⎟−⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
( ) ( )
( )
0 0 0 0
1 1
0 1 2 301 0 02 0 03 0
2 21 2 3
1
1 2 301 0 02 0 03 0
21 2 3
ˆ
1 1 1 1 1
1 1 1
r rn n
n n n nn nn n n n n nn nn n n n n n nn n
Nn n nNn n n n n n
Nn n n n n nN
• •• •′ ′= − +
⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞= − − − + −⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎜ ⎟ ⎜ ⎟⎝ ⎠⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
⎛ ⎞⎜ ⎟⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛+ − − − −⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟
⎝ ⎠ ⎝ ⎠ ⎝⎜ ⎟⎝ ⎠⎜ ⎟⎝ ⎠
β Y β μ
( )1
2
0 1 1 2 2 3 301 0 02 0 03 0
1 2 3
1 1 2 2 3 301 0 02 0 03 0
1 2 3
001
1 1
1 1 1 1
1 1 1 1
1
NNNN
n n n n n n nn n n n n nn n n n n n n n n n
N n N n N nn n n n n nn N n n N n n N n
n nnn n
⎛ ⎞⎜ ⎟⎞ ⎜ ⎟⎜ ⎟
⎠ ⎜ ⎟⎜ ⎟⎝ ⎠
⎧ ⎫⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎛ ⎞= − − + − + − −⎨ ⎬⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠⎩ ⎭
⎧ ⎫⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎛ ⎞+ − + − − − −⎨ ⎬⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠⎩ ⎭
= − − 1 2 3 30 02 0 03 0 03 0
3
1 1 2 2 3 3 301 0 02 0 03 0 03 0
1 2 3 3
1
1 1
n n nn n n n n n nn n n n n
N n N n N n nn n n n n n n nN n n n n n n n n
⎧ ⎫⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞+ − + − + −⎨ ⎬⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎩ ⎭
⎧ ⎫⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞+ − + − + − − −⎨ ⎬⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎩ ⎭
153
3 30
0 0 0 0 00 0
30
0 00
1 1ˆ
1
k k kk k
k k k
k k kk
k k k
n n N nr n n n nn n n N n n
n N n n nn nn n n N N n
= =
=
⎧ ⎫⎛ ⎞ ⎛ ⎞= − − + −⎨ ⎬⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠⎩ ⎭
⎛ ⎞⎛ ⎞= − − −⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
∑ ∑
∑
After algebraic simplification, we have
3
00 0 0
0
1ˆ k k kk
k k k
n N n nnr n nn n n N N n=
⎛ ⎞ ⎛ ⎞= − − −⎜ ⎟ ⎜ ⎟⎝ ⎠⎝ ⎠
∑ .
154
CHAPTER 8
DIRECT RATE STANDARDIZATION USING
RANDOM PERMUTATION MODELS
8.1. Motivations
Results derived in Chapters 4 and 5 can be also used for “rate standardization” -
a common procedure in Epidemiology and vital statistics. In this chapter, we will show
that direct rate standardization is a special case of the estimators derived in Chapter 5
when rates are assumed to be equal across populations. We may view direct rate
standardization as a prediction problem, where one is trying to predict the population
death rates of the pooled population based on the two “samples”. Both populations,
observed either completely or partially, can be considered as two parts resulting from a
random partition of the pooled population. Therefore, the previously described
estimation strategies can be applied with minimal modification.
Without losing generality, we use gender-adjusted death rate as an example.
Results can be extended to situation where auxiliary variable is multinomial such as
age- or age and gender specific categories. We will consider a simple case of internal
direct standardization when two populations under comparison are observed
completely.
8.2. Rate estimation with internal direct adjustment
A common approach to rate standardization is to “pool” the populations under
study, and use the “combined population structure” for adjustment. For example, if two
populations A and B are under comparison, let us denote their sizes and gender ratio (%
155
of males) are A AN ,g and B BN ,g , respectively. Then the pooled population AB will
have size A BN N N= + and gender ratio ( ) ( )AB A A B B A Bg N g N g N N= + + . The rate
standardization can be made using ABg . This method was first proposed by Neison
(1844), who suggested comparing the mortality of two communities by having the
population of one community “actually transferred” to the other community and
subjecting it to “exactly the same rate of mortality as that prevailed” in other
community (Neison 1844).
Let us consider a simple case. Let ( )0jy represent survival status of a subject
( 1y = for death, and 0 for alive), and ( )1jy represent a subject’s gender (1 for male and 0
for female). Suppose that we are interested in estimating and comparing death rates of
population A and B (i.e., ( )0Ay and ( )0
By ). Further, suppose that population A consists of
AmN males and AfN females, and Am Af AN N N+ = ; population B consists of BmN males
and BfN females, and Bm Bf BN N N+ = . The pooled population AB thus consists of
m Am BmN N N= + males and f Af BfN N N= + , and m fN N N+ = subjects. Under
Neison’s approach, populations A and B are considered as random partitions of the
pooled population AB. Based on parts A and B , we will obtain a set of two predicted
total numbers of deaths (or death rates), i.e., ( )0ˆAT and ( )0
BT as well as Ar and Br . The
variance-covariance matrices of ( )0ˆAT and ( )0
BT (or Ar and Br ) can be also obtained.
Further, we will be able to estimate difference between Ar and Br and the rate ratio of
Ar to Br . Since the predicted total numbers are predictors of the “same” population,
they are comparable.
156
Suppose that two populations are completely observed, and the two observed
populations are viewed as a realization of random partition of the pooled population
into two parts that have the sizes AN and BN . Let ( ) ( )( )0 1 ′′ ′Y Y represent a random
permutation of the vector of response and auxiliary variables of the pooled population,
( ) ( )( ) ( ) ( )( )( )0 0 1 1A B A B
′′ ′ ′ ′=z y y y y . Using the method outlined in Chapter 5, we transform
( )1Y to ( ) ( )1 * 11N NN
⎛ ⎞= −⎜ ⎟⎝ ⎠
Y I J Y . We then partition ( ) ( )( )0 1 * ′′ ′Y Y into parts A and B
that have the same dimension as populations A and B,
( ) ( ) ( ) ( )0 1 * 0 1 *
A A B B
Part A Part B
′⎛ ⎞′ ′ ′ ′= ⎜ ⎟
⎜ ⎟⎝ ⎠
Z Y Y Y Y . (8.1)
The total number of deaths in the combined population is defined as
( ) ( )0 AA B
BT
⎛ ⎞′ ′= ⎜ ⎟
⎝ ⎠
ZZl l ,
where ( )1 0AA N
′= ⊗1l , ( )1 0BB N
′= ⊗1l , ( ) ( )( )0 1 *A A A
′′ ′=Z Y Y and
( ) ( )( )0 1 *B B B
′′ ′=Z Y Y .
Let assume that we are interested in deriving estimator of the overall total
number of deaths ( )0T based on populations A and B, i.e., ( )0ˆAT and ( )0
BT , respectively.
They can be defined using (8.1) as,
( )
( )
0
0
ˆ
ˆA A AA
B B BB
T
T
′ ′⎛ ⎞ +⎛ ⎞⎛ ⎞=⎜ ⎟ ⎜ ⎟⎜ ⎟′ ′⎜ ⎟ +⎝ ⎠⎝ ⎠⎝ ⎠
w 0 Z0 w Z
ll . (8.2)
157
The prediction error is defined as,
( )
( )
( )
( )
0 0
0 0
ˆ.
ˆA A A A A B AA A B
B B B B A B BA BB
T T
T T
′ ′ ′ ′⎛ ⎞ ⎛ ⎞ + −′ ′⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎛ ⎞⎛ ⎞− = − =⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟′ ′ ′ ′⎜ ⎟⎜ ⎟ + −′ ′⎝ ⎠⎝ ⎠⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠⎝ ⎠
w 0 Z Z w Z0 w Z Z w Z
l ll ll ll l
The unbiasedness of ( )0ˆAT and ( )0
BT for ( )0T implies that,
( ) ( )
( ) ( )
0 0
0 0
ˆ
ˆA A A B B
B B A AB
T TE E
T T
⎛ ⎞ ′ ′− −⎛ ⎞= =⎜ ⎟ ⎜ ⎟⎜ ⎟ ′ ′−− ⎝ ⎠⎝ ⎠
w Z Z0
w Z Zll
.
Since ( ) ( ) ( )( )02 0
AA NE μ ′= ⊗Z I 1 and ( ) ( ) ( )( )02 0
BB NE μ ′= ⊗Z I 1 , this constraint
is equivalent to
( )( )
( )( )
A A
B B
N A B N B
AN B A N
N NNN
⎛ ⎞ ⎛ ⎞′ ′− ⎛ ⎞⎜ ⎟ ⎜ ⎟= − =⎜ ⎟⎜ ⎟ ⎜ ⎟′ ′− ⎝ ⎠⎝ ⎠ ⎝ ⎠
1 0 w 1 0 0w 0
1 0 w 0 1 0, (8.3)
where ( )A B′′ ′=w w w .
From Chapter 2, we have ,
,
cov A A BA
B A BB
⎛ ⎞⎛ ⎞= ⎜ ⎟⎜ ⎟
⎝ ⎠ ⎝ ⎠
V VZV VZ
, where ,AA N N= ⊗V Σ P ,
,BB N N= ⊗V Σ P and , ,1
A BA B B A N NN ×⎛ ⎞′= = ⊗ −⎜ ⎟⎝ ⎠
V V Σ J . The variance-covariance
matrix of ( ) ( )( )0 0垐A BT T can be represented as
( )
( )
0, , ,
0, ,
ˆcov 2 .
ˆA A A A A B B B B A A B B B B B B B B A AA
B B A A B B B A A A A A B B A AB B A A AB
T
T
′ ′ ′ ′ ′ ′⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞= − +⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟′ ′ ′ ′ ′ ′⎜ ⎟ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠
w V w w V w V w V w V Vw V w w V w V w V w V V
l l l l l ll l l l l l
Furthermore, we can show that
( )
( ) ( ) ( )0
0
ˆcov 2 .
ˆA BAA
B A A A A B B BB ABB
Ttrace
T
⎧ ⎫⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎪ ⎪ ′ ′ ′ ′ ′= − + +⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎨ ⎬ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎝ ⎠ ⎝ ⎠⎪ ⎪⎝ ⎠⎩ ⎭
V 0 V 0w w w V V0 V 0 Vl l l l l l
(8.4)
158
Following the methods of Chapters 4 and 5, we derive the estimators ( )0AT and
( )0BT by minimizing the trace of ( ) ( )( )0 0垐cov A BT T subject to (8.3), which is equivalent to
minimizing the following optimization function
( ) ( )
( )( )
2
2 .A
B
A BAB A
B AB
N B
AN
NN
⎛ ⎞ ⎛ ⎞′ ′ ′Φ = −⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠⎧ ⎫⎛ ⎞′ ⎛ ⎞⎪ ⎪⎜ ⎟′+ −⎨ ⎬⎜ ⎟⎜ ⎟′ ⎝ ⎠⎪ ⎪⎝ ⎠⎩ ⎭
V 0 V 0w w w w0 V 0 V
1 0 0λ w
0 1 0
l l
Differentiating ( )Φ w with respect to w and λ , and setting the derivatives to zero
results in the following estimation equations,
( )( )
,
,ˆˆ
A
B
A
B
N
AA B B
B N B A A
B
N A
N
NN
⎛ ⎞⎛ ⎞⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟
⎛ ⎞⎜ ⎟⎜ ⎟⎝ ⎠ ⎛ ⎞⎛ ⎞⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟⎛ ⎞ ⎜ ⎟⎝ ⎠⎜ ⎟⎜ ⎟ ⎛ ⎞ ⎝ ⎠⎜ ⎟⎝ ⎠⎜ ⎟⎜ ⎟ =⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎝ ⎠⎝ ⎠ ⎛ ⎞⎜ ⎟⎝ ⎠ ⎜ ⎟⎜ ⎟⎜ ⎟⎛ ⎞ ⎜ ⎟′ ⎝ ⎠⎜ ⎟ ⎝ ⎠⎜ ⎟⎜ ⎟⎜ ⎟′⎜ ⎟⎝ ⎠⎝ ⎠
10
V 0 0 V 00 V 1 0 Vw0
0 λ1 0 0
00 1 0
ll
. (8.5)
Solving (8.5) yields the following unique solution,
01
01
110ˆ
ˆ ˆ 1 10
BA
B B
NNAA
BN N
B
NN
NN
β
β
⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞⊗⊗ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ −⎛ ⎞ ⎝ ⎠ ⎝ ⎠⎜ ⎟⎜ ⎟= = − +⎜ ⎟ ⎜ ⎟⎜ ⎟⎛ ⎞ ⎛ ⎞⎝ ⎠ ⎜ ⎟⎜ ⎟⊗ ⊗⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟−⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠
11w
w w1 1
. (8.6)
where 201 01 1β σ σ= . Using (8.6) and (8.2), it is convenient to show that
( )
( )
( ) ( ) ( )( )( )( ) ( ) ( )( )( )0 1 1
0 01
0 0 1 101
ˆ
ˆ
A AA
B B B
NT
T N
β μ
β μ
⎛ ⎞− −⎛ ⎞ ⎜ ⎟=⎜ ⎟ ⎜ ⎟⎜ ⎟ − −⎝ ⎠ ⎜ ⎟⎝ ⎠
Y Y
Y Y. (8.7)
159
where ( )1μ is the population mean of the auxiliary variable of the pooled population,
( )0AY and ( )1
AY , ( )0BY and ( )1
BY are the means of the response and auxiliary variables of
subpopulation A and B, respectively. The variance-covariance matrix of ( ) ( )( )0 0A BT T is
( )
( ) ( )0
2 2 20 01 1 01 010
1ˆ
cov 2 .ˆ 1
B
AA
AB
B
NNT
N NTN
σ β σ β σ
⎛ ⎞−⎜ ⎟⎛ ⎞ ⎜ ⎟= + −⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ −⎜ ⎟⎜ ⎟
⎝ ⎠
(8.8)
Application to gender-adjusted death rates
By definition, ( )1 m mA mB
A B
N N NN N N
μ += =
+,
( )00
01 1m f fm
m f
N N NNN N N N
σ⎛ ⎞
= −⎜ ⎟⎜ ⎟− ⎝ ⎠,
( )21 1
m fN NN N
σ =−
, ( )( )
0 020 1
N N NN N
σ−
=−
, and thus 0001
fm
m f
NNN N
β = − . When applied to
estimating gender-adjusted population totals of deaths, (8.7) can be simplified as
( )
( )
00 0
0
000 0
ˆ
ˆ
fA B mB mA m
A B A m fA
fB B A mA mB m
B A B m f
NN N N N NNN N N N N NT
NT N N N N NNN N N N N N
⎛ ⎞⎛ ⎞⎛ ⎞⎛ ⎞⎜ ⎟− − −⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎛ ⎞ ⎝ ⎠⎝ ⎠⎜ ⎟=⎜ ⎟⎜ ⎟ ⎛ ⎞⎜ ⎟⎛ ⎞⎛ ⎞⎝ ⎠ − − −⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠⎝ ⎠⎝ ⎠
. (8.9)
Its variance-variance matrix is
( )
( )
( )( ) ( )
2000 0 0
0
1ˆ
covˆ 1 11
B
A m f fA m
A m fB
B
NN N N NT N N N NN N N N N N N NT
N
⎛ ⎞−⎜ ⎟⎛ ⎞⎛ ⎞ ⎛ ⎞−⎜ ⎟⎜ ⎟= − −⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟− − ⎝ ⎠⎝ ⎠ − ⎝ ⎠⎜ ⎟⎜ ⎟⎝ ⎠
. (8.10)
Similarly, the standardized death rates and the variance-covariances are
160
( )
( )
00 0
0
000 0
ˆ
ˆ
fA B mB mA m
A B A m fA
fB B A mA mB m
B A B m f
NN N N N NN N N N N Nr
Nr N N N N NN N N N N N
⎛ ⎞⎛ ⎞⎛ ⎞− − −⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎛ ⎞ ⎜ ⎟⎝ ⎠⎝ ⎠
=⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎛ ⎞⎛ ⎞⎜ ⎟⎝ ⎠ − − −⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠⎝ ⎠
, (8.11)
( )
( )
( )( ) ( )
2000 0 0
0
1ˆ 1cov 11 1ˆB Am f fA m
A Bm fB
N NN N Nr N N N NN NN N N N N N Nr
⎧ ⎫⎛ ⎞ −⎛ ⎞ ⎛ ⎞−⎪ ⎪= − −⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎨ ⎬⎜ ⎟⎜ ⎟⎜ ⎟ −− − ⎝ ⎠⎪ ⎪⎝ ⎠⎝ ⎠ ⎩ ⎭, (8.12)
respectively.
Detailed derivations are given in Section 8.6.
8.3. Rate comparison with internal direct adjustment
With the results established in the previous sections, rates can be compared
using either their differences or their ratios.
8.3.1. Rate differences
When all units of the subpopulations are observed, one may evaluate the rate
difference by applying result (8.11). We define the rate difference between A and B as,
( ) ( )0 0 00 0 0垐 fA B mA mB mAB A B
A B A B m f
NN N N N NDR r rN N N N N N
⎛ ⎞⎛ ⎞= − = − + − −⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
. (8.13)
Applying result (8.12), the variance of ABDR is evaluated as
( ) ( )( )
( )
( )
( )
0
0
2 2 20 01 1 01 01
2
00 0 0
ˆ 1var 1 1 cov
1ˆ
2
1 1 .1 1
AAB
B
A B
m f fm
A B A B m f
rDR
rN
N N
N N NN N N NN N N N N N N N
σ β σ β σ
⎛ ⎞⎛ ⎞= − ⎜ ⎟⎜ ⎟⎜ ⎟ −⎝ ⎠⎝ ⎠
= + −
⎛ ⎞− ⎛ ⎞= − −⎜ ⎟⎜ ⎟⎜ ⎟− − ⎝ ⎠⎝ ⎠
161
Loosely based on (8.13) being a function of sample means, a test statistic may
be formulated based on the above results. One may test if ABDR is equal to zero based
on normal approximation. Notice that, when the two populations under comparison are
completely observed, ( )var ABDR are known constant.
8.3.2. Rate ratio
Similarly, the rate ratio of ( )0Ar to ( )0
Br is
( )
( )
00 00
000 0
ˆˆ
fA mB mA m
A B B A m fA BAB
AB fB mB mA m
A B B A m f
NNN N N NN N N N N Nr NRR
Nr NNN N N NN N N N N N
⎛ ⎞⎛ ⎞− − −⎜ ⎟⎜ ⎟⎜ ⎟⎛ ⎞ ⎝ ⎠⎝ ⎠= = ⎜ ⎟ ⎛ ⎞⎛ ⎞⎝ ⎠ + − −⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
. (8.14)
Result (8.14) can be also expressed alternatively as,
0
00 0
1 .B A BAB
A fB mB mA m
A B B A m f
NNN N NRRN NNN N N N
N N N N N N
⎧ ⎫⎪ ⎪
⎛ ⎞⎪ ⎪= −⎨ ⎬⎜ ⎟ ⎛ ⎞⎛ ⎞⎝ ⎠⎪ ⎪+ − −⎜ ⎟⎜ ⎟⎪ ⎪⎜ ⎟⎝ ⎠⎝ ⎠⎩ ⎭
Its variance can be evaluated using the first-order Taylor linearization of
AB A BRR r r= , expanding around ( )A Br r ′=r , i.e.,
( )ˆ| ˆ垐 RemainderAB A B ABRR r r RR =′= = + − +r r rΔ r r ,
where 2
1ˆ
AAB
B B
rRRr r
′⎛ ⎞∂= = −⎜ ⎟∂ ⎝ ⎠
rr
Δr
. When the remainder term is ignored, we have an
expression for the linearized error,
( )ˆ| ˆABL ABRR RRε =′= − ≅ −r r rΔ r r . (8.15)
162
Since ( ){ } ( )垐| |垐 0E E= =′ ′− = − =r r r r r rΔ r r Δ r r , ABRR is approximately unbiased for
ABRR . The linearized approximate variance is thus
( ) ( )0,1ˆ ˆcovAV R ′≅ r rΔ r Δ . (8.16)
A sample estimator can be obtained by substituting a consistent estimator of
( )ˆcov r and replace r in rΔ by r , ˆ 2
ˆ1垐
A
B B
rr r
′⎛ ⎞= −⎜ ⎟⎝ ⎠
rΔ , such that
( ) ( )2 2
垐1 1ˆ ˆcov垐 垐
A AAB
A B A B
r rV RRr r r r
′⎛ ⎞ ⎛ ⎞≅ − −⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
r . (8.17)
Equation (8.17) can be further simplified as
( ) ( ) ( )( ) ( )
2 22 20 0 00 0 0
4 20 0
1ˆ .1 1
B A A B m f fB mAB
A B A m f
N N N N N N NN N NN NV RRN N N N N N N N N N
⎧ ⎫+ ⎛ ⎞−⎪ ⎪≅ − −⎜ ⎟⎨ ⎬⎜ ⎟− −⎪ ⎪⎝ ⎠⎩ ⎭
If we assume that sample sizes are large enough so that asymptotic normality
applies, a 95% confidence interval of ABRR can be constructed using ( )ˆ ABV RR . A test
statistic can be formulated to test whether ABRR is equal to 1, that is, 垐A BR R= .
8.4. External direct adjustment
External direct adjustment method usually use an “outside” reference
population, for example, US Population based on recent census, or the immediately
larger population which includes the populations under comparison. For example, if we
are interested in estimating and comparing the directly adjusted death rates of Franklin
County (A) and Worcester County (B) in Massachusetts, we may choose the
Commonwealth of Massachusetts as the reference population.
163
Suppose we are interested in the adjusted rates for population A and B.
Similarly, populations A and B can be considered as two parts resulting from random
permutation of the larger reference population (the Commonwealth of Massachusetts).
The estimation process and the consequent estimators are invariant to those presented in
Section 8.2.
8.5. Discussion
In this Chapter, we have developed a method of internal direct rate
standardization based on random permutation models where we assume equal
population rates. Unlike those direct rate standardization method commonly used in
Epidemiology and vital statistics, this new method addresses the finiteness of the
populations involved and yields a variance-covariance matrix of the standardized rates
of the populations under study. In conventional approaches (Kahn and Sempos 1989),
the covariances between the rates of any two populations (or subclasses) are ignored.
With the variance-covariance, it is possible to conduct hypothesis testing based on
asymptotic normal approximation. When both populations under comparison are
observed completely, it is also possible to use non-parametric method to evaluate the
distribution of the two adjusted rates, their differences or ratios.
We limited our discussion to gender-adjusted rates. Extension of the above
results to age- or gender- and age-adjusted rates, rate differences and ratios are analogue
to cases where an auxiliary variable is multinomial. Applying the results developed in
Chapter 5 to internal direct standardization involving multiple auxiliary variables is an
interest field of research in the future.
164
8.6. Derivations of the results in Section 8.2.
As shown in (8.1), the estimators for ( ) ( )( )0 0A BT T ′ is defined as,
( )
( )
0
0
ˆ
ˆA A AA
B B BB
T
T
′ ′⎛ ⎞ +⎛ ⎞⎛ ⎞=⎜ ⎟ ⎜ ⎟⎜ ⎟′ ′⎜ ⎟ +⎝ ⎠⎝ ⎠⎝ ⎠
w 0 Z0 w Z
ll .
The corresponding prediction error is,
( )
( )
( )
( )
0 0
0 0
ˆ
ˆ
.
A A A AA A B
B B B BA BB
A B A
A B B
T T
T T
′ ′⎛ ⎞ ⎛ ⎞ + ′ ′⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞− = −⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟′ ′⎜ ⎟⎜ ⎟ + ′ ′⎝ ⎠⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠
′ ′−⎛ ⎞⎛ ⎞= ⎜ ⎟⎜ ⎟′ ′−⎝ ⎠⎝ ⎠
w 0 Z Z0 w Z Z
w Zw Z
l l ll l l
ll
The unbiasedness of ( )0AT and ( )0
BT for ( )0T implies that,
( ) ( )
( ) ( )
0 0
0 0
A A A B B
B B A AB
T TE E
T T
⎛ ⎞ ′ ′− −⎛ ⎞= =⎜ ⎟ ⎜ ⎟⎜ ⎟ ′ ′−− ⎝ ⎠⎝ ⎠
w Z Z0
w Z Zll
.
Since ( ) ( ) ( )( )02 0
AA NE μ ′= ⊗Z I 1 and ( ) ( ) ( )( )02 0
BB NE μ ′= ⊗Z I 1 , this constraint
is equivalent to
( )( )
( )( )
A A
B B
N A B N B
AN B A N
N NNN
⎛ ⎞ ⎛ ⎞′ ′− ⎛ ⎞⎜ ⎟ ⎜ ⎟= − =⎜ ⎟⎜ ⎟ ⎜ ⎟′ ′− ⎝ ⎠⎝ ⎠ ⎝ ⎠
1 0 w 1 0 0w 0
1 0 w 0 1 0. (8.18)
where ( )A B′′ ′=w w w .
Now, we apply results derived in Section 4.5.2 of Chapter 4 and Chapter 5. We
optimize the estimators in terms of the trace of ( ) ( )( )0 0垐cov A BT T . From Chapter 2, we
have ,
,
covA A BA
B A BB
⎛ ⎞⎛ ⎞⎜ ⎟=⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
V VZ
V VZ, where ,AA N N= ⊗V Σ P , ,BB N N= ⊗V Σ P and
165
, ,1
A BA B B A N NN ×⎛ ⎞′= = ⊗ −⎜ ⎟⎝ ⎠
V V Σ J . Therefore, ( ) ( )( )0 0垐cov A BT T can be represented as,
( )
( )
0,
0,
,
,
,
,
cov A B A A B A AA
A B B A B B BB
A B A A B A A
B A B A B B B
A A A B A
B B A B
T
T
′ ′⎛ ⎞ − −⎛ ⎞⎛ ⎞⎛ ⎞=⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟′ ′⎜ ⎟ − −⎝ ⎠⎝ ⎠⎝ ⎠⎝ ⎠
′ ′⎧ ⎫ ⎧ ⎫⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎪ ⎪ ⎪ ⎪= − −⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎨ ⎬ ⎨ ⎬′ ′⎪ ⎪ ⎪ ⎪⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎩ ⎭ ⎩ ⎭′⎛ ⎞⎛ ⎞
= ⎜ ⎟⎜ ⎟′⎝ ⎠⎝ ⎠
w V V ww V V w
w 0 0 V V w 0 00 w 0 V V 0 w 0
w 0 V V w0 w V V
l ll l
l ll l
,
,
, ,
, ,
, ,
,2
B A A B A
B A B A B B
A A A B A B A A B A
B B A B B A B A B B
A A A A A B B B B A A B B
B B A A B B B
′⎛ ⎞ ⎛ ⎞⎛ ⎞⎛ ⎞+⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟′⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠
′ ′⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞⎛ ⎞− −⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟′ ′⎝ ⎠⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠
′ ′ ′ ′⎛ ⎞= −⎜ ⎟′ ′⎝ ⎠
0 0 V V 00 w 0 V V 0
w 0 V V 0 0 V V w 00 w V V 0 0 V V 0 w
w V w w V w V w Vw V w w V w
l ll l
l ll l
l l
,
,
B
A A A A A B B
B B B B B A A
A AB B A A A
⎛ ⎞⎜ ⎟′ ′⎝ ⎠
′ ′⎛ ⎞+⎜ ⎟′ ′⎝ ⎠
wV w V w
V VV V
l l
l l l ll l l l
We derive estimators of ( ) ( )( )0 0A BT T by minimizing trace of ( ) ( )( )0 0cov ,A BT T .
The trace of ( ) ( )( )0 0cov ,A BT T is
( )
( )
( )
0, ,
0, ,
,
,
,
cov 2
2
A A A A A B B B B A A B B BA
B B A A B B B A A A A A B BB
B B B B B A A
A AB B A A A
A B AB A
B A B
Ttrace trace trace
T
trace
⎧ ⎫ ′ ′ ′ ′⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎪ ⎪ = −⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎨ ⎬ ′ ′ ′ ′⎜ ⎟ ⎝ ⎠ ⎝ ⎠⎪ ⎪⎝ ⎠⎩ ⎭′ ′⎛ ⎞
+ ⎜ ⎟′ ′⎝ ⎠⎛ ⎞ ⎛′ ′ ′= −⎜ ⎟⎝ ⎠ ⎝
w V w w V w V w V ww V w w V w V w V w
V VV V
V 0 V 0w w0 V 0 V
l ll l
l l l ll l l l
l l ( )A A A B B B
⎞′ ′+ +⎜ ⎟
⎠w V Vl l l l
The optimization function is thus defined as
( ) ( )
( )( )
,
,2
2 A
B
A B AB A
B A B
N B
AN
NN
⎛ ⎞ ⎛ ⎞′ ′ ′Φ = −⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠⎧ ⎫⎛ ⎞′ ⎛ ⎞⎪ ⎪⎜ ⎟′+ −⎨ ⎬⎜ ⎟⎜ ⎟′ ⎝ ⎠⎪ ⎪⎝ ⎠⎩ ⎭
V 0 V 0w w w w0 V 0 V
1 0 0λ w
0 1 0
l l
.
166
Differentiating ( )Φ w with respect to w and λ , we have
( )
( )( )
( )
,
,2 2 2
2
A
B
A
B
N
A A B B
B B A A N
N B
AN
NN
⎧ ⎛ ⎞⎛ ⎞⎪ ⎜ ⎟⎜ ⎟
⎛ ⎞ ⎛ ⎞⎛ ⎞⎪ ∂ ⎜ ⎟⎝ ⎠Φ = − +⎜ ⎟ ⎜ ⎟⎜ ⎟⎪ ⎜ ⎟⎜ ⎟ ⎜ ⎟∂ ⎛ ⎞⎝ ⎠⎝ ⎠ ⎝ ⎠⎪ ⎜ ⎟⎪ ⎜ ⎟⎜ ⎟⎨ ⎝ ⎠⎝ ⎠⎪⎪ ⎧ ⎫⎛ ⎞′ ⎛ ⎞∂ ⎪ ⎪⎪ ⎜ ⎟Φ = −⎨ ⎬⎜ ⎟⎪ ⎜ ⎟∂ ′ ⎝ ⎠⎪ ⎪⎝ ⎠⎪ ⎩ ⎭⎩
10
V 0 V 0 0w w λ0 V 0 Vw 1
00
1 0 0w w
λ 0 1 0
ll
Setting both derivatives to zero results in the following estimating equations
( )( )
,
,ˆˆ
A
B
A
B
N
AA B B
B N B A A
B
N A
N
NN
⎛ ⎞⎛ ⎞⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟
⎛ ⎞⎜ ⎟⎜ ⎟⎝ ⎠ ⎛ ⎞⎛ ⎞⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟⎛ ⎞ ⎜ ⎟⎝ ⎠⎜ ⎟⎜ ⎟ ⎛ ⎞ ⎝ ⎠⎜ ⎟⎝ ⎠⎜ ⎟⎜ ⎟ =⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎝ ⎠⎝ ⎠ ⎛ ⎞⎜ ⎟⎝ ⎠ ⎜ ⎟⎜ ⎟⎜ ⎟⎛ ⎞ ⎜ ⎟′ ⎝ ⎠⎜ ⎟ ⎝ ⎠⎜ ⎟⎜ ⎟⎜ ⎟′⎜ ⎟⎝ ⎠⎝ ⎠
10
V 0 0 V 00 V 1 0 Vw0
0 λ1 0 0
00 1 0
ll
. (8.19)
Solving (8.19) yields the following unique solution,
1 1,
,
ˆˆ
A
B
N
A A B AB
B B A BA N
− −⎛ ⎞⎛ ⎞⎜ ⎟⎜ ⎟⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎝ ⎠⎜ ⎟= −⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎛ ⎞⎝ ⎠⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠
10
V 0 V 0 V 0 0w λ0 V 0 V 0 V 1
00
ll
( )( )
( )( )
1
1
1,
,
ˆ
.
A
A
BB
A
B
N
N A
B NN
N A A B B B
B B A A AN
NN
−
−
−
⎛ ⎞⎛ ⎞⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎛ ⎞′ ⎛ ⎞⎜ ⎟⎜ ⎟⎝ ⎠⎜ ⎟= ⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ ⎛ ⎞′ ⎝ ⎠⎜ ⎟⎜ ⎟⎝ ⎠ ⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠⎝ ⎠⎛ ⎞⎛ ⎞′ ⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞⎜ ⎟⎜ ⎟× −⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟′⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠
10
1 0 0 V 0 0λ 0 V 10 1 0
00
1 0 0 V 0 V 00 V 0 V0 1 0
ll
After algebraic simplification, we have
167
( ) ( ){ }
( ) ( ){ }
1
1 1,
1
1 1,
ˆ
A
A A
B
B B
NN A N A A B B B
NN B N B B A A A
N
N
−
− −
−
− −
⎛ ⎞⎧ ⎫⎛ ⎞⎪ ⎪⎜ ⎟′ ′ −⎨ ⎜ ⎟⎬⎜ ⎟⎪ ⎪⎝ ⎠⎩ ⎭⎜ ⎟=⎜ ⎟⎧ ⎫⎛ ⎞⎪ ⎪′ ′⎜ ⎟−⎨ ⎜ ⎟⎬⎜ ⎟⎪ ⎪⎝ ⎠⎩ ⎭⎝ ⎠
11 0 V 1 0 V V
0λ
11 0 V 1 0 V V
0
l
l
,
and
( ) ( ){ }
( ) ( ){ }
1
1 1 1,1
,1 1
,1 1 1
,
ˆ
A A
A A
B B
B B
N NA N A N A A B B B
A A B B
B B A A N NB N B N B B A A A
N
N
−
− − −
−
− −
− − −
⎛ ⎞⎧ ⎫⎛ ⎞ ⎛ ⎞⎪ ⎪⎜ ⎟′ ′ −⎜ ⎟⎨ ⎜ ⎟⎬⎜ ⎟⎛ ⎞ ⎪ ⎪⎝ ⎠ ⎝ ⎠⎩ ⎭⎜ ⎟= −⎜ ⎟⎜ ⎟ ⎜ ⎟⎧ ⎫⎝ ⎠ ⎛ ⎞ ⎛ ⎞⎪ ⎪′ ′⎜ ⎟−⎜ ⎟⎨ ⎜ ⎟⎬⎜ ⎟⎪ ⎪⎝ ⎠ ⎝ ⎠⎩ ⎭⎝ ⎠
1 1V 1 0 V 1 0 V V
0 0V Vw
V V 1 1V 1 0 V 1 0 V V
0 0
lll
l
.
Since 1 1 1,AA N N
− − −= ⊗V Σ P , 1 1,BB N N
− −= ⊗V Σ P , , ,1
A BA B B A N NN ×⎛ ⎞′= = ⊗ −⎜ ⎟⎝ ⎠
V V Σ J ,
we have 1, 2
1A BA A B N N
AN N−
×
⎛ ⎞= ⊗ −⎜ ⎟−⎝ ⎠
V V I J , 1, 2
1B AB B A N N
BN N−
×
⎛ ⎞= ⊗ −⎜ ⎟−⎝ ⎠
V V I J ,
( )1, 1 0
BB B A A N− ′= − ⊗V V 1l , ( )1
, 1 0AA A B B N
− ′= − ⊗V V 1l . Consequently,
( )2
1 12 2 20 1 01
11 0
0σ
σ σ σ− ⎛ ⎞
=⎜ ⎟ −⎝ ⎠Σ , and ( )
1 2 2 21 0 1 01
21
11 0
0σ σ σ
σ
−
−⎛ ⎞⎛ ⎞ −=⎜ ⎟⎜ ⎟
⎝ ⎠⎝ ⎠Σ . Further,
( )2
1 12 2 20 1 01
A
A
N AN A
A
NNN N
σσ σ σ
− ⎛ ⎞′ =⎜ ⎟
− −⎝ ⎠
11 0 V
0, ( )
21 1
2 2 20 1 01
B
B
N BN B
B
NNN N
σσ σ σ
− ⎛ ⎞′ =⎜ ⎟
− −⎝ ⎠
11 0 V
0.
Therefore,
01 01
01
01
110
ˆ11
0
AAA
BBB
B
NNN A
A
ANN
NBB
NNN NNN
NN NNN N
β β
ββ
⎛ ⎞⎛ ⎞−⎜ ⎟⎜ ⎟⎛ ⎞⎛ ⎞ ⊗⎛ ⎞⎛ ⎞ ⎜ ⎟⊗ ⎜ ⎟⊗ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ − −⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎜ ⎟⎜ ⎟= − + = ⎜ ⎟⎜ ⎟⎜ ⎟ ⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎜ ⎟−⎜ ⎟⊗⎜ ⎟⊗ ⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⊗⎜ ⎟⎜ ⎟−⎝ ⎠⎝ ⎠⎝ ⎠ ⎜ ⎟⎝ ⎠ ⎜ ⎟⎜ ⎟⎜ ⎟−⎝ ⎠⎝ ⎠
111w
11 1
, (8.20)
where 201 01 1β σ σ= . Consequently,
168
01
01
11ˆ
0 A A A
B
A N N NA A
NN N
NN Nβ β
⎛ ⎞−⎛ ⎞⎛ ⎞ ⎜ ⎟= − ⊗ + ⊗ = ⊗⎜ ⎟⎜ ⎟ ⎜ ⎟− ⎜ ⎟⎝ ⎠ ⎝ ⎠ −⎝ ⎠
w 1 1 1
01
01
11ˆ
0 B B B
A
B N N NB B
NN N
NN Nβ β
⎛ ⎞−⎛ ⎞⎛ ⎞ ⎜ ⎟= − ⊗ + ⊗ = ⊗⎜ ⎟⎜ ⎟ ⎜ ⎟− ⎜ ⎟⎝ ⎠ ⎝ ⎠ −⎝ ⎠
w 1 1 1 .
The estimators for ( ) ( )( )0 0A BT T can be represented as
( )
( )
( )( )( )
( )
( )( )( )
( )
( ) ( )( )( ) ( )( )
( ) ( )( )
0
01 1 *0
00
01 1 *
0 1 *0 1 *01
01
0 1 *01
1ˆ
ˆ1
A
B
A A
B B
AN
A AA A AA
B B B BBN
B B
N A N AA AA
N B N BB
NNT
T NN
NNN
N NN
β
β
β β
β
⎛ ⎞⎛ ⎞′⎜ ⎟− ⊗ ⎜ ⎟⎜ ⎟⎜ ⎟′ ′⎛ ⎞ +⎛ ⎞⎛ ⎞ ⎝ ⎠
⎜ ⎟= =⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟′ ′⎜ ⎟ + ⎛ ⎞⎜ ⎟⎝ ⎠⎝ ⎠⎝ ⎠ ′− ⊗ ⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠⎛ ⎞′ ′−⎜ ⎟ −⎜ ⎟= =⎜ ⎟
′ ′−⎜ ⎟⎜ ⎟⎝ ⎠
Y1
Yw 0 Z0 w Z Y
1Y
1 Y 1 Y Y Y
Y1 Y 1 Y
ll
( ) ( )( )0 1 *01
,B Bβ
⎛ ⎞⎜ ⎟⎜ ⎟−⎜ ⎟⎝ ⎠
Y
or simply,
( )
( )
( ) ( ) ( )( )( )( ) ( ) ( )( )( )0 1 1
0 01
0 0 1 101
ˆ,
ˆ
A AA
B B B
NT
T N
β μ
β μ
⎛ ⎞− −⎛ ⎞ ⎜ ⎟=⎜ ⎟ ⎜ ⎟⎜ ⎟ − −⎝ ⎠ ⎜ ⎟⎝ ⎠
Y Y
Y Y (8.21)
where ( )1μ is the population mean of the auxiliary variable of the pooled population,
( )0AY and ( )1
AY , ( )0BY and ( )1
BY are the means of the response and auxiliary variables of
subpopulation A and B, respectively.
The variance-covariance matrix of ( ) ( )( )0 0A BT T is evaluated as follows.
169
( )
( )
( ) ( ) ( ) ( )( ) ( ) ( ) ( )
0,
0,
,
,
ˆcov
ˆ
.
A A A AA A BA
B B B BB A BB
A A A A A A A A B B B
B B B A A A B B B B B
T
T
′ ′⎛ ⎞ + +⎛ ⎞ ⎛ ⎞⎛ ⎞=⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟′ ′⎜ ⎟ + +⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠
′ ′ ′ ′⎛ + + + + ⎞= ⎜ ⎟⎜ ⎟′ ′ ′ ′+ + + +⎝ ⎠
w 0 w 0V V0 w 0 wV V
w V w w V ww V w w V w
l ll l
l l l ll l l l
Since ( )( )011AA A N
A
NN
β′ ′ ′+ = − ⊗w 1l and ( )( )011BB B N
B
NN
β′ ′ ′+ = − ⊗w 1l , we have
( ) ( ) ( )( )( )
( ) ( )
( )
( )
2
01 ,01
2
01 ,01
2 20 01
01 20101 1
2 2 20 01 1 01 01
11
11
11
2
A A A
A A A
A A A A A N N N NA
N N N NA
A
B
A
NN
NN
NN
NNN
ββ
ββ
σ σβ
βσ σ
σ β σ β σ
⎛ ⎞⎛ ⎞ ⎛ ⎞′ ′ ′+ + = − ⊗ ⊗ ⊗⎜ ⎟⎜ ⎟ ⎜ ⎟−⎝ ⎠⎝ ⎠ ⎝ ⎠
⎛ ⎞⎛ ⎞ ⎛ ⎞′= − ⊗⎜ ⎟⎜ ⎟ ⎜ ⎟−⎝ ⎠⎝ ⎠ ⎝ ⎠
⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞= −⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟−⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠
= + −
w V w 1 Σ P 1
Σ 1 P 1
l l
( ) ( ) ( )( )( )
( ) ( )
( )
( )
2
01 ,01
2
01 ,01
2 20 01
01 20101 1
2 2 20 01 1 01 01
11
11
11
2
B B B
B B B
B B B B B N N N NB
N N N NB
B
A
B
NN
NN
NN
NNN
ββ
ββ
σ σβ
βσ σ
σ β σ β σ
⎛ ⎞⎛ ⎞ ⎛ ⎞′ ′ ′+ + = − ⊗ ⊗ ⊗⎜ ⎟⎜ ⎟ ⎜ ⎟−⎝ ⎠⎝ ⎠ ⎝ ⎠
⎛ ⎞⎛ ⎞ ⎛ ⎞′= − ⊗⎜ ⎟⎜ ⎟ ⎜ ⎟−⎝ ⎠⎝ ⎠ ⎝ ⎠
⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞= −⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟−⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠
= + −
w V w 1 Σ P 1
Σ 1 P 1
l l
( ) ( ) ( ) ( )
( )
( )
, ,
2
01 01
2
0101
2 2 20 01 1 01 01
1 1
1 11
2
A B
A B
A A B B
A A A B B B B B B A A A
N NN N
A B
N N N NA B
NN N N
NN N N
N
β β
ββ
σ β σ β σ
×
×
′ ′ ′ ′+ + = + +
⎛ ⎞′ ⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞⎜ ⎟′= ⊗ ⊗ − ⊗⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟− −⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞′= − ⊗ −⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟− ⎝ ⎠⎝ ⎠⎝ ⎠⎝ ⎠
= − + −
w V w w V w
J1 Σ 1
Σ 1 J 1
l l l l
170
Consequently,
( )
( ) ( )0
2 2 20 01 1 01 010
1ˆ
cov 2 .ˆ 1
B
AA
AB
B
NNT
N NTN
σ β σ β σ
⎛ ⎞−⎜ ⎟⎛ ⎞ ⎜ ⎟= + −⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ −⎜ ⎟⎜ ⎟
⎝ ⎠
(8.22)
By definition, ( )1 m mA mB
A B
N N NN N N
μ += =
+, ( )
( )0 02
0 1N N N
N Nσ
−=
−,
( )21 1
m fN NN N
σ =−
,
( )00
01 1m f fm
m f
N N NNN N N N
σ⎛ ⎞
= −⎜ ⎟⎜ ⎟− ⎝ ⎠, and thus 00
01fm
m f
NNN N
β = − . Equation (8.21) can be
further simplified as follows,
( )
( )
0 001 010
00 0
01 01
ˆ
ˆ
A mA mB mA A B mB mA
A A B A A B AA
B mA mB mB B A mA mBB
B A B B B A B
N N N N N N N NN NN N N N N N N NT
N N N N N N N NT N NN N N N N N N N
β β
β β
⎛ ⎞ ⎛⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞+− − − −⎜ ⎟ ⎜⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟+⎛ ⎞ ⎜ ⎟ ⎜⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠
= =⎜ ⎟ ⎜ ⎟ ⎜⎜ ⎟ ⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞+⎜ ⎟ ⎜⎝ ⎠ − − − −⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟ ⎜ ⎟+⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝
00 0
00 0
.
fA B mB mA m
A B A m f
fB A mA mB m
B A B m f
NN N N N NNN N N N N N
NN N N N NNN N N N N N
⎞⎟⎟⎟⎟
⎜ ⎟⎜ ⎟⎠
⎛ ⎞⎛ ⎞⎛ ⎞⎛ ⎞⎜ ⎟− − −⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠⎝ ⎠⎜ ⎟=
⎛ ⎞⎜ ⎟⎛ ⎞⎛ ⎞− − −⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠⎝ ⎠⎝ ⎠
It is also shown that
( )( ) ( )
2
00 02 2 2 00 01 1 01 012
1 1m f fm
m f
N N NN N N NN N N N N N
σ β σ β σ⎛ ⎞−
+ − = − −⎜ ⎟⎜ ⎟− − ⎝ ⎠. (8.23)
Therefore,
( )
( )
( )( ) ( )
2000 0 0
0
1ˆ
covˆ 1 11
B
A m f fA m
A m fB
B
NN N N NT N N N NN N N N N N N NT
N
⎛ ⎞−⎜ ⎟⎧ ⎫⎛ ⎞ ⎛ ⎞−⎪ ⎪⎜ ⎟= − −⎜ ⎟ ⎜ ⎟⎨ ⎬⎜ ⎟⎜ ⎟⎜ ⎟ − −⎪ ⎪⎝ ⎠⎝ ⎠ − ⎩ ⎭⎜ ⎟⎜ ⎟⎝ ⎠
.
171
Through similar derivation, the estimators and the variance-covariance matrix
for gender-adjusted rates of populations A and B are shown to be
( )
( )
00 0
0
000 0
ˆ
ˆ
fA B mB mA m
A B A m fA
fB B A mA mB m
B A B m f
NN N N N NN N N N N Nr
Nr N N N N NN N N N N N
⎛ ⎞⎛ ⎞⎛ ⎞− − −⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎛ ⎞ ⎜ ⎟⎝ ⎠⎝ ⎠
=⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎛ ⎞⎛ ⎞⎜ ⎟⎝ ⎠ − − −⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠⎝ ⎠
,
( )
( )
( )( ) ( )
2000 0 0
0
1ˆ 1cov 1 1 1ˆB A m f fA m
A B m fB
N N N N Nr N N N NN NN N N N N N Nr
⎧ ⎫⎛ ⎞ − ⎛ ⎞⎛ ⎞ −⎪ ⎪= − −⎜ ⎟ ⎜ ⎟⎜ ⎟⎨ ⎬⎜ ⎟ ⎜ ⎟⎜ ⎟ − − −⎝ ⎠⎪ ⎪⎝ ⎠⎝ ⎠ ⎩ ⎭,
respectively.
172
CHAPTER 9
CONCLUDING REMARKS
This dissertation proposed alternative methods of deriving estimators using
random permutation model (Stanek, Singer and Lencina 2003) for settings of simple
random sampling without replacement (SRSWOR) and stratified simple random
sampling without replacement (STSRS). We integrated techniques from several areas
in survey sampling. We used a finite random permutation as a probability link between
samples and the population. This framework does not require assumptions about
parametric distribution, and the randomness is attributable only to random sampling
(permutation). We represented the joint permutation of response and auxiliary variables
with a seemingly unrelated regression setup, and thus the structural relationship
between the permuted population and the population was represented explicitly in
matrix format. This setup provides a convenient mathematical vehicle for deriving
estimators when multiple variables are involved. Such a setup enabled us to derive
linear unbiased estimators under design-based framework (Stanek, Singer and Lencina
2003) using estimation techniques that are commonly applied in prediction-based
approaches, such as Royall’s general prediction theorem (Royall 1976; Valliant,
Dorfman and Royall 2000). When multiple estimators are simultaneous estimated, this
approach is capable of retaining the covariance between the estimators, which is a
feature particularly useful in rate standardization and small-area estimates. Finally, we
used finite estimating equations approach (Binder and Patak 1994) to optimize the
173
estimators by minimizing either generalized mean squared error or the trace of
variance-covariance matrix of the target parameters subject to various constraints.
With this proposed method, we obtained explicit expression for the joint
permutation of response and multiple auxiliary variables, and formulae for the
population mean vector and their variance-covariance matrix. These results are similar
to those commonly seen in literature regarding seemingly unrelated regression (Zellner
1962; Zellner 1963; Revankar 1974; Srivastava and Giles 1987; Gao and Huang 2000;
Liu 2000), but our results incorporate finiteness and rely on neither parametric nor
superpopulation model assumptions.
We showed that sampling can be represented as a partition of the joint
permutation into the sample and the remainder parts, and the vector of population
means can be represented as a linear function of the random variables arising from the
permutation. We derived linear unbiased minimum variance predictors of population
means by predicting the remainder parts based on the sample, which may be called as
“design-based predictors”. Using this method, we derived, in Chapter 4, two classes of
estimators for population totals when auxiliary information was not available. The two
classes were defined by two sets of weights that differed in whether identical weights
were applied to each variable. The estimators were optimized by minimizing either the
trace of ( )ˆcov T or M-estimation criteria. Under SRSWOR and unbiasedness
constraints, it is shown that the estimators are identical to simple expansion estimators.
It is apparent that neither the method of defining estimators nor the optimization criteria
had impact on the estimation.
174
In Chapter 5, we derived estimators adjusted for known auxiliary information
through reparameterization. This approach yielded a set of weights that are not sample-
dependent, and the estimators are unbiased by definition. In contrast, the usual
calibration technique of incorporating auxiliary information leads to sample-dependent
weights, and the corresponding estimators do not have an unbiasedness property
(Deville and Särndal 1992).
Although the estimators derived in this dissertation have a similar functional
form to those derived using design-based regression model (Cochran 1977), model-
assisted (Särndal, Swensson and Wretman 1992), model-based (Bolfarine and Zacks
1992; Valliant, Dorfman and Royall 2000) and calibration approaches (Deville and
Särndal 1992; Brewer 1999), they depend on neither the superpopulation model nor
regression model assumptions. Further, the characteristics of design-based predictors
are best illustrated in contrast to usual calibration estimators:
1) Both approaches incorporate known auxiliary information and satisfy
constraints for the known auxiliary quantities.
2) Design-based predictors are optimized in terms of its mean squared error or
its variance, a direct and natural measure of precision; while calibration estimators are
optimized in terms of the distance between adjusted and naïve weights, with no
assurance that MSE or variance is minimum.
3) Unbiasedness constraints can be explicitly prescribed in design-based
prediction method but not in the usual calibration method.
175
4) Design-based prediction approach considers weights for auxiliary variables
irrelevant to the estimation of parameters, while calibration approach is based on
argument that identical weights should be used for response and auxiliary variables.
5) Unlike the usual calibration technique, the design-based prediction approach
leads to a set of coefficients (weights) that do not depend on the sample.
6) Another advantage of design-based predictors is that it always has a unique
solution when known population variance (or variance-covariance matrix) of auxiliary
variable(s) is used for estimation. In contrast, calibration estimator is computed based
on sample variance (or variance-covariance matrix) of the auxiliary variable(s), there is
no unique solution when sample variance-covariance matrix of the auxiliary variables is
singular.
In Chapter 6, we extended the results of Chapter 5 to settings of STSRS. By
treating the sampling process in each stratum as independent permutation processes,
extension of the results to STSRS is straightforward.
We applied these results to rate estimation in Chapter 7. When the response
variable is binomial and the auxiliary variable is categorical, we showed that the
derived estimators are post-stratified estimators when sample variance-covariances are
used for estimation, and equal to those derived using model-assisted (Särndal,
Swensson and Wretman 1992) or calibration (Deville and Särndal 1992) approaches.
Through a series Monte Carlo simulations, we illustrated that using known population
variance-covariance matrix in estimation has its competitive edge, which always yields
a unique solution with equal or smaller MSE to estimators based on sample auxiliary
variance-covariances.
176
We showed, in Chapter 8, that rate standardization based on a random
permutation model may account for the finiteness of the involved populations and
yields a variance-covariance matrix of the corresponding standardized rates. Unlike the
conventional rate standardization methods that assume the rate estimations are
independent from each other (Kahn and Sempos 1989), we obtained the covariance of
the estimated (standardized) rates under a model that assume equal rates in each
population.
In conclusion, we demonstrated in this dissertation a design-based prediction
approach of incorporating auxiliary information through reparameterization under
random permutation framework. We showed that this method is a useful alternative
approach of deriving linear unbiased estimators.
This dissertation research has its limitations. We considered only linear
unbiased minimum variance estimators for (stratified) simple random sampling without
replacement and use of auxiliary variables at the estimation stage. The usefulness of the
design-based prediction approach can be extended with further research on its
application to other estimation/prediction problems, such as small-domain or –area
estimation, unit non-response adjustment, and estimation of non-linear function. It is
interesting to examine whether relaxation of unbiasedness constraints or use of other
optimal criteria such as log-determinants or M-estimation will result in improvement or
loss of efficiency in estimation. Another challenge to this design-based prediction
approach is to develop estimation/prediction method for more complex sampling
design, such as multiple stage cluster random sampling or other type of unequal
probability sampling.
177
BIBLIOGRAPHY
Binder, D. A. and Z. Patak (1994). Use of estimating functions for estimation from complex surveys. Journal of the American Statistical Association 89(427): 1035-1043.
Bolfarine, H. and S. Zacks (1992). Prediction theory for finite populations. New York, Springer-Verlag.
Breslow, N. E. and N. E. Day (1987). Design and analysis of cohort studies. Lyon, France, IARC Scientific Publication N. 82.
Brewer, K. R. W. (1963). Ratio estimation and finite populations: Some results deducible from the assumption of underlying stochastic process. Australian Journal of Statistics 5: 93-105.
Brewer, K. R. W. (1979). A class of robust sampling designs for large-scale surveys. Journal of the American Statistical Association 74: 911-915.
Brewer, K. R. W. (1995). Combining design-based and model-based inference (chapter 30). Business survey methods. B. B. Cox, DA; Chinnappa, BN; Christianson, A; Colledge, MJ; Kott, PS. New York, John Wiley & Sons, Inc.: 589-606.
Brewer, K. R. W. (1999). Cosmetic calibration with unequal probability sampling. Survey methodology 25: 205-212.
Brewer, K. R. W. (1999). Design-based or prediction-based inference? Stratified random vs stratified balanced sampling. International Statistical Review 67(1): 35-47.
Cassel, C. M., C.-E. Särndal and J. H. Wretman (1977). Foundations of inference in survey sampling. New York, NY, Wiley.
CDC (1999). Notice to readers: New population standard for age-adjusting death rates. Morbidity and Mortality Weekly Report 48: 126-127.
Choi, B. C., N. A. de Guia and P. Walsh (1999). Look before you leap: Stratify before you standardize. Am J Epidemiol 149(12): 1087-1096.
Cochran, W. G. (1977). Sampling techniques. New York, John Wiley and Sons. Deville, J. C. and C. E. Särndal (1992). Calibration estimators in survey sampling.
Journal of the American Statistical Association 87: 376-382. Duchesne, P. (2000). A note on jackknife variance estimation for the general regression
estimator. Journal of Official Statistics 16(2): 133-138. Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics
7: 1-26. Efron, B. and R. J. Tibshirani (1993). An introduction to the bootstrap. New York,
Chapman and Hall. Estevao, V. M. and C.-E. Särndal (2000). A functional form approach to calibration.
Journal of Official Statistics 16(4): 379-399. Esteve, J., E. Benhamou and L. Raymond, Eds. (1994). Descriptive epidemiology.
Statistical methods in cancer research no. 128. Lyon, France, IARC Scientific Publication.
Feldstein, M. S. (1966). A binary variable multiple regression method of analysing factors affecting peri-natal mortality and other outcomes of pregnancy. Journal of the Royal Statistical Society, Series A 129: 61-73.
178
Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York, NY, John Wiley & Sons, Inc.
Forthofer, R. N. and R. G. Lehnen (1981). Public program analysis: A new categorical data approach. Belmont, CA, Lifetime Learning Publications.
Gao, D.-d. and R.-B. Huang (2000). Some finite sample properties of zellner estimator in the context of m seemingly unrelated regression equations. Journal of Statistical Planning and Inference 88: 267-283.
Ghosh, M. and J. N. K. Rao (1994). Small area estimation: An appraisal. Statistical Science 9(1): 55-76.
Godambe, V. P. (1955). A unified theory of sampling from finite populations. Journal of the Royal Statistical Society B 17: 269-278.
Goldman, D. A. and J. D. Brender (2000). Are standardized mortality ratios valid for public health data analysis? Statistics in Medicine 19: 1081-1088.
Graybill, F. A. (1976). Theory and application of the linear model. Belmont, CA, Wadsworth Publishing Company, Inc.
Grizzle, J. E., C. F. Starmer and G. G. Koch (1969). Analysis of categorical data by linear models (corr: V26 p860). Biometrics 25: 489-504.
Hansen, M. H., W. G. Madow and B. J. Tepping (1983). An evaluation of model-dependent and probability-sampling inferences in sample surveys. Journal of the American Statistical Association 78(384): 776-807.
Harville, D. A. (1997). Matrix algebra from a statistician's perspective. New York, Springer-Verlag.
Holt, D. and T. M. F. Smith (1979). Post stratification. Journal of the Royal Statistical Society, Series A, General 142: 33-46.
Holt, D., T. M. F. Smith and T. J. Tomberlin (1979). A model-based approach to estimation for small subgroups of a population. Journal of the American Statistical Association 74: 405-410.
Horvitz, D. G. and D. J. Thompson (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association: 663-685.
Inskip, H., V. Beral, P. Fraser and J. Haskey (1983). Methods for age-adjustment of rates. Statistics in Medicine 2(4): 455-466.
Jagers, P. (1986). Post-stratification against bias in sampling. International Statistical Review 54(2): 159-167.
Jones, H. L. (1974). Jackknife estimation of functions of stratum means. Biometrika 61(2): 343-348.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lutkepohl and T. C. Lee (1985). The theory and practice of econometrics. New York, John Wiley and Sons.
Julious, S. A., J. Nicholl and S. George (2001). Why do we continue to use standardized mortality ratios for small area comparisons? Journal of Public Health Medicine 23(1): 40-46.
Kahn, H. A. and C. T. Sempos (1989). Statistical methods in epidemiology. New York, Oxford University Press.
Lee, W.-C. (2002). Standardization using the harmonically weighted ratios: Internal and external comparisons. Statistics in Medicine 21: 247-261.
179
Lee, W.-C. and Y.-P. Liaw (1999). Optimal weighting systems for direct age-adjustment of vital rates. Statistics in Medicine 18: 2645-2654.
Little, R. J. A. (1982). Direct standardization - a tool for teaching linear-models for un-balanced data. American Statistician 36(1): 38-43.
Liu, J. S. (2000). Msem dominance of estimators in two seemingly unrelated regressions. Journal of Statistical Planning and Inference 88(2): 255-266.
Liu, J. S. and S. G. Wang (1999). Two-stage estimate of the parameters in seemingly unrelated regression model. Progress in Natural Science 9(7): 489-496.
McCarthy, P. J. and C. B. Snowden (1985). The bootstrap and finite population sampling. Vital and Health Statistics. Ser. 2. No. 95. DHHS Pub. No. (PHS) 85-1369, Public Health Service, Washington, US Goverment Printing Office.
Mukhopadhyay, P. (2000). Topics in survey sampling. New York, Springer-Verlag. Neison, F. G. P. (1844). On a method recently proposed for conducting inquires into the
comparative sanitary condition of various districts. London Journal of Royal Statistical Society 7: 40-68.
Oh, H. L. and F. J. Scheuren (1983). Weighting adjustments for unit nonresponse. Incomplete data in sample survey - theory and bibliographies. W. G. O. Madow, I. and Rubin, D. B. New York, Academic Press. 2: 435-483.
Rao, J. N. K. (1997). Developments in sample survey theory: An appraisal. The Canadian Journal of Statistics 25(1): 1-21.
Rao, J. N. K. (1999). Some current trends in sample survey theory and methods. Sankhya : The Indian Journal of Statistics, Series B 61(1): 1-57.
Rao, J. N. K. (1999). Some recent advances in model-based small area estimation. Survey Methodology 25(2): 175-186.
Rao, J. N. K. and D. R. Bellhouse (1978). Optimal estimation of a finite population mean under generalized random permutation models. Journal of Statistical Planning and Inference 2: 125-141.
Rao, J. N. K. and C. F. J. Wu (1988). Resampling inference with complex survey data. Journal of the American Statistical Association 83(401): 231-241.
Revankar, N. S. (1974). Some finite sample results in the context of two seemingly unrelated regression equations. Journal of the American Statistical Association 69(345): 187-190.
Revankar, N. S. (1976). Use of restricted residuals in sur systems: Some finite results. Journal of the American Statistical Association 71(353): 183-188.
Rosenbaum, P. R. (1987). Model-based direct adjustment. Journal of the American Statistical Association 82(398): 387-394.
Rothman, K. J., Ed. (1986). Modern epidemiology. Boston, Little, Brown and Company.
Royall, R. M. (1970). Finite population sampling - on labels in estimation. The Annals of Mathematical Statistics 41(5): 1774-1779.
Royall, R. M. (1970). On finite population sampling theory under certain linear regression models. Biometrika 57(2): 377-387.
Royall, R. M. (1971). Linear regression models in finite population sampling. Foundations of statistical inference. V. P. Godambe and D. A. Sprott. Toronto, Holt, Rinehart and Winston: 259-279.
180
Royall, R. M. (1976). The linear least-squares prediction approach to two-stage sampling. Journal of the American Statistical Association 71: 657-664.
Royall, R. M. and W. G. Cumberland (1978). Variance estimation in finite population sampling. Journal of the American Statistical Association 73: 351-358.
Särndal, C. E., B. Swensson and J. Wretman (1992). Model assisted survey sampling. New York, Springer-Verlag.
Särndal, C. E., B. Swensson and J. H. Wretman (1989). The weighted residual technique for estimating the variance of the general regression estimator of the finite population total. Biometrika 76: 527-537.
Särndal, C. E. and R. L. Wright (1984). Cosmetic form of estimators in survey sampling. Scandinavian Journal of Statistics 11: 146-156.
Searle, S. R., G. Cassella and C. E. McCulloch (1992). Variance components. New York, John Wiley & Sons, Inc.
Singh, A. C. and C. A. Mohl (1996). Understanding calibration estimators in survey sampling. Survey methodology 22(2): 107-115.
Smith, T. M. F. (1990). Post-stratification. The Statistician 40: 315-323. Srivastava, V. K. and D. E. A. Giles (1987). Seemingly unrelated regression equations
models : Estimation and inference. New York, Dekker. Stanek, E. J., J. d. M. Singer and V. B. Lencina (2003). A unified approach to
estimation and prediction under simple random sampling. Journal of Statistical Planning and Inference: (Manuscript accepted, December 2002).
Stukel, D. M., M. A. Hidiroglou and C. E. Särndal (1996). Variance estimation for calibration estimators: A comparison of jackknifing versus taylor linearization. Survey methodology 22(2): 117-125.
Thompson, M. E. (1997). Theory of sample surveys. New York, Chapman & Hall. Thomsen, I. (1978). Design and estimation problems when estimating a regression
coefficient from survey data. Metrika 25: 27-35. Valliant, R. (2002). Variance estimation for the general regression estimator. Survey
methodology 28(1): 103-114. Valliant, R., A. H. Dorfman and R. M. Royall (2000). Finite population sampling and
inference, a prediction approach. New York, John Wiley & Sons. Wolter, K. M. (1985). Introduction to variance estimation. New York, Springer-Verlag. Wu, C. B. and R. R. Sitter (2001). A model-calibration approach to using complete
auxiliary information from survey data. Journal of the American Statistical Association 96(453): 185-193.
Zellner, A. (1962). An efficient method of estimating seemingly unrelated regression and tests for aggregation bias. Journal of the American Statistical Association 57(298): 348-368.
Zellner, A. (1963). Estimators of seemingly unrelated regression equations: Some exact finite sample results. Journal of the American Statistical Association 58: 977-992.
Zhang, L.-C. (1999). A note on post-stratification when analyzing binary survey data subject to nonresponse. Journal of Official Statistics 15(2): 329-334.