
USE OF RANDOM PERMUTATION MODEL

IN RATE ESTIMATION AND STANDARDIZATION

A Dissertation Presented

by

WENJUN LI

Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment

of the requirements for the degree of

DOCTOR OF PHILOSOPHY

February 2003

Public Health Biostatistics and Epidemiology

USE OF RANDOM PERMUTATION MODEL

IN RATE ESTIMATION AND STANDARDIZATION

A Dissertation Presented

by

WENJUN LI

Approved as to style and content by:

____________________________________
Edward J. Stanek III, Chair

____________________________________
Carol Biglow, Member

____________________________________
John P. Buonaccorsi, Member

_________________________________________
Philip C. Nasca, Department Chair
Biostatistics and Epidemiology

DEDICATION

To my parents Li Beihua and Wu Dehua,

and my wife Lu Lu


ACKNOWLEDGMENTS

I would like to express my appreciation to Professors Edward J. Stanek III, John P.
Buonaccorsi and Carol Biglow. I am deeply grateful to Professor Stanek; without his
encouragement and guidance, this dissertation research would not have been completed
successfully. I thank Drs. David Hosmer and John Griffith for their helpful comments.


ABSTRACT

USE OF RANDOM PERMUTATION MODEL IN RATE ESTIMATION AND

STANDARDIZATION

FEBRUARY 2003

WENJUN LI, B.S., FUDAN UNIVERSITY

M.S., UNIVERSITY OF MASSACHUSETTS AMHERST

Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST

Directed by: Professor Edward J. Stanek, III

By integrating techniques from several areas in survey sampling, we

develop an alternative method of deriving estimators using random permutation models

under (stratified) simple random sampling without replacement. The finite random

permutation model links the samples to the population. The joint permutation of

response and auxiliary variables is modeled using seemingly unrelated regression. We

use prediction theory from the super-population sampling literature to derive the linear

unbiased minimum variance predictors of population means under the design-based

framework using the finite estimating equation approach of Binder and Patak (1994).

The predictors have functional forms similar to those derived using design-based,

model-assisted and calibration approaches, but depend on neither superpopulation nor

regression model assumptions. We apply the results to the standardization of multiple
rates and illustrate how our methods, unlike conventional standardization methods,
account for the covariance of the standardized rates.


TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES

CHAPTER

1. INTRODUCTION

    1.1. Motivations
    1.2. Rate estimation and adjustment in epidemiology
    1.3. Three major approaches in survey sampling
    1.4. Calibration estimation
    1.5. Direct adjustment and poststratification
    1.6. Sampling framework
    1.7. Seemingly unrelated regression models
    1.8. Functional form approach in survey sampling
    1.9. Optimization methods

2. NOTATIONS, DEFINITIONS AND BASICS OF RANDOM PERMUTATION MODELS

    2.1. Notation
    2.2. Definition of population and population parameters
    2.3. Single stage random permutation
    2.4. Simultaneous permutation of multiple variables
    2.5. Sampling and partition of random matrices
    2.6. Derivation of variance-covariance matrix of vec(U)
    2.7. Derivation of variance-covariance matrix in Section 2.4.
    2.8. Derivation of variance-covariance matrix of the partitioned matrix

3. GENERAL ESTIMATION STRATEGY

    3.1. Parameters of interest
    3.2. Random permutation seemingly unrelated regression model
    3.3. Simultaneous estimation of p + 1 population totals
    3.4. Example: best linear unbiased estimator of a population total
    3.5. Estimation of a population total using auxiliary information
    3.6. Enumeration and simulation
    3.7. Research objectives

4. ESTIMATION OF POPULATION TOTALS

    4.1. Motivations
    4.2. Parameters of interest
    4.3. Formulation of a RPSUR model
    4.4. Two classes of estimators for population total vector
    4.5. Estimating multiple population totals
    4.6. Remarks
    4.7. Derivations of results in Section 4.5.1.
    4.8. Derivations of results in Section 4.5.2.
    4.9. Derivations of results in Section 4.5.3.
    4.10. Derivations of results in Section 4.5.4.

5. ESTIMATING POPULATION TOTALS BY REPARAMETERIZATION

    5.1. Motivation
    5.2. Formulation of the model through reparameterization
    5.3. Estimating $T^{(0)}$ and $\mu^{(0)}$ with a known auxiliary population total
    5.4. Estimating $T^{(0)}$ and $\mu^{(0)}$ with several known auxiliary totals
    5.5. Estimation of variance and confidence intervals
    5.6. Remarks
    5.7. Derivations of results in Section 5.3.
    5.8. Derivations of results in Section 5.4.

6. ESTIMATORS BASED ON RANDOM PERMUTATION MODELS UNDER STRATIFIED SIMPLE RANDOM SAMPLING

    6.1. Motivation
    6.2. Definition and notations
    6.3. Independent permutation of one response variable across H strata
    6.4. Simultaneous permutations of one response variable and p auxiliary variables across H strata
    6.5. Stratified random sampling and partition of random matrices
    6.6. Parameter of interest
    6.7. Formulation of the model when auxiliary variables are present
    6.8. Estimating $T^{(0)}$ when one auxiliary variable is present
    6.9. Derivations of results in Section 6.8.4.
    6.10. Derivations of results in Section 6.8.5.


7. APPLICATIONS TO RATE ESTIMATION

    7.1. Motivations
    7.2. Cases with single categorical auxiliary variable
    7.3. Cases with two categorical variables
    7.4. Cases with single continuous auxiliary variable
    7.5. Derivations of results (7.1) and (7.2) in Section 7.2.
    7.6. Derivations of results (7.3) and (7.4) in Section 7.2.

8. DIRECT RATE STANDARDIZATION USING RANDOM PERMUTATION MODELS

    8.1. Motivations
    8.2. Rate estimation with internal direct adjustment
    8.3. Rate comparison with internal direct adjustment
    8.4. External direct adjustment
    8.5. Discussion
    8.6. Derivations of the results in Section 8.2.

9. CONCLUDING REMARKS

BIBLIOGRAPHY


LIST OF TABLES

Table

4.1. Two classes of unbiased estimators of population totals under two optimization criteria

7.1. Mean squared error (× 10,000) of four estimators by population size and sample size

7.2. Coverage rates and width of 95% confidence intervals

7.3. Percentage of left- and right-misses of the 95% confidence intervals, by population size, sample size and estimation methods

7.4. Two categorical auxiliary variable examples

7.5. Parameters of hypothetical population Series A and B

7.6. MSE of design-based estimators adjusted to known continuous or categorical auxiliary variables, by population size and sample size (Series A)

7.7. MSE of design-based estimators adjusted to known continuous or categorical auxiliary variables, by population size and sample size (Series B)

7.8. Probability of encountering negative variance when sample size is small and age is used as a continuous or categorical auxiliary variable

7.9. Coverage rates and width of 95% confidence interval using age as a continuous or categorical variable, by population size and sample size (Series A)

7.10. Coverage rates and width of 95% confidence interval using age as a continuous or categorical variable, by population and sample sizes (Series B)


LIST OF FIGURES

Figure

3.1. General Estimation Strategy

7.1. Mean squared error by population size and sample size

7.2. MSE reduction (%) of a design-based predictor (DPS or DPP) by population and sample sizes

7.3. MSE ratio of design-based predictors to SRS estimator by population and sample sizes

7.4. Hypothetical relationship between smoking and age of population Series A and B

7.5. MSE and variance of design-based predictors adjusted for continuous or categorical age (Series A)

7.6. MSE and variance ratios of estimators adjusted for age median, quartiles, octiles and sedeciles to estimators adjusted for age mean (Series A)

7.7. MSE and variance of design-based predictors adjusted for continuous or categorical age (Series B)

7.8. MSE and variance ratios of estimators adjusted for age median, quartiles, octiles and sedeciles to estimators adjusted for age mean (Series B)


CHAPTER 1

INTRODUCTION

Estimation and comparison of disease prevalence based on survey samples is

one of the most common procedures in epidemiological investigation. Analysis of

survey data often involves “adjustment” or “control” for covariates. A methodological

issue that arises is how to adjust for the imbalance of covariates in a sample or to

remove “confounding”. For example, a sample may have a very different gender ratio

than that of the parent population. This issue is closely related to the theory of survey

sampling, and application of recent advances in methods for addressing this issue may
improve methodologies in epidemiological surveys.

1.1. Motivations

Rate estimation and standardization in epidemiology and vital statistics, in

particular, is closely related to calibration, poststratification and generalized regression

estimation (GREG) in survey sampling. Under certain assumptions, directly

standardized rates can be viewed as estimators “calibrated” to known population or

subpopulation totals, or estimators based on poststratification, or GREG estimators

based on group mean models. These methods of estimation have experienced significant

theoretical and practical advances in the past twenty years. Application of these

techniques may lead to a new class of estimators that will improve rate estimation and

standardization procedures commonly used in epidemiology and vital statistics.

This dissertation research develops alternative methods of defining and

estimating population proportions (means) based on a functional form approach (Binder


and Patak 1994; Thompson 1997) and calibration methods (Deville and Särndal 1992;

Särndal, Swensson and Wretman 1992; Brewer 1995; Estevao and Särndal 2000).

Seemingly unrelated regression and finite random permutation model frameworks

(Stanek, Singer and Lencina 2003) are used as vehicles for mathematical derivation.

Research focuses on developing estimators for population totals and proportions in the

setting of simple random sampling without replacement (SRSWOR) and stratified

simple random sampling without replacement (STSRS).

In the following sections, I will briefly review current methods of rate

standardization and advances in several areas of survey sampling that are directly

relevant to the estimation methods proposed in this dissertation. This review lays the groundwork
for the theoretical development in later chapters.

1.2. Rate estimation and adjustment in epidemiology

Various methods of adjusting for the imbalance of covariates resulting from

complex sampling have been developed in the epidemiologic literature (Rothman 1986;

Breslow and Day 1987; Rosenbaum 1987; Esteve, Benhamou and Raymond 1994).

When applied to rate estimation, these methods are used to “control” or “adjust” for

confounding, and produce a single index that is “comparable” between populations or

subpopulations. For example, in epidemiological research, disease morbidity or

mortality experience (hereafter, referred to more generally as prevalence) is often

compared across geographical areas that have different distributions with respect to the

control variables (e.g., age). Direct comparison of crude prevalence rates may be

severely confounded by the difference in the population structure of the areas being

compared when the difference is related to the outcome of interest (e.g., mortality).


“Standardization” or “adjustment” needs to be made to remove such effects of

“confounding”. A typical example is the gender- and age-adjusted mortality rates in

vital statistics.

Adjustment for confounding can be made with or without a multiple regression

model (Kahn and Sempos 1989). Methods not based on regression models are

commonly referred to as “standardization” in the literature. Briefly, a standardized rate is a
weighted average of the stratum-specific prevalence rates in the study populations,
where the weights are determined by the relative frequencies of the control variable in the
reference population. Standardization methods can be broadly classified as “direct” or
“indirect” according to whether the same weighting system is applied to
different populations (Kahn and Sempos 1989). We will discuss these methods in more

detail in the subsequent sections.

Adjustment can also be made based on regression models (Kahn and Sempos
1989). For example, a categorical auxiliary variable can be used for poststratification; a
continuous variable strongly correlated with the response variable can be used to
improve the estimation using ratio or regression estimation. When all the covariates in
the regression model are categorical, the model-adjusted rates are analogous to directly
standardized rates but allow adjustment for many variables at one time (Feldstein 1966;
Grizzle, Starmer and Koch 1969; Forthofer and Lehnen 1981). Continuous covariates
can also be included in the model, which leads to generalized regression
estimators (GREG) and calibration estimators.


1.2.1. Rate standardization and finite sampling

Direct standardization of rates can be viewed as a special case of finite sampling.

A fundamental assumption is that the populations under comparison arise from the same

parent population or are similar with respect to factors associated with the event under

study (Fleiss 1981), which is often implied by the “null hypothesis” (Neison 1844). In
statistical terms, choosing a “standard population” defines the “parent
population” under study. In addition, the populations under comparison are considered

as realizations of random samples from that parent population. Under such assumptions,

directly standardized rates may be considered as special cases of calibration estimators

or generalized regression (GREG) estimators. The weight systems for standardization

are “calibrated” to the known population or group quantities of the covariates, and the

standardized rates are estimated (predicted) rates of the “parent population” based on

these samples.

Direct standardization of rates, under certain assumptions, is also related to the

class of estimators commonly referred to as “poststratified estimators” (Thomsen 1978;

Oh and Scheuren 1983; Särndal, Swensson and Wretman 1992). A presumption in this

methodology is that random sampling may result in an imbalance of auxiliary variables
but does not alter the probability distribution of the study variable given the auxiliary values

(Mukhopadhyay 2000).

1.2.2. Direct standardization

The purpose of direct adjustment is to present an estimate of event prevalence

that characterizes a population if the distribution of the confounding variables were the

same as that in the standard population. The adjusted totals or rates are then


“comparable” between populations. There are two major approaches: internal and
external comparisons. The former involves only the populations under comparison; the
latter involves an outside reference population with a known population structure. The

choice of a “standard population” can make a substantial difference in the standardized

rates and may have significant impact on the interpretation of study results. This

weakness has renewed interest in methodological research on the appropriateness and

validity of rate standardization to describe the discrepancy in event prevalence among

different populations (CDC 1999; Choi, de Guia and Walsh 1999; Goldman and

Brender 2000; Julious, Nicholl and George 2001).

Once the standard population (SP) is chosen, direct standardization methods

utilize the distribution of the confounders in this population to define weights that are

used in the weighted average of prevalence estimates for all populations under

comparison. To illustrate this method, we consider the following simple case. Suppose

that two populations, indexed by $i = 1, 2$, are under comparison. Assume each
population has $J$ age strata, indexed by $j = 1, 2, \ldots, J$, and that age is a
confounder of the study outcome. The Directly Standardized Rate (DSR) for population
$i$ is defined as

$$\mathrm{DSR}_i = \sum_{j=1}^{J} \frac{N_j}{N_{SP}} \, r_{ij},$$

where $r_{ij}$ is the rate in the $j$-th stratum of the $i$-th population and $N_j / N_{SP}$ is the relative
frequency of stratum $j$ in the standard population. Using DSRs, a Directly
Standardized Rate Ratio (DSRR) can be defined as


$$\mathrm{DSRR}_i = \frac{\text{Predicted overall rate of SP with stratum-specific rates of population } i}{\text{Actual overall rate of SP}} = \frac{N_{SP}^{-1} \sum_{j=1}^{J} N_j r_{ij}}{N_{SP}^{-1} \sum_{j=1}^{J} N_j R_j} = \sum_{j=1}^{J} w_j r_{ij},$$

where $w_j = N_j \left( \sum_{k=1}^{J} N_k R_k \right)^{-1}$ and $R_j$ is the rate in stratum $j$ of the standard
population. Notice that the weights ($w_j$) are fixed and independent
of the populations under comparison (that is, they do not depend on the samples); therefore, the
weight system is “identical” for all populations under comparison. The DSRR is commonly
referred to as the Comparative Mortality Figure (CMF) or the Standardized Rate Ratio
(SRR) in the literature. Since we are interested not only in mortality rates but also in other
prevalence rates, we use the more general term DSRR.
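
The DSR and DSRR definitions can be checked numerically. The following is a minimal Python sketch, assuming NumPy is available; the function names and every count and rate in it are hypothetical and serve only as an illustration, not as data from this dissertation.

```python
import numpy as np


def directly_standardized_rate(stratum_rates, standard_counts):
    """DSR: weight each stratum rate r_ij by N_j / N_SP from the standard population."""
    r = np.asarray(stratum_rates, dtype=float)
    n_std = np.asarray(standard_counts, dtype=float)
    return float(np.sum(n_std / n_std.sum() * r))


def directly_standardized_rate_ratio(stratum_rates, standard_counts, standard_rates):
    """DSRR: the DSR divided by the actual overall rate of the standard population."""
    n_std = np.asarray(standard_counts, dtype=float)
    big_r = np.asarray(standard_rates, dtype=float)
    overall_sp = float(np.sum(n_std * big_r) / n_std.sum())
    return directly_standardized_rate(stratum_rates, standard_counts) / overall_sp


# Hypothetical standard population with three age strata.
standard_counts = [500, 300, 200]    # N_j
standard_rates = [0.10, 0.20, 0.30]  # R_j

# Hypothetical stratum-specific rates r_ij in the two populations under comparison.
rates_pop1 = [0.10, 0.20, 0.40]
rates_pop2 = [0.20, 0.20, 0.30]

dsr1 = directly_standardized_rate(rates_pop1, standard_counts)
dsr2 = directly_standardized_rate(rates_pop2, standard_counts)
dsrr1 = directly_standardized_rate_ratio(rates_pop1, standard_counts, standard_rates)
dsrr2 = directly_standardized_rate_ratio(rates_pop2, standard_counts, standard_rates)

print(dsr1, dsr2, dsrr1, dsrr2)
```

Both populations are weighted by the same standard-population frequencies, which is what makes the two DSRs directly comparable.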

1.2.3. Indirect standardization, ISR and ISRR

Indirect standardization utilizes as its weighting system the distribution of the

confounder in the comparison (e.g., “exposed”) population. Thus, a separate

“Indirectly Standardized Rate Ratio” (ISRR) is produced for each population being

compared. The ISRR is commonly referred to as the “Standardized Mortality Ratio” (SMR) in
vital statistics. The indirectly standardized rate (ISR) for population $i$ is defined as

$$\begin{aligned}
\mathrm{ISR}_i &= [\text{SP overall rate}] \times \frac{\text{Observed overall rate in population } i}{\text{Predicted overall rate in population } i \text{ using stratum-specific rates of SP}} \\
&= R \times \frac{\sum_{j=1}^{J} N_{ij} r_{ij} \Big/ \sum_{j=1}^{J} N_{ij}}{\sum_{j=1}^{J} N_{ij} R_j \Big/ \sum_{j=1}^{J} N_{ij}} \\
&= R \sum_{j=1}^{J} v_{ij} r_{ij},
\end{aligned}$$

where $R = \sum_{j=1}^{J} N_j R_j \Big/ \sum_{j=1}^{J} N_j$ is the overall rate of the standard population, a known
constant, and $v_{ij} = N_{ij} \Big/ \sum_{k=1}^{J} N_{ik} R_k$ depends on the size ($N_{ij}$) of group $j$ in population
$i$. Each population under comparison will therefore have different weighting factors
unless the relative frequency distributions of the control variable
are equal across populations.

The ISRR of population $i$ is a rate ratio defined as

$$\mathrm{ISRR}_i = \frac{\mathrm{ISR}_i}{R} = \frac{\sum_{j=1}^{J} N_{ij} r_{ij}}{\sum_{j=1}^{J} N_{ij} R_j} = \sum_{j=1}^{J} v_{ij} r_{ij}.$$

The ISRR is the most commonly used technique and has been used to inform the allocation of
resources and health-related policy-making (Julious, Nicholl and George 2001).
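
The indirect measures can be illustrated in the same way. The sketch below assumes NumPy; the function names and all numbers are hypothetical. It computes the ISRR as observed events over the events expected under the standard population's stratum-specific rates, and the ISR as the standard population's overall rate times that ratio.

```python
import numpy as np


def isrr(pop_counts, pop_rates, standard_rates):
    """ISRR: observed events in the population over events expected under SP rates."""
    n_ij = np.asarray(pop_counts, dtype=float)
    r_ij = np.asarray(pop_rates, dtype=float)
    big_r_j = np.asarray(standard_rates, dtype=float)
    observed = np.sum(n_ij * r_ij)     # sum_j N_ij * r_ij
    expected = np.sum(n_ij * big_r_j)  # sum_j N_ij * R_j
    return float(observed / expected)


def isr(pop_counts, pop_rates, standard_counts, standard_rates):
    """ISR: overall rate of the standard population multiplied by the ISRR."""
    n_j = np.asarray(standard_counts, dtype=float)
    big_r_j = np.asarray(standard_rates, dtype=float)
    overall_sp = float(np.sum(n_j * big_r_j) / n_j.sum())
    return overall_sp * isrr(pop_counts, pop_rates, standard_rates)


# Hypothetical standard population (three age strata).
standard_counts = [500, 300, 200]
standard_rates = [0.10, 0.20, 0.30]

# Hypothetical comparison population with an older age structure.
pop_counts = [100, 200, 700]
pop_rates = [0.10, 0.20, 0.40]

isrr_value = isrr(pop_counts, pop_rates, standard_rates)
isr_value = isr(pop_counts, pop_rates, standard_counts, standard_rates)
print(isrr_value, isr_value)
```

Here the weights depend on the comparison population's own stratum sizes, so a second population with a different age structure would be weighted differently.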

1.2.4. Other standardization methods

There are many other standardization methods in the literature (Inskip et al. 1983;

Julious, Nicholl and George 2001). For example, Lee and Liaw proposed a new

weighting system to compute directly standardized rates, where the set of weights are

determined by the data rather than specified a priori, and optimized for stability while

ensuring the comparability of the rates between populations (Lee and Liaw 1999). This

method requires an assumption that the event counts follow a Poisson distribution, and

achieves its optimality by minimizing the total variance of the age-adjustment indices on
the logarithmic scale. Based on such optimized weights, Lee (2002) proposed a new
standardized summary measure, the “harmonically weighted ratio”, that can be applied to

external and internal comparisons if the rate-ratio homogeneity assumption holds, i.e.,

rate-ratios of different groups are equal.

1.3. Three major approaches in survey sampling

Current sample survey methods include three major approaches: 1) design-

based, 2) model-based and 3) model-assisted approaches. What distinguishes them is

the choice of probability model that is used for statistical inference.


The design-based approach uses probability sampling (randomization) for both

sample selection and inference from sample data. The probability distribution

associated with the randomized sample selection provides the basis for probabilistic

inferences about the target population. The bias, variance and mean squared error

(MSE) are defined in terms of the expectation over all possible samples under the

sampling design. This approach leads to valid repeated sampling inference regardless of

the population structure, and is free from “model misspecification”. The design-based

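framework established by Horvitz and Thompson (1952) and Godambe (1955) is a
typical example. For example, the population total may be written as $T(\mathbf{y}) = \mathbf{1}_N' \mathbf{y}$,
where $\mathbf{y}$ is the $N \times 1$ vector of population values, and
its Horvitz-Thompson estimator is $\hat{T}_{HT}(\mathbf{y}) = \mathbf{1}_n' \boldsymbol{\Pi}_s^{-1} \mathbf{y}_s$, where $\mathbf{y}_s$ is an $n$-vector of the sample
values of $\mathbf{y}$; $\boldsymbol{\Pi}$ and $\boldsymbol{\Pi}_s$ are $N \times N$ and $n \times n$ diagonal matrices of inclusion
probabilities, respectively; and $\mathbf{1}_n'$ and $\mathbf{1}_N'$ denote row vectors with all elements equal
to one. When an auxiliary variable $\mathbf{x}$ is available, the Horvitz-Thompson (HT) estimator
can be written in ratio form as

$$\hat{T}_{HTR}(\mathbf{y}) = \frac{\hat{T}_{HT}(\mathbf{y})}{\hat{T}_{HT}(\mathbf{x})} \, T(\mathbf{x}),$$

where $T(\mathbf{x})$ and $\hat{T}_{HT}(\mathbf{x})$ are the population total of $\mathbf{x}$ and its HT estimator, respectively
(Brewer 1963). This estimator is asymptotically design-unbiased and design-consistent,
i.e., $E\left[ \hat{T}_{HTR}(\mathbf{y}) - T(\mathbf{y}) \right] \to 0$ and $\Pr\left[ \left| \hat{T}_{HTR}(\mathbf{y}) - T(\mathbf{y}) \right| > \varepsilon \right] \to 0$ when both the sample and
population sizes become large ($n \to \infty$, $N \to \infty$) (Brewer 1979; Särndal and Wright
1984).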
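
As a small numerical illustration of the Horvitz-Thompson estimator and its ratio form, the sketch below assumes NumPy and simple random sampling without replacement, so every inclusion probability equals n/N. The six-unit population, the chosen sample, and the function names are hypothetical.

```python
import numpy as np


def horvitz_thompson_total(values_s, pi_s):
    """HT estimator of a total: sum of sampled values divided by inclusion probabilities."""
    values_s = np.asarray(values_s, dtype=float)
    pi_s = np.asarray(pi_s, dtype=float)
    return float(np.sum(values_s / pi_s))


def ht_ratio_total(y_s, x_s, pi_s, x_total):
    """Ratio form: HT estimate of T(y) rescaled by T(x) / (HT estimate of T(x))."""
    return (horvitz_thompson_total(y_s, pi_s)
            / horvitz_thompson_total(x_s, pi_s) * x_total)


# Hypothetical population of N = 6 units in which y is exactly proportional to x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 2.0 * x
N = len(x)

# One possible SRSWOR sample of size n = 3; each unit has inclusion probability n / N.
sample_idx = np.array([0, 2, 5])
n = len(sample_idx)
pi_s = np.full(n, n / N)

ht_y = horvitz_thompson_total(y[sample_idx], pi_s)
ratio_y = ht_ratio_total(y[sample_idx], x[sample_idx], pi_s, x.sum())
true_total = float(y.sum())

print(ht_y, ratio_y, true_total)
```

Because y is exactly proportional to x in this toy population, the ratio form recovers the true total from any sample, whereas the plain HT estimate varies from sample to sample.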


The model-based (or prediction-based) approach assumes that the target population

follows a specified model (distribution), and the model distribution yields valid

inferences conditioning on the particular sample that has been drawn (Rao 1997;

Brewer 1999). The bias, variance and mean squared error (MSE) are defined in terms of

the expectation over all possible realizations of a stochastic model that connects the

variable of interest to a set of auxiliary variables (Brewer 1995). Model-dependent

methods can perform poorly in large samples, as evaluated by the expected value of the
model-based MSE over the design, if the model is misspecified (Hansen, Madow and
Tepping 1983; Rao 1997). Model-based approaches are useful for handling
nonsampling errors, such as measurement errors and non-response (Brewer 1963;
Royall 1970; Brewer 1995). In small-area estimation, model-based estimators can have
smaller MSE than design-based estimators. The best linear unbiased predictor (BLUP)

is a typical model-based estimator (Ghosh and Rao 1994; Rao 1997).

The model-assisted approach is a hybrid of design-based and model-based

approaches. Plausible population models are often used to choose designs and efficient

design-consistent estimators, but the inference remains design-based. This means that

probability statements refer to the set of all possible samples that can be drawn with the

given probability sampling design. In the model-assisted approach, models may be used

to construct estimators, but randomization must be used to select the sample, and

statistical properties of estimators are computed with respect to the probability sampling

distribution (Särndal, Swensson and Wretman 1992; Rao 1999). The resulting estimator

is model-unbiased under the assumed model as well as design-consistent (Särndal,

Swensson and Wretman 1992; Rao 1999). The model-assisted approach provides a


formal framework for using auxiliary information at the estimation stage. The popular

generalized regression estimator (GREG) is an example of this type (Cassel, Särndal

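and Wretman 1977; Särndal, Swensson and Wretman 1992). For example, the general
form of the GREG estimator of the population total is

$$\hat{T}_{GREG}(\mathbf{y}) = \hat{T}_{HT}(\mathbf{y}) + \left[ T(\mathbf{X}) - \hat{T}_{HT}(\mathbf{X}) \right] \hat{\boldsymbol{\beta}}_{GREG} \qquad (1.1)$$

where $\mathbf{X}$ is an $N \times p$ matrix of auxiliary variables, $T(\mathbf{X}) = \mathbf{1}_N' \mathbf{X}$ is the $1 \times p$ row vector
of its column sums, $\hat{T}_{HT}(\mathbf{X}) = \mathbf{1}_n' \boldsymbol{\Pi}_s^{-1} \mathbf{X}_s$ is the Horvitz-Thompson estimator of $T(\mathbf{X})$, and
$\mathbf{X}_s$ is the $n \times p$ matrix of known auxiliary variables of the sampled subjects. The coefficient
$\hat{\boldsymbol{\beta}}_{GREG}$ comes from a hypothesized linear model linking $\mathbf{y}$ and $\mathbf{X}$, e.g., $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where $E_\xi(\boldsymbol{\varepsilon}) = \mathbf{0}$,
$E_\xi(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}') = \sigma^2 \mathbf{A}$, and $\mathbf{A}$ is an $N \times N$ diagonal matrix. Under this model, the estimator of the
$p$-vector regression parameter $\boldsymbol{\beta}$ is

$$\hat{\boldsymbol{\beta}}_{GREG} = \left( \mathbf{X}_s' \mathbf{A}_s^{-1} \boldsymbol{\Pi}_s^{-1} \mathbf{X}_s \right)^{-1} \mathbf{X}_s' \mathbf{A}_s^{-1} \boldsymbol{\Pi}_s^{-1} \mathbf{y}_s. \qquad (1.2)$$

The GREG estimator (1.1) is design-consistent and asymptotically design-unbiased. It is also an
estimator “calibrated” against all the auxiliary variables in $\mathbf{X}$ (Deville and Särndal
1992).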
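
To make the GREG computation concrete, here is a minimal Python sketch assuming NumPy, simple random sampling without replacement, and a working model with an intercept and one auxiliary variable. The population, the sample, the choice of A as the identity, and the function name are hypothetical. The coefficient is obtained by weighted least squares with weights 1/(a_k π_k), the standard form of the GREG coefficient.

```python
import numpy as np


def greg_total(y_s, X_s, pi_s, a_s, x_totals):
    """GREG estimate of T(y): HT estimate plus a regression adjustment.

    y_s: sampled responses (n,); X_s: sampled auxiliaries (n, p);
    pi_s: inclusion probabilities (n,); a_s: sampled diagonal of A (n,);
    x_totals: known population column totals of X (p,).
    """
    y_s = np.asarray(y_s, dtype=float)
    X_s = np.asarray(X_s, dtype=float)
    pi_s = np.asarray(pi_s, dtype=float)
    a_s = np.asarray(a_s, dtype=float)
    x_totals = np.asarray(x_totals, dtype=float)

    w = 1.0 / (a_s * pi_s)  # diagonal of A_s^{-1} Pi_s^{-1}
    xtw = X_s.T * w         # X_s' A_s^{-1} Pi_s^{-1}, shape (p, n)
    beta = np.linalg.solve(xtw @ X_s, xtw @ y_s)

    ht_y = np.sum(y_s / pi_s)       # HT estimate of T(y)
    ht_x = X_s.T @ (1.0 / pi_s)     # HT estimate of T(X), shape (p,)
    return float(ht_y + (x_totals - ht_x) @ beta), beta


# Hypothetical population of N = 6 units following y = 1 + 2x exactly.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 1.0 + 2.0 * x
X = np.column_stack([np.ones_like(x), x])  # intercept plus one auxiliary variable
N = len(x)

# One possible SRSWOR sample of size n = 3.
sample_idx = np.array([0, 2, 5])
n = len(sample_idx)
pi_s = np.full(n, n / N)
a_s = np.ones(n)  # assumption: A is the identity matrix

greg_estimate, beta_hat = greg_total(y[sample_idx], X[sample_idx], pi_s, a_s, X.sum(axis=0))
ht_estimate = float(np.sum(y[sample_idx] / pi_s))
true_total = float(y.sum())

print(ht_estimate, greg_estimate, true_total)
```

In this toy case the working model fits the population exactly, so the regression adjustment removes the whole sampling error of the HT estimate; with real data the adjustment only reduces it to the extent that the auxiliary variables explain the response.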

However, model-assisted approaches also have their limitations. When the
model is correctly specified, model-assisted inferences are inferior to those from the
model-based approach (Rao 1997); when the model is misspecified, they are inferior to those from the
unconditional design-based approach (Rao 1997).


1.4. Calibration estimation

Calibration, or cosmetic calibration (Särndal and Wright 1984; Brewer 1999), is

another popular method of incorporating known auxiliary information to improve

estimation. This approach does not necessarily need a model at all (Thompson 1997).

The strategy is to identify a set of sample-dependent (sometimes called adjusted)
weights such that, when they are applied to the auxiliary variables, the “estimator” of the
auxiliary quantities equals the corresponding known auxiliary quantities (Deville
and Särndal 1992; Särndal, Swensson and Wretman 1992). The rationale behind

calibration methods is that a weight system should be good for a response variable if it

is good for a strongly correlated auxiliary variable (Deville and Särndal 1992; Särndal,

Swensson and Wretman 1992; Brewer 1999).

The idea can be illustrated using the following simple example. Suppose we are interested in estimating the prevalence of teenage smoking in a population of size $N$, and the total numbers of male and female teenagers are known. It is believed that male teenagers are much more likely to be smokers than their female peers. An SRSWOR sample of teenagers of a predetermined size $n$ is drawn from the population. Without using any auxiliary variable, one would use a simple expansion weight $\mathbf{a} = \frac{N}{n}\mathbf{1}_n$ to estimate the prevalence. Using the calibration method, one can find a set of sample-dependent weights $\mathbf{w}$ by minimizing the squared distance between $\mathbf{w}$ and $\mathbf{a}$ subject to the constraint that, when $\mathbf{w}$ is applied to “estimating” the total numbers of males and females, the results must equal the known numbers of males and females. In numerous publications, calibration estimators have been shown to be more efficient than the sample expansion estimators (Deville and Särndal 1992; Särndal, Swensson and Wretman 1992).
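The smoking example can be sketched numerically. The following is a minimal illustration, not the estimator developed later in this dissertation: the sample values, the known totals (480 males, 520 females), and all variable names are hypothetical, and the distance is taken to be the plain squared distance $(\mathbf{w}-\mathbf{a})'(\mathbf{w}-\mathbf{a})$, for which the constrained minimizer has a closed form.

```python
import numpy as np

# Hypothetical SRSWOR sample of n = 10 teenagers from N = 1000 (illustrative values only).
N, n = 1000, 10
male = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])      # 1 = male, 0 = female
smoker = np.array([1, 0, 1, 1, 0, 0, 0, 0, 1, 0])    # 1 = smoker
known_totals = np.array([480.0, 520.0])              # assumed known numbers of males, females

# Auxiliary matrix: one indicator column per sex.
X = np.column_stack([male, 1 - male]).astype(float)

# Simple expansion weights a = (N/n) 1_n.
a = np.full(n, N / n)

# Minimize (w - a)'(w - a) subject to X'w = known_totals.
# Setting the Lagrangian derivatives to zero gives
#   w = a + X (X'X)^{-1} (known_totals - X'a).
w = a + X @ np.linalg.solve(X.T @ X, known_totals - X.T @ a)

expansion_prevalence = a @ smoker / N
calibrated_prevalence = w @ smoker / N
print(expansion_prevalence, calibrated_prevalence)
```

In this toy sample males are over-represented (6 of 10, against 48% in the population), so the calibrated weights shrink the male weights and inflate the female weights until the weighted sex totals match the known ones.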

1.5. Direct adjustment and poststratification

In the statistical literature, direct adjustment and poststratification are often

referred to as the same estimation procedure (Cochran 1977; Holt and Smith 1979;

Little 1982; Oh and Scheuren 1983; Rosenbaum 1987). While “direct adjustment” in

epidemiology does not necessarily mean the same as in statistical literature, “direct

adjustment” in epidemiology can be viewed as a special case of poststratification in

survey sampling if the “standard population” is viewed as the parent population from

which the populations under comparison arise.

Poststratification has been discussed intensively in both model-based and

model-assisted approaches (Royall 1971; Holt and Smith 1979; Rosenbaum 1987;

Smith 1990; Särndal, Swensson and Wretman 1992). Using a model based approach

and assuming a distribution function of the response variable y and the auxiliary

variable x , i.e., under a superpopulation model, the post-stratified estimator of mean

(proportion or rate) is shown to be a maximum likelihood estimator of the population

mean (Jagers 1986; Mukhopadhyay 2000). Rosenbaum (1987) showed that the sample mean and the poststratified (or directly adjusted) estimators are special cases of model-based

direct adjustment under two different extreme models for the subclass-specific selection

probabilities. The model-based poststratification approach is also used to improve

small-area estimations, by forming “synthetic estimators” (Holt, Smith and Tomberlin

1979).


From the perspective of a model-assisted approach, “poststratified estimators”

are a subclass of GREG estimators related to population groups (Särndal, Swensson and

Wretman 1992; Rao 1999). The underlying model is a group mean model, where group

membership is identified after sampling for sampled subjects, and the population group

sizes may or may not be known (Särndal, Swensson and Wretman 1992; Rao 1999).

Poststratification is particularly useful in multipurpose surveys because of its flexibility

in forming poststrata as compared to stratified simple random sampling.

Poststratification is a common tool for reducing nonresponse bias for survey

data (Oh and Scheuren 1983; Smith 1990; Zhang 1999). When non-response

probabilities are independent of the study outcome variable, poststratification can

efficiently reduce non-response bias.

1.6. Sampling framework

Making inference from samples drawn from a population is fundamental to epidemiological surveys and statistics. In order to make inference, a probabilistic “link” is

needed to relate a sample to its parent population. This dissertation will adopt random

permutation models to provide such necessary “links” (Stanek, Singer and Lencina

2003). Permutation models were discussed more comprehensively in the earlier work of

Cassel, Särndal and Wretman (1977) and Rao and Bellhouse (1978). This framework,

similar to conventional design-based inference, does not require assumptions about

parametric distributions. The permuted population consists of random variables

corresponding to permutations of subjects in the finite population, where permutations

occur with equal probabilities. The population is represented as a non-stochastic vector

of fixed elements, whereas the permuted population is a similar vector of random


variables. Parameters in the population consist of the population values, and functions

of these parameters. Parameters of the permuted population are defined as the expected

value of functions of permuted population of random variables. The randomness is thus

attributable to random sampling (permutation). Inference can be made through

evaluating the connection of parameters in the population with parameters in the model

for the permuted population. Under this framework, the structural relationship between

the permuted population and the population can be explicitly represented using

matrices. This provides a convenient mathematical vehicle when regression models are

involved.

Briefly, random variables and a stochastic model are defined to represent the random permutation of a finite real-valued population. Suppose that a finite population consists of $N$ subjects labeled by $s = 1, 2, \ldots, N$, and associated with each subject is an unknown but potentially observable value, $y_s$. The population values are represented with an $N \times 1$ column vector, namely, $\mathbf{y} = (y_1\ y_2\ \cdots\ y_N)'$. Define a set of indicator random variables $U_{is}$ for $i = 1, 2, \ldots, N$ that have a value of 1 if the $i$-th subject in a permutation is subject $s$, and 0 otherwise. Let the matrix $\mathbf{U}_{N \times N} = (\mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_N)$ represent a matrix of indicator random variables, where $\mathbf{U}_s = (U_{1s}\ U_{2s}\ \cdots\ U_{Ns})'$. Then the vector of random variables representing a permutation is given by

$$\mathbf{Y}_{N \times 1} = \mathbf{U}\mathbf{y}, \tag{1.3}$$


where $\mathbf{Y} = (Y_1\ Y_2\ \cdots\ Y_N)'$, and $i$ indicates the position in a permutation. The random vector $\mathbf{Y}$ is a vector of linear combinations of random variables in the matrix $\mathbf{U}$ because the vector of population values $\mathbf{y}$ is constant. It is obvious that observing the random vector $\mathbf{Y}$ is equivalent to observing the random indicator matrix $\mathbf{U}$. Since any two subjects cannot be at the same position in a permutation, it is easy to show that

$$E\left(U_{is}U_{i's'}\right) = \begin{cases} \dfrac{1}{N} & \text{when } i' = i \text{ and } s' = s, \\[1ex] \dfrac{1}{N(N-1)} & \text{when } i' \neq i \text{ and } s' \neq s, \\[1ex] 0 & \text{otherwise.} \end{cases}$$

It immediately follows that

$$E(\mathbf{Y}) = \mu_y\mathbf{1}_N,$$

$$\mathrm{cov}(\mathbf{Y}) = \sigma_y^2\mathbf{P}_N,$$

where $\mu_y = N^{-1}\sum_{s=1}^{N} y_s$, $\sigma_y^2 = (N-1)^{-1}\mathbf{y}'\mathbf{P}_N\mathbf{y}$, and $\mathbf{P}_N = \mathbf{I}_N - N^{-1}\mathbf{J}_N$.
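These two moments can be checked exactly for a small population by enumerating every permutation. The sketch below is only an illustration; the population values are hypothetical and the variable names are not from the source.

```python
import itertools
import numpy as np

y = np.array([2.0, 5.0, 7.0, 10.0])   # hypothetical population values
N = len(y)

# Every permutation of the population is equally likely, so the exact
# moments of Y are plain averages over all N! permutations.
perms = np.array(list(itertools.permutations(y)))   # shape (N!, N)

mean_Y = perms.mean(axis=0)
cov_Y = np.cov(perms, rowvar=False, bias=True)       # divide by N!, not N! - 1

P_N = np.eye(N) - np.ones((N, N)) / N
mu_y = y.mean()
sigma2_y = y @ P_N @ y / (N - 1)

print(np.allclose(mean_Y, mu_y * np.ones(N)))
print(np.allclose(cov_Y, sigma2_y * P_N))
```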

1.7. Seemingly unrelated regression models

The theory of seemingly unrelated regression (SUR) models is well established. Finite sample properties have been discussed on a few occasions in the literature (Zellner 1963; Revankar 1974; 1976; Liu and Wang 1999), but these models generally make a strong assumption about the error structure, typically that the error vectors are independently and identically distributed multivariate normal, $MN(\mathbf{0}, \boldsymbol{\Sigma})$. SUR models enable simultaneous estimation or testing of multiple parameters across equations in the model. Applications of SUR in epidemiological surveys are quite rare.

Theoretical considerations on integrating SUR models, survey sampling theory, and random permutation models have not appeared in the literature. Combining these three aspects may lead to alternative estimation methods for survey samples. The ability of SUR to preserve covariance structures between variables during the estimation process is particularly useful for evaluating the properties of estimators based on permutation models. With a SUR set-up, permuting response and auxiliary

variables jointly can be represented easily in matrix form, and the covariance structures

can be preserved during estimation.

1.8. Functional form approach in survey sampling

Many linear or non-linear finite population parameters under complex sampling designs can be expressed as a smooth function of population totals, and their point estimators are similar functions of the estimators of the totals (Thompson 1997). For example, finite population parameters or functions, such as means, ratios and variances, can be defined implicitly by a population equation of the form

$$\sum_{s=1}^{N}\phi_s(y_s, x_s, \theta_N) = 0,$$

where $y_s$ and $x_s$ are values of observable variables, $\phi_s$ are known real-valued functions, and $\theta_N$ is a population parameter of interest. For example, the population ratio $R$ of $y$ to $x$ can be defined by $\sum_{s=1}^{N}(y_s - \theta_N x_s) = 0$, that is, $\theta_N = g(\mathbf{T}) = T_y/T_x$, where $T_y$ and $T_x$ are the totals of $y$ and $x$, respectively. An estimator can be defined as a solution of the sample estimating equation

$$\phi_{\mathbf{s}}(\theta_N) = \sum_{s \in \mathbf{s}}\phi_s(y_s, x_s, \theta_N)/\pi_s = 0,$$

where the sum is over the sampled subjects and $\pi_s$ is the inclusion probability of subject $s$. For any $\theta_N$, the sample estimating function $\phi_{\mathbf{s}}$ is design-consistent if

$$E\{\phi_{\mathbf{s}}(\theta_N)\} = \sum_{s=1}^{N}\phi_s(y_s, x_s, \theta_N).$$

Linearization of the error as a function of the component total estimates provides a fundamental basis for this approach. The properties of the point estimate $\hat{\theta} = g(\hat{\mathbf{T}})$ are evaluated using a Taylor series approximation to the error. The first-order approximation of this type is a “linearization” of the error. For example, the linearized error of the population ratio is

$$\varepsilon = \hat{R} - R = g(\hat{\mathbf{T}}) - g(\mathbf{T}) = \frac{\partial g}{\partial T_y}\left(\hat{T}_y - T_y\right) + \frac{\partial g}{\partial T_x}\left(\hat{T}_x - T_x\right) + \text{remainder}.$$
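As a small numerical illustration of this linearization, the sketch below evaluates the first-order terms for the ratio $g(\mathbf{T}) = T_y/T_x$ and compares them with the exact error. The totals and their estimates are hypothetical numbers chosen only for the example.

```python
# Hypothetical population totals and estimates of them (illustrative values only).
T_y, T_x = 400.0, 1000.0
T_y_hat, T_x_hat = 410.0, 980.0

R = T_y / T_x
R_hat = T_y_hat / T_x_hat

# Partial derivatives of g(T) = T_y / T_x, evaluated at the true totals.
dg_dTy = 1.0 / T_x
dg_dTx = -T_y / T_x**2

# First-order (linearized) error and what is left over.
linearized_error = dg_dTy * (T_y_hat - T_y) + dg_dTx * (T_x_hat - T_x)
exact_error = R_hat - R
remainder = exact_error - linearized_error
print(linearized_error, exact_error, remainder)
```

The remainder is of second order in the estimation errors of the totals, which is why it is small here relative to the linearized error.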

Many estimators based on the functional approach are definable through systems of estimating equations rather than a single estimating equation (Thompson 1997). Some population parameters, such as variances or regression coefficients, can be defined in terms of other population parameters and need a system of estimating functions for their definition. For instance, the population variance $\sigma_y^2 = \sum_{s=1}^{N}(y_s - \mu_y)^2/N$ is defined by

$$\begin{cases} \sum_{s=1}^{N}\left[(y_s - \mu_y)^2 - \sigma_y^2\right] = 0, \\[1ex] \sum_{s=1}^{N}(y_s - \mu_y) = 0. \end{cases}$$

Since SUR models allow several variables to be modeled simultaneously, their joint distributions can be obtained. A combination of the functional approach and SUR models may provide insight into estimation problems for sample surveys.


1.9. Optimization methods

Optimization criteria and methods may vary significantly according to the nature of the problem under consideration. Parameter estimation can be viewed as a least squares problem, which minimizes a specified distance function, say $\Phi(\mathbf{w})$, subject to a set of constraints. Commonly used distance functions include the trace, the generalized mean squared error (GMSE), or another quadratic function of the variance-covariance matrix of the estimators.

Often a set of linear constraints is present, e.g., $\mathbf{r}(\mathbf{w}) = \mathbf{0}$, where $\mathbf{r}(\mathbf{w})$ is a vector of differentiable functions. Available optimization methods under linear constraints include two major approaches, i.e., 1) reparameterization, and 2) modification of $\Phi(\mathbf{w})$ by introducing Lagrangian multipliers (Judge et al. 1985).

When linear equality constraints can be expressed in the form $\mathbf{w}^* = g(\mathbf{w})$, the restrictions can be introduced by reducing the dimension of the parameter space to circumvent the singularity problem, and thus a unique solution can be obtained. The constrained optimum of the objective function $\Phi(\mathbf{w})$ can be found by an unconstrained minimization of $\Phi_A(\mathbf{w}^*)$, the objective function expressed in terms of $\mathbf{w}^*$. For example, if $\mathbf{w} = (w_1\ w_2\ w_3)'$ is subject to $\mathbf{1}'\mathbf{w} = 0$, then the constraint can be represented as a reparameterization of $\mathbf{w}$,

$$g(\mathbf{w}) = \mathbf{w}^* = \begin{pmatrix} w_1 - w_3 \\ w_2 - w_3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \end{pmatrix}\mathbf{w}.$$

Minimization of $\Phi(\mathbf{w})$ subject to $\mathbf{1}'\mathbf{w} = 0$ is equivalent to minimization of $\Phi_A(\mathbf{w}^*)$ without constraint.


When multiple constraints are present, it may be difficult to reparameterize the optimization functions. Instead, a vector $\boldsymbol{\lambda} = (\lambda_1\ \lambda_2\ \cdots\ \lambda_J)'$ of Lagrangian multipliers can be introduced and the constraints incorporated in the minimization procedure by defining an alternative optimization function

$$\hat{\Phi}_R(\mathbf{w}, \boldsymbol{\lambda}) = \Phi(\mathbf{w}) + \mathbf{r}(\mathbf{w})'\boldsymbol{\lambda}. \tag{1.4}$$

A constrained minimum of $\Phi(\mathbf{w})$ corresponds to a stationary point of $\hat{\Phi}_R(\mathbf{w}, \boldsymbol{\lambda})$. Differentiating $\hat{\Phi}_R(\mathbf{w}, \boldsymbol{\lambda})$ with respect to the elements of both $\mathbf{w}$ and $\boldsymbol{\lambda}$, and setting the derivatives to zero, will lead to unique solutions for $\mathbf{w}$.
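The two approaches can be compared on a toy problem. The sketch below assumes a hypothetical quadratic objective $\Phi(\mathbf{w}) = (\mathbf{w}-\mathbf{a})'(\mathbf{w}-\mathbf{a})$ with the single constraint $\mathbf{1}'\mathbf{w} = 0$; the vector `a` and all names are illustrative only. The reparameterization uses a map of the form $\mathbf{w}^* = (w_1 - w_3,\ w_2 - w_3)'$, which together with the constraint determines $\mathbf{w}$ uniquely.

```python
import numpy as np

a = np.array([3.0, 1.0, 2.0])          # hypothetical target vector
ones = np.ones(3)

# Lagrangian approach: stationarity 2(w - a) + lambda * 1 = 0 together with 1'w = 0.
kkt = np.block([[2 * np.eye(3), ones[:, None]],
                [ones[None, :], np.zeros((1, 1))]])
rhs = np.concatenate([2 * a, [0.0]])
w_lagrange = np.linalg.solve(kkt, rhs)[:3]

# Reparameterization: w* = M w, and with 1'w = 0 this inverts to w = G w*.
M = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0, -1.0]])
full = np.vstack([M, ones])             # maps w to (w*, 1'w)
G = np.linalg.inv(full)[:, :2]          # columns multiplying w* when 1'w = 0

# Unconstrained least squares in the two free parameters w*.
w_star, *_ = np.linalg.lstsq(G, a, rcond=None)
w_reparam = G @ w_star

print(w_lagrange, w_reparam)
```

Both routes return the same constrained minimizer, here the centered vector `a - a.mean()`.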


CHAPTER 2

NOTATIONS, DEFINITIONS AND BASICS OF

RANDOM PERMUTATION MODELS

This Chapter sets up general notations used in this thesis (Section 2.1), defines

parameters of interest (Section 2.2), introduces the random permutation framework

(Section 2.3) and methods for simultaneously permuting multiple variables, and

establishes their relationship to seemingly unrelated regression models (Section 2.4). It

is shown that the sampling process can be represented as a partition of random matrices

(Section 2.5).

2.1. Notation

Random variables are represented with upper case Latin letters, for example, $Y$, and random vectors with bolded upper case letters, e.g., $\mathbf{Y}$. The value of a random variable is represented with a lower case letter, e.g., $y$; and a column vector of the values with a bolded lower case letter, e.g., $\mathbf{y}$.

Population parameters are represented using Greek letters such as $\mu$ for the mean and $\boldsymbol{\Sigma}$ for the variance-covariance matrix. Estimators of population parameters are denoted using a hat over the corresponding Greek letters, for example, $\hat{\mu}$ and $\hat{\boldsymbol{\Sigma}}$. The letter “T” is reserved for population totals, such as $T^{(1)}$ for the population total of a response variable $Y^{(1)}$.

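Matrices are represented using bolded upper or lower case letters. To simplify notation, we define notation to represent matrices that are frequently used in mathematical derivation (Harville 1997). A vector or matrix of zeros is represented with $\mathbf{0}$. For any finite integers $a$ and $b$,

1) Summing Vector $\mathbf{1}_a'$ is a $1 \times a$ row vector of ones,

2) Identity Matrix $\mathbf{I}_a$ is an $a \times a$ identity matrix,

3) J-Matrix $\mathbf{J}_a$ is an $a \times a$ matrix of ones,

4) Centering Matrix $\mathbf{P}_a = \mathbf{I}_a - \frac{1}{a}\mathbf{J}_a$,

5) Projection Matrix $\mathbf{P}_{\mathbf{X},\mathbf{W}} = \mathbf{X}(\mathbf{X}'\mathbf{W}\mathbf{X})^{-}\mathbf{X}'\mathbf{W}$, where $\mathbf{X}$ is any $a \times b$ matrix, $\mathbf{W}$ is any $a \times a$ matrix and $(\mathbf{X}'\mathbf{W}\mathbf{X})^{-}$ indicates a generalized inverse of $\mathbf{X}'\mathbf{W}\mathbf{X}$. When $\mathbf{W} = \mathbf{I}$, $\mathbf{P}_{\mathbf{X},\mathbf{I}}$ is simply denoted as $\mathbf{P}_{\mathbf{X}}$.

6) Another frequently used matrix is $\mathbf{P}_{n,N} = \mathbf{I}_n - \frac{1}{N}\mathbf{J}_n$.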

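For matrix operations, the transpose of a matrix $\mathbf{A}$ is represented as $\mathbf{A}'$, and the inverse of $\mathbf{A}$ is denoted as $\mathbf{A}^{-1}$ if it exists. The generalized inverse of $\mathbf{A}$ is denoted as $\mathbf{A}^{-}$. A block diagonal matrix of $\mathbf{A}_i$, $i = 1, 2, \ldots, N$, is represented using $\oplus_{i=1}^{N}\mathbf{A}_i$. The Kronecker product of $\mathbf{A}$ and $\mathbf{B}$ is represented using $\mathbf{A} \otimes \mathbf{B}$,

$$\mathbf{A}_{M \times N} \otimes \mathbf{B}_{K \times L} = \begin{bmatrix} a_{11}\mathbf{B} & a_{12}\mathbf{B} & \cdots & a_{1N}\mathbf{B} \\ a_{21}\mathbf{B} & a_{22}\mathbf{B} & \cdots & a_{2N}\mathbf{B} \\ \vdots & \vdots & & \vdots \\ a_{M1}\mathbf{B} & a_{M2}\mathbf{B} & \cdots & a_{MN}\mathbf{B} \end{bmatrix}.$$

The VEC operator is denoted as $\mathrm{vec}$, for example,

$$\mathrm{vec}\left(\mathbf{A}_{m \times n}\right) = \begin{pmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_n \end{pmatrix},$$

where $\mathbf{a}_i$, $i = 1, 2, \ldots, n$, is the $i$-th column of the $m \times n$ matrix $\mathbf{A}$.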
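For readers who want to experiment with this notation, the following sketch shows one way the centering matrix, the Kronecker product and the VEC operator could be written with NumPy. The helper names `centering` and `vec` and the example matrices are assumptions made for illustration; the last lines check the standard identity $\mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{B}' \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{X})$, which underlies several derivations in this chapter.

```python
import numpy as np

def centering(a):
    """Centering matrix P_a = I_a - (1/a) J_a."""
    return np.eye(a) - np.ones((a, a)) / a

def vec(A):
    """Stack the columns of A into one long vector (column-major order)."""
    return A.reshape(-1, order="F")

P4 = centering(4)

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # 3 x 2
X = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])     # 2 x 3
B = np.array([[2.0, 1.0], [0.0, 1.0], [1.0, 3.0]])   # 3 x 2

# Kronecker product via np.kron, and the identity vec(A X B) = (B' kron A) vec(X).
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(vec(A), np.allclose(lhs, rhs))
```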

2.2. Definition of population and population parameters

A finite population $\mathcal{P}$ consists of $N$ subjects with labels $s = 1, 2, \ldots, N$, where $N$ is known. The labels are non-informative and serve only to identify the units (Royall 1970). Associated with each subject (e.g., $s$) is a non-stochastic $(p+1) \times 1$ column vector $\mathbf{y}_s$, where $\mathbf{y}_s = \left(y_s^{(0)}\ y_s^{(1)}\ \cdots\ y_s^{(k)}\ \cdots\ y_s^{(p)}\right)'$, $y_s^{(0)}$ is the response variable under study and $y_s^{(k)}$, $k = 1, 2, \ldots, p$, are auxiliary variables.

The values of the $k$-th variable $y^{(k)}$, $k = 0, 1, \ldots, p$, for all subjects are represented using an $N \times 1$ column vector $\mathbf{y}^{(k)} = \left(y_1^{(k)}\ \cdots\ y_s^{(k)}\ \cdots\ y_N^{(k)}\right)'$. The values of all $p+1$ variables for all $N$ subjects can be represented with an $N \times (p+1)$ matrix $\mathbf{y}$, such that

$$\mathbf{y}_{N \times (p+1)} = \left(\mathbf{y}^{(0)}\ \mathbf{y}^{(1)}\ \cdots\ \mathbf{y}^{(k)}\ \cdots\ \mathbf{y}^{(p)}\right) = \begin{pmatrix} \mathbf{y}_1' \\ \mathbf{y}_2' \\ \vdots \\ \mathbf{y}_N' \end{pmatrix}.$$

Further, the population values are alternatively represented as a $(p+1)N \times 1$ column vector $\mathbf{z}$, such that $\mathbf{z} = \mathrm{vec}(\mathbf{y})$.


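For any variable $y^{(k)}$, $k = 0, 1, 2, \ldots, p$, we define population parameters as follows,

Mean: $\mu_k = \dfrac{1}{N}\mathbf{1}_N'\mathbf{y}^{(k)}$

Total: $T^{(k)} = \mathbf{1}_N'\mathbf{y}^{(k)}$

Ratio of $y^{(k)}$ to $y^{(k^*)}$: $R_{k,k^*} = T^{(k)}/T^{(k^*)}$

Variance: $\sigma_k^2 = \dfrac{1}{N-1}\mathbf{y}^{(k)\prime}\mathbf{P}_N\mathbf{y}^{(k)}$

Covariance between $y^{(k)}$ and $y^{(k^*)}$: $\sigma_{k,k^*} = \dfrac{1}{N-1}\mathbf{y}^{(k)\prime}\mathbf{P}_N\mathbf{y}^{(k^*)}$.

Variance-covariance matrix:

$$\boldsymbol{\Sigma}_{(p+1) \times (p+1)} = \begin{pmatrix} \sigma_0^2 & \sigma_{0,1} & \cdots & \sigma_{0,p} \\ \sigma_{1,0} & \sigma_1^2 & \cdots & \sigma_{1,p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p,0} & \sigma_{p,1} & \cdots & \sigma_p^2 \end{pmatrix}.$$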
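The matrix expressions above translate directly into code. The sketch below computes the parameters for a small hypothetical population with one response and one auxiliary variable; the data values and variable names are illustrative only.

```python
import numpy as np

# Hypothetical population: column 0 is the response, column 1 an auxiliary variable.
y = np.array([[2.0, 1.0],
              [5.0, 3.0],
              [7.0, 4.0],
              [10.0, 8.0]])
N = y.shape[0]
ones = np.ones(N)
P_N = np.eye(N) - np.ones((N, N)) / N

mu = ones @ y / N                 # means mu_k
T = ones @ y                      # totals T^(k)
R_01 = T[0] / T[1]                # ratio of variable 0 to variable 1
Sigma = y.T @ P_N @ y / (N - 1)   # variance-covariance matrix
z = y.reshape(-1, order="F")      # z = vec(y), columns stacked

print(mu, T, R_01)
print(Sigma)
```

The matrix `Sigma` agrees with the usual covariance with divisor $N-1$, for example as returned by `np.cov(y, rowvar=False)`.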

2.3. Single stage random permutation

Suppose that a sample of $n$ subjects is drawn from a population of size $N$ via simple random sampling without replacement (SRSWOR). A stochastic model can be defined as arising from a single-stage random permutation of the population. In a permutation, the position of a subject is indexed by $i = 1, 2, \ldots, N$; there will be a different subject, $s$, at each position, $i$. The subjects at the first $i = 1, 2, \ldots, n$ positions in a permutation constitute a sample from SRSWOR.


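We define a set of random variables and a stochastic model to represent the random permutation of the subjects in $\mathcal{P}$. First, we define a set of indicator random variables $U_{is}$, $i = 1, 2, \ldots, N$, that have a value of 1 if the subject in the $i$-th position in a permutation is subject $s$, and 0 otherwise. Let $\mathbf{U}_{N \times N} = \left(\mathbf{U}_{1\bullet}'\ \mathbf{U}_{2\bullet}'\ \cdots\ \mathbf{U}_{N\bullet}'\right)'$ represent a matrix of indicator random variables, where $\mathbf{U}_{i\bullet}' = (U_{i1}\ U_{i2}\ \cdots\ U_{iN})$. Then the vector of random variables representing a permutation of $\mathbf{y}^{(k)}$ is given by $\mathbf{Y}^{(k)} = \mathbf{U}\mathbf{y}^{(k)}$.

For a random permutation, $E(\mathbf{U}) = \frac{1}{N}\mathbf{J}_N$. The variance of $\mathbf{Y}^{(k)}$ can be expressed in terms of the variance and covariance of the elements in $\mathbf{U}$. Through proper arrangement in order, the variance for $\mathbf{U}$ can be written as

$$\mathrm{var}\left[\mathrm{vec}(\mathbf{U})\right] = \frac{1}{N-1}\mathbf{P}_N \otimes \mathbf{P}_N. \tag{2.1}$$

Derivation of (2.1) is presented in Section 2.6.

Since $\mathbf{Y}^{(k)} = \mathbf{U}\mathbf{y}^{(k)} = \left(\mathbf{y}^{(k)\prime} \otimes \mathbf{I}_N\right)\mathrm{vec}(\mathbf{U})$, we have

$$E\left(\mathbf{Y}^{(k)}\right) = \mu_k\mathbf{1}_N, \quad\text{and}$$

$$\mathrm{var}\left(\mathbf{Y}^{(k)}\right) = \sigma_k^2\mathbf{P}_N.$$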
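Expression (2.1) and the Kronecker representation of $\mathbf{Y}^{(k)}$ can be confirmed by brute force for a small $N$. This is only an illustrative check under hypothetical values ($N = 3$, an arbitrary $\mathbf{y}$), not part of the derivation.

```python
import itertools
import numpy as np

N = 3
P_N = np.eye(N) - np.ones((N, N)) / N

# Build vec(U) for every permutation; U[i, s] = 1 if position i holds subject s.
vec_U = []
for perm in itertools.permutations(range(N)):
    U = np.zeros((N, N))
    U[np.arange(N), perm] = 1.0
    vec_U.append(U.reshape(-1, order="F"))
vec_U = np.array(vec_U)

# Exact moments over the N! equally likely permutations.
E_U = vec_U.mean(axis=0).reshape(N, N, order="F")
cov_vec_U = np.cov(vec_U, rowvar=False, bias=True)

# One realized permutation, to check Y = U y = (y' kron I_N) vec(U).
y = np.array([2.0, 5.0, 7.0])            # hypothetical population values
U_one = np.zeros((N, N))
U_one[np.arange(N), [2, 0, 1]] = 1.0
Y_direct = U_one @ y
Y_kron = np.kron(y[None, :], np.eye(N)) @ U_one.reshape(-1, order="F")

print(np.allclose(cov_vec_U, np.kron(P_N, P_N) / (N - 1)))
```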

2.4. Simultaneous permutation of multiple variables

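We develop expressions for simultaneous permutation of multiple variables, first considering the joint permutation of two variables. The results can be readily extended to scenarios with multiple variables. The joint permutation of both $\mathbf{y}^{(0)}$ and $\mathbf{y}^{(1)}$ can be expressed as

$$\begin{pmatrix} \mathbf{Y}^{(0)} \\ \mathbf{Y}^{(1)} \end{pmatrix} = \left(\mathbf{I}_2 \otimes \mathbf{U}\right)\begin{pmatrix} \mathbf{y}^{(0)} \\ \mathbf{y}^{(1)} \end{pmatrix}. \tag{2.2}$$

Since $E(\mathbf{U}) = \dfrac{\mathbf{J}_N}{N}$, it is easy to show that

$$E\begin{pmatrix} \mathbf{Y}^{(0)} \\ \mathbf{Y}^{(1)} \end{pmatrix} = \left(\mathbf{I}_2 \otimes \mathbf{1}_N\right)\begin{pmatrix} \mu_0 \\ \mu_1 \end{pmatrix}, \tag{2.3}$$

and

$$\mathrm{cov}\begin{pmatrix} \mathbf{Y}^{(0)} \\ \mathbf{Y}^{(1)} \end{pmatrix} = \boldsymbol{\Sigma}_{0,1} \otimes \mathbf{P}_N, \tag{2.4}$$

where $\boldsymbol{\Sigma}_{0,1} = \begin{pmatrix} \sigma_0^2 & \sigma_{0,1} \\ \sigma_{1,0} & \sigma_1^2 \end{pmatrix}$. Derivation of (2.4) is given in Section 2.7.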

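When there are $p+1$ variables of interest, the matrix of random variables representing a permutation is given by

$$\mathbf{Y}_{N \times (p+1)} = \mathbf{U}_{N \times N}\,\mathbf{y}_{N \times (p+1)}, \tag{2.5}$$

where $\mathbf{Y} = \left(\mathbf{Y}^{(0)}\ \mathbf{Y}^{(1)}\ \cdots\ \mathbf{Y}^{(k)}\ \cdots\ \mathbf{Y}^{(p)}\right)$, and $\mathbf{Y}^{(k)}$ is the random variable corresponding to variable $\mathbf{y}^{(k)}$, $k = 0, 1, \ldots, p$. Denote a column vector $\mathbf{Z}_{(p+1)N \times 1}$ as

$$\mathbf{Z} = \mathrm{vec}(\mathbf{Y}) = \left(\mathbf{I}_{p+1} \otimes \mathbf{U}\right)\mathrm{vec}(\mathbf{y}) = \left(\mathbf{I}_{p+1} \otimes \mathbf{U}\right)\mathbf{z}, \tag{2.6}$$

where $\mathbf{z}_{(p+1)N \times 1} = \mathrm{vec}(\mathbf{y})$.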


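Extending (2.3) and (2.4), which correspond to the case $p = 1$, it can be shown that the expected value and variance-covariance matrix of $\mathbf{Z}$ are

$$E(\mathbf{Z}) = \left(\mathbf{I}_{p+1} \otimes \mathbf{1}_N\right)\boldsymbol{\mu}, \tag{2.7}$$

$$\mathrm{cov}(\mathbf{Z}) = \boldsymbol{\Sigma} \otimes \mathbf{P}_N, \tag{2.8}$$

where $\boldsymbol{\mu} = (\mu_0\ \mu_1\ \cdots\ \mu_p)'$ and $\boldsymbol{\Sigma}$ is as defined in Section 2.2.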
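Expressions (2.7) and (2.8) can be verified exactly on a toy population by permuting the rows of $\mathbf{y}$ jointly and averaging over all permutations. The data below are hypothetical and the code is an illustrative check only.

```python
import itertools
import numpy as np

# Hypothetical N x (p+1) population matrix with p + 1 = 2 variables.
y = np.array([[2.0, 1.0],
              [5.0, 3.0],
              [7.0, 4.0],
              [10.0, 8.0]])
N, k = y.shape
P_N = np.eye(N) - np.ones((N, N)) / N
mu = y.mean(axis=0)
Sigma = y.T @ P_N @ y / (N - 1)

# Z = vec(U y) for each permutation: the rows of y are permuted together,
# so response and auxiliary values of a subject stay linked.
Z_all = np.array([y[list(perm), :].reshape(-1, order="F")
                  for perm in itertools.permutations(range(N))])

E_Z = Z_all.mean(axis=0)
cov_Z = np.cov(Z_all, rowvar=False, bias=True)

print(np.allclose(E_Z, np.kron(np.eye(k), np.ones((N, 1))) @ mu))
print(np.allclose(cov_Z, np.kron(Sigma, P_N)))
```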

2.5. Sampling and partition of random matrices

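A sample of size $n$, $n \le N$, drawn by an SRSWOR scheme from $\mathcal{P}$ is thus viewed as the first $n$ subjects in a random permutation. Therefore, the random vector $\mathbf{Z}$ can be partitioned into a sample part (indexed as I) and a remaining part (indexed as II). The elements of $\mathbf{Z}$ can be rearranged into the sampled and remaining portions through pre-multiplication by a properly specified positioning matrix $\mathbf{K}^*$, i.e.,

$$\mathbf{Z}^* = \mathbf{K}^*\mathbf{Z} = \begin{pmatrix} \mathbf{Z}_I \\ \mathbf{Z}_{II} \end{pmatrix}.$$

For example, if $\mathbf{Z}$ involves 2 variables,

$$\mathbf{K}^* = \begin{pmatrix} \mathbf{I}_2 \otimes \left(\mathbf{I}_n\ \ \mathbf{0}_{n \times (N-n)}\right) \\ \mathbf{I}_2 \otimes \left(\mathbf{0}_{(N-n) \times n}\ \ \mathbf{I}_{N-n}\right) \end{pmatrix}.$$

Similarly, if $\mathbf{Z}$ involves $p+1$ variables,

$$\mathbf{K}^* = \begin{pmatrix} \mathbf{I}_{p+1} \otimes \left(\mathbf{I}_n\ \ \mathbf{0}_{n \times (N-n)}\right) \\ \mathbf{I}_{p+1} \otimes \left(\mathbf{0}_{(N-n) \times n}\ \ \mathbf{I}_{N-n}\right) \end{pmatrix}. \tag{2.9}$$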

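The expected values of $\mathbf{Z}^*$ are

$$E\left(\mathbf{Z}^*\right) = E\begin{pmatrix} \mathbf{Z}_I \\ \mathbf{Z}_{II} \end{pmatrix} = \begin{pmatrix} \mathbf{I}_{p+1} \otimes \mathbf{1}_n \\ \mathbf{I}_{p+1} \otimes \mathbf{1}_{N-n} \end{pmatrix}\boldsymbol{\mu}.$$

Using this notation, we can derive the expression for the partitioned variance-covariance matrix as follows. The partitioned variance-covariance matrix is defined by

$$\mathbf{V} = \mathrm{cov}\begin{pmatrix} \mathbf{Z}_I \\ \mathbf{Z}_{II} \end{pmatrix} = \begin{pmatrix} \mathbf{V}_I & \mathbf{V}_{I,II} \\ \mathbf{V}_{II,I} & \mathbf{V}_{II} \end{pmatrix}, \tag{2.10}$$

where $\mathbf{V}_I = \boldsymbol{\Sigma} \otimes \mathbf{P}_{n,N}$, $\mathbf{V}_{I,II} = \mathbf{V}_{II,I}' = \boldsymbol{\Sigma} \otimes \left(-\frac{1}{N}\mathbf{J}_{n \times (N-n)}\right)$ and $\mathbf{V}_{II} = \boldsymbol{\Sigma} \otimes \mathbf{P}_{(N-n),N}$. Furthermore, it can be shown that

$$\mathbf{V}_I^{-1} = \boldsymbol{\Sigma}^{-1} \otimes \left(\mathbf{I}_n + \frac{1}{N-n}\mathbf{J}_n\right). \tag{2.11}$$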

Detailed derivations for the above expressions can be found in Section 2.8.
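The partition can also be checked numerically. The sketch below builds $\mathbf{K}^*$ as in (2.9) for hypothetical sizes ($N = 5$, $n = 2$, two variables) and an arbitrary positive definite $\boldsymbol{\Sigma}$, then compares the blocks of $\mathbf{K}^*(\boldsymbol{\Sigma} \otimes \mathbf{P}_N)\mathbf{K}^{*\prime}$ with the expressions in (2.10) and (2.11). All numbers and names are illustrative assumptions.

```python
import numpy as np

N, n, k = 5, 2, 2                                # k = p + 1 variables (hypothetical sizes)
Sigma = np.array([[4.0, 1.5], [1.5, 3.0]])       # hypothetical variance-covariance matrix
P_N = np.eye(N) - np.ones((N, N)) / N

# Positioning matrix K* of (2.9): sampled positions first, then the remainder.
top = np.hstack([np.eye(n), np.zeros((n, N - n))])          # (I_n  0)
bottom = np.hstack([np.zeros((N - n, n)), np.eye(N - n)])   # (0  I_{N-n})
K_star = np.vstack([np.kron(np.eye(k), top), np.kron(np.eye(k), bottom)])

# Partitioned variance-covariance matrix of (Z_I, Z_II).
V = K_star @ np.kron(Sigma, P_N) @ K_star.T
m = k * n
V_I, V_I_II, V_II = V[:m, :m], V[:m, m:], V[m:, m:]

# Closed forms from (2.10) and (2.11).
P_nN = np.eye(n) - np.ones((n, n)) / N
P_rest = np.eye(N - n) - np.ones((N - n, N - n)) / N
V_I_inv = np.kron(np.linalg.inv(Sigma), np.eye(n) + np.ones((n, n)) / (N - n))

print(np.allclose(V_I, np.kron(Sigma, P_nN)), np.allclose(V_I @ V_I_inv, np.eye(m)))
```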

2.6. Derivation of variance-covariance matrix of $\mathrm{vec}(\mathbf{U})$

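Since $\mathbf{U}_{N \times N} = \left(\mathbf{U}_{1\bullet}'\ \mathbf{U}_{2\bullet}'\ \cdots\ \mathbf{U}_{N\bullet}'\right)'$ represents a matrix of indicator random variables, where $\mathbf{U}_{i\bullet}' = (U_{i1}\ U_{i2}\ \cdots\ U_{iN})$,

$$\mathrm{cov}\left[\mathrm{vec}(\mathbf{U})\right] = \begin{pmatrix} \mathrm{cov}(\mathbf{U}_{1\bullet}, \mathbf{U}_{1\bullet}) & \mathrm{cov}(\mathbf{U}_{1\bullet}, \mathbf{U}_{2\bullet}) & \cdots & \mathrm{cov}(\mathbf{U}_{1\bullet}, \mathbf{U}_{N\bullet}) \\ \mathrm{cov}(\mathbf{U}_{2\bullet}, \mathbf{U}_{1\bullet}) & \mathrm{cov}(\mathbf{U}_{2\bullet}, \mathbf{U}_{2\bullet}) & \cdots & \mathrm{cov}(\mathbf{U}_{2\bullet}, \mathbf{U}_{N\bullet}) \\ \vdots & \vdots & & \vdots \\ \mathrm{cov}(\mathbf{U}_{N\bullet}, \mathbf{U}_{1\bullet}) & \mathrm{cov}(\mathbf{U}_{N\bullet}, \mathbf{U}_{2\bullet}) & \cdots & \mathrm{cov}(\mathbf{U}_{N\bullet}, \mathbf{U}_{N\bullet}) \end{pmatrix}. \tag{2.12}$$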

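Since $\mathrm{cov}(\mathbf{U}_{i\bullet}, \mathbf{U}_{i^*\bullet}) = E\left(\mathbf{U}_{i\bullet}\mathbf{U}_{i^*\bullet}'\right) - E\left(\mathbf{U}_{i\bullet}\right)E\left(\mathbf{U}_{i^*\bullet}'\right)$, $E\left(\mathbf{U}_{i\bullet}\right) = E\left(\mathbf{U}_{i^*\bullet}\right) = \dfrac{\mathbf{1}_N}{N}$, and

$$E\left(\mathbf{U}_{i\bullet}\mathbf{U}_{i^*\bullet}'\right) = \begin{cases} \dfrac{1}{N}\mathbf{I}_N, & \text{when } i = i^*, \\[1ex] -\dfrac{1}{N(N-1)}\left(\mathbf{I}_N - \mathbf{J}_N\right), & \text{when } i \neq i^*, \end{cases}$$

it is readily shown that

$$\mathrm{cov}(\mathbf{U}_{i\bullet}, \mathbf{U}_{i^*\bullet}) = E\left(\mathbf{U}_{i\bullet}\mathbf{U}_{i^*\bullet}'\right) - E\left(\mathbf{U}_{i\bullet}\right)E\left(\mathbf{U}_{i^*\bullet}'\right) = \begin{cases} \dfrac{1}{N}\mathbf{P}_N, & i = i^*, \\[1ex] -\dfrac{1}{N(N-1)}\mathbf{P}_N, & i \neq i^*, \end{cases} \tag{2.13}$$

where $\mathbf{P}_N = \mathbf{I}_N - \dfrac{\mathbf{J}_N}{N}$. Applying (2.13) to (2.12), we have

$$\mathrm{cov}\left[\mathrm{vec}(\mathbf{U})\right] = \frac{1}{N-1}\mathbf{P}_N \otimes \mathbf{P}_N.$$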

2.7. Derivation of variance-covariance matrix in Section 2.4.

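Simultaneous permutation of two variables can be expressed as

$$\begin{pmatrix} \mathbf{Y}^{(0)} \\ \mathbf{Y}^{(1)} \end{pmatrix} = \left(\mathbf{I}_2 \otimes \mathbf{U}\right)\begin{pmatrix} \mathbf{y}^{(0)} \\ \mathbf{y}^{(1)} \end{pmatrix}. \tag{2.14}$$

In order to get the variance-covariance matrix, we can rewrite (2.14) as (2.15),

$$\begin{pmatrix} \mathbf{Y}^{(0)} \\ \mathbf{Y}^{(1)} \end{pmatrix} = \left[\left(\mathbf{y}^{(0)\prime}\ \ \mathbf{y}^{(1)\prime}\right) \otimes \mathbf{I}_{2N}\right]\mathrm{vec}\left(\mathbf{I}_2 \otimes \mathbf{U}\right). \tag{2.15}$$

We use the identity $\mathrm{vec}\left(\mathbf{A}_{m \times n} \otimes \mathbf{B}_{p \times q}\right) = \left(\mathbf{I}_n \otimes \mathbf{K}_{qm} \otimes \mathbf{I}_p\right)\left[\mathrm{vec}\left(\mathbf{A}_{m \times n}\right) \otimes \mathrm{vec}\left(\mathbf{B}_{p \times q}\right)\right]$, where $\mathbf{K}_{qm}$ is a vec-permutation matrix. In general, the $[(i-1)n + j]$-th row of the vec-permutation matrix $\mathbf{K}_{mn}$ is the $[(j-1)m + i]$-th row of $\mathbf{I}_{mn}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$) (see Harville 1997, pages 345 and 349). Therefore,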


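$$\begin{aligned} \mathrm{vec}\left(\mathbf{I}_2 \otimes \mathbf{U}\right) &= \left(\mathbf{I}_2 \otimes \mathbf{K}_{2N} \otimes \mathbf{I}_N\right)\left[\mathrm{vec}(\mathbf{I}_2) \otimes \mathrm{vec}(\mathbf{U})\right] \\ &= \left(\mathbf{I}_2 \otimes \mathbf{K}_{2N} \otimes \mathbf{I}_N\right)\begin{pmatrix} \mathrm{vec}(\mathbf{U}) \\ \mathbf{0} \\ \mathbf{0} \\ \mathrm{vec}(\mathbf{U}) \end{pmatrix} \\ &= \left(\mathbf{I}_2 \otimes \mathbf{K}_{2N} \otimes \mathbf{I}_N\right)\mathbf{U}^*, \end{aligned} \tag{2.16}$$

where $\mathbf{U}^* = \left(\mathrm{vec}(\mathbf{U})'\ \ \mathbf{0}'\ \ \mathbf{0}'\ \ \mathrm{vec}(\mathbf{U})'\right)'$.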

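Applying (2.16) to (2.15), we have

$$\begin{aligned} \begin{pmatrix} \mathbf{Y}^{(0)} \\ \mathbf{Y}^{(1)} \end{pmatrix} &= \left[\left(\mathbf{y}^{(0)\prime}\ \ \mathbf{y}^{(1)\prime}\right) \otimes \mathbf{I}_2 \otimes \mathbf{I}_N\right]\left(\mathbf{I}_2 \otimes \mathbf{K}_{2N} \otimes \mathbf{I}_N\right)\mathbf{U}^* \\ &= \left\{\left[\left(\left(\mathbf{y}^{(0)\prime}\ \ \mathbf{y}^{(1)\prime}\right) \otimes \mathbf{I}_2\right)\left(\mathbf{I}_2 \otimes \mathbf{K}_{2N}\right)\right] \otimes \mathbf{I}_N\right\}\mathbf{U}^*, \end{aligned} \tag{2.17}$$

where $\mathbf{K}_{2N}$ is a $2N \times 2N$ matrix, comprising $N$ rows and 2 columns of $2 \times N$ dimensional blocks whose $ij$-th element is 1 and whose other $(2N - 1)$ elements are 0. $\mathbf{K}_{2N}$ can be written explicitly as

$$\mathbf{K}_{2N} = \begin{pmatrix} \mathbf{u}_1\mathbf{e}_1' & \mathbf{u}_2\mathbf{e}_1' \\ \mathbf{u}_1\mathbf{e}_2' & \mathbf{u}_2\mathbf{e}_2' \\ \vdots & \vdots \\ \mathbf{u}_1\mathbf{e}_N' & \mathbf{u}_2\mathbf{e}_N' \end{pmatrix}, \quad\text{e.g., for } N = 3,\quad \mathbf{K}_{2N} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix},$$

where $\mathbf{u}_i$ is the $i$-th column of $\mathbf{I}_2$, $i = 1, 2$, and $\mathbf{e}_j$ is the $j$-th column of $\mathbf{I}_N$, $j = 1, 2, \ldots, N$. Observing that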


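$$\begin{aligned} \mathbf{y}^{(0)\prime} \otimes \mathbf{I}_2 &= \begin{pmatrix} y_1^{(0)} & 0 & y_2^{(0)} & 0 & \cdots & y_N^{(0)} & 0 \\ 0 & y_1^{(0)} & 0 & y_2^{(0)} & \cdots & 0 & y_N^{(0)} \end{pmatrix} \\ &= \begin{pmatrix} \mathbf{y}^{(0)\prime}\mathbf{e}_1\mathbf{u}_1' & \mathbf{y}^{(0)\prime}\mathbf{e}_2\mathbf{u}_1' & \cdots & \mathbf{y}^{(0)\prime}\mathbf{e}_N\mathbf{u}_1' \\ \mathbf{y}^{(0)\prime}\mathbf{e}_1\mathbf{u}_2' & \mathbf{y}^{(0)\prime}\mathbf{e}_2\mathbf{u}_2' & \cdots & \mathbf{y}^{(0)\prime}\mathbf{e}_N\mathbf{u}_2' \end{pmatrix} \\ &= \begin{pmatrix} \mathbf{y}^{(0)\prime} & \mathbf{0}' \\ \mathbf{0}' & \mathbf{y}^{(0)\prime} \end{pmatrix}\begin{pmatrix} \mathbf{e}_1\mathbf{u}_1' & \mathbf{e}_2\mathbf{u}_1' & \cdots & \mathbf{e}_N\mathbf{u}_1' \\ \mathbf{e}_1\mathbf{u}_2' & \mathbf{e}_2\mathbf{u}_2' & \cdots & \mathbf{e}_N\mathbf{u}_2' \end{pmatrix} \\ &= \left(\mathbf{I}_2 \otimes \mathbf{y}^{(0)\prime}\right)\mathbf{K}_{2N}' \end{aligned}$$

and $\mathbf{K}_{2N}'\mathbf{K}_{2N} = \mathbf{I}_{2N}$, we can simplify (2.17) as follows (for matrix operations, see pages 348 and 349 of Harville 1997),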

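$$\begin{aligned} \begin{pmatrix} \mathbf{Y}^{(0)} \\ \mathbf{Y}^{(1)} \end{pmatrix} &= \left\{\left[\left(\mathbf{y}^{(0)\prime} \otimes \mathbf{I}_2\right)\mathbf{K}_{2N}\ \ \left(\mathbf{y}^{(1)\prime} \otimes \mathbf{I}_2\right)\mathbf{K}_{2N}\right] \otimes \mathbf{I}_N\right\}\mathbf{U}^* \\ &= \left\{\left[\left(\mathbf{I}_2 \otimes \mathbf{y}^{(0)\prime}\right)\mathbf{K}_{2N}'\mathbf{K}_{2N}\ \ \left(\mathbf{I}_2 \otimes \mathbf{y}^{(1)\prime}\right)\mathbf{K}_{2N}'\mathbf{K}_{2N}\right] \otimes \mathbf{I}_N\right\}\mathbf{U}^* \\ &= \left\{\left[\left(\mathbf{I}_2 \otimes \mathbf{y}^{(0)\prime}\right)\mathbf{I}_{2N}\ \ \left(\mathbf{I}_2 \otimes \mathbf{y}^{(1)\prime}\right)\mathbf{I}_{2N}\right] \otimes \mathbf{I}_N\right\}\mathbf{U}^* \\ &= \left\{\left[\left(\mathbf{I}_2 \otimes \mathbf{y}^{(0)\prime}\right)\ \ \left(\mathbf{I}_2 \otimes \mathbf{y}^{(1)\prime}\right)\right] \otimes \mathbf{I}_N\right\}\mathbf{U}^* \\ &= \left[\begin{pmatrix} \mathbf{y}^{(0)\prime} & \mathbf{0}' & \mathbf{y}^{(1)\prime} & \mathbf{0}' \\ \mathbf{0}' & \mathbf{y}^{(0)\prime} & \mathbf{0}' & \mathbf{y}^{(1)\prime} \end{pmatrix} \otimes \mathbf{I}_N\right]\mathbf{U}^*. \end{aligned}$$

Consequently,

$$\mathrm{cov}\begin{pmatrix} \mathbf{Y}^{(0)} \\ \mathbf{Y}^{(1)} \end{pmatrix} = \left\{\left[\left(\mathbf{I}_2 \otimes \mathbf{y}^{(0)\prime}\right)\ \ \left(\mathbf{I}_2 \otimes \mathbf{y}^{(1)\prime}\right)\right] \otimes \mathbf{I}_N\right\}\mathrm{cov}\left(\mathbf{U}^*\right)\left\{\begin{pmatrix} \mathbf{I}_2 \otimes \mathbf{y}^{(0)} \\ \mathbf{I}_2 \otimes \mathbf{y}^{(1)} \end{pmatrix} \otimes \mathbf{I}_N\right\}.$$

As shown in Section 2.6, the variance matrix for $\mathbf{U}$ is

$$\mathrm{cov}\left[\mathrm{vec}(\mathbf{U})\right] = \frac{1}{N-1}\mathbf{P}_N \otimes \mathbf{P}_N,$$

where $\mathbf{P}_N = \mathbf{I}_N - \dfrac{\mathbf{J}_N}{N}$. Therefore,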


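$$\begin{aligned} \mathrm{cov}\left(\mathbf{U}^*\right) = \mathrm{cov}\begin{pmatrix} \mathrm{vec}(\mathbf{U}) \\ \mathbf{0} \\ \mathbf{0} \\ \mathrm{vec}(\mathbf{U}) \end{pmatrix} &= \begin{pmatrix} \mathrm{cov}\left[\mathrm{vec}(\mathbf{U})\right] & \mathbf{0} & \mathbf{0} & \mathrm{cov}\left[\mathrm{vec}(\mathbf{U}), \mathrm{vec}(\mathbf{U})\right] \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathrm{cov}\left[\mathrm{vec}(\mathbf{U}), \mathrm{vec}(\mathbf{U})\right] & \mathbf{0} & \mathbf{0} & \mathrm{cov}\left[\mathrm{vec}(\mathbf{U})\right] \end{pmatrix} \\ &= \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix} \otimes \mathrm{cov}\left[\mathrm{vec}(\mathbf{U})\right] \\ &= \left[\begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 & 1 \end{pmatrix}\right] \otimes \mathrm{cov}\left[\mathrm{vec}(\mathbf{U})\right] \\ &= \frac{1}{N-1}\mathbf{L}\mathbf{L}' \otimes \mathbf{P}_N \otimes \mathbf{P}_N, \end{aligned} \tag{2.18}$$

where $\mathbf{L}' = (1\ 0\ 0\ 1)$.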

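Because this is a patterned matrix, we can further simplify it as follows.

$$\begin{aligned} \mathrm{cov}\begin{pmatrix} \mathbf{Y}^{(0)} \\ \mathbf{Y}^{(1)} \end{pmatrix} &= \frac{1}{N-1}\left\{\left[\left(\mathbf{I}_2 \otimes \mathbf{y}^{(0)\prime}\right)\ \ \left(\mathbf{I}_2 \otimes \mathbf{y}^{(1)\prime}\right)\right] \otimes \mathbf{I}_N\right\}\left(\mathbf{L}\mathbf{L}' \otimes \mathbf{P}_N \otimes \mathbf{P}_N\right)\left\{\begin{pmatrix} \mathbf{I}_2 \otimes \mathbf{y}^{(0)} \\ \mathbf{I}_2 \otimes \mathbf{y}^{(1)} \end{pmatrix} \otimes \mathbf{I}_N\right\} \\ &= \frac{1}{N-1}\left\{\begin{pmatrix} \mathbf{y}^{(0)\prime} & \mathbf{0}' & \mathbf{y}^{(1)\prime} & \mathbf{0}' \\ \mathbf{0}' & \mathbf{y}^{(0)\prime} & \mathbf{0}' & \mathbf{y}^{(1)\prime} \end{pmatrix}\begin{pmatrix} \mathbf{P}_N & \mathbf{0} & \mathbf{0} & \mathbf{P}_N \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{P}_N & \mathbf{0} & \mathbf{0} & \mathbf{P}_N \end{pmatrix}\begin{pmatrix} \mathbf{y}^{(0)} & \mathbf{0} \\ \mathbf{0} & \mathbf{y}^{(0)} \\ \mathbf{y}^{(1)} & \mathbf{0} \\ \mathbf{0} & \mathbf{y}^{(1)} \end{pmatrix}\right\} \otimes \mathbf{P}_N \\ &= \frac{1}{N-1}\begin{pmatrix} \mathbf{y}^{(0)\prime}\mathbf{P}_N\mathbf{y}^{(0)} & \mathbf{y}^{(0)\prime}\mathbf{P}_N\mathbf{y}^{(1)} \\ \mathbf{y}^{(1)\prime}\mathbf{P}_N\mathbf{y}^{(0)} & \mathbf{y}^{(1)\prime}\mathbf{P}_N\mathbf{y}^{(1)} \end{pmatrix} \otimes \mathbf{P}_N \\ &= \begin{pmatrix} \sigma_0^2 & \sigma_{0,1} \\ \sigma_{0,1} & \sigma_1^2 \end{pmatrix} \otimes \mathbf{P}_N \\ &= \boldsymbol{\Sigma}_{0,1} \otimes \mathbf{P}_N, \end{aligned}$$

where $\boldsymbol{\Sigma}_{0,1} = \begin{pmatrix} \sigma_0^2 & \sigma_{0,1} \\ \sigma_{0,1} & \sigma_1^2 \end{pmatrix}$.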
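The Kronecker manipulations in this section can be checked numerically for a small $N$. The sketch below constructs $\mathbf{K}_{2N}$ from the blocks $\mathbf{u}_i\mathbf{e}_j'$ and verifies the identities used above, including $\mathbf{y}^{(0)\prime} \otimes \mathbf{I}_2 = (\mathbf{I}_2 \otimes \mathbf{y}^{(0)\prime})\mathbf{K}_{2N}'$ and (2.16). The numerical values, the chosen permutation, and the helper `vec` are illustrative assumptions.

```python
import numpy as np

N = 3
I2, IN = np.eye(2), np.eye(N)

# K_{2N}: N block rows and 2 block columns, block (j, i) = u_i e_j' (each 2 x N).
K = np.block([[np.outer(I2[:, i], IN[:, j]) for i in range(2)] for j in range(N)])

def vec(A):
    """Stack the columns of A (column-major order)."""
    return A.reshape(-1, order="F")

y0 = np.array([[2.0, 5.0, 7.0]])                          # hypothetical y^(0)' as a 1 x N row
A = np.arange(1.0, 2 * N + 1).reshape(N, 2, order="F")    # an arbitrary N x 2 matrix
U = IN[[2, 0, 1], :]                                      # one realized permutation matrix

# (2.16): vec(I_2 kron U) = (I_2 kron K_{2N} kron I_N) [vec(I_2) kron vec(U)].
lhs_216 = vec(np.kron(I2, U))
rhs_216 = np.kron(np.kron(I2, K), IN) @ np.kron(vec(I2), vec(U))

print(np.allclose(K @ vec(A), vec(A.T)))                       # K_{2N} acts as a vec-permutation
print(np.allclose(np.kron(y0, I2), np.kron(I2, y0) @ K.T))     # y' kron I_2 = (I_2 kron y') K'
print(np.allclose(lhs_216, rhs_216))
```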

2.8. Derivation of variance-covariance matrix of the partitioned matrix

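$\mathbf{Z} = \left(\mathbf{Y}^{(0)\prime}\ \mathbf{Y}^{(1)\prime}\ \cdots\ \mathbf{Y}^{(p)\prime}\right)'$ can be partitioned into a sample part (indexed as I) and a remainder part (indexed as II). After sampling is completed, the values for the sampled portion $\mathbf{Y}_I^{(0)}, \mathbf{Y}_I^{(1)}, \ldots, \mathbf{Y}_I^{(p)}$ will be known, and can be represented as realized values. Through proper specification of the permutation matrix $\mathbf{K}^*$, we can rearrange the elements of $\mathbf{Y}^{(0)}, \mathbf{Y}^{(1)}, \ldots, \mathbf{Y}^{(p)}$ into the sampled and remaining portions by pre-multiplication by $\mathbf{K}^*$. We can write

$$\mathbf{K}^*\begin{pmatrix} \mathbf{Y}^{(0)} \\ \mathbf{Y}^{(1)} \\ \vdots \\ \mathbf{Y}^{(p)} \end{pmatrix} = \begin{pmatrix} \mathbf{I}_{p+1} \otimes \left(\mathbf{I}_n\ \ \mathbf{0}_{n \times (N-n)}\right) \\ \mathbf{I}_{p+1} \otimes \left(\mathbf{0}_{(N-n) \times n}\ \ \mathbf{I}_{N-n}\right) \end{pmatrix}\begin{pmatrix} \mathbf{Y}^{(0)} \\ \mathbf{Y}^{(1)} \\ \vdots \\ \mathbf{Y}^{(p)} \end{pmatrix} = \begin{pmatrix} \mathbf{Z}_I \\ \mathbf{Z}_{II} \end{pmatrix}, \tag{2.19}$$

where $\mathbf{K}^*$ is as defined in (2.9), $\mathbf{Z}_I = \left(\mathbf{Y}_I^{(0)\prime}\ \mathbf{Y}_I^{(1)\prime}\ \cdots\ \mathbf{Y}_I^{(p)\prime}\right)'$ and $\mathbf{Z}_{II} = \left(\mathbf{Y}_{II}^{(0)\prime}\ \mathbf{Y}_{II}^{(1)\prime}\ \cdots\ \mathbf{Y}_{II}^{(p)\prime}\right)'$.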

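Using this notation, we can derive the expression for the partitioned variance-covariance matrix as follows. The partitioned variance-covariance matrix is thus

$$\mathbf{V} = \mathrm{cov}\begin{pmatrix} \mathbf{Z}_I \\ \mathbf{Z}_{II} \end{pmatrix} = \begin{pmatrix} \mathbf{V}_I & \mathbf{V}_{I,II} \\ \mathbf{V}_{II,I} & \mathbf{V}_{II} \end{pmatrix} = \mathbf{K}^*\,\mathrm{cov}(\mathbf{Z})\,\mathbf{K}^{*\prime}.$$

Since $\mathrm{cov}(\mathbf{Z}) = \boldsymbol{\Sigma} \otimes \mathbf{P}_N$, the above expression can be simplified as

$$\mathbf{V} = \mathbf{K}^*\left(\boldsymbol{\Sigma} \otimes \mathbf{P}_N\right)\mathbf{K}^{*\prime}. \tag{2.20}$$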


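$\mathbf{V}$ can be simplified as follows,

$$\begin{aligned} \mathbf{V} &= \begin{pmatrix} \mathbf{I}_{p+1} \otimes \left(\mathbf{I}_n\ \ \mathbf{0}_{n \times (N-n)}\right) \\ \mathbf{I}_{p+1} \otimes \left(\mathbf{0}_{(N-n) \times n}\ \ \mathbf{I}_{N-n}\right) \end{pmatrix}\left(\boldsymbol{\Sigma} \otimes \mathbf{P}_N\right)\begin{pmatrix} \mathbf{I}_{p+1} \otimes \begin{pmatrix} \mathbf{I}_n \\ \mathbf{0}_{(N-n) \times n} \end{pmatrix} & \mathbf{I}_{p+1} \otimes \begin{pmatrix} \mathbf{0}_{n \times (N-n)} \\ \mathbf{I}_{N-n} \end{pmatrix} \end{pmatrix} \\ &= \begin{pmatrix} \boldsymbol{\Sigma} \otimes \left[\left(\mathbf{I}_n\ \ \mathbf{0}\right)\mathbf{P}_N\begin{pmatrix} \mathbf{I}_n \\ \mathbf{0} \end{pmatrix}\right] & \boldsymbol{\Sigma} \otimes \left[\left(\mathbf{I}_n\ \ \mathbf{0}\right)\mathbf{P}_N\begin{pmatrix} \mathbf{0} \\ \mathbf{I}_{N-n} \end{pmatrix}\right] \\ \boldsymbol{\Sigma} \otimes \left[\left(\mathbf{0}\ \ \mathbf{I}_{N-n}\right)\mathbf{P}_N\begin{pmatrix} \mathbf{I}_n \\ \mathbf{0} \end{pmatrix}\right] & \boldsymbol{\Sigma} \otimes \left[\left(\mathbf{0}\ \ \mathbf{I}_{N-n}\right)\mathbf{P}_N\begin{pmatrix} \mathbf{0} \\ \mathbf{I}_{N-n} \end{pmatrix}\right] \end{pmatrix}. \end{aligned}$$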

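Therefore,

$$\mathbf{V} = \begin{pmatrix} \mathbf{V}_I & \mathbf{V}_{I,II} \\ \mathbf{V}_{I,II}' & \mathbf{V}_{II} \end{pmatrix} = \begin{pmatrix} \boldsymbol{\Sigma} \otimes \mathbf{P}_{n,N} & \boldsymbol{\Sigma} \otimes \left(-\frac{1}{N}\mathbf{J}_{n \times (N-n)}\right) \\ \boldsymbol{\Sigma} \otimes \left(-\frac{1}{N}\mathbf{J}_{(N-n) \times n}\right) & \boldsymbol{\Sigma} \otimes \mathbf{P}_{N-n,N} \end{pmatrix}, \tag{2.21}$$

where $\mathbf{V}_I = \boldsymbol{\Sigma} \otimes \mathbf{P}_{n,N}$, $\mathbf{V}_{I,II} = \boldsymbol{\Sigma} \otimes \left(-\frac{1}{N}\mathbf{J}_{n \times (N-n)}\right)$ and $\mathbf{V}_{II} = \boldsymbol{\Sigma} \otimes \mathbf{P}_{N-n,N}$.

Further,

$$\mathbf{P}_{n,N} = \mathbf{I}_n - \frac{1}{N}\mathbf{J}_n = \left(\mathbf{I}_n - \frac{\mathbf{J}_n}{n}\right) + \left(\frac{\mathbf{J}_n}{n} - \frac{\mathbf{J}_n}{N}\right) = \mathbf{P}_n + (1 - f)\frac{\mathbf{J}_n}{n},$$

and

$$\mathbf{P}_{N-n,N} = \mathbf{I}_{N-n} - \frac{1}{N}\mathbf{J}_{N-n} = \mathbf{P}_{N-n} + f\,\frac{\mathbf{J}_{N-n}}{N-n},$$

where $f = n/N$ is the sampling fraction.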

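Consequently,

$$\begin{aligned} \mathbf{V} &= \begin{pmatrix} \boldsymbol{\Sigma} \otimes \left(\mathbf{P}_n + (1-f)n^{-1}\mathbf{J}_n\right) & -N^{-1}\boldsymbol{\Sigma} \otimes \mathbf{J}_{n \times (N-n)} \\ -N^{-1}\boldsymbol{\Sigma} \otimes \mathbf{J}_{(N-n) \times n} & \boldsymbol{\Sigma} \otimes \left(\mathbf{P}_{N-n} + f(N-n)^{-1}\mathbf{J}_{N-n}\right) \end{pmatrix} \\ &= \begin{pmatrix} \boldsymbol{\Sigma} \otimes \mathbf{P}_n & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Sigma} \otimes \mathbf{P}_{N-n} \end{pmatrix} + \begin{pmatrix} (1-f)n^{-1}\boldsymbol{\Sigma} \otimes \mathbf{J}_n & -N^{-1}\boldsymbol{\Sigma} \otimes \mathbf{J}_{n \times (N-n)} \\ -N^{-1}\boldsymbol{\Sigma} \otimes \mathbf{J}_{(N-n) \times n} & f(N-n)^{-1}\boldsymbol{\Sigma} \otimes \mathbf{J}_{N-n} \end{pmatrix}, \end{aligned}$$

and

$$\mathbf{V}_I = \boldsymbol{\Sigma} \otimes \left(\mathbf{P}_n + (1-f)\frac{\mathbf{J}_n}{n}\right),$$

$$\mathbf{V}_{I,II} = -\frac{1}{N}\boldsymbol{\Sigma} \otimes \mathbf{J}_{n \times (N-n)},$$

$$\mathbf{V}_{II} = \boldsymbol{\Sigma} \otimes \left(\mathbf{P}_{N-n} + f\,\frac{\mathbf{J}_{N-n}}{N-n}\right).$$

Furthermore, it can be shown that

$$\mathbf{V}_I^{-1} = \boldsymbol{\Sigma}^{-1} \otimes \mathbf{P}_{n,N}^{-1} = \boldsymbol{\Sigma}^{-1} \otimes \left(\mathbf{I}_n + \frac{1}{N-n}\mathbf{J}_n\right)$$

and

$$\mathbf{V}_{II}^{-1} = \boldsymbol{\Sigma}^{-1} \otimes \mathbf{P}_{N-n,N}^{-1} = \boldsymbol{\Sigma}^{-1} \otimes \left(\mathbf{I}_{N-n} + \frac{1}{n}\mathbf{J}_{N-n}\right).$$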


CHAPTER 3

GENERAL ESTIMATION STRATEGY

This Chapter proposes a general method for estimating population parameters

based on survey samples. This method connects permutation models, estimation

functions and seemingly unrelated regression. In later Chapters, we will apply this

method to develop various estimators for scenarios that are common in practical

settings. The basic strategy for mathematical derivation is to:

1) Represent the randomness arising from sampling and relate samples to the

population using a random permutation model;

2) Define target parameters as linear or nonlinear functions of population totals;

3) Represent joint permutations of response and auxiliary variables in a seemingly

unrelated regression framework;

4) Represent estimation of population totals as predictors of the remainder based

on linear functions of the sample;

5) Define unbiasedness constraints as parameter-wise or average unbiasedness;

6) Introduce auxiliary information as additional linear constraints to estimating

equations using reparameterization or additional Lagrangian functions;

7) Optimize estimation by minimizing the generalized mean squared errors, or

trace of variance-covariance matrix of the population totals subject to various

unbiasedness or calibration constraints.

Since many population parameters can be defined as functions of population

totals, estimating population totals is fundamental to the estimation of other parameters.


The following sections outline several strategies to obtain unbiased and minimum

variance estimators for population totals. We will limit our discussion to the settings of

simple random sampling without replacement.

Figure 3.1. General Estimation Strategy

[Flowchart. Recoverable labels: Finite Population (response variable and auxiliary variables, measured w/o error); SRSWOR; Random Permutation; Sample + Remainder; RPSUR Model; Define Target Random Variables (parameters represented as functions of totals); Constraints: 1. Parameter-wise unbiasedness, 2. Average unbiasedness, 3. Auxiliary information; Optimization criteria: 1. GMSE, 2. Trace, 3. Penalty function; Optimization methods: 1. Reparameterization, 2. Lagrangian function; “Best” estimators.]


3.1. Parameters of interest

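In Chapter 2, we have shown that the joint distribution of $\mathbf{Y}^{(0)}, \mathbf{Y}^{(1)}, \ldots, \mathbf{Y}^{(p)}$ can be obtained through simultaneous permutation of the underlying variables ($\mathbf{y}^{(0)}, \mathbf{y}^{(1)}, \ldots, \mathbf{y}^{(p)}$). A $(p+1) \times 1$ vector $\mathbf{T}$ of population totals of interest can be represented as a set of $p+1$ linear combinations of the random variables arising from the permutations, $\mathbf{T} = \mathbf{L}'\mathbf{Z}$, where $\mathbf{L}'$ is a $(p+1) \times (p+1)N$ matrix and $\mathbf{Z} = \left(\mathbf{Y}^{(0)\prime}\ \mathbf{Y}^{(1)\prime}\ \cdots\ \mathbf{Y}^{(p)\prime}\right)'$. The elements of $\mathbf{Z}$ can be partitioned into two parts that represent the sample ($\mathbf{Z}_I$) and the remainder ($\mathbf{Z}_{II}$) by premultiplication by the permutation matrix $\mathbf{K}^*$ given by (2.9), $\mathbf{K}^*\mathbf{Z} = \left(\mathbf{Z}_I'\ \ \mathbf{Z}_{II}'\right)'$. Since $\mathbf{K}^{*\prime}\mathbf{K}^* = \mathbf{I}$, $\mathbf{T}$ can be represented using the following identities,

$$\mathbf{T} = \mathbf{L}'\mathbf{Z} = \mathbf{L}'\mathbf{K}^{*\prime}\mathbf{K}^*\mathbf{Z} = \mathbf{L}_I'\mathbf{Z}_I + \mathbf{L}_{II}'\mathbf{Z}_{II}, \tag{3.1}$$

where $\mathbf{L}_I$ and $\mathbf{L}_{II}$ are $(p+1)n \times (p+1)$ and $(p+1)(N-n) \times (p+1)$ matrices, respectively. Once $\mathbf{Z}_I$ is observed after sampling, estimating $\mathbf{T}$ is equivalent to predicting $\mathbf{L}_{II}'\mathbf{Z}_{II}$.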
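The decomposition in (3.1) can be illustrated with a small simulated permutation. In the sketch below, the population values, the sample size, and the random seed are hypothetical; $\mathbf{L}$ is taken to be the matrix that produces the $p+1$ totals, which is one possible choice of $\mathbf{L}$ and is used here only as an example.

```python
import numpy as np

# Hypothetical population with p + 1 = 2 variables and N = 5 subjects.
y = np.array([[2.0, 1.0], [5.0, 3.0], [7.0, 4.0], [10.0, 8.0], [6.0, 2.0]])
N, k = y.shape
n = 2

rng = np.random.default_rng(0)
Y = y[rng.permutation(N), :]          # one realized permutation; first n rows form the sample
Z = Y.reshape(-1, order="F")          # Z = vec(Y)

# L chosen so that L'Z returns the k population totals.
L = np.kron(np.eye(k), np.ones((N, 1)))

# Positioning matrix K* of (2.9).
top = np.hstack([np.eye(n), np.zeros((n, N - n))])
bottom = np.hstack([np.zeros((N - n, n)), np.eye(N - n)])
K_star = np.vstack([np.kron(np.eye(k), top), np.kron(np.eye(k), bottom)])

Z_star, L_star = K_star @ Z, K_star @ L
Z_I, Z_II = Z_star[:k * n], Z_star[k * n:]
L_I, L_II = L_star[:k * n], L_star[k * n:]

T = L.T @ Z
T_split = L_I.T @ Z_I + L_II.T @ Z_II   # observed part plus the part to be predicted
print(T, T_split)
```

Whatever permutation is realized, the totals are unchanged; only the split between the observed term $\mathbf{L}_I'\mathbf{Z}_I$ and the unobserved term $\mathbf{L}_{II}'\mathbf{Z}_{II}$ varies.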

3.2. Random permutation seemingly unrelated regression model

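Based on the explicit presentation of simultaneous permutation of multiple variables from Section 2.4, a $(p+1)$-equation simple location seemingly unrelated regression model can be defined as

$$
\begin{pmatrix} \mathbf{Y}^{(0)} \\ \mathbf{Y}^{(1)} \\ \vdots \\ \mathbf{Y}^{(p)} \end{pmatrix}
=
\begin{pmatrix}
\mathbf{1}_N & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{1}_N & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{1}_N
\end{pmatrix}
\begin{pmatrix} \mu_0 \\ \mu_1 \\ \vdots \\ \mu_p \end{pmatrix}
+ \mathbf{E}, \qquad (3.2)
$$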

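where $\mathbf{E}$ is a $(p+1)N \times 1$ column vector of residuals. Since $\mathbf{Y}^{(0)}, \mathbf{Y}^{(1)}, \ldots, \mathbf{Y}^{(p)}$ arise from exactly the same permutation of $\mathbf{y}^{(0)}, \mathbf{y}^{(1)}, \ldots, \mathbf{y}^{(p)}$ as $\mathbf{U}$, these $p+1$ equations are indeed related to each other, and this relationship is captured through the identical permutation represented by $\operatorname{vec}(\mathbf{U})$. Following conventional notation, Model (3.2) can be expressed more compactly as

$$
\mathbf{Z} = \mathbf{X}\boldsymbol{\mu} + \mathbf{E}, \qquad (3.3)
$$

where $\boldsymbol{\mu}_{(p+1)\times 1} = (\mu_0\ \mu_1\ \cdots\ \mu_p)'$ and $\mathbf{X} = \mathbf{I}_{p+1} \otimes \mathbf{1}_N$ is the design matrix for the model. Hereafter, Model (3.3) is referred to as the Random Permutation Seemingly Unrelated Regression (RPSUR) model. Accordingly, the RPSUR model can be written in terms of sample and remaining parts,

$$
\begin{pmatrix} \mathbf{Z}_I \\ \mathbf{Z}_{II} \end{pmatrix}
=
\begin{pmatrix} \mathbf{X}_I \\ \mathbf{X}_{II} \end{pmatrix}\boldsymbol{\mu} + \mathbf{E}. \qquad (3.4)
$$

It is readily shown that $E(\mathbf{Z}_I) = \mathbf{X}_I\boldsymbol{\mu}$ and $E(\mathbf{Z}_{II}) = \mathbf{X}_{II}\boldsymbol{\mu}$.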

With (3.4), linear estimators of $\mathbf{T}$ can be derived under a design-based framework

(Stanek, Singer and Lencina 2003) using a method similar to the prediction-based

approach (Royall 1976; Bolfarine and Zacks 1992; Valliant, Dorfman and Royall 2000).

We will illustrate several methods in the following sections.


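3.3. Simultaneous estimation of $p+1$ population totals

The $p+1$ population totals of interest can be represented as a vector $\mathbf{T} = (T_0\ T_1\ \cdots\ T_p)'$. By definition, $\mathbf{T} = (\mathbf{I}_{p+1} \otimes \mathbf{1}_N')\mathbf{Z} \equiv \mathbf{L}_I'\mathbf{Z}_I + \mathbf{L}_{II}'\mathbf{Z}_{II}$, where $\mathbf{L}_I' = \mathbf{I}_{p+1} \otimes \mathbf{1}_n'$ and $\mathbf{L}_{II}' = \mathbf{I}_{p+1} \otimes \mathbf{1}_{N-n}'$. We are interested in deriving an estimator for $\mathbf{T}$ that is a linear function of $\mathbf{Z}_I$ and unbiased for $\mathbf{T}$ while a chosen optimization criterion is satisfied. A linear estimator of $\mathbf{T}$ can be defined as

$$
\hat{\mathbf{T}} = (\mathbf{L}_I' + \mathbf{W}')\mathbf{Z}_I, \qquad (3.5)
$$

where $\mathbf{W}$ is an $n(p+1) \times (p+1)$ matrix of weights (or coefficients). There are many possible ways to define $\mathbf{W}$. Usually, $\mathbf{W}$ is chosen to satisfy certain optimization criteria, such as minimizing the trace of the variance-covariance matrix, the generalized mean squared error or M-estimation criteria. An intuitive but arbitrary way to define $\mathbf{W}$ is to set $\mathbf{W} = \mathbf{I}_{p+1} \otimes \mathbf{w}_*$, where $\mathbf{w}_*$ is an $n \times 1$ column vector; that is, a single weight system is applied to all variables of interest. Once $\mathbf{W}$ is defined, the corresponding estimation error can be defined as

$$
\hat{\mathbf{T}} - \mathbf{T} = (\mathbf{L}_I' + \mathbf{W}')\mathbf{Z}_I - \mathbf{L}'\mathbf{Z} = \mathbf{W}'\mathbf{Z}_I - \mathbf{L}_{II}'\mathbf{Z}_{II}. \qquad (3.6)
$$

The unbiasedness of $\hat{\mathbf{T}}$ for $\mathbf{T}$ requires that

$$
E(\hat{\mathbf{T}} - \mathbf{T}) = E(\mathbf{W}'\mathbf{Z}_I - \mathbf{L}_{II}'\mathbf{Z}_{II}) = \mathbf{0}. \qquad (3.7)
$$

Since $E(\mathbf{Z}_I) = \mathbf{X}_I\boldsymbol{\mu}$ and $E(\mathbf{Z}_{II}) = \mathbf{X}_{II}\boldsymbol{\mu}$, for (3.7) to hold for any $\mathbf{T}$, it is required that

$$
\mathbf{W}'\mathbf{X}_I - \mathbf{L}_{II}'\mathbf{X}_{II} = \mathbf{0}, \qquad (3.8)
$$

where $\mathbf{X}_I = \mathbf{I}_{p+1} \otimes \mathbf{1}_n$ and $\mathbf{X}_{II} = \mathbf{I}_{p+1} \otimes \mathbf{1}_{N-n}$.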


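The variance-covariance matrix of $\hat{\mathbf{T}}$ under Model (3.4) is defined as $E[(\hat{\mathbf{T}} - \mathbf{T})(\hat{\mathbf{T}} - \mathbf{T})']$, which can be represented in terms of the variance-covariance matrices of the sample $(\mathbf{V}_I)$ and remaining parts $(\mathbf{V}_{II})$, and the covariance matrix between the sample and remaining parts $(\mathbf{V}_{I,II})$,

$$
\operatorname{cov}(\hat{\mathbf{T}} - \mathbf{T}) =
\begin{pmatrix} \mathbf{W}' & -\mathbf{L}_{II}' \end{pmatrix}
\begin{pmatrix} \mathbf{V}_I & \mathbf{V}_{I,II} \\ \mathbf{V}_{II,I} & \mathbf{V}_{II} \end{pmatrix}
\begin{pmatrix} \mathbf{W} \\ -\mathbf{L}_{II} \end{pmatrix}. \qquad (3.9)
$$

Finding the best estimator of $\mathbf{T}$ amounts to finding a $\mathbf{W}$ that satisfies (3.8) and minimizes a chosen function of $\operatorname{cov}(\hat{\mathbf{T}} - \mathbf{T})$. Commonly used optimization criteria include minimizing $\operatorname{cov}(\hat{\mathbf{T}} - \mathbf{T})$ in terms of its

1) trace, or

2) quadratic function $\mathbf{q}'\operatorname{cov}(\hat{\mathbf{T}} - \mathbf{T})\mathbf{q}$, or M-estimation; when $\mathbf{q} = \mathbf{1}_{p+1}$, this criterion is referred to as the generalized mean squared error (GMSE).

For example, if minimizing the trace of $\operatorname{cov}(\hat{\mathbf{T}} - \mathbf{T})$ is the criterion, we can define the optimization function, incorporating constraint (3.8), as

$$
\Phi(\mathbf{W}) = \operatorname{trace}(\mathbf{W}'\mathbf{V}_I\mathbf{W}) - 2\operatorname{trace}(\mathbf{W}'\mathbf{V}_{I,II}\mathbf{L}_{II}) + 2\sum_{i=1}^{p+1}\mathbf{u}_i'(\mathbf{W}'\mathbf{X}_I - \mathbf{L}_{II}'\mathbf{X}_{II})\boldsymbol{\lambda}, \qquad (3.10)
$$

where $\mathbf{u}_i'$ is the $i$-th row of the identity matrix $\mathbf{I}_{p+1}$ and $\boldsymbol{\lambda}$ is a $(p+1) \times 1$ vector of Lagrangian multipliers. Differentiating (3.10) with respect to the elements of $\mathbf{W}$ and $\boldsymbol{\lambda}$ and setting the derivatives to zero yields two sets of estimating equations. The estimating equations can then be used to identify $\mathbf{W}$. We pursue this approach in Chapter 4.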


3.4. Example: best linear unbiased estimator of a population total

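A special application of the method proposed in Section 3.3 is to derive a linear unbiased minimum variance estimator (BLUE) of $T^{(0)}$ when no auxiliary information is used. By definition, $T^{(0)} = \boldsymbol{\ell}_I'\mathbf{Z}_I + \boldsymbol{\ell}_{II}'\mathbf{Z}_{II}$, where $\boldsymbol{\ell}_I' = (\mathbf{1}_n'\ \ \mathbf{0})$ and $\boldsymbol{\ell}_{II}' = (\mathbf{1}_{N-n}'\ \ \mathbf{0})$. Using (3.5), a linear estimator of $T^{(0)}$ is defined as

$$
\hat{T}^{(0)} = (\boldsymbol{\ell}_I' + \mathbf{w}')\mathbf{Z}_I, \qquad (3.11)
$$

where $\mathbf{w}$ is an $n(p+1) \times 1$ column vector. Following (3.6), the estimation error is defined as $\hat{T}^{(0)} - T^{(0)} = \mathbf{w}'\mathbf{Z}_I - \boldsymbol{\ell}_{II}'\mathbf{Z}_{II}$. Applying (3.8), the unbiasedness of $\hat{T}^{(0)}$ for $T^{(0)}$ becomes

$$
\mathbf{w}'\mathbf{X}_I - \boldsymbol{\ell}_{II}'\mathbf{X}_{II} = \mathbf{0}, \qquad (3.12)
$$

where $\mathbf{X}_I = \mathbf{I}_{p+1} \otimes \mathbf{1}_n$ and $\mathbf{X}_{II} = \mathbf{I}_{p+1} \otimes \mathbf{1}_{N-n}$.

Under the unbiasedness constraint (3.12), the mean squared error of $\hat{T}^{(0)}$ is the same as its variance. As an analogue to (3.10), minimizing $\operatorname{var}(\hat{T}^{(0)})$ subject to (3.12) is equivalent to minimizing

$$
\Phi(\mathbf{w}) = \mathbf{w}'\mathbf{V}_I\mathbf{w} - 2\mathbf{w}'\mathbf{V}_{I,II}\boldsymbol{\ell}_{II} + 2(\mathbf{w}'\mathbf{X}_I - \boldsymbol{\ell}_{II}'\mathbf{X}_{II})\boldsymbol{\lambda}, \qquad (3.13)
$$

where $\boldsymbol{\lambda}$ is a $(p+1) \times 1$ vector of Lagrangian multipliers. Differentiating (3.13) and setting the derivatives to zero results in a set of estimating equations,

$$
\begin{pmatrix} \mathbf{V}_I & \mathbf{X}_I \\ \mathbf{X}_I' & \mathbf{0} \end{pmatrix}
\begin{pmatrix} \hat{\mathbf{w}} \\ \hat{\boldsymbol{\lambda}} \end{pmatrix}
=
\begin{pmatrix} \mathbf{V}_{I,II}\boldsymbol{\ell}_{II} \\ \mathbf{X}_{II}'\boldsymbol{\ell}_{II} \end{pmatrix}, \qquad (3.14)
$$

which has a unique solution,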


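$$
\hat{\mathbf{w}} = \mathbf{V}_I^{-1}\left[\mathbf{V}_{I,II} - \mathbf{X}_I(\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I)^{-1}(\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{V}_{I,II} - \mathbf{X}_{II}')\right]\boldsymbol{\ell}_{II}. \qquad (3.15)
$$

By applying (3.15) to (3.11), we have the BLUE for $T^{(0)}$,

$$
\hat{T}^{(0)} = \boldsymbol{\ell}_I'\mathbf{Z}_I + \boldsymbol{\ell}_{II}'\left[\mathbf{X}_{II}\hat{\boldsymbol{\beta}} + \mathbf{V}_{I,II}'\mathbf{V}_I^{-1}(\mathbf{Z}_I - \mathbf{X}_I\hat{\boldsymbol{\beta}})\right], \qquad (3.16)
$$

where $\hat{\boldsymbol{\beta}} = (\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I)^{-1}\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{Z}_I$. The variance of $\hat{T}^{(0)}$ can be evaluated as

$$
\operatorname{var}(\hat{T}^{(0)}) = \operatorname{var}\bigl((\boldsymbol{\ell}_I + \hat{\mathbf{w}})'\mathbf{Z}_I\bigr) = (\boldsymbol{\ell}_I + \hat{\mathbf{w}})'\mathbf{V}_I(\boldsymbol{\ell}_I + \hat{\mathbf{w}}). \qquad (3.17)
$$

After algebraic simplification,

$$
\hat{\boldsymbol{\beta}} = \frac{1}{n}\bigl(\mathbf{1}_n'\mathbf{Y}_I^{(0)}\ \ \mathbf{1}_n'\mathbf{Y}_I^{(1)}\ \cdots\ \mathbf{1}_n'\mathbf{Y}_I^{(p)}\bigr)' = \bigl(\bar{Y}^{(0)}\ \ \bar{Y}^{(1)}\ \cdots\ \bar{Y}^{(p)}\bigr)',
$$

and

$$
\hat{T}^{(0)} = n\bar{Y}^{(0)} + (N-n)\bar{Y}^{(0)} = N\bar{Y}^{(0)},
$$

where $\bar{Y}^{(0)} = \frac{1}{n}\mathbf{1}_n'\mathbf{Y}_I^{(0)}$ is the sample mean of the permuted response variable. The variance of $\hat{T}^{(0)}$ is

$$
\operatorname{var}(\hat{T}^{(0)}) = N^2\,\frac{1-f}{n}\,\sigma_0^2,
$$

where $f = n/N$ is the sampling fraction.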
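The closed forms above can be checked by brute force on a toy population. The following is a minimal sketch rather than part of the derivation: the population values are hypothetical, and $\sigma_0^2$ is assumed to use the $N-1$ divisor, which is the definition implied by a per-unit variance of $\sigma_0^2(1 - 1/N)$ under the permutation covariance structure. Exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction
from itertools import combinations

def enumerate_expansion(y, n):
    """Exact mean and variance of N * (sample mean) over all SRSWOR samples."""
    N = len(y)
    estimates = [Fraction(N, n) * sum(s) for s in combinations(y, n)]
    mean = sum(estimates) / len(estimates)
    var = sum((t - mean) ** 2 for t in estimates) / len(estimates)
    return mean, var

def formula_variance(y, n):
    """N^2 (1 - f) sigma^2 / n, with sigma^2 using the N - 1 divisor (assumption)."""
    N = len(y)
    mu = Fraction(sum(y), N)
    sigma2 = sum((v - mu) ** 2 for v in y) / (N - 1)
    return N * N * (1 - Fraction(n, N)) * sigma2 / n

y = [1, 0, 1, 1, 0, 0]          # hypothetical binary response, N = 6
mean, var = enumerate_expansion(y, n=2)
```

For this population the enumerated mean equals the true total, and the enumerated variance agrees with `formula_variance(y, 2)`.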

3.5. Estimation of a population total using auxiliary information

In this section, we propose a method of using auxiliary information to improve

precision of linear unbiased estimators of population totals through a simple

reparameterization. We illustrate this method using a simple case that has one response

and one auxiliary variable, assuming the auxiliary total $T^{(1)}$ is known.


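Our target parameter is the population total of the response variable,

$$
T^{(0)} = \mathbf{1}_N'\mathbf{Y}^{(0)} = \boldsymbol{\ell}'\mathbf{Z}, \qquad (3.18)
$$

where $\boldsymbol{\ell} = (\mathbf{1}_N'\ \ \mathbf{0}_{1\times N})'$. We are interested in deriving an estimator for $T^{(0)}$ that is a linear function of the sample data, is unbiased for $T^{(0)}$, and has minimum mean squared error.

We transform the vector of auxiliary variables $\mathbf{Y}^{(1)}$ by pre-multiplying it by the identity $\mathbf{I}_N = \frac{\mathbf{1}_N}{N}\mathbf{1}_N' + \mathbf{P}_N$, such that $\mathbf{Y}^{(1)} = \frac{\mathbf{1}_N}{N}(\mathbf{1}_N'\mathbf{Y}^{(1)}) + \mathbf{P}_N\mathbf{Y}^{(1)}$. The first term in this expression is a constant vector $(\mu^{(1)}\mathbf{1}_N)$, and the second term is $\mathbf{Y}^{(1)*} = \mathbf{P}_N\mathbf{Y}^{(1)} = \mathbf{Y}^{(1)} - \mu^{(1)}\mathbf{1}_N$. It is clear that

$$
\begin{pmatrix} \mathbf{Y}^{(0)} \\ \mathbf{Y}^{(1)} \end{pmatrix}
=
\left[
\begin{pmatrix} \mathbf{I}_N & \mathbf{0} \\ \mathbf{0} & \mathbf{P}_N \end{pmatrix}
+
\begin{pmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \frac{1}{N}\mathbf{1}_N\mathbf{1}_N' \end{pmatrix}
\right]
\begin{pmatrix} \mathbf{Y}^{(0)} \\ \mathbf{Y}^{(1)} \end{pmatrix}
=
\begin{pmatrix} \mathbf{Y}^{(0)} \\ \mathbf{Y}^{(1)*} \end{pmatrix}
+
\begin{pmatrix} \mathbf{0} \\ \mu^{(1)}\mathbf{1}_N \end{pmatrix}. \qquad (3.19)
$$

Applying identity (3.19) to Model (3.2) (here $p = 1$) leads to a reparameterized model,

$$
\begin{pmatrix} \mathbf{Y}^{(0)} \\ \mathbf{Y}^{(1)*} \end{pmatrix} = (\mathbf{I}_2 \otimes \mathbf{1}_N)\boldsymbol{\mu}^* + \mathbf{E}, \qquad (3.20)
$$

where $\boldsymbol{\mu}^* = (\mu^{(0)}\ \ 0)'$. Since this transformation has no impact on $\mathbf{E}$, the variance-covariance matrix remains $\operatorname{cov}(\mathbf{Y}^{(0)}, \mathbf{Y}^{(1)*}) = \boldsymbol{\Sigma} \otimes \mathbf{P}_N$, where $\boldsymbol{\Sigma}$ is given in Section 2.2.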
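The effect of the centering transformation can be seen in a few lines. This is an illustrative sketch with hypothetical auxiliary values; `center` is an assumed helper name. It computes the known population mean and the centered vector, whose elements sum to zero, so the centered auxiliary variable has a known mean of zero in the reparameterized model.

```python
from fractions import Fraction

def center(y):
    """Return (mu, y - mu * 1_N), i.e. the constant part and the centered part."""
    N = len(y)
    mu = Fraction(sum(y), N)
    return mu, [v - mu for v in y]

y1 = [3, 5, 2, 8, 6, 4]              # hypothetical auxiliary values
mu1, y1_star = center(y1)
reconstructed = [mu1 + c for c in y1_star]
```

Adding the constant part back to the centered part recovers the original vector, mirroring the two-term decomposition of the auxiliary variable.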


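Using the same method as shown in the previous section, $(\mathbf{Y}^{(0)\prime}\ \ \mathbf{Y}^{(1)*\prime})'$ can be partitioned as $\mathbf{Z}^* = \mathbf{K}^*(\mathbf{Y}^{(0)\prime}\ \ \mathbf{Y}^{(1)*\prime})' = (\mathbf{Z}_I'\ \ \mathbf{Z}_{II}')'$, where $\mathbf{K}^*$ is a permutation matrix as given by (2.9), $\mathbf{Z}_I = (\mathbf{Y}_I^{(0)\prime}\ \ \mathbf{Y}_I^{(1)*\prime})'$ represents the sample and $\mathbf{Z}_{II} = (\mathbf{Y}_{II}^{(0)\prime}\ \ \mathbf{Y}_{II}^{(1)*\prime})'$ represents the remaining part. It immediately follows that $E(\mathbf{Z}_I) = (\mathbf{I}_2 \otimes \mathbf{1}_n)\boldsymbol{\mu}^*$, $E(\mathbf{Z}_{II}) = (\mathbf{I}_2 \otimes \mathbf{1}_{N-n})\boldsymbol{\mu}^*$, $\mathbf{V}_I = \boldsymbol{\Sigma} \otimes \mathbf{P}_{n,N}$ and $\mathbf{V}_{I,II} = \mathbf{V}_{II,I}' = \boldsymbol{\Sigma} \otimes \left(-\frac{1}{N}\mathbf{J}_{n\times(N-n)}\right)$.

Similar to (3.5), $T^{(0)}$ can be represented as a linear combination of the sampled and remaining parts,

$$
T^{(0)} = \boldsymbol{\ell}_I'\mathbf{Z}_I + \boldsymbol{\ell}_{II}'\mathbf{Z}_{II}, \qquad (3.21)
$$

where $\boldsymbol{\ell}_I = (1\ \ 0)' \otimes \mathbf{1}_n$ and $\boldsymbol{\ell}_{II} = (1\ \ 0)' \otimes \mathbf{1}_{N-n}$. Its linear estimator is defined as

$$
\hat{T}^{(0)} = (\boldsymbol{\ell}_I' + \mathbf{w}')\mathbf{Z}_I, \qquad (3.22)
$$

where $\mathbf{w} = (\mathbf{w}_0'\ \ \mathbf{w}_1')'$ is a $2n \times 1$ vector. Using (3.21) and (3.22), the unbiasedness for $T^{(0)}$ implies

$$
E(\mathbf{w}'\mathbf{Z}_I - \boldsymbol{\ell}_{II}'\mathbf{Z}_{II}) = 0. \qquad (3.23)
$$

Since $E(\mathbf{Z}_I) = (\mathbf{I}_2 \otimes \mathbf{1}_n)\boldsymbol{\mu}^*$, $E(\mathbf{Z}_{II}) = (\mathbf{I}_2 \otimes \mathbf{1}_{N-n})\boldsymbol{\mu}^*$ and $\boldsymbol{\mu}^* = (\mu^{(0)}\ \ 0)'$, (3.23) is equivalent to $\bigl(\mathbf{1}_n'\mathbf{w}_0 - (N-n)\bigr)\mu^{(0)} = 0$. The constraint is satisfied for any $\mu^{(0)}$ simply by

$$
\mathbf{w}_0'\mathbf{1}_n - (N-n) = 0. \qquad (3.24)
$$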


The mean squared error of $\hat{T}^{(0)}$ is given as

$$
\operatorname{var}(\hat{T}^{(0)}) =
\begin{pmatrix} \mathbf{w}' & -\boldsymbol{\ell}_{II}' \end{pmatrix}
\begin{pmatrix} \mathbf{V}_I & \mathbf{V}_{I,II} \\ \mathbf{V}_{II,I} & \mathbf{V}_{II} \end{pmatrix}
\begin{pmatrix} \mathbf{w} \\ -\boldsymbol{\ell}_{II} \end{pmatrix}. \qquad (3.25)
$$

To minimize (3.25) subject to (3.24), we define the optimization function as

$$
\Phi(\mathbf{w}) = (\mathbf{w}_0'\ \ \mathbf{w}_1')\mathbf{V}_I\begin{pmatrix} \mathbf{w}_0 \\ \mathbf{w}_1 \end{pmatrix} - 2(\mathbf{w}_0'\ \ \mathbf{w}_1')\mathbf{V}_{I,II}\boldsymbol{\ell}_{II} + 2\bigl(\mathbf{w}_0'\mathbf{1}_n - (N-n)\bigr)\lambda, \qquad (3.26)
$$

where $\lambda$ is a Lagrangian multiplier. Differentiating (3.26) and setting the derivatives to zero leads to a set of estimating equations that can be used to identify $\mathbf{w}$.

Unlike the usual calibration technique of incorporating auxiliary information, this approach leads to a set of weights that do not depend on the sample, and yields a linear unbiased minimum variance estimator of $T^{(0)}$. We will explore this approach further in Chapter 5, and extend it to settings in which multiple auxiliary totals are known.

3.6. Enumeration and simulation

Comparisons between candidate estimators are sometimes difficult through

mathematical derivation. When mathematical comparisons are not feasible, we will use

enumeration and simulation methods.

When the population size is small, it is possible to enumerate all possible

samples and compute the estimators based on each sample. The exact distribution of the

estimators can be obtained. Thus comparisons between various types of estimators are

possible. We will use this approach to illustrate the performance of estimators for small

samples from finite populations of small size.


When the population size is large, the number of possible samples is so large that it may be impossible to obtain the exact moments of the estimators: the exact expected value, bias and variance of an estimator of a population total cannot be computed through enumeration. In such cases, Monte Carlo simulations can be used to evaluate the properties of the estimators if theoretical derivation is not possible. The finite population and the sampling design are held constant, and a large number of samples are repeatedly drawn from the finite population using the specified sampling design. For every realized sample, the estimators are calculated. If the number of samples is large enough, the distributions of the calculated estimators will closely approximate the exact sampling distribution to a given level of precision.
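A Monte Carlo evaluation of this kind might be sketched as follows. The population, the number of replications, the seed and the helper name `simulate_expansion` are all illustrative assumptions; the estimator simulated is the simple expansion estimator $N\bar{y}$ under SRSWOR, and the comparison value uses $\sigma^2$ with the $N-1$ divisor.

```python
import random
import statistics

def simulate_expansion(y, n, reps=20000, seed=1):
    """Draw `reps` SRSWOR samples from the fixed population `y` and return
    the Monte Carlo mean and variance of the expansion estimator."""
    rng = random.Random(seed)
    N = len(y)
    estimates = [N / n * sum(rng.sample(y, n)) for _ in range(reps)]
    return statistics.fmean(estimates), statistics.pvariance(estimates)

# Hypothetical binary population: 20 cases among N = 60 subjects.
y = [1 if i % 3 == 0 else 0 for i in range(60)]
N, n = len(y), 10
mean_hat, var_hat = simulate_expansion(y, n)

# Theoretical variance N^2 (1 - f) sigma^2 / n for comparison.
mu = sum(y) / N
sigma2 = sum((v - mu) ** 2 for v in y) / (N - 1)
theory_var = N * N * (1 - n / N) * sigma2 / n
```

With enough replications, `mean_hat` should be close to the true total `sum(y)` and `var_hat` close to `theory_var`; the discrepancy shrinks as `reps` grows.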

3.7. Research objectives

In this dissertation, we will primarily consider estimation of population

quantities based on samples drawn from a real-valued finite population of known size,

using simple random sampling without replacement (SRSWOR). The sample size is

assumed fixed. To simplify the problems without losing generality, we assume that

response for all sampled subjects can be measured without error.

Discussions will be restricted to scenarios involving a univariate response

variable that may be either continuous or binomial. In design-based approaches,

estimation of population totals or means of continuous and binomial response variables

is not notably different. Unless otherwise noted, we will consider binomial response

variables, and refer to the mean estimators as population rates (following the

terminology in Epidemiology). The problems involving multinomial or ordinal response


variables may be reduced to estimation using a series of binomial variables, which will

not be addressed in detail.

We will develop methods of improving estimation of population quantities

(totals, means or rates) by incorporating auxiliary information. The values of auxiliary

variables are known either for all population subjects, or at least some of its population

quantities (such as population totals) are known. We will consider several simple

problems that correspond to various scenarios that may arise according to the nature of

response and auxiliary variables, as well as number and information availability of

auxiliary variables.

We will define population totals as linear functions of random variables arising from permutations. To obtain estimators of target parameters for each scenario, we will follow the strategies outlined in Sections 3.3 through 3.5. First, we will derive estimators of population totals and their variance-covariance matrix using the RPSUR model described in Section 3.2. We optimize the estimators by minimizing either the Generalized Mean Squared Error (GMSE) or the trace of the variance-covariance matrix of the estimators of population totals.

The properties of the derived estimators will be evaluated through mathematical

derivation whenever possible. We will try to represent the estimator in terms of a naïve

estimator (without incorporating auxiliary information) plus or multiplied by an

adjustment factor. When analytic derivations are not possible, we will evaluate their

properties using enumeration and simulation methods.

Each Chapter will start with a simple case that involves a single binomial response and a single binomial auxiliary variable, so that the parameters have a straightforward meaning. Results will be gradually extended to more complicated cases that involve continuous response and auxiliary variables or multiple auxiliaries.

Finally, the derived results will be applied and related to direct and indirect

adjustment methods in Epidemiology and vital statistics.


CHAPTER 4

ESTIMATION OF POPULATION TOTALS

4.1. Motivations

Public health surveys often involve estimating one or more population totals,

such as the total number of persons who contracted a certain disease or the total number

of retirees having prescription drug coverage. There may also be interest in joint

estimates of multiple population totals when categorical outcomes are involved. For

example, the status of a chronic disease may be classified into three categories according to

severity, namely None, Stage I and Stage II. Health policy researchers may be interested in

estimating the total numbers of persons who contracted the disease of Stages I or II, or

of those who did not contract the disease. One may question whether identical

estimation methods should be applied to all three categories, and whether the methods

are optimal for all three outcomes, or if methods can be identified to satisfy varying

precision requirements for different outcomes.

In this Chapter, we derive the best linear unbiased estimator for the vector $\mathbf{T}$ of population totals of multiple variables, and its variance-covariance matrix $\operatorname{cov}(\hat{\mathbf{T}})$, assuming no auxiliary totals are known. We derive two classes of linear unbiased estimators based on either the minimum trace of $\operatorname{cov}(\hat{\mathbf{T}})$, or the minimum of a specified quadratic function $\mathbf{q}'\operatorname{cov}(\hat{\mathbf{T}})\mathbf{q}$, where $\mathbf{q}$ is a $(p+1) \times 1$ non-null column vector.


4.2. Parameters of interest

Many estimators can be expressed as a linear function of estimated population

totals (Thompson 1999), which is equivalent to prediction of a linear combination of

random variables. The estimators for target parameters and their variances can be

readily computed based on the estimators and their variance-covariance matrix of

relevant population totals. Therefore, the estimation of population totals is the

cornerstone of estimating other target parameters.

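Suppose that there are $p+1$ population totals of interest. The vector of population totals can be represented as a set of $p+1$ linear combinations, $\mathbf{T} = \mathbf{L}'\mathbf{Z}$, where $\mathbf{L} = \mathbf{I}_{p+1} \otimes \mathbf{1}_N$ and $\mathbf{Z}$ is an $N(p+1) \times 1$ vector that represents the random variables arising from joint permutation of the $p+1$ variables. After sampling is completed, $\mathbf{Z}$ can be partitioned into sample and remaining parts such that $\mathbf{Z} = (\mathbf{Z}_I'\ \ \mathbf{Z}_{II}')'$. Thus $\mathbf{T}$ can be written as a set of linear combinations of $\mathbf{Z}_I$ and $\mathbf{Z}_{II}$ that represent the random variables for the sample and remaining parts,

$$
\mathbf{T} = \begin{pmatrix} \mathbf{L}_I' & \mathbf{L}_{II}' \end{pmatrix}\begin{pmatrix} \mathbf{Z}_I \\ \mathbf{Z}_{II} \end{pmatrix}, \qquad (4.1)
$$

where $\mathbf{L}_I' = \mathbf{I}_{p+1} \otimes \mathbf{1}_n'$ and $\mathbf{L}_{II}' = \mathbf{I}_{p+1} \otimes \mathbf{1}_{N-n}'$.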

4.3. Formulation of a RPSUR model

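We will derive estimators of population totals using a simple location seemingly unrelated regression model based on a random permutation framework (RPSUR). Similar to Model (3.4), we define a RPSUR model,

$$
\begin{pmatrix} \mathbf{Z}_I \\ \mathbf{Z}_{II} \end{pmatrix}
=
\begin{pmatrix} \mathbf{X}_I \\ \mathbf{X}_{II} \end{pmatrix}\boldsymbol{\mu} + \mathbf{E}, \qquad (4.2)
$$

where $\mathbf{X}_I = \mathbf{I}_{p+1} \otimes \mathbf{1}_n$ and $\mathbf{X}_{II} = \mathbf{I}_{p+1} \otimes \mathbf{1}_{N-n}$, $E(\mathbf{Z}_I) = \frac{1}{N}\mathbf{X}_I\mathbf{T}$ and $E(\mathbf{Z}_{II}) = \frac{1}{N}\mathbf{X}_{II}\mathbf{T}$.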

4.4. Two classes of estimators for population total vector

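We are interested in developing two classes of estimators of population totals of response variables that are

1) A linear function of the sample, $\hat{\mathbf{T}} = \mathbf{A}'\mathbf{Z}_I$,

2) Unbiased for $\mathbf{T}$, i.e., $E(\hat{\mathbf{T}} - \mathbf{T}) = \mathbf{0}$.

4.4.1. Two classes of linear estimators of $\mathbf{T}$

A linear estimator of $\mathbf{T} = \mathbf{L}_I'\mathbf{Z}_I + \mathbf{L}_{II}'\mathbf{Z}_{II}$ is defined as a linear function of the sample, e.g., $\hat{\mathbf{T}} = \mathbf{A}'\mathbf{Z}_I$, or equivalently as

$$
\hat{\mathbf{T}} = (\mathbf{L}_I' + \mathbf{W}')\mathbf{Z}_I, \qquad (4.3)
$$

where $\mathbf{W}' = \mathbf{A}' - \mathbf{L}_I'$ is a $(p+1) \times n(p+1)$ matrix. In general, all elements of $\mathbf{W}'$ may differ. We limit our development to two classes of estimators, 1) $\mathbf{W} = \oplus_{k=0}^{p}\mathbf{w}_k$, and 2) $\mathbf{W} = \oplus_{k=0}^{p}\mathbf{w}_*$, where $\mathbf{w}_*$ and $\mathbf{w}_k$, $k = 0, 1, \ldots, p$, are $n \times 1$ vectors.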

4.4.2. Unbiasedness for T

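The estimation error is defined as $\hat{\mathbf{T}} - \mathbf{T} = \mathbf{W}'\mathbf{Z}_I - \mathbf{L}_{II}'\mathbf{Z}_{II}$. The unbiasedness of $\hat{\mathbf{T}}$ implies that $E(\hat{\mathbf{T}} - \mathbf{T}) = E[(\mathbf{L}_I' + \mathbf{W}')\mathbf{Z}_I - \mathbf{T}] = \mathbf{0}$ holds for any $\mathbf{T}$. Since $E[(\mathbf{L}_I' + \mathbf{W}')\mathbf{Z}_I - \mathbf{T}] = N^{-1}(\mathbf{W}'\mathbf{X}_I - \mathbf{L}_{II}'\mathbf{X}_{II})\mathbf{T}$, unbiasedness will hold for all $\mathbf{T}$ if and only if

$$
\mathbf{W}'\mathbf{X}_I - \mathbf{L}_{II}'\mathbf{X}_{II} = \mathbf{0}. \qquad (4.4)
$$

Constraint (4.4) is a set of $p+1$ constraints:

$$
\begin{aligned}
\mathbf{w}_0'\mathbf{1}_n - (N-n) &= 0, \\
\mathbf{w}_1'\mathbf{1}_n - (N-n) &= 0, \\
&\ \,\vdots \\
\mathbf{w}_p'\mathbf{1}_n - (N-n) &= 0.
\end{aligned}
$$

For convenience, the set of constraints can be rewritten alternatively as

$$
\mathbf{u}_i'(\mathbf{W}'\mathbf{X}_I - \mathbf{L}_{II}'\mathbf{X}_{II}) = \mathbf{0},
$$

where $\mathbf{u}_i'$ is the $i$-th row of the identity matrix $\mathbf{I}_{p+1}$, $i = 1, 2, \ldots, p+1$.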
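The role of each constraint can be illustrated numerically for a single variable. The sketch below uses hypothetical values and an assumed helper name `expected_estimate`. It averages the linear estimator $(\mathbf{1}_n + \mathbf{w})'\mathbf{Y}_I$ over all ordered samples, which is its exact expectation under random permutation, and shows that the expectation equals the population total whenever the weights sum to $N-n$, whether or not the weights are equal.

```python
from fractions import Fraction
from itertools import permutations

def expected_estimate(y, w):
    """Exact expectation of (1_n + w)' Y_I over all ordered samples of size n = len(w)."""
    n = len(w)
    samples = list(permutations(y, n))
    total = sum(sum((1 + wi) * yi for wi, yi in zip(w, s)) for s in samples)
    return Fraction(total) / len(samples)

y = [2, 5, 1, 4, 8]                        # hypothetical population, N = 5, total 20
equal_w = [Fraction(3, 2), Fraction(3, 2)]  # sums to N - n = 3
unequal_w = [2, 1]                          # also sums to N - n = 3
bad_w = [1, 1]                              # sums to 2, violates the constraint

unbiased_equal = expected_estimate(y, equal_w)
unbiased_unequal = expected_estimate(y, unequal_w)
biased = expected_estimate(y, bad_w)
```

Both weight vectors that satisfy the constraint are unbiased, which is why a further optimization criterion is needed to single out one of them.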

4.4.3. Optimization criteria

We find the "best" estimators of $\mathbf{T}$ by choosing $\mathbf{W}$ such that it attains a minimum for a specified optimization criterion. Many optimization criteria have been proposed in the literature (Deville and Särndal 1992; Singh and Mohl 1996). In this Chapter, we will consider the following optimization criteria:

1) Trace of $\operatorname{cov}(\hat{\mathbf{T}})$,

2) M-estimation, $\mathbf{q}'\operatorname{cov}(\hat{\mathbf{T}})\mathbf{q}$, where $\mathbf{q}$ is an arbitrary column vector; when $\mathbf{q} = \mathbf{1}_{p+1}$, it is the generalized mean squared error (GMSE), $\mathbf{1}_{p+1}'\operatorname{cov}(\hat{\mathbf{T}})\mathbf{1}_{p+1}$.


4.5. Estimating multiple population totals

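In this section we develop two classes of estimators using the two optimization criteria outlined above. Four cases of interest are presented in Table 4.1.

Table 4.1. Two classes of unbiased estimators of population totals under two optimization criteria

Class    Definition of W                                      Optimization criterion

1        $\mathbf{W} = \oplus_{k=0}^{p}\mathbf{w}_k$          $\operatorname{trace}(\operatorname{cov}(\hat{\mathbf{T}}))$

1        $\mathbf{W} = \oplus_{k=0}^{p}\mathbf{w}_k$          $\mathbf{q}'\operatorname{cov}(\hat{\mathbf{T}})\mathbf{q}$

2        $\mathbf{W} = \oplus_{k=0}^{p}\mathbf{w}_*$          $\operatorname{trace}(\operatorname{cov}(\hat{\mathbf{T}}))$

2        $\mathbf{W} = \oplus_{k=0}^{p}\mathbf{w}_*$          $\mathbf{q}'\operatorname{cov}(\hat{\mathbf{T}})\mathbf{q}$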

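4.5.1. Estimators with minimum $\operatorname{trace}(\operatorname{cov}(\hat{\mathbf{T}}))$ when $\mathbf{W} = \oplus_{k=0}^{p}\mathbf{w}_k$

Minimizing the trace of $\operatorname{cov}(\hat{\mathbf{T}} - \mathbf{T})$ subject to the unbiasedness constraints (4.4) is equivalent to minimizing the following Lagrangian function,

$$
\Phi(\mathbf{W}) = \operatorname{tr}\bigl(\mathbf{W}'\mathbf{V}_I\mathbf{W} - \mathbf{W}'\mathbf{V}_{I,II}\mathbf{L}_{II} - \mathbf{L}_{II}'\mathbf{V}_{I,II}'\mathbf{W}\bigr) + \sum_{i=1}^{p+1}\mathbf{u}_i'(\mathbf{W}'\mathbf{X}_I - \mathbf{L}_{II}'\mathbf{X}_{II})\boldsymbol{\lambda}, \qquad (4.5)
$$

where $\mathbf{u}_i'$ is the $i$-th row of $\mathbf{I}_{p+1}$, and $\boldsymbol{\lambda} = (\lambda_0\ \cdots\ \lambda_k\ \cdots\ \lambda_p)'$, $k = 0, 1, \ldots, p$, is a $(p+1) \times 1$ column vector of Lagrangian multipliers. Differentiating (4.5) with respect to the elements of $\mathbf{W}$ and $\boldsymbol{\lambda}$ and setting the derivatives to zero results in the following estimating equations,

$$
\begin{pmatrix} \mathbf{D}_I & \mathbf{X}_I \\ \mathbf{X}_I' & \mathbf{0} \end{pmatrix}
\begin{pmatrix} \hat{\mathbf{w}} \\ \hat{\boldsymbol{\lambda}} \end{pmatrix}
=
\begin{pmatrix} \mathbf{d}_{I,II} \\ \mathbf{X}_{II}'\mathbf{L}_{II}\mathbf{1}_{p+1} \end{pmatrix}, \qquad (4.6)
$$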


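where $\mathbf{D}_I = \bigl(\oplus_{k=0}^{p}\sigma_k^2\bigr) \otimes \mathbf{P}_{n,N}$ and $\mathbf{d}_{I,II} = -\frac{N-n}{N}\Bigl(\bigl(\oplus_{k=0}^{p}\sigma_k^2\bigr)\mathbf{1}_{p+1} \otimes \mathbf{1}_n\Bigr)$. Estimating equations (4.6) have the following solution,

$$
\hat{\mathbf{w}} = \mathbf{D}_I^{-1}\mathbf{d}_{I,II} - \mathbf{D}_I^{-1}\mathbf{X}_I(\mathbf{X}_I'\mathbf{D}_I^{-1}\mathbf{X}_I)^{-1}(\mathbf{X}_I'\mathbf{D}_I^{-1}\mathbf{d}_{I,II} - \mathbf{X}_{II}'\mathbf{L}_{II}\mathbf{1}_{p+1}).
$$

After algebraic simplification, the above solution is equivalent to

$$
\hat{\mathbf{w}}_k = \frac{N-n}{n}\mathbf{1}_n,\ k = 0, 1, \ldots, p, \quad\text{or}\quad \hat{\mathbf{W}} = \frac{N-n}{n}(\mathbf{I}_{p+1} \otimes \mathbf{1}_n). \qquad (4.7)
$$

Consequently, the best linear unbiased estimator for $\mathbf{T}$ and its variance are

$$
\hat{\mathbf{T}} = N\hat{\boldsymbol{\mu}} \quad\text{and}\quad \operatorname{var}(\hat{\mathbf{T}}) = \frac{N(N-n)}{n}\boldsymbol{\Sigma},
$$

where $\hat{\boldsymbol{\mu}} = (\bar{Y}^{(0)}\ \ \bar{Y}^{(1)}\ \cdots\ \bar{Y}^{(p)})'$ is the vector of sample means, $\bar{Y}^{(k)} = \frac{1}{n}\sum_{s=1}^{n}Y_s^{(k)}$, $k = 0, 1, \ldots, p$.

Derivation of the above results is presented in Section 4.7.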

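4.5.2. Estimators with minimum $\mathbf{q}'\operatorname{cov}(\hat{\mathbf{T}})\mathbf{q}$ when $\mathbf{W} = \oplus_{k=0}^{p}\mathbf{w}_k$

Another optimization criterion is to minimize an arbitrary quadratic function of $\operatorname{cov}(\hat{\mathbf{T}})$, i.e., $\mathbf{q}'\operatorname{cov}(\hat{\mathbf{T}})\mathbf{q}$, where $\mathbf{q}$ is a $(p+1) \times 1$ column vector of rank 1. When $\mathbf{q} = \mathbf{1}_{p+1}$, $\mathbf{q}'\operatorname{cov}(\hat{\mathbf{T}})\mathbf{q}$ is the same as the generalized mean squared error (GMSE). Thus, an optimization function $\Phi(\mathbf{w})$ can be defined using $\mathbf{q}'\operatorname{cov}(\hat{\mathbf{T}})\mathbf{q}$ together with the unbiasedness constraint for $\mathbf{T}$, such that

$$
\Phi(\mathbf{w}) = \mathbf{q}'\bigl(\mathbf{W}'\mathbf{V}_I\mathbf{W} - \mathbf{W}'\mathbf{V}_{I,II}\mathbf{L}_{II} - \mathbf{L}_{II}'\mathbf{V}_{I,II}'\mathbf{W}\bigr)\mathbf{q} + \sum_{i=1}^{p+1}\mathbf{u}_i'(\mathbf{W}'\mathbf{X}_I - \mathbf{L}_{II}'\mathbf{X}_{II})\boldsymbol{\lambda}, \qquad (4.8)
$$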


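where $\boldsymbol{\lambda}_{(p+1)\times 1} = (\lambda_0\ \ \lambda_1\ \cdots\ \lambda_p)'$ is a $(p+1) \times 1$ column vector of Lagrangian multipliers. Differentiating (4.8) and setting both sets of derivatives to zero yields the following estimating equations,

$$
\begin{pmatrix} \mathbf{V}_{qI} & \mathbf{X}_I \\ \mathbf{X}_I' & \mathbf{0} \end{pmatrix}
\begin{pmatrix} \hat{\mathbf{w}} \\ \hat{\boldsymbol{\lambda}} \end{pmatrix}
=
\begin{pmatrix} \mathbf{V}_{qI,II}\mathbf{L}_{II}\mathbf{q} \\ \mathbf{X}_{II}'\mathbf{L}_{II}\mathbf{1}_{p+1} \end{pmatrix}, \qquad (4.9)
$$

where $\mathbf{V}_{qI}^{-1} = \Bigl[\bigl(\oplus_{k=0}^{p}q_k^{-1}\bigr)\boldsymbol{\Sigma}^{-1}\bigl(\oplus_{k=0}^{p}q_k^{-1}\bigr)\Bigr] \otimes \mathbf{P}_{n,N}^{-1}$ and $\mathbf{V}_{qI,II} = \Bigl[\bigl(\oplus_{k=0}^{p}q_k\bigr)\boldsymbol{\Sigma}\Bigr] \otimes \left(-\frac{1}{N}\mathbf{J}_{n\times(N-n)}\right)$. Estimating equations (4.9) have the following unique solution,

$$
\hat{\mathbf{w}} = \mathbf{V}_{qI}^{-1}\mathbf{V}_{qI,II}\mathbf{L}_{II}\mathbf{q} - \mathbf{V}_{qI}^{-1}\mathbf{X}_I(\mathbf{X}_I'\mathbf{V}_{qI}^{-1}\mathbf{X}_I)^{-1}(\mathbf{X}_I'\mathbf{V}_{qI}^{-1}\mathbf{V}_{qI,II}\mathbf{L}_{II}\mathbf{q} - \mathbf{X}_{II}'\mathbf{L}_{II}\mathbf{1}_{p+1}). \qquad (4.10)
$$

After some algebraic arrangement, (4.10) can be simplified as

$$
\hat{\mathbf{w}} = \mathbf{1}_{p+1} \otimes \left(\frac{N-n}{n}\mathbf{1}_n\right), \qquad (4.11)
$$

and $\hat{\mathbf{w}}_k = \frac{N-n}{n}\mathbf{1}_n$, for $k = 0, 1, \ldots, p$, $\hat{\mathbf{W}} = \frac{N-n}{n}(\mathbf{I}_{p+1} \otimes \mathbf{1}_n)$. Thus, we have the following expressions,

$$
\hat{\mathbf{T}} = N\hat{\boldsymbol{\mu}} \quad\text{and}\quad \operatorname{var}(\hat{\mathbf{T}}) = \frac{N(N-n)}{n}\boldsymbol{\Sigma}.
$$

Derivation of the above results is presented in Section 4.8.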

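4.5.3. Estimators with minimum $\operatorname{trace}(\operatorname{cov}(\hat{\mathbf{T}}))$ when $\mathbf{W} = \mathbf{I}_{p+1} \otimes \mathbf{w}_*$

Since $\mathbf{W} = \mathbf{I}_{p+1} \otimes \mathbf{w}_*$, the unbiasedness constraints (4.4) reduce to

$$
\mathbf{X}_I'\mathbf{w}_* - (N-n) = 0, \qquad (4.12)
$$

where $\mathbf{X}_I = \mathbf{1}_n$. The Lagrangian function (4.5) is simplified to

$$
\Phi(\mathbf{W}) = \operatorname{tr}\bigl(\mathbf{W}'\mathbf{V}_I\mathbf{W} - \mathbf{W}'\mathbf{V}_{I,II}\mathbf{L}_{II} - \mathbf{L}_{II}'\mathbf{V}_{I,II}'\mathbf{W}\bigr) + \bigl(\mathbf{X}_I'\mathbf{w}_* - (N-n)\bigr)\lambda. \qquad (4.13)
$$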


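Differentiating (4.13) with respect to $\mathbf{w}_*$ and $\lambda$ and setting the derivatives equal to zero yields the following estimating equations,

$$
\begin{pmatrix} \mathbf{V}_{tr,I} & \mathbf{X}_I \\ \mathbf{X}_I' & 0 \end{pmatrix}
\begin{pmatrix} \hat{\mathbf{w}}_* \\ \hat{\lambda} \end{pmatrix}
=
\begin{pmatrix} V_{tr,I,II}\,\boldsymbol{\ell}_I \\ N-n \end{pmatrix}, \qquad (4.14)
$$

where $\mathbf{V}_{tr,I} = \operatorname{tr}(\boldsymbol{\Sigma})\mathbf{P}_{n,N}$, $V_{tr,I,II} = -\frac{N-n}{N}\operatorname{tr}(\boldsymbol{\Sigma})$ and $\boldsymbol{\ell}_I = \mathbf{1}_n$. The solution of (4.14) is

$$
\hat{\mathbf{w}}_* = \mathbf{V}_{tr,I}^{-1}V_{tr,I,II}\,\boldsymbol{\ell}_I - \mathbf{V}_{tr,I}^{-1}\mathbf{X}_I(\mathbf{X}_I'\mathbf{V}_{tr,I}^{-1}\mathbf{X}_I)^{-1}\bigl(\mathbf{X}_I'\mathbf{V}_{tr,I}^{-1}V_{tr,I,II}\,\boldsymbol{\ell}_I - (N-n)\bigr). \qquad (4.15)
$$

After algebraic simplification, (4.15) becomes

$$
\hat{\mathbf{w}}_* = \frac{N-n}{n}\mathbf{1}_n,
$$

or equivalently,

$$
\hat{\mathbf{W}} = \frac{N-n}{n}(\mathbf{I}_{p+1} \otimes \mathbf{1}_n).
$$

Consequently, $\hat{\mathbf{T}} = N\hat{\boldsymbol{\mu}}$ and $\operatorname{var}(\hat{\mathbf{T}}) = \frac{N(N-n)}{n}\boldsymbol{\Sigma}$.

Detailed derivations are given in Section 4.9.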

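4.5.4. Estimators with minimum $\mathbf{q}'\operatorname{cov}(\hat{\mathbf{T}})\mathbf{q}$ when $\mathbf{W} = \mathbf{I}_{p+1} \otimes \mathbf{w}_*$

Using constraint (4.12), the Lagrangian function for this case can be defined as

$$
\Phi(\mathbf{w}) = \mathbf{q}'\mathbf{W}'\mathbf{V}_I\mathbf{W}\mathbf{q} - 2\mathbf{q}'\mathbf{W}'\mathbf{V}_{I,II}\mathbf{L}_{II}\mathbf{q} + \bigl(\mathbf{X}_I'\mathbf{w}_* - (N-n)\bigr)\lambda. \qquad (4.16)
$$

Differentiating (4.16) and setting the derivatives to zero yields the following estimating equations,

$$
\begin{pmatrix} \mathbf{V}_{q,I} & \mathbf{X}_I \\ \mathbf{X}_I' & 0 \end{pmatrix}
\begin{pmatrix} \hat{\mathbf{w}}_* \\ \hat{\lambda} \end{pmatrix}
=
\begin{pmatrix} V_{q,I,II}\,\boldsymbol{\ell}_I \\ N-n \end{pmatrix}, \qquad (4.17)
$$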


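where $\mathbf{V}_{q,I} = (\mathbf{q}'\boldsymbol{\Sigma}\mathbf{q})\mathbf{P}_{n,N}$, $V_{q,I,II} = -\frac{N-n}{N}\mathbf{q}'\boldsymbol{\Sigma}\mathbf{q}$, $\boldsymbol{\ell}_I = \mathbf{1}_n$ and $\mathbf{X}_I = \mathbf{1}_n$. The solution of (4.17) is

$$
\hat{\mathbf{w}}_* = \mathbf{V}_{q,I}^{-1}V_{q,I,II}\,\boldsymbol{\ell}_I - \mathbf{V}_{q,I}^{-1}\mathbf{X}_I(\mathbf{X}_I'\mathbf{V}_{q,I}^{-1}\mathbf{X}_I)^{-1}\bigl(\mathbf{X}_I'\mathbf{V}_{q,I}^{-1}V_{q,I,II}\,\boldsymbol{\ell}_I - (N-n)\bigr). \qquad (4.18)
$$

After simplification, (4.18) becomes

$$
\hat{\mathbf{w}}_* = \frac{N-n}{n}\mathbf{1}_n,
$$

or equivalently,

$$
\hat{\mathbf{W}} = \frac{N-n}{n}(\mathbf{I}_{p+1} \otimes \mathbf{1}_n).
$$

Consequently,

$$
\hat{\mathbf{T}} = N\left(\mathbf{I}_{p+1} \otimes \frac{\mathbf{1}_n'}{n}\right)\mathbf{Z}_I = N\hat{\boldsymbol{\mu}}, \quad\text{and}\quad \operatorname{var}(\hat{\mathbf{T}}) = \frac{N(N-n)}{n}\boldsymbol{\Sigma}.
$$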


4.6. Remarks

In this Chapter, we derived two classes of estimators that are defined by two different sets of coefficients, i.e., $\mathbf{W} = \oplus_{k=0}^{p}\mathbf{w}_k$ and $\mathbf{W} = \mathbf{I}_{p+1} \otimes \mathbf{w}_*$. The estimators were optimized by minimizing either the trace of $\operatorname{cov}(\hat{\mathbf{T}})$ or $\mathbf{q}'\operatorname{cov}(\hat{\mathbf{T}})\mathbf{q}$. Under SRSWOR and unbiasedness constraints for every element of $\mathbf{T}$, the estimators for all four cases are identical and shown to be simple expansion estimators. It is apparent that neither the method of defining estimators nor the optimization criterion had an impact on the estimation.
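The common result of the four cases, the expansion estimator with covariance matrix $\frac{N(N-n)}{n}\boldsymbol{\Sigma}$, can be checked by enumeration for a small bivariate population. The sketch below is illustrative only: the data and function names are hypothetical, and $\boldsymbol{\Sigma}$ is assumed to be the finite-population covariance matrix with the $N-1$ divisor.

```python
from fractions import Fraction
from itertools import combinations

def enumerated_cov(units, n):
    """Exact covariance matrix of the expansion estimators over all SRSWOR samples."""
    N, k = len(units), len(units[0])
    scale = Fraction(N, n)
    est = [[scale * sum(u[j] for u in s) for j in range(k)]
           for s in combinations(units, n)]
    m = len(est)
    mean = [sum(e[j] for e in est) / m for j in range(k)]
    return [[sum((e[a] - mean[a]) * (e[b] - mean[b]) for e in est) / m
             for b in range(k)] for a in range(k)]

def formula_cov(units, n):
    """N (N - n) / n times Sigma, with Sigma using the N - 1 divisor (assumption)."""
    N, k = len(units), len(units[0])
    mu = [Fraction(sum(u[j] for u in units), N) for j in range(k)]
    sigma = [[sum((u[a] - mu[a]) * (u[b] - mu[b]) for u in units) / (N - 1)
              for b in range(k)] for a in range(k)]
    factor = Fraction(N * (N - n), n)
    return [[factor * sigma[a][b] for b in range(k)] for a in range(k)]

# Hypothetical population: (response, auxiliary) pairs for N = 6 subjects.
units = [(1, 3), (0, 5), (1, 2), (1, 8), (0, 6), (0, 4)]
cov_enum = enumerated_cov(units, n=3)
cov_formula = formula_cov(units, n=3)
```

With exact rational arithmetic the two matrices agree entry by entry, including the off-diagonal covariance between the two estimated totals.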


4.7. Derivations of results in Section 4.5.1.

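This section provides proof of the results in Section 4.5.1. A class of estimators defined by $\mathbf{W} = \oplus_{k=0}^{p}\mathbf{w}_k$, where the $\mathbf{w}_k$ may or may not be equal to each other, is derived by minimizing $\operatorname{trace}(\operatorname{cov}(\hat{\mathbf{T}}))$ subject to the unbiasedness constraints for every element of $\mathbf{T}$.

As shown in Section 4.4.2, the unbiasedness constraints are

$$
\mathbf{W}'\mathbf{X}_I - \mathbf{L}_{II}'\mathbf{X}_{II} = \mathbf{0}. \qquad (4.19)
$$

Therefore, minimizing $\operatorname{trace}(\operatorname{cov}(\hat{\mathbf{T}}))$ subject to the unbiasedness constraints (4.19) is equivalent to minimizing the following Lagrangian function,

$$
\Phi(\mathbf{W}) = \operatorname{tr}\bigl(\mathbf{W}'\mathbf{V}_I\mathbf{W} - \mathbf{W}'\mathbf{V}_{I,II}\mathbf{L}_{II} - \mathbf{L}_{II}'\mathbf{V}_{II,I}\mathbf{W}\bigr) + 2\sum_{i=1}^{p+1}\mathbf{u}_i'(\mathbf{W}'\mathbf{X}_I - \mathbf{L}_{II}'\mathbf{X}_{II})\boldsymbol{\lambda}, \qquad (4.20)
$$

where $\boldsymbol{\lambda}$ is a $(p+1) \times 1$ vector of Lagrangian multipliers.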

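Since $\mathbf{V}_I = \boldsymbol{\Sigma} \otimes \mathbf{P}_{n,N}$, $\mathbf{P}_{n,N} = \mathbf{I}_n - \frac{1}{N}\mathbf{J}_n$ and $\mathbf{W} = \oplus_{k=0}^{p}\mathbf{w}_k$,

$$
\mathbf{W}'\mathbf{V}_I\mathbf{W} = \Bigl(\oplus_{k=0}^{p}\mathbf{w}_k'\Bigr)(\boldsymbol{\Sigma} \otimes \mathbf{P}_{n,N})\Bigl(\oplus_{k=0}^{p}\mathbf{w}_k\Bigr)
=
\begin{pmatrix}
\sigma_0^2\,\mathbf{w}_0'\mathbf{P}_{n,N}\mathbf{w}_0 & \sigma_{01}\,\mathbf{w}_0'\mathbf{P}_{n,N}\mathbf{w}_1 & \cdots & \sigma_{0p}\,\mathbf{w}_0'\mathbf{P}_{n,N}\mathbf{w}_p \\
\sigma_{10}\,\mathbf{w}_1'\mathbf{P}_{n,N}\mathbf{w}_0 & \sigma_1^2\,\mathbf{w}_1'\mathbf{P}_{n,N}\mathbf{w}_1 & \cdots & \sigma_{1p}\,\mathbf{w}_1'\mathbf{P}_{n,N}\mathbf{w}_p \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{p0}\,\mathbf{w}_p'\mathbf{P}_{n,N}\mathbf{w}_0 & \sigma_{p1}\,\mathbf{w}_p'\mathbf{P}_{n,N}\mathbf{w}_1 & \cdots & \sigma_p^2\,\mathbf{w}_p'\mathbf{P}_{n,N}\mathbf{w}_p
\end{pmatrix}.
$$

It is obvious that $\operatorname{trace}(\mathbf{W}'\mathbf{V}_I\mathbf{W}) = \sum_{k=0}^{p}\sigma_k^2\,\mathbf{w}_k'\mathbf{P}_{n,N}\mathbf{w}_k$, $\frac{\partial}{\partial\mathbf{w}_k}\operatorname{tr}(\mathbf{W}'\mathbf{V}_I\mathbf{W}) = 2\sigma_k^2\,\mathbf{P}_{n,N}\mathbf{w}_k$, and, writing $\mathbf{w} = (\mathbf{w}_0'\ \ \mathbf{w}_1'\ \cdots\ \mathbf{w}_p')'$,

$$
\frac{\partial}{\partial\mathbf{w}}\operatorname{tr}(\mathbf{W}'\mathbf{V}_I\mathbf{W})
=
\begin{pmatrix}
\frac{\partial}{\partial\mathbf{w}_0}\sum_{k=0}^{p}\sigma_k^2\,\mathbf{w}_k'\mathbf{P}_{n,N}\mathbf{w}_k \\
\frac{\partial}{\partial\mathbf{w}_1}\sum_{k=0}^{p}\sigma_k^2\,\mathbf{w}_k'\mathbf{P}_{n,N}\mathbf{w}_k \\
\vdots \\
\frac{\partial}{\partial\mathbf{w}_p}\sum_{k=0}^{p}\sigma_k^2\,\mathbf{w}_k'\mathbf{P}_{n,N}\mathbf{w}_k
\end{pmatrix}
=
\begin{pmatrix}
2\sigma_0^2\,\mathbf{P}_{n,N}\mathbf{w}_0 \\
2\sigma_1^2\,\mathbf{P}_{n,N}\mathbf{w}_1 \\
\vdots \\
2\sigma_p^2\,\mathbf{P}_{n,N}\mathbf{w}_p
\end{pmatrix}
= 2\Bigl[\bigl(\oplus_{k=0}^{p}\sigma_k^2\bigr) \otimes \mathbf{P}_{n,N}\Bigr]\mathbf{w}
= 2(\mathbf{D}_{\Sigma} \otimes \mathbf{P}_{n,N})\mathbf{w}
= 2\mathbf{D}_I\mathbf{w},
$$

where $\mathbf{D}_{\Sigma} = \oplus_{k=0}^{p}\sigma_k^2$ and $\mathbf{D}_I = \mathbf{D}_{\Sigma} \otimes \mathbf{P}_{n,N}$.

Since $\operatorname{tr}(\mathbf{W}'\mathbf{V}_{I,II}\mathbf{L}_{II}) = -\frac{N-n}{N}\sum_{k=0}^{p}\sigma_k^2\,\mathbf{w}_k'\mathbf{1}_n$ and $\frac{\partial}{\partial\mathbf{w}_k}\operatorname{tr}(\mathbf{W}'\mathbf{V}_{I,II}\mathbf{L}_{II}) = -\frac{N-n}{N}\sigma_k^2\,\mathbf{1}_n$, we have $\frac{\partial}{\partial\mathbf{w}}\operatorname{tr}(\mathbf{W}'\mathbf{V}_{I,II}\mathbf{L}_{II}) = -\frac{N-n}{N}(\mathbf{D}_{\Sigma}\mathbf{1}_{p+1} \otimes \mathbf{1}_n)$. In addition,

$$
\begin{aligned}
\frac{\partial}{\partial\mathbf{w}}\sum_{k=0}^{p}\mathbf{u}_{k+1}'\mathbf{W}'\mathbf{X}_I\boldsymbol{\lambda}
&= \frac{\partial}{\partial\mathbf{w}}\operatorname{tr}\Bigl(\sum_{k=0}^{p}\sum_{k'=0}^{p}\mathbf{u}_{k+1}'\mathbf{u}_{k'+1}\mathbf{w}'(\mathbf{u}_{k'+1}\mathbf{u}_{k'+1}' \otimes \mathbf{I}_n)\mathbf{X}_I\boldsymbol{\lambda}\Bigr) \\
&= \frac{\partial}{\partial\mathbf{w}}\sum_{k=0}^{p}\sum_{k'=0}^{p}\mathbf{u}_{k+1}'\mathbf{u}_{k'+1}\mathbf{w}'(\mathbf{u}_{k'+1}\mathbf{u}_{k'+1}' \otimes \mathbf{I}_n)\mathbf{X}_I\boldsymbol{\lambda} \\
&= \Bigl(\sum_{k=0}^{p}\sum_{k'=0}^{p}\mathbf{u}_{k+1}'\mathbf{u}_{k'+1}(\mathbf{u}_{k'+1}\mathbf{u}_{k'+1}' \otimes \mathbf{I}_n)\Bigr)\mathbf{X}_I\boldsymbol{\lambda} \\
&= \mathbf{X}_I\boldsymbol{\lambda}.
\end{aligned}
$$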

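Therefore, differentiating (4.20) with respect to the elements of $\mathbf{w}$ and $\boldsymbol{\lambda}$ gives

$$
\frac{\partial}{\partial\mathbf{w}}\Phi(\mathbf{w}) = 2\mathbf{D}_I\mathbf{w} - 2\mathbf{d}_{I,II} + 2\mathbf{X}_I\boldsymbol{\lambda}, \qquad (4.21)
$$

$$
\frac{\partial}{\partial\boldsymbol{\lambda}}\Phi(\mathbf{w}) = \sum_{i=1}^{p+1}(\mathbf{X}_I'\mathbf{W} - \mathbf{X}_{II}'\mathbf{L}_{II})\mathbf{u}_i = \mathbf{X}_I'\mathbf{w} - \mathbf{X}_{II}'\mathbf{L}_{II}\mathbf{1}_{p+1}, \qquad (4.22)
$$

where $\mathbf{D}_I = \mathbf{D}_{\Sigma} \otimes \mathbf{P}_{n,N}$ and $\mathbf{d}_{I,II} = -\frac{N-n}{N}(\mathbf{D}_{\Sigma}\mathbf{1}_{p+1} \otimes \mathbf{1}_n)$. Setting both (4.21) and (4.22) to zero results in the following estimating equations,

$$
\begin{pmatrix} \mathbf{D}_I & \mathbf{X}_I \\ \mathbf{X}_I' & \mathbf{0} \end{pmatrix}
\begin{pmatrix} \hat{\mathbf{w}} \\ \hat{\boldsymbol{\lambda}} \end{pmatrix}
=
\begin{pmatrix} \mathbf{d}_{I,II} \\ \mathbf{X}_{II}'\mathbf{L}_{II}\mathbf{1}_{p+1} \end{pmatrix}. \qquad (4.23)
$$

The solution to (4.23) is

$$
\hat{\mathbf{w}} = \mathbf{D}_I^{-1}\mathbf{d}_{I,II} - \mathbf{D}_I^{-1}\mathbf{X}_I(\mathbf{X}_I'\mathbf{D}_I^{-1}\mathbf{X}_I)^{-1}(\mathbf{X}_I'\mathbf{D}_I^{-1}\mathbf{d}_{I,II} - \mathbf{X}_{II}'\mathbf{L}_{II}\mathbf{1}_{p+1}).
$$

Since $\mathbf{P}_{n,N}^{-1} = \mathbf{I}_n + \frac{1}{N-n}\mathbf{J}_n$, $\mathbf{P}_{n,N}^{-1}\mathbf{1}_n = \frac{N}{N-n}\mathbf{1}_n$, $\mathbf{1}_n'\mathbf{P}_{n,N}^{-1}\mathbf{1}_n = \frac{nN}{N-n}$ and $\mathbf{D}_I^{-1}\mathbf{d}_{I,II} = -\mathbf{1}_{p+1} \otimes \mathbf{1}_n$, we have

$$
\begin{aligned}
\mathbf{D}_I^{-1}\mathbf{X}_I(\mathbf{X}_I'\mathbf{D}_I^{-1}\mathbf{X}_I)^{-1}
&= (\mathbf{D}_{\Sigma}^{-1} \otimes \mathbf{P}_{n,N}^{-1})(\mathbf{I}_{p+1} \otimes \mathbf{1}_n)\Bigl[(\mathbf{I}_{p+1} \otimes \mathbf{1}_n')(\mathbf{D}_{\Sigma}^{-1} \otimes \mathbf{P}_{n,N}^{-1})(\mathbf{I}_{p+1} \otimes \mathbf{1}_n)\Bigr]^{-1} \\
&= \mathbf{D}_{\Sigma}^{-1}\mathbf{D}_{\Sigma} \otimes \mathbf{P}_{n,N}^{-1}\mathbf{1}_n(\mathbf{1}_n'\mathbf{P}_{n,N}^{-1}\mathbf{1}_n)^{-1} \\
&= \mathbf{I}_{p+1} \otimes \left(\frac{1}{n}\mathbf{1}_n\right),
\end{aligned}
$$

$$
\begin{aligned}
\mathbf{X}_I'\mathbf{D}_I^{-1}\mathbf{d}_{I,II} - \mathbf{X}_{II}'\mathbf{L}_{II}\mathbf{1}_{p+1}
&= -\frac{N-n}{N}(\mathbf{I}_{p+1} \otimes \mathbf{1}_n')(\mathbf{D}_{\Sigma}^{-1} \otimes \mathbf{P}_{n,N}^{-1})(\mathbf{D}_{\Sigma}\mathbf{1}_{p+1} \otimes \mathbf{1}_n) - (\mathbf{I}_{p+1} \otimes \mathbf{1}_{N-n}')(\mathbf{I}_{p+1} \otimes \mathbf{1}_{N-n})\mathbf{1}_{p+1} \\
&= -n\mathbf{1}_{p+1} - (N-n)\mathbf{1}_{p+1} = -n\left(1 + \frac{N-n}{n}\right)\mathbf{1}_{p+1},
\end{aligned}
$$

$$
\begin{aligned}
\hat{\mathbf{w}}
&= \mathbf{D}_I^{-1}\mathbf{d}_{I,II} - \mathbf{D}_I^{-1}\mathbf{X}_I(\mathbf{X}_I'\mathbf{D}_I^{-1}\mathbf{X}_I)^{-1}(\mathbf{X}_I'\mathbf{D}_I^{-1}\mathbf{d}_{I,II} - \mathbf{X}_{II}'\mathbf{L}_{II}\mathbf{1}_{p+1}) \\
&= -(\mathbf{1}_{p+1} \otimes \mathbf{1}_n) + (\mathbf{I}_{p+1} \otimes \mathbf{1}_n)\left(1 + \frac{N-n}{n}\right)\mathbf{1}_{p+1} \\
&= \frac{N-n}{n}(\mathbf{1}_{p+1} \otimes \mathbf{1}_n).
\end{aligned}
$$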


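Consequently, $\hat{\mathbf{W}} = \frac{N-n}{n}(\mathbf{I}_{p+1} \otimes \mathbf{1}_n)$, and

$$
\hat{\mathbf{T}} = \left(\mathbf{I}_{p+1} \otimes \mathbf{1}_n' + \frac{N-n}{n}\,\mathbf{I}_{p+1} \otimes \mathbf{1}_n'\right)\mathbf{Z}_I = N\left(\mathbf{I}_{p+1} \otimes \frac{\mathbf{1}_n'}{n}\right)\mathbf{Z}_I = N\hat{\boldsymbol{\mu}},
$$

$$
\operatorname{var}(\hat{\mathbf{T}}) = \left(\frac{N}{n}(\mathbf{I}_{p+1} \otimes \mathbf{1}_n')\right)(\boldsymbol{\Sigma} \otimes \mathbf{P}_{n,N})\left(\frac{N}{n}(\mathbf{I}_{p+1} \otimes \mathbf{1}_n)\right) = \frac{N(N-n)}{n}\boldsymbol{\Sigma}.
$$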
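The matrix identities for $\mathbf{P}_{n,N}$ used in the simplification above are easy to confirm numerically. The following sketch uses exact rational arithmetic; the helper names and the choice $n = 3$, $N = 7$ are illustrative assumptions.

```python
from fractions import Fraction

def p_matrix(n, N):
    """P_{n,N} = I_n - J_n / N."""
    return [[Fraction(int(i == j)) - Fraction(1, N) for j in range(n)]
            for i in range(n)]

def p_inverse(n, N):
    """Claimed inverse: I_n + J_n / (N - n)."""
    return [[Fraction(int(i == j)) + Fraction(1, N - n) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

n, N = 3, 7
product = matmul(p_matrix(n, N), p_inverse(n, N))
row_sums = [sum(row) for row in p_inverse(n, N)]     # entries of P^{-1} 1_n
quadratic = sum(row_sums)                            # 1_n' P^{-1} 1_n
```

Here `product` is the identity matrix, each entry of `row_sums` equals $N/(N-n)$, and `quadratic` equals $nN/(N-n)$, matching the three identities quoted in the derivation.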


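4.8. Derivations of results in Section 4.5.2.

This section presents the derivations for a class of estimators defined by $\mathbf{W} = \oplus_{k=0}^{p}\mathbf{w}_k$ and optimized by minimizing $\mathbf{q}'\operatorname{cov}(\hat{\mathbf{T}})\mathbf{q}$ subject to the unbiasedness constraints for every element of $\mathbf{T}$, where $\mathbf{q}$ is a $(p+1) \times 1$ vector.

As shown in Section 4.4.2, the unbiasedness constraints are

$$
\mathbf{W}'\mathbf{X}_I - \mathbf{L}_{II}'\mathbf{X}_{II} = \mathbf{0}. \qquad (4.24)
$$

Minimizing $\mathbf{q}'\operatorname{cov}(\hat{\mathbf{T}})\mathbf{q}$ subject to the unbiasedness constraints (4.24) is equivalent to minimizing the following Lagrangian function,

$$
\Phi(\mathbf{W}) = \mathbf{q}'\Bigl[\mathbf{W}'\mathbf{V}_I\mathbf{W} - \mathbf{W}'\mathbf{V}_{I,II}\mathbf{L}_{II} - (\mathbf{V}_{I,II}\mathbf{L}_{II})'\mathbf{W}\Bigr]\mathbf{q} + 2\sum_{i=1}^{p+1}\mathbf{u}_i'(\mathbf{W}'\mathbf{X}_I - \mathbf{L}_{II}'\mathbf{X}_{II})\boldsymbol{\lambda},
$$

where $\boldsymbol{\lambda}_{(p+1)\times 1} = (\lambda_0\ \ \lambda_1\ \cdots\ \lambda_p)'$ is a $(p+1) \times 1$ vector of Lagrangian multipliers.

For convenience, we denote $\mathbf{w} = (\mathbf{w}_0'\ \ \mathbf{w}_1'\ \cdots\ \mathbf{w}_p')'$. It is clear that $\mathbf{w} = \bigl(\oplus_{k=0}^{p}\mathbf{w}_k\bigr)\mathbf{1}_{p+1}$ and $\mathbf{W} = \sum_{i=1}^{p+1}(\mathbf{u}_i\mathbf{u}_i' \otimes \mathbf{I}_n)\mathbf{w}\mathbf{u}_i'$, where $\mathbf{u}_i$ and $\mathbf{u}_i'$ are the $i$-th column and row of the identity matrix $\mathbf{I}_{p+1}$. Therefore, $\Phi(\mathbf{w}) = \Phi(\mathbf{W})$.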


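Since $\mathbf{W}\mathbf{q} = \Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr)\mathbf{w}$ and $\mathbf{q}'\mathbf{W}' = \mathbf{w}'\Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr)$,

$$
\begin{aligned}
\frac{\partial}{\partial\mathbf{w}}\mathbf{q}'\mathbf{W}'\mathbf{V}_I\mathbf{W}\mathbf{q}
&= \frac{\partial}{\partial\mathbf{w}}\mathbf{w}'\Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr)\mathbf{V}_I\Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr)\mathbf{w} \\
&= 2\Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr)\mathbf{V}_I\Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr)\mathbf{w},
\end{aligned}
$$

$$
\frac{\partial}{\partial\mathbf{w}}\bigl(\mathbf{q}'\mathbf{W}'\mathbf{V}_{I,II}\mathbf{L}_{II}\mathbf{q}\bigr)
= \frac{\partial}{\partial\mathbf{w}}\mathbf{w}'\Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr)\mathbf{V}_{I,II}\mathbf{L}_{II}\mathbf{q}
= \Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr)\mathbf{V}_{I,II}\mathbf{L}_{II}\mathbf{q}.
$$

As shown in Section 4.9, $\sum_{i=1}^{p+1}\mathbf{u}_i'(\mathbf{W}'\mathbf{X}_I - \mathbf{L}_{II}'\mathbf{X}_{II})\boldsymbol{\lambda} = \mathbf{w}'\mathbf{X}_I\boldsymbol{\lambda} - \mathbf{1}_{p+1}'\mathbf{L}_{II}'\mathbf{X}_{II}\boldsymbol{\lambda}$, $\frac{\partial}{\partial\mathbf{w}}\sum_{i=1}^{p+1}\mathbf{u}_i'(\mathbf{W}'\mathbf{X}_I - \mathbf{L}_{II}'\mathbf{X}_{II})\boldsymbol{\lambda} = \mathbf{X}_I\boldsymbol{\lambda}$, and $\frac{\partial}{\partial\boldsymbol{\lambda}}\sum_{i=1}^{p+1}\mathbf{u}_i'(\mathbf{W}'\mathbf{X}_I - \mathbf{L}_{II}'\mathbf{X}_{II})\boldsymbol{\lambda} = \mathbf{X}_I'\mathbf{w} - \mathbf{X}_{II}'\mathbf{L}_{II}\mathbf{1}_{p+1}$.

Consequently, the two sets of derivatives are

$$
\frac{\partial\Phi}{\partial\mathbf{w}} = 2\Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr)\mathbf{V}_I\Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr)\mathbf{w} - 2\Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr)\mathbf{V}_{I,II}\mathbf{L}_{II}\mathbf{q} + 2\mathbf{X}_I\boldsymbol{\lambda},
$$

$$
\frac{\partial\Phi}{\partial\boldsymbol{\lambda}} = \mathbf{X}_I'\mathbf{w} - \mathbf{X}_{II}'\mathbf{L}_{II}\mathbf{1}_{p+1}.
$$

Setting both sets of derivatives to zero, we have

$$
\begin{pmatrix}
\Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr)\mathbf{V}_I\Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr) & \mathbf{X}_I \\
\mathbf{X}_I' & \mathbf{0}
\end{pmatrix}
\begin{pmatrix} \hat{\mathbf{w}} \\ \hat{\boldsymbol{\lambda}} \end{pmatrix}
=
\begin{pmatrix}
\Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr)\mathbf{V}_{I,II}\mathbf{L}_{II}\mathbf{q} \\
\mathbf{X}_{II}'\mathbf{L}_{II}\mathbf{1}_{p+1}
\end{pmatrix}. \qquad (4.25)
$$

Denoting $\mathbf{V}_{qI} = \Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr)\mathbf{V}_I\Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr)$ and $\mathbf{V}_{qI,II} = \Bigl(\bigl(\oplus_{k=0}^{p}q_k\bigr) \otimes \mathbf{I}_n\Bigr)\mathbf{V}_{I,II}$, the solutions to (4.25) can be written as

$$
\begin{cases}
\hat{\boldsymbol{\lambda}} = (\mathbf{X}_I'\mathbf{V}_{qI}^{-1}\mathbf{X}_I)^{-1}\bigl[\mathbf{X}_I'\mathbf{V}_{qI}^{-1}\mathbf{V}_{qI,II}\mathbf{L}_{II}\mathbf{q} - \mathbf{X}_{II}'\mathbf{L}_{II}\mathbf{1}_{p+1}\bigr], \\[4pt]
\hat{\mathbf{w}} = \mathbf{V}_{qI}^{-1}\mathbf{V}_{qI,II}\mathbf{L}_{II}\mathbf{q} - \mathbf{V}_{qI}^{-1}\mathbf{X}_I(\mathbf{X}_I'\mathbf{V}_{qI}^{-1}\mathbf{X}_I)^{-1}\bigl[\mathbf{X}_I'\mathbf{V}_{qI}^{-1}\mathbf{V}_{qI,II}\mathbf{L}_{II}\mathbf{q} - \mathbf{X}_{II}'\mathbf{L}_{II}\mathbf{1}_{p+1}\bigr].
\end{cases} \qquad (4.26)
$$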


The explicit expression for w can be derived as follows,

( ) 11 1 1 1, , 1ˆ qI qI II II qI qI I qI I I qI I II II II II p

−− − − −+′ ′ ′⎡ ⎤= − −⎣ ⎦w V V L q V X X V X X V V L q X L 1 ,

where

1 1 1 1 1 1 1 1 1, ,0 0 0 0

p p p p

qI k n n N k n k k n Nk k k kq q q q− − − − − − − − −

= = = =

⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎡ ⎤= ⊕ ⊗ ⊗ ⊕ ⊗ = ⊕ ⊕ ⊗⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎣ ⎦⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠ ⎝ ⎠V I Σ P I Σ P ,

( ), 0

1p

qI II k n N nkq

N × −=

⎛ ⎞⎛ ⎞ ⎛ ⎞= ⊕ ⊗ −⎜ ⎟⎜ ⎟⎜ ⎟ ⎝ ⎠⎝ ⎠⎝ ⎠V Σ J , ( )

1 1, 0

1p

qI qI II k n N nkq

N n− −

× −=

⎛ ⎞ ⎛ ⎞= ⊕ ⊗ −⎜ ⎟⎜ ⎟ −⎝ ⎠⎝ ⎠V V J , and

1 1 1 1

0 0

p p

qI I k k nk k

N q qN n

− − − −

= =

⎛ ⎞⎛ ⎞ ⎛ ⎞= ⊕ ⊕ ⊗⎜ ⎟ ⎜ ⎟⎜ ⎟− ⎝ ⎠ ⎝ ⎠⎝ ⎠V X Σ 1 ,

( ) ( ) ( )

( )

111 1 1 1 1

1 , 10 0

1 111 1 1 1 1 1 1

,0 0 0 0

p p

I qI I p n k k n N p nk k

p p p p

k k n n N n k kk k k k

q q

N nq q q qNn

−−− − − − −

+ += =

− −−− − − − − − −

= = = =

⎧ ⎫⎛ ⎞⎛ ⎞⎪ ⎪⎛ ⎞ ⎛ ⎞′ ′= ⊗ ⊕ ⊕ ⊗ ⊗⎨ ⎬⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟⎝ ⎠ ⎝ ⎠⎝ ⎠⎪ ⎪⎝ ⎠⎩ ⎭

−⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞′= ⊕ ⊕ = ⊕ ⊕⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠

X V X I 1 Σ P I 1

Σ 1 P 1 Σ

( ) ( )1 1 1

, 1 0 0

1p p

I qI qI II p n k k N nn N nk k

nq qN n N n

− − −+ −× −= =

⎛ ⎞⎛ ⎞⎛ ⎞ ⎛ ⎞′ ′ ′= ⊗ ⊕ ⊗ − = − ⊕ ⊗⎜ ⎟ ⎜ ⎟⎜ ⎟⎜ ⎟− −⎝ ⎠ ⎝ ⎠⎝ ⎠⎝ ⎠X V V I 1 J 1 .

Therefore,

( ) ( )

( )

110

11 1 1 1 1 1

0 0 0 0

11 10

1ˆp

k p N nn N nk

p p p p

k k n k kk k k k

p

k N n p N n pk

qN n

N N nq q q qN n Nn

n qN n

−+ −× −=

−− − − − − −

= = = =

−− + − +=

⎛ ⎞⎛ ⎞⎛ ⎞= ⊕ ⊗ − ⊗⎜ ⎟ ⎜ ⎟⎜ ⎟−⎝ ⎠ ⎝ ⎠⎝ ⎠

⎛ ⎞ −⎛ ⎞ ⎛ ⎞⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞− ⊕ ⊕ ⊗ ⊕ ⊕⎜ ⎟⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟− ⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎝ ⎠⎝ ⎠

⎛ ⎞⎛ ⎞ ′× − ⊕ ⊗ ⊗ − ⊗⎜ ⎟⎜ ⎟− ⎝ ⎠⎝ ⎠

w J I 1 q

Σ 1 Σ

1 I 1 q I( )( )

( ) ( )

( ) ( )

1 1

1 11 10 0

1 1

1

,

N n p N n p

p p

k n p n k pk k

p n p n

q n q N nn

Nn

− + − +

− −+ += =

+ +

⎡ ⎤′ ⊗⎢ ⎥

⎣ ⎦⎛ ⎞ ⎡ ⎤⎛ ⎞ ⎛ ⎞= − ⊕ ⊗ − ⊗ − ⊕ − −⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎢ ⎥⎝ ⎠ ⎝ ⎠⎝ ⎠ ⎣ ⎦

= − ⊗ + ⊗

1 I 1 1

1 q I 1 q 1

1 1 1 1

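or simply,

$$\hat{\mathbf{w}} = \frac{N-n}{n}\left(\mathbf{1}_{p+1} \otimes \mathbf{1}_n\right) \quad\text{and}\quad \hat{\mathbf{W}} = \mathbf{I}_{p+1} \otimes \left(\frac{N-n}{n}\right)\mathbf{1}_n .$$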


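This section provides proof of the results in Section 4.5.3. A class of estimators defined by $\mathbf{W} = \mathbf{I}_{p+1} \otimes \mathbf{w}_*$ is derived by minimizing the trace of $\operatorname{cov}(\hat{\mathbf{T}})$ subject to the unbiasedness constraints for every element of $\mathbf{T}$.

Since $\mathbf{W} = \mathbf{I}_{p+1} \otimes \mathbf{w}_*$, the unbiasedness constraints reduce to

$$\mathbf{w}_*'\mathbf{1}_n - (N-n) = 0. \tag{4.27}$$

Minimizing the trace of $\operatorname{cov}(\hat{\mathbf{T}})$ subject to (4.27) is equivalent to minimizing the following Lagrangian function,

$$\Phi(\mathbf{W}) = \operatorname{tr}\left(\mathbf{W}'\mathbf{V}_I\mathbf{W} - \mathbf{W}'\mathbf{V}_{I,II}\mathbf{L}_{II} - \mathbf{L}_{II}'\mathbf{V}_{II,I}\mathbf{W}\right) + 2\left(\mathbf{w}_*'\mathbf{1}_n - (N-n)\right)\lambda, \tag{4.28}$$

where $\lambda$ is a Lagrangian multiplier.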

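Since $\mathbf{W}'\mathbf{V}_I\mathbf{W} = (\mathbf{I}_{p+1}\otimes\mathbf{w}_*)'\,\mathbf{V}_I\,(\mathbf{I}_{p+1}\otimes\mathbf{w}_*)$ and $\mathbf{V}_I = \boldsymbol{\Sigma}\otimes\mathbf{P}_{n,N}$,

$$\operatorname{tr}(\mathbf{W}'\mathbf{V}_I\mathbf{W}) = \operatorname{tr}\left\{(\mathbf{I}_{p+1}\otimes\mathbf{w}_*')(\boldsymbol{\Sigma}\otimes\mathbf{P}_{n,N})(\mathbf{I}_{p+1}\otimes\mathbf{w}_*)\right\} = \operatorname{tr}(\boldsymbol{\Sigma})\,\mathbf{w}_*'\mathbf{P}_{n,N}\mathbf{w}_*,$$

$$\frac{\partial}{\partial\mathbf{w}_*}\operatorname{tr}(\mathbf{W}'\mathbf{V}_I\mathbf{W}) = \operatorname{tr}(\boldsymbol{\Sigma})\frac{\partial}{\partial\mathbf{w}_*}\left(\mathbf{w}_*'\mathbf{P}_{n,N}\mathbf{w}_*\right) = 2\operatorname{tr}(\boldsymbol{\Sigma})\,\mathbf{P}_{n,N}\mathbf{w}_*,$$

$$\operatorname{tr}(\mathbf{W}'\mathbf{V}_{I,II}\mathbf{L}_{II}) = \operatorname{tr}\left[(\mathbf{I}_{p+1}\otimes\mathbf{w}_*')\left(\boldsymbol{\Sigma}\otimes\left(-\tfrac{1}{N}\right)\mathbf{J}_{n\times(N-n)}\right)(\mathbf{I}_{p+1}\otimes\mathbf{1}_{N-n})\right] = -\frac{N-n}{N}\operatorname{tr}(\boldsymbol{\Sigma})\,\mathbf{w}_*'\mathbf{1}_n,$$

$$\frac{\partial}{\partial\mathbf{w}_*}\operatorname{tr}(\mathbf{W}'\mathbf{V}_{I,II}\mathbf{L}_{II}) = -\frac{N-n}{N}\operatorname{tr}(\boldsymbol{\Sigma})\,\mathbf{1}_n,$$

$$\frac{\partial}{\partial\mathbf{w}_*}\left(\mathbf{w}_*'\mathbf{1}_n - (N-n)\right)\lambda = \lambda\mathbf{1}_n,$$

$$\frac{\partial}{\partial\lambda}\Phi(\mathbf{w}_*) = 2\left(\mathbf{w}_*'\mathbf{1}_n - (N-n)\right).$$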

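Therefore, differentiating (4.28) with respect to $\mathbf{w}_*$ and $\lambda$ and setting the derivatives equal to zero gives the following estimating equations

$$\begin{pmatrix}\mathbf{V}_{tr,I} & \mathbf{X}_I\\ \mathbf{X}_I' & 0\end{pmatrix}\begin{pmatrix}\hat{\mathbf{w}}_*\\ \hat{\lambda}\end{pmatrix} = \begin{pmatrix}\mathbf{V}_{tr,I,II}\,\boldsymbol{\ell}_I\\ N-n\end{pmatrix}, \tag{4.29}$$

where $\mathbf{V}_{tr,I} = \operatorname{tr}(\boldsymbol{\Sigma})\,\mathbf{P}_{n,N}$, $\mathbf{X}_I = \mathbf{1}_n$, $\mathbf{V}_{tr,I,II} = -\frac{N-n}{N}\operatorname{tr}(\boldsymbol{\Sigma})$ and $\boldsymbol{\ell}_I = \mathbf{1}_n$. The solutions of (4.29) are

$$\hat{\mathbf{w}}_* = \mathbf{V}_{tr,I}^{-1}\mathbf{V}_{tr,I,II}\,\boldsymbol{\ell}_I - \mathbf{V}_{tr,I}^{-1}\mathbf{X}_I\left(\mathbf{X}_I'\mathbf{V}_{tr,I}^{-1}\mathbf{X}_I\right)^{-1}\left(\mathbf{X}_I'\mathbf{V}_{tr,I}^{-1}\mathbf{V}_{tr,I,II}\,\boldsymbol{\ell}_I - (N-n)\right). \tag{4.30}$$

After algebraic simplification, (4.30) is equivalent to $\hat{\mathbf{w}}_* = \frac{N-n}{n}\mathbf{1}_n$, or $\hat{\mathbf{W}} = \mathbf{I}_{p+1}\otimes\left(\frac{N-n}{n}\right)\mathbf{1}_n$. Consequently,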

1ˆ ˆn

p IN Nn+

′⎛ ⎞= ⊗ =⎜ ⎟⎝ ⎠


n−

=T Σ .


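This section provides derivations for the class of estimators defined by $\mathbf{W} = \mathbf{I}_{p+1}\otimes\mathbf{w}_*$ and optimized by minimizing $\mathbf{q}'\operatorname{cov}(\hat{\mathbf{T}})\,\mathbf{q}$ subject to the unbiasedness constraints for every element of $\mathbf{T}$, where $\mathbf{q}$ is an $n\times 1$ vector.

Since $\mathbf{W} = \mathbf{I}_{p+1}\otimes\mathbf{w}_*$, the unbiasedness constraints reduce to

$$\mathbf{1}_n'\mathbf{w}_* - (N-n) = 0. \tag{4.31}$$

Minimizing $\mathbf{q}'\operatorname{cov}(\hat{\mathbf{T}})\,\mathbf{q}$ subject to (4.31) is equivalent to minimizing the following Lagrangian function,

$$\Phi(\mathbf{W}) = \mathbf{q}'\mathbf{W}'\mathbf{V}_I\mathbf{W}\mathbf{q} - 2\,\mathbf{q}'\mathbf{W}'\mathbf{V}_{I,II}\mathbf{L}_{II}\mathbf{q} + 2\left(\mathbf{1}_n'\mathbf{w}_* - (N-n)\right)\lambda,$$

where $\lambda$ is a $(p+1)\times 1$ column vector of Lagrangian multipliers. The Lagrangian function can be written alternatively as,

$$\Phi(\mathbf{w}_*) = \mathbf{q}'\mathbf{W}'\mathbf{V}_I\mathbf{W}\mathbf{q} - 2\,\mathbf{q}'\mathbf{W}'\mathbf{V}_{I,II}\mathbf{L}_{II}\mathbf{q} + 2\left(\mathbf{1}_n'\mathbf{w}_* - (N-n)\right)\lambda.$$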

Let us evaluate the derivatives term by term.

( )( )( ) ( )

( )

* , * * , ** * *

, *2

I n N n N

n N

∂ ∂ ∂′ ′ ′ ′ ′ ′= ⊗ ⊗ ⊗ =∂ ∂ ∂

′=

q W V Wq q w Σ P q w q Σq w P ww w w

q Σq P w,

( ) ( )( )

( ) ( ) ( )

( ) ( )

( )

, 1 * ,* *

* 1*

**

1

I II II p I II II

p N nn N n

n

n

NN n

NN n

N

+

+ −× −

∂ ∂′ ′ ′ ′= ⊗∂ ∂

∂ ⎛ ⎞⎛ ⎞′ ′= ⊗ − ⊗ ⊗⎜ ⎟⎜ ⎟∂ ⎝ ⎠⎝ ⎠− ∂′ ′= −

∂− ′= −

q W V L q q I w V L qw w

q w Σ J I 1 qw

q Σq 1 ww

q Σq 1

,

( )( )*

*n nN n λ λ∂ ′ − − =

∂1 w 1

w,

( )( ) ( )* *n nN n N nλλ∂ ′ ′− − = − −∂

1 w 1 w .

Therefore,

( ) ( ) ( )* , **

2 2 2n N n nN n

Nλ∂ −′ ′Φ = − +

∂w q Σq P w q Σq 1 1

w (4.32)

( ) ( )* *n N nλ∂ ′Φ = − −∂

w 1 w (4.33)

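Setting both (4.32) and (4.33) to zero, we have,

$$\begin{pmatrix}\mathbf{V}_{q,I} & \mathbf{X}_I\\ \mathbf{X}_I' & 0\end{pmatrix}\begin{pmatrix}\hat{\mathbf{w}}_*\\ \hat{\lambda}\end{pmatrix} = \begin{pmatrix}\mathbf{V}_{q,I,II}\,\boldsymbol{\ell}_I\\ N-n\end{pmatrix}, \tag{4.34}$$

where $\mathbf{V}_{q,I} = (\mathbf{q}'\boldsymbol{\Sigma}\mathbf{q})\,\mathbf{P}_{n,N}$, $\mathbf{V}_{q,I,II} = -\frac{N-n}{N}\,\mathbf{q}'\boldsymbol{\Sigma}\mathbf{q}$, $\boldsymbol{\ell}_I = \mathbf{1}_n$ and $\mathbf{X}_I = \mathbf{1}_n$. The solutions of (4.34) are

$$\hat{\mathbf{w}}_* = \mathbf{V}_{q,I}^{-1}\mathbf{V}_{q,I,II}\,\boldsymbol{\ell}_I - \mathbf{V}_{q,I}^{-1}\mathbf{X}_I\left(\mathbf{X}_I'\mathbf{V}_{q,I}^{-1}\mathbf{X}_I\right)^{-1}\left(\mathbf{X}_I'\mathbf{V}_{q,I}^{-1}\mathbf{V}_{q,I,II}\,\boldsymbol{\ell}_I - (N-n)\right). \tag{4.35}$$

After simplification, (4.35) becomes,

$$\hat{\mathbf{w}}_* = \frac{N-n}{n}\mathbf{1}_n,$$

or equivalently,

$$\hat{\mathbf{W}} = \mathbf{I}_{p+1}\otimes\left(\frac{N-n}{n}\right)\mathbf{1}_n .$$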

Consequently,

1ˆ ˆn

p IN Nn+

′⎛ ⎞= ⊗ =⎜ ⎟⎝ ⎠


n−

=T Σ .


CHAPTER 5

ESTIMATING POPULATION TOTALS BY REPARAMETERIZATION

5.1. Motivation

In many practical situations, auxiliary information such as gender, age, income and history of chronic disease is completely or partially known, either from a sampling frame or from publicly available census data. Although such information is commonly used in survey design, its use in estimation after sampling is equally important. In this Chapter, we demonstrate several methods of incorporating auxiliary information in estimation through reparameterization. We consider two scenarios:

1) Values of the auxiliary variable are known for all units in the population;

2) Values of the auxiliary variable are known only for the units in the sample, while its population total (or mean) is known.

5.2. Formulation of the model through reparameterization

In Chapter 4, we defined a random permutation seemingly unrelated regression model,

$$\begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(1)}\end{pmatrix} = (\mathbf{I}_2\otimes\mathbf{1}_N)\,\boldsymbol{\mu} + \mathbf{E}. \tag{5.1}$$

In both scenarios considered here, the auxiliary total $T^{(1)}$ (or the mean $\mu^{(1)}$) is known. As a result, the $N$ random variables representing the permutation of the auxiliary variable $\mathbf{Y}^{(1)}$ are constrained by $\mathbf{1}_N'\mathbf{Y}^{(1)} = T^{(1)}$. We transform the vector of auxiliary variables $\mathbf{Y}^{(1)}$ by pre-multiplying it by $\mathbf{I}_N = \frac{\mathbf{1}_N\mathbf{1}_N'}{N} + \mathbf{P}_N$, such that

$$\mathbf{Y}^{(1)} = \frac{\mathbf{1}_N}{N}\left(\mathbf{1}_N'\mathbf{Y}^{(1)}\right) + \mathbf{P}_N\mathbf{Y}^{(1)}.$$

The first term in this expression is a vector of constants, $\mu^{(1)}\mathbf{1}_N$. We define the second term as

$$\mathbf{Y}^{(1)*} = \mathbf{P}_N\mathbf{Y}^{(1)} = \mathbf{Y}^{(1)} - \mu^{(1)}\mathbf{1}_N .$$
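The centering step can be checked numerically. The following is a minimal sketch, assuming $\mathbf{P}_N = \mathbf{I}_N - \mathbf{1}_N\mathbf{1}_N'/N$ as implied by the decomposition of $\mathbf{I}_N$ above; the function name `centering_matrix` and the four auxiliary values are hypothetical and serve only as illustration.

```python
import numpy as np


def centering_matrix(N: int) -> np.ndarray:
    """Return P_N = I_N - (1/N) 1_N 1_N' (assumed form of P_N)."""
    return np.eye(N) - np.ones((N, N)) / N


# Hypothetical auxiliary values for a population of N = 4 units.
y1 = np.array([3.0, 5.0, 8.0, 12.0])
mu1 = y1.mean()                            # known auxiliary mean

# Reparameterized auxiliary variable Y^(1)* = P_N Y^(1).
y1_star = centering_matrix(y1.size) @ y1

print(y1_star)        # deviations of each unit from the known mean
print(y1_star.sum())  # the centered values sum to (numerically) zero
```

Because the centered variable has mean zero by construction, its expected value drops out of the unbiasedness constraint, which is what the reparameterization relies on.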

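This transformation can be represented jointly in terms of both $\mathbf{Y}^{(0)}$ and $\mathbf{Y}^{(1)}$,

$$\boldsymbol{\Pi}\begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(1)}\end{pmatrix} = \begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(1)*}\end{pmatrix} + \mu^{(1)}\begin{pmatrix}\mathbf{0}\\ \mathbf{1}_N\end{pmatrix},$$

where

$$\boldsymbol{\Pi} = \begin{pmatrix}\mathbf{I}_N & \mathbf{0}\\ \mathbf{0} & \mathbf{P}_N\end{pmatrix} + \begin{pmatrix}\mathbf{0} & \mathbf{0}\\ \mathbf{0} & \frac{1}{N}\mathbf{1}_N\mathbf{1}_N'\end{pmatrix}.$$

Therefore, Model (5.1) becomes,

$$\begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(1)*}\end{pmatrix} + \mu^{(1)}\begin{pmatrix}\mathbf{0}\\ \mathbf{1}_N\end{pmatrix} = \begin{pmatrix}\mu^{(0)}\mathbf{1}_N\\ \mu^{(1)}\mathbf{1}_N\end{pmatrix} + \mathbf{E},$$

or simply

$$\begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(1)*}\end{pmatrix} = (\mathbf{I}_2\otimes\mathbf{1}_N)\,\boldsymbol{\mu}^* + \mathbf{E}, \tag{5.2}$$

where $\boldsymbol{\mu}^* = \begin{pmatrix}\mu^{(0)}\\ 0\end{pmatrix}$. As shown in A2.1.15,

$$\operatorname{cov}\begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(1)*}\end{pmatrix} = \boldsymbol{\Sigma}\otimes\mathbf{P}_N, \quad\text{where}\quad \boldsymbol{\Sigma} = \begin{pmatrix}\sigma_0^2 & \sigma_{01}\\ \sigma_{10} & \sigma_1^2\end{pmatrix}.$$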

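The next step is to partition $\left(\mathbf{Y}^{(0)\prime}\ \ \mathbf{Y}^{(1)*\prime}\right)'$ into two parts that represent the sample and the remaining part,

$$\mathbf{Z}^* = \mathbf{K}^*\begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(1)*}\end{pmatrix} = \begin{pmatrix}\mathbf{Z}_I\\ \mathbf{Z}_{II}\end{pmatrix},$$

where $\mathbf{K}^*$ is a permutation matrix as given by (2.9) in Chapter 2, $\mathbf{Z}_I = \left(\mathbf{Y}_I^{(0)\prime}\ \ \mathbf{Y}_I^{(1)*\prime}\right)'$ is the sample and $\mathbf{Z}_{II} = \left(\mathbf{Y}_{II}^{(0)\prime}\ \ \mathbf{Y}_{II}^{(1)*\prime}\right)'$ is the remaining part. It immediately follows that

$$E(\mathbf{Z}_I) = (\mathbf{I}_2\otimes\mathbf{1}_n)\begin{pmatrix}\mu^{(0)}\\ 0\end{pmatrix}, \qquad E(\mathbf{Z}_{II}) = (\mathbf{I}_2\otimes\mathbf{1}_{N-n})\begin{pmatrix}\mu^{(0)}\\ 0\end{pmatrix},$$

$$\mathbf{V}_I = \boldsymbol{\Sigma}\otimes\mathbf{P}_{n,N} \quad\text{and}\quad \mathbf{V}_{I,II} = \mathbf{V}_{II,I}' = \boldsymbol{\Sigma}\otimes\left(-\frac{1}{N}\mathbf{J}_{n\times(N-n)}\right).$$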

5.3. Estimating $T^{(0)}$ and $\mu^{(0)}$ with a known auxiliary population total

In this Section, we derive an estimator of the population total (and mean) of the response variable using the method outlined in Section 3.5 of Chapter 3. First, we briefly reiterate the research question and then derive the estimator.

5.3.1. Parameters of interest and their linear estimators

The parameter of interest is the population total of the response variable,

$$T^{(0)} = \mathbf{1}_N'\mathbf{Y}^{(0)} = \boldsymbol{\ell}'\mathbf{Z}^*, \tag{5.3}$$

where $\boldsymbol{\ell} = \left(\mathbf{1}_N'\ \ \mathbf{0}_{1\times N}\right)'$. We are interested in deriving an estimator for $T^{(0)}$ that is a linear function of the sample data, is unbiased for $T^{(0)}$, and has minimum mean squared error. After partitioning, $T^{(0)}$ can be represented as a linear combination of the sampled and remaining parts,

$$T^{(0)} = \boldsymbol{\ell}_I'\mathbf{Z}_I + \boldsymbol{\ell}_{II}'\mathbf{Z}_{II}, \tag{5.4}$$

where $\boldsymbol{\ell}_I = (1\ \ 0)'\otimes\mathbf{1}_n$ and $\boldsymbol{\ell}_{II} = (1\ \ 0)'\otimes\mathbf{1}_{N-n}$.

The linear estimator of $T^{(0)}$ is defined as a linear function of $\mathbf{Z}_I$,

$$\hat{T}^{(0)} = \left(\boldsymbol{\ell}_I' + \mathbf{w}'\right)\mathbf{Z}_I, \tag{5.5}$$

where $\mathbf{w} = \left(\mathbf{w}_0'\ \ \mathbf{w}_1'\right)'$ is a $2n\times 1$ vector.

5.3.2. Unbiasedness constraint and optimization criterion

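As shown in Section 3.5, unbiasedness for $T^{(0)}$ implies constraint (3.24), namely,

$$\mathbf{w}_0'\mathbf{1}_n - (N-n) = 0. \tag{5.6}$$

Under this constraint, the mean squared error of $\hat{T}^{(0)}$ is the same as its variance,

$$\operatorname{var}\left(\hat{T}^{(0)}\right) = \left(\mathbf{w}'\ \ -\boldsymbol{\ell}_{II}'\right)\begin{pmatrix}\mathbf{V}_I & \mathbf{V}_{I,II}\\ \mathbf{V}_{II,I} & \mathbf{V}_{II}\end{pmatrix}\begin{pmatrix}\mathbf{w}\\ -\boldsymbol{\ell}_{II}\end{pmatrix}. \tag{5.7}$$

To minimize (3.25) subject to (3.24), we define the optimization function as

$$\Phi(\mathbf{w}) = \left(\mathbf{w}_0'\ \ \mathbf{w}_1'\right)\mathbf{V}_I\begin{pmatrix}\mathbf{w}_0\\ \mathbf{w}_1\end{pmatrix} - 2\left(\mathbf{w}_0'\ \ \mathbf{w}_1'\right)\mathbf{V}_{I,II}\,\boldsymbol{\ell}_{II} + 2\lambda\left(\mathbf{w}_0'\mathbf{1}_n - (N-n)\right), \tag{5.8}$$

where $\lambda$ is a Lagrangian multiplier, and $\mathbf{V}_I$ is the expected variance-covariance matrix of the sample. Differentiating (3.26) and setting the derivatives to zero leads to the following estimating equations,

$$\begin{pmatrix}\boldsymbol{\Sigma}\otimes\mathbf{P}_{n,N} & \begin{pmatrix}1\\0\end{pmatrix}\otimes\mathbf{1}_n\\ (1\ \ 0)\otimes\mathbf{1}_n' & 0\end{pmatrix}\begin{pmatrix}\hat{\mathbf{w}}_0\\ \hat{\mathbf{w}}_1\\ \hat{\lambda}\end{pmatrix} = \begin{pmatrix}-(1-f)\,\boldsymbol{\Sigma}\begin{pmatrix}1\\0\end{pmatrix}\otimes\mathbf{1}_n\\ N(1-f)\end{pmatrix},$$

where $\boldsymbol{\Sigma}$ is given in Section 2.2 and $f = n/N$. This leads to the unique solution,

$$\begin{pmatrix}\hat{\mathbf{w}}_0\\ \hat{\mathbf{w}}_1\end{pmatrix} = -\begin{pmatrix}\mathbf{1}_n\\ \mathbf{0}_n\end{pmatrix} + \frac{N}{n}\begin{pmatrix}\mathbf{1}_n\\ -\beta_{01}\mathbf{1}_n\end{pmatrix},$$

where $\beta_{01} = \sigma_{01}/\sigma_1^2$ is the linear regression coefficient of $y^{(0)}$ on $y^{(1)}$ in the finite population (Cochran 1977). Consequently, the estimator for $T^{(0)}$ is,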

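$$\begin{aligned}\hat{T}^{(0)} &= \left(\boldsymbol{\ell}_I' + \left(\hat{\mathbf{w}}_0'\ \ \hat{\mathbf{w}}_1'\right)\right)\begin{pmatrix}\mathbf{Y}_I^{(0)}\\ \mathbf{Y}_I^{(1)*}\end{pmatrix}\\ &= \frac{N}{n}\left(\mathbf{1}_n'\mathbf{Y}_I^{(0)} - \beta_{01}\mathbf{1}_n'\mathbf{Y}_I^{(1)*}\right)\\ &= n\bar{Y}^{(0)} + (N-n)\left\{\bar{Y}^{(0)} - \beta_{01}\left(\bar{Y}^{(1)} - \mu^{(1)}\right)\right\} - n\beta_{01}\left(\bar{Y}^{(1)} - \mu^{(1)}\right),\end{aligned} \tag{5.9}$$

where $\bar{Y}^{(0)} = \frac{1}{n}\mathbf{1}_n'\mathbf{Y}_I^{(0)}$ and $\bar{Y}^{(1)} = \frac{1}{n}\mathbf{1}_n'\mathbf{Y}_I^{(1)}$ are the sample means of the response and auxiliary variables, respectively. Let us write (5.9) alternatively as,

$$\hat{T}^{(0)} = N\beta_{01}\mu^{(1)} + N\left(\bar{Y}^{(0)} - \beta_{01}\bar{Y}^{(1)}\right). \tag{5.10}$$

Identity (5.10) implies that, in estimating $T^{(0)}$, we would be estimating $T^{(0)}$ as a constant $N\beta_{01}\mu^{(1)}$ plus the finite population estimating function

$$N\left(\mu^{(0)} - \beta_{01}\mu^{(1)}\right),$$

for which the best sample-based estimating function is

$$N\left(\bar{Y}^{(0)} - \beta_{01}\bar{Y}^{(1)}\right).$$

The variance of $\hat{T}^{(0)}$ is

$$\operatorname{var}\left(\hat{T}^{(0)}\right) = N^2\,\frac{1-f}{n}\,\sigma_0^2\left(1-\rho_{01}^2\right), \tag{5.11}$$

where $\rho_{01}^2 = \sigma_{01}^2/\left(\sigma_0^2\sigma_1^2\right)$ is the squared population correlation coefficient between the response and auxiliary variable.

Similarly, the estimator of the mean for the response variable and its variance are

$$\hat{\mu}^{(0)} = \bar{Y}^{(0)} - \beta_{01}\left(\bar{Y}^{(1)} - \mu^{(1)}\right) \tag{5.12}$$

and

$$\operatorname{var}\left(\hat{\mu}^{(0)}\right) = \frac{1-f}{n}\,\sigma_0^2\left(1-\rho_{01}^2\right), \tag{5.13}$$

respectively.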

Derivations of the above results are given in Section 5.7.
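To make the estimator (5.10) and its variance (5.11) concrete, the following is a minimal numerical sketch. It assumes that the population variances and covariance are known and computed with divisor $N-1$ (an assumption about the convention behind $\boldsymbol{\Sigma}$); the function name `regression_total` and the six-unit population are hypothetical.

```python
import numpy as np


def regression_total(y0_s, y1_s, N, mu1, sigma0_sq, sigma01, sigma1_sq):
    """Estimator (5.10) of the response total and its variance (5.11).

    y0_s, y1_s : sampled response and auxiliary values
    N, mu1     : population size and known auxiliary mean
    sigma0_sq, sigma01, sigma1_sq : known population (co)variances
    """
    y0_s = np.asarray(y0_s, dtype=float)
    y1_s = np.asarray(y1_s, dtype=float)
    n = y0_s.size
    f = n / N                                   # sampling fraction
    beta01 = sigma01 / sigma1_sq                # regression coefficient
    rho_sq = sigma01 ** 2 / (sigma0_sq * sigma1_sq)
    total = N * (y0_s.mean() - beta01 * (y1_s.mean() - mu1))
    variance = N ** 2 * (1 - f) / n * sigma0_sq * (1 - rho_sq)
    return total, variance


# Hypothetical population in which the response is exactly linear in the
# auxiliary variable, so the estimator should reproduce the true total.
y1_pop = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y0_pop = 2.0 * y1_pop + 1.0
cov = np.cov(y0_pop, y1_pop, ddof=1)           # assumed divisor N - 1

sample = [0, 2, 5]                              # indices of sampled units
total_hat, var_hat = regression_total(
    y0_pop[sample], y1_pop[sample], N=6, mu1=y1_pop.mean(),
    sigma0_sq=cov[0, 0], sigma01=cov[0, 1], sigma1_sq=cov[1, 1],
)
print(total_hat, y0_pop.sum(), var_hat)
```

In this constructed example $\rho_{01}^2 = 1$, so the variance collapses to zero and any sample returns the true total; with a weaker linear relation the variance shrinks by the factor $1-\rho_{01}^2$ relative to the simple expansion estimator.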

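5.3.3. Special case: $\mathbf{w} = \left(\mathbf{w}_0'\ \ \mathbf{0}'\right)'$

A special case is when $\mathbf{w}_1$ is forced to be a null vector, such that $\mathbf{w} = \left(\mathbf{w}_0'\ \ \mathbf{0}'\right)'$. Using a derivation similar to the one above, it can be shown that $\hat{\mathbf{w}}_0 = \frac{N-n}{n}\mathbf{1}_n$, which is the weight of the simple expansion estimator. Therefore, the estimator for $T^{(0)}$ when $\mathbf{w} = \left(\mathbf{w}_0'\ \ \mathbf{0}'\right)'$ is $\hat{T}^{(0)} = N\bar{Y}^{(0)}$, and its variance is $\operatorname{var}\left(\hat{T}^{(0)}\right) = N^2\left(\frac{1-f}{n}\right)\sigma_0^2$.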

5.3.4. Remarks

Unlike the usual calibration technique of incorporating auxiliary information,

this approach leads to a set of weights that do not depend on the sample, and yields a linear

unbiased minimum variance estimator of $T^{(0)}$. Estimator (5.10) is seen to be a

generalized regression estimator (GREG); this has been shown to be both design- and

model-unbiased (Särndal, Swensson and Wretman 1989; Särndal, Swensson and


Wretman 1992; Thompson 1997). That is, we have derived under a design-based

framework an estimator that is exactly the same as a GREG estimator. This estimator

depends on neither a superpopulation model nor an assumption of a regression

model relating the response and auxiliary variables.

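5.4. Estimating $T^{(0)}$ and $\mu^{(0)}$ with several known auxiliary totals

In this section, we extend the results in Section 5.3 to scenarios with multiple auxiliary variables. Similar to Model (5.1), we define a random permutation seemingly unrelated regression model that includes $p$ auxiliary variables,

$$\begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(\bullet)}\end{pmatrix} = (\mathbf{I}_{p+1}\otimes\mathbf{1}_N)\begin{pmatrix}\mu^{(0)}\\ \boldsymbol{\mu}^{(\bullet)}\end{pmatrix} + \mathbf{E}, \tag{5.14}$$

where $\mathbf{Y}^{(\bullet)} = \left(\mathbf{Y}^{(1)\prime}\ \ \mathbf{Y}^{(2)\prime}\ \cdots\ \mathbf{Y}^{(p)\prime}\right)'$ is the vector of all $p$ auxiliary variables and $\boldsymbol{\mu}^{(\bullet)} = \left(\mu^{(1)}\ \ \mu^{(2)}\ \cdots\ \mu^{(p)}\right)'$ is the vector of auxiliary means.

Since the vector of auxiliary totals $\mathbf{T}^{(\bullet)} = \left(T^{(1)}\ \ T^{(2)}\ \cdots\ T^{(p)}\right)'$ (and thus $\boldsymbol{\mu}^{(\bullet)}$) is known, the $N$ random variables arising from permuting the $k$-th auxiliary variable have a constraint $\mathbf{1}_N'\mathbf{Y}^{(k)} = T^{(k)}$, where $k = 1, 2, \ldots, p$. We transform $\mathbf{Y}^{(\bullet)}$ by pre-multiplying it by $\mathbf{I}_p\otimes\left(\frac{\mathbf{1}_N\mathbf{1}_N'}{N} + \mathbf{P}_N\right)$, such that,

$$\mathbf{Y}^{(\bullet)} = \left(\mathbf{I}_p\otimes\frac{\mathbf{1}_N}{N}\mathbf{1}_N'\right)\mathbf{Y}^{(\bullet)} + \left(\mathbf{I}_p\otimes\mathbf{P}_N\right)\mathbf{Y}^{(\bullet)}. \tag{5.15}$$

The first component of (5.15) is a vector of constants equal to $\left(\mathbf{I}_p\otimes\frac{\mathbf{1}_N}{N}\right)\mathbf{T}^{(\bullet)}$. We denote the second component of (5.15) as

$$\mathbf{Y}^{(\bullet)*} = \left(\mathbf{I}_p\otimes\mathbf{P}_N\right)\mathbf{Y}^{(\bullet)} = \mathbf{Y}^{(\bullet)} - \left(\mathbf{I}_p\otimes\mathbf{1}_N\right)\boldsymbol{\mu}^{(\bullet)}.$$

This transformation can be represented jointly in terms of both $\mathbf{Y}^{(0)}$ and $\mathbf{Y}^{(\bullet)}$,

$$\left[\begin{pmatrix}\mathbf{I}_N & \mathbf{0}\\ \mathbf{0} & \mathbf{I}_p\otimes\mathbf{P}_N\end{pmatrix} + \begin{pmatrix}\mathbf{0} & \mathbf{0}\\ \mathbf{0} & \mathbf{I}_p\otimes\frac{\mathbf{1}_N}{N}\mathbf{1}_N'\end{pmatrix}\right]\begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(\bullet)}\end{pmatrix} = \begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(\bullet)*}\end{pmatrix} + \begin{pmatrix}\mathbf{0}\\ \left(\mathbf{I}_p\otimes\frac{\mathbf{1}_N}{N}\right)\mathbf{T}^{(\bullet)}\end{pmatrix}.$$

After this transformation, Model (5.14) becomes,

$$\begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(\bullet)*}\end{pmatrix} = (\mathbf{I}_{p+1}\otimes\mathbf{1}_N)\begin{pmatrix}\mu^{(0)}\\ \mathbf{0}\end{pmatrix} + \mathbf{E}. \tag{5.16}$$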

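As shown in Chapter 2, Equation A2.1.15,

$$\operatorname{cov}\begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(\bullet)}\end{pmatrix} = \boldsymbol{\Sigma}\otimes\mathbf{P}_N, \quad\text{where}\quad \boldsymbol{\Sigma} = \begin{pmatrix}\sigma_0^2 & \sigma_{01} & \cdots & \sigma_{0p}\\ \sigma_{10} & \sigma_1^2 & \cdots & \sigma_{1p}\\ \vdots & \vdots & \ddots & \vdots\\ \sigma_{p0} & \sigma_{p1} & \cdots & \sigma_p^2\end{pmatrix} = \begin{pmatrix}\sigma_0^2 & \boldsymbol{\sigma}_{\bullet 0}'\\ \boldsymbol{\sigma}_{\bullet 0} & \boldsymbol{\Sigma}_\bullet\end{pmatrix}.$$

After partitioning into sample and remaining parts, $\mathbf{Z} = \left(\mathbf{Z}_I'\ \ \mathbf{Z}_{II}'\right)'$, where $\mathbf{Z}_I = \left(\mathbf{Y}_I^{(0)\prime}\ \ \mathbf{Y}_I^{(\bullet)*\prime}\right)'$ is the sample and $\mathbf{Z}_{II} = \left(\mathbf{Y}_{II}^{(0)\prime}\ \ \mathbf{Y}_{II}^{(\bullet)*\prime}\right)'$ is the remaining part. It immediately follows that

$$E(\mathbf{Z}_I) = (\mathbf{I}_{p+1}\otimes\mathbf{1}_n)\begin{pmatrix}\mu^{(0)}\\ \mathbf{0}\end{pmatrix}, \qquad E(\mathbf{Z}_{II}) = (\mathbf{I}_{p+1}\otimes\mathbf{1}_{N-n})\begin{pmatrix}\mu^{(0)}\\ \mathbf{0}\end{pmatrix},$$

$$\mathbf{V}_I = \begin{pmatrix}\sigma_0^2 & \boldsymbol{\sigma}_{\bullet 0}'\\ \boldsymbol{\sigma}_{\bullet 0} & \boldsymbol{\Sigma}_\bullet\end{pmatrix}\otimes\mathbf{P}_{n,N} \quad\text{and}\quad \mathbf{V}_{I,II} = \mathbf{V}_{II,I}' = \begin{pmatrix}\sigma_0^2 & \boldsymbol{\sigma}_{\bullet 0}'\\ \boldsymbol{\sigma}_{\bullet 0} & \boldsymbol{\Sigma}_\bullet\end{pmatrix}\otimes\left(-\frac{1}{N}\mathbf{J}_{n\times(N-n)}\right).$$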

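The population total of the response variable, the population parameter of our primary interest, can be defined as,

$$T^{(0)} = \mathbf{1}_N'\mathbf{Y}^{(0)} = \boldsymbol{\ell}'\begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(\bullet)}\end{pmatrix} = \boldsymbol{\ell}'\begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(\bullet)*}\end{pmatrix}, \tag{5.17}$$

where $\boldsymbol{\ell} = \left(\mathbf{1}_N'\ \ \left(\mathbf{1}_p\otimes\mathbf{0}_{N\times 1}\right)'\right)'$. After partitioning, $T^{(0)}$ can be represented as a linear combination of the sampled and remaining parts,

$$T^{(0)} = \boldsymbol{\ell}_I'\mathbf{Z}_I + \boldsymbol{\ell}_{II}'\mathbf{Z}_{II},$$

where $\boldsymbol{\ell}_I = \left(1\ \ \mathbf{0}_{1\times p}\right)'\otimes\mathbf{1}_n$ and $\boldsymbol{\ell}_{II} = \left(1\ \ \mathbf{0}_{1\times p}\right)'\otimes\mathbf{1}_{N-n}$.

We derive an estimator for $T^{(0)}$ that is a linear function of $\mathbf{Z}_I$, is unbiased for $T^{(0)}$ and has minimum variance. The linear estimator of $T^{(0)}$ is defined as a linear function of the sample,

$$\hat{T}^{(0)} = \left(\boldsymbol{\ell}_I' + \mathbf{w}'\right)\mathbf{Z}_I, \tag{5.18}$$

where $\mathbf{w} = \left(\mathbf{w}_0'\ \ \mathbf{w}_\bullet'\right)'$ is an $n(p+1)\times 1$ vector of weights, and $\mathbf{w}_\bullet = \left(\mathbf{w}_1'\ \ \mathbf{w}_2'\ \cdots\ \mathbf{w}_p'\right)'$.

Unbiasedness for $T^{(0)}$ requires that $E\left(\mathbf{w}'\mathbf{Z}_I - \boldsymbol{\ell}_{II}'\mathbf{Z}_{II}\right) = 0$. Since $E(\mathbf{Z}_I) = (\mathbf{I}_{p+1}\otimes\mathbf{1}_n)\left(\mu^{(0)}\ \ \mathbf{0}'\right)'$ and $E(\mathbf{Z}_{II}) = (\mathbf{I}_{p+1}\otimes\mathbf{1}_{N-n})\left(\mu^{(0)}\ \ \mathbf{0}'\right)'$, this constraint is equivalent to

$$\left(\mathbf{1}_n'\mathbf{w}_0 - (N-n)\right)\mu^{(0)} + \sum_{k=1}^{p}\mathbf{1}_n'\mathbf{w}_k\,\mu^{(k)*} = 0. \tag{5.19}$$

Since $\mu^{(k)*} = 0$ for any $k = 1, 2, \ldots, p$, (5.19) is satisfied for any $\mu^{(0)}$ simply by

$$\mathbf{1}_n'\mathbf{w}_0 - (N-n) = 0. \tag{5.20}$$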

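Similar to (3.25), the variance of $\hat{T}^{(0)}$ can be represented in terms of partitioned matrices that include multiple auxiliary totals,

$$\operatorname{var}\left(\hat{T}^{(0)}\right) = \mathbf{w}'\mathbf{V}_I\mathbf{w} - 2\,\mathbf{w}'\mathbf{V}_{I,II}\,\boldsymbol{\ell}_{II} + \boldsymbol{\ell}_{II}'\mathbf{V}_{II}\,\boldsymbol{\ell}_{II},$$

where $\mathbf{w} = \left(\mathbf{w}_0'\ \ \mathbf{w}_1'\ \cdots\ \mathbf{w}_p'\right)'$ is an $n(p+1)\times 1$ vector and $\boldsymbol{\ell}_{II} = \left(1\ \ \mathbf{0}_{1\times p}\right)'\otimes\mathbf{1}_{N-n}$. Under unbiasedness constraint (5.20), the Lagrangian function is defined as

$$\Phi(\mathbf{w}) = \mathbf{w}'\mathbf{V}_I\mathbf{w} - 2\,\mathbf{w}'\mathbf{V}_{I,II}\,\boldsymbol{\ell}_{II} + 2\left\{\mathbf{w}_0'\mathbf{1}_n - (N-n)\right\}\lambda, \tag{5.21}$$

where $\lambda$ is a Lagrangian multiplier. Differentiating (5.21) with respect to $\mathbf{w}$ and $\lambda$ results in the following estimating equations,

$$\begin{pmatrix}\mathbf{V}_I & \mathbf{u}_1\otimes\mathbf{1}_n\\ \mathbf{u}_1'\otimes\mathbf{1}_n' & 0\end{pmatrix}\begin{pmatrix}\mathbf{w}\\ \lambda\end{pmatrix} = \begin{pmatrix}\mathbf{V}_{I,II}\left(\mathbf{u}_1\otimes\mathbf{1}_{N-n}\right)\\ N-n\end{pmatrix}, \tag{5.22}$$

where $\mathbf{u}_1 = \left(1\ \ \mathbf{0}_{1\times p}\right)'$ is the first column of the identity matrix $\mathbf{I}_{p+1}$. The unique solution to (5.22) is

$$\begin{aligned}\hat{\mathbf{w}} &= -\mathbf{u}_1\otimes\mathbf{1}_n + \frac{N}{n}\,\boldsymbol{\Sigma}^{-1}\mathbf{u}_1\left(\mathbf{u}_1'\boldsymbol{\Sigma}^{-1}\mathbf{u}_1\right)^{-1}\otimes\mathbf{1}_n\\ &= -\mathbf{u}_1\otimes\mathbf{1}_n + \frac{N}{n}\begin{pmatrix}1\\ -\boldsymbol{\beta}_{\bullet 0}\end{pmatrix}\otimes\mathbf{1}_n,\end{aligned} \tag{5.23}$$

where $\boldsymbol{\beta}_{\bullet 0} = \boldsymbol{\Sigma}_\bullet^{-1}\boldsymbol{\sigma}_{\bullet 0}$. Consequently,

$$\hat{T}^{(0)} = N\bar{Y}^{(0)} - N\boldsymbol{\beta}_{\bullet 0}'\left(\bar{\mathbf{Y}}^{(\bullet)} - \boldsymbol{\mu}^{(\bullet)}\right),$$

$$\operatorname{var}\left(\hat{T}^{(0)}\right) = N^2\left(1-\rho_{0\bullet}^2\right)\left(\frac{1-f}{n}\right)\sigma_0^2,$$

where $\bar{\mathbf{Y}}^{(\bullet)}$ is a $p\times 1$ column vector of sample means of the $p$ auxiliary variables, $\boldsymbol{\mu}^{(\bullet)}$ is the column vector of known auxiliary means, and $\rho_{0\bullet}^2 = \sigma_0^{-2}\,\boldsymbol{\sigma}_{\bullet 0}'\boldsymbol{\Sigma}_\bullet^{-1}\boldsymbol{\sigma}_{\bullet 0}$ is the squared multiple correlation coefficient of $\mathbf{Y}^{(0)}$ on $\mathbf{Y}^{(\bullet)*}$. This expression is similar to the expressions commonly seen in multiple linear regression models (Graybill 1976), except for its finite population correction factor. It is straightforward to show that

$$\hat{\mu}^{(0)} = \bar{Y}^{(0)} - \boldsymbol{\beta}_{\bullet 0}'\left(\bar{\mathbf{Y}}^{(\bullet)} - \boldsymbol{\mu}^{(\bullet)}\right), \tag{5.24}$$

$$\operatorname{var}\left(\hat{\mu}^{(0)}\right) = \left(1-\rho_{0\bullet}^2\right)\left(\frac{1-f}{n}\right)\sigma_0^2. \tag{5.25}$$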

Detailed derivations of the above results are summarized in Section 5.8.
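The multi-auxiliary estimator of the total can be sketched in the same way. This is an illustration only: it assumes $\boldsymbol{\Sigma}_\bullet$ and $\boldsymbol{\sigma}_{\bullet 0}$ are known and computed with divisor $N-1$, and the function name `regression_total_multi` and the small population are hypothetical.

```python
import numpy as np


def regression_total_multi(y0_s, X_s, N, mu_aux, Sigma_aux, sigma_aux0):
    """Total estimator N*Ybar0 - N*beta'(Xbar - mu) with beta = Sigma_aux^{-1} sigma_aux0.

    y0_s       : sampled responses, shape (n,)
    X_s        : sampled auxiliary values, shape (n, p)
    mu_aux     : known auxiliary means, shape (p,)
    Sigma_aux  : covariance matrix of the auxiliary variables, shape (p, p)
    sigma_aux0 : covariances between response and auxiliaries, shape (p,)
    """
    y0_s = np.asarray(y0_s, dtype=float)
    X_s = np.asarray(X_s, dtype=float)
    beta = np.linalg.solve(Sigma_aux, sigma_aux0)   # avoids an explicit inverse
    return N * (y0_s.mean() - beta @ (X_s.mean(axis=0) - mu_aux))


# Hypothetical population of N = 6 units with p = 2 auxiliary variables.
X_pop = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 8]], dtype=float)
y0_pop = 1.0 + 2.0 * X_pop[:, 0] - 3.0 * X_pop[:, 1]   # exactly linear response

# Joint covariance matrix; row/column 0 is the response (assumed divisor N - 1).
C = np.cov(np.column_stack([y0_pop, X_pop]), rowvar=False, ddof=1)

sample = [1, 3, 4]
total_hat = regression_total_multi(
    y0_pop[sample], X_pop[sample], N=6,
    mu_aux=X_pop.mean(axis=0), Sigma_aux=C[1:, 1:], sigma_aux0=C[1:, 0],
)
print(total_hat, y0_pop.sum())
```

Since the response in this toy population is an exact linear function of the auxiliary variables, the squared multiple correlation equals one and the estimate coincides with the true total for every sample.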

5.5. Estimation of variance and confidence intervals

The estimators derived in this Chapter depend on knowledge of the covariance vector $\boldsymbol{\sigma}_{\bullet 0}$ between the response variable and the auxiliary variables, and of the variance-covariance matrix $\boldsymbol{\Sigma}_\bullet$ of the auxiliary variables. These quantities are often unknown to surveyors in practice.

1) When auxiliary values for all units in the population are known, $\boldsymbol{\Sigma}_\bullet$ is known, and we only need to approximate $\boldsymbol{\sigma}_{\bullet 0}$.

2) When individual auxiliary values are known only for the sampled units, we may need to approximate both $\boldsymbol{\Sigma}_\bullet$ and $\boldsymbol{\sigma}_{\bullet 0}$.

The problem of associating standard errors and confidence intervals with the estimators can be addressed in several ways. One commonly used approach is to replace $\boldsymbol{\sigma}_{\bullet 0}$ and $\boldsymbol{\Sigma}_\bullet$ with their sample estimates, which is referred to as the method of moments in the literature.


5.5.1. Normal approximation

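To compute the variance of the estimators (5.10) and (5.24), one needs to know $\sigma_0^2$, $\sigma_{01}$ (or $\boldsymbol{\sigma}_{\bullet 0}$), and $\sigma_1^2$ (or $\boldsymbol{\Sigma}_\bullet$). Using the method of moments, one may simply use their sample estimates, i.e.,

$$\hat{\sigma}_0^2 = \frac{1}{n-1}\mathbf{Y}_I^{(0)\prime}\left(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n\right)\mathbf{Y}_I^{(0)},$$

$$\hat{\sigma}_{01} = \frac{1}{n-1}\mathbf{Y}_I^{(0)\prime}\left(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n\right)\mathbf{Y}_I^{(1)},$$

$$\hat{\sigma}_1^2 = \frac{1}{n-1}\mathbf{Y}_I^{(1)\prime}\left(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n\right)\mathbf{Y}_I^{(1)}.$$

However, if $\sigma_1^2$ (or $\boldsymbol{\Sigma}_\bullet$) is provided or auxiliary values are known for all subjects, one only needs to use plug-in estimates for $\sigma_0^2$ and $\sigma_{01}$ (or $\boldsymbol{\sigma}_{\bullet 0}$). For example, using method of moments estimates for $\sigma_0^2$ and $\sigma_{01}$, (5.10) leads to

$$\hat{T}^{(0)} = N\left(\bar{Y}^{(0)} + \hat{\beta}_{01}\left(\mu^{(1)} - \bar{Y}^{(1)}\right)\right),$$

where $\hat{\beta}_{01} = \hat{\sigma}_{01}/\hat{\sigma}_1^2$ if $\sigma_1^2$ is unknown or $\hat{\beta}_{01} = \hat{\sigma}_{01}/\sigma_1^2$ if $\sigma_1^2$ is known. The estimated variance of $\hat{T}^{(0)}$ is,

$$\widehat{\operatorname{var}}\left(\hat{T}^{(0)}\right) = N^2\,\frac{1-f}{n}\,\hat{\sigma}_0^2\left(1-\hat{\rho}_{01}^2\right),$$

where $\hat{\rho}_{01}^2 = \hat{\sigma}_{01}^2/\left(\hat{\sigma}_0^2\hat{\sigma}_1^2\right)$ if $\sigma_1^2$ is unknown or $\hat{\rho}_{01}^2 = \hat{\sigma}_{01}^2/\left(\hat{\sigma}_0^2\sigma_1^2\right)$ if $\sigma_1^2$ is known. An ad hoc confidence interval for $T^{(0)}$, based on an appropriate normal-based prediction interval, would be

$$N\bar{Y}^{(0)} \pm z_{1-\alpha}\sqrt{\widehat{\operatorname{var}}\left(\hat{T}^{(0)}\right)}.$$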
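The plug-in calculation of this subsection can be sketched as follows. Several choices are assumptions of the sketch rather than statements from the text: the interval is centered on the plug-in regression estimate, it uses the two-sided quantile $z_{1-\alpha/2}$, and $\hat{\rho}_{01}^2$ is clipped at one so that the estimated variance cannot become negative when a known $\sigma_1^2$ is combined with sample estimates. The function name `plug_in_interval` is hypothetical.

```python
import numpy as np
from statistics import NormalDist


def plug_in_interval(y0_s, y1_s, N, mu1, alpha=0.05, sigma1_sq=None):
    """Method-of-moments regression estimate, variance and normal interval."""
    y0_s = np.asarray(y0_s, dtype=float)
    y1_s = np.asarray(y1_s, dtype=float)
    n = y0_s.size
    f = n / N
    s0_sq = np.var(y0_s, ddof=1)                       # sigma_0^2 estimate
    s01 = np.cov(y0_s, y1_s, ddof=1)[0, 1]             # sigma_01 estimate
    # Use the known sigma_1^2 when it is supplied, else its sample estimate.
    s1_sq = np.var(y1_s, ddof=1) if sigma1_sq is None else sigma1_sq
    beta = s01 / s1_sq
    rho_sq = min(s01 ** 2 / (s0_sq * s1_sq), 1.0)      # clipped (assumption)
    total = N * (y0_s.mean() + beta * (mu1 - y1_s.mean()))
    var = N ** 2 * (1 - f) / n * s0_sq * (1 - rho_sq)
    z = NormalDist().inv_cdf(1 - alpha / 2)            # two-sided (assumption)
    half = z * np.sqrt(var)
    return total, var, (total - half, total + half)


# Hypothetical sample of n = 3 from a population of N = 6 with mu1 = 3.5.
total_hat, var_hat, (lo, hi) = plug_in_interval(
    [3.0, 7.0, 13.0], [1.0, 3.0, 6.0], N=6, mu1=3.5
)
print(total_hat, var_hat, lo, hi)
```

With very small samples the normal quantile is only a rough guide, which is one reason the jackknife and bootstrap alternatives in the following subsections are of interest.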


5.5.2. Jackknife variance

Another widely recommended variance estimation procedure is Jackknife variance estimation (Royall and Cumberland 1978; Wolter 1985; Särndal, Swensson and Wretman 1992; Stukel, Hidiroglou and Särndal 1996; Duchesne 2000; Valliant 2002). Suppose we are interested in estimating the variance of $\hat{\mu}^{(0)}$ using (5.12); the process of Jackknife variance estimation is as follows:

1) Partition the sample into $m$ random groups of equal size $n/m$, assuming each of the $m$ groups is a SRSWOR subsample from the sample;

2) For each group, compute a pseudo estimator $\hat{\mu}^{(0)}_{(b)}$, $b = 1, 2, \ldots, m$, of $\mu^{(0)}$ with the same functional form as (5.12), based on the remaining units of the sample after omitting the units in the group,

$$\hat{\mu}^{(0)}_{(b)} = \bar{Y}^{(0)}_{(b)} - \beta_{01}\left(\bar{Y}^{(1)}_{(b)} - \mu^{(1)}\right).$$

We then define $\hat{\mu}^{(0)}_b = m\hat{\mu}^{(0)} - (m-1)\hat{\mu}^{(0)}_{(b)}$, where $\hat{\mu}^{(0)}$ is computed with (5.12) based on the full sample.

Then the Jackknife estimator of $\mu^{(0)}$ is

$$\hat{\mu}^{(0)}_{JK} = \frac{1}{m}\sum_{b=1}^{m}\hat{\mu}^{(0)}_b$$

and the Jackknife variance estimator is defined as

$$\hat{V}_{JK} = \frac{1}{m(m-1)}\sum_{b=1}^{m}\left(\hat{\mu}^{(0)}_b - \hat{\mu}^{(0)}_{JK}\right)^2.$$

An appropriate normal-based $(1-\alpha)\times 100\%$ confidence interval for $T^{(0)}$ would be

$$N\hat{\mu}^{(0)} \pm z_{1-\alpha}\sqrt{\hat{V}_{JK}\left(\hat{T}^{(0)}\right)}.$$
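The grouped jackknife steps above can be sketched in code. This is an illustration under stated assumptions: $\beta_{01}$ is treated as known, as in (5.12); the number of groups is denoted `m`; and no finite population correction is applied, in line with the variance formula as written. The function name `jackknife_mean` and the data are hypothetical.

```python
import numpy as np


def jackknife_mean(y0_s, y1_s, mu1, beta01, m, rng):
    """Delete-a-group jackknife for the mean estimator (5.12)."""
    y0_s = np.asarray(y0_s, dtype=float)
    y1_s = np.asarray(y1_s, dtype=float)
    n = y0_s.size
    if m < 2 or n % m != 0:
        raise ValueError("need m >= 2 groups of equal size")

    def estimate(idx):
        # Same functional form as (5.12), applied to the units in idx.
        return y0_s[idx].mean() - beta01 * (y1_s[idx].mean() - mu1)

    full = estimate(np.arange(n))
    groups = np.split(rng.permutation(n), m)        # random equal-size groups
    pseudo = np.array([
        m * full - (m - 1) * estimate(np.setdiff1d(np.arange(n), g))
        for g in groups
    ])
    mu_jk = pseudo.mean()                            # jackknife estimator
    v_jk = np.sum((pseudo - mu_jk) ** 2) / (m * (m - 1))
    return mu_jk, v_jk


# Hypothetical sample of n = 4 units, deleting one unit at a time (m = n).
rng = np.random.default_rng(0)
mu_jk, v_jk = jackknife_mean(
    [3.0, 7.0, 13.0, 5.0], [1.0, 3.0, 6.0, 2.0],
    mu1=3.0, beta01=1.0, m=4, rng=rng,
)
print(mu_jk, v_jk)
```

When $\beta_{01}$ is fixed the estimator is linear in the sample means, so with $m = n$ the pseudo-values reduce to the unit-level quantities $y^{(0)}_i - \beta_{01}(y^{(1)}_i - \mu^{(1)})$ and the jackknife variance is their sample variance divided by $n$.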


5.5.3. Bootstrap variance estimator and confidence intervals

Standard errors and confidence intervals can also be computed using the bootstrap method (Efron 1979; Efron and Tibshirani 1993). To apply the bootstrap method in finite population sampling, modifications must be made (McCarthy and Snowden 1985). Suppose that a sample $\mathbf{Z}_I = \left(\mathbf{Y}_I^{(0)\prime}\ \ \mathbf{Y}_I^{(1)\prime}\right)'$ has been drawn and observed. A bootstrap sample $\mathbf{Z}_I^b = \left(\mathbf{Y}_I^{(0)b\prime}\ \ \mathbf{Y}_I^{(1)b\prime}\right)'$ is obtained by randomly sampling with replacement $n^*$ times from $\mathbf{Z}_I$, where $n^* = (n-1)/(1-f)$ and $f = n/N$ (McCarthy and Snowden 1985). In matrix format, each bootstrap sampling process may be represented as

$$\mathbf{Z}_I^b = \left(\mathbf{I}_2\otimes\mathbf{Q}^b\right)\mathbf{Z}_I,$$

where $\mathbf{Q}^b = \left(\mathbf{u}_{m_1}\ \ \mathbf{u}_{m_2}\ \cdots\ \mathbf{u}_{m_{n^*}}\right)'$, $\mathbf{u}_{m_i}$ is the $m_i$-th column of the identity matrix $\mathbf{I}_n$, $i = 1, 2, \ldots, n^*$ and $1 \le m_i \le n$. The bootstrap process begins by generating a large number ($B$) of independent bootstrap samples $\mathbf{Z}_I^1, \mathbf{Z}_I^2, \ldots, \mathbf{Z}_I^b, \ldots, \mathbf{Z}_I^B$. For each bootstrap replication $\mathbf{Z}_I^b$, $b = 1, 2, \ldots, B$, the estimator $\hat{T}^{(0)b}$ is calculated using (5.10) while replacing $\beta_{01}$, $\bar{Y}^{(0)}$ and $\bar{Y}^{(1)}$ with the estimates from $\mathbf{Z}_I^b$. The bootstrap estimate of the variance of $\hat{T}^{(0)}$ is the variance of the bootstrap replications,

$$\hat{V}_b\left(\hat{T}^{(0)}\right) = \frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{T}^{(0)b} - \bar{T}^{(0)b}\right)^2,$$

where $\bar{T}^{(0)b} = B^{-1}\sum_{b=1}^{B}\hat{T}^{(0)b}$. To construct a $100(1-\alpha)\%$ confidence interval, one may use $\hat{V}_b\left(\hat{T}^{(0)}\right)$ and the normal approximation, such that $\hat{T}^{(0)} \pm z_{1-\alpha}\sqrt{\hat{V}_b\left(\hat{T}^{(0)}\right)}$.
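A minimal sketch of this resampling scheme follows. It is illustrative only: $n^*$ is rounded to the nearest integer (the approximation whose effect the text notes has not been studied), a replicate whose resampled auxiliary values are all identical falls back to $\hat{\beta}_{01} = 0$ (an assumption of this sketch), and the function name `bootstrap_variance` and the data are hypothetical.

```python
import numpy as np


def bootstrap_variance(y0_s, y1_s, N, mu1, n_boot, rng):
    """Bootstrap variance of the regression total with resample size n*."""
    y0_s = np.asarray(y0_s, dtype=float)
    y1_s = np.asarray(y1_s, dtype=float)
    n = y0_s.size
    f = n / N
    # n* = (n - 1) / (1 - f), rounded to an integer (assumption).
    n_star = max(2, int(round((n - 1) / (1 - f))))
    totals = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n_star)      # draw with replacement
        yb0, yb1 = y0_s[idx], y1_s[idx]
        s1_sq = np.var(yb1, ddof=1)
        # Re-estimate beta_01 within the replicate; fall back to 0 when the
        # resampled auxiliary values have no variation (assumption).
        beta_b = np.cov(yb0, yb1, ddof=1)[0, 1] / s1_sq if s1_sq > 0 else 0.0
        totals[b] = N * (yb0.mean() - beta_b * (yb1.mean() - mu1))
    return totals.var(ddof=1), totals


# Hypothetical sample of n = 5 units from a population of N = 10.
rng = np.random.default_rng(0)
v_boot, totals = bootstrap_variance(
    [3.1, 6.8, 13.2, 5.0, 9.1], [1.0, 3.0, 6.0, 2.0, 4.0],
    N=10, mu1=3.5, n_boot=200, rng=rng,
)
print(v_boot)
```

Drawing index vectors with `rng.integers` plays the role of the selection matrix $\mathbf{Q}^b$: each drawn index picks one row of the sample for both the response and the auxiliary variable.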

The method of McCarthy and Snowden (1985) has the weakness that $n^* = (n-1)/(1-f)$ needs to be approximated if it is not an integer; the impact of such an approximation on estimation has not been investigated. Rao and Wu (1988) developed a method for bootstrap estimation of the standard error of a population mean, which overcomes the non-integer $n^*$ issue of the McCarthy-Snowden method.

5.6. Remarks

The estimators presented in Sections 5.3 and 5.4 are identical to the estimator of Cochran (1977, pages 191-192) and to the GREG estimators derived from the model-assisted approach (Särndal, Swensson and Wretman 1992, page 273). In addition, the estimators and their variances derived in Sections 5.3 and 5.4 are the same as the asymptotically design-unbiased estimator and its asymptotic design-variance derived from a model-calibration approach based on a superpopulation framework (Wu and Sitter 2001).

They have a functional form similar to those of calibration estimators (Deville and Särndal 1992) and of estimators derived from the model-based prediction approach (Valliant, Dorfman and Royall 2000, page 32; Bolfarine and Zacks 1992, page 37).

When the variance-covariance matrix of the auxiliary variables, $\boldsymbol{\Sigma}_\bullet$, and the covariance between the response and auxiliary variables, $\boldsymbol{\sigma}_{\bullet 0}$, are unknown, we apply these results by substituting their sample estimates (method of moments). The resulting estimators will be the same as the usual calibration estimators (Deville and Särndal 1992).

However, our results do not rely on the model assumptions that are necessary for superpopulation models and model-assisted approaches. In addition, these estimators and their variances are exact rather than approximate.

These estimators require knowledge of both $\sigma_{01}$ and $\sigma_1^2$, and their variances are never larger than the variance of the simple expansion estimator. The gain in the precision of prediction is proportional to the squared (multiple) correlation coefficient of the response variable and the auxiliary variable(s).

5.7. Derivations of results in Section 5.3.

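As shown in Section 5.2, the reparameterized model is

$$\begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(1)*}\end{pmatrix} = (\mathbf{I}_2\otimes\mathbf{1}_N)\begin{pmatrix}\mu^{(0)}\\ 0\end{pmatrix} + \mathbf{E}.$$

The parameter of interest is the population total of the response variable, defined as

$$T^{(0)} = \boldsymbol{\ell}'\mathbf{Z} = \left(\mathbf{1}_N'\ \ \mathbf{0}_N'\right)\begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(1)*}\end{pmatrix},$$

where $\boldsymbol{\ell} = \left(\mathbf{1}_N'\ \ \mathbf{0}_{1\times N}\right)'$. After partitioning, we denote the partitioned vector as $\mathbf{Z}$, and the sample and remaining parts as $\mathbf{Z}_I$ and $\mathbf{Z}_{II}$ respectively, such that $\mathbf{Z} = \left(\mathbf{Z}_I'\ \ \mathbf{Z}_{II}'\right)'$. Thus, $T^{(0)}$ can be expressed as $T^{(0)} = \boldsymbol{\ell}_I'\mathbf{Z}_I + \boldsymbol{\ell}_{II}'\mathbf{Z}_{II}$, where $\boldsymbol{\ell}_I' = (1\ \ 0)\otimes\mathbf{1}_n'$ and $\boldsymbol{\ell}_{II}' = (1\ \ 0)\otimes\mathbf{1}_{N-n}'$.

We define a linear estimator for $T^{(0)}$ as a linear function of $\mathbf{Z}_I$, such that

$$\hat{T}^{(0)} = \left(\boldsymbol{\ell}_I' + \mathbf{w}'\right)\mathbf{Z}_I,$$

where $\mathbf{w}$ can be defined in two possible ways, i.e., $\mathbf{w} = \left(\mathbf{w}_0'\ \ \mathbf{w}_1'\right)'$ and $\mathbf{w} = \left(\mathbf{w}_0'\ \ \mathbf{0}'\right)'$, and $\mathbf{w}_0$ and $\mathbf{w}_1$ are vectors of weights corresponding to the response and auxiliary variable.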

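Scenario 1: $\mathbf{w} = \left(\mathbf{w}_0'\ \ \mathbf{w}_1'\right)'$

Unbiasedness for $T^{(0)}$ requires $E\left(\mathbf{w}'\mathbf{Z}_I - \boldsymbol{\ell}_{II}'\mathbf{Z}_{II}\right) = 0$, or equivalently, $\left(\mathbf{1}_n'\mathbf{w}_0 - (N-n)\right)\mu^{(0)} + \mathbf{1}_n'\mathbf{w}_1\,\mu^{(1)*} = 0$. Since $\mu^{(1)*} = 0$, this constraint is satisfied for any $\mu^{(0)}$ simply by $\mathbf{w}_0'\mathbf{1}_n = N-n$. Therefore, the optimization function is

$$\Phi(\mathbf{w}) = \left(\mathbf{w}_0'\ \ \mathbf{w}_1'\right)\mathbf{V}_I\begin{pmatrix}\mathbf{w}_0\\ \mathbf{w}_1\end{pmatrix} - 2\left(\mathbf{w}_0'\ \ \mathbf{w}_1'\right)\mathbf{V}_{I,II}\,\boldsymbol{\ell}_{II} + 2\lambda\left(\mathbf{w}_0'\mathbf{1}_n - (N-n)\right),$$

which can be simplified to

$$\begin{aligned}\Phi(\mathbf{w}) ={}& \sigma_0^2\,\mathbf{w}_0'\mathbf{P}_{n,N}\mathbf{w}_0 + 2\sigma_{01}\,\mathbf{w}_1'\mathbf{P}_{n,N}\mathbf{w}_0 + \sigma_1^2\,\mathbf{w}_1'\mathbf{P}_{n,N}\mathbf{w}_1\\ &+ 2(1-f)\left(\sigma_0^2\,\mathbf{w}_0'\mathbf{1}_n + \sigma_{01}\,\mathbf{w}_1'\mathbf{1}_n\right) + 2\lambda\left(\mathbf{w}_0'\mathbf{1}_n - N(1-f)\right).\end{aligned} \tag{5.26}$$

Differentiating (5.26) with respect to $\mathbf{w}_0$, $\mathbf{w}_1$ and $\lambda$ and setting the derivatives to zero yields the following estimating equations,

$$\begin{pmatrix}\sigma_0^2\mathbf{P}_{n,N} & \sigma_{01}\mathbf{P}_{n,N} & \mathbf{1}_n\\ \sigma_{01}\mathbf{P}_{n,N} & \sigma_1^2\mathbf{P}_{n,N} & \mathbf{0}\\ \mathbf{1}_n' & \mathbf{0}' & 0\end{pmatrix}\begin{pmatrix}\mathbf{w}_0\\ \mathbf{w}_1\\ \lambda\end{pmatrix} = \begin{pmatrix}-(1-f)\,\sigma_0^2\,\mathbf{1}_n\\ -(1-f)\,\sigma_{01}\,\mathbf{1}_n\\ N(1-f)\end{pmatrix},$$

or

$$\begin{pmatrix}\boldsymbol{\Sigma}\otimes\mathbf{P}_{n,N} & \begin{pmatrix}1\\0\end{pmatrix}\otimes\mathbf{1}_n\\ (1\ \ 0)\otimes\mathbf{1}_n' & 0\end{pmatrix}\begin{pmatrix}\mathbf{w}_0\\ \mathbf{w}_1\\ \lambda\end{pmatrix} = \begin{pmatrix}-(1-f)\,\boldsymbol{\Sigma}\begin{pmatrix}1\\0\end{pmatrix}\otimes\mathbf{1}_n\\ N(1-f)\end{pmatrix}.$$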

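Solving these estimating equations leads to

$$\begin{aligned}\hat{\lambda} &= -\left\{(1\ \ 0)\boldsymbol{\Sigma}^{-1}\begin{pmatrix}1\\0\end{pmatrix}\otimes\mathbf{1}_n'\mathbf{P}_{n,N}^{-1}\mathbf{1}_n\right\}^{-1}\left\{(1-f)\,(1\ \ 0)\boldsymbol{\Sigma}^{-1}\boldsymbol{\Sigma}\begin{pmatrix}1\\0\end{pmatrix}\otimes\mathbf{1}_n'\mathbf{P}_{n,N}^{-1}\mathbf{1}_n + N(1-f)\right\}\\ &= -\frac{N-n}{n}\left((1\ \ 0)\boldsymbol{\Sigma}^{-1}\begin{pmatrix}1\\0\end{pmatrix}\right)^{-1}\\ &= -\frac{N-n}{n}\left(\sigma_0^2 - \frac{\sigma_{01}^2}{\sigma_1^2}\right),\end{aligned}$$

$$\begin{pmatrix}\hat{\mathbf{w}}_0\\ \hat{\mathbf{w}}_1\end{pmatrix} = -\begin{pmatrix}1\\0\end{pmatrix}\otimes\mathbf{1}_n + \frac{N}{n}\begin{pmatrix}1\\ -\dfrac{\sigma_{01}}{\sigma_1^2}\end{pmatrix}\otimes\mathbf{1}_n = \begin{pmatrix}\dfrac{(1-f)N}{n}\\[2mm] -\dfrac{N}{n}\dfrac{\sigma_{01}}{\sigma_1^2}\end{pmatrix}\otimes\mathbf{1}_n,$$

$$\begin{aligned}\hat{T}^{(0)} &= \left(\boldsymbol{\ell}_I' + \hat{\mathbf{w}}'\right)\mathbf{Z}_I = \left\{\left(\mathbf{1}_n'\ \ \mathbf{0}_n'\right) + \left(\frac{N-n}{n}\mathbf{1}_n'\ \ \ -\frac{N}{n}\frac{\sigma_{01}}{\sigma_1^2}\mathbf{1}_n'\right)\right\}\begin{pmatrix}\mathbf{Y}_I^{(0)}\\ \mathbf{Y}_I^{(1)*}\end{pmatrix}\\ &= \frac{N}{n}\left(\mathbf{1}_n'\mathbf{Y}_I^{(0)} - \frac{\sigma_{01}}{\sigma_1^2}\mathbf{1}_n'\mathbf{Y}_I^{(1)*}\right)\\ &= N\left(\bar{y}_I^{(0)} - \frac{\sigma_{01}}{\sigma_1^2}\left(\bar{y}_I^{(1)} - \mu^{(1)}\right)\right),\end{aligned}$$

$$\begin{aligned}\operatorname{var}\left(\hat{T}^{(0)}\right) &= \left(\boldsymbol{\ell}_I' + \hat{\mathbf{w}}'\right)\operatorname{cov}(\mathbf{Z}_I)\left(\boldsymbol{\ell}_I + \hat{\mathbf{w}}\right)\\ &= \left\{\frac{N}{n}\left(1\ \ -\beta_{01}\right)\otimes\mathbf{1}_n'\right\}\left(\boldsymbol{\Sigma}\otimes\mathbf{P}_{n,N}\right)\left\{\frac{N}{n}\begin{pmatrix}1\\ -\beta_{01}\end{pmatrix}\otimes\mathbf{1}_n\right\}\\ &= N^2\left(\frac{1-f}{n}\right)\left(1-\rho_{01}^2\right)\sigma_0^2,\end{aligned}$$

where $\beta_{01} = \sigma_{01}/\sigma_1^2$ and $\rho_{01}^2 = \dfrac{\sigma_{01}^2}{\sigma_0^2\sigma_1^2}$ is the squared correlation coefficient between the response and auxiliary variable.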
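The closed-form weights of Scenario 1 can be checked numerically by solving the estimating equations directly. The sketch below assumes $\mathbf{P}_{n,N} = \mathbf{I}_n - \mathbf{J}_n/N$ (an assumption consistent with $\mathbf{1}_n'\mathbf{P}_{n,N}^{-1}\mathbf{1}_n = Nn/(N-n)$); the function name `solve_weights` and the numerical values of $\boldsymbol{\Sigma}$, $n$ and $N$ are hypothetical.

```python
import numpy as np


def solve_weights(Sigma, n, N):
    """Solve the Scenario 1 estimating equations for (w_0, w_1) and lambda."""
    f = n / N
    P = np.eye(n) - np.ones((n, n)) / N          # assumed form of P_{n,N}
    u1 = np.array([1.0, 0.0])
    ones = np.ones(n)
    c = np.kron(u1, ones)                         # (1 0)' kron 1_n
    V_I = np.kron(Sigma, P)                       # Sigma kron P_{n,N}
    lhs = np.block([[V_I, c[:, None]],
                    [c[None, :], np.zeros((1, 1))]])
    rhs = np.concatenate([-(1 - f) * np.kron(Sigma @ u1, ones),
                          [N * (1 - f)]])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:-1], sol[-1]


# Hypothetical covariance matrix and sample/population sizes.
Sigma = np.array([[4.0, 1.5],
                  [1.5, 2.0]])
n, N = 3, 10
w_solved, lam = solve_weights(Sigma, n, N)

# Closed form: -(1 0)' kron 1_n + (N/n) (1, -beta01)' kron 1_n.
beta01 = Sigma[0, 1] / Sigma[1, 1]
w_closed = (-np.kron([1.0, 0.0], np.ones(n))
            + N / n * np.kron([1.0, -beta01], np.ones(n)))
print(np.allclose(w_solved, w_closed), lam)
```

The first $n$ weights equal $(N-n)/n$ and the last $n$ equal $-(N/n)\beta_{01}$, none of which depends on the sampled values, which is the property emphasized in the remarks of Section 5.3.4.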

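Scenario 2: $\mathbf{w} = \left(\mathbf{w}_0'\ \ \mathbf{0}'\right)'$

Unbiasedness for $T^{(0)}$ requires $E\left(\mathbf{w}'\mathbf{Z}_I - \boldsymbol{\ell}_{II}'\mathbf{Z}_{II}\right) = 0$, or equivalently, $\left(\mathbf{1}_n'\mathbf{w}_0 - (N-n)\right)\mu^{(0)} + \mathbf{1}_n'\mathbf{0}\,\mu^{(1)*} = 0$. This constraint is satisfied for any $\mu^{(0)}$ simply by $\mathbf{w}_0'\mathbf{1}_n = N-n$. Therefore, the optimization function is

$$\Phi(\mathbf{w}_0) = \mathbf{w}_0'\left(\mathbf{I}_n\ \ \mathbf{0}\right)\mathbf{V}_I\begin{pmatrix}\mathbf{I}_n\\ \mathbf{0}\end{pmatrix}\mathbf{w}_0 - 2\,\mathbf{w}_0'\left(\mathbf{I}_n\ \ \mathbf{0}\right)\mathbf{V}_{I,II}\,\boldsymbol{\ell}_{II} + 2\left(\mathbf{w}_0'\mathbf{1}_n - (N-n)\right)\lambda,$$

which can be simplified to

$$\Phi(\mathbf{w}_0) = \sigma_0^2\,\mathbf{w}_0'\mathbf{P}_{n,N}\mathbf{w}_0 + 2\sigma_0^2(1-f)\,\mathbf{w}_0'\mathbf{1}_n + 2\lambda\left(\mathbf{w}_0'\mathbf{1}_n - (N-n)\right). \tag{5.27}$$

Differentiating (5.27) with respect to $\mathbf{w}_0$ and $\lambda$ and setting the derivatives to zero yields the following estimating equations,

$$\begin{pmatrix}\sigma_0^2\mathbf{P}_{n,N} & \mathbf{1}_n\\ \mathbf{1}_n' & 0\end{pmatrix}\begin{pmatrix}\mathbf{w}_0\\ \lambda\end{pmatrix} = \begin{pmatrix}-(1-f)\,\sigma_0^2\,\mathbf{1}_n\\ N(1-f)\end{pmatrix}.$$

Solving these estimating equations leads to

$$\hat{\mathbf{w}}_0 = \frac{N-n}{n}\mathbf{1}_n,$$

which is the weight of the simple expansion estimator. Therefore, the estimator for $T^{(0)}$ is

$$\hat{T}^{(0)} = \frac{N}{n}\mathbf{1}_n'\mathbf{Y}_I^{(0)} = N\bar{y}^{(0)},$$

and its variance is

$$\operatorname{var}\left(\hat{T}^{(0)}\right) = N^2\left(\frac{1-f}{n}\right)\sigma_0^2.$$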


5.8. Derivations of results in Section 5.4.

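This section provides proof of the results in Section 5.4. The derivations are analogous to those presented in Section 5.7. The population total of the response variable is defined as,

$$T^{(0)} = \boldsymbol{\ell}'\mathbf{Z} = \left(\mathbf{u}_1'\otimes\mathbf{1}_N'\right)\begin{pmatrix}\mathbf{Y}^{(0)}\\ \mathbf{Y}^{(\bullet)*}\end{pmatrix},$$

where $\boldsymbol{\ell}' = \mathbf{u}_1'\otimes\mathbf{1}_N'$ and $\mathbf{u}_1 = \left(1\ \ \mathbf{0}_{1\times p}\right)'$. After partitioning, we denote the sample and remaining parts as $\mathbf{Z}_I$ and $\mathbf{Z}_{II}$, and $T^{(0)}$ can be expressed as $T^{(0)} = \boldsymbol{\ell}_I'\mathbf{Z}_I + \boldsymbol{\ell}_{II}'\mathbf{Z}_{II}$, where $\boldsymbol{\ell}_I' = \mathbf{u}_1'\otimes\mathbf{1}_n'$ and $\boldsymbol{\ell}_{II}' = \mathbf{u}_1'\otimes\mathbf{1}_{N-n}'$.

The estimator for $T^{(0)}$ is defined as $\hat{T}^{(0)} = \left(\boldsymbol{\ell}_I' + \mathbf{w}'\right)\mathbf{Z}_I$, where $\mathbf{w} = \left(\mathbf{w}_0'\ \ \mathbf{w}_1'\ \cdots\ \mathbf{w}_p'\right)'$ and $\mathbf{w}_k$, $k = 0, 1, 2, \ldots, p$, are vectors of weights corresponding to the response and the $p$ auxiliary variables.

Unbiasedness for $T^{(0)}$ requires $E\left(\mathbf{w}'\mathbf{Z}_I - \boldsymbol{\ell}_{II}'\mathbf{Z}_{II}\right) = 0$, or equivalently, $\left(\mathbf{1}_n'\mathbf{w}_0 - (N-n)\right)\mu^{(0)} + \sum_{k=1}^{p}\mathbf{1}_n'\mathbf{w}_k\,\mu^{(k)*} = 0$. Since $\mu^{(k)*} = 0$ for any $k = 1, 2, \ldots, p$, this constraint is satisfied for any $\mu^{(0)}$ simply by $\mathbf{w}_0'\mathbf{1}_n = N-n$.

Moreover, the variance of $\hat{T}^{(0)}$ can be represented as,

$$\operatorname{var}\left(\hat{T}^{(0)}\right) = \mathbf{w}'\mathbf{V}_I\mathbf{w} - 2\,\mathbf{w}'\mathbf{V}_{I,II}\,\boldsymbol{\ell}_{II} + \boldsymbol{\ell}_{II}'\mathbf{V}_{II}\,\boldsymbol{\ell}_{II}.$$

The Lagrangian function is defined as

$$\Phi(\mathbf{w}) = \mathbf{w}'\mathbf{V}_I\mathbf{w} - 2\,\mathbf{w}'\mathbf{V}_{I,II}\left(\mathbf{u}_1\otimes\mathbf{1}_{N-n}\right) + 2\left\{\mathbf{w}_0'\mathbf{1}_n - (N-n)\right\}\lambda, \tag{5.28}$$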

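where $\lambda$ is a Lagrangian multiplier. Differentiating (5.28) with respect to $\mathbf{w}$ and $\lambda$ gives,

$$\begin{cases}\dfrac{\partial}{\partial\mathbf{w}}\Phi(\mathbf{w}) = 2\mathbf{V}_I\mathbf{w} - 2\mathbf{V}_{I,II}\left(\mathbf{u}_1\otimes\mathbf{1}_{N-n}\right) + 2\left(\mathbf{u}_1\otimes\mathbf{1}_n\right)\lambda\\[2mm] \dfrac{\partial}{\partial\lambda}\Phi(\mathbf{w}) = 2\left\{\mathbf{w}_0'\mathbf{1}_n - (N-n)\right\}\end{cases}. \tag{5.29}$$

Setting both equations of (5.29) to zero results in the following estimating equations

$$\begin{pmatrix}\mathbf{V}_I & \mathbf{u}_1\otimes\mathbf{1}_n\\ \mathbf{u}_1'\otimes\mathbf{1}_n' & 0\end{pmatrix}\begin{pmatrix}\mathbf{w}\\ \lambda\end{pmatrix} = \begin{pmatrix}\mathbf{V}_{I,II}\left(\mathbf{u}_1\otimes\mathbf{1}_{N-n}\right)\\ N-n\end{pmatrix}. \tag{5.30}$$

The solutions of (5.30) are

$$\begin{cases}\hat{\mathbf{w}} = \mathbf{V}_I^{-1}\mathbf{V}_{I,II}\left(\mathbf{u}_1\otimes\mathbf{1}_{N-n}\right) - \mathbf{V}_I^{-1}\left(\mathbf{u}_1\otimes\mathbf{1}_n\right)\hat{\lambda}\\[2mm] \hat{\lambda} = \left\{\left(\mathbf{u}_1'\otimes\mathbf{1}_n'\right)\mathbf{V}_I^{-1}\left(\mathbf{u}_1\otimes\mathbf{1}_n\right)\right\}^{-1}\left\{\left(\mathbf{u}_1'\otimes\mathbf{1}_n'\right)\mathbf{V}_I^{-1}\mathbf{V}_{I,II}\left(\mathbf{u}_1\otimes\mathbf{1}_{N-n}\right) - (N-n)\right\}\end{cases}. \tag{5.31}$$

Since $\mathbf{V}_I^{-1} = \boldsymbol{\Sigma}^{-1}\otimes\mathbf{P}_{n,N}^{-1}$, $\mathbf{V}_{I,II} = \boldsymbol{\Sigma}\otimes\left(-\frac{1}{N}\mathbf{J}_{n\times(N-n)}\right)$, $\mathbf{V}_I^{-1}\mathbf{V}_{I,II} = -\frac{1}{N-n}\,\mathbf{I}_{p+1}\otimes\mathbf{J}_{n\times(N-n)}$, $\mathbf{P}_{n,N}^{-1}\mathbf{1}_n = \frac{N}{N-n}\mathbf{1}_n$ and $\mathbf{1}_n'\mathbf{P}_{n,N}^{-1}\mathbf{1}_n = \frac{Nn}{N-n}$, (5.31) can be simplified as

$$\hat{\mathbf{w}} = -\mathbf{u}_1\otimes\mathbf{1}_n - \frac{N}{N-n}\,\hat{\lambda}\left(\boldsymbol{\Sigma}^{-1}\mathbf{u}_1\otimes\mathbf{1}_n\right) \quad\text{and}\quad \hat{\lambda} = -\frac{N-n}{n}\left(\mathbf{u}_1'\boldsymbol{\Sigma}^{-1}\mathbf{u}_1\right)^{-1}.$$

Therefore,

$$\hat{\mathbf{w}} = -\mathbf{u}_1\otimes\mathbf{1}_n + \frac{N}{n}\,\boldsymbol{\Sigma}^{-1}\mathbf{u}_1\left(\mathbf{u}_1'\boldsymbol{\Sigma}^{-1}\mathbf{u}_1\right)^{-1}\otimes\mathbf{1}_n. \tag{5.32}$$

Since

$$\boldsymbol{\Sigma}^{-1} = \begin{pmatrix}\sigma_0^2 & \boldsymbol{\sigma}_{\bullet 0}'\\ \boldsymbol{\sigma}_{\bullet 0} & \boldsymbol{\Sigma}_\bullet\end{pmatrix}^{-1} = \begin{pmatrix}\left(\sigma_0^2 - \boldsymbol{\sigma}_{\bullet 0}'\boldsymbol{\Sigma}_\bullet^{-1}\boldsymbol{\sigma}_{\bullet 0}\right)^{-1} & -\sigma_0^{-2}\boldsymbol{\sigma}_{\bullet 0}'\left(\boldsymbol{\Sigma}_\bullet - \sigma_0^{-2}\boldsymbol{\sigma}_{\bullet 0}\boldsymbol{\sigma}_{\bullet 0}'\right)^{-1}\\ -\sigma_0^{-2}\left(\boldsymbol{\Sigma}_\bullet - \sigma_0^{-2}\boldsymbol{\sigma}_{\bullet 0}\boldsymbol{\sigma}_{\bullet 0}'\right)^{-1}\boldsymbol{\sigma}_{\bullet 0} & \left(\boldsymbol{\Sigma}_\bullet - \sigma_0^{-2}\boldsymbol{\sigma}_{\bullet 0}\boldsymbol{\sigma}_{\bullet 0}'\right)^{-1}\end{pmatrix},$$

$$\mathbf{u}_1'\boldsymbol{\Sigma}^{-1}\mathbf{u}_1 = \left(\sigma_0^2 - \boldsymbol{\sigma}_{\bullet 0}'\boldsymbol{\Sigma}_\bullet^{-1}\boldsymbol{\sigma}_{\bullet 0}\right)^{-1} \quad\text{and}\quad \boldsymbol{\Sigma}^{-1}\mathbf{u}_1 = \begin{pmatrix}\left(\sigma_0^2 - \boldsymbol{\sigma}_{\bullet 0}'\boldsymbol{\Sigma}_\bullet^{-1}\boldsymbol{\sigma}_{\bullet 0}\right)^{-1}\\ -\sigma_0^{-2}\left(\boldsymbol{\Sigma}_\bullet - \sigma_0^{-2}\boldsymbol{\sigma}_{\bullet 0}\boldsymbol{\sigma}_{\bullet 0}'\right)^{-1}\boldsymbol{\sigma}_{\bullet 0}\end{pmatrix},$$

we have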

89

( )( ) 11 1 2 20 0 0 0 0 0 0

1ˆ

1n nNn σ σ

−−• • • • • • •

⎛ ⎞⎜ ⎟= − ⊗ + ⊗

′ ′− − −⎜ ⎟⎝ ⎠

w u 1 1σ Σ σ Σ σ σ σ

. (5.33)

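Denote $\boldsymbol{\beta}_{0\bullet} = (1 - \sigma_0^{-2}\boldsymbol{\sigma}_{0\bullet}'\boldsymbol{\Sigma}_{\bullet\bullet}^{-1}\boldsymbol{\sigma}_{0\bullet})(\boldsymbol{\Sigma}_{\bullet\bullet} - \sigma_0^{-2}\boldsymbol{\sigma}_{0\bullet}\boldsymbol{\sigma}_{0\bullet}')^{-1}\boldsymbol{\sigma}_{0\bullet}$. It can be shown that

\[
\begin{aligned}
\boldsymbol{\beta}_{0\bullet}
&= (1 - \sigma_0^{-2}\boldsymbol{\sigma}_{0\bullet}'\boldsymbol{\Sigma}_{\bullet\bullet}^{-1}\boldsymbol{\sigma}_{0\bullet})(\boldsymbol{\Sigma}_{\bullet\bullet} - \sigma_0^{-2}\boldsymbol{\sigma}_{0\bullet}\boldsymbol{\sigma}_{0\bullet}')^{-1}\boldsymbol{\sigma}_{0\bullet} \\
&= (\mathbf{I}_p - \sigma_0^{-2}\boldsymbol{\Sigma}_{\bullet\bullet}^{-1}\boldsymbol{\sigma}_{0\bullet}\boldsymbol{\sigma}_{0\bullet}')^{-1}\boldsymbol{\Sigma}_{\bullet\bullet}^{-1}(\mathbf{I}_p - \sigma_0^{-2}\boldsymbol{\sigma}_{0\bullet}\boldsymbol{\sigma}_{0\bullet}'\boldsymbol{\Sigma}_{\bullet\bullet}^{-1})\boldsymbol{\sigma}_{0\bullet} \\
&= (\mathbf{I}_p - \sigma_0^{-2}\boldsymbol{\Sigma}_{\bullet\bullet}^{-1}\boldsymbol{\sigma}_{0\bullet}\boldsymbol{\sigma}_{0\bullet}')^{-1}(\mathbf{I}_p - \sigma_0^{-2}\boldsymbol{\Sigma}_{\bullet\bullet}^{-1}\boldsymbol{\sigma}_{0\bullet}\boldsymbol{\sigma}_{0\bullet}')\boldsymbol{\Sigma}_{\bullet\bullet}^{-1}\boldsymbol{\sigma}_{0\bullet} \\
&= \boldsymbol{\Sigma}_{\bullet\bullet}^{-1}\boldsymbol{\sigma}_{0\bullet} .
\end{aligned}
\]

Therefore, (5.33) is simply

\[
\hat{\mathbf{w}} = -(\mathbf{u}_1\otimes\mathbf{1}_n) + \frac{N}{n}\begin{pmatrix} 1 \\ -\boldsymbol{\beta}_{0\bullet} \end{pmatrix}\otimes\mathbf{1}_n . \tag{5.34}
\]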

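Subsequently, the estimator for $T^{(0)}$ is

\[
\hat{T}^{(0)} = N\bar{Y}^{(0)} - N\boldsymbol{\beta}_{0\bullet}'(\bar{\mathbf{Y}}^{(\bullet)} - \boldsymbol{\mu}^{(\bullet)}), \tag{5.35}
\]

which has variance

\[
\begin{aligned}
\operatorname{var}(\hat{T}^{(0)})
&= \left(\frac{N}{n}\right)^{2}\left\{\begin{pmatrix} 1 & -\boldsymbol{\beta}_{0\bullet}' \end{pmatrix}\otimes\mathbf{1}_n'\right\}\operatorname{cov}(\mathbf{Z}_I)\left\{\begin{pmatrix} 1 \\ -\boldsymbol{\beta}_{0\bullet} \end{pmatrix}\otimes\mathbf{1}_n\right\} \\
&= N^{2}\left(\frac{1-f}{n}\right)(1-\rho_{0\bullet}^{2})\,\sigma_0^{2},
\end{aligned}
\]

where $\rho_{0\bullet}^{2} = \sigma_0^{-2}\boldsymbol{\sigma}_{0\bullet}'\boldsymbol{\Sigma}_{\bullet\bullet}^{-1}\boldsymbol{\sigma}_{0\bullet}$ is the squared multiple correlation coefficient of $Y^{(0)}$ on $\mathbf{Y}^{*(\bullet)}$.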
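The reduction of $\boldsymbol{\beta}_{0\bullet}$ to $\boldsymbol{\Sigma}_{\bullet\bullet}^{-1}\boldsymbol{\sigma}_{0\bullet}$ can be illustrated with a small exact computation. The following is a minimal sketch for $p = 2$ with made-up (hypothetical) covariance values. It evaluates both the long form and the short form of the coefficient vector, and also the residual factor $\sigma_0^2(1-\rho_{0\bullet}^2) = \sigma_0^2 - \boldsymbol{\sigma}_{0\bullet}'\boldsymbol{\Sigma}_{\bullet\bullet}^{-1}\boldsymbol{\sigma}_{0\bullet}$ that appears in the variance.

```python
from fractions import Fraction as F


def inv2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def beta_short(S, s):
    """Sigma_{..}^{-1} sigma_{0.}"""
    return matvec(inv2(S), s)


def beta_long(sigma0_sq, S, s):
    """(1 - s' S^{-1} s / sigma0^2) (S - s s' / sigma0^2)^{-1} s"""
    quad = sum(x * y for x, y in zip(s, matvec(inv2(S), s)))
    A = [[S[i][j] - s[i] * s[j] / sigma0_sq for j in range(2)]
         for i in range(2)]
    scale = 1 - quad / sigma0_sq
    return [scale * v for v in matvec(inv2(A), s)]


# Hypothetical values: var(y0) = 5, cov(y0, aux) = (1, 2), cov(aux) = S.
sigma0_sq = F(5)
s = [F(1), F(2)]
S = [[F(2), F(1)], [F(1), F(3)]]

short_form = beta_short(S, s)
long_form = beta_long(sigma0_sq, S, s)
residual_var = sigma0_sq - sum(b * x for b, x in zip(short_form, s))
```

Both forms give the same vector for these values, and `residual_var` is the quantity $\sigma_0^2(1-\rho_{0\bullet}^2)$ entering the variance of the estimator.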


CHAPTER 6

ESTIMATORS BASED ON RANDOM PERMUTATION MODELS

UNDER STRATIFIED SIMPLE RANDOM SAMPLING

6.1. Motivation

In this Chapter, we extend the results based on RPSUR for SRSWOR to stratified simple random sampling without replacement (STSRS). We consider two scenarios: 1) no auxiliary information is available, and 2) one auxiliary variable is known to some extent. For each scenario, we first derive general results in which the sampling fractions of the strata may or may not be equal, and then consider the special case in which the stratum sample sizes are proportional to the stratum sizes, i.e., the sampling fractions are equal across strata.

6.2. Definition and notations

Strata are $H$ nonoverlapping subpopulations within a population of size $N$. Each subpopulation, referred to as a stratum, has a known size, $N_1, N_2, \ldots, N_H$, respectively, with $\sum_{h=1}^{H} N_h = N$.

Stratified simple random sampling without replacement (hereafter, STSRS) refers to the procedure in which a sample of fixed size is drawn from each stratum independently using the SRSWOR scheme; the $H$ samples across strata constitute an STSRS sample. The sample sizes within strata are denoted by $n_1, n_2, \ldots, n_H$, respectively.

Following conventional notation for stratified sampling (Cochran 1977;

Bolfarine and Zacks 1992; Särndal, Swensson and Wretman 1992; Thompson 1997),

we adopt the following notations in this Chapter.

Suffix $h$, $h = 1, 2, \ldots, H$, denotes the stratum. The following notations all refer to stratum $h$.

$N_h$   total number of subjects

$n_h$   number of subjects in sample

$y_{hs}^{(k)}$   value of variable $k$ of subject $s$

$a_h = N_h/N$   stratum weight

$f_h = n_h/N_h$   sampling fraction in the stratum

$\mu_h^{(k)} = N_h^{-1}\sum_{s=1}^{N_h} y_{hs}^{(k)}$   stratum mean of the $k$-th variable

$\bar{Y}_h^{(k)} = n_h^{-1}\sum_{i=1}^{n_h} Y_{hi}^{(k)}$   stratum sample mean of the $k$-th variable

$\boldsymbol{\mu}_h = (\mu_h^{(0)}\ \mu_h^{(1)}\ \cdots\ \mu_h^{(p)})'$   vector of stratum means for all $p+1$ variables

$\boldsymbol{\Sigma}_h$   stratum variance-covariance matrix

In addition, the vector of stratum means of the $k$-th variable is denoted as $\boldsymbol{\mu}^{(k)} = (\mu_1^{(k)}\ \mu_2^{(k)}\ \cdots\ \mu_H^{(k)})'$.

For matrix operations, we represent a block diagonal matrix as a direct sum, $\bigoplus_{h=1}^{H}\mathbf{B}_h$, of matrices $\mathbf{B}_h$, $h = 1, 2, \ldots, H$. A matrix consisting of a column or a row of matrices or vectors is represented by $\{{}_c\,\mathbf{B}_h\}_{h=1}^{H} = (\mathbf{B}_1'\ \mathbf{B}_2'\ \cdots\ \mathbf{B}_H')'$ and $\{{}_r\,\mathbf{B}_h\}_{h=1}^{H} = (\mathbf{B}_1\ \mathbf{B}_2\ \cdots\ \mathbf{B}_H)$, respectively (Searle, Casella and McCulloch 1992).

6.3. Independent permutation of one response variable across H strata

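Applying results of Chapter 2, the random variables arising from permuting the response variable $y$ in the $h$-th stratum can be represented as

\[
\mathbf{Y}_h = \mathbf{U}_h\mathbf{y}_h ,
\]

where $h = 1, 2, \ldots, H$ and $\mathbf{U}_h$ is of dimension $N_h\times N_h$. The independent permutations of the $H$ strata can be represented as

\[
\begin{pmatrix} \mathbf{Y}_1 \\ \mathbf{Y}_2 \\ \vdots \\ \mathbf{Y}_H \end{pmatrix}
=
\begin{pmatrix}
\mathbf{U}_1 & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{U}_2 & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{U}_H
\end{pmatrix}
\begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_H \end{pmatrix}. \tag{6.1}
\]

Denoting $\mathbf{Y} = \{{}_c\,\mathbf{Y}_h\}_{h=1}^{H}$ and $\mathbf{y} = \{{}_c\,\mathbf{y}_h\}_{h=1}^{H}$, (6.1) can be represented simply as

\[
\mathbf{Y} = \left(\bigoplus_{h=1}^{H}\mathbf{U}_h\right)\mathbf{y} . \tag{6.2}
\]

It immediately follows that

\[
E(\mathbf{Y}) = E\left\{\left(\bigoplus_{h=1}^{H}\mathbf{U}_h\right)\mathbf{y}\right\} = \left(\bigoplus_{h=1}^{H}\mathbf{1}_{N_h}\right)\boldsymbol{\mu},
\]

where $\boldsymbol{\mu} = (\mu_1\ \mu_2\ \cdots\ \mu_H)'$.

Since $\mathbf{Y}$, representing the joint permutations of the $H$ strata, can be rewritten as

\[
\mathbf{Y} = \left(\bigoplus_{h=1}^{H}(\mathbf{y}_h'\otimes\mathbf{I}_{N_h})\right)\{{}_c\,\operatorname{vec}(\mathbf{U}_h)\}_{h=1}^{H}, \tag{6.3}
\]

the variance-covariance matrix of $\mathbf{Y}$ is thus

\[
\operatorname{cov}(\mathbf{Y}) = \left(\bigoplus_{h=1}^{H}(\mathbf{y}_h'\otimes\mathbf{I}_{N_h})\right)\operatorname{cov}\left(\{{}_c\,\operatorname{vec}(\mathbf{U}_h)\}_{h=1}^{H}\right)\left(\bigoplus_{h=1}^{H}(\mathbf{y}_h\otimes\mathbf{I}_{N_h})\right).
\]

Since the permutations of the $H$ strata are independent, $\operatorname{cov}(\operatorname{vec}(\mathbf{U}_h), \operatorname{vec}(\mathbf{U}_{h'})) = \mathbf{0}$ when $h \ne h'$. Consequently, we have

\[
\operatorname{cov}\left(\{{}_c\,\operatorname{vec}(\mathbf{U}_h)\}_{h=1}^{H}\right) = \bigoplus_{h=1}^{H}\operatorname{cov}(\operatorname{vec}(\mathbf{U}_h))
\]

and thus

\[
\operatorname{cov}(\mathbf{Y}) = \bigoplus_{h=1}^{H}\sigma_h^{2}\,\mathbf{P}_{N_h},
\]

where $\sigma_h^{2} = \frac{1}{N_h - 1}\,\mathbf{y}_h'\mathbf{P}_{N_h}\mathbf{y}_h$ is the population variance of $y$ in the $h$-th stratum.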
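The within-stratum block $\sigma_h^2\mathbf{P}_{N_h}$ can be confirmed by brute force for a small stratum. The sketch below enumerates all $N_h!$ equally likely permutations of a toy response vector and computes the exact covariance matrix of the permuted vector. It assumes $\mathbf{P}_{N} = \mathbf{I}_N - \mathbf{J}_N/N$, and the data values are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def permutation_cov(y):
    """Exact covariance matrix of Y = U y over all N! permutations of y."""
    N = len(y)
    perms = list(permutations(y))
    M = len(perms)
    mean = [Fraction(sum(p[i] for p in perms), M) for i in range(N)]
    return [[Fraction(sum(p[i] * p[j] for p in perms), M) - mean[i] * mean[j]
             for j in range(N)] for i in range(N)]


def sigma2_times_P(y):
    """sigma^2 * (I_N - J_N / N), with sigma^2 using divisor N - 1."""
    N = len(y)
    mu = Fraction(sum(y), N)
    sigma2 = sum((v - mu) ** 2 for v in y) / (N - 1)
    return [[sigma2 * ((1 if i == j else 0) - Fraction(1, N))
             for j in range(N)] for i in range(N)]


y_h = [1, 2, 4, 7]                  # toy stratum with N_h = 4
lhs = permutation_cov(y_h)
rhs = sigma2_times_P(y_h)
```

Because strata are permuted independently, the joint covariance is the direct sum of such blocks, so checking one block is sufficient.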

6.4. Simultaneous permutations of one response variable and p auxiliary

variables across H strata

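The $(p+1)\times N_h$ random variables arising from permuting variables $y^{(0)}, y^{(1)}, \ldots, y^{(p)}$ in the $h$-th stratum can be represented as

\[
\{{}_c\,\mathbf{Y}_h^{(k)}\}_{k=0}^{p} = (\mathbf{I}_{p+1}\otimes\mathbf{U}_h)\{{}_c\,\mathbf{y}_h^{(k)}\}_{k=0}^{p}, \tag{6.4}
\]

where $h = 1, 2, \ldots, H$ and $\mathbf{U}_h$ is of dimension $N_h\times N_h$. We denote $\mathbf{Y}_h = \{{}_c\,\mathbf{Y}_h^{(k)}\}_{k=0}^{p}$ and $\mathbf{y}_h = \{{}_c\,\mathbf{y}_h^{(k)}\}_{k=0}^{p}$ and rewrite identity (6.4) as

\[
\mathbf{Y}_h = (\mathbf{I}_{p+1}\otimes\mathbf{U}_h)\mathbf{y}_h . \tag{6.5}
\]

By applying (6.5) to (6.2), the joint permutations of the $p+1$ variables in all $H$ strata can be represented as

\[
\begin{pmatrix} \mathbf{Y}_1 \\ \mathbf{Y}_2 \\ \vdots \\ \mathbf{Y}_H \end{pmatrix}
= \left(\bigoplus_{h=1}^{H}(\mathbf{I}_{p+1}\otimes\mathbf{U}_h)\right)
\begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_H \end{pmatrix}. \tag{6.6}
\]

Denoting $\mathbf{Y} = \{{}_c\,\mathbf{Y}_h\}_{h=1}^{H}$ and $\mathbf{y} = \{{}_c\,\mathbf{y}_h\}_{h=1}^{H}$, (6.6) can be simplified as

\[
\mathbf{Y} = \left(\bigoplus_{h=1}^{H}(\mathbf{I}_{p+1}\otimes\mathbf{U}_h)\right)\mathbf{y} . \tag{6.7}
\]

It immediately follows that

\[
E(\mathbf{Y}) = E\left\{\left(\bigoplus_{h=1}^{H}(\mathbf{I}_{p+1}\otimes\mathbf{U}_h)\right)\mathbf{y}\right\} = \left(\bigoplus_{h=1}^{H}(\mathbf{I}_{p+1}\otimes\mathbf{1}_{N_h})\right)\boldsymbol{\mu},
\]

where $\boldsymbol{\mu} = \{{}_c\,\boldsymbol{\mu}_h\}_{h=1}^{H}$ and $\boldsymbol{\mu}_h = (\mu_h^{(0)}\ \mu_h^{(1)}\ \cdots\ \mu_h^{(p)})'$, $h = 1, 2, \ldots, H$.

Similar to (6.3), we re-express the permutations of the response and auxiliary variables in the $h$-th stratum as

\[
\mathbf{Y}_h = \{{}_c\,\mathbf{Y}_h^{(k)}\}_{k=0}^{p} = \left(\{{}_r\,\mathbf{y}_h^{(k)\prime}\}_{k=0}^{p}\otimes\mathbf{I}_{(p+1)N_h}\right)\operatorname{vec}(\mathbf{I}_{p+1}\otimes\mathbf{U}_h).
\]

In Chapter 2, we have shown that

\[
\operatorname{cov}(\mathbf{Y}_h) = \boldsymbol{\Sigma}_h\otimes\mathbf{P}_{N_h},
\]

where $\boldsymbol{\Sigma}_h$ is the population variance-covariance matrix of the $h$-th stratum. Therefore, the joint variance-covariance matrix for all $H$ strata is

\[
\operatorname{cov}(\mathbf{Y}) = \bigoplus_{h=1}^{H}\operatorname{cov}(\mathbf{Y}_h) = \bigoplus_{h=1}^{H}(\boldsymbol{\Sigma}_h\otimes\mathbf{P}_{N_h}).
\]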

6.5. Stratified random sampling and partition of random matrices

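A sample of size $n_h$, $n_h \le N_h$, drawn by SRSWOR from the $h$-th stratum, where $h = 1, 2, \ldots, H$, can be viewed as the first $n_h$ subjects in a random permutation within the $h$-th stratum. The elements of the random vector $\mathbf{Y}_h$ arising from the permutation can be partitioned into a sample part (indexed as $I$) and a remaining part (indexed as $II$) by pre-multiplication by a permutation matrix $\mathbf{K}_h$, i.e.,

\[
\mathbf{Z}_h = \mathbf{K}_h\mathbf{Y}_h = \begin{pmatrix} \mathbf{Z}_{hI} \\ \mathbf{Z}_{hII} \end{pmatrix},
\]

where $\mathbf{K}_h$ is defined similarly to (2.9) of Section 2.5, $\mathbf{Z}_{hI} = (\mathbf{Y}_{hI}^{(0)\prime}\ \mathbf{Y}_{hI}^{(1)\prime}\ \cdots\ \mathbf{Y}_{hI}^{(p)\prime})'$ and $\mathbf{Z}_{hII} = (\mathbf{Y}_{hII}^{(0)\prime}\ \mathbf{Y}_{hII}^{(1)\prime}\ \cdots\ \mathbf{Y}_{hII}^{(p)\prime})'$. The expected values of $\mathbf{Z}_h$ are

\[
E(\mathbf{Z}_h) = E\begin{pmatrix} \mathbf{Z}_{hI} \\ \mathbf{Z}_{hII} \end{pmatrix}
= \begin{pmatrix} \mathbf{I}_{p+1}\otimes\mathbf{1}_{n_h} \\ \mathbf{I}_{p+1}\otimes\mathbf{1}_{N_h-n_h} \end{pmatrix}\boldsymbol{\mu}_h .
\]

Further, the partitioned variance-covariance matrix of the $h$-th stratum is

\[
\operatorname{cov}\begin{pmatrix} \mathbf{Z}_{hI} \\ \mathbf{Z}_{hII} \end{pmatrix}
= \begin{pmatrix} \mathbf{V}_{hI} & \mathbf{V}_{h(I,II)} \\ \mathbf{V}_{h(II,I)} & \mathbf{V}_{hII} \end{pmatrix},
\]

where $\mathbf{V}_{hI} = \boldsymbol{\Sigma}_h\otimes\mathbf{P}_{n_h,N_h}$, $\mathbf{V}_{h(I,II)} = \mathbf{V}_{h(II,I)}' = \boldsymbol{\Sigma}_h\otimes\left(-\frac{1}{N_h}\mathbf{J}_{n_h\times(N_h-n_h)}\right)$, and $\mathbf{V}_{hII} = \boldsymbol{\Sigma}_h\otimes\mathbf{P}_{(N_h-n_h),N_h}$.

The partitioning of all $H$ strata can be represented jointly as

\[
\mathbf{Z}^{*} = \begin{pmatrix} \mathbf{Z}_I \\ \mathbf{Z}_{II} \end{pmatrix}
= \begin{pmatrix} \mathbf{R}_I \\ \mathbf{R}_{II} \end{pmatrix}\left(\bigoplus_{h=1}^{H}\mathbf{K}_h\right)\mathbf{Y}, \tag{6.8}
\]

where $\mathbf{Z}_I = \{{}_c\,\mathbf{Z}_{hI}\}_{h=1}^{H}$, $\mathbf{Z}_{II} = \{{}_c\,\mathbf{Z}_{hII}\}_{h=1}^{H}$,

\[
\mathbf{R}_I = \bigoplus_{h=1}^{H}\begin{pmatrix} \mathbf{I}_{p+1}\otimes\mathbf{I}_{n_h} & \mathbf{I}_{p+1}\otimes\mathbf{0}_{n_h\times(N_h-n_h)} \end{pmatrix}
\quad\text{and}\quad
\mathbf{R}_{II} = \bigoplus_{h=1}^{H}\begin{pmatrix} \mathbf{I}_{p+1}\otimes\mathbf{0}_{(N_h-n_h)\times n_h} & \mathbf{I}_{p+1}\otimes\mathbf{I}_{N_h-n_h} \end{pmatrix}.
\]

After algebraic simplification, we have

\[
E(\mathbf{Z}^{*}) = \begin{pmatrix} \mathbf{R}_I \\ \mathbf{R}_{II} \end{pmatrix} E\left\{\left(\bigoplus_{h=1}^{H}\mathbf{K}_h\right)\mathbf{Y}\right\}
= \begin{pmatrix} \bigoplus_{h=1}^{H}(\mathbf{I}_{p+1}\otimes\mathbf{1}_{n_h}) \\ \bigoplus_{h=1}^{H}(\mathbf{I}_{p+1}\otimes\mathbf{1}_{N_h-n_h}) \end{pmatrix}\boldsymbol{\mu},
\]

\[
\operatorname{cov}(\mathbf{Z}^{*}) = \begin{pmatrix} \mathbf{R}_I \\ \mathbf{R}_{II} \end{pmatrix}\left(\bigoplus_{h=1}^{H}\mathbf{K}_h\right)\operatorname{cov}(\mathbf{Y})\left(\bigoplus_{h=1}^{H}\mathbf{K}_h'\right)\begin{pmatrix} \mathbf{R}_I' & \mathbf{R}_{II}' \end{pmatrix}
= \begin{pmatrix} \mathbf{V}_I & \mathbf{V}_{I,II} \\ \mathbf{V}_{II,I} & \mathbf{V}_{II} \end{pmatrix},
\]

where $\mathbf{V}_I = \bigoplus_{h=1}^{H}\mathbf{V}_{hI}$, $\mathbf{V}_{I,II} = \mathbf{V}_{II,I}' = \bigoplus_{h=1}^{H}\mathbf{V}_{h(I,II)}$ and $\mathbf{V}_{II} = \bigoplus_{h=1}^{H}\mathbf{V}_{hII}$.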
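The view of a stratified SRSWOR sample as the first $n_h$ positions of an independent random permutation within each stratum translates directly into code. The following minimal sketch uses hypothetical data and function names: it shuffles each stratum separately and splits it into a sample part ($I$) and a remaining part ($II$).

```python
import random


def stratified_permutation_sample(strata, sample_sizes, seed=0):
    """Permute each stratum independently; the first n_h units form the sample."""
    rng = random.Random(seed)
    sample, remainder = [], []
    for values, n_h in zip(strata, sample_sizes):
        permuted = list(values)
        rng.shuffle(permuted)            # one random permutation of stratum h
        sample.append(permuted[:n_h])    # part I: the SRSWOR sample
        remainder.append(permuted[n_h:]) # part II: the unobserved units
    return sample, remainder


strata = [[3, 5, 8, 13], [21, 34, 55], [89, 144]]
sizes = [2, 1, 1]
part_I, part_II = stratified_permutation_sample(strata, sizes)
```

Each stratum's sample and remainder together always reproduce the stratum, which is what the partition by $\mathbf{K}_h$ expresses in matrix form.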

6.6. Parameter of interest

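The population mean of the response variable can be defined as a weighted average of stratum means, for example,

\[
\mu^{(0)} = \begin{pmatrix} a_1 & \mathbf{0}_{1\times p} & a_2 & \mathbf{0}_{1\times p} & \cdots & a_H & \mathbf{0}_{1\times p} \end{pmatrix}\boldsymbol{\mu}
= \left(\mathbf{a}'\otimes\begin{pmatrix} 1 & \mathbf{0}_{1\times p} \end{pmatrix}\right)\boldsymbol{\mu},
\]

where $\mathbf{a} = (a_1\ a_2\ \cdots\ a_H)'$ and $a_h = N_h/N$ is the stratum weight of the $h$-th stratum, $h = 1, 2, \ldots, H$. The vector of population means of the $p+1$ variables can be written as $\boldsymbol{\mu}_{\bullet} = \{{}_c\,(\mathbf{a}'\otimes\mathbf{u}_{k+1}')\boldsymbol{\mu}\}_{k=0}^{p}$, where $\mathbf{u}_{k+1}'$ is the $(k+1)$-th row of the identity matrix $\mathbf{I}_{p+1}$.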
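As a small illustration of the weighted-average definition, the sketch below computes the stratum weights $a_h$ and stratum means $\mu_h$ for one variable and combines them as $\sum_h a_h\mu_h$. The data are hypothetical; exact fractions are used so the result can be compared with the ordinary mean of the pooled population.

```python
from fractions import Fraction


def population_mean_from_strata(strata):
    """Return sum_h a_h * mu_h with a_h = N_h / N."""
    N = sum(len(values) for values in strata)
    weights = [Fraction(len(values), N) for values in strata]          # a_h
    means = [Fraction(sum(values), len(values)) for values in strata]  # mu_h
    return sum(a * m for a, m in zip(weights, means))


strata = [[1, 2, 3, 4], [10, 20], [5]]
weighted = population_mean_from_strata(strata)
pooled = Fraction(sum(sum(v) for v in strata), sum(len(v) for v in strata))
```

The weighted average of stratum means and the pooled mean coincide, as expected from $a_h = N_h/N$.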

6.7. Formulation of the model when auxiliary variables are present

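The general approach to analyzing stratified sampling data with RPSUR models is to treat the sampling process in each stratum as an independent permutation process. When one response variable $y^{(0)}$ and $p$ auxiliary variables $y^{(1)}, y^{(2)}, \ldots, y^{(p)}$ are present, an RPSUR model similar to that for simple random sampling can be defined as

\[
\{{}_c\,\mathbf{Y}_h\}_{h=1}^{H} = \left(\bigoplus_{h=1}^{H}(\mathbf{I}_{p+1}\otimes\mathbf{1}_{N_h})\right)\{{}_c\,\boldsymbol{\mu}_h\}_{h=1}^{H} + \mathbf{E}, \tag{6.9}
\]

where $\mathbf{Y}_h = \{{}_c\,\mathbf{Y}_h^{(k)}\}_{k=0}^{p}$ is defined in (6.4). Model (6.9) can be written alternatively as

\[
\mathbf{Y} = \mathbf{X}\boldsymbol{\mu} + \mathbf{E}, \tag{6.10}
\]

where $\mathbf{Y} = \{{}_c\,\mathbf{Y}_h\}_{h=1}^{H}$, $\mathbf{X} = \bigoplus_{h=1}^{H}(\mathbf{I}_{p+1}\otimes\mathbf{1}_{N_h})$ and $\boldsymbol{\mu} = \{{}_c\,\boldsymbol{\mu}_h\}_{h=1}^{H}$.

Suppose that a set of $H$ samples of fixed sizes $n_h$, $h = 1, 2, \ldots, H$, are drawn with the SRSWOR scheme from the $H$ strata. Using a similar derivation as in Section 4.3, we can partition model (6.10) into sample and remaining portions, and write the partitioned model as

\[
\begin{pmatrix} \mathbf{Z}_I \\ \mathbf{Z}_{II} \end{pmatrix} = \begin{pmatrix} \mathbf{X}_I \\ \mathbf{X}_{II} \end{pmatrix}\boldsymbol{\mu} + \mathbf{E}, \tag{6.11}
\]

where $\mathbf{Z}_I = \{{}_c\,\mathbf{Z}_{hI}\}_{h=1}^{H}$ and $\mathbf{Z}_{II} = \{{}_c\,\mathbf{Z}_{hII}\}_{h=1}^{H}$ are as defined in (6.8), $\mathbf{X}_I = \bigoplus_{h=1}^{H}(\mathbf{I}_{p+1}\otimes\mathbf{1}_{n_h})$ and $\mathbf{X}_{II} = \bigoplus_{h=1}^{H}(\mathbf{I}_{p+1}\otimes\mathbf{1}_{N_h-n_h})$, and $\mathbf{E}$ is the vector of residuals. It is evident that $E(\mathbf{Z}_I) = \mathbf{X}_I\boldsymbol{\mu}$ and $E(\mathbf{Z}_{II}) = \mathbf{X}_{II}\boldsymbol{\mu}$.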
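The block structure of the design matrix $\mathbf{X} = \bigoplus_{h}(\mathbf{I}_{p+1}\otimes\mathbf{1}_{N_h})$ may be easier to see in a concrete case. The sketch below builds it from two small helpers for the Kronecker product and the direct sum, using plain nested lists. The helper names are illustrative and not part of the original development.

```python
def identity(k):
    return [[1 if i == j else 0 for j in range(k)] for i in range(k)]


def kron(a, b):
    """Kronecker product of two matrices given as nested lists."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def direct_sum(blocks):
    """Block-diagonal matrix with the given blocks on the diagonal."""
    total_cols = sum(len(block[0]) for block in blocks)
    out, offset = [], 0
    for block in blocks:
        width = len(block[0])
        for row in block:
            out.append([0] * offset + list(row)
                       + [0] * (total_cols - offset - width))
        offset += width
    return out


def design_matrix(p, sizes):
    """X = direct sum over strata of I_{p+1} (x) 1_{size}."""
    return direct_sum([kron(identity(p + 1), [[1]] * size) for size in sizes])


# One auxiliary variable (p = 1), two strata of sizes 2 and 1.
X = design_matrix(1, [2, 1])
```

Passing stratum sizes $N_h$ gives $\mathbf{X}$; passing $n_h$ or $N_h - n_h$ gives $\mathbf{X}_I$ or $\mathbf{X}_{II}$. Each column of the result picks out one variable within one stratum.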

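6.8. Estimating $T^{(0)}$ when one auxiliary variable is present

Without loss of generality, we first consider a simple case with only three strata $(H = 3)$ and one known auxiliary variable $(p = 1)$. We will consider two possible scenarios of interest: 1) the stratum means of the auxiliary variable for all $H$ strata are known, and 2) some but not all of the auxiliary variable stratum means are known.

Based on Model (6.10), the population total $T^{(0)}$ of the response variable can be represented using the following identity,

\[
T^{(0)} = \begin{pmatrix} \boldsymbol{\ell}_1' & \boldsymbol{\ell}_2' & \boldsymbol{\ell}_3' \end{pmatrix}\mathbf{Y} = \boldsymbol{\ell}'\mathbf{Y}, \tag{6.12}
\]

where $\boldsymbol{\ell} = (\boldsymbol{\ell}_1'\ \boldsymbol{\ell}_2'\ \boldsymbol{\ell}_3')'$ and $\boldsymbol{\ell}_h = (1\ 0)'\otimes\mathbf{1}_{N_h}$, $h = 1, 2, 3$. After sampling and partitioning $\mathbf{Z}$ into $\mathbf{Z}_I$ and $\mathbf{Z}_{II}$, $T^{(0)}$ can be written as

\[
T^{(0)} = \boldsymbol{\ell}_I'\mathbf{Z}_I + \boldsymbol{\ell}_{II}'\mathbf{Z}_{II}, \tag{6.13}
\]

where $\boldsymbol{\ell}_I = (\boldsymbol{\ell}_{1I}'\ \boldsymbol{\ell}_{2I}'\ \boldsymbol{\ell}_{3I}')'$, $\boldsymbol{\ell}_{hI} = (1\ 0)'\otimes\mathbf{1}_{n_h}$, and $\boldsymbol{\ell}_{II} = (\boldsymbol{\ell}_{1II}'\ \boldsymbol{\ell}_{2II}'\ \boldsymbol{\ell}_{3II}')'$, $\boldsymbol{\ell}_{hII} = (1\ 0)'\otimes\mathbf{1}_{N_h-n_h}$.

The primary interest is to derive an estimator for $T^{(0)}$ that is a linear function of the sample, is unbiased for $T^{(0)}$ and has minimum mean squared error.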

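6.8.1. Linear estimator for $T^{(0)}$

We define the linear estimator as

\[
\hat{T}^{(0)} = (\boldsymbol{\ell}_I' + \mathbf{w}')\mathbf{Z}_I, \tag{6.14}
\]

where $\mathbf{w} = \left\{{}_c\begin{pmatrix} \mathbf{w}_h^{(0)} \\ \mathbf{w}_h^{(1)} \end{pmatrix}\right\}_{h=1}^{H}$, and $\mathbf{w}_h^{(0)}$ and $\mathbf{w}_h^{(1)}$ have dimension $n_h\times 1$.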

6.8.2. Unbiasedness constraints

The unbiasedness for $T^{(0)}$ requires that $E(\mathbf{w}'\mathbf{Z}_I - \boldsymbol{\ell}_{II}'\mathbf{Z}_{II}) = 0$ for any possible $T^{(0)}$. This constraint is equivalent to

\[
\sum_{h=1}^{3}\{\mathbf{w}_h^{(0)\prime}\mathbf{1}_{n_h} - (N_h - n_h)\}\,\mu_h^{(0)} + \sum_{h=1}^{3}\mathbf{w}_h^{(1)\prime}\mathbf{1}_{n_h}\,\mu_h^{(1)} = 0 .
\]

To satisfy this condition for all $\mu_h^{(0)}$ and $\mu_h^{(1)}$, it is required that both terms of the left-hand side equal 0, i.e.,

\[
q^{(0)}:\quad \left\{\mathbf{w}'\left(\bigoplus_{h=1}^{3}\left(\begin{pmatrix} 1 \\ 0 \end{pmatrix}\otimes\mathbf{1}_{n_h}\right)\right) - \{{}_r\,(N_h - n_h)\}_{h=1}^{3}\right\}\boldsymbol{\mu}^{(0)} = 0,
\]

\[
q^{(1)}:\quad \mathbf{w}'\left(\bigoplus_{h=1}^{3}\left(\begin{pmatrix} 0 \\ 1 \end{pmatrix}\otimes\mathbf{1}_{n_h}\right)\right)\boldsymbol{\mu}^{(1)} = 0,
\]

where $\boldsymbol{\mu}^{(k)} = (\mu_1^{(k)}\ \mu_2^{(k)}\ \mu_3^{(k)})'$, $k = 0, 1$.

To satisfy $q^{(0)}$ for any $\boldsymbol{\mu}^{(0)}$, it is required that

\[
\left(\bigoplus_{h=1}^{3}\left(\begin{pmatrix} 1 & 0 \end{pmatrix}\otimes\mathbf{1}_{n_h}'\right)\right)\mathbf{w} - \{{}_c\,(N_h - n_h)\}_{h=1}^{3} = \mathbf{0}. \tag{6.15}
\]

Constraint $q^{(1)}$ can be satisfied either by forcing the $\mathbf{w}_h^{(1)}$ to be null vectors or through a proper transformation such that $\mu_h^{(1)} = 0$, where $h = 1, 2, 3$. Transformations of random variables have been discussed in great detail in Chapter 5; we apply those results here.

Case 1: If all stratum means of the auxiliary variable are known, $q^{(1)}$ will be satisfied by applying the following transformation,

\[
\left\{{}_c\begin{pmatrix} \mathbf{Y}_h^{(0)} \\ \mathbf{Y}_h^{*(1)} \end{pmatrix}\right\}_{h=1}^{3}
= \left\{{}_c\begin{pmatrix} \mathbf{Y}_h^{(0)} \\ \mathbf{Y}_h^{(1)} - \mu_h^{(1)}\mathbf{1}_{N_h} \end{pmatrix}\right\}_{h=1}^{3}. \tag{6.16}
\]

Case 2: If some but not all stratum-specific auxiliary means are known while the population auxiliary mean is unknown, such incomplete auxiliary information can also be used to improve estimation. To illustrate this we consider a case where two stratum auxiliary means, $\mu_1^{(1)}$ and $\mu_2^{(1)}$, are known. We apply the following transformation to incorporate the two known stratum auxiliary means,

\[
\begin{pmatrix} \mathbf{Y}_1^{*(1)} \\ \mathbf{Y}_2^{*(1)} \\ \mathbf{Y}_3^{*(1)} \end{pmatrix}
= \begin{pmatrix} \mathbf{Y}_1^{(1)} - \mu_1^{(1)}\mathbf{1}_{N_1} \\ \mathbf{Y}_2^{(1)} - \mu_2^{(1)}\mathbf{1}_{N_2} \\ \mathbf{Y}_3^{(1)} \end{pmatrix}. \tag{6.17}
\]

Therefore, constraint $q^{(1)}$ is reduced to

\[
(\mathbf{1}_{n_3}'\mathbf{w}_3^{(1)})\,\mu_3^{(1)} = 0 ;
\]

for this condition to hold for any $\mu_3^{(1)}$, it is required that $\mathbf{1}_{n_3}'\mathbf{w}_3^{(1)} = 0$.

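6.8.3. Estimators for $T^{(0)}$ after reparameterization

Following the strategy outlined in Chapter 5, we transform the auxiliary variable $\mathbf{Y}^{(1)}$ to $\mathbf{Y}^{*(1)}$ using either (6.16) or (6.17), whichever is appropriate. As an analogue to (5.2) of Chapter 5, it can be shown that $\operatorname{cov}(\mathbf{Y}^{*(1)}) = \operatorname{cov}(\mathbf{Y}^{(1)})$.

Based on Model (6.11), the partitioned model can be written as

\[
\begin{pmatrix} \mathbf{Z}_I^{*} \\ \mathbf{Z}_{II}^{*} \end{pmatrix} = \begin{pmatrix} \mathbf{X}_I \\ \mathbf{X}_{II} \end{pmatrix}\boldsymbol{\mu}^{*} + \mathbf{E}, \tag{6.18}
\]

where $\mathbf{Z}_I^{*} = \left\{{}_c\begin{pmatrix} \mathbf{Y}_{hI}^{(0)} \\ \mathbf{Y}_{hI}^{*(1)} \end{pmatrix}\right\}_{h=1}^{3}$, $\mathbf{Z}_{II}^{*} = \left\{{}_c\begin{pmatrix} \mathbf{Y}_{hII}^{(0)} \\ \mathbf{Y}_{hII}^{*(1)} \end{pmatrix}\right\}_{h=1}^{3}$, $\mathbf{X}_I = \bigoplus_{h=1}^{3}(\mathbf{I}_{p+1}\otimes\mathbf{1}_{n_h})$, $\mathbf{X}_{II} = \bigoplus_{h=1}^{3}(\mathbf{I}_{p+1}\otimes\mathbf{1}_{N_h-n_h})$, and $\boldsymbol{\mu}^{*} = (\mu_1^{(0)}\ 0\ \mu_2^{(0)}\ 0\ \mu_3^{(0)}\ 0)'$.

The variance of $\hat{T}^{(0)}$ is thus

\[
\operatorname{var}(\hat{T}^{(0)}) = \begin{pmatrix} \mathbf{w}' & -\boldsymbol{\ell}_{II}' \end{pmatrix}
\begin{pmatrix} \mathbf{V}_I & \mathbf{V}_{I,II} \\ \mathbf{V}_{I,II}' & \mathbf{V}_{II} \end{pmatrix}
\begin{pmatrix} \mathbf{w} \\ -\boldsymbol{\ell}_{II} \end{pmatrix}.
\]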

6.8.4. Case 1: all stratum means of the auxiliary variable are known

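When all stratum means of the auxiliary variable are known, we apply transformation (6.16), and define the optimization function as

\[
\Phi(\mathbf{w}) = \mathbf{w}'\mathbf{V}_I\mathbf{w} - 2\mathbf{w}'\mathbf{V}_{I,II}\boldsymbol{\ell}_{II} + 2\boldsymbol{\lambda}'\left\{\mathbf{X}_I^{*\prime}\mathbf{w} - \{{}_c\,(N_h - n_h)\}_{h=1}^{3}\right\},
\]

where $\mathbf{X}_I^{*} = \mathbf{X}_I\left(\bigoplus_{h=1}^{3}(1\ 0)'\right) = \bigoplus_{h=1}^{3}\left((1\ 0)'\otimes\mathbf{1}_{n_h}\right)$. Differentiating $\Phi(\mathbf{w})$ with respect to $\mathbf{w}$ and $\boldsymbol{\lambda}$, and setting the derivatives to zero leads to the following solution,

\[
\hat{\mathbf{w}} = -\boldsymbol{\ell}_I + \left\{{}_c\,\frac{N_h}{n_h}\begin{pmatrix} 1 \\ -\beta_h \end{pmatrix}\otimes\mathbf{1}_{n_h}\right\}_{h=1}^{3}, \tag{6.19}
\]

where $\beta_h = \sigma_{01,h}/\sigma_{1,h}^{2}$ is the linear regression coefficient of $\mathbf{y}_h^{(0)}$ on $\mathbf{y}_h^{*(1)} = \mathbf{y}_h^{(1)} - \mu_h^{(1)}\mathbf{1}_{N_h}$ in the $h$-th stratum of the finite population (Cochran 1977).

The estimator for $T^{(0)}$ can be written as

\[
\hat{T}^{(0)} = \left\{{}_c\,\frac{N_h}{n_h}\begin{pmatrix} 1 \\ -\beta_h \end{pmatrix}\otimes\mathbf{1}_{n_h}\right\}_{h=1}^{3\ \prime}\mathbf{Z}_I^{*}
= \sum_{h=1}^{3} N_h\,\bar{y}_h^{(0)} - \sum_{h=1}^{3} N_h\,\beta_h\,(\bar{y}_h^{(1)} - \mu_h^{(1)}),
\]

where $\bar{y}_h^{(0)}$ and $\bar{y}_h^{(1)}$ are the sample means of the response and auxiliary variables in the $h$-th stratum. The variance of $\hat{T}^{(0)}$ is

\[
\operatorname{var}(\hat{T}^{(0)}) = \sum_{h=1}^{3} N_h^{2}\,\frac{1-f_h}{n_h}\,(1-\rho_h^{2})\,\sigma_{0,h}^{2},
\]

where $\rho_h = \dfrac{\sigma_{01,h}}{\sigma_{0,h}\,\sigma_{1,h}}$ is the correlation coefficient between the response variable and the auxiliary variable in the $h$-th stratum.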
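To make the Case 1 formulas concrete, the following minimal sketch computes the estimated total and its variance from per-stratum inputs. It assumes, as in the derivation, that the regression coefficients $\beta_h$, variances $\sigma_{0,h}^2$ and correlations $\rho_h$ are known population quantities; all numbers and function names are hypothetical.

```python
from statistics import fmean


def stratified_regression_total(samples, stratum_sizes, aux_means, betas):
    """sum_h N_h * (ybar0_h - beta_h * (ybar1_h - mu1_h))."""
    total = 0.0
    for (y0, y1), N_h, mu1_h, beta_h in zip(samples, stratum_sizes,
                                            aux_means, betas):
        total += N_h * (fmean(y0) - beta_h * (fmean(y1) - mu1_h))
    return total


def stratified_regression_variance(stratum_sizes, sample_sizes, sigma0_sq, rho):
    """sum_h N_h^2 * (1 - f_h) / n_h * (1 - rho_h^2) * sigma0_h^2."""
    return sum(
        N_h ** 2 * (1 - n_h / N_h) / n_h * (1 - r ** 2) * s0
        for N_h, n_h, s0, r in zip(stratum_sizes, sample_sizes, sigma0_sq, rho)
    )


# Hypothetical two-stratum example: (response sample, auxiliary sample).
samples = [([2.0, 4.0], [1.0, 3.0]), ([10.0], [5.0])]
estimate = stratified_regression_total(samples, [4, 2], [1.5, 5.0], [1.0, 2.0])
variance = stratified_regression_variance([4, 2], [2, 1], [1.0, 4.0], [0.0, 0.5])
```

In the first stratum the sample auxiliary mean (2.0) exceeds the known stratum mean (1.5), so the response mean is adjusted downward before expansion; in the second stratum no adjustment occurs because the two means agree.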

6.8.5. Case 2: some but not all stratum means of the auxiliary variable are known

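When some but not all stratum means of the auxiliary variable are known, we apply transformation (6.17) and define the optimization function as

\[
\Phi(\mathbf{w}) = \mathbf{w}'\mathbf{V}_I\mathbf{w} - 2\mathbf{w}'\mathbf{V}_{I,II}\boldsymbol{\ell}_{II} + 2\boldsymbol{\lambda}'\left\{\mathbf{X}_I^{*\prime}\mathbf{w} - \begin{pmatrix} \{{}_c\,(N_h - n_h)\}_{h=1}^{3} \\ 0 \end{pmatrix}\right\},
\]

where

\[
\mathbf{X}_I^{*\prime} = \begin{pmatrix} \bigoplus_{h=1}^{3}(1\ 0) \\ (\mathbf{0}_{1\times 5}\ \ 1) \end{pmatrix}\mathbf{X}_I'
= \begin{pmatrix} \bigoplus_{h=1}^{3}\left((1\ 0)\otimes\mathbf{1}_{n_h}'\right) \\ \left(\mathbf{0}_{1\times(2n_1+2n_2+n_3)}\ \ \mathbf{1}_{n_3}'\right) \end{pmatrix}.
\]

Differentiating $\Phi(\mathbf{w})$ with respect to $\mathbf{w}$ and $\boldsymbol{\lambda}$ and setting the derivatives to zero results in the following estimating equations,

\[
\begin{pmatrix} \mathbf{V}_I & \mathbf{X}_I^{*} \\ \mathbf{X}_I^{*\prime} & \mathbf{0} \end{pmatrix}
\begin{pmatrix} \hat{\mathbf{w}} \\ \hat{\boldsymbol{\lambda}} \end{pmatrix}
= \begin{pmatrix} \mathbf{V}_{I,II}\boldsymbol{\ell}_{II} \\ \begin{pmatrix} \{{}_c\,(N_h - n_h)\}_{h=1}^{3} \\ 0 \end{pmatrix} \end{pmatrix}. \tag{6.20}
\]

Solving (6.20) leads to

\[
\begin{cases}
\hat{\mathbf{w}} = \mathbf{V}_I^{-1}\mathbf{V}_{I,II}\boldsymbol{\ell}_{II} - \mathbf{V}_I^{-1}\mathbf{X}_I^{*}\hat{\boldsymbol{\lambda}}, \\[1ex]
\hat{\boldsymbol{\lambda}} = (\mathbf{X}_I^{*\prime}\mathbf{V}_I^{-1}\mathbf{X}_I^{*})^{-1}\left\{\mathbf{X}_I^{*\prime}\mathbf{V}_I^{-1}\mathbf{V}_{I,II}\boldsymbol{\ell}_{II} - \begin{pmatrix} \{{}_c\,(N_h - n_h)\}_{h=1}^{3} \\ 0 \end{pmatrix}\right\}.
\end{cases}
\]

After algebraic simplification, we have

\[
\hat{\mathbf{w}} = -\left\{{}_c\begin{pmatrix} 1 \\ 0 \end{pmatrix}\otimes\mathbf{1}_{n_h}\right\}_{h=1}^{3}
+ \begin{pmatrix} \left\{{}_c\,\dfrac{N_h}{n_h}\begin{pmatrix} 1 \\ -\beta_h \end{pmatrix}\otimes\mathbf{1}_{n_h}\right\}_{h=1}^{2} \\[2ex] \dfrac{N_3}{n_3}\begin{pmatrix} 1 \\ 0 \end{pmatrix}\otimes\mathbf{1}_{n_3} \end{pmatrix}, \tag{6.21}
\]

where $\beta_h = \sigma_{01,h}/\sigma_{1,h}^{2}$, $h = 1, 2$. Consequently, the estimator for $T^{(0)}$ and its variance are

\[
\begin{aligned}
\hat{T}^{(0)} = (\hat{\mathbf{w}} + \boldsymbol{\ell}_I)'\mathbf{Z}_I^{*}
&= \sum_{h=1}^{2}\frac{N_h}{n_h}\left(\mathbf{1}_{n_h}'\mathbf{Y}_{hI}^{(0)} - \beta_h\,\mathbf{1}_{n_h}'\mathbf{Y}_{hI}^{*(1)}\right) + \frac{N_3}{n_3}\,\mathbf{1}_{n_3}'\mathbf{Y}_{3I}^{(0)} \\
&= \sum_{h=1}^{2} N_h\left\{\bar{y}_h^{(0)} - \beta_h(\bar{y}_h^{(1)} - \mu_h^{(1)})\right\} + N_3\,\bar{y}_3^{(0)},
\end{aligned}
\]

\[
\operatorname{var}(\hat{T}^{(0)}) = (\hat{\mathbf{w}} + \boldsymbol{\ell}_I)'\operatorname{cov}(\mathbf{Z}_I^{*})(\hat{\mathbf{w}} + \boldsymbol{\ell}_I)
= \sum_{h=1}^{2} N_h^{2}\,\frac{1-f_h}{n_h}\,(1-\rho_h^{2})\,\sigma_{0,h}^{2} + N_3^{2}\,\frac{1-f_3}{n_3}\,\sigma_{0,3}^{2} .
\]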


Case I: All stratum means of the auxiliary variable are known

As shown in Section 6.8, under the unbiasedness constraint of $\hat{T}^{(0)}$ for $T^{(0)}$ and after transformation (6.16), the optimization function is defined as

\[
\Phi(\mathbf{w}) = \mathbf{w}'\mathbf{V}_I\mathbf{w} - 2\mathbf{w}'\mathbf{V}_{I,II}\boldsymbol{\ell}_{II} + 2\boldsymbol{\lambda}'\left\{\mathbf{X}_I^{*\prime}\mathbf{w} - \{{}_c\,(N_h - n_h)\}_{h=1}^{3}\right\},
\]

where $\mathbf{w} = (\mathbf{w}_1^{(0)\prime}\ \mathbf{w}_1^{(1)\prime}\ \mathbf{w}_2^{(0)\prime}\ \mathbf{w}_2^{(1)\prime}\ \mathbf{w}_3^{(0)\prime}\ \mathbf{w}_3^{(1)\prime})'$, $\mathbf{w}_h^{(k)}$ has dimension $n_h\times 1$, and $\mathbf{X}_I^{*} = \mathbf{X}_I\left(\bigoplus_{h=1}^{3}(1\ 0)'\right) = \bigoplus_{h=1}^{3}\left((1\ 0)'\otimes\mathbf{1}_{n_h}\right)$. Differentiating $\Phi(\mathbf{w})$ with respect to $\mathbf{w}$ and $\boldsymbol{\lambda}$ leads to the following estimating equations

\[
\begin{pmatrix} \mathbf{V}_I & \mathbf{X}_I^{*} \\ \mathbf{X}_I^{*\prime} & \mathbf{0} \end{pmatrix}
\begin{pmatrix} \mathbf{w} \\ \boldsymbol{\lambda} \end{pmatrix}
= \begin{pmatrix} \mathbf{V}_{I,II}\boldsymbol{\ell}_{II} \\ \{{}_c\,(N_h - n_h)\}_{h=1}^{3} \end{pmatrix},
\]

which has a unique solution,

\[
\begin{aligned}
\hat{\boldsymbol{\lambda}} &= (\mathbf{X}_I^{*\prime}\mathbf{V}_I^{-1}\mathbf{X}_I^{*})^{-1}\left(\mathbf{X}_I^{*\prime}\mathbf{V}_I^{-1}\mathbf{V}_{I,II}\boldsymbol{\ell}_{II} - \{{}_c\,(N_h - n_h)\}_{h=1}^{3}\right), \\
\hat{\mathbf{w}} &= \mathbf{V}_I^{-1}\mathbf{V}_{I,II}\boldsymbol{\ell}_{II} - \mathbf{V}_I^{-1}\mathbf{X}_I^{*}\hat{\boldsymbol{\lambda}} .
\end{aligned}
\]

Since $\mathbf{V}_I^{-1} = \bigoplus_{h=1}^{3}\mathbf{V}_{hI}^{-1}$, $\mathbf{V}_{I,II} = \mathbf{V}_{II,I}' = \bigoplus_{h=1}^{3}\mathbf{V}_{h(I,II)}$ and $\mathbf{V}_I^{-1}\mathbf{V}_{I,II} = \bigoplus_{h=1}^{3}\mathbf{V}_{hI}^{-1}\mathbf{V}_{h(I,II)}$, we have

\[
\begin{aligned}
\hat{\boldsymbol{\lambda}}
&= \left(\bigoplus_{h=1}^{3}\left((1\ 0)\otimes\mathbf{1}_{n_h}'\right)\mathbf{V}_{hI}^{-1}\left((1\ 0)'\otimes\mathbf{1}_{n_h}\right)\right)^{-1} \\
&\quad\times\left(\left\{{}_c\left((1\ 0)\otimes\mathbf{1}_{n_h}'\right)\mathbf{V}_{hI}^{-1}\mathbf{V}_{h(I,II)}\boldsymbol{\ell}_{hII}\right\}_{h=1}^{3} - \{{}_c\,(N_h - n_h)\}_{h=1}^{3}\right),
\end{aligned}
\]

\[
\hat{\mathbf{w}} = \left(\bigoplus_{h=1}^{3}\mathbf{V}_{hI}^{-1}\mathbf{V}_{h(I,II)}\right)\boldsymbol{\ell}_{II} - \left(\bigoplus_{h=1}^{3}\mathbf{V}_{hI}^{-1}\left((1\ 0)'\otimes\mathbf{1}_{n_h}\right)\right)\hat{\boldsymbol{\lambda}} .
\]

Since $\mathbf{V}_{hI} = \boldsymbol{\Sigma}_h\otimes\mathbf{P}_{n_h,N_h}$, $\mathbf{V}_{hI}^{-1} = \boldsymbol{\Sigma}_h^{-1}\otimes\mathbf{P}_{n_h,N_h}^{-1}$, $\mathbf{V}_{h(I,II)} = \mathbf{V}_{h(II,I)}' = \boldsymbol{\Sigma}_h\otimes\left(-\frac{1}{N_h}\mathbf{J}_{n_h\times(N_h-n_h)}\right)$, $\mathbf{V}_{hI}^{-1}\mathbf{V}_{h(I,II)} = \mathbf{I}_2\otimes\left(-\frac{1}{N_h-n_h}\mathbf{J}_{n_h\times(N_h-n_h)}\right)$, and $\boldsymbol{\ell}_{II} = \left\{{}_c\,(1\ 0)'\otimes\mathbf{1}_{N_h-n_h}\right\}_{h=1}^{3}$,

\[
\begin{aligned}
\hat{\boldsymbol{\lambda}}
&= \left(\bigoplus_{h=1}^{3}\left((1\ 0)\boldsymbol{\Sigma}_h^{-1}(1\ 0)'\right)\left(\mathbf{1}_{n_h}'\mathbf{P}_{n_h,N_h}^{-1}\mathbf{1}_{n_h}\right)\right)^{-1}\left(-\{{}_c\,n_h\}_{h=1}^{3} - \{{}_c\,(N_h - n_h)\}_{h=1}^{3}\right) \\
&= -\left(\bigoplus_{h=1}^{3}\frac{N_h n_h}{N_h - n_h}\,(1\ 0)\boldsymbol{\Sigma}_h^{-1}(1\ 0)'\right)^{-1}\{{}_c\,N_h\}_{h=1}^{3} \\
&= -\left\{{}_c\,\frac{N_h - n_h}{n_h}\left((1\ 0)\boldsymbol{\Sigma}_h^{-1}(1\ 0)'\right)^{-1}\right\}_{h=1}^{3} \\
&= -\left\{{}_c\,\frac{N_h - n_h}{n_h}\,\sigma_{0,h}^{2}\left(1 - \frac{\sigma_{01,h}^{2}}{\sigma_{0,h}^{2}\sigma_{1,h}^{2}}\right)\right\}_{h=1}^{3},
\end{aligned}
\]

and

\[
\begin{aligned}
\hat{\mathbf{w}}
&= \left(\bigoplus_{h=1}^{3}\mathbf{I}_2\otimes\left(-\frac{1}{N_h - n_h}\mathbf{J}_{n_h\times(N_h-n_h)}\right)\right)\boldsymbol{\ell}_{II}
- \left(\bigoplus_{h=1}^{3}\frac{N_h}{N_h - n_h}\,\boldsymbol{\Sigma}_h^{-1}(1\ 0)'\otimes\mathbf{1}_{n_h}\right)\hat{\boldsymbol{\lambda}} \\
&= -\left\{{}_c\,(1\ 0)'\otimes\mathbf{1}_{n_h}\right\}_{h=1}^{3}
+ \left\{{}_c\,\frac{N_h}{n_h}\left((1\ 0)\boldsymbol{\Sigma}_h^{-1}(1\ 0)'\right)^{-1}\boldsymbol{\Sigma}_h^{-1}(1\ 0)'\otimes\mathbf{1}_{n_h}\right\}_{h=1}^{3} \\
&= -\boldsymbol{\ell}_I + \left\{{}_c\,\frac{N_h}{n_h}\begin{pmatrix} 1 \\ -\sigma_{01,h}/\sigma_{1,h}^{2} \end{pmatrix}\otimes\mathbf{1}_{n_h}\right\}_{h=1}^{3} \\
&= -\boldsymbol{\ell}_I + \left\{{}_c\,\frac{N_h}{n_h}\begin{pmatrix} 1 \\ -\beta_h \end{pmatrix}\otimes\mathbf{1}_{n_h}\right\}_{h=1}^{3},
\end{aligned}
\]

where $\beta_h = \sigma_{01,h}/\sigma_{1,h}^{2}$.

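Therefore, the estimator for $T^{(0)}$ can be written as

\[
\begin{aligned}
\hat{T}^{(0)}
&= \left\{{}_r\,\frac{N_h}{n_h}\begin{pmatrix} 1 & -\beta_h \end{pmatrix}\otimes\mathbf{1}_{n_h}'\right\}_{h=1}^{3}\mathbf{Z}_I^{*} \\
&= \sum_{h=1}^{3}\frac{N_h}{n_h}\,\mathbf{1}_{n_h}'\mathbf{Y}_{hI}^{(0)} - \sum_{h=1}^{3}\frac{N_h}{n_h}\,\beta_h\,\mathbf{1}_{n_h}'\mathbf{Y}_{hI}^{*(1)} \\
&= \sum_{h=1}^{3} N_h\,\bar{y}_h^{(0)} - \sum_{h=1}^{3} N_h\,\beta_h\,(\bar{y}_h^{(1)} - \mu_h^{(1)}).
\end{aligned}
\]

The variance of $\hat{T}^{(0)}$ is

\[
\begin{aligned}
\operatorname{var}(\hat{T}^{(0)})
&= \left\{{}_r\,\frac{N_h}{n_h}\begin{pmatrix} 1 & -\beta_h \end{pmatrix}\otimes\mathbf{1}_{n_h}'\right\}_{h=1}^{3}\operatorname{cov}(\mathbf{Z}_I^{*})\left\{{}_c\,\frac{N_h}{n_h}\begin{pmatrix} 1 \\ -\beta_h \end{pmatrix}\otimes\mathbf{1}_{n_h}\right\}_{h=1}^{3} \\
&= \left\{{}_r\,\frac{N_h}{n_h}\begin{pmatrix} 1 & -\beta_h \end{pmatrix}\otimes\mathbf{1}_{n_h}'\right\}_{h=1}^{3}\left(\bigoplus_{h=1}^{3}\mathbf{V}_{hI}\right)\left\{{}_c\,\frac{N_h}{n_h}\begin{pmatrix} 1 \\ -\beta_h \end{pmatrix}\otimes\mathbf{1}_{n_h}\right\}_{h=1}^{3} \\
&= \sum_{h=1}^{3}\left(\frac{N_h}{n_h}\right)^{2}\left(\begin{pmatrix} 1 & -\beta_h \end{pmatrix}\otimes\mathbf{1}_{n_h}'\right)\left(\boldsymbol{\Sigma}_h\otimes\mathbf{P}_{n_h,N_h}\right)\left(\begin{pmatrix} 1 \\ -\beta_h \end{pmatrix}\otimes\mathbf{1}_{n_h}\right) \\
&= \sum_{h=1}^{3}\frac{N_h(N_h - n_h)}{n_h}\begin{pmatrix} 1 & -\beta_h \end{pmatrix}\boldsymbol{\Sigma}_h\begin{pmatrix} 1 \\ -\beta_h \end{pmatrix} \\
&= \sum_{h=1}^{3}\frac{N_h(N_h - n_h)}{n_h}\left\{\sigma_{0,h}^{2} - 2\beta_h\sigma_{01,h} + \beta_h^{2}\sigma_{1,h}^{2}\right\} \\
&= \sum_{h=1}^{3}\frac{N_h(N_h - n_h)}{n_h}\left\{\sigma_{0,h}^{2} - \frac{\sigma_{01,h}^{2}}{\sigma_{1,h}^{2}}\right\} \\
&= \sum_{h=1}^{3} N_h^{2}\,\frac{1-f_h}{n_h}\,(1-\rho_h^{2})\,\sigma_{0,h}^{2},
\end{aligned}
\]

where $\rho_h = \dfrac{\sigma_{01,h}}{\sigma_{0,h}\,\sigma_{1,h}}$ is the correlation coefficient between the response variable and the auxiliary variable in the $h$-th stratum.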


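When some but not all stratum means for the auxiliary variable are known, we apply transformation (6.17) to $\mathbf{Y}_h^{(1)}$ and partition the result into sample and remaining parts, $\mathbf{Z}_I$ and $\mathbf{Z}_{II}$. The optimization function is defined as

\[
\Phi(\mathbf{w}) = \mathbf{w}'\mathbf{V}_I\mathbf{w} - 2\mathbf{w}'\mathbf{V}_{I,II}\boldsymbol{\ell}_{II} + 2\boldsymbol{\lambda}'\left\{\mathbf{X}_I^{*\prime}\mathbf{w} - \begin{pmatrix} \{{}_c\,(N_h - n_h)\}_{h=1}^{3} \\ 0 \end{pmatrix}\right\},
\]

where

\[
\mathbf{X}_I^{*\prime} = \begin{pmatrix} \bigoplus_{h=1}^{3}(1\ 0) \\ (\mathbf{0}_{1\times 5}\ \ 1) \end{pmatrix}\mathbf{X}_I'
= \begin{pmatrix} \bigoplus_{h=1}^{3}\left((1\ 0)\otimes\mathbf{1}_{n_h}'\right) \\ \left(\mathbf{0}_{1\times(2n_1+2n_2+n_3)}\ \ \mathbf{1}_{n_3}'\right) \end{pmatrix}.
\]

Differentiating $\Phi(\mathbf{w})$ with respect to $\mathbf{w}$ and $\boldsymbol{\lambda}$ and setting the derivatives to zero results in the following estimating equations,

\[
\begin{pmatrix} \mathbf{V}_I & \mathbf{X}_I^{*} \\ \mathbf{X}_I^{*\prime} & \mathbf{0} \end{pmatrix}
\begin{pmatrix} \hat{\mathbf{w}} \\ \hat{\boldsymbol{\lambda}} \end{pmatrix}
= \begin{pmatrix} \mathbf{V}_{I,II}\boldsymbol{\ell}_{II} \\ \begin{pmatrix} \{{}_c\,(N_h - n_h)\}_{h=1}^{3} \\ 0 \end{pmatrix} \end{pmatrix}. \tag{6.22}
\]

After simplification,

\[
\mathbf{X}_I^{*} = \begin{pmatrix} \bigoplus_{h=1}^{2}\left((1\ 0)'\otimes\mathbf{1}_{n_h}\right) & \mathbf{0}_{(2n_1+2n_2)\times 2} \\ \mathbf{0}_{2n_3\times 2} & \mathbf{I}_2\otimes\mathbf{1}_{n_3} \end{pmatrix}.
\]

Solving (6.22) leads to

\[
\begin{cases}
\hat{\mathbf{w}} = \mathbf{V}_I^{-1}\mathbf{V}_{I,II}\boldsymbol{\ell}_{II} - \mathbf{V}_I^{-1}\mathbf{X}_I^{*}\hat{\boldsymbol{\lambda}}, \\[1ex]
\hat{\boldsymbol{\lambda}} = (\mathbf{X}_I^{*\prime}\mathbf{V}_I^{-1}\mathbf{X}_I^{*})^{-1}\left\{\mathbf{X}_I^{*\prime}\mathbf{V}_I^{-1}\mathbf{V}_{I,II}\boldsymbol{\ell}_{II} - \begin{pmatrix} \{{}_c\,(N_h - n_h)\}_{h=1}^{3} \\ 0 \end{pmatrix}\right\}.
\end{cases}
\]

Since $\mathbf{V}_I = \bigoplus_{h=1}^{3}\mathbf{V}_{hI}$, $\mathbf{V}_{hI}^{-1} = \boldsymbol{\Sigma}_h^{-1}\otimes\mathbf{P}_{n_h,N_h}^{-1}$, $\mathbf{V}_{I,II} = \mathbf{V}_{II,I}' = \bigoplus_{h=1}^{3}\mathbf{V}_{h(I,II)}$, $\mathbf{V}_{II} = \bigoplus_{h=1}^{3}\mathbf{V}_{hII}$, $\mathbf{V}_{h(I,II)} = \mathbf{V}_{h(II,I)}' = \boldsymbol{\Sigma}_h\otimes\left(-\frac{1}{N_h}\mathbf{J}_{n_h\times(N_h-n_h)}\right)$, and $\boldsymbol{\ell}_{II} = \left\{{}_c\,(1\ 0)'\otimes\mathbf{1}_{N_h-n_h}\right\}_{h=1}^{3}$, we have

\[
\mathbf{V}_{hI}^{-1}\mathbf{V}_{h(I,II)} = \mathbf{I}_2\otimes\left(-\frac{1}{N_h - n_h}\mathbf{J}_{n_h\times(N_h-n_h)}\right),
\qquad
\mathbf{V}_I^{-1}\mathbf{V}_{I,II}\boldsymbol{\ell}_{II} = -\left\{{}_c\begin{pmatrix} 1 \\ 0 \end{pmatrix}\otimes\mathbf{1}_{n_h}\right\}_{h=1}^{3},
\]

and consequently,

\[
\mathbf{X}_I^{*\prime}\mathbf{V}_I^{-1}\mathbf{V}_{I,II}\boldsymbol{\ell}_{II}
= -\begin{pmatrix} \bigoplus_{h=1}^{2}\left((1\ 0)\otimes\mathbf{1}_{n_h}'\right) & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_2\otimes\mathbf{1}_{n_3}' \end{pmatrix}\left\{{}_c\begin{pmatrix} 1 \\ 0 \end{pmatrix}\otimes\mathbf{1}_{n_h}\right\}_{h=1}^{3}
= -\begin{pmatrix} \{{}_c\,n_h\}_{h=1}^{2} \\ \begin{pmatrix} 1 \\ 0 \end{pmatrix} n_3 \end{pmatrix},
\]

\[
\mathbf{X}_I^{*\prime}\mathbf{V}_I^{-1}\mathbf{V}_{I,II}\boldsymbol{\ell}_{II} - \begin{pmatrix} \{{}_c\,(N_h - n_h)\}_{h=1}^{3} \\ 0 \end{pmatrix}
= -\begin{pmatrix} \{{}_c\,N_h\}_{h=1}^{2} \\ \begin{pmatrix} 1 \\ 0 \end{pmatrix} N_3 \end{pmatrix},
\]

\[
\mathbf{X}_I^{*\prime}\mathbf{V}_I^{-1}\mathbf{X}_I^{*}
= \begin{pmatrix} \bigoplus_{h=1}^{2}\dfrac{N_h n_h}{N_h - n_h}\left\{\sigma_{0,h}^{2}(1-\rho_{01,h}^{2})\right\}^{-1} & \mathbf{0} \\ \mathbf{0} & \dfrac{N_3 n_3}{N_3 - n_3}\,\boldsymbol{\Sigma}_3^{-1} \end{pmatrix},
\]

\[
\hat{\boldsymbol{\lambda}} = -\begin{pmatrix} \left\{{}_c\,\dfrac{N_h - n_h}{n_h}\,\sigma_{0,h}^{2}(1-\rho_{01,h}^{2})\right\}_{h=1}^{2} \\[2ex] \dfrac{N_3 - n_3}{n_3}\begin{pmatrix} \sigma_{0,3}^{2} \\ \sigma_{01,3} \end{pmatrix} \end{pmatrix}.
\]

Substituting $\hat{\boldsymbol{\lambda}}$ into the expression for $\hat{\mathbf{w}}$ gives

\[
\begin{aligned}
\hat{\mathbf{w}}
&= -\left\{{}_c\begin{pmatrix} 1 \\ 0 \end{pmatrix}\otimes\mathbf{1}_{n_h}\right\}_{h=1}^{3}
- \begin{pmatrix} \bigoplus_{h=1}^{2}\left(\boldsymbol{\Sigma}_h^{-1}\otimes\mathbf{P}_{n_h,N_h}^{-1}\right)\left(\begin{pmatrix} 1 \\ 0 \end{pmatrix}\otimes\mathbf{1}_{n_h}\right) & \mathbf{0} \\ \mathbf{0} & \left(\boldsymbol{\Sigma}_3^{-1}\otimes\mathbf{P}_{n_3,N_3}^{-1}\right)\left(\mathbf{I}_2\otimes\mathbf{1}_{n_3}\right) \end{pmatrix}\hat{\boldsymbol{\lambda}} \\
&= -\left\{{}_c\begin{pmatrix} 1 \\ 0 \end{pmatrix}\otimes\mathbf{1}_{n_h}\right\}_{h=1}^{3}
+ \begin{pmatrix} \left\{{}_c\,\dfrac{N_h - n_h}{n_h}\,\sigma_{0,h}^{2}(1-\rho_{01,h}^{2})\left(\boldsymbol{\Sigma}_h^{-1}\begin{pmatrix} 1 \\ 0 \end{pmatrix}\otimes\mathbf{P}_{n_h,N_h}^{-1}\mathbf{1}_{n_h}\right)\right\}_{h=1}^{2} \\[2ex] \dfrac{N_3 - n_3}{n_3}\left(\boldsymbol{\Sigma}_3^{-1}\begin{pmatrix} \sigma_{0,3}^{2} \\ \sigma_{01,3} \end{pmatrix}\otimes\mathbf{P}_{n_3,N_3}^{-1}\mathbf{1}_{n_3}\right) \end{pmatrix}.
\end{aligned}
\]

This can be further simplified as

\[
\hat{\mathbf{w}} = -\left\{{}_c\begin{pmatrix} 1 \\ 0 \end{pmatrix}\otimes\mathbf{1}_{n_h}\right\}_{h=1}^{3}
+ \begin{pmatrix} \left\{{}_c\,\dfrac{N_h}{n_h}\begin{pmatrix} 1 \\ -\beta_{01,h} \end{pmatrix}\otimes\mathbf{1}_{n_h}\right\}_{h=1}^{2} \\[2ex] \dfrac{N_3}{n_3}\begin{pmatrix} 1 \\ 0 \end{pmatrix}\otimes\mathbf{1}_{n_3} \end{pmatrix},
\]

where $\beta_{01,h} = \sigma_{01,h}/\sigma_{1,h}^{2}$, $h = 1, 2$.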


CHAPTER 7

APPLICATIONS TO RATE ESTIMATION

7.1. Motivations

This Chapter discusses applications of the methods developed in Chapters 4, 5

and 6 to rate estimation that is common in epidemiologic surveys. We will consider

scenarios involving up to two auxiliary variables, either categorical or continuous.

7.2. Cases with single categorical auxiliary variable

We first consider the scenario with a single categorical auxiliary variable that has only two categories (binomial), and then extend the results to the case with more than two categories.

7.2.1. Binomial auxiliary variable

Suppose that a sample survey is conducted to estimate the disease prevalence of a population. A SRSWOR sample of size $n$ is drawn from the population, and the auxiliary variable is the subjects' gender. In practice, the following scenarios may occur:

1) Gender is known for all subjects, and

2) Only the total numbers of males $(N_m)$ and females $(N_f)$ are known.

From Section 5.3 of Chapter 5, we know the estimator for population disease prevalence can be written as

\[
\hat{r}_0 = r_0 + \beta_{01}(R_1 - r_1), \tag{7.1}
\]

where $r_0$ is the disease rate of the sample, $r_1$ is the proportion of males in the sample, $R_1 = N_m/N$ is the proportion of males in the population, and $\beta_{01} = \sigma_{01}/\sigma_1^{2}$ is the population regression coefficient of disease status on gender.

When either the gender of all subjects or the total numbers of males (females) of the population is known, $\sigma_1^{2}$ is known, that is,

\[
\sigma_1^{2} = \frac{N_m N_f}{N(N-1)} .
\]

For this scenario, knowledge of the total number of males or females is adequate.

By the method of moments, we replace the other variance components, $\sigma_0^{2}$ and $\sigma_{01}$, with their sample estimates,

\[
\hat{\sigma}_0^{2} = \frac{n_0(n - n_0)}{n(n-1)}
\quad\text{and}\quad
\hat{\sigma}_{01} = \frac{n_m n_f}{n(n-1)}\left(\frac{n_{0m}}{n_m} - \frac{n_{0f}}{n_f}\right),
\]

where $n_m$ and $n_f$ are the numbers of males and females in the sample, $n_0 = n_{0m} + n_{0f}$, and $n_{0m}$ and $n_{0f}$ are the numbers of male and female disease bearers in the sample, respectively. Consequently, using $\sigma_1^{2}$,

\[
\hat{\beta}_{01} = \frac{n_m n_f}{n(n-1)}\,\frac{N(N-1)}{N_m N_f}\left(\frac{n_{0m}}{n_m} - \frac{n_{0f}}{n_f}\right).
\]

The estimator for disease prevalence is simplified to

\[
\hat{r}_0 = \frac{n_0}{n} + \frac{N-1}{n-1}\,\frac{n_m n_f}{n^{2}}\left(\frac{n_f}{N_f} - \frac{n_m}{N_m}\right)\left(\frac{n_{0m}}{n_m} - \frac{n_{0f}}{n_f}\right). \tag{7.2}
\]

Its variance is

\[
\begin{aligned}
\hat{V}(\hat{r}_0)
&= \frac{1-f}{n}\,\frac{n_0(n - n_0)}{n(n-1)}\,(1 - \hat{\rho}_{01}^{2}) \\
&= \frac{1-f}{n}\left\{\frac{n_0(n - n_0)}{n(n-1)} - \frac{N(N-1)}{N_m N_f}\left(\frac{n_m n_f}{n(n-1)}\right)^{2}\left(\frac{n_{0m}}{n_m} - \frac{n_{0f}}{n_f}\right)^{2}\right\},
\end{aligned} \tag{7.3}
\]

where

\[
\hat{\rho}_{01}^{2} = \frac{n_m n_f}{n(n-1)}\,\frac{N(N-1)}{N_m N_f}\,\frac{n_m n_f}{n_0(n - n_0)}\left(\frac{n_{0m}}{n_m} - \frac{n_{0f}}{n_f}\right)^{2} .
\]

If we use $\hat{\sigma}_0^{2} = n_0(n - n_0)/\{n(n-1)\}$, $\hat{\sigma}_1^{2} = n_m n_f/\{n(n-1)\}$ and $\hat{\sigma}_{01} = \dfrac{n_m n_f}{n(n-1)}\left(\dfrac{n_{0m}}{n_m} - \dfrac{n_{0f}}{n_f}\right)$, then using $\hat{\sigma}_1^{2}$ in place of $\sigma_1^{2}$, $\hat{\beta}_{01} = \dfrac{n_{0m}}{n_m} - \dfrac{n_{0f}}{n_f}$, and

\[
\hat{r}_0 = \frac{N_m}{N}\left(\frac{n_{0m}}{n_m}\right) + \frac{N_f}{N}\left(\frac{n_{0f}}{n_f}\right). \tag{7.4}
\]

This is a "post-stratified" estimator of the prevalence rate, which has been derived using calibration (Deville and Särndal 1992) and model-assisted approaches (Särndal, Swensson and Wretman 1992). Its variance is

\[
\hat{V}_p(\hat{r}_0) = \frac{1-f}{n(n-1)}\left\{n_0 - \left(\frac{n_{0m}^{2}}{n_m} + \frac{n_{0f}^{2}}{n_f}\right)\right\}. \tag{7.5}
\]

Derivations of (7.2), (7.3), (7.4) and (7.5) are given in Section 7.5.
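The two versions of the prevalence estimator differ only in which variance of gender is plugged in. The following minimal sketch computes both from sample counts, assuming the regression form $\hat{r}_0 = n_0/n + \hat{\beta}_{01}(N_m/N - n_m/n)$ with variance $\frac{1-f}{n}(\hat{\sigma}_0^2 - \hat{\sigma}_{01}^2/\sigma_1^2)$. The labels DPP and DPS follow the simulation section of this Chapter; the counts in the example are hypothetical.

```python
def rate_estimates(n0m, n0f, nm, nf, Nm, Nf):
    """Return {label: (rate, variance)} for the DPP and DPS estimators."""
    n, N = nm + nf, Nm + Nf
    n0 = n0m + n0f
    f = n / N
    diff = n0m / nm - n0f / nf
    s01 = nm * nf / (n * (n - 1)) * diff      # sample covariance
    s0 = n0 * (n - n0) / (n * (n - 1))        # sample variance of disease status
    s1_pop = Nm * Nf / (N * (N - 1))          # known population variance of gender
    s1_smp = nm * nf / (n * (n - 1))          # sample estimate of that variance
    out = {}
    for label, s1 in (("DPP", s1_pop), ("DPS", s1_smp)):
        beta = s01 / s1
        rate = n0 / n + beta * (Nm / N - nm / n)
        var = (1 - f) / n * (s0 - s01 ** 2 / s1)
        out[label] = (rate, var)
    return out


# Hypothetical sample: 12 men (7 cases) and 8 women (2 cases), N = 100.
result = rate_estimates(n0m=7, n0f=2, nm=12, nf=8, Nm=50, Nf=50)
post_stratified = 50 / 100 * (7 / 12) + 50 / 100 * (2 / 8)
```

With the sample variance of gender plugged in, the DPS rate coincides with the post-stratified form (7.4), and its variance matches (7.5). The DPP rate generally differs, because the regression coefficient is rescaled by the known population variance.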

7.2.2. Relative performance of various estimation methods

We conducted a small Monte Carlo simulation to evaluate the performance of estimators (7.2) and (7.5). We created a series of hypothetical populations of sizes 50, 100, 200, 400, 800 and 1,600; each consists of 50% men and 50% women. In all populations, the prevalence of smokers is 60% for men and 20% for women. We use the known gender proportions to improve our estimation based on the estimators shown in the previous section.

The simulation study was carried out as follows. From each population of size $N = 50, 100, 200, 400, 800, 1600$, a series of 5,000 SRSWOR samples of size $n$, where $n = 25, 50, 100, 200, 400, 800$ and $n < N$, were drawn. For each sample, we estimated the proportion of smokers and its variance using the following methods:

1) SEP: Simple Expansion Predictor without auxiliary information.

2) DPS: Design-based Predictor using the Sample estimate of the population variance of gender $(\hat{\sigma}_1^{2})$ and the estimated covariance $(\hat{\sigma}_{01})$, i.e., (7.4) and (7.5).

3) DPP: Design-based Predictor using the known Population variance of gender $(\sigma_1^{2})$, i.e., (7.2) and (7.3), respectively.

4) POP: Design-based predictor using the known Population variance and covariance, i.e., predictor (5.12).

5) JKF: The variance of design-based predictor (7.4) was estimated with the modified delete-one Jackknife method (Jones 1974) using functional form (7.4).

The 95% confidence intervals (CIs) were computed using the variance estimators based

on asymptotic normal approximation. We then compared the overall performance of

these estimation methods in terms of mean squared error (MSE), coverage rates and

average length of the confidence intervals. Simulation results are summarized in the

following Tables 7.1, 7.2 and 7.3, and presented graphically in Figures 7.1, 7.2 and 7.3.
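The simulation design described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the dissertation: it compares the simple expansion predictor (SEP) with a gender-adjusted predictor written in poststratified form, which is assumed here as a stand-in for the DPS predictor (7.4). The function names, the fallback for samples with an empty gender group, and the random seed are all assumptions.

```python
import numpy as np

def make_population(N):
    """Fixed population: half men, half women; 60% of men and 20% of women smoke."""
    half = N // 2
    male = np.zeros(N, dtype=int)
    male[:half] = 1
    smoker = np.zeros(N, dtype=int)
    smoker[: 6 * half // 10] = 1                        # smokers among men
    smoker[half : half + 2 * (N - half) // 10] = 1      # smokers among women
    return smoker, male

def simulate_mse(N, n, reps=5000, seed=0):
    """Monte Carlo MSE of the SEP and of a gender-adjusted predictor under SRSWOR."""
    rng = np.random.default_rng(seed)
    y, male = make_population(N)
    truth = y.mean()
    share = np.array([1 - male.mean(), male.mean()])    # known gender proportions
    sep = np.empty(reps)
    adj = np.empty(reps)
    for b in range(reps):
        idx = rng.choice(N, size=n, replace=False)      # one SRSWOR sample
        ys, gs = y[idx], male[idx]
        sep[b] = ys.mean()
        n_g = np.bincount(gs, minlength=2)
        if n_g.min() == 0:
            adj[b] = sep[b]                             # assumed fallback: empty gender group
        else:
            rate_g = np.bincount(gs, weights=ys, minlength=2) / n_g
            adj[b] = share @ rate_g                     # poststratified form (assumption)
    return {"SEP": float(np.mean((sep - truth) ** 2)),
            "ADJ": float(np.mean((adj - truth) ** 2))}

if __name__ == "__main__":
    print(simulate_mse(N=200, n=50))
```

Multiplying the returned values by 10,000 puts them on the scale used in Table 7.1; exact agreement with the table should not be expected, since the seed and the predictor form are assumptions.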

Based on these simulation data, we made the following observations.


1) As expected, the simulations confirm that all these estimators are
asymptotically unbiased, or at least that their biases are negligible, especially when the sample
size is not small.

2) We observed a substantial reduction in the MSE of the estimators relative to the
simple expansion estimator for both small and large sample sizes (Table 7.1, Figure 7.2),
and thus narrower confidence intervals (Table 7.2).

3) From Figures 7.2 and 7.3, we conclude that the DPP method is superior to the DPS
method in terms of MSE reduction, especially when the sample size is relatively small
($n < 200$).

4) The coverage rates of the confidence intervals constructed using the variance
estimators of the SEP, DPS, DPP, POP and JKF methods are all very close to the
nominal level (95%). When sample sizes are small ($n \le 50$), the DPP and DPS confidence
intervals tend to have the highest coverage rates and the narrowest widths. When sample
sizes are moderate or large, there is little difference in coverage rates between the
methods. However, the CIs based on the Jackknife variance estimator tend to be much
wider than those of the POP, DPP and DPS methods.

5) From Table 7.3, the CIs based on the Jackknife variance and the DPS method
appear to have a higher rate of right-misses for smaller sample sizes ($n \le 100$).
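The interval summaries used in these observations (coverage rate, average width and the two one-sided miss rates) can be computed from the simulated estimates as in the hedged sketch below. The function name is hypothetical, and two choices are assumptions rather than something stated in the text: negative variance estimates are truncated at zero, and the two miss rates are labeled by where the true value falls relative to the interval, because this section does not define "left-miss" and "right-miss" explicitly.

```python
import numpy as np

def ci_performance(estimates, variances, truth, z=1.96):
    """Summarize normal-approximation confidence intervals over simulated samples."""
    est = np.asarray(estimates, dtype=float)
    var = np.clip(np.asarray(variances, dtype=float), 0.0, None)  # assumption: truncate at 0
    half = z * np.sqrt(var)
    lower, upper = est - half, est + half
    below = truth < lower          # true value lies to the left of the interval
    above = truth > upper          # true value lies to the right of the interval
    return {
        "coverage": float(np.mean(~below & ~above)),
        "mean_width": float(np.mean(upper - lower)),
        "truth_below": float(np.mean(below)),
        "truth_above": float(np.mean(above)),
    }

if __name__ == "__main__":
    print(ci_performance([0.4, 0.5, 0.1], [0.0025, 0.0025, 0.0025], truth=0.4))
```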


Table 7.1. Mean squared error (x 10,000) of four estimators by population size and sample size

---------------------------------------------------------------------------------------

| Sample size and estimation method

Population | --------- 25 --------- --------- 50 --------- --------- 100 --------

size | SEP DPS DPP POP SEP DPS DPP POP SEP DPS DPP POP

-----------+---------------------------------------------------------------------------

50 | 48.3 42.6 39.7 39.7

100 | 71.6 64.3 60.6 60.6 24.4 21.2 20.5 20.5

200 | 85.2 75.2 70.8 70.8 35.5 30.4 29.5 29.5 12.0 10.1 9.9 9.9

400 | 89.1 76.8 73.0 73.0 40.2 34.9 34.0 34.0 17.8 15.1 15.0 15.0

800 | 96.3 85.0 81.0 81.0 44.9 38.7 37.7 37.7 21.5 18.0 17.8 17.8

1600 | 95.6 84.1 80.8 80.8 47.1 40.8 39.9 39.9 21.8 18.7 18.5 18.5

---------------------------------------------------------------------------------------

| --------- 200 -------- --------- 400 -------- --------- 800 --------

-----------+---------------------------------------------------------------------------

400 | 5.9 4.9 4.9 4.9

800 | 9.2 7.6 7.6 7.6 3.0 2.5 2.5 2.5

1600 | 10.8 9.1 9.0 9.0 4.6 3.8 3.8 3.8 1.5 1.2 1.2 1.2

---------------------------------------------------------------------------------------

SEP = Simple expansion predictor based on SRSWOR, DPS = Estimates based on
sample variance and covariance, DPP = Estimates based on sample covariances and
population variance of the auxiliary variable, POP = Estimates based on population
variance and covariance.


Table 7.2. Coverage rates and width of 95% confidence intervals

---------------------------------------------------------------------------------------------------------

| Sample size and Estimation method

Population | ------------ 25 ------------ ------------ 50 ------------ ------------ 100 -----------

size | SEP DPS DPP POP JKF SEP DPS DPP POP JKF SEP DPS DPP POP JKF

-----------+---------------------------------------------------------------------------------------------

50 | 96.0 94.3 95.1 95.1 94.6

| 27.4 25.0 24.7 24.7 26.1

|

100 | 93.4 93.4 94.0 94.0 93.4 93.2 94.1 94.7 94.7 94.7

| 33.4 30.5 30.0 30.0 31.8 19.3 17.6 17.5 17.5 17.9

|

200 | 91.0 92.6 94.0 94.0 93.7 95.4 94.5 95.1 95.1 94.8 94.2 94.9 95.0 95.0 95.0

| 35.9 32.8 32.2 32.2 34.2 23.6 21.5 21.3 21.3 21.9 13.6 12.4 12.4 12.4 12.6

|

400 | 94.7 93.6 94.8 94.8 94.0 94.1 94.5 95.0 95.0 94.8 94.1 94.8 95.1 95.1 95.1

| 37.2 34.0 33.2 33.2 35.4 25.4 23.2 23.0 23.0 23.7 16.7 15.2 15.1 15.1 15.3

|

800 | 93.6 92.9 93.7 93.7 92.8 95.0 94.0 94.3 94.3 94.6 94.8 94.3 94.4 94.4 94.7

| 37.7 34.4 33.7 33.7 35.9 26.3 24.0 23.7 23.7 24.4 18.0 16.4 16.3 16.3 16.6

|

1600 | 93.6 93.0 94.1 94.1 93.3 94.2 93.8 94.1 94.1 94.0 94.2 94.7 95.0 95.0 94.9

| 38.1 34.8 34.0 34.0 36.3 26.7 24.4 24.1 24.1 24.9 18.6 17.0 16.9 16.9 17.1

---------------------------------------------------------------------------------------------------------

| ------------ 200 ----------- ------------ 400 ----------- ------------ 800 -----------

-----------+---------------------------------------------------------------------------------------------

400 | 95.0 95.6 95.8 95.8 95.7

| 9.6 8.8 8.8 8.8 8.8

|

800 | 94.2 94.7 94.7 94.7 94.6 94.6 94.7 94.6 94.6 94.6

| 11.8 10.7 10.7 10.7 10.8 6.8 6.2 6.2 6.2 6.2

|

1600 | 94.5 94.3 94.3 94.3 94.4 94.6 95.0 94.9 94.9 94.8 95.6 95.2 95.3 95.3 95.1

| 12.7 11.6 11.6 11.6 11.7 8.3 7.6 7.6 7.6 7.6 4.8 4.4 4.4 4.4 4.4

---------------------------------------------------------------------------------------------------------

SEP = Simple expansion predictor, DPS = DP based on sample variance and

covariances, DPP = DP using sample covariances and population variance of the auxiliary

variable, POP = Design-based predictor (DP) based on population variance and covariances,

JKF= CI based on Jackknife variance estimator.


Table 7.3. Percentage of left- and right-misses of the 95% confidence intervals, by population size, sample size and estimation methods

---------------------------------------------------------------------------------------------------------

| Sample size and Estimation method

Population | ------------ 25 ------------ ------------ 50 ------------ ------------ 100 -----------

size | SEP DPS DPP POP JKF SEP DPS DPP POP JKF SEP DPS DPP POP JKF

-----------+---------------------------------------------------------------------------------------------

50 | 2.0 2.3 2.4 2.4 2.0

| 2.0 3.5 2.5 2.5 3.4

|

100 | 1.7 2.2 3.3 3.3 2.7 3.2 2.4 2.7 2.7 2.4

| 4.9 4.4 2.7 2.7 3.9 3.5 3.5 2.6 2.6 2.9

|

200 | 2.6 2.6 3.3 3.3 2.7 1.3 2.2 2.4 2.4 2.2 3.0 2.6 2.9 2.9 2.5

| 6.5 4.8 2.7 2.7 3.6 3.3 3.4 2.5 2.5 3.1 2.8 2.4 2.1 2.1 2.5

|

400 | 2.7 2.4 3.0 3.0 2.5 2.0 2.5 2.7 2.7 2.4 2.4 2.4 2.7 2.7 2.3

| 2.6 4.1 2.2 2.2 3.5 3.9 3.0 2.3 2.3 2.8 3.5 2.8 2.2 2.2 2.6

|

800 | 3.3 2.6 3.7 3.7 2.7 2.8 2.6 3.2 3.2 2.4 2.1 2.6 3.1 3.1 2.4

| 3.1 4.5 2.6 2.6 4.4 2.2 3.4 2.4 2.4 2.9 3.0 3.1 2.5 2.5 2.9

|

1600 | 3.5 2.6 3.4 3.4 2.7 3.2 2.9 3.6 3.6 3.0 2.2 2.2 2.5 2.5 2.2

| 2.8 4.4 2.5 2.5 4.0 2.5 3.4 2.3 2.3 3.0 3.6 3.1 2.5 2.5 2.9

---------------------------------------------------------------------------------------------------------

| ------------ 200 ----------- ------------ 400 ----------- ------------ 800 -----------

-----------+---------------------------------------------------------------------------------------------

400 | 2.2 1.6 1.8 1.8 1.7

| 2.8 2.8 2.4 2.4 2.7

|

800 | 2.8 2.2 2.6 2.6 2.3 2.8 2.6 2.9 2.9 2.7

| 3.0 3.1 2.7 2.7 3.1 2.6 2.7 2.6 2.6 2.7

|

1600 | 3.0 2.8 3.2 3.2 2.9 2.8 2.4 2.7 2.7 2.5 2.1 2.1 2.2 2.2 2.2

| 2.5 2.9 2.5 2.5 2.7 2.7 2.6 2.4 2.4 2.7 2.3 2.7 2.4 2.4 2.7

---------------------------------------------------------------------------------------------------------

SEP = Simple expansion predictor, DPS = Design-based predictor (DP) based on sample variance and covariances, DPP = DP using sample covariances and population variance of the auxiliary variable, POP = DP based on population variance and covariances, JKF= CI based on Jackknife variance estimator.


Figure 7.1. Mean squared error by population size and sample size

[Two panels. Vertical axes: MSE x 10,000 (first panel) and variance x 10,000 (second panel), each from 0 to 100. Horizontal axis: population size (50, 100, 200, 400, 800, 1600). Curves from top to bottom correspond to sample sizes n = 25, 50, 100, 200, 400. Symbols: . = SRS, x = DPS, O = DPP, + = POP.]


Figure 7.2. MSE reduction (%) of a design-based predictor (DPS or DPP ) by population and sample sizes

[Two panels, one for estimation method DPS and one for DPP. Vertical axis: MSE reduction (%), from 10 to 20. Horizontal axis: population size (50, 100, 200, 400, 800, 1600). Sample size symbols: . = 25, o = 50, x = 100, + = 200, O = 400, D = 800.]


Figure 7.3. MSE Ratio of design-based predictors to SRS estimator by population and sample sizes

[Two panels, one for estimation method DPS and one for DPP. Vertical axis: MSE ratio of the design-based predictor to SRS, from 0.8 to 0.9. Horizontal axis: population size (50, 100, 200, 400, 800, 1600). Sample size symbols: . = 25, o = 50, x = 100, + = 200, O = 400, D = 800.]


7.2.3. Extension to multinomial auxiliary variable

Let us consider the example of estimating the rate of drug coverage. When a
categorical variable has more than two categories, we may extend the above results by
replacing the categorical variable with a set of indicator variables. For example, suppose we are
interested in estimating the rate of drug coverage ($y_s^{(0)} = 1$ if subject $s$
has drug coverage, 0 otherwise) based on a fixed-size sample survey, using age category
as auxiliary information. The subjects are
classified into three age categories (referred to as subpopulations hereafter), e.g., $\le 65$,
$> 65$ but $\le 75$, and $> 75$. The marginal totals of the three age categories $(N_1, N_2, N_3)$ are
known. The age categories may be represented using three indicator variables,

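$$y^{(1)} = \begin{cases} 1 & \le 65, \\ 0 & \text{otherwise,} \end{cases}$$

$$y^{(2)} = \begin{cases} 1 & > 65 \text{ but } \le 75, \\ 0 & \text{otherwise,} \end{cases}$$

$$y^{(3)} = \begin{cases} 1 & > 75, \\ 0 & \text{otherwise.} \end{cases}$$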

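To avoid singularity, we use $y^{(1)}$ and $y^{(2)}$ in the model. By applying results
derived in Section 5.4 of Chapter 5, we have

$$\hat{r}_0 = r_0 - \boldsymbol{\beta}_{0\bullet}'\left(\mathbf{Y}^{(\bullet)} - \boldsymbol{\mu}_\bullet\right), \qquad (7.6)$$

where $\boldsymbol{\beta}_{0\bullet} = \boldsymbol{\Sigma}_\bullet^{-1}\boldsymbol{\sigma}_{0\bullet}$, $\boldsymbol{\mu}_\bullet = \left(\frac{N_1}{N} \;\; \frac{N_2}{N}\right)'$ and $\mathbf{Y}^{(\bullet)} = \left(Y^{(1)} \;\; Y^{(2)}\right)'$.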

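Since $N_1$ and $N_2$ are known, we have

$$\boldsymbol{\Sigma}_\bullet = \frac{1}{N(N-1)}\begin{pmatrix} N_1(N-N_1) & -N_1 N_2 \\ -N_1 N_2 & N_2(N-N_2) \end{pmatrix},$$

and

$$\boldsymbol{\Sigma}_\bullet^{-1} = \frac{N-1}{N_1 N_2 N_3}\begin{pmatrix} N_2(N-N_2) & N_1 N_2 \\ N_1 N_2 & N_1(N-N_1) \end{pmatrix}.$$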

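It can be shown that $\hat{\sigma}_0^2 = \dfrac{n_0(n-n_0)}{n(n-1)}$ and $\hat{\boldsymbol{\sigma}}_{0\bullet} = \dfrac{1}{n(n-1)}\begin{pmatrix} n\,n_{01} - n_0 n_1 \\ n\,n_{02} - n_0 n_2 \end{pmatrix}$. Therefore,

$$\hat{\boldsymbol{\beta}}_{0\bullet} = \frac{N-1}{n(n-1)}\left\{\begin{pmatrix} \frac{1}{N_1}\left(n\,n_{01} - n_0 n_1\right) \\ \frac{1}{N_2}\left(n\,n_{02} - n_0 n_2\right) \end{pmatrix} + \frac{1}{N_3}\left(n_0 n_3 - n\,n_{03}\right)\begin{pmatrix} 1 \\ 1 \end{pmatrix}\right\},$$

and

$$\hat{r}_0 = \frac{n_0}{n} - \frac{N-1}{n-1}\sum_{k=1}^{3}\left\{\frac{n_k}{N_k}\left(\frac{n_k}{n} - \frac{N_k}{N}\right)\left(\frac{n_{0k}}{n_k} - \frac{n_0}{n}\right)\right\}. \qquad (7.7)$$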

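Its variance is

$$\operatorname{var}(\hat{r}_0) = \left(1 - \hat{\rho}_{0\bullet}^2\right)\hat{\sigma}_0^2\left(\frac{1-f}{n}\right), \qquad (7.8)$$

where $\hat{\rho}_{0\bullet}^2 = \hat{\boldsymbol{\sigma}}_{0\bullet}'\boldsymbol{\Sigma}_\bullet^{-1}\hat{\boldsymbol{\sigma}}_{0\bullet} / \hat{\sigma}_0^2$.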

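When sample estimates of variance and covariance are used

It can be shown that this estimator is similar to poststratified estimators. If the
sample estimator of $\boldsymbol{\Sigma}_\bullet$ is used, we have

$$\hat{\boldsymbol{\Sigma}}_\bullet^{-1} = \frac{n-1}{n_1 n_2 n_3}\begin{pmatrix} n_2(n-n_2) & n_1 n_2 \\ n_1 n_2 & n_1(n-n_1) \end{pmatrix},$$

$\hat{\boldsymbol{\sigma}}_{0\bullet} = \dfrac{1}{n(n-1)}\begin{pmatrix} n\,n_{01} - n_0 n_1 \\ n\,n_{02} - n_0 n_2 \end{pmatrix}$ and thus

$$\hat{\boldsymbol{\beta}}_{0\bullet} = \begin{pmatrix} \frac{n_{01}}{n_1} - \frac{n_0}{n} \\ \frac{n_{02}}{n_2} - \frac{n_0}{n} \end{pmatrix} - \left(\frac{n_{03}}{n_3} - \frac{n_0}{n}\right)\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} \frac{n_{01}}{n_1} - \frac{n_{03}}{n_3} \\ \frac{n_{02}}{n_2} - \frac{n_{03}}{n_3} \end{pmatrix}.$$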


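Therefore,

$$\hat{r}_0 = \frac{n_0}{n} - \sum_{k=1}^{3}\left(\frac{n_k}{n} - \frac{N_k}{N}\right)\left(\frac{n_{0k}}{n_k} - \frac{n_0}{n}\right) = \sum_{k=1}^{3}\frac{N_k}{N}\,\frac{n_{0k}}{n_k}. \qquad (7.9)$$

Its variance is

$$\operatorname{var}(\hat{r}_0) = \frac{1-f}{n-1}\left(\frac{n_0}{n} - \sum_{k=1}^{3}\frac{n_{0k}^2}{n\,n_k}\right). \qquad (7.10)$$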
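As a small numerical check, the sketch below evaluates the three-category estimator from sample counts in two ways, a regression-type form and a poststratified form, and evaluates a variance estimate alongside. The formulas are written out in the code as they are read here from (7.9) and (7.10); the function names and the example counts are hypothetical.

```python
def regression_form(n0k, nk, Nk):
    """n0/n minus the adjustment for the gap between sample and population shares."""
    n, N, n0 = sum(nk), sum(Nk), sum(n0k)
    adjust = sum((b / n - B / N) * (a / b - n0 / n) for a, b, B in zip(n0k, nk, Nk))
    return n0 / n - adjust

def poststratified_form(n0k, nk, Nk):
    """Sum over categories of (N_k / N) * (n_0k / n_k)."""
    N = sum(Nk)
    return sum(B / N * a / b for a, b, B in zip(n0k, nk, Nk))

def variance_estimate(n0k, nk, Nk):
    """(1 - f) / (n - 1) * (n0/n - sum_k n_0k^2 / (n * n_k))."""
    n, N, n0 = sum(nk), sum(Nk), sum(n0k)
    f = n / N
    return (1 - f) / (n - 1) * (n0 / n - sum(a * a / (n * b) for a, b in zip(n0k, nk)))

if __name__ == "__main__":
    # Hypothetical counts: positives per category, sample sizes, population sizes.
    n0k, nk, Nk = [2, 8, 15], [10, 20, 20], [300, 400, 300]
    print(regression_form(n0k, nk, Nk), poststratified_form(n0k, nk, Nk))
    print(variance_estimate(n0k, nk, Nk))
```

The two forms agree because the category shares sum to one in both the sample and the population, which is the sense in which the estimator resembles a poststratified estimator.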

When $p > 3$ categories are present

Results (7.7), (7.8), (7.9) and (7.10) can be readily extended to scenarios in which
more than three categories are present.

A. When the population variance of the auxiliary variables is used:

$$\hat{r}_0 = \frac{n_0}{n} - \frac{N-1}{n-1}\sum_{k=1}^{p}\left\{\frac{n_k}{N_k}\left(\frac{n_k}{n} - \frac{N_k}{N}\right)\left(\frac{n_{0k}}{n_k} - \frac{n_0}{n}\right)\right\}, \qquad (7.11)$$

and

$$\operatorname{var}(\hat{r}_0) = \left(\frac{1-f}{n}\right)\left\{\frac{n_0(n-n_0)}{n(n-1)} - \frac{N-1}{n^2(n-1)^2}\sum_{k=1}^{p}\frac{\left(n\,n_{0k} - n_0 n_k\right)^2}{N_k}\right\}. \qquad (7.12)$$

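B. When the sample variance of the auxiliary variables is used:

$$\hat{r}_0 = \frac{n_0}{n} - \sum_{k=1}^{p}\left(\frac{n_k}{n} - \frac{N_k}{N}\right)\left(\frac{n_{0k}}{n_k} - \frac{n_0}{n}\right) = \sum_{k=1}^{p}\frac{N_k}{N}\,\frac{n_{0k}}{n_k}, \qquad (7.13)$$

and its variance is

$$\operatorname{var}(\hat{r}_0) = \frac{1-f}{n-1}\left(\frac{n_0}{n} - \sum_{k=1}^{p}\frac{n_{0k}^2}{n\,n_k}\right). \qquad (7.14)$$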

Derivations of the above results are given in Section 7.6.
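For readers who want to check case A numerically, the sketch below computes the estimator and its variance for $p$ categories directly from the matrix form: the population covariance matrix of the first $p-1$ category indicators (divisor $N-1$), combined with the sample variance and covariances of the response. It then compares the result with a closed form in the counts, as reconstructed here from those definitions. The function names and example counts are hypothetical, and the closed form is an assumption about how (7.11) and (7.12) read.

```python
import numpy as np

def dpp_matrix_form(n0k, nk, Nk):
    """r_hat = r0 - beta'(ybar - mu), with population Sigma and sample covariances."""
    n0k, nk, Nk = (np.asarray(v, dtype=float) for v in (n0k, nk, Nk))
    n, N, n0 = nk.sum(), Nk.sum(), n0k.sum()
    f = n / N
    pi = Nk[:-1] / N                                       # known shares, last category dropped
    Sigma = (np.diag(pi) - np.outer(pi, pi)) * N / (N - 1)
    s0x = (n * n0k[:-1] - n0 * nk[:-1]) / (n * (n - 1))    # sample covariances with response
    s00 = n0 * (n - n0) / (n * (n - 1))                    # sample variance of response
    beta = np.linalg.solve(Sigma, s0x)
    r_hat = n0 / n - beta @ (nk[:-1] / n - pi)
    var_hat = (1 - f) / n * (s00 - s0x @ beta)
    return r_hat, var_hat

def dpp_closed_form(n0k, nk, Nk):
    """Closed form in the counts (a reconstruction, see lead-in)."""
    n0k, nk, Nk = (np.asarray(v, dtype=float) for v in (n0k, nk, Nk))
    n, N, n0 = nk.sum(), Nk.sum(), n0k.sum()
    f = n / N
    a = n * n0k - n0 * nk
    r_hat = n0 / n - (N - 1) / (n * (n - 1)) * np.sum(a / Nk * (nk / n - Nk / N))
    var_hat = (1 - f) / n * (n0 * (n - n0) / (n * (n - 1))
                             - (N - 1) / (n ** 2 * (n - 1) ** 2) * np.sum(a ** 2 / Nk))
    return r_hat, var_hat

if __name__ == "__main__":
    counts = ([2, 8, 15], [10, 20, 20], [300, 400, 300])   # hypothetical example
    print(dpp_matrix_form(*counts))
    print(dpp_closed_form(*counts))
```

Note that the variance expression is a difference of two terms and is not guaranteed to be positive when the population covariance matrix is paired with sample covariances.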


7.3. Cases with two categorical variables

There are generally two scenarios when two categorical auxiliary variables are
involved, depending on whether the cell totals or only the marginal totals of the auxiliary variables are
known. Suppose that, in a health-related survey, we are interested in estimating the
prevalence of skipping medication (a binomial response), and we have information about
the subjects’ gender and age from Centers for Medicare and Medicaid Services (CMS)
administrative data. After a sample is taken, the gender and age of each sampled
subject will be known. The scenarios of practical interest differ in the
availability of population information about subjects’ gender and age.

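Table 7.4. Two categorical auxiliary variable examples

---------------------------------------------------------------

| Age |

Gender | 65-80 >80 | Total

-----------+---------------------------+-----------------------

Female | $T_1$ $T_2$ | $T_{1\bullet}$

Male | $T_3$ $T_4$ | $T_{2\bullet}$

-----------+---------------------------+-----------------------

Total | $T_{\bullet 1}$ $T_{\bullet 2}$ | $T$

---------------------------------------------------------------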

7.3.1. Cases with known cell totals

When all cell totals $(T_1, T_2, T_3, T_4)$ are known, we may create the following set of
indicator variables:

$$y^{(1)} = \begin{cases} 1 & \text{Female, aged 65-80 years,} \\ 0 & \text{otherwise,} \end{cases}$$

$$y^{(2)} = \begin{cases} 1 & \text{Female, aged >80 years,} \\ 0 & \text{otherwise,} \end{cases}$$

$$y^{(3)} = \begin{cases} 1 & \text{Male, aged 65-80 years,} \\ 0 & \text{otherwise,} \end{cases}$$

$$y^{(4)} = \begin{cases} 1 & \text{Male, aged >80 years,} \\ 0 & \text{otherwise.} \end{cases}$$

With them, results from Section 5.4 can be directly applied to this situation.
To avoid singularity, we may include any three of the four indicator variables in the
model. The estimator and its variance will be the same as (7.11) and (7.12), or (7.13)
and (7.14), respectively.

7.3.2. Cases with only known marginal totals

When only marginal totals are known, we may use two indicator variables to
improve precision of prediction, i.e., one for gender and the other for age category. We
define the two indicator variables as

$$y^{(1)} = \begin{cases} 1 & \text{Male,} \\ 0 & \text{Female,} \end{cases}$$

and

$$y^{(2)} = \begin{cases} 1 & \text{Aged 65-80 years,} \\ 0 & \text{Aged >80 years.} \end{cases}$$

We denote the total numbers of male and female subjects as $N_m$ and $N_f$, respectively, and the
numbers of persons aged 65-80 and >80 years as $N_a$ and $N_b$.

We may apply results derived in Section 5.4 of Chapter 5 as follows,


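$$\hat{r}_0 = r_0 - \boldsymbol{\beta}_{0\bullet}'\left(\mathbf{Y}^{(\bullet)} - \boldsymbol{\mu}_\bullet\right), \qquad (7.15)$$

where $\boldsymbol{\beta}_{0\bullet} = \boldsymbol{\Sigma}_\bullet^{-1}\boldsymbol{\sigma}_{0\bullet}$, $\boldsymbol{\mu}_\bullet = \left(\frac{N_m}{N} \;\; \frac{N_a}{N}\right)'$ and $\mathbf{Y}^{(\bullet)} = \left(Y^{(1)} \;\; Y^{(2)}\right)'$.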

Since only the marginal totals are known, $\boldsymbol{\Sigma}_\bullet$ is undetermined because the
population covariance between $Y^{(1)}$ and $Y^{(2)}$ is unknown. We may use the sample
variance $\hat{\boldsymbol{\Sigma}}_\bullet$, the sample covariance $\hat{\boldsymbol{\sigma}}_{0\bullet}$ between $Y^{(0)}$ and $\left(Y^{(1)}, Y^{(2)}\right)$, and the sample
variance of $Y^{(0)}$, $\hat{\sigma}_0^2 = n_0(n - n_0) / \{n(n-1)\}$. Since $n\,n_{m,a} - n_m n_a = n_{m,a} n_f - n_m n_{f,a}$, we
have

$$\hat{\boldsymbol{\Sigma}}_\bullet = \frac{1}{n(n-1)}\begin{pmatrix} n_m n_f & n\,n_{m,a} - n_m n_a \\ n\,n_{m,a} - n_m n_a & n_a n_b \end{pmatrix} = \frac{1}{n(n-1)}\begin{pmatrix} n_m n_f & n_m n_f\left(\frac{n_{m,a}}{n_m} - \frac{n_{f,a}}{n_f}\right) \\ n_m n_f\left(\frac{n_{m,a}}{n_m} - \frac{n_{f,a}}{n_f}\right) & n_a n_b \end{pmatrix}.$$

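Since $n\,n_{0,m} - n_0 n_m = n_{0,m} n_f - n_m n_{0,f}$ and $n\,n_{0,a} - n_0 n_a = n_{0,a} n_b - n_a n_{0,b}$, we have

$$\hat{\boldsymbol{\sigma}}_{0\bullet} = \frac{1}{n(n-1)}\begin{pmatrix} n\,n_{0,m} - n_0 n_m \\ n\,n_{0,a} - n_0 n_a \end{pmatrix} = \frac{1}{n(n-1)}\begin{pmatrix} n_m n_f\left(\frac{n_{0,m}}{n_m} - \frac{n_{0,f}}{n_f}\right) \\ n_a n_b\left(\frac{n_{0,a}}{n_a} - \frac{n_{0,b}}{n_b}\right) \end{pmatrix}.$$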

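For simplicity, we denote $\Delta r_{m,a} = \left(\dfrac{n_{m,a}}{n_m} - \dfrac{n_{f,a}}{n_f}\right)$, $\Delta r_{0,m} = \left(\dfrac{n_{0,m}}{n_m} - \dfrac{n_{0,f}}{n_f}\right)$,
$\Delta r_{0,a} = \left(\dfrac{n_{0,a}}{n_a} - \dfrac{n_{0,b}}{n_b}\right)$ and $B = \dfrac{n_a n_b}{n_a n_b - n_m n_f\left(\Delta r_{m,a}\right)^2}$. We have

$$\hat{\boldsymbol{\Sigma}}_\bullet^{-1} = \frac{n(n-1)B}{n_a n_b}\begin{pmatrix} \frac{n_a n_b}{n_m n_f} & -\Delta r_{m,a} \\ -\Delta r_{m,a} & 1 \end{pmatrix} \quad\text{and}\quad \hat{\boldsymbol{\sigma}}_{0\bullet} = \frac{1}{n(n-1)}\begin{pmatrix} n_m n_f\,\Delta r_{0,m} \\ n_a n_b\,\Delta r_{0,a} \end{pmatrix},$$

and thus

$$\hat{\boldsymbol{\beta}}_{0\bullet} = B\begin{pmatrix} \Delta r_{0,m} - \Delta r_{m,a}\,\Delta r_{0,a} \\ \Delta r_{0,a} - \frac{n_m n_f}{n_a n_b}\,\Delta r_{m,a}\,\Delta r_{0,m} \end{pmatrix}.$$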

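Therefore,

$$\begin{aligned} \hat{r}_0 = \frac{n_0}{n} &+ \left(\frac{N_m}{N} - \frac{n_m}{n}\right) B\,\Delta r_{0,m} + \left(\frac{N_a}{N} - \frac{n_a}{n}\right) B\,\Delta r_{0,a} \\ &- \left(\frac{N_m}{N} - \frac{n_m}{n}\right) B\,\Delta r_{m,a}\,\Delta r_{0,a} - \left(\frac{N_a}{N} - \frac{n_a}{n}\right) B\left(\frac{n_m n_f}{n_a n_b}\right)\Delta r_{m,a}\,\Delta r_{0,m}. \end{aligned}$$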

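The variance of $\hat{r}_0$ is

$$\operatorname{var}(\hat{r}_0) = \left(1 - \hat{\rho}_{0\bullet}^2\right)\hat{\sigma}_0^2\left(\frac{1-f}{n}\right),$$

where $\hat{\rho}_{0\bullet}^2 = \dfrac{B}{n_0(n-n_0)}\left\{n_m n_f\left(\Delta r_{0,m}\right)^2 + n_a n_b\left(\Delta r_{0,a}\right)^2 - 2\,n_m n_f\,\Delta r_{0,m}\,\Delta r_{0,a}\,\Delta r_{m,a}\right\}$.

Therefore,

$$\operatorname{var}(\hat{r}_0) = \frac{1-f}{n}\left(\frac{n_0(n-n_0)}{n(n-1)}\right) - \frac{(1-f)}{n^2(n-1)}\,B\left\{n_m n_f\left(\Delta r_{0,m}\right)^2 + n_a n_b\left(\Delta r_{0,a}\right)^2 - 2\,n_m n_f\,\Delta r_{0,m}\,\Delta r_{0,a}\,\Delta r_{m,a}\right\}.$$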
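The marginal-totals case can be checked numerically with the sketch below. It computes the estimator and its variance once from the generic matrix form (sample covariance matrix of the two indicators, sample covariances with the response) and once from the closed form in terms of $\Delta r_{m,a}$, $\Delta r_{0,m}$, $\Delta r_{0,a}$ and $B$, as read here from the derivation above. The function names, the small data set and the population totals are hypothetical illustrations.

```python
import numpy as np

def matrix_form(y0, male, young, N, N_m, N_a):
    """r_hat = r0 - beta'(xbar - mu) with sample Sigma and sample covariances."""
    y0, male, young = (np.asarray(v, dtype=float) for v in (y0, male, young))
    n = y0.size
    X = np.column_stack([male, young])
    S = np.cov(np.column_stack([y0, X]), rowvar=False)     # divisor n - 1
    s00, s0x, Sxx = S[0, 0], S[0, 1:], S[1:, 1:]
    beta = np.linalg.solve(Sxx, s0x)
    mu = np.array([N_m / N, N_a / N])
    r_hat = y0.mean() - beta @ (X.mean(axis=0) - mu)
    var_hat = (1 - n / N) / n * (s00 - s0x @ beta)
    return r_hat, var_hat

def closed_form(y0, male, young, N, N_m, N_a):
    """Same quantities written with the Delta-r differences and B."""
    y0, male, young = (np.asarray(v, dtype=float) for v in (y0, male, young))
    n, n0 = y0.size, y0.sum()
    n_m, n_a = male.sum(), young.sum()
    n_f, n_b = n - n_m, n - n_a
    n_ma = (male * young).sum()
    n_fa = n_a - n_ma
    n_0m = (y0 * male).sum()
    n_0f = n0 - n_0m
    n_0a = (y0 * young).sum()
    n_0b = n0 - n_0a
    d_ma = n_ma / n_m - n_fa / n_f
    d_0m = n_0m / n_m - n_0f / n_f
    d_0a = n_0a / n_a - n_0b / n_b
    B = n_a * n_b / (n_a * n_b - n_m * n_f * d_ma ** 2)
    r_hat = (n0 / n
             + (N_m / N - n_m / n) * B * (d_0m - d_ma * d_0a)
             + (N_a / N - n_a / n) * B * (d_0a - n_m * n_f / (n_a * n_b) * d_ma * d_0m))
    q = n_m * n_f * d_0m ** 2 + n_a * n_b * d_0a ** 2 - 2 * n_m * n_f * d_0m * d_0a * d_ma
    f = n / N
    var_hat = (1 - f) / n * n0 * (n - n0) / (n * (n - 1)) - (1 - f) * B * q / (n ** 2 * (n - 1))
    return r_hat, var_hat

if __name__ == "__main__":
    male = [1] * 10 + [0] * 10
    young = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y0 = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
    print(matrix_form(y0, male, young, N=200, N_m=90, N_a=120))
    print(closed_form(y0, male, young, N=200, N_m=90, N_a=120))
```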

7.4. Cases with single continuous auxiliary variable

Consider a prevalence survey of skipping medications in a low-income senior
community, where an individual’s age is a strong predictor. However, the information
about an individual’s age may or may not be available before sampling. Assuming that such
information on the sampled subjects will be available for analysis, the following
scenarios may arise: 1) only the population mean age is known; 2) ages are known
for all individuals in the population.

For cases where only the population mean age is known, we may directly apply

those results derived in Section 5.3 of Chapter 5 while using sample variance of age and

sample covariance between the response variable and age.

For cases where ages are known for all subjects in the population, we may apply

those results presented in Section 5.3. of Chapter 5 while using population variance of

age and sample covariance between the response variable and age.

When age is known for all subjects in the population, distributional
information such as quartiles, octiles or deciles will also be known and can be used to
improve the precision of estimation. A question of practical interest is how such
information should be used. An analyst may choose to:

Strategy I: Use age as a continuous variable, and follow the method of
Section 7.4.1;

Strategy II: Use distributional properties of age, such as quartiles, octiles or
deciles, and follow the strategies of Section 7.2.2.

Intuitively, Strategy II should give better estimators because more information is
incorporated into estimation. The immediate question is how much detail of the
distribution should be used. We answer this question with a series of
Monte Carlo simulations, comparing estimators based on continuous age, its
median, quartiles, octiles and double-octiles (sedeciles).


7.4.1. Hypothetical populations

We generated two series of hypothetical populations of sizes 80, 160, 320, 640
and 1280. Each series has a hypothetical relationship between smoking behavior and age,
as illustrated in Figure 7.4. The two series of hypothetical populations were held as
fixed finite populations. In the populations of Series A, the proportion of smokers increases
monotonically with age. In the populations of Series B, the relationship
between the proportion of smokers and age is bell-shaped: the proportion of smokers
increases with age until the fifth octile (age 71-75), then gradually declines. Age was set
to be uniformly distributed within each octile. The characteristics of the two series of
populations are summarized in Table 7.5.
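A Series A population of this kind can be generated as in the following sketch, which uses the octile age ranges and smoking rates listed in Table 7.5 and draws ages uniformly within each octile. The function name, the seed, and the random placement of smokers within an octile are assumptions; the dissertation does not say how smokers were assigned within octiles.

```python
import numpy as np

# Octile age ranges and Series A smoking rates as listed in Table 7.5.
OCTILE_RANGES = [(60.0, 62.0), (62.1, 64.5), (64.6, 67.5), (67.6, 71.0),
                 (71.1, 75.0), (75.1, 80.0), (80.1, 85.0), (85.1, 100.0)]
SERIES_A_RATES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

def make_series_a(N, seed=0):
    """Return (age, smoker, octile) arrays for a fixed population of size N."""
    if N % 8 != 0:
        raise ValueError("N must be divisible by 8 so that octiles have equal size")
    rng = np.random.default_rng(seed)
    per = N // 8
    ages, smokers, octiles = [], [], []
    for k, ((lo, hi), rate) in enumerate(zip(OCTILE_RANGES, SERIES_A_RATES), start=1):
        ages.append(rng.uniform(lo, hi, size=per))       # uniform age within the octile
        s = np.zeros(per, dtype=int)
        s[: int(round(rate * per))] = 1                  # fixed number of smokers
        smokers.append(rng.permutation(s))               # assumed: random placement
        octiles.append(np.full(per, k))
    return np.concatenate(ages), np.concatenate(smokers), np.concatenate(octiles)

if __name__ == "__main__":
    age, smoker, octile = make_series_a(80)
    print(smoker.mean(), round(age.mean(), 1))
```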

7.4.2. Monte Carlo simulations

In all the simulations, we used the known population age information to improve
estimation using methods developed in Sections 5.3 and 7.2. The study was carried out
as follows. From each fixed finite population, a series of 5,000 SRSWOR samples of
size $n$, where $n = 20, 40, 80, 160, 320, 640$ and $n < N$, was drawn. For each
sample, we estimated the proportion of smokers with (7.11) and its variance
with (7.12). The corresponding 95% confidence intervals (CIs) were constructed based
on the normal approximation. Age was treated as: 1) a continuous variable (CNT); 2) a
binomial variable with cut-off point at its population median (MED); 3) a multinomial
variable with cut-off points at its population quartiles (QRT); 4) a multinomial variable
with cut-off points at its population octiles (OCT); and 5) a multinomial variable with
cut-off points at its population sedeciles (SED, quantiles based on division into 16
groups of equal frequency).
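One way to form the MED, QRT, OCT and SED categories is to rank the population by age and cut the ranks into equal-frequency groups, as in the sketch below. This rank-based rule is an assumption; it coincides with cutting at the population quantiles when all ages are distinct. The function name is hypothetical.

```python
import numpy as np

def quantile_groups(age_pop, n_groups):
    """Assign each population member to one of n_groups equal-frequency age groups.

    Groups are numbered 0 (youngest) to n_groups - 1 (oldest).
    """
    age_pop = np.asarray(age_pop, dtype=float)
    ranks = np.argsort(np.argsort(age_pop, kind="stable"), kind="stable")
    return ranks * n_groups // len(age_pop)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    age = rng.uniform(60, 100, size=80)
    for label, g in [("MED", 2), ("QRT", 4), ("OCT", 8), ("SED", 16)]:
        print(label, np.bincount(quantile_groups(age, g), minlength=g))
```

The resulting group labels, together with the known group sizes in the population, supply the category counts needed by the categorical estimators.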


We compared the estimation methods in terms of their mean squared errors
and the overall performance of their 95% confidence intervals based on the estimated
variance. We evaluated the confidence intervals in terms of coverage rates and
average length. Simulation results are summarized in Tables 7.6, 7.7, 7.8, 7.9 and 7.10
and presented graphically in Figures 7.5, 7.6, 7.7 and 7.8. Based on these simulations,
we made the following observations.

7.4.3. Comparison between the age-adjusted and the simple expansion predictors

When the relationship between smoker proportion and age is monotonic
(population Series A), as shown in Table 7.6, MSEs were substantially reduced by
adjustment for age mean (CNT), median (MED) or quartiles (QRT), even when sample
sizes were small ($n = 20, 40, 80$). When sample sizes were $\ge 160$, adjusting
for octiles or sedeciles also became more efficient than the simple expansion predictor (SEP).
Regardless of sample size, it was more efficient to adjust for age mean than for age
categories; however, the differences gradually disappeared as the sample size
became large ($n \ge 320$). When sample sizes were small ($n \le 40$), adjusting for age
octiles or sedeciles resulted in larger MSEs than the SEP.

When the relationship is not monotonic (population Series B), as shown in Table
7.7, the estimators adjusted for age mean, median, octiles and sedeciles were less
efficient than the SEP when sample sizes were $\le 80$. When sample sizes were $\ge 40$,
estimators adjusted for age quartiles became more efficient; and when sample sizes
were $\ge 80$, the estimators adjusted for either octiles or sedeciles became more
efficient. Finally, when sample sizes became moderate ($n \ge 160$), the estimators adjusted
for age median, quartiles, octiles or sedeciles all became more efficient than the SEP.


Figure 7.4. Hypothetical relationship between smoking and age of population series A and B

[Two panels: Population Series A and Population Series B. Vertical axis: proportion of smokers (0.0 to 1.0 for Series A, 0.0 to 0.8 for Series B). Horizontal axis: age (60 to 100).]


Table 7.5. Parameters of hypothetical population Series A and B

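Population Series A

-----------------------------------------------------------------------------------

Age | Population size (N)

| 80 160 320 640 1280

Octile Range | n rate n rate n rate n rate n rate

-----------------+-----------------------------------------------------------------

1 60.0-62.0 | 10 .1 20 .1 40 .1 80 .1 160 .1

2 62.1-64.5 | 10 .2 20 .2 40 .2 80 .2 160 .2

3 64.6-67.5 | 10 .3 20 .3 40 .3 80 .3 160 .3

4 67.6-71.0 | 10 .4 20 .4 40 .4 80 .4 160 .4

5 71.1-75.0 | 10 .5 20 .5 40 .5 80 .5 160 .5

6 75.1-80.0 | 10 .6 20 .6 40 .6 80 .6 160 .6

7 80.1-85.0 | 10 .7 20 .7 40 .7 80 .7 160 .7

8 85.1-100 | 10 .8 20 .8 40 .8 80 .8 160 .8

-----------------+-----------------------------------------------------------------

Overall 60.0-100 | .45 .45 .45 .45 .45

Age mean | 73.0 73.2 73.0 73.2 73.1

-----------------------------------------------------------------------------------

Population Series B

-----------------------------------------------------------------------------------

Age | Population size (N)

| 80 160 320 640 1280

Octile Range | n rate n rate n rate n rate n rate

-----------------+-----------------------------------------------------------------

1 60.0-62.0 | 10 .1 20 .1 40 .1 80 .1 160 .1

2 62.1-64.5 | 10 .1 20 .1 40 .1 80 .1 160 .1

3 64.6-67.5 | 10 .2 20 .15 40 .15 80 .15 160 .15

4 67.6-71.0 | 10 .4 20 .4 40 .4 80 .4 160 .4

5 71.1-75.0 | 10 .6 20 .6 40 .6 80 .6 160 .6

6 75.1-80.0 | 10 .4 20 .4 40 .4 80 .4 160 .4

7 80.1-85.0 | 10 .2 20 .2 40 .2 80 .2 160 .2

8 85.1-100 | 10 .1 20 .1 40 .1 80 .1 160 .1

-----------------+-----------------------------------------------------------------

Overall 60.0-100 | .263 .268 .268 .268 .268

Age mean | 73.0 73.1 73.1 73.2 73.1

-----------------------------------------------------------------------------------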


Table 7.6. MSE of design-based estimators adjusted to known continuous or categorical auxiliary variables, by population size and sample size (Series A)

-----------------------------------------------------------------------------------

Sample size and | Estimation method

Population size | SEP CNT MED QRT OCT SED

-----------------+-----------------------------------------------------------------

20 80 | 96.3 81.5 86.0 93.0 114.8 143.7

160 | 109.7 96.5 97.4 105.6 129.5 172.2

320 | 117.7 102.3 104.5 113.5 138.0 188.6

640 | 119.3 108.0 106.9 117.7 143.4 195.5

1280 | 123.8 106.8 108.4 117.5 143.1 195.8

-----------------+-----------------------------------------------------------------

40 80 | 30.4 25.0 27.2 28.1 31.6 36.4

160 | 45.8 39.8 39.7 41.4 47.2 54.9

320 | 55.3 45.9 47.3 48.2 54.4 66.3

640 | 57.0 48.1 48.7 50.0 56.7 70.1

1280 | 58.0 48.6 50.4 51.3 57.0 69.5

-----------------+-----------------------------------------------------------------

80 160 | 15.3 12.7 13.1 13.1 13.8 14.6

320 | 23.3 19.1 19.6 19.4 20.6 23.1

640 | 26.6 22.1 22.8 22.6 24.1 27.3

1280 | 29.2 23.8 24.7 24.6 25.9 28.9

-----------------+-----------------------------------------------------------------

160 320 | 7.8 6.3 6.6 6.3 6.6 7.0

640 | 11.2 9.2 9.6 9.3 9.5 10.1

1280 | 13.3 11.0 11.4 11.1 11.4 12.0

-----------------+-----------------------------------------------------------------

320 640 | 3.9 3.2 3.3 3.2 3.2 3.3

1280 | 5.8 4.7 5.0 4.7 4.8 4.9

-----------------+-----------------------------------------------------------------

640 1280 | 1.9 1.6 1.6 1.6 1.6 1.6

-----------------------------------------------------------------------------------

SEP = Simple expansion predictor, CNT = Design-based predictor (DP) using

continuous age, MED = DP using median age, QRT = DP using quartiles of age,

OCT= DP using octiles of age, SED=DP using sedeciles (16-quantiles) of age.


Table 7.7. MSE of design-based estimators adjusted to known continuous or categorical auxiliary variables, by population size and sample size (Series B)

-----------------------------------------------------------------------------------
Sample size and | Estimation method

Population size | SEP CNT MED QRT OCT SED



-----------------+-----------------------------------------------------------------

20 80 | 73.6 78.9 76.9 77.9 92.1 113.0

160 | 87.5 94.8 90.3 90.8 107.9 142.1

320 | 93.1 100.8 95.7 96.2 115.9 153.6

640 | 96.8 104.1 99.7 100.2 117.4 158.9

1280 | 98.8 105.6 101.2 100.7 116.6 157.3

-----------------+-----------------------------------------------------------------

40 80 | 25.0 25.8 25.3 24.4 26.5 30.4

160 | 38.5 39.8 38.9 36.9 40.1 47.2

320 | 41.2 43.0 41.3 40.1 44.2 52.9

640 | 45.3 46.8 45.7 44.0 48.4 59.8

1280 | 45.7 47.5 46.3 44.3 48.3 58.8

-----------------+-----------------------------------------------------------------

80 160 | 12.4 12.7 12.4 11.6 12.1 13.3

320 | 18.5 18.8 18.6 17.1 17.6 19.5

640 | 21.1 21.4 20.9 19.1 19.9 22.1

1280 | 22.3 22.7 22.2 20.7 21.2 23.9

-----------------+-----------------------------------------------------------------

160 320 | 6.2 6.2 6.1 5.5 5.5 5.8

640 | 9.0 9.1 9.0 8.2 8.2 8.8

1280 | 10.4 10.5 10.3 9.6 9.5 10.1

-----------------+-----------------------------------------------------------------

320 640 | 3.2 3.2 3.1 2.8 2.8 2.9

1280 | 4.5 4.5 4.4 4.1 4.0 4.1

-----------------+-----------------------------------------------------------------

640 1280 | 1.5 1.5 1.5 1.4 1.3 1.4

-----------------------------------------------------------------------------------

SEP = Simple expansion predictor, CNT = Design-based predictor (DP) using

continuous age, MED = DP using median age, QRT = DP using quartiles of age,

OCT= DP using octiles of age, SED=DP using sedeciles (16-quantiles) of age.


7.4.4. MSEs and variances of the estimators when relationship between smoker

proportion and age is monotonic (Series A)

When the relationship between smoker proportions and age is monotonic, as
shown in Table 7.6 and Figures 7.5 and 7.6, the estimators adjusted for age categories
had larger MSEs than the estimators adjusted for age mean (CNT hereafter). The
more categories the age variable had, the larger the MSEs were. For $n \le 80$, estimators
adjusted for age quartiles, octiles and sedeciles had larger MSEs than those adjusted for
mean or median age. When the sample size became moderate or large ($n \ge 160$), the MSEs
of the estimators adjusted for age quartiles or octiles became smaller than those of the
estimators adjusted for median age, and came close to the MSEs of the estimators
adjusted for mean age.

Figure 7.6 presents the ratios of the MSE or variance of the MED, QRT, OCT and SED
estimators to the MSE or variance of the CNT estimators, respectively. The MED estimators
had slightly larger MSEs and variances than the CNT estimators; their MSE and
variance ratios were between 1 and 1.05. However, the MSE ratios of the QRT, OCT and SED
estimators were all larger than 1, regardless of sample size. The smaller
the sample sizes were, the greater the MSE ratios were. In
contrast, the smaller the sample sizes were, the smaller the variance ratios were, which
indicates that the sample-based variance estimators tended to underestimate the
variances of the estimators. The smaller the sample size was, the larger the difference
between the MSE and variance ratios. This phenomenon raises questions about the validity
of the 95% confidence intervals constructed using sample-based variance estimators.


Figure 7.5. MSE and variance of design-based predictors adjusted for continuous or categorical age (Series A)

[Two panels. First panel: MSE x 10,000 (0 to 80) against population size (80, 160, 320, 640, 1280); symbols: - = CNT, o = MED, x = QRT, + = OCT, D = SED. Second panel: variance x 10,000 (0 to 60); symbols as printed: - = SRS, o = CNT, x = MED, + = QTR, D = SED. In both panels, curves from top to bottom correspond to sample sizes 40, 80, 160, 320.]


Figure 7.6. MSE and variance ratios of estimators adjusted for age median, quartiles, octiles and sedeciles to estimators adjusted for age mean (Series A)

[Panels of MSE ratio and of variance ratio, one per estimation method (Median, Quartiles, Octiles, Sedeciles). Sample size symbols: - = 20, o = 40, x = 80, + = 160, T = 320, S = 640.]


7.4.5. MSEs and variances of the estimators when relationship between smoker

proportion and age is not monotonic (Series B)

When the relationship between smoker proportions and age is not monotonic, as
shown in Table 7.7 and Figures 7.7 and 7.8, the estimators adjusted for age median
(MED) or quartiles (QRT) had smaller MSEs than the estimators adjusted for age mean
(CNT), regardless of population or sample sizes. When sample sizes became larger than
80, the estimators adjusted for age octiles (OCT) became more efficient than the CNT
estimators. When sample sizes were larger than 160, the estimators adjusted for age
sedeciles (SED) became more efficient than the CNT and MED estimators.

When sample sizes were small ($n < 160$) or sampling fractions were high
($f \ge 25\%$), the variance ratios of the QRT to the CNT estimators were much smaller
than the MSE ratios. Such effects were more pronounced for the variance ratios of the
OCT and SED to the CNT estimators. This indicates that the sample-based variance
estimators tended to underestimate the variability of the estimators.

7.4.6. Negative variances

The variance estimates were calculated using (7.12). As shown in Table 7.8, the

probability of encountering negative variance estimates increased dramatically with the

number of categories the age variable had when sample sizes were small (n=20 or 40).

Since the sample covariances between the response and the auxiliary variable(s) were used for
estimation, the higher probabilities of encountering negative variance estimates might
be attributable to the larger variation of the sample covariances when sample sizes were
small.


Figure 7.7. MSE and variance of design-based predictors adjusted for continuous or categorical age (Series B)

[Figure 7.7 panels (axis residue removed). MSE panel: vertical axis MSE x 10,000; sample sizes (from top to bottom): 40, 80, 160, 320; symbols: - = CNT, o = MED, x = QRT, + = OCT, D = SED. Variance panel: vertical axis Variance x 10,000; sample sizes (from top to bottom): 40, 80, 160, 320; symbols: . = SRS, o = CNT, x = MED, + = QTR, D = SED.]

Figure 7.8. MSE and variance ratios of estimators adjusted for age median, quartiles, octiles and sedeciles to estimators adjusted for age mean (Series B)

[Figure 7.8 panels (axis residue removed). MSE ratio panels: sample sizes - = 20, o = 40, x = 80, + = 160, T = 320, D = 640. Variance ratio panels: sample sizes - = 20, o = 40, x = 80, + = 160, T = 320, S = 640. The only panel title that survives in the extracted text is "Estimation method: Sedeciles".]

Table 7.8. Probability of encountering negative variance when sample size is small and age is used as a continuous or categorical auxiliary variable

-----------------------------------------------------------------------------------
Population Series A
-----------------------------------------------------------------------------------
Sample size and  |                      Estimation method
Population size  |    SEP     CNT     MED     QRT     OCT     SED
-----------------+-----------------------------------------------------------------
 20         80   |    ---     0.4     ---     0.5     3.0    38.1
           160   |    ---     0.2     0.1     0.8     5.9    49.4
           320   |    ---     0.4     0.1     1.1     6.4    45.9
           640   |    ---     0.2     0.1     0.8     7.0    49.3
          1280   |    ---     0.5     0.1     0.9     6.9    50.1
-----------------+-----------------------------------------------------------------
 40         80   |    ---     ---     ---     ---     ---     ---
           160   |    ---     ---     ---     ---     ---     1.3
           320   |    ---     ---     ---     ---     ---     1.7
           640   |    ---     ---     ---     ---     0.1     2.7
          1280   |    ---     ---     ---     ---     0.2     2.9
-----------------------------------------------------------------------------------
Population Series B
-----------------------------------------------------------------------------------
Sample size and  |                      Estimation method
Population size  |    SEP     CNT     MED     QRT     OCT     SED
-----------------+-----------------------------------------------------------------
 20         80   |    ---     ---     ---     0.1     1.7    28.9
           160   |    ---     ---     ---     0.2     3.4    31.6
           320   |    ---     ---     ---     0.2     3.9    35.2
           640   |    ---     ---     ---     0.2     4.4    36.1
          1280   |    ---     ---     ---     0.4     5.0    39.0
-----------------+-----------------------------------------------------------------
 40         80   |    ---     ---     ---     ---     ---     ---
           160   |    ---     ---     ---     ---     ---     0.3
           320   |    ---     ---     ---     ---     ---     0.8
           640   |    ---     ---     ---     ---     0.1     1.3
          1280   |    ---     ---     ---     ---     0.1     1.6
-----------------+-----------------------------------------------------------------

--- = not applicable or negligible (<0.1%). SEP = Simple expansion predictor, CNT =
Design-based predictor (DP) using continuous age, MED = DP using median of
age, QRT = DP using quartiles of age, OCT = DP using octiles of age, SED = DP
using sedeciles (16-quantiles) of age. The column headings were lost in extraction
and are restored here from this list of abbreviations and the layout of Table 7.9.
When sample size ≥ 80, the probability of encountering negative variance is
negligible (<0.1%).

7.4.7. Performance of 95% confidence intervals

The 95% confidence intervals (CIs) were constructed using the normal approximation
with variance estimates based on (7.12). Regardless of the functional form relating
smoker proportion to age, the 95% CIs of the estimators adjusted
for age mean and quantiles performed worse than those of the simple expansion estimators when
sample sizes were very small (n=20). The more categories the age variable had, the poorer
the coverage rates were. When the sample size was 20, the 95% CIs constructed based on age
octiles or sedeciles had coverage rates as low as 70% or 30%, respectively. When
sample sizes became moderate (n=80), the 95% CIs for estimators adjusted for age
mean, median and quartiles had coverage rates close to the nominal level. Not until
sample sizes became larger (n=320 or 640) did the 95% CIs for estimators adjusted for age
octiles or sedeciles have coverage rates close to the nominal level.

Results in Tables 7.9 and 7.10 suggest that the confidence
intervals became narrower as the number of categories of age increased,
especially when sample sizes were small. When the relationship between smoker
proportions and age was not monotonic and the sample size was moderate or large (n≥320),
the 95% CIs of the estimators adjusted for age quartiles, octiles or sedeciles were much
narrower than those adjusted for age mean or median, yet their coverage rates remained close to
the nominal level.


Table 7.9. Coverage rates and width of 95% confidence interval using age as a continuous or categorical variable, by population size and sample size (Series A)

-----------------------------------------------------------------------------------
Sample size and  |                      Estimation method
Population size  |    SEP     CNT     MED     QRT     OCT     SED
-----------------+-----------------------------------------------------------------
 20         80   |   92.9    89.0    90.5    85.0    71.9    32.6
                 |   37.9    31.1    32.7    29.9    25.5    16.8
           160   |   93.8    89.6    90.2    84.1    69.6    26.0
                 |   40.9    34.6    35.1    31.6    26.6    17.1
           320   |   93.4    89.8    91.1    83.9    69.0    28.3
                 |   42.2    35.5    36.4    32.7    27.2    18.6
           640   |   93.1    89.6    90.6    83.7    68.4    25.8
                 |   42.9    36.0    36.8    32.9    27.1    18.4
          1280   |   92.3    89.5    90.5    84.7    69.1    26.5
                 |   43.2    36.1    37.0    33.3    27.3    18.9
-----------------+-----------------------------------------------------------------
 40         80   |   95.8    92.9    92.9    91.4    87.9    78.3
                 |   21.9    18.8    19.6    18.8    17.9    15.2
           160   |   96.2    93.5    93.6    91.1    86.8    72.9
                 |   26.8    23.6    23.9    22.6    21.2    16.7
           320   |   93.4    92.4    92.9    90.5    85.6    72.0
                 |   28.9    25.1    25.7    24.2    22.4    18.6
           640   |   93.5    93.1    93.4    91.2    85.9    71.0
                 |   29.9    26.0    26.5    24.9    22.8    18.6
          1280   |   95.0    93.0    93.5    91.1    86.4    71.1
                 |   30.4    26.2    26.9    25.3    23.1    18.7
-----------------+-----------------------------------------------------------------
 80        160   |   96.2    94.3    94.4    93.4    91.7    87.4
                 |   15.5    13.9    14.0    13.5    13.2    11.9
           320   |   94.9    93.9    94.7    93.4    91.4    87.5
                 |   18.9    16.7    17.1    16.4    15.9    14.9
           640   |   94.6    94.6    94.7    93.7    91.9    86.8
                 |   20.4    18.1    18.4    17.7    17.1    15.8
          1280   |   95.4    94.1    93.9    93.1    90.9    85.7
                 |   21.1    18.6    19.1    18.3    17.6    16.2
-----------------+-----------------------------------------------------------------
160        320   |   94.3    94.8    94.5    94.4    93.8    91.7
                 |   10.9     9.7     9.9     9.7     9.5     9.3
           640   |   95.2    95.1    94.9    94.1    93.5    91.7
                 |   13.4    12.0    12.2    11.8    11.6    11.2
          1280   |   95.0    94.5    94.5    94.0    93.1    91.6
                 |   14.4    12.8    13.1    12.7    12.4    12.0
-----------------+-----------------------------------------------------------------
320        640   |   95.5    94.7    94.8    94.8    94.3    93.6
                 |    7.7     6.9     7.0     6.9     6.8     6.7
          1280   |   95.1    94.8    95.0    94.8    94.2    93.2
                 |    9.4     8.4     8.6     8.4     8.3     8.1
-----------------+-----------------------------------------------------------------
640       1280   |   95.7    94.9    94.8    95.0    94.8    94.3
                 |    5.5     4.9     5.0     4.9     4.8     4.8
-----------------------------------------------------------------------------------

For each combination of sample size and population size, the first row gives the coverage rate (%) and the second row the width of the 95% confidence interval. SEP = Simple expansion predictor, CNT = Design-based predictor (DP) using continuous age, MED = DP using median of age, QRT = DP using quartiles of age, OCT = DP using octiles of age, SED = DP using sedeciles (16-quantiles) of age.

Table 7.10. Coverage rates and width of 95% confidence interval using age as a continuous or categorical variable, by population and sample sizes (Series B)

-----------------------------------------------------------------------------------
Sample size and  |                      Estimation method
Population size  |    SEP     CNT     MED     QRT     OCT     SED
-----------------+-----------------------------------------------------------------
 20         80   |   94.7    92.5    91.6    86.7    75.7    36.0
                 |   33.3    32.0    31.4    27.9    23.9    15.3
           160   |   92.8    91.6    91.0    86.3    73.5    34.8
                 |   36.1    34.6    34.0    30.1    25.2    16.1
           320   |   91.7    90.5    89.7    85.0    71.4    32.9
                 |   37.3    35.7    35.1    30.9    25.6    17.0
           640   |   91.4    90.2    90.1    85.2    72.0    32.3
                 |   37.9    36.2    35.6    31.2    25.9    17.0
          1280   |   91.3    90.3    90.2    85.1    71.4    30.3
                 |   38.2    36.5    35.8    31.4    25.9    17.0
-----------------+-----------------------------------------------------------------
 40         80   |   91.6    91.9    92.3    91.4    88.2    79.2
                 |   19.4    19.0    18.8    17.5    16.6    14.1
           160   |   94.7    93.6    93.0    90.8    87.1    76.3
                 |   23.8    23.3    23.1    21.3    19.8    16.7
           320   |   94.3    93.6    93.6    91.8    87.1    76.1
                 |   25.7    25.2    24.9    22.9    21.1    17.6
           640   |   92.8    92.6    92.9    90.8    86.3    73.4
                 |   26.5    26.0    25.6    23.5    21.6    18.0
          1280   |   92.4    92.6    92.4    90.9    86.2    74.1
                 |   26.9    26.3    26.0    23.9    21.7    18.0
-----------------+-----------------------------------------------------------------
 80        160   |   94.5    94.4    94.2    94.4    92.6    87.6
                 |   13.8    13.7    13.5    12.7    12.3    11.6
           320   |   93.9    93.8    93.8    92.9    91.2    87.8
                 |   16.8    16.6    16.5    15.5    14.8    13.9
           640   |   93.9    94.2    94.2    94.1    92.1    88.7
                 |   18.2    18.0    17.8    16.7    15.9    14.9
          1280   |   95.1    94.6    94.4    93.7    91.9    87.7
                 |   18.8    18.6    18.4    17.2    16.4    15.3
-----------------+-----------------------------------------------------------------
160        320   |   95.4    95.2    94.5    94.4    94.0    92.6
                 |    9.7     9.7     9.6     9.1     8.9     8.6
           640   |   94.8    94.6    94.7    94.4    93.7    92.2
                 |   11.9    11.8    11.7    11.1    10.8    10.5
          1280   |   95.4    94.6    94.8    94.6    93.8    91.8
                 |   12.9    12.8    12.7    11.9    11.6    11.2
-----------------+-----------------------------------------------------------------
320        640   |   94.9    94.8    94.6    94.6    94.2    93.5
                 |    6.9     6.9     6.8     6.4     6.3     6.2
          1280   |   95.5    95.2    95.4    95.2    94.7    93.9
                 |    8.4     8.4     8.3     7.9     7.7     7.6
-----------------+-----------------------------------------------------------------
640       1280   |   95.2    95.2    95.2    95.1    94.7    94.4
                 |    4.9     4.9     4.8     4.6     4.5     4.5
-----------------------------------------------------------------------------------

For each combination of sample size and population size, the first row gives the coverage rate (%) and the second row the width of the 95% confidence interval. SEP = Simple expansion predictor, CNT = Design-based predictor (DP) using continuous age, MED = DP using median of age, QRT = DP using quartiles of age, OCT = DP using octiles of age, SED = DP using sedeciles (16-quantiles) of age.

In conclusion, the estimators adjusted for age categories were more efficient
than the estimators adjusted for age mean when the relationship between smoker
proportions and age was not monotonic, and this gain in efficiency
became more apparent as the sample size increased. However, we caution against using
categorical auxiliary variables to improve rate estimation when the sample size is small,
because of the higher probability of encountering negative variance estimates and the
overall underestimation of the variances. In the
simulations with sample size less than 40, using more than 2 categories resulted in a loss
of efficiency. When sample size is greater than 80 and less than 160, age quartiles may
be used. When the sample size is moderate (n=320 or 640), octiles may be used. As a
rough guide based on these limited simulations, the average number of subjects per group in the
sample should not be less than 20.

7.5. Derivations of results (7.1) and (7.2) in Section 7.2.

By applying results derived in Section 5.4 of Chapter 5, we have

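\[
\hat r^{(0)} = r^{(0)} - \boldsymbol{\beta}_{0\bullet}'\left(\bar{\mathbf{Y}}^{(\bullet)} - \boldsymbol{\mu}_{\bullet}\right), \tag{7.16}
\]
where $\boldsymbol{\beta}_{0\bullet} = \boldsymbol{\Sigma}_{\bullet}^{-1}\boldsymbol{\sigma}_{0\bullet}$, $\boldsymbol{\mu}_{\bullet} = \left(\frac{N_1}{N}\;\; \frac{N_2}{N}\right)'$ and $\bar{\mathbf{Y}}^{(\bullet)} = \left(\bar Y^{(1)}\;\; \bar Y^{(2)}\right)'$,
\[
\boldsymbol{\Sigma}_{\bullet} = \frac{1}{N(N-1)}
\begin{pmatrix} N_1(N-N_1) & -N_1N_2 \\ -N_1N_2 & N_2(N-N_2) \end{pmatrix},
\qquad
\boldsymbol{\Sigma}_{\bullet}^{-1} = \frac{N-1}{N_1N_2N_3}
\begin{pmatrix} N_2(N-N_2) & N_1N_2 \\ N_1N_2 & N_1(N-N_1) \end{pmatrix}.
\]
Since
\[
\hat\sigma_0^2 = \frac{n_0(n-n_0)}{n(n-1)}
\quad\text{and}\quad
\hat{\boldsymbol{\sigma}}_{0\bullet} = \frac{1}{n(n-1)}
\begin{pmatrix} nn_{01} - n_0n_1 \\ nn_{02} - n_0n_2 \end{pmatrix},
\]
we have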

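\[
\begin{aligned}
\hat{\boldsymbol{\beta}}_{0\bullet}
&= \frac{N-1}{n(n-1)N_1N_2N_3}
\begin{pmatrix} N_2(N-N_2) & N_1N_2 \\ N_1N_2 & N_1(N-N_1) \end{pmatrix}
\begin{pmatrix} nn_{01} - n_0n_1 \\ nn_{02} - n_0n_2 \end{pmatrix} \\
&= \frac{N-1}{n(n-1)N_1N_2N_3}
\begin{pmatrix}
N_2(N-N_2)(nn_{01} - n_0n_1) + N_1N_2(nn_{02} - n_0n_2) \\
N_1N_2(nn_{01} - n_0n_1) + N_1(N-N_1)(nn_{02} - n_0n_2)
\end{pmatrix} \\
&= \frac{N-1}{n(n-1)}
\begin{pmatrix} \dfrac{nn_{01} - n_0n_1}{N_1} \\[2ex] \dfrac{nn_{02} - n_0n_2}{N_2} \end{pmatrix}
- \frac{N-1}{n(n-1)}\,\frac{nn_{03} - n_0n_3}{N_3}
\begin{pmatrix} 1 \\ 1 \end{pmatrix},
\end{aligned}
\]
where the last step uses $N = N_1 + N_2 + N_3$ and $\sum_{k=1}^{3}(nn_{0k} - n_0n_k) = 0$.

Consequently, with $r^{(0)} = n_0/n$ and $\bar{\mathbf{Y}}^{(\bullet)} = (n_1/n \;\; n_2/n)'$,
\[
\begin{aligned}
\hat r^{(0)}
&= \frac{n_0}{n} - \hat{\boldsymbol{\beta}}_{0\bullet}'
\begin{pmatrix} \dfrac{n_1}{n} - \dfrac{N_1}{N} \\[2ex] \dfrac{n_2}{n} - \dfrac{N_2}{N} \end{pmatrix} \\
&= \frac{n_0}{n} - \frac{N-1}{n(n-1)} \sum_{k=1}^{2}
\left(\frac{nn_{0k} - n_0n_k}{N_k} - \frac{nn_{03} - n_0n_3}{N_3}\right)
\left(\frac{n_k}{n} - \frac{N_k}{N}\right),
\end{aligned}
\]
and, because $\sum_{k=1}^{3}\left(\frac{n_k}{n} - \frac{N_k}{N}\right) = 0$,
\[
\hat r^{(0)} = \frac{n_0}{n} - \frac{N-1}{n-1} \sum_{k=1}^{3}
\left\{ \left(\frac{n_k}{n} - \frac{N_k}{N}\right)
\left(\frac{n_{0k}}{N_k} - \frac{n_0 n_k}{n N_k}\right) \right\}. \tag{7.17}
\]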

This is a “post-stratified” estimator. To obtain its variance, we first evaluate $\hat\rho_{0\bullet}^2$:
\[
\begin{aligned}
\hat\rho_{0\bullet}^2
&= \hat\sigma_0^{-2}\, \hat{\boldsymbol{\sigma}}_{0\bullet}' \boldsymbol{\Sigma}_{\bullet}^{-1} \hat{\boldsymbol{\sigma}}_{0\bullet} \\
&= \frac{N-1}{n_0(n-n_0)\,n(n-1)\,N_1N_2N_3}
\begin{pmatrix} nn_{01} - n_0n_1 & nn_{02} - n_0n_2 \end{pmatrix}
\begin{pmatrix} N_2(N-N_2) & N_1N_2 \\ N_1N_2 & N_1(N-N_1) \end{pmatrix}
\begin{pmatrix} nn_{01} - n_0n_1 \\ nn_{02} - n_0n_2 \end{pmatrix} \\
&= \frac{N-1}{n_0(n-n_0)\,n(n-1)\,N_1N_2N_3}
\left\{ N_2N_3(nn_{01} - n_0n_1)^2 + N_1N_3(nn_{02} - n_0n_2)^2 + N_1N_2(nn_{03} - n_0n_3)^2 \right\} \\
&= \frac{N-1}{n-1}\,\frac{1}{n_0(n-n_0)} \sum_{k=1}^{3} \frac{(nn_{0k} - n_0n_k)^2}{nN_k}.
\end{aligned}
\]
Therefore, its variance is
\[
\operatorname{var}\left(\hat r^{(0)}\right)
= \frac{1-f}{n}\,\frac{n_0(n-n_0)}{n(n-1)}
- \frac{1-f}{n}\,\frac{N-1}{(n-1)^2} \sum_{k=1}^{3} \frac{1}{N_k}\left(n_{0k} - \frac{n_0n_k}{n}\right)^2. \tag{7.18}
\]

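Results (7.17) and (7.18) can be readily extended to scenarios where $p > 3$
categories are present:
\[
\hat r^{(0)} = \frac{n_0}{n} - \frac{N-1}{n-1} \sum_{k=1}^{p}
\left\{ \left(\frac{n_k}{n} - \frac{N_k}{N}\right)
\left(\frac{n_{0k}}{N_k} - \frac{n_0 n_k}{n N_k}\right) \right\}, \tag{7.19}
\]
and
\[
\operatorname{var}\left(\hat r^{(0)}\right)
= \frac{1-f}{n} \left\{ \frac{n_0(n-n_0)}{n(n-1)}
- \frac{N-1}{(n-1)^2} \sum_{k=1}^{p} \frac{1}{N_k}\left(n_{0k} - \frac{n_0n_k}{n}\right)^2 \right\}. \tag{7.20}
\]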

Further, if the sample estimator of $\boldsymbol{\Sigma}_{\bullet}$ is used, we have
\[
\hat{\boldsymbol{\Sigma}}_{\bullet}^{-1} = \frac{n-1}{n_1n_2n_3}
\begin{pmatrix} n_2(n-n_2) & n_1n_2 \\ n_1n_2 & n_1(n-n_1) \end{pmatrix},
\quad\text{and}\quad
\hat{\boldsymbol{\sigma}}_{0\bullet} = \frac{1}{n(n-1)}
\begin{pmatrix} nn_{01} - n_0n_1 \\ nn_{02} - n_0n_2 \end{pmatrix},
\]
and then
\[
\hat{\boldsymbol{\beta}}_{0\bullet} =
\begin{pmatrix} \dfrac{n_{01}}{n_1} - \dfrac{n_0}{n} \\[2ex] \dfrac{n_{02}}{n_2} - \dfrac{n_0}{n} \end{pmatrix}
- \left(\frac{n_{03}}{n_3} - \frac{n_0}{n}\right)
\begin{pmatrix} 1 \\ 1 \end{pmatrix}. \tag{7.21}
\]
Substituting (7.21) into the estimator gives
\[
\begin{aligned}
\hat r^{(0)}
&= r^{(0)} - \hat{\boldsymbol{\beta}}_{0\bullet}'\bar{\mathbf{Y}}^{(\bullet)} + \hat{\boldsymbol{\beta}}_{0\bullet}'\boldsymbol{\mu}_{\bullet} \\
&= \frac{n_0}{n} - \sum_{k=1}^{2}
\left(\frac{n_{0k}}{n_k} - \frac{n_{03}}{n_3}\right)\left(\frac{n_k}{n} - \frac{N_k}{N}\right),
\end{aligned}
\]
and, because $\sum_{k=1}^{3}\left(\frac{n_k}{n} - \frac{N_k}{N}\right) = 0$,
\[
\hat r^{(0)} = \frac{n_0}{n} - \sum_{k=1}^{3}
\left(\frac{n_k}{n} - \frac{N_k}{N}\right)\left(\frac{n_{0k}}{n_k} - \frac{n_0}{n}\right). \tag{7.22}
\]

To obtain its variance, we first calculate $\hat\rho_{0\bullet}^2$,
\[
\begin{aligned}
\hat\rho_{0\bullet}^2
&= \hat\sigma_0^{-2}\, \hat{\boldsymbol{\sigma}}_{0\bullet}' \hat{\boldsymbol{\Sigma}}_{\bullet}^{-1} \hat{\boldsymbol{\sigma}}_{0\bullet} \\
&= \frac{1}{n\,n_1n_2n_3\,n_0(n-n_0)}
\begin{pmatrix} nn_{01} - n_0n_1 & nn_{02} - n_0n_2 \end{pmatrix}
\begin{pmatrix} n_2(n-n_2) & n_1n_2 \\ n_1n_2 & n_1(n-n_1) \end{pmatrix}
\begin{pmatrix} nn_{01} - n_0n_1 \\ nn_{02} - n_0n_2 \end{pmatrix}.
\end{aligned}
\]
This can be simplified as follows.
\[
\begin{aligned}
\hat\rho_{0\bullet}^2
&= \frac{1}{n\,n_0(n-n_0)\,n_1n_2n_3}
\left\{ n_2n_3(nn_{01} - n_0n_1)^2 + n_1n_3(nn_{02} - n_0n_2)^2 + n_1n_2(nn_{03} - n_0n_3)^2 \right\} \\
&= \frac{n}{n_0(n-n_0)\,n_1n_2n_3}
\left\{ n_2n_3n_{01}^2 + n_1n_3n_{02}^2 + n_1n_2n_{03}^2 - \frac{n_1n_2n_3\,n_0^2}{n} \right\} \\
&= \frac{n}{n_0(n-n_0)}
\left\{ \frac{n_{01}^2}{n_1} + \frac{n_{02}^2}{n_2} + \frac{n_{03}^2}{n_3} - \frac{n_0^2}{n} \right\} \\
&= \frac{n}{n_0(n-n_0)}
\left\{ \sum_{k=1}^{3} \frac{n_{0k}^2}{n_k} - \frac{n_0^2}{n} \right\}.
\end{aligned}
\]
Therefore, the variance of estimator (7.22) is
\[
\operatorname{var}\left(\hat r^{(0)}\right) = \frac{1-f}{n(n-1)} \left( n_0 - \sum_{k=1}^{3} \frac{n_{0k}^2}{n_k} \right). \tag{7.23}
\]

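Analogously, these results can be extended to situations where the subjects are classified
into more than three groups (i.e., $p > 3$):
\[
\hat r^{(0)} = \frac{n_0}{n} - \sum_{k=1}^{p}
\left(\frac{n_k}{n} - \frac{N_k}{N}\right)\left(\frac{n_{0k}}{n_k} - \frac{n_0}{n}\right), \tag{7.24}
\]
and its variance is
\[
\operatorname{var}\left(\hat r^{(0)}\right) = \frac{1-f}{n(n-1)} \left( n_0 - \sum_{k=1}^{p} \frac{n_{0k}^2}{n_k} \right). \tag{7.25}
\]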
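As a numerical check on these closed forms, the following Python sketch evaluates the adjusted rate and its variance as read from (7.24) and (7.25), that is, $\hat r^{(0)} = n_0/n - \sum_k (n_k/n - N_k/N)(n_{0k}/n_k - n_0/n)$ and $\operatorname{var}(\hat r^{(0)}) = \frac{1-f}{n(n-1)}(n_0 - \sum_k n_{0k}^2/n_k)$, and compares the rate with the familiar post-stratified form $\sum_k (N_k/N)(n_{0k}/n_k)$. The function names, the assumption $f = n/N$, and the example counts are introduced here for illustration only; they are not taken from the simulations in this chapter. Exact rational arithmetic is used so that the two forms can be compared without rounding error.

```python
from fractions import Fraction


def adjusted_rate(n0k, nk, Nk):
    """Rate and variance as read from (7.24) and (7.25).

    n0k: sample counts of cases per category (hypothetical input name)
    nk:  sample sizes per category
    Nk:  known population sizes per category
    """
    n, N, n0 = sum(nk), sum(Nk), sum(n0k)
    f = Fraction(n, N)  # sampling fraction, assumed to be n/N
    adjustment = sum(
        (Fraction(a, n) - Fraction(b, N)) * (Fraction(c, a) - Fraction(n0, n))
        for c, a, b in zip(n0k, nk, Nk)
    )
    rate = Fraction(n0, n) - adjustment
    var = (1 - f) / (n * (n - 1)) * (n0 - sum(Fraction(c * c, a) for c, a in zip(n0k, nk)))
    return rate, var


def post_stratified(n0k, nk, Nk):
    """Classical post-stratified rate: sum of (N_k / N) * (n_0k / n_k)."""
    N = sum(Nk)
    return sum(Fraction(b, N) * Fraction(c, a) for c, a, b in zip(n0k, nk, Nk))


# Illustrative counts for three categories (not data from the dissertation).
rate, var = adjusted_rate([6, 10, 4], [20, 40, 20], [300, 400, 300])
print(float(rate), float(var), rate == post_stratified([6, 10, 4], [20, 40, 20], [300, 400, 300]))
```

For these illustrative counts the two expressions for the rate coincide, which is consistent with the “post-stratified” interpretation given in the text.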

7.6. Derivations of results (7.3) and (7.4) in Section 7.2

By applying results derived in Section 5.4 of Chapter 5, we have

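\[
\hat r^{(0)} = r^{(0)} - \boldsymbol{\beta}_{0\bullet}'\left(\bar{\mathbf{Y}}^{(\bullet)} - \boldsymbol{\mu}_{\bullet}\right), \tag{7.26}
\]
where $\boldsymbol{\beta}_{0\bullet} = \boldsymbol{\Sigma}_{\bullet}^{-1}\boldsymbol{\sigma}_{0\bullet}$, $\boldsymbol{\mu}_{\bullet} = \left(\frac{N_1}{N}\;\; \frac{N_2}{N}\right)'$ and $\bar{\mathbf{Y}}^{(\bullet)} = \left(\bar Y^{(1)}\;\; \bar Y^{(2)}\right)'$,
\[
\boldsymbol{\Sigma}_{\bullet} = \frac{1}{N(N-1)}
\begin{pmatrix} N_1(N-N_1) & -N_1N_2 \\ -N_1N_2 & N_2(N-N_2) \end{pmatrix},
\]
and
\[
\boldsymbol{\Sigma}_{\bullet}^{-1} = \frac{N-1}{N_1N_2N_3}
\begin{pmatrix} N_2(N-N_2) & N_1N_2 \\ N_1N_2 & N_1(N-N_1) \end{pmatrix}.
\]
It is shown that
\[
\hat\sigma_0^2 = \frac{n_0(n-n_0)}{n(n-1)}
\quad\text{and}\quad
\hat{\boldsymbol{\sigma}}_{0\bullet} = \frac{1}{n(n-1)}
\begin{pmatrix} nn_{01} - n_0n_1 \\ nn_{02} - n_0n_2 \end{pmatrix}.
\]
Therefore,
\[
\begin{aligned}
\hat{\boldsymbol{\beta}}_{0\bullet}
&= \frac{N-1}{n(n-1)N_1N_2N_3}
\begin{pmatrix} N_2(N-N_2) & N_1N_2 \\ N_1N_2 & N_1(N-N_1) \end{pmatrix}
\begin{pmatrix} nn_{01} - n_0n_1 \\ nn_{02} - n_0n_2 \end{pmatrix} \\
&= \frac{N-1}{n(n-1)N_1N_2N_3}
\begin{pmatrix}
N_2(N-N_2)(nn_{01} - n_0n_1) + N_1N_2(nn_{02} - n_0n_2) \\
N_1N_2(nn_{01} - n_0n_1) + N_1(N-N_1)(nn_{02} - n_0n_2)
\end{pmatrix} \\
&= \frac{N-1}{n(n-1)}
\begin{pmatrix} \dfrac{nn_{01} - n_0n_1}{N_1} \\[2ex] \dfrac{nn_{02} - n_0n_2}{N_2} \end{pmatrix}
- \frac{N-1}{n(n-1)}\,\frac{nn_{03} - n_0n_3}{N_3}
\begin{pmatrix} 1 \\ 1 \end{pmatrix},
\end{aligned}
\]
and, with $r^{(0)} = n_0/n$ and $\bar{\mathbf{Y}}^{(\bullet)} = (n_1/n \;\; n_2/n)'$,
\[
\begin{aligned}
\hat r^{(0)}
&= r^{(0)} - \hat{\boldsymbol{\beta}}_{0\bullet}'\bar{\mathbf{Y}}^{(\bullet)} + \hat{\boldsymbol{\beta}}_{0\bullet}'\boldsymbol{\mu}_{\bullet} \\
&= \frac{n_0}{n} - \frac{N-1}{n(n-1)} \sum_{k=1}^{2}
\left(\frac{nn_{0k} - n_0n_k}{N_k} - \frac{nn_{03} - n_0n_3}{N_3}\right)
\left(\frac{n_k}{n} - \frac{N_k}{N}\right) \\
&= \frac{n_0}{n} - \frac{N-1}{n-1} \sum_{k=1}^{3}
\left\{ \left(\frac{n_k}{n} - \frac{N_k}{N}\right)
\left(\frac{n_{0k}}{N_k} - \frac{n_0 n_k}{n N_k}\right) \right\}.
\end{aligned} \tag{7.27}
\]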

This is a “post-stratified” estimator.


Further, if the sample estimator of $\boldsymbol{\Sigma}_{\bullet}$ is used, we have
\[
\hat{\boldsymbol{\Sigma}}_{\bullet}^{-1} = \frac{n-1}{n_1n_2n_3}
\begin{pmatrix} n_2(n-n_2) & n_1n_2 \\ n_1n_2 & n_1(n-n_1) \end{pmatrix},
\quad\text{and}\quad
\hat{\boldsymbol{\sigma}}_{0\bullet} = \frac{1}{n(n-1)}
\begin{pmatrix} nn_{01} - n_0n_1 \\ nn_{02} - n_0n_2 \end{pmatrix},
\]
so that
\[
\begin{aligned}
\hat{\boldsymbol{\beta}}_{0\bullet}
&= \frac{1}{n\,n_1n_2n_3}
\begin{pmatrix} n_2(n-n_2) & n_1n_2 \\ n_1n_2 & n_1(n-n_1) \end{pmatrix}
\begin{pmatrix} nn_{01} - n_0n_1 \\ nn_{02} - n_0n_2 \end{pmatrix} \\
&= \frac{1}{n\,n_1n_2n_3}
\begin{pmatrix}
n_2(n-n_2)(nn_{01} - n_0n_1) + n_1n_2(nn_{02} - n_0n_2) \\
n_1n_2(nn_{01} - n_0n_1) + n_1(n-n_1)(nn_{02} - n_0n_2)
\end{pmatrix} \\
&= \begin{pmatrix} \dfrac{n_{01}}{n_1} - \dfrac{n_0}{n} \\[2ex] \dfrac{n_{02}}{n_2} - \dfrac{n_0}{n} \end{pmatrix}
- \left(\frac{n_{03}}{n_3} - \frac{n_0}{n}\right)
\begin{pmatrix} 1 \\ 1 \end{pmatrix},
\end{aligned}
\]
where the last step uses $n = n_1 + n_2 + n_3$ and $\sum_{k=1}^{3}(nn_{0k} - n_0n_k) = 0$. Then
\[
\begin{aligned}
\hat r^{(0)}
&= r^{(0)} - \hat{\boldsymbol{\beta}}_{0\bullet}'\bar{\mathbf{Y}}^{(\bullet)} + \hat{\boldsymbol{\beta}}_{0\bullet}'\boldsymbol{\mu}_{\bullet} \\
&= \frac{n_0}{n} - \sum_{k=1}^{2}
\left(\frac{n_{0k}}{n_k} - \frac{n_{03}}{n_3}\right)\left(\frac{n_k}{n} - \frac{N_k}{N}\right) \\
&= \frac{n_0}{n} - \sum_{k=1}^{3} \frac{n_k}{n}\left(\frac{n_{0k}}{n_k} - \frac{n_0}{n}\right)
+ \sum_{k=1}^{3} \frac{N_k}{N}\left(\frac{n_{0k}}{n_k} - \frac{n_0}{n}\right) \\
&= \frac{n_0}{n} - \sum_{k=1}^{3}
\left(\frac{n_k}{n} - \frac{N_k}{N}\right)\left(\frac{n_{0k}}{n_k} - \frac{n_0}{n}\right).
\end{aligned}
\]

CHAPTER 8

DIRECT RATE STANDARDIZATION USING

RANDOM PERMUTATION MODELS

8.1. Motivations

Results derived in Chapters 4 and 5 can also be used for “rate standardization”,
a common procedure in epidemiology and vital statistics. In this chapter, we will show
that direct rate standardization is a special case of the estimators derived in Chapter 5
when rates are assumed to be equal across populations. We may view direct rate
standardization as a prediction problem, where one is trying to predict the population
death rates of the pooled population based on the two “samples”. Both populations,
observed either completely or partially, can be considered as two parts resulting from a
random partition of the pooled population. Therefore, the previously described
estimation strategies can be applied with minimal modification.

Without loss of generality, we use the gender-adjusted death rate as an example.
The results can be extended to situations where the auxiliary variable is multinomial, such as
age-specific or age-and-gender-specific categories. We will consider a simple case of internal
direct standardization when the two populations under comparison are observed
completely.

8.2. Rate estimation with internal direct adjustment

A common approach to rate standardization is to “pool” the populations under
study, and use the “combined population structure” for adjustment. For example, if two
populations A and B are under comparison, let us denote their sizes and gender ratios (%
of males) by $N_A, g_A$ and $N_B, g_B$, respectively. Then the pooled population AB will
have size $N = N_A + N_B$ and gender ratio $g_{AB} = (N_A g_A + N_B g_B)/(N_A + N_B)$. The rate
standardization can be made using $g_{AB}$. This method was first proposed by Neison
(1844), who suggested comparing the mortality of two communities by having the
population of one community “actually transferred” to the other community and
subjecting it to “exactly the same rate of mortality as that prevailed” in the other
community (Neison 1844).

Let us consider a simple case. Let $y_j^{(0)}$ represent the survival status of a subject
($y = 1$ for death, and 0 for alive), and $y_j^{(1)}$ represent the subject’s gender (1 for male and 0
for female). Suppose that we are interested in estimating and comparing the death rates of
populations A and B (i.e., $\bar y_A^{(0)}$ and $\bar y_B^{(0)}$). Further, suppose that population A consists of
$N_{Am}$ males and $N_{Af}$ females, with $N_{Am} + N_{Af} = N_A$; population B consists of $N_{Bm}$ males
and $N_{Bf}$ females, with $N_{Bm} + N_{Bf} = N_B$. The pooled population AB thus consists of
$N_m = N_{Am} + N_{Bm}$ males and $N_f = N_{Af} + N_{Bf}$ females, that is, $N_m + N_f = N$ subjects. Under
Neison’s approach, populations A and B are considered as random partitions of the
pooled population AB. Based on parts A and B, we will obtain a set of two predicted
total numbers of deaths (or death rates), i.e., $\hat T_A^{(0)}$ and $\hat T_B^{(0)}$, as well as $\hat r_A$ and $\hat r_B$. The
variance-covariance matrix of $\hat T_A^{(0)}$ and $\hat T_B^{(0)}$ (or of $\hat r_A$ and $\hat r_B$) can also be obtained.
Further, we will be able to estimate the difference between $r_A$ and $r_B$ and the rate ratio of
$r_A$ to $r_B$. Since the predicted total numbers are predictors for the “same” population,
they are comparable.

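Suppose that the two populations are completely observed, and the two observed
populations are viewed as a realization of a random partition of the pooled population
into two parts that have the sizes $N_A$ and $N_B$. Let $\left(\mathbf{Y}^{(0)\prime}\;\; \mathbf{Y}^{(1)\prime}\right)'$ represent a random
permutation of the vector of response and auxiliary variables of the pooled population,
$\mathbf{z} = \left(\left(\mathbf{y}_A^{(0)\prime}\;\; \mathbf{y}_B^{(0)\prime}\right)\;\; \left(\mathbf{y}_A^{(1)\prime}\;\; \mathbf{y}_B^{(1)\prime}\right)\right)'$. Using the method outlined in Chapter 5, we transform
$\mathbf{Y}^{(1)}$ to $\mathbf{Y}^{(1)*} = \left(\mathbf{I}_N - \frac{1}{N}\mathbf{J}_N\right)\mathbf{Y}^{(1)}$. We then partition $\left(\mathbf{Y}^{(0)\prime}\;\; \mathbf{Y}^{(1)*\prime}\right)'$ into parts A and B
that have the same dimensions as populations A and B,
\[
\mathbf{Z} = \Big( \underbrace{\mathbf{Y}_A^{(0)\prime}\;\; \mathbf{Y}_A^{(1)*\prime}}_{\text{Part A}} \;\;
\underbrace{\mathbf{Y}_B^{(0)\prime}\;\; \mathbf{Y}_B^{(1)*\prime}}_{\text{Part B}} \Big)'. \tag{8.1}
\]
The total number of deaths in the combined population is defined as
\[
T^{(0)} = \begin{pmatrix} \boldsymbol{\ell}_A' & \boldsymbol{\ell}_B' \end{pmatrix}
\begin{pmatrix} \mathbf{Z}_A \\ \mathbf{Z}_B \end{pmatrix},
\]
where $\boldsymbol{\ell}_A = (1\;\; 0)' \otimes \mathbf{1}_{N_A}$, $\boldsymbol{\ell}_B = (1\;\; 0)' \otimes \mathbf{1}_{N_B}$, $\mathbf{Z}_A = \left(\mathbf{Y}_A^{(0)\prime}\;\; \mathbf{Y}_A^{(1)*\prime}\right)'$ and
$\mathbf{Z}_B = \left(\mathbf{Y}_B^{(0)\prime}\;\; \mathbf{Y}_B^{(1)*\prime}\right)'$.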

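Let us assume that we are interested in deriving estimators of the overall total
number of deaths $T^{(0)}$ based on populations A and B, i.e., $\hat T_A^{(0)}$ and $\hat T_B^{(0)}$, respectively.
They can be defined using (8.1) as
\[
\begin{pmatrix} \hat T_A^{(0)} \\ \hat T_B^{(0)} \end{pmatrix}
= \begin{pmatrix} \boldsymbol{\ell}_A' + \mathbf{w}_A' & \mathbf{0}' \\ \mathbf{0}' & \boldsymbol{\ell}_B' + \mathbf{w}_B' \end{pmatrix}
\begin{pmatrix} \mathbf{Z}_A \\ \mathbf{Z}_B \end{pmatrix}. \tag{8.2}
\]
The prediction error is defined as
\[
\begin{pmatrix} \hat T_A^{(0)} - T^{(0)} \\ \hat T_B^{(0)} - T^{(0)} \end{pmatrix}
= \begin{pmatrix} \boldsymbol{\ell}_A' + \mathbf{w}_A' & \mathbf{0}' \\ \mathbf{0}' & \boldsymbol{\ell}_B' + \mathbf{w}_B' \end{pmatrix}
\begin{pmatrix} \mathbf{Z}_A \\ \mathbf{Z}_B \end{pmatrix}
- \begin{pmatrix} \boldsymbol{\ell}_A' & \boldsymbol{\ell}_B' \\ \boldsymbol{\ell}_A' & \boldsymbol{\ell}_B' \end{pmatrix}
\begin{pmatrix} \mathbf{Z}_A \\ \mathbf{Z}_B \end{pmatrix}
= \begin{pmatrix} \mathbf{w}_A'\mathbf{Z}_A - \boldsymbol{\ell}_B'\mathbf{Z}_B \\ \mathbf{w}_B'\mathbf{Z}_B - \boldsymbol{\ell}_A'\mathbf{Z}_A \end{pmatrix}.
\]
The unbiasedness of $\hat T_A^{(0)}$ and $\hat T_B^{(0)}$ for $T^{(0)}$ implies that
\[
E \begin{pmatrix} \hat T_A^{(0)} - T^{(0)} \\ \hat T_B^{(0)} - T^{(0)} \end{pmatrix}
= E \begin{pmatrix} \mathbf{w}_A'\mathbf{Z}_A - \boldsymbol{\ell}_B'\mathbf{Z}_B \\ \mathbf{w}_B'\mathbf{Z}_B - \boldsymbol{\ell}_A'\mathbf{Z}_A \end{pmatrix}
= \mathbf{0}.
\]
Since $E(\mathbf{Z}_A) = \left(\mathbf{I}_2 \otimes \mathbf{1}_{N_A}\right)\left(\mu^{(0)}\;\; 0\right)'$ and $E(\mathbf{Z}_B) = \left(\mathbf{I}_2 \otimes \mathbf{1}_{N_B}\right)\left(\mu^{(0)}\;\; 0\right)'$, this constraint
is equivalent to
\[
\begin{pmatrix} \left(\mathbf{1}_{N_A}'\;\; \mathbf{0}'\right)\mathbf{w}_A - N_B \\ \left(\mathbf{1}_{N_B}'\;\; \mathbf{0}'\right)\mathbf{w}_B - N_A \end{pmatrix}
= \begin{pmatrix} \left(\mathbf{1}_{N_A}'\;\; \mathbf{0}'\right) & \mathbf{0}' \\ \mathbf{0}' & \left(\mathbf{1}_{N_B}'\;\; \mathbf{0}'\right) \end{pmatrix} \mathbf{w}
- \begin{pmatrix} N_B \\ N_A \end{pmatrix} = \mathbf{0}, \tag{8.3}
\]
where $\mathbf{w} = \left(\mathbf{w}_A'\;\; \mathbf{w}_B'\right)'$.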

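From Chapter 2, we have
\[
\operatorname{cov}\begin{pmatrix} \mathbf{Z}_A \\ \mathbf{Z}_B \end{pmatrix}
= \begin{pmatrix} \mathbf{V}_A & \mathbf{V}_{A,B} \\ \mathbf{V}_{B,A} & \mathbf{V}_B \end{pmatrix},
\]
where $\mathbf{V}_A = \boldsymbol{\Sigma} \otimes \mathbf{P}_{N_A,N}$, $\mathbf{V}_B = \boldsymbol{\Sigma} \otimes \mathbf{P}_{N_B,N}$ and $\mathbf{V}_{A,B} = \mathbf{V}_{B,A}' = \boldsymbol{\Sigma} \otimes \left(-\frac{1}{N}\mathbf{J}_{N_A \times N_B}\right)$. The variance-covariance
matrix of $\left(\hat T_A^{(0)}\;\; \hat T_B^{(0)}\right)'$ has diagonal elements
\[
\operatorname{var}\left(\hat T_A^{(0)}\right) = \mathbf{w}_A'\mathbf{V}_A\mathbf{w}_A - 2\,\mathbf{w}_A'\mathbf{V}_{A,B}\boldsymbol{\ell}_B + \boldsymbol{\ell}_B'\mathbf{V}_B\boldsymbol{\ell}_B,
\qquad
\operatorname{var}\left(\hat T_B^{(0)}\right) = \mathbf{w}_B'\mathbf{V}_B\mathbf{w}_B - 2\,\mathbf{w}_B'\mathbf{V}_{B,A}\boldsymbol{\ell}_A + \boldsymbol{\ell}_A'\mathbf{V}_A\boldsymbol{\ell}_A,
\]
and off-diagonal element
\[
\operatorname{cov}\left(\hat T_A^{(0)}, \hat T_B^{(0)}\right) = \mathbf{w}_A'\mathbf{V}_{A,B}\mathbf{w}_B - \mathbf{w}_A'\mathbf{V}_A\boldsymbol{\ell}_A - \boldsymbol{\ell}_B'\mathbf{V}_B\mathbf{w}_B + \boldsymbol{\ell}_B'\mathbf{V}_{B,A}\boldsymbol{\ell}_A.
\]
Furthermore, we can show that
\[
\operatorname{trace}\left\{ \operatorname{cov}\begin{pmatrix} \hat T_A^{(0)} \\ \hat T_B^{(0)} \end{pmatrix} \right\}
= \mathbf{w}' \begin{pmatrix} \mathbf{V}_A & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_B \end{pmatrix} \mathbf{w}
- 2\,\mathbf{w}' \begin{pmatrix} \mathbf{V}_{A,B} & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_{B,A} \end{pmatrix}
\begin{pmatrix} \boldsymbol{\ell}_B \\ \boldsymbol{\ell}_A \end{pmatrix}
+ \boldsymbol{\ell}_B'\mathbf{V}_B\boldsymbol{\ell}_B + \boldsymbol{\ell}_A'\mathbf{V}_A\boldsymbol{\ell}_A. \tag{8.4}
\]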

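Following the methods of Chapters 4 and 5, we derive the estimators $\hat T_A^{(0)}$ and
$\hat T_B^{(0)}$ by minimizing the trace of $\operatorname{cov}\left(\hat T_A^{(0)}\;\; \hat T_B^{(0)}\right)'$ subject to (8.3), which is equivalent to
minimizing the following optimization function
\[
\Phi(\mathbf{w}) = \mathbf{w}' \begin{pmatrix} \mathbf{V}_A & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_B \end{pmatrix} \mathbf{w}
- 2\,\mathbf{w}' \begin{pmatrix} \mathbf{V}_{A,B} & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_{B,A} \end{pmatrix}
\begin{pmatrix} \boldsymbol{\ell}_B \\ \boldsymbol{\ell}_A \end{pmatrix}
+ 2\,\boldsymbol{\lambda}' \left\{
\begin{pmatrix} \left(\mathbf{1}_{N_A}'\;\; \mathbf{0}'\right) & \mathbf{0}' \\ \mathbf{0}' & \left(\mathbf{1}_{N_B}'\;\; \mathbf{0}'\right) \end{pmatrix} \mathbf{w}
- \begin{pmatrix} N_B \\ N_A \end{pmatrix} \right\}.
\]
Differentiating $\Phi(\mathbf{w})$ with respect to $\mathbf{w}$ and $\boldsymbol{\lambda}$, and setting the derivatives to zero
results in the following estimating equations,
\[
\begin{pmatrix}
\begin{pmatrix} \mathbf{V}_A & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_B \end{pmatrix}
& \begin{pmatrix} \left(\mathbf{1}_{N_A}'\;\; \mathbf{0}'\right)' & \mathbf{0} \\ \mathbf{0} & \left(\mathbf{1}_{N_B}'\;\; \mathbf{0}'\right)' \end{pmatrix} \\[2ex]
\begin{pmatrix} \left(\mathbf{1}_{N_A}'\;\; \mathbf{0}'\right) & \mathbf{0}' \\ \mathbf{0}' & \left(\mathbf{1}_{N_B}'\;\; \mathbf{0}'\right) \end{pmatrix}
& \mathbf{0}
\end{pmatrix}
\begin{pmatrix} \hat{\mathbf{w}} \\ \hat{\boldsymbol{\lambda}} \end{pmatrix}
= \begin{pmatrix}
\begin{pmatrix} \mathbf{V}_{A,B} & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_{B,A} \end{pmatrix}
\begin{pmatrix} \boldsymbol{\ell}_B \\ \boldsymbol{\ell}_A \end{pmatrix} \\[2ex]
\begin{pmatrix} N_B \\ N_A \end{pmatrix}
\end{pmatrix}. \tag{8.5}
\]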

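Solving (8.5) yields the following unique solution,
\[
\hat{\mathbf{w}} = \begin{pmatrix} \hat{\mathbf{w}}_A \\ \hat{\mathbf{w}}_B \end{pmatrix}
= - \begin{pmatrix} (1\;\; 0)' \otimes \mathbf{1}_{N_A} \\ (1\;\; 0)' \otimes \mathbf{1}_{N_B} \end{pmatrix}
+ \begin{pmatrix} \dfrac{N}{N_A}\,(1\;\; -\beta_{01})' \otimes \mathbf{1}_{N_A} \\[2ex] \dfrac{N}{N_B}\,(1\;\; -\beta_{01})' \otimes \mathbf{1}_{N_B} \end{pmatrix}, \tag{8.6}
\]
where $\beta_{01} = \sigma_{01}/\sigma_1^2$. Using (8.6) and (8.2), it can be shown that
\[
\begin{pmatrix} \hat T_A^{(0)} \\ \hat T_B^{(0)} \end{pmatrix}
= \begin{pmatrix}
N\left(\bar Y_A^{(0)} - \beta_{01}\left(\bar Y_A^{(1)} - \mu^{(1)}\right)\right) \\[1ex]
N\left(\bar Y_B^{(0)} - \beta_{01}\left(\bar Y_B^{(1)} - \mu^{(1)}\right)\right)
\end{pmatrix}, \tag{8.7}
\]
where $\mu^{(1)}$ is the population mean of the auxiliary variable of the pooled population, and
$\bar Y_A^{(0)}$ and $\bar Y_A^{(1)}$, $\bar Y_B^{(0)}$ and $\bar Y_B^{(1)}$ are the means of the response and auxiliary variables of
subpopulations A and B, respectively. The variance-covariance matrix of $\left(\hat T_A^{(0)}\;\; \hat T_B^{(0)}\right)'$ is
\[
\operatorname{cov}\begin{pmatrix} \hat T_A^{(0)} \\ \hat T_B^{(0)} \end{pmatrix}
= N\left(\sigma_0^2 + \beta_{01}^2\sigma_1^2 - 2\beta_{01}\sigma_{01}\right)
\begin{pmatrix} \dfrac{N_B}{N_A} & -1 \\[2ex] -1 & \dfrac{N_A}{N_B} \end{pmatrix}. \tag{8.8}
\]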

Application to gender-adjusted death rates

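By definition, $\mu^{(1)} = \dfrac{N_m}{N} = \dfrac{N_{mA} + N_{mB}}{N_A + N_B}$,
$\sigma_{01} = \dfrac{N_mN_f}{N(N-1)}\left(\dfrac{N_{0m}}{N_m} - \dfrac{N_{0f}}{N_f}\right)$,
$\sigma_1^2 = \dfrac{N_mN_f}{N(N-1)}$, $\sigma_0^2 = \dfrac{N_0(N-N_0)}{N(N-1)}$, and thus
$\beta_{01} = \dfrac{N_{0m}}{N_m} - \dfrac{N_{0f}}{N_f}$. Here $N_{mA}$ and $N_{mB}$ are the numbers of males in A and B; $N_{0A}$ and $N_{0B}$ are the numbers of deaths in A and B; and $N_{0m}$, $N_{0f}$ and $N_0$ are the numbers of deaths among males, among females, and overall in the pooled population. When applied to
estimating gender-adjusted population totals of deaths, (8.7) can be simplified as
\[
\begin{pmatrix} \hat T_A^{(0)} \\ \hat T_B^{(0)} \end{pmatrix}
= \begin{pmatrix}
N\left\{\dfrac{N_{0A}}{N_A} - \dfrac{N_B}{N}\left(\dfrac{N_{mA}}{N_A} - \dfrac{N_{mB}}{N_B}\right)\left(\dfrac{N_{0m}}{N_m} - \dfrac{N_{0f}}{N_f}\right)\right\} \\[2ex]
N\left\{\dfrac{N_{0B}}{N_B} - \dfrac{N_A}{N}\left(\dfrac{N_{mB}}{N_B} - \dfrac{N_{mA}}{N_A}\right)\left(\dfrac{N_{0m}}{N_m} - \dfrac{N_{0f}}{N_f}\right)\right\}
\end{pmatrix}. \tag{8.9}
\]
Its variance-covariance matrix is
\[
\operatorname{cov}\begin{pmatrix} \hat T_A^{(0)} \\ \hat T_B^{(0)} \end{pmatrix}
= \left\{ \frac{N_0(N-N_0)}{N-1} - \frac{N_mN_f}{N-1}\left(\frac{N_{0m}}{N_m} - \frac{N_{0f}}{N_f}\right)^2 \right\}
\begin{pmatrix} \dfrac{N_B}{N_A} & -1 \\[2ex] -1 & \dfrac{N_A}{N_B} \end{pmatrix}. \tag{8.10}
\]
Similarly, the standardized death rates and their variance-covariance matrix are
\[
\begin{pmatrix} \hat r_A^{(0)} \\ \hat r_B^{(0)} \end{pmatrix}
= \begin{pmatrix}
\dfrac{N_{0A}}{N_A} - \dfrac{N_B}{N}\left(\dfrac{N_{mA}}{N_A} - \dfrac{N_{mB}}{N_B}\right)\left(\dfrac{N_{0m}}{N_m} - \dfrac{N_{0f}}{N_f}\right) \\[2ex]
\dfrac{N_{0B}}{N_B} - \dfrac{N_A}{N}\left(\dfrac{N_{mB}}{N_B} - \dfrac{N_{mA}}{N_A}\right)\left(\dfrac{N_{0m}}{N_m} - \dfrac{N_{0f}}{N_f}\right)
\end{pmatrix}, \tag{8.11}
\]
\[
\operatorname{cov}\begin{pmatrix} \hat r_A^{(0)} \\ \hat r_B^{(0)} \end{pmatrix}
= \frac{1}{N^2}\left\{ \frac{N_0(N-N_0)}{N-1} - \frac{N_mN_f}{N-1}\left(\frac{N_{0m}}{N_m} - \frac{N_{0f}}{N_f}\right)^2 \right\}
\begin{pmatrix} \dfrac{N_B}{N_A} & -1 \\[2ex] -1 & \dfrac{N_A}{N_B} \end{pmatrix}, \tag{8.12}
\]
respectively.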
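To make the internal direct adjustment concrete, the following Python sketch computes the gender-adjusted rates and their covariance matrix from a two-by-two layout of deaths and population counts, following the regression form (8.7) divided by $N$ and the covariance structure in (8.8). The function name, the dictionary layout keyed by (population, sex), and the example counts are hypothetical and serve only as an illustration under these stated assumptions.

```python
from fractions import Fraction


def standardized_rates(deaths, pop):
    """Gender-adjusted death rates for populations A and B (internal adjustment).

    deaths, pop: dicts keyed by (population, sex), e.g. ("A", "m").
    Returns (r_A, r_B, cov) with cov the 2x2 covariance matrix of the rates.
    """
    N_A = pop["A", "m"] + pop["A", "f"]
    N_B = pop["B", "m"] + pop["B", "f"]
    N = N_A + N_B
    N_m = pop["A", "m"] + pop["B", "m"]
    N_f = N - N_m
    N_0m = deaths["A", "m"] + deaths["B", "m"]
    N_0f = deaths["A", "f"] + deaths["B", "f"]
    N_0 = N_0m + N_0f

    # Slope of death on the male indicator in the pooled population.
    beta = Fraction(N_0m, N_m) - Fraction(N_0f, N_f)

    def rate(g, n_g):
        crude = Fraction(deaths[g, "m"] + deaths[g, "f"], n_g)
        # Shift the crude rate by the excess share of males relative to the pool.
        return crude - beta * (Fraction(pop[g, "m"], n_g) - Fraction(N_m, N))

    r_A, r_B = rate("A", N_A), rate("B", N_B)

    # sigma_0^2 + beta^2 sigma_1^2 - 2 beta sigma_01, which equals sigma_0^2 - beta^2 sigma_1^2.
    resid_var = (Fraction(N_0 * (N - N_0)) - N_m * N_f * beta ** 2) / (N * (N - 1))
    cov = [
        [resid_var * Fraction(N_B, N_A) / N, -resid_var / N],
        [-resid_var / N, resid_var * Fraction(N_A, N_B) / N],
    ]
    return r_A, r_B, cov


# Illustrative counts only (not real mortality data).
deaths = {("A", "m"): 30, ("A", "f"): 8, ("B", "m"): 24, ("B", "f"): 18}
pop = {("A", "m"): 600, ("A", "f"): 400, ("B", "m"): 400, ("B", "f"): 600}
r_A, r_B, cov = standardized_rates(deaths, pop)
print(float(r_A), float(r_B))
```

In this example the crude rates (0.038 and 0.042) move apart after adjustment, because population A has proportionally more males and males have the higher pooled death rate. The size-weighted sum of the adjusted rates still reproduces the pooled number of deaths.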


8.3. Rate comparison with internal direct adjustment

With the results established in the previous sections, rates can be compared

using either their differences or their ratios.

8.3.1. Rate differences

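When all units of the subpopulations are observed, one may evaluate the rate
difference by applying result (8.11). We define the rate difference between A and B as
\[
DR_{AB} = \hat r_A^{(0)} - \hat r_B^{(0)}
= \frac{N_{0A}}{N_A} - \frac{N_{0B}}{N_B}
- \left(\frac{N_{mA}}{N_A} - \frac{N_{mB}}{N_B}\right)\left(\frac{N_{0m}}{N_m} - \frac{N_{0f}}{N_f}\right). \tag{8.13}
\]
Applying result (8.12), the variance of $DR_{AB}$ is evaluated as
\[
\begin{aligned}
\operatorname{var}\left(DR_{AB}\right)
&= \begin{pmatrix} 1 & -1 \end{pmatrix}
\operatorname{cov}\begin{pmatrix} \hat r_A^{(0)} \\ \hat r_B^{(0)} \end{pmatrix}
\begin{pmatrix} 1 \\ -1 \end{pmatrix} \\
&= \left(\frac{1}{N_A} + \frac{1}{N_B}\right)\left(\sigma_0^2 + \beta_{01}^2\sigma_1^2 - 2\beta_{01}\sigma_{01}\right) \\
&= \left(\frac{1}{N_A} + \frac{1}{N_B}\right)
\left\{ \frac{N_0(N-N_0)}{N(N-1)} - \frac{N_mN_f}{N(N-1)}\left(\frac{N_{0m}}{N_m} - \frac{N_{0f}}{N_f}\right)^2 \right\}.
\end{aligned}
\]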

Because (8.13) is, loosely speaking, a function of sample means, a test statistic may
be formulated based on the above results. One may test whether $DR_{AB}$ is equal to zero based
on the normal approximation. Notice that, when the two populations under comparison are
completely observed, $\operatorname{var}(DR_{AB})$ is a known constant.
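The normal-approximation test described above can be sketched in a few lines of Python. The function below takes two adjusted rates and their 2x2 covariance matrix, forms the variance of the difference as $(1\;\; -1)\operatorname{cov}(\hat{\mathbf{r}})(1\;\; -1)'$, and returns a z statistic with a two-sided p-value. The function name and the numerical inputs are hypothetical; they only illustrate the arithmetic.

```python
import math


def rate_difference_test(r_a, r_b, cov):
    """z test of r_a - r_b = 0 under a normal approximation.

    cov is the 2x2 covariance matrix of (r_a, r_b) as nested lists.
    """
    diff = r_a - r_b
    # (1, -1) cov (1, -1)' = var(r_a) - 2 cov(r_a, r_b) + var(r_b)
    var = cov[0][0] - 2 * cov[0][1] + cov[1][1]
    z = diff / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return diff, var, z, p_value


# Illustrative inputs only.
diff, var, z, p = rate_difference_test(0.0352, 0.0448, [[2e-5, -2e-5], [-2e-5, 2e-5]])
print(diff, var, z, p)
```

Note that a negative covariance between the two adjusted rates, as in (8.12), increases the variance of their difference relative to the independent case.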

8.3.2. Rate ratio

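Similarly, the rate ratio of $\hat r_A^{(0)}$ to $\hat r_B^{(0)}$ is
\[
\widehat{RR}_{AB} = \frac{\hat r_A^{(0)}}{\hat r_B^{(0)}}
= \frac{\dfrac{N_{0A}}{N_A} - \dfrac{N_B}{N}\left(\dfrac{N_{mA}}{N_A} - \dfrac{N_{mB}}{N_B}\right)\left(\dfrac{N_{0m}}{N_m} - \dfrac{N_{0f}}{N_f}\right)}
{\dfrac{N_{0B}}{N_B} + \dfrac{N_A}{N}\left(\dfrac{N_{mA}}{N_A} - \dfrac{N_{mB}}{N_B}\right)\left(\dfrac{N_{0m}}{N_m} - \dfrac{N_{0f}}{N_f}\right)}. \tag{8.14}
\]
Since $N_A\hat r_A^{(0)} + N_B\hat r_B^{(0)} = N_0$, result (8.14) can also be expressed as
\[
\widehat{RR}_{AB} = \frac{N_B}{N_A} \left\{
\frac{N_0/N_B}{\dfrac{N_{0B}}{N_B} + \dfrac{N_A}{N}\left(\dfrac{N_{mA}}{N_A} - \dfrac{N_{mB}}{N_B}\right)\left(\dfrac{N_{0m}}{N_m} - \dfrac{N_{0f}}{N_f}\right)} - 1 \right\}.
\]
Its variance can be evaluated using the first-order Taylor linearization of
$RR_{AB} = r_A/r_B$, expanding around $\mathbf{r} = (r_A\;\; r_B)'$, i.e.,
\[
\widehat{RR}_{AB} = \frac{\hat r_A}{\hat r_B} = RR_{AB} + \boldsymbol{\Delta}_{\mathbf{r}}'\big|_{\hat{\mathbf{r}} = \mathbf{r}}\left(\hat{\mathbf{r}} - \mathbf{r}\right) + \text{Remainder},
\]
where $\boldsymbol{\Delta}_{\mathbf{r}} = \dfrac{\partial RR_{AB}}{\partial \mathbf{r}} = \left(\dfrac{1}{r_B}\;\; -\dfrac{r_A}{r_B^2}\right)'$. When the remainder term is ignored, we have an
expression for the linearized error,
\[
\varepsilon_L = \widehat{RR}_{AB} - RR_{AB} \cong \boldsymbol{\Delta}_{\mathbf{r}}'\big|_{\hat{\mathbf{r}} = \mathbf{r}}\left(\hat{\mathbf{r}} - \mathbf{r}\right). \tag{8.15}
\]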

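Since $E\{\boldsymbol{\Delta}_{\mathbf r}'|_{\hat{\mathbf r} = \mathbf r}(\hat{\mathbf r} - \mathbf r)\} = \boldsymbol{\Delta}_{\mathbf r}'|_{\hat{\mathbf r} = \mathbf r}\,E(\hat{\mathbf r} - \mathbf r) = 0$, $\widehat{RR}_{AB}$ is approximately unbiased for $RR_{AB}$. The linearized approximate variance is thus

\[
AV(\widehat{RR}_{AB}) \cong \boldsymbol{\Delta}_{\mathbf r}' \operatorname{cov}(\hat{\mathbf r})\, \boldsymbol{\Delta}_{\mathbf r}. \tag{8.16}
\]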

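A sample estimator can be obtained by substituting a consistent estimator of $\operatorname{cov}(\hat{\mathbf r})$ and replacing $\mathbf r$ in $\boldsymbol{\Delta}_{\mathbf r}$ by $\hat{\mathbf r}$, that is, $\hat{\boldsymbol{\Delta}}_{\mathbf r} = \left(\dfrac{1}{\hat r_B}\ \ -\dfrac{\hat r_A}{\hat r_B^2}\right)'$, such that

\[
\hat V(RR_{AB}) \cong
\begin{pmatrix} \dfrac{1}{\hat r_B} & -\dfrac{\hat r_A}{\hat r_B^2} \end{pmatrix}
\widehat{\operatorname{cov}}(\hat{\mathbf r})
\begin{pmatrix} \dfrac{1}{\hat r_B} & -\dfrac{\hat r_A}{\hat r_B^2} \end{pmatrix}'. \tag{8.17}
\]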

Equation (8.17) can be further simplified as

\[
\hat V(RR_{AB}) \cong
\frac{\left(N_A \hat r_A^{(0)} + N_B \hat r_B^{(0)}\right)^2}{N\, N_A N_B \left(\hat r_B^{(0)}\right)^4}
\left\{\frac{N_0 (N - N_0)}{N(N-1)} - \frac{N_m N_f}{N(N-1)}\left(\frac{N_{0m}}{N_m} - \frac{N_{0f}}{N_f}\right)^2\right\}.
\]

If we assume that sample sizes are large enough so that asymptotic normality applies, a 95% confidence interval for $RR_{AB}$ can be constructed using $\hat V(RR_{AB})$. A test statistic can be formulated to test whether $RR_{AB}$ is equal to 1, that is, whether $r_A^{(0)} = r_B^{(0)}$.
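As a rough sketch of the delta-method steps in (8.15) to (8.17), the following Python fragment computes the rate ratio, its linearized variance and a normal-theory interval. The function name and all numerical inputs are hypothetical. The covariance matrix is assumed to have the structure derived in Section 8.6, namely $(\sigma_e^2/N)$ times the matrix with entries $N_B/N_A$, $-1$, $-1$, $N_A/N_B$, where $\sigma_e^2$ stands for $\sigma_0^2 + \beta_{01}^2\sigma_1^2 - 2\beta_{01}\sigma_{01}$.

```python
from math import sqrt


def rate_ratio_ci(r_a, r_b, cov, z=1.96):
    """Rate ratio with a delta-method variance and a normal-theory interval.

    cov is the 2x2 variance-covariance matrix of (r_a, r_b), given as nested
    lists; z=1.96 gives an approximate 95% interval.
    """
    rr = r_a / r_b
    # gradient of r_a / r_b with respect to (r_a, r_b)
    g = (1.0 / r_b, -r_a / r_b ** 2)
    var = sum(g[i] * cov[i][j] * g[j] for i in range(2) for j in range(2))
    half = z * sqrt(var)
    return rr, var, (rr - half, rr + half)


# Hypothetical inputs: adjusted rates, subpopulation sizes, residual variance.
r_a, r_b, N_A, N_B, sigma_e2 = 0.15, 0.12, 100, 150, 0.12
N = N_A + N_B
cov = [[sigma_e2 * N_B / (N * N_A), -sigma_e2 / N],
       [-sigma_e2 / N, sigma_e2 * N_A / (N * N_B)]]
rr, var, (lo, hi) = rate_ratio_ci(r_a, r_b, cov)
print(rr, var, lo, hi)
```

With this covariance structure the quadratic form collapses algebraically to $(N_A r_A + N_B r_B)^2 \sigma_e^2 / (N N_A N_B r_B^4)$, which gives a convenient check on the matrix computation.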

8.4. External direct adjustment

External direct adjustment usually uses an “outside” reference population, for example, the US population based on a recent census, or the next larger population that includes the populations under comparison. For example, if we

are interested in estimating and comparing the directly adjusted death rates of Franklin

County (A) and Worcester County (B) in Massachusetts, we may choose the

Commonwealth of Massachusetts as the reference population.


Suppose we are interested in the adjusted rates for populations A and B. As before, populations A and B can be considered as two parts resulting from a random permutation of the larger reference population (the Commonwealth of Massachusetts). The estimation process and the resulting estimators are the same as those presented in Section 8.2.

8.5. Discussion

In this chapter, we have developed a method of internal direct rate standardization based on random permutation models in which we assume equal population rates. Unlike the direct rate standardization methods commonly used in epidemiology and vital statistics, this new method addresses the finiteness of the populations involved and yields a variance-covariance matrix of the standardized rates of the populations under study. In conventional approaches (Kahn and Sempos 1989), the covariances between the rates of any two populations (or subclasses) are ignored. With the variance-covariance matrix, it is possible to conduct hypothesis tests based on an asymptotic normal approximation. When both populations under comparison are observed completely, it is also possible to use a non-parametric method to evaluate the distribution of the two adjusted rates, their differences or their ratios.

We limited our discussion to gender-adjusted rates. The extension of the above results to age-adjusted, or gender- and age-adjusted, rates, rate differences and ratios is analogous to the case where the auxiliary variable is multinomial. Applying the results developed in Chapter 5 to internal direct standardization involving multiple auxiliary variables is an interesting direction for future research.


8.6. Derivations of the results in Section 8.2.

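As shown in (8.1), the estimators $(\hat T_A^{(0)}\ \ \hat T_B^{(0)})'$ are defined as

\[
\begin{pmatrix} \hat T_A^{(0)} \\ \hat T_B^{(0)} \end{pmatrix}
= \begin{pmatrix} \mathbf{w}_A' + \boldsymbol{\ell}_A' & \mathbf{0} \\ \mathbf{0} & \mathbf{w}_B' + \boldsymbol{\ell}_B' \end{pmatrix}
\begin{pmatrix} \mathbf{Z}_A \\ \mathbf{Z}_B \end{pmatrix}.
\]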

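The corresponding prediction error is

\[
\begin{aligned}
\begin{pmatrix} \hat T_A^{(0)} - T^{(0)} \\ \hat T_B^{(0)} - T^{(0)} \end{pmatrix}
&= \begin{pmatrix} \mathbf{w}_A' + \boldsymbol{\ell}_A' & \mathbf{0} \\ \mathbf{0} & \mathbf{w}_B' + \boldsymbol{\ell}_B' \end{pmatrix}
\begin{pmatrix} \mathbf{Z}_A \\ \mathbf{Z}_B \end{pmatrix}
- \begin{pmatrix} \boldsymbol{\ell}_A' & \boldsymbol{\ell}_B' \\ \boldsymbol{\ell}_A' & \boldsymbol{\ell}_B' \end{pmatrix}
\begin{pmatrix} \mathbf{Z}_A \\ \mathbf{Z}_B \end{pmatrix} \\
&= \begin{pmatrix} \mathbf{w}_A' & -\boldsymbol{\ell}_B' \\ -\boldsymbol{\ell}_A' & \mathbf{w}_B' \end{pmatrix}
\begin{pmatrix} \mathbf{Z}_A \\ \mathbf{Z}_B \end{pmatrix}.
\end{aligned}
\]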

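The unbiasedness of $\hat T_A^{(0)}$ and $\hat T_B^{(0)}$ for $T^{(0)}$ implies that

\[
E\begin{pmatrix} \hat T_A^{(0)} - T^{(0)} \\ \hat T_B^{(0)} - T^{(0)} \end{pmatrix}
= E\begin{pmatrix} \mathbf{w}_A'\mathbf{Z}_A - \boldsymbol{\ell}_B'\mathbf{Z}_B \\ \mathbf{w}_B'\mathbf{Z}_B - \boldsymbol{\ell}_A'\mathbf{Z}_A \end{pmatrix}
= \mathbf{0}.
\]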

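Since $E(\mathbf{Z}_A) = (\mathbf{I}_2 \otimes \mathbf{1}_{N_A})\,(\mu^{(0)}\ \ 0)'$ and $E(\mathbf{Z}_B) = (\mathbf{I}_2 \otimes \mathbf{1}_{N_B})\,(\mu^{(0)}\ \ 0)'$, this constraint is equivalent to

\[
\begin{pmatrix} (\mathbf{1}_{N_A}'\ \ \mathbf{0}_{N_A}')\,\mathbf{w}_A - N_B \\ (\mathbf{1}_{N_B}'\ \ \mathbf{0}_{N_B}')\,\mathbf{w}_B - N_A \end{pmatrix}
= \begin{pmatrix} (\mathbf{1}_{N_A}'\ \ \mathbf{0}_{N_A}') & \mathbf{0}' \\ \mathbf{0}' & (\mathbf{1}_{N_B}'\ \ \mathbf{0}_{N_B}') \end{pmatrix}\mathbf{w}
- \begin{pmatrix} N_B \\ N_A \end{pmatrix}
= \mathbf{0}, \tag{8.18}
\]

where $\mathbf{w} = (\mathbf{w}_A'\ \ \mathbf{w}_B')'$.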

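Now, we apply results derived in Section 4.5.2 of Chapter 4 and Chapter 5. We optimize the estimators in terms of the trace of $\operatorname{cov}(\hat T_A^{(0)}\ \ \hat T_B^{(0)})'$. From Chapter 2, we have

\[
\operatorname{cov}\begin{pmatrix} \mathbf{Z}_A \\ \mathbf{Z}_B \end{pmatrix}
= \begin{pmatrix} \mathbf{V}_A & \mathbf{V}_{A,B} \\ \mathbf{V}_{B,A} & \mathbf{V}_B \end{pmatrix},
\]

where $\mathbf{V}_A = \boldsymbol{\Sigma} \otimes \mathbf{P}_{N_A,N}$, $\mathbf{V}_B = \boldsymbol{\Sigma} \otimes \mathbf{P}_{N_B,N}$ and $\mathbf{V}_{A,B} = \mathbf{V}_{B,A}' = \boldsymbol{\Sigma} \otimes \left(-\dfrac{1}{N}\mathbf{J}_{N_A \times N_B}\right)$. Therefore, $\operatorname{cov}(\hat T_A^{(0)}\ \ \hat T_B^{(0)})'$ can be represented as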

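\[
\begin{aligned}
\operatorname{cov}\begin{pmatrix} \hat T_A^{(0)} \\ \hat T_B^{(0)} \end{pmatrix}
&= \begin{pmatrix} \mathbf{w}_A' & -\boldsymbol{\ell}_B' \\ -\boldsymbol{\ell}_A' & \mathbf{w}_B' \end{pmatrix}
\begin{pmatrix} \mathbf{V}_A & \mathbf{V}_{A,B} \\ \mathbf{V}_{B,A} & \mathbf{V}_B \end{pmatrix}
\begin{pmatrix} \mathbf{w}_A & -\boldsymbol{\ell}_A \\ -\boldsymbol{\ell}_B & \mathbf{w}_B \end{pmatrix} \\
&= \left\{ \begin{pmatrix} \mathbf{w}_A' & \mathbf{0} \\ \mathbf{0} & \mathbf{w}_B' \end{pmatrix} - \begin{pmatrix} \mathbf{0} & \boldsymbol{\ell}_B' \\ \boldsymbol{\ell}_A' & \mathbf{0} \end{pmatrix} \right\}
\begin{pmatrix} \mathbf{V}_A & \mathbf{V}_{A,B} \\ \mathbf{V}_{B,A} & \mathbf{V}_B \end{pmatrix}
\left\{ \begin{pmatrix} \mathbf{w}_A & \mathbf{0} \\ \mathbf{0} & \mathbf{w}_B \end{pmatrix} - \begin{pmatrix} \mathbf{0} & \boldsymbol{\ell}_A \\ \boldsymbol{\ell}_B & \mathbf{0} \end{pmatrix} \right\} \\
&= \begin{pmatrix} \mathbf{w}_A' & \mathbf{0} \\ \mathbf{0} & \mathbf{w}_B' \end{pmatrix}
\begin{pmatrix} \mathbf{V}_A & \mathbf{V}_{A,B} \\ \mathbf{V}_{B,A} & \mathbf{V}_B \end{pmatrix}
\begin{pmatrix} \mathbf{w}_A & \mathbf{0} \\ \mathbf{0} & \mathbf{w}_B \end{pmatrix}
+ \begin{pmatrix} \mathbf{0} & \boldsymbol{\ell}_B' \\ \boldsymbol{\ell}_A' & \mathbf{0} \end{pmatrix}
\begin{pmatrix} \mathbf{V}_A & \mathbf{V}_{A,B} \\ \mathbf{V}_{B,A} & \mathbf{V}_B \end{pmatrix}
\begin{pmatrix} \mathbf{0} & \boldsymbol{\ell}_A \\ \boldsymbol{\ell}_B & \mathbf{0} \end{pmatrix} \\
&\quad - \begin{pmatrix} \mathbf{w}_A' & \mathbf{0} \\ \mathbf{0} & \mathbf{w}_B' \end{pmatrix}
\begin{pmatrix} \mathbf{V}_A & \mathbf{V}_{A,B} \\ \mathbf{V}_{B,A} & \mathbf{V}_B \end{pmatrix}
\begin{pmatrix} \mathbf{0} & \boldsymbol{\ell}_A \\ \boldsymbol{\ell}_B & \mathbf{0} \end{pmatrix}
- \begin{pmatrix} \mathbf{0} & \boldsymbol{\ell}_B' \\ \boldsymbol{\ell}_A' & \mathbf{0} \end{pmatrix}
\begin{pmatrix} \mathbf{V}_A & \mathbf{V}_{A,B} \\ \mathbf{V}_{B,A} & \mathbf{V}_B \end{pmatrix}
\begin{pmatrix} \mathbf{w}_A & \mathbf{0} \\ \mathbf{0} & \mathbf{w}_B \end{pmatrix} \\
&= \begin{pmatrix} \mathbf{w}_A'\mathbf{V}_A\mathbf{w}_A & \mathbf{w}_A'\mathbf{V}_{A,B}\mathbf{w}_B \\ \mathbf{w}_B'\mathbf{V}_{B,A}\mathbf{w}_A & \mathbf{w}_B'\mathbf{V}_B\mathbf{w}_B \end{pmatrix}
- \begin{pmatrix} \boldsymbol{\ell}_B'\mathbf{V}_{B,A}\mathbf{w}_A & \boldsymbol{\ell}_B'\mathbf{V}_B\mathbf{w}_B \\ \boldsymbol{\ell}_A'\mathbf{V}_A\mathbf{w}_A & \boldsymbol{\ell}_A'\mathbf{V}_{A,B}\mathbf{w}_B \end{pmatrix}
- \begin{pmatrix} \boldsymbol{\ell}_B'\mathbf{V}_{B,A}\mathbf{w}_A & \boldsymbol{\ell}_B'\mathbf{V}_B\mathbf{w}_B \\ \boldsymbol{\ell}_A'\mathbf{V}_A\mathbf{w}_A & \boldsymbol{\ell}_A'\mathbf{V}_{A,B}\mathbf{w}_B \end{pmatrix}' \\
&\quad + \begin{pmatrix} \boldsymbol{\ell}_B'\mathbf{V}_B\boldsymbol{\ell}_B & \boldsymbol{\ell}_B'\mathbf{V}_{B,A}\boldsymbol{\ell}_A \\ \boldsymbol{\ell}_A'\mathbf{V}_{A,B}\boldsymbol{\ell}_B & \boldsymbol{\ell}_A'\mathbf{V}_A\boldsymbol{\ell}_A \end{pmatrix}.
\end{aligned}
\]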

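We derive the estimators $(\hat T_A^{(0)}\ \ \hat T_B^{(0)})'$ by minimizing the trace of $\operatorname{cov}(\hat T_A^{(0)}\ \ \hat T_B^{(0)})'$. The trace of $\operatorname{cov}(\hat T_A^{(0)}\ \ \hat T_B^{(0)})'$ is

\[
\begin{aligned}
\operatorname{trace}\left\{\operatorname{cov}\begin{pmatrix} \hat T_A^{(0)} \\ \hat T_B^{(0)} \end{pmatrix}\right\}
&= \operatorname{trace}\begin{pmatrix} \mathbf{w}_A'\mathbf{V}_A\mathbf{w}_A & \mathbf{w}_A'\mathbf{V}_{A,B}\mathbf{w}_B \\ \mathbf{w}_B'\mathbf{V}_{B,A}\mathbf{w}_A & \mathbf{w}_B'\mathbf{V}_B\mathbf{w}_B \end{pmatrix}
- 2\operatorname{trace}\begin{pmatrix} \boldsymbol{\ell}_B'\mathbf{V}_{B,A}\mathbf{w}_A & \boldsymbol{\ell}_B'\mathbf{V}_B\mathbf{w}_B \\ \boldsymbol{\ell}_A'\mathbf{V}_A\mathbf{w}_A & \boldsymbol{\ell}_A'\mathbf{V}_{A,B}\mathbf{w}_B \end{pmatrix} \\
&\quad + \operatorname{trace}\begin{pmatrix} \boldsymbol{\ell}_B'\mathbf{V}_B\boldsymbol{\ell}_B & \boldsymbol{\ell}_B'\mathbf{V}_{B,A}\boldsymbol{\ell}_A \\ \boldsymbol{\ell}_A'\mathbf{V}_{A,B}\boldsymbol{\ell}_B & \boldsymbol{\ell}_A'\mathbf{V}_A\boldsymbol{\ell}_A \end{pmatrix} \\
&= \mathbf{w}'\begin{pmatrix} \mathbf{V}_A & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_B \end{pmatrix}\mathbf{w}
- 2\mathbf{w}'\begin{pmatrix} \mathbf{V}_{A,B} & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_{B,A} \end{pmatrix}\begin{pmatrix} \boldsymbol{\ell}_B \\ \boldsymbol{\ell}_A \end{pmatrix}
+ \left(\boldsymbol{\ell}_A'\mathbf{V}_A\boldsymbol{\ell}_A + \boldsymbol{\ell}_B'\mathbf{V}_B\boldsymbol{\ell}_B\right).
\end{aligned}
\]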

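The optimization function is thus defined as

\[
\Phi(\mathbf{w}) = \mathbf{w}'\begin{pmatrix} \mathbf{V}_A & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_B \end{pmatrix}\mathbf{w}
- 2\mathbf{w}'\begin{pmatrix} \mathbf{V}_{A,B} & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_{B,A} \end{pmatrix}\begin{pmatrix} \boldsymbol{\ell}_B \\ \boldsymbol{\ell}_A \end{pmatrix}
+ 2\boldsymbol{\lambda}'\left\{ \begin{pmatrix} (\mathbf{1}_{N_A}'\ \ \mathbf{0}_{N_A}') & \mathbf{0}' \\ \mathbf{0}' & (\mathbf{1}_{N_B}'\ \ \mathbf{0}_{N_B}') \end{pmatrix}\mathbf{w} - \begin{pmatrix} N_B \\ N_A \end{pmatrix} \right\}.
\]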


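Differentiating $\Phi(\mathbf{w})$ with respect to $\mathbf{w}$ and $\boldsymbol{\lambda}$, we have

\[
\begin{cases}
\dfrac{\partial}{\partial \mathbf{w}}\Phi = 2\begin{pmatrix} \mathbf{V}_A & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_B \end{pmatrix}\mathbf{w}
- 2\begin{pmatrix} \mathbf{V}_{A,B} & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_{B,A} \end{pmatrix}\begin{pmatrix} \boldsymbol{\ell}_B \\ \boldsymbol{\ell}_A \end{pmatrix}
+ 2\begin{pmatrix} \begin{pmatrix} \mathbf{1}_{N_A} \\ \mathbf{0}_{N_A} \end{pmatrix} & \mathbf{0} \\ \mathbf{0} & \begin{pmatrix} \mathbf{1}_{N_B} \\ \mathbf{0}_{N_B} \end{pmatrix} \end{pmatrix}\boldsymbol{\lambda} \\[2ex]
\dfrac{\partial}{\partial \boldsymbol{\lambda}}\Phi = 2\left\{ \begin{pmatrix} (\mathbf{1}_{N_A}'\ \ \mathbf{0}_{N_A}') & \mathbf{0}' \\ \mathbf{0}' & (\mathbf{1}_{N_B}'\ \ \mathbf{0}_{N_B}') \end{pmatrix}\mathbf{w} - \begin{pmatrix} N_B \\ N_A \end{pmatrix} \right\}
\end{cases}
\]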

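Setting both derivatives to zero results in the following estimating equations

\[
\begin{pmatrix}
\begin{pmatrix} \mathbf{V}_A & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_B \end{pmatrix} &
\begin{pmatrix} \begin{pmatrix} \mathbf{1}_{N_A} \\ \mathbf{0}_{N_A} \end{pmatrix} & \mathbf{0} \\ \mathbf{0} & \begin{pmatrix} \mathbf{1}_{N_B} \\ \mathbf{0}_{N_B} \end{pmatrix} \end{pmatrix} \\[3ex]
\begin{pmatrix} (\mathbf{1}_{N_A}'\ \ \mathbf{0}_{N_A}') & \mathbf{0}' \\ \mathbf{0}' & (\mathbf{1}_{N_B}'\ \ \mathbf{0}_{N_B}') \end{pmatrix} & \mathbf{0}
\end{pmatrix}
\begin{pmatrix} \hat{\mathbf{w}} \\ \hat{\boldsymbol{\lambda}} \end{pmatrix}
=
\begin{pmatrix}
\begin{pmatrix} \mathbf{V}_{A,B} & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_{B,A} \end{pmatrix}\begin{pmatrix} \boldsymbol{\ell}_B \\ \boldsymbol{\ell}_A \end{pmatrix} \\[2ex]
\begin{pmatrix} N_B \\ N_A \end{pmatrix}
\end{pmatrix}. \tag{8.19}
\]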

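Solving (8.19) yields the following unique solution,

\[
\hat{\mathbf{w}} = \begin{pmatrix} \mathbf{V}_A & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_B \end{pmatrix}^{-1}
\begin{pmatrix} \mathbf{V}_{A,B} & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_{B,A} \end{pmatrix}\begin{pmatrix} \boldsymbol{\ell}_B \\ \boldsymbol{\ell}_A \end{pmatrix}
- \begin{pmatrix} \mathbf{V}_A & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_B \end{pmatrix}^{-1}
\begin{pmatrix} \begin{pmatrix} \mathbf{1}_{N_A} \\ \mathbf{0}_{N_A} \end{pmatrix} & \mathbf{0} \\ \mathbf{0} & \begin{pmatrix} \mathbf{1}_{N_B} \\ \mathbf{0}_{N_B} \end{pmatrix} \end{pmatrix}\hat{\boldsymbol{\lambda}},
\]

\[
\begin{aligned}
\hat{\boldsymbol{\lambda}}
&= \left( \begin{pmatrix} (\mathbf{1}_{N_A}'\ \ \mathbf{0}_{N_A}') & \mathbf{0}' \\ \mathbf{0}' & (\mathbf{1}_{N_B}'\ \ \mathbf{0}_{N_B}') \end{pmatrix}
\begin{pmatrix} \mathbf{V}_A & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_B \end{pmatrix}^{-1}
\begin{pmatrix} \begin{pmatrix} \mathbf{1}_{N_A} \\ \mathbf{0}_{N_A} \end{pmatrix} & \mathbf{0} \\ \mathbf{0} & \begin{pmatrix} \mathbf{1}_{N_B} \\ \mathbf{0}_{N_B} \end{pmatrix} \end{pmatrix} \right)^{-1} \\
&\quad \times \left( \begin{pmatrix} (\mathbf{1}_{N_A}'\ \ \mathbf{0}_{N_A}') & \mathbf{0}' \\ \mathbf{0}' & (\mathbf{1}_{N_B}'\ \ \mathbf{0}_{N_B}') \end{pmatrix}
\begin{pmatrix} \mathbf{V}_A & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_B \end{pmatrix}^{-1}
\begin{pmatrix} \mathbf{V}_{A,B} & \mathbf{0} \\ \mathbf{0} & \mathbf{V}_{B,A} \end{pmatrix}\begin{pmatrix} \boldsymbol{\ell}_B \\ \boldsymbol{\ell}_A \end{pmatrix}
- \begin{pmatrix} N_B \\ N_A \end{pmatrix} \right) \\
&= \begin{pmatrix}
\left\{ (\mathbf{1}_{N_A}'\ \ \mathbf{0}_{N_A}')\,\mathbf{V}_A^{-1}\begin{pmatrix} \mathbf{1}_{N_A} \\ \mathbf{0}_{N_A} \end{pmatrix} \right\}^{-1}
\left\{ (\mathbf{1}_{N_A}'\ \ \mathbf{0}_{N_A}')\,\mathbf{V}_A^{-1}\mathbf{V}_{A,B}\boldsymbol{\ell}_B - N_B \right\} \\[2ex]
\left\{ (\mathbf{1}_{N_B}'\ \ \mathbf{0}_{N_B}')\,\mathbf{V}_B^{-1}\begin{pmatrix} \mathbf{1}_{N_B} \\ \mathbf{0}_{N_B} \end{pmatrix} \right\}^{-1}
\left\{ (\mathbf{1}_{N_B}'\ \ \mathbf{0}_{N_B}')\,\mathbf{V}_B^{-1}\mathbf{V}_{B,A}\boldsymbol{\ell}_A - N_A \right\}
\end{pmatrix},
\end{aligned}
\]

and

\[
\hat{\mathbf{w}} = \begin{pmatrix} \mathbf{V}_A^{-1}\mathbf{V}_{A,B}\boldsymbol{\ell}_B \\ \mathbf{V}_B^{-1}\mathbf{V}_{B,A}\boldsymbol{\ell}_A \end{pmatrix}
- \begin{pmatrix}
\mathbf{V}_A^{-1}\begin{pmatrix} \mathbf{1}_{N_A} \\ \mathbf{0}_{N_A} \end{pmatrix}
\left\{ (\mathbf{1}_{N_A}'\ \ \mathbf{0}_{N_A}')\,\mathbf{V}_A^{-1}\begin{pmatrix} \mathbf{1}_{N_A} \\ \mathbf{0}_{N_A} \end{pmatrix} \right\}^{-1}
\left\{ (\mathbf{1}_{N_A}'\ \ \mathbf{0}_{N_A}')\,\mathbf{V}_A^{-1}\mathbf{V}_{A,B}\boldsymbol{\ell}_B - N_B \right\} \\[2ex]
\mathbf{V}_B^{-1}\begin{pmatrix} \mathbf{1}_{N_B} \\ \mathbf{0}_{N_B} \end{pmatrix}
\left\{ (\mathbf{1}_{N_B}'\ \ \mathbf{0}_{N_B}')\,\mathbf{V}_B^{-1}\begin{pmatrix} \mathbf{1}_{N_B} \\ \mathbf{0}_{N_B} \end{pmatrix} \right\}^{-1}
\left\{ (\mathbf{1}_{N_B}'\ \ \mathbf{0}_{N_B}')\,\mathbf{V}_B^{-1}\mathbf{V}_{B,A}\boldsymbol{\ell}_A - N_A \right\}
\end{pmatrix}.
\]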

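Since $\mathbf{V}_A^{-1} = \boldsymbol{\Sigma}^{-1} \otimes \mathbf{P}_{N_A,N}^{-1}$, $\mathbf{V}_B^{-1} = \boldsymbol{\Sigma}^{-1} \otimes \mathbf{P}_{N_B,N}^{-1}$, and $\mathbf{V}_{A,B} = \mathbf{V}_{B,A}' = \boldsymbol{\Sigma} \otimes \left(-\dfrac{1}{N}\mathbf{J}_{N_A \times N_B}\right)$, we have

\[
\mathbf{V}_A^{-1}\mathbf{V}_{A,B} = \mathbf{I}_2 \otimes \left(-\frac{1}{N - N_A}\mathbf{J}_{N_A \times N_B}\right), \qquad
\mathbf{V}_B^{-1}\mathbf{V}_{B,A} = \mathbf{I}_2 \otimes \left(-\frac{1}{N - N_B}\mathbf{J}_{N_B \times N_A}\right),
\]

\[
\mathbf{V}_B^{-1}\mathbf{V}_{B,A}\boldsymbol{\ell}_A = -(1\ \ 0)' \otimes \mathbf{1}_{N_B}, \qquad
\mathbf{V}_A^{-1}\mathbf{V}_{A,B}\boldsymbol{\ell}_B = -(1\ \ 0)' \otimes \mathbf{1}_{N_A}.
\]

Consequently,

\[
(1\ \ 0)\,\boldsymbol{\Sigma}^{-1}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac{\sigma_1^2}{\sigma_0^2\sigma_1^2 - \sigma_{01}^2},
\qquad\text{and}\qquad
\left\{ (1\ \ 0)\,\boldsymbol{\Sigma}^{-1}\begin{pmatrix} 1 \\ 0 \end{pmatrix} \right\}^{-1} = \frac{\sigma_0^2\sigma_1^2 - \sigma_{01}^2}{\sigma_1^2}.
\]

Further,

\[
(\mathbf{1}_{N_A}'\ \ \mathbf{0}_{N_A}')\,\mathbf{V}_A^{-1}\begin{pmatrix} \mathbf{1}_{N_A} \\ \mathbf{0}_{N_A} \end{pmatrix} = \frac{N N_A}{N - N_A}\cdot\frac{\sigma_1^2}{\sigma_0^2\sigma_1^2 - \sigma_{01}^2},
\qquad
(\mathbf{1}_{N_B}'\ \ \mathbf{0}_{N_B}')\,\mathbf{V}_B^{-1}\begin{pmatrix} \mathbf{1}_{N_B} \\ \mathbf{0}_{N_B} \end{pmatrix} = \frac{N N_B}{N - N_B}\cdot\frac{\sigma_1^2}{\sigma_0^2\sigma_1^2 - \sigma_{01}^2}.
\]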

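Therefore,

\[
\hat{\mathbf{w}} = \begin{pmatrix} \hat{\mathbf{w}}_A \\ \hat{\mathbf{w}}_B \end{pmatrix}
= -\begin{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \mathbf{1}_{N_A} \\[2ex] \begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \mathbf{1}_{N_B} \end{pmatrix}
+ \begin{pmatrix} \dfrac{N}{N_A}\begin{pmatrix} 1 \\ -\beta_{01} \end{pmatrix} \otimes \mathbf{1}_{N_A} \\[2ex] \dfrac{N}{N_B}\begin{pmatrix} 1 \\ -\beta_{01} \end{pmatrix} \otimes \mathbf{1}_{N_B} \end{pmatrix}
= \begin{pmatrix} \begin{pmatrix} \dfrac{N_B}{N_A} \\[1.5ex] -\dfrac{N}{N_A}\beta_{01} \end{pmatrix} \otimes \mathbf{1}_{N_A} \\[4ex] \begin{pmatrix} \dfrac{N_A}{N_B} \\[1.5ex] -\dfrac{N}{N_B}\beta_{01} \end{pmatrix} \otimes \mathbf{1}_{N_B} \end{pmatrix}, \tag{8.20}
\]

where $\beta_{01} = \sigma_{01} / \sigma_1^2$. Consequently,

\[
\hat{\mathbf{w}}_A = -\begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \mathbf{1}_{N_A} + \frac{N}{N_A}\begin{pmatrix} 1 \\ -\beta_{01} \end{pmatrix} \otimes \mathbf{1}_{N_A}
= \begin{pmatrix} \dfrac{N_B}{N_A} \\[1.5ex] -\dfrac{N}{N_A}\beta_{01} \end{pmatrix} \otimes \mathbf{1}_{N_A},
\]

\[
\hat{\mathbf{w}}_B = -\begin{pmatrix} 1 \\ 0 \end{pmatrix} \otimes \mathbf{1}_{N_B} + \frac{N}{N_B}\begin{pmatrix} 1 \\ -\beta_{01} \end{pmatrix} \otimes \mathbf{1}_{N_B}
= \begin{pmatrix} \dfrac{N_A}{N_B} \\[1.5ex] -\dfrac{N}{N_B}\beta_{01} \end{pmatrix} \otimes \mathbf{1}_{N_B}.
\]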
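The closed-form weights can be checked numerically against the stationarity condition of the constrained minimization. The sketch below does this for subpopulation A with exact fractions. It rests on assumptions that are not spelled out in this section: $\mathbf{P}_{N_A,N}$ is taken to be $\mathbf{I}_{N_A} - \mathbf{J}_{N_A}/N$, the values in `Sigma` and the sizes `N_A`, `N_B` are invented, and the helper names `kron` and `matvec` are hypothetical. Because the objective is convex, it suffices to verify that the gradient $\mathbf{V}_A\mathbf{w}_A - \mathbf{V}_{A,B}\boldsymbol{\ell}_B$ is proportional to the constraint vector $(\mathbf{1}_{N_A}'\ \ \mathbf{0}_{N_A}')'$ and that the constraint in (8.18) holds.

```python
from fractions import Fraction as F

N_A, N_B = 3, 4
N = N_A + N_B
# Invented Sigma = [[sigma_0^2, sigma_01], [sigma_01, sigma_1^2]]
Sigma = [[F(2), F(1, 2)], [F(1, 2), F(1)]]
beta = Sigma[0][1] / Sigma[1][1]


def kron(S, M):
    """Kronecker product of two matrices stored as nested lists."""
    return [[s * m for s in s_row for m in m_row]
            for s_row in S for m_row in M]


def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


# P_{N_A,N} = I - J/N (assumed form of the permutation factor)
P = [[F(int(r == c)) - F(1, N) for c in range(N_A)] for r in range(N_A)]
V_A = kron(Sigma, P)
V_AB = kron(Sigma, [[F(-1, N)] * N_B for _ in range(N_A)])

# closed-form weights for subpopulation A, and ell_B = (1 0)' (x) 1_{N_B}
w_A = [F(N_B, N_A)] * N_A + [-beta * F(N, N_A)] * N_A
ell_B = [F(1)] * N_B + [F(0)] * N_B

# gradient of the objective, up to the factor 2: V_A w_A - V_{A,B} ell_B
grad = [a - b for a, b in zip(matvec(V_A, w_A), matvec(V_AB, ell_B))]
print(grad, sum(w_A[:N_A]))
```

The first $N_A$ entries of `grad` share one common value and the remaining entries vanish, so the gradient is a multiple of the constraint vector, as the Lagrangian condition requires.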

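The estimators $(\hat T_A^{(0)}\ \ \hat T_B^{(0)})'$ can be represented as

\[
\begin{aligned}
\begin{pmatrix} \hat T_A^{(0)} \\ \hat T_B^{(0)} \end{pmatrix}
&= \begin{pmatrix} \mathbf{w}_A' + \boldsymbol{\ell}_A' & \mathbf{0} \\ \mathbf{0} & \mathbf{w}_B' + \boldsymbol{\ell}_B' \end{pmatrix}
\begin{pmatrix} \mathbf{Z}_A \\ \mathbf{Z}_B \end{pmatrix}
= \begin{pmatrix}
\dfrac{N}{N_A}\left( (1\ \ -\beta_{01}) \otimes \mathbf{1}_{N_A}' \right)\begin{pmatrix} \mathbf{Y}_A^{(0)} \\ \mathbf{Y}_A^{(1)*} \end{pmatrix} \\[3ex]
\dfrac{N}{N_B}\left( (1\ \ -\beta_{01}) \otimes \mathbf{1}_{N_B}' \right)\begin{pmatrix} \mathbf{Y}_B^{(0)} \\ \mathbf{Y}_B^{(1)*} \end{pmatrix}
\end{pmatrix} \\
&= \begin{pmatrix}
\dfrac{N}{N_A}\left( \mathbf{1}_{N_A}'\mathbf{Y}_A^{(0)} - \beta_{01}\mathbf{1}_{N_A}'\mathbf{Y}_A^{(1)*} \right) \\[2ex]
\dfrac{N}{N_B}\left( \mathbf{1}_{N_B}'\mathbf{Y}_B^{(0)} - \beta_{01}\mathbf{1}_{N_B}'\mathbf{Y}_B^{(1)*} \right)
\end{pmatrix}
= \begin{pmatrix}
N\left( \bar Y_A^{(0)} - \beta_{01}\bar Y_A^{(1)*} \right) \\[1ex]
N\left( \bar Y_B^{(0)} - \beta_{01}\bar Y_B^{(1)*} \right)
\end{pmatrix},
\end{aligned}
\]

or simply,

\[
\begin{pmatrix} \hat T_A^{(0)} \\ \hat T_B^{(0)} \end{pmatrix}
= \begin{pmatrix}
N\left( \bar Y_A^{(0)} - \beta_{01}\left( \bar Y_A^{(1)} - \mu^{(1)} \right) \right) \\[1ex]
N\left( \bar Y_B^{(0)} - \beta_{01}\left( \bar Y_B^{(1)} - \mu^{(1)} \right) \right)
\end{pmatrix}, \tag{8.21}
\]

where $\mu^{(1)}$ is the population mean of the auxiliary variable of the pooled population, and $\bar Y_A^{(0)}$ and $\bar Y_A^{(1)}$, $\bar Y_B^{(0)}$ and $\bar Y_B^{(1)}$ are the means of the response and auxiliary variables of subpopulations A and B, respectively.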

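The variance-covariance matrix of $(\hat T_A^{(0)}\ \ \hat T_B^{(0)})'$ is evaluated as follows.

\[
\begin{aligned}
\operatorname{cov}\begin{pmatrix} \hat T_A^{(0)} \\ \hat T_B^{(0)} \end{pmatrix}
&= \begin{pmatrix} \mathbf{w}_A' + \boldsymbol{\ell}_A' & \mathbf{0} \\ \mathbf{0} & \mathbf{w}_B' + \boldsymbol{\ell}_B' \end{pmatrix}
\begin{pmatrix} \mathbf{V}_A & \mathbf{V}_{A,B} \\ \mathbf{V}_{B,A} & \mathbf{V}_B \end{pmatrix}
\begin{pmatrix} \mathbf{w}_A + \boldsymbol{\ell}_A & \mathbf{0} \\ \mathbf{0} & \mathbf{w}_B + \boldsymbol{\ell}_B \end{pmatrix} \\
&= \begin{pmatrix}
(\mathbf{w}_A + \boldsymbol{\ell}_A)'\mathbf{V}_A(\mathbf{w}_A + \boldsymbol{\ell}_A) & (\mathbf{w}_A + \boldsymbol{\ell}_A)'\mathbf{V}_{A,B}(\mathbf{w}_B + \boldsymbol{\ell}_B) \\
(\mathbf{w}_B + \boldsymbol{\ell}_B)'\mathbf{V}_{B,A}(\mathbf{w}_A + \boldsymbol{\ell}_A) & (\mathbf{w}_B + \boldsymbol{\ell}_B)'\mathbf{V}_B(\mathbf{w}_B + \boldsymbol{\ell}_B)
\end{pmatrix}.
\end{aligned}
\]

Since $\mathbf{w}_A' + \boldsymbol{\ell}_A' = \dfrac{N}{N_A}\left( (1\ \ -\beta_{01}) \otimes \mathbf{1}_{N_A}' \right)$ and $\mathbf{w}_B' + \boldsymbol{\ell}_B' = \dfrac{N}{N_B}\left( (1\ \ -\beta_{01}) \otimes \mathbf{1}_{N_B}' \right)$, we have

\[
\begin{aligned}
(\mathbf{w}_A + \boldsymbol{\ell}_A)'\mathbf{V}_A(\mathbf{w}_A + \boldsymbol{\ell}_A)
&= \left(\frac{N}{N_A}\right)^2 \left( (1\ \ -\beta_{01}) \otimes \mathbf{1}_{N_A}' \right)\left( \boldsymbol{\Sigma} \otimes \mathbf{P}_{N_A,N} \right)\left( \begin{pmatrix} 1 \\ -\beta_{01} \end{pmatrix} \otimes \mathbf{1}_{N_A} \right) \\
&= \left(\frac{N}{N_A}\right)^2 \left( (1\ \ -\beta_{01})\,\boldsymbol{\Sigma}\begin{pmatrix} 1 \\ -\beta_{01} \end{pmatrix} \right) \otimes \left( \mathbf{1}_{N_A}'\mathbf{P}_{N_A,N}\mathbf{1}_{N_A} \right) \\
&= \frac{N N_B}{N_A}\,(1\ \ -\beta_{01})\begin{pmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{pmatrix}\begin{pmatrix} 1 \\ -\beta_{01} \end{pmatrix} \\
&= \frac{N N_B}{N_A}\left( \sigma_0^2 + \beta_{01}^2\sigma_1^2 - 2\beta_{01}\sigma_{01} \right),
\end{aligned}
\]

\[
\begin{aligned}
(\mathbf{w}_B + \boldsymbol{\ell}_B)'\mathbf{V}_B(\mathbf{w}_B + \boldsymbol{\ell}_B)
&= \left(\frac{N}{N_B}\right)^2 \left( (1\ \ -\beta_{01}) \otimes \mathbf{1}_{N_B}' \right)\left( \boldsymbol{\Sigma} \otimes \mathbf{P}_{N_B,N} \right)\left( \begin{pmatrix} 1 \\ -\beta_{01} \end{pmatrix} \otimes \mathbf{1}_{N_B} \right) \\
&= \left(\frac{N}{N_B}\right)^2 \left( (1\ \ -\beta_{01})\,\boldsymbol{\Sigma}\begin{pmatrix} 1 \\ -\beta_{01} \end{pmatrix} \right) \otimes \left( \mathbf{1}_{N_B}'\mathbf{P}_{N_B,N}\mathbf{1}_{N_B} \right) \\
&= \frac{N N_A}{N_B}\,(1\ \ -\beta_{01})\begin{pmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{pmatrix}\begin{pmatrix} 1 \\ -\beta_{01} \end{pmatrix} \\
&= \frac{N N_A}{N_B}\left( \sigma_0^2 + \beta_{01}^2\sigma_1^2 - 2\beta_{01}\sigma_{01} \right),
\end{aligned}
\]

\[
\begin{aligned}
(\mathbf{w}_A + \boldsymbol{\ell}_A)'\mathbf{V}_{A,B}(\mathbf{w}_B + \boldsymbol{\ell}_B)
&= (\mathbf{w}_B + \boldsymbol{\ell}_B)'\mathbf{V}_{B,A}(\mathbf{w}_A + \boldsymbol{\ell}_A) \\
&= \frac{N^2}{N_A N_B}\left( (1\ \ -\beta_{01}) \otimes \mathbf{1}_{N_A}' \right)\left( \boldsymbol{\Sigma} \otimes \left(-\frac{1}{N}\mathbf{J}_{N_A \times N_B}\right) \right)\left( \begin{pmatrix} 1 \\ -\beta_{01} \end{pmatrix} \otimes \mathbf{1}_{N_B} \right) \\
&= -\frac{N}{N_A N_B}\left( (1\ \ -\beta_{01})\,\boldsymbol{\Sigma}\begin{pmatrix} 1 \\ -\beta_{01} \end{pmatrix} \right) \otimes \left( \mathbf{1}_{N_A}'\mathbf{J}_{N_A \times N_B}\mathbf{1}_{N_B} \right) \\
&= -N\left( \sigma_0^2 + \beta_{01}^2\sigma_1^2 - 2\beta_{01}\sigma_{01} \right).
\end{aligned}
\]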

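Consequently,

\[
\operatorname{cov}\begin{pmatrix} \hat T_A^{(0)} \\ \hat T_B^{(0)} \end{pmatrix}
= N\begin{pmatrix} \dfrac{N_B}{N_A} & -1 \\[1.5ex] -1 & \dfrac{N_A}{N_B} \end{pmatrix}
\left( \sigma_0^2 + \beta_{01}^2\sigma_1^2 - 2\beta_{01}\sigma_{01} \right). \tag{8.22}
\]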

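By definition, $\mu^{(1)} = \dfrac{N_m}{N} = \dfrac{N_{mA} + N_{mB}}{N_A + N_B}$, $\sigma_0^2 = \dfrac{N_0 (N - N_0)}{N(N-1)}$, $\sigma_1^2 = \dfrac{N_m N_f}{N(N-1)}$, $\sigma_{01} = \dfrac{N_m N_f}{N(N-1)}\left(\dfrac{N_{0m}}{N_m} - \dfrac{N_{0f}}{N_f}\right)$, and thus $\beta_{01} = \dfrac{N_{0m}}{N_m} - \dfrac{N_{0f}}{N_f}$. Equation (8.21) can be further simplified as follows,

\[
\begin{aligned}
\begin{pmatrix} \hat T_A^{(0)} \\ \hat T_B^{(0)} \end{pmatrix}
&= \begin{pmatrix}
N\left( \dfrac{N_{0A}}{N_A} - \beta_{01}\left( \dfrac{N_{mA}}{N_A} - \dfrac{N_{mA} + N_{mB}}{N_A + N_B} \right) \right) \\[2.5ex]
N\left( \dfrac{N_{0B}}{N_B} - \beta_{01}\left( \dfrac{N_{mB}}{N_B} - \dfrac{N_{mA} + N_{mB}}{N_A + N_B} \right) \right)
\end{pmatrix}
= \begin{pmatrix}
N\left( \dfrac{N_{0A}}{N_A} - \beta_{01}\dfrac{N_B}{N}\left( \dfrac{N_{mA}}{N_A} - \dfrac{N_{mB}}{N_B} \right) \right) \\[2.5ex]
N\left( \dfrac{N_{0B}}{N_B} - \beta_{01}\dfrac{N_A}{N}\left( \dfrac{N_{mB}}{N_B} - \dfrac{N_{mA}}{N_A} \right) \right)
\end{pmatrix} \\[1ex]
&= \begin{pmatrix}
N\left( \dfrac{N_{0A}}{N_A} - \dfrac{N_B}{N}\left( \dfrac{N_{mA}}{N_A} - \dfrac{N_{mB}}{N_B} \right)\left( \dfrac{N_{0m}}{N_m} - \dfrac{N_{0f}}{N_f} \right) \right) \\[2.5ex]
N\left( \dfrac{N_{0B}}{N_B} - \dfrac{N_A}{N}\left( \dfrac{N_{mB}}{N_B} - \dfrac{N_{mA}}{N_A} \right)\left( \dfrac{N_{0m}}{N_m} - \dfrac{N_{0f}}{N_f} \right) \right)
\end{pmatrix}.
\end{aligned}
\]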
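It can also be shown that

\[
\sigma_0^2 + \beta_{01}^2\sigma_1^2 - 2\beta_{01}\sigma_{01}
= \frac{N_0 (N - N_0)}{N(N-1)} - \frac{N_m N_f}{N(N-1)}\left( \frac{N_{0m}}{N_m} - \frac{N_{0f}}{N_f} \right)^2. \tag{8.23}
\]

Therefore,

\[
\operatorname{cov}\begin{pmatrix} \hat T_A^{(0)} \\ \hat T_B^{(0)} \end{pmatrix}
= N\begin{pmatrix} \dfrac{N_B}{N_A} & -1 \\[1.5ex] -1 & \dfrac{N_A}{N_B} \end{pmatrix}
\left\{ \frac{N_0 (N - N_0)}{N(N-1)} - \frac{N_m N_f}{N(N-1)}\left( \frac{N_{0m}}{N_m} - \frac{N_{0f}}{N_f} \right)^2 \right\}.
\]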
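Identity (8.23) is easy to confirm numerically. The following sketch builds unit-level 0/1 data from invented pooled counts, computes $\sigma_0^2$, $\sigma_1^2$ and $\sigma_{01}$ from their definitions with divisor $N-1$, and compares the two sides of (8.23) using exact fractions. The counts and variable names are illustrative assumptions only.

```python
from fractions import Fraction as F

# Invented pooled counts: males/females and events among each.
N_m, N_f, N_0m, N_0f = 90, 110, 18, 11
N = N_m + N_f

# Unit-level data: y = event indicator, x = male indicator.
units = ([(1, 1)] * N_0m + [(0, 1)] * (N_m - N_0m)
         + [(1, 0)] * N_0f + [(0, 0)] * (N_f - N_0f))
y_bar = F(sum(y for y, _ in units), N)
x_bar = F(sum(x for _, x in units), N)

# Finite-population variances and covariance with divisor N - 1.
s0 = sum((y - y_bar) ** 2 for y, _ in units) / (N - 1)
s1 = sum((x - x_bar) ** 2 for _, x in units) / (N - 1)
s01 = sum((y - y_bar) * (x - x_bar) for y, x in units) / (N - 1)

beta = s01 / s1
lhs = s0 + beta ** 2 * s1 - 2 * beta * s01
N_0 = N_0m + N_0f
rhs = (F(N_0 * (N - N_0), N * (N - 1))
       - F(N_m * N_f, N * (N - 1)) * (F(N_0m, N_m) - F(N_0f, N_f)) ** 2)
print(beta, lhs, rhs)
```

The same run also confirms that $\beta_{01}$ computed as $\sigma_{01}/\sigma_1^2$ equals the difference between the pooled male and female rates.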

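Through similar derivation, the estimators and the variance-covariance matrix for gender-adjusted rates of populations A and B are shown to be

\[
\begin{pmatrix} \hat r_A^{(0)} \\ \hat r_B^{(0)} \end{pmatrix}
= \begin{pmatrix}
\dfrac{N_{0A}}{N_A} - \dfrac{N_B}{N}\left( \dfrac{N_{mA}}{N_A} - \dfrac{N_{mB}}{N_B} \right)\left( \dfrac{N_{0m}}{N_m} - \dfrac{N_{0f}}{N_f} \right) \\[2.5ex]
\dfrac{N_{0B}}{N_B} - \dfrac{N_A}{N}\left( \dfrac{N_{mB}}{N_B} - \dfrac{N_{mA}}{N_A} \right)\left( \dfrac{N_{0m}}{N_m} - \dfrac{N_{0f}}{N_f} \right)
\end{pmatrix},
\]

\[
\operatorname{cov}\begin{pmatrix} \hat r_A^{(0)} \\ \hat r_B^{(0)} \end{pmatrix}
= \frac{1}{N}\begin{pmatrix} \dfrac{N_B}{N_A} & -1 \\[1.5ex] -1 & \dfrac{N_A}{N_B} \end{pmatrix}
\left\{ \frac{N_0 (N - N_0)}{N(N-1)} - \frac{N_m N_f}{N(N-1)}\left( \frac{N_{0m}}{N_m} - \frac{N_{0f}}{N_f} \right)^2 \right\},
\]

respectively.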


CHAPTER 9

CONCLUDING REMARKS

This dissertation proposed alternative methods of deriving estimators using random permutation models (Stanek, Singer and Lencina 2003) under simple random sampling without replacement (SRSWOR) and stratified simple random sampling without replacement (STSRS). We integrated techniques from several areas

in survey sampling. We used a finite random permutation as a probability link between

samples and the population. This framework does not require assumptions about

parametric distribution, and the randomness is attributable only to random sampling

(permutation). We represented the joint permutation of response and auxiliary variables

with a seemingly unrelated regression setup, and thus the structural relationship

between the permuted population and the population was represented explicitly in

matrix format. This setup provides a convenient mathematical vehicle for deriving

estimators when multiple variables are involved. Such a setup enabled us to derive linear unbiased estimators under a design-based framework (Stanek, Singer and Lencina 2003) using estimation techniques that are commonly applied in prediction-based approaches, such as Royall’s general prediction theorem (Royall 1976; Valliant, Dorfman and Royall 2000). When multiple parameters are estimated simultaneously, this approach retains the covariance between the estimators, a feature that is particularly useful in rate standardization and small-area estimation. Finally, we used a finite estimating equations approach (Binder and Patak 1994) to optimize the estimators by minimizing either the generalized mean squared error or the trace of the variance-covariance matrix of the estimators of the target parameters, subject to various constraints.

With this proposed method, we obtained an explicit expression for the joint permutation of response and multiple auxiliary variables, and formulae for the population mean vector and its variance-covariance matrix. These results are similar to those commonly seen in the literature on seemingly unrelated regression (Zellner 1962; Zellner 1963; Revankar 1974; Srivastava and Giles 1987; Gao and Huang 2000; Liu 2000), but our results incorporate finiteness and rely on neither parametric nor superpopulation model assumptions.
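The way finiteness enters the variance-covariance structure can be illustrated by brute force on a tiny population. The sketch below enumerates every permutation of four invented values and checks that the value landing in any position has variance $\sigma^2(1 - 1/N)$, while values in two different positions have covariance $-\sigma^2/N$. This matches a factor of the form $\mathbf{I} - \mathbf{J}/N$, which is assumed here to be what $\mathbf{P}_{n,N}$ denotes; the population values and the function name are hypothetical.

```python
from fractions import Fraction as F
from itertools import permutations

y = [1, 2, 4, 7]            # invented finite population
N = len(y)
mu = F(sum(y), N)
sigma2 = sum((v - mu) ** 2 for v in y) / (N - 1)

perms = list(permutations(y))   # all N! equally likely orderings


def cov_positions(i, j):
    """Covariance between the values landing in positions i and j."""
    return sum((p[i] - mu) * (p[j] - mu) for p in perms) / len(perms)


print(cov_positions(0, 0), cov_positions(0, 1), sigma2)
```

No distributional assumption is involved: the only randomness is the uniform choice of a permutation.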

We showed that sampling can be represented as a partition of the joint

permutation into the sample and the remainder parts, and the vector of population

means can be represented as a linear function of the random variables arising from the

permutation. We derived linear unbiased minimum variance predictors of population means by predicting the remainder parts based on the sample; these may be called “design-based predictors”. Using this method, we derived, in Chapter 4, two classes of estimators for population totals when auxiliary information was not available. The two classes were defined by two sets of weights that differed in whether identical weights were applied to each variable. The estimators were optimized by minimizing either the trace of $\operatorname{cov}(\hat{\mathbf{T}})$ or M-estimation criteria. Under SRSWOR and unbiasedness constraints, we showed that the estimators are identical to simple expansion estimators. Thus neither the method of defining the estimators nor the optimization criterion had an impact on the estimation.


In Chapter 5, we derived estimators adjusted for known auxiliary information

through reparameterization. This approach yielded a set of weights that are not sample-

dependent, and the estimators are unbiased by definition. In contrast, the usual

calibration technique of incorporating auxiliary information leads to sample-dependent

weights, and the corresponding estimators do not have an unbiasedness property

(Deville and Särndal 1992).
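The unbiasedness that comes with fixed, sample-independent weights can be verified exactly by enumerating every SRSWOR sample from a small population. The sketch below is an illustration under stated assumptions: the population values are invented, and the estimator is written in the regression form $N\{\bar y_s - \beta(\bar x_s - \mu_x)\}$ with $\beta$ computed once from the known population quantities rather than from each sample.

```python
from fractions import Fraction as F
from itertools import combinations

# Invented population: y = response, x = auxiliary variable (known for all units).
y = [1, 0, 1, 1, 0, 0]
x = [1, 1, 0, 1, 0, 0]
N, n = len(y), 3
mu_y, mu_x = F(sum(y), N), F(sum(x), N)

# Population coefficient beta = sigma_01 / sigma_1^2 (fixed, not sample-based).
s01 = sum((a - mu_y) * (b - mu_x) for a, b in zip(y, x))
s11 = sum((b - mu_x) ** 2 for b in x)
beta = s01 / s11

adjusted, expansion = [], []
for s in combinations(range(N), n):          # every SRSWOR sample
    y_s = F(sum(y[i] for i in s), n)
    x_s = F(sum(x[i] for i in s), n)
    adjusted.append(N * (y_s - beta * (x_s - mu_x)))
    expansion.append(N * y_s)


def mean(v):
    return sum(v) / len(v)


def var(v):
    m = mean(v)
    return sum((t - m) ** 2 for t in v) / len(v)


print(mean(adjusted), var(adjusted), var(expansion))
```

Averaged over all samples, both estimators reproduce the population total exactly, and the adjusted estimator has the smaller variance in this example because $\beta$ is not zero.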

Although the estimators derived in this dissertation have a functional form similar to those derived using the design-based regression model (Cochran 1977), model-assisted (Särndal, Swensson and Wretman 1992), model-based (Bolfarine and Zacks

1992; Valliant, Dorfman and Royall 2000) and calibration approaches (Deville and

Särndal 1992; Brewer 1999), they depend on neither the superpopulation model nor

regression model assumptions. Further, the characteristics of design-based predictors

are best illustrated in contrast to usual calibration estimators:

1) Both approaches incorporate known auxiliary information and satisfy

constraints for the known auxiliary quantities.

2) Design-based predictors are optimized in terms of their mean squared error or variance, a direct and natural measure of precision, whereas calibration estimators are optimized in terms of the distance between adjusted and naïve weights, with no assurance that the MSE or variance is minimized.

3) Unbiasedness constraints can be explicitly prescribed in design-based

prediction method but not in the usual calibration method.


4) The design-based prediction approach considers weights for auxiliary variables irrelevant to the estimation of parameters, whereas the calibration approach is based on the argument that identical weights should be used for response and auxiliary variables.

5) Unlike the usual calibration technique, the design-based prediction approach

leads to a set of coefficients (weights) that do not depend on the sample.

6) Another advantage of design-based predictors is that they always have a unique solution when the known population variance (or variance-covariance matrix) of the auxiliary variable(s) is used for estimation. In contrast, a calibration estimator is computed from the sample variance (or variance-covariance matrix) of the auxiliary variable(s), and there is no unique solution when the sample variance-covariance matrix of the auxiliary variables is singular.

In Chapter 6, we extended the results of Chapter 5 to settings of STSRS. Because the sampling processes in different strata can be treated as independent permutation processes, the extension of the results to STSRS is straightforward.

We applied these results to rate estimation in Chapter 7. When the response

variable is binomial and the auxiliary variable is categorical, we showed that the

derived estimators are post-stratified estimators when sample variance-covariances are

used for estimation, and equal to those derived using model-assisted (Särndal,

Swensson and Wretman 1992) or calibration (Deville and Särndal 1992) approaches.

Through a series of Monte Carlo simulations, we illustrated that using the known population variance-covariance matrix in estimation has a competitive edge: it always yields a unique solution, with MSE equal to or smaller than that of estimators based on sample auxiliary variance-covariances.
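The link to post-stratification noted above can be seen in a small exact computation. The sketch below uses an invented sample with a binary auxiliary variable and an assumed known population proportion `mu_x`. It computes the regression-type estimator of the rate with the coefficient estimated from the sample, and compares it with the post-stratified estimator that weights the two stratum means by their population shares. The data and variable names are hypothetical.

```python
from fractions import Fraction as F

# Invented sample: x = 1 for males, 0 for females; y = event indicator.
x = [1, 1, 1, 0, 0]
y = [1, 0, 1, 0, 1]
mu_x = F(2, 5)          # known population proportion of males (assumed)
n = len(x)

x_bar, y_bar = F(sum(x), n), F(sum(y), n)
s01 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
s11 = sum((a - x_bar) ** 2 for a in x)
beta_hat = s01 / s11    # sample-based coefficient

regression = y_bar - beta_hat * (x_bar - mu_x)

# Post-stratified estimator: stratum means weighted by population shares.
y1 = F(sum(b for a, b in zip(x, y) if a == 1), sum(x))
y0 = F(sum(b for a, b in zip(x, y) if a == 0), n - sum(x))
post_stratified = mu_x * y1 + (1 - mu_x) * y0
print(regression, post_stratified)
```

For a binary auxiliary variable the sample coefficient is simply the difference between the two stratum means, which is why the two estimators agree.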


We showed, in Chapter 8, that rate standardization based on a random permutation model accounts for the finiteness of the populations involved and yields a variance-covariance matrix of the corresponding standardized rates. Unlike conventional rate standardization methods, which assume that the rate estimates are independent of each other (Kahn and Sempos 1989), we obtained the covariance of the estimated (standardized) rates under a model that assumes equal rates in each population.

In conclusion, we demonstrated in this dissertation a design-based prediction approach to incorporating auxiliary information through reparameterization under a random permutation framework. We showed that this method is a useful alternative approach to deriving linear unbiased estimators.

This dissertation research has its limitations. We considered only linear unbiased minimum variance estimators for (stratified) simple random sampling without replacement and the use of auxiliary variables at the estimation stage. The usefulness of the design-based prediction approach can be extended with further research on its application to other estimation and prediction problems, such as small-domain or small-area estimation, unit non-response adjustment, and estimation of non-linear functions. It would be interesting to examine whether relaxing the unbiasedness constraints or using other optimality criteria, such as log-determinants or M-estimation, results in a gain or loss of efficiency in estimation. Another challenge for this design-based prediction approach is to develop estimation and prediction methods for more complex sampling designs, such as multistage cluster sampling or other types of unequal probability sampling.


BIBLIOGRAPHY

Binder, D. A. and Z. Patak (1994). Use of estimating functions for estimation from complex surveys. Journal of the American Statistical Association 89(427): 1035-1043.

Bolfarine, H. and S. Zacks (1992). Prediction theory for finite populations. New York, Springer-Verlag.

Breslow, N. E. and N. E. Day (1987). Design and analysis of cohort studies. Lyon, France, IARC Scientific Publication N. 82.

Brewer, K. R. W. (1963). Ratio estimation and finite populations: Some results deducible from the assumption of underlying stochastic process. Australian Journal of Statistics 5: 93-105.

Brewer, K. R. W. (1979). A class of robust sampling designs for large-scale surveys. Journal of the American Statistical Association 74: 911-915.

Brewer, K. R. W. (1995). Combining design-based and model-based inference (chapter 30). Business survey methods. B. B. Cox, DA; Chinnappa, BN; Christianson, A; Colledge, MJ; Kott, PS. New York, John Wiley & Sons, Inc.: 589-606.

Brewer, K. R. W. (1999). Cosmetic calibration with unequal probability sampling. Survey methodology 25: 205-212.

Brewer, K. R. W. (1999). Design-based or prediction-based inference? Stratified random vs stratified balanced sampling. International Statistical Review 67(1): 35-47.

Cassel, C. M., C.-E. Särndal and J. H. Wretman (1977). Foundations of inference in survey sampling. New York, NY, Wiley.

CDC (1999). Notice to readers: New population standard for age-adjusting death rates. Morbidity and Mortality Weekly Report 48: 126-127.

Choi, B. C., N. A. de Guia and P. Walsh (1999). Look before you leap: Stratify before you standardize. Am J Epidemiol 149(12): 1087-1096.

Cochran, W. G. (1977). Sampling techniques. New York, John Wiley and Sons.

Deville, J. C. and C. E. Särndal (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association 87: 376-382.

Duchesne, P. (2000). A note on jackknife variance estimation for the general regression estimator. Journal of Official Statistics 16(2): 133-138.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics 7: 1-26.

Efron, B. and R. J. Tibshirani (1993). An introduction to the bootstrap. New York, Chapman and Hall.

Estevao, V. M. and C.-E. Särndal (2000). A functional form approach to calibration. Journal of Official Statistics 16(4): 379-399.

Esteve, J., E. Benhamou and L. Raymond, Eds. (1994). Descriptive epidemiology. Statistical methods in cancer research no. 128. Lyon, France, IARC Scientific Publication.

Feldstein, M. S. (1966). A binary variable multiple regression method of analysing factors affecting peri-natal mortality and other outcomes of pregnancy. Journal of the Royal Statistical Society, Series A 129: 61-73.


Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York, NY, John Wiley & Sons, Inc.

Forthofer, R. N. and R. G. Lehnen (1981). Public program analysis: A new categorical data approach. Belmont, CA, Lifetime Learning Publications.

Gao, D.-d. and R.-B. Huang (2000). Some finite sample properties of zellner estimator in the context of m seemingly unrelated regression equations. Journal of Statistical Planning and Inference 88: 267-283.

Ghosh, M. and J. N. K. Rao (1994). Small area estimation: An appraisal. Statistical Science 9(1): 55-76.

Godambe, V. P. (1955). A unified theory of sampling from finite populations. Journal of the Royal Statistical Society B 17: 269-278.

Goldman, D. A. and J. D. Brender (2000). Are standardized mortality ratios valid for public health data analysis? Statistics in Medicine 19: 1081-1088.

Graybill, F. A. (1976). Theory and application of the linear model. Belmont, CA, Wadsworth Publishing Company, Inc.

Grizzle, J. E., C. F. Starmer and G. G. Koch (1969). Analysis of categorical data by linear models (corr: V26 p860). Biometrics 25: 489-504.

Hansen, M. H., W. G. Madow and B. J. Tepping (1983). An evaluation of model-dependent and probability-sampling inferences in sample surveys. Journal of the American Statistical Association 78(384): 776-807.

Harville, D. A. (1997). Matrix algebra from a statistician's perspective. New York, Springer-Verlag.

Holt, D. and T. M. F. Smith (1979). Post stratification. Journal of the Royal Statistical Society, Series A, General 142: 33-46.

Holt, D., T. M. F. Smith and T. J. Tomberlin (1979). A model-based approach to estimation for small subgroups of a population. Journal of the American Statistical Association 74: 405-410.

Horvitz, D. G. and D. J. Thompson (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association: 663-685.

Inskip, H., V. Beral, P. Fraser and J. Haskey (1983). Methods for age-adjustment of rates. Statistics in Medicine 2(4): 455-466.

Jagers, P. (1986). Post-stratification against bias in sampling. International Statistical Review 54(2): 159-167.

Jones, H. L. (1974). Jackknife estimation of functions of stratum means. Biometrika 61(2): 343-348.

Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lutkepohl and T. C. Lee (1985). The theory and practice of econometrics. New York, John Wiley and Sons.

Julious, S. A., J. Nicholl and S. George (2001). Why do we continue to use standardized mortality ratios for small area comparisons? Journal of Public Health Medicine 23(1): 40-46.

Kahn, H. A. and C. T. Sempos (1989). Statistical methods in epidemiology. New York, Oxford University Press.

Lee, W.-C. (2002). Standardization using the harmonically weighted ratios: Internal and external comparisons. Statistics in Medicine 21: 247-261.


Lee, W.-C. and Y.-P. Liaw (1999). Optimal weighting systems for direct age-adjustment of vital rates. Statistics in Medicine 18: 2645-2654.

Little, R. J. A. (1982). Direct standardization - a tool for teaching linear-models for un-balanced data. American Statistician 36(1): 38-43.

Liu, J. S. (2000). Msem dominance of estimators in two seemingly unrelated regressions. Journal of Statistical Planning and Inference 88(2): 255-266.

Liu, J. S. and S. G. Wang (1999). Two-stage estimate of the parameters in seemingly unrelated regression model. Progress in Natural Science 9(7): 489-496.

McCarthy, P. J. and C. B. Snowden (1985). The bootstrap and finite population sampling. Vital and Health Statistics. Ser. 2. No. 95. DHHS Pub. No. (PHS) 85-1369, Public Health Service, Washington, US Government Printing Office.

Mukhopadhyay, P. (2000). Topics in survey sampling. New York, Springer-Verlag.

Neison, F. G. P. (1844). On a method recently proposed for conducting inquiries into the comparative sanitary condition of various districts. London Journal of Royal Statistical Society 7: 40-68.

Oh, H. L. and F. J. Scheuren (1983). Weighting adjustments for unit nonresponse. Incomplete data in sample survey - theory and bibliographies. W. G. O. Madow, I. and Rubin, D. B. New York, Academic Press. 2: 435-483.

Rao, J. N. K. (1997). Developments in sample survey theory: An appraisal. The Canadian Journal of Statistics 25(1): 1-21.

Rao, J. N. K. (1999). Some current trends in sample survey theory and methods. Sankhya : The Indian Journal of Statistics, Series B 61(1): 1-57.

Rao, J. N. K. (1999). Some recent advances in model-based small area estimation. Survey Methodology 25(2): 175-186.

Rao, J. N. K. and D. R. Bellhouse (1978). Optimal estimation of a finite population mean under generalized random permutation models. Journal of Statistical Planning and Inference 2: 125-141.

Rao, J. N. K. and C. F. J. Wu (1988). Resampling inference with complex survey data. Journal of the American Statistical Association 83(401): 231-241.

Revankar, N. S. (1974). Some finite sample results in the context of two seemingly unrelated regression equations. Journal of the American Statistical Association 69(345): 187-190.

Revankar, N. S. (1976). Use of restricted residuals in sur systems: Some finite results. Journal of the American Statistical Association 71(353): 183-188.

Rosenbaum, P. R. (1987). Model-based direct adjustment. Journal of the American Statistical Association 82(398): 387-394.

Rothman, K. J., Ed. (1986). Modern epidemiology. Boston, Little, Brown and Company.

Royall, R. M. (1970). Finite population sampling - on labels in estimation. The Annals of Mathematical Statistics 41(5): 1774-1779.

Royall, R. M. (1970). On finite population sampling theory under certain linear regression models. Biometrika 57(2): 377-387.

Royall, R. M. (1971). Linear regression models in finite population sampling. Foundations of statistical inference. V. P. Godambe and D. A. Sprott. Toronto, Holt, Rinehart and Winston: 259-279.


Royall, R. M. (1976). The linear least-squares prediction approach to two-stage sampling. Journal of the American Statistical Association 71: 657-664.

Royall, R. M. and W. G. Cumberland (1978). Variance estimation in finite population sampling. Journal of the American Statistical Association 73: 351-358.

Särndal, C. E., B. Swensson and J. Wretman (1992). Model assisted survey sampling. New York, Springer-Verlag.

Särndal, C. E., B. Swensson and J. H. Wretman (1989). The weighted residual technique for estimating the variance of the general regression estimator of the finite population total. Biometrika 76: 527-537.

Särndal, C. E. and R. L. Wright (1984). Cosmetic form of estimators in survey sampling. Scandinavian Journal of Statistics 11: 146-156.

Searle, S. R., G. Casella and C. E. McCulloch (1992). Variance components. New York, John Wiley & Sons, Inc.

Singh, A. C. and C. A. Mohl (1996). Understanding calibration estimators in survey sampling. Survey methodology 22(2): 107-115.

Smith, T. M. F. (1990). Post-stratification. The Statistician 40: 315-323.

Srivastava, V. K. and D. E. A. Giles (1987). Seemingly unrelated regression equations models: Estimation and inference. New York, Dekker.

Stanek, E. J., J. d. M. Singer and V. B. Lencina (2003). A unified approach to estimation and prediction under simple random sampling. Journal of Statistical Planning and Inference: (Manuscript accepted, December 2002).

Stukel, D. M., M. A. Hidiroglou and C. E. Särndal (1996). Variance estimation for calibration estimators: A comparison of jackknifing versus taylor linearization. Survey methodology 22(2): 117-125.

Thompson, M. E. (1997). Theory of sample surveys. New York, Chapman & Hall.

Thomsen, I. (1978). Design and estimation problems when estimating a regression coefficient from survey data. Metrika 25: 27-35.

Valliant, R. (2002). Variance estimation for the general regression estimator. Survey methodology 28(1): 103-114.

Valliant, R., A. H. Dorfman and R. M. Royall (2000). Finite population sampling and inference, a prediction approach. New York, John Wiley & Sons.

Wolter, K. M. (1985). Introduction to variance estimation. New York, Springer-Verlag.

Wu, C. B. and R. R. Sitter (2001). A model-calibration approach to using complete auxiliary information from survey data. Journal of the American Statistical Association 96(453): 185-193.

Zellner, A. (1962). An efficient method of estimating seemingly unrelated regression and tests for aggregation bias. Journal of the American Statistical Association 57(298): 348-368.

Zellner, A. (1963). Estimators of seemingly unrelated regression equations: Some exact finite sample results. Journal of the American Statistical Association 58: 977-992.

Zhang, L.-C. (1999). A note on post-stratification when analyzing binary survey data subject to nonresponse. Journal of Official Statistics 15(2): 329-334.
