Linearization Variance Estimators for Survey Data: Some Recent Work
A. Demnati and J. N. K. Rao
Statistics Canada / Carleton University
A Presentation at the Third International Conference on Establishment Surveys
June 18-21, 2007
Montréal, Québec, Canada
June 20, 2007
Situation: looking for a method of variance estimation that
is simple
is widely applicable
has good properties
provides a unique choice
for estimators
of nonlinear finite population parameters (SM, 2004)
defined explicitly or implicitly (SM, 2004)
using calibration weights (SM, 2004)
under missing data (JSM, 2002 and JMS, 2002)
using repeated surveys (FCSM, 2003)
of model parameters (Symposium, 2005)
from dual frames (JSM, 2007)
Demnati–Rao Approach
General formulation
Finite population parameter: $\theta_N$
Model parameter: $\theta$
Estimator for both parameters: $\hat\theta$
Variance estimators associated with $\theta_N$ and $\theta$ are different
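Demnati–Rao Approach (Survey Methodology, 2004)
Write the estimator of a finite population parameter $\theta_N$ as $\hat\theta = f(\mathbf{d})$, with $\mathbf{d} = (d_1, \ldots, d_N)^T$ and $d_k = a_k/\pi_k$, where $\pi_k$ is the inclusion probability of element $k$ and
$a_k = 0$ if element $k$ is not in sample $s$;
$a_k = 1$ if element $k$ is in sample $s$.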
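A linearization sampling variance estimator is given by
$v_L(\hat\theta) = v(\mathbf{z})$, with $z_k = \partial f(\mathbf{b})/\partial b_k \,|_{\mathbf{b} = \mathbf{d}}$, $k \in s$,
where $\mathbf{b} = (b_1, \ldots, b_N)^T$ is an $(N \times 1)$ vector of arbitrary real numbers and
$v(\mathbf{u})$: variance estimator of the H-T estimator $\hat U = \sum_k d_k u_k$ of the total $U = \sum_k u_k$.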
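To make the recipe concrete, here is a minimal numerical sketch, assuming SRS (so that $v(\mathbf{u}) = N^2(n^{-1} - N^{-1}) s_u^2$) and a hypothetical population. The helpers `dr_z` and `v_srs` are illustrative names, the Hájek mean $f(\mathbf{b}) = \sum_k b_k y_k / \sum_k b_k$ is a stand-in nonlinear estimator, and central differences replace the analytic derivative; in practice one would use the closed-form $z_k$ when it is available.

```python
import numpy as np

def dr_z(f, d, sample, eps=1e-4):
    """Approximate z_k = df(b)/db_k at b = d, k in sample, by central differences."""
    d = np.asarray(d, dtype=float)
    z = np.empty(len(sample))
    for j, k in enumerate(sample):
        step = np.zeros_like(d)
        step[k] = eps
        z[j] = (f(d + step) - f(d - step)) / (2.0 * eps)
    return z

def v_srs(u, N):
    """SRS variance estimator of the H-T total of u: N^2 (1/n - 1/N) s_u^2."""
    n = len(u)
    return N**2 * (1.0 / n - 1.0 / N) * np.var(u, ddof=1)

# Hypothetical population and SRS draw (illustration only)
rng = np.random.default_rng(1)
N, n = 200, 40
y = rng.gamma(2.0, 10.0, size=N)
sample = rng.choice(N, size=n, replace=False)
a = np.zeros(N)
a[sample] = 1.0          # a_k = 1 if k is in s, else 0
d = a * N / n            # d_k = a_k / pi_k with pi_k = n/N under SRS

def hajek_mean(b):
    """Example nonlinear estimator: f(b) = sum_k b_k y_k / sum_k b_k."""
    return np.sum(b * y) / np.sum(b)

theta_hat = hajek_mean(d)
z = dr_z(hajek_mean, d, sample)
v_L = v_srs(z, N)        # v_L(theta_hat) = v(z)
```

For the Hájek mean the closed form is $z_k = (y_k - \hat\theta)/\sum_k d_k$, which the numerical $z_k$ should reproduce to many digits.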
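Example: Ratio estimator of $Y_N$
$f(\mathbf{b}) = X \sum_k b_k y_k / \sum_k b_k x_k$, so that $\hat\theta = f(\mathbf{d}) = X \sum_k d_k y_k / \sum_k d_k x_k = (X/\hat X)\hat Y$
$z_k = (X/\hat X)(y_k - \hat R x_k) = (X/\hat X)\, e_k$, with $\hat R = \hat Y/\hat X$
$v_L(\hat\theta) = v(\mathbf{z}) = (X/\hat X)^2\, v(\mathbf{e})$
For SRS, $d_k = N/n$ for $k \in s$, and
$v(\mathbf{u}) = N^2 (n^{-1} - N^{-1})\, s_u^2$, with $s_u^2 = \sum_{k \in s} (u_k - \bar u)^2/(n-1)$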
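A short sketch of the ratio example under SRS, computing $\hat\theta$, $z_k$, $v(\mathbf{z})$ and the customary $v(\mathbf{e})$ side by side. The function name `ratio_srs` and the simulated population are hypothetical; only the formulas come from the slide.

```python
import numpy as np

def ratio_srs(y_s, x_s, X, N):
    """Ratio estimator of the total Y under SRS, with v(z) and the customary v(e)."""
    n = len(y_s)
    d = N / n                          # d_k = N/n for k in s
    Y_hat, X_hat = d * y_s.sum(), d * x_s.sum()
    R_hat = Y_hat / X_hat
    theta_hat = (X / X_hat) * Y_hat    # = X * R_hat
    e = y_s - R_hat * x_s              # e_k = y_k - R_hat x_k
    z = (X / X_hat) * e                # z_k = (X / X_hat) e_k

    def v(u):                          # N^2 (1/n - 1/N) s_u^2
        return N**2 * (1.0 / n - 1.0 / N) * np.var(u, ddof=1)

    return theta_hat, X_hat, v(z), v(e)

# Hypothetical population (illustration only)
rng = np.random.default_rng(7)
N, n = 400, 50
x = rng.uniform(10.0, 100.0, size=N)
y = 2.0 * x + np.sqrt(x) * rng.standard_normal(N)
s = rng.choice(N, size=n, replace=False)
theta_hat, X_hat, v_z, v_e = ratio_srs(y[s], x[s], x.sum(), N)
```

Because $v(\mathbf{z}) = (X/\hat X)^2 v(\mathbf{e})$, the two estimators differ exactly by the squared g-weight: $v(\mathbf{z})$ is larger when the sample under-represents $x$ ($\hat X < X$) and smaller otherwise.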
$v(\mathbf{z})$ is a better choice than the customary $v(\mathbf{e})$:
Royall and Cumberland (1981)
Särndal et al. (1989)
Valliant (1993)
Binder (1996)
Skinner (2004)
Also in Survey Methodology, 2004:
Calibration Estimators:
Two-Phase Sampling
the GREG Estimator
the “Optimal” Regression Estimator
the Generalized Raking Estimator
New Extensions:
Wilcoxon Rank-Sum Test
Cox Proportional Hazards Model
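Model parameters (Symposium, 2005)
Finite population assumed to be generated from a superpopulation model
Inference on model parameter $\theta$
Total variance of $\hat\theta$:
$\mathrm{Var}(\hat\theta) = E_m \mathrm{Var}_s(\hat\theta) + \mathrm{Var}_m E_s(\hat\theta)$
$E_m, \mathrm{Var}_m$: model expectation and variance
$E_s, \mathrm{Var}_s$: design expectation and variance
i) if $f \approx 0$ then $\mathrm{Var}(\hat\theta) \approx E_m \mathrm{Var}_s(\hat\theta)$
ii) if $f \approx 1$ then $\mathrm{Var}(\hat\theta) \approx \mathrm{Var}_m E_s(\hat\theta)$
where $f$ is the sampling fraction. For multistage sampling, the psu sampling fraction plays the role of $f$.
In case i), $v_L(\hat\theta) = v(\mathbf{z})$.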
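Example: Ratio estimator when $y$ is assumed to be random
$\hat\theta = X \sum_k d_k y_k / \sum_k d_k x_k$ for $\theta = \sum_k E_m(y_k)$
Define $\mathbf{d}_k = (d_{1k}, d_{2k})^T = (d_k, d_k y_k)^T$, and let $A_d$ be the $2 \times N$ matrix of random variables with $k$th column $\mathbf{d}_k$.
We have $\hat\theta = f(A_d) = X \sum_k d_{2k} / \sum_k d_{1k} x_k$.
We get
$v_L(\hat\theta) = v(\mathbf{z})$, with $\mathbf{z}_k = \partial f(A_b)/\partial \mathbf{b}_k \,|_{A_b = A_d} = (z_{1k}, z_{2k})^T = (X/\hat X)(-\hat R x_k, 1)^T$, $k \in s$,
where $A_b$ is a $2 \times N$ matrix of arbitrary real numbers with $k$th column $\mathbf{b}_k = (b_{1k}, b_{2k})^T$, and $v(\mathbf{u})$ is an estimator of the total variance of $\hat U = \sum_k \mathbf{u}_k^T \mathbf{d}_k$.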
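The column-wise derivative $\mathbf{z}_k = (X/\hat X)(-\hat R x_k, 1)^T$ can be checked numerically. A minimal sketch with hypothetical data; `f_Ab` and `z_columns` are illustrative names, and central differences stand in for analytic differentiation.

```python
import numpy as np

def f_Ab(B, x, X):
    """f(A_b) = X * sum_k b_2k / sum_k b_1k x_k, where B is the 2 x N matrix A_b."""
    return X * B[1].sum() / (B[0] * x).sum()

def z_columns(B, x, X, sample, eps=1e-4):
    """Central-difference gradient of f(A_b) w.r.t. each column b_k, k in sample."""
    Z = np.zeros((2, len(sample)))
    for j, k in enumerate(sample):
        for r in range(2):
            step = np.zeros_like(B)
            step[r, k] = eps
            Z[r, j] = (f_Ab(B + step, x, X) - f_Ab(B - step, x, X)) / (2.0 * eps)
    return Z

# Hypothetical population and SRS draw (illustration only)
rng = np.random.default_rng(3)
N, n = 300, 30
x = rng.uniform(10.0, 100.0, size=N)
y = 2.0 * x + np.sqrt(x) * rng.standard_normal(N)
sample = rng.choice(N, size=n, replace=False)
d = np.zeros(N)
d[sample] = N / n
A_d = np.vstack([d, d * y])            # kth column d_k = (d_k, d_k y_k)^T
X = x.sum()
Z = z_columns(A_d, x, X, sample)

# Closed form: z_k = (X / X_hat) (-R_hat x_k, 1)^T
X_hat = (d * x).sum()
R_hat = (d * y).sum() / X_hat
Z_exact = (X / X_hat) * np.vstack([-R_hat * x[sample], np.ones(n)])
```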
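Estimator of the total variance of $\hat U = \sum_k \mathbf{u}_k^T \mathbf{d}_k$
$V(\mathbf{u}) = \sum_k \sum_t \mathbf{u}_k^T \,\mathrm{cov}(\mathbf{d}_k, \mathbf{d}_t)\, \mathbf{u}_t$, with $\mathbf{d}_k = d_k \mathbf{v}_k$ and $\mathbf{v}_k = (1, y_k)^T$
A variance estimator of $V(\mathbf{u})$ is given by
$v(\mathbf{u}) = \sum_k \sum_t \mathbf{u}_k^T \,\widehat{\mathrm{cov}}(\mathbf{d}_k, \mathbf{d}_t)\, \mathbf{u}_t$
where
$\widehat{\mathrm{cov}}(\mathbf{d}_k, \mathbf{d}_t) = (d_k d_t - d_{kt})\, \mathbf{v}_k \mathbf{v}_t^T + d_{kt} \begin{pmatrix} 0 & 0 \\ 0 & \widetilde{\mathrm{cov}}_m(y_k, y_t) \end{pmatrix}$, with $d_{kt} = a_k a_t/\pi_{kt}$.
Note that $\widetilde{\mathrm{cov}}_m(y_k, y_t)$ is an estimator of the model covariance $\mathrm{Cov}_m(y_k, y_t)$ when $\mathrm{Cov}_m(y_k, y_t) \neq 0$, and $\widetilde{\mathrm{cov}}_m(y_k, y_t) = 0$ when $\mathrm{Cov}_m(y_k, y_t) = 0$.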
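A direct transcription of $v(\mathbf{u})$ into matrix form, as a sketch assuming the joint inclusion probabilities $\pi_{kt}$ of the sampled units are available and that the user supplies the matrix of $\widetilde{\mathrm{cov}}_m(y_k, y_t)$. The function name and the split of the result into a model part and a design part are illustrative choices.

```python
import numpy as np

def dr_total_variance(U, y_s, pi, pi_kt, cov_m):
    """Return (v_m, v_s) with v(u) = v_m + v_s = sum_k sum_t u_k^T cov_hat(d_k, d_t) u_t.

    U     : 2 x n array whose kth column is u_k = (u_1k, u_2k)^T
    y_s   : sampled y values, so that v_k = (1, y_k)^T
    pi    : inclusion probabilities pi_k of the sampled units
    pi_kt : n x n joint inclusion probabilities, with pi_kt[k, k] = pi_k
    cov_m : n x n matrix of cov~_m(y_k, y_t); zero where Cov_m(y_k, y_t) = 0
    """
    d = 1.0 / pi                           # d_k = a_k / pi_k on the sample
    d_kt = 1.0 / pi_kt                     # d_kt = a_k a_t / pi_kt on the sample
    s = U[0] + U[1] * y_s                  # u_k^T v_k
    v_s = s @ (np.outer(d, d) - d_kt) @ s  # (d_k d_t - d_kt) v_k v_t^T part
    v_m = U[1] @ (d_kt * cov_m) @ U[1]     # d_kt cov~_m(y_k, y_t) part
    return v_m, v_s

# SRS check with hypothetical values (illustration only)
rng = np.random.default_rng(0)
N, n = 50, 10
y_s = rng.normal(10.0, 2.0, size=n)
u1 = rng.normal(0.0, 1.0, size=n)
pi = np.full(n, n / N)
pi_kt = np.full((n, n), n * (n - 1) / (N * (N - 1)))
np.fill_diagonal(pi_kt, n / N)
U = np.vstack([u1, np.zeros(n)])
v_m, v_s = dr_total_variance(U, y_s, pi, pi_kt, np.zeros((n, n)))
```

With $u_{2k} = 0$ the model part vanishes and, under SRS, the design part reduces algebraically to $N^2(n^{-1} - N^{-1}) s_u^2$, which gives a quick correctness check.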
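Hence
$v(\mathbf{z}) = v_m(\mathbf{z}) + v_s(\mathbf{z})$ = model variance + sampling variance,
where
$v_m(\mathbf{z}) = \sum_k \sum_t d_{kt}\, z_{m;k} z_{m;t}\, \widetilde{\mathrm{cov}}_m(y_k, y_t)$ and $v_s(\mathbf{z}) = \sum_k \sum_t (d_k d_t - d_{kt})\, z_{s;k} z_{s;t}$,
with $z_{m;k} = z_{2k} = X/\hat X$
and $z_{s;k} = \mathbf{z}_k^T \mathbf{v}_k = z_{1k} + z_{2k} y_k = (X/\hat X)(y_k - \hat R x_k)$.
Under SRS,
$v_s(\mathbf{z}) = (N^2/n)(1 - n/N)(X/\hat X)^2 s_e^2$, where $s_e^2 = \sum_k a_k (y_k - \hat R x_k)^2/(n-1)$.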
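Under the ratio model, $E_m(y_k) = \beta x_k$ and $\mathrm{Cov}_m(y_k, y_t) = 0$ for $k \neq t$:
$v_m(\mathbf{z}) = (N/n)(X/\hat X)^2 (n-1)\, s_e^2$
Hence,
$v(\mathbf{z}) = (N^2/n)(1 - 1/N)(X/\hat X)^2 s_e^2$
Note: the g-weight $X/\hat X$ appears automatically in $v(\mathbf{z})$, and the finite population correction $1 - n/N$ is absent in $v(\mathbf{z})$.
Note: $v(\mathbf{z})$ remains valid under misspecification of $\mathrm{Var}_m(y_k)$.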
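The closed forms for the ratio model under SRS can be verified arithmetically. This sketch assumes $\widetilde{\mathrm{cov}}_m(y_k, y_k) = e_k^2$ with $e_k = y_k - \hat R x_k$ (and zero off the diagonal), which is the choice that reproduces the $v_m(\mathbf{z})$ expression; the data and the function name are hypothetical.

```python
import numpy as np

def ratio_total_variance_srs(y_s, x_s, X, N):
    """v_m and v_s for the ratio estimator under SRS and the ratio model."""
    n = len(y_s)
    X_hat = N * x_s.mean()
    R_hat = y_s.mean() / x_s.mean()
    s2_e = np.sum((y_s - R_hat * x_s) ** 2) / (n - 1)
    g = X / X_hat                                      # g-weight
    v_s = (1.0 - n / N) * (N**2 / n) * g**2 * s2_e     # sampling variance
    v_m = (N / n) * g**2 * (n - 1) * s2_e              # model variance (assumes e_k^2)
    return v_m, v_s, g, s2_e

rng = np.random.default_rng(11)
N, n = 393, 50
x = rng.uniform(20.0, 800.0, size=N)                   # hypothetical x_k
y = 2.0 * x + np.sqrt(x) * rng.standard_normal(N)
s = rng.choice(N, size=n, replace=False)
v_m, v_s, g, s2_e = ratio_total_variance_srs(y[s], x[s], x.sum(), N)
v_total = v_m + v_s
```

Adding the two parts gives $g^2 s_e^2 (N/n)[(n-1) + (N-n)] = (N^2/n)(1 - 1/N) g^2 s_e^2$, which is why the $1 - n/N$ factor disappears.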
Simulation 1: Unconditional performance
We generated R = 2,000 finite populations $\{y_k\}$, each of size N = 393, from the ratio model
$y_k = 2x_k + x_k^{1/2}\varepsilon_k$,
where
$\varepsilon_k$ are independent observations generated from N(0,1);
$x_k$ are the "number of beds" for the Hospitals population studied in Valliant, Dorfman, and Royall (2000, pp. 424-427).
One simple random sample of specified size n is drawn from each generated population.
Parameter of interest: $\theta = \sum_k E_m(y_k) = 2X$
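We calculated:
Ratio estimator: $\hat\theta = X(\bar y/\bar x)$
Simulated $\mathrm{MSE}(\hat\theta) = \frac{1}{2000}\sum_{r=1}^{2000} (\hat\theta_r - \theta)^2$
$\bar v_{DR}(\hat\theta) = \frac{1}{2000}\sum_{r=1}^{2000} v_{DR,r}(\hat\theta)$ and its components $v_s$ and $v_m$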
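A scaled-down sketch of Simulation 1. The hospital "number of beds" data are not reproduced here, so a hypothetical uniform $x_k$ stands in for them, and the numbers it produces will not match Figure 1. The model-variance part again assumes the squared residuals $e_k^2$ estimate $\mathrm{Var}_m(y_k)$; the function name is illustrative.

```python
import numpy as np

def simulate_unconditional(x, n, R=2000, beta=2.0, seed=0):
    """Monte Carlo MSE of the ratio estimator and averages of v_DR, v_s and v_m."""
    rng = np.random.default_rng(seed)
    N, X = len(x), x.sum()
    theta = beta * X                                         # sum_k E_m(y_k) = beta X
    est, v_s, v_m = np.empty(R), np.empty(R), np.empty(R)
    for r in range(R):
        y = beta * x + np.sqrt(x) * rng.standard_normal(N)   # ratio model
        s = rng.choice(N, size=n, replace=False)             # one SRS per population
        xs, ys = x[s], y[s]
        R_hat = ys.mean() / xs.mean()
        g = X / (N * xs.mean())                              # X / X_hat
        s2_e = np.sum((ys - R_hat * xs) ** 2) / (n - 1)
        est[r] = X * R_hat                                   # X (y_bar / x_bar)
        v_s[r] = (1.0 - n / N) * (N**2 / n) * g**2 * s2_e
        v_m[r] = (N / n) * g**2 * (n - 1) * s2_e
    mse = np.mean((est - theta) ** 2)
    return mse, np.mean(v_s + v_m), np.mean(v_s), np.mean(v_m)

# Hypothetical stand-in for the "number of beds"
x = np.random.default_rng(2024).uniform(20.0, 1000.0, size=393)
mse, v_dr_bar, v_s_bar, v_m_bar = simulate_unconditional(x, n=50)
```

Comparing `v_dr_bar` and `v_s_bar` with `mse` for several values of `n` gives the kind of comparison summarized in Figure 1.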
Figure 1: Averages of variance estimates for selected sample sizes, compared with the simulated MSE of the ratio estimator.
Simulation 2: Conditional performance
We generated R = 20,000 finite populations $\{y_k\}$, each of size N = 393, from the ratio model $y_k = 2x_k + x_k^{1/2}\varepsilon_k$, using the number of beds as $x_k$.
One simple random sample of size n = 100 is drawn from each generated population.
Parameter of interest: $\theta = 2X$
We arranged the 20,000 samples in ascending order of $\bar x$-values and then grouped them into 20 groups, each of size 1,000.
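A sketch of the conditional analysis, using hypothetical $x_k$ and a smaller $R$ (2,000 samples in 20 groups of 100 instead of 20,000 in groups of 1,000). The conditional relative bias is taken as the within-group mean of the estimator divided by $\theta$, minus 1, and coverage uses $\hat\theta \pm 1.96\sqrt{v_{DR}}$; these definitions are assumptions, since the slides do not spell them out. The function name is illustrative.

```python
import numpy as np

def conditional_performance(x, n, R=2000, n_groups=20, beta=2.0, seed=0):
    """Group samples by x_bar; return conditional relative biases and coverage."""
    rng = np.random.default_rng(seed)
    N, X = len(x), x.sum()
    theta = beta * X
    xbar, ratio, expan, v_dr = (np.empty(R) for _ in range(4))
    for r in range(R):
        y = beta * x + np.sqrt(x) * rng.standard_normal(N)    # ratio model
        s = rng.choice(N, size=n, replace=False)
        xs, ys = x[s], y[s]
        xbar[r] = xs.mean()
        ratio[r] = X * ys.mean() / xs.mean()                  # ratio estimator
        expan[r] = N * ys.mean()                              # expansion estimator
        g = X / (N * xs.mean())
        s2_e = np.sum((ys - ys.mean() / xs.mean() * xs) ** 2) / (n - 1)
        v_dr[r] = (N**2 / n) * (1.0 - 1.0 / N) * g**2 * s2_e  # v_m + v_s
    groups = np.array_split(np.argsort(xbar), n_groups)       # ascending x_bar
    rb_ratio = np.array([ratio[g].mean() / theta - 1 for g in groups])
    rb_expan = np.array([expan[g].mean() / theta - 1 for g in groups])
    cover = np.array([np.mean(np.abs(ratio[g] - theta) <= 1.96 * np.sqrt(v_dr[g]))
                      for g in groups])
    return rb_ratio, rb_expan, cover

x = np.random.default_rng(99).uniform(20.0, 1000.0, size=393)  # hypothetical x_k
rb_ratio, rb_expan, cover = conditional_performance(x, n=100)
```

Since $y_k$ is roughly proportional to $x_k$, the expansion estimator's conditional bias should track $\bar x$, while the ratio estimator stays close to conditionally unbiased.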
Figure 2: Conditional relative bias of the expansion and ratio estimators of $2X$
Figure 3: Conditional relative bias of variance estimators
Figure 4: Conditional coverage rates of normal-theory confidence intervals based on three variance estimators, for a nominal level of 95%
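g-weighted estimating functions: model parameter
Generalized linear model: $\mathbf{l}_k(\boldsymbol\beta) = \mathbf{u}_k\,(y_k - \mu_k(\boldsymbol\beta))$
Linear regression model: $\mu_k(\boldsymbol\beta) = \mathbf{u}_k^T \boldsymbol\beta$
Logistic regression model: $\mu_k(\boldsymbol\beta) = \exp(\mathbf{u}_k^T \boldsymbol\beta)/[1 + \exp(\mathbf{u}_k^T \boldsymbol\beta)]$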
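$\hat{\boldsymbol\beta}$ is the solution of the weighted estimating equation
$\hat{\mathbf{l}}_w(\boldsymbol\beta) = \sum_k w_k \mathbf{l}_k(\boldsymbol\beta) = \mathbf{0}$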
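A minimal sketch of solving $\hat{\mathbf{l}}_w(\boldsymbol\beta) = \mathbf{0}$ for a canonical-link GLM by Newton-Raphson, which is a solver choice not stated on the slide. The weights are hypothetical positive numbers here; in the estimating equation they would be the weights $w_k$. The function and argument names are illustrative.

```python
import numpy as np

def solve_weighted_ee(U, y, w, mu, dmu, beta0, tol=1e-10, max_iter=50):
    """Newton-Raphson for sum_k w_k u_k (y_k - mu_k(beta)) = 0 (canonical link)."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        eta = U @ beta                                 # u_k^T beta
        score = U.T @ (w * (y - mu(eta)))              # l_hat_w(beta)
        J = U.T @ (U * (w * dmu(eta))[:, None])        # -d l_hat_w / d beta
        step = np.linalg.solve(J, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

# Hypothetical sample with positive weights w_k (illustration only)
rng = np.random.default_rng(5)
n = 200
U = np.column_stack([np.ones(n), rng.normal(size=n)])
y_bin = (rng.uniform(size=n) < expit(-0.5 + 1.0 * U[:, 1])).astype(float)
w = rng.uniform(5.0, 15.0, size=n)

# Logistic model: mu = expit, dmu/deta = mu (1 - mu)
beta_logit = solve_weighted_ee(U, y_bin, w, expit,
                               lambda e: expit(e) * (1 - expit(e)), [0.0, 0.0])
# Linear model: mu = eta, dmu/deta = 1
y_lin = 1.0 + 2.0 * U[:, 1] + rng.normal(size=n)
beta_lin = solve_weighted_ee(U, y_lin, w, lambda e: e, np.ones_like, [0.0, 0.0])
```

For the linear model one Newton step lands on the weighted least-squares solution, which is a convenient check.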
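where $w_k = d_k H(\mathbf{x}_k^T \hat{\boldsymbol\delta})$ is the calibration weight and $\hat{\boldsymbol\delta}$ is the solution of $\sum_k w_k \mathbf{x}_k = \mathbf{X}$
Special case: $H(a) = 1 + a$ (GREG)
$v_L(\hat{\boldsymbol\beta}) = v_m + v_s$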
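A sketch of the GREG special case $H(a) = 1 + a$: the calibration equation becomes linear in $\boldsymbol\delta$, so $\hat{\boldsymbol\delta} = (\sum_k d_k \mathbf{x}_k \mathbf{x}_k^T)^{-1}(\mathbf{X} - \hat{\mathbf{X}})$. With $\mathbf{x}_k$ equal to post-stratum indicators this reproduces the post-stratified weights $d_k N_h/\hat N_h$. The function name and the fixed sample below are hypothetical.

```python
import numpy as np

def greg_weights(d, Xs, X_total):
    """Calibration weights w_k = d_k H(x_k^T delta_hat) with H(a) = 1 + a (GREG)."""
    T = Xs.T @ (Xs * d[:, None])                 # sum_k d_k x_k x_k^T
    X_hat = Xs.T @ d                             # H-T estimate of X
    delta_hat = np.linalg.solve(T, X_total - X_hat)
    return d * (1.0 + Xs @ delta_hat)

# Post-stratification as a special case: x_k = class indicators (illustration)
N, n = 393, 30
cls = (np.arange(N) >= 271).astype(int)          # 271 units in class 1, 122 in class 2
sample = np.r_[0:20, 271:281]                    # hypothetical sample with both classes
d = np.full(n, N / n)
Xs = np.column_stack([cls[sample] == 0, cls[sample] == 1]).astype(float)
X_total = np.array([271.0, 122.0])
w = greg_weights(d, Xs, X_total)
```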
Simulation 3: Estimating equations
We generated R = 10,000 finite populations $\{y_k\}$, each of size N = 393, from the model
$y_k \mid z_k \sim P(\mu_k)$ (Poisson), with $\mu_k = \exp(\theta_0 + \theta_1 z_k)$,
$z_k \sim B(p_k)$ (Bernoulli), with $p_k = \exp(\alpha_0 + \alpha_1 x_k)/\{1 + \exp(\alpha_0 + \alpha_1 x_k)\}$, using the number of beds as $x_k$;
$(\alpha_0, \alpha_1) = (1, .002)$ leads to an average of about 60% for $z_k$.
One simple random sample of size n = 30 is drawn from each generated population.
Parameter of interest: $\boldsymbol\theta = (\theta_0, \theta_1)^T = (2, 1)^T$
Population units are grouped into two classes, with 271 units having $x < 350$ in class 1 and 122 units with $x \geq 350$ in class 2.
Post-stratification: $\mathbf{X} = (271, 122)^T$
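A sketch of one replicate of Simulation 3, reading $P(\cdot)$ as Poisson and $B(\cdot)$ as Bernoulli, with the logistic parameters labeled $(\alpha_0, \alpha_1)$ (labels chosen here). The $x_k$ are hypothetical stand-ins for the number of beds, so the post-stratum sizes and the share of $z_k = 1$ will differ from 271/122 and 60%. The redraw loop, which discards samples with an empty post-stratum or no variation in $z_k$, is an added safeguard for this illustration and not part of the described design; the fits use Newton-Raphson, and all function names are hypothetical.

```python
import numpy as np

def fit_poisson_ee(U, y, w, tol=1e-10, max_iter=100):
    """Newton-Raphson for sum_k w_k u_k (y_k - exp(u_k^T beta)) = 0."""
    beta = np.array([np.log(np.average(y, weights=w)), 0.0])
    for _ in range(max_iter):
        mu = np.exp(U @ beta)
        score = U.T @ (w * (y - mu))
        J = U.T @ (U * (w * mu)[:, None])
        step = np.linalg.solve(J, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def one_replicate(x, n, theta=(2.0, 1.0), alpha=(1.0, 0.002), cut=350.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    N = len(x)
    p = 1.0 / (1.0 + np.exp(-(alpha[0] + alpha[1] * x)))   # logistic p_k
    z = (rng.uniform(size=N) < p).astype(float)             # z_k ~ Bernoulli(p_k)
    y = rng.poisson(np.exp(theta[0] + theta[1] * z))         # y_k | z_k ~ Poisson(mu_k)
    cls = (x >= cut).astype(int)                              # post-strata
    N_h = np.bincount(cls, minlength=2).astype(float)
    while True:                                               # safeguard: redraw degenerate samples
        s = rng.choice(N, size=n, replace=False)
        if len(np.unique(cls[s])) == 2 and 0 < z[s].sum() < n:
            break
    d = np.full(n, N / n)
    N_hat = np.bincount(cls[s], weights=d, minlength=2)
    w = d * N_h[cls[s]] / N_hat[cls[s]]                       # post-stratified weights
    U = np.column_stack([np.ones(n), z[s]])
    return fit_poisson_ee(U, y[s], d), fit_poisson_ee(U, y[s], w), w, cls[s], N_h

x = np.random.default_rng(393).uniform(20.0, 1000.0, size=393)  # hypothetical "beds"
rng = np.random.default_rng(0)
beta_hat_d, beta_hat_w, w, cls_s, N_h = one_replicate(x, n=30, rng=rng)
```

Repeating `one_replicate` many times and taking the variance of `beta_hat_d` and `beta_hat_w` gives Monte Carlo variances of the kind reported for "No calibration" and "Post-stratification".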
Table 1: Monte Carlo variances
Parameter     No calibration     Post-stratification
$\theta_0$    0.0133             0.0139
$\theta_1$    0.0161             0.0167

Table 2: DR variance estimator
Parameter     No calibration     Post-stratification
$\theta_0$    0.0122             0.0123
$\theta_1$    0.0148             0.0150

Table 3: DR naïve variance estimator
Parameter     No calibration     Post-stratification
$\theta_0$    –                  0.0120
$\theta_1$    –                  0.0145
Multiple Weight Adjustments
Weight adjustments for
unit (or complete) nonresponse
calibration
Due to lack of time, this topic is not presented in the talk, but it is included in the proceedings paper.
Concluding Remarks
We provided a method of variance estimation for estimators
of nonlinear model parameters
using survey data
defined explicitly or implicitly
using multiple weight adjustments
under missing data
The method
is simple
is widely applicable
has good properties
provides a unique choice
Thank you very much