Download - 1 A. Demnati and J. N. K. Rao Statistics Canada / Carleton University A Presentation at the Third International Conference on Establishment Surveys June

1

A. Demnati and J. N. K. Rao

Statistics Canada / Carleton University

A Presentation at the Third International Conference on Establishment Surveys

June 18-21, 2007

Montréal, Québec, Canada

June 20, 2007

Linearization Variance Estimators for Survey Data: Some Recent Work

2

Situation: looking for a method of variance estimation that

is simple

is widely applicable

has good properties

provides unique choice

for estimators

of nonlinear finite population parameters (SM, 2004)

defined explicitly or implicitly (SM, 2004)

using calibration weights (SM, 2004)

under missing data (JSM, 2002 and JMS, 2002)

using repeated survey (FCSM, 2003)

of model parameters (Symposium, 2005)

of dual frames (JSM, 2007)

3

Demnati-Rao Approach

General formulation

Finite population parameters: $\theta_N$

Model parameters: $\theta$

Estimator for both parameters: $\hat{\theta}$

Variance estimators associated with $\theta_N$ and $\theta$ are different

4

Demnati-Rao Approach (Survey Methodology, 2004)

Write the estimator of a finite population parameter $\theta_N$ as

$\hat{\theta} = f(\mathbf{d})$

with

$\mathbf{d} = (d_1, \ldots, d_N)^T$, $d_k = a_k / \pi_k$

$a_k = 1$ if element $k$ is in sample $s$;

$a_k = 0$ if element $k$ is not in sample $s$.

5


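A linearization sampling variance estimator is given by

$\vartheta_L(\hat{\theta}) = \vartheta(\mathbf{z})$

with

$z_k = \partial f(\mathbf{b}) / \partial b_k \,|_{\mathbf{b} = \mathbf{d}}, \quad k \in s$

$\vartheta(\mathbf{u})$: variance estimator of the H-T estimator $\hat{U} = \sum_k d_k u_k$ of the total $U = \sum_k u_k$

$\mathbf{b} = (b_1, \ldots, b_N)^T$ is an $(N \times 1)$ vector of arbitrary numbers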
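A minimal numerical sketch of this recipe, under stated assumptions: the estimator is written as a function f of the weight vector, each z_k is the derivative of f with respect to the k-th weight evaluated at b = d, and the z_k are then plugged into the variance formula for an expanded total. The helper names (`linearization_values`, `srs_variance`), the finite-difference step, the simulated data, and the choice of simple random sampling are illustrative and not taken from the presentation.

```python
import numpy as np

def linearization_values(f, d, eps=1e-6):
    """Approximate z_k = df(b)/db_k at b = d by central differences (sampled k only)."""
    d = np.asarray(d, dtype=float)
    z = np.zeros_like(d)
    for k in np.flatnonzero(d):          # d_k > 0 only for sampled elements
        step = eps * max(1.0, abs(d[k]))
        up, down = d.copy(), d.copy()
        up[k] += step
        down[k] -= step
        z[k] = (f(up) - f(down)) / (2.0 * step)
    return z

def srs_variance(u_sample, N):
    """Variance estimator of an expanded total under SRS: N^2 (1/n - 1/N) s_u^2."""
    n = len(u_sample)
    return N**2 * (1.0 / n - 1.0 / N) * np.var(u_sample, ddof=1)

# Hypothetical data: population of N = 50, simple random sample of n = 10.
rng = np.random.default_rng(0)
N, n = 50, 10
x = rng.uniform(1.0, 10.0, N)
y = 2.0 * x + rng.normal(size=N)
sample = rng.choice(N, size=n, replace=False)
d = np.zeros(N)
d[sample] = N / n                        # d_k = a_k / pi_k with pi_k = n / N

X = x.sum()
ratio_estimator = lambda b: X * (b @ y) / (b @ x)   # f(b) for a ratio estimator

z = linearization_values(ratio_estimator, d)
v_L = srs_variance(z[sample], N)         # linearization variance estimate

# Analytic derivative for comparison: z_k = (X / X_hat) (y_k - R_hat x_k)
X_hat, R_hat = d @ x, (d @ y) / (d @ x)
z_analytic = (X / X_hat) * (y[sample] - R_hat * x[sample])
```

In practice the derivative would usually be taken analytically; the numerical version is only meant to show that the same steps apply to any smooth f.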

6


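Example: Ratio estimator of $\theta_N = Y$

$\hat{\theta} = X \left(\sum_k d_k y_k\right) / \left(\sum_k d_k x_k\right) = (X/\hat{X})\,\hat{Y}$

$f(\mathbf{b}) = X \left(\sum_k b_k y_k\right) / \left(\sum_k b_k x_k\right)$

$z_k = (X/\hat{X})(y_k - \hat{R} x_k) = (X/\hat{X})\, e_k$, where $\hat{R} = \hat{Y}/\hat{X}$

For SRS,

$\vartheta(\mathbf{u}) = N^2 (n^{-1} - N^{-1})\, s_u^2$, with $s_u^2 = \sum_{k \in s} (u_k - \bar{u})^2 / (n-1)$

and

$\vartheta_L(\hat{\theta}) = \vartheta(\mathbf{z}) = (X/\hat{X})^2\, \vartheta(\mathbf{e})$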
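The ratio-estimator example can be sketched in a few lines, assuming simple random sampling and the reading that the linearization estimator equals the customary residual-based estimator multiplied by (X / X_hat)^2. The function name `ratio_variance_estimates` and the four-unit sample are hypothetical.

```python
import numpy as np

def ratio_variance_estimates(y_s, x_s, X, N):
    """Return (v_z, v_e) for the ratio estimator of a total under SRS."""
    y_s, x_s = np.asarray(y_s, float), np.asarray(x_s, float)
    n = len(y_s)
    X_hat = N * x_s.mean()               # expansion estimator of the total X
    R_hat = y_s.sum() / x_s.sum()
    e = y_s - R_hat * x_s                # residuals; they sum to zero by construction
    v_e = N**2 * (1.0 / n - 1.0 / N) * np.var(e, ddof=1)   # customary estimator
    v_z = (X / X_hat) ** 2 * v_e                            # linearization estimator with z_k
    return v_z, v_e

# Hypothetical sample of n = 4 from a population of N = 20 with known X = 110.
y_s = [12.0, 20.0, 31.0, 9.0]
x_s = [5.0, 10.0, 16.0, 4.0]
v_z, v_e = ratio_variance_estimates(y_s, x_s, X=110.0, N=20)
```

The two estimates differ only through the factor (X / X_hat)^2, so they coincide when the sample happens to reproduce the known total X.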

7


Example: Ratio estimator of $\theta_N = Y$

$\vartheta(\mathbf{z})$ is a better choice over customary $\vartheta(\mathbf{e})$:

Royall and Cumberland (1981)

Särndal et al. (1989)

Valliant (1993)

Binder (1996)

Skinner (2004)

8

Demnati-Rao Approach

Also in Survey Methodology, 2004:

Calibration Estimators:

the GREG Estimator

the “Optimal” Regression Estimator

the Generalized Raking Estimator

Two-Phase Sampling

New Extensions:

Wilcoxon Rank-Sum Test

Cox Proportional Hazards Model

9

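Model parameters (Symposium, 2005)

Finite population assumed to be generated from a superpopulation model

Inference on model parameter $\theta$

Total variance of $\hat{\theta}$:

$\mathrm{Var}(\hat{\theta}) = E_m \mathrm{Var}_s(\hat{\theta}) + \mathrm{Var}_m E_s(\hat{\theta})$

$E_m, \mathrm{Var}_m$: model expectation and variance

$E_s, \mathrm{Var}_s$: design expectation and variance

i) if $f \approx 0$ then $\mathrm{Var}(\hat{\theta}) \approx E_m \mathrm{Var}_s(\hat{\theta})$

ii) if $f \approx 1$ then $\mathrm{Var}(\hat{\theta}) \approx \mathrm{Var}_m E_s(\hat{\theta})$

where $f$ is the sampling fraction. For multistage sampling, the psu sampling fraction plays the role of $f$.

In case i), $\vartheta_L(\hat{\theta}) = \vartheta(\mathbf{z})$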

10

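Example: Ratio estimator when y is assumed to be random

$\hat{\theta} = X \left(\sum_k d_k y_k\right) / \left(\sum_k d_k x_k\right)$ for $\theta = \sum_k E_m(y_k)$

Define

$\mathbf{d}_k = (d_{1k}, d_{2k})^T = (d_k, d_k y_k)^T$

We have

$\hat{\theta} = f(\mathbf{A}_d) = X \left(\sum_k d_{2k}\right) / \left(\sum_k d_{1k} x_k\right)$

where $\mathbf{A}_d$ is a $2 \times N$ matrix of random variables with $k$th column $\mathbf{d}_k = (d_{1k}, d_{2k})^T$

We get

$\vartheta_L(\hat{\theta}) = \vartheta(\mathbf{z})$

with

$\mathbf{z}_k = \partial f(\mathbf{A}_b) / \partial \mathbf{b}_k \,|_{\mathbf{A}_b = \mathbf{A}_d} = (X/\hat{X})(-\hat{R} x_k, 1)^T = (z_{1k}, z_{2k})^T, \quad k \in s$

where $\mathbf{A}_b$ is a $2 \times N$ matrix of arbitrary real numbers with $k$th column $\mathbf{b}_k = (b_{1k}, b_{2k})^T$

and where $\vartheta(\mathbf{u})$ is an estimator of the total variance of $\hat{U} = \sum_k \mathbf{u}_k^T \mathbf{d}_k$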

11

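Estimator of the total variance of $\hat{U} = \sum_k \mathbf{u}_k^T \mathbf{d}_k$

A variance estimator of $\hat{U} = \sum_k \mathbf{u}_k^T \mathbf{d}_k$ is given by

$\vartheta(\mathbf{u}) = \sum_k \sum_t \mathbf{u}_k^T \, \mathrm{cov}(\mathbf{d}_k, \mathbf{d}_t) \, \mathbf{u}_t$

with

$\mathrm{cov}(\mathbf{d}_k, \mathbf{d}_t) = d_{kt} \, \frac{\pi_{kt} - \pi_k \pi_t}{\pi_k \pi_t} \, \mathbf{v}_k \mathbf{v}_t^T + d_{kt} \begin{pmatrix} 0 & 0 \\ 0 & \mathrm{cov}_m(y_k, y_t) \end{pmatrix}$

when $\mathbf{d}_k = d_k \mathbf{v}_k$ and $\mathbf{v}_k = (1, y_k)^T$

where $d_{kt} = a_k a_t / \pi_{kt}$

Note that $\mathrm{cov}_m(y_k, y_t)$ is an estimator of model covariance $\mathrm{Cov}_m(y_k, y_t)$, with $\mathrm{cov}_m(y_k, y_t) = 0$ when $\mathrm{Cov}_m(y_k, y_t) = 0$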

12

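Hence

$\vartheta(\mathbf{z}) = \sum_k \sum_t d_{kt} \, \mathrm{cov}_m(y_k, y_t) \, z_{k;m} z_{t;m} + \sum_k \sum_t d_{kt} \, \frac{\pi_{kt} - \pi_k \pi_t}{\pi_k \pi_t} \, z_{k;s} z_{t;s} = \vartheta_m + \vartheta_s$

= model variance + sampling variance

where

$z_{k;m} = z_{2k} = X/\hat{X}$

and

$z_{k;s} = \mathbf{z}_k^T \mathbf{v}_k = z_{1k} + z_{2k} y_k = (X/\hat{X})(y_k - \hat{R} x_k)$

Under SRS,

$\vartheta_s = N^2 \, \frac{1 - n/N}{n} \, (X/\hat{X})^2 \, s_e^2$

where

$s_e^2 = \sum_k a_k (y_k - \hat{R} x_k)^2 / (n-1)$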

13

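Under ratio model,

$E_m(y_k) = \beta x_k$, $\mathrm{Cov}_m(y_k, y_t) = 0$, $k \neq t$, $\theta = \beta X$

$\vartheta_m = (N/n) \, (X/\hat{X})^2 \, (n-1) \, s_e^2$

Note: $\vartheta_m$ remains valid under misspecification of $\mathrm{Var}_m(y_k)$

Hence,

$\vartheta(\mathbf{z}) = \frac{N^2}{n} \, (X/\hat{X})^2 \, \frac{N-1}{N} \, s_e^2$

Note: g-weight $X/\hat{X}$ appears automatically in $\vartheta(\mathbf{z})$

and the finite population correction $1 - n/N$ is absent in $\vartheta(\mathbf{z})$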
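A short sketch of how the two components add up for the ratio estimator under SRS, assuming the formulas read as: model component (N/n)(X/X_hat)^2 (n-1) s_e^2, sampling component N^2 (1 - n/N)/n (X/X_hat)^2 s_e^2, and total (N^2/n)(X/X_hat)^2 ((N-1)/N) s_e^2. The function name `dr_components_srs` and the sample values are hypothetical.

```python
import numpy as np

def dr_components_srs(y_s, x_s, X, N):
    """Model part, sampling part and closed-form total for the ratio estimator under SRS."""
    y_s, x_s = np.asarray(y_s, float), np.asarray(x_s, float)
    n = len(y_s)
    X_hat = N * x_s.mean()
    R_hat = y_s.sum() / x_s.sum()
    s2_e = np.sum((y_s - R_hat * x_s) ** 2) / (n - 1)
    g2 = (X / X_hat) ** 2                          # squared g-weight
    v_m = (N / n) * g2 * (n - 1) * s2_e            # model variance component
    v_s = N**2 * (1.0 - n / N) / n * g2 * s2_e     # sampling variance component
    v_total = (N**2 / n) * g2 * ((N - 1) / N) * s2_e
    return v_m, v_s, v_total

# Hypothetical sample of n = 4 from a population of N = 20 with known X = 110.
v_m, v_s, v_total = dr_components_srs(
    [12.0, 20.0, 31.0, 9.0], [5.0, 10.0, 16.0, 4.0], X=110.0, N=20
)
```

The sum v_m + v_s reproduces the closed form because N(n-1)/n + N(N-n)/n = N(N-1)/n, which is also why the factor 1 - n/N no longer appears in the total.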

14

Simulation 1: Unconditional performance

We generated R=2,000 finite populations $\{y_k\}$, each of size N=393, from the ratio model

$y_k = 2 x_k + x_k^{1/2} \varepsilon_k$

where

$\varepsilon_k$ are independent observations generated from a N(0,1)

$x_k$ are the “number of beds” for the Hospitals population studied in Valliant, Dorfman, and Royall (2000, p. 424-427)

One simple random sample of specified size n is drawn from each generated population

Parameter of interest: $\theta = \sum_k E_m(y_k) = \beta X = 2X$

15


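Ratio estimator: $\hat{\theta} = X (\bar{y}/\bar{x})$

We calculated:

Simulated $\mathrm{MSE}(\hat{\theta}) = R^{-1} \sum_{r=1}^{2000} (\hat{\theta}_r - \theta)^2$

$\bar{\vartheta}_{DR} = R^{-1} \sum_{r=1}^{2000} \vartheta_{DR}(\hat{\theta}_r)$

and its components $\bar{\vartheta}_s$ and $\bar{\vartheta}_m$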
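The simulation design can be sketched as follows, with loud caveats: the Hospitals "number of beds" values are not reproduced here, so the x values are drawn from a hypothetical gamma distribution, the number of replicates is reduced, and the variance formulas are the SRS closed forms for the ratio estimator as read from the earlier slides. The function name `simulate` and all numeric settings other than N = 393 and the slope 2 are assumptions.

```python
import numpy as np

def simulate(x, n, R, beta=2.0, seed=1):
    """Monte Carlo MSE of the ratio estimator and averages of two variance estimators."""
    rng = np.random.default_rng(seed)
    N, X = len(x), x.sum()
    theta = beta * X                                   # model parameter of interest
    est, v_dr, v_s = np.empty(R), np.empty(R), np.empty(R)
    for r in range(R):
        y = beta * x + np.sqrt(x) * rng.standard_normal(N)   # new finite population
        s = rng.choice(N, size=n, replace=False)              # SRS without replacement
        xs, ys = x[s], y[s]
        R_hat = ys.sum() / xs.sum()
        g2 = (X / (N * xs.mean())) ** 2
        s2_e = np.sum((ys - R_hat * xs) ** 2) / (n - 1)
        est[r] = X * R_hat
        v_s[r] = N**2 * (1.0 - n / N) / n * g2 * s2_e          # sampling component only
        v_dr[r] = (N**2 / n) * g2 * ((N - 1) / N) * s2_e       # sampling + model components
    mse = np.mean((est - theta) ** 2)
    return mse, v_dr.mean(), v_s.mean()

# Hypothetical size measures standing in for the Hospitals "number of beds".
x = np.random.default_rng(0).gamma(shape=2.0, scale=135.0, size=393) + 10.0
mse, avg_dr, avg_s = simulate(x, n=100, R=1000)
```

Under these assumptions the sampling component alone is smaller than the total by the fixed factor (N - n)/(N - 1), so it falls further below the simulated MSE as n grows.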

16


Figure 1: Averages of variance estimates for selected sample sizes compared to simulated MSE of the ratio estimator.

17

Simulation 2: Conditional performance

We generated R=20,000 finite populations $\{y_k\}$, each of size N=393, from the ratio model

$y_k = 2 x_k + x_k^{1/2} \varepsilon_k$

using the number of beds as $x_k$

One simple random sample of size n=100 is drawn from each generated population

Parameter of interest: $\theta = \beta X = 2X$

We arranged the 20,000 samples in ascending order of $\bar{x}$-values and then grouped them into 20 groups each of size 1,000
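The grouping step used for the conditional analysis can be sketched as below: samples are sorted by their sample mean of x, split into equal-sized groups, and the relative bias is computed within each group. The function name `conditional_relative_bias` and the synthetic inputs are hypothetical and serve only to check the bookkeeping.

```python
import numpy as np

def conditional_relative_bias(xbar, estimates, theta, n_groups=20):
    """Sort samples by xbar, split into equal groups, return group means and relative biases."""
    xbar = np.asarray(xbar, float)
    estimates = np.asarray(estimates, float)
    order = np.argsort(xbar, kind="stable")
    groups = np.array_split(order, n_groups)
    group_xbar = np.array([xbar[g].mean() for g in groups])
    rel_bias = np.array([(estimates[g].mean() - theta) / theta for g in groups])
    return group_xbar, rel_bias

# Synthetic check: 200 "samples" whose estimates drift linearly with xbar.
rng = np.random.default_rng(0)
xbar = rng.permutation(np.arange(200.0))
theta = 100.0
estimates = theta * (1.0 + 0.001 * xbar)
group_xbar, rel_bias = conditional_relative_bias(xbar, estimates, theta, n_groups=20)
```

The same function applies to variance estimates if `estimates` holds the variance estimates and `theta` the conditional MSE of the group.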

18


Figure 2: Conditional relative bias of the expansion and ratio estimators of $2X$

19


Figure 3: Conditional relative bias of variance estimators

20


Figure 4: Conditional coverage rates of normal theory confidence intervals based on $\vartheta_{cus}$, $\vartheta_s$ and $\vartheta_{DR}$ for nominal level of 95%

21

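g-weighted estimating functions: model parameter $\boldsymbol{\beta}$

Generalized Linear Model: $\mathbf{l}_k(\boldsymbol{\beta}) = \mathbf{u}_k \, (y_k - \mu_k(\boldsymbol{\beta}))$

Linear Regression Model: $\mu_k(\boldsymbol{\beta}) = \mathbf{u}_k^T \boldsymbol{\beta}$

Logistic Regression Model: $\mu_k(\boldsymbol{\beta}) = \exp(\mathbf{u}_k^T \boldsymbol{\beta}) / [1 + \exp(\mathbf{u}_k^T \boldsymbol{\beta})]$

$\hat{\boldsymbol{\beta}}$ is the solution of weighted estimating equation:

$\hat{\mathbf{l}}_w(\boldsymbol{\beta}) = \sum_k w_k \, \mathbf{l}_k(\boldsymbol{\beta}) = \mathbf{0}$

calibration weight: $w_k = d_k \, H(\mathbf{x}_k^T \hat{\boldsymbol{\delta}})$

$\hat{\boldsymbol{\delta}}$ is solution of $\sum_k w_k \mathbf{x}_k = \mathbf{X}$

Special case: $H(a) = 1 + a$ (GREG)

$\vartheta_L(\hat{\boldsymbol{\beta}}) = \vartheta_s + \vartheta_m$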
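A minimal sketch of the two steps on this slide, under stated assumptions: calibration weights are computed for the linear case H(a) = 1 + a, so that the weighted sample totals of x match the known totals X, and the weighted estimating equation is then solved for a logistic model by Newton-Raphson. The function names, the simulated population, the use of the same covariates for calibration and for the model, and the fixed iteration count are all illustrative choices, not the authors' implementation.

```python
import numpy as np

def greg_weights(d, x, X):
    """w_k = d_k (1 + x_k' delta), with delta chosen so that sum_k w_k x_k = X."""
    A = (d[:, None] * x).T @ x                     # sum_k d_k x_k x_k'
    delta = np.linalg.solve(A, X - d @ x)
    return d * (1.0 + x @ delta)

def solve_logistic_ee(w, u, y, n_iter=25):
    """Newton-Raphson for sum_k w_k u_k (y_k - mu_k(beta)) = 0 with logistic mu_k."""
    beta = np.zeros(u.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(u @ beta)))
        score = u.T @ (w * (y - mu))
        info = (u * (w * mu * (1.0 - mu))[:, None]).T @ u
        beta = beta + np.linalg.solve(info, score)
    return beta

# Hypothetical population of N = 1000 and a simple random sample of n = 200.
rng = np.random.default_rng(0)
N, n = 1000, 200
x1 = rng.normal(size=N)
x_pop = np.column_stack([np.ones(N), x1])
p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x1)))
y_pop = rng.binomial(1, p).astype(float)
s = rng.choice(N, size=n, replace=False)
d = np.full(n, N / n)                              # design weights under SRS
x_s, y_s = x_pop[s], y_pop[s]
X_tot = x_pop.sum(axis=0)                          # known calibration totals

w = greg_weights(d, x_s, X_tot)
beta_hat = solve_logistic_ee(w, x_s, y_s)
score_at_solution = x_s.T @ (w * (y_s - 1.0 / (1.0 + np.exp(-(x_s @ beta_hat)))))
```

Other choices of H, such as the exponential form used in raking, would require an iterative solve for delta instead of the single linear solve shown here.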

22

Simulation 3: Estimating equations

We generated R=10,000 finite populations $\{y_k\}$, each of size N=393, from the model

$y_k \mid z_k \sim P(\mu_k)$ with $\mu_k = \exp(\theta_0 + \theta_1 z_k)$

$z_k \sim B(p_k)$ with $p_k = \exp(\gamma_0 + \gamma_1 x_k) / \{1 + \exp(\gamma_0 + \gamma_1 x_k)\}$

Using the number of beds as $x_k$, $(\gamma_0, \gamma_1) = (1, .002)$ leads to an average of about 60% for $z$

Population units are grouped into two classes, with 271 units k having x<350 in class 1 and 122 units k with x>=350 in class 2

Post-stratification: $\mathbf{X} = (271, 122)^T$

One simple random sample of size n=30 is drawn from each generated population

Parameter of interest: $\boldsymbol{\theta} = (\theta_0, \theta_1)^T = (2, 1)^T$
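A sketch of one replicate of this design, with several assumptions stated up front: the x values are hypothetical (arranged so that 271 units fall below 350 and 122 at or above it), the logistic coefficients used to generate z are placeholders rather than the values on the slide, and y is taken to be Poisson with a log-linear mean in z. Post-stratified weights rescale the design weights within each class to the known class counts; with a single binary covariate, the weighted estimating equation then has a closed-form solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical x values: 271 units below 350 (class 1), 122 units at or above 350 (class 2).
x = np.concatenate([rng.uniform(10, 350, 271), rng.uniform(350, 1000, 122)])
N, n = len(x), 30
theta = np.array([2.0, 1.0])
gamma = np.array([-0.1, 0.002])           # placeholder coefficients, not from the slide

p = 1.0 / (1.0 + np.exp(-(gamma[0] + gamma[1] * x)))
z = rng.binomial(1, p)
y = rng.poisson(np.exp(theta[0] + theta[1] * z))

s = rng.choice(N, size=n, replace=False)  # simple random sample
d = np.full(n, N / n)
cls = (x[s] >= 350).astype(int)           # 0 = class 1, 1 = class 2
totals = np.array([271.0, 122.0])

# Post-stratified weights: weighted class counts reproduce the known totals.
N_hat = np.array([d[cls == h].sum() for h in (0, 1)])
w = d * totals[cls] / N_hat[cls]

# Closed-form solution of sum_k w_k (1, z_k)' (y_k - exp(theta0 + theta1 z_k)) = 0:
# exp(theta0) and exp(theta0 + theta1) are the weighted means of y for z = 0 and z = 1.
zs, ys = z[s], y[s]
m0 = np.sum(w[zs == 0] * ys[zs == 0]) / np.sum(w[zs == 0])
m1 = np.sum(w[zs == 1] * ys[zs == 1]) / np.sum(w[zs == 1])
theta_hat = np.array([np.log(m0), np.log(m1 / m0)])

# Score evaluated at the solution, for checking.
u = np.column_stack([np.ones(n), zs])
score = u.T @ (w * (ys - np.exp(u @ theta_hat)))
```

Repeating these steps over many generated populations and taking the variance of `theta_hat` would give Monte Carlo variances of the kind reported for this simulation.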

23

Simulation 3: Estimating equations

Table 1: Monte Carlo Variances

Parameter      No Calibration   Post-stratification
$\theta_0$     0.0133           0.0139
$\theta_1$     0.0161           0.0167

Table 2: DR variance estimator

Parameter      No Calibration   Post-stratification
$\theta_0$     0.0122           0.0123
$\theta_1$     0.0148           0.0150

Table 3: DR naïve variance estimator (columns: No Calibration, Post-stratification)

$\theta_0$     0.0120
$\theta_1$     0.0145

24

Multiple Weight Adjustments

Weight adjustments for

Unit (or complete) nonresponse

Calibration

Due to lack of time, this topic was not presented in the talk, but it is included in the proceedings paper

25

Concluding Remarks

We provided a method of variance estimation for estimators:

of nonlinear model parameters

defined explicitly or implicitly

using survey data

using multiple weight adjustments

under missing data

The method

is simple

is widely applicable

has good properties

provides unique choice

Thank You Very Much
