Empirical methods for unbalanced panel data: An empirical application to the effect of class size reduction on SAT score for grades in K-3. Do Won Kwak, Michigan State University, Feb 2011


Empirical methods for unbalanced panel data: An empirical application to the effect of class size reduction on SAT score for grades in K-3

Do Won Kwak
Michigan State University

Feb 2011


Introduction

- Empirical strategies to deal with unbalanced panel data
  - Large cross-section and small time dimension
  - A substantial proportion of the data is missing (e.g. PSID, SIPP, NLSY and so forth)
- Typical reasons for missing data in panel data
  - Attrition
  - Non-response
  - Lost survey form
  - Administrative data with missing values
- Inappropriate (traditional) ways of handling missing data in previous studies
  - Ignoring it (i.e. complete-case analysis)
  - Single imputation
  - Last observation carried forward


MCAR assumption

- Traditional methods are valid only when the missing completely at random (MCAR) assumption is satisfied.
  - MCAR means that missingness is not correlated with any other variable.
  - The MCAR assumption is violated if there is differential missing in the data.
- Differential missing (i.e. a systematically different counterfactual response variable across covariates for missing units) violates MCAR and biases the estimates.

Examples of violation of MCAR

- The effect of class size reduction on SAT score
  1. Among the students who left the program, the SAT score is significantly higher for students in regular classes than for students in small classes.
  2. The complete-case analysis, which ignores missing data, overestimates the effect of small classes on SAT score.
- The effect of kindergarten entering age on SAT score
  1. Among the students who left the program, the SAT score is significantly higher for young students than for old students.
  2. The complete-case analysis, which ignores missing data, overestimates the effect of age on SAT score.


Eliminating the bias using IPW method

- Illustration of eliminating the bias from differential missing using the IPW method
  - The missing process for unit i is completely determined by race.
  - There are 16 students (9 White, 4 Black, 3 Hispanic) at the beginning.
  - There are 12 students (9 White, 2 Black, 1 Hispanic) at the end of the study.
  - Probability of selection: $P(s_{it}=1 \mid \text{white}) = 1$; $P(s_{it}=1 \mid \text{black}) = \tfrac{1}{2}$; $P(s_{it}=1 \mid \text{hispanic}) = \tfrac{1}{3}$.
  - Weighting: the weight is 1 for the 9 White students, 2 for the 2 Black students, and 3 for the 1 Hispanic student.
  - The 2 Black students play the role of 4 Black students, and the 1 Hispanic student plays the role of 3 Hispanic students.
  - This weighting allows the completers to mimic a pseudo-random sample from the population.
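The arithmetic of this illustration can be checked with a few lines of Python. This is only a sketch of the toy example: the counts come from the slide, while the variable names are illustrative and not part of the original study.

```python
from fractions import Fraction

# Counts taken from the illustration above.
initial = {"White": 9, "Black": 4, "Hispanic": 3}
remaining = {"White": 9, "Black": 2, "Hispanic": 1}

# Probability of remaining in the sample, by race.
p_select = {g: Fraction(remaining[g], initial[g]) for g in initial}

# Inverse probability weight attached to each completer.
weights = {g: 1 / p_select[g] for g in initial}

# Each completer stands in for `weight` students of the initial sample.
pseudo_counts = {g: remaining[g] * weights[g] for g in initial}

print(weights)
print(pseudo_counts)
```

After weighting, the 12 completers reproduce the racial composition of the initial 16 students, which is the sense in which they mimic a random sample.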


 

Missing Pattern

Unit     Y (continuous)  X1 (binary)  X2 (categorical)  X3 (truncated)  Missing pattern
Unit1    V               V            V                 V               Complete
Unit2    V               V            V                 V               Complete
Unit3    M               M            M                 M               Monotone
Unit4    V               V            V                 V               Complete
Unit5    M               V            V                 V               Non-monotone
Unit6    V               V            V                 V               Complete
Unit7    M               M            M                 M               Monotone
Unit8    M               M            M                 M               Monotone
Unit9    V               V            V                 V               Complete
Unit10   M               M            M                 M               Monotone
Unit11   V               V            V                 V               Complete
Unit12   V               M            V                 V               Non-monotone
Unit13   V               V            V                 V               Complete
Unit14   V               V            V                 V               Complete
Unit15   M               V            M                 V               Non-monotone
Unit16   M               V            M                 M               Non-monotone

(Note) V: observation available; M: observation missing. Missing proportion: 0.5 (8/16); proportion of monotone missing among missing: 0.5.
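The pattern labels in this table can be reproduced mechanically. The sketch below encodes the table as observed/missing flags and applies a simple rule that is an assumption read off the table itself: a unit with every variable missing is treated as monotone (it has left the sample), and a unit with only some variables missing as non-monotone. The function name `classify` is illustrative.

```python
from collections import Counter

# 1 = observed (V), 0 = missing (M); columns are Y, X1, X2, X3.
observed = {
    1: (1, 1, 1, 1), 2: (1, 1, 1, 1), 3: (0, 0, 0, 0), 4: (1, 1, 1, 1),
    5: (0, 1, 1, 1), 6: (1, 1, 1, 1), 7: (0, 0, 0, 0), 8: (0, 0, 0, 0),
    9: (1, 1, 1, 1), 10: (0, 0, 0, 0), 11: (1, 1, 1, 1), 12: (1, 0, 1, 1),
    13: (1, 1, 1, 1), 14: (1, 1, 1, 1), 15: (0, 1, 0, 1), 16: (0, 1, 0, 0),
}

def classify(flags):
    """Label one unit from its observed/missing flags (assumed rule)."""
    if all(flags):
        return "complete"
    if not any(flags):
        return "monotone"      # the whole unit is gone (attrition)
    return "non-monotone"      # unit is present, some variables are missing

patterns = {unit: classify(flags) for unit, flags in observed.items()}
counts = Counter(patterns.values())

missing_share = 1 - counts["complete"] / len(observed)
monotone_share = counts["monotone"] / (counts["monotone"] + counts["non-monotone"])
print(counts, missing_share, monotone_share)
```

The two shares match the proportions stated in the table note.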


  

 

 

Partition of units based on probability of observing all variables

[Figure: the 16 units partitioned into three groups; completers were shown in bold and the groups were distinguished by color.]

Group A: U1, U2; U3, U7
Group B: U4; U8, U10
Group C: U6, U9, U11, U13, U14; U5, U12, U15, U16

(Note) Completers are shown in bold. Yellow units share the same predictors, with probability of non-missing 1/2. Blue units share the same predictors, with probability of non-missing 1/3. Gray units share the same predictors, with probability of non-missing 1.


Objectives of the current study

- Eliminate the bias from differential missing by applying inverse probability weighted (IPW) and multiple imputation (MI) methods under the missing at random (MAR) assumption. MAR is satisfied if missingness depends only on observed variables.
  1. Developing a test for the MCAR assumption.
  2. Studying IPW and MI methods that can provide valid inference even if MCAR is not satisfied.

Contribution of the current study

- The robust Hausman test has reliable power.
  - MC simulation shows that the power is close to 1 for large samples.
- The IPW estimator has smaller bias than the unweighted estimator even when the probability of selection is misspecified.
- This study suggests a new method that combines the IPW and MI methods.
- An example shows that the IPW estimator improves on the unweighted estimator.
  - Assuming MCAR, the complete-case analysis gives an effect of class size reduction on SAT score of about 5.5 to 7%, while the estimated effect from the IPW method is about 4.5 to 6%. Using the IPW method, we eliminate the bias from non-random missing.


Outline of today's talk

- Idea behind the test of MCAR
- Robust Hausman test
  1. Construction of test statistic
  2. Convergence to χ² distribution
  3. Power test with Monte Carlo experiment
- Distinguishing missing patterns of unbalanced panel data: monotone vs non-monotone
- MI and IPW methods for unbalanced panel data under MAR
  - Implementation of MI and IPW methods
  - Sensitivity analysis for IPW method to misspecified selection probability
- Example of application: the effect of class size reduction on SAT score
  - Test MCAR using robust Hausman test and application of MI and IPW methods


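Extension of Hausman test to unbalanced panel data

- Model: linear panel data model

\[
y_{it} = D_{it}\,\alpha + x_{1it}\,\delta + x_{2i}\,\gamma + f_t + c_i + u_{it} = x_{it}\,\theta + v_{it} \tag{1}
\]

where $v_{it} = c_i + u_{it}$, $x_{it} = (D_{it}\;\; x_{1it}\;\; x_{2i}\;\; i_T)$, $\theta = (\alpha'\;\; \delta'\;\; \gamma'\;\; f')'$.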

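- Pooled LS, FE and FD estimators are all consistent under the null of MCAR.

\[
\hat\theta_{pls} = \Big(\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\,x_{it}'x_{it}\Big)^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\,x_{it}'y_{it}
\]

\[
\hat\theta_{fe} = \Big(\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\,\ddot{x}_{it}'\ddot{x}_{it}\Big)^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\,\ddot{x}_{it}'\ddot{y}_{it}
\]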

- A robust Hausman test statistic is constructed by differencing two estimators that are both consistent under the null.
- Rejection of the null implies inconsistency among the pooled, FE and FD estimators.

                    Null                      Alternative
Pooled estimator    consistent, inefficient   inconsistent
FE/FD estimator     consistent, inefficient   consistent


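Robust Hausman Test: Test of MCAR assumption

Consider the stacked estimator $\hat\theta_a = (\hat\alpha_{pls}'\;\; \hat\alpha_{fe}')'$.

\[
\hat\theta = \begin{pmatrix} \hat\theta_{pls} \\ \hat\theta_{fe} \end{pmatrix}
= \begin{bmatrix}
\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\,x_{it}'x_{it} & 0 \\
0 & \sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\,\ddot{x}_{it}'\ddot{x}_{it}
\end{bmatrix}^{-1}
\begin{bmatrix}
\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\,x_{it}'y_{it} \\
\sum_{i=1}^{N}\sum_{t=1}^{T} s_{it}\,\ddot{x}_{it}'\ddot{y}_{it}
\end{bmatrix}
\]

\[
\hat\theta_a = (\hat\alpha_{pls}'\;\; \hat\alpha_{fe}')'; \qquad \theta_a = (\alpha_{pls}'\;\; \alpha_{fe}')';
\]

where $\alpha_{pls}$ and $\alpha_{fe}$ are the probability limits of $\hat\alpha_{pls}$ and $\hat\alpha_{fe}$ respectively.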


Construction of a Wald statistic

Let $R = [I_k \mid -I_k]$, where $R$ is a $k \times 2k$ matrix if the number of treatments is $k$.

\[
R\theta_a =
\begin{bmatrix}
1 & 0 & \cdots & 0 & -1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & 0 & -1 & \cdots & 0 \\
\vdots & & \ddots & \vdots & \vdots & & \ddots & \vdots \\
0 & 0 & \cdots & 1 & 0 & 0 & \cdots & -1
\end{bmatrix}
\begin{bmatrix}
\theta_{1,pls} \\ \vdots \\ \theta_{k,pls} \\ \theta_{1,fe} \\ \vdots \\ \theta_{k,fe}
\end{bmatrix}
\]

Consider the null hypothesis $H_0: R\theta_a = 0$ (i.e. $\theta_{j,pls} - \theta_{j,fe} = 0 \;\; \forall j = 1, 2, \dots, k$).


A Wald statistic for robust Hausman Test

- Under the null $H_0: R\theta_a = 0$ (i.e. $\theta_{j,pls} - \theta_{j,fe} = 0 \;\; \forall j = 1, 2, \dots, k$), $W_a$ has an asymptotic $\chi^2_k$ distribution.

\[
W_a = [R\sqrt{N}\,\hat\theta_a]'\,[R\,\widehat{\mathrm{var}}(\sqrt{N}\,\hat\theta_a)\,R']^{-1}\,R\sqrt{N}\,\hat\theta_a \;\overset{a}{\sim}\; \chi^2_k
\]

- The experiment is designed to generate monotone missing only.
- The DGP is designed to generate data with missing proportions of 0%, 25%, 50% and 75%.
- Differential missing across $d_{it}$ is induced by $w_i$ and the correlation between $s_{it}d_{it}$ and $c_i$ through $w_i$.
- The PLS estimator is inconsistent, while the FE/FD transformations eliminate $w_i$, so the FE/FD estimators are consistent.
- Table 1 shows that $\hat\alpha_{fe} - \hat\alpha_{pls} \approx 0.2$ when the missing proportion is 0.25, $\hat\alpha_{fe} - \hat\alpha_{pls} \approx 0.25$ when the missing proportion is 0.5, and $\hat\alpha_{fe} - \hat\alpha_{pls} \approx 0.22$ when the missing proportion is 0.75. The power of the test is close to 1 when n > 2,000 and the missing proportion > 0.25.
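A minimal simulation sketch of the test for a single treatment (k = 1) may help fix ideas. Everything specific here is an assumption made for illustration and not the study's actual DGP: the unit effect is set to $c_i = w_i$, the treatment $d_{it}$ is an i.i.d. Bernoulli(0.5) draw, attrition occurs when $w_i d_{it} + e_{it}$ exceeds a cutoff of 1, and T = 4. The function names `simulate_panel` and `robust_hausman` are hypothetical. The joint cluster-robust variance of the stacked estimator is built from per-unit influence terms, and the p-value uses the fact that a $\chi^2_1$ tail probability equals `erfc(sqrt(W/2))`.

```python
import math
import numpy as np

def simulate_panel(n, T=4, alpha=1.0, differential=True, seed=0):
    """Toy DGP with monotone missing (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(n, 1))                  # drives both c_i and attrition
    d = rng.integers(0, 2, size=(n, T)).astype(float)
    y = alpha * d + w + rng.normal(size=(n, T))  # c_i = w_i by assumption
    e = rng.normal(size=(n, T))
    latent = w * d + e if differential else e
    stay = latent < 1.0
    stay[:, 0] = True                            # everyone observed in period 1
    s = np.cumprod(stay, axis=1).astype(float)   # once gone, gone for good
    return y, d, s

def robust_hausman(y, d, s):
    """Compare pooled LS and FE coefficients on d with a Wald statistic."""
    # Pooled LS with intercept on the selected observations.
    X = np.stack([np.ones_like(d), d], axis=2)           # (n, T, 2)
    Xs = X * s[:, :, None]
    A1 = np.einsum("itk,itl->kl", Xs, X)
    theta_pls = np.linalg.solve(A1, np.einsum("itk,it->k", Xs, y))
    score1 = np.einsum("itk,it->ik", Xs, y - X @ theta_pls)
    psi1 = np.linalg.solve(A1, score1.T)[1]              # influence for d coef

    # FE: demean over each unit's observed periods.
    Ti = s.sum(axis=1, keepdims=True)
    dd = s * (d - (s * d).sum(axis=1, keepdims=True) / Ti)
    yd = s * (y - (s * y).sum(axis=1, keepdims=True) / Ti)
    A2 = (dd * dd).sum()
    alpha_fe = (dd * yd).sum() / A2
    psi2 = (dd * (yd - alpha_fe * dd)).sum(axis=1) / A2

    # Cluster-robust joint variance of (alpha_pls, alpha_fe), R = [1, -1].
    psi = np.column_stack([psi1, psi2])
    V = psi.T @ psi
    R = np.array([1.0, -1.0])
    diff = theta_pls[1] - alpha_fe
    W = diff ** 2 / (R @ V @ R)
    p_value = math.erfc(math.sqrt(W / 2.0))              # chi-square(1) tail
    return theta_pls[1], alpha_fe, W, p_value

y, d, s = simulate_panel(3000, differential=True, seed=1)
a_pls, a_fe, W, p_value = robust_hausman(y, d, s)
print(a_pls, a_fe, W, p_value)
```

Calling `simulate_panel(..., differential=False)` gives a design in which attrition is unrelated to the data, so the two estimators should agree up to sampling noise.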


Table 1: Pooled LS and FE estimates for α and p-value for Wa

a      n       True α   Pooled   FE      p-value   r ≡ 1(P<.05)   95-CI
0      100     1        1.003    1.002   0.502     0.054          (0.040, 0.068)
0      500     1        1.003    1.002   0.510     0.062          (0.047, 0.076)
0      1,000   1        0.996    0.998   0.496     0.047          (0.034, 0.060)
0      2,000   1        0.999    1.001   0.490     0.047          (0.034, 0.060)
0      5,000   1        0.998    1.000   0.510     0.048          (0.035, 0.061)
0.25   100     1        0.810    1.000   0.374     0.155          (0.132, 0.177)
0.25   500     1        0.812    1.000   0.132     0.533          (0.502, 0.564)
0.25   1,000   1        0.804    0.997   0.033     0.846          (0.823, 0.868)
0.25   2,000   1        0.810    1.001   0.002     0.990          (0.984, 0.996)
0.25   5,000   1        0.807    0.999   0.000     1              (1, 1)
0.5    100     1        0.758    1.002   0.348     0.182          (0.158, 0.206)
0.5    500     1        0.754    1.000   0.089     0.664          (0.635, 0.693)
0.5    1,000   1        0.747    0.999   0.014     0.936          (0.920, 0.951)
0.5    2,000   1        0.751    0.999   0.001     0.995          (0.991, 0.999)
0.5    5,000   1        0.750    1.000   0.000     1              (1, 1)
0.75   100     1        0.773    0.999   0.409     0.125          (0.104, 0.146)
0.75   500     1        0.777    1.001   0.225     0.400          (0.370, 0.430)
0.75   1,000   1        0.768    0.999   0.114     0.627          (0.597, 0.657)
0.75   2,000   1        0.774    0.997   0.056     0.822          (0.798, 0.846)
0.75   5,000   1        0.773    1.002   0.014     0.954          (0.941, 0.967)

*Note: 1(P<.05) is the rejection rate for the null of H0: α=1 with nominal value 0.05; a is the ratio of missing sample to full sample; cluster-robust standard errors have been used in all estimations; p-value is the mean of the p-values for Wa, r is the rejection rate, and 95-CI is the coverage for r.


Missing Pattern

- Complete cases
- Monotone missing (attrition in panel data)
- Non-monotone missing
  1. Individual units are in the sample but some variables are missing.
  2. Individual units leave the sample and reappear later.
- Distinguishing the two missing patterns (monotone vs non-monotone) is important since each pattern requires a different method if the MCAR assumption is not satisfied.
  1. Applying the IPW method to attrition units
  2. Using the MI method for non-monotone missing units


Methods under MAR when MCAR is violated

- We can estimate the missing variables, or the missing process, using observed variables as predictors.
  - Non-monotone missing: apply the MI method by estimating the missing variables, with observed variables as predictors, for units that are in the sample but have some variables missing (only part of the variables are missing).
  - Monotone missing: apply the IPW method by estimating the probability of selection for completers.

Implementation of multiple imputation (MI) for non-monotone missing data

\[
g(Z_{missing} \mid Z_{observed}, \xi); \qquad \xi \text{ parameters}
\]

- An appropriate imputation model $g(\cdot)$ with predictors from the outcome and covariates
- Multiple Bayesian posterior draws from this imputation model are used to reflect the uncertainty about the parameters.

\[
g(Z_m \mid Z_o, \xi) = g(Z_{1m}, Z_{2m}, Z_{3m}, Z_{4m} \mid Z_o, \xi) \tag{2}
\]

$Z_{missing}$ includes continuous ($Z_{1m}$), binary ($Z_{2m}$), categorical ($Z_{3m}$), and truncated ($Z_{4m}$) variables, and $Z_{1m}$ is the response variable.


 



Example: MI method under MAR

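Unit 5: $g(Z_{1m} \mid Z_{2m}, Z_{3m}, Z_{4m}, Z_o, \xi_1)$ = normal

Unit 12: $g(Z_{2m} \mid Z_{1m}, Z_{3m}, Z_{4m}, Z_o, \xi_2)$ = logistic

Unit 15:

\[
g(Z_{1m}, Z_{3m} \mid Z_o, \xi_2) = g(Z_{1m} \mid Z_o, \xi_{2a}) \cdot g(Z_{3m} \mid Z_o, \xi_{2b})
\]

where $g(Z_{1m} \mid Z_o, \xi_{2a})$ is normal and $g(Z_{3m} \mid Z_o, \xi_{2b})$ is Poisson.

Unit 16:

\[
g(Z_{1m}, Z_{3m}, Z_{4m} \mid Z_o, \xi_3) = g(Z_{1m} \mid Z_o, \xi_{3a}) \cdot g(Z_{3m} \mid Z_o, \xi_{3b}) \cdot g(Z_{4m} \mid Z_o, \xi_{3c})
\]

where $g(Z_{4m} \mid Z_o, \xi_{3c})$ is truncated normal.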


Implementation of IPW method

- Applying the IPW method for monotone missing
  - The probability of selection for completers is estimated using observed variables, and the inverse-probability-weighted completers are used in estimation.
- Estimating the probability of selection (i.e. of unit i remaining in the sample at time t) based on an economic model of the missing process (i.e. a model of why unit i remains in the sample while unit j leaves):

\[
P_{it} = P(s_{it} = 1 \mid Z_{o,it}, \xi)
\]

- IPW eliminates bias by giving higher weights to the remaining units that had a very low probability of remaining in the sample.
- The choice of predictors $Z_{o,it}$ for the probability of selection is critical for the MAR assumption to hold.
  - We use an economic model to select the predictors $Z_{o,it}$ if possible.
- A correct model for $p(s_{it} = 1 \mid Z_{o,it})$ is also important.
  - In the application we use a logit model, but nonparametric estimation is also possible.


IPW Estimation Under MAR

- Suppose we have an estimating equation for complete-case data

\[
0 = \sum_{i=1}^{N}\sum_{t=1}^{T} q_{it}(y_{it}, x_{it}; \hat\theta), \qquad \text{where } W_{it} = (y_{it}, x_{it}),
\]

and we know (or can estimate) the probability $p_{it}$ of observing a complete unit i at t. Then we can estimate $\theta$ using (3) under the MAR assumption (4).

\[
0 = \sum_{i=1}^{N}\sum_{t=1}^{T} \frac{s_{it}\,q_{it}(y_{it}, x_{it}; \hat\theta)}{p_{it}} \tag{3}
\]

MAR assumption:

\[
p(s_{it} = 1 \mid W_{it}, Z_{o,it}, \xi) = p(s_{it} = 1 \mid Z_{o,it}, \xi) = p(Z_{o,it}) \equiv p_{it} \tag{4}
\]


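Consistency of IPW: plim of the LHS of (3)

\[
\begin{aligned}
\sum_{t=1}^{T} E\!\left[\frac{s_{it}\,q_{it}(y_{it}, x_{it}; \theta)}{p_{it}}\right]
&= \sum_{t=1}^{T} E\!\left[E\!\left(\frac{s_{it}\,q_{it}(y_{it}, x_{it}; \theta)}{p_{it}} \,\Big|\, W_{it}, Z_{o,it}\right)\right] \\
&= \sum_{t=1}^{T} E\!\left[\frac{E(s_{it} \mid W_{it}, Z_{o,it})\,q_{it}(y_{it}, x_{it}; \theta)}{p_{it}}\right] \\
&= \sum_{t=1}^{T} E\!\left[\frac{p(s_{it} = 1 \mid W_{it}, Z_{o,it})\,q_{it}(y_{it}, x_{it}; \theta)}{p_{it}}\right] \\
&\underset{\text{MAR}}{=} \sum_{t=1}^{T} E\!\left[\frac{p(s_{it} = 1 \mid Z_{o,it})\,q_{it}(y_{it}, x_{it}; \theta)}{p_{it}}\right] \\
&= \sum_{t=1}^{T} E\!\left[\frac{p_{it}\,q_{it}(y_{it}, x_{it}; \theta)}{p_{it}}\right] \\
&= \sum_{t=1}^{T} E\big(q_{it}(y_{it}, x_{it}; \theta)\big)
\end{aligned}
\]

The last expression is the population moment of the complete-data estimating equation, so the weighted equation identifies the same $\theta$.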
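The identity can be checked numerically in the simplest case, where the estimating function is $q = y - \theta$ (estimating a mean) and there is a single period. The design below is purely an illustrative assumption: a binary observed predictor `z`, a selection probability of 0.3 for `z = 1` and 0.9 for `z = 0`, and an outcome whose mean depends on `z`. Selection depends only on `z`, so MAR holds, and the true mean of `y` is 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

z = rng.integers(0, 2, size=n)             # observed predictor Z_o
y = 1.0 + 2.0 * z + rng.normal(size=n)     # outcome, E[y] = 2
p = np.where(z == 1, 0.3, 0.9)             # P(s = 1 | z), known here
s = rng.random(n) < p                      # selection indicator

# Complete-case estimate: solves sum_i s_i (y_i - theta) = 0.
theta_cc = y[s].mean()

# IPW estimate: solves sum_i s_i (y_i - theta) / p_i = 0.
theta_ipw = np.sum(s * y / p) / np.sum(s / p)

print(theta_cc, theta_ipw)
```

Units with `z = 1` are under-represented among the completers, so the complete-case mean is pulled toward the `z = 0` group, while the weighted equation recovers the full-sample mean.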


Important practical issues for IPW method

- $p_{it}$ could be very close to zero.
- The sensitivity of the IPW method to misspecification of the probability of selection.
  - True missing (selection) model:

\[
s_{i1} = 1; \qquad s_{it} = 1(f_i + c_{2it} + \delta\,y_{it-1} \cdot c_{2it} + \varepsilon_{it} > a), \quad t = 2, 3, \dots, T
\]

where $a$ is the cutoff, $\varepsilon_{it} \sim N(0, 1)$, and the degree of misspecification is controlled by $\delta$. As $\delta$ increases from 0.2 to 1, the degree of misspecification increases.

  - Estimation model for the probability of non-missing:

\[
\pi_{it} = p(s_{it} = 1 \mid Z_{obs}, s_{it-1} = 1, \gamma) = \frac{\exp(f_i\gamma_1 + c_{2it}\gamma_2)}{1 + \exp(f_i\gamma_1 + c_{2it}\gamma_2)}, \quad t = 2, 3, \dots, T
\]

The estimation model is misspecified since its predictors omit the interaction term $y_{it-1} \cdot c_{2it}$.


Table 14: Mis-specified model for probability of selection: MTE for unweighted LS, FE and FD estimators, δ=0.5

                            Unweighted LS         FE                    FD
n       missing fraction    Mean(α̂)   SD(α̂)     Mean(α̂)   SD(α̂)     Mean(α̂)   SD(α̂)
100     0.75                1.416     .215      .936      .142      .916      .157
100     0.5                 1.303     .147      .971      .094      .962      .110
100     0.25                1.292     .143      .980      .089      .976      .105
200     0.75                1.424     .146      .936      .097      .918      .110
200     0.5                 1.307     .101      .974      .064      .967      .074
200     0.25                1.288     .097      .982      .060      .977      .070
500     0.75                1.424     .098      .939      .062      .922      .070
500     0.5                 1.306     .071      .974      .042      .965      .047
500     0.25                1.289     .068      .982      .039      .976      .045
1,000   0.75                1.423     .067      .941      .042      .924      .048
1,000   0.5                 1.305     .048      .976      .028      .968      .033
1,000   0.25                1.287     .046      .983      .027      .978      .031

Table 15: Mis-specified model for probability of selection: MTE estimates for IPW LS and FD estimators, δ=0.5

                            IPW LS                                                  IPW FD
n       missing fraction    Mean(α̂)   SD(α̂)   r ≡ 1(p<.05)   coverage of r      Mean(α̂)   SD(α̂)   r ≡ 1(p<.05)   coverage of r
100     0.75                1.379     .259    .400           (.370, .431)       .914      .178    .142           (.120, .164)
100     0.5                 1.297     .154    .505           (.472, .539)       .962      .112    .120           (.098, .141)
100     0.25                1.290     .145    .495           (.454, .535)       .977      .104    .114           (.088, .140)
200     0.75                1.389     .177    .656           (.627, .685)       .910      .124    .174           (.150, .198)
200     0.5                 1.300     .105    .798           (.773, .823)       .968      .075    .109           (.090, .129)
200     0.25                1.285     .099    .799           (.773, .825)       .978      .071    .099           (.080, .119)
500     0.75                1.385     .113    .942           (.927, .957)       .918      .077    .257           (.230, .284)
500     0.5                 1.299     .073    .991           (.985, .997)       .967      .048    .169           (.146, .192)
500     0.25                1.286     .069    .992           (.986, .998)       .977      .045    .138           (.117, .160)
1,000   0.75                1.384     .079    .995           (.991, .999)       .920      .053    .405           (.375, .435)
1,000   0.5                 1.297     .049    1              (1, 1)             .969      .033    .215           (.189, .241)
1,000   0.25                1.284     .046    1              (1, 1)             .979      .031    .161           (.138, .184)

*Note: 1(P<.05) is the rejection rate for the null of H0: α=1 with nominal value 0.05; cluster-robust standard errors have been used in all estimations.


The sensitivity of IPW to misspecification of probability of non-missing

- In the simulation, for δ up to 0.5, the bias of the IPW estimators (both PLS and FD) is less than the bias of the unweighted estimators.
- As the magnitude of misspecification increases toward δ = 1, the bias of the IPW estimators (both PLS and FD) is greater than the bias of the unweighted estimators.
- For monotone missing (attrition in panel data), IPW works well if the misspecification is not severe.
- If practitioners have enough information (predictors and specification for the selection probability model), the IPW method should work quite well.


Combining MI and IPW methods

- Typically, panel data are composed of complete cases, monotone missing, and non-monotone missing.
- Combining MI and IPW methods: the IPW method eliminates inconsistency and the MI method enhances efficiency.
  1. Impute missing variables for non-monotone missing units.
  2. Using the imputed units and completers, estimate the probability of selection. Estimate the parameters by the IPW method. Denote the estimate of $\theta$ and its covariance matrix by $\hat\theta_j$ and $\hat V_j$.
  3. Iterate steps 1 and 2 M times. Using Rubin's formula (1987), obtain the MI estimates of $\theta$ and its covariance matrix:

\[
\hat\theta = \frac{1}{M}\sum_{j=1}^{M}\hat\theta_j, \qquad
\hat V = \frac{1}{M}\sum_{j=1}^{M}\hat V_j + \Big(\frac{M+1}{M}\Big)B, \qquad
B = \frac{1}{M-1}\sum_{j=1}^{M}(\hat\theta_j - \hat\theta)(\hat\theta_j - \hat\theta)'.
\]
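The combining step in item 3 is mechanical once the M per-imputation estimates are available. Below is a minimal sketch of that step only; the function name `rubin_combine` and the toy inputs are illustrative, and the imputation and IPW steps that would produce $\hat\theta_j$ and $\hat V_j$ are not shown.

```python
import numpy as np

def rubin_combine(thetas, covs):
    """Combine M estimates and covariance matrices with Rubin's formula.

    thetas: array of shape (M, k), one estimate per imputed data set.
    covs:   array of shape (M, k, k), the matching covariance matrices.
    """
    thetas = np.asarray(thetas, dtype=float)
    covs = np.asarray(covs, dtype=float)
    M = thetas.shape[0]
    theta_bar = thetas.mean(axis=0)          # MI point estimate
    within = covs.mean(axis=0)               # average within-imputation variance
    dev = thetas - theta_bar
    between = dev.T @ dev / (M - 1)          # between-imputation variance B
    total = within + (M + 1) / M * between
    return theta_bar, total

# Toy input: M = 3 imputations of a scalar parameter.
theta_hat, V_hat = rubin_combine(
    thetas=[[1.0], [2.0], [3.0]],
    covs=[[[0.5]], [[0.5]], [[0.5]]],
)
print(theta_hat, V_hat)
```

In the toy input the between-imputation variance is 1 and the within variance is 0.5, so the combined variance is 0.5 + (4/3)·1.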


Empirical Application: Project STAR Educational Experiment

- Unbalanced data of Project STAR: investigation of the effect of class size reduction on educational outcomes for grades K-3.
- Randomized experiment: students and teachers were randomly assigned to three different types of class.
  - Small class (13-17 students, treatment)
  - Regular class with aide (22-25 students, another treatment)
  - Regular class (22-25 students, control)
  - The average class size in Tennessee was 22.3 in 1985-1986.
- Participants: 6,800 kindergarten students from 1,340 classes in 79 Tennessee public schools for grades K-3
- Yearly data collected on: students' SAT scores, and student, teacher, class, and school characteristics
- Conclusions from previous study: Krueger (1999) used regression analysis

\[
Score_{it} = Small\_Class_{it} \cdot \alpha_1 + Aide\_Class_{it} \cdot \alpha_2 + controls + u_{it}
\]

- A strong and lasting effect of small class ($\hat\alpha_1$, 5.5 to 7%), no significant effect of regular class with aide, and a stronger effect for minority and poor students are reported in Word et al. (1990) and Krueger (1999).


Implementation problem (attrition problem)

- Missing is severe.
  - At each grade, some students left and new students were added. Thus, about 6,000 to 6,500 students remain in the sample for each grade.
  - About 48% of the students who participated in the program left the program or had partial missing.
- The majority of missing is by attrition; non-monotone missing is less than 10% (i.e. 52% completers, 40% monotone missing, 8% non-monotone missing).
- Some evidence of possible differential missing across class types
  - Formal robust Hausman tests reject the null of MCAR.


Implementation of IPW method

- The current study focuses on complete cases and missing due to attrition when applying the IPW method.
- When implementing the IPW method, the MAR assumption and a correctly specified selection probability are very important.
- Economic model to select predictors $Z_{o,it}$
  - Objective function: parents maximize utility directly for two periods (their own and their child's) and indirectly for infinite periods.
  - Optimal condition: the marginal cost (of the child's human capital investment) equals the marginal benefit (of the child's enhanced productivity from human capital investment).

\[
\underbrace{u'(c_t)}_{\text{marginal cost}} = \underbrace{\rho \cdot \underbrace{E_t\big[a(h_{2i}) \cdot u'(c_{t+1}) \mid I_t\big]}_{\text{expected increase in child's production}}}_{\text{marginal benefit}}
\]

where $I_t$ is the new information available at t.

  - Parents can improve the child's expected productivity by school choice (public vs private school).
  - $I_t$ is the information available to parents at the end of grade t. This includes the child's SAT score, teacher characteristics, and school and peer characteristics. This information is used as predictors of the probability of selection.


Probability Weight Estimation

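- Estimation of $p_{it}$: since the missing is monotone, $s_{it} = 1$ implies $s_{it-j} = 1$ for all $j > 0$.

\[
p(s_{it} = 1 \mid Z_{o,it}) = p(s_{it} = 1, s_{it-1} = 1, \dots, s_{i1} = 1 \mid Z_{o,it}) \equiv p_{it}
\]

\[
\begin{aligned}
p(s_{it} = 1 \mid Z_{o,it})
&= p(s_{it} = 1 \mid Z_{o,it}, s_{it-1} = 1) \cdot p(s_{it-1} = 1 \mid Z_{o,it-1}, s_{it-2} = 1) \cdots p(s_{i2} = 1 \mid Z_{o,i2}, s_{i1} = 1) \\
&= \pi_{it} \cdots \pi_{i2}\,\pi_{i1}
\end{aligned}
\]

where $\pi_{it} = p(s_{it} = 1 \mid Z_{o,it}, s_{it-1} = 1)$.

- We estimate $\pi_{it} = p(s_{it} = 1 \mid Z_{o,it}, s_{it-1} = 1)$ using a logit model

\[
\pi_{it} = \frac{\exp(Z_{o,it}\gamma + v_{it})}{1 + \exp(Z_{o,it}\gamma + v_{it})}, \quad \text{for } s_{it-1} = 1
\]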
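Given fitted conditional probabilities $\hat\pi_{it}$, the selection probability $\hat p_{it}$ is their running product over time. The sketch below shows only that step and assumes the fitted values are already available (the logit fit itself is not shown). It also assumes $\pi_{i1} = 1$, i.e. every unit is observed in the first period. The function name and the numbers are illustrative.

```python
import numpy as np

def selection_probabilities(pi):
    """Running product pi_i1 * pi_i2 * ... * pi_it for each unit and period.

    pi[i, t] is the conditional probability that unit i is observed in
    period t given that it was observed in period t-1. The first column
    is assumed to equal 1 (everyone is observed in period 1).
    """
    pi = np.asarray(pi, dtype=float)
    return np.cumprod(pi, axis=1)

# Hypothetical fitted values for two units over three periods.
pi_hat = np.array([[1.0, 0.8, 0.5],
                   [1.0, 0.9, 0.9]])
p_hat = selection_probabilities(pi_hat)
ipw_weights = 1.0 / p_hat
print(p_hat)
print(ipw_weights)
```

Because the probabilities multiply, weights grow quickly in later periods when retention is low, which is the practical concern about $p_{it}$ close to zero.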


IPW estimators

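- Using the estimated probability weights, we estimate PLS and FD as follows:

\[
\hat\theta_{IPW\text{-}pls} = \Big(\sum_{i=1}^{N}\sum_{t=1}^{T} \frac{s_{it}\,x_{it}'x_{it}}{\hat p_{it}}\Big)^{-1} \Big(\sum_{i=1}^{N}\sum_{t=1}^{T} \frac{s_{it}\,x_{it}'y_{it}}{\hat p_{it}}\Big)
\]

\[
\hat\theta_{IPW\text{-}FD} = \Big(\sum_{i=1}^{N}\sum_{t=2}^{T} \frac{s^{FD}_{it}\,\Delta x_{it}'\Delta x_{it}}{\hat p_{it}}\Big)^{-1} \Big(\sum_{i=1}^{N}\sum_{t=2}^{T} \frac{s^{FD}_{it}\,\Delta x_{it}'\Delta y_{it}}{\hat p_{it}}\Big)
\]

- The estimated effect of class size reduction (CSR) on students' scores for grades K-3 is about 4.5 to 6% with the IPW estimator.
- The estimated effect with the unweighted complete-case estimator is about 5.5 to 7%.
- Estimators that ignore missing data overestimate the effect of CSR on students' scores by about 1 to 2 percentage points.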
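The IPW pooled LS formula above is a weighted least squares problem with weights $s_{it}/\hat p_{it}$. A minimal sketch follows, assuming the observations have been stacked into one row per unit-period; the function name `ipw_pooled_ls` and the toy data are illustrative and unrelated to the Project STAR data.

```python
import numpy as np

def ipw_pooled_ls(X, y, s, p_hat):
    """Solve (sum s x'x / p) theta = (sum s x'y / p) on stacked data.

    X: (n_obs, k) regressors, y: (n_obs,) outcome,
    s: (n_obs,) selection indicator, p_hat: (n_obs,) selection probability.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    s = np.asarray(s)
    p_hat = np.asarray(p_hat, dtype=float)
    # Weight is 1/p for selected rows and 0 for unselected rows.
    w = np.zeros_like(p_hat)
    np.divide(1.0, p_hat, out=w, where=(s == 1))
    XtWX = X.T @ (X * w[:, None])
    XtWy = X.T @ (w * y)
    return np.linalg.solve(XtWX, XtWy)

# Toy check: with a noise-free outcome, any positive weights recover theta.
x = np.arange(6, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = 2.0 + 3.0 * x
s = np.array([1, 1, 0, 1, 1, 0])
p_hat = np.array([1.0, 0.5, 0.5, 0.25, 0.8, 0.1])
theta_ipw = ipw_pooled_ls(X, y, s, p_hat)
print(theta_ipw)
```

The FD version has the same structure, with first differences in place of levels and $s^{FD}_{it}$ in place of $s_{it}$.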


Extension and Further Applications

- Combining MI and IPW methods: combine the completers sample and the imputed non-monotone missing sample. Estimate probability weights for this combined sample. Apply the IPW method and obtain estimates. Iterate M times.
- The test of MCAR and the IPW and MI methods can also be applied to observational data with a valid IV or control function approach.
- The IPW method can be applied to the estimating equation of quantile regression (e.g. the effect of kindergarten entering age on SAT score).
- Many previous empirical studies use panel survey data (PSID, SIPP, NLSY) with a substantial missing proportion; the methods introduced in this study can be applied to studies using these panel data.