bootstrap for panel data (ppt), hounkannounon


Bootstrap for Panel Data Models with an Application to the Evaluation of Public Policies

Bertrand G. B. Hounkannounon, Université de Montréal

Ph.D. defense


ACKNOWLEDGMENT


THESIS

The purpose of this thesis is to develop bootstrap methods for panel data models, to prove their validity, and to apply them to the evaluation of public policies.

Chapter 1 : Double resampling bootstrap for the mean of a panel

Chapter 2 : Bootstrap for panel regression models with random effects

Chapter 3 : Bootstrapping Differences-in-Differences Estimates


Chapter 1 : Double resampling bootstrap for the mean of a panel

The theoretical results and simulations are provided for the sample mean.


Panel Data

Panel data refers to data sets where observations on individual units (such as households, firms or countries) are available over several time periods.

The availability of two dimensions (cross section and time series) allows for the identification of effects that could not be accounted for otherwise.

$$
Y = \begin{pmatrix}
y_{11} & y_{12} & \cdots & y_{1T} \\
y_{21} & y_{22} & \cdots & y_{2T} \\
\vdots & \vdots & \ddots & \vdots \\
y_{N1} & y_{N2} & \cdots & y_{NT}
\end{pmatrix}
$$

$y_{it}$ is the observation of cross-sectional unit $i$ at period $t$.


Bootstrap Methods

Why do statisticians and econometricians use the bootstrap?

The true probability distribution of a test statistic is rarely known in finite samples.

Avoiding asymptotic fiction : asymptotic theory uses the behavior of the statistic at infinity as an approximation. Bootstrap methods can provide more accurate inference.

Possibility of imposing only weak structural assumptions. Simulation of nuisance parameters.

Multiple asymptotic distributions in large panels (N and T both matter) : multiple asymptotic fictions.


Bootstrap Methods

The method consists in drawing many random samples that resemble the original sample as much as possible, and estimating the distribution of the object of interest over these random samples.

Resample the original data, to create pseudo data.

Use estimations on these pseudo data to make inference.


Resampling Methods for Panel Data

How to bootstrap panel data ?



Cross-sectional Resampling

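Cross-sectional Resampling : Resample cross-sectional units. Application of the original i.i.d. bootstrap in the cross-section dimension.

$$
Y^{*}_{cros} = \begin{pmatrix}
y_{i_1 1} & y_{i_1 2} & \cdots & y_{i_1 T} \\
y_{i_2 1} & y_{i_2 2} & \cdots & y_{i_2 T} \\
\vdots & \vdots & \ddots & \vdots \\
y_{i_N 1} & y_{i_N 2} & \cdots & y_{i_N T}
\end{pmatrix}
$$

The indices $(i_1, i_2, \ldots, i_N)$ are obtained by i.i.d. drawing with replacement from $(1, 2, \ldots, N)$.

A statistical unit can appear 0, 1, 2, 3, ... times in a pseudo data set.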
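
The cross-sectional draw can be sketched in a few lines of NumPy. This is only an illustration, not the thesis code: the function name `cross_sectional_resample`, the toy panel and the seed are assumptions made for the example.

```python
import numpy as np

def cross_sectional_resample(y, rng):
    """Draw N unit indices i.i.d. with replacement and return the pseudo panel.

    y is an (N, T) array; whole rows (units) are resampled, so the time
    ordering inside each unit is left untouched.
    """
    n_units = y.shape[0]
    idx = rng.integers(0, n_units, size=n_units)  # (i_1, ..., i_N), 0-based
    return y[idx, :], idx

rng = np.random.default_rng(0)           # seed chosen only for reproducibility
y = rng.normal(size=(5, 4))              # hypothetical panel: N = 5, T = 4
y_star, idx = cross_sectional_resample(y, rng)
counts = np.bincount(idx, minlength=5)   # times each unit appears (0, 1, 2, ...)
```

`counts` makes the last remark concrete: some units are drawn several times and others not at all, while the counts always sum to N.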






Block Bootstrap Resampling

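Block Bootstrap Resampling : Adaptation of the traditional time series block bootstrap. Resample blocks of time periods in order to capture temporal dependence.

$$
\underset{(N,T)}{Y^{*}_{bl}} = \begin{pmatrix}
y^{*}_{11} = y_{1 t_1} & y^{*}_{12} = y_{1 t_2} & \cdots & y^{*}_{1T} = y_{1 t_T} \\
y^{*}_{21} = y_{2 t_1} & y^{*}_{22} = y_{2 t_2} & \cdots & y^{*}_{2T} = y_{2 t_T} \\
\vdots & \vdots & \ddots & \vdots \\
y^{*}_{N1} = y_{N t_1} & y^{*}_{N2} = y_{N t_2} & \cdots & y^{*}_{NT} = y_{N t_T}
\end{pmatrix}
$$

with $(t_1, t_2, \ldots, t_T)$ taking the form :

$$
\underbrace{\tau_1, \tau_1 + 1, \ldots, \tau_1 + l - 1}_{\text{block } 1},\;
\underbrace{\tau_2, \tau_2 + 1, \ldots, \tau_2 + l - 1}_{\text{block } 2},\; \ldots,\;
\underbrace{\tau_K, \tau_K + 1, \ldots, \tau_K + l - 1}_{\text{block } K}
$$

where the vector of indices $(\tau_1, \tau_2, \ldots, \tau_K)$, $K = [T/l]$, is obtained by i.i.d. drawing with replacement from $(1, 2, \ldots, T)$.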
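
A minimal sketch of the block draw in NumPy follows. It is an illustration under stated assumptions, not the thesis implementation: block starts are drawn from all T periods as on the slide, and blocks that run past period T are assumed to wrap around to the start of the sample (a circular scheme). The function names are hypothetical.

```python
import numpy as np

def block_resample_indices(n_periods, block_len, rng):
    """Return time indices made of K = T // l blocks of l consecutive periods."""
    n_blocks = n_periods // block_len                    # K = [T / l]
    starts = rng.integers(0, n_periods, size=n_blocks)   # block starts, 0-based
    offsets = np.arange(block_len)
    # Assumption: a block that runs past the last period wraps around.
    return ((starts[:, None] + offsets[None, :]) % n_periods).ravel()

def block_resample(y, block_len, rng):
    """Apply the same resampled time indices to every unit of the (N, T) panel."""
    t_idx = block_resample_indices(y.shape[1], block_len, rng)
    return y[:, t_idx], t_idx

rng = np.random.default_rng(0)                 # seed only for reproducibility
y = np.arange(18, dtype=float).reshape(3, 6)   # hypothetical panel: N = 3, T = 6
y_star, t_idx = block_resample(y, block_len=2, rng=rng)
```

All units share the same time indices, so cross-sectional dependence within a period is preserved. When l does not divide T, this sketch returns K*l columns rather than T.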


Double Resampling Bootstrap

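Double Resampling Bootstrap : Combination of block and cross-sectional resampling.

$$
Y^{**} = \begin{pmatrix}
y^{**}_{11} = y_{i_1 t_1} & y^{**}_{12} = y_{i_1 t_2} & \cdots & y^{**}_{1T} = y_{i_1 t_T} \\
y^{**}_{21} = y_{i_2 t_1} & y^{**}_{22} = y_{i_2 t_2} & \cdots & y^{**}_{2T} = y_{i_2 t_T} \\
\vdots & \vdots & \ddots & \vdots \\
y^{**}_{N1} = y_{i_N t_1} & y^{**}_{N2} = y_{i_N t_2} & \cdots & y^{**}_{NT} = y_{i_N t_T}
\end{pmatrix}
$$

where the indices $(i_1, i_2, \ldots, i_N)$ and $(t_1, t_2, \ldots, t_T)$ are chosen as described previously.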
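
The combination of the two draws can be sketched as below. This is a hedged illustration rather than the author's code: the names `double_resample` and `bootstrap_means` are hypothetical, and blocks are again assumed to wrap around the end of the sample.

```python
import numpy as np

def double_resample(y, block_len, rng):
    """Resample units i.i.d. and time periods by blocks, independently."""
    n_units, n_periods = y.shape
    i_idx = rng.integers(0, n_units, size=n_units)       # (i_1, ..., i_N)
    n_blocks = n_periods // block_len                    # K = [T / l]
    starts = rng.integers(0, n_periods, size=n_blocks)
    # Assumption: circular blocks, indices wrap past the last period.
    t_idx = ((starts[:, None] + np.arange(block_len)[None, :]) % n_periods).ravel()
    # np.ix_ builds the N x (K*l) grid of (unit, period) pairs.
    return y[np.ix_(i_idx, t_idx)], i_idx, t_idx

def bootstrap_means(y, block_len, n_boot, rng):
    """Sample mean of each of n_boot double-resampled pseudo panels."""
    return np.array([double_resample(y, block_len, rng)[0].mean()
                     for _ in range(n_boot)])

rng = np.random.default_rng(0)       # seed only for reproducibility
y = rng.normal(size=(4, 6))          # hypothetical panel: N = 4, T = 6
y_star, i_idx, t_idx = double_resample(y, block_len=2, rng=rng)
means = bootstrap_means(y, block_len=2, n_boot=50, rng=rng)
```

The spread of `means` around the original sample mean is what the bootstrap uses to approximate the sampling distribution of the panel mean.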


Double Resampling Bootstrap Variance

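$$
\mathrm{Var}^{*}\left(\bar{y}^{**}\right) = \mathrm{Var}^{*}(z) + \left(1 - \frac{1}{K}\right) \mathrm{Var}^{*}\left(\bar{y}^{*}_{cros}\right) + \left(1 - \frac{1}{N}\right) \mathrm{Var}^{*}\left(\bar{y}^{*}_{bl}\right)
$$

Finite sample property, holding without any assumption about $y_{it}$.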


Double Resampling Bootstrap Variance

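$$
\mathrm{Var}^{*}\left(\bar{y}^{**}\right) > \left(1 - \frac{1}{K}\right) \mathrm{Var}^{*}\left(\bar{y}^{*}_{cros}\right)
$$

$$
\mathrm{Var}^{*}\left(\bar{y}^{**}\right) > \left(1 - \frac{1}{N}\right) \mathrm{Var}^{*}\left(\bar{y}^{*}_{bl}\right)
$$

The two inequalities mean that the double resampling bootstrap induces a greater variance than the cross-sectional resampling bootstrap and the block resampling bootstrap.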
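
The two inequalities can be checked numerically on a toy panel. The sketch below is an illustration under stated assumptions, not the thesis code: block length l = 1 (so K = T), a hypothetical 3 x 3 panel, and exact bootstrap variances obtained by enumerating every equally likely index draw instead of simulating.

```python
import itertools
import numpy as np

def exact_bootstrap_variances(y):
    """Exact variances of the bootstrap mean under the three schemes (l = 1)."""
    N, T = y.shape
    rows = list(itertools.product(range(N), repeat=N))   # all N**N unit draws
    cols = list(itertools.product(range(T), repeat=T))   # all T**T period draws
    cros = np.array([y[list(r), :].mean() for r in rows])
    bl = np.array([y[:, list(c)].mean() for c in cols])
    dbl = np.array([y[np.ix_(r, c)].mean() for r in rows for c in cols])
    # Every draw is equally likely, so the population variance (ddof=0)
    # over the enumerated means is the exact bootstrap variance.
    return cros.var(), bl.var(), dbl.var()

y = np.random.default_rng(0).normal(size=(3, 3))   # hypothetical toy panel
N, K = y.shape                                     # K = T because l = 1
v_cros, v_bl, v_dbl = exact_bootstrap_variances(y)
holds_cros = v_dbl > (1 - 1 / K) * v_cros
holds_bl = v_dbl > (1 - 1 / N) * v_bl
```

Enumeration is only feasible for very small panels (N**N times T**T draws), which is why a 3 x 3 example is used here.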


Interpretation

For N and K = T/l large enough :

$$
CI^{cros}_{1-\alpha} \subseteq CI^{**}_{1-\alpha}, \qquad CI^{bl}_{1-\alpha} \subseteq CI^{**}_{1-\alpha}
$$

If the Double Resampling Bootstrap (DRB) CI rejects the null hypothesis, the one-dimension bootstrap CIs necessarily reject it as well.

One-dimension bootstrap methods can reject the null hypothesis while the DRB CI does not reject it.

The Double Resampling Bootstrap dominates the resampling methods in one dimension, in the sense that it is valid for more processes.


Panel Data Models

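IID panel : $y_{it} = \theta + \varepsilon_{it}$

Cross. one-way ECM : $y_{it} = \theta + \mu_i + \varepsilon_{it}$

Temp. one-way ECM : $y_{it} = \theta + f_t + \varepsilon_{it}$

Two-way ECM : $y_{it} = \theta + \mu_i + f_t + \varepsilon_{it}$

Factor model : $y_{it} = \theta + \lambda_i F_t + \varepsilon_{it}$ and $y_{it} = \theta + \mu_i + \lambda_i F_t + \varepsilon_{it}$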


Consistency

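A bootstrap method is consistent if :

$$
\sup_{x \in \mathbb{R}} \left| P^{*}\left( \sqrt{M} \left( \bar{y}^{*} - \bar{y} \right) \le x \right) - P\left( \sqrt{M} \left( \bar{y} - \theta \right) \le x \right) \right| \xrightarrow[NT \to \infty]{P} 0
$$

with $M \in \{N, T, NT\}$.

Intuition : The behavior of $(\bar{y}^{*} - \bar{y})$ is similar to the behavior of $(\bar{y} - \theta)$ when the sample size increases.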


Consistency

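$$
Y = \theta +
\begin{pmatrix} \mu_1 & \cdots & \mu_1 \\ \mu_2 & \cdots & \mu_2 \\ \vdots & & \vdots \\ \mu_N & \cdots & \mu_N \end{pmatrix}
+ \begin{pmatrix} f_1 & \cdots & f_T \\ f_1 & \cdots & f_T \\ \vdots & & \vdots \\ f_1 & \cdots & f_T \end{pmatrix}
+ \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_N \end{pmatrix}
\begin{pmatrix} F_1 & \cdots & F_T \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{11} & \cdots & \varepsilon_{1T} \\ \varepsilon_{21} & \cdots & \varepsilon_{2T} \\ \vdots & & \vdots \\ \varepsilon_{N1} & \cdots & \varepsilon_{NT} \end{pmatrix}
$$

The cross-sectional resampling is equivalent to i.i.d. resampling on $(\mu_1, \ldots, \mu_N)$ and $(\lambda_1, \ldots, \lambda_N)$, and treats $(f_1, \ldots, f_T)$ and $(F_1, \ldots, F_T)$ as constants.

$$
y^{*}_{it,cros} = \theta + \mu^{*}_i + f_t + \lambda^{*}_i F_t + \varepsilon^{*}_{it,cros}
$$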


Consistency

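$$
Y = \theta +
\begin{pmatrix} \mu_1 & \cdots & \mu_1 \\ \mu_2 & \cdots & \mu_2 \\ \vdots & & \vdots \\ \mu_N & \cdots & \mu_N \end{pmatrix}
+ \begin{pmatrix} f_1 & \cdots & f_T \\ f_1 & \cdots & f_T \\ \vdots & & \vdots \\ f_1 & \cdots & f_T \end{pmatrix}
+ \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_N \end{pmatrix}
\begin{pmatrix} F_1 & \cdots & F_T \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{11} & \cdots & \varepsilon_{1T} \\ \varepsilon_{21} & \cdots & \varepsilon_{2T} \\ \vdots & & \vdots \\ \varepsilon_{N1} & \cdots & \varepsilon_{NT} \end{pmatrix}
$$

The block resampling is equivalent to block resampling on $(f_1, \ldots, f_T)$ and $(F_1, \ldots, F_T)$, and treats $(\mu_1, \ldots, \mu_N)$ and $(\lambda_1, \ldots, \lambda_N)$ as constants.

$$
y^{*}_{it,bl} = \theta + \mu_i + f^{*}_{t,bl} + \lambda_i F^{*}_{t,bl} + \varepsilon^{*}_{it,bl}
$$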


Consistency

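$$
Y = \theta +
\begin{pmatrix} \mu_1 & \cdots & \mu_1 \\ \mu_2 & \cdots & \mu_2 \\ \vdots & & \vdots \\ \mu_N & \cdots & \mu_N \end{pmatrix}
+ \begin{pmatrix} f_1 & \cdots & f_T \\ f_1 & \cdots & f_T \\ \vdots & & \vdots \\ f_1 & \cdots & f_T \end{pmatrix}
+ \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_N \end{pmatrix}
\begin{pmatrix} F_1 & \cdots & F_T \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{11} & \cdots & \varepsilon_{1T} \\ \varepsilon_{21} & \cdots & \varepsilon_{2T} \\ \vdots & & \vdots \\ \varepsilon_{N1} & \cdots & \varepsilon_{NT} \end{pmatrix}
$$

The double resampling is equivalent to i.i.d. resampling on $(\mu_1, \ldots, \mu_N)$ and $(\lambda_1, \ldots, \lambda_N)$, and block resampling on $(f_1, \ldots, f_T)$ and $(F_1, \ldots, F_T)$.

$$
y^{**}_{it} = \theta + \mu^{*}_i + f^{*}_{t,bl} + \lambda^{*}_i F^{*}_{t,bl} + \varepsilon^{**}_{it}
$$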


Consistency

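$$
y^{*}_{it,cros} = \theta + \mu^{*}_i + f_t + \lambda^{*}_i F_t + \varepsilon^{*}_{it,cros}
$$

$$
y^{*}_{it,bl} = \theta + \mu_i + f^{*}_{t,bl} + \lambda_i F^{*}_{t,bl} + \varepsilon^{*}_{it,bl}
$$

$$
y^{**}_{it} = \theta + \mu^{*}_i + f^{*}_{t,bl} + \lambda^{*}_i F^{*}_{t,bl} + \varepsilon^{**}_{it}
$$

Averaging over $i$ and $t$ and subtracting the sample mean $\bar{y}$ :

$$
\left( \bar{y}^{*}_{cros} - \bar{y} \right) = \left( \bar{\mu}^{*} - \bar{\mu} \right) + \left( \bar{\lambda}^{*} \bar{F} - \bar{\lambda} \bar{F} \right) + \left( \bar{\varepsilon}^{*}_{cros} - \bar{\varepsilon} \right)
$$

$$
\left( \bar{y}^{*}_{bl} - \bar{y} \right) = \left( \bar{f}^{*}_{bl} - \bar{f} \right) + \left( \bar{\lambda} \bar{F}^{*}_{bl} - \bar{\lambda} \bar{F} \right) + \left( \bar{\varepsilon}^{*}_{bl} - \bar{\varepsilon} \right)
$$

$$
\left( \bar{y}^{**} - \bar{y} \right) = \left( \bar{\mu}^{*} - \bar{\mu} \right) + \left( \bar{f}^{*}_{bl} - \bar{f} \right) + \left( \bar{\lambda}^{*} \bar{F}^{*}_{bl} - \bar{\lambda} \bar{F} \right) + \left( \bar{\varepsilon}^{**} - \bar{\varepsilon} \right)
$$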


Summary of Bootstrap Consistency

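Model                                                                         Cross-sect.   Block         Double
                                                                              Resampling    Resampling    Resampling
Cross. one-way ECM, $y_{it} = \theta + \mu_i + \varepsilon_{it}$              Consistent    -             Consistent
Temp. one-way ECM, $y_{it} = \theta + f_t + \varepsilon_{it}$                 -             Consistent    Consistent
Two-way ECM, $y_{it} = \theta + \mu_i + f_t + \varepsilon_{it}$               -             -             Consistent
Factor model, $y_{it} = \theta + \mu_i + \lambda_i F_t + \varepsilon_{it}$    Consistent    -             Consistent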


Simulations

(N,T) = (10,10)

Model                                                       Cross   Bl(1)   Bl(2)   2Res(1)   2Res(2)
$y_{it} = \theta + \varepsilon_{it}$                        4.5     4.3     4.7     1.0       2.0
$y_{it} = \theta + \mu_i + \varepsilon_{it}$                5.2     50.1    40.9    5.0       5.1
Temp ECM, $y_{it} = \theta + f_t + \varepsilon_{it}$
    0.00                                                    49.1    5.3     5.2     5.0       6.5
    0.25                                                    66.8    10.1    6.5     11.3      9.4
    0.50                                                    63.1    22.2    12.8    24.1      16.7
Factor, $y_{it} = \theta + \lambda_i F_t + \varepsilon_{it}$
    0.00                                                    5.4     5.2     5.5     1.0       1.3
    0.25                                                    4.7     7.5     5.4     1.2       1.6
    0.50                                                    5.0     11.3    7.5     2.3       2.0
    0.95                                                    5.0     29.3    24.3    4.2       4.3
    1.00                                                    4.8     34.0    29.5    4.2       4.9
2-ECM, $y_{it} = \theta + \mu_i + f_t + \varepsilon_{it}$
    0.00                                                    13.8    14.0    9.9     5.6       5.2
    0.25                                                    17.2    16.9    12.9    7.1       7.5
    0.50                                                    24.2    28.4    17.3    14.1      12.7


Simulations

(N,T) = (30,30)

Model                                                       Cross   Bl(2)   Bl(3)   2Res(2)   2Res(3)
$y_{it} = \theta + \varepsilon_{it}$                        5.0     4.8     5.3     1.1       1.3
$y_{it} = \theta + \mu_i + \varepsilon_{it}$                4.8     71.3    68.7    4.7       4.9
Temp ECM, $y_{it} = \theta + f_t + \varepsilon_{it}$
    0.00                                                    71.6    69.4    5.3     5.2       5.2
    0.25                                                    77.0    9.3     6.9     9.9       7.5
    0.50                                                    83.6    15.3    13.2    15.4      14.3
Factor, $y_{it} = \theta + \lambda_i F_t + \varepsilon_{it}$
    0.00                                                    4.6     4.7     5.0     0.8       1.2
    0.25                                                    4.4     6.0     5.6     1.3       1.1
    0.50                                                    5.7     9.2     8.3     1.3       1.2
    0.95                                                    5.0     38.8    39.0    5.4       4.1
    1.00                                                    4.6     65.0    57.9    5.0       5.5
2-ECM, $y_{it} = \theta + \mu_i + f_t + \varepsilon_{it}$
    0.00                                                    13.1    13.6    14.0    4.6       5.0
    0.25                                                    23.0    18.0    12.6    7.1       6.9
    0.50                                                    30.3    23.0    19.2    12.1      10.8


Simulations

(N,T) = (60,60)

Model                                                       Cross   Bl(3)   Bl(5)   2Res(3)   2Res(5)
$y_{it} = \theta + \varepsilon_{it}$                        5.6     4.5     5.2     0.8       1.0
$y_{it} = \theta + \mu_i + \varepsilon_{it}$                4.4     79.7    77.7    4.2       4.8
Temp ECM, $y_{it} = \theta + f_t + \varepsilon_{it}$
    0.00                                                    78.3    5.7     5.9     5.8       6.1
    0.25                                                    83.7    7.8     6.1     7.8       6.3
    0.50                                                    88.6    12.5    8.5     12.8      8.7
Factor, $y_{it} = \theta + \lambda_i F_t + \varepsilon_{it}$
    0.00                                                    4.8     5.1     4.5     0.5       0.9
    0.25                                                    5.0     6.0     5.4     1.3       0.9
    0.50                                                    4.7     7.2     5.6     1.0       1.4
    0.95                                                    5.2     40.3    33.2    3.7       3.9
    1.00                                                    5.2     73.1    67.3    5.4       4.9
2-ECM, $y_{it} = \theta + \mu_i + f_t + \varepsilon_{it}$
    0.00                                                    15.7    14.6    14.8    4.7       5.4
    0.25                                                    22.8    15.1    12.8    7.4       5.3
    0.50                                                    30.8    20.0    12.7    11.7      8.0


Conclusion

The Double Resampling Bootstrap (DRB) method dominates resampling methods in one dimension, in the sense that the set of DGPs for which the DRB is valid is larger.

The double resampling is valid under general conditions on cross-sectional and temporal heterogeneity as well as cross-sectional dependence.

Resampling only in the cross-section dimension is not valid in the presence of temporal heterogeneity.

Block resampling only in the time series dimension is not valid in the presence of cross-section heterogeneity.

The bootstrap does not require the researcher to choose one of several asymptotic approximations available for panel models.


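Chapter 2 : Bootstrap for panel regression models with random effects

Extension of the previous results to the panel linear regression model.

$$
y_{it} = \alpha + V_i \gamma + W_t \delta + X_{it} \eta + \nu_{it} = Z_{it} \beta + \nu_{it}
$$

$$
\nu_{it} = \mu_i + f_t + \lambda_i F_t + u_{it}
$$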


Residuals based bootstrap

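$$
y_{it} = Z_{it} \beta + \nu_{it}
$$

Use the OLS estimator $\hat{\beta}$ of $\beta$ to get the residuals.

$$
\hat{u}_{it} = y_{it} - Z_{it} \hat{\beta}
$$

Resample the residuals to create pseudo data.

$$
y^{*}_{it} = Z_{it} \hat{\beta} + u^{*}_{it}
$$

Repeat in order to obtain many realizations of $\{Y^{*}, Z\}$ and $\hat{\beta}^{*}$, and use them to make inference.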
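
The residual-based steps can be sketched as follows. This is an illustration under stated assumptions rather than the thesis implementation: the residual matrix is resampled in both dimensions with block length 1, the regressors are stored as an (N, T, p) array, and the names `ols` and `residual_bootstrap` as well as the simulated data are hypothetical.

```python
import numpy as np

def ols(Z, y):
    """OLS coefficients for a stacked design Z (n, p) and outcome y (n,)."""
    return np.linalg.lstsq(Z, y, rcond=None)[0]

def residual_bootstrap(y, Z, n_boot, rng):
    """y: (N, T) outcomes, Z: (N, T, p) regressors. Returns beta_hat and draws."""
    N, T, p = Z.shape
    Zf = Z.reshape(N * T, p)                   # stack units, periods within unit
    beta_hat = ols(Zf, y.ravel())
    fitted = (Zf @ beta_hat).reshape(N, T)
    resid = y - fitted                         # residual matrix, same layout as y
    draws = np.empty((n_boot, p))
    for b in range(n_boot):
        i_idx = rng.integers(0, N, size=N)     # units, i.i.d. with replacement
        t_idx = rng.integers(0, T, size=T)     # periods, block length 1 (assumption)
        u_star = resid[np.ix_(i_idx, t_idx)]   # resampled residuals
        y_star = fitted + u_star               # pseudo data, regressors kept fixed
        draws[b] = ols(Zf, y_star.ravel())
    return beta_hat, draws

rng = np.random.default_rng(0)                 # seed only for reproducibility
N, T = 8, 8
X = rng.normal(size=(N, T))
Z = np.stack([np.ones((N, T)), X], axis=-1)    # intercept and one regressor
y = 1.0 + 2.0 * X + rng.normal(size=(N, T))    # hypothetical data
beta_hat, draws = residual_bootstrap(y, Z, n_boot=200, rng=rng)
se = draws.std(axis=0)                         # bootstrap standard errors
```

The regressors stay fixed across replications; only the residual matrix is reshuffled, which is what distinguishes this scheme from resampling the data directly.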


Pairs bootstrap

$$
y_{it} = Z_{it} \beta + \nu_{it}
$$

Resample $\{Y, Z\}$ directly to create pseudo data $\{Y^{*}, Z^{*}\}$.

Run the OLS regression with $\{Y^{*}, Z^{*}\}$ to obtain $\hat{\beta}^{*}$.

Repeat to obtain many realizations of $\hat{\beta}^{*}$ and use them to make inference.
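
A comparable sketch for the pairs scheme is given below. It is an assumption-laden illustration, not the thesis code: the same unit and period indices are applied jointly to the outcomes and the regressors, with block length 1, and the name `pairs_bootstrap` and the simulated data are hypothetical.

```python
import numpy as np

def pairs_bootstrap(y, Z, n_boot, rng):
    """y: (N, T) outcomes, Z: (N, T, p) regressors. Returns (n_boot, p) draws."""
    N, T, p = Z.shape
    draws = np.empty((n_boot, p))
    for b in range(n_boot):
        i_idx = rng.integers(0, N, size=N)     # units, i.i.d. with replacement
        t_idx = rng.integers(0, T, size=T)     # periods, block length 1 (assumption)
        grid = np.ix_(i_idx, t_idx)
        y_star = y[grid]                       # (N, T)
        Z_star = Z[grid]                       # (N, T, p): same cells as y_star
        draws[b] = np.linalg.lstsq(Z_star.reshape(N * T, p),
                                   y_star.ravel(), rcond=None)[0]
    return draws

rng = np.random.default_rng(0)                 # seed only for reproducibility
N, T = 8, 8
X = rng.normal(size=(N, T))
Z = np.stack([np.ones((N, T)), X], axis=-1)    # intercept and one regressor
y = 1.0 + 2.0 * X + rng.normal(size=(N, T))    # hypothetical data
draws = pairs_bootstrap(y, Z, n_boot=200, rng=rng)
ci = np.percentile(draws[:, 1], [2.5, 97.5])   # percentile interval for the slope
```

Because outcome and regressors are moved together, no residuals are needed and the relation between $y_{it}$ and $Z_{it}$ in each cell is preserved.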


Bootstrap Validity

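$$
\sup_{x \in \mathbb{R}^{K}} \left| P^{*}\left( \sqrt{M} \left( \hat{\beta}^{*} - \hat{\beta} \right) \le x \right) - P\left( \sqrt{M} \left( \hat{\beta} - \beta \right) \le x \right) \right| \xrightarrow[NT \to \infty]{P} 0
$$

with $M \in \{N, T, NT\}$.

Intuition : The behavior of $(\hat{\beta}^{*} - \hat{\beta})$ is similar to the behavior of $(\hat{\beta} - \beta)$ when the sample size increases.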


Theoretical Related Literature

Kapetanios (2008), A bootstrap procedure for panel datasets with many cross-sectional units : N-asymptotic theoretical results with i.i.d. cross-sectional vector.

$$
y_{it} = \alpha + V_i \gamma + X_{it} \eta + \nu_{it}
$$

Goncalves (2010), The Moving Blocks Bootstrap for Panel Regression Models with Individual Fixed Effects : adaptation of the Moving Blocks Bootstrap to linear panel models.

$$
y_{it} = V_i \gamma + W_t \delta + X_{it} \eta + \nu_{it}
$$


Theoretical contribution

We prove that the cross-section resampling bootstrap is valid only for parameters associated with cross-section varying regressors in the presence of random effects.

$$
y_{it} = \alpha + V_i \gamma + W_t \delta + X_{it} \eta + \nu_{it}
$$

We prove that the block resampling bootstrap is valid only for parameters associated with time varying regressors in the presence of random effects.

We prove that the double resampling bootstrap induces correct inference for the whole vector of parameters in the presence of random effects.



Simulations

(N,T) = (10,10)

Error structure                                                                        Regressor   Cros.   Bloc.   D-Res
2-way ECM, $\mu_i + f_t + \varepsilon_{it}$                                            1           31.4    34.4    9.4
                                                                                       $V_i$       12.6    59.4    6.0
                                                                                       $W_t$       58.5    12.2    9.9
                                                                                       $X_{it}$    26.3    28.7    7.0
2-way ECM with spatial dependence, $\mu_i + f_t + \lambda_i F_t + \varepsilon_{it}$    1           27.0    35.8    9.9
                                                                                       $V_i$       12.7    53.8    9.5
                                                                                       $W_t$       45.9    11.0    6.8
                                                                                       $X_{it}$    18.4    24.1    5.5


Simulations

(N,T) = (20,20)

Error structure                                      Regressor   Cros.   Bloc.   D-Res
2-way ECM                                            1           25.2    24.4    8.9
                                                     $V_i$       8.1     67.5    6.9
                                                     $W_t$       67.3    7.8     7.2
$\mu_i + f_t + \lambda_i F_t + \varepsilon_{it}$     $X_{it}$    21.4    19.8    5.7


Simulations

(N,T) = (30,30)

Error structure                                      Regressor   Cros.   Bloc.   D-Res
2-way ECM                                            1           26.0    23.8    6.5
                                                     $V_i$       8.7     73.8    5.2
                                                     $W_t$       73.8    7.3     4.7
$\mu_i + f_t + \lambda_i F_t + \varepsilon_{it}$     $X_{it}$    20.5    21.2    5.5


Simulations

(N,T) = (50,50)

Error structure                                      Regressor   Cros.   Bloc.   D-Res
2-way ECM                                            1           24.3    20.5    6.0
                                                     $V_i$       5.5     81.6    5.5
                                                     $W_t$       78.2    5.2     5.6
$\mu_i + f_t + \lambda_i F_t + \varepsilon_{it}$     $X_{it}$    19.5    20.4    4.9


Chapter 3 : Bootstrapping Differences-in-Differences Estimates

How bootstrap methods can help to avoid spurious findings in the evaluation of public policies using panel data.

The Double Resampling Bootstrap avoids size distortions and gives a more reliable evaluation of public policies.


Differences-in-Differences Estimation

Basic setup : Y is the outcome of interest.

Two groups of statistical units (treatment group, control group) and two periods (before and after a public intervention).

The Differences-in-Differences (DD) estimator is :

$$
DD = \left( \bar{y}_{T,2} - \bar{y}_{T,1} \right) - \left( \bar{y}_{U,2} - \bar{y}_{U,1} \right) = \hat{\beta}
$$

$$
y = \alpha_0 + \alpha_1 I_2 + \beta I + u
$$

$I_2$ is a time dummy variable, $I$ is a binary program indicator.

By analogy, the OLS estimator $\hat{\beta}$ is called the Differences-in-Differences (DD) estimator, even in a more complex linear regression model.
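
The link between the difference of group means and an OLS coefficient can be illustrated numerically. The sketch below rests on assumptions that go beyond the slide: it uses the textbook saturated two-group, two-period regression, which adds a group dummy to the time dummy and the program indicator, and the data and effect sizes are simulated and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)            # seed only for reproducibility
n = 200
treated = np.repeat([0, 1], n // 2)       # group dummy (assumed, not on the slide)
post = np.tile([0, 1], n // 2)            # I2: 1 in the second period
program = treated * post                  # I: treated units after the intervention
y = 1.0 + 0.5 * post + 0.3 * treated + 2.0 * program + rng.normal(size=n)

def cell_mean(group, period):
    """Mean outcome for one group in one period."""
    return y[(treated == group) & (post == period)].mean()

# DD as a double difference of cell means.
dd_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# DD as the OLS coefficient on the program indicator in the saturated model.
X = np.column_stack([np.ones(n), post, treated, program])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
dd_ols = coef[3]
```

In this saturated specification `dd_means` and `dd_ols` coincide up to floating-point error, which is the sense in which the OLS coefficient is called the DD estimator.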


Impact Evaluation Using Panel Data

General setup : introduction of control variables X to avoid selection bias, and several periods. The model becomes :

$$
y_{it} = X_{it} \delta + \beta I_{it} + u_{it}, \qquad i = 1, 2, \ldots, N; \quad t = 1, 2, \ldots, T
$$

Typically it is a linear panel data model : several statistical units observed during several time periods.

Advantages : robustness in the time dimension, possibility to distinguish short-term impact from long-term impact.

Difficulties : heterogeneity, temporal correlation, moderate sample size (especially in the time dimension).


BDM Exercise

Bertrand, Duflo and Mullainathan (QJE, 2004) examine the differences-in-differences estimator commonly used with panel data to evaluate the impact of public policies.

Their empirical application uses panel data constructed from the Current Population Survey (CPS) on wages of women in the 50 states, from 1979 to 1999.


BDM Exercise

Formally, consider the following model :

$$
Y_{ist} = A_s + B_t + c X_{ist} + \beta I_{st} + \varepsilon_{ist}
$$

$Y_{ist}$ : outcome (wage), $A_s$ : state effects, $B_t$ : time effects

$I_{st}$ : dummy intervention variable, randomly generated


BDM Exercise

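$$
Y_{ist} = A_s + B_t + c X_{ist} + \beta I_{st} + \varepsilon_{ist}
$$

First, regression on the individual controls $X_{ist}$ (education and age).

Panel construction with the mean of the residuals by state and year.

$$
\bar{Y}_{st} = \alpha_s + \gamma_t + \beta I_{st} + \varepsilon_{st}
$$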


BDM Exercise


Placebo public interventions are randomly generated across states and periods, and their impact on wages is measured. By construction, no impact should be found : $\beta = 0$.

Intuition of the BDM exercise : if several researchers independently evaluate a public policy that has no real impact, using a correct inference method, only 5% of them should conclude that the policy has a significant impact (wrong answer).


BDM Exercise


States   BDM-OLS   FGLS    BDM-BSP
06       48.0      .       43.5
10       38.5      .       22.5
20       38.5      .       13.5
50       43.0      24.0    6.5

Table 1 : BDM Simulations Results (Theoretical level 5%)

Several evaluations conclude that there is a significant impact when there is no impact.

Dummy variables are not enough to remove all the correlation structure.

Parametric assumptions for FGLS fail to correct the problem.

The BDM bootstrap method comes without rigorous theoretical justification.


BDM Revisited

States   BDM     FGLS    BDM-BSP   Pair-BSP   D.Res.1   D.Res.2
06       48.0    -       43.5      17.1       15.0      4.9
10       38.5    -       22.5      13.3       9.6       5.3
20       38.5    -       13.5      8.1        6.3       5.1
50       43.0    24.0    6.5       6.5        5.1       5.1

Table 2 : Simulations Results (Theoretical level 5%)

BDM : BDM fixed effects OLS

FGLS : assumes an AR(1) process for the error term

BDM-BSP : BDM bootstrap

Pair-BSP : corrected version of the BDM bootstrap (correct bootstrap variance)

D.Res.1 : Double Resampling, residuals based bootstrap

D.Res.2 : Double Resampling, pairs bootstrap


THANKS !
