small sample performance of dynamic panel data estimators...

40
Small Sample Performance of Dynamic Panel Data Estimators: A Monte Carlo Study on the Basis of Growth Data Nazrul Islam Department of Economics Emory University Current Draft: August, 1998 --------------------------------------------------------------------- Initial versions of this paper were presented in seminars at Harvard University and Emory University. I would like to thank the participants of these seminars whose comments helped improve this paper. All remaining errors are mine.

Upload: lecong

Post on 08-May-2019

225 views

Category:

Documents


0 download

TRANSCRIPT

Small Sample Performance of Dynamic Panel DataEstimators: A Monte Carlo Study on the Basis of

Growth Data

Nazrul IslamDepartment of Economics

Emory University

Current Draft: August, 1998

---------------------------------------------------------------------Initial versions of this paper were presented in seminars at Harvard University and

Emory University. I would like to thank the participants of these seminars whose commentshelped improve this paper. All remaining errors are mine.

1

Small Sample Performance of Dynamic Panel DataEstimators: A Monte Carlo Study on the Basis of Growth

Data

1. Introduction

This paper investigates the small sample properties of dynamic panel data estimators

as applied for estimation of growth-convergence equation using Summers-Heston (1988,

1991) data set. It shows that, among various estimators used for this purpose, the ones not

using lagged dependent variable as instrument perform better than the ones that do. This is

an important result, which is useful in assessing the results on convergence equation

presented recently by different authors using different dynamic panel data estimators.

One of the issues around which new growth literature has evolved is that of

convergence. Statistically, convergence has been interpreted as a negative correlation

between initial level of income and subsequent growth rate. Hence, the popular method for

testing convergence hypothesis has been to conduct growth-initial level regressions. For a

long time, these regressions were estimated using cross-section data only. However, in the

early nineties, it was observed that the cross-section approach suffers from certain important

limitations in this regard. This led to the panel approach, and since then use of panel data in

estimating the convergence equation has become quite common

Close scrutiny show that the growth-initial level regression is actually a dynamic

panel data model. This provides the rationale for using dynamic panel data estimators. The

resulting panel estimates generally differ from the cross-section estimates. However, they

also differ among themselves. This poses a problem because the properties of the dynamic

panel data estimators are mostly asymptotic, and hence, their small sample properties are a-

priori not known. It is therefore difficult to judge which of the different panel estimates are

more reasonable.

The Monte Carlo study presented in this paper is a response to this difficulty. It is

customized to the data set and specification that are generally used in growth and

convergence studies. The estimators included in this study are the instrumental variable (IV)

estimators of Anderson and Hsiao (1981, 1982), the multivariate estimators, including the

Minimum Distance estimator, suggested by Chamberlain (1982, 1983), and the generalized

2

method of moments (GMM) estimators proposed by Arellano (1989a, 1989b) and Arellano

and Bond (1991). In addition, we also include conventional estimators like the ordinary least

squares (OLS), least squares with dummy variable (LSDV), and a few other simultaneous

equations estimators.

Two things emerge from this exercise. First, wide variation is actually observed in

the results from different dynamic panel estimators. Second, estimators that rely on further

lagged values of the dependent variable as instruments perform worse than the estimators

that do not rely on them as instruments. These two results together suggest that significant

caution should accompany presentation of results from specific panel estimator. In fact, it

seems desirable that researchers check out the reliability of their panel results by conducting

their own Monte Carlo experiments.

2. Previous Monte Carlo Studies

The issue of small sample properties of dynamic panel data estimators is not entirely

new. There have been earlier attempts to investigate it. For example, Nerlove (1967)

considers a simple auto-regressive model with no exogenous variable, and compares the

performance of OLS, LSDV, MLE, and several variants of GLS in estimating the model. In

Nerlove (1971), the dynamic panel data model is extended to have exogenous variable. This

allows consideration of instrumental variable (IV) estimator with lagged values of the

exogenous variable as instrument. It also allows having another variant of two-stage GLS.

Overall, Nerlove’s Monte Carlo results favor GLS estimators over other estimators.

Since Nerlove’s work, there have been significant developments in the field of

dynamic panel data estimators. Among these has been introduction of Anderson and Hsiao

IV estimators which use further lagged values of the dependent variable as instruments.

Arellano has carried this idea forward and has proposed using all possible further lagged

variables as instruments within a GMM framework. Arellano and Bond (1991) perform a

Monte Carlo study to compare primarily the small sample properties of their GMM

estimators with the corresponding properties of Anderson-Hsiao estimators. According to

their results, the GMM estimators perform better than Anderson-Hsiao IV estimators though

not so much in terms of bias as in terms of dispersion.

3

3. Objectives of the Present Study

Given these earlier studies, what can be the motivation for the Monte Carlo study

presented in this paper? There are several of them. First, Monte Carlo results have been

generally found to be more useful when these are customized to the particular model, data

set, and range of parameter values that pertain to the actual problem to which application of

panel data estimators is being considered. Thus, in view of the controversy regarding

convergence parameter estimates, it is useful to have Monte Carlo results that are specific to

the growth-convergence equation, specific to the Summers-Heston growth data set, and

specific to the pertinent range of parameter values. This is exactly what is done in this

exercise. This also leads to the following second motivation. The previous Monte Carlo

studies have generally focussed on panel data models with random individual effects.

However, the individual effect that arises in the growth-convergence equation is of

correlated nature. There have been no Monte Carlo results, as far as we know, on models

with correlated effects. The third motivation for this study arises from the fact that several

important dynamic panel estimators do not find place in the earlier Monte Carlo studies.

Among these are the estimators that use the multivariate approach, like the three stage least

squares (3SLS) and generalized three stage least squares (G3SLS). In addition,

Chamberlain’s MD estimator also remains outside the purview of the previous Monte Carlo

studies. The MD estimator using optimal weighting matrix, sometimes referred to as

Optimal Minimum Distance Estimator (OMDE), has been used as the benchmark for

judging theoretical efficiency of many of the GMM estimators proposed by Arellano. So it is

of some interest that Monte Carlo study covered these estimators as well. The present study

takes a comprehensive view and includes almost all the known dynamic panel data

estimators for evaluation. Fourth, the present study does not limit itself to only one

particular generating mechanism of the transitory error term. Instead, it considers three

different generating schemes are considered, namely uncorrelated, autoregressive, and

moving average. Finally, the study also investigates the impact of sample size on relative

performance of the estimators. This is very relevant for the problem at hand. The

convergence equation has generally been estimated in the context of two different samples

of countries, one being small and the other, large. It is therefore necessary to see how

different estimators perform in these samples of different size.

4

Model and Parameter Values

The Model

The dynamic panel data model that arises in the convergence literature is as follows:

(1) ittititiit xyy νηµβα ++++= −− 1,1, .

Here ity represents log of per capita GDP of country i at time t, 1, −tiy is the same

lagged by one period, and 1, −tix is the difference in the log of investment and population

growth variables of country i at time t-1. Finally, iµ and tη are the individual and time

effect terms, and itν is the transitory error which varies across both individual and time.

The most important issue for a panel model is specification of the individual effect

term iµ . Growth theory implies that iµ is correlated with the included explanatory variable

1, −tix . This means that the random-effects assumption is not appropriate for this model. The

most appropriate model is that of correlated effects. However, there are different ways to

specify this correlation. Of the linear specifications, the simplest is the one proposed by

Mundlak (1971), whereby the individual effect term is assumed to be a linear function of the

mean (over time) of the exogenous variable for the individual concerned.1 Chamberlain

suggests a more general specification whereby the individual effect term is a linear function

of the exogenous variable for all the time periods with varying coefficients. This still leaves

out the possibility for specification to be non-linear, but interpreted as linear predictor, this

does not involve any restriction. According to this more general specification, we have

(2) iTTiii xxx ωλλλλµ +++++= − 112010 Λ ,

with iω distributed as ),0( 2ωσN .2

Parameter Values

1 However, if the transitory error term itν is serially uncorrelated, then this specification of the individualeffect renders the random-effects model to be equivalent to the fixed effects model.2 Another issue that arises here is that of dealing with the time effect term tη . One possible approach in thisregard is running residual regression, which in this case would mean subtracting out the cross-section meansfrom the variables included in the equation. The other approach is to include appropriate set of time dummies.

5

Considered in full, the model presented in equation (1) and (2) has three sets of

parameters. Of foremost importance is, of course, the auto-regressive parameter α ,

followed by the slope parameter β , attached to the exogenous variable 1, −tix . The other

group consists of Tλλλ ,,, 10 Κ , parameters that arise from specification of the individual

effect term iµ . Finally, the third set of parameters are those governing the error terms itν

and iω . Their number further increases when itν is considered to be serially correlated. For

itν , the following three processes are considered:

1. Serially uncorrelated with ),0(~ 2νσν Nit .

2. MA (1) process: 1, −+= tiitit εθεν , with ),0(~ 2εσε N .

3. AR (1) process: ittiit ενϕν += − 1, , with ),0(~ 2εσε N .

The values of these parameters for simulation are chosen in a data-dependent

manner. A three-step procedure is employed for this purpose. In the first, we obtain

consistent estimates of α and β using itx ’s as instruments. These are used to compute the

composite residual )( itit νµη ++ . These residuals are then regressed on itx ’s to get

estimates of λ’s and of the time dummies. The residuals from this second regression are

then used to get the parameters of different generating mechanisms of itν .3 Table-1 gives the

values of those parameters of the model that stay the same under all different generating

schemes of itν .

In growth-convergence studies, three different samples have been frequently used.

Following Mankiw et al. (1992), these may be named as NONOIL, INTER, and OECD. Of

these, OECD is the smallest and consists of 22 OECD countries. NONOIL is the largest and

consists of most of the sizable countries of the world for which oil extraction is not the

dominant economic activity. This sample consists of 96 countries. Finally, INTER is an

intermediate sample comprised of all those countries included in NONOIL except those for

which data quality is not that satisfactory. This sample has 74 countries.

3 Details regarding these steps can be obtained from the author upon request.

6

Table-1

Common Parameter Values

Parameter NONOIL INTER OECDα 0.7886 0.7925 0.6294β 0.1641 0.1732 0.0954

0λ 1.3334 1.3588 2.8986

1λ -0.0028 0.1927 0.5863

2λ 0.1200 -0.1098 -0.6354

3λ -0.1243 -0.1644 -0.0702

4λ 0.0267 0.1286 0.6355

5λ 0.2277 0.1715 -0.3484

70η 0.0171 0.0093 0.0680

75η -0.0156 -0.0015 0.0827

80η -0.0067 0.0218 0.1295

85η -0.0669 -0.0523 0.1238

Some notable aspects of these parameter values are as follows. First, there seems to

be some agreement across samples regarding direction in which itx ’s for different years

relate to the individual effect term iµ . However, this agreement is not complete. Second, the

way different time periods affect the growth process differs across samples. There are some

differences in this regard between NONOIL and INTER samples, but more significant is the

difference between these two samples, on the one hand, and OECD, on the other.

Table-2 presents the parameter values for the three different generating mechanisms

of itν . Notable aspects of this set of parameters are as follows. First, any serial dependence

that itν may have in actual data is of fairly low order. Second, variance of the individual

country effect term remains quite stable under alternative generating schemes of itν for all

different samples. Finally, the estimate of the variance of itν also remains very similar.

These suggest that the problem of serial dependence in the data is not serious, and the

relative performance of different estimators may not vary widely for different ways of

modeling of itν . We assume that all the disturbance terms have normal distribution.

7

Table-2

Parameter Values for Different Generating Mechanisms of itν

Parameter NONOIL INTER OECDUncorrelated itν

νσ 0.1054 0.0872 0.0300

ωσ 0.1281 0.0139 0.0762

MA(1) itνθ 0.2037 0.1250 0.1125

νσ 0.1179 0.0990 0.0302

ωσ 0.1225 0.1010 0.0742

εσ 0.1153 0.0980 0.0300

AR(1) itνϕ 0.2994 0.1787 0.1394

νσ 0.1227 0.0943 0.0319

ωσ 0.1183 0.0995 0.0742

εσ 0.1171 0.0927 0.0316

Estimators Considered and Related Issues

Estimators Included in the Study

The following gives a complete list of the estimators included in this Monte Carlo

study. It also shows the abbreviations that will be used to refer to these estimators in the

remaining of the paper.

1. Ordinary Least Squares (OLS)

2. Least Squares with Dummy Variables (LSDV)

3. Anderson-Hsiao Instrumental Variable Estimator in Level Form (AH(l))

4. Anderson-Hsiao Instrumental Variable Estimator in Difference From (AH(d))

5. Arellano GMM Estimator, One Step (AGMM1)

6. Arellano GMM Estimator, Two Step (AGMM2)

7. Two Stage Least Squares Estimator (2SLS)

8

8. Three Stage Least Squares Estimator (3SLS)

9. Generalized Three Stage Least Squares Estimator (G3SLS)

10. Minimum Distance Estimator (MD)

Brief description of these estimators, as applied to the model considered in this

study, is provided in Appendix. We can, therefore, directly proceed to presentation of the

Monte Carlo work.

Sample Size and Related Issues:

Given a certain number of cross-sections available (i.e., given T), different panel data

estimators can make use of different numbers of these in the final stage of estimation. In

simulation, therefore, it is possible to adopt two different approaches. First, it is possible to

keep the actual number of cross-sections used by the estimators the same by generating

varying number of cross-sections for different estimators. Second, the number of cross-

sections generated may be kept the same, and the number of actual cross-sections used in the

final stage of estimation by different estimators may be allowed to vary. It is the second

situation that a researcher faces in actual practice. She confronts a given data set and then

has to choose from among different estimators depending on their relative merits. In order to

conform to this real situation, this study adopts the second approach. This also conforms to

the fact that this Monte Carlo study is based on a given data set, namely Summers-Heston

growth data set. The panel based on this data set is considered at five-year intervals. Thus,

the data available are for 1960, 1965, 1970, 1975, 1980, and 1985. Since this is a dynamic

model as given by equation (1), the effective T equals to five.

Issues Particular to Individual Estimators

OLS:

OLS, the simplest of all estimators considered, is applied to the equation in the level

form. Since initial values of ity (i.e., of 1960) are known, OLS can use in actual estimation

all of the T cross-sections. OLS, however, is not geared to providing estimates of either tλ’s

or iµ ’s. It gives an estimate of the composite error term itu but cannot provide its further

9

decomposition into the component parts, iµ and itν . Also, OLS is inconsistent even with T

going to infinity.

LSDV:

LSDV is also applied to the equation in level form and, for the same reasons as with

OLS, all of the T cross-sections can be used in actual estimation. It can provide estimates of

iµ ’s (viewed as parameters to be estimated) and hence of tλ’s, from a subsequent

regression of the estimated iµ on the itx s. Also, it can give estimates of variances of the

two error components separately. The between-group variations or, more directly, variance

of the estimated iµ ’s, can be used to get an estimate of 2µσ . On the other hand, 2

vσ may be

obtained from the within-group variations. LSDV estimator, though not consistent in the

direction of N, is consistent when T goes to infinity.

AH(l) and AH(d):

Both these IV estimators apply to the model in first-differenced form. This results in

the ‘loss’ of one cross-section from the actual estimation exercise. Furthermore, since in

AH(l) 2, −tiy is used as instrument for )( 2,1, −− − titi yy , one more cross-section is lost in the

process. Thus, if T cross-sections are available, the number of cross-sections actually used is

(T-2). AH(l) is mainly geared to consistent estimation of the slope parameters α and β . It

can also provide an estimate of 2νσ . However, other parameters of the model cannot be

recovered using this estimator. AH(d) has similar features as those of AH(l). However, since

AH(d) uses )( 3,2, −− − titi yy as instrument for )( 2,1, −− − titi yy , two cross-sections are lost in

the process. Hence with T cross-sections available, only (T-3) are used. When T is small, this

may be an important consideration.

AGMM(1) and AGMM(2):

Like the AH estimators, both AGMM(1) and AGMM(2) work with the model in first

differenced form. This results in loss of one cross-section from the actual estimation process.

AGMM(1) is based on arbitrary weighting matrix, whereas in AGMM(2) the weighting

matrix is appropriately constructed using residuals from AGMM1 estimation. The details of

10

the construction of the instrument matrix have been sketched in Appendix. Like AH

estimators, these estimators are also geared to estimation of α and β , and do not provide

estimates of the remaining parameters.

2SLS, 3SLS, and G3SLS:

This set of estimators may be commonly referred to as Simultaneous Equation (SE)

estimators. All of them apply to the model in first differenced form. Thus, if T is the number

of cross-sections available, the number of equations used in estimation is (T-1). These

estimators are also geared to estimation of α and β .4 One issue in G3SLS estimation is

which residuals to use for construction of the covariance matrix. The residuals can be

obtained from either 2SLS or 3SLS or even from a generalized 2SLS. However, this does

not change the asymptotic properties of the estimator. In this study residuals from 2SLS are

used for both 3SLS and G3SLS.

MD:

In contrast with AH, AGMM, and SE estimators, MD works with the model in

levels. Hence, all the cross-sections available can be used in estimation. Instead of

eliminating iµ through first differencing, as other estimators do, MD specifies iµ in terms

of its correlation with itx ’s, and uses this specification to substitute for iµ . Hence, his

estimator can provide estimates not only of the slope parameters of the main dynamic

equation but also of the tλ’s. However, this also makes MD a non-linear estimation

procedure. An important issue in MD estimation, therefore, is of setting initial values of the

parameters for non-linear iteration to proceed. The task is made difficult by the fact that, in

case of a simulation exercise, the true parameter values are known! A related question is

whether to use the same set of initial values to start the iteration for each replication or to

change it over replications. In this study, we resolve these issues in the following manner.

4 However, in principle it is possible to add the equation for iµ to the system. This raises the number of

equations to T and allows having estimates of tλ ’s. However, inclusion of the equation for iµ into the system

does not affect the asymptotic properties of the estimates of α and β because there are no across-equation

restrictions between the equation for iµ and those for ty ’s.

11

It is impossible to set ‘arbitrary’ initial values that are free from prior knowledge of

the true parameter values because they are known! Hence, it is preferable to have these

initial values result from a routine, for example, from a preliminary estimation procedure.

This also resolves the other question and allows the initial values to be data-dependent and

hence be different across replications. However, the question that arises is, which particular

estimator to use for generating the initial parameter values? It is clear that LSDV is the most

likely candidate because it is the only estimator other than MD that provides estimates of not

onlyα and β but also of the tλ s.

In MD estimation, there is also the issue of choosing the weighting matrix. There are

several options in this regard and the optimality of the estimates depends on the choice. This

study uses the optimal weighting matrix, which takes full account of possible

heteroskedasticity across individuals.

Simulation Results and Discussion

It is apparent from the discussion above that not all the panel estimators are geared to

provide estimate all the parameters of the model. Because of this and also in order not to

clutter the presentation with too many numerical results, we focus here on results concerning

only α and β , the main parameters of the model.5 The simulation results presented in this

paper are on the basis of one thousand replications. In most cases, Monte Carlo distributions

stabilized already with one hundred replications. Hence increasing the number of

replications by any further was not necessary.

The detailed results regarding the Monte Carlo distributions obtained from the study

are provided in three sets of tables in the Appendix for three different generating

mechanisms of itν . Tables A1 to A3 show results for different samples when itν is serially

uncorrelated. Similarly, Tables A4 to A6 give results when itν follows an MR(1) process.

Finally, Tables A7 to A9 give results when itν obeys an AR(1) scheme.

The two criteria that are usually used in judging performance of an estimator are bias

and mean square error (MSE). In order to make assessment of performance of the estimators

5 Hence, we do not report the results regarding the error variances.

12

easier, we present tables showing relative bias and relative magnitude of root mean square

error. Tables 3 and 4 provide relative magnitudes of bias, and Tables 5 and 6 show relative

magnitudes of root mean square error for the estimates of α and β , respectively. The

following general features emerge from the simulation results.

First, a notable aspect of the results is the unsatisfactory performance of AH(l). The

point estimates produced by this estimator fluctuate very widely. This finds expression in

large standard deviations of Monte Carlo distributions obtained for this estimator. The

unsatisfactory performance of AH (l) contrasts sharply with better performance of AH(d).

Both these estimators rely on the assumption of orthogonality of lagged ty with itν . This

assumption holds only when itν is serially uncorrelated. Therefore, one would expect both

these estimators to perform well when itν is serially uncorrelated, and both of them to

perform poorly when itν is either AR(1) or MA(1). However, as can be seen from the

relevant tables, AH(d) performs relatively well under all different generation mechanisms of

itν and for all samples, while the performance of AH(l) is found to be unsatisfactory under

all different generation mechanisms of itν , and particularly for NONOIL and INTER

samples. The explanation, as it turns out, lies in the difference in the degree to which

instruments are correlated with the explanatory variable. It is found that )( 3,2, −− − titi yy , the

instrument used by AH(d), is strongly correlated with the explanatory variable

)( 2,1, −− − titi yy , while 2, −tiy , the instrument used by AH(l), is very poorly correlated with

)( 2,1, −− − titi yy . This poor correlation finds reflection in astronomically large values of

standard error estimates for AH(l).6 These results testify to the general rule that, in order to

be successful, it is not sufficient for an instrument to be just uncorrelated with the error term,

it also has to be adequately correlated with the explanatory variable for which it acts as an

instrument.

Second, as we noticed, the degree of serial correlation in itν is very mild. As

suspected on the basis of a-priori reasoning, this mildness of serial correlation imparts a

general pattern to the simulation results. For most of the estimators, the performance does

6 Computation of these standard errors involves the inverse of this correlation matrix.

13

not vary to a great extent with respect to the alternative generating scheme of itν . We have

already seen this in the context of the AH estimators. Going by the results on α , this is true

even for AGMM1 and AGMM2. This is somewhat surprising because these estimators

depend rather heavily for their validity on orthogonality of lagged values of ity to itν , and

this orthogonality is violated when itν follows either an AR or a MA scheme.7

Third, performance of many of the estimators shows considerable variation with

respect to the sample considered. However, it is difficult to establish a single pattern in this

regard. For example, going by the results on bias of estimated α , performance of OLS

deteriorates for OECD when compared with that for either NONOIL or INTER samples.

However, in case of LSDV and MD, the opposite is true. The AGMM and SEM estimators

show yet a different kind of contrast. The performance of the AGMM estimators deteriorates

for INTER sample in comparison with that for either NONOIL or OECD. In case of SEM

estimators, the opposite is true. The contrasting performance of AGMM and SEM estimators

may not be entirely surprising in view of the fact that while the former depends on lagged

ity ’s as instruments, SEM estimators rely entirely on itx ’s.

Fourth, comparison of the performance of 3SLS and G3SLS with that of 2SLS

yields mixed results. In the NONOIL sample, regardless of generating mechanism for itν ,

results from 3SLS and G3SLS estimators seem to be better than that from 2SLS. For INTER

sample, however, 2SLS seems to perform better than either 3SLS or G3SLS. In case of the

OECD sample, the situation is less clear cut. In terms of mean of the Monte Carlo

distribution, 3SLS and G3SLS fare better than 2SLS, though not in terms of dispersion. On

the other hand, in the OECD sample, the Monte Carlo distributions for 2SLS have very large

standard deviation.

One reason for deterioration of performance of 3SLS and G3SLS estimators in

INTER and OECD samples, when compared to that in NONOIL sample, may lie in sample-

size. The sizes of INTER and OECD samples are smaller that that of NONOIL. Since the

7 To be accurate, it needs to be mentioned that AH(d) does show some expected sensitivity with respect to thegeneration scheme of itν . Why similar sensitivity is not found in the performance of the AGMM estimators isan interesting question.

14

superiority of 3SLS and G3SLS over 2SLS is an asymptotic result, a larger sample size may

help this result to surface.

Fifth, the MD estimator performs better in comparison with AGMM estimators. With

regard to α , this is particularly true for INTER and OECD samples. For β this is true for

all different samples. In terms of bias, some of the SEM estimators perform slightly better

than MD in some of the samples. Thus, for example, with regard to α , 3SLS and G3SLS

show less bias than MD in NONOIL and INTER samples. However, in case of OECD, the

bias involved with the former estimators is much larger than with the latter. Regarding β ,

MD estimator shows less bias than by SEM estimators for INTER sample as well. Even for

the NONOIL sample, the bias involved with the MD is similar to and sometimes less than

that with SEM estimators.

Bias of the Monte Carlo distributions needs to be evaluated in conjunction with

corresponding dispersion. Mean Square Error (MSE) is the popular measure that is used to

take into account the possible trade-off between bias and variance. Looking at the values of

root MSE (presented in Tables 5 and 6) of Monte Carlo distribution of the parameter

estimates, we see that the MD estimator proves to be uniformly superior to AGMM and

SEM estimators. The only exception in this regard is the performance of 2SLS in estimating

β in the INTER sample. We earlier noted somewhat erratic performance of AH(d). In terms

of bias, it performs better than MD in some isolated cases. However, in terms of root MSE,

except for one lone exception, MD outperforms AH(d).

Thus, overall, among the N-consistent estimators, MD appears to be a more

dependable estimator for estimating the growth convergence equation using Summers-

Heston data set. One problem noticed with MD estimator for the problem at hand is its

sensitivity with respect to the starting values of iteration procedure. However, in case of a

single estimation exercise, it is possible to explore this sensitivity by looking at the

minimum distance statistic of the converged results obtained from different starting values.

Applying this procedure, it can be ensured that a global maximum is reached, and not a local

one.

One surprising aspect of the results was the relatively better performance of LSDV.

In terms of both bias and root MSE, and in estimation of both α and β , LSDV proves to be

a relatively superior estimator for the problem at hand. This is somewhat surprising given

15

that theoretically LSDV is consistent only in the direction of T, and the data used in this

study has a small T. What this shows is that small sample properties of estimators may be

very specific to the model and data considered.

From another point of view, it is also interesting to note that simpler estimators such

as LSDV and 2SLS (and in some cases, AH(d)) outperform more sophisticated estimators

such as AGMM2 and some of the SEM estimators. This helps draw attention to the

following important fact. The optimal properties of sophisticated estimators often depend on

use of optimal weighting matrix. Unfortunately however, these weighting matrices have to

be estimated, and as a result, they pick up, along with signal, noise present in the data. This

noise gets transmitted to the final parameter estimates when these are produced using the

estimated weighting matrices. To the extent that simpler estimators do not have to use

estimated weighting matrices, they are spared of this potential source of additional noise.

This explains why sophisticated estimators may actually perform worse than simpler

estimators, though this does not necessarily have to be the case.

Conclusions

Two kinds of results emerge from this investigation of small sample properties of

dynamic panel estimators in the context of the growth convergence equation and Summers-

Heston data. On the one hand, there are specific results concerning the problem at hand. On

the other hand, there are more general aspects of these specific results. Among the specific

results, the following stand out. First, parameter estimates produced by different dynamic

panel data estimators are indeed found to vary widely. This should act as a caution for

researchers who are increasingly turning to panel data methods in investigating growth and

convergence issues. Use of panel data presents advantages that are not available within the

confines of cross-section data. However, these advantages come with some problems. The

possibility of small sample bias is one of them. It seems desirable that researchers using

panel estimators tried to check into the small sample properties of the estimators in

respective particular contexts. Second, for estimation of the convergence equation using

Summers-Heston data, it seems that, in general, estimators not relying of further lagged

dependent variable as instrument perform better than those which do rely on them. Third,

16

the least squares with dummy variables prove very satisfactory for this particular model and

particular data set.

Among the general aspects of these concrete results, the following may be noted.

First, asymptotic equivalence of properties may indeed be misleading. Monte Carlo

investigations are helpful to have some idea about the small sample behavior of the

estimators. Second, sometimes, small sample property may be quite at variance from that

suggested by the theoretical properties. This is exemplified by the performance of LSDV in

this exercise. Third, although sophisticated estimators generally have more desirable

theoretical properties, in practice their implementation requires use of estimated weighting

matrices. Sampling variability and other noise picked up in estimating the weighting

matrices may often cause the sophisticated estimators perform worse than their simpler

counterparts. Fourth, the results confirm the general rule that to be successful instruments

should not only be uncorrelated with the error term but also sufficiently correlated with the

variables for which they are working as instrument.

17

Table-3

Bias as Percentage of True Parameter Valueα

Estimator NONOIL NONOIL NONOIL INTER INTER INTER OECD OECD OECD

UC MA(1) AR(1) UC MA(1) AR(1) UC MA(1) AR(1)OLS 14.8 14.6 14.8 15.2 15.2 15.4 21.5 20.9 21.2LSDV -8.0 -8.2 -7.9 -8.4 -9.3 -8.0 -1.6 -1.7 -1.4AH(l) n.c. n.c. n.c. n.c. n.c. n.c. n.c. n.c. n.c.AH(d) 0.4 -14.5 -15.9 0.2 -9.5 -10.0 0.6 -1.2 -1.6AGMM1 -10.7 -10.4 -10.6 -44.4 -49.5 -43.4 -9.5 -8.6 -8.3AGMM2 -9.7 -10.1 -10.2 -47.3 -49.5 -44.4 -8.6 -8.2 -8.52SLS -9.3 -9.3 -8.6 -3.1 -3.1 -2.8 -18.8 -15.8 -17.13SLS -4.5 -3.3 -6.5 -5.8 -7.9 -5.3 -12.7 -12.2 -12.2G3SLS -6.0 -5.4 -5.2 -8.3 -10.1 -8.8 -19.9 -16.5 -13.4MD -6.7 -6.9 -6.4 -6.9 -7.9 -6.7 -1.3 -1.1 -1.2

Table-4

Bias as Percentage of True Parameter Valueβ

Estimator NONOIL NONOIL NONOIL INTER INTER INTER OECD OECD OECD

UC MA(1) AR(1) UC MA(1) AR(1) UC MA(1) AR(1)OLS 31.4 32.1 31.6 11.8 11.7 11.1 100.0 99.9 100.5LSDV 1.0 -0.7 -0.3 0.5 1.4 1.1 -1.5 -0.7 -2.1AH(l) n.c. n.c. n.c. n.c. n.c. n.c. n.c. n.c. n.c.AH(d) 0.4 -2.2 -4.0 -0.6 -1.1 -1.0 -1.7 -2.5 -0.4AGMM1 13.7 14.4 14.5 -7.5 16.3 26.4 -7.3 -5.4 -2.6AGMM2 3.9 5.2 14.7 3.1 22.1 34.5 -17.3 -19.9 1.52SLS -2.3 -2.7 -1.9 2.7 2.0 2.5 -0.8 -3.9 -3.93SLS -0.2 -0.2 -2.0 -2.0 -2.3 9.3 -5.1 4.5 -8.9G3SLS -1.3 -2.4 -1.7 2.0 2.2 8.7 -14.2 -8.1 2.0MD 0.2 -0.7 -0.8 1.0 0.5 0.0 -0.6 -0.6 1.3

18

Table-5

Root MSE as Percentage of True Parameter Valueα

Estimator NONOIL NONOIL NONOIL INTER INTER INTER OECD OECD OECD

UC MA(1) AR(1) UC MA(1) AR(1) UC MA(1) AR(1)OLS 15.0 14.8 14.9 15.3 15.3 15.3 22.3 21.7 22.0LSDV 8.5 8.7 8.5 8.9 9.9 8.7 3.5 3.6 3.6AH(l) n.c. n.c. n.c. n.c. n.c. n.c. n.c. n.c. n.c.AH(d) 8.3 16.6 17.6 5.4 13.9 13.3 7.3 7.2 7.5AGMM1 27.7 27.5 26.7 64.8 70.4 65.9 24.3 21.9 23.7AGMM2 29.6 29.3 28.9 79.7 84.9 77.1 32.9 29.1 31.02SLS 12.1 12.6 12.0 5.1 5.4 5.0 24.3 21.3 23.03SLS 8.5 14.7 10.4 9.6 11.1 8.9 28.4 28.4 23.6G3SLS 10.0 18.1 8.7 11.9 13.8 12.6 40.9 37.6 29.3MD 7.4 7.8 7.4 7.6 8.7 7.6 3.0 3.1 3.2

Table-6

Root MSE as Percentage of True Parameter Valueβ

Estimator NONOIL NONOIL NONOIL INTER INTER INTER OECD OECD OECD

UC MA(1) AR(1) UC MA(1) AR(1) UC MA(1) AR(1)OLS 34.6 35.2 34.7 18.8 18.1 17.7 117.4 116.6 116.0LSDV 12.8 15.3 15.4 12.4 14.5 14.3 40.1 44.9 43.8AH(l) n.c. n.c. n.c. n.c. n.c. n.c. n.c. n.c. n.c.AH(d) 19.9 20.1 19.1 20.3 21.2 18.1 64.9 60.2 62.1AGMM1 147.0 151.9 145.3 148.0 153.8 143.6 243.5 237.9 226.5AGMM2 169.6 169.7 165.1 187.7 205.2 181.9 306.8 284.6 284.12SLS 17.1 18.4 17.5 19.5 20.8 19.3 58.3 54.7 57.93SLS 13.7 21.9 16.1 16.5 15.6 17.8 67.2 64.9 82.5G3SLS 17.5 28.2 15.4 17.8 18.5 23.5 119.4 111.1 149.6MD 13.3 15.4 15.8 12.6 15.1 14.4 40.5 44.6 45.6

19

Appendix on the Estimators

The Appendix provides brief description of the estimators included in the Monte

Carlo study presented in this paper. The details can be found in the appropriate references.

In the following, we ignore the time effects, iη , which can be dealt with in fairly

straightforward manner, as described earlier in the paper.

OLS

In OLS estimation, the unobserved individual effect term is completely ignored and

estimation is carried out on the basis of the equation in levels. Let

y y y y yN iT NT= ′( , , , , , , )11 1Λ Λ Λ ,

y y y y yN i T N T− − −= ′1 10 0 1 1( , , , , , , ), ,Λ Λ Λ , and

),,,,,,( 1,1,010 ′= −− TNTiN xxxxx ΛΛΛ .

Also, let W y x= −[ ]1 . Then the OLS estimator of the parameter vector ( )α β γ′= is given

by

∃ ( )γ= ′ ′−W W W y1 .

The standard errors under homoskedasticity are obtained from Var ( ∃) ( )γ = ′ −s W W2 1 ,

with s e e NT2 2= ′ −/ ( ) , where e y W= −( ∃)γ . The general heteroskedasticity consistent

standard errors are obtained from ( ) ( ) ( )′ ′ ′ ′− −W W W diag ee W W W1 1 . Since Cov y i t i( , ), − ≠1 0µ ,

OLS estimator is biased. It is also inconsistent in the direction of both N and T.

LSDV

In LSDV, the assumption is that the individual effects are fixed. Hence the

straightforward way of implementing LSDV is through insertion of appropriate dummies

and then application of OLS on the enlarged model, hence the name Least Squares with

Dummy Variables. However, computationally, it is simpler to obtain LSDV through within

20

estimation. Averaging equation (1) over time and subtracting it from the original equation

yields

)()()( 1,1,1, iititiitiiit vvxxyyyy −+−+−=− −−− βα ,

where ∑∑∑∑=

=−

=−−

=====

T

titi

T

ttii

T

ttii

T

titi v

Tvx

Txy

Tyy

Ty

1

1

01,

1

01,1,

1

1and,

1,

1,

1. LSDV

estimation of the original model is equivalent to OLS estimation of the above equation.

Denote )(~and),(~),(~1,1,1,1,1, ititiititiiitit xxxyyyyyy −=−=−= −−−−− . Also, denote

~ (~ , , ~ , , ~ , , ~ )y y y y yN T NT= ′11 1 1Λ Λ Λ ,

~ (~ , , ~ , , ~ , , ~ ), ,y y y y yN T N T− − −= ′1 10 0 1 1 1Λ Λ Λ , and

)~,,~,,~,,~(~1,1,1010 ′= −− TNTN xxxxx ΛΛΛ .

Now, if ~ [~ ~]W y x= − 1 , then the LSDV estimator is given by

∃ ( ~ ~) ~γ= ′ ′−W W W y1 .

Clearly, Cov y y v vi t i it i[( ), ( )], ,− −− −1 1 is not equal to zero. This makes LSDV estimator

biased. It is also inconsistent in the direction of N. However, Amemiya (1967) shows that

LSDV for this model is consistent as ∞→T , and, if the errors are normally distributed,

LSDV is asymptotically equivalent to MLE. The standard errors of the LSDV estimates

under the assumption of homoskedasticity are obtained from Var s W W( ∃) ~ ( ~ ~)γ = ′ −2 1 , with

~ ~ ~ / ( )s e e NT N2 2= ′ − − , where ~ (~ ~ ∃)e y W= − γ .

Anderson and Hsiao Instrumental Variable (IV) Estimators

The two IV estimators proposed by Anderson and Hsiao (1981, 1982) both start by

differencing equation (1) to eliminate the individual effect term iµ . This yields

)()()( 1,2,1,2,1,1, −−−−−− −+−+−=− tiittititititiit vvxxyyyy βα

21

Because of correlation between 1, −tiy and 1, −tiv , instruments are needed. Provided itv is

serially uncorrelated, further lagged values of ity can act as necessary instruments.

AH(l)

Anderson and Hsiao (level) estimator uses yi t, − 2 as instrument. Denote

1,2,1,1,2,1,1,~)(and,~)(,~)( −−−−−−− =−=−=− titititititiittiit xxxyyyyyy . Further, let

vec y y y y yit l N T NT(~ ) [~ , , ~ , , ~ , ~ ]= ′12 2 1Λ Λ Λ

]~,~,,~,,~[)~( 1,1,1111 ′= −− TNTNlit xxxxxvec ΛΛΛ

vec y y y y yi t l N T N T(~ ) [~ , , ~ , , ~ , ~ ], , ,− − −= ′1 11 1 1 1 1Λ Λ Λ

vec y y y y yl N T N T( ) [~ , , ~ , , ~ , ~ ], ,− − −= ′2 10 0 1 2 2Λ Λ Λ .

Define ])~()([and],)~()~([ 1,21,1, ltilltiltil xvecyvecZxvecyvecW −−−− == . Then AH(l)

estimator of γ is given by

∃ ( ) (~ )γ= ′ ′−Z W Z vec yl l l it l1 .

The asymptotic standard error under the assumption of homoskedasticity are obtained from

the formula, Var s Z W Z Z W Zl l l l l l l( ∃) ( ) ( )γ = ′ ′ ′− −2 1 1 , where s e e N Tl l l2 1 2= ′ − −( ) / [ ( ) ] with

e vec y Wl it l l= −(~ ) ∃γ. The general heteroskedasticity consistent standard errors are obtained

from ( ) ( ) ( )′ ′ ′ ′− −Z W Z diag e e Z W Zl l l l l l l l1 1 .

AH (d)

In AH(d), the proposed instrument is )( 3,2, −− − titi yy instead of 2, −tiy . Continuing

with the notations above, let

vec y y y y yit d N T NT(~ ) [~ , , ~ , , ~ , ~ ]= ′13 3 1Λ Λ Λ

]~,~,,~,,~[)~( 1,1,12121, ′= −−− TNTNdti xxxxxvec ΛΛΛ

vec y y y y yi t d N T N T(~ ) [~ , , ~ , , ~ , ~ ], , ,− − −= ′1 12 2 1 1 1Λ Λ Λ

22

vec y y y y yi t d N T N T(~ ) [~ , , ~ , , ~ , ~ ], , ,− − −= ′2 11 1 1 2 2Λ Λ Λ , and

vec y y y y yi t d N T N T(~ ) [~ , , ~ , , ~ , ~ ], , ,− − −= ′3 10 0 1 3 3Λ Λ Λ .

Define ])~())~()~([(and],)~()~([ 1,3,2,1,1, ltidtidtiddtidtid xvecyvecyvecZxvecyvecW −−−−− −== .

Then the AH(d) estimator of γ is given by

∃ ( ) (~ )γ= ′ ′−Z W Z vec yd d d it d1 .

The asymptotic standard errors under the assumption of homoskedasticity are obtained from

Var s Z W Z Z W Zd d d d d d d( ∃) ( ) ( )γ = ′ ′ ′− −2 1 1 where s e e N Td d d2 2 2= ′ − −( ) / [ ( ) ] with

e vec y Wd it d d= −(~ ) ∃γ. The general heteroskedasticity consistent standard errors are obtained

from ( ) ( ) ( )′ ′ ′ ′− −Z W Z diag e e Z W Zd d d d d d d d1 1 .

It is apparent from the above details of construction of instrument matrices that if the

total number of available cross-sections is T, and the initial values of the dependent variable

are known or can be estimated, then the numbers of cross-sections actually used by AH(l)

and AH(d) are (T-2) and (T-3) respectively. Thus one cross-section is ‘lost’ when AH(d) is

chosen over AH(l). When the number of available cross-sections is small, this may be an

important issue.

Anderson and Hsiao show that both these IV estimators have desirable asymptotic

properties in the directions of both ∞→N and ∞→T . If itv is normally distributed, then

these estimators are asymptotically equivalent to the ML estimators. Further, these desirable

asymptotic properties of the AH estimators do not depend on the assumptions regarding the

initial values of ity .

3SLS and G3SLS

Dynamic panel data models can be viewed as a simultaneous system of T equations

with equations distinguished by t. Thus, conventional simultaneous equations estimators can

be applied to the model, and Chamberlain (1982, 1983) suggests use of a generalized

version of 3SLS. Like Anderson-Hsiao estimators, simultaneous equations estimators also

23

begin by first differencing the model to eliminate the individual effect term iµ . The system

therefore looks as follows.

)()()( 1,2,1,2,1,1, −−−−−− −+−+−=− TiiTTiTiTiTiTiiT vvxxyyyy βα Μ )()()( 12010112 iiiiiiii vvxxyyyy −+−+−=− βα

However, unlike AH and Arellano estimators, simultaneous equations estimators use only

itx ’s as instruments. If observations on 0iy ’s are not available, then the system may be

completed by adding a specification of 0iy in terms of itx ’s as follows.

iTiTii xxy υφφφ ++++= − 1,0100 Λ

3SLS estimation can proceed the usual way with the weighting variance-covariance

matrix estimated from the residuals from initial 2SLS estimation. For generalized 3SLS, the

required weighting matrix is given by E v v x xi i i i( )0 0′⊗ ′ , where 0iv ’s are residuals obtained

using true values of the parameters. In actual implementation, this is approximated by its

sample analog N v v x xi i i ii

N−

=′⊗ ′∑1

1( ∃∃ ) , where iv ’s are residuals from initial 2SLS or 3SLS

estimation. Thus, G3SLS estimator makes a heteroskedasticity correction to the

conventional 3SLS estimator.

Minimum Distance (Chamberlain)

Most of the dynamic panel data estimators begin by eliminating iµ through first

differencing the model. In this regard, MD estimator is distinctive because instead of

eliminating or ignoring, it tries to specify iµ . This allows MD estimator to work with the

model in its level form.

One simple specification of iµ is that suggested by Mundlak (1971), whereby iµ is

taken to be a linear function of ix , the mean of the exogenous variable for individual i. The

purpose of Mundlak’s specification was however to show that under this specification of iµ ,

24

the GLS estimator under random effects assumption becomes equivalent to the LSDV

estimator under fixed effects assumption.

Noting that Mundlak’s suggested specification is overly restrictive, Chamberlain

suggests a more general specification whereby iµ depends linearly on itx for all time

periods for individual i with varying coefficients. According to this specification,

iTiTiii xxx ωλλλλµ +++++= − 1,12010 Κ ,

with iw being the uncorrelated error term. If the initial values, 0iy ’s, are not available, a

similar specification, as the following, can be used for these as well.

iTiTiii xxxy υφφφφ +++++= − 1,120100 Λ

The procedure starts by recursive substitution for the lagged dependent variable, 1, −tiy . With

T = 5, as is the case with the concrete problem studied in this paper, this results into the

following system of equations:

1001 vyxy +++= µαβ)()( 120

2102 vvyxxy ααµµαβαβ ++++++=

)()( 12

232

03

2102

3 vvvyxxxy ααµααµµαβαββα +++++++++=

)(

)(

13

22

34

320

4321

20

34

vvvv

yxxxxy

αααµαµααµµαβαββαβα

++++++++++++=

)()( 14

23

32

45432

05

4322

13

04

5

vvvvv

yxxxxxy

ααααµαµαµααµµαβαββαβαβα

+++++++++++++++=

In matrix form, this system of equations may be expressed as follows:

25

+

+++++++++

++

+

=

5

4

3

2

1

432

32

2

0

5

4

3

2

4

3

2

1

0

23

2

4

3

2

5

4

3

2

1

1

11

11

0000

000000

uuuuu

y

xxxxx

yyyyy

µ

ααααααααα

α

ααααα

βαββ

βααβ

βαβα

βαβα

βαββαβαβ

β

In this system, the right side variables include, other than 0y and µ , only tx ’s. The tu ’s are

composite error terms consisting of tv ’s. It is now possible to substitute for 0y and µ ,

using their respective specifications above. This yields the following matrix of the reduced

form coefficients:

λ

ααααααα

ααα

φ

ααααα

βαββαβαβαβαββαβα

βαββαβαβ

β

+++++++

+++

+′

+

432

32

2

5

4

3

2

234

23

2

11

11

1

0000000000

,

where ′=φ φ φφ φ φ φ( )0 1 2 3 4 5 , and )( 543210 λλλλλλλ =′ . If tx ’s are strictly exogenous, then

they are also uncorrelated with the tu ’s, and the reduced form equations can be estimated by

applying OLS.

Ignoring the intercept term, the Π -matrix above contains twenty-five elements

which are non-linear functions of twelve underlying coefficients, namely

)( 5432154321 φφφφφλλλλλβα which may be denoted by vector θ′. The last stage of the MD

estimation is to squeeze out estimate of θ from Π by performing the following

minimization:

∃ arg min( ∃ ( )) ( ∃ ( ))θ θ θ= − ′ −vec g A vec gNΠ Π .

26

Here )(θg is the vector-valued function mapping the elements of θ into )(Πvec .

Chamberlain shows that the optimal choice for the weighting matrix 1−NA is the inverse of

Ω Π Π Φ Φ= − − ′⊗ ′− −E y x y x x xi i i i x i i x[( )( ) ( ) ]0 0 1 1 ,

where 0Π is the matrix of true coefficients, and Φ x i iE x x= ′( ) . It may be noted that Ω is a

heteroskedasticity consistent weighting matrix. In actual implementation, it is replaced by its

consistent sample analog

∃ [( ∃ )( ∃ ) ( ) ]Ω Π Π= − − ′⊗ ′− − −∑N y x y x S x x Si ii

N

i i x i i x1 1 1 ,

where S x x Nx i ii

N

= ′=∑ /

1. Minimization can be conducted through an iterative routine. In the

current study, the modified Gauss-Newton algorithm was used for the purpose. Note that if

0iy ’s are known then the φ’s do not appear in the system, and the dimensions of the

estimation problem decreases.

AGMM1 and AGMM2

Extending Anderson-Hsiao’s idea, Arellano proposes using as instruments all further

lagged values of ity that qualify as instruments in view of serial uncorrelatedness of itv . The

number of instruments then differs depending on the time period for which the equation is

considered. Arellano uses the GMM framework to adopt a multi-equation approach and use

all the qualified instruments for each equation. The resulting block diagonal matrix of

instruments, iZ , can be seen as follows:

−−−

=

−−− 2,1,2,2

23

12

01

10

210

10

0

|

|00|00|00

00

0000000000

0000

00000000

TiTiTii

ii

ii

ii

ii

iii

ii

i

i

xxyy

xxxxxx

yy

yyyyy

y

ZΜΜ

ΛΟΜΛΛΛ

ΜΜΜΜΜΜΜΜ

27

Let v denote the 1)1( ×−TN stacked vector of differenced error term )( 1, −− tiit vv , and Z

denote the mTN ×− )1( stacked matrix of the instrument matrices iZ ’s, where m is the

number of columns in the instrument matrix iZ . Then the GMM estimator is given by

∃ arg min( ) ( )γ= ′ ′v Z A Z vN ,

where NA is an appropriate weighting matrix. If y denotes the stacked vector )( 1, −− tiit yy ,

and W denotes the stacked matrix )]()[( 2,1,2,1, −−−− −− titititi xxyy , then the formula for the

GMM estimator from the above minimization is given by

∃ ( )γ= ′ ′ ′ ′−W ZA Z W W ZA Z yN N1 .

The one-step GMM estimator (AGMM1) is obtained by setting A N Z HZN i ii

= ′− −∑( )1 1 ,

where H is an arbitrary square matrix of dimension (T-2).8 In AGMM2, the weighting

matrix NA is replaced by ∃ ( ∃∃ )V N Z v v ZN ii

i i i11 1= ′ ′− −∑ , where iv are residuals from AGMM1

estimation. Thus AGMM2 introduces heteroskedasticity correction to the weighting matrix.

The general heteroskedasticity consistent standard errors of the estimates are obtained from

( ∃ )′ ′ −W ZV Z WN1 , where NV is estimated from the corresponding residuals. The standard

errors for the AGMM1 estiamtes under the assumption of homoskedasticity are given by

( ) ( ∃ )( )′ ′ ′ ′ ′ ′− − −W ZA Z W W ZA V A Z W W ZA Z WN N N N N1

11 1 , where 1NV is estimated using residuals

from AGMM1 estimation.

8 Arellano suggests a H which has 2’s in the main diagonal and –1 in the first sub-diagonals and zeroselsewhere. Note that (T-2), the dimension of this matrix, is the number of time periods for which the equationcan be estimated.

28

Appendix

Tables Showing Details of the Monte Carlo Distributions

Table-A1

Simulation Results for NONOIL with Uncorrelated itνTrue Parameter Values: α = 0.7884 and β = 0.1641

Estimator Parameter Mean ofPE.

StandardDeviation of PE.

Mean ofSE. (PE)

StandardDeviation of SE

(PE)OLS α

β0.90540.2157

0.01490.0235

0.01170.0181

0.00060.0009

LSDV αβ

0.72510.1624

0.02070.0209

0.02020.0209

0.00130.0008

AH(l) αβ

0.95550.1636

11.5350.4714

492.0515.368

142092407.01

AH(d) αβ

0.79170.1647

0.06520.0326

0.08440.0338

0.01170.0023

AGMM1 αβ

0.70410.1866

0.20170.2401

0.03240.0381

0.01690.0178

AGMM2 αβ

0.71230.1705

0.22060.2783

0.03350.0397

0.02160.0220

2SLS αβ

0.71500.1603

0.06140.0278

0.09600.0319

0.01140.0015

3SLS αβ

0.75320.1638

0.05720.0225

0.04810.0212

0.01070.0017

G3SLS αβ

0.74120.1620

0.06270.0286

0.04120.0172

0.00900.0024

MD αβ

0.73540.1644

0.02400.0218

0.03580.0202

0.00880.0023

Notes:1. PE = Parameter Estimate; SE = Standard Error of Estimate2. No. of replications = 1000

29

Table-A2

Simulation Results for INTER with Uncorrelated itνTrue Parameter Values: α = 0.7925 and β = 0.1732

Estimator Parameter Mean ofPE.

StandardDeviation of PE.

Mean ofSE. (PE)

StandardDeviation of SE

(PE)OLS α

β0.91310.1936

0.01610.0253

0.01140.0184

0.00070.0019

LSDV αβ

0.72620.1741

0.02450.0215

0.02310.0215

0.00170.0009

AH(l) αβ

0.20010.1405

10.5760.4778

492.0410.086

7704.5167.80

AH(d) αβ

0.79420.1721

0.04280.0351

0.05840.0356

0.00600.0023

AGMM1 αβ

0.44060.1603

0.37410.2560

0.05680.0340

0.03410.0169

AGMM2 αβ

0.41790.1785

0.50810.3251

0.06500.0382

0.05810.0325

2SLS αβ

0.76800.1778

0.03240.0334

0.05250.0346

0.00420.0016

3SLS αβ

0.74640.1766

0.06060.0283

0.05320.0217

0.00980.0022

G3SLS αβ

0.72690.1766

0.06770.0307

0.04250.0165

0.00950.0030

MD αβ

0.73800.1749

0.02470.0217

0.03180.0189

0.00910.0027

Notes:1. PE = Parameter Estimate; SE = Standard Error of Estimate2. No. of replications = 1000

30

Table-A3

Simulation Results for OECD with Uncorrelated itνTrue Parameter Values: α = 0.6294 and β = 0.0954

Estimator Parameter Mean ofPE.

StandardDeviation of PE.

Mean ofSE. (PE)

StandardDeviation of SE

(PE)OLS α

β0.76440.1908

0.03770.0587

0.02230.0350

0.00280.0045

LSDV αβ

0.61950.0940

0.01930.0382

0.01880.0389

0.00220.0030

AH(l) αβ

0.62810.0927

0.03300.0573

0.05670.0581

0.00930.0052

AH(d) αβ

0.63290.0938

0.04560.0619

0.06730.0648

0.01060.0069

AGMM1 αβ

0.56590.0884

0.14050.2322

0.03580.0455

0.02160.0280

AGMM2 αβ

0.57550.0789

0.19980.2922

0.03990.0510

0.02820.0394

2SLS αβ

0.51100.0946

0.09660.0556

0.20890.1104

0.04960.0094

3SLS αβ

0.54920.0905

0.16010.0639

0.07310.0432

0.03430.0077

G3SLS αβ

0.50440.0818

0.22540.1131

0.02760.0153

0.01790.0071

MD αβ

0.62140.0948

0.01720.0384

0.01140.0184

0.01670.0134

Notes:1. PE = Parameter Estimate; SE = Standard Error of Estimate2. No. of replications = 1000

31

Table-A4

Simulation Results for NONOIL with MA(1) itνTrue Parameter Values: α = 0.7884 and β = 0.1641

Estimator Parameter Mean ofPE.

StandardDeviation of PE.

Mean ofSE. (PE)

StandardDeviation of SE

(PE)OLS α

β0.90380.2167

0.01560.0239

0.01200.0186

0.00060.0009

LSDV αβ

0.72410.1630

0.02410.0251

0.02150.0224

0.00130.0009

AH(l) αβ

0.27830.0918

5.16620.3746

41.2472.3581

184.5514.054

AH(d) αβ

0.67430.1605

0.06360.0328

0.07760.0322

0.00930.0017

AGMM1 αβ

0.70680.1878

0.20110.2481

0.03100.0389

0.01370.0209

AGMM2 αβ

0.70850.1727

0.21640.2783

0.03170.0404

0.01440.0279

2SLS αβ

0.71500.1597

0.06630.0299

0.09900.0329

0.01250.0015

3SLS αβ

0.76210.1638

0.05620.0249

0.05430.0238

0.01030.0018

G3SLS αβ

0.74560.1601

0.06840.0264

0.04610.0189

0.00910.0023

MD αβ

0.73410.1630

0.02800.0253

0.04150.0225

0.00980.0026

Notes:1. 1.PE = Parameter Estimate; SE = Standard Error of Estimate2. No. of replications = 1000

32

Table-A5

Simulation Results for INTER with MA(1) itνTrue Parameter Values: α = 0.7925 and β = 0.1732

Estimator Parameter Mean ofPE.

StandardDeviation of PE.

Mean ofSE. (PE)

StandardDeviation of SE

(PE)OLS α

β0.91290.1935

0.01610.0253

0.01140.0184

0.00070.0019

LSDV αβ

0.71920.1756

0.02450.0215

0.02310.0215

0.00170.0009

AH(l) αβ

0.97030.1717

1.86580.0945

8.76330.4108

24.0201.1130

AH(d) αβ

0.71690.1713

0.08050.0367

0.09810.0360

0.01520.0023

AGMM1 αβ

0.39990.2015

0.39680.2649

0.06350.0417

0.04040.0255

AGMM2 αβ

0.40000.2114

0.54660.3534

0.07340.0479

0.06720.0426

2SLS αβ

0.76790.1767

0.03570.0358

0.05510.0363

0.00450.0018

3SLS αβ

0.72950.1771

0.06170.0268

0.06060.0245

0.01250.0022

G3SLS αβ

0.71280.1770

0.07500.0318

0.04960.0185

0.01200.0028

MD αβ

0.72960.1740

0.02830.0262

0.03790.0215

0.01120.0030

Notes:1. PE = Parameter Estimate; SE = Standard Error of Estimate2. No. of replications = 1000

33

Table-A6

Simulation Results for OECD with MA(1) itνTrue Parameter Values: α = 0.6294 and β = 0.0954

Estimator Parameter Mean ofPE.

StandardDeviation of PE.

Mean ofSE. (PE)

StandardDeviation of SE

(PE)OLS α

β0.76060.1907

0.03750.0573

0.02210.0345

0.00280.0045

LSDV αβ

0.61890.0947

0.02000.0428

0.01960.0405

0.00230.0032

AH(l) αβ

0.62940.0995

0.03330.0580

0.05620.0579

0.00890.0042

AH(d) αβ

0.62210.0930

0.04440.0574

0.06290.0609

0.00960.0065

AGMM1 αβ

0.57540.0903

0.12680.2269

0.03470.0457

0.01810.0288

AGMM2 αβ

0.57800.0764

0.17560.2708

0.03800.0499

0.02720.0352

2SLS αβ

0.52970.0917

0.08950.0521

0.21020.1096

0.05030.0087

3SLS αβ

0.55250.0997

0.06140.0618

0.07460.0442

0.02930.0068

G3SLS αβ

0.52560.0877

0.21280.1057

0.02600.0154

0.02900.0103

MD αβ

0.62220.0948

0.01840.0425

0.01200.0188

0.01400.0200

Notes:1. PE = Parameter Estimate; SE = Standard Error of Estimate2. No. of replications = 1000

34

Table-A7

Simulation Results for NONOIL with AR(1) itνTrue Parameter Values: α = 0.7884 and β = 0.1641

Estimator Parameter Mean ofPE.

StandardDeviation of PE.

Mean ofSE. (PE)

StandardDeviation of SE

(PE)OLS α

β0.90470.2160

0.01520.0234

0.01190.0185

0.00060.0009

LSDV αβ

0.72600.1636

0.02460.0253

0.02140.0224

0.00130.0009

AH(l) αβ

0.63990.1916

38.9282.6155

1998.8125.10

4650003333.0

AH(d) αβ

0.66280.1576

0.05950.0307

0.07240.0312

0.00780.0015

AGMM1 αβ

0.70500.1879

0.19330.2372

0.03060.0377

0.01430.0178

AGMM2 αβ

0.70820.1882

0.21310.2699

0.03140.0393

0.01700.0209

2SLS αβ

0.72060.1610

0.06640.0286

0.09850.0329

0.01200.0015

3SLS αβ

0.76210.1638

0.05620.0249

0.05430.0238

0.01030.0018

G3SLS αβ

0.74560.1601

0.06840.0264

0.04610.0189

0.00910.0023

MD αβ

0.73770.1628

0.02880.0259

0.04290.0224

0.01110.0026

Notes:1. PE = Parameter Estimate; SE = Standard Error of Estimate2. No. of replications = 1000

35

Table-A8

Simulation Results for INTER with AR(1) itνTrue Parameter Values: α = 0.7925 and β = 0.1732

Estimator Parameter Mean ofPE.

StandardDeviation of PE.

Mean ofSE. (PE)

StandardDeviation of SE

(PE)OLS α

β0.91460.1925

0.01540.0237

0.01150.0186

0.00060.0010

LSDV αβ

0.72880.1751

0.02720.0246

0.02380.0223

0.00170.0009

AH(l) αβ

0.69810.1629

5.73150.3521

87.7773.5420

784.7143.393

AH(d) αβ

0.71340.1714

0.07020.0313

0.08710.0332

0.01260.0020

AGMM1 αβ

0.44870.2189

0.39290.2444

0.06040.0387

0.04230.0299

AGMM2 αβ

0.44070.2330

0.49960.3093

0.06650.0428

0.05410.0333

2SLS αβ

0.77040.1776

0.03330.0331

0.05320.0350

0.00410.0016

3SLS αβ

0.75070.1893

0.05760.0263

0.05910.0221

0.01550.0020

G3SLS αβ

0.72300.1882

0.07150.0378

0.04880.0165

0.01750.0032

MD αβ

0.73920.1732

0.02770.0250

0.03610.0202

0.01070.0029

Notes:1. 1.PE = Parameter Estimate; SE = Standard Error of Estimate2. No. of replications = 1000

36

Table-A9

Simulation Results for OECD with AR(1) itνTrue Parameter Values: α = 0.6294 and β = 0.0954

Estimator Parameter Mean ofPE.

StandardDeviation of PE.

Mean ofSE. (PE)

StandardDeviation of SE

(PE)OLS α

β0.76290.1913

0.03780.0553

0.02210.0346

0.00280.0044

LSDV αβ

0.62080.0934

0.02080.0417

0.01920.0400

0.00220.0032

AH(l) αβ

0.63190.0965

0.03370.0586

0.05550.0571

0.00870.0049

AH(d) αβ

0.61920.0950

0.04590.0593

0.06490.0630

0.00950.0061

AGMM1 αβ

0.57700.0929

0.13990.2161

0.03500.0453

0.02000.0288

AGMM2 αβ

0.57620.0968

0.18770.2710

0.03780.0495

0.02510.0384

2SLS αβ

0.52150.0917

0.09640.0551

0.21360.1105

0.05250.0090

3SLS αβ

0.55290.0869

0.12750.0782

0.07780.0446

0.03370.0061

G3SLS αβ

0.54490.0973

0.16430.1427

0.02740.0165

0.01570.0079

MD αβ

0.62190.0966

0.01850.0435

0.01200.0191

0.02380.0161

Notes:1. PE = Parameter Estimate; SE = Standard Error of Estimate2. No. of replications = 1000

37

References

Amemiya, T. (1967), “A Note on the Estimation of Balestra-Nerlove Models,” Technical

Report No. 4, Institute of Mathematical Studies in Social Sciences, Stanford

University.

Amemiya, T. (1971), “Estimation of the Variance in a Variance-Component Model,”

International Economic Review, 12:1-13.

Anderson, T. W. and C. Hsiao (1981), “Estimation of Dynamic Models with Error

Components,” Journal of American Statistical Association, 76:598-606.

Anderson, T. W. and C. Hsiao (1982), “Formulation and Estimation of Dynamic Models

Using Panel Data,” Journal of Econometrics, 18:47-82.

Arellano, M. (1989a), “On the Efficient Estimation of Simultaneous Equations with

Covariance Restrictions,” Journal of Econometrics, 42:247-265.

Arellano, M. (1989b), “An Efficient GLS Estimation of Triangular Models with

Covariance Restrictions,” Journal of Econometrics, 42:267-273.

Arellano, M. and S. Bond (1991), “Some Tests of Specification for Panel Data: Monte

Carlo Evidence and an Application to Employment Equations,” The Review of

Economic Studies, 58:277-297.

Balestra, P. and M. Nerlove (1966), “Pooling Cross-section and Time Series Data in the

Estimation of a Dynamic Model: The Demand of Natural Gas,” Econometrica,

34:585-612.

Bhargava, A. and J. D. Sargan (1983), “Estimating Dynamic Random Effects Models

from Panel Data Covering Short Time Periods,” Econometrica, 51:1635-1659.

38

Chamberlain, G. (1982), “Multivariate Regression Models for Panel Data,” Journal of

Econometrics, 18:5-46.

Chamberlain, G. (1983), “Panel Data,” in Z. Griliches and M. Intrilligator (editors),

Handbook of Econometrics, Vol. II., 1247-1318.

Mankiw, N. G., D. Romer, and D. Weil (1992), “A Contribution to the Empirics of

Growth,” Quarterly Journal of Economics, CVII: 407-437.

Nerlove, M. (1967), “Experimental Evidence on the Estimation of Dynamic Economic

Relations from a Time Series of Cross-sections, Economic Studies Quarterly,

18:42-74.

Nerlove, M. (1971), “Further Evidence on the Estimation of Dynamic Economic

Relations from a Time Series of Cross-sections,” Econometrica, 39:383-396.

Nickel, S. (1979), “Biases in Dynamic Models with Fixed Effects,” Econometrica,

49:1399-1416.

Sevestre, P. and Trognon, A. (1982), “A Note on Autoregressive Error Components

Models,” 8209, Ecole Nationale de la Statistique et de l’Administration

Economique et Unite de Recherche.

Summers, R. and A. Heston (1988), “A New Set of International Comparisons of Real

Product and Price Levels Estimates for 130 Countries, 1950-85,” Review of

Income and Wealth, XXXIV: 1-26.

Summers, R. and A. Heston (1991), “The Penn World Table (Mark 5): An Expanded Set

of International Comparisons, 1950-1988,” Quarterly Journal of Economics,

106: 327-368.

39

Abstract

This paper investigates the small sample properties of dynamic panel data estimators

as applied for estimation of growth convergence equation using Summers-Heston data set.

The Monte Carlo results show that estimates from different estimators do vary. Second,

those dynamic panel estimators which do not use further lagged dependent variable as

instruments perform better than the ones which do. These results together suggest that

investigators using panel estimators should try to check into the possible small sample bias

of their results. (JEL classification: C3, O4)