

J. Japan Statist. Soc. Vol. 40 No. 1 2010 63–95

AGGREGATION IN LINEAR MODELS FOR PANEL DATA

David Veredas* and Alexandre Petkovic**

We study the impact of individual and temporal aggregation in linear static and dynamic models for panel data in terms of i) model specification, ii) efficiency of the estimated parameters, and iii) the choice of the aggregation scheme. Model-wise, we find that i) individual aggregation does not affect the model structure but temporal aggregation may introduce residual autocorrelation, and ii) individual aggregation entails heteroscedasticity while temporal aggregation does not. Regarding estimation and the aggregation scheme, we find that i) in the static model, estimation by least squares with the aggregated data entails a decrease in the efficiency of the estimated parameters and no aggregation scheme dominates in terms of efficiency, and ii) in the dynamic model, estimation with the aggregated data by GMM does not necessarily entail a decrease in the efficiency of the estimated parameters under individual aggregation, and no analytic comparison can be established for temporal aggregation, though simulations suggest that temporal aggregation deteriorates the accuracy of the estimates.

Key words and phrases: Efficiency, model specification, panel data, temporal aggregation.

1. Introduction

It is often the case that the researcher holds a database consisting of a panel in which, for the variable of interest, individuals are aggregated in groups and time frequency is low. We will denote this panel aggregated data. However, the researcher's concern would actually be better addressed with a database on individual observations collected with a higher frequency. We will denote this alternative panel disaggregated data. The lack of knowledge on the disaggregated data is quite common in many policy areas for, at least, two reasons: cost and confidentiality.

International agencies (e.g. the World Bank and the UN) finance regular household consumption or expenditure surveys conducted in developing countries. These surveys cannot be conducted annually simply because they are too costly to implement. This clearly influences the frequency of the observations available to track the well-being of a population.

The second reason for lack of knowledge of the disaggregated data is confidentiality. As pointed out by Schmidt and Schneeweiss (2009), if anonymity issues arise, surveying agencies may prefer to transform the data before releasing them. For example, industry-level corruption indexes based on surveys are often aggregated by sub-regions on the grounds of confidentiality. Any release of country-specific information on that industry could easily be associated with the specific firm providing the monopolistic service. Without the protection of confidentiality, very few companies working in these industries would be willing to reveal whether they had to pay bribes to get the right to deliver service in any specific country (for details, see for instance www.worldbank.org/wbi/governance).

Received February 10, 2010. Revised August 7, 2010. Accepted August 20, 2010.

*ECARES, Solvay Brussels School of Economics and Management, Université libre de Bruxelles, Brussels, Belgium; the authors are also members of ECORE, the recently created association between CORE and ECARES. Email: [email protected]

**School of Fundamental Science and Engineering, Waseda University, Tokyo, Japan. Email: [email protected]

The first articles on temporal aggregation date back to the late 1960s and early 1970s (Tesler (1967), Zellner and Montmarquette (1971), Sims (1971), and, more recently, Geweke (1987)). The first paper to address temporal aggregation in time series models is Amemiya and Wu (1972). They show that, if the original variable is generated by an AR model of order p, the aggregated variable follows an AR model of order p with an MA residual structure. Brewer (1973), Wei (1978), Tiao (1972), Weiss (1984), Drost and Nijman (1993) and Jorda and Marcellino (2002) extend these results to more general ARMA structures with unit roots, seasonality and random aggregation, as well as GARCH models. Silvestrini and Veredas (2008) survey temporal aggregation in univariate and multivariate ARIMA-GARCH time series models.

The literature on individual aggregation dates back even further (Theil (1954), Zellner (1962), Chambers (1973)). Granger (1980) analyzes the impact of individual aggregation in ARMA models. Granger (1987) studies the aggregation of static micromodels that depend on two regressors: one individual-specific and one common to all individuals. More recently, Garret (2003) analyzes the impact of aggregation on parameter inference, and Pesaram (2003) views aggregation as a forecasting problem where the aim is optimal prediction of the aggregated variable.

The above literature on aggregation can be classified in different ways: in terms of the model (time series, cross-sectional or panel), of the aggregation scheme (temporal, individual or both), or of the effect of aggregation (on the specification of the model, on the estimated parameters, or on forecasting). Most of the articles focus on i) aggregation of time series and cross-sectional models, ii) temporal or individual aggregation, and iii) the specification of the model and forecasting. However, the effect of aggregation in panel data models has not been investigated. Further, little is known about the effect of aggregation on parameter estimation. Last, nothing has been said about the comparison between different aggregation schemes.

We study the consequences of temporal and individual aggregation in dynamic and static fixed and random effects panel data models. In particular, we address three issues. The first is the specification of the model: we study how aggregation affects it, and we show that the consequences of aggregation for the specification of cross-sectional and time series models carry over to panel data models. The second is the inferential consequences of aggregation: we analyze how data aggregation affects the estimates of the model. Third, given that aggregation generally leads to efficiency losses, we analyze the aggregation schemes that minimize the efficiency loss of the estimates.

In terms of model specification we find that, while temporal aggregation does not have any influence in the static case, it generally affects the structure of the model in the dynamic setting, as it generally entails MA components. These findings hold regardless of whether the model has fixed or random effects. Individual aggregation, by contrast, typically does not entail changes in model specification, but it may induce heteroscedasticity.

Regarding estimation and the aggregation scheme, we find that aggregation affects the properties of the estimates. Parameters are estimated mainly by two methods: least squares and GMM. We show that if parameters are estimated by least squares (the method we opt for in the static model), the information loss incurred by aggregation reduces the efficiency of the estimates, and we are able to compute this loss explicitly. We also show that theoretically there is no aggregation scheme that dominates in terms of efficiency loss. Yet, the analytical results show that some aggregation schemes perform better than others depending on the stochastic process driving the exogenous variables. This intuition is confirmed through a simulation study. If parameters are estimated by GMM (Arellano-Bond, the method we opt for in the dynamic model), an explicit comparison is not possible but results can be worked out for individual and temporal aggregation separately. Surprisingly, it is not possible to show theoretically that individual aggregation increases the variance of the estimator. On the other hand, temporal aggregation modifies the number and the form of the moment conditions used in GMM, and the analytic comparison between the disaggregated and aggregated variances is infeasible. However, the simulation study suggests that temporal aggregation increases the variance of the estimators.

The paper is organized as follows: Section 2 introduces notation and definitions, Section 3 studies the effect of aggregation on model specification, and Section 4 introduces further notation and analyzes the effect of aggregation on parameter estimation and the optimal choice of the aggregation schemes. Section 5 concludes. Proofs and preparatory lemmas are relegated to the Appendix.

2. Notation and definitions

Let $y_{i,t} \in \mathbb{R}$ be the realization of the random variable of interest observed at time $t = 1, \ldots, N$ for individual $i = 1, \ldots, I$. Likewise for the exogenous regressor $x_{i,t} \in \mathbb{R}$. We assume that the relation between $y_{i,t}$, its own past and $x_{i,t}$ is given by a dynamic linear model with either fixed or random effects:

$$y_{i,t} = \alpha_i + \gamma y_{i,t-1} + \beta x_{i,t} + u_{i,t}, \tag{2.1}$$

$$y_{i,t} = \mu + \gamma y_{i,t-1} + \beta x_{i,t} + \alpha_i + u_{i,t}, \tag{2.2}$$

where $u_{i,t}$, $\forall i, t$, are independent and identically distributed random variables with zero mean and variance $\sigma_u^2 \in\, ]0, +\infty[$, and $\gamma \in\, ]-1, 1[$, $\mu \in \mathbb{R}$, and $\beta \in \mathbb{R}$. In (2.1) the individual effects $\alpha_i \in \mathbb{R}$ are fixed, and hence they are parameters to be estimated, while in (2.2) they are assumed to be independent and identically distributed random variables with zero mean and variance $\sigma_\alpha^2 \in\, ]0, +\infty[$.

Sample information for $y_{i,t}$ is assumed to be available only every $m$ periods $(m, 2m, 3m, \ldots)$, where $m$, an integer larger than one, is the aggregation frequency, and in $A$ groups of individuals.

Individual aggregation is of the form:

$$y_{a,t} = \left(\sum_{i=1}^{I} M_i^a\right) y_{i,t} = B(a)\, y_{i,t},$$

where the sum is understood to act on $y_{i,t}$, that is, $y_{a,t} = \sum_{i=1}^{I} M_i^a y_{i,t}$. Here $a = 1, \ldots, A$ refers to the group and $M_i^a$ is the weight, assumed to be known, associated with individual $i$, being zero if the individual does not belong to group $a$. Group $a$ is composed of $I_a$ individuals, and the sizes of the $A$ groups can differ but are known. The use of this aggregation scheme is justified by the fact that individuals are not sequentially ordered when they are aggregated. We assume that an individual cannot belong to different groups, i.e. $\sum_{i=1}^{I} M_i^a M_i^{a'} = 0$ $\forall a' \neq a$. This however can be relaxed. This scheme embeds some important aggregation schemes: i) Flow: $M_i^a = 1$ or $0$ with $\sum_{a=1}^{A} \sum_{i=1}^{I} M_i^a = I$. That is, the sum of the individuals belonging to group $a$. ii) Stock: $M_i^a = 1$ or $0$ and $\sum_{i=1}^{I} M_i^a = 1$. That is, groups with only one individual, selected according to some pre-specified criteria. iii) Per capita: $M_i^a = 1/I_a$ or $0$ and $\sum_{a=1}^{A} \sum_{i=1}^{I} M_i^a = A$. That is, the total sum of individuals is divided by the number of individuals in the group.
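The three individual schemes can be illustrated with a small numerical sketch. The group layout below (five individuals in two groups) and the choice of the first member of each group as the stock representative are assumptions made purely for illustration; they are not taken from the paper.

```python
import numpy as np

# Hypothetical layout: I = 5 individuals in A = 2 groups.
# groups[i] is the group of individual i (each individual is in exactly one group).
groups = np.array([0, 0, 1, 1, 1])
A, I = 2, groups.size

# A x I membership matrix with entries 0/1.
membership = (groups[None, :] == np.arange(A)[:, None]).astype(float)

M_flow = membership                                                # M_i^a = 1 or 0
M_percapita = membership / membership.sum(axis=1, keepdims=True)   # M_i^a = 1/I_a or 0

# Stock: keep one pre-selected individual per group (here, the first member).
M_stock = np.zeros((A, I))
for a in range(A):
    M_stock[a, np.flatnonzero(groups == a)[0]] = 1.0

y_t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # y_{i,t} for a fixed t
y_flow = M_flow @ y_t            # group sums
y_percapita = M_percapita @ y_t  # group means
y_stock = M_stock @ y_t          # selected individuals
```

The rows of each weight matrix are mutually orthogonal, which is the non-overlap condition $\sum_i M_i^a M_i^{a'} = 0$ stated above.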

Temporal aggregation is of the form:

$$\bar{y}_{i,t} = \left(\sum_{j=0}^{m-1} w_j L^j\right) y_{i,t} = T(L)\, y_{i,t},$$

where $L$ is the lag operator and $w_j$ is the weight, assumed to be known, attached to the observation at time $t - j$. This scheme embeds aggregation schemes similar to the ones of individual aggregation: i) Flow: $w_j = 1$, for $j = 0, \ldots, m-1$. That is, aggregation of $y_{i,t}$ is carried out over $m$ periods. ii) Stock: $w_0 = 1$ and $w_j = 0$ for $j = 1, \ldots, m-1$. One out of every $m$ observations is kept, the rest being skipped. iii) Average: $w_j = \frac{1}{m}$, for $j = 0, \ldots, m-1$. iv) Weighted average: $w_j = \chi_j$, for $j = 0, \ldots, m-1$, where the weights $\chi_j$ sum to one. Note that the flow, averaging and weighted averaging aggregation schemes are rolling sums. In other words, $\bar{y}_{i,t}$ is indexed by $t$, which means a sequence of sums that overlap over $m-1$ periods. However, the aggregated data do not overlap. To indicate the aggregated data we introduce another time scale, $T$, that runs in multiples of $m$, so that $t = 1, 2, \ldots, N$, while $T = m, 2m, \ldots, m\tau$. Thus, $\tau = N/m$ and we subindex the temporally aggregated series by $T$: $y_{i,T} = \bar{y}_{i,mt}$.
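As a minimal sketch of the distinction between the rolling sums and the non-overlapping aggregated series, the following applies $T(L)$ to a short series and then keeps only the values at $T = m, 2m, \ldots$. The helper name `temporal_aggregate` and the toy series are hypothetical.

```python
import numpy as np

def temporal_aggregate(y, w):
    """Apply T(L) = sum_j w_j L^j and keep every m-th value (T = m, 2m, ...)."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    m = w.size
    # np.convolve computes sum_j w[j] * y[t - j]; mode="valid" drops incomplete
    # windows, so entry k is the rolling sum at time t = m + k (1-based).
    rolling = np.convolve(y, w, mode="valid")
    return rolling[::m]   # non-overlapping aggregates

y = np.arange(1.0, 9.0)   # y_1, ..., y_8, so N = 8
m = 4                     # tau = N / m = 2 aggregated observations

flow = temporal_aggregate(y, np.ones(m))                    # sums over each block
stock = temporal_aggregate(y, np.r_[1.0, np.zeros(m - 1)])  # w_0 = 1: keeps y_4 and y_8
average = temporal_aggregate(y, np.full(m, 1.0 / m))        # block means
```

Here the rolling series has $N - m + 1$ entries, while each aggregated series has only $\tau$ entries.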

Combining these two aggregation schemes, we compute the aggregated data as:

$$y_{a,T} = T(L) B(a)\, y_{i,t},$$

and likewise for $x_{i,t}$ and $u_{i,t}$. We denote $T(L)B(a)$ as the General Static Aggregation Scheme (henceforth GSAS) that we use in the static model ($\gamma = 0$). The term General stands for simultaneous temporal and individual aggregation.

For the dynamic model, since the dependent variable is present on both sides of the model, we first express (2.1) and (2.2) in terms of the lag operator:

$$(1 - \gamma L) y_{i,t} = \alpha_i + \beta x_{i,t} + u_{i,t}, \tag{2.3}$$

$$(1 - \gamma L) y_{i,t} = \mu + \beta x_{i,t} + \alpha_i + u_{i,t}. \tag{2.4}$$

We also apply the aggregation scheme $(1 - \gamma^m L^m)(1 - \gamma L)^{-1} T(L) B(a)$, which we denote as the General Dynamic Aggregation Scheme (henceforth GDAS) following Brewer (1973). So, for instance, the aggregated left-hand side of the model is

$$\left(\frac{1 - \gamma^m L^m}{1 - \gamma L}\right) T(L) B(a) (1 - \gamma L) y_{i,t} = (1 - \gamma^m L^m) y_{a,T}.$$

The first term of the left-hand side is a ratio of two polynomials. The denominator contains the inverted roots of the AR part of the model and the numerator contains the same roots, raised to the power of the aggregation frequency. This term ensures that the only powers of the lag operator $L$ appearing in the model for the aggregated data are multiples of $m$, as can be seen on the right-hand side.
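The ratio of polynomials is itself a finite polynomial, $1 + \gamma L + \cdots + \gamma^{m-1} L^{m-1}$, which is why multiplying it by $(1 - \gamma L)$ leaves only $1 - \gamma^m L^m$. A quick numerical check of this identity, with illustrative values of $\gamma$ and $m$ chosen here and not taken from the paper:

```python
import numpy as np
from numpy.polynomial import polynomial as P

gamma, m = 0.7, 4   # illustrative values

# Coefficient arrays are ordered by increasing powers of L.
ar = np.array([1.0, -gamma])           # 1 - gamma L
ratio = gamma ** np.arange(m)          # 1 + gamma L + ... + gamma^{m-1} L^{m-1}
product = P.polymul(ratio, ar)         # (ratio) * (1 - gamma L)

expected = np.zeros(m + 1)
expected[0], expected[m] = 1.0, -gamma ** m   # 1 - gamma^m L^m
```

All intermediate powers of $L$ cancel, so `product` and `expected` agree up to rounding.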

3. The effect of aggregation on model specification

In this short section we study the effect on model specification of observing $y_{a,T}$ rather than $y_{i,t}$. In other words, if we know the model for $y_{i,t}$, how is the specification of the model affected by the fact that we only observe $y_{a,T}$? We first study the static case, followed by the dynamic model.

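3.1. Static model

We consider a static fixed effects model, i.e. (2.1) with $\gamma = 0$. Applying GSAS on both sides of the model:

$$y_{a,T} = \left(\sum_{j=0}^{m-1} w_j\right) \alpha_a + x_{a,T} \beta + u_{a,T},$$

where $\alpha_a = \left(\sum_{i=1}^{I} M_i^a \alpha_i\right)$, and $u_{a,T}$ is independent with $E(u_{a,T}) = 0$ and

$$\mathrm{Var}(u_{a,T}) = \sigma_u^2 \left(\sum_{i=1}^{I} (M_i^a)^2\right) \left(\sum_{j=0}^{m-1} w_j^2\right).$$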

This result leads to the following conclusions (similar to those in Theil (1954), and Zellner and Montmarquette (1971)). First, the slope parameter $\beta$ is invariant to temporal aggregation. This result is not surprising since we assume that the slopes are constant through time and individuals. Second, the fixed effects are affected by aggregation in three ways: i) the number of fixed effects reduces to the number of aggregated groups, from $I$ to $A$, ii) temporal aggregation entails a re-scaling of the same magnitude for all the effects, iii) $\alpha_a$ is a weighted sum of the disaggregated effects. Third, the aggregated error is not iid as it shows heteroscedasticity, unless $\sum_{i=1}^{I} (M_i^a)^2 = \sum_{i=1}^{I} (M_i^{a'})^2$, $\forall a, a'$. This equality may hold in different scenarios. The easiest one is when individuals are aggregated in groups of equal size ($I_a = I_{a'}$) and the weights are homogeneous across groups. The heteroscedasticity increases with more heterogeneous group sizes, more heterogeneous weights across groups, or both.
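To make the heteroscedasticity concrete, the variance formula for $u_{a,T}$ can be evaluated for two groups of unequal size. The group layout, $m = 3$ and $\sigma_u^2 = 1$ are assumptions chosen for illustration only.

```python
import numpy as np

sigma2_u = 1.0
w = np.ones(3)                                 # flow over m = 3 periods
M = np.array([[1.0, 1.0, 0.0, 0.0, 0.0],       # group 1: two individuals
              [0.0, 0.0, 1.0, 1.0, 1.0]])      # group 2: three individuals

def var_u_agg(M, w, sigma2_u):
    """Var(u_{a,T}) = sigma_u^2 * sum_i (M_i^a)^2 * sum_j w_j^2, one value per group."""
    return sigma2_u * (M ** 2).sum(axis=1) * (w ** 2).sum()

var_flow = var_u_agg(M, w, sigma2_u)                                    # flow weights
var_percapita = var_u_agg(M / M.sum(axis=1, keepdims=True), w, sigma2_u)  # 1/I_a weights
```

With these inputs the group variances are 6 and 9 under flow weights and 1.5 and 1 under per capita weights, so in both cases the aggregated errors have group-specific variances because the group sizes differ.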

We now consider a static random effects model, i.e. (2.2) with $\gamma = 0$. Applying GSAS on both sides of the model:

$$y_{a,T} = \left(\sum_{j=0}^{m-1} w_j\right) \mu_a + x_{a,T} \beta + \alpha_a + u_{a,T},$$

where $\mu_a = \left(\sum_{i=1}^{I} M_i^a\right) \mu$, $u_{a,T}$ is the same as in the fixed effects model, $E(\alpha_a) = 0$, and

$$\mathrm{Var}(\alpha_a) = \sigma_\alpha^2 \left(\sum_{i=1}^{I} (M_i^a)^2\right) \left(\sum_{j=0}^{m-1} w_j\right)^2.$$

As in the fixed effects model, the slope parameter $\beta$ is invariant to aggregation, and individual aggregation causes heteroscedasticity in the aggregated error terms $\alpha_a$ and $u_{a,T}$ unless $\sum_{i=1}^{I} (M_i^a)^2 = \sum_{i=1}^{I} (M_i^{a'})^2$, $\forall a, a'$. The aggregated intercept is group-specific unless $\sum_{i=1}^{I} M_i^a = \sum_{i=1}^{I} M_i^{a'}$, $\forall a, a'$. That is, by aggregating individuals in a random effects model, both random and fixed effects appear.

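3.2. Dynamic model

We consider the dynamic fixed effects model (2.3) and the GDAS. Similar to the static case, we premultiply the model by the aggregation scheme:

$$y_{a,T} = \left(\sum_{j=0}^{m-1} \gamma^j\right) \left(\sum_{j=0}^{m-1} w_j\right) \alpha_a + \gamma^m y_{a,T-m} + x_{a,T} \beta + \cdots + x_{a,T-m+1} \beta \gamma^{m-1} + \eta_{a,T},$$

where $\alpha_a = \left(\sum_{i=1}^{I} M_i^a \alpha_i\right)$ and $\eta_{a,T} = u_{a,T} + \gamma u_{a,T-1} + \cdots + \gamma^{m-1} u_{a,T-m+1}$ is a zero mean error term with variance

$$\sigma_\eta^2 = \sigma_u^2 \left(\sum_{i=1}^{I} (M_i^a)^2\right) \left[ w_0^2 + \sum_{p=1}^{m-1} \left( \left(\sum_{j=0}^{p} w_j \gamma^{p-j}\right)^2 + \left(\sum_{j=p}^{m-1} w_j \gamma^{m-1+p-j}\right)^2 \right) \right].$$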

This result leads to the following conclusions. First, the fixed effects are affected by aggregation in the same way as in the static model (the term in $\gamma$ has the same consequences as the term in $w_j$). Second, temporal aggregation creates a dependence on past aggregated exogenous variables. However, the dependence is constrained: the coefficients attached to the lagged aggregated exogenous variables are nonlinear functions of $\beta$ and $\gamma$. These results can be seen as a generalization of those derived in Palm and Nijman (1984) and Tilak (1998, 2000). Only if the temporal aggregation scheme is stock (where $w_0 = 1$ and $w_j = 0$ $\forall j > 0$) does the dependence on lagged aggregated exogenous variables disappear. Third, similarly to Amemiya and Wu (1972), temporal aggregation entails autocorrelation of order one in $\eta_{a,T}$:

$$E(\eta_{a,T}\, \eta_{a,T-m}) = \sigma_u^2 \left(\sum_{i=1}^{I} (M_i^a)^2\right) \left[ \sum_{v=0}^{m-2} \left( \left(\sum_{j=0}^{v} w_j \gamma^{v-j}\right) \left(\sum_{j=v+1}^{m-1} w_j \gamma^{m+v-j}\right) \right) \right],$$

$$E(\eta_{a,T}\, \eta_{a,T-sm}) = 0, \quad s = \pm 2, \pm 3, \pm 4, \ldots.$$

In other words, the dynamic model specification is affected by temporal aggregation (from an AR(1) to an ARMA(1, 1)) except if aggregation is stock. Last, and as in the static case, the aggregated error shows heteroscedasticity unless $\sum_{i=1}^{I} (M_i^a)^2 = \sum_{i=1}^{I} (M_i^{a'})^2$, $\forall a, a'$. Note that since the AR coefficient is the same for all the individuals, individual aggregation does not change the order of the autoregressive polynomial (Granger (1987)).
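Both moments of $\eta_{a,T}$ follow from its moving average coefficients on the disaggregated errors, $c_k = \sum_j w_j \gamma^{k-j}$ for $k = 0, \ldots, 2m-2$: the variance is proportional to $\sum_k c_k^2$ and the lag-$m$ autocovariance to $\sum_{k=0}^{m-2} c_k c_{k+m}$. The sketch below computes them by convolution; the function name `eta_moments` and the numerical inputs are illustrative assumptions.

```python
import numpy as np

def eta_moments(w, gamma, sigma2_u=1.0, sum_M2=1.0):
    """Variance and lag-m autocovariance of eta_{a,T} from its MA coefficients."""
    w = np.asarray(w, dtype=float)
    m = w.size
    # eta = (sum_j gamma^j L^j) T(L) u, so its coefficients are the convolution
    # c_k = sum_j w_j gamma^{k-j}, for k = 0, ..., 2m-2.
    c = np.convolve(w, gamma ** np.arange(m))
    scale = sigma2_u * sum_M2                    # sigma_u^2 * sum_i (M_i^a)^2
    var = scale * np.sum(c ** 2)
    cov_m = scale * np.sum(c[: m - 1] * c[m:])   # pairs c_k with c_{k+m}
    return var, cov_m

var_flow, cov_flow = eta_moments([1.0, 1.0], gamma=0.5)     # flow, m = 2
var_stock, cov_stock = eta_moments([1.0, 0.0], gamma=0.5)   # stock, m = 2
```

For flow with $m = 2$ and $\gamma = 0.5$ the coefficients are $(1, 1.5, 0.5)$, giving variance 3.5 and lag-$m$ autocovariance 0.5, in agreement with the two displayed formulas. Under stock the autocovariance is zero, consistent with the remark that stock aggregation leaves the error uncorrelated.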

Finally, we consider the dynamic random effects model (2.4) and the GDAS. Premultiplying the model by the aggregation scheme yields:

$$y_{a,T} = \left(\sum_{j=0}^{m-1} \gamma^j\right) \left(\sum_{j=0}^{m-1} w_j\right) \mu_a + \gamma^m y_{a,T-m} + x_{a,T} \beta + \cdots + x_{a,T-m+1} \beta \gamma^{m-1} + \alpha_a + \eta_{a,T},$$

where $\mu_a = \left(\sum_{i=1}^{I} M_i^a\right) \mu$, $\eta_{a,T}$ is the same as in the fixed effects model, $E(\alpha_a) = 0$, and

$$\mathrm{Var}(\alpha_a) = \sigma_\alpha^2 \left(\sum_{i=1}^{I} (M_i^a)^2\right) \left(\sum_{j=0}^{m-1} w_j\right)^2 \left(\sum_{j=0}^{m-1} \gamma^j\right)^2.$$

Conclusions from applying GDAS to (2.2) are a combination of those from the random effects static model and the fixed effects dynamic model.

4. The effect of aggregation on estimation

In this section we study the consequences of estimating the parameters of the models for $y_{i,t}$, namely $\mu$, $\beta$ and $\gamma$, using $y_{a,T}$. We do not focus on the estimation of the fixed effects as they are easily obtained once the slope parameters are estimated.

We first introduce further notation, rewriting (2.1)–(2.4), GSAS and GDAS in matrix notation. In the second part of the section we focus on estimation of $\mu$ and $\beta$ in the static model by least squares and we show how to compute the efficiency loss explicitly. We also show that it is not possible to compare the efficiency losses entailed by two different aggregation schemes. In the third part we study estimation of $\beta$ and $\gamma$ in the dynamic model by GMM, namely Arellano-Bond. Because individual and temporal aggregation have different consequences on the model specification, we study them separately. In the previous section we saw that temporal aggregation changes the model specification: it generates a dynamic response to the exogenous variables and may introduce autocorrelation in the aggregated residuals. Thus in this case the moment conditions may need to be adapted.

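4.1. Notation

The fixed effects model (2.1) can be written as

$$\begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_I \end{pmatrix} = \begin{pmatrix} \alpha_1 \mathbf{e}_N \\ \alpha_2 \mathbf{e}_N \\ \vdots \\ \alpha_I \mathbf{e}_N \end{pmatrix} + \gamma \begin{pmatrix} \mathbf{y}_{1,-1} \\ \mathbf{y}_{2,-1} \\ \vdots \\ \mathbf{y}_{I,-1} \end{pmatrix} + \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \\ \mathbf{x}_I \end{pmatrix} \beta + \begin{pmatrix} \mathbf{u}_1 \\ \mathbf{u}_2 \\ \vdots \\ \mathbf{u}_I \end{pmatrix},$$

where $\mathbf{y}_i$ is an $N \times 1$ vector with the observations of individual $i$, $\alpha_i$ is the fixed effect of individual $i$, $\mathbf{e}_N$ is an $N \times 1$ vector of ones, $\mathbf{y}_{i,-1} = (y_{i,0}, \ldots, y_{i,N-1})'$ is an $N \times 1$ vector of lagged $y_{i,t}$, $\mathbf{x}_i$ is an $N \times 1$ vector containing the individual regressor, and $\mathbf{u}_i$ is an $N \times 1$ vector with the iid individual error terms with $E(\mathbf{u}_i) = \mathbf{0}$ and $E(\mathbf{u}_i \mathbf{u}_i') = \sigma_u^2 \mathbf{I}_N$, where $\mathbf{I}_N$ is an identity matrix of size $N$. The above equation can be written more compactly as

$$\mathbf{Y} = \boldsymbol{\alpha} + \gamma \mathbf{Y}_{-1} + \mathbf{X} \beta + \mathbf{U}, \tag{4.1}$$

where $\mathbf{U}$ is an $IN \times 1$ vector of iid errors such that $E(\mathbf{U}) = \mathbf{0}$ and $E(\mathbf{U}\mathbf{U}') = \sigma_u^2 \mathbf{I}_{IN}$. Likewise, the random effects model (2.2) can be written as

$$\mathbf{Y} = \mathbf{Z} \boldsymbol{\delta} + \mathbf{W}, \tag{4.2}$$

where $\mathbf{Z} = (\mathbf{e}_{IN}, \mathbf{Y}_{-1}, \mathbf{X})$, $\boldsymbol{\delta} = (\mu, \gamma, \beta)'$, $\mathbf{e}_{IN}$ is an $IN \times 1$ vector of ones, and

$$\mathbf{W} = \begin{pmatrix} \mathbf{u}_1 + \alpha_1 \mathbf{e}_N \\ \mathbf{u}_2 + \alpha_2 \mathbf{e}_N \\ \vdots \\ \mathbf{u}_I + \alpha_I \mathbf{e}_N \end{pmatrix},$$

where $\alpha_i$ are iid random variables with zero mean and variance $\sigma_\alpha^2$. The variance-covariance matrix of $\mathbf{u}_i + \alpha_i \mathbf{e}_N$ is $\boldsymbol{\Sigma} = \sigma_u^2 \mathbf{I}_N + \sigma_\alpha^2 \mathbf{e}_N \mathbf{e}_N'$.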

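Models (2.3) and (2.4) can also be expressed in matrix form:

$$(1 - \gamma L) \mathbf{Y} = \boldsymbol{\alpha} + \mathbf{X} \beta + \mathbf{U} \quad \text{and} \tag{4.3}$$

$$(1 - \gamma L) \mathbf{Y} = \mathbf{Z}^* \boldsymbol{\delta}^* + \mathbf{W}, \tag{4.4}$$

where $\mathbf{Z}^* = (\mathbf{e}_{IN}, \mathbf{X})$ and $\boldsymbol{\delta}^* = (\mu, \beta)'$. We now express GSAS in vectorial form. Following Lütkepohl (1986), let $\mathbf{F}_1$ be a $\tau \times N$ matrix such that

$$\mathbf{F}_1 = \begin{pmatrix} \boldsymbol{\omega}' & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\omega}' & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \boldsymbol{\omega}' \end{pmatrix},$$

with $\boldsymbol{\omega}' = (w_{m-1}, \ldots, w_0)$. The vector $\mathbf{F}_1 \mathbf{y}_i$ contains the temporally aggregated observations for individual $i$. Let

$$\mathbf{F}_2 = \begin{pmatrix} M_1^1 & M_2^1 & \cdots & M_I^1 \\ M_1^2 & M_2^2 & \cdots & M_I^2 \\ \vdots & \vdots & \ddots & \vdots \\ M_1^A & M_2^A & \cdots & M_I^A \end{pmatrix}$$

be an $A \times I$ matrix such that $(\mathbf{F}_2)_{vl} = M_l^v$, i.e. the $vl$ entry is equal to the weight of individual $l$ in group $v$. Hence the vector of individually aggregated data can be written as $(\mathbf{F}_2 \otimes \mathbf{I}_N) \mathbf{Y}$, where $\otimes$ denotes the Kronecker product. The GSAS can be rewritten as $\mathbf{F}\mathbf{Y} = (\mathbf{F}_2 \otimes \mathbf{F}_1) \mathbf{Y}$, where $\mathbf{F}$ is an $A\tau \times IN$ matrix, that is, a matrix whose number of rows equals the total number of aggregated observations and whose number of columns equals the total number of disaggregated observations.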
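A minimal sketch of how $\mathbf{F}_1$, $\mathbf{F}_2$ and $\mathbf{F} = \mathbf{F}_2 \otimes \mathbf{F}_1$ can be assembled numerically is given below. The sizes ($I = 4$, $N = 6$, $m = 3$), the two-group flow layout and the helper name `temporal_matrix` are assumptions chosen for illustration.

```python
import numpy as np

def temporal_matrix(w, tau):
    """F_1 (tau x N): block-diagonal with rows omega' = (w_{m-1}, ..., w_0)."""
    omega = np.asarray(w, dtype=float)[::-1]
    return np.kron(np.eye(tau), omega[None, :])

n_ind, N, m = 4, 6, 3        # hypothetical sizes (n_ind plays the role of I)
tau = N // m
F1 = temporal_matrix(np.ones(m), tau)        # flow aggregation in time
F2 = np.array([[1.0, 1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0, 1.0]])        # flow aggregation into A = 2 groups
F = np.kron(F2, F1)                          # (A * tau) x (I * N)

# Y stacks individuals: Y = (y_1', ..., y_I')' with each y_i of length N.
Y = np.arange(1.0, n_ind * N + 1)
Y_agg = F @ Y                                # aggregated data, stacked by group
```

With these inputs `F` has 4 rows and 24 columns, and `Y_agg` stacks the $\tau$ aggregated observations of the first group followed by those of the second.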

The GDAS can be written as an extension of GSAS, by just introducing the ratio of autoregressive polynomials:

$$\left(\frac{1 - \gamma^m L^m}{1 - \gamma L}\right) (\mathbf{F}_2 \otimes \mathbf{F}_1).$$

Note that while GSAS is applied to the static models (4.1) and (4.2), to apply GDAS we have to express the models as (4.3) and (4.4).

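4.2. Static fixed effects model

Consider (4.1) with $\gamma = 0$. If we could observe $y_{i,t}$, $\beta$ could be estimated as usual, by first pre-multiplying the model by the projection matrix

$$\mathbf{Q}_{N,I} = \mathbf{I}_I \otimes \mathbf{Q}_N, \quad \text{where} \quad \mathbf{Q}_N = \mathbf{I}_N - \frac{1}{N} \mathbf{e}_N \mathbf{e}_N'$$

centers the observations with respect to the individual means. The ordinary least squares estimator is

$$\hat{\beta} = (\mathbf{X}' \mathbf{Q}_{N,I} \mathbf{X})^{-1} (\mathbf{X}' \mathbf{Q}_{N,I} \mathbf{Y}),$$

with variance

$$\mathrm{Var}(\hat{\beta} \mid \mathbf{X}) = \sigma_u^2 (\mathbf{X}' \mathbf{Q}_{N,I} \mathbf{X})^{-1}. \tag{4.5}$$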

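But since we do not observe $y_{i,t}$, but $y_{a,T}$, we multiply (4.1) by GSAS:

$$\mathbf{F}\mathbf{Y} = \mathbf{F}\boldsymbol{\alpha} + \mathbf{F}\mathbf{X}\beta + \mathbf{F}\mathbf{U}.$$

We can in principle estimate $\beta$ by ordinary least squares by first pre-multiplying the model by a projection matrix equivalent to the previous one but for aggregated data,

$$\mathbf{Q}_{\tau,A} = \mathbf{I}_A \otimes \mathbf{Q}_\tau, \quad \text{where} \quad \mathbf{Q}_\tau = \mathbf{I}_\tau - \frac{1}{\tau} \mathbf{e}_\tau \mathbf{e}_\tau'.$$

However, this method produces inefficient estimators as

$$E(\mathbf{F}\mathbf{U}\mathbf{U}'\mathbf{F}') = \sigma_u^2 (\mathbf{F}_2 \mathbf{F}_2' \otimes \mathbf{F}_1 \mathbf{F}_1') = \sigma_u^2 \left(\sum_{j=0}^{m-1} w_j^2\right) (\mathbf{F}_2 \mathbf{F}_2' \otimes \mathbf{I}_\tau)$$

has different diagonal elements unless $\sum_{i=1}^{I} (M_i^a)^2 = \sum_{i=1}^{I} (M_i^{a'})^2$, $\forall a, a'$. That is, it is the term $\mathbf{F}_2 \mathbf{F}_2' \otimes \mathbf{I}_\tau$ that produces heteroscedasticity. But since we know the exact form of the heteroscedasticity, it is simple to build a generalized least squares estimator. Multiplying both sides of the model by the matrix $\mathbf{V} = (\mathbf{F}_2 \mathbf{F}_2')^{-1/2} \otimes \mathbf{I}_\tau$:

$$\mathbf{V}\mathbf{F}\mathbf{Y} = \mathbf{V}\mathbf{F}\boldsymbol{\alpha} + \mathbf{V}\mathbf{F}\mathbf{X}\beta + \mathbf{V}\mathbf{F}\mathbf{U},$$

renders $\mathbf{V}\mathbf{F}\mathbf{U}$ homoscedastic:

$$\begin{aligned} E(\mathbf{V}\mathbf{F}\mathbf{U}\mathbf{U}'\mathbf{F}'\mathbf{V}') &= \mathbf{V}\mathbf{F} E(\mathbf{U}\mathbf{U}') \mathbf{F}'\mathbf{V}' = \sigma_u^2 (\mathbf{V}\mathbf{F}\mathbf{F}'\mathbf{V}') \\ &= \sigma_u^2 \left( \left( (\mathbf{F}_2 \mathbf{F}_2')^{-1/2} \mathbf{F}_2 \mathbf{F}_2' (\mathbf{F}_2 \mathbf{F}_2')^{-1/2} \right) \otimes \mathbf{F}_1 \mathbf{F}_1' \right) \\ &= \sigma_u^2 \left(\sum_{j=0}^{m-1} w_j^2\right) (\mathbf{I}_A \otimes \mathbf{I}_\tau). \end{aligned} \tag{4.6}$$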

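Note that computation of $\mathbf{V}$ is straightforward as $\mathbf{F}_2 \mathbf{F}_2'$ is a diagonal matrix (individuals can only belong to one group). Then we estimate by ordinary least squares by applying the projection matrix $\mathbf{Q}_{\tau,A}$:

$$\begin{aligned} \mathbf{Q}_{\tau,A}\mathbf{V}\mathbf{F}\mathbf{Y} &= \mathbf{Q}_{\tau,A}\mathbf{V}\mathbf{F}\boldsymbol{\alpha} + \mathbf{Q}_{\tau,A}\mathbf{V}\mathbf{F}\mathbf{X}\beta + \mathbf{Q}_{\tau,A}\mathbf{V}\mathbf{F}\mathbf{U} \\ &= \mathbf{Q}_{\tau,A}\mathbf{V}\mathbf{F}\mathbf{X}\beta + \mathbf{Q}_{\tau,A}\mathbf{V}\mathbf{F}\mathbf{U}. \end{aligned}$$

The estimator is

$$\hat{\beta}_a = (\mathbf{X}'\mathbf{F}'\mathbf{V}\mathbf{Q}_{\tau,A}\mathbf{V}\mathbf{F}\mathbf{X})^{-1} (\mathbf{X}'\mathbf{F}'\mathbf{V}\mathbf{Q}_{\tau,A}\mathbf{V}\mathbf{F}\mathbf{Y}),$$

with variance

$$\mathrm{Var}(\hat{\beta}_a \mid \mathbf{X}) = \sigma_u^2 \left(\sum_{j=0}^{m-1} w_j^2\right) (\mathbf{X}'\mathbf{F}'\mathbf{V}\mathbf{Q}_{\tau,A}\mathbf{V}\mathbf{F}\mathbf{X})^{-1}. \tag{4.7}$$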

Variances (4.5) and (4.7) are conditional on $\mathbf{X}$. From the variance decomposition

$$\mathrm{Var}(\hat{\beta}) = E[\mathrm{Var}(\hat{\beta} \mid \mathbf{X})] + \mathrm{Var}(E(\hat{\beta} \mid \mathbf{X})) = E[\mathrm{Var}(\hat{\beta} \mid \mathbf{X})],$$

since $E(\hat{\beta} \mid \mathbf{X}) = \beta$. Likewise for $\mathrm{Var}(\hat{\beta}_a)$. Thus, in order to show $\mathrm{Var}(\hat{\beta}_a) \geq \mathrm{Var}(\hat{\beta})$ it is sufficient to show that $\mathrm{Var}(\hat{\beta}_a \mid \mathbf{X}) \geq \mathrm{Var}(\hat{\beta} \mid \mathbf{X})$ for almost all $\mathbf{X}$. In the above computations the $\mathbf{X}$ matrix is such that the inverses in (4.5) and (4.7) are well defined. For this to be the case we need the following assumption.

Assumption A1. Let $(\Omega_X, \mathcal{F}_X, P_X)$ be the probabilistic triplet associated with $\mathbf{X}$. Then $P_X(w : \mathbf{X}(w)'\mathbf{F}'\mathbf{V}\mathbf{Q}_{\tau,A}\mathbf{V}\mathbf{F}\mathbf{X}(w) \text{ is invertible}) = 1$.

Assumption A1 implies that the aggregated model is identifiable with probability one. The following Proposition proves that this assumption for the aggregated data also implies that the disaggregated model is identifiable with probability one.

Proposition 1. Under A1, $\mathbf{X}'\mathbf{Q}_{N,I}\mathbf{X}$ is invertible with probability one.

The following Theorem shows that temporal and individual aggregation increase the variance of the estimated coefficient and thus result in an efficiency loss.

Theorem 1. Consider the fixed effects model (4.1) with $\gamma = 0$, and assume that $\mathbf{F}$ and $\mathbf{X}$ satisfy A1. Then $\mathrm{Var}(\hat{\beta}_a) - \mathrm{Var}(\hat{\beta})$ is positive semi-definite.

The choice of the aggregation scheme may be given by the problem at hand but often it is a subjective choice. It is hence of interest to know if some temporal aggregation schemes are less inefficient than others. The two Propositions below show that there does not exist a dominant aggregation scheme. In other words, though we know from Theorem 1 that estimation with aggregated data entails an efficiency loss in the estimated parameters, we cannot rank aggregation schemes in terms of efficiency loss. We first show the result for temporal aggregation, followed by the counterpart for individual aggregation.

Let $\mathbf{F}_1$ and $\mathbf{F}_1^*$ be two matrices corresponding to two different temporal aggregation schemes at the same frequency. Then, computations similar to those in the proof of Theorem 1 show that the aggregation scheme $\mathbf{F}_1$ is always more efficient than the aggregation scheme $\mathbf{F}_1^*$ if the matrix

$$\bar{\mathbf{F}}_1' \mathbf{Q}_\tau \bar{\mathbf{F}}_1 - \bar{\mathbf{F}}_1^{*\prime} \mathbf{Q}_\tau \bar{\mathbf{F}}_1^* \tag{4.8}$$

is positive semi-definite, where

$$\bar{\mathbf{F}}_1 = \frac{1}{\sqrt{\sum_{j=0}^{m-1} w_j^2}}\, \mathbf{F}_1,$$

and likewise for $\bar{\mathbf{F}}_1^*$.¹ The following Proposition shows that this difference is an indefinite matrix.

Proposition 2. Assume two temporal aggregation schemes $\mathbf{F}_1$ and $\mathbf{F}_1^*$ characterized by the weights $\boldsymbol{\omega}$ and $\boldsymbol{\omega}^*$. If $\boldsymbol{\omega} \neq \lambda \boldsymbol{\omega}^*$ for all $\lambda \in \mathbb{R} \setminus \{0\}$, then (4.8) is indefinite. And if $\exists \lambda \in \mathbb{R} \setminus \{0\}$ such that $\boldsymbol{\omega} = \lambda \boldsymbol{\omega}^*$, then (4.8) equals zero.

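This proposition basically states that two different aggregation schemes that define directions in hyperplanes that are not parallel cannot be compared. The following example shows a simple case where this happens.

Example 1. Let $m = 2$ and consider the flow and stock aggregation schemes characterized by the sequences of weights $\boldsymbol{\omega}' = (1, 1)$ and $\boldsymbol{\omega}^{*\prime} = (1, 0)$ respectively. It can be seen that $\boldsymbol{\omega}^{c\prime} = (a, -a)$ with $a \in \mathbb{R} \setminus \{0\}$ since $\boldsymbol{\omega}'\boldsymbol{\omega}^c = a - a = 0$. Thus the condition $\boldsymbol{\omega}'\boldsymbol{\omega}^c = 0$ defines a line in a two-dimensional plane. Similarly, the condition $\boldsymbol{\omega}^{*\prime}\boldsymbol{\omega}^{*c} = 0$ leads to $\boldsymbol{\omega}^{*c} = (0, a)'$ with $a \in \mathbb{R} \setminus \{0\}$, and thus defines another line in the two-dimensional plane. Since $\boldsymbol{\omega} \neq \lambda \boldsymbol{\omega}^*$ for all $\lambda \in \mathbb{R} \setminus \{0\}$, the two lines are not parallel.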
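The indefiniteness asserted by Proposition 2 can be checked numerically for the weights of Example 1 by computing the eigenvalues of the matrix (4.8). The choice $\tau = 2$ and the helper name `normalized_F1` are assumptions made for this sketch.

```python
import numpy as np

def normalized_F1(omega, tau):
    """F_1 divided by sqrt(sum_j w_j^2), with omega' repeated on the block diagonal."""
    omega = np.asarray(omega, dtype=float)
    return np.kron(np.eye(tau), omega[None, :]) / np.sqrt(np.sum(omega ** 2))

tau = 2
Q_tau = np.eye(tau) - np.ones((tau, tau)) / tau

F_flow = normalized_F1([1.0, 1.0], tau)    # omega' = (1, 1)
F_stock = normalized_F1([1.0, 0.0], tau)   # omega*' = (1, 0)

# Matrix (4.8) for flow versus stock, and its eigenvalues.
D = F_flow.T @ Q_tau @ F_flow - F_stock.T @ Q_tau @ F_stock
eigenvalues = np.linalg.eigvalsh(D)

# Proportional weights (omega = 2 * omega*) give a zero difference.
F_scaled = normalized_F1([2.0, 2.0], tau)
D_proportional = F_scaled.T @ Q_tau @ F_scaled - F_flow.T @ Q_tau @ F_flow
```

The spectrum of `D` contains both a strictly negative and a strictly positive eigenvalue, so neither scheme dominates the other, while `D_proportional` vanishes as stated in the second part of the Proposition.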

This result is counterintuitive. It is natural to consider that flow aggregation is more appropriate than stock aggregation as it sums observations instead of sampling them. However, Proposition 2 shows that when it comes to parameter estimation this need not be true. To clarify this point consider the following example.

Example 2. Consider model (4.1) with $\gamma = 0$:

$$y_{i,t} = \alpha_i + x_{i,t}\beta + u_{i,t},$$

and let $x_{i,t} = -x_{i,t-1}$ for all $i$. If data are aggregated as flow with $m = 2$, the aggregated model is

$$\begin{aligned} y_{i,t} + y_{i,t-1} &= 2\alpha_i + (x_{i,t} + x_{i,t-1})\beta + u_{i,t} + u_{i,t-1} \\ &= 2\alpha_i + (-x_{i,t-1} + x_{i,t-1})\beta + u_{i,t} + u_{i,t-1} \\ &= 2\alpha_i + u_{i,t} + u_{i,t-1}. \end{aligned}$$

In such a situation $\beta$ is not identified, but it can be seen that we do not have such a problem under stock aggregation.

In this example the flow aggregation scheme violates A1. Nevertheless, it gives the intuition behind the impossibility of ranking temporal aggregation schemes. In this example, $\beta$ is not identified with the aggregated data because the regressor is perfectly negatively correlated with its own lag. This suggests that stock aggregation outperforms flow aggregation if the regressors have a strong oscillating behavior, while flow aggregation outperforms stock aggregation if the regressors have a smooth behavior. This is illustrated below in the numerical simulations.

¹ For such a comparison to be possible we need to assume that there exists a set $C_{F,F^*} \in \mathcal{F}_X$ such that $P_X(C_{F,F^*}) = 1$ and both $(\mathbf{X}(w)'\mathbf{F}'\mathbf{Q}_{\tau,A}\mathbf{F}\mathbf{X}(w))^{-1}$ and $(\mathbf{X}(w)'\mathbf{F}^{*\prime}\mathbf{Q}_{\tau^*,A}\mathbf{F}^*\mathbf{X}(w))^{-1}$ exist for $w \in C_{F,F^*}$, where $\mathbf{F} = (\mathbf{F}_2 \otimes \mathbf{F}_1)$ and $\mathbf{F}^* = (\mathbf{F}_2 \otimes \mathbf{F}_1^*)$. Such a set exists provided that the sets $B_F$ and $B_{F^*}$ do, since we can set $C_{F,F^*} = B_F \cap B_{F^*}$.

The computations that lead to the proof of Theorem 1 can also be used to compare different individual aggregation schemes. Specifically, let $F_2$ and $F_2^*$ be two matrices representing two different individual aggregation schemes. We assume that the total number of aggregated individuals is fixed and equal to $A$. Let $P_2 = F_2'(F_2F_2')^{-1}F_2$ and likewise for $P_2^*$. The aggregation scheme $F_2$ is more efficient than the aggregation scheme $F_2^*$ if the matrix

$$P_2 - P_2^*, \tag{4.9}$$

is semi-positive definite.$^2$ The matrix $P_2$ is the projection matrix onto the column space of $F_2'$ and $P_2^*$ is the projection matrix onto the column space of $F_2^{*\prime}$. The column spaces of $F_2^{*\prime}$ and $F_2'$ do not necessarily coincide, as shown in the following Proposition.

Proposition 3. Let $F_2$ and $F_2^*$ be two individual aggregation schemes such that the number of aggregated groups is the same for both schemes. Consider the matrix $\mathbf{A}$ obtained by stacking the matrices $F_2$ and $F_2^*$:

$$\mathbf{A} = \begin{pmatrix} F_2 \\ F_2^* \end{pmatrix}.$$

If $\operatorname{rank}(\mathbf{A}) > A$, then (4.9) is indefinite.

Example 3. In order to fix ideas, consider the following two individual aggregation schemes:

$$F_2 = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix} \quad \text{and} \quad F_2^* = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$

Using the matrix $F_2$, individuals are aggregated as flows, while using the matrix $F_2^*$ they are aggregated as stock, and $\operatorname{rank}(\mathbf{A}) = 4$. Thus stock aggregation is not necessarily less efficient than flow aggregation.
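Example 3 can be checked numerically. The sketch below builds the two projection matrices from their definition, $P_2 = F_2'(F_2F_2')^{-1}F_2$, and inspects the eigenvalues of (4.9); the function name `projection` is a hypothetical helper introduced for the illustration.

```python
import numpy as np

def projection(F):
    """Projection onto the column space of F' (the row space of F)."""
    return F.T @ np.linalg.inv(F @ F.T) @ F

F2 = np.array([[1, 1, 0, 0],
               [0, 0, 1, 1]], dtype=float)       # flow individual aggregation
F2_star = np.array([[1, 0, 0, 0],
                    [0, 0, 1, 0]], dtype=float)  # stock individual aggregation

stacked = np.vstack([F2, F2_star])
rank = np.linalg.matrix_rank(stacked)

difference = projection(F2) - projection(F2_star)   # matrix (4.9)
eigenvalues = np.linalg.eigvalsh(difference)

print("rank of stacked matrix:", rank)
print("eigenvalues of P2 - P2*:", eigenvalues)
```

The stacked matrix has rank 4, larger than the number of groups (2), and the difference has eigenvalues of both signs, so neither scheme dominates, in line with Proposition 3.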

As for Proposition 1, it is intuitive that individual flow aggregation should be more efficient than individual stock aggregation. However, the following example shows why this intuition does not always hold.

Example 4. Consider model (4.1) with $\gamma = 0$:

$$y_{i,t} = \alpha_i + x_{i,t}\beta + u_{i,t},$$

where $i = 1, 2, 3, 4$. Assume that the 4 individuals are aggregated in two groups. The first group contains individuals 1 and 2, the second group individuals 3 and 4. Assume furthermore that $x_{1,t} = -x_{2,t}$ and $x_{3,t} = -x_{4,t}$. Similar to Example 2, it can be seen that stock individual aggregation is more efficient than flow aggregation. Note that, also similar to Example 2, the flow aggregation scheme violates A1. Nevertheless, it allows us to gain intuition behind Proposition 3.

2 For the comparison of two individual aggregation schemes it is enough to assume that the sets $B_F$ and $B_{F^*}$ exist, where $F = (F_2 \otimes F_1)$ and $F^* = (F_2^* \otimes F_1)$.

We illustrate the efficiency loss implied by aggregation with a simulation study. It is based on comparing (4.5) and (4.7) for different aggregation schemes. We consider 12 individuals and 40 time periods ($I = 12$ and $N = 40$), four temporal aggregation schemes and three individual aggregation schemes. Time is aggregated as flow ($w_j = 1\ \forall j$) and as stock ($w_1 = 1$ and $w_j = 0\ \forall j > 1$), and the temporal aggregation schemes are i) type 1: $m = 10$ or $\tau = 4$, ii) type 2: $m = 5$ or $\tau = 8$, iii) type 3: $m = 4$ or $\tau = 10$, and iv) type 4: $m = 2$ or $\tau = 20$. That is, the aggregation frequency increases with the type. Individuals are aggregated as flows (i.e. $M_i^a = 1\ \forall a$ and all $i \in a$) and the individual aggregation schemes are i) type 1: $A = 3$, $I_a = 4\ \forall a$, ii) type 2: $A = 4$, $I_a = 3\ \forall a$, and iii) type 3: $A = 6$, $I_a = 2\ \forall a$. That is, we aggregate in groups of equal size and the group size decreases with the type.

The regressors $x_{i,t}$ are generated according to an AR(1) process $x_{i,t} = \phi x_{i,t-1} + \varepsilon_{i,t}$, where $\phi$ takes values 0.8, 0 and −0.8 and $\varepsilon_{i,t} \sim \text{iid } N(0, 1)$. We set $\sigma_u^2 = 1$. We generate 10000 disaggregated samples for the regressors and we compute the sample mean of the variances (4.5) and (4.7). Note that to compute these variances we do not need to simulate $y_{i,t}$ nor to estimate (hence avoiding the uncertainty due to estimation). To compute (4.5) and (4.7) we only need $\sigma_u^2$, the aggregation schemes, and to simulate the regressors.
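The ingredients of this design can be sketched as follows. The code only builds the temporal aggregation matrices and simulates the AR(1) regressors; it does not evaluate (4.5) and (4.7). The function names, the burn-in length, and the convention that stock aggregation keeps the first observation of each block are assumptions made for the illustration.

```python
import numpy as np

def temporal_matrix(n_periods, m, scheme):
    """tau x N temporal aggregation matrix built from non-overlapping blocks of m periods."""
    if scheme == "flow":
        w = np.ones(m)          # w_j = 1 for all j
    elif scheme == "stock":
        w = np.zeros(m)
        w[0] = 1.0              # w_1 = 1, w_j = 0 for j > 1
    else:
        raise ValueError("scheme must be 'flow' or 'stock'")
    return np.kron(np.eye(n_periods // m), w.reshape(1, m))

def simulate_ar1(phi, n_individuals, n_periods, rng, burn_in=100):
    """Simulate x_{i,t} = phi * x_{i,t-1} + eps_{i,t} with standard normal innovations."""
    eps = rng.standard_normal((n_individuals, n_periods + burn_in))
    x = np.zeros_like(eps)
    for t in range(1, eps.shape[1]):
        x[:, t] = phi * x[:, t - 1] + eps[:, t]
    return x[:, burn_in:]       # drop the burn-in (an assumption of this sketch)

rng = np.random.default_rng(0)
I, N, m = 12, 40, 10            # type 1 temporal scheme: m = 10, tau = 4
x = simulate_ar1(0.8, I, N, rng)

F1_flow = temporal_matrix(N, m, "flow")
F1_stock = temporal_matrix(N, m, "stock")
x_flow = x @ F1_flow.T          # each row: aggregated regressor of one individual
x_stock = x @ F1_stock.T
print(x_flow.shape, x_stock.shape)
```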

Table 1 is divided into three panels, each divided further into three sub-panels. The upper panel shows the results for $\phi = 0.8$, the middle for $\phi = -0.8$ and the bottom for $\phi = 0$. For each sub-panel the upper and middle parts show the average relative efficiency loss when temporal aggregation is flow and stock. For ease of exposition, throughout the simulation studies for the static models we consider flow individual aggregation. Results for stock individual aggregation confirm Proposition 3 and are available upon request. The bottom part shows the percentage of times that the variances using the aggregated data under the temporal stock scheme are smaller than under the temporal flow scheme. In all cases, aggregation entails an important loss of efficiency, even if aggregation is mild, confirming Theorem 1. Results also confirm Proposition 2: no temporal aggregation scheme dominates another. When $\phi$ is large and positive (negative), flow (stock) temporal aggregation entails a smaller efficiency loss. In fact, from the bottom parts of the sub-panels we conclude that flow (stock) aggregation tends to be more efficient when $\phi$ is large and positive (negative). This is related to Example 2 above. If a positive value for $x_{i,t}$ tends to be followed by another positive value, we lose more information sampling than summing every $m$ periods. However, if a positive value for $x_{i,t}$ tends to be followed by a negative value, summing cancels the observations and the information loss is more important than with sampling. In the case that the regressor is white noise, stock and flow aggregation do not make any difference and the proportion of times that one variance is larger than the other is around 50%.

Table 1. Efficiency loss: Model (4.1) with $\gamma = 0$.

| Individual (rows) / Temporal (columns) | type 1 | type 2 | type 3 | type 4 | No Aggreg |
|---|---|---|---|---|---|
| $\phi = 0.8$: Flow temporal aggregation | | | | | |
| type 1 | 12.13 | 6.96 | 6.23 | 4.91 | 4.19 |
| type 2 | 8.54 | 5.06 | 4.55 | 3.61 | 3.10 |
| type 3 | 5.32 | 3.26 | 2.94 | 2.36 | 2.03 |
| No Aggreg | 2.50 | 9.95 | 1.43 | 1.15 | 1.00 |
| $\phi = 0.8$: Stock temporal aggregation | | | | | |
| type 1 | 65.58 | 23.72 | 18.19 | 8.54 | 4.19 |
| type 2 | 45.98 | 17.35 | 13.37 | 6.31 | 3.10 |
| type 3 | 28.71 | 11.23 | 8.69 | 4.13 | 2.03 |
| No Aggreg | 13.45 | 9.95 | 4.25 | 2.03 | 1.00 |
| $\phi = 0.8$: % temporal stock < temporal flow | | | | | |
| type 1 | 0.09 | 0 | 0 | 0 | - |
| type 2 | 0 | 0 | 0 | 0 | - |
| type 3 | 0 | 0 | 0 | 0 | - |
| No Aggreg | 0 | 0 | 0 | 0 | - |
| $\phi = -0.8$: Flow temporal aggregation | | | | | |
| type 1 | 398.45 | 97.70 | 95.60 | 40.23 | 4.22 |
| type 2 | 277.99 | 70.97 | 70.04 | 29.87 | 3.11 |
| type 3 | 173.80 | 45.75 | 45.81 | 19.75 | 2.04 |
| No Aggreg | 81.67 | 8.01 | 22.49 | 9.80 | 1.00 |
| $\phi = -0.8$: Stock temporal aggregation | | | | | |
| type 1 | 81.96 | 24.95 | 22.73 | 10.67 | 4.22 |
| type 2 | 57.06 | 18.16 | 16.62 | 7.86 | 3.11 |
| type 3 | 35.74 | 11.74 | 10.82 | 5.15 | 2.04 |
| No Aggreg | 16.73 | 8.01 | 5.28 | 2.52 | 1.00 |
| $\phi = -0.8$: % temporal stock < temporal flow | | | | | |
| type 1 | 99.87 | 100 | 100 | 100 | - |
| type 2 | 99.99 | 100 | 100 | 100 | - |
| type 3 | 100 | 100 | 100 | 100 | - |
| No Aggreg | 100 | 100 | 100 | 100 | - |
| $\phi = 0$: Flow temporal aggregation | | | | | |
| type 1 | 65.86 | 24.53 | 18.68 | 8.50 | 4.05 |
| type 2 | 47.02 | 18.07 | 13.77 | 6.32 | 3.03 |
| type 3 | 29.24 | 11.66 | 8.98 | 4.16 | 2.01 |
| No Aggreg | 13.80 | 21.44 | 4.41 | 2.07 | 1.00 |
| $\phi = 0$: Stock temporal aggregation | | | | | |
| type 1 | 66.43 | 24.52 | 18.59 | 8.46 | 4.05 |
| type 2 | 46.42 | 17.95 | 13.70 | 6.30 | 3.03 |
| type 3 | 29.02 | 11.64 | 8.95 | 4.16 | 2.01 |
| No Aggreg | 13.72 | 21.44 | 4.39 | 2.06 | 1.00 |
| $\phi = 0$: % temporal stock < temporal flow | | | | | |
| type 1 | 50.09 | 50.39 | 49.99 | 50.54 | - |
| type 2 | 50.99 | 50.65 | 49.46 | 50.63 | - |
| type 3 | 50.21 | 50.28 | 49.14 | 49.77 | - |
| No Aggreg | 50.73 | 50.4 | 48.84 | 49.73 | - |

Note: The upper panel shows the results for $\phi = 0.8$, the middle for $\phi = -0.8$ and the bottom for $\phi = 0$. For each sub-panel the upper and middle parts show the average relative efficiency loss when temporal aggregation is flow and stock (individual aggregation is always flow). The bottom part shows the percentage of times that aggregated variances under the temporal stock scheme are smaller than under the temporal flow scheme. The No Aggreg column and rows stand for results under temporally disaggregated data (column) and individually disaggregated data (rows).

4.3. Static random effects model

We now consider the static random effects model (4.2) with $\gamma = 0$. Since the effects are now part of the error term, the variance-covariance matrix of $W$ is not diagonal. If we could observe $y_{i,t}$, $\delta$ could be estimated by generalized least squares:

$$\hat{\delta} = (Z'V_N^2Z)^{-1}(Z'V_N^2Y),$$

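where the weighting matrix $V_N^2$ is given by (see e.g. Hsiao (2003))

$$V_N^2 = I_I \otimes \frac{1}{\sigma_u^2}\left(I_N - \frac{\sigma_\alpha^2}{\sigma_u^2 + N\sigma_\alpha^2}\, e_N e_N'\right). \tag{4.10}$$

And its variance is

$$\operatorname{Var}(\hat{\delta} \mid Z) = (Z'V_N^2Z)^{-1}. \tag{4.11}$$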

Since we do not observe $y_{i,t}$, but $y_{a,T}$, we multiply (4.2) by $F$:

$$FY = FZ\delta + FW.$$

Note that, as we saw earlier, in the aggregated model the intercept $F\mu$ becomes individual-specific. The variance of $FW$ is given by

$$E(FWW'F') = (F_2F_2') \otimes (F_1\Sigma F_1').$$

And, since $\Sigma = \sigma_u^2 I_N + \sigma_\alpha^2 e_N e_N'$, it can be rewritten as

$$E(FWW'F') = (F_2F_2') \otimes \left(\sigma_u^2\left(\sum_{j=0}^{m-1} w_j^2\right) I_\tau + \sigma_\alpha^2\left(\sum_{j=0}^{m-1} w_j\right)^2 e_\tau e_\tau'\right). \tag{4.12}$$
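The temporal block of (4.12) can be verified numerically. The sketch assumes that $F_1$ applies the same weights $w$ to non-overlapping blocks of $m$ periods, i.e. $F_1 = I_\tau \otimes w'$; the particular weights and variance values are arbitrary choices for the check.

```python
import numpy as np

N, m = 8, 4
tau = N // m
sigma_u2, sigma_a2 = 1.0, 2.0
w = np.array([1.0, 0.5, 0.25, 0.0])          # arbitrary within-block weights

# Assumed structure of the temporal aggregation matrix: F1 = I_tau (kron) w'.
F1 = np.kron(np.eye(tau), w.reshape(1, m))

# Disaggregated error covariance: Sigma = sigma_u^2 I_N + sigma_alpha^2 e_N e_N'.
Sigma = sigma_u2 * np.eye(N) + sigma_a2 * np.ones((N, N))

lhs = F1 @ Sigma @ F1.T
rhs = (sigma_u2 * np.sum(w ** 2) * np.eye(tau)
       + sigma_a2 * np.sum(w) ** 2 * np.ones((tau, tau)))

print(np.allclose(lhs, rhs))
```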

Thus, in the aggregated random effects model the error term presents nonzero covariances within individuals (through the random effects) and heteroscedasticity (through $F_2F_2'$). In order to estimate $\delta$ by least squares we need to weight the observations to obtain a diagonal and homoscedastic variance-covariance matrix for the error term. Similarly to estimation with $y_{i,t}$, the appropriate weighting matrix is the square root of the inverse of the variance-covariance matrix of the aggregated errors, (4.12). It has the same form as (4.10), but with $\sigma_u^2$, $\sigma_\alpha^2$ and $N$ replaced by their aggregate counterparts given in (4.12), and $I_I$ by $(F_2F_2')^{-1}$:

$$\begin{aligned}
V_\tau^2 &= (F_2F_2')^{-1} \otimes (F_1\Sigma F_1')^{-1} \\
&= (F_2F_2')^{-1} \otimes \frac{1}{\sigma_u^2\left(\sum_{j=0}^{m-1} w_j^2\right)}\left(I_\tau - \frac{\sigma_\alpha^2\left(\sum_{j=0}^{m-1} w_j\right)^2}{\sigma_u^2\left(\sum_{j=0}^{m-1} w_j^2\right) + \tau\sigma_\alpha^2\left(\sum_{j=0}^{m-1} w_j\right)^2}\, e_\tau e_\tau'\right).
\end{aligned}$$
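The closed form of the temporal block of $V_\tau^2$ is the usual inverse of a matrix of the form $aI_\tau + b\,e_\tau e_\tau'$. A minimal numerical check, with arbitrary weights and variance values chosen for the illustration, is:

```python
import numpy as np

tau = 5
sigma_u2, sigma_a2 = 1.0, 2.0
w = np.array([1.0, 1.0, 1.0, 1.0])           # flow weights with m = 4 (illustrative)

a = sigma_u2 * np.sum(w ** 2)                 # sigma_u^2 * sum_j w_j^2
b = sigma_a2 * np.sum(w) ** 2                 # sigma_alpha^2 * (sum_j w_j)^2
ones = np.ones((tau, tau))                    # e_tau e_tau'

aggregated_cov = a * np.eye(tau) + b * ones   # temporal block of (4.12)
closed_form_inverse = (np.eye(tau) - b / (a + tau * b) * ones) / a

print(np.allclose(closed_form_inverse @ aggregated_cov, np.eye(tau)))
```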


Multiplying both sides of the model by $V_\tau$:

$$V_\tau FY = V_\tau FZ\delta + V_\tau FW,$$

and following the same lines as in (4.6), this transformation renders $V_\tau FW$ homoscedastic. The ordinary least squares estimator is

$$\hat{\delta}^a = (Z'F'V_\tau^2FZ)^{-1}(Z'F'V_\tau^2FY),$$

with variance

$$\operatorname{Var}(\hat{\delta}^a \mid Z) = (Z'F'V_\tau^2FZ)^{-1}. \tag{4.13}$$

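As in the fixed effects model, variances (4.11) and (4.13) are conditional on $Z$. If $\operatorname{Var}(\hat{\delta} \mid Z) \le \operatorname{Var}(\hat{\delta}^a \mid Z)$ for all $Z(w)$ with $w \in B_{F,Z}$, then the same dominance applies to the unconditional variances, i.e. $\operatorname{Var}(\hat{\delta}) \le \operatorname{Var}(\hat{\delta}^a)$.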

In the above computations the $Z$ matrix is such that the inverses in (4.11) and (4.13) are well defined. For this to be the case we need the following assumption.

Assumption A2. Let $(\Omega_Z, \mathcal{F}_Z, P_Z)$ be the probabilistic triplet associated to $Z$. $P_Z(w : FZ(w) \text{ is of rank } k + 1) = 1$.

As with A1, A2 implies that the model for the aggregated data is identifiable with probability one. The following Proposition proves that this assumption for the aggregated data also implies that the disaggregated model is identifiable with probability one.

Proposition 4. If $FZ$ is of rank $k + 1$ then so is $Z$.

The following Theorem shows that temporal and individual aggregation increases the unconditional variance of the estimates and thus results in an efficiency loss.

Theorem 2. Consider the random effects model (4.2) with $\gamma = 0$, and let $F$ and $Z$ satisfy A2. Then $\operatorname{Var}(\hat{\delta}^a) - \operatorname{Var}(\hat{\delta})$ is semi-positive definite.
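The conditional version of Theorem 2 can be illustrated on a single simulated draw of $Z$. The sketch below is not a proof and not the paper's simulation design: the dimensions, the flow schemes, and the single standard normal regressor are assumptions chosen to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ind, n_per, m, group = 4, 6, 2, 2
sigma_u2, sigma_a2 = 1.0, 2.0

F1 = np.kron(np.eye(n_per // m), np.ones((1, m)))           # flow over time
F2 = np.kron(np.eye(n_ind // group), np.ones((1, group)))   # flow over individuals
F = np.kron(F2, F1)                                         # F = F2 (kron) F1

Sigma = sigma_u2 * np.eye(n_per) + sigma_a2 * np.ones((n_per, n_per))
Omega = np.kron(np.eye(n_ind), Sigma)                       # Var(W), individuals stacked

# Z = (intercept, one regressor); rows ordered by individual, then time.
Z = np.column_stack([np.ones(n_ind * n_per),
                     rng.standard_normal(n_ind * n_per)])

var_dis = np.linalg.inv(Z.T @ np.linalg.inv(Omega) @ Z)                 # (4.11)
FZ = F @ Z
var_agg = np.linalg.inv(FZ.T @ np.linalg.inv(F @ Omega @ F.T) @ FZ)     # (4.13)

eigenvalues = np.linalg.eigvalsh(var_agg - var_dis)
print("eigenvalues of Var(agg) - Var(dis):", eigenvalues)
```

All eigenvalues are non-negative up to rounding error, which is what the theorem predicts conditionally on $Z$.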

As in the fixed effects model, we analyze whether there is a ranking of aggregation schemes in terms of efficiency losses. Let $F_1$ and $F_1^*$ be two $\tau \times N$ matrices representing two different temporal aggregation schemes. Then comparing the efficiency loss of the two aggregation schemes boils down to determining if the matrix

$$F_1'(F_1\Sigma F_1')^{-1}F_1 - F_1^{*\prime}(F_1^*\Sigma F_1^{*\prime})^{-1}F_1^*, \tag{4.14}$$

is semi-positive (negative) definite. This expression is in general indefinite, meaning that there is no dominant aggregation scheme in terms of efficiency, as the following Proposition shows.

Proposition 5. Assume two temporal aggregation schemes $F_1$ and $F_1^*$ characterized by the weights $\omega$ and $\omega^*$. If there is no constant $\lambda \in \mathbb{R} \setminus \{0\}$ such that $\omega = \lambda\omega^*$, then (4.14) is indefinite.
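A small numerical instance of Proposition 5 compares flow weights $(1, 1)$ with stock weights $(1, 0)$, which are not proportional. The block structure $F_1 = I_\tau \otimes \omega'$ and the variance values are assumptions of this sketch.

```python
import numpy as np

N, m = 4, 2
tau = N // m
sigma_u2, sigma_a2 = 1.0, 2.0
Sigma = sigma_u2 * np.eye(N) + sigma_a2 * np.ones((N, N))

def scheme(weights):
    """Temporal aggregation matrix F1 = I_tau (kron) w' (assumed block structure)."""
    return np.kron(np.eye(tau), np.asarray(weights, dtype=float).reshape(1, -1))

def information(F1):
    """One term of (4.14): F1' (F1 Sigma F1')^{-1} F1."""
    return F1.T @ np.linalg.inv(F1 @ Sigma @ F1.T) @ F1

F1_flow = scheme([1.0, 1.0])
F1_stock = scheme([1.0, 0.0])

difference = information(F1_flow) - information(F1_stock)   # matrix (4.14)
eigenvalues = np.linalg.eigvalsh(difference)
print("eigenvalues of (4.14):", eigenvalues)
```

The eigenvalues take both signs, so neither scheme dominates the other in this instance.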

We now turn to individual aggregation. As in the static fixed effects model, comparing two different individual aggregation schemes, $F_2$ and $F_2^*$, boils down to determining whether the difference $P_2 - P_2^*$ is semi-positive or semi-negative definite. This is essentially the same problem as in Proposition 3.

We illustrate the efficiency loss implied by aggregation by means of a simulation study, which has the same structure as the one for the fixed effects model. The differences are that we compare (4.11) and (4.13), we set $\sigma_u^2 = 1$ and $\sigma_\alpha^2 = 2$, and we focus on $\beta$. Table 2 has the same structure as Table 1. In all cases, aggregation entails an important efficiency loss, even if aggregation is mild, confirming Theorem 2. On the other hand, the comparison of the results for different values of $\phi$ confirms Proposition 5: no temporal aggregation scheme dominates another. When $\phi$ is large and positive (negative), flow (stock) temporal aggregation entails a smaller efficiency loss. In fact, from the bottom parts of the sub-panels we conclude that flow (stock) aggregation virtually always performs better when $\phi$ is large and positive (negative). In the case that the regressor is white noise, stock and flow aggregation do not make any difference and the proportion of times that one variance is larger than the other is around 50%.

4.4. Dynamic models

We now study the impact of aggregation on the parameters $\theta = (\gamma, \beta)$ in a dynamic linear model. There are many estimation methods available in the literature. It is well known that the least squares estimator for $\gamma$ has a bias of magnitude $O(T^{-1})$ (see e.g. Hsiao (2003)). To avoid this bias, Arellano and Bond (1991) introduced a GMM estimator, henceforth Arellano-Bond. This is the method we choose since it is widely used in the empirical literature. Note that in this case we do not need to assume the fixed or random effects model (4.3) and (4.4), since estimation is performed by taking first differences and the effects are swept out.

In contrast to the static model, we study separately the effect of individual and temporal aggregation on the estimators for $\theta$. There are two reasons for this choice. The first is that temporal and individual aggregation have very different consequences on the specification of dynamic linear models, as shown in Section 3. Individual aggregation essentially causes heteroscedasticity, while temporal aggregation substantially modifies the structure of the model, inducing moving average terms and dependence with respect to past aggregated exogenous variables. This has important implications in terms of the moment conditions that are used to estimate the parameters. Second, using GMM we obtain the surprising result that aggregation may imply an efficiency gain. That is, estimating with $y_{a,T}$ instead of with $y_{i,t}$ may be preferable. To better grasp the insights of this result we analyze individual and temporal aggregation separately.

Table 2. Efficiency loss: Model (4.2) with $\gamma = 0$.

| Individual (rows) / Temporal (columns) | type 1 | type 2 | type 3 | type 4 | No Aggreg |
|---|---|---|---|---|---|
| $\phi = 0.8$: Flow temporal aggregation | | | | | |
| type 1 | 12.28 | 6.99 | 6.26 | 4.93 | 4.21 |
| type 2 | 8.52 | 5.08 | 4.56 | 3.62 | 3.11 |
| type 3 | 5.29 | 3.26 | 2.94 | 2.36 | 2.04 |
| No Aggreg | 2.48 | 9.92 | 1.43 | 1.15 | 1.00 |
| $\phi = 0.8$: Stock temporal aggregation | | | | | |
| type 1 | 52.69 | 21.93 | 17.17 | 8.41 | 4.21 |
| type 2 | 36.81 | 15.91 | 12.53 | 6.18 | 3.11 |
| type 3 | 23.09 | 10.31 | 8.15 | 4.05 | 2.04 |
| No Aggreg | 10.84 | 9.92 | 3.96 | 1.98 | 1.00 |
| $\phi = 0.8$: % temporal stock < temporal flow | | | | | |
| type 1 | 0.15 | 0 | 0 | 0 | - |
| type 2 | 0 | 0 | 0 | 0 | - |
| type 3 | 0 | 0 | 0 | 0 | - |
| No Aggreg | 0 | 0 | 0 | 0 | - |
| $\phi = -0.8$: Flow temporal aggregation | | | | | |
| type 1 | 394.22 | 96.71 | 95.34 | 40.20 | 4.22 |
| type 2 | 275.02 | 70.41 | 69.97 | 29.87 | 3.11 |
| type 3 | 171.66 | 45.57 | 45.69 | 19.74 | 2.04 |
| No Aggreg | 81.07 | 8.01 | 22.41 | 9.79 | 1.00 |
| $\phi = -0.8$: Stock temporal aggregation | | | | | |
| type 1 | 64.95 | 22.69 | 21.11 | 10.38 | 4.22 |
| type 2 | 46.00 | 16.64 | 15.54 | 7.67 | 3.11 |
| type 3 | 28.73 | 10.80 | 10.11 | 5.03 | 2.04 |
| No Aggreg | 13.53 | 8.01 | 4.93 | 2.47 | 1.00 |
| $\phi = -0.8$: % temporal stock < temporal flow | | | | | |
| type 1 | 99.56 | 100 | 100 | 100 | - |
| type 2 | 99.94 | 100 | 100 | 100 | - |
| type 3 | 100 | 100 | 100 | 100 | - |
| No Aggreg | 100 | 100 | 100 | 100 | - |
| $\phi = 0$: Flow temporal aggregation | | | | | |
| type 1 | 66.84 | 24.56 | 18.63 | 8.48 | 4.05 |
| type 2 | 46.60 | 17.93 | 13.71 | 6.31 | 3.03 |
| type 3 | 29.20 | 11.64 | 8.96 | 4.17 | 2.01 |
| No Aggreg | 13.73 | 21.43 | 4.41 | 2.07 | 1.00 |
| $\phi = 0$: Stock temporal aggregation | | | | | |
| type 1 | 64.43 | 24.56 | 18.60 | 8.45 | 4.05 |
| type 2 | 45.24 | 17.94 | 13.71 | 6.31 | 3.03 |
| type 3 | 28.14 | 11.58 | 8.89 | 4.16 | 2.01 |
| No Aggreg | 13.27 | 21.43 | 4.38 | 2.06 | 1.00 |
| $\phi = 0$: % temporal stock < temporal flow | | | | | |
| type 1 | 52.61 | 50.96 | 50.71 | 50.42 | - |
| type 2 | 51.79 | 50.64 | 50.74 | 50.05 | - |
| type 3 | 53.39 | 51.09 | 51.16 | 51.1 | - |
| No Aggreg | 54.59 | 52.37 | 51.2 | 50.34 | - |

Note: The upper panel shows the results for $\phi = 0.8$, the middle for $\phi = -0.8$ and the bottom for $\phi = 0$. For each sub-panel the upper and middle parts show the average relative efficiency loss when temporal aggregation is flow and stock (individual aggregation is always flow). The bottom part shows the percentage of times that aggregated variances under the temporal stock scheme are smaller than under the temporal flow scheme. The No Aggreg column and rows stand for results under temporally disaggregated data (column) and individually disaggregated data (rows).

We first review the Arellano-Bond estimator. This review is also helpful for introducing further notation. Second, we study the case of individual aggregation. We end with temporal aggregation.

Arellano-Bond is based on the moment conditions

$$\begin{aligned}
E(y_{i,t-2-j}(u_{i,t} - u_{i,t-1})) &= 0, \quad j = 0, 1, 2, \ldots, t-3, \\
E(x_{i,j}(u_{i,t} - u_{i,t-1})) &= 0, \quad j = 1, 2, \ldots, N.
\end{aligned} \tag{4.15}$$

This system of moment conditions can be written compactly as $E(q_{i,t}\Delta u_{i,t}) = 0$, where $\Delta = (1 - L)$ and $q_{i,t} = (y_{i,0}, y_{i,1}, \ldots, y_{i,t-2}, x_{i,1}, \ldots, x_{i,N})'$ is the vector of instruments. We gather the moment conditions for individual $i$ as $E(\Omega_i\Delta u_i) = 0$, where

$$\Omega_i = \begin{pmatrix}
q_{i,2} & 0 & \cdots & 0 \\
0 & q_{i,3} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & q_{i,N}
\end{pmatrix}$$

is the matrix of instruments and $\Delta u_i = (\Delta u_{i,2}, \Delta u_{i,3}, \ldots, \Delta u_{i,N})$. The Arellano-Bond estimator $\hat{\theta}$ is

$$\hat{\theta} = \arg\min_{\theta}\, (\Omega\Delta U)'\Psi^{-1}(\Omega\Delta U),$$

where $\Delta U = (\Delta u_1, \Delta u_2, \ldots, \Delta u_I)$ and $\Omega = (\Omega_1\ \Omega_2\ \ldots\ \Omega_I)$. The optimal choice for $\Psi$ is the variance-covariance matrix of $\Omega\Delta U$, $\Psi = E(\Omega\Delta U\Delta U'\Omega')$, which can be approximated by $\hat{\Psi} = \sigma_u^2(\Omega(I_I \otimes D_ND_N')\Omega')$, where $D_N$ is an $(N-1) \times N$ matrix of the form

$$D_N = \begin{pmatrix}
-1 & 1 & \cdots & 0 & 0 \\
0 & -1 & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & -1 & 1
\end{pmatrix}.$$
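As a small illustration, the first-difference matrix $D_N$ and the product $D_ND_N'$ that enters the approximate weighting matrix can be built as follows. The function name is a hypothetical helper, and $N = 5$ is an arbitrary choice.

```python
import numpy as np

def first_difference_matrix(n):
    """(n-1) x n matrix with -1 on the main diagonal and +1 on the first superdiagonal."""
    return np.eye(n - 1, n, k=1) - np.eye(n - 1, n)

N = 5
D = first_difference_matrix(N)
series = np.array([1.0, 4.0, 9.0, 16.0, 25.0])

differences = D @ series        # u_t - u_{t-1} for t = 2, ..., N
DDt = D @ D.T                   # tridiagonal: 2 on the diagonal, -1 next to it

print(differences)
print(DDt)
```

The tridiagonal form of $D_ND_N'$ reflects the first-order moving average structure that differencing induces in serially uncorrelated errors.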

The variance of $\hat{\theta}$ is given by

$$\operatorname{Var}(\hat{\theta} \mid S, \Omega) = \sigma_u^2\left((S'(I_I \otimes D_N')\Omega')(\Omega(I_I \otimes D_ND_N')\Omega')^{-1}(\Omega(I_I \otimes D_N)S)\right)^{-1}, \tag{4.16}$$

where $S' = (S_1, S_2, \ldots, S_I)$ and $S_i = (y_{i,-1}, x_i)$.

We now turn to the estimation problem when $y_{i,t}$ is only observed aggregated in groups. We apply individual aggregation to the model ($F$ becomes $F_2 \otimes I_N$):

$$(F_2 \otimes I_N)Y = (F_2 \otimes I_N)\alpha + \gamma(F_2 \otimes I_N)Y_{-1} + (F_2 \otimes I_N)X\beta + (F_2 \otimes I_N)U,$$

where $\alpha$ can be fixed or random.$^3$ The aggregate counterparts to the disaggregated orthogonality conditions are

$$\begin{aligned}
E\left(\sum_{i=1}^{I} M_i^a y_{i,t-2-j}\left(\sum_{i=1}^{I} M_i^a (u_{i,t} - u_{i,t-1})\right)\right) &= 0, \quad j = 0, 1, 2, \ldots, t-3, \\
E\left(\sum_{i=1}^{I} M_i^a x_{i,j}\left(\sum_{i=1}^{I} M_i^a (u_{i,t} - u_{i,t-1})\right)\right) &= 0, \quad j = 1, 2, \ldots, N.
\end{aligned} \tag{4.17}$$

Note that the above moment conditions are not exactly equivalent to those in the disaggregated model. However, under our assumption on the error term both (4.15) and (4.17) are verified.

Since Arellano-Bond is robust to the presence of heteroscedasticity, we could estimate $\theta$ immediately. However, because the form of the heteroscedasticity is known, it is more appropriate to control for it directly by pre-multiplying the model by the matrix $(F_2F_2')^{-1/2} \otimes I_N$, which renders the errors homoscedastic:

$$\begin{aligned}
((F_2F_2')^{-1/2}F_2 \otimes I_N)Y ={}& ((F_2F_2')^{-1/2}F_2 \otimes I_N)\alpha + \gamma((F_2F_2')^{-1/2}F_2 \otimes I_N)Y_{-1} \\
&+ ((F_2F_2')^{-1/2}F_2 \otimes I_N)X\beta + ((F_2F_2')^{-1/2}F_2 \otimes I_N)U.
\end{aligned} \tag{4.18}$$

The individually aggregated model (4.18) entails a new matrix of instruments, denoted $\tilde{\Omega}$, which can be shown to be $\tilde{\Omega} = \Omega((F_2F_2')^{-1/2}F_2 \otimes I_N)$. The moment conditions are unchanged by individual aggregation, implying that the matrix of instruments of the aggregated model has the same structure as the disaggregated one.

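The Arellano-Bond estimator for $\theta$ is

$$\hat{\theta}^a = \arg\min_{\theta}\, (\tilde{\Omega}\Delta\tilde{U})'\tilde{\Psi}^{-1}(\tilde{\Omega}\Delta\tilde{U}),$$

where $\Delta\tilde{U}' = (\Delta\tilde{u}_1, \Delta\tilde{u}_2, \ldots, \Delta\tilde{u}_A)$. The optimal weighting matrix is given by

$$\tilde{\Psi} = E(\tilde{\Omega}\Delta\tilde{U}\Delta\tilde{U}'\tilde{\Omega}') = E(\Omega(P_2 \otimes I_N)\Delta U\Delta U'(P_2 \otimes I_N)\Omega'),$$

which can be approximated by $\hat{\Psi} = \sigma_u^2(\Omega(P_2 \otimes D_ND_N')\Omega')$. The variance of $\hat{\theta}^a$ is given by

$$\operatorname{Var}(\hat{\theta}^a \mid S, \Omega) = \sigma_u^2\left((S'(P_2 \otimes D_N')\Omega')(\Omega(P_2 \otimes D_ND_N')\Omega')^{-1}(\Omega(P_2 \otimes D_N)S)\right)^{-1}. \tag{4.19}$$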

Theorems 1 and 2 in the static model showed that aggregation entails an efficiency loss. In the dynamic case, for temporal aggregation, we obtain the result shown below that it is not possible to establish an equivalent Theorem. In other words, parameters estimated with aggregated data may be more efficient than with disaggregated data.

3 We should include the term $(F_2 \otimes I_N)\mu$ on the right-hand side if $\alpha$ is random. For the sake of presentation we continue with the fixed effects model. However, as said above, for estimation purposes it does not matter if $\alpha$ is fixed or random.

We know that the estimator based on the disaggregated data is more efficient if

$$\operatorname{Var}(\hat{\theta} \mid S, \Omega)^{-1} - \operatorname{Var}(\hat{\theta}^a \mid S, \Omega)^{-1} \tag{4.20}$$

is semi-positive definite. However, this is not always true. To see why, use (4.16) and (4.19) to express (4.20) as

$$S'C'(CC')^{-1}CS - S'PC'(CPC')^{-1}CPS, \tag{4.21}$$

where $C = \Omega(I_I \otimes D_N)$ and $P = (P_2 \otimes I_N)$. This is the difference between projected variances of $S$. The first matrix represents the variance that is projected on the column space of the matrix $C'$, while the second is the variance that is projected on the column space of the matrix $PC'$. Since the column spaces generated by those two matrices are different, there is no reason for this difference to be semi-positive or semi-negative definite.$^4$ Similarly to the static model, we assume that all the inverses appearing in (4.16) and (4.19) are well defined with probability one.

Assumption A3. Let $(\Omega_m, \mathcal{F}, P)$ be the triplet associated to the random matrix $(S, (I_I \otimes D_N')\Omega)$. There exists a set $B_F \in \mathcal{F}$ such that $P(B_F) = 1$ and, for all $w \in B_F$, the inverses in (4.16) and (4.19) are well defined.

The fact that (4.21) can be either semi-positive or semi-negative definite means that $\operatorname{Var}(\hat{\theta}^a \mid S, \Omega) - \operatorname{Var}(\hat{\theta} \mid S, \Omega)$ can be either semi-positive or semi-negative definite, which implies that aggregation is not unconditionally dominated, i.e. that

$$\operatorname{Var}(\hat{\theta}^a) - \operatorname{Var}(\hat{\theta}) \tag{4.22}$$

is not necessarily semi-positive definite.

As a check, we simulate 10000 times from (4.4) for $\beta = 0$, where $u_{i,t} \sim \text{iid } N(0, 1)$, $\alpha_i \sim \text{iid } N(0, 1)$, and $\gamma$ takes the values 0.8, 0.4, −0.4 and −0.8. We consider 12 individuals and four different numbers of time periods: 4, 8, 10 and 20. For $N = 20$, the GMM estimator should be close to the least squares estimator and the disaggregated estimator should always deliver smaller variances. Individuals are aggregated as flows (i.e. $M_i^a = 1\ \forall a$ and all $i \in a$) and the individual aggregation schemes are the same as in the static cases: i) type 1: $A = 3$, $I_a = 4\ \forall a$, ii) type 2: $A = 4$, $I_a = 3\ \forall a$, iii) type 3: $A = 6$, $I_a = 2\ \forall a$. That is, we aggregate in groups of equal size and the group size decreases with the type. Table 3 shows the simulation results, which are divided into four panels, one for each value of $\gamma$. Each panel is divided further into two subpanels. The upper subpanel shows the average relative efficiency loss and the bottom subpanel shows the percentage of times that disaggregated variances are smaller than the aggregated variances. The upper subpanels show that, on average, aggregation entails an information loss. But the bottom subpanels show that there are cases in which $\hat{\gamma}^a$ is more efficient than $\hat{\gamma}$. The proportion of times for which disaggregation is better than aggregation decreases with $\gamma$. As $\gamma$ tends to −1, the proportion tends to 1. As $N$ increases, the disaggregated GMM estimator is always better than the aggregated one.

4 Note that the preparatory Lemma 1 cannot be used in this context to determine whether (4.21) is semi-positive or semi-negative definite. Define $\varphi_1(C') = C'$, $\varphi_2(C') = PC'$ and $\psi(C') = C'z$, where $z \in \mathbb{R}^{(N-1)(N+2)/2}$. The structure of the matrix $C'$ is such that its columns are not linearly independent, meaning that they cannot be a basis of the column space they generate. Since they are not a basis, a vector lying in the column space of $C'$ can have many representations in terms of $z$. This means that the function $\psi$ is not injective.

Last, we study the estimation problem when $y_{i,t}$ is observed aggregated in time. From now on we assume that $\gamma > 0$; however, our results can be adapted to the case $\gamma < 0$. We first express the model in the form of (4.3) or (4.4) and apply $F$:

$$(1 - \gamma^mL^m)FY = \left(\frac{1 - \gamma^m}{1 - \gamma}\right)F\alpha + FX\beta + \cdots + FX_{-m+1}\beta\gamma^{m-1} + \eta,$$

where $\eta = U + \cdots + U_{-m+1}\gamma^{m-1}$. As mentioned in Section 2, this model has a constrained dynamic response with respect to the exogenous variables and it can be written as:

$$\begin{aligned}
FY &= \left(\frac{1 - c_0}{1 - c_0^{1/m}}\right)F\alpha + FY_{-m}c_0 + FXc_1 + FX_{-1}c_2 + \cdots + FX_{-m+1}c_m + \eta \\
&= \left(\frac{1 - c_0}{1 - c_0^{1/m}}\right)F\alpha + \mathcal{X}c + \eta,
\end{aligned} \tag{4.23}$$

where $\mathcal{X} = (Y_{-m}, X, \ldots, X_{-m+1})$, $c_0 = \gamma^m$, $c_1 = \beta$, $c_2 = \gamma\beta$, ..., $c_m = \gamma^{m-1}\beta$, and $c = (c_0, c_1, \ldots, c_m)$. A typical element of (4.23) is

$$y_{i,T} = \left(\sum_{j=0}^{m-1} c_0^{j/m}\right)\left(\sum_{j=0}^{m-1} w_j\right)\alpha_i + c_0y_{i,T-m} + x_{i,t}c_1 + \cdots + x_{i,t-m+1}c_m + \eta_{i,T},$$

where $c_{m+1} = E(\eta_{a,T}\eta_{a,T-m}) = f(c_0) \neq 0$ (except for stock aggregation) and $E(\eta_{a,T}\eta_{a,T-sm}) = 0$ for $s = \pm 2, \pm 3, \pm 4, \ldots$. The constraints on the coefficients of the aggregated model can be written in an implicit form as

$$g(c) = \begin{pmatrix}
c_2 - c_0^{1/m}c_1 \\
c_3 - c_0^{2/m}c_1 \\
\vdots \\
c_m - c_0^{(m-1)/m}c_1 \\
c_{m+1} - f(c_0)
\end{pmatrix}.$$
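The mapping from $(\gamma, \beta)$ to $c$ and the first $m - 1$ rows of $g(c)$ can be sketched as follows. The last row, $c_{m+1} - f(c_0)$, is omitted because the form of $f$ is not given in this section; the function names and the parameter values are assumptions of the illustration, and $\gamma > 0$ is required for the fractional powers.

```python
import numpy as np

def aggregated_coefficients(gamma, beta, m):
    """Return c = (c_0, c_1, ..., c_m) with c_0 = gamma^m and c_j = gamma^(j-1) * beta."""
    return np.array([gamma ** m] + [beta * gamma ** (j - 1) for j in range(1, m + 1)])

def coefficient_constraints(c, m):
    """First m-1 rows of g(c): c_j - c_0^((j-1)/m) * c_1 for j = 2, ..., m."""
    c0, c1 = c[0], c[1]
    return np.array([c[j] - c0 ** ((j - 1) / m) * c1 for j in range(2, m + 1)])

gamma, beta, m = 0.8, 1.5, 4
c = aggregated_coefficients(gamma, beta, m)
g_values = coefficient_constraints(c, m)

# A coefficient vector that ignores the restrictions violates them.
c_unrestricted = c.copy()
c_unrestricted[2] += 0.1
g_violated = coefficient_constraints(c_unrestricted, m)

print("c =", c)
print("g(c) on the restricted vector:", g_values)
print("g(c) after perturbing c_2   :", g_violated)
```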

Table 3. Efficiency loss: GMM and individual aggregation.

| | N = 4 | N = 8 | N = 10 | N = 20 |
|---|---|---|---|---|
| $\gamma = 0.8$: average relative efficiency loss | | | | |
| type 1 | 1.24 | 1.36 | 1.45 | 1.86 |
| type 2 | 1.36 | 1.76 | 2.00 | 2.79 |
| type 3 | 1.43 | 2.26 | 2.61 | 3.76 |
| $\gamma = 0.8$: % disaggregated < aggregated | | | | |
| type 1 | 64.4 | 82.3 | 91.16 | 100 |
| type 2 | 67.4 | 91.2 | 97.2 | 100 |
| type 3 | 67.6 | 95.1 | 98.9 | 100 |
| $\gamma = 0.4$: average relative efficiency loss | | | | |
| type 1 | 1.90 | 1.73 | 1.79 | 1.97 |
| type 2 | 2.61 | 2.51 | 2.64 | 2.96 |
| type 3 | 3.16 | 3.35 | 3.55 | 3.99 |
| $\gamma = 0.4$: % disaggregated < aggregated | | | | |
| type 1 | 93.4 | 99.75 | 99.9 | 100 |
| type 2 | 96.1 | 100 | 100 | 100 |
| type 3 | 97.2 | 100 | 100 | 100 |
| $\gamma = -0.4$: average relative efficiency loss | | | | |
| type 1 | 2.08 | 1.94 | 1.97 | 2.01 |
| type 2 | 3.18 | 2.97 | 3.00 | 3.04 |
| type 3 | 4.31 | 4.08 | 4.08 | 4.10 |
| $\gamma = -0.4$: % disaggregated < aggregated | | | | |
| type 1 | 99.8 | 100 | 100 | 100 |
| type 2 | 100 | 100 | 100 | 100 |
| type 3 | 100 | 100 | 100 | 100 |
| $\gamma = -0.8$: average relative efficiency loss | | | | |
| type 1 | 2.21 | 2.09 | 2.09 | 2.07 |
| type 2 | 3.55 | 3.30 | 3.28 | 3.20 |
| type 3 | 5.07 | 4.71 | 4.61 | 4.41 |
| $\gamma = -0.8$: % disaggregated < aggregated | | | | |
| type 1 | 99.9 | 100 | 100 | 100 |
| type 2 | 100 | 100 | 100 | 100 |
| type 3 | 100 | 100 | 100 | 100 |

Note: The upper panel shows the results for $\gamma = 0.8$, the second from the top for $\gamma = 0.4$, the second from the bottom for $\gamma = -0.4$ and the bottom for $\gamma = -0.8$. For each sub-panel the upper part shows the average relative efficiency loss when individual aggregation is flow. The bottom part shows the percentage of times that disaggregated variances are smaller than the aggregated variances. All results are under the individual flow scheme.

There are no constraints on $c_0$ and $c_1$ as they are the parameters to estimate: $c_0 = \gamma^m$ and $c_1 = \beta$. Since the aggregated residuals are autocorrelated of order one at the aggregated frequency, the moment conditions are different from the disaggregated ones. More specifically, the vector of instruments at frequency $T$ is $q_{i,T}^a = (y_{i,0}, y_{i,m}, \ldots, y_{i,T-2m}, x_{i,1}, \ldots, x_{i,N})'$. These instruments can be gathered in the matrix

$$\Omega_i^a = \begin{pmatrix}
q_{i,2m}^a & 0 & \cdots & 0 \\
0 & q_{i,3m}^a & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & q_{i,\tau}^a
\end{pmatrix}.$$

Let $\Delta\eta_i' = (\Delta\eta_{i,2m}, \Delta\eta_{i,3m}, \ldots, \Delta\eta_{i,\tau})$, where $\Delta = (1 - L^m)$. Then the moment conditions for individual $i$ are

$$E(\Omega_i^a\Delta\eta_i) = \begin{pmatrix}
E(q_{i,2m}^a\Delta\eta_{i,2m}) \\
E(q_{i,3m}^a\Delta\eta_{i,3m}) \\
\vdots \\
E(q_{i,\tau}^a\Delta\eta_{i,\tau})
\end{pmatrix} = \begin{pmatrix}
K_2(c_{m+1}) \\
K_3(c_{m+1}) \\
\vdots \\
K_\tau(c_{m+1})
\end{pmatrix} = K(c_{m+1}),$$

where a typical element $s$ of $K(c_{m+1})$ is

$$K_s(c_{m+1}) = (\underbrace{0, \ldots, 0}_{s-2 \text{ times}},\ c_{m+1},\ \underbrace{0, \ldots, 0}_{N \text{ times}})',$$

with the non-zero element corresponding to the autocorrelation of order one in $\eta$ created by temporal aggregation: $c_{m+1} = f(c_0)$. For instance, $E(q_{i,2m}^a\Delta\eta_{i,2m})$ is a system of $1 + N$ equations with first element $E(y_{i,0}\Delta\eta_{i,2m}) = E(y_{i,0}(\eta_{i,2m} - \eta_{i,m}))$, which is not zero because $\eta_{i,m}$ follows an AR(1) process and hence it is related with $y_{i,0}$.

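We therefore face a GMM problem with moment conditions that are not zero and with parameter constraints. Let $\Omega^a = (\Omega_1^a\ \Omega_2^a\ \ldots\ \Omega_I^a)$, $\Delta\Upsilon' = (\Delta\eta_1, \Delta\eta_2, \ldots, \Delta\eta_I)$, and let $\lambda$ be a vector of Lagrange multipliers. Estimates of $c$ and $\lambda$ are (see e.g. Gourieroux and Monfort (1995))

$$(\hat{c}, \hat{\lambda}) = \arg\min_{c,\lambda}\, L(c) + \lambda'g(c), \tag{4.24}$$

where

$$L(c) = (\Omega^a\Delta\Upsilon - K(c_{m+1}))'\Psi^{-1}(\Omega^a\Delta\Upsilon - K(c_{m+1})),$$

with optimal weighting matrix $\Psi^{-1}$, where $\Psi = E((\Omega^a\Delta\Upsilon - K(c_{m+1}))(\Omega^a\Delta\Upsilon - K(c_{m+1}))')$. The minimization of (4.24) is a nonlinear problem with solutions that cannot be found analytically. We know, however (see e.g. Gourieroux and Monfort (1995)), the asymptotic variance of $\sqrt{\tau A}(\hat{c} - c)$:

$$\operatorname{Var}(\sqrt{\tau A}(\hat{c} - c) \mid \Omega^a) = (I_N - M)\mathcal{J}_0^{-1}\mathcal{I}_0\mathcal{J}_0^{-1}(I_N - M)',$$

where

$$M = \mathcal{J}_0^{-1}\frac{\partial g(c)'}{\partial c}\left(\frac{\partial g(c)}{\partial c}\mathcal{J}_0^{-1}\frac{\partial g(c)'}{\partial c}\right)^{-1}\frac{\partial g(c)}{\partial c}$$

with

$$\mathcal{J}_0 = \lim_{\tau,A \to \infty} -\frac{1}{\tau A}\frac{\partial^2 L(c)}{\partial c\,\partial c'} \quad \text{and} \quad \mathcal{I}_0 = \lim_{\tau,A \to \infty} \frac{1}{\tau A}\frac{\partial L(c)}{\partial c}\frac{\partial L(c)}{\partial c'}.$$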

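Table 4. Efficiency loss: GMM and temporal aggregation.

| | N = 4 | N = 8 | N = 10 | N = 20 |
|---|---|---|---|---|
| $\gamma = 0.8$: average relative efficiency loss | | | | |
| m = 2 | 4.05 | 2.42 | 2.09 | 1.67 |
| m = 4 | - | $15 \times 10^3$ | - | 4.25 |
| m = 5 | - | - | $26 \times 10^4$ | 6.90 |
| m = 10 | - | - | - | $11 \times 10^3$ |
| $\gamma = 0.8$: % disaggregated < aggregated | | | | |
| m = 2 | 100 | 100 | 100 | 100 |
| m = 4 | - | 100 | - | 100 |
| m = 5 | - | - | 100 | 100 |
| m = 10 | - | - | - | 100 |
| $\gamma = 0.4$: average relative efficiency loss | | | | |
| m = 2 | 6.86 | 5.10 | 4.76 | 4.18 |
| m = 4 | - | $32 \times 10^3$ | - | 113.9 |
| m = 5 | - | - | $76 \times 10^4$ | 684.6 |
| m = 10 | - | - | - | $57 \times 10^6$ |
| $\gamma = 0.4$: % disaggregated < aggregated | | | | |
| m = 2 | 94.73 | 100 | 100 | 100 |
| m = 4 | - | 100 | - | 100 |
| m = 5 | - | - | 100 | 100 |
| m = 10 | - | - | - | 100 |
| $\gamma = -0.4$: average relative efficiency loss | | | | |
| m = 2 | 16.64 | 8.09 | 6.77 | 4.74 |
| m = 4 | - | $49 \times 10^3$ | - | 131 |
| m = 5 | - | - | $49 \times 10^3$ | 758 |
| m = 10 | - | - | - | $42 \times 10^7$ |
| $\gamma = -0.4$: % disaggregated < aggregated | | | | |
| m = 2 | 100 | 100 | 100 | 100 |
| m = 4 | - | 100 | - | 100 |
| m = 5 | - | - | 100 | 100 |
| m = 10 | - | - | - | 100 |
| $\gamma = -0.8$: average relative efficiency loss | | | | |
| m = 2 | 49.10 | 11.80 | 6.81 | 2.92 |
| m = 4 | - | $35 \times 10^6$ | - | 7.53 |
| m = 5 | - | - | 15.68 | 5.47 |
| m = 10 | - | - | - | $13 \times 10^5$ |
| $\gamma = -0.8$: % disaggregated < aggregated | | | | |
| m = 2 | 100 | 100 | 100 | 100 |
| m = 4 | - | 100 | - | 100 |
| m = 5 | - | - | 100 | 100 |
| m = 10 | - | - | - | 100 |

Note: The upper panel shows the results for $\gamma = 0.8$, the second from the top for $\gamma = 0.4$, the second from the bottom for $\gamma = -0.4$ and the bottom for $\gamma = -0.8$. For each sub-panel the upper part shows the average relative efficiency loss under temporal aggregation. The bottom part shows the percentage of times that disaggregated variances are smaller than the aggregated variances. All results are under the temporal stock scheme.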

Assume that $\gamma > 0$ (this assumption simplifies the presentation; the next equation could also be rewritten for the case $\gamma < 0$, since the disaggregated model is identified). Then from the estimated parameter vector $\hat{c}$ we can recover the parameters of the disaggregated model: $\hat{\gamma} = \hat{c}_0^{1/m}$ and $\hat{\beta} = \hat{c}_1$. Their variances can be backed out using the delta method:

$$\operatorname{Var}(\hat{\theta}^a \mid \Omega^a) = \Pi(c)\,z\operatorname{Var}(\hat{c})z'\,\Pi(c)', \tag{4.25}$$

where

$$z = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \end{pmatrix} \quad \text{and} \quad \Pi(c) = \begin{pmatrix} \frac{1}{m}c_0^{(1-m)/m} & 0 \\ 0 & 1 \end{pmatrix}.$$

To ensure the comparison of the variances we assume that $X$, $U$ and $F$ are such that all the inverses are well defined with probability one, in a similar way to the previous cases.
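The delta-method step can be sketched as follows: the selection matrix $z$ picks the rows and columns of $\operatorname{Var}(\hat{c})$ that correspond to $(c_0, c_1)$, and $\Pi(c)$ is the Jacobian of $(c_0, c_1) \mapsto (c_0^{1/m}, c_1)$. The function name and the numerical values of $\hat{c}$ and its variance are hypothetical and serve only the illustration.

```python
import numpy as np

def delta_method_variance(c_hat, var_c, m):
    """Variance of (gamma, beta) = (c_0^(1/m), c_1) implied by the variance of c_hat."""
    n = len(c_hat)
    z = np.zeros((2, n))
    z[0, 0] = 1.0                       # selects c_0
    z[1, 1] = 1.0                       # selects c_1
    jacobian = np.array([[c_hat[0] ** ((1 - m) / m) / m, 0.0],
                         [0.0, 1.0]])   # Pi(c)
    return jacobian @ z @ var_c @ z.T @ jacobian.T

m = 2
gamma, beta = 0.8, 1.5
c_hat = np.array([gamma ** m, beta, gamma * beta, 0.0])   # hypothetical estimate
var_c = np.diag([0.04, 0.09, 0.01, 0.01])                  # hypothetical Var(c_hat)

var_theta = delta_method_variance(c_hat, var_c, m)
print(var_theta)
```

With $m = 2$ and $\hat{c}_0 = 0.64$, the derivative of $c_0^{1/2}$ is $0.5 / 0.8 = 0.625$, so the variance of $\hat{\gamma}$ equals $0.625^2$ times the variance of $\hat{c}_0$.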

To illustrate the information loss due to temporal aggregation, we consider asimulation exercise with the same set up as the previous one. Time is aggregatedas stock and for each N we aggregate every 2, 4, 5 and 10 periods whenever it ispossible (e.g. for N = 8 we aggregate every 2 and 4 periods but for N = 10 weaggregate every 2 and 5 periods). Table 4 shows the simulation results, which aredivided into four panels, each for one value of γ. Each panel is further dividedinto two subpanels. The upper subpanel shows the average relative efficiencyloss and the bottom subpanel shows the percentage of times that aggregatedvariances are smaller than the disaggregated variances. In all cases aggregationentails a loss of efficiency, and the loss can be very large. Two things can explainthe presence of large variances. First, as data are aggregated through time, thecorrelation between two successive observations decreases. Thus the quality ofthe instruments used in the GMM deteriorates which leads to an increase in thevariance. Second, to compute the variance of the GMM estimators we need toperform two matrix inversions, increasing the probability of numerical instabilityand the second cause of such large variances.5 But as N increases the efficiencyloss decreases. This is due to the fact that as N increases, τ also increases,improving the accuracy of the estimates. This intuition is also reflected by thefact that as m increases the relative efficiency loss increases.

⁵ We also did the simulation exercise with medians instead of means. In some cases the relative efficiency loss gave more reasonable numbers but not always. Qualitative results however remain unchanged (i.e. aggregation entails an efficiency loss). Results are available upon request.

5. Conclusion

We study the impact of individual and temporal aggregation in linear static and dynamic models for panel data in terms of model specification and efficiency of the estimated parameters. Model wise we find that i) individual aggregation does not affect the model structure, while temporal aggregation may introduce residual autocorrelation, and ii) individual aggregation entails heteroscedasticity while temporal aggregation does not. Estimation wise we find that i) in the static model, estimation by least squares with the aggregated data entails a decrease in the efficiency of the estimated parameters and there is no dominant aggregation scheme in terms of efficiency, ii) in the dynamic model, estimation by GMM does not necessarily entail a decrease in the efficiency of the estimated parameters under individual aggregation and no analytic comparison can be established for temporal aggregation, though simulation results suggest that temporal aggregation deteriorates the accuracy of the estimates.

Appendix A

We start with two preparatory lemmas. In the following lemma let $x$ and $y$ be two elements of a Hilbert space; we will denote their inner product by $x'y$.

Preparatory Lemma 1. Let $C$ be a real Hilbert space. Consider two projection operators $\pi_1$ and $\pi_2$ defined on $C$. Let $C_1^{\perp}$ and $C_2^{\perp}$ be the null spaces associated with the operators $\pi_1$ and $\pi_2$. If $C_1^{\perp} \subset C_2^{\perp}$, then $x'(\pi_1 - \pi_2)x \ge 0$ for all $x \in C$.

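Proof. For each $\pi_i$ divide the vectorial space $C$ as $C_i \oplus C_i^{\perp}$, where $\oplus$ denotes the direct sum. Consider now $v'(\pi_1 - \pi_2)v$. First rewrite $v$ as $v_2 + v_2^{\perp}$, where $v_2 \in C_2$ and $v_2^{\perp} \in C_2^{\perp}$. We have

\begin{align*}
v'(\pi_1 - \pi_2)v &= (v_2 + v_2^{\perp})'(\pi_1 - \pi_2)(v_2 + v_2^{\perp}) \\
&= (v_2 + v_2^{\perp})'\pi_1(v_2 + v_2^{\perp}) - v_2'v_2,
\end{align*}

since $v_2^{\perp\prime}\pi_2 = 0$ and $v_2'\pi_2 = v_2'$. The assumption $C_1^{\perp} \subset C_2^{\perp}$ implies $C_2 \subset C_1$, and thus $\pi_1 v_2 = v_2$ and the above equation can be rewritten as

\[
v_2'\pi_1 v_2^{\perp} + v_2^{\perp\prime}\pi_1 v_2 + v_2^{\perp\prime}\pi_1 v_2^{\perp}. \tag{A.1}
\]

Now $v_2^{\perp} \in C$, thus we can write $v_2^{\perp} = z_1 + z_1^{\perp}$ where $z_1 \in C_1$ and $z_1^{\perp} \in C_1^{\perp}$, and (A.1) becomes

\[
v_2'z_1 + z_1'v_2 + z_1'z_1. \tag{A.2}
\]

Finally notice that since $z_1 \in C_2^{\perp} \cap C_1$ and $v_2 \in C_2$, the first two terms of (A.2) are equal to zero. Thus we can rewrite (A.2) as $z_1'z_1 = v_2^{\perp\prime}\pi_1 v_2^{\perp}$. And since $\pi_1$ is a projection matrix, it is semi-positive definite and the Lemma follows.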
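A quick numerical check of Preparatory Lemma 1 in finite dimension may help fix ideas. The sketch below assumes orthogonal projectors on $\mathbb{R}^6$ and builds $C_2$ as a subspace of $C_1$ (equivalently $C_1^{\perp} \subset C_2^{\perp}$); the dimensions, the seed and the helper name are arbitrary choices made for illustration only.

```python
import numpy as np

def orthogonal_projector(basis):
    """Orthogonal projector onto the column space of `basis`."""
    q, _ = np.linalg.qr(basis)   # reduced QR: columns of q are orthonormal
    return q @ q.T

rng = np.random.default_rng(0)
basis_1 = rng.standard_normal((6, 4))   # spans C_1
basis_2 = basis_1[:, :2]                # spans C_2, a subspace of C_1
pi_1 = orthogonal_projector(basis_1)
pi_2 = orthogonal_projector(basis_2)

difference = pi_1 - pi_2
# Smallest eigenvalue of pi_1 - pi_2; the lemma says it cannot be negative.
min_eig = np.linalg.eigvalsh(difference).min()
# The difference is itself a projector (onto the part of C_1 orthogonal to C_2).
is_projector = np.allclose(difference @ difference, difference)
```

Up to rounding error, `min_eig` is zero and `is_projector` is true, in line with the statement that $x'(\pi_1 - \pi_2)x \ge 0$ for all $x$.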

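Preparatory Lemma 2. Let $B$, $C$ and $D$ be three vectorial spaces. Consider two linear forms $\varphi_1$ and $\varphi_2$ such that $\varphi_i : B \to C$ for $i = 1, 2$. Let $\psi : C \to D$ be an injective linear form. Then

\[
\varphi_1(B) \subset \varphi_2(B) \iff \psi \circ \varphi_1(B) \subset \psi \circ \varphi_2(B).
\]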


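Proof. ($\Rightarrow$) Assume $\varphi_1(B) \subset \varphi_2(B)$; then by definition we have $\psi \circ \varphi_1(B) \subset \psi \circ \varphi_2(B)$. ($\Leftarrow$) Assume that $\psi \circ \varphi_1(B) \subset \psi \circ \varphi_2(B)$. Since $\psi$ is injective, $\psi^{-1}$ is unique. Then for each $v \in \psi \circ \varphi_1(B) \subset \psi \circ \varphi_2(B)$ we have that $\psi^{-1}(v) \in \varphi_1(B)$ and, because $v$ also belongs to $\psi \circ \varphi_2(B)$, that $\psi^{-1}(v) \in \varphi_2(B)$, implying that $\varphi_1(B) \subset \varphi_2(B)$.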
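Preparatory Lemma 2 can be illustrated with matrices, where the image of a linear map is the column space of its matrix. The sketch below is a hypothetical check, not part of the original argument: inclusion of column spaces is tested through a rank comparison, and $\psi$ is taken to be a unit upper triangular matrix so that it is invertible by construction.

```python
import numpy as np

def is_subspace(a, b, tol=1e-8):
    """True when the column space of `a` is contained in that of `b`."""
    rank_b = np.linalg.matrix_rank(b, tol=tol)
    return np.linalg.matrix_rank(np.hstack([b, a]), tol=tol) == rank_b

rng = np.random.default_rng(3)
phi_2 = rng.standard_normal((5, 3))
phi_1 = phi_2 @ rng.standard_normal((3, 2))   # image contained in that of phi_2
phi_3 = rng.standard_normal((5, 2))           # generic image, not contained
# Unit upper triangular matrix: determinant one, hence injective.
psi = np.triu(rng.standard_normal((5, 5)), k=1) + np.eye(5)

before = (is_subspace(phi_1, phi_2), is_subspace(phi_3, phi_2))
after = (is_subspace(psi @ phi_1, psi @ phi_2),
         is_subspace(psi @ phi_3, psi @ phi_2))
```

The two tuples coincide: composing with an injective map neither creates nor destroys an inclusion between images.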

Proof of Proposition 1. We will prove this result in the more general case when there are $k \ge 1$ regressors, thus $X$ is $NI \times k$. The proof is by contraposition. Start by assuming that the matrix $X'Q_{N,I}X$ is non invertible, i.e. $\operatorname{rank}(Q_{N,I}X) < k$. Thus there exists a sequence of coefficients $a_j$, $j = 1, \ldots, k-1$, such that the columns in $Q_{N,I}X$ satisfy

\[
\sum_{j=1}^{k-1} a_j [Q_{N,I}X]_{\cdot,j} = [Q_{N,I}X]_{\cdot,k}
\]

or

\[
Q_{N,I}\Biggl(\sum_{j=1}^{k-1} a_j [X]_{\cdot,j} - [X]_{\cdot,k}\Biggr) = Q_{N,I}L = 0,
\]

where $[\cdot]_{\cdot,j}$ denotes the $j$th column. In other words, the vector $L$ is in the null space of the matrix $Q_{N,I}$. Due to the form of $Q_{N,I}$, this implies that $L = (b \otimes e_N)$ for some $b \in \mathbb{R}^I$. But this also implies that

\begin{align*}
\Biggl(\sum_{j=1}^{k-1} a_j [Q_{\tau,A}VFX]_{\cdot,j} - [Q_{\tau,A}VFX]_{\cdot,k}\Biggr)
&= Q_{\tau,A}VF\Biggl(\sum_{j=1}^{k-1} a_j [X]_{\cdot,j} - [X]_{\cdot,k}\Biggr) \\
&= Q_{\tau,A}VFL \\
&= \bigl((F_2F_2')^{-1/2}F_2 b \otimes Q_\tau F_1 e_N\bigr) = 0,
\end{align*}

which means that the matrix $X'F'VQ_{\tau,A}VFX$ is non invertible.
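The argument can be checked numerically on a small example. The sketch below builds the matrices from the Kronecker forms used in this appendix ($Q_{N,I} = I_I \otimes Q_N$, $Q_{\tau,A} = I_A \otimes Q_\tau$, $V = (F_2F_2')^{-1/2} \otimes I_\tau$, $F = F_2 \otimes F_1$, $F_1 = I_\tau \otimes \omega'$). The specific dimensions, the flow weights $\omega = (1, 1)'$ and the grouping matrix $F_2$ that sums pairs of individuals are assumptions made only for this illustration.

```python
import numpy as np

def demean(n):
    """Q_n = I_n - e_n e_n' / n."""
    return np.eye(n) - np.ones((n, n)) / n

def inv_sqrt(a):
    """Symmetric inverse square root of a positive definite matrix."""
    vals, vecs = np.linalg.eigh(a)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

n_ind, n_grp, tau, m = 4, 2, 3, 2          # I, A, tau, m (hypothetical sizes)
n_per = tau * m                            # N
omega = np.ones(m)                         # assumed flow weights
f1 = np.kron(np.eye(tau), omega[None, :])                    # tau x N
f2 = np.kron(np.eye(n_grp), np.ones((1, n_ind // n_grp)))    # A x I, assumed
f = np.kron(f2, f1)
v = np.kron(inv_sqrt(f2 @ f2.T), np.eye(tau))
q_dis = np.kron(np.eye(n_ind), demean(n_per))                # Q_{N,I}
q_agg = np.kron(np.eye(n_grp), demean(tau))                  # Q_{tau,A}

rng = np.random.default_rng(1)
b = rng.standard_normal(n_ind)
x = rng.standard_normal((n_ind * n_per, 2))
# Second regressor differs from the first by L = kron(b, e_N).
x[:, 1] = x[:, 0] + np.kron(b, np.ones(n_per))

rank_dis = np.linalg.matrix_rank(q_dis @ x, tol=1e-8)
rank_agg = np.linalg.matrix_rank(q_agg @ v @ f @ x, tol=1e-8)
```

Both ranks equal one although there are two regressors, so non-invertibility in the disaggregated model carries over to the aggregated model, as the proof states.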

Proof of Theorem 1. We must show that the difference

\[
\sigma_u^2 \Biggl(\sum_{j=0}^{m-1} w_j^2\Biggr)(X'F'V'Q_{\tau,A}VFX)^{-1} - \sigma_u^2 (X'Q_{N,I}X)^{-1}
\]

is semi-positive definite for all $X$. This is equivalent to showing that the difference

\[
(X'Q_{N,I}X) - \frac{1}{\sum_{j=0}^{m-1} w_j^2}\,(X'F'V'Q_{\tau,A}VFX) \tag{A.3}
\]

is semi-positive definite. Note that

\begin{align*}
F'V'Q_{\tau,A}VF &= (F_2' \otimes F_1')\bigl((F_2F_2')^{-1/2} \otimes I_\tau\bigr)(I_A \otimes Q_\tau)\bigl((F_2F_2')^{-1/2} \otimes I_\tau\bigr)(F_2 \otimes F_1) \\
&= \bigl(F_2'(F_2F_2')^{-1}F_2 \otimes F_1'Q_\tau F_1\bigr).
\end{align*}


Let

\[
\bar{F}_1 = \frac{1}{\sqrt{\sum_{j=0}^{m-1} w_j^2}}\, F_1 \quad \text{and} \quad P_2 = F_2'(F_2F_2')^{-1}F_2,
\]

where $\bar{F}_1$ is a normalized version of the temporal aggregation matrix and $P_2$ is a projection matrix onto the column space of $F_2'$. Then we can rewrite (A.3) as

\begin{align*}
& X'\bigl((I_I \otimes Q_N) - (P_2 \otimes \bar{F}_1'Q_\tau \bar{F}_1)\bigr)X \tag{A.4} \\
&\quad = X'\bigl[((I_I - P_2) \otimes Q_N) + (P_2 \otimes (Q_N - \bar{F}_1'Q_\tau \bar{F}_1))\bigr]X.
\end{align*}

To show that (A.3) is semi-positive definite, it is sufficient to show that, by the properties of the Kronecker product, all the factors of the Kronecker products in (A.4) are semi-positive definite.

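The matrices $P_2$, $I_I - P_2$ and $Q_N$ are semi-positive definite as they are projection matrices. The matrix $Q_N - \bar{F}_1'Q_\tau \bar{F}_1$ is the difference of two projection matrices (since $\bar{F}_1\bar{F}_1' = I_\tau$). By the preparatory Lemma 1, to prove that the difference is semi-positive definite, it is sufficient to show that the null space of $Q_N$, $N^{\perp}$, is included in the null space of $\bar{F}_1'Q_\tau \bar{F}_1$, $M^{\perp}$, i.e. $N^{\perp} \subset M^{\perp}$. This can be shown by noting that $N^{\perp} = \{\lambda e_N\}_{\lambda \in \mathbb{R}}$, the set of all $N$-dimensional vectors whose entries are all equal to $\lambda \in \mathbb{R}$. Since

\begin{align*}
\bar{F}_1'Q_\tau \bar{F}_1 \lambda e_N
&= \lambda \Biggl(\sum_{j=0}^{m-1} w_j^2\Biggr)^{-1} (I_\tau \otimes \omega)\, Q_\tau\, (I_\tau \otimes \omega')\, e_N \\
&= \lambda \Biggl(\sum_{j=0}^{m-1} w_j^2\Biggr)^{-1} \Bigl((I_\tau \otimes \omega)(I_\tau \otimes \omega') - (I_\tau \otimes \omega)\tfrac{1}{\tau} e_\tau e_\tau' (I_\tau \otimes \omega')\Bigr) e_N \\
&= \lambda \Biggl(\sum_{j=0}^{m-1} w_j^2\Biggr)^{-1} (Q_\tau \otimes \omega\omega')(e_\tau \otimes e_m) \\
&= \lambda \Biggl(\sum_{j=0}^{m-1} w_j^2\Biggr)^{-1} (0 \otimes \omega\omega' e_m) = 0,
\end{align*}

where the last equality follows since a demeaned vector of ones results in a vector of zeros. Hence $N^{\perp} \subset M^{\perp}$.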
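The conclusion of the proof can be verified numerically for a particular design. The sketch below forms the matrix inside the quadratic form of (A.4), using a normalized temporal aggregation matrix (written `f1_bar` in the code) and $P_2 = F_2'(F_2F_2')^{-1}F_2$, and inspects its smallest eigenvalue. The weights, the sizes and the grouping matrix are arbitrary assumptions chosen for illustration.

```python
import numpy as np

def demean(n):
    """Q_n = I_n - e_n e_n' / n."""
    return np.eye(n) - np.ones((n, n)) / n

n_ind, n_grp, tau = 4, 2, 3                # I, A, tau (hypothetical sizes)
omega = np.array([0.5, 1.0, 2.0])          # arbitrary aggregation weights
m = omega.size
n_per = tau * m                            # N

f1 = np.kron(np.eye(tau), omega[None, :])  # F_1 = kron(I_tau, omega')
f1_bar = f1 / np.sqrt(omega @ omega)       # normalized version of F_1
f2 = np.kron(np.eye(n_grp), np.ones((1, n_ind // n_grp)))   # assumed grouping
p2 = f2.T @ np.linalg.inv(f2 @ f2.T) @ f2

# Matrix inside the quadratic form in (A.4).
core = (np.kron(np.eye(n_ind), demean(n_per))
        - np.kron(p2, f1_bar.T @ demean(tau) @ f1_bar))
min_eig = np.linalg.eigvalsh(core).min()

# The normalization makes f1_bar f1_bar' the identity, as used in the proof.
rows_orthonormal = np.allclose(f1_bar @ f1_bar.T, np.eye(tau))

# A vector of ones lies in the null space of f1_bar' Q_tau f1_bar.
ones_in_null = np.allclose(
    f1_bar.T @ demean(tau) @ f1_bar @ np.ones(n_per), 0.0
)
```

Up to rounding, `min_eig` is zero, so $X'(\cdot)X$ is semi-positive definite for any $X$ in this design.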

Remark to the Proof of Theorem 1. If the temporal aggregation frequency is at least 2 ($m \ge 2$), which always happens if we aggregate temporally, the inclusion $N^{\perp} \subset M^{\perp}$ is strict. To see this, note that $\{e_\tau \otimes z\}_{z \in \mathbb{R}^m} \subset M^{\perp}$ but generally $\not\subset N^{\perp}$.


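Proof of Proposition 2. Rewrite (4.8) as

\[
\Biggl(\sum_{j=0}^{m-1} w_j^2\Biggr)^{-1} (Q_\tau \otimes \omega\omega') - \Biggl(\sum_{j=0}^{m-1} \tilde{w}_j^2\Biggr)^{-1} (Q_\tau \otimes \tilde{\omega}\tilde{\omega}').
\]

Consider now the family of vectors of the form $z \otimes \omega^c$ where $z \in \mathbb{R}^\tau$ and $\omega^c$ is such that $\omega'\omega^c = 0$. By construction such a vector is in the null space of $\bar{F}_1'Q_\tau \bar{F}_1$ since

\begin{align*}
\bar{F}_1'Q_\tau \bar{F}_1 (z \otimes \omega^c)
&= \Biggl(\sum_{j=0}^{m-1} w_j^2\Biggr)^{-1} (Q_\tau \otimes \omega\omega')(z \otimes \omega^c) \\
&= \Biggl(\sum_{j=0}^{m-1} w_j^2\Biggr)^{-1} (Q_\tau z \otimes \omega\omega'\omega^c) = 0.
\end{align*}

However generally $\tilde{\omega}'\omega^c \neq 0$. To see this note that the conditions $\omega'\omega^c = 0$ and $\tilde{\omega}'\omega^c = 0$ define two hyperplanes in the $m$-dimensional space. Since $\omega \neq \lambda\tilde{\omega}$ for all $\lambda \in \mathbb{R} \setminus \{0\}$ these hyperplanes are not parallel and the result follows.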
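A small numerical example makes the argument concrete. The sketch below takes two non-proportional weight vectors, assumed here to be flow weights $(1, 1)'$ and stock weights $(0, 1)'$, and a vector of the form $z \otimes \omega^c$ with $\omega^c$ orthogonal to the first weight vector. All numbers and names are hypothetical, and $z$ is chosen not to be constant over time so that $Q_\tau z \neq 0$.

```python
import numpy as np

def demean(n):
    """Q_n = I_n - e_n e_n' / n."""
    return np.eye(n) - np.ones((n, n)) / n

def normalized_block(weights, tau):
    """(sum_j w_j^2)^{-1} kron(Q_tau, w w') for a weight vector w."""
    return np.kron(demean(tau), np.outer(weights, weights)) / (weights @ weights)

tau = 3
omega = np.array([1.0, 1.0])        # first scheme (assumed flow weights)
omega_alt = np.array([0.0, 1.0])    # second scheme, not proportional to omega
omega_c = np.array([1.0, -1.0])     # orthogonal to omega
z = np.array([1.0, 2.0, 4.0])       # not constant over time
vec = np.kron(z, omega_c)

in_null_first = np.allclose(normalized_block(omega, tau) @ vec, 0.0)
in_null_second = np.allclose(normalized_block(omega_alt, tau) @ vec, 0.0)
```

The vector is annihilated by the first matrix but not by the second, so the two null spaces differ and neither scheme can dominate the other through the inclusion argument of Preparatory Lemma 1.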

Proof of Proposition 3. If rank(A) > A then there is at least one row of $F_2$ that is not a linear combination of the rows in $\tilde{F}_2$, and this implies that the columns of $F_2'$ and $\tilde{F}_2'$ generate different spaces.

Proof of Proposition 4. Immediate since $k + 1 = \operatorname{rank}(FZ) \le \min(\operatorname{rank}(F), \operatorname{rank}(Z)) = \operatorname{rank}(Z)$.

Proof of Theorem 2. The proof heavily relies on the proof of Theorem 1. We need to show that

\[
Z'V_N^2 Z - Z'F'V_\tau^2 FZ = Z'(V_N^2 - F'V_\tau^2 F)Z
\]

is semi-positive definite. We first rewrite

\[
F'V_\tau^2 F = F_2'(F_2F_2')^{-1}F_2 \otimes F_1'(F_1\Sigma F_1')^{-1}F_1.
\]

So

\begin{align*}
& Z'(V_N^2 - F'V_\tau^2 F)Z \tag{A.5} \\
&\quad = Z'\bigl[((I_I - P_2) \otimes \Sigma^{-1}) + (P_2 \otimes (\Sigma^{-1} - F_1'(F_1\Sigma F_1')^{-1}F_1))\bigr]Z.
\end{align*}

The proof is completed if each component of the Kronecker product is semi-positive definite. The matrix $\Sigma^{-1}$ is the inverse of a positive definite matrix, and hence it is a positive definite matrix. The matrices $I_I - P_2$ and $P_2$ are projection matrices, and hence semi-positive definite. Finally

\[
\Sigma^{-1} - F_1'(F_1\Sigma F_1')^{-1}F_1 = \Sigma^{-1/2}\bigl(I_N - \Sigma^{1/2}F_1'(F_1\Sigma F_1')^{-1}F_1\Sigma^{1/2}\bigr)\Sigma^{-1/2}.
\]


Since the expression within brackets is a projection matrix, the entire matrix product is semi-positive definite.
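The last step can be checked numerically. The sketch below draws an arbitrary positive definite $\Sigma$, builds $F_1 = I_\tau \otimes \omega'$ with assumed flow weights, and verifies that $\Sigma^{1/2}F_1'(F_1\Sigma F_1')^{-1}F_1\Sigma^{1/2}$ is idempotent and that $\Sigma^{-1} - F_1'(F_1\Sigma F_1')^{-1}F_1$ has no negative eigenvalue. The sizes, the seed and the way $\Sigma$ is generated are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
tau, m = 3, 2
n_per = tau * m                                    # N
omega = np.ones(m)                                 # assumed flow weights
f1 = np.kron(np.eye(tau), omega[None, :])          # tau x N

# Arbitrary positive definite covariance matrix.
a = rng.standard_normal((n_per, n_per))
sigma = a @ a.T + n_per * np.eye(n_per)

# Symmetric square root of sigma through its eigendecomposition.
vals, vecs = np.linalg.eigh(sigma)
sigma_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T

middle = f1.T @ np.linalg.inv(f1 @ sigma @ f1.T) @ f1
diff = np.linalg.inv(sigma) - middle
proj = sigma_half @ middle @ sigma_half

is_idempotent = np.allclose(proj @ proj, proj)
# Symmetrize before eigvalsh to remove rounding asymmetry.
min_eig = np.linalg.eigvalsh((diff + diff.T) / 2).min()
```

Here `is_idempotent` is true and `min_eig` is zero up to rounding, which matches the projection argument used to close the proof.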

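Proof of Proposition 5. Start by rewriting (4.14) as

\[
\Sigma^{-1/2\prime}\bigl(\Sigma^{1/2\prime}F_1'(F_1\Sigma F_1')^{-1}F_1\Sigma^{1/2} - \Sigma^{1/2\prime}\tilde{F}_1'(\tilde{F}_1\Sigma \tilde{F}_1')^{-1}\tilde{F}_1\Sigma^{1/2}\bigr)\Sigma^{-1/2}.
\]

Since the matrix $\Sigma^{-1/2\prime}$ is invertible we only need to consider the matrix difference between the brackets. Let $\varphi_1(z) = F_1'z$ and $\varphi_2(z) = \tilde{F}_1'z$ where $z \in \mathbb{R}^\tau$ and $\varphi_1$ and $\varphi_2$ are linear forms mapping $\mathbb{R}^\tau$ to $\mathbb{R}^T$. Define $\psi(Z) = \Sigma^{1/2\prime}Z$ where $Z \in \mathbb{R}^{T \times \tau}$. Thus we have that $\psi \circ \varphi_1 = \Sigma^{1/2\prime}F_1'$ and $\psi \circ \varphi_2 = \Sigma^{1/2\prime}\tilde{F}_1'$. Since $\Sigma^{1/2\prime}$ is invertible the mapping $\psi$ is injective, the hypotheses of the preparatory Lemma 2 are fulfilled, and hence we only need to compare the column spaces of $F_1'$ and $\tilde{F}_1'$.

We can assume without loss of generality that the first component of the vector $\omega$ is non zero. Consider the vector $F_1'z$ where $z = (1, 0, \ldots, 0)' \in \mathbb{R}^\tau$. By assumption there is no $\lambda \in \mathbb{R}$ such that $\lambda \tilde{F}_1'z = F_1'z$. Thus the column spaces of $F_1'$ and $\tilde{F}_1'$ do not coincide.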

Acknowledgements

We are grateful to participants of the Third Brussels-Waseda seminar in Time Series and Financial Statistics and to Antonio Estache, Christian Hafner, Marc Hallin, Cheng Hsiao, Helmut Lutkepohl, Chris Muris, Paolo Paruolo and Masanobu Taniguchi for useful comments. We are very grateful to Davy Paindaveine for his detailed reading of the article and sharp comments. Any remaining errors and inaccuracies are ours. David Veredas acknowledges financial support from the Belgian National Bank, and the IAP P6/07 contract, from the IAP program (Belgian Scientific Policy), ‘Economic policy and finance in the global economy’.

References

Amemiya, T. and Wu, R. (1972). The effect of aggregation on prediction in the autoregressive model, J. Am. Stat. Assoc., 67, 628–632.

Arellano, M. and Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations, Rev. Econ. Stud., 58, 277–297.

Brewer, K. R. W. (1973). Some consequences of temporal aggregation and systematic sampling for ARMA and ARMAX models, J. Econom., 1(2), 133–154.

Chambers, M. J. (1973). Long memory and aggregation in macroeconomic time series, Int. Econ. Rev., 39, 1053–1072.

Drost, F. and Nijman, T. (1993). Temporal aggregation of Garch processes, Econometrica, 61(4), 909–927.

Garrett, T. A. (2003). Aggregated versus disaggregated data in regression analysis: Implications for inference, Econ. Lett., 81, 61–65.

Geweke, J. (1978). Temporal aggregation in the multiple regression model, Econometrica, 46(3), 643–661.

Gourieroux, C. and Monfort, A. (1995). Statistics and Econometric Models, Cambridge University Press.


Granger, C. W. J. (1980). Long memory relationships and aggregation of dynamic models, J. Econom., 14, 227–238.

Granger, C. W. J. (1987). Implications of aggregation with common factors, Econ. Theory, 33, 208–222.

Hsiao, C. (2003). Analysis of Panel Data, Econometric Society Monographs, Second edition.

Jorda, O. and Marcellino, M. (2004). Time-scale transformations of discrete time processes, J. Time Ser. Anal., 25, 873–894.

Lutkepohl, H. (1986). Forecasting Aggregated Vector ARMA Processes, Springer Verlag.

Palm, F. and Nijman, E. (1984). Missing observations in the dynamic regression model, Econometrica, 52(6), 1415–1436.

Pesaran, M. H. (2003). Aggregation of linear dynamic models: an application to life-cycle consumption models under habit formation, Econ. Model., 20, 227–435.

Schmid, M. and Schneeweiss, H. (2009). The effect of microaggregation by individual ranking on the estimation of moments, J. Econom., 153(2), 174–182.

Silvestrini, A. and Veredas, D. (2008). Temporal aggregation of univariate time series models, J. Econ. Surv., 22(3), 458–497.

Sims, C. (1971). Discrete approximation to continuous time distributed lags in econometrics, Econometrica, 39(3), 545–563.

Tesler, L. (1967). Discrete samples and moving sums in stationary stochastic processes, J. Am. Stat. Assoc., 62, 484–499.

Theil, H. (1954). Linear aggregations of economic relations, Amsterdam, North Holland Publishing Company.

Tiao, G. C. (1972). Asymptotic behaviour of temporal aggregates of time series, Biometrika, 59, 525–531.

Tilak, A. (1998). Forecasting Singapore’s quarterly GDP with monthly external trade, Int. J. Forecast., 14, 505–513.

Tilak, A. (2000). Modeling variables of different frequencies, Int. J. Forecast., 16, 117–119.

Wei, W. W. S. (1978). Some consequences of temporal aggregation in seasonal time series models, in Zellner, A. (ed.) Seasonal Analysis of Economic Time Series, US Department of Commerce, Bureau of the Census, Washington, DC.

Weiss, A. (1984). Systematic sampling and temporal aggregation in time series models, J. Econom., 26, 271–281.

Zellner, A. (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias, J. Am. Stat. Assoc., 57, 348–368.

Zellner, A. and Montmarquette, C. (1971). A study of some aspects of temporal aggregation problems in econometric analysis, The Review of Economics and Statistics, 53(4), 335–342.