
Scandinavian Journal of Statistics, Vol. 38: 89–107, 2011

doi: 10.1111/j.1467-9469.2010.00711.x
© 2010 Board of the Foundation of the Scandinavian Journal of Statistics. Published by Blackwell Publishing Ltd.

Series Estimation in Partially Linear In-Slide Regression Models

JINHONG YOU

School of Statistics and Management, Shanghai University of Finance and Economics

XIAN ZHOU

Department of Actuarial Studies, Macquarie University

YONG ZHOU

Institute of Applied Mathematics, Chinese Academy of Sciences

ABSTRACT. The partially linear in-slide model (PLIM) is a useful tool for econometric analysis and for normalizing microarray data. In this article, by using series approximations and a least squares procedure, we propose a semiparametric least squares estimator (SLSE) for the parametric component and a series estimator for the non-parametric component. Under weaker conditions than those imposed in the literature, we show that the SLSE is asymptotically normal and that the series estimator attains the optimal convergence rate of non-parametric regression. We also investigate the estimation of the error variance. In addition, we propose a wild block bootstrap-based test for the form of the non-parametric component. Some simulation studies are conducted to illustrate the finite sample performance of the proposed procedure. An application to a set of economic data is also presented.

Key words: fixed effect, partially linear, semiparametric least squares, series approximation

1. Introduction

Semiparametric models have been widely used in many scientific fields because of their flexibility for both parametric and non-parametric modelling. Proposed forms include partially linear regression models (e.g. Engle et al., 1986), partially nonlinear regression models (e.g. Andrews, 1996), single-index regression models (e.g. Ichimura, 1993; Delecroix et al., 2003) and varying-coefficient index regression models (e.g. Fan et al., 2003). Partially linear models with fixed effects have recently gained attention in modelling economics and genomics data sets (e.g. Baltagi & Li, 2002; Fan et al., 2004, 2005). In this article, we consider efficient inference for such models.

A fixed-effects partially linear model can be written as:

Y_ij = α_i + X_ij^T β + m(U_ij) + ε_ij,  i = 1, …, G, j = 1, …, I,  (1)

where the Y_ij are the response variables, the α_i are unknown constants representing the fixed effects, X_ij = (X_ij1, …, X_ijp)^T and U_ij are strictly exogenous regressors, β = (β_1, …, β_p)^T is an unknown p × 1 parameter vector, m(·) is an unknown function, the ε_ij are i.i.d. random errors with mean 0 and variance σ², and the superscript T denotes transpose. Throughout the rest of this article, we assume that I is finite and G is large. In addition, following Fan et al. (2005) we choose ∑_{i=1}^G α_i = 0 as our identification condition.

Model (1) differs from the usual partially linear regression models (Engle et al., 1986) because of the nuisance parameters α_i, whose number grows with the sample


size. It can be viewed as an extension of the usual fixed-effects parametric model to the semiparametric structure. Fixed-effects models are appropriate if the interest is in a specific set of subjects. Such models have been widely used in econometric analysis. For example, Lichtenberg (1988) used fixed effects to study the internal costs associated with the introduction of new plants and equipment into manufacturing. Details of fixed-effects modelling can be found in Baltagi (1995) and Ahn & Schmidt (2000). Recently, the fixed-effects model has also found application in genomics. For instance, Fan et al. (2004, 2005) used model (1) to conduct a microarray analysis of neuroblastoma cells in response to the macrophage migration inhibitory factor.

Because of the diverging number of fixed effects in (1), estimating the parametric and non-parametric components poses a challenge. Baltagi & Li (2002) proposed a difference-based series estimation (DSE) procedure for the parametric component β and the non-parametric component m(·). They established the asymptotic normality of the former and derived the convergence rate of the latter. Fan et al. (2005) proposed a profile least squares estimator (PLSE) for β and a non-parametric estimator for m(·) by combining the local linear technique and least squares. They established the asymptotic normality of the former and derived an upper bound on the mean-squared error of the latter. Moreover, they found that the parameters {α_i} cannot be consistently estimated. This is an interesting extension of the partial consistency phenomenon observed by Neyman & Scott (1948). Compared with the PLSE, the DSE approach is computationally simple and it is straightforward to establish its asymptotic properties. The DSE, however, is asymptotically inefficient for I > 2, since it ignores the correlation between the errors caused by differencing.

In this article, by using series approximations to the non-parametric component and a least squares procedure, we propose a semiparametric least squares estimator (SLSE) for the parametric component and a series estimator for the non-parametric component. Our approach is based on function approximation through basis expansions, and can easily be applied to this semiparametric linear model. All parameters of the components in this model can be estimated by the usual least squares method. In particular, this estimation method is robust against sparse observations at some distinct points, whereas the kernel and local linear estimators require smoothing by binning of the non-parametric component; our SLSE does not involve such work. Under weaker conditions than those imposed by Fan et al. (2005), we show that the SLSE is asymptotically normal and that the series estimator attains the optimal convergence rate of non-parametric regression. We further investigate the estimation of the error variance.

Another important statistical question in fitting model (1) is whether m(·) can be described by a parametric structure. This amounts to testing whether m(·) is of a certain parametric form. We extend the generalized likelihood ratio test proposed by Fan et al. (2001) to the setting of partially linear in-slide models (PLIMs). A wild block bootstrap method is proposed for finding the null distribution of the test statistic. Simulation studies confirm that the resulting testing procedure is powerful and that the bootstrap method gives the right null distribution.

The layout of the remainder of this article is as follows. In section 2, we present the estimators of the parametric component, the non-parametric component and the error variance. Large sample properties of these estimators are developed in section 3. Section 4 introduces a wild block bootstrap-based test for the form of the non-parametric component. The selection of the smoothing parameter is discussed in section 5. Simulation studies are reported in section 6. An application of the proposed model and methods to a set of economic data is demonstrated in section 7, followed by concluding remarks in section 8. All proofs are relegated to the Appendix.


2. Semiparametric least squares estimation

In this section we will present an SLSE of the parametric component, a series estimator of the non-parametric component and a residual-based estimator of the error variance. These estimators are constructed by combining series approximation and least squares.

Suppose that n = GI and B = (B_1, …, B_n)^T is the n × G matrix of the form B = I_G ⊗ 1_I, where 1_I denotes the I × 1 vector of ones. The non-parametric component m(u) can be approximated by φ^T(u)ϑ, where φ(u) = (φ_{k_n,1}(u), …, φ_{k_n,k_n}(u))^T is a vector of approximating functions, such as power series or B-splines, ϑ is an unknown k_n-variate constant vector and k_n is a positive integer depending on n. Then model (1) can be written as:

Y = Bα + Xβ + Φϑ + ε*,  (2)

where Y = (Y_11, …, Y_1I, …, Y_GI)^T, α = (α_1, …, α_G)^T, X = (X_11, …, X_1I, …, X_GI)^T, β = (β_1, …, β_p)^T, Φ = (φ(U_11), …, φ(U_1I), …, φ(U_GI))^T is the n × k_n matrix with φ(U_ij) = (φ_{k_n,1}(U_ij), …, φ_{k_n,k_n}(U_ij))^T, ε* = ε + M − Φϑ and M = (m(U_11), …, m(U_1I), …, m(U_GI))^T. Define M_B = I_n − P_B = I_n − B(B^T B)^{-1} B^T = I_n − (1/I) B B^T. Premultiplying (2) by M_B leads to

M_B Y = M_B X β + M_B Φ ϑ + M_B ε*.  (3)

Obviously, if we take M_B ε* as the residuals, model (3) is a version of the usual linear regression. By the usual 'profile' or 'partialling out' formula, the estimator of β is given by

β_n = (X^T M_B M_{M_BΦ} M_B X)^{-1} X^T M_B M_{M_BΦ} M_B Y,  (4)

where M_{M_BΦ} = I_n − P_{M_BΦ} = I_n − M_B Φ (Φ^T M_B Φ)^− Φ^T M_B. An estimator of ϑ is

ϑ_n = (Φ^T M_B Φ)^− Φ^T M_B (Y − X β_n),

where A^− denotes any generalized inverse of the matrix A. Then, an obvious estimator of m(u) is

m_n(u) = φ^T(u) ϑ_n,  (5)

which is a non-parametric projection estimator. Moreover, since tr(P_B) = (1/I) tr(I_G) · tr(1_I 1_I^T) = G, based on β_n and ϑ_n we propose an estimator of the error variance σ² as:

σ²_n = (n − G − p − k_n)^{-1} (Y − X β_n − Φ ϑ_n)^T M_B (Y − X β_n − Φ ϑ_n).  (6)

In the next section, we will investigate the asymptotic properties of β_n, m_n(·) and σ²_n, such as consistency and asymptotic normality.
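To make the construction above concrete, here is a minimal numerical sketch (ours, not the authors' code) of the SLSE, the series estimator and the variance estimator (4)–(6). It uses a power-series basis and NumPy's minimum-norm least squares in place of explicit generalized inverses; the function name `slse` and the data layout (observations ordered so that each run of I consecutive rows is one block i) are our own assumptions.

```python
import numpy as np

def slse(Y, X, U, G, I, kn):
    """Semiparametric least squares for model (1), sketched with a
    power-series basis of size kn. Rows must be ordered block-wise:
    observations (i, 1), ..., (i, I) are consecutive."""
    n = G * I
    Phi = np.vander(U, kn, increasing=True)          # n x kn basis matrix

    # M_B = I_n - (1/I) B B^T simply centers each block of I rows.
    def demean(A):
        A = A.reshape(G, I, -1)
        return (A - A.mean(axis=1, keepdims=True)).reshape(n, -1)

    Yt, Xt, Pt = demean(Y[:, None]), demean(X), demean(Phi)

    # M_{M_B Phi} A = A - (M_B Phi)(Phi^T M_B Phi)^- (M_B Phi)^T A;
    # lstsq's minimum-norm solution plays the role of the generalized inverse.
    def project_out_basis(A):
        return A - Pt @ np.linalg.lstsq(Pt, A, rcond=None)[0]

    # beta_n from (4), vartheta_n from the generalized-inverse formula
    beta = np.linalg.lstsq(project_out_basis(Xt),
                           project_out_basis(Yt), rcond=None)[0].ravel()
    vth = np.linalg.lstsq(Pt, Yt.ravel() - Xt @ beta, rcond=None)[0]

    # sigma^2_n from (6)
    res = Yt.ravel() - Xt @ beta - Pt @ vth
    sigma2 = res @ res / (n - G - X.shape[1] - kn)

    def m_hat(u):                                    # series estimator (5)
        return np.vander(np.atleast_1d(u), kn, increasing=True) @ vth
    return beta, sigma2, m_hat
```

Note that, as in the paper, the fixed effects α_i are swept out by M_B rather than estimated, and m_n(·) is identified only up to the intercept absorbed by the demeaning.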

3. Large sample properties of the proposed estimators

Let h_l(U_ij) = E(X_ijl | U_ij). By the same argument as in Donald & Newey (1994), the convergence rates of the estimators proposed in section 2 depend on approximation error bounds for m(·) and the h_l(·). We assume the existence of functions ζ_m(k_n) and ζ_{h_l}(k_n) satisfying the magnitude restrictions specified next: there are ϑ_m and ϑ_{h_l} such that

sup_n { (1/n) ∑_{i=1}^G ∑_{j=1}^I E[m(U_ij) − φ^T(U_ij) ϑ_m]² } ≤ ζ_m(k_n)


and

sup_n { (1/n) ∑_{i=1}^G ∑_{j=1}^I E[h_l(U_ij) − φ^T(U_ij) ϑ_{h_l}]² } ≤ ζ_{h_l}(k_n),  l = 1, …, p,

where φ(·) is defined in the previous section.

Obviously, for specific approximating functions, the magnitude of these approximation errors will be determined by the degree of smoothness of the functions being approximated. To facilitate the notation, we make the following definition for the degree of smoothness.

Definition 1. A function f(u) is said to be smooth of degree d_f if it is continuously differentiable of order [d_f], the integer part of d_f, and there is a constant c such that, for each partial derivative ∇f(u) of order [d_f], |∇f(u_1) − ∇f(u_2)| ≤ c|u_1 − u_2|^{d_f − [d_f]}.

According to Lorentz (1986) and Schumaker (1981), for power and B-spline series, if m(·) is smooth of degree d_m, then ζ_m(k_n) = O(k_n^{−d_m}).

To present the asymptotic properties of β_n, m_n(·) and σ²_n we make the following assumptions. They are not the weakest possible conditions, but are imposed to facilitate the technical proofs. Let

η_ij = X_ij − E(X_ij | U_ij).  (7)

Assumptions.

(A1) (X^T_i1, …, X^T_iI, U_i1, …, U_iI) are i.i.d. over i = 1, …, G. The ε_ij are i.i.d. over i = 1, …, G and j = 1, …, I. Furthermore,

(i) E[(η_1· − η̄_1)(η_1· − η̄_1)^T] = Σ and E(ε²_11) = σ², where η_1· = (η_11, …, η_1I) and η̄_1 = I^{-1} ∑_{j=1}^I η_1j.

(ii) E(|η_1jl|^{2+δ}) ≤ c < ∞ and E(|ε_1j|^{2+δ}) ≤ c < ∞ for j = 1, …, I, l = 1, …, p and some δ > 0, where η_1jl is the l-th element of η_1j.

(A2) The support of U_ij is compact and there are extensions of m(u) and h_l(u), l = 1, …, p, to a compact cube containing the support such that the extensions are smooth of degree d_m and d_{h_l}, respectively. Also, the φ_{k_n,s}(·) are either power series or B-spline series.

Remark 1. In Fan et al. (2005), it was assumed that each component of X_ij is bounded and that (X^T_ij, U_ij) are i.i.d. over i = 1, …, G and j = 1, …, I. These assumptions have been relaxed in (A1). In particular, we do not require (X^T_ij, U_ij) to be independent over j = 1, …, I, so that (X^T_ij1, U_ij1) and (X^T_ij2, U_ij2) can be correlated for j_1 ≠ j_2. Assumption (A2) is also used by Donald & Newey (1994).

We are now ready to establish the main results on the large sample properties of β_n, m_n(·) and σ²_n. Theorem 1 provides the asymptotic normality of β_n.

Theorem 1. Under assumptions (A1) and (A2), if

√n k_n^{−(d_m + min_{1≤l≤p} d_{h_l})} → 0 as n → ∞,  (8)

then

√n (β_n − β) →_D N(0, σ² I Σ^{-1}) as n → ∞,

where Σ is defined in assumption (A1).

Remark 2. Obviously, there exist k_n satisfying (8) if d_m + min_{1≤l≤p} d_{h_l} > 1/2, which holds, for example, if m(·) and the h_l(·) are Lipschitz continuous of order 1/4 or higher. In Fan et al. (2005), the function m(·) was assumed to have a bounded second derivative and the h_l(·) to be Lipschitz continuous in u, which are stronger conditions than (8). Under the assumptions of Fan et al. (2005), the asymptotic variance in theorem 1 is given by

σ² I Σ^{-1} = σ² I/(I − 1) {E(η_11 η^T_11)}^{-1}.

Next, we consider convergence rates for the estimator m_n(·) of m(·). By the same method as in Donald & Newey (1994), the difference between m(·) and m_n(·) will be measured by the square root of the sample mean-squared error (MSE), defined as:

(1/√n) ‖m_n(·) − m(·)‖ = { (1/n) ∑_{i=1}^G ∑_{j=1}^I (m_n(U_ij) − m(U_ij))² }^{1/2}.

The convergence rate is given by the following theorem.

Theorem 2. Under assumptions (A1), (A2) and (8), if k_n = c n^{1/(2d_m + 1)} for some positive constant c, then

(1/√n) ‖m_n(·) − m(·)‖ = O_p(n^{−d_m/(2d_m + 1)}).

Therefore, m_n(·) attains the optimal non-parametric regression convergence rate.

Theorem 3 shows that σ²_n is asymptotically normal.

Theorem 3. Under assumptions (A1), (A2) and (8), if E(ε⁴_11) < ∞, then

√n (σ²_n − σ²) →_D N(0, Ω) as n → ∞,

where Ω = E(ε⁴_11) + (3 − I)/(I − 1) σ⁴.

To apply theorems 1 and 3 to make statistical inferences, such as constructing confidence intervals for β and σ², we need consistent estimators of σ² I Σ^{-1} and Ω. Define

Σ_n = n σ²_n (X^T M_B M_{M_BΦ} M_B X)^{-1},  ε̂ = (ε̂_1, …, ε̂_n)^T = M_B(Y − X β_n − Φ ϑ_n)

and

Ω_n = (1 + 6/I² − 3/I³ − 4/I)^{-1} [ (1/n) ∑_{i=1}^n ε̂_i⁴ − ((I − 1)/I²)(6 − 9/I) σ⁴_n ] + ((3 − I)/(I − 1)) σ⁴_n.

Theorem 4 shows that Σ_n and Ω_n are consistent estimators of σ² I Σ^{-1} and Ω, respectively.

Theorem 4. Under the conditions of theorem 3, Σ_n →_p σ² I Σ^{-1} and Ω_n →_p Ω as n → ∞.
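As a concrete check on the plug-in estimator Ω_n, the following sketch (ours, not the authors' code) implements the formula from the pseudoresiduals and verifies it in the normal-error case, where Ω = E(ε⁴_11) + (3 − I)/(I − 1)σ⁴ = 3σ⁴ + (3 − I)/(I − 1)σ⁴. The helper name `omega_n` is our own.

```python
import numpy as np

def omega_n(ehat, sigma2, I):
    """Plug-in estimator Omega_n of the asymptotic variance of sigma2_n.
    `ehat` are the pseudoresiduals M_B(Y - X beta_n - Phi vth_n) and
    `sigma2` is the estimate sigma^2_n from (6)."""
    pref = 1.0 / (1.0 + 6.0 / I**2 - 3.0 / I**3 - 4.0 / I)
    fourth = np.mean(ehat**4)                     # (1/n) sum of ehat_i^4
    return (pref * (fourth - (I - 1) / I**2 * (6.0 - 9.0 / I) * sigma2**2)
            + (3.0 - I) / (I - 1) * sigma2**2)
```

For i.i.d. N(0, σ²) errors one can verify algebraically that the two bracketed terms combine to 3σ⁴ in the limit (e.g. for I = 4: the prefactor is 1/0.328125 and the bracket tends to 1.6875σ⁴ − 0.703125σ⁴), so Ω_n → 3σ⁴ − σ⁴/3, matching Ω.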


4. Wild block bootstrap to test the form of the non-parametric component

Another important statistical question in fitting model (1) is whether there exists a parametric structure for m(·). This amounts to testing whether m(·) is of a certain parametric form. Because α cannot be consistently estimated, model validation becomes a great challenge. We here extend the generalized likelihood ratio test proposed by Fan et al. (2001) to the setting of semiparametric in-slide models.

For a given family of functions ℓ*(·, θ) indexed by an unknown parameter vector θ, we cannot test m(u) = ℓ*(u, θ) directly because of the non-identifiable intercept. Therefore, we modify ℓ*(·, θ) to

ℓ(·, θ) = ℓ*(·, θ) + ∫_U m(u) p(u) du − ∫_U ℓ*(u, θ) p(u) du.

Consider the null hypothesis H_0: m(u) = ℓ(u, θ). Let θ_n be an estimator of θ. Based on model (3) we define the pseudoresidual sum of squares (PRSS) under the null hypothesis as:

PRSS_0 = n^{-1} (Y − X β_n − ℓ(θ_n))^T M_B (Y − X β_n − ℓ(θ_n)),

where ℓ(θ_n) = (ℓ(U_11, θ_n), …, ℓ(U_GI, θ_n))^T. Analogously, the PRSS under the alternative hypothesis is:

PRSS_1 = n^{-1} (Y − X β_n − Φ ϑ_n)^T M_B (Y − X β_n − Φ ϑ_n).

The test statistic is defined as:

Q_n = (PRSS_0 − PRSS_1)/PRSS_1 = PRSS_0/PRSS_1 − 1.

We have the following asymptotic result for Q_n.

Theorem 5. Under assumptions (A1), (A2) and (8), as n → ∞, Q_n → 0 in probability under H_0. Otherwise, if H_0 is false and inf_θ [∫_U (m(u) − ℓ(u, θ))² du]^{1/2} > 0, then Q_n > c_0 with probability approaching 1 for some c_0 > 0.

Theorem 5 motivates the use of a test procedure that rejects H_0 when Q_n is larger than an appropriate critical value. According to the definitions of M_B and ε*,

M_B ε* = M_B ε + o_p(1) = (ε_11 − (1/I) ∑_{j=1}^I ε_1j, …, ε_1I − (1/I) ∑_{j=1}^I ε_1j, …, ε_GI − (1/I) ∑_{j=1}^I ε_Gj)^T + o_p(1),

where ε* is defined in section 2. This implies that the errors in model (3) are block correlated. Therefore, the usual bootstrap approaches (such as the classic bootstrap or the wild bootstrap) are not applicable for finding the p-value of the test. Here we propose to apply the wild block bootstrap method. The details of the wild block bootstrap-based procedure are described as follows.

Step 1. Given the initial sample (Y_ij, X_ij, U_ij), i = 1, …, G and j = 1, …, I, we construct the SLSE β_n and ϑ_n by the method described in section 2. Then, we obtain the estimated pseudoresiduals (e_11, …, e_1I, …, e_GI)^T = M_B(Y − X β_n − Φ ϑ_n).

Step 2. Let F(·) be the distribution function of a random variable ξ with E[ξ] = 0, E[ξ²] = 1 and E[|ξ|³] < ∞. We stress that F(·) is chosen independently of the given regression model. Generate a sequence of random variables {ξ_i}_{i=1}^G from F(·).


Step 3. Generate a bootstrap sample

(Y*_11, …, Y*_1I, …, Y*_GI)^T = M_B(X β_n + ℓ(θ_n)) + (ξ_1 e_11, …, ξ_1 e_1I, …, ξ_G e_GI)^T.

Step 4. Based on the bootstrap sample, define the conditional counterparts of PRSS_0 and PRSS_1, respectively, as:

CPRSS_0 = n^{-1} (Y* − X β*_n − ℓ(θ*_n))^T M_B (Y* − X β*_n − ℓ(θ*_n))

and

CPRSS_1 = n^{-1} (Y* − X β*_n − Φ ϑ*_n)^T M_B (Y* − X β*_n − Φ ϑ*_n),

where Y* = (Y*_11, …, Y*_1I, …, Y*_GI)^T and β*_n, ϑ*_n have the same definitions as β_n, ϑ_n, but with Y* in place of Y. Correspondingly, the conditional counterpart of Q_n is:

CQ_n = (CPRSS_0 − CPRSS_1)/CPRSS_1 = CPRSS_0/CPRSS_1 − 1.

Step 5. Generate N values of CQ_n, say CQ_n^{(i)}, i = 1, …, N.

Step 6. The p-value is estimated by p = k/(N + 1), where k is the number of CQ_n^{(i)} such that CQ_n^{(i)} ≥ Q_n. Reject H_0 if p is below the designated level of significance.

The p-value of the test is simply the relative frequency of the event {CQ_n ≥ Q_n} in the replications of the bootstrap sampling. For the sake of simplicity, we use the same number of knots in calculating CQ_n as in Q_n. Note that we bootstrap the pseudoresiduals from the semiparametric fit instead of the parametric fit, because the semiparametric estimate of the residuals is consistent under either the null hypothesis or the alternative. The method should provide a consistent estimator of the null distribution even if the null hypothesis is not true.
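The loop in steps 3–6 can be sketched as follows (our illustration, not the authors' code). The fitting steps are abstracted into a user-supplied `stat` function standing in for Q_n, `null_mean` stands for the fitted null mean M_B(Xβ_n + ℓ(θ_n)), and the two-point multiplier distribution is the one used later in section 6; the helper names are our own.

```python
import numpy as np

def two_point_xi(G, rng):
    """Two-point multipliers with E(xi) = 0 and E(xi^2) = 1, as in section 6:
    xi = -(sqrt(5)-1)/2 with prob (sqrt(5)+1)/(2 sqrt(5)), else (sqrt(5)+1)/2."""
    s5 = np.sqrt(5.0)
    p = (s5 + 1.0) / (2.0 * s5)
    return np.where(rng.random(G) < p, -(s5 - 1.0) / 2.0, (s5 + 1.0) / 2.0)

def wild_block_pvalue(q_obs, null_mean, e, G, I, stat, N=500, rng=None):
    """Estimate the p-value of q_obs = stat(Y) by the wild block bootstrap.
    `e` is the n-vector of pseudoresiduals from step 1; one multiplier xi_i
    is shared by all I observations of block i, preserving the block
    correlation of the errors."""
    rng = np.random.default_rng() if rng is None else rng
    k = 0
    for _ in range(N):
        xi = np.repeat(two_point_xi(G, rng), I)   # step 2: one xi per block
        y_star = null_mean + xi * e               # step 3: bootstrap sample
        k += stat(y_star) >= q_obs                # steps 4-5: CQ_n >= Q_n
    return k / (N + 1.0)                          # step 6
```

In a real application `stat` would refit β*_n, ϑ*_n and θ*_n on each bootstrap sample and return CQ_n; here any functional of the data illustrates the mechanics.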

Remark 3. In applications, ∫_U m(u) p(u) du is unknown and we can replace it by

(1/n) ∑_{i=1}^G ∑_{j=1}^I m_n(U_ij),

where m_n(·) is obtained by the semiparametric fitting of model (1). If we accept H_0, model (1) becomes a parametric regression

Y = B α + X β + γ_0 1_n + ℓ(θ) + ε,  n = G × I.  (9)

For this parametric regression model, we can first estimate β and θ by combining (non)linear least squares with premultiplication of (9) by M_B. Then, estimate γ_0 by

γ_0,n = n^{-1} 1_n^T (Y − X β_n − ℓ(θ_n)),

which can be shown to be root-n consistent.

5. Smoothing parameter selection

Smoothing parameter selection is essential in non-parametric and semiparametric regression. Here, we focus on the B-spline series approximation, for which we need to select the degree of the splines and the number and locations of the knots. However, because of computational complexity it is often impractical to select all three components automatically. Similar to Rice & Wu (2001), we use B-spline series with equally spaced knots and fixed degree, and select the number of knots κ using the data.


Selecting the smoothing parameters for semiparametric models, particularly for estimating the parametric component, was posed by Bickel & Kwon (2001) as an important and unsolved problem.

Here, we propose an ad hoc method to select the smoothing parameter κ. Define the deleted block pseudosquares cross-validation function by

CV(κ) = (1/n) ∑_{i=1}^G ∑_{j=1}^I (Y_ij − X^T_ij β^{−i}_n − m^{−i}_n(U_ij))^T M_B (Y_ij − X^T_ij β^{−i}_n − m^{−i}_n(U_ij)),

where β^{−i}_n and m^{−i}_n(·) have the same definitions as β_n and m_n(·) except that the i-th individual is omitted. CV(κ), depending on the smoothing parameter κ, is used as an overall measure of the effectiveness of the estimation scheme. The ad hoc selector is the minimizer of CV(κ), namely

κ_CV = arg min_κ CV(κ).
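A stripped-down sketch of the delete-one-block cross-validation loop (ours, under simplifying assumptions: it drops the fixed effects and the parametric part, and uses a power-series basis rather than B-splines, purely to illustrate the selection mechanics; the helper names are our own):

```python
import numpy as np

def cv_score(Y, U, G, I, kn):
    """Delete-one-block CV error for a pure series fit y = m(u) + error,
    with G blocks of I observations each (rows ordered block-wise)."""
    resid = np.empty_like(Y)
    idx = np.arange(G * I).reshape(G, I)
    for i in range(G):                                  # leave out block i
        keep = np.delete(idx, i, axis=0).ravel()
        Phi = np.vander(U[keep], kn, increasing=True)
        coef = np.linalg.lstsq(Phi, Y[keep], rcond=None)[0]
        Phi_i = np.vander(U[idx[i]], kn, increasing=True)
        resid[idx[i]] = Y[idx[i]] - Phi_i @ coef        # out-of-block error
    return np.mean(resid**2)

def select_kn(Y, U, G, I, grid):
    """Pick the basis size minimizing the CV criterion over a grid."""
    return min(grid, key=lambda kn: cv_score(Y, U, G, I, kn))
```

The same loop applies to the full semiparametric fit by replacing the inner least squares with the SLSE of section 2 computed without the i-th individual.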

6. Simulation studies

In this section, we carry out simulation studies to demonstrate the finite sample performance of the estimation and testing procedures proposed in the previous sections. We consider two examples. The first is motivated by the economic data used in the next section to demonstrate the application; the second is the same as example 1 in Fan et al. (2005), which is close to microarray data.

Example 1. In this example, we generate the data from the following model:

Y_ij = α_i + X_1ij β_1 + X_2ij β_2 + m(U_ij) + ε_ij,  i = 1, …, G, j = 1, …, I,

where the α_i are generated from a uniform distribution U(−1, 1) for i = 1, …, G; X_1ij and X_2ij are generated from normal distributions N(0, 1) and N(1, 1), respectively, for i = 1, …, G and j = 1, …, I; the U_ij are from U(0, 1); β_1 = 2, β_2 = 1.5; m(u) = sin(2πu); and the random errors ε_ij are i.i.d. N(0, 1) for i = 1, …, G and j = 1, …, I. Moreover, we take G = 100, 150, 250 and I = 3, 4, 5. In each case, simulated realizations are replicated 1000 times, whereas the α_i values are generated only once.

For the estimators of the parametric components (β_1, β_2) and the error variance σ², we calculate the sample means (SM) and their standard deviations (SD), along with the average of the estimated standard errors (SE). In addition, the coverage probabilities (CP) of the 95 per cent confidence intervals are calculated based on the normal approximation. In our simulation study, we use the univariate cubic B-spline basis functions and uniform knots. According to He & Shi (1996), uniform knots are usually sufficient when the function m(·) does not exhibit dramatic changes in its derivatives. The cross-validation method in section 5 is used to determine the number of knots. Simulation results are summarized in Table 1. To compare with the method of Fan et al. (2005), we also present their PLSE in Table 1.

From Table 1, we make the following observations:

• The SLSE β_n and the error variance estimator σ²_n are asymptotically unbiased.

• The SDs of β_n and σ²_n decrease as the sample size n increases.

• When n is fixed, the SDs of β_n and σ²_n decrease as I increases.

• The means of the SE estimates agree closely with the simulated standard deviations for both the SLSE and the estimator of the error variance.


Table 1. The finite sample performance of the estimators of the parametric components and error variance

                    G = 100                  G = 150                  G = 250
              I = 3   I = 4   I = 5    I = 3   I = 4   I = 5    I = 3   I = 4   I = 5
β_n1 (SLSE)
  SM          1.9998  2.0026  2.0008   2.0001  2.0010  2.0003   1.9999  1.9998  2.0018
  SD          0.0700  0.0568  0.0504   0.0591  0.0494  0.0402   0.0472  0.0371  0.0318
  SE          0.0721  0.0583  0.0506   0.0583  0.0475  0.0411   0.0450  0.0368  0.0318
  CP          0.9530  0.9500  0.9490   0.9430  0.9360  0.9530   0.9380  0.9470  0.9560
β_n2 (SLSE)
  SM          1.4994  1.4991  1.5009   1.5012  1.5007  1.5003   1.4972  1.5011  1.4999
  SD          0.0721  0.0598  0.0518   0.0579  0.0477  0.0419   0.0473  0.0369  0.0317
  SE          0.0719  0.0583  0.0506   0.0583  0.0475  0.0411   0.0451  0.0368  0.0318
  CP          0.9480  0.9550  0.9440   0.9540  0.9480  0.9440   0.9340  0.9560  0.9540
β̃_n1 (PLSE)
  SM          2.0000  2.0027  2.0007   2.0000  2.0007  2.0002   2.0000  1.9996  2.0018
  SD          0.0706  0.0574  0.0508   0.0598  0.0497  0.0404   0.0477  0.0372  0.0319
  SE          0.0713  0.0579  0.0503   0.0580  0.0474  0.0410   0.0448  0.0367  0.0317
  CP          0.9500  0.9540  0.9500   0.9420  0.9360  0.9500   0.9300  0.9460  0.9530
β̃_n2 (PLSE)
  SM          1.4996  1.4988  1.5011   1.5016  1.5008  1.5004   1.4976  1.5010  1.5001
  SD          0.0735  0.0610  0.0518   0.0583  0.0480  0.0420   0.0474  0.0370  0.0318
  SE          0.0712  0.0580  0.0503   0.0580  0.0474  0.0410   0.0448  0.0366  0.0317
  CP          0.9420  0.9450  0.9440   0.9500  0.9490  0.9390   0.9330  0.9510  0.9510
σ²_n
  SM          0.9675  0.9861  0.9976   0.9870  1.0051  1.0075   1.0028  1.0110  1.0149
  SD          0.0986  0.0809  0.0710   0.0822  0.0666  0.0591   0.0668  0.0554  0.0465
  SE          0.0962  0.0799  0.0702   0.0797  0.0666  0.0580   0.0631  0.0520  0.0453
  CP          0.9040  0.9350  0.9410   0.9180  0.9440  0.9400   0.9360  0.9320  0.9370

SM, sample mean; SD, standard deviation; SE, standard error; CP, coverage probability.

• Each estimated confidence interval attains coverage close to the nominal 95 per cent level.

• Our estimators β_n are comparable with those proposed by Fan et al. (2005).

In addition, we calculate the estimators of the non-parametric component m(·) by our series approximation and by the backfitting method proposed by Fan et al. (2005). Figure 1 shows a randomly selected estimate of m(·) for different G and I. From Fig. 1, we can see that our series approximation is comparable with the backfitting method.

To demonstrate the power of the proposed wild block bootstrap test, we consider the following null hypothesis:

• H_0: m(U_ij) = U_ij for i = 1, …, G, j = 1, …, I (a linear regression model) against the alternative:

• H_1: m(U_ij) ≠ U_ij for at least one pair (i, j).

The power function is evaluated under a sequence of alternative models indexed by c:

H_1: m(U_ij) = U_ij + sin(cπU_ij) for all i = 1, …, G, j = 1, …, I.

We apply the goodness-of-fit test described in section 4 in a simulation with 1000 replications. The random variables {ξ_i} in step 2 (cf. section 4) are taken to be binary with

P(ξ_i = −(√5 − 1)/2) = (√5 + 1)/(2√5)  and  P(ξ_i = (√5 + 1)/2) = 1 − (√5 + 1)/(2√5).



Fig. 1. Simulated estimators of the non-parametric component m(u) with m(u) = 2 sin(2πu) (solid curve), where the dash-dotted curve represents the series approximation estimator and the dashed curve the backfitting estimator. The cases in (A)–(I) are as follows: (A) G = 100 and I = 3, (B) G = 100 and I = 4, (C) G = 100 and I = 5, (D) G = 150 and I = 3, (E) G = 150 and I = 4, (F) G = 150 and I = 5, (G) G = 250 and I = 3, (H) G = 250 and I = 4, (I) G = 250 and I = 5.

For each realization, we draw 500 bootstrap samples. Table 2 summarizes the simulated power function against c, with c = 0 representing the null hypothesis and c > 0 the alternatives. Table 2 demonstrates that the bootstrap estimate of the null distribution is approximately correct and that our test is powerful against the alternatives.
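The bootstrap loop just described can be sketched as follows, under simplifying assumptions: a hypothetical residual-sum-of-squares gap between a linear null fit and a quadratic alternative fit stands in for the PRSS statistic of section 4, the data are generated under H0, and one two-point weight is drawn per block i and shared by all j within the block. This is a sketch, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
G, I = 50, 4
U = rng.uniform(0, 1, size=(G, I))
Y = U + 0.1 * rng.standard_normal((G, I))      # data generated under H0

def rss_gap(U, Y):
    """RSS(linear fit) - RSS(quadratic fit), pooled over all observations."""
    u, y = U.ravel(), Y.ravel()
    X0 = np.column_stack([np.ones_like(u), u])
    X1 = np.column_stack([np.ones_like(u), u, u**2])
    r0 = y - X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]
    r1 = y - X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]
    return r0 @ r0 - r1 @ r1

# Fit the null (linear) model and keep its fitted values and residuals.
u, y = U.ravel(), Y.ravel()
X0 = np.column_stack([np.ones_like(u), u])
fit0 = (X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]).reshape(G, I)
resid = Y - fit0

T = rss_gap(U, Y)
a, b = -(np.sqrt(5) - 1) / 2, (np.sqrt(5) + 1) / 2
p = (np.sqrt(5) + 1) / (2 * np.sqrt(5))

B = 200
T_star = np.empty(B)
for bidx in range(B):
    V = rng.choice([a, b], p=[p, 1 - p], size=(G, 1))  # one weight per block i
    Y_star = fit0 + V * resid                          # block i shares V_i
    T_star[bidx] = rss_gap(U, Y_star)

p_value = np.mean(T_star >= T)
```

Drawing a single weight per block is what makes the scheme a *block* wild bootstrap: it preserves the within-block dependence of the residuals.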

Example 2. In the second example, we generate the simulated data according to example 1 in Fan et al. (2005), which is close to microarray data. To compare the performance of the proposed SLSE with their PLSE, we calculate

Ratio = (MSE of the SLSE) / (MSE of the PLSE),

where the MSE represents the mean-squared error defined in Fan et al. (2005). The results are summarized in Table 3, which shows that the SLSE slightly outperforms the PLSE in most cases, with a ratio of the MSEs below 1 indicating a smaller MSE for the SLSE. In addition, we applied the goodness-of-fit test described in section 4 to example 1 of Fan et al. (2005). The bootstrap estimate still approximates the null distribution correctly and the test is powerful against the alternatives. The results are not listed here.
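For concreteness, the ratio criterion can be computed as below; the estimate arrays are made-up placeholders, not output of either estimator.

```python
import numpy as np

# Illustrative computation of the MSE ratio reported in Table 3
# (placeholder numbers, for the definition only).
truth = 1.0
slse_estimates = np.array([0.98, 1.03, 0.99, 1.01])
plse_estimates = np.array([0.97, 1.05, 0.98, 1.02])

def mse(est):
    return np.mean((est - truth) ** 2)

ratio = mse(slse_estimates) / mse(plse_estimates)  # < 1 favours the SLSE
```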

Table 2. The level and power of the bootstrap-based goodness-of-fit test of models

c                         0      0      0      0.15   0.20   0.25   0.30
Nominal level of test     5%     10%    20%    5%     5%     5%     5%
G = 150, I = 5            5.1%   11.3%  19.2%  10.6%  46.3%  84.1%  99.7%
G = 250, I = 3            4.8%   10.7%  21.5%  13.8%  54.1%  89.8%  100%


Table 3. Comparison between the semiparametric and the profile least squares estimators based on example 1 in Fan et al. (2005)

              No aggregation                       Aggregation across arrays with J = 4
              G=100   G=200   G=400   G=800       G=100   G=200   G=400   G=800

m(·)   2      0.9859  0.9877  0.9856  0.9929      1.0008  0.9823  1.0009  0.9880
       3      0.9905  0.9920  0.9856  0.9934      0.9940  0.9987  0.9872  0.9887
       4      0.9983  0.9926  0.9876  0.9855      0.9915  0.9999  0.9966  0.9912

β      2      0.9972  1.0008  0.9990  1.0003      0.9937  0.9946  0.9935  0.9880
       3      0.9843  0.9884  0.9888  0.9993      0.9889  0.9980  0.9910  0.9887
       4      1.0014  1.0012  0.9916  0.9985      0.9894  0.9922  0.9831  0.9984

σ²     2      0.9922  1.0006  0.9825  0.9942      0.9962  0.9952  0.9936  0.9994
       3      0.9986  0.9916  0.9890  0.9854      0.9934  0.9830  0.9823  0.9820
       4      0.9824  1.0001  0.9860  0.9859      1.0017  0.9836  0.9960  0.9986

7. Application

We now demonstrate the application of the proposed estimation procedures to analyse a set of economic data. The data were extracted from the STARS database of the World Bank. From this database we obtained measures of gross domestic product (GDP) and the aggregate physical capital stock; both were denominated in constant local currency units at the end of period 1987 (converted into US dollars at the end of period 1987) for 81 countries over the period from 1960 to 1987. We excluded one country whose workforce has around 15 years of schooling, which is much higher than the others. The database also provided data on the number of individuals in the workforce between ages 15 and 64, and on the mean years of schooling for the workforce (e.g. Duffy & Papageorgiou, 2000). Figure 2A–C presents scatter plots of the logarithm of the real GDP against the logarithms of real capital, labour supply and mean years of schooling for the workforce.

From these plots we can see that, on the log scale, the relationship between GDP and capital is linear, as is the relationship between GDP and labour supply. However, the relationship between GDP and mean years of schooling for the workforce is nonlinear. Therefore, we use the following PLIM to fit this data set:

Yij = αi + Xij1 β1 + Xij2 β2 + m(Uij) + εij,  i = 1, …, 81, j = 1, …, 28,

where Yij is the log real GDP of country i in year j (with j = 1 for year 1960, and so on), αi is the individual effect of country i, Xij1 is the log of real capital, Xij2 is the log of labour supply and Uij is the log of mean years of schooling for the workforce. The estimates of (β1, β2) are (β̂1, β̂2) = (0.5495, 0.2670) with SDs (0.0112, 0.0346), which imply that both capital and labour supply are highly significant (even at the 1 per cent level) and positive. As a result, the real GDP is strongly and positively correlated with both real capital and labour supply. The estimate of the error variance σ² is σ̂² = 0.0150. We also calculate the PLSE of (β1, β2) with bandwidth h = 2 × SD(U)(81 × 28)^(−1/5) = 0.3546. The results are (0.5594, 0.2589) with SDs (0.0112, 0.0346).
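The bandwidth above follows the rule of thumb h = 2 × SD(U) × n^(−1/5). A sketch of the computation, with a placeholder value for SD(U) (the actual standard deviation of the schooling data is not reported here):

```python
# Rule-of-thumb bandwidth h = 2 * SD(U) * n^(-1/5) used for the PLSE.
n = 81 * 28          # countries x years
sd_u = 0.83          # hypothetical SD of log mean years of schooling
h = 2 * sd_u * n ** (-1 / 5)
```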

Furthermore, the fitted non-parametric component curve m̂(·) is plotted in Fig. 2D. Figure 2D shows that the GDP changes little for log mean years of schooling between −2.5 and 1 (corresponding approximately to mean years of schooling between 0 and 3 years). When the log mean years of schooling exceeds 1 (about 3 years), however, the GDP increases rapidly with the mean years of schooling, at a nonlinear and accelerating pace. In addition, by implementing the wild block bootstrap method developed in section 4, we found a p-value



Fig. 2. (A) The scatter plot of log real gross domestic product (GDP) against log real capital; (B) the scatter plot of log real GDP against log labour supply; (C) the scatter plot of log real GDP against log mean years of schooling for the workforce; (D) the fitted curve of m(·), with the solid curve for the proposed estimator and the dashed curve for the estimator of Fan et al. (2005).

of 0.036 against the null hypothesis of a linear relationship between the response Yij and the regressor Uij; thus, the null hypothesis is rejected at the 5 per cent level of significance. This confirms a nonlinear relationship between the log real GDP and the log mean years of schooling.

All these results are consistent with classic labour economics. This example demonstrates that our model and techniques are useful for sensible statistical analysis of economic data, and can provide statistical evidence to support classical economic theory.

The individual effect (αi) of each country is provided in Fig. 3.

8. Concluding remarks

In this article, we studied a semiparametric PLIM with fixed effects, which has been found useful in modelling economic and microarray data. We developed an efficient procedure to estimate the parametric and non-parametric components, as well as the error variance, and derived the large sample properties of the estimators under weaker conditions than those in the literature. A wild block bootstrap method was also proposed to test the structure of the non-parametric component. The proposed inference method is applied to a set of economic data from the World Bank.

Compared with the PLSE proposed by Fan et al. (2005), our SLSE is intuitively appealing and easy to calculate. More importantly, the methodology of our SLSE can be extended to more general semiparametric regression models. For example, the coefficients β of Xij in model (1) may be allowed to depend on the other regressor Uij, which is referred to as the varying-coefficient partially linear regression model and has been found useful in capturing individual variations and easing the 'curse of dimensionality'. Another extension is the additive varying-coefficient partially linear regression model, which allows the coefficients of certain regressors Xij to be related to multiple additional regressors Uij1, Uij2, … in an additive and non-parametric way. These extensions are currently under our further investigation.
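The varying-coefficient extension mentioned above would take the form (our sketch of the model, in the notation of model (1)):

```latex
Y_{ij} = \alpha_i + X_{ij}^{\mathrm{T}}\beta(U_{ij}) + m(U_{ij}) + \varepsilon_{ij},
\qquad i = 1,\dots,G,\ j = 1,\dots,I,
```

where the constant coefficient vector β is replaced by an unknown smooth function β(·) of the regressor Uij, to be estimated non-parametrically alongside m(·).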



Fig. 3. Individual effects (αi) of the 81 countries.

Another direction of extension is to relax the i.i.d. assumption on the errors to allow heteroscedasticity or serial correlation, which is particularly relevant to economic data. When the errors are heteroscedastic or serially correlated in model (1), how to construct an efficient SLSE remains an open problem.

Acknowledgement

The authors are grateful to two referees and the editor for their valuable comments and suggestions, which helped improve this article substantially. Yong Zhou's research was supported by grants from the National Natural Science Foundation of China (NSFC; nos 10731010 and 10628104), the National Basic Research Program (no. 2007CB814902) and Creative Research Groups of China (no. 10721101). The work was also partially supported by the Leading Academic Discipline Program, 211 Project for Shanghai University of Finance and Economics (the third phase), project no. B803.

References

Ahn, S. C. & Schmidt, P. (1999). Estimation of linear panel data models using GMM. In Generalized method of moments estimation (ed. L. Mátyás), chapter 8, 211–247. Cambridge University Press, New York.
Andrews, D. W. K. (1996). Nonparametric kernel estimation for semiparametric models. Econometric Theory 11, 560–596.
Baltagi, B. H. (1995). Econometric analysis of panel data. John Wiley & Sons, Chichester.
Baltagi, B. H. & Li, D. (2002). Series estimation of partially linear panel data models with fixed effects. Ann. Econom. Finance 3, 103–116.
Bickel, P. J. & Kwon, J. (2001). Inference for semiparametric models: some questions and an answer. With comments and a rejoinder by the authors. Statist. Sinica 11, 863–960.
Cai, Z., Fan, J. & Yao, Q. (2000). Functional-coefficient regression models for nonlinear time series. J. Amer. Statist. Assoc. 95, 941–956.
Delecroix, M., Härdle, W. & Hristache, M. (2003). Efficient estimation in conditional single-index regression. J. Multivariate Anal. 86, 213–226.
Donald, S. G. & Newey, W. K. (1994). Series estimation of semilinear models. J. Multivariate Anal. 50, 30–40.
Duffy, J. & Papageorgiou, C. (2000). A cross-country empirical investigation of the aggregate production function specification. J. Econom. Growth 5, 87–120.
Engle, R. F., Granger, C. W. J., Rice, J. & Weiss, A. (1986). Semiparametric estimates of the relation between weather and electricity sales. J. Amer. Statist. Assoc. 80, 310–319.
Fan, J., Zhang, C. & Zhang, J. (2001). Generalized likelihood ratio statistics and Wilks phenomenon. Ann. Statist. 29, 153–193.
Fan, J., Tam, P., Vande Woude, G. & Ren, Y. (2004). Normalization and analysis of cDNA micro-arrays using within-array replications applied to neuroblastoma cell response to a cytokine. Proc. Natl. Acad. Sci. 101, 1135–1140.
Fan, J., Peng, H. & Huang, T. (2005). Semilinear high-dimensional model for normalization of microarray data: a theoretical analysis and partial consistency. J. Amer. Statist. Assoc. 100, 781–813.
He, X. & Shi, P. (1996). Bivariate tensor-product B-spline in partly linear model. J. Multivariate Anal. 58, 162–181.
Ichimura, H. (1993). Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J. Econometrics 58, 71–120.
Lichtenberg, F. R. (1988). Estimation of the internal adjustment cost model using longitudinal establishment data. Rev. Econ. Statist. 70, 421–430.
Lorentz, G. G. (1986). Approximation of functions, 2nd edn. Chelsea Publishing Co., New York.
Neyman, J. & Scott, E. (1948). Consistent estimates based on partially consistent observations. Econometrica 16, 1–32.
Rice, J. A. & Wu, C. O. (2001). Nonparametric mixed effects models for unequally sampled noisy curves. Biometrics 57, 253–259.
Schumaker, L. L. (1981). Spline functions: basic theory. John Wiley & Sons, Inc., New York.

Received July 2008, in final form June 2010

Xian Zhou, Department of Actuarial Studies, Macquarie University, North Ryde, Sydney, NSW 2109, Australia.
E-mail: [email protected]

Appendix: Proof of the main results

To prove the main results we need lemma 1.

Lemma 1. Under assumptions (A1), (A2) and (8),

    (1/n) X^T M_B M_{M_BΦ} M_B X → (1/I) Σ  in probability as n → ∞,

where Σ is defined in section 3.

Proof. According to (7), X^T M_B M_{M_BΦ} M_B X can be written as

    X^T M_B M_{M_BΦ} M_B X = η^T M_B M_{M_BΦ} M_B η + η^T M_B M_{M_BΦ} M_B H
                           + H^T M_B M_{M_BΦ} M_B η + H^T M_B M_{M_BΦ} M_B H,

where η = (η_11, …, η_1I, …, η_GI)^T and H is an n × p matrix whose ((i − 1)I + j, l)th entry is h_l(U_ij). The definitions of M_B and M_{M_BΦ} imply M_B M_{M_BΦ} M_B = M_B − (I_n − M_{M_BΦ}). Therefore,

    η^T M_B M_{M_BΦ} M_B η = η^T M_B η − η^T (I_n − M_{M_BΦ}) η.

For the (l, k)th entry of n⁻¹ η^T (I − M_{M_BΦ}) η, we have

    (1/n) E[η_(l)^T (I − M_{M_BΦ}) η_(k)] = (1/n) E{tr[E(η_(l) η_(k)^T | U_11, …, U_1I, …, U_GI)(I − M_{M_BΦ})]},

where η_(l) is the lth column of η. Note that

    E[η_(l) η_(k)^T | U_11, …, U_1I, …, U_GI] = diag{a_1, …, a_n},

where a_i = E(η_il η_ik | U_11, …, U_1I, …, U_GI), i = 1, …, n, and η_il is the ith entry of η_(l). Hence

    (1/n) E[η_(l)^T (I − M_{M_BΦ}) η_(k)] = (1/n) E{tr[diag{a_1, …, a_n}(I − M_{M_BΦ})]} = O(n⁻¹ k_n) = o(1).   (10)

Moreover, we can show that

    η^T M_B η = Σ_{i=1}^G Σ_{j=1}^I [η_ij − (1/I) Σ_{j1=1}^I η_ij1]^⊗2 = Σ_{i=1}^G Λ_i, say,

where A^⊗2 = AA^T. It is easy to see that E(Λ_i) = Σ. Hence

    (1/n) η^T M_B M_{M_BΦ} M_B η = (1/I) Σ + o_p(1).   (11)

Since M_B M_{M_BΦ} M_B ≤ M_B and M_B is an idempotent matrix, the maximum eigenvalue of M_B M_{M_BΦ} M_B is 1. Thus

    (1/n) H_(l)^T M_B M_{M_BΦ} M_B H_(k) = (1/n) (H_(l) − Φγ_{h_l})^T M_B M_{M_BΦ} M_B (H_(k) − Φγ_{h_k})
                                        ≤ (1/n) (H_(l) − Φγ_{h_l})^T (H_(k) − Φγ_{h_k})
                                        = O_p[max_{1≤l,k≤p} ρ_{h_l}(k_n) ρ_{h_k}(k_n)] = o_p(1),   (12)

where H_(l) is the lth column of H. It follows from Markov's inequality that

    (1/n) η^T M_B M_{M_BΦ} M_B H = O_p(max_{1≤l≤p} ρ_{h_l}(k_n))  and  (1/n) H^T M_B M_{M_BΦ} M_B η = O_p(max_{1≤l≤p} ρ_{h_l}(k_n)).

This completes the proof of the lemma.
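The within-block projection M_B = I_n − (1/I) I_G ⊗ 1_I 1_I^T, written out explicitly in the proof of theorem 3 below, underlies these manipulations. A small numerical check of its key properties (a sketch, not part of the proof):

```python
import numpy as np

G, I = 5, 3
n = G * I
# Within-block projection M_B = I_n - (1/I) I_G (x) 1_I 1_I^T.
MB = np.eye(n) - np.kron(np.eye(G), np.ones((I, I)) / I)

# M_B is idempotent (a projection) ...
assert np.allclose(MB @ MB, MB)
# ... and annihilates the block-dummy matrix B = I_G (x) 1_I, so the
# fixed effects alpha_i drop out after the within transformation:
B = np.kron(np.eye(G), np.ones((I, 1)))
assert np.allclose(MB @ B, 0)
# Its rank is n - G, matching the degrees-of-freedom correction n - G - p - k_n.
assert np.linalg.matrix_rank(MB) == n - G
```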

Proof of theorem 1. According to the definition of β̂_n,

    β̂_n = (X^T M_B M_{M_BΦ} M_B X)⁻¹ X^T M_B M_{M_BΦ} M_B Y
        = β + (X^T M_B M_{M_BΦ} M_B X)⁻¹ (X^T M_B M_{M_BΦ} M_B)(M + ε).

Obviously,

    (1/n) X^T M_B M_{M_BΦ} M_B (M + ε) = (1/n) η^T M_B M_{M_BΦ} M_B M + (1/n) H^T M_B M_{M_BΦ} M_B M
                                       + (1/n) H^T M_B M_{M_BΦ} M_B ε + (1/n) η^T M_B M_{M_BΦ} M_B ε
                                       = J_1 + J_2 + J_3 + J_4, say.

Note that E(η) = 0. Similar to the proof of lemma 1, we can show that

    J_1 = O_p(n^{−1/2} max_{1≤l≤p} ρ^{1/2}_{h_l}(k_n)).

Similarly,

    J_2 = O_p(ρ^{1/2}_m(k_n) max_{1≤l≤p} ρ^{1/2}_{h_l}(k_n))  and  J_3 = O_p(n^{−1/2} max_{1≤l≤p} ρ^{1/2}_{h_l}(k_n)).

Moreover, it is easy to see that

    J_4 = (1/n) η^T M_B ε − (1/n) η^T M_B P_{M_BΦ} M_B ε = (1/n) η^T M_B ε + o_p(n^{−1/2})
        = (1/n) Σ_{i=1}^G Σ_{j=1}^I [η_ij − (1/I) Σ_{j1=1}^I η_ij1][ε_ij − (1/I) Σ_{j1=1}^I ε_ij1] + o_p(n^{−1/2}).

Let ζ be a p-variate constant vector, and write

    ℓ_n(ζ) = ζ^T Σ_{i=1}^G ( Σ_{j=1}^I [η_ij − (1/I) Σ_{j1=1}^I η_ij1][ε_ij − (1/I) Σ_{j1=1}^I ε_ij1] ).

Obviously, ℓ_n(ζ) is a sum of independent random variables with variance Gσ²ζ^TΣζ. Therefore, combining lemma 1 with the results on J_3 and J_4, it is easy to show that

    √n (X^T M_B M_{M_BΦ} M_B X)⁻¹ X^T M_B M_{M_BΦ} M_B ε →_D N(0, σ² I Σ⁻¹)  as n → ∞.

Thus, the proof is complete.

Proof of theorem 2. Note that

    M − M̂ = M − Φϑ̂_n = M − Φ(Φ^T M_B Φ)⁻ Φ^T M_B (Y − Xβ̂_n)
         = −Φ(Φ^T M_B Φ)⁻ Φ^T M_B H(β − β̂_n) − Φ(Φ^T M_B Φ)⁻ Φ^T M_B η(β − β̂_n)
           + (I − Φ(Φ^T M_B Φ)⁻ Φ^T M_B) M − Φ(Φ^T M_B Φ)⁻ Φ^T M_B ε
         = J_1 + J_2 + J_3 + J_4, say.

By the same argument as that proving (11), we can show that Φ^T M_B Φ/n = O_p(1) + o_p(k_n). Therefore,

    J_1^T J_1 = (β − β̂_n)^T H^T M_B Φ(Φ^T M_B Φ)⁻ Φ^T Φ (Φ^T M_B Φ)⁻ Φ^T M_B H (β − β̂_n)
             = O_p(1) · (β − β̂_n)^T (H − Φγ_h)^T M_B Φ(Φ^T M_B Φ)⁻ Φ^T M_B (H − Φγ_h)(β − β̂_n)
             = O_p(1) · (β − β̂_n)^T (H − Φγ_h)^T (H − Φγ_h)(β − β̂_n) = O_p(max_{1≤l≤p} ρ_{h_l}(k_n)).

Moreover, following the same line as proving (10), we have J_2^T J_2 = O_p(k_n). We can further show that J_3^T J_3 = O(n ρ²_m(k_n)) by the same argument as for J_1. Finally,

    J_4^T J_4 = O_p(1) · ε^T M_B Φ(Φ^T M_B Φ)⁻ Φ^T M_B ε = O_p(k_n).

This completes the proof.


Proof of theorem 3. According to the definition, σ̂²_n can be decomposed as

    σ̂²_n = (1/(n − G − p − k_n)) [ε^T M_B ε + (β − β̂_n)^T X^T M_B X (β − β̂_n) + (ϑ − ϑ̂_n)^T Φ^T M_B Φ (ϑ − ϑ̂_n)
          + 2(β − β̂_n)^T X^T M_B Φ (ϑ − ϑ̂_n) + 2(β − β̂_n)^T X^T M_B ε + 2(ϑ − ϑ̂_n)^T Φ^T M_B ε]
          = J_1 + ⋯ + J_6, say.

Following the proof of theorem 1, n⁻¹ X^T M_B X = O_p(1). Therefore, combining the root-n consistency of β̂_n, we have J_2 = O_p(n⁻¹). Moreover,

    J_3 ≤ (1/(n − G − p − k_n)) (ϑ − ϑ̂_n)^T Φ^T Φ (ϑ − ϑ̂_n) = O_p(ρ_m(k_n)).

By the results for J_2 and J_3 together with the Cauchy–Schwarz inequality, we get J_4 = o_p(n^{−1/2}). The proof of theorem 2 leads to n⁻¹ X^T M_B ε = O_p(n^{−1/2}), which implies J_5 = O_p(n⁻¹). Moreover, J_6 can be written as

    J_6 = (2/(n − G − p − k_n)) [ϑ − (Φ^T M_B Φ)⁻ Φ^T M_B (Y − Xβ̂_n)]^T Φ^T M_B ε
        = (2/(n − G − p − k_n)) ε^T M_B [Φϑ − Φ(Φ^T M_B Φ)⁻ Φ^T M_B M]
        − (2/(n − G − p − k_n)) ε^T M_B Φ(Φ^T M_B Φ)⁻ Φ^T M_B X(β − β̂_n)
        − (2/(n − G − p − k_n)) ε^T M_B Φ(Φ^T M_B Φ)⁻ Φ^T M_B ε
        = J_61 + J_62 + J_63, say.

By the same argument as proving (12) we can obtain

    J_61 = (2/(n − G − p − k_n)) ε^T M_B Φ(Φ^T M_B Φ)⁻ Φ^T M_B (Φϑ − M)
         ≤ (2/(n − G − p − k_n)) {ε^T P_{M_BΦ} ε}^{1/2} {(Φϑ − M)^T P_{M_BΦ} (Φϑ − M)}^{1/2}
         = O_p(ρ_m(k_n) k_n^{1/2} n^{−1/2}),

    J_62 ≤ (2/(n − G − p − k_n)) {ε^T P_{M_BΦ} ε}^{1/2} {(β − β̂_n)^T X^T P_{M_BΦ} X (β − β̂_n)}^{1/2} = O_p(k_n n⁻¹),

and

    J_63 = (2/(n − G − p − k_n)) ε^T P_{M_BΦ} ε = O_p(k_n n⁻¹).

Therefore, J_6 = O_p(ρ_m(k_n) k_n^{1/2} n^{−1/2}) + O_p(k_n n⁻¹). Finally,

    J_1 = (1/(n − G − p − k_n)) ε^T (I_n − (1/I) I_G ⊗ 1_I 1_I^T) ε
        = (1/(n − G − p − k_n)) Σ_{i=1}^G Σ_{j=1}^I ε²_ij − (1/(n − G − p − k_n)) Σ_{i=1}^G (1/I) (Σ_{j=1}^I ε_ij)²
        = (1/(n − G − p − k_n)) Σ_{i=1}^G ξ_i,


where

    ξ_i = Σ_{j=1}^I (ε_ij − (1/I) Σ_{j1=1}^I ε_ij1)²,  i = 1, …, G.

Obviously, the ξ_i are i.i.d. random variables with mean

    E(ξ_i) = Σ_{j=1}^I [((I − 1)²/I²) Eε²_11 + (1/I²) Σ_{j1≠j} Eε²_1j1] = (I − 1)σ²

and variance

    var(ξ_i) = E(ξ_i²) − (E(ξ_i))²
             = E[Σ_{j=1}^I ε²_1j − (1/I)(Σ_{j1=1}^I ε_1j1)²]² − (I − 1)²σ⁴
             = E(Σ_{j=1}^I ε²_1j)² + (1/I²) E(Σ_{j1=1}^I ε_1j1)⁴ − (2/I) E[Σ_{j=1}^I ε²_1j (Σ_{j1=1}^I ε_1j1)²] − (I − 1)²σ⁴
             = (I + 1/I − 2) Eε⁴_11 + (I − 1)(3/I − 1)σ⁴.

Therefore,

    var( (√n/(n − G − p − k_n)) Σ_{i=1}^G ξ_i ) = (I/(I − 1)²) [(I + 1/I − 2) Eε⁴_11 + (I − 1)(3/I − 1)σ⁴]
                                               = Eε⁴_11 + ((3 − I)/(I − 1))σ⁴.

The proof is complete.
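The final scaling step can be verified in exact rational arithmetic (a sketch, not part of the proof): multiplying the two coefficients of var(ξ_i) by I/(I − 1)² should give 1 on the Eε⁴_11 term and (3 − I)/(I − 1) on the σ⁴ term.

```python
from fractions import Fraction as F

# Check the variance constant in the proof of theorem 3 for several I.
for I_int in range(2, 11):
    I = F(I_int)
    scale = I / (I - 1) ** 2
    coef_mu4 = scale * (I + 1 / I - 2)                # multiplies E eps_11^4
    coef_sigma4 = scale * (I - 1) * (F(3) / I - 1)    # multiplies sigma^4
    assert coef_mu4 == 1
    assert coef_sigma4 == (3 - I) / (I - 1)
```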

Proof of theorem 4. Applying the consistency of σ̂²_n and lemma 1, it is easy to show that Σ̂_n is a consistent estimator of σ² I Σ⁻¹. Therefore, to complete the proof, we just need to show that Γ̂_n is a consistent estimator of Γ. To simplify the notation, we write

    ∇_ij = X_ij^T (β − β̂_n) − (1/I) Σ_{j1=1}^I X_ij1^T (β − β̂_n) + m(U_ij) − m̂_n(U_ij)
           − (1/I) Σ_{j1=1}^I m(U_ij1) + (1/I) Σ_{j1=1}^I m̂_n(U_ij1).

Then

    (1/n) Σ_{i=1}^n ε̂⁴_i = (1/n) { Σ_{i=1}^G Σ_{j=1}^I (ε_ij − (1/I) Σ_{j1=1}^I ε_ij1)⁴ + Σ_{i=1}^G Σ_{j=1}^I ∇⁴_ij
        + 4 Σ_{i=1}^G Σ_{j=1}^I (ε_ij − (1/I) Σ_{j1=1}^I ε_ij1)³ ∇_ij + 6 Σ_{i=1}^G Σ_{j=1}^I (ε_ij − (1/I) Σ_{j1=1}^I ε_ij1)² ∇²_ij
        + 4 Σ_{i=1}^G Σ_{j=1}^I (ε_ij − (1/I) Σ_{j1=1}^I ε_ij1) ∇³_ij }
        = J_1 + ⋯ + J_5, say.


For J_1, we have

    J_1 = (1/I) Σ_{j=1}^I E(ε_1j − (1/I) Σ_{j1=1}^I ε_1j1)⁴ + o_p(1)
        = (1/I) Σ_{j=1}^I E[ε⁴_1j + (1/I⁴)(Σ_{j1=1}^I ε_1j1)⁴ + (6/I²) ε²_1j (Σ_{j1=1}^I ε_1j1)²
          − (4/I) ε³_1j Σ_{j1=1}^I ε_1j1 − (4/I³) ε_1j (Σ_{j1=1}^I ε_1j1)³]
        = Eε⁴_11 + (6/I³) Σ_{j=1}^I E[ε²_1j (Σ_{j1=1}^I ε_1j1)²] − (4/I²) Σ_{j=1}^I E(ε³_1j Σ_{j1=1}^I ε_1j1) − (3/I⁴) E(Σ_{j=1}^I ε_1j)⁴
        = (1 + 6/I² − 3/I³ − 4/I) Eε⁴_11 + ((I − 1)/I²)(6 − 9/I)σ⁴.

Combining theorems 1 and 3, it is easy to show that Σ_{i=1}^G Σ_{j=1}^I ∇⁴_ij = o_p(1). Therefore, it follows from the Hölder inequality that J_s = o_p(1) for s = 3, …, 5. The theorem then follows.

Proof of theorem 5. It is easy to see that

    PRSS_0 − PRSS_1 = n⁻¹ [X(β̃ − β̂_n) + Φϑ̃ − Φϑ̂_n]^T M_B [X(β̃ − β̂_n) + Φϑ̃ − Φϑ̂_n]
        − n⁻¹ [X(β̃ − β̂_n) + Φϑ̃ − M̂]^T M_B [X(β̃ − β̂_n) + Φϑ̃ − M̂]
        + 2n⁻¹ [X(β̃ − β̂_n) + Φϑ̃ − Φϑ̂_n]^T M_B ε
        − 2n⁻¹ [X(β̃ − β̂_n) + Φϑ̃ − M̂]^T M_B ε
        = J_1 − J_2 + J_3 − J_4, say,

where M̂ = (m̂_n(U_11), …, m̂_n(U_1I), …, m̂_n(U_GI))^T. Under the null hypothesis, the consistency of β̂_n, ϑ̂_n and m̂_n(·) leads to J_s →_p 0 as n → ∞ for s = 1, …, 4, so that PRSS_0 − PRSS_1 →_p 0. Therefore, it suffices to show that PRSS_1 is bounded away from zero and infinity. According to the proof of theorem 3, we can show that PRSS_1 →_p (1 − 1/I)σ². This proves the first result of the theorem. The proof of the second result is similar and we omit the details here.
