research article direct determination of smoothing

12
Research Article Direct Determination of Smoothing Parameter for Penalized Spline Regression Takuma Yoshida Graduate School of Science and Engineering, Kagoshima University, Kagoshima 890-8580, Japan Correspondence should be addressed to Takuma Yoshida; [email protected] Received 7 January 2014; Revised 31 March 2014; Accepted 31 March 2014; Published 22 April 2014 Academic Editor: Dejian Lai Copyright © 2014 Takuma Yoshida. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Penalized spline estimator is one of the useful smoothing methods. To construct the estimator, having goodness of fit and smoothness, the smoothing parameter should be appropriately selected. e purpose of this paper is to select the smoothing parameter using the asymptotic property of the penalized splines. e new smoothing parameter selection method is established in the context of minimization asymptotic form of MISE of the penalized splines. e mathematical and the numerical properties of the proposed method are studied. First we organize the new method in univariate regression model. Next we extend to the additive models. A simulation study to confirm the efficiency of the proposed method is addressed. 1. Introduction Penalized spline methods are a well-known efficient tech- nique for nonparametric smoothing. Penalized splines were suggested by O’Sullivan [1] and Eilers and Marx [2]. In O’Sullivan [1], they used a cubic -spline function and the penalty was the integrated squared second derivative of the -spline function. On the other hand, Eilers and Marx [2] use a cubic -spline function and a difference penalty on the spline coefficients. e Eilers and Marx’s estimator is computationally efficient compared to smoothing splines and O’Sullivan’s estimator since it removes the integration part of the penalty. Hence this paper focuses on the penalized spline estimator provided via Eilers and Marx [2]. e penalized spline method is efficient for both univariate regression and multiple regressions such as the additive model (see Marx and Eilers [3]). General properties usages and a description of the flexibility of penalized splines are described in Ruppert et al. [4]. When using penalized splines, the determination of the smoothing parameter is very important since it controls the trade-off between the goodness of fit and the smoothness of the fitted curve. As the classical method for achieving this, the grid search method is oſten used. e grid search method is selected by minimizing one criterion from candidate points of the smoothing parameter. Criteria for grid searches include cross-validation, generalized cross-validation, Mallow’s , and so forth. Although the grid search selection generally finds one optimal smoothing parameter, it is possible that the worth curve is obtained when not all the candidates are good. is tendency is especially striking in additive models since the number of the smoothing parameter is the same as that of the covariates. Several smoothing parameter selection methods using the grid search criteria have been developed by many authors such as Krivobokova [5], Reiss and Ogden [6], Wood [7], Wood [8], and Wood [9]. On the other hand, the mixed model representation of the spline smoothing has also been studied (see Lin and Zhang [10], Wand [11], and Ruppert et al. [4]). In mixed models, the grid search method is not necessary to obtain the final fit curve. e smoothing parameter in the mixed model can be written as the ratio of the variance of the random coefficient and the error. By estimating these unknown variances using a maximum likelihood method or a restricted maximum likelihood method (REML), the final fitted curves are obtained, yielding the estimated best linear unbiased predictor (EBLUP). erefore the EBLUP does not require a grid search. However the fitted curve tends to theoretically oversmooth and the numerical stability is not guaranteed if a cubic spline is used (see Section 3). e Bayesian approach Hindawi Publishing Corporation Journal of Probability and Statistics Volume 2014, Article ID 203469, 11 pages http://dx.doi.org/10.1155/2014/203469

Upload: others

Post on 14-Apr-2022

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Research Article Direct Determination of Smoothing

Research ArticleDirect Determination of Smoothing Parameter forPenalized Spline Regression

Takuma Yoshida

Graduate School of Science and Engineering Kagoshima University Kagoshima 890-8580 Japan

Correspondence should be addressed to Takuma Yoshida yoshidascikagoshima-uacjp

Received 7 January 2014 Revised 31 March 2014 Accepted 31 March 2014 Published 22 April 2014

Academic Editor Dejian Lai

Copyright copy 2014 Takuma Yoshida This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Penalized spline estimator is one of the useful smoothing methods To construct the estimator having goodness of fit andsmoothness the smoothing parameter should be appropriately selected The purpose of this paper is to select the smoothingparameter using the asymptotic property of the penalized splinesThe new smoothing parameter selectionmethod is established inthe context of minimization asymptotic form of MISE of the penalized splines The mathematical and the numerical properties ofthe proposed method are studied First we organize the newmethod in univariate regression model Next we extend to the additivemodels A simulation study to confirm the efficiency of the proposed method is addressed

1 Introduction

Penalized spline methods are a well-known efficient tech-nique for nonparametric smoothing Penalized splines weresuggested by OrsquoSullivan [1] and Eilers and Marx [2] InOrsquoSullivan [1] they used a cubic 119861-spline function and thepenalty was the integrated squared second derivative of the119861-spline function On the other hand Eilers and Marx [2]use a cubic 119861-spline function and a difference penalty onthe spline coefficients The Eilers and Marxrsquos estimator iscomputationally efficient compared to smoothing splines andOrsquoSullivanrsquos estimator since it removes the integration part ofthe penalty Hence this paper focuses on the penalized splineestimator provided via Eilers and Marx [2] The penalizedspline method is efficient for both univariate regression andmultiple regressions such as the additivemodel (seeMarx andEilers [3]) General properties usages and a description of theflexibility of penalized splines are described in Ruppert et al[4]

When using penalized splines the determination of thesmoothing parameter is very important since it controls thetrade-off between the goodness of fit and the smoothness ofthe fitted curve As the classicalmethod for achieving this thegrid search method is often used The grid search method isselected byminimizing one criterion fromcandidate points of

the smoothing parameter Criteria for grid searches includecross-validation generalized cross-validation Mallowrsquos 119862119901and so forth Although the grid search selection generallyfinds one optimal smoothing parameter it is possible thatthe worth curve is obtained when not all the candidatesare good This tendency is especially striking in additivemodels since the number of the smoothing parameter is thesame as that of the covariates Several smoothing parameterselection methods using the grid search criteria have beendeveloped by many authors such as Krivobokova [5] Reissand Ogden [6] Wood [7] Wood [8] and Wood [9] On theother hand the mixed model representation of the splinesmoothing has also been studied (see Lin and Zhang [10]Wand [11] and Ruppert et al [4]) In mixed models thegrid search method is not necessary to obtain the finalfit curve The smoothing parameter in the mixed modelcan be written as the ratio of the variance of the randomcoefficient and the error By estimating these unknownvariances using amaximum likelihoodmethod or a restrictedmaximum likelihood method (REML) the final fitted curvesare obtained yielding the estimated best linear unbiasedpredictor (EBLUP) Therefore the EBLUP does not require agrid search However the fitted curve tends to theoreticallyoversmooth and the numerical stability is not guaranteed if acubic spline is used (see Section 3) The Bayesian approach

Hindawi Publishing CorporationJournal of Probability and StatisticsVolume 2014 Article ID 203469 11 pageshttpdxdoiorg1011552014203469

2 Journal of Probability and Statistics

to select the smoothing parameter has been studied byFahrmeir et al [12] Fahrmeir andKneib [13] andHeinzl et al[14] Kauermann [15] compared some smoothing parameterselection methods

In this paper we propose a new method to determiningthe smoothing parameter using the asymptotic propertiesof the penalized splines For the remainder of this paperour new method will be known as the direct methodBefore describing the outline of the direct method we willbriefly introduce the asymptotic studies of penalized splinesFirst Hall and Opsomer [16] showed the consistency ofthe penalized spline estimator in white noise representationSubsequently Li and Ruppert [17] Claeskens et al [18]Kauermann et al [19] andWang et al [20] have developed theasymptotics for the penalized spline estimator in univariateregression Yoshida and Naito [21] and Yoshida and Naito[22] have studied the asymptotics for penalized splines inadditive regression models and generalized additive modelsrespectively Xiao et al [23] suggested a new penalized splineestimator and developed its asymptotic properties in bivari-ate regression Thus the developments of the asymptotictheories of the penalized splines are relatively recent eventsIn addition the smoothing parameter selection methodsusing asymptotic properties have not yet been studied Thismotivates us to try to establish such methods

The direct method is conducted by minimizing the meanintegrated squared error (MISE) of the penalized spline esti-mator In general the MISE of the nonparametric estimatoris divided into the integrated squared bias and the integratedvariance of the estimator Of course the penalized splineestimator is no exception and hence the direct method isstated by utilizing the expression of the asymptotic bias andvariance of the penalized spline estimator which have beenderived by Claeskens et al [18] Kauermann et al [19] andYoshida and Naito [22] From their result we see that theasymptotic order of the variance of the penalized splineestimator is only dependent on the sample size and thenumber of knots but not the smoothing parameter Howeverthe second term of the asymptotic variance of the penalizedspline estimator contains the smoothing parameter and wecan see that the variance becomes small when the smoothingparameter increases On the other hand the squared biasof the penalized spline estimator increases if the smoothingparameter is reduced Therefore the minimizer of the MISEof the penalized spline estimator can be seen as one of theoptimal smoothing parameters Since the MISE is asymptot-ically convex with respect to the smoothing parameter theglobalminimumof theMISE can be foundThis detection hasbeen sufficiently developed for bandwidth selection in kernelregression (see Ruppert et al [24]Wand and Jones [25] etc)First the present paper focuses on univariate regression andwe next extend the direct method to the additive models Inboth models the mathematical and the numerical propertiesof the direct method are studied In additive models we needto select a smoothing parameter of the same number as theexplanatory variable such that the computational cost of thegrid search becomes large We expect that the computationalcost of the direct method is dramatically reduced comparedto the grid search

The structure of this paper is as follows In Section 2we introduce a penalized spline estimator in a univariateregression model Section 3 provides the direct method andrelated properties Section 4 extends the direct method to theadditive model In Section 5 we confirm the performanceof the direct method using a numerical study We provide adiscussion on the outlook and further studies in Section 6The proofs of our theorems are provided in the appendix

2 Penalized Spline Estimator

Consider the regression problem with 119899 observations

119884119894 = 119891 (119909119894) + 120576119894 119894 = 1 119899 (1)

where 119884119894 is the response variable 119909119894 is the explanatory vari-able 119891 is the true regression function and 120576119894 is the randomerror which is assumed to be independently distributed withmean 0 and variance 1205902Throughout the paper we assume theexplanatory variable 119909119894 isin [119886 119887] (119894 = 1 119899) is not a randomvariable from which the expectation of 119884119894 can be expressedas 119864[119884119894 | 119909119894] = 119891(119909119894) The support of the explanatory 119909119894can be relaxed as the real space R In order to simplify theway of locating the knots in the following the support of theexplanatory is assumed to be with compact space We aim toestimate 119891 via a nonparametric penalized spline method Weconsider the knots 119886 = 1205810 lt 1205811 lt sdot sdot sdot lt 120581119870 lt 120581119870+1 = 119887 andfor 119896 = minus119901 119870 minus 1 let 119861[119901]

119896(119909) be the 119901th degree 119861-spline

basis function defined as

119861[0]

119896(119909) =

1 120581119896 lt 119909 le 120581119896+1

0 otherwise

119861[119901]

119896(119909) =

119909 minus 120581119896

120581119896+119901 minus 120581119896

119861[119901minus1]

119896(119909) +

120581119896+119901+1 minus 119909

120581119896+119901+1 minus 120581119896+1

119861[119901minus1]

119896+1(119909)

(2)

associated with the above knots and the additional knots120581minus119901 = 120581minus119901+1 = sdot sdot sdot = 1205810 and 120581119870+1 = sdot sdot sdot = 120581119870+119901+1 The 119861-spline function 119861[119901]

119896(119909) is a piecewise 119901th degree polynomial

on an interval [120581119896 120581119896+119901+1] The details of the 119861-spline basisfunctions are described in de Boor [26] For simplicity wewrite 119861119896(119909) = 119861

[119901]

119896(119909) since we do not specify the 119901 in the

following sentenceWeuse the linear combination of 119861119896(119909) | 119896 = minus119901 119870minus

1 and the unknown parameters 119887119896 (119896 = minus119901 119870 minus 1)

to approximate the regression function and consider the 119861-spline regression problem

119884119894 = 119904 (119909119894) + 120576119894 119894 = 1 119899 (3)

where

119904 (119909) =

119870minus1

sum

119896=minus119901

119861119896 (119909) 119887119896 (4)

The purpose is to estimate the parameter b =

(119887minus119901 sdot sdot sdot 119887119870minus1)119879 included in 119904(119909) instead of 119891 directly

Journal of Probability and Statistics 3

The penalized spline estimator b = (minus119901 sdot sdot sdot 119870minus1)119879

of b isdefined as the minimizer of

119899

sum

119894=1

119884119894 minus

119870minus1

sum

119896=minus119901

119861119896(119909)119887119896

2

+ 120582

119870minus1

sum

119896=119898minus119901

Δ119898(119887119896)2 (5)

where 120582 is the smoothing parameter and Δ is the backwarddifference operator defined as Δ119887119896 = 119887119896 minus 119887119896minus1 and

Δ119898(119887119896) = Δ

119898minus1(Δ119887119896) =

119898

sum

119895=0

(minus1)119898minus119895

119898119862119895119887119896minus119898+119895 (6)

Let 119863119898 = (119889(119898)

119894119895)119894119895 be a (119870 + 119901 minus 119898) times (119870 + 119901) matrix where

119889(119898)

119894119895= (minus1)

|119894minus119895|

119898119862|119894minus119895| for 119894 le 119895 le 119898+1 and 0 otherwise Usingthe notation Y = (1198841 sdot sdot sdot 119884119899)

119879 and 119885 = (119861minus119901+119895minus1(119909119894))119894119895 (5)can then be expressed as

(Y minus 119885b)119879 (Y minus 119885b) + 120582b119879119863119879119898119863119898b (7)

The minimum of (7) is obtained when

b = (119885119879119885 + 120582119863119879119898119863119898)minus1

119885119879Y (8)

The penalized spline estimator 119891(119909) of 119891(119909) for 119909 isin [119886 119887] isdefined as

119891 (119909) =

119870minus1

sum

119896=minus119901

119861119896 (119909) 119896 = B(119909)119879b (9)

where B(119909) = (119861minus119901(119909) sdot sdot sdot 119861119870minus1(119909))119879

If120582 equiv 0119891(119909) is reduced to the regression spline estimatorwhich is the spline estimator obtained via the least squaresmethod The regression spline estimator will lead to anoscillatory fit if the number of knots 119870 is large However thedetermination of119870 and the location of knots are very difficultproblems The advantage of the penalized spline smoothingis that the good smoothing parameter brings the estimatorto the curve with the fitness and smoothness simultaneouslywithout choosing the number and location of knots preciselyIn the present paper we use equidistant knots 120581119896 = 119886 + 119896119870and focus on the determination of the smoothing parameterAs the location of knots other than above the quantiles ofthe data points 1199091 119909119899 are often used (see Ruppert [27])However it is known that the penalized spline estimator ishardly affected by the location of knots if 119870 is not too smallTherefore we do not discuss the location of knots We suggestthe direct method for this in the next section

3 Direct Determination ofSmoothing Parameter

In this section we provide the direct method for determiningthe smoothing parameter without a grid search This directmethod is given theoretical justification by asymptotic theoryof the penalized spline estimator To investigate the asymp-totic property of the penalized spline estimator we assumethat 119891 isin 119862119901+1 119870 = 119900(11989912) and 120582 = 119874(1198991198701minus119898)

For convenience we first give some notation Let 119866119899 =119899minus1119885119879119885 and Λ 119899 = 119866119899 + (120582119899)119863

119879

119898119863119898 Let blowast be a best 119871infin

approximation to the true function 119891 This means that blowastsatisfies

sup119909isin(119886119887)

10038161003816100381610038161003816119891 (119909) + 119870

minus(119901+1)119887119886 (119909) minus B(119909)

119879blowast10038161003816100381610038161003816

= 119900 (119870minus(119901+1)

) as 119870 997888rarr infin(10)

where

119887119886 (119909) = minus119891(119901+1)

(119909)

(119901 + 1)

119870

sum

119896=1

119868 (120581119896minus1 le 119909 lt 120581119896)Br119901+1 (119909 minus 120581119896minus1

119870minus1)

(11)

119868(119886 lt 119909 lt 119887) is the indicator function of an interval (119886 119887)and Br119901(119909) is the 119901th Bernoulli polynomial (see Zhou et al[28]) It can be easily shown that 119887119886(119909) = 119874(1) as 119870 rarr infin

The penalized spline estimator can be written as

119891 (119909) = B(119909)119879(119885119879119885)minus1

119885119879Y

minus120582

119899B(119909)119879Λminus1

119899119863119879

119898119863119898119866minus1

119899(119885

119899)

119879

Y(12)

The first term of the right hand side of (12) is equal to theregression spline estimator denoted by 119891rs(119909) The asymp-totics for the regression spline estimator has been developedby Zhou et al [28] and can be expressed as

119864 [119891rs (119909)] = 119891 (119909) + 119870minus(119901+1)

119887119886 (119909) + 119900 (119870minus(119901+1)

)

119881 [119891rs (119909)] =1205902

119899B(119909)119879119866minus1

119899B (119909) 1 + 119900 (1) = 119874(119870

119899)

(13)

FromTheorem 2(a) of Claeskens et al [18] we have

119864 [119891 (119909)] minus 119864 [119891rs (119909)] = minus120582

119899B(119909)119879Λminus1

119899119863119879

119898119863119898blowast

+ 119900(1205821198701minus119898

119899) = 119874(

1205821198701minus119898

119899)

119881 [119891 (119909)] = 119881 [119891rs (119909)] minus120582119870

119899

119862 (119909 | 119899 119870 120582)

119870+ 119900(

1205821198702

1198992)

(14)

where

119862 (119909 | 119899 119870 120582) =21205902

119899B(119909)119879Λminus1

119899119863119879

119898119863119898119866minus1

119899B (119909) (15)

is the covariance of 119891rs(119909) and the second term of the righthand side of (12)The variance of the second term of (12) canbe shown to be negligible (see the appendix) The followingtheorem leads to 120582 controlling the trade-off between thesquared bias and variance of the penalized spline estimator

Theorem1 Thecovariance119862(119909 | 119899 119870 120582)119870 in (15) is positiveFurthermore as 119899 rarr infin 119862(119909 | 119899 119870 120582) = 119862(119909 | 119899 119870 0)(1 +119900(1)) and 119862(119909 | 119899 119870 0)119870 = 119874(119870119899)

4 Journal of Probability and Statistics

From the asymptotic form of 119864[119891(119909)] and 119881[119891(119911)] andTheorem 1 we see that for small 120582 the bias of 119891(119909) issmall and the variance becomes large On the other handthe large 120582 indicates the bias of 119891(119909) increases and thevariance decreases From Theorem 1 the MISE of 119891(119909) canbe expressed as

int

119887

119886

119864 [119891 (119909) minus 119891 (119909)2

] 119889119909

= int

119887

119886

119870minus(119901+1)

119887119886 (119909) minus120582

119899B(119909)119879Λminus1

119899119863119879

119898119863119898blowast

2

119889119909

+1205902

119899int

119887

119886

B(119909)119879119866119899B (119909) 119889119909 minus120582119870

119899

int119887

119886119862 (119909 | 119899 119870 0) 119889119909

119870

+ 1199031119899 (119870) + 1199032119899 (119870 120582)

(16)

where 1199031119899(119870) and 1199032119899(119870 120582) are of negligible order respec-tively compared to the regression spline and penalized splineof the second term of the right hand side of (12) Actuallywe have 1199031119899(119870) = 119900(119870119899) and 1199032119899(119870 120582) = 119900(120582

21198702(1minus119898)

1198992)

The MISE of 119891(119909) is asymptotically quadratic and a globalminimum exists Let

MISE (120582) = 1205822

1198992int

119887

119886

B(119909)119879Λminus1119899119863119879

119898119863119898blowast2

119889119909

minus120582

119899int

119887

119886

2119887119886 (119909)B(119909)119879Λminus1119899 119863

119879

119898119863119898blowast

119870119901+1

+119862 (119909 | 119899 119870 0) 119889119909

(17)

And let 120582opt be theminimizer ofMISE(120582)We suggest the useof 120582opt as the optimal smoothing parameter

120582opt =119899

2

int119887

1198861198632119899 (119909 | 119891

(119901+1) blowast) 119889119909

int119887

1198861198631119899 (119909 | blowast) 119889119909

(18)

where

1198631119899 (119909 | blowast) = B(119909)119879119866minus1

119899119863119879

119898119863119898blowast2

1198632119899 (119909 | 119891(119901+1)

blowast) = 2119887119886 (119909)B(119909)119879119866minus1119899 119863

119879

119898119863119898blowast

119870119901+1

+ 119862 (119909 | 119899 119870 0)

(19)

However it is easy to see that MISE(120582) and 120582opt contain anunknown function and parameters and hence these must beestimated We construct the estimator of 120582opt by using theconsistent estimator of 119891(119901+1) and blowast We can use the penal-ized spline estimator and its derivative as the pilot estimatorof119891(119901+1) and blowast If we then use another smoothing parameter1205820 it should be chosen appropriately Therefore we use theregression spline estimator as the pilot estimator of 119891(119901+1)(119909)

and blowast First we establish b = (119885119879119885)minus1119885119879119884 Next we con-struct the pilot estimator with119891(119901+1)(119909) by using the (119901+2)thdegree 119861-spline basis Let B[119901](119909) = (119861[119901]minus119901 (119909) sdot sdot sdot 119861

[119901]

119870minus1(119909))119879

Using the fundamental property of the 119861-spline function119891(119898)(119909) can be written as 119891(119898)(119909) = 119861

[119901minus119898](119909)119879119863119898blowast

asymptotically Hence the regression spline estimator 119891(119901+1)

can be constructed as 119891(119901+1)(119909) = B[119901+1](119909)119879119863119901+1b(119901+2)where b(119901+2) = ((119885[119901+2])119879119885[119901+2])minus1(119885[119901+2])119879Y and 119885[119901+2] =(119861[119901+2]

minus119901+119895minus1(119909119894))119894119895 Since the regression spline estimator tends to

be oscillatory with a higher order 119901th spline function thefewer knots are used to construct 119891(119901+1)(119909) The variance 1205902included in 119862(119909 | 119899 119870 0) is estimated via

2=

1

119899 minus (119870 + 119901)

119899

sum

119894=1

119884119894 minus B(119909119894)119879b2

(20)

Using the above pilot estimators

opt =119899

2

sum119869

119895=11198632 (119911119895 | 119891

(119901+1) b)

sum119869

119895=11198631 (119911119895 | b)

(21)

with some finite grid points 119911119895119869

1on [119886 119887]

Consequently the final penalized spline estimator isdefined as

119891 (119909) = B(119909)119879bopt (22)

where

bopt = (119885119879119885 + opt119863

119879

119898119863119898)minus1

119885119879Y (23)

It is known that the optimal order of 119870 of the penalizedspline estimator is the same as that of the regression splineestimator 119870 = 119874(119899

1(2119901+3)) (see Kauermann et al [19] and

Zhou et al [28]) Using this we show the asymptotic propertyof 120582opt in following theorem

Theorem 2 Let 119891 isin 119862119901+1 Suppose that 119870 = 119900(11989912) and

120582 = 119900(1198991198701minus119898) Then 120582119900119901119905 given in (21) exists and 120582119900119901119905 =

119874(119899119870119898minus119901minus2

+1198702119898) as 119899 rarr infin Furthermore119870 = 119874(1198991(2119901+3))

leads to the optimal order

120582opt = 119874 (119899(119901+119898+1)(2119901+3)

) + 119874 (1198992119898(2119901+3)

) (24)

and the rate of convergence of MISE of 119891(119909) becomes119874(119899minus2(119901+1)(2119901+3)

)

The proof of Theorem 2 is given in the appendix At theend of this section we give a few remarks

Remark 3 The asymptotic order of the squared biasand variance of the penalized splines are 119874(119870minus2(119901+1)) +119874(12058221198702(1minus119898)

1198992) and 119874(119870119899) respectively Therefore under

119870 = (1198991(2119901+3)

) and 120582 = 119874(119899(119901+119898+1)(2119901+3)

) the optimalrate of convergence of MISE of the penalized splines is119874(119899minus2(119901+1)(2119901+3)

) From Theorem 2 we see that the asymp-totic order of 120582opt yields the optimal rate of convergence

Journal of Probability and Statistics 5

Remark 4 OrsquoSullivan [1] used 120583int119887119886119904(119898)(119909)2119889119909 as the penalty

term where 120583 is the smoothing parameter When equidistantknots are used the penaltyint119887

119886119904(119898)(119909)2119889119909 can be expressed as

1198702119898b119879119863119879

119898119877119863119898b where 119877 = (int

119887

119886119861[119901minus119898]

119894(119909)119861[119901minus119898]

119895(119909))119894119895 The

penalty b119879119863119879119898119863119898b provided by Eilers and Marx [2] can be

seen as the simple version of1198702119898b119879119863119879119898119877119863119898b by replacing 119877

with 119870minus1119868 and 120582 = 1205831198702119898minus1 where 119868 is the identity matrixSo the order 119898 of the difference matrix 119863119898 controls thesmoothness of the 119901th 119861-spline function 119904(119909) and hence 119898should be set such that119898 le 119901 to give a theoretical justificationalthough the penalized spline estimator can be also calculatedfor 119898 gt 119901 Actually (119901119898) = (3 2) is often used by manyauthors

Remark 5 The penalized spline regression is often consid-ered as the mixed model representation (see Ruppert et al[4]) In this frame work we use the 119901th truncated splinemodel

119904 (119909) = 1205730 + 1205731119909 + sdot sdot sdot + 120573119901119909119901+

119870

sum

119895=1

119906119896(119909 minus 120581119896)119901

+ (25)

where (119909 minus 119886)+ = max119909 minus 119886 0 and the 120573rsquos are unknownparameters Each 119906119896 is independently distributed as 119906119896 sim119873(0 120590

2

119906) where 1205902

119906is an unknown variance parameter of 119906119896

The penalized spline fit of 119891(119909) is defined as the estimatedBLUP (see Robinson [29]) The smoothing parameter in theordinal spline regression model corresponds to 12059021205902

119906in the

spline mixed model Since the 1205902 and 1205902119906are estimated using

the ML or REML method we do not need to choose thesmoothing parameter via a grid search It is known that theestimated BLUP fit is linked to the penalized spline estimator(9) with 119898 = 119901 + 1 (see Kauermann et al [19]) Hence theestimated BLUP tends to have a theoretically underfit (seeRemark 4)

Remark 6 From Lyapunovrsquos theorem the asymptotic nor-mality of the penalized spline estimator119891(119909)with opt can bederived under the same assumption as Theorem 2 and someadditional mild conditions Although the proof is omittedit is straightforward since opt is the consistent estimator of120582opt and the asymptotic order of 120582opt satisfies Lyapunovrsquoscondition

4 Extension to Additive Models

We extend the direct method to the regression model withmultidimensional explanatory variables In particular weconsider additive models in this section For the dataset(119884119894 x119894) 119894 = 1 119899 with 1-dimensional response 119884119894 and119889-dimensional explanatory x119894 = (1199091198941 119909119894119889) the additivemodels are connected via unknown regression functions119891119895 (119895 = 1 119889) and the mean parameter 120583 such that

119884119894 = 120583 + 1198911 (1199091198941) + sdot sdot sdot + 119891119889 (119909119894119889) + 120576119894 (26)

We assume that 119909119894119895 is located on an interval [119886119895 119887119895] and1198911 119891119889 are normalized as

int

119887119895

119886119895

119891119895 (119906) 119889119906 = 0 119895 = 1 119889 (27)

to ensure the identifiability of 119891119895 Then the intercept 120583 istypically estimated via

120583 =1

119899

119899

sum

119894=1

119884119894 (28)

Hence we replace 119884119894 with 119884119894 minus 120583 in (26) and set 120583 = 0redefining the additive model as

119884119894 = 1198911 (1199091198941) + sdot sdot sdot + 119891119889 (119909119894119889) + 120576119894 119894 = 1 119899 (29)

where each 119884119894 is centered We aim to estimate 119891119895 via apenalized spline method Let

119904119895 (119909) =

119870119895minus1

sum

119896=minus119901

119861119895119896 (119909) 119887119895119896 119895 = 1 119889 (30)

be the 119861-spline model where 119861119895119896(119909) is the 119901th 119861-splinebasis function with knots 120581119895119896 | 119896 = minus119901 119870119895 + 119901 +

1 1205811198950 = 119886119895 120581119895119870119895+1= 119887119895 119895 = 1 119889 and 119887119895119896rsquos are unknown

parameters We consider the 119861-spline additive models

119884119894 = 1199041 (1199091198941) + sdot sdot sdot + 119904119889 (119909119894119889) + 120576119894 119894 = 1 119899 (31)

and estimate 119904119895 and 119887119895119896 For 119895 = 1 119889 the penalized spline

estimator b119895 = (119895minus119901 sdot sdot sdot 119895119870minus1)119879

of b119895 = (119887119895minus119901 sdot sdot sdot 119887119895119870minus1)119879

is defined as

[[

[

b1b119889

]]

]

= argminb1b119889

119899

sum

119894=1

119884119894 minus

119889

sum

119895=1

119870119895

sum

119896=minus119901

119861119895119896 (119909119894) 119887119895119896

2

+

119889

sum

119895=1

120582119895b119879

119895119863119879

119895119898119863119895119898b119895

(32)

where 120582119895 are the smoothing parameters and 119863119895119898 is the 119898thorder difference matrix with size (119870119895 + 119901 minus 119898) times (119870119895 + 119901) for119895 = 1 119889 By using b119895 the penalized spline estimator of119891119895(119909119895) is defined as

119891119895 (119909119895) =

119870119895minus1

sum

119896=minus119901

119861119895119896 (119909119895) 119895119896 119895 = 1 119889 (33)

6 Journal of Probability and Statistics

The asymptotics for 119891119895(119909119895) have been studied by Yoshidaand Naito [22] who derived the asymptotic bias and varianceto be

119864 [119891119895 (119909119895)] = 119891119895 (119909119895) + 119870minus(119901+1)

119895119887119895119886 (119909119895) (1 + 119900 (1))

+

120582119895

119899B119895(119909119895)

119879

Λminus1

119895119899119863119879

119895119898119863119895119898blowast

119895+ 119900(

1205821198951198701minus119898

119895

119899)

119881 [119891119895 (119909119895)] =1205902

119899B119895(119909119895)

119879

119866minus1

119895119899B119895 (119909119895)

minus120582119870

119899

1205902119862119895 (119909119895 | 119899 119870119895)

119870+ 119900(

1205821198951198702

119895

1198992)

(34)

where B119895(119909) = (119861minus119901(119909) sdot sdot sdot 119861119870119895minus1(119909))119879

119866119895119899 = 119899minus1119885119879

119895119885119895

Λ 119895119899 = 119866119895119899 + (120582119895119899)119863119879

119898119863119898 119885119895 = (119861minus119901+119896minus1(119909119894119895))119894119896 and blowast

119895is

the best 119871infin approximation of 119891119895

119887119895119886 (119909) = minus

119891(119901+1)

119895(119909)

(119901 + 1)

119870119895minus1

sum

119896=0

119868 (120581119895119896 le 119909 lt 120581119895119896+1)

times 119861119901+1(

119909 minus 120581119895119896

119870minus1119895

)

119862119895 (119909119895 | 119899 119870119895) =2

119899B119895(119909119895)

119879

119866minus1

119895119899119863119879

119895119898119863119895119898119866

minus1

119899B119895 (119909119895)

(35)

The above asymptotic bias and variance of 119891119895(119909119895) are sim-ilar to that of the penalized spline estimator in univariateregression with (119884119894 119909119894119895) 119894 = 1 119899 Furthermore

the asymptotic normality of [1198911(1199091) sdot sdot sdot 119891119889(119909119889)]119879

has beenshown by Yoshida and Naito [22] From their paper we findthat119891119894(119909119894) and119891119895(119909119895) are then asymptotically independent for119894 = 119895 This indicates some theoretical justification to select 120582119895in minimizing the MISE of 119891119895 Similar to the discussion inSection 3 the minimizer 120582119895opt of the MISE of 119891119895(119909119895) can beobtained for 119895 = 1 119889

120582119895opt =119899

2

int119887119895

119886119895119871119895119899 (119909 | 119891

(119901+1)

119895 blowast119895 1205902) 119889119909

int119887119895

119886119895119867119895119899 (119909 | blowast119895 ) 119889119909

(36)

where

119867119895119899 (119909 | blowast

119895) = B119895(119909119895)

119879

119866minus1

119895119899119863119879

119895119898119863119895119898blowast

119895

2

119871119895119899 (119909 | 119891(119901+1)

119895 blowast119895 1205902) = 2

119887119886 (119909)B119895(119909119895)119879

119866minus1

119895119899119863119879

119895119898119863119895119898blowast119895

119870119901+1

119895

+ 1205902119862119895 (119909 | 119899 119870119895)

(37)

Since 119891119895 blowast119895 and 1205902 are unknown they should be estimated

Thepilot estimators of119891119895 blowast119895 and1205902 are constructed based on

the regression splinemethod By using the pilot estimators119891119895b119895 of119891119895blowast119895 and the estimator of1205902 we construct the estimatorof 120582119895opt

119895opt =119899

2

sum119877

119903=1119871119895119899 (119911119903 | 119891

(119901+1)

119895 b119895 2)

sum119877

119903=1119867119895119899 (119911119903 | b119895)

(38)

where 119911119903119877

1is some finite grid point sequence on [119886119895 119887119895] We

obtain for 119895 = 1 119889 the penalized spline estimator 119891119895(119909119895)of 119891119895(119909119895)

119891119895 (119909119895) = B119895(119909119895)119879

b119895opt (39)

where b119895opt is the penalized spline estimator of b119895 using119895opt From Theorem 2 and the proof of Theorem 34of Yoshida and Naito [22] the asymptotic normality of[1198911(1199091) sdot sdot sdot 119891119889(119909119889)]

119879

using (1205821opt 120582119889opt) can be shown

Remark 7 Since the true regression functions are normalizedthe estimator 119891119895 should be also centered as

119891119895 (119909119895) minus1

119899

119899

sum

119894=1

119891119895 (119909119894119895) (40)

Remark 8 Thepenalized spline estimator of b1 b119889 can beobtained using a backfitting algorithm (Hastie and Tibshirani[30]) The backfitting algorithm for the penalized splines inadditive regression is detailed in Marx and Eilers [3] andYoshida and Naito [21]

Remark 9 Although we focus on the nonparametric additiveregression in this paper the directmethod can be also appliedto the generalized additive models However we omit thisdiscussion because the procedure is similar to the case of theadditive models discussed in this section

Remark 10 The direct method is quite computationally effi-cient when compared to the grid search method in additivemodels In grid searches we prepare the candidate of 120582119895 Let119872119895 be the set of all possible candidate grid value of 120582119895 for 119895 =1 119889 Then we need to compute the backfitting algorithm1198721 times sdot sdot sdot times 119872119889 times On the other hand it is sufficientto perform the backfitting algorithm for only two steps forthe pilot estimator and the final penalized spline estimatorThus compared with the conventional grid search methodthe direct method can drastically reduce computation time

5 Numerical Study

In this section we investigate the finite sample performanceof the proposed direct method in a Monte Carlo simulationLet us first consider the univariate regression model (1) forthe data (119884119894 119909119894) 119894 = 1 119899 Then we use the three

Journal of Probability and Statistics 7

Table 1 Results of sample MISE for 119899 = 50 and 119899 = 200 All entries for MISE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0526 0322 3684 1070 1160 0377L-GCV 0726 0621 5811 1082 1183 0379L-REML 2270 1544 9966 3520 1283 0873Local linear 0751 0220 4274 0901 1689 0503C-Direct 0401 0148 3027 0656 1044 0326C-GCV 0514 0137 3526 0666 1043 0290C-REML 1326 0732 8246 4213 3241 0835

Table 2 Results of sample MSE of smoothing parameter obtained by direct method GCV and REML for 119899 = 50 and 119899 = 200 All entries forMSE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0331 0193 0113 0043 0025 0037L-GCV 0842 0342 0445 0101 0072 0043L-REML 1070 1231 0842 0437 0143 0268C-Direct 0262 0014 0082 0045 0053 0014C-GCV 0452 0123 0252 0092 0085 0135C-REML 0855 0224 0426 0152 0093 0463

types of true regression function 119891(119909) = cos(120587(119909 minus 03))119891(119909) = 120601((119909 minus 08)005)radic005 minus 120601((119909 minus 02)004)radic004and 119891(119909) = 21199093 +3 sin(2120587(119909minus08)3) + 3 exp[minus(119909minus05)201]which are labeled by F1 F2 and F3 respectively Here120601(119909) is the density function of the normal distribution Theexplanatory 119909119894 and error 120576119894 are independently generated fromuniform distribution on [0 1] and 119873(0 052) respectivelyWe estimate each true regression function via the penalizedspline method We then use the linear and cubic 119861-splinebases with equidistant knots and the second order differencepenalty In addition we set 119870 = 5119899

25 equidistant knotsand the smoothing parameter is determined by the directmethodThe penalized spline estimator with the linear splineand cubic spline are denoted as L-Direct and C-Directrespectively For comparison with L-Direct the same studiesare also implemented for the penalized spline estimator witha linear spline and the smoothing parameter selected viaGCV and restricted maximum likelihood method (REML)in mixed model representation and the local polynomialestimator with normal kernel and Plug-in bandwidth (seeRuppert et al [24]) In GCV we set the candidate values of120582 as 11989410 119894 = 0 1 99 The above three estimators aredenoted by L-GCV L-REML and local linear respectivelyFurthermore we compare C-Direct with C-GCV and C-REML which are the penalized spline estimator with thecubic spline and the smoothing parameter determined byGCV and REML Let

sMISE = 1119869

119869

sum

119895=1

[1

119877

119877

sum

119903=1

119891119903 (119911119895) minus 119891 (119911119895)2

] (41)

be the sample MISE of any estimator 119891(119909) of 119891(119909) where119891119903(119911) is 119891(119911) with 119903th replication and 119911119895 = 119895119869 We calculatethe sample MISE of the penalized spline estimator withthe direct method GCV and REML and the local linearestimator In this simulation we use 119869 = 100 and 119877 = 1000We have simulated 119899 = 50 and 200

The sMISE of all estimators for each model and 119899 aregiven in Table 1 The penalized spline estimator using thedirect method shows good performance in each setting Incomparison with other smoothing parameter methods thedirect method is a little better than the GCV as a wholeHowever for 119899 = 200 C-GCV is better than C-Direct inF1 and F3 though its difference is very small We see thatthe sMISE of C-Direct is smaller than local linear whereaslocal linear behaves better than L-Direct in some case In F2C-Direct is the smallest of all estimators for 119899 = 50 and200 Although the performance totally seems to depend onsituation in which data sets are generated we believe that theproposed method is one of the efficient methods

Next the difference between opt and 120582opt is investigatedempirically Let opt119903 be the opt with 119903th replications for 119903 =1 1000 Then we calculate the sample MSE of opt

sMSE = 1119877

119877

sum

119903=1

opt119903 minus 120582opt2

(42)

for F1 F2 and F3 and 119899 = 50 and 200 To construct 120582opt forF1 F2 and F3 we use true119891(4)(119909) and 1205902 and an approximateblowast Here the approximate blowast means the sample average of bwith 119899 = 200 and 1000 replications

In Table 2 the sMSE of opt for each true functionare described For comparison the sMSE of the smoothing

8 Journal of Probability and Statistics

Table 3 Results of sample PSE for 119899 = 50 and 119899 = 200 All entries for PSE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0424 0052 0341 0070 0360 0117L-GCV 0331 0084 0381 0089 0333 0139L-REML 0642 0124 0831 1002 0733 0173Local linear 0751 0220 0474 0901 0489 0143C-Direct 0341 048 0237 063 0424 0086C-GCV 0314 043 0326 076 0533 0089C-REML 0863 1672 0456 121 1043 0125

parameter obtained via GCV and REML are calculated ThesMSE of the L-Direct and the C-Direct is small even in 119899 = 50for all true functions Therefore it seems that the accuracyof opt is guaranteed It indicates that the pilot estimatorconstructed via least squares method is not bad The sMSEwith the direct method are smaller than that with GCV andREML This result is not surprising since GCV and REMLare not concerned with 120582opt However together with Table 1it seems that the sample MSE of the smoothing parameter isreflected in the sample MISE of the estimator

The proposed direct method was derived based on theMISE On the other hand the GCV and the REML areobtained in context of minimizing prediction squared error(PSE) and prediction error Hence we compare the samplePSE of the penalized spline estimator with the direct methodGCV and REML and the local linear estimator Since theprediction error is almost the same as MISE (see Section 4 ofRuppert et al [4]) we omit the comparison of the predictionerror Let

sPSE = 1119899

119899

sum

119894=1

[1

119877

119877

sum

119903=1

119910119894119903 minus 119891119903 (119909119894)2

] (43)

be the sample PSE for any estimator 119891(119909) where119910119894119903 (119903 = 1 119877) is independently generated from119884119894 sim 119873(119891(119909119894) (05)

2) for all 119894

In Table 3 the modified sPSE |sPSE minus (05)2| of allestimators for each model and 119899 are described In Remark 11we discuss the modified sPSE From the result we canconfirm that the direct method has good predictability GCVcan be regarded as the estimator of sPSE Therefore in somecase sPSE with GCV is smaller than that with the directmethod It seems that the cause is the accuracy of theestimates of variance (see Remark 11) However its differenceis very small

We admit that the computational time (in second) takento obtain 119891(119909) for F1 119901 = 3 and 119899 = 200 The fits withthe direct method GCV and REML took 004 122 and 034Although the difference is small the computational time ofthe direct method was faster than that of GCV and REML

Next we confirm the behavior of the penalized splineestimator with the direct method in the additive model Forthe data (119884119894 1199091198941 1199091198942 1199091198943) 119894 = 1 119899 we assume theadditive model with true functions 1198911(1199091) = sin(21205871199091)

1198912(1199092) = 120601((1199092 minus 05)02) and 1198913(1199093) = 04120601((1199093 minus

01)02) + 06120601((1199092 minus 08)02) The error is similar to thefirst simulationThe design points (1199091198941 1199091198942 1199091198943) are generatedby

[

[

1199091198941

1199091198942

1199091198943

]

]

=

[[[

[

(1 + 120588 + 1205882)minus1

0 0

0 (1 + 2120588)minus1

0

0 0 (1 + 120588 + 1205882)minus1

]]]

]

times [

[

1 120588 1205882

120588 1 120588

1205882120588 1

]

]

[

[

1199111198941

1199111198942

1199111198943

]

]

(44)

where 119911119894119895 (119894 = 1 119899 119895 = 1 2 3) are generated inde-pendently from the uniform distribution on [0 1] In thissimulation we adopt 120588 = 0 and 120588 = 05 We then correctedto satisfy

int

1

0

119891119895 (119911) 119889119911 = 0 119895 = 1 2 3 (45)

in each 120588 We construct the penalized spline estimator viathe backfitting algorithm with the cubic spline and secondorder difference penalty The number of equidistant knots is119870119895 = 5119899

25 and the smoothing parameters are determinedusing the direct method The pilot estimator to construct119895opt is the regression spline estimator with fifth spline and119870119895 = 119899

25 We calculate the sample MISE of each 119891119895 for119899 = 200 and 1000 Monte Carlo iterations Then we calculatethe sample MISE of 1198911 1198912 and 1198913 In order to compare withthe direct method we also conduct the same simulation withGCV and REML In GCV we set the candidate values of 120582119895 as11989410 119894 = 0 1 9 for 119895 = 1 2 3 Table 4 summarizes thesample MISE of 119891119895 (119895 = 1 2 3) denoted by MISE119895 for directmethod GCV and REML The penalized spline estimatorwith the direct method performs like that with the GCV inboth the uncorrelated and correlated design cases For 120588 = 0the behaviors of MISE1 MISE2 and MISE3 with the directmethod are similar On the other hand for GCV the MISE1is slightly larger than MISE2 and MISE3 The direct methodleads to an efficient estimate of all covariates On the wholethe direct method is better than REML From above webelieve that the direct method is preferable in practice

Journal of Probability and Statistics 9

Table 4 Results of sample MISE of the penalized spline estimator For each direct method GCV and 120588 MISE1 MISE2 and MISE3 are thesample MISE of 1198911 1198912 and 1198913 respectively All entries for MISE are 102 times their actual values

Method 120588 = 0 120588 = 05

MISE1 MISE2 MISE3 MISE1 MISE2 MISE3Direct 0281 0209 0205 0916 0795 0972GCV 0687 0237 0390 0929 0857 1112REML 1204 0825 0621 2104 1832 2263

Table 5 Results of sample MSE of the selected smoothing parameter via direct method GCV and REML for 120588 = 0 and 120588 = 05 All entriesfor MSE are 102 times their actual values

120588 = 0 120588 = 05

True 1198911 1198912 1198913 1198911 1198912 1198913

Direct 0124 0042 0082 0312 0105 0428GCV 1315 0485 0213 0547 0352 0741REML 2531 1237 2656 1043 0846 1588

Table 6 Results of sample PSE of the penalized spline estimator Allentries for PSE are 102 times their actual values

119899 = 200 120588 = 0 120588 = 05

Direct 0281 0429GCV 0387 0467REML 1204 0925

To confirm the consistency of the direct method thesample MSE of 119895opt is calculated the same manner as thatgiven in univariate regression case For comparison thesample MSE of GCV and REML are obtained Here the true120582119895opt (119895 = 1 2 3) is defined in the same way as that forunivariate case In Table 5 the results for each 1198911 1198912 and 1198913and 120588 = 0 and 05 are shown We see from the results that thebehavior of the directmethod is goodThe sampleMSE of thedirect method is smaller than that of GCV and REML for all119895 For the random design 120588 = 05 such tendency can be alsoseen

Table 6 shows the sPSE of 119891(x119894) = 1198911(1199091198941) + 1198912(1199091198942) +1198913(1199091198943)with each smoothing parameter selectionmethodWesee from result that the efficiency of the direct method isguaranteed in the context of prediction accuracy

Finally we show the computational time (in second)required to construct the penalized spline estimatorwith eachmethod The computational times with the direct methodGCV and REML are 1187 s 12643 s and 435 s respectivelyWe see that the direct method is more efficient than othermethods in context of the computation (see Remark 11) Allof computations in the simulation were done using a softwareR and a computer with 340GHz CPU and 240GB memoryThough this is only few examples the direct method can beseen as one of the good methods to select the smoothingparameter

Remark 11 We calculate |sPSE minus (05)2| as the criteria toconfirm the prediction squared error The ordinal PSE isdefined as

PSE = 1119899

119899

sum

119894=1

119864 [119884119894 minus 119891 (119909119894)2

]

= 1205902+1

119899

119899

sum

119894=1

119864 [119891 (119909119894) minus 119891 (119909119894)2

]

(46)

where 119884119894rsquos are the test data The second term of the PSEis similar to MISE So it can be said that the sample PSEevaluates the accuracy of the variance part andMISE part Tosee the detail of the difference of the sample PSE between thedirect method and other method we calculated |sPSE minus 1205902|in this section

6 Discussion

In this paper we proposed a new direct method for deter-mining the smoothing parameter for a penalized splineestimator in a regression problemThe directmethod is basedon minimizing MISE of the penalized spline estimator Westudied the asymptotic property of the direct method Theasymptotic normality of the penalized spline estimator usingopt is theoretically guaranteed when the consistent estimatoris used as the pilot estimator to obtain opt In numericalstudy for the additive model the computational cost of thisdirect method is dramatically reduced when compared togrid search methods such as GCV Furthermore we find thatthe performance of the direct method is better than or at leastsimilar to that of other methods

The direct method will be developed for other regressionmodels such as varying-coefficient models Cox propor-tional hazards models single-index models and others ifthe asymptotic bias and variance of the penalized spline

10 Journal of Probability and Statistics

estimator are derived It is not limited to themean regressionit can be applied to quantile regression Actually Yoshida[31] has presented the asymptotic bias and variance of thepenalized spline estimator in univariate quantile regressionsFurthermore it is seen that improving the direct method isimportant for various situations and datasets In particularthe development of the determination of locally adaptive 120582 isan interesting avenue of further research

Appendix

We describe the technical details For the matrix 119860 = (119886119894119895)119894119895119860infin = max119894119895|119886119894119895| First to prove Theorems 1 and 2 weintroduce the fundamental property of penalized splines inthe following lemma

Lemma A1 Let 119860 = (119886119894119895)119894119895 be (119870 + 119901) matrix Supposethat 119870 = 119900(11989912) 120582 = 119900(1198991198701minus119898) and 119860infin = 119874(1) Then119860119866minus1

119899infin= 119874(119870) and 119860Λminus1

119899infin= 119874(119870)

The proof of Lemma A1 is shown in Zhou et al [28] andClaeskens et al [18] The repeated use of Lemma A1 yieldsthat the asymptotic order of the variance of the second termof (12) is 119874(1205822(119870119899)3) Actually the asymptotic order of thevariance of the second term of (12) can be calculated as

(120582

119899)

21205902

119899B(119909)119879Λminus1

119899119863119879

119898119863119898119866minus1

119899119863119879

119898119863119898Λminus1

119899B (119909)

= 119874(1205822

11989931198703)

(A1)

When 119870 = 119900(11989912) and 120582 = 119874(119899119870

1minus119898) 1205822(119870119899)3 =

119900(120582(119870119899)2) holds

Proof of Theorem 1 We write

119862 (119909 | 119899 119870 120582) =21205902

119899B(119909)119879119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909)

+21205902

119899B(119909)119879 Λminus1

119899minus 119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909) (A2)

Since

Λminus1

119899minus 119866minus1

119899= Λminus1

119899119868 minus Λ 119899119866

minus1

119899

=120582

119899Λminus1

119899119863119879

119898119863119898119866minus1

119899

(A3)

we have Λminus1119899minus 119866minus1

119899infin

= 119874(1205821198702119899) and hence the

second term of right hand side of (A2) can be shown119874((119870119899)

2(120582119870119899)) From Lemma A1 it is easy to derive that

119862(119909 | 119899 119870 0) = 119874(119870(119870119899)) Consequently we have

119862 (119909 | 119899 119870 120582)

119870=119862 (119909 | 119899 119870 0)

119870+ 119874((

119870

119899)

2

(120582

119899))

=119862 (119909 | 119899 119870 0)

119870+ 119900 (

119870

119899)

(A4)

and this leads to Theorem 1

Proof of Theorem 2 It is sufficient to prove that int1198871198861198631119899(119909 |

blowast)119889119909 = 119874(1198702(1minus119898)) and

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874(1198702

119899)

(A5)

Since100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

119889119909

100381610038161003816100381610038161003816100381610038161003816

le sup119909isin[119886119887]

[B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

] (119886 minus 119887)

(A6)

we have

int

119887

119886

1198631119899 (119909 | blowast) 119889119909 = 119874 (119870

2(1minus119898)) (A7)

by using Lemma A1 and the fact that 119863119898blowastinfin = 119874(119870minus119898)

(see Yoshida [31])From the property of119861-spline function for (119870+119901) square

matrix 119860 there exists 119862 gt 0 such that

int

119887

119886

B(119909)119879119860B (119909) 119889119909 le 119860infin int119887

119886

B(119909)119879B (119909) 119889119909

le 119860infin119862

(A8)

In other words the asymptotic order of int119887119886B(119909)119879119860B(119909)119889119909

and that of 119860infin are the same Let 119860119899 = 119866minus1

119899119863119879

119898119863119898119866minus1

119899

Then since 119860119899infin = 1198702 we obtain

int

119887

119886

119862 (119909 | 119899 119870 0) 119889119909 = 119874(1198702

119899) (A9)

Furthermore because sup119886isin[119886119887]

119887119886(119909) = 119874(1) we have

119870minus(119901+1)

100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

119887119886 (119909)B(119909)119879119866minus1

119899119863119879

119898119863119898blowast119889119909

100381610038161003816100381610038161003816100381610038161003816

= 119874 (119870minus(119901+119898)

)

(A10)

and hence it can be shown that

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874 (119899minus11198702) (A11)

Together with (A7) and (A11) 120582opt = 119874(119899119870119898minus119901minus2

+ 1198702119898)

can be obtained Then rate of convergence of the penalizedspline estimator with 120582opt is 119874(119899

minus2(119901+2)(2119901 + 3)) when 119870 =

119874(1198991(2119901+3)

) which is detailed in Yoshida and Naito [22]

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

Journal of Probability and Statistics 11

Acknowledgment

The author also would like to thank two anonymous refereesfor their careful reading and comments to improve the paper

References

[1] F OrsquoSullivan ldquoA statistical perspective on ill-posed inverseproblemsrdquo Statistical Science vol 1 no 4 pp 502ndash527 1986

[2] P H C Eilers and B D Marx ldquoFlexible smoothing with 119861-splines and penaltiesrdquo Statistical Science vol 11 no 2 pp 89ndash121 1996 With comments and a rejoinder by the authors

[3] B D Marx and P H C Eilers ldquoDirect generalized additivemodeling with penalized likelihoodrdquo Computational Statisticsand Data Analysis vol 28 no 2 pp 193ndash209 1998

[4] D Ruppert M P Wand and R J Carroll SemiparametricRegression Cambridge University Press Cambridge UK 2003

[5] T Krivobokova ldquoSmoothing parameter selection in two frame-works for penalized splinesrdquo Journal of the Royal StatisticalSociety B vol 75 no 4 pp 725ndash741 2013

[6] P T Reiss and R T Ogden ldquoSmoothing parameter selection fora class of semiparametric linear modelsrdquo Journal of the RoyalStatistical Society B vol 71 no 2 pp 505ndash523 2009

[7] S N Wood ldquoModelling and smoothing parameter estimationwith multiple quadratic penaltiesrdquo Journal of the Royal Statisti-cal Society B vol 62 no 2 pp 413ndash428 2000

[8] S NWood ldquoStable and efficient multiple smoothing parameterestimation for generalized additive modelsrdquo Journal of theAmerican Statistical Association vol 99 no 467 pp 673ndash6862004

[9] S N Wood ldquoFast stable direct fitting and smoothness selectionfor generalized additive modelsrdquo Journal of the Royal StatisticalSociety B vol 70 no 3 pp 495ndash518 2008

[10] X Lin and D Zhang ldquoInference in generalized additive mixedmodels by using smoothing splinesrdquo Journal of the RoyalStatistical Society B vol 61 no 2 pp 381ndash400 1999

[11] M P Wand ldquoSmoothing and mixed modelsrdquo ComputationalStatistics vol 18 no 2 pp 223ndash249 2003

[12] L Fahrmeir T Kneib and S Lang ldquoPenalized structuredadditive regression for space-time data a Bayesian perspectiverdquoStatistica Sinica vol 14 no 3 pp 731ndash761 2004

[13] L Fahrmeir and T Kneib ldquoPropriety of posteriors in structuredadditive regression models theory and empirical evidencerdquoJournal of Statistical Planning and Inference vol 139 no 3 pp843ndash859 2009

[14] F Heinzl L Fahrmeir and T Kneib ldquoAdditive mixed modelswith Dirichlet process mixture and P-spline priorsrdquo Advancesin Statistical Analysis vol 96 no 1 pp 47ndash68 2012

[15] G Kauermann ldquoA note on smoothing parameter selection forpenalized spline smoothingrdquo Journal of Statistical Planning andInference vol 127 no 1-2 pp 53ndash69 2005

[16] P Hall and J D Opsomer ldquoTheory for penalised splineregressionrdquo Biometrika vol 92 no 1 pp 105ndash118 2005

[17] Y Li and D Ruppert ldquoOn the asymptotics of penalized splinesrdquoBiometrika vol 95 no 2 pp 415ndash436 2008

[18] G Claeskens T Krivobokova and J D Opsomer ldquoAsymptoticproperties of penalized spline estimatorsrdquo Biometrika vol 96no 3 pp 529ndash544 2009

[19] G Kauermann T Krivobokova and L Fahrmeir ldquoSomeasymptotic results on generalized penalized spline smoothingrdquo

Journal of the Royal Statistical Society B vol 71 no 2 pp 487ndash503 2009

[20] X Wang J Shen and D Ruppert ldquoOn the asymptotics ofpenalized spline smoothingrdquo Electronic Journal of Statistics vol5 pp 1ndash17 2011

[21] T Yoshida and K Naito ldquoAsymptotics for penalized additive 119861-spline regressionrdquo Journal of the Japan Statistical Society vol 42no 1 pp 81ndash107 2012

[22] T Yoshida and K Naito ldquoAsymptotics for penalized splinesin generalized additive modelsrdquo Journal of NonparametricStatistics vol 26 no 2 pp 269ndash289 2014

[23] L Xiao Y Li and D Ruppert ldquoFast bivariate 119875-splines thesandwich smootherrdquo Journal of the Royal Statistical Society Bvol 75 no 3 pp 577ndash599 2013

[24] D Ruppert S J Sheather and M P Wand ldquoAn effectivebandwidth selector for local least squares regressionrdquo Journal ofthe American Statistical Association vol 90 no 432 pp 1257ndash1270 1995

[25] M P Wand and M C Jones Kernel Smoothing Chapman ampHall London UK 1995

[26] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[27] D Ruppert ldquoSelecting the number of knots for penalizedsplinesrdquo Journal of Computational and Graphical Statistics vol11 no 4 pp 735ndash757 2002

[28] S Zhou X Shen and D A Wolfe ldquoLocal asymptotics forregression splines and confidence regionsrdquo The Annals ofStatistics vol 26 no 5 pp 1760ndash1782 1998

[29] G K Robinson ldquoThat BLUP is a good thing the estimation ofrandom effectsrdquo Statistical Science vol 6 no 1 pp 15ndash32 1991

[30] T J Hastie and R J Tibshirani Generalized Additive ModelsChapman amp Hall London UKI 1990

[31] T Yoshida ldquoAsymptotics for penalized spline estimators inquantile regressionrdquo Communications in Statistics-Theory andMethods 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 2: Research Article Direct Determination of Smoothing

2 Journal of Probability and Statistics

to select the smoothing parameter has been studied byFahrmeir et al [12] Fahrmeir andKneib [13] andHeinzl et al[14] Kauermann [15] compared some smoothing parameterselection methods

In this paper we propose a new method to determiningthe smoothing parameter using the asymptotic propertiesof the penalized splines For the remainder of this paperour new method will be known as the direct methodBefore describing the outline of the direct method we willbriefly introduce the asymptotic studies of penalized splinesFirst Hall and Opsomer [16] showed the consistency ofthe penalized spline estimator in white noise representationSubsequently Li and Ruppert [17] Claeskens et al [18]Kauermann et al [19] andWang et al [20] have developed theasymptotics for the penalized spline estimator in univariateregression Yoshida and Naito [21] and Yoshida and Naito[22] have studied the asymptotics for penalized splines inadditive regression models and generalized additive modelsrespectively Xiao et al [23] suggested a new penalized splineestimator and developed its asymptotic properties in bivari-ate regression Thus the developments of the asymptotictheories of the penalized splines are relatively recent eventsIn addition the smoothing parameter selection methodsusing asymptotic properties have not yet been studied Thismotivates us to try to establish such methods

The direct method is conducted by minimizing the meanintegrated squared error (MISE) of the penalized spline esti-mator In general the MISE of the nonparametric estimatoris divided into the integrated squared bias and the integratedvariance of the estimator Of course the penalized splineestimator is no exception and hence the direct method isstated by utilizing the expression of the asymptotic bias andvariance of the penalized spline estimator which have beenderived by Claeskens et al [18] Kauermann et al [19] andYoshida and Naito [22] From their result we see that theasymptotic order of the variance of the penalized splineestimator is only dependent on the sample size and thenumber of knots but not the smoothing parameter Howeverthe second term of the asymptotic variance of the penalizedspline estimator contains the smoothing parameter and wecan see that the variance becomes small when the smoothingparameter increases On the other hand the squared biasof the penalized spline estimator increases if the smoothingparameter is reduced Therefore the minimizer of the MISEof the penalized spline estimator can be seen as one of theoptimal smoothing parameters Since the MISE is asymptot-ically convex with respect to the smoothing parameter theglobalminimumof theMISE can be foundThis detection hasbeen sufficiently developed for bandwidth selection in kernelregression (see Ruppert et al [24]Wand and Jones [25] etc)First the present paper focuses on univariate regression andwe next extend the direct method to the additive models Inboth models the mathematical and the numerical propertiesof the direct method are studied In additive models we needto select a smoothing parameter of the same number as theexplanatory variable such that the computational cost of thegrid search becomes large We expect that the computationalcost of the direct method is dramatically reduced comparedto the grid search

The structure of this paper is as follows In Section 2we introduce a penalized spline estimator in a univariateregression model Section 3 provides the direct method andrelated properties Section 4 extends the direct method to theadditive model In Section 5 we confirm the performanceof the direct method using a numerical study We provide adiscussion on the outlook and further studies in Section 6The proofs of our theorems are provided in the appendix

2 Penalized Spline Estimator

Consider the regression problem with 119899 observations

119884119894 = 119891 (119909119894) + 120576119894 119894 = 1 119899 (1)

where 119884119894 is the response variable 119909119894 is the explanatory vari-able 119891 is the true regression function and 120576119894 is the randomerror which is assumed to be independently distributed withmean 0 and variance 1205902Throughout the paper we assume theexplanatory variable 119909119894 isin [119886 119887] (119894 = 1 119899) is not a randomvariable from which the expectation of 119884119894 can be expressedas 119864[119884119894 | 119909119894] = 119891(119909119894) The support of the explanatory 119909119894can be relaxed as the real space R In order to simplify theway of locating the knots in the following the support of theexplanatory is assumed to be with compact space We aim toestimate 119891 via a nonparametric penalized spline method Weconsider the knots 119886 = 1205810 lt 1205811 lt sdot sdot sdot lt 120581119870 lt 120581119870+1 = 119887 andfor 119896 = minus119901 119870 minus 1 let 119861[119901]

119896(119909) be the 119901th degree 119861-spline

basis function defined as

119861[0]

119896(119909) =

1 120581119896 lt 119909 le 120581119896+1

0 otherwise

119861[119901]

119896(119909) =

119909 minus 120581119896

120581119896+119901 minus 120581119896

119861[119901minus1]

119896(119909) +

120581119896+119901+1 minus 119909

120581119896+119901+1 minus 120581119896+1

119861[119901minus1]

119896+1(119909)

(2)

associated with the above knots and the additional knots120581minus119901 = 120581minus119901+1 = sdot sdot sdot = 1205810 and 120581119870+1 = sdot sdot sdot = 120581119870+119901+1 The 119861-spline function 119861[119901]

119896(119909) is a piecewise 119901th degree polynomial

on an interval [120581119896 120581119896+119901+1] The details of the 119861-spline basisfunctions are described in de Boor [26] For simplicity wewrite 119861119896(119909) = 119861

[119901]

119896(119909) since we do not specify the 119901 in the

following sentenceWeuse the linear combination of 119861119896(119909) | 119896 = minus119901 119870minus

1 and the unknown parameters 119887119896 (119896 = minus119901 119870 minus 1)

to approximate the regression function and consider the 119861-spline regression problem

119884119894 = 119904 (119909119894) + 120576119894 119894 = 1 119899 (3)

where

119904 (119909) =

119870minus1

sum

119896=minus119901

119861119896 (119909) 119887119896 (4)

The purpose is to estimate the parameter b =

(119887minus119901 sdot sdot sdot 119887119870minus1)119879 included in 119904(119909) instead of 119891 directly

Journal of Probability and Statistics 3

The penalized spline estimator b = (minus119901 sdot sdot sdot 119870minus1)119879

of b isdefined as the minimizer of

119899

sum

119894=1

119884119894 minus

119870minus1

sum

119896=minus119901

119861119896(119909)119887119896

2

+ 120582

119870minus1

sum

119896=119898minus119901

Δ119898(119887119896)2 (5)

where 120582 is the smoothing parameter and Δ is the backwarddifference operator defined as Δ119887119896 = 119887119896 minus 119887119896minus1 and

Δ119898(119887119896) = Δ

119898minus1(Δ119887119896) =

119898

sum

119895=0

(minus1)119898minus119895

119898119862119895119887119896minus119898+119895 (6)

Let 119863119898 = (119889(119898)

119894119895)119894119895 be a (119870 + 119901 minus 119898) times (119870 + 119901) matrix where

119889(119898)

119894119895= (minus1)

|119894minus119895|

119898119862|119894minus119895| for 119894 le 119895 le 119898+1 and 0 otherwise Usingthe notation Y = (1198841 sdot sdot sdot 119884119899)

119879 and 119885 = (119861minus119901+119895minus1(119909119894))119894119895 (5)can then be expressed as

(Y minus 119885b)119879 (Y minus 119885b) + 120582b119879119863119879119898119863119898b (7)

The minimum of (7) is obtained when

b = (119885119879119885 + 120582119863119879119898119863119898)minus1

119885119879Y (8)

The penalized spline estimator 119891(119909) of 119891(119909) for 119909 isin [119886 119887] isdefined as

119891 (119909) =

119870minus1

sum

119896=minus119901

119861119896 (119909) 119896 = B(119909)119879b (9)

where B(119909) = (119861minus119901(119909) sdot sdot sdot 119861119870minus1(119909))119879

If120582 equiv 0119891(119909) is reduced to the regression spline estimatorwhich is the spline estimator obtained via the least squaresmethod The regression spline estimator will lead to anoscillatory fit if the number of knots 119870 is large However thedetermination of119870 and the location of knots are very difficultproblems The advantage of the penalized spline smoothingis that the good smoothing parameter brings the estimatorto the curve with the fitness and smoothness simultaneouslywithout choosing the number and location of knots preciselyIn the present paper we use equidistant knots 120581119896 = 119886 + 119896119870and focus on the determination of the smoothing parameterAs the location of knots other than above the quantiles ofthe data points 1199091 119909119899 are often used (see Ruppert [27])However it is known that the penalized spline estimator ishardly affected by the location of knots if 119870 is not too smallTherefore we do not discuss the location of knots We suggestthe direct method for this in the next section

3 Direct Determination ofSmoothing Parameter

In this section we provide the direct method for determiningthe smoothing parameter without a grid search This directmethod is given theoretical justification by asymptotic theoryof the penalized spline estimator To investigate the asymp-totic property of the penalized spline estimator we assumethat 119891 isin 119862119901+1 119870 = 119900(11989912) and 120582 = 119874(1198991198701minus119898)

For convenience we first give some notation Let 119866119899 =119899minus1119885119879119885 and Λ 119899 = 119866119899 + (120582119899)119863

119879

119898119863119898 Let blowast be a best 119871infin

approximation to the true function 119891 This means that blowastsatisfies

sup119909isin(119886119887)

10038161003816100381610038161003816119891 (119909) + 119870

minus(119901+1)119887119886 (119909) minus B(119909)

119879blowast10038161003816100381610038161003816

= 119900 (119870minus(119901+1)

) as 119870 997888rarr infin(10)

where

119887119886 (119909) = minus119891(119901+1)

(119909)

(119901 + 1)

119870

sum

119896=1

119868 (120581119896minus1 le 119909 lt 120581119896)Br119901+1 (119909 minus 120581119896minus1

119870minus1)

(11)

119868(119886 lt 119909 lt 119887) is the indicator function of an interval (119886 119887)and Br119901(119909) is the 119901th Bernoulli polynomial (see Zhou et al[28]) It can be easily shown that 119887119886(119909) = 119874(1) as 119870 rarr infin

The penalized spline estimator can be written as

119891 (119909) = B(119909)119879(119885119879119885)minus1

119885119879Y

minus120582

119899B(119909)119879Λminus1

119899119863119879

119898119863119898119866minus1

119899(119885

119899)

119879

Y(12)

The first term of the right hand side of (12) is equal to theregression spline estimator denoted by 119891rs(119909) The asymp-totics for the regression spline estimator has been developedby Zhou et al [28] and can be expressed as

119864 [119891rs (119909)] = 119891 (119909) + 119870minus(119901+1)

119887119886 (119909) + 119900 (119870minus(119901+1)

)

119881 [119891rs (119909)] =1205902

119899B(119909)119879119866minus1

119899B (119909) 1 + 119900 (1) = 119874(119870

119899)

(13)

FromTheorem 2(a) of Claeskens et al [18] we have

119864 [119891 (119909)] minus 119864 [119891rs (119909)] = minus120582

119899B(119909)119879Λminus1

119899119863119879

119898119863119898blowast

+ 119900(1205821198701minus119898

119899) = 119874(

1205821198701minus119898

119899)

119881 [119891 (119909)] = 119881 [119891rs (119909)] minus120582119870

119899

119862 (119909 | 119899 119870 120582)

119870+ 119900(

1205821198702

1198992)

(14)

where

119862 (119909 | 119899 119870 120582) =21205902

119899B(119909)119879Λminus1

119899119863119879

119898119863119898119866minus1

119899B (119909) (15)

is the covariance of 119891rs(119909) and the second term of the righthand side of (12)The variance of the second term of (12) canbe shown to be negligible (see the appendix) The followingtheorem leads to 120582 controlling the trade-off between thesquared bias and variance of the penalized spline estimator

Theorem1 Thecovariance119862(119909 | 119899 119870 120582)119870 in (15) is positiveFurthermore as 119899 rarr infin 119862(119909 | 119899 119870 120582) = 119862(119909 | 119899 119870 0)(1 +119900(1)) and 119862(119909 | 119899 119870 0)119870 = 119874(119870119899)

4 Journal of Probability and Statistics

From the asymptotic form of 119864[119891(119909)] and 119881[119891(119911)] andTheorem 1 we see that for small 120582 the bias of 119891(119909) issmall and the variance becomes large On the other handthe large 120582 indicates the bias of 119891(119909) increases and thevariance decreases From Theorem 1 the MISE of 119891(119909) canbe expressed as

int

119887

119886

119864 [119891 (119909) minus 119891 (119909)2

] 119889119909

= int

119887

119886

119870minus(119901+1)

119887119886 (119909) minus120582

119899B(119909)119879Λminus1

119899119863119879

119898119863119898blowast

2

119889119909

+1205902

119899int

119887

119886

B(119909)119879119866119899B (119909) 119889119909 minus120582119870

119899

int119887

119886119862 (119909 | 119899 119870 0) 119889119909

119870

+ 1199031119899 (119870) + 1199032119899 (119870 120582)

(16)

where 1199031119899(119870) and 1199032119899(119870 120582) are of negligible order respec-tively compared to the regression spline and penalized splineof the second term of the right hand side of (12) Actuallywe have 1199031119899(119870) = 119900(119870119899) and 1199032119899(119870 120582) = 119900(120582

21198702(1minus119898)

1198992)

The MISE of 119891(119909) is asymptotically quadratic and a globalminimum exists Let

MISE (120582) = 1205822

1198992int

119887

119886

B(119909)119879Λminus1119899119863119879

119898119863119898blowast2

119889119909

minus120582

119899int

119887

119886

2119887119886 (119909)B(119909)119879Λminus1119899 119863

119879

119898119863119898blowast

119870119901+1

+119862 (119909 | 119899 119870 0) 119889119909

(17)

And let 120582opt be theminimizer ofMISE(120582)We suggest the useof 120582opt as the optimal smoothing parameter

120582opt =119899

2

int119887

1198861198632119899 (119909 | 119891

(119901+1) blowast) 119889119909

int119887

1198861198631119899 (119909 | blowast) 119889119909

(18)

where

1198631119899 (119909 | blowast) = B(119909)119879119866minus1

119899119863119879

119898119863119898blowast2

1198632119899 (119909 | 119891(119901+1)

blowast) = 2119887119886 (119909)B(119909)119879119866minus1119899 119863

119879

119898119863119898blowast

119870119901+1

+ 119862 (119909 | 119899 119870 0)

(19)

However it is easy to see that MISE(120582) and 120582opt contain anunknown function and parameters and hence these must beestimated We construct the estimator of 120582opt by using theconsistent estimator of 119891(119901+1) and blowast We can use the penal-ized spline estimator and its derivative as the pilot estimatorof119891(119901+1) and blowast If we then use another smoothing parameter1205820 it should be chosen appropriately Therefore we use theregression spline estimator as the pilot estimator of 119891(119901+1)(119909)

and blowast First we establish b = (119885119879119885)minus1119885119879119884 Next we con-struct the pilot estimator with119891(119901+1)(119909) by using the (119901+2)thdegree 119861-spline basis Let B[119901](119909) = (119861[119901]minus119901 (119909) sdot sdot sdot 119861

[119901]

119870minus1(119909))119879

Using the fundamental property of the 119861-spline function119891(119898)(119909) can be written as 119891(119898)(119909) = 119861

[119901minus119898](119909)119879119863119898blowast

asymptotically Hence the regression spline estimator 119891(119901+1)

can be constructed as 119891(119901+1)(119909) = B[119901+1](119909)119879119863119901+1b(119901+2)where b(119901+2) = ((119885[119901+2])119879119885[119901+2])minus1(119885[119901+2])119879Y and 119885[119901+2] =(119861[119901+2]

minus119901+119895minus1(119909119894))119894119895 Since the regression spline estimator tends to

be oscillatory with a higher order 119901th spline function thefewer knots are used to construct 119891(119901+1)(119909) The variance 1205902included in 119862(119909 | 119899 119870 0) is estimated via

2=

1

119899 minus (119870 + 119901)

119899

sum

119894=1

119884119894 minus B(119909119894)119879b2

(20)

Using the above pilot estimators

opt =119899

2

sum119869

119895=11198632 (119911119895 | 119891

(119901+1) b)

sum119869

119895=11198631 (119911119895 | b)

(21)

with some finite grid points 119911119895119869

1on [119886 119887]

Consequently the final penalized spline estimator isdefined as

119891 (119909) = B(119909)119879bopt (22)

where

bopt = (119885119879119885 + opt119863

119879

119898119863119898)minus1

119885119879Y (23)

It is known that the optimal order of 119870 of the penalizedspline estimator is the same as that of the regression splineestimator 119870 = 119874(119899

1(2119901+3)) (see Kauermann et al [19] and

Zhou et al [28]) Using this we show the asymptotic propertyof 120582opt in following theorem

Theorem 2 Let 119891 isin 119862119901+1 Suppose that 119870 = 119900(11989912) and

120582 = 119900(1198991198701minus119898) Then 120582119900119901119905 given in (21) exists and 120582119900119901119905 =

119874(119899119870119898minus119901minus2

+1198702119898) as 119899 rarr infin Furthermore119870 = 119874(1198991(2119901+3))

leads to the optimal order

120582opt = 119874 (119899(119901+119898+1)(2119901+3)

) + 119874 (1198992119898(2119901+3)

) (24)

and the rate of convergence of MISE of 119891(119909) becomes119874(119899minus2(119901+1)(2119901+3)

)

The proof of Theorem 2 is given in the appendix At theend of this section we give a few remarks

Remark 3 The asymptotic order of the squared biasand variance of the penalized splines are 119874(119870minus2(119901+1)) +119874(12058221198702(1minus119898)

1198992) and 119874(119870119899) respectively Therefore under

119870 = (1198991(2119901+3)

) and 120582 = 119874(119899(119901+119898+1)(2119901+3)

) the optimalrate of convergence of MISE of the penalized splines is119874(119899minus2(119901+1)(2119901+3)

) From Theorem 2 we see that the asymp-totic order of 120582opt yields the optimal rate of convergence

Journal of Probability and Statistics 5

Remark 4 OrsquoSullivan [1] used 120583int119887119886119904(119898)(119909)2119889119909 as the penalty

term where 120583 is the smoothing parameter When equidistantknots are used the penaltyint119887

119886119904(119898)(119909)2119889119909 can be expressed as

1198702119898b119879119863119879

119898119877119863119898b where 119877 = (int

119887

119886119861[119901minus119898]

119894(119909)119861[119901minus119898]

119895(119909))119894119895 The

penalty b119879119863119879119898119863119898b provided by Eilers and Marx [2] can be

seen as the simple version of1198702119898b119879119863119879119898119877119863119898b by replacing 119877

with 119870minus1119868 and 120582 = 1205831198702119898minus1 where 119868 is the identity matrixSo the order 119898 of the difference matrix 119863119898 controls thesmoothness of the 119901th 119861-spline function 119904(119909) and hence 119898should be set such that119898 le 119901 to give a theoretical justificationalthough the penalized spline estimator can be also calculatedfor 119898 gt 119901 Actually (119901119898) = (3 2) is often used by manyauthors

Remark 5 The penalized spline regression is often consid-ered as the mixed model representation (see Ruppert et al[4]) In this frame work we use the 119901th truncated splinemodel

119904 (119909) = 1205730 + 1205731119909 + sdot sdot sdot + 120573119901119909119901+

119870

sum

119895=1

119906119896(119909 minus 120581119896)119901

+ (25)

where (119909 minus 119886)+ = max119909 minus 119886 0 and the 120573rsquos are unknownparameters Each 119906119896 is independently distributed as 119906119896 sim119873(0 120590

2

119906) where 1205902

119906is an unknown variance parameter of 119906119896

The penalized spline fit of 119891(119909) is defined as the estimatedBLUP (see Robinson [29]) The smoothing parameter in theordinal spline regression model corresponds to 12059021205902

119906in the

spline mixed model Since the 1205902 and 1205902119906are estimated using

the ML or REML method we do not need to choose thesmoothing parameter via a grid search It is known that theestimated BLUP fit is linked to the penalized spline estimator(9) with 119898 = 119901 + 1 (see Kauermann et al [19]) Hence theestimated BLUP tends to have a theoretically underfit (seeRemark 4)

Remark 6 From Lyapunovrsquos theorem the asymptotic nor-mality of the penalized spline estimator119891(119909)with opt can bederived under the same assumption as Theorem 2 and someadditional mild conditions Although the proof is omittedit is straightforward since opt is the consistent estimator of120582opt and the asymptotic order of 120582opt satisfies Lyapunovrsquoscondition

4 Extension to Additive Models

We extend the direct method to the regression model withmultidimensional explanatory variables In particular weconsider additive models in this section For the dataset(119884119894 x119894) 119894 = 1 119899 with 1-dimensional response 119884119894 and119889-dimensional explanatory x119894 = (1199091198941 119909119894119889) the additivemodels are connected via unknown regression functions119891119895 (119895 = 1 119889) and the mean parameter 120583 such that

119884119894 = 120583 + 1198911 (1199091198941) + sdot sdot sdot + 119891119889 (119909119894119889) + 120576119894 (26)

We assume that 119909119894119895 is located on an interval [119886119895 119887119895] and1198911 119891119889 are normalized as

int

119887119895

119886119895

119891119895 (119906) 119889119906 = 0 119895 = 1 119889 (27)

to ensure the identifiability of 119891119895 Then the intercept 120583 istypically estimated via

120583 =1

119899

119899

sum

119894=1

119884119894 (28)

Hence we replace 119884119894 with 119884119894 minus 120583 in (26) and set 120583 = 0redefining the additive model as

119884119894 = 1198911 (1199091198941) + sdot sdot sdot + 119891119889 (119909119894119889) + 120576119894 119894 = 1 119899 (29)

where each 119884119894 is centered We aim to estimate 119891119895 via apenalized spline method Let

119904119895 (119909) =

119870119895minus1

sum

119896=minus119901

119861119895119896 (119909) 119887119895119896 119895 = 1 119889 (30)

be the 119861-spline model where 119861119895119896(119909) is the 119901th 119861-splinebasis function with knots 120581119895119896 | 119896 = minus119901 119870119895 + 119901 +

1 1205811198950 = 119886119895 120581119895119870119895+1= 119887119895 119895 = 1 119889 and 119887119895119896rsquos are unknown

parameters We consider the 119861-spline additive models

119884119894 = 1199041 (1199091198941) + sdot sdot sdot + 119904119889 (119909119894119889) + 120576119894 119894 = 1 119899 (31)

and estimate 119904119895 and 119887119895119896 For 119895 = 1 119889 the penalized spline

estimator b119895 = (119895minus119901 sdot sdot sdot 119895119870minus1)119879

of b119895 = (119887119895minus119901 sdot sdot sdot 119887119895119870minus1)119879

is defined as

[[

[

b1b119889

]]

]

= argminb1b119889

119899

sum

119894=1

119884119894 minus

119889

sum

119895=1

119870119895

sum

119896=minus119901

119861119895119896 (119909119894) 119887119895119896

2

+

119889

sum

119895=1

120582119895b119879

119895119863119879

119895119898119863119895119898b119895

(32)

where 120582119895 are the smoothing parameters and 119863119895119898 is the 119898thorder difference matrix with size (119870119895 + 119901 minus 119898) times (119870119895 + 119901) for119895 = 1 119889 By using b119895 the penalized spline estimator of119891119895(119909119895) is defined as

119891119895 (119909119895) =

119870119895minus1

sum

119896=minus119901

119861119895119896 (119909119895) 119895119896 119895 = 1 119889 (33)

6 Journal of Probability and Statistics

The asymptotics for 119891119895(119909119895) have been studied by Yoshidaand Naito [22] who derived the asymptotic bias and varianceto be

119864 [119891119895 (119909119895)] = 119891119895 (119909119895) + 119870minus(119901+1)

119895119887119895119886 (119909119895) (1 + 119900 (1))

+

120582119895

119899B119895(119909119895)

119879

Λminus1

119895119899119863119879

119895119898119863119895119898blowast

119895+ 119900(

1205821198951198701minus119898

119895

119899)

119881 [119891119895 (119909119895)] =1205902

119899B119895(119909119895)

119879

119866minus1

119895119899B119895 (119909119895)

minus120582119870

119899

1205902119862119895 (119909119895 | 119899 119870119895)

119870+ 119900(

1205821198951198702

119895

1198992)

(34)

where B119895(119909) = (119861minus119901(119909) sdot sdot sdot 119861119870119895minus1(119909))119879

119866119895119899 = 119899minus1119885119879

119895119885119895

Λ 119895119899 = 119866119895119899 + (120582119895119899)119863119879

119898119863119898 119885119895 = (119861minus119901+119896minus1(119909119894119895))119894119896 and blowast

119895is

the best 119871infin approximation of 119891119895

119887119895119886 (119909) = minus

119891(119901+1)

119895(119909)

(119901 + 1)

119870119895minus1

sum

119896=0

119868 (120581119895119896 le 119909 lt 120581119895119896+1)

times 119861119901+1(

119909 minus 120581119895119896

119870minus1119895

)

119862119895 (119909119895 | 119899 119870119895) =2

119899B119895(119909119895)

119879

119866minus1

119895119899119863119879

119895119898119863119895119898119866

minus1

119899B119895 (119909119895)

(35)

The above asymptotic bias and variance of 119891119895(119909119895) are sim-ilar to that of the penalized spline estimator in univariateregression with (119884119894 119909119894119895) 119894 = 1 119899 Furthermore

the asymptotic normality of [1198911(1199091) sdot sdot sdot 119891119889(119909119889)]119879

has beenshown by Yoshida and Naito [22] From their paper we findthat119891119894(119909119894) and119891119895(119909119895) are then asymptotically independent for119894 = 119895 This indicates some theoretical justification to select 120582119895in minimizing the MISE of 119891119895 Similar to the discussion inSection 3 the minimizer 120582119895opt of the MISE of 119891119895(119909119895) can beobtained for 119895 = 1 119889

120582119895opt =119899

2

int119887119895

119886119895119871119895119899 (119909 | 119891

(119901+1)

119895 blowast119895 1205902) 119889119909

int119887119895

119886119895119867119895119899 (119909 | blowast119895 ) 119889119909

(36)

where

119867119895119899 (119909 | blowast

119895) = B119895(119909119895)

119879

119866minus1

119895119899119863119879

119895119898119863119895119898blowast

119895

2

119871119895119899 (119909 | 119891(119901+1)

119895 blowast119895 1205902) = 2

119887119886 (119909)B119895(119909119895)119879

119866minus1

119895119899119863119879

119895119898119863119895119898blowast119895

119870119901+1

119895

+ 1205902119862119895 (119909 | 119899 119870119895)

(37)

Since 119891119895 blowast119895 and 1205902 are unknown they should be estimated

Thepilot estimators of119891119895 blowast119895 and1205902 are constructed based on

the regression splinemethod By using the pilot estimators119891119895b119895 of119891119895blowast119895 and the estimator of1205902 we construct the estimatorof 120582119895opt

119895opt =119899

2

sum119877

119903=1119871119895119899 (119911119903 | 119891

(119901+1)

119895 b119895 2)

sum119877

119903=1119867119895119899 (119911119903 | b119895)

(38)

where 119911119903119877

1is some finite grid point sequence on [119886119895 119887119895] We

obtain for 119895 = 1 119889 the penalized spline estimator 119891119895(119909119895)of 119891119895(119909119895)

119891119895 (119909119895) = B119895(119909119895)119879

b119895opt (39)

where b119895opt is the penalized spline estimator of b119895 using119895opt From Theorem 2 and the proof of Theorem 34of Yoshida and Naito [22] the asymptotic normality of[1198911(1199091) sdot sdot sdot 119891119889(119909119889)]

119879

using (1205821opt 120582119889opt) can be shown

Remark 7 Since the true regression functions are normalizedthe estimator 119891119895 should be also centered as

119891119895 (119909119895) minus1

119899

119899

sum

119894=1

119891119895 (119909119894119895) (40)

Remark 8 Thepenalized spline estimator of b1 b119889 can beobtained using a backfitting algorithm (Hastie and Tibshirani[30]) The backfitting algorithm for the penalized splines inadditive regression is detailed in Marx and Eilers [3] andYoshida and Naito [21]

Remark 9 Although we focus on the nonparametric additiveregression in this paper the directmethod can be also appliedto the generalized additive models However we omit thisdiscussion because the procedure is similar to the case of theadditive models discussed in this section

Remark 10 The direct method is quite computationally effi-cient when compared to the grid search method in additivemodels In grid searches we prepare the candidate of 120582119895 Let119872119895 be the set of all possible candidate grid value of 120582119895 for 119895 =1 119889 Then we need to compute the backfitting algorithm1198721 times sdot sdot sdot times 119872119889 times On the other hand it is sufficientto perform the backfitting algorithm for only two steps forthe pilot estimator and the final penalized spline estimatorThus compared with the conventional grid search methodthe direct method can drastically reduce computation time

5 Numerical Study

In this section we investigate the finite sample performanceof the proposed direct method in a Monte Carlo simulationLet us first consider the univariate regression model (1) forthe data (119884119894 119909119894) 119894 = 1 119899 Then we use the three

Journal of Probability and Statistics 7

Table 1 Results of sample MISE for 119899 = 50 and 119899 = 200 All entries for MISE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0526 0322 3684 1070 1160 0377L-GCV 0726 0621 5811 1082 1183 0379L-REML 2270 1544 9966 3520 1283 0873Local linear 0751 0220 4274 0901 1689 0503C-Direct 0401 0148 3027 0656 1044 0326C-GCV 0514 0137 3526 0666 1043 0290C-REML 1326 0732 8246 4213 3241 0835

Table 2 Results of sample MSE of smoothing parameter obtained by direct method GCV and REML for 119899 = 50 and 119899 = 200 All entries forMSE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0331 0193 0113 0043 0025 0037L-GCV 0842 0342 0445 0101 0072 0043L-REML 1070 1231 0842 0437 0143 0268C-Direct 0262 0014 0082 0045 0053 0014C-GCV 0452 0123 0252 0092 0085 0135C-REML 0855 0224 0426 0152 0093 0463

types of true regression function 119891(119909) = cos(120587(119909 minus 03))119891(119909) = 120601((119909 minus 08)005)radic005 minus 120601((119909 minus 02)004)radic004and 119891(119909) = 21199093 +3 sin(2120587(119909minus08)3) + 3 exp[minus(119909minus05)201]which are labeled by F1 F2 and F3 respectively Here120601(119909) is the density function of the normal distribution Theexplanatory 119909119894 and error 120576119894 are independently generated fromuniform distribution on [0 1] and 119873(0 052) respectivelyWe estimate each true regression function via the penalizedspline method We then use the linear and cubic 119861-splinebases with equidistant knots and the second order differencepenalty In addition we set 119870 = 5119899

25 equidistant knotsand the smoothing parameter is determined by the directmethodThe penalized spline estimator with the linear splineand cubic spline are denoted as L-Direct and C-Directrespectively For comparison with L-Direct the same studiesare also implemented for the penalized spline estimator witha linear spline and the smoothing parameter selected viaGCV and restricted maximum likelihood method (REML)in mixed model representation and the local polynomialestimator with normal kernel and Plug-in bandwidth (seeRuppert et al [24]) In GCV we set the candidate values of120582 as 11989410 119894 = 0 1 99 The above three estimators aredenoted by L-GCV L-REML and local linear respectivelyFurthermore we compare C-Direct with C-GCV and C-REML which are the penalized spline estimator with thecubic spline and the smoothing parameter determined byGCV and REML Let

sMISE = 1119869

119869

sum

119895=1

[1

119877

119877

sum

119903=1

119891119903 (119911119895) minus 119891 (119911119895)2

] (41)

be the sample MISE of any estimator 119891(119909) of 119891(119909) where119891119903(119911) is 119891(119911) with 119903th replication and 119911119895 = 119895119869 We calculatethe sample MISE of the penalized spline estimator withthe direct method GCV and REML and the local linearestimator In this simulation we use 119869 = 100 and 119877 = 1000We have simulated 119899 = 50 and 200

The sMISE of all estimators for each model and 119899 aregiven in Table 1 The penalized spline estimator using thedirect method shows good performance in each setting Incomparison with other smoothing parameter methods thedirect method is a little better than the GCV as a wholeHowever for 119899 = 200 C-GCV is better than C-Direct inF1 and F3 though its difference is very small We see thatthe sMISE of C-Direct is smaller than local linear whereaslocal linear behaves better than L-Direct in some case In F2C-Direct is the smallest of all estimators for 119899 = 50 and200 Although the performance totally seems to depend onsituation in which data sets are generated we believe that theproposed method is one of the efficient methods

Next the difference between opt and 120582opt is investigatedempirically Let opt119903 be the opt with 119903th replications for 119903 =1 1000 Then we calculate the sample MSE of opt

sMSE = 1119877

119877

sum

119903=1

opt119903 minus 120582opt2

(42)

for F1 F2 and F3 and 119899 = 50 and 200 To construct 120582opt forF1 F2 and F3 we use true119891(4)(119909) and 1205902 and an approximateblowast Here the approximate blowast means the sample average of bwith 119899 = 200 and 1000 replications

In Table 2 the sMSE of opt for each true functionare described For comparison the sMSE of the smoothing

8 Journal of Probability and Statistics

Table 3 Results of sample PSE for 119899 = 50 and 119899 = 200 All entries for PSE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0424 0052 0341 0070 0360 0117L-GCV 0331 0084 0381 0089 0333 0139L-REML 0642 0124 0831 1002 0733 0173Local linear 0751 0220 0474 0901 0489 0143C-Direct 0341 048 0237 063 0424 0086C-GCV 0314 043 0326 076 0533 0089C-REML 0863 1672 0456 121 1043 0125

parameter obtained via GCV and REML are calculated ThesMSE of the L-Direct and the C-Direct is small even in 119899 = 50for all true functions Therefore it seems that the accuracyof opt is guaranteed It indicates that the pilot estimatorconstructed via least squares method is not bad The sMSEwith the direct method are smaller than that with GCV andREML This result is not surprising since GCV and REMLare not concerned with 120582opt However together with Table 1it seems that the sample MSE of the smoothing parameter isreflected in the sample MISE of the estimator

The proposed direct method was derived based on theMISE On the other hand the GCV and the REML areobtained in context of minimizing prediction squared error(PSE) and prediction error Hence we compare the samplePSE of the penalized spline estimator with the direct methodGCV and REML and the local linear estimator Since theprediction error is almost the same as MISE (see Section 4 ofRuppert et al [4]) we omit the comparison of the predictionerror Let

sPSE = 1119899

119899

sum

119894=1

[1

119877

119877

sum

119903=1

119910119894119903 minus 119891119903 (119909119894)2

] (43)

be the sample PSE for any estimator 119891(119909) where119910119894119903 (119903 = 1 119877) is independently generated from119884119894 sim 119873(119891(119909119894) (05)

2) for all 119894

In Table 3 the modified sPSE |sPSE minus (05)2| of allestimators for each model and 119899 are described In Remark 11we discuss the modified sPSE From the result we canconfirm that the direct method has good predictability GCVcan be regarded as the estimator of sPSE Therefore in somecase sPSE with GCV is smaller than that with the directmethod It seems that the cause is the accuracy of theestimates of variance (see Remark 11) However its differenceis very small

We admit that the computational time (in second) takento obtain 119891(119909) for F1 119901 = 3 and 119899 = 200 The fits withthe direct method GCV and REML took 004 122 and 034Although the difference is small the computational time ofthe direct method was faster than that of GCV and REML

Next we confirm the behavior of the penalized splineestimator with the direct method in the additive model Forthe data (119884119894 1199091198941 1199091198942 1199091198943) 119894 = 1 119899 we assume theadditive model with true functions 1198911(1199091) = sin(21205871199091)

1198912(1199092) = 120601((1199092 minus 05)02) and 1198913(1199093) = 04120601((1199093 minus

01)02) + 06120601((1199092 minus 08)02) The error is similar to thefirst simulationThe design points (1199091198941 1199091198942 1199091198943) are generatedby

[

[

1199091198941

1199091198942

1199091198943

]

]

=

[[[

[

(1 + 120588 + 1205882)minus1

0 0

0 (1 + 2120588)minus1

0

0 0 (1 + 120588 + 1205882)minus1

]]]

]

times [

[

1 120588 1205882

120588 1 120588

1205882120588 1

]

]

[

[

1199111198941

1199111198942

1199111198943

]

]

(44)

where 119911119894119895 (119894 = 1 119899 119895 = 1 2 3) are generated inde-pendently from the uniform distribution on [0 1] In thissimulation we adopt 120588 = 0 and 120588 = 05 We then correctedto satisfy

int

1

0

119891119895 (119911) 119889119911 = 0 119895 = 1 2 3 (45)

in each 120588 We construct the penalized spline estimator viathe backfitting algorithm with the cubic spline and secondorder difference penalty The number of equidistant knots is119870119895 = 5119899

25 and the smoothing parameters are determinedusing the direct method The pilot estimator to construct119895opt is the regression spline estimator with fifth spline and119870119895 = 119899

25 We calculate the sample MISE of each 119891119895 for119899 = 200 and 1000 Monte Carlo iterations Then we calculatethe sample MISE of 1198911 1198912 and 1198913 In order to compare withthe direct method we also conduct the same simulation withGCV and REML In GCV we set the candidate values of 120582119895 as11989410 119894 = 0 1 9 for 119895 = 1 2 3 Table 4 summarizes thesample MISE of 119891119895 (119895 = 1 2 3) denoted by MISE119895 for directmethod GCV and REML The penalized spline estimatorwith the direct method performs like that with the GCV inboth the uncorrelated and correlated design cases For 120588 = 0the behaviors of MISE1 MISE2 and MISE3 with the directmethod are similar On the other hand for GCV the MISE1is slightly larger than MISE2 and MISE3 The direct methodleads to an efficient estimate of all covariates On the wholethe direct method is better than REML From above webelieve that the direct method is preferable in practice

Journal of Probability and Statistics 9

Table 4 Results of sample MISE of the penalized spline estimator For each direct method GCV and 120588 MISE1 MISE2 and MISE3 are thesample MISE of 1198911 1198912 and 1198913 respectively All entries for MISE are 102 times their actual values

Method 120588 = 0 120588 = 05

MISE1 MISE2 MISE3 MISE1 MISE2 MISE3Direct 0281 0209 0205 0916 0795 0972GCV 0687 0237 0390 0929 0857 1112REML 1204 0825 0621 2104 1832 2263

Table 5 Results of sample MSE of the selected smoothing parameter via direct method GCV and REML for 120588 = 0 and 120588 = 05 All entriesfor MSE are 102 times their actual values

120588 = 0 120588 = 05

True 1198911 1198912 1198913 1198911 1198912 1198913

Direct 0124 0042 0082 0312 0105 0428GCV 1315 0485 0213 0547 0352 0741REML 2531 1237 2656 1043 0846 1588

Table 6 Results of sample PSE of the penalized spline estimator Allentries for PSE are 102 times their actual values

119899 = 200 120588 = 0 120588 = 05

Direct 0281 0429GCV 0387 0467REML 1204 0925

To confirm the consistency of the direct method thesample MSE of 119895opt is calculated the same manner as thatgiven in univariate regression case For comparison thesample MSE of GCV and REML are obtained Here the true120582119895opt (119895 = 1 2 3) is defined in the same way as that forunivariate case In Table 5 the results for each 1198911 1198912 and 1198913and 120588 = 0 and 05 are shown We see from the results that thebehavior of the directmethod is goodThe sampleMSE of thedirect method is smaller than that of GCV and REML for all119895 For the random design 120588 = 05 such tendency can be alsoseen

Table 6 shows the sPSE of 119891(x119894) = 1198911(1199091198941) + 1198912(1199091198942) +1198913(1199091198943)with each smoothing parameter selectionmethodWesee from result that the efficiency of the direct method isguaranteed in the context of prediction accuracy

Finally we show the computational time (in second)required to construct the penalized spline estimatorwith eachmethod The computational times with the direct methodGCV and REML are 1187 s 12643 s and 435 s respectivelyWe see that the direct method is more efficient than othermethods in context of the computation (see Remark 11) Allof computations in the simulation were done using a softwareR and a computer with 340GHz CPU and 240GB memoryThough this is only few examples the direct method can beseen as one of the good methods to select the smoothingparameter

Remark 11 We calculate |sPSE minus (05)2| as the criteria toconfirm the prediction squared error The ordinal PSE isdefined as

PSE = 1119899

119899

sum

119894=1

119864 [119884119894 minus 119891 (119909119894)2

]

= 1205902+1

119899

119899

sum

119894=1

119864 [119891 (119909119894) minus 119891 (119909119894)2

]

(46)

where 119884119894rsquos are the test data The second term of the PSEis similar to MISE So it can be said that the sample PSEevaluates the accuracy of the variance part andMISE part Tosee the detail of the difference of the sample PSE between thedirect method and other method we calculated |sPSE minus 1205902|in this section

6 Discussion

In this paper we proposed a new direct method for deter-mining the smoothing parameter for a penalized splineestimator in a regression problemThe directmethod is basedon minimizing MISE of the penalized spline estimator Westudied the asymptotic property of the direct method Theasymptotic normality of the penalized spline estimator usingopt is theoretically guaranteed when the consistent estimatoris used as the pilot estimator to obtain opt In numericalstudy for the additive model the computational cost of thisdirect method is dramatically reduced when compared togrid search methods such as GCV Furthermore we find thatthe performance of the direct method is better than or at leastsimilar to that of other methods

The direct method will be developed for other regressionmodels such as varying-coefficient models Cox propor-tional hazards models single-index models and others ifthe asymptotic bias and variance of the penalized spline

10 Journal of Probability and Statistics

estimator are derived It is not limited to themean regressionit can be applied to quantile regression Actually Yoshida[31] has presented the asymptotic bias and variance of thepenalized spline estimator in univariate quantile regressionsFurthermore it is seen that improving the direct method isimportant for various situations and datasets In particularthe development of the determination of locally adaptive 120582 isan interesting avenue of further research

Appendix

We describe the technical details For the matrix 119860 = (119886119894119895)119894119895119860infin = max119894119895|119886119894119895| First to prove Theorems 1 and 2 weintroduce the fundamental property of penalized splines inthe following lemma

Lemma A1 Let 119860 = (119886119894119895)119894119895 be (119870 + 119901) matrix Supposethat 119870 = 119900(11989912) 120582 = 119900(1198991198701minus119898) and 119860infin = 119874(1) Then119860119866minus1

119899infin= 119874(119870) and 119860Λminus1

119899infin= 119874(119870)

The proof of Lemma A1 is shown in Zhou et al [28] andClaeskens et al [18] The repeated use of Lemma A1 yieldsthat the asymptotic order of the variance of the second termof (12) is 119874(1205822(119870119899)3) Actually the asymptotic order of thevariance of the second term of (12) can be calculated as

(120582

119899)

21205902

119899B(119909)119879Λminus1

119899119863119879

119898119863119898119866minus1

119899119863119879

119898119863119898Λminus1

119899B (119909)

= 119874(1205822

11989931198703)

(A1)

When 119870 = 119900(11989912) and 120582 = 119874(119899119870

1minus119898) 1205822(119870119899)3 =

119900(120582(119870119899)2) holds

Proof of Theorem 1 We write

119862 (119909 | 119899 119870 120582) =21205902

119899B(119909)119879119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909)

+21205902

119899B(119909)119879 Λminus1

119899minus 119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909) (A2)

Since

Λminus1

119899minus 119866minus1

119899= Λminus1

119899119868 minus Λ 119899119866

minus1

119899

=120582

119899Λminus1

119899119863119879

119898119863119898119866minus1

119899

(A3)

we have Λminus1119899minus 119866minus1

119899infin

= 119874(1205821198702119899) and hence the

second term of right hand side of (A2) can be shown119874((119870119899)

2(120582119870119899)) From Lemma A1 it is easy to derive that

119862(119909 | 119899 119870 0) = 119874(119870(119870119899)) Consequently we have

119862 (119909 | 119899 119870 120582)

119870=119862 (119909 | 119899 119870 0)

119870+ 119874((

119870

119899)

2

(120582

119899))

=119862 (119909 | 119899 119870 0)

119870+ 119900 (

119870

119899)

(A4)

and this leads to Theorem 1

Proof of Theorem 2 It is sufficient to prove that int1198871198861198631119899(119909 |

blowast)119889119909 = 119874(1198702(1minus119898)) and

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874(1198702

119899)

(A5)

Since100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

119889119909

100381610038161003816100381610038161003816100381610038161003816

le sup119909isin[119886119887]

[B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

] (119886 minus 119887)

(A6)

we have

int

119887

119886

1198631119899 (119909 | blowast) 119889119909 = 119874 (119870

2(1minus119898)) (A7)

by using Lemma A1 and the fact that 119863119898blowastinfin = 119874(119870minus119898)

(see Yoshida [31])From the property of119861-spline function for (119870+119901) square

matrix 119860 there exists 119862 gt 0 such that

int

119887

119886

B(119909)119879119860B (119909) 119889119909 le 119860infin int119887

119886

B(119909)119879B (119909) 119889119909

le 119860infin119862

(A8)

In other words the asymptotic order of int119887119886B(119909)119879119860B(119909)119889119909

and that of 119860infin are the same Let 119860119899 = 119866minus1

119899119863119879

119898119863119898119866minus1

119899

Then since 119860119899infin = 1198702 we obtain

int

119887

119886

119862 (119909 | 119899 119870 0) 119889119909 = 119874(1198702

119899) (A9)

Furthermore because sup119886isin[119886119887]

119887119886(119909) = 119874(1) we have

119870minus(119901+1)

100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

119887119886 (119909)B(119909)119879119866minus1

119899119863119879

119898119863119898blowast119889119909

100381610038161003816100381610038161003816100381610038161003816

= 119874 (119870minus(119901+119898)

)

(A10)

and hence it can be shown that

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874 (119899minus11198702) (A11)

Together with (A7) and (A11) 120582opt = 119874(119899119870119898minus119901minus2

+ 1198702119898)

can be obtained Then rate of convergence of the penalizedspline estimator with 120582opt is 119874(119899

minus2(119901+2)(2119901 + 3)) when 119870 =

119874(1198991(2119901+3)

) which is detailed in Yoshida and Naito [22]

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

Journal of Probability and Statistics 11

Acknowledgment

The author also would like to thank two anonymous refereesfor their careful reading and comments to improve the paper

References

[1] F OrsquoSullivan ldquoA statistical perspective on ill-posed inverseproblemsrdquo Statistical Science vol 1 no 4 pp 502ndash527 1986

[2] P H C Eilers and B D Marx ldquoFlexible smoothing with 119861-splines and penaltiesrdquo Statistical Science vol 11 no 2 pp 89ndash121 1996 With comments and a rejoinder by the authors

[3] B D Marx and P H C Eilers ldquoDirect generalized additivemodeling with penalized likelihoodrdquo Computational Statisticsand Data Analysis vol 28 no 2 pp 193ndash209 1998

[4] D Ruppert M P Wand and R J Carroll SemiparametricRegression Cambridge University Press Cambridge UK 2003

[5] T Krivobokova ldquoSmoothing parameter selection in two frame-works for penalized splinesrdquo Journal of the Royal StatisticalSociety B vol 75 no 4 pp 725ndash741 2013

[6] P T Reiss and R T Ogden ldquoSmoothing parameter selection fora class of semiparametric linear modelsrdquo Journal of the RoyalStatistical Society B vol 71 no 2 pp 505ndash523 2009

[7] S N Wood ldquoModelling and smoothing parameter estimationwith multiple quadratic penaltiesrdquo Journal of the Royal Statisti-cal Society B vol 62 no 2 pp 413ndash428 2000

[8] S NWood ldquoStable and efficient multiple smoothing parameterestimation for generalized additive modelsrdquo Journal of theAmerican Statistical Association vol 99 no 467 pp 673ndash6862004

[9] S N Wood ldquoFast stable direct fitting and smoothness selectionfor generalized additive modelsrdquo Journal of the Royal StatisticalSociety B vol 70 no 3 pp 495ndash518 2008

[10] X Lin and D Zhang ldquoInference in generalized additive mixedmodels by using smoothing splinesrdquo Journal of the RoyalStatistical Society B vol 61 no 2 pp 381ndash400 1999

[11] M P Wand ldquoSmoothing and mixed modelsrdquo ComputationalStatistics vol 18 no 2 pp 223ndash249 2003

[12] L Fahrmeir T Kneib and S Lang ldquoPenalized structuredadditive regression for space-time data a Bayesian perspectiverdquoStatistica Sinica vol 14 no 3 pp 731ndash761 2004

[13] L Fahrmeir and T Kneib ldquoPropriety of posteriors in structuredadditive regression models theory and empirical evidencerdquoJournal of Statistical Planning and Inference vol 139 no 3 pp843ndash859 2009

[14] F Heinzl L Fahrmeir and T Kneib ldquoAdditive mixed modelswith Dirichlet process mixture and P-spline priorsrdquo Advancesin Statistical Analysis vol 96 no 1 pp 47ndash68 2012

[15] G Kauermann ldquoA note on smoothing parameter selection forpenalized spline smoothingrdquo Journal of Statistical Planning andInference vol 127 no 1-2 pp 53ndash69 2005

[16] P Hall and J D Opsomer ldquoTheory for penalised splineregressionrdquo Biometrika vol 92 no 1 pp 105ndash118 2005

[17] Y Li and D Ruppert ldquoOn the asymptotics of penalized splinesrdquoBiometrika vol 95 no 2 pp 415ndash436 2008

[18] G Claeskens T Krivobokova and J D Opsomer ldquoAsymptoticproperties of penalized spline estimatorsrdquo Biometrika vol 96no 3 pp 529ndash544 2009

[19] G Kauermann T Krivobokova and L Fahrmeir ldquoSomeasymptotic results on generalized penalized spline smoothingrdquo

Journal of the Royal Statistical Society B vol 71 no 2 pp 487ndash503 2009

[20] X Wang J Shen and D Ruppert ldquoOn the asymptotics ofpenalized spline smoothingrdquo Electronic Journal of Statistics vol5 pp 1ndash17 2011

[21] T Yoshida and K Naito ldquoAsymptotics for penalized additive 119861-spline regressionrdquo Journal of the Japan Statistical Society vol 42no 1 pp 81ndash107 2012

[22] T Yoshida and K Naito ldquoAsymptotics for penalized splinesin generalized additive modelsrdquo Journal of NonparametricStatistics vol 26 no 2 pp 269ndash289 2014

[23] L Xiao Y Li and D Ruppert ldquoFast bivariate 119875-splines thesandwich smootherrdquo Journal of the Royal Statistical Society Bvol 75 no 3 pp 577ndash599 2013

[24] D Ruppert S J Sheather and M P Wand ldquoAn effectivebandwidth selector for local least squares regressionrdquo Journal ofthe American Statistical Association vol 90 no 432 pp 1257ndash1270 1995

[25] M P Wand and M C Jones Kernel Smoothing Chapman ampHall London UK 1995

[26] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[27] D Ruppert ldquoSelecting the number of knots for penalizedsplinesrdquo Journal of Computational and Graphical Statistics vol11 no 4 pp 735ndash757 2002

[28] S Zhou X Shen and D A Wolfe ldquoLocal asymptotics forregression splines and confidence regionsrdquo The Annals ofStatistics vol 26 no 5 pp 1760ndash1782 1998

[29] G K Robinson ldquoThat BLUP is a good thing the estimation ofrandom effectsrdquo Statistical Science vol 6 no 1 pp 15ndash32 1991

[30] T J Hastie and R J Tibshirani Generalized Additive ModelsChapman amp Hall London UKI 1990

[31] T Yoshida ldquoAsymptotics for penalized spline estimators inquantile regressionrdquo Communications in Statistics-Theory andMethods 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: Research Article Direct Determination of Smoothing

Journal of Probability and Statistics 3

The penalized spline estimator b = (minus119901 sdot sdot sdot 119870minus1)119879

of b isdefined as the minimizer of

119899

sum

119894=1

119884119894 minus

119870minus1

sum

119896=minus119901

119861119896(119909)119887119896

2

+ 120582

119870minus1

sum

119896=119898minus119901

Δ119898(119887119896)2 (5)

where 120582 is the smoothing parameter and Δ is the backwarddifference operator defined as Δ119887119896 = 119887119896 minus 119887119896minus1 and

Δ119898(119887119896) = Δ

119898minus1(Δ119887119896) =

119898

sum

119895=0

(minus1)119898minus119895

119898119862119895119887119896minus119898+119895 (6)

Let 119863119898 = (119889(119898)

119894119895)119894119895 be a (119870 + 119901 minus 119898) times (119870 + 119901) matrix where

119889(119898)

119894119895= (minus1)

|119894minus119895|

119898119862|119894minus119895| for 119894 le 119895 le 119898+1 and 0 otherwise Usingthe notation Y = (1198841 sdot sdot sdot 119884119899)

119879 and 119885 = (119861minus119901+119895minus1(119909119894))119894119895 (5)can then be expressed as

(Y minus 119885b)119879 (Y minus 119885b) + 120582b119879119863119879119898119863119898b (7)

The minimum of (7) is obtained when

b = (119885119879119885 + 120582119863119879119898119863119898)minus1

119885119879Y (8)

The penalized spline estimator 119891(119909) of 119891(119909) for 119909 isin [119886 119887] isdefined as

119891 (119909) =

119870minus1

sum

119896=minus119901

119861119896 (119909) 119896 = B(119909)119879b (9)

where B(119909) = (119861minus119901(119909) sdot sdot sdot 119861119870minus1(119909))119879

If120582 equiv 0119891(119909) is reduced to the regression spline estimatorwhich is the spline estimator obtained via the least squaresmethod The regression spline estimator will lead to anoscillatory fit if the number of knots 119870 is large However thedetermination of119870 and the location of knots are very difficultproblems The advantage of the penalized spline smoothingis that the good smoothing parameter brings the estimatorto the curve with the fitness and smoothness simultaneouslywithout choosing the number and location of knots preciselyIn the present paper we use equidistant knots 120581119896 = 119886 + 119896119870and focus on the determination of the smoothing parameterAs the location of knots other than above the quantiles ofthe data points 1199091 119909119899 are often used (see Ruppert [27])However it is known that the penalized spline estimator ishardly affected by the location of knots if 119870 is not too smallTherefore we do not discuss the location of knots We suggestthe direct method for this in the next section

3 Direct Determination ofSmoothing Parameter

In this section we provide the direct method for determiningthe smoothing parameter without a grid search This directmethod is given theoretical justification by asymptotic theoryof the penalized spline estimator To investigate the asymp-totic property of the penalized spline estimator we assumethat 119891 isin 119862119901+1 119870 = 119900(11989912) and 120582 = 119874(1198991198701minus119898)

For convenience we first give some notation Let 119866119899 =119899minus1119885119879119885 and Λ 119899 = 119866119899 + (120582119899)119863

119879

119898119863119898 Let blowast be a best 119871infin

approximation to the true function 119891 This means that blowastsatisfies

sup119909isin(119886119887)

10038161003816100381610038161003816119891 (119909) + 119870

minus(119901+1)119887119886 (119909) minus B(119909)

119879blowast10038161003816100381610038161003816

= 119900 (119870minus(119901+1)

) as 119870 997888rarr infin(10)

where

119887119886 (119909) = minus119891(119901+1)

(119909)

(119901 + 1)

119870

sum

119896=1

119868 (120581119896minus1 le 119909 lt 120581119896)Br119901+1 (119909 minus 120581119896minus1

119870minus1)

(11)

119868(119886 lt 119909 lt 119887) is the indicator function of an interval (119886 119887)and Br119901(119909) is the 119901th Bernoulli polynomial (see Zhou et al[28]) It can be easily shown that 119887119886(119909) = 119874(1) as 119870 rarr infin

The penalized spline estimator can be written as

119891 (119909) = B(119909)119879(119885119879119885)minus1

119885119879Y

minus120582

119899B(119909)119879Λminus1

119899119863119879

119898119863119898119866minus1

119899(119885

119899)

119879

Y(12)

The first term of the right hand side of (12) is equal to theregression spline estimator denoted by 119891rs(119909) The asymp-totics for the regression spline estimator has been developedby Zhou et al [28] and can be expressed as

119864 [119891rs (119909)] = 119891 (119909) + 119870minus(119901+1)

119887119886 (119909) + 119900 (119870minus(119901+1)

)

119881 [119891rs (119909)] =1205902

119899B(119909)119879119866minus1

119899B (119909) 1 + 119900 (1) = 119874(119870

119899)

(13)

FromTheorem 2(a) of Claeskens et al [18] we have

119864 [119891 (119909)] minus 119864 [119891rs (119909)] = minus120582

119899B(119909)119879Λminus1

119899119863119879

119898119863119898blowast

+ 119900(1205821198701minus119898

119899) = 119874(

1205821198701minus119898

119899)

119881 [119891 (119909)] = 119881 [119891rs (119909)] minus120582119870

119899

119862 (119909 | 119899 119870 120582)

119870+ 119900(

1205821198702

1198992)

(14)

where

119862 (119909 | 119899 119870 120582) =21205902

119899B(119909)119879Λminus1

119899119863119879

119898119863119898119866minus1

119899B (119909) (15)

is the covariance of 119891rs(119909) and the second term of the righthand side of (12)The variance of the second term of (12) canbe shown to be negligible (see the appendix) The followingtheorem leads to 120582 controlling the trade-off between thesquared bias and variance of the penalized spline estimator

Theorem1 Thecovariance119862(119909 | 119899 119870 120582)119870 in (15) is positiveFurthermore as 119899 rarr infin 119862(119909 | 119899 119870 120582) = 119862(119909 | 119899 119870 0)(1 +119900(1)) and 119862(119909 | 119899 119870 0)119870 = 119874(119870119899)

4 Journal of Probability and Statistics

From the asymptotic form of 119864[119891(119909)] and 119881[119891(119911)] andTheorem 1 we see that for small 120582 the bias of 119891(119909) issmall and the variance becomes large On the other handthe large 120582 indicates the bias of 119891(119909) increases and thevariance decreases From Theorem 1 the MISE of 119891(119909) canbe expressed as

int

119887

119886

119864 [119891 (119909) minus 119891 (119909)2

] 119889119909

= int

119887

119886

119870minus(119901+1)

119887119886 (119909) minus120582

119899B(119909)119879Λminus1

119899119863119879

119898119863119898blowast

2

119889119909

+1205902

119899int

119887

119886

B(119909)119879119866119899B (119909) 119889119909 minus120582119870

119899

int119887

119886119862 (119909 | 119899 119870 0) 119889119909

119870

+ 1199031119899 (119870) + 1199032119899 (119870 120582)

(16)

where 1199031119899(119870) and 1199032119899(119870 120582) are of negligible order respec-tively compared to the regression spline and penalized splineof the second term of the right hand side of (12) Actuallywe have 1199031119899(119870) = 119900(119870119899) and 1199032119899(119870 120582) = 119900(120582

21198702(1minus119898)

1198992)

The MISE of 119891(119909) is asymptotically quadratic and a globalminimum exists Let

MISE (120582) = 1205822

1198992int

119887

119886

B(119909)119879Λminus1119899119863119879

119898119863119898blowast2

119889119909

minus120582

119899int

119887

119886

2119887119886 (119909)B(119909)119879Λminus1119899 119863

119879

119898119863119898blowast

119870119901+1

+119862 (119909 | 119899 119870 0) 119889119909

(17)

And let 120582opt be theminimizer ofMISE(120582)We suggest the useof 120582opt as the optimal smoothing parameter

120582opt =119899

2

int119887

1198861198632119899 (119909 | 119891

(119901+1) blowast) 119889119909

int119887

1198861198631119899 (119909 | blowast) 119889119909

(18)

where

1198631119899 (119909 | blowast) = B(119909)119879119866minus1

119899119863119879

119898119863119898blowast2

1198632119899 (119909 | 119891(119901+1)

blowast) = 2119887119886 (119909)B(119909)119879119866minus1119899 119863

119879

119898119863119898blowast

119870119901+1

+ 119862 (119909 | 119899 119870 0)

(19)

However it is easy to see that MISE(120582) and 120582opt contain anunknown function and parameters and hence these must beestimated We construct the estimator of 120582opt by using theconsistent estimator of 119891(119901+1) and blowast We can use the penal-ized spline estimator and its derivative as the pilot estimatorof119891(119901+1) and blowast If we then use another smoothing parameter1205820 it should be chosen appropriately Therefore we use theregression spline estimator as the pilot estimator of 119891(119901+1)(119909)

and blowast First we establish b = (119885119879119885)minus1119885119879119884 Next we con-struct the pilot estimator with119891(119901+1)(119909) by using the (119901+2)thdegree 119861-spline basis Let B[119901](119909) = (119861[119901]minus119901 (119909) sdot sdot sdot 119861

[119901]

119870minus1(119909))119879

Using the fundamental property of the 119861-spline function119891(119898)(119909) can be written as 119891(119898)(119909) = 119861

[119901minus119898](119909)119879119863119898blowast

asymptotically Hence the regression spline estimator 119891(119901+1)

can be constructed as 119891(119901+1)(119909) = B[119901+1](119909)119879119863119901+1b(119901+2)where b(119901+2) = ((119885[119901+2])119879119885[119901+2])minus1(119885[119901+2])119879Y and 119885[119901+2] =(119861[119901+2]

minus119901+119895minus1(119909119894))119894119895 Since the regression spline estimator tends to

be oscillatory with a higher order 119901th spline function thefewer knots are used to construct 119891(119901+1)(119909) The variance 1205902included in 119862(119909 | 119899 119870 0) is estimated via

2=

1

119899 minus (119870 + 119901)

119899

sum

119894=1

119884119894 minus B(119909119894)119879b2

(20)

Using the above pilot estimators

opt =119899

2

sum119869

119895=11198632 (119911119895 | 119891

(119901+1) b)

sum119869

119895=11198631 (119911119895 | b)

(21)

with some finite grid points 119911119895119869

1on [119886 119887]

Consequently the final penalized spline estimator isdefined as

119891 (119909) = B(119909)119879bopt (22)

where

bopt = (119885119879119885 + opt119863

119879

119898119863119898)minus1

119885119879Y (23)

It is known that the optimal order of 119870 of the penalizedspline estimator is the same as that of the regression splineestimator 119870 = 119874(119899

1(2119901+3)) (see Kauermann et al [19] and

Zhou et al [28]) Using this we show the asymptotic propertyof 120582opt in following theorem

Theorem 2 Let 119891 isin 119862119901+1 Suppose that 119870 = 119900(11989912) and

120582 = 119900(1198991198701minus119898) Then 120582119900119901119905 given in (21) exists and 120582119900119901119905 =

119874(119899119870119898minus119901minus2

+1198702119898) as 119899 rarr infin Furthermore119870 = 119874(1198991(2119901+3))

leads to the optimal order

120582opt = 119874 (119899(119901+119898+1)(2119901+3)

) + 119874 (1198992119898(2119901+3)

) (24)

and the rate of convergence of MISE of 119891(119909) becomes119874(119899minus2(119901+1)(2119901+3)

)

The proof of Theorem 2 is given in the appendix At theend of this section we give a few remarks

Remark 3 The asymptotic order of the squared biasand variance of the penalized splines are 119874(119870minus2(119901+1)) +119874(12058221198702(1minus119898)

1198992) and 119874(119870119899) respectively Therefore under

119870 = (1198991(2119901+3)

) and 120582 = 119874(119899(119901+119898+1)(2119901+3)

) the optimalrate of convergence of MISE of the penalized splines is119874(119899minus2(119901+1)(2119901+3)

) From Theorem 2 we see that the asymp-totic order of 120582opt yields the optimal rate of convergence

Journal of Probability and Statistics 5

Remark 4 OrsquoSullivan [1] used 120583int119887119886119904(119898)(119909)2119889119909 as the penalty

term where 120583 is the smoothing parameter When equidistantknots are used the penaltyint119887

119886119904(119898)(119909)2119889119909 can be expressed as

1198702119898b119879119863119879

119898119877119863119898b where 119877 = (int

119887

119886119861[119901minus119898]

119894(119909)119861[119901minus119898]

119895(119909))119894119895 The

penalty b119879119863119879119898119863119898b provided by Eilers and Marx [2] can be

seen as the simple version of1198702119898b119879119863119879119898119877119863119898b by replacing 119877

with 119870minus1119868 and 120582 = 1205831198702119898minus1 where 119868 is the identity matrixSo the order 119898 of the difference matrix 119863119898 controls thesmoothness of the 119901th 119861-spline function 119904(119909) and hence 119898should be set such that119898 le 119901 to give a theoretical justificationalthough the penalized spline estimator can be also calculatedfor 119898 gt 119901 Actually (119901119898) = (3 2) is often used by manyauthors

Remark 5 The penalized spline regression is often consid-ered as the mixed model representation (see Ruppert et al[4]) In this frame work we use the 119901th truncated splinemodel

119904 (119909) = 1205730 + 1205731119909 + sdot sdot sdot + 120573119901119909119901+

119870

sum

119895=1

119906119896(119909 minus 120581119896)119901

+ (25)

where (119909 minus 119886)+ = max119909 minus 119886 0 and the 120573rsquos are unknownparameters Each 119906119896 is independently distributed as 119906119896 sim119873(0 120590

2

119906) where 1205902

119906is an unknown variance parameter of 119906119896

The penalized spline fit of 119891(119909) is defined as the estimatedBLUP (see Robinson [29]) The smoothing parameter in theordinal spline regression model corresponds to 12059021205902

119906in the

spline mixed model Since the 1205902 and 1205902119906are estimated using

the ML or REML method we do not need to choose thesmoothing parameter via a grid search It is known that theestimated BLUP fit is linked to the penalized spline estimator(9) with 119898 = 119901 + 1 (see Kauermann et al [19]) Hence theestimated BLUP tends to have a theoretically underfit (seeRemark 4)

Remark 6 From Lyapunovrsquos theorem the asymptotic nor-mality of the penalized spline estimator119891(119909)with opt can bederived under the same assumption as Theorem 2 and someadditional mild conditions Although the proof is omittedit is straightforward since opt is the consistent estimator of120582opt and the asymptotic order of 120582opt satisfies Lyapunovrsquoscondition

4 Extension to Additive Models

We extend the direct method to the regression model withmultidimensional explanatory variables In particular weconsider additive models in this section For the dataset(119884119894 x119894) 119894 = 1 119899 with 1-dimensional response 119884119894 and119889-dimensional explanatory x119894 = (1199091198941 119909119894119889) the additivemodels are connected via unknown regression functions119891119895 (119895 = 1 119889) and the mean parameter 120583 such that

119884119894 = 120583 + 1198911 (1199091198941) + sdot sdot sdot + 119891119889 (119909119894119889) + 120576119894 (26)

We assume that 119909119894119895 is located on an interval [119886119895 119887119895] and1198911 119891119889 are normalized as

int

119887119895

119886119895

119891119895 (119906) 119889119906 = 0 119895 = 1 119889 (27)

to ensure the identifiability of 119891119895 Then the intercept 120583 istypically estimated via

120583 =1

119899

119899

sum

119894=1

119884119894 (28)

Hence we replace 119884119894 with 119884119894 minus 120583 in (26) and set 120583 = 0redefining the additive model as

119884119894 = 1198911 (1199091198941) + sdot sdot sdot + 119891119889 (119909119894119889) + 120576119894 119894 = 1 119899 (29)

where each 119884119894 is centered We aim to estimate 119891119895 via apenalized spline method Let

119904119895 (119909) =

119870119895minus1

sum

119896=minus119901

119861119895119896 (119909) 119887119895119896 119895 = 1 119889 (30)

be the 119861-spline model where 119861119895119896(119909) is the 119901th 119861-splinebasis function with knots 120581119895119896 | 119896 = minus119901 119870119895 + 119901 +

1 1205811198950 = 119886119895 120581119895119870119895+1= 119887119895 119895 = 1 119889 and 119887119895119896rsquos are unknown

parameters We consider the 119861-spline additive models

119884119894 = 1199041 (1199091198941) + sdot sdot sdot + 119904119889 (119909119894119889) + 120576119894 119894 = 1 119899 (31)

and estimate 119904119895 and 119887119895119896 For 119895 = 1 119889 the penalized spline

estimator b119895 = (119895minus119901 sdot sdot sdot 119895119870minus1)119879

of b119895 = (119887119895minus119901 sdot sdot sdot 119887119895119870minus1)119879

is defined as

[[

[

b1b119889

]]

]

= argminb1b119889

119899

sum

119894=1

119884119894 minus

119889

sum

119895=1

119870119895

sum

119896=minus119901

119861119895119896 (119909119894) 119887119895119896

2

+

119889

sum

119895=1

120582119895b119879

119895119863119879

119895119898119863119895119898b119895

(32)

where 120582119895 are the smoothing parameters and 119863119895119898 is the 119898thorder difference matrix with size (119870119895 + 119901 minus 119898) times (119870119895 + 119901) for119895 = 1 119889 By using b119895 the penalized spline estimator of119891119895(119909119895) is defined as

119891119895 (119909119895) =

119870119895minus1

sum

119896=minus119901

119861119895119896 (119909119895) 119895119896 119895 = 1 119889 (33)

6 Journal of Probability and Statistics

The asymptotics for 119891119895(119909119895) have been studied by Yoshidaand Naito [22] who derived the asymptotic bias and varianceto be

119864 [119891119895 (119909119895)] = 119891119895 (119909119895) + 119870minus(119901+1)

119895119887119895119886 (119909119895) (1 + 119900 (1))

+

120582119895

119899B119895(119909119895)

119879

Λminus1

119895119899119863119879

119895119898119863119895119898blowast

119895+ 119900(

1205821198951198701minus119898

119895

119899)

119881 [119891119895 (119909119895)] =1205902

119899B119895(119909119895)

119879

119866minus1

119895119899B119895 (119909119895)

minus120582119870

119899

1205902119862119895 (119909119895 | 119899 119870119895)

119870+ 119900(

1205821198951198702

119895

1198992)

(34)

where B119895(119909) = (119861minus119901(119909) sdot sdot sdot 119861119870119895minus1(119909))119879

119866119895119899 = 119899minus1119885119879

119895119885119895

Λ 119895119899 = 119866119895119899 + (120582119895119899)119863119879

119898119863119898 119885119895 = (119861minus119901+119896minus1(119909119894119895))119894119896 and blowast

119895is

the best 119871infin approximation of 119891119895

119887119895119886 (119909) = minus

119891(119901+1)

119895(119909)

(119901 + 1)

119870119895minus1

sum

119896=0

119868 (120581119895119896 le 119909 lt 120581119895119896+1)

times 119861119901+1(

119909 minus 120581119895119896

119870minus1119895

)

119862119895 (119909119895 | 119899 119870119895) =2

119899B119895(119909119895)

119879

119866minus1

119895119899119863119879

119895119898119863119895119898119866

minus1

119899B119895 (119909119895)

(35)

The above asymptotic bias and variance of 119891119895(119909119895) are sim-ilar to that of the penalized spline estimator in univariateregression with (119884119894 119909119894119895) 119894 = 1 119899 Furthermore

the asymptotic normality of [1198911(1199091) sdot sdot sdot 119891119889(119909119889)]119879

has beenshown by Yoshida and Naito [22] From their paper we findthat119891119894(119909119894) and119891119895(119909119895) are then asymptotically independent for119894 = 119895 This indicates some theoretical justification to select 120582119895in minimizing the MISE of 119891119895 Similar to the discussion inSection 3 the minimizer 120582119895opt of the MISE of 119891119895(119909119895) can beobtained for 119895 = 1 119889

120582119895opt =119899

2

int119887119895

119886119895119871119895119899 (119909 | 119891

(119901+1)

119895 blowast119895 1205902) 119889119909

int119887119895

119886119895119867119895119899 (119909 | blowast119895 ) 119889119909

(36)

where

119867119895119899 (119909 | blowast

119895) = B119895(119909119895)

119879

119866minus1

119895119899119863119879

119895119898119863119895119898blowast

119895

2

119871119895119899 (119909 | 119891(119901+1)

119895 blowast119895 1205902) = 2

119887119886 (119909)B119895(119909119895)119879

119866minus1

119895119899119863119879

119895119898119863119895119898blowast119895

119870119901+1

119895

+ 1205902119862119895 (119909 | 119899 119870119895)

(37)

Since 119891119895 blowast119895 and 1205902 are unknown they should be estimated

Thepilot estimators of119891119895 blowast119895 and1205902 are constructed based on

the regression splinemethod By using the pilot estimators119891119895b119895 of119891119895blowast119895 and the estimator of1205902 we construct the estimatorof 120582119895opt

119895opt =119899

2

sum119877

119903=1119871119895119899 (119911119903 | 119891

(119901+1)

119895 b119895 2)

sum119877

119903=1119867119895119899 (119911119903 | b119895)

(38)

where 119911119903119877

1is some finite grid point sequence on [119886119895 119887119895] We

obtain for 119895 = 1 119889 the penalized spline estimator 119891119895(119909119895)of 119891119895(119909119895)

119891119895 (119909119895) = B119895(119909119895)119879

b119895opt (39)

where b119895opt is the penalized spline estimator of b119895 using119895opt From Theorem 2 and the proof of Theorem 34of Yoshida and Naito [22] the asymptotic normality of[1198911(1199091) sdot sdot sdot 119891119889(119909119889)]

119879

using (1205821opt 120582119889opt) can be shown

Remark 7 Since the true regression functions are normalizedthe estimator 119891119895 should be also centered as

119891119895 (119909119895) minus1

119899

119899

sum

119894=1

119891119895 (119909119894119895) (40)

Remark 8 Thepenalized spline estimator of b1 b119889 can beobtained using a backfitting algorithm (Hastie and Tibshirani[30]) The backfitting algorithm for the penalized splines inadditive regression is detailed in Marx and Eilers [3] andYoshida and Naito [21]

Remark 9 Although we focus on the nonparametric additiveregression in this paper the directmethod can be also appliedto the generalized additive models However we omit thisdiscussion because the procedure is similar to the case of theadditive models discussed in this section

Remark 10 The direct method is quite computationally effi-cient when compared to the grid search method in additivemodels In grid searches we prepare the candidate of 120582119895 Let119872119895 be the set of all possible candidate grid value of 120582119895 for 119895 =1 119889 Then we need to compute the backfitting algorithm1198721 times sdot sdot sdot times 119872119889 times On the other hand it is sufficientto perform the backfitting algorithm for only two steps forthe pilot estimator and the final penalized spline estimatorThus compared with the conventional grid search methodthe direct method can drastically reduce computation time

5 Numerical Study

In this section we investigate the finite sample performanceof the proposed direct method in a Monte Carlo simulationLet us first consider the univariate regression model (1) forthe data (119884119894 119909119894) 119894 = 1 119899 Then we use the three

Journal of Probability and Statistics 7

Table 1 Results of sample MISE for 119899 = 50 and 119899 = 200 All entries for MISE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0526 0322 3684 1070 1160 0377L-GCV 0726 0621 5811 1082 1183 0379L-REML 2270 1544 9966 3520 1283 0873Local linear 0751 0220 4274 0901 1689 0503C-Direct 0401 0148 3027 0656 1044 0326C-GCV 0514 0137 3526 0666 1043 0290C-REML 1326 0732 8246 4213 3241 0835

Table 2 Results of sample MSE of smoothing parameter obtained by direct method GCV and REML for 119899 = 50 and 119899 = 200 All entries forMSE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0331 0193 0113 0043 0025 0037L-GCV 0842 0342 0445 0101 0072 0043L-REML 1070 1231 0842 0437 0143 0268C-Direct 0262 0014 0082 0045 0053 0014C-GCV 0452 0123 0252 0092 0085 0135C-REML 0855 0224 0426 0152 0093 0463

types of true regression function 119891(119909) = cos(120587(119909 minus 03))119891(119909) = 120601((119909 minus 08)005)radic005 minus 120601((119909 minus 02)004)radic004and 119891(119909) = 21199093 +3 sin(2120587(119909minus08)3) + 3 exp[minus(119909minus05)201]which are labeled by F1 F2 and F3 respectively Here120601(119909) is the density function of the normal distribution Theexplanatory 119909119894 and error 120576119894 are independently generated fromuniform distribution on [0 1] and 119873(0 052) respectivelyWe estimate each true regression function via the penalizedspline method We then use the linear and cubic 119861-splinebases with equidistant knots and the second order differencepenalty In addition we set 119870 = 5119899

25 equidistant knotsand the smoothing parameter is determined by the directmethodThe penalized spline estimator with the linear splineand cubic spline are denoted as L-Direct and C-Directrespectively For comparison with L-Direct the same studiesare also implemented for the penalized spline estimator witha linear spline and the smoothing parameter selected viaGCV and restricted maximum likelihood method (REML)in mixed model representation and the local polynomialestimator with normal kernel and Plug-in bandwidth (seeRuppert et al [24]) In GCV we set the candidate values of120582 as 11989410 119894 = 0 1 99 The above three estimators aredenoted by L-GCV L-REML and local linear respectivelyFurthermore we compare C-Direct with C-GCV and C-REML which are the penalized spline estimator with thecubic spline and the smoothing parameter determined byGCV and REML Let

sMISE = 1119869

119869

sum

119895=1

[1

119877

119877

sum

119903=1

119891119903 (119911119895) minus 119891 (119911119895)2

] (41)

be the sample MISE of any estimator 119891(119909) of 119891(119909) where119891119903(119911) is 119891(119911) with 119903th replication and 119911119895 = 119895119869 We calculatethe sample MISE of the penalized spline estimator withthe direct method GCV and REML and the local linearestimator In this simulation we use 119869 = 100 and 119877 = 1000We have simulated 119899 = 50 and 200

The sMISE of all estimators for each model and 119899 aregiven in Table 1 The penalized spline estimator using thedirect method shows good performance in each setting Incomparison with other smoothing parameter methods thedirect method is a little better than the GCV as a wholeHowever for 119899 = 200 C-GCV is better than C-Direct inF1 and F3 though its difference is very small We see thatthe sMISE of C-Direct is smaller than local linear whereaslocal linear behaves better than L-Direct in some case In F2C-Direct is the smallest of all estimators for 119899 = 50 and200 Although the performance totally seems to depend onsituation in which data sets are generated we believe that theproposed method is one of the efficient methods

Next the difference between opt and 120582opt is investigatedempirically Let opt119903 be the opt with 119903th replications for 119903 =1 1000 Then we calculate the sample MSE of opt

sMSE = 1119877

119877

sum

119903=1

opt119903 minus 120582opt2

(42)

for F1 F2 and F3 and 119899 = 50 and 200 To construct 120582opt forF1 F2 and F3 we use true119891(4)(119909) and 1205902 and an approximateblowast Here the approximate blowast means the sample average of bwith 119899 = 200 and 1000 replications

In Table 2 the sMSE of opt for each true functionare described For comparison the sMSE of the smoothing

8 Journal of Probability and Statistics

Table 3 Results of sample PSE for 119899 = 50 and 119899 = 200 All entries for PSE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0424 0052 0341 0070 0360 0117L-GCV 0331 0084 0381 0089 0333 0139L-REML 0642 0124 0831 1002 0733 0173Local linear 0751 0220 0474 0901 0489 0143C-Direct 0341 048 0237 063 0424 0086C-GCV 0314 043 0326 076 0533 0089C-REML 0863 1672 0456 121 1043 0125

parameter obtained via GCV and REML are calculated ThesMSE of the L-Direct and the C-Direct is small even in 119899 = 50for all true functions Therefore it seems that the accuracyof opt is guaranteed It indicates that the pilot estimatorconstructed via least squares method is not bad The sMSEwith the direct method are smaller than that with GCV andREML This result is not surprising since GCV and REMLare not concerned with 120582opt However together with Table 1it seems that the sample MSE of the smoothing parameter isreflected in the sample MISE of the estimator

The proposed direct method was derived based on theMISE On the other hand the GCV and the REML areobtained in context of minimizing prediction squared error(PSE) and prediction error Hence we compare the samplePSE of the penalized spline estimator with the direct methodGCV and REML and the local linear estimator Since theprediction error is almost the same as MISE (see Section 4 ofRuppert et al [4]) we omit the comparison of the predictionerror Let

sPSE = 1119899

119899

sum

119894=1

[1

119877

119877

sum

119903=1

119910119894119903 minus 119891119903 (119909119894)2

] (43)

be the sample PSE for any estimator 119891(119909) where119910119894119903 (119903 = 1 119877) is independently generated from119884119894 sim 119873(119891(119909119894) (05)

2) for all 119894

In Table 3 the modified sPSE |sPSE minus (05)2| of allestimators for each model and 119899 are described In Remark 11we discuss the modified sPSE From the result we canconfirm that the direct method has good predictability GCVcan be regarded as the estimator of sPSE Therefore in somecase sPSE with GCV is smaller than that with the directmethod It seems that the cause is the accuracy of theestimates of variance (see Remark 11) However its differenceis very small

We admit that the computational time (in second) takento obtain 119891(119909) for F1 119901 = 3 and 119899 = 200 The fits withthe direct method GCV and REML took 004 122 and 034Although the difference is small the computational time ofthe direct method was faster than that of GCV and REML

Next we confirm the behavior of the penalized splineestimator with the direct method in the additive model Forthe data (119884119894 1199091198941 1199091198942 1199091198943) 119894 = 1 119899 we assume theadditive model with true functions 1198911(1199091) = sin(21205871199091)

1198912(1199092) = 120601((1199092 minus 05)02) and 1198913(1199093) = 04120601((1199093 minus

01)02) + 06120601((1199092 minus 08)02) The error is similar to thefirst simulationThe design points (1199091198941 1199091198942 1199091198943) are generatedby

[

[

1199091198941

1199091198942

1199091198943

]

]

=

[[[

[

(1 + 120588 + 1205882)minus1

0 0

0 (1 + 2120588)minus1

0

0 0 (1 + 120588 + 1205882)minus1

]]]

]

times [

[

1 120588 1205882

120588 1 120588

1205882120588 1

]

]

[

[

1199111198941

1199111198942

1199111198943

]

]

(44)

where 119911119894119895 (119894 = 1 119899 119895 = 1 2 3) are generated inde-pendently from the uniform distribution on [0 1] In thissimulation we adopt 120588 = 0 and 120588 = 05 We then correctedto satisfy

int

1

0

119891119895 (119911) 119889119911 = 0 119895 = 1 2 3 (45)

in each 120588 We construct the penalized spline estimator viathe backfitting algorithm with the cubic spline and secondorder difference penalty The number of equidistant knots is119870119895 = 5119899

25 and the smoothing parameters are determinedusing the direct method The pilot estimator to construct119895opt is the regression spline estimator with fifth spline and119870119895 = 119899

25 We calculate the sample MISE of each 119891119895 for119899 = 200 and 1000 Monte Carlo iterations Then we calculatethe sample MISE of 1198911 1198912 and 1198913 In order to compare withthe direct method we also conduct the same simulation withGCV and REML In GCV we set the candidate values of 120582119895 as11989410 119894 = 0 1 9 for 119895 = 1 2 3 Table 4 summarizes thesample MISE of 119891119895 (119895 = 1 2 3) denoted by MISE119895 for directmethod GCV and REML The penalized spline estimatorwith the direct method performs like that with the GCV inboth the uncorrelated and correlated design cases For 120588 = 0the behaviors of MISE1 MISE2 and MISE3 with the directmethod are similar On the other hand for GCV the MISE1is slightly larger than MISE2 and MISE3 The direct methodleads to an efficient estimate of all covariates On the wholethe direct method is better than REML From above webelieve that the direct method is preferable in practice

Journal of Probability and Statistics 9

Table 4 Results of sample MISE of the penalized spline estimator For each direct method GCV and 120588 MISE1 MISE2 and MISE3 are thesample MISE of 1198911 1198912 and 1198913 respectively All entries for MISE are 102 times their actual values

Method 120588 = 0 120588 = 05

MISE1 MISE2 MISE3 MISE1 MISE2 MISE3Direct 0281 0209 0205 0916 0795 0972GCV 0687 0237 0390 0929 0857 1112REML 1204 0825 0621 2104 1832 2263

Table 5 Results of sample MSE of the selected smoothing parameter via direct method GCV and REML for 120588 = 0 and 120588 = 05 All entriesfor MSE are 102 times their actual values

120588 = 0 120588 = 05

True 1198911 1198912 1198913 1198911 1198912 1198913

Direct 0124 0042 0082 0312 0105 0428GCV 1315 0485 0213 0547 0352 0741REML 2531 1237 2656 1043 0846 1588

Table 6 Results of sample PSE of the penalized spline estimator Allentries for PSE are 102 times their actual values

119899 = 200 120588 = 0 120588 = 05

Direct 0281 0429GCV 0387 0467REML 1204 0925

To confirm the consistency of the direct method thesample MSE of 119895opt is calculated the same manner as thatgiven in univariate regression case For comparison thesample MSE of GCV and REML are obtained Here the true120582119895opt (119895 = 1 2 3) is defined in the same way as that forunivariate case In Table 5 the results for each 1198911 1198912 and 1198913and 120588 = 0 and 05 are shown We see from the results that thebehavior of the directmethod is goodThe sampleMSE of thedirect method is smaller than that of GCV and REML for all119895 For the random design 120588 = 05 such tendency can be alsoseen

Table 6 shows the sPSE of 119891(x119894) = 1198911(1199091198941) + 1198912(1199091198942) +1198913(1199091198943)with each smoothing parameter selectionmethodWesee from result that the efficiency of the direct method isguaranteed in the context of prediction accuracy

Finally we show the computational time (in second)required to construct the penalized spline estimatorwith eachmethod The computational times with the direct methodGCV and REML are 1187 s 12643 s and 435 s respectivelyWe see that the direct method is more efficient than othermethods in context of the computation (see Remark 11) Allof computations in the simulation were done using a softwareR and a computer with 340GHz CPU and 240GB memoryThough this is only few examples the direct method can beseen as one of the good methods to select the smoothingparameter

Remark 11 We calculate |sPSE minus (05)2| as the criteria toconfirm the prediction squared error The ordinal PSE isdefined as

PSE = 1119899

119899

sum

119894=1

119864 [119884119894 minus 119891 (119909119894)2

]

= 1205902+1

119899

119899

sum

119894=1

119864 [119891 (119909119894) minus 119891 (119909119894)2

]

(46)

where 119884119894rsquos are the test data The second term of the PSEis similar to MISE So it can be said that the sample PSEevaluates the accuracy of the variance part andMISE part Tosee the detail of the difference of the sample PSE between thedirect method and other method we calculated |sPSE minus 1205902|in this section

6 Discussion

In this paper we proposed a new direct method for deter-mining the smoothing parameter for a penalized splineestimator in a regression problemThe directmethod is basedon minimizing MISE of the penalized spline estimator Westudied the asymptotic property of the direct method Theasymptotic normality of the penalized spline estimator usingopt is theoretically guaranteed when the consistent estimatoris used as the pilot estimator to obtain opt In numericalstudy for the additive model the computational cost of thisdirect method is dramatically reduced when compared togrid search methods such as GCV Furthermore we find thatthe performance of the direct method is better than or at leastsimilar to that of other methods

The direct method will be developed for other regressionmodels such as varying-coefficient models Cox propor-tional hazards models single-index models and others ifthe asymptotic bias and variance of the penalized spline

10 Journal of Probability and Statistics

estimator are derived It is not limited to themean regressionit can be applied to quantile regression Actually Yoshida[31] has presented the asymptotic bias and variance of thepenalized spline estimator in univariate quantile regressionsFurthermore it is seen that improving the direct method isimportant for various situations and datasets In particularthe development of the determination of locally adaptive 120582 isan interesting avenue of further research

Appendix

We describe the technical details For the matrix 119860 = (119886119894119895)119894119895119860infin = max119894119895|119886119894119895| First to prove Theorems 1 and 2 weintroduce the fundamental property of penalized splines inthe following lemma

Lemma A1 Let 119860 = (119886119894119895)119894119895 be (119870 + 119901) matrix Supposethat 119870 = 119900(11989912) 120582 = 119900(1198991198701minus119898) and 119860infin = 119874(1) Then119860119866minus1

119899infin= 119874(119870) and 119860Λminus1

119899infin= 119874(119870)

The proof of Lemma A1 is shown in Zhou et al [28] andClaeskens et al [18] The repeated use of Lemma A1 yieldsthat the asymptotic order of the variance of the second termof (12) is 119874(1205822(119870119899)3) Actually the asymptotic order of thevariance of the second term of (12) can be calculated as

(120582

119899)

21205902

119899B(119909)119879Λminus1

119899119863119879

119898119863119898119866minus1

119899119863119879

119898119863119898Λminus1

119899B (119909)

= 119874(1205822

11989931198703)

(A1)

When 119870 = 119900(11989912) and 120582 = 119874(119899119870

1minus119898) 1205822(119870119899)3 =

119900(120582(119870119899)2) holds

Proof of Theorem 1 We write

119862 (119909 | 119899 119870 120582) =21205902

119899B(119909)119879119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909)

+21205902

119899B(119909)119879 Λminus1

119899minus 119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909) (A2)

Since

Λminus1

119899minus 119866minus1

119899= Λminus1

119899119868 minus Λ 119899119866

minus1

119899

=120582

119899Λminus1

119899119863119879

119898119863119898119866minus1

119899

(A3)

we have Λminus1119899minus 119866minus1

119899infin

= 119874(1205821198702119899) and hence the

second term of right hand side of (A2) can be shown119874((119870119899)

2(120582119870119899)) From Lemma A1 it is easy to derive that

119862(119909 | 119899 119870 0) = 119874(119870(119870119899)) Consequently we have

119862 (119909 | 119899 119870 120582)

119870=119862 (119909 | 119899 119870 0)

119870+ 119874((

119870

119899)

2

(120582

119899))

=119862 (119909 | 119899 119870 0)

119870+ 119900 (

119870

119899)

(A4)

and this leads to Theorem 1

Proof of Theorem 2 It is sufficient to prove that int1198871198861198631119899(119909 |

blowast)119889119909 = 119874(1198702(1minus119898)) and

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874(1198702

119899)

(A5)

Since100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

119889119909

100381610038161003816100381610038161003816100381610038161003816

le sup119909isin[119886119887]

[B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

] (119886 minus 119887)

(A6)

we have

int

119887

119886

1198631119899 (119909 | blowast) 119889119909 = 119874 (119870

2(1minus119898)) (A7)

by using Lemma A1 and the fact that 119863119898blowastinfin = 119874(119870minus119898)

(see Yoshida [31])From the property of119861-spline function for (119870+119901) square

matrix 119860 there exists 119862 gt 0 such that

int

119887

119886

B(119909)119879119860B (119909) 119889119909 le 119860infin int119887

119886

B(119909)119879B (119909) 119889119909

le 119860infin119862

(A8)

In other words the asymptotic order of int119887119886B(119909)119879119860B(119909)119889119909

and that of 119860infin are the same Let 119860119899 = 119866minus1

119899119863119879

119898119863119898119866minus1

119899

Then since 119860119899infin = 1198702 we obtain

int

119887

119886

119862 (119909 | 119899 119870 0) 119889119909 = 119874(1198702

119899) (A9)

Furthermore because sup119886isin[119886119887]

119887119886(119909) = 119874(1) we have

119870minus(119901+1)

100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

119887119886 (119909)B(119909)119879119866minus1

119899119863119879

119898119863119898blowast119889119909

100381610038161003816100381610038161003816100381610038161003816

= 119874 (119870minus(119901+119898)

)

(A10)

and hence it can be shown that

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874 (119899minus11198702) (A11)

Together with (A7) and (A11) 120582opt = 119874(119899119870119898minus119901minus2

+ 1198702119898)

can be obtained Then rate of convergence of the penalizedspline estimator with 120582opt is 119874(119899

minus2(119901+2)(2119901 + 3)) when 119870 =

119874(1198991(2119901+3)

) which is detailed in Yoshida and Naito [22]

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

Journal of Probability and Statistics 11

Acknowledgment

The author also would like to thank two anonymous refereesfor their careful reading and comments to improve the paper

References

[1] F OrsquoSullivan ldquoA statistical perspective on ill-posed inverseproblemsrdquo Statistical Science vol 1 no 4 pp 502ndash527 1986

[2] P H C Eilers and B D Marx ldquoFlexible smoothing with 119861-splines and penaltiesrdquo Statistical Science vol 11 no 2 pp 89ndash121 1996 With comments and a rejoinder by the authors

[3] B D Marx and P H C Eilers ldquoDirect generalized additivemodeling with penalized likelihoodrdquo Computational Statisticsand Data Analysis vol 28 no 2 pp 193ndash209 1998

[4] D Ruppert M P Wand and R J Carroll SemiparametricRegression Cambridge University Press Cambridge UK 2003

[5] T Krivobokova ldquoSmoothing parameter selection in two frame-works for penalized splinesrdquo Journal of the Royal StatisticalSociety B vol 75 no 4 pp 725ndash741 2013

[6] P T Reiss and R T Ogden ldquoSmoothing parameter selection fora class of semiparametric linear modelsrdquo Journal of the RoyalStatistical Society B vol 71 no 2 pp 505ndash523 2009

[7] S N Wood ldquoModelling and smoothing parameter estimationwith multiple quadratic penaltiesrdquo Journal of the Royal Statisti-cal Society B vol 62 no 2 pp 413ndash428 2000

[8] S NWood ldquoStable and efficient multiple smoothing parameterestimation for generalized additive modelsrdquo Journal of theAmerican Statistical Association vol 99 no 467 pp 673ndash6862004

[9] S N Wood ldquoFast stable direct fitting and smoothness selectionfor generalized additive modelsrdquo Journal of the Royal StatisticalSociety B vol 70 no 3 pp 495ndash518 2008

[10] X Lin and D Zhang ldquoInference in generalized additive mixedmodels by using smoothing splinesrdquo Journal of the RoyalStatistical Society B vol 61 no 2 pp 381ndash400 1999

[11] M P Wand ldquoSmoothing and mixed modelsrdquo ComputationalStatistics vol 18 no 2 pp 223ndash249 2003

[12] L Fahrmeir T Kneib and S Lang ldquoPenalized structuredadditive regression for space-time data a Bayesian perspectiverdquoStatistica Sinica vol 14 no 3 pp 731ndash761 2004

[13] L Fahrmeir and T Kneib ldquoPropriety of posteriors in structuredadditive regression models theory and empirical evidencerdquoJournal of Statistical Planning and Inference vol 139 no 3 pp843ndash859 2009

[14] F Heinzl L Fahrmeir and T Kneib ldquoAdditive mixed modelswith Dirichlet process mixture and P-spline priorsrdquo Advancesin Statistical Analysis vol 96 no 1 pp 47ndash68 2012

[15] G Kauermann ldquoA note on smoothing parameter selection forpenalized spline smoothingrdquo Journal of Statistical Planning andInference vol 127 no 1-2 pp 53ndash69 2005

[16] P Hall and J D Opsomer ldquoTheory for penalised splineregressionrdquo Biometrika vol 92 no 1 pp 105ndash118 2005

[17] Y Li and D Ruppert ldquoOn the asymptotics of penalized splinesrdquoBiometrika vol 95 no 2 pp 415ndash436 2008

[18] G Claeskens T Krivobokova and J D Opsomer ldquoAsymptoticproperties of penalized spline estimatorsrdquo Biometrika vol 96no 3 pp 529ndash544 2009

[19] G Kauermann T Krivobokova and L Fahrmeir ldquoSomeasymptotic results on generalized penalized spline smoothingrdquo

Journal of the Royal Statistical Society B vol 71 no 2 pp 487ndash503 2009

[20] X Wang J Shen and D Ruppert ldquoOn the asymptotics ofpenalized spline smoothingrdquo Electronic Journal of Statistics vol5 pp 1ndash17 2011

[21] T Yoshida and K Naito ldquoAsymptotics for penalized additive 119861-spline regressionrdquo Journal of the Japan Statistical Society vol 42no 1 pp 81ndash107 2012

[22] T Yoshida and K Naito ldquoAsymptotics for penalized splinesin generalized additive modelsrdquo Journal of NonparametricStatistics vol 26 no 2 pp 269ndash289 2014

[23] L Xiao Y Li and D Ruppert ldquoFast bivariate 119875-splines thesandwich smootherrdquo Journal of the Royal Statistical Society Bvol 75 no 3 pp 577ndash599 2013

[24] D Ruppert S J Sheather and M P Wand ldquoAn effectivebandwidth selector for local least squares regressionrdquo Journal ofthe American Statistical Association vol 90 no 432 pp 1257ndash1270 1995

[25] M P Wand and M C Jones Kernel Smoothing Chapman ampHall London UK 1995

[26] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[27] D Ruppert ldquoSelecting the number of knots for penalizedsplinesrdquo Journal of Computational and Graphical Statistics vol11 no 4 pp 735ndash757 2002

[28] S Zhou X Shen and D A Wolfe ldquoLocal asymptotics forregression splines and confidence regionsrdquo The Annals ofStatistics vol 26 no 5 pp 1760ndash1782 1998

[29] G K Robinson ldquoThat BLUP is a good thing the estimation ofrandom effectsrdquo Statistical Science vol 6 no 1 pp 15ndash32 1991

[30] T J Hastie and R J Tibshirani Generalized Additive ModelsChapman amp Hall London UKI 1990

[31] T Yoshida ldquoAsymptotics for penalized spline estimators inquantile regressionrdquo Communications in Statistics-Theory andMethods 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 4: Research Article Direct Determination of Smoothing

4 Journal of Probability and Statistics

From the asymptotic form of 119864[119891(119909)] and 119881[119891(119911)] andTheorem 1 we see that for small 120582 the bias of 119891(119909) issmall and the variance becomes large On the other handthe large 120582 indicates the bias of 119891(119909) increases and thevariance decreases From Theorem 1 the MISE of 119891(119909) canbe expressed as

int

119887

119886

119864 [119891 (119909) minus 119891 (119909)2

] 119889119909

= int

119887

119886

119870minus(119901+1)

119887119886 (119909) minus120582

119899B(119909)119879Λminus1

119899119863119879

119898119863119898blowast

2

119889119909

+1205902

119899int

119887

119886

B(119909)119879119866119899B (119909) 119889119909 minus120582119870

119899

int119887

119886119862 (119909 | 119899 119870 0) 119889119909

119870

+ 1199031119899 (119870) + 1199032119899 (119870 120582)

(16)

where 1199031119899(119870) and 1199032119899(119870 120582) are of negligible order respec-tively compared to the regression spline and penalized splineof the second term of the right hand side of (12) Actuallywe have 1199031119899(119870) = 119900(119870119899) and 1199032119899(119870 120582) = 119900(120582

21198702(1minus119898)

1198992)

The MISE of 119891(119909) is asymptotically quadratic and a globalminimum exists Let

MISE (120582) = 1205822

1198992int

119887

119886

B(119909)119879Λminus1119899119863119879

119898119863119898blowast2

119889119909

minus120582

119899int

119887

119886

2119887119886 (119909)B(119909)119879Λminus1119899 119863

119879

119898119863119898blowast

119870119901+1

+119862 (119909 | 119899 119870 0) 119889119909

(17)

And let 120582opt be theminimizer ofMISE(120582)We suggest the useof 120582opt as the optimal smoothing parameter

120582opt =119899

2

int119887

1198861198632119899 (119909 | 119891

(119901+1) blowast) 119889119909

int119887

1198861198631119899 (119909 | blowast) 119889119909

(18)

where

1198631119899 (119909 | blowast) = B(119909)119879119866minus1

119899119863119879

119898119863119898blowast2

1198632119899 (119909 | 119891(119901+1)

blowast) = 2119887119886 (119909)B(119909)119879119866minus1119899 119863

119879

119898119863119898blowast

119870119901+1

+ 119862 (119909 | 119899 119870 0)

(19)

However it is easy to see that MISE(120582) and 120582opt contain anunknown function and parameters and hence these must beestimated We construct the estimator of 120582opt by using theconsistent estimator of 119891(119901+1) and blowast We can use the penal-ized spline estimator and its derivative as the pilot estimatorof119891(119901+1) and blowast If we then use another smoothing parameter1205820 it should be chosen appropriately Therefore we use theregression spline estimator as the pilot estimator of 119891(119901+1)(119909)

and blowast First we establish b = (119885119879119885)minus1119885119879119884 Next we con-struct the pilot estimator with119891(119901+1)(119909) by using the (119901+2)thdegree 119861-spline basis Let B[119901](119909) = (119861[119901]minus119901 (119909) sdot sdot sdot 119861

[119901]

119870minus1(119909))119879

Using the fundamental property of the 119861-spline function119891(119898)(119909) can be written as 119891(119898)(119909) = 119861

[119901minus119898](119909)119879119863119898blowast

asymptotically Hence the regression spline estimator 119891(119901+1)

can be constructed as 119891(119901+1)(119909) = B[119901+1](119909)119879119863119901+1b(119901+2)where b(119901+2) = ((119885[119901+2])119879119885[119901+2])minus1(119885[119901+2])119879Y and 119885[119901+2] =(119861[119901+2]

minus119901+119895minus1(119909119894))119894119895 Since the regression spline estimator tends to

be oscillatory with a higher order 119901th spline function thefewer knots are used to construct 119891(119901+1)(119909) The variance 1205902included in 119862(119909 | 119899 119870 0) is estimated via

2=

1

119899 minus (119870 + 119901)

119899

sum

119894=1

119884119894 minus B(119909119894)119879b2

(20)

Using the above pilot estimators

opt =119899

2

sum119869

119895=11198632 (119911119895 | 119891

(119901+1) b)

sum119869

119895=11198631 (119911119895 | b)

(21)

with some finite grid points 119911119895119869

1on [119886 119887]

Consequently the final penalized spline estimator isdefined as

119891 (119909) = B(119909)119879bopt (22)

where

bopt = (119885119879119885 + opt119863

119879

119898119863119898)minus1

119885119879Y (23)

It is known that the optimal order of 119870 of the penalizedspline estimator is the same as that of the regression splineestimator 119870 = 119874(119899

1(2119901+3)) (see Kauermann et al [19] and

Zhou et al [28]) Using this we show the asymptotic propertyof 120582opt in following theorem

Theorem 2 Let 119891 isin 119862119901+1 Suppose that 119870 = 119900(11989912) and

120582 = 119900(1198991198701minus119898) Then 120582119900119901119905 given in (21) exists and 120582119900119901119905 =

119874(119899119870119898minus119901minus2

+1198702119898) as 119899 rarr infin Furthermore119870 = 119874(1198991(2119901+3))

leads to the optimal order

120582opt = 119874 (119899(119901+119898+1)(2119901+3)

) + 119874 (1198992119898(2119901+3)

) (24)

and the rate of convergence of MISE of 119891(119909) becomes119874(119899minus2(119901+1)(2119901+3)

)

The proof of Theorem 2 is given in the appendix At theend of this section we give a few remarks

Remark 3 The asymptotic order of the squared biasand variance of the penalized splines are 119874(119870minus2(119901+1)) +119874(12058221198702(1minus119898)

1198992) and 119874(119870119899) respectively Therefore under

119870 = (1198991(2119901+3)

) and 120582 = 119874(119899(119901+119898+1)(2119901+3)

) the optimalrate of convergence of MISE of the penalized splines is119874(119899minus2(119901+1)(2119901+3)

) From Theorem 2 we see that the asymp-totic order of 120582opt yields the optimal rate of convergence

Journal of Probability and Statistics 5

Remark 4 OrsquoSullivan [1] used 120583int119887119886119904(119898)(119909)2119889119909 as the penalty

term where 120583 is the smoothing parameter When equidistantknots are used the penaltyint119887

119886119904(119898)(119909)2119889119909 can be expressed as

1198702119898b119879119863119879

119898119877119863119898b where 119877 = (int

119887

119886119861[119901minus119898]

119894(119909)119861[119901minus119898]

119895(119909))119894119895 The

penalty b119879119863119879119898119863119898b provided by Eilers and Marx [2] can be

seen as the simple version of1198702119898b119879119863119879119898119877119863119898b by replacing 119877

with 119870minus1119868 and 120582 = 1205831198702119898minus1 where 119868 is the identity matrixSo the order 119898 of the difference matrix 119863119898 controls thesmoothness of the 119901th 119861-spline function 119904(119909) and hence 119898should be set such that119898 le 119901 to give a theoretical justificationalthough the penalized spline estimator can be also calculatedfor 119898 gt 119901 Actually (119901119898) = (3 2) is often used by manyauthors

Remark 5 The penalized spline regression is often consid-ered as the mixed model representation (see Ruppert et al[4]) In this frame work we use the 119901th truncated splinemodel

119904 (119909) = 1205730 + 1205731119909 + sdot sdot sdot + 120573119901119909119901+

119870

sum

119895=1

119906119896(119909 minus 120581119896)119901

+ (25)

where (119909 minus 119886)+ = max119909 minus 119886 0 and the 120573rsquos are unknownparameters Each 119906119896 is independently distributed as 119906119896 sim119873(0 120590

2

119906) where 1205902

119906is an unknown variance parameter of 119906119896

The penalized spline fit of 119891(119909) is defined as the estimatedBLUP (see Robinson [29]) The smoothing parameter in theordinal spline regression model corresponds to 12059021205902

119906in the

spline mixed model Since the 1205902 and 1205902119906are estimated using

the ML or REML method we do not need to choose thesmoothing parameter via a grid search It is known that theestimated BLUP fit is linked to the penalized spline estimator(9) with 119898 = 119901 + 1 (see Kauermann et al [19]) Hence theestimated BLUP tends to have a theoretically underfit (seeRemark 4)

Remark 6 From Lyapunovrsquos theorem the asymptotic nor-mality of the penalized spline estimator119891(119909)with opt can bederived under the same assumption as Theorem 2 and someadditional mild conditions Although the proof is omittedit is straightforward since opt is the consistent estimator of120582opt and the asymptotic order of 120582opt satisfies Lyapunovrsquoscondition

4 Extension to Additive Models

We extend the direct method to the regression model withmultidimensional explanatory variables In particular weconsider additive models in this section For the dataset(119884119894 x119894) 119894 = 1 119899 with 1-dimensional response 119884119894 and119889-dimensional explanatory x119894 = (1199091198941 119909119894119889) the additivemodels are connected via unknown regression functions119891119895 (119895 = 1 119889) and the mean parameter 120583 such that

119884119894 = 120583 + 1198911 (1199091198941) + sdot sdot sdot + 119891119889 (119909119894119889) + 120576119894 (26)

We assume that 119909119894119895 is located on an interval [119886119895 119887119895] and1198911 119891119889 are normalized as

int

119887119895

119886119895

119891119895 (119906) 119889119906 = 0 119895 = 1 119889 (27)

to ensure the identifiability of 119891119895 Then the intercept 120583 istypically estimated via

120583 =1

119899

119899

sum

119894=1

119884119894 (28)

Hence we replace 119884119894 with 119884119894 minus 120583 in (26) and set 120583 = 0redefining the additive model as

119884119894 = 1198911 (1199091198941) + sdot sdot sdot + 119891119889 (119909119894119889) + 120576119894 119894 = 1 119899 (29)

where each 119884119894 is centered We aim to estimate 119891119895 via apenalized spline method Let

119904119895 (119909) =

119870119895minus1

sum

119896=minus119901

119861119895119896 (119909) 119887119895119896 119895 = 1 119889 (30)

be the 119861-spline model where 119861119895119896(119909) is the 119901th 119861-splinebasis function with knots 120581119895119896 | 119896 = minus119901 119870119895 + 119901 +

1 1205811198950 = 119886119895 120581119895119870119895+1= 119887119895 119895 = 1 119889 and 119887119895119896rsquos are unknown

parameters We consider the 119861-spline additive models

119884119894 = 1199041 (1199091198941) + sdot sdot sdot + 119904119889 (119909119894119889) + 120576119894 119894 = 1 119899 (31)

and estimate 119904119895 and 119887119895119896 For 119895 = 1 119889 the penalized spline

estimator b119895 = (119895minus119901 sdot sdot sdot 119895119870minus1)119879

of b119895 = (119887119895minus119901 sdot sdot sdot 119887119895119870minus1)119879

is defined as

[[

[

b1b119889

]]

]

= argminb1b119889

119899

sum

119894=1

119884119894 minus

119889

sum

119895=1

119870119895

sum

119896=minus119901

119861119895119896 (119909119894) 119887119895119896

2

+

119889

sum

119895=1

120582119895b119879

119895119863119879

119895119898119863119895119898b119895

(32)

where 120582119895 are the smoothing parameters and 119863119895119898 is the 119898thorder difference matrix with size (119870119895 + 119901 minus 119898) times (119870119895 + 119901) for119895 = 1 119889 By using b119895 the penalized spline estimator of119891119895(119909119895) is defined as

119891119895 (119909119895) =

119870119895minus1

sum

119896=minus119901

119861119895119896 (119909119895) 119895119896 119895 = 1 119889 (33)

6 Journal of Probability and Statistics

The asymptotics for 119891119895(119909119895) have been studied by Yoshidaand Naito [22] who derived the asymptotic bias and varianceto be

119864 [119891119895 (119909119895)] = 119891119895 (119909119895) + 119870minus(119901+1)

119895119887119895119886 (119909119895) (1 + 119900 (1))

+

120582119895

119899B119895(119909119895)

119879

Λminus1

119895119899119863119879

119895119898119863119895119898blowast

119895+ 119900(

1205821198951198701minus119898

119895

119899)

119881 [119891119895 (119909119895)] =1205902

119899B119895(119909119895)

119879

119866minus1

119895119899B119895 (119909119895)

minus120582119870

119899

1205902119862119895 (119909119895 | 119899 119870119895)

119870+ 119900(

1205821198951198702

119895

1198992)

(34)

where B119895(119909) = (119861minus119901(119909) sdot sdot sdot 119861119870119895minus1(119909))119879

119866119895119899 = 119899minus1119885119879

119895119885119895

Λ 119895119899 = 119866119895119899 + (120582119895119899)119863119879

119898119863119898 119885119895 = (119861minus119901+119896minus1(119909119894119895))119894119896 and blowast

119895is

the best 119871infin approximation of 119891119895

119887119895119886 (119909) = minus

119891(119901+1)

119895(119909)

(119901 + 1)

119870119895minus1

sum

119896=0

119868 (120581119895119896 le 119909 lt 120581119895119896+1)

times 119861119901+1(

119909 minus 120581119895119896

119870minus1119895

)

119862119895 (119909119895 | 119899 119870119895) =2

119899B119895(119909119895)

119879

119866minus1

119895119899119863119879

119895119898119863119895119898119866

minus1

119899B119895 (119909119895)

(35)

The above asymptotic bias and variance of 119891119895(119909119895) are sim-ilar to that of the penalized spline estimator in univariateregression with (119884119894 119909119894119895) 119894 = 1 119899 Furthermore

the asymptotic normality of [1198911(1199091) sdot sdot sdot 119891119889(119909119889)]119879

has beenshown by Yoshida and Naito [22] From their paper we findthat119891119894(119909119894) and119891119895(119909119895) are then asymptotically independent for119894 = 119895 This indicates some theoretical justification to select 120582119895in minimizing the MISE of 119891119895 Similar to the discussion inSection 3 the minimizer 120582119895opt of the MISE of 119891119895(119909119895) can beobtained for 119895 = 1 119889

120582119895opt =119899

2

int119887119895

119886119895119871119895119899 (119909 | 119891

(119901+1)

119895 blowast119895 1205902) 119889119909

int119887119895

119886119895119867119895119899 (119909 | blowast119895 ) 119889119909

(36)

where

119867119895119899 (119909 | blowast

119895) = B119895(119909119895)

119879

119866minus1

119895119899119863119879

119895119898119863119895119898blowast

119895

2

119871119895119899 (119909 | 119891(119901+1)

119895 blowast119895 1205902) = 2

119887119886 (119909)B119895(119909119895)119879

119866minus1

119895119899119863119879

119895119898119863119895119898blowast119895

119870119901+1

119895

+ 1205902119862119895 (119909 | 119899 119870119895)

(37)

Since 119891119895 blowast119895 and 1205902 are unknown they should be estimated

Thepilot estimators of119891119895 blowast119895 and1205902 are constructed based on

the regression splinemethod By using the pilot estimators119891119895b119895 of119891119895blowast119895 and the estimator of1205902 we construct the estimatorof 120582119895opt

119895opt =119899

2

sum119877

119903=1119871119895119899 (119911119903 | 119891

(119901+1)

119895 b119895 2)

sum119877

119903=1119867119895119899 (119911119903 | b119895)

(38)

where 119911119903119877

1is some finite grid point sequence on [119886119895 119887119895] We

obtain for 119895 = 1 119889 the penalized spline estimator 119891119895(119909119895)of 119891119895(119909119895)

119891119895 (119909119895) = B119895(119909119895)119879

b119895opt (39)

where b119895opt is the penalized spline estimator of b119895 using119895opt From Theorem 2 and the proof of Theorem 34of Yoshida and Naito [22] the asymptotic normality of[1198911(1199091) sdot sdot sdot 119891119889(119909119889)]

119879

using (1205821opt 120582119889opt) can be shown

Remark 7 Since the true regression functions are normalizedthe estimator 119891119895 should be also centered as

119891119895 (119909119895) minus1

119899

119899

sum

119894=1

119891119895 (119909119894119895) (40)

Remark 8 Thepenalized spline estimator of b1 b119889 can beobtained using a backfitting algorithm (Hastie and Tibshirani[30]) The backfitting algorithm for the penalized splines inadditive regression is detailed in Marx and Eilers [3] andYoshida and Naito [21]

Remark 9 Although we focus on the nonparametric additiveregression in this paper the directmethod can be also appliedto the generalized additive models However we omit thisdiscussion because the procedure is similar to the case of theadditive models discussed in this section

Remark 10 The direct method is quite computationally effi-cient when compared to the grid search method in additivemodels In grid searches we prepare the candidate of 120582119895 Let119872119895 be the set of all possible candidate grid value of 120582119895 for 119895 =1 119889 Then we need to compute the backfitting algorithm1198721 times sdot sdot sdot times 119872119889 times On the other hand it is sufficientto perform the backfitting algorithm for only two steps forthe pilot estimator and the final penalized spline estimatorThus compared with the conventional grid search methodthe direct method can drastically reduce computation time

5 Numerical Study

In this section we investigate the finite sample performanceof the proposed direct method in a Monte Carlo simulationLet us first consider the univariate regression model (1) forthe data (119884119894 119909119894) 119894 = 1 119899 Then we use the three

Journal of Probability and Statistics 7

Table 1 Results of sample MISE for 119899 = 50 and 119899 = 200 All entries for MISE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0526 0322 3684 1070 1160 0377L-GCV 0726 0621 5811 1082 1183 0379L-REML 2270 1544 9966 3520 1283 0873Local linear 0751 0220 4274 0901 1689 0503C-Direct 0401 0148 3027 0656 1044 0326C-GCV 0514 0137 3526 0666 1043 0290C-REML 1326 0732 8246 4213 3241 0835

Table 2 Results of sample MSE of smoothing parameter obtained by direct method GCV and REML for 119899 = 50 and 119899 = 200 All entries forMSE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0331 0193 0113 0043 0025 0037L-GCV 0842 0342 0445 0101 0072 0043L-REML 1070 1231 0842 0437 0143 0268C-Direct 0262 0014 0082 0045 0053 0014C-GCV 0452 0123 0252 0092 0085 0135C-REML 0855 0224 0426 0152 0093 0463

types of true regression function 119891(119909) = cos(120587(119909 minus 03))119891(119909) = 120601((119909 minus 08)005)radic005 minus 120601((119909 minus 02)004)radic004and 119891(119909) = 21199093 +3 sin(2120587(119909minus08)3) + 3 exp[minus(119909minus05)201]which are labeled by F1 F2 and F3 respectively Here120601(119909) is the density function of the normal distribution Theexplanatory 119909119894 and error 120576119894 are independently generated fromuniform distribution on [0 1] and 119873(0 052) respectivelyWe estimate each true regression function via the penalizedspline method We then use the linear and cubic 119861-splinebases with equidistant knots and the second order differencepenalty In addition we set 119870 = 5119899

25 equidistant knotsand the smoothing parameter is determined by the directmethodThe penalized spline estimator with the linear splineand cubic spline are denoted as L-Direct and C-Directrespectively For comparison with L-Direct the same studiesare also implemented for the penalized spline estimator witha linear spline and the smoothing parameter selected viaGCV and restricted maximum likelihood method (REML)in mixed model representation and the local polynomialestimator with normal kernel and Plug-in bandwidth (seeRuppert et al [24]) In GCV we set the candidate values of120582 as 11989410 119894 = 0 1 99 The above three estimators aredenoted by L-GCV L-REML and local linear respectivelyFurthermore we compare C-Direct with C-GCV and C-REML which are the penalized spline estimator with thecubic spline and the smoothing parameter determined byGCV and REML Let

sMISE = 1119869

119869

sum

119895=1

[1

119877

119877

sum

119903=1

119891119903 (119911119895) minus 119891 (119911119895)2

] (41)

be the sample MISE of any estimator 119891(119909) of 119891(119909) where119891119903(119911) is 119891(119911) with 119903th replication and 119911119895 = 119895119869 We calculatethe sample MISE of the penalized spline estimator withthe direct method GCV and REML and the local linearestimator In this simulation we use 119869 = 100 and 119877 = 1000We have simulated 119899 = 50 and 200

The sMISE of all estimators for each model and 119899 aregiven in Table 1 The penalized spline estimator using thedirect method shows good performance in each setting Incomparison with other smoothing parameter methods thedirect method is a little better than the GCV as a wholeHowever for 119899 = 200 C-GCV is better than C-Direct inF1 and F3 though its difference is very small We see thatthe sMISE of C-Direct is smaller than local linear whereaslocal linear behaves better than L-Direct in some case In F2C-Direct is the smallest of all estimators for 119899 = 50 and200 Although the performance totally seems to depend onsituation in which data sets are generated we believe that theproposed method is one of the efficient methods

Next the difference between opt and 120582opt is investigatedempirically Let opt119903 be the opt with 119903th replications for 119903 =1 1000 Then we calculate the sample MSE of opt

sMSE = 1119877

119877

sum

119903=1

opt119903 minus 120582opt2

(42)

for F1 F2 and F3 and 119899 = 50 and 200 To construct 120582opt forF1 F2 and F3 we use true119891(4)(119909) and 1205902 and an approximateblowast Here the approximate blowast means the sample average of bwith 119899 = 200 and 1000 replications

In Table 2 the sMSE of opt for each true functionare described For comparison the sMSE of the smoothing

8 Journal of Probability and Statistics

Table 3 Results of sample PSE for 119899 = 50 and 119899 = 200 All entries for PSE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0424 0052 0341 0070 0360 0117L-GCV 0331 0084 0381 0089 0333 0139L-REML 0642 0124 0831 1002 0733 0173Local linear 0751 0220 0474 0901 0489 0143C-Direct 0341 048 0237 063 0424 0086C-GCV 0314 043 0326 076 0533 0089C-REML 0863 1672 0456 121 1043 0125

parameter obtained via GCV and REML are calculated ThesMSE of the L-Direct and the C-Direct is small even in 119899 = 50for all true functions Therefore it seems that the accuracyof opt is guaranteed It indicates that the pilot estimatorconstructed via least squares method is not bad The sMSEwith the direct method are smaller than that with GCV andREML This result is not surprising since GCV and REMLare not concerned with 120582opt However together with Table 1it seems that the sample MSE of the smoothing parameter isreflected in the sample MISE of the estimator

The proposed direct method was derived based on theMISE On the other hand the GCV and the REML areobtained in context of minimizing prediction squared error(PSE) and prediction error Hence we compare the samplePSE of the penalized spline estimator with the direct methodGCV and REML and the local linear estimator Since theprediction error is almost the same as MISE (see Section 4 ofRuppert et al [4]) we omit the comparison of the predictionerror Let

sPSE = 1119899

119899

sum

119894=1

[1

119877

119877

sum

119903=1

119910119894119903 minus 119891119903 (119909119894)2

] (43)

be the sample PSE for any estimator 119891(119909) where119910119894119903 (119903 = 1 119877) is independently generated from119884119894 sim 119873(119891(119909119894) (05)

2) for all 119894

In Table 3 the modified sPSE |sPSE minus (05)2| of allestimators for each model and 119899 are described In Remark 11we discuss the modified sPSE From the result we canconfirm that the direct method has good predictability GCVcan be regarded as the estimator of sPSE Therefore in somecase sPSE with GCV is smaller than that with the directmethod It seems that the cause is the accuracy of theestimates of variance (see Remark 11) However its differenceis very small

We admit that the computational time (in second) takento obtain 119891(119909) for F1 119901 = 3 and 119899 = 200 The fits withthe direct method GCV and REML took 004 122 and 034Although the difference is small the computational time ofthe direct method was faster than that of GCV and REML

Next we confirm the behavior of the penalized splineestimator with the direct method in the additive model Forthe data (119884119894 1199091198941 1199091198942 1199091198943) 119894 = 1 119899 we assume theadditive model with true functions 1198911(1199091) = sin(21205871199091)

1198912(1199092) = 120601((1199092 minus 05)02) and 1198913(1199093) = 04120601((1199093 minus

01)02) + 06120601((1199092 minus 08)02) The error is similar to thefirst simulationThe design points (1199091198941 1199091198942 1199091198943) are generatedby

[

[

1199091198941

1199091198942

1199091198943

]

]

=

[[[

[

(1 + 120588 + 1205882)minus1

0 0

0 (1 + 2120588)minus1

0

0 0 (1 + 120588 + 1205882)minus1

]]]

]

times [

[

1 120588 1205882

120588 1 120588

1205882120588 1

]

]

[

[

1199111198941

1199111198942

1199111198943

]

]

(44)

where 119911119894119895 (119894 = 1 119899 119895 = 1 2 3) are generated inde-pendently from the uniform distribution on [0 1] In thissimulation we adopt 120588 = 0 and 120588 = 05 We then correctedto satisfy

int

1

0

119891119895 (119911) 119889119911 = 0 119895 = 1 2 3 (45)

in each 120588 We construct the penalized spline estimator viathe backfitting algorithm with the cubic spline and secondorder difference penalty The number of equidistant knots is119870119895 = 5119899

25 and the smoothing parameters are determinedusing the direct method The pilot estimator to construct119895opt is the regression spline estimator with fifth spline and119870119895 = 119899

25 We calculate the sample MISE of each 119891119895 for119899 = 200 and 1000 Monte Carlo iterations Then we calculatethe sample MISE of 1198911 1198912 and 1198913 In order to compare withthe direct method we also conduct the same simulation withGCV and REML In GCV we set the candidate values of 120582119895 as11989410 119894 = 0 1 9 for 119895 = 1 2 3 Table 4 summarizes thesample MISE of 119891119895 (119895 = 1 2 3) denoted by MISE119895 for directmethod GCV and REML The penalized spline estimatorwith the direct method performs like that with the GCV inboth the uncorrelated and correlated design cases For 120588 = 0the behaviors of MISE1 MISE2 and MISE3 with the directmethod are similar On the other hand for GCV the MISE1is slightly larger than MISE2 and MISE3 The direct methodleads to an efficient estimate of all covariates On the wholethe direct method is better than REML From above webelieve that the direct method is preferable in practice

Journal of Probability and Statistics 9

Table 4 Results of sample MISE of the penalized spline estimator For each direct method GCV and 120588 MISE1 MISE2 and MISE3 are thesample MISE of 1198911 1198912 and 1198913 respectively All entries for MISE are 102 times their actual values

Method 120588 = 0 120588 = 05

MISE1 MISE2 MISE3 MISE1 MISE2 MISE3Direct 0281 0209 0205 0916 0795 0972GCV 0687 0237 0390 0929 0857 1112REML 1204 0825 0621 2104 1832 2263

Table 5 Results of sample MSE of the selected smoothing parameter via direct method GCV and REML for 120588 = 0 and 120588 = 05 All entriesfor MSE are 102 times their actual values

120588 = 0 120588 = 05

True 1198911 1198912 1198913 1198911 1198912 1198913

Direct 0124 0042 0082 0312 0105 0428GCV 1315 0485 0213 0547 0352 0741REML 2531 1237 2656 1043 0846 1588

Table 6 Results of sample PSE of the penalized spline estimator Allentries for PSE are 102 times their actual values

119899 = 200 120588 = 0 120588 = 05

Direct 0281 0429GCV 0387 0467REML 1204 0925

To confirm the consistency of the direct method thesample MSE of 119895opt is calculated the same manner as thatgiven in univariate regression case For comparison thesample MSE of GCV and REML are obtained Here the true120582119895opt (119895 = 1 2 3) is defined in the same way as that forunivariate case In Table 5 the results for each 1198911 1198912 and 1198913and 120588 = 0 and 05 are shown We see from the results that thebehavior of the directmethod is goodThe sampleMSE of thedirect method is smaller than that of GCV and REML for all119895 For the random design 120588 = 05 such tendency can be alsoseen

Table 6 shows the sPSE of 119891(x119894) = 1198911(1199091198941) + 1198912(1199091198942) +1198913(1199091198943)with each smoothing parameter selectionmethodWesee from result that the efficiency of the direct method isguaranteed in the context of prediction accuracy

Finally we show the computational time (in second)required to construct the penalized spline estimatorwith eachmethod The computational times with the direct methodGCV and REML are 1187 s 12643 s and 435 s respectivelyWe see that the direct method is more efficient than othermethods in context of the computation (see Remark 11) Allof computations in the simulation were done using a softwareR and a computer with 340GHz CPU and 240GB memoryThough this is only few examples the direct method can beseen as one of the good methods to select the smoothingparameter

Remark 11 We calculate |sPSE minus (05)2| as the criteria toconfirm the prediction squared error The ordinal PSE isdefined as

PSE = 1119899

119899

sum

119894=1

119864 [119884119894 minus 119891 (119909119894)2

]

= 1205902+1

119899

119899

sum

119894=1

119864 [119891 (119909119894) minus 119891 (119909119894)2

]

(46)

where 119884119894rsquos are the test data The second term of the PSEis similar to MISE So it can be said that the sample PSEevaluates the accuracy of the variance part andMISE part Tosee the detail of the difference of the sample PSE between thedirect method and other method we calculated |sPSE minus 1205902|in this section

6 Discussion

In this paper we proposed a new direct method for deter-mining the smoothing parameter for a penalized splineestimator in a regression problemThe directmethod is basedon minimizing MISE of the penalized spline estimator Westudied the asymptotic property of the direct method Theasymptotic normality of the penalized spline estimator usingopt is theoretically guaranteed when the consistent estimatoris used as the pilot estimator to obtain opt In numericalstudy for the additive model the computational cost of thisdirect method is dramatically reduced when compared togrid search methods such as GCV Furthermore we find thatthe performance of the direct method is better than or at leastsimilar to that of other methods

The direct method will be developed for other regressionmodels such as varying-coefficient models Cox propor-tional hazards models single-index models and others ifthe asymptotic bias and variance of the penalized spline

10 Journal of Probability and Statistics

estimator are derived It is not limited to themean regressionit can be applied to quantile regression Actually Yoshida[31] has presented the asymptotic bias and variance of thepenalized spline estimator in univariate quantile regressionsFurthermore it is seen that improving the direct method isimportant for various situations and datasets In particularthe development of the determination of locally adaptive 120582 isan interesting avenue of further research

Appendix

We describe the technical details For the matrix 119860 = (119886119894119895)119894119895119860infin = max119894119895|119886119894119895| First to prove Theorems 1 and 2 weintroduce the fundamental property of penalized splines inthe following lemma

Lemma A1 Let 119860 = (119886119894119895)119894119895 be (119870 + 119901) matrix Supposethat 119870 = 119900(11989912) 120582 = 119900(1198991198701minus119898) and 119860infin = 119874(1) Then119860119866minus1

119899infin= 119874(119870) and 119860Λminus1

119899infin= 119874(119870)

The proof of Lemma A1 is shown in Zhou et al [28] andClaeskens et al [18] The repeated use of Lemma A1 yieldsthat the asymptotic order of the variance of the second termof (12) is 119874(1205822(119870119899)3) Actually the asymptotic order of thevariance of the second term of (12) can be calculated as

(120582

119899)

21205902

119899B(119909)119879Λminus1

119899119863119879

119898119863119898119866minus1

119899119863119879

119898119863119898Λminus1

119899B (119909)

= 119874(1205822

11989931198703)

(A1)

When 119870 = 119900(11989912) and 120582 = 119874(119899119870

1minus119898) 1205822(119870119899)3 =

119900(120582(119870119899)2) holds

Proof of Theorem 1 We write

119862 (119909 | 119899 119870 120582) =21205902

119899B(119909)119879119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909)

+21205902

119899B(119909)119879 Λminus1

119899minus 119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909) (A2)

Since

Λminus1

119899minus 119866minus1

119899= Λminus1

119899119868 minus Λ 119899119866

minus1

119899

=120582

119899Λminus1

119899119863119879

119898119863119898119866minus1

119899

(A3)

we have Λminus1119899minus 119866minus1

119899infin

= 119874(1205821198702119899) and hence the

second term of right hand side of (A2) can be shown119874((119870119899)

2(120582119870119899)) From Lemma A1 it is easy to derive that

119862(119909 | 119899 119870 0) = 119874(119870(119870119899)) Consequently we have

119862 (119909 | 119899 119870 120582)

119870=119862 (119909 | 119899 119870 0)

119870+ 119874((

119870

119899)

2

(120582

119899))

=119862 (119909 | 119899 119870 0)

119870+ 119900 (

119870

119899)

(A4)

and this leads to Theorem 1

Proof of Theorem 2 It is sufficient to prove that int1198871198861198631119899(119909 |

blowast)119889119909 = 119874(1198702(1minus119898)) and

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874(1198702

119899)

(A5)

Since100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

119889119909

100381610038161003816100381610038161003816100381610038161003816

le sup119909isin[119886119887]

[B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

] (119886 minus 119887)

(A6)

we have

int

119887

119886

1198631119899 (119909 | blowast) 119889119909 = 119874 (119870

2(1minus119898)) (A7)

by using Lemma A1 and the fact that 119863119898blowastinfin = 119874(119870minus119898)

(see Yoshida [31])From the property of119861-spline function for (119870+119901) square

matrix 119860 there exists 119862 gt 0 such that

int

119887

119886

B(119909)119879119860B (119909) 119889119909 le 119860infin int119887

119886

B(119909)119879B (119909) 119889119909

le 119860infin119862

(A8)

In other words the asymptotic order of int119887119886B(119909)119879119860B(119909)119889119909

and that of 119860infin are the same Let 119860119899 = 119866minus1

119899119863119879

119898119863119898119866minus1

119899

Then since 119860119899infin = 1198702 we obtain

int

119887

119886

119862 (119909 | 119899 119870 0) 119889119909 = 119874(1198702

119899) (A9)

Furthermore because sup119886isin[119886119887]

119887119886(119909) = 119874(1) we have

119870minus(119901+1)

100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

119887119886 (119909)B(119909)119879119866minus1

119899119863119879

119898119863119898blowast119889119909

100381610038161003816100381610038161003816100381610038161003816

= 119874 (119870minus(119901+119898)

)

(A10)

and hence it can be shown that

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874 (119899minus11198702) (A11)

Together with (A7) and (A11) 120582opt = 119874(119899119870119898minus119901minus2

+ 1198702119898)

can be obtained Then rate of convergence of the penalizedspline estimator with 120582opt is 119874(119899

minus2(119901+2)(2119901 + 3)) when 119870 =

119874(1198991(2119901+3)

) which is detailed in Yoshida and Naito [22]

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

Journal of Probability and Statistics 11

Acknowledgment

The author also would like to thank two anonymous refereesfor their careful reading and comments to improve the paper

References

[1] F OrsquoSullivan ldquoA statistical perspective on ill-posed inverseproblemsrdquo Statistical Science vol 1 no 4 pp 502ndash527 1986

[2] P H C Eilers and B D Marx ldquoFlexible smoothing with 119861-splines and penaltiesrdquo Statistical Science vol 11 no 2 pp 89ndash121 1996 With comments and a rejoinder by the authors

[3] B D Marx and P H C Eilers ldquoDirect generalized additivemodeling with penalized likelihoodrdquo Computational Statisticsand Data Analysis vol 28 no 2 pp 193ndash209 1998

[4] D Ruppert M P Wand and R J Carroll SemiparametricRegression Cambridge University Press Cambridge UK 2003

[5] T Krivobokova ldquoSmoothing parameter selection in two frame-works for penalized splinesrdquo Journal of the Royal StatisticalSociety B vol 75 no 4 pp 725ndash741 2013

[6] P T Reiss and R T Ogden ldquoSmoothing parameter selection fora class of semiparametric linear modelsrdquo Journal of the RoyalStatistical Society B vol 71 no 2 pp 505ndash523 2009

[7] S N Wood ldquoModelling and smoothing parameter estimationwith multiple quadratic penaltiesrdquo Journal of the Royal Statisti-cal Society B vol 62 no 2 pp 413ndash428 2000

[8] S NWood ldquoStable and efficient multiple smoothing parameterestimation for generalized additive modelsrdquo Journal of theAmerican Statistical Association vol 99 no 467 pp 673ndash6862004

[9] S N Wood ldquoFast stable direct fitting and smoothness selectionfor generalized additive modelsrdquo Journal of the Royal StatisticalSociety B vol 70 no 3 pp 495ndash518 2008

[10] X Lin and D Zhang ldquoInference in generalized additive mixedmodels by using smoothing splinesrdquo Journal of the RoyalStatistical Society B vol 61 no 2 pp 381ndash400 1999

[11] M P Wand ldquoSmoothing and mixed modelsrdquo ComputationalStatistics vol 18 no 2 pp 223ndash249 2003

[12] L Fahrmeir T Kneib and S Lang ldquoPenalized structuredadditive regression for space-time data a Bayesian perspectiverdquoStatistica Sinica vol 14 no 3 pp 731ndash761 2004

[13] L Fahrmeir and T Kneib ldquoPropriety of posteriors in structuredadditive regression models theory and empirical evidencerdquoJournal of Statistical Planning and Inference vol 139 no 3 pp843ndash859 2009

[14] F Heinzl L Fahrmeir and T Kneib ldquoAdditive mixed modelswith Dirichlet process mixture and P-spline priorsrdquo Advancesin Statistical Analysis vol 96 no 1 pp 47ndash68 2012

[15] G Kauermann ldquoA note on smoothing parameter selection forpenalized spline smoothingrdquo Journal of Statistical Planning andInference vol 127 no 1-2 pp 53ndash69 2005

[16] P Hall and J D Opsomer ldquoTheory for penalised splineregressionrdquo Biometrika vol 92 no 1 pp 105ndash118 2005

[17] Y Li and D Ruppert ldquoOn the asymptotics of penalized splinesrdquoBiometrika vol 95 no 2 pp 415ndash436 2008

[18] G Claeskens T Krivobokova and J D Opsomer ldquoAsymptoticproperties of penalized spline estimatorsrdquo Biometrika vol 96no 3 pp 529ndash544 2009

[19] G Kauermann T Krivobokova and L Fahrmeir ldquoSomeasymptotic results on generalized penalized spline smoothingrdquo

Journal of the Royal Statistical Society B vol 71 no 2 pp 487ndash503 2009

[20] X Wang J Shen and D Ruppert ldquoOn the asymptotics ofpenalized spline smoothingrdquo Electronic Journal of Statistics vol5 pp 1ndash17 2011

[21] T Yoshida and K Naito ldquoAsymptotics for penalized additive 119861-spline regressionrdquo Journal of the Japan Statistical Society vol 42no 1 pp 81ndash107 2012

[22] T Yoshida and K Naito ldquoAsymptotics for penalized splinesin generalized additive modelsrdquo Journal of NonparametricStatistics vol 26 no 2 pp 269ndash289 2014

[23] L Xiao Y Li and D Ruppert ldquoFast bivariate 119875-splines thesandwich smootherrdquo Journal of the Royal Statistical Society Bvol 75 no 3 pp 577ndash599 2013

[24] D Ruppert S J Sheather and M P Wand ldquoAn effectivebandwidth selector for local least squares regressionrdquo Journal ofthe American Statistical Association vol 90 no 432 pp 1257ndash1270 1995

[25] M P Wand and M C Jones Kernel Smoothing Chapman ampHall London UK 1995

[26] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[27] D Ruppert ldquoSelecting the number of knots for penalizedsplinesrdquo Journal of Computational and Graphical Statistics vol11 no 4 pp 735ndash757 2002

[28] S Zhou X Shen and D A Wolfe ldquoLocal asymptotics forregression splines and confidence regionsrdquo The Annals ofStatistics vol 26 no 5 pp 1760ndash1782 1998

[29] G K Robinson ldquoThat BLUP is a good thing the estimation ofrandom effectsrdquo Statistical Science vol 6 no 1 pp 15ndash32 1991

[30] T J Hastie and R J Tibshirani Generalized Additive ModelsChapman amp Hall London UKI 1990

[31] T Yoshida ldquoAsymptotics for penalized spline estimators inquantile regressionrdquo Communications in Statistics-Theory andMethods 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 5: Research Article Direct Determination of Smoothing

Journal of Probability and Statistics 5

Remark 4 OrsquoSullivan [1] used 120583int119887119886119904(119898)(119909)2119889119909 as the penalty

term where 120583 is the smoothing parameter When equidistantknots are used the penaltyint119887

119886119904(119898)(119909)2119889119909 can be expressed as

1198702119898b119879119863119879

119898119877119863119898b where 119877 = (int

119887

119886119861[119901minus119898]

119894(119909)119861[119901minus119898]

119895(119909))119894119895 The

penalty b119879119863119879119898119863119898b provided by Eilers and Marx [2] can be

seen as the simple version of1198702119898b119879119863119879119898119877119863119898b by replacing 119877

with 119870minus1119868 and 120582 = 1205831198702119898minus1 where 119868 is the identity matrixSo the order 119898 of the difference matrix 119863119898 controls thesmoothness of the 119901th 119861-spline function 119904(119909) and hence 119898should be set such that119898 le 119901 to give a theoretical justificationalthough the penalized spline estimator can be also calculatedfor 119898 gt 119901 Actually (119901119898) = (3 2) is often used by manyauthors

Remark 5 The penalized spline regression is often consid-ered as the mixed model representation (see Ruppert et al[4]) In this frame work we use the 119901th truncated splinemodel

119904 (119909) = 1205730 + 1205731119909 + sdot sdot sdot + 120573119901119909119901+

119870

sum

119895=1

119906119896(119909 minus 120581119896)119901

+ (25)

where (119909 minus 119886)+ = max119909 minus 119886 0 and the 120573rsquos are unknownparameters Each 119906119896 is independently distributed as 119906119896 sim119873(0 120590

2

119906) where 1205902

119906is an unknown variance parameter of 119906119896

The penalized spline fit of 119891(119909) is defined as the estimatedBLUP (see Robinson [29]) The smoothing parameter in theordinal spline regression model corresponds to 12059021205902

119906in the

spline mixed model Since the 1205902 and 1205902119906are estimated using

the ML or REML method we do not need to choose thesmoothing parameter via a grid search It is known that theestimated BLUP fit is linked to the penalized spline estimator(9) with 119898 = 119901 + 1 (see Kauermann et al [19]) Hence theestimated BLUP tends to have a theoretically underfit (seeRemark 4)

Remark 6 From Lyapunovrsquos theorem the asymptotic nor-mality of the penalized spline estimator119891(119909)with opt can bederived under the same assumption as Theorem 2 and someadditional mild conditions Although the proof is omittedit is straightforward since opt is the consistent estimator of120582opt and the asymptotic order of 120582opt satisfies Lyapunovrsquoscondition

4 Extension to Additive Models

We extend the direct method to the regression model withmultidimensional explanatory variables In particular weconsider additive models in this section For the dataset(119884119894 x119894) 119894 = 1 119899 with 1-dimensional response 119884119894 and119889-dimensional explanatory x119894 = (1199091198941 119909119894119889) the additivemodels are connected via unknown regression functions119891119895 (119895 = 1 119889) and the mean parameter 120583 such that

119884119894 = 120583 + 1198911 (1199091198941) + sdot sdot sdot + 119891119889 (119909119894119889) + 120576119894 (26)

We assume that 119909119894119895 is located on an interval [119886119895 119887119895] and1198911 119891119889 are normalized as

int

119887119895

119886119895

119891119895 (119906) 119889119906 = 0 119895 = 1 119889 (27)

to ensure the identifiability of 119891119895 Then the intercept 120583 istypically estimated via

120583 =1

119899

119899

sum

119894=1

119884119894 (28)

Hence we replace 119884119894 with 119884119894 minus 120583 in (26) and set 120583 = 0redefining the additive model as

119884119894 = 1198911 (1199091198941) + sdot sdot sdot + 119891119889 (119909119894119889) + 120576119894 119894 = 1 119899 (29)

where each 119884119894 is centered We aim to estimate 119891119895 via apenalized spline method Let

119904119895 (119909) =

119870119895minus1

sum

119896=minus119901

119861119895119896 (119909) 119887119895119896 119895 = 1 119889 (30)

be the 119861-spline model where 119861119895119896(119909) is the 119901th 119861-splinebasis function with knots 120581119895119896 | 119896 = minus119901 119870119895 + 119901 +

1 1205811198950 = 119886119895 120581119895119870119895+1= 119887119895 119895 = 1 119889 and 119887119895119896rsquos are unknown

parameters We consider the 119861-spline additive models

119884119894 = 1199041 (1199091198941) + sdot sdot sdot + 119904119889 (119909119894119889) + 120576119894 119894 = 1 119899 (31)

and estimate 119904119895 and 119887119895119896 For 119895 = 1 119889 the penalized spline

estimator b119895 = (119895minus119901 sdot sdot sdot 119895119870minus1)119879

of b119895 = (119887119895minus119901 sdot sdot sdot 119887119895119870minus1)119879

is defined as

[[

[

b1b119889

]]

]

= argminb1b119889

119899

sum

119894=1

119884119894 minus

119889

sum

119895=1

119870119895

sum

119896=minus119901

119861119895119896 (119909119894) 119887119895119896

2

+

119889

sum

119895=1

120582119895b119879

119895119863119879

119895119898119863119895119898b119895

(32)

where 120582119895 are the smoothing parameters and 119863119895119898 is the 119898thorder difference matrix with size (119870119895 + 119901 minus 119898) times (119870119895 + 119901) for119895 = 1 119889 By using b119895 the penalized spline estimator of119891119895(119909119895) is defined as

119891119895 (119909119895) =

119870119895minus1

sum

119896=minus119901

119861119895119896 (119909119895) 119895119896 119895 = 1 119889 (33)

6 Journal of Probability and Statistics

The asymptotics for 119891119895(119909119895) have been studied by Yoshidaand Naito [22] who derived the asymptotic bias and varianceto be

119864 [119891119895 (119909119895)] = 119891119895 (119909119895) + 119870minus(119901+1)

119895119887119895119886 (119909119895) (1 + 119900 (1))

+

120582119895

119899B119895(119909119895)

119879

Λminus1

119895119899119863119879

119895119898119863119895119898blowast

119895+ 119900(

1205821198951198701minus119898

119895

119899)

119881 [119891119895 (119909119895)] =1205902

119899B119895(119909119895)

119879

119866minus1

119895119899B119895 (119909119895)

minus120582119870

119899

1205902119862119895 (119909119895 | 119899 119870119895)

119870+ 119900(

1205821198951198702

119895

1198992)

(34)

where B119895(119909) = (119861minus119901(119909) sdot sdot sdot 119861119870119895minus1(119909))119879

119866119895119899 = 119899minus1119885119879

119895119885119895

Λ 119895119899 = 119866119895119899 + (120582119895119899)119863119879

119898119863119898 119885119895 = (119861minus119901+119896minus1(119909119894119895))119894119896 and blowast

119895is

the best 119871infin approximation of 119891119895

119887119895119886 (119909) = minus

119891(119901+1)

119895(119909)

(119901 + 1)

119870119895minus1

sum

119896=0

119868 (120581119895119896 le 119909 lt 120581119895119896+1)

times 119861119901+1(

119909 minus 120581119895119896

119870minus1119895

)

119862119895 (119909119895 | 119899 119870119895) =2

119899B119895(119909119895)

119879

119866minus1

119895119899119863119879

119895119898119863119895119898119866

minus1

119899B119895 (119909119895)

(35)

The above asymptotic bias and variance of 119891119895(119909119895) are sim-ilar to that of the penalized spline estimator in univariateregression with (119884119894 119909119894119895) 119894 = 1 119899 Furthermore

the asymptotic normality of [1198911(1199091) sdot sdot sdot 119891119889(119909119889)]119879

has beenshown by Yoshida and Naito [22] From their paper we findthat119891119894(119909119894) and119891119895(119909119895) are then asymptotically independent for119894 = 119895 This indicates some theoretical justification to select 120582119895in minimizing the MISE of 119891119895 Similar to the discussion inSection 3 the minimizer 120582119895opt of the MISE of 119891119895(119909119895) can beobtained for 119895 = 1 119889

120582119895opt =119899

2

int119887119895

119886119895119871119895119899 (119909 | 119891

(119901+1)

119895 blowast119895 1205902) 119889119909

int119887119895

119886119895119867119895119899 (119909 | blowast119895 ) 119889119909

(36)

where

119867119895119899 (119909 | blowast

119895) = B119895(119909119895)

119879

119866minus1

119895119899119863119879

119895119898119863119895119898blowast

119895

2

119871119895119899 (119909 | 119891(119901+1)

119895 blowast119895 1205902) = 2

119887119886 (119909)B119895(119909119895)119879

119866minus1

119895119899119863119879

119895119898119863119895119898blowast119895

119870119901+1

119895

+ 1205902119862119895 (119909 | 119899 119870119895)

(37)

Since 119891119895 blowast119895 and 1205902 are unknown they should be estimated

Thepilot estimators of119891119895 blowast119895 and1205902 are constructed based on

the regression splinemethod By using the pilot estimators119891119895b119895 of119891119895blowast119895 and the estimator of1205902 we construct the estimatorof 120582119895opt

119895opt =119899

2

sum119877

119903=1119871119895119899 (119911119903 | 119891

(119901+1)

119895 b119895 2)

sum119877

119903=1119867119895119899 (119911119903 | b119895)

(38)

where 119911119903119877

1is some finite grid point sequence on [119886119895 119887119895] We

obtain for 119895 = 1 119889 the penalized spline estimator 119891119895(119909119895)of 119891119895(119909119895)

119891119895 (119909119895) = B119895(119909119895)119879

b119895opt (39)

where b119895opt is the penalized spline estimator of b119895 using119895opt From Theorem 2 and the proof of Theorem 34of Yoshida and Naito [22] the asymptotic normality of[1198911(1199091) sdot sdot sdot 119891119889(119909119889)]

119879

using (1205821opt 120582119889opt) can be shown

Remark 7 Since the true regression functions are normalizedthe estimator 119891119895 should be also centered as

119891119895 (119909119895) minus1

119899

119899

sum

119894=1

119891119895 (119909119894119895) (40)

Remark 8 Thepenalized spline estimator of b1 b119889 can beobtained using a backfitting algorithm (Hastie and Tibshirani[30]) The backfitting algorithm for the penalized splines inadditive regression is detailed in Marx and Eilers [3] andYoshida and Naito [21]

Remark 9 Although we focus on the nonparametric additiveregression in this paper the directmethod can be also appliedto the generalized additive models However we omit thisdiscussion because the procedure is similar to the case of theadditive models discussed in this section

Remark 10 The direct method is quite computationally effi-cient when compared to the grid search method in additivemodels In grid searches we prepare the candidate of 120582119895 Let119872119895 be the set of all possible candidate grid value of 120582119895 for 119895 =1 119889 Then we need to compute the backfitting algorithm1198721 times sdot sdot sdot times 119872119889 times On the other hand it is sufficientto perform the backfitting algorithm for only two steps forthe pilot estimator and the final penalized spline estimatorThus compared with the conventional grid search methodthe direct method can drastically reduce computation time

5 Numerical Study

In this section we investigate the finite sample performanceof the proposed direct method in a Monte Carlo simulationLet us first consider the univariate regression model (1) forthe data (119884119894 119909119894) 119894 = 1 119899 Then we use the three

Journal of Probability and Statistics 7

Table 1 Results of sample MISE for 119899 = 50 and 119899 = 200 All entries for MISE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0526 0322 3684 1070 1160 0377L-GCV 0726 0621 5811 1082 1183 0379L-REML 2270 1544 9966 3520 1283 0873Local linear 0751 0220 4274 0901 1689 0503C-Direct 0401 0148 3027 0656 1044 0326C-GCV 0514 0137 3526 0666 1043 0290C-REML 1326 0732 8246 4213 3241 0835

Table 2 Results of sample MSE of smoothing parameter obtained by direct method GCV and REML for 119899 = 50 and 119899 = 200 All entries forMSE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0331 0193 0113 0043 0025 0037L-GCV 0842 0342 0445 0101 0072 0043L-REML 1070 1231 0842 0437 0143 0268C-Direct 0262 0014 0082 0045 0053 0014C-GCV 0452 0123 0252 0092 0085 0135C-REML 0855 0224 0426 0152 0093 0463

types of true regression function 119891(119909) = cos(120587(119909 minus 03))119891(119909) = 120601((119909 minus 08)005)radic005 minus 120601((119909 minus 02)004)radic004and 119891(119909) = 21199093 +3 sin(2120587(119909minus08)3) + 3 exp[minus(119909minus05)201]which are labeled by F1 F2 and F3 respectively Here120601(119909) is the density function of the normal distribution Theexplanatory 119909119894 and error 120576119894 are independently generated fromuniform distribution on [0 1] and 119873(0 052) respectivelyWe estimate each true regression function via the penalizedspline method We then use the linear and cubic 119861-splinebases with equidistant knots and the second order differencepenalty In addition we set 119870 = 5119899

25 equidistant knotsand the smoothing parameter is determined by the directmethodThe penalized spline estimator with the linear splineand cubic spline are denoted as L-Direct and C-Directrespectively For comparison with L-Direct the same studiesare also implemented for the penalized spline estimator witha linear spline and the smoothing parameter selected viaGCV and restricted maximum likelihood method (REML)in mixed model representation and the local polynomialestimator with normal kernel and Plug-in bandwidth (seeRuppert et al [24]) In GCV we set the candidate values of120582 as 11989410 119894 = 0 1 99 The above three estimators aredenoted by L-GCV L-REML and local linear respectivelyFurthermore we compare C-Direct with C-GCV and C-REML which are the penalized spline estimator with thecubic spline and the smoothing parameter determined byGCV and REML Let

sMISE = 1119869

119869

sum

119895=1

[1

119877

119877

sum

119903=1

119891119903 (119911119895) minus 119891 (119911119895)2

] (41)

be the sample MISE of any estimator 119891(119909) of 119891(119909) where119891119903(119911) is 119891(119911) with 119903th replication and 119911119895 = 119895119869 We calculatethe sample MISE of the penalized spline estimator withthe direct method GCV and REML and the local linearestimator In this simulation we use 119869 = 100 and 119877 = 1000We have simulated 119899 = 50 and 200

The sMISE of all estimators for each model and 119899 aregiven in Table 1 The penalized spline estimator using thedirect method shows good performance in each setting Incomparison with other smoothing parameter methods thedirect method is a little better than the GCV as a wholeHowever for 119899 = 200 C-GCV is better than C-Direct inF1 and F3 though its difference is very small We see thatthe sMISE of C-Direct is smaller than local linear whereaslocal linear behaves better than L-Direct in some case In F2C-Direct is the smallest of all estimators for 119899 = 50 and200 Although the performance totally seems to depend onsituation in which data sets are generated we believe that theproposed method is one of the efficient methods

Next the difference between opt and 120582opt is investigatedempirically Let opt119903 be the opt with 119903th replications for 119903 =1 1000 Then we calculate the sample MSE of opt

sMSE = 1119877

119877

sum

119903=1

opt119903 minus 120582opt2

(42)

for F1 F2 and F3 and 119899 = 50 and 200 To construct 120582opt forF1 F2 and F3 we use true119891(4)(119909) and 1205902 and an approximateblowast Here the approximate blowast means the sample average of bwith 119899 = 200 and 1000 replications

In Table 2 the sMSE of opt for each true functionare described For comparison the sMSE of the smoothing

8 Journal of Probability and Statistics

Table 3 Results of sample PSE for 119899 = 50 and 119899 = 200 All entries for PSE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0424 0052 0341 0070 0360 0117L-GCV 0331 0084 0381 0089 0333 0139L-REML 0642 0124 0831 1002 0733 0173Local linear 0751 0220 0474 0901 0489 0143C-Direct 0341 048 0237 063 0424 0086C-GCV 0314 043 0326 076 0533 0089C-REML 0863 1672 0456 121 1043 0125

parameter obtained via GCV and REML are calculated ThesMSE of the L-Direct and the C-Direct is small even in 119899 = 50for all true functions Therefore it seems that the accuracyof opt is guaranteed It indicates that the pilot estimatorconstructed via least squares method is not bad The sMSEwith the direct method are smaller than that with GCV andREML This result is not surprising since GCV and REMLare not concerned with 120582opt However together with Table 1it seems that the sample MSE of the smoothing parameter isreflected in the sample MISE of the estimator

The proposed direct method was derived based on theMISE On the other hand the GCV and the REML areobtained in context of minimizing prediction squared error(PSE) and prediction error Hence we compare the samplePSE of the penalized spline estimator with the direct methodGCV and REML and the local linear estimator Since theprediction error is almost the same as MISE (see Section 4 ofRuppert et al [4]) we omit the comparison of the predictionerror Let

sPSE = 1119899

119899

sum

119894=1

[1

119877

119877

sum

119903=1

119910119894119903 minus 119891119903 (119909119894)2

] (43)

be the sample PSE for any estimator 119891(119909) where119910119894119903 (119903 = 1 119877) is independently generated from119884119894 sim 119873(119891(119909119894) (05)

2) for all 119894

In Table 3 the modified sPSE |sPSE minus (05)2| of allestimators for each model and 119899 are described In Remark 11we discuss the modified sPSE From the result we canconfirm that the direct method has good predictability GCVcan be regarded as the estimator of sPSE Therefore in somecase sPSE with GCV is smaller than that with the directmethod It seems that the cause is the accuracy of theestimates of variance (see Remark 11) However its differenceis very small

We admit that the computational time (in second) takento obtain 119891(119909) for F1 119901 = 3 and 119899 = 200 The fits withthe direct method GCV and REML took 004 122 and 034Although the difference is small the computational time ofthe direct method was faster than that of GCV and REML

Next we confirm the behavior of the penalized splineestimator with the direct method in the additive model Forthe data (119884119894 1199091198941 1199091198942 1199091198943) 119894 = 1 119899 we assume theadditive model with true functions 1198911(1199091) = sin(21205871199091)

1198912(1199092) = 120601((1199092 minus 05)02) and 1198913(1199093) = 04120601((1199093 minus

01)02) + 06120601((1199092 minus 08)02) The error is similar to thefirst simulationThe design points (1199091198941 1199091198942 1199091198943) are generatedby

[

[

1199091198941

1199091198942

1199091198943

]

]

=

[[[

[

(1 + 120588 + 1205882)minus1

0 0

0 (1 + 2120588)minus1

0

0 0 (1 + 120588 + 1205882)minus1

]]]

]

times [

[

1 120588 1205882

120588 1 120588

1205882120588 1

]

]

[

[

1199111198941

1199111198942

1199111198943

]

]

(44)

where 119911119894119895 (119894 = 1 119899 119895 = 1 2 3) are generated inde-pendently from the uniform distribution on [0 1] In thissimulation we adopt 120588 = 0 and 120588 = 05 We then correctedto satisfy

int

1

0

119891119895 (119911) 119889119911 = 0 119895 = 1 2 3 (45)

in each 120588 We construct the penalized spline estimator viathe backfitting algorithm with the cubic spline and secondorder difference penalty The number of equidistant knots is119870119895 = 5119899

25 and the smoothing parameters are determinedusing the direct method The pilot estimator to construct119895opt is the regression spline estimator with fifth spline and119870119895 = 119899

25 We calculate the sample MISE of each 119891119895 for119899 = 200 and 1000 Monte Carlo iterations Then we calculatethe sample MISE of 1198911 1198912 and 1198913 In order to compare withthe direct method we also conduct the same simulation withGCV and REML In GCV we set the candidate values of 120582119895 as11989410 119894 = 0 1 9 for 119895 = 1 2 3 Table 4 summarizes thesample MISE of 119891119895 (119895 = 1 2 3) denoted by MISE119895 for directmethod GCV and REML The penalized spline estimatorwith the direct method performs like that with the GCV inboth the uncorrelated and correlated design cases For 120588 = 0the behaviors of MISE1 MISE2 and MISE3 with the directmethod are similar On the other hand for GCV the MISE1is slightly larger than MISE2 and MISE3 The direct methodleads to an efficient estimate of all covariates On the wholethe direct method is better than REML From above webelieve that the direct method is preferable in practice

Journal of Probability and Statistics 9

Table 4 Results of sample MISE of the penalized spline estimator For each direct method GCV and 120588 MISE1 MISE2 and MISE3 are thesample MISE of 1198911 1198912 and 1198913 respectively All entries for MISE are 102 times their actual values

Method 120588 = 0 120588 = 05

MISE1 MISE2 MISE3 MISE1 MISE2 MISE3Direct 0281 0209 0205 0916 0795 0972GCV 0687 0237 0390 0929 0857 1112REML 1204 0825 0621 2104 1832 2263

Table 5 Results of sample MSE of the selected smoothing parameter via direct method GCV and REML for 120588 = 0 and 120588 = 05 All entriesfor MSE are 102 times their actual values

120588 = 0 120588 = 05

True 1198911 1198912 1198913 1198911 1198912 1198913

Direct 0124 0042 0082 0312 0105 0428GCV 1315 0485 0213 0547 0352 0741REML 2531 1237 2656 1043 0846 1588

Table 6 Results of sample PSE of the penalized spline estimator Allentries for PSE are 102 times their actual values

119899 = 200 120588 = 0 120588 = 05

Direct 0281 0429GCV 0387 0467REML 1204 0925

To confirm the consistency of the direct method thesample MSE of 119895opt is calculated the same manner as thatgiven in univariate regression case For comparison thesample MSE of GCV and REML are obtained Here the true120582119895opt (119895 = 1 2 3) is defined in the same way as that forunivariate case In Table 5 the results for each 1198911 1198912 and 1198913and 120588 = 0 and 05 are shown We see from the results that thebehavior of the directmethod is goodThe sampleMSE of thedirect method is smaller than that of GCV and REML for all119895 For the random design 120588 = 05 such tendency can be alsoseen

Table 6 shows the sPSE of 119891(x119894) = 1198911(1199091198941) + 1198912(1199091198942) +1198913(1199091198943)with each smoothing parameter selectionmethodWesee from result that the efficiency of the direct method isguaranteed in the context of prediction accuracy

Finally we show the computational time (in second)required to construct the penalized spline estimatorwith eachmethod The computational times with the direct methodGCV and REML are 1187 s 12643 s and 435 s respectivelyWe see that the direct method is more efficient than othermethods in context of the computation (see Remark 11) Allof computations in the simulation were done using a softwareR and a computer with 340GHz CPU and 240GB memoryThough this is only few examples the direct method can beseen as one of the good methods to select the smoothingparameter

Remark 11 We calculate |sPSE minus (05)2| as the criteria toconfirm the prediction squared error The ordinal PSE isdefined as

PSE = 1119899

119899

sum

119894=1

119864 [119884119894 minus 119891 (119909119894)2

]

= 1205902+1

119899

119899

sum

119894=1

119864 [119891 (119909119894) minus 119891 (119909119894)2

]

(46)

where 119884119894rsquos are the test data The second term of the PSEis similar to MISE So it can be said that the sample PSEevaluates the accuracy of the variance part andMISE part Tosee the detail of the difference of the sample PSE between thedirect method and other method we calculated |sPSE minus 1205902|in this section

6 Discussion

In this paper we proposed a new direct method for deter-mining the smoothing parameter for a penalized splineestimator in a regression problemThe directmethod is basedon minimizing MISE of the penalized spline estimator Westudied the asymptotic property of the direct method Theasymptotic normality of the penalized spline estimator usingopt is theoretically guaranteed when the consistent estimatoris used as the pilot estimator to obtain opt In numericalstudy for the additive model the computational cost of thisdirect method is dramatically reduced when compared togrid search methods such as GCV Furthermore we find thatthe performance of the direct method is better than or at leastsimilar to that of other methods

The direct method will be developed for other regressionmodels such as varying-coefficient models Cox propor-tional hazards models single-index models and others ifthe asymptotic bias and variance of the penalized spline

10 Journal of Probability and Statistics

estimator are derived It is not limited to themean regressionit can be applied to quantile regression Actually Yoshida[31] has presented the asymptotic bias and variance of thepenalized spline estimator in univariate quantile regressionsFurthermore it is seen that improving the direct method isimportant for various situations and datasets In particularthe development of the determination of locally adaptive 120582 isan interesting avenue of further research

Appendix

We describe the technical details For the matrix 119860 = (119886119894119895)119894119895119860infin = max119894119895|119886119894119895| First to prove Theorems 1 and 2 weintroduce the fundamental property of penalized splines inthe following lemma

Lemma A1 Let 119860 = (119886119894119895)119894119895 be (119870 + 119901) matrix Supposethat 119870 = 119900(11989912) 120582 = 119900(1198991198701minus119898) and 119860infin = 119874(1) Then119860119866minus1

119899infin= 119874(119870) and 119860Λminus1

119899infin= 119874(119870)

The proof of Lemma A1 is shown in Zhou et al [28] andClaeskens et al [18] The repeated use of Lemma A1 yieldsthat the asymptotic order of the variance of the second termof (12) is 119874(1205822(119870119899)3) Actually the asymptotic order of thevariance of the second term of (12) can be calculated as

(120582

119899)

21205902

119899B(119909)119879Λminus1

119899119863119879

119898119863119898119866minus1

119899119863119879

119898119863119898Λminus1

119899B (119909)

= 119874(1205822

11989931198703)

(A1)

When 119870 = 119900(11989912) and 120582 = 119874(119899119870

1minus119898) 1205822(119870119899)3 =

119900(120582(119870119899)2) holds

Proof of Theorem 1 We write

119862 (119909 | 119899 119870 120582) =21205902

119899B(119909)119879119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909)

+21205902

119899B(119909)119879 Λminus1

119899minus 119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909) (A2)

Since

Λminus1

119899minus 119866minus1

119899= Λminus1

119899119868 minus Λ 119899119866

minus1

119899

=120582

119899Λminus1

119899119863119879

119898119863119898119866minus1

119899

(A3)

we have Λminus1119899minus 119866minus1

119899infin

= 119874(1205821198702119899) and hence the

second term of right hand side of (A2) can be shown119874((119870119899)

2(120582119870119899)) From Lemma A1 it is easy to derive that

119862(119909 | 119899 119870 0) = 119874(119870(119870119899)) Consequently we have

119862 (119909 | 119899 119870 120582)

119870=119862 (119909 | 119899 119870 0)

119870+ 119874((

119870

119899)

2

(120582

119899))

=119862 (119909 | 119899 119870 0)

119870+ 119900 (

119870

119899)

(A4)

and this leads to Theorem 1

Proof of Theorem 2 It is sufficient to prove that int1198871198861198631119899(119909 |

blowast)119889119909 = 119874(1198702(1minus119898)) and

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874(1198702

119899)

(A5)

Since100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

119889119909

100381610038161003816100381610038161003816100381610038161003816

le sup119909isin[119886119887]

[B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

] (119886 minus 119887)

(A6)

we have

int

119887

119886

1198631119899 (119909 | blowast) 119889119909 = 119874 (119870

2(1minus119898)) (A7)

by using Lemma A1 and the fact that 119863119898blowastinfin = 119874(119870minus119898)

(see Yoshida [31])From the property of119861-spline function for (119870+119901) square

matrix 119860 there exists 119862 gt 0 such that

int

119887

119886

B(119909)119879119860B (119909) 119889119909 le 119860infin int119887

119886

B(119909)119879B (119909) 119889119909

le 119860infin119862

(A8)

In other words the asymptotic order of int119887119886B(119909)119879119860B(119909)119889119909

and that of 119860infin are the same Let 119860119899 = 119866minus1

119899119863119879

119898119863119898119866minus1

119899

Then since 119860119899infin = 1198702 we obtain

int

119887

119886

119862 (119909 | 119899 119870 0) 119889119909 = 119874(1198702

119899) (A9)

Furthermore because sup119886isin[119886119887]

119887119886(119909) = 119874(1) we have

119870minus(119901+1)

100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

119887119886 (119909)B(119909)119879119866minus1

119899119863119879

119898119863119898blowast119889119909

100381610038161003816100381610038161003816100381610038161003816

= 119874 (119870minus(119901+119898)

)

(A10)

and hence it can be shown that

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874 (119899minus11198702) (A11)

Together with (A7) and (A11) 120582opt = 119874(119899119870119898minus119901minus2

+ 1198702119898)

can be obtained Then rate of convergence of the penalizedspline estimator with 120582opt is 119874(119899

minus2(119901+2)(2119901 + 3)) when 119870 =

119874(1198991(2119901+3)

) which is detailed in Yoshida and Naito [22]

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

Journal of Probability and Statistics 11

Acknowledgment

The author also would like to thank two anonymous refereesfor their careful reading and comments to improve the paper

References

[1] F OrsquoSullivan ldquoA statistical perspective on ill-posed inverseproblemsrdquo Statistical Science vol 1 no 4 pp 502ndash527 1986

[2] P H C Eilers and B D Marx ldquoFlexible smoothing with 119861-splines and penaltiesrdquo Statistical Science vol 11 no 2 pp 89ndash121 1996 With comments and a rejoinder by the authors

[3] B D Marx and P H C Eilers ldquoDirect generalized additivemodeling with penalized likelihoodrdquo Computational Statisticsand Data Analysis vol 28 no 2 pp 193ndash209 1998

[4] D Ruppert M P Wand and R J Carroll SemiparametricRegression Cambridge University Press Cambridge UK 2003

[5] T Krivobokova ldquoSmoothing parameter selection in two frame-works for penalized splinesrdquo Journal of the Royal StatisticalSociety B vol 75 no 4 pp 725ndash741 2013

[6] P T Reiss and R T Ogden ldquoSmoothing parameter selection fora class of semiparametric linear modelsrdquo Journal of the RoyalStatistical Society B vol 71 no 2 pp 505ndash523 2009

[7] S N Wood ldquoModelling and smoothing parameter estimationwith multiple quadratic penaltiesrdquo Journal of the Royal Statisti-cal Society B vol 62 no 2 pp 413ndash428 2000

[8] S NWood ldquoStable and efficient multiple smoothing parameterestimation for generalized additive modelsrdquo Journal of theAmerican Statistical Association vol 99 no 467 pp 673ndash6862004

[9] S N Wood ldquoFast stable direct fitting and smoothness selectionfor generalized additive modelsrdquo Journal of the Royal StatisticalSociety B vol 70 no 3 pp 495ndash518 2008

[10] X Lin and D Zhang ldquoInference in generalized additive mixedmodels by using smoothing splinesrdquo Journal of the RoyalStatistical Society B vol 61 no 2 pp 381ndash400 1999

[11] M P Wand ldquoSmoothing and mixed modelsrdquo ComputationalStatistics vol 18 no 2 pp 223ndash249 2003

[12] L Fahrmeir T Kneib and S Lang ldquoPenalized structuredadditive regression for space-time data a Bayesian perspectiverdquoStatistica Sinica vol 14 no 3 pp 731ndash761 2004

[13] L Fahrmeir and T Kneib ldquoPropriety of posteriors in structuredadditive regression models theory and empirical evidencerdquoJournal of Statistical Planning and Inference vol 139 no 3 pp843ndash859 2009

[14] F Heinzl L Fahrmeir and T Kneib ldquoAdditive mixed modelswith Dirichlet process mixture and P-spline priorsrdquo Advancesin Statistical Analysis vol 96 no 1 pp 47ndash68 2012

[15] G Kauermann ldquoA note on smoothing parameter selection forpenalized spline smoothingrdquo Journal of Statistical Planning andInference vol 127 no 1-2 pp 53ndash69 2005

[16] P Hall and J D Opsomer ldquoTheory for penalised splineregressionrdquo Biometrika vol 92 no 1 pp 105ndash118 2005

[17] Y Li and D Ruppert ldquoOn the asymptotics of penalized splinesrdquoBiometrika vol 95 no 2 pp 415ndash436 2008

[18] G Claeskens T Krivobokova and J D Opsomer ldquoAsymptoticproperties of penalized spline estimatorsrdquo Biometrika vol 96no 3 pp 529ndash544 2009

[19] G Kauermann T Krivobokova and L Fahrmeir ldquoSomeasymptotic results on generalized penalized spline smoothingrdquo

Journal of the Royal Statistical Society B vol 71 no 2 pp 487ndash503 2009

[20] X Wang J Shen and D Ruppert ldquoOn the asymptotics ofpenalized spline smoothingrdquo Electronic Journal of Statistics vol5 pp 1ndash17 2011

[21] T Yoshida and K Naito ldquoAsymptotics for penalized additive 119861-spline regressionrdquo Journal of the Japan Statistical Society vol 42no 1 pp 81ndash107 2012

[22] T Yoshida and K Naito ldquoAsymptotics for penalized splinesin generalized additive modelsrdquo Journal of NonparametricStatistics vol 26 no 2 pp 269ndash289 2014

[23] L Xiao Y Li and D Ruppert ldquoFast bivariate 119875-splines thesandwich smootherrdquo Journal of the Royal Statistical Society Bvol 75 no 3 pp 577ndash599 2013

[24] D Ruppert S J Sheather and M P Wand ldquoAn effectivebandwidth selector for local least squares regressionrdquo Journal ofthe American Statistical Association vol 90 no 432 pp 1257ndash1270 1995

[25] M P Wand and M C Jones Kernel Smoothing Chapman ampHall London UK 1995

[26] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[27] D Ruppert ldquoSelecting the number of knots for penalizedsplinesrdquo Journal of Computational and Graphical Statistics vol11 no 4 pp 735ndash757 2002

[28] S Zhou X Shen and D A Wolfe ldquoLocal asymptotics forregression splines and confidence regionsrdquo The Annals ofStatistics vol 26 no 5 pp 1760ndash1782 1998

[29] G K Robinson ldquoThat BLUP is a good thing the estimation ofrandom effectsrdquo Statistical Science vol 6 no 1 pp 15ndash32 1991

[30] T J Hastie and R J Tibshirani Generalized Additive ModelsChapman amp Hall London UKI 1990

[31] T Yoshida ldquoAsymptotics for penalized spline estimators inquantile regressionrdquo Communications in Statistics-Theory andMethods 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: Research Article Direct Determination of Smoothing

6 Journal of Probability and Statistics

The asymptotics for 119891119895(119909119895) have been studied by Yoshidaand Naito [22] who derived the asymptotic bias and varianceto be

119864 [119891119895 (119909119895)] = 119891119895 (119909119895) + 119870minus(119901+1)

119895119887119895119886 (119909119895) (1 + 119900 (1))

+

120582119895

119899B119895(119909119895)

119879

Λminus1

119895119899119863119879

119895119898119863119895119898blowast

119895+ 119900(

1205821198951198701minus119898

119895

119899)

119881 [119891119895 (119909119895)] =1205902

119899B119895(119909119895)

119879

119866minus1

119895119899B119895 (119909119895)

minus120582119870

119899

1205902119862119895 (119909119895 | 119899 119870119895)

119870+ 119900(

1205821198951198702

119895

1198992)

(34)

where B119895(119909) = (119861minus119901(119909) sdot sdot sdot 119861119870119895minus1(119909))119879

119866119895119899 = 119899minus1119885119879

119895119885119895

Λ 119895119899 = 119866119895119899 + (120582119895119899)119863119879

119898119863119898 119885119895 = (119861minus119901+119896minus1(119909119894119895))119894119896 and blowast

119895is

the best 119871infin approximation of 119891119895

119887119895119886 (119909) = minus

119891(119901+1)

119895(119909)

(119901 + 1)

119870119895minus1

sum

119896=0

119868 (120581119895119896 le 119909 lt 120581119895119896+1)

times 119861119901+1(

119909 minus 120581119895119896

119870minus1119895

)

119862119895 (119909119895 | 119899 119870119895) =2

119899B119895(119909119895)

119879

119866minus1

119895119899119863119879

119895119898119863119895119898119866

minus1

119899B119895 (119909119895)

(35)

The above asymptotic bias and variance of 119891119895(119909119895) are sim-ilar to that of the penalized spline estimator in univariateregression with (119884119894 119909119894119895) 119894 = 1 119899 Furthermore

the asymptotic normality of [1198911(1199091) sdot sdot sdot 119891119889(119909119889)]119879

has beenshown by Yoshida and Naito [22] From their paper we findthat119891119894(119909119894) and119891119895(119909119895) are then asymptotically independent for119894 = 119895 This indicates some theoretical justification to select 120582119895in minimizing the MISE of 119891119895 Similar to the discussion inSection 3 the minimizer 120582119895opt of the MISE of 119891119895(119909119895) can beobtained for 119895 = 1 119889

120582119895opt =119899

2

int119887119895

119886119895119871119895119899 (119909 | 119891

(119901+1)

119895 blowast119895 1205902) 119889119909

int119887119895

119886119895119867119895119899 (119909 | blowast119895 ) 119889119909

(36)

where

119867119895119899 (119909 | blowast

119895) = B119895(119909119895)

119879

119866minus1

119895119899119863119879

119895119898119863119895119898blowast

119895

2

119871119895119899 (119909 | 119891(119901+1)

119895 blowast119895 1205902) = 2

119887119886 (119909)B119895(119909119895)119879

119866minus1

119895119899119863119879

119895119898119863119895119898blowast119895

119870119901+1

119895

+ 1205902119862119895 (119909 | 119899 119870119895)

(37)

Since 119891119895 blowast119895 and 1205902 are unknown they should be estimated

Thepilot estimators of119891119895 blowast119895 and1205902 are constructed based on

the regression splinemethod By using the pilot estimators119891119895b119895 of119891119895blowast119895 and the estimator of1205902 we construct the estimatorof 120582119895opt

119895opt =119899

2

sum119877

119903=1119871119895119899 (119911119903 | 119891

(119901+1)

119895 b119895 2)

sum119877

119903=1119867119895119899 (119911119903 | b119895)

(38)

where 119911119903119877

1is some finite grid point sequence on [119886119895 119887119895] We

obtain for 119895 = 1 119889 the penalized spline estimator 119891119895(119909119895)of 119891119895(119909119895)

119891119895 (119909119895) = B119895(119909119895)119879

b119895opt (39)

where b119895opt is the penalized spline estimator of b119895 using119895opt From Theorem 2 and the proof of Theorem 34of Yoshida and Naito [22] the asymptotic normality of[1198911(1199091) sdot sdot sdot 119891119889(119909119889)]

119879

using (1205821opt 120582119889opt) can be shown

Remark 7 Since the true regression functions are normalizedthe estimator 119891119895 should be also centered as

119891119895 (119909119895) minus1

119899

119899

sum

119894=1

119891119895 (119909119894119895) (40)

Remark 8 Thepenalized spline estimator of b1 b119889 can beobtained using a backfitting algorithm (Hastie and Tibshirani[30]) The backfitting algorithm for the penalized splines inadditive regression is detailed in Marx and Eilers [3] andYoshida and Naito [21]

Remark 9 Although we focus on the nonparametric additiveregression in this paper the directmethod can be also appliedto the generalized additive models However we omit thisdiscussion because the procedure is similar to the case of theadditive models discussed in this section

Remark 10 The direct method is quite computationally effi-cient when compared to the grid search method in additivemodels In grid searches we prepare the candidate of 120582119895 Let119872119895 be the set of all possible candidate grid value of 120582119895 for 119895 =1 119889 Then we need to compute the backfitting algorithm1198721 times sdot sdot sdot times 119872119889 times On the other hand it is sufficientto perform the backfitting algorithm for only two steps forthe pilot estimator and the final penalized spline estimatorThus compared with the conventional grid search methodthe direct method can drastically reduce computation time

5 Numerical Study

In this section we investigate the finite sample performanceof the proposed direct method in a Monte Carlo simulationLet us first consider the univariate regression model (1) forthe data (119884119894 119909119894) 119894 = 1 119899 Then we use the three

Journal of Probability and Statistics 7

Table 1 Results of sample MISE for 119899 = 50 and 119899 = 200 All entries for MISE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0526 0322 3684 1070 1160 0377L-GCV 0726 0621 5811 1082 1183 0379L-REML 2270 1544 9966 3520 1283 0873Local linear 0751 0220 4274 0901 1689 0503C-Direct 0401 0148 3027 0656 1044 0326C-GCV 0514 0137 3526 0666 1043 0290C-REML 1326 0732 8246 4213 3241 0835

Table 2 Results of sample MSE of smoothing parameter obtained by direct method GCV and REML for 119899 = 50 and 119899 = 200 All entries forMSE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0331 0193 0113 0043 0025 0037L-GCV 0842 0342 0445 0101 0072 0043L-REML 1070 1231 0842 0437 0143 0268C-Direct 0262 0014 0082 0045 0053 0014C-GCV 0452 0123 0252 0092 0085 0135C-REML 0855 0224 0426 0152 0093 0463

types of true regression function 119891(119909) = cos(120587(119909 minus 03))119891(119909) = 120601((119909 minus 08)005)radic005 minus 120601((119909 minus 02)004)radic004and 119891(119909) = 21199093 +3 sin(2120587(119909minus08)3) + 3 exp[minus(119909minus05)201]which are labeled by F1 F2 and F3 respectively Here120601(119909) is the density function of the normal distribution Theexplanatory 119909119894 and error 120576119894 are independently generated fromuniform distribution on [0 1] and 119873(0 052) respectivelyWe estimate each true regression function via the penalizedspline method We then use the linear and cubic 119861-splinebases with equidistant knots and the second order differencepenalty In addition we set 119870 = 5119899

25 equidistant knotsand the smoothing parameter is determined by the directmethodThe penalized spline estimator with the linear splineand cubic spline are denoted as L-Direct and C-Directrespectively For comparison with L-Direct the same studiesare also implemented for the penalized spline estimator witha linear spline and the smoothing parameter selected viaGCV and restricted maximum likelihood method (REML)in mixed model representation and the local polynomialestimator with normal kernel and Plug-in bandwidth (seeRuppert et al [24]) In GCV we set the candidate values of120582 as 11989410 119894 = 0 1 99 The above three estimators aredenoted by L-GCV L-REML and local linear respectivelyFurthermore we compare C-Direct with C-GCV and C-REML which are the penalized spline estimator with thecubic spline and the smoothing parameter determined byGCV and REML Let

sMISE = 1119869

119869

sum

119895=1

[1

119877

119877

sum

119903=1

119891119903 (119911119895) minus 119891 (119911119895)2

] (41)

be the sample MISE of any estimator 119891(119909) of 119891(119909) where119891119903(119911) is 119891(119911) with 119903th replication and 119911119895 = 119895119869 We calculatethe sample MISE of the penalized spline estimator withthe direct method GCV and REML and the local linearestimator In this simulation we use 119869 = 100 and 119877 = 1000We have simulated 119899 = 50 and 200

The sMISE of all estimators for each model and 119899 aregiven in Table 1 The penalized spline estimator using thedirect method shows good performance in each setting Incomparison with other smoothing parameter methods thedirect method is a little better than the GCV as a wholeHowever for 119899 = 200 C-GCV is better than C-Direct inF1 and F3 though its difference is very small We see thatthe sMISE of C-Direct is smaller than local linear whereaslocal linear behaves better than L-Direct in some case In F2C-Direct is the smallest of all estimators for 119899 = 50 and200 Although the performance totally seems to depend onsituation in which data sets are generated we believe that theproposed method is one of the efficient methods

Next the difference between opt and 120582opt is investigatedempirically Let opt119903 be the opt with 119903th replications for 119903 =1 1000 Then we calculate the sample MSE of opt

sMSE = 1119877

119877

sum

119903=1

opt119903 minus 120582opt2

(42)

for F1 F2 and F3 and 119899 = 50 and 200 To construct 120582opt forF1 F2 and F3 we use true119891(4)(119909) and 1205902 and an approximateblowast Here the approximate blowast means the sample average of bwith 119899 = 200 and 1000 replications

In Table 2 the sMSE of opt for each true functionare described For comparison the sMSE of the smoothing

8 Journal of Probability and Statistics

Table 3 Results of sample PSE for 119899 = 50 and 119899 = 200 All entries for PSE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0424 0052 0341 0070 0360 0117L-GCV 0331 0084 0381 0089 0333 0139L-REML 0642 0124 0831 1002 0733 0173Local linear 0751 0220 0474 0901 0489 0143C-Direct 0341 048 0237 063 0424 0086C-GCV 0314 043 0326 076 0533 0089C-REML 0863 1672 0456 121 1043 0125

parameter obtained via GCV and REML are calculated ThesMSE of the L-Direct and the C-Direct is small even in 119899 = 50for all true functions Therefore it seems that the accuracyof opt is guaranteed It indicates that the pilot estimatorconstructed via least squares method is not bad The sMSEwith the direct method are smaller than that with GCV andREML This result is not surprising since GCV and REMLare not concerned with 120582opt However together with Table 1it seems that the sample MSE of the smoothing parameter isreflected in the sample MISE of the estimator

The proposed direct method was derived based on theMISE On the other hand the GCV and the REML areobtained in context of minimizing prediction squared error(PSE) and prediction error Hence we compare the samplePSE of the penalized spline estimator with the direct methodGCV and REML and the local linear estimator Since theprediction error is almost the same as MISE (see Section 4 ofRuppert et al [4]) we omit the comparison of the predictionerror Let

sPSE = 1119899

119899

sum

119894=1

[1

119877

119877

sum

119903=1

119910119894119903 minus 119891119903 (119909119894)2

] (43)

be the sample PSE for any estimator 119891(119909) where119910119894119903 (119903 = 1 119877) is independently generated from119884119894 sim 119873(119891(119909119894) (05)

2) for all 119894

In Table 3 the modified sPSE |sPSE minus (05)2| of allestimators for each model and 119899 are described In Remark 11we discuss the modified sPSE From the result we canconfirm that the direct method has good predictability GCVcan be regarded as the estimator of sPSE Therefore in somecase sPSE with GCV is smaller than that with the directmethod It seems that the cause is the accuracy of theestimates of variance (see Remark 11) However its differenceis very small

We admit that the computational time (in second) takento obtain 119891(119909) for F1 119901 = 3 and 119899 = 200 The fits withthe direct method GCV and REML took 004 122 and 034Although the difference is small the computational time ofthe direct method was faster than that of GCV and REML

Next we confirm the behavior of the penalized splineestimator with the direct method in the additive model Forthe data (119884119894 1199091198941 1199091198942 1199091198943) 119894 = 1 119899 we assume theadditive model with true functions 1198911(1199091) = sin(21205871199091)

1198912(1199092) = 120601((1199092 minus 05)02) and 1198913(1199093) = 04120601((1199093 minus

01)02) + 06120601((1199092 minus 08)02) The error is similar to thefirst simulationThe design points (1199091198941 1199091198942 1199091198943) are generatedby

[

[

1199091198941

1199091198942

1199091198943

]

]

=

[[[

[

(1 + 120588 + 1205882)minus1

0 0

0 (1 + 2120588)minus1

0

0 0 (1 + 120588 + 1205882)minus1

]]]

]

times [

[

1 120588 1205882

120588 1 120588

1205882120588 1

]

]

[

[

1199111198941

1199111198942

1199111198943

]

]

(44)

where 119911119894119895 (119894 = 1 119899 119895 = 1 2 3) are generated inde-pendently from the uniform distribution on [0 1] In thissimulation we adopt 120588 = 0 and 120588 = 05 We then correctedto satisfy

int

1

0

119891119895 (119911) 119889119911 = 0 119895 = 1 2 3 (45)

in each 120588 We construct the penalized spline estimator viathe backfitting algorithm with the cubic spline and secondorder difference penalty The number of equidistant knots is119870119895 = 5119899

25 and the smoothing parameters are determinedusing the direct method The pilot estimator to construct119895opt is the regression spline estimator with fifth spline and119870119895 = 119899

25 We calculate the sample MISE of each 119891119895 for119899 = 200 and 1000 Monte Carlo iterations Then we calculatethe sample MISE of 1198911 1198912 and 1198913 In order to compare withthe direct method we also conduct the same simulation withGCV and REML In GCV we set the candidate values of 120582119895 as11989410 119894 = 0 1 9 for 119895 = 1 2 3 Table 4 summarizes thesample MISE of 119891119895 (119895 = 1 2 3) denoted by MISE119895 for directmethod GCV and REML The penalized spline estimatorwith the direct method performs like that with the GCV inboth the uncorrelated and correlated design cases For 120588 = 0the behaviors of MISE1 MISE2 and MISE3 with the directmethod are similar On the other hand for GCV the MISE1is slightly larger than MISE2 and MISE3 The direct methodleads to an efficient estimate of all covariates On the wholethe direct method is better than REML From above webelieve that the direct method is preferable in practice

Journal of Probability and Statistics 9

Table 4 Results of sample MISE of the penalized spline estimator For each direct method GCV and 120588 MISE1 MISE2 and MISE3 are thesample MISE of 1198911 1198912 and 1198913 respectively All entries for MISE are 102 times their actual values

Method 120588 = 0 120588 = 05

MISE1 MISE2 MISE3 MISE1 MISE2 MISE3Direct 0281 0209 0205 0916 0795 0972GCV 0687 0237 0390 0929 0857 1112REML 1204 0825 0621 2104 1832 2263

Table 5 Results of sample MSE of the selected smoothing parameter via direct method GCV and REML for 120588 = 0 and 120588 = 05 All entriesfor MSE are 102 times their actual values

120588 = 0 120588 = 05

True 1198911 1198912 1198913 1198911 1198912 1198913

Direct 0124 0042 0082 0312 0105 0428GCV 1315 0485 0213 0547 0352 0741REML 2531 1237 2656 1043 0846 1588

Table 6 Results of sample PSE of the penalized spline estimator Allentries for PSE are 102 times their actual values

119899 = 200 120588 = 0 120588 = 05

Direct 0281 0429GCV 0387 0467REML 1204 0925

To confirm the consistency of the direct method thesample MSE of 119895opt is calculated the same manner as thatgiven in univariate regression case For comparison thesample MSE of GCV and REML are obtained Here the true120582119895opt (119895 = 1 2 3) is defined in the same way as that forunivariate case In Table 5 the results for each 1198911 1198912 and 1198913and 120588 = 0 and 05 are shown We see from the results that thebehavior of the directmethod is goodThe sampleMSE of thedirect method is smaller than that of GCV and REML for all119895 For the random design 120588 = 05 such tendency can be alsoseen

Table 6 shows the sPSE of 119891(x119894) = 1198911(1199091198941) + 1198912(1199091198942) +1198913(1199091198943)with each smoothing parameter selectionmethodWesee from result that the efficiency of the direct method isguaranteed in the context of prediction accuracy

Finally we show the computational time (in second)required to construct the penalized spline estimatorwith eachmethod The computational times with the direct methodGCV and REML are 1187 s 12643 s and 435 s respectivelyWe see that the direct method is more efficient than othermethods in context of the computation (see Remark 11) Allof computations in the simulation were done using a softwareR and a computer with 340GHz CPU and 240GB memoryThough this is only few examples the direct method can beseen as one of the good methods to select the smoothingparameter

Remark 11 We calculate |sPSE minus (05)2| as the criteria toconfirm the prediction squared error The ordinal PSE isdefined as

PSE = 1119899

119899

sum

119894=1

119864 [119884119894 minus 119891 (119909119894)2

]

= 1205902+1

119899

119899

sum

119894=1

119864 [119891 (119909119894) minus 119891 (119909119894)2

]

(46)

where 119884119894rsquos are the test data The second term of the PSEis similar to MISE So it can be said that the sample PSEevaluates the accuracy of the variance part andMISE part Tosee the detail of the difference of the sample PSE between thedirect method and other method we calculated |sPSE minus 1205902|in this section

6 Discussion

In this paper we proposed a new direct method for deter-mining the smoothing parameter for a penalized splineestimator in a regression problemThe directmethod is basedon minimizing MISE of the penalized spline estimator Westudied the asymptotic property of the direct method Theasymptotic normality of the penalized spline estimator usingopt is theoretically guaranteed when the consistent estimatoris used as the pilot estimator to obtain opt In numericalstudy for the additive model the computational cost of thisdirect method is dramatically reduced when compared togrid search methods such as GCV Furthermore we find thatthe performance of the direct method is better than or at leastsimilar to that of other methods

The direct method will be developed for other regressionmodels such as varying-coefficient models Cox propor-tional hazards models single-index models and others ifthe asymptotic bias and variance of the penalized spline

10 Journal of Probability and Statistics

estimator are derived It is not limited to themean regressionit can be applied to quantile regression Actually Yoshida[31] has presented the asymptotic bias and variance of thepenalized spline estimator in univariate quantile regressionsFurthermore it is seen that improving the direct method isimportant for various situations and datasets In particularthe development of the determination of locally adaptive 120582 isan interesting avenue of further research

Appendix

We describe the technical details For the matrix 119860 = (119886119894119895)119894119895119860infin = max119894119895|119886119894119895| First to prove Theorems 1 and 2 weintroduce the fundamental property of penalized splines inthe following lemma

Lemma A1 Let 119860 = (119886119894119895)119894119895 be (119870 + 119901) matrix Supposethat 119870 = 119900(11989912) 120582 = 119900(1198991198701minus119898) and 119860infin = 119874(1) Then119860119866minus1

119899infin= 119874(119870) and 119860Λminus1

119899infin= 119874(119870)

The proof of Lemma A1 is shown in Zhou et al [28] andClaeskens et al [18] The repeated use of Lemma A1 yieldsthat the asymptotic order of the variance of the second termof (12) is 119874(1205822(119870119899)3) Actually the asymptotic order of thevariance of the second term of (12) can be calculated as

(120582

119899)

21205902

119899B(119909)119879Λminus1

119899119863119879

119898119863119898119866minus1

119899119863119879

119898119863119898Λminus1

119899B (119909)

= 119874(1205822

11989931198703)

(A1)

When 119870 = 119900(11989912) and 120582 = 119874(119899119870

1minus119898) 1205822(119870119899)3 =

119900(120582(119870119899)2) holds

Proof of Theorem 1 We write

119862 (119909 | 119899 119870 120582) =21205902

119899B(119909)119879119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909)

+21205902

119899B(119909)119879 Λminus1

119899minus 119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909) (A2)

Since

Λminus1

119899minus 119866minus1

119899= Λminus1

119899119868 minus Λ 119899119866

minus1

119899

=120582

119899Λminus1

119899119863119879

119898119863119898119866minus1

119899

(A3)

we have Λminus1119899minus 119866minus1

119899infin

= 119874(1205821198702119899) and hence the

second term of right hand side of (A2) can be shown119874((119870119899)

2(120582119870119899)) From Lemma A1 it is easy to derive that

119862(119909 | 119899 119870 0) = 119874(119870(119870119899)) Consequently we have

119862 (119909 | 119899 119870 120582)

119870=119862 (119909 | 119899 119870 0)

119870+ 119874((

119870

119899)

2

(120582

119899))

=119862 (119909 | 119899 119870 0)

119870+ 119900 (

119870

119899)

(A4)

and this leads to Theorem 1

Proof of Theorem 2 It is sufficient to prove that int1198871198861198631119899(119909 |

blowast)119889119909 = 119874(1198702(1minus119898)) and

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874(1198702

119899)

(A5)

Since100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

119889119909

100381610038161003816100381610038161003816100381610038161003816

le sup119909isin[119886119887]

[B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

] (119886 minus 119887)

(A6)

we have

int

119887

119886

1198631119899 (119909 | blowast) 119889119909 = 119874 (119870

2(1minus119898)) (A7)

by using Lemma A1 and the fact that 119863119898blowastinfin = 119874(119870minus119898)

(see Yoshida [31])From the property of119861-spline function for (119870+119901) square

matrix 119860 there exists 119862 gt 0 such that

int

119887

119886

B(119909)119879119860B (119909) 119889119909 le 119860infin int119887

119886

B(119909)119879B (119909) 119889119909

le 119860infin119862

(A8)

In other words the asymptotic order of int119887119886B(119909)119879119860B(119909)119889119909

and that of 119860infin are the same Let 119860119899 = 119866minus1

119899119863119879

119898119863119898119866minus1

119899

Then since 119860119899infin = 1198702 we obtain

int

119887

119886

119862 (119909 | 119899 119870 0) 119889119909 = 119874(1198702

119899) (A9)

Furthermore because sup119886isin[119886119887]

119887119886(119909) = 119874(1) we have

119870minus(119901+1)

100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

119887119886 (119909)B(119909)119879119866minus1

119899119863119879

119898119863119898blowast119889119909

100381610038161003816100381610038161003816100381610038161003816

= 119874 (119870minus(119901+119898)

)

(A10)

and hence it can be shown that

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874 (119899minus11198702) (A11)

Together with (A7) and (A11) 120582opt = 119874(119899119870119898minus119901minus2

+ 1198702119898)

can be obtained Then rate of convergence of the penalizedspline estimator with 120582opt is 119874(119899

minus2(119901+2)(2119901 + 3)) when 119870 =

119874(1198991(2119901+3)

) which is detailed in Yoshida and Naito [22]

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

Journal of Probability and Statistics 11

Acknowledgment

The author also would like to thank two anonymous refereesfor their careful reading and comments to improve the paper

References

[1] F OrsquoSullivan ldquoA statistical perspective on ill-posed inverseproblemsrdquo Statistical Science vol 1 no 4 pp 502ndash527 1986

[2] P H C Eilers and B D Marx ldquoFlexible smoothing with 119861-splines and penaltiesrdquo Statistical Science vol 11 no 2 pp 89ndash121 1996 With comments and a rejoinder by the authors

[3] B D Marx and P H C Eilers ldquoDirect generalized additivemodeling with penalized likelihoodrdquo Computational Statisticsand Data Analysis vol 28 no 2 pp 193ndash209 1998

[4] D Ruppert M P Wand and R J Carroll SemiparametricRegression Cambridge University Press Cambridge UK 2003

[5] T Krivobokova ldquoSmoothing parameter selection in two frame-works for penalized splinesrdquo Journal of the Royal StatisticalSociety B vol 75 no 4 pp 725ndash741 2013

[6] P T Reiss and R T Ogden ldquoSmoothing parameter selection fora class of semiparametric linear modelsrdquo Journal of the RoyalStatistical Society B vol 71 no 2 pp 505ndash523 2009

[7] S N Wood ldquoModelling and smoothing parameter estimationwith multiple quadratic penaltiesrdquo Journal of the Royal Statisti-cal Society B vol 62 no 2 pp 413ndash428 2000

[8] S NWood ldquoStable and efficient multiple smoothing parameterestimation for generalized additive modelsrdquo Journal of theAmerican Statistical Association vol 99 no 467 pp 673ndash6862004

[9] S N Wood ldquoFast stable direct fitting and smoothness selectionfor generalized additive modelsrdquo Journal of the Royal StatisticalSociety B vol 70 no 3 pp 495ndash518 2008

[10] X Lin and D Zhang ldquoInference in generalized additive mixedmodels by using smoothing splinesrdquo Journal of the RoyalStatistical Society B vol 61 no 2 pp 381ndash400 1999

[11] M P Wand ldquoSmoothing and mixed modelsrdquo ComputationalStatistics vol 18 no 2 pp 223ndash249 2003

[12] L Fahrmeir T Kneib and S Lang ldquoPenalized structuredadditive regression for space-time data a Bayesian perspectiverdquoStatistica Sinica vol 14 no 3 pp 731ndash761 2004

[13] L Fahrmeir and T Kneib ldquoPropriety of posteriors in structuredadditive regression models theory and empirical evidencerdquoJournal of Statistical Planning and Inference vol 139 no 3 pp843ndash859 2009

[14] F Heinzl L Fahrmeir and T Kneib ldquoAdditive mixed modelswith Dirichlet process mixture and P-spline priorsrdquo Advancesin Statistical Analysis vol 96 no 1 pp 47ndash68 2012

[15] G Kauermann ldquoA note on smoothing parameter selection forpenalized spline smoothingrdquo Journal of Statistical Planning andInference vol 127 no 1-2 pp 53ndash69 2005

[16] P Hall and J D Opsomer ldquoTheory for penalised splineregressionrdquo Biometrika vol 92 no 1 pp 105ndash118 2005

[17] Y Li and D Ruppert ldquoOn the asymptotics of penalized splinesrdquoBiometrika vol 95 no 2 pp 415ndash436 2008

[18] G Claeskens T Krivobokova and J D Opsomer ldquoAsymptoticproperties of penalized spline estimatorsrdquo Biometrika vol 96no 3 pp 529ndash544 2009

[19] G Kauermann T Krivobokova and L Fahrmeir ldquoSomeasymptotic results on generalized penalized spline smoothingrdquo

Journal of the Royal Statistical Society B vol 71 no 2 pp 487ndash503 2009

[20] X Wang J Shen and D Ruppert ldquoOn the asymptotics ofpenalized spline smoothingrdquo Electronic Journal of Statistics vol5 pp 1ndash17 2011

[21] T Yoshida and K Naito ldquoAsymptotics for penalized additive 119861-spline regressionrdquo Journal of the Japan Statistical Society vol 42no 1 pp 81ndash107 2012

[22] T Yoshida and K Naito ldquoAsymptotics for penalized splinesin generalized additive modelsrdquo Journal of NonparametricStatistics vol 26 no 2 pp 269ndash289 2014

[23] L Xiao Y Li and D Ruppert ldquoFast bivariate 119875-splines thesandwich smootherrdquo Journal of the Royal Statistical Society Bvol 75 no 3 pp 577ndash599 2013

[24] D Ruppert S J Sheather and M P Wand ldquoAn effectivebandwidth selector for local least squares regressionrdquo Journal ofthe American Statistical Association vol 90 no 432 pp 1257ndash1270 1995

[25] M P Wand and M C Jones Kernel Smoothing Chapman ampHall London UK 1995

[26] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[27] D Ruppert ldquoSelecting the number of knots for penalizedsplinesrdquo Journal of Computational and Graphical Statistics vol11 no 4 pp 735ndash757 2002

[28] S Zhou X Shen and D A Wolfe ldquoLocal asymptotics forregression splines and confidence regionsrdquo The Annals ofStatistics vol 26 no 5 pp 1760ndash1782 1998

[29] G K Robinson ldquoThat BLUP is a good thing the estimation ofrandom effectsrdquo Statistical Science vol 6 no 1 pp 15ndash32 1991

[30] T J Hastie and R J Tibshirani Generalized Additive ModelsChapman amp Hall London UKI 1990

[31] T Yoshida ldquoAsymptotics for penalized spline estimators inquantile regressionrdquo Communications in Statistics-Theory andMethods 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article Direct Determination of Smoothing

Journal of Probability and Statistics 7

Table 1 Results of sample MISE for 119899 = 50 and 119899 = 200 All entries for MISE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0526 0322 3684 1070 1160 0377L-GCV 0726 0621 5811 1082 1183 0379L-REML 2270 1544 9966 3520 1283 0873Local linear 0751 0220 4274 0901 1689 0503C-Direct 0401 0148 3027 0656 1044 0326C-GCV 0514 0137 3526 0666 1043 0290C-REML 1326 0732 8246 4213 3241 0835

Table 2 Results of sample MSE of smoothing parameter obtained by direct method GCV and REML for 119899 = 50 and 119899 = 200 All entries forMSE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0331 0193 0113 0043 0025 0037L-GCV 0842 0342 0445 0101 0072 0043L-REML 1070 1231 0842 0437 0143 0268C-Direct 0262 0014 0082 0045 0053 0014C-GCV 0452 0123 0252 0092 0085 0135C-REML 0855 0224 0426 0152 0093 0463

types of true regression function 119891(119909) = cos(120587(119909 minus 03))119891(119909) = 120601((119909 minus 08)005)radic005 minus 120601((119909 minus 02)004)radic004and 119891(119909) = 21199093 +3 sin(2120587(119909minus08)3) + 3 exp[minus(119909minus05)201]which are labeled by F1 F2 and F3 respectively Here120601(119909) is the density function of the normal distribution Theexplanatory 119909119894 and error 120576119894 are independently generated fromuniform distribution on [0 1] and 119873(0 052) respectivelyWe estimate each true regression function via the penalizedspline method We then use the linear and cubic 119861-splinebases with equidistant knots and the second order differencepenalty In addition we set 119870 = 5119899

25 equidistant knotsand the smoothing parameter is determined by the directmethodThe penalized spline estimator with the linear splineand cubic spline are denoted as L-Direct and C-Directrespectively For comparison with L-Direct the same studiesare also implemented for the penalized spline estimator witha linear spline and the smoothing parameter selected viaGCV and restricted maximum likelihood method (REML)in mixed model representation and the local polynomialestimator with normal kernel and Plug-in bandwidth (seeRuppert et al [24]) In GCV we set the candidate values of120582 as 11989410 119894 = 0 1 99 The above three estimators aredenoted by L-GCV L-REML and local linear respectivelyFurthermore we compare C-Direct with C-GCV and C-REML which are the penalized spline estimator with thecubic spline and the smoothing parameter determined byGCV and REML Let

sMISE = 1119869

119869

sum

119895=1

[1

119877

119877

sum

119903=1

119891119903 (119911119895) minus 119891 (119911119895)2

] (41)

be the sample MISE of any estimator 119891(119909) of 119891(119909) where119891119903(119911) is 119891(119911) with 119903th replication and 119911119895 = 119895119869 We calculatethe sample MISE of the penalized spline estimator withthe direct method GCV and REML and the local linearestimator In this simulation we use 119869 = 100 and 119877 = 1000We have simulated 119899 = 50 and 200

The sMISE of all estimators for each model and 119899 aregiven in Table 1 The penalized spline estimator using thedirect method shows good performance in each setting Incomparison with other smoothing parameter methods thedirect method is a little better than the GCV as a wholeHowever for 119899 = 200 C-GCV is better than C-Direct inF1 and F3 though its difference is very small We see thatthe sMISE of C-Direct is smaller than local linear whereaslocal linear behaves better than L-Direct in some case In F2C-Direct is the smallest of all estimators for 119899 = 50 and200 Although the performance totally seems to depend onsituation in which data sets are generated we believe that theproposed method is one of the efficient methods

Next the difference between opt and 120582opt is investigatedempirically Let opt119903 be the opt with 119903th replications for 119903 =1 1000 Then we calculate the sample MSE of opt

sMSE = 1119877

119877

sum

119903=1

opt119903 minus 120582opt2

(42)

for F1 F2 and F3 and 119899 = 50 and 200 To construct 120582opt forF1 F2 and F3 we use true119891(4)(119909) and 1205902 and an approximateblowast Here the approximate blowast means the sample average of bwith 119899 = 200 and 1000 replications

In Table 2 the sMSE of opt for each true functionare described For comparison the sMSE of the smoothing

8 Journal of Probability and Statistics

Table 3 Results of sample PSE for 119899 = 50 and 119899 = 200 All entries for PSE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0424 0052 0341 0070 0360 0117L-GCV 0331 0084 0381 0089 0333 0139L-REML 0642 0124 0831 1002 0733 0173Local linear 0751 0220 0474 0901 0489 0143C-Direct 0341 048 0237 063 0424 0086C-GCV 0314 043 0326 076 0533 0089C-REML 0863 1672 0456 121 1043 0125

parameter obtained via GCV and REML are calculated ThesMSE of the L-Direct and the C-Direct is small even in 119899 = 50for all true functions Therefore it seems that the accuracyof opt is guaranteed It indicates that the pilot estimatorconstructed via least squares method is not bad The sMSEwith the direct method are smaller than that with GCV andREML This result is not surprising since GCV and REMLare not concerned with 120582opt However together with Table 1it seems that the sample MSE of the smoothing parameter isreflected in the sample MISE of the estimator

The proposed direct method was derived based on theMISE On the other hand the GCV and the REML areobtained in context of minimizing prediction squared error(PSE) and prediction error Hence we compare the samplePSE of the penalized spline estimator with the direct methodGCV and REML and the local linear estimator Since theprediction error is almost the same as MISE (see Section 4 ofRuppert et al [4]) we omit the comparison of the predictionerror Let

sPSE = 1119899

119899

sum

119894=1

[1

119877

119877

sum

119903=1

119910119894119903 minus 119891119903 (119909119894)2

] (43)

be the sample PSE for any estimator 119891(119909) where119910119894119903 (119903 = 1 119877) is independently generated from119884119894 sim 119873(119891(119909119894) (05)

2) for all 119894

In Table 3 the modified sPSE |sPSE minus (05)2| of allestimators for each model and 119899 are described In Remark 11we discuss the modified sPSE From the result we canconfirm that the direct method has good predictability GCVcan be regarded as the estimator of sPSE Therefore in somecase sPSE with GCV is smaller than that with the directmethod It seems that the cause is the accuracy of theestimates of variance (see Remark 11) However its differenceis very small

We admit that the computational time (in second) takento obtain 119891(119909) for F1 119901 = 3 and 119899 = 200 The fits withthe direct method GCV and REML took 004 122 and 034Although the difference is small the computational time ofthe direct method was faster than that of GCV and REML

Next we confirm the behavior of the penalized splineestimator with the direct method in the additive model Forthe data (119884119894 1199091198941 1199091198942 1199091198943) 119894 = 1 119899 we assume theadditive model with true functions 1198911(1199091) = sin(21205871199091)

1198912(1199092) = 120601((1199092 minus 05)02) and 1198913(1199093) = 04120601((1199093 minus

01)02) + 06120601((1199092 minus 08)02) The error is similar to thefirst simulationThe design points (1199091198941 1199091198942 1199091198943) are generatedby

[

[

1199091198941

1199091198942

1199091198943

]

]

=

[[[

[

(1 + 120588 + 1205882)minus1

0 0

0 (1 + 2120588)minus1

0

0 0 (1 + 120588 + 1205882)minus1

]]]

]

times [

[

1 120588 1205882

120588 1 120588

1205882120588 1

]

]

[

[

1199111198941

1199111198942

1199111198943

]

]

(44)

where 119911119894119895 (119894 = 1 119899 119895 = 1 2 3) are generated inde-pendently from the uniform distribution on [0 1] In thissimulation we adopt 120588 = 0 and 120588 = 05 We then correctedto satisfy

int

1

0

119891119895 (119911) 119889119911 = 0 119895 = 1 2 3 (45)

in each 120588 We construct the penalized spline estimator viathe backfitting algorithm with the cubic spline and secondorder difference penalty The number of equidistant knots is119870119895 = 5119899

25 and the smoothing parameters are determinedusing the direct method The pilot estimator to construct119895opt is the regression spline estimator with fifth spline and119870119895 = 119899

25 We calculate the sample MISE of each 119891119895 for119899 = 200 and 1000 Monte Carlo iterations Then we calculatethe sample MISE of 1198911 1198912 and 1198913 In order to compare withthe direct method we also conduct the same simulation withGCV and REML In GCV we set the candidate values of 120582119895 as11989410 119894 = 0 1 9 for 119895 = 1 2 3 Table 4 summarizes thesample MISE of 119891119895 (119895 = 1 2 3) denoted by MISE119895 for directmethod GCV and REML The penalized spline estimatorwith the direct method performs like that with the GCV inboth the uncorrelated and correlated design cases For 120588 = 0the behaviors of MISE1 MISE2 and MISE3 with the directmethod are similar On the other hand for GCV the MISE1is slightly larger than MISE2 and MISE3 The direct methodleads to an efficient estimate of all covariates On the wholethe direct method is better than REML From above webelieve that the direct method is preferable in practice

Journal of Probability and Statistics 9

Table 4 Results of sample MISE of the penalized spline estimator For each direct method GCV and 120588 MISE1 MISE2 and MISE3 are thesample MISE of 1198911 1198912 and 1198913 respectively All entries for MISE are 102 times their actual values

Method 120588 = 0 120588 = 05

MISE1 MISE2 MISE3 MISE1 MISE2 MISE3Direct 0281 0209 0205 0916 0795 0972GCV 0687 0237 0390 0929 0857 1112REML 1204 0825 0621 2104 1832 2263

Table 5 Results of sample MSE of the selected smoothing parameter via direct method GCV and REML for 120588 = 0 and 120588 = 05 All entriesfor MSE are 102 times their actual values

120588 = 0 120588 = 05

True 1198911 1198912 1198913 1198911 1198912 1198913

Direct 0124 0042 0082 0312 0105 0428GCV 1315 0485 0213 0547 0352 0741REML 2531 1237 2656 1043 0846 1588

Table 6 Results of sample PSE of the penalized spline estimator Allentries for PSE are 102 times their actual values

119899 = 200 120588 = 0 120588 = 05

Direct 0281 0429GCV 0387 0467REML 1204 0925

To confirm the consistency of the direct method thesample MSE of 119895opt is calculated the same manner as thatgiven in univariate regression case For comparison thesample MSE of GCV and REML are obtained Here the true120582119895opt (119895 = 1 2 3) is defined in the same way as that forunivariate case In Table 5 the results for each 1198911 1198912 and 1198913and 120588 = 0 and 05 are shown We see from the results that thebehavior of the directmethod is goodThe sampleMSE of thedirect method is smaller than that of GCV and REML for all119895 For the random design 120588 = 05 such tendency can be alsoseen

Table 6 shows the sPSE of 119891(x119894) = 1198911(1199091198941) + 1198912(1199091198942) +1198913(1199091198943)with each smoothing parameter selectionmethodWesee from result that the efficiency of the direct method isguaranteed in the context of prediction accuracy

Finally we show the computational time (in second)required to construct the penalized spline estimatorwith eachmethod The computational times with the direct methodGCV and REML are 1187 s 12643 s and 435 s respectivelyWe see that the direct method is more efficient than othermethods in context of the computation (see Remark 11) Allof computations in the simulation were done using a softwareR and a computer with 340GHz CPU and 240GB memoryThough this is only few examples the direct method can beseen as one of the good methods to select the smoothingparameter

Remark 11 We calculate |sPSE minus (05)2| as the criteria toconfirm the prediction squared error The ordinal PSE isdefined as

PSE = 1119899

119899

sum

119894=1

119864 [119884119894 minus 119891 (119909119894)2

]

= 1205902+1

119899

119899

sum

119894=1

119864 [119891 (119909119894) minus 119891 (119909119894)2

]

(46)

where 119884119894rsquos are the test data The second term of the PSEis similar to MISE So it can be said that the sample PSEevaluates the accuracy of the variance part andMISE part Tosee the detail of the difference of the sample PSE between thedirect method and other method we calculated |sPSE minus 1205902|in this section

6 Discussion

In this paper we proposed a new direct method for deter-mining the smoothing parameter for a penalized splineestimator in a regression problemThe directmethod is basedon minimizing MISE of the penalized spline estimator Westudied the asymptotic property of the direct method Theasymptotic normality of the penalized spline estimator usingopt is theoretically guaranteed when the consistent estimatoris used as the pilot estimator to obtain opt In numericalstudy for the additive model the computational cost of thisdirect method is dramatically reduced when compared togrid search methods such as GCV Furthermore we find thatthe performance of the direct method is better than or at leastsimilar to that of other methods

The direct method will be developed for other regressionmodels such as varying-coefficient models Cox propor-tional hazards models single-index models and others ifthe asymptotic bias and variance of the penalized spline

10 Journal of Probability and Statistics

estimator are derived It is not limited to themean regressionit can be applied to quantile regression Actually Yoshida[31] has presented the asymptotic bias and variance of thepenalized spline estimator in univariate quantile regressionsFurthermore it is seen that improving the direct method isimportant for various situations and datasets In particularthe development of the determination of locally adaptive 120582 isan interesting avenue of further research

Appendix

We describe the technical details For the matrix 119860 = (119886119894119895)119894119895119860infin = max119894119895|119886119894119895| First to prove Theorems 1 and 2 weintroduce the fundamental property of penalized splines inthe following lemma

Lemma A1 Let 119860 = (119886119894119895)119894119895 be (119870 + 119901) matrix Supposethat 119870 = 119900(11989912) 120582 = 119900(1198991198701minus119898) and 119860infin = 119874(1) Then119860119866minus1

119899infin= 119874(119870) and 119860Λminus1

119899infin= 119874(119870)

The proof of Lemma A1 is shown in Zhou et al [28] andClaeskens et al [18] The repeated use of Lemma A1 yieldsthat the asymptotic order of the variance of the second termof (12) is 119874(1205822(119870119899)3) Actually the asymptotic order of thevariance of the second term of (12) can be calculated as

(120582

119899)

21205902

119899B(119909)119879Λminus1

119899119863119879

119898119863119898119866minus1

119899119863119879

119898119863119898Λminus1

119899B (119909)

= 119874(1205822

11989931198703)

(A1)

When 119870 = 119900(11989912) and 120582 = 119874(119899119870

1minus119898) 1205822(119870119899)3 =

119900(120582(119870119899)2) holds

Proof of Theorem 1 We write

119862 (119909 | 119899 119870 120582) =21205902

119899B(119909)119879119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909)

+21205902

119899B(119909)119879 Λminus1

119899minus 119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909) (A2)

Since

Λminus1

119899minus 119866minus1

119899= Λminus1

119899119868 minus Λ 119899119866

minus1

119899

=120582

119899Λminus1

119899119863119879

119898119863119898119866minus1

119899

(A3)

we have Λminus1119899minus 119866minus1

119899infin

= 119874(1205821198702119899) and hence the

second term of right hand side of (A2) can be shown119874((119870119899)

2(120582119870119899)) From Lemma A1 it is easy to derive that

119862(119909 | 119899 119870 0) = 119874(119870(119870119899)) Consequently we have

119862 (119909 | 119899 119870 120582)

119870=119862 (119909 | 119899 119870 0)

119870+ 119874((

119870

119899)

2

(120582

119899))

=119862 (119909 | 119899 119870 0)

119870+ 119900 (

119870

119899)

(A4)

and this leads to Theorem 1

Proof of Theorem 2 It is sufficient to prove that int1198871198861198631119899(119909 |

blowast)119889119909 = 119874(1198702(1minus119898)) and

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874(1198702

119899)

(A5)

Since100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

119889119909

100381610038161003816100381610038161003816100381610038161003816

le sup119909isin[119886119887]

[B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

] (119886 minus 119887)

(A6)

we have

int

119887

119886

1198631119899 (119909 | blowast) 119889119909 = 119874 (119870

2(1minus119898)) (A7)

by using Lemma A1 and the fact that 119863119898blowastinfin = 119874(119870minus119898)

(see Yoshida [31])From the property of119861-spline function for (119870+119901) square

matrix 119860 there exists 119862 gt 0 such that

int

119887

119886

B(119909)119879119860B (119909) 119889119909 le 119860infin int119887

119886

B(119909)119879B (119909) 119889119909

le 119860infin119862

(A8)

In other words the asymptotic order of int119887119886B(119909)119879119860B(119909)119889119909

and that of 119860infin are the same Let 119860119899 = 119866minus1

119899119863119879

119898119863119898119866minus1

119899

Then since 119860119899infin = 1198702 we obtain

int

119887

119886

119862 (119909 | 119899 119870 0) 119889119909 = 119874(1198702

119899) (A9)

Furthermore because sup119886isin[119886119887]

119887119886(119909) = 119874(1) we have

119870minus(119901+1)

100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

119887119886 (119909)B(119909)119879119866minus1

119899119863119879

119898119863119898blowast119889119909

100381610038161003816100381610038161003816100381610038161003816

= 119874 (119870minus(119901+119898)

)

(A10)

and hence it can be shown that

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874 (119899minus11198702) (A11)

Together with (A7) and (A11) 120582opt = 119874(119899119870119898minus119901minus2

+ 1198702119898)

can be obtained Then rate of convergence of the penalizedspline estimator with 120582opt is 119874(119899

minus2(119901+2)(2119901 + 3)) when 119870 =

119874(1198991(2119901+3)

) which is detailed in Yoshida and Naito [22]

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

Journal of Probability and Statistics 11

Acknowledgment

The author also would like to thank two anonymous refereesfor their careful reading and comments to improve the paper

References

[1] F OrsquoSullivan ldquoA statistical perspective on ill-posed inverseproblemsrdquo Statistical Science vol 1 no 4 pp 502ndash527 1986

[2] P H C Eilers and B D Marx ldquoFlexible smoothing with 119861-splines and penaltiesrdquo Statistical Science vol 11 no 2 pp 89ndash121 1996 With comments and a rejoinder by the authors

[3] B D Marx and P H C Eilers ldquoDirect generalized additivemodeling with penalized likelihoodrdquo Computational Statisticsand Data Analysis vol 28 no 2 pp 193ndash209 1998

[4] D Ruppert M P Wand and R J Carroll SemiparametricRegression Cambridge University Press Cambridge UK 2003

[5] T Krivobokova ldquoSmoothing parameter selection in two frame-works for penalized splinesrdquo Journal of the Royal StatisticalSociety B vol 75 no 4 pp 725ndash741 2013

[6] P T Reiss and R T Ogden ldquoSmoothing parameter selection fora class of semiparametric linear modelsrdquo Journal of the RoyalStatistical Society B vol 71 no 2 pp 505ndash523 2009

[7] S N Wood ldquoModelling and smoothing parameter estimationwith multiple quadratic penaltiesrdquo Journal of the Royal Statisti-cal Society B vol 62 no 2 pp 413ndash428 2000

[8] S NWood ldquoStable and efficient multiple smoothing parameterestimation for generalized additive modelsrdquo Journal of theAmerican Statistical Association vol 99 no 467 pp 673ndash6862004

[9] S N Wood ldquoFast stable direct fitting and smoothness selectionfor generalized additive modelsrdquo Journal of the Royal StatisticalSociety B vol 70 no 3 pp 495ndash518 2008

[10] X Lin and D Zhang ldquoInference in generalized additive mixedmodels by using smoothing splinesrdquo Journal of the RoyalStatistical Society B vol 61 no 2 pp 381ndash400 1999

[11] M P Wand ldquoSmoothing and mixed modelsrdquo ComputationalStatistics vol 18 no 2 pp 223ndash249 2003

[12] L Fahrmeir T Kneib and S Lang ldquoPenalized structuredadditive regression for space-time data a Bayesian perspectiverdquoStatistica Sinica vol 14 no 3 pp 731ndash761 2004

[13] L Fahrmeir and T Kneib ldquoPropriety of posteriors in structuredadditive regression models theory and empirical evidencerdquoJournal of Statistical Planning and Inference vol 139 no 3 pp843ndash859 2009

[14] F Heinzl L Fahrmeir and T Kneib ldquoAdditive mixed modelswith Dirichlet process mixture and P-spline priorsrdquo Advancesin Statistical Analysis vol 96 no 1 pp 47ndash68 2012

[15] G Kauermann ldquoA note on smoothing parameter selection forpenalized spline smoothingrdquo Journal of Statistical Planning andInference vol 127 no 1-2 pp 53ndash69 2005

[16] P Hall and J D Opsomer ldquoTheory for penalised splineregressionrdquo Biometrika vol 92 no 1 pp 105ndash118 2005

[17] Y Li and D Ruppert ldquoOn the asymptotics of penalized splinesrdquoBiometrika vol 95 no 2 pp 415ndash436 2008

[18] G Claeskens T Krivobokova and J D Opsomer ldquoAsymptoticproperties of penalized spline estimatorsrdquo Biometrika vol 96no 3 pp 529ndash544 2009

[19] G Kauermann T Krivobokova and L Fahrmeir ldquoSomeasymptotic results on generalized penalized spline smoothingrdquo

Journal of the Royal Statistical Society B vol 71 no 2 pp 487ndash503 2009

[20] X Wang J Shen and D Ruppert ldquoOn the asymptotics ofpenalized spline smoothingrdquo Electronic Journal of Statistics vol5 pp 1ndash17 2011

[21] T Yoshida and K Naito ldquoAsymptotics for penalized additive 119861-spline regressionrdquo Journal of the Japan Statistical Society vol 42no 1 pp 81ndash107 2012

[22] T Yoshida and K Naito ldquoAsymptotics for penalized splinesin generalized additive modelsrdquo Journal of NonparametricStatistics vol 26 no 2 pp 269ndash289 2014

[23] L Xiao Y Li and D Ruppert ldquoFast bivariate 119875-splines thesandwich smootherrdquo Journal of the Royal Statistical Society Bvol 75 no 3 pp 577ndash599 2013

[24] D Ruppert S J Sheather and M P Wand ldquoAn effectivebandwidth selector for local least squares regressionrdquo Journal ofthe American Statistical Association vol 90 no 432 pp 1257ndash1270 1995

[25] M P Wand and M C Jones Kernel Smoothing Chapman ampHall London UK 1995

[26] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[27] D Ruppert ldquoSelecting the number of knots for penalizedsplinesrdquo Journal of Computational and Graphical Statistics vol11 no 4 pp 735ndash757 2002

[28] S Zhou X Shen and D A Wolfe ldquoLocal asymptotics forregression splines and confidence regionsrdquo The Annals ofStatistics vol 26 no 5 pp 1760ndash1782 1998

[29] G K Robinson ldquoThat BLUP is a good thing the estimation ofrandom effectsrdquo Statistical Science vol 6 no 1 pp 15ndash32 1991

[30] T J Hastie and R J Tibshirani Generalized Additive ModelsChapman amp Hall London UKI 1990

[31] T Yoshida ldquoAsymptotics for penalized spline estimators inquantile regressionrdquo Communications in Statistics-Theory andMethods 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Research Article Direct Determination of Smoothing

8 Journal of Probability and Statistics

Table 3 Results of sample PSE for 119899 = 50 and 119899 = 200 All entries for PSE are 102 times their actual values

True F1 F2 F3Sample size 119899 = 50 119899 = 200 119899 = 50 119899 = 200 119899 = 50 119899 = 200

L-Direct 0424 0052 0341 0070 0360 0117L-GCV 0331 0084 0381 0089 0333 0139L-REML 0642 0124 0831 1002 0733 0173Local linear 0751 0220 0474 0901 0489 0143C-Direct 0341 048 0237 063 0424 0086C-GCV 0314 043 0326 076 0533 0089C-REML 0863 1672 0456 121 1043 0125

parameter obtained via GCV and REML are calculated ThesMSE of the L-Direct and the C-Direct is small even in 119899 = 50for all true functions Therefore it seems that the accuracyof opt is guaranteed It indicates that the pilot estimatorconstructed via least squares method is not bad The sMSEwith the direct method are smaller than that with GCV andREML This result is not surprising since GCV and REMLare not concerned with 120582opt However together with Table 1it seems that the sample MSE of the smoothing parameter isreflected in the sample MISE of the estimator

The proposed direct method was derived based on theMISE On the other hand the GCV and the REML areobtained in context of minimizing prediction squared error(PSE) and prediction error Hence we compare the samplePSE of the penalized spline estimator with the direct methodGCV and REML and the local linear estimator Since theprediction error is almost the same as MISE (see Section 4 ofRuppert et al [4]) we omit the comparison of the predictionerror Let

sPSE = 1119899

119899

sum

119894=1

[1

119877

119877

sum

119903=1

119910119894119903 minus 119891119903 (119909119894)2

] (43)

be the sample PSE for any estimator 119891(119909) where119910119894119903 (119903 = 1 119877) is independently generated from119884119894 sim 119873(119891(119909119894) (05)

2) for all 119894

In Table 3 the modified sPSE |sPSE minus (05)2| of allestimators for each model and 119899 are described In Remark 11we discuss the modified sPSE From the result we canconfirm that the direct method has good predictability GCVcan be regarded as the estimator of sPSE Therefore in somecase sPSE with GCV is smaller than that with the directmethod It seems that the cause is the accuracy of theestimates of variance (see Remark 11) However its differenceis very small

We admit that the computational time (in second) takento obtain 119891(119909) for F1 119901 = 3 and 119899 = 200 The fits withthe direct method GCV and REML took 004 122 and 034Although the difference is small the computational time ofthe direct method was faster than that of GCV and REML

Next we confirm the behavior of the penalized splineestimator with the direct method in the additive model Forthe data (119884119894 1199091198941 1199091198942 1199091198943) 119894 = 1 119899 we assume theadditive model with true functions 1198911(1199091) = sin(21205871199091)

1198912(1199092) = 120601((1199092 minus 05)02) and 1198913(1199093) = 04120601((1199093 minus

01)02) + 06120601((1199092 minus 08)02) The error is similar to thefirst simulationThe design points (1199091198941 1199091198942 1199091198943) are generatedby

[

[

1199091198941

1199091198942

1199091198943

]

]

=

[[[

[

(1 + 120588 + 1205882)minus1

0 0

0 (1 + 2120588)minus1

0

0 0 (1 + 120588 + 1205882)minus1

]]]

]

times [

[

1 120588 1205882

120588 1 120588

1205882120588 1

]

]

[

[

1199111198941

1199111198942

1199111198943

]

]

(44)

where 119911119894119895 (119894 = 1 119899 119895 = 1 2 3) are generated inde-pendently from the uniform distribution on [0 1] In thissimulation we adopt 120588 = 0 and 120588 = 05 We then correctedto satisfy

int

1

0

119891119895 (119911) 119889119911 = 0 119895 = 1 2 3 (45)

in each 120588 We construct the penalized spline estimator viathe backfitting algorithm with the cubic spline and secondorder difference penalty The number of equidistant knots is119870119895 = 5119899

25 and the smoothing parameters are determinedusing the direct method The pilot estimator to construct119895opt is the regression spline estimator with fifth spline and119870119895 = 119899

25 We calculate the sample MISE of each 119891119895 for119899 = 200 and 1000 Monte Carlo iterations Then we calculatethe sample MISE of 1198911 1198912 and 1198913 In order to compare withthe direct method we also conduct the same simulation withGCV and REML In GCV we set the candidate values of 120582119895 as11989410 119894 = 0 1 9 for 119895 = 1 2 3 Table 4 summarizes thesample MISE of 119891119895 (119895 = 1 2 3) denoted by MISE119895 for directmethod GCV and REML The penalized spline estimatorwith the direct method performs like that with the GCV inboth the uncorrelated and correlated design cases For 120588 = 0the behaviors of MISE1 MISE2 and MISE3 with the directmethod are similar On the other hand for GCV the MISE1is slightly larger than MISE2 and MISE3 The direct methodleads to an efficient estimate of all covariates On the wholethe direct method is better than REML From above webelieve that the direct method is preferable in practice

Journal of Probability and Statistics 9

Table 4 Results of sample MISE of the penalized spline estimator For each direct method GCV and 120588 MISE1 MISE2 and MISE3 are thesample MISE of 1198911 1198912 and 1198913 respectively All entries for MISE are 102 times their actual values

Method 120588 = 0 120588 = 05

MISE1 MISE2 MISE3 MISE1 MISE2 MISE3Direct 0281 0209 0205 0916 0795 0972GCV 0687 0237 0390 0929 0857 1112REML 1204 0825 0621 2104 1832 2263

Table 5 Results of sample MSE of the selected smoothing parameter via direct method GCV and REML for 120588 = 0 and 120588 = 05 All entriesfor MSE are 102 times their actual values

120588 = 0 120588 = 05

True 1198911 1198912 1198913 1198911 1198912 1198913

Direct 0124 0042 0082 0312 0105 0428GCV 1315 0485 0213 0547 0352 0741REML 2531 1237 2656 1043 0846 1588

Table 6 Results of sample PSE of the penalized spline estimator Allentries for PSE are 102 times their actual values

119899 = 200 120588 = 0 120588 = 05

Direct 0281 0429GCV 0387 0467REML 1204 0925

To confirm the consistency of the direct method thesample MSE of 119895opt is calculated the same manner as thatgiven in univariate regression case For comparison thesample MSE of GCV and REML are obtained Here the true120582119895opt (119895 = 1 2 3) is defined in the same way as that forunivariate case In Table 5 the results for each 1198911 1198912 and 1198913and 120588 = 0 and 05 are shown We see from the results that thebehavior of the directmethod is goodThe sampleMSE of thedirect method is smaller than that of GCV and REML for all119895 For the random design 120588 = 05 such tendency can be alsoseen

Table 6 shows the sPSE of 119891(x119894) = 1198911(1199091198941) + 1198912(1199091198942) +1198913(1199091198943)with each smoothing parameter selectionmethodWesee from result that the efficiency of the direct method isguaranteed in the context of prediction accuracy

Finally we show the computational time (in second)required to construct the penalized spline estimatorwith eachmethod The computational times with the direct methodGCV and REML are 1187 s 12643 s and 435 s respectivelyWe see that the direct method is more efficient than othermethods in context of the computation (see Remark 11) Allof computations in the simulation were done using a softwareR and a computer with 340GHz CPU and 240GB memoryThough this is only few examples the direct method can beseen as one of the good methods to select the smoothingparameter

Remark 11 We calculate |sPSE minus (05)2| as the criteria toconfirm the prediction squared error The ordinal PSE isdefined as

PSE = 1119899

119899

sum

119894=1

119864 [119884119894 minus 119891 (119909119894)2

]

= 1205902+1

119899

119899

sum

119894=1

119864 [119891 (119909119894) minus 119891 (119909119894)2

]

(46)

where 119884119894rsquos are the test data The second term of the PSEis similar to MISE So it can be said that the sample PSEevaluates the accuracy of the variance part andMISE part Tosee the detail of the difference of the sample PSE between thedirect method and other method we calculated |sPSE minus 1205902|in this section

6 Discussion

In this paper we proposed a new direct method for deter-mining the smoothing parameter for a penalized splineestimator in a regression problemThe directmethod is basedon minimizing MISE of the penalized spline estimator Westudied the asymptotic property of the direct method Theasymptotic normality of the penalized spline estimator usingopt is theoretically guaranteed when the consistent estimatoris used as the pilot estimator to obtain opt In numericalstudy for the additive model the computational cost of thisdirect method is dramatically reduced when compared togrid search methods such as GCV Furthermore we find thatthe performance of the direct method is better than or at leastsimilar to that of other methods

The direct method will be developed for other regressionmodels such as varying-coefficient models Cox propor-tional hazards models single-index models and others ifthe asymptotic bias and variance of the penalized spline

10 Journal of Probability and Statistics

estimator are derived It is not limited to themean regressionit can be applied to quantile regression Actually Yoshida[31] has presented the asymptotic bias and variance of thepenalized spline estimator in univariate quantile regressionsFurthermore it is seen that improving the direct method isimportant for various situations and datasets In particularthe development of the determination of locally adaptive 120582 isan interesting avenue of further research

Appendix

We describe the technical details For the matrix 119860 = (119886119894119895)119894119895119860infin = max119894119895|119886119894119895| First to prove Theorems 1 and 2 weintroduce the fundamental property of penalized splines inthe following lemma

Lemma A1 Let 119860 = (119886119894119895)119894119895 be (119870 + 119901) matrix Supposethat 119870 = 119900(11989912) 120582 = 119900(1198991198701minus119898) and 119860infin = 119874(1) Then119860119866minus1

119899infin= 119874(119870) and 119860Λminus1

119899infin= 119874(119870)

The proof of Lemma A1 is shown in Zhou et al [28] andClaeskens et al [18] The repeated use of Lemma A1 yieldsthat the asymptotic order of the variance of the second termof (12) is 119874(1205822(119870119899)3) Actually the asymptotic order of thevariance of the second term of (12) can be calculated as

(120582

119899)

21205902

119899B(119909)119879Λminus1

119899119863119879

119898119863119898119866minus1

119899119863119879

119898119863119898Λminus1

119899B (119909)

= 119874(1205822

11989931198703)

(A1)

When 119870 = 119900(11989912) and 120582 = 119874(119899119870

1minus119898) 1205822(119870119899)3 =

119900(120582(119870119899)2) holds

Proof of Theorem 1 We write

119862 (119909 | 119899 119870 120582) =21205902

119899B(119909)119879119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909)

+21205902

119899B(119909)119879 Λminus1

119899minus 119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909) (A2)

Since

Λminus1

119899minus 119866minus1

119899= Λminus1

119899119868 minus Λ 119899119866

minus1

119899

=120582

119899Λminus1

119899119863119879

119898119863119898119866minus1

119899

(A3)

we have Λminus1119899minus 119866minus1

119899infin

= 119874(1205821198702119899) and hence the

second term of right hand side of (A2) can be shown119874((119870119899)

2(120582119870119899)) From Lemma A1 it is easy to derive that

119862(119909 | 119899 119870 0) = 119874(119870(119870119899)) Consequently we have

119862 (119909 | 119899 119870 120582)

119870=119862 (119909 | 119899 119870 0)

119870+ 119874((

119870

119899)

2

(120582

119899))

=119862 (119909 | 119899 119870 0)

119870+ 119900 (

119870

119899)

(A4)

and this leads to Theorem 1

Proof of Theorem 2 It is sufficient to prove that int1198871198861198631119899(119909 |

blowast)119889119909 = 119874(1198702(1minus119898)) and

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874(1198702

119899)

(A5)

Since100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

119889119909

100381610038161003816100381610038161003816100381610038161003816

le sup119909isin[119886119887]

[B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

] (119886 minus 119887)

(A6)

we have

int

119887

119886

1198631119899 (119909 | blowast) 119889119909 = 119874 (119870

2(1minus119898)) (A7)

by using Lemma A1 and the fact that 119863119898blowastinfin = 119874(119870minus119898)

(see Yoshida [31])From the property of119861-spline function for (119870+119901) square

matrix 119860 there exists 119862 gt 0 such that

int

119887

119886

B(119909)119879119860B (119909) 119889119909 le 119860infin int119887

119886

B(119909)119879B (119909) 119889119909

le 119860infin119862

(A8)

In other words the asymptotic order of int119887119886B(119909)119879119860B(119909)119889119909

and that of 119860infin are the same Let 119860119899 = 119866minus1

119899119863119879

119898119863119898119866minus1

119899

Then since 119860119899infin = 1198702 we obtain

int

119887

119886

119862 (119909 | 119899 119870 0) 119889119909 = 119874(1198702

119899) (A9)

Furthermore because sup119886isin[119886119887]

119887119886(119909) = 119874(1) we have

119870minus(119901+1)

100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

119887119886 (119909)B(119909)119879119866minus1

119899119863119879

119898119863119898blowast119889119909

100381610038161003816100381610038161003816100381610038161003816

= 119874 (119870minus(119901+119898)

)

(A10)

and hence it can be shown that

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874 (119899minus11198702) (A11)

Together with (A7) and (A11) 120582opt = 119874(119899119870119898minus119901minus2

+ 1198702119898)

can be obtained Then rate of convergence of the penalizedspline estimator with 120582opt is 119874(119899

minus2(119901+2)(2119901 + 3)) when 119870 =

119874(1198991(2119901+3)

) which is detailed in Yoshida and Naito [22]

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

Journal of Probability and Statistics 11

Acknowledgment

The author also would like to thank two anonymous refereesfor their careful reading and comments to improve the paper

References

[1] F OrsquoSullivan ldquoA statistical perspective on ill-posed inverseproblemsrdquo Statistical Science vol 1 no 4 pp 502ndash527 1986

[2] P H C Eilers and B D Marx ldquoFlexible smoothing with 119861-splines and penaltiesrdquo Statistical Science vol 11 no 2 pp 89ndash121 1996 With comments and a rejoinder by the authors

[3] B D Marx and P H C Eilers ldquoDirect generalized additivemodeling with penalized likelihoodrdquo Computational Statisticsand Data Analysis vol 28 no 2 pp 193ndash209 1998

[4] D Ruppert M P Wand and R J Carroll SemiparametricRegression Cambridge University Press Cambridge UK 2003

[5] T Krivobokova ldquoSmoothing parameter selection in two frame-works for penalized splinesrdquo Journal of the Royal StatisticalSociety B vol 75 no 4 pp 725ndash741 2013

[6] P T Reiss and R T Ogden ldquoSmoothing parameter selection fora class of semiparametric linear modelsrdquo Journal of the RoyalStatistical Society B vol 71 no 2 pp 505ndash523 2009

[7] S N Wood ldquoModelling and smoothing parameter estimationwith multiple quadratic penaltiesrdquo Journal of the Royal Statisti-cal Society B vol 62 no 2 pp 413ndash428 2000

[8] S NWood ldquoStable and efficient multiple smoothing parameterestimation for generalized additive modelsrdquo Journal of theAmerican Statistical Association vol 99 no 467 pp 673ndash6862004

[9] S N Wood ldquoFast stable direct fitting and smoothness selectionfor generalized additive modelsrdquo Journal of the Royal StatisticalSociety B vol 70 no 3 pp 495ndash518 2008

[10] X Lin and D Zhang ldquoInference in generalized additive mixedmodels by using smoothing splinesrdquo Journal of the RoyalStatistical Society B vol 61 no 2 pp 381ndash400 1999

[11] M P Wand ldquoSmoothing and mixed modelsrdquo ComputationalStatistics vol 18 no 2 pp 223ndash249 2003

[12] L Fahrmeir T Kneib and S Lang ldquoPenalized structuredadditive regression for space-time data a Bayesian perspectiverdquoStatistica Sinica vol 14 no 3 pp 731ndash761 2004

[13] L Fahrmeir and T Kneib ldquoPropriety of posteriors in structuredadditive regression models theory and empirical evidencerdquoJournal of Statistical Planning and Inference vol 139 no 3 pp843ndash859 2009

[14] F Heinzl L Fahrmeir and T Kneib ldquoAdditive mixed modelswith Dirichlet process mixture and P-spline priorsrdquo Advancesin Statistical Analysis vol 96 no 1 pp 47ndash68 2012

[15] G Kauermann ldquoA note on smoothing parameter selection forpenalized spline smoothingrdquo Journal of Statistical Planning andInference vol 127 no 1-2 pp 53ndash69 2005

[16] P Hall and J D Opsomer ldquoTheory for penalised splineregressionrdquo Biometrika vol 92 no 1 pp 105ndash118 2005

[17] Y Li and D Ruppert ldquoOn the asymptotics of penalized splinesrdquoBiometrika vol 95 no 2 pp 415ndash436 2008

[18] G Claeskens T Krivobokova and J D Opsomer ldquoAsymptoticproperties of penalized spline estimatorsrdquo Biometrika vol 96no 3 pp 529ndash544 2009

[19] G Kauermann T Krivobokova and L Fahrmeir ldquoSomeasymptotic results on generalized penalized spline smoothingrdquo

Journal of the Royal Statistical Society B vol 71 no 2 pp 487ndash503 2009

[20] X Wang J Shen and D Ruppert ldquoOn the asymptotics ofpenalized spline smoothingrdquo Electronic Journal of Statistics vol5 pp 1ndash17 2011

[21] T Yoshida and K Naito ldquoAsymptotics for penalized additive 119861-spline regressionrdquo Journal of the Japan Statistical Society vol 42no 1 pp 81ndash107 2012

[22] T Yoshida and K Naito ldquoAsymptotics for penalized splinesin generalized additive modelsrdquo Journal of NonparametricStatistics vol 26 no 2 pp 269ndash289 2014

[23] L Xiao Y Li and D Ruppert ldquoFast bivariate 119875-splines thesandwich smootherrdquo Journal of the Royal Statistical Society Bvol 75 no 3 pp 577ndash599 2013

[24] D Ruppert S J Sheather and M P Wand ldquoAn effectivebandwidth selector for local least squares regressionrdquo Journal ofthe American Statistical Association vol 90 no 432 pp 1257ndash1270 1995

[25] M P Wand and M C Jones Kernel Smoothing Chapman ampHall London UK 1995

[26] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[27] D Ruppert ldquoSelecting the number of knots for penalizedsplinesrdquo Journal of Computational and Graphical Statistics vol11 no 4 pp 735ndash757 2002

[28] S Zhou X Shen and D A Wolfe ldquoLocal asymptotics forregression splines and confidence regionsrdquo The Annals ofStatistics vol 26 no 5 pp 1760ndash1782 1998

[29] G K Robinson ldquoThat BLUP is a good thing the estimation ofrandom effectsrdquo Statistical Science vol 6 no 1 pp 15ndash32 1991

[30] T J Hastie and R J Tibshirani Generalized Additive ModelsChapman amp Hall London UKI 1990

[31] T Yoshida ldquoAsymptotics for penalized spline estimators inquantile regressionrdquo Communications in Statistics-Theory andMethods 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: Research Article Direct Determination of Smoothing

Journal of Probability and Statistics 9

Table 4 Results of sample MISE of the penalized spline estimator For each direct method GCV and 120588 MISE1 MISE2 and MISE3 are thesample MISE of 1198911 1198912 and 1198913 respectively All entries for MISE are 102 times their actual values

Method 120588 = 0 120588 = 05

MISE1 MISE2 MISE3 MISE1 MISE2 MISE3Direct 0281 0209 0205 0916 0795 0972GCV 0687 0237 0390 0929 0857 1112REML 1204 0825 0621 2104 1832 2263

Table 5 Results of sample MSE of the selected smoothing parameter via direct method GCV and REML for 120588 = 0 and 120588 = 05 All entriesfor MSE are 102 times their actual values

120588 = 0 120588 = 05

True 1198911 1198912 1198913 1198911 1198912 1198913

Direct 0124 0042 0082 0312 0105 0428GCV 1315 0485 0213 0547 0352 0741REML 2531 1237 2656 1043 0846 1588

Table 6 Results of sample PSE of the penalized spline estimator Allentries for PSE are 102 times their actual values

119899 = 200 120588 = 0 120588 = 05

Direct 0281 0429GCV 0387 0467REML 1204 0925

To confirm the consistency of the direct method thesample MSE of 119895opt is calculated the same manner as thatgiven in univariate regression case For comparison thesample MSE of GCV and REML are obtained Here the true120582119895opt (119895 = 1 2 3) is defined in the same way as that forunivariate case In Table 5 the results for each 1198911 1198912 and 1198913and 120588 = 0 and 05 are shown We see from the results that thebehavior of the directmethod is goodThe sampleMSE of thedirect method is smaller than that of GCV and REML for all119895 For the random design 120588 = 05 such tendency can be alsoseen

Table 6 shows the sPSE of 119891(x119894) = 1198911(1199091198941) + 1198912(1199091198942) +1198913(1199091198943)with each smoothing parameter selectionmethodWesee from result that the efficiency of the direct method isguaranteed in the context of prediction accuracy

Finally we show the computational time (in second)required to construct the penalized spline estimatorwith eachmethod The computational times with the direct methodGCV and REML are 1187 s 12643 s and 435 s respectivelyWe see that the direct method is more efficient than othermethods in context of the computation (see Remark 11) Allof computations in the simulation were done using a softwareR and a computer with 340GHz CPU and 240GB memoryThough this is only few examples the direct method can beseen as one of the good methods to select the smoothingparameter

Remark 11 We calculate |sPSE minus (05)2| as the criteria toconfirm the prediction squared error The ordinal PSE isdefined as

PSE = 1119899

119899

sum

119894=1

119864 [119884119894 minus 119891 (119909119894)2

]

= 1205902+1

119899

119899

sum

119894=1

119864 [119891 (119909119894) minus 119891 (119909119894)2

]

(46)

where 119884119894rsquos are the test data The second term of the PSEis similar to MISE So it can be said that the sample PSEevaluates the accuracy of the variance part andMISE part Tosee the detail of the difference of the sample PSE between thedirect method and other method we calculated |sPSE minus 1205902|in this section

6 Discussion

In this paper we proposed a new direct method for deter-mining the smoothing parameter for a penalized splineestimator in a regression problemThe directmethod is basedon minimizing MISE of the penalized spline estimator Westudied the asymptotic property of the direct method Theasymptotic normality of the penalized spline estimator usingopt is theoretically guaranteed when the consistent estimatoris used as the pilot estimator to obtain opt In numericalstudy for the additive model the computational cost of thisdirect method is dramatically reduced when compared togrid search methods such as GCV Furthermore we find thatthe performance of the direct method is better than or at leastsimilar to that of other methods

The direct method will be developed for other regressionmodels such as varying-coefficient models Cox propor-tional hazards models single-index models and others ifthe asymptotic bias and variance of the penalized spline

10 Journal of Probability and Statistics

estimator are derived It is not limited to themean regressionit can be applied to quantile regression Actually Yoshida[31] has presented the asymptotic bias and variance of thepenalized spline estimator in univariate quantile regressionsFurthermore it is seen that improving the direct method isimportant for various situations and datasets In particularthe development of the determination of locally adaptive 120582 isan interesting avenue of further research

Appendix

We describe the technical details For the matrix 119860 = (119886119894119895)119894119895119860infin = max119894119895|119886119894119895| First to prove Theorems 1 and 2 weintroduce the fundamental property of penalized splines inthe following lemma

Lemma A1 Let 119860 = (119886119894119895)119894119895 be (119870 + 119901) matrix Supposethat 119870 = 119900(11989912) 120582 = 119900(1198991198701minus119898) and 119860infin = 119874(1) Then119860119866minus1

119899infin= 119874(119870) and 119860Λminus1

119899infin= 119874(119870)

The proof of Lemma A1 is shown in Zhou et al [28] andClaeskens et al [18] The repeated use of Lemma A1 yieldsthat the asymptotic order of the variance of the second termof (12) is 119874(1205822(119870119899)3) Actually the asymptotic order of thevariance of the second term of (12) can be calculated as

(120582

119899)

21205902

119899B(119909)119879Λminus1

119899119863119879

119898119863119898119866minus1

119899119863119879

119898119863119898Λminus1

119899B (119909)

= 119874(1205822

11989931198703)

(A1)

When 119870 = 119900(11989912) and 120582 = 119874(119899119870

1minus119898) 1205822(119870119899)3 =

119900(120582(119870119899)2) holds

Proof of Theorem 1 We write

119862 (119909 | 119899 119870 120582) =21205902

119899B(119909)119879119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909)

+21205902

119899B(119909)119879 Λminus1

119899minus 119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909) (A2)

Since

Λminus1

119899minus 119866minus1

119899= Λminus1

119899119868 minus Λ 119899119866

minus1

119899

=120582

119899Λminus1

119899119863119879

119898119863119898119866minus1

119899

(A3)

we have Λminus1119899minus 119866minus1

119899infin

= 119874(1205821198702119899) and hence the

second term of right hand side of (A2) can be shown119874((119870119899)

2(120582119870119899)) From Lemma A1 it is easy to derive that

119862(119909 | 119899 119870 0) = 119874(119870(119870119899)) Consequently we have

119862 (119909 | 119899 119870 120582)

119870=119862 (119909 | 119899 119870 0)

119870+ 119874((

119870

119899)

2

(120582

119899))

=119862 (119909 | 119899 119870 0)

119870+ 119900 (

119870

119899)

(A4)

and this leads to Theorem 1

Proof of Theorem 2 It is sufficient to prove that int1198871198861198631119899(119909 |

blowast)119889119909 = 119874(1198702(1minus119898)) and

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874(1198702

119899)

(A5)

Since100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

119889119909

100381610038161003816100381610038161003816100381610038161003816

le sup119909isin[119886119887]

[B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

] (119886 minus 119887)

(A6)

we have

int

119887

119886

1198631119899 (119909 | blowast) 119889119909 = 119874 (119870

2(1minus119898)) (A7)

by using Lemma A1 and the fact that 119863119898blowastinfin = 119874(119870minus119898)

(see Yoshida [31])From the property of119861-spline function for (119870+119901) square

matrix 119860 there exists 119862 gt 0 such that

int

119887

119886

B(119909)119879119860B (119909) 119889119909 le 119860infin int119887

119886

B(119909)119879B (119909) 119889119909

le 119860infin119862

(A8)

In other words the asymptotic order of int119887119886B(119909)119879119860B(119909)119889119909

and that of 119860infin are the same Let 119860119899 = 119866minus1

119899119863119879

119898119863119898119866minus1

119899

Then since 119860119899infin = 1198702 we obtain

int

119887

119886

119862 (119909 | 119899 119870 0) 119889119909 = 119874(1198702

119899) (A9)

Furthermore because sup119886isin[119886119887]

119887119886(119909) = 119874(1) we have

119870minus(119901+1)

100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

119887119886 (119909)B(119909)119879119866minus1

119899119863119879

119898119863119898blowast119889119909

100381610038161003816100381610038161003816100381610038161003816

= 119874 (119870minus(119901+119898)

)

(A10)

and hence it can be shown that

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874 (119899minus11198702) (A11)

Together with (A7) and (A11) 120582opt = 119874(119899119870119898minus119901minus2

+ 1198702119898)

can be obtained Then rate of convergence of the penalizedspline estimator with 120582opt is 119874(119899

minus2(119901+2)(2119901 + 3)) when 119870 =

119874(1198991(2119901+3)

) which is detailed in Yoshida and Naito [22]

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

Journal of Probability and Statistics 11

Acknowledgment

The author also would like to thank two anonymous refereesfor their careful reading and comments to improve the paper

References

[1] F OrsquoSullivan ldquoA statistical perspective on ill-posed inverseproblemsrdquo Statistical Science vol 1 no 4 pp 502ndash527 1986

[2] P H C Eilers and B D Marx ldquoFlexible smoothing with 119861-splines and penaltiesrdquo Statistical Science vol 11 no 2 pp 89ndash121 1996 With comments and a rejoinder by the authors

[3] B D Marx and P H C Eilers ldquoDirect generalized additivemodeling with penalized likelihoodrdquo Computational Statisticsand Data Analysis vol 28 no 2 pp 193ndash209 1998

[4] D Ruppert M P Wand and R J Carroll SemiparametricRegression Cambridge University Press Cambridge UK 2003

[5] T Krivobokova ldquoSmoothing parameter selection in two frame-works for penalized splinesrdquo Journal of the Royal StatisticalSociety B vol 75 no 4 pp 725ndash741 2013

[6] P T Reiss and R T Ogden ldquoSmoothing parameter selection fora class of semiparametric linear modelsrdquo Journal of the RoyalStatistical Society B vol 71 no 2 pp 505ndash523 2009

[7] S N Wood ldquoModelling and smoothing parameter estimationwith multiple quadratic penaltiesrdquo Journal of the Royal Statisti-cal Society B vol 62 no 2 pp 413ndash428 2000

[8] S NWood ldquoStable and efficient multiple smoothing parameterestimation for generalized additive modelsrdquo Journal of theAmerican Statistical Association vol 99 no 467 pp 673ndash6862004

[9] S N Wood ldquoFast stable direct fitting and smoothness selectionfor generalized additive modelsrdquo Journal of the Royal StatisticalSociety B vol 70 no 3 pp 495ndash518 2008

[10] X Lin and D Zhang ldquoInference in generalized additive mixedmodels by using smoothing splinesrdquo Journal of the RoyalStatistical Society B vol 61 no 2 pp 381ndash400 1999

[11] M P Wand ldquoSmoothing and mixed modelsrdquo ComputationalStatistics vol 18 no 2 pp 223ndash249 2003

[12] L Fahrmeir T Kneib and S Lang ldquoPenalized structuredadditive regression for space-time data a Bayesian perspectiverdquoStatistica Sinica vol 14 no 3 pp 731ndash761 2004

[13] L Fahrmeir and T Kneib ldquoPropriety of posteriors in structuredadditive regression models theory and empirical evidencerdquoJournal of Statistical Planning and Inference vol 139 no 3 pp843ndash859 2009

[14] F Heinzl L Fahrmeir and T Kneib ldquoAdditive mixed modelswith Dirichlet process mixture and P-spline priorsrdquo Advancesin Statistical Analysis vol 96 no 1 pp 47ndash68 2012

[15] G Kauermann ldquoA note on smoothing parameter selection forpenalized spline smoothingrdquo Journal of Statistical Planning andInference vol 127 no 1-2 pp 53ndash69 2005

[16] P Hall and J D Opsomer ldquoTheory for penalised splineregressionrdquo Biometrika vol 92 no 1 pp 105ndash118 2005

[17] Y Li and D Ruppert ldquoOn the asymptotics of penalized splinesrdquoBiometrika vol 95 no 2 pp 415ndash436 2008

[18] G Claeskens T Krivobokova and J D Opsomer ldquoAsymptoticproperties of penalized spline estimatorsrdquo Biometrika vol 96no 3 pp 529ndash544 2009

[19] G Kauermann T Krivobokova and L Fahrmeir ldquoSomeasymptotic results on generalized penalized spline smoothingrdquo

Journal of the Royal Statistical Society B vol 71 no 2 pp 487ndash503 2009

[20] X Wang J Shen and D Ruppert ldquoOn the asymptotics ofpenalized spline smoothingrdquo Electronic Journal of Statistics vol5 pp 1ndash17 2011

[21] T Yoshida and K Naito ldquoAsymptotics for penalized additive 119861-spline regressionrdquo Journal of the Japan Statistical Society vol 42no 1 pp 81ndash107 2012

[22] T Yoshida and K Naito ldquoAsymptotics for penalized splinesin generalized additive modelsrdquo Journal of NonparametricStatistics vol 26 no 2 pp 269ndash289 2014

[23] L Xiao Y Li and D Ruppert ldquoFast bivariate 119875-splines thesandwich smootherrdquo Journal of the Royal Statistical Society Bvol 75 no 3 pp 577ndash599 2013

[24] D Ruppert S J Sheather and M P Wand ldquoAn effectivebandwidth selector for local least squares regressionrdquo Journal ofthe American Statistical Association vol 90 no 432 pp 1257ndash1270 1995

[25] M P Wand and M C Jones Kernel Smoothing Chapman ampHall London UK 1995

[26] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[27] D Ruppert ldquoSelecting the number of knots for penalizedsplinesrdquo Journal of Computational and Graphical Statistics vol11 no 4 pp 735ndash757 2002

[28] S Zhou X Shen and D A Wolfe ldquoLocal asymptotics forregression splines and confidence regionsrdquo The Annals ofStatistics vol 26 no 5 pp 1760ndash1782 1998

[29] G K Robinson ldquoThat BLUP is a good thing the estimation ofrandom effectsrdquo Statistical Science vol 6 no 1 pp 15ndash32 1991

[30] T J Hastie and R J Tibshirani Generalized Additive ModelsChapman amp Hall London UKI 1990

[31] T Yoshida ldquoAsymptotics for penalized spline estimators inquantile regressionrdquo Communications in Statistics-Theory andMethods 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 10: Research Article Direct Determination of Smoothing

10 Journal of Probability and Statistics

estimator are derived It is not limited to themean regressionit can be applied to quantile regression Actually Yoshida[31] has presented the asymptotic bias and variance of thepenalized spline estimator in univariate quantile regressionsFurthermore it is seen that improving the direct method isimportant for various situations and datasets In particularthe development of the determination of locally adaptive 120582 isan interesting avenue of further research

Appendix

We describe the technical details For the matrix 119860 = (119886119894119895)119894119895119860infin = max119894119895|119886119894119895| First to prove Theorems 1 and 2 weintroduce the fundamental property of penalized splines inthe following lemma

Lemma A1 Let 119860 = (119886119894119895)119894119895 be (119870 + 119901) matrix Supposethat 119870 = 119900(11989912) 120582 = 119900(1198991198701minus119898) and 119860infin = 119874(1) Then119860119866minus1

119899infin= 119874(119870) and 119860Λminus1

119899infin= 119874(119870)

The proof of Lemma A1 is shown in Zhou et al [28] andClaeskens et al [18] The repeated use of Lemma A1 yieldsthat the asymptotic order of the variance of the second termof (12) is 119874(1205822(119870119899)3) Actually the asymptotic order of thevariance of the second term of (12) can be calculated as

(120582

119899)

21205902

119899B(119909)119879Λminus1

119899119863119879

119898119863119898119866minus1

119899119863119879

119898119863119898Λminus1

119899B (119909)

= 119874(1205822

11989931198703)

(A1)

When 119870 = 119900(11989912) and 120582 = 119874(119899119870

1minus119898) 1205822(119870119899)3 =

119900(120582(119870119899)2) holds

Proof of Theorem 1 We write

119862 (119909 | 119899 119870 120582) =21205902

119899B(119909)119879119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909)

+21205902

119899B(119909)119879 Λminus1

119899minus 119866minus1

119899119863119879

119898119863119898119866minus1

119899B (119909) (A2)

Since

Λminus1

119899minus 119866minus1

119899= Λminus1

119899119868 minus Λ 119899119866

minus1

119899

=120582

119899Λminus1

119899119863119879

119898119863119898119866minus1

119899

(A3)

we have Λminus1119899minus 119866minus1

119899infin

= 119874(1205821198702119899) and hence the

second term of right hand side of (A2) can be shown119874((119870119899)

2(120582119870119899)) From Lemma A1 it is easy to derive that

119862(119909 | 119899 119870 0) = 119874(119870(119870119899)) Consequently we have

119862 (119909 | 119899 119870 120582)

119870=119862 (119909 | 119899 119870 0)

119870+ 119874((

119870

119899)

2

(120582

119899))

=119862 (119909 | 119899 119870 0)

119870+ 119900 (

119870

119899)

(A4)

and this leads to Theorem 1

Proof of Theorem 2 It is sufficient to prove that int1198871198861198631119899(119909 |

blowast)119889119909 = 119874(1198702(1minus119898)) and

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874(1198702

119899)

(A5)

Since100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

119889119909

100381610038161003816100381610038161003816100381610038161003816

le sup119909isin[119886119887]

[B(119909)119879119866minus1119899119863119879

119898119863119898blowast2

] (119886 minus 119887)

(A6)

we have

int

119887

119886

1198631119899 (119909 | blowast) 119889119909 = 119874 (119870

2(1minus119898)) (A7)

by using Lemma A1 and the fact that 119863119898blowastinfin = 119874(119870minus119898)

(see Yoshida [31])From the property of119861-spline function for (119870+119901) square

matrix 119860 there exists 119862 gt 0 such that

int

119887

119886

B(119909)119879119860B (119909) 119889119909 le 119860infin int119887

119886

B(119909)119879B (119909) 119889119909

le 119860infin119862

(A8)

In other words the asymptotic order of int119887119886B(119909)119879119860B(119909)119889119909

and that of 119860infin are the same Let 119860119899 = 119866minus1

119899119863119879

119898119863119898119866minus1

119899

Then since 119860119899infin = 1198702 we obtain

int

119887

119886

119862 (119909 | 119899 119870 0) 119889119909 = 119874(1198702

119899) (A9)

Furthermore because sup119886isin[119886119887]

119887119886(119909) = 119874(1) we have

119870minus(119901+1)

100381610038161003816100381610038161003816100381610038161003816

int

119887

119886

119887119886 (119909)B(119909)119879119866minus1

119899119863119879

119898119863119898blowast119889119909

100381610038161003816100381610038161003816100381610038161003816

= 119874 (119870minus(119901+119898)

)

(A10)

and hence it can be shown that

int

119887

119886

1198632119899 (119909 | 119891(119901+1)

blowast) 119889119909 = 119874 (119870minus(119901+119898)) + 119874 (119899minus11198702) (A11)

Together with (A7) and (A11) 120582opt = 119874(119899119870119898minus119901minus2

+ 1198702119898)

can be obtained Then rate of convergence of the penalizedspline estimator with 120582opt is 119874(119899

minus2(119901+2)(2119901 + 3)) when 119870 =

119874(1198991(2119901+3)

) which is detailed in Yoshida and Naito [22]

Conflict of Interests

The author declares that there is no conflict of interestsregarding the publication of this paper

Journal of Probability and Statistics 11

Acknowledgment

The author also would like to thank two anonymous refereesfor their careful reading and comments to improve the paper

References

[1] F OrsquoSullivan ldquoA statistical perspective on ill-posed inverseproblemsrdquo Statistical Science vol 1 no 4 pp 502ndash527 1986

[2] P H C Eilers and B D Marx ldquoFlexible smoothing with 119861-splines and penaltiesrdquo Statistical Science vol 11 no 2 pp 89ndash121 1996 With comments and a rejoinder by the authors

[3] B D Marx and P H C Eilers ldquoDirect generalized additivemodeling with penalized likelihoodrdquo Computational Statisticsand Data Analysis vol 28 no 2 pp 193ndash209 1998

[4] D Ruppert M P Wand and R J Carroll SemiparametricRegression Cambridge University Press Cambridge UK 2003

[5] T Krivobokova ldquoSmoothing parameter selection in two frame-works for penalized splinesrdquo Journal of the Royal StatisticalSociety B vol 75 no 4 pp 725ndash741 2013

[6] P T Reiss and R T Ogden ldquoSmoothing parameter selection fora class of semiparametric linear modelsrdquo Journal of the RoyalStatistical Society B vol 71 no 2 pp 505ndash523 2009

[7] S N Wood ldquoModelling and smoothing parameter estimationwith multiple quadratic penaltiesrdquo Journal of the Royal Statisti-cal Society B vol 62 no 2 pp 413ndash428 2000

[8] S NWood ldquoStable and efficient multiple smoothing parameterestimation for generalized additive modelsrdquo Journal of theAmerican Statistical Association vol 99 no 467 pp 673ndash6862004

[9] S N Wood ldquoFast stable direct fitting and smoothness selectionfor generalized additive modelsrdquo Journal of the Royal StatisticalSociety B vol 70 no 3 pp 495ndash518 2008

[10] X Lin and D Zhang ldquoInference in generalized additive mixedmodels by using smoothing splinesrdquo Journal of the RoyalStatistical Society B vol 61 no 2 pp 381ndash400 1999

[11] M P Wand ldquoSmoothing and mixed modelsrdquo ComputationalStatistics vol 18 no 2 pp 223ndash249 2003

[12] L Fahrmeir T Kneib and S Lang ldquoPenalized structuredadditive regression for space-time data a Bayesian perspectiverdquoStatistica Sinica vol 14 no 3 pp 731ndash761 2004

[13] L Fahrmeir and T Kneib ldquoPropriety of posteriors in structuredadditive regression models theory and empirical evidencerdquoJournal of Statistical Planning and Inference vol 139 no 3 pp843ndash859 2009

[14] F Heinzl L Fahrmeir and T Kneib ldquoAdditive mixed modelswith Dirichlet process mixture and P-spline priorsrdquo Advancesin Statistical Analysis vol 96 no 1 pp 47ndash68 2012

[15] G Kauermann ldquoA note on smoothing parameter selection forpenalized spline smoothingrdquo Journal of Statistical Planning andInference vol 127 no 1-2 pp 53ndash69 2005

[16] P Hall and J D Opsomer ldquoTheory for penalised splineregressionrdquo Biometrika vol 92 no 1 pp 105ndash118 2005

[17] Y Li and D Ruppert ldquoOn the asymptotics of penalized splinesrdquoBiometrika vol 95 no 2 pp 415ndash436 2008

[18] G Claeskens T Krivobokova and J D Opsomer ldquoAsymptoticproperties of penalized spline estimatorsrdquo Biometrika vol 96no 3 pp 529ndash544 2009

[19] G Kauermann T Krivobokova and L Fahrmeir ldquoSomeasymptotic results on generalized penalized spline smoothingrdquo

Journal of the Royal Statistical Society B vol 71 no 2 pp 487ndash503 2009

[20] X Wang J Shen and D Ruppert ldquoOn the asymptotics ofpenalized spline smoothingrdquo Electronic Journal of Statistics vol5 pp 1ndash17 2011

[21] T Yoshida and K Naito ldquoAsymptotics for penalized additive 119861-spline regressionrdquo Journal of the Japan Statistical Society vol 42no 1 pp 81ndash107 2012

[22] T Yoshida and K Naito ldquoAsymptotics for penalized splinesin generalized additive modelsrdquo Journal of NonparametricStatistics vol 26 no 2 pp 269ndash289 2014

[23] L Xiao Y Li and D Ruppert ldquoFast bivariate 119875-splines thesandwich smootherrdquo Journal of the Royal Statistical Society Bvol 75 no 3 pp 577ndash599 2013

[24] D Ruppert S J Sheather and M P Wand ldquoAn effectivebandwidth selector for local least squares regressionrdquo Journal ofthe American Statistical Association vol 90 no 432 pp 1257ndash1270 1995

[25] M P Wand and M C Jones Kernel Smoothing Chapman ampHall London UK 1995

[26] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[27] D Ruppert ldquoSelecting the number of knots for penalizedsplinesrdquo Journal of Computational and Graphical Statistics vol11 no 4 pp 735ndash757 2002

[28] S Zhou X Shen and D A Wolfe ldquoLocal asymptotics forregression splines and confidence regionsrdquo The Annals ofStatistics vol 26 no 5 pp 1760ndash1782 1998

[29] G K Robinson ldquoThat BLUP is a good thing the estimation ofrandom effectsrdquo Statistical Science vol 6 no 1 pp 15ndash32 1991

[30] T J Hastie and R J Tibshirani Generalized Additive ModelsChapman amp Hall London UKI 1990

[31] T Yoshida ldquoAsymptotics for penalized spline estimators inquantile regressionrdquo Communications in Statistics-Theory andMethods 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 11: Research Article Direct Determination of Smoothing

Journal of Probability and Statistics 11

Acknowledgment

The author also would like to thank two anonymous refereesfor their careful reading and comments to improve the paper

References

[1] F OrsquoSullivan ldquoA statistical perspective on ill-posed inverseproblemsrdquo Statistical Science vol 1 no 4 pp 502ndash527 1986

[2] P H C Eilers and B D Marx ldquoFlexible smoothing with 119861-splines and penaltiesrdquo Statistical Science vol 11 no 2 pp 89ndash121 1996 With comments and a rejoinder by the authors

[3] B D Marx and P H C Eilers ldquoDirect generalized additivemodeling with penalized likelihoodrdquo Computational Statisticsand Data Analysis vol 28 no 2 pp 193ndash209 1998

[4] D Ruppert M P Wand and R J Carroll SemiparametricRegression Cambridge University Press Cambridge UK 2003

[5] T Krivobokova ldquoSmoothing parameter selection in two frame-works for penalized splinesrdquo Journal of the Royal StatisticalSociety B vol 75 no 4 pp 725ndash741 2013

[6] P T Reiss and R T Ogden ldquoSmoothing parameter selection fora class of semiparametric linear modelsrdquo Journal of the RoyalStatistical Society B vol 71 no 2 pp 505ndash523 2009

[7] S N Wood ldquoModelling and smoothing parameter estimationwith multiple quadratic penaltiesrdquo Journal of the Royal Statisti-cal Society B vol 62 no 2 pp 413ndash428 2000

[8] S NWood ldquoStable and efficient multiple smoothing parameterestimation for generalized additive modelsrdquo Journal of theAmerican Statistical Association vol 99 no 467 pp 673ndash6862004

[9] S N Wood ldquoFast stable direct fitting and smoothness selectionfor generalized additive modelsrdquo Journal of the Royal StatisticalSociety B vol 70 no 3 pp 495ndash518 2008

[10] X Lin and D Zhang ldquoInference in generalized additive mixedmodels by using smoothing splinesrdquo Journal of the RoyalStatistical Society B vol 61 no 2 pp 381ndash400 1999

[11] M P Wand ldquoSmoothing and mixed modelsrdquo ComputationalStatistics vol 18 no 2 pp 223ndash249 2003

[12] L Fahrmeir T Kneib and S Lang ldquoPenalized structuredadditive regression for space-time data a Bayesian perspectiverdquoStatistica Sinica vol 14 no 3 pp 731ndash761 2004

[13] L Fahrmeir and T Kneib ldquoPropriety of posteriors in structuredadditive regression models theory and empirical evidencerdquoJournal of Statistical Planning and Inference vol 139 no 3 pp843ndash859 2009

[14] F Heinzl L Fahrmeir and T Kneib ldquoAdditive mixed modelswith Dirichlet process mixture and P-spline priorsrdquo Advancesin Statistical Analysis vol 96 no 1 pp 47ndash68 2012

[15] G Kauermann ldquoA note on smoothing parameter selection forpenalized spline smoothingrdquo Journal of Statistical Planning andInference vol 127 no 1-2 pp 53ndash69 2005

[16] P Hall and J D Opsomer ldquoTheory for penalised splineregressionrdquo Biometrika vol 92 no 1 pp 105ndash118 2005

[17] Y Li and D Ruppert ldquoOn the asymptotics of penalized splinesrdquoBiometrika vol 95 no 2 pp 415ndash436 2008

[18] G Claeskens T Krivobokova and J D Opsomer ldquoAsymptoticproperties of penalized spline estimatorsrdquo Biometrika vol 96no 3 pp 529ndash544 2009

[19] G Kauermann T Krivobokova and L Fahrmeir ldquoSomeasymptotic results on generalized penalized spline smoothingrdquo

Journal of the Royal Statistical Society B vol 71 no 2 pp 487ndash503 2009

[20] X Wang J Shen and D Ruppert ldquoOn the asymptotics ofpenalized spline smoothingrdquo Electronic Journal of Statistics vol5 pp 1ndash17 2011

[21] T Yoshida and K Naito ldquoAsymptotics for penalized additive 119861-spline regressionrdquo Journal of the Japan Statistical Society vol 42no 1 pp 81ndash107 2012

[22] T Yoshida and K Naito ldquoAsymptotics for penalized splinesin generalized additive modelsrdquo Journal of NonparametricStatistics vol 26 no 2 pp 269ndash289 2014

[23] L Xiao Y Li and D Ruppert ldquoFast bivariate 119875-splines thesandwich smootherrdquo Journal of the Royal Statistical Society Bvol 75 no 3 pp 577ndash599 2013

[24] D Ruppert S J Sheather and M P Wand ldquoAn effectivebandwidth selector for local least squares regressionrdquo Journal ofthe American Statistical Association vol 90 no 432 pp 1257ndash1270 1995

[25] M P Wand and M C Jones Kernel Smoothing Chapman ampHall London UK 1995

[26] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[27] D Ruppert ldquoSelecting the number of knots for penalizedsplinesrdquo Journal of Computational and Graphical Statistics vol11 no 4 pp 735ndash757 2002

[28] S Zhou X Shen and D A Wolfe ldquoLocal asymptotics forregression splines and confidence regionsrdquo The Annals ofStatistics vol 26 no 5 pp 1760ndash1782 1998

[29] G K Robinson ldquoThat BLUP is a good thing the estimation ofrandom effectsrdquo Statistical Science vol 6 no 1 pp 15ndash32 1991

[30] T J Hastie and R J Tibshirani Generalized Additive ModelsChapman amp Hall London UKI 1990

[31] T Yoshida ldquoAsymptotics for penalized spline estimators inquantile regressionrdquo Communications in Statistics-Theory andMethods 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 12: Research Article Direct Determination of Smoothing

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of