From linearity to nonlinear additive spline modeling in
Partial Least-Squares regression
Jean-François Durand
Montpellier II University
Scuola della Società Italiana di Statistica, Capua 2004/09/15
Main effects Linear Partial Least-Squares (PLSL)
Learning data matrices: X (n×p), r = rank(X), and Y (n×q)
x_1, …, x_p : p predictors (continuous or categorical)
y_1, …, y_q : q responses (continuous or categorical)
continuous responses : regression model
q indicator variables : classification model
All variables are standardized with respect to D = diag(p_1, …, p_n)
The k latent variables t_1, …, t_k are built by the algorithm:
(1) t = Xw = w_1 x_1 + … + w_p x_p maximizes cov(t, Y) subject to ||w||_2 = 1
(2) Once t is obtained, « partial » regressions are made,
X̂ = OLS(X, t) and Ŷ = OLS(Y, t),
and the next t is computed on the remaining information
X ← X − X̂,  Y ← Y − Ŷ
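The two steps above can be sketched in numpy for a single centered response y (a hedged illustration, not the talk's implementation; the multi-response case takes w as the dominant eigenvector of X'YY'X instead of X'y):

```python
import numpy as np

def pls_latent_variables(X, y, k):
    """Sketch of the PLSL extraction loop for one response y.

    Step (1): w = X'y / ||X'y|| maximizes cov(Xw, y) under ||w||_2 = 1.
    Step (2): X and y are deflated on t (« partial » OLS regressions)
    before the next latent variable is extracted.
    """
    X = X.astype(float).copy()
    y = y.astype(float).copy()
    T = []
    for _ in range(k):
        w = X.T @ y
        w /= np.linalg.norm(w)               # ||w||_2 = 1
        t = X @ w                            # latent variable
        # partial regressions: project X and y on t, keep residuals
        X -= np.outer(t, t @ X) / (t @ t)
        y -= t * (t @ y) / (t @ t)
        T.append(t)
    return np.column_stack(T)
```

Because each deflation removes the t-component of X, successive latent variables come out mutually orthogonal.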
OLS model on the k latent variables:
ŷ_j = ĉ_{1,j} t_1 + … + ĉ_{k,j} t_k
which rewrites, in the original predictors, as
y_j = β̂^k_{1,j} x_1 + … + β̂^k_{p,j} x_p + ε_j
β̂^k_{i,j} x_i, the « coordinate » linear function of x_i, is the main effect of x_i on the response y_j.
To summarize : PLSL(X, Y)
The dimension of the model : k, chosen by Cross-Validation (CV or GCV)
If k = r, PLSL(X, Y) = OLS(X, Y)
If Y = X, PLSL(X, Y=X) = PCA(X)
Pruning step : Variable subset selection (CV or GCV)
Maps of the observations
Main effects Partial Least-Squares Splines (PLSS)
Additive model through k latent variables:
y_j = ŝ^k_{1,j}(x_1) + … + ŝ^k_{p,j}(x_p) + ε_j
ŝ^k_{i,j}(x_i), the « coordinate » spline function of x_i, is the main effect of x_i on the response y_j : a spline function.
To summarize : PLSS(X, Y) = PLSL(B, Y), where B = spline coding matrix of X
t_1, …, t_k : principal components maps
Pruning step : parsimonious models by selecting main effects according to the range of the spline functions. Validation of the new models : CV or GCV
Tuning parameters:
The PLS dimension k (CV or GCV)
The spline space for each predictor:
the degree d
the « knots » : the number K and the locations
Dimension of the spline space : d + 1 + K
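A minimal sketch of one spline coding with the stated dimension d + 1 + K: a truncated power basis (the talk's B-spline coding spans the same space; knot locations below are invented):

```python
import numpy as np

def truncated_power_basis(x, degree=3, knots=(0.25, 0.5, 0.75)):
    """Spline coding of one predictor: columns 1, x, ..., x^d plus
    (x - knot)_+^d for each of the K knots; dimension = d + 1 + K."""
    cols = [x ** j for j in range(degree + 1)]
    cols += [np.clip(x - t, 0.0, None) ** degree for t in knots]
    return np.column_stack(cols)

x = np.linspace(0, 1, 50)
B = truncated_power_basis(x, degree=3, knots=(0.25, 0.5, 0.75))
assert B.shape == (50, 3 + 1 + 3)   # d + 1 + K columns
```

Stacking such blocks for all p predictors gives the spline coding matrix B on which PLSS(X, Y) = PLSL(B, Y) operates.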
Advantages of PLSS
against collinearity of predictors
against a small ratio #observations / #predictors
easy to interpret the main-effect spline functions
Multivariate Additive PLS Splines : MAPLSS (bivariate interactions)
y_j = ŝ^k_{1,j}(x_1) + … + ŝ^k_{p,j}(x_p) + Σ_{(i,i')∈I} ŝ^k_{(i,i'),j}(x_i, x_{i'}) + ε_j
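The bivariate terms are coded by crossing the spline bases of the two predictors; a hedged numpy sketch of such a tensor-product interaction block (column counts below are illustrative):

```python
import numpy as np

def interaction_block(B_i, B_j):
    """Bivariate interaction coding for (x_i, x_i'): all pairwise
    products of the spline basis columns of the two predictors."""
    n = B_i.shape[0]
    # result column a*B_j.shape[1] + b holds B_i[:, a] * B_j[:, b]
    return np.einsum('na,nb->nab', B_i, B_j).reshape(n, -1)

# two predictors coded with 4 and 3 spline columns
rng = np.random.default_rng(1)
Bi = rng.normal(size=(20, 4))
Bj = rng.normal(size=(20, 3))
BI = interaction_block(Bi, Bj)
assert BI.shape == (20, 12)
```

Appending these blocks to the main-effects coding yields the enlarged matrix B used in MAPLSS(X, Y) = PLSL(B, Y).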
Model cast in the ANOVA decomposition : ANOVA spline functions
The curse of dimensionality
The price of nonlinearity : expansion of the dimension of B
MAPLSS(X,Y) = PLSL(B,Y)
B = spline coding matrix of X with interactions
Example : p predictors, p(p − 1)/2 possible interactions
spline dimension = 10 for each predictor
#columns of the design matrix B:
Linear PLS : p
PLSS (main effects) : 10p
MAPLSS (all bivariate interactions) : 10p + 10² p(p − 1)/2 ≈ (10p)²/2
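The column counts above are easy to check; a small helper (hypothetical, for illustration):

```python
# Column counts of the design matrix B when each predictor is coded
# with a spline basis of dimension 10 (as in the slide's example)
def n_columns(p, spline_dim=10):
    linear_pls = p
    plss_main = spline_dim * p
    maplss_all = spline_dim * p + spline_dim**2 * p * (p - 1) // 2
    return linear_pls, plss_main, maplss_all

print(n_columns(10))   # (10, 100, 4600)
```

For p = 10 the full-interaction coding already needs 4600 columns, close to the (10p)²/2 = 5000 approximation, which is why non-influential interactions must be eliminated.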
Necessity of eliminating non-influential interactions
1) Automatic selection of candidate interactions :
each interaction i is separately added to the main effects model m and evaluated.
Denote CV_m(k) = PRESS(m, k) or CV_m(k) = GCV(m, k), and

CRIT(k) = [R²_{m,i}(k) − R²_m(k)] / R²_m(k) + [CV_m(k) − CV_{m,i}(k)] / CV_m(k)

Rule : order the interactions decreasingly by CRIT, refuse interaction i if CRIT(k) < 0.
2) Add the ordered candidates step-by-step to the main effects model, and accept a model if it significantly improves CV.
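The ranking rule can be sketched as follows (the R² and CV values below are invented for illustration; CRIT rewards a relative gain in fit and a relative drop in CV error):

```python
def crit(r2_m, r2_mi, cv_m, cv_mi):
    """CRIT for candidate interaction i added to main-effects model m:
    relative R² gain plus relative CV (PRESS or GCV) improvement."""
    return (r2_mi - r2_m) / r2_m + (cv_m - cv_mi) / cv_m

# hypothetical evaluations of three candidate interactions (i, i')
candidates = {(1, 2): crit(0.70, 0.80, 1.0, 0.80),
              (1, 3): crit(0.70, 0.71, 1.0, 1.10),
              (2, 3): crit(0.70, 0.75, 1.0, 0.95)}

# order decreasingly; refuse an interaction if its CRIT is negative
ordered = sorted(candidates, key=candidates.get, reverse=True)
kept = [i for i in ordered if candidates[i] >= 0]
```

Here (1, 3) barely improves R² while worsening CV, so its CRIT is negative and it is refused before the step-by-step forward addition.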
3) Pruning step : Selection of main effects and interactions according to the range of the ANOVA functions (CV/GCV)
Advantages of MAPLSS :
inherits the advantages of PLSL and PLSS
captures the most influential bivariate interactions
easily interpretable ANOVA function plots
Disadvantages of MAPLSS :
no higher interactions
no automatic selection of spline parameters
Bibliography
J. F. Durand. Local Polynomial Additive Regression through PLS and Splines: PLSS, Chemometrics and Intelligent Laboratory Systems 58, 235-246, 2001.
J. F. Durand and R. Lombardo. Interactions terms in nonlinear PLS via additive spline transformations. In « Between Data Science and Applied Data Analysis », Studies in Classification, Data Analysis, and Knowledge Organization, Eds M. Schader, W. Gaul and M. Vichi, Springer, 22-29, 2003.