From linearity to nonlinear additive spline modeling in
Partial Least-Squares regression
Jean-François Durand
Montpellier II University
Scuola della Società Italiana di Statistica, Capua 2004/09/15
Main effects Linear Partial Least-Squares (PLSL)
Learning data matrices: X (n×p), r = rank(X), and Y (n×q)
x_1, …, x_p : p predictors (continuous or categorical)
y_1, …, y_q : q responses (continuous or categorical)
continuous responses : regression model
q indicator variables : classification model
All variables are standardized with respect to D = diag(p_1, …, p_n)
The k latent variables t_1, …, t_k are built by the algorithm:
(1) t = Xw = w_1 x_1 + … + w_p x_p maximizes cov(t, Y) subject to ||w||_2 = 1
(2) Once t is obtained, « partial » regressions are made,
X̂ = OLS(X, t) and Ŷ = OLS(Y, t),
and the next t is computed on the remaining information
X ← X − X̂,  Y ← Y − Ŷ
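The two steps above can be sketched in numpy for a single centered response y (a hedged illustration, not the talk's implementation; the multi-response case takes w as the dominant eigenvector of X'YY'X instead of X'y):

```python
import numpy as np

def pls_latent_variables(X, y, k):
    """Sketch of the PLSL extraction loop for one response y.

    Step (1): w = X'y / ||X'y|| maximizes cov(Xw, y) under ||w||_2 = 1.
    Step (2): X and y are deflated on t (« partial » OLS regressions)
    before the next latent variable is extracted.
    """
    X = X.astype(float).copy()
    y = y.astype(float).copy()
    T = []
    for _ in range(k):
        w = X.T @ y
        w /= np.linalg.norm(w)               # ||w||_2 = 1
        t = X @ w                            # latent variable
        # partial regressions: project X and y on t, keep residuals
        X -= np.outer(t, t @ X) / (t @ t)
        y -= t * (t @ y) / (t @ t)
        T.append(t)
    return np.column_stack(T)
```

Because each deflation removes the t-component of X, successive latent variables come out mutually orthogonal.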
OLS model on the k latent variables:
ŷ_j = ĉ_{1,j} t_1 + … + ĉ_{k,j} t_k
which rewrites, in the original predictors, as
y_j = β̂^k_{1,j} x_1 + … + β̂^k_{p,j} x_p + ε_j
β̂^k_{i,j} x_i, the « coordinate » linear function of x_i, is the main effect of x_i on the response y_j.
To summarize : PLSL(X, Y)
The dimension of the model : k, chosen by Cross-Validation (CV or GCV)
If k = r, PLSL(X, Y) = OLS(X, Y)
If Y = X, PLSL(X, Y=X) = PCA(X)
Pruning step : Variable subset selection (CV or GCV)
Maps of the observations
Main effects Partial Least-Squares Splines (PLSS)
Additive model through k latent variables:
y_j = ŝ^k_{1,j}(x_1) + … + ŝ^k_{p,j}(x_p) + ε_j
ŝ^k_{i,j}(x_i), the « coordinate » spline function of x_i, is the main effect of x_i on the response y_j : a spline function.
To summarize : PLSS(X, Y) = PLSL(B, Y), where B = spline coding matrix of X
t_1, …, t_k : principal components maps
Pruning step : parsimonious models by selecting main effects according to the range of the spline functions. Validation of the new models : CV or GCV
Tuning parameters:
The PLS dimension k (CV or GCV)
The spline space for each predictor:
the degree d
the « knots » : the number K and the locations
Dimension of the spline space : d + 1 + K
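A minimal sketch of one spline coding with the stated dimension d + 1 + K: a truncated power basis (the talk's B-spline coding spans the same space; knot locations below are invented):

```python
import numpy as np

def truncated_power_basis(x, degree=3, knots=(0.25, 0.5, 0.75)):
    """Spline coding of one predictor: columns 1, x, ..., x^d plus
    (x - knot)_+^d for each of the K knots; dimension = d + 1 + K."""
    cols = [x ** j for j in range(degree + 1)]
    cols += [np.clip(x - t, 0.0, None) ** degree for t in knots]
    return np.column_stack(cols)

x = np.linspace(0, 1, 50)
B = truncated_power_basis(x, degree=3, knots=(0.25, 0.5, 0.75))
assert B.shape == (50, 3 + 1 + 3)   # d + 1 + K columns
```

Stacking such blocks for all p predictors gives the spline coding matrix B on which PLSS(X, Y) = PLSL(B, Y) operates.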
Advantages of PLSS
against collinearity of predictors
against a small ratio #observations / #predictors
easy to interpret the main-effect spline functions
Multivariate Additive PLS Splines : MAPLSS (bivariate interactions)
y_j = ŝ^k_{1,j}(x_1) + … + ŝ^k_{p,j}(x_p) + Σ_{(i,i')∈I} ŝ^k_{(i,i'),j}(x_i, x_{i'}) + ε_j
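The bivariate terms are coded by crossing the spline bases of the two predictors; a hedged numpy sketch of such a tensor-product interaction block (column counts below are illustrative):

```python
import numpy as np

def interaction_block(B_i, B_j):
    """Bivariate interaction coding for (x_i, x_i'): all pairwise
    products of the spline basis columns of the two predictors."""
    n = B_i.shape[0]
    # result column a*B_j.shape[1] + b holds B_i[:, a] * B_j[:, b]
    return np.einsum('na,nb->nab', B_i, B_j).reshape(n, -1)

# two predictors coded with 4 and 3 spline columns
rng = np.random.default_rng(1)
Bi = rng.normal(size=(20, 4))
Bj = rng.normal(size=(20, 3))
BI = interaction_block(Bi, Bj)
assert BI.shape == (20, 12)
```

Appending these blocks to the main-effects coding yields the enlarged matrix B used in MAPLSS(X, Y) = PLSL(B, Y).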
Model cast in the ANOVA decomposition : ANOVA spline functions
The curse of dimensionality
The price of nonlinearity : expansion of the dimension of B
MAPLSS(X,Y) = PLSL(B,Y)
B = spline coding matrix of X with interactions
Example : p predictors, p(p − 1)/2 possible interactions
spline dimension = 10 for each predictor
#columns of the design matrix B:
Linear PLS : p
PLSS (main effects) : 10p
MAPLSS (all bivariate interactions) : 10p + 10² p(p − 1)/2 ≈ (10p)²/2
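The column counts above are easy to check; a small helper (hypothetical, for illustration):

```python
# Column counts of the design matrix B when each predictor is coded
# with a spline basis of dimension 10 (as in the slide's example)
def n_columns(p, spline_dim=10):
    linear_pls = p
    plss_main = spline_dim * p
    maplss_all = spline_dim * p + spline_dim**2 * p * (p - 1) // 2
    return linear_pls, plss_main, maplss_all

print(n_columns(10))   # (10, 100, 4600)
```

For p = 10 the full-interaction coding already needs 4600 columns, close to the (10p)²/2 = 5000 approximation, which is why non-influential interactions must be eliminated.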
Necessity of eliminating non-influential interactions
1) Automatic selection of candidate interactions :
each interaction i is separately added to the main effects model m and evaluated.
Denote CV_m(k) = PRESS(m, k) or CV_m(k) = GCV(m, k), and

CRIT(k) = [R²_{m,i}(k) − R²_m(k)] / R²_m(k) + [CV_m(k) − CV_{m,i}(k)] / CV_m(k)

Rule : order the interactions decreasingly by CRIT, refuse interaction i if CRIT(k) < 0.
2) Add the ordered candidates step-by-step to the main effects model, and accept a model if it significantly improves CV.
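The ranking rule can be sketched as follows (the R² and CV values below are invented for illustration; CRIT rewards a relative gain in fit and a relative drop in CV error):

```python
def crit(r2_m, r2_mi, cv_m, cv_mi):
    """CRIT for candidate interaction i added to main-effects model m:
    relative R² gain plus relative CV (PRESS or GCV) improvement."""
    return (r2_mi - r2_m) / r2_m + (cv_m - cv_mi) / cv_m

# hypothetical evaluations of three candidate interactions (i, i')
candidates = {(1, 2): crit(0.70, 0.80, 1.0, 0.80),
              (1, 3): crit(0.70, 0.71, 1.0, 1.10),
              (2, 3): crit(0.70, 0.75, 1.0, 0.95)}

# order decreasingly; refuse an interaction if its CRIT is negative
ordered = sorted(candidates, key=candidates.get, reverse=True)
kept = [i for i in ordered if candidates[i] >= 0]
```

Here (1, 3) barely improves R² while worsening CV, so its CRIT is negative and it is refused before the step-by-step forward addition.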
3) Pruning step : Selection of main effects and interactions according to the range of the ANOVA functions (CV/GCV)
Advantages of MAPLSS :
inherits the advantages of PLSL and PLSS
captures the most influential bivariate interactions
easily interpretable ANOVA function plots
Disadvantages of MAPLSS :
no higher interactions
no automatic selection of spline parameters
Bibliography
J. F. Durand. Local Polynomial Additive Regression through PLS and Splines: PLSS, Chemometrics and Intelligent Laboratory Systems 58, 235-246, 2001.
J. F. Durand and R. Lombardo. Interactions terms in nonlinear PLS via additive spline transformations. In « Between Data Science and Applied Data Analysis », Studies in Classification, Data Analysis, and Knowledge Organization, Eds M. Schader, W. Gaul and M. Vichi, Springer, 22-29, 2003.