
Partial Least Squares

A tutorial

Lutgarde Buydens

Partial Least Squares

• Multivariate regression
  - Multiple Linear Regression (MLR)
  - Principal Component Regression (PCR)
  - Partial Least Squares (PLS)
• Validation
• Preprocessing

Multivariate Regression

X (n × p), Y (n × k)

Rows: cases, observations, … (analytical observations of different samples, experimental runs, persons, …)
Columns: variables, classes, tags
  p: spectral variables, analytical measurements
  k: class information, concentrations, …

X: independent variables (will always be available)
Y: dependent variables (to be predicted later from X)

[Figure: raw data, spectra over 2000-14000 cm⁻¹]

Multivariate Regression

Y = f(X): predict Y from X

MLR: Multiple Linear Regression
PCR: Principal Component Regression
PLS: Partial Least Squares

From univariate to Multiple Linear Regression (MLR)

Least squares regression

y = b0 + b1 x1 + ε

b0: intercept
b1: slope
ε: residual error

[Figure: univariate least-squares line through points in the (x, y) plane]

MLR: Multiple Linear Regression

Least squares regression: y = b0 + b1 x1 + ε (b0: intercept, b1: slope)

Multiple linear regression: y = b0 + b1 x1 + b2 x2 + … + bp xp + ε

Y = Ŷ + E; the least-squares fit ŷ maximizes r(y, ŷ)

[Figure: regression plane through points in (x1, x2, y) space]


MLR: Multiple Linear Regression

y = b0 + b1 x1 + b2 x2 + … + bp xp + ε

In matrix notation, with a column of ones appended to X for the intercept b0 (X then has p + 1 columns):

y (n×1) = X (n×p) b (p×1) + e (n×1)
Y (n×k) = X (n×p) B (p×k) + E (n×k)

Least-squares solution: b = (XᵀX)⁻¹Xᵀy
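To make the matrix formulation concrete, here is a minimal numpy sketch (with made-up data, not from the slides) that solves b = (XᵀX)⁻¹Xᵀy after appending a column of ones for the intercept:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))                      # n samples, p variables
y = 1.5 + X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=n)

X1 = np.column_stack([np.ones(n), X])            # prepend a ones column for b0
b = np.linalg.solve(X1.T @ X1, X1.T @ y)         # b = (X'X)^-1 X'y, via a linear solve
print(b)                                         # close to [1.5, 2.0, -1.0, 0.5]
```

Solving the normal equations directly only works when XᵀX is well-conditioned and invertible, which is exactly the limitation the next slides discuss.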

MLR: Multiple Linear Regression

•Disadavantages: (XTX)-1

• Uncorrelated X-variables required

• n p +1

x2

x1

y r(x1,x2) 1

MLR: Multiple Linear Regression

Disadvantages of (XᵀX)⁻¹: uncorrelated X-variables required. When r(x1, x2) ≈ 1, MLR fits a plane through a line!! The plane's orientation is then barely determined by the data.

[Figure: unstable regression plane through nearly collinear points]

MLR: Multiple Linear Regression

Disadvantages of (XᵀX)⁻¹: uncorrelated X-variables required. Example with r(x1, x2) ≈ 1: two data sets, identical except that a single value changes from 0.23 to 0.21.

Set A:
  x1     x2       y
  3.76   3.67    11.29
 -2.91  -2.87    -8.09
  0.23   0.23     2.19
  5.55   5.49    19.09
  3.25   3.23    10.33
 -0.99  -1.01    -1.89
Set B: identical, except 0.21 instead of 0.23 in the third row.

Model: y = b1 x1 + b2 x2 + ε

        Set A              Set B
        b1      b2         b1      b2
MLR     10.3   -6.92       2.96    0.28
        R² = 0.98          R² = 0.98

A 0.02 change in one value completely changes the MLR coefficients: with nearly collinear variables, (XᵀX)⁻¹ is unstable.
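The instability can be checked in a few lines of numpy. This sketch assumes, as the slide layout suggests, that the changed value (0.23 → 0.21) sits in x1; small differences from the slide's coefficients (e.g. from rounding) are possible:

```python
import numpy as np

x2  = np.array([3.67, -2.87, 0.23, 5.49, 3.23, -1.01])
x1a = np.array([3.76, -2.91, 0.23, 5.55, 3.25, -0.99])   # Set A
x1b = np.array([3.76, -2.91, 0.21, 5.55, 3.25, -0.99])   # Set B: one value changed by 0.02
y   = np.array([11.29, -8.09, 2.19, 19.09, 10.33, -1.89])

for name, x1 in [("Set A", x1a), ("Set B", x1b)]:
    X = np.column_stack([x1, x2])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)             # no intercept, as on the slide
    print(name, b)                                        # wildly different b1, b2
```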

MLR: Multiple Linear Regression

Disadvantages of (XᵀX)⁻¹:
• Uncorrelated X-variables required
• n ≥ p + 1

For wide data (X with p >> n, e.g. spectra), two remedies:
• Variable selection
• Dimension reduction to latent variables (PCR, PLS)

PCR: Principal Component Regression

Step 1: Perform PCA on the original X
Step 2: Use the orthogonal PC scores as independent variables in an MLR model
Step 3: Calculate the b-coefficients (for the original p variables) from the a-coefficients (for the scores)

[Diagram: X (n × p) → PCA → scores T (n × a) → MLR → y; score coefficients a1 … aa converted to b0, b1 … bp]


PCR: Principal Component Regression

Dimension reduction: use scores (projections) on latent variables that explain maximal variance in X.

[Figure: PC1 direction in (x1, x2, …, xp) space]

PCR: Principal Component Regression

Step 0: Mean-center X
Step 1: Perform PCA: X = TPᵀ, giving the reconstructed X* = (TPᵀ)*
Step 2: Perform MLR on the scores: Y = TA, so A = (TᵀT)⁻¹TᵀY
Step 3: Calculate B (MLR on the reconstructed X*): Y = X*B = (TPᵀ)B, so A = PᵀB and, since the loadings are orthonormal (PᵀP = I), B = PA
Finally calculate the intercepts from the means: b0 = ȳ − x̄ᵀb
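The three PCR steps map directly onto scikit-learn; a minimal sketch with synthetic data (the number of components, 3 here, is an arbitrary placeholder for the value chosen by cross-validation):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 100))                   # wide data: p >> n is no problem for PCR
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=30)

pca = PCA(n_components=3)                        # Steps 0 + 1: PCA centers X, gives T and P
T = pca.fit_transform(X)                         # scores T
mlr = LinearRegression().fit(T, y)               # Step 2: MLR on the scores, A = (T'T)^-1 T'Y
b = pca.components_.T @ mlr.coef_                # Step 3: B = PA, back to original variables
b0 = mlr.intercept_ - pca.mean_ @ b              # intercept for the uncentered X
print(np.allclose(X @ b + b0, mlr.predict(T)))   # True: identical predictions
```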

PCR: Principal Component Regression

Optimal number of PCs: calculate the cross-validation RMSE for different numbers of PCs:

RMSECV = √( Σᵢ (yᵢ − ŷᵢ)² / n )

PLS: Partial Least Squares Regression

[Diagram: Phase 1: X (n × p) → PLS, guided by Y (n × k) → scores T (n × a); Phase 2: MLR of y on the scores, coefficients a1 … aa; Phase 3: conversion to b0, b1 … bp for the original variables]

PLS: Partial Least Squares Regression

Projection to Latent Structures

PCR uses PCs: maximize the variance in X
PLS uses LVs: maximize the covariance of (X, y), since cov(X, y) = √var(X) · √var(y) · cor(X, y)

[Figure: PC1 versus LV1 (w) directions in (x1, x2, …, xp) space]

PLS: Partial Least Squares Regression

Sequential algorithm: latent variables and their scores are calculated one at a time.

Phase 1: calculate new independent variables (the scores T)
• Step 0: Mean-center X
• Step 1: Calculate LV1 = w1, the direction that maximizes the covariance of (X, Y), via SVD on XᵀY:
  (XᵀY) (p×k) = W (p×a) D (a×a) Zᵀ (a×k), w1 = 1st column of W

[Figure: w1 direction in (x1, x2, …, xp) space]


PLS: Partial Least Squares Regression

Phase 1 (continued):
• Step 1: Calculate LV1 = w1 that maximizes the covariance of (X, Y): SVD on XᵀY, w1 = 1st column of W
• Step 2: Calculate t1, the scores (projections) of X on w1: t (n×1) = X (n×p) w (p×1)

[Figure: projection of the data onto w in (x1, x2, …, xp) space]
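Phase 1 in numpy, a sketch for the first latent variable only (synthetic data; later LVs require deflating X, which is omitted here):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 50))
Y = X[:, :3] @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(30, 2))

Xc = X - X.mean(axis=0)                          # Step 0: mean-center X
Yc = Y - Y.mean(axis=0)
W, d, Zt = np.linalg.svd(Xc.T @ Yc)              # Step 1: SVD of X'Y (p x k)
w1 = W[:, 0]                                     # first weight vector: maximizes cov(Xw, Y)
t1 = Xc @ w1                                     # Step 2: scores of X on w1
print(w1.shape, t1.shape)                        # (50,) and (30,)
```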

PLS: Partial Least Squares Regression

[Diagram, as before: Phase 1: X → PLS, guided by Y → scores T; Phase 2: MLR on the scores; Phase 3: conversion to b0, b1 … bp]

Optimal number of LVs: calculate the cross-validation RMSE for different numbers of LVs:

RMSECV = √( Σᵢ (yᵢ − ŷᵢ)² / n )
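A sketch of the RMSECV curve with scikit-learn's PLSRegression (synthetic data stands in for the spectra; in practice X and y would be the measured data):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 200))
y = X[:, :4].sum(axis=1) + 0.1 * rng.normal(size=40)

for a in range(1, 11):                            # try 1 .. 10 latent variables
    rmse = -cross_val_score(PLSRegression(n_components=a), X, y, cv=10,
                            scoring="neg_root_mean_squared_error").mean()
    print(a, rmse)                                # pick the number of LVs with the minimum
```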

PLS: Partial Least Squares Regression

The two nearly identical data sets again (Set A and Set B as above), model y = b1 x1 + b2 x2 + ε:

        Set A              Set B
        b1      b2         b1      b2
MLR     10.3   -6.92       2.96    0.28
PCR     1.60    1.62       1.60    1.62
PLS     1.60    1.62       1.60    1.62

Unlike MLR, PCR and PLS give stable, nearly identical coefficients on both sets.

MLR, PCR, PLS: VALIDATION

Estimating the prediction error.

Basic principle: test how well your model works with new data it has not seen yet!

Common measure for the prediction error: RMSEP, the root mean squared error of prediction on such new data.


A Biased Approach

Taking the prediction error of the samples the model was built on is biased! Because those samples were also used to build the model, the model is biased towards accurate prediction of exactly these samples.

Basic principle: test how well your model works with new data it has not seen yet!

Validation: Basic Principle

Split the data in a training and a test set. Several ways:
• One large test set
• Leave one out and repeat: LOO
• Leave n objects out and repeat: LNO
• …
Apply the entire model procedure on the test set.

Validation

Full dataset → training set + test set

Build the model (b0 … bp) on the training set; predict y for the test set and compute RMSEP.

Remark: for the final model, use the whole data set.
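The scheme in code, a sketch with scikit-learn and synthetic data (3 LVs is an arbitrary placeholder):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(60, 100))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=60)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = PLSRegression(n_components=3).fit(X_tr, y_tr)  # build the model on the training set
resid = y_te - model.predict(X_te).ravel()             # predict the unseen test set
print(np.sqrt(np.mean(resid ** 2)))                    # RMSEP
```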

Training and test sets

Split in training and test set:
• The test set should be representative of the training set
• Random choice is often the best
• Check for extremely unlucky divisions
• Apply the whole procedure on the test and validation sets

Cross-validation

• Simplest case: Leave-One-Out (LOO, segment = one sample). Normally 10-20% is left out per segment (LnO).
• Remark: for the final model, use the whole data set.
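Both variants in a sketch (synthetic data): KFold with 5-10 splits leaves roughly 10-20% out per segment; swapping in LeaveOneOut() gives LOO:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold  # or LeaveOneOut

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 80))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=30)

cv = KFold(n_splits=5, shuffle=True, random_state=0)   # LnO with ~20% out per segment
sq_err = []
for tr, val in cv.split(X):
    model = PLSRegression(n_components=3).fit(X[tr], y[tr])
    sq_err.extend((y[val] - model.predict(X[val]).ravel()) ** 2)
print(np.sqrt(np.mean(sq_err)))                        # RMSECV over all left-out samples
```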

Cross-validation: an example

• The data [figure]


• Split the data into a training set and a validation set [figure]
• Build a model on the training set [figure]
• Split the data again into a training set and a validation set, until all samples have been in the validation set once; common: Leave-One-Out (LOO) [figures]



Cross-validation: a warning

• Data: 13 × 5 = 65 NIR spectra (1102 wavelengths)
  - 13 samples: different compositions of NaOH, NaOCl and Na2CO3
  - 5 temperatures: each sample measured at 5 temperatures

Composition:

Sample | NaOH (wt%) | NaOCl (wt%) | Na2CO3 (wt%) | Temperature (°C)
1      | 18.99      | 0           | 0            | 15, 21, 27, 34, 40
2      | 9.15       | 9.99        | 0.15         | 15, 21, 27, 34, 40
3      | 15.01      | 0           | 4.01         | 15, 21, 27, 34, 40
4      | 9.34       | 5.96        | 3.97         | 15, 21, 27, 34, 40
…      | …          | …           | …            | …
13     | 16.02      | 2.01        | 1.00         | 15, 21, 27, 34, 40

Cross-validation: a warning

• The data: X is 65 × 1102, with y the concentrations; the 65 spectra are 13 samples × 5 temperatures.

Leave the whole SAMPLE out: all five replicate spectra of a sample must leave together. Otherwise spectra of the same sample end up in both the training and the validation set, and the error estimate is optimistically biased.

[Diagram: X (65 × 1102) and y, grouped into 13 samples]
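Leave-SAMPLE-out splitting can be expressed with a grouped splitter; a sketch in which random numbers stand in for the 65 NIR spectra and group labels encode which of the 13 samples each spectrum belongs to:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(6)
X = rng.normal(size=(65, 1102))                  # 65 spectra, 1102 wavelengths
y = rng.normal(size=65)                          # e.g. NaOH concentration
groups = np.repeat(np.arange(13), 5)             # sample id: 5 temperatures per sample

cv = GroupKFold(n_splits=13)                     # each fold leaves one whole sample out
for tr, val in cv.split(X, y, groups):
    model = PLSRegression(n_components=2).fit(X[tr], y[tr])
    print(np.unique(groups[val]))                # all 5 replicates leave together
```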


Selection of the number of LVs

Through validation: choose the number of LVs that yields the model with the lowest prediction error.

The test set used to assess the final model cannot be used for this! Instead, divide the training set further, or use cross-validation.

Full dataset → training set + test set; the training set is split again into a (new) training set and a 'test' set.
1) Determine the number of LVs with the 'test' set
2) Build the model (b0 … bp); predict y for the test set and compute RMSEP

Remark: for the final model, use the whole data set.

Double Cross-Validation

Full dataset → outer CV loop over training sets:
1) Determine the number of LVs: CV inner loop (within the training set)
2) Build the model (b0 … bp): CV outer loop; predict y, compute RMSEP

Remark: for the final model, use the whole data set.

Double cross-validation

• The data [figure]
• Split the data into a training set and a validation set; the validation set is used later to assess model performance! [figure]


Double cross-validation

• Apply cross-validation on the rest: split the training set into a (new) training set and a test set. In every fold, fit models with 1, 2, 3, … LVs and keep the number of LVs with the lowest RMSECV. [figures]


Cross-validation: an example

• Repeat the procedure until all samples have been in the validation set once [figures]

Double cross-validation

• In this way:
  - the number of LVs is determined using samples that were not used to build the model
  - the prediction error is also determined using samples the model has not seen before

Remark: for the final model, use the whole data set.
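A compact sketch of the double loop with synthetic data: the inner cross-validation picks the number of LVs, the outer one estimates the prediction error on folds the model never saw:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(40, 120))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=40)

outer = KFold(n_splits=5, shuffle=True, random_state=0)
inner = KFold(n_splits=5, shuffle=True, random_state=1)
sq_err = []
for tr, te in outer.split(X):                    # outer loop: held-out test fold
    def rmsecv(a):                               # inner loop: RMSECV for a given #LVs
        return -cross_val_score(PLSRegression(n_components=a), X[tr], y[tr],
                                cv=inner, scoring="neg_root_mean_squared_error").mean()
    best_a = min(range(1, 8), key=rmsecv)        # #LVs with the lowest inner RMSECV
    model = PLSRegression(n_components=best_a).fit(X[tr], y[tr])
    sq_err.extend((y[te] - model.predict(X[te]).ravel()) ** 2)
print(np.sqrt(np.mean(sq_err)))                  # double-CV estimate of the prediction error
```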

Raw + mean-centered data

[Figure: raw spectra, absorbance (a.u.) vs. wavenumber (cm⁻¹); mean-centered spectra, absorbance (a.u.) vs. wavenumber (cm⁻¹)]

PLS: an example

RMSECV vs. number of LVs

[Figure: RMSECV values for the prediction of NaOH against the number of LVs (1-10)]

Regression coefficients

[Figure: raw spectra, absorbance (a.u.) vs. wavenumber (cm⁻¹); regression coefficients vs. wavenumber (cm⁻¹)]


True vs. predicted

[Figure: predicted vs. true NaOH concentration, 8-20 wt%]

Why Pre-Processing?

[Figure: original spectrum and versions with offset, slope and scatter effects, intensity (a.u.) vs. wavelength (cm⁻¹)]

Data artefacts:
• Baseline correction
• Alignment
• Scatter correction
• Noise removal
• Scaling, normalisation
• Transformation
• …

Other:
• Missing values
• Outliers

[Figures: simulated spectrum, intensity (a.u.) vs. wavelength (a.u.): original; with offset; with offset + slope; with a multiplicative effect; and all overlaid (original, offset, offset + slope, multiplicative, offset + slope + multiplicative)]

Pre-Processing Methods: 4914 combinations, all reasonable

STEP 1 (7×) BASELINE: no baseline correction; detrending with polynomial order 2, 3 or 4 (3×); derivatisation, 1st or 2nd derivative (2×); AsLS
STEP 2 (10×) SCATTER: no scatter correction; scaling to the mean, median, max or L2 norm (4×); SNV; RNV at 15, 25 or 35% (3×); MSC
STEP 3 (10×) NOISE: no noise removal; Savitzky-Golay smoothing with window 5, 9 or 11 points and polynomial order 2, 3 or 4 (9×)
STEP 4 (7×) SCALING & TRANSFORMATIONS: mean-centering; autoscaling; range scaling; Pareto scaling; Poisson scaling; level scaling; log transformation

Supervised pre-processing methods: OSC or DOSC (2×), with no noise removal, combined with the same 7 scalings/transformations.

Together: 7 × 10 × 10 × 7 = 4900 unsupervised combinations, plus 2 × 7 = 14 supervised ones, giving 4914.
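A sketch of three of the simpler corrections on a simulated spectrum carrying the artefacts from the figure above (the linear detrend is a simplification of the table's polynomial detrending; SNV and mean-centering follow their standard definitions):

```python
import numpy as np

wl = np.linspace(0, 1600, 800)                       # wavelength axis (a.u.)
peak = np.exp(-((wl - 800) / 120) ** 2)              # 'original' spectrum
distorted = 0.1 + 0.0002 * wl + 1.3 * peak           # offset + slope + multiplicative effect

# Baseline: remove offset and slope with a 1st-order polynomial fit (cf. detrending)
detrended = distorted - np.polyval(np.polyfit(wl, distorted, 1), wl)

# Scatter: Standard Normal Variate (SNV) centers and scales each spectrum individually
snv = (distorted - distorted.mean()) / distorted.std()

# Scaling: mean-centering a set of spectra column-wise (columns = wavelengths)
Xset = np.vstack([distorted, 0.9 * peak, 1.1 * peak + 0.05])
Xcentered = Xset - Xset.mean(axis=0)
```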

Pre-Processing Results

• Complexity of the model: number of LVs
• Classification accuracy of the raw data shown for reference

[Figure: classification accuracy (%) vs. complexity of the model (number of LVs) for all pre-processing combinations]

J. Engel et al., TrAC 2013


SOFTWARE

• PLS Toolbox (Eigenvector Inc.)
  - www.eigenvector.com
  - For use in MATLAB (or standalone!)
• XLSTAT-PLS (XLSTAT)
  - www.xlstat.com
  - For use in Microsoft Excel
• Package pls for R
  - Free software
  - http://cran.r-project.org