Regression / Calibration MLR, RR, PCR, PLS

Upload: harvey-watkins

Post on 24-Dec-2015


Page 1: Regression / Calibration MLR, RR, PCR, PLS. Paul Geladi Head of Research NIRCE Unit of Biomass Technology and Chemistry Swedish University of Agricultural

Regression / Calibration

MLR, RR, PCR, PLS

Page 2

Paul Geladi

Head of Research NIRCE
Unit of Biomass Technology and Chemistry
Swedish University of Agricultural Sciences
Umeå
Technobothnia
Vasa

Page 3

Univariate regression

Page 4

[Figure: y plotted against x with a straight line; its offset and slope are marked]

Page 5

[Figure: y plotted against x with a straight line; offset a and slope b are marked]

$y = a + bx + e$

Page 6

[Figure: y plotted against x]

Page 7

[Figure: y plotted against x with a linear fit: underfit]

Page 8

[Figure: y plotted against x: overfit]

Page 9

[Figure: y plotted against x with a quadratic fit]

Page 10

Multivariate linear regression

Page 11

$y = f(x)$: works sometimes

$y = f(\mathbf{x})$: works only for a few variables

Measurement noise!

∞ possible functions

Page 12

[Figure: data matrix X with I rows (samples) and K columns (variables), next to the response vector y]

Page 13

$y = f(x)$

$y = f(\mathbf{x})$

Simplified by:

$y = b_0 + b_1x_1 + b_2x_2 + \dots + b_Kx_K + f$

Linear approximation

Page 14

$y = b_0 + b_1x_1 + b_2x_2 + \dots + b_Kx_K + f$

$y$: response
$x_k$: predictors
$b_k$: regression coefficients
$b_0$: offset, constant
$f$: residual

Nomenclature

Page 15

[Figure: data matrix X (I × K) and response vector y]

X and y mean-centered: the offset $b_0$ drops out
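A minimal NumPy sketch of why mean-centering removes $b_0$: fitting centered X and y without an offset gives the same slopes as fitting the raw data with an explicit column of ones, and $b_0$ can be recovered from the means. The synthetic data and the helper name `fit_ols` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients via numpy.linalg.lstsq (hypothetical helper)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))  # I = 20 samples, K = 3 predictors
y = 2.0 + X @ np.array([1.0, -0.5, 3.0]) + 0.1 * rng.normal(size=20)

# Model with an explicit offset b0: prepend a column of ones to X.
b_with_offset = fit_ols(np.column_stack([np.ones(len(y)), X]), y)

# Mean-centered model: no offset term at all.
x_mean, y_mean = X.mean(axis=0), y.mean()
b_centered = fit_ols(X - x_mean, y - y_mean)

# The slopes agree, and b0 follows from the means.
b0 = y_mean - x_mean @ b_centered
print(np.allclose(b_with_offset[1:], b_centered))  # True
print(np.isclose(b_with_offset[0], b0))            # True
```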

Page 16

$y_i = b_1x_{i1} + b_2x_{i2} + \dots + b_Kx_{iK} + f_i, \quad i = 1, \dots, I$

(one equation for each of the I samples)

Page 17

$y_i = b_1x_{i1} + b_2x_{i2} + \dots + b_Kx_{iK} + f_i, \quad i = 1, \dots, I$

Page 18

[Figure: y (I × 1) = X (I × K) times b (K × 1) + f (I × 1)]

$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{f}$

Page 19

X, y: known, measurable
b, f: unknown

No unique solution

f must be constrained

Page 20

The MLR solution

Multiple Linear Regression

Ordinary Least Squares (OLS)

Page 21

$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$

Problems?

Least squares
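The least-squares formula can be sketched in NumPy as below, assuming X'X is invertible. Solving the normal equations X'X b = X'y with `np.linalg.solve` avoids forming the inverse explicitly, which is the usual numerically safer choice. The small data set and the name `ols` are hypothetical, and no mean-centering is done here for brevity.

```python
import numpy as np

def ols(X, y):
    """b = (X'X)^-1 X'y, computed by solving the normal equations X'X b = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])
b_true = np.array([0.5, -1.0])
y = X @ b_true  # noise-free response, so OLS should recover b_true

b = ols(X, y)
print(b)  # approximately [0.5, -1.0]
```

If X'X is singular (for example, two identical columns), `np.linalg.solve` typically raises `numpy.linalg.LinAlgError`, which is one of the problems this formula runs into.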

Page 22

$3b_1 + 4b_2 = 1$
$4b_1 + 5b_2 = 0$

One solution

Page 23

$3b_1 + 4b_2 = 1$
$4b_1 + 5b_2 = 0$
$b_1 + b_2 = 4$

No solution

Page 24

$3b_1 + 4b_2 + b_3 = 1$
$4b_1 + 5b_2 + b_3 = 0$

∞ solutions
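The three cases on pages 22 to 24 can be checked by comparing ranks (the Rouché-Capelli theorem): the system A b = y is inconsistent if appending y raises the rank, has one solution if the rank equals the number of unknowns, and has infinitely many otherwise. A sketch in NumPy; `classify` is a hypothetical helper name.

```python
import numpy as np

def classify(A, y):
    """Count the solutions of A b = y by comparing ranks (Rouche-Capelli theorem)."""
    rank_A = np.linalg.matrix_rank(A)
    rank_Ay = np.linalg.matrix_rank(np.column_stack([A, y]))
    if rank_A < rank_Ay:
        return "no solution"
    if rank_A == A.shape[1]:
        return "one solution"
    return "infinitely many solutions"

# Page 22: two equations, two unknowns.
A1, y1 = np.array([[3.0, 4.0], [4.0, 5.0]]), np.array([1.0, 0.0])
# Page 23: the extra equation b1 + b2 = 4 makes the system inconsistent.
A2, y2 = np.array([[3.0, 4.0], [4.0, 5.0], [1.0, 1.0]]), np.array([1.0, 0.0, 4.0])
# Page 24: a third unknown b3 but still only two equations.
A3, y3 = np.array([[3.0, 4.0, 1.0], [4.0, 5.0, 1.0]]), np.array([1.0, 0.0])

for A, y in [(A1, y1), (A2, y2), (A3, y3)]:
    print(classify(A, y))
print(np.linalg.solve(A1, y1))  # the unique solution, approximately [-5, 4]
```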

Page 25

$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$

- K > I: ∞ solutions
- I > K: no solution
- error in X
- error in y
- inverse may not exist
- inverse may be unstable

Page 26

$3b_1 + 4b_2 + e = 1$
$4b_1 + 5b_2 + e = 0$
$b_1 + b_2 + e = 4$

Solution
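A sketch, assuming each equation of page 23 gets its own residual and b1, b2 are chosen to minimize the sum of squared residuals (ordinary least squares). For these three equations this gives b1 = 10 and b2 = -23/3, with residuals of ±5/3 that are orthogonal to the columns of X.

```python
import numpy as np

# Overdetermined system from page 23: three equations, two unknowns b1, b2.
X = np.array([[3.0, 4.0],
              [4.0, 5.0],
              [1.0, 1.0]])
y = np.array([1.0, 0.0, 4.0])

# Least squares: choose b to minimize the residual sum of squares f'f.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
f = y - X @ b

print(b)        # approximately [10, -7.667], i.e. b1 = 10, b2 = -23/3
print(f)        # approximately [1.667, -1.667, 1.667], i.e. +5/3, -5/3, +5/3
print(X.T @ f)  # approximately [0, 0]: residuals orthogonal to the columns of X
```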

Page 27

Wanted solution

- I ≥ K
- No inverse
- No noise in X

Page 28

Diagnostics

y = Xb + f

$SS_{tot} = SS_{mod} + SS_{res}$

$R^2 = SS_{mod}/SS_{tot} = 1 - SS_{res}/SS_{tot}$

Coefficient of determination
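A short NumPy check of the sum-of-squares decomposition on a straight-line fit with offset (the data are made up). The identity SStot = SSmod + SSres, and hence the equality of the two R² forms, holds for a least-squares fit that includes an offset or uses mean-centered data; it need not hold for arbitrary predictions.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

# Least-squares straight line with offset: columns [1, x].
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b

ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares about the mean
ss_mod = np.sum((y_hat - y.mean()) ** 2)  # part explained by the model
ss_res = np.sum((y - y_hat) ** 2)         # residual sum of squares f'f

r2 = ss_mod / ss_tot
print(np.isclose(ss_tot, ss_mod + ss_res))  # True for a least-squares fit with offset
print(np.isclose(r2, 1 - ss_res / ss_tot))  # True
```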

Page 29

Diagnostics

y = Xb + f

$SS_{res} = \mathbf{f}'\mathbf{f}$

$RMSEC = [SS_{res}/(I - A)]^{1/2}$

Root Mean Squared Error of Calibration
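A sketch of RMSEC as written on the slide. The slide does not define A at this point; the code assumes it is the number of components (or fitted parameters) in the model, supplied by the caller. The name `rmsec` and the data are illustrative.

```python
import numpy as np

def rmsec(y, y_hat, A):
    """RMSEC = sqrt(SSres / (I - A)); A is assumed to be the number of model components."""
    f = y - y_hat
    ss_res = f @ f  # f'f
    return np.sqrt(ss_res / (len(y) - A))

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y_hat = np.array([1.1, 1.9, 3.0, 4.2, 4.8, 6.0])
print(rmsec(y, y_hat, A=2))  # about 0.158, i.e. sqrt(0.10 / 4)
```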

Page 30

Alternatives to MLR/OLS

Page 31

Ridge Regression (RR)

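$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$

The identity matrix I is the easiest matrix to invert

$\mathbf{b} = (\mathbf{X}'\mathbf{X} + k\mathbf{I})^{-1}\mathbf{X}'\mathbf{y}$

k (ridge constant) as small as possible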
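A minimal ridge sketch, assuming k > 0 and solving the system (X'X + kI) b = X'y with `np.linalg.solve`. With two perfectly collinear predictors, (X'X)^-1 does not exist, but the ridge system is still solvable, and larger k shrinks b toward zero. The data and the helper name `ridge` are hypothetical.

```python
import numpy as np

def ridge(X, y, k):
    """b = (X'X + kI)^-1 X'y, with I the K x K identity and k the ridge constant."""
    K = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(K), X.T @ y)

# Two perfectly collinear (identical) predictors: X'X is singular.
x1 = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
X = np.column_stack([x1, x1])
y = 2.0 * x1

print(np.linalg.matrix_rank(X.T @ X))  # 1: (X'X)^-1 does not exist
for k in [0.01, 1.0, 10.0]:
    print(k, ridge(X, y, k))  # both coefficients equal 20 / (20 + k)
```

Here b1 = b2 = 20/(20 + k) by symmetry, so a small k gives b1 + b2 close to the true combined coefficient of 2, while a large k shrinks both toward zero.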

Page 32

Problems

- Choice of ridge constant

- No diagnostics

Page 33

Principal Component Regression (PCR)

- I ≥ K

-Easy inversion

Page 34

Principal Component Regression (PCR)

[Figure: PCA compresses X (K columns) into the score matrix T (A columns)]

- A ≤ I
- T orthogonal
- Noise in X removed

Page 35

Principal Component Regression (PCR)

$\mathbf{y} = \mathbf{T}\mathbf{d} + \mathbf{f}$

$\mathbf{d} = (\mathbf{T}'\mathbf{T})^{-1}\mathbf{T}'\mathbf{y}$
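A PCR sketch under common assumptions: mean-centered X and y, with PCA computed via the singular value decomposition. The scores T = XP are orthogonal, so T'T is diagonal and trivially invertible; the regression vector in the original variables is then b = Pd. With all K components, PCR reproduces the OLS solution, while fewer components give a constrained b. The function name and the data are illustrative.

```python
import numpy as np

def pcr(X, y, A):
    """Principal component regression with A components on mean-centered X and y.

    Scores T come from the SVD of X (equivalent to PCA); d = (T'T)^-1 T'y;
    the regression vector in the original variables is b = P d.
    """
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:A].T   # K x A loadings
    T = X @ P      # I x A scores, mutually orthogonal
    d = np.linalg.solve(T.T @ T, T.T @ y)
    return P @ d

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 6))
X -= X.mean(axis=0)
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -1.0, 0.2]) + 0.1 * rng.normal(size=30)
y -= y.mean()

b_full = pcr(X, y, A=6)  # all components: same b as OLS
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b_full, b_ols))  # True
b_2 = pcr(X, y, A=2)     # two components: a different, constrained b
```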

Page 36

Problem

How many components used?

Page 37

Advantage

- PCA done on data
- Outliers
- Classes
- Noise in X removed

Page 38

Partial Least Squares Regression

Page 39

[Figure: X and Y with score vectors t and u]

Page 40

[Figure: X and Y with score vectors t and u and weight vectors w' and q']

Outer relationship

Page 41

[Figure: X and Y with score vectors t and u and weight vectors w' and q']

Inner relationship

Page 42

[Figure: X and Y decomposed into A components: scores t and u, weights w' and q', loadings p']

Page 43

Advantages

- X decomposed
- Y decomposed
- Noise in X left out
- Noise in Y left out

Page 44

PCR and PLS are one-component-at-a-time methods

After each component, a residual is calculated

The next component is calculated on the residual
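A sketch of PLS for a single y (often called PLS1) in the NIPALS style, showing the one-component-at-a-time scheme: compute a component, replace X and y by their residuals, repeat. This is one common formulation, not necessarily the exact algorithm behind these slides; the function name, the regression-vector formula b = W(P'W)^-1 q, and the data are assumptions of this sketch.

```python
import numpy as np

def pls1(X, y, A):
    """PLS1 (single response), NIPALS-style, one component at a time.

    X and y are assumed mean-centered. After each component, X and y are
    replaced by their residuals, and the next component is computed on them.
    Returns the regression vector b for the (centered) original X.
    """
    E, f = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(A):
        w = E.T @ f
        w = w / np.linalg.norm(w)  # weight vector w
        t = E @ w                  # X score vector t
        p = E.T @ t / (t @ t)      # X loading vector p
        q_a = (f @ t) / (t @ t)    # inner relation: y residual regressed on t
        E = E - np.outer(t, p)     # X residual for the next component
        f = f - q_a * t            # y residual for the next component
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)  # b = W (P'W)^-1 q

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
X -= X.mean(axis=0)
y = X @ np.array([1.0, -1.0, 0.5, 0.0, 2.0]) + 0.1 * rng.normal(size=30)
y -= y.mean()

b_1 = pls1(X, y, A=1)    # one component: b points along X'y
b_all = pls1(X, y, A=5)  # as many components as variables: same b as OLS
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
```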

Page 45

Another view

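$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{f}$

$\mathbf{y} = \mathbf{X}\mathbf{b}_{RR} + \mathbf{f}_{RR}$

$\mathbf{y} = \mathbf{X}\mathbf{b}_{PCR} + \mathbf{f}_{PCR}$

$\mathbf{y} = \mathbf{X}\mathbf{b}_{PLS} + \mathbf{f}_{PLS}$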

Page 46

[Figure: regression vectors in the space of b1, b2, b3: the OLS vector, a shrunk and rotated vector, a regression vector with too much shrinkage, and the subspace of useful regression vectors]

Page 47

Prediction

Page 48

[Figure: calibration set Xcal (I × K) with ycal; test set Xtest (J samples) with ytest and predictions yhat]

Page 49

Prediction diagnostics

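$\hat{\mathbf{y}} = \mathbf{X}_{test}\mathbf{b}$

$\mathbf{f}_{test} = \mathbf{y}_{test} - \hat{\mathbf{y}}$

$PRESS = \mathbf{f}_{test}'\mathbf{f}_{test}$

$RMSEP = [PRESS/J]^{1/2}$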

Root Mean Squared Error of Prediction
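A minimal sketch of PRESS and RMSEP for a test set of J samples. The regression vector `b` and the test data are hypothetical stand-ins for the output of a calibration step.

```python
import numpy as np

def press_rmsep(X_test, y_test, b):
    """Predict the test set; return (PRESS, RMSEP) with RMSEP = sqrt(PRESS / J)."""
    y_hat = X_test @ b
    f_test = y_test - y_hat
    press = f_test @ f_test  # f_test' f_test
    return press, np.sqrt(press / len(y_test))

b = np.array([1.0, -2.0])  # hypothetical regression vector from calibration
X_test = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y_test = np.array([1.5, -2.0, -1.0, 0.5])

press, rmsep = press_rmsep(X_test, y_test, b)
print(press, rmsep)  # 0.5 and sqrt(0.5 / 4), about 0.354
```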

Page 50

Prediction diagnostics

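$\hat{\mathbf{y}} = \mathbf{X}_{test}\mathbf{b}$

$\mathbf{f}_{test} = \mathbf{y}_{test} - \hat{\mathbf{y}}$

$R^2_{test} = Q^2 = 1 - \mathbf{f}_{test}'\mathbf{f}_{test} / \mathbf{y}_{test}'\mathbf{y}_{test}$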
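A sketch of Q². The slide's denominator ytest'ytest presumes mean-centered responses; the code assumes the test responses are centered with the calibration mean, which the caller passes as `y_cal_mean` (a hypothetical parameter). Some conventions center with the test-set mean instead.

```python
import numpy as np

def q2(y_test, y_hat, y_cal_mean):
    """Q2 = 1 - f_test'f_test / y_test'y_test, with y_test centered on the calibration mean."""
    f_test = y_test - y_hat
    y_c = y_test - y_cal_mean  # assumption: center with the calibration mean
    return 1.0 - (f_test @ f_test) / (y_c @ y_c)

y_test = np.array([2.0, 4.0, 6.0, 8.0])
y_hat = np.array([2.5, 3.5, 6.5, 7.5])
print(q2(y_test, y_hat, y_cal_mean=5.0))  # approximately 0.95, i.e. 1 - 1/20
```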

Page 51

Some rules of thumb

$R^2 > 0.65$ (5 PLS components)

$R^2_{test} > 0.5$

$R^2 - R^2_{test} < 0.2$

Page 52

Bias

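$\mathbf{f} = \mathbf{y} - \mathbf{X}\mathbf{b}$

always 0 bias

$\mathbf{f}_{test} = \mathbf{y}_{test} - \hat{\mathbf{y}}$

$\text{bias} = \frac{1}{J}\sum_{j=1}^{J} f_{test,j}$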
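A sketch contrasting the two cases on the slide, using synthetic data: least-squares residuals on mean-centered calibration data average to zero, while a test set with a simulated constant offset of +0.3 in the response shows a bias close to that offset.

```python
import numpy as np

rng = np.random.default_rng(3)
b_true = np.array([1.0, 2.0, -1.0])
X_cal = rng.normal(size=(20, 3))
y_cal = X_cal @ b_true + 0.1 * rng.normal(size=20)

# Calibrate on mean-centered data (the offset is carried by the means).
x_mean, y_mean = X_cal.mean(axis=0), y_cal.mean()
b, *_ = np.linalg.lstsq(X_cal - x_mean, y_cal - y_mean, rcond=None)

f_cal = (y_cal - y_mean) - (X_cal - x_mean) @ b
print(np.isclose(f_cal.mean(), 0.0))  # True: calibration residuals have zero bias

# Test set with a simulated systematic error: a constant +0.3 in the response.
X_test = rng.normal(size=(10, 3))
y_test = X_test @ b_true + 0.3
y_hat = y_mean + (X_test - x_mean) @ b
f_test = y_test - y_hat
bias = f_test.sum() / len(f_test)  # bias = (1/J) * sum of f_test
print(bias)  # close to 0.3
```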

Page 53

Leverage - influence

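$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$

$\hat{\mathbf{y}} = \mathbf{X}\mathbf{b} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{H}\mathbf{y}$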

the Hat matrix

diagonal elements of H: Leverage
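A small NumPy sketch of leverage as the diagonal of H = X(X'X)^-1 X', using one mean-centered predictor with made-up values. For a single centered column, h_i = x_i² / Σx², so the sample farthest from the center gets the largest leverage, and the leverages sum to K, the number of columns of X.

```python
import numpy as np

def leverage(X):
    """Diagonal elements of the hat matrix H = X (X'X)^-1 X'."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return np.diag(H)

x = np.array([-1.0, -1.0, 0.0, 0.0, 2.0])  # one mean-centered predictor
X = x[:, None]                             # I x K with K = 1
h = leverage(X)
print(h)        # approximately [0.167, 0.167, 0, 0, 0.667]
print(h.sum())  # approximately 1.0 = K
```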

Page 54

Leverage - influence


Page 55

Leverage - influence

Page 56

Leverage - influence

Page 57

Leverage - influence

Page 58

[Figure: ftest plotted against ypred around the zero line, with panels labeled outlier, biased, unbiased, large variance, small variance, heteroscedastic]

Residual plot

Page 59

Residual

-Check histogram f

-Check variablewise E

-Check objectwise E

Page 60

Page 61

[Figure: three panels (E, F, G) of predicted response against measured response, labeled heteroscedastic, outlier by extrapolation, and bad outlier]

Page 62

[Figure: X and Y decomposed into A components: scores t and u, weights w' and q', loadings p']

Page 63

Plotting: line plots

Scree plot: RMSEC, RMSECV, RMSEP

Loading plot against wavelength

Score plot against time

Residual against sample

Residual against $\hat{y}$

$T^2$ against sample

H against sample

Page 64

Plotting: scatter plots 2D, 3D

Score plot

Loading plot

Biplot

H against residual

Inner relation: t against u

Weights w and q

Page 65

Nonlinearities

Page 66

[Figure: panels A to E of y against x, labeled linear, weak nonlinear, strong nonlinear, non-monotonic, and linear approximations]

Page 67

Remedies for nonlinearities: making nonlinear data fit a linear model, or making the model nonlinear.

-Fundamental theory (e.g. going from transmittance to absorbance)

-Use extra latent variables in PCR or PLSR

-Use transformations of latent variables

-Remove disturbing variables

-Find subsets that behave linearly

Page 68

Remedies for nonlinearities: making nonlinear data fit a linear model, or making the model nonlinear.

-Use intrinsically nonlinear methods

-Locally transform variables X, y, or both nonlinearly (powers, logarithms, adding powers)

-Transformation in a neighbourhood (window methods)

-Use global transformations (Fourier, Wavelet)

-GIFI type discretization
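As a sketch of one remedy listed above, adding powers of a variable, the NumPy example below fits synthetic quadratic data twice: once with a straight line, and once with x² added as an extra predictor in the same linear least-squares model. The helper `rss` and the data are illustrative.

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 9)
y = 1.0 + 0.5 * x + 0.8 * x**2  # a curved (quadratic) relationship

def rss(X, y):
    """Residual sum of squares f'f of an ordinary least-squares fit."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    f = y - X @ b
    return f @ f

ones = np.ones_like(x)
rss_linear = rss(np.column_stack([ones, x]), y)        # straight line: underfit
rss_powers = rss(np.column_stack([ones, x, x**2]), y)  # x^2 added as a new variable
print(rss_linear > 1.0, rss_powers < 1e-12)  # True True
```

The model stays linear in its coefficients; only the variable set changes, so the MLR, PCR, and PLS machinery applies unchanged.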