Chemometric functions in Excel


Page 1: Chemometric functions in Excel

01.12.08 1

Chemometric functions in Excel

Oxana Rodionova & Alexey Pomerantsev

Semenov Institute of Chemical Physics

[email protected] (domain: chph.ras.ru)

Page 2: Chemometric functions in Excel

Distance Learning Course in Chemometrics for Technological and Natural-Science Mastership Education

[Map with labels: Barnaul, 3000 km, 4000 km]

• Unfulfilled need for chemometric education in Russia

• Few qualified specialists in chemometrics

• Large distances, e.g. Moscow – Barnaul is about 3000 km

• No modern chemometrics books in Russian

• No available chemometric software

• No support from officials: government, Academy, etc.

• Easily available everywhere => INTERNET

• Interactive layout: all calculations should be clear and repeatable

• Web-friendly environment for the calculations => EXCEL

• Need to make and use our own (free) software => EXCEL Add-In

Page 3: Chemometric functions in Excel

Chemometric calculations in Excel

[Diagram: Excel User Interface, VBA Functions, C++ DLL; labels: DATA, Results, Input, Calculations]

Provides the user with all the possibilities of the Excel interface: worksheet calculations, worksheet functions, charts, etc.

VBA helps to simplify routine work

All calculations are made "on the fly" and very fast

Page 4: Chemometric functions in Excel

Installation

http://rcs.chph.ras.ru/down/sacs.zip

Chemometrics.xla: put in the AddIns folder (C:\Documents and Settings\<User>\Application Data\Microsoft\AddIns\)

Chemometrics.dll: put in your Windows folder (C:\WINDOWS\)

Load Chemometrics.xla via <Excel Options> <Add-Ins> in the open workbook

Page 5: Chemometric functions in Excel

Matrix calculations in Excel

{=TRANSPOSE(B6:F10)}

{=MMULT(B6:F10,TRANSPOSE(Barr))}

[Screenshot labels: cell range B6:F10, named range Barr]

Array formulas are entered with Ctrl-Shift-Enter
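For readers who want to check the worksheet results outside Excel, the two array formulas correspond to an ordinary transpose and matrix product. The following is a minimal Python sketch; the helper names `transpose` and `mmult` and the small matrices are illustrative assumptions and are not part of the add-in.

```python
def transpose(m):
    """Swap rows and columns, like Excel's TRANSPOSE."""
    return [list(col) for col in zip(*m)]


def mmult(a, b):
    """Matrix product, like Excel's MMULT; a is n x k and b is k x m."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


# Illustrative 3 x 2 matrices (made-up numbers)
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Counterpart of MMULT(A, TRANSPOSE(B)): (3 x 2) times (2 x 3) gives 3 x 3
product = mmult(A, transpose(B))
for row in product:
    print(row)
```

As in Excel, the inner dimensions must agree: the number of columns of the first argument has to equal the number of rows of the second.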

Page 6: Chemometric functions in Excel

Principal Component Analysis (PCA)

$$X = TP^{T} + E$$

Initial data X [I × J] = score matrix T [I × A] × transposed loading matrix P^T [A × J] + error matrix E [I × J]
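One common way to compute such a decomposition is the NIPALS algorithm. The slides do not say which algorithm the add-in uses, so the following plain-Python sketch is only an illustration under that assumption; the function name `nipals_pca` and the small centered data matrix are hypothetical.

```python
def nipals_pca(X, n_pc, tol=1e-12, max_iter=500):
    """Return scores T (I x A), loadings P (J x A) and residuals E (I x J)."""
    E = [row[:] for row in X]            # working copy, deflated component by component
    n_rows, n_cols = len(E), len(E[0])
    T = [[0.0] * n_pc for _ in range(n_rows)]
    P = [[0.0] * n_pc for _ in range(n_cols)]
    for a in range(n_pc):
        # Start from the column with the largest sum of squares
        j0 = max(range(n_cols), key=lambda j: sum(r[j] ** 2 for r in E))
        t = [r[j0] for r in E]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading: project the columns of E onto t, then scale to unit length
            p = [sum(E[i][j] * t[i] for i in range(n_rows)) / tt for j in range(n_cols)]
            norm = sum(v * v for v in p) ** 0.5
            p = [v / norm for v in p]
            # Score: project the rows of E onto p
            t_new = [sum(E[i][j] * p[j] for j in range(n_cols)) for i in range(n_rows)]
            diff = sum((u - v) ** 2 for u, v in zip(t_new, t))
            t = t_new
            if diff <= tol * sum(v * v for v in t):
                break
        for i in range(n_rows):
            T[i][a] = t[i]
            for j in range(n_cols):
                E[i][j] -= t[i] * p[j]   # remove the extracted component
        for j in range(n_cols):
            P[j][a] = p[j]
    return T, P, E


# Made-up, column-centered data: I = 4 samples, J = 3 variables
X = [[2.0, 0.5, -1.0], [-1.0, 1.5, 0.5], [0.0, -2.0, 1.5], [-1.0, 0.0, -1.0]]
T, P, E = nipals_pca(X, 2)   # A = 2 principal components
```

By construction, multiplying T by the transpose of P and adding E gives back X, which mirrors the equation on the slide.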

Page 7: Chemometric functions in Excel

Chemometrics XLA. PCA Scores

{=ScoresPCA(Xcal,5,1,Xtst)}

Arguments: Xcal = calibration set, 5 = nPC, 1 = centering AND/OR weighting, Xtst = test set
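Scores for a test set are conventionally obtained by preprocessing the new samples with the calibration statistics and projecting them onto the calibration loadings. The sketch below assumes mean centering only and loadings that have already been estimated; the helper names, the loading vector and the numbers are hypothetical, and the add-in's internals may differ.

```python
def column_means(X):
    """Mean of every column of X."""
    n_rows = len(X)
    return [sum(row[j] for row in X) / n_rows for j in range(len(X[0]))]


def project(X_new, means, P):
    """Scores of new samples: center with calibration means, then multiply by P (J x A)."""
    n_pc = len(P[0])
    return [
        [sum((x - m) * P[j][a] for j, (x, m) in enumerate(zip(row, means)))
         for a in range(n_pc)]
        for row in X_new
    ]


X_cal = [[0.0, 1.0], [2.0, 3.0]]   # made-up calibration set
P = [[0.6], [0.8]]                 # hypothetical unit-length loading, one PC
X_tst = [[2.0, 2.0], [1.0, 2.0]]   # made-up test set

means = column_means(X_cal)        # [1.0, 2.0]
T_tst = project(X_tst, means, P)
```

The key point is that the test samples are centered with the calibration means, not with their own.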

Page 8: Chemometric functions in Excel

Chemometrics XLA. PCA Loadings

{=TRANSPOSE(LoadingsPCA(Xcal,5,1))}

Arguments: Xcal = calibration set, 5 = nPC, 1 = centering AND/OR weighting; TRANSPOSE is an Excel worksheet function

Page 9: Chemometric functions in Excel

List of chemometric functions

PCA

ScoresPCA <for calibration or test samples>

LoadingsPCA

PLS

ScoresPLS <X-scores for calibration or test samples>

UScoresPLS <Y-scores for calibration or test samples>

LoadingsPLS <P-loadings>

WLoadingsPLS

QLoadingsPLS

PLS2

ScoresPLS2 <X-scores for calibration or test samples>

UScoresPLS2 <Y-scores for calibration or test samples>

LoadingsPLS2 <P-loadings>

WLoadingsPLS2

QLoadingsPLS2

Options:

• Centering AND/OR scaling

• Number of PCs

Page 10: Chemometric functions in Excel

ScoresPCA

ScoresPCA (rMatrix [, nPCs] [, nCentWeightX] [, rMatrixNew])

rMatrix: X data (calibration set)

nPCs: number of PCs (A)

nCentWeightX: centering and/or scaling (1 = centering, 2 = scaling, 3 = both)

rMatrixNew: test set

Input X [I × J]; output T [I × A]

Page 11: Chemometric functions in Excel

Validation Rules

If rMatrixNew is omitted, only calibration scores are calculated

If rMatrixNew is specified, only test scores are calculated

If rMatrixNew coincides with rMatrix, cross-validation is calculated (10%-out cross-validation)
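The idea of leaving out 10% of the samples at a time can be sketched as follows. The slides do not say how the add-in chooses its segments, so the contiguous blocks used here, as well as the function name `ten_percent_out_segments`, are assumptions made purely for illustration.

```python
def ten_percent_out_segments(n_samples, fraction=0.10):
    """Split sample indices into contiguous blocks of about `fraction` of the data.

    Returns a list of (train_indices, test_indices) pairs; every sample is
    left out exactly once.
    """
    seg_size = max(1, round(n_samples * fraction))
    segments = []
    for start in range(0, n_samples, seg_size):
        test = list(range(start, min(start + seg_size, n_samples)))
        left_out = set(test)
        train = [i for i in range(n_samples) if i not in left_out]
        segments.append((train, test))
    return segments


# Example: 32 samples, as in the People dataset used later in the deck
segments = ten_percent_out_segments(32)
for train, test in segments:
    print(len(train), test)
```

In each round a model would be built on the `train` rows and the scores computed for the `test` rows, so that every sample gets a score from a model that did not see it.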

Page 12: Chemometric functions in Excel

LoadingsPCA

LoadingsPCA (rMatrix [, nPCs] [, nCentWeightX])

rMatrix: X data (calibration set)

nPCs: number of PCs (A)

nCentWeightX: centering and/or scaling (1 = centering, 2 = scaling, 3 = both)

Input X [I × J]; output P [J × A]

Page 13: Chemometric functions in Excel

Explorative Data Analysis

Case study 1: People

Page 14: Chemometric functions in Excel

People

Page 15: Chemometric functions in Excel

Dataset in Excel Workbook (People.xls)

Number of objects (n) = 32

Number of variables (m) = 12

Page 16: Chemometric functions in Excel

Data Preprocessing

Aim: to transform the data into the most suitable form for data analysis

Page 17: Chemometric functions in Excel

Autoscaling

mean centering + scaling = autoscaling
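As a small worked illustration, autoscaling subtracts the column mean from every value and divides by the column standard deviation. The sketch below assumes the sample standard deviation (divisor n - 1); the slides do not state which convention the add-in uses, and the function name `autoscale` and the numbers are made up.

```python
def autoscale(X):
    """Mean-center every column and scale it to unit standard deviation."""
    n_rows, n_cols = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n_rows for j in range(n_cols)]
    # Sample standard deviation (n - 1 in the denominator): an assumption
    stds = [
        (sum((row[j] - means[j]) ** 2 for row in X) / (n_rows - 1)) ** 0.5
        for j in range(n_cols)
    ]
    Z = [[(row[j] - means[j]) / stds[j] for j in range(n_cols)] for row in X]
    return Z, means, stds


# Made-up example: two variables on very different scales
X = [[170.0, 60.0], [180.0, 80.0], [190.0, 100.0]]
Z, means, stds = autoscale(X)
print(means)  # column means
print(stds)   # column standard deviations
print(Z)      # autoscaled data
```

After autoscaling, both columns have mean 0 and standard deviation 1, so variables measured in different units contribute on an equal footing.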

Page 18: Chemometric functions in Excel

People: Scores & Loadings (PC1 vs. PC2)

[Figure: score plot t1 vs. t2 ("Map of Samples"), points labelled FS, FN, MS, MN; loading plot P1 vs. P2 ("Map of Variables"), variables Height, Weight, Hairs, Shoes, Age, Income, Beer, Wine, Sex, Strength, Region, IQ]

Page 19: Chemometric functions in Excel

People: Scores & Loadings (PC1 vs. PC3)

[Figure: score plot t1 vs. t3, points labelled FS, FN, MS, MN; loading plot P1 vs. P3, variables Height, Weight, Hairs, Shoes, Age, Income, Beer, Wine, Sex, Strength, Region, IQ; second score plot t1 vs. t3 with points labelled by numbers from 18 to 55]

Page 20: Chemometric functions in Excel

Case study 2: HPLC-DAD

Page 21: Chemometric functions in Excel

Measurements

[Figure: 3D plot of absorbance (AU, 0.0 to 1.2) versus time (1 to 29) and wavelength (220 to 334)]

Page 22: Chemometric functions in Excel

Dataset in Excel Workbook

X (30 × 28)

Page 23: Chemometric functions in Excel

Pure compounds A and B

$$X = CS^{T} + E$$

[Figure: spectra of A and B, AU versus wavelength λ (220 to 340 nm); concentration profiles C(t) of A and B versus time (0 to 30)]

If we observe X, can we predict C and S?

Page 24: Chemometric functions in Excel

Score plot

[Figure: score plot t1 vs. t2 with points 1 to 30 and labels A and B; concentration profiles C(t) of A and B versus time (0 to 30)]

Page 25: Chemometric functions in Excel

Conclusions from the Score Plot

1. Linear regions = Pure compounds

2. Curved line = Co-elution

3. Closer to the origin = Lower intensity

4. Number of bends = Number of different compounds

Page 26: Chemometric functions in Excel

Factor analysis vs. PCA

Factor analysis: $$X = CS^{T} + E_{1}$$ with X and E_1 [I × J], C [I × 2], S^T [2 × J]

PCA: $$X = TP^{T} + E_{2}$$ with X and E_2 [I × J], T [I × A], P^T [A × J]

Page 27: Chemometric functions in Excel

Scores and Loadings

[Figure: S, P panel: spectra of A and B together with loadings p1 and p2 versus wavelength (220 to 340); C, T panel: concentration profiles of A and B together with scores t1 and t2 versus time (0 to 30)]

Page 28: Chemometric functions in Excel

Procrustes transformation

$$X \approx CS^{T}$$

$$X \approx TP^{T}$$

$$I = RR^{T} = \text{identity matrix}$$

$$X \approx T(RR^{T})P^{T} = (TR)(PR)^{T}$$

$$C \approx \hat{C} = TR, \qquad S \approx \hat{S} = PR$$

$$R = R_{\text{stretch}} \times R_{\text{rotation}}$$
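The rotation part of this argument can be checked numerically: for any orthogonal rotation matrix R, the product (TR)(PR)^T equals TP^T. The sketch below covers only that rotation step, not the stretching; the helper names and the small matrices T and P are hypothetical.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


# Hypothetical scores T (3 samples x 2 PCs) and loadings P (3 variables x 2 PCs)
T = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.0]]
P = [[0.6, 0.8], [-0.8, 0.6], [0.0, 1.0]]

# Plane rotation by 30 degrees: an orthogonal matrix, so R R^T = I
angle = math.radians(30)
R = [[math.cos(angle), -math.sin(angle)],
     [math.sin(angle), math.cos(angle)]]

X_model = matmul(T, transpose(P))            # T P^T
C_hat = matmul(T, R)                         # rotated scores
S_hat = matmul(P, R)                         # rotated loadings
X_rotated = matmul(C_hat, transpose(S_hat))  # (TR)(PR)^T

max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(X_model, X_rotated)
    for a, b in zip(row_a, row_b)
)
print(max_diff)  # differs from zero only by rounding error
```

The rotated pair describes the data exactly as well as the original scores and loadings, which is why a suitable R can be sought that makes the columns look like concentration profiles and spectra.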

Page 29: Chemometric functions in Excel

Scores Transformation

[Figure: three score plots t1 vs. t2 of points 1 to 30; one transformation step is labelled "Stretching"]

Page 30: Chemometric functions in Excel

Procrustes analysis results

[Figure: concentration profiles C(t) of A and B with their estimates Ĉ(t) (Ahat, Bhat) versus time (0 to 30); spectra S(λ) of A and B with their estimates Ŝ(λ) (Ahat, Bhat) versus wavelength λ (220 to 340 nm)]

Page 31: Chemometric functions in Excel

Conclusions

1. Scaling and centering are problem-dependent

2. In this example, number of PCs = number of different compounds

Page 32: Chemometric functions in Excel

Regression

Page 33: Chemometric functions in Excel

Principal Component Regression (PCR)

1) PCA: X is decomposed into scores T = (t_1, ..., t_A) and loadings P = (p_1, ..., p_A)

2) MLR: $$y = Ta + e$$
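The two steps can be sketched in plain Python. This is an illustration under stated assumptions, not the add-in's code: the PCA step uses a NIPALS-style iteration, and because the resulting score vectors are mutually orthogonal, the least-squares (MLR) coefficients can be computed one component at a time. All function names and numbers are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pca_scores(X, n_pc, n_iter=200):
    """Step 1 (PCA): NIPALS-style iteration; returns the score vectors t_1..t_A."""
    E = [row[:] for row in X]
    n_cols = len(E[0])
    scores = []
    for _ in range(n_pc):
        # Start from the column with the largest sum of squares
        j0 = max(range(n_cols), key=lambda j: sum(r[j] ** 2 for r in E))
        t = [r[j0] for r in E]
        for _ in range(n_iter):
            p = [dot([r[j] for r in E], t) for j in range(n_cols)]
            norm = dot(p, p) ** 0.5
            p = [v / norm for v in p]          # unit-length loading
            t = [dot(r, p) for r in E]         # updated score
        # Deflate: remove the extracted component from E
        E = [[r[j] - ti * p[j] for j in range(n_cols)] for r, ti in zip(E, t)]
        scores.append(t)
    return scores


def pcr_fit(X, y, n_pc):
    """Step 2 (MLR): regress y on the scores; returns coefficients a and fitted y."""
    scores = pca_scores(X, n_pc)
    # Orthogonal scores: each coefficient is a separate one-variable regression
    coeffs = [dot(t, y) / dot(t, t) for t in scores]
    fitted = [sum(a * t[i] for a, t in zip(coeffs, scores)) for i in range(len(y))]
    return coeffs, fitted


# Made-up, column-centered data: 5 samples, 2 variables
X = [[-2.0, -1.5], [-1.0, -1.0], [0.0, 0.5], [1.0, 0.5], [2.0, 1.5]]
y = [2.0 * x1 - x2 for x1, x2 in X]   # y depends linearly on X, no noise

coeffs, fitted = pcr_fit(X, y, 2)
```

With as many components as X has independent columns and a noise-free y, the fitted values reproduce y; with fewer components, PCR trades some fit for a more stable model.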

Page 34: Chemometric functions in Excel

Projection on Latent Structures (PLS)

[Diagram: X is described by X-scores T = (t_1, ..., t_A), loadings P = (p_1, ..., p_A) and weights W = (w_1, ..., w_A); Y is described by Y-scores U = (u_1, ..., u_A) and loadings Q = (q_1, ..., q_A)]

Page 35: Chemometric functions in Excel

Projection on Latent Structures (PLS)

$$Y = TB + e$$

Page 36: Chemometric functions in Excel

PLS and PLS2

PLS: $$y = Tb + e$$ where y, b and e each have 1 column

PLS2: $$Y = TB + E$$ where Y, B and E each have M columns
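For the single-response case, the standard PLS1 form of the NIPALS algorithm needs no inner iteration and can be sketched in a few lines. This is an illustration only; the slides do not describe the add-in's implementation, and the function name `pls1` and the data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pls1(X, y, n_pc):
    """PLS1 by NIPALS: returns X-scores T, weights W, loadings P and y-loadings q.

    Each of T, W and P is a list with one vector per component.
    """
    E = [row[:] for row in X]   # X residuals, deflated per component
    f = list(y)                 # y residuals
    n_cols = len(E[0])
    T, W, P, q = [], [], [], []
    for _ in range(n_pc):
        # Weight: direction in X space with maximal covariance with y
        w = [dot([r[j] for r in E], f) for j in range(n_cols)]
        norm = dot(w, w) ** 0.5
        w = [v / norm for v in w]
        t = [dot(r, w) for r in E]                    # X-score
        tt = dot(t, t)
        p = [dot([r[j] for r in E], t) / tt for j in range(n_cols)]  # X-loading
        qa = dot(f, t) / tt                           # y-loading
        # Deflate X and y
        E = [[r[j] - ti * p[j] for j in range(n_cols)] for r, ti in zip(E, t)]
        f = [fi - qa * ti for fi, ti in zip(f, t)]
        T.append(t)
        W.append(w)
        P.append(p)
        q.append(qa)
    return T, W, P, q


# Made-up, column-centered data: 5 samples, 2 variables, noise-free response
X = [[-2.0, -1.5], [-1.0, -1.0], [0.0, 0.5], [1.0, 0.5], [2.0, 1.5]]
y = [2.0 * x1 - x2 for x1, x2 in X]

T, W, P, q = pls1(X, y, 2)
fitted = [sum(q[a] * T[a][i] for a in range(2)) for i in range(len(y))]
```

The quantities returned here correspond to the X-scores, W-loadings, P-loadings and Q-loadings listed among the chemometric functions; PLS2 extends the same idea to a response matrix with several columns.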

Page 37: Chemometric functions in Excel

ScoresPLS

ScoresPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew])

rMatrixX: X data (calibration set)

rMatrixY: Y data (calibration set)

nPCs: number of PCs (A)

nCentWeightX: centering and/or scaling of X (1 = centering, 2 = scaling, 3 = both)

nCentWeightY: centering and/or scaling of Y (1 = centering, 2 = scaling, 3 = both)

rMatrixXNew: X test set

Input X [I × J], Y [I × 1]; output T [I × A]

Page 38: Chemometric functions in Excel

UScoresPLS

UScoresPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew] [, rMatrixYNew])

rMatrixX: X data (calibration set)

rMatrixY: Y data (calibration set)

nPCs: number of PCs (A)

nCentWeightX: centering and/or scaling of X (1 = centering, 2 = scaling, 3 = both)

nCentWeightY: centering and/or scaling of Y (1 = centering, 2 = scaling, 3 = both)

rMatrixXNew: X test set

rMatrixYNew: Y test set

Input X [I × J], Y [I × 1]; output U [I × A]

Page 39: Chemometric functions in Excel

WLoadingsPLS

WLoadingsPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY])

rMatrixX: X data (calibration set)

rMatrixY: Y data (calibration set)

nPCs: number of PCs (A)

nCentWeightX: centering and/or scaling of X (1 = centering, 2 = scaling, 3 = both)

nCentWeightY: centering and/or scaling of Y (1 = centering, 2 = scaling, 3 = both)

Input X [I × J], Y [I × 1]; output W [J × A]

Page 40: Chemometric functions in Excel

LoadingsPLS

LoadingsPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY])

rMatrixX: X data (calibration set)

rMatrixY: Y data (calibration set)

nPCs: number of PCs (A)

nCentWeightX: centering and/or scaling of X (1 = centering, 2 = scaling, 3 = both)

nCentWeightY: centering and/or scaling of Y (1 = centering, 2 = scaling, 3 = both)

Input X [I × J], Y [I × 1]; output P [J × A]

Page 41: Chemometric functions in Excel

QLoadingsPLS

QLoadingsPLS (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY])

rMatrixX: X data (calibration set)

rMatrixY: Y data (calibration set)

nPCs: number of PCs (A)

nCentWeightX: centering and/or scaling of X (1 = centering, 2 = scaling, 3 = both)

nCentWeightY: centering and/or scaling of Y (1 = centering, 2 = scaling, 3 = both)

Input X [I × J], Y [I × 1]; output Q [1 × A]

Page 42: Chemometric functions in Excel

ScoresPLS2

ScoresPLS2 (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew])

rMatrixX: X data (calibration set)

rMatrixY: Y data (calibration set)

nPCs: number of PCs (A)

nCentWeightX: centering and/or scaling of X (1 = centering, 2 = scaling, 3 = both)

nCentWeightY: centering and/or scaling of Y (1 = centering, 2 = scaling, 3 = both)

rMatrixXNew: X test set

Input X [I × J], Y [I × K]; output T [I × A]

Page 43: Chemometric functions in Excel

UScoresPLS2

UScoresPLS2 (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY] [, rMatrixXNew] [, rMatrixYNew])

rMatrixX: X data (calibration set)

rMatrixY: Y data (calibration set)

nPCs: number of PCs (A)

nCentWeightX: centering and/or scaling of X (1 = centering, 2 = scaling, 3 = both)

nCentWeightY: centering and/or scaling of Y (1 = centering, 2 = scaling, 3 = both)

rMatrixXNew: X test set

rMatrixYNew: Y test set

Input X [I × J], Y [I × K]; output U [I × A]

Page 44: Chemometric functions in Excel

LoadingsPLS2, WLoadingsPLS2, QLoadingsPLS2

LoadingsPLS2 (rMatrixX, rMatrixY [, nPCs] [, nCentWeightX] [, nCentWeightY])

rMatrixX: X data (calibration set)

rMatrixY: Y data (calibration set)

nPCs: number of PCs (A)

nCentWeightX: centering and/or scaling of X (1 = centering, 2 = scaling, 3 = both)

nCentWeightY: centering and/or scaling of Y (1 = centering, 2 = scaling, 3 = both)

Input X [I × J], Y [I × K]; output P [J × A] (LoadingsPLS2), W [J × A] (WLoadingsPLS2) or Q [K × A] (QLoadingsPLS2)

Page 45: Chemometric functions in Excel

Seventh Winter Symposium on Chemometrics

Near Tula city, February 2010

[Map with distance label: 100 km]