statistical methods for the analysis, monitoring and optimization … · 2014-04-07 ·...

29
Multivariate statistical methods for the analysis, monitoring and optimization of processes Jay Liu Dept. of Chemical Engineering Pukyong National University 1 유준, Copyright ©

Upload: others

Post on 11-Jun-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Multivariate statistical methods for the analysis, monitoring and optimization of 

processesJay Liu

Dept. of Chemical EngineeringPukyong National University

1유준, Copyright ©

Page 2: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Credits

• Prof. John MacGregor @ McMaster University– “father” of multivariate statistics & industrial applications– Course outline and topics covered are similar to his course

• Mr. Kevin Dunn @ McMaster University/ConnectMV– Most of this lecture materials come from his

• Prosensus– Providing one‐year ProMV academic license for free.– Providing materials for tutorials

유준, Copyright © 2

Page 3: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

1. Introduction

Extracting value from data• Engineers can use large quantities of data:

1. Improve process understanding2. Troubleshooting process problems3. Improving, optimizing and controlling processes4. Predictive modeling (inferential sensors)5. Process monitoring

We will come back to these in the next class

3유준, Copyright ©

Page 4: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

• Throughout this course,

4

Convention in Multivariate statistics

유준, Copyright ©

Page 5: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Data Characteristics

5유준, Copyright ©

Page 6: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Data Characteristics

• Today

6

Near‐Infrared spectrum

유준, Copyright ©

Page 7: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Data Characteristics

7유준, Copyright ©

Page 8: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Data Characteristics

8유준, Copyright ©

Page 9: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Data Characteristics

9유준, Copyright ©

Page 10: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Data Characteristics

10유준, Copyright ©

Page 11: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Data Analysis Purpose

• Summary & trouble‐shooting• Predictive modeling (Regression)• Labeling or grading (Classification)• Process & product design• Scale‐up & product transfer• QSAR (Quantitative Structure Activity Relationships)• And many more

11유준, Copyright ©

Page 12: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Examples: trouble‐shooting

• Undetected process changes @ pulp digester• Didn’t know when & why the changes occurred• Data: 301 obs., 22 vars.

• 22 time‐series plots? 231 Scatter plots? 

12유준, Copyright ©

Page 13: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

-5

-4

-3

-2

-1

0

1

2

3

4

5

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8

t[2]

t[1]

1000

10011002

1003100410051006

1007

1008

10091010

10111012

1013

1014

1015

10161017

1018

1019

10201021

1022

10231024 1025

10261027

1028

10291030

10311032

10331034

1035

1036

1037

10381039

1040104110421043

1044

1045

10461047

1048

1049

10501051

10521053105410551056

1057

10581059

1060

1061

10621063

1064

10651066

1067

10681069

1070

10711072

10731074

1075

1076

1077

1078

1079

1080

1081

10821083

1084

10851086

108710881089

1090

1091

10921093

109410951096109710981099

1100

11011102

11031104

1105

1106

1107 1108

11091110

1111

11121113

1114

1115

1116

1117

1118

1119

1120

1121

11221123

11241125

1126112711281129

11301131

11321133

11341135113611371138

11391140114111421143

114411451146

1147

1148

11491150

1151

1152

1153

115411551156

11571158

1159116011611162

1163

11641165

11661167

1168

1169

11701171

117211731174

11751176

1177117811791180

1181

11821183118411851186

11871188 1189 11901191 1192

1193

1194

1195 119611971198

1199

1200

1201

1202

1203

1204

1205

12061207

1208

1209

1210

1211

1212

1213

12141215

1216

1217

1218

1219

12201221

122212231224

1225

1226122712281229

1230

1231

1232123312341235

1236

12371238

123912401241

124212431244 12451246

1247124812491250

12511252

1253

12541255

12561257125812591260

12611262

1263126412651266

1267

12681269

12701271

12721273

12741275

127612771278

1279

1280 1281128212831284

1285

12861287

12881289

1290129112921293

1294

12951296

12971298

1299

1300

PCA score plot

Cluster 2

Cluster 1

Examples: trouble‐shooting

• PCA gives ONE excellent summary• Two distinct clusters in a t1‐t2 score plotCluster 1: obs.1000~1030, obs.1192~1300Cluster 2: obs.1031~1191

13유준, Copyright ©

Page 14: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

-3

-2

-1

0

1

2

3

4

5Yk

ap

chip

4

blcm

blflw

chpl

4

uxt2

lxt2

ucza

3

wlfl4

wls

t4

aaw

d4

chm

o4

bsfl4 lht3

uht3

cmfr4

tfflw

xflw

2

f18f

w

stfl3 tct4

lqs4

Scor

e C

ontri

b(O

bs 1

206

- Obs

112

3), W

eigh

t=p1

p2

Var ID (Primary)

PCA Contribution plot

v2

v1

Examples: trouble‐shooting

• PCA gives candidates of root‐cause• During the period of cluster 2, v1 and v2 fluctuated while they were maintained constant level during the period of cluster 1.

14유준, Copyright ©

Page 15: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Examples: predictive modeling

• Measurement of biodiesel quality• Spec. & methods given in ASTM

15

Costs too much time & money!

유준, Copyright ©

Page 16: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Examples: predictive modeling

• PLS gives ONE accurate multivariate calibration model for biodiesel & impurities.

16

5000 5500 6000 6500 7000 7500 8000 8500 9000 95000.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

2.2

Wavenumber(cm-1)A

bsor

banc

e

NIR spectrum

N

Quality

PLS model

Components Biodiesel MEOH Freeglycerin

MonoGlyceride

Di Glyceride

Tri Glyceride

R2 value 0.999 0.994 0.995 0.996 0.994 0.994

유준, Copyright ©

Page 17: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Issues faced with engineering data

17유준, Copyright ©

Page 18: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Issues faced with engineering data

18유준, Copyright ©

Page 19: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Issues faced with engineering data

19

Can be up to 30% (or more!).

유준, Copyright ©

Page 20: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Issues faced with engineering data

20유준, Copyright ©

Page 21: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Issues faced with engineering data

21유준, Copyright ©

Page 22: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Examples of large data sets

• Typical fab processes– Data size: order of terabytes in a few days

22

Database merging in a data‐mining package

Databases in a typical fab process

유준, Copyright ©

Page 23: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

What is a latent variable?

• Fortunately (& unfortunately at the same time), all variables are not independent.• Redundant images of few “latent” variables

23유준, Copyright ©

Page 24: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

What is a latent variable?

24

Protein profile

I am a doctor!

Metabolic profile

Gene expression

Clinical History/Q&A

유준, Copyright ©

Page 25: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

What is a latent variable?

유준, Copyright © 25

Page 26: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

What is a latent variable?

유준, Copyright © 26

Page 27: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

What is a latent variable?

유준, Copyright © 27

Page 28: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

What is a latent variable?

유준, Copyright © 28

Page 29: statistical methods for the analysis, monitoring and optimization … · 2014-04-07 · (Multivariate statistical methods = latent variable methods) • Principal component analysis

Latent variable methods

(Multivariate statistical methods = latent variable methods)

• Principal component analysis (PCA)– a.k.a Karhunen‐Loève transform or Hotelling transform

• Factor analysis• Fisher’s discriminant analysis• Independent component analysis• Principal component regression• Partial least squares (or projection to latent structures)

• …..

유준, Copyright © 29