statistical methods for the analysis, monitoring and optimization … · 2014-04-07 ·...
TRANSCRIPT
Multivariate statistical methods for the analysis, monitoring and optimization of processes
Jay Liu
Dept. of Chemical Engineering
Pukyong National University
1유준, Copyright ©
Credits
• Prof. John MacGregor @ McMaster University
– “father” of multivariate statistics & industrial applications
– Course outline and topics covered are similar to his course
• Mr. Kevin Dunn @ McMaster University/ConnectMV
– Most of these lecture materials come from his
• Prosensus
– Providing one‐year ProMV academic license for free.
– Providing materials for tutorials
1. Introduction
Extracting value from data
• Engineers can use large quantities of data to:
1. Improve process understanding
2. Troubleshoot process problems
3. Improve, optimize and control processes
4. Build predictive models (inferential sensors)
5. Monitor processes
We will come back to these in the next class
Convention in Multivariate statistics
• Throughout this course,
Data Characteristics
• Today
[Figure: Near‐Infrared spectrum]
Data Analysis Purpose
• Summary & trouble‐shooting
• Predictive modeling (Regression)
• Labeling or grading (Classification)
• Process & product design
• Scale‐up & product transfer
• QSAR (Quantitative Structure Activity Relationships)
• And many more
Examples: trouble‐shooting
• Undetected process changes @ pulp digester
• Didn’t know when & why the changes occurred
• Data: 301 obs., 22 vars.
• 22 time‐series plots? 231 scatter plots?
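The 231 figure is the number of distinct variable pairs among 22 variables, i.e. 22·21/2. A quick check, written in Python purely for illustration (the slides do not name a language):

```python
from math import comb

n_vars = 22                   # variables in the pulp digester data set
n_time_series = n_vars        # one time-series plot per variable
n_scatter = comb(n_vars, 2)   # one scatter plot per unordered pair of variables

print(n_time_series, n_scatter)
```

The pair count grows roughly with the square of the number of variables, which is why inspecting every plot by eye stops being practical.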
[Figure: PCA score plot, t[2] vs. t[1]; observations 1000 to 1300 labeled; two groups marked Cluster 1 and Cluster 2]
Examples: trouble‐shooting
• PCA gives ONE excellent summary
• Two distinct clusters in a t1‐t2 score plot
– Cluster 1: obs. 1000~1030, obs. 1192~1300
– Cluster 2: obs. 1031~1191
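How a score plot can separate two operating periods can be sketched on synthetic data. The block below is only an illustration under stated assumptions: the data are simulated (not the pulp digester set), the shifted period and noise level are invented, and PCA is computed with a plain NIPALS iteration, which may differ from what the software behind the slide does.

```python
import random

def autoscale(X):
    """Mean-center each column and scale it to unit variance."""
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    stds = [(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)) ** 0.5
            for j in range(k)]
    return [[(row[j] - means[j]) / stds[j] for j in range(k)] for row in X]

def nipals_pca(X, n_components=2, n_iter=200, tol=1e-10):
    """Return score vectors and unit-length loading vectors, one per component."""
    X = [row[:] for row in X]              # work on a copy; it is deflated below
    n, k = len(X), len(X[0])
    scores, loadings = [], []
    for _ in range(n_components):
        t = [row[0] for row in X]          # initial guess: first column
        for _it in range(n_iter):
            tt = sum(v * v for v in t)
            p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
            norm = sum(v * v for v in p) ** 0.5
            p = [v / norm for v in p]
            t_new = [sum(X[i][j] * p[j] for j in range(k)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t, t_new))
            t = t_new
            if change < tol:               # scores stopped changing
                break
        for i in range(n):                 # deflate: remove what this component explains
            for j in range(k):
                X[i][j] -= t[i] * p[j]
        scores.append(t)
        loadings.append(p)
    return scores, loadings

random.seed(0)
n_obs, n_vars = 60, 6
regime_b = range(20, 40)                   # hypothetical shifted period in the middle
data = [[(3.0 if i in regime_b else 0.0) + random.gauss(0.0, 0.3)
         for _ in range(n_vars)] for i in range(n_obs)]

scores, loadings = nipals_pca(autoscale(data))
t1 = scores[0]
t1_a = [t1[i] for i in range(n_obs) if i not in regime_b]
t1_b = [t1[i] for i in regime_b]
separated = min(t1_b) > max(t1_a) or max(t1_b) < min(t1_a)
print("regimes separated along t1:", separated)
```

Plotting `scores[1]` against `scores[0]` would give the synthetic counterpart of the t1‐t2 plot; the sign of a component is arbitrary, so the check above accepts either orientation.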
[Figure: PCA contribution plot, Score Contrib(Obs 1206 - Obs 1123), Weight=p1p2, by Var ID (Primary); variables v1 and v2 highlighted]
Examples: trouble‐shooting
• PCA gives candidate root causes
• During the period of cluster 2, v1 and v2 fluctuated, whereas they were held at a constant level during the period of cluster 1.
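The idea behind a contribution plot can be sketched as follows. This is a simplified, single-component version with invented numbers: the variable names, loading values and observations are hypothetical, and the exact weighting used in the slide's plot (labeled Weight=p1p2) is not given in the transcript.

```python
# Hypothetical names and numbers; the 22 digester variables are not reproduced here.
var_names = ["v1", "v2", "v3"]
p1 = [0.6, 0.8, 0.0]              # unit-length loading vector of one component
x_cluster2 = [1.5, 2.0, 0.2]      # scaled observation from the disturbed period
x_cluster1 = [-0.5, -1.0, 0.1]    # scaled observation from the steady period

def score(x, p):
    """Score of one observation on one component: t = x . p"""
    return sum(xj * pj for xj, pj in zip(x, p))

# Contribution of variable j to the score difference: (x2_j - x1_j) * p_j
contrib = [(a - b) * pj for a, b, pj in zip(x_cluster2, x_cluster1, p1)]

# Rank variables by the size of their contribution, largest first.
ranked = sorted(zip(var_names, contrib), key=lambda item: abs(item[1]), reverse=True)
for name, c in ranked:
    print(f"{name}: {c:+.2f}")

# The contributions add up to the difference between the two scores.
score_gap = score(x_cluster2, p1) - score(x_cluster1, p1)
```

Variables with the largest bars are the first candidates to investigate; the plot points to where to look, not to a confirmed cause.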
Examples: predictive modeling
• Measurement of biodiesel quality
• Spec. & methods given in ASTM
Costs too much time & money!
Examples: predictive modeling
• PLS gives ONE accurate multivariate calibration model for biodiesel & impurities.
[Figure: NIR spectrum, absorbance vs. wavenumber (cm-1) → PLS model → Quality]
Component        R2 value
Biodiesel        0.999
MEOH             0.994
Free glycerin    0.995
Monoglyceride    0.996
Diglyceride      0.994
Triglyceride     0.994
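To make the idea of a multivariate calibration concrete, here is a minimal one-component PLS fit on simulated "spectra". Everything in it is an assumption made for illustration: the Gaussian band shape, the noise level and the sample counts are invented, the data are not the biodiesel measurements, and the R2 printed is a plain fit statistic, which may not be how the values on the slide were computed.

```python
import math
import random

random.seed(2)
n_samples, n_wavelengths = 30, 50     # more variables than observations

# Hypothetical pure-component spectrum: one Gaussian absorbance band.
band = [math.exp(-((j - 25) / 8.0) ** 2) for j in range(n_wavelengths)]

y = [random.uniform(0.0, 1.0) for _ in range(n_samples)]          # "lab" values
X = [[c * b + random.gauss(0.0, 0.01) for b in band] for c in y]  # "spectra"

# Mean-center X (column-wise) and y.
x_mean = [sum(row[j] for row in X) / n_samples for j in range(n_wavelengths)]
y_mean = sum(y) / n_samples
Xc = [[row[j] - x_mean[j] for j in range(n_wavelengths)] for row in X]
yc = [v - y_mean for v in y]

# One PLS component: the weight vector w points along the covariance of X with y.
w = [sum(Xc[i][j] * yc[i] for i in range(n_samples)) for j in range(n_wavelengths)]
w_norm = math.sqrt(sum(v * v for v in w))
w = [v / w_norm for v in w]
t = [sum(Xc[i][j] * w[j] for j in range(n_wavelengths)) for i in range(n_samples)]
q = sum(a * b for a, b in zip(yc, t)) / sum(v * v for v in t)     # regress y on t

y_hat = [q * v for v in t]
ss_res = sum((a - b) ** 2 for a, b in zip(yc, y_hat))
ss_tot = sum(v * v for v in yc)
r2 = 1.0 - ss_res / ss_tot
print(f"R2 (fit, one component) = {r2:.4f}")
```

A real calibration would normally use several components and judge the model on samples held out from the fit.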
Issues faced with engineering data
Can be up to 30% (or more!).
Examples of large data sets
• Typical fab processes
– Data size: order of terabytes in a few days
[Figure: Databases in a typical fab process]
[Figure: Database merging in a data‐mining package]
What is a latent variable?
• Fortunately (& unfortunately at the same time), the variables are not all independent.
• They are redundant images of a few “latent” variables
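The notion of measured variables being redundant images of a latent variable can be illustrated with a small simulation. The sketch assumes a single hidden driver and five hypothetical sensors with made-up gains and noise; none of these numbers come from the slides.

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

random.seed(1)
n_obs = 200
latent = [random.gauss(0.0, 1.0) for _ in range(n_obs)]   # the unmeasured driver
gains = [1.0, -2.0, 0.5, 1.5, -1.0]                       # hypothetical sensitivities

# Each measured variable is a scaled, slightly noisy image of the same latent variable.
measured = [[g * z + random.gauss(0.0, 0.1) for z in latent] for g in gains]

pairwise = [abs(corr(measured[i], measured[j]))
            for i in range(len(gains)) for j in range(i + 1, len(gains))]
print(f"{len(pairwise)} variable pairs, smallest |r| = {min(pairwise):.3f}")
```

Five columns are recorded, yet they carry essentially one piece of information. This is the situation latent variable methods are built for.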
[Figure: “I am a doctor!” with panels labeled Protein profile, Metabolic profile, Gene expression, Clinical History/Q&A]
Latent variable methods
(Multivariate statistical methods = latent variable methods)
• Principal component analysis (PCA)
– a.k.a. Karhunen‐Loève transform or Hotelling transform
• Factor analysis
• Fisher’s discriminant analysis
• Independent component analysis
• Principal component regression
• Partial least squares (or projection to latent structures)
• …..