„The perfect is not good enough!” (Carl Benz)
VISUALIZATION OF HIGH DIMENSIONAL DATA BY USE OF GENETIC PROGRAMMING – APPLICATION TO ON-LINE INFRARED SPECTROSCOPY BASED PROCESS MONITORING
TIBOR KULCSÁR, JÁNOS ABONYI
UNIVERSITY OF PANNONIA
DEPARTMENT OF PROCESS ENGINEERING
2
Preconditions
Online analyzers are widely used in the oil industry to predict product properties such as density, cloud point, etc.
These properties cannot be described with linear models.
Visualization of the high-dimensional spectral database is needed for model development and process monitoring.
A cost function and a tool for equation discovery are needed to obtain a compact and interpretable mapping of the high-dimensional data.
3
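Figure: online NIR spectrum, absorbance (0 to 0.016) vs. wavenumber (4000 to 4800 cm-1)
Task I: Estimation
$\mathbf{y}_j = f(\mathbf{w}_j), \quad j = 1, \dots, N$
$\mathbf{w}_j = [w_{j,1}, \dots, w_{j,n_w}]^T \in \mathbb{R}^{n_w}, \quad n_w = 195$
$\mathbf{y}_j = [P_{j,1}, \dots, P_{j,n_y}]^T \in \mathbb{R}^{n_y}$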
4
Similar spectra - Similar property
Figure: left, the spectra with the maximal distance Dmax and a sphere of radius Rsphere = 3; right, the property TotAroPolyCycl against the percentage of Dmax corresponding to the radius of the sphere (15 to 35 %)
$d_i(S_x, S_j) = \sqrt{\sum_{k=1}^{n} (w_{x,k} - w_{j,k})^2}$
If $d_i(S_x, S_j)$ is small, then $P_{j,v} \approx P_{x,v}$, i.e. $|P_{j,v} - P_{x,v}| \le E_v$
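To make the "similar spectra, similar property" check concrete, here is a minimal sketch, assuming Euclidean distances between spectra stored as rows of a NumPy array. The helper names `pairwise_distances` and `property_spread_in_sphere` are hypothetical. The function reports how far a property can drift among the spectra that fall inside a sphere whose radius is a given percentage of Dmax, in the spirit of the right-hand plot.

```python
import numpy as np


def pairwise_distances(W: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of spectra (rows of W)."""
    diff = W[:, None, :] - W[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def property_spread_in_sphere(W: np.ndarray, p: np.ndarray, x: int, percent: float) -> float:
    """Largest |P_j - P_x| among spectra j inside a sphere around spectrum x.

    The sphere radius is `percent` % of Dmax, the largest distance in the data set.
    """
    D = pairwise_distances(W)
    radius = percent / 100.0 * D.max()
    inside = D[x] <= radius
    inside[x] = False  # the query spectrum is not its own neighbor
    if not inside.any():
        return 0.0
    return float(np.abs(p[inside] - p[x]).max())


# Toy data: three one-point "spectra" and one property value per spectrum.
W = np.array([[0.0], [1.0], [10.0]])
p = np.array([1.0, 1.5, 9.0])
for percent in (20, 100):
    print(percent, property_spread_in_sphere(W, p, x=0, percent=percent))
```

Sweeping `percent` over a range and plotting the spread shows how quickly the "similar property" assumption breaks down as the neighborhood grows.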
5
Finding similar spectra / Prediction model
Nearest neighbors algorithm: the neighborhood is the basis of the prediction.
2D mapping: defines the range of validity of the local models; the mapped plane should follow the original spectral space.
Quality measure: measures the quality of the mapping, i.e. how well the neighborhood is preserved.
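One way to turn "neighborhood preserving" into a number is sketched below. The conclusion slide mentions forward and backward neighborhood preservation; the deck does not give the exact measure, so this sketch assumes a rank-based pair of scores in the spirit of trustworthiness and continuity. `neighborhood_errors` is a hypothetical name. The forward error penalizes original k nearest neighbors that are pushed beyond rank k in the 2D map; the backward error penalizes map neighbors that were not close in the original space.

```python
import numpy as np


def _rank_matrix(X: np.ndarray) -> np.ndarray:
    """R[i, j] = rank of point j among the neighbors of point i (1 = nearest)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbor (it gets the last rank)
    order = np.argsort(D, axis=1, kind="stable")
    R = np.empty_like(order)
    R[np.arange(len(X))[:, None], order] = np.arange(1, len(X) + 1)
    return R


def neighborhood_errors(X_high: np.ndarray, Y_low: np.ndarray, k: int = 5):
    """Forward and backward rank errors of the mapping X_high -> Y_low (0 = perfect)."""
    R_high, R_low = _rank_matrix(X_high), _rank_matrix(Y_low)
    nn_high, nn_low = R_high <= k, R_low <= k
    # forward: original neighbors that fall outside the k-neighborhood of the map
    forward = np.where(nn_high, np.maximum(R_low - k, 0), 0).sum() / len(X_high)
    # backward: map neighbors that were outside the original k-neighborhood
    backward = np.where(nn_low, np.maximum(R_high - k, 0), 0).sum() / len(X_high)
    return float(forward), float(backward)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                 # stand-in for spectra
print(neighborhood_errors(X, 2.0 * X, k=5))  # a uniform rescaling preserves every neighborhood
```

In practice `Y_low` would be the 2D coordinates produced by the aggregates, and the two errors could serve as the cost function for the equation search.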
Property X = f(Prop[S1, S2, S3, S4, S5, S6])
Figure: query spectrum X surrounded by its nearest neighbors N1 to N6 (spectra S1 to S6)
$\hat{P}_x = \frac{1}{n} \sum_{i=1}^{n} P_i$
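The averaging rule $\hat{P}_x = \frac{1}{n}\sum_{i=1}^{n} P_i$ can be sketched in a few lines, assuming Euclidean distance and a library of spectra with known laboratory properties. `knn_predict` is a hypothetical helper, and the default of six neighbors simply mirrors the six spectra in the figure.

```python
import numpy as np


def knn_predict(W_train: np.ndarray, P_train: np.ndarray, w_query: np.ndarray, n: int = 6) -> np.ndarray:
    """Estimate the properties of a query spectrum as the mean of its n nearest neighbors."""
    d = np.linalg.norm(W_train - w_query, axis=1)   # distance to every library spectrum
    nearest = np.argsort(d, kind="stable")[:n]      # indices of the n closest spectra
    return P_train[nearest].mean(axis=0)            # P_hat_x = (1/n) * sum_i P_i


W_train = np.array([[0.0], [1.0], [2.0], [10.0]])
P_train = np.array([1.0, 2.0, 3.0, 100.0])
print(knn_predict(W_train, P_train, np.array([1.1]), n=3))
```

Because `P_train` may also be a 2D array (one column per property), the same call returns all properties of the query at once.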
6
Chemical information – interpretable?
Figure: NIR absorbance spectrum (0 to 1.2) vs. wavenumber (4000 to 4800 cm-1), with bands assigned to aromatic, ethylenic, olefinic, branched/cyclonic, linear, saturated and branched structures
$K_{ARO} \sim \frac{W_1 W_2}{W_3 W_4}$
with W1 to W4 taken from the aromatic, linear and olefinic regions
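A band-ratio aggregate such as K_ARO can be computed directly from a spectrum. The sketch below assumes that each W_i is the mean absorbance over a wavenumber window and that the aggregate has the product-ratio form W1·W2/(W3·W4); the window limits in `BANDS` are hypothetical placeholders, not the band assignments used by the authors.

```python
import numpy as np

# Hypothetical band windows in cm-1; the slides do not give the integration limits.
BANDS = {"W1": (4040.0, 4080.0), "W2": (4600.0, 4640.0),
         "W3": (4320.0, 4360.0), "W4": (4240.0, 4280.0)}


def band_absorbance(wavenumbers, spectrum, lo, hi):
    """Mean absorbance of the spectrum inside the window lo..hi (cm-1, inclusive)."""
    mask = (wavenumbers >= lo) & (wavenumbers <= hi)
    if not mask.any():
        raise ValueError(f"no spectral points between {lo} and {hi} cm-1")
    return float(spectrum[mask].mean())


def k_aro(wavenumbers, spectrum, bands=BANDS):
    """Band-ratio aggregate, assumed here to be W1*W2 / (W3*W4)."""
    w = {name: band_absorbance(wavenumbers, spectrum, lo, hi) for name, (lo, hi) in bands.items()}
    return w["W1"] * w["W2"] / (w["W3"] * w["W4"])


wavenumbers = np.linspace(4000.0, 4800.0, 195)  # 195 points, as in the estimation task
spectrum = np.ones_like(wavenumbers)
spectrum[(wavenumbers >= 4040.0) & (wavenumbers <= 4080.0)] = 2.0  # stronger "W1" band
print(k_aro(wavenumbers, spectrum))
```

Keeping the aggregate as a ratio of named bands is what makes it chemically interpretable: a change in K_ARO can be traced back to specific absorption regions.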
7
Aggregates – need for explicit mapping
Figure: scatter-plot matrix of the pairwise relations among the aggregates Rsat, Karo, Kiso, Kene, Nola, Nolef, Naro, Kox, Parox, Karo3, Kcy, Ksatu, KeroH and AKaro
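$\frac{W_1 + W_2}{W_3 + W_4 + W_5}$
$\left(\frac{W_1 W_2}{W_3 W_4} - C_1\right) C_2 + C_3$
$\left(\frac{W_1 - W_2 + C_1}{C_2 W_1 + W_4} - C_3\right) C_4 + C_5$
$\left(\frac{W_1 W_2}{W_3} - C_1\right) C_2 - C_3$
$\left(\frac{W_1 + W_2 + W_3}{W_4 + W_5} - C_1\right) C_2 + C_3$
$\left(\frac{C_1 W_1 + W_2}{W_3 + W_4} - C_1\right) C_2 + C_3$
$(C_1 W_1 + C_2 W_2 + C_3 W_3 + C_4 W_4 + C_5 W_5 + C_6 W_6) - C_7$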
Figure: two-aggregate 2D mapping (axes: Aggregate 1 and Aggregate 2)
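A two-aggregate 2D mapping simply evaluates one aggregate per axis for every spectrum. The sketch below takes the first two expressions listed on this slide and assumes that the columns of `B` hold the band absorbances W1 to W5 of each spectrum; the constants C1 to C3 default to neutral hypothetical values, whereas in the deck they would be fitted.

```python
import numpy as np


def aggregate_1(B: np.ndarray) -> np.ndarray:
    """(W1 + W2) / (W3 + W4 + W5), evaluated row-wise; columns of B hold W1..W5."""
    return (B[:, 0] + B[:, 1]) / (B[:, 2] + B[:, 3] + B[:, 4])


def aggregate_2(B: np.ndarray, c1: float, c2: float, c3: float) -> np.ndarray:
    """(W1*W2 / (W3*W4) - C1) * C2 + C3, evaluated row-wise."""
    return (B[:, 0] * B[:, 1] / (B[:, 2] * B[:, 3]) - c1) * c2 + c3


def two_aggregate_map(B: np.ndarray, c=(0.0, 1.0, 0.0)) -> np.ndarray:
    """Map every spectrum (row of band absorbances) to one point of the 2D plot."""
    return np.column_stack([aggregate_1(B), aggregate_2(B, *c)])


B = np.array([[1.0, 1.0, 1.0, 1.0, 1.0],
              [2.0, 1.0, 1.0, 1.0, 1.0]])
print(two_aggregate_map(B))
```

Each row of the result is an (x, y) point; coloring the points by a laboratory property gives the kind of plot shown on the slide.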
8
Representation of Aggregates
One of the most popular methods for representing structures is the binary tree.
$y = x_1 - \frac{p_2 + x_2}{p_1}$
Terminal nodes: variables x1, x2; parameters p1, p2
Non-terminal nodes: operators +, -, *, /; functions exp(), cos()
Tree (prefix notation): (- x1 (/ (+ p2 x2) p1))
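A binary expression tree of this kind can be sketched as a small recursive data structure. The `Node` class and the `evaluate`/`to_infix` helpers below are hypothetical illustrations, not the authors' implementation; they encode the example tree read as y = x1 - (p2 + x2)/p1 and support the operators and functions listed above.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """One node of a binary expression tree."""
    value: str                      # operator, function name or terminal symbol
    left: Optional["Node"] = None   # only child for unary functions
    right: Optional["Node"] = None


BINARY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
          "*": lambda a, b: a * b, "/": lambda a, b: a / b}
UNARY = {"exp": math.exp, "cos": math.cos}


def evaluate(node: Node, env: Dict[str, float]) -> float:
    """Evaluate the tree recursively; terminals are looked up in env."""
    if node.value in BINARY:
        return BINARY[node.value](evaluate(node.left, env), evaluate(node.right, env))
    if node.value in UNARY:
        return UNARY[node.value](evaluate(node.left, env))
    return env[node.value]          # variable (x1, x2) or parameter (p1, p2)


def to_infix(node: Node) -> str:
    """Render the tree as a fully parenthesized infix string."""
    if node.value in BINARY:
        return f"({to_infix(node.left)} {node.value} {to_infix(node.right)})"
    if node.value in UNARY:
        return f"{node.value}({to_infix(node.left)})"
    return node.value


# Root "-" with children x1 and "/"; "/" divides (p2 + x2) by p1.
tree = Node("-", Node("x1"), Node("/", Node("+", Node("p2"), Node("x2")), Node("p1")))
print(to_infix(tree))                                                # (x1 - ((p2 + x2) / p1))
print(evaluate(tree, {"x1": 5.0, "x2": 1.0, "p1": 2.0, "p2": 3.0}))  # 3.0
```

Separating variables from parameters in the environment is what later allows the parameters to be tuned numerically while the tree structure is evolved.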
9
Genetic Operators: Mutation
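Before: (- x1 (/ (* x2 x1) p1)), i.e. y = x1 - x2·x1/p1
After mutating the * node into +: (- x1 (/ (+ x2 x1) p1)), i.e. y = x1 - (x2 + x1)/p1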
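The mutation above replaces a single operator node. A minimal sketch of such a point mutation follows, assuming trees stored as nested tuples `(operator, left, right)` with string terminals; the helper names are hypothetical, and other GP variants instead replace a whole subtree with a randomly grown one.

```python
import random
from typing import List, Tuple, Union

Tree = Union[str, tuple]          # terminal symbol or (operator, left, right)
Path = Tuple[int, ...]            # child indices from the root (1 = left, 2 = right)
OPERATORS = ["+", "-", "*", "/"]


def operator_paths(tree: Tree, path: Path = ()) -> List[Path]:
    """Paths of all operator (non-terminal) nodes."""
    if isinstance(tree, str):
        return []
    paths = [path]
    for i in (1, 2):
        paths += operator_paths(tree[i], path + (i,))
    return paths


def replace_operator(tree: Tree, path: Path, new_op: str) -> Tree:
    """Return a copy of tree with the operator at `path` replaced by new_op."""
    if not path:
        return (new_op,) + tree[1:]
    node = list(tree)
    node[path[0]] = replace_operator(tree[path[0]], path[1:], new_op)
    return tuple(node)


def point_mutation(tree: Tree, rng: random.Random) -> Tree:
    """Pick a random operator node and swap it for a different operator."""
    path = rng.choice(operator_paths(tree))
    node = tree
    for i in path:
        node = node[i]
    new_op = rng.choice([op for op in OPERATORS if op != node[0]])
    return replace_operator(tree, path, new_op)


parent = ("-", "x1", ("/", ("*", "x2", "x1"), "p1"))
print(replace_operator(parent, (2, 1), "+"))   # the * node becomes +, as on the slide
print(point_mutation(parent, random.Random(0)))
```

Because only the operator changes, the tree shape and size are preserved, which keeps the mutated equation compact.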
10
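Genetic Operators: Crossover
Parents: (- x1 (/ (+ x2 x1) p1)) and (+ x2 (+ x1 p1))
Offspring (the subtrees (/ (+ x2 x1) p1) and (+ x1 p1) are exchanged): (- x1 (+ x1 p1)) and (+ x2 (/ (+ x2 x1) p1))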
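Subtree crossover, as illustrated above, exchanges one subtree between two parents. The sketch below assumes the same nested-tuple tree representation with string terminals; `crossover` and `random_crossover` are hypothetical helpers, and the deterministic call reproduces the exchange shown on the slide.

```python
import random
from typing import List, Tuple, Union

Tree = Union[str, tuple]          # terminal symbol or (operator, left, right)
Path = Tuple[int, ...]            # child indices from the root (1 = left, 2 = right)


def all_paths(tree: Tree, path: Path = ()) -> List[Path]:
    """Paths of every node, root included."""
    out = [path]
    if not isinstance(tree, str):
        for i in (1, 2):
            out += all_paths(tree[i], path + (i,))
    return out


def get_subtree(tree: Tree, path: Path) -> Tree:
    for i in path:
        tree = tree[i]
    return tree


def set_subtree(tree: Tree, path: Path, sub: Tree) -> Tree:
    """Return a copy of tree with the node at `path` replaced by sub."""
    if not path:
        return sub
    node = list(tree)
    node[path[0]] = set_subtree(tree[path[0]], path[1:], sub)
    return tuple(node)


def crossover(a: Tree, path_a: Path, b: Tree, path_b: Path) -> Tuple[Tree, Tree]:
    """Exchange the subtree at path_a in a with the subtree at path_b in b."""
    sub_a, sub_b = get_subtree(a, path_a), get_subtree(b, path_b)
    return set_subtree(a, path_a, sub_b), set_subtree(b, path_b, sub_a)


def random_crossover(a: Tree, b: Tree, rng: random.Random) -> Tuple[Tree, Tree]:
    """Crossover at randomly chosen non-root nodes (both parents must be operator trees)."""
    return crossover(a, rng.choice(all_paths(a)[1:]), b, rng.choice(all_paths(b)[1:]))


parent_a = ("-", "x1", ("/", ("+", "x2", "x1"), "p1"))
parent_b = ("+", "x2", ("+", "x1", "p1"))
for child in crossover(parent_a, (2,), parent_b, (2,)):  # swap the right subtrees
    print(child)
```

Swapping subtrees conserves the total number of nodes across the two offspring, but either child can grow, so GP implementations usually cap the tree depth.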
11
Scheme of Genetic Programming
Creation of initial population
-> Evaluation (parameter optimization, fitness value)
-> End? (yes: End)
-> Selection
-> Crossover / Mutation / Direct reproduction
-> New generation, then back to Evaluation
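The loop above can be sketched as a compact symbolic-regression run. This is a toy illustration under stated assumptions, not the authors' MATLAB implementation: trees are nested tuples over the variables x1 and x2, division is protected, selection is by tournament, and the parameter-optimization step is approximated by a least-squares fit of an offset and a scale, which mirrors the "(... - C1)·C2 + C3" wrapping seen in the aggregate equations. All names and settings are hypothetical.

```python
import random
import numpy as np

# Protected division: returns 1.0 where the denominator is (almost) zero.
OPS = {"+": np.add, "-": np.subtract, "*": np.multiply,
       "/": lambda a, b: np.divide(a, b, out=np.ones_like(a), where=np.abs(b) > 1e-9)}
TERMINALS = ["x1", "x2"]


def random_tree(rng, depth):
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (rng.choice(list(OPS)), random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, X):
    if isinstance(tree, str):
        return X[tree]
    return OPS[tree[0]](evaluate(tree[1], X), evaluate(tree[2], X))


def depth(tree):
    return 0 if isinstance(tree, str) else 1 + max(depth(tree[1]), depth(tree[2]))


def paths(tree, path=()):
    out = [path]
    if not isinstance(tree, str):
        out += paths(tree[1], path + (1,)) + paths(tree[2], path + (2,))
    return out


def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def put(tree, path, sub):
    if not path:
        return sub
    node = list(tree)
    node[path[0]] = put(tree[path[0]], path[1:], sub)
    return tuple(node)


def fitness(tree, X, y):
    """Parameter optimization + fitness: least-squares fit of y ~ c0 + c1*f(X), return the MSE."""
    f = evaluate(tree, X)
    A = np.column_stack([np.ones_like(f), f])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((A @ coef - y) ** 2))


def tournament(pop, scores, rng, k=3):
    return pop[min(rng.sample(range(len(pop)), k), key=lambda i: scores[i])]


def run_gp(X, y, pop_size=60, generations=30, max_depth=5, tol=1e-12, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng, 3) for _ in range(pop_size)]   # creation of initial population
    for _ in range(generations):
        scores = [fitness(t, X, y) for t in pop]             # evaluation
        best = min(range(pop_size), key=scores.__getitem__)
        if scores[best] < tol:                               # End?
            break
        new_pop = [pop[best]]                                # keep the best tree unchanged
        while len(new_pop) < pop_size:
            parent = tournament(pop, scores, rng)            # selection
            r = rng.random()
            if r < 0.7:                                      # crossover
                other = tournament(pop, scores, rng)
                child = put(parent, rng.choice(paths(parent)), get(other, rng.choice(paths(other))))
            elif r < 0.9:                                    # mutation
                child = put(parent, rng.choice(paths(parent)), random_tree(rng, 2))
            else:                                            # direct reproduction
                child = parent
            new_pop.append(child if depth(child) <= max_depth else parent)
        pop = new_pop                                        # new generation
    scores = [fitness(t, X, y) for t in pop]
    best = min(range(len(pop)), key=scores.__getitem__)
    return pop[best], scores[best]


data_rng = np.random.default_rng(0)
X = {"x1": data_rng.uniform(0.5, 2.0, 50), "x2": data_rng.uniform(0.5, 2.0, 50)}
y = 2.0 * X["x1"] * X["x2"] + 3.0
tree, mse = run_gp(X, y)
print(tree, mse)
```

Fitting the scale and offset inside the fitness function lets the evolutionary search concentrate on the structure of the equation. For the visualization task, the MSE would be replaced by a mapping-quality cost such as a neighborhood-preservation score.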
12
Process of model development
Measurement
•Online spectrum
•Laboratory data
MATLAB
•Preprocessing
•Data query
MATLAB genetic algorithm
TOPNIR environment
Online system
13
Results
Best pair from the original set; best equation and an optimised pair
Search for a better pair
14
Conclusion
The quality of mapping is measurable
•Neighborhood preservation (forward and backward)
•Discrimination of operational regimes
Aggregate-based mapping
•Interpretable chemical information
•Building aggregates requires much experience (divination)
Genetic programming
•A controlled method to create new equations
•Needs a proper cost function (a measure of the quality of mapping)
Visual representation of models
•Aggregate -> 2D plot -> dashboard graph
•Information about the model structure
15
Questions? …
ACKNOWLEDGMENT
The financial support of the TAMOP-4.2.2/B-10/1-2010-0025 project is acknowledged.
In case of any questions or remarks, please contact us.