

Implementation of the web-based flow-oriented approach to the process control and optimization

Yury V. Zontov (1), Alexey L. Pomerantsev (2,3), Oxana Ye. Rodionova (2)

1 Moscow State Institute of Electronics and Mathematics, Moscow

2 Semenov Institute of Chemical Physics, Russian Academy of Sciences, Moscow

3 State South Research & Testing Site RAS, Sochi

Projection methods + SIC approach

For real-world data, the SIC method is often used together with a multivariate calibration method. The most useful results are obtained when the regression results are supplemented with the SIC results.

Fig. 9 Procedure flow-chart (boxes: Initial Data Set {X, Y}; PLS/PCR model, fixed number of PCs; SIC-modeling. Labels: RMSEC, RMSEP, $\hat{y}$; $\beta$: $b_{min}$, $b_{sic}$; $[v^-, v^+]$)

SIC prediction

Using the obtained RPV, we can solve the prediction problem for any given predictor vector x (e.g. a spectrum). The result of prediction is presented as an interval for the response y:

$$V = [v^-, v^+]$$

where

$$v^+ = \max_{a \in A} x^t a, \qquad v^- = \min_{a \in A} x^t a.$$

This is a typical linear programming problem, so the interval V can be found for any new object x (see Fig. 6), and there is no need to construct A explicitly.
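The interval prediction can be illustrated with a minimal sketch for the two-parameter case. It is not the CSDesign implementation: instead of calling a linear programming solver, it uses the fact that a linear function over a bounded polygon reaches its extremes at a vertex, and enumerates the vertices directly. The function names and the toy data are hypothetical, and the region is treated as closed (boundary included), which is an assumption. For more than two parameters a general-purpose LP solver would be the practical choice.

```python
from itertools import combinations


def rpv_constraints(X, y, beta):
    """Rewrite |x_i . a - y_i| <= beta as rows (g1, g2, h): g1*a1 + g2*a2 <= h."""
    rows = []
    for (x1, x2), yi in zip(X, y):
        rows.append((x1, x2, yi + beta))      #  x_i . a <= y_i + beta
        rows.append((-x1, -x2, beta - yi))    # -x_i . a <= -(y_i - beta)
    return rows


def rpv_vertices(rows, tol=1e-9):
    """Vertices of the region: feasible intersections of pairs of boundary lines."""
    vertices = []
    for (g1, g2, h), (k1, k2, m) in combinations(rows, 2):
        det = g1 * k2 - g2 * k1
        if abs(det) < tol:                    # parallel lines have no single crossing
            continue
        a1 = (h * k2 - g2 * m) / det          # Cramer's rule for the 2x2 system
        a2 = (g1 * m - h * k1) / det
        if all(r1 * a1 + r2 * a2 <= rh + tol for r1, r2, rh in rows):
            vertices.append((a1, a2))
    return vertices


def sic_interval(X, y, beta, x_new):
    """Return (v_minus, v_plus) for x_new; needs a bounded, non-empty region."""
    vertices = rpv_vertices(rpv_constraints(X, y, beta))
    if not vertices:
        raise ValueError("region is empty or unbounded for this beta")
    values = [x_new[0] * a1 + x_new[1] * a2 for a1, a2 in vertices]
    return min(values), max(values)


# Toy calibration set (illustrative numbers only): the region is the box
# 0.5 <= a1 <= 1.5, 1.5 <= a2 <= 2.5.
X_cal = [(1.0, 0.0), (0.0, 1.0)]
y_cal = [1.0, 2.0]
v_minus, v_plus = sic_interval(X_cal, y_cal, 0.5, (1.0, 1.0))
print(v_minus, v_plus)
```

Note that the vertices are only a device of this small sketch; a solver-based version would find the same two bounds without ever listing them.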

SIC Status Classification

The interval approach makes it possible to classify calibration or new objects (test set samples, or new x-data alone) explicitly in relation to the obtained calibration model, which is represented by the RPV. This classification is constructed using the following measures of prediction quality (Fig. 7).

SIC-residual r is defined as

$$r(x, y) = \left[ y - 0.5\,(v^+ + v^-) \right] / \beta$$

SIC-leverage h is defined as

$$h(x, y) = \left[ 0.5\,(v^+ - v^-) \right] / \beta$$

Two fundamental equations,

$$|r(x, y)| = 1 - h(x, y) \quad \text{and} \quad |r(x, y)| = 1 + h(x, y),$$

divide the SIC-residual (r) vs. SIC-leverage (h) plane into three regions, which correspond to three categories of objects: insiders, outsiders and outliers. This can be represented as the object status plot (OSP, Fig. 8) for any dimensionality of the initial data set and for any number of estimated model parameters.
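The two measures and the resulting status can be computed in a few lines. The sketch below is only an illustration: the function name is hypothetical, and the assignment of the three regions (insiders below the first line, outliers above the second, outsiders in between) as well as the handling of points lying exactly on a boundary are assumptions, since the text above names the categories without spelling out the inequalities.

```python
def sic_status(v_minus, v_plus, beta, y=None):
    """Return (r, h, status) for one object given its SIC interval [v_minus, v_plus]."""
    h = 0.5 * (v_plus - v_minus) / beta          # SIC-leverage
    if y is None:
        return None, h, None                     # new x-data alone: only h is available
    r = (y - 0.5 * (v_plus + v_minus)) / beta    # SIC-residual
    if abs(r) <= 1 - h:                          # assumed region for insiders
        status = "insider"
    elif abs(r) > 1 + h:                         # assumed region for outliers
        status = "outlier"
    else:
        status = "outsider"
    return r, h, status


# Illustrative numbers only: interval [1.5, 2.5] with beta = 1.0 gives h = 0.5.
for y_ref in (2.0, 3.0, 4.0):
    print(y_ref, sic_status(1.5, 2.5, 1.0, y_ref))
```

Each returned pair (h, r) is one point on the object status plot.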

Fig. 5 Typical shape of RPV: a polyhedron in the 2-dimensional parameter space (axes $a_1$, $a_2$)

Fig. 6 SIC interval prediction (interval $V = [v^-, v^+]$ for $y = x^t a$)

Fig. 7 Prediction & calibration intervals

Fig. 8 SIC Object Status Plot

Region of possible parameter values (RPV)

The Region of Possible (parameter) Values (RPV, see Fig. 5) is a set in the parameter space defined as

$$A = \{ a \in R^p : |Xa - y| < \beta \}$$

Region A is a closed convex polyhedron. It is a volumetric analogue of the conventional point estimates of the parameters, which are calculated by some traditional regression method, e.g. PLS.
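Whether a particular parameter vector, for example a point estimate from a regression method, lies in this region can be checked directly from the definition. The sketch below uses hypothetical function names and toy numbers, and it includes the boundary in the region (a non-strict inequality), which is an assumption made to match the description of A as closed.

```python
def max_abs_residual(X, y, a):
    """Largest |x_i . a - y_i| over the calibration samples."""
    return max(
        abs(sum(xj * aj for xj, aj in zip(row, a)) - yi)
        for row, yi in zip(X, y)
    )


def in_rpv(X, y, a, beta):
    """True if a satisfies |Xa - y| <= beta in every row."""
    return max_abs_residual(X, y, a) <= beta


# Toy data (illustrative only): three samples, two parameters.
X_cal = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
y_cal = [1.0, 2.0, 3.2]
a_point = (1.0, 2.0)                 # stands in for a point estimate

worst = max_abs_residual(X_cal, y_cal, a_point)
inside_wide = in_rpv(X_cal, y_cal, a_point, 0.5)
inside_narrow = in_rpv(X_cal, y_cal, a_point, 0.1)
print(worst, inside_wide, inside_narrow)
```

The largest absolute residual of a candidate vector is therefore the smallest error bound for which that vector still belongs to the region.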

Introduction

We present CSDesign, a modular web-based system that provides a flow-oriented environment for the synthesis and analysis of various processes. Two mathematical methods are implemented for process control and optimization: PLS regression and the SIC method.

Fig. 2 Multi-stage production process (stages I to VII; cumulative number of process variables after each stage: I 6, II 8, III 11, IV 14, V 16, VI 19, VII 25)

Fig. 3 Process variables set (columns S1 to S3, W1 to W3, WR1, WR2, CW1 to CW3, M1 to M3, MR1, MR2, CM1 to CM3, A1 to A6 and Y, grouped into blocks X_I to X_VII; rows split into the training set (102) and the test set (52))

Multistage process

The functionality of the system is illustrated with a real-world example of a multi-stage continuous technological process. It is represented by 25 key variables X and by one output variable y, which is the final quality of the product. The whole cycle is divided into seven stages numbered with Roman numerals. The first stage (I) is represented by six input variables (W1, W2, W3 and S1, S2, S3) that stand for the properties of the raw components S and W.

At the second stage (II), component W is refined, and variables WR1 and WR2 characterize this process. Variables CW1, CW2, and CW3 (Stage III) represent the properties of the resulting product CW. The next stage (IV) is the mixing of the raw component S and the refined component CW. The result M is characterized by variables M1, M2, and M3. Afterward, blend M is also refined (Stage V) with the process characteristics MR1 and MR2, and the properties of the outcome CM are represented by variables CM1, CM2, and CM3 (Stage VI). The last stage (VII) stands for the final adjustments, which are made with additives A1, ..., A6. The output variable (P = y) is the final product quality.

Data set description

We have a collection of historical data measured for 154 samples that characterize proper process performance. Each sample corresponds to the entire production cycle shown in Fig. 2. The whole data set is divided horizontally (by samples) into two parts: the training set (102 objects) and the test set (52 objects). All data are also divided vertically (by variables) into seven blocks in conformity with the technological stages (see Fig. 3).
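The vertical division can be written down as a small lookup table. In the sketch below the variable names per stage come from the process description, while the data structure, the function names and the toy sample are hypothetical. It also assumes that the block used at stage M accumulates all variables known up to and including that stage, which is how the counts 6, 8, 11, 14, 16, 19, 25 in Fig. 2 are read here.

```python
# Variables introduced at each stage, in process order.
STAGE_VARIABLES = {
    "I": ["S1", "S2", "S3", "W1", "W2", "W3"],
    "II": ["WR1", "WR2"],
    "III": ["CW1", "CW2", "CW3"],
    "IV": ["M1", "M2", "M3"],
    "V": ["MR1", "MR2"],
    "VI": ["CM1", "CM2", "CM3"],
    "VII": ["A1", "A2", "A3", "A4", "A5", "A6"],
}


def cumulative_blocks(stage_variables):
    """Map each stage to all variable names available by the end of that stage."""
    blocks, seen = {}, []
    for stage, names in stage_variables.items():
        seen = seen + names
        blocks[stage] = list(seen)
    return blocks


def select_columns(sample, names):
    """Pick the values of the named variables from one sample (a dict)."""
    return [sample[name] for name in names]


blocks = cumulative_blocks(STAGE_VARIABLES)
block_sizes = [len(names) for names in blocks.values()]
print(block_sizes)

# A made-up sample with every variable set to 0.0, only to show the slicing.
sample = {name: 0.0 for name in blocks["VII"]}
x_stage_3 = select_columns(sample, blocks["III"])
print(len(x_stage_3))
```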

Fig. 10 The modeling network

Fig. 11 SIC-Results module GUI

Fig. 12 OSP Module GUI

Software implementation

CSDesign is a software system that uses the ideas of the flow-based programming approach. In computer science, flow-based programming (FBP) is a programming paradigm that defines applications as networks of "black box" processes, which exchange data across predefined connections by message passing, where the connections are specified externally to the processes. This approach makes it possible to extend application functionality by adding new modules or by changing their interaction patterns, without having to change their internal structure.

CSDesign is also web-based, which means that you do not need to install it on every computer in your laboratory. All you need is a modern web browser.
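The flow-based idea described above can be shown with a very small sketch. It does not reproduce the CSDesign framework or its interfaces: the class, the function and the module names are hypothetical, and the runner assumes that modules are listed so that every source comes before its targets. What it does show is that the wiring lives in a separate list of connections, so a module can be rewired without touching its code.

```python
class Module:
    """A black-box process: it sees only its own inputs, not its neighbours."""

    def __init__(self, name, func, n_inputs):
        self.name = name
        self.func = func
        self.n_inputs = n_inputs


def run_network(modules, connections):
    """Run modules in list order; connections are (source, target, port) triples."""
    inbox = {m.name: [None] * m.n_inputs for m in modules}
    outputs = {}
    for m in modules:
        outputs[m.name] = m.func(*inbox[m.name])
        for source, target, port in connections:
            if source == m.name:
                inbox[target][port] = outputs[m.name]   # pass the result on
    return outputs


def center(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


modules = [
    Module("input", lambda: [1.0, 2.0, 3.0], 0),
    Module("center", center, 1),
    Module("report", max, 1),
]
# The wiring is defined outside the modules themselves.
connections = [("input", "center", 0), ("center", "report", 0)]

outputs = run_network(modules, connections)
print(outputs["center"], outputs["report"])
```

Replacing the second connection with ("input", "report", 0) would change what the report module receives without any change to the modules.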

Process control and optimization

Using these data, we construct a series of PLS1 regression models. Each model is denoted here by the operator XY(M), which maps the X block, X(M), to the Y block, y. Each XY model uses the same number of PLS principal components, k. The main purpose of these models is to predict the output quality variable y at each (M-th) stage of the production process. The predicted value can then be compared with the desired quality level. Too large a difference signals that something is wrong and that the process requires active correction at the next, (M+1)-th, stage. To verify these corrections, a process engineer may try out various values of the variables that characterize stage M+1. The corresponding model XY(M+1): X(M+1) ⇒ y can validate the solution. Therefore, the system of such models serves as an “adviser” that helps the engineer to make a decision. However, this adviser cannot predict the future outcome y exactly; there is always some uncertainty. To represent it, the corresponding SIC models are used. These models are built on the basis of the respective PLS models with a given number of principal components, k.
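The stage-by-stage check can be sketched as a simple loop over per-stage predictions. This is an illustration under stated assumptions, not the decision rule used in CSDesign: the function name, the tolerance parameter and all numbers are hypothetical, and the predictions are supplied by hand instead of coming from fitted PLS and SIC models. A stage is flagged when the point prediction is further from the desired level than the tolerance, and the SIC interval is reported alongside to show whether the desired level is still attainable within the stated uncertainty.

```python
def advise(stage_predictions, target, tolerance):
    """Flag stages whose predicted quality deviates too far from the target.

    stage_predictions: (stage, y_hat, v_minus, v_plus) tuples in process order.
    Returns (stage, needs_correction, target_in_interval) tuples.
    """
    report = []
    for stage, y_hat, v_minus, v_plus in stage_predictions:
        needs_correction = abs(y_hat - target) > tolerance
        target_in_interval = v_minus <= target <= v_plus
        report.append((stage, needs_correction, target_in_interval))
    return report


# Made-up predictions for two stages; not taken from the data set.
predictions = [
    ("I", 10.0, 8.0, 12.0),
    ("II", 13.0, 12.0, 14.0),
]
report = advise(predictions, target=10.5, tolerance=1.0)
for stage, needs_correction, target_in_interval in report:
    print(stage, needs_correction, target_in_interval)
```

In this toy run the second stage is flagged, which would prompt the engineer to try out corrected values for the next stage and predict again.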

Fig. 1 Base classes and interfaces

Main features

The main features of the system are as follows:

• Extensibility

The functionality of the system and the range of tasks it solves can be expanded by adding new task-specific software modules. The programming framework developed as part of CSDesign provides a number of interfaces that a new software module should implement (see Fig. 1).

• Intuitiveness

A specific problem is set up by drawing a flow chart, using such familiar actions as dragging and dropping the solution's components and connecting them with links.

• Server-side calculations

Most of the complex and resource-consuming calculations are performed remotely, and the results are transferred back to the client asynchronously using Ajax technology. The rich user interface is used primarily for visualization and input.

SIC definition

Simple Interval Calculation (SIC) is a method for linear modeling that gives the result of prediction directly in interval form. The primary consequence of SIC is a radically new object classification that can be interpreted using a two-dimensional object status plot (OSP), 'SIC residual vs. SIC leverage'.

SIC basic assumption

All errors involved in the general calibration problem

$$y = x^t a + \varepsilon$$

(sampling errors, measurement errors, modeling errors, etc.) are limited, which would appear to be a reasonable supposition in many practical applications. This assumption means that there exists a positive value $\beta$ (initially unknown) that limits the difference between the predicted response and the true response value y:

$$\mathrm{Prob}\{ |\varepsilon| > \beta \} = 0, \quad \text{and for any } 0 < b < \beta: \ \mathrm{Prob}\{ |\varepsilon| > b \} > 0.$$

$\beta$ is called the maximum error deviation (MED).

Fig. 4 Examples of error distributions: normal and some finite distributions considered in SIC (horizontal axis: error $\varepsilon$, from $-\beta$ to $+\beta$)

Process modeling

To accomplish the process modeling task, several software components, such as the matrix input module (ch_input), the autoscaling module, and the PLS module (ch_pls), were implemented in the system (see Fig. 1). The SIC method is adopted and incorporated into the SIC and Object Status Plot drawing modules (ch_sic, ch_sic_osp, ch_sic_out). Each module consists of a GUI part written in HTML, a DLL library and several Matlab m-files.

Fig. 10 shows the “network” used in modeling one stage of the process under consideration. You can build such a network simply by dragging the necessary modules from the toolbox panel to the main workspace of CSDesign. The Input modules contain the calibration and test data sets. The PLS module calculates the PLS model based on the number of principal components and the data preprocessing defined on its settings screen. Finally, the SIC module calculates the SIC model, which can be interpreted using the OSP and the prediction intervals. These can be viewed by clicking the “magnifying glass” button on the Intervals module (see Fig. 11) and on the OSP module (see Fig. 12).

Conclusions

Employing a flow-oriented approach to user interface development allows us to define the modeling task in an intuitive and user-friendly manner and to implement the process as a network of reusable modules.