Tsinghua Science and Technology, ISSN 1007-0214, 11/20, pp. 582-586, Volume 10, Number 5, October 2005
Multivariate Statistical Process Monitoring Using Robust
Nonlinear Principal Component Analysis*
ZHAO Shijian, XU Yongmao**
Department of Automation, Tsinghua University, Beijing 100084, China
Abstract: The principal component analysis (PCA) algorithm is widely applied in a diverse range of fields for
performance assessment, fault detection, and diagnosis. However, in the presence of noise and gross errors,
the nonlinear PCA (NLPCA) using autoassociative bottle-neck neural networks is so sensitive that the
obtained model differs significantly from the underlying system. In this paper, a robust version of NLPCA is
introduced by replacing the generally used error criterion mean squared error with a mean log squared error.
This is followed by a concise analysis of the corresponding training method. A novel multivariate statistical
process monitoring (MSPM) scheme incorporating the proposed robust NLPCA technique is then
investigated and its efficiency is assessed through application to an industrial fluidized catalytic cracking
plant. The results demonstrate that, compared with NLPCA, the proposed approach can effectively reduce
the number of false alarms and is, hence, expected to better monitor real-world processes.
Key words: robust nonlinear principal component analysis; autoassociative networks; multivariate statistical
process monitoring (MSPM); fluidized catalytic cracking unit (FCCU)
Introduction
Multivariate statistical process monitoring (MSPM)
has achieved great success in the past decade, mainly
due to the adoption of powerful chemometric
techniques such as principal component analysis (PCA) and projection to latent structures (PLS)[1,2].
PCA has been developed and successfully applied to
deal with data sets from industrial processes with a
large number of highly correlated variables. However,
these data sets are usually also contaminated by noise
and gross errors. The basic concept behind PCA is to
project the data set onto a lower dimensionality
subspace. Noise is suppressed to a certain extent by discarding the latent variables that mainly reflect the influence of random noise.
Unfortunately, although PCA is good for linear or
almost linear problems, it fails to deal well with the
significant intrinsic nonlinearity associated with
real-world processes. Hence, nonlinear extensions of
PCA have been investigated by different researchers.
One of the preferred methods, owing to its simplicity, is that proposed by Kramer[3], which employs a five-layer autoassociative network with a bottle-neck layer to achieve the dimensionality reduction.
Data collected from industrial processes are
normally contaminated by noise and gross errors,
which severely affects the model accuracy. Kramer[4]
proposed a robust version of NLPCA and illustrated
that it could be utilized to detect gross errors,
compensate for missing values, and estimate the
desired value in one pass. However, the model
demands that the data should all be for the normal
operating condition (NOC). Furthermore, massive
samples are required for network training, which
makes the method quite time-consuming. In this paper, the error criterion is analyzed and modified to give a robust NLPCA method, which is applied to the monitoring of a fluidized catalytic cracking (FCC) plant. The results demonstrate that, compared with NLPCA, the new technique is more effective and can produce a more accurate representation of the process.

Received: 2004-02-26; revised: 2004-05-11
* Supported by the National High-Tech Research and Development (863) Program of China (No. 2001AA413320)
** To whom correspondence should be addressed. E-mail: [email protected]; Tel: 86-10-62785845
1 Robust Nonlinear PCA Based on Autoassociative Neural Network
1.1 Nonlinear PCA (NLPCA)
Consider a data set $X = [x_1 \; \cdots \; x_n]^{\mathrm{T}} \in \mathbf{R}^{n \times p}$, where $n$ is the number of samples being observed and $p$ is the number of process variables. PCA aims to find a projection axis $w_1 \in \mathbf{R}^{p \times 1}$ which represents the direction with maximum variability to provide a simpler and more efficient representation of the original data set[5]:

$$\max_{w} \; \mathrm{var}(Xw) \quad \text{subject to} \quad \|w\|^2 = 1 \tag{1}$$

This objective function is equivalent to the minimum reconstruction error criterion[6]:

$$\min_{w} \; J = \frac{1}{n}\sum_{i=1}^{n} \left\| x_i - w w^{\mathrm{T}} x_i \right\|^2 \tag{2}$$

After the first principal component (PC) $w_1$ is obtained, $X$ is deflated as $X \leftarrow X - X w_1 w_1^{\mathrm{T}}$, and the deflated $X$ is then used to retrieve the remaining PCs in an analogous manner. This process continues until a convergence index is satisfied.
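To make the deflation procedure concrete, the following minimal NumPy sketch (illustrative only, not from the paper; all names are assumptions) extracts successive PCs by taking the direction of maximum variance of Eq. (1) and deflating the data matrix as described above.

```python
import numpy as np

def pca_by_deflation(X, n_components):
    """Extract leading PCs one at a time: maximize captured variance, then deflate X."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)              # center the data set
    W = []
    for _ in range(n_components):
        cov = X.T @ X / (len(X) - 1)    # covariance of the (deflated) data
        _, eigvecs = np.linalg.eigh(cov)
        w = eigvecs[:, -1]              # unit-norm direction of maximum variance, ||w||^2 = 1
        W.append(w)
        X = X - np.outer(X @ w, w)      # deflation: X <- X - X w w^T
    return np.column_stack(W)           # columns are the projection axes w_1, w_2, ...
```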
This standard approach has been extended by several researchers to cope with the intrinsic nonlinearity in the data collected from industrial processes. Kramer's approach[3] employed an autoassociative neural network with a bottle-neck layer, as shown in Fig. 1. Kramer defined the output of the bottle-neck layer, $T = G(X) \in \mathbf{R}^{n \times f}$ $(f < p)$, as the nonlinear PC score matrix, where $G$ is a nonlinear vector function assembling the nonlinear mapping of the first three layers. $T$ is then transformed inversely to the original dimensionality as $\hat{X} = H(T) \in \mathbf{R}^{n \times p}$, where $H$ is another nonlinear function representing the effect of the remaining two layers. This network is trained to match the reconstructed outputs $\hat{X} = [\hat{x}_1 \; \cdots \; \hat{x}_p]$ with $X$ as closely as possible, or, more traditionally, to minimize the mean squared error (MSE):

$$E = \frac{1}{n}\sum_{i=1}^{n} \|e_i\|^2 \tag{3}$$

where $e_i = x_i - \hat{x}_i$ is the reconstruction error. The training can be implemented using a back-propagation (BP)[7] algorithm or its variants, such as RPROP BP[8].

Fig. 1 NLPCA using an autoassociative network
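For concreteness, here is a minimal sketch (not the authors' implementation; PyTorch and all names are assumptions) of such a five-layer autoassociative bottle-neck network trained with the MSE criterion of Eq. (3). A generic gradient-based optimizer stands in for the BP/RPROP variants cited in the text.

```python
import torch
import torch.nn as nn

class AutoassociativeNet(nn.Module):
    """Five-layer bottle-neck network: input -> mapping -> bottle-neck -> demapping -> output."""
    def __init__(self, n_vars, n_hidden, n_bottleneck):
        super().__init__()
        # G: nonlinear mapping of the first three layers (input, mapping, bottle-neck)
        self.G = nn.Sequential(nn.Linear(n_vars, n_hidden), nn.Sigmoid(),
                               nn.Linear(n_hidden, n_bottleneck))
        # H: demapping of the remaining two layers back to the original dimensionality
        self.H = nn.Sequential(nn.Linear(n_bottleneck, n_hidden), nn.Sigmoid(),
                               nn.Linear(n_hidden, n_vars))

    def forward(self, x):
        t = self.G(x)          # nonlinear PC scores T
        return self.H(t)       # reconstruction X_hat

def train_nlpca(net, X, epochs=1000, lr=1e-3):
    """Train the network to minimize the MSE of Eq. (3)."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)   # stand-in for BP/RPROP BP
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.mean((net(X) - X) ** 2)          # mean squared reconstruction error
        loss.backward()
        opt.step()
    return net
```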
1.2 Robust nonlinear PCA
Data collected from industrial processes is inevitably
contaminated by intentional changes in the process
operations, unintentional process disturbances, and
noise from both the process and the measurements. It
has been demonstrated that the MSE method in Eq. (3)
makes implicit assumptions about the error such as
normality and independence[9]
, and thus overestimates
the influence of outliers, which usually possess very
large errors.
One reasonable remedy is to keep the influence of
most of the normal data unchanged while simultaneously suppressing the impact of noise and gross errors.
Referring to the work of statisticians, Liano[9]
introduced the mean log squared error (MLSE)
function,
$$E = \frac{1}{n}\sum_{i=1}^{n} \log\!\left(1 + \frac{1}{2}\|e_i\|^2\right) \tag{4}$$
as a substitute for the MSE to make the model robust
against potential gross errors in the original data set.
As a special case, the objective function in Eq.(3)
can be replaced by Eq. (4) to give a robust NLPCA.
This approach is compatible with various existing
gradient-based weight updating methods such as BP[7]
and RPROP BP[8]. However, since the Levenberg-Marquardt BP method[10] explicitly assumes that the network uses MSE as the performance function, it cannot be adapted directly to the robust approach.
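As a sketch of how the substitution works in practice (assumed names, not the authors' code), the MLSE of Eq. (4) can replace the MSE criterion in the training loop above:

```python
import torch

def mlse_loss(x_hat, x):
    """Mean log squared error of Eq. (4): robust to gross errors and outliers."""
    sq_err = torch.sum((x_hat - x) ** 2, dim=1)       # ||e_i||^2 per sample
    return torch.mean(torch.log(1.0 + 0.5 * sq_err))  # large errors are damped by the log

# Usage sketch: inside train_nlpca, replace the MSE line with
#   loss = mlse_loss(net(X), X)
# Any gradient-based update (BP, RPROP BP) still applies; Levenberg-Marquardt does not,
# because it assumes an MSE performance function.
```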
However, for cases where the historical data used for modelling are all for normal operating conditions (NOC), Kramer[4] proposed another interesting robust nonlinear PCA to detect gross errors, compensate for missing values, and estimate the desired value in one pass. This method augments the original training data set $X$, containing $n$ samples of $p$ variables, with artificial examples with inputs $Z_j = X + \delta I_j^{\mathrm{T}}$ (the $j$-th sensor corrupted by an offset $\delta$), $j = 1, \dots, p$, where $I_j$ is the $j$-th column of the identity matrix, with the corresponding unchanged outputs $X$. If the number of random corruptions of each sensor is $c$, the augmented training data set will then contain as many as $n + npc$ samples, which renders the training rather time-consuming. Additionally, the constraint that all data used for modelling are for NOC is rather demanding.
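The following NumPy sketch is one interpretation of this augmentation scheme (not Kramer's code; the corruption magnitude is an assumption) and makes the $n + npc$ sample count explicit:

```python
import numpy as np

def augment_for_robust_training(X, c, delta_scale=3.0, rng=None):
    """Augment X (n x p) with c corrupted copies per sensor; targets stay uncorrupted."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    inputs, targets = [X], [X]
    for j in range(p):                                       # corrupt one sensor at a time
        for _ in range(c):
            Z = X.copy()
            Z[:, j] += delta_scale * rng.standard_normal(n)  # gross error on sensor j
            inputs.append(Z)
            targets.append(X)                                # desired output is the clean data
    return np.vstack(inputs), np.vstack(targets)

# With n = 2000, p = 18, c = 6 this yields 2000 + 2000*18*6 = 218 000 training samples.
```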
2 Multivariate Statistical Process Monitoring Scheme
With increasing numbers of variables being measured,
multivariate statistical process monitoring (MSPM) is
becoming more and more necessary. The principle
behind MSPM is that a process subject to only natural
variability or NOC in the process parameters will
remain in a state of statistical control unless a special
event occurs[1]
.
As powerful statistical analysis tools to achieve dimensionality reduction and noise suppression, PCA and PLS are widely used to construct multivariate control charts such as Q- and $T^2$-statistic charts. This analysis focuses on the Q-statistic, which effectively characterizes the deviation of a specific sample $x \in \mathbf{R}^{p \times 1}$ from the PC model. After a PC model is obtained based on historical data with most data for NOC, the Q-statistic is then calculated for every data point $x$ as the squared prediction error: $Q = \|e\|^2 = \|x - \hat{x}\|^2$. Analogous to the linear PCA case, this definition can be extended to a robust NLPCA by letting $\hat{x}$ be the output of the autoassociative network in Fig. 1 corresponding to the input $x$.
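A minimal sketch of the Q-statistic computation from a fitted (robust) NLPCA model, continuing the illustrative PyTorch code above (assumed names):

```python
import torch

def q_statistic(net, X):
    """Squared prediction error Q_i = ||x_i - x_hat_i||^2 for every sample in X (n x p)."""
    with torch.no_grad():
        X_hat = net(X)                       # reconstruction from the (robust) NLPCA model
    return torch.sum((X - X_hat) ** 2, dim=1)
```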
Another important issue related to Q-statistic charts is the determination of the confidence bounds. Box[11] showed that the Q-statistic distribution can be well approximated by

$$Q \sim g\chi^2_h,$$

where the weight $g$ and the degree of freedom $h$ can be estimated by matching the moments of the mean ($m$) and variance ($v$) of the cumulants[12]:

$$g = \frac{v}{2m} \quad \text{and} \quad h = \frac{2m^2}{v}.$$

The resulting upper control limit (UCL) can thus be calculated as

$$\mathrm{UCL}_Q = \frac{v}{2m}\,\chi^2_{1-\alpha}\!\left(\frac{2m^2}{v}\right),$$

where $\alpha$ is the predefined level of significance. In this paper, $\alpha = 0.05$ and $\alpha = 0.02$ are chosen a priori as the warning and action limits.
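A sketch of the corresponding control-limit calculation under these moment-matching formulas (SciPy assumed; not from the paper):

```python
import numpy as np
from scipy.stats import chi2

def q_control_limit(q_noc, alpha):
    """UCL_Q = (v / 2m) * chi2_{1-alpha}(2m^2 / v), from the mean m and variance v of NOC Q values."""
    m, v = np.mean(q_noc), np.var(q_noc)
    g, h = v / (2.0 * m), 2.0 * m ** 2 / v      # weight and degrees of freedom
    return g * chi2.ppf(1.0 - alpha, h)

# Warning and action limits as chosen in the paper:
# warn, action = q_control_limit(q_noc, 0.05), q_control_limit(q_noc, 0.02)
```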
3 Application Results
The robust NLPCA can be applied to the same
problems as the NLPCA. For data sets with possible
gross errors, the robust method is expected to yield
better results. Here we show how the proposed
approach can be applied to the monitoring of the
fluidized catalytic cracking unit (FCCU) shown in Fig.
2 (for simplicity, only part of the plant is illustrated).
The FCCU plays a very important role in oil refining
to convert heavy fraction oil to gasoline, C3-C4 cuts,
and petrochemicals. The plant shown in Fig. 2 differs
from a classical FCCU by possessing two-stage
regenerators. The feedstock is injected at the bottom of
the riser. At this point, the vaporized feedstock droplets
mix with the catalyst from the second-stage
regenerator and the reaction commences. This creates a
high gas velocity, which entrains catalyst particles
upwards so that reactions take place in the riser. The
catalyst containing coke at the riser exit is separated
from the gases in cyclones. The catalyst is then flushed
with steam in the stripper to minimize the hydrocarbon
entrainment to the first regenerator. The coke on the
catalyst is burned off by air in the first and second
regenerators. Fresh catalyst is added in the second
regenerator at regular pre-defined intervals. The coke
combustion provides the thermal energy to vaporize
the feedstock and to compensate for the endothermic
reaction. This energy is transported by the hot catalyst.
Fig. 2 Simplified schematic of fluidized catalytic cracking unit
In total there were 18 nonlinearly correlated
variables in the data set. The sampling interval was
1 min. The historical data set containing 4000 samples
was segmented into a training data set and a validation
data set of equal lengths. The NLPCA and robust
NLPCA methods were then used to model the process.
The number of neurons in the bottle-neck layer was
determined through cross-validation. The network was
trained by the modified RPROP BP[8]
approach with
the maximum number of epochs assigned to $10^6$. The final
network architecture for the NLPCA was
18-37-1-37-18. For ease of comparison the same
structure was used with the robust NLPCA method
with the same number of maximum epochs.
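Using the illustrative AutoassociativeNet sketched in Section 1.1, the reported 18-37-1-37-18 architecture would be configured as follows (a usage sketch only; the data-loading names are assumptions):

```python
import torch

# Continuing the illustrative sketch from Section 1 (AutoassociativeNet, train_nlpca, mlse_loss):
# 18 process variables, 37 mapping/demapping neurons, 1 bottle-neck neuron -> 18-37-1-37-18
net = AutoassociativeNet(n_vars=18, n_hidden=37, n_bottleneck=1)

# X_train is assumed to hold the 2000 scaled training samples as a (2000, 18) tensor.
# NLPCA variant:   train_nlpca(net, X_train)
# Robust variant:  minimize mlse_loss(net(X_train), X_train) in the same optimizer loop.
```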
The methods used 2000 samples each for training
and validation. This is in sharp contrast with
Kramer’s[4]
algorithm which, as pointed out at the end
of Section 1.2, requires up to $2000 + 2000 \times 18 \times 6 = 218\,000$ samples for training and validation, even if the number of random corruptions of each sensor, $c$, is set as small as 6. Furthermore, it is too demanding to assume that the training data collected from an industrial plant are all for normal operating conditions, which renders Kramer's[4] approach more or less impractical.
Generally, frequent false alarms occur in practice, which may be attributed mainly to the fact that the
developed model is too sensitive to noise and gross
errors. As shown in Fig. 3, when the model obtained
using the NLPCA is utilized for real-time monitoring
of the process, false alarms are frequently triggered
while real alarms arising from process malfunctions
are not recognized. This sensitivity makes it difficult if
not impossible to identify real process abnormalities
efficiently. In contrast, as shown in Fig. 4, the model
produced by the robust NLPCA provides better
analysis of the data set and better predictions,
accompanied by fewer errors. As a consequence, this
methodology effectively reduces the number of false
alarms. Therefore, the proposed approach behaves
better in the presence of noise and gross errors.
Fig. 3 Q chart with NLPCA
Fig. 4 Q chart with robust NLPCA
4 Conclusions
Although nonlinear PCA plays an important role in the
monitoring of industrial processes, it is sensitive to
noise and gross errors in the historical data. This
sensitivity usually results in a model differing
significantly from the underlying system, which in effect severely degrades its practicality. This paper
describes a robust NLPCA which replaces the MSE
error criterion with an MLSE criterion. The
corresponding training algorithm and the incorporation
within an existing MSPM framework are illustrated.
The reliability and efficiency of the new method are
demonstrated by monitoring an industrial FCC plant.
References
[1] Martin E B, Morris A J, Zhang J. Process performance
monitoring using multivariate statistical process control.
IEE Proceedings on Control Theory and Applications,
1996, 143: 132-144.
[2] Qin S J. Statistical process monitoring: Basics and beyond.
Journal of Chemometrics, 2003, 17: 480-502.
[3] Kramer M A. Nonlinear principal component analysis
using autoassociative neural networks. AIChE Journal,
1991, 37(2): 233-243.
[4] Kramer M A. Autoassociative neural networks. Computers & Chemical Engineering, 1992, 16(4): 313-328.
[5] Jackson J E. A User’s Guide to Principal Components.
New York: John Wiley & Sons, 1991.
[6] Sanger T D. Optimal unsupervised learning in a
single-layer linear feedforward neural network. Neural
Networks, 1989, 2: 459-473.
[7] Rumelhart D E, Hinton G E, Williams R J. Learning
internal representations by error propagation. In:
Rumelhart D E, McClelland J L, eds. Parallel Distributed
Processing, Vol.1. Cambridge, MA: MIT Press, 1986.
[8] Riedmiller M, Braun H. A direct adaptive method for
faster backpropagation learning: The RPROP algorithm. In:
Ruspini H, ed. Proceedings of the IEEE Int. Conf. on NN
(ICNN). San Francisco, 1993: 586-591.
[9] Liano K. Robust error measure for supervised neural
network learning with outliers. IEEE Transactions on
Neural Networks, 1996, 7(1): 246-250.
[10] Hagan M T, Menhaj M B. Training feedforward networks
with the Marquardt algorithm. IEEE Transactions on
Neural Networks, 1994, 5(6): 989-993.
[11] Box G E P. Some theorems on quadratic forms applied in
the study of analysis of variance problems (I): Effect of
inequality of variance in the one-way classification. Annals of Mathematical Statistics, 1954, 25: 290-302.
[12] Nomikos P, MacGregor J F. Multivariate SPC charts for
monitoring batch processes. Technometrics, 1995, 37(1):
41-59.