Tsinghua Science and Technology, ISSN 1007-0214, 11/20, pp. 582-586, Volume 10, Number 5, October 2005
Multivariate Statistical Process Monitoring Using Robust
Nonlinear Principal Component Analysis*
ZHAO Shijian, XU Yongmao**
Department of Automation, Tsinghua University, Beijing 100084, China
Abstract: The principal component analysis (PCA) algorithm is widely applied in a diverse range of fields for
performance assessment, fault detection, and diagnosis. However, in the presence of noise and gross errors,
the nonlinear PCA (NLPCA) using autoassociative bottle-neck neural networks is so sensitive that the
obtained model differs significantly from the underlying system. In this paper, a robust version of NLPCA is
introduced by replacing the generally used error criterion mean squared error with a mean log squared error.
This is followed by a concise analysis of the corresponding training method. A novel multivariate statistical
process monitoring (MSPM) scheme incorporating the proposed robust NLPCA technique is then
investigated and its efficiency is assessed through application to an industrial fluidized catalytic cracking
plant. The results demonstrate that, compared with NLPCA, the proposed approach can effectively reduce
the number of false alarms and is, hence, expected to better monitor real-world processes.
Key words: robust nonlinear principal component analysis; autoassociative networks; multivariate statistical
process monitoring (MSPM); fluidized catalytic cracking unit (FCCU)
Introduction
Multivariate statistical process monitoring (MSPM)
has achieved great success in the past decade, mainly
due to the adoption of powerful chemometric
techniques such as principal component analysis (PCA) and projection to latent structures (PLS)[1,2].
PCA has been developed and successfully applied to
deal with data sets from industrial processes with a
large number of highly correlated variables. However,
these data sets are usually also contaminated by noise
and gross errors. The basic concept behind PCA is to
project the data set onto a lower dimensionality
subspace. Noise is suppressed to a certain extent by discarding the latent variables that mainly reflect the influence of random noise.
Unfortunately, although PCA is good for linear or
almost linear problems, it fails to deal well with the
significant intrinsic nonlinearity associated with
real-world processes. Hence, nonlinear extensions of
PCA have been investigated by different researchers.
One of the preferred methods, owing to its simplicity, is that proposed by Kramer[3], which employs a five-layer autoassociative network with a bottle-neck layer to achieve the dimensionality reduction.
Data collected from industrial processes are
normally contaminated by noise and gross errors,
which severely affects the model accuracy. Kramer[4]
proposed a robust version of NLPCA and illustrated
that it could be utilized to detect gross errors,
compensate for missing values, and estimate the
desired value in one pass. However, the model
demands that the data should all be for the normal
operating condition (NOC). Furthermore, massive
samples are required for network training, which
makes the method quite time-consuming. In this paper, the error criterion is analyzed and modified to give a robust NLPCA method, which is applied to the monitoring of a fluidized catalytic cracking (FCC) plant. The results demonstrate that, compared with NLPCA, the new technique is more effective and can produce a more accurate representation of the process.

Received: 2004-02-26; revised: 2004-05-11
* Supported by the National High-Tech Research and Development (863) Program of China (No. 2001AA413320)
** To whom correspondence should be addressed. E-mail: [email protected]; Tel: 86-10-62785845
1 Robust Nonlinear PCA Based on Autoassociative Neural Network
1.1 Nonlinear PCA (NLPCA)
Consider a data set $X = [x_1 \; \cdots \; x_n]^{\mathrm{T}} \in \mathbf{R}^{n \times p}$, where $n$ is the number of samples being observed and $p$ is the number of process variables. PCA aims to find a projection axis $w_1 \in \mathbf{R}^{p \times 1}$ which represents the direction with maximum variability to provide a simpler and more efficient representation of the original data set[5]:

$$\max_{w} \; \mathrm{var}(Xw) \quad \text{subject to} \quad \|w\|^2 = 1 \tag{1}$$

This objective function is equivalent to the minimum reconstruction error criterion[6]:

$$\min_{w} \; J = \frac{1}{n}\sum_{i=1}^{n} \left\| x_i - w w^{\mathrm{T}} x_i \right\|^2 \tag{2}$$

After the first principal component (PC) $w_1$ is obtained, $X$ is deflated as $X \leftarrow X - X w_1 w_1^{\mathrm{T}}$, and the deflated $X$ is then used to retrieve the remaining PCs in an analogous manner. This process continues until a convergence index is satisfied.
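To make the deflation procedure concrete, the following minimal NumPy sketch (illustrative only, not from the paper; all names are assumptions) extracts successive PCs by taking the direction of maximum variance of Eq. (1) and deflating the data matrix as described above.

```python
import numpy as np

def pca_by_deflation(X, n_components):
    """Extract leading PCs one at a time: maximize captured variance, then deflate X."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)              # center the data set
    W = []
    for _ in range(n_components):
        cov = X.T @ X / (len(X) - 1)    # covariance of the (deflated) data
        _, eigvecs = np.linalg.eigh(cov)
        w = eigvecs[:, -1]              # unit-norm direction of maximum variance, ||w||^2 = 1
        W.append(w)
        X = X - np.outer(X @ w, w)      # deflation: X <- X - X w w^T
    return np.column_stack(W)           # columns are the projection axes w_1, w_2, ...
```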
This standard approach has been extended by several researchers to cope with the intrinsic nonlinearity in the data collected from industrial processes. Kramer's approach[3] employed an autoassociative neural network with a bottle-neck layer, as shown in Fig. 1. Kramer defined the output of the bottle-neck layer, $T = G(X) \in \mathbf{R}^{n \times f}$ $(f < p)$, as the nonlinear PC score matrix, where $G$ is a nonlinear vector function assembling the nonlinear mapping of the first three layers. $T$ is then transformed inversely to the original dimensionality as $\hat{X} = H(T) \in \mathbf{R}^{n \times p}$, where $H$ is another nonlinear function representing the effect of the remaining two layers. This network is trained to match the reconstructed outputs $\hat{X} = [\hat{x}_1 \; \cdots \; \hat{x}_p]$ with $X$ as closely as possible, or, more traditionally, to minimize the mean squared error (MSE):

$$E = \frac{1}{n}\sum_{i=1}^{n} \|e_i\|^2 \tag{3}$$

where $e_i = x_i - \hat{x}_i$ is the reconstruction error. The training can be implemented using a back-propagation (BP)[7] algorithm or its variants, such as RPROP BP[8].

Fig. 1 NLPCA using an autoassociative network
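For concreteness, here is a minimal sketch (not the authors' implementation; PyTorch and all names are assumptions) of such a five-layer autoassociative bottle-neck network trained with the MSE criterion of Eq. (3). A generic gradient-based optimizer stands in for the BP/RPROP variants cited in the text.

```python
import torch
import torch.nn as nn

class AutoassociativeNet(nn.Module):
    """Five-layer bottle-neck network: input -> mapping -> bottle-neck -> demapping -> output."""
    def __init__(self, n_vars, n_hidden, n_bottleneck):
        super().__init__()
        # G: nonlinear mapping of the first three layers (input, mapping, bottle-neck)
        self.G = nn.Sequential(nn.Linear(n_vars, n_hidden), nn.Sigmoid(),
                               nn.Linear(n_hidden, n_bottleneck))
        # H: demapping of the remaining two layers back to the original dimensionality
        self.H = nn.Sequential(nn.Linear(n_bottleneck, n_hidden), nn.Sigmoid(),
                               nn.Linear(n_hidden, n_vars))

    def forward(self, x):
        t = self.G(x)          # nonlinear PC scores T
        return self.H(t)       # reconstruction X_hat

def train_nlpca(net, X, epochs=1000, lr=1e-3):
    """Train the network to minimize the MSE of Eq. (3)."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)   # stand-in for BP/RPROP BP
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.mean((net(X) - X) ** 2)          # mean squared reconstruction error
        loss.backward()
        opt.step()
    return net
```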
1.2 Robust nonlinear PCA
Data collected from industrial processes is inevitably
contaminated by intentional changes in the process
operations, unintentional process disturbances, and
noise from both the process and the measurements. It
has been demonstrated that the MSE method in Eq. (3)
makes implicit assumptions about the error such as
normality and independence[9]
, and thus overestimates
the influence of outliers, which usually possess very
large errors.
One reasonable remedy is to keep the influence of
most of the normal data unchanged while simultaneously suppressing the impact of noise and gross errors.
Referring to the work of statisticians, Liano[9]
introduced the mean log squared error (MLSE)
function,
$$E = \frac{1}{n}\sum_{i=1}^{n} \log\!\left(1 + \frac{1}{2}\|e_i\|^2\right) \tag{4}$$
as a substitute for the MSE to make the model robust
against potential gross errors in the original data set.
As a special case, the objective function in Eq.(3)
can be replaced by Eq. (4) to give a robust NLPCA.
This approach is compatible with various existing
gradient-based weight updating methods such as BP[7]
and RPROP BP[8]. However, since the Levenberg-Marquardt BP method[10] explicitly assumes that the network uses MSE as the performance function, it cannot be adapted directly to the robust approach.
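As a sketch of how the substitution works in practice (assumed names, not the authors' code), the MLSE of Eq. (4) can replace the MSE criterion in the training loop above:

```python
import torch

def mlse_loss(x_hat, x):
    """Mean log squared error of Eq. (4): robust to gross errors and outliers."""
    sq_err = torch.sum((x_hat - x) ** 2, dim=1)       # ||e_i||^2 per sample
    return torch.mean(torch.log(1.0 + 0.5 * sq_err))  # large errors are damped by the log

# Usage sketch: inside train_nlpca, replace the MSE line with
#   loss = mlse_loss(net(X), X)
# Any gradient-based update (BP, RPROP BP) still applies; Levenberg-Marquardt does not,
# because it assumes an MSE performance function.
```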
However, for cases where the historical data used for modelling are all for normal operating conditions (NOC), Kramer[4] proposed another interesting robust nonlinear PCA to detect gross errors, compensate for missing values, and estimate the desired value in one pass. This method augments the original training data set $X$, containing $n$ samples of $p$ variables, with artificial examples with inputs $Z_j = X + \delta I_j^{\mathrm{T}}$ (the $j$-th sensor corrupted by an offset $\delta$), $j = 1, \dots, p$, where $I_j$ is the $j$-th column of the identity matrix, with the corresponding unchanged outputs $X$. If the number of random corruptions of each sensor is $c$, the augmented training data set will then contain as many as $n + npc$ samples, which renders the training rather time-consuming. Additionally, the constraint that all data used for modelling are for NOC is rather demanding.
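The following NumPy sketch is one interpretation of this augmentation scheme (not Kramer's code; the corruption magnitude is an assumption) and makes the $n + npc$ sample count explicit:

```python
import numpy as np

def augment_for_robust_training(X, c, delta_scale=3.0, rng=None):
    """Augment X (n x p) with c corrupted copies per sensor; targets stay uncorrupted."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    inputs, targets = [X], [X]
    for j in range(p):                                       # corrupt one sensor at a time
        for _ in range(c):
            Z = X.copy()
            Z[:, j] += delta_scale * rng.standard_normal(n)  # gross error on sensor j
            inputs.append(Z)
            targets.append(X)                                # desired output is the clean data
    return np.vstack(inputs), np.vstack(targets)

# With n = 2000, p = 18, c = 6 this yields 2000 + 2000*18*6 = 218 000 training samples.
```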
2 Multivariate Statistical Process Monitoring Scheme
With increasing numbers of variables being measured,
multivariate statistical process monitoring (MSPM) is
becoming more and more necessary. The principle
behind MSPM is that a process subject to only natural
variability or NOC in the process parameters will
remain in a state of statistical control unless a special
event occurs[1]
.
As powerful statistical analysis tools to achieve dimensionality reduction and noise suppression, PCA and PLS are widely used to construct multivariate control charts such as Q- and $T^2$-statistic charts. This analysis focuses on the Q-statistic, which effectively characterizes the deviation of a specific sample $x \in \mathbf{R}^{p \times 1}$ from the PC model. After a PC model is obtained based on historical data with most data for NOC, the Q-statistic is then calculated for every data point $x$ as the squared prediction error: $Q = \|e\|^2 = \|x - \hat{x}\|^2$. Analogous to the linear PCA case, this definition can be extended to a robust NLPCA by letting $\hat{x}$ be the output of the autoassociative network in Fig. 1 corresponding to the input $x$.
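A minimal sketch of the Q-statistic computation from a fitted (robust) NLPCA model, continuing the illustrative PyTorch code above (assumed names):

```python
import torch

def q_statistic(net, X):
    """Squared prediction error Q_i = ||x_i - x_hat_i||^2 for every sample in X (n x p)."""
    with torch.no_grad():
        X_hat = net(X)                       # reconstruction from the (robust) NLPCA model
    return torch.sum((X - X_hat) ** 2, dim=1)
```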
Another important issue related to Q-statistic charts is the determination of the confidence bounds. Box[11] showed that the Q-statistic distribution can be well approximated by

$$Q \sim g\chi^2_h,$$

where the weight $g$ and the degree of freedom $h$ can be estimated by matching the moments of the mean ($m$) and variance ($v$) of the cumulants[12]:

$$g = \frac{v}{2m} \quad \text{and} \quad h = \frac{2m^2}{v}.$$

The resulting upper control limit (UCL) can thus be calculated as

$$\mathrm{UCL}_Q = \frac{v}{2m}\,\chi^2_{1-\alpha}\!\left(\frac{2m^2}{v}\right),$$

where $\alpha$ is the predefined level of significance. In this paper, $\alpha = 0.05$ and $\alpha = 0.02$ are chosen a priori as the warning and action limits.
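A sketch of the corresponding control-limit calculation under these moment-matching formulas (SciPy assumed; not from the paper):

```python
import numpy as np
from scipy.stats import chi2

def q_control_limit(q_noc, alpha):
    """UCL_Q = (v / 2m) * chi2_{1-alpha}(2m^2 / v), from the mean m and variance v of NOC Q values."""
    m, v = np.mean(q_noc), np.var(q_noc)
    g, h = v / (2.0 * m), 2.0 * m ** 2 / v      # weight and degrees of freedom
    return g * chi2.ppf(1.0 - alpha, h)

# Warning and action limits as chosen in the paper:
# warn, action = q_control_limit(q_noc, 0.05), q_control_limit(q_noc, 0.02)
```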
3 Application Results
The robust NLPCA can be applied to the same
problems as the NLPCA. For data sets with possible
gross errors, the robust method is expected to yield
better results. Here we show how the proposed
approach can be applied to the monitoring of the
fluidized catalytic cracking unit (FCCU) shown in Fig.
2 (for simplicity, only part of the plant is illustrated).
The FCCU plays a very important role in oil refining
to convert heavy fraction oil to gasoline, C3-C4 cuts,
and petrochemicals. The plant shown in Fig. 2 differs
from a classical FCCU by possessing two-stage
regenerators. The feedstock is injected at the bottom of
the riser. At this point, the vaporized feedstock droplets
mix with the catalyst from the second-stage
regenerator and the reaction commences. This creates a
high gas velocity, which entrains catalyst particles
upwards so that reactions take place in the riser. The
catalyst containing coke at the riser exit is separated
from the gases in cyclones. The catalyst is then flushed
with steam in the stripper to minimize the hydrocarbon
entrainment to the first regenerator. The coke on the
catalyst is burned off by air in the first and second
regenerators. Fresh catalyst is added in the second
regenerator at regular pre-defined intervals. The coke
combustion provides the thermal energy to vaporize
the feedstock and to compensate for the endothermic
reaction. This energy is transported by the hot catalyst.
Fig. 2 Simplified schematic of fluidized catalytic cracking unit
In total there were 18 nonlinearly correlated
variables in the data set. The sampling interval was
1 min. The historical data set containing 4000 samples
was segmented into a training data set and a validation
data set of equal lengths. The NLPCA and robust
NLPCA methods were then used to model the process.
The number of neurons in the bottle-neck layer was
determined through cross-validation. The network was
trained by the modified RPROP BP[8]
approach with
the maximum number of epochs assigned to $10^6$. The final
network architecture for the NLPCA was
18-37-1-37-18. For ease of comparison the same
structure was used with the robust NLPCA method
with the same number of maximum epochs.
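Using the illustrative AutoassociativeNet sketched in Section 1.1, the reported 18-37-1-37-18 architecture would be configured as follows (a usage sketch only; the data-loading names are assumptions):

```python
import torch

# Continuing the illustrative sketch from Section 1 (AutoassociativeNet, train_nlpca, mlse_loss):
# 18 process variables, 37 mapping/demapping neurons, 1 bottle-neck neuron -> 18-37-1-37-18
net = AutoassociativeNet(n_vars=18, n_hidden=37, n_bottleneck=1)

# X_train is assumed to hold the 2000 scaled training samples as a (2000, 18) tensor.
# NLPCA variant:   train_nlpca(net, X_train)
# Robust variant:  minimize mlse_loss(net(X_train), X_train) in the same optimizer loop.
```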
The methods used 2000 samples each for training
and validation. This is in sharp contrast with
Kramer’s[4]
algorithm which, as pointed out at the end
of Section 1.2, requires up to $2000 + 2000 \times 18 \times 6 = 218\,000$ samples for training and validation, even if the number of random corruptions of each sensor, $c$, is set as small as 6. Furthermore, it is too demanding to assume that the training data collected from an industrial plant are all for normal operating conditions, which renders Kramer's[4] approach more or less impractical.
Generally, frequent false alarms occur in practice, which may be attributed mainly to the fact that the
developed model is too sensitive to noise and gross
errors. As shown in Fig. 3, when the model obtained
using the NLPCA is utilized for real-time monitoring
of the process, false alarms are frequently triggered
while real alarms arising from process malfunctions
are not recognized. This sensitivity makes it difficult if
not impossible to identify real process abnormalities
efficiently. In contrast, as shown in Fig. 4, the model
produced by the robust NLPCA provides better
analysis of the data set and better predictions,
accompanied by fewer errors. As a consequence, this
methodology effectively reduces the number of false
alarms. Therefore, the proposed approach behaves
better in the presence of noise and gross errors.
Fig. 3 Q chart with NLPCA
Fig. 4 Q chart with robust NLPCA
4 Conclusions
Although nonlinear PCA plays an important role in the
monitoring of industrial processes, it is sensitive to
noise and gross errors in the historical data. This
sensitivity usually results in a model differing
significantly from the underlying system, which in effect severely degrades its practicality. This paper
describes a robust NLPCA which replaces the MSE
error criterion with an MLSE criterion. The
corresponding training algorithm and the incorporation
within an existing MSPM framework are illustrated.
The reliability and efficiency of the new method are
demonstrated by monitoring an industrial FCC plant.
References
[1] Martin E B, Morris A J, Zhang J. Process performance
monitoring using multivariate statistical process control.
IEE Proceedings on Control Theory and Applications,
1996, 143: 132-144.
[2] Qin S J. Statistical process monitoring: Basics and beyond.
Journal of Chemometrics, 2003, 17: 480-502.
[3] Kramer M A. Nonlinear principal component analysis
using autoassociative neural networks. AIChE Journal,
1991, 37(2): 233-243.
[4] Kramer M A. Autoassociative neural networks. Computers & Chemical Engineering, 1992, 16(4): 313-328.
[5] Jackson J E. A User’s Guide to Principal Components.
New York: John Wiley & Sons, 1991.
[6] Sanger T D. Optimal unsupervised learning in a
single-layer linear feedforward neural network. Neural
Networks, 1989, 2: 459-473.
[7] Rumelhart D E, Hinton G E, Williams R J. Learning
internal representations by error propagation. In:
Rumelhart D E, McClelland J L, eds. Parallel Distributed
Processing, Vol.1. Cambridge, MA: MIT Press, 1986.
[8] Riedmiller M, Braun H. A direct adaptive method for
faster backpropagation learning: The RPROP algorithm. In:
Ruspini H, ed. Proceedings of the IEEE Int. Conf. on NN
(ICNN). San Francisco, 1993: 586-591.
[9] Liano K. Robust error measure for supervised neural
network learning with outliers. IEEE Transactions on
Neural Networks, 1996, 7(1): 246-250.
[10] Hagan M T, Menhaj M B. Training feedforward networks
with the Marquardt algorithm. IEEE Transactions on
Neural Networks, 1994, 5(6): 989-993.
[11] Box G E P. Some theorems on quadratic forms applied in
the study of analysis of variance problems (I): Effect of
inequality of variance in the one-way classification. Annals of Mathematical Statistics, 1954, 25: 290-302.
[12] Nomikos P, MacGregor J F. Multivariate SPC charts for
monitoring batch processes. Technometrics, 1995, 37(1):
41-59.