Fault Detection and Diagnosis
Using Information Measures
Rudolf Kulhavý
Honeywell Technology Center &
Institute of Information Theory and Automation
Prague, Czech Republic
Outline
• Probability-based inference revisited
  Fundamentals of information geometry
• Finite-memory inference
  Minimum Relative Entropy (MRE) approximation
• Implementation
  Markov Chain Monte Carlo (MCMC) methods
• Brute-Force Alternative
  Monte Carlo Again: Weighted Bootstrap
Likelihood-based Inference
• General regression
• Model
  $s(y \mid z, \theta), \quad \theta \in T \subset R^n, \qquad z_k = z(y_{k-1}, u_k), \quad k = m+1, \ldots, N+m$
• Likelihood function
  $l_N(\theta) = q(y_{m+1}, u_{m+1}, \ldots, y_{N+m}, u_{N+m} \mid \theta) = c \prod_{k=m+1}^{N+m} s(y_k \mid z_k, \theta)$
• Posterior density (a numerical sketch follows this slide)
  $p_N(\theta) = c \, p_0(\theta) \, l_N(\theta)$
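The posterior recursion above is easy to exercise numerically. Below is a minimal sketch, not from the slides, that evaluates $p_N(\theta) = c\,p_0(\theta)\,l_N(\theta)$ on a grid for an assumed first-order Gaussian autoregression; the model, the flat prior, and all values are illustrative.

```python
import numpy as np

# Minimal sketch of p_N(theta) = c * p_0(theta) * l_N(theta) on a grid, for an
# assumed first-order autoregression y_k = theta * y_{k-1} + e_k, e_k ~ N(0, 0.1^2).
# Model, prior, and parameter values are illustrative, not from the slides.

rng = np.random.default_rng(0)
theta_true, sigma_e = 0.8, 0.1
y = np.zeros(201)
for k in range(1, 201):                        # simulate a data record
    y[k] = theta_true * y[k - 1] + sigma_e * rng.standard_normal()

yk, zk = y[1:], y[:-1]                         # regressand y_k, regressor z_k = y_{k-1}
theta = np.linspace(0.0, 1.0, 501)
dtheta = theta[1] - theta[0]

# log l_N(theta) = sum_k log s(y_k | z_k, theta) for the Gaussian model density
loglik = np.array([-0.5 * np.sum((yk - t * zk) ** 2) / sigma_e**2 for t in theta])
post = np.exp(loglik - loglik.max())           # flat prior; shift for numerical stability
post /= post.sum() * dtheta                    # normalize to a density on the grid
print("posterior mean of theta:", (theta * post).sum() * dtheta)
```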
Information-based Inference
• Empirical density
  $r_N(y, z) = \frac{1}{N} \sum_{k=1}^{N} \delta(y - y_k,\, z - z_k)$
• Conditional inaccuracy
  $K(r_N : s_\theta) = \iint r_N(y, z) \log \frac{1}{s(y \mid z, \theta)}\, dy\, dz = -\frac{1}{N} \sum_{k=m+1}^{N+m} \log s(y_k \mid z_k, \theta)$
• Likelihood
  $l_N(\theta) = c_N \exp\left(-N\, K(r_N : s_\theta)\right)$
• Posterior density
  $p_N(\theta) = c \, p_0(\theta) \, l_N(\theta)$
Conditional Inaccuracy

$K(r : s_\theta) = \iint r(y, z) \log \frac{1}{s(y \mid z, \theta)}\, dy\, dz
= \underbrace{\iint r(y, z) \log \frac{r(y \mid z)}{s(y \mid z, \theta)}\, dy\, dz}_{\text{conditional relative entropy}}
+ \underbrace{\iint r(y, z) \log \frac{1}{r(y \mid z)}\, dy\, dz}_{\text{conditional Shannon entropy}}$
Example: Random-Coefficient AR(1)
• Model
  $y_k = (\mu + v_k)\, y_{k-1} + e_k$
• Assumptions
  – $\mu$ is constant
  – $v_k$ is $N(0, \sigma_v^2)$ distributed
  – $e_k$ is $N(0, \sigma_e^2)$ distributed
  – $\theta = (\mu, \sigma_v^2, \sigma_e^2)$
• Theoretical density (a simulation sketch follows)
  $s(y \mid z, \theta) = \frac{1}{\sqrt{2\pi\, \sigma^2(z)}} \exp\left(-\frac{(y - \mu z)^2}{2 \sigma^2(z)}\right)$
  with history-dependent variance $\sigma^2(z) = \sigma_e^2 + \sigma_v^2 z^2$!
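Since the slide gives the model and its theoretical density in closed form, a short simulation sketch may help. The values $\mu = 0.8$, $\sigma_v^2 = \sigma_e^2 = 0.03$ reuse the hypotheses tested a few slides below; everything else is illustrative.

```python
import numpy as np

# Sketch: simulate y_k = (mu + v_k) * y_{k-1} + e_k and evaluate the theoretical
# density s(y|z,theta) with history-dependent variance sigma^2(z) = sigma_e^2 + sigma_v^2 z^2.
# Parameter values mirror the hypotheses tested later; the rest is illustrative.

rng = np.random.default_rng(1)
mu, sig2_v, sig2_e = 0.8, 0.03, 0.03           # theta = (mu, sigma_v^2, sigma_e^2)

N = 1000
y = np.zeros(N + 1)
for k in range(1, N + 1):
    v = np.sqrt(sig2_v) * rng.standard_normal()
    e = np.sqrt(sig2_e) * rng.standard_normal()
    y[k] = (mu + v) * y[k - 1] + e

def s(yv, z, theta=(mu, sig2_v, sig2_e)):
    """Theoretical density: normal with mean mu*z and variance sigma^2(z)."""
    m, s2v, s2e = theta
    var = s2e + s2v * z**2                     # history-dependent variance(!)
    return np.exp(-(yv - m * z) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

print("mean model density along trajectory:", s(y[1:], y[:-1]).mean())
```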
Empirical vs Theoretical Density
[Figure: scatter plot and histogram of the pairs $(z_k, y_k)$, $z_k = y_{k-1}$, over the range $(-2, 4)$, comparing the empirical density with the theoretical one.]
Testing of Various Hypotheses
[Figure: three scatter plots of $y_k$ vs. $y_{k-1}$ with theoretical density contours for the hypotheses $\theta = 0.8$, $\sigma_v^2 = 0.03$, and $\sigma_e^2 = 0.03$.]
Minimum Inaccuracy (MI)
• Unnormalized inaccuracy
  $K(r_N : s_\theta) = \iint r_N(y, z) \log \frac{1}{s(y \mid z, \theta)}\, dy\, dz = -\frac{1}{N} \log l_N(\theta) + \text{const.}$
• Minimize over the exponential envelope $S_\theta$ of tilted densities
  $s_{\theta,\lambda}(y, z) = c_{\theta,\lambda}\, s(y \mid z, \theta)\, \exp\left(\lambda' h(y, z)\right)$:
  $\min_{\lambda \in R^n} K(r_N : s_{\theta,\lambda})$
• [Figure: $r_N(y, z)$ projected onto the envelope $S_\theta$, giving $s_{\theta,\hat\lambda}(y, z)$ above $s_\theta(y \mid z)$.]
• MI coincides with Maximum Likelihood!
Minimum Relative Entropy (MRE)
• Unnormalized relative entropy
  $D(r \,\|\, s_\theta) = \iint r(y, z) \log \frac{r(y, z)}{s(y \mid z, \theta)}\, dy\, dz$
• h-compatible set
  $R_N = \left\{ r : \iint r(y, z)\, h(y, z)\, dy\, dz = h_N = \frac{1}{N} \sum_{k=1}^{N} h(y_k, z_k) \right\}$
• Minimize over the h-compatible set:
  $\min_{r \in R_N} D(r \,\|\, s_\theta)$
• [Figure: $r_N(y, z) \in R_N$ projected onto $s_\theta(y \mid z)$, giving $s_{\theta,\hat\lambda}(y, z)$.]
• MRE generalizes Maximum Entropy!
Unnormalized Relative Entropy

$D(r \,\|\, s_\theta) = \iint r(y, z) \log \frac{r(y, z)}{s(y \mid z, \theta)}\, dy\, dz
= \underbrace{\int r(z) \int r(y \mid z) \log \frac{r(y \mid z)}{s(y \mid z, \theta)}\, dy\, dz}_{\text{conditional relative entropy}}
- \underbrace{\int r(z) \log \frac{1}{r(z)}\, dz}_{\text{marginal Shannon entropy}}$
Information Geometry
• h-projection: $s_{\theta,\hat\lambda}$ matches the data statistic,
  $\iint s_{\theta,\hat\lambda}(y, z)\, h(y, z)\, dy\, dz = \iint r_N(y, z)\, h(y, z)\, dy\, dz = h_N$
• Pythagorean theorem
  $K(r_N : s_\theta) = K(r_N : s_{\theta,\hat\lambda}) + D(s_{\theta,\hat\lambda} \,\|\, s_\theta)$
• [Figure: $r_N(y, z) \in R_N$, its h-projection $s_{\theta,\hat\lambda}(y, z)$ on $S_\theta$, and $s_\theta(y \mid z)$ forming the right-angled triangle.]
Outline
• Probability-based inference revisited
  Fundamentals of information geometry
• Finite-memory inference
  Minimum Relative Entropy (MRE) approximation
• Implementation
  Markov Chain Monte Carlo (MCMC) methods
• Brute-Force Alternative
  Monte Carlo Again: Weighted Bootstrap
MRE Approximation
1. Choose $h(y, z)$ so that $K(r_N : s_{\theta,\hat\lambda}) \approx \text{const.}$ for expected values of $\theta$
2. Approximate $K(r_N : s_\theta)$ via minimum relative entropy
   $D(R_N \,\|\, s_\theta) = \min_{r \in R_N} D(r \,\|\, s_\theta)$
3. Approximate the posterior density
   $\hat p_N(\theta) = c \, p_0(\theta) \, \exp\left(-N\, D(R_N \,\|\, s_\theta)\right)$
[Figure: $r_N(y, z) \in R_N$ projected to $s_{\theta,\hat\lambda}(y, z)$ over $s_\theta(y \mid z)$ and $S_\theta$.]
MRE Algorithm
• Convex optimization problem (easy part; a one-dimensional numerical sketch follows)
  $D(R_N \,\|\, s_\theta) = \max_{\lambda \in R^n} \left[ \lambda' h_N - \psi_\theta(\lambda) \right]$
• Logarithm of the normalizing divisor (difficult part)
  $\psi_\theta(\lambda) = \log \iint s(y \mid z, \theta)\, \exp\left(\lambda' h(y, z)\right) dy\, dz$
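A minimal sketch of the dual task in one dimension, with $\psi_\theta$ computed by quadrature. The model $s(y \mid \theta) = N(\theta, 1)$ and statistic $h(y) = y$ are assumptions chosen so the answer can be checked analytically; they are not from the slides.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sketch of the dual task D(R_N || s_theta) = max_lambda [lambda * h_N - psi_theta(lambda)]
# in one dimension, with psi_theta evaluated by quadrature on a grid.

theta, h_N = 0.0, 0.7                          # h_N: empirical mean of h over the data
y = np.linspace(-10.0, 10.0, 4001)
dy = y[1] - y[0]
s_theta = np.exp(-0.5 * (y - theta) ** 2) / np.sqrt(2 * np.pi)

def psi(lam):
    """Log of the normalizing divisor -- the 'difficult part'."""
    return np.log(np.sum(s_theta * np.exp(lam * y)) * dy)

# the dual objective is concave in lambda, so minimize its negative
res = minimize_scalar(lambda lam: psi(lam) - lam * h_N)
print("lambda_hat:", res.x, " D(R_N||s_theta):", -res.fun)
# Gaussian check: psi(lam) = theta*lam + lam^2/2, hence D = (h_N - theta)^2 / 2 = 0.245
```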
Choice of Statistic
• Differencing
  $h_i(y, z) = \log s(y \mid z, \theta_{i+1}) - \log s(y \mid z, \theta_i)$
• Differentiation
  $h_i(y, z) = \omega_i' \operatorname{grad}_\theta \log s(y \mid z, \theta_i)$
• Weighted integration
  $h_i(y, z) = \int w_i(\theta) \log s(y \mid z, \theta)\, d\theta, \qquad \int w_i(\theta)\, d\theta = 0$
Two Simple Hypotheses
$h(y, z) = \log \frac{s(y \mid z, \theta_1)}{s(y \mid z, \theta_0)}$ implies the exponential envelope
$s_\lambda(y \mid z) = c_\lambda\, s(y \mid z, \theta_0)\, \exp\left(\lambda\, h(y, z)\right)$
and the statistic value
$h_N = \frac{1}{N} \log \frac{l_N(\theta_1)}{l_N(\theta_0)}$

[Figure: the exponential envelope connecting $s_{\theta_0}$ and $s_{\theta_1}$, with $r_N$ projected onto it.]
Two Composite Hypotheses
[Figure: an exponential family enveloping the composite hypotheses $H_0$ and $H_1$, with $r_N$ projected onto it at $s_{\hat\lambda}$.]
Construction of h-Statistic: Differencing
[Figure: differencing statistics $h(y, z)$ for the models $y = \theta z + e$ (Cauchy noise), $y = \theta z + v$, $y = \arctan(\theta z) + e$, and $y = \sin(\theta z) + e$; each panel shows $h$ over the $(y, z)$ plane.]
Construction of h-Statistic: Differentiation
[Figure: differentiation statistics $h(y, z)$ for the model $y = \sin(\theta z) + e$ at $\theta = 0.1$, $\theta = 0.2$, and $\theta = 0.4$.]
Example: Sensor Validation
• Monitoring of signal differences
  $e_k = y_k - y_{k-1}$
• Model = mixture of 3 normal distributions
  $(1 - \theta_f - \theta_g)\, N(0, v) + \theta_f\, N(0, 0.01\, v) + \theta_g\, N(0, 100\, v)$
  (normal operation, "frozen" sensor, gross errors)
• Unknown parameters: probabilities $\theta_f, \theta_g$
• Statistic chosen (a numerical sketch follows)
  $h_i(e) = \log \frac{s_{\theta_i}(e)}{s_{\theta_0}(e)}, \qquad \theta_0 = [0, 0],\ \theta_1 = [1, 0],\ \theta_2 = [0, 1],\ \theta_3 = [1/3, 1/3]$
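A small numerical sketch of the chosen statistics for this mixture model. The base variance v of normal operation is an assumed placeholder; the mixture and the four hypotheses follow the slide.

```python
import numpy as np
from scipy.stats import norm

# Sketch of h_i(e) = log( s_{theta_i}(e) / s_{theta_0}(e) ) for the three-component
# normal mixture of the sensor-validation example. v is an assumed placeholder.

v = 1.0                                              # assumed variance of normal operation

def mixture_pdf(e, theta_f, theta_g):
    """(1 - theta_f - theta_g) N(0, v) + theta_f N(0, 0.01 v) + theta_g N(0, 100 v)."""
    return ((1 - theta_f - theta_g) * norm.pdf(e, scale=np.sqrt(v))
            + theta_f * norm.pdf(e, scale=np.sqrt(0.01 * v))
            + theta_g * norm.pdf(e, scale=np.sqrt(100 * v)))

hypotheses = [(0, 0), (1, 0), (0, 1), (1/3, 1/3)]    # theta_0 .. theta_3

def h(e, i):
    """Differencing statistic against the null theta_0 = [0, 0]."""
    return np.log(mixture_pdf(e, *hypotheses[i]) / mixture_pdf(e, *hypotheses[0]))

e = np.diff(np.array([0.0, 0.1, 0.1, 5.0, 0.2]))     # signal differences e_k = y_k - y_{k-1}
print([h(e, i).round(2) for i in (1, 2, 3)])
```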
Signal Difference
[Figure: signal difference $e_k$ over samples $k = 0, \ldots, 500$, range $(-25, 15)$.]
Relative Entropy
[Figure: $\log D(R_N \,\|\, s_\theta)$ as a surface over the parameters $(\theta_1, \theta_2)$.]
Posterior Density
[Figure: the approximate posterior density $\hat p_N(\theta)$ as a surface over $(\theta_1, \theta_2)$.]
Outline
• Probability-based inference revisited
  Fundamentals of information geometry
• Finite-memory inference
  Minimum Relative Entropy (MRE) approximation
• Implementation
  Markov Chain Monte Carlo (MCMC) methods
• Brute-Force Alternative
  Monte Carlo Again: Weighted Bootstrap
MRE Algorithm
• Dual optimization task
  $D(R_N \,\|\, s_\theta) = \max_{\lambda \in R^n} \left[ \lambda' h_N - \log \iint s(y \mid z, \theta)\, \exp\left(\lambda' h(y, z)\right) dy\, dz \right]$
• Numerical integration necessary (a sketch of this estimate follows below)
  – sample $(y^{(1)}, z^{(1)}), \ldots, (y^{(M)}, z^{(M)})$ from $s_{\theta,\lambda}(y, z)$
  – kernel estimate $\hat s_{\theta,\lambda}(y, z)$
  – from $D(s_{\theta,\lambda} \,\|\, \hat s_{\theta,\lambda}) = \varepsilon \geq 0$ it follows
    $\psi_\theta(\lambda) \approx \frac{1}{M} \sum_{i=1}^{M} \log \frac{s(y^{(i)} \mid z^{(i)}, \theta)\, \exp\left(\lambda' h(y^{(i)}, z^{(i)})\right)}{\hat s_{\theta,\lambda}(y^{(i)}, z^{(i)})}$
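A sketch of the Monte Carlo estimate of $\psi_\theta(\lambda)$ above, for the assumed one-dimensional case $s = N(0,1)$, $h(y) = y$, where the tilted density is $N(\lambda, 1)$ and can be checked analytically. For clarity it is sampled directly rather than via a Metropolis chain.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

# Sketch: psi_theta(lambda) ~= (1/M) sum_i log[ s(x_i|theta) exp(lambda h(x_i)) / s_hat(x_i) ],
# with x_i drawn from the tilted density and s_hat a kernel estimate of it. Since
# D(s_{theta,lambda} || s_hat) >= 0, the estimate slightly undershoots psi.

rng = np.random.default_rng(2)
lam, M = 0.7, 5000
x = rng.standard_normal(M) + lam              # samples from the tilted density N(lam, 1)
s_hat = gaussian_kde(x)                       # kernel estimate of the tilted density

log_num = norm.logpdf(x) + lam * x            # log[ s(x|theta) exp(lambda * h(x)) ]
psi_est = np.mean(log_num - np.log(s_hat(x)))
print("psi estimate:", psi_est, " exact:", lam**2 / 2)   # Gaussian: psi = lambda^2 / 2
```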
MRE Implementation
[Diagram: an outer Metropolis sampler draws parameters $\theta^{(1)}, \ldots, \theta^{(M)}$ driven by $D(R_N \,\|\, s_{\theta^{(i)}})$; for each $\theta$, an inner Metropolis sampler draws $x^{(1)}, \ldots, x^{(N)}$, $x = (y, z)$, from the tilted model density $s_{\theta,\lambda}(x)$ built on the model density $s_\theta(x)$, and feeds the MRE optimization.]
Sample-based Computations
• Expectation $E_N(\theta)$
• Covariance $\operatorname{Cov}_N(\theta)$
• Probability of an event
  $P(A) = \int_A p(\theta)\, d\theta$
• Marginal density of $\theta_a$ given $\theta = (\theta_a, \theta_b)$
• Predictive density
  – direct sampling: sample $y^{(i)}$ from $s(y \mid z, \theta^{(i)})$
  – Rao-Blackwellized estimate (see the sketch below)
    $\hat s_N(y \mid z) = \frac{1}{M} \sum_{i=1}^{M} s(y \mid z, \theta^{(i)})$
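A minimal sketch of the Rao-Blackwellized predictive density, which averages the model density over posterior samples. The model $s(y \mid z, \theta) = N(\theta z, 0.1^2)$ and the Gaussian cloud of posterior draws are assumptions for illustration.

```python
import numpy as np

# Sketch of s_hat_N(y|z) = (1/M) sum_i s(y|z, theta^(i)): the predictive density as an
# average of model densities over posterior samples (Rao-Blackwellized estimate).

rng = np.random.default_rng(3)
theta_samples = 0.8 + 0.05 * rng.standard_normal(1000)   # assumed posterior draws

def s(y, z, theta, sigma=0.1):
    """Assumed model density s(y|z,theta): normal with mean theta*z."""
    return np.exp(-(y - theta * z) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

def predictive(y, z):
    """Rao-Blackwellized estimate: average the model density over the theta samples."""
    return np.mean(s(y, z, theta_samples))

print(predictive(0.75, 1.0))
```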
Metropolis Sampler I.
• Sample $x^*$ from the proposal density $\pi(x)$.
• Accept $x^{(i+1)} = x^*$ w.p. $\alpha$,
  $\alpha = \min \left\{ \frac{p(x^*)/\pi(x^*)}{p(x^{(i)})/\pi(x^{(i)})},\, 1 \right\}$
[Figure: proposal density $\pi(x)$ against the target density $p(x)$.]
Metropolis Sampler II.
• Random walk: $x^* = x^{(i)} + n$ (see the sketch below).
• Accept $x^{(i+1)} = x^*$ w.p. $\alpha$,
  $\alpha = \min \left\{ \frac{p(x^*)}{p(x^{(i)})},\, 1 \right\}$
[Figure: target density $p(x)$ over $(0, 10)$.]
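A minimal random-walk Metropolis sampler in the spirit of variant II above. The bimodal target $p(x)$ is an illustrative stand-in for a posterior density; step size and chain length are arbitrary.

```python
import numpy as np

# Random-walk Metropolis: propose x* = x + n with Gaussian noise n, accept
# with probability alpha = min{ p(x*)/p(x), 1 }; p need only be known up to a constant.

rng = np.random.default_rng(4)

def p(x):                                      # unnormalized target density (assumed)
    return np.exp(-0.5 * (x - 2) ** 2) + np.exp(-0.5 * (x + 2) ** 2)

def metropolis(n_steps=10000, step=1.0, x0=0.0):
    x, chain = x0, []
    for _ in range(n_steps):
        x_star = x + step * rng.standard_normal()          # random-walk proposal
        if rng.uniform() < min(p(x_star) / p(x), 1.0):     # accept w.p. alpha
            x = x_star
        chain.append(x)                                    # keep current state either way
    return np.array(chain)

samples = metropolis()
print("mean:", samples.mean(), " std:", samples.std())
```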
Example: Metropolis Sampling
[Figure: scatter plot and histogram of Metropolis samples of $(\theta_1, \theta_2)$ over the unit square.]
Outline
• Probability-based inference revisited
  Fundamentals of information geometry
• Finite-memory inference
  Minimum Relative Entropy (MRE) approximation
• Implementation
  Markov Chain Monte Carlo (MCMC) methods
• Brute-Force Alternative
  Monte Carlo Again: Weighted Bootstrap
Weighted Bootstrap Filtering
• Model
  $x_k = f(x_{k-1}, w_{k-1}), \qquad y_k = g(x_k, v_k)$
• Time update
  $x_k^{(i)} = f\left(x_{k-1}^{(i)}, w_{k-1}^{(i)}\right), \quad i = 1, \ldots, M$
• Data update (see the sketch after this list)
  – calculate normalized weights
    $w_i = \frac{p\left(y_k \mid x_k^{(i)}\right)}{\sum_{j=1}^{M} p\left(y_k \mid x_k^{(j)}\right)}$
  – resample M-times from the discrete distribution over $\{x_k^{(i)} : i = 1, \ldots, M\}$ with probability mass $w_i$ associated with element $i$
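A sketch of the weighted bootstrap filter above for a generic state-space model. The toy dynamics $f$, the implicit measurement model $y_k = x_k + v_k$, and the noise levels are assumptions, not from the slides.

```python
import numpy as np

# Weighted bootstrap filter: time update propagates particles through the dynamics,
# data update weights them by the likelihood p(y_k | x_k^(i)) and resamples M times.

rng = np.random.default_rng(5)
M = 1000                                                   # number of particles

def f(x, w):                                               # assumed process dynamics
    return 0.9 * x + w

def time_update(particles):
    """Propagate each particle with fresh process noise w ~ N(0, 0.1^2)."""
    return f(particles, 0.1 * rng.standard_normal(M))

def data_update(particles, y, sigma_v=0.2):
    """Weight by p(y_k | x_k^(i)) for y_k = x_k + v_k, v ~ N(0, sigma_v^2); resample."""
    w = np.exp(-0.5 * (y - particles) ** 2 / sigma_v**2)   # unnormalized weights
    w /= w.sum()                                           # normalized weights w_i
    idx = rng.choice(M, size=M, p=w)                       # resample with mass w_i
    return particles[idx]

particles = rng.standard_normal(M)                         # initial particle cloud
for y_k in [0.5, 0.4, 0.6]:                                # a few measurements
    particles = data_update(time_update(particles), y_k)
print("filtered state estimate:", particles.mean())
```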
Stochastic Simulation
[Diagram: the current (filtered) state passes through the Model of Process Dynamics to give the new (predicted) state; the Model of Sensors turns it into a predicted sensor response, which is compared with the measured data; RESAMPLING closes the loop.]
Example: Nonisothermal CSTR
[Schematic: CSTR with feed $c_{Af}, T_f$, flow $F$, volume $V$, coolant flow $Q_c$, and output $c_A, T$.]

CSTR model (a simulation sketch with placeholder parameters follows)
$\frac{dc_A}{dt} = \frac{1}{\theta}\left(c_{Af} - c_A\right) - k(T)\, c_A$
$\frac{dT}{dt} = \frac{1}{\theta}\left(T_f - T\right) + \beta\, k(T)\, c_A - \chi$

Reaction rate (Arrhenius relation)
$k(T) = k_0 \exp\left(-E/(RT)\right)$

Ref: Seborg, Edgar, Mellichamp (1989), Exercise 5.21
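A forward-Euler sketch of the two state equations above. All parameter values (residence time, heat-of-reaction coefficient, Arrhenius constants, feed conditions, cooling term) are placeholders, not the values of Seborg, Edgar, Mellichamp (1989), Exercise 5.21.

```python
import numpy as np

# Sketch: integrate the CSTR state equations by forward Euler.
# Every numerical value below is an assumed placeholder.

theta_res, beta = 1.0, 0.5          # residence time and heat-of-reaction coefficient
k0, E_over_R = 1e3, 5e3             # Arrhenius parameters (assumed)
cAf, Tf, chi = 0.8, 600.0, 0.0      # feed concentration/temperature and cooling term

def k(T):
    return k0 * np.exp(-E_over_R / T)          # Arrhenius reaction rate k(T)

def step(cA, T, dt=1e-3):
    dcA = (cAf - cA) / theta_res - k(T) * cA
    dT = (Tf - T) / theta_res + beta * k(T) * cA - chi
    return cA + dt * dcA, T + dt * dT

cA, T = 0.5, 600.0                             # initial state (assumed)
for _ in range(10000):
    cA, T = step(cA, T)
print("approximate steady state:", cA, T)
```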
Variable Feed
[Figure: variations in feed concentration $c_{Af}$ [lb mole/ft³], range 0.78–0.86, and feed temperature $T_f$ [°F], range 147–151, over time 0–120.]
Cooling Effect
[Figure: periodic cooling $\chi$, range 0–5, and the resulting temperature $T$ [°F], range 130–170, over time 0–120.]
State Estimation

[Figure: estimated concentration $c_A$ [lb mole/ft³], range 0–0.03, and temperature $T$ [°F], range 130–170, over time 0–120.]
Measurement Prediction
[Figure: concentration measurements vs. predictions, $c_A$ [lb mole/ft³], range 0–0.03, and temperature measurements vs. predictions, $T$ [°F], range 130–170, over time 0–120.]
State Estimation with Sensor Validation
[Figure: estimated concentration $c_A$ [lb mole/ft³], range 0–0.03, and temperature $T$ [°F], range 130–170, over time 0–120, with sensor validation active.]
Conclusions
• Theory
  – Information geometry yields additional insight.
  – Information geometry is tolerant to approximations and "cheating".
• Algorithm
  – Iterative sampling and importance resampling Monte Carlo schemes offer powerful tools to manage the "curse of dimensionality".
• Benefit
  – A fine description of uncertainty results in lower missed and false alarm rates, and a shorter delay in detection.
Further Reading
• T.M. Cover and J.A. Thomas (1991). Elements of Information Theory. Wiley, New York.
• R.E. Blahut (1987). Principles and Practice of Information Theory. Addison-Wesley, Reading, MA.
• L. Tierney (1994). Markov chains for exploring posterior distributions. Ann. Statist., 22, 1701-1762.
• A.F.M. Smith and A.E. Gelfand (1992). Bayesian statistics without tears: a sampling-resampling perspective. Amer. Statist., 46, 84-88.
• R. Kulhavý (1996). Recursive Nonlinear Estimation: A Geometric Approach. Springer-Verlag, London.