DEEP NEURAL NETWORK FOR PROGNOSTICS
AND HEALTH MANAGEMENT
Zhe Yang
2
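Artificial Neural Network
An Artificial Neural Network (ANN) is composed of several simple computational units (also called nodes or neurons), directionally connected by weighted connections and organized in a suitable architecture.
[Figure: ANN with an input layer (index k), a hidden layer (index j) and an output layer (index l); each neuron (node) has a bias; weights 𝑤𝑗𝑘 connect the input layer to the hidden layer and weights 𝑤𝑙𝑗 connect the hidden layer to the output layer]
Hidden layer: $y_j^h = \sum_{k=1}^{3} w_{jk} x_k + w_{j0}$, $u_j^h = f^h(y_j^h)$
Output layer: $y_l^o = \sum_{j=1}^{2} w_{lj} u_j^h + w_{l0}$, $u_l^o = f^o(y_l^o)$
where $f^h$ and $f^o$ are the activation functions, $y$ is the weighted input of a neuron and $u$ is its output.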
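As a minimal sketch of the forward pass through such a network (3 inputs, 2 hidden neurons, 1 output), the computation can be written as below. The sigmoid activation and all weight values are assumptions chosen for illustration; the slide does not specify them.

```python
import math

def sigmoid(y):
    """Logistic activation, assumed here for both the hidden and output layers."""
    return 1.0 / (1.0 + math.exp(-y))

def layer(inputs, weights, biases, f=sigmoid):
    """One layer: u_j = f(sum_k w_jk * x_k + w_j0) for every neuron j."""
    return [f(sum(w * x for w, x in zip(w_row, inputs)) + b)
            for w_row, b in zip(weights, biases)]

# Illustrative (made-up) weights: 3 inputs -> 2 hidden neurons -> 1 output.
W_hidden = [[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]]
b_hidden = [0.0, 0.1]
W_out = [[1.0, -1.0]]
b_out = [0.05]

def forward(x):
    """Propagate one input pattern through the hidden and output layers."""
    u_hidden = layer(x, W_hidden, b_hidden)
    return layer(u_hidden, W_out, b_out)

print(forward([1.0, 0.5, -1.0]))
```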
3
ANN-based PHM
Raw signal → Feature extraction (variance, maximum, minimum, skewness, kurtosis, peak, wavelet, …) → statistical indicators f1, f2, f3, …, fN
→ Feature selection (Lasso, filter, wrapper methods, …) → selected features fsel1, fsel2, fsel3
→ ANN → output:
• Abnormal condition
• Fault class
• RUL
Expert intervention is needed for feature extraction and selection!
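The hand-crafted feature extraction step can be sketched as follows. This is only an illustration: the function name is hypothetical, population (1/n) moments are assumed, and "peak" is assumed to mean the maximum absolute value.

```python
import math

def statistical_indicators(signal):
    """Compute a few classical statistical indicators of a raw signal.

    Assumes a non-constant signal (the standard deviation must be non-zero).
    """
    n = len(signal)
    mean = sum(signal) / n
    var = sum((s - mean) ** 2 for s in signal) / n  # population variance
    std = math.sqrt(var)
    skew = sum((s - mean) ** 3 for s in signal) / (n * std ** 3)
    kurt = sum((s - mean) ** 4 for s in signal) / (n * std ** 4)
    return {
        "variance": var,
        "maximum": max(signal),
        "minimum": min(signal),
        "skewness": skew,
        "kurtosis": kurt,
        "peak": max(abs(s) for s in signal),  # assumed definition of "peak"
    }

features = statistical_indicators([1.0, 2.0, 3.0, 4.0, 5.0])
print(features)
```

A feature selection method would then keep only a subset of these indicators, which is the step that needs expert intervention.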
4
Limitations of ANN
➢ Expert intervention for feature extraction and selection
➢ ANN UPPER LIMIT of learning capability
• MNIST database of handwritten digits [1]
Training set: 𝑛𝑝 = 60,000 samples
Test set: 10,000 samples
[Figure: sample handwritten digits 0 to 9. What are they?]
Method                     Accuracy (%)
ANN [2]                    99.3
Deep Neural Network [3]    99.65
[1] http://yann.lecun.com/exdb/mnist/
[2] Simard P Y, Steinkraus D, Platt J C. Best practices for convolutional neural networks applied to visual document analysis[C]. IEEE, 2003: 958.
[3] Ciresan et al. Neural Computation 10, 2010 and arXiv 1003.0358, 2010
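[Figure: the training samples $\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^{n_p}$, where sample $\mathbf{x}^p$ has components $x_1^p, x_2^p, \ldots$]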
5
Neural Network-based PHM
ANN: Raw signal → Feature extraction (variance, maximum, minimum, skewness, kurtosis, peak, wavelet, …) → f1, f2, f3, …, fN → Feature selection (Lasso, filter, wrapper methods, …) → fsel1, fsel2, fsel3 → ANN → Task
DNN: Raw signal → Deep Neural Network → Task
Extract & select useful features automatically!
6
Shallow and Deep Neural Network architecture
Shallow Neural Network: number of hidden layers typically < 3
Deep Neural Network (DNN): number of hidden layers >= 3
Why were DNNs rarely used ten years ago?
7
How to train a DNN?
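➢ Error back propagation:
[Figure: network with an input layer (index k), hidden layers (indices j1, j2, j3, j4) and an output layer (index l)]
Error function:
$E = \frac{1}{2 n_o} \sum_{l=1}^{n_o} (u_l^o - t_l)^2$
where $t_l$ is the true l-th output, $u_l^o$ is the ANN l-th output and $n_o$ is the number of output neurons, so that
$\frac{\partial E}{\partial u_l^o} = \frac{u_l^o - t_l}{n_o}$
Weight update at iteration i, with learning rate $\eta$:
$w(i+1) = w(i) - \eta \frac{\partial E}{\partial w}$
Applied layer by layer, from the output backwards (sums over neurons omitted):
$w_{l j_4}(i+1) = w_{l j_4}(i) - \eta \frac{\partial E}{\partial u_l^o} \frac{\partial u_l^o}{\partial y_l^o} \frac{\partial y_l^o}{\partial w_{l j_4}}$ (product of 3 partial derivatives)
$w_{j_4 j_3}(i+1) = w_{j_4 j_3}(i) - \eta \frac{\partial E}{\partial u_l^o} \frac{\partial u_l^o}{\partial y_l^o} \frac{\partial y_l^o}{\partial u_{j_4}^h} \frac{\partial u_{j_4}^h}{\partial y_{j_4}^h} \frac{\partial y_{j_4}^h}{\partial w_{j_4 j_3}}$ (product of 5 partial derivatives)
$w_{j_3 j_2}(i+1) = w_{j_3 j_2}(i) - \eta \frac{\partial E}{\partial u_l^o} \frac{\partial u_l^o}{\partial y_l^o} \frac{\partial y_l^o}{\partial u_{j_4}^h} \frac{\partial u_{j_4}^h}{\partial y_{j_4}^h} \frac{\partial y_{j_4}^h}{\partial u_{j_3}^h} \frac{\partial u_{j_3}^h}{\partial y_{j_3}^h} \frac{\partial y_{j_3}^h}{\partial w_{j_3 j_2}}$ (product of 7 partial derivatives)
$w_{j_2 j_1}(i+1) \approx w_{j_2 j_1}(i) - 0$
$w_{j_1 k}(i+1) \approx w_{j_1 k}(i) - 0$
Problem: the weights of the first layers fail to be tuned!
Vanishing gradient: the partial derivatives are small numbers (<1), so the product of many partial derivatives is ≈ 0.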
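The vanishing gradient effect can be illustrated numerically. The sketch below assumes sigmoid activations, whose derivative never exceeds 0.25, and a unit weight between layers; both choices are illustrative assumptions, not taken from the slide.

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_derivative(y):
    """Derivative of the sigmoid; its maximum value is 0.25, reached at y = 0."""
    s = sigmoid(y)
    return s * (1.0 - s)

def gradient_factor(n_layers, y=0.0, weight=1.0):
    """Product of n_layers factors (weight * f'(y)), one per layer crossed."""
    factor = 1.0
    for _ in range(n_layers):
        factor *= weight * sigmoid_derivative(y)
    return factor

def update(w, eta, grad):
    """Gradient descent step: w(i+1) = w(i) - eta * dE/dw."""
    return w - eta * grad

for depth in (1, 3, 5, 10):
    g = gradient_factor(depth)
    print(depth, g, update(0.5, 0.1, g))
```

Even in this best case for the sigmoid (y = 0), the factor shrinks by 4 at every layer, so the update of the first layers becomes negligible.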
8
How to train a DNN: Layer-wise training
➢ Pre-training
Breakthrough: 2006 [1]
[1] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786): 504-507.
➢ Stacking
9
DNN pre-training → Autoencoder
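[Figure: autoencoder with an input layer, a hidden layer and an output layer; the encoder maps the input layer to the hidden layer and the decoder maps the hidden layer to the output layer]
Input patterns → Encoder → Hidden layer → Decoder → Reconstructed input patterns
The input and output layers have the same number of neurons.
Input patterns and reconstructed input patterns: similar information content.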
10
Sparse autoencoder: training
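Input samples $\mathbf{x}^p$, $p = 1, \ldots, n_p$ → Encoder → Decoder → Reconstructed input samples $\hat{\mathbf{x}}^p$, $p = 1, \ldots, n_p$
• Cost function
$E = \frac{1}{n_p} \sum_{p=1}^{n_p} \frac{1}{2} \left\| \hat{\mathbf{x}}^p - \mathbf{x}^p \right\|^2 + \beta * \Omega_{sparse} + \lambda * R(\mathbf{W})$
First term: reconstruction error
$\Omega_{sparse}$: sparse decomposition of the input
$R(\mathbf{W})$: constrains the value of the inner weights
Here $\beta$ and $\lambda$ are the weighting coefficients of the two penalty terms (the symbol names $\beta$, $\lambda$ and $\Omega$ are assumed).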
11
Sparse autoencoder: sparsity
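➢ Sparsity
Input samples $\mathbf{x}^p$, $p = 1, \ldots, n_p$
[Figure: hidden-layer outputs for two different input samples; in each case one hidden neuron outputs 1 and the others output 0]
1 hidden neuron only responds to 1 kind of pattern
→ Sparse representation of the input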
12
Sparse autoencoder: sparsity
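➢ Let us consider hidden neuron j
[Figure: input samples (many different patterns) $\mathbf{x}^1, \mathbf{x}^2, \ldots, \mathbf{x}^{n_p}$, each with 4 components $x_1^p, \ldots, x_4^p$, feeding hidden neuron j through the weights $w_{jk}$]
Output of hidden neuron j given input $\mathbf{x}^p$:
$f_j(\mathbf{x}^p) = f^h\left( \sum_{k=1}^{4} x_k^p w_{jk} + w_{j0} \right)$
Average output of hidden neuron j over the training set:
$\hat{\rho}_j = \frac{1}{n_p} \sum_{p=1}^{n_p} f^h\left( \sum_{k=1}^{4} x_k^p w_{jk} + w_{j0} \right)$
Example: $f_j(\mathbf{x}^1) = 0$, $f_j(\mathbf{x}^2) = 1$, …, $f_j(\mathbf{x}^{n_p}) = 0$
$\hat{\rho}_j = \frac{n}{n_p}$, a very small number when $n \ll n_p$ (in the figure n = 3, the number of input samples to which neuron j responds)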
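The average output of a hidden neuron can be computed as in the sketch below. The sigmoid activation, the weights and the toy data set are assumptions made for illustration: the neuron is set up to respond only to the pattern where all four inputs are 1.

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def hidden_output(x, w_j, w_j0):
    """Output of hidden neuron j for one input sample x."""
    return sigmoid(sum(xk * wk for xk, wk in zip(x, w_j)) + w_j0)

def average_activation(samples, w_j, w_j0):
    """rho_hat_j: mean output of hidden neuron j over the training set."""
    return sum(hidden_output(x, w_j, w_j0) for x in samples) / len(samples)

# Illustrative weights: the neuron fires only when all 4 inputs are close to 1.
w_j, w_j0 = [10.0, 10.0, 10.0, 10.0], -35.0

# Toy training set: 1 sample of the pattern the neuron responds to, 9 others.
samples = [[1.0, 1.0, 1.0, 1.0]] + [[0.0, 0.0, 0.0, 0.0]] * 9

rho_hat = average_activation(samples, w_j, w_j0)
print(rho_hat)  # close to n / n_p = 1 / 10
```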
13
Sparse autoencoder: sparsity
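➢ To obtain the sparse decomposition of the input samples, we set an expected average output for the hidden neurons, $\rho$ ($\rho = 0.3$ in the illustration).
Average output of hidden neuron j: $\hat{\rho}_j = \frac{1}{n_p} \sum_{p=1}^{n_p} f_j(\mathbf{x}^p)$
Cost function: $E = \frac{1}{n_p} \sum_{p=1}^{n_p} \frac{1}{2} \left\| \hat{\mathbf{x}}^p - \mathbf{x}^p \right\|^2 + \beta * \Omega_{sparse} + \lambda * R(\mathbf{W})$ (the symbol names $\beta$, $\lambda$ and $\Omega$ are assumed)
Training objective for sparsity: minimize the KL divergence, which measures the difference between $\hat{\rho}_j$ and $\rho$:
$\Omega_{sparse} = \sum_{j=1}^{n_h} KL(\rho \| \hat{\rho}_j) = \sum_{j=1}^{n_h} \left[ \rho \log\frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log\frac{1 - \rho}{1 - \hat{\rho}_j} \right]$
where $n_h$ is the number of hidden neurons.
[Figure: $KL(\rho \| \hat{\rho}_j)$ as a function of $\hat{\rho}_j$; the minimum is reached at $\hat{\rho}_j = \rho$]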
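The sparsity penalty can be evaluated directly from the KL divergence formula. In this sketch the function names and the example values of the average activations are illustrative assumptions; only the formula itself comes from the slide.

```python
import math

def kl_divergence(rho, rho_hat):
    """KL(rho || rho_hat) between two Bernoulli distributions.

    Both arguments must lie strictly between 0 and 1.
    """
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))

def sparsity_penalty(rho, rho_hats):
    """Sum of KL(rho || rho_hat_j) over all hidden neurons j."""
    return sum(kl_divergence(rho, r) for r in rho_hats)

rho = 0.3
for rho_hat in (0.05, 0.3, 0.6, 0.9):
    print(rho_hat, kl_divergence(rho, rho_hat))

# Penalty for a hidden layer with three neurons (made-up average activations).
print(sparsity_penalty(rho, [0.3, 0.25, 0.6]))
```

The penalty is zero when every hidden neuron has exactly the expected average output and grows as the average outputs move away from it.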
14
Build a DNN using autoencoders
Objective: Input patterns → 2000 → 1000 → 300 → 100 → 10 → Target (classification/regression)
Step 1. Train autoencoders
Autoencoder 1: 2000 → 1000 → 2000
Autoencoder 2: 1000 → 300 → 1000
Autoencoder 3: 300 → 100 → 300
Autoencoder 4: 100 → 10 → 100
15
Build a DNN using autoencoders
Step 1. Train autoencoders: 2000 → 1000 → 2000, 1000 → 300 → 1000, 300 → 100 → 300, 100 → 10 → 100
Step 2. Stacking and fine-tuning: Input patterns → 2000 → 1000 → 300 → 100 → 10 → 100 → 300 → 1000 → 2000 → Input patterns
Step 3. Throw away the ‘Decoder’ and train with supervision: Input patterns → 2000 → 1000 → 300 → 100 → 10 → Target (classification/regression)
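The layer-size bookkeeping behind the three steps can be sketched as below. The helper names are hypothetical and the sketch only derives the shapes; it does not train anything.

```python
def autoencoder_shapes(layer_sizes):
    """Step 1: (input, hidden, output) sizes of the autoencoder for each layer."""
    return [(n_in, n_hid, n_in)
            for n_in, n_hid in zip(layer_sizes[:-1], layer_sizes[1:])]

def stacked_shape(layer_sizes):
    """Step 2: encoder layers followed by the mirrored decoder layers."""
    return layer_sizes + layer_sizes[-2::-1]

def encoder_shape(layer_sizes):
    """Step 3: the decoder is thrown away and only the encoder layers remain."""
    return list(layer_sizes)

sizes = [2000, 1000, 300, 100, 10]
print(autoencoder_shapes(sizes))
print(stacked_shape(sizes))
print(encoder_shape(sizes))
```

Each autoencoder takes as input the hidden-layer output of the previous one, which is why the hidden size of one autoencoder equals the input size of the next.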
16
DNN: difficulty of deciding the number of neurons
[Figure: DNN with layer sizes 2000 → 1000 → 300 → 100 → 10 → classification/regression]
How to decide the number of layers and neurons?
Sparse autoencoder: extracts interesting features from the data, relatively independently of the number of layers and neurons
[Figure: sparse hidden-layer outputs, mostly 0 with a few 1s]
17
Comparison: Shallow and Deep Neural Network
                          Shallow Neural Network    Deep Neural Network (DNN)
Number of hidden layers   typically < 3             >= 3
Training                  Back propagation          Layer-wise training using autoencoders
Accuracy                  ×                         √
Big data                  ×                         √
Raw data                  ×                         √
18
Case study
DNN for bearing fault detection
19
Case study
➢ Available information: run-to-failure vibration signals of 4 bearings [1]
Bearing   State at the end of experiment
B1        Healthy
B2        Healthy
B3        Failed (Inner race)
B4        Failed (Roller element)
When?
[1] Benchmark available at NASA prognostic repository (https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/). J. Lee, H. Qiu, G. Yu, J. Lin, and Rexnord Technical Services (2007). IMS, University of Cincinnati. "Bearing Data Set", NASA Ames Prognostics Data Repository (http://ti.arc.nasa.gov/project/prognostic-data-repository), NASA Ames Research Center, Moffett Field, CA
20
Case study: data collection
[Figure: vibration signal of B3 (failed, inner race), acceleration (m/s²) versus time; a snapshot of 1 s is recorded every 10 min]
21
Case study: data preprocessing
[Figure: B3 signal over time, one snapshot of 1 s (20480 samples) every 10 min]
Each snapshot → Continuous Wavelet Transform → coefficients → mean coefficients
22
Case study: train sparse autoencoder
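[Figure: sparse autoencoders trained on the B3 run-to-failure trajectory, with layer sizes 1333 → 1500 → 700 → 200 → 5 → 200 → 700 → 1500 → 1333; the 5 neurons of the central layer give the extracted features]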
23
Case study: feature evaluation
[Figure: extracted features along the B3 trajectory, with a zoom; annotations: degradation trend, abrupt jump at the end]
Mann-Kendall metric:
$S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \mathrm{sgn}(X_j - X_k)$, $\sigma = \sqrt{\frac{n (n-1) (2n+5)}{18}}$
$Z = \frac{S - 1}{\sigma}$ if $S > 0$; $Z = 0$ if $S = 0$; $Z = \frac{S + 1}{\sigma}$ if $S < 0$
Feature   Test of signal monotonicity (Mann-Kendall)
1         27.6913
2         -21.7529
3         9.6151
4         -13.6292
5         -11.1801
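The Mann-Kendall monotonicity test above can be sketched in a few lines. This is a plain transcription of the formulas without any correction for tied values, and the function names are illustrative.

```python
import math

def sign(v):
    """Return 1, 0 or -1 according to the sign of v."""
    return (v > 0) - (v < 0)

def mann_kendall_z(x):
    """Mann-Kendall Z statistic of the sequence x (no tie correction)."""
    n = len(x)
    s = sum(sign(x[j] - x[k]) for k in range(n - 1) for j in range(k + 1, n))
    sigma = math.sqrt(n * (n - 1) * (2 * n + 5) / 18.0)
    if s > 0:
        return (s - 1) / sigma
    if s < 0:
        return (s + 1) / sigma
    return 0.0

print(mann_kendall_z([1, 2, 3, 4, 5]))   # increasing trend, positive Z
print(mann_kendall_z([5, 4, 3, 2, 1]))   # decreasing trend, negative Z
print(mann_kendall_z([1, 3, 1, 3, 1]))   # no clear trend
```

A large absolute value of Z indicates a strongly monotonic feature, which is the property wanted for a degradation indicator.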
24
Case study: train sparse autoencoder
[Figure: extracted feature 1 along the B3 trajectory, showing a degradation trend]
25
Case study: training of DNN
Stacked autoencoders: 1333 → 1500 → 700 → 200 → 5 → 200 → 700 → 1500 → 1333
DNN (decoder removed): 1333 → 1500 → 700 → 200 → 5 → Fault detection (classification layer)
26
Case study: training of DNN
DNN: 1333 → 1500 → 700 → 200 → 5 → Fault detection (classification layer)
Output: health state
➢ Training set: B3, failed (inner race)
[Figure: snapshots 1, 2, 3, … of B3]
Snapshot      Label   Health State
1~1616        0       Healthy
1617~2027     0.5     Possibly Failed
2028~2156     1       Failed
Onset of failure: 1617 [1], 2027 [2]
[1] Qiu, H., J. Lee, J. Lin, and G. Yu, Wavelet filter-based weak signature detection method and its application on rolling element bearing prognostics. Journal of sound and vibration, 2006. 289(4): p. 1066-1090.
[2] Hasani, R.M., G. Wang, and R. Grosu, An Automated Auto-encoder Correlation-based Health-Monitoring and Prognostic Method for Machine Bearings. arXiv preprint arXiv:1703.06272, 2017.
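The labelling of the B3 training set can be written as a small lookup. The function name is hypothetical; the snapshot ranges and label values are those of the training-set table.

```python
def health_label(snapshot):
    """Training label of a B3 snapshot (1-based index)."""
    if not 1 <= snapshot <= 2156:
        raise ValueError("snapshot index out of range")
    if snapshot <= 1616:
        return 0.0   # Healthy
    if snapshot <= 2027:
        return 0.5   # Possibly Failed
    return 1.0       # Failed

labels = [health_label(s) for s in range(1, 2157)]
print(labels.count(0.0), labels.count(0.5), labels.count(1.0))
```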
27
Case study: results
➢ Developed model: 1333 → 1500 → 700 → 200 → 5 → Fault detection (classification layer)
[Figure: degradation level estimated for B1 and B2; both are classified as Healthy]
Robust when the signals of B1 and B2 are influenced by the failed bearings
28
Case study: results
[Figure: degradation level estimated for B4, failed (roller element), going from Healthy to Failed]
Snapshot      Label   Health State
1~1616        0       Healthy
1617~1760     0.5     Possibly Failed
1761~2156     1       Failed
➢ Result
The onset is identified at snapshot 1680 with threshold Th=0.5, which lies in the possibly failed range [1617, 1760]
Detects multiple failure types
[1] Qiu, H., J. Lee, J. Lin, and G. Yu, Wavelet filter-based weak signature detection method and its application on rolling element bearing prognostics. Journal of sound and vibration, 2006. 289(4): p. 1066-1090.
[2] Hasani, R.M., G. Wang, and R. Grosu, An Automated Auto-encoder Correlation-based Health-Monitoring and Prognostic Method for Machine Bearings. arXiv preprint arXiv:1703.06272, 2017.
[3] Yu, J., Health condition monitoring of machines based on hidden Markov model and contribution analysis. IEEE Transactions on Instrumentation and Measurement, 2012. 61(8): p. 2200-2211.
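A threshold rule such as Th=0.5 can be applied to the degradation level as sketched below. The slides do not state the exact detection rule, so taking the first snapshot whose output reaches the threshold is an assumption, and the degradation values are made up.

```python
def detect_onset(degradation, threshold=0.5):
    """Return the 1-based index of the first snapshot whose degradation
    level reaches the threshold, or None if it never does."""
    for index, level in enumerate(degradation, start=1):
        if level >= threshold:
            return index
    return None

# Made-up degradation levels for five consecutive snapshots.
print(detect_onset([0.1, 0.2, 0.6, 0.4, 0.9]))
print(detect_onset([0.1, 0.2]))
```

In practice a smoothed output or a persistence condition over several snapshots may be preferable to a single crossing, to avoid false alarms caused by noise.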
29
Conclusions
➢ Shallow vs. deep neural networks
➢ Training of DNN: autoencoders
➢ Sparse autoencoder: $KL(\rho \| \hat{\rho}_j)$
➢ Case study
30