deep neural network for prognostics and ......method and its application on rolling element bearing...

DEEP NEURAL NETWORK FOR PROGNOSTICS AND HEALTH MANAGEMENT

Zhe Yang


An Artificial Neural Network (ANN) is composed of several simple computational units (also called nodes or neurons) connected by directed, weighted connections and organized in a suitable architecture.

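Artificial Neural Network

[Figure: feed-forward ANN with an input layer (nodes k), a hidden layer (nodes j) and an output layer (nodes l). Each neuron (node) has a bias; the layers are connected by the weights $w_{jk}$ and $w_{lj}$.]

Hidden neuron j:

$$y_j^h = \sum_{k} w_{jk} x_k + w_{j0}, \qquad u_j^h = f^h\left(y_j^h\right)$$

Output neuron l:

$$y_l^o = \sum_{j} w_{lj} u_j^h + w_{l0}, \qquad u_l^o = f^o\left(y_l^o\right)$$

where $f^h$ and $f^o$ are the activation functions and $w_{j0}$, $w_{l0}$ are the biases.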
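The forward pass of such a network can be sketched in a few lines of NumPy. This is only an illustration: the sigmoid activation, the layer sizes (3 inputs, 2 hidden neurons, 1 output) and all weight values are assumptions chosen for the example, not values taken from the slides.

```python
import numpy as np

def sigmoid(y):
    """Assumed activation function for both layers."""
    return 1.0 / (1.0 + np.exp(-y))

def ann_forward(x, W_h, w_h0, W_o, w_o0):
    """Forward pass of a network with one hidden layer.

    y = weighted sum plus bias, u = activation output.
    """
    y_h = W_h @ x + w_h0      # weighted inputs of the hidden neurons
    u_h = sigmoid(y_h)        # hidden outputs u_j^h
    y_o = W_o @ u_h + w_o0    # weighted inputs of the output neurons
    u_o = sigmoid(y_o)        # network outputs u_l^o
    return u_h, u_o

# Illustrative (hypothetical) input and weights
x = np.array([0.5, -1.0, 2.0])
W_h = np.array([[0.1, 0.2, -0.3],
                [0.4, -0.5, 0.6]])
w_h0 = np.array([0.0, 0.1])
W_o = np.array([[0.7, -0.8]])
w_o0 = np.array([0.05])

u_h, u_o = ann_forward(x, W_h, w_h0, W_o, w_o0)
print("hidden outputs:", u_h)
print("network output:", u_o)
```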

ANN-based PHM

Raw signal

→ Feature extraction (Variance / Maximum / Minimum / Skewness / Kurtosis / Peak / Wavelet …): statistical indicators f1, f2, f3, …, fN

→ Feature selection (Lasso / Filter / Wrapper method …): selected features fsel1, fsel2, fsel3

→ ANN

→ Output:

• Abnormal condition

• Fault class

• RUL

Feature extraction and feature selection require expert intervention!
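As an illustration of the manual feature extraction step, the statistical indicators named above can be computed from a raw signal as follows. This is a minimal sketch: the function name `extract_features`, the use of population (biased) moments and the definition of "peak" as the maximum absolute value are assumptions, not choices documented in the slides.

```python
import numpy as np

def extract_features(signal):
    """Compute simple statistical indicators of a 1-D signal.

    Assumes a non-constant signal (otherwise the standard deviation is 0
    and skewness/kurtosis are undefined).
    """
    x = np.asarray(signal, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)          # population variance
    std = np.sqrt(variance)
    return {
        "variance": variance,
        "maximum": x.max(),
        "minimum": x.min(),
        "skewness": np.mean(centered ** 3) / std ** 3,
        "kurtosis": np.mean(centered ** 4) / std ** 4,
        "peak": np.max(np.abs(x)),             # assumed definition of "peak"
    }

features = extract_features([1.0, 2.0, 3.0, 4.0, 5.0])
for name, value in features.items():
    print(f"{name}: {value:.4f}")
```

Each indicator becomes one candidate feature f1, …, fN; choosing which ones to keep is the feature selection step.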

Limitations of ANN

➢ Expert intervention is required for feature extraction and selection

➢ Upper limit on the learning capability of an ANN

• MNIST database of handwritten digits [1]

Training set: $n_p$ = 60,000 samples

Test set: 10,000 samples

[Figure: sample handwritten digits 0 to 9. What are they?]

Accuracy (%):

ANN [2]: 99.3

Deep Neural Network [3]: 99.65

[1] http://yann.lecun.com/exdb/mnist/

[2] Simard, P. Y., D. Steinkraus, and J. C. Platt, Best practices for convolutional neural networks applied to visual document analysis. Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2003: p. 958.

[3] Ciresan et al. Neural Computation 10, 2010 and arXiv 1003.0358, 2010

Training samples: $\boldsymbol{x}^1, \boldsymbol{x}^2, \ldots, \boldsymbol{x}^{n_p}$, with $\boldsymbol{x}^p = (x_{1p}, x_{2p}, \ldots)$

Neural Network-based PHM

ANN:

Raw signal → Feature extraction (Variance / Maximum / Minimum / Skewness / Kurtosis / Peak / Wavelet …): f1, f2, f3, …, fN → Feature selection (Lasso / Filter / Wrapper method …): fsel1, fsel2, fsel3 → ANN → Task

Deep Neural Network:

Raw signal → Deep Neural Network → Task

Extract & select useful features automatically!

Shallow and Deep Neural Network architecture

Shallow Neural Network: number of hidden layers typically <3

Deep Neural Network (DNN): number of hidden layers >=3

Why were DNNs rarely used ten years ago?

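How to train a DNN?

➢ Error back propagation:

[Figure: DNN with an input layer, several hidden layers and an output layer]

$$E = \frac{1}{2 n_o} \sum_{l=1}^{n_o} \left(u_l^o - t_l\right)^2, \qquad \frac{\partial E}{\partial u_l^o} = \frac{u_l^o - t_l}{n_o}$$

where $t_l$ is the TRUE l-th output, $u_l^o$ is the ANN l-th output and $n_o$ is the number of output neurons.

Weight update at iteration i with learning rate $\eta$:

$$w(i+1) = w(i) - \eta \frac{\partial E}{\partial w}$$

Weights of the output layer (product of 3 partial derivatives):

$$w_{l j_4}(i+1) = w_{l j_4}(i) - \eta \frac{\partial E}{\partial u_l^o} \frac{\partial u_l^o}{\partial y_l^o} \frac{\partial y_l^o}{\partial w_{l j_4}}$$

Weights of the last hidden layer (product of 5 partial derivatives):

$$w_{j_4 j_3}(i+1) = w_{j_4 j_3}(i) - \eta \frac{\partial E}{\partial u_l^o} \frac{\partial u_l^o}{\partial y_l^o} \frac{\partial y_l^o}{\partial u_{j_4}^h} \frac{\partial u_{j_4}^h}{\partial y_{j_4}^h} \frac{\partial y_{j_4}^h}{\partial w_{j_4 j_3}}$$

Weights of the previous hidden layer (product of 7 partial derivatives):

$$w_{j_3 j_2}(i+1) = w_{j_3 j_2}(i) - \eta \frac{\partial E}{\partial u_l^o} \frac{\partial u_l^o}{\partial y_l^o} \frac{\partial y_l^o}{\partial u_{j_4}^h} \frac{\partial u_{j_4}^h}{\partial y_{j_4}^h} \frac{\partial y_{j_4}^h}{\partial u_{j_3}^h} \frac{\partial u_{j_3}^h}{\partial y_{j_3}^h} \frac{\partial y_{j_3}^h}{\partial w_{j_3 j_2}}$$

(The sums over the neurons of the intermediate layers are omitted for brevity.)

Weights of the first hidden layer:

$$w_{j_1 k}(i+1) = w_{j_1 k}(i) - 0$$

The partial derivatives are small numbers (<1), so the product of many partial derivatives is ≈ 0: gradient vanishing.

Problem: the weights of the first layers cannot be tuned!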
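The shrinking of the gradient with depth can be illustrated numerically. The sketch below assumes sigmoid activations, whose derivative never exceeds 0.25, and multiplies one such derivative per layer at randomly drawn weighted inputs. It deliberately ignores the weight factors that also appear in the chain rule, so it shows the trend rather than the exact gradient of any particular network.

```python
import numpy as np

def sigmoid_derivative(y):
    """Derivative of the sigmoid, s(y) * (1 - s(y)); at most 0.25."""
    s = 1.0 / (1.0 + np.exp(-y))
    return s * (1.0 - s)

def gradient_factor(n_layers, rng):
    """Product of n_layers sigmoid derivatives at random weighted inputs."""
    y = rng.normal(0.0, 1.0, size=n_layers)
    return float(np.prod(sigmoid_derivative(y)))

rng = np.random.default_rng(0)
factors = {n: gradient_factor(n, rng) for n in (1, 3, 5, 10, 20)}
for n, factor in factors.items():
    print(f"{n:2d} layers: product of derivatives = {factor:.3e}")
```

Because every factor is bounded by 0.25, the product for n layers is bounded by 0.25 to the power n, which is already below 1e-12 for 20 layers.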

How to train a DNN: Layer-wise training

➢ Pre-training

➢ Stacking

Breakthrough: 2006 [1]

[1] Hinton, G. E. and R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks. Science, 2006. 313(5786): p. 504-507.

DNN pre-training → Autoencoder

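[Figure: autoencoder. Input patterns → Input layer → Encoder → Hidden layer → Decoder → Output layer → Reconstructed input patterns]

The input layer and the output layer have the same number of neurons.

The input patterns and the reconstructed input patterns have similar information content.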

Sparse autoencoder: training

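Input samples $\boldsymbol{x}^p,\ p = 1, \ldots, n_p$ → Encoder → Decoder → Reconstructed input samples $\hat{\boldsymbol{x}}^p,\ p = 1, \ldots, n_p$

• Cost function

$$E = \frac{1}{n_p} \sum_{p=1}^{n_p} \frac{1}{2} \left\| \hat{\boldsymbol{x}}^p - \boldsymbol{x}^p \right\|^2 + \beta \, R_{sparse} + \lambda \left\| \boldsymbol{W} \right\|^2$$

First term: reconstruction error

Second term ($R_{sparse}$): sparse decomposition of the input

Third term: constrains the value of the inner weights

$\beta$ and $\lambda$ are the weighting coefficients of the two penalty terms.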

Sparse autoencoder: sparsity

➢ Sparsity

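Input samples $\boldsymbol{x}^p,\ p = 1, \ldots, n_p$

[Figure: hidden-layer outputs for different input samples; for each sample almost all hidden neurons output 0 and only one outputs 1]

Each hidden neuron responds to only 1 kind of pattern

→ Sparse representation of the input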

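➢ Let us consider hidden neuron j

Input samples (many different patterns): $\boldsymbol{x}^1, \boldsymbol{x}^2, \ldots, \boldsymbol{x}^{n_p}$, with $\boldsymbol{x}^p = (x_{1p}, x_{2p}, x_{3p}, x_{4p})$

Output of hidden neuron j given input $\boldsymbol{x}^p$:

$$f^h\left(\boldsymbol{x}^p\right) = f^h\left(\sum_{k=1}^{4} x_{kp} w_{jk} + w_{j0}\right)$$

Average output of hidden neuron j over the training set:

$$\hat{\rho}_j = \frac{1}{n_p} \sum_{p=1}^{n_p} f^h\left(\sum_{k=1}^{4} x_{kp} w_{jk} + w_{j0}\right)$$

Example: $f^h(\boldsymbol{x}^1) = 0,\ f^h(\boldsymbol{x}^2) = 1,\ \ldots,\ f^h(\boldsymbol{x}^{n_p}) = 0$

$$\hat{\rho}_j = \frac{n}{n_p}$$

This is a very small number when $n \ll n_p$, where n is the number of input samples to which hidden neuron j responds with output 1 (n = 3 in the example).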

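➢ To obtain a sparse decomposition of the input samples, we set an expected average output for the hidden neurons, $\rho$.

$$\hat{\rho}_j = \frac{1}{n_p} \sum_{p=1}^{n_p} f_j\left(\boldsymbol{x}^p\right)$$

Training objective for sparsity: $\hat{\rho}_j = \rho$

KL divergence: difference between $\hat{\rho}_j$ and $\rho$

$$E = \frac{1}{n_p} \sum_{p=1}^{n_p} \frac{1}{2} \left\| \hat{\boldsymbol{x}}^p - \boldsymbol{x}^p \right\|^2 + \beta \, R_{sparse} + \lambda \left\| \boldsymbol{W} \right\|^2$$

$$R_{sparse} = \sum_{j=1}^{n_h} \mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right) = \sum_{j=1}^{n_h} \left[ \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j} \right]$$

where $n_h$ is the number of hidden neurons and $\beta$, $\lambda$ are the weighting coefficients of the two penalty terms.

[Figure: $\mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$ as a function of $\hat{\rho}_j$ for $\rho = 0.3$. Minimizing $\mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$ drives $\hat{\rho}_j$ towards $\rho$.]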
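The sparse autoencoder cost can be sketched in NumPy as follows. This is an illustration under several assumptions: sigmoid activations in both encoder and decoder, the coefficient names `beta` and `lam` with arbitrary default values, and a weight penalty taken as the sum of squared encoder and decoder weights. None of these choices is confirmed by the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kl_sparsity(rho, rho_hat):
    """Sum over the hidden neurons of KL(rho || rho_hat_j)."""
    return float(np.sum(rho * np.log(rho / rho_hat)
                        + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat))))

def sparse_autoencoder_cost(X, W_enc, b_enc, W_dec, b_dec,
                            rho=0.3, beta=1.0, lam=1e-3):
    """Cost = reconstruction error + beta * sparsity + lam * weight penalty.

    X has one input sample per row, shape (n_p, n_inputs).
    """
    H = sigmoid(X @ W_enc + b_enc)           # hidden outputs, shape (n_p, n_h)
    X_hat = sigmoid(H @ W_dec + b_dec)       # reconstructed inputs
    reconstruction = np.mean(0.5 * np.sum((X_hat - X) ** 2, axis=1))
    rho_hat = H.mean(axis=0)                 # average output of each hidden neuron
    sparsity = kl_sparsity(rho, rho_hat)
    weight_penalty = np.sum(W_enc ** 2) + np.sum(W_dec ** 2)
    return reconstruction + beta * sparsity + lam * weight_penalty, rho_hat

# Illustrative (hypothetical) data: 100 samples, 4 inputs, 3 hidden neurons
rng = np.random.default_rng(0)
X = rng.random((100, 4))
W_enc = rng.normal(0.0, 0.1, size=(4, 3))
W_dec = rng.normal(0.0, 0.1, size=(3, 4))
cost, rho_hat = sparse_autoencoder_cost(X, W_enc, np.zeros(3), W_dec, np.zeros(4))
print("cost:", cost)
print("average hidden outputs:", rho_hat)
```

The sparsity term is zero only when every average hidden output equals the target value, which is what pushes each hidden neuron to stay inactive for most input samples.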

Build a DNN using autoencoders

Objective: Input patterns → 2000 → 1000 → 300 → 100 → 10 → Target (Classification/regression)

Step 1. Train autoencoders (number of neurons per layer)

Autoencoder 1: 2000 → 1000 → 2000

Autoencoder 2: 1000 → 300 → 1000

Autoencoder 3: 300 → 100 → 300

Autoencoder 4: 100 → 10 → 100

Build a DNN using autoencoders

Step 1. Train autoencoders

Autoencoder 1: 2000 → 1000 → 2000

Autoencoder 2: 1000 → 300 → 1000

Autoencoder 3: 300 → 100 → 300

Autoencoder 4: 100 → 10 → 100

Step 2. Stacking and fine-tuning

Input patterns → 2000 → 1000 → 300 → 100 → 10 → 100 → 300 → 1000 → 2000 → Reconstructed input patterns

Step 3. Throw away the 'Decoder' and perform supervised training

Input patterns → 2000 → 1000 → 300 → 100 → 10 → Target (Classification/regression)
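The layer-wise procedure can be sketched with plain NumPy. This is a simplified illustration, not the training setup used for the slides: it uses much smaller layers (20 → 10 → 5 → 3) than the 2000 → 1000 → 300 → 100 → 10 network, a sigmoid encoder with a linear decoder, plain batch gradient descent without the sparsity term, and it covers only Step 1 and the removal of the decoders. Fine-tuning and the supervised output layer are left out. All function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, epochs=300, lr=0.1, seed=0):
    """Train one autoencoder on X and return only its encoder (W, b)."""
    rng = np.random.default_rng(seed)
    n_samples, n_in = X.shape
    W_enc = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, size=(n_hidden, n_in))
    b_dec = np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W_enc + b_enc)         # encoder
        X_hat = H @ W_dec + b_dec              # linear decoder
        err = (X_hat - X) / n_samples          # gradient of the mean half squared error
        dZ = (err @ W_dec.T) * H * (1.0 - H)   # back-propagate through the encoder
        W_dec -= lr * (H.T @ err)
        b_dec -= lr * err.sum(axis=0)
        W_enc -= lr * (X.T @ dZ)
        b_enc -= lr * dZ.sum(axis=0)
    return W_enc, b_enc                        # the decoder is thrown away

def pretrain_stack(X, hidden_sizes):
    """Train each autoencoder on the codes produced by the previous one."""
    encoders, codes = [], X
    for i, n_hidden in enumerate(hidden_sizes):
        W, b = train_autoencoder(codes, n_hidden, seed=i)
        encoders.append((W, b))
        codes = sigmoid(codes @ W + b)
    return encoders

def encode(X, encoders):
    """Pass X through the stacked encoders."""
    out = X
    for W, b in encoders:
        out = sigmoid(out @ W + b)
    return out

rng = np.random.default_rng(42)
X = rng.random((50, 20))                       # synthetic input patterns
encoders = pretrain_stack(X, hidden_sizes=[10, 5, 3])
features = encode(X, encoders)
print(features.shape)
```

In a full implementation the stacked encoders would initialize the DNN, which is then fine-tuned and trained with a supervised classification or regression layer on top.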

DNN: Difficulty in deciding the number of neurons

[Figure: DNN with layers 2000 → 1000 → 300 → 100 → 10 → Classification/regression]

How to decide the number of layers and neurons?

Sparse autoencoder: extracts interesting features in the data, relatively independently of the number of layers and neurons

[Figure: sparse hidden-layer outputs, almost all 0 with a single 1]

Comparison: Shallow and Deep Neural Network

Shallow Neural Network

Number of hidden layers: typically <3

Training: Back propagation

Accuracy: ×

Big data: ×

Raw data: ×

Deep Neural Network (DNN)

Number of hidden layers: >=3

Training: Layer-wise training using autoencoders

Accuracy: √

Big data: √

Raw data: √


Case study

DNN for bearing fault detection

Case study

➢ Available information: run-to-failure vibration signals of 4 bearings [1]

State at the end of the experiment:

B1: Healthy

B2: Healthy

B3: Failed (Inner race)

B4: Failed (Roller element)

When?

[1] J. Lee, H. Qiu, G. Yu, J. Lin, and Rexnord Technical Services (2007). IMS, University of Cincinnati. "Bearing Data Set", NASA Ames Prognostics Data Repository (http://ti.arc.nasa.gov/project/prognostic-data-repository), NASA Ames Research Center, Moffett Field, CA. Benchmark available at the NASA prognostic repository (https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/).

Case study: data collection

B3, Failed (Inner race)

[Figure: run-to-failure vibration signal of B3; x-axis: sample, y-axis: acceleration (m/s²)]

One snapshot of 1 s is recorded every 10 min: snapshot, snapshot, snapshot, ……

Case study: data preprocessing

B3: one snapshot of 1 s (20480 samples) every 10 min

Snapshot (20480 samples) → Continuous Wavelet Transform → Coefficients → Mean coefficients

[Figure: wavelet coefficients vs. samples, and the resulting mean coefficients]

Case study: train sparse autoencoder

Input: B3 run-to-failure trajectory

Stacked autoencoder: 1333 → 1500 → 700 → 200 → 5 → 200 → 700 → 1500 → 1333

Extracted features: outputs of the innermost hidden layer (5 neurons)

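Case study: feature evaluation

[Figure: extracted features and zoom of the extracted features, showing a degradation trend and an abrupt jump at the end]

Mann-Kendall metric (test of signal monotonicity):

$$S = \sum_{k=1}^{n-1} \sum_{j=k+1}^{n} \operatorname{sgn}\left(X_j - X_k\right), \qquad \sigma = \sqrt{\frac{n(n-1)(2n+5)}{18}}$$

$$Z = \begin{cases} \dfrac{S - 1}{\sigma} & \text{if } S > 0 \\ 0 & \text{if } S = 0 \\ \dfrac{S + 1}{\sigma} & \text{if } S < 0 \end{cases}$$

Feature | Test of signal monotonicity (Mann-Kendall)

1 | 27.6913

2 | -21.7529

3 | 9.6151

4 | -13.6292

5 | -11.1801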
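The Mann-Kendall statistic can be computed directly from its definition. The sketch below is a straightforward implementation without any correction for tied values, which is an assumption about how the metric was applied; the function name `mann_kendall_z` is hypothetical.

```python
import math
import numpy as np

def mann_kendall_z(x):
    """Return (S, Z) of the Mann-Kendall monotonicity test for a 1-D series."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = 0
    for k in range(n - 1):
        # Signs of the differences between x[k] and all later values
        s += int(np.sum(np.sign(x[k + 1:] - x[k])))
    sigma = math.sqrt(n * (n - 1) * (2 * n + 5) / 18.0)
    if s > 0:
        z = (s - 1) / sigma
    elif s < 0:
        z = (s + 1) / sigma
    else:
        z = 0.0
    return s, z

s_up, z_up = mann_kendall_z(np.arange(10))        # strictly increasing series
s_down, z_down = mann_kendall_z(np.arange(10, 0, -1))  # strictly decreasing series
print(s_up, z_up)
print(s_down, z_down)
```

A large positive Z indicates a monotonically increasing feature, a large negative Z a monotonically decreasing one, and a value near 0 no monotonic trend.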

Case study: train sparse autoencoder

[Figure: extracted feature 1, showing the degradation trend]

Case study: training of DNN

Stacked autoencoder: 1333 → 1500 → 700 → 200 → 5 → 200 → 700 → 1500 → 1333

DNN: 1333 → 1500 → 700 → 200 → 5 → Fault detection (classification layer)

Case study: training of DNN

Input: Snapshot 1, Snapshot 2, Snapshot 3, … → DNN (1333 → 1500 → 700 → 200 → 5) → Output: health state

➢ Training set: B3, Failed (Inner race)

Snapshot | Label | Health State

1~1616 | 0 | Healthy

1617~2027 | 0.5 | Possibly Failed

2028~2156 | 1 | Failed

Onset of failure: 1617 [1], 2027 [2]

[1] Qiu, H., J. Lee, J. Lin, and G. Yu, Wavelet filter-based weak signature detection method and its application on rolling element bearing prognostics. Journal of Sound and Vibration, 2006. 289(4): p. 1066-1090.

[2] Hasani, R.M., G. Wang, and R. Grosu, An Automated Auto-encoder Correlation-based Health-Monitoring and Prognostic Method for Machine Bearings. arXiv preprint arXiv:1703.06272, 2017.
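The labelling of the B3 training snapshots can be written as a small helper. This is a sketch: the function name `health_label` and its parameter names are hypothetical, while the snapshot boundaries are the ones listed in the training-set table.

```python
def health_label(snapshot, onset_early=1617, onset_late=2027, last=2156):
    """Return the training label of a B3 snapshot (1-based index).

    0   = Healthy          (before the earliest reported onset)
    0.5 = Possibly Failed  (between the two reported onsets)
    1   = Failed           (after the latest reported onset)
    """
    if not 1 <= snapshot <= last:
        raise ValueError(f"snapshot must be in [1, {last}]")
    if snapshot < onset_early:
        return 0.0
    if snapshot <= onset_late:
        return 0.5
    return 1.0

labels = [health_label(s) for s in range(1, 2157)]
print(labels.count(0.0), labels.count(0.5), labels.count(1.0))
```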

Case study: results

➢ Developed model: 1333 → 1500 → 700 → 200 → 5

[Figure: degradation level of B1 and B2; both bearings remain Healthy]

The model is robust even when the signals of B1 and B2 are influenced by the failed bearings

Case study: results

B4, Failed (Roller element)

[Figure: degradation level of B4, from Healthy to Failed]

Snapshot | Label | Health State

1~1616 | 0 | Healthy

1617~1760 | 0.5 | Possibly Failed

1761~2156 | 1 | Failed

➢ Result

The onset is identified at snapshot 1680 with threshold Th = 0.5, which lies in the possibly failed range [1617, 1760].

The model detects multiple failures.

[1] Qiu, H., J. Lee, J. Lin, and G. Yu, Wavelet filter-based weak signature detection method and its application on rolling element bearing prognostics. Journal of Sound and Vibration, 2006. 289(4): p. 1066-1090.

[2] Hasani, R.M., G. Wang, and R. Grosu, An Automated Auto-encoder Correlation-based Health-Monitoring and Prognostic Method for Machine Bearings. arXiv preprint arXiv:1703.06272, 2017.

[3] Yu, J., Health condition monitoring of machines based on hidden Markov model and contribution analysis. IEEE Transactions on Instrumentation and Measurement, 2012. 61(8): p. 2200-2211.
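Onset detection with a threshold can be sketched as follows. The rule used here, the first snapshot whose degradation level reaches the threshold, is an assumption about how the threshold Th is applied, and the degradation values in the example are synthetic, not model outputs from the case study.

```python
import numpy as np

def detect_onset(degradation, threshold=0.5):
    """Return the 1-based index of the first snapshot whose degradation
    level is >= threshold, or None if the threshold is never reached."""
    above = np.flatnonzero(np.asarray(degradation, dtype=float) >= threshold)
    return int(above[0]) + 1 if above.size else None

# Synthetic degradation levels for six snapshots (illustration only)
synthetic_levels = [0.0, 0.1, 0.2, 0.6, 0.4, 0.9]
onset = detect_onset(synthetic_levels, threshold=0.5)
print(onset)
```

With noisy model outputs, a practical variant might require several consecutive snapshots above the threshold before declaring the onset.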

Conclusions

➢ Shallow vs. deep neural network

➢ Training of DNN: autoencoders

➢ Sparse autoencoder: $\mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$

➢ Case study
