sliding window based fault detection from high-dimensional data …/file/zhang liangwei... · 2016....
TRANSCRIPT
Sliding window based fault detectionfrom high-dimensional data streams
Liangwei ZhangDivision of Operation and Maintenance Engineering
Authors: Liangwei Zhang, Janet Lin, Ramin Karim
1
OUTLINE
1. Background
2. Problem statement
3. Existing work
4. Sliding Window-based ABSAD
5. Summary
2
Problem: Detect ”something has gone wrong”, fault detection
Structured Semi-structured Unstructured
Terabytes Records Transactions Tables, files Dimensions
3Vs of Big Data
Batch Real-time Near-real-time Streams
Volume Velocity
Variety
1. Background
• Big Data
128+64 sensors (High dimensionality)
1Hz, 86400 samplesper day (Data stream)
3
Veracity, Value, Visualization, Volatility, Validity, Venue, Vocabulary, Vagueness…
1. Background• ISO 13374-2, OSA-CBM (Open System Architecture for
Condition-Based Maintenance)
1. Data Acquisition
2. Data Manipulation
3. State Detection
4. Health Assessment
5. Prognostics
6. Decision Support
7. Presentation
Sensor / Transducer / Manual Entry
External systems, data archiving and
block configuration
CO
MM
ON
NET
WO
RK
Fault detection
Fault detectionand diagnosis (FDD)
FDD and prognosis
4
(anomaly detection,outlier detection)
1. Background
• Fault detection
– Fault detection aims to identify defective states and
conditions within complex industrial systems, subsystems
and components
– Detect “something has gone wrong”
– From a machine learning viewpoint, it answers “yes or no”,
skewed binary classification problem
5
1 Background
• Machine Learning, models are fixed once learned.
6
2. Problem statement
• Curse of dimensionality– Number of training samples (exponentially increased)
– Notions like proximity, distance, neighborhood becomeless meaningful (the concentration of norms)
– Probability theory is counter-intuitive
7
2. Problem statement
• Requirements from data stream– Real-time or near real-time response
– One-scan algorithms
– Concept drift
• System has time-varying characteristics due to seasonalflunctuation, equipment aging, process drifting, etc.
8
3. Existing work
9
• Typically, extension of existing algorithms– Exponential weighted PCA
– Sliding window PCA
– Recursive PCA
• The key is the learned model should be refined, enhanced, and personalized while stream evolves so as to accomodate the natural drift in data stream.
• Research in fault detection from high-dimensional data stream is under-explored.
10
4.1 ABSAD: fault detection from high-dimensionaldata
Zhang, Liangwei, Jing Lin, and Ramin Karim. "An angle-based subspace anomaly detection approach to high-dimensional data: With an application to industrial fault detection." Reliability Engineering & System Safety 142 (2015): 482-497.
4.2 Sliding window-based ABSAD
• sliding window strategy
11
1 100
101
2 3 …
Initial Model (parameters)
Window size L: 100
EvaluationEvaluation
Data stream
102 103 104 …New Model (new parameters)
• Parameters: window profile
• ”Primitive model”: initial model is built once for all
Stage 1: Offline model trainingStage 2: Online fault detection
• Model structure
12
4.2 Sliding window-based ABSAD
Stage 1:Offline Model Training
Stage 2:Online Fault Detection
Data preparation
Feature normalization
Derivation of reference sets
Selection of relevant subspaces
Computation to local outlier scores
Determination of the control limit
start
Data preprocessing
Feature normalization
Derivation of the reference set
Selection of the relevant subspace
Computation to the local outlier score
Current window profile:(memory-resident data)
LOS exceeds the control limit
Samples (a matrix)
Several consecutive
faults detected
Yes
Trigger an alarm
Yes
No
Mean vector
Standard deviation
vector
k-Distance
vector
Control limit
No
Update the window profile
Sampling/Resampling
KNN list (a set of
sets)
Initialize the first window
• Computational complexity – The first stage takes the same complexity as the ABSAD
approach, O ∙max , . If some structure is employed, O log ∙max ,
– The time complexity of the second stage for processing a single sample is O ∙max , and the space complexity is O ∙max ,
13
4.2 Sliding window-based ABSAD
• Numerical illustration: data– System modeled on the input-output form
14
4.2 Sliding window-based ABSAD
• Numerical illustration: data– 95 fault-irrelevant dimensions were added to create a high-
dimensional setting
– 4 datasets (4 faults), 2000 by 100 matrix
– All faults were induced starting from the 1501st sample
– slow drifts were added starting from the 1001st sample
15
4.2 Sliding window-based ABSAD
• Numerical illustration: results and discussion– Fault detection results on scenario 2
16
4.2 Sliding window-based ABSAD
4.2 Sliding window-based ABSAD
• Numerical illustration: results and discussion
17
5. Conclusion
• The sliding window-based ABSAD approach can be adaptive to the time-varying behavior of the concerned system
• The sliding window ABSAD can perform isochronously online fault detection by trading space for time.
18
Acknowledgement
19
Thank you!