Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Upload: april-song

Post on 13-Feb-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation

1 © 2016 Pivotal Software, Inc. All rights reserved. 1

Planes, Trains, and Automobiles

A Data Scientist’s Guide to Modeling Engine Degradation

April Song @aprilsongg Sarah Aerni @itweetsarah

Page 2: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


•  Gene Sequencing: cost to sequence one genome has fallen from $100M in 2001 to $10K in 2011 to $1K in 2014

•  Smart Grids: reading smart meters every 15 minutes is 3000x more data intensive

•  Stock Market

•  Social Media: Facebook uploads 250 million photos each day

•  Oil Exploration: oil rigs generate 25,000 data points per second

•  Video Surveillance

•  Medical Imaging

•  Mobile Sensors

All industries need technology to process and store data

Page 3: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


How can connected devices in our home be smart enough to make daily life easier?

Page 4: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


How can we know a tree has fallen on a power line before the residents complain?

Page 5: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


How can we use data to prevent airplane accidents?

Page 6: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Aerospace Industry is Embracing IoT

•  Engines are being fitted with more and more sensors

•  Aircraft data networks are improving data transfer speeds

•  Real-time analytics is improving efficiency and performance

Page 7: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Pratt & Whitney’s Geared Turbofan Engine

•  5,000 sensors

•  10 GB data per second

•  12 hours of flight = 844 TB data

Page 8: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


WHY IS THIS A DATA SCIENCE PROBLEM?

Page 9: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


How does this…

Page 10: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


How does this…

…become this?

Page 11: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


How does this…

…become this?

By recognizing this

Page 12: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


HOW CAN IT SOLVE JET ENGINE CHALLENGES?

Page 13: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


But what can we do with this much data?

•  Predict thrust demands of an engine: reduction in fuel consumption

•  Monitor engine health and degradation: reduced maintenance costs with increased performance, efficiency, and engine lifetime

•  Detect faults and anomalies during a flight: prevention of equipment failures and accidents

Page 14: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


What We Will Cover Today

•  Jet Engine Sensor Data

•  Enabling Technologies for Data Science

•  Building Models on Large-Scale Datasets
   –  Detecting Engine “end-of-life” Signal via Clustering
   –  Tracing Engine Health Degradation using Classification

Page 15: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Introduction to C-MAPSS

Commercial Modular Aero-Propulsion System Simulation

C-MAPSS is a MATLAB program that simulates a large high-bypass commercial turbofan engine capable of ~90k lbs thrust
–  GUI allows point-and-click operation of engine models
–  Simulates deterioration and faults

[Figure: simplified diagram of 90k engine]

Page 16: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Overview of Flights

•  6,875 flights
   –  5,244 flights from nominal engines
   –  1,631 flights from fault engines

•  Flight lengths range from 74 to 85 minutes

•  Average length of flight is ~80 minutes

[Figure: histogram of # of flights by length of flight (seconds)]

Page 17: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Flight Parameters

Parameter   Description                          Units

Flight Conditions
time        Flight time                          sec
alt         Altitude                             ft
MN          Mach number                          pct
TRA         Throttle resolver angle              deg
Wf          Fuel flow                            pps
Fn          Net thrust                           lbf

Measurement Temperatures
T48         Total temperature at HPT outlet      R
T2          Total temperature at fan inlet       R
T24         Total temperature at LPC outlet      R
T30         Total temperature at HPC outlet      R
T50         Total temperature at LPT outlet      R

Pressure Measurements
P2          Pressure at fan inlet                psia
P15         Total pressure in bypass-duct        psia
P30         Total pressure at HPC outlet         psia

Other Measurements
Nf          Physical fan speed                   rpm
Nc          Physical core speed                  rpm
epr         Engine pressure ratio (P50/P2)       --
phi         Ratio of fuel flow to Ps30           pps/psi
Ps30        Static pressure at HPC outlet        psia
NfR         Corrected fan speed                  rpm
NcR         Corrected core speed                 rpm
BPR         Bypass ratio                         --
farB        Burner fuel-air ratio                --
htBleed     Bleed enthalpy                       --
PCNfRdmd    Percent corrected fan speed          pct
W31         HPT coolant bleed                    lbm/s
W32         LPT coolant bleed                    lbm/s

Health Indicators
SmHPC       HPC stall margin                     --
SmLPC       LPC stall margin                     --
SmFan       Fan stall margin                     --

Page 18: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


LARGE DATASETS REQUIRE NEW TECHNOLOGIES

At-Scale Modeling

Page 19: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Need for new environments to process big data?

FLEXIBLE (VARIETY/VELOCITY): HDFS STORAGE AND MPP ARCHITECTURES DISTRIBUTE STORAGE AND PREVENT DATA MOVEMENT

SCALABLE (PETABYTES OF DATA): DISTRIBUTED COMPUTATION FOR PARALLELIZATION

ENABLING (RAPIDLY EVOLVING FIELD OF DATA SCIENCE AND TOOLS): OPEN-SOURCE LIBRARY FOR MACHINE LEARNING AT SCALE AND FRAMEWORK TO ACCESS COMMON LANGUAGES

ACCESSIBLE (MANY EXISTING LIBRARIES, TOOLS AND EXPERTISE): SQL ENGINE AND ODBC/JDBC CONNECTIONS TO HADOOP

Page 20: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Analytics with Pivotal

A single address for everything analytics

Time-to-Insights: FORECASTING, CLUSTERING, REGRESSION, CLASSIFICATION, OPTIMIZATION

Page 21: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Pivotal Greenplum MPP DB

Think of it as multiple PostgreSQL servers

Rows are distributed across segments by a particular field (or randomly)

[Diagram: Master and Segments/Workers]
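
The row distribution described on this slide can be sketched in a few lines of Python. This is a toy illustration only: `distribute_by_key`, `distribute_randomly`, the CRC32 hash, and the sample rows are assumptions made for the example, not Greenplum's actual hashing or API.

```python
import random
import zlib
from collections import defaultdict


def distribute_by_key(rows, key, n_segments):
    """Assign each row to a segment by hashing one field."""
    segments = defaultdict(list)
    for row in rows:
        # crc32 is a stand-in for the database's internal hash function.
        seg = zlib.crc32(str(row[key]).encode("utf-8")) % n_segments
        segments[seg].append(row)
    return dict(segments)


def distribute_randomly(rows, n_segments, seed=0):
    """Assign each row to a segment at random, ignoring its contents."""
    rng = random.Random(seed)
    segments = defaultdict(list)
    for row in rows:
        segments[rng.randrange(n_segments)].append(row)
    return dict(segments)


# Hypothetical sensor rows: six flights, four samples each.
rows = [{"flight_id": f, "time": t, "alt": 1000 * t} for f in range(6) for t in range(4)]
by_flight = distribute_by_key(rows, "flight_id", n_segments=3)
```

Distributing by `flight_id` keeps all samples of a flight on one segment, so per-flight calculations need no data movement between segments; random distribution balances load but gives no such guarantee.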

Page 22: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Greenplum Database Features for Data Scientists

•  Window functions: Perform calculations across a set of table rows that are somehow related to the current row

•  Analytics extensions: In-database machine learning at scale using MADlib

•  Procedural language extensions: Extended functionality using non-SQL programming languages and packages (e.g. Python and R)

•  Client access: ODBC and JDBC access to support connections to 3rd party tools

* Only a subset of Greenplum Database features

Page 23: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


MADlib: Scalable, In-database ML

•  Open Source: https://github.com/madlib/madlib

•  Works on Greenplum DB, HAWQ and PostgreSQL

•  In active development by Pivotal

•  Downloads and Docs: http://madlib.net/

Page 24: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Data Parallelism through PL/X

•  For embarrassingly parallel tasks, we can use procedural languages to easily parallelize any stand-alone library in Java, Python, R or C/C++

•  The interpreter/VM of the language ‘X’ is installed on each node of the MPP environment

[Diagram: SQL is submitted to the Master Host (with a Standby Master), which connects through the Interconnect to four Segment Hosts, each holding two Segments]

User Defined Functions

CREATE FUNCTION pymax (a integer, b integer)   -- SQL wrapper
RETURNS integer
AS $$
  # source language code
  if a > b:
    return a
  return b
$$ LANGUAGE plpythonu;                         -- source language declaration
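
To make the data-parallel idea concrete, here is a minimal plain-Python sketch of the same `pymax` logic applied independently to the rows held by each segment. The segment layout, `run_on_segment`, and the use of a thread pool are illustrative assumptions; in the database the parallelism comes from the MPP engine, not from Python threads.

```python
from concurrent.futures import ThreadPoolExecutor


def pymax(a, b):
    """Plain-Python equivalent of the body of the pymax UDF."""
    if a > b:
        return a
    return b


def run_on_segment(segment_rows):
    """Apply the function to every row held by one segment."""
    return [pymax(a, b) for a, b in segment_rows]


# Hypothetical layout: three segments, each holding its own slice of (a, b) rows.
segments = [
    [(1, 2), (5, 3)],
    [(7, 7), (0, -1)],
    [(10, 20)],
]

# One worker per segment; no segment needs data from another.
with ThreadPoolExecutor(max_workers=len(segments)) as pool:
    per_segment = list(pool.map(run_on_segment, segments))

results = [value for part in per_segment for value in part]
```

Because each row is processed on its own, the work splits cleanly across segments, which is what makes the task embarrassingly parallel.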

Page 25: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


What does a typical flight look like?

[Figure: altitude over time for some example flights]

•  A flight consists of a series of ascents, cruises, and descents

•  The average cruise at 35,000 ft lasts ~21 minutes
   –  Engine health is calculated from a snapshot of parameters during this cruise

Page 26: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Time Series: Pressure Parameters

•  P2, P15, and P30 appear to be positively correlated except during the middle cruise
   –  Correlation may differ depending on regime

P2 = Pressure at fan inlet
P15 = Total pressure in bypass-duct
P30 = Total pressure at HPC outlet

Page 27: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Life of a Nominal Engine

•  Engine health is modeled to degrade exponentially over time

•  5,244 flights from 25 nominal engines

•  Median number of flights for a nominal engine is 201

•  Median health score of nominal engines across all flights is ~0.81

Page 28: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Opportunity for Clustering of Engines

•  Nominal engines seem to degrade in at least 4 different ways
   –  Cluster engines based on degradation trend
   –  Caveat: small sample size (35 engines)

•  Additional modeling opportunity:
   –  Predict engine health score

Page 29: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Life of a Fault Engine

•  Significant drop in engine health is apparent after a fault flight

•  1,631 flights from 10 fault engines

•  Median number of flights for a fault engine is 137

•  Median health score of fault engines across all flights is ~0.72

[Figure: engine health over time, with the fault flight marked]

Page 30: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


What happens when there is a fault?

[Figure: Engine Pressure Ratio (EPR) for flight 32-15, a flight with a fan fault]

At first glance, the fault’s effects are not noticeable
–  Need to zoom in to see the effects of a fault

Page 31: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Feature Engineering: Transforming Timeseries

•  Many modeling approaches require feature extraction
   –  Clustering of engines
   –  Regression to reverse-engineer engine health

Page 32: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Engineering Features From Time Series

•  Goal: Represent timeseries data as variables

•  Approach:
   1.  Identify the different phases of the flight: takeoff, climbs, cruises, descents, landing
   2.  For each phase and parameter calculate:
       ▪  mean  ▪  min  ▪  max  ▪  stddev  ▪  max – min  ▪  median
   3.  Summary stats on rate of change for features

[Figure annotations: summary statistics for two example phases]
   mean: 13,674  stddev: 0  max: 13,674  min: 13,674  max-min: 0  median: 13,674
   mean: 33,596  stddev: 5,732  max: 45,575  min: 25,959  max-min: 19,616  median: 32,556
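
The approach above can be sketched in plain Python. This is a simplified illustration under stated assumptions: the slide does not say how phases were detected, so `label_phases` simply uses the sign of the altitude change; the population standard deviation is assumed for "stddev"; and the altitude trace is made up.

```python
import statistics


def summarize(values):
    """Summary statistics listed on the slide for one phase of one parameter."""
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "stddev": statistics.pstdev(values),  # population stddev (assumption)
        "max-min": max(values) - min(values),
        "median": statistics.median(values),
    }


def rate_of_change(values):
    """First difference between consecutive samples (assumes even spacing)."""
    return [b - a for a, b in zip(values, values[1:])]


def label_phases(alt, tol=0.0):
    """Label each step as climb, cruise, or descent from the altitude change."""
    labels = []
    for delta in rate_of_change(alt):
        if delta > tol:
            labels.append("climb")
        elif delta < -tol:
            labels.append("descent")
        else:
            labels.append("cruise")
    return labels


# Hypothetical altitude trace in feet, one sample per time step.
alt = [0, 5000, 10000, 13674, 13674, 13674, 9000, 4000, 0]
phases = label_phases(alt)
cruise = [a for a, p in zip(alt[1:], phases) if p == "cruise"]
cruise_stats = summarize(cruise)
climb_rate_stats = summarize([d for d in rate_of_change(alt) if d > 0])
```

Each (phase, parameter, statistic) combination becomes one column of the feature vector for a flight, which is how a variable-length time series turns into a fixed set of variables.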

Page 33: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Calculating Correlations between Sensors

•  How correlated are two sensors?

•  Are correlations between the sensors different flight to flight?

•  Approach:
   1)  Calculate correlations over entire flight data set and observe trends
   2)  Calculate correlations over each flight and observe trends
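
Both steps of the approach can be illustrated with a short Python sketch. The Pearson coefficient is assumed here (the slides do not name the correlation measure), and the readings are made-up values for two hypothetical flights.

```python
import math
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation coefficient; assumes neither series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical readings: (flight_id, alt in ft, p2 in psia).
readings = [
    (1, 0, 14.7), (1, 10000, 10.1), (1, 20000, 6.8), (1, 30000, 4.4),
    (2, 0, 14.6), (2, 10000, 10.2), (2, 20000, 6.7), (2, 30000, 4.5),
]

# Step 1: one correlation over the entire data set.
overall = pearson([r[1] for r in readings], [r[2] for r in readings])

# Step 2: one correlation per flight.
by_flight = defaultdict(list)
for flight_id, alt, p2 in readings:
    by_flight[flight_id].append((alt, p2))

per_flight = {
    fid: pearson([a for a, _ in pts], [p for _, p in pts])
    for fid, pts in by_flight.items()
}
```

Comparing `per_flight` values against `overall` is one way to spot flights whose sensor relationships deviate from the fleet-wide trend.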

Page 34: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Sensor Parameter Correlations

•  Correlations calculated on entire flight data set

•  435 total unique parameter pairs
   –  162 pairs are strongly positively correlated (> 0.8)
   –  45 pairs are strongly negatively correlated (< -0.8)
   –  228 pairs are weakly correlated

[Figure: histogram of # of measurement pairs by correlation]
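
As a quick check on the counts above, 30 parameters (the number listed on the Flight Parameters slide) give 435 unordered pairs. The sketch below shows that arithmetic and the thresholding; the `bucket` helper and its labels are illustrative names, not part of the original analysis.

```python
from math import comb

n_parameters = 30                 # parameters listed on the Flight Parameters slide
n_pairs = comb(n_parameters, 2)   # unordered pairs: 30 * 29 / 2


def bucket(r, threshold=0.8):
    """Classify a correlation value using the slide's +/-0.8 cutoffs."""
    if r > threshold:
        return "strong positive"
    if r < -threshold:
        return "strong negative"
    return "weak"


# Counts reported on the slide.
reported = {"strong positive": 162, "strong negative": 45, "weak": 228}
```

The three reported buckets sum to the same 435 pairs, so every pair is accounted for.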

Page 35: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Top Correlated Parameter Pairs

Negatively Correlated

Parameter 1   Parameter 2   Correlation
p2            alt           -0.985
t2            alt           -0.974
p15           alt           -0.972
w31           alt           -0.931
w32           alt           -0.931

p2 = pressure at fan inlet
t2 = total temp at fan inlet
p15 = total pressure in bypass-duct
w31 = HPT coolant bleed
w32 = LPT coolant bleed

Positively Correlated

Parameter 1   Parameter 2   Correlation
nc            htbleed       0.999
t30           nc            0.999
t30           htbleed       0.999
ps30          p30           0.999
w31           w32           0.999

nc = physical core speed
htbleed = bleed enthalpy
t30 = total temperature at HPC outlet
ps30 = static pressure at HPC outlet

Page 36: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Top Negatively Correlated Sensors

p2 = pressure at fan inlet
t2 = total temp at fan inlet
p15 = total pressure in bypass-duct
w31 = HPT coolant bleed
w32 = LPT coolant bleed

•  Potential analysis: Calculating correlations at a regime level may reveal anomalies

Page 37: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Top Positively Correlated Sensors

nc = physical core speed
htbleed = bleed enthalpy
t30 = total temperature at HPC outlet
ps30 = static pressure at HPC outlet

•  Potential analysis: Calculating correlations at a regime level may reveal anomalies

Page 38: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Correlation Between Altitude and P2

[Figure: correlation between altitude and P2, by flight ID]


Page 44: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Clustering Flights

Insights on engine degradation and end of life

Page 45: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


K-Means Clustering Algorithm

Objective: Group flights based on their parameter time series

1.  Time series for single sensor data
2.  Extract summary statistics for all phases
3.  Feature reduction using VIF
4.  Cluster using K-means algorithm in MADlib with summary statistics as feature vector


Page 47: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


K-Means Clustering Algorithm

[Figure: K-means illustration. Source: http://www.naftaliharris.com/]

Page 48: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


K-Means Clustering Algorithm

•  Repeat process for 29 parameters
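
For readers unfamiliar with the clustering step, here is a minimal pure-Python sketch of K-means (Lloyd's algorithm). It is a stand-in for illustration only: the talk runs K-means in-database with MADlib, whose interface is not shown here, and the feature values below are invented. The VIF feature-reduction step is not covered.

```python
import math
import random


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm on lists of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


# Hypothetical feature vectors, one row per flight, e.g. (mean, stddev) of one
# parameter during cruise. Values are made up for illustration.
features = [
    [20.1, 0.5], [19.8, 0.6], [20.3, 0.4],
    [12.2, 1.9], [11.7, 2.1], [12.5, 2.0],
]
centroids, labels = kmeans(features, k=2)
```

Running this once per parameter, as the slide describes, yields one cluster assignment per flight and parameter.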

Page 49: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Flights in Cluster 4 Indicate Engine’s end of life

[Figure panels: SmFan timeseries features; clustering results]

Page 50: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Classification-Based Similarity Metric

Understanding similarities between flights

Page 51: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Classification-Based Distance Metric

•  Binary classification methods build models that differentiate between two groups using available attributes
   –  Algorithms allow us to use the optimal subset of attributes to differentiate classes (feature selection)
   –  Ability to differentiate becomes a proxy for dissimilarity

[Figure: three example classes (Class 1, Class 2, Class 3)]
   –  Classes differentiated by size and color: model accuracy HIGH (able to predict class)
   –  Classes that are indistinguishable: model accuracy LOW (unable to predict classes)

Page 52: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Classification-Based Flight Similarity Metric

•  For a given pre-takeoff phase
   –  Create a non-overlapping set of all 5-second windows
   –  Extract features
      ▪  Summary statistic (402) for each parameter in the time-window
      ▪  Correlations between all pairs of parameters in the time-window, used for propulsion data only

[Figure: for Engine 1, a classifier is trained for each flight pair (Flight 1, Flight 2), (Flight 1, Flight 3), ..., (Flight m, Flight n); each yields a classification accuracy score]
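
The idea of using classifier accuracy as a dissimilarity score can be sketched as follows. This is a deliberately reduced illustration with several assumptions: one sample per second (so a 5-sample window is 5 seconds), a single feature per window (the window mean) instead of the full set of summary statistics and correlations, a hand-written logistic regression, and accuracy measured on the training windows. The function names and the sensor values are hypothetical.

```python
import math
import statistics


def window_means(series, width=5):
    """Mean of each non-overlapping window of `width` samples (drops any remainder)."""
    return [
        statistics.mean(series[i:i + width])
        for i in range(0, len(series) - width + 1, width)
    ]


def classifier_accuracy(x_a, x_b, n_iter=500, lr=0.5):
    """Fit a one-feature logistic regression separating flight A (label 0)
    from flight B (label 1) and return its training accuracy."""
    xs = x_a + x_b
    ys = [0] * len(x_a) + [1] * len(x_b)
    mu = statistics.mean(xs)
    sd = statistics.pstdev(xs) or 1.0
    zs = [(x - mu) / sd for x in xs]  # standardize for stable gradient steps
    w = b = 0.0
    for _ in range(n_iter):
        grad_w = grad_b = 0.0
        for z, y in zip(zs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * z + b)))
            grad_w += (p - y) * z
            grad_b += p - y
        w -= lr * grad_w / len(zs)
        b -= lr * grad_b / len(zs)
    predictions = [1 if w * z + b > 0 else 0 for z in zs]
    return sum(p == y for p, y in zip(predictions, ys)) / len(ys)


# Hypothetical sensor traces for three flights of one engine (20 samples each).
flight_1 = [1.00 + 0.01 * (i % 3) for i in range(20)]
flight_2 = list(flight_1)                               # identical to flight 1
flight_9 = [1.20 + 0.01 * (i % 3) for i in range(20)]   # shifted, as after degradation

score_similar = classifier_accuracy(window_means(flight_1), window_means(flight_2))
score_different = classifier_accuracy(window_means(flight_1), window_means(flight_9))
```

An accuracy near 0.5 means the classifier cannot tell the two flights apart (similar flights); an accuracy near 1.0 means they are easy to separate (dissimilar flights).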

Page 53: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Expected Results

•  745,281 total models built
   –  For each flight, a classifier is trained against each other flight of the same engine
   –  Modeling run-time ~11 min on 128-segment cluster

•  As engines begin to degrade, adjacent flights should be similar (low accuracy)

Page 54: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Expected Results

•  Model accuracy HIGH: able to distinguish flights that occur after degradation

•  Model accuracy LOW: unable to distinguish adjacent flights (little difference)

[Figure: model accuracy by flight number, relative to the reference flight]

Page 55: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Engine 1 results

•  Model accuracy HIGH: able to distinguish flights that occur after degradation

•  Model accuracy LOW: unable to distinguish adjacent flights (little difference)

[Figure: model accuracy relative to the reference flight]


Page 57: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Engine 1 results

•  Model accuracy HIGH: able to distinguish flights that occur before and after degradation

•  Model accuracy LOW: unable to distinguish adjacent flights (little difference)

[Figure: model accuracy relative to the reference flight]


Page 59: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Logistic Regression Results

•  Earlier flights are more similar to each other

•  Earlier flights are more dissimilar to later flights

•  Flights up to the 50th are similar to each other

•  Flights after the 50th are similar only to neighboring flights and start to differ from earlier flights

•  Indicates change/degradation over time

[Figure legend: Similar / Dissimilar]

Page 60: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Examining Engine Degradation Over Time

•  Summary statistics over flights provide insights into degradation patterns
   –  Median/mean accuracies over PRECEDING flights indicate what degradation has occurred since the engine start
   –  Observations over adjacent windows may be of interest
   –  Detecting anomalies

[Figure: model accuracy by flight number for two reference flights]

Page 61: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Engine Health and Engine Classification-based Similarity

•  Median accuracy score of a flight to prior flights increases as engine health decreases

•  Abrupt changes in engine health can be found using future flights (to find an inflection)
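
The per-flight summary described here can be sketched as a median over each flight's accuracy scores against its preceding flights. The matrix below is hypothetical, and `median_accuracy_to_prior` is an illustrative helper name rather than anything from the original analysis.

```python
import statistics


def median_accuracy_to_prior(accuracy, flight):
    """Median pairwise accuracy between `flight` and every earlier flight."""
    prior = [accuracy[flight][other] for other in range(flight)]
    return statistics.median(prior) if prior else None


# Hypothetical symmetric matrix: accuracy[i][j] is the accuracy of the classifier
# trained to separate flight i from flight j of the same engine.
accuracy = [
    [0.50, 0.52, 0.55, 0.90],
    [0.52, 0.50, 0.53, 0.88],
    [0.55, 0.53, 0.50, 0.86],
    [0.90, 0.88, 0.86, 0.50],
]
trend = [median_accuracy_to_prior(accuracy, f) for f in range(len(accuracy))]
```

In this toy matrix the last flight is easy to separate from all earlier ones, so its median jumps, which is the kind of rise the slide associates with declining engine health.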

Page 62: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Accuracy Scores Show both Time and Degradation

•  With many more flights median accuracy increases


Page 64: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Accuracy Scores Show both Time and Degradation

•  With many more flights median accuracy increases

•  Degradation in engine causes median accuracy to drop faster

Page 65: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Example of Fault: Engine 32

•  Flight before fault occurs

•  Average score of flights before the fault flight is slightly higher

•  Flight after fault: more flights with score > 0.8

Page 66: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


HPT Fault

Classification-Based Similarity Changes at Faults

Page 67: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


LPC Fault – low engine health change still detected

Classification-Based Similarity Changes at Faults


Page 69: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Engine Health and Median Accuracy Correlations

HPT fault flights

Page 70: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Engine Health and Median Accuracy Correlations

[Figure panels: fan, HPC, HPT, LPC, LPT]

Page 71: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


What Did We Learn? What is next?

•  Through technology, data exploration and feature generation become easier
   –  What we learned: Rapidly transforming large volumes of sensor data
   –  What’s next: Timeseries analysis, interpolation on missing data

•  Experimentation with building models to predict engine decay and faults
   –  What we learned: Unsupervised techniques for clustering and distance metrics enable us to discover signals of decay
   –  What’s next: Supervised approaches to detect known faults


Page 73: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Opportunities in the Digital Brain

CONNECTED CARS

PERSONALIZED MEDICINE

SMART METERS

SECURITY

PREDICTIVE MAINTENANCE

SPORT TRACKING

OPTIMIZATION AND EFFICIENCY

Page 74: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation
Page 75: Planes, Trains, and Automobiles: A Data Scientist’s Guide to Modeling Engine Degradation


Appendix

•  Propulsion dataset can be downloaded at: https://c3.nasa.gov/dashlink/resources/140/