How to Make Cars Smarter: A Step Towards Self-Driving Cars
Kaushik K. Das
Esther Vasiete
Pivotal Data Science
October 2016
Today’s presenters
Pivotal Data Science Perspectives
Kaushik K. Das, Head of Data Science, Pivotal
Esther Vasiete, Data Scientist, Pivotal
Agenda
• What do we mean by “smarter cars”?
• How do we apply data science to build smarter cars?
Example 1: Predictive Maintenance
Example 2: Understanding Driver Behavior Patterns
• Demo
• Next Steps
Autonomous Cars will offer many advantages
Call a car whenever you want to go somewhere – sit and relax – and you are there!
● No stress for you – don’t have to drive in traffic or maintain a car
● Better utilization of cars leading to lower impact on environment
● Fewer accidents and injuries
BUT
there are some issues that still need to be solved – e.g. California law needs a driver ready to take over in case of an emergency
We need to get from
Manually Driven Cars → Autonomous Cars
Why not -
Manually Driven Cars → Smart “Augmented” Cars* → Autonomous Cars
* Some people refer to smart augmented cars as semi-autonomous vehicles
Augmentation – a situation in which humans and computers combine to create effective and efficient outcomes*
● You get reduced stress and fewer accidents
● Fewer regulatory / legal barriers
● Easier to implement
* Thomas H. Davenport, “Augmentation or Automation?”, WSJ, Feb 25, 2015.
Smart Cars offer many of the advantages of automation
Smart System = Sensors + Digital Brain + Actuators
Problem Formulation
Data Step
Modeling Step
Application Step
Data Science For Building Models
Sensors & Data
Data Lake
Big Data Platform
Phase 1: Problem Formulation
Make sure you formulate a problem that is relevant to the goals and pain points of the stakeholders.
Phase 2: Data Step
Build the right feature set, making full use of the volume, variety and velocity of all available data.
Phase 3: Modeling Step
This is where you move from answering what, where and when to answering why and what if.
Phase 4: Application
Create a framework for integrating the model with decision-making processes and taking action using the Internet of Things.
Technology Selection
Select the right platform and the right set of tools for solving the problem at hand.
Iterative Approach
Perform each phase in an agile manner, team up with domain experts and SMEs, and iterate as required.
Creativity
Take the opportunity to innovate at every phase.
Building a Narrative
Create a fact-based narrative that clearly communicates insights to stakeholders.
The Eightfold Path of Data Science – four phases and four differentiating factors
Data Science Toolkit
Key languages and tools: MLlib, PL/X, modeling tools, visualization tools
Platform: Pivotal HDB, Pivotal Greenplum, Spring Cloud Data Flow, Apache Spark, Pivotal HDP
Scalable, In-Database Machine Learning
• Open source: https://github.com/apache/incubator-madlib
• Downloads and docs: http://madlib.incubator.apache.org/
• Wiki: https://cwiki.apache.org/confluence/display/MADLIB/
Functions (Sept 2016)
Linear Systems
• Sparse and Dense Solvers
• Linear Algebra
Matrix Factorization
• Singular Value Decomposition (SVD)
• Low Rank
Generalized Linear Models
• Linear Regression
• Logistic Regression
• Multinomial Logistic Regression
• Ordinal Regression
• Cox Proportional Hazards Regression
• Elastic Net Regularization
• Robust Variance (Huber-White), Clustered Variance, Marginal Effects
Other Machine Learning Algorithms
• Principal Component Analysis (PCA)
• Association Rules (Apriori)
• Topic Modeling (Parallel LDA)
• Decision Trees
• Random Forest
• Support Vector Machines (SVM)
• Conditional Random Field (CRF)
• Clustering (K-means)
• Cross Validation
• Naïve Bayes
• Prediction Metrics
Descriptive Statistics
• Sketch-Based Estimators
‒ CountMin (Cormode-Muth.)
‒ FM (Flajolet-Martin)
‒ MFV (Most Frequent Values)
• Correlation and Covariance
• Summary
Utility Modules
• Array and Matrix Operations
• Sparse Vectors
• Random Sampling
• Probability Functions
• Data Preparation
• PMML Export
• Conjugate Gradient
• Stemming
• Sessionization
• Pivot
Inferential Statistics
• Hypothesis Tests
Time Series
• ARIMA
Path Functions
• Operations on Pattern Matches
Data Science Use-Cases
● Smarter Car
‒ Is the car functioning well?
‒ Do any of the parts need servicing or replacement?
‒ How are the new parts functioning? Are they better than the old parts? How is their performance relative to tests?
● Smarter Driver Response
‒ Understand drivers’ driving patterns and typical routes, and customize for a better driving experience (Advanced Driver Assistance Systems)
● Smarter Response to Surroundings
‒ How do we improve congestion forecasting and optimize routes better?
‒ How do we improve traffic management?
‒ How can city planning be improved by using very granular driving and traffic information?
Initial Sales
Web/Apps Logs
Demographics
CRM
Consumer Data
Surveys
Driving Behavior
Sales & Leasing
Car Data
Dealership
Service Data
Parts
Manufacturing
Telemetry Data
Weather
Traffic
Economic
External
Special Events
(Note: not an exhaustive list)
There’s a lot of data available
Example 1 - Smarter Car
Preventive Maintenance for Connected Cars
Diagnostic Trouble Codes (DTC)
Unscheduled repairs
AB1029 – Power steering pump replacement
CT3408 – Wheel alignment
Data Sources for Predictive Maintenance
Vehicle Data: VIN, Timestamp, DTC Code, Odometer, Speed, Acceleration, Engine Temperature, Engine Torque, GPS Coordinates, etc.
Car Repairs Data: VIN, Date vehicle in, Date vehicle out, Repair code, Parts replaced, Warranty claims, Repair Comments, etc.
Predicting Job Type from Diagnostic Trouble Codes (DTCs)
[Figure: timeline of DTCs (families B, P, C, U) logged ahead of each repair visit; job types shown: Transmission, Engine, Regular check]
Can the DTCs observed here predict this Job Type?
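One way to turn the question above into training data is to pair each repair visit with the DTCs logged for the same VIN in a window before the visit. The sketch below is only an illustration under stated assumptions: the rows, the 14-day window, and the function name `build_training_rows` are hypothetical and do not come from the slides.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical telemetry rows: (VIN, day the DTC was logged, DTC family).
dtc_events = [
    ("VIN001", date(2016, 3, 1), "B"),
    ("VIN001", date(2016, 3, 9), "P"),
    ("VIN001", date(2016, 3, 10), "C"),
    ("VIN001", date(2016, 5, 2), "U"),
    ("VIN002", date(2016, 4, 20), "B"),
]

# Hypothetical repair rows: (VIN, date vehicle in, job type).
repairs = [
    ("VIN001", date(2016, 3, 12), "Transmission"),
    ("VIN001", date(2016, 5, 5), "Engine"),
    ("VIN002", date(2016, 4, 25), "Regular check"),
]


def build_training_rows(dtc_events, repairs, window_days=14):
    """Pair each repair with the DTCs seen in the window before the visit."""
    by_vin = defaultdict(list)
    for vin, day, code in dtc_events:
        by_vin[vin].append((day, code))

    rows = []
    for vin, date_in, job_type in repairs:
        start = date_in - timedelta(days=window_days)
        # Keep distinct DTC families logged in [start, date_in).
        codes = sorted({code for day, code in by_vin[vin] if start <= day < date_in})
        rows.append({"vin": vin, "dtcs": codes, "job_type": job_type})
    return rows


training_rows = build_training_rows(dtc_events, repairs)
for row in training_rows:
    print(row)
```

The window length is a modeling choice: a short window keeps only DTCs close to the repair, while a longer one picks up earlier warning signs at the cost of more noise.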
Hierarchical Classification Framework
Vehicle Features: DTC codes + other features (e.g. mileage, vehicle model, previous repairs, ...)
Repair codes:
DF1210, DF1215, DF2980
AB1029, AB1622, AB1625, AB8622
CT3402, CT3408, CT3560, CT2409
1st stage: N one-vs-rest logistic regression models
2nd stage: N random forest models
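The first stage can be illustrated with a small one-vs-rest logistic regression: one binary model per job type, each trained to separate its class from all others. This is a minimal pure-Python sketch on hypothetical data, not the presenters' implementation. The sample rows, the feature encoding, and the training settings are assumptions, and the second stage (random forests) is omitted because the slides do not say how the two stages are connected.

```python
import math

DTC_FAMILIES = ["B", "C", "P", "U"]

# Hypothetical training data: DTC families seen before a visit, and the job type.
samples = [
    ({"B", "P", "C"}, "Transmission"),
    ({"P", "C"}, "Transmission"),
    ({"U"}, "Engine"),
    ({"U", "B"}, "Engine"),
    ({"B"}, "Regular check"),
    ({"B", "P"}, "Regular check"),
]


def to_vector(dtcs):
    """Encode a set of DTC families as 0/1 indicator features."""
    return [1.0 if family in dtcs else 0.0 for family in DTC_FAMILIES]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_binary(X, y, epochs=2000, lr=0.5):
    """Fit one logistic regression by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_one_vs_rest(samples):
    """Train one binary model per job type (this class vs. all others)."""
    X = [to_vector(dtcs) for dtcs, _ in samples]
    labels = sorted({label for _, label in samples})
    return {
        label: train_binary(X, [1.0 if lab == label else 0.0 for _, lab in samples])
        for label in labels
    }


def predict(models, dtcs):
    """Return the job type whose binary model gives the highest probability."""
    x = to_vector(dtcs)
    scores = {
        label: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        for label, (w, b) in models.items()
    }
    return max(scores, key=scores.get)


models = train_one_vs_rest(samples)
print(predict(models, {"U"}))
```

At the scale described in the deck, the same N models would more plausibly be fitted in-database (for example with the logistic regression listed among the MADlib functions) rather than in a Python loop.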
Your car will be repaired before you have a problem!
Example 2 - Smarter Driver Response
Unsupervised driving behavior analysis
Segmentation: From raw sensor data to driving scenes using HMM.
Feature Distribution: Quantization of physical features observed in each scene.
Driving topics: Scenes are represented as a combination of driving topics, which explain driving patterns.
Parallelism using:
PL/Python* (* HMM inference from pre-trained model)
PL/Python
[T. Bando, K. Takenaka, S. Nagasaka, T. Taniguchi, Unsupervised drive topic finding from driving behavioral data, IEEE Intelligent Vehicles Symposium, 2013]
HMM inference using PL/Python
Note: HMM parameters had been provided to us and loaded in the database.
hmmlearn library installed in every segment!
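To make the inference step concrete, here is a minimal sketch of decoding a state sequence from a pre-trained HMM and cutting it into scenes. It is an assumption-laden illustration: the slides use hmmlearn inside PL/Python, whereas this version is a self-contained Viterbi decoder over discrete observations, and the two states, the probabilities, and the observation sequence are all hypothetical.

```python
import math
from itertools import groupby

# Hypothetical pre-trained parameters for a 2-state HMM (0 = cruising, 1 = braking).
START_P = [0.8, 0.2]
TRANS_P = [[0.9, 0.1], [0.2, 0.8]]
# Emission probabilities over a quantized signal: 0 = low, 1 = high deceleration.
EMIT_P = [[0.9, 0.1], [0.2, 0.8]]


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence (log-space Viterbi)."""
    n_states = len(start_p)
    log = math.log
    score = [log(start_p[s]) + log(emit_p[s][obs[0]]) for s in range(n_states)]
    back = []
    for o in obs[1:]:
        prev, pointers, score = score, [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this step.
            best = max(range(n_states), key=lambda r: prev[r] + log(trans_p[r][s]))
            pointers.append(best)
            score.append(prev[best] + log(trans_p[best][s]) + log(emit_p[s][o]))
        back.append(pointers)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def to_scenes(path):
    """Collapse runs of the same state into (state, start, end) scenes."""
    scenes, start = [], 0
    for state, run in groupby(path):
        length = len(list(run))
        scenes.append((state, start, start + length))
        start += length
    return scenes


observations = [0, 0, 1, 1, 1, 0, 0]
path = viterbi(observations, START_P, TRANS_P, EMIT_P)
scenes = to_scenes(path)
print(path)    # [0, 0, 1, 1, 1, 0, 0]
print(scenes)  # [(0, 0, 2), (1, 2, 5), (0, 5, 7)]
```

Because the parameters are fixed in advance, each trip can be decoded independently, which is what makes it natural to run the inference in parallel across database segments.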
From time-series driving behavior into natural language
Latent Dirichlet Allocation (LDA)
Document = Scene
Word = Quantized sensor value
[D. Blei, Probabilistic topic models, Communications of the ACM, 2012]
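The document and word analogy can be sketched as follows: each sensor reading is binned into a discrete token, and a scene becomes a bag of those tokens. This is only an illustration; the sensor names, bin edges, and token format are hypothetical and not taken from the slides.

```python
from collections import Counter

# Hypothetical bin edges per sensor; values below the first edge are "low".
BIN_EDGES = {"speed": [30.0, 80.0], "brake": [0.2, 0.6]}
BIN_NAMES = ["low", "mid", "high"]


def quantize(sensor, value):
    """Map a raw sensor value to a discrete word such as 'speed_mid'."""
    index = sum(value >= edge for edge in BIN_EDGES[sensor])
    return f"{sensor}_{BIN_NAMES[index]}"


def scene_to_document(samples):
    """Turn the sensor samples of one scene into a bag of words."""
    return Counter(
        quantize(sensor, value)
        for sample in samples
        for sensor, value in sample.items()
    )


# One hypothetical scene: three time steps of speed (km/h) and brake pressure (0..1).
scene = [
    {"speed": 25.0, "brake": 0.7},
    {"speed": 28.0, "brake": 0.65},
    {"speed": 45.0, "brake": 0.1},
]
doc = scene_to_document(scene)
print(doc)
```

Word counts of this shape are the usual input to a topic model such as LDA, which would then describe each scene as a mixture of driving topics.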
Live Demo
Data Lake
Business Levers
Apps
MLlib, PL/X
Model Building
Model Tuning
Continuous Model Improvement
Data Feeds
Ingest, Filter, Enrich, Sink (Spring Cloud Data Flow)
Greenplum
Operationalization - Pipeline of a Data Science Driven App
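The Ingest, Filter, Enrich, Sink stages can be pictured as a chain of small streaming steps. The sketch below uses plain Python generators purely to show the shape of such a pipeline; it does not use Spring Cloud Data Flow, and the record format, the validity range, and the `harsh_driving` rule standing in for a trained model are all hypothetical.

```python
def ingest(raw_lines):
    """Parse raw 'VIN,speed' feed lines into event dictionaries."""
    for line in raw_lines:
        vin, speed = line.split(",")
        yield {"vin": vin, "speed_kmh": float(speed)}


def filter_valid(events):
    """Drop events whose speed reading is physically implausible."""
    for event in events:
        if 0.0 <= event["speed_kmh"] <= 300.0:
            yield event


def enrich(events, score):
    """Attach a model output to each event (score stands in for a trained model)."""
    for event in events:
        yield {**event, "harsh_driving": score(event)}


def sink(events, table):
    """Write enriched events to a destination (a list stands in for a database table)."""
    for event in events:
        table.append(event)
    return table


feed = ["VIN001,62.5", "VIN002,-1", "VIN001,148.0"]
table = sink(
    enrich(filter_valid(ingest(feed)), lambda e: e["speed_kmh"] > 130.0),
    [],
)
for row in table:
    print(row)
```

Keeping each stage as its own function mirrors the pipeline layout on the slide: a retrained model only replaces the scoring function passed to the enrich stage, and the other stages stay untouched.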
We will be able to improve your driving experience by preparing your car for the exact conditions you are
about to encounter.
It’s easy to make cars smarter - let’s make it happen!
Questions?
Additional resources & next steps
Read: Pivotal Data Science Blog https://blog.pivotal.io/channels/data-science-pivotal
Strategic: Pivotal Data Science Analytics Roadmapping Engagement https://pivotal.io/contact
Tune in: Next data science webinar “How Data Science can help with Fraud Detection and Cybersecurity” - Q1 2017 (Date TBD) https://pivotal.io/resources/1/webinars
Hands on:
HDB Sandbox on HDP VM https://network.pivotal.io/products/pivotal-hdb
Greenplum Sandbox https://network.pivotal.io/products/pivotal-gpdb
Apache MADlib (incubating) http://madlib.incubator.apache.org/