Data Driven: Toyota Customer 360 Insights on Apache Spark and MLlib (Brian Kursar, Toyota)
![Page 1: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/1.jpg)
TOYOTA Customer 360° on Apache Spark™
DATA DRIVEN
Brian Kursar, Sr Data Scientist Toyota Motor Sales IT Research and Development Final
![Page 2: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/2.jpg)
TOYOTA Big Data History
2015: C360 - Next Gen Insights Platform, Over 6B Records
2014: C360 - Customer Experience Analytics, Over 700M Records
2013: C360 - Toyota Social Media Intelligence Center, Over 500M Records
2012: Product Quality Analytics v2, Over 120M Records
2011: Marketing and Incentives Analytics, 70M Records
2010: Product Quality Analytics, Over 60M Records
![Page 3: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/3.jpg)
SPARK SUMMIT 2014
TEAM TOYOTA
R&D
Data Engineering
Infrastructure
Enterprise Architecture
![Page 4: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/4.jpg)
Genchi Genbutsu “Go Look, Go See”
![Page 5: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/5.jpg)
• Compute
• Streaming
• Machine Learning
ACTIONABLE INSIGHTS
![Page 6: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/6.jpg)
Customer Experience original Batch Job
160 hours (6.6 days)
Same job re-written using Apache Spark …
4 hours
Data Engineering
![Page 7: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/7.jpg)
Existing Tools
![Page 8: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/8.jpg)
Existing Tools
Jan – Feb Feb - Mar
+1% +1%
+1% +2%
Toyota Social Opinion 2013
![Page 9: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/9.jpg)
Toyota Online Conversations by the Numbers
2014 Study
40% Retailers Selling Toyota Vehicles
11% Opinions on Marketing Campaigns
10% Feedback on Dealer Sales and Service Experiences
9% Opinions on Product Styling and Features
8% People In Market for a Toyota
8% Incident Reports Involving a Toyota Vehicle
7% Feedback on Product Quality
5% Customers Advocating for the Brand
2% Completely Irrelevant
![Page 10: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/10.jpg)
Toyota Online Conversations by the Numbers
40% Retailers Selling Toyota Vehicles
11% Opinions on Marketing Campaigns
10% Feedback on Dealer Sales and Service Experiences
9% Opinions on Product Styling and Features
8% People In Market for a Toyota
8% Incident Reports Involving a Toyota Vehicle
7% Feedback on Product Quality
5% Customers Advocating for the Brand
2% Completely Irrelevant
2014 Study
50% Noise
![Page 11: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/11.jpg)
Categorize and Prioritize incoming Social Media interactions in Real-Time using Spark MLlib
Campaign Opinions
Customer Feedback
Product Feedback
Noise
![Page 12: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/12.jpg)
![Page 13: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/13.jpg)
First Spark MLlib Experiment
• Seat Cover Wrinkles/Cracking
• Brake Noise
• Shift Quality
• Oil Leaks
• HVAC Odor
• Dead Battery
• Rodent Wire Harness Damage
• Paint Chips
Time-box project to 12 Weeks
Classify at min 80% accuracy
![Page 14: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/14.jpg)
Describe this issue…
![Page 15: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/15.jpg)
Noise Brakes
![Page 16: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/16.jpg)
When I’m backing up in my 2012 Prius, it sounds like something hanging up or scraping as it rotates and only happens in the morning..
I hear a squeak coming from the back wheel of my Prius as I pull out from my driveway in the morning.
![Page 17: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/17.jpg)
Extract features from the Verbatim Text
Identify Training Data for Brake Noise Model
Extract Category (Label) matching Model Objective
![Page 18: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/18.jpg)
Social ML Pipeline
Labeled Data (Surveys, Hand Scored Social)
• Training selection filters
• Positive & Negative training Sets
Natural Language Processing Options
• Cleaning (#, ’, /, @)
• Stopwords
• Stemming
• N-grams extraction
• Extracted n-grams
Feature Engineering Options
• Statistical Selection (Chi Square)
• All, Top, Top Random, Random, Popular
Vectorization Options
• TF*IDF
• TF Options: Natural, Boolean, Log, Augmented
• IDF Options: Unary, Inverse, Inverse Smooth, ProbDF
Machine Learning Model
• Support Vector Machine (MLlib)
• Cross Validation: Repeat 10 times
• Training Set (9 Fold)
• Validation Set (1 Fold)
• Multiple Iteration: Optimal SVM Parameter
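The Natural Language Processing options on the pipeline slide (cleaning, stopwords, n-gram extraction) can be illustrated with a minimal Python sketch. It is not the code used in the talk: the stopword list and the function names are assumptions made for illustration, and stemming is left out because the deck does not say which stemmer was used.

```python
import re

# Illustrative stopword list (an assumption; the deck does not name one).
STOPWORDS = {"a", "an", "the", "of", "my", "i", "as", "from", "in", "it", "to"}

def clean(text):
    """Lowercase and replace the characters named on the slide (#, ', /, @) with spaces."""
    return re.sub(r"[#'’/@]", " ", text.lower())

def tokenize(text):
    """Split cleaned text into alphanumeric tokens and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", clean(text))
    return [t for t in tokens if t not in STOPWORDS]

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_ngrams(text, max_n=2):
    """Collect unigrams up to max_n-grams for one verbatim."""
    tokens = tokenize(text)
    features = []
    for n in range(1, max_n + 1):
        features.extend(ngrams(tokens, n))
    return features

print(extract_ngrams("I hear a squeak coming from the back wheel of my Prius"))
```

Bigrams such as "back wheel" keep some word order that single tokens lose, which is one reason n-gram extraction appears as its own pipeline step.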
![Page 19: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/19.jpg)
Extract Text Features
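The content of this slide is not recoverable from the transcript. As a stand-in, here is a minimal Python sketch of the TF and IDF weighting options named on the pipeline slide (Natural, Boolean, Log, Augmented; Unary, Inverse, Inverse Smooth, Prob). The formulas are common textbook definitions, assumed here rather than taken from the talk, and the function names are hypothetical.

```python
import math
from collections import Counter

def tf_weight(count, max_count, scheme="natural"):
    """Term-frequency weight for a term that occurs `count` times in a document."""
    if scheme == "natural":
        return float(count)
    if scheme == "boolean":
        return 1.0 if count > 0 else 0.0
    if scheme == "log":
        return 1.0 + math.log(count) if count > 0 else 0.0
    if scheme == "augmented":
        return 0.5 + 0.5 * count / max_count if count > 0 else 0.0
    raise ValueError("unknown TF scheme: " + scheme)

def idf_weight(df, n_docs, scheme="inverse"):
    """Inverse-document-frequency weight for a term found in `df` of `n_docs` documents."""
    if scheme == "unary":
        return 1.0
    if scheme == "inverse":
        return math.log(n_docs / df)
    if scheme == "inverse_smooth":
        return math.log(1.0 + n_docs / df)
    if scheme == "prob":
        return max(0.0, math.log((n_docs - df) / df)) if df < n_docs else 0.0
    raise ValueError("unknown IDF scheme: " + scheme)

def tfidf_vectors(docs, tf_scheme="natural", idf_scheme="inverse"):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        max_count = max(counts.values()) if counts else 1
        vectors.append({
            term: tf_weight(c, max_count, tf_scheme) * idf_weight(df[term], n_docs, idf_scheme)
            for term, c in counts.items()
        })
    return vectors

docs = [["brake", "squeak", "squeak"], ["brake", "noise"]]
print(tfidf_vectors(docs, tf_scheme="log", idf_scheme="inverse_smooth"))
```

With the plain inverse scheme, a term that appears in every document (here "brake") gets weight zero, so the vector is dominated by the terms that distinguish one verbatim from another.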
![Page 20: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/20.jpg)
Train Predictive Model
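The pipeline slide describes training with 10-fold cross validation (9 folds for training, 1 for validation, repeated 10 times). The Python sketch below shows only that splitting and scoring loop. The classifier is passed in as two hypothetical callables, `train_fn` and `predict_fn`, and the majority-class stand-in is an assumption for demonstration; the talk used MLlib's Support Vector Machine, which is not reproduced here.

```python
import random

def k_fold_splits(n_items, k=10, seed=0):
    """Yield (training_indices, validation_indices) for each of k folds."""
    if n_items < k:
        raise ValueError("need at least k items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

def cross_validate(examples, labels, train_fn, predict_fn, k=10):
    """Return the mean validation accuracy over k folds."""
    accuracies = []
    for train_idx, val_idx in k_fold_splits(len(examples), k):
        model = train_fn([examples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, examples[i]) == labels[i] for i in val_idx)
        accuracies.append(correct / len(val_idx))
    return sum(accuracies) / len(accuracies)

# Stand-in classifier (assumption): always predict the most common training label.
def train_majority(xs, ys):
    return max(set(ys), key=ys.count)

def predict_majority(model, x):
    return model

examples = list(range(20))
labels = [1] * 15 + [0] * 5
print(cross_validate(examples, labels, train_majority, predict_majority))
```

The "Multiple Iteration: Optimal SVM Parameter" step on the slide would correspond to calling `cross_validate` once per candidate parameter setting and keeping the setting with the best mean accuracy.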
![Page 21: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/21.jpg)
![Page 22: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/22.jpg)
Ver 1
56% Accuracy
![Page 23: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/23.jpg)
Ver 3
36% Accuracy
![Page 24: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/24.jpg)
Ver 8
35% Accuracy
![Page 25: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/25.jpg)
False Positives
I just had my friend at the toyota dealer rotate my tires and he said … that the brake pads are getting thin really fast. So what should I do when they get too thin in the future and start to squeak?
![Page 26: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/26.jpg)
False Positives
i cut the iac hose as shown in figure 20 in the manual but when i start the car, it started gasping for air... choking... sounds like it's about to die out.
i bought the power brake check valve (80190 part for kragen)... but either i'm not installing it right or it's the wrong size... i have no idea.
![Page 27: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/27.jpg)
Solution
![Page 28: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/28.jpg)
Explicit Semantic Analysis
![Page 29: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/29.jpg)
Noise Brakes
![Page 30: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/30.jpg)
Noise: squeak, grind, groan, squeal
Brakes: caliper, pads, rotor, wheel, drum
![Page 31: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/31.jpg)
Distance Similarity Between Concepts
Noise: squeak, grind, groan, squeal
Brakes: pads, caliper, rotor, wheel, drum
Distance is calculated between concepts based on the Minkowski distance formula.
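For reference, the Minkowski distance of order p between two vectors is the p-th root of the sum of the absolute coordinate differences raised to the power p; p = 1 gives the Manhattan distance and p = 2 the Euclidean distance. The Python sketch below implements that formula. The two concept vectors are made-up numbers for illustration, since the deck does not show how concepts were encoded.

```python
def minkowski_distance(u, v, p=2.0):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1")
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

# Hypothetical concept vectors (assumed values, not from the talk).
noise_concept = [0.9, 0.1, 0.4]
brakes_concept = [0.2, 0.8, 0.5]

print(minkowski_distance(noise_concept, brakes_concept, p=1))  # Manhattan (L1)
print(minkowski_distance(noise_concept, brakes_concept, p=2))  # Euclidean (L2)
```

A smaller distance means two concepts are closer, so a verbatim whose terms sit near both "Noise" and "Brakes" is a stronger brake-noise candidate than one that merely mentions the word "brake".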
![Page 32: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/32.jpg)
Ver 9
82% Accuracy
![Page 33: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/33.jpg)
![Page 34: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/34.jpg)
Kaizen
= Continuous Improvement
![Page 35: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/35.jpg)
Kaizen
Train
Test
Evaluate
Refine
![Page 36: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/36.jpg)
Social ML Pipeline
Labeled Data (Surveys, Hand Scored Social)
• Training selection filters
• Positive & Negative training Sets
Natural Language Processing Options
• Cleaning (#, ’, /, @)
• Stopwords
• Stemming
• N-grams extraction
• Extracted n-grams
Feature Engineering Options
• Statistical Selection (Chi Square)
• All, Top, Top Random, Random, Popular
Vectorization Options
• TF*IDF
• TF Options: Natural, Boolean, Log, Augmented
• IDF Options: Unary, Inverse, Inverse Smooth, ProbDF
Machine Learning Model
• Support Vector Machine (MLlib)
• Cross Validation: Repeat 10 times
• Training Set (9 Fold)
• Validation Set (1 Fold)
• Multiple Iteration: Optimal SVM Parameter
Synonyms, POS
Feature Manager
Negation Evaluator
Distance Similarity
• Manhattan Distance (L1): |x| + |y|
• Euclidean Distance (L2): √(|x|^2 + |y|^2)
Concept Manager Features
Concept Interpreter
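The "Statistical Selection (Chi Square)" step in the pipeline ranks candidate features by how strongly they are associated with the positive or negative training set. The Python sketch below uses the standard chi-square statistic for a 2x2 term/class contingency table. The function names and the toy documents are assumptions for illustration, not the implementation from the talk.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 table.

    a: positive docs containing the term   b: negative docs containing the term
    c: positive docs without the term      d: negative docs without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom

def top_terms_by_chi_square(pos_docs, neg_docs, top_k):
    """pos_docs / neg_docs: lists of token sets. Returns the top_k highest-scoring terms."""
    vocab = set(term for doc in pos_docs + neg_docs for term in doc)
    scores = {}
    for term in vocab:
        a = sum(term in doc for doc in pos_docs)
        b = sum(term in doc for doc in neg_docs)
        scores[term] = chi_square(a, b, len(pos_docs) - a, len(neg_docs) - b)
    # Sort by descending score, breaking ties alphabetically for a stable result.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

pos = [{"brake", "squeak"}, {"brake", "grind"}]
neg = [{"brake", "radio"}, {"paint", "radio"}]
print(top_terms_by_chi_square(pos, neg, top_k=3))
```

The statistic is two-sided: a term that appears only in the negative set scores as highly as one that appears only in the positive set, because both help separate the classes.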
![Page 37: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/37.jpg)
TODAY
![Page 38: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/38.jpg)
FUTURE
Manufacturing Data
Connected Vehicle Data
Consumer Data
![Page 39: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/39.jpg)
TEAM TOYOTA Spark Tips
• Education and Inclusion
• Pace Yourself
• Design and Plan a Transitional Architecture to Incrementally Introduce Spark elements into your Applications
• Use Joda Time for Date Comparisons
![Page 40: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/40.jpg)
TEAM TOYOTA Lessons Learned
• Be mindful of Akka versions when trying to build a new Spark release against a packaged Hadoop distribution
• Use Spark SQL rather than DSLs for joins
• Remember to configure Memory Fraction based on the size of your data.
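The slide does not name the setting behind "Memory Fraction". In Spark 1.x releases of that period, the relevant properties were `spark.storage.memoryFraction` (cached data) and `spark.shuffle.memoryFraction` (shuffle buffers), so the Python sketch below assumes those two. The helper name and the sanity check on the sum are illustrative choices, not Spark rules.

```python
def memory_fraction_flags(storage_fraction, shuffle_fraction):
    """Build spark-submit --conf flags for the Spark 1.x memory fraction properties."""
    for value in (storage_fraction, shuffle_fraction):
        if not 0.0 < value < 1.0:
            raise ValueError("fractions must be between 0 and 1")
    # Our own sanity check (an assumption): leave some heap for user code.
    if storage_fraction + shuffle_fraction >= 1.0:
        raise ValueError("storage and shuffle fractions should sum to less than 1")
    return [
        "--conf", f"spark.storage.memoryFraction={storage_fraction}",
        "--conf", f"spark.shuffle.memoryFraction={shuffle_fraction}",
    ]

# Example: a join-heavy job that caches little might shift memory toward shuffle.
print(" ".join(["spark-submit"] + memory_fraction_flags(0.4, 0.3)))
```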
![Page 41: Data Driven-Toyota Customer 360 Insights on Apache Spark and MLlib-(Brian Kursar, Toyota)](https://reader030.vdocuments.site/reader030/viewer/2022032506/55ce3c7abb61ebb3378b480d/html5/thumbnails/41.jpg)
Brian Kursar – Sr Data Scientist Toyota Motor Sales IT Research and Development @briankursar
Visit us at the TOYOTA Booth here at the Spark Summit Today.