data science in action
![Page 1: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/1.jpg)
Copyright © 2012, SAS Institute Inc. All rights reserved.
GOEDE TIJDEN SLECHTE TIJDEN, RESTAURANT REVIEWS,
BRAD PITT AND THE IKEA BILLY INDEX
Longhow Lam – Freelance Data Scientist
https://www.linkedin.com/in/longhowlam
https://longhowlam.wordpress.com
@longhowlam
![Page 2: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/2.jpg)
Data Science in Action
AGENDA
TEXT MINING AND MACHINE LEARNING
SOME CRAZY EXAMPLES
Goede tijden Slechte tijden
IENS Restaurant Reviews
Who looks like Brad Pitt?
The IKEA Billy Index
![Page 3: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/3.jpg)
Text mining and Machine Learning
![Page 4: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/4.jpg)
Text mining: simple example
Doc 1 “I walked accross the street in Amsterdam, 1057DK, with my bike”
Doc 2 “She didn’t walk but cycled with her blue biike, //bitly.com/sdrtw”
Doc 3 “My bicycle is broken, what a piece of junk, @#$%$@!”
| Terms | Doc 1 | Doc 2 | Doc 3 |
|---|---|---|---|
| +Bicycle (noun) | 1 | 1 | 1 |
| Cycling (verb) | 0 | 1 | 0 |
| Blue (adjective) | 0 | 1 | 0 |
| Amsterdam (location) | 1 | 0 | 0 |
| +Walk (verb) | 1 | 1 | 0 |
| Street (noun) | 1 | 0 | 0 |
| Broken (adjective) | 0 | 0 | 1 |
| Piece of junk (noun) | 0 | 0 | 1 |
| 1057DK (postal code) | 1 | 0 | 0 |
| //bitly.com/sdrtw | 0 | 1 | 0 |
TERM DOCUMENT MATRIX: A
• Every text document is a (very) long string (with many zeros!)
• Data mining techniques are applied to this matrix A
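A minimal sketch of how such a term document matrix could be built for the three example documents, using only the Python standard library. It is an illustration, not the SAS text mining used in this talk: the `tokenize` helper is a hypothetical, deliberately naive tokenizer, so unlike the table above it does not merge "bike", "biike" and "bicycle" into one term, and it does not recognise parts of speech, locations or postal codes.

```python
import re
from collections import Counter

docs = {
    "Doc 1": "I walked accross the street in Amsterdam, 1057DK, with my bike",
    "Doc 2": "She didn't walk but cycled with her blue biike, //bitly.com/sdrtw",
    "Doc 3": "My bicycle is broken, what a piece of junk, @#$%$@!",
}


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Term counts per document
counts = {name: Counter(tokenize(text)) for name, text in docs.items()}

# Vocabulary: every term that occurs in at least one document
terms = sorted(set().union(*counts.values()))

# Rows are terms, columns are documents (same orientation as the slide)
matrix = [[counts[name][term] for name in docs] for term in terms]

for term, row in zip(terms, matrix):
    print(f"{term:10s} {row}")
```

Even for three short sentences most cells are zero, which is the sparsity the slide points out.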
![Page 5: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/5.jpg)
TEXT MINING: PREDICT OR CLUSTER
Combine texts and “normal data” to predict behaviour (churn / fraud)
Use machine learning to train a learner f to predict the TARGET
Automatically create topics / clusters in huge piles of documents
Apply cluster techniques to divide documents into topics
Topic 1 Topic 2 Topic 3
![Page 6: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/6.jpg)
MACHINE LEARNING: SOME ALGORITHMS
Predict
Linear regression: $y = f(x) = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n$
Neural networks: $y = f(g(h(x)))$
Trees
Random Forests
Cluster
K-means
Hierarchical clustering
DBSCAN
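The two formulas on this slide can be sketched in a few lines of Python. This is an illustration only: the weights are made up, nothing is trained, and the helper names (`linear`, `layer`, `network`) are hypothetical. It shows that a linear regression is one weighted sum, while a neural network is a composition of layers, each of which applies weighted sums followed by an activation function.

```python
import math


def linear(x, a0, a):
    """Linear regression form: y = a0 + a1*x1 + ... + an*xn."""
    return a0 + sum(ai * xi for ai, xi in zip(a, x))


def layer(weights, biases, activation):
    """Build a function that maps an input vector to activation(W x + b)."""
    def apply(x):
        return [activation(linear(x, b, w)) for w, b in zip(weights, biases)]
    return apply


def identity(z):
    return z


# Made-up weights, for illustration only
h = layer([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], math.tanh)
g = layer([[1.0, 1.0], [-1.0, 2.0]], [0.1, -0.1], math.tanh)
f = layer([[2.0, -1.0]], [0.5], identity)


def network(x):
    """Neural network form: y = f(g(h(x)))."""
    return f(g(h(x)))[0]


print(linear([2.0, 3.0], 1.0, [0.5, -1.0]))  # 1 + 0.5*2 - 1*3 = -1.0
print(network([1.0, 2.0]))
```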
![Page 7: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/7.jpg)
![Page 8: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/8.jpg)
GTST ANALYSIS: TEXT ANALYTICS
Business pain
Looking at GTST (Goede Tijden, Slechte Tijden, a Dutch soap): what the heck is this all about?
Are there trends in the series, or is it all the same?
Approach
Take the 5000 summaries and apply text mining in SAS
![Page 9: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/9.jpg)
GTST ANALYSIS: RESULTS
Main topics in 5000 episodes
![Page 10: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/10.jpg)
GTST ANALYSIS: DISTANCES BETWEEN TOPICS
![Page 11: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/11.jpg)
GTST ANALYSIS: ZOOMING IN ON A TOPIC
![Page 12: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/12.jpg)
GTST ANALYSIS: ZOOMING IN ON A TOPIC
Sub-topics of main topic: topic 16 (Ludo, Isabelle, Martine, Janine)
Harmsen feeling lonely.
Plan by Jack, dangerous
Writing a farewell letter
Panic, fear,
Questions about giving kid assignment
Getting money back, paying
IMPORTANT: Business validation!
I asked my wife, she used to be a loyal GTST watcher
![Page 13: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/13.jpg)
GTST ANALYSIS: TREND RESULTS
Trends over time with SAS text profile feature
![Page 14: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/14.jpg)
GTST ANALYSIS: TRENDS OVER TIME
![Page 15: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/15.jpg)
GTST ANALYSIS: SIMILARITY OF EPISODES THROUGH THE YEARS
![Page 16: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/16.jpg)
Can you shake hands with your neighbor?
A LITTLE STATISTICAL EXPERIMENT
Two statistics that I like to share:
![Page 17: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/17.jpg)
Can you shake hands with your neighbor?
A LITTLE STATISTICAL EXPERIMENT
50.1% of people don’t wash their hands after visiting the toilet
![Page 18: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/18.jpg)
Can you shake hands with your neighbor?
A LITTLE STATISTICAL EXPERIMENT
50.1% of people don’t wash their hands after visiting the toilet
84.6% of all statistics are just made up on the spot!!
![Page 19: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/19.jpg)
![Page 20: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/20.jpg)
IENS RESTAURANT PATH ANALYTICS
Business pain
I have eaten Chinese, where should I go next?
Approach
Look at what others do: the IENS restaurant reviewers!
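One simple way to "look at what others do" is to count which cuisine reviewers visit directly after a given one. The sketch below assumes each reviewer's reviews are available as a cuisine sequence in time order. The `histories` data is entirely made up for illustration, and this is not necessarily the path analysis method used for the IENS data.

```python
from collections import Counter, defaultdict

# Hypothetical review histories: cuisines per reviewer, in time order
histories = {
    "reviewer_a": ["Chinese", "Italian", "French"],
    "reviewer_b": ["Chinese", "Italian", "Chinese", "Fish"],
    "reviewer_c": ["French", "Chinese", "Italian"],
}


def transition_counts(histories):
    """Count how often cuisine B directly follows cuisine A in a history."""
    counts = defaultdict(Counter)
    for visits in histories.values():
        for current, nxt in zip(visits, visits[1:]):
            counts[current][nxt] += 1
    return counts


def next_step_shares(counts, current):
    """Share of each next cuisine after `current`, most frequent first."""
    total = sum(counts[current].values())
    return {nxt: n / total for nxt, n in counts[current].most_common()}


counts = transition_counts(histories)
print(next_step_shares(counts, "Chinese"))  # {'Italian': 0.75, 'Fish': 0.25}
```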
![Page 21: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/21.jpg)
A FEW FACTS… IENS DATA (TRADITIONAL BI)
Most occurring restaurant name (39 times)
Among “Dutch” restaurants (6 times)
% Sustainable kitchens
Organic (67%)
French (58%)
Fish (44%)
Vegetarian (39%)
…
Chinese (3%)
700 reviews on a “normal” Saturday
Valentine 2015: 1200 reviews (1.7 times)
23 times
12 times
![Page 22: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/22.jpg)
IENS RESTAURANT PATH ANALYSIS: GENERATED PATHS
![Page 23: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/23.jpg)
IENS REVIEWS: CAN SENTIMENT BE PREDICTED?
Translate the reviews into a term document matrix
Apply machine learning to predict scores
Why would you do this?
![Page 24: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/24.jpg)
IENS REVIEWS: CAN I PREDICT THE SENTIMENT?
![Page 25: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/25.jpg)
IENS REVIEWS: PREDICT THE ‘EAT’ SCORE
Neural network (2 × 20): R² of 0.65
Linear regression model: R² of 0.56
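For reference, R² compares a model's squared prediction errors with those of always predicting the mean score. A minimal sketch with made-up scores (not the IENS data); the function name `r_squared` is chosen for illustration:

```python
def r_squared(actual, predicted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up review scores, for illustration only
given = [6.0, 7.0, 8.0, 9.0]
predicted = [6.5, 7.0, 7.5, 9.0]

print(r_squared(given, predicted))  # about 0.9
```

An R² of 1 means perfect predictions, and 0 means the model does no better than the mean, so 0.65 versus 0.56 says the neural network explains more of the variation in scores than the linear model.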
![Page 26: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/26.jpg)
IENS REVIEWS: PREDICTING THE ‘EAT’ SCORE
Predicted review score vs. given review score
![Page 27: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/27.jpg)
IENS REVIEWS: SENTIMENT ANALYSIS / PREDICTIVE MODELING
![Page 28: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/28.jpg)
![Page 29: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/29.jpg)
OUTLIERS IN FACES: DATA MINING & MACHINE LEARNING
Business pain
Tell me: Who has a strange face at SAS Netherlands?
Approach
Take SAS photos, translate them to data, and apply machine learning
![Page 30: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/30.jpg)
OUTLIERS IN FACES: DATA MINING & MACHINE LEARNING
![Page 31: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/31.jpg)
STRANGE FACE DETECTION: COMBO OF OPEN API & SAS
Use Face++ to do facial landmarking (no deep learning!!)
Import all landmarks in SAS as an ABT (analytical base table)
Now you can solve some funny business issues with machine learning:
Which persons are look-alikes? (Hierarchical clustering)
Are there any account managers? (Predictive modeling / machine learning)
Who is the Brad Pitt at SAS? (Nearest Neighbor)
Funny faces (Anomaly / outlier detection)
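Once every face is a row of numbers, the look-alike and funny-face questions become distance calculations. The sketch below uses a tiny made-up table: the three features per person are hypothetical stand-ins for landmark measurements, not Face++ output. It finds the nearest neighbor of a reference face and flags as the outlier the face with the largest mean distance to all others. That outlier score is one simple choice among many, not necessarily the method used in the talk.

```python
import math

# Hypothetical ABT: one row per person, made-up landmark-style features
abt = {
    "reference": [0.42, 0.31, 0.55],
    "person_a": [0.40, 0.30, 0.56],
    "person_b": [0.50, 0.35, 0.48],
    "person_c": [0.90, 0.10, 0.20],
}


def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_neighbor(abt, target):
    """Name of the person whose features are closest to `target`."""
    others = (name for name in abt if name != target)
    return min(others, key=lambda name: distance(abt[name], abt[target]))


def mean_distance_to_others(abt, name):
    """Average distance from `name` to every other person in the table."""
    others = [other for other in abt if other != name]
    return sum(distance(abt[name], abt[o]) for o in others) / len(others)


def strangest_face(abt):
    """Person with the largest mean distance to everyone else."""
    return max(abt, key=lambda name: mean_distance_to_others(abt, name))


print(nearest_neighbor(abt, "reference"))  # person_a
print(strangest_face(abt))                 # person_c
```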
![Page 32: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/32.jpg)
STRANGE FACE DETECTION: HIERARCHICAL CLUSTERING
![Page 33: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/33.jpg)
STRANGE FACE DETECTION: BRAD PITT LOOK-A-LIKES…
![Page 34: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/34.jpg)
STRANGE FACE DETECTION: OUTLIER DETECTION
![Page 35: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/35.jpg)
![Page 36: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/36.jpg)
IKEA WEBSITE: KEEP TRACK OF BILLY STOCK
Define the IKEA Billy Index as the change in stock over time
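As a sketch of that definition: given stock levels observed at regular intervals, the index is the difference between consecutive observations. The stock numbers are made up, and reading a drop as sales and a jump as a restock is an assumption made here, not something the slide states.

```python
def billy_index(stock_levels):
    """Change in stock between consecutive observations."""
    return [later - earlier
            for earlier, later in zip(stock_levels, stock_levels[1:])]


# Hypothetical stock levels for one store, one observation per day
stock = [120, 112, 101, 140, 133]

index = billy_index(stock)
print(index)  # [-8, -11, 39, -7]

# Assumption: negative changes are units sold, positive changes are restocks
estimated_sold = sum(-change for change in index if change < 0)
print(estimated_sold)  # 26
```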
![Page 37: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/37.jpg)
IKEA WEBSITE: THE IKEA BILLY INDEX
![Page 38: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/38.jpg)
THE BILLY INDEX: SOME STATISTICS
![Page 39: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/39.jpg)
Every one-unit increase in wind speed results in 19 fewer Billys sold
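A statement like this is how a regression slope is read. The sketch below fits a one-variable least-squares line. The data is constructed to lie exactly on a line with slope -19, purely so the interpretation is visible; it is not the data behind the slide, and the intercept of 200 is arbitrary.

```python
def ols_slope_intercept(x, y):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Constructed example data (not real): sales fall by 19 per unit of wind speed
wind_speed = [1, 2, 3, 4, 5]
billys_sold = [200 - 19 * w for w in wind_speed]

slope, intercept = ols_slope_intercept(wind_speed, billys_sold)
print(slope, intercept)  # -19.0 200.0
```

The slope is the expected change in Billys sold per extra unit of wind speed, with everything else in the model held fixed.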
![Page 40: Data science in action](https://reader031.vdocuments.site/reader031/viewer/2022020314/5a676ccf7f8b9a656a8b509b/html5/thumbnails/40.jpg)
Thanks for your attention, QUESTIONS?
Freelance Data Scientist, I am open to meeting for a cup of coffee sometime
https://www.linkedin.com/in/longhowlam
https://longhowlam.wordpress.com/
@longhowlam