Strata 2016 - Lessons Learned from Building Real-Life Machine Learning Systems
![Page 1: Strata 2016 - Lessons Learned from building real-life Machine Learning Systems](https://reader034.vdocuments.site/reader034/viewer/2022042723/586e73a21a28ab99598b5525/html5/thumbnails/1.jpg)
Lessons Learned from building real-life Machine Learning Systems
Xavier Amatriain (@xamat) www.quora.com/profile/Xavier-Amatriain
3/29/16
A bit about Quora
Our Mission
“To share and grow
the world’s knowledge”
• Millions of questions & answers
• Millions of users
• Thousands of topics
• ...
What we care about
• Quality
• Relevance
• Demand
Lessons Learned
More Data vs. Better Models
More data or better models?
Really?
Anand Rajaraman: VC, Founder, Stanford Professor
More data or better models?
Sometimes, it’s not about more data
More data or better models?
Norvig: “Google does not have better algorithms, only more data”
Many features/low-bias models
More data or better models?
Sometimes, it’s not about more data
Sometimes you do need a (more) complex model
Better models and features that “don’t work”
● E.g. You have a linear model and have been selecting and optimizing features for that model
■ More complex model with the same features -> improvement not likely
■ More expressive features with the same model -> improvement not likely
● More complex features may require a more complex model
● A more complex model may not show improvements with a feature set that is too simple
Model selection is also about hyperparameter optimization
Hyperparameter optimization
● Automate hyperparameter optimization by choosing the right metric
○ But is it as simple as choosing the max?
● Bayesian Optimization (Gaussian Processes) is better than grid search
○ See Spearmint, hyperopt, AutoML, MOE...
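A minimal sketch of Bayesian optimization over a single hyperparameter, assuming a Gaussian-process surrogate with an RBF kernel and the expected-improvement acquisition function, searched over a fixed candidate grid. `validation_score` is a hypothetical stand-in for "train the model and compute the chosen metric on held-out data"; the kernel length scale, jitter, and budget are illustrative. Tools such as Spearmint or MOE (GP-based) and hyperopt (which uses tree-structured Parzen estimators instead) do this far more robustly.

```python
import math

def rbf(a, b, length_scale=0.2):
    # Squared-exponential (RBF) kernel between two scalar hyperparameter values
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(m):
    # Lower-triangular L with L @ L^T == m, for a symmetric positive-definite m
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low

def solve_lower(low, b):
    # Forward substitution for L x = b
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x

def solve_upper_t(low, b):
    # Back substitution for L^T x = b
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def gp_posterior(xs, ys, x, jitter=1e-4):
    # Zero-mean GP posterior mean and std at x; jitter keeps the kernel matrix well conditioned
    k = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(k)
    alpha = solve_upper_t(low, solve_lower(low, ys))   # K^{-1} y
    k_star = [rbf(a, x) for a in xs]
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best, xi=0.01):
    # EI for maximization: E[max(f(x) - best - xi, 0)] under a Gaussian posterior
    z = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf

def bayes_opt(objective, candidates, init, n_iter=10):
    xs = list(init)
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        best = max(ys)
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break
        # Evaluate the untried candidate with the highest expected improvement
        nxt = max(remaining, key=lambda c: expected_improvement(*gp_posterior(xs, ys, c), best))
        xs.append(nxt)
        ys.append(objective(nxt))
    i = ys.index(max(ys))
    return xs[i], ys[i]

# Hypothetical validation metric as a function of one hyperparameter
def validation_score(x):
    return 1.0 - (x - 0.3) ** 2

candidates = [i / 20 for i in range(21)]
best_x, best_y = bayes_opt(validation_score, candidates, init=[0.0, 1.0], n_iter=8)
```

The sketch simply returns the arg-max of the observed metric; it does not answer the slide's question of whether choosing the max is enough.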
Supervised plus Unsupervised Learning
Supervised/Unsupervised Learning
● Unsupervised learning as dimensionality reduction
● Unsupervised learning as feature engineering
● The “magic” behind combining unsupervised/supervised learning
○ E.g. 1: clustering + kNN
○ E.g. 2: Matrix Factorization
■ MF can be interpreted as
● Unsupervised:
○ Dimensionality reduction à la PCA
○ Clustering (e.g. NMF)
● Supervised:
○ Labeled targets ~ regression
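A minimal sketch of the supervised reading of matrix factorization: user and item factor vectors are fit to regress the observed (labeled) entries of a ratings matrix, while the learned factors are themselves a low-dimensional representation. Stochastic gradient descent is one common way to fit them (the slide does not say how); the toy ratings, rank, learning rate, and regularization are illustrative assumptions.

```python
import random

def train_mf(ratings, n_users, n_items, rank=2, lr=0.05, reg=0.01, epochs=1000, seed=0):
    # ratings: list of (user, item, value) observed entries, i.e. the labeled targets
    rng = random.Random(seed)
    p = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    q = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(p, q, u, i)
            for k in range(rank):
                pu, qi = p[u][k], q[i][k]
                # SGD step on squared error plus L2 regularization
                p[u][k] += lr * (err * qi - reg * pu)
                q[i][k] += lr * (err * pu - reg * qi)
    return p, q

def predict(p, q, u, i):
    # Predicted rating = dot product of user and item factors
    return sum(a * b for a, b in zip(p[u], q[i]))

def rmse(p, q, ratings):
    return (sum((r - predict(p, q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5

# Hypothetical 3x3 ratings matrix with 6 of 9 entries observed
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
p, q = train_mf(ratings, n_users=3, n_items=3)
unseen = predict(p, q, 0, 2)  # estimate for an entry that was never observed
```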
Supervised/Unsupervised Learning
● One of the “tricks” in Deep Learning is how it
combines unsupervised/supervised learning
○ E.g. Stacked Autoencoders
○ E.g. training of convolutional nets
Everything is an ensemble
Ensembles
● Netflix Prize was won by an ensemble
○ Initially BellKor was using GBDTs
○ BigChaos introduced an ANN-based ensemble
● Most practical applications of ML run an ensemble
○ Why wouldn’t you?
○ At least as good as the best of your methods
○ Can add completely different approaches (e.g. CF and content-based)
○ You can use many different models at the ensemble layer: LR, GBDTs, RFs, ANNs...
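A minimal sketch of a linear-regression ensemble layer (the "LR" option above) over two base models, fit by least squares on held-out predictions. Because the weight pairs (1, 0) and (0, 1) are among the solutions least squares considers, the blend's squared error on the data it was fit on can be no worse than the best single model's, which is one concrete reading of "at least as good as the best of your methods". The predictions and targets are made-up numbers, and the no-intercept blend is a simplifying assumption.

```python
def fit_blend(preds_a, preds_b, targets):
    # Least-squares weights for target ~ w_a * pred_a + w_b * pred_b (no intercept),
    # solved in closed form from the 2x2 normal equations
    saa = sum(a * a for a in preds_a)
    sbb = sum(b * b for b in preds_b)
    sab = sum(a * b for a, b in zip(preds_a, preds_b))
    say = sum(a * y for a, y in zip(preds_a, targets))
    sby = sum(b * y for b, y in zip(preds_b, targets))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("base model predictions are collinear")
    w_a = (sbb * say - sab * sby) / det
    w_b = (saa * sby - sab * say) / det
    return w_a, w_b

def sse(preds, targets):
    # Sum of squared errors
    return sum((p - y) ** 2 for p, y in zip(preds, targets))

# Hypothetical held-out predictions from two different approaches (e.g. CF and content-based)
cf      = [4.1, 3.0, 2.2, 4.8, 1.9]
content = [3.6, 3.4, 2.9, 4.1, 1.2]
truth   = [4.0, 3.5, 2.5, 5.0, 1.5]

w_a, w_b = fit_blend(cf, content, truth)
blend = [w_a * a + w_b * b for a, b in zip(cf, content)]
```

The guarantee holds on the fitting data only; on new data the blend is usually, not provably, better, which is why the weights are fit on held-out predictions rather than on the base models' training data.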
Ensembles & Feature Engineering
● Ensembles are the way to turn any model into a feature!
● E.g. Don’t know if the way to go is to use Factorization
Machines, Tensor Factorization, or RNNs?
○ Treat each model as a “feature”
○ Feed them into an ensemble
The Master Algorithm?
It definitely is the ensemble!
The pains & gains of Feature Engineering
Feature Engineering
● Main properties of a well-behaved ML feature
○ Reusable
○ Transformable
○ Interpretable
○ Reliable
● Reusability: You should be able to reuse features in different
models, applications, and teams
● Transformability: Besides directly reusing a feature, it should be easy to use a transformation of it (e.g. log(f), max(f), ∑_t f_t over a time window…)
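A minimal sketch of transformability, assuming a raw feature stored as timestamped values (here, hypothetical per-day upvote counts for one answer): the same raw series yields log, max, and sliding-window-sum variants without new instrumentation. The sketch uses log(1 + f) rather than log(f) so zero counts stay defined; that choice, the window length, and the data are assumptions.

```python
import math

# Hypothetical raw feature: (day, upvotes received that day)
raw = [(1, 3), (2, 0), (3, 7), (5, 2), (8, 10)]

def log_feature(value):
    # log(1 + f), defined for zero counts
    return math.log1p(value)

def max_feature(series):
    return max(v for _, v in series)

def window_sum(series, end_day, window):
    # Sum of f_t over the time window (end_day - window, end_day]
    return sum(v for t, v in series if end_day - window < t <= end_day)

last_3_days = window_sum(raw, end_day=5, window=3)  # days 3 and 5 fall in the window
```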
● Interpretability: In order to do any of the previous, you
need to be able to understand the meaning of features and
interpret their values.
● Reliability: It should be easy to monitor and detect bugs/issues
in features
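A minimal sketch of what "easy to monitor" can look like: compare a feature's missing-value rate and mean in a new batch against a reference batch and flag large shifts. The thresholds, the z-score test on the batch mean, and the data are illustrative assumptions, not a recommended alerting policy.

```python
import math

def summarize(values):
    # Missing rate, mean, std and count of the non-missing values
    # (assumes at least one value is present)
    present = [v for v in values if v is not None]
    missing_rate = 1 - len(present) / len(values)
    mean = sum(present) / len(present)
    var = sum((v - mean) ** 2 for v in present) / max(len(present) - 1, 1)
    return missing_rate, mean, math.sqrt(var), len(present)

def check_feature(reference, current, max_missing_increase=0.05, max_z=4.0):
    ref_missing, ref_mean, ref_std, _ = summarize(reference)
    cur_missing, cur_mean, _, n = summarize(current)
    alerts = []
    if cur_missing - ref_missing > max_missing_increase:
        alerts.append("missing-rate jump")
    # z-score of the current batch mean under the reference spread
    if ref_std > 0 and abs(cur_mean - ref_mean) / (ref_std / math.sqrt(n)) > max_z:
        alerts.append("mean shift")
    return alerts

reference = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0, 2.0, 2.0]
healthy = [2.0, 3.0, 1.0, 2.0, 2.0, 2.0, 3.0, 1.0]
# Hypothetical bug: a logging change drops values and multiplies the rest by 100
broken = [200.0, 300.0, None, None, 100.0, 200.0, 200.0, 300.0]
```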
Feature Engineering Example - Quora Answer Ranking
What is a good Quora answer?
• truthful
• reusable
• provides explanation
• well formatted
• ...
Feature Engineering Example - Quora Answer Ranking
How are those dimensions translated
into features?
• Features that relate to the answer
quality itself
• Interaction features
(upvotes/downvotes, clicks,
comments…)
• User features (e.g. expertise in topic)
Implicit signals beat explicit ones
(almost always)
Implicit vs. Explicit
● Many have acknowledged
that implicit feedback is more useful
● Is implicit feedback really always
more useful?
● If so, why?
Implicit vs. Explicit
● Implicit data is (usually):
○ Denser, and available for all users
○ More representative of user behavior vs. user reflection
○ More related to the final objective function
○ Better correlated with A/B test results
● E.g. rating vs. watching
Implicit vs. Explicit
● However
○ It is not always the case that direct implicit feedback correlates well with long-term retention
○ E.g. clickbait
● Solution:
○ Combine different forms of implicit + explicit feedback to better represent the long-term goal
Be thoughtful about your training data
Defining training/testing data
● Training a simple binary classifier for good/bad answer
○ Defining positive and negative labels -> Non-trivial task
○ Is this a positive or a negative?
■ funny uninformative answer with many upvotes
■ short uninformative answer by a well-known expert in the field
■ very long informative answer that nobody reads/upvotes
■ informative answer with grammar/spelling mistakes
■ ...
Other training data issues: Time traveling
● Time traveling: usage of features that originated after the event you are trying to predict
○ E.g. Your upvoting an answer is a pretty good prediction of you reading that answer, especially because most upvotes happen AFTER you read the answer
○ Tricky when you have many related features
○ Whenever I see an offline experiment with huge wins, I ask: “Is there time traveling?”
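A minimal sketch of point-in-time feature extraction, one common guard against time traveling: every feature for a training example is computed only from events whose timestamp is strictly before that example's prediction time. The event-log format, field names, and feature names are hypothetical.

```python
from collections import namedtuple

Event = namedtuple("Event", ["user", "answer", "kind", "ts"])

def features_as_of(events, user, answer, prediction_ts):
    # Only events strictly before the prediction time may contribute to features
    past = [e for e in events if e.ts < prediction_ts]
    return {
        "user_upvotes_so_far": sum(1 for e in past if e.user == user and e.kind == "upvote"),
        "answer_upvotes_so_far": sum(1 for e in past if e.answer == answer and e.kind == "upvote"),
    }

events = [
    Event("u1", "a9", "upvote", ts=5),
    Event("u1", "a1", "read", ts=10),    # the event we want to predict
    Event("u1", "a1", "upvote", ts=12),  # happens AFTER the read, so it must not leak
    Event("u2", "a1", "upvote", ts=3),
]

# Features for the label "u1 reads a1", computed as of that event's own timestamp
feats = features_as_of(events, "u1", "a1", prediction_ts=10)
```

Calling the same function with no cutoff would count u1's later upvote on a1, which is exactly the leak the slide describes.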
Your Model will learn what you teach it to learn
Training a model
● Model will learn according to:
○ Training data (e.g. implicit and explicit)
○ Target function (e.g. probability of user reading an answer)
○ Metric (e.g. precision vs. recall)
● Example 1 (made up):
○ Optimize probability of a user going to the cinema to
watch a movie and rate it “highly” by using purchase history
and previous ratings. Use NDCG of the ranking as final
metric using only movies rated 4 or higher as positives.
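A minimal sketch of the metric half of the made-up Example 1: NDCG@k over a ranked list with binary relevance, where only movies rated 4 or higher count as positives. The log2 discount is the common textbook form; the ratings are invented.

```python
import math

def dcg(relevances):
    # Discounted cumulative gain with the standard log2(rank + 1) discount, ranks starting at 1
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))

def ndcg_at_k(ranked_ratings, k, positive_threshold=4):
    # Binary relevance: a movie is a positive only if the user rated it >= threshold
    rels = [1.0 if r >= positive_threshold else 0.0 for r in ranked_ratings]
    ideal_dcg = dcg(sorted(rels, reverse=True)[:k])
    return dcg(rels[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical ratings of the movies, in the order the model ranked them
ranking = [5, 2, 4, 3, 1]
score = ndcg_at_k(ranking, k=3)
```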
Example 2 - Quora’s feed
● Training data = implicit + explicit
● Target function: Value of showing a story to a user ~ weighted sum of actions: $v = \sum_a v_a \mathbb{1}\{y_a = 1\}$
○ predict probabilities for each action, then compute expected value: $v_{\text{pred}} = \mathbb{E}[V \mid x] = \sum_a v_a \, p(a \mid x)$
● Metric: any ranking metric
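A minimal sketch of the expected-value target: given per-action probabilities p(a | x) from some model and chosen action values v_a, score each story by the sum over actions of v_a · p(a | x) and rank by that score. The action names, values, and probabilities are hypothetical; the talk does not give Quora's actual actions or weights.

```python
# Hypothetical value v_a of each action (negative for actions we want to avoid)
ACTION_VALUES = {"upvote": 1.0, "answer": 3.0, "share": 2.0, "downvote": -2.0}

def expected_value(action_probs):
    # v_pred = E[V | x] = sum_a v_a * p(a | x)
    return sum(ACTION_VALUES[a] * p for a, p in action_probs.items())

def rank_stories(predictions):
    # predictions: story id -> {action: predicted probability}
    return sorted(predictions, key=lambda s: expected_value(predictions[s]), reverse=True)

predictions = {
    "story_a": {"upvote": 0.30, "answer": 0.01, "share": 0.02, "downvote": 0.05},
    "story_b": {"upvote": 0.10, "answer": 0.10, "share": 0.05, "downvote": 0.01},
    "story_c": {"upvote": 0.50, "answer": 0.00, "share": 0.00, "downvote": 0.30},
}
order = rank_stories(predictions)
```

Here story_c has the highest upvote probability but ranks last, because its predicted downvotes outweigh it; a single-action target would have ranked it first.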
Offline testing
● Measure model performance, using (IR) metrics
● Offline performance = indication to make decisions on follow-up A/B tests
● A critical (and mostly unsolved) issue is how offline metrics correlate with A/B test results.
Learn to deal with Presentation Bias
2D Navigational modeling
[Figure: 2D page layout shaded from regions users are “more likely to see” to regions they are “less likely” to see]
The curse of presentation bias
● User can only click on what you decide to show
● But, what you decide to show is the result of what your model predicted is good
● Simply treating things you show as negatives is not likely to work
● Better options
○ Correcting for the probability a user will click on a position -> Attention models
○ Explore/exploit approaches such as MAB
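A minimal sketch of correcting for position using inverse propensity weighting: each click is divided by an assumed probability that the user examined that position, so items shown low on the page are not penalized just for being seen less. The examination probabilities are made up here (in practice they have to be estimated), and this simple estimator is a stand-in for, not a description of, the attention models mentioned above.

```python
from collections import defaultdict

# Assumed probability that a user examines each position (0 = top of the page)
EXAMINE_PROB = [1.0, 0.6, 0.35, 0.2]

def debiased_ctr(impressions):
    # impressions: list of (item, position, clicked) logged from the live ranking
    weighted_clicks = defaultdict(float)
    shows = defaultdict(int)
    for item, pos, clicked in impressions:
        shows[item] += 1
        if clicked:
            # Inverse propensity weight: a click at a rarely examined position counts more
            weighted_clicks[item] += 1.0 / EXAMINE_PROB[pos]
    return {item: weighted_clicks[item] / shows[item] for item in shows}

impression_log = (
    [("a", 0, True)] * 2 + [("a", 0, False)] * 2     # item a always shown at the top
    + [("b", 3, True)] + [("b", 3, False)] * 7       # item b always shown at the bottom
)
corrected = debiased_ctr(impression_log)
```

The naive click-through rates are 0.5 for a and 0.125 for b; after correction b (0.625) edges out a (0.5), because its single click came from a position users rarely examine.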
You don’t need to distribute your ML algorithm
Distributing ML
● Most of what people do in practice can fit into a multi-core machine
○ Smart data sampling
○ Offline schemes
○ Efficient parallel code
● Dangers of “easy” distributed approaches such
as Hadoop/Spark
● Do you care about costs? How about latencies?
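“Smart data sampling” can be as simple as keeping a fixed-size uniform sample of a stream too large to hold in memory. A minimal sketch using reservoir sampling (Algorithm R), one standard technique for this and not necessarily what the talk had in mind; the stream and sample size are illustrative.

```python
import random

def reservoir_sample(stream, k, seed=None):
    # Uniform random sample of k items from a stream of unknown length,
    # in one pass and O(k) memory
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1), replacing a random slot
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample

sample = reservoir_sample(range(100_000), k=1000, seed=42)
```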
Distributing ML
● Example of optimizing computations to fit them into
one machine
○ Spark implementation: 6 hours, 15 machines
○ Developer time: 4 days
○ C++ implementation: 10 minutes, 1 machine
● Most practical applications of Big Data can fit into
a (multicore) implementation
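Putting the slide's numbers side by side (and leaving the 4 days of developer time out of the comparison):

```latex
\text{wall-clock: } \frac{6\ \text{h}}{10\ \text{min}} = \frac{360\ \text{min}}{10\ \text{min}} = 36\times
\qquad
\text{compute: } \frac{6\ \text{h} \times 15\ \text{machines}}{10\ \text{min} \times 1\ \text{machine}}
= \frac{5400\ \text{machine-min}}{10\ \text{machine-min}} = 540\times
```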
The untold story of Data Science and ML engineering
Data Scientists and ML Engineers
● We all know the definition of a Data Scientist
● Where do Data Scientists fit in an organization?
○ Many companies struggling with this
● Valuable to have strong DS who can bring value
from the data
● Strong DS with solid engineering skills are unicorns, and finding them is not scalable
○ DS need engineers to bring things to production
○ Engineers already have too much on their plate to be willing to “productionize” cool DS projects
The data-driven ML innovation funnel
Data Research
ML Exploration - Product Design
AB Testing
Data Scientists and ML Engineers
● Solution:
○ (1) Define different parts of the innovation funnel
■ Part 1. Data research & hypothesis building -> Data Science
■ Part 2. ML solution building & implementation -> ML Engineering
■ Part 3. Online experimentation, AB Testing analysis -> Data Science
○ (2) Broaden the definition of ML Engineers to include everyone from coding experts with high-level ML knowledge to ML experts with good software skills
[Figure: Data Research (Data Science) -> ML Solution (ML Engineering) -> AB Testing (Data Science)]
Conclusions
● In data, size is not all that matters
● Understand dependencies between data, models & systems
● Choose the right metric & optimize what matters
● Be thoughtful about
○ your ML infrastructure/tools
○ how you organize your teams
Questions?