@quora @qconsf 11/7/16 @nikhilgarg28 scaling quality on … · 2017-02-02 · scaling quality on...
![Page 1: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/1.jpg)
Scaling Quality On Quora Using Machine Learning
Nikhil Garg @nikhilgarg28
@Quora @QconSF 11/7/16
![Page 2: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/2.jpg)
Goals Of The Talk
● Introducing specific product problems we need to solve to stay high-quality.
● Describing our formulation and approach to these problems.
● Identifying common themes of ML problems in the quality domain.
● Sharing high level lessons that we have learnt over time.
![Page 3: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/3.jpg)
A bit about me...
● At Quora since 2012
● Currently leading two engineering teams:
○ ML Platform
○ Content Quality
● Interested in the intersection of distributed systems, machine learning and human behavior
@nikhilgarg28
![Page 4: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/4.jpg)
![Page 5: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/5.jpg)
To Grow And Share World’s Knowledge
![Page 6: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/6.jpg)
![Page 7: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/7.jpg)
![Page 8: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/8.jpg)
![Page 9: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/9.jpg)
![Page 10: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/10.jpg)
![Page 11: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/11.jpg)
Over 100 million monthly uniques
Millions of questions & answers
In hundreds of thousands of topics
Supported by 80 engineers
![Page 12: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/12.jpg)
ML @ Quora
![Page 13: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/13.jpg)
ML’s Importance For Quora
● ML is not just something we do on the side, it is mission critical for us.
● It’s one of the most important core competencies for us.
![Page 14: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/14.jpg)
Data: Billions of relationships
[Diagram: a graph connecting Users, Questions, Answers, Topics, Votes and Comments through relationships labeled Follow, Ask, Write, Cast, Have, Contain and Get]
![Page 15: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/15.jpg)
Data: Billions of words in high quality corpus
● Questions
● Answers
● Comments
● Topic biographies
● ...
![Page 16: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/16.jpg)
Data: Interaction History
● Highly engaged users => long history of activity, e.g. search queries, upvotes etc.
● Evergreen content => long history of users engaging with the content in search, feed etc.
![Page 17: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/17.jpg)
ML Applications At Quora
● Answer ranking
● Feed ranking
● Search ranking
● User recommendations
● Topic recommendations
● Duplicate questions
● Email Digest
● Request Answers
● Trending now
● Topic expertise prediction
● Spam, abuse detection
● ...
![Page 18: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/18.jpg)
ML Algorithms At Quora
● Logistic Regression
● Elastic Nets
● Random Forests
● Gradient Boosted Decision Trees
● Matrix Factorization
● (Deep) Neural Networks
● LambdaMART
● Clustering
● Random walk based methods
● Word Embeddings
● LDA
● ...
![Page 19: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/19.jpg)
What We Care About
Relevance
Quality
Demand
Is content high quality?
Is user an expert in the topic?
Do lots of people want to get answers to this question?
What is the search intent of the user?
Would user be interested in reading answer?
Would user be able to answer the question?
![Page 21: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/21.jpg)
1. Duplicate Question Detection
2. Answer Ranking
3. Topic Expertise Detection
4. Moderation
![Page 23: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/23.jpg)
Why Duplicate Questions Are Bad
● Energy of people who can answer the question gets divided
● No single question page becomes the best resource for that question
● People looking for answers have to search and read many question pages
● Bad experience if the “same” question shows up in feed again and again
● Search engines cannot rank any one page very highly
![Page 24: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/24.jpg)
Duplicate Question Detection
![Page 25: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/25.jpg)
Duplicate Question Detection
● Need to detect duplicates even before the question reaches the system.
● When a user adds a question, we search ALL our questions to check for duplicates.
● Latency: tens of milliseconds.
● ML algorithm aside, this is also a crazy hard engineering problem.
![Page 26: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/26.jpg)
Problem Statement
Detect if a new question is a duplicate of an existing question.
![Page 27: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/27.jpg)
Algorithmic Challenges
What is the Sun’s temperature?
How hot does the Sun get?
What is the average temperature of Sun?
What is the temperature of Sun’s surface and that of Sun’s core?
What is the hottest object in our solar system? How hot is it?
What is the temperature, pressure and density of Sun?
What is the temperature of a yellow star like our Sun?
● Syntax
● Semantics
● Generality
● High precision
● High recall
![Page 28: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/28.jpg)
Recent Work
![Page 29: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/29.jpg)
Our Approach
Problem Formulation
Binary classification on pairs of questions
Training Data Sources
Hand labeled data, Semi-supervised approaches to bootstrap data, Random negative sampling, User browse/search behavior, Language model on standard datasets, ...
Models
Logistic Regression, Random Forests, GBDT, Deep Neural Networks, Ensembles, ...
Features
Word embeddings, conventional IR features, usage based features, ...
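The pairwise formulation can be sketched in a few lines. This is only an illustration under stated assumptions: the two features (token Jaccard overlap and bag-of-words cosine), the toy labeled pairs in `PAIRS`, and the tiny logistic regression trained by gradient descent are all hypothetical stand-ins; the slides do not describe the actual features or model configuration.

```python
import math
from collections import Counter

def tokens(question):
    # Lowercase, split on whitespace, strip simple punctuation.
    stripped = (t.strip("?.,!'\"") for t in question.lower().split())
    return [t for t in stripped if t]

def pair_features(q1, q2):
    """Two hypothetical IR-style features for a pair of questions."""
    a, b = tokens(q1), tokens(q2)
    sa, sb = set(a), set(b)
    union = sa | sb
    jaccard = len(sa & sb) / len(union) if union else 0.0
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    cosine = dot / norm if norm else 0.0
    return [jaccard, cosine]

def logit(weights, bias, x):
    return bias + sum(w * xi for w, xi in zip(weights, x))

def predict(weights, bias, x):
    # Probability that the pair is a duplicate.
    return 1.0 / (1.0 + math.exp(-logit(weights, bias, x)))

def train(X, y, lr=0.5, epochs=500):
    """Plain batch gradient descent on the mean logistic loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        errors = [predict(weights, bias, x) - t for x, t in zip(X, y)]
        weights = [w - lr * sum(e * x[j] for e, x in zip(errors, X)) / n
                   for j, w in enumerate(weights)]
        bias -= lr * sum(errors) / n
    return weights, bias

# Toy, hand-labeled pairs (1 = duplicate, 0 = not); purely illustrative.
PAIRS = [
    ("How hot is the Sun?", "How hot is the Sun exactly?", 1),
    ("What is the capital of France?", "What is the capital city of France?", 1),
    ("How do I learn Python?", "How do I learn Python quickly?", 1),
    ("How hot is the Sun?", "What is the capital of France?", 0),
    ("How do I learn Python?", "How hot is the Sun?", 0),
    ("What is the capital of France?", "How do I learn Python?", 0),
]

X = [pair_features(q1, q2) for q1, q2, _ in PAIRS]
y = [label for _, _, label in PAIRS]
weights, bias = train(X, y)
```

Lexical features like these only capture the "syntax" challenge; pairs that are semantically equivalent but share few words would need something like the word embeddings mentioned on the slide.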
![Page 30: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/30.jpg)
Duplicate Questions: Problem Properties
● Judgements are hairy even for humans.
● Can’t optimize some user action directly.
● Training data is scarce -- need to fuse multiple data sources together.
![Page 31: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/31.jpg)
1. Duplicate Question Detection
2. Answer Ranking
3. Topic Expertise Detection
4. Moderation
![Page 32: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/32.jpg)
Given a question, how do you rank answers by quality?
![Page 33: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/33.jpg)
Problem Statement
Rank answers to a question by their “quality”
![Page 34: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/34.jpg)
![Page 35: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/35.jpg)
Previous Approach
A simple function of upvotes and downvotes, with some precomputed author priors.
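The slides do not give the actual function, so the following is only a guess at the general shape of such a baseline: a smoothed upvote ratio in which a precomputed author prior acts like a handful of pseudo-votes. The name `baseline_score` and the `prior_weight` parameter are hypothetical.

```python
def baseline_score(upvotes, downvotes, author_prior, prior_weight=5.0):
    """Smoothed upvote ratio in [0, 1].

    author_prior is a precomputed score in [0, 1]; it counts as
    prior_weight pseudo-votes, so it dominates for answers with few
    votes and fades as real votes accumulate. (Assumed form, not the
    function used in production.)
    """
    total = upvotes + downvotes + prior_weight
    return (upvotes + prior_weight * author_prior) / total

def rank_answers(answers):
    """answers: list of (answer_id, upvotes, downvotes, author_prior)."""
    return sorted(answers,
                  key=lambda a: baseline_score(a[1], a[2], a[3]),
                  reverse=True)
```

A function of this kind depends only on votes and author reputation, so it cannot tell a funny answer from a correct one, and a brand-new answer is ranked almost entirely by its author's prior.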
![Page 36: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/36.jpg)
Great Baseline, but...
● Popular answers != factually correct
● Joke answers get disproportionately many upvotes
● Expert answers ranked lower than answers by popular writers
● Rich get richer
● Poor ranking for new answers
● ...
![Page 37: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/37.jpg)
Why do all these problems exist?
● An upvote means different things to different people, e.g. funny, correct, useful.
● Doesn’t always correspond to quality
● ...what is quality?
![Page 38: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/38.jpg)
Defining High Quality Answers
● Answers the question
● Is factually correct
● Is clear and easy to read
● Supported with rationale
● Demonstrates credibility
● ...
![Page 39: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/39.jpg)
Answer Ranking: Formulation
● Item-wise regression on answers.
● Also tried item-wise multi-class classification on score buckets.
● Item-wise scoring enables comparing answers across different questions.
● Can also discover “Quora Gold” and “really bad” answers.
[Diagram: a Question whose Answer 1, Answer 2 and Answer 3 receive scores 0.9, 0.8 and 0.5]
![Page 40: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/40.jpg)
Answer Ranking: Evaluation
● R²
● Weighted R² with different weights for different parts of the quality spectrum
● NDCG
● Kendall’s Tau
● ...
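For reference, the metrics named above can be written out directly. This is a plain-Python sketch of the textbook definitions (NDCG with a log2 discount and linear gain, Kendall's tau-a, and a weighted R²); which exact variants and weights were used is not stated in the slides.

```python
import math

def dcg(relevances):
    # Discounted cumulative gain with linear gain and log2 discount.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(true_quality, predicted_scores):
    """Rank items by predicted score and compare against the ideal order."""
    order = sorted(range(len(true_quality)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_quality[i] for i in order]
    best = dcg(sorted(true_quality, reverse=True))
    return dcg(ranked) / best if best > 0 else 0.0

def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) / total pairs."""
    n = len(a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (a[i] - a[j]) * (b[i] - b[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    pairs = n * (n - 1) / 2
    return (concordant - discordant) / pairs if pairs else 0.0

def weighted_r2(y_true, y_pred, weights):
    """R² where each example's squared error is scaled by its weight."""
    wsum = sum(weights)
    mean = sum(w * y for w, y in zip(weights, y_true)) / wsum
    ss_res = sum(w * (y - p) ** 2 for w, y, p in zip(weights, y_true, y_pred))
    ss_tot = sum(w * (y - mean) ** 2 for w, y in zip(weights, y_true))
    return 1.0 - ss_res / ss_tot
```

R² scores the regression output itself, while NDCG and Kendall's tau only look at the induced ordering, which is why a deck like this lists both kinds.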
![Page 41: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/41.jpg)
Answer Ranking: Training Data
● Hand labeled data
● Language model on standard datasets
● Explicit quality survey shown to users
● Implicit data from usage
● Semi-supervised approaches for label propagation
● Surrogate learning (e.g. predicting if “topic experts” will upvote the answer)
● ...
![Page 42: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/42.jpg)
Answer Ranking: Features
We tested 100+ features; the final model uses ~50 features after feature pruning.
● User features -- e.g. “Is the author an expert in the topic?”
● Answer text features -- e.g. “What is the syntactic complexity of the text?”
● Question/Answer features -- e.g. “Is the answer answering the question?”
● Voter features -- e.g. “Is voter an expert in the topic?”
● Metadata features -- e.g. “How many answers did the question have when the answer was written?”
![Page 43: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/43.jpg)
Answer Ranking: Models
Models
● Logistic Regression
● Random Forests
● Gradient Boosted Decision Trees
● Recurrent Neural Networks
● Ensembles
● ...
GBDTs provide a good balance between accuracy, complexity, training time, prediction time and ease of deploying in production.
![Page 44: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/44.jpg)
Answer Ranking: Interpretability
![Page 45: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/45.jpg)
Our Approach
![Page 46: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/46.jpg)
Answer Ranking: Productionizing
● Latency: tens of milliseconds
● Computing 100 features each for 100 answers, even at 10 µs per feature computation, can take 100 ms.
● Need to parallelize computation, and also cache feature values/scores.
● Caching → need to support real-time cache invalidation (dirtying) and updates.
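The latency arithmetic and the caching requirement can be illustrated with a small sketch. The `ScoreCache` class and its method names are hypothetical; a production system would presumably use a shared cache service rather than an in-process dictionary.

```python
# 100 features x 100 answers x 10 microseconds each, computed sequentially.
FEATURES, ANSWERS, MICROS_PER_FEATURE = 100, 100, 10
sequential_ms = FEATURES * ANSWERS * MICROS_PER_FEATURE / 1000  # milliseconds

class ScoreCache:
    """Caches answer scores and recomputes only entries marked dirty."""

    def __init__(self, compute_score):
        self._compute = compute_score  # hypothetical expensive scoring function
        self._scores = {}
        self.computations = 0          # counts cache misses, for illustration

    def get(self, answer_id):
        if answer_id not in self._scores:
            self._scores[answer_id] = self._compute(answer_id)
            self.computations += 1
        return self._scores[answer_id]

    def mark_dirty(self, answer_id):
        # Called when something the score depends on changes (e.g. a new vote).
        self._scores.pop(answer_id, None)
```

With this shape, a page view only pays for answers whose inputs changed since the last view; everything else is a dictionary lookup.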
![Page 47: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/47.jpg)
Answer Ranking: Productionizing
● Trick -- don’t recompute scores if the changed feature value doesn’t flip any ‘decision branch’.
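One way to read this trick for a tree-based model such as a GBDT: a score can only change if the feature crosses some split threshold. The sketch below assumes splits of the form `value <= threshold` and a simplified tree representation (a list of `(feature, threshold)` split nodes per tree); both are assumptions, and the check is conservative because it considers every split on the feature, not just those on the path the answer currently takes.

```python
from bisect import bisect_left

def build_threshold_index(trees):
    """Map each feature to the sorted thresholds it is split on.

    trees: list of trees, each a list of (feature_name, threshold) nodes.
    """
    index = {}
    for tree in trees:
        for feature, threshold in tree:
            index.setdefault(feature, []).append(threshold)
    return {feature: sorted(ts) for feature, ts in index.items()}

def needs_rescore(index, feature, old_value, new_value):
    """True if moving old_value -> new_value flips any 'value <= t' test."""
    thresholds = index.get(feature, [])
    lo, hi = min(old_value, new_value), max(old_value, new_value)
    # A split on t flips exactly when lo <= t < hi.
    return bisect_left(thresholds, lo) < bisect_left(thresholds, hi)

# Hypothetical two-tree model, for illustration only.
TREES = [[("upvotes", 10), ("author_score", 0.5)], [("upvotes", 100)]]
INDEX = build_threshold_index(TREES)
```

For example, with the thresholds above an answer going from 11 to 12 upvotes crosses no split, so its cached score can be kept.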
![Page 48: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/48.jpg)
Answer Quality: Problem Properties
● Need to start with defining what we want the model to learn.
● Feature engineering and interpretability are important.
● Class imbalance for classification problems.
● Training data is scarce -- need to fuse multiple data sources together.
![Page 49: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/49.jpg)
1. Duplicate Question Detection
2. Answer Ranking
3. Topic Expertise Detection
4. Moderation
![Page 50: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/50.jpg)
![Page 51: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/51.jpg)
Topic Expertise Matters For Quality
● Important signal to all other quality systems.
● Can make content more trustworthy.
● Helps retain and engage experts.
![Page 52: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/52.jpg)
Topic Experts
Relevant Credentials
![Page 53: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/53.jpg)
Problem Statement
Predict the topic expertise level of users.
![Page 54: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/54.jpg)
Deducing Expertise From Topic Biography
“Researcher at MSR since 2005”
“ML Engineer at Quora”
“Taken undergraduate courses”
Degree of Expertise In Topic “Machine Learning”
“Invented AdaBoost”
“Learning machine programming”
![Page 55: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/55.jpg)
Our Approach
Problem Formulation
Multi-class classification on text of biography, classes being discrete buckets on the expertise spectrum.
Experts are sparse → class imbalance.
Training Data Sources
Hand labeled data, Data from other quality measures, Label propagation, Users can “report” bios...
![Page 56: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/56.jpg)
Our Approach
Models
Logistic Regression, Random Forests, Gradient Boosted Decision Trees, ...
Regularization is very important
Features
Ngrams, Named Entities, Cosine similarity between topic name and biography text, ...
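Two of the listed features are easy to sketch: word n-grams of the biography and the cosine similarity between the topic name and the biography text. The bag-of-words representation and the function names here are assumptions; the slides do not say how the text is vectorized.

```python
import math
from collections import Counter

def ngrams(text, n):
    """Lowercased word n-grams of a short text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def cosine_similarity(text_a, text_b):
    """Cosine between bag-of-words count vectors of two texts."""
    a, b = Counter(ngrams(text_a, 1)), Counter(ngrams(text_b, 1))
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def bio_features(topic, bio):
    """Hypothetical feature dictionary for one (topic, biography) pair."""
    return {
        "unigrams": ngrams(bio, 1),
        "bigrams": ngrams(bio, 2),
        "topic_cosine": cosine_similarity(topic, bio),
    }
```

On the biographies from the earlier slide, “Learning machine programming” scores high against the topic “Machine Learning” while “Invented AdaBoost” scores zero, which shows why lexical similarity alone cannot stand in for expertise and is combined with other features.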
![Page 57: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/57.jpg)
Deducing Expertise From Voting
● Andrew Ng is an ML expert → his ML answers must be good.
● What about answers upvoted by him?
● What about answers written by someone whose own answers were upvoted by him?
● ...
![Page 58: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/58.jpg)
Topic Expertise Propagation Graph
● Trust in expertise is transitive: A → B, and B → C ⇒ A → C
● So trust in expertise propagates through the network
● We can mine the graph to discover topic expertise using graph algorithms like PageRank
● Unsupervised learning!
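As a rough illustration of mining such a graph, here is a minimal power-iteration PageRank over (voter, author) endorsement edges. The edge semantics, the damping factor of 0.85, and the user names are assumptions; the slides only say that graph algorithms like PageRank are used, presumably on a graph built per topic.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """edges: (voter, author) pairs meaning 'voter upvoted author in this topic'."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for voter, author in edges:
        out_links[voter].append(author)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by users who upvoted nobody is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n
                    for node in nodes}
        for node in nodes:
            for target in out_links[node]:
                new_rank[target] += damping * rank[node] / len(out_links[node])
        rank = new_rank
    return rank

# Hypothetical endorsements within one topic.
EDGES = [("alice", "bob"), ("carol", "bob"), ("bob", "dave")]
RANKS = pagerank(EDGES)
```

In this toy graph nobody upvoted alice or carol, so they end up with the lowest scores, while dave inherits trust through bob without being upvoted by alice or carol directly.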
![Page 59: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/59.jpg)
Ensembles To Combine Submodels
● Models trained on heterogeneous data
● Models trained using supervised, semi-supervised and unsupervised approaches
● Low correlation between different models
● Can combine them using ensembles
![Page 60: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/60.jpg)
Topic Expertise: Problem Properties
● Class imbalance for classification problems.
● Unsupervised learning is powerful.
● Ensembles can help combine learners trained on data from different sources.
![Page 61: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/61.jpg)
1. Duplicate Question Detection
2. Answer Ranking
3. Topic Expertise Detection
4. Moderation
![Page 62: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/62.jpg)
Moderation
● Any user-generated-content product has lots of moderation challenges.
● Content -- spam, hate-speech, porn, plagiarism etc.
● Account -- fake names, impersonation, sockpuppets
● Quora specific policies -- e.g. answers making fun of questions, insincere questions.
![Page 63: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/63.jpg)
Moderation: ML Challenges
● Super nuanced judgements, too hard for even trained humans.
● Noisy labeled data
● Too hard a problem for a computer.
![Page 64: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/64.jpg)
Moderation: ML Challenges
● Difficulty in learning due to severe class imbalance.
● Metrics can be deceiving -- useless models at 99% accuracy.
● Labeled data is scarce, and class imbalance makes it worse.
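The "useless models at 99% accuracy" point is easy to reproduce. In this sketch the 1% positive rate and the always-negative "model" are made up for illustration.

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = bad content)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}

# 1,000 items of which 10 are bad; the "model" flags nothing at all.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000
report = evaluate(y_true, y_pred)
```

Here accuracy is 0.99 while recall and F1 are both 0: the model never catches a single bad item, which is why accuracy alone is the wrong metric under severe imbalance.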
![Page 65: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/65.jpg)
Moderation: ML Challenges
● High precision needed.
● Usually get extremely low recall at desired precision levels.
● Want to detect problems before users see bad content, so can’t rely on any user interactions.
![Page 66: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/66.jpg)
Moderation: Tricks and Approaches
● Start with looking at the right metrics.
● Can use standard metrics like F1, AUC.
● Or fine tune metrics based on your application needs.
![Page 67: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/67.jpg)
Moderation: Tricks and Approaches
● Oversampling minority class, undersampling majority class
● Random sampling to generate negative examples
● Higher cost of mistakes on the minority class.
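Two of these ideas, oversampling the minority class and charging more for minority-class mistakes, can be sketched as follows. The helper names are hypothetical, and the inverse-frequency weighting is one common convention rather than something stated in the slides.

```python
import random
from collections import Counter

def oversample_minority(examples, labels, seed=0):
    """Randomly duplicate examples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(examples), list(labels)
    for cls, count in counts.items():
        pool = [x for x, y in zip(examples, labels) if y == cls]
        for _ in range(target - count):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y

def class_weights(labels):
    """Inverse-frequency weights: rarer classes get a higher mistake cost."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}
```

Resampling should be applied to the training split only; evaluation still needs the real class distribution, otherwise the metrics discussed earlier become misleading again.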
![Page 68: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/68.jpg)
Moderation: Tricks and Approaches
● Successive iterations of: hand labeling → model to reduce the space → hand labeling ...
● Scarce data, high feature dimensionality → simple models with regularization often work very well
● Using low precision classifiers for making human review efficient
![Page 69: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/69.jpg)
Moderation: Problem Properties
● Severe class imbalance.
● Need to look at the “right” evaluation metrics.
● Sampling techniques can be very effective.
![Page 70: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/70.jpg)
Summary
What do all quality problems have in common?
![Page 71: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/71.jpg)
Machine Learning + Product Understanding
● Start with defining what we want the model to do.
● Product intuition is important for feature engineering.
● Interpretability of models is important for iteration.
![Page 72: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/72.jpg)
Training Data Scarcity
● Good training data is either unavailable or costly.
● Often need to combine data from different sources.
● Unsupervised/semi-supervised learning is very useful.
● Ensembles are your friends.
![Page 73: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/73.jpg)
Dealing With Class Imbalance
● Look at the right metrics.
● Re-sampling techniques can be very effective.
● Can incorporate “data collection” cost into the algorithm.
![Page 74: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/74.jpg)
Topics For Some Other Day
● More specific semi-supervised learning approaches.
● Combining Quality, Relevance and Demand together.
● Avoiding (and sometimes creating) feedback loops.
● Engineering challenges behind these ML problems.
![Page 75: @Quora @QconSF 11/7/16 @nikhilgarg28 Scaling Quality On … · 2017-02-02 · Scaling Quality On Quora Using Machine Learning Nikhil Garg @nikhilgarg28 @Quora @QconSF 11/7/16 Introducing](https://reader033.vdocuments.site/reader033/viewer/2022053017/5f1cc07dbb9662692c7261d2/html5/thumbnails/75.jpg)
Nikhil Garg
@nikhilgarg28
Thank You! Questions?
Standard Disclaimer: Quora Is Hiring :)