TRANSCRIPT
1. Authors: Gideon Dror, Yehuda Koren, Yoelle Maarek, Idan Szpektor
Publication: KDD 2011
Presenter: Po-chih Chen
I Want to Answer, Who Has a Question? Yahoo! Answers Recommender System
Copyright 2011 ACM
2. Outline
Introduction
Related work and background
Problem characterization
A multi-channel recommender system model
Empirical study
Conclusions
3. Introduction
In spite of the continuous progress of Web search engines, many of
users' needs still remain unanswered. Possible reasons:
The intent behind the query is not well expressed.
The relevant content is absent.
While Community Question Answering (CQA) sites can feature factoid
questions, their primary goal is to satisfy needs such as:
Opinion seeking
Recommendation
Open-ended questions
Problem solving
4. Introduction (cont.)
Searching for questions to answer is a different challenge than
regular Web search, as users are driven by more than content
similarity or page popularity.
The figure shows a snapshot of a list of recent questions at a
given time.
Waiting for questions which you want to answer?
5. Introduction (cont.)
This paper addresses the needs of users in an "answering mood" by
suggesting the right questions to potential answerers.
The authors propose a multi-channel recommender (MCR) system.
MCR accounts for the multiple dimensions of the data
6. Related work and background
Yahoo! Answers
It is currently the largest existing CQA site.
Life cycle of a question thread:
The question remains open for four days, with an option for
extension, or for less if the asker chooses a best answer.
If no best answer is chosen, the question goes in-voting.
After these steps, the question is considered resolved.
It has high variance in perceived question and answer
quality.
Their research focuses on a complementary task: matching questions
to users before answers are written.
7. Related work and background (cont.)
Recommender Systems
Recommender systems are based on two different strategies:
Collaborative filtering (CF)
It relies on analyzing relationships between users and
interdependencies among products in order to identify new user-item
matches.
Content analysis (CA)
These techniques create a characterizing profile for each user or
product. The resulting profiles allow programs to associate users
with matching products.
Useful for solving cold-start scenarios
8. Recommender Systems
The two primary schools of CF are:
Latent factor models
They explain ratings by characterizing both items and users on
factors inferred from rating patterns.
Neighborhood methods
They compute the relationships among users, estimating unknown
ratings based on the recorded ratings of like-minded users.
Their method introduces a novel, symmetric integration of CF with
CA approaches that allows exploiting behavioral signals together
with user- and question-attributes.
9. Problem characterization
The task of recommending questions brings challenges that are less
well addressed, which induce unique design criteria for their
model.
The first factor to consider is that different families of item
descriptors need to be exploited.
A second factor comes from the need to account for the multiple
kinds of interactions, of different intensities, between users and
questions.
When data per user and item is scarce, exploiting these diverse
types of user-item interactions is vital.
10. A multi-channel recommender system model
This section introduces the Multi-Channel Recommender system model
(MCR) for assessing the match between a user and a question. It
covers:
How questions and users are mapped into their attribute
representation
Question attributes
User attributes
How multiple features are derived from the multi-channel attributes
of the users and questions
Interaction features
Bias features
How user- and question-specific features are incorporated into MCR
and how the model is trained
11. Question Attributes
Question attributes are split into three
families: textual, categories and user IDs
Textual Family
This family encodes textual information and takes text tokens as
values.
For each text block, the tokenizer annotates each word with its
part-of-speech (POS) tag and lemma.
The extracted terms are counted separately within each field,
producing four sets of (term, count) as values of four
attributes
Then they filter out non-representative terms
12. Question Attributes (cont.)
Then they filter out non-representative terms
For every question, they retain only terms that are nouns, verbs
or adjectives, based on their POS tags.
Then, each term t is ranked by its usefulness L(t).
They define usefulness as the entropy of the distribution of
categories given t:
$L(t) = -\sum_{c \in C} P(c \mid t) \log P(c \mid t)$
C is the set of all categories in Yahoo! Answers,
#c(t) is the number of times term t appeared in text fields within
category c, and
$P(c \mid t) = \#c(t) / \sum_{c' \in C} \#c'(t)$
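The usefulness score can be sketched in Python. This is a minimal illustration, assuming the standard Shannon entropy of P(c | t), with P(c | t) estimated as #c(t) divided by the total count of t over all categories. The function name, the input format, the toy counts and the natural logarithm are assumptions, not details taken from the paper.

```python
import math

def term_entropy(category_counts):
    """Entropy of the category distribution given a term.

    category_counts maps category -> #c(t), the number of times the term
    appeared in text fields within that category (hypothetical input format).
    """
    total = sum(category_counts.values())
    entropy = 0.0
    for count in category_counts.values():
        if count > 0:
            p = count / total  # estimate of P(c | t)
            entropy -= p * math.log(p)
    return entropy

# Toy counts, made up for illustration only.
focused = {"Sports": 98, "Travel": 1, "Pets": 1}
spread = {"Sports": 10, "Travel": 10, "Pets": 10}

focused_entropy = term_entropy(focused)
spread_entropy = term_entropy(spread)
```

A term concentrated in one category yields a lower entropy than a term spread evenly over categories, so the score separates category-specific terms from generic ones.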
13. Question Attributes (cont.)
Category Family
The category family reflects the category of the question, which
the asker has to select from a predefined taxonomy.
They use the user-selected category as a direct attribute, and also
add the parent and grandparent categories, when available, in order
to inherit semantic similarities.
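The category attributes described above could be derived as in the following sketch. The taxonomy slice, the attribute names and the function name are hypothetical; only the idea of adding parent and grandparent categories when available comes from the slides.

```python
# Hypothetical slice of a category taxonomy: child -> parent.
PARENT = {
    "Dog Training": "Dogs",
    "Dogs": "Pets",
}

def category_attributes(category, parent_of):
    """Return the category, parent and grandparent attributes of a question.

    Levels that do not exist in the taxonomy are omitted.
    """
    attributes = {"category": category}
    parent = parent_of.get(category)
    if parent is not None:
        attributes["parent_category"] = parent
        grandparent = parent_of.get(parent)
        if grandparent is not None:
            attributes["grandparent_category"] = grandparent
    return attributes

deep = category_attributes("Dog Training", PARENT)
top = category_attributes("Pets", PARENT)
```

With this layout, two questions in sibling categories still share their parent attribute, which is one way to read "inherit semantic similarities".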
14. Question Attributes (cont.)
User IDs Family
Users can interact with a question in various ways, each deserving
a different treatment:
asker: the user asking the question
best answerer: the user who provided the best answer
answerers: other users who answered the question
question voters: users who starred the question as
interesting
answer voters: users who voted on the quality of individual answers
(by thumb up/down votes)
best answer selectors: users who participated in the best answer
voting process
question tracers: users who requested to receive updates on the
question
15. Question Attributes (cont.)
Formal question attributes model
A question q is described by an attribute matrix Qq.
The d1 columns of the matrix correspond to the individual textual
tokens, categories and users.
The d2 rows correspond to the attributes.
Qq[i][j] holds the count for term j of attribute i.
Example:
Qq[title][football] = 1, with d2 = 14
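Because most entries of such a matrix are zero, a sparse layout is a natural fit. The sketch below assumes a dictionary from attribute name to a counter of values; the attribute names other than "title", the tokens and the user ID are made up for illustration.

```python
from collections import Counter

def build_question_matrix(fields):
    """Build a sparse attribute matrix: attribute name -> Counter of values."""
    return {attribute: Counter(values) for attribute, values in fields.items()}

# Toy question with hypothetical attribute names and values.
question = build_question_matrix({
    "title": ["football", "rule"],
    "body": ["football", "offside", "football"],
    "category": ["Sports"],
    "asker": ["user_17"],
})

title_football = question["title"]["football"]
body_football = question["body"]["football"]
missing = question["title"]["offside"]  # Counter returns 0 for absent values
```

Here `question["title"]["football"]` plays the role of Qq[title][football] from the slide.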
16. User attributes
Users may explicitly pick their preferences over attributes within
each of the attribute families.
Question-driven attributes
They do not want to arbitrarily weigh the relative importance of
each type of interaction with questions and answers.
They keep the interaction types separate by adding another
dimension to the user repository, called channels.
Channels qualify the user's interaction with the questions:
asked
best answered
answered
voted on question
voted on an associated answer
voted on best answer
traced the question
17. User attributes (cont.)
Question-driven attributes (cont.)
Channels serve a different purpose: associating a user with
questions
Each channel aggregates properties from the questions corresponding
to a certain kind of interaction
The model describes 49 kinds of user-user interaction
Cartesian product of the two identical 7-tuples
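The count of 49 can be checked by enumerating the pairs. The labels below are shorthand for the seven interaction types listed on the earlier slides, and reading each pair as (channel of the target user, role another user played on the same question) is an assumption of this sketch.

```python
from itertools import product

# Shorthand labels for the seven interaction types from the slides.
INTERACTIONS = (
    "asker", "best answerer", "answerer", "question voter",
    "answer voter", "best answer selector", "question tracer",
)

# Cartesian product of the 7-tuple with itself.
user_user_interactions = list(product(INTERACTIONS, INTERACTIONS))
count = len(user_user_interactions)
```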
Explicit user attributes
One more channel expresses direct user preferences.
A user can explicitly specify which keywords and categories they
are interested in, or which other users they would like to follow.
Textual and category families in this channel remain empty.
18. User attributes (cont.)
Formal user attributes model
A user u is represented by a 3-dimensional tensor.
The first dimension corresponds to the channels of interaction
(d3 = 8).
The other two dimensions correspond to attributes and values, in
analogy to the question representation.
The slice for channel c is built from the set of questions with
which user u interacted through channel c.
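A minimal sketch of the user representation follows, assuming that each channel simply sums the attribute counts of the questions the user touched through it. The plain sum, the channel labels and the input format are assumptions; the slides only say that a channel aggregates properties of the corresponding questions.

```python
from collections import Counter, defaultdict

def build_user_tensor(interactions):
    """Aggregate question matrices into channel -> attribute -> Counter.

    interactions is a list of (channel, question_matrix) pairs, where each
    question_matrix maps attribute -> Counter of values (hypothetical format).
    """
    tensor = defaultdict(lambda: defaultdict(Counter))
    for channel, question_matrix in interactions:
        for attribute, values in question_matrix.items():
            tensor[channel][attribute].update(values)
    return tensor

# Two toy questions, made up for illustration.
q1 = {"title": Counter({"football": 1}), "category": Counter({"Sports": 1})}
q2 = {"title": Counter({"football": 1, "rule": 1})}

user = build_user_tensor([("answered", q1), ("answered", q2), ("asked", q2)])
answered_football = user["answered"]["title"]["football"]
asked_rule = user["asked"]["title"]["rule"]
```

Keeping the channels as separate keys preserves the distinction between, say, asking and answering, instead of blending them with hand-picked weights.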
19. Interaction features
These features are used by a classifier to evaluate the match
between the user and the question.
Pairing each question attribute with each user attribute creates
multiple features
For each pair of question and user attributes of the same family,
they create a distinct interaction feature by measuring the cosine
similarity between the corresponding attribute vectors.
Let t be one of the question attributes and s be one of the user
attributes under channel c.
The interaction feature resulting from matching s and t under c is
the inner product of the two attribute vectors, which equals their
cosine similarity when the vectors are normalized to unit length.
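One interaction feature can be sketched as a cosine similarity between two sparse vectors. The dictionary representation, the function name and the toy weights are assumptions; returning 0 for an empty attribute is also a choice made for this sketch.

```python
import math

def cosine_similarity(user_vector, question_vector):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(weight * question_vector.get(key, 0.0)
              for key, weight in user_vector.items())
    user_norm = math.sqrt(sum(w * w for w in user_vector.values()))
    question_norm = math.sqrt(sum(w * w for w in question_vector.values()))
    if user_norm == 0.0 or question_norm == 0.0:
        return 0.0  # an empty attribute contributes no signal
    return dot / (user_norm * question_norm)

# Toy vectors: user attribute s under one channel, question attribute t.
user_title_terms = {"football": 3.0, "rule": 1.0}
question_title_terms = {"football": 1.0, "offside": 1.0}
feature = cosine_similarity(user_title_terms, question_title_terms)
```

Repeating this for every channel and every same-family attribute pair yields the full set of interaction features passed to the classifier.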
20. Bias features
For example, questions that have already received several answers
are less attractive to users who aim for best-answer votes.
They address such intuitions by adding 5 user-specific and
question-specific biases as features to each question-user
pair.
21. Empirical study
Experimental Setup
They built user profiles based on past user activity and then, at
test time, matched these users to new questions.
User profiles were constructed from four consecutive months of
Yahoo! Answers activity logs.
New questions were then taken from the following fifth month.
22. Empirical study (cont.)
Model Training
They trained the MCR model using several linear and non-linear
classifiers.
The best results were achieved by Gradient Boosted Decision Trees
(GBDT).
Because the feature space is not very large, they could afford to
use a complex classifier.
There are four parameters controlling GBDT:
number of trees
size of each tree
shrinkage (or learning rate)
sampling rate
In their setup the parameter settings are:
#trees=100, tree-size=20, shrinkage=1, and sampling-rate=0.5
23. Baseline Models
The weight of each feature is the sum, over all channels and
attributes, of the product of the feature's weight in the question
model and in the user model.
c ranges over all the possible channels
s and t range over all the possible user and question attributes
wc are manually set channel weights
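The baseline scoring can be sketched as a channel-weighted sum of inner products. Since the formula itself did not survive in the slides, the exact form below is an assumption: each channel's contribution is multiplied by wc, and `pairs` (a hypothetical parameter) lists which user attribute s is matched with which question attribute t.

```python
def dot(a, b):
    """Inner product of two sparse vectors given as dicts."""
    return sum(weight * b.get(key, 0.0) for key, weight in a.items())

def baseline_score(user_tensor, question_matrix, channel_weights, pairs):
    """Weighted sum of inner products over channels and attribute pairs.

    Channels missing from channel_weights default to a weight of 1.
    """
    score = 0.0
    for channel, attributes in user_tensor.items():
        w_c = channel_weights.get(channel, 1.0)
        for s, t in pairs:
            score += w_c * dot(attributes.get(s, {}), question_matrix.get(t, {}))
    return score

# Toy data, made up for illustration.
user_tensor = {
    "answered": {"title": {"football": 2.0}},
    "voted on question": {"title": {"football": 1.0, "rule": 1.0}},
}
question_matrix = {"title": {"football": 1.0, "rule": 1.0}}
pairs = [("title", "title")]

simple = baseline_score(user_tensor, question_matrix, {}, pairs)
weighted = baseline_score(
    user_tensor, question_matrix,
    {"answered": 1.0, "voted on question": 0.5}, pairs,
)
```

With all weights equal to 1 the two channels contribute equally; halving the voting channel lowers the score, which shows how manually set weights shift the ranking.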
24. Baseline Models (cont.)
They constructed two baselines:
Simple baseline
assumes all channels are equally informative (wc = 1)
Weighted baseline
uses wc = 1 for asking, answering and best-answering
and wc = 1/2 for the remaining channels
They examined several ways to modify the feature distribution:
standardization: each feature is scaled so that its variance is
1
logarithm transformation: xi is replaced by log(1 + xi)
normalization: the features of each feature family are scaled to
have a squared norm of 1
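The three feature transformations can be sketched with the standard library alone. Details that the slides leave open are assumptions here: the population variance is used, standardization rescales without centering, and constant or all-zero inputs are returned unchanged.

```python
import math

def standardize(values):
    """Scale a feature column so that its variance is 1."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    if variance == 0.0:
        return list(values)  # constant feature: nothing to scale
    scale = math.sqrt(variance)
    return [v / scale for v in values]

def log_transform(values):
    """Replace each x_i by log(1 + x_i)."""
    return [math.log1p(v) for v in values]

def normalize_family(values):
    """Scale the features of one family so that their squared norm is 1."""
    squared_norm = sum(v * v for v in values)
    if squared_norm == 0.0:
        return list(values)
    norm = math.sqrt(squared_norm)
    return [v / norm for v in values]

standardized = standardize([0.0, 2.0, 4.0])
logged = log_transform([0.0, 1.0])
normalized = normalize_family([3.0, 4.0])
```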
25. Results
They evaluated the performance of MCR and the baselines by
calculating the accuracy and the Area Under the ROC Curve (AUC) on
test examples.
The AUC metric measures the probability that a positive example
is scored higher than a negative example.
This result shows the advantage of the MCR model
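The probabilistic reading of AUC given above can be computed directly by comparing every positive score with every negative score. This is an illustrative sketch, not the evaluation code used in the study; counting ties as half a win is a common convention and an assumption here, and the scores are made up.

```python
def auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Toy classifier scores for three positive and two negative examples.
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3])
```

In this toy case five of the six positive-negative pairs are ordered correctly, so the AUC is 5/6. The pairwise loop is quadratic; it is meant to mirror the definition, not to scale.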
26. Results (cont.)
To gain some insight into the model's performance, they inspected
the most important features, as ranked by GBDT.
The top features are quite evenly distributed, showing the
importance of utilizing each of these families.
This also shows the importance of splitting the attribute space
into multiple channels, as otherwise this signal would have been
lost.
27. Results (cont.)
Table 5 describes the results of testing the classifier with the
possible feature-subsets
The results show that direct social features between users play
only a marginal role in the discovery of promising user-question
pairs
28. Results (cont.)
They expect the MCR model to be more precise when recommending
questions to users who interact more with the system.
They divided the users into 12 disjoint bins on a logarithmic
scale, according to the number of answered questions in the user
model.
Figure 5 depicts the mean accuracy and AUC for each set of users.
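Logarithmic binning of users by activity can be sketched as follows. The slides only state that 12 disjoint bins on a logarithmic scale were used; the base-2 edges, the cap on the last bin and the function name are assumptions of this example.

```python
def activity_bin(num_answers, num_bins=12):
    """Assign a user to a bin by floor(log2(num_answers)), capped at the top.

    Bin 0 holds users with 1 answer, bin 1 holds 2-3, bin 2 holds 4-7, and
    so on; the last bin collects everyone above its lower edge.
    """
    if num_answers < 1:
        raise ValueError("expects at least one answered question")
    # bit_length() - 1 equals floor(log2(n)) exactly for positive integers.
    return min(num_answers.bit_length() - 1, num_bins - 1)

bins = [activity_bin(n) for n in (1, 2, 3, 4, 1000, 10**6)]
```

Mean accuracy and AUC can then be computed per bin to see how performance changes with user activity.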
29. Conclusions
This paper introduced a novel multi-channel recommender system
approach for suggesting questions to potential answerers in Yahoo!
Answers.
The MCR model enabled them to take advantage of various types of
signals, in full symmetry, without worrying about which should be
emphasized, or which would dilute others.
Their experiments showed that learning to combine many signals
improves significantly over the baselines.
Their analysis discovered that direct social relations are not as
important as content signals.