i want to answer, who has a

Upload: chenbojyh

Post on 27-May-2015

Category:

Technology


TRANSCRIPT

  • 1. Author: Gideon Dror, Yehuda Koren, Yoelle Maarek, Idan Szpektor
    Publication: KDD 2011
    Presenter: Po-chih Chen
    I Want to Answer, Who Has a Question? Yahoo! Answers Recommender System
    Copyright 2011 ACM

2. Outline
Introduction
Related work and background
Problem characterization
A multi-channel recommender system model
Empirical study
Conclusions
3. Introduction
In spite of the continuous progress of Web search engines, many user needs still remain unanswered.
The intent behind the query is not well expressed
The absence of relevant content.
While Community Question Answering (CQA) sites can feature factoid questions, their primary goal is to satisfy needs such as:
Opinion seeking
Recommendation
Open-ended questions
Problem solving
4. Introduction (cont.)
Searching for questions to answer is a different challenge than regular Web search, as users are driven by more than content similarity or page popularity.
The figure shows a snapshot of a list of recent questions at a given time.
Waiting for questions which you want to answer?
5. Introduction (cont.)
This paper addresses the needs of users who are in an "answering mood" by suggesting the right questions to potential answerers.
The authors propose a multi-channel recommender (MCR) system.
MCR accounts for the multiple dimensions of the data
6. Related work and background
Yahoo! Answers
It is currently the largest existing CQA site.
Life cycle of a question thread:
The question remains open for four days, with an option for extension,
or for less if the asker chooses a best answer.
If no best answer is chosen, the question moves to the in-voting status.
After these steps, the question is considered resolved.
It has high variance in perceived question and answer quality.
The authors' research focuses on a complementary task: matching questions to users before answers are written.
7. Related work and background (cont.)
Recommender Systems
Recommender systems are based on two different strategies:
Collaborative filtering (CF)
It relies on analyzing relationships between users and interdependencies among products in order to identify new user-item matches.
Content analysis (CA)
These techniques create a characterizing profile for each user or product. The resulting profiles allow programs to associate users with matching products.
Useful for solving cold-start scenarios
8. Recommender Systems
The two primary schools of CF are:
latent factor models
They explain ratings by characterizing both items and users on factors inferred from rating patterns
neighborhood methods
They compute the relationships among users, estimating unknown ratings based on recorded ratings of like-minded users
Their method introduces a novel, symmetric integration of CF with CA approaches that allows exploiting behavioral signals together with user- and question-attributes.
9. Problem characterization
The task of recommending questions brings less well addressed challenges, which induce the unique design criteria for their model.
The first factor to consider is that different families of item descriptors need to be exploited
A second factor comes from the need to account for the multiple kinds of interactions of different intensities between users and questions
When data per user and item is scarce, exploiting these diverse types of user-item interactions is vital.
10. A multi-channel recommender system model
This section introduces a Multi-Channel Recommender system model (MCR) for assessing the match between a user and a question. It covers:
how questions and users are mapped into their attribute representation
Question Attributes
User attributes
how multiple features are derived from the multi-channel attributes of the users and questions
Interaction features
Bias features
how user and question-specific features are incorporated into MCR and how the model is trained.
11. Question Attributes
Question attributes are split into three families: textual, categories and user IDs
Textual Family
This family encodes textual information and takes text tokens as values.
For each text block, the tokenizer annotates each word with its part-of-speech (POS) tag and lemma.
The extracted terms are counted separately within each field, producing four sets of (term, count) as values of four attributes
Then they filter out non-representative terms
12. Question Attributes (cont.)
Then they filter out non-representative terms
For every question, only terms that are nouns, verbs or adjectives are retained, based on their POS tags
Then, each term t is ranked by its usefulness L(t).
They define usefulness as the entropy of the distribution of categories given t
C is the set of all categories in Yahoo! Answers
#c(t) is the number of times term t appeared in text fields within category c
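The formula for L(t) did not survive in this transcript. As a minimal sketch, assuming the standard Shannon entropy and assuming P(c | t) is estimated as #c(t) divided by the total count of t over all categories in C, the usefulness score could be computed as follows. The function name and the example counts are hypothetical. Under this reading, a term concentrated in one category gets entropy 0, while a term spread evenly over many categories gets a high entropy.

```python
import math
from collections import Counter


def term_entropy(category_counts):
    """Shannon entropy (in bits) of the category distribution of one term.

    category_counts maps category c -> #c(t), the number of times the term
    appeared in text fields within category c.
    """
    total = sum(category_counts.values())
    if total == 0:
        raise ValueError("term never appears in any category")
    entropy = 0.0
    for count in category_counts.values():
        if count > 0:
            p = count / total  # assumed estimate of P(c | t)
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical counts, invented for illustration only.
focused_term = Counter({"Sports": 8})
spread_term = Counter({"Sports": 2, "Health": 2, "Travel": 2, "Pets": 2})

focused_score = term_entropy(focused_term)  # all mass in one category
spread_score = term_entropy(spread_term)    # uniform over four categories
```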
13. Question Attributes (cont.)
Category Family
Category Family reflects the category of the question that the user has to select, from a predefined taxonomy
The user-selected category is used as a direct attribute, and the parent and grandparent categories are also added, when available, in order to inherit semantic similarities.
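The category family can be sketched as a lookup that walks up the taxonomy. This is only an illustration: the taxonomy fragment, the attribute names and the function name below are hypothetical and are not taken from the paper.

```python
def category_attributes(category, parent_of):
    """Return the selected category plus its parent and grandparent, when available."""
    attributes = {"category": category}
    parent = parent_of.get(category)
    if parent is not None:
        attributes["parent"] = parent
        grandparent = parent_of.get(parent)
        if grandparent is not None:
            attributes["grandparent"] = grandparent
    return attributes


# Hypothetical three-level taxonomy fragment (child -> parent).
PARENT_OF = {
    "Laptops & Notebooks": "Hardware",
    "Hardware": "Computers & Internet",
}

leaf_attributes = category_attributes("Laptops & Notebooks", PARENT_OF)
top_attributes = category_attributes("Computers & Internet", PARENT_OF)
```

A top-level category simply yields a single attribute, which matches the "when available" condition on the slide.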
14. Question Attributes (cont.)
Users can interact with a question in various ways, each deserving a different treatment
asker: the user asking the question
best answerer: the user who provided the best answer
answerers: other users who answered the question
question voters: users who starred the question as interesting
answer voters: users who voted on the quality of individual answers (by thumb up/down votes)
best answer selectors: users who participated in the best answer voting process
question tracers: users who requested to receive updates on the question
15. Question Attributes (cont.)
Formal question attributes model
A question q is described by an attribute matrix
The d1 columns of the matrix correspond to each individual textual token, category and user
The d2 rows correspond to the attributes
Qq[i][j] holds the count for term j of attribute i.
Example:
Qq[title][football] = 1, d2 = 14
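Since most entries of the attribute matrix are zero, one convenient way to hold it in code is a sparse mapping from attribute to value counts. The sketch below assumes that representation; apart from "title" and "football", which come from the example above, the attribute names and values are hypothetical.

```python
from collections import Counter


def build_question_matrix(fields):
    """Build a sparse attribute matrix: attribute -> Counter of value -> count.

    fields maps an attribute name to the list of values observed for it
    (text tokens, category names or user IDs).
    """
    return {attribute: Counter(values) for attribute, values in fields.items()}


# Hypothetical question, invented for illustration only.
question = build_question_matrix({
    "title": ["football", "rule"],
    "category": ["Sports"],
    "asker": ["user_17"],
})

football_count = question["title"]["football"]  # Qq[title][football]
tennis_count = question["title"]["tennis"]      # absent values count as 0
```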
16. User attributes
Users may explicitly pick their preferences over attributes within each of the attribute families.
Question-driven attributes
The authors do not want to arbitrarily weigh the relative importance of each of the question and answer interaction types.
They keep them separate by adding another dimension to the user repository, called channels.
Channels that qualify the user interaction with the questions
asked
best answered
answered
voted on question
voted on an associated answer
voted on best answer
traced the question
17. User attributes (cont.)
Question-driven attributes (cont.)
Channels serve a different purpose: associating a user with questions
Each channel aggregates properties from the questions corresponding to a certain kind of interaction
The model describes 49 kinds of user-user interaction
Cartesian product of the two identical 7-tuples
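The count of 49 can be checked by enumerating the pairs. The sketch below assumes that one 7-tuple is the list of channels and the other is the list of user roles from the user IDs family; the pairing interpretation in the comments is an assumption, not a statement from the slides.

```python
from itertools import product

# Channels through which the profiled user interacted with questions.
CHANNELS = [
    "asked", "best answered", "answered", "voted on question",
    "voted on an associated answer", "voted on best answer",
    "traced the question",
]

# Roles that other users played on those same questions.
ROLES = [
    "asker", "best answerer", "answerer", "question voter",
    "answer voter", "best answer selector", "question tracer",
]

# Each pair reads as: "the user <channel> a question where the other user was <role>".
user_user_interactions = list(product(CHANNELS, ROLES))
interaction_count = len(user_user_interactions)
```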
Explicit user attributes
One more channel expresses direct user preferences.
A user can explicitly specify which keywords and categories they are interested in,
or which other users they would like to follow.
Textual and category families in this channel remain empty.
18. User attributes (cont.)
Formal user attributes model
A user u is represented by a 3-dimensional tensor
The first dimension corresponds to the channels of interaction (d3 = 8)
The other two dimensions correspond to attributes and values, in analogy to the question representation
is the set of questions with which user u interacted through channel c
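The aggregation formula is missing from this transcript. A minimal sketch, assuming that each channel slice of the user tensor is the plain sum of the attribute matrices of the questions the user interacted with through that channel, could look like this. The summation, the function name and the sample data are all assumptions.

```python
from collections import Counter, defaultdict


def build_user_tensor(interactions):
    """Aggregate question matrices per channel.

    interactions is an iterable of (channel, question_matrix) pairs, where a
    question_matrix maps attribute -> Counter of value -> count.
    Returns channel -> attribute -> Counter (a sparse 3-dimensional tensor).
    """
    tensor = defaultdict(lambda: defaultdict(Counter))
    for channel, question_matrix in interactions:
        for attribute, counts in question_matrix.items():
            tensor[channel][attribute].update(counts)  # adds counts element-wise
    return tensor


# Hypothetical questions, invented for illustration only.
q1 = {"title": Counter({"football": 1, "rule": 1})}
q2 = {"title": Counter({"football": 2})}

user = build_user_tensor([("answered", q1), ("answered", q2), ("asked", q1)])
```

Keeping the channels as separate slices, rather than merging them into one profile, is what lets a classifier later decide how much each kind of interaction matters.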
19. Interaction features
These features are used by a classifier to evaluate the match between the user and the question.
Pairing each question attribute with each user attribute creates multiple features
For each pair of question and user attributes of the same family,
the authors create a distinct interaction feature by measuring the cosine similarity between the corresponding attribute vectors.
Let t be one of the question attributes and s be one of the user attributes under channel c.
The interaction feature that results from matching s and t under c is the inner product of their attribute vectors.
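The slide mentions both cosine similarity and an inner product. One way to reconcile the two, assumed here, is that the attribute vectors are L2-normalized, in which case the inner product equals the cosine similarity. The sketch below computes one interaction feature on sparse vectors; the function name and the example vectors are hypothetical.

```python
import math


def cosine_similarity(user_vector, question_vector):
    """Cosine similarity between two sparse vectors given as value -> weight dicts."""
    dot = sum(
        weight * question_vector.get(value, 0.0)
        for value, weight in user_vector.items()
    )
    user_norm = math.sqrt(sum(w * w for w in user_vector.values()))
    question_norm = math.sqrt(sum(w * w for w in question_vector.values()))
    if user_norm == 0.0 or question_norm == 0.0:
        return 0.0  # an empty attribute carries no signal
    return dot / (user_norm * question_norm)


# Hypothetical vectors: user attribute s under channel c, question attribute t.
user_title_answered = {"football": 3.0, "rule": 4.0}
question_title = {"football": 1.0}

feature = cosine_similarity(user_title_answered, question_title)
```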
20. Bias features
Some questions that have already received several answers are less attractive to users who aim for best-answer votes.
The authors address these intuitions by adding 5 user-specific and question-specific biases as features to each question-user pair
21. Empirical study
Experimental Setup
The authors built user profiles based on past user activity and then, at test time, matched these users to new questions.
User profiles were constructed from four consecutive months of Yahoo! Answers activity logs.
New questions were then taken from the following fifth month.
22. Empirical study (cont.)
Model Training
The MCR model was trained using several linear and non-linear classifiers
The best results were achieved by Gradient Boosted Decision Trees (GBDT)
Because the feature space is not very large, the authors could afford a complex classifier
There are four parameters controlling GBDT
number of trees
size of each tree
shrinkage (or learning rate)
sampling rate
In their setup the parameter settings are:
#trees=100, tree-size=20, shrinkage=1, and sampling-rate=0.5
23. Baseline Models
The weight of each feature is the sum, over all channels and attributes, of the multiplication of the feature weight in the question and in the user models.
c are all the possible channels
s and t are all the possible user and question attributes
wc are manually set channel weights
24. Baseline Models (cont.)
The authors constructed two baselines:
simple baseline
Assumes all channels are equally informative (wc = 1),
weighted baseline
Uses wc = 1 for the asking, answering and best-answering channels
and wc = 1/2 for the remaining channels
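The baseline formula itself is not in this transcript. The sketch below is one reading of the description: every user attribute s under every channel c is multiplied against every question attribute t, value by value, and the products are summed with the channel weight wc. The function name, the `default_weight` parameter and the sample data are hypothetical.

```python
def baseline_score(user_tensor, question_matrix, channel_weights, default_weight=1.0):
    """Weighted sum of user-weight * question-weight products over c, s and t."""
    score = 0.0
    for channel, user_attributes in user_tensor.items():
        w_c = channel_weights.get(channel, default_weight)
        for user_vector in user_attributes.values():          # attribute s
            for question_vector in question_matrix.values():  # attribute t
                for value, user_weight in user_vector.items():
                    score += w_c * user_weight * question_vector.get(value, 0.0)
    return score


# Hypothetical user and question, invented for illustration only.
user = {
    "answered": {"title": {"football": 2.0}},
    "voted on question": {"title": {"football": 1.0}},
}
question = {"title": {"football": 1.0}, "body": {"football": 3.0}}

# Simple baseline: every channel has weight 1.
simple = baseline_score(user, question, {})

# Weighted baseline: 1 for asking, answering and best answering, 1/2 otherwise.
strong_channels = {"asked": 1.0, "answered": 1.0, "best answered": 1.0}
weighted = baseline_score(user, question, strong_channels, default_weight=0.5)
```

Unlike MCR, which learns how to combine the per-channel features, this scheme fixes the combination by hand through wc.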
The authors examined several ways to modify the feature distribution:
standardization: each feature is scaled so that its variance is 1
logarithm transformation: xi → log(1 + xi)
normalization: the features of each feature family are scaled to have a squared norm of 1
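The three transformations can be sketched in plain Python. Two details are assumptions: the variance is taken as the population variance over the examples, and standardization only rescales without centering, since the slide mentions scaling alone. The function names are hypothetical.

```python
import math


def standardize(column):
    """Scale one feature (its values over all examples) so that its variance is 1."""
    mean = sum(column) / len(column)
    variance = sum((x - mean) ** 2 for x in column) / len(column)
    if variance == 0.0:
        return list(column)  # a constant feature cannot be rescaled
    scale = math.sqrt(variance)
    return [x / scale for x in column]


def log_transform(column):
    """Apply x -> log(1 + x) to every value of a non-negative feature."""
    return [math.log1p(x) for x in column]


def normalize_family(values):
    """Scale the features of one family (for one example) to squared norm 1."""
    norm = math.sqrt(sum(x * x for x in values))
    if norm == 0.0:
        return list(values)
    return [x / norm for x in values]


scaled = standardize([1.0, 2.0, 3.0, 4.0])
logged = log_transform([0.0, math.e - 1.0])
unit = normalize_family([3.0, 4.0])
```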
25. Results
They evaluated the performance of MCR and the baseline by calculating the accuracy and the Area Under ROC Curve (AUC) on test examples.
The AUC metric measures the probability that a positive example is scored higher than a negative example
This result shows the advantage of the MCR model
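The probabilistic definition of AUC given above can be computed directly by comparing every positive example with every negative one. The sketch below does exactly that; counting ties as half a win is a common convention that is assumed here, and the scores are invented for illustration.

```python
def pairwise_auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # assumed convention: a tie counts as half a win
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores.
auc = pairwise_auc([0.9, 0.6], [0.7, 0.2])
```

This quadratic loop is fine for illustration; large test sets are usually handled with a rank-based computation instead.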
26. Results (cont.)
To gain some insight into the model's performance, the authors inspected the most important features, as ranked by GBDT
The top features are quite evenly distributed, showing the importance of utilizing each of these families.
This also shows the importance of splitting the attribute space into multiple channels, as otherwise this signal would have been lost.
27. Results (cont.)
Table 5 describes the results of testing the classifier with the possible feature-subsets
The results show that direct social features between users play only a marginal role in the discovery of promising user-question pairs
28. Results (cont.)
They expect the MCR model to be more precise when recommending questions to users who interact more with the system.
They divided the users into 12 disjoint bins on a logarithmic scale, according to the number of answered questions in the user model.
Figure 5 depicts the mean accuracy and AUC for each set of users
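The slides give only the number of bins (12) and the logarithmic scale, not the bin boundaries. As a sketch, assuming base-2 boundaries where bin k covers users with between 2^k and 2^(k+1) - 1 answered questions and the last bin is open-ended, users could be assigned to bins as follows. The function name and the boundaries are hypothetical.

```python
def log_bin(num_answered, num_bins=12, base=2):
    """Return the bin index (0 .. num_bins - 1) for a user on a logarithmic scale.

    Bin k covers base**k <= num_answered < base**(k + 1); the last bin
    collects every user above its lower boundary.
    """
    if num_answered < 1:
        raise ValueError("user must have answered at least one question")
    index = 0
    upper = base
    # Integer arithmetic avoids floating-point surprises near bin boundaries.
    while num_answered >= upper and index < num_bins - 1:
        upper *= base
        index += 1
    return index


bins = [log_bin(n) for n in (1, 2, 3, 4, 1000)]
```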
29. Conclusions
This paper introduced a novel multi-channel recommender system approach for suggesting questions to potential answerers in Yahoo! Answers.
The MCR model makes it possible to take advantage of various types of signals, in full symmetry, without worrying about which should be emphasized or which would dilute others.
The experiments showed that learning to combine many signals significantly improves over the baseline.
Their analysis discovered that direct social relations are not as important as content signals.