TRANSCRIPT

ECIR 2016, Padua, Italy
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recommendation

Daniel Valcarce, Javier Parapar, Álvaro Barreiro
@dvalcarce @jparapar @AlvaroBarreiroG
Information Retrieval Lab @IRLab_UDC
University of A Coruña, Spain
Outline
1. Pseudo-Relevance Feedback (PRF)
2. Collaborative Filtering (CF)
3. PRF Methods for CF
4. Experiments
5. Conclusions and Future Work
PSEUDO-RELEVANCE FEEDBACK (PRF)
Pseudo-Relevance Feedback (I)
Pseudo-Relevance Feedback provides an automatic method for query expansion:

# Assumes that the top documents retrieved with the original query are relevant (the pseudo-relevant set).
# The query is expanded with the most representative terms from this set.
# The expanded query is expected to yield better results than the original one.
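As a rough sketch of the loop described above (a hypothetical `retrieve` function that maps query terms to a ranked list of term-list documents; expansion by raw term frequency is a deliberate simplification of the weighting schemes discussed later):

```python
from collections import Counter

def pseudo_relevance_feedback(query_terms, retrieve, k=10, n_expansion=5):
    """Expand a query with the most frequent terms of the top-k results.

    `retrieve` is assumed to map a list of query terms to a ranked list
    of documents, each given as a list of terms (hypothetical API).
    """
    # 1. Retrieve with the original query; assume the top k are relevant.
    pseudo_relevant = retrieve(query_terms)[:k]
    # 2. Pick the most representative (here: most frequent) new terms.
    counts = Counter(t for doc in pseudo_relevant for t in doc
                     if t not in query_terms)
    expansion = [t for t, _ in counts.most_common(n_expansion)]
    # 3. The expanded query replaces the original one.
    return query_terms + expansion
```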
Pseudo-Relevance Feedback (II)

[Diagram, built up over several slides: an information need is expressed as a query and sent to the retrieval system; the top-ranked results feed a query expansion step, which produces an expanded query that is submitted to the retrieval system again.]
Pseudo-Relevance Feedback (III)
Some popular PRF approaches:

# Rocchio's model (Rocchio, 1971; Carpineto et al., ACM TOIS 2001)
# Relevance-Based Language Models (Lavrenko & Croft, SIGIR 2001)
# Divergence Minimization Model (Zhai & Lafferty, SIGIR 2006)
# Mixture Models (Tao & Zhai, SIGIR 2006)
COLLABORATIVE FILTERING (CF)
Recommender Systems
Notation:

# The set of users: U
# The set of items: I
# The rating that user u gave to item i: $r_{u,i}$
# The set of items rated by user u: $I_u$
# The set of users that rated item i: $U_i$
# The neighbourhood of user u: $V_u$

Top-N recommendation: create a ranked list containing relevant and unknown items for each user u ∈ U.
Collaborative Filtering (I)
Collaborative Filtering (CF) employs the past interactions between users and items to generate recommendations.

Idea: if this user, who is similar to you, likes this item, maybe you will also like it.

Different input data:

# Explicit feedback: ratings, reviews...
# Implicit feedback: clicks, purchases...

It is perhaps the most popular approach to recommendation given the increasing amount of information about users.
Collaborative Filtering (II)
Collaborative Filtering (CF) techniques can be classified into:

# Model-based methods: learn a predictive model from the user-item ratings.
  ◦ Matrix factorisation (e.g., SVD)
# Neighbourhood-based (or memory-based) methods: compute recommendations directly from part of the ratings.
  ◦ k-NN approaches
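A minimal user-based k-NN sketch of the neighbourhood-based family (cosine similarity over sparse rating dictionaries; all names are illustrative, not the exact configuration evaluated in this talk):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    common = set(u) & set(v)
    num = sum(u[i] * v[i] for i in common)
    den = math.sqrt(sum(x * x for x in u.values())) * \
          math.sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def knn_recommend(target, ratings, k=2, n=3):
    """Rank items unseen by `target` by summed neighbour ratings."""
    # Pick the k most similar users as the neighbourhood.
    sims = sorted(((cosine(ratings[target], r), v)
                   for v, r in ratings.items() if v != target), reverse=True)
    neighbours = [v for _, v in sims[:k]]
    # Aggregate the neighbours' ratings over unseen items.
    scores = {}
    for v in neighbours:
        for i, r in ratings[v].items():
            if i not in ratings[target]:
                scores[i] = scores.get(i, 0.0) + r
    return sorted(scores, key=scores.get, reverse=True)[:n]
```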
PRF METHODS FOR CF
PRF for CF

PRF                              CF
User's query                     User's profile
most^1, populated^2, state^2     Titanic^2, Avatar^3, Matrix^5
Documents                        Neighbours
Terms                            Items
Previous Work on Adapting PRF Methods to CF

Relevance-Based Language Models:

# Originally devised for PRF (Lavrenko & Croft, SIGIR 2001).
# Adapted to CF (Parapar et al., Inf. Process. Manage. 2013).
# Two models: RM1 and RM2.
# High precision figures in recommendation...
# ... but high computational cost!

RM1: $p(i \mid R_u) \propto \sum_{v \in V_u} p(v)\, p(i \mid v) \prod_{j \in I_u} p(j \mid v)$

RM2: $p(i \mid R_u) \propto p(i) \prod_{j \in I_u} \sum_{v \in V_u} \frac{p(i \mid v)\, p(v)}{p(i)}\, p(j \mid v)$
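A direct, unoptimised transcription of the RM2 estimate (probabilities passed in precomputed as plain dicts; this illustrates the formula, not the authors' implementation):

```python
def rm2_score(i, user_items, neighbours, p_item, p_cond, p_user):
    """RM2 relevance of item i for a user.

    p_item[i]    -- prior p(i)
    p_cond[v][j] -- smoothed p(j|v) for neighbour v
    p_user[v]    -- prior p(v)
    """
    score = p_item[i]
    # Product over the items in the user's profile ...
    for j in user_items:
        # ... of a sum over the user's neighbourhood.
        inner = sum(p_cond[v].get(i, 0.0) * p_user[v] / p_item[i]
                    * p_cond[v].get(j, 0.0)
                    for v in neighbours)
        score *= inner
    return score
```

The nested loops over profile items and neighbours are the source of the high computational cost mentioned above.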
Our Proposals based on Rocchio's Framework

Rocchio's Weights:
$p_{Rocchio}(i \mid u) \triangleq \frac{\sum_{v \in V_u} r_{v,i}}{|V_u|}$

Robertson Selection Value:
$p_{RSV}(i \mid u) \triangleq \frac{\sum_{v \in V_u} r_{v,i}}{|V_u|}\, p(i \mid V_u)$

CHI-2:
$p_{CHI^2}(i \mid u) \triangleq \frac{\left(p(i \mid V_u) - p(i \mid C)\right)^2}{p(i \mid C)}$

Kullback-Leibler Divergence:
$p_{KLD}(i \mid u) \triangleq p(i \mid V_u) \log \frac{p(i \mid V_u)}{p(i \mid C)}$
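The four weighting schemes are cheap to evaluate once the neighbourhood and collection models are available; a sketch with illustrative names, taking the probability estimates as precomputed inputs:

```python
import math

def rocchio(ratings_i, n_neighbours):
    """Rocchio's Weights: average rating of item i in the neighbourhood."""
    return sum(ratings_i) / n_neighbours

def rsv(ratings_i, n_neighbours, p_i_neigh):
    """Robertson Selection Value: Rocchio weight scaled by p(i|V_u)."""
    return rocchio(ratings_i, n_neighbours) * p_i_neigh

def chi2(p_i_neigh, p_i_coll):
    """CHI-2 divergence between neighbourhood and collection models."""
    return (p_i_neigh - p_i_coll) ** 2 / p_i_coll

def kld(p_i_neigh, p_i_coll):
    """Pointwise Kullback-Leibler contribution of item i."""
    return p_i_neigh * math.log(p_i_neigh / p_i_coll)
```

Unlike the Relevance Models, each weight is a single pass over the neighbourhood ratings, which is where the efficiency gain comes from.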
Probability Estimation
Maximum Likelihood Estimate under a Multinomial Distribution over the ratings:

$p_{mle}(i \mid V_u) \triangleq \frac{\sum_{v \in V_u} r_{v,i}}{\sum_{v \in V_u,\, j \in I} r_{v,j}}$

$p_{mle}(i \mid C) \triangleq \frac{\sum_{u \in U} r_{u,i}}{\sum_{u \in U,\, j \in I} r_{u,j}}$
Neighbourhood Length Normalisation (I)

Neighbourhoods are computed using clustering algorithms:

# Hard clustering: every user is in only one cluster. Clusters may have different sizes. Example: k-means.
# Soft clustering: each user has its own neighbours. When we set k to a high value, we may find different numbers of neighbours. Example: k-NN.

Idea: consider the variability of the neighbourhood lengths:

# Big neighbourhoods are equivalent to a query with many results: the collection model is close to the target user.
# Small neighbourhoods imply that the neighbours are highly specific: the collection is very different from the target user.
Neighbourhood Length Normalisation (II)
We bias the MLE to perform neighbourhood length normalisation:

$p_{nmle}(i \mid V_u) \stackrel{rank}{=} \frac{1}{|V_u|} \cdot \frac{\sum_{v \in V_u} r_{v,i}}{\sum_{v \in V_u,\, j \in I} r_{v,j}}$

$p_{nmle}(i \mid C) \stackrel{rank}{=} \frac{1}{|U|} \cdot \frac{\sum_{u \in U} r_{u,i}}{\sum_{u \in U,\, j \in I} r_{u,j}}$
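Both estimates are simple ratios of rating mass; a sketch over ratings stored as nested dicts (illustrative names; since the bias factor only divides by the neighbourhood size, it is written as a wrapper around the plain MLE):

```python
def mle(ratings, item, users):
    """Maximum likelihood estimate p(item | users) over rating mass."""
    num = sum(ratings[u].get(item, 0) for u in users)
    den = sum(r for u in users for r in ratings[u].values())
    return num / den

def normalised_mle(ratings, item, users):
    """Length-normalised estimate: bias the MLE by 1/|users|."""
    return mle(ratings, item, users) / len(users)
```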
EXPERIMENTS
Experimental settings
Baselines:

# UB: traditional user-based neighbourhood approach.
# SVD: matrix factorisation.
# UIR-Item: probabilistic approach.
# RM1 and RM2: Relevance-Based Language Models.

Our algorithms:

# Rocchio's Weights (RW)
# Robertson Selection Value (RSV)
# CHI-2
# Kullback-Leibler Divergence (KLD)
Efficiency

Figure: Recommendation time per user (in seconds, log scale from 0.01 to 10) on the ML 100k, ML 1M and ML 10M datasets for UIR, RM1, RM2, SVD++, RSV, UB, RW, CHI-2 and KLD.
Accuracy (nDCG@10)

Algorithm         | ML 100k    | ML 1M     | R3-Yahoo!  | LibraryThing
UB                | 0.0468     | 0.0313    | 0.0108     | 0.0055b
SVD               | 0.0936a    | 0.0608a   | 0.0101     | 0.0015
UIR-Item          | 0.2188ab   | 0.1795abd | 0.0174abd  | 0.0673abd
RM1               | 0.2473abc  | 0.1402ab  | 0.0146ab   | 0.0444ab
RM2               | 0.3323abcd | 0.1992abd | 0.0207abcd | 0.0957abcd
Rocchio's Weights | 0.2604abcd | 0.1557abd | 0.0194abcd | 0.0892abcd
RSV               | 0.2604abcd | 0.1557abd | 0.0194abcd | 0.0892abcd
KLD (MLE)         | 0.2693abcd | 0.1264ab  | 0.0197abcd | 0.1576abcde
KLD (NMLE)        | 0.3120abcd | 0.1546ab  | 0.0201abcd | 0.1101abcde
CHI-2 (MLE)       | 0.0777a    | 0.0709ab  | 0.0149ab   | 0.0939abcd
CHI-2 (NMLE)      | 0.3220abcd | 0.1419ab  | 0.0204abcd | 0.1459abcde

Table: Values of nDCG@10. Pink = best algorithm. Blue = not significantly different from the best (Wilcoxon two-sided p < 0.01).
Diversity (Gini@10)

Algorithm   | ML 100k | ML 1M  | R3-Yahoo! | LibraryThing
UIR-Item    | 0.0124  | 0.0050 | 0.0137    | 0.0005
RM2         | 0.0256  | 0.0069 | 0.0207    | 0.0019
CHI-2 NMLE  | 0.0450  | 0.0106 | 0.0506    | 0.0539

Table: Values of the complement of the Gini index at 10. Pink = best algorithm.
Novelty (MSI@10)

Algorithm   | ML 100k | ML 1M    | R3-Yahoo! | LibraryThing
UIR-Item    | 5.2337e | 8.3713e  | 3.7186e   | 17.1229e
RM2         | 6.8273c | 8.9481c  | 4.9618c   | 19.27343c
CHI-2 NMLE  | 8.1711ec | 10.0043ec | 7.5555ec | 8.8563

Table: Values of Mean Self-Information at 10. Pink = best algorithm.
Trade-off Accuracy-Diversity

Figure: G-measure of nDCG@10 and Gini@10 on MovieLens 100k for RM2 and CHI-2 NMLE, varying the number of neighbours k (200-900) using Pearson's correlation similarity.
Trade-off Accuracy-Novelty

Figure: G-measure of nDCG@10 and MSI@10 on MovieLens 100k for RM2 and CHI-2 NMLE, varying the number of neighbours k (200-900) using Pearson's correlation similarity.
CONCLUSIONS AND FUTURE WORK
Conclusions
We proposed to use fast PRF methods (Rocchio's Weights, RSV, KLD and CHI-2):

# They are orders of magnitude faster than the Relevance Models (up to 200x).
# They generate quite accurate recommendations.
# They achieve good novelty and diversity figures, with a better trade-off than RM2.
# They have no parameters of their own (only the clustering parameters).
Future Work
Other approaches for computing neighbourhoods:
# Posterior Probability Clustering (a non-negative matrixfactorisation).
# Normalised Cut (spectral clustering).
Explore other PRF methods:
# Divergence Minimization Models.
# Mixture Models.
THANK YOU!
@dvalcarce
http://www.dc.fi.udc.es/~dvalcarce