Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013
DESCRIPTION
Recommender systems aim to predict the content that a user would like, based on observations of the online behaviour of their users. Research in the Information Access group addresses different aspects of this problem, ranging from how to measure recommendation results and how recommender systems relate to information retrieval models to how to build effective recommender systems (note: last Friday, we won the ACM RecSys 2013 News Recommender Systems challenge). We would like to develop a general methodology to diagnose weaknesses and strengths of recommender systems. In this talk, I discuss the initial results of an analysis of the core component of collaborative filtering recommenders: the similarity metric used to find the most similar users (neighbours) that will provide the basis for the recommendation to be made. The purpose is to shed light on the question of why certain user similarity metrics have been found to perform better than others. We have studied statistics computed over the distance distribution in the neighbourhood as well as properties of the nearest neighbour graph. The features identified correlate strongly with measured prediction performance; however, we have not yet discovered how to deploy this knowledge to actually improve the recommendations made.
TRANSCRIPT
![Page 1: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/1.jpg)
Similarity & Recommendation
Arjen P. de Vries
CWI Scientific Meeting
September 27th 2013
![Page 2: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/2.jpg)
Recommendation
• Informally:
  – Search for information “without a query”
• Three types:
  – Content-based recommendation
  – Collaborative filtering (CF)
    • Memory-based
    • Model-based
  – Hybrid approaches
![Page 3: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/3.jpg)
Recommendation
• Informally:
  – Search for information “without a query”
• Three types:
  – Content-based recommendation
  – Collaborative filtering
    • Memory-based
    • Model-based
  – Hybrid approaches
Today’s focus!
![Page 4: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/4.jpg)
Collaborative Filtering
• Collaborative filtering (originally introduced by Pattie Maes as “social information filtering”)
  1. Compare user judgments
  2. Recommend differences between similar users
• Leading principle: People’s tastes are not randomly distributed
  – A.k.a. “You are what you buy”
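The two steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `likes` data, the Jaccard overlap used to compare judgments, and the function names are all hypothetical and not taken from the talk.

```python
# Hypothetical toy data: which items each user liked.
likes = {
    "ann": {"a", "b", "c"},
    "bob": {"a", "b", "d"},
    "cat": {"x", "y"},
}

def overlap(u, v):
    """Step 1: compare user judgments (Jaccard overlap of liked items)."""
    union = likes[u] | likes[v]
    return len(likes[u] & likes[v]) / len(union) if union else 0.0

def recommend(user):
    """Step 2: recommend what the most similar other user liked
    and `user` has not yet seen (the "differences")."""
    others = [v for v in likes if v != user]
    best = max(others, key=lambda v: overlap(user, v))
    return sorted(likes[best] - likes[user])

print(recommend("ann"))
```

Here "ann" and "bob" agree on two items, so each receives the one item the other liked in addition.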
![Page 5: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/5.jpg)
Collaborative Filtering
• Benefits over content-based approach
  – Overcomes problems with finding suitable features to represent e.g. art, music
  – Serendipity
  – Implicit mechanism for qualitative aspects like style
• Problems: large groups, broad domains
![Page 6: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/6.jpg)
Context
• Recommender systems
  – Users interact (rate, purchase, click) with items
![Page 10: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/10.jpg)
Context
• Nearest-neighbour recommendation methods
  – The item prediction is based on “similar” users
![Page 14: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/14.jpg)
Similarity
[Formula: s(·, ·) sim(·, ·) s(·, ·); the arguments were user and item pictograms that did not survive extraction]
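The formula on this slide lost its pictogram arguments in extraction. One common reading of a nearest-neighbour score, given here as an assumption and not as the exact formula from the talk, is a similarity-weighted average of the neighbours' ratings. In the sketch below the toy `ratings`, the cosine similarity, the normalisation by the summed weights, and the default `k` are all illustrative choices.

```python
import math

# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 5, "b": 3, "c": 4},
    "u3": {"a": 1, "b": 5, "c": 2},
}

def cosine(p, q):
    """Cosine similarity between two sparse rating profiles."""
    dot = sum(p[i] * q[i] for i in p.keys() & q.keys())
    norm = (math.sqrt(sum(x * x for x in p.values()))
            * math.sqrt(sum(x * x for x in q.values())))
    return dot / norm if norm else 0.0

def predict(user, item, k=2):
    """Score s(user, item) as the similarity-weighted average rating of the
    k most similar users who rated the item (None if nobody did)."""
    candidates = [(cosine(ratings[user], ratings[v]), v)
                  for v in ratings if v != user and item in ratings[v]]
    top = sorted(candidates, reverse=True)[:k]
    weight = sum(abs(s) for s, _ in top)
    if weight == 0:
        return None
    return sum(s * ratings[v][item] for s, v in top) / weight

print(predict("u1", "c"))
```

Because "u1" is closer to "u2" than to "u3", the predicted score for item "c" lies nearer to the rating given by "u2".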
![Page 15: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/15.jpg)
Research Question
• How does the choice of similarity measure determine the quality of the recommendations?
![Page 16: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/16.jpg)
Sparseness
• Too many items exist, so many ratings will be missing
• A user’s neighbourhood is likely to extend to include “not-so-similar” users and/or items
![Page 17: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/17.jpg)
“Best” similarity?
• Consider cosine similarity vs. Pearson similarity
• Most existing studies report that Pearson correlation leads to superior recommendation accuracy
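To make the comparison concrete: Pearson correlation equals cosine similarity computed on mean-centred vectors. The sketch below uses two hypothetical users with fully observed ratings (`alice` and `bob` are illustrative, not data from the study) to show how far the two measures can disagree.

```python
import math

def cosine(x, y):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

def pearson(x, y):
    """Pearson correlation: cosine similarity after subtracting each user's mean."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])

# Two users who rate everything highly but prefer opposite items.
alice = [5, 4, 5, 4]
bob = [4, 5, 4, 5]

print(cosine(alice, bob), pearson(alice, bob))
```

On raw ratings the two users look almost identical under cosine, while Pearson reports that their preferences are opposed, because centring removes the shared tendency to rate high.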
![Page 18: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/18.jpg)
“Best” similarity?
• Common variations to deal with sparse observations:
  – Item selection:
    • Compare full profiles, or only on overlap
  – Imputation:
    • Impute default value for unrated items
  – Filtering:
    • Threshold on minimal similarity value
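The three variations listed above can be expressed as options on a single similarity function. This is a minimal sketch under assumptions: the parameter names `overlap_only`, `default`, and `threshold`, and the choice of cosine as the base measure, are hypothetical and do not come from the slides.

```python
import math

def profile_similarity(p, q, overlap_only=False, default=None):
    """Cosine similarity between sparse profiles (dicts item -> rating).

    overlap_only: restrict both profiles to co-rated items (item selection).
    default: if not None, fill unrated items with this value (imputation);
             otherwise unrated items count as 0.
    """
    items = (p.keys() & q.keys()) if overlap_only else (p.keys() | q.keys())
    fill = 0.0 if default is None else default
    x = [p.get(i, fill) for i in items]
    y = [q.get(i, fill) for i in items]
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def filter_neighbours(sims, threshold):
    """Filtering: keep only neighbours whose similarity reaches the threshold."""
    return {v: s for v, s in sims.items() if s >= threshold}

p = {"a": 5, "b": 3}
q = {"a": 5, "c": 4}
full = profile_similarity(p, q)
overlap = profile_similarity(p, q, overlap_only=True)
imputed = profile_similarity(p, q, default=3)
print(full, imputed, overlap)
```

With a single co-rated item, the overlap-only variant reports a perfect match, which illustrates why the choice among these variations can change which neighbours are selected.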
![Page 19: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/19.jpg)
“Best” similarity?
• Cosine superior (!), but not for all settings
  – No consistent results
![Page 20: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/20.jpg)
Analysis
![Page 21: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/21.jpg)
Distance Distribution
• In high dimensions, nearest neighbour is unstable: a nearest-neighbour query is unstable if the distance from the query point to most data points is less than (1 + ε) times the distance from the query point to its nearest neighbour
Beyer et al. When is “nearest neighbour” meaningful? ICDT 1999
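The instability criterion can be checked empirically with a small simulation. The sketch below is not from the talk: it assumes uniformly random points in the unit hypercube and Euclidean distance, and the function name and parameter defaults are hypothetical.

```python
import math
import random

def unstable_fraction(dim, n_points=500, eps=0.5, seed=0):
    """Fraction of data points that lie within (1 + eps) times the
    nearest-neighbour distance of a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    nearest = min(dists)
    return sum(d <= (1 + eps) * nearest for d in dists) / n_points

for dim in (2, 20, 200):
    print(dim, unstable_fraction(dim))
```

In low dimensions only a handful of points fall inside the (1 + ε) shell, whereas in high dimensions distances concentrate and most of the data set does, which is the situation the definition calls unstable.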
![Page 22: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/22.jpg)
Distance Distribution
Beyer et al. When is “nearest neighbour” meaningful? ICDT 1999
![Page 23: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/23.jpg)
Distance Distribution
• Quality q(n, f): Fraction of users for which the similarity function has ranked at least n percent of the user community within a factor f of the nearest neighbour’s similarity value (well... its corresponding distance)
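The definition of q(n, f) translates fairly directly into code. This is a sketch under assumptions: the input format (a list of distances per user) is hypothetical, and how a similarity value is turned into a distance is left open here, as it is on the slide.

```python
def quality(distances, n, f):
    """q(n, f): fraction of users for whom at least n percent of the other
    users lie within a factor f of the nearest neighbour's distance.

    distances: dict user -> list of distances to every other user
               (assumed input format).
    """
    hits = 0
    for dists in distances.values():
        nearest = min(dists)
        close = sum(d <= f * nearest for d in dists)
        if 100.0 * close / len(dists) >= n:
            hits += 1
    return hits / len(distances)

# Hypothetical distances for two users.
distances = {
    "u1": [1.0, 1.1, 1.2, 5.0],  # most others are nearly as close as the nearest
    "u2": [1.0, 3.0, 4.0, 5.0],  # the nearest neighbour clearly stands out
}
print(quality(distances, n=50, f=1.5))
```

For "u1", three of four users fall within 1.5 times the nearest distance, so "u1" counts towards q(50, 1.5); "u2" does not.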
![Page 24: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/24.jpg)
Distance Distribution
![Page 25: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/25.jpg)
NNk Graph
• Graph associated with the top k nearest neighbours
• Analysis focusing on the binary relation of whether a user does or does not belong to a neighbourhood
  – Ignore similarity values (already included in the distance distribution analysis)
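A minimal sketch of such a graph follows. The slides do not say which graph properties were studied, so the in-degree computed here (how many neighbourhoods a user appears in) is only one illustrative choice; the input format and function names are hypothetical.

```python
from collections import Counter

def nnk_graph(sims, k):
    """Directed NN_k graph: user -> set of its k most similar users.

    sims: dict user -> {other user: similarity} (assumed input format).
    Similarity values only determine membership; they are not stored.
    """
    graph = {}
    for u, row in sims.items():
        ranked = sorted(row, key=lambda v: (-row[v], v))  # ties broken by user id
        graph[u] = set(ranked[:k])
    return graph

def in_degree(graph):
    """Number of neighbourhoods each user belongs to."""
    counts = Counter(v for nbrs in graph.values() for v in nbrs)
    return {u: counts.get(u, 0) for u in graph}

sims = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"a": 0.9, "c": 0.2},
    "c": {"a": 0.3, "b": 0.4},
}
graph = nnk_graph(sims, k=1)
print(graph, in_degree(graph))
```

Note that the relation is not symmetric: "c" has "b" as its nearest neighbour, but "c" belongs to nobody's neighbourhood.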
![Page 26: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/26.jpg)
NNk Graph
![Page 27: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/27.jpg)
MRR vs. Features
• Quality:
  – If most of the user population is far away, high similarity correlates with effectiveness
  – If most of the user population is close, high similarity correlates with ineffectiveness
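The slides do not expand the abbreviation MRR. Assuming it stands for mean reciprocal rank, the usual ranking measure with that name, it could be computed as sketched below. The input formats and names are hypothetical.

```python
def mean_reciprocal_rank(rankings, relevant):
    """Average over users of 1 / rank of the first relevant item
    in that user's recommendation list (0 if none is relevant).

    rankings: dict user -> ordered list of recommended items (assumed format).
    relevant: dict user -> set of items the user actually liked (assumed format).
    """
    total = 0.0
    for user, ranked_items in rankings.items():
        liked = relevant.get(user, set())
        for rank, item in enumerate(ranked_items, start=1):
            if item in liked:
                total += 1.0 / rank
                break
    return total / len(rankings)

rankings = {"u1": ["a", "b"], "u2": ["c", "d"]}
relevant = {"u1": {"b"}, "u2": {"c"}}
print(mean_reciprocal_rank(rankings, relevant))
```

A per-metric value like this is what the similarity features would then be correlated against.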
![Page 28: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/28.jpg)
MRR vs. Features
![Page 29: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/29.jpg)
Conclusions (so far)
• “Similarity features” correlate with recommendation effectiveness
  – “Stability” of a metric (as defined in database literature on k-NN search in high dimensions) is related to its ability to discriminate between good and bad neighbours
![Page 30: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/30.jpg)
Future Work
• How to exploit this knowledge to improve recommender systems?
![Page 31: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/31.jpg)
News Recommendation Challenge
![Page 32: Similarity & Recommendation - CWI Scientific Meeting - Sep 27th, 2013](https://reader036.vdocuments.site/reader036/viewer/2022062615/548ad553b479594a778b4634/html5/thumbnails/32.jpg)
Thanks
• Alejandro Bellogín – ERCIM fellow in the Information Access group
Details: Bellogín and De Vries, ICTIR 2013.