Similarity & Recommendation
Arjen P. de Vries
arjen@cwi.nl
CWI Scientific Meeting
September 27th 2013
Recommendation
• Informally:
  – Search for information “without a query”
• Three types:
  – Content-based recommendation
  – Collaborative filtering (CF)
    • Memory-based
    • Model-based
  – Hybrid approaches
Today’s focus: collaborative filtering (CF)
Collaborative Filtering
• Collaborative filtering (originally introduced by Pattie Maes as “social information filtering”)
  1. Compare user judgments
  2. Recommend differences between similar users
• Leading principle: People’s tastes are not randomly distributed
  – A.k.a. “You are what you buy”
Collaborative Filtering
• Benefits over content-based approach
  – Overcomes problems with finding suitable features to represent e.g. art, music
  – Serendipity
  – Implicit mechanism for qualitative aspects like style
• Problems: large groups, broad domains
Context
• Recommender systems
  – Users interact (rate, purchase, click) with items
Context
• Nearest-neighbour recommendation methods
  – The item prediction is based on “similar” users
Similarity
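$s(u, i) = \sum_{v} \mathrm{sim}(u, v)\, s(v, i)$, where $u$ is the target user, $v$ ranges over the users similar to $u$, and $i$ is the item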
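The neighbour-based idea on this slide (a user's score for an item is a similarity-weighted sum of the scores given by similar users) can be sketched in a few lines of Python. This is only an illustration under assumptions: the toy `ratings` data, the function names `cosine`, `neighbours` and `score`, the use of cosine similarity, and the neighbourhood size `k` are all made up for the example and are not taken from the talk.

```python
import math

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 3, "d": 5},
    "carol": {"a": 1, "c": 1, "d": 5},
}

def cosine(u, v):
    """Cosine similarity of two rating dicts; missing ratings count as 0."""
    dot = sum(r * v[i] for i, r in u.items() if i in v)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def neighbours(user, k):
    """The k users most similar to `user`, most similar first."""
    others = [v for v in ratings if v != user]
    others.sort(key=lambda v: cosine(ratings[user], ratings[v]), reverse=True)
    return others[:k]

def score(user, item, k=2):
    """Similarity-weighted sum of the neighbours' ratings for `item`."""
    return sum(
        cosine(ratings[user], ratings[v]) * ratings[v][item]
        for v in neighbours(user, k)
        if item in ratings[v]
    )
```

For example, `score("alice", "d")` combines the ratings that bob and carol gave to item `d`, each weighted by how similar that user is to alice. The sum is left unnormalised here; dividing by the sum of similarities is a common variant.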
Research Question
• How does the choice of similarity measure determine the quality of the recommendations?
Sparseness
• Too many items exist, so many ratings will be missing
• A user’s neighbourhood is therefore likely to include “not-so-similar” users and/or items
“Best” similarity?
• Consider cosine similarity vs. Pearson similarity
• Most existing studies report that Pearson correlation leads to superior recommendation accuracy
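To make the difference between the two measures concrete, here is a small sketch for two users who rated the same items. Pearson correlation is cosine similarity applied to mean-centred ratings, which is how it is written below. The vectors `u` and `v` are invented for the example.

```python
import math

def cosine(x, y):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson correlation: cosine similarity after subtracting each user's mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])

# Two hypothetical users with opposite preferences on the same four items
u = [5, 4, 5, 4]
v = [4, 5, 4, 5]

print(cosine(u, v))   # close to 1: all ratings are high, so the vectors nearly align
print(pearson(u, v))  # -1.0: relative to each user's own mean, the tastes are opposite
```

Note that `pearson` as written divides by zero when a user gives every item the same rating; a real system would need to handle that case.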
“Best” similarity?
• Common variations to deal with sparse observations:
  – Item selection:
    • Compare full profiles, or only on overlap
  – Imputation:
    • Impute default value for unrated items
  – Filtering:
    • Threshold on minimal similarity value
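The three variations can be illustrated with a short sketch. The helper names `align`, `cosine` and `filter_neighbours`, the `mode` and `default` parameters, and the example ratings are assumptions made for this illustration; they are not the configuration used in the study.

```python
import math

def cosine(x, y):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

def align(u, v, mode="overlap", default=0.0):
    """Turn two rating dicts into comparable vectors.

    mode="overlap": item selection, keep only items rated by both users.
    mode="full":    compare full profiles; unrated items are imputed with `default`.
    """
    if mode == "overlap":
        items = sorted(set(u) & set(v))
    else:
        items = sorted(set(u) | set(v))
    return [u.get(i, default) for i in items], [v.get(i, default) for i in items]

def filter_neighbours(sims, threshold):
    """Filtering: drop candidate neighbours below a minimal similarity value."""
    return {user: s for user, s in sims.items() if s >= threshold}

u = {"a": 5, "b": 3}
v = {"a": 5, "c": 1}
on_overlap = cosine(*align(u, v, mode="overlap"))
full_zero = cosine(*align(u, v, mode="full", default=0.0))
full_three = cosine(*align(u, v, mode="full", default=3.0))
```

The same pair of users gets three different similarity values depending on the variation chosen, which is one reason results across studies are hard to compare.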
“Best” similarity?
• Cosine superior (!), but not for all settings
  – No consistent results
Analysis
Distance Distribution
• In high dimensions, nearest neighbour is unstable: a nearest-neighbour query is unstable for a given ε if the distance from the query point to most data points is less than (1 + ε) times the distance from the query point to its nearest neighbour
Beyer et al. When is “nearest neighbour” meaningful? ICDT 1999
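The instability criterion can be explored with a small simulation. The sketch below assumes uniformly random points and Euclidean distance, which is an illustrative setup and not the rating data analysed in the talk; `fraction_within`, `experiment` and their parameters are names chosen for this example.

```python
import math
import random

def fraction_within(points, query, eps):
    """Fraction of points whose distance to `query` is at most
    (1 + eps) times the distance to the query's nearest neighbour."""
    dists = [math.dist(p, query) for p in points]
    nearest = min(dists)
    return sum(d <= (1 + eps) * nearest for d in dists) / len(dists)

def experiment(dim, n_points=200, eps=0.5, seed=0):
    """Draw uniform random points and one query in `dim` dimensions."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    return fraction_within(points, query, eps)
```

Calling `experiment(2)` and `experiment(500)` shows the effect: in two dimensions only a small share of the points lies within a factor 1.5 of the nearest neighbour's distance, while in 500 dimensions almost all of them do, so the "nearest" neighbour is barely distinguishable from the rest.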
Distance Distribution
• Quality q(n, f): Fraction of users for which the similarity function has ranked at least n percent of the user community within a factor f of the nearest neighbour’s similarity value (well... its corresponding distance)
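One possible reading of q(n, f) in code is sketched below. It assumes a precomputed user-by-user distance matrix `dist` (for instance derived from similarity values), and it interprets "within a factor f" as distance ≤ f × nearest-neighbour distance. Both are assumptions for illustration; the exact definition is in the paper cited at the end of the talk.

```python
def quality(dist, n, f):
    """q(n, f): fraction of users for whom at least n percent of the other
    users lie within a factor f of the nearest neighbour's distance.

    dist[u][v] is the distance between users u and v (hypothetical input).
    """
    users = range(len(dist))
    hits = 0
    for u in users:
        others = [dist[u][v] for v in users if v != u]
        nearest = min(others)
        share = 100.0 * sum(d <= f * nearest for d in others) / len(others)
        if share >= n:
            hits += 1
    return hits / len(dist)

# Toy symmetric distance matrix for four users
dist = [
    [0.0, 1.0, 1.1, 5.0],
    [1.0, 0.0, 1.0, 4.0],
    [1.1, 1.0, 0.0, 6.0],
    [5.0, 4.0, 6.0, 0.0],
]
```

With this matrix, `quality(dist, 50, 1.2)` counts three of the four users: the first three each have two of their three peers within 1.2 times the nearest distance, while the last user has only one.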
NNk Graph
• Graph associated with the top k nearest neighbours
• Analysis focusing on the binary relation of whether a user does or does not belong to a neighbourhood
  – Ignore similarity values (already included in the distance distribution analysis)
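A minimal sketch of such a graph: each user gets a directed edge to each of its top k neighbours, and the similarity values are discarded once the edges exist. The in-degree count is included only as one example of a feature that could be read off the graph; the slides do not say which graph features were used, and the names `nnk_graph` and `in_degree` are made up here.

```python
def nnk_graph(sim, k):
    """Map each user to the set of its k most similar users.

    sim[u][v] is the similarity of u to v (hypothetical input); only the
    membership relation is kept, not the similarity values.
    """
    graph = {}
    for u, row in sim.items():
        ranked = sorted((v for v in row if v != u), key=lambda v: row[v], reverse=True)
        graph[u] = set(ranked[:k])
    return graph

def in_degree(graph):
    """How many neighbourhoods each user appears in."""
    counts = {u: 0 for u in graph}
    for nbrs in graph.values():
        for v in nbrs:
            counts[v] = counts.get(v, 0) + 1
    return counts

# Toy similarity values for three users
sim = {
    "a": {"b": 0.9, "c": 0.2},
    "b": {"a": 0.9, "c": 0.5},
    "c": {"a": 0.2, "b": 0.5},
}
graph = nnk_graph(sim, k=1)
```

Here user `b` ends up in two neighbourhoods and user `c` in none, even though the underlying similarity matrix is symmetric.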
MRR vs. Features
• Quality:
  – If most of the user population is far away, high similarity correlates with effectiveness
  – If most of the user population is close, high similarity correlates with ineffectiveness
Conclusions (so far)
• “Similarity features” correlate with recommendation effectiveness
  – “Stability” of a metric (as defined in database literature on k-NN search in high dimensions) is related to its ability to discriminate between good and bad neighbours
Future Work
• How can this knowledge be exploited to improve recommendation systems?
News Recommendation Challenge
Thanks
• Alejandro Bellogín – ERCIM fellow in the Information Access group
Details: Bellogín and De Vries, ICTIR 2013.