Cold-Start Management with Cross-Domain Collaborative Filtering and Tags

1. EC-Web - August 2013, Prague, Czech Republic
Cold-Start Management with Cross-Domain Collaborative Filtering and Tags
Manuel Enrich, Matthias Braunhofer, and Francesco Ricci
Free University of Bozen-Bolzano, Piazza Domenicani 3, 39100 Bolzano, Italy
{menrich,mbraunhofer,fricci}@unibz.it

2. Outline
  Recommender Systems and the Cold-Start Problem
  State of the Art
  Tag-Based Rating Prediction Models
  Experimental Evaluation
  Conclusions and Future Work

4. Recommender Systems (RSs)
Goal: recommend new, relevant items to users based on their feedback
Explicit feedback (ratings) vs. implicit feedback (purchase / browsing history)
Two basic technical approaches:
  Collaborative filtering (CF)
  Content-based

5. Cold-Start Problem
CF RSs suffer from the cold-start problem
New user problem: How do you recommend to a new user?
New item problem: How do you recommend a new item with no ratings?
Content-based RSs overcome the new item problem
[Figure: example rating matrices with unknown ("?") entries]

7. Cross-Domain CF (1/2)
Technique that uses user ratings in one (auxiliary) domain to improve the recommendation accuracy in another (target) domain (Berkovsky et al., 2007)
Example: [Figure: diagram with the labels "Ratings in target domain", "Ratings in auxiliary domain", "Recommender System", and "Recommended content in target domain"]

8. Cross-Domain CF (2/2)
Main limitation: limited applicability
It fails when no users or items are shared between the domains
Example: [Figure: rating matrices labeled "Ratings in target domain" and "Ratings in auxiliary domain"]

9.
Additional Knowledge Sources
Extend existing rating prediction models by incorporating additional sources of information about the users and items to better predict user preferences (Koren and Bell, 2011; Baltrunas et al., 2012)
Utilize implicit feedback, demographic data, contextual factors, ...
Main limitations:
  Extensive training sets (i.e., browsing / purchase histories, ratings in context) are required to learn the models
  Training sets are specific to the application's target domain

10. Tag-Induced Cross-Domain CF
Exploits user-generated tags that are shared across domains to link their users and items (Shi et al., 2011)
Cross-domain similarities calculated from user-assigned tags are used to constrain matrix factorization
Main limitations:
  Depends on a similarity function that might influence the recommendation quality
  Requires the target user to have tagged several items

12. Our Solution: Tag-Based Prediction Models
Main assumption: it is possible to exploit the information about how users tag and rate items in a particular domain to improve the prediction accuracy in another domain
Example: [Figure: rating matrices labeled "Target domain knowledge" and "Auxiliary domain knowledge" with items tagged "Exciting"; the unknown rating ("?") next to an "Exciting" tag is filled in as 5 in the second build of the slide]

14.
Latent Factor Models
Each user $u$ and item $i$ is associated with a latent factor vector, $p_u$ and $q_i$ respectively
The dot product captures the user's predicted overall interest in the item:
  $\hat{r}_{ui} = p_u q_i^T$
Factor vectors are learned using:
  Stochastic gradient descent
  Alternating least squares

15. Incorporating Tags
Consider that item $i$ has been tagged with a set of tags $T(i)$
We can use an additional set of factor vectors $y_t$, one for each tag $t$, expressing how much an item annotated with tag $t$ loads on the factors
The rating prediction function is now:
  $\hat{r}_{ui} = p_u \left( q_i^T + \sum_{t \in T(i)} y_t^T \right)$

16. 1st Proposed Model: UserItemTags
Main idea: a user's rating for an item may also be correlated with the specific tags the user attached to the item
Assumption: the target user has tagged the item
  $\hat{r}_{ui} = p_u \left( q_i^T + \frac{1}{|T_u(i)|} \sum_{t \in T_u(i)} y_t^T \right)$
  $p_u$: latent factor vector of user $u$
  $q_i$: latent factor vector of item $i$
  $T_u(i)$: set of tags assigned by user $u$ to item $i$
  $y_t$: latent factor vector of tag $t$

17. 2nd Proposed Model: UserItemRelTags
Main idea: same as before, except that we consider only relevant tags (i.e., tags that have a statistically significant influence on the ratings)
Assumption: the target user has tagged the item
  $\hat{r}_{ui} = p_u \left( q_i^T + \frac{1}{|TR_u(i)|} \sum_{t \in TR_u(i)} y_t^T \right)$
  $p_u$: latent factor vector of user $u$
  $q_i$: latent factor vector of item $i$
  $TR_u(i)$: set of relevant tags assigned by user $u$ to item $i$
  $y_t$: latent factor vector of tag $t$

18.
3rd Proposed Model: ItemRelTags
Main idea: the target user's rating for an item can be better predicted by modeling how tags overall influence the item's ratings
Advantage: doesn't require the target user to have tagged the item
  $\hat{r}_{ui} = p_u \left( q_i^T + \frac{1}{|TR_{oi}|} \sum_{t \in TR(i)} TR_{oi}(t)\, y_t^T \right)$
  $p_u$: latent factor vector of user $u$
  $q_i$: latent factor vector of item $i$
  $TR(i)$: set of relevant tags assigned to item $i$
  $TR_{oi}$: relevant tags applied to item $i$ (incl. duplicates)
  $TR_{oi}(t)$: number of occurrences of tag $t$ on item $i$

20. Used Datasets
2 tagged rating datasets (the two value columns are unlabeled in the extracted text)

  Statistic                                  Dataset 1   Dataset 2
  Total number of ratings                       24,564      24,564
  Unique users                                   2,026         283
  Unique items                                   5,088      12,554
  Unique tags                                    9,486       4,708
  Tag assignments                               44,805      78,239
  Average ratings per user                       12.12       86.80
  Average tags per rating                         1.82        3.18
  % of tags overlapping with the tags used
  in the other domain                            14.54       29.31

* The statistics refer to the datasets after performing some pre-processing

21. Cross-Domain Recommendations: Evaluation Design (1/2)
2 results for each model:
  MovieLens as target and LibraryThing as auxiliary domain
  LibraryThing as target and MovieLens as auxiliary domain
SVD model (Koren and Bell, 2011) used as baseline system
  Only data coming from the target domain used for training
Rating prediction accuracy measured in terms of Mean Absolute Error (MAE)

23.
Cross-Domain Recommendations: Evaluation Design (2/2)
(Extended) 10-fold cross-validation scheme:
  Break up the data from the target domain into 10 pieces
  Treat one piece as the test dataset and fit the model incrementally, adding 10% of the other nine pieces at each step (these, together with the data from the auxiliary domain, form the training data)
  Repeat
[Figure: blocks labeled "Target domain" and "Auxiliary domain", built up step by step over slides 23 to 34]

35.
Cross-Domain Recommendations: Evaluation Results (1/2)
Average MAEs using MovieLens as target and LibraryThing as auxiliary domain
[Figure: average MAE (y-axis, 0.74 to 0.98) against usage of data from the target domain (x-axis, 10% to 100%) for SVD, UserItemTags, UserItemRelTags, and ItemRelTags]

36. Cross-Domain Recommendations: Evaluation Results (2/2)
Average MAEs using LibraryThing as target and MovieLens as auxiliary domain
[Figure: average MAE (y-axis, 0.76 to 0.90) against usage of data from the target domain (x-axis, 10% to 100%) for SVD, UserItemTags, UserItemRelTags, and ItemRelTags]

37. Single-Domain Recommendations: Evaluation Design
Check the performance of the models using only rating and tagging data in the target domain
(Extended) 10-fold cross-validation scheme:
  In each of the 10 iterations, one split is used as the test set and the remaining data as the training set
  The training set is split into 10 further parts used for incremental training
SVD used as a baseline model

38. Single-Domain Recommendations: Evaluation Results (1/2)
Comparison of the models' MAEs, single-domain vs. cross-domain (MovieLens target)
[Figure: average MAE (y-axis, 0.74 to 0.98) against usage of data (x-axis, 10% to 100%) for SVD, UserItemRelTags, UserItemRelTags (cross-domain), ItemRelTags, and ItemRelTags (cross-domain)]

39. Single-Domain Recommendations: Evaluation Results (2/2)
Comparison of the models' MAEs, single-domain vs. cross-domain (LibraryThing target)
[Figure: average MAE (y-axis, 0.76 to 0.92) against usage of data (x-axis, 10% to 100%) for SVD, UserItemRelTags, UserItemRelTags (cross-domain), ItemRelTags, and ItemRelTags (cross-domain)]

41. Conclusions
Novel cross-domain recommendation approaches that:
  improve the prediction accuracy on a target domain using rating and tagging data from an auxiliary domain (assuming that there is a good tag overlap)
  are very useful in the cold-start situation (i.e., when only a small amount of training data is available in the target domain)
  also improve the rating prediction in a single-domain scenario (i.e., using only rating and tagging data in the target domain)

42. Future Work
Extended evaluation
  Better correlation of the algorithm performance to the characteristics of the datasets
  Usage of other datasets / comparison with other cross-domain RSs
Analysis of fields of application
  Exploitation in context-aware RSs
  Generation of more diverse recommendations

43. Questions? Thank you.
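
The rating prediction functions of the three proposed models (UserItemTags, UserItemRelTags, ItemRelTags) can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the dictionary-based tag lookup, and the fallback to the plain dot product when no tags are available are all assumptions made for illustration, and the factor vectors are taken as given rather than learned.

```python
import numpy as np

def predict_svd(p_u, q_i):
    """Plain latent factor prediction: dot product of user and item vectors."""
    return float(p_u @ q_i)

def predict_user_item_tags(p_u, q_i, tag_factors, user_tags):
    """UserItemTags-style prediction (hypothetical helper).

    Adds the mean factor vector of the tags the user assigned to the item.
    Passing only the relevant tags gives the UserItemRelTags variant.
    """
    if not user_tags:
        # Assumption: without tags, fall back to the plain dot product.
        return predict_svd(p_u, q_i)
    y_mean = np.mean([tag_factors[t] for t in user_tags], axis=0)
    return float(p_u @ (q_i + y_mean))

def predict_item_rel_tags(p_u, q_i, tag_factors, tag_counts):
    """ItemRelTags-style prediction (hypothetical helper).

    tag_counts maps each relevant tag of the item to its number of
    occurrences; the tag vectors are averaged with those counts as weights.
    """
    total = sum(tag_counts.values())
    if total == 0:
        # Assumption: without tags, fall back to the plain dot product.
        return predict_svd(p_u, q_i)
    y_weighted = sum(c * tag_factors[t] for t, c in tag_counts.items()) / total
    return float(p_u @ (q_i + y_weighted))

# Toy two-factor vectors, chosen by hand purely for illustration.
p_u = np.array([1.0, 2.0])
q_i = np.array([0.5, 0.5])
tag_factors = {
    "exciting": np.array([0.2, 0.0]),
    "boring": np.array([0.0, -0.4]),
}
```

Because the tag vectors are shared by every item carrying the tag, a tag such as "exciting" learned from auxiliary-domain ratings shifts the prediction for a target-domain item with the same tag, which is the mechanism the slides rely on for cold-start cases.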

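The extended 10-fold cross-validation scheme from the evaluation design slides can be sketched as follows. This is a minimal sketch under stated assumptions: ratings are rows of (user, item, rating), `fit_predict` is a hypothetical callback standing in for any of the models, and `global_mean_baseline` is a trivial stand-in used only to make the example runnable, not the SVD baseline from the slides.

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """Mean Absolute Error between true and predicted ratings."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def incremental_cv(target, auxiliary, fit_predict, n_folds=10, n_steps=10, seed=0):
    """Return an (n_folds, n_steps) array of MAEs.

    target, auxiliary: rows of (user, item, rating); pass an empty list as
    auxiliary for the single-domain setting.
    fit_predict(train, test): hypothetical callback that trains on `train`
    and returns one predicted rating per row of `test`.
    """
    target = np.asarray(target, dtype=float)
    auxiliary = np.asarray(auxiliary, dtype=float).reshape(-1, 3)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(target)), n_folds)
    maes = np.zeros((n_folds, n_steps))
    for k, test_idx in enumerate(folds):
        # The other folds form the pool of target-domain training data.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        test = target[test_idx]
        for s in range(1, n_steps + 1):
            # Use a growing share (10%, 20%, ...) of the target training pool.
            n_used = max(1, int(round(len(train_idx) * s / n_steps)))
            train = np.vstack([target[train_idx[:n_used]], auxiliary])
            maes[k, s - 1] = mean_absolute_error(test[:, 2], fit_predict(train, test))
    return maes

def global_mean_baseline(train, test):
    """Stand-in model (assumption): predict the mean training rating."""
    return np.full(len(test), train[:, 2].mean())
```

Averaging the result over folds, for example `incremental_cv(target, auxiliary, global_mean_baseline).mean(axis=0)`, gives one average MAE per share of target-domain data, which corresponds to the quantity plotted on the results slides.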