Collaborative Filtering and Recommender Systems by Navisro Analytics
DESCRIPTION
Recommendation system overview, types of recommender systems, and open-source tools/libraries available.
TRANSCRIPT
Recommender Systems Navisro Analytics
@navisro
http://www.navisro.com
ACM Data Mining Hackathon
8/18/2012
Capturing the Long Tail…
Recommender Approaches
Item Hierarchy
(You bought a printer, so you will also need ink - BestBuy)
Collaborative Filtering – User-User Similarity
(People like you who bought beer also bought diapers - Target)
Attribute-based recommendations
(You like action movies starring Clint Eastwood, so you might like “The Good, the Bad and the Ugly” - Netflix)
Collaborative Filtering – Item-Item Similarity
(You like Godfather so you will like Scarface - Netflix)
Social+Interest Graph Based
(Your friends like Lady Gaga so you will like Lady Gaga; PYMK – Facebook, LinkedIn)
Model Based
(Training SVM, LDA, SVD for implicit features)
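The item-item similarity approach listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy ratings, the user names, and the helper names (item_vectors, cosine, similar_items) are hypothetical and not taken from the slides, and cosine similarity is just one of several possible similarity measures.

```python
import math
from collections import defaultdict

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "alice": {"Godfather": 5, "Scarface": 4, "Amelie": 1},
    "bob": {"Godfather": 4, "Scarface": 5},
    "carol": {"Godfather": 1, "Amelie": 5},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    items = defaultdict(dict)
    for user, prefs in ratings.items():
        for item, rating in prefs.items():
            items[item][user] = rating
    return items


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_items(ratings, target):
    """Rank every other item by cosine similarity to `target`, best first."""
    items = item_vectors(ratings)
    scores = [(cosine(items[target], vec), other)
              for other, vec in items.items() if other != target]
    return sorted(scores, reverse=True)


ranked = similar_items(ratings, "Godfather")
print(ranked)
```

With this toy data, Scarface ranks above Amelie for Godfather because the two users who rated Scarface also rated Godfather highly.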
Other/Model-based Approaches
• Slope one recommender
• Latent factor Models for Web Data
– Matrix factorization using SVD, ALS, with Regularization
– LDA, SVM, Bayesian Clustering
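As an illustration of the Slope One recommender named above, here is a minimal sketch of the weighted Slope One scheme in plain Python. The function names (slope_one_fit, slope_one_predict) and the three-user toy data set are assumptions made for this example; production libraries differ in details such as storage and incremental updates.

```python
from collections import defaultdict


def slope_one_fit(ratings):
    """Compute the average rating difference and co-rating count per item pair."""
    diffs = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for prefs in ratings.values():
        for i, ri in prefs.items():
            for j, rj in prefs.items():
                if i != j:
                    diffs[i][j] += ri - rj
                    counts[i][j] += 1
    # Turn summed differences into averages.
    for i in diffs:
        for j in diffs[i]:
            diffs[i][j] /= counts[i][j]
    return diffs, counts


def slope_one_predict(diffs, counts, user_prefs, target):
    """Predict the user's rating of `target`, weighting by co-rating counts."""
    num, den = 0.0, 0
    for j, rj in user_prefs.items():
        c = counts[target].get(j, 0)
        if j != target and c > 0:
            num += (diffs[target][j] + rj) * c
            den += c
    return num / den if den else None


# Hypothetical toy data: user -> {item: rating}
ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
diffs, counts = slope_one_fit(ratings)
lucy_on_a = slope_one_predict(diffs, counts, ratings["lucy"], "A")
print(round(lucy_on_a, 2))
```

Here item A is rated on average 0.5 above B (two co-ratings) and 3 above C (one co-rating), so the prediction for lucy is (2 * 2.5 + 1 * 8) / 3.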
General Steps
• Data Prep
– Problem definition (user-based, item-based, ratings/binary…)
– Map-Reduce, cleansing, massaging data (input matrix)
– Training set, validation set
• Normalize
– Bias removal: Z-score, mean-centering, log
• Similarity weights/Neighbors
– Pearson Correlation Coefficient
– Cosine Similarity
– K-nearest neighbor
• Train
– Training model (only in model-based approaches)
• Predict
– Predict missing ratings
– Top-N predictions for every user
• Denormalize
– Reverse of normalization
• Evaluate Accuracy
– Accuracy, Precision, Recall, F1, ROC
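The normalization, similarity, and denormalization steps listed above can be sketched as follows. This is a rough illustration, not a reference implementation: the helper names (mean_center, z_score, denormalize, pearson) and the dict-based rating representation are assumptions made for the example, and the Pearson correlation is computed over co-rated items only, which is one common convention.

```python
import math


def mean_center(prefs):
    """Subtract the user's mean rating; return (centered ratings, mean)."""
    mean = sum(prefs.values()) / len(prefs)
    return {item: r - mean for item, r in prefs.items()}, mean


def z_score(prefs):
    """Mean-center and divide by the user's standard deviation."""
    mean = sum(prefs.values()) / len(prefs)
    std = math.sqrt(sum((r - mean) ** 2 for r in prefs.values()) / len(prefs))
    if std == 0:
        return {item: 0.0 for item in prefs}, mean, std
    return {item: (r - mean) / std for item, r in prefs.items()}, mean, std


def denormalize(value, mean, std=1.0):
    """Reverse of normalization: map a normalized value back to the rating scale."""
    return value * std + mean


def pearson(a, b):
    """Pearson correlation over the items that both users rated."""
    common = [k for k in a if k in b]
    n = len(common)
    if n < 2:
        return 0.0
    mean_a = sum(a[k] for k in common) / n
    mean_b = sum(b[k] for k in common) / n
    cov = sum((a[k] - mean_a) * (b[k] - mean_b) for k in common)
    var_a = sum((a[k] - mean_a) ** 2 for k in common)
    var_b = sum((b[k] - mean_b) ** 2 for k in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Hypothetical users: the second rates on a stretched scale but agrees in order.
u = {"x": 1, "y": 2, "z": 3}
v = {"x": 2, "y": 4, "z": 6}
print(pearson(u, v))
print(mean_center({"x": 1, "y": 3}))
```

Normalizing per user removes the bias of users who rate everything high or low; predictions made in the normalized space are mapped back with denormalize before being shown or evaluated.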
User-based CF
Reference: Recommenderlab vignette, http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf
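A minimal sketch of user-based CF, assuming a small in-memory dict of ratings: find the k most similar users who rated the target item, then predict the active user's mean plus the similarity-weighted average of the neighbours' mean-centered ratings. The data, the function names (pearson, predict, recommend), and the choice to drop non-positive similarities are assumptions of this example, not details taken from the recommenderlab vignette.

```python
import math

# Hypothetical toy data: user -> {item: rating}
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 5, "b": 3, "c": 4, "d": 5},
    "u3": {"a": 1, "b": 5, "c": 2, "d": 1},
}


def pearson(a, b):
    """Pearson correlation over co-rated items (0.0 if undefined)."""
    common = [k for k in a if k in b]
    if len(common) < 2:
        return 0.0
    ma = sum(a[k] for k in common) / len(common)
    mb = sum(b[k] for k in common) / len(common)
    cov = sum((a[k] - ma) * (b[k] - mb) for k in common)
    va = sum((a[k] - ma) ** 2 for k in common)
    vb = sum((b[k] - mb) ** 2 for k in common)
    return cov / math.sqrt(va * vb) if va and vb else 0.0


def predict(ratings, user, item, k=2):
    """Predict a rating from the k nearest positively correlated neighbours."""
    means = {u: sum(p.values()) / len(p) for u, p in ratings.items()}
    sims = []
    for other, prefs in ratings.items():
        if other == user or item not in prefs:
            continue
        s = pearson(ratings[user], prefs)
        if s > 0:  # assumption: ignore dissimilar users
            sims.append((s, other))
    neighbours = sorted(sims, reverse=True)[:k]
    if not neighbours:
        return means[user]  # fall back to the user's own mean
    num = sum(s * (ratings[o][item] - means[o]) for s, o in neighbours)
    den = sum(s for s, _ in neighbours)
    return means[user] + num / den


def recommend(ratings, user, n=3, k=2):
    """Top-n unseen items for `user`, ranked by predicted rating."""
    all_items = {i for prefs in ratings.values() for i in prefs}
    unseen = all_items - set(ratings[user])
    scored = [(predict(ratings, user, i, k), i) for i in unseen]
    return sorted(scored, reverse=True)[:n]


print(predict(ratings, "u1", "d"))
print(recommend(ratings, "u1"))
```

In this toy data u2 agrees perfectly with u1 and u3 is negatively correlated, so only u2 contributes: u1's mean of 4.0 plus u2's deviation of 0.75 on item d.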
Challenges
• Dimensionality reduction (e.g. use PCA)
• Input data sparsity (and the related cold-start problem for new users and items)
• Overfitting to training data set (use regularization)
• Data wrangling, in general…
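To make the regularization point above concrete, the following sketch factorizes a tiny rating set with stochastic gradient descent and an L2 penalty on the latent factors. It is a simplified illustration: the function names (train_mf, predict), the hyperparameter values, and the toy ratings are all assumptions, and real systems would hold out a validation set to tune the penalty.

```python
import math
import random


def train_mf(ratings, n_factors=2, lr=0.01, reg=0.05, epochs=200, seed=0):
    """Fit user/item latent factors by SGD with L2 regularization.

    `ratings` is a list of (user, item, rating) triples. Returns the user
    factors, item factors, and the training RMSE after each epoch.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.random() * 0.5 for _ in range(n_factors)] for u in users}
    Q = {i: [rng.random() * 0.5 for _ in range(n_factors)] for i in items}
    history = []
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step; the `reg` term shrinks factors toward zero.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
        sq = sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
                 for u, i, r in ratings)
        history.append(math.sqrt(sq / len(ratings)))
    return P, Q, history


def predict(P, Q, user, item):
    """Predicted rating is the dot product of the latent factor vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


# Hypothetical toy data: (user, item, rating)
ratings = [
    ("u1", "a", 5), ("u1", "b", 3),
    ("u2", "a", 4), ("u2", "c", 1),
    ("u3", "b", 2), ("u3", "c", 5),
]
P, Q, history = train_mf(ratings)
print(history[0], history[-1])
print(predict(P, Q, "u1", "c"))
```

A larger reg value trades a worse fit on the training ratings for smaller factors, which is the usual lever against overfitting sparse data.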
Just How Good is your Recommender?
• Evaluation of predicted ratings (Mean Absolute Error, Root Mean Squared Error)
• Evaluation of top-N recommendations
– Mean Absolute Error
– Accuracy
– Precision & Recall (F1 score)
– ROC curve
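The measures above are straightforward to compute by hand. The sketch below assumes predicted ratings are compared against held-out true ratings, and that a top-N list is compared against the set of items the user actually liked; the function names and sample values are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error between two equal-length rating lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error; penalizes large errors more than MAE."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F1 of a top-N list against the relevant items."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical held-out ratings and predictions
actual = [4, 3, 5]
predicted = [3, 3, 4]
print(mae(actual, predicted), rmse(actual, predicted))

# Hypothetical top-4 list versus the items the user actually liked
print(precision_recall_f1(["a", "b", "c", "d"], ["a", "c", "e"]))
```

In the second call two of the four recommended items are relevant, so precision is 0.5, recall is 2/3, and F1 is 4/7.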
Tools
Open Source Tools
Software | Description | Language | URL
Apache Mahout | Hadoop ML library that includes Collaborative Filtering | Java | http://mahout.apache.org/
Cofi | Collaborative Filtering Library | Java | http://www.nongnu.org/cofi/
Crab | Components to create recommender systems | Python | https://github.com/muricoca/crab
easyrec | Recommender for web pages | Java | http://easyrec.org/
LensKit | Collaborative Filtering algorithms from GroupLens Research | Java | http://lenskit.grouplens.org/
MyMediaLite | Recommender system algorithms | C#/Mono | http://mloss.org/software/view/282/
SVDFeature | Toolkit for Feature based Matrix Factorization | C++ | http://mloss.org/software/view/333/
Vogoo PHP LIB | Collaborative Filtering for personalized web sites | PHP | http://sourceforge.net/projects/vogoo/
recommenderlab | R library for developing and testing collaborative filtering systems | R | http://cran.r-project.org/web/packages/recommenderlab/index.html
Scikit-learn | Python module integrating classic ML algorithms in scientific Python packages (numpy, scipy, matplotlib) | Python | http://scikit-learn.org/stable/
recommenderlab
Reference: Recommenderlab vignette, http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf
Mahout
DataModel model = new FileDataModel(new File("data.txt"));
// Construct the list of pre-computed correlations
Collection<GenericItemSimilarity.ItemItemSimilarity> correlations = ...;
ItemSimilarity itemSimilarity = new GenericItemSimilarity(correlations);
Recommender recommender = new GenericItemBasedRecommender(model, itemSimilarity);
Recommender cachingRecommender = new CachingRecommender(recommender);
...
// Ask for the top 10 recommendations for user ID 1234
List<RecommendedItem> recommendations = cachingRecommender.recommend(1234, 10);
Peter Harrington’s Sample Py Code
2. References & Reading
• High Level Reading
– Programming Collective Intelligence by Toby Segaran. The 2nd chapter gives a good introduction to collaborative filtering with Python examples (non-SVD).
– Matrix Factorization Techniques for Recommender Systems, Yehuda Koren; Robert Bell; Chris Volinsky, IEEE Computer, 2009, 8
• Singular Value Decomposition (SVD) Reading
– The Singular Value Decomposition, by Jody Hourigan and Lynn McIndoo, Linear Algebra – Math 45. http://online.redwoods.edu/INSTRUCT/darnold/LAPROJ/Fall98/JodLynn/report2.pdf (with Matlab & image examples)
– Numerical Recipes, 3rd Edition, Press et al., 2007, pp. 65-75.
References & Reading (continued)
• Collaborative Filtering Reading
– See papers on research.yahoo.com/Yehuda_Koren
– Collaborative Filtering for Implicit Feedback Datasets, Yifan Hu; Yehuda Koren; Chris Volinsky, IEEE International Conference on Data Mining (ICDM 2008), IEEE, 2008
– Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model, Yehuda Koren, ACM Int. Conference on Knowledge Discovery and Data Mining (KDD’08), 2008
– Collaborative Filtering with Temporal Dynamics, Yehuda Koren, KDD 2009, ACM, 2009
– James Thornton’s CF Blog http://original.jamesthornton.com/cf/
– Apache Mahout Recommender https://cwiki.apache.org/MAHOUT/recommender-documentation.html
– Flexible Collaborative Filtering In Java With Mahout Taste - Philippe Adjiman
– Books, Articles and Tutorials on Mahout/Cofi
Questions?