Combining Predictions for Accurate Recommender Systems
M. Jahrer [1], A. Töscher [1], R. Legenstein [2]
[1] Commendo Research & Consulting
[2] Institute for Theoretical Computer Science, Graz University of Technology
KDD '10
2010. 11. 26.
Summarized and Presented by Sang-il Song, IDS Lab., Seoul National University
Contents
The Netflix Prize
Netflix Dataset
Challenges of Recommender Systems
Review: Collaborative Filtering Techniques
Motivation
Blending Techniques
Linear Regression
Binned Linear Regression
Neural Network
Bagged Gradient Boosted Decision Tree
Kernel Ridge Regression
K-Nearest Neighbor Blending
Results
Conclusion
The Netflix Prize
Open competition for the best collaborative filtering algorithm
The objective is to improve the performance of Netflix’s own recommendation algorithm by 10%
Netflix Dataset
480,189 users
17,770 movies
100,480,507 ratings (training data)
Each rating is a tuple <user, movie, date of grade, grade>
Recommendation Problem
A sparse user-item rating matrix (rows: users, columns: movies); the task is to predict the missing entries (?):

      m1   m2   m3   m4   m5  ...  mN
u1     3    ?    2    1
u2     ?    4    ?
u3     ?    2    3    2
u4     1    ?
u5     5    5
...
uM     ?    1    2
Measure of CF Algorithms
Root Mean Square Error (RMSE):

$\mathrm{RMSE} = \sqrt{\tfrac{1}{N}\sum_{(u,i)} (\hat{r}_{ui} - r_{ui})^2}$

where $\hat{r}_{ui}$ is the rating estimated by the algorithm, $r_{ui}$ is the true rating, and N is the size of the test dataset.
The original Netflix algorithm, called "Cinematch", achieved an RMSE of about 0.95
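A minimal NumPy sketch of this measure (the function name and toy values are ours):

```python
import numpy as np

# Minimal RMSE computation: y_true are held-out ratings, y_pred the
# algorithm's estimates for the same (user, movie) pairs.
def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

print(rmse([3, 2, 4], [2.5, 2.0, 4.5]))  # ~0.408
```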
Challenges of Recommender Systems
Size of Data
Places a premium on efficient algorithms
Stretches the memory limits of standard PCs
99% of data are missing
Eliminates many standard prediction methods
Certainly not missing at random
Countless factors may affect ratings
Large imbalance in training data
Number of ratings per user or movie varies by several orders of magnitude
Information to estimate individual parameters varies widely
Reference: R. Bell, Lessons from the Netflix Prize
Collaborative Filtering Techniques
Memory-based Approach
KNN user-user
KNN item-item
Model-based Approach
Singular Value Decomposition (SVD)
Asymmetric Factor Model (AFM)
Restricted Boltzmann Machine (RBM)
Global Effect (GE)
Combination: Residual Training
KNN user-user
Traditional Approach for Collaborative Filtering
Method:
Find the k users most similar to user u
Aggregate their ratings of item i
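A minimal sketch of this method, assuming cosine similarity on co-rated items and a similarity-weighted average (the similarity measure and the toy matrix are our choices; the slides do not fix them):

```python
import numpy as np

def knn_user_user(R, u, i, k=2):
    """Predict R[u, i] from the k users most similar to u.
    R is a dense user x item matrix with np.nan for missing ratings."""
    sims = []
    for v in range(len(R)):
        if v == u or np.isnan(R[v, i]):
            continue                         # v must have rated item i
        common = ~np.isnan(R[u]) & ~np.isnan(R[v])
        if not common.any():
            continue
        a, b = R[u, common], R[v, common]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom > 0:
            sims.append((a @ b / denom, v))  # cosine on co-rated items
    top = sorted(sims, reverse=True)[:k]
    if not top:
        return np.nan
    # Similarity-weighted average of the neighbors' ratings of item i.
    return sum(s * R[v, i] for s, v in top) / sum(s for s, _ in top)

R = np.array([[3.0, np.nan, 2.0, 1.0],
              [np.nan, 4.0, np.nan, 2.0],
              [3.0, 4.0, 2.0, np.nan]])
print(knn_user_user(R, u=2, i=3))   # estimate user 3's rating of movie 4
```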
KNN item-item
The symmetric counterpart of KNN user-user: just flip the user and item sides
Method:
Find the k items most similar to item i
Aggregate their ratings by user u
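Thanks to the symmetry, the same sketch can be reused on the transposed rating matrix (reusing knn_user_user and R from the sketch above):

```python
def knn_item_item(R, u, i, k=2):
    # Item-item KNN is user-user KNN on the transposed rating matrix.
    return knn_user_user(R.T, u=i, i=u, k=k)

print(knn_item_item(R, u=2, i=3))
```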
SVD (Matrix Factorization)
Singular Value Decomposition
A dimension reduction technique based on matrix factorization
Captures latent semantics
SVD Example
R is factorized into R = U Σ Vᵀ:

R:
      i1   i2   i3   i4   i5
u1     2    0    3    5    0
u2     1    2    0    1    4
u3     3    0    4    4    0
u4     2    0    1    5    0
u5     0    5    0    0    5

U:
      f1     f2     f3
u1    .59   -.11   -.01
u2    .18    .51   -.18
u3    .60   -.11    .65
u4    .50   -.08   -.73
u5    .09    .85    .12

Σ = diag(10, 8.2, 2.2)

Vᵀ:
      i1     i2     i3     i4     i5
f1    .40    .08    .45    .78    .11
f2   -.02    .64   -.10   -.10    .76
f3    .13    .11    .81   -.55   -.05
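A minimal NumPy sketch of the rank-3 truncated SVD of this matrix (variable names are ours; the signs of the singular vectors may differ from the slide):

```python
import numpy as np

R = np.array([[2, 0, 3, 5, 0],
              [1, 2, 0, 1, 4],
              [3, 0, 4, 4, 0],
              [2, 0, 1, 5, 0],
              [0, 5, 0, 0, 5]], float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 3                                  # keep the k largest singular values
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(s[:k], 1))   # leading singular values (the diagonal of Σ)
print(np.round(R_k, 1))     # low-rank reconstruction used as estimates
```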
Asymmetric Factor Model (AFM)
An Extension of SVD
Each item is represented by a feature vector (as in SVD)
Each user is represented by the items he or she rated (unlike SVD)
[Figure: item feature vectors Item 1 = (0.8, 0.1, 1.2), Item 2 = (0.2, 1.5, 0.0), Item 3 = (0.2, 0.2, 0.0), combined into the user feature vector (0.4, 0.3, 1.2)]
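A minimal sketch of the idea; the normalized-sum combination rule below is a common AFM-style choice and our assumption, since the slide does not give the exact rule:

```python
import numpy as np

# Item feature vectors (from the figure above).
item_features = np.array([[0.8, 0.1, 1.2],
                          [0.2, 1.5, 0.0],
                          [0.2, 0.2, 0.0]])

def user_vector(rated_items):
    """User profile as a normalized sum of the feature vectors of the
    items the user rated (one common AFM-style choice)."""
    Q = item_features[rated_items]
    return Q.sum(axis=0) / np.sqrt(len(rated_items))

u = user_vector([0, 1, 2])
# Predicted preference for each item: dot product of user and item vectors.
print(item_features @ u)
```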
Restricted Boltzmann Machine (RBM)
A neural network with one visible layer and one hidden layer
Handles the sparsity of the data very well
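The rating RBM used on the Netflix data has softmax visible units; purely to illustrate the training loop, here is a generic Bernoulli RBM trained with one step of contrastive divergence (CD-1), entirely our sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy binary data, e.g. "user liked this movie" indicators.
V = rng.integers(0, 2, size=(100, 6)).astype(float)

n_vis, n_hid = 6, 4
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)   # visible/hidden biases
lr = 0.1

for epoch in range(50):
    # Positive phase: hidden activations given the data.
    ph = sigmoid(V @ W + b_h)
    h = (rng.random(ph.shape) < ph).astype(float)
    # Negative phase (CD-1): reconstruct visibles, recompute hiddens.
    pv = sigmoid(h @ W.T + b_v)
    ph2 = sigmoid(pv @ W + b_h)
    # Gradient step on the contrastive divergence approximation.
    W += lr * (V.T @ ph - pv.T @ ph2) / len(V)
    b_v += lr * (V - pv).mean(axis=0)
    b_h += lr * (ph - ph2).mean(axis=0)
```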
Global Effects
Motivated by data normalization
Based on user and item features:
support (number of votes)
mean rating
rating standard deviation
Effective when applied to the residuals of other algorithms
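A minimal sketch of one such effect: a per-user mean fitted to the residuals of a previous model, shrunk toward zero for low-support users (the shrinkage form is a common choice, not taken from the slides):

```python
import numpy as np

def user_effect(users, residuals, shrinkage=25.0):
    """Per-user mean residual, shrunk toward 0 for low-support users:
    theta_u = sum(residuals of u) / (support_u + shrinkage)."""
    return {u: residuals[users == u].sum() / ((users == u).sum() + shrinkage)
            for u in np.unique(users)}

users = np.array([0, 0, 0, 1, 1, 2])
residuals = np.array([0.5, 0.3, 0.4, -0.2, -0.4, 0.1])
theta = user_effect(users, residuals)
# Remove the effect and hand the new residuals to the next effect/model.
new_residuals = residuals - np.array([theta[u] for u in users])
```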
Residual Training
A popular method to combine CF algorithms
Several models are trained sequentially, each fitted to the residual error left by its predecessors
[Diagram: Model 1 → Model 2 → Model 3, chained on residuals]
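A minimal sketch of the pattern with two stages (the Ridge models are stand-ins for arbitrary CF models; the data is synthetic):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
y = X @ np.array([0.5, -0.2, 0.1, 0.3]) + 0.1 * rng.standard_normal(200)

# Stage 1: fit the first model to the ratings.
m1 = Ridge(alpha=1.0).fit(X, y)
# Stage 2: fit the next model to what the first left unexplained.
m2 = Ridge(alpha=1.0).fit(X, y - m1.predict(X))
# The final prediction is the sum of the chained models.
y_hat = m1.predict(X) + m2.predict(X)
print(np.sqrt(np.mean((y_hat - y) ** 2)))   # training RMSE of the chain
```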
Motivation
Combining different kinds of collaborative filtering algorithms leads to significant performance improvements over the individual algorithms
“Thanks to Paul Harrison's collaboration, a simple mix of our solutions improved our result from 6.31 to 6.75”
Rookies
“My approach is to combine the results of many methods (also two-way interactions between them) using linear regression on the test set. The best method in my ensemble is regularized SVD with biases, post processed with kernel ridge regression”
Arek Paterek
http://rainbow.mimuw.edu.pl/~ap/ap_kdd.pdf
“When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix’s own system.”
U of Toronto
http://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf
Gravity
home.mit.bme.hu/~gtakacs/download/gravity.pdf
“Our common team blends the result of team Gravity and team Dinosaur Planet.”
Might have guessed from the name…
When Gravity and Dinosaurs Unite
And, yes, the top team, which is from AT&T…
"Our final solution (RMSE=0.8712) consists of blending 107 individual results."
BellKor / KorBell
Blending Problem
Given the predictions of several algorithms for each data point, learn to predict the true rating:

          Alg 1  Alg 2  Alg 3  Alg 4  Rating
Data 1     3      3.3    3.2    2.5     3
Data 2     2.2    2.4    3      1.9     2
Data 3     2.8    3.2    3      2.9     ?
Data 4     1      1.1    1.3    2       3
Data 5     0.9    1.1    1.2    1       ?
Blending Methods
Linear Regression (baseline)
Binned Linear Regression
Neural Network
Bagged Gradient Boosted Decision Tree
Kernel Ridge Regression
K-Nearest Neighbor Blending
Linear Regression
Baseline
Assumes a quadratic error function
Finds the optimal linear combination weights w by solving the least squares problem
The weights w can be calculated with ridge regression (see the sketch below)
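A minimal sketch of the ridge solution w = (XᵀX + λI)⁻¹ Xᵀ y, using the labeled rows of the blending table above (λ is our placeholder):

```python
import numpy as np

# Closed-form ridge regression: w = (X'X + lam*I)^-1 X'y.
def ridge_blend(X, y, lam=0.1):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X = np.array([[3.0, 3.3, 3.2, 2.5],    # Data 1
              [2.2, 2.4, 3.0, 1.9],    # Data 2
              [1.0, 1.1, 1.3, 2.0]])   # Data 4
y = np.array([3.0, 2.0, 3.0])          # their true ratings

w = ridge_blend(X, y)
x_new = np.array([2.8, 3.2, 3.0, 2.9]) # Data 3, rating unknown
print(x_new @ w)                       # blended rating estimate
```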
Binned Linear Regression
A simple extension of linear regression
The training dataset can be divided into B disjoint subsets
The training dataset may be very large
Each subset is used to learn a different weight vector w_b (see the sketch below)
The training set can be split using criteria such as:
Support (number of votes)
Time
Frequency (number of ratings from a user at day t)
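A minimal sketch of support-based binning (bin edges, synthetic data, and hyperparameters are our placeholders):

```python
import numpy as np

def ridge(X, y, lam=0.1):
    # Same closed-form ridge fit as in the linear regression sketch.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def fit_binned(X, y, support, edges, lam=0.1):
    """One ridge weight vector w_b per support bin; edges are bin borders."""
    bins = np.digitize(support, edges)
    return {b: ridge(X[bins == b], y[bins == b], lam) for b in np.unique(bins)}

rng = np.random.default_rng(0)
X = rng.uniform(1, 5, size=(1000, 4))        # predictions of 4 algorithms
y = X.mean(axis=1) + 0.1 * rng.standard_normal(1000)
support = rng.integers(1, 500, size=1000)    # e.g. votes per user

edges = [10, 100]                            # 3 bins: <10, 10-99, >=100
weights = fit_binned(X, y, support, edges)
# At prediction time, use the weights of the sample's bin.
x_new, s_new = np.array([3.0, 3.2, 2.9, 3.1]), 42
print(x_new @ weights[np.digitize([s_new], edges)[0]])
```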
Neural Network (NN)
Efficient for huge data sets
[Diagram: feed-forward network blender with inputs Alg 1–Alg 4 and output Rating]
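A minimal sketch using scikit-learn's MLPRegressor as the blender (the hidden layer size and other hyperparameters are our choices, not the paper's):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(1, 5, size=(5000, 4))   # predictions of 4 CF algorithms
y = X.mean(axis=1) + 0.1 * rng.standard_normal(5000)

# One hidden layer; size and other hyperparameters are our placeholders.
nn = MLPRegressor(hidden_layer_sizes=(30,), max_iter=500, random_state=0)
nn.fit(X, y)
print(nn.predict([[3.0, 3.3, 3.2, 2.5]]))   # blended rating estimate
```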
Bagged Gradient Boosted Decision Tree (BGBDT)
Single Decision Tree
Discretized output => limits its ability to model smooth functions
The number of possible outputs corresponds to the number of leaves
A single tree is trained recursively, always splitting the leaf that provides the output value for the largest number of training samples
Bagging
Train N_bag copies of the model, each on a slightly different (bootstrap) training set
(Stochastic Gradient) Boosting
Each model learns only a fraction of the desired function Ω
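A minimal sketch with scikit-learn: N_bag stochastic gradient boosting models, each trained on a bootstrap resample and averaged at prediction time (all hyperparameters are our placeholders):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(1, 5, size=(2000, 4))
y = X.mean(axis=1) + 0.1 * rng.standard_normal(2000)

n_bag, models = 8, []
for b in range(n_bag):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap resample
    gbdt = GradientBoostingRegressor(
        n_estimators=100, max_leaf_nodes=8, learning_rate=0.1,
        subsample=0.5,   # <1.0 makes the boosting stochastic
        random_state=b)
    models.append(gbdt.fit(X[idx], y[idx]))

# Bagged prediction: average the ensemble.
x_new = np.array([[3.0, 3.3, 3.2, 2.5]])
print(np.mean([m.predict(x_new)[0] for m in models]))
```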
Kernel Ridge Regression Blending (KRR)
Kernel Ridge Regression
A regularized least-squares method for classification and regression
Similar to an SVM, but it also gives weight to points that are not close to the decision boundary
Suitable for a small number of features; large training sets become expensive:
Training complexity: O(n³)
Space requirements: O(n²)
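A minimal sketch with a Gaussian (RBF) kernel, solving α = (K + λI)⁻¹ y (kernel width, λ, and the toy data are our placeholders):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # K[i, j] = exp(-gamma * ||A_i - B_j||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam=0.1, gamma=1.0):
    # alpha = (K + lam*I)^-1 y
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train, gamma) @ alpha

X = np.array([[3.0, 3.3], [2.2, 2.4], [1.0, 1.1]])   # toy blend features
y = np.array([3.0, 2.0, 3.0])
alpha = krr_fit(X, y)
print(krr_predict(X, alpha, np.array([[2.8, 3.2]])))
```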
K-Nearest Neighbor Blending (KNN)
Find the k most similar training samples <user, item> in the space of the algorithms' predictions
Aggregate their target values (ratings)
          Alg 1  Alg 2  Alg 3  Alg 4  Rating
Sample 1    4     3.2    3.2    3.6     ?
Sample 2    3     2.7    2      2.9     3
Sample 3    1     1.2    0.8    0.9     1.5
Sample 4    4     3      3.3    3.3     3.3
Sample 5    2     2      2      2       2
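A minimal sketch that estimates Sample 1's rating from its k nearest labeled samples in prediction space (Euclidean distance and uniform averaging are our assumptions):

```python
import numpy as np

# Labeled samples (Sample 2..5 from the table above) and their ratings.
X = np.array([[3, 2.7, 2.0, 2.9],
              [1, 1.2, 0.8, 0.9],
              [4, 3.0, 3.3, 3.3],
              [2, 2.0, 2.0, 2.0]])
y = np.array([3.0, 1.5, 3.3, 2.0])

def knn_blend(X, y, x_new, k=3):
    d = np.linalg.norm(X - x_new, axis=1)   # Euclidean distance
    nearest = np.argsort(d)[:k]
    return y[nearest].mean()                # uniform average of targets

print(knn_blend(X, y, np.array([4, 3.2, 3.2, 3.6])))  # Sample 1 estimate
```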
Experimental Setup
18 CF Algorithms
4 versions of AFM
4 versions of GE
4 versions of KNN-item
2 versions of RBM
4 versions of SVD
1,400,000 samples
Run on a 3.8 GHz CPU with 12 GB main memory
Results
Conclusions
Combinations of collaborative filtering algorithms outperform the single collaborative filtering algorithms
Thank you