Combining Predictions for Accurate Recommender Systems
M. Jahrer [1], A. Töscher [1], R. Legenstein [2]
[1] Commendo Research & Consulting
[2] Institute for Theoretical Computer Science, Graz University of Technology
KDD '10
2010. 11. 26.
Summarized and Presented by Sang-il Song, IDS Lab., Seoul National University
Contents
The Netflix Prize
Netflix Dataset
Challenges of Recommender Systems
Review: Collaborative Filtering Techniques
Motivation
Blending Techniques
Linear Regression
Binned Linear Regression
Neural Network
Bagged Gradient Boosted Decision Tree
Kernel Ridge Regression
K-Nearest Neighbor Blending
Results
Conclusion
The Netflix Prize
Open competition for the best collaborative filtering algorithm
The objective is to improve the performance of Netflix’s own recommendation algorithm by 10%
Netflix Dataset
480,189 users
17,770 movies
100,480,507 ratings (training data)
Each rating is a tuple <user, movie, date of grade, grade>
Recommendation Problem
A sparse user-item rating matrix (rows: users, columns: movies); the task is to predict the missing entries (?):

      m1   m2   m3   m4   m5  ...  mN
u1     3    ?    2    1
u2     ?    4    ?
u3     ?    2    3    2
u4     1    ?
u5     5    5
...
uM     ?    1    2
Measure of CF Algorithms
Root Mean Square Error (RMSE):

$\mathrm{RMSE} = \sqrt{\tfrac{1}{N}\sum_{(u,i)} (\hat{r}_{ui} - r_{ui})^2}$

where $\hat{r}_{ui}$ is the rating estimated by the algorithm, $r_{ui}$ is the true rating, and N is the size of the test dataset.
The original Netflix algorithm, called "Cinematch", achieved an RMSE of about 0.95
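A minimal NumPy sketch of this measure (the function name and toy values are ours):

```python
import numpy as np

# Minimal RMSE computation: y_true are held-out ratings, y_pred the
# algorithm's estimates for the same (user, movie) pairs.
def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

print(rmse([3, 2, 4], [2.5, 2.0, 4.5]))  # ~0.408
```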
Challenges of Recommender Systems
Size of Data
Places a premium on efficient algorithms
Stretches the memory limits of standard PCs
99% of data are missing
Eliminates many standard prediction methods
Certainly not missing at random
Countless factors may affect ratings
Large imbalance in training data
Number of ratings per user or movie varies by several orders of magnitude
Information to estimate individual parameters varies widely
Reference: R. Bell, Lessons from the Netflix Prize
Collaborative Filtering Techniques
Memory-based Approach
KNN user-user
KNN item-item
Model-based Approach
Singular Value Decomposition (SVD)
Asymmetric Factor Model (AFM)
Restricted Boltzmann Machine (RBM)
Global Effect (GE)
Combination: Residual Training
KNN user-user
Traditional Approach for Collaborative Filtering
Method:
Find the k users most similar to user u
Aggregate their ratings of item i
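A minimal sketch of this method, assuming cosine similarity on co-rated items and a similarity-weighted average (the similarity measure and the toy matrix are our choices; the slides do not fix them):

```python
import numpy as np

def knn_user_user(R, u, i, k=2):
    """Predict R[u, i] from the k users most similar to u.
    R is a dense user x item matrix with np.nan for missing ratings."""
    sims = []
    for v in range(len(R)):
        if v == u or np.isnan(R[v, i]):
            continue                         # v must have rated item i
        common = ~np.isnan(R[u]) & ~np.isnan(R[v])
        if not common.any():
            continue
        a, b = R[u, common], R[v, common]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom > 0:
            sims.append((a @ b / denom, v))  # cosine on co-rated items
    top = sorted(sims, reverse=True)[:k]
    if not top:
        return np.nan
    # Similarity-weighted average of the neighbors' ratings of item i.
    return sum(s * R[v, i] for s, v in top) / sum(s for s, _ in top)

R = np.array([[3.0, np.nan, 2.0, 1.0],
              [np.nan, 4.0, np.nan, 2.0],
              [3.0, 4.0, 2.0, np.nan]])
print(knn_user_user(R, u=2, i=3))   # estimate user 3's rating of movie 4
```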
KNN item-item
The symmetric counterpart of KNN user-user: just flip the user and item sides
Method:
Find the k items most similar to item i
Aggregate their ratings by user u
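Thanks to the symmetry, the same sketch can be reused on the transposed rating matrix (reusing knn_user_user and R from the sketch above):

```python
def knn_item_item(R, u, i, k=2):
    # Item-item KNN is user-user KNN on the transposed rating matrix.
    return knn_user_user(R.T, u=i, i=u, k=k)

print(knn_item_item(R, u=2, i=3))
```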
SVD (Matrix Factorization)
Singular Value Decomposition
A dimension reduction technique based on matrix factorization
Captures latent semantics
SVD Example
R is factorized into R = U Σ Vᵀ:

R:
      i1   i2   i3   i4   i5
u1     2    0    3    5    0
u2     1    2    0    1    4
u3     3    0    4    4    0
u4     2    0    1    5    0
u5     0    5    0    0    5

U:
      f1     f2     f3
u1    .59   -.11   -.01
u2    .18    .51   -.18
u3    .60   -.11    .65
u4    .50   -.08   -.73
u5    .09    .85    .12

Σ = diag(10, 8.2, 2.2)

Vᵀ:
      i1     i2     i3     i4     i5
f1    .40    .08    .45    .78    .11
f2   -.02    .64   -.10   -.10    .76
f3    .13    .11    .81   -.55   -.05
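A minimal NumPy sketch of the rank-3 truncated SVD of this matrix (variable names are ours; the signs of the singular vectors may differ from the slide):

```python
import numpy as np

R = np.array([[2, 0, 3, 5, 0],
              [1, 2, 0, 1, 4],
              [3, 0, 4, 4, 0],
              [2, 0, 1, 5, 0],
              [0, 5, 0, 0, 5]], float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 3                                  # keep the k largest singular values
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(s[:k], 1))   # leading singular values (the diagonal of Σ)
print(np.round(R_k, 1))     # low-rank reconstruction used as estimates
```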
Asymmetric Factor Model (AFM)
An Extension of SVD
Each item is represented by a feature vector (as in SVD)
Each user is represented by the items he or she rated (unlike SVD)
[Figure: item feature vectors Item 1 = (0.8, 0.1, 1.2), Item 2 = (0.2, 1.5, 0.0), Item 3 = (0.2, 0.2, 0.0), combined into the user feature vector (0.4, 0.3, 1.2)]
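A minimal sketch of the idea; the normalized-sum combination rule below is a common AFM-style choice and our assumption, since the slide does not give the exact rule:

```python
import numpy as np

# Item feature vectors (from the figure above).
item_features = np.array([[0.8, 0.1, 1.2],
                          [0.2, 1.5, 0.0],
                          [0.2, 0.2, 0.0]])

def user_vector(rated_items):
    """User profile as a normalized sum of the feature vectors of the
    items the user rated (one common AFM-style choice)."""
    Q = item_features[rated_items]
    return Q.sum(axis=0) / np.sqrt(len(rated_items))

u = user_vector([0, 1, 2])
# Predicted preference for each item: dot product of user and item vectors.
print(item_features @ u)
```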
Restricted Boltzmann Machine (RBM)
A neural network with one visible layer and one hidden layer
Handles the sparsity of the data very well
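The rating RBM used on the Netflix data has softmax visible units; purely to illustrate the training loop, here is a generic Bernoulli RBM trained with one step of contrastive divergence (CD-1), entirely our sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy binary data, e.g. "user liked this movie" indicators.
V = rng.integers(0, 2, size=(100, 6)).astype(float)

n_vis, n_hid = 6, 4
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)   # visible/hidden biases
lr = 0.1

for epoch in range(50):
    # Positive phase: hidden activations given the data.
    ph = sigmoid(V @ W + b_h)
    h = (rng.random(ph.shape) < ph).astype(float)
    # Negative phase (CD-1): reconstruct visibles, recompute hiddens.
    pv = sigmoid(h @ W.T + b_v)
    ph2 = sigmoid(pv @ W + b_h)
    # Gradient step on the contrastive divergence approximation.
    W += lr * (V.T @ ph - pv.T @ ph2) / len(V)
    b_v += lr * (V - pv).mean(axis=0)
    b_h += lr * (ph - ph2).mean(axis=0)
```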
Global Effects
Motivated by data normalization
Based on user and item features:
support (number of votes)
mean rating
rating standard deviation
Effective when applied to the residuals of other algorithms
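A minimal sketch of one such effect: a per-user mean fitted to the residuals of a previous model, shrunk toward zero for low-support users (the shrinkage form is a common choice, not taken from the slides):

```python
import numpy as np

def user_effect(users, residuals, shrinkage=25.0):
    """Per-user mean residual, shrunk toward 0 for low-support users:
    theta_u = sum(residuals of u) / (support_u + shrinkage)."""
    return {u: residuals[users == u].sum() / ((users == u).sum() + shrinkage)
            for u in np.unique(users)}

users = np.array([0, 0, 0, 1, 1, 2])
residuals = np.array([0.5, 0.3, 0.4, -0.2, -0.4, 0.1])
theta = user_effect(users, residuals)
# Remove the effect and hand the new residuals to the next effect/model.
new_residuals = residuals - np.array([theta[u] for u in users])
```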
Residual Training
A popular method to combine CF algorithms
Several models are trained sequentially, each fitted to the residual error left by its predecessors
[Diagram: Model 1 → Model 2 → Model 3, chained on residuals]
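A minimal sketch of the pattern with two stages (the Ridge models are stand-ins for arbitrary CF models; the data is synthetic):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))
y = X @ np.array([0.5, -0.2, 0.1, 0.3]) + 0.1 * rng.standard_normal(200)

# Stage 1: fit the first model to the ratings.
m1 = Ridge(alpha=1.0).fit(X, y)
# Stage 2: fit the next model to what the first left unexplained.
m2 = Ridge(alpha=1.0).fit(X, y - m1.predict(X))
# The final prediction is the sum of the chained models.
y_hat = m1.predict(X) + m2.predict(X)
print(np.sqrt(np.mean((y_hat - y) ** 2)))   # training RMSE of the chain
```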
Motivation
Combining different kinds of collaborative filtering algorithms leads to significant performance improvements over the individual algorithms
“Thanks to Paul Harrison's collaboration, a simple mix of our solutions improved our result from 6.31 to 6.75”
Rookies
“My approach is to combine the results of many methods (also two-way interactions between them) using linear regression on the test set. The best method in my ensemble is regularized SVD with biases, post processed with kernel ridge regression”
Arek Paterek
http://rainbow.mimuw.edu.pl/~ap/ap_kdd.pdf
“When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix’s own system.”
U of Toronto
http://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf
Gravity
home.mit.bme.hu/~gtakacs/download/gravity.pdf
“Our common team blends the result of team Gravity and team Dinosaur Planet.”
Might have guessed from the name…
When Gravity and Dinosaurs Unite
And, yes, the top team, which is from AT&T…
"Our final solution (RMSE=0.8712) consists of blending 107 individual results."
BellKor / KorBell
Blending Problem
Given the predictions of several algorithms for each data point, learn to predict the true rating:

          Alg 1  Alg 2  Alg 3  Alg 4  Rating
Data 1     3      3.3    3.2    2.5     3
Data 2     2.2    2.4    3      1.9     2
Data 3     2.8    3.2    3      2.9     ?
Data 4     1      1.1    1.3    2       3
Data 5     0.9    1.1    1.2    1       ?
Blending Methods
Linear Regression (baseline)
Binned Linear Regression
Neural Network
Bagged Gradient Boosted Decision Tree
Kernel Ridge Regression
K-Nearest Neighbor Blending
Linear Regression
Baseline
Assumes a quadratic error function
Finds the optimal linear combination weights w by solving the least squares problem
The weights w can be calculated with ridge regression (see the sketch below)
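A minimal sketch of the ridge solution w = (XᵀX + λI)⁻¹ Xᵀ y, using the labeled rows of the blending table above (λ is our placeholder):

```python
import numpy as np

# Closed-form ridge regression: w = (X'X + lam*I)^-1 X'y.
def ridge_blend(X, y, lam=0.1):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X = np.array([[3.0, 3.3, 3.2, 2.5],    # Data 1
              [2.2, 2.4, 3.0, 1.9],    # Data 2
              [1.0, 1.1, 1.3, 2.0]])   # Data 4
y = np.array([3.0, 2.0, 3.0])          # their true ratings

w = ridge_blend(X, y)
x_new = np.array([2.8, 3.2, 3.0, 2.9]) # Data 3, rating unknown
print(x_new @ w)                       # blended rating estimate
```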
Binned Linear Regression
A simple extension of linear regression
The training dataset can be divided into B disjoint subsets
The training dataset may be very large
Each subset is used to learn a different weight vector w_b (see the sketch below)
The training set can be split using criteria such as:
Support (number of votes)
Time
Frequency (number of ratings from a user at day t)
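A minimal sketch of support-based binning (bin edges, synthetic data, and hyperparameters are our placeholders):

```python
import numpy as np

def ridge(X, y, lam=0.1):
    # Same closed-form ridge fit as in the linear regression sketch.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def fit_binned(X, y, support, edges, lam=0.1):
    """One ridge weight vector w_b per support bin; edges are bin borders."""
    bins = np.digitize(support, edges)
    return {b: ridge(X[bins == b], y[bins == b], lam) for b in np.unique(bins)}

rng = np.random.default_rng(0)
X = rng.uniform(1, 5, size=(1000, 4))        # predictions of 4 algorithms
y = X.mean(axis=1) + 0.1 * rng.standard_normal(1000)
support = rng.integers(1, 500, size=1000)    # e.g. votes per user

edges = [10, 100]                            # 3 bins: <10, 10-99, >=100
weights = fit_binned(X, y, support, edges)
# At prediction time, use the weights of the sample's bin.
x_new, s_new = np.array([3.0, 3.2, 2.9, 3.1]), 42
print(x_new @ weights[np.digitize([s_new], edges)[0]])
```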
Neural Network (NN)
Efficient for huge data sets
[Diagram: feed-forward network blender with inputs Alg 1–Alg 4 and output Rating]
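A minimal sketch using scikit-learn's MLPRegressor as the blender (the hidden layer size and other hyperparameters are our choices, not the paper's):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(1, 5, size=(5000, 4))   # predictions of 4 CF algorithms
y = X.mean(axis=1) + 0.1 * rng.standard_normal(5000)

# One hidden layer; size and other hyperparameters are our placeholders.
nn = MLPRegressor(hidden_layer_sizes=(30,), max_iter=500, random_state=0)
nn.fit(X, y)
print(nn.predict([[3.0, 3.3, 3.2, 2.5]]))   # blended rating estimate
```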
Bagged Gradient Boosted Decision Tree (BGBDT)
Single Decision Tree
Discretized output => limits its ability to model smooth functions
The number of possible outputs corresponds to the number of leaves
A single tree is trained recursively, always splitting the leaf that provides the output value for the largest number of training samples
Bagging
Train N_bag copies of the model, each on a slightly different (bootstrap) training set
(Stochastic Gradient) Boosting
Each model learns only a fraction of the desired function Ω
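A minimal sketch with scikit-learn: N_bag stochastic gradient boosting models, each trained on a bootstrap resample and averaged at prediction time (all hyperparameters are our placeholders):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(1, 5, size=(2000, 4))
y = X.mean(axis=1) + 0.1 * rng.standard_normal(2000)

n_bag, models = 8, []
for b in range(n_bag):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap resample
    gbdt = GradientBoostingRegressor(
        n_estimators=100, max_leaf_nodes=8, learning_rate=0.1,
        subsample=0.5,   # <1.0 makes the boosting stochastic
        random_state=b)
    models.append(gbdt.fit(X[idx], y[idx]))

# Bagged prediction: average the ensemble.
x_new = np.array([[3.0, 3.3, 3.2, 2.5]])
print(np.mean([m.predict(x_new)[0] for m in models]))
```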
Kernel Ridge Regression Blending (KRR)
Kernel Ridge Regression
A regularized least-squares method for classification and regression
Similar to an SVM, but it also gives weight to points that are not close to the decision boundary
Suitable for a small number of features; large training sets become expensive:
Training complexity: O(n³)
Space requirements: O(n²)
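A minimal sketch with a Gaussian (RBF) kernel, solving α = (K + λI)⁻¹ y (kernel width, λ, and the toy data are our placeholders):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # K[i, j] = exp(-gamma * ||A_i - B_j||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam=0.1, gamma=1.0):
    # alpha = (K + lam*I)^-1 y
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_new, gamma=1.0):
    return rbf_kernel(X_new, X_train, gamma) @ alpha

X = np.array([[3.0, 3.3], [2.2, 2.4], [1.0, 1.1]])   # toy blend features
y = np.array([3.0, 2.0, 3.0])
alpha = krr_fit(X, y)
print(krr_predict(X, alpha, np.array([[2.8, 3.2]])))
```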
K-Nearest Neighbor Blending (KNN)
Find the k most similar training samples <user, item> in the space of the algorithms' predictions
Aggregate their target values (ratings)
          Alg 1  Alg 2  Alg 3  Alg 4  Rating
Sample 1    4     3.2    3.2    3.6     ?
Sample 2    3     2.7    2      2.9     3
Sample 3    1     1.2    0.8    0.9     1.5
Sample 4    4     3      3.3    3.3     3.3
Sample 5    2     2      2      2       2
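A minimal sketch that estimates Sample 1's rating from its k nearest labeled samples in prediction space (Euclidean distance and uniform averaging are our assumptions):

```python
import numpy as np

# Labeled samples (Sample 2..5 from the table above) and their ratings.
X = np.array([[3, 2.7, 2.0, 2.9],
              [1, 1.2, 0.8, 0.9],
              [4, 3.0, 3.3, 3.3],
              [2, 2.0, 2.0, 2.0]])
y = np.array([3.0, 1.5, 3.3, 2.0])

def knn_blend(X, y, x_new, k=3):
    d = np.linalg.norm(X - x_new, axis=1)   # Euclidean distance
    nearest = np.argsort(d)[:k]
    return y[nearest].mean()                # uniform average of targets

print(knn_blend(X, y, np.array([4, 3.2, 3.2, 3.6])))  # Sample 1 estimate
```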
Experimental Setup
18 CF Algorithms
4 versions of AFM
4 versions of GE
4 versions of KNN-item
2 versions of RBM
4 versions of SVD
1,400,000 samples
Run on a 3.8 GHz CPU with 12 GB main memory
Results
Conclusions
Combinations of collaborative filtering algorithms outperform the single collaborative filtering algorithms
Thank you