Combining Predictions for Accurate Recommender Systems


Combining Predictions for Accurate Recommender Systems

M. Jahrer¹, A. Töscher¹, R. Legenstein²

¹Commendo Research & Consulting, ²Institute for Theoretical Computer Science, Graz University of Technology

KDD '10

2010. 11. 26.

Summarized and Presented by Sang-il Song, IDS Lab., Seoul National University


Contents

The Netflix Prize

Netflix Dataset

Challenge of Recommendation

Review: Collaborative Techniques

Motivation

Blending Techniques

Linear Regression

Binned Linear Regression

Neural Network

Bagged Gradient Boosted Decision Tree

Kernel Ridge Regression

K-Nearest Neighbor Blending

Results

Conclusion


The Netflix Prize

Open competition for the best collaborative filtering algorithm

The objective is to improve the performance of Netflix’s own recommendation algorithm by 10%


Netflix Dataset

480,189 users

17,770 movies

100,480,507 ratings (training data)

Each rating is a tuple <user, movie, date of grade, grade>


Recommendation Problem

[Slide figure: a sparse user–movie rating matrix. Rows are users u1 … uM, columns are movies m1 … mN; a few cells contain known grades (1–5) and the rest are unknown ("?"). The task is to predict the missing entries.]


Measure of CF Algorithms

Root Mean Square Error (RMSE):

RMSE = \sqrt{ \frac{1}{N} \sum_{(u,i)} ( \hat{r}_{ui} - r_{ui} )^2 }

\hat{r}_{ui} is the rating estimated by the algorithm for user u and item i

N is the size of the test dataset

The original Netflix algorithm, called "Cinematch", achieved an RMSE of about 0.95
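A direct transcription of the formula in Python (the numbers below are toy values, purely illustrative):

```python
import numpy as np

# RMSE between estimated ratings r_hat and true ratings r.
def rmse(r_hat, r):
    r_hat, r = np.asarray(r_hat), np.asarray(r)
    return np.sqrt(np.mean((r_hat - r) ** 2))

print(rmse([3.1, 2.4, 4.8], [3, 2, 5]))   # toy example
```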

Challenges of Recommender Systems

Size of Data

Places a premium on efficient algorithms

Stretches the memory limits of standard PCs

99% of data are missing

Eliminates many standard prediction methods

Certainly not missing at random

Countless factors may affect ratings

Large imbalance in training data

Number of ratings per user or movie varies by several orders of magnitude

Information to estimate individual parameters varies widely

Reference: R. Bell, "Lessons from the Netflix Prize"

Collaborative Filtering Techniques

Memory-based Approach

KNN user-user

KNN item-item

Model-based Approach

Singular Value Decomposition (SVD)

Asymmetric Factor Model (AFM)

Restricted Boltzmann Machine (RBM)

Global Effect (GE)

Combination: Residual Training


KNN user-user

Traditional Approach for Collaborative Filtering

Methods

Find the k users most similar to user u

Aggregate their ratings for item i
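A minimal Python sketch of these two steps, assuming a small dense rating matrix with 0 for unknown entries and cosine similarity over co-rated items (the paper's exact similarity and normalization may differ):

```python
import numpy as np

# Toy rating matrix: rows are users, columns are items, 0 means "unknown".
R = np.array([[3, 0, 2, 1],
              [0, 4, 0, 2],
              [5, 0, 4, 1],
              [4, 1, 5, 0]], dtype=float)

def cosine(a, b):
    mask = (a > 0) & (b > 0)                     # co-rated items only
    if not mask.any():
        return 0.0
    return float(a[mask] @ b[mask] /
                 (np.linalg.norm(a[mask]) * np.linalg.norm(b[mask])))

def predict_user_knn(R, u, i, k=2):
    """Rating of user u for item i from the k most similar users who rated i."""
    sims = np.array([cosine(R[u], R[v]) if v != u and R[v, i] > 0 else -np.inf
                     for v in range(len(R))])
    nbrs = np.argsort(sims)[::-1][:k]            # k nearest raters of item i
    nbrs = nbrs[sims[nbrs] > 0]                  # keep only useful neighbours
    if len(nbrs) == 0:
        return float(R[R[:, i] > 0, i].mean())   # fall back to the item mean
    w = sims[nbrs]
    return float(w @ R[nbrs, i] / w.sum())       # similarity-weighted average

print(predict_user_knn(R, u=1, i=0))             # predict R[1, 0]
```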


KNN item-item

Symmetric Approach to KNN user-user

Just flip user and item sides

Methods

Find the k items most similar to item i

Aggregate their ratings for user u


SVD (Matrix Factorization)

Singular Value Decomposition

Dimension Reduction Technique by Matrix Factorization

Capturing Latent Semantics


SVD Example

R is factorized into R = U × Σ × Vᵀ:

R:
       i1   i2   i3   i4   i5
  u1    2    0    3    5    0
  u2    1    2    0    1    4
  u3    3    0    4    4    0
  u4    2    0    1    5    0
  u5    0    5    0    0    5

U:
       f1    f2    f3
  u1  .59  -.11  -.01
  u2  .18   .51  -.18
  u3  .60  -.11   .65
  u4  .50  -.08  -.73
  u5  .09   .85   .12

Σ:
       f1    f2    f3
  f1   10     0     0
  f2    0   8.2     0
  f3    0     0   2.2

Vᵀ:
        i1    i2    i3    i4    i5
  f1   .40   .08   .45   .78   .11
  f2  -.02   .64  -.10  -.10   .76
  f3   .13   .11   .81  -.55  -.05
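The example can be reproduced with numpy's SVD (a sketch, not the paper's code; signs of singular vectors may flip, and the slide shows only the rank-3 truncation):

```python
import numpy as np

# The slide's rating matrix R.
R = np.array([[2, 0, 3, 5, 0],
              [1, 2, 0, 1, 4],
              [3, 0, 4, 4, 0],
              [2, 0, 1, 5, 0],
              [0, 5, 0, 0, 5]], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 3                                   # keep the 3 largest singular values
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

print(np.round(s[:k], 1))               # roughly [10, 8.2, 2.2] as on the slide
R_hat = U_k @ S_k @ Vt_k                # rank-3 approximation of R
print(np.round(R_hat, 1))
```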


Asymmetric Factor Model (AFM)

An Extension of SVD

Each item is represented by a feature vector (as in SVD)

Each user is represented through the items he or she rated (unlike SVD)

Example (from the slide): each item has a 3-dimensional feature vector, and the user vector is a combination of the feature vectors of the rated items:

  Item 1: (0.8, 0.1, 1.2)
  Item 2: (0.2, 1.5, 0.0)
  Item 3: (0.2, 0.2, 0.0)

  User:   (0.4, 0.3, 1.2)

Restricted Boltzmann Machine (RBM)

A neural network with one visible (input) layer and one hidden layer

Handles the sparsity of the data very well
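For intuition only, a minimal binary RBM trained with one step of contrastive divergence (CD-1). This is a generic sketch: the CF variant (Salakhutdinov et al.) is a conditional multinomial RBM over ratings, which is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Generic binary RBM: visible layer v (data), hidden layer h, weights W.
n_vis, n_hid = 8, 4
W = 0.01 * rng.standard_normal((n_vis, n_hid))
a = np.zeros(n_vis)                      # visible biases
b = np.zeros(n_hid)                      # hidden biases

def cd1_step(v0, lr=0.1):
    """One CD-1 update on a batch of binary visible vectors v0."""
    global W, a, b
    ph0 = sigmoid(v0 @ W + b)            # P(h=1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + a)          # reconstruction P(v=1 | h0)
    ph1 = sigmoid(pv1 @ W + b)
    n = len(v0)
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)

V = (rng.random((100, n_vis)) < 0.3).astype(float)   # toy binary data
for _ in range(200):
    cd1_step(V)
recon = sigmoid(sigmoid(V @ W + b) @ W.T + a)
print("reconstruction error:", np.mean((V - recon) ** 2))
```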


Global Effects

Motivated by data normalization

Based on user and item features

support (number of votes)

mean rating

mean standard deviation

Effective when applied to residuals of other algorithms


Residual Training

A popular method to combine CF algorithms

Several models are trained sequentially, each fitted to the residual error left by its predecessors

Model 1 → Model 2 → Model 3 (each trained on the residuals of the previous)
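A toy sketch of the idea, with made-up linear stages standing in for the CF models:

```python
import numpy as np

# Residual training: each stage fits what the previous stages left behind;
# the final prediction is the sum of all stage outputs. Stages and data
# here are invented for illustration.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 4))              # stand-in features
y = X @ np.array([0.5, -1.0, 2.0, 0.3]) + 3.6   # "ratings" around 3.6

# Stage 1: global mean (a baseline predictor).
mu = y.mean()
res = y - mu

# Stage 2: linear model on the first two features, fit to the residual.
w2, *_ = np.linalg.lstsq(X[:, :2], res, rcond=None)
res = res - X[:, :2] @ w2

# Stage 3: linear model on all features, fit to the remaining residual.
w3, *_ = np.linalg.lstsq(X, res, rcond=None)

y_hat = mu + X[:, :2] @ w2 + X @ w3             # sum of all stages
print("RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))
```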

Motivation

Combining different kinds of collaborative filtering algorithms leads to significant performance improvements over the individual algorithms


“Thanks to Paul Harrison's collaboration, a simple mix of our solutions improved our result from 6.31 to 6.75”

Rookies


“My approach is to combine the results of many methods (also two-way interactions between them) using linear regression on the test set. The best method in my ensemble is regularized SVD with biases, post processed with kernel ridge regression”

Arek Paterek

http://rainbow.mimuw.edu.pl/~ap/ap_kdd.pdf


“When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix’s own system.”

U of Toronto

http://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf


Gravity

home.mit.bme.hu/~gtakacs/download/gravity.pdf


“Our common team blends the result of team Gravity and team Dinosaur Planet.”

Might have guessed from the name…

When Gravity and Dinosaurs Unite


And yes, the top team, which is from AT&T…

"Our final solution (RMSE=0.8712) consists of blending 107 individual results."

BellKor / KorBell


Blending Problem

          Alg 1   Alg 2   Alg 3   Alg 4   Rating
  Data 1    3      3.3     3.2     2.5      3
  Data 2   2.2     2.4      3      1.9      2
  Data 3   2.8     3.2      3      2.9      ?
  Data 4    1      1.1     1.3      2       3
  Data 5   0.9     1.1     1.2      1       ?

Blending Methods

Linear Regression (baseline)

Binned Linear Regression

Neural Network

Bagged Gradient Boosted Decision Tree

Kernel Ridge Regression

K-Nearest Neighbor Blending


Linear Regression

Baseline

Assume a quadratic error function

Find the optimal linear combination weights w by solving the least squares problem

The weights can be calculated in closed form with ridge regression: w = (X^T X + \lambda I)^{-1} X^T y
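A minimal sketch, with the design matrix X holding one column of predictions per algorithm (λ and the toy numbers are made up; the known rows of the blending table above serve as training data):

```python
import numpy as np

# Ridge-regression closed form: w = (X^T X + lam*I)^-1 X^T y.
def ridge_blend_weights(X, y, lam=1e-3):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Rows of the blending table with known ratings (Data 1, 2, 4).
X_train = np.array([[3.0, 3.3, 3.2, 2.5],
                    [2.2, 2.4, 3.0, 1.9],
                    [1.0, 1.1, 1.3, 2.0]])
y_train = np.array([3.0, 2.0, 3.0])

w = ridge_blend_weights(X_train, y_train)
x_new = np.array([2.8, 3.2, 3.0, 2.9])    # Data 3: rating unknown
print("blended prediction:", x_new @ w)
```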


Binned Linear Regression

A Simple Extension of Linear Regression

The training dataset is divided into B disjoint subsets

Useful because the training dataset may be very large

Each subset b is used to learn its own weight vector w_b

The training set can be split using the following criteria (a code sketch follows the list):

Support (number of votes)

Time

Frequency (number of ratings from a user at day t).
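A sketch of binned blending by support, with invented bin edges and toy data:

```python
import numpy as np

# Binned linear regression: split training data into bins by support
# (number of votes) and fit separate ridge weights per bin.
def fit_binned(X, y, support, edges, lam=1e-3):
    """Return one ridge weight vector per support bin."""
    bins = np.digitize(support, edges)          # bin index per sample
    weights = {}
    for b in np.unique(bins):
        m = bins == b
        Xb, yb = X[m], y[m]
        weights[b] = np.linalg.solve(Xb.T @ Xb + lam * np.eye(X.shape[1]),
                                     Xb.T @ yb)
    return weights

def predict_binned(X, support, edges, weights):
    bins = np.digitize(support, edges)
    return np.array([x @ weights[b] for x, b in zip(X, bins)])

rng = np.random.default_rng(0)
X = rng.uniform(1, 5, (500, 4))                 # 4 algorithms' predictions
y = X.mean(axis=1) + 0.1 * rng.standard_normal(500)
support = rng.integers(1, 1000, 500)            # votes per user (toy)
edges = [10, 100]                               # 3 bins: <10, 10-99, >=100

w_per_bin = fit_binned(X, y, support, edges)
print(predict_binned(X[:3], support[:3], edges, w_per_bin))
```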


Neural Network (NN)

Efficient for huge data sets

[Slide figure: a feed-forward network whose inputs are the predictions of Alg 1 … Alg 4 and whose single output is the blended Rating.]
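A sketch using scikit-learn's MLPRegressor as the blender (the paper uses its own NN implementation; the hidden-layer size and other hyperparameters here are guesses):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Inputs: each algorithm's prediction; target: the true rating (toy data).
rng = np.random.default_rng(0)
X = rng.uniform(1, 5, (2000, 4))                  # 4 predictors
y = 0.4 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2] + 0.1 * X[:, 3]

nn = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
nn.fit(X[:1500], y[:1500])                        # train on 1500 samples
pred = nn.predict(X[1500:])                       # blend the held-out rest
print("RMSE:", np.sqrt(np.mean((pred - y[1500:]) ** 2)))
```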

Bagged Gradient Boosted Decision Tree (BGBDT)

Single Decision Tree

Discretized output limits its ability to model smooth functions

The number of possible outputs corresponds to the number of leaves

A single tree is trained recursively by always splitting the leaf that provides the output value for the largest number of training samples

Bagging

Training N_bag copies of the model on slightly different (resampled) training sets

(Stochastic Gradient) Boosting

Each model learns only a fraction of the desired function Ω
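A sketch of the combination using scikit-learn, with bagging wrapped around gradient-boosted trees and subsampling standing in for the stochastic-gradient part (all hyperparameters are illustrative; the paper uses its own tree implementation):

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(1, 5, (2000, 4))                  # algorithm predictions
y = X.mean(axis=1) + 0.1 * rng.standard_normal(2000)

bgbdt = BaggingRegressor(
    # 'estimator' keyword needs scikit-learn >= 1.2 ('base_estimator' before).
    estimator=GradientBoostingRegressor(n_estimators=100,
                                        learning_rate=0.1,
                                        max_depth=3,
                                        subsample=0.5),  # stochastic boosting
    n_estimators=8,                                      # N_bag bagged copies
    random_state=0,
)
bgbdt.fit(X[:1500], y[:1500])
pred = bgbdt.predict(X[1500:])
print("RMSE:", np.sqrt(np.mean((pred - y[1500:]) ** 2)))
```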


Kernel Ridge Regression Blending (KRR)

Kernel Ridge Regression

Regularized least squares method for classification and regression

Similar to an SVM

– But, unlike an SVM, it also puts emphasis on points that are not close to the decision boundary

Suitable for a small number of features; the cubic training cost limits how many training samples can be used

Training complexity: O(n³)

Space requirements: O(n²)
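A sketch with scikit-learn's KernelRidge, subsampling the training data to keep the O(n³) cost manageable (kernel choice and hyperparameters are illustrative, not the paper's):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.uniform(1, 5, (5000, 4))                   # algorithm predictions
y = X.mean(axis=1) + 0.1 * rng.standard_normal(5000)

idx = rng.choice(5000, size=1000, replace=False)   # subsample: keep n small
krr = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.5)
krr.fit(X[idx], y[idx])                            # O(n^3) in the subsample
print("RMSE:", np.sqrt(np.mean((krr.predict(X) - y) ** 2)))
```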


K-Nearest Neighbor Blending (KNN)

Find the k most similar training samples <user, item> in the space of the algorithms' predictions

Aggregate the target value

            Alg 1   Alg 2   Alg 3   Alg 4   Rating
  Sample 1    4      3.2     3.2     3.6      ?
  Sample 2    3      2.7      2      2.9      3
  Sample 3    1      1.2     0.8     0.9     1.5
  Sample 4    4       3      3.3     3.3     3.3
  Sample 5    2       2       2       2       2
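Using the known rows of the table above, a sketch that predicts Sample 1's missing rating from its k = 2 nearest neighbours in prediction space (Euclidean distance; the paper may aggregate differently):

```python
import numpy as np

# Training samples: the rows with known ratings (Samples 2-5).
X = np.array([[3.0, 2.7, 2.0, 2.9],
              [1.0, 1.2, 0.8, 0.9],
              [4.0, 3.0, 3.3, 3.3],
              [2.0, 2.0, 2.0, 2.0]])
y = np.array([3.0, 1.5, 3.3, 2.0])

query = np.array([4.0, 3.2, 3.2, 3.6])      # Sample 1, rating unknown
k = 2
dist = np.linalg.norm(X - query, axis=1)    # distance in prediction space
nearest = np.argsort(dist)[:k]              # indices of the k closest samples
print("prediction:", y[nearest].mean())     # aggregate the target values
```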

Experimental Setup

18 CF Algorithms

4 versions of AFM

4 versions of GE

4 versions of KNN-item

2 versions of RBM

4 versions of SVD

1,400,000 samples

Run on a 3.8 GHz CPU with 12 GB of main memory


Results

Conclusions

Combinations of collaborative filtering algorithms outperform any single collaborative filtering algorithm

Thank you