Combining Predictions for Accurate Recommender Systems


Combining Predictions for Accurate Recommender Systems

M. Jahrer¹, A. Töscher¹, R. Legenstein²

¹Commendo Research & Consulting, ²Institute for Theoretical Computer Science, Graz University of Technology

KDD '10

2010. 11. 26.

Summarized and Presented by Sang-il Song, IDS Lab., Seoul National University


Contents

The Netflix Prize

Netflix Dataset

Challenge of Recommendation

Review: Collaborative Techniques

Motivation

Blending Techniques

Linear Regression

Binned Linear Regression

Neural Network

Bagged Gradient Boosted Decision Tree

Kernel Ridge Regression

K-Nearest Neighbor Blending

Results

Conclusion


The Netflix Prize

Open competition for the best collaborative filtering algorithm

The objective is to improve the performance of Netflix’s own recommendation algorithm by 10%


Netflix Dataset

480,189 users

17,770 movies

100,480,507 ratings (training data)

Each rating is a tuple <user, movie, date of grade, grade>


Recommendation Problem

[Slide figure: a sparse user–movie rating matrix. Rows are users u1 … uM, columns are movies m1 … mN; a few cells contain known grades (1–5) and the rest are unknown ("?"). The task is to predict the missing entries.]


Measure of CF Algorithms

Root Mean Square Error (RMSE):

RMSE = \sqrt{ \frac{1}{N} \sum_{(u,i)} ( \hat{r}_{ui} - r_{ui} )^2 }

\hat{r}_{ui} is the rating estimated by the algorithm for user u and item i

N is the size of the test dataset

The original Netflix algorithm, called "Cinematch", achieved an RMSE of about 0.95
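A direct transcription of the formula in Python (the numbers below are toy values, purely illustrative):

```python
import numpy as np

# RMSE between estimated ratings r_hat and true ratings r.
def rmse(r_hat, r):
    r_hat, r = np.asarray(r_hat), np.asarray(r)
    return np.sqrt(np.mean((r_hat - r) ** 2))

print(rmse([3.1, 2.4, 4.8], [3, 2, 5]))   # toy example
```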

Challenges of Recommender Systems

Size of Data

Places a premium on efficient algorithms

Stretches the memory limits of standard PCs

99% of data are missing

Eliminates many standard prediction methods

Certainly not missing at random

Countless factors may affect ratings

Large imbalance in training data

Number of ratings per user or movie varies by several orders of magnitude

Information to estimate individual parameters varies widely

Reference: R. Bell, "Lessons from the Netflix Prize"

Collaborative Filtering Techniques

Memory-based Approach

KNN user-user

KNN item-item

Model-based Approach

Singular Value Decomposition (SVD)

Asymmetric Factor Model (AFM)

Restricted Boltzmann Machine (RBM)

Global Effect (GE)

Combination: Residual Training


KNN user-user

Traditional Approach for Collaborative Filtering

Methods

Find the k users most similar to user u

Aggregate their ratings for item i
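A minimal Python sketch of these two steps, assuming a small dense rating matrix with 0 for unknown entries and cosine similarity over co-rated items (the paper's exact similarity and normalization may differ):

```python
import numpy as np

# Toy rating matrix: rows are users, columns are items, 0 means "unknown".
R = np.array([[3, 0, 2, 1],
              [0, 4, 0, 2],
              [5, 0, 4, 1],
              [4, 1, 5, 0]], dtype=float)

def cosine(a, b):
    mask = (a > 0) & (b > 0)                     # co-rated items only
    if not mask.any():
        return 0.0
    return float(a[mask] @ b[mask] /
                 (np.linalg.norm(a[mask]) * np.linalg.norm(b[mask])))

def predict_user_knn(R, u, i, k=2):
    """Rating of user u for item i from the k most similar users who rated i."""
    sims = np.array([cosine(R[u], R[v]) if v != u and R[v, i] > 0 else -np.inf
                     for v in range(len(R))])
    nbrs = np.argsort(sims)[::-1][:k]            # k nearest raters of item i
    nbrs = nbrs[sims[nbrs] > 0]                  # keep only useful neighbours
    if len(nbrs) == 0:
        return float(R[R[:, i] > 0, i].mean())   # fall back to the item mean
    w = sims[nbrs]
    return float(w @ R[nbrs, i] / w.sum())       # similarity-weighted average

print(predict_user_knn(R, u=1, i=0))             # predict R[1, 0]
```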


KNN item-item

Symmetric Approach to KNN user-user

Just flip user and item sides

Methods

Find the k items most similar to item i

Aggregate their ratings for user u


SVD (Matrix Factorization)

Singular Value Decomposition

Dimension Reduction Technique by Matrix Factorization

Capturing Latent Semantics


SVD Example

R is factorized into R = U × Σ × Vᵀ:

R:
       i1   i2   i3   i4   i5
  u1    2    0    3    5    0
  u2    1    2    0    1    4
  u3    3    0    4    4    0
  u4    2    0    1    5    0
  u5    0    5    0    0    5

U:
       f1    f2    f3
  u1  .59  -.11  -.01
  u2  .18   .51  -.18
  u3  .60  -.11   .65
  u4  .50  -.08  -.73
  u5  .09   .85   .12

Σ:
       f1    f2    f3
  f1   10     0     0
  f2    0   8.2     0
  f3    0     0   2.2

Vᵀ:
        i1    i2    i3    i4    i5
  f1   .40   .08   .45   .78   .11
  f2  -.02   .64  -.10  -.10   .76
  f3   .13   .11   .81  -.55  -.05
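The example can be reproduced with numpy's SVD (a sketch, not the paper's code; signs of singular vectors may flip, and the slide shows only the rank-3 truncation):

```python
import numpy as np

# The slide's rating matrix R.
R = np.array([[2, 0, 3, 5, 0],
              [1, 2, 0, 1, 4],
              [3, 0, 4, 4, 0],
              [2, 0, 1, 5, 0],
              [0, 5, 0, 0, 5]], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 3                                   # keep the 3 largest singular values
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]

print(np.round(s[:k], 1))               # roughly [10, 8.2, 2.2] as on the slide
R_hat = U_k @ S_k @ Vt_k                # rank-3 approximation of R
print(np.round(R_hat, 1))
```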


Asymmetric Factor Model (AFM)

An Extension of SVD

Each item is represented by a feature vector (as in SVD)

Each user is represented through the items he or she rated (unlike SVD)

Example (from the slide): each item has a 3-dimensional feature vector, and the user vector is a combination of the feature vectors of the rated items:

  Item 1: (0.8, 0.1, 1.2)
  Item 2: (0.2, 1.5, 0.0)
  Item 3: (0.2, 0.2, 0.0)

  User:   (0.4, 0.3, 1.2)

Restricted Boltzmann Machine (RBM)

A neural network with one visible (input) layer and one hidden layer

Handles the sparsity of the data very well
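For intuition only, a minimal binary RBM trained with one step of contrastive divergence (CD-1). This is a generic sketch: the CF variant (Salakhutdinov et al.) is a conditional multinomial RBM over ratings, which is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Generic binary RBM: visible layer v (data), hidden layer h, weights W.
n_vis, n_hid = 8, 4
W = 0.01 * rng.standard_normal((n_vis, n_hid))
a = np.zeros(n_vis)                      # visible biases
b = np.zeros(n_hid)                      # hidden biases

def cd1_step(v0, lr=0.1):
    """One CD-1 update on a batch of binary visible vectors v0."""
    global W, a, b
    ph0 = sigmoid(v0 @ W + b)            # P(h=1 | v0)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + a)          # reconstruction P(v=1 | h0)
    ph1 = sigmoid(pv1 @ W + b)
    n = len(v0)
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)

V = (rng.random((100, n_vis)) < 0.3).astype(float)   # toy binary data
for _ in range(200):
    cd1_step(V)
recon = sigmoid(sigmoid(V @ W + b) @ W.T + a)
print("reconstruction error:", np.mean((V - recon) ** 2))
```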


Global Effects

Motivated by data normalization

Based on user and item features

support (number of votes)

mean rating

mean standard deviation

Effective when applied to residuals of other algorithms


Residual Training

A popular method to combine CF algorithms

Several models are trained sequentially, each fitted to the residual error left by its predecessors

Model 1 → Model 2 → Model 3 (each trained on the residuals of the previous)
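A toy sketch of the idea, with made-up linear stages standing in for the CF models:

```python
import numpy as np

# Residual training: each stage fits what the previous stages left behind;
# the final prediction is the sum of all stage outputs. Stages and data
# here are invented for illustration.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 4))              # stand-in features
y = X @ np.array([0.5, -1.0, 2.0, 0.3]) + 3.6   # "ratings" around 3.6

# Stage 1: global mean (a baseline predictor).
mu = y.mean()
res = y - mu

# Stage 2: linear model on the first two features, fit to the residual.
w2, *_ = np.linalg.lstsq(X[:, :2], res, rcond=None)
res = res - X[:, :2] @ w2

# Stage 3: linear model on all features, fit to the remaining residual.
w3, *_ = np.linalg.lstsq(X, res, rcond=None)

y_hat = mu + X[:, :2] @ w2 + X @ w3             # sum of all stages
print("RMSE:", np.sqrt(np.mean((y - y_hat) ** 2)))
```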

Motivation

Combining different kinds of collaborative filtering algorithms leads to significant performance improvements over the individual algorithms


“Thanks to Paul Harrison's collaboration, a simple mix of our solutions improved our result from 6.31 to 6.75”

Rookies


“My approach is to combine the results of many methods (also two-way interactions between them) using linear regression on the test set. The best method in my ensemble is regularized SVD with biases, post processed with kernel ridge regression”

Arek Paterek

http://rainbow.mimuw.edu.pl/~ap/ap_kdd.pdf


“When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix’s own system.”

U of Toronto

http://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf


Gravity

home.mit.bme.hu/~gtakacs/download/gravity.pdf


“Our common team blends the result of team Gravity and team Dinosaur Planet.”

Might have guessed from the name…

When Gravity and Dinosaurs Unite


And yes, the top team, which is from AT&T…

"Our final solution (RMSE=0.8712) consists of blending 107 individual results."

BellKor / KorBell


Blending Problem

          Alg 1   Alg 2   Alg 3   Alg 4   Rating
  Data 1    3      3.3     3.2     2.5      3
  Data 2   2.2     2.4      3      1.9      2
  Data 3   2.8     3.2      3      2.9      ?
  Data 4    1      1.1     1.3      2       3
  Data 5   0.9     1.1     1.2      1       ?

Blending Methods

Linear Regression (baseline)

Binned Linear Regression

Neural Network

Bagged Gradient Boosted Decision Tree

Kernel Ridge Regression

K-Nearest Neighbor Blending


Linear Regression

Baseline

Assume a quadratic error function

Find the optimal linear combination weights w by solving the least squares problem

The weights can be calculated in closed form with ridge regression: w = (X^T X + \lambda I)^{-1} X^T y
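A minimal sketch, with the design matrix X holding one column of predictions per algorithm (λ and the toy numbers are made up; the known rows of the blending table above serve as training data):

```python
import numpy as np

# Ridge-regression closed form: w = (X^T X + lam*I)^-1 X^T y.
def ridge_blend_weights(X, y, lam=1e-3):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Rows of the blending table with known ratings (Data 1, 2, 4).
X_train = np.array([[3.0, 3.3, 3.2, 2.5],
                    [2.2, 2.4, 3.0, 1.9],
                    [1.0, 1.1, 1.3, 2.0]])
y_train = np.array([3.0, 2.0, 3.0])

w = ridge_blend_weights(X_train, y_train)
x_new = np.array([2.8, 3.2, 3.0, 2.9])    # Data 3: rating unknown
print("blended prediction:", x_new @ w)
```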


Binned Linear Regression

A Simple Extension of Linear Regression

The training dataset is divided into B disjoint subsets

Useful because the training dataset may be very large

Each subset b is used to learn its own weight vector w_b

The training set can be split using the following criteria (a code sketch follows the list):

Support (number of votes)

Time

Frequency (number of ratings from a user at day t).
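A sketch of binned blending by support, with invented bin edges and toy data:

```python
import numpy as np

# Binned linear regression: split training data into bins by support
# (number of votes) and fit separate ridge weights per bin.
def fit_binned(X, y, support, edges, lam=1e-3):
    """Return one ridge weight vector per support bin."""
    bins = np.digitize(support, edges)          # bin index per sample
    weights = {}
    for b in np.unique(bins):
        m = bins == b
        Xb, yb = X[m], y[m]
        weights[b] = np.linalg.solve(Xb.T @ Xb + lam * np.eye(X.shape[1]),
                                     Xb.T @ yb)
    return weights

def predict_binned(X, support, edges, weights):
    bins = np.digitize(support, edges)
    return np.array([x @ weights[b] for x, b in zip(X, bins)])

rng = np.random.default_rng(0)
X = rng.uniform(1, 5, (500, 4))                 # 4 algorithms' predictions
y = X.mean(axis=1) + 0.1 * rng.standard_normal(500)
support = rng.integers(1, 1000, 500)            # votes per user (toy)
edges = [10, 100]                               # 3 bins: <10, 10-99, >=100

w_per_bin = fit_binned(X, y, support, edges)
print(predict_binned(X[:3], support[:3], edges, w_per_bin))
```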


Neural Network (NN)

Efficient for huge data sets

[Slide figure: a feed-forward network whose inputs are the predictions of Alg 1 … Alg 4 and whose single output is the blended Rating.]
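A sketch using scikit-learn's MLPRegressor as the blender (the paper uses its own NN implementation; the hidden-layer size and other hyperparameters here are guesses):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Inputs: each algorithm's prediction; target: the true rating (toy data).
rng = np.random.default_rng(0)
X = rng.uniform(1, 5, (2000, 4))                  # 4 predictors
y = 0.4 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * X[:, 2] + 0.1 * X[:, 3]

nn = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
nn.fit(X[:1500], y[:1500])                        # train on 1500 samples
pred = nn.predict(X[1500:])                       # blend the held-out rest
print("RMSE:", np.sqrt(np.mean((pred - y[1500:]) ** 2)))
```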

Bagged Gradient Boosted Decision Tree (BGBDT)

Single Decision Tree

Discretized output limits its ability to model smooth functions

The number of possible outputs corresponds to the number of leaves

A single tree is trained recursively by always splitting the leaf that provides the output value for the largest number of training samples

Bagging

Training N_bag copies of the model on slightly different (resampled) training sets

(Stochastic Gradient) Boosting

Each model learns only a fraction of the desired function Ω
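A sketch of the combination using scikit-learn, with bagging wrapped around gradient-boosted trees and subsampling standing in for the stochastic-gradient part (all hyperparameters are illustrative; the paper uses its own tree implementation):

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(1, 5, (2000, 4))                  # algorithm predictions
y = X.mean(axis=1) + 0.1 * rng.standard_normal(2000)

bgbdt = BaggingRegressor(
    # 'estimator' keyword needs scikit-learn >= 1.2 ('base_estimator' before).
    estimator=GradientBoostingRegressor(n_estimators=100,
                                        learning_rate=0.1,
                                        max_depth=3,
                                        subsample=0.5),  # stochastic boosting
    n_estimators=8,                                      # N_bag bagged copies
    random_state=0,
)
bgbdt.fit(X[:1500], y[:1500])
pred = bgbdt.predict(X[1500:])
print("RMSE:", np.sqrt(np.mean((pred - y[1500:]) ** 2)))
```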


Kernel Ridge Regression Blending (KRR)

Kernel Ridge Regression

Regularized least squares method for classification and regression

Similar to an SVM

– But, unlike an SVM, it also puts emphasis on points that are not close to the decision boundary

Suitable for a small number of features; the cubic training cost limits how many training samples can be used

Training complexity: O(n³)

Space requirements: O(n²)
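A sketch with scikit-learn's KernelRidge, subsampling the training data to keep the O(n³) cost manageable (kernel choice and hyperparameters are illustrative, not the paper's):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.uniform(1, 5, (5000, 4))                   # algorithm predictions
y = X.mean(axis=1) + 0.1 * rng.standard_normal(5000)

idx = rng.choice(5000, size=1000, replace=False)   # subsample: keep n small
krr = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.5)
krr.fit(X[idx], y[idx])                            # O(n^3) in the subsample
print("RMSE:", np.sqrt(np.mean((krr.predict(X) - y) ** 2)))
```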


K-Nearest Neighbor Blending (KNN)

Find the k most similar training samples <user, item> in the space of the algorithms' predictions

Aggregate the target value

            Alg 1   Alg 2   Alg 3   Alg 4   Rating
  Sample 1    4      3.2     3.2     3.6      ?
  Sample 2    3      2.7      2      2.9      3
  Sample 3    1      1.2     0.8     0.9     1.5
  Sample 4    4       3      3.3     3.3     3.3
  Sample 5    2       2       2       2       2
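Using the known rows of the table above, a sketch that predicts Sample 1's missing rating from its k = 2 nearest neighbours in prediction space (Euclidean distance; the paper may aggregate differently):

```python
import numpy as np

# Training samples: the rows with known ratings (Samples 2-5).
X = np.array([[3.0, 2.7, 2.0, 2.9],
              [1.0, 1.2, 0.8, 0.9],
              [4.0, 3.0, 3.3, 3.3],
              [2.0, 2.0, 2.0, 2.0]])
y = np.array([3.0, 1.5, 3.3, 2.0])

query = np.array([4.0, 3.2, 3.2, 3.6])      # Sample 1, rating unknown
k = 2
dist = np.linalg.norm(X - query, axis=1)    # distance in prediction space
nearest = np.argsort(dist)[:k]              # indices of the k closest samples
print("prediction:", y[nearest].mean())     # aggregate the target values
```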

Experimental Setup

18 CF Algorithms

4 versions of AFM

4 versions of GE

4 versions of KNN-item

2 versions of RBM

4 versions of SVD

1,400,000 samples

Run on a 3.8 GHz CPU with 12 GB of main memory


Results

Conclusions

Combinations of collaborative filtering algorithms outperform any single collaborative filtering algorithm

Thank you