a scalable collaborative filtering framework based on co clustering

Post on 06-May-2015

A SCALABLE COLLABORATIVE FILTERING FRAMEWORK BASED ON CO-CLUSTERING

Authors/ Thomas George and Srujana Merugu

Source/ ICDM’05, pp. 628-628

Presenter/ Allen

OUTLINE

Introduction
Related Work
Problem Definition
Collaborative Filtering via Co-clustering
Scalable Collaborative Filtering System
Experimental Results
Conclusion

INTRODUCTION

Due to the overwhelming increase in web-based activities, users are often forced to choose from a large number of products or content items.

To aid users in the decision making process, it has become increasingly important to design recommender systems.

Collaborative filtering identifies the likely preferences of a user based on the known preferences of other users.

INTRODUCTION (CONT.)

Existing collaborative filtering methods are based on correlation criteria, singular value decomposition (SVD), and non-negative matrix factorization (NNMF).

Drawback: the training component is computationally expensive.

Practical scenarios such as real-time news personalization require dynamic collaborative filtering.

The key idea: simultaneously obtain user and item neighborhoods via co-clustering, then generate predictions based on average ratings.

INTRODUCTION (CONT.)

Two new contributions:

Dynamic collaborative filtering approach: supports the entry of new users, items and ratings via a hybrid of incremental and batch versions of the co-clustering algorithm.

A scalable, real-time collaborative filtering system: built on parallel versions of the co-clustering, prediction and incremental training routines.

Notation:

Matrices are written as $A$, with $A_{ij}$ denoting the corresponding matrix elements.

Sets are enumerated as $\{x_i\}_{i=1}^{n}$, where $x_i$ are the elements of the set.

RELATED WORK

Recommender systems: content-based filtering systems and collaborative filtering systems.

Co-clustering

SVD and NNMF-based filtering techniques predict the unknown ratings based on a low-rank approximation of the original ratings matrix. The missing values are filled with the average ratings.

Incremental versions of SVD have been proposed to reduce the computational cost. (SDM 2003)

PROBLEM DEFINITION

Let $U=\{u_i\}_{i=1}^{m}$ be the set of users such that $|U|=m$, and $P=\{p_j\}_{j=1}^{n}$ be the set of items such that $|P|=n$.

Let $A$ be the $m \times n$ ratings matrix such that $A_{ij}$ is the rating given by user $u_i$ to item $p_j$.

Let $W$ be the $m \times n$ matrix corresponding to the confidence of the ratings in $A$: $W_{ij}=1$ if the rating is known and $0$ otherwise.

Let the user clustering be $\rho: \{1, \dots, m\} \to \{1, \dots, k\}$ and the item clustering be $\gamma: \{1, \dots, n\} \to \{1, \dots, l\}$, where $k$ is the number of user clusters and $l$ is the number of item clusters.

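PROBLEM DEFINITION (CONT.)

The approximate matrix $\hat{A}$ is given by

$$\hat{A}_{ij} = A^{COC}_{gh} + (A^{R}_{i} - A^{RC}_{g}) + (A^{C}_{j} - A^{CC}_{h})$$

where $g=\rho(i)$ and $h=\gamma(j)$ are the user cluster of $u_i$ and the item cluster of $p_j$.

$A^{R}_{i}$ and $A^{C}_{j}$ are the average ratings of user $u_i$ and item $p_j$.

$A^{COC}_{gh}$, $A^{RC}_{g}$ and $A^{CC}_{h}$ are the average ratings of the corresponding co-cluster, user cluster and item cluster.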

COLLABORATIVE FILTERING VIA CO-CLUSTERING

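Static training (co-clustering): the goal is to minimize the squared error over the known ratings,

$$\sum_{i=1}^{m}\sum_{j=1}^{n} W_{ij}\,(A_{ij}-\hat{A}_{ij})^{2}$$

The row and column assignment steps can be implemented efficiently by pre-computing the parts of the update cost functions that do not depend on the candidate cluster.

Row updating: assign user $u_i$ to the user cluster $g$ minimizing

$$\sum_{j=1}^{n} W_{ij}\,\bigl(A_{ij} - A^{COC}_{g\gamma(j)} - A^{R}_{i} + A^{RC}_{g} - A^{C}_{j} + A^{CC}_{\gamma(j)}\bigr)^{2}$$

Column updating: assign item $p_j$ to the item cluster $h$ minimizing

$$\sum_{i=1}^{m} W_{ij}\,\bigl(A_{ij} - A^{COC}_{\rho(i)h} - A^{R}_{i} + A^{RC}_{\rho(i)} - A^{C}_{j} + A^{CC}_{h}\bigr)^{2}$$

Here $\rho(i)$ and $\gamma(j)$ denote the current user cluster of $u_i$ and item cluster of $p_j$.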

STATIC TRAINING: CO-CLUSTERING
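The alternating row and column updates can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes the squared-error objective over known ratings and the additive approximation (co-cluster average plus user and item offsets from their cluster averages). The names `averages` and `cocluster`, the random initialization, the fixed iteration count, and the fallback to the global average for empty groups are all assumptions made for the example.

```python
import numpy as np

def averages(A, W, rho, gamma, k, l):
    """User, item, co-cluster, user-cluster and item-cluster averages.

    Only known ratings (W == 1) contribute. Empty groups fall back to the
    global average (an assumption of this sketch).
    """
    Z = A * W
    glob = Z.sum() / max(W.sum(), 1.0)
    mean = lambda s, c: np.where(c > 0, s / np.maximum(c, 1), glob)
    R, C = np.eye(k)[rho], np.eye(l)[gamma]      # one-hot cluster indicators
    return (mean(Z.sum(1), W.sum(1)),            # A_R  : per user
            mean(Z.sum(0), W.sum(0)),            # A_C  : per item
            mean(R.T @ Z @ C, R.T @ W @ C),      # A_COC: per co-cluster
            mean(R.T @ Z.sum(1), R.T @ W.sum(1)),  # A_RC : per user cluster
            mean(C.T @ Z.sum(0), C.T @ W.sum(0)))  # A_CC : per item cluster

def cocluster(A, W, k, l, n_iter=20, seed=0):
    """Alternate row and column reassignments for a fixed number of rounds."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    rho = rng.integers(k, size=m)      # random initial user clusters
    gamma = rng.integers(l, size=n)    # random initial item clusters
    for _ in range(n_iter):
        # Row update: 'base' holds the terms that do not depend on g.
        A_R, A_C, A_COC, A_RC, A_CC = averages(A, W, rho, gamma, k, l)
        base = A - A_R[:, None] - A_C[None, :] + A_CC[gamma][None, :]
        cost = np.stack(
            [(W * (base - A_COC[g, gamma][None, :] + A_RC[g]) ** 2).sum(1)
             for g in range(k)], axis=1)
        rho = cost.argmin(1)
        # Column update: 'base' holds the terms that do not depend on h.
        A_R, A_C, A_COC, A_RC, A_CC = averages(A, W, rho, gamma, k, l)
        base = A - A_R[:, None] - A_C[None, :] + A_RC[rho][:, None]
        cost = np.stack(
            [(W * (base - A_COC[rho, h][:, None] + A_CC[h]) ** 2).sum(0)
             for h in range(l)], axis=1)
        gamma = cost.argmin(1)
    return rho, gamma
```

Computing `base` once per update is a simple instance of pre-computing the invariant part of the cost: only the co-cluster and cluster-average terms change with the candidate cluster.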

PREDICTION
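A minimal sketch of prediction for a user and item that were both seen during training, assuming the additive form: co-cluster average, plus the user's offset from its user-cluster average, plus the item's offset from its item-cluster average. The function names `cluster_averages` and `predict`, the NumPy representation, and the global-average fallback for empty groups are illustrative assumptions. Handling of new users or items is not covered here.

```python
import numpy as np

def cluster_averages(A, W, rho, gamma, k, l):
    """Return (A_R, A_C, A_COC, A_RC, A_CC) computed from known ratings only."""
    rho, gamma = np.asarray(rho), np.asarray(gamma)
    Z = A * W
    glob = Z.sum() / max(W.sum(), 1.0)           # fallback for empty groups

    def mean(total, count):
        return np.where(count > 0, total / np.maximum(count, 1), glob)

    R, C = np.eye(k)[rho], np.eye(l)[gamma]      # one-hot cluster indicators
    A_R = mean(Z.sum(1), W.sum(1))               # average rating per user
    A_C = mean(Z.sum(0), W.sum(0))               # average rating per item
    A_COC = mean(R.T @ Z @ C, R.T @ W @ C)       # average per co-cluster
    A_RC = mean(R.T @ Z.sum(1), R.T @ W.sum(1))  # average per user cluster
    A_CC = mean(C.T @ Z.sum(0), C.T @ W.sum(0))  # average per item cluster
    return A_R, A_C, A_COC, A_RC, A_CC

def predict(avgs, rho, gamma):
    """Full m x n matrix of predicted ratings for known users and items."""
    rho, gamma = np.asarray(rho), np.asarray(gamma)
    A_R, A_C, A_COC, A_RC, A_CC = avgs
    return (A_COC[np.ix_(rho, gamma)]
            + (A_R - A_RC[rho])[:, None]
            + (A_C - A_CC[gamma])[None, :])

# Hypothetical toy data: 3 users, 4 items, 0 marks an unknown rating.
A = np.array([[5., 4., 0., 1.],
              [4., 0., 1., 2.],
              [1., 2., 5., 0.]])
W = (A > 0).astype(float)
rho, gamma = [0, 0, 1], [0, 0, 1, 1]
A_hat = predict(cluster_averages(A, W, rho, gamma, k=2, l=2), rho, gamma)
```

Because a prediction only looks up a few stored averages, its cost does not grow with the number of users or items.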

INCREMENTAL TRAINING

SCALABLE COLLABORATIVE FILTERING SYSTEM

The system uses a distributed-memory representation for the data objects, so each of the processors P1 and P2 is in fact a cluster of processors. P1 handles prediction and incremental training. P2 is responsible for static training.

PARALLEL CO-CLUSTERING

EXPERIMENTAL RESULTS

Datasets and algorithm

Movie-lens (100K): 943 users and 1682 movies, with 100,000 ratings (1-5).

BookCrossing: 470034 users and 133438 books, with 269392 ratings (1-10).

Movie1-Movie10: subsets containing 10-100% of the ratings of Movie-lens 100K.

80% training and 20% testing for all the datasets.

Evaluation metric: Mean Absolute Error (MAE).

The experiments evaluated effectiveness in terms of MAE and efficiency in terms of execution time.
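The evaluation protocol above can be sketched as follows. MAE is the mean of the absolute differences between actual and predicted ratings on the held-out test pairs. The helper names `mean_absolute_error` and `split_ratings`, the random shuffling, and the seed are assumptions for illustration; the slides do not say how the 80/20 split was drawn.

```python
import numpy as np

def mean_absolute_error(actual, predicted):
    """MAE = mean over test pairs of |actual rating - predicted rating|."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

def split_ratings(ratings, test_fraction=0.2, seed=0):
    """Shuffle (user, item, rating) triples and hold out a test fraction."""
    ratings = list(ratings)
    order = np.random.default_rng(seed).permutation(len(ratings))
    n_test = int(round(test_fraction * len(ratings)))
    test = [ratings[i] for i in order[:n_test]]
    train = [ratings[i] for i in order[n_test:]]
    return train, test

# Hypothetical example: errors of 1, 0 and 2 average to an MAE of 1.0.
print(mean_absolute_error([4, 2, 5], [3, 2, 3]))  # 1.0
```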

MAE COMPARISON

Mov1: movie-lens
Mov2: BookCrossing
Mov3: 10 subsets of movie-lens

K=3

VARIATION OF MAE WITH # PARAMETERS

Number of prediction parameters:
COCLUST: (m+n+kl-k-l) values
SVD, NNMF: (m+n)(k+l) values

Movie3 dataset
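As a worked example of the parameter counts stated on this slide, the snippet below plugs in the Movie-lens 100K dimensions (m = 943 users, n = 1682 movies). The choice k = l = 3 is an assumption made only for the arithmetic, and the function names are hypothetical.

```python
def coclust_params(m, n, k, l):
    """Parameter count for COCLUST as stated on the slide: m+n+kl-k-l."""
    return m + n + k * l - k - l

def factorization_params(m, n, k, l):
    """Parameter count for SVD/NNMF as stated on the slide: (m+n)(k+l)."""
    return (m + n) * (k + l)

m, n = 943, 1682   # Movie-lens 100K users and movies
k, l = 3, 3        # assumed cluster counts, for illustration only

print(coclust_params(m, n, k, l))        # 2628
print(factorization_params(m, n, k, l))  # 15750
```

With these values COCLUST stores about one sixth as many prediction parameters as the factorization methods.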

EFFICIENCY

Prediction time is measured for each given test pair of the movie-lens dataset.

Training time (co-clustering) vs. data size

Movie-lens dataset

Experimental devices: AMD 1.4 GHz processors on 128 compute nodes with 384 MB RAM

TRAINING TIME VS. # OF PROCESSORS

Movie-lens dataset

Experimental devices: AMD 1.4 GHz processors with 384 MB RAM, varying the number of processors

CONCLUSION

Recommender systems are proving to be extremely useful for a number of online activities such as e-commerce.

In the dynamic scenario, where new users, items and ratings enter the system at a rapid rate, both efficiency and effectiveness must be addressed.

This paper proposed a new dynamic CF approach based on co-clustering.

Empirical results indicate high-quality predictions at a much lower computational cost.
