A SCALABLE COLLABORATIVE FILTERING FRAMEWORK BASED ON CO-CLUSTERING

Authors/ Thomas George and Srujana Merugu

Source/ ICDM’05, pp. 625-628

Presenter/ Allen

OUTLINE

Introduction
Related Work
Problem Definition
Collaborative Filtering via Co-clustering
Scalable Collaborative Filtering System
Experimental Results
Conclusion

INTRODUCTION

Due to the overwhelming increase in web-based activities, users are often forced to choose from a large number of products or content items.

To aid users in the decision making process, it has become increasingly important to design recommender systems.

Collaborative filtering identifies the likely preferences of a user based on the known preferences of other users.

INTRODUCTION (CONT.)

Existing collaborative filtering methods are based on:

Correlation criteria
Singular value decomposition (SVD)
Non-negative matrix factorization (NNMF)

Drawback: the training component is computationally expensive.

Practical scenarios such as real-time news personalization require dynamic collaborative filtering.

The key idea: simultaneously obtain user and item neighborhoods via co-clustering, then generate predictions based on average ratings.

INTRODUCTION (CONT.)

Two new contributions:

Dynamic collaborative filtering approach: supports the entry of new users, items and ratings via a hybrid of incremental and batch versions of the co-clustering algorithm.

A scalable, real-time collaborative filtering system: develops parallel versions of the co-clustering, prediction and incremental training routines.

Notation:

$A$: a matrix, with $A_{ij}$ denoting the corresponding matrix elements.

Sets are enumerated as $\{x_i\}_{i=1}^{n}$, where $x_i$ are the elements of the set.

RELATED WORK

Recommender systems: content-based filtering systems and collaborative filtering systems.

Co-clustering

SVD and NNMF-based filtering techniques predict the unknown ratings based on a low-rank approximation of the original ratings matrix. The missing values are filled with the average ratings.

Incremental versions of SVD have been proposed to address the high computational cost. (SDM 2003)

PROBLEM DEFINITION

Let $U=\{u_i\}_{i=1}^{m}$ be the set of users such that $|U|=m$, and $P=\{p_j\}_{j=1}^{n}$ be the set of items such that $|P|=n$.

Let $A$ be the $m \times n$ ratings matrix such that $A_{ij}$ is the rating of user $u_i$ for item $p_j$.

Let $W$ be the $m \times n$ matrix corresponding to the confidence of the ratings in $A$: $W_{ij}=1$ if the rating is known and 0 otherwise.

Let $\rho: \{1, \ldots, m\} \to \{1, \ldots, k\}$ be the user clustering and $\gamma: \{1, \ldots, n\} \to \{1, \ldots, l\}$ be the item clustering, where $k$ is the number of user clusters and $l$ is the number of item clusters.
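The definitions above can be made concrete with a small sketch. The toy rating triples, the matrix sizes and the cluster assignments below are hypothetical values chosen only for illustration; they are not taken from the paper.

```python
# Hypothetical toy data: (user index, item index, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0),
           (2, 2, 1.0), (3, 2, 2.0), (3, 3, 1.0)]
m, n = 4, 4   # number of users and number of items
k, l = 2, 2   # number of user clusters and item clusters

# A holds the ratings; W marks each entry as known (1) or unknown (0).
A = [[0.0] * n for _ in range(m)]
W = [[0] * n for _ in range(m)]
for i, j, r in ratings:
    A[i][j] = r
    W[i][j] = 1

# User clustering and item clustering, stored as lists of cluster indices
# (0-based here, whereas the slides use 1-based indices).
rho = [0, 0, 1, 1]
gamma = [0, 0, 1, 1]

known = sum(sum(row) for row in W)
print(known, "known ratings out of", m * n)
```

Storing the clusterings as plain index lists keeps lookups such as `rho[i]` cheap, which matters because every prediction needs them.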

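PROBLEM DEFINITION (CONT.)

The approximate matrix Â is given by

$$\hat{A}_{ij} = A^{COC}_{gh} + (A^{R}_{i} - A^{RC}_{g}) + (A^{C}_{j} - A^{CC}_{h})$$

where $g=\rho(i)$ is the user cluster of user $u_i$ and $h=\gamma(j)$ is the item cluster of item $p_j$.

$A^{R}_{i}$ and $A^{C}_{j}$ are the average ratings of user $u_i$ and item $p_j$.

$A^{COC}_{gh}$, $A^{RC}_{g}$ and $A^{CC}_{h}$ are the average ratings of the corresponding co-cluster, user cluster and item cluster.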
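As a rough sketch, the approximation can be computed from the five kinds of averages listed above. The code assumes the approximation is the co-cluster average plus a user bias (user average minus user-cluster average) and an item bias (item average minus item-cluster average). Averages are taken over known ratings only, and falling back to the global average for an empty group is an assumption of this sketch. The function names and toy data are hypothetical.

```python
def mean(values, default=0.0):
    """Average of an iterable, or `default` when it is empty."""
    values = list(values)
    return sum(values) / len(values) if values else default

def approximate(A, W, rho, gamma, k, l):
    """Return the approximate matrix built from the five kinds of averages."""
    m, n = len(A), len(A[0])
    known = [(i, j) for i in range(m) for j in range(n) if W[i][j]]
    glob = mean(A[i][j] for i, j in known)  # assumed fallback for empty groups
    AR = [mean((A[i][j] for j in range(n) if W[i][j]), glob) for i in range(m)]
    AC = [mean((A[i][j] for i in range(m) if W[i][j]), glob) for j in range(n)]
    ARC = [mean((A[i][j] for i, j in known if rho[i] == g), glob)
           for g in range(k)]
    ACC = [mean((A[i][j] for i, j in known if gamma[j] == h), glob)
           for h in range(l)]
    ACOC = [[mean((A[i][j] for i, j in known
                   if rho[i] == g and gamma[j] == h), glob)
             for h in range(l)] for g in range(k)]
    return [[ACOC[rho[i]][gamma[j]]
             + (AR[i] - ARC[rho[i]])
             + (AC[j] - ACC[gamma[j]])
             for j in range(n)] for i in range(m)]

# Hypothetical toy example: 3 users, 3 items, 0 marks an unknown rating.
A = [[5, 4, 0], [4, 0, 2], [0, 2, 1]]
W = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
A_hat = approximate(A, W, rho=[0, 0, 1], gamma=[0, 1, 1], k=2, l=2)
print(A_hat[0][2])  # estimate for the unknown rating of user 0 on item 2
```

In this toy case the estimate for the unknown entry is 3.0: the co-cluster average is 3, and the user bias of +0.75 cancels the item bias of -0.75.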

COLLABORATIVE FILTERING VIA CO-CLUSTERING

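Static training (co-clustering): the goal is to minimize

$$\sum_{i=1}^{m} \sum_{j=1}^{n} W_{ij}\,(A_{ij} - \hat{A}_{ij})^2$$

over the user clustering $\rho$ and the item clustering $\gamma$.

The row and column assignment steps can be implemented efficiently by pre-computing the invariant parts of the update cost functions.

Required information:

Row updating: for each user $u_i$, choose the user cluster $g$ minimizing

$$\sum_{j=1}^{n} W_{ij}\,\bigl(A_{ij} - A^{COC}_{g\gamma(j)} - A^{R}_{i} + A^{RC}_{g} - A^{C}_{j} + A^{CC}_{\gamma(j)}\bigr)^2$$

Column updating: for each item $p_j$, choose the item cluster $h$ minimizing

$$\sum_{i=1}^{m} W_{ij}\,\bigl(A_{ij} - A^{COC}_{\rho(i)h} - A^{R}_{i} + A^{RC}_{\rho(i)} - A^{C}_{j} + A^{CC}_{h}\bigr)^2$$

The pre-computed invariant parts are stored in a temporary matrix $A^{tmp}_{ij}$.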
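The alternating row and column updates can be sketched as follows. This is a minimal illustration, not the authors' implementation. It makes several assumptions: the squared error over known ratings is the cost, `tmp` stands in for the pre-computed invariant part and holds only the rating minus the user and item averages, empty groups fall back to the global average, and the initial assignment is random. All names and the toy data are hypothetical.

```python
import random

def mean(values, default):
    values = list(values)
    return sum(values) / len(values) if values else default

def averages(A, W, rho, gamma, k, l):
    """User, item, user-cluster, item-cluster and co-cluster averages."""
    m, n = len(A), len(A[0])
    known = [(i, j) for i in range(m) for j in range(n) if W[i][j]]
    glob = mean((A[i][j] for i, j in known), 0.0)
    AR = [mean((A[i][j] for j in range(n) if W[i][j]), glob) for i in range(m)]
    AC = [mean((A[i][j] for i in range(m) if W[i][j]), glob) for j in range(n)]
    ARC = [mean((A[i][j] for i, j in known if rho[i] == g), glob)
           for g in range(k)]
    ACC = [mean((A[i][j] for i, j in known if gamma[j] == h), glob)
           for h in range(l)]
    ACOC = [[mean((A[i][j] for i, j in known
                   if rho[i] == g and gamma[j] == h), glob)
             for h in range(l)] for g in range(k)]
    return AR, AC, ARC, ACC, ACOC

def cocluster(A, W, k, l, iters=10, seed=0):
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    rho = [rng.randrange(k) for _ in range(m)]
    gamma = [rng.randrange(l) for _ in range(n)]
    for _ in range(iters):
        AR, AC, ARC, ACC, ACOC = averages(A, W, rho, gamma, k, l)
        # Part of the cost that does not depend on the candidate cluster.
        tmp = [[A[i][j] - AR[i] - AC[j] for j in range(n)] for i in range(m)]

        def row_cost(i, g):
            return sum((tmp[i][j] - ACOC[g][gamma[j]] + ARC[g]
                        + ACC[gamma[j]]) ** 2
                       for j in range(n) if W[i][j])

        new_rho = [min(range(k), key=lambda g: row_cost(i, g))
                   for i in range(m)]

        def col_cost(j, h):
            return sum((tmp[i][j] - ACOC[new_rho[i]][h] + ARC[new_rho[i]]
                        + ACC[h]) ** 2
                       for i in range(m) if W[i][j])

        new_gamma = [min(range(l), key=lambda h: col_cost(j, h))
                     for j in range(n)]
        if new_rho == rho and new_gamma == gamma:
            break  # assignments stopped changing
        rho, gamma = new_rho, new_gamma
    return rho, gamma

# Hypothetical fully observed toy matrix.
A = [[5, 5, 1, 1], [5, 4, 1, 2], [1, 1, 5, 5], [2, 1, 4, 5]]
W = [[1] * 4 for _ in range(4)]
rho, gamma = cocluster(A, W, k=2, l=2)
print(rho, gamma)
```

Each pass recomputes the averages once and then reuses them for every candidate cluster, so the cost of a pass grows with the number of known ratings times the number of clusters.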

STATIC TRAINING: CO-CLUSTERING

PREDICTION

INCREMENTAL TRAINING

SCALABLE COLLABORATIVE FILTERING SYSTEM

The system uses a distributed memory representation for the data objects, so that each of the processors P1 and P2 is in fact a cluster of processors. P1 handles prediction and incremental training. P2 is responsible for static training.

PARALLEL CO-CLUSTERING

EXPERIMENTAL RESULTS

Datasets and algorithm

Movie-lens (100K): 943 users and 1682 movies, with 100,000 ratings (1-5).

BookCrossing: 470034 users and 133438 books, with 269392 ratings (1-10).

Movie1-Movie10: subsets containing 10-100% of the ratings of Movie-lens 100K.

80% training and 20% testing for all the datasets.

Evaluation metric: Mean Absolute Error (MAE)

The experiments evaluated effectiveness and efficiency in terms of MAE and execution time.
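The evaluation protocol can be sketched in a few lines. The random shuffle used for the 80/20 split and the function names are assumptions of this sketch; the slides do not say how the split was drawn.

```python
import random

def train_test_split(ratings, train_fraction=0.8, seed=0):
    """Shuffle the rating triples and split them into train and test parts."""
    shuffled = ratings[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def mean_absolute_error(actual, predicted):
    """MAE: average absolute difference between true and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical (user, item, rating) triples.
ratings = [(u, i, float((u + i) % 5 + 1)) for u in range(5) for i in range(2)]
train, test = train_test_split(ratings)
print(len(train), len(test))

mae = mean_absolute_error([1, 2, 3], [1, 3, 5])
print(mae)
```

MAE is measured in rating units, so values on the 1-5 Movie-lens scale and the 1-10 BookCrossing scale are not directly comparable.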

MAE COMPARISON

Mov1: Movie-lens
Mov2: BookCrossing
Mov3: 10 subsets of Movie-lens

VARIATION OF MAE WITH # PARAMETERS

Number of prediction parameters:

COCLUST: (m+n+kl-k-l) values
SVD, NNMF: (m+n)(k+l) values
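As a worked example, the two parameter counts can be evaluated as stated on the slide for the Movie-lens dimensions (943 users, 1682 movies). The choice k = l = 3 is a hypothetical setting used only to make the arithmetic concrete.

```python
def coclust_params(m, n, k, l):
    """Parameter count for COCLUST, as stated on the slide."""
    return m + n + k * l - k - l

def factorization_params(m, n, k, l):
    """Parameter count for SVD and NNMF, as stated on the slide."""
    return (m + n) * (k + l)

m, n = 943, 1682   # Movie-lens (100K) users and movies
k, l = 3, 3        # hypothetical numbers of user and item clusters

coclust = coclust_params(m, n, k, l)
factorization = factorization_params(m, n, k, l)
print(coclust, factorization)
```

With these values the co-clustering model needs 2628 values against 15750 for the factorization methods, because its count is dominated by m + n rather than by a product with the number of clusters.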

Movie3 dataset

EFFICIENCY

Prediction time for each given test pair of Movie-lens.

Training time (co-clustering) vs. data size

Movie-lens dataset

Experimental devices: AMD 1.4GHz, 128 compute nodes with 384MB RAM

TRAINING TIME VS. # OF PROCESSORS

Movie-lens dataset

Experimental devices: AMD 1.4GHz, different numbers of processors with 384MB RAM

CONCLUSION

Recommender systems are proving to be extremely useful for a number of online activities such as e-commerce.

In the dynamic scenario, where new users, items and ratings enter the system at a rapid rate, both efficiency and effectiveness must be addressed.

This paper proposed a new dynamic CF approach based on co-clustering.

Empirical results indicate high-quality predictions at a much lower computational cost.
