Survey of Recommendation Systems

Upload: youalab

Post on 06-May-2015


Page 1: Survey of Recommendation Systems

Survey of Recommendation Systems

Page 2: Survey of Recommendation Systems

Outline

• Introduction

• Collaborative Filtering Algorithm

• Challenges

• Experiments (demo)

• Summary

• Future work

Page 4: Survey of Recommendation Systems

Introduction

• What is a recommendation system?

– Recommend related items

– Personalized experiences

• How to build a recommendation system?

– Content-Based

– Collaborative Filtering Algorithm

• Examples

– Amazon

– Youa

Page 5: Survey of Recommendation Systems

Examples

Browsing a book

Recommendations

Rating?

Page 6: Survey of Recommendation Systems

Outline

• Introduction

• Collaborative Filtering Algorithm

• Challenges

• Experiments (demo)

• Summary

• Future work

Page 7: Survey of Recommendation Systems

CF Algorithm

• Memory-Based

– User-Based

– Item-Based

• Model-Based

– Bayes

– Clustering

Page 8: Survey of Recommendation Systems

User-Based CF Algorithm

Page 9: Survey of Recommendation Systems

User-Based CF Algorithm

User by Item Matrix:

Table 1: An example of user-item matrix

Table 2: A simple example of ratings matrix

Page 10: Survey of Recommendation Systems

User-Based CF Algorithm

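Voting: $v_{i,j}$ is the vote of user $i$ on item $j$.

Mean vote:

$$\bar{v}_i = \frac{1}{|I_i|} \sum_{j \in I_i} v_{i,j}$$

where $I_i$ is the set of items on which user $i$ voted.

Predicted vote:

$$p_{a,j} = \bar{v}_a + \kappa \sum_{i=1}^{n} w(a,i)\,(v_{i,j} - \bar{v}_i)$$

where $w(a,i)$ are the weights of the $n$ similar users and $\kappa$ is a normalizer.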
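The mean vote and predicted vote can be sketched in a few lines. This is a minimal illustration, assuming the common mean-centred form in which the normalizer is one over the sum of absolute weights; the function names and the dictionary layout (item to vote) are hypothetical choices, not part of the slides.

```python
def mean_vote(votes):
    """Mean of one user's votes; `votes` maps item -> vote."""
    return sum(votes.values()) / len(votes)


def predicted_vote(active_votes, neighbour_votes, weights, item):
    """Predict the active user's vote on `item`.

    neighbour_votes: user -> {item: vote}
    weights:         user -> similarity weight to the active user
    Assumption: the normalizer is 1 / sum(|weight|) over the
    neighbours who actually voted on `item`.
    """
    numerator = 0.0
    norm = 0.0
    for user, w in weights.items():
        votes = neighbour_votes[user]
        if item in votes:
            numerator += w * (votes[item] - mean_vote(votes))
            norm += abs(w)
    if norm == 0:
        # No neighbour voted on the item: fall back to the mean vote.
        return mean_vote(active_votes)
    return mean_vote(active_votes) + numerator / norm
```

Falling back to the active user's mean when no neighbour has voted is one possible design choice; other systems simply decline to predict.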

Page 11: Survey of Recommendation Systems

Similarity Computation

Vector Cosine-Based Similarity

Correlation-Based Similarity (Pearson)

Other Similarities

Page 12: Survey of Recommendation Systems

Vector Cosine-Based Similarity

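Vector cosine similarity:

$$\cos(\vec{A}, \vec{B}) = \frac{\vec{A} \cdot \vec{B}}{\|\vec{A}\| \, \|\vec{B}\|}$$

Adjusted cosine similarity:

$$w_{i,j} = \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_u)(r_{u,j} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{u \in U} (r_{u,j} - \bar{r}_u)^2}}$$

Different rating scales? Adjusted cosine subtracts each user's mean rating $\bar{r}_u$.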
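Both similarities can be sketched as follows. This is an illustrative implementation, assuming that adjusted cosine subtracts each user's mean over all of that user's ratings and sums over users who rated both items; the function names and the nested-dictionary layout are hypothetical.

```python
import math


def cosine(a, b):
    """Plain vector cosine between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def adjusted_cosine(ratings, i, j):
    """Adjusted cosine between items i and j.

    ratings: user -> {item: rating}. Only users who rated both items
    contribute, and each user's own mean rating is subtracted first.
    """
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if i in user_ratings and j in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            di = user_ratings[i] - mean
            dj = user_ratings[j] - mean
            num += di * dj
            den_i += di * di
            den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0  # similarity undefined; treated as 0 in this sketch
    return num / math.sqrt(den_i * den_j)
```

Plain cosine treats a user who rates everything one point higher as a different vector, which is the rating-scale issue that the mean-centring addresses.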

Page 13: Survey of Recommendation Systems

Correlation-Based Similarity

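Pearson correlation:

$$w_{u,v} = \frac{\sum_{i \in I} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I} (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_{i \in I} (r_{v,i} - \bar{r}_v)^2}}$$

where $I$ is the set of items rated by both users $u$ and $v$, and $\bar{r}_u$ is the average rating of user $u$ over those co-rated items.

Thus in the example in Table 2, we have $w_{1,5} = 0.756$.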
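A small sketch of the Pearson weight between two users follows. It assumes the means are taken over co-rated items only. The two rating dictionaries are illustrative values chosen so that the result agrees with the 0.756 quoted above; they are an assumption, not a copy of Table 2.

```python
import math


def pearson(u, v):
    """Pearson correlation between two users over their co-rated items.

    u, v: {item: rating}. Means are computed over co-rated items only
    (an assumption of this sketch).
    """
    common = [item for item in u if item in v]
    if not common:
        return 0.0
    mean_u = sum(u[i] for i in common) / len(common)
    mean_v = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mean_u) * (v[i] - mean_v) for i in common)
    den = math.sqrt(
        sum((u[i] - mean_u) ** 2 for i in common)
        * sum((v[i] - mean_v) ** 2 for i in common)
    )
    return num / den if den else 0.0


# Illustrative ratings (assumed values).
user_1 = {"I1": 4, "I3": 5, "I4": 5}
user_5 = {"I1": 2, "I2": 1, "I3": 3, "I4": 5}
print(round(pearson(user_1, user_5), 3))  # 0.756
```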

Page 14: Survey of Recommendation Systems

Prediction Computation

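Weighted Sum of Others' Ratings:

$$P_{a,i} = \bar{r}_a + \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_u) \cdot w_{a,u}}{\sum_{u \in U} |w_{a,u}|}$$

where $\bar{r}_a$ and $\bar{r}_u$ are the average ratings of the active user $a$ and of user $u$, $w_{a,u}$ is the weight between them, and $U$ is the set of users who have rated item $i$.

For the simple example in Table 4, using the user-based CF algorithm, to predict the rating for U1 on I2, we have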

Page 15: Survey of Recommendation Systems

Recommendations I

Rating Prediction Algorithm:

a) Calculate $P_{a,i}$ for each item $i$ using the prediction computation formula.

b) Recommend the top-N items with the highest predicted ratings that the active user $a$ has not purchased.
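Step b) can be sketched as a filter followed by a sort. This is a minimal illustration that assumes the predicted ratings from step a) are already available as a dictionary; the function name and argument layout are hypothetical.

```python
def recommend_top_n(predicted, purchased, n):
    """Return the n unpurchased items with the highest predicted rating.

    predicted: item -> predicted rating for the active user
    purchased: set of items the active user already has
    Ties are broken by item name so the output is deterministic.
    """
    candidates = [
        (item, score) for item, score in predicted.items()
        if item not in purchased
    ]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in candidates[:n]]
```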

Page 16: Survey of Recommendation Systems

Recommendations II

K Nearest Neighbors Algorithm:

a) Find the k most similar users (KNN).

b) Identify a set of items, C, purchased by the group, together with their frequency.

c) Recommend the top-N most frequent items in C that the active user has not purchased.
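The three steps above can be sketched as follows, assuming the similarities to the active user have already been computed. The function name and data layout are hypothetical.

```python
from collections import Counter


def knn_frequent_items(similarities, purchases, active, k, n):
    """Frequency-based top-N recommendation from the k nearest users.

    similarities: user -> similarity to the active user
    purchases:    user -> set of purchased items (includes the active user)
    """
    # a) the k most similar users
    neighbours = sorted(
        (u for u in similarities if u != active),
        key=lambda u: -similarities[u],
    )[:k]
    # b) items bought by the group, with their frequency,
    #    excluding what the active user already purchased
    counts = Counter()
    for user in neighbours:
        counts.update(purchases[user] - purchases[active])
    # c) the n most frequent items (ties broken by item name)
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:n]]
```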

Page 17: Survey of Recommendation Systems

Item-Based CF Algorithm

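Correlation-Based Similarity:

$$w_{i,j} = \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_{u \in U} (r_{u,i} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in U} (r_{u,j} - \bar{r}_j)^2}}$$

where $U$ is the set of users who rated both items $i$ and $j$, $r_{u,i}$ is the rating of user $u$ on item $i$, and $\bar{r}_i$ is the average rating of the $i$th item by those users.

[Figure: User-Item Matrix]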

Page 18: Survey of Recommendation Systems

Prediction Computation

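Simple Weighted Average:

$$P_{u,i} = \frac{\sum_{n \in N} r_{u,n} \, w_{i,n}}{\sum_{n \in N} |w_{i,n}|}$$

where the sums run over all other items $n \in N$ rated by user $u$, $w_{i,n}$ is the weight between items $i$ and $n$, and $r_{u,n}$ is the rating of user $u$ on item $n$.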
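The simple weighted average can be sketched like this, assuming the item-to-item weights for the target item are precomputed and the denominator sums absolute weights. Names and data layout are hypothetical.

```python
def predict_item_based(user_ratings, item_weights):
    """Predict a user's rating for a target item i.

    user_ratings: item n -> rating r_{u,n} given by the user
    item_weights: item n -> weight w_{i,n} between the target item and n
    Returns None when no rated item has a usable weight.
    """
    numerator = 0.0
    denominator = 0.0
    for item, rating in user_ratings.items():
        weight = item_weights.get(item)
        if weight is None:
            continue  # no similarity known for this item
        numerator += rating * weight
        denominator += abs(weight)
    return numerator / denominator if denominator else None
```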

Page 19: Survey of Recommendation Systems

Extensions

• Default Voting

• Inverse User Frequency

• Case Amplification

Page 20: Survey of Recommendation Systems

Default Voting

Problem:

• Pair-wise similarity is computed only from the ratings in the intersection of the items both users have rated.

• Too few votes at the beginning.

Solution: Assuming some default voting values for the missing ratings can improve the CF prediction performance.

Dimension reduction, such as SVD or PCA.
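One way to apply default voting before computing a similarity is sketched below. It assumes that every item rated by either user is included and that a single default value fills the gaps; the helper name and the choice of default are hypothetical.

```python
def with_default_votes(u, v, default):
    """Align two users' votes over the union of their rated items.

    u, v: {item: vote}. Missing votes are replaced by `default`, so a
    similarity can be computed on more than the plain intersection.
    Returns two equal-length lists in sorted item order.
    """
    items = sorted(set(u) | set(v))
    return (
        [u.get(item, default) for item in items],
        [v.get(item, default) for item in items],
    )
```

The two aligned lists can then be passed to any vector similarity, such as cosine or Pearson correlation.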

Page 21: Survey of Recommendation Systems

Inverse User Frequency

Definition:

$$f_j = \log(n / n_j)$$

where $n_j$ is the number of users who have rated item $j$ and $n$ is the total number of users.
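The definition can be computed directly from a ratings dictionary. This is a minimal sketch using the natural logarithm; returning 0.0 for an item nobody has rated is an assumption of the sketch, since the quantity is undefined in that case.

```python
import math


def inverse_user_frequency(ratings, item):
    """f_j = log(n / n_j) for one item.

    ratings: user -> {item: rating}
    n is the total number of users, n_j the number who rated `item`.
    """
    n = len(ratings)
    n_j = sum(1 for user_ratings in ratings.values() if item in user_ratings)
    if n_j == 0:
        return 0.0  # undefined for unrated items; 0.0 is a placeholder
    return math.log(n / n_j)
```

An item rated by every user gets a frequency of zero, so universally liked items contribute nothing to the similarity between users.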

Page 22: Survey of Recommendation Systems

Case Amplification

$$w'_{i,j} = w_{i,j} \cdot |w_{i,j}|^{\rho - 1}$$

where ρ is the case amplification power, ρ ≥ 1, and a typical choice of ρ is 2.5. Case amplification reduces noise in the data.

It tends to favor high weights, as small values raised to a power become negligible.

For example, if $w_{i,j} = 0.9$, it remains high ($0.9^{2.5} \approx 0.8$); if $w_{i,j} = 0.1$, it becomes negligible ($0.1^{2.5} \approx 0.003$).
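The effect is easy to check numerically. The sketch below assumes the usual sign-preserving transform, in which a weight is multiplied by its own absolute value raised to ρ − 1; the function name is hypothetical.

```python
def amplify(weight, rho=2.5):
    """Case amplification: keeps the sign, shrinks small weights most."""
    return weight * abs(weight) ** (rho - 1)


print(round(amplify(0.9), 2))   # 0.77
print(round(amplify(0.1), 4))   # 0.0032
```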

Page 23: Survey of Recommendation Systems

Model-Based CF Algorithm

• Simple Bayesian CF Algorithm

• Clustering CF Algorithm

Page 24: Survey of Recommendation Systems

Simple Bayesian CF Algorithm

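Simple Bayesian:

$$\text{class} = \arg\max_{j \in \text{classSet}} \; p(\text{class}_j) \prod_{o} P(X_o = x_o \mid \text{class}_j)$$

Laplace Estimator:

$$P(X_i = x_i \mid Y = y) = \frac{\#(X_i = x_i, Y = y) + 1}{\#(Y = y) + |X_i|}$$

where $|X_i|$ is the number of possible values of $X_i$.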
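A simple Bayesian classifier with Laplace smoothing can be sketched as below. This is an illustration under stated assumptions rather than the method used for the slides' example: the class prior is left unsmoothed, rows with a missing feature are skipped for that feature, and all names are hypothetical.

```python
def laplace(count_xy, count_y, n_values):
    """Laplace estimate of P(X = x | Y = y)."""
    return (count_xy + 1) / (count_y + n_values)


def simple_bayes_predict(rows, labels, query, classes, n_values):
    """Pick the class maximising prior times smoothed likelihoods.

    rows:     list of {feature: value} training examples
    labels:   class label of each row
    query:    {feature: value} for the example to classify
    n_values: number of possible values a feature can take
    """
    best_class, best_score = None, -1.0
    for cls in classes:
        members = [k for k, label in enumerate(labels) if label == cls]
        score = len(members) / len(labels)  # unsmoothed prior (assumption)
        for feature, value in query.items():
            observed = [k for k in members if feature in rows[k]]
            matches = sum(1 for k in observed if rows[k][feature] == value)
            score *= laplace(matches, len(observed), n_values)
        if score > best_score:
            best_class, best_score = cls, score
    return best_class
```

In a CF setting the class would be the active user's rating of an item and the features would be other users' ratings of the same item.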

Page 25: Survey of Recommendation Systems

Simple Bayesian CF Algorithm

Example in Table 4, to produce the rating for U1 on I2 using the

Simple Bayesian CF algorithm and the Laplace Estimator:

Page 26: Survey of Recommendation Systems

Clustering CF Algorithm

For two data objects, $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$, the popular Minkowski distance is defined as

$$d(X, Y) = \left( \sum_{i=1}^{n} |x_i - y_i|^q \right)^{1/q}$$

where $n$ is the dimension number of the object, and $q$ is a positive integer.

When $q = 1$, $d$ is the Manhattan distance; when $q = 2$, $d$ is the Euclidean distance.
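The Minkowski distance and its two special cases can be checked with a short sketch. The function name is a hypothetical choice.

```python
def minkowski(x, y, q):
    """Minkowski distance of order q between two equal-length vectors."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)


print(minkowski([0, 0], [3, 4], 1))  # Manhattan: 7.0
print(minkowski([0, 0], [3, 4], 2))  # Euclidean: 5.0
```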

Page 27: Survey of Recommendation Systems

Evaluation Metrics

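Mean Absolute Error and Normalized Mean Absolute Error:

$$\mathrm{MAE} = \frac{\sum_{\{i,j\}} |p_{i,j} - r_{i,j}|}{n}, \qquad \mathrm{NMAE} = \frac{\mathrm{MAE}}{r_{\max} - r_{\min}}$$

where $p_{i,j}$ is the predicted rating and $r_{i,j}$ the actual rating of user $i$ on item $j$, $n$ is the total number of ratings evaluated, and $r_{\max}$ and $r_{\min}$ are the upper and lower bounds of the ratings.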
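Both metrics are short to compute. The sketch below assumes the predicted and actual ratings are given as two aligned lists; the function names are hypothetical.

```python
def mae(predicted, actual):
    """Mean absolute error between aligned predicted and actual ratings."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(predicted)


def nmae(predicted, actual, r_min, r_max):
    """MAE normalised by the width of the rating scale."""
    return mae(predicted, actual) / (r_max - r_min)


print(mae([4, 3, 5], [5, 3, 3]))         # 1.0
print(nmae([4, 3, 5], [5, 3, 3], 1, 5))  # 0.25
```

Normalising by the scale width makes errors comparable across datasets that use different rating ranges.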

Page 28: Survey of Recommendation Systems

Outline

• Introduction

• Collaborative Filtering Algorithm

• Challenges

• Experiments (demo)

• Summary

• Future work

Page 29: Survey of Recommendation Systems

Challenges

• Data sparsity

• Scalability

• Synonymy

• Gray Sheep

• Shilling Attacks

Page 30: Survey of Recommendation Systems

Outline

• Introduction

• Collaborative Filtering Algorithm

• Challenges

• Experiments (demo)

• Summary

• Future work

Page 31: Survey of Recommendation Systems

Demo

• Tools: Mahout, a scalable machine learning and data mining library, http://mahout.apache.org/

• Data: MovieLens, http://www.movielens.org/

Page 32: Survey of Recommendation Systems

Outline

• Introduction

• Collaborative Filtering Algorithm

• Challenges

• Experiments (demo)

• Summary

• Future work

Page 33: Survey of Recommendation Systems

Conclusions

CF category: Memory-based CF

Representative techniques: Item-based/user-based top-N recommendations

Main advantages:

1. easy implementation

2. new data can be added easily and incrementally

3. need not consider the content of the items being recommended

4. scale well with co-rated items

Main shortcomings:

1. are dependent on human ratings

2. performance decreases when data are sparse

3. cannot recommend for new users and items

4. have limited scalability for large datasets

Page 34: Survey of Recommendation Systems

Conclusions

CF category: Model-based CF

Representative techniques:

1. Bayesian belief nets CF

2. Clustering CF

3. CF using dimensionality reduction techniques (SVD, PCA)

Main advantages:

1. better address the sparsity, scalability and other problems

2. improve prediction performance

3. give an intuitive rationale for recommendations

Main shortcomings:

1. expensive model-building

2. trade-off between prediction performance and scalability

3. lose useful information for dimensionality reduction techniques

Page 35: Survey of Recommendation Systems

Outline

• Introduction

• Collaborative Filtering Algorithm

• Challenges

• Experiments (demo)

• Summary

• Future work

Page 36: Survey of Recommendation Systems

Future work

• Scalability

• Real-time

Page 37: Survey of Recommendation Systems

Q & A

Page 38: Survey of Recommendation Systems

References

J. Breese, D. Heckerman, and C. Kadie, “Empirical analysis of predictive algorithms for collaborative filtering,” in Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI ’98), 1998.

B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Item-based collaborative filtering recommendation algorithms,” in Proceedings of the WWW Conference, 2001.

K. Miyahara and M. J. Pazzani, “Collaborative filtering with the simple Bayesian classifier,” in Proceedings of the 6th Pacific Rim International Conference on Artificial Intelligence, pp. 679–689, 2000.

L. H. Ungar and D. P. Foster, “Clustering methods for collaborative filtering,” in Proceedings of the Workshop on Recommendation Systems, AAAI Press, 1998.

X. Su and T. M. Khoshgoftaar, “A survey of collaborative filtering techniques,” Advances in Artificial Intelligence, vol. 2009, Article ID 421425, 19 pages, 2009.