CoNMF: Exploiting User Comments for Clustering Web2.0 Items


Page 1: CoNMF: Exploiting User Comments for Clustering Web2.0 Items

CoNMF: Exploiting User Comments for Clustering Web2.0 Items

Presenter: He Xiangnan

28 June 2013
Email: [email protected]

School of Computing

National University of Singapore

Page 2: CoNMF: Exploiting User Comments for Clustering Web2.0 Items

Xiangnan He

Introduction

• Motivations:
– Users comment on items based on their own interests.
– Most users’ interests are limited.
– The categories of items can therefore be inferred from the comments.

• Proposed problem:
– Clustering items by exploiting user comments.

• Applications:
– Improving search diversity.
– Automatic tag generation from comments.
– Group-based recommendation.

WING, NUS

Page 3: CoNMF: Exploiting User Comments for Clustering Web2.0 Items


Challenges

• Traditional solution:
– Represent items in a feature space.
– Apply any clustering algorithm, e.g. k-means.

• Key challenges:
– Items have heterogeneous features:
  1. Own features (e.g. words for articles, pixels for images)
  2. Comments: usernames and textual contents
– Simply concatenating all features does not perform well.
– How can we meaningfully combine the heterogeneous views to produce a better clustering (i.e. multi-view clustering)?
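The traditional baseline above — flatten every view into one feature space and hand it to a clustering algorithm — can be sketched as follows. This is a toy illustration with made-up matrices and a minimal hand-rolled k-means, not the evaluation code behind the experiments:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means: returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each item to its nearest centroid (squared Euclidean distance)
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Toy item features: an "own features" view and a "comments" view.
own_view = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
comment_view = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0], [0.0, 1.0, 1.0]])

# Naive multi-view strategy: concatenate all views into one feature space.
X = np.hstack([own_view, comment_view])
labels = kmeans(X, k=2)
```

On real Web2.0 data the views have very different dimensionalities and scales, which is exactly why this naive concatenation underperforms.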

Page 4: CoNMF: Exploiting User Comments for Clustering Web2.0 Items


Proposed solution

• Extend NMF (Nonnegative Matrix Factorization) to support multi-view clustering…


Page 5: CoNMF: Exploiting User Comments for Clustering Web2.0 Items


NMF (Non-negative Matrix Factorization)


• Factorize the data matrix V (#docs × #words) as
  V ≈ W H,
– where W is #docs × k, H is k × #words, and every entry of W and H is non-negative.

• Goal: minimize the objective function
  J = || V − W H ||²,
– where || · || denotes the Frobenius norm.

• Alternating optimization:
– With Lagrange multipliers, differentiate J with respect to W and H respectively, yielding multiplicative update rules.
– Caveat: this reaches a local optimum, not a global one!
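The alternating optimization is commonly realized with Lee–Seung multiplicative updates. A minimal NumPy sketch (an illustration of the standard algorithm, not the presenter's implementation):

```python
import numpy as np

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for min ||V - W H||_F^2, W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(iters):
        # multiplicative updates keep all entries non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.random.default_rng(1).random((6, 4))  # toy non-negative data matrix
W, H = nmf(V, k=2)
err = np.linalg.norm(V - W @ H)  # Frobenius reconstruction error
```

Different random initializations generally reach different local optima, which is why initialization matters in the experiments later.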

Page 6: CoNMF: Exploiting User Comments for Clustering Web2.0 Items


Characteristics of NMF

• Matrix factorization with a non-negativity constraint.
• Reduces the dimensionality of the data and derives a latent space.
• Differences from SVD (LSI):

  Characteristic          SVD   NMF
  Orthogonal basis        Yes   No
  Negative entries        Yes   No
  Needs post-clustering   Yes   No

• Theoretically shown to be suitable for clustering (Ding et al. 2005).
• Practically shown to outperform SVD and k-means in document clustering (Xu et al. 2003).

Page 7: CoNMF: Exploiting User Comments for Clustering Web2.0 Items


Extensions of NMF

• Relationships with other clustering algorithms:
– K-means: orthogonal NMF is equivalent to k-means.
– PLSI: KL-divergence NMF is equivalent to PLSI.
– Spectral clustering.

• Extensions:
– Tri-factorization (V = W S H) (Ding et al. 2006)
– NMF with sparsity constraints (Hoyer 2004)
– NMF with graph regularization (Cai et al. 2011)
– However, studies on NMF-based multi-view clustering are quite limited (Liu et al. 2013).

• My proposal:
– Extend NMF to support multi-view clustering.


Page 8: CoNMF: Exploiting User Comments for Clustering Web2.0 Items


Proposed solution - CoNMF

• Idea:
– Couple the factorization processes of NMF across views.

• Example:
– Single NMF:
  Factorization: V ≈ W H
  Objective: J = || V − W H ||²
  Constraints: all entries of W and H are non-negative.


– 2-view CoNMF:
  Factorization: V1 ≈ W1 H1,  V2 ≈ W2 H2
  Objective: J = λ1 || V1 − W1 H1 ||² + λ2 || V2 − W2 H2 ||² + γ || W1 − W2 ||²
  Constraints: all entries of W1, W2, H1, H2 are non-negative; the last term couples the two factorizations.
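A runnable sketch of the 2-view coupling, assuming point-wise regularization γ||W1 − W2||² and heuristic multiplicative updates; the weights λ, γ and the exact update form are illustrative, not the paper's derivation:

```python
import numpy as np

def conmf_two_view(V1, V2, k, lam=(1.0, 1.0), gamma=0.5, iters=200, seed=0, eps=1e-9):
    """Sketch of 2-view CoNMF with point-wise coupling ||W1 - W2||_F^2.

    Minimizes  lam1*||V1 - W1 H1||^2 + lam2*||V2 - W2 H2||^2 + gamma*||W1 - W2||^2
    with all factors non-negative.
    """
    rng = np.random.default_rng(seed)
    n = V1.shape[0]
    W1, W2 = rng.random((n, k)), rng.random((n, k))
    H1, H2 = rng.random((k, V1.shape[1])), rng.random((k, V2.shape[1]))
    l1, l2 = lam
    for _ in range(iters):
        H1 *= (W1.T @ V1) / (W1.T @ W1 @ H1 + eps)
        H2 *= (W2.T @ V2) / (W2.T @ W2 @ H2 + eps)
        # the gamma terms pull the two item-factor matrices toward each other
        W1 *= (l1 * (V1 @ H1.T) + gamma * W2) / (l1 * (W1 @ H1 @ H1.T) + gamma * W1 + eps)
        W2 *= (l2 * (V2 @ H2.T) + gamma * W1) / (l2 * (W2 @ H2 @ H2.T) + gamma * W2 + eps)
    return W1, W2, H1, H2
```

Cluster labels can then be read off as the argmax over the rows of the (averaged) item-factor matrices.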

Page 9: CoNMF: Exploiting User Comments for Clustering Web2.0 Items


CoNMF Framework

• Coupling the factorization processes of the multiple matrices (i.e. views) via regularization.

• Objective function:
  J = Σ_s λ_s || V_s − W_s H_s ||² + R(W_1, …, W_S)
– A similar alternating optimization with Lagrange multipliers solves it.

• Different options for the regularization R:
– Centroid-based (Liu et al. 2013): every view is pulled toward a consensus factor W*,
  R = Σ_s || W_s − W* ||²
– Mutual-based:
  Point-wise: R = Σ_{s<t} || W_s − W_t ||²
  Cluster-wise: R = Σ_{s<t} || W_sᵀ W_s − W_tᵀ W_t ||²
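The regularization options can be compared side by side. The sketch below assumes squared-Frobenius penalties with unit weights (a common choice; the exact weighting and normalization in the paper may differ):

```python
import numpy as np

def centroid_penalty(Ws):
    """Centroid-based (Liu et al. 2013): each W_s close to a consensus W*."""
    Wstar = np.mean(Ws, axis=0)
    return sum(np.linalg.norm(W - Wstar) ** 2 for W in Ws)

def pointwise_penalty(Ws):
    """Mutual point-wise: every pair of views agrees item by item."""
    return sum(np.linalg.norm(Ws[s] - Ws[t]) ** 2
               for s in range(len(Ws)) for t in range(s + 1, len(Ws)))

def clusterwise_penalty(Ws):
    """Mutual cluster-wise: pairs agree on cluster correlations W^T W."""
    return sum(np.linalg.norm(Ws[s].T @ Ws[s] - Ws[t].T @ Ws[t]) ** 2
               for s in range(len(Ws)) for t in range(s + 1, len(Ws)))

# Tiny demo on two toy item-factor matrices.
Ws = [np.ones((4, 2)), 2.0 * np.ones((4, 2))]
penalties = (centroid_penalty(Ws), pointwise_penalty(Ws), clusterwise_penalty(Ws))
```

Point-wise coupling is the stricter requirement (identical factors), while cluster-wise coupling only asks the views to agree on the cluster structure.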

Page 10: CoNMF: Exploiting User Comments for Clustering Web2.0 Items


Experiments

• Last.fm dataset:

  #Items   #Users    #Comments   #Clusters
  9,694    131,898   2,500,271   21

• 3 views:

  View                #Items   #Features   Weighting
  Items–Desc. words   9,694    14,076      TF–IDF
  Items–Comm. words   9,694    31,172      TF–IDF
  Items–Users         9,694    131,898     Boolean

• Ground truth:
– The music type of each artist, as provided by Last.fm.

• Evaluation metrics:
– Accuracy and F1.

• Results are averaged over 20 runs.
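Building such view matrices can be illustrated on toy data. The items, users, and comments below are hypothetical, and the TF-IDF is hand-rolled for self-containment rather than taken from any specific library:

```python
import numpy as np
from collections import Counter

# Toy comment log: (item, user, comment text) -- hypothetical data.
comments = [
    ("song_a", "u1", "great rock guitar"),
    ("song_a", "u2", "classic rock"),
    ("song_b", "u2", "smooth jazz sax"),
    ("song_b", "u3", "jazz classic"),
]

items = sorted({c[0] for c in comments})
users = sorted({c[1] for c in comments})

# Items-Users view: Boolean "has this user commented on this item".
user_view = np.zeros((len(items), len(users)))
for item, user, _ in comments:
    user_view[items.index(item), users.index(user)] = 1.0

# Items-Comm. words view: TF-IDF over all comment words per item.
docs = {i: [] for i in items}
for item, _, text in comments:
    docs[item] += text.split()
vocab = sorted({w for ws in docs.values() for w in ws})
df = np.array([sum(w in docs[i] for i in items) for w in vocab], dtype=float)
idf = np.log(len(items) / df)  # words in every doc get weight 0
tf = np.array([[Counter(docs[i])[w] for w in vocab] for i in items], dtype=float)
word_view = tf * idf
```

The same construction scales to the 9,694 × 131,898 Boolean matrix of the real dataset, usually stored sparse.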

Page 11: CoNMF: Exploiting User Comments for Clustering Web2.0 Items


Statistics of datasets


• Statistics of #items/user and #clusters/user (plots omitted in transcript), where T is the number of music types a user comments on:
  P(T ≤ 3) = 0.6229
  P(T ≤ 5) = 0.8474
  P(T ≤ 10) = 0.9854

• This verifies our assumption: each user usually comments on a limited number of music types.

Page 12: CoNMF: Exploiting User Comments for Clustering Web2.0 Items


Experimental results (Accuracy)

Initialization   Method                 Desc.   Comm.   Users   Comb.
Random           k-means                0.25    0.28    0.34    0.415
                 SVD                    0.29    0.31    0.28    0.294
Random           NMF                    0.24    0.27    0.32    0.313
K-means          NMF                    0.26    0.32    0.40    0.417
K-means          CoNMF – point          –       –       –       0.460
K-means          CoNMF – cluster        –       –       –       0.420
NMF              Multi-NMF (SDM'13)     –       –       –       0.369
Random           MM-LDA (WSDM'09)       –       –       –       0.366

Observations:
1. Users > Comm. > Desc., and the combined features perform best (k-means).
2. SVD performs badly on the users view (non-textual).
3. For NMF, again Users > Comm. > Desc., but the combined features do worse than the users view alone.
4. Initialization is important for NMF.
5. CoNMF-point performs best.
6. The last two rows are state-of-the-art multi-view baselines.
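Clustering accuracy as reported above requires mapping predicted clusters to ground-truth classes before counting hits. A small sketch that brute-forces the best one-to-one mapping (fine for small cluster counts; large k would need the Hungarian algorithm instead):

```python
from itertools import permutations

def clustering_accuracy(true_labels, pred_labels):
    """Accuracy under the best one-to-one mapping of predicted to true clusters.

    Brute-forces all cluster permutations, so it assumes the number of
    predicted clusters does not exceed the number of true classes.
    """
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(pred_labels))
    best = 0
    for perm in permutations(true_ids):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(mapping[p] == t for p, t in zip(pred_labels, true_labels))
        best = max(best, hits)
    return best / len(true_labels)
```

For example, a prediction that merely swaps the two cluster ids of the ground truth still scores accuracy 1.0.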

Page 13: CoNMF: Exploiting User Comments for Clustering Web2.0 Items


Experimental results (F1)


Initialization   Method                 Desc.   Comm.   Users   Combined
Random           k-means                0.15    0.16    0.15    0.254
                 SVD                    0.25    0.25    0.24    0.249
Random           NMF                    0.13    0.18    0.21    0.216
K-means          NMF                    0.15    0.21    0.27    0.298
K-means          CoNMF – point          –       –       –       0.320
K-means          CoNMF – cluster        –       –       –       0.284
NMF              Multi-NMF (SDM'13)     –       –       –       0.265
Random           MM-LDA (WSDM'09)       –       –       –       0.286

Page 14: CoNMF: Exploiting User Comments for Clustering Web2.0 Items


Conclusions

• Comments benefit clustering.
• Mining different views from the comments is important:
– The two views (commenting words and commenting users) contribute differently to clustering.
– For this Last.fm dataset, the users view is more useful.
– Combining all views works best.
• For NMF-based methods, initialization is important.


Page 15: CoNMF: Exploiting User Comments for Clustering Web2.0 Items


Ongoing

• More experiments on other datasets.
• Improving the CoNMF framework by adding sparseness constraints.
• Studying the influence of normalization on CoNMF.

Page 16: CoNMF: Exploiting User Comments for Clustering Web2.0 Items


Thanks!

QA?


Page 17: CoNMF: Exploiting User Comments for Clustering Web2.0 Items


References (I)

• Chris Ding, Xiaofeng He, and Horst D. Simon. 2005. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proc. of SDM 2005.

• Wei Xu, Xin Liu, and Yihong Gong. 2003. Document clustering based on non-negative matrix factorization. In Proc. of SIGIR 2003.

• Chris Ding, Tao Li, and Wei Peng. 2006. Orthogonal nonnegative matrix tri-factorizations for clustering. In Proc. of SIGKDD 2006.

• Patrik O. Hoyer. 2004. Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 2004.

• Deng Cai, Xiaofei He, Jiawei Han, and Thomas S. Huang. 2011. Graph regularized nonnegative matrix factorization for data representation. IEEE Trans. Pattern Anal. Mach. Intell., 2011.

• Jialu Liu, Chi Wang, Jing Gao, and Jiawei Han. 2013. Multi-view clustering via joint nonnegative matrix factorization. In Proc. of SDM 2013.
