Learning to Recommend with User Generated Content

Yueshen Xu1, Zhiyuan Chen2, Jianwei Yin1, Zizheng Wu1 and Taojun Yao1

1School of Computer Science and Technology, Zhejiang University

2University of Illinois at Chicago


2015/6/9, Zhejiang University

Junxiang Wang

Yueshen Xu, WAIM, 2015

Outline

Background

Introduction

Related Work

Recommendation with UGC in User Side

Matrix Factorization

Topic Analysis for Items through Topic Modeling

User Interest Distribution

User Topic Regularization

Recommendation with UGC in Item Side

Item Topic Regularization

Experiment and Evaluation

Reference

Keywords: Recommendation, User Generated Content, Topic Modeling, Matrix Factorization


Background

Recommendation in General

Collaborative Filtering (CF)

− Matrix Factorization (MF)

Content-based approach

− Pandora music genome project


User Generated Content (UGC)

social tags, reviews, question answering, blogs, tweets, etc.

tag-based / review-based recommendation

Problems in existing works

not every web site has all kinds of UGC

the item-word / user-word space is highly sparse

synonymy & polysemy

most works focus on only a single kind of UGC

        item1  item2  item3  item4
user1   r11    -      -      -
user2   -      r22    -      -
user3   -      -      -      -
user4   r41    -      -      r44
user5   -      -      r53    -


Background


Other related work

Social / trust-based recommendation: helpful but limited

− many sites have no social relationships: Amazon, eBay, Newegg, Jingdong, Expedia, etc.

− but they do have UGC √

Description/profile-based recommendation

− static content

− fails to distinguish different items

− unrelated to a user’s preference

UGC, in contrast:

emphasizes an item’s features

− the words an item receives most frequently

increases dynamically

is associated with a user’s preference / interested topics

− e.g., "I like science fiction films, so I wrote a lot of movie reviews that contain words like fiction, tech, super, hero, robotic, machine"

comes with natural chunking (social tags)


Contribution


Main contributions

We study UGC in learning user interests and learning item features

We propose a novel user-oriented collaborative filtering model and a novel item-oriented collaborative filtering model

We propose a unified way to utilize different types of UGC in recommender systems

We expand an existing dataset by crawling new data, and conduct extensive experiments on three real-world datasets, which confirm the effectiveness of the proposed models


Recommendation with UGC in User Side

Topic analysis for items through topic modeling

Terms in UGC are combined to compose the term set W

each item owns an aggregated term list

pLSA/LDA/HDP/nCRP/PAM: any of them works

$\Theta = \{\theta_j\}$, where $\theta_j = (\theta_{j1}, \theta_{j2}, \dots, \theta_{jK})$ is the topic/aspect distribution of document $j$ (i.e., item $j$): this is what we need

User Interest Distribution

Cluster items into groups according to the similarity of their topics (K-Means/GMM/K-Medoid: any of them works)
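The clustering step can be illustrated with a minimal sketch, assuming the topic distributions $\theta_j$ have already been produced by some topic model. The `kmeans` helper, its deterministic initialisation and the toy `theta` values are hypothetical and only for illustration; the slides do not prescribe a particular implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Plain k-means; centroids start at the first k points (an assumption
    made here so that the example is deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each item goes to its nearest centroid.
        assignment = [
            min(range(k), key=lambda q: squared_distance(p, centroids[q]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for q in range(k):
            members = [p for p, a in zip(points, assignment) if a == q]
            if members:
                centroids[q] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical topic distributions theta_j for 4 items over K = 3 topics.
theta = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
]
clusters, centroids = kmeans(theta, k=2)
print(clusters)  # items 0 and 2 share a cluster, as do items 1 and 3
```

Euclidean distance is used here only for brevity; any similarity over topic distributions could take its place.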





User Interest Distribution (cont.)

Intuition: find items with similar topics even though they are in different categories, e.g., clothes, gadgets, books, toys and DVDs that are all about Harry Potter

Aggregate each user’s consumption records on each cluster $C_q$

$Sim(i, l)$: similarity between users $i$ and $l$, measured by PCC, cosine or KL divergence

the weight of $l$ as one of user $i$’s neighbors, where $L(i)$ is the neighbor set of user $i$:

$$e_{il} = \frac{Sim(i, l)}{\sum_{l' \in L(i)} Sim(i, l')}$$
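The steps above (aggregating consumption per cluster, comparing users, normalising neighbor weights) can be sketched as follows. This is a minimal illustration under assumptions: cosine is picked from the three listed similarity options, the neighbor set is passed in explicitly because the slides do not say how it is selected, and all function names and toy data are hypothetical.

```python
import math


def interest_distribution(consumed_items, item_cluster, n_clusters):
    """Share of a user's consumption records falling into each cluster C_q."""
    counts = [0.0] * n_clusters
    for j in consumed_items:
        counts[item_cluster[j]] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def neighbor_weights(i, neighbors, dist):
    """e_il = Sim(i, l) / sum over l' in L(i) of Sim(i, l')."""
    sims = {l: cosine(dist[i], dist[l]) for l in neighbors}
    total = sum(sims.values())
    return {l: (s / total if total > 0 else 0.0) for l, s in sims.items()}


# Hypothetical data: cluster id of each item, and items consumed per user.
item_cluster = [0, 1, 0, 1]
consumption = {0: [0, 2], 1: [0, 1], 2: [1, 3]}
dist = {
    u: interest_distribution(items, item_cluster, 2)
    for u, items in consumption.items()
}
e0 = neighbor_weights(0, [1, 2], dist)
print(e0)  # user 1 overlaps with user 0's interests, user 2 does not
```

By construction the weights of one user's neighbors sum to 1 whenever at least one similarity is positive.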

A novel regularization: user topic regularization (UTR)

$$\min \sum_{i=1}^{M} \left\| U_i - \sum_{l \in L(i)} e_{il} U_l \right\|_F^2$$

Intuition: users with similar interested topics tend to have similar latent features





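A new MF model (UTR-MF)

$$\min_{U,V} L = \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{N} I_{ij} \left( R_{ij} - U_i^T V_j \right)^2 + \frac{\lambda_U}{2} \| U \|_F^2 + \frac{\lambda_V}{2} \| V \|_F^2 + \frac{\alpha}{2} \sum_{i=1}^{M} \left\| U_i - \sum_{l \in L(i)} e_{il} U_l \right\|_F^2$$

Solved by gradient descent / coordinate descent

Gradient Descent

$$\frac{\partial L}{\partial U_i} = \sum_{j=1}^{N} I_{ij} \left( R_{ij} - U_i^T V_j \right) (-V_j) + \lambda_U U_i + \alpha \Big( U_i - \sum_{l \in L(i)} e_{il} U_l \Big) + \alpha \sum_{g \in G(i)} \Big( U_g - \sum_{l' \in L(g)} e_{gl'} U_{l'} \Big) (-e_{gi})$$

$$\frac{\partial L}{\partial V_j} = \sum_{i=1}^{M} I_{ij} \left( R_{ij} - U_i^T V_j \right) (-U_i) + \lambda_V V_j$$

$G(i)$ is the set of users whose neighborhoods include user $i$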
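The gradients above can be turned into a small batch gradient descent loop. The following is a sketch under assumptions, not the authors' implementation: ratings are stored as a dictionary keyed by `(i, j)` so that only observed entries ($I_{ij} = 1$) are visited, the neighbor weights `e[i][l]` are given, and the hyperparameter values, function names and toy data are all hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def utr_residual(U, e, i):
    """U_i minus the e-weighted sum of its neighbors' latent vectors."""
    res = list(U[i])
    for l, w in e.get(i, {}).items():
        res = [r - w * x for r, x in zip(res, U[l])]
    return res


def loss(R, U, V, e, lam_u, lam_v, alpha):
    data = 0.5 * sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in R.items())
    ridge = 0.5 * lam_u * sum(dot(u, u) for u in U)
    ridge += 0.5 * lam_v * sum(dot(v, v) for v in V)
    residuals = [utr_residual(U, e, i) for i in range(len(U))]
    return data + ridge + 0.5 * alpha * sum(dot(r, r) for r in residuals)


def gradients(R, U, V, e, lam_u, lam_v, alpha):
    D = len(U[0])
    gU = [[lam_u * x for x in u] for u in U]
    gV = [[lam_v * x for x in v] for v in V]
    for (i, j), r in R.items():  # data term over observed ratings only
        err = r - dot(U[i], V[j])
        for d in range(D):
            gU[i][d] -= err * V[j][d]
            gV[j][d] -= err * U[i][d]
    for g in range(len(U)):  # UTR term
        res = utr_residual(U, e, g)
        for d in range(D):
            gU[g][d] += alpha * res[d]  # residual of user g itself
            for i, w in e.get(g, {}).items():
                # g belongs to G(i): contributes residual_g * (-e_gi)
                gU[i][d] -= alpha * w * res[d]
    return gU, gV


def train(R, M, N, e, D=2, lam_u=0.01, lam_v=0.01, alpha=0.1,
          lr=0.02, epochs=500, seed=0):
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(D)] for _ in range(M)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(D)] for _ in range(N)]
    history = [loss(R, U, V, e, lam_u, lam_v, alpha)]
    for _ in range(epochs):
        gU, gV = gradients(R, U, V, e, lam_u, lam_v, alpha)
        U = [[x - lr * g for x, g in zip(u, gu)] for u, gu in zip(U, gU)]
        V = [[x - lr * g for x, g in zip(v, gv)] for v, gv in zip(V, gV)]
        history.append(loss(R, U, V, e, lam_u, lam_v, alpha))
    return U, V, history


# Hypothetical toy data: 3 users, 2 items; users 0 and 1 are mutual neighbors.
R = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (2, 1): 1.0}
e = {0: {1: 1.0}, 1: {0: 1.0}, 2: {}}
U, V, history = train(R, M=3, N=2, e=e)
print(history[0], history[-1])
```

The second UTR loop is where $G(i)$ enters: each user $g$ pushes a share of its own residual onto every neighbor $i$ it depends on, weighted by $-e_{gi}$.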


Recommendation with UGC in Item Side

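Intuition for items: similar UGC → similar topic distribution → similar latent features

$Sim(j, h)$: similarity between items $j$ and $h$, measured by PCC, cosine or KL divergence

$$w_{jh} = \frac{Sim(j, h)}{\sum_{h' \in H(j)} Sim(j, h')}$$

A novel regularization: item topic regularization (ITR)

$$\min \sum_{j=1}^{N} \left\| V_j - \sum_{h \in H(j)} w_{jh} V_h \right\|_F^2$$

A new MF model (ITR-MF):

$$\min_{U,V} L = \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{N} I_{ij} \left( R_{ij} - U_i^T V_j \right)^2 + \frac{\lambda_U}{2} \| U \|_F^2 + \frac{\lambda_V}{2} \| V \|_F^2 + \frac{\alpha}{2} \sum_{j=1}^{N} \left\| V_j - \sum_{h \in H(j)} w_{jh} V_h \right\|_F^2$$

A natural combination: UTR + ITR

Solved by gradient descent / coordinate descent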
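The slides mention the UTR + ITR combination without writing out its objective. One plausible form, sketched below, simply adds both regularizers to the MF loss. The separate weight `beta` for the item-side term is an assumption (the slides use a single $\alpha$ in each individual model), and every function name and toy value is hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def topic_penalty(F, weights):
    """Sum over rows k of || F_k - sum_n weights[k][n] * F_n ||^2.

    Works for the user side (F = U, weights = e) and the item side
    (F = V, weights = w) alike.
    """
    total = 0.0
    for k, f in enumerate(F):
        res = list(f)
        for n, wt in weights.get(k, {}).items():
            res = [r - wt * x for r, x in zip(res, F[n])]
        total += dot(res, res)
    return total


def combined_loss(R, U, V, e, w, lam_u, lam_v, alpha, beta):
    """MF loss with both UTR (weight alpha) and ITR (weight beta, assumed)."""
    data = 0.5 * sum((r - dot(U[i], V[j])) ** 2 for (i, j), r in R.items())
    ridge = 0.5 * lam_u * sum(dot(u, u) for u in U)
    ridge += 0.5 * lam_v * sum(dot(v, v) for v in V)
    utr = 0.5 * alpha * topic_penalty(U, e)
    itr = 0.5 * beta * topic_penalty(V, w)
    return data + ridge + utr + itr


# Hypothetical toy check: identical user vectors incur no UTR penalty,
# while orthogonal item vectors that are forced to be neighbors do.
U = [[1.0, 0.0], [1.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
e = {0: {1: 1.0}, 1: {0: 1.0}}
w = {0: {1: 1.0}, 1: {0: 1.0}}
R = {(0, 0): 1.0}
value = combined_loss(R, U, V, e, w, lam_u=0.0, lam_v=0.0, alpha=1.0, beta=0.5)
print(value)
```

Setting `beta = 0` recovers the UTR-MF objective and `alpha = 0` recovers ITR-MF, so the same function covers all three variants.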




Real-world datasets

MovieLens (social tag + rating)

Last.fm (expanded, social tag + rating)

Yelp (review + rating)

Evaluation metrics: RMSE and MAE

Compared baseline models: UserCF, ItemCF, PMF, TF-IDF MF, CTR
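For reference, the two evaluation metrics can be computed as in the short sketch below, using their standard definitions. The ratings and predictions are made-up illustrative values, not results from the experiments.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over paired ratings."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error over paired ratings."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


# Made-up held-out ratings and model predictions, for illustration only.
actual = [4.0, 3.0, 5.0, 1.0]
predicted = [3.5, 3.0, 4.0, 2.0]
print(rmse(actual, predicted))  # 0.75
print(mae(actual, predicted))   # 0.625
```

Lower is better for both; RMSE penalises large individual errors more heavily than MAE.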

In social tag case:




Experimental results (cont.)

UTR-MF and ITR-MF outperform the other baselines in all cases

A detailed example: on the Last.fm dataset, ITR-MF achieves a 14% improvement over PMF and an 8% improvement over CTR

ITR-MF performs better than UTR-MF: a user’s preference is harder to infer, probably because it can change dynamically




Experimental results (cont.)

In the review case, the improvement is similar to that in the social tag case

UTR-MF and ITR-MF outperform the other baselines in all cases

ITR-MF performs better than UTR-MF: a user’s preference is harder to infer

The improvements are significant according to the paired t-test ($p < 0.001$)

For more details, please refer to our paper


Conclusion

We demonstrate that different types of UGC can be integrated into the MF model in a unified way

User preferences and item features can be learned from UGC text

Our two novel regularization terms are effective in modeling user preferences and item features

Our two MF-extended models achieve large improvements (e.g., ITR-MF improves on PMF by 14% on Last.fm)

Future Work

Study other types of UGC, such as tweets and blogs, to learn user preferences and influential events in SNS






Thank you!

Q&A

