Research on Recommender Systems: Beyond Ratings and Lists
DESCRIPTION
Keynote at the Chilean Week of Computer Science (JCC 2014). I give a brief overview of algorithms for recommender systems, and then present my work on tag-based recommendation, implicit feedback, and visual interactive interfaces.
TRANSCRIPT
Research on Recommender Systems: Beyond Ratings and Lists
Denis Parra, Ph.D. in Information Sciences
Assistant Professor, CS Department, School of Engineering, Pontificia Universidad Católica de Chile
Tuesday, November 11th, 2014
Outline
• Personal Introduction
• Quick Overview of Recommender Systems
• My Work on Recommender Systems
  – Tag-Based Recommendation
  – Implicit Feedback (time allowing…)
  – Visual Interactive Interfaces
• Summary & Current & Future Work
Personal Introduction
• I’m from Valdivia! There are many reasons to love Valdivia:
  – The City
  – The Sports
  – The Animals
Personal Introduction
• B.Eng. and professional title of Civil Engineer in Informatics from Universidad Austral de Chile (2004), Valdivia, Chile
• Ph.D. in Information Sciences at University of Pittsburgh (2013), Pittsburgh, PA, USA
INTRODUCTION: Recommender Systems
* Danboard (Danbo): Amazon's cardboard robot; in these slides it represents a recommender system.
Recommender Systems (RecSys)
Systems that help (groups of) people to find relevant items in a crowded item or information space (McNee et al., 2006).
Why do we care about RecSys?
• RecSys have gained popularity because more and more domains and applications require people to make decisions among large sets of items.
A lil’ bit of History
• The first recommender systems were built in the early 1990s (Tapestry, GroupLens, Ringo)
• Online contests, such as the Netflix Prize (2006-2009), drew attention to recommender systems beyond Computer Science
The Recommendation Problem
• The most popular formulation of the recommendation problem is rating prediction:

           Item 1   Item 2   …   Item m
User 1       1        5            4
User 2       5        1            ?
…
User n       2        5            ?

Predict the missing ratings (?)

• How good is my prediction?
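One standard answer, used repeatedly later in this talk, is the Root Mean Squared Error (RMSE) over a held-out set of ratings; as a reminder (my notation, not from the slides):

$$\mathrm{RMSE}=\sqrt{\frac{1}{|T|}\sum_{(u,i)\in T}\left(\hat{r}_{u,i}-r_{u,i}\right)^{2}}$$

where $T$ is the set of test (user, item) pairs, $r_{u,i}$ the observed rating, and $\hat{r}_{u,i}$ the prediction.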
Recommendation Methods
• Without covering all possible methods, the two most typical classifications of recommender algorithms are:

Classification 1:
- Collaborative Filtering
- Content-based Filtering
- Hybrid

Classification 2:
- Memory-based
- Model-based
Collaborative Filtering (User-based KNN)
• Step 1: Finding Similar Users (Pearson Correlation)

[Figure: example rating matrix with the ratings of the active user and users 1-3]
The similarity is computed with the Pearson correlation over the items co-rated by the active user u and each neighbor n (the set $CR_{u,n}$):

$$\mathrm{sim}(u,n)=\frac{\sum_{i\in CR_{u,n}}(r_{u,i}-\bar{r}_u)(r_{n,i}-\bar{r}_n)}{\sqrt{\sum_{i\in CR_{u,n}}(r_{u,i}-\bar{r}_u)^{2}}\,\sqrt{\sum_{i\in CR_{u,n}}(r_{n,i}-\bar{r}_n)^{2}}}$$

For the example matrix this yields:
sim(active user, user_1) = 0.4472136
sim(active user, user_2) = 0.49236596
sim(active user, user_3) = -0.91520863
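To make Step 1 concrete, here is a minimal Python sketch (my own illustration, not code from the talk) of the Pearson similarity between two users' rating dictionaries:

import math

def pearson_sim(ratings_u, ratings_n):
    """Pearson correlation over co-rated items; inputs map item -> rating."""
    common = set(ratings_u) & set(ratings_n)  # the co-rated set CR_{u,n}
    if len(common) < 2:
        return 0.0  # not enough overlap to correlate
    # Means taken over the co-rated items (some variants use overall means).
    mean_u = sum(ratings_u[i] for i in common) / len(common)
    mean_n = sum(ratings_n[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mean_u) * (ratings_n[i] - mean_n) for i in common)
    den = (math.sqrt(sum((ratings_u[i] - mean_u) ** 2 for i in common)) *
           math.sqrt(sum((ratings_n[i] - mean_n) ** 2 for i in common)))
    return num / den if den else 0.0

active = {"item_1": 5, "item_2": 4, "item_3": 4}
neighbor = {"item_1": 4, "item_2": 5, "item_3": 4}
print(pearson_sim(active, neighbor))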
Collaborative Filtering (User-based KNN)
• Step 2: Ranking the items to recommend

[Figure: ratings of the two most similar neighbors on Items 1-3, used to score the items the active user has not rated]
The predicted rating of item i for the active user u is the user's mean rating plus the similarity-weighted, mean-centered ratings of the neighbors:

$$\mathrm{pred}(u,i)=\bar{r}_u+\frac{\sum_{n\in \mathrm{neighbors}(u)}\mathrm{userSim}(u,n)\cdot(r_{n,i}-\bar{r}_n)}{\sum_{n\in \mathrm{neighbors}(u)}\mathrm{userSim}(u,n)}$$
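And a matching sketch of Step 2 (again my own illustration): the neighbors' mean-centered ratings are combined, weighted by their similarity to the active user:

def predict(ratings_u, neighbors, item):
    """pred(u, i) for user-based KNN.
    ratings_u: the active user's dict of item -> rating.
    neighbors: list of (similarity, ratings_dict) for the K nearest users."""
    mean_u = sum(ratings_u.values()) / len(ratings_u)
    num = den = 0.0
    for sim, ratings_n in neighbors:
        if item in ratings_n:
            mean_n = sum(ratings_n.values()) / len(ratings_n)
            num += sim * (ratings_n[item] - mean_n)
            den += abs(sim)  # abs() keeps the weight sum positive
    return mean_u + num / den if den else mean_u

active = {"item_1": 5, "item_2": 4}
neighbors = [(0.45, {"item_1": 4, "item_3": 4}), (0.49, {"item_3": 2})]
print(predict(active, neighbors, "item_3"))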
Pros/Cons of CF
PROS:
• Very simple to implement
• Content-agnostic
• More accurate than other techniques such as content-based filtering
CONS:
• Sparsity
• Cold-start
• New items
Content-Based Filtering
• Can be traced back to techniques from IR, where the user profile represents a query:

user_profile = {w_1, w_2, …, w_n}, with TF-IDF weighting
Doc_1 = {w_1, w_2, …, w_n}
Doc_2 = {w_1, w_2, …, w_n}
Doc_3 = {w_1, w_2, …, w_n}
…
Doc_n = {w_1, w_2, …, w_n}
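A small sketch of this idea (mine, with illustrative data; the talk does not prescribe a library) using scikit-learn's TF-IDF vectorizer, where the user profile is scored against each document by cosine similarity:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "deep learning for image recognition",
    "collaborative filtering with implicit feedback",
    "folksonomies and social tagging systems",
]
user_profile = "social tagging and collaborative filtering"  # liked-item text

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)         # TF-IDF document vectors
profile_vec = vectorizer.transform([user_profile])  # the profile as a query
scores = cosine_similarity(profile_vec, doc_matrix)[0]
# Recommend documents in decreasing order of similarity to the profile
for score, doc in sorted(zip(scores, docs), reverse=True):
    print(round(score, 3), doc)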
PROS/CONS of Content-Based Filtering
PROS:
• New items can be matched without previous feedback
• It can also exploit techniques such as LSA or LDA
• It can use semantic data (ConceptNet, WordNet, etc.)
CONS:
• Less accurate than collaborative filtering
• Tends toward overspecialization
Hybridization
• Combine the previous methods to overcome their weaknesses (Burke, 2002)
C2. Model/Memory Classification
• Memory-based methods use the whole dataset for training and prediction; user-based and item-based CF are examples.
• Model-based methods build a model during training and use only this model at prediction time, which makes prediction much faster and more scalable.
Model-based: Matrix Factorization
• The rating matrix is factorized so that each rating is approximated by the product of a latent vector of the user and a latent vector of the item
• SVD ~ Singular Value Decomposition
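As a hedged sketch of the idea (my own code; real systems use regularized SVD variants tuned far more carefully), a latent-factor model can be trained with stochastic gradient descent on the observed ratings only:

import numpy as np

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.05, epochs=200):
    """ratings: list of (user, item, rating) triples; returns P (users), Q (items)."""
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))  # user latent vectors
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item latent vectors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]                 # error on this observed rating
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

# The toy matrix from the earlier slide: '?' entries are simply left out.
ratings = [(0, 0, 1), (0, 1, 5), (0, 2, 4),
           (1, 0, 5), (1, 1, 1),
           (2, 0, 2), (2, 1, 5)]
P, Q = factorize(ratings, n_users=3, n_items=3)
print(P[1] @ Q[2], P[2] @ Q[2])  # predictions for the missing '?' cells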
PROS/CONS of MF and latent factors model
PROS:
• So far, state of the art in terms of accuracy (these methods won the Netflix Prize)
• Performance-wise, the best option nowadays: slow at training time, O((m+n)^3), compared to correlation, O(m^2 n), but linear at prediction time, O(m+n)
CONS:
• Recommendations are obscure: how do we explain that certain "latent factors" produced the recommendation?
Rethinking the Recommendation Problem
• Ratings are scarce: we need to exploit other sources of user preference
• User-centric recommendation takes the problem beyond ratings and ranked lists: evaluate user engagement and satisfaction, not only RMSE
• There are several other dimensions to consider in the evaluation: novelty of the results, diversity, coverage (user and catalog), serendipity
• Study the effect of interface characteristics: user control, explainability
My Take on RecSys Research
My Work on RecSys
• Traditional RecSys: accurate rating prediction and Top-N recommendation algorithms
• In my research I have contributed to RecSys by:
  – Utilizing other sources of user preference (social tags)
  – Exploiting implicit feedback for recommendation and for mapping it to explicit feedback
  – Studying user-centric evaluation: the effect of user controllability on user satisfaction in interactive interfaces
• And nowadays: studying whether virtual worlds are a good proxy for real-world recommendation tasks
This is not only my work :)
• Dr. Peter Brusilovsky, University of Pittsburgh, PA, USA
• Dr. Alexander Troussov, IBM Dublin and TCD, Ireland
• Dr. Xavier Amatriain, TID / Netflix, CA, USA
• Dr. Christoph Trattner, NTNU, Norway
• Dr. Katrien Verbert, KU Leuven, Belgium
• Dr. Leandro Balby-Marinho, UFCG, Brazil
TAG-BASED RECOMMENDATION
Tag-based Recommendation
• D. Parra, P. Brusilovsky. Improving Collaborative Filtering in Social Tagging Systems for the Recommendation of Scientific Articles. Web Intelligence 2010, Toronto, Canada.
• D. Parra, P. Brusilovsky. Collaborative Filtering for Social Tagging Systems: An Experiment with CiteULike. ACM RecSys 2009, New York, NY, USA.
Motivation
• Ratings are scarce. We need another source of user preference: social tagging systems.

[Figure: a tagging instance connects a User, a Resource, and Tags]
A Folksonomy
• When a user u annotates an item i with one or more tags t1, …, tn, there is a tagging instance.
• The collection of tagging instances produces a folksonomy.
Applying CF over the Folksonomy
• In the first step: calculate user similarity
• In the second step: incorporate the number of raters to rank the items (NwCF)

Traditional CF: Pearson correlation over ratings
Tag-based CF: BM25 over social tags
Tag-based CF
[Figure: the active user's tag profile acts as a query that is matched against the other users' tag profiles (the "documents") using BM25]
Okapi BM25
BM25: we obtain the similarity between users (neighbors) by treating their sets of tags as "documents" and computing an Okapi BM25 (probabilistic IR model) Retrieval Status Value:

$$\mathrm{sim}(u,v)=RSV=\sum_{t\in q} IDF_t\cdot\frac{(k_1+1)\,tf_{td}}{k_1\big((1-b)+b\,(L_d/L_{ave})\big)+tf_{td}}\cdot\frac{(k_3+1)\,tf_{tq}}{k_3+tf_{tq}}$$

where $tf_{tq}$ is the frequency of tag t in the active user's (u) profile, $tf_{td}$ is its frequency in the neighbor's (v) profile, $L_d$ is the length of the neighbor's profile, and $L_{ave}$ the average profile length.

NwCF then adjusts the prediction with the number of raters $nbr(i)$ of item i:

$$\mathrm{pred}'(u,i)=\log_{10}\big(1+nbr(i)\big)\cdot\mathrm{pred}(u,i)$$
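A rough sketch of the BM25 user similarity (my own code; the parameter values and the exact IDF variant are assumptions, not the paper's setup), where each user's tag multiset plays the role of a document:

import math
from collections import Counter

def bm25_sim(query_tags, doc_tags, all_profiles, k1=1.2, k3=1.2, b=0.75):
    """BM25 RSV of a neighbor's tag profile w.r.t. the active user's tags."""
    n = len(all_profiles)
    avg_len = sum(sum(p.values()) for p in all_profiles) / n
    doc_len = sum(doc_tags.values())
    score = 0.0
    for tag, tf_q in query_tags.items():
        df = sum(1 for p in all_profiles if tag in p)  # users employing the tag
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        tf_d = doc_tags.get(tag, 0)
        doc_part = ((k1 + 1) * tf_d /
                    (k1 * ((1 - b) + b * doc_len / avg_len) + tf_d))
        query_part = (k3 + 1) * tf_q / (k3 + tf_q)
        score += idf * doc_part * query_part
    return score

u = Counter({"recsys": 5, "tagging": 3})
v = Counter({"recsys": 2, "ir": 4})
print(bm25_sim(u, v, [u, v]))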
Evaluation
Study dataset (after filtering):
  # users: 784
  # items: 26,599
  # tags: 26,009
  # posts: 71,413
  # annotations: 218,930
  avg # items per user: 91
  avg # users per item: 2.68
  avg # tags per user: 88.02
  avg # users per tag: 2.65
  avg # tags per item: 7.07
  avg # items per tag: 7.23

Phase 2 (full crawl) dataset:
  # users: 5,849
  # articles: 574,907
  # tags: 139,993
  # tagging instances: 2,337,571

Filtering process
• Crawl of CiteULike over 38 days during June-July 2009
Cross-validation
• Test-validation-train splits, 10-fold cross-validation
• Training used to tune the parameter K: the neighborhood size
• One run of the experiment: ~12 hours
Results & Statistical Significance
• BM25 tends to bring in more neighbors, at the cost of more noise (neighbors that are not so similar)
• NwCF helps to decrease noise, so it was natural to combine them and try that option as well

              CCF      NwCF     BM25+CCF  BM25+NwCF
MAP@10        0.12875  0.1432*  0.1876**  0.1942***
K (neigh.)    20       22       21        29
Ucov          81.12%   81.12%   99.23%    99.23%

Significance over the baseline: * p < 0.236, ** p < 0.033, *** p < 0.001
Take-aways
• We can exploit tags as a source for user similarity in recommendation algorithms
• Tag-based (BM25) similarity can be considered an alternative to Pearson correlation for computing user similarity in social tagging systems (STS)
• Incorporating the number of raters helped to decrease the noise produced by items with too few ratings
IMPLICIT FEEDBACK
Work with Xavier Amatriain
Implicit Feedback
• These slides are based on two articles:
  – Parra-Santander, D., & Amatriain, X. (2011). Walk the Talk: Analyzing the Relation between Implicit and Explicit Feedback for Preference Elicitation. Proceedings of UMAP 2011, Girona, Spain.
  – Parra, D., Karatzoglou, A., Amatriain, X., & Yavuz, I. (2011). Implicit Feedback Recommendation via Implicit-to-Explicit Ordinal Logistic Regression Mapping. Proceedings of the CARS Workshop at RecSys 2011, Chicago, IL, USA.
Introduction (1/2)
• Most recommender system approaches rely on explicit information from users, but…
• Explicit feedback is scarce (people are not especially eager to rate or to provide personal info)
• Implicit feedback is less scarce, but (Hu et al., 2008):
  – There is no negative feedback (… and what if you watch a TV program just once or twice?)
  – It is noisy (… but explicit feedback is also noisy (Amatriain et al., 2009))
  – It reflects confidence rather than preference (… we aim to map implicit feedback to preference, our main goal)
  – It lacks evaluation metrics (… if we can map implicit to explicit feedback, we have a comparable evaluation)
Introduction (2/2)
• Is it possible to map implicit behavior to explicit preference (ratings)?
• Which variables best account for the number of times a user listens to online albums? [Baltrunas & Amatriain, CARS workshop at RecSys 2009]
• OUR APPROACH: a study with Last.fm users
  – Part I: Ask users to rate 100 albums (how do we sample them?)
  – Part II: Build a model to map the collected implicit feedback and context to explicit feedback
Walk the Talk (2011)
For each user, we retrieved the albums they listened to during the last 7 days, 3 months, 6 months, 12 months, and overall. For each album in the list we obtained: # user plays (in each period), # global listeners, and # global plays.

Walk the Talk - 2
• Requirements: 18 years old, > 5,000 scrobbles
Quantization of Data for Sampling
• What items should users rate? Items (albums) are sampled along three dimensions, each quantized to a [1-3] scale:
  – Implicit Feedback (IF): the user's playcount for a given album; 3 means more listened to.
  – Global Popularity (GP): the global playcount of a given album over all users; 3 means more listened to.
  – Recentness (R): the time elapsed since the user played a given album; 3 means listened to more recently.
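To illustrate the quantization step (my sketch; the paper's exact binning is not in the slides, so tertiles are an assumption):

import numpy as np

def to_scale_1_3(values):
    """Map raw values to {1, 2, 3} by tertile; 3 = highest third."""
    v = np.asarray(values, dtype=float)
    t1, t2 = np.quantile(v, [1 / 3, 2 / 3])  # tertile boundaries
    return np.where(v <= t1, 1, np.where(v <= t2, 2, 3))

playcounts = [3, 120, 48, 7, 300, 15]
print(to_scale_1_3(playcounts))  # -> [1 3 2 1 3 2]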
4. Regression Analysis
• Including recentness increases R² by more than 10% [M1 → M2]
• Including GP increases R², though not much compared to RE + IF [M1 → M3]
• Not including GP, but including the interaction between IF and RE, improves the variance of the DV explained by the regression model [M2 → M4]

Models:
  M1: implicit feedback
  M2: implicit feedback & recentness
  M3: implicit feedback, recentness, global popularity
  M4: interaction of implicit feedback & recentness
4.1 Regression Analysis
• We tested the conclusions of the regression analysis by predicting the score and checking RMSE under 10-fold cross-validation.
• The results of the regression analysis are supported:

Model                                                    RMSE1    RMSE2
User average                                             1.5308   1.1051
M1: Implicit feedback                                    1.4206   1.0402
M2: Implicit feedback + recentness                       1.4136   1.034
M3: Implicit feedback + recentness + global popularity   1.4130   1.0338
M4: Interaction of implicit feedback * recentness        1.4127   1.0332
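As a sketch of model M4 (my own code on synthetic data; the study used real Last.fm ratings), a linear regression with an interaction term between implicit feedback and recentness:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500
implicit = rng.integers(1, 4, size=n)  # IF on the quantized [1-3] scale
recent = rng.integers(1, 4, size=n)    # recentness on the [1-3] scale
# Synthetic ratings driven by IF, recentness, and their interaction
rating = (1 + 0.5 * implicit + 0.3 * recent
          + 0.2 * implicit * recent + rng.normal(0, 0.5, size=n))

X = np.column_stack([implicit, recent, implicit * recent])
model = LinearRegression().fit(X, rating)
rmse = np.sqrt(np.mean((model.predict(X) - rating) ** 2))
print(model.coef_, rmse)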
Conclusions of Part I
• Using a linear model, implicit feedback and recentness can help to predict explicit feedback (in the form of ratings)
• Global popularity doesn't show a significant improvement in the prediction task
• Our model can help to relate implicit and explicit feedback, making it possible to evaluate and compare explicit- and implicit-feedback recommender systems
Part II: Extension of Walk the Talk
• Implicit Feedback Recommendation via Implicit-to-Explicit OLR Mapping (RecSys 2011, CARS Workshop)
  – Consider ratings as ordinal variables
  – Use mixed models to account for the non-independence of observations
  – Compare with a state-of-the-art implicit-feedback algorithm
Recalling the 1st Study
• Prediction of ratings by multiple linear regression, evaluated with RMSE.
• Results showed that implicit feedback (playcount of the album by a specific user) and recentness (how recently an album was listened to) were important factors, while global popularity had a weaker effect.
• Results also showed that listening style (whether the user preferred to listen to single tracks, full CDs, or either) was an important factor, while the other user variables were not.
... but
• Linear regression didn't account for the nested nature of ratings: each user contributes a whole group of ratings
• And ratings were treated as continuous, when they are actually ordinal

[Figure: example rating sequences nested within User 1 … User n]
So, Ordinal Logistic Regression!
• Actually, mixed-effects ordinal multinomial logistic regression
• Mixed effects: account for the nested nature of ratings
• We obtain a distribution over ratings (ordinal multinomial) for each (USER, ITEM) pair → we predict the rating using its expected value
• … and we can compare the inferred ratings with a method that directly uses implicit information (playcounts) to recommend (Hu, Koren et al., 2008)
Ordinal Regression for Mapping
• Model: a cumulative-logit mixed-effects regression
• Predicted value: the expected rating under the fitted distribution
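The slide's equations do not survive in the transcript; a standard cumulative-logit mixed-effects formulation consistent with the description above (my reconstruction; the thresholds $\theta_k$ and the per-user random effect $b_u$ are assumed notation) is:

$$\log\frac{P(r_{ui}\le k)}{1-P(r_{ui}\le k)}=\theta_k-\left(\mathbf{x}_{ui}^{\top}\boldsymbol{\beta}+b_u\right),\qquad k=1,\dots,K-1$$

and the predicted rating is the expected value:

$$\hat{r}_{ui}=\sum_{k=1}^{K}k\cdot P(r_{ui}=k)$$

where $\mathbf{x}_{ui}$ collects the implicit-feedback covariates (IF, recentness, etc.).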
Datasets
• D1: users, albums, IF, RE, GP, ratings, demographics/consumption
• D2: users, albums, IF, RE, GP, NO RATINGS
Results
[Figure: evaluation results comparing the OLR mapping against the implicit-feedback baseline]
Conclusions & Current Work
Problems / Challenges:
1. Ground truth: how many playcounts amount to relevance? → a sensitivity analysis is needed
2. Quantization of playcounts (implicit feedback), recentness, and overall number of listeners of an album (global popularity): [1-3] scale vs. raw playcounts → modify and compare
3. Additional/alternative metrics for evaluation (MAP and nDCG were used in the paper)
VISUALIZATION + USER CONTROLLABILITY
Part of this work is joint with Katrien Verbert
Visualization & User Controllability
• Motivation: Can user controllability and explainability improve user engagement and satisfaction with a recommender system?
• Specific research question: How can intersections of contexts of relevance (of recommendation algorithms) be better represented to improve the user experience with the recommender?
The Concept of Controllability
• MovieLens: an example of a traditional recommender list
Research Platform
• The studies were conducted using Conference Navigator, a conference support system: http://halley.exp.sis.pitt.edu/cn3/
• Our goal was recommending conference talks
• Main sections: Program, Proceedings, Author List, Recommendations
Hybrid RecSys: Visualizing Intersections
• Clustermap vs. Venn diagram
TalkExplorer – IUI 2013
• Adaptation of the Aduna visualization to Conference Navigator
• Main research question: Does fusion (intersection) of contexts of relevance improve the user experience?
TalkExplorer - I
• Entities: tags, recommender agents, users
TalkExplorer - II
• Canvas area: intersections of different entities

[Figure: canvas showing recommender and user entities, clusters of talks associated with a single entity, and clusters formed by the intersection of several entities]
TalkExplorer - III
• Items: the talks explored by the user
Our Assumptions
• Items which are relevant in more than one aspect could be more valuable to users
• Displaying multiple aspects of relevance visually is important for users during item exploration
TalkExplorer Studies I & II
• Study I
  – Controlled experiment: users were asked to discover relevant talks by exploring the three types of entities: tags, recommender agents, and users.
  – Conducted at Hypertext and UMAP 2012 (21 users)
  – Subjects familiar with visualizations and RecSys
• Study II
  – Field study: users were left free to explore the interface.
  – Conducted at LAK 2012 and ECTEL 2013 (18 users)
  – Subjects familiar with visualizations, but not much with RecSys
Evaluation: Intersections & Effectiveness
• What do we call an "intersection"? A cluster of talks associated with more than one entity.
• We used the number of explorations of intersections and their effectiveness, defined as:

Effectiveness = |bookmarked items| / |intersections explored|

For example, a user who explored 10 intersections and bookmarked 3 items from them reaches an effectiveness of 0.3.
Results of Studies I & II
• Effectiveness increases with intersections of more entities
• Effectiveness wasn't affected in the field study (Study II)
• … but the distribution of explorations was affected
SETFUSION: VENN DIAGRAM FOR USER-CONTROLLABLE INTERFACE
SetFusion – IUI 2014
SetFusion I
• Traditional ranked list: papers sorted by relevance, combining 3 recommendation approaches
SetFusion - II
• Sliders allow the user to control the importance of each data source or recommendation method
• An interactive Venn diagram allows the user to inspect and filter the recommended papers. Available actions:
  – Filter the item list by clicking on an area
  – Highlight a paper by hovering over a circle
  – Scroll to a paper by clicking on a circle
  – Indicate bookmarked papers
SetFusion – UMAP 2012
• Field study: users were left free to explore the interface
  – ~50% (50 users) tried the SetFusion recommender
  – 28% (14 users) bookmarked at least one paper
  – On average, users explored 14.9 talks and bookmarked 7.36 talks

Distribution of bookmarks per method or combination of methods:
  A: 15 (16%)   AB: 7 (7%)   ABC: 9 (9%)   AC: 26 (27%)
  B: 18 (19%)   BC: 4 (4%)   C: 17 (18%)
TalkExplorer vs. SetFusion
• Comparing distributions of explorations
• In Studies I and II with TalkExplorer we observed an important change in the distribution of explorations.
TalkExplorer vs. SetFusion
• Comparing distributions of explorations in the field studies:
  – In TalkExplorer, 84% of the explorations of intersections were performed over clusters of 1 item
  – In SetFusion it was only 52%, compared to 48% (18% + 30%) over multiple intersections; the difference is not statistically significant
Summary & Conclusions
• We presented two implementations of visual interactive interfaces that tackle exploration in a recommendation setting
• We showed that intersections of several contexts of relevance help users discover relevant items
• The visual paradigm used can have a strong effect on user behavior: we need to keep working on visual representations that promote exploration without increasing the cognitive load on users
Limitations & Future Work
• Apply our approach to other domains (fusion of data sources or recommendation algorithms)
• For SetFusion, find alternatives to scale the approach to more than 3 sets; potential alternatives: clustering and radial sets
• Consider other factors that interact with user satisfaction: controllability by itself vs. a minimum level of accuracy
More Details on SetFusion?
• Effect of other variables: gender, age, experience in the domain, or familiarity with the system
• Check our upcoming IJHCS paper, "User-controllable Personalization: A Case Study with SetFusion": a controlled laboratory study of SetFusion versus a traditional ranked list
CONCLUSIONS (& CURRENT) & FUTURE WORK
Challenges in Recommender Systems
• Recommendation to groups
• Cross-domain recommendation
• User-centric evaluation
• Interactive interfaces and visualization
• Improved evaluation for fair comparison (P. Campos of U. of Bio-Bio on doing fair evaluations considering time)
• ML: active learning, multi-armed bandits (exploration/exploitation)
• Preventing the "filter bubble"
• Making algorithms resistant to attacks
Are Virtual Worlds Good Proxies for the Real World?
• Why? We have a Second Life dataset with 3 connected dimensions of information: Social Network, Marketplace, Virtual World
• 2 ongoing projects: Entrepreneurship and LBSN
Entrepreneurship
• Can we predict whether a user will create a store, and how successful he or she will be? The literature in this area is extremely scarce.
• Data: Social Network + Marketplace
• Collaborators: James Gaskin (SEM, causal models; BYU, USA), Stephen Zhang (Entrepreneurship; PUC Chile), Christoph Trattner (Social Networks; NTNU, Norway)
Location-Based Social Networks (LBSN)
• How similar are the patterns of mobility in the real world and in the virtual world?
• Data: Social Network + Virtual World
• Collaborators: Christoph Trattner (Social Networks; NTNU, Norway), Leandro Balby-Marinho (LBSN and RecSys; UFCG, Brazil)
Other RecSys Activities
• I am part of the Program Committee of the 2015 RecSys Challenge. Don't miss it!
  – Is the user going to buy items in this session? Yes | No
  – If yes, which items are going to be bought?
• I am also part of the team creating the upcoming RecSys Forum (like SIGIR Forum). Coming soon! (Alan Said, Cataldo Musto, Alejandro Bellogín, etc.)
THANKS! [email protected]