Recommender systems evaluation: a 3D benchmark. Presented at the RUE 2012 workshop at ACM RecSys 2012

Alan Said (1), Domonkos Tikk (2), Yue Shi (3), Martha Larson (3), Klára Stumpf (2), Paolo Cremonesi (4)

1: TU Berlin; 2: Gravity R&D; 3: TU Delft; 4: Politecnico di Milano/Moviri

Motivation

• Current recsys evaluation benchmarks are insufficient
  – mostly focused on IR measures (RMSE, MAP@X, precision/recall)
  – do not consider the needs of all stakeholders (users, content provider, recsys vendor)
  – technological and business requirements are mostly overlooked

• 3D Recommender System Benchmarking Model
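
For readers less familiar with the measures named above, here is a minimal Python sketch of RMSE and precision/recall at a cutoff. The function names and the toy data are illustrative assumptions, not part of the slides or of any benchmark's official scoring code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and observed ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty rating lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def precision_recall_at_k(ranked_items, relevant_items, k):
    """Precision and recall of the top-k items of a ranked list.

    Assumes ranked_items contains no duplicates and k >= 1.
    """
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    precision = hits / k
    recall = hits / len(relevant_items) if relevant_items else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data, purely hypothetical.
    print(rmse([3.0, 4.0], [3.0, 2.0]))
    print(precision_recall_at_k(["a", "b", "c", "d"], {"a", "d", "x"}, k=2))
```

Both measures only look at prediction accuracy, which is the narrow view the slide argues against: neither says anything about business value or technical feasibility.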

Stakeholders

users

content or service provider

recommender

The Proposed 3D model

Recent benchmarks (1)

• pros:
  – large scale
  – very well organized

• cons:
  – qualitative assessment of recommendation: simplified to RMSE
  – rating prediction (not ranking)
  – no focus on direct business and technical parameters (scalability, robustness, reactivity)

Recent benchmarks (2)

• pros:
  – constraints on training and response time
  – real traffic (only planned)
  – major driver: revenue increase

• cons:
  – only business goals, but otherwise unclear optimization criteria
  – user needs are neglected
  – organization

Recent benchmarks (3)

• pros:
  – availability of additional metadata (compared to KDD Cup 2011)
  – not rating based (implicit feedback)
  – ranking-based evaluation metric (MAP@500)

• cons:
  – offline evaluation
  – size does not matter anymore (lower interest)
  – no business requirements or technical constraints
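
As an illustration of a ranking-based metric such as MAP@500, the sketch below computes mean average precision at a cutoff k. It is a generic textbook-style version under stated assumptions: average precision is normalized by min(number of relevant items, k), which is one common convention and not necessarily the one used by the benchmark on this slide. All names and data are hypothetical.

```python
def average_precision_at_k(ranked_items, relevant_items, k):
    """Average precision of one ranked list, truncated at k.

    Assumes ranked_items has no duplicates. Normalizing by
    min(len(relevant_items), k) is an assumption of this sketch.
    """
    if not relevant_items:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item in relevant_items:
            hits += 1
            score += hits / rank  # precision at this rank
    return score / min(len(relevant_items), k)


def map_at_k(rankings, relevants, k):
    """Mean of average_precision_at_k over all users in `rankings`.

    rankings:  dict user -> ranked list of item ids
    relevants: dict user -> set of relevant item ids
    """
    if not rankings:
        return 0.0
    total = sum(
        average_precision_at_k(items, relevants.get(user, set()), k)
        for user, items in rankings.items()
    )
    return total / len(rankings)


if __name__ == "__main__":
    rankings = {"u1": ["a", "b", "c"], "u2": ["x", "y"]}
    relevants = {"u1": {"a", "c"}, "u2": {"y"}}
    print(map_at_k(rankings, relevants, k=3))
```

Unlike RMSE, this rewards placing relevant items near the top of the list, but it is still an offline accuracy measure and leaves the business and technical dimensions unmeasured.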

3D MODEL

User requirements

• functional (quality-related)
  – relevant, interesting, novel, diverse, serendipitous, context-aware, ethical, etc.

• non-functional (technology-related)
  – real-time
  – usability-related

Business requirements

• Business model
  – for-profit: revenue stream
  – NP-style: award driven (reputation, community building)

• KPI depends on the application area
  – revenue increase
  – CTR
  – raising awareness of content or service

Technical constraints

• data driven
  – availability of user feedback (e.g. satellite TV)

• system driven
  – hardware/software limitations (device-dependent)

• scalability
  – typical response time

• robustness
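
One way to make the "typical response time" constraint measurable is to time a batch of recommendation requests and report a high percentile rather than the mean. The sketch below assumes a hypothetical `recommend` callable standing in for the system under test, and uses the nearest-rank percentile definition; neither comes from the slides.

```python
import math
import time


def percentile(values, q):
    """Nearest-rank percentile of `values`, for q in (0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def measure_latencies(recommend, requests):
    """Wall-clock seconds taken by `recommend` for each request."""
    latencies = []
    for request in requests:
        start = time.perf_counter()
        recommend(request)
        latencies.append(time.perf_counter() - start)
    return latencies


if __name__ == "__main__":
    # Hypothetical stand-in for a real recommender call.
    def recommend(user_id):
        return sorted(range(1000), key=lambda item: (item * user_id) % 97)[:10]

    latencies = measure_latencies(recommend, range(1, 201))
    print("median:", percentile(latencies, 50))
    print("95th percentile:", percentile(latencies, 95))
```

Reporting the 95th percentile alongside the median shows how the system behaves for its slowest requests, which is usually what a response-time requirement is about.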

Example

• VoD recommendation scenario (TV)
  – user: easy content exploration, context-awareness (time, viewer identification)
  – business: increase VoD sales & awareness (user base)
  – technical: middleware, HW/SW of the provider, response time

Conclusion

• Recommendation tasks have many aspects that are typically overlooked

• Tasks define the important user, business, and technical quality measures
  – the fulfilment of all is required at a certain level
  – trade-off is usually required

• Proposal: our 3D evaluation concept enables a more comprehensive evaluation
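
The slides do not define a scoring formula, so the following is only a hypothetical sketch of how the two sub-points could be operationalized: each dimension gets a score in [0, 1], every dimension must reach a task-specific minimum ("required at a certain level"), and a weighted average expresses the trade-off. The class, function names, weights, and thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass

DIMENSIONS = ("user", "business", "technical")


@dataclass
class Scores3D:
    """One value per dimension; scores are assumed normalized to [0, 1]."""
    user: float
    business: float
    technical: float


def evaluate_3d(scores, minimums, weights):
    """Check per-dimension minimums and compute a weighted overall score.

    This is an illustrative aggregation, not the authors' method.
    """
    failed = [d for d in DIMENSIONS if getattr(scores, d) < getattr(minimums, d)]
    total_weight = sum(getattr(weights, d) for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    overall = sum(
        getattr(scores, d) * getattr(weights, d) for d in DIMENSIONS
    ) / total_weight
    return {"acceptable": not failed, "failed": failed, "overall": overall}


if __name__ == "__main__":
    # Hypothetical numbers for a VoD-like task where response time matters most.
    result = evaluate_3d(
        scores=Scores3D(user=0.8, business=0.6, technical=0.9),
        minimums=Scores3D(user=0.5, business=0.5, technical=0.7),
        weights=Scores3D(user=1.0, business=1.0, technical=2.0),
    )
    print(result)
```

Keeping the minimum check separate from the weighted score means a system cannot compensate for failing one dimension (for example, missing the response-time limit) by excelling in another.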