DCAI 2010 presentation

DCAI 2010, September 7-10 2010, Valencia A Recommendation System for the Semantic Web Victor Codina and Luigi Ceccaroni [email protected] Departament de Llenguatges i Sistemes Informàtics (LSI) Universitat Politècnica de Catalunya (UPC)

Page 1: Presentacion Dcai 2010

DCAI 2010, September 7-10 2010, Valencia

A Recommendation System for the Semantic Web

Victor Codina and Luigi Ceccaroni

[email protected]

Departament de Llenguatges i Sistemes Informàtics (LSI)

Universitat Politècnica de Catalunya (UPC)

Page 2: Presentacion Dcai 2010

Introduction Our semantic approach Evaluation Conclusions

Outline

Introduction & motivations

Our semantic approach

Evaluation

Conclusions & future work

Page 3: Presentacion Dcai 2010

The general personalization process

[Diagram: USERS provide explicit feedback and implicit feedback (user behavior). USER MODELING: a learning algorithm builds the user profile. CONTENT ADAPTATION: a recommendation strategy combines the user profile with the item representation of the ITEMS to produce a personalized recommendation, which in turn affects user satisfaction.]

Page 4: Presentacion Dcai 2010

Potential benefits of using semantics

Using semantics helps reduce some limitations of current recommenders:

o Cold-start problem

• By inferring missing information from the relationships defined in domain ontologies

o Domain-dependency

• By employing standard ontology-based languages to uniformly represent information

Page 5: Presentacion Dcai 2010

Service-oriented architecture design

[Figure not recoverable from the transcript]

Page 6: Presentacion Dcai 2010

User's interests and item representation

Ontology-based representation (weighted overlay)

[Figure labels: concept taxonomies; weighted user's interest; weighted item annotation]
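One way to picture the weighted overlay is as two mappings from taxonomy concepts to weights, one for the user's interests and one for the item's annotations. The sketch below is an illustration only: the toy taxonomy, the weights, and the names `PARENT`, `Overlay` and `ancestors` are assumptions, not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical toy taxonomy (child concept -> parent concept).
PARENT: Dict[str, str] = {
    "Romance": "Genre",
    "Comedy": "Genre",
    "Steamy Romance": "Romance",
}


def ancestors(concept: str) -> List[str]:
    """Return the ancestors of a concept, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


@dataclass
class Overlay:
    """Weights attached to taxonomy concepts (user interests or item annotations)."""
    weights: Dict[str, float] = field(default_factory=dict)

    def set_weight(self, concept: str, weight: float) -> None:
        known = set(PARENT) | set(PARENT.values())
        if concept not in known:
            raise KeyError(f"{concept!r} is not in the taxonomy")
        self.weights[concept] = weight


# The same structure serves both sides of the matching.
user_profile = Overlay()
user_profile.set_weight("Romance", 0.8)      # assumed interest weight

item = Overlay()
item.set_weight("Steamy Romance", 1.0)       # assumed annotation weight
```

Because both sides refer to concepts of the same taxonomy, an item annotation can be related to a user's interest in one of its ancestors even when the two concepts differ.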

Page 7: Presentacion Dcai 2010

How do we take advantage of semantics?

We incorporate semantics in both stages of the personalization process to reduce the cold-start problem

o The user-profile learning algorithm employs a domain-based inference method

• It expands and enriches the user profiles with interests that cannot be directly inferred from the user feedback

o The content-based recommendation algorithm employs a taxonomy-based similarity method

• It uses the user's interests in more general concepts related to the item's annotations to refine the matching calculation

Page 8: Presentacion Dcai 2010

Semantically-enhanced learning algorithm

START. The user provides some feedback about an item (e.g. a purchase or a rating of an item)

Step 1. Interest weights of the concepts related to the item are calculated/updated

Step 2. A domain-based inference method infers new interests from the families of concepts with updated interests

[Figure labels: User, Item; interests marked as Updated, Learnt, Inferred]
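The slide does not say how the interest weights are calculated in Step 1. As a purely hypothetical illustration, the sketch below maps a 1 to 5 rating onto an interest in [-1, 1] and keeps a running mean per concept; both the mapping and the update rule are assumptions, as are the names `rating_to_interest` and `update_profile`.

```python
from typing import Dict, Iterable


def rating_to_interest(rating: float, lo: float = 1.0, hi: float = 5.0) -> float:
    """Map a rating in [lo, hi] linearly onto an interest in [-1, 1] (assumed scale)."""
    mid = (lo + hi) / 2
    return (rating - mid) / (hi - mid)


def update_profile(
    profile: Dict[str, float],
    counts: Dict[str, int],
    item_concepts: Iterable[str],
    rating: float,
) -> None:
    """Step 1 (assumed rule): running mean of interest for each concept of the rated item."""
    interest = rating_to_interest(rating)
    for concept in item_concepts:
        n = counts.get(concept, 0)
        profile[concept] = (profile.get(concept, 0.0) * n + interest) / (n + 1)
        counts[concept] = n + 1


profile: Dict[str, float] = {}
counts: Dict[str, int] = {}
update_profile(profile, counts, ["Romance"], rating=5)   # strong like
update_profile(profile, counts, ["Romance"], rating=3)   # neutral
```

After these two ratings the stored interest for "Romance" is the mean of 1.0 and 0.0.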

Page 9: Presentacion Dcai 2010

The domain-based inference method

Based on the minimum percentage of direct subconcepts with a known interest weight

Two types of propagation

o Upward-based (propagation to the parent concept)

o Sideward-based (propagation to the siblings)

[Figure: concept Sport with direct subconcepts Baseball, Basketball, Football, Tennis, Golf. Weights shown: [1.0] [1.0] [-0.5] [0.5] on four subconcepts, [ ? ] on the fifth, and a further [0.5] whose position is not recoverable from the transcript]

Upward-based threshold (UIT) = 0.6

Sideward-based threshold (SIT) = 0.9

Upward-based? Pct(subconcepts) = 4/5 = 0.8; 0.8 > UIT = 0.6 => Propagation

Sideward-based? Pct(subconcepts) = 4/5 = 0.8; 0.8 < SIT = 0.9 => No propagation
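The threshold check in the Sport example can be sketched as follows. The thresholds and the four known weights come from the slide; which sport carries which weight, the use of the mean as the propagated value, and the function name `infer_interests` are assumptions made for illustration.

```python
from statistics import mean
from typing import Dict, List


def infer_interests(
    parent: str,
    children: List[str],
    weights: Dict[str, float],
    uit: float,
    sit: float,
) -> Dict[str, float]:
    """Return newly inferred interests for one family of concepts."""
    known = [weights[c] for c in children if c in weights]
    if not known:
        return {}
    pct = len(known) / len(children)   # share of subconcepts with a known interest
    value = mean(known)                # assumed aggregation rule
    inferred: Dict[str, float] = {}
    if pct > uit and parent not in weights:
        inferred[parent] = value       # upward-based propagation
    if pct > sit:
        for child in children:
            if child not in weights:
                inferred[child] = value  # sideward-based propagation
    return inferred


children = ["Baseball", "Basketball", "Football", "Tennis", "Golf"]
# Assumed assignment of the four weights shown on the slide.
weights = {"Baseball": 1.0, "Basketball": 1.0, "Football": -0.5, "Tennis": 0.5}

result = infer_interests("Sport", children, weights, uit=0.6, sit=0.9)
```

With 4 of 5 subconcepts known, the share is 0.8: above UIT, so the parent receives an inferred interest, and below SIT, so the remaining sibling stays unknown.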

Page 10: Presentacion Dcai 2010

Semantically-enhanced content-based filtering

START. The system has to predict whether the user will like or dislike an item

FOR EACH item's annotation DO:

STEP 1. The conceptScore is calculated based on:

• The interest degree of the user's interests that match the item's annotation

• The semantic similarity of the matchings (perfect or partial match)

END FOR

STEP 2. The itemScore is calculated as the weighted average of the conceptScore values according to their relevance

[Figure labels: User, Item, concepts C1 and C2, one perfect match and two partial matches]
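The two steps can be sketched as below. The slide does not give the exact formula for conceptScore, so this sketch assumes it is the interest weight of the closest matching concept (the annotation itself, or else its nearest ancestor in the user's profile) multiplied by the similarity of that match. The names, the toy data and the fixed partial-match similarity of 0.6 are illustrative assumptions.

```python
from typing import Callable, Dict, List, Optional


def concept_score(
    annotation: str,
    profile: Dict[str, float],
    ancestors: Dict[str, List[str]],
    similarity: Callable[[str, str], float],
) -> Optional[float]:
    """STEP 1 (assumed rule): interest of the closest match times its similarity."""
    for concept in [annotation] + ancestors.get(annotation, []):
        if concept in profile:
            return profile[concept] * similarity(annotation, concept)
    return None  # no perfect or partial match


def item_score(
    annotations: Dict[str, float],
    profile: Dict[str, float],
    ancestors: Dict[str, List[str]],
    similarity: Callable[[str, str], float],
) -> Optional[float]:
    """STEP 2: average of the concept scores, weighted by annotation relevance."""
    total = 0.0
    weight_sum = 0.0
    for annotation, relevance in annotations.items():
        score = concept_score(annotation, profile, ancestors, similarity)
        if score is None:
            continue
        total += relevance * score
        weight_sum += relevance
    return total / weight_sum if weight_sum else None


# Toy data (assumed): annotation -> relevance, concept -> interest.
annotations = {"Steamy Romance": 1.0, "Comedy": 0.5}
profile = {"Romance": 0.8, "Comedy": 0.5}
ancestors = {"Steamy Romance": ["Romance", "Genre"], "Comedy": ["Genre"]}


def similarity(annotation: str, concept: str) -> float:
    # Perfect match scores 1.0; any partial match is fixed at 0.6 in this sketch.
    return 1.0 if annotation == concept else 0.6


score = item_score(annotations, profile, ancestors, similarity)
```

Here "Comedy" is a perfect match and "Steamy Romance" is a partial match through "Romance", so both contribute to the item score in proportion to their relevance.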

Page 11: Presentacion Dcai 2010

The taxonomy-based similarity method

Based on the distance, in terms of taxonomy levels, between

o The item's annotation

o The user's interest (an ancestor of the item's annotation)

Weighted semantic distance among levels using a K factor

[Figure: two taxonomy fragments with levels counted from the source (Level 1). Genre > Romance (user interest) > Steamy Romance (item annotation, Level 3): distance = 1, K3 = 0.4, SIM = 0.6. Sport > Extreme (user interest) > Climbing (item annotation, Level 4): distance = 1, K4 = 0.3, SIM = 0.7.]
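The two examples on this slide are consistent with a similarity of one minus the K factor of each level crossed between the user's interest and the item's annotation. The sketch below assumes that reading; the summation over several levels, the value of K2, and the function name `taxonomy_similarity` are assumptions, while K3 = 0.4 and K4 = 0.3 are the values shown on the slide.

```python
from typing import Dict

# K factor per taxonomy level; K2 is a hypothetical value, K3 and K4 are from the slide.
K: Dict[int, float] = {2: 0.5, 3: 0.4, 4: 0.3}


def taxonomy_similarity(annotation_level: int, interest_level: int, k: Dict[int, float]) -> float:
    """Similarity between an item annotation and a user interest that is its ancestor.

    Each level crossed on the way up from the annotation costs that level's K factor.
    """
    if interest_level > annotation_level:
        raise ValueError("the user's interest must be an ancestor of the annotation")
    penalty = sum(k[level] for level in range(interest_level + 1, annotation_level + 1))
    return max(0.0, 1.0 - penalty)


sim_romance = taxonomy_similarity(annotation_level=3, interest_level=2, k=K)   # Steamy Romance vs Romance
sim_climbing = taxonomy_similarity(annotation_level=4, interest_level=3, k=K)  # Climbing vs Extreme
```

Under this reading, a one-level step costs less deeper in the taxonomy, so very specific concepts stay closer to their parents than general ones do.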

Page 12: Presentacion Dcai 2010

Experimental dataset

Netflix-prize movie dataset

o 480,000 users

o 17,700 movies

o 100M user ratings ranging between 1 and 5

Movie taxonomy used by Netflix for annotating movies

o 1 global hierarchy of concepts describing the movies

o 3 levels of depth

o 550 nodes (item’s annotations)

RMSE metric

o Measures the error on rating prediction for a set of users
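For reference, RMSE is the square root of the mean squared difference between predicted and actual ratings. A minimal sketch follows; the function name `rmse` and the sample values are illustrative only.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


error = rmse([1, 5], [3, 3])   # each prediction is off by 2 rating points
```

Lower values are better, and because errors are squared before averaging, large mistakes on individual ratings weigh more than many small ones.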

Page 13: Presentacion Dcai 2010

Experimental evaluation

Exp. 1: Traditional vs semantic approach

o GOAL. To evaluate the improvement in accuracy when the semantics-based methods are employed

• Is the cold-start problem reduced?

Exp. 2: Semantic approach on two different taxonomies

o GOAL. To analyze whether the hierarchical structure of the taxonomy affects the effectiveness of the semantics-based methods

• How does the taxonomy structure affect their performance?

Page 14: Presentacion Dcai 2010

Exp.1: Traditional vs Semantic approach

Experiment setup

o The error of two algorithm configurations is compared

• CB configuration (traditional CB approach)

• SEM-CB configuration (semantically-enhanced CB approach)

Config.   User profile representation   Interest-prediction method        Item-user matching
CB        Keyword-based profile         Rating-based                      Perfect matches
SEM-CB    Ontology-based profile        Rating-based + domain inference   Perfect + partial matches (semantic similarity)

Page 15: Presentacion Dcai 2010

Exp.1: Traditional vs Semantic approach

Overall prediction results:

[Bar chart: RMSE (y-axis, ticks from 1.025 to 1.065) for the CB and SEM-CB configurations; bar values not recoverable from the transcript]

Page 16: Presentacion Dcai 2010

Exp.1: Traditional vs Semantic approach

Prediction results grouped by user-profile size (number of ratings)

Each interval contains nearly 2% of the predictions of the Netflix test set

Page 17: Presentacion Dcai 2010

Exp.1: Traditional vs Semantic approach

Comparison of RMSE based on user-profile size

The improvement is larger for users with small profiles (the cold-start users)

Page 18: Presentacion Dcai 2010

Exp.2: Semantic approach on different taxonomies

Experiment setup

o Two semantics-based configurations are compared on different versions of the movie taxonomy:

• SEM-CB configuration (employs the original taxonomy)

• SEM-CB+ configuration (employs an alternative version)

Taxonomy properties

Config.   No. of nodes   No. of levels   No. of hierarchies   Avg. nodes per family
SEM-CB    550            3               1                    14
SEM-CB+   550            4               4                    7

Page 19: Presentacion Dcai 2010

Exp.2: Semantic approach on different taxonomies

Results:

[Slide content not recoverable from the transcript; surviving labels: "Parameter settings of semantics-based algorithms", "Optimal execution", "Same accuracy"]

Page 20: Presentacion Dcai 2010

Conclusions and Future work

Main conclusions

o The cold-start problem is reduced by exploiting semantics

o The incorporation of semantics in a traditional CB approach

o The recommender is made domain-independent by combining

• A service-oriented architecture design

• Standard ontology-based languages (FOAF, OWL)

Future work

o Further experimentation

• In richer domains and with other semantic methods

o The incorporation of semantics into other approaches

• e.g. Collaborative Filtering and Hybrid systems

Page 22: Presentacion Dcai 2010

Exp.1: Traditional vs Semantically-enhanced

Comparison of overall accuracy results:

[Chart: RMSE (y-axis, ticks from 0.88 to 1.08); plotted series not recoverable from the transcript]