
DCAI 2010, September 7-10 2010, Valencia

A Recommendation System for the Semantic Web

Victor Codina and Luigi Ceccaroni

[email protected]

Departament de Llenguatges i Sistemes Informàtics (LSI)

Universitat Politècnica de Catalunya (UPC)



Outline


Introduction & motivations

Our semantic approach

Evaluation

Conclusions & future work



The general personalization process

[Figure: the general personalization process, in two stages. User modeling: users, user behavior, implicit feedback, explicit feedback, learning algorithm, user profile. Content adaptation: items, item representation, recommendation strategy, personalized recommendation, user satisfaction.]


Potential benefits of using semantics


Semantics can help overcome some limitations of current recommenders:

o Cold-start problem

• By inferring missing information through the relationships in domain ontologies

o Domain-dependency

• By employing standard ontology-based languages to uniformly represent information



Service-oriented architecture design



User’s interests and item representation

Ontology-based representation (weighted overlay)

[Figure: concept taxonomies overlaid with weighted user interests and weighted item annotations]
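A weighted overlay can be sketched as a taxonomy plus a concept-to-weight map laid over it, one for the user's interests and one for the item's annotations. The class names `Taxonomy` and `WeightedOverlay` and all weight values below are hypothetical; only the concept names come from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    # child concept -> parent concept; a root concept maps to None
    parent: dict[str, str | None]

    def ancestors(self, concept: str) -> list[str]:
        # Walk up the hierarchy, nearest ancestor first
        chain: list[str] = []
        node = self.parent.get(concept)
        while node is not None:
            chain.append(node)
            node = self.parent.get(node)
        return chain

    def level(self, concept: str) -> int:
        # Roots are at level 1, their direct subconcepts at level 2, and so on
        return len(self.ancestors(concept)) + 1


@dataclass
class WeightedOverlay:
    # concept -> weight; the same structure serves for user interests and item annotations
    weights: dict[str, float] = field(default_factory=dict)


# Hypothetical example: concept names come from the slides, the weights are made up
movies = Taxonomy({"Genre": None, "Romance": "Genre", "Steamy Romance": "Romance"})
user_interests = WeightedOverlay({"Romance": 0.8})
item_annotations = WeightedOverlay({"Steamy Romance": 1.0})
```

Keeping both overlays on the same taxonomy is what lets a user interest in Romance be related to an item annotated only with Steamy Romance.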


How do we take advantage of semantics?


We incorporate semantics in both stages of the personalization process to reduce the cold-start problem

o The user-profile learning algorithm employs a domain-based inference method

• It expands and enriches the user profiles with interests that cannot be directly inferred from the user feedback

o The content-based recommendation algorithm employs a taxonomy-based similarity method

• It uses the user’s interests in more general concepts related to the item’s annotations to refine the matching calculation



Semantically-enhanced learning algorithm

START. The user provides some feedback about an item (e.g. a purchase or a rating of the item)

Step 1. The interest weights of the concepts related to the item are calculated/updated

Step 2. A domain-based inference method infers new interests from the families of concepts whose interests were updated

[Figure: item concepts and user interests, marked as updated, learnt or inferred]
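Step 1 can be sketched as follows. The slides do not give the update rule, so this is only an assumption: a 1-5 rating is mapped onto [-1, 1] and each related concept keeps the running mean of the feedback it has received. `rating_to_feedback` and `UserProfile` are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict


def rating_to_feedback(rating: int) -> float:
    # Hypothetical mapping of a 1-5 rating onto [-1, 1]; a rating of 3 is neutral
    return (rating - 3) / 2


class UserProfile:
    def __init__(self) -> None:
        self.interest: dict[str, float] = {}  # concept -> interest weight
        self._events: defaultdict[str, int] = defaultdict(int)  # feedback count per concept

    def update(self, item_concepts: list[str], rating: int) -> set[str]:
        """Step 1: update the interest weight of every concept related to the item.

        Returns the updated concepts; Step 2 would then examine their families.
        """
        feedback = rating_to_feedback(rating)
        for concept in item_concepts:
            self._events[concept] += 1
            n = self._events[concept]
            old = self.interest.get(concept, 0.0)
            # Incremental mean of all feedback received for this concept
            self.interest[concept] = old + (feedback - old) / n
        return set(item_concepts)
```

The returned set is the hand-off point to Step 2: only families containing an updated concept need to be re-examined by the inference method.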


The domain-based inference method

Based on the minimum percentage of direct subconcepts that have an interest

Two types of propagation

o Upward-based (propagation to the parent concept)

o Sideward-based (propagation to the siblings)

[Figure: concept Sport with direct subconcepts Baseball, Basketball, Football, Tennis and Golf; four subconcepts have interest weights [1.0], [1.0], [-0.5] and [0.5], one is unknown [ ? ], and Sport is shown with [0.5]]

Upward-based threshold (UIT) = 0.6

Sideward-based threshold (SIT) = 0.9

Upward-based? Pct(subconcepts) = 4/5 = 0.8; 0.8 > UIT = 0.6 => Propagation

Sideward-based? Pct(subconcepts) = 4/5 = 0.8; 0.8 < SIT = 0.9 => No propagation
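The threshold test above can be sketched for a single family. The propagated value is an assumption: the mean interest of the known subconcepts, which reproduces the [0.5] shown for Sport ((1.0 + 1.0 - 0.5 + 0.5) / 4). Treating the thresholds as minimums (`>=`) follows the "minimum percentage" wording; `infer_from_family` is a hypothetical name, and which sport carries which weight is an illustrative assignment.

```python
from __future__ import annotations


def infer_from_family(
    parent: str,
    children: list[str],
    interest: dict[str, float],
    uit: float = 0.6,
    sit: float = 0.9,
) -> dict[str, float]:
    """Infer new interests for one family (a parent concept and its direct subconcepts)."""
    known = [c for c in children if c in interest]
    if not children or not known:
        return {}
    pct = len(known) / len(children)  # percentage of subconcepts with an interest
    # Assumption: the propagated value is the mean interest of the known subconcepts
    mean = sum(interest[c] for c in known) / len(known)
    inferred: dict[str, float] = {}
    # Upward-based propagation to the parent concept
    if pct >= uit and parent not in interest:
        inferred[parent] = mean
    # Sideward-based propagation to the siblings that have no interest yet
    if pct >= sit:
        for c in children:
            if c not in interest:
                inferred[c] = mean
    return inferred


# Example from the slide; which sport gets which weight is a hypothetical assignment
sports = ["Baseball", "Basketball", "Football", "Tennis", "Golf"]
weights = {"Baseball": 1.0, "Basketball": 1.0, "Football": -0.5, "Tennis": 0.5}
print(infer_from_family("Sport", sports, weights))  # {'Sport': 0.5}
```

With SIT lowered to 0.7, the same call would also assign 0.5 to Golf through sideward propagation.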


Semantically-enhanced content-based filtering


START. The system has to predict whether the user will like or dislike an item

FOR EACH item’s annotation DO:

STEP 1. The conceptScore is calculated based on:
• The interest degree of the user’s interests that match the item’s annotation
• The semantic similarity of the matches (perfect or partial match)

END FOR

STEP 2. The itemScore is calculated as the weighted average of the conceptScore values according to their relevance

[Figure: user interests matched against item annotations C1 and C2, with one perfect match and two partial matches]
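The two steps can be sketched as below. How conceptScore combines interest degree and similarity is not specified on the slide; this sketch assumes a similarity-weighted mean of the matching interests, with a perfect match counting 1.0. `concept_score`, `item_score` and the similarity table `PARTIAL` are hypothetical; the 0.6 and 0.7 values are the taxonomy-slide examples, and the interest and relevance weights are made up.

```python
from __future__ import annotations

from typing import Callable

Similarity = Callable[[str, str], float]


def concept_score(annotation: str, interests: dict[str, float], sim: Similarity) -> float | None:
    """STEP 1: combine the user's interests that match one item annotation.

    A perfect match has similarity 1.0; a partial match (an ancestor concept)
    has a similarity below 1.0. Assumption: a similarity-weighted mean.
    """
    matches = [(sim(concept, annotation), weight) for concept, weight in interests.items()]
    matches = [(s, w) for s, w in matches if s > 0]
    if not matches:
        return None  # no interest matches this annotation
    return sum(s * w for s, w in matches) / sum(s for s, _ in matches)


def item_score(annotations: dict[str, float], interests: dict[str, float], sim: Similarity) -> float | None:
    """STEP 2: relevance-weighted average of the conceptScore values."""
    total = weight_sum = 0.0
    for annotation, relevance in annotations.items():
        score = concept_score(annotation, interests, sim)
        if score is not None:
            total += relevance * score
            weight_sum += relevance
    return total / weight_sum if weight_sum else None


# Hypothetical similarity table; the partial-match values come from the taxonomy slide
PARTIAL = {("Romance", "Steamy Romance"): 0.6, ("Extreme", "Climbing"): 0.7}


def toy_sim(interest: str, annotation: str) -> float:
    return 1.0 if interest == annotation else PARTIAL.get((interest, annotation), 0.0)


interests = {"Romance": 0.8, "Steamy Romance": 0.4, "Extreme": -0.5}
annotations = {"Steamy Romance": 1.0, "Climbing": 0.5}  # annotation -> relevance
print(item_score(annotations, interests, toy_sim))  # approximately 0.2
```

Here Steamy Romance gets one perfect and one partial match (conceptScore 0.55) and Climbing one partial match (-0.5), so the item score is (0.55 - 0.25) / 1.5.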


The taxonomy-based similarity method


Based on the distance, in taxonomy levels, between

o The item’s annotation

o The user’s interest (an ancestor of the item’s annotation)

Weighted semantic distance among levels using a K factor per level

[Figure: two partial matches. Level 1: Genre, Source; Level 2: Romance, Sport; Level 3: Steamy Romance, Extreme; Level 4: Climbing.
User interest Romance, item annotation Steamy Romance: distance = 1, K3 = 0.4, SIM = 0.6.
User interest Extreme, item annotation Climbing: distance = 1, K4 = 0.3, SIM = 0.7.]
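Both examples fit SIM = 1 - K, where K is the factor of the annotation's level. A sketch that generalizes this to longer distances, assuming each level climbed subtracts the K factor of the level being left, is below. Only K3 and K4 are given on the slide; placing Sport under Source follows the figure's level labels and is an assumption, and `taxonomy_sim` is a hypothetical name.

```python
from __future__ import annotations

# K factor per taxonomy level; only K3 and K4 are given on the slide
K = {3: 0.4, 4: 0.3}

# Taxonomy as child -> parent. Placing Sport under Source follows the level
# labels of the figure and is an assumption.
PARENT: dict[str, str | None] = {
    "Genre": None, "Romance": "Genre", "Steamy Romance": "Romance",
    "Source": None, "Sport": "Source", "Extreme": "Sport", "Climbing": "Extreme",
}


def level(concept: str) -> int:
    # Roots are at level 1
    n = 1
    while PARENT[concept] is not None:
        concept = PARENT[concept]
        n += 1
    return n


def taxonomy_sim(interest: str, annotation: str, k: dict[int, float] = K) -> float:
    """Similarity between an item annotation and a user interest that is one of its ancestors.

    Assumption: each level climbed from the annotation towards the interest
    subtracts the K factor of the level being left.
    """
    path: list[str] = []  # concepts left while climbing towards the interest
    node: str | None = annotation
    while node != interest:
        path.append(node)
        node = PARENT[node]
        if node is None:
            return 0.0  # the interest is not an ancestor: no partial match
    # KeyError if a level on the path has no K factor defined
    return max(1.0 - sum(k[level(c)] for c in path), 0.0)


print(taxonomy_sim("Extreme", "Climbing"))  # 0.7
```

Under this assumption a user interest in Sport would match a Climbing annotation with SIM = 1 - (0.3 + 0.4) = 0.3; a perfect match (same concept) gives 1.0.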


Experimental dataset


Netflix-prize movie dataset

o 480,000 users

o 17,700 movies

o 100M user ratings ranging between 1 and 5

Movie taxonomy used by Netflix for annotating movies

o 1 global hierarchy of concepts describing the movies

o 3 levels of depth

o 550 nodes (item’s annotations)

RMSE (root-mean-square error) metric

o Measures the error of the rating predictions over a set of users
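RMSE is the square root of the mean squared difference between predicted and actual ratings; lower is better. A minimal sketch (the function name `rmse` is illustrative):

```python
import math
from collections.abc import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


print(rmse([1, 5], [3, 3]))  # 2.0
```

Because errors are squared before averaging, RMSE penalizes a few large misses more than many small ones.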



Experimental evaluation


Exp. 1: Traditional vs semantic approach

o GOAL. To evaluate the improvement in accuracy when the semantics-based methods are employed

• Is the cold-start problem reduced?

Exp. 2: Semantic approach on two different taxonomies

o GOAL. To analyze whether the hierarchical structure of the taxonomy affects the effectiveness of the semantics-based methods

• How does the taxonomy structure affect their performance?



Exp.1: Traditional vs Semantic approach


Experiment setup

o The error of two algorithm configurations is compared

• CB configuration (traditional CB approach)

• SEM-CB configuration (semantically-enhanced CB approach)

Config. | User profile representation | Interest-prediction method | Item-user matching
CB | Keyword-based profile | Rating-based | Perfect matches
SEM-CB | Ontology-based profile | Rating-based + domain inference | Perfect + partial matches (semantic similarity)




Overall prediction results:

[Figure: overall RMSE of the CB and SEM-CB configurations (RMSE axis from 1.025 to 1.065)]




Prediction results grouped by user-profile size (number of ratings)

Each interval contains nearly 2% of the predictions in the Netflix test set
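The grouping can be sketched as equal-frequency intervals: sort the test predictions by the user's profile size and cut them into 50 chunks of about 2% each, then compute RMSE per chunk. The record layout and the function name `rmse_by_profile_size` are assumptions, not the authors' code.

```python
import math


def rmse_by_profile_size(
    records: list[tuple[int, float, float]], n_intervals: int = 50
) -> list[tuple[int, int, float]]:
    """Group predictions into equal-frequency intervals of user-profile size.

    records: (profile_size, predicted_rating, actual_rating) per test prediction.
    With n_intervals = 50, each interval holds about 2% of the predictions.
    Returns (smallest profile size, largest profile size, RMSE) per interval.
    """
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    result = []
    for i in range(n_intervals):
        chunk = ordered[i * n // n_intervals:(i + 1) * n // n_intervals]
        if not chunk:
            continue
        mse = sum((p - a) ** 2 for _, p, a in chunk) / len(chunk)
        result.append((chunk[0][0], chunk[-1][0], math.sqrt(mse)))
    return result


# Tiny made-up example with two intervals
print(rmse_by_profile_size([(1, 3, 5), (2, 3, 5), (10, 4, 4), (20, 4, 4)], n_intervals=2))
# [(1, 2, 2.0), (10, 20, 0.0)]
```

Equal-frequency cuts keep the per-interval RMSE values comparable in statistical weight; a side effect is that users with the same profile size can fall on both sides of a boundary.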




Comparison of RMSE based on user-profile size

The improvement is larger for users with small profiles (the cold-start users)


Exp.2: Semantic approach on different taxonomies


Experiment setup

o Two semantics-based configurations are compared on different versions of the movie taxonomy:

• SEM-CB configuration (employs the original taxonomy)

• SEM-CB+ configuration (employs an alternative version)

Taxonomy properties

Config. | No. of nodes | No. of levels | No. of hierarchies | Avg. nodes per family
SEM-CB | 550 | 3 | 1 | 14
SEM-CB+ | 550 | 4 | 4 | 7
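The properties in this table can be computed from a child-to-parent map. The slides do not define "family"; this sketch assumes a family is a concept with its direct subconcepts and that its size is the number of those subconcepts. `taxonomy_properties` and the toy taxonomy are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def taxonomy_properties(parent: dict[str, str | None]) -> dict[str, float]:
    """Summarize a taxonomy given as a child -> parent map (roots map to None)."""
    if not parent:
        raise ValueError("empty taxonomy")

    def depth(concept: str) -> int:
        d = 1
        while parent[concept] is not None:
            concept = parent[concept]
            d += 1
        return d

    # Assumption: a family is a concept plus its direct subconcepts, and its size
    # is the number of those subconcepts
    family_sizes = Counter(p for p in parent.values() if p is not None)
    return {
        "nodes": len(parent),
        "levels": max(depth(c) for c in parent),
        "hierarchies": sum(1 for p in parent.values() if p is None),
        "avg_family_size": sum(family_sizes.values()) / len(family_sizes) if family_sizes else 0.0,
    }


# Hypothetical toy taxonomy with two hierarchies
toy = {"Genre": None, "Romance": "Genre", "Comedy": "Genre",
       "Steamy Romance": "Romance", "Source": None, "Sport": "Source"}
print(taxonomy_properties(toy))
```

Splitting one hierarchy into several and adding a level, as in SEM-CB+, shrinks the average family, which changes how often the percentage thresholds of the inference method are reached.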


Exp.2: Semantic approach on different taxonomies


Results:

[Table: parameter settings of the semantics-based algorithms, with columns "Optimal execution" and "Same accuracy"]


Conclusions and Future work


Main conclusions

o The cold-start problem is reduced by exploiting semantics

o The incorporation of semantics in a traditional CB approach

o The recommender is domain-independent by combining

• A service oriented architecture design

• Standard ontology-based languages (FOAF, OWL)

Future work

o Further experimentation

• In richer domains and with other semantic methods

o The incorporation of semantics into other approaches

• e.g. Collaborative Filtering and Hybrid systems





Exp.1: Traditional vs Semantically-enhanced


Comparison of overall accuracy results:

[Figure: overall RMSE comparison (RMSE axis from 0.88 to 1.08)]
