DCAI 2010 presentation
TRANSCRIPT
DCAI 2010, September 7-10 2010, Valencia
A Recommendation System for the Semantic Web
Victor Codina and Luigi Ceccaroni
Departament de Llenguatges i Sistemes Informàtics (LSI)
Universitat Politècnica de Catalunya (UPC)
Introduction Our semantic approach Evaluation Conclusions
Outline
Introduction & motivations
Our semantic approach
Evaluation
Conclusions & future work
The general personalization process
[Figure: the general personalization process in two stages, USER MODELING and CONTENT ADAPTATION. Elements: users, user behavior, implicit feedback, explicit feedback, learning algorithm, user profile, items, item representation, recommendation strategy, personalized recommendation, user satisfaction.]
Potential benefits of using semantics
Using semantics can reduce some limitations of current recommenders:
o Cold-start problem
• By inferring missing information from the relationships in domain ontologies
o Domain dependency
• By using standard ontology-based languages to represent information uniformly
Service-oriented architecture design
User's interests and item representation
[Figure: ontology-based representation (weighted overlay), showing concept taxonomies with weighted user's interests and weighted item annotations.]
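A weighted overlay can be read as two weight maps laid over the same concept taxonomy, one for the user's interests and one for an item's annotations. The Python sketch below illustrates that reading only; the `Taxonomy` class, the example concepts and all the weights are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    """Concept hierarchy stored as child -> parent links (roots have no parent)."""
    parent_of: dict[str, str] = field(default_factory=dict)

    def ancestors(self, concept: str) -> list[str]:
        """Return the ancestors of a concept, from its direct parent up to the root."""
        result = []
        while concept in self.parent_of:
            concept = self.parent_of[concept]
            result.append(concept)
        return result


# Hypothetical taxonomy fragment and weights, for illustration only
taxonomy = Taxonomy({"Romance": "Genre", "Steamy Romance": "Romance"})
user_interests = {"Romance": 0.8}            # overlay: concept -> interest weight
item_annotations = {"Steamy Romance": 1.0}   # overlay: concept -> annotation weight

# Concepts where the two overlays meet: an annotation, or one of its
# ancestors, that also carries a user interest
shared = {
    c
    for a in item_annotations
    for c in [a, *taxonomy.ancestors(a)]
    if c in user_interests
}
```

Here the item and the user share no concept directly, but the user's interest in Romance is reachable from the annotation Steamy Romance through the taxonomy.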
How do we take advantage of semantics?
We incorporate semantics in both stages of the personalization process to reduce the cold-start problem:
o The user-profile learning algorithm employs a domain-based inference method
• It expands and enriches user profiles with interests that cannot be inferred directly from the user's feedback
o The content-based recommendation algorithm employs a taxonomy-based similarity method
• It uses the user's interests in more general concepts related to the item's annotations to refine the matching calculation
Semantically-enhanced learning algorithm
START. The user provides some feedback about an item (e.g. a purchase or rating of an item)
Step 1. The interest weights of the concepts related to the item are calculated or updated
Step 2. A domain-based inference method infers new interests from the families of concepts with updated interests
[Figure: item and user profile; legend: updated, learnt and inferred interests.]
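The slides do not give the update rule for Step 1, so the sketch below assumes one: a 1-5 rating is mapped linearly to [-1, 1], scaled by the relevance of each item annotation, and folded into a running mean per concept. `UserProfile`, `rating_to_feedback` and the example concepts are hypothetical.

```python
def rating_to_feedback(rating: float) -> float:
    """Map a 1-5 star rating linearly to [-1, 1] (assumption: 3 stars is neutral)."""
    return (rating - 3.0) / 2.0


class UserProfile:
    """Hypothetical weighted-overlay profile: concept -> interest weight."""

    def __init__(self) -> None:
        self.weights: dict[str, float] = {}
        self.counts: dict[str, int] = {}  # feedback events seen per concept

    def update(self, annotations: dict[str, float], rating: float) -> set[str]:
        """Step 1: update the interest of every concept annotating the rated item.

        Each annotation's relevance scales the feedback, and the result is folded
        into a running mean (both are assumptions). Returns the updated concepts,
        which are the input to the inference step (Step 2).
        """
        feedback = rating_to_feedback(rating)
        for concept, relevance in annotations.items():
            n = self.counts.get(concept, 0)
            old = self.weights.get(concept, 0.0)
            self.weights[concept] = (old * n + feedback * relevance) / (n + 1)
            self.counts[concept] = n + 1
        return set(annotations)
```

For example, a 5-star rating of an item annotated with Tennis (relevance 1.0) sets the Tennis interest to 1.0; a later 1-star rating of another Tennis item brings the running mean back to 0.0.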
The domain-based inference method
Based on the minimum percentage of direct subconcepts that have an interest
Two types of propagation
o Upward-based (propagation to the parent concept)
o Sideward-based (propagation to the siblings)
[Figure: the concept Sport and its direct subconcepts Baseball, Basketball, Football, Tennis and Golf, labelled with the interest values [1.0], [1.0], [-0.5], [0.5], [0.5] and one unknown value [ ? ].]
Upward-based threshold (UIT) = 0.6; Sideward-based threshold (SIT) = 0.9
Upward-based? Pct(subconcepts) = 4/5 = 0.8; 0.8 > UIT = 0.6 => Propagation
Sideward-based? Pct(subconcepts) = 4/5 = 0.8; 0.8 < SIT = 0.9 => No propagation
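A sketch of the two propagation rules, applied to the families of the concepts updated in Step 1. Assumptions: the inferred interest is the mean of the known subconcept interests (for the example, the mean of 1.0, 1.0, -0.5 and 0.5 is 0.5, which is one of the values in the figure), the thresholds are compared with >= as a "minimum percentage", and only concepts without an interest receive inferred values. `infer_interests` and its parameters are hypothetical.

```python
from statistics import mean


def infer_interests(children, parent_of, weights, updated, uit=0.6, sit=0.9):
    """Hypothetical domain-based inference over the families of updated concepts.

    children:  parent concept -> list of its direct subconcepts
    parent_of: concept -> its parent concept
    weights:   concept -> interest weight (modified in place)
    updated:   concepts whose interest was just learnt or updated
    Returns the set of concepts whose interest was inferred.
    """
    inferred = set()
    for family in {parent_of[c] for c in updated if c in parent_of}:
        siblings = children[family]
        known = [c for c in siblings if c in weights]
        if not known:
            continue
        pct = len(known) / len(siblings)         # Pct(subconcepts)
        value = mean(weights[c] for c in known)  # assumption: mean of known interests
        # Upward-based: the parent gets an interest if enough subconcepts have one
        if pct >= uit and family not in weights:
            weights[family] = value
            inferred.add(family)
        # Sideward-based: siblings without an interest get one
        if pct >= sit:
            for c in siblings:
                if c not in weights:
                    weights[c] = value
                    inferred.add(c)
    return inferred
```

With the slide's thresholds, 4 known subconcepts out of 5 trigger upward propagation to Sport but no sideward propagation; lowering SIT below 0.8 would also fill the unknown sibling. Propagation here covers one level only; repeating it up the hierarchy would be a further assumption.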
Semantically-enhanced content-based filtering
START. The system has to predict whether the user will like or dislike an item
FOR EACH item's annotation DO:
STEP 1. The conceptScore is calculated based on:
• The interest degree of the user's interests that match the item's annotation
• The semantic similarity of each match (perfect or partial)
END FOR
STEP 2. The itemScore is calculated as the average of the conceptScore values, weighted by their relevance
[Figure: a user's interests matched against an item's annotations C1 and C2, with one perfect match and two partial matches.]
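The slides do not give the exact formulas, so this sketch assumes that conceptScore is the similarity-weighted average of the user's interests in the annotation (perfect match, similarity 1.0) and in its ancestors (partial matches), and that itemScore is the relevance-weighted average of the conceptScores. The `similarity` argument stands in for the taxonomy-based similarity method; the function names are hypothetical.

```python
from typing import Callable, Optional

Similarity = Callable[[str, str], float]


def concept_score(annotation: str, profile: dict[str, float],
                  parent_of: dict[str, str], similarity: Similarity) -> Optional[float]:
    """STEP 1: similarity-weighted average of the user's interests that match
    the annotation, either perfectly (the annotation itself, similarity 1.0)
    or partially (an ancestor of the annotation)."""
    num = den = 0.0
    concept: Optional[str] = annotation
    while concept is not None:
        if concept in profile:
            sim = 1.0 if concept == annotation else similarity(annotation, concept)
            num += sim * profile[concept]
            den += sim
        concept = parent_of.get(concept)
    return num / den if den > 0 else None  # None: no matching interest


def item_score(annotations: dict[str, float], profile: dict[str, float],
               parent_of: dict[str, str], similarity: Similarity) -> Optional[float]:
    """STEP 2: relevance-weighted average of the conceptScore values."""
    num = den = 0.0
    for annotation, relevance in annotations.items():
        score = concept_score(annotation, profile, parent_of, similarity)
        if score is not None:
            num += relevance * score
            den += relevance
    return num / den if den > 0 else None
```

Annotations with no perfect or partial match are skipped, so an item that matches nothing in the profile gets no score rather than a neutral one; that is a design choice of the sketch.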
The taxonomy-based similarity method
Based on the distance, in taxonomy levels, between
o The item's annotation
o The user's interest (an ancestor of the item's annotation)
Weighted semantic distance among levels, using a K factor per level
Examples from the figure:
o Genre (level 1) > Romance (level 2) > Steamy Romance (level 3): item annotation Steamy Romance, user interest Romance, distance = 1, K3 = 0.4, SIM = 0.6
o Source (level 1) > Sport (level 2) > Extreme (level 3) > Climbing (level 4): item annotation Climbing, user interest Extreme, distance = 1, K4 = 0.3, SIM = 0.7
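Both examples fit SIM = 1 - K(level of the annotation) for a distance of 1. The sketch below assumes that, for longer distances, the K factors of every level climbed are subtracted in turn; that rule, the function names and the K2 value in the usage test are hypothetical.

```python
def level(concept: str, parent_of: dict[str, str]) -> int:
    """Taxonomy level of a concept (roots are level 1)."""
    n = 1
    while concept in parent_of:
        concept = parent_of[concept]
        n += 1
    return n


def taxonomy_similarity(annotation: str, interest: str,
                        parent_of: dict[str, str], k: dict[int, float]) -> float:
    """SIM between an item annotation and a user interest that is its ancestor.

    Assumption: each step up the taxonomy subtracts the K factor of the level
    being left, so SIM = 1 - K(level) for a distance of 1.
    Returns 0.0 when the interest is not an ancestor of the annotation.
    """
    sim = 1.0
    concept = annotation
    while concept != interest:
        if concept not in parent_of:
            return 0.0  # reached a root without meeting the interest
        sim -= k[level(concept, parent_of)]
        concept = parent_of[concept]
    return max(sim, 0.0)
```

With K3 = 0.4 and K4 = 0.3 this reproduces the figure: Steamy Romance vs Romance gives 0.6 and Climbing vs Extreme gives 0.7. Subtracting per level makes matches against deeper, more specific annotations degrade more gently when the lower-level K factors are smaller.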
Experimental dataset
Netflix Prize movie dataset
o 480,000 users
o 17,700 movies
o 100M user ratings on a scale from 1 to 5
Movie taxonomy used by Netflix for annotating movies
o 1 global hierarchy of concepts describing the movies
o 3 levels of depth
o 550 nodes (item’s annotations)
RMSE (root mean squared error) metric
o Measures the rating-prediction error over a set of users
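RMSE is the square root of the mean squared difference between predicted and actual ratings: RMSE = sqrt((1/N) Σ (predicted_i - actual_i)²). A minimal Python version, assuming the predictions and the true ratings arrive as two aligned lists (the function name is illustrative):

```python
import math


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root mean squared error: sqrt(mean((predicted - actual)^2))."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))
```

Because errors are squared before averaging, a few large misses raise RMSE more than many small ones.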
Experimental evaluation
Exp. 1: Traditional vs semantic approach
o GOAL. To evaluate the accuracy improvement when the semantics-based methods are used
• Is the cold-start problem reduced?
Exp. 2: Semantic approach on two different taxonomies
o GOAL. To analyze whether the hierarchical structure of the taxonomy affects the effectiveness of the semantics-based methods
• How does the taxonomy structure affect their performance?
Exp.1: Traditional vs Semantic approach
Experiment setup
o The error of two algorithm configurations is compared
• CB configuration (traditional CB approach)
• SEM-CB configuration (semantically-enhanced CB approach)
Config. | User-profile representation | Interest-prediction method | Item-user matching
CB | Keyword-based profile | Rating-based | Perfect matches
SEM-CB | Ontology-based profile | Rating-based + domain inference | Perfect + partial matches (semantic similarity)
Exp.1: Traditional vs Semantic approach
Overall prediction results:
[Figure: bar chart of the overall RMSE of the CB and SEM-CB configurations.]
Exp.1: Traditional vs Semantic approach
Prediction results grouped by user-profile size (number of ratings)
Each interval contains roughly 2% of the predictions in the Netflix test set
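One way to build this kind of grouping, assuming the intervals are equal-frequency bins over user-profile size (50 bins give roughly 2% of the predictions each). The function name and the record layout are hypothetical.

```python
import math


def rmse_by_profile_size(records, n_bins=50):
    """Group predictions into equal-frequency intervals of user-profile size.

    records: iterable of (profile_size, predicted_rating, actual_rating).
    With n_bins = 50 each interval holds about 2% of the predictions.
    Returns one (min_size, max_size, rmse) tuple per interval.
    """
    rows = sorted(records, key=lambda r: r[0])  # order by number of ratings
    if not rows:
        return []
    n_bins = min(n_bins, len(rows))  # keep every interval non-empty
    result = []
    for b in range(n_bins):
        chunk = rows[b * len(rows) // n_bins:(b + 1) * len(rows) // n_bins]
        mse = sum((p - a) ** 2 for _, p, a in chunk) / len(chunk)
        result.append((chunk[0][0], chunk[-1][0], math.sqrt(mse)))
    return result
```

Equal-frequency bins can split users with the same profile size across two adjacent intervals; binning on fixed size ranges instead would avoid that at the cost of uneven interval sizes.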
Exp.1: Traditional vs Semantic approach
Comparison of RMSE by user-profile size
The improvement is larger for users with small profiles (the cold-start users)
Exp.2: Semantic approach on different taxonomies
Experiment setup
o Two semantics-based configurations are compared on two versions of the movie taxonomy:
• SEM-CB configuration (uses the original taxonomy)
• SEM-CB+ configuration (uses an alternative version)
Taxonomy properties
Config. | No. of nodes | No. of levels | No. of hierarchies | Avg. no. of nodes per family
SEM-CB | 550 | 3 | 1 | 14
SEM-CB+ | 550 | 4 | 4 | 7
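Assuming a "family" is the set of direct subconcepts of one parent and a "hierarchy" is the tree below one root concept (readings of the column headers, not definitions from the slides), the properties in the table could be computed as follows. `taxonomy_properties` and the toy taxonomy in the test are hypothetical, and whether the 550 nodes include the roots is not stated.

```python
from collections import Counter


def taxonomy_properties(nodes: set[str], parent_of: dict[str, str]) -> dict[str, float]:
    """Summarize a taxonomy given as a node set plus child -> parent links."""

    def level(concept: str) -> int:
        n = 1
        while concept in parent_of:
            concept = parent_of[concept]
            n += 1
        return n

    family_sizes = Counter(parent_of.values())  # parent -> no. of direct subconcepts
    return {
        "nodes": len(nodes),
        "levels": max(level(c) for c in nodes),
        "hierarchies": sum(1 for c in nodes if c not in parent_of),  # one per root
        "avg_family_size": (sum(family_sizes.values()) / len(family_sizes)
                            if family_sizes else 0.0),
    }
```

Under these readings, splitting one 3-level hierarchy into four 4-level hierarchies with the same 550 nodes would naturally lower the average family size, as the table shows.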
Exp.2: Semantic approach on different taxonomies
Results:
[Figure: parameter settings of the semantics-based algorithms; labels: "Optimal execution", "Same accuracy".]
Conclusions and Future work
Main conclusions
o The cold-start problem is reduced by exploiting semantics
o The incorporation of semantics in a traditional CB approach
o The recommender is domain-independent because it combines
• A service-oriented architecture design
• Standard ontology-based languages (FOAF, OWL)
Future work
o Further experimentation
• In richer domains and with other semantic methods
o The incorporation of semantics into other approaches
• e.g. Collaborative Filtering and Hybrid systems
Exp.1: Traditional vs Semantically-enhanced
Comparison of overall accuracy results:
[Figure: overall RMSE of the traditional and semantically-enhanced approaches.]