sssw 2013 - feeding recommender systems with linked open data
DESCRIPTION
Basic introduction to recommender systems + Implementing a content-based recommender system by leveraging knowledge encoded into Linked Open Data datasetsTRANSCRIPT
TOOLS AND TECHNIQUES
“FEEDING RECOMMENDER SYSTEMS WITH LINKED OPEN DATA”
Tommaso Di Noia [email protected]
Politecnico di Bari ITALY
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Recommender Systems
Input Data: A set of users U={u1, …, uM} A set of items X={x1, …, xN} The rating matrix R=[ru,i]
Problem Definition:
Given user u and target item i Predict the rating ru,i
A definition Recommender Systems (RSs) are software tools and techniques providing suggestions for items to be of use to a user. [F. Ricci, L. Rokach, B. Shapira, and P. B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.]
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
The rating matrix
5 1 2 4 3 ??
2 4 5 3 5 2
4 3 2 4 1 3
3 5 1 5 2 4
4 4 5 3 5 2
The
Mat
rix
Tita
nic
I lo
ve s
ho
pp
ing
Arg
o
Love
Act
ual
ly
The
han
gove
r
Tommaso
Enrico
Sean
Natasha
Valentina
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
The rating matrix (in the real world)
5 4 3 ??
2 4 5 5
3 4 3
3 5 5 2
4 4 5 5 2
The
Mat
rix
Tita
nic
I lo
ve s
ho
pp
ing
Arg
o
Love
Act
ual
ly
The
han
gove
r
Tommaso
Enrico
Sean
Natasha
Valentina
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Collaborative Recommender Systems
Collaborative RSs recommend items to a user by identifying other users with a similar profile
Recommender System
User profile
Users
Item7 Item15 Item11 …
Top-N Recommendations
Item1, 5 Item2, 1 Item5, 4 Item10, 5 ….
….
Item1, 4 Item2, 2 Item5, 5 Item10, 3 ….
Item1, 4 Item2, 2 Item5, 5 Item10, 3 ….
Item1, 4 Item2, 2 Item5, 5 Item10, 3 ….
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Content-based Recommender Systems
Recommender System
User profile
Item7 Item15 Item11 …
Top-N Recommendations
Item1, 5 Item2, 1 Item5, 4 Item10, 5 ….
Items
Item1 Item2
Item100 Item’s
descriptions
….
CB-RSs recommend items to a user based on their description and on the profile of the user’s interests
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Knowledge-based Recommender Systems
Recommender System
User profile
Item7 Item15 Item11 …
Top-N Recommendations Item1, 5 Item2, 1 Item5, 4 Item10, 5 ….
Items
Item1 Item2
Item100 Item’s descriptions
….
KB-RSs recommend items to a user based on their description and domain knowledge encoded in a knowledge base
Knowledge-base
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
User-based Collaborative Recommendation
5 1 2 4 3 ??
2 4 5 3 5 2
4 3 2 4 1 3
3 5 1 5 2 4
4 4 5 3 5 2
The
Mat
rix
Tita
nic
I lo
ve s
ho
pp
ing
Arg
o
Love
Act
ual
ly
The
han
gove
r
Tommaso
Enrico
Sean
Natasha
Valentina
𝑠𝑖𝑚 𝑢𝑖 , 𝑢𝑗 = 𝑟𝑢𝑖,𝑥 − 𝑟𝑢𝑖
∗ 𝑟𝑗,𝑥 − 𝑟𝑢𝑗𝑥∈𝑋
𝑟𝑢𝑖,𝑥 − 𝑟𝑢𝑖
2
𝑥∈𝑋 ∗ 𝑟𝑢𝑗,𝑥 − 𝑟𝑢𝑗
2
𝑥∈𝑋
Pearson’s correlation coefficient
Rate prediction
𝑟 𝑢𝑖 , 𝑥′ = 𝑟𝑢𝑖
+ 𝑠𝑖𝑚 𝑢𝑖 , 𝑢𝑗 ∗ 𝑟𝑢𝑗,𝑥
′ − 𝑟𝑢𝑗𝑢𝑗
𝑠𝑖𝑚(𝑢𝑖 , 𝑢𝑗)𝑢𝑗
= 𝑋
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Item-based Collaborative Recommendation
5 1 2 4 3 ??
2 4 5 3 5 2
4 3 2 4 1 3
3 5 1 5 2 4
4 4 5 3 5 2
The
Mat
rix
Tita
nic
I lo
ve s
ho
pp
ing
Arg
o
Love
Act
ual
ly
The
han
gove
r
Tommaso
Enrico
Sean
Natasha
Valentina
𝑠𝑖𝑚 𝑥𝑖 , 𝑥𝑗 = 𝑥𝑖 ⋅ 𝑥𝑗
|𝑥𝑖| ∗ |𝑥𝑗|=
𝑟𝑢,𝑥𝑖∗ 𝑟𝑢,𝑥𝑗𝑢
𝑟𝑢,𝑥𝑖2
𝑢 ∗ 𝑟𝑢,𝑥2
𝑢
Cosine Similarity
Rate prediction
𝑟 𝑢𝑖 , 𝑥′ =
𝑠𝑖𝑚 𝑥 , 𝑥 ′ ∗ 𝑟𝑥,𝑢𝑖𝑥∈𝑋𝑢𝑖
𝑠𝑖𝑚 𝑥 , 𝑥 ′𝑥∈𝑋𝑢𝑖
𝑠𝑖𝑚 𝑥𝑖 , 𝑥𝑗 = 𝑟𝑢,𝑥𝑖
− 𝑟𝑢 ∗ 𝑟𝑢,𝑥𝑗− 𝑟𝑢 𝑢
𝑟𝑢,𝑥𝑖− 𝑟𝑢
2𝑢 ∗ 𝑟𝑢,𝑥𝑗
− 𝑟𝑢 2
𝑢
Adjusted Cosine Similarity
= 𝑋𝑢𝑖
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Content-Based Recommender Systems
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
P. Lops, M. de Gemmis, G. Semeraro. Content-based recommender Systems: State of the Art and Trends. In: P. Kantor, F. Ricci, L. Rokach, B. Shapira, editors, Recommender Systems Hankbook: A complete Guide for Research Scientists & Practitioners
Content-Based Recommender Systems
• Items are described in terms of attributes/features
• A finite set of values is associated to each feature
• Items representation is a (Boolean) vector
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Content-Based Recommender Systems
• Compute similarity between items
𝑠𝑖𝑚 𝑥𝑖 , 𝑥𝑗 = |𝑥𝑖 ∩ 𝑥𝑗|
|𝑥𝑖 ∪ 𝑥𝑗|
Jaccard similarity
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Content-Based Recommender Systems
• Compute similarity between items
𝑠𝑖𝑚 𝑥𝑖 , 𝑥𝑗 = 𝑥𝑖 ⋅ 𝑥𝑗
𝑥𝑖 ∗ |𝑥𝑗|
Cosine similarity
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Content-Based Recommender Systems
• Compute similarity between items
𝑠𝑖𝑚 𝑥𝑖 , 𝑥𝑗 = 𝑥𝑖 ⋅ 𝑥𝑗
𝑥𝑖 ∗ |𝑥𝑗|
Cosine similarity and TF-IDF
𝑇𝐹 𝑣, 𝑥 = # 𝑣 𝑎𝑝𝑝𝑒𝑎𝑟𝑠 𝑖𝑛 𝑡ℎ𝑒 𝑑𝑒𝑠𝑐𝑟𝑖𝑝𝑡𝑖𝑜𝑛 𝑜𝑓 𝑥
𝐼𝐷𝐹 𝑣 = log|𝑋|
#𝑣 𝑎𝑝𝑝𝑒𝑎𝑟𝑠 𝑖𝑛 𝑋
𝑥 = (𝑇𝐹 𝑣1, 𝑥 ∗ 𝐼𝐷𝐹 𝑣1 , … , 𝑇𝐹 𝑣𝑛, 𝑥 ∗ 𝐼𝐷𝐹 𝑣𝑛 )
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Content-Based Recommender Systems
Rate prediction
𝑟 𝑢𝑖 , 𝑥′ =
𝑠𝑖𝑚 𝑥 , 𝑥 ′ ∗ 𝑟𝑥,𝑢𝑖𝑥∈𝑋𝑢𝑖
𝑠𝑖𝑚 𝑥 , 𝑥 ′𝑥∈𝑋𝑢𝑖
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Content-Based Recommender Systems
• Nearest neighbors
– Given a set of items representing the user profile, select their most similar items which are not in the user profile
– Predict the rate only for the N nearest neighbors
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Content-Based Recommender Systems
• Nearest neighbors with kNN
𝑥𝑖,1 𝑥𝑖,2 𝑥𝑖,3
𝑥𝑖,4
𝑥𝑖,5
𝑥𝑖,6
k = 5
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Content-Based Recommender Systems
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Need of domain knowledge! We need rich descriptions of the items!
No suggestion is available if the analyzed content does not contain enough information to discriminate items the user might like from items the user might not like.*
(*) P. Lops, M. de Gemmis, G. Semeraro. Content-based Recommender Systems: State of the Art and Trends. In: P. Kantor, F. Ricci, L. Rokach and B. Shapira, editors, Recommender Systems Handbook: A Complete Guide for Research Scientists & Practitioners
The quality of CB recommendations are correlated with the quality of the features that are explicitly associated with the items.
Main CB RSs Drawback: Limited Content Analysis
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
A solution based on Linked Data
Use Linked Data to mitigate the limited content analysis issue
• Plenty of structured data available
• No Content Analyzer required
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
FRED: Transforming Natural Language text to RDF/OWL graphs
input
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Courtesy of Valentina Presutti
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
FRED: RDF/OWL graph output
Resolution and linking Vocabulary alignment
Frames/events
Taxonomy induction
Semantic roles Co-reference
Custom namespace
Types
Induced types (sense tagging)
22
Courtesy of Valentina Presutti
Linked Data as structured information source for items descriptions
Rich items descriptions
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Select the domain(s) of your RS
SELECT count(?i) AS ?num ?c
WHERE {
?i a ?c .
FILTER(regex(?c, "^http://dbpedia.org/ontology")) .
}
ORDER BY DESC(?num)
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Tìpalo: automatic typing of DBpedia entities
Input: natural language definition of an entity
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Courtesy of Valentina Presutti
26
Linking to WN supersenses
DBpedia resource Extracted type
Disambiguated sense
Inferred superclass
Linked resource
Disambiguated sense
A chaise longue is an upholstered sofa in
the shape of a chair that is long enough
to support the legs.
A chaise longue (English /ˌʃeɪz ˈlɔːŋ/;[1] French
pronunciation: [ʃɛzlɔ̃ŋɡ(ə)], "long chair") is an
upholstered sofa in the shape of a chair that is long
enough to support the legs.
Linking to DOLCE
http://en.wikipedia.org/wiki/Chaise_longue
Rich taxonomical data!
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Courtesy of Valentina Presutti
Select the sub-graph you are interested in
select ?m ?p ?o1
where{
?m ?p ?o1 .
?m dcterms:subject ?o.
dbpedia:The_Matrix dcterms:subject ?o.
?m a dbpedia-owl:Film.
filter(regex(?p,"^http://dbpedia.org/ontology/")).
}
order by ?m
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Enriching Annotations with Resources From the Web
Watson: find ontologies on the Semantic Web for enriching the representation of your data
Sindice: find other entities for enriching your knowledge base
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
SameAs.org: find other entities to link to from your knowledge base
Courtesy of Valentina Presutti
Computing similarity in LOD datasets
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Vector Space Model for LOD
Righteous Kill
starring director
subject/broader genre
Heat
Ro
be
rt D
e N
iro
Joh
n A
vnet
Seri
al k
ille
r fi
lms
Dra
ma
Al P
acin
o
Bri
an D
en
ne
hy
He
ist
film
s C
rim
e f
ilms
starring
Ro
be
rt D
e N
iro
A
l Pac
ino
B
rian
De
nn
eh
y
Righteous Kill Heat
… …
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Vector Space Model for LOD
Righteous Kill
STARRING Al Pacino
(v1)
Robert De Niro
(v2)
Brian Dennehy
(v3)
Righteous Kill (m1)
X X X
Heat (m2) X X
Heat
Righteous Kill (x1) wv1,x1 wv2,x1 wv3,x1
Heat (x2) wv1,x2 wv2,x2 0
𝑤𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜,𝐻𝑒𝑎𝑡 = 𝑡𝑓𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜,𝐻𝑒𝑎𝑡 ∗ 𝑖𝑑𝑓𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Vector Space Model for LOD
Righteous Kill
STARRING Al Pacino
(v1)
Robert De Niro
(v2)
Brian Dennehy
(v3)
Righteous Kill (m1)
X X X
Heat (m2) X X
Heat
Righteous Kill (x1) wv1,x1 wv2,x1 wv3,x1
Heat (x2) wv1,x2 wv2,x2 0
𝑤𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜,𝐻𝑒𝑎𝑡 = 𝑡𝑓𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜,𝐻𝑒𝑎𝑡 ∗ 𝑖𝑑𝑓𝐴𝑙𝑃𝑎𝑐𝑖𝑛𝑜
𝑡𝑓 ∈ {0,1}
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Vector Space Model for LOD
+
+
+
… =
𝒔𝒊𝒎𝒔𝒕𝒂𝒓𝒓𝒊𝒏𝒈(𝒙𝒊, 𝒙𝒋) = 𝒘𝒗𝟏,𝒙𝒊
∗ 𝒘𝒗𝟏,𝒙𝒋+ 𝒘𝒗𝟐,𝒙𝒊
∗ 𝒘𝒗𝟐,𝒙𝒋+ 𝒘𝒗𝟑,𝒙𝒊
∗ 𝒘𝒗𝟑,𝒙𝒋
𝒘𝒗𝟏,𝒙𝒊𝟐 + 𝒘𝒗𝟐,𝒙𝒊
𝟐 + 𝒘𝒗𝟑,𝒙𝒊𝟐 ∗ 𝒘𝒗𝟏,𝒙𝒋
𝟐 + 𝒘𝒗𝟐,𝒙𝒋𝟐 + 𝒘𝒗𝟑,𝒙𝒋
𝟐
𝜶𝒔𝒕𝒂𝒓𝒓𝒊𝒏𝒈 ∗ 𝒔𝒊𝒎𝒔𝒕𝒂𝒓𝒓𝒊𝒏𝒈(𝒙𝒊, 𝒙𝒋)
𝜶𝒅𝒊𝒓𝒆𝒄𝒕𝒐𝒓 ∗ 𝒔𝒊𝒎𝒅𝒊𝒓𝒆𝒄𝒕𝒐𝒓(𝒙𝒊, 𝒙𝒋)
𝜶𝒔𝒖𝒃𝒋𝒆𝒄𝒕 ∗ 𝒔𝒊𝒎𝒔𝒖𝒃𝒋𝒆𝒄𝒕(𝒙𝒊, 𝒙𝒋)
𝒔𝒊𝒎𝒔𝒕𝒂𝒓𝒓𝒊𝒏𝒈(𝒙𝒊, 𝒙𝒋)
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
A LOD-based CB RS
Given a user profile defined as:
We can predict the rating using a Nearest Neighbor Classifier wherein the similarity measure is a linear combination of local property similarities
𝑿𝒖𝒊= { < 𝒙𝒊, 𝒓𝒙𝒊,𝒖𝒊
> }
𝑟 𝑢𝑖 , 𝑥′ =
𝛼𝑝 ∗ 𝑠𝑖𝑚𝑝(𝑥𝑖 , 𝑥 ′)𝑝
𝑃∗ 𝑟𝑥𝑖,𝑢𝑖<𝑥𝑖,𝑟𝑥𝑖,𝑢𝑖
> ∈ 𝑋𝑢𝑖
|𝑋𝑢𝑖|
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
We’ve just scratched the surface…
Missing in this presentation (not a complete list)
• Learning 𝛼𝑝
• Explanation
– Quite straight with a content-based approach
• Hybrid approaches
– Mix a content based approach with a collaborative one
– Collaborative similarity as a further vector space (property) of the content based approach?
– Exploit the graph-based nature of both the rating matrix and the RDF graph
• … SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web
Many thanks to Vito Claudio Ostuni and Roberto Mirizzi
SSSW'13 The 10th Summer School on Ontology Engineering and the Semantic Web