Discovery Hub: on-the-fly linked data exploratory search
Nicolas Marie, Fabien Gandon, Myriam Ribière, Florentin Rodio, Damien Legrand
CONTEXT | PROPOSITION | EVALUATION | CONCLUSION
Search…
Lookup: precise information need, e.g. « members » + « The Beatles »
Exploratory: fuzzy information need (???)
you are here
related work…
System | Aemoo | Kaminskas & al. | LED | MORE | Seevl | Yovisto
Purpose | Exploratory search | Cross-domain recommendation | Exploratory search on ICT domain | Film recommendation | Musical recommendation | Video exploratory search
Data | DBpedia EN + external services | DBpedia EN subset | DBpedia + external services | DBpedia EN subset | DBpedia EN subset | DBpedia EN+DE subset
Multi-domain | Yes | Cross two domains | No | No, cinema | No, music | Yes
Query | Entity search | Entity selection in a pre-processed list | Entity search | Entity search | Entity recognition from YouTube | Entity recognition in keywords
Algorithm | EKP filtered view | weighted activation | DBpedia Ranker | sVSM algo. | DBrec algorithm | Set of heuristics
Ranking | No | Yes | Yes | Yes | Yes | Yes
Explanations | Wikipedia-based | Path-based | No | Shared prop. | Shared properties | No
Offline proc. | Yes, EKP part | Yes | Yes | Yes | Yes | Yes
goal: domain-independent, customizable, on the fly, remote sources
composite interest queries
Knowing my interest in X and Y, what can I discover or learn that is related to all these resources?
Example: The Beatles + Ken Loach
CONTEXT | PROPOSITION | EVALUATION | CONCLUSION
principle
1. results selection
2. ranking
3. sorting/categorization
4. explanations
http://dbpedia.org/resource/Ken_Loach
…dbpedia.org/resource/The_Beatles
research questions
1. How can we discover linked resources of interest to be explored?
2. How to address remote LOD sources for this?
3. How to present and explain the results to the user for an exploratory objective?
http://fr.dbpedia.org/sparql
http://es.dbpedia.org/sparql
http://it.dbpedia.org/sparql
semantic adaptation of spreading activation
[Figure: example graph with node activation values 1; 0.2; 0.2; 0.2; 0.2; 0.1; 0.6; 0.6; 1; 0.8; 1]
example of semantic spreading activation
propagation domain: Album, Band, Film, Musical Artist, Music Genre, Person, Radio Station, Single, Song, Television Show
propagation domain: Company, Election, Film, Journalist, Musical Artist, Newspaper, Office Holder, Organisation, Politician, School, Single, Television Show, Writer
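The role of the propagation domain can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the `spread` function, the `decay` and `pulses` parameters, and the uniform split of activation between neighbors are all assumptions made for illustration (the slides mention a weight w(i,o), which is not reproduced here). Activation only reaches nodes whose type belongs to the propagation domain.

```python
from collections import defaultdict

# Hypothetical toy graph and types; real data would come from DBpedia.
GRAPH = {
    "The_Beatles": ["Abbey_Road", "Liverpool"],
    "Abbey_Road": ["The_Beatles", "George_Martin"],
    "Liverpool": ["The_Beatles"],
    "George_Martin": ["Abbey_Road"],
}
TYPES = {"The_Beatles": "Band", "Abbey_Road": "Album",
         "Liverpool": "City", "George_Martin": "Person"}
DOMAIN = {"Album", "Band", "Person"}  # a few types taken from the lists above


def spread(graph, types, seeds, domain, pulses=2, decay=0.5):
    """Propagate activation from the seeds through nodes typed in `domain` only."""
    activation = {seed: 1.0 for seed in seeds}
    for _ in range(pulses):
        incoming = defaultdict(float)
        for node, value in activation.items():
            # Semantic filter: neighbors outside the domain are never activated.
            neighbors = [n for n in graph.get(node, ()) if types.get(n) in domain]
            for n in neighbors:
                # Assumption: a decayed share of the activation, split equally.
                incoming[n] += decay * value / len(neighbors)
        for node, value in incoming.items():
            activation[node] = activation.get(node, 0.0) + value
    return activation


result = spread(GRAPH, TYPES, ["The_Beatles"], DOMAIN)
# Rank the activated nodes, leaving the seed out of the result list.
ranking = sorted((n for n in result if n != "The_Beatles"),
                 key=result.get, reverse=True)
print(ranking)
```

In this toy run "Liverpool" never receives activation because its assumed type, "City", is outside the propagation domain.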
sampling algorithm
sparql endpoint = http://xxx/sparql
seeds = xxx/The_Beatles, xxx/Ken_Loach
compute the propagation domain (w(i,o))
find a path between the seeds
import path nodes & their neighbors
for (i = 1; i <= maxPulse; i++) {
    pulse();
    if (sampleSize <= maxSampleSize) {
        extend the sample
    }
}
iterative import
Local Kgram instance
Online LOD source
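The sampling steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Discovery Hub code: `REMOTE` is a hypothetical in-memory stand-in for the online LOD source (a real system would issue SPARQL queries), `fetch_neighbors`, `find_path` and `build_sample` are invented names, the pulse uses a uniform split of activation, and the sample is extended around the most activated node that has not been expanded yet.

```python
from collections import deque

# Hypothetical toy edges standing in for the online LOD source.
REMOTE = {
    "The_Beatles": ["Liverpool", "Abbey_Road"],
    "Liverpool": ["The_Beatles", "England"],
    "England": ["Liverpool", "Ken_Loach"],
    "Ken_Loach": ["England", "Kes"],
    "Abbey_Road": ["The_Beatles", "George_Martin"],
    "George_Martin": ["Abbey_Road"],
    "Kes": ["Ken_Loach"],
}


def fetch_neighbors(node):
    """Hypothetical remote call returning the neighbors of `node`."""
    return REMOTE.get(node, [])


def find_path(start, goal):
    """Breadth-first search for a path between the two seeds."""
    parents, queue = {start: None}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for n in fetch_neighbors(node):
            if n not in parents:
                parents[n] = node
                queue.append(n)
    return []


def build_sample(seeds, max_pulse=3, max_sample_size=10):
    sample, expanded = {}, set()  # local graph, nodes already imported

    def import_node(node):
        expanded.add(node)
        for n in fetch_neighbors(node):
            sample.setdefault(node, set()).add(n)
            sample.setdefault(n, set()).add(node)

    for node in find_path(*seeds):       # import path nodes & their neighbors
        import_node(node)
    activation = {seed: 1.0 for seed in seeds}
    for _ in range(max_pulse):
        incoming = {}                    # pulse over the local sample only
        for node, value in activation.items():
            neighbors = sample.get(node, ())
            for n in neighbors:
                incoming[n] = incoming.get(n, 0.0) + 0.5 * value / len(neighbors)
        for n, value in incoming.items():
            activation[n] = activation.get(n, 0.0) + value
        frontier = [n for n in activation if n not in expanded]
        if frontier and len(sample) <= max_sample_size:
            import_node(max(frontier, key=activation.get))  # extend the sample
    return sample, activation


small, _ = build_sample(("The_Beatles", "Ken_Loach"), max_sample_size=5)
large, _ = build_sample(("The_Beatles", "Ken_Loach"), max_sample_size=10)
print(len(small), len(large))
```

With the smaller limit the sample stops at the path nodes and their neighbors; with the larger one the loop imports further nodes around activated resources.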
magic numbers
[Figure: Sample size influence on top 100 results (maxSampleSize). x-axis: triples loading limit (0 to 20000); y-axes: Kendall Tau (0 to 1) and response time (0 to 4500)]
[Figure: Convergence of the top 100 results (maxPulse). x-axis: iterations (1 to 10); y-axes: Kendall Tau (0 to 1) and shared results (0 to 100)]
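The two measures named on these charts, Kendall Tau and shared results, can be computed between two result lists as in the sketch below. The slides do not say how lists with different members are handled or which list serves as the reference, so two assumptions are made here: the tau is computed over the items present in both lists, and the shared-results value is a percentage of the first list. The function names are illustrative.

```python
from itertools import combinations


def shared_results(a, b):
    """Percentage of the items of list `a` that also appear in list `b`."""
    return 100.0 * len(set(a) & set(b)) / len(a)


def kendall_tau(a, b):
    """Kendall tau (tau-a) restricted to the items present in both rankings."""
    in_b = set(b)
    common = [x for x in a if x in in_b]       # kept in the order of `a`
    rank_b = {x: i for i, x in enumerate(b)}
    pairs = list(combinations(common, 2))      # x is ranked before y in `a`
    if not pairs:
        return 0.0
    concordant = sum(1 for x, y in pairs if rank_b[x] < rank_b[y])
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)


# Hypothetical top-4 lists obtained with two different parameter values.
previous = ["A", "B", "C", "D"]
current = ["A", "C", "B", "E"]
print(shared_results(previous, current), kendall_tau(previous, current))
```

A tau of 1 means the shared items are ranked in the same order in both lists, and -1 means the order is fully reversed.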
[Figure: Queries response time histogram. x-axis: seconds]
Discovery Hub 1.0
1. Start from what you like or are interested in
2. Explore, understand, discover
3. Be redirected to third-party platforms to continue the discovery experience
Discovery Hub 1.0
short demo
CONTEXT | PROPOSITION | EVALUATION | CONCLUSION
composite queries
• queries built by randomly combining the Facebook likes of 12 users
• two queries per participant, who judged the top 20 results of each
• "The result interests me" [Strongly Disagree … Strongly Agree]
• "The result is unexpected" [Strongly Disagree … Strongly Agree]
[Figure: ratings per query, from "Not interesting at all" to "Very interesting"]
overall
• 61.6% of the results were rated as strongly relevant or relevant by the participants.
• 65% of the results were rated as strongly unexpected or unexpected.
• 35.42% of the results were rated both as strongly relevant or relevant and strongly unexpected or unexpected.
Explanatory features evaluation
[Figure: helpfulness ratings, from "Not helpful at all" to "Very Helpful", per explanation type: Common prop., Wiki-based, Graph-based, Overall]
comparison SSA (Discovery Hub) vs. sVSM (MORE)
• Hypothesis 1: SSA gives results at least as relevant as sVSM.
• Hypothesis 2: SSA has a weaker degradation than sVSM (better end-lists).
• Hypothesis 3: results are less relevant but newer to users at the end of the lists.
• Hypothesis 4: advanced search gives better results than the standard query.
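Measure | Algo | Rank | Mean | St. Dev.
Relevance | SSA | 1-10 | 1.54 | 0.305
Relevance | SSA | 11-20 | 1.28 | 0.243
Relevance | sVSM | 1-10 | 1.42 | 0.294
Relevance | sVSM | 11-20 | 0.93 | 0.228
Discovery | SSA | 1-10 | 1.10 | 0.247
Discovery | SSA | 11-20 | 1.21 | 0.228
Discovery | sVSM | 1-10 | 1.14 | 0.251
Discovery | sVSM | 11-20 | 1.50 | 0.205
[Figure: score (0 to 2) of SSA and sVSM per query: 2001, Erin, Term, Princess, Fight, Overall]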
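To relate the table to Hypotheses 1 to 3, the change in mean score between ranks 1-10 and ranks 11-20 can be worked out directly. The short computation below only restates the published means; the `means` dictionary and the `change` helper are illustrative names, and no claim about statistical significance is made.

```python
# Mean scores copied from the table above: (ranks 1-10, ranks 11-20).
means = {
    ("Relevance", "SSA"): (1.54, 1.28),
    ("Relevance", "sVSM"): (1.42, 0.93),
    ("Discovery", "SSA"): (1.10, 1.21),
    ("Discovery", "sVSM"): (1.14, 1.50),
}


def change(measure, algo):
    """Difference between the end-of-list mean and the top-of-list mean."""
    top, end = means[(measure, algo)]
    return round(end - top, 2)


for measure, algo in means:
    print(measure, algo, change(measure, algo))
```

The relevance mean drops by 0.26 for SSA and by 0.49 for sVSM, which is consistent with Hypothesis 2, while the discovery mean rises for both algorithms at the end of the lists (by 0.11 and 0.36), which is consistent with Hypothesis 3.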
CONTEXT | PROPOSITION | EVALUATION | CONCLUSION
• semantic spreading activation algorithm coupled with graph sampling to address remote LOD sources.
• faceted browsing and multiple explanations of the results.
• on-going extensive user evaluation
• publicly available: http://discoveryhub.co
Discovery Hub: enabling exploratory search starting from several interests using linked data sources
current work: propagation over multiple data sources in parallel.
redesign of the interface: Discovery Hub 2.0 released
perspective: other applications of semantic spreading activation
multi-lingual mode: dbpedia:Charles_Baudelaire (English) sameAs fr.dbpedia:Charles_Baudelaire (French)
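One way to look up such sameAs links is a SPARQL query sent to an endpoint over HTTP. The sketch below only builds the request URL and does not contact any server. The helper name `same_as_query_url` is hypothetical, the English DBpedia endpoint and resource URI are assumptions (the slides list only the fr, es and it endpoints), and how Discovery Hub itself resolves these links is not described in the slides.

```python
from urllib.parse import urlencode


def same_as_query_url(endpoint, resource):
    """Build a GET URL asking `endpoint` for the owl:sameAs links of `resource`."""
    query = (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        f"SELECT ?same WHERE {{ <{resource}> owl:sameAs ?same }}"
    )
    # `query` is the request parameter defined by the SPARQL protocol.
    return endpoint + "?" + urlencode({"query": query})


url = same_as_query_url(
    "http://dbpedia.org/sparql",  # assumed English DBpedia endpoint
    "http://dbpedia.org/resource/Charles_Baudelaire",
)
print(url)
```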
http://discoveryhub.co/
@discovery_hub