DBGroup @ UNIMO
Visual Querying LOD sources with LODeX
Fabio Benedetti, Sonia Bergamaschi, Laura Po
Department of Engineering “Enzo Ferrari”
University of Modena & Reggio Emilia
K-Cap 2015 - The 8th International Conference on Knowledge Capture October 7-10, 2015, Palisades, NY, USA
Linked Open Data: The story so far
[Schmachtenberg, Max, Christian Bizer, and Heiko Paulheim. "Adoption of the Linked Data Best Practices in Different Topical Domains." The Semantic Web–ISWC 2014. Springer International Publishing, 2014. 245-260]
Domain                  2009 (n)   2009 (%)   2014* (n)   2014* (%)
Cross-domain                  41     13.95%          41       4.04%
Geographic                    31     10.54%          21       2.07%
Government                    49     16.67%         183      18.05%
Life sciences                 41     13.95%          83       8.19%
Media                         25      8.50%          22       2.17%
Publications                  87     29.59%          96       9.47%
Social web                     0      0.00%         520      51.28%
User-generated content        20      6.80%          48       4.73%
Total                        294                   1014
*Only 570 datasets belong to the LOD Cloud; the remaining datasets do not contain ingoing/outgoing links to the LOD Cloud.
[Figure: distribution of datasets by domain, 2009 and 2014]
Linked Open Data drawbacks
Open Access trends encourage the publication of Open Data in the form of Linked Data.
But discovering and consuming LOD sources is a complex task for both skilled and unskilled users.
Why is it a complex task?
• There is no standard for documenting a dataset.
• Many datasets are published without real documentation that could help reveal their structure.
To understand whether a dataset really contains interesting information, a user has to explore it manually using SPARQL queries.
Unskilled user
A user with no SPARQL knowledge cannot become a consumer of Linked Data.
Skilled user
Exploring a dataset can be time-consuming without any knowledge of its structure.
Our solution - LODeX
A tool that promotes the understanding, navigation and querying of LOD sources
Requirements
• be portable to the LOD Cloud
• provide a synthetic representation of the structure of the dataset
• provide visual query building functionalities, hiding the complexity of Semantic Web technologies
LODeX Architecture
Two main modules:
• Extraction & Summarization
  – Index Extraction (IE)
  – Post Processing (PP)
• Visualization & Querying
  – Schema Summary Visualization
  – Query Orchestrator
[Figure: LODeX architecture. Components: LOD Cloud, endpoint URLs, LODeX Indexes Extraction, Statistical Indexes, LODeX Post-processing, Schema Summary (NoSQL storage), Schema Summary Visualization, Query Orchestrator; SPARQL queries are sent to the endpoints and basic query results are returned.]
Extraction & Summarization
Index Extraction [1]
The IE process generates the SPARQL queries used to extract the different indexes.
• Pattern Strategy technique
  – a technique that produces a higher number of less complex SPARQL queries
Post Processing
An algorithm combines the information contained in the Statistical Indexes to produce and store the Schema Summary.
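The principle of issuing many simple queries instead of one heavy aggregate can be illustrated with a minimal sketch, assuming a plain task such as counting the instances of each class. The function names are hypothetical, and the queries only illustrate the trade-off; they are not the queries produced by the Pattern Strategy technique.

```python
# Hypothetical illustration: one heavy aggregate query versus many small ones.
# This is NOT the LODeX Pattern Strategy; it only shows the general trade-off.


def single_complex_query() -> str:
    """One query that scans and groups every typed resource at once."""
    return "SELECT ?c (COUNT(?s) AS ?n) WHERE { ?s a ?c } GROUP BY ?c"


def list_classes_query() -> str:
    """Step 1: a cheap query that only lists the classes in use."""
    return "SELECT DISTINCT ?c WHERE { ?s a ?c }"


def per_class_count_queries(classes: list[str]) -> list[str]:
    """Step 2: one small COUNT query per class returned by step 1."""
    return [f"SELECT (COUNT(?s) AS ?n) WHERE {{ ?s a <{c}> }}" for c in classes]


classes = ["http://xmlns.com/foaf/0.1/Person", "http://xmlns.com/foaf/0.1/Organization"]
for q in per_class_count_queries(classes):
    print(q)
```

Each small query touches only one class, so an endpoint with strict execution limits may answer all of them even when it would reject the single grouped query.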
Schema Summary
The Schema Summary is a pseudograph composed of:
• C – classes (nodes)
• P – properties (edges)
and the additional elements and functions:
• A – attributes associated with each class
  – each attribute represents the existence of a datatype property on the instances of the class
• – labels
• l – labeling function
• count – count function
The Schema Summary is inferred from the distribution of the instances of a dataset.
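A minimal sketch of an in-memory structure for this pseudograph, under our own assumptions: C, P and A are stored as counters, the count function is a lookup, and the labeling function l returns the property attached to an edge. The class and method names are hypothetical, and the label set is represented implicitly by the property names.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SchemaSummary:
    """Hypothetical Schema Summary: a pseudograph over classes.

    Edges are keyed by (domain class, property, range class). Two classes may
    be linked by several properties and a class may link to itself, which is
    why this is a pseudograph rather than a simple graph.
    """
    classes: Counter = field(default_factory=Counter)     # C: class -> instance count
    properties: Counter = field(default_factory=Counter)  # P: (domain, property, range) -> count
    attributes: dict = field(default_factory=dict)        # A: class -> Counter of datatype properties

    def add_class(self, c: str, n: int) -> None:
        self.classes[c] += n

    def add_property(self, domain: str, prop: str, range_: str, n: int) -> None:
        self.properties[(domain, prop, range_)] += n

    def add_attribute(self, c: str, prop: str, n: int) -> None:
        self.attributes.setdefault(c, Counter())[prop] += n

    def count(self, element) -> int:
        """count function: instances of a class, or usage of an edge."""
        if element in self.classes:
            return self.classes[element]
        return self.properties.get(element, 0)

    @staticmethod
    def label(edge: tuple) -> str:
        """Labeling function l: an edge is labeled with its property."""
        return edge[1]


# Part of the Schema Summary of the running example.
ss = SchemaSummary()
for c, n in [("foaf:Organization", 2), ("ex:Sector", 1), ("foaf:Person", 1)]:
    ss.add_class(c, n)
ss.add_property("foaf:Organization", "ex:sector", "ex:Sector", 2)
ss.add_property("foaf:Organization", "ex:ceo", "foaf:Person", 1)
ss.add_attribute("foaf:Person", "foaf:firstName", 1)
ss.add_attribute("foaf:Person", "foaf:lastName", 1)
print(ss.count("foaf:Organization"))                              # 2
print(ss.count(("foaf:Organization", "ex:sector", "ex:Sector")))  # 2
```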
Running example
[Figure: running example graph. Intensional knowledge: ex:Sector and foaf:Organization typed as owl:Class; ex:sector typed as rdf:Property and owl:ObjectProperty, with rdfs:label “sector” and an rdfs:domain declaration. Extensional knowledge: sector1 (ex:Sector, dc:title “Energy”); organization1 and organization2 (foaf:Organization), both linked to sector1 via ex:sector; organization values for ex:activity (“Village electrification in the Pacific”), dbpedia:fax (“+41331231”) and ex:ceo (person1); person1 (foaf:Person, foaf:firstName “Paolo”, foaf:lastName “Rossi”).]
The information contained in the Intensional knowledge can be incomplete or absent
Indexes needed to generate a Schema Summary
These indexes belong to the extensional group of the Statistical Indexes [2]:
• SC (Subject Class) contains the pairs (p,c) where p is an object property and c is its domain class.
• SCl (Subject Class to literal) contains the pairs (p,c) where p is a datatype property and c is its domain class.
• OC (Object Class) contains the pairs (p,c) where p is an object property and c is its range class.
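The three indexes can be computed over a local copy of the running example with a short sketch. It assumes that each index entry is reported as (class, property, count), where count is the number of distinct instances involved; this reproduces the values listed for the running example. It also assumes that organization1 carries the ex:activity, dbpedia:fax and ex:ceo values, which does not change any count. LODeX itself extracts the indexes from a remote endpoint through SPARQL queries, not from an in-memory triple list.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

# Extensional triples of the running example. Literals keep their surrounding
# double quotes so they can be told apart from resources (sketch simplification).
TRIPLES = [
    ("sector1", "rdf:type", "ex:Sector"),
    ("sector1", "dc:title", '"Energy"'),
    ("organization1", "rdf:type", "foaf:Organization"),
    ("organization2", "rdf:type", "foaf:Organization"),
    ("organization1", "ex:sector", "sector1"),
    ("organization2", "ex:sector", "sector1"),
    ("organization1", "ex:activity", '"Village electrification in the Pacific"'),
    ("organization1", "dbpedia:fax", '"+41331231"'),
    ("organization1", "ex:ceo", "person1"),
    ("person1", "rdf:type", "foaf:Person"),
    ("person1", "foaf:firstName", '"Paolo"'),
    ("person1", "foaf:lastName", '"Rossi"'),
]


def is_literal(term: str) -> bool:
    return term.startswith('"')


def _as_tuples(index):
    """Turn {(class, property): {instances}} into sorted (class, property, count)."""
    return sorted((c, p, len(insts)) for (c, p), insts in index.items())


def extensional_indexes(triples):
    """Compute SC, SCl and OC, counting distinct instances per (class, property)."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    sc, scl, oc = defaultdict(set), defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for c in types[s]:
            # Subject side: datatype properties go to SCl, object properties to SC.
            (scl if is_literal(o) else sc)[(c, p)].add(s)
        if not is_literal(o):
            for c in types[o]:
                # Object side: the class of the object is the range class (OC).
                oc[(c, p)].add(o)
    return _as_tuples(sc), _as_tuples(scl), _as_tuples(oc)


sc, scl, oc = extensional_indexes(TRIPLES)
print(sc)  # [('foaf:Organization', 'ex:ceo', 1), ('foaf:Organization', 'ex:sector', 2)]
```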
Schema Summary generation
We use an algorithm that combines these indexes to produce a Schema Summary.
Name    Values (class, property, count)
SC      (foaf:Organization, ex:ceo, 1), (foaf:Organization, ex:sector, 2)
SCl     (foaf:Person, foaf:firstName, 1), (foaf:Person, foaf:lastName, 1), (ex:Sector, dc:title, 1), (foaf:Organization, ex:activity, 1), (foaf:Organization, dbpedia:fax, 1)
OC      (ex:Sector, ex:sector, 1), (foaf:Person, ex:ceo, 1)
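One naive way to combine the indexes is to join SC and OC on the property: each domain class found in SC is connected to each range class found in OC for the same property, and SCl entries become class attributes. The sketch below is our assumption, not the LODeX post-processing algorithm. It uses the values of the running example (foaf:Person as the range of ex:ceo, a single dbpedia:fax attribute) and leaves out class instance counts, which are not part of SC, SCl and OC.

```python
from collections import defaultdict

# Index values of the running example, as (class, property, count).
SC = [("foaf:Organization", "ex:ceo", 1), ("foaf:Organization", "ex:sector", 2)]
SCL = [("foaf:Person", "foaf:firstName", 1), ("foaf:Person", "foaf:lastName", 1),
       ("ex:Sector", "dc:title", 1), ("foaf:Organization", "ex:activity", 1),
       ("foaf:Organization", "dbpedia:fax", 1)]
OC = [("ex:Sector", "ex:sector", 1), ("foaf:Person", "ex:ceo", 1)]


def build_schema_summary(sc, scl, oc):
    """Naive combination: edges from joining SC and OC on the property."""
    ranges = defaultdict(list)
    for c, p, _ in oc:
        ranges[p].append(c)

    edges = []
    for domain, p, n in sc:
        for range_ in ranges.get(p, []):
            # Each edge is weighted with the SC count of its domain class.
            edges.append((domain, p, range_, n))

    attributes = defaultdict(dict)
    for c, p, n in scl:
        attributes[c][p] = n
    return sorted(edges), dict(attributes)


edges, attributes = build_schema_summary(SC, SCL, OC)
print(edges)
# [('foaf:Organization', 'ex:ceo', 'foaf:Person', 1),
#  ('foaf:Organization', 'ex:sector', 'ex:Sector', 2)]
```

When a property has several domain and range classes, this cross product can connect class pairs that never co-occur in the data, so a real implementation would likely need finer-grained statistics.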
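Resulting Schema Summary for the running example:
• Classes (nodes): foaf:Organization (2), ex:Sector (1), foaf:Person (1)
• Properties (edges): foaf:Organization to ex:Sector via ex:sector (2); foaf:Organization to foaf:Person via ex:ceo (1)
• Attributes: ex:Sector: dc:title (1); foaf:Person: foaf:firstName (1), foaf:lastName (1); foaf:Organization: ex:activity (1), dbpedia:fax (1)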
Visualization & Querying
Schema Summary Visualization: the front end of the Web application, composed of three panels:
• List of datasets indexed in LODeX
• Schema Summary and query building panel
• Refinement panel
Query Orchestrator:
• It manages the interaction between the user and the GUI
• It contains a SPARQL compiler that compiles the visual query into a SPARQL query
Schema Summary – Building a Visual Query
Refinement Panel
Visual Query & SPARQL compiler
[Figure: Schema Summary, Basic Query, SPARQL compiler, SPARQL query]
The visual query has a tree structure. A SPARQL compiler uses a recursive algorithm to generate the corresponding SPARQL query.
Operators supported by the compiler:
• AND
• OPTIONAL
• FILTER
• ORDER BY
• LIMIT
• OFFSET
The query is sent to the SPARQL endpoint and the results can be visualized in a tabular view.
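A recursive compilation from a query tree to SPARQL can be sketched as follows. The node structure, the function names and the example prefixes are hypothetical, and this is not the LODeX compiler: each node contributes a type pattern, its attributes and child links are joined (AND), optional branches are wrapped in OPTIONAL, node filters become FILTER clauses, and ORDER BY, LIMIT and OFFSET are appended as solution modifiers. PREFIX declarations for the prefixed names are omitted and would be needed before sending the query to an endpoint.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A class node in a hypothetical visual query tree."""
    var: str                                        # variable bound to the instances
    cls: str                                        # class, e.g. "foaf:Organization"
    attributes: list = field(default_factory=list)  # (property, variable, is_optional)
    children: list = field(default_factory=list)    # (property, Node, is_optional)
    filters: list = field(default_factory=list)     # FILTER expressions, as strings


def compile_node(node: Node) -> list:
    """Recursively compile a subtree into graph patterns; siblings are joined (AND)."""
    patterns = [f"?{node.var} a {node.cls} ."]
    for prop, var, optional in node.attributes:
        triple = f"?{node.var} {prop} ?{var} ."
        patterns.append(f"OPTIONAL {{ {triple} }}" if optional else triple)
    for prop, child, optional in node.children:
        sub = [f"?{node.var} {prop} ?{child.var} ."] + compile_node(child)
        if optional:
            patterns.append("OPTIONAL { " + " ".join(sub) + " }")
        else:
            patterns.extend(sub)
    patterns.extend(f"FILTER ({expr})" for expr in node.filters)
    return patterns


def compile_query(root: Node, select: list, order_by: Optional[str] = None,
                  limit: Optional[int] = None, offset: Optional[int] = None) -> str:
    """Wrap the compiled tree in a SELECT with optional solution modifiers."""
    head = "SELECT " + " ".join(f"?{v}" for v in select)
    body = "\n  ".join(compile_node(root))
    query = head + " WHERE {\n  " + body + "\n}"
    if order_by is not None:
        query += f"\nORDER BY ?{order_by}"
    if limit is not None:
        query += f"\nLIMIT {limit}"
    if offset is not None:
        query += f"\nOFFSET {offset}"
    return query


# Organizations with the title of their sector and, if present, their fax number.
sector = Node("sector", "ex:Sector", attributes=[("dc:title", "title", False)],
              filters=['?title = "Energy"'])
org = Node("org", "foaf:Organization",
           attributes=[("dbpedia:fax", "fax", True)],
           children=[("ex:sector", sector, False)])
print(compile_query(org, ["org", "title", "fax"], order_by="title", limit=10))
```

Filters of a child node are emitted inside that child's patterns, so a filter on an optional branch stays inside its OPTIONAL group.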
Evaluation of LODeX
We performed three kinds of evaluation to inspect:
• Portability of LODeX to SPARQL endpoints
• SPARQL expressiveness
• Usability of LODeX
  – to verify whether the graph visualization of the Schema Summary (SS) clearly represents the structure of a dataset
  – to test whether the visual query panel is a powerful and adequate way to generate SPARQL queries
Portability evaluation
We evaluated the complexity of the graph visualization with a group of 5 students.
• Task: find a node in graphs of increasing size
The test set is composed of 185 datasets taken from Datahub.
Portability test result                   Datasets      %
Huge Schema Summary (more than 80 nodes)        40    21%
Offline endpoints                                7     4%
Non-standard response                           28    15%
Passed the test                                110    60%
SPARQL expressiveness
We analyzed which kinds of SPARQL queries LODeX is able to generate, using as reference the queries contained in the Berlin SPARQL Benchmark (BSBM) [3].
• LODeX is able to generate 6 of the 10 queries contained in BSBM
Not supported:
• UNION queries
• CONSTRUCT queries
• ASK queries
Supported:
• All acyclic JOIN queries
• All FILTER queries
• All ORDER queries
User Evaluation
We conducted an online survey in which 27 users were enrolled.
The survey is divided into two parts with different goals:
• Evaluate the clarity of the Schema Summary
• Evaluate the functionality of visual query building
For each part we designed a set of tasks and a SUS [4] questionnaire.
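Each SUS questionnaire is scored with Brooke's standard procedure: the 10 items are answered on a 1-5 scale, odd (positively worded) items contribute the answer minus 1, even (negatively worded) items contribute 5 minus the answer, and the sum is multiplied by 2.5 to give a score between 0 and 100. A minimal sketch; the answers below are made up for illustration and are not survey data.

```python
from statistics import median


def sus_score(responses: list[int]) -> float:
    """Score one SUS questionnaire: 10 answers on a 1-5 scale."""
    if len(responses) != 10 or any(r not in range(1, 6) for r in responses):
        raise ValueError("SUS needs exactly 10 answers between 1 and 5")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5


# Hypothetical answers from three participants.
answers = [
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
]
scores = [sus_score(a) for a in answers]
print(scores, median(scores))  # → [100.0, 75.0, 50.0] 75.0
```

A group result such as a median is then taken over the per-participant scores rather than over the raw item answers.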
Clarity of Schema Summary
Datasets:
• Bio2RDF - INOH - pathway database of model organisms
• Linked Open Aalto Data Service - open data published by Aalto University
Tasks:
• (T1) Indicate the topic of the dataset
• (T2) Find the class with the largest number of instances
• (T3) Find the classes connected to a given class chosen by us
• (T4) Find the most used attribute of a class chosen by us
Task     n   Correct      %
T1      54        48    89%
T2      54        48    89%
T3      27        23    89%
T4      27        27   100%
Total  162       148    91%
Query building functionalities
Dataset: Nobel Prizes - Linked Open Data about every Nobel Prize
Tasks:
• (Q1) Return all the different categories of Nobel prizes
• (Q2) Return a table containing the list of Nobel prize winners ordered by the name of the winner; the table has to contain the date of birth of the winner.
• (Q3) Find the award files related to the award of Peter W. Higgs
• (Q4) Find the organizations that won a Nobel prize after 1999
Task     n   Correct      %
Q1      27        27   100%
Q2      27        26    96%
Q3      27        22    81%
Q4      27        23    85%
Total  108        98    90%
Query building functionalities (2)
We obtained a median SUS score of 85.5.
• No remarkable differences between skilled and unskilled users
• This score classifies the usability of LODeX as “Excellent” [5]
Feedback
• Unskilled users wrote their first SPARQL query
• “LODeX is cognitively less demanding that write SPARQL query”
• Browser rendering differences
• Starting a query can be difficult, and keyword search techniques could be helpful
Conclusion and Future Work
Conclusion
• LODeX is portable to 60% of the datasets tested
  – 19% of the datasets fail because of endpoint issues (offline endpoints or non-standard responses)
• Both skilled and unskilled users appreciated LODeX
Future work
• Modify the interface of LODeX according to the results of the online survey
• Define clustering and new browsing techniques to reduce the complexity of the Schema Summary for huge datasets
• Extend the set of operators supported by the SPARQL compiler
References
• [1] F. Benedetti, S. Bergamaschi, and L. Po. A visual summary for linked open data sources. International Semantic Web Conference (Posters & Demos), 2014.
• [2] F. Benedetti, S. Bergamaschi, and L. Po. Online index extraction from linked open data sources. Linked Data for Information Extraction (LD4IE) Workshop held at the International Semantic Web Conference, 2014.
• [3] C. Bizer and A. Schultz. Benchmarking the performance of storage systems that expose SPARQL endpoints.
• [4] J. Brooke. SUS: a quick and dirty usability scale. Usability Evaluation in Industry, 189(194):4–7, 1996.
• [5] A. Bangor, P. Kortum, and J. Miller. Determining what individual SUS scores mean: adding an adjective rating scale.
Acknowledgment
Thanks for your attention!
Try LODeX at: http://dbgroup.unimo.it/lodex2