ESWC SS 2013 - Tuesday Tutorial 2, Maribel Acosta and Barry Norton: Interaction with Linked Data
TRANSCRIPT
Interaction with Linked Data
Presented by: Maribel Acosta, Barry Norton
Motivation: Music!
Figure: architecture of the music application. Labels: Application (Visualization Module, Analysis & Mining Module); LD Dataset Access (SPARQL Endpoint, Publishing); Integrated Dataset (Interlinking, Cleansing, Vocabulary Mapping); Data acquisition (Physical Wrapper, LD Wrapper, R2R Transf., RDF/XML, RDFa); sources (Metadata, Streaming providers, Downloads, Musical Content, Other content).
EUCLID – Interaction with Linked Data
Motivation: Music! (2)
• Our aim: build a music-based portal using Linked Data technologies
• So far, we have studied different mechanisms to consume Linked Data:
  • Executing SPARQL queries
  • Dereferencing URIs
  • Downloading RDF dumps
  • Extracting RDFa data
• The output of these mechanisms corresponds to data in machine-readable formats
CH 2
CH 3
CH 1
Examples of machine-readable output:
Motivation: Music! (3)
Visualization techniques are needed to transform the machine-readable data into this:
Motivation: Music! (4)
Source: http://musicbrainz.fluidops.net/
In addition, visualization techniques allow for:
Motivation: Music! (5)
• Telling a story
• Engaging our pattern-matching brain
• Identifying data characteristics which cannot be directly inferred from statistical properties:
  • Anscombe's quartet: four very different datasets with the same statistical values.
Image: http://en.wikipedia.org/wiki/Anscombe's_quartet
Source: Donaldson, I. and Lamere, P. Using Visualizations for Music Discovery
Image: Chan, W., Qu, H., Mak, W. Visualizing the Semantic Structure in Classical Musical Works.
Agenda
1. Linked Data visualization
2. Linked Data search
3. Methods for Linked Data analysis
LINKED DATA VISUALIZATION
LD Visualization Techniques
• Linked Data visualization techniques should provide graphical representations of the information within the LD datasets
• Visualization techniques should be selected according to:
  – The type of data: specific types of data should be visualized in a certain way
  – The purpose of the visualization: depending on the type of analysis/application to employ
LD Visualization Techniques (2)
• (Raw) RDF data: Instance data, taxonomies, ontologies, vocabularies.
• Analytically extracted data: Subset of the data, called the region of interest (ROI), obtained via data extraction mechanisms, for example SPARQL queries.
• Visualization abstraction: Obtained by applying visualization transformations to render the data into displayable information.
• View: Final result. The visual mapping transformations obtain a graphic representation of the data using the selected visualization technique.
• User interaction: The user interacts (click, zoom, etc.) with the visualization, which may trigger a new visualization process.
Overview of the Linked Data Visualization process:
RDF data → (Data extraction) → Analytically extracted data → (Visualization transformation) → Visualization abstraction → (Visual mapping transformation) → View, with optional User interaction
Process partially based on: Brunetti, J.M.; Auer, S.; García, R. The Linked Data Visualization Model.
country releases
United Kingdom 225
United States 140
Germany 30
Luxembourg 29
LD Visualization Techniques (3)
Example of the Linked Data Visualization process
RDF data
Analytically extracted data
Visualization abstraction
SELECT ?country (COUNT(?release) AS ?releases)
WHERE {
  <http://dbpedia.org/resource/The_Beatles> foaf:made ?release .
  ?release a mo:Release ;
           mo:label ?label .
  ?label foaf:based_near ?country .
}
GROUP BY ?country
ORDER BY DESC(?releases)
Data extraction
SPARQL query: Retrieve number of releases per country of The Beatles
{{ #widget : HeatMap | input = 'country_code' | output = 'releases' }}
Visualization transformation
country_code releases
GB 225
US 140
DE 30
LU 29
?country_code2 := REPLACE(str(?country), "http://ontologi.es/place/", "", "i")
?country_code := REPLACE(?country_code2, "%", "", "i")
Formatting the names of the countries
View
Visual mapping transformation
Selecting the visualization technique (input, output)
Can be performed in a single step
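The formatting step of this example can be sketched in Python. This is only an illustration, not the Information Workbench implementation: the function name `country_code` and the sample rows are hypothetical, and it assumes country URIs of the form http://ontologi.es/place/GB, as implied by the REPLACE calls on the slide.

```python
import re

PLACE_PREFIX = "http://ontologi.es/place/"  # prefix stripped by the first REPLACE on the slide


def country_code(country_uri: str) -> str:
    """Mirror the two REPLACE steps: drop the URI prefix, then drop any '%' characters."""
    without_prefix = re.sub(re.escape(PLACE_PREFIX), "", country_uri, flags=re.IGNORECASE)
    return without_prefix.replace("%", "")


# Hypothetical result rows (country URI, number of releases), shaped like the query output.
rows = [
    (PLACE_PREFIX + "GB", 225),
    (PLACE_PREFIX + "US", 140),
]

# Table handed to the visualization widget: country_code -> releases.
heatmap_input = {country_code(uri): releases for uri, releases in rows}
print(heatmap_input)
```

The resulting mapping has the shape of the (country_code, releases) table that the widget takes as input and output columns.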
LD Visualization Techniques (3)
Example of the Linked Data Visualization process
View
Challenges for Linked Data Visualization
• Enabling user interaction
  – Users must be able to navigate through the data by exploiting the connections between Linked Data resources
  – The user might edit the underlying data to enrich it by:
    • Creating additional metadata
    • Highlighting or correcting errors
    • Validating data
• Supporting data reusability
  – The output (the plotted data or the visualization itself) might be encoded using standard ontologies and vocabularies
• Scalability
  – Linked Data visualization techniques should support the display of large amounts of data in an efficient way
Challenges for Linked Open Data Visualization
• Extracting data from different repositories
  – A Linked Data set might be partitioned into several repositories
  – The region of interest (ROI) might include data from different data sets, requiring access to distributed repositories
• Handling heterogeneous data
  – The same data (concepts) might be modeled differently, for example, using different vocabularies
  – Certain values might have different formats, for example, dates represented as DD-MM-YYYY, MM-DD-YYYY or just YYYY
• Dealing with missing values
  – Due to the semi-structuredness of Linked Data, some instances might have missing values for certain properties
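The date-format and missing-value challenges can be illustrated with a small Python sketch. It is a sketch under stated assumptions, not a prescribed method: `normalize_date` and `DEFAULT_FORMATS` are hypothetical names, and the order in which formats are tried is an assumption that has to be decided per dataset, because DD-MM-YYYY and MM-DD-YYYY are ambiguous for values such as 03-04-1965.

```python
from datetime import datetime
from typing import Optional

# Formats tried in order. The order is an explicit, per-dataset assumption:
# 03-04-1965 parses under both DD-MM-YYYY and MM-DD-YYYY.
DEFAULT_FORMATS = ("%d-%m-%Y", "%m-%d-%Y", "%Y")


def normalize_date(value: Optional[str], formats=DEFAULT_FORMATS) -> Optional[str]:
    """Return an ISO 8601 date (or a bare year), or None for missing or unparseable values."""
    if value is None or not value.strip():
        return None  # missing value: keep it explicit instead of guessing
    text = value.strip()
    for fmt in formats:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        # Keep year-only values as a year; do not invent a month and day.
        return parsed.strftime("%Y") if fmt == "%Y" else parsed.date().isoformat()
    return None


print(normalize_date("22-03-1963"), normalize_date("1963"), normalize_date(None))
```

Returning None for missing values lets the visualization decide how to show gaps instead of silently plotting a default.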
Classification of Visualization Techniques
Task, followed by suitable visualization techniques:
Comparison of attributes / values
  • Bar/column and pie chart
  • Line charts
  • Histogram
Analysis of relationships and hierarchies
  • Graph
  • Arc diagram
  • Matrix
  • Node-link visualizations
  • Space-filling techniques: treemaps, icicles and sunburst, circle packing and rose diagrams
Analysis of temporal or geographical events
  • Timeline
  • Maps
Analysis of multi-dimensional data
  • Parallel coordinates
  • Radar/star chart
  • Scatter plot
Bar/column chart: Allows the comparison of values of different categories.
Pie chart: Useful for comparing percentages or proportions.
Comparison of Attributes / Values
Line chart: Allows visualizing data as a series of data points, where the measurement points (x-axis) are ordered.
Histogram: Graphical representation of the distribution of the data.
Image sources: http://mbostock.github.io/protovis/ and http://musicbrainz.fluidops.net
Analysis of Relationships and Hierarchies
Graph: The data entries are represented as nodes and the links as edges.
Arc diagram: The nodes are displayed in one dimension, and the arcs represent the connections.
Adjacency matrix diagram: The nodes are displayed as rows and columns, and the links between the nodes are entries in the matrix.
Node-link visualizations: The data is organized in hierarchies.
Source of images: http://mbostock.github.io/protovis/
Analysis of Relationships and Hierarchies (2)
Space-filling techniques
Treemaps: Subdivide the area into rectangles.
Icicles and sunburst: Hierarchies are represented by adjacencies.
Circle-packing: Containment is used to represent the hierarchies.
Rose diagrams: Sectors have equal angles, and the data is represented by how far each sector extends.
Source of images: http://mbostock.github.io/protovis/
Analysis of Temporal or Geographical Events
Timeline
Timeline types: discrete data points in time; continuous data in time
Source: http://www.kottke.org/08/08/2008-movie-box-office-chart Source: http://musicbrainz.fluidops.net
Maps
Choropleth maps: Aggregate data by geographical area
Location maps: Display geo-points on a map
Dorling cartograms: Aggregate data and replace each area with a circle
Source: http://mbostock.github.io/protovis/
Source: Google Map API Source: http://musicbrainz.fluidops.net
Scatter plot: Displays the values of two variables as points in a plane, which is useful for spotting relationships between them.
Analysis of Multidimensional Data
Radar/star chart: Displays multivariate data as a two-dimensional chart. The axes correspond to the variables.
Parallel coordinates: Allows visualizing high-dimensional data. Each vertical axis denotes a dimension, and a multidimensional point is represented as a polyline with vertices on the axes.
Source: http://mbostock.github.io/protovis/
Other Visualization Techniques
• Text-based visualizations: tag clouds
• Some of the previously presented techniques can be combined to produce more complex data visualizations
Phrase Net of Beatles Lyrics DBpedia music genres
Source: http://www.wordle.net Source: http://many-eyes.com
• Get an overview of the data
• Identifying relevant resources, classes or properties in datasets
• Learning about certain underlying characteristics of the data, e.g., vocabularies or ontologies
• Detecting missing links between nodes in an RDF graph
• Discovering new paths between nodes in an RDF graph
• Identifying hidden patterns in the data
• Finding errors or atypical values (outliers)
Applications of Linked Data Visualization Techniques
Linked Data Visualization Tool Requirements
The requirements for visualization tools that consume Linked Data can be summarized as follows:
• Data navigation and exploration capabilities in order to understand the structure and the content
• Exploiting data structures:
  • Links to visualize hierarchies or graphs
  • Multi-dimensional
• User interaction:
  • Basic and advanced querying
  • Filtering values
  • Interactive UI: responsive to the user input
• Publication/syndication of the graphical representation of the data
• Data extraction in order to export the data so that it can be reused by third parties
Linked Data Visualization Tool Types
1. LD browsers with text-based representation
  • Dereference URIs to retrieve the resource description
  • Use a textual representation of LD resources
  • Display texts and images adequately
  • Mainly support exploratory browsing and knowledge discovery
2. LD and RDF browsers with visualization options
  • Exploit pictures, graphics, images and other visual representations of the data
  • Support user interaction: allow querying, filtering and jumping between resources
  • Suitable for browsing and knowledge discovery as well as analytic activities
Linked Data Visualization Tool Types (2)
3. Visualization toolkits
  • Frameworks providing a wide range of visualization techniques
  • General toolkits support LD visualization by applying a set of transformations to the data
  • Some toolkits are specially designed to consume LD
4. SPARQL visualization
  • These tools allow transforming the output of SPARQL queries into graphics
  • Contact SPARQL endpoints in order to evaluate the query
  • Suitable for analytical activities
Linked Data Visualization Tool Types (3)
LD browsers with text-based presentations
Sig.ma
Sindice
OpenLink RDF Browser
Marbles
Disco Hyperdata Browser
Piggy Bank (SIMILE)
Zitgist DataViewer
iLOD
URI Burner
Dipper – Talis Platform Browser
LD and RDF browsers with visualization options
Tabulator
IsaViz
OpenLink Data Explorer
RDF Gravity
RelFinder
DBpedia Mobile
LESS
SIMILE Exhibit
Haystack
FoaF Explorer
Humboldt
LENA
Noadster
Visualization toolkits
Linked Data tools: Information Workbench
Visual RDF (by Graves)
LOD Live
LOD Visualization
Data-Driven Documents (D3)
NetworkX
Many Eyes
Tableau
Prefuse
SPARQL visualization
Information Workbench
Google Visualization API
SPARQL package for R
Gruff (for AllegroGraph)
Linked Data:
General data:
Linked Data Visualization Examples (1)
Sig.ma
Source: http://sig.ma/search?q=The+Beatles
Retrieves information from different LD sources
Keyword search
Displays values per predicate
Displays the source for each value
Linked Data Visualization Examples (2)
Sig.ma
Source: http://sig.ma/search?q=The+Beatles
Displays values per predicate:
May include (redundant) information in different languages, for example: annés and anno
Summary:
• Sig.ma lists all the triples and groups them per predicate
• Useful for browsing predicates and values within data sets
• The meaning of the values is not evident
URIs are clickable, allowing navigation through RDF resources
Linked Data Visualization Examples (3)
Sindice
Keyword search
Filtering per type of document
Retrieves links to documents
Allows accessing cached documents
Allows inspecting resources
Source: http://sindice.com/search?q=The+Beatles
Linked Data Visualization Examples (4)
Sindice
Both interfaces display the set of triples related to the inspected resource
Cache triples
Live triples
Linked Data Visualization Examples (5)
Information Workbench
• Demo available at: http://musicbrainz.fluidops.net
• Displays human-readable content about Linked Data resources
• Supports visualization techniques (different types of charts, maps, timelines, etc.) to plot results from SPARQL queries
• Allows the user to interact with the displayed data
Linked Data Visualization Examples (6)
Information Workbench: Browsing a music artist
(1) Search options
(2) Search results
Linked Data Visualization Examples (7)
Information Workbench: Browsing a music artist
(3) Browsing the selected resource
Linked Data Visualization Examples (8)
Information Workbench: Visualization techniques
(3) Browsing the selected resource
Linked Data Visualization Examples (9)
Information Workbench: User interaction
LD visualizations must support navigation through the data
Source: http://musicbrainz.fluidops.net/resource/Analytical5
Linked Data Visualization Examples (9)
Information Workbench: SPARQL Visualization
Implements widgets which allow:
• Retrieving the ROI via SPARQL queries
• Selecting the appropriate visualization technique
• Configuring parameters of the visualization
Linked Data Visualization Examples (10)
Information Workbench: SPARQL visualization
SPARQL Query:
SELECT ?release (SUM(xsd:double(?duration / 60000)) AS ?minutes)
WHERE {
  <http://dbpedia.org/resource/The_Beatles> foaf:made ?release .
  ?release mo:record ?record .
  ?record mo:track ?track .
  ?track mo:duration ?duration .
}
GROUP BY ?release
ORDER BY DESC(?minutes)
LIMIT 10
Result set
Top ten The Beatles releases according to the sum of track durations in minutes
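To make the aggregation concrete, the same computation can be sketched in plain Python over a handful of rows. The release names and durations below are invented for illustration, and `top_releases` is a hypothetical helper; the sketch only assumes, as the query does, that durations are stored in milliseconds.

```python
from collections import defaultdict

# Hypothetical (release, track duration in milliseconds) pairs, one per track.
tracks = [
    ("Release A", 120000),
    ("Release A", 180000),
    ("Release B", 240000),
    ("Release C", 90000),
]


def top_releases(rows, limit=10):
    """GROUP BY release, SUM(duration / 60000), ORDER BY DESC, LIMIT."""
    minutes = defaultdict(float)
    for release, duration_ms in rows:
        minutes[release] += duration_ms / 60000  # milliseconds -> minutes
    ranked = sorted(minutes.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


print(top_releases(tracks))
```

Each (release, minutes) pair corresponds to one bar in a bar chart of the result set.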
Linked Data Visualization Examples (11)
Information Workbench: SPARQL visualization
Top ten The Beatles releases according to the sum of track durations in minutes
Widget
Visualization: Bar chart
{{#widget: BarChart | query ='SELECT (COUNT(?Release) AS ?COUNT) ?label WHERE { <http://musicbrainz.org/artist/8538e728-ca0b-4321-b7e5-cff6565dd4c0#_> foaf:made ?Release . ?Release rdf:type mo:Release . ?Release dc:title ?label . } GROUP BY ?label ORDER BY DESC(?COUNT) LIMIT 20' | settings = 'Settings:barvertical_mb' | asynch = 'true' | input = 'label' | output = 'COUNT' | height = '300'}}
Linked Data Visualization Examples (12)
Information Workbench: SPARQL visualization
Top ten The Beatles releases according to the sum of track durations in minutes
Other visualizations of the same result set …
Line chart:
Pie chart:
Linked Data Visualization Examples (13)
Information Workbench: Automated Widget Suggestion
Bar chart
Line chart
Pie chart
Table
Pivot view
Select a suggested visualization
Visualization automatically built
Linked Data Visualization Examples (14)
Other tools
LOD live (Source: http://en.lodlive.it)
• Graph visualizations
• Interactive UI (the graph can be expanded by clicking on the nodes)
• Live access to SPARQL endpoints
LOD Visualization (Source: http://lodvisualization.appspot.com)
• Hierarchy visualizations: treemaps and trees
• Live access to SPARQL endpoints (supporting JSON and SPARQL 1.1)
Linking Open Data Cloud Visualization (1)
“The Linking Open Data cloud diagram” by Richard Cyganiak and Anja Jentzsch
Source: http://lod-cloud.net
• The nodes correspond to Linked Data sets
• The edges represent connections between Linked Data sets
• The size of the nodes is proportional to the number of triples in each data set
• The datasets are categorized by knowledge domains represented with colors
Linking Open Data Cloud Visualization (2)
Image source: http://twitpic.com/17qj1h
“Linked Open Data Cloud” generated by Gephi
• The central cluster (green) displays DBpedia as a central focus
• The size of the nodes reflects the size of the datasets
• The length of the connections encodes information about the data structure
Source: A. Dadzie and M. Rowe. Approaches to Visualizing Linked Data: A Survey. 2011
Linking Open Data Cloud Visualization (3)
“Linked Open Data Graph” by Protovis
Source: http://inkdroid.org/lod-graph/
• The data to be displayed are retrieved using the CKAN API
• The nodes represent Linked Data sets available in the Data Hub “lod-cloud” group
• The size of the nodes is proportional to the data set size
• Edges are connections between data sets
• The colors reflect the CKAN rating and the intensity of the color reflects the number of received ratings
• The nodes can be clicked to go to the data set CKAN page
LD Reporting
• Visualization techniques are used in the creation of reports included in data monitoring and management solutions
• Reporting provides an overview of the dataset by generating a low-level descriptive analysis:
  • Quantitative information about the dataset
• Users may interact with the data via dashboards
• Some systems support this feature over structured data:
  • Google Webmaster Tools (https://www.google.com/webmasters/tools)
  • Information Workbench (http://www.fluidops.com/information-workbench)
  • eCloudManager (http://www.fluidops.com/ecloudmanager)
Google Webmaster Tools: Structured Data Dashboard (1)
• Provides webmasters with information about the structured data embedded in their websites (and recognized by Google)
• The dashboard has three levels:
  i. Site-level view: aggregates the data by classes defined in the vocabulary schema
  ii. Item-type-level view: provides details per page for each type of resource
  iii. Page-level view: shows the attributes of every type of resource on a given web page
Google Webmaster Tools: Structured Data Dashboard (2)
Source: http://googlewebmastercentral.blogspot.de/2012/07/introducing-structured-data-dashboard.html
Site-‐level view
Google Webmaster Tools: Structured Data Dashboard (3)
Source: http://googlewebmastercentral.blogspot.de/2012/07/introducing-structured-data-dashboard.html
Page-‐level view
Site-‐level view
LINKED DATA SEARCH
Semantic Search Process
Using semantic models for the search process
Figure: the semantic search process between User and System. Labels: User query (e.g. keywords, NL); Query visualization (optional); Entity extraction / Semantic query analysis; Query; Graph matching; Data graphs; Presentation / Ranking; Result visualization/presentation; Analysis; Refinement. Regions marked: Faceted Search, Semantic Search.
Image based on: Tran, T., Herzig, D., Ladwig, G. SemSearchPro - Using semantics through the search process
Image source: http://musicontology.com
Semantic Search: Example (1)
User query (NL): “songs written by members of the beatles”
Entity extraction: song | member (of) | written by | (the) beatles
Query expansion: song → track, melody, tune, … (synonyms)
Entity mapping: candidates for “song”: mo:Track
Semantic Search: Example (2)
User query (NL): “songs written by members of the beatles”
Entity extraction: song | member (of) | written by | (the) beatles
Query expansion: written by → (inverse of) writer; synonyms of writer: composer, creator, …
Entity mapping: candidates for “written by”: mo:composer
Image source: http://musicontology.com
Semantic Search: Example (3)
User query (NL): “songs written by members of the beatles”
Entity extraction: song | member (of) | written by | (the) beatles
Query expansion and entity mapping: member (of) → mo:member_of, which is the inverse of mo:member
Image source: http://musicontology.com
Semantic Search: Example (4)
User query (NL): “songs written by members of the beatles”
Entity extraction: song | member (of) | written by | (the) beatles
Entity mapping: candidates for “(the) beatles”:
• Beatles (Book)
• The Beatles (Music Group)
• Beatle (Animal)
• Beatle (Automobile)
How to identify the right “Beatle”? Examine the context (Contextual Analysis)
Semantic Search: Example (5)
User query (NL): “songs written by members of the beatles”
Entity extraction: song | member (of) | written by | (the) beatles
Entity mapping: (the) beatles → The Beatles (Music Group), dbpedia:The_Beatles
Contextual analysis (figure): the subgraph that is part of the query links mo:Track to foaf:Agent via mo:composer, and mo:MusicGroup to its members via mo:member; mo:MusicGroup is an rdfs:subClassOf mo:MusicArtist, which is an rdfs:subClassOf foaf:Agent.
Semantic Search: Example (6)
User query (NL): “songs written by members of the beatles”
Entity extraction: song | member (of) | written by | (the) beatles
Query (graph pattern):
  ?y a mo:Track ;
     mo:composer ?x .
  ?x a foaf:Agent .
  dbpedia:The_Beatles mo:member ?x .
Results: (I want to) Come Home, Angel in Disguise, Another Day, …
The answers are presented to the user. The results could be ranked.
Semantic Search
• Aims at understanding the meaning of the resources specified in the query
• Different approaches to exploit semantics:
  • Query expansion using ontologies: Since ontologies represent knowledge about specific domains, they can be used to expand the query by incorporating related ontology terms into the query.
  • Contextual analysis: In LD, this approach may explore the resources specified in the query and their adjacent nodes in the RDF graph. Mainly applied to disambiguate query terms.
  • Reasoning: In some cases, the answer to a specific query is not explicitly contained in the data, but it can be computed by using reasoning methods.
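The first two approaches can be sketched in a few lines of Python, reusing the Beatles example. Everything here is illustrative: the synonym table, the label-to-term mapping, the `ex:` types and both function names are hypothetical stand-ins for what a real system would derive from a lexical resource and from the ontology. The expected type mo:MusicGroup is assumed because the query uses mo:member, whose subject is a music group.

```python
# Hypothetical synonym table and ontology labels; a real system would draw these
# from a lexical resource and from the vocabulary itself.
SYNONYMS = {
    "song": {"track", "melody", "tune"},
    "written by": {"writer", "composer", "creator"},
}
ONTOLOGY_TERMS = {"track": "mo:Track", "composer": "mo:composer", "member": "mo:member"}


def expand_and_map(keyword: str) -> set:
    """Query expansion: add synonyms, then keep the ontology terms whose label matches."""
    candidates = {keyword} | SYNONYMS.get(keyword, set())
    return {ONTOLOGY_TERMS[c] for c in candidates if c in ONTOLOGY_TERMS}


# Contextual analysis: candidate entities with their (hypothetical) types.
CANDIDATE_TYPES = {
    "The Beatles (Music Group)": "mo:MusicGroup",
    "Beatles (Book)": "ex:Book",
    "Beatle (Automobile)": "ex:Automobile",
}


def disambiguate(candidates: dict, expected_type: str) -> list:
    """Keep the candidates whose type fits the rest of the query graph."""
    return [name for name, type_ in candidates.items() if type_ == expected_type]


print(expand_and_map("song"))
print(disambiguate(CANDIDATE_TYPES, "mo:MusicGroup"))
```

Reasoning is left out of the sketch, since it depends on the inference rules supported by the underlying store.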
Semantic Search & Linked Data
Component | Semantic search | SPARQL query
Keyword or NL / concept matching | Performs entity extraction and matching to formal concepts | Not supported
Fuzzy concepts/relations/logics | Allows the application of fuzzy qualifiers as query constraints | Not supported
Graph patterns | Uses the context and other semantic information to locate interesting sub-graphs | Applies pattern matching
Path discovery | Finds new interesting links that may lead to additional information | Not supported
Semantic Search vs. SPARQL query
Semantic Search: Google (1)
Input: query in NL
Output: List of answers
Google performs semantic search on certain entities and queries!
Semantic Search: Google (2)
Input: question in NL
Output: List of web pages ranked using Google's PageRank algorithm to display the most relevant pages first
Semantic Search: DuckDuckGo (1)
Input: question in NL
Output: List of answers
Semantic Search: DuckDuckGo (2)
Performs disambiguation of the query terms.
The 45 suggestions are grouped by classes according to their corresponding knowledge domain. This approach is called Faceted Search.
Faceted Search: Example
Information Workbench: Searching for artists in categories
Facet
Facet
Facet
Source: http://musicbrainz.fluidops.net/resource/mo:MusicArtist?view=pivot
Depictions of artists
Faceted Search
• Facets = properties
• Suitable for browsing multi-dimensional taxonomies based on the search attributes
• Allows the user to explore the data:
  • The user submits a (keyword) query
  • The faceted system dynamically identifies the relevant facets (properties) for the given query and the constraints (values of those properties), and displays the search results
  • The user may “drill down” by applying specific constraints to the search results
• Information can be accessed and ranked in multiple ways
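The explore loop described above (compute facets and their values, then drill down) can be sketched in Python. This is a toy illustration over in-memory records, not how any of the systems named in this tutorial implement it; the records and the names `facet_counts` and `drill_down` are hypothetical.

```python
from collections import Counter

# Hypothetical records; each key other than "name" is treated as a facet (property).
artists = [
    {"name": "Artist A", "type": "group", "country": "GB"},
    {"name": "Artist B", "type": "person", "country": "GB"},
    {"name": "Artist C", "type": "group", "country": "US"},
]


def facet_counts(records, facets):
    """For each facet, count how many records carry each value (the facet preview)."""
    return {f: Counter(r[f] for r in records if f in r) for f in facets}


def drill_down(records, **constraints):
    """Keep only the records that match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in constraints.items())]


print(facet_counts(artists, ["type", "country"]))
british_groups = drill_down(artists, type="group", country="GB")
print([r["name"] for r in british_groups])
```

Recomputing `facet_counts` on the drilled-down records gives the updated previews for the next refinement step.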
Faceted Search (2)
Challenges for supporting Faceted Search
• Identifying which facets to surface:
  • In heterogeneous datasets, data entries may have different facets
  • Dynamically identify the most appropriate facets for each query
  • Ordering the facets depending on the relevance to the query
• Computing previews:
  • Accurately predicting counts, without examining all the results
  • Offering facet previews to give users an idea of what to expect
Source: Teevan, J., Dumais, S., Gutt, Z. Challenges for Supporting Faceted Search in Large, Heterogeneous Corpora like the Web
Faceted Search: LD Example (1)
FacetedDBLP
• Retrieves information from the DBLP collection
• Shows the result set with different facets:
  • Publication years
  • Authors
  • Conferences
• It is implemented upon the DBLP++ dataset (an enhancement of DBLP including additional keywords and abstracts):
  • DBLP++ is stored in a MySQL database
  • Uses D2R Server to consume RDF triples
Faceted Search: LD Example (2)
Input: “crowdsourcing”
Facets
485 results
FacetedDBLP
Classification of Search Engines
Semantic Search Systems
Faceted Search Systems
Google (GKG) Bing
KIM
sig.ma
LOD cloud cache /facet
Longwell
mSpace
Exhibit (SIMILE)
PoolParty Semantic Search Server
DuckDuckGo
Hakia
SenseBot
PowerSet
DeepDive
Kosmix
Factibles
Lexxe
Information Workbench
Searching for Semantic Data
Search for
• Ontologies
• Vocabularies
• RDF documents
Semantic Data Search Engines (1)
Searching for ontologies
Swoogle: http://swoogle.umbc.edu (keyword search)
Watson: http://kmi-web05.open.ac.uk/WatsonWUI (keyword search)
Semantic Data Search Engines (2)
Searching for vocabularies: LOV Portal
• Allows searching for properties, classes or vocabularies in the Linked Open Vocabularies (LOV) catalog
• The LOV search engine implements faceted search on:
  • The knowledge domain
  • The role of the resource matched from the input query
  • The vocabulary containing the resource
• Results are ranked according to a score considering:
  • Relevancy to the query (string)
  • Importance of the matched element labels
  • Number of LOV vocabularies that refer to the element
Semantic Data Search Engines (3)
Facets
84 results
Input: “artist”
CH 3
Searching for vocabularies: LOV Portal
Semantic Data Search Engines (4)
Searching for documents
Semantic Web Search Engine: http://swse.deri.org
Sindice: http://sindice.com
METHODS FOR LINKED DATA ANALYSIS
Features of Data Analysis
Data aggregation & filtering
• One of the first steps in data analysis is pre-processing in order to select the appropriate data to study
Statistical analysis
• Allows describing the data via Exploratory Data Analysis (EDA) methods
• Includes statistical inference and prediction
Machine learning
• Focuses on prediction
• Combines Artificial Intelligence and Statistics
• Includes supervised and unsupervised learning (not covered in this course)
Visualization techniques can be built on top of these as part of data analysis
LD Data Aggregation & Filtering
• Data aggregation refers to merging/summarizing several values into a single one
• Filtering allows retrieving relevant data properties and selecting a particular range of data values
• SPARQL supports these features via SELECT queries as follows:
Features | SPARQL capabilities
Aggregation | Combining aggregate functions (COUNT, SUM, AVG, …) and the GROUP BY operator
Filtering | Combining projection, FILTER and HAVING operators
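The difference between FILTER (applied to individual solutions before grouping) and HAVING (applied to groups after aggregation) is easy to miss, so here is a plain-Python analogue. It is a sketch of the semantics only: the bindings are invented and `releases_per_country` is a hypothetical helper, not part of any SPARQL library.

```python
from collections import Counter

# Hypothetical (release, country, year) bindings.
bindings = [
    ("r1", "GB", 1963), ("r2", "GB", 1965), ("r3", "US", 1964),
    ("r4", "GB", 1969), ("r5", "DE", 1966), ("r6", "US", 1970),
]


def releases_per_country(rows, min_year, min_count):
    # FILTER: keep rows whose year is at least min_year (applied before grouping).
    filtered = [(release, country) for release, country, year in rows if year >= min_year]
    # GROUP BY country with COUNT(release).
    counts = Counter(country for _, country in filtered)
    # HAVING: keep only groups whose aggregate passes the threshold (applied after grouping).
    return {country: n for country, n in counts.items() if n >= min_count}


print(releases_per_country(bindings, min_year=1964, min_count=2))
```

Projection corresponds to the choice of which columns (here country and count) are returned at the end.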
LD Statistical Analysis
• Statistical analysis supports descriptive and predictive operations
• SPARQL supports some descriptive operations (average, maximum, minimum) but does not offer more sophisticated statistical features like:
  • Fitting distributions
  • Linear regressions
  • Analysis of variance
  • …
• Some approaches are able to consume data retrieved from SPARQL endpoints:
  – “R for SPARQL” by Willem Robert van Hage & Tomi Kauppinen
  – “Performing Statistical Methods on Linked Data” by Zapilko & Mathiak
R – Statistical Computing
• R is a language and environment for statistical computing
• R provides a wide variety of statistical and graphical techniques:
  • Linear and nonlinear modeling
  • Classical statistical tests
  • Time-series analysis
  • Classification (Machine Learning)
  • Clustering (Machine Learning)
  • Extensible with further functionalities
• R is available as Free Software (under the terms of the GNU General Public License)
Statistical Analysis with R
R for SPARQL
• The R for SPARQL package makes it possible to:
  • Connect to a SPARQL endpoint over HTTP
  • Pose a SELECT query or an UPDATE operation (LOAD, INSERT, DELETE)
• If given a SELECT query, it returns the results as a data frame
  • The results can directly be mapped and visualized
• Posing requests:
  • If the parameter query is given, the input is assumed to be a SELECT query, and a GET request is performed to get the results from the URL of the endpoint
  • If the parameter update is given, the input is assumed to be an UPDATE operation, and a POST request is submitted to the URL of the endpoint. Nothing is returned
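The query/update distinction can be illustrated with Python's standard library. This sketch only builds the HTTP requests and does not send them; it is not the R package's code. The function name `build_request` and the endpoint http://example.org/sparql are hypothetical, and the form-encoded `query` and `update` parameters are assumed from common SPARQL endpoint conventions.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(endpoint: str, query: str = None, update: str = None) -> Request:
    """Build (but do not send) the HTTP request for a SELECT query or an UPDATE operation."""
    if (query is None) == (update is None):
        raise ValueError("give exactly one of 'query' or 'update'")
    if query is not None:
        # SELECT query: the query string travels in the URL of a GET request.
        return Request(endpoint + "?" + urlencode({"query": query}), method="GET")
    # UPDATE operation: the update string travels in the body of a POST request.
    body = urlencode({"update": update}).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(endpoint, data=body, headers=headers, method="POST")


select = build_request("http://example.org/sparql", query="SELECT * WHERE { ?s ?p ?o } LIMIT 1")
print(select.get_method(), select.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually perform the request against a live endpoint.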
Source: http://linkedscience.org/tools/sparql-package-for-r/
R for SPARQL: Example (1)
1. Download the R package and load it:
   library(SPARQL)
   library(sp)  # used for plotting spatial data
2. Define the endpoint with the triples:
   endpoint = "http://spatial.linkedscience.org/sparql"
3. Define the query:
   q = "SELECT ?cell ?row ?col ?polygon ?DEFOR_2002
   WHERE {
     ?cell a <http://linkedscience.org/lsv/ns#Item> ;
       <http://spatial.linkedscience.org/context/amazon/Lin> ?row ;
       <http://spatial.linkedscience.org/context/amazon/Col> ?col ;
       <http://observedchange.com/tisc/ns#geometry> ?polygon ;
       <http://spatial.linkedscience.org/context/amazon/DEFOR_2002> ?DEFOR_2002 .
   }"
Source: http://linkedscience.org/tools/sparql-package-for-r
R for SPARQL: Example (2)
4. Link the result to an object:
   res <- SPARQL(endpoint, q)$results
5. Handle the results:
   res$row <- -res$row              # negate the row index
   coordinates(res) <- ~ col + row  # use col and row as spatial coordinates
6. Choose the graphical format and plot the results:
   spplot(res, "DEFOR_2002", col.regions = rev(heat.colors(17))[-1], at = (0:16)/100, main = "relative deforestation per pixel during 2002")
Source: http://linkedscience.org/tools/sparql-package-for-r
R for SPARQL: Example (3)
Source: http://linkedscience.org/tools/sparql-package-for-r
Machine Learning
• Machine Learning techniques make it possible to extract interesting information from data sources, and can be used to discover hidden patterns within datasets by generalizing from examples
• Different ML approaches can be applied:
  • Clustering: groups similar data into data partitions called clusters
  • Association rule learning: discovers relations between variables
  • Decision tree learning: analyses observations to build a predictive model represented as a tree
  • Many others …
• Weka is a Data Mining framework commonly used to apply ML on tabular data:
  – www.cs.waikato.ac.nz/ml/weka
Machine Learning on LD
Challenges for applying Machine Learning on LD
• LD heterogeneity introduces noise to the data:
  – Same LD resources, different URIs
  – Predicates with similar semantics, but different constraints
• The data is not independent and identically distributed (iid):
  – It does not consist of only one type of object
  – The entities are related to each other
• LD rarely contains the negative examples needed by ML algorithms:
  – Explicit negative statements, for example owl:differentFrom, are rare
Source: http://www.cip.ifi.lmu.de/~nickel/iswc2012-slides
Applications of Machine Learning on LD
• Node ranking:
  – Ranking nodes according to their relevance for a query
• Link prediction:
  – Infer edges between LD resources
  – Predict the new edges that will be added to the RDF graph
• Entity resolution:
  – Determine whether two URIs correspond to the same real-world object
• Taxonomy learning:
  – Infer taxonomies or concept hierarchies from a given vocabulary or ontology
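As a rough illustration of entity resolution, the sketch below compares the descriptions of two URIs with Jaccard similarity. Note that this is a simple similarity heuristic swapped in for illustration, not a learned model; in an ML setting such a score would typically be one feature among many. The descriptions, the `ex:` properties, the threshold of 0.5 and the function names are all hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Share of (property, value) pairs that two descriptions have in common."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_entity(desc_a: set, desc_b: set, threshold: float = 0.5) -> bool:
    """Guess whether two URIs denote the same real-world object."""
    return jaccard(desc_a, desc_b) >= threshold


# Hypothetical descriptions of three URIs as sets of (property, value) pairs.
uri_1 = {("foaf:name", "The Beatles"), ("rdf:type", "mo:MusicGroup"), ("ex:formed", "1960")}
uri_2 = {("foaf:name", "The Beatles"), ("rdf:type", "mo:MusicGroup"), ("ex:origin", "Liverpool")}
uri_3 = {("foaf:name", "Beetle"), ("rdf:type", "ex:Automobile")}

print(same_entity(uri_1, uri_2), same_entity(uri_1, uri_3))
```

The heterogeneity challenges listed earlier apply directly: predicates with similar semantics but different names would lower the overlap unless the vocabularies are mapped first.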
Summary
• Linked Data visualization techniques:
  • Visualizations must be chosen according to the type of the data
  • Wide variety of tools supporting the visualization of SPARQL results
  • Might be used in dashboards for supporting administrative tasks
• Linked Data search:
  • Semantic search: exploits the meaning of user queries (NL or set of keywords) to present useful results
  • Faceted search: allows browsing multi-dimensional data
• Linked Data analysis:
  • Includes data manipulation such as aggregation & filtering
  • Applies statistical methods to get a better understanding of the data
  • Machine Learning techniques can be applied for predictive analysis
  • Visualization techniques can be built on top of the previous features
For exercises, quizzes and further material, visit our website: http://www.euclid-project.eu (eBook, Course)
Other channels: @euclid_project euclidproject euclidproject