Methodological Guidelines for Publishing Linked Data
Boris Villazón-Terrazas
@boricles Slides available at: http://www.slideshare.net/boricles/
Acknowledgements: OEG
Main References
§ Wood, David (Ed.). Linking Government Data. 2011.
- Villazón-Terrazas, Boris; Vilches, Luis M.; Corcho, Oscar; Gómez-Pérez, Asunción. Methodological Guidelines for Publishing Government Linked Data.
§ Hyland, Bernadette; Villazón-Terrazas, Boris; Atemezing, Ghislain. Best Practices for Publishing Linked Data. W3C Editor’s Draft, Government Linked Data Working Group. https://dvcs.w3.org/hg/gld/raw-file/default/bp/index.html
§ Hyland, Bernadette; Villazón-Terrazas, Boris. Cookbook for Open Government Linked Data. W3C Editor’s Draft, Government Linked Data Working Group. http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook
ToC » Introduction
» Guidelines for Publishing Linked Data
» Use Cases
Publishing Linked Data
A process that involves many steps, design decisions and technologies.
Iterative and Incremental Linked Data Life Cycle: Specification, Modelling, Generation, Publication, Exploitation
Specification
§ Identification and analysis of the data sources
§ URI design
§ Definition of the license
Specification: Identification and analysis of the data sources
After identifying and selecting the government data sources:
§ Search for and compile all the available data and documentation about those resources
§ Identify the schema of those resources, including conceptual components and their relationships
§ Identify the items in the domain, i.e., things whose properties and relations are described in the data sources
Specification: URI Design
§ Use meaningful URIs instead of opaque URIs, when possible
§ Separate TBox (ontology model) URIs from ABox (instance) URIs
- Base URI: e.g., http://data.gov.bo/ or http://health.data.gov.bo/
- TBox URIs: http://data.gov.bo/ontology/{class|property}
- ABox URIs: http://data.gov.bo/resource/, e.g., http://data.gov.bo/resource/province/Tiraque
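The URI patterns above can be sketched as a few helper functions. This is a minimal sketch, assuming the base URI http://data.gov.bo/ from the example and a {type}/{name} layout under /resource/ inferred from the Tiraque example; the function names slugify, tbox_uri and abox_uri are hypothetical.

```python
import re
import unicodedata

# Base URI taken from the slide's example (assumption: adapt to your own domain).
BASE = "http://data.gov.bo/"


def slugify(label: str) -> str:
    """Turn a human-readable label into a readable (not opaque) URI segment."""
    # Strip accents so the segment stays ASCII, e.g. "José" -> "Jose".
    ascii_label = unicodedata.normalize("NFKD", label).encode("ascii", "ignore").decode("ascii")
    # Replace runs of characters other than letters, digits and hyphens with "_".
    return re.sub(r"[^A-Za-z0-9-]+", "_", ascii_label.strip()).strip("_")


def tbox_uri(term: str) -> str:
    """Ontology (TBox) URI: {base}ontology/{class|property}."""
    return f"{BASE}ontology/{term}"


def abox_uri(resource_type: str, label: str) -> str:
    """Instance (ABox) URI: {base}resource/{type}/{name} (hypothetical layout)."""
    return f"{BASE}resource/{resource_type}/{slugify(label)}"
```

For example, abox_uri("province", "Tiraque") yields http://data.gov.bo/resource/province/Tiraque and tbox_uri("Province") yields http://data.gov.bo/ontology/Province. Deriving names from labels keeps URIs meaningful, but labels can change or collide, so a stable local identifier may be a safer key in some datasets.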
Iterative and Incremental Linked Data Life Cycle
Modelling: Reuse available vocabularies
§ Search for suitable vocabularies, e.g., in Linked Open Vocabularies
§ Are there suitable vocabularies?
- Yes: build the vocabulary by reusing the available vocabularies
- No: continue with the reuse of available non-ontological resources
Modelling: Reuse available non-ontological resources
§ Search for suitable non-ontological resources in highly reliable Web sites, domain-related sites and government catalogs
§ Are there suitable resources?
- Yes: build the vocabulary by transforming the available resources
- No: build the vocabulary from scratch
Villazón-Terrazas, Boris. A Method for Reusing and Re-Engineering Non-Ontological Resources for Building Ontologies. IOS Press, 2012.
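Read together, the two modelling slides describe a cascade: reuse existing vocabularies if possible, otherwise transform non-ontological resources, otherwise build the vocabulary from scratch. The function below is a hypothetical sketch of that decision order only; the searches themselves (Linked Open Vocabularies, Web sites, catalogs) are assumed to have been done by hand, with their results passed in.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SearchResults:
    """Hypothetical container for what the manual searches found."""
    # Vocabularies judged suitable, e.g. after searching Linked Open Vocabularies.
    vocabularies: List[str] = field(default_factory=list)
    # Non-ontological resources judged suitable (e.g. code lists or classification
    # schemes from reliable Web sites, domain-related sites or government catalogs).
    non_ontological_resources: List[str] = field(default_factory=list)


def modelling_strategy(results: SearchResults) -> str:
    """Apply the two yes/no decisions in order and return the chosen option."""
    if results.vocabularies:
        return "reuse vocabularies: " + ", ".join(results.vocabularies)
    if results.non_ontological_resources:
        return "transform resources: " + ", ".join(results.non_ontological_resources)
    return "build from scratch"
```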
Iterative and Incremental Linked Data Life Cycle
Generation
§ Transformation
§ Data cleansing / curation
§ Linking
Generation: Transformation
§ Take the data sources selected in the specification activity and transform them into RDF according to the vocabulary created in the modelling activity
§ Some tools
- CSV and spreadsheets: RDF extension of Google Refine, XLWrap, RDF123, NOR2O
- RDB: D2R Server, ODEMapster, R2RML (W3C RDB2RDF WG)
- XML: GRDDL, ReDeFer
- More converters: http://www.w3.org/wiki/ConverterToRdf
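As an illustration of the transformation step for CSV sources, the sketch below maps each row of a CSV file to one resource and each selected column to a property of the vocabulary, writing N-Triples with the Python standard library only. It is not how any of the listed tools work internally; the function names, the key_column choice and the property URIs in the usage example are hypothetical, and key values are assumed to be URI-safe.

```python
import csv
import io
from typing import Dict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def nt_literal(value: str) -> str:
    """Serialize a plain string literal for N-Triples, escaping special characters."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def csv_to_ntriples(text: str, class_uri: str, base: str,
                    key_column: str, column_map: Dict[str, str]) -> str:
    """One resource per row, typed with class_uri; column_map: column -> property URI."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{base}{row[key_column]}>"  # assumes key values are URI-safe
        lines.append(f"{subject} <{RDF_TYPE}> <{class_uri}> .")
        for column, prop in column_map.items():
            if row.get(column):  # skip empty cells rather than emit empty literals
                lines.append(f"{subject} <{prop}> {nt_literal(row[column])} .")
    return "\n".join(lines)
```

Called with the CSV text "name,department\nTiraque,Cochabamba\n", key_column "name", base http://data.gov.bo/resource/province/ and the department column mapped to a hypothetical http://data.gov.bo/ontology/department property, it emits a typing triple for .../province/Tiraque and one literal triple with "Cochabamba".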
Generation: Transformation – RDB2RDF
§ A majority of dynamic Web content is backed by relational databases (RDB), and so are many enterprise systems.
§ W3C RDB2RDF Working Group
- R2RML: RDB to RDF Mapping Language - http://www.w3.org/TR/r2rml/
- Direct Mapping - http://www.w3.org/TR/rdb-direct-mapping/
- R2RML and Direct Mapping Test Cases - http://www.w3.org/2001/sw/rdb2rdf/test-cases/
- RDB2RDF Implementation Report - http://www.w3.org/TR/rdb2rdf-test-cases/
[Figure: transformation description; transformation engine]
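The W3C Direct Mapping defines a default transformation that needs no hand-written description: each row becomes a resource typed with a class named after its table, and each column becomes a property. The sketch below follows its IRI patterns for a table with a single-column primary key, using Python's built-in sqlite3 module as the relational source. It is deliberately simplified (an assumption for brevity): values are returned as plain strings instead of XSD-typed literals, and foreign keys are not turned into links. An R2RML mapping would replace these defaults with a custom transformation description.

```python
import sqlite3
from typing import List, Tuple
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_mapping(conn: sqlite3.Connection, table: str, pk: str,
                   base: str) -> List[Tuple[str, str, str]]:
    """Simplified Direct Mapping for one table with a single-column primary key.

    IRI patterns: class {base}{table}, row {base}{table}/{pk}={value},
    property {base}{table}#{column}. Objects of rdf:type triples are IRIs;
    all other objects are literal values returned as plain strings.
    """
    t = quote(table, safe="")
    cursor = conn.execute(f'SELECT * FROM "{table}"')  # table name assumed trusted
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        values = dict(zip(columns, row))
        subject = f"{base}{t}/{quote(pk, safe='')}={quote(str(values[pk]), safe='')}"
        triples.append((subject, RDF_TYPE, f"{base}{t}"))
        for column, value in values.items():
            if value is not None:  # NULL values produce no triple
                triples.append((subject, f"{base}{t}#{quote(column, safe='')}", str(value)))
    return triples
```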
Generation: Transformation – Spreadsheets to RDF
[Figure: NOR2O example with an Industry Production Index spreadsheet by Province and Year]
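A province-by-year table such as the Industry Production Index fits the RDF Data Cube vocabulary, with one observation per cell. The sketch below shows that shape in N-Triples; the dataset, dimension and measure URIs under data.gov.bo are hypothetical, NOR2O's actual output may differ, and a complete cube would also need a qb:DataStructureDefinition declaring the dimensions and the measure.

```python
from typing import List, Optional, Sequence, Tuple

QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
# Hypothetical namespaces for the dataset, dimensions and measure.
ONT = "http://data.gov.bo/ontology/"
RES = "http://data.gov.bo/resource/"


def observation(dataset: str, province: str, year: str, value: float) -> List[str]:
    """One qb:Observation for a single (province, year) cell of the spreadsheet."""
    obs = f"<{RES}observation/{dataset}/{province}/{year}>"
    return [
        f"{obs} <{RDF_TYPE}> <{QB}Observation> .",
        f"{obs} <{QB}dataSet> <{RES}dataset/{dataset}> .",
        f"{obs} <{ONT}refProvince> <{RES}province/{province}> .",  # dimension
        f'{obs} <{ONT}refYear> "{year}"^^<{XSD}gYear> .',  # dimension
        f'{obs} <{ONT}indexValue> "{value}"^^<{XSD}decimal> .',  # measure
    ]


def sheet_to_cube(dataset: str, years: Sequence[str],
                  rows: Sequence[Tuple[str, Sequence[Optional[float]]]]) -> List[str]:
    """rows holds (province, values-per-year) pairs; empty cells (None) are skipped."""
    triples = []
    for province, values in rows:
        for year, value in zip(years, values):
            if value is not None:
                triples.extend(observation(dataset, province, year, value))
    return triples
```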
Generation: Transformation – Geospatial to RDF
§ geometry2rdf: a tool for generating RDF from geospatial information
§ The geometry can be available in GML or WKT
https://github.com/boricles/geometry2rdf
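For the simplest geometry, a point, the idea can be sketched without any GIS library: parse a WKT POINT and describe it with the W3C Basic Geo (WGS84) vocabulary. This is a hypothetical illustration, not geometry2rdf's output format; it assumes coordinates in longitude/latitude order (as in CRS84), and lines, polygons and GML input are out of its scope.

```python
import re
from typing import List

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
NUM = r"(-?\d+(?:\.\d+)?)"
# Only 2D points: "POINT(x y)" with x = longitude and y = latitude (assumed order).
POINT_RE = re.compile(rf"^\s*POINT\s*\(\s*{NUM}\s+{NUM}\s*\)\s*$", re.IGNORECASE)


def wkt_point_to_ntriples(subject_uri: str, wkt: str) -> List[str]:
    """Describe a WKT point with geo:lat / geo:long typed as xsd:decimal."""
    match = POINT_RE.match(wkt)
    if match is None:
        raise ValueError(f"only WKT POINT geometries are handled here: {wkt!r}")
    lon, lat = match.groups()
    return [
        f'<{subject_uri}> <{GEO}lat> "{lat}"^^<{XSD_DECIMAL}> .',
        f'<{subject_uri}> <{GEO}long> "{lon}"^^<{XSD_DECIMAL}> .',
    ]
```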
Generation: Transformation – MARC21 to RDF
§ MARiMbA: a MARC mappings and RDF generator
[Figure: MARiMbA and domain experts; classification, annotation, mapping templates, relationships]
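Mapping templates are the central idea here: a table stating which MARC 21 field and subfield feeds which RDF property. The sketch below applies such a template to a MARC 21 XML (MARCXML) record using only the standard library. The two-entry template (245 $a, the title, and 100 $a, the personal-name main entry, mapped to Dublin Core elements) is a hypothetical example, not MARiMbA's actual mapping, which is refined with domain experts; the sketch also assumes one record per document.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

MARC = "{http://www.loc.gov/MARC21/slim}"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping template: (MARC tag, subfield code) -> RDF property.
TEMPLATE: Dict[Tuple[str, str], str] = {
    ("245", "a"): DC + "title",    # 245 $a: title
    ("100", "a"): DC + "creator",  # 100 $a: main entry, personal name
}


def marcxml_to_triples(xml_text: str, subject: str,
                       template: Dict[Tuple[str, str], str] = TEMPLATE
                       ) -> List[Tuple[str, str, str]]:
    """Emit (subject, property, literal) for every subfield the template covers."""
    root = ET.fromstring(xml_text)
    triples = []
    for datafield in root.iter(f"{MARC}datafield"):
        tag = datafield.get("tag")
        for subfield in datafield.findall(f"{MARC}subfield"):
            prop = template.get((tag, subfield.get("code")))
            if prop and subfield.text:
                triples.append((subject, prop, subfield.text.strip()))
    return triples
```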
Generation: Linking
§ Identify suitable datasets as linking targets
- Look for similar datasets in http://thedatahub.org/
- Look for our resources in tools like sig.ma
§ Discover relationships between data items
- Silk Framework: http://www4.wiwiss.fu-berlin.de/bizer/silk/
- LIMES: http://aksw.org/Projects/limes
§ Validate the relationships discovered
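Silk and LIMES discover links from declarative link specifications and use indexing techniques to avoid comparing every pair of resources. The naive sketch below only illustrates the underlying idea: compare normalized labels with a string similarity measure (here difflib's SequenceMatcher ratio, chosen for convenience) and keep pairs above a threshold as candidate owl:sameAs links, which still have to be validated. The 0.9 threshold and the function names are assumptions.

```python
import unicodedata
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(label: str) -> str:
    """Lower-case and strip accents so that 'Potosí' and 'Potosi' compare equal."""
    decomposed = unicodedata.normalize("NFKD", label)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower().strip()


def discover_links(source: Dict[str, str], target: Dict[str, str],
                   threshold: float = 0.9) -> List[Tuple[str, str, str, float]]:
    """source/target map resource URIs to labels; compares all pairs (O(n*m)).

    Returns candidate (source, owl:sameAs, target, score) links for validation.
    """
    candidates = []
    for s_uri, s_label in source.items():
        for t_uri, t_label in target.items():
            score = SequenceMatcher(None, normalize(s_label), normalize(t_label)).ratio()
            if score >= threshold:
                candidates.append((s_uri, OWL_SAME_AS, t_uri, round(score, 3)))
    return candidates
```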
Iterative and Incremental Linked Data Life Cycle
Publication
§ Dataset publication
§ Metadata publication
§ Dataset discovery
Publication: Dataset publication
§ Tools for storing RDF and providing a SPARQL endpoint and a Linked Data frontend
- Virtuoso Universal Server, Jena, Sesame, 4store, YARS, OWLIM, Talis Platform, Fuseki, Pubby, Linked Data API
§ Store the RDF data in different named graphs:
- http://example.com/graph/ontology
- http://example.com/graph/dataset
- http://example.com/graph/links
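Keeping the ontology, the instance data and the links in separate named graphs can be done at load time by writing N-Quads, where the fourth term names the graph. The routing rule below (TBox terms by namespace, owl:sameAs statements as links, everything else as dataset) is a hypothetical sketch; the graph names come from the slide, the ontology namespace is assumed, and objects are expected to be already serialized N-Triples terms.

```python
from typing import Iterable, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
ONTOLOGY_NS = "http://data.gov.bo/ontology/"  # hypothetical TBox namespace
GRAPHS = {
    "ontology": "http://example.com/graph/ontology",
    "dataset": "http://example.com/graph/dataset",
    "links": "http://example.com/graph/links",
}


def graph_for(subject: str, predicate: str) -> str:
    """Choose the named graph for one triple."""
    if subject.startswith(ONTOLOGY_NS):
        return GRAPHS["ontology"]  # classes and properties (TBox)
    if predicate == OWL_SAME_AS:
        return GRAPHS["links"]  # links to external datasets
    return GRAPHS["dataset"]  # instance data (ABox)


def to_nquads(triples: Iterable[Tuple[str, str, str]]) -> str:
    """triples: (subject IRI, predicate IRI, object already written as an N-Triples term)."""
    return "\n".join(
        f"<{s}> <{p}> {o} <{graph_for(s, p)}> ." for s, p, o in triples
    )
```

Once loaded into a store that supports named graphs, SPARQL queries can target a single graph, for example with GRAPH <http://example.com/graph/links> { ... } or a FROM NAMED clause.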
Publication: Metadata publication
§ VoID allows expressing metadata about RDF datasets
§ The PROV Ontology (PROV-O) allows expressing provenance information
http://www.w3.org/TR/void/ http://www.w3.org/TR/prov-o/
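A VoID description can be as small as a handful of triples. The function below writes one in Turtle using VoID's void:Dataset, void:sparqlEndpoint, void:triples and void:exampleResource, plus PROV-O's prov:wasDerivedFrom to point at the original source. The function name, its parameters and the example URIs are hypothetical, and a real description would usually add licensing and other Dublin Core metadata.

```python
from typing import Optional, Sequence


def void_turtle(dataset: str, title: str, endpoint: str, n_triples: int,
                examples: Sequence[str], source: Optional[str] = None) -> str:
    """Minimal VoID description (with optional PROV-O provenance) as Turtle."""
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "",
        f"<{dataset}> a void:Dataset ;",
        f'    dcterms:title "{safe_title}" ;',
        f"    void:sparqlEndpoint <{endpoint}> ;",
        f"    void:triples {int(n_triples)} ;",
    ]
    lines += [f"    void:exampleResource <{uri}> ;" for uri in examples]
    if source:
        lines.append(f"    prov:wasDerivedFrom <{source}> ;")
    lines[-1] = lines[-1][:-1] + "."  # end the statement: replace the final ";" with "."
    return "\n".join(lines)
```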
Publication: Dataset discovery
§ Register the dataset in a CKAN registry such as thedatahub.org
§ Generate sitemap files for your dataset, e.g., by using sitemap4rdf
§ Submit the sitemap location to Google and Sindice
http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
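At its simplest, a sitemap for a Linked Data site is a list of resource URLs. The sketch below builds one following the standard sitemaps.org protocol with xml.etree; it does not reproduce any RDF-specific extensions that a tool such as sitemap4rdf may generate, and the resource list is assumed to come from elsewhere (for instance, a SPARQL query over the dataset). The protocol allows at most 50,000 URLs per sitemap file, so large datasets need several files plus a sitemap index.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit of the sitemaps.org protocol


def build_sitemap(urls: Iterable[str], lastmod: Optional[str] = None) -> str:
    """Return a <urlset> document listing the given resource URLs."""
    ET.register_namespace("", SITEMAP_NS)  # serialize without a namespace prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for i, url in enumerate(urls):
        if i >= MAX_URLS:
            raise ValueError("too many URLs for one sitemap file; split it and use a sitemap index")
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        if lastmod:  # W3C Datetime format, e.g. "2012-05-01"
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")
```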
Publication: Dataset discovery – Example
Iterative and Incremental Linked Data Life Cycle
Exploitation
Streaming resources
Exploitation: map4rdf
• Faceted browser interface
• Geospatial visualization using Google Maps and OpenStreetMap
• Visualization of geometries (LineStrings, Polygons, etc.) when using the GeoLinkedData data model
• Visualization of statistical data using SCOVO / RDF Data Cube
https://github.com/boricles/linked-data-visualization-tools
[Figure: map4rdf, SPARQL, triplestore]
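Map-based browsers such as map4rdf read their data from the triplestore through SPARQL. The sketch below shows what such a request might look like: a query for resources with W3C Basic Geo coordinates inside a bounding box, encoded as an HTTP GET request with the query parameter defined by the SPARQL 1.1 Protocol. The endpoint URL is hypothetical, the coordinates are assumed to be stored as numeric literals, and the query is not taken from map4rdf itself.

```python
from urllib.parse import urlencode


def bbox_query(min_lat: float, min_long: float, max_lat: float, max_long: float,
               limit: int = 500) -> str:
    """Resources with geo:lat/geo:long inside a bounding box (numeric literals assumed)."""
    return (
        "PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>\n"
        "SELECT ?resource ?lat ?long WHERE {\n"
        "  ?resource geo:lat ?lat ; geo:long ?long .\n"
        f"  FILTER (?lat >= {float(min_lat)} && ?lat <= {float(max_lat)} && "
        f"?long >= {float(min_long)} && ?long <= {float(max_long)})\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def sparql_get_url(endpoint: str, query: str) -> str:
    """SPARQL 1.1 Protocol: send the query via HTTP GET in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})
```

Sending the resulting URL with an Accept: application/sparql-results+json header asks the endpoint for results in the SPARQL JSON format.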
ToC » Introduction
» Guidelines for Publishing Linked Data
» Use Cases
http://geo.linkeddata.es
1. Specification 2. Modelling
3. Generation 4. Publication & Exploitation
http://aemet.linkeddata.es/browser_en.html
1. Specification 2. Modelling
3. Generation 4. Publication & Exploitation
Python scripts
250 weather stations (pressure, humidity, etc.)
Data from the stations is available as CSV files on an FTP server
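The slide mentions Python scripts that take station data from CSV files on an FTP server. A minimal sketch of such a script follows, using ftplib to download a file and csv to turn each reading into an observation. The column names (station, time, pressure, humidity), the URIs and the ontology terms are all hypothetical, since the slide does not describe the file layout or the vocabulary used.

```python
import csv
import io
from ftplib import FTP
from typing import List

# Hypothetical namespaces; the vocabulary of the real dataset is not shown here.
RES = "http://example.com/resource/"
ONT = "http://example.com/ontology/"
XSD = "http://www.w3.org/2001/XMLSchema#"
MEASURES = ("pressure", "humidity")  # hypothetical column names, reused as property names


def fetch_csv(host: str, path: str, user: str = "anonymous", password: str = "") -> str:
    """Download one CSV file from an FTP server as text."""
    lines: List[str] = []
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.retrlines(f"RETR {path}", lines.append)
    return "\n".join(lines)


def readings_to_ntriples(text: str) -> List[str]:
    """One observation resource per CSV row: station, time and numeric measurements."""
    out = []
    for row in csv.DictReader(io.StringIO(text)):
        station, time = row["station"], row["time"]
        obs = f"<{RES}observation/{station}/{time}>"
        out.append(f"{obs} <{ONT}station> <{RES}station/{station}> .")
        out.append(f'{obs} <{ONT}time> "{time}"^^<{XSD}dateTime> .')
        for column in MEASURES:
            if row.get(column):  # skip missing measurements
                out.append(f'{obs} <{ONT}{column}> "{float(row[column])}"^^<{XSD}double> .')
    return out
```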
http://bne.linkeddata.es/graphvis/ http://datos.bne.es/
1. Specification 2. Modelling
3. Generation 4. Publication & Exploitation
MARC 21 XML records
[Figure: MARiMbA and domain experts; classification, annotation, mapping templates, relationships]
http://webenemasuno.linkeddata.es
1. Specification 2. Modelling
3. Generation 4. Publication & Exploitation
A scenario in the context of tourism and travel, where heterogeneous content (images, travel guides, posts, videos, news) is aggregated from different platforms.