tutorial kcc-2011
Semantic Web and Linked Data
2011.06.30
Linked Data: Enabler of Semantic Web
Sung-Kook Han
Semantic Technology Lab, Won Kwang Univ.
Outline
Introduction to Semantic Technology
Semantic Technology + Web Technology
• Semantic Web
• Web 2.0
• Linked Data
Design and Publication of Linked Data
• 9 steps towards Linked Open Data
Why Semantic Technology??
the ways of thinking, cognition…
George Boole: An Investigation of the Laws of Thought (1854)
Claude Shannon: 1937 master's thesis, A Symbolic Analysis of Relay and Switching Circuits
Kurt Gödel, Alan Turing, John von Neumann
Our Computers
Communication
Human vs. Human
Human vs. Alien
Human vs. Computer
Computer vs. Computer
Semantic Technology
Semantic technology has been a distinct research field for more than 40 years.
Formal Logic (since Russell and Frege)
Knowledge Representation Systems in AI
Semantic Networks and ATN (William Woods, 1975)
DARPA and European Commission programs in information integration
Development of simple tractable logics
Relational Algebras and Schemas in Database Systems
Library Science (classifications, thesauri, taxonomies)
New challenges of Semantic Technology: Semantic Web
A massive store of information that computers cannot use
A way to get around needing the “big data warehouse”
Another place where “a little semantics can go a long way”...
cf: The Relationship Between Web 2.0 And the Semantic Web - Dr. Mark Greaves, Vulcan, Inc.
Ontology Spectrum
[Figure: the Ontology Spectrum, ordered from weak semantics to strong semantics]
Taxonomy: Is Sub-Classification of
Thesaurus: Has Narrower Meaning Than
Conceptual Model: Is Subclass of
Logical Theory: Is Disjoint Subclass of, with transitivity property
Formalisms along the spectrum: Relational Model, XML; ER; Extended ER; DB Schemas, XML Schema; RDF/S; XTM; UML; Description Logic; DAML+OIL, OWL; First Order Logic; Modal Logic
Interoperability levels: Syntactic Interoperability, Structural Interoperability, Semantic Interoperability
Based on Leo Obrst, The Ontology Spectrum & Semantic Models
Semantic Technology
[Figure: Semantic Technology]
Machine-processible Semantics: Ontology, Metadata, controlled vocabulary
Digital Information Resources: Web resources, Services, Image, Audio/Video, Documents
Integration, Intelligence, Interoperability
Web Technology
Web of machine-processible Data
Common vocabularies: Metadata and Ontology
Query and reasoning
Web of Services
Internet of Services
Internet of Things
Social Web
Connects human beings
Web as a platform
Programmable APIs and proprietary interfaces
Mashups based on a fixed set of data sources
Classic Web
Web of Documents
HTML as document format
HTTP URLs as globally unique IDs
Hyperlinks to connect everything
Semantic Web
Standardizations Trio of Semantic Web
Metadata / Ontology: RDF, RDFS, OWL
Query Language: SPARQL
Rule Language: RIF (SWRL)
SKOS, RDFa, GRDDL, WSMO,…
SOAP / REST
Tools and Systems: Authoring, Reasoning Engines,…
835 items in Sweet Tools
Best Practices: Linked Open Data
Semantic MediaWiki
NEPOMUK, SIOC, Garlik
W3C Semantic Web Use cases
Sweet Tools: http://www.mkbergman.com/new-version-sweet-tools-sem-web/
W3C Semantic Web Case Studies and Use Cases: http://www.w3.org/2001/sw/sweo/public/UseCases/
Semantic Applications
Semantic Wave 2008, Industry Roadmap to Web 3.0, Project10X
Web 2.0
Resharpen the way of viewing the Web
Web as the platform
Web as the social media
Web as the collaboration tool
Web as ……
Web 2.0 Manifestation
Openness / Sharing
Participation / Collaboration
Web 2.0 Syndrome
Library 2.0
Government 2.0
Enterprise 2.0
……
New Web applications
wiki, blog, RSS,…
Web 2.0 Developers
Semantic Web Today
Major future issues:
• Vocabularies
• Scalability
• Provenance
• Personal Infospheres
• Mobile and Real World Networks
Web 2.0 APIs Today
[Figure: a mashup built on three separate Web APIs over data sources A, B and C]
No Single global space:
• Mashups of APIs are proprietary.
• No links between data.
Web APIs slice the Web into Walled Gardens.
Christian Bizer: Pay-as-you-go Data Integration (21/9/2010)
Long Live the Web !
http://www.scientificamerican.com/article.cfm?id=long-live-the-web
Lessons Learned
Data is more important than API code.
Data is the Intel Inside.
Open data is more important than open source
Structured data is more valuable than unstructured.
We should seek to structure our data well.
Metadata will play a core role in data structure.
A little semantics goes a long way.
Be aware of the usefulness of shallow ontologies, as shown in LOD.
Linking data and services is essential.
Link everything.
Rich user experiences are the key to adoption.
We should consider mobile computing and personalization.
Visualize and navigate.
Semantic Web & Linked Data
Web of Documents
A global file system of documents (document silos on the Web).
Implicit semantics of content and links
Designed for human consumption
Disconnected data
Architecture: Web of Documents
[Figure: HTML documents generated from databases DB-A, DB-B and DB-C, connected by hyperlinks (document links) and accessed by Web browsers and search engines through HTTP URLs]
Analogy: a global file system
Designed for: human consumption
Primary objects: documents
Links between: documents (or sub-parts of)
Degree of structure in objects: fairly low
Main usage: search and browsing
Semantics of content and links: implicit
Machine-Processible Data
[Figure: from the Web of Documents (human-processible documents) to the Web of Data (machine-processible data from databases and documents)]
Open the data silos and get rid of the repository-centric mindset.
Publish data of public interest on the Web
in a way that other applications can access and interpret the data,
using common Web technologies.
Semantic Web: Web of Data
The vision of a Semantic Web:
building a global Web of machine-readable data
Berners-Lee, Hendler & Lassila, 2001; Marshall & Shipman, 2003
Linked Data Foundation
can lower the barrier to reuse, integration and application of data from multiple,
distributed and heterogeneous sources.
the more sophisticated proposals associated with the Semantic Web vision,
such as intelligent agents, may become a reality.
The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web - a web of data that can be processed directly or indirectly by machines. Therefore, while the Semantic Web, or Web of Data, is the goal or the end result of this process, Linked Data provides the means to reach that goal. -- Tim Berners-Lee, et al., http://linkeddata.org/docs/ijswis-special-issue, Jan, 2009
Linked Data: Web of Data
Goal: Web-scale Data Integration
Alternative to classic data integration systems in order to cope with growing
number of data sources.
Querying across data sources
Global distributed database
Extend the Web with a single global data space
Giant Global Graph (GGG)
Demonstrate the possibility of Semantic Web
By using RDF to publish structured data
By setting links between data
[Figure: RDF data sets interlinked into a single universal information space]
Architecture: Linked Data
[Figure: RDF triples generated from databases DB-A, DB-B and DB-C, connected by RDF links (data links) and accessed by Linked Data browsers, search engines and Linked Data mashups through HTTP URIs]
Analogy: a global database
Designed for: machines first, humans later
Primary objects: things (or descriptions (data) of things)
Links between: things
Degree of structure in (descriptions of) things: high
Main usage: query, navigation and reasoning
Semantics of content and links: explicit
Linked Data Principles
Set of best practices for publishing structured data on the Web in accordance with
the general architecture of the Web.
Use URIs as names for things, not just for documents or homepages.
Use HTTP URIs so that people can look up those names.
When someone looks up a URI, provide useful RDF information.
Include RDF statements that link to other URIs so that they can discover related things.
[Figure: URIs connected by RDF links; looking up an HTTP URI returns RDF triple information]
Linked Open Data
Community effort to
publish existing open license datasets as Linked Data on the Web
interlink things between different data sources
develop clients that consume Linked Data from the Web
began early 2007
LOD Data sets on the Web
http://richard.cyganiak.de/2007/10/lod/lod-datasets_2010-09-22_colored.svg
25 billion RDF triples, which are interlinked by around 395 million RDF links (Sep. 2010).
Summary: Web of Linked Data
A global, distributed database built on a simple set of
standards
RDF, URI, HTTP
Explicit semantics of content and links
Resources are connected by semantic links.
creating a single global data graph that spans data sources
enables the discovery of new data sources
Provides for data co-existence
Anyone can publish data to the Web of Linked Data
Data publishers are not constrained in choice of vocabularies with
which to represent data.
Designed for computers first, humans later
Data.Gov
Europeana
European digital library: this European Commission initiative encompasses not only libraries but also museums, archives and other holders of cultural heritage material.
http://version1.europeana.eu/web/europeana-project
Linked Library Cloud
Libraries have been producing metadata for ages.
Libraries (often) produce high-quality metadata.
Libraries develop many metadata standards, such as DC, SKOS, BIBO and OAI-ORE, as well as MARC 21, MODS, FRBR,…
Integrate library catalogues on a global scale.
http://code4lib.org/conference/2010/singer
Linking Open Drug Data
Linking the various sources of drug data together to answer interesting scientific and business questions.
Survey publicly available data sets about drugs.
Publish and interlink these data sets on the Web.
Explore interesting questions that could be answered if the data sets are linked.
8 million RDF triples, interlinked by more than 370,000 RDF links (as of August 2009).
BBC Semantic Project
Publish program / music data as RDF/XML or RDFa
Build semantically linked and annotated web pages about artists and
singers whose songs are played on BBC radio stations.
semantically interconnected
DBpedia Mobile
Show map with information about nearby locations
Linked data browser
GPS + Google Maps + DBpedia + Flickr + Revyu
Attention by Search Engines
Yahoo!
crawls Linked Data in its RDFa serialization as well as Microformats
uses Yahoo SearchMonkey to make search results more useful and visually appealing
provides access to crawled data through the Yahoo BOSS API
Google
uses the Social Graph API
is developing Google Squared and Google Fusion Tables
acquired Metaweb
manages Freebase, a DBpedia/YAGO competitor
Rich Snippets
Linked Open Commerce
Design and Publication of Linked Data
9 Steps to publishing Linked Data
1. Understand the principles
2. Set up your infrastructure for Linked Data
3. Understand your data
4. Create vocabularies
5. Choose URIs for things in your data
6. Triplify data sets
7. Link to other data sets
8. Describe your data sets
9. Publicize your data sets
1. Understand Linked Data
• Principle
• Core Stack
• Data Modeling
Linked Data: Overview
Benefits of Linked Data: enables web-scale, distributed data publication with web-based discovery mechanisms.
Linked Data Web resources are generic real-world data objects or entities:
People, places, and other physical things
Abstract concepts (e.g., emotion, notion,…)
Subject matter (e.g., science, economics, arts,…)
Linked Data is not just structured data published on the Web.
Linked Data is based on well-established Web standards.
Linked Data adds value: less redundancy, greater discoverability, network effects.
Linked Data Principles (TimBL, 2006)
Use URIs as names for things
not just for documents
http://dbpedia.org/resource/ontology
you are not your homepage
http://mentalist.com/actor/patrick_jane
Use HTTP URIs
globally unique names, distributed ownership
allows people to look up those names
Provide useful information in RDF
when someone looks up a URI
Include RDF links to other URIs
to enable discovery of related information
5 Star rating
★ On the web, open licensed: available on the web (whatever format), but with an open license
★★ Machine-readable data: available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ Non-proprietary format (e.g. CSV instead of Excel)
★★★★ RDF standards: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ Linked RDF: link your data to other people’s data to provide context
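The rating above can be read as a checklist. The following is a minimal sketch, assuming the stars are cumulative (each star presupposes the earlier ones); the `Dataset` class and its field names are hypothetical and only mirror the five criteria on the slide.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published data set (one flag per star)."""
    on_web_open_license: bool
    machine_readable: bool
    non_proprietary_format: bool
    uses_rdf_standards: bool
    links_to_other_data: bool


def star_rating(ds: Dataset) -> int:
    """Count stars in order and stop at the first criterion that is not met."""
    criteria = [
        ds.on_web_open_license,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_rdf_standards,
        ds.links_to_other_data,
    ]
    stars = 0
    for satisfied in criteria:
        if not satisfied:
            break
        stars += 1
    return stars


# An openly licensed CSV dump: structured and non-proprietary, but not RDF.
csv_dump = Dataset(True, True, True, False, False)
print(star_rating(csv_dump))  # 3
```

Under this reading, a data set that uses RDF but is only available as a scanned image would still earn a single star, because the second criterion fails first.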
Linked Data Core Stack
http://linkeddata-specs.info/
• RFC 2616, Hypertext Transfer Protocol (HTTP/1.1): defines HTTP, a generic and stateless application-level protocol for distributed, collaborative, hypermedia information systems.
• RFC 3986, Uniform Resource Identifier (URI): Generic Syntax: defines a generic URI syntax and a process for resolving URI references that might be in relative form, along with guidelines and security considerations for the use of URIs on the Internet.
• RDF Concepts and Abstract Syntax: defines the RDF graph data model and key concepts.
• SPARQL Query Language for RDF: defines the syntax and semantics of the SPARQL query language for RDF.
Core Technology
Uniform Resource Identifier (URI)
Names (identifiers) for resources in an open Web environment
Resource Description Framework (RDF)
a model for representing metadata on the web
triple structure
RDF Schema and OWL
languages for defining vocabularies
RDF/XML, N3, Turtle,…
serialization and de-serialization of RDF triples for exchanging RDF data
Simple Knowledge Organization System (SKOS)
a language for describing controlled vocabularies
SPARQL
a query language and protocol for accessing RDF data via the Web
Linked Data Modeling
Data Modeling: RDF data model to publish structured data on the Web
Data Linking: RDF links to interlink data from different data sources
RDF triple: subject, predicate, and object
Subject: URI identifying the described resource
Predicate: the relation that exists between subject and object; predicates come from vocabularies, collections of URIs that can be used to represent information about a certain domain
Object: a simple literal value, or the URI of another resource that is related to the subject
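To make the subject/predicate/object structure concrete, here is a minimal sketch in plain Python that models triples and prints them in N-Triples syntax. The `URI` and `Literal` classes are hypothetical helpers, not a real RDF library, and the book and author URIs are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str

    def nt(self) -> str:
        # N-Triples writes URIs in angle brackets.
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    value: str

    def nt(self) -> str:
        # Escape backslashes and double quotes inside the literal.
        escaped = self.value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s.nt()} {p.nt()} {o.nt()} ." for s, p, o in triples)


BOOK = URI("http://www.example.com/isbn/46316")  # hypothetical subject URI
triples = [
    # Object is a literal value.
    (BOOK, URI("http://dbpedia.org/property/title"), Literal("The Lord of the Rings")),
    # Object is the URI of another resource (an RDF link); illustrative URI.
    (BOOK, URI("http://dbpedia.org/property/author"),
     URI("http://dbpedia.org/resource/J._R._R._Tolkien")),
]
print(to_ntriples(triples))
```

The second triple is the interesting one for Linked Data: its object is a URI in another data source, so the statement doubles as a link.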
Linked Data Model
Flexible graph-based model: RDF graph
URI: global primary key
skos:subject = http://www.w3.org/2004/02/skos/core#subject
dbp-prop:title = http://dbpedia.org/property/title
The HTTP protocol brings together identification and retrieval again.
Deeper into the Web
[Figure: an RDF graph linking a book to descriptions in several data sources]
Nodes: http://.../isbn/46316, The Lord of the rings, English novels, J.R.R. Tolkien, wkp-en:J.R.R.Tolkien, dbpedia:Allen&Unwin, fb:guid…..92df7, London, Marivie, 83 Alexander St
Properties: dbp-prop:title, skos:subject, dbp-prop:author, dbp-prop:name, foaf:homepage, dbp-prop:publisher, opencyc:headquarter, dbp-prop:city, fb:creator, fb:street_address
Basic Infrastructure
[Figure: basic infrastructure for Linked Data]
Layers: Data/Content (DB), RDF Triple Base, Interface, Delivery, Application
Components: extraction, conversion, link generation, triple store, index, SPARQL Query Engine, Framework + APIs, Web Server (Apache), packaging, search, discovery, navigation, browser, navigator
Infrastructure Construction
Configuration of Web server
Configure the server to send the correct MIME type: application/rdf+xml
Code samples for ConNeg and 303 redirects: http://linkeddata.org/tools
Use cURL (http://curl.haxx.se/) to test the Apache configuration
Configure for hash URIs or slash URIs
Testing your content negotiation
Install the LiveHTTPHeaders and Modify Headers extensions for Firefox
Try LiveHTTPHeaders against my URI
http://www.skyhigh.com/id/hong
do the same with URIs from other data sets
Modify your headers to ask for application/rdf+xml
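When testing content negotiation it helps to know what the server is expected to decide for a given Accept header. The sketch below is a simplified model of that decision, not Apache's actual algorithm: it only handles exact media types and `*/*`, and the `OFFERED` list and function names are assumptions made for illustration.

```python
OFFERED = ("application/rdf+xml", "text/html")  # representations the server has


def parse_accept(header: str):
    """Return (media_type, q) pairs, highest q first (ties keep header order)."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def negotiate(accept: str, default: str = "text/html") -> str:
    """Pick the representation to serve for a given Accept header."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in OFFERED:
            return media_type
        if media_type == "*/*":
            return default
    return default


print(negotiate("application/rdf+xml"))                          # application/rdf+xml
print(negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))  # text/html
```

The second call uses a typical browser-style header, which is why a plain browser visit normally ends up on the HTML page while a request that asks only for application/rdf+xml gets the RDF document.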
Supporting Technologies
Linked Data Browsers
provide for navigating between data sources and for exploring the dataspace.
Tabulator Browser (MIT, USA), Marbles (FU Berlin, DE), OpenLink RDF Browser (OpenLink, UK), Zitgist RDF Browser (Zitgist, USA), Disco Hyperdata Browser (FU Berlin, DE), Fenfire (DERI, Ireland)
Web of Data Search Engines
crawl the data space and provide best-effort query answers over crawled data.
Falcons (IWS, China), Sig.ma (DERI, Ireland), Swoogle (UMBC, USA),
VisiNav (DERI, Ireland), Watson (Open University, UK), TAP, Sindice
Supporting Technologies
Describing data set
discovery and usage of linked datasets
voiD, Ding
Registry
an open registry of data and content packages
CKAN
Linking tool
discovering relationships between data items within different Linked Data sources
SILK
Mapping tool
mapping database to RDF triples
Triplify, D2R Server
LOD platform
D2R Server, Virtuoso Universal Server, Talis Platform, Pubby, …
3. Understand Data to be published
• Review about Data to be published
• Requirement analysis
Review about Data to be published
What: think about the key things to be presented in Linked Data
analysis of data properties
What vocabularies can be used to describe these?
Why: purposes and goals of the linked data to be published
What for: how to use and apply linked data (use cases)
How to serve:
Serving Linked Data as Static RDF/XML Files
Serving Linked Data as RDF Embedded in HTML Files
Serving RDF and HTML with Custom Server-Side Scripts
Serving Linked Data from Relational Databases
Serving Linked Data from RDF Triple Stores
Serving Linked Data by Wrapping Existing Application or Web APIs
Guideline for Vocabulary Creation
Do not define new vocabularies from scratch, but complement existing vocabularies with additional terms (in your own namespace) to represent your data as required.
Provide for both humans and machines. Use rdfs:comment for each term invented. Always provide a label for each term using the rdfs:label property.
Make term URIs de-referenceable following the W3C Best Practice Recipes for Publishing RDF Vocabularies.
Make use of other people's terms, or provide mappings to them, by means of rdfs:subClassOf or rdfs:subPropertyOf.
State all important information explicitly. For example, state all ranges and domains explicitly.
Do not create over-constrained, brittle models; leave some flexibility for growth. Unless you know exactly what you are doing, do not use full-featured OWL; use RDF Schema to define vocabularies.
Potential Ontologies / Vocabularies
Friend-of-a-Friend (FOAF), vocabulary for describing people.
Dublin Core (DC) defines general metadata attributes. See also their new
domains and ranges draft.
Semantically-Interlinked Online Communities (SIOC), vocabulary for
representing online communities.
Description of a Project (DOAP), vocabulary for describing projects.
Simple Knowledge Organization System (SKOS), vocabulary for
representing taxonomies and loosely structured knowledge.
Music Ontology provides terms for describing artists, albums and tracks.
Review Vocabulary, vocabulary for representing reviews.
Creative Commons (CC), vocabulary for describing license terms
Geo, vocabulary for describing geographical locations
GoodRelations, vocabulary for describing products
Common Namespaces
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:dc="http://purl.org/dc/terms/"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:vcard="http://www.w3.org/2006/vcard/ns#"
xmlns:dbp="http://dbpedia.org/dbprop/"
xmlns:geo="http://www.geonames.org/ontology#"
xmlns:gr="http://purl.org/goodrelations/v1#"
xmlns:commerce="http://search.yahoo.com/searchmonkey/commerce/"
xmlns:media="http://search.yahoo.com/searchmonkey/media/"
xmlns:cb="http://cb.semsol.org/ns#"
More Common Namespaces:
http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/100-most-popular-rdf-namespaces
Definition of Vocabulary
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Definition of the class "Lover"
<http://sites.movie.org/pub/LoveVocabulary#Lover>
    rdf:type rdfs:Class ;
    rdfs:label "Lover"@en ;
    rdfs:label "Liebender"@de ;
    rdfs:comment "A person who loves somebody."@en ;
    rdfs:comment "Eine Person die Jemanden liebt."@de ;
    rdfs:subClassOf foaf:Person .

# Definition of the property "loves"
<http://sites.movie.org/pub/LoveVocabulary#loves>
    rdf:type rdf:Property ;
    rdfs:label "loves"@en ;
    rdfs:label "liebt"@de ;
    rdfs:comment "Relation between a lover and a loved person."@en ;
    rdfs:subPropertyOf foaf:knows ;
    rdfs:domain <http://sites.movie.org/pub/LoveVocabulary#Lover> ;
    rdfs:range foaf:Person .
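Mapping a new term to an existing one with rdfs:subPropertyOf pays off because a consumer that only knows foaf:knows can still use data expressed with "loves". The following is a minimal sketch of that single RDFS inference rule over plain tuples; it is not a full RDFS reasoner, and the two person URIs are hypothetical.

```python
LOVES = "http://sites.movie.org/pub/LoveVocabulary#loves"
KNOWS = "http://xmlns.com/foaf/0.1/knows"
SUB_PROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"


def infer_subproperties(triples):
    """Apply: (p subPropertyOf q) and (s p o) => (s q o), until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        # Collect all declared (child, parent) property pairs.
        hierarchy = {(s, o) for s, p, o in inferred if p == SUB_PROPERTY_OF}
        for s, p, o in list(inferred):
            for child, parent in hierarchy:
                if p == child and (s, parent, o) not in inferred:
                    inferred.add((s, parent, o))
                    changed = True
    return inferred


ROMEO = "http://www.example.org/people/romeo"    # hypothetical
JULIET = "http://www.example.org/people/juliet"  # hypothetical

data = {
    (LOVES, SUB_PROPERTY_OF, KNOWS),  # from the vocabulary definition
    (ROMEO, LOVES, JULIET),           # from the published data
}
result = infer_subproperties(data)
print((ROMEO, KNOWS, JULIET) in result)  # True
```

The derived foaf:knows statement never has to be published explicitly; stating the mapping once in the vocabulary is enough for any client that applies this rule.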
Tools for Vocabulary Definition
Ontology editors
Protégé:
an open-source ontology editor with a dedicated OWL plug-in
Neologism:
Web-based tool for creating, managing and publishing simple RDFS
vocabularies.
open-source and implemented in PHP on top of the Drupal-platform.
TopBraid Composer:
a powerful commercial modeling environment for developing Semantic
Web ontologies
NeOn Toolkit:
an open-source ontology engineering environment with an extensive set of
plug-ins.
5. Choose URIs
• Resource Identification
• Types of URIs
• De-Referencing
• Common URI Patterns
Resource Identification
Separation of Identity and Representation
Identity
Identity (URI) of an Object or Entity should be unambiguous and globally unique
Representation
On the Web a URI should provide an unambiguous data access path
Access
Reference to abstract (physically inaccessible) Objects or Entities is only achievable via conduit documents that carry representations of entity descriptions (which at best are facets of an entire description)
URI Requirements:
Keep out of other people's namespaces
Use a namespace that you control
Abstract away from implementation details (Short is better…)
Stable and persistent
Hash or Slash
Use common URI patterns
URI
URI: Uniform Resource Identifier
http://www.example.com/people/alice
home page?? (Web document)
information object??
URI: identification of people, products, places, ideas and concepts such as ontology classes, including URLs for Web documents
Two Approaches
hash URI
slash URI
Hash / Slash URI
Hash URI
URIs can contain a fragment, a special part that is separated from the rest of the URI by a hash symbol (“#”).
http://www.example.com/products/BiBimBab#this
http://www.travel.com/nation/Korea/KyungJu#main
simply publish a description document containing RDF about the things at the base URI
Slash URI
examples:
http://www.example.com/products/BiBimBab
http://www.travel.com/nation/Korea/KyungJu
must publish your description document at another, distinct URI.
hash URI
http://www.skyhigh.com/person/GilDong#this identifies the Entity (GilDong)
http://www.skyhigh.com/person/GilDong returns the document:
Metadata: content-type: application/xhtml+xml
Data: <html xmlns=“..<head><title> Our hero… </html>
Separating identification and naming from representation
slash URI
http://www.skyhigh.com/person/hero/GilDong/id identifies the Entity (GilDong)
http://www.skyhigh.com/person/hero/GilDong/page
Metadata: content-type: application/xhtml+xml
Data: <html xmlns=“..<head><title> Our hero… </html>
http://www.skyhigh.com/person/hero/GilDong/data
Metadata: content-type: application/rdf+xml
Data: <html xmlns=“..<head><title> Our hero… </html>
Separating identification and naming from representation
Slash vs. Hash
Slash URI: HTTP redirection (30X response) is required in order for resource "Identity" to be separated from "representation":
http://www.skyhigh.com/person/hero/GilDong/id (URI of the Entity)
http://www.skyhigh.com/person/hero/GilDong/page (HTML representation of the Entity description)
http://www.skyhigh.com/person/hero/GilDong/data (RDF representation that describes the Entity, which could use a Turtle, N3, RDF/XML, etc. data serialization)
Hash URI: HTTP redirection isn't required in order for resource "Identity" to be separated from "representation":
http://demo.openlinksw.com/Northwind/Customer/ALFKI#this (URI of an Organization Entity)
http://demo.openlinksw.com/Northwind/Customer/ALFKI (a document: HTML, Turtle, N3 or RDF/XML representation of the Entity description)
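The reason hash URIs need no redirect is that an HTTP client strips the fragment before sending the request, so the thing and its document automatically get different URIs. A minimal sketch using Python's standard `urllib.parse.urldefrag`; the helper names are hypothetical.

```python
from urllib.parse import urldefrag


def document_uri(thing_uri: str) -> str:
    """The URI actually requested over HTTP: the thing's URI without its fragment."""
    url, _fragment = urldefrag(thing_uri)
    return url


def needs_redirect(thing_uri: str) -> bool:
    """A slash URI equals its request URI, so the server has to answer with a 303
    to keep the thing and its description apart; a hash URI does not."""
    return document_uri(thing_uri) == thing_uri


hash_uri = "http://www.skyhigh.com/person/GilDong#this"
slash_uri = "http://www.skyhigh.com/person/hero/GilDong/id"

print(document_uri(hash_uri))     # http://www.skyhigh.com/person/GilDong
print(needs_redirect(hash_uri))   # False
print(needs_redirect(slash_uri))  # True
```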
DeReferencing Hash URI
Without content negotiation:
ID: http://www.example.com/about#alice
automatic truncation of fragment
RDF: http://www.example.com/about
With content negotiation:
ID: http://www.example.com/about#alice
automatic truncation of fragment
http://www.example.com/about (content negotiation)
application/rdf+xml wins: RDF at http://www.example.com/about.rdf
text/html wins: HTML at http://www.example.com/about.html
DeReferencing Slash URI
One generic document:
ID: http://www.example.com/id/alice
303 redirected to the generic document http://www.example.com/doc/alice (content negotiation)
application/rdf+xml wins: RDF at http://www.example.com/doc/alice.rdf
text/html wins: HTML at http://www.example.com/doc/alice.html
Different documents:
ID: http://www.example.com/id/alice
303 redirected with content negotiation
application/rdf+xml wins: RDF at http://www.example.com/doc/alice.rdf
text/html wins: HTML at http://www.example.com/doc/alice.html
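The two slash-URI variants can be simulated without a web server. This is a sketch only: the `/id/` and `/doc/` path layout follows the slide, while the function names and the naive substring check on the Accept header are simplifying assumptions, not how a production server negotiates.

```python
RDF_XML = "application/rdf+xml"
HTML = "text/html"


def preferred_extension(accept: str) -> str:
    # Naive rule (assumption): RDF/XML wins only when it is explicitly requested.
    return ".rdf" if RDF_XML in accept else ".html"


def dereference(path: str, accept: str = HTML):
    """Return (status, headers) for a request path and Accept header."""
    if path.startswith("/id/"):
        # The thing itself cannot be returned: 303 See Other to a document
        # ("different documents" variant, negotiation happens on redirect).
        name = path[len("/id/"):]
        return 303, {"Location": f"/doc/{name}{preferred_extension(accept)}"}
    if path.startswith("/doc/"):
        if path.endswith(".rdf"):
            return 200, {"Content-Type": RDF_XML}
        if path.endswith(".html"):
            return 200, {"Content-Type": HTML}
        # Generic document: negotiate here and say which variant was served.
        ext = preferred_extension(accept)
        content_type = RDF_XML if ext == ".rdf" else HTML
        return 200, {"Content-Type": content_type, "Content-Location": path + ext}
    return 404, {}


status, headers = dereference("/id/alice", accept=RDF_XML)
print(status, headers["Location"])  # 303 /doc/alice.rdf
```

A Linked Data client follows the Location header with a second request; a browser sending text/html would be redirected to /doc/alice.html instead.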
Content Negotiation
Common URI Pattern
http://dbpedia.org/resource/New_York_City  Thing
http://dbpedia.org/data/New_York_City  RDF data
http://dbpedia.org/page/New_York_City  HTML page

http://revyu.com/people/tom  Thing
http://revyu.com/people/tom/about/rdf  RDF data
http://revyu.com/people/tom/about/html  HTML page

http://www.bbc.co.uk/music/artists/db4624cf#artist  Thing
http://www.bbc.co.uk/music/artists/db4624cf.rdf  RDF data
http://www.bbc.co.uk/music/artists/db4624cf.html  HTML page

http://id.dbpedia.org/Berlin  Thing
http://data.dbpedia.org/Berlin  RDF data
http://page.dbpedia.org/Berlin  HTML page

http://www4.wiwiss.fu-berlin.de/bookmashup/books/006251587X  ISBN
Choosing URI
http://www.culture.com/LOD/class/member
http://www.culture.com/LOD/class/member.rdf
http://www.culture.com/LOD/class/member.html
Examples:
URI of an Organization Entity
http://demo.openlinksw.com/Northwind/Customer/ALFKI/id
HTML representation of Entity description
http://demo.openlinksw.com/Northwind/Customer/ALFKI/page
RDF representation that describes the Entity, which could use a Turtle, N3, RDF/XML, etc. data serialization
http://demo.openlinksw.com/Northwind/Customer/ALFKI/data
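Whichever pattern is chosen, it is worth generating the three related URIs from one place so they stay consistent. A minimal sketch of the id/page/data pattern from the example above; `mint_uris` is a hypothetical helper, not part of any Linked Data tool.

```python
from urllib.parse import quote


def mint_uris(base: str, local_name: str) -> dict:
    """Build the thing, HTML and RDF URIs for one entity (id/page/data pattern)."""
    # Percent-encode the local name so spaces or slashes cannot break the path.
    stem = f"{base.rstrip('/')}/{quote(local_name, safe='')}"
    return {
        "thing": f"{stem}/id",
        "page": f"{stem}/page",
        "data": f"{stem}/data",
    }


uris = mint_uris("http://demo.openlinksw.com/Northwind/Customer", "ALFKI")
for role, uri in uris.items():
    print(role, uri)
```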
Linked Data Publication
[Figure: Linked Data publication pipeline]
Types of data: Structured Data (Relational Database, Data Source with API), Text
Data Preparation: Entity Extractor (e.g. Calais); RDF-izers for CSV, XML, Excel; RDB-to-RDF Wrapper (e.g. D2R); CMS with RDFa Output (e.g. Drupal); Custom Linked Data wrapper
Data storage: RDF Store, RDF files
Data Publication: Linked Data Interface (e.g. Pubby), Web Server (e.g. Apache)
Linked Data on the Web
Publication Strategy
From unstructured sources
use NLP, text mining, annotation,…
OpenCalais, Ontos
From semi-structured sources
DBpedia, Linked GeoData, SCOVO,…
efficient bi-directional synchronization
From structured sources (relational database)
Declarative syntax and semantics of data model translation
RDB2RDF,…
Conversion of Database
Books
ID | Author | Title | Publisher | Year
ISBN0-00-651409-X | id_xyz | The Glass Palace | id_qpr | 2000

Authors
ID | Name | Homepage
id_xyz | Ghosh, Amitav | http://www.amitavghosh.com

Publishers
ID | Publisher Name | City
id_qpr | Harper Collins | London
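The conversion of these tables can be sketched as a simple rule: each row becomes a subject URI, each column a predicate, and each foreign key an RDF link to another row's URI. The base URI, the vocabulary namespace and the function name below are hypothetical; real mapping tools let you configure all of this declaratively.

```python
BASE = "http://www.example.com/"  # hypothetical namespace for the data
VOCAB = BASE + "vocab#"           # hypothetical namespace for the predicates


def row_to_triples(table: str, row: dict, foreign_keys: dict):
    """Map one relational row to (subject, predicate, object) triples.

    foreign_keys maps a column name to the table it references.
    """
    subject = f"{BASE}{table}/{row['ID']}"  # primary key -> subject URI
    triples = []
    for column, value in row.items():
        if column == "ID":
            continue
        predicate = VOCAB + column.replace(" ", "_").lower()
        if column in foreign_keys:
            # Foreign key -> link to the URI of the referenced row.
            obj = ("uri", f"{BASE}{foreign_keys[column]}/{value}")
        else:
            obj = ("literal", value)
        triples.append((subject, predicate, obj))
    return triples


book = {"ID": "ISBN0-00-651409-X", "Author": "id_xyz", "Title": "The Glass Palace",
        "Publisher": "id_qpr", "Year": "2000"}
book_triples = row_to_triples("books", book,
                              {"Author": "authors", "Publisher": "publishers"})
for triple in book_triples:
    print(triple)
```

Because the author and publisher objects are URIs rather than copied strings, the Authors and Publishers rows can be converted independently and the graph still connects.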
Conversion of Database
Tools for mapping RDB to Linked Data
D2R Server for customizable mappings from relational databases to ontologies [Bizer, Cyganiak 06]
Browser-based tools for defining RDB-to-RDF mappings [Zhou, Xu, Chen, Idehen 08]
Triplify [Auer, Dietzold, Lehmann, Hellmann, Aumueller 09]
OpenLink Data Spaces [Idehen, Erling 08]
RDF Features Best Avoided
Do not use the full expressivity of the RDF data model; use a subset of the RDF features.
No blank nodes: it is impossible to set external RDF links to a blank node.
Do not use RDF reification, as the semantics of reification are unclear and reified statements are cumbersome to query with the SPARQL query language.
Metadata can be attached to the information resource instead.
Be careful before using RDF collections or RDF containers: they do not work well together with SPARQL.
7. Link to other Data sets
• Types of Linking
• Linking manually
• Automatic generation of Link
Link ! Reuse !!
Reuse. Do not reinvent the wheel…
The URIs are de-referenceable.
For instance, using the DBpedia URI http://dbpedia.org/page/Doom to identify the computer game Doom gives you an extensive description of the game, including abstracts in 10 different languages and various classifications.
The URIs are already linked to URIs from other data sources.
For instance, you can navigate from the DBpedia URI http://dbpedia.org/resource/Innsbruck to data about Innsbruck provided by Geonames and EuroStat.
Therefore, by using concept URIs from these datasets, you interlink your data with a rich and fast-growing network of other data sources.
Types of Linking to other Data Sets
Relationship Links
point at related things in other data sources, for instance, other people, places or genes.
<http://www.skyhigh.com/people/GilDong>
    rdf:type foaf:Person ;
    foaf:name "Hong, Gil-Dong" ;
    foaf:based_near <http://dbpedia.org/resource/Seoul> ;
    foaf:topic_interest <http://dbpedia.org/resource/Justice> ;
    foaf:knows <http://dbpedia.org/resource/HalBingDang> .
Identity Links
point at URI aliases used by other data sources to identify the same real-world object or abstract concept.
<http://www.skyhigh.com/people/GilDong> <http://www.w3.org/2002/07/owl#sameAs> <http://www.korea.org/history/hero> .
Vocabulary Links
point to the definitions of related terms in other vocabularies
<http://www.university.org/terms/professor>
    rdf:type rdfs:Class ;
    rdfs:subClassOf <http://dbpedia.org/ontology/Person> ;
    rdfs:subClassOf <http://sw.opencyc.org/concept/Mx4rvbGdrcN5Y29ycA> ;
    owl:equivalentClass <http://rdf.dictionary.com/entry/facultyMember> .
Link to other Data Sets
URI aliases
In an open environment like the Web it often happens that different information providers talk about the same non-information resource. As they do not know about each other, they introduce different URIs for identifying the same real-world object.
http://dbpedia.org/resource/Berlin
http://sws.geonames.org/2950159/
URI aliases provide an important social function to the Web of Data as they are de-referenced to different descriptions of the same non-information resource and thus allow different views and opinions to be expressed.
owl:sameAs
Common Properties: rdfs:seeAlso, foaf:knows, foaf:based_near, foaf:topic_interest,…
Two approaches for linking data:
Setting RDF links manually
Auto-generating RDF links
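A consumer that wants to merge descriptions of the same thing has to follow owl:sameAs links transitively. The sketch below groups URI aliases with a small union-find structure; the third URI in the sample data is hypothetical, and treating owl:sameAs as a plain equivalence relation is a simplification.

```python
def alias_groups(same_as_links):
    """Group URIs that are connected, directly or indirectly, by owl:sameAs links."""
    parent = {}

    def find(uri):
        # Follow parent pointers to the group's representative (with path halving).
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in same_as_links:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


links = [
    ("http://dbpedia.org/resource/Berlin", "http://sws.geonames.org/2950159/"),
    # Hypothetical third publisher that only links to Geonames:
    ("http://www.example.org/city/berlin", "http://sws.geonames.org/2950159/"),
]
groups = alias_groups(links)
print(len(groups), len(groups[0]))  # 1 3
```

Even though the example.org URI never mentions DBpedia, both end up in the same group, which is how a few identity links per data set add up to a connected global graph.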
RDF Links Manually
Find similar data sets as suitable linking targets, then manually search in these for the URI references you want to link to.
If a data source doesn't provide a search interface, you can use Linked Data browsers like Tabulator or Disco to explore the dataset and find the right URIs.
Useful sites:
Sindice and Falcons provide indexes to identify candidate URIs for linking.
CKAN site: a registry of open linked data and projects.
Uriqr - A URI Search Engine: http://dev.uriqr.com/
Freebase: http://www.freebase.com
MOAT (Meaning Of A Tag): framework for manually interlinking tags with Semantic Web URIs (such as URIs from DBpedia, Geonames … or any knowledge base)
Remember that data sources might use HTTP-303 redirects to redirect clients from URIs identifying non-information resources to URIs identifying information resources that describe the non-information resources.
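The 303 behaviour can be sketched as follows. This is only an illustration: fetch is a hypothetical callable returning (status, headers, body), and the canned responses below are invented for the example rather than recorded from a live server.

```python
def describe(uri, fetch, max_hops=5):
    """Follow HTTP 303 redirects from a resource URI to the document describing it.

    `fetch` is a hypothetical callable: fetch(uri) -> (status, headers, body).
    Returns the list of URIs visited and the body of the final document.
    """
    hops = [uri]
    for _ in range(max_hops):
        status, headers, body = fetch(uri)
        if status == 303:
            # "See Other": the description lives at a different URI.
            uri = headers["Location"]
            hops.append(uri)
            continue
        if status == 200:
            return hops, body
        raise RuntimeError(f"unexpected HTTP status {status} for {uri}")
    raise RuntimeError("too many redirects")


# Canned responses standing in for a real server (assumption for this sketch).
FAKE_WEB = {
    "http://dbpedia.org/resource/Alec_Empire":
        (303, {"Location": "http://dbpedia.org/data/Alec_Empire"}, ""),
    "http://dbpedia.org/data/Alec_Empire":
        (200, {"Content-Type": "application/rdf+xml"}, "<rdf:RDF/>"),
}


def fake_fetch(uri):
    return FAKE_WEB[uri]


hops, body = describe("http://dbpedia.org/resource/Alec_Empire", fake_fetch)
```

When linking manually, the URI to use as the link target is the first entry of hops (the non-information resource), not the document URI reached after the redirect.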
Auto-generating RDF Links
Various approaches:
• Pattern-based algorithms
• Similarity-based approaches
• Complex property-based algorithms
  Yves Equivalence Miner: interlinking Jamendo and Musicbrainz.
Equivalence mining and matching frameworks:
• Silk: a link discovery framework for the Web of Data. Silk can be run on a single machine or on a Hadoop cluster (for instance on Amazon EC2).
• LIMES (Link Discovery Framework for Metric Spaces): time-efficient and lossless approaches for large-scale link discovery based on the characteristics of metric spaces.
• DSNotify: detecting and fixing broken links in Linked Data sets.
• TopBraid Composer: a wizard for linking ontology instances to corresponding DBpedia concepts.
• SemMF: a flexible framework for calculating semantic similarity between objects that are represented as arbitrary RDF graphs.
See also: http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/EquivalenceMining
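A similarity-based approach can be sketched in a few lines. This is a toy illustration, not how Silk or LIMES work internally: it compares labels with the standard-library difflib ratio, and the example.org URIs, the labels, and the 0.9 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def similarity(a, b):
    """Case-insensitive string similarity in the range 0..1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def generate_links(source, target, threshold=0.9):
    """Emit owl:sameAs links (as N-Triples lines) for the best label matches.

    `source` and `target` map resource URIs to labels.
    """
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        # Only keep matches that clear the (assumed) confidence threshold.
        if best_uri is not None and best_score >= threshold:
            links.append(f"<{s_uri}> <{OWL_SAME_AS}> <{best_uri}> .")
    return links


# Hypothetical data sets describing music artists.
source = {
    "http://example.org/a/artist/1": "Atari Teenage Riot",
    "http://example.org/a/artist/2": "Alec Empire",
}
target = {
    "http://example.org/b/artist/x": "atari teenage riot",
    "http://example.org/b/artist/y": "Someone Else",
}
links = generate_links(source, target)
```

Comparing every source label with every target label is quadratic in the size of the data sets, which is why the frameworks listed above invest in blocking and metric-space techniques for large-scale link discovery.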
Publishing Descriptions of a Data set
Help others discover and index your data
Apply a license or waiver to your data set
Metadata about the published linked data set: its authorship, its currency (i.e., how recently the data set was updated), its provenance, and its licensing terms
Important issues:
Provenance
• the ability to track the origin of data
• a key component in building trustworthy, reliable applications
• Open Provenance Model
Licenses vs. Waivers
• Norms: a means for data publishers who waive their legal rights (through application of a waiver) to define the expectations they have about how the data is used
Two primary mechanisms:
• Semantic Sitemaps: http://sw.deri.org/2007/07/sitemapextension/
• voiD: http://semanticweb.org/wiki/VoiD
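A minimal sketch of generating a voiD description as Turtle. The helper void_description and all example.org URIs are hypothetical; the vocabulary terms used (void:Dataset, void:sparqlEndpoint, void:exampleResource, dcterms:title, dcterms:license) come from the voiD and Dublin Core vocabularies, and CC0 is shown only as one example of a waiver.

```python
def void_description(dataset_uri, title, license_uri, sparql_endpoint, example_resource):
    """Build a small voiD description of a data set, serialized as Turtle."""
    # Escape backslashes and quotes so the title stays a valid Turtle literal.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{safe_title}" ;',
        f"    dcterms:license <{license_uri}> ;",
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:exampleResource <{example_resource}> .",
    ]
    return "\n".join(lines)


# All values below are placeholders for illustration.
turtle = void_description(
    dataset_uri="http://example.org/dataset/heroes",
    title="Heroes data set",
    license_uri="http://creativecommons.org/publicdomain/zero/1.0/",
    sparql_endpoint="http://example.org/sparql",
    example_resource="http://example.org/resource/GilDong",
)
print(turtle)
```

Publishing such a file alongside the data gives crawlers and indexes a single place to find the license, an entry point, and the query endpoint.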
Description: the triples of the data set that have the resource's URI as the subject.
Backlinks: the triples of the data set that have the resource's URI as the object. This is redundant, but it allows browsers and crawlers to traverse links in either direction.
Related descriptions: any additional information about related resources, e.g., answering a request for information about a book with information about its author as well. Use in moderation so that the description is not overloaded.
Metadata: metadata about the published data, such as a URI identifying the author and licensing information.
Syntax: there are various ways to serialize RDF descriptions. At least provide RDF/XML, which is the only official syntax for RDF; additionally provide Turtle, TriX, and other serializations.
Data Set Description: Example
# Metadata and Licensing Information
<http://dbpedia.org/data/Alec_Empire>
    rdfs:label "RDF description of Alec Empire" ;
    rdf:type foaf:Document ;
    dc:publisher <http://dbpedia.org/resource/DBpedia> ;
    dc:date "2007-07-13"^^xsd:date ;
    dc:rights <http://en.wikipedia.org/wiki/WP:GFDL> .

# The description
<http://dbpedia.org/resource/Alec_Empire>
    foaf:name "Empire, Alec" ;
    rdf:type foaf:Person ;
    rdf:type <http://dbpedia.org/class/yago/musician> ;
    rdfs:comment "Alec Empire (born May 2, 1972) is a German musician who is ..."@en ;
    rdfs:comment "Alec Empire (eigentlich Alexander Wilke) ist ein deutscher Musiker. ..."@de ;
    dbpedia:genre <http://dbpedia.org/resource/Techno> ;
    dbpedia:associatedActs <http://dbpedia.org/resource/Atari_Teenage_Riot> ;
    foaf:page <http://en.wikipedia.org/wiki/Alec_Empire> ;
    foaf:page <http://dbpedia.org/page/Alec_Empire> ;
    rdfs:isDefinedBy <http://dbpedia.org/data/Alec_Empire> ;
    owl:sameAs <http://zitgist.com/music/artist/d71ba53b-23b0-4870-a429-cce6f345763b> .
Data Set Description: Example
# Backlinks
<http://dbpedia.org/resource/60_Second_Wipeout>
    dbpedia:producer <http://dbpedia.org/resource/Alec_Empire> .
<http://dbpedia.org/resource/Limited_Editions_1990-1994>
    dbpedia:artist <http://dbpedia.org/resource/Alec_Empire> .
Publishing Linked Data
Serialization of Data
RDF files shouldn't be larger than, say, a few hundred kilobytes; break larger data up into several RDF files.
Make sure the resulting RDF files are linked to each other through RDF triples.
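The splitting advice can be sketched as follows, assuming the data is available as N-Triples lines. The function split_ntriples, the part0.nt naming scheme, and the choice of rdfs:seeAlso as the linking property are assumptions for this sketch; the slides only ask that the files be linked through RDF triples.

```python
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def split_ntriples(lines, base_uri, max_bytes):
    """Split N-Triples lines into documents of roughly max_bytes each.

    Each document except the last gets one extra rdfs:seeAlso triple that
    points to the next document, so crawlers can walk the whole data set.
    Returns a dict mapping document URI -> document text.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        line_size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if current and size + line_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += line_size
    if current:
        chunks.append(current)

    documents = {}
    for index, chunk in enumerate(chunks):
        doc_uri = f"{base_uri}part{index}.nt"
        body = list(chunk)
        if index + 1 < len(chunks):
            next_uri = f"{base_uri}part{index + 1}.nt"
            body.append(f"<{doc_uri}> <{RDFS_SEE_ALSO}> <{next_uri}> .")
        documents[doc_uri] = "\n".join(body) + "\n"
    return documents


# Five small placeholder triples and a deliberately tiny size limit.
triples = [
    f'<http://example.org/s{i}> <http://example.org/p> "v{i}" .' for i in range(5)
]
documents = split_ntriples(triples, "http://example.org/data/", max_bytes=120)
```

In practice max_bytes would be a few hundred kilobytes; the small limit here only makes the splitting visible on a handful of triples.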
Publication
Method | Advantages | Disadvantages
RDF/XML document | Oldest, best supported | Confusingly like normal XML
Turtle (N3) document | Simplest | Not technically a standard yet
HTML document with RDFa | Fits inside HTML, but also RDF | Can get very complicated
JSON | Normal JSON, but also RDF | Promising, but still being developed
GRDDL | Use the XML you have/want | Needs to download and run XSLT
SPARQL | Query Protocol | Query Protocol
Examples
RDF/XML
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:db="http://dbpedia.org/resource/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Massachusetts">
    <db:Governor>
      <rdf:Description rdf:about="http://dbpedia.org/resource/Deval_Patrick" />
    </db:Governor>
    <db:Nickname>Bay State</db:Nickname>
    <db:Capital>
      <rdf:Description rdf:about="http://dbpedia.org/resource/Boston">
        <db:Nickname>Beantown</db:Nickname>
      </rdf:Description>
    </db:Capital>
  </rdf:Description>
</rdf:RDF>
Turtle
@prefix db: <http://dbpedia.org/resource/> .
db:Massachusetts db:Governor db:Deval_Patrick ;
    db:Nickname "Bay State" ;
    db:Capital db:Boston .
db:Boston db:Nickname "Beantown" .
Examples
RDFa
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:db="http://dbpedia.org/resource/"
      version="XHTML+RDFa 1.0">
  <head>
    <title>About Massachusetts</title>
  </head>
  <body>
    <div about="http://dbpedia.org/resource/Massachusetts">
      The Massachusetts governor is
      <span rel="db:Governor">
        <span about="http://dbpedia.org/resource/Deval_Patrick">Deval Patrick</span>,
      </span>
      the nickname is "<span property="db:Nickname">Bay State</span>",
      and the capital
      <span rel="db:Capital">
        <span about="http://dbpedia.org/resource/Boston">has the nickname "<span property="db:Nickname">Beantown</span>".
        </span>
      </span>
    </div>
  </body>
</html>
Examples
RDF-JSON
{
  "__iri": "db:Massachusetts",
  "db:Nickname": "Bay State",
  "db:Governor": { "__iri": "db:Deval_Patrick" },
  "db:Capital": {
    "__iri": "db:Boston",
    "db:Nickname": "Beantown"
  },
  "__prefixes": { "db:": "http://dbpedia.org/resource/" }
}
GRDDL
<MyDataSet xmlns="http://example.org/my-data-xml-namespace">
  <State>
    <name>Massachusetts</name>
    <governor>Deval_Patrick</governor>
    <nickname>Bay State</nickname>
    <capital>
      <name>Boston</name>
      <nickname>Beantown</nickname>
    </capital>
  </State>
</MyDataSet>
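GRDDL itself relies on an XSLT transformation referenced from the XML document. As a rough illustration of what such a transformation has to produce, the sketch below performs the same mapping with Python's standard-library ElementTree instead of XSLT; the mapping of element names to db: properties is an assumption chosen to match the RDF/XML and Turtle examples.

```python
import xml.etree.ElementTree as ET

NS = "{http://example.org/my-data-xml-namespace}"
DB = "http://dbpedia.org/resource/"

XML = """<MyDataSet xmlns="http://example.org/my-data-xml-namespace">
  <State>
    <name>Massachusetts</name>
    <governor>Deval_Patrick</governor>
    <nickname>Bay State</nickname>
    <capital>
      <name>Boston</name>
      <nickname>Beantown</nickname>
    </capital>
  </State>
</MyDataSet>"""


def to_ntriples(xml_text):
    """Map the custom XML vocabulary to RDF triples (N-Triples lines)."""
    root = ET.fromstring(xml_text)
    triples = []
    for state in root.findall(NS + "State"):
        # findtext only looks at direct children, so the capital's
        # name and nickname are not confused with the state's.
        state_uri = DB + state.findtext(NS + "name")
        governor_uri = DB + state.findtext(NS + "governor")
        state_nick = state.findtext(NS + "nickname")
        triples.append(f"<{state_uri}> <{DB}Governor> <{governor_uri}> .")
        triples.append(f'<{state_uri}> <{DB}Nickname> "{state_nick}" .')

        capital = state.find(NS + "capital")
        capital_uri = DB + capital.findtext(NS + "name")
        capital_nick = capital.findtext(NS + "nickname")
        triples.append(f"<{state_uri}> <{DB}Capital> <{capital_uri}> .")
        triples.append(f'<{capital_uri}> <{DB}Nickname> "{capital_nick}" .')
    return triples


triples = to_ntriples(XML)
for line in triples:
    print(line)
```

The resulting four triples are the same statements expressed by the RDF/XML, Turtle, and RDFa examples, which is the point of GRDDL: the publisher keeps the XML format they already have.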
Linked Data Storage
RDB to RDF middleware
• D2R Server
Native RDF storage (manage it yourself)
• 4Store
• AllegroGraph
• Bigdata
• BigOWLIM
• Jena TDB
• Neo4j
• Sesame
• Virtuoso
Native RDF storage (managed)
• Talis Platform
Pubby: Linked Data front-end for SPARQL endpoints
Paget framework
Testing and Debugging Linked Data
Test published data to ensure it adheres to the Linked Data principles and best practices.
Checking that URIs dereference correctly:
• Vapour Linked Data Validator: http://idi.fundacionctic.org/vapour
• RDF:Alerts: http://swse.deri.org/RDFAlerts/
• Sindice Inspector: http://inspector.sindice.com/
Manual validation and debugging of Linked Data:
• cURL and the Firefox browser extensions LiveHTTPHeaders and ModifyHeaders, for technical debugging and validation
• Linked Data browsers can also be used: Tabulator, Marbles, LOD Browser Switch
Summary: Linked Data
Semantic technologies need to go where the data is!
Long live semantic technology!
Early adoption of semantic technology is king!
Linked Data is the common global data space.
Gun for killer apps of semantic technology…
Catalyst and enabler to make semantic technology real…
Unlimited opportunities ahead…
Growth in data volumes is very rapid.
Link, Integrate, Reuse
Linked Data is a truly Web-friendly way of publishing data.
References
Keith Alexander, Richard Cyganiak, Michael Hausenblas, and Jun Zhao, Describing linked datasets, In Proceedings of the WWW2009 Workshop on Linked Data on the Web, 2009.
Tim Berners-Lee, Linked Data - Design Issues, 2006, http://www.w3.org/DesignIssues/LinkedData.html
Tim Berners-Lee, Giant global graph, 2007, http://dig.csail.mit.edu/breadcrumbs/node/215
Christian Bizer, Tom Heath, and Tim Berners-Lee, Linked data - the story so far, Int. J. Semantic Web Inf. Syst., 5(3):1–22, 2009.
Chris Bizer, Richard Cyganiak, and Tom Heath, How to Publish Linked Data on the Web, http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
W3C Working Draft, Cool URIs for the Semantic Web, http://www.w3.org/TR/2008/WD-cooluris-20080321/
http://data.gov.uk/linked-data
http://www.w3.org/2001/sw/Specs.html
S. Auer, S. Dietzold, J. Lehmann, S. Hellmann, and D. Aumueller, Triplify: lightweight linked data publication from relational databases, In Proceedings of the 18th International Conference on World Wide Web, WWW 2009, Madrid, Spain, April 20-24, 2009.
A Survey of current approaches for mapping of relational databases to RDF, http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt
Miles et al., Best Practices Recipes for Publishing RDF Vocabularies, http://www.w3.org/TR/swbp-vocab-pub/