tutorial kcc-2011

101
2011.06.30 Linked Data: Enabler of Semantic Web Sung-Kook Han Semantic Technology Lab Won Kwang Univ. 1 [email protected]

Upload: won-kwang-university

Post on 10-May-2015


DESCRIPTION

Semantic Web and Linked Data

TRANSCRIPT

Page 1: Tutorial kcc-2011

2011.06.30

Linked Data: Enabler of Semantic Web

Sung-Kook Han, Semantic Technology Lab, Won Kwang Univ.

[email protected]

Page 2: Tutorial kcc-2011

Outline

Introduction to Semantic Technology

Semantic Technology + Web Technology

• Semantic Web

• Web 2.0

• Linked Data

Design and Publication of Linked Data

• 9 steps towards Linked Open Data


Page 3: Tutorial kcc-2011

Why Semantic Technology??

the ways of thinking, cognition…

George Boole: An Investigation of the Laws of Thought (1854)

Claude Shannon: 1937 master's thesis, A Symbolic Analysis of Relay and Switching Circuits

Kurt Gödel, Alan Turing, John von Neumann


Page 4: Tutorial kcc-2011

Why Semantic Technology??

Final Goal: Intelligence


Page 5: Tutorial kcc-2011

Our Computers


Page 6: Tutorial kcc-2011

Communication

Human vs. Human

Human vs. Alien

Human vs. Computer

Computer vs. Computer


Page 7: Tutorial kcc-2011

Semantic Technology

Semantic technology has been a distinct research field for more than 40 years.

Formal Logic (since Russell and Frege)

Knowledge Representation Systems in AI

Semantic Networks and ATN (William Woods, 1975)

DARPA and European Commission programs in information integration

Development of simple tractable logics

Relational Algebras and Schemas in Database Systems

Library Science (classifications, thesauri, taxonomies)

New challenges of Semantic Technology: Semantic Web

A massive store of information that computers cannot use

A way to get around needing the “big data warehouse”

Another place where “a little semantics can go a long way”...

cf: The Relationship Between Web 2.0 And the Semantic Web - Dr. Mark Greaves, Vulcan, Inc.


Page 8: Tutorial kcc-2011


Ontology Spectrum

[Figure: the ontology spectrum, ordered from weak semantics to strong semantics, with example diagrams of an animal taxonomy and an organizational model.]

Taxonomy: "is sub-classification of"

Thesaurus: "has narrower meaning than"

Conceptual Model: "is subclass of"

Logical Theory: "is disjoint subclass of, with transitivity property"

Formalisms along the spectrum: Relational Model, XML; ER; Extended ER; DB Schemas, XML Schema; UML; RDF/S; XTM; Description Logic; DAML+OIL, OWL; First Order Logic; Modal Logic

Interoperability levels: Syntactic Interoperability, Structural Interoperability, Semantic Interoperability

Based on Leo Obrst, The Ontology Spectrum & Semantic Models

Page 9: Tutorial kcc-2011

Semantic Technology

[Figure: Semantic Technology connects machine-processible semantics with digital information resources.]

Machine-processible Semantics: Ontology, Metadata, controlled vocabulary

Digital Information Resources: Web resources, Services, Image, Audio/Video, Documents

Integration, Intelligence, Interoperability


Page 10: Tutorial kcc-2011

Web Technology

Web of machine-processible Data

Common vocabularies: Metadata and Ontology

Query and reasoning

Web of Services

Internet of Services

Internet of Things

Social Web

Connects human beings

Web as a platform

Programmable APIs and proprietary interfaces

Mashups based on a fixed set of data sources

Classic Web

Web of Documents

HTML as document format

HTTP URLs as globally unique IDs

Hyperlinks to connect everything


Page 11: Tutorial kcc-2011

Semantic Web

Standardizations Trio of Semantic Web

Metadata / Ontology: RDF, RDFS, OWL

Query Language: SPARQL

Rule Language: RIF (SWRL)

SKOS, RDFa, GRDDL, WSMO,…

SOAP/ REST

Tools and Systems: Authoring, Reasoning Engines,…

835 items in Sweet Tools

Best Practices: Linked Open Data

Semantic MediaWiki

NEPOMUK, SIOC, Garlik

W3C Semantic Web Use cases

Sweet Tools: http://www.mkbergman.com/new-version-sweet-tools-sem-web/

W3C Semantic Web Case Studies and Use Cases: http://www.w3.org/2001/sw/sweo/public/UseCases/


Page 12: Tutorial kcc-2011

Semantic Applications

Semantic Wave 2008, Industry Roadmap to Web 3.0, Project10X

http://www.mkbergman.com/new-version-sweet-tools-sem-web/


Page 13: Tutorial kcc-2011

Web 2.0

Resharpen the way of viewing the Web
• Web as the platform
• Web as the social media
• Web as the collaboration tool
• Web as ……

Web 2.0 Manifestation
• Openness / Sharing
• Participation / Collaboration

Web 2.0 Syndrome
• Library 2.0
• Government 2.0
• Enterprise 2.0
• ……

New Web applications
• wiki, blog, RSS,…


Page 14: Tutorial kcc-2011

Web 2.0 Developers


Page 15: Tutorial kcc-2011

Semantic Web Today

Major future issues:

• Vocabularies
• Scalability
• Provenance
• Personal Infospheres
• Mobile and Real World Networks


Page 16: Tutorial kcc-2011

Web 2.0 APIs Today

[Figure: a mashup drawing on three Web APIs over data sources A, B and C.]

No single global data space:

• Mashups of APIs are proprietary.
• No links between data.

Web APIs slice the Web into Walled Gardens.

Christian Bizer: Pay-as-you-go Data Integration (21/9/2010)


Page 17: Tutorial kcc-2011

The Web is Dead??

http://www.wired.com/magazine/2010/08/ff_webrip/


Page 18: Tutorial kcc-2011

Long Live the Web !

http://www.scientificamerican.com/article.cfm?id=long-live-the-web


Page 19: Tutorial kcc-2011

Lessons Learned

Data is more important than API code.

Data is the Intel Inside.

Open data is more important than open source

Structured data is more valuable than unstructured.

We should seek to structure our data well.

Metadata will play a core role of data structure.

A little semantics goes a long way.

Be aware of the usefulness of the shallow ontologies shown in LOD.

Linking data and services is essential.

Link everything.

Rich user experiences are the key to adoption.

We should consider mobile computing and personalization.

Visualize and navigate.


Page 20: Tutorial kcc-2011

Semantic Web &

Linked Data

Page 21: Tutorial kcc-2011

Web of Documents

A global file system of documents (document silos on the Web).

Implicit semantics of content and links

Designed for human consumption

Disconnected data


Page 22: Tutorial kcc-2011

Architecture: Web of Documents

[Figure: HTML documents over databases DB-A, DB-B and DB-C, connected by hyperlinks (document links) and reached through HTTP URLs by Web browsers and search engines.]

Analogy: a global file system

Designed for: human consumption

Primary objects: documents

Links between: documents (or sub-parts of)

Degree of structure in objects: fairly low

Main usage: search and browsing

Semantics of content and links: implicit


Page 23: Tutorial kcc-2011

Machine-Processible Data

[Figure: information resources move from the Web of Documents (documents, human processible) to the Web of Data (databases and data, machine processible).]

• Open the data silos and get rid of the repository-centric mindset
• Publish data of public interest on the Web
• In a way that other applications can access and interpret the data
• Using common Web technologies


Page 24: Tutorial kcc-2011

Semantic Web: Web of Data

The vision of a Semantic Web:

building a global Web of machine-readable data

Berners-Lee, Hendler & Lassila, 2001; Marshall & Shipman, 2003

Linked Data Foundation

can lower the barrier to reuse, integration and application of data from multiple, distributed and heterogeneous sources.

the more sophisticated proposals associated with the Semantic Web vision, such as intelligent agents, may become a reality.

The first step is putting data on the Web in a form that machines can naturally understand, or converting it to that form. This creates what I call a Semantic Web - a web of data that can be processed directly or indirectly by machines. Therefore, while the Semantic Web, or Web of Data, is the goal or the end result of this process, Linked Data provides the means to reach that goal. -- Tim Berners-Lee, et al., http://linkeddata.org/docs/ijswis-special-issue, Jan, 2009



Page 25: Tutorial kcc-2011

Linked Data: Web of Data

Goal: Web-scale Data Integration

Alternative to classic data integration systems in order to cope with the growing number of data sources.

Querying across data sources

Global distributed database

Extend the Web with a single global data space

Giant Global Graph (GGG)

Demonstrate the possibility of Semantic Web

By using RDF to publish structured data

By setting links between data

[Figure: interlinked RDF data sets forming a single universal information space.]


Page 26: Tutorial kcc-2011

Architecture: Linked Data

[Figure: RDF triples over databases DB-A, DB-B and DB-C, connected by RDF links (data links) and reached through HTTP URIs by Linked Data browsers, search engines and Linked Data mashups.]

Analogy: a global database

Designed for: machines first, humans later

Primary objects: things (or descriptions (data) of things)

Links between: things

Degree of structure in (descriptions of) things: high

Main usage: query, navigation and reasoning

Semantics of content and links: explicit


Page 27: Tutorial kcc-2011

Linked Data Principles

Set of best practices for publishing structured data on the Web in accordance with the general architecture of the Web.

Use URIs as names for things, not just for documents or homepages.

Use HTTP URIs so that people can look up those names.

When someone looks up a URI, provide useful RDF information.

Include RDF statements that link to other URIs so that they can discover related things.

[Figure: things named by HTTP URIs, described by RDF triple information and connected by RDF links.]


Page 28: Tutorial kcc-2011

Linked Open Data

Community effort to

publish existing open license datasets as Linked Data on the Web

interlink things between different data sources

develop clients that consume Linked Data from the Web

began early 2007


Page 29: Tutorial kcc-2011

LOD Data sets on the Web

http://richard.cyganiak.de/2007/10/lod/lod-datasets_2010-09-22_colored.svg

25 billion RDF triples, which are interlinked by around 395 million RDF links (Sep. 2010).


Page 30: Tutorial kcc-2011

Summary: Web of Linked Data

A global, distributed database built on a simple set of standards

RDF, URI, HTTP

Explicit semantics of content and links

Resources are connected by semantic links.

creating a single global data graph that spans data sources

enables the discovery of new data sources

Provides for data co-existence

Anyone can publish data to the Web of Linked Data

Data publishers are not constrained in choice of vocabularies with which to represent data.

Designed for computers first, humans later


Page 31: Tutorial kcc-2011

Data.Gov


Page 32: Tutorial kcc-2011

Europeana

European digital library Europeana: this European Commission initiative encompasses not only libraries but also museums, archives and other holders of cultural heritage material.

http://version1.europeana.eu/web/europeana-project


Page 33: Tutorial kcc-2011

Linked Library Cloud

Libraries have been producing metadata for ages.

Libraries (often) produce high-quality metadata.

Libraries develop many metadata standards, such as DC, SKOS, BIBO and OAI-ORE, as well as MARC 21, MODS, FRBR, ...

Integrate library catalogues on a global scale

http://code4lib.org/conference/2010/singer


Page 34: Tutorial kcc-2011

Linking Open Drug Data

Linking the various sources of drug data together to answer interesting scientific and business questions.

Survey publicly available data sets about drugs

Publish and interlink these data sets on the Web

Explore interesting questions that could be answered if the data sets are linked.

8 million RDF triples, which are interlinked by more than 370,000 RDF links (as of August 2009)


Page 35: Tutorial kcc-2011

BBC Semantic Project

Publish program / music data as RDF/XML or RDFa

Build semantically linked and annotated web pages about artists and singers whose songs are played on BBC radio stations.

semantically interconnected


Page 36: Tutorial kcc-2011

DBpedia Mobile

Show map with information about nearby locations

Linked data browser

GPS + Google Maps + DBpedia + Flickr + Revyu


Page 37: Tutorial kcc-2011

Attention by Search Engines

Yahoo!

crawls Linked Data in its RDFa serialization as well as Microformats

Yahoo SearchMonkey to make search results more useful and visually appealing

provides access to crawled data through the Yahoo BOSS API

Google

uses the Social Graph API

is developing Google Squared and Google Fusion Tables

acquired Metaweb

manages Freebase, a DBpedia/YAGO competitor

Rich Snippets


Page 38: Tutorial kcc-2011

Linked Open Commerce


Page 39: Tutorial kcc-2011

Design and Publication

of

Linked Data

Page 40: Tutorial kcc-2011

9 Steps to publishing Linked Data

1. Understand the principles

2. Set up your infrastructure for Linked Data

3. Understand your data

4. Create vocabularies

5. Choose URIs for things in your data

6. Triplify data sets

7. Link to other data sets

8. Describe your data sets

9. Publicize your data sets


Page 41: Tutorial kcc-2011

1. Understand Linked Data

• Principle
• Core Stack
• Data Modeling

Page 42: Tutorial kcc-2011

Linked Data: Overview

Benefits of Linked Data: enables web-scale distributed data publication with web-based discovery mechanisms.

Linked Data Web Resources are generic real-world data objects or entities:

People, Places, and other physical things

Abstract concepts (e.g., emotion, notion,…)

Subject matter (e.g., science, economics, arts,…)

Linked Data is not just structured data published on the Web.

Linked Data is based on well-established Web standards

Linked Data adds value: less redundancy, greater discoverability, network effects.


Page 43: Tutorial kcc-2011

Linked Data Principles (TimBL, 2006)

Use URIs as names for things

not just for documents

http://dbpedia.org/resource/ontology

you are not your homepage

http://mentalist.com/actor/patrick_jane

Use HTTP URIs

globally unique names, distributed ownership

allows people to look up those names

Provide useful information in RDF

when someone looks up a URI

Include RDF links to other URIs

to enable discovery of related information


Page 44: Tutorial kcc-2011

5 Star rating

1 star: On the web, open licensed. Available on the web (whatever format), but with an open license

2 stars: Machine-readable data. Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)

3 stars: Non-proprietary format (e.g. CSV instead of Excel)

4 stars: RDF standards. Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff

5 stars: Linked RDF. Link your data to other people's data to provide context


Page 45: Tutorial kcc-2011

Linked Data Core Stack

http://linkeddata-specs.info/

RFC 2616, Hypertext Transfer Protocol (HTTP/1.1)
• Defines HTTP, a generic and stateless application-level protocol for distributed, collaborative, hypermedia information systems.

RFC 3986, Uniform Resource Identifier (URI): Generic Syntax
• Defines a generic URI syntax and a process for resolving URI references that might be in relative form, along with guidelines and security considerations for the use of URIs on the Internet.

RDF Concepts and Abstract Syntax
• Defines the RDF graph data model and key concepts.

SPARQL Query Language for RDF
• Defines the syntax and semantics of the SPARQL query language for RDF.


Page 46: Tutorial kcc-2011

Core Technology

Uniform Resource Identifier (URI)

Names (identifiers) for resources in an open Web environment

Resource Description Framework (RDF)

a model for representing metadata on the web

triple structure

RDF Schema and OWL

languages for defining vocabularies

RDF/XML, N3, Turtle,…

serialization and de-serialization of RDF triples for exchanging RDF data

Simple Knowledge Organization System (SKOS)

a language for describing controlled vocabularies

SPARQL

a query language and protocol for accessing RDF data via the Web


Page 47: Tutorial kcc-2011

Linked Data Modeling

Data Modeling: RDF data model to publish structured data on the Web

Data Linking: RDF links to interlink data from different data sources

RDF triple: subject, predicate, and object

Subject: URI identifying the described resource

Predicate: the relation that exists between subject and object; predicates come from vocabularies, collections of URIs that can be used to represent information about a certain domain

Object: a simple literal value, or the URI of another resource that is related to the subject
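The subject / predicate / object structure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names (`URI`, `Literal`, `Triple`), the helper `to_ntriples` and the subject URI under `example.org` are hypothetical, and the serializer does not escape special characters as a real N-Triples writer would. The predicate URI is the `dbp-prop:title` property used elsewhere in this tutorial.

```python
from typing import NamedTuple, Union


class URI(str):
    """A resource identifier, kept as a plain string subclass."""


class Literal(NamedTuple):
    """A simple literal value with an optional language tag."""
    value: str
    lang: str = ""


class Triple(NamedTuple):
    subject: URI                  # the described resource
    predicate: URI                # a term taken from a vocabulary
    obj: Union[URI, Literal]      # another resource or a literal


def to_ntriples(t: Triple) -> str:
    """Serialize one triple in N-Triples style (no character escaping)."""
    if isinstance(t.obj, Literal):
        o = f'"{t.obj.value}"' + (f"@{t.obj.lang}" if t.obj.lang else "")
    else:
        o = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {o} ."


# Hypothetical subject URI; the predicate is dbp-prop:title.
title_triple = Triple(
    URI("http://example.org/book/lotr"),
    URI("http://dbpedia.org/property/title"),
    Literal("The Lord of the Rings", "en"),
)
print(to_ntriples(title_triple))
```

An object that is itself a `URI` is what turns a plain statement into an RDF link between two resources.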


Page 48: Tutorial kcc-2011

Linked Data Model

Flexible graph-based model: RDF graph

URI: global primary key

skos:subject = http://www.w3.org/2004/02/skos/core#subject dbp-prop:title = http://dbpedia.org/property/title

The HTTP protocol brings together identification and retrieval again.

[Figure: "Deeper into the Web", an RDF graph around the book http://.../isbn/46316. Node labels: The Lord of the rings, English novels, J.R.R. Tolkien, wkp-en:J.R.R.Tolkien, dbpidia:Allen&Unwin, fb:guid…..92df7, London, Marivie, 83 Alexander St. Edge labels: dbp-prop:title, skos:subject, dbp-prop:author, dbp-prop:name, foaf:homepage, dbp-prop:publisher, dbp-prop:city, opencyc:headquarter, fb:creator, fb:street_address.]
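Prefixed names such as skos:subject and dbp-prop:title are shorthand for full URIs, which is what makes a URI usable as a global primary key. A minimal sketch of that expansion follows; the function name `expand` and the `PREFIXES` table are hypothetical, and only the two prefix bindings given on this slide are included.

```python
# Prefix bindings as given on the slide.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dbp-prop": "http://dbpedia.org/property/",
}


def expand(name: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name (prefix:local) into a full URI."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


print(expand("skos:subject"))
print(expand("dbp-prop:title"))
```

Two data sets that use the same expanded URI for a property are talking about the same property, regardless of which prefix each of them chose.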


Page 49: Tutorial kcc-2011

2. Setup Infrastructure

• Basic Infrastructure
• Systems and Tools


Page 50: Tutorial kcc-2011

Basic Infrastructure

[Figure: basic infrastructure for Linked Data.]

Layers: Data/Content, RDF Triple Base, Interface, Delivery, Application

Components: DB, extraction, conversion, link generation, triple store, index, SPARQL Query Engine, Framework + APIs, Web Server (Apache), packaging, search / discovery / navigation, browser, navigator, search


Page 51: Tutorial kcc-2011

Infrastructure Construction

Configuration of Web server

Configure the server to send the correct MIME type, application/rdf+xml

Code samples for ConNeg and 303 Redirects: http://linkeddata.org/tools

Use cURL (http://curl.haxx.se/) to test the Apache configuration

Configure for hash URI or Slash URI

Testing your content negotiation

Install the LiveHTTPHeaders and Modify Headers extensions for Firefox

Try LiveHTTPHeaders against my URI

http://www.skyhigh.com/id/hong

do the same with URIs from other data sets

Modify your headers to ask for application/rdf+xml
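The same check can be scripted instead of done through browser extensions. The sketch below only builds the request with Python's standard `urllib.request`; the helper name `rdf_request` is hypothetical, and the URI is the example from this slide, which is not assumed to resolve.

```python
import urllib.request


def rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks the server for RDF/XML."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


req = rdf_request("http://www.skyhigh.com/id/hong")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))

# Sending it would be: urllib.request.urlopen(req)
# urlopen follows redirects, so the final URL and the Content-Type
# of the response show which representation the server chose.
```

Repeating the request with `text/html` in the Accept header and comparing the two responses is a quick way to confirm that content negotiation is working.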


Page 52: Tutorial kcc-2011

Supporting Technologies

Linked Data Browsers

provide for navigating between data sources and for exploring the dataspace.

Tabulator Browser (MIT, USA), Marbles (FU Berlin, DE), OpenLink RDF Browser (OpenLink, UK), Zitgist RDF Browser (Zitgist, USA), Disco Hyperdata Browser (FU Berlin, DE), Fenfire (DERI, Ireland)

Web of Data Search Engines

crawl the data space and provide best-effort query answers over crawled data.

Falcons (IWS, China), Sig.ma (DERI, Ireland), Swoogle (UMBC, USA),

VisiNav (DERI, Ireland), Watson (Open University, UK), TAP, Sindice


Page 53: Tutorial kcc-2011

Supporting Technologies

Describing data set

discovery and usage of linked datasets

voiD, Ding

Registry

an open registry of data and content packages

CKAN

Linking tool

discovering relationships between data items within different Linked Data sources

SILK

Mapping tool

mapping database to RDF triples

Triplify, D2R Server

LOD platform

D2R Server, Virtuoso Universal Server, Talis Platform, Pubby, …


Page 54: Tutorial kcc-2011

3. Understand Data to be published

• Review about Data to be published
• Requirement analysis


Page 55: Tutorial kcc-2011

Review about Data to be published

What: think about the key things to be presented in Linked Data

analysis of data properties

What vocabularies can be used to describe these?

Why: purposes and goals of the linked data to be published

What for: how to use and apply linked data (use cases)

How to serve:

Serving Linked Data as Static RDF/XML Files

Serving Linked Data as RDF Embedded in HTML Files

Serving RDF and HTML with Custom Server-Side Scripts

Serving Linked Data from Relational Databases

Serving Linked Data from RDF Triple Stores

Serving Linked Data by Wrapping Existing Application or Web APIs


Page 56: Tutorial kcc-2011

4. Create Vocabularies

• Vocabulary Creation
• Common Namespace
• Definition


Page 57: Tutorial kcc-2011

Guideline for Vocabulary Creation

Do not define new vocabularies from scratch, but complement existing vocabularies with additional terms (in your own namespace) to represent your data as required.

Provide for both humans and machines. Use rdfs:comment for each term invented. Always provide a label for each term using the rdfs:label property.

Make term URIs de-referenceable following the W3C Best Practice Recipes for Publishing RDF Vocabularies.

Make use of other people's terms, either by using them directly or by providing mappings to them by means of rdfs:subClassOf or rdfs:subPropertyOf.

State all important information explicitly. For example, state all ranges and domains explicitly.

Do not create over-constrained, brittle models; leave some flexibility for growth. Do not use full-featured OWL to define your vocabulary; unless you know exactly what you are doing, use RDF Schema to define vocabularies.


Page 58: Tutorial kcc-2011

Potential Ontologies / Vocabularies

Friend-of-a-Friend (FOAF), vocabulary for describing people.

Dublin Core (DC) defines general metadata attributes. See also their new domains and ranges draft.

Semantically-Interlinked Online Communities (SIOC), vocabulary for representing online communities.

Description of a Project (DOAP), vocabulary for describing projects.

Simple Knowledge Organization System (SKOS), vocabulary for representing taxonomies and loosely structured knowledge.

Music Ontology provides terms for describing artists, albums and tracks.

Review Vocabulary, vocabulary for representing reviews.

Creative Commons (CC), vocabulary for describing license terms

Geo, vocabulary for describing geographical locations

GoodRelations, vocabulary for describing products


Page 59: Tutorial kcc-2011

Common Namespaces

xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:dc="http://purl.org/dc/terms/"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:vcard="http://www.w3.org/2006/vcard/ns#"
xmlns:dbp="http://dbpedia.org/dbprop/"
xmlns:geo="http://www.geonames.org/ontology#"
xmlns:gr="http://purl.org/goodrelations/v1#"
xmlns:commerce="http://search.yahoo.com/searchmonkey/commerce/"
xmlns:media="http://search.yahoo.com/searchmonkey/media/"
xmlns:cb="http://cb.semsol.org/ns#"

More Common Namespaces:
http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/100-most-popular-rdf-namespaces


Page 60: Tutorial kcc-2011

Definition of Vocabulary

# Definition of the class "Lover"
<http://sites.movie.org/pub/LoveVocabulary#Lover>
    rdf:type rdfs:Class ;
    rdfs:label "Lover"@en ;
    rdfs:label "Liebender"@de ;
    rdfs:comment "A person who loves somebody."@en ;
    rdfs:comment "Eine Person die Jemanden liebt."@de ;
    rdfs:subClassOf foaf:Person .

# Definition of the property "loves"
<http://sites.movie.org/pub/LoveVocabulary#loves>
    rdf:type rdf:Property ;
    rdfs:label "loves"@en ;
    rdfs:label "liebt"@de ;
    rdfs:comment "Relation between a lover and a loved person."@en ;
    rdfs:subPropertyOf foaf:knows ;
    rdfs:domain <http://sites.movie.org/pub/LoveVocabulary#Lover> ;
    rdfs:range foaf:Person .


Page 61: Tutorial kcc-2011

Tools for Vocabulary Definition

Ontology editors

Protégé: an open-source ontology editor with a dedicated OWL plug-in

Neologism: Web-based tool for creating, managing and publishing simple RDFS vocabularies; open-source and implemented in PHP on top of the Drupal platform.

TopBraid Composer: a powerful commercial modeling environment for developing Semantic Web ontologies

NeOn Toolkit: an open-source ontology engineering environment with an extensive set of plug-ins.


Page 62: Tutorial kcc-2011

5. Choose URIs

• Resource Identification
• Types of URIs
• De-Referencing
• Common URI Patterns


Page 63: Tutorial kcc-2011

Resource Identification

Separation of Identity and Representation

Identity

Identity (URI) of an Object or Entity should be unambiguous and globally unique

Representation

On the Web a URI should provide an unambiguous data access path

Access

Reference to abstract (physically inaccessible) Objects or Entities is only achievable via conduit documents that carry representations of entity descriptions (which at best are facets of an entire description)

URI Requirements:

Keep out of other people's namespaces

Use a namespace that you control

Abstract away from implementation details (Short is better…)

Stable and persistent

Hash or Slash

Use common URI patterns


Page 64: Tutorial kcc-2011

URI

URI: Uniform Resource Identifier

http://www.example.com/people/alice

[Figure: does http://www.example.com/people/alice identify a home page (Web document) or an information object?]

URI: identification of people, products, places, ideas and concepts such as ontology classes, including URLs for Web documents

Two Approaches: hash URI and slash URI


Page 65: Tutorial kcc-2011

Hash / Slash URI

Hash URI

URIs can contain a fragment, a special part that is separated from the rest of the URI by a hash symbol ("#").

http://www.example.com/products/BiBimBab#this

http://www.travel.com/nation/Korea/KyungJu#main

simply publish a description document containing RDF about the things at the base URI

Slash URI

examples:

http://www.example.com/products/BiBimBab

http://www.travel.com/nation/Korea/KyungJu

must publish your description document at another, distinct URI.
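The difference is visible in what an HTTP client actually requests: the fragment of a hash URI never reaches the server. A small sketch using the standard library's `urldefrag` follows; the helper names `document_uri` and `is_hash_uri` are hypothetical.

```python
from urllib.parse import urldefrag


def document_uri(thing_uri: str) -> str:
    """Return the URI a client retrieves: the thing URI without its fragment."""
    return urldefrag(thing_uri).url


def is_hash_uri(thing_uri: str) -> bool:
    """True if the URI carries a fragment identifier."""
    return bool(urldefrag(thing_uri).fragment)


hash_uri = "http://www.example.com/products/BiBimBab#this"
slash_uri = "http://www.example.com/products/BiBimBab"

print(document_uri(hash_uri))    # the description document at the base URI
print(is_hash_uri(hash_uri), is_hash_uri(slash_uri))
```

For the slash URI nothing is stripped, so thing and document would share one URI unless the server redirects to a distinct document URI.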


Page 66: Tutorial kcc-2011

hash URI

http://www.skyhigh.com/person/GilDong#this

http://www.skyhigh.com/person/GilDong

Metadata: content-type: application/xhtml+xml

Data: <html xmlns=“.. <head><title> Our hero… </html>

Entity(GilDong)

Separating identification and naming from representation


Page 67: Tutorial kcc-2011

slash URI

http://www.skyhigh.com/person/hero/GilDong/id

http://www.skyhigh.com/person/hero/GilDong/page

http://www.skyhigh.com/person/hero/GilDong/data

Metadata: content-type: application/xhtml+xml

Data: <html xmlns=“.. <head><title> Our hero… </html>

Metadata: content-type: application/rdf+xml

Data: <html xmlns=“.. <head><title> Our hero… </html>

Entity(GilDong)

Separating identification and naming from representation


Page 68: Tutorial kcc-2011

Slash vs. Hash

Slash URI: HTTP redirection (30X response) is required in order for resource "Identity" to be separated from "representation":

http://www.skyhigh.com/person/hero/GilDong/id (URI of an Organization Entity)

http://www.skyhigh.com/person/hero/GilDong/page (HTML representation of Entity description)

http://www.skyhigh.com/person/hero/GilDong/data (RDF representation that describes the Entity, which could be a Turtle, N3, RDF/XML etc. based data serialization)

Hash URI: HTTP redirection isn't required in order for resource "Identity" to be separated from "representation":

http://demo.openlinksw.com/Northwind/Customer/ALFKI#this (URI of an Organization Entity)

http://demo.openlinksw.com/Northwind/Customer/ALFKI (a document: HTML, Turtle, N3 or RDF/XML representation of Entity description)


Page 69: Tutorial kcc-2011

DeReferencing Hash URI

Without content negotiation

• http://www.example.com/about#alice (ID)
• automatic truncation of fragment
• http://www.example.com/about (RDF)

With content negotiation

• http://www.example.com/about#alice (ID)
• automatic truncation of fragment
• http://www.example.com/about, followed by content negotiation
• application/rdf+xml wins: http://www.example.com/about.rdf (RDF)
• text/html wins: http://www.example.com/about.html (HTML)


Page 70: Tutorial kcc-2011

DeReferencing Slash URI

One generic document

• http://www.example.com/id/alice (ID)
• 303 redirected to the generic document http://www.example.com/doc/alice
• content negotiation on the generic document
• application/rdf+xml wins: http://www.example.com/doc/alice.rdf (RDF)
• text/html wins: http://www.example.com/doc/alice.html (HTML)

Different documents

• http://www.example.com/id/alice (ID)
• 303 redirected with content negotiation
• application/rdf+xml wins: http://www.example.com/doc/alice.rdf (RDF)
• text/html wins: http://www.example.com/doc/alice.html (HTML)
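The "different documents" variant can be sketched as a pure function that maps a request path and Accept header to a status code and a Location. This is a simplified model and not a server: the function name `dereference` is hypothetical, the /id/ and /doc/ layout follows the example URIs on this slide, and the negotiation ignores q-values and wildcards that a real implementation would have to honour.

```python
from typing import Optional, Tuple


def dereference(path: str, accept: str) -> Tuple[int, Optional[str]]:
    """Return (status, location) for a request against the slide's URI layout."""
    if path.startswith("/id/"):
        name = path[len("/id/"):]
        # Naive negotiation: RDF wins only when it is asked for explicitly.
        ext = "rdf" if "application/rdf+xml" in accept else "html"
        # 303 See Other: the thing itself cannot be sent, a document about it can.
        return 303, f"/doc/{name}.{ext}"
    if path.startswith("/doc/"):
        return 200, path          # documents are served directly
    return 404, None


print(dereference("/id/alice", "application/rdf+xml"))
print(dereference("/id/alice", "text/html"))
print(dereference("/doc/alice.rdf", "*/*"))
```

The "one generic document" variant would instead redirect every /id/ request to /doc/alice and run the negotiation there.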


Page 71: Tutorial kcc-2011

Content Negotiation


Page 72: Tutorial kcc-2011

Content Negotiation


Page 73: Tutorial kcc-2011

Common URI Pattern

http://dbpedia.org/resource/New_York_City (Thing)
http://dbpedia.org/data/New_York_City (RDF data)
http://dbpedia.org/page/New_York_City (HTML page)

http://revyu.com/people/tom (Thing)
http://revyu.com/people/tom/about/rdf (RDF data)
http://revyu.com/people/tom/about/html (HTML page)

http://www.bbc.co.uk/music/artists/db4624cf#artist (Thing)
http://www.bbc.co.uk/music/artists/db4624cf.rdf (RDF data)
http://www.bbc.co.uk/music/artists/db4624cf.html (HTML page)

http://id.dbpedia.org/Berlin (Thing)
http://data.dbpedia.org/Berlin (RDF data)
http://page.dbpedia.org/Berlin (HTML page)

http://www4.wiwiss.fu-berlin.de/bookmashup/books/006251587X (ISBN)
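Each of these patterns is a fixed rewrite from the thing URI to its two document URIs. As an illustration, the first (DBpedia) pattern could be written as below; the function name `dbpedia_variants` is hypothetical and the code only rewrites strings, it does not contact DBpedia.

```python
RESOURCE_BASE = "http://dbpedia.org/resource/"


def dbpedia_variants(resource_uri: str) -> dict:
    """Derive the RDF-data and HTML-page URIs from a /resource/ URI."""
    if not resource_uri.startswith(RESOURCE_BASE):
        raise ValueError(f"not a /resource/ URI: {resource_uri}")
    name = resource_uri[len(RESOURCE_BASE):]
    return {
        "thing": resource_uri,
        "data": f"http://dbpedia.org/data/{name}",
        "page": f"http://dbpedia.org/page/{name}",
    }


for kind, uri in dbpedia_variants("http://dbpedia.org/resource/New_York_City").items():
    print(kind, uri)
```

Whichever pattern is chosen, applying it uniformly across the data set lets consumers guess the related URIs without extra lookups.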


Page 74: Tutorial kcc-2011

Choosing URI

http://www.culture.com/LOD/class/member

http://www.culture.com/LOD/class/member.rdf

http://www.culture.com/LOD/class/member.html

Examples:

URI of an Organization Entity

http://demo.openlinksw.com/Northwind/Customer/ALFKI/id

HTML representation of Entity description

http://demo.openlinksw.com/Northwind/Customer/ALFKI/page

RDF representation that describes the Entity, which could be a Turtle, N3, RDF/XML etc. based data serialization

http://demo.openlinksw.com/Northwind/Customer/ALFKI/data


Page 75: Tutorial kcc-2011

6. Triplify Data Sets

• Publication Strategies
• Conversion of Database


Page 76: Tutorial kcc-2011

Linked Data Publication

[Figure: Linked Data publication workflow, from source data to Linked Data on the Web]

Types of data: Structured Data, Text

Data Preparation: Entity Extractor (e.g. Calais); RDF-izers for CSV, XML, Excel

Data storage: Relational Database, Data Source, Data Source with API, RDF Store, RDF files

Data Publication: RDB-to-RDF Wrapper (e.g. D2R), CMS with RDFa Output (e.g. Drupal), Custom Linked Data wrapper, Linked Data Interface (e.g. Pubby), Web Server (e.g. Apache)

Result: Linked Data on the Web

Page 77: Tutorial kcc-2011

Publication Strategy

Strategy

From unstructured sources

use NLP, text mining, annotation,…

OpenCalais, Ontos

From semi-structured sources

DBpedia, LinkedGeoData, SCOVO,…

efficient bi-directional synchronization

From structured sources (relational database)

Declarative syntax and semantics of data model translation

RDB2RDF,…

Page 78: Tutorial kcc-2011

Conversion of Database

[Figure: relational schema with the tables Books, Authors and Publishers, populated as follows]

Books

ID | Author | Title | Publisher | Year
ISBN0-00-651409-X | id_xyz | The Glass Palace | id_qpr | 2000

Authors

ID | Name | Home page
id_xyz | Ghosh, Amitav | http://www.amitavghosh.com

Publishers

ID | Publisher Name | City
id_qpr | Harper Collins | London
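As a rough illustration of what such a conversion produces, the three tables can be turned into N-Triples with plain Python. This is a hand-written sketch, not how D2R or Triplify work internally: the namespace `http://example.org/`, the vocabulary `http://example.org/vocab#` and all helper names are assumptions made for the example. The key idea is that primary keys become resource URIs and foreign keys become links between resources.

```python
BASE = "http://example.org/"          # hypothetical namespace for the resources
VOCAB = "http://example.org/vocab#"   # hypothetical vocabulary for the columns

books = [{"ID": "ISBN0-00-651409-X", "Author": "id_xyz",
          "Title": "The Glass Palace", "Publisher": "id_qpr", "Year": "2000"}]
authors = [{"ID": "id_xyz", "Name": "Ghosh, Amitav",
            "Home page": "http://www.amitavghosh.com"}]
publishers = [{"ID": "id_qpr", "Publisher Name": "Harper Collins",
               "City": "London"}]


def literal(value):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def uri(path):
    return f"<{BASE}{path}>"


def prop(name):
    return f"<{VOCAB}{name}>"


def triplify():
    """Map every row to a resource and every column to a property."""
    triples = []
    for b in books:
        s = uri("book/" + b["ID"])
        triples.append((s, prop("title"), literal(b["Title"])))
        triples.append((s, prop("year"), literal(b["Year"])))
        # Foreign keys become links between resources, not literals
        triples.append((s, prop("author"), uri("author/" + b["Author"])))
        triples.append((s, prop("publisher"), uri("publisher/" + b["Publisher"])))
    for a in authors:
        s = uri("author/" + a["ID"])
        triples.append((s, prop("name"), literal(a["Name"])))
        triples.append((s, prop("homepage"), "<" + a["Home page"] + ">"))
    for p in publishers:
        s = uri("publisher/" + p["ID"])
        triples.append((s, prop("name"), literal(p["Publisher Name"])))
        triples.append((s, prop("city"), literal(p["City"])))
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    print("\n".join(triplify()))
```

In a real publication the hypothetical vocabulary would be replaced by existing terms (for instance from Dublin Core or FOAF) wherever possible.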

Page 79: Tutorial kcc-2011

Conversion of Database

Tools for mapping RDB to Linked Data

D2R Server for customizable mappings from relational databases to ontologies

[Bizer, Cyganiak 06]

Browser-based tools for defining RDB-to-RDF mappings

[Zhou, Xu, Chen, Idehen 08]

Triplify [Auer, Dietzold, Lehmann, Hellmann, Aumueller 09]

OpenLink Data Spaces [Idehen, Erling 08]

Page 80: Tutorial kcc-2011

RDF Features Best Avoided

Do not use the full expressivity of the RDF data model.

Use a subset of the RDF features

No blank nodes.

It is impossible to set external RDF links to a blank node.

Do not use RDF reification: the semantics of reification are unclear, and reified statements are cumbersome to query with the SPARQL query language.

Metadata can be attached to the information resource instead.

Be careful before using RDF collections or RDF containers, as they do not work well together with SPARQL.
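A quick check for the first rule can be scripted. The sketch below scans N-Triples text for blank nodes in subject or object position; `find_blank_nodes` is a hypothetical helper, and the line splitting is deliberately naive (it assumes one well-formed triple per line), so it is a sanity check rather than a parser.

```python
def find_blank_nodes(ntriples):
    """Return the set of blank node labels used as subject or object."""
    labels = set()
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip empty lines and comments
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # not a complete triple
        subject, _predicate, rest = parts
        obj = rest.rstrip(" .").strip()  # drop the terminating " ."
        if subject.startswith("_:"):
            labels.add(subject)
        if obj.startswith("_:"):
            labels.add(obj)
    return labels
```

Any label reported here identifies a node that other data sets cannot link to; giving it an HTTP URI instead removes the problem.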

Page 81: Tutorial kcc-2011

7. Link to other Data sets

• Types of Linking
• Linking manually
• Automatic generation of Link

Page 82: Tutorial kcc-2011

Link ! Reuse !!

Reuse. Do not reinvent the wheel…

The URIs are de-referenceable.

For instance, using the DBpedia URI http://dbpedia.org/page/Doom to

identify the computer game Doom gives you an extensive description of

the game including abstracts in 10 different languages and various

classifications.

The URIs are already linked to URIs from other data sources.

For instance, you can navigate from the DBpedia URI

http://dbpedia.org/resource/Innsbruck to data about Innsbruck provided by

Geonames and EuroStat.

Therefore, by using concept URIs from these datasets, you interlink your

data with a rich and fast-growing network of other data sources.

Page 83: Tutorial kcc-2011

Types of Linking to other Data Sets

Relationship Links

point at related things in other data sources, for instance, other people, places or genes.

<http://www.skyhigh.com/people/GilDong>

rdf:type foaf:Person ;

foaf:name "Hong, Gil-Dong" ;

foaf:based_near <http://dbpedia.org/resource/Seoul> ;

foaf:topic_interest <http://dbpedia.org/resource/Justice> ;

foaf:knows <http://dbpedia.org/resource/HalBingDang> .

Identity Links

point at URI aliases used by other data sources to identify the same real-world object or abstract concept.

<http://www.skyhigh.com/people/GilDong> <http://www.w3.org/2002/07/owl#sameAs> <http://www.korea.org/history/hero> .

Vocabulary Links

point to the definitions of related terms in other vocabularies

<http://www.university.org/terms/professor>

rdf:type rdfs:Class ;

rdfs:subClassOf <http://dbpedia.org/ontology/Person> ;

rdfs:subClassOf <http://sw.opencyc.org/concept/Mx4rvbGdrcN5Y29ycA> ;

owl:equivalentClass <http://rdf.dictionary.com/entry/facultyMember> .
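The three link types can be told apart mechanically. The following sketch classifies a triple by its predicate; the function name `link_type`, the predicate sets, and the rule "different host name means different data source" are all simplifying assumptions for illustration.

```python
from urllib.parse import urlparse

OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Assumed predicate sets; a real data set may use further linking properties
IDENTITY = {OWL + "sameAs"}
VOCABULARY = {
    RDFS + "subClassOf", RDFS + "subPropertyOf",
    OWL + "equivalentClass", OWL + "equivalentProperty",
}


def link_type(subject, predicate, obj):
    """Classify a triple as an external link, or None if it is not one."""
    if not obj.startswith(("http://", "https://")):
        return None  # the object is a literal, not a link
    if urlparse(subject).netloc == urlparse(obj).netloc:
        return None  # same host: treated as a link inside one data source
    if predicate in IDENTITY:
        return "identity"
    if predicate in VOCABULARY:
        return "vocabulary"
    return "relationship"
```

Applied to the slide's examples, the `foaf:based_near` triple is a relationship link, the `owl:sameAs` triple an identity link, and the `rdfs:subClassOf` triple a vocabulary link.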

Page 84: Tutorial kcc-2011

Link to other Data Sets

URI aliases

In an open environment like the Web it often happens that different information providers talk about the same non-information resource. As they do not know about each other, they introduce different URIs for identifying the same real-world object.

http://dbpedia.org/resource/Berlin

http://sws.geonames.org/2950159/

URI aliases provide an important social function to the Web of Data as they are de-referenced to different descriptions of the same non-information resource and thus allow different views and opinions to be expressed.

owl:sameAs

Common properties: rdfs:seeAlso, foaf:knows, foaf:based_near, foaf:topic_interest,…

Two approaches for linking data:

Setting RDF links manually

Auto-generating RDF links

Page 85: Tutorial kcc-2011

RDF Links Manually

Find similar data sets that are suitable linking targets, then manually search them for the URI references you want to link to.

If a data source doesn't provide a search interface, you can use Linked Data browsers like Tabulator or Disco to explore the dataset and find the right URIs.

Useful sites:

Sindice and Falcons: indexes for identifying candidate URIs for linking.

CKAN: a registry of open linked data and projects.

Uriqr - A URI Search Engine: http://dev.uriqr.com/

Freebase: http://www.freebase.com

MOAT (Meaning Of A Tag): a framework for manually interlinking tags with Semantic Web URIs (such as URIs from DBpedia, Geonames … or any knowledge base)

Remember that data sources might use HTTP-303 redirects to redirect clients from URIs identifying non-information resources to URIs identifying information resources that describe the non-information resources.

Page 86: Tutorial kcc-2011

Auto-generating RDF Links

Various approaches

Pattern-based Algorithms

Similarity-based Approaches

Complex property-based Algorithms

Yves Equivalence Miner: interlinking Jamendo and Musicbrainz.

Equivalence Mining and Matching Frameworks

Silk - A Link Discovery Framework for the Web of Data. Silk can be run on a single machine or on a Hadoop cluster (for instance Amazon EC2).

LIMES - Link Discovery Framework for Metric Spaces. Time-efficient and lossless approaches for large-scale link discovery based on the characteristics of metric spaces.

DSNotify - Detecting and Fixing Broken Links in Linked Data Sets

TopBraid Composer: a wizard for linking ontology instances to corresponding DBpedia concepts.

SemMF: a flexible framework for calculating semantic similarity between objects that are represented as arbitrary RDF graphs.

http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/EquivalenceMining
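To make the similarity-based approach concrete, here is a deliberately simple sketch that compares labels with Python's standard `difflib` and emits `owl:sameAs` links above a threshold. It is not the algorithm used by Silk, LIMES or any other tool listed above; the function names, the label normalization and the threshold of 0.9 are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

SAME_AS = "<http://www.w3.org/2002/07/owl#sameAs>"


def normalize(label):
    """Lower-case the label, drop commas and collapse whitespace."""
    return " ".join(label.lower().replace(",", " ").split())


def similarity(a, b):
    """String similarity in [0, 1] between two normalized labels."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def generate_links(source, target, threshold=0.9):
    """source and target map URI -> label; returns owl:sameAs N-Triples lines."""
    links = []
    for s_uri, s_label in source.items():
        best_uri, best_score = None, 0.0
        for t_uri, t_label in target.items():
            score = similarity(s_label, t_label)
            if score > best_score:
                best_uri, best_score = t_uri, score
        # Only the best candidate is linked, and only if it is similar enough
        if best_uri is not None and best_score >= threshold:
            links.append(f"<{s_uri}> {SAME_AS} <{best_uri}> .")
    return links
```

Comparing every source label with every target label is quadratic, which is one reason the frameworks above invest in blocking and metric-space techniques for large data sets. Links generated this way should be reviewed before publication, since label similarity alone can match different things with the same name.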

Page 87: Tutorial kcc-2011

8. Describe Data Sets

• Metadata for Description

Page 88: Tutorial kcc-2011

Publishing Descriptions of a Data set

Help others discover and index your data

Apply a license or waiver to your data set

Metadata about the published linked data set

authorship of the data set, its currency (i.e., how recently the data set was updated), its provenance, and its licensing terms

Important issues:

Provenance

the ability to track the origin of data

key component in building trustworthy, reliable applications

Open Provenance Model

Licenses vs. Waivers

Norms: a means for data publishers who waive their legal rights (through application of a waiver) to define expectations they have about how the data is used

Two primary mechanisms:

Semantic Sitemaps: http://sw.deri.org/2007/07/sitemapextension/

voiD: http://semanticweb.org/wiki/VoiD

Page 89: Tutorial kcc-2011

Description

Description

All triples of the data set that have the resource's URI as the subject.

Backlinks

All triples of the data set that have the resource's URI as the object. This is redundant, but it allows browsers and crawlers to traverse links in either direction.

Related descriptions

Any additional information about related resources, e.g., answering a request for information about a book with information about its author as well. Take a moderate approach and do not overload the description.

Metadata

Metadata about the published data, such as a URI identifying the author and licensing information.

Syntax

There are various ways to serialize RDF descriptions. At least provide RDF descriptions as RDF/XML, which is the only official syntax for RDF. Additionally provide Turtle, TriX, and other serializations.

Page 90: Tutorial kcc-2011

Data Set Description: Example

# Metadata and Licensing Information
<http://dbpedia.org/data/Alec_Empire>
    rdfs:label "RDF description of Alec Empire" ;
    rdf:type foaf:Document ;
    dc:publisher <http://dbpedia.org/resource/DBpedia> ;
    dc:date "2007-07-13"^^xsd:date ;
    dc:rights <http://en.wikipedia.org/wiki/WP:GFDL> .

# The description
<http://dbpedia.org/resource/Alec_Empire>
    foaf:name "Empire, Alec" ;
    rdf:type foaf:Person ;
    rdf:type <http://dbpedia.org/class/yago/musician> ;
    rdfs:comment "Alec Empire (born May 2, 1972) is a German musician who is ..."@en ;
    rdfs:comment "Alec Empire (eigentlich Alexander Wilke) ist ein deutscher Musiker. ..."@de ;
    dbpedia:genre <http://dbpedia.org/resource/Techno> ;
    dbpedia:associatedActs <http://dbpedia.org/resource/Atari_Teenage_Riot> ;
    foaf:page <http://en.wikipedia.org/wiki/Alec_Empire> ;
    foaf:page <http://dbpedia.org/page/Alec_Empire> ;
    rdfs:isDefinedBy <http://dbpedia.org/data/Alec_Empire> ;
    owl:sameAs <http://zitgist.com/music/artist/d71ba53b-23b0-4870-a429-cce6f345763b> .

Page 91: Tutorial kcc-2011

Data Set Description: Example

# Backlinks
<http://dbpedia.org/resource/60_Second_Wipeout>
    dbpedia:producer <http://dbpedia.org/resource/Alec_Empire> .

<http://dbpedia.org/resource/Limited_Editions_1990-1994>
    dbpedia:artist <http://dbpedia.org/resource/Alec_Empire> .
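Selecting the description and the backlinks for a resource amounts to two filters over the data set's triples. The sketch below assumes the triples are already available as `(subject, predicate, object)` tuples of strings; `describe` is a hypothetical helper name and the sample data abbreviates the Alec Empire example.

```python
def describe(resource, triples):
    """Split the triples about a resource into description and backlinks."""
    # Description: triples with the resource's URI as the subject
    description = [t for t in triples if t[0] == resource]
    # Backlinks: triples with the resource's URI as the object
    backlinks = [t for t in triples if t[2] == resource and t[0] != resource]
    return description, backlinks


DB = "http://dbpedia.org/resource/"
SAMPLE = [
    (DB + "Alec_Empire", "foaf:name", "Empire, Alec"),
    (DB + "Alec_Empire", "dbpedia:genre", DB + "Techno"),
    (DB + "60_Second_Wipeout", "dbpedia:producer", DB + "Alec_Empire"),
    (DB + "Limited_Editions_1990-1994", "dbpedia:artist", DB + "Alec_Empire"),
    (DB + "Techno", "rdfs:label", "Techno"),
]
```

Serving both lists in the same document is what lets a crawler that arrives at Alec Empire discover the albums that point to him.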

Page 92: Tutorial kcc-2011

9. Publish Data Sets

• Serialization
• Linked Data Storage
• Test and Debugging

Page 93: Tutorial kcc-2011

Publishing Linked Data

Serialization of Data

RDF files shouldn't be larger than, say, a few hundred kilobytes. Break them up into several RDF files

Make sure multiple RDF files are linked to each other through RDF triples.
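One way to follow both rules at once is to split a list of N-Triples lines into size-limited parts and chain the parts with `rdfs:seeAlso`. This is a sketch under assumptions: the URI template `http://example.org/data/part{n}.nt` and the function name are hypothetical, the choice of `rdfs:seeAlso` as the linking property is one option among several, and the size limit ignores the one extra link triple added to each part.

```python
SEE_ALSO = "<http://www.w3.org/2000/01/rdf-schema#seeAlso>"


def split_dataset(lines, max_bytes,
                  uri_template="http://example.org/data/part{n}.nt"):
    """Split N-Triples lines into parts and link each part to the next one."""
    chunks, current, size = [], [], 0
    for line in lines:
        line_size = len(line.encode("utf-8")) + 1  # +1 for the newline
        if current and size + line_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += line_size
    if current:
        chunks.append(current)

    docs = {}
    for i, chunk in enumerate(chunks):
        doc = uri_template.format(n=i + 1)
        body = list(chunk)
        if i + 1 < len(chunks):
            # Link this document to the next so crawlers can follow the chain
            nxt = uri_template.format(n=i + 2)
            body.append(f"<{doc}> {SEE_ALSO} <{nxt}> .")
        docs[doc] = body
    return docs
```

In practice it is usually better to split along resource boundaries (one document per described resource) than by raw size, so that each URI dereferences to a self-contained description.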

Publication

Method | Advantages | Disadvantages
RDF/XML Document | Oldest, best supported | Confusingly like normal XML
Turtle (N3) Document | Simplest | Not technically a standard yet
HTML Document with RDFa | Fits inside HTML, but also RDF | Can get very complicated
JSON | Normal JSON, but also RDF | Promising, but still being developed
GRDDL | Use the XML you have/want | Needs to download+run XSLT
SPARQL | Query Protocol | Query Protocol

Page 94: Tutorial kcc-2011

Examples

RDF/XML

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:db="http://dbpedia.org/resource/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Massachusetts">
    <db:Governor>
      <rdf:Description rdf:about="http://dbpedia.org/resource/Deval_Patrick" />
    </db:Governor>
    <db:Nickname>Bay State</db:Nickname>
    <db:Capital>
      <rdf:Description rdf:about="http://dbpedia.org/resource/Boston">
        <db:Nickname>Beantown</db:Nickname>
      </rdf:Description>
    </db:Capital>
  </rdf:Description>
</rdf:RDF>

Turtle

@prefix db: <http://dbpedia.org/resource/> .

db:Massachusetts db:Governor db:Deval_Patrick ;
    db:Nickname "Bay State" ;
    db:Capital db:Boston .
db:Boston db:Nickname "Beantown" .

Page 95: Tutorial kcc-2011

Examples

RDFa

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:db="http://dbpedia.org/resource/"
      version="XHTML+RDFa 1.0">
  <head>
    <title>About Massachusetts</title>
  </head>
  <body>
    <div about="http://dbpedia.org/resource/Massachusetts">
      The Massachusetts governor is
      <span rel="db:Governor">
        <span about="http://dbpedia.org/resource/Deval_Patrick">Deval Patrick</span>,
      </span>
      the nickname is "<span property="db:Nickname">Bay State</span>",
      and the capital
      <span rel="db:Capital">
        <span about="http://dbpedia.org/resource/Boston">
          has the nickname "<span property="db:Nickname">Beantown</span>".
        </span>
      </span>
    </div>
  </body>
</html>

Page 96: Tutorial kcc-2011

Examples

RDF-JSON

{
  "__iri": "db:Massachusetts",
  "db:Nickname": "Bay State",
  "db:Governor": { "__iri": "db:Deval_Patrick" },
  "db:Capital": {
    "__iri": "db:Boston",
    "db:Nickname": "Beantown"
  },
  "__prefixes": { "db:": "http://dbpedia.org/resource/" }
}

GRDDL

<MyDataSet xmlns="http://example.org/my-data-xml-namespace">
  <State>
    <name>Massachusetts</name>
    <governor>Deval_Patrick</governor>
    <nickname>Bay State</nickname>
    <capital>
      <name>Boston</name>
      <nickname>Beantown</nickname>
    </capital>
  </State>
</MyDataSet>

Page 97: Tutorial kcc-2011

Linked Data Storage

RDB to RDF Middleware: D2R Server

Native RDF Storage (manage it yourself): 4Store, AllegroGraph, Bigdata, BigOWLIM, Jena TDB, Neo4j, Sesame, Virtuoso

Native RDF Storage (managed): Talis Platform

Pubby: Linked Data front-end for SPARQL Endpoints

Paget Framework

Page 98: Tutorial kcc-2011

Testing and Debugging Linked Data

Test published data to ensure it adheres to the Linked Data principles and best practices.

Checking that URIs dereference correctly

Vapour Linked Data Validator at http://idi.fundacionctic.org/vapour

RDF:Alerts at http://swse.deri.org/RDFAlerts/

Sindice Inspector at http://inspector.sindice.com/

Manual validation and debugging of Linked Data

cURL, Firefox browser extensions LiveHTTPHeaders and ModifyHeaders

Linked Data browsers can also be used for technical debugging and validation

Tabulator, Marbles, LOD Browser Switch

Page 99: Tutorial kcc-2011

Summary: Linked Data

Semantic Technologies need to go where the data is !

Long Live Semantic Technology !

Early adoption of Semantic Technology is king !

Linked Data is the common global data space.

Gun for killer apps of semantic technology…

Catalyst and enabler to make semantic technology real…

Unlimited opportunities ahead…

Growth in data volumes is very rapid.

Link, Integrate, Reuse

Linked Data is a truly Web-friendly way of publishing data.

Page 100: Tutorial kcc-2011

References

Keith Alexander, Richard Cyganiak, Michael Hausenblas, and Jun Zhao, Describing linked datasets, In Proceedings of the WWW2009 Workshop on Linked Data on the Web, 2009.

Tim Berners-Lee, Linked Data - Design Issues, 2006, http://www.w3.org/DesignIssues/LinkedData.html

Tim Berners-Lee, Giant global graph, 2007, http://dig.csail.mit.edu/breadcrumbs/node/215

Christian Bizer, Tom Heath, and Tim Berners-Lee, Linked data - the story so far, Int. J. Semantic Web Inf. Syst., 5(3):1–22, 2009.

Chris Bizer, Richard Cyganiak, and Tom Heath, How to Publish Linked Data on the Web, http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/

W3C Working Draft, Cool URIs for the Semantic Web, http://www.w3.org/TR/2008/WD-cooluris-20080321/

http://data.gov.uk/linked-data

http://www.w3.org/2001/sw/Specs.html

Auer, S., Dietzold, S., Lehmann, J., Hellmann, S., and Aumueller, D., Triplify: lightweight linked data publication from relational databases, In Proceedings of the 18th International Conference on World Wide Web, WWW 2009, Madrid, Spain, April 20-24, 2009.

A Survey of current approaches for mapping of relational databases to RDF, http://esw.w3.org/topic/Rdb2RdfXG/StateOfTheArt

Miles et al., Best Practices Recipes for Publishing RDF Vocabularies, http://www.w3.org/TR/swbp-vocab-pub/

Page 101: Tutorial kcc-2011

Semantic Technology

Your World, Your Way

[email protected]
