linked (open) data

Linked (Open) Data INFO 4302 - April 18, 2011 Bernhard Haslhofer - Cornell University

Upload: bernhard-haslhofer

Post on 08-May-2015

DESCRIPTION

Lecture slides about the basics of Linked Data

TRANSCRIPT

Page 1: Linked (Open) Data

Linked (Open) Data

INFO 4302 - April 18, 2011

Bernhard Haslhofer - Cornell University

Page 2: Linked (Open) Data

Who am I?

• Postdoc at Cornell Information Science

• Research areas

  • linked data

  • user-contributed data (annotations)

  • (meta-)data interoperability

• Contact: [email protected]

Page 3: Linked (Open) Data

Today we talk about...

http://www.youtube.com/watch?v=5Cb3ik6zP2I

Page 4: Linked (Open) Data

Today we talk about...

• Movies, actors and other real-world entities

• How to make data about these entities available on the Web (Linked Data)

• Enabling technologies, best-practices and useful tools that help us in doing so

• Other Linked Data projects (BBC, LoC)

Page 5: Linked (Open) Data

Web Architecture Recap

Page 6: Linked (Open) Data

The World Wide Web (WWW)

• Internet != WWW != Google != Facebook

• Fundamental technologies

  • URI - a simple and generic syntax for identifiers

  • HTML - a markup language without formal schema binding

  • HTTP - a simple protocol to access and manipulate resources and resource representations in a distributed environment

• World Wide Web Consortium (W3C, http://www.w3.org)

Page 7: Linked (Open) Data

URIs

• Identification of resources via Uniform Resource Identifiers (URIs)

• Generic Syntax:

© Prof. Dr. Wolfgang Klas und Dr. Bernhard Haslhofer, WS 2009/10 - Multimediale Systeme 27 Semantic Multimedia (I): RDF 7-11

The generic syntax consists of a hierarchical sequence of components: scheme, authority, path, query, and fragment.

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

The scheme and path components are required, though the path may be empty.

Example URIs with components:

    foo://example.com:8042/over/there?name=ferret#nose
    \_/   \______________/\_________/ \_________/ \__/
     |           |            |            |        |
  scheme     authority       path        query   fragment
     |   _____________________|__
    / \ /                        \
    urn:example:animal:ferret:nose

The components are defined in more detail, e.g. authority may contain userinfo, host, and port. The path may be empty, absolute, or rootless.
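The decomposition above can be tried out with Python's standard urllib.parse module. This is only a sketch using the two example URIs from the slide; note that urlsplit calls the authority component "netloc".

```python
from urllib.parse import urlsplit

# The two example URIs from the generic-syntax diagram.
hierarchical = urlsplit("foo://example.com:8042/over/there?name=ferret#nose")
urn = urlsplit("urn:example:animal:ferret:nose")

# A hierarchical URI exposes all five components.
for name in ("scheme", "netloc", "path", "query", "fragment"):
    print(name, "=", repr(getattr(hierarchical, name)))

# The authority can be broken down further into host and port.
print("host =", hierarchical.hostname, "port =", hierarchical.port)

# A URN has no authority: everything after the scheme is a rootless path.
print(urn.scheme, repr(urn.netloc), urn.path)
```

The URN case shows why only the scheme and path are required: authority, query, and fragment are all empty there.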

[Figure: URI, comprising URL and URN]

Page 8: Linked (Open) Data

URIs / Resources

• Information Resource

  • web pages, images, product catalogs, etc.

  • all their essential characteristics can be conveyed in a message

  • e.g., http://www.flickr.com/user2/photos/image.jpg

• Non-Information Resource

  • other things such as dogs, people, this classroom, concepts

  • their essence is not information

  • e.g., http://www.example.com/ontology/meter

Page 9: Linked (Open) Data

HTTP

• A stateless request-response protocol in the client-server computing model

• HTTP methods: GET, POST, PUT, DELETE, ...

• Agents may use a URI to access the resource it references; this is called dereferencing the URI

Page 10: Linked (Open) Data

HTTP Content Negotiation

• A URI is not (necessarily) a filename

• Conneg = making available multiple resource representations via the same URI

[Figure: the URI http://example.com/The_Shining identifies one resource with three representations: Plain Text (text/plain), HTML (en) (text/html), and HTML (jp) (text/html)]
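To make the idea of content negotiation concrete, here is a minimal sketch of how a server might pick a representation from the client's Accept header. The function names (parse_accept, matches, negotiate) are hypothetical, and the sketch deliberately ignores language negotiation (Accept-Language) and most of the finer rules of real HTTP servers.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # the default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def matches(pattern, media_type):
    """True if an Accept pattern such as text/* covers the media type."""
    if pattern in ("*/*", media_type):
        return True
    return pattern.endswith("/*") and media_type.startswith(pattern[:-1])


def negotiate(accept_header, available):
    """Pick the first available media type the client accepts, or None."""
    for pattern, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in available:
            if matches(pattern, media_type):
                return media_type
    return None


available = ["text/html", "text/plain", "application/rdf+xml"]
print(negotiate("application/rdf+xml, text/html;q=0.5", available))
print(negotiate("text/*", available))
```

The same URI can therefore answer a browser with HTML and a data client with RDF, which is the mechanism Linked Data publishing relies on.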

Page 11: Linked (Open) Data

(X)HTML(5)

• A resource representation data format...

• ... for presentation markup

  • rendered by user agents (typically browsers)

  • focus on readability

  • less formal, user-friendly syntax and semantics

Page 12: Linked (Open) Data

Web Services

• Application-to-application communication based on the Web architecture

  • simple and open standards (HTTP, XML, JSON, ...)

  • send data from Application A to Application B through the Web

  • usually define some API

[Figure: Application A and Application B exchanging data through the Web]

Page 13: Linked (Open) Data

Linked Data

Page 17: Linked (Open) Data

Why Linked Data?

• There is lots of information on the Web

• ...valuable information that can be (re-)used

• Problem

  • information is usually expressed in the form of HTML documents

  • the underlying raw data are locked in closed data silos (mostly DBMS)

Page 19: Linked (Open) Data

Why Linked Data?

• The Web is successful because it provides

  • Uniform encoding (HTML)

  • Uniform addressing (URI)

  • Uniform transportation (HTTP)

  for the exchange of documents.

• Why not apply the same mechanism to the underlying data?

Page 22: Linked (Open) Data

What is Linked Data?

• A method to build a Web of Data

• Architectural style, set of standards

Page 23: Linked (Open) Data

What is Linked Data?

• A set of four principles

  • use URIs as names for things

  • use HTTP URIs so that people can look up those names

  • when someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)

  • include links to other URIs, so that they can discover more things

Page 24: Linked (Open) Data

Enabling Technologies

Page 25: Linked (Open) Data

Uniform Resource Identifiers (URI)

• Name and identify things (resources)

• Dereferenceable HTTP URIs

http://dbpedia.org/resource/The_Shining_(film)

http://rdf.freebase.com/ns/m/04fjzv

http://data.linkedmdb.org/resource/film/2014

Page 26: Linked (Open) Data

Resource Description Framework (RDF)

• A model for representing data on the Web

• Several statements (triples) form a graph

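[Figure: example RDF graph with the following statements]

http://dbpedia.org/resource/The_Shining_(film)  rdfs:label  "The Shining (film)"

http://dbpedia.org/resource/The_Shining_(film)  rdfs:label  (a second label in a non-Latin script; its characters were lost in extraction)

http://dbpedia.org/resource/The_Shining_(film)  rdf:type  http://dbpedia.org/ontology/Film

http://dbpedia.org/resource/The_Shining_(film)  dbpprop:starring  http://dbpedia.org/resource/Jack_Nicholson

http://dbpedia.org/resource/Jack_Nicholson  rdf:type  http://xmlns.com/foaf/0.1/Person

http://dbpedia.org/resource/Jack_Nicholson  foaf:name  "Jack Nicholson"

http://dbpedia.org/resource/Jack_Nicholson  dbpedia-owl:birthDate  "1937-04-22"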
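The graph on this slide can be modelled, as a rough sketch, with nothing more than a set of (subject, predicate, object) tuples. The abbreviated names and the match helper below are illustrative assumptions, not part of any RDF library; a real application would use one of the RDF APIs listed later in the deck.

```python
# Abbreviated names are used as plain strings to keep the sketch short.
SHINING = "dbpedia:The_Shining_(film)"
JACK = "dbpedia:Jack_Nicholson"

# Each statement (triple) is one edge of the graph.
graph = {
    (SHINING, "rdfs:label", "The Shining (film)"),
    (SHINING, "rdf:type", "dbpedia-owl:Film"),
    (SHINING, "dbpprop:starring", JACK),
    (JACK, "rdf:type", "foaf:Person"),
    (JACK, "foaf:name", "Jack Nicholson"),
    (JACK, "dbpedia-owl:birthDate", "1937-04-22"),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Follow a link in the graph: who stars in the film, and what are they called?
stars = [o for _, _, o in match(graph, s=SHINING, p="dbpprop:starring")]
names = [o for star in stars for _, _, o in match(graph, s=star, p="foaf:name")]
print(names)
```

Because the object of one triple can be the subject of another, a handful of independent statements already forms a traversable graph.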

Page 27: Linked (Open) Data

RDF serialization (RDF/XML, N3, Turtle, etc.)

• Data formats for RDF resource representations

• Used to transfer RDF data between apps

N3/Turtle example:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .

<http://dbpedia.org/resource/The_Shining_%28film%29> rdf:type dbpedia-owl:Work , dbpedia-owl:Film .

@prefix dbpprop: <http://dbpedia.org/property/> .
@prefix ns9: <http://dbpedia.org/datatype/> .

<http://dbpedia.org/resource/The_Shining_%28film%29> dbpprop:runtime "146.0"^^ns9:minute .
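To show that the serialization formats are just different spellings of the same triples, the following sketch expands the prefixed names from the Turtle example and prints the statements as N-Triples, one full triple per line. The helper names (expand, term, to_ntriples) are hypothetical, and the sketch does no literal escaping, so it is not a general-purpose serializer.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "dbpprop": "http://dbpedia.org/property/",
    "ns9": "http://dbpedia.org/datatype/",
}


def expand(qname):
    """Expand a prefixed name such as rdf:type into a full URI."""
    prefix, _, local = qname.partition(":")
    return PREFIXES[prefix] + local


def term(value):
    """Format an object: a (lexical, datatype) tuple is a typed literal."""
    if isinstance(value, tuple):
        lexical, datatype = value
        return '"%s"^^<%s>' % (lexical, expand(datatype))
    return "<%s>" % expand(value)


def to_ntriples(subject_uri, statements):
    """Render (predicate, object) pairs about one subject as N-Triples lines."""
    return [
        "<%s> <%s> %s ." % (subject_uri, expand(p), term(o))
        for p, o in statements
    ]


lines = to_ntriples(
    "http://dbpedia.org/resource/The_Shining_%28film%29",
    [
        ("rdf:type", "dbpedia-owl:Work"),
        ("rdf:type", "dbpedia-owl:Film"),
        ("dbpprop:runtime", ("146.0", "ns9:minute")),
    ],
)
print("\n".join(lines))
```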

Page 28: Linked (Open) Data

RDF Vocabulary Description Language (RDFS)

• A language for describing the syntax and semantics of vocabularies in a machine-understandable way

http://dbpedia.org/ontology/Film  rdfs:subClassOf  http://dbpedia.org/ontology/Work
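What "machine-understandable" buys us can be sketched with the subclass statement from this slide: a program that knows Film is a subclass of Work can conclude that every film is also a work. The functions below are an illustrative assumption of how such an inference might be coded, not the API of any RDFS reasoner.

```python
# rdfs:subClassOf statements, stored as class -> set of direct superclasses.
SUBCLASS_OF = {"dbpedia-owl:Film": {"dbpedia-owl:Work"}}


def superclasses(cls, subclass_of):
    """All classes reachable from cls via rdfs:subClassOf (transitively)."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(asserted, subclass_of):
    """Asserted rdf:type values plus everything implied by the hierarchy."""
    types = set(asserted)
    for cls in asserted:
        types |= superclasses(cls, subclass_of)
    return types


# The Shining is asserted to be a Film; the vocabulary implies it is a Work.
print(sorted(inferred_types({"dbpedia-owl:Film"}, SUBCLASS_OF)))
```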

Page 29: Linked (Open) Data

OWL - Web Ontology Language

• A more expressive (formal) language for defining the syntax and semantics of vocabularies

• Addresses shortcomings of RDFS but adds considerable complexity

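[Figure: definition of the property http://dbpedia.org/ontology/starring]

http://dbpedia.org/ontology/starring  rdf:type  http://www.w3.org/2002/07/owl#ObjectProperty

http://dbpedia.org/ontology/starring  rdfs:domain  http://dbpedia.org/ontology/Work

http://dbpedia.org/ontology/starring  rdfs:range  http://dbpedia.org/ontology/Person

http://dbpedia.org/ontology/starring  rdfs:label  "starring"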
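A property definition with a domain and a range lets software draw conclusions from ordinary data. The sketch below assumes, based on a reading of the flattened figure, that starring has the domain Work and the range Person; the infer_types function is hypothetical and only illustrates the domain and range rules.

```python
# Assumed reading of the slide: starring links a Work (domain) to a Person (range).
SCHEMA = {
    "dbpedia-owl:starring": {
        "domain": "dbpedia-owl:Work",
        "range": "dbpedia-owl:Person",
    }
}


def infer_types(triples, schema):
    """Derive rdf:type statements from rdfs:domain and rdfs:range declarations."""
    inferred = set()
    for s, p, o in triples:
        declaration = schema.get(p)
        if declaration is None:
            continue  # nothing is known about this property
        inferred.add((s, "rdf:type", declaration["domain"]))
        inferred.add((o, "rdf:type", declaration["range"]))
    return inferred


data = [("dbpedia:The_Shining_(film)", "dbpedia-owl:starring", "dbpedia:Jack_Nicholson")]
for triple in sorted(infer_types(data, SCHEMA)):
    print(triple)
```

A single starring statement is enough to type both ends of the link, even if neither resource carries an explicit rdf:type.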

Page 30: Linked (Open) Data

Simple Knowledge Organization System (SKOS)

• A language for describing controlled vocabularies (taxonomies, thesauri, classification schemes)

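http://dbpedia.org/resource/The_Shining_(film)  skos:subject  http://dbpedia.org/resource/Category:1980s_horror_films

http://dbpedia.org/resource/Category:1980s_horror_films  rdf:type  http://www.w3.org/2004/02/skos/core#Concept

http://dbpedia.org/resource/Category:1980s_horror_films  skos:broader  http://dbpedia.org/resource/Category:1980s_films

http://dbpedia.org/resource/Category:1980s_films  rdf:type  http://www.w3.org/2004/02/skos/core#Concept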
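One practical use of skos:broader is query expansion: asking for everything in a general category should also return resources filed under its narrower categories. The following is a small sketch of that idea using the categories from this slide; the function names and shortened identifiers are assumptions made for illustration.

```python
# skos:broader links, stored as concept -> list of directly broader concepts.
BROADER = {"Category:1980s_horror_films": ["Category:1980s_films"]}

# skos:subject links, stored as resource -> list of concepts.
SUBJECTS = {"The_Shining_(film)": ["Category:1980s_horror_films"]}


def ancestors(concept, broader):
    """All concepts reachable from concept by following skos:broader."""
    found = []
    queue = list(broader.get(concept, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(broader.get(current, []))
    return found


def resources_under(concept, subjects, broader):
    """Resources whose subject is the concept itself or any narrower concept."""
    hits = []
    for resource, concepts in subjects.items():
        if any(c == concept or concept in ancestors(c, broader) for c in concepts):
            hits.append(resource)
    return sorted(hits)


# The film is filed under 1980s horror films, so it is also a 1980s film.
print(resources_under("Category:1980s_films", SUBJECTS, BROADER))
```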

Page 31: Linked (Open) Data

Links between Resources

• OWL defines properties for linking resources

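http://dbpedia.org/resource/The_Shining_(film)  owl:sameAs  http://rdf.freebase.com/ns/m/04fjzv

http://dbpedia.org/resource/The_Shining_(film)  owl:sameAs  http://data.linkedmdb.org/resource/film/2014

http://dbpedia.org/resource/The_Shining_(film)  dbpprop:starring  http://dbpedia.org/resource/Jack_Nicholson

http://dbpedia.org/resource/Jack_Nicholson  owl:sameAs  http://data.nytimes.com/N5761411277431266513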
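Since owl:sameAs says that two URIs name the same thing, a consumer can group URIs from different data sets into clusters and merge what each source knows. The sketch below does this with a small union-find structure. Which URIs are paired is an assumption read off the flattened figure, and same_as_clusters is a hypothetical helper, not an OWL reasoner.

```python
# Assumed owl:sameAs pairs, as read from the figure on this slide.
SAME_AS = [
    ("http://dbpedia.org/resource/The_Shining_(film)",
     "http://rdf.freebase.com/ns/m/04fjzv"),
    ("http://dbpedia.org/resource/The_Shining_(film)",
     "http://data.linkedmdb.org/resource/film/2014"),
    ("http://dbpedia.org/resource/Jack_Nicholson",
     "http://data.nytimes.com/N5761411277431266513"),
]


def same_as_clusters(pairs):
    """Group URIs into sets that are connected by owl:sameAs links."""
    parent = {}

    def find(uri):
        # Walk up to the cluster representative, shortening the path as we go.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


for cluster in same_as_clusters(SAME_AS):
    print(sorted(cluster))
```

Treating sameAs as symmetric and transitive is what lets a statement made about the Freebase URI be applied to the DBpedia URI as well.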

Page 32: Linked (Open) Data

SPARQL

• A query language and protocol for accessing RDF data on the Web

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?x

WHERE { ?x skos:subject <http://dbpedia.org/resource/Category:1980s_horror_films> }

LIMIT 10
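SPARQL is a protocol as well as a query language: a query can be sent to an endpoint as the "query" parameter of an ordinary HTTP GET request. The sketch below only builds such a request URL and sends nothing over the network. The endpoint address http://dbpedia.org/sparql is an assumption (the slides do not name one), and sparql_get_url is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://dbpedia.org/sparql"  # assumption: not named in the slides

QUERY = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?x
WHERE { ?x skos:subject <http://dbpedia.org/resource/Category:1980s_horror_films> }
LIMIT 10"""


def sparql_get_url(endpoint, query):
    """Build a GET URL that carries the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again gives back the original query text.
print(parse_qs(urlsplit(url).query)["query"][0] == QUERY)
```

Fetching that URL with any HTTP client, together with an Accept header for the desired result format, is all a client needs to query a remote RDF data set.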

Page 33: Linked (Open) Data

Vocabulary / Data Publishing Best Practices

Page 34: Linked (Open) Data

Publishing Vocabularies

• Hash-based URIs

  • e.g., http://example.com/example1#ClassA

  • Suited to group the description of a moderate number of related terms into one RDF document

  • Agent can retrieve terms with a single request

• Slash-based URIs

  • e.g., http://example.com/example1/ClassB

  • Suited to split terms in large vocabularies into one document per term

  • No need to download a massive document
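The practical difference between the two URI styles shows up in what a client actually requests. The fragment of a hash URI is never sent to the server, so all terms of a hash-based vocabulary resolve to the same document, while each slash URI is its own request. A minimal sketch, with the hypothetical helpers document_to_fetch and requests_needed and a made-up ClassC term:

```python
from urllib.parse import urldefrag


def document_to_fetch(term_uri):
    """Split a term URI into the document a client requests and the fragment."""
    document, fragment = urldefrag(term_uri)
    return document, fragment


def requests_needed(term_uris):
    """Distinct documents a client must download to look up all the terms."""
    return sorted({document_to_fetch(uri)[0] for uri in term_uris})


hash_terms = [
    "http://example.com/example1#ClassA",
    "http://example.com/example1#ClassC",  # hypothetical second term
]
slash_terms = [
    "http://example.com/example1/ClassB",
    "http://example.com/example1/ClassC",  # hypothetical second term
]

print(requests_needed(hash_terms))   # one shared document
print(requests_needed(slash_terms))  # one document per term
```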

Page 35: Linked (Open) Data

Provide either:

human-readable content from vocabulary URI

Page 36: Linked (Open) Data

or:

machine-readable content from vocabulary URI

... depending on what is requested.

Page 39: Linked (Open) Data

Publishing Data

• Distinguish between non-information and information resources

• Sample non-information resource

  • http://dbpedia.org/resource/The_Shining_(film)

• Sample information resources

  • http://dbpedia.org/page/The_Shining_(film) - HTML

  • http://dbpedia.org/data/The_Shining_(film) - RDF

Page 40: Linked (Open) Data

Publishing Data

GET http://dbpedia.org/resource/The_Shining_(film)
Accept: application/rdf+xml

303 See Other
Location: http://dbpedia.org/data/The_Shining_(film)

GET http://dbpedia.org/data/The_Shining_(film)
Accept: application/rdf+xml

200 OK
...
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF ...
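The exchange above can be simulated without a network to see the logic on both sides. In this sketch, respond plays a server that follows the /resource/, /data/, /page/ layout shown for DBpedia, and dereference plays a client that follows 303 redirects. Both functions are hypothetical simplifications; in particular, the Accept check is a plain substring test rather than real content negotiation.

```python
RESOURCE, DATA, PAGE = "/resource/", "/data/", "/page/"


def respond(path, accept):
    """Server side: return (status, headers) for a GET request."""
    if path.startswith(RESOURCE):
        # A non-information resource cannot be sent over the wire, so the
        # server redirects to a document that describes it.
        name = path[len(RESOURCE):]
        target = DATA if "application/rdf+xml" in accept else PAGE
        return 303, {"Location": target + name}
    if path.startswith(DATA):
        return 200, {"Content-Type": "application/rdf+xml"}
    if path.startswith(PAGE):
        return 200, {"Content-Type": "text/html"}
    return 404, {}


def dereference(path, accept, max_hops=5):
    """Client side: follow 303 redirects until a final response arrives."""
    for _ in range(max_hops):
        status, headers = respond(path, accept)
        if status != 303:
            return status, path, headers
        path = headers["Location"]
    raise RuntimeError("too many redirects")


print(dereference("/resource/The_Shining_(film)", "application/rdf+xml"))
print(dereference("/resource/The_Shining_(film)", "text/html"))
```

The same resource URI thus leads an RDF client to the data document and a browser to the HTML page, while the resource itself keeps a URI distinct from both documents.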

Page 41: Linked (Open) Data

The Linking Open Data Community Project

Page 42: Linked (Open) Data

Linking? Open? Data Project?

• Open Data: the philosophy, practice, or policy that data should be freely available to everyone, without restrictions from copyright, patents, and so on

• Linked Data: method / best practices for exposing, sharing, and connecting data using URIs and RDF

• Linking Open Data: a W3C community project whose goal is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting links between data items from different sources

Page 56: Linked (Open) Data

Useful Tools

Page 57: Linked (Open) Data

RDF APIs

• Java

• Jena Semantic Web Framework (http://openjena.org/)

• Sesame RDF API (http://www.openrdf.org/)

• PHP

• ARC (http://arc.semsol.org/)

• Ruby

• RDF.rb: Linked Data for Ruby (http://rdf.rubyforge.org/)

• Python

• RDFLib (http://www.rdflib.net/)

• C

• Redland RDF Libraries (http://librdf.org/)

Page 59: Linked (Open) Data

RDF / Linked Data Wrappers

• D2RQ - SPARQL / Linked Data for relational databases (http://www4.wiwiss.fu-berlin.de/bizer/d2rq/)

• OAI2LOD Server - expose any OAI-PMH source as Linked Data

• TripFS - filesystem as Linked Data

• TripCel - XLS spreadsheets as Linked Data

• ...

Page 60: Linked (Open) Data

Linked Data debugging

Start up your console / terminal

- native on Linux / Mac OS X

- Windows: http://www.cygwin.com/

Dereference resources with cURL (http://curl.haxx.se/)

curl -I -H "Accept: application/rdf+xml" http://dbpedia.org/resource/The_Shining_%28film%29

curl -H "Accept: application/rdf+xml" http://dbpedia.org/data/The_Shining_%28film%29
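For those who prefer a script to the command line, roughly the same requests can be prepared with Python's standard urllib.request module. This is a sketch under assumptions: build_request and fetch are hypothetical helper names, fetch needs network access and is therefore only defined here and not called, and unlike curl -I, urlopen follows redirects on its own, so the intermediate 303 response is not shown.

```python
import urllib.request


def build_request(uri, accept="application/rdf+xml", head_only=False):
    """Prepare a request like `curl -H "Accept: ..."` (or `curl -I` for HEAD)."""
    method = "HEAD" if head_only else "GET"
    return urllib.request.Request(uri, headers={"Accept": accept}, method=method)


def fetch(uri, accept="application/rdf+xml"):
    """Send the request; returns (status, final URL, content type).

    Requires network access. Redirects are followed automatically, so the
    final URL may differ from the one that was requested.
    """
    with urllib.request.urlopen(build_request(uri, accept)) as response:
        return response.status, response.geturl(), response.headers.get("Content-Type")


request = build_request(
    "http://dbpedia.org/resource/The_Shining_%28film%29", head_only=True
)
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Calling fetch on the resource URI and comparing the returned final URL with the requested one is a quick way to check where a server redirects RDF clients.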

Page 62: Linked (Open) Data

Readings

Page 63: Linked (Open) Data

Required Reading

• T. Heath, C. Bizer. Linked Data: Evolving the Web into a Global Data Space, Chapters 1-5

http://linkeddatabook.com/editions/1.0/