
  • Introducing Linked Data (Part of this work was funded by PlanetData NoE FP7/2007-2013)

    Irini Fundulaki

    Institute of Computer Science, FORTH & W3C Greece Office Manager

    EICOS: 4th Meeting, Athens, Greece, March 29, 2013

    Irini Fundulaki (Institute of Computer Science - FORTH) Introducing Linked Data March 29, 2013 1 / 80

  • Outline

    1 Going from the Web of Documents to the Web of Data

    2 Web Architecture: Items of Interest; Naming Resources: URIs; Resource Description Framework (RDF)

    3 Adding Semantics to Linked Data: A Short Introduction; RDF Vocabulary Description Language (RDFS); Web Ontology Language (OWL)

    4 Linked Data Life Cycle: Extraction; Interlinking/Fusing; Storage/Querying

    5 Conclusions

  • Going from the Web of Documents to the Web of Data

    Traditional Web: Web of Documents

    single information space: global filesystem

    designed for human consumption

    documents are the primary objects with a loose structure

    untyped hyperlinks between documents

    URLs are the globally unique IDs and part of the retrieval mechanism

    cannot ask expressive queries

    [Figure: Web browsers and search engines accessing documents that are connected by hyperlinks. © Hartig, Cyganiak, Bizer, Hausenblas, Heath, How to Publish Linked Data on the Web]

  • Going from the Web of Documents to the Web of Data

    Going from a Web of Documents to a Web of Data

    a global database

    designed for machines first, humans later

    things are the primary objects with a well defined structure

    typed links between things

    ability to express structured queries

    Don’t link the documents, link the things

    [Figure: The Web of Linked Data. © Tom Heath, An Introduction to Linked Data]

  • Going from the Web of Documents to the Web of Data

    Publishing Data on the Web: Web APIs

    ✓ APIs expose structured data
    ✓ enable the development of new applications
    ✓ provide simple query access over the HTTP protocol
    − proprietary interfaces
    − based on a fixed set of sources
    − deep structure is not visible

    [Figure: mashups created from the data exposed by separate Web APIs]

    Web APIs create data silos in the Web

  • Going from the Web of Documents to the Web of Data

    Publishing Data on the Web: Microformats

    ✓ embed structured data into HTML pages describing specific types of entities
    ✓ compatible with the idea of the Web as a single information space
    ✓ applications can extract data from the pages using microformats
    − only a fixed set of microformats exists (restricting the domain of interest)
    − no way to connect data items
    − cannot share arbitrary data on the Web

    Example: Tim Berners-Lee's location is: 42.3633690, -71.091796.

    geo (http://microformats.org/wiki/geo): denotes the latitude and longitude of the resource tied to the embedding web page
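How an application might extract geo data from a page can be sketched with Python's standard `html.parser`. The HTML snippet is an assumed reconstruction that uses the class names of the geo microformat (`geo`, `latitude`, `longitude`); `GeoExtractor` and its fields are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

# Assumed markup: class names follow the geo microformat (geo, latitude, longitude).
PAGE = """
<div class="geo">Tim Berners-Lee's location is:
  <span class="latitude">42.3633690</span>,
  <span class="longitude">-71.091796</span>.
</div>
"""


class GeoExtractor(HTMLParser):
    """Collects the text of elements whose class is 'latitude' or 'longitude'."""

    def __init__(self):
        super().__init__()
        self.current = None   # field currently being read, if any
        self.coords = {}      # extracted values, keyed by field name

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for field in ("latitude", "longitude"):
            if field in classes:
                self.current = field

    def handle_data(self, data):
        # Only text directly inside a latitude/longitude element is kept.
        if self.current is not None:
            self.coords[self.current] = float(data.strip())

    def handle_endtag(self, tag):
        self.current = None


extractor = GeoExtractor()
extractor.feed(PAGE)
print(extractor.coords)
```

The extractor only understands this one microformat, which illustrates the limitation noted above: data outside the fixed set of microformats cannot be shared this way.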

  • Going from the Web of Documents to the Web of Data

    Linked Data

    Beyond Web APIs and Microformats

    publishing and interlinking structured data on the Web using Semantic Web technologies

    sharing structured data on the Web as easily as documents are shared today

    is about using the Web to create typed links between different sources

    accessed through a single, standardized access mechanism

    Web of Data, Semantic Web (Data Commons): a space where people and organizations can post and consume data about anything

  • Going from the Web of Documents to the Web of Data

    Linked Data Principles

    1. Use URIs as names for things

    2. Use HTTP URIs so that people can look up those names

    3. When someone looks up a URI, provide useful RDF information

    4. Include RDF statements that link to other URIs so that they can discover related things

    Tim Berners-Lee, 2007

  • Going from the Web of Documents to the Web of Data

    Is Linked Data Real?


  • Going from the Web of Documents to the Web of Data

    Linking Open Data

    Linking Open Datasets on the Web (LOD)

    publish public open data as Linked Data on the Web

    interlink entities between heterogeneous sources


  • Going from the Web of Documents to the Web of Data

    Evolution of Linked Open Datasets

    [Figures: snapshots of the Linked Open Data cloud in May 2007, September 2008 and March 2009]

  • Going from the Web of Documents to the Web of Data

    Linked Data in numbers (September 2011)

    Domain             # datasets         Triples       %  (Out-)Links       %
    Media                      25   1,841,852,061   5.82%   50,440,705  10.01%
    Geographic                 31   6,145,532,484  19.43%   35,812,328   7.11%
    Government                 49  13,315,009,400  42.09%   19,343,519   3.84%
    Publications               87   2,950,720,693   9.33%  139,925,218  27.76%
    Cross-domain               41   4,184,635,715  13.23%   63,183,065  12.54%
    Life sciences              41   3,036,336,004   9.60%  191,844,090  38.06%
    User-gen. cont.            20     134,127,413   0.42%    3,449,143   0.68%
    Total                     295  31,634,213,770          503,998,829

    [Figures: (a) distribution of links per domain; (b) distribution of triples per domain]

  • Going from the Web of Documents to the Web of Data

    Linked Data: GeoNames Geographical Database

    covers all countries

    contains over eight million place names

    considers facets/features

    integrates data from various data sources: the National Geospatial-Intelligence Agency (NGA), the U.S. Geological Survey Geographic Names Information System, the Instituto Brasileiro de Geografia e Estatística (IBGE), ...

  • Going from the Web of Documents to the Web of Data

    The DBpedia Knowledge Base

    crowd-sourced community effort to extract structured information from Wikipedia

    links available datasets on the Web to data extracted from Wikipedia

    versions of DBpedia in 111 languages

    data is published in a consistent ontology

    accessible through different SPARQL endpoints

    DBpedia Live enables continuous synchronization between DBpedia and Wikipedia: http://live.dbpedia.org/

  • Going from the Web of Documents to the Web of Data

    Linked Data Applications

    Linked Data Browsers

    Linked Data Mashups

    Search Engines


  • Going from the Web of Documents to the Web of Data

    Linked Data Browsers

    A Linked Data browser explores the Linked Data cloud: it recognizes and follows RDF links to RDF resources

    Tabulator Browser

    Marbles (FU Berlin, DE)

    OpenLink RDF Browser (OpenLink, UK)

    Zitgist RDF Browser (Zitgist, USA)

    Disco Hyperdata Browser (FU Berlin, DE)

    Fenfire (DERI, Ireland)

    installed and configured as add-ons to existing browsers


  • Going from the Web of Documents to the Web of Data

    Linked Data Mashups

    Linked Data Mashups: domain-specific applications that use Linked Data

    Revyu.com: everything you can review; uses Linked Data to enrich ratings; accessed through a SPARQL endpoint

    DBtune Slashfacet: visualizes music-related Linked Data

    data from LastFM, MySpace, BBC, Magnatune

    14 billion triples accessed through SPARQL endpoints

  • Going from the Web of Documents to the Web of Data

    Linked Data Mashups

    DERI Semantic Web Pipes: build open source, extendable, embeddable Web Data mashups in the spirit of Yahoo! Pipes

    produce as output a stream of data in different formats (XML, RDF, JSON) to be used by applications.

  • Going from the Web of Documents to the Web of Data

    Linked Data Search Engines

    Falcons (IWS, China)

    supports keyword-based search for objects, concepts (classes and properties), ontologies and RDF datasets

    Sindice (DERI, Ireland)

    explores and integrates RDF content

    keyword-based search

    structured search through SPARQL queries

    MicroSearch (Yahoo!, Spain)

    enriched Yahoo! search

    extracts and displays the metadata of pages

    Watson (Open University, UK)

    supports keyword-based search

    graphical representation of datasets

    Semantic Web Search Engine (SWSE) (DERI, Ireland)

    Semantic Web Ontology Swoogle (UMBC, USA)

    supports keyword-based search for objects, concepts (classes and properties), ontologies and RDF datasets

  • Web Architecture Items of Interest

    Web Architecture: Items of Interest

    Resources

    things whose properties and relationships we want to describe in the data

    identified using Uniform Resource Identifiers (URIs)

    Distinguished into:

    Information Resources: documents, images, media files, ...

    Non-Information Resources: people, places, proteins, cars, ...

    URI Aliases

    different information providers talk about the same non-information resource using different URIs

    DBpedia: http://dbpedia.org/resource/Berlin identifies place "Berlin"

    GeoNames: http://sws.geonames.org/2950159/ identifies place "Berlin"

  • Web Architecture Items of Interest

    Web Architecture: Items of Interest

    Representation: stream of bytes in a certain format such as HTML, RDF/XML, Turtle, N-Triples, JSON, JPEG, PDF, ...

    HTML is chosen to represent descriptions read by humans

    RDF/XML, Turtle, N-Triples, ... represent descriptions managed by machines (RDF data serialization formats)

  • Web Architecture Items of Interest

    Web Architecture: Items of Interest

    Dereferencing URIs: looking up a URI on the Web in order to get information about the referenced resource.

    Information Resource: the server of the URI owner takes a snapshot of the resource's current state and returns it in the requested format.

    Non-Information Resource: cannot be returned directly, so its URI is dereferenced indirectly (the client is pointed to a description of the resource).

    Content Negotiation: clients send HTTP requests with a header indicating the preferred kind of representation.

    Associated Description: the result of dereferencing the URI of a non-information resource; the description of a non-information resource can have multiple representations.
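The client side of content negotiation can be sketched with Python's standard `urllib.request`. The helper `build_lookup` and its `prefer_rdf` flag are hypothetical names; the request is only built here, not sent, so the sketch runs without network access.

```python
from urllib.request import Request


def build_lookup(uri, prefer_rdf=True):
    """Build (but do not send) an HTTP GET request for `uri` with an Accept header."""
    if prefer_rdf:
        # RDF/XML is preferred; HTML is acceptable with a lower weight (q=0.5).
        accept = "application/rdf+xml, text/html;q=0.5"
    else:
        accept = "text/html"
    return Request(uri, headers={"Accept": accept}, method="GET")


req = build_lookup("http://dbpedia.org/resource/Berlin")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
# urllib.request.urlopen(req) would send the request and follow any redirect.
```

The same URI can therefore yield an HTML page for a browser and an RDF document for a data client; only the `Accept` header differs.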

  • Web Architecture Naming Resources: URIs

    Naming Resources: URIs

    • provide a simple way to create globally unique names in a decentralized fashion

    • used as a name but also as a means of accessing information

    • http:// is the only scheme that is supported by all browsers

    • stable and persistent, defined in a controlled namespace

    • defined independently of any implementation, for example

      http://dbpedia.org/resource/Munich

      rather than

      http://www.demos/dbpedia/cgi-bin/resources.php?id=Munich

    • dereferenceable: HTTP clients can look up the URI using the HTTP protocol

      URIs that refer to information resources return documents

      URIs that identify real-world objects and abstract concepts (non-information resources) return descriptions

  • Web Architecture Naming Resources: URIs

    URIs for Information and Non-Information Resources by Example

    Big Lynx: independent television production company with website

    http://biglynx.co.uk/

    staff members: non-information resources

    http://biglynx.co.uk/people/libby-miller

    RDF/XML file that stores information about staff members: information resource

    http://biglynx.co.uk/people/libby-miller.rdf

  • Web Architecture Naming Resources: URIs

    Dereferencing URIs for non-information resources

    (1) Client request for the non-information resource

    GET /people/libby-miller HTTP/1.1
    Host: biglynx.co.uk
    Accept: text/html;q=0.5, application/rdf+xml

    (2) Server response: redirect to a document describing the resource

    HTTP/1.1 303 See Other
    Location: http://biglynx.co.uk/people/libby-miller.rdf
    Vary: Accept

    (3) Client request for the information resource

    GET /people/libby-miller.rdf HTTP/1.1
    Host: biglynx.co.uk
    Accept: text/html;q=0.5, application/rdf+xml

    (4) Server response: the description itself

    HTTP/1.1 200 OK
    Content-Type: application/rdf+xml

    [RDF/XML document describing Libby Miller ...]
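The server's side of this exchange can be sketched as a small function that picks a representation from the `Accept` header and answers with a 303 redirect. This is a minimal illustration, not the Big Lynx implementation: `parse_accept`, `dereference`, the `SUFFIX` table and in particular the `.html` suffix are assumptions.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs.append((fields[0], q))
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


# Hypothetical mapping from media type to the suffix of the describing document.
SUFFIX = {"application/rdf+xml": ".rdf", "text/html": ".html"}


def dereference(host, path, accept):
    """Answer a lookup of a non-information resource with a 303 redirect."""
    for media_type, _q in parse_accept(accept):
        if media_type in SUFFIX:
            location = "http://" + host + path + SUFFIX[media_type]
            return 303, {"Location": location, "Vary": "Accept"}
    return 406, {}  # none of the requested representations is available


status, headers = dereference(
    "biglynx.co.uk", "/people/libby-miller",
    "text/html;q=0.5, application/rdf+xml")
print(status, headers["Location"])
```

The `Vary: Accept` header tells caches that the redirect target depends on the request's `Accept` header.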

  • Web Architecture Resource Description Framework (RDF)

    Resource Description Framework

    RDF is the W3C standard for representing information in the Web

    [Figure: a triple drawn as Subject, Predicate, Object; the subject is taken from U ∪ B, the predicate from U, and the object from U ∪ B ∪ L]

    U = set of URIs, B = set of blank nodes, L = set of literals

    (s, p, o) ∈ (U ∪ B) × U × (U ∪ B ∪ L) is called an RDF triple

    a set of RDF triples is an RDF graph

    an RDF graph is a node- and edge-labeled directed graph
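The definition can be read as a type constraint on the three positions of a triple. A minimal sketch, assuming three hypothetical Python classes that stand for the sets U, B and L (the prefixed names are kept as plain strings for brevity):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:          # element of U
    value: str


@dataclass(frozen=True)
class BlankNode:    # element of B
    label: str


@dataclass(frozen=True)
class Literal:      # element of L
    value: str


def is_rdf_triple(s, p, o):
    """Check that (s, p, o) is in (U ∪ B) × U × (U ∪ B ∪ L)."""
    return (isinstance(s, (URI, BlankNode))
            and isinstance(p, URI)
            and isinstance(o, (URI, BlankNode, Literal)))


libby = URI("csail:libby")
name = URI("foaf:name")
print(is_rdf_triple(libby, name, Literal("Libby Miller")))  # True
print(is_rdf_triple(Literal("Libby Miller"), name, libby))  # False: literal as subject
```

Because the classes are frozen dataclasses, triples built from them are hashable, so an RDF graph can be represented directly as a Python `set` of such triples.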

  • Web Architecture Resource Description Framework (RDF)

    RDF by Example

         (subject)     (predicate)   (object)
    t1   csail:libby   rdf:type      foaf:Person
    t2   csail:libby   foaf:name     "Libby Miller"
    t3   csail:libby   owl:sameAs    webwho:libby_miller

    subject: the URI identifying the described resource

    object: can either be a simple literal value or the URI of another resource

    predicate: the URI indicating the relation between subject and object

    csail:libby: subject URI of the resource of interest

    rdf:type, foaf:name, owl:sameAs: predicate URIs coming from vocabularies

    Resource Description Framework (RDF), Friend Of A Friend (FOAF), Web Ontology Language (OWL)

  • Web Architecture Resource Description Framework (RDF)

    RDF by Example

         (subject)     (predicate)   (object)
    t1   csail:libby   rdf:type      foaf:Person
    t2   csail:libby   foaf:name     "Libby Miller"
    t3   csail:libby   foaf:mbox     "mailto:[email protected]"
    t4   csail:libby   foaf:img      "http://biglynx.co.uk/people/img/libby.jpg"
    t5   csail:libby   owl:sameAs    webwho:libby_miller

    the above set of RDF triples forms an RDF graph

    [Figure: the RDF graph of triples t1 to t5: the node csail:libby with edges rdf:type to foaf:Person, foaf:name to "Libby Miller", foaf:mbox to mailto:[email protected], foaf:img to http://biglynx.co.uk/people/img/libby.jpg, and owl:sameAs to webwho:libby_miller]
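Reading a set of triples as a graph can be sketched as follows. The sketch uses three of the triples above, written with prefixed names as plain strings; `outgoing_edges` is a hypothetical helper.

```python
from collections import defaultdict

# Triples t1, t2 and t5 from the example, with prefixed names kept as strings.
graph = {
    ("csail:libby", "rdf:type", "foaf:Person"),
    ("csail:libby", "foaf:name", '"Libby Miller"'),
    ("csail:libby", "owl:sameAs", "webwho:libby_miller"),
}


def outgoing_edges(triples):
    """Index a set of triples as node -> list of (edge label, target node)."""
    edges = defaultdict(list)
    for s, p, o in sorted(triples):  # sorted for a deterministic order
        edges[s].append((p, o))
    return dict(edges)


edges = outgoing_edges(graph)
for predicate, obj in edges["csail:libby"]:
    print("csail:libby --" + predicate + "-->", obj)
```

Each triple contributes one labeled, directed edge from its subject node to its object node, which is exactly the node- and edge-labeled directed graph of the definition.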

  • Web Architecture Resource Description Framework (RDF)

    Resource Description Framework

    Benefits of the RDF Data Model

    generic and simple graph-based data model

    information from heterogeneous sources merges naturally: resources with the same URIs denote the same non-information resource

    schemas are transparent: information is in the URIs

    structure is added using schema languages and represented as RDF triples

    Web browsers use URIs to retrieve information
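Why information "merges naturally" can be illustrated with plain set union. The two sources below are hypothetical; they describe the same resource under the same URI, and the sketch assumes graphs without blank nodes.

```python
# Two hypothetical sources describing the same resource with the same URI.
source_a = {
    ("csail:libby", "rdf:type", "foaf:Person"),
    ("csail:libby", "foaf:name", '"Libby Miller"'),
}
source_b = {
    ("csail:libby", "foaf:name", '"Libby Miller"'),
    ("csail:libby", "foaf:based_near", "dbpedia:Munich"),
}

# Merging RDF graphs (without blank nodes) is set union: shared triples
# collapse, and statements about the same URI end up on the same node.
merged = source_a | source_b

subjects = {s for s, _p, _o in merged}
print(len(merged), "triples about", subjects)
```

No schema alignment step is needed for the merge itself, because the URI already says which node a statement belongs to.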

  • Web Architecture Resource Description Framework (RDF)

    RDF Serialization Formats

    Serialization Formats: creating information resources out of non-information resources

    RDF/XML: standardized by the W3C and widely used to publish Linked Data on the Web, but heavy and difficult for humans to write and read

    [Listing: RDF/XML document describing Libby Miller]

  • Web Architecture Resource Description Framework (RDF)

    RDF Serialization Formats

    RDFa: RDF triples are interwoven in HTML code, but the publishing infrastructure is not yet in place

    [Listing: HTML page with embedded RDFa markup]

  • Web Architecture Resource Description Framework (RDF)

    RDF Serialization Formats

    Turtle is a plain text format and is the format of choice for reading RDF triples or writing them by hand

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix csail: <...> .

    csail:libby_miller
        rdf:type foaf:Person ;
        foaf:name "Libby Miller" .

    N-Triples is a subset of Turtle without features such as namespace prefixes and shorthands. This leads to redundancy, but it allows parsing and loading one line at a time, compression to handle big files, exchange, and main-memory parsing

    <...> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
    <...> <http://xmlns.com/foaf/0.1/name> "Libby Miller" .
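The relation between the two formats can be sketched by expanding prefixed names into full IRIs and writing one statement per line. The `rdf:` and `foaf:` namespace IRIs are the standard ones; the `ex:` namespace is a placeholder, and `expand`, `to_ntriples` and `parse_line` are hypothetical helpers rather than a complete N-Triples implementation.

```python
# Standard namespace IRIs for rdf: and foaf:. The ex: namespace is a placeholder.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/people/",
}


def expand(term):
    """Turn a prefixed name into a full IRI in angle brackets; keep literals as-is."""
    if term.startswith('"'):
        return term
    prefix, local = term.split(":", 1)
    return "<" + PREFIXES[prefix] + local + ">"


def to_ntriples(triples):
    """Serialize triples as N-Triples: one complete statement per line."""
    return [" ".join(expand(t) for t in triple) + " ." for triple in triples]


def parse_line(line):
    """Split one simple N-Triples line back into its three terms."""
    subject, predicate, obj = line.rstrip(" .").split(" ", 2)
    return subject, predicate, obj


triples = [
    ("ex:libby_miller", "rdf:type", "foaf:Person"),
    ("ex:libby_miller", "foaf:name", '"Libby Miller"'),
]
lines = to_ntriples(triples)
for line in lines:
    print(line)

# Each line stands alone, so a consumer can process a large file line by line.
print(parse_line(lines[1]))
```

The output repeats the full subject IRI on every line, which is the redundancy mentioned above, and also the reason each line can be parsed without any surrounding context.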

  • Web Architecture Resource Description Framework (RDF)

    Creating Linked Data: RDF Links

    Literal Triples: have an RDF literal as the object; used to describe the properties of resources

    Object Triples: represent typed links between two resources (the subject and object are URIs) ⇒ RDF Links

    Relationship Links: point at related things in other data sources

    Identity Links: point at URI aliases used by other data sources to identify the same real-world object or abstract concept

    Vocabulary Links: point from data to the definitions of the vocabulary terms that are used to represent the data
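The distinction between these kinds of triples can be sketched as a small classifier. The predicate lists `IDENTITY` and `VOCABULARY` are an assumed, minimal selection, and the sketch does not check whether the target actually lives in another data source.

```python
# Predicates treated as identity or vocabulary links in this sketch (assumed lists).
IDENTITY = {"owl:sameAs"}
VOCABULARY = {"rdf:type", "rdfs:subClassOf"}


def classify(triple):
    """Label a triple as a literal triple or as one of the three kinds of RDF link."""
    _s, p, o = triple
    if o.startswith('"'):
        return "literal triple"
    if p in IDENTITY:
        return "identity link"
    if p in VOCABULARY:
        return "vocabulary link"
    return "relationship link"


triples = [
    ("csail:libby", "foaf:name", '"Libby Miller"'),
    ("csail:libby", "foaf:based_near", "dbpedia:Munich"),
    ("csail:libby", "owl:sameAs", "webwho:libby_miller"),
    ("csail:libby", "rdf:type", "foaf:Person"),
]
kinds = [classify(t) for t in triples]
for t, kind in zip(triples, kinds):
    print(kind + ":", " ".join(t))
```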

  • Web Architecture Resource Description Framework (RDF)

    Creating Linked Data: RDF Links

    Figure: an RDF graph centred on csail:lilly, annotated with the three kinds of RDF links.

    Literal triple: csail:lilly foaf:name "Lilly Miller" (an xsd:string)

    Relationship Links: csail:lilly foaf:based_near dbpedia:Munich

    Identity Links: csail:lilly owl:sameAs webwho:libby_miller

    Vocabulary Links: csail:lilly rdf:type foaf:Person; csail:libby_hometown rdf:type sorg:City

    Vocabulary terms shown: sorg:City rdfs:subClassOf sorg:Place; sorg:containedIn, sorg:GeoShape, sorg:isicV4, sorg:geo

  • Web Architecture Resource Description Framework (RDF)

    Creating Linked Data: RDF Links

    RDF links: Foundation of Linked Data

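    Figure: the RDF graph around csail:lilly, extended with data about dbpedia:Munich.

    csail:lilly rdf:type foaf:Person; foaf:name "Lilly Miller" (an xsd:string); foaf:based_near dbpedia:Munich; owl:sameAs webwho:libby_miller

    dbpedia:Munich rdf:type sorg:City; db-ont:populationTotal 1,1305,522; db-ont:country dbpedia:Germany

    Vocabulary terms shown: sorg:City rdfs:subClassOf sorg:Place; sorg:containedIn, sorg:GeoShape, sorg:isicV4, sorg:geo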


  • Adding Semantics to Linked Data A Short Introduction

    Adding Semantics to Linked Data

    RDF is a generic, abstract data model for describing resources in the form of triples

    it does not provide ways of defining classes, properties, constraints etc.

    Languages

    Simple Knowledge Organization System (SKOS) to define taxonomies/thesauri

    RDF Vocabulary Description Language (RDF Schema - RDFS) to define vocabularies

    Web Ontology Language (OWL) to define ontologies


  • Adding Semantics to Linked Data RDF Vocabulary Description Language (RDFS)

    RDF Vocabulary Description Language: RDF Schema

    designed to introduce useful semantics to RDF triples

    typing: defining classes, properties, instances

    relationships between types and instances: subsumption and instantiation

    constraints: domain and range for properties

    inference rules to entail new, inferred knowledge

    schemas are represented as RDF triples

          (subject)              (predicate)       (object)
    t1    sorg:Place             rdf:type          rdfs:Class
    t2    sorg:City              rdfs:subClassOf   sorg:Place
    t3    sorg:containedIn       rdf:type          rdf:Property
    t4    sorg:containedIn       rdfs:domain       sorg:Place
    t5    sorg:containedIn       rdfs:range        sorg:Place
    t6    csail:libby_hometown   rdf:type          sorg:City


  • Adding Semantics to Linked Data RDF Vocabulary Description Language (RDFS)

    RDFS Inference

    Used to entail new information from what is explicitly stated

    transitive closure along the class and property hierarchies

    R1: (P1, rdfs:subPropertyOf, P2), (P2, rdfs:subPropertyOf, P3) ⇒ (P1, rdfs:subPropertyOf, P3)

    R2: (C1, rdfs:subClassOf, C2), (C2, rdfs:subClassOf, C3) ⇒ (C1, rdfs:subClassOf, C3)

    transitive closure along the type and class/property hierarchies

    R3: (P1, rdfs:subPropertyOf, P2), (r1, P1, r2) ⇒ (r1, P2, r2)

    R4: (C1, rdfs:subClassOf, C2), (r1, rdf:type, C1) ⇒ (r1, rdf:type, C2)
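
    Rules R1 to R4 can be applied mechanically until nothing new is derived. The following Python sketch is a naive fixpoint computation over a set of triples, written for readability rather than speed; it implements only these four rules, so other RDFS entailments are out of its scope. The input triples are t1, t2 and t6 of the running example.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
SUBCLASS, SUBPROP, TYPE = "rdfs:subClassOf", "rdfs:subPropertyOf", "rdf:type"


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Apply rules R1-R4 repeatedly until no new triple is produced."""
    closed = set(triples)
    while True:
        new: Set[Triple] = set()
        for s1, p1, o1 in closed:
            for s2, p2, o2 in closed:
                # R1 / R2: transitivity of rdfs:subPropertyOf / rdfs:subClassOf
                if p1 == p2 and p1 in (SUBPROP, SUBCLASS) and o1 == s2:
                    new.add((s1, p1, o2))
                # R3: (P1 subPropertyOf P2), (r1, P1, r2) => (r1, P2, r2)
                if p1 == SUBPROP and p2 == s1:
                    new.add((s2, o1, o2))
                # R4: (C1 subClassOf C2), (r1 type C1) => (r1 type C2)
                if p1 == SUBCLASS and p2 == TYPE and o2 == s1:
                    new.add((s2, TYPE, o1))
        if new <= closed:
            return closed
        closed |= new


explicit = {
    ("sorg:Place", TYPE, "rdfs:Class"),            # t1
    ("sorg:City", SUBCLASS, "sorg:Place"),         # t2
    ("csail:libby_hometown", TYPE, "sorg:City"),   # t6
}
closure = rdfs_closure(explicit)
inferred = closure - explicit
```

    With this input the only derived triple is (csail:libby_hometown, rdf:type, sorg:Place), obtained by R4 from t2 and t6.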

  • Adding Semantics to Linked Data RDF Vocabulary Description Language (RDFS)

    RDFS Inference

    Explicit triples

          (subject)              (predicate)       (object)
    t1    sorg:Place             rdf:type          rdfs:Class
    t2    sorg:City              rdfs:subClassOf   sorg:Place
    t6    csail:libby_hometown   rdf:type          sorg:City

    transitive closure along the type and class/property hierarchies

    R4: (C1, rdfs:subClassOf, C2), (r1, rdf:type, C1) ⇒ (r1, rdf:type, C2)

    Inferred triples

          (subject)              (predicate)       (object)
    t7    sorg:City              rdf:type          rdfs:Class
    t8    csail:libby_hometown   rdf:type          sorg:Place

    t8 follows from R4 applied to t2 and t6; t7 follows from the RDFS semantics of rdfs:subClassOf, whose subject is always a class


  • Adding Semantics to Linked Data Ontology Web Language (OWL)

    Web Ontology Language: OWL

    RDFS is useful but has limited capabilities regarding schema constraints

    it covers only class/property hierarchies and constraints on property domain and range

    Ontologies define the concepts, relationships and complex constraints to describe and represent an area of knowledge

    The Web Ontology Language is used to create and reason about ontologies:

    terminology/vocabulary used in a specific context

    constraints on terms and properties

    equivalence of vocabulary terms etc.

    rich semantics, but it must remain computationally tractable and implementable


  • Adding Semantics to Linked Data Ontology Web Language (OWL)

    Web Ontology Language

    OWL expresses a small subset of First Order Logic

    defines a structure (classes, properties, datatypes) and axioms

    inference based on OWL is restricted within this framework but can still be extremely complex

    Ontologies can be hard

    a full ontology-based application is a complex system

    hard to implement and run

    three layers of OWL are defined (in decreasing levels of complexity)

    OWL sublanguages

    OWL Full: superset of RDFS, but reasoning over an OWL Full ontology can be undecidable

    OWL DL (Description Logic): maximal subset of OWL Full with decidable reasoning procedures

    OWL Lite: minimal, useful subset, easily implementable

  • Adding Semantics to Linked Data Ontology Web Language (OWL)

    Web Ontology Language: OWL

    class construction: content enumeration; intersection, union, complementation; property restrictions

    property restrictions: value constraints (interpreted as restrictions on value ranges); cardinality constraints (maximum, minimum, exact)

    property characterization: symmetric, transitive, functional, etc.; differentiation between datatype and object properties

    term equivalence:

    classes are equivalent (owl:equivalentClass) or disjoint (owl:disjointWith)

    properties are equivalent (owl:equivalentProperty) or inverse (owl:inverseOf)

    URIs refer to the same individual (owl:sameAs) or to distinct individuals (owl:differentFrom)
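
    Since owl:sameAs is symmetric and transitive, a set of identity links partitions URIs into groups of aliases. One possible way to compute these groups is a union-find structure, sketched below in Python. The first link comes from the running example; ex:l_miller and ex:muenchen are hypothetical aliases added for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def same_as_clusters(links: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group URIs connected by owl:sameAs links (symmetric, transitive)."""
    parent: Dict[str, str] = {}

    def find(uri: str) -> str:
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in links:
        root_left, root_right = find(left), find(right)
        parent[root_left] = root_right  # merge the two alias groups

    clusters: Dict[str, Set[str]] = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


links = [
    ("csail:lilly", "webwho:libby_miller"),   # from the running example
    ("webwho:libby_miller", "ex:l_miller"),   # hypothetical alias
    ("dbpedia:Munich", "ex:muenchen"),        # hypothetical alias
]
clusters = same_as_clusters(links)
```

    A reasoner that supports OWL derives the same equivalences from the semantics of owl:sameAs; the sketch only shows why two links are enough to connect three URIs.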


  • Linked Data LifeCycle

    Linked Open Data Cloud: The Challenge


  • Linked Data LifeCycle

    The Challenge

    Classical data management approaches assume complete control over schema, data and data generation

    But the Web is distributed and open ⇒ control is lacking

    Linked Data LifeCycle


  • Linked Data LifeCycle Extraction

    Extraction

    Extraction: the process of extracting data from unstructured and structured sources and representing it in the RDF data model

    Extraction from Unstructured Sources: sub-disciplines of NLP

    Named Entity Extraction: extracting entity labels from text

    Keyword/Keyphrase Extraction: recognition of central topics

    Relationship Extraction: mining the properties which link the entities and keywords described in the input data

    Linked Data Context: extraction of suitable URIs for the discovered entities and relationships

    URIs are extracted from knowledge bases such as DBpedia by comparing the name of the entity or relationship with the labels or accompanying comments of the DBpedia entities
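
    The label comparison described for the Linked Data context can be sketched as a normalised string lookup. The Python example below is an illustration under simple assumptions: the `labels` dictionary is a hypothetical in-memory stand-in for the labels of a knowledge base, and matching is exact after case folding and accent stripping, which is far cruder than what real extraction tools do.

```python
import unicodedata
from typing import Dict, List


def normalize(text: str) -> str:
    """Case-fold, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def candidate_uris(entity_name: str, labels: Dict[str, List[str]]) -> List[str]:
    """Return the URIs whose labels equal the entity name after normalisation."""
    wanted = normalize(entity_name)
    return sorted(
        uri for uri, names in labels.items()
        if any(normalize(name) == wanted for name in names)
    )


# Hypothetical label table standing in for knowledge-base labels.
labels = {
    "dbpedia:Munich": ["Munich", "München"],
    "dbpedia:Germany": ["Germany"],
}
munich = candidate_uris("  MUNICH ", labels)
munchen = candidate_uris("Munchen", labels)
unknown = candidate_uris("Paris", labels)
```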


  • Linked Data LifeCycle Extraction

    Extraction

    Extraction from Structured Sources (relational and XML)

    R2RML: RDB to RDF Mapping Language

    Figure: Relational to RDF Extraction. A Relational Database (RDB) is translated by an RDB2RDF component into an RDF Graph (set of RDF triples), served through a Virtuoso SPARQL endpoint.

    Access paths shown: Query Access (SPARQL), Entity Level Access (HTTP GET), Access via dump (HTTP GET)

  • Linked Data LifeCycle Extraction

    RDB to RDF Mapping Language (R2RML)

    • W3C Standard Language for expressing customized mappings from relational databases to RDF datasets

    • mappings provide the ability to view relational data as RDF data

    • R2RML mappings define RDF graphs

    • R2RML enables different types of mapping implementations:

    offer a virtual SPARQL endpoint over mapped relational data

    generate RDF dumps

    offer access through a Linked Data API
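
    The idea of viewing relational rows as RDF triples can be illustrated without any R2RML tooling. The Python sketch below is not R2RML syntax and not an R2RML processor; it is a hand-written row-to-triple conversion over an in-memory SQLite table, and the table `cafe`, the base URI http://example.org/ and the function name `rows_to_triples` are all hypothetical.

```python
import sqlite3
from typing import List, Tuple

Triple = Tuple[str, str, str]


def rows_to_triples(conn: sqlite3.Connection, table: str, key_column: str,
                    base_uri: str, class_uri: str) -> List[Triple]:
    """Turn every row into a typed subject plus one triple per non-null column."""
    # The table name is interpolated directly: acceptable for a sketch only.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    triples: List[Triple] = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{base_uri}{table}/{record[key_column]}"
        triples.append((subject, "rdf:type", class_uri))
        for column, value in record.items():
            if column != key_column and value is not None:
                triples.append((subject, f"{base_uri}{table}#{column}", str(value)))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cafe (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.execute("INSERT INTO cafe VALUES (1, 'Starbucks', '53rd St')")
triples = rows_to_triples(conn, "cafe", "id", "http://example.org/", "http://example.org/Cafe")
```

    An R2RML mapping expresses the same kind of decisions (how subjects are minted, which columns become which predicates) declaratively, so that an implementation can either materialise a dump as above or answer SPARQL queries virtually.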


  • Linked Data LifeCycle Interlinking/Fusing

    Interlinking/Fusing

    connecting things that are somehow related

    serves the Linked Open Data principle “Include links to other URIs, so that they can discover more things”

    enables the paradigm change from data silos to interoperable data distributed across the Web

    cross-ontology question answering

    data integration

    Link Discovery Frameworks:

    Ontology Matching: aims to establish links between the ontologies underlying two data sources

    Instance Matching: aims to discover links between instances contained in two data sources

  • Linked Data LifeCycle Interlinking/Fusing

    Instance Matching for Linked Data

    • more generic and complex than schema matching and record linkage

    • not limited to finding only equivalent entities in knowledge bases

    • aims at finding semantically related entities

    • establishes links between them with formal properties (transitivity, symmetry) used by reasoners

    to infer new knowledge

    perform data integration tasks

    answer cross-ontology queries

    Link Discovery Frameworks

    domain-specific

    RKBExplorer (academic domain): http://www.rkbexplorer.com

    GNAT (music domain)

    universal

    RDF-AI: http://code.google.com/p/rdfai/

    SILK: http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/

    LIMES: http://aksw.org/Projects/LIMES.html

    LIMES and SILK focus on time-efficient instance matching techniques
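
    A very small instance matcher helps to see what such frameworks automate. The Python sketch below compares labels of two sources with a string similarity from the standard library and proposes owl:sameAs links above a threshold. It is not how SILK or LIMES work internally; the threshold of 0.8, the label dictionaries and the webwho:john entry are assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def discover_links(source: Dict[str, str], target: Dict[str, str],
                   threshold: float = 0.8) -> List[Tuple[str, str, str]]:
    """Propose owl:sameAs links between instances whose labels are similar."""
    links = []
    for source_uri, source_label in source.items():
        for target_uri, target_label in target.items():
            score = SequenceMatcher(
                None, source_label.lower(), target_label.lower()
            ).ratio()
            if score >= threshold:
                links.append((source_uri, "owl:sameAs", target_uri))
    return links


source = {"csail:lilly": "Lilly Miller"}
target = {
    "webwho:libby_miller": "Libby Miller",
    "webwho:john": "John Smith",   # hypothetical non-matching instance
}
links = discover_links(source, target)
```

    The nested loop compares every source instance with every target instance, which is quadratic in the number of instances; avoiding most of these comparisons is the kind of time efficiency the slide attributes to LIMES and SILK.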


  • Linked Data LifeCycle Storage/Querying

    Querying Linked Data

    SPARQL: W3C Standard Language for Querying Linked Data

    Building block is the SPARQL triple pattern: an RDF triple with variables

    an element of the set (V ∪ U) × (V ∪ U) × (V ∪ U ∪ L), where U = set of URIs, L = set of Literals, V = set of Variables

    Main idea: Pattern Matching

    describe subgraphs of the queried RDF graph

    subgraphs that match the query yield a set of mappings: assignments of variables to URIs and literals

  • Linked Data LifeCycle Storage/Querying

    SPARQL: Querying Linked Data

    Intuitively, a triple pattern denotes the triples in an RDF graph that are of a specific form

    tp1 = (?j, rdf:type, ?y): matches all triples with predicate rdf:type

    Set of RDF Triples

          S           P           O
    t1    Starbucks   type        Cafe
    t2    Starbucks   branchLoc   53rd St
    t3    MoMA        address     53rd St

    Mappings

          ?j          ?y
    µ1    Starbucks   Cafe
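
    Evaluating a single triple pattern amounts to comparing it position by position with every triple and recording the variable bindings. A minimal Python sketch, in which variables are strings starting with "?" and the predicate is written `type` as in the table (an abbreviation of rdf:type):

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]


def match_pattern(pattern: Sequence[str], triples: List[Triple]) -> List[Mapping]:
    """Return one mapping (variable -> value) per triple that matches the pattern."""
    mappings: List[Mapping] = []
    for triple in triples:
        mapping: Mapping = {}
        for pattern_term, triple_term in zip(pattern, triple):
            if pattern_term.startswith("?"):
                # A variable must bind to the same value everywhere in the pattern.
                if mapping.setdefault(pattern_term, triple_term) != triple_term:
                    break
            elif pattern_term != triple_term:
                break  # constant in the pattern differs from the triple
        else:
            mappings.append(mapping)
    return mappings


T = [
    ("Starbucks", "type", "Cafe"),        # t1
    ("Starbucks", "branchLoc", "53rd St"),  # t2
    ("MoMA", "address", "53rd St"),       # t3
]
result = match_pattern(("?j", "type", "?y"), T)
```

    Only t1 has predicate `type`, so the result contains the single mapping µ1 of the table.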

  • Linked Data LifeCycle Storage/Querying

    SPARQL Queries

    Types of Queries

    SELECT: return a set of variable mappings

    CONSTRUCT: return an RDF graph

    ASK: boolean queries

    SQL-like syntax: SELECT ?v1 ?v2 … WHERE GP, with ?v1, ?v2 variables

    SPARQL graph patterns (GP) are formulated using the join (AND), union (UNION) and optional (OPTIONAL) operators on triple patterns, with filter conditions (FILTER)

    SPARQL semantics: expressed through an algebra that operates on sets of mappings

    unary operators: σ for FILTER, π for SELECT

    binary operators: ∪ for UNION, ⋈ for AND and ⟕ for OPTIONAL

    Compatible Mappings: mappings are compatible if they agree on their common variables

  • Linked Data LifeCycle Storage/Querying

    SPARQL Queries

    (a) Ω = evaluation of (?x, ?y, ?z) over T

          ?x          ?y          ?z
          Starbucks   type        Cafe
          Starbucks   branchLoc   53rd St
          MoMA        address     53rd St

    (b) Ω1 = π?x,?y (σ?z=“53rd St”(Ω)), i.e. the pattern (?x, ?y, 53rd St)

          ?x          ?y
    µ1    Starbucks   branchLoc
    µ2    MoMA        address

    (c) Ω2 = π?y,?z (σ?x=“Starbucks”(Ω))

          ?y          ?z
    µ3    type        Cafe
    µ4    branchLoc   53rd St

    (d) Ω3 = π?x,?y,?z (Ω1 ⋈ Ω2), the join (AND, written “.”) of the patterns in (b) and (c)

          ?x          ?y          ?z
    µ9    Starbucks   branchLoc   53rd St

    (e) Ω1 ∪ Ω2, the UNION of the patterns in (b) and (c)

          ?x          ?y          ?z
    µ5    Starbucks   branchLoc   −
    µ6    MoMA        address     −
    µ7    −           type        Cafe
    µ8    −           branchLoc   53rd St
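
    The operators on sets of mappings can be written down directly from the notion of compatible mappings. The Python sketch below represents a mapping as a dictionary and a set of mappings as a list; it covers join, union and a simplified left outer join (no FILTER inside OPTIONAL), and uses Ω1 and Ω2 from panels (b) and (c) as input.

```python
from typing import Dict, List

Mapping = Dict[str, str]


def compatible(m1: Mapping, m2: Mapping) -> bool:
    """Two mappings are compatible if they agree on their common variables."""
    return all(m2[var] == value for var, value in m1.items() if var in m2)


def join(omega1: List[Mapping], omega2: List[Mapping]) -> List[Mapping]:
    """AND: merge every pair of compatible mappings."""
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


def union(omega1: List[Mapping], omega2: List[Mapping]) -> List[Mapping]:
    """UNION: keep the mappings of both operands."""
    return omega1 + omega2


def left_join(omega1: List[Mapping], omega2: List[Mapping]) -> List[Mapping]:
    """OPTIONAL (simplified): extend a left mapping when possible, else keep it."""
    result: List[Mapping] = []
    for m1 in omega1:
        extended = [{**m1, **m2} for m2 in omega2 if compatible(m1, m2)]
        result.extend(extended if extended else [m1])
    return result


omega1 = [{"?x": "Starbucks", "?y": "branchLoc"}, {"?x": "MoMA", "?y": "address"}]
omega2 = [{"?y": "type", "?z": "Cafe"}, {"?y": "branchLoc", "?z": "53rd St"}]
omega3 = join(omega1, omega2)
```

    Only µ1 and µ4 agree on ?y, so the join yields the single mapping µ9, while the union simply lists µ5 to µ8 with the missing variables left unbound.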

  • Linked Data LifeCycle Storage/Querying

    SPARQL

    Additional SPARQL 1.0 Features

    duplicate elimination DISTINCT

    ordering results (ORDER BY) with an optional LIMIT clause

    SPARQL 1.0 Missing: aggregates, subqueries of any type, inference

    SPARQL 1.1: Extends SPARQL 1.0

    aggregations (COUNT, AVG ... )

    Subqueries

    Negation (already expressed in SPARQL 1.0 with a combination ofOPTIONAL and BOUND)

    Regular Path Expressions

    Updates

    Federation: supports queries that merge data distributed across the Web.
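    The remark on negation can be illustrated with a short Python emulation of the SPARQL 1.0 idiom "OPTIONAL plus FILTER (!BOUND(...))". This is a sketch under assumptions: the function name subjects_without and the extra triple (MoMA, type, Museum) are introduced here only for illustration, and the query in the comment is the pattern being emulated, not a query from the slides.

```python
triples = [
    ("Starbucks", "type", "Cafe"),
    ("Starbucks", "branchLoc", "53rd St"),
    ("MoMA", "type", "Museum"),  # illustrative triple
    ("MoMA", "address", "53rd St"),
]


def subjects_without(triples, required, missing):
    """Emulates:
        SELECT ?x WHERE { ?x required ?t .
                          OPTIONAL { ?x missing ?a }
                          FILTER (!BOUND(?a)) }
    """
    left = [{"?x": s, "?t": o} for s, p, o in triples if p == required]
    optional = [{"?x": s, "?a": o} for s, p, o in triples if p == missing]
    result = []
    for m in left:
        extended = [{**m, **n} for n in optional if n["?x"] == m["?x"]]
        # OPTIONAL keeps m unchanged when nothing matches, so ?a stays unbound
        for row in extended or [m]:
            if "?a" not in row:  # FILTER (!BOUND(?a))
                result.append(row["?x"])
    return result


print(subjects_without(triples, "type", "address"))
```

    The mapping for MoMA gets ?a bound by the optional part and is filtered out; the mapping for Starbucks keeps ?a unbound and survives.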

  • Linked Data LifeCycle Storage/Querying

    Storing Linked Data

    Logical Storage Schemas for RDF data

    schema agnostic: triples are stored in a large triple table whose attributes are subject, predicate and object

    Triple Table

    subject    predicate  object
    Starbucks  type       Cafe
    Starbucks  branchLoc  53rd St
    MoMA       address    53rd St

    schema aware: one table is created per property, with subject and object attributes (and one table per class, with a subject attribute)

    address

    subject  object
    MoMA     53rd St

    Cafe

    subject
    Starbucks

    branchLoc

    subject    object
    Starbucks  53rd St
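    The two logical schemas can be compared on a concrete query with an in-memory SQLite database. The sketch below assumes the table and column names shown in the slide; the query "cafes and their branch locations" is an illustrative choice. Over the triple table it needs a self join, while over the property/class tables it joins two small tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- schema agnostic: one large triple table
CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT);
INSERT INTO triples VALUES
  ('Starbucks', 'type', 'Cafe'),
  ('Starbucks', 'branchLoc', '53rd St'),
  ('MoMA', 'address', '53rd St');

-- schema aware: one table per property or class
CREATE TABLE Cafe (subject TEXT);
CREATE TABLE branchLoc (subject TEXT, object TEXT);
CREATE TABLE address (subject TEXT, object TEXT);
INSERT INTO Cafe VALUES ('Starbucks');
INSERT INTO branchLoc VALUES ('Starbucks', '53rd St');
INSERT INTO address VALUES ('MoMA', '53rd St');
""")

# Self join of the triple table on the subject column
agnostic = conn.execute("""
    SELECT t1.subject, t2.object
    FROM triples AS t1 JOIN triples AS t2 ON t1.subject = t2.subject
    WHERE t1.predicate = 'type' AND t1.object = 'Cafe'
      AND t2.predicate = 'branchLoc'
""").fetchall()

# Join of the class table with one property table
aware = conn.execute("""
    SELECT c.subject, b.object
    FROM Cafe AS c JOIN branchLoc AS b ON c.subject = b.subject
""").fetchall()

print(agnostic, aware)
```

    Every additional triple pattern in a query adds another self join in the schema-agnostic layout, which is the cost discussed on the following slides.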

  • Linked Data LifeCycle Storage/Querying

    Storing Linked Data

    hybrid: combines elements of the previous approaches.

    Triples of properties with range string

    subject    predicate  object
    Starbucks  branchLoc  53rd St
    MoMA       address    53rd St

    Cafe

    subject
    Starbucks

    Triples of properties with range date

    subject    predicate  object
    Starbucks  founded    1990

  • Linked Data LifeCycle Storage/Querying

    Storing Linked Data

    Updates

    Schema agnostic: addition/deletion of triples in the triple table

    Hybrid and schema-aware:

        schema updates ⇒ addition/deletion of property/class tables

        data updates ⇒ addition/deletion of tuples in property/class tables

    Type Information

    Schema agnostic: typing information is lost since everything is a string

    Hybrid: explicit

    Schema-aware: implicit

    Query Processing

    Schema agnostic:

        the algebraic plan obtained for a query involves a large number of self joins

        well suited to queries in which the predicate is a variable

    Hybrid and schema-aware:

        the algebraic plan contains operations over the appropriate property/class tables (more in the spirit of existing relational schemas)

        saves many self joins over triple tables

        if the predicate is a variable, then one query per property/class must be expressed
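    The last point can be made concrete with a sketch that answers the pattern (Starbucks, ?p, ?o), whose predicate is a variable, under both layouts. The table names follow the earlier slides; the lists CLASS_TABLES and PROPERTY_TABLES and the two function names are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT);
INSERT INTO triples VALUES
  ('Starbucks', 'type', 'Cafe'),
  ('Starbucks', 'branchLoc', '53rd St'),
  ('MoMA', 'address', '53rd St');
CREATE TABLE Cafe (subject TEXT);
CREATE TABLE branchLoc (subject TEXT, object TEXT);
CREATE TABLE address (subject TEXT, object TEXT);
INSERT INTO Cafe VALUES ('Starbucks');
INSERT INTO branchLoc VALUES ('Starbucks', '53rd St');
INSERT INTO address VALUES ('MoMA', '53rd St');
""")

# Fixed, trusted table names (they are interpolated into SQL below)
CLASS_TABLES = ["Cafe"]
PROPERTY_TABLES = ["branchLoc", "address"]


def describe_agnostic(subject):
    """Schema agnostic: a single scan of the triple table."""
    rows = conn.execute(
        "SELECT predicate, object FROM triples WHERE subject = ?", (subject,)
    )
    return sorted(rows.fetchall())


def describe_schema_aware(subject):
    """Schema aware: one query per class table and per property table."""
    rows = []
    for table in CLASS_TABLES:
        query = f"SELECT 1 FROM {table} WHERE subject = ?"
        if conn.execute(query, (subject,)).fetchone():
            rows.append(("type", table))
    for table in PROPERTY_TABLES:
        query = f"SELECT object FROM {table} WHERE subject = ?"
        for (obj,) in conn.execute(query, (subject,)):
            rows.append((table, obj))
    return sorted(rows)


print(describe_agnostic("Starbucks"))
print(describe_schema_aware("Starbucks"))
```

    Both functions return the same answer, but the schema-aware version issues as many queries as there are tables, which grows with the vocabulary of the dataset.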

  • Linked Data LifeCycle Storage/Querying

    Physical Storage Schema for RDF Data

    Row store: relations are stored as complete rows (PostgreSQL, MySQL, Oracle RDF)

    (figure: example relation with attributes Date, Store, Name, Customer, Price)

    fully compliant solution for the schema agnostic approach

    row-store solutions that follow the schema agnostic approach outperform those that follow the schema-aware approach

    Column store: each attribute of a relational table is stored in a separate column (MonetDB)

    (figure: example relation with attributes Date, Store, Name, Customer, Price)

    fully compliant solution for vertical partitioning

  • Linked Data LifeCycle Storage/Querying

    Physical Storage Schema for RDF data

    Property Tables & Property-Class Tables

    Clustered Property Tables: contain clusters of properties that tend to be defined together

    Property-Class Tables: exploit the type property of subjects to cluster similar sets of subjects together

    Triple Table

    subject    predicate    object
    Starbucks  type         Cafe
    Starbucks  founded      XYZ
    Starbucks  city         Chicago
    Starbucks  phone        ABC
    MOMA       type         Museum
    MOMA       founded      DEF
    MOMA       city         New York
    MOMA       entryTicket  GHI
    MET        type         Museum
    MET        city         New York

    Property Table

    subject    type    founded  city
    Starbucks  Cafe    XYZ      Chicago
    MOMA       Museum  DEF      New York
    MET        Museum  NULL     New York

    Remaining Triples

    subject    predicate    object
    Starbucks  phone        ABC
    MOMA       entryTicket  GHI

    Class: Museum

    subject  type    founded  city
    MOMA     Museum  DEF      New York
    MET      Museum  NULL     New York

    Class: Cafe

    subject    type  city     founded
    Starbucks  Cafe  Chicago  XYZ
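    A clustered property table can be derived from a triple table with a single pass. The sketch below assumes that the cluster of properties (type, founded, city) is already known and that each subject has at most one value per clustered property; the function name build_property_table is illustrative.

```python
from collections import defaultdict

triples = [
    ("Starbucks", "type", "Cafe"),
    ("Starbucks", "founded", "XYZ"),
    ("Starbucks", "city", "Chicago"),
    ("Starbucks", "phone", "ABC"),
    ("MOMA", "type", "Museum"),
    ("MOMA", "founded", "DEF"),
    ("MOMA", "city", "New York"),
    ("MOMA", "entryTicket", "GHI"),
    ("MET", "type", "Museum"),
    ("MET", "city", "New York"),
]


def build_property_table(triples, clustered):
    """Split triples into a wide property table and the remaining triples."""
    # Each row starts with None for every clustered property (NULL in SQL)
    rows = defaultdict(lambda: dict.fromkeys(clustered))
    remaining = []
    for s, p, o in triples:
        if p in clustered:
            rows[s][p] = o  # assumes single-valued properties
        else:
            remaining.append((s, p, o))
    return dict(rows), remaining


table, remaining = build_property_table(triples, ("type", "founded", "city"))
for subject, row in table.items():
    print(subject, row)
print(remaining)
```

    The NULL for MET's founding date shows the usual trade-off: wide rows avoid joins for properties that co-occur, at the price of sparse columns when they do not.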

  • Linked Data LifeCycle Storage/Querying

    Dictionary & Ordered Relations

    ✓ to avoid processing long strings, URIs and literals are mapped to unique identifiers

    ✓ identifiers have a regular size and are much easier to handle and compress

    ✓ the dictionary is small and can be kept in main memory

    − an additional join must be performed when retrieving the results

    − filter conditions can be more expensive (they involve translating the identifiers into values and back)
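    Dictionary encoding can be sketched as follows. The class name Dictionary and its methods encode and decode are illustrative; identifiers are assigned in order of first appearance, which is only one of several possible assignment policies.

```python
class Dictionary:
    """Maps RDF terms (URIs, literals) to small integer identifiers and back."""

    def __init__(self):
        self.term_to_id = {}
        self.id_to_term = []

    def encode(self, term):
        # Assign the next free identifier on first sight of a term
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.id_to_term)
            self.id_to_term.append(term)
        return self.term_to_id[term]

    def decode(self, ident):
        return self.id_to_term[ident]


triples = [
    ("Starbucks", "type", "Cafe"),
    ("Starbucks", "branchLoc", "53rd St"),
    ("MoMA", "address", "53rd St"),
]

dictionary = Dictionary()
encoded = [tuple(dictionary.encode(t) for t in triple) for triple in triples]
decoded = [tuple(dictionary.decode(i) for i in triple) for triple in encoded]
print(encoded)
```

    Joins and index lookups then operate on fixed-size integers; the decode step at the end corresponds to the additional join mentioned above.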

  • Linked Data LifeCycle Storage/Querying

    Physical Storage Schema for RDF Data

    Native RDF-engines: Sesame, Jena, Virtuoso, RDF-3X, Hexastore, ...

    Indexes that support different SPARQL query patterns aim at

    the efficient retrieval of triples during query evaluation

    the computation of the cost of joins

    Point Query: the graph pattern consists of a single triple pattern

    SELECT ?x WHERE {s0 p0 ?x}

    Chain Query: the graph pattern is a sequence of triple patterns

    SELECT ?y WHERE { s0 p0 ?x } . { ?x p1 ?y }

    Star Query: the graph pattern is a set of triple patterns that share the same subject variable

    SELECT ?y , ?z WHERE { ?x p0 ?y } . { ?x p1 o1 } . { ?x p2 ?z }

    Hierarchical Queries: special case of chain queries that consider the traversal of RDFS rdfs:subClassOf and rdfs:subPropertyOf relations
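    The query shapes can be recognised mechanically from the triple patterns. The classifier below is a rough sketch with simplifying assumptions: patterns are (subject, predicate, object) tuples, variables start with "?", a star is taken to mean "all patterns share one subject variable", and a chain means "the object variable of each pattern is the subject of the next". The function name query_shape is illustrative.

```python
def is_var(term):
    return term.startswith("?")


def query_shape(patterns):
    """Classify a list of triple patterns as point, star, chain or other."""
    if len(patterns) == 1:
        return "point"
    subjects = {s for s, _, _ in patterns}
    if len(subjects) == 1 and is_var(next(iter(subjects))):
        return "star"
    linked = all(
        is_var(prev[2]) and prev[2] == nxt[0]
        for prev, nxt in zip(patterns, patterns[1:])
    )
    return "chain" if linked else "other"


point = [("s0", "p0", "?x")]
chain = [("s0", "p0", "?x"), ("?x", "p1", "?y")]
star = [("?x", "p0", "?y"), ("?x", "p1", "o1"), ("?x", "p2", "?z")]
print(query_shape(point), query_shape(chain), query_shape(star))
```

    A query optimizer can use such a classification to pick join strategies, for example merge joins on the shared subject for star queries.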

  • Linked Data LifeCycle Storage/Querying

    Indexes

    Point Queries

    • schema agnostic approach: indexes on all possible permutations of the triple table that stores triples of the form (subject, predicate, object)

    triples in each index are sorted lexicographically

    (subject, predicate, object) for index spo, (object, subject, predicate) for index osp

    indexes are implemented as B+-trees

    ✓ the benefit of having all possible indexes is that any triple pattern can be answered with a single index scan

    ✓ facilitates the use of the sort-merge join operator for evaluating joins

    • vertical partitioning

    indexes are built on the subject and object columns of property tables [SAB+05]

    sextuple indexing scheme with subject- and object-headed indexes [WKB08]
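    The idea of indexing all six permutations can be sketched with sorted Python lists standing in for B+-trees. This is an illustration only: the class name SextupleIndex and the method scan are assumptions, and a real engine would use disk-based, compressed structures.

```python
from bisect import bisect_left
from itertools import permutations


class SextupleIndex:
    """Keeps the triples sorted under all six orderings of (s, p, o)."""

    def __init__(self, triples):
        self.indexes = {}
        for order in permutations("spo"):
            positions = ["spo".index(c) for c in order]
            self.indexes["".join(order)] = sorted(
                tuple(t[i] for i in positions) for t in triples
            )

    def scan(self, name, prefix):
        """Range scan: all entries of index `name` that start with `prefix`."""
        entries = self.indexes[name]
        start = bisect_left(entries, prefix)  # first entry >= prefix
        result = []
        for entry in entries[start:]:
            if entry[: len(prefix)] != prefix:
                break  # sorted order: no later entry can match
            result.append(entry)
        return result


T = [
    ("Starbucks", "type", "Cafe"),
    ("Starbucks", "branchLoc", "53rd St"),
    ("MoMA", "address", "53rd St"),
]
index = SextupleIndex(T)
# Pattern (?x, ?y, "53rd St"): the object is bound, so use the osp index
print(index.scan("osp", ("53rd St",)))
```

    Because the bound terms form a prefix of some ordering, each triple pattern is answered by one contiguous range scan, and the results arrive sorted, ready for a merge join.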

  • Linked Data LifeCycle Storage/Querying

    Indexes

    Hierarchical Queries

    supported by indexes implemented as B+ trees [CPS+07]

    indexes on subgraphs [BB07] defined by the rdfs:subClassOf and rdfs:subPropertyOf relations

    Star Queries & Chain Queries: supported by aggregated indexes [NW08, NW09] to compute the selectivity of a set of different access patterns

    indexes that store, for every combination of subject, predicate and object values, the number of triples that match this combination

    indexes that store the number of triples that can be joined with triples that contain a specific combination of RDF terms ⇒ approximate the cardinality of join results and improve the performance of joins

    compute the frequent paths in the RDF dataset to estimate the cost of a join in the case of path and chain queries, and use the size of the intermediate results to calculate the cost of a join.
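    The first kind of aggregated index, a count per combination of bound values, can be sketched with a Counter. This is a simplified illustration and not the data structure of the cited systems; the function names build_aggregated_index and selectivity are assumptions, and None marks an unbound position.

```python
from collections import Counter


def build_aggregated_index(triples):
    """Count the triples matching every combination of bound s/p/o values."""
    counts = Counter()
    for s, p, o in triples:
        for key in [
            (s, None, None), (None, p, None), (None, None, o),
            (s, p, None), (s, None, o), (None, p, o),
            (s, p, o),
        ]:
            counts[key] += 1
    return counts


def selectivity(counts, total, pattern):
    """Fraction of all triples that match a triple pattern."""
    key = tuple(None if term.startswith("?") else term for term in pattern)
    if key == (None, None, None):
        return 1.0  # nothing is bound, every triple matches
    return counts[key] / total


T = [
    ("Starbucks", "type", "Cafe"),
    ("Starbucks", "branchLoc", "53rd St"),
    ("MoMA", "address", "53rd St"),
]
counts = build_aggregated_index(T)
print(selectivity(counts, len(T), ("?x", "type", "?z")))
```

    An optimizer can evaluate the most selective pattern first so that intermediate results stay small.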

  • Linked Data LifeCycle Storage/Querying

    Where does existing Database Technology fit?

    Existing Relational Stores are problematic for querying the Web of Data

    • the fragmentation of information and the absence of schema and constraints for RDF data lead to query plans that are hard to optimize

    schema agnostic: a large number of self joins over a large triple table is required

    vertical partitioning: multiple joins over different tables

    • relational optimizers are not designed for processing RDF data: they compute the cost of scans but not the join hit ratio(s)

    the subject-subject and subject-object join hit ratios will have the same estimate since correlations between triple components are not considered

    • correlated cost estimation over many selections and self joins is the rule and not the exception for RDF query processing

  • Linked Data LifeCycle Storage/Querying

    Where does existing Database Technology fit?

    SQL-based engines: SW-store, Oracle RDF, Sesame, Virtuoso RDF

    SPARQL queries are translated into SQL queries and are evaluated by theunderlying engine

    rely on optimizations designed for relational data

    Native Engines: Hexastore, Yars2, RDF-3X

    prefer merge joins over sorted index lists to hash joins whenever possible

    merge joins operate on sorted inputs

        ✓ sequentially scan both inputs

        ✓ immediately join matching triples

        ✓ skip over parts without matches

        ✓ support pipelining

    hash joins operate on unsorted data

        − need to touch every triple

        − break pipelining
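    The contrast between the two join algorithms can be shown on lists of (key, value) pairs. This is a textbook-style sketch rather than the implementation of any of the engines named above; the inputs of merge_join are assumed to be sorted by key, as they would be when read from a sorted index.

```python
def merge_join(left, right):
    """Join two lists of (key, value) pairs that are both sorted by key."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1  # skip left entries without a match
        elif left[i][0] > right[j][0]:
            j += 1  # skip right entries without a match
        else:
            key, j_start = left[i][0], j
            # Pair every left entry of this key group with every right entry
            while i < len(left) and left[i][0] == key:
                j = j_start
                while j < len(right) and right[j][0] == key:
                    result.append((key, left[i][1], right[j][1]))
                    j += 1
                i += 1
    return result


def hash_join(left, right):
    """Join unsorted inputs: build a hash table on left, then probe with right."""
    table = {}
    for key, value in left:  # the whole build side is read before any output
        table.setdefault(key, []).append(value)
    return [(key, lv, rv) for key, rv in right for lv in table.get(key, [])]


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(left, right))
```

    The merge join emits results as soon as matching keys meet, whereas the hash join must consume its entire build input first, which is the pipelining difference noted above.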

  • Linked Data LifeCycle Storage/Querying

    SPARQL Query Processing

    Issues

    decide which of the multiple (subject, predicate, object) indexes (spo, sop, etc.) to use for evaluating each triple pattern

    decide the join order and the join algorithm per join variable (merge or hash joins)

    Approaches

    Cost-based: based on dynamic programming [NW08, NW09]

    Heuristic-based: do not use statistics but rely on data characteristics [TSF+12]
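    The first issue, choosing an index for a triple pattern, admits a very simple rule of thumb: put the bound positions first so that the pattern becomes a prefix lookup. The sketch below encodes only this rule; it is not the method of the cited papers, and the function name choose_index is illustrative.

```python
def choose_index(pattern):
    """Pick the permutation index whose leading components are the bound terms."""
    bound = [c for c, term in zip("spo", pattern) if not term.startswith("?")]
    free = [c for c, term in zip("spo", pattern) if term.startswith("?")]
    return "".join(bound + free)


print(choose_index(("?x", "?y", "53rd St")))
print(choose_index(("Starbucks", "?y", "?z")))
```

    A real optimizer would also order the free positions so that the join variable comes next, which keeps the scan output sorted for a subsequent merge join.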

  • Conclusions

    Conclusions

    addressed in this talk

        From the Web of Documents to the Web of Data ∼ Linked Data

        Linked Data Principles: URIs, RDF

        Adding Semantics to data: defining schemas & ontologies

        Linked Data Life Cycle: Extraction, Interlinking, Storage and Querying

    not addressed in this talk

        LifeCycle processes

            Manual revision/authoring: principles for creating RDF datasets

            Classification/Enrichment: how to add semantics to existing datasets

            Quality Analysis (Provenance Tracking, Vocabulary Mapping)

        Storage and Querying Linked Data in a distributed setting

  • Conclusions

    References

    [BB07] L. Baolin and H. Bo. HPRD: A High Performance RDF Database. In Network and Parallel Computing, 2007.

    [CPS+07] V. Christophides, D. Plexousakis, M. Scholl, and S. Tourtounis. On labeling schemes for the Semantic Web. In ISWC, 2003.

    [NW08] T. Neumann and G. Weikum. RDF-3X: a RISC-style engine for RDF. PVLDB, 1(1), 2008.

    [NW09] T. Neumann and G. Weikum. Scalable join processing on very large RDF graphs. In SIGMOD, June 2009.

    [SAB+05] M. Stonebraker, D. Abadi, A. Batkin, X. Chen, M. Cherniak, M. Ferreira, E. Lau, A. Lin, S. Madden, P. E. O'Neil, A. Rasin, N. Tran, and S. Zdonik. C-Store: A Column Oriented DBMS. In VLDB, 2005.

    [TSF+12] P. Tsialamanis, L. Sidirourgos, I. Fundulaki, P. Boncz, and V. Christophides. Heuristics-based Query Optimisation for SPARQL. In EDBT, 2012.

    [WKB08] C. Weiss, P. Karras, and A. Bernstein. Hexastore: sextuple indexing for semantic web data management. PVLDB, 1(1), 2008.

  • Conclusions

    References

    Material for the slides was used from :

    Tom Heath and Christian Bizer. Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool, 2011.

    Christian Bizer, Richard Cyganiak and Tom Heath. How to Publish Linked Data on the Web (Tutorial). Available at: http://wifo5-03.informatik.uni-mannheim.de/bizer/pub/LinkedDataTutorial/.

    Tim Berners-Lee. Design Issues: Linked Data. Available at: http://linkeddata.org/guides-and-tutorials.

    Andreas Harth, Katja Hose and Ralf Schenkel. Database Techniques for Linked Data Management. In SIGMOD, 2012.
