American Art Collaborative Planning Grant Educational Briefings: Linked Data and Tools. Pedro Szekely, USC/Information Sciences Institute. September 30, 2014
Linked Data and Tools Pedro Szekely
USC/Information Sciences Institute, http://isi.edu/~szekely
September 2014
CC-By 2.0
Outline
• Introduction to linked open data
• RDF: the Resource Description Framework
• Tools to convert data to RDF
• Tools for linking/reconciliation/resolution
• Storing and maintaining the data
• Applications
Linked Open Data
The Web of Documents
What We See
What the Computer Sees
blah blah blah blah blah blah blah blah blah blah blah blah …
Problem
Web pages are machine processable, but not machine understandable.
That makes it impractical to build applications that use the data.
Solution
Publish the data as Linked Open Data.
What Is Linked Data?
A method of publishing structured data so that it can be interlinked and become more useful.
Builds upon standard Web technologies such as HTTP and URIs to share information in a way that can be read automatically by computers.
(from Wikipedia)
“Linked” Open Data
• Crystal Bridges Museum of American Art
• Dallas Museum of Art
• Indianapolis Museum of Art
• The Metropolitan Museum of Art
• National Portrait Gallery
• Smithsonian American Art Museum
“Linked” Open Data: the same six museums, marked ✔ and ✖
Linked Open Data
… data is public
… in a common format
… but we only have islands of data
Linked Data Principles
• Use URIs as names for things
• Use HTTP URIs so that people can look up those names
• When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
• Include links to other URIs so that they can discover more things
http://youtu.be/OM6XIICm_qo
http://www.w3.org/DesignIssues/LinkedData.html
Principle 1: Use URIs as names for things
Principle 2: Use HTTP URIs so that people can look up those names
Can USC Have a URI?
http://dbpedia.org/resource/University_of_Southern_California
Can the Pythagorean Theorem Have a URI?
http://www.freebase.com/m/05r2j
My Dog: Can He Have a URI?
http://szekelys.com/diego
Principle 3: When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
http://dbpedia.org/resource/University_of_Southern_California
http://www.freebase.com/m/05r2j
http://szekelys.com/diego
Principle 4: Include links to other URIs so that they can discover more things
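Principles 2 and 3 together mean that a client can ask an HTTP URI for machine-readable data. A minimal sketch in Python, assuming the server supports content negotiation for the `text/turtle` media type (that support, and the helper name `rdf_request`, are assumptions, not something the slides state). The request is only built here, not sent:

```python
from urllib.request import Request


def rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) an HTTP GET request that asks for RDF."""
    # The Accept header tells the server which representation we prefer.
    return Request(uri, headers={"Accept": media_type})


req = rdf_request("http://dbpedia.org/resource/University_of_Southern_California")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; what comes back depends on the server.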
http://szekelys.com/diego
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbpprop: <http://dbpedia.org/property/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix fb: <http://rdf.freebase.com/ns/> .
<http://szekelys.com/diego>
    rdf:type "Dog" ;
    <http://szekelys.com/name> "Diego" ;
    dbpedia-owl:species "Labrador Retriever" ;
    dbpprop:country "Canada" ;
    dbpprop:color "Yellow" ;
    fb:base.petbreeds.dog.gender "Male" .
Linked Data? Not Linked Data: every value is a plain string, so nothing links to other URIs.
http://szekelys.com/diego
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbpprop: <http://dbpedia.org/property/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix fb: <http://rdf.freebase.com/ns/> .
<http://szekelys.com/diego>
    rdf:type dbpedia:Dog ;
    <http://szekelys.com/name> "Diego" ;
    dbpedia-owl:species dbpedia:Labrador_Retriever ;
    dbpprop:country dbpedia:Canada ;
    dbpprop:color dbpedia:Yellow ;
    fb:base.petbreeds.dog.gender fb:en.male .
Almost Linked Data: the values are now URIs, but the name property (http://szekelys.com/name) is not from a shared vocabulary.
http://szekelys.com/diego
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbpprop: <http://dbpedia.org/property/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix fb: <http://rdf.freebase.com/ns/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://szekelys.com/diego>
    rdf:type dbpedia:Dog ;
    foaf:name "Diego" ;
    dbpedia-owl:species dbpedia:Labrador_Retriever ;
    dbpprop:country dbpedia:Canada ;
    dbpprop:color dbpedia:Yellow ;
    fb:base.petbreeds.dog.gender fb:en.male .
Linked Data
foaf is a widely used ontology
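The difference between the string-valued and the URI-valued descriptions of Diego can be checked mechanically. The following is a rough sketch, not a standard test for Linked Data: it counts object values that are URIs pointing outside the publisher's own namespace (Principle 4). The `URI` and `Literal` classes and the `outgoing_links` helper are illustrative names, and only a few of the slide's properties are reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


def outgoing_links(triples, own_namespace):
    """Return object URIs that point outside our own namespace."""
    return [obj for (_subject, _predicate, obj) in triples
            if isinstance(obj, URI) and not obj.value.startswith(own_namespace)]


DIEGO = URI("http://szekelys.com/diego")
RDF_TYPE = URI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
SPECIES = URI("http://dbpedia.org/ontology/species")
FOAF_NAME = URI("http://xmlns.com/foaf/0.1/name")
DBPEDIA = "http://dbpedia.org/resource/"

# Values as plain strings: nothing to follow.
strings_only = [
    (DIEGO, RDF_TYPE, Literal("Dog")),
    (DIEGO, SPECIES, Literal("Labrador Retriever")),
    (DIEGO, FOAF_NAME, Literal("Diego")),
]
# Values as URIs: a client can look them up and discover more.
with_links = [
    (DIEGO, RDF_TYPE, URI(DBPEDIA + "Dog")),
    (DIEGO, SPECIES, URI(DBPEDIA + "Labrador_Retriever")),
    (DIEGO, FOAF_NAME, Literal("Diego")),
]

print(len(outgoing_links(strings_only, "http://szekelys.com/")))  # 0
print(len(outgoing_links(with_links, "http://szekelys.com/")))    # 2
```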
RDF
Resource Description Framework
Intended for representing metadata about Web resources, such as the title, author, and modification date of a Web document
… can also be used to represent information about things that can be identified on the Web, even when they cannot be directly retrieved on the Web
Represent Resources Using URIs
http://szekelys.com/family#pedro
http://xmlns.com/foaf/0.1/firstName
“Pedro”
That guy has first name “Pedro”
Represent Information as Triples
Subject: http://szekelys.com/family#pedro (the resource being described)
Predicate: http://xmlns.com/foaf/0.1/firstName (a property of the resource)
Object: “Pedro” (the value of the property)
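As an illustration of the subject/predicate/object structure, here is a small Python sketch that holds one triple and writes it in N-Triples syntax. The `Triple` class is an illustrative name, and the sketch is simplified: it assumes the object is always a plain string literal and escapes only backslashes and double quotes.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of a property of the resource
    obj: str        # value of the property (a plain literal in this sketch)

    def to_ntriples(self) -> str:
        escaped = self.obj.replace("\\", "\\\\").replace('"', '\\"')
        return f'<{self.subject}> <{self.predicate}> "{escaped}" .'


t = Triple(
    "http://szekelys.com/family#pedro",
    "http://xmlns.com/foaf/0.1/firstName",
    "Pedro",
)
print(t.to_ntriples())
```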
Use Namespaces
http://szekelys.com/family#pedro
foaf:firstName
“Pedro”
foaf:firstName is short for http://xmlns.com/foaf/0.1/firstName
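Expanding a prefixed name is a plain string substitution. A minimal sketch, assuming a hand-written prefix table (the `PREFIXES` dictionary and `expand` function are illustrative names; real RDF libraries manage prefixes for you):

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(name: str, prefixes=PREFIXES) -> str:
    """Turn 'foaf:firstName' into a full URI; leave unknown prefixes alone."""
    prefix, separator, local = name.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + local
    return name


print(expand("foaf:firstName"))
print(expand("http://szekelys.com/family#pedro"))  # already a full URI, unchanged
```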
RDF Graphs
http://szekelys.com/family#pedro  foaf:firstName  “Pedro”
http://szekelys.com/family#pedro  rdf:type  foaf:Person
http://szekelys.com/family#pedro  foaf:homepage  http://isi.edu/~szekely
Real world objects: http://szekelys.com/family#pedro
Kinds of things: foaf:Person
Literals: “Pedro”
Properties of things: foaf:firstName, rdf:type, foaf:homepage
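An RDF graph is just a set of triples, and the simplest query is a pattern with some positions left open. The sketch below stores the three triples from the slide as Python tuples and filters them. Everything here is illustrative (the `match` helper is not part of any RDF library), and prefixed names are kept as plain strings for brevity.

```python
PEDRO = "http://szekelys.com/family#pedro"

GRAPH = {
    (PEDRO, "foaf:firstName", '"Pedro"'),
    (PEDRO, "rdf:type", "foaf:Person"),
    (PEDRO, "foaf:homepage", "http://isi.edu/~szekely"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every argument that is not None."""
    return sorted(
        t for t in graph
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "What kind of thing is Pedro?"
print(match(GRAPH, subject=PEDRO, predicate="rdf:type"))
```

SPARQL generalizes this idea: a query is a set of such patterns with shared variables.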
Mix Vocabularies
http://szekelys.com/family#pedro
    foaf:firstName “Pedro”
    rdf:type foaf:Person
    foaf:homepage http://isi.edu/~szekely
    rdf:type schema:Person
    schema:spouse http://szekelys.com/family#claudia
Linked Open Data
Tools to Convert Data to RDF
Steps to Create Linked Open Data
• Select ontologies … that define classes and properties for our data
• Convert data to RDF … from the museum database to the ontologies
• Identify links to other Linked Data datasets … to other museums and Linked Data hubs
CIDOC CRM
• Select ontologies … that define classes and properties for our data
http://www.cidoc-crm.org/
RDF Mapping Tools
• Select ontologies … that define classes and properties for our data
• Convert data to RDF … from the museum database to the ontologies
Tool | Shortcomings | Benefits
custom code | labor intensive, error prone | flexible
R2RML | difficult to learn, only for SQL databases | W3C standard, good documentation, multiple vendors
RDF Refine | only for tabular data | graphical user interface, support for reconciliation, open source
Karma | | semi-automatic, graphical user interface, supports tabular data, XML and JSON, multiple export formats, R2RML compatible, open source
R2RML
About 6,550 results
R2RML Example
:Table1 rdf:type rr:TriplesMap ;
    rr:logicalTable "Select ('<http:..isbn/' || ISBN || '>') AS isbn,
        Author, Title, Publisher, Year from book_table";
    rr:subjectMap [ rdf:type rr:IRIMap ; rr:column "isbn" ] ;
    rr:propertyObjectMap [ rr:property a:title ; rr:column "Title" ; ] ;
    rr:propertyObjectMap [ rr:property a:year ; rr:column "Year" ; ] ;
http://ivan-herman.name/2010/11/02/my-first-mapping-from-rdb-to-rdf-using-r2rml/
http://www.w3.org/TR/r2rml/
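For comparison, the "custom code" route for the same book table might look like the Python sketch below. It is not R2RML and not taken from the slides: the base URI `http://example.org/isbn/`, the vocabulary URI, and the sample row are all hypothetical placeholders (the slide elides the real base URI), and an in-memory SQLite database stands in for the museum database.

```python
import sqlite3

BASE = "http://example.org/isbn/"    # hypothetical; the slide elides the real base URI
VOCAB = "http://example.org/vocab#"  # hypothetical stand-in for the "a:" prefix


def literal(value) -> str:
    """Quote a value as a plain N-Triples literal (backslash and quote escaping only)."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def book_triples(conn):
    """Map each row of book_table to a title triple and a year triple."""
    lines = []
    query = "SELECT ISBN, Title, Year FROM book_table ORDER BY ISBN"
    for isbn, title, year in conn.execute(query):
        subject = f"<{BASE}{isbn}>"
        lines.append(f"{subject} <{VOCAB}title> {literal(title)} .")
        lines.append(f"{subject} <{VOCAB}year> {literal(year)} .")
    return lines


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book_table "
    "(ISBN TEXT, Author TEXT, Title TEXT, Publisher TEXT, Year INTEGER)"
)
# Made-up sample row, for illustration only.
conn.execute(
    "INSERT INTO book_table VALUES (?, ?, ?, ?, ?)",
    ("0000000000", "An Author", "A Title", "A Publisher", 2010),
)
for line in book_triples(conn):
    print(line)
```

The mapping logic lives in code here, which is flexible but has to be rewritten for every table; a declarative mapping keeps it in data instead.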
RDF Refine
http://refine.deri.ie/rdfExportDocs
Karma
https://github.com/InformationIntegrationGroup/Web-Karma
Tools for Linking
Multiple “John Singer Sargent”
ima:SaamPerson_John_Singer_Sargent
    a saam:SaamPerson ;
    dct:date "1856-1925" ;
    foaf:name "John Singer Sargent" .
saam:SaamPerson_4253
    a saam:SaamPerson ;
    saam:associatedPlace
        saam:SaamPlace_1357324439768t1r13950_0,
        saam:SaamPlace_1357324439768t1r13951_0 ;
    saam:constituentId "4253" ;
    rdaGr2:biographicalInformation
        "Painter. Sargent traveled …" ;
    rdaGr2:dateAssociatedWithThePerson "1990-10-1", "1995-5-8" ;
    rdaGr2:dateOfBirth "1856-1-12" ;
    rdaGr2:dateOfDeath "1925-4-15" ;
    rdaGr2:placeOfBirth saam:SaamPlace_1357324439768t1r13952_0 ;
    rdaGr2:placeOfDeath saam:SaamPlace_1357324439768t1r13953_0 ;
    skos:altLabel "John S. Sargent" ;
    skos:prefLabel "John Singer Sargent" .
cb:SaamPerson_John_Singer_Sargent
    a saam:SaamPerson ;
    ont0:dateOfBirth "1879", "1885" ;
    ont0:dateOfDeath "1925" ;
    skos:prefLabel "John Singer Sargent" .
met:SaamPerson_John_Singer_Sargent
    a saam:SaamPerson ;
    ont0:placeOfResidence
        "North and Central America",
        "United States" ;
    foaf:name "John Singer Sargent" .
dallas:SaamPerson_John_Singer_Sargent
    a saam:SaamPerson ;
    ont0:dateOfBirth "1856" ;
    ont0:dateOfDeath "1925" ;
    foaf:name "John Singer Sargent" .
All five records describe the same person: John Singer Sargent
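A first pass at finding such duplicates is to compare names. The sketch below uses Python's standard `difflib` to propose candidate pairs. It is a deliberately naive illustration, not how SILK, LIMES or RDF Refine work: the 0.8 threshold is an arbitrary assumption, the `ex:Winslow_Homer` record is a hypothetical addition that is not in the slides, and a real pipeline would also compare dates and places before accepting a match.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_links(records, threshold=0.8):
    """Return (id1, id2, score) for every pair whose names look alike."""
    candidates = []
    for (id1, name1), (id2, name2) in combinations(records.items(), 2):
        score = name_similarity(name1, name2)
        if score >= threshold:
            candidates.append((id1, id2, score))
    return candidates


records = {
    "ima:SaamPerson_John_Singer_Sargent": "John Singer Sargent",
    "saam:SaamPerson_4253": "John Singer Sargent",
    "cb:SaamPerson_John_Singer_Sargent": "John Singer Sargent",
    "met:SaamPerson_John_Singer_Sargent": "John Singer Sargent",
    "dallas:SaamPerson_John_Singer_Sargent": "John Singer Sargent",
    "ex:Winslow_Homer": "Winslow Homer",  # hypothetical extra record
}

for id1, id2, score in candidate_links(records):
    print(f"{id1}  ~  {id2}  ({score:.2f})")
```

The candidates still need human curation: two different people can share a name, and one person can appear under several spellings.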
Linking “John Singer Sargent”
saam:SaamPerson_4253
    owl:sameAs cb:SaamPerson_John_Singer_Sargent ;
    owl:sameAs dallas:SaamPerson_John_Singer_Sargent ;
    owl:sameAs ima:SaamPerson_John_Singer_Sargent ;
    owl:sameAs met:SaamPerson_John_Singer_Sargent ;
    owl:sameAs dbpedia:John_Singer_Sargent ;
    owl:sameAs nytimes/N49129220686803623753 ;
    owl:sameAs w-flick/John_Singer_Sargent ;
    ...
    .
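Because owl:sameAs is symmetric and transitive, a set of such links partitions the identifiers into groups that each denote one thing. A minimal sketch of computing those groups with a union-find structure, using five of the links from the slide (the function name `same_as_clusters` is illustrative):

```python
def same_as_clusters(pairs):
    """Group identifiers connected by owl:sameAs links (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


links = [
    ("saam:SaamPerson_4253", "cb:SaamPerson_John_Singer_Sargent"),
    ("saam:SaamPerson_4253", "dallas:SaamPerson_John_Singer_Sargent"),
    ("saam:SaamPerson_4253", "ima:SaamPerson_John_Singer_Sargent"),
    ("saam:SaamPerson_4253", "met:SaamPerson_John_Singer_Sargent"),
    ("saam:SaamPerson_4253", "dbpedia:John_Singer_Sargent"),
]

clusters = same_as_clusters(links)
print(len(clusters), "cluster with", len(clusters[0]), "identifiers")
```

An application can then merge the properties of every identifier in a cluster into a single view of the artist.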
Linking/Reconciliation Tools
Tool | Shortcomings | Benefits
custom code | very difficult | tuned to the data
SILK, LIMES | experimental, poor support | work with RDF, efficient, relatively easy to use
RDF Refine | requires implementing a new reconciliation service | integrated with RDF conversion, user interface for curation
Karma | under development |
SILK
http://wifo5-03.informatik.uni-mannheim.de/bizer/silk
RDF Refine
http://refine.deri.ie/reconciliationDocs
Storing and Maintaining the Data
Storage Options
Technology | Shortcomings | Benefits
SPARQL endpoint | low reliability, esoteric, slow | sophisticated query language
RDF dump | no query capability, esoteric | flexibility: clients can download and use in applications, easy to publish
JSON-LD + ElasticSearch | restricted query language | very high performance, mainstream technology, easy to publish
JSON-LD
{
  "@type": "http://www.cidoc-crm.org/cidoc-crm/E21_Person",
  "@id": "http://americanart.si.edu/data/person-institution/99",
  "P1_is_identified_by": {
    "@type": "http://www.cidoc-crm.org/cidoc-crm/E82_Actor_Appellation",
    "@id": "http://americanart.si.edu/data/person-institution/99/appellation/Birth-or-Maiden-Name",
    "label": "Walter Inglis Anderson",
    "lastname": "Anderson",
    "firstname": "Walter Inglis"
  }
}
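Part of the appeal of JSON-LD is that mainstream JSON tooling can read it without any RDF machinery. The sketch below loads the document from the slide with Python's standard `json` module and lists every node's `@id` and `@type`. It treats the data as plain JSON and is not a JSON-LD processor (no `@context` handling or expansion); the `nodes` helper is an illustrative name.

```python
import json

DOC = """
{
  "@type": "http://www.cidoc-crm.org/cidoc-crm/E21_Person",
  "@id": "http://americanart.si.edu/data/person-institution/99",
  "P1_is_identified_by": {
    "@type": "http://www.cidoc-crm.org/cidoc-crm/E82_Actor_Appellation",
    "@id": "http://americanart.si.edu/data/person-institution/99/appellation/Birth-or-Maiden-Name",
    "label": "Walter Inglis Anderson",
    "lastname": "Anderson",
    "firstname": "Walter Inglis"
  }
}
"""


def nodes(value, found=None):
    """Collect (@id, @type) for every object that has an @id, depth first."""
    found = [] if found is None else found
    if isinstance(value, dict):
        if "@id" in value:
            found.append((value["@id"], value.get("@type")))
        for child in value.values():
            nodes(child, found)
    elif isinstance(value, list):
        for item in value:
            nodes(item, found)
    return found


data = json.loads(DOC)
for node_id, node_type in nodes(data):
    print(node_type, node_id)
print(data["P1_is_identified_by"]["label"])
```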
Applications
we have expanded the reach of linked data within the BBC to more audience facing products and presented our ambitions to using linked data as glue for the plethora of content the BBC produces
http://www.bbc.co.uk/blogs/internet/posts/Linked-Data-new-ontologies-website
http://www.bbc.co.uk/blogs/internet/posts/Linked-Data-Connecting-together-the-BBCs-Online-Content
http://www.bbc.co.uk/blogs/internet/posts/Opening-up-the-BBCs-Linked-Data
Thanks for your attention. Questions?