summer school lex 2014 - rdf + sparql querying the web of (lex)data
Lecture for SUMMER SCHOOL LEX 2014, 4 Sept. 2014, Ravenna, Italy http://summerschoollex.cirsfid.unibo.it/
RDF + SPARQL: querying the web of (lex)data
Diego Valerio Camarda regesta.exe
www.regesta.com
[email protected] dvcama @ github&twitter
DiegoValerioCamarda @ slideshare
a (really) short introduction to linked open data
what about IRIs and RDF: a new way to publish data on the web
IDs are ambiguous and suck!
linked data principles (Tim Berners-Lee, July 27, 2006)
› Use URIs as names for things
› Use HTTP URIs so that people can look up those names
› When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
› Include links to other URIs, so that they can discover more things
The Children and Families Act 2014
http://www.legislation.gov.uk/id/uksi/2014/2270
what about IRIs and RDF: turning documents into data
A new way to design databases: RDF (aka ‘define knowledge’)
Go Triples, go! the standard (old) approach
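ID_P | COGNOME (surname) | NOME (first name) | REF_ID_SOCIETA (company) | GENERE (gender)
1 | Camarda | Diego | 1 | maschio (male)
2 | … | … | … | …

ID_SOCIETA | DENOMINAZIONE (company name) | SITO (website)
1 | Regesta.exe srl | www.regesta.com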
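To see how the two tables above can turn into the triples on the next slides, here is a minimal Python sketch. Assumptions not in the slides: the IRI templates http://www.regesta.com/person/{id} and http://www.regesta.com/company/{id} are hypothetical, and so is the GENDER dictionary mapping the Italian values to the 'male'/'female' literals. Note how the foreign key REF_ID_SOCIETA stops being a number and becomes a link to another resource.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

FOAF = "http://xmlns.com/foaf/0.1/"
ORG = "http://www.w3.org/ns/org#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical IRI templates: the slides do not say how row ids become IRIs.
PERSON_IRI = "http://www.regesta.com/person/{}"
COMPANY_IRI = "http://www.regesta.com/company/{}"

# Hypothetical mapping from the Italian GENERE values to English literals.
GENDER = {"maschio": "male", "femmina": "female"}


def iri(value: str) -> str:
    return f"<{value}>"


def literal(value: str) -> str:
    return f'"{value}"'


def person_to_triples(row: Dict) -> List[Triple]:
    s = iri(PERSON_IRI.format(row["ID_P"]))
    return [
        (s, iri(RDF_TYPE), iri(FOAF + "Person")),
        (s, iri(FOAF + "familyName"), literal(row["COGNOME"])),
        (s, iri(FOAF + "firstName"), literal(row["NOME"])),
        (s, iri(FOAF + "gender"), literal(GENDER[row["GENERE"]])),
        # The foreign key becomes a link to another resource, not a bare number.
        (s, iri(ORG + "memberOf"), iri(COMPANY_IRI.format(row["REF_ID_SOCIETA"]))),
    ]


def company_to_triples(row: Dict) -> List[Triple]:
    s = iri(COMPANY_IRI.format(row["ID_SOCIETA"]))
    return [
        (s, iri(RDF_TYPE), iri(ORG + "Organization")),
        (s, iri(SKOS + "prefLabel"), literal(row["DENOMINAZIONE"])),
        (s, iri(FOAF + "homepage"), iri("http://" + row["SITO"])),
    ]


person = {"ID_P": 1, "COGNOME": "Camarda", "NOME": "Diego", "REF_ID_SOCIETA": 1, "GENERE": "maschio"}
company = {"ID_SOCIETA": 1, "DENOMINAZIONE": "Regesta.exe srl", "SITO": "www.regesta.com"}

for s, p, o in person_to_triples(person) + company_to_triples(company):
    print(s, p, o, ".")
```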
Go Triples, go! the new (cool) approach
<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" .
Subject Predicate Object
Go Triples, go! the new (cool) approach
<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" .
<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/firstName> "Diego" .
<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/gender> "male" .
Go Triples, go! the new (cool) approach
<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" ;
    <http://xmlns.com/foaf/0.1/firstName> "Diego" ;
    <http://xmlns.com/foaf/0.1/gender> "male" .
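The ";" trick simply writes the subject once and lists its predicate-object pairs. A minimal Python sketch of both layouts, assuming triples are already plain strings (no escaping or prefix handling, so it is not a full Turtle serializer; to_ntriples and to_turtle are hypothetical helper names):

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def to_ntriples(triples: Iterable[Triple]) -> str:
    """One complete 'subject predicate object .' statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def to_turtle(triples: Iterable[Triple]) -> str:
    """Write each subject once and separate its predicate-object pairs with ';'."""
    by_subject: Dict[str, List[Tuple[str, str]]] = {}  # dicts keep insertion order
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    blocks = []
    for s, pairs in by_subject.items():
        body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
        blocks.append(f"{s} {body} .")
    return "\n".join(blocks)


DIEGO = "<http://www.regesta.com/diego>"
FOAF = "http://xmlns.com/foaf/0.1/"
TRIPLES = [
    (DIEGO, f"<{FOAF}familyName>", '"Camarda"'),
    (DIEGO, f"<{FOAF}firstName>", '"Diego"'),
    (DIEGO, f"<{FOAF}gender>", '"male"'),
]

print(to_ntriples(TRIPLES))
print(to_turtle(TRIPLES))
```

Same three statements, same meaning: the grouped form is only shorter to write and read.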
Go Triples, go! ok, but what is a “diego”?
Go Triples, go! it’s a person!
<http://www.regesta.com/diego> a <http://xmlns.com/foaf/0.1/Person> .
Go Triples, go! adding a Class
<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" ;
    <http://xmlns.com/foaf/0.1/firstName> "Diego" ;
    <http://xmlns.com/foaf/0.1/gender> "male" .
<http://www.regesta.com/diego> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
Go Triples, go! building a graph
<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" ;
    <http://xmlns.com/foaf/0.1/firstName> "Diego" ;
    <http://xmlns.com/foaf/0.1/gender> "male" ;
    <http://www.w3.org/1999/...#type> <http://xmlns.com/foaf/0.1/Person> .
<http://www.regesta.com/diego> <http://www.w3.org/ns/org#memberOf> <http://www.regesta.com/about> .
Go Triples, go! building a graph
<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" ;
    <http://xmlns.com/foaf/0.1/firstName> "Diego" ;
    <http://xmlns.com/foaf/0.1/gender> "male" ;
    <http://www.w3.org/1999/...#type> <http://xmlns.com/foaf/0.1/Person> ;
    <http://www.w3.org/ns/org#memberOf> <http://www.regesta.com/about> .
<http://www.regesta.com/about> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/org#Organization> .
Go Triples, go! building a graph
<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" ;
    <http://xmlns.com/foaf/0.1/firstName> "Diego" ;
    <http://xmlns.com/foaf/0.1/gender> "male" ;
    <http://www.w3.org/1999/...#type> <http://xmlns.com/foaf/0.1/Person> ;
    <http://www.w3.org/ns/org#memberOf> <http://www.regesta.com/about> .
<http://www.regesta.com/about> <http://www.w3.org/1999/...#type> <http://www.w3.org/ns/org#Organization> ;
    <http://www.w3.org/2004/02/skos/core#prefLabel> "Regesta.exe srl" ;
    <http://xmlns.com/foaf/0.1/homepage> <http://www.regesta.com> .
Go Triples, go! Objects could be Subjects
Go Triples, go! considering diego and regesta
Go Triples, go! <diego> <memberOf> <regesta>
Go Triples, go! but, <regesta> <locatedIn> <rome>
Go Triples, go! <diego> <placeOfBirth> <rome>
Go Triples, go! <rome> <parentADM> <italy>
Go Triples, go! <silvia> <placeOfBirth> <italy>
Go Triples, go! <silvia> <…> <…>
Go Triples, go! <…> <…> <…> = a knowledge graph!
[figure: the graph grows slide by slide, linking the nodes diego, regesta, silvia, rome and italy]
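Because every object can be the subject of another triple, exploring the graph is just following links. A minimal Python sketch, assuming the toy triples from the slide titles above (short names instead of full IRIs; reachable is a hypothetical helper): a breadth-first walk from diego reaches regesta, rome and, through rome, italy.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

GRAPH: List[Triple] = [
    ("diego", "memberOf", "regesta"),
    ("regesta", "locatedIn", "rome"),
    ("diego", "placeOfBirth", "rome"),
    ("rome", "parentADM", "italy"),
    ("silvia", "placeOfBirth", "italy"),
]


def reachable(start: str, triples: List[Triple]) -> Set[str]:
    """Breadth-first walk: each object found is expanded as a subject in turn."""
    outgoing: Dict[str, List[str]] = {}
    for s, _p, o in triples:
        outgoing.setdefault(s, []).append(o)
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in outgoing.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(reachable("diego", GRAPH)))
```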
A lot of sentences to achieve (descriptive) freedom
<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/familyName> "Camarda" .
<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/firstName> "Diego" .
<http://www.regesta.com/diego> <http://xmlns.com/foaf/0.1/gender> "male" .
<http://www.regesta.com/diego> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://www.regesta.com/diego> <http://www.w3.org/ns/org#memberOf> <http://www.regesta.com> .
<http://www.regesta.com/silvia> <http://xmlns.com/foaf/0.1/familyName> "Mazzini" .
<http://www.regesta.com/silvia> <http://xmlns.com/foaf/0.1/firstName> "Silvia" .
<http://www.regesta.com/silvia> <http://xmlns.com/foaf/0.1/gender> "female" .
<http://www.regesta.com/silvia> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://www.regesta.com/silvia> <http://www.w3.org/ns/org#memberOf> <http://www.regesta.com> .
<http://www.regesta.com> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/org#Organization> .
<http://www.regesta.com> <http://www.w3.org/2004/02/skos/core#prefLabel> "Regesta.exe srl" .
<http://www.regesta.com/silvia> <http://xmlns.com/foaf/0.1/knows> <http://www.regesta.com/diego> .
<…> <…> <…>.
<noBeer> <makeGoCrazy> <homer> . <noTv> <makeGoCrazy> <homer> . <noBeer> <makeGoCrazy> <homer> . <noTv> <makeGoCrazy> <homer> . <noBeer> <makeGoCrazy> <homer> . <noTv> <makeGoCrazy> <homer> . <noBeer> …
Standards for the Semantic Web
RDF http://www.w3.org/standards/techs/rdf
SPARQL http://www.w3.org/standards/techs/sparql
ONTOLOGIES http://www.w3.org/standards/semanticweb/ontology
Did you study HTML? Good! It's time for a new standard
The Resource Description Framework is a general-purpose language for representing information in the Web.
It's time for a new standard RDF
The SPARQL Protocol and RDF Query Language is a query language and protocol for RDF.
It's time for a new standard SPARQL
On the Semantic Web, vocabularies define the concepts and relationships (also referred to as “terms”) used to describe and represent an area of concern.
It's time for a new standard Ontologies
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
foaf:firstName
dc:title
rdfs:label
Pre:fixes (ontologies) just a few words
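A prefix is only an abbreviation: foaf:firstName and the full IRI are the same term. A minimal Python sketch of the expansion, using the three prefixes from the slide plus rdf: (added here, an assumption of this example, so that the keyword "a" can be expanded to rdf:type); expand and compact are hypothetical helper names:

```python
from typing import Dict

# foaf, dc and rdfs come from the slide; rdf is added for the 'a' keyword.
PREFIXES: Dict[str, str] = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(term: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Turn a prefixed name such as foaf:firstName into a full IRI."""
    if term == "a":  # Turtle/SPARQL shorthand for rdf:type
        return prefixes["rdf"] + "type"
    prefix, sep, local = term.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {term!r}")
    return prefixes[prefix] + local


def compact(iri: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Inverse direction: shorten a full IRI when a known namespace matches."""
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return f"<{iri}>"


print(expand("foaf:firstName"), compact("http://xmlns.com/foaf/0.1/Person"))
```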
Browsing the web of data
Resource Description Framework
› SPARQL endpoint
› dereferenceable URIs
› content negotiation
› standard ports, like 80 (HTTP)
› JSONP support
MUST!
Resource Description Framework
› SPARQL endpoint
› dereferenceable URIs
› content negotiation
› standard ports, like 80 (HTTP)
› JSONP support
› up-to-date
› the endpoint URL is easy to deduce from the resources
› the resources are described by dc:title or rdfs:label
› the endpoint hosts a page for humans
› the resources and the endpoint are on the same domain
SHOULD! (please do it, for me)
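When resources and endpoint share a domain, a client can guess the endpoint from any resource URI. A minimal Python sketch, assuming the convention scheme://host/sparql (the "http://yourdomain/sparql" pattern mentioned later); guess_endpoint is a hypothetical helper and the guess can of course be wrong for sites that put their endpoint elsewhere:

```python
from urllib.parse import urlsplit, urlunsplit


def guess_endpoint(resource_uri: str, path: str = "/sparql") -> str:
    """Keep scheme and host of the resource URI, replace the path with the endpoint path."""
    parts = urlsplit(resource_uri)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an HTTP(S) URI: {resource_uri!r}")
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(guess_endpoint("http://dati.camera.it/ocd/governo.rdf/g102"))
```

For the Chamber of Deputies resource this yields http://dati.camera.it/sparql, the endpoint listed in the case studies.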
SELECT * {?minnesota ?banana ?sun}
SPARQL: a must-know query language
SPARQL group graph pattern
[figure: the knowledge graph linking diego, regesta, silvia, rome and italy]
SELECT ?person {
  ?person <placeOfBirth> ?place .
  ?person <memberOf> ?company .
  ?company <locatedIn> ?place .
}
SPARQL group graph pattern
<diego>
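Why only <diego>? Every pattern in the group must match at the same time, and a variable used twice (?place, ?company) must take the same value everywhere. A minimal Python sketch of that matching by backtracking, assuming the toy triples from the graph slides with short names instead of IRIs (silvia's memberOf comes from the triple list earlier); match and solve are hypothetical helpers, not how a real triplestore plans joins:

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

DATA: List[Triple] = [
    ("diego", "placeOfBirth", "rome"),
    ("diego", "memberOf", "regesta"),
    ("regesta", "locatedIn", "rome"),
    ("rome", "parentADM", "italy"),
    ("silvia", "placeOfBirth", "italy"),
    ("silvia", "memberOf", "regesta"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one triple under the current bindings."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable already bound must keep its value: this is the join.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def solve(patterns: List[Triple], data: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding that satisfies all patterns of the group at once."""
    binding = binding if binding is not None else {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in data:
        extended = match(first, triple, binding)
        if extended is not None:
            yield from solve(rest, data, extended)


QUERY: List[Triple] = [
    ("?person", "placeOfBirth", "?place"),
    ("?person", "memberOf", "?company"),
    ("?company", "locatedIn", "?place"),
]

print([b["?person"] for b in solve(QUERY, DATA)])  # ['diego']
```

silvia drops out at the last pattern: regesta is located in rome, not in italy.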
SELECT ?person ?prop ?obj {
  ?person <placeOfBirth> ?place .
  ?person <memberOf> ?company .
  ?person ?prop ?obj .
  ?company <locatedIn> ?place .
}
SPARQL group graph pattern
person | prop | obj
<diego> | rdf:type | foaf:Person
<diego> | foaf:firstName | "Diego"
<diego> | foaf:familyName | "Camarda"
<diego> | foaf:gender | "male"
<diego> | org:memberOf | <regesta>
SPARQL group graph pattern
DESCRIBE <diego>
SPARQL describe
<diego> rdf:type foaf:Person .
<diego> foaf:firstName "Diego" .
<diego> foaf:familyName "Camarda" .
<diego> foaf:gender "male" .
<diego> org:memberOf <regesta> .
<silvia> foaf:knows <diego> .
SPARQL describe
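The SPARQL specification leaves the exact output of DESCRIBE to each service. A common choice, and the one the result above suggests, is "every triple where the resource is subject or object" (hence <silvia> foaf:knows <diego>). A minimal Python sketch of that rule, assuming short names instead of IRIs; describe is a hypothetical helper:

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

DATA: List[Triple] = [
    ("diego", "rdf:type", "foaf:Person"),
    ("diego", "foaf:firstName", '"Diego"'),
    ("diego", "foaf:familyName", '"Camarda"'),
    ("diego", "foaf:gender", '"male"'),
    ("diego", "org:memberOf", "regesta"),
    ("silvia", "foaf:firstName", '"Silvia"'),
    ("silvia", "foaf:knows", "diego"),
]


def describe(resource: str, data: List[Triple]) -> List[Triple]:
    """Return the triples where the resource appears as subject or as object."""
    return [t for t in data if t[0] == resource or t[2] == resource]


for s, p, o in describe("diego", DATA):
    print(s, p, o, ".")
```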
DISTINCT, COUNT
GRAPH, PREFIX
isBlank, isIRI, isLiteral, isNumeric
FILTER, REGEX, STR
FILTER NOT EXISTS, MINUS
ORDER BY, OFFSET, LIMIT
for other stuff: http://www.w3.org/TR/sparql11-query/
SPARQL minimum requirements
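FILTER, DISTINCT, ORDER BY, OFFSET and LIMIT work on the list of solutions, in this order: FILTER prunes solutions inside the pattern, then SPARQL 1.1 applies ORDER BY, DISTINCT, OFFSET and LIMIT. A minimal Python sketch on already-bound rows (projection skipped; the rows are made-up sample data; run_modifiers is a hypothetical helper; Python's re stands in for the XPath regular expressions SPARQL actually uses):

```python
import re
from typing import Dict, List, Optional

Row = Dict[str, str]


def run_modifiers(rows: List[Row], filter_var: str, pattern: str, order_by: str,
                  offset: int = 0, limit: Optional[int] = None) -> List[Row]:
    # FILTER(REGEX(?var, pattern, 'i'))
    kept = [r for r in rows if re.search(pattern, r[filter_var], re.IGNORECASE)]
    # ORDER BY ?order_by (plain string comparison in this sketch)
    kept = sorted(kept, key=lambda r: r[order_by])
    # DISTINCT: drop solutions identical in every variable, keep the first one
    unique: List[Row] = []
    seen = set()
    for r in kept:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # OFFSET, then LIMIT
    end = None if limit is None else offset + limit
    return unique[offset:end]


ROWS = [
    {"title": "Budget Law", "date": "2014-03-01"},
    {"title": "Decree", "date": "2014-01-15"},
    {"title": "electoral law", "date": "2013-12-01"},
    {"title": "Budget Law", "date": "2014-03-01"},
]

print(run_modifiers(ROWS, "title", "law", "date", limit=1))
```

This is also why OFFSET/LIMIT paging is only stable together with ORDER BY.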
Please start negotiating content right now!
Hi dude, I accept: text/html,application/xhtml+xml → HTML page: Great! I’ll serve you a web page
Hi dude, I accept: application/rdf+xml → RDF data: Great… 302, redirect!
Hi dude, I accept: pizza/margherita → 406 error: mmm… sorry
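The dialogue above is the server side of content negotiation. A minimal Python sketch of that decision, assuming two hypothetical groups of media types and a hypothetical data URL; it ignores q-values, which a real server should honour:

```python
from typing import Dict, Tuple

# Hypothetical media-type groups: a real server lists what it can actually produce.
HTML_TYPES = {"text/html", "application/xhtml+xml"}
RDF_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}

# Hypothetical location of the RDF document describing the resource.
DATA_URL = "http://www.regesta.com/data/diego.rdf"


def negotiate(accept_header: str, data_url: str) -> Tuple[int, Dict[str, str]]:
    """Pick a response for a resource URI; q-values are ignored in this sketch."""
    for part in accept_header.split(","):
        media = part.split(";")[0].strip().lower()
        if media in HTML_TYPES or media == "*/*":
            return 200, {"Content-Type": "text/html"}
        if media in RDF_TYPES:
            # Send the client to the data document, as in "302, redirect!"
            return 302, {"Location": data_url}
    return 406, {}


print(negotiate("pizza/margherita", DATA_URL))
```

The slide uses 302; many linked data servers answer this case with 303 See Other instead.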
Please start negotiating content right now!
application/rdf+xml, application/xml, text/plain, text/turtle, application/x-turtle, application/trix, application/x-trig, text/n3, text/rdf+n3, application/x-binary-rdf, text/x-nquads, application/ld+json, application/rdf+json, application/xhtml+xml, text/xml, application/json, application/rdf+n3, application/sparql-results+xml, application/sparql-results+json
curl -L -H "Accept: application/rdf+xml" http://dati.camera.it/ocd/governo.rdf/g102
curl -L -H "Accept: text/n3" http://dati.camera.it/ocd/governo.rdf/g102
Please start negotiating content using CURL…
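The same two requests without curl, as a minimal Python standard-library sketch: the Accept header plays the role of -H, and urllib.request.urlopen follows 30x redirects by default, much like -L. build_request and fetch are hypothetical helper names; fetching needs network access, so the usage lines are commented out.

```python
import urllib.request

RESOURCE = "http://dati.camera.it/ocd/governo.rdf/g102"


def build_request(uri: str, media_type: str) -> urllib.request.Request:
    """Same idea as curl -H "Accept: ...": ask for one specific representation."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def fetch(uri: str, media_type: str, timeout: float = 30.0) -> bytes:
    """urlopen follows redirects with the default opener, like curl -L."""
    with urllib.request.urlopen(build_request(uri, media_type), timeout=timeout) as resp:
        return resp.read()


# Usage (needs network access):
# rdf_xml = fetch(RESOURCE, "application/rdf+xml")
# n3 = fetch(RESOURCE, "text/n3")
```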
Java: Sesame / Jena
Python: RDFLib
Ruby: RDF.rb
Node.js: sparql-client
or, as I do, a simple HTTP GET + parsing the results as JSON or XML
Please start negotiating content …or a framework!
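The "simple HTTP GET + JSON" route, as a minimal Python standard-library sketch: the SPARQL protocol sends the query text in the query parameter of a GET request, and Accept: application/sparql-results+json asks for the standard JSON results format (head.vars plus results.bindings). query_url, parse_results and select are hypothetical helper names; the live call is commented out because it needs network access and a running endpoint.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List


def query_url(endpoint: str, query: str) -> str:
    """SPARQL protocol, query via GET: the query goes in the 'query' parameter."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def parse_results(payload: str) -> List[Dict[str, str]]:
    """Flatten SPARQL JSON results into {variable: value} dicts (unbound ones skipped)."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    return [
        {v: b[v]["value"] for v in variables if v in b}
        for b in doc["results"]["bindings"]
    ]


def select(endpoint: str, query: str, timeout: float = 60.0) -> List[Dict[str, str]]:
    request = urllib.request.Request(
        query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_results(resp.read().decode("utf-8"))


# Usage (needs network access and a live endpoint):
# rows = select("http://dati.camera.it/sparql", "SELECT * {?s ?p ?o} LIMIT 5")
```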
RDF data storing and deploying
It’s slow, so keep calm
usually: 1 record → 15 triples
e.g. Chamber of Deputies: 2,949,771 votes → 64,948,856 triples
RDF will probably transform data into big data
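A quick sizing check on the figures above, as a minimal Python sketch (estimated_triples and observed_ratio are hypothetical helpers): the 15-triples-per-record rule of thumb would give about 44 million triples for the Chamber's votes, while the published figure works out to roughly 22 triples per vote, so the rule is better treated as a lower bound than a forecast.

```python
def estimated_triples(records: int, triples_per_record: float) -> int:
    """Rough size of a dataset once every record becomes a bundle of triples."""
    return round(records * triples_per_record)


def observed_ratio(triples: int, records: int) -> float:
    """Average number of triples per source record."""
    return triples / records


VOTES = 2_949_771     # Chamber of Deputies votes, from the slide
TRIPLES = 64_948_856  # triples they produced, from the slide

print(estimated_triples(VOTES, 15))              # 44246565 with the rule of thumb
print(round(observed_ratio(TRIPLES, VOTES), 2))  # 22.02 triples per vote
```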
Virtuoso, Sesame, Fuseki (Jena), Owlim / Bigdata (Sesame), AllegroGraph, D2R server, ARC2, …
Triplestores: I just need a SPARQL endpoint
I just really need http://yourdomain/sparql
Case studies
select distinct ?o where {?s a ?o}
select ?o (count(distinct ?s) as ?n) where {?s a ?o} group by ?o
select (count(?s) as ?n) where {?s ?p ?o}
select ?class (count(?s) as ?n) where {?s ?p ?o; a ?class} group by ?class
select distinct ?p where {?s a <http://classe>; ?p ?o}
select ?p (count(?p) as ?n) where {?s a <http://classe>; ?p ?o} group by ?p
select ?s where {?s a <http://classe>}
select ?p ?o where {<http://URI> ?p ?o}
select distinct ?s ?title where {?s a <http://classe>; dc:title ?title. FILTER(REGEX(?title,'parola','i'))} LIMIT 100
SPARQL magic: a query for all seasons
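What the second query computes (distinct instances per class, grouped by class) can be mimicked in a few lines. A minimal Python sketch over a toy graph with short names instead of IRIs; class_census is a hypothetical helper, and on a real endpoint the GROUP BY query does this work server-side:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

DATA: List[Triple] = [
    ("diego", "rdf:type", "foaf:Person"),
    ("silvia", "rdf:type", "foaf:Person"),
    ("regesta", "rdf:type", "org:Organization"),
    ("diego", "foaf:firstName", '"Diego"'),
]


def class_census(triples: List[Triple]) -> Dict[str, int]:
    """Count distinct subjects per class: ?o (count(distinct ?s)) ... group by ?o."""
    members: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            members[o].add(s)
    return {cls: len(subjects) for cls, subjects in members.items()}


print(class_census(DATA))
```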
Case studies: Chamber of Deputies, Senate of the Republic
http://dati.camera.it/sparql
http://dati.senato.it/sparql
Useful links
All Bills filtered by year
SELECT DISTINCT * {
  ?bill a ocd:atto; dc:title ?title; dc:date ?date .
  FILTER(regex(?date,'^2014'))
} ORDER BY ?date
Last voted Bills
SELECT DISTINCT * WHERE {
  ?bill a ocd:atto; dc:title ?title .
  ?votazione a ocd:votazione;
    ocd:rif_attoCamera ?bill;
    dc:date ?data;
    dc:title ?denominazione;
    dc:description ?descrizione;
    ocd:votanti ?votanti;
    ocd:votazioneFinale 1;
    ocd:favorevoli ?favorevoli;
    ocd:contrari ?contrari;
    ocd:astenuti ?astenuti;
    ocd:rif_leg <http://dati.camera.it/ocd/legislatura.rdf/repubblica_17>
} ORDER BY DESC(?data)
Example queries: Chamber of Deputies
All Bills filtered by year
PREFIX osr: <http://dati.senato.it/osr/>
SELECT DISTINCT * {
  ?bill a osr:Ddl; osr:titolo ?title; osr:dataPresentazione ?date .
  FILTER(regex(STR(?date),'^2014'))
} ORDER BY ASC(?date)
Last approved Bills
PREFIX osr: <http://dati.senato.it/osr/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?ddl ?titolo ?titoloBreve ?natura ?stato ?dataApprovato WHERE {
  ?ddl a osr:Ddl .
  ?ddl osr:statoDdl ?stato .
  ?ddl osr:ramo "S"^^<http://www.w3.org/2001/XMLSchema#string> .
  ?ddl osr:dataPresentazione ?dataPresentazione .
  ?ddl osr:titolo ?titolo .
  OPTIONAL { ?ddl osr:titoloBreve ?titoloBreve } .
  ?ddl osr:natura ?natura .
  ?ddl osr:dataStatoDdl ?dataApprovato .
  ?ddl osr:testoApprovato ?testoApprovato
  FILTER(xsd:date(str(?dataApprovato)) <= xsd:date(str("2014-12-31")))
  FILTER(xsd:date(str(?dataApprovato)) >= xsd:date(str("2014-01-01")))
} ORDER BY ?dataApprovato
Example queries: Senate of the Republic
Case studies: UK Legislation
http://gov.tso.co.uk/legislation/sparql
http://openuplabs.tso.co.uk/sparql/gov-legislation
http://www.opsi.gov.uk/legislation-api/developer/formats/rdf
Useful links
All ‘Works’ filtered by year
SELECT ?work ?date ?title {
  ?work a frbr:Work .
  ?work dct:title ?title .
  ?work dct:created ?date .
  FILTER (REGEX(STR(?date),'^2014'))
} ORDER BY desc(?date)
Top subjects by year
SELECT (count(?sub) as ?tot) ?sub {
  ?work a frbr:Work .
  ?work dct:subject ?sub .
  ?work dct:created ?date .
  FILTER (REGEX(STR(?date),'^2014'))
} GROUP BY ?sub ORDER BY desc(?tot) LIMIT 100
Example queries: UK Legislation
Even more useful links
› W3C standards http://www.w3.org/standards/semanticweb/
› OKFN endpoints status (and list) http://sparqles.okfn.org
› LodLive (a SPARQL navigator) http://en.lodlive.it
› a very good intro to RDF https://github.com/JoshData/rdfabout/blob/gh-pages/intro-to-rdf.md
› Tim Berners-Lee’s “Linked Data – 5 stars ranking” http://www.w3.org/DesignIssues/LinkedData.html
› My github page http://github.com/dvcama
› My email mailto:[email protected]