![Page 1: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/1.jpg)
FOSDEM 5/02/2011 1
With the help of the Datalift team and the support of the French National Research Agency
Datalift: A Catalyser for the Web of Data
François Scharffe, LIRMM/CNRS/University of Montpellier
[email protected] @lechatpito
![Page 2: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/2.jpg)
The data revolution is on its way!
As Open Data meets the Semantic Web
![Page 3: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/3.jpg)
The promises of linked data
![Page 4: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/4.jpg)
Richer Applications
Linked Data Lite | the Web on Steroids 1.0 (iPhone)
![Page 5: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/5.jpg)
Richer applications
BBC Programmes
![Page 6: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/6.jpg)
More precise search and QA
![Page 7: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/7.jpg)
Making your data 5 stars
http://www.w3.org/DesignIssues/LinkedData.html
![Page 8: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/8.jpg)
So, how do we lift data?
How do we publish data on the Web as linked data?
● Basic principles: Tim Berners-Lee [2006] (Design Issues)
– Use URIs to identify things (not only documents)
– Use HTTP URIs
– When dereferencing URIs, return a description of the resource
– Include links to other resources on the Web
![Page 9: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/9.jpg)
Welcome aboard the data lift
Published and interlinked data on the Web
Applications
Interconnection
Publication infrastructure
Data conversion
Vocabulary selection
Raw data
![Page 10: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/10.jpg)
Datalift
Datasets publication
R&D to automate the publication process
Tool suite to help publish data
Training, tutorials, data publication camps
![Page 11: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/11.jpg)
SemWebPro 18/01/2011 11
1st floor - Selection
![Page 12: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/12.jpg)
My friends' vocabularies …
Ø What is a (good) vocabulary for linked data?
§ Usability criteria
Simplicity, visibility, sustainability, integration, coherence …
Ø Different types of vocabularies
§ metadata, reference, domain, generalist …
§ The pillars of Linked Data: Dublin Core, FOAF, SKOS
Ø Good and less good practices
§ Example: BBC Programmes vs. legislation.gov.uk
§ Vocabulary of a Friend: networked vocabularies
Ø Linguistic problems
§ About 99% of existing vocabularies are in English
§ Terminological approach: which vocabularies for "Event", "Organization"?
![Page 13: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/13.jpg)
Did you say "vocabulary"?
… And why not "ontology"?
§ Or "schema", or "metadata schema"?
§ Or "model" (of data? of the world?)
Ø All these terms are used and justifiable
They are all "vocabularies"
§ They define types of objects (or classes) and the properties (or attributes) attached to these objects.
§ Types and attributes are logically defined and named using natural language
§ A (semantic) vocabulary is an explicit formalization of concepts existing in natural language
![Page 14: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/14.jpg)
Vocabularies for linked data
Ø They are meant to describe resources in RDF
Ø They are based on one of the standard W3C languages:
§ RDF Schema (RDFS)
• For vocabularies without too much logical complexity
§ OWL
• For more complex ontological constructs
§ These two languages are (almost) compatible
Ø They can be composed ad libitum
§ One can reuse a few elements of a vocabulary
§ The original semantics have to be respected
![Page 15: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/15.jpg)
What makes a good vocabulary ?
Ø A good vocabulary is a used vocabulary
§ Data published on CKAN give an idea of vocabulary usage
§ Example: list of datasets using FOAF http://xmlns.com/foaf/0.1/
Ø Other usability criteria
§ Simplicity and readability in natural language
§ Documentation of elements (definitions in natural language)
§ Visibility and sustainability of the publication
§ Flexibility and extensibility
§ Semantic integration (with other vocabularies)
§ Social integration (with the user community)
§ Social integration (with the user community)
![Page 16: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/16.jpg)
A vocabulary is also a community
Ø Bad (but common) practice
● Building a vocabulary in isolation
– For example, as a research project
– Without basing it on any existing vocabulary
§ Publishing it (or not) and then forgetting about it
§ Not caring about its users
Ø A good vocabulary has an organic life
§ Users and use cases
§ Revisions and extensions
§ Like a « natural » vocabulary
![Page 17: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/17.jpg)
Types of vocabularies
Ø Metadata vocabularies
§ Used to annotate other vocabularies
• Dublin Core, VANN, ccREL, Status
Ø Reference vocabularies
§ Provide « common » classes and properties
• FOAF, Event, Time, Org Ontology
Ø Domain vocabularies
§ Specific to a domain of knowledge
• Geonames, Music Ontology, WildLife Ontology
Ø "General" vocabularies
§ Describe « everything » at an arbitrary detail level
• DBpedia Ontology, Cyc Ontology, SUMO
![Page 18: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/18.jpg)
Vocabulary of a Friend
Ø http://www.mondeca.com/foaf/voaf
Ø A simple vocabulary...
Ø ...to represent interconnections between vocabularies
Ø A single entry point to the vocabularies and datasets of the Linked Data Cloud
Ø Ongoing work in Datalift
![Page 19: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/19.jpg)
2nd floor - Conversion
![Page 20: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/20.jpg)
URL Design and URL Patterns
Ø Good practices for linked data
§ Resource: http://dbpedia.org/resource/Paris
§ Document: http://dbpedia.org/page/Paris
§ Data: http://dbpedia.org/data/Paris
Ø … served using content negotiation
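A minimal sketch of the content negotiation step, assuming the DBpedia-style `/resource/`, `/page/` and `/data/` layout listed above and the common practice of answering a request for the resource URI with a 303 See Other redirect. The function `negotiate` and its simplified `Accept` parsing (q-values ignored) are illustrative assumptions, not DBpedia's actual server code.

```python
# Hypothetical sketch of 303-based content negotiation for the
# /resource/ -> /page/ | /data/ URL pattern. Q-values in the Accept
# header are ignored for brevity; a real server should honour them.
RDF_TYPES = {"application/rdf+xml", "text/turtle", "application/n-triples"}


def negotiate(resource_uri: str, accept: str) -> tuple[int, str]:
    """Return (HTTP status, Location) for a GET on a /resource/ URI."""
    if "/resource/" not in resource_uri:
        raise ValueError("not a resource URI: " + resource_uri)
    # Media types the client accepts, with parameters such as ";q=0.8" dropped
    wanted = {part.split(";")[0].strip() for part in accept.split(",")}
    if wanted & RDF_TYPES:
        target = resource_uri.replace("/resource/", "/data/", 1)  # RDF for machines
    else:
        target = resource_uri.replace("/resource/", "/page/", 1)  # HTML for humans
    # 303 See Other: the thing itself is not a document, so point to one about it
    return 303, target


print(negotiate("http://dbpedia.org/resource/Paris", "text/turtle"))
print(negotiate("http://dbpedia.org/resource/Paris", "text/html,application/xhtml+xml"))
```

A client asking for Turtle is redirected to `/data/Paris` and a browser to `/page/Paris`; both are documents about the resource, while the resource URI itself keeps identifying the thing (here, the city).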
![Page 21: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/21.jpg)
URI Pattern in REST
Ø REST (Representational State Transfer) services manipulate resources, and URLs are mainly used to address these resources
Ø A base URI:
§ http://www.example.com/bookstore/
Ø Each resource has a unique URL: (retrieve, update, create, delete)
§ http://www.example.com/bookstore/books/ISBN123
Ø Notion of a collection: (list, replace, create, delete)
§ http://www.example.com/bookstore/books
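The item/collection pattern above can be sketched as a small dispatcher that classifies a request by URI shape and HTTP method. It reuses the `bookstore` URIs from the slide; the `route` function and the method-to-operation mapping (GET, PUT, POST, DELETE) are assumptions that follow common REST conventions rather than a fixed standard.

```python
# Hypothetical sketch of the REST URI pattern: one base URI, a collection
# URI and one URI per item. The HTTP method -> operation mapping follows
# common REST conventions and is an assumption, not a standard.
from typing import Optional

BASE = "http://www.example.com/bookstore/"

ITEM_OPS = {"GET": "retrieve", "PUT": "create or update", "DELETE": "delete"}
COLLECTION_OPS = {"GET": "list", "PUT": "replace", "POST": "create", "DELETE": "delete"}


def route(method: str, uri: str) -> tuple[str, Optional[str]]:
    """Return (operation, item id or None) for a request on the bookstore."""
    if not uri.startswith(BASE):
        raise ValueError("outside the bookstore: " + uri)
    segments = [s for s in uri[len(BASE):].split("/") if s]
    if segments == ["books"]:
        ops, item = COLLECTION_OPS, None              # .../books
    elif len(segments) == 2 and segments[0] == "books":
        ops, item = ITEM_OPS, segments[1]             # .../books/ISBN123
    else:
        raise LookupError("no such resource: " + uri)  # 404 in a real server
    if method not in ops:
        raise LookupError(f"{method} not allowed on {uri}")  # 405 in a real server
    return ops[method], item


print(route("GET", BASE + "books/ISBN123"))
print(route("POST", BASE + "books"))
```

Because the HTTP method carries the operation, the same URI serves several operations, which is why the slide lists four operations next to each URI shape.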
![Page 22: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/22.jpg)
Conversion tools to RDF
Ø How should the raw data be converted?
§ Relational database?
§ (Semi-)structured formats?
§ Programmatic access (API)?
Ø There are solutions for all cases
![Page 23: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/23.jpg)
D2RQ Map
![Page 24: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/24.jpg)
Triplify: Relational data to JSON/RDF
Ø Extract a folder into your web app: http://sourceforge.net/projects/triplify/
Ø Modify a config file:
§ SQL query … URI pattern
§ PHP lover!
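Triplify itself is a PHP application configured with SQL queries; the Python sketch below only mimics the idea described above (one SQL query per URI pattern) on an in-memory SQLite table. It assumes that the first selected column is the row id used to build the subject URI and that column aliases such as `foaf:name` name the properties. The base URI, the `ex:` vocabulary and the `people` table are hypothetical.

```python
# Hypothetical sketch of the Triplify approach, in Python rather than PHP:
# one SQL query per URI pattern; the first column is the row id used to mint
# the subject URI, the remaining column aliases name the properties.
import sqlite3

BASE = "http://example.org/app/triplify/"  # assumed base URI of the web app
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/vocab#",  # hypothetical vocabulary
}
QUERIES = {  # URI pattern -> SQL query (first column = id)
    "person": 'SELECT id, name AS "foaf:name", city AS "ex:city" FROM people',
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as foaf:name into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def literal(value: object) -> str:
    """N-Triples plain literal with backslashes and double quotes escaped."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def triplify(conn: sqlite3.Connection) -> list[str]:
    lines = []
    for pattern, sql in QUERIES.items():
        cursor = conn.execute(sql)
        columns = [d[0] for d in cursor.description]
        for row in cursor:
            subject = f"<{BASE}{pattern}/{row[0]}>"
            for column, value in zip(columns[1:], row[1:]):
                if value is not None:  # NULL columns produce no triple
                    lines.append(f"{subject} <{expand(column)}> {literal(value)} .")
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO people VALUES (1, 'Ada', 'Montpellier')")
print("\n".join(triplify(conn)))
```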
![Page 25: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/25.jpg)
Working on spreadsheets
![Page 26: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/26.jpg)
Google acquired Freebase
http://code.google.com/p/google-refine/
![Page 27: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/27.jpg)
RDF extension for Google Refine
Ø A graphical extension for Google Refine for exporting the cleaned data as RDF: http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/
| Name | Job title | Grade | Organization | Annual pay rate (including taxable benefits and allowances) | Notes |
|---|---|---|---|---|---|
| Stephan Wilcke | Chief Executive Officer | | Asset Protection Agency | £150,000 - £154,999 | |
| Jens Bech | Chief Risk Officer | | Asset Protection Agency | £165,000 - £169,999 | No pension |
| Ion Dagtoglou | Chief Investment Officer | | Asset Protection Agency | £165,000 - £169,999 | No pension |
| Brian Scammell | Chief Credit Officer | | Asset Protection Agency | £130,000 - £134,999 | 4 days per week |
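The RDF extension lets you map the cleaned columns onto RDF. The sketch below performs a comparable mapping by hand on two of the rows above, so the target shape is visible: one URI per person, `foaf:name` for the name, and the pay band split into two integer bounds. Apart from the RDF, FOAF and XSD terms, the property names and URI scheme are hypothetical, and this is not the extension's own output.

```python
# Hypothetical sketch: turn cleaned spreadsheet rows into N-Triples.
# foaf:Person/foaf:name are real FOAF terms; the ex: properties and the
# person URI scheme are illustrative assumptions.
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/salaries#"            # hypothetical vocabulary
BASE = "http://example.org/salaries/person/"   # hypothetical URI base

rows = [
    {"name": "Stephan Wilcke", "title": "Chief Executive Officer",
     "org": "Asset Protection Agency", "pay": "£150,000 - £154,999", "notes": ""},
    {"name": "Jens Bech", "title": "Chief Risk Officer",
     "org": "Asset Protection Agency", "pay": "£165,000 - £169,999", "notes": "No pension"},
]


def slug(text: str) -> str:
    """URI-safe local name, e.g. 'Jens Bech' -> 'jens-bech'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def pay_band(text: str) -> tuple[int, int]:
    """Parse a band such as '£150,000 - £154,999' into (150000, 154999)."""
    low, high = (int(re.sub(r"[^0-9]", "", part)) for part in text.split("-"))
    return low, high


def lit(value: str) -> str:
    """N-Triples string literal with backslashes and quotes escaped."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_ntriples(row: dict) -> list[str]:
    s = f"<{BASE}{slug(row['name'])}>"
    low, high = pay_band(row["pay"])
    triples = [
        f"{s} <{RDF_TYPE}> <{FOAF}Person> .",
        f"{s} <{FOAF}name> {lit(row['name'])} .",
        f"{s} <{EX}jobTitle> {lit(row['title'])} .",
        f"{s} <{EX}organization> {lit(row['org'])} .",
        f'{s} <{EX}minAnnualPay> "{low}"^^<{XSD_INT}> .',
        f'{s} <{EX}maxAnnualPay> "{high}"^^<{XSD_INT}> .',
    ]
    if row["notes"]:  # empty cells produce no triple
        triples.append(f"{s} <{EX}notes> {lit(row['notes'])} .")
    return triples


for row in rows:
    print("\n".join(to_ntriples(row)))
```

Splitting the pay band into numeric bounds is the kind of cleaning step that makes the data queryable (for example, filtering on salary) once it is RDF.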
![Page 28: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/28.jpg)
Google Refine and RDF
![Page 29: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/29.jpg)
3rd floor - Publication
![Page 30: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/30.jpg)
Publication components
Ø Components: SPARQL endpoint, REST interface, RDF storage, data loading (from several sources), inference engine, querying and browsing
Ø A few products: Virtuoso, Sesame, Mulgara, 4store, OWLIM, AllegroGraph, Big Data, Jena
![Page 31: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/31.jpg)
Named graphs
Ø Deletion can be done on a whole graph
Ø SPARQL queries can select which graphs they apply to
Ø Without named graphs, RDF graphs are bags of triples where everything is mixed
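One way to picture named graphs is as quads: each triple is stored with the URI of the graph it belongs to, so a whole graph can be deleted or queried on its own instead of all triples being mixed in one bag. The `QuadStore` class below is a hypothetical in-memory sketch, not the API of any of the stores listed earlier; its `drop` method mirrors what SPARQL 1.1 Update's `DROP GRAPH` does on a real store.

```python
# Hypothetical sketch of named graphs: each triple is stored together with the
# URI of the graph it belongs to (a "quad"), so a whole graph can be deleted
# or queried on its own instead of mixing everything in one bag of triples.
from collections import defaultdict
from typing import Optional

Triple = tuple[str, str, str]


class QuadStore:
    def __init__(self) -> None:
        self.graphs: dict[str, set[Triple]] = defaultdict(set)

    def add(self, graph: str, s: str, p: str, o: str) -> None:
        self.graphs[graph].add((s, p, o))

    def drop(self, graph: str) -> None:
        """Delete a whole graph at once (cf. DROP GRAPH in SPARQL 1.1 Update)."""
        self.graphs.pop(graph, None)

    def triples(self, graph: Optional[str] = None) -> set[Triple]:
        """Triples of one named graph, or the union of all graphs if None."""
        if graph is not None:
            return set(self.graphs.get(graph, set()))
        merged: set[Triple] = set()
        for triples in self.graphs.values():
            merged |= triples
        return merged


store = QuadStore()
store.add("http://example.org/graph/source1", "ex:a", "ex:p", "ex:b")
store.add("http://example.org/graph/source2", "ex:a", "ex:q", "ex:c")
store.drop("http://example.org/graph/source1")  # removes only source1's triples
print(store.triples())
```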
![Page 32: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/32.jpg)
Inference
Ø Generating triples from other triples
Ø Deduction mechanism
§ All men are mortal, Socrates is a man, therefore Socrates is mortal
Ø Avoids having to state everything exhaustively, and gives meaning to defining hierarchies
Ø Constraints: cardinality, NFPs, ...
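The Socrates syllogism is what two RDFS entailment rules produce: subclass transitivity, and propagation of `rdf:type` up the subclass hierarchy. Below is a minimal forward-chaining sketch (apply the rules until no new triple appears) over plain string triples; the `ex:` names are illustrative, and a real inference engine implements many more rules and far more efficient strategies.

```python
# Minimal forward-chaining sketch of two RDFS entailment rules:
#   rdfs11: (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
#   rdfs9:  (x type A) and (A subClassOf B)        =>  (x type B)
TYPE, SUB = "rdf:type", "rdfs:subClassOf"
Triple = tuple[str, str, str]


def rdfs_closure(triples: set[Triple]) -> set[Triple]:
    """Apply the two rules until a fixpoint: no new triple can be derived."""
    inferred = set(triples)
    while True:
        sub = {(s, o) for s, p, o in inferred if p == SUB}
        new = {(a, SUB, c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, TYPE, b) for x, p, a in inferred if p == TYPE
                for a2, b in sub if a == a2}
        if new <= inferred:
            return inferred
        inferred |= new


facts = {
    ("ex:Socrates", TYPE, "ex:Man"),   # Socrates is a man
    ("ex:Man", SUB, "ex:Mortal"),      # all men are mortal
}
print(("ex:Socrates", TYPE, "ex:Mortal") in rdfs_closure(facts))  # True
```

Because Socrates' mortality is derived rather than stored, the data only has to state the class hierarchy once.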
![Page 33: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/33.jpg)
Analysis of RDF stores: the QSOS method
Ø Qualification and Selection of Open Source Software
§ An open source project about open source solutions
§ http://www.qsos.org
Ø QSOS objectives
§ Qualify software
§ Compare solutions after defining requirements and weighting the criteria
§ Select the product best suited to a given need
Ø QSOS provides
§ An objective and formalized method
§ A repository of available evaluations
§ Tools that make the method easier to apply
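To make the "weighting the criteria" step concrete, here is a sketch that ranks products by a weighted average of per-criterion scores. It assumes QSOS-style scores of 0, 1 or 2 per criterion and a simple weighted average as the way of combining them; the store names, criteria, weights and scores are invented for illustration and are not results of any QSOS evaluation.

```python
# Hypothetical sketch of a QSOS-style comparison: each product is scored per
# criterion (0, 1 or 2), the user weights the criteria according to their
# requirements, and products are ranked by weighted average. Product names,
# criteria and scores below are made up for illustration.


def weighted_score(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted average of criterion scores; unscored criteria count as 0."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one criterion needs a non-zero weight")
    return sum(scores.get(name, 0) * w for name, w in weights.items()) / total


def rank(products: dict[str, dict[str, int]],
         weights: dict[str, float]) -> list[tuple[str, float]]:
    """Products sorted from best to worst weighted score."""
    scored = [(name, weighted_score(s, weights)) for name, s in products.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


products = {
    "StoreA": {"sparql": 2, "inference": 1, "community": 2},
    "StoreB": {"sparql": 2, "inference": 2, "community": 1},
}
weights = {"sparql": 3, "inference": 2, "community": 1}
for name, score in rank(products, weights):
    print(f"{name}: {score:.2f}")
```

Changing only the weights (for instance, raising `community`) can change the ranking, which is the point of defining requirements before comparing.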
![Page 34: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/34.jpg)
4th floor - Interconnection
![Page 35: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/35.jpg)
Linked data and interconnections
Ø Without links there is no Web, only data silos
Ø Links can be part of a dataset's design (reference datasets)
Ø Links can be found after publication: equivalence links between resources
![Page 36: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/36.jpg)
How to interlink your data?
![Page 37: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/37.jpg)
Tools
Ø RKB-CRS: a coreference resolution service for the RKB knowledge base
Ø LD-mapper: a linkage tool for datasets described using the Music Ontology
Ø ODD Linker: a linkage tool based on SQL
Ø RDF-AI: multi-purpose data linkage and fusion
Ø Silk and Silk LSL: linkage tool and linkage specification language
Ø KnoFuss architecture: dataset linkage and fusion
![Page 38: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/38.jpg)
Example Silk specification
    <Silk>
      <Prefix id="rdf" namespace="http://www.w3.org/1999/02/22-rdf-syntax-ns#" />
      <Prefix id="rdfs" namespace="http://www.w3.org/2000/01/rdf-schema#" />
      <Prefix id="owl" namespace="http://www.w3.org/2002/07/owl#" />
      <Prefix id="dbpedia" namespace="http://dbpedia.org/ontology/" />
      <Prefix id="gn" namespace="http://www.geonames.org/ontology#" />
      <DataSource id="dbpedia">
        <EndpointURI>http://demo_sparql_server1/sparql</EndpointURI>
        <Graph>http://dbpedia.org</Graph>
      </DataSource>
      <DataSource id="geonames">
        <EndpointURI>http://demo_sparql_server2/sparql</EndpointURI>
        <Graph>http://sws.geonames.org/</Graph>
      </DataSource>
      <Thresholds accept="0.9" verify="0.7" />
      <Output acceptedLinks="accepted_links.n3" verifyLinks="verify_links.n3" mode="truncate" />
      <Interlink id="cities">
        <LinkType>owl:sameAs</LinkType>
        <SourceDataset dataSource="dbpedia" var="a">
          <RestrictTo> ?a rdf:type dbpedia:City </RestrictTo>
        </SourceDataset>
        <TargetDataset dataSource="geonames" var="b">
          <RestrictTo> ?b rdf:type gn:P </RestrictTo>
        </TargetDataset>
        <LinkCondition>
          <AVG>
            <Compare metric="jaroSimilarity">
              <Param name="str1" path="?a/rdfs:label" />
              <Param name="str2" path="?b/gn:name" />
            </Compare>
            <Compare metric="numSimilarity">
              <Param name="num1" path="?a/dbpedia:populationTotal" />
              <Param name="num2" path="?b/gn:population" />
            </Compare>
          </AVG>
        </LinkCondition>
      </Interlink>
    </Silk>
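To show what the link condition in this specification computes for one candidate pair, here is a sketch: the average of a Jaro similarity on the labels and a numeric similarity on the populations, compared against the accept (0.9) and verify (0.7) thresholds. The `jaro` function is the textbook definition; the `num_similarity` formula and the inclusive threshold comparisons are assumptions, since Silk's exact `numSimilarity` definition is not given here, and `link_decision` is a hypothetical helper rather than Silk's API.

```python
# Sketch of the <LinkCondition> for one candidate pair: the average of a Jaro
# similarity on labels and a numeric similarity on populations, checked
# against the accept/verify thresholds. num_similarity is an assumed formula.


def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):  # find matching characters within the window
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    k = out_of_order = 0  # matched characters appearing in a different order
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # number of transpositions
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def num_similarity(a: float, b: float) -> float:
    """Assumed formula: 1 - relative difference, clipped to [0, 1]."""
    largest = max(abs(a), abs(b))
    if largest == 0:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / largest)


def link_decision(label_a: str, label_b: str, pop_a: float, pop_b: float,
                  accept: float = 0.9, verify: float = 0.7) -> str:
    """Classify a candidate owl:sameAs link as 'accept', 'verify' or 'reject'."""
    score = (jaro(label_a, label_b) + num_similarity(pop_a, pop_b)) / 2  # <AVG>
    if score >= accept:
        return "accept"   # the spec's <Output> sends these to accepted_links.n3
    if score >= verify:
        return "verify"   # ...and these to verify_links.n3 for manual review
    return "reject"
```

Averaging two weak signals (name and population) lets a slightly different spelling still produce a confident link when the populations agree, while a same-named city with a very different population falls into the "verify" band.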
![Page 39: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/39.jpg)
Where to find links ?
![Page 40: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/40.jpg)
Towards automated interconnection services
Ø The linkage specification could be simplified
§ By using alignments between vocabularies
§ By detecting discriminating properties
§ By attaching metadata to ontologies that indicate comparison methods
Ø Work in progress in Datalift
![Page 41: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/41.jpg)
5th floor - Applications
![Page 43: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/43.jpg)
VisiNav
![Page 44: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/44.jpg)
Sig.ma
![Page 45: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/45.jpg)
![Page 46: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/46.jpg)
NosDéputés.fr
![Page 47: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/47.jpg)
A few examples from the US
http://data-gov.tw.rpi.edu/demo/USForeignAid/demo-1554.html
![Page 48: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/48.jpg)
Mashups … Mashups … Mashups …
![Page 49: Datalift a-catalyser-for-the-web-of-data-fosdem-05-02-2011](https://reader035.vdocuments.site/reader035/viewer/2022062708/558cd094d8b42ad6438b45c6/html5/thumbnails/49.jpg)
That's it !
● Datalift.org
● We're looking for a Datageek!