Sebastian Hellmann - AKSW/KILT Copenhagen TKE 2016
AKSW / KILT in Leipzig
Leipzig has become one of the largest Semantic Web centers
AKSW has 4 subgroups and 45 PhD students http://aksw.org/Team.html
Current position:
- Head of AKSW / KILT research group (8 PhD students): Knowledge Integration and Language Technology (KILT), http://aksw.org/Groups/KILT.html
- Project manager for 2 H2020 projects and 1 German research project (BMWi): http://freme-project.eu/ , http://aligned-project.eu/ , http://smartdataweb.de/
- Executive Director of the DBpedia Association http://dbpedia.org
Outline
● The vision behind Linked Data - a technological introduction
● Linguistic Linked Open Data
● Knowledge Modelling vs. Data Encoding
● LIDER
● Challenges and Approaches
Web of Data
WWW vs. GGG - https://en.wikipedia.org/wiki/Giant_Global_Graph
Data on the Web vs. the Web of Data vs. the Semantic Web
RDF - Entity Attribute Value - http://dbpedia.org/resource/Copenhagen
Three ways to publish RDF:
1. Linked Data: resource-level access via HTTP request (next slide)
2. SPARQL: query access via triplestore database
3. Dump: dataset-level access via bulk download
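A minimal sketch of the entity-attribute-value view of RDF, assuming a handful of illustrative triples about http://dbpedia.org/resource/Copenhagen. The triples are chosen for illustration and are not an extract from DBpedia; the `match` helper is hypothetical and only mimics in memory what a single SPARQL triple pattern does against a triplestore.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (entity, attribute, value)

COPENHAGEN = "http://dbpedia.org/resource/Copenhagen"
DENMARK = "http://dbpedia.org/resource/Denmark"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
COUNTRY = "http://dbpedia.org/ontology/country"

# Illustrative triples only (assumed for this sketch, not a DBpedia extract).
TRIPLES: List[Triple] = [
    (COPENHAGEN, LABEL, "Copenhagen"),
    (COPENHAGEN, COUNTRY, DENMARK),
    (DENMARK, LABEL, "Denmark"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching a pattern; None acts as a wildcard."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Resource-level view: everything stated about one entity.
about_copenhagen = list(match(TRIPLES, s=COPENHAGEN))
```

Fixing only the subject corresponds to resource-level access, a pattern with wildcards corresponds to query access, and handing over `TRIPLES` as a whole corresponds to a dump.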
Linked Data
Four rules of https://www.w3.org/DesignIssues/LinkedData
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
https://en.wikipedia.org/wiki/Copenhagen vs. http://dbpedia.org/resource/Copenhagen
Source: https://www.w3.org/DesignIssues/LinkedData.html
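Rules 2 and 3 are usually realised through HTTP content negotiation: the same URI is requested with an `Accept` header that asks for RDF instead of HTML. The following is a minimal sketch using only the Python standard library; the function name `linked_data_request` is hypothetical, and the request is built but deliberately not sent, so no assumption is made here about what the server returns.

```python
from urllib.request import Request


def linked_data_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) an HTTP GET that asks for RDF rather than HTML."""
    return Request(uri, headers={"Accept": media_type})


req = linked_data_request("http://dbpedia.org/resource/Copenhagen")

# urllib.request.urlopen(req) would perform the actual lookup (needs network access).
```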
Open Data != Open Data
Open Access vs Open License
Open Access means accessible like a web page (often unclear license)
http://opendefinition.org by OKFN:
“Knowledge is open if anyone is free to access, use, modify, and share it — subject, at most, to measures that preserve provenance and openness.”
http://lod-cloud.net/
How is the Linked Data Cloud built?
- Open Access as the basis
- 50 links between things required to receive a dataset link
- http://lov.okfn.org
- http://datahub.io
- Assessing Quantity and Quality of Links Between Linked Data Datasets by Ciro Baron Neto, Dimitris Kontokostas, Sebastian Hellmann, Kay Müller, and Martin Brümmer in LDOW 2016 http://events.linkeddata.org/ldow2016/papers/LDOW2016_paper_09.pdf
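The 50-link criterion can be sketched as a count over triples. This is an illustration under stated assumptions, not the procedure actually used to draw the cloud: datasets are identified by a URI namespace prefix, a "link" is any triple whose subject and object fall into different namespaces, and all names and example.org-style namespaces below are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

MIN_LINKS = 50  # threshold named on the slide


def dataset_of(uri: str, namespaces: Dict[str, str]) -> Optional[str]:
    """Return the name of the dataset whose namespace prefixes the URI, if any."""
    for name, prefix in namespaces.items():
        if uri.startswith(prefix):
            return name
    return None


def dataset_links(
    triples: Iterable[Tuple[str, str, str]],
    namespaces: Dict[str, str],
    min_links: int = MIN_LINKS,
) -> Dict[Tuple[str, str], int]:
    """Count cross-dataset links and keep the pairs that reach the threshold."""
    counts: Counter = Counter()
    for s, _p, o in triples:
        src, dst = dataset_of(s, namespaces), dataset_of(o, namespaces)
        if src and dst and src != dst:
            counts[(src, dst)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_links}


# Hypothetical example: 50 links from dataset "a" to "b", only 3 from "b" to "a".
namespaces = {"a": "http://a.example/", "b": "http://b.example/"}
same_as = "http://www.w3.org/2002/07/owl#sameAs"
triples = [(f"http://a.example/{i}", same_as, f"http://b.example/{i}") for i in range(50)]
triples += [(f"http://b.example/{i}", same_as, f"http://a.example/{i}") for i in range(3)]
edges = dataset_links(triples, namespaces)
```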
Linguistic Linked Open Data
● Movement originated in the context of the Working Group for Open Data in Linguistics (OWLG) at Open Knowledge Foundation (OKFN)
● Open is supposed to mean Open license
● Join community mailing list at http://linguistics.okfn.org/
● Current information at http://linguistic-lod.org/ maintained by John McCrae -> Instructions on how to join the LLOD cloud
February 2012
Linked Data in Linguistics. Representing Language Data and Metadata (http://www.springer.com/computer/ai/book/978-3-642-28248-5 ) Christian Chiarcos, Sebastian Nordhoff, and Sebastian Hellmann (Eds.). Springer, Heidelberg, (2012)
Sept 2012
MLODE
Special Issue on Multilingual Linked Open Data (MLOD)
Editors: Sebastian Hellmann, Steven Moran, Martin Brümmer, and John McCrae, Semantic Web, vol. 6, no. 4, pp. 315-317, 2015
Sep 2013
LIDER FP7 EU Project
Start: Nov 2013, Duration: 2 years
http://lider-project.eu/
May 2014
Nov 2014
May 2015
May 2016
Should we all use Linked Data?
When should we use linked data?
How should we use linked data?
When should we not use it?
Entity Relationship Diagrams and UML
The Metadata Ecosystem of the DataId Ontology, Markus Freudenberg, submitted to MTSR Conf 2016
http://dataid.dbpedia.org
XML encoding variants
<same> should be symmetric, reflexive and transitive https://en.wikipedia.org/wiki/Equivalence_relation
Apples and oranges
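Whether a set of <same> assertions really behaves like an equivalence relation can be checked mechanically. The sketch below assumes the assertions have been extracted as plain (left, right) pairs; the function name and the example data are hypothetical.

```python
from typing import Iterable, Set, Tuple


def is_equivalence(elements: Iterable[str], pairs: Iterable[Tuple[str, str]]) -> bool:
    """Check reflexivity, symmetry and transitivity of a relation given as pairs."""
    rel: Set[Tuple[str, str]] = set(pairs)
    reflexive = all((x, x) in rel for x in elements)
    symmetric = all((y, x) in rel for (x, y) in rel)
    transitive = all(
        (x, z) in rel for (x, y) in rel for (y2, z) in rel if y == y2
    )
    return reflexive and symmetric and transitive


items = {"a", "b", "c"}
# "a" and "b" are declared the same in both directions; "c" stands alone.
pairs_ok = {("a", "a"), ("b", "b"), ("c", "c"), ("a", "b"), ("b", "a")}
# Same data, but the link from "b" back to "a" is missing.
pairs_bad = {("a", "a"), ("b", "b"), ("c", "c"), ("a", "b")}
```

An encoding in which only one direction is stored, as in `pairs_bad`, fails the symmetry check unless the consuming application adds the missing direction itself.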
Who can you ask what XML tags and structure mean and what they are used for?
Internationalization Tag Set (ITS) 2.0
http://www.w3.org/TR/its20/
● W3C Recommendation since 29 October 2013
● defines how to embed Machine Translation and Localisation annotations, so-called Data Categories, in (X)HTML and XML
● In addition to the human-readable document two ontologies are referenced that capture the semantics of the standard.
● ITS Ontology as companion
● NLP Interchange Format (NIF) is the recommended format for RDF conversion of ITS2.0 http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core
Internationalization Tag Set (ITS) 2.0
One of the most efficient and robust ways to annotate HTML in a standardized manner
NIF 2.1 release pending
Join W3C Community Group: https://www.w3.org/community/ld4lt/
NIF useful for:
● Adding semantics to NLP tool output and corpora
● Providing and publishing identifiers for text and annotations
NIF is compact and scalable (cf. http://wiki-link.nlp2rdf.org/ ):
● Google Wikilinks Corpus with 10.6 million webpages and 31.5 million Wikipedia links (about 3 per page) with a zipped size of 180 GB.
● 533 million triples (other formats 7-27% more)
● 79 GB (12 GB gzipped dumps) in Turtle format (original size 180 GB containing HTML markup)
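How NIF provides identifiers for text and annotations can be sketched as follows, assuming the offset-based URI scheme (`#char=begin,end`) and the NIF Core properties `isString`, `referenceContext`, `beginIndex`, `endIndex` and `anchorOf`. The document URI, the sample sentence and the helper name are hypothetical, and triples are shown as plain tuples rather than serialized Turtle.

```python
from typing import List, Tuple

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"


def nif_annotation(doc_uri: str, text: str, begin: int, end: int) -> List[Tuple[str, str, str]]:
    """Return NIF-style triples that identify the substring text[begin:end]."""
    context = f"{doc_uri}#char=0,{len(text)}"  # identifier for the whole text
    span = f"{doc_uri}#char={begin},{end}"     # identifier for the annotated span
    return [
        (context, NIF + "isString", text),
        (span, NIF + "referenceContext", context),
        (span, NIF + "beginIndex", str(begin)),
        (span, NIF + "endIndex", str(end)),
        (span, NIF + "anchorOf", text[begin:end]),
    ]


# Hypothetical document URI and sentence.
sentence = "Copenhagen is the capital of Denmark."
start = sentence.index("Denmark")
nif_triples = nif_annotation("http://example.org/doc1", sentence, start, start + len("Denmark"))
```

Because the span identifier is derived from the document URI and the character offsets, any tool that annotates the same text arrives at the same URI, which is what makes annotations from different tools mergeable.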
LIDER Towards a linguistic linked data ecosystem
Website: http://lider-project.eu
Guidelines: http://lider-project.eu/?q=guidelines
LIDER - Deliverable 2.1.2
http://www.lider-project.eu/sites/default/files/D2.1.2-Phase-II.pdf
LIDER Reference Architecture, Deliverable 3.1.2
General:
lemon - developed by
http://www.lider-project.eu/sites/default/files/D3.1.2-v2.0.pdf
Identifier management
- Ideal identifiers are stable, i.e. the meaning behind the URI does not change
- Unrealistic for most use cases
- Easier for individuals, e.g. persons, organisations
- Non-trivial for terminology
Proposals:
1. Apply software development practices, e.g. versioning, update scripts http://vocol.org , http://github.org , http://aligned-project.eu
2. ??
Knowledge Fusion
- Linking is mostly done manually
- Linking 200 datasets pairwise requires maintenance of roughly 40,000 mappings (200 × 199 = 39,800 directed pairs)
- Adding datasets one after the other makes the result depend on the merge order
- Ideally we would be able to structure all datasets into clusters before linking
Proposals:
1. Under discussion with: Erhard Rahm - The Case for Holistic Data Integration ADBIS 2016 Keynote: http://adbis2016.vsb.cz/keynote/ (to appear)
2. Apply software development processes: https://github.com/dbpedia/links
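The mapping count for pairwise linking can be checked with a short calculation. The sketch assumes one mapping per ordered pair of datasets (source to target); if a single mapping serves both directions, the number halves. Function names are illustrative.

```python
def directed_mappings(n: int) -> int:
    """One mapping per ordered pair of distinct datasets."""
    return n * (n - 1)


def undirected_mappings(n: int) -> int:
    """One mapping per unordered pair of distinct datasets."""
    return n * (n - 1) // 2


n_datasets = 200
directed = directed_mappings(n_datasets)      # 39800, i.e. roughly 40000
undirected = undirected_mappings(n_datasets)  # 19900
```

Either way the count grows quadratically with the number of datasets, which is why clustering before linking is attractive.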
The Metadata Challenge
Where to publish metadata for your data?
- Barrier between data and dataset description
- Stale metadata
- Single point of truth missing
- Metadata too heterogeneous
- Download link missing
- No (sufficiently) complete view over the web of data possible, discovery failure
Proposals:
1. build an index: http://linghub.lider-project.eu/ (Clarin, LRE Map, Metashare, Datahub)
2. create a better schema: http://dataid.dbpedia.org and provide benefits for complying
MMoOn
- LIDER
- Lemon
- ODRL
- Olia
- NIF
- Morphology quite complex
- Specific to language and to the linguist
- http://mmoon.org
The Metadata Challenge 2
● RDF structure is too simple to keep additional metadata
○ Scope
○ Validity
○ Confidence
○ Technical metadata, e.g. collection time
Contextualisation is probably already better researched in lexicography than in the Semantic Web.
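The gap can be illustrated with a record that carries a triple plus the four kinds of metadata listed above. This is only a sketch of the modelling problem, not an RDF mechanism: the class, its field names and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AnnotatedStatement:
    """A triple plus per-statement metadata that a bare triple cannot hold."""

    subject: str
    predicate: str
    obj: str
    scope: Optional[str] = None         # where the statement applies
    valid_from: Optional[str] = None    # validity period, ISO 8601 dates
    valid_until: Optional[str] = None
    confidence: Optional[float] = None  # 0.0 to 1.0
    collected_at: Optional[str] = None  # technical metadata: collection time

    def as_triple(self) -> Tuple[str, str, str]:
        """Project to a plain triple; all metadata is lost at this point."""
        return (self.subject, self.predicate, self.obj)


# Hypothetical example values.
stmt = AnnotatedStatement(
    subject="http://example.org/term/42",
    predicate="http://example.org/vocab/preferredLabel",
    obj="data set",
    scope="British English",
    confidence=0.8,
    collected_at="2016-06-01",
)
plain = stmt.as_triple()
```

Everything except the first three fields has to be attached by some additional construct once the statement is expressed as a single triple.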
Data quality and verification
● Data quality can be defined and measured with tools.
● Test-driven Evaluation of Linked Data Quality by Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, and Amrapali J. Zaveri in Proceedings of the 23rd International Conference on World Wide Web http://svn.aksw.org/papers/2014/WWW_Databugger/public.pdf
● Current standard:
○ https://www.w3.org/TR/shacl/
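The idea of test-driven data quality can be sketched with one constraint written as an executable test: every instance of a class must have a given property. This is plain Python over tuples, not SHACL syntax and not the API of any existing tool; the function name and the example.org data are hypothetical.

```python
from typing import Iterable, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def missing_property(
    triples: Iterable[Tuple[str, str, str]], rdf_type: str, required_property: str
) -> List[str]:
    """Return the instances of rdf_type that lack required_property (the violations)."""
    triples = list(triples)
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == rdf_type}
    having = {s for s, p, _o in triples if p == required_property}
    return sorted(instances - having)


# Hypothetical data: two terms, only one of them labelled.
TERM = "http://example.org/vocab/Term"
data = [
    ("http://example.org/a", RDF_TYPE, TERM),
    ("http://example.org/a", RDFS_LABEL, "a"),
    ("http://example.org/b", RDF_TYPE, TERM),
]
violations = missing_property(data, TERM, RDFS_LABEL)
```

A dataset passes the test when the list of violations is empty, so the same check can run on every new release of the data.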
Open licenses in research
[Flowchart. Start; decision "Are you willing to publish your data under an open license?" (Yes / No); decision "Can you make a product out of your data?" (Yes / No); outcomes "Congratulations, your paper has been accepted" and "Good luck, we wish you all the best and a high profit".]
Entity Linking Verification - new translator job profile
● http://www.freme-project.eu/
● Business Case: Integrating semantic enrichment into multilingual content in translation and localisation
● In the future, translators and lexicographers might be asked to judge entity linking and verify data
Should I invest in publishing linked data?
Long-term data strategy, if you:
● Have many expected inbound links
● Have no problem with persistent ids
● Have no problem with long term hosting and curation
-> yes (data value increases)
One time thing:
● Interest of externals only in the yellow zone
-> Publish under open license (let someone else do it)
DBpedia Association
DBpedia+
● Maintain identifier space
● Add open and member data to DBpedia+
● Add data following the LIDER guidelines
● Ability to add your backlinks
DBpedia Community meeting on the 15th of September in Leipzig
Events in 2016
● KEKI 2016 Workshop - Uses of Linguistic Linked Open Data http://keki2016.linguistic-lod.org/ Deadline is 1st of July, but might be extended
● http://2016.semantics.cc