linguistic linked open data, challenges, approaches, future work
Post on 16-Apr-2017
Linguistic Linked Open Data (LLOD)
Challenges, Approaches, Future Work
Sebastian Hellmann, TKE 2016
Sebastian Hellmann - AKSW/KILT Copenhagen TKE 2016
AKSW / KILT in Leipzig
Leipzig has become one of the largest Semantic Web centers.
AKSW has 4 subgroups and 45 PhD students: http://aksw.org/Team.html
Current position:
- Head of the AKSW / KILT research group (8 PhD students)
  - Knowledge Integration and Language Technology (KILT): http://aksw.org/Groups/KILT.html
- Project manager for 2 H2020 projects and 1 German research project (BMWi)
  - http://freme-project.eu/ , http://aligned-project.eu/ , http://smartdataweb.de/
- Executive Director of the DBpedia Association: http://dbpedia.org
Outline
- The vision behind Linked Data - a technological introduction
- Linguistic Linked Open Data
- Knowledge Modelling vs. Data Encoding
- LIDER
- Challenges and Approaches
Linked Data
Web of Data
WWW vs. GGG - https://en.wikipedia.org/wiki/Giant_Global_Graph
Data on the Web vs. the Web of Data vs. the Semantic Web
RDF - Entity Attribute Value - http://dbpedia.org/resource/Copenhagen
Three ways to publish RDF:
1. Linked Data: resource-level access via HTTP request (next slide)
2. SPARQL: query access via triplestore database
3. Dump: dataset-level access via bulk download
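The entity-attribute-value view of RDF and two of the three access paths can be sketched in a few lines of Python. This is a minimal illustration, not part of the talk: the two triples are made up for the example rather than copied from DBpedia, the helper names (`Triple`, `to_ntriples`, `sparql_url`) are hypothetical, and the endpoint `https://dbpedia.org/sparql` is assumed to be DBpedia's public SPARQL endpoint. The script only builds strings and sends no request.

```python
from typing import Iterable, NamedTuple
from urllib.parse import urlencode


class Triple(NamedTuple):
    entity: str     # subject, written as <URI>
    attribute: str  # predicate, written as <URI>
    value: str      # object, written as <URI> or as a quoted literal


def uri(value: str) -> str:
    """Wrap a URI in angle brackets, as N-Triples expects."""
    return f"<{value}>"


def literal(text: str, lang: str = "") -> str:
    """Quote a plain string, optionally with a language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize triples one per line, the typical shape of a dump file."""
    return "\n".join(f"{t.entity} {t.attribute} {t.value} ." for t in triples)


def sparql_url(endpoint: str, query: str) -> str:
    """Build a GET URL that carries the query in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


COPENHAGEN = uri("http://dbpedia.org/resource/Copenhagen")

# Illustrative triples (assumed for this sketch, not copied from DBpedia).
TRIPLES = [
    Triple(COPENHAGEN,
           uri("http://www.w3.org/2000/01/rdf-schema#label"),
           literal("Copenhagen", "en")),
    Triple(COPENHAGEN,
           uri("http://dbpedia.org/ontology/country"),
           uri("http://dbpedia.org/resource/Denmark")),
]

DUMP = to_ntriples(TRIPLES)
QUERY_URL = sparql_url(
    "https://dbpedia.org/sparql",
    f"SELECT ?p ?o WHERE {{ {COPENHAGEN} ?p ?o }} LIMIT 10",
)

print(DUMP)
print(QUERY_URL)
```

The dump string corresponds to option 3 and the query URL to option 2; option 1 is an HTTP lookup of the resource URI itself.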
Linked Data
Four rules of https://www.w3.org/DesignIssues/LinkedData
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that they can discover more things.
https://en.wikipedia.org/wiki/Copenhagen vs. http://dbpedia.org/resource/Copenhagen
Source: https://www.w3.org/DesignIssues/LinkedData.html
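Rules 2 and 3 amount to an HTTP lookup in which the client states which representation it wants. The sketch below, using only Python's standard library, builds such a request for the DBpedia URI above. The function names are hypothetical, and the assumption that the server answers `Accept: text/turtle` with Turtle is mine, not a statement from the slides. Nothing is sent unless `fetch` is called.

```python
import urllib.request


def build_lookup(uri: str, media_type: str = "text/turtle") -> urllib.request.Request:
    """Prepare a GET request that asks for an RDF serialization of the resource."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def fetch(uri: str, timeout: float = 10.0) -> str:
    """Dereference the URI (redirects are followed) and return the response body."""
    with urllib.request.urlopen(build_lookup(uri), timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_lookup("http://dbpedia.org/resource/Copenhagen")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling `fetch("http://dbpedia.org/resource/Copenhagen")` would perform the actual lookup; a browser asking for HTML at the same URI would normally be sent to a human-readable page instead.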
Open Data != Open Data
Open Access vs. Open License
Open Access means accessible like a web page (often with an unclear license).
http://opendefinition.org by OKFN:
Knowledge is open if anyone is free to access, use, modify, and share it subject, at most, to measures that preserve provenance and openness.
http://lod-cloud.net/
How is the Linked Data Cloud built?
- Open Access as the basis
- 50 links between things required to receive a dataset link
- http://lov.okfn.org
- http://datahub.io
- Assessing Quantity and Quality of Links Between Linked Data Datasets by Ciro Baron Neto, Dimitris Kontokostas, Sebastian Hellmann, Kay Müller, and Martin Brümmer, in LDOW 2016: http://events.linkeddata.org/ldow2016/papers/LDOW2016_paper_09.pdf
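The 50-link criterion can be illustrated with a small counting sketch. It is a simplified reading of the rule, not the procedure used for the LOD cloud diagram and not the method of the LDOW 2016 paper: a "link" is taken to be any triple whose subject lies in the source namespace and whose object lies in a target namespace. All namespaces except DBpedia's, and all function names, are hypothetical.

```python
from collections import Counter

MIN_LINKS = 50  # threshold named on the slide


def count_links(triples, source_namespace, target_namespaces):
    """Count triples that point from the source dataset into each target dataset."""
    counts = Counter()
    for subject, _predicate, obj in triples:
        if not subject.startswith(source_namespace):
            continue
        for name, namespace in target_namespaces.items():
            if obj.startswith(namespace):
                counts[name] += 1
    return counts


def qualifying_targets(counts, minimum=MIN_LINKS):
    """Return the targets that receive at least `minimum` links."""
    return sorted(name for name, n in counts.items() if n >= minimum)


SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
SOURCE = "http://example.org/lexicon/"          # hypothetical dataset
TARGETS = {
    "dbpedia": "http://dbpedia.org/resource/",
    "other": "http://example.net/id/",          # hypothetical dataset
}

# Toy data: 60 links into DBpedia, 3 links into the other dataset.
toy = [(f"{SOURCE}entry{i}", SAME_AS, f"http://dbpedia.org/resource/Thing{i}")
       for i in range(60)]
toy += [(f"{SOURCE}entry{i}", SAME_AS, f"http://example.net/id/{i}")
        for i in range(3)]

counts = count_links(toy, SOURCE, TARGETS)
print(dict(counts))
print(qualifying_targets(counts))
```

With this toy data only the DBpedia edge passes the threshold, so only that dataset link would be drawn.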
Linguistic Linked Open Data
- Movement originated in the context of the Working Group for Open Data in Linguistics (OWLG) at the Open Knowledge Foundation (OKFN)
- "Open" is supposed to mean open license
- Join the community mailing list at http://linguistics.okfn.org/
- Current information at http://linguistic-lod.org/ , maintained by John McCrae -> instructions on how to join the LLOD cloud
January 2011
February 2012
Linked Data in Linguistics. Representing Language Data and Metadata (http://www.springer.com/computer/ai/book/978-3-642-28248-5 ) Christian Chiarcos, Sebastian Nordhoff, and Sebastian Hellmann (Eds.). Springer, Heidelberg, (2012)
August 2012
Sept 2012 MLODE
Special Issue on Multilingual Linked Open Data (MLOD). Editors: Sebastian Hellmann, Steven Moran, Martin Brümmer, and John McCrae. Semantic Web, vol. 6, no. 4, pp. 315-317, 2015
Jan 2013
Sep 2013
LIDER FP7 EU Project. Start: Nov 2013. Duration: 2 years. http://lider-project.eu/
May 2014
Nov 2014
May 2015
May 2016
Should we all use Linked Data?
When should we use linked data?
How should we use linked data?
When should we not use it?
Knowledge Modeling vs. Data Encoding
Entity Relationship Diagrams and UML
The Metadata Ecosystem of the DataId Ontology, Markus Freudenberg, submitted to MTSR Conf 2016
http://dataid.dbpedia.org
XML encoding variants
should be symmetric, reflexive and transitive https://en.wikipedia.org/wiki/Equivalence_relation
Apples and oranges
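The three properties named above can be checked mechanically for any finite relation. The following sketch is a generic illustration of that check, not something shown in the talk; the variant names are hypothetical placeholders for different XML encodings of the same data.

```python
def is_equivalence(elements, pairs):
    """Return True if `pairs` is reflexive, symmetric and transitive on `elements`."""
    relation = set(pairs)
    reflexive = all((x, x) in relation for x in elements)
    symmetric = all((y, x) in relation for x, y in relation)
    transitive = all(
        (x, w) in relation
        for x, y in relation
        for z, w in relation
        if y == z
    )
    return reflexive and symmetric and transitive


variants = ["variant_a", "variant_b", "variant_c"]  # hypothetical encodings

# Every variant is related to every other one, including itself.
full = [(x, y) for x in variants for y in variants]

# "a maps to b" and "b maps to c" are declared, but nothing in the reverse direction.
one_way = [(x, x) for x in variants] + [
    ("variant_a", "variant_b"),
    ("variant_b", "variant_c"),
]

print(is_equivalence(variants, full))
print(is_equivalence(variants, one_way))
```

The second relation fails because the mappings are neither symmetric nor closed under transitivity, which is the situation in which comparing two encodings becomes a comparison of apples and oranges.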
Who can you ask what XML tags and structure mean and what they are used for?
Internationalization Tag Set (ITS) 2.0
http://www.w3.org/TR/its20/
- W3C Recommendation since 29 October 2013
- Defines how to embed Machine Translation and Localisation annotations, so-called Data Categories, in (X)HTML and XML
- In addition to the human-readable document, two ontologies are referenced that capture the semantics of the standard
- ITS Ontology as companion
- NLP Interchange Format (NIF) is the recommended format for RDF conversion of ITS 2.0: http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core
One of the most efficient and robust ways to annotate HTML in a standardized manner
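To make the idea of ITS annotations in HTML concrete, the sketch below reads them back out of a fragment with Python's standard `html.parser`. The sample markup is invented for this illustration and is not the example from the slide. It assumes the HTML5 serialization of ITS 2.0, in which local annotations appear as `its-*` attributes (here `its-ta-ident-ref` from the Text Analysis data category) alongside the native `translate` attribute; the class name is hypothetical.

```python
from html.parser import HTMLParser


class ItsAttributeCollector(HTMLParser):
    """Collect ITS-related attributes from start tags."""

    def __init__(self):
        super().__init__()
        self.annotations = []  # list of (tag, {attribute: value})

    def handle_starttag(self, tag, attrs):
        its = {
            name: value
            for name, value in attrs
            if name.startswith("its-") or name == "translate"
        }
        if its:
            self.annotations.append((tag, its))


# Invented sample: one entity annotation and one "do not translate" flag.
SAMPLE = (
    '<p>Welcome to '
    '<span its-ta-ident-ref="http://dbpedia.org/resource/Copenhagen">Copenhagen</span>, '
    'home of <span translate="no">TKE</span>.</p>'
)

collector = ItsAttributeCollector()
collector.feed(SAMPLE)
for tag, attributes in collector.annotations:
    print(tag, attributes)
```

Because the annotations are ordinary attributes, the page still renders normally in a browser while tools can pick up the extra information.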
NLP Interchange Format 2.0 (old example)
NIF 2.1 release pending
Join W3C Community Group: https://www.w3.org/community/ld4lt/
NIF is useful for:
- Adding semantics to NLP tool output and corpora
- Providing and publishing identifiers for text and annotations
NIF is compact and scalable (cf. http://wiki-link.nlp2rdf.org/ ):
- Google Wikilinks Corpus with 10.6 million webpages and 31.5 million Wikipedia links (about 3 per page), with a zipped size of 180 GB
- 533 million triples (other formats 7-27% more)
- 79 GB (12 GB gzipped dumps) in Turtle format (original size 180 GB containing HTML markup)
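The point about identifiers for text and annotations can be sketched as follows. The code writes a small Turtle document in the style of NIF 2.0, where a string is addressed by a character-offset fragment (`#char=begin,end`) on a document URI. It is a hedged illustration: the document URI and function names are hypothetical, the sample sentence is invented, and only NIF core terms I believe exist are used (`nif:Context`, `nif:String`, `nif:RFC5147String`, `nif:isString`, `nif:anchorOf`, `nif:referenceContext`, `nif:beginIndex`, `nif:endIndex`).

```python
PREFIXES = (
    "@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def quote(text: str) -> str:
    """Write a Python string as a Turtle string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def char_uri(base: str, begin: int, end: int) -> str:
    """Offset-based identifier for the substring text[begin:end]."""
    return f"{base}#char={begin},{end}"


def index(value: int) -> str:
    return f'"{value}"^^xsd:nonNegativeInteger'


def context_turtle(base: str, text: str) -> str:
    """Describe the whole text as a nif:Context."""
    return (
        f"<{char_uri(base, 0, len(text))}> a nif:Context, nif:RFC5147String ;\n"
        f"    nif:isString {quote(text)} ;\n"
        f"    nif:beginIndex {index(0)} ;\n"
        f"    nif:endIndex {index(len(text))} .\n"
    )


def mention_turtle(base: str, text: str, surface: str) -> str:
    """Describe the first occurrence of `surface` as a string inside the context."""
    begin = text.index(surface)
    end = begin + len(surface)
    return (
        f"<{char_uri(base, begin, end)}> a nif:String, nif:RFC5147String ;\n"
        f"    nif:referenceContext <{char_uri(base, 0, len(text))}> ;\n"
        f"    nif:anchorOf {quote(surface)} ;\n"
        f"    nif:beginIndex {index(begin)} ;\n"
        f"    nif:endIndex {index(end)} .\n"
    )


BASE = "http://example.org/doc1"  # hypothetical document URI
TEXT = "Copenhagen is the capital of Denmark."

document = (
    PREFIXES + "\n"
    + context_turtle(BASE, TEXT) + "\n"
    + mention_turtle(BASE, TEXT, "Denmark")
)
print(document)
```

Every annotation tool that agrees on the document URI and the offsets produces the same identifier for the same stretch of text, which is what allows annotations from different tools to be merged.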
LIDER: Towards a linguistic linked data ecosystem
Website: http://lider-project.eu
Guidelines: http://lider-project.eu/?q=guidelines