Linguistic Linked Open Data, Challenges, Approaches, Future Work

Post on 16-Apr-2017

TRANSCRIPT

  • Linguistic Linked Open Data (LLOD)

    Challenges, Approaches, Future Work

    Sebastian Hellmann, TKE 2016

  • Sebastian Hellmann - AKSW/KILT Copenhagen TKE 2016

    AKSW / KILT in Leipzig

    Leipzig has become one of the largest Semantic Web centers.

    AKSW has 4 subgroups and 45 PhD students: http://aksw.org/Team.html

    Current position:

    - Head of the AKSW / KILT research group (8 PhD students)
      - Knowledge Integration and Language Technology (KILT): http://aksw.org/Groups/KILT.html
    - Project manager for 2 H2020 projects and 1 German research project (BMWi)
      - http://freme-project.eu/ , http://aligned-project.eu/ , http://smartdataweb.de/
    - Executive Director of the DBpedia Association: http://dbpedia.org

  • Outline

    - The vision behind Linked Data: a technological introduction
    - Linguistic Linked Open Data
    - Knowledge Modeling vs. Data Encoding
    - LIDER Challenges and Approaches

  • Linked Data

  • Web of Data

    WWW vs. GGG - https://en.wikipedia.org/wiki/Giant_Global_Graph

    Data on the Web vs. the Web of Data vs. the Semantic Web

    RDF - Entity Attribute Value - http://dbpedia.org/resource/Copenhagen

    Three ways to publish RDF:

    1. Linked Data: resource-level access via HTTP request (next slide)
    2. SPARQL: query access via triplestore database
    3. Dump: dataset-level access via bulk download
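
    The entity-attribute-value view of RDF and the first two access paths can be sketched roughly as follows. This is an illustration under assumptions, not part of the talk: the property and value in the example triple are chosen for illustration, the endpoint URL http://dbpedia.org/sparql and the "text/turtle" media type are assumed here, and nothing is actually sent over the network. The third path (dump) is a plain file download and is left out.

```python
from urllib.parse import urlencode
from urllib.request import Request

RESOURCE = "http://dbpedia.org/resource/Copenhagen"

# One RDF statement as an (entity, attribute, value) tuple.
# The attribute and value URIs are illustrative assumptions.
triple = (
    RESOURCE,
    "http://dbpedia.org/ontology/country",
    "http://dbpedia.org/resource/Denmark",
)


def linked_data_request(uri: str) -> Request:
    """1. Linked Data: request one resource, asking for an RDF serialization."""
    return Request(uri, headers={"Accept": "text/turtle"})


def sparql_request_url(endpoint: str, query: str) -> str:
    """2. SPARQL: pass the query to a triplestore endpoint as a GET parameter."""
    return endpoint + "?" + urlencode({"query": query})


query = f"SELECT ?p ?o WHERE {{ <{RESOURCE}> ?p ?o }} LIMIT 10"
request = linked_data_request(RESOURCE)        # built, not sent
sparql_url = sparql_request_url("http://dbpedia.org/sparql", query)

print(request.get_header("Accept"))
print(sparql_url)
```

    Passing `request` to `urllib.request.urlopen` would perform the actual lookup; whether the server answers with Turtle depends on its content negotiation.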

  • Linked Data

    Four rules of https://www.w3.org/DesignIssues/LinkedData

    1. Use URIs as names for things.
    2. Use HTTP URIs so that people can look up those names.
    3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
    4. Include links to other URIs, so that they can discover more things.

    https://en.wikipedia.org/wiki/Copenhagen vs. http://dbpedia.org/resource/Copenhagen

    Source: https://www.w3.org/DesignIssues/LinkedData.html
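
    A toy sketch of rules 2 to 4: looking up a URI returns statements, and links inside those statements lead to further URIs. The dictionary below is a made-up stand-in for the web; the http://example.org/vocab/ properties and all statements are assumptions for illustration, not real DBpedia data.

```python
# A tiny in-memory stand-in for the web: URI -> list of (property, value) pairs.
# All statements are invented for illustration.
FAKE_WEB = {
    "http://dbpedia.org/resource/Copenhagen": [
        ("http://example.org/vocab/country", "http://dbpedia.org/resource/Denmark"),
        ("http://example.org/vocab/label", "Copenhagen"),
    ],
    "http://dbpedia.org/resource/Denmark": [
        ("http://example.org/vocab/capital", "http://dbpedia.org/resource/Copenhagen"),
    ],
}


def look_up(uri):
    """Rules 2 and 3: looking up an HTTP URI yields useful statements."""
    return FAKE_WEB.get(uri, [])


def discover(start, max_hops=2):
    """Rule 4: follow links to other URIs to discover more things."""
    seen, frontier = {start}, [start]
    for _ in range(max_hops):
        next_frontier = []
        for uri in frontier:
            for _prop, value in look_up(uri):
                # Only values that are themselves HTTP URIs can be followed.
                if value.startswith("http://") and value not in seen:
                    seen.add(value)
                    next_frontier.append(value)
        frontier = next_frontier
    return seen


found = discover("http://dbpedia.org/resource/Copenhagen")
print(sorted(found))
```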

  • Open Data != Open Data

    Open Access vs. Open License

    Open Access means accessible like a web page (often with an unclear license).

    http://opendefinition.org by OKFN:

    "Knowledge is open if anyone is free to access, use, modify, and share it, subject, at most, to measures that preserve provenance and openness."

  • http://lod-cloud.net/

  • How is the Linked Data Cloud built?

    - Open Access as the basis
    - 50 links between things required to receive a dataset link
    - http://lov.okfn.org
    - http://datahub.io
    - "Assessing Quantity and Quality of Links Between Linked Data Datasets" by Ciro Baron Neto, Dimitris Kontokostas, Sebastian Hellmann, Kay Müller, and Martin Brümmer, in LDOW 2016: http://events.linkeddata.org/ldow2016/papers/LDOW2016_paper_09.pdf
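
    The 50-link criterion can be illustrated with a small counting sketch. It assumes that a "link" is a triple whose subject and object fall into the namespaces of two different datasets and that "50 links required" means at least 50; the dataset names, namespaces and triples are synthetic. This is not the method of the LDOW paper cited above.

```python
from collections import Counter

LINK_THRESHOLD = 50  # from the slide; interpreted here as "at least 50"


def dataset_of(uri, namespaces):
    """Return the name of the dataset whose namespace prefixes the URI, if any."""
    for name, prefix in namespaces.items():
        if uri.startswith(prefix):
            return name
    return None


def dataset_links(triples, namespaces, threshold=LINK_THRESHOLD):
    """Count cross-dataset links and keep the dataset pairs that reach the threshold."""
    counts = Counter()
    for subject, _predicate, obj in triples:
        source = dataset_of(subject, namespaces)
        target = dataset_of(obj, namespaces)
        if source and target and source != target:
            counts[(source, target)] += 1
    return {pair for pair, n in counts.items() if n >= threshold}


# Hypothetical datasets and synthetic triples, for illustration only.
namespaces = {
    "A": "http://a.example.org/",
    "B": "http://b.example.org/",
    "C": "http://c.example.org/",
}
same_as = "http://www.w3.org/2002/07/owl#sameAs"
triples = [(f"http://a.example.org/{i}", same_as, f"http://b.example.org/{i}") for i in range(50)]
triples += [(f"http://a.example.org/{i}", same_as, f"http://c.example.org/{i}") for i in range(49)]

links = dataset_links(triples, namespaces)
print(links)
```

    With this data only the pair (A, B) reaches the threshold; A to C stops one link short.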

  • Linguistic Linked Open Data

  • Linguistic Linked Open Data

    - Movement originated in the context of the Working Group for Open Data in Linguistics (OWLG) at the Open Knowledge Foundation (OKFN)
    - "Open" is supposed to mean open license
    - Join the community mailing list at http://linguistics.okfn.org/
    - Current information at http://linguistic-lod.org/ , maintained by John McCrae -> instructions on how to join the LLOD cloud

  • January 2011

  • February 2012

    Linked Data in Linguistics. Representing Language Data and Metadata. Christian Chiarcos, Sebastian Nordhoff, and Sebastian Hellmann (Eds.). Springer, Heidelberg (2012). http://www.springer.com/computer/ai/book/978-3-642-28248-5

  • August 2012

  • Sept 2012: MLODE

    Special Issue on Multilingual Linked Open Data (MLOD). Editors: Sebastian Hellmann, Steven Moran, Martin Brümmer, and John McCrae. Semantic Web, vol. 6, no. 4, pp. 315-317, 2015.

  • Jan 2013

  • Sep 2013

    LIDER FP7 EU Project. Start: Nov 2013. Duration: 2 years. http://lider-project.eu/

  • May 2014

  • Nov 2014

  • May 2015

  • May 2016

  • Should we all use Linked Data?

    - When should we use linked data?
    - How should we use linked data?
    - When should we not use it?

  • Knowledge Modeling vs. Data Encoding

  • Entity Relationship Diagrams and UML

    The Metadata Ecosystem of the DataId Ontology, Markus Freudenberg, submitted to MTSR Conf 2016

    http://dataid.dbpedia.org

  • XML encoding variants

    Equivalence should be symmetric, reflexive and transitive: https://en.wikipedia.org/wiki/Equivalence_relation

    Apples and oranges
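
    The three properties can be checked mechanically on a small set. The sketch below is only an illustration: the variant names, the "same content" mapping and the one-directional conversion relation are all hypothetical and do not come from the slides.

```python
from itertools import product


def is_equivalence(elements, related):
    """Check reflexivity, symmetry and transitivity of `related` on `elements`."""
    elements = list(elements)
    reflexive = all(related(a, a) for a in elements)
    symmetric = all(
        related(b, a) for a, b in product(elements, repeat=2) if related(a, b)
    )
    transitive = all(
        related(a, c)
        for a, b, c in product(elements, repeat=3)
        if related(a, b) and related(b, c)
    )
    return reflexive and symmetric and transitive


# Hypothetical encoding variants, named for illustration only.
variants = ["attr-style", "element-style", "mixed-style"]

# Candidate 1: variants are related when they normalize to the same content.
normal_form = {
    "attr-style": "record-1",
    "element-style": "record-1",
    "mixed-style": "record-2",
}


def same_content(a, b):
    return normal_form[a] == normal_form[b]


# Candidate 2: a one-directional "can be converted to" relation.
convertible = {("attr-style", "element-style"), ("element-style", "mixed-style")}


def can_convert(a, b):
    return (a, b) in convertible


print(is_equivalence(variants, same_content))
print(is_equivalence(variants, can_convert))
```

    The first relation passes all three checks; the second fails already on reflexivity, which is the kind of mismatch the "apples and oranges" remark points at.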

  • Who can you ask what XML tags and structure mean and what they are used for?

  • Internationalization Tag Set (ITS) 2.0

    http://www.w3.org/TR/its20/

    - W3C Recommendation since 29 October 2013
    - Defines how to embed Machine Translation and Localisation annotations, so-called Data Categories, in (X)HTML and XML
    - In addition to the human-readable document, two ontologies are referenced that capture the semantics of the standard
    - ITS Ontology as companion
    - NLP Interchange Format (NIF) is the recommended format for RDF conversion of ITS 2.0: http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core

    One of the most efficient and robust ways to annotate HTML in a standardized manner
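
    To make the idea of embedded Data Categories concrete, here is a small sketch that reads two kinds of annotation from an HTML fragment: `its-ta-ident-ref` (the HTML form of the ITS 2.0 Text Analysis category, as far as recalled here) and the HTML5 `translate` attribute. The fragment and the collector class are hypothetical; only the Python standard library is used, and the attribute names should be checked against the specification.

```python
from html.parser import HTMLParser

# Hypothetical HTML fragment with two inline annotations.
SNIPPET = (
    '<p>Welcome to <span its-ta-ident-ref="http://dbpedia.org/resource/Copenhagen">'
    'Copenhagen</span>, home of <span translate="no">TKE</span>.</p>'
)


class ItsCollector(HTMLParser):
    """Collect (text, annotations) pairs for text directly inside annotated elements."""

    def __init__(self):
        super().__init__()
        self._stack = []        # one annotation dict per open element
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        its = {k: v for k, v in attrs if k.startswith("its-") or k == "translate"}
        self._stack.append(its)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Keep only text whose enclosing element carries an annotation.
        if self._stack and self._stack[-1]:
            self.annotations.append((data, self._stack[-1]))


collector = ItsCollector()
collector.feed(SNIPPET)
for text, annotation in collector.annotations:
    print(text, annotation)
```

    The sketch ignores inheritance and global rules, both of which the standard defines; it only shows that the annotations travel inside ordinary markup.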

  • NLP Interchange Format 2.0 (old example)

  • NIF 2.1 release pending

    Join the W3C Community Group: https://www.w3.org/community/ld4lt/

    NIF is useful for:

    - Adding semantics to NLP tool output and corpora
    - Providing and publishing identifiers for text and annotations

    NIF is compact and scalable (cf. http://wiki-link.nlp2rdf.org/ ):

    - Google Wikilinks Corpus with 10.6 million webpages and 31.5 million Wikipedia links (about 3 per page) with a zipped size of 180 GB
    - 533 million triples (other formats 7-27% more)
    - 79 GB (12 GB gzipped dumps) in Turtle format (original size 180 GB containing HTML markup)
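
    How NIF provides identifiers for text and annotations can be sketched with offset-based URIs. The sketch assumes the `#char=begin,end` fragment scheme and the property names `nif:isString`, `nif:referenceContext`, `nif:beginIndex`, `nif:endIndex`, `nif:anchorOf` and `itsrdf:taIdentRef` as recalled from NIF 2.0 and the ITS RDF vocabulary; verify them against the ontologies before relying on them. The document URI and sentence are made up, and triples are plain tuples rather than a real RDF library.

```python
# Prefixes used in the tuples below (the NIF namespace is the one from the slides).
PREFIXES = {
    "nif": "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#",
    "itsrdf": "http://www.w3.org/2005/11/its/rdf#",
}


def nif_uri(document_uri, begin, end):
    """Mint an offset-based identifier for the substring text[begin:end]."""
    return f"{document_uri}#char={begin},{end}"


def annotate(document_uri, text, surface, link):
    """Describe the whole text and one linked mention as a list of triples."""
    begin = text.index(surface)
    end = begin + len(surface)
    context = nif_uri(document_uri, 0, len(text))
    mention = nif_uri(document_uri, begin, end)
    return [
        (context, "rdf:type", "nif:Context"),
        (context, "nif:isString", text),
        (mention, "nif:referenceContext", context),
        (mention, "nif:beginIndex", begin),
        (mention, "nif:endIndex", end),
        (mention, "nif:anchorOf", surface),
        (mention, "itsrdf:taIdentRef", link),
    ]


# Hypothetical document and annotation.
text = "TKE 2016 takes place in Copenhagen."
triples = annotate(
    "http://example.org/doc1",
    text,
    "Copenhagen",
    "http://dbpedia.org/resource/Copenhagen",
)
for triple in triples:
    print(triple)
```

    Because the identifier is derived from the document URI and the character offsets, any tool that sees the same text can mint the same URI for the same span without coordination.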

  • LIDER: Towards a linguistic linked data ecosystem

    Website: http://lider-project.eu
    Guidelines: http://lider-project.eu/?q=guidelines
