Building the Multilingual Web of Data: A Hands-on tutorial (ISWC 2014, Riva del Garda - Italy)

Multilingual Word Sense Disambiguation and Entity Linking on the Web based on BabelNet

Roberto Navigli, Tiziano Flati – Sapienza University of Rome

20/10/2014



The instructors

• Roberto Navigli, associate professor, Department of Computer Science, Sapienza University of Rome

• Tiziano Flati, PhD student, Department of Computer Science, Sapienza University of Rome


And, if you resist until the end…

you will…

…receive a prize!!!

A BabelNet t-shirt!!!

[model is not included]


Part 1: Identifying multilingual concepts and entities in text


The driving force

• Web content is available in many languages

• Information should be extracted and processed independently of the source/target language

• This could be done automatically by means of high-performance multilingual text understanding


Word Sense Disambiguation and Entity Linking

«Thomas and Mario are strikers playing in Munich»

Entity Linking: the task of discovering mentions of entities within a text and linking them to a knowledge base.

WSD: the task of assigning meanings to word occurrences within a text.


The general problem

POLYSEMY

• Natural language is ambiguous

• The most frequent words have several meanings!

• Our job: model meaning from a computational perspective


Monosemous vs. Polysemous words

• Monosemous words: only one meaning
– Examples: plant life, internet

• Polysemous words: more than one meaning
– Example: bar
– “a room or establishment where alcoholic drinks are served”
– “a counter where you can obtain food or drink”
– “a rigid piece of metal or wood”
– “musical notation for a repeating pattern of musical beats”


But how do we represent and encode semantics?

• Thesauri
– Group words according to similar meaning
– Relations between groups (e.g., narrower meanings)
– Roget’s Thesaurus (1911)

• Machine Readable Dictionaries
– Enumerate all meanings of a word
– Include definitions, morphology, example usages, etc.
– Oxford Dictionary of English, LDOCE, Collins, etc.

• Computational Lexicons
– Repositories of structured knowledge about a word’s semantics and syntax
– Include relations like hypernymy, meronymy, or entailment
– WordNet


What if we choose BabelNet as our sense inventory?


BabelNet 2.5 @ http://babelnet.org


BabelNet [Navigli and Ponzetto, AIJ 2012]

A wide-coverage multilingual semantic network including both encyclopedic (from Wikipedia) and lexicographic (from WordNet) entries

[Figure labels: Concepts from WordNet · NEs and specialized concepts from Wikipedia · Concepts integrated from both resources]


Anatomy of BabelNet 2.5

50 languages covered (including Latin!)

List of languages at http://babelnet.org/stats


9.3M Babel synsets (concepts and named entities)

67M word senses

262M semantic relations (28 edges per synset on avg.)

7.7M synset-associated images

21M textual definitions
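The average degree quoted above can be checked from the two totals on the slide. A quick sketch, using only the rounded figures given here (the variable names are illustrative):

```python
# Rounded totals as quoted on the slide for BabelNet 2.5.
synsets = 9.3e6      # Babel synsets (concepts and named entities)
relations = 262e6    # semantic relations

# Average number of semantic relations (edges) per synset.
avg_edges = relations / synsets
print(f"about {avg_edges:.0f} edges per synset on average")
```

Because both inputs are rounded, the result only confirms the order of magnitude of the "28 edges per synset" figure.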


New 2.5 version out!

• Seamless integration of:
– WordNet 3.0
– Wikipedia
– Wikidata
– Wiktionary
– OmegaWiki
– Open Multilingual WordNet [Bond and Foster, 2013]

• Translations for all open-class parts of speech

• 1.1B RDF triples available via SPARQL endpoint


WordNet + Open Multilingual WordNet + Wikipedia + …


+ OmegaWiki + automatic translations…


+ textual definitions


More definitions + Wikipedia categories + …


+ images


BabelNet 1.1.1 → 2.0 → 2.5

1. From six to 50 languages;

2. From two resources to six;

3. From 5 million to 9.3 million synsets;

4. From 50 million to 68 million word senses;

5. From 140 million to 262 million semantic relations


BabelNet 3.0 available from November 2014!!!



New Babelfy interface available from November 2014!!!


Part 2: BabelNet-lemon: BabelNet as multilingual linked data!


babelnet.org#lemonRepresentation

- The RDF resource consists of a set of Lexicons, one per language.
- Lexicons gather Lexical Entries, which comprise the forms of an entry; in our case: words of the Babel lexicon.
- Lexical Forms encode the surface realisation(s) of Lexical Entries; in our case: lemmas of Babel words.
- Lexical Senses represent the usage of a word as reference to a specific concept; in our case: Babel senses.
- SKOS Concepts represent ‘units of thought’; in our case: Babel synsets.

[Figure labels: Babel words · lemmas · Babel senses · Babel synsets · SKOS concepts]


babelnet.org#interlinking

• Links towards encyclopedic datasets:
– Wikipedia pages (36M): bn-lemon:wikipediaPage on lemon:LexicalSense
– Wikipedia categories (46M): bn-lemon:wikipediaCategory on skos:Concept
– DBpedia pages (4M): skos:exactMatch on skos:Concept
– DBpedia categories (15M): bn-lemon:dbpediaCategory on skos:Concept

• Links towards lexical datasets:
– lemon-WordNet (117k): skos:exactMatch on skos:Concept
– lemon-Uby, OmegaWiki (en) (15k): skos:exactMatch on skos:Concept


babelnet.org#publicationOnTheWeb

• RDF dumps
– Format: N-Triples
– We produced dumps for both URIs and IRIs

• SPARQL endpoint
– Virtuoso Universal Server
– http://babelnet.org:8084/sparql/

• Dereferencing
– babelnet.org/2.0/
– Pubby Linked Data Frontend (http://wifo5-03.informatik.uni-mannheim.de/pubby/)
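A SPARQL endpoint can also be queried programmatically: under the SPARQL 1.1 Protocol, a query may be sent as an HTTP GET with the query text in the `query` parameter. The sketch below only builds such a URL for the endpoint named above and sends nothing. The helper name `sparql_get_url` is hypothetical, and the endpoint may no longer be reachable at this address.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint URL as given on the slide.
ENDPOINT = "http://babelnet.org:8084/sparql/"

def sparql_get_url(query, endpoint=ENDPOINT):
    """Build a GET URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})

url = sparql_get_url("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
print(url)

# Round trip: decoding the URL gives back the original query text.
print(parse_qs(urlsplit(url).query)["query"][0])
```

The result format (for example JSON or XML) would normally be requested through the HTTP `Accept` header when the URL is fetched.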


Hands-on Session: BabelNet


Go to:

http://babelnet.org:8084/sparql/


Fuseki

1. Open directory babelnet-lemon

2. Launch launch_fuseki.sh or launch_fuseki.bat

3. Open http://localhost:3030/

4. Click on Control Panel

5. Click on Select

6. Enter your SPARQL queries in the SPARQL query text area

7. [List of possible words in file freqs.top10K.words.txt]


Some useful prefixes…

• PREFIX bn: <http://babelnet.org/2.0/>

• PREFIX bn-lemon: <http://babelnet.org/model/babelnet#>

• PREFIX lemon: <http://www.lemon-model.net/lemon#>

• PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

• PREFIX lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#>

• PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

• PREFIX dc: <http://purl.org/dc/elements/1.1/>

• PREFIX dcterms: <http://purl.org/dc/terms/>
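When queries are assembled in a script rather than typed into the web form, the prefix declarations above can be kept in one place and prepended to each query body. A minimal sketch: the IRIs are copied from the list above, while the helper name `with_prefixes` is hypothetical.

```python
# Prefix declarations copied from the slide.
PREFIXES = {
    "bn": "http://babelnet.org/2.0/",
    "bn-lemon": "http://babelnet.org/model/babelnet#",
    "lemon": "http://www.lemon-model.net/lemon#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "lexinfo": "http://www.lexinfo.net/ontology/2.0/lexinfo#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

def with_prefixes(body):
    """Prepend every PREFIX declaration to a query body (unused ones are harmless)."""
    header = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in PREFIXES.items())
    return header + "\n" + body

query = with_prefixes("SELECT ?s WHERE { ?s a skos:Concept } LIMIT 5")
print(query)
```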


Exercise 1: Retrieve the senses of a given lemma

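• Given a word, e.g. home, retrieve all its senses and corresponding synsets in all supported languages:

PREFIX lemon: <http://www.lemon-model.net/lemon#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?sense ?synset WHERE {
  ?entries a lemon:LexicalEntry .
  ?entries lemon:sense ?sense .        # senses of the entry
  ?sense lemon:reference ?synset .     # synset each sense refers to
  ?entries rdfs:label ?term .
  FILTER (str(?term)="home")           # compare the label text, ignoring any language tag
} LIMIT 10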


Exercise 2: Retrieve the senses of a lemma for a certain language

• We can restrict to a given language, e.g. English:

PREFIX lemon: <http://www.lemon-model.net/lemon#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?sense ?synset WHERE {
  ?entries a lemon:LexicalEntry .
  ?entries lemon:language "EN" .       # keep only English entries
  ?entries lemon:sense ?sense .
  ?sense lemon:reference ?synset .
  ?entries rdfs:label ?term .
  FILTER (str(?term)="home")
} LIMIT 10
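Exercises 1 and 2 differ only in the optional language restriction, so both can be generated from one template. The sketch below assumes the hypothetical helper name `senses_query`, and it assumes that language codes are plain alphabetic strings such as "EN". The `lemon:` and `rdfs:` prefix declarations still have to be prepended before the query is sent.

```python
def senses_query(lemma, language=None, limit=10):
    """Return the Exercise 1 query for `lemma`, or the Exercise 2 query if `language` is given."""
    # Escape backslashes and double quotes so the lemma stays a valid string literal.
    literal = lemma.replace("\\", "\\\\").replace('"', '\\"')
    if language is not None and not language.isalpha():
        raise ValueError("language code is expected to be alphabetic, e.g. 'EN'")
    language_pattern = f'  ?entries lemon:language "{language}" .\n' if language else ""
    return (
        "SELECT DISTINCT ?sense ?synset WHERE {\n"
        "  ?entries a lemon:LexicalEntry .\n"
        f"{language_pattern}"
        "  ?entries lemon:sense ?sense .\n"
        "  ?sense lemon:reference ?synset .\n"
        "  ?entries rdfs:label ?term .\n"
        f'  FILTER (str(?term)="{literal}")\n'
        f"}} LIMIT {int(limit)}"
    )

print(senses_query("home"))          # all languages, as in Exercise 1
print(senses_query("home", "EN"))    # English only, as in Exercise 2
```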


Exercise 3: Retrieve the translations of a given sense

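• For instance, given the sense http://babelnet.org/2.0/home_EN/s00044488n

PREFIX lemon: <http://www.lemon-model.net/lemon#>
PREFIX lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#>

SELECT ?translation WHERE {
  ?entry a lemon:LexicalSense .
  ?entry lexinfo:translation ?translation .    # translations of this sense
  FILTER (str(?entry)="http://babelnet.org/2.0/home_EN/s00044488n")
}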


Exercise 4: Retrieve license information about a sense

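• For instance, given the sense http://babelnet.org/2.0/home_EN/s00044488n

PREFIX lemon: <http://www.lemon-model.net/lemon#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?license WHERE {
  ?entry a lemon:LexicalSense .
  ?entry dcterms:license ?license .    # license attached to this sense
  FILTER (str(?entry)="http://babelnet.org/2.0/home_EN/s00044488n")
}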
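Exercises 3 and 4 share one shape: fix a sense, then read one property of it. An alternative formulation, not the one used on the slides, puts the sense IRI directly in subject position instead of filtering on `str(?entry)`. It should return the same bindings, and engines can typically answer it without scanning every sense. The helper name `sense_property_query` is hypothetical; prefix declarations must still be prepended.

```python
def sense_property_query(sense_iri, prop, var):
    """Build a query that reads property `prop` of the sense identified by `sense_iri`."""
    # Reject characters that may not appear inside a SPARQL IRI reference.
    if any(c in sense_iri for c in '<>"{}|^`\\ '):
        raise ValueError("not a valid IRI: " + sense_iri)
    return (
        f"SELECT ?{var} WHERE {{\n"
        f"  <{sense_iri}> a lemon:LexicalSense ;\n"
        f"      {prop} ?{var} .\n"
        "}"
    )

SENSE = "http://babelnet.org/2.0/home_EN/s00044488n"
print(sense_property_query(SENSE, "lexinfo:translation", "translation"))  # cf. Exercise 3
print(sense_property_query(SENSE, "dcterms:license", "license"))          # cf. Exercise 4
```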


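Exercise 5: Retrieve textual definitions in all languages

• For instance, given the synset http://babelnet.org/2.0/s00000356n

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX bn-lemon: <http://babelnet.org/model/babelnet#>
PREFIX lemon: <http://www.lemon-model.net/lemon#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT DISTINCT ?language ?gloss ?license ?sourceurl WHERE {
  ?url a skos:Concept .
  ?url bn-lemon:synsetID ?synsetID .
  OPTIONAL {                                   # keep the synset even if it has no definition
    ?url bn-lemon:definition ?definition .
    ?definition lemon:language ?language .
    ?definition bn-lemon:gloss ?gloss .
    ?definition dcterms:license ?license .
    ?definition dc:source ?sourceurl .
  }
  FILTER (str(?url)="http://babelnet.org/2.0/s00000356n")
}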


Exercise 6: Retrieve a synset’s hypernyms

• For instance, given the synset http://babelnet.org/2.0/s00000356n

PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT ?broader WHERE {
  ?entry a skos:Concept .
  OPTIONAL { ?entry skos:broader ?broader }    # hypernyms are encoded as skos:broader
  FILTER (str(?entry)="http://babelnet.org/2.0/s00000356n")
}


Part 2: WSD and Entity Linking Together!


Back to the original problem


Word Sense Disambiguation in a Nutshell

strikers (target word) + “Thomas and Mario are strikers playing in Munich” (context) → WSD system (using knowledge) → sense of target word


Entity Linking in a Nutshell

Thomas (target mention) + “Thomas and Mario are strikers playing in Munich” (context) → EL system (using knowledge) → Named Entity


Entity Linking

• EL encompasses a set of similar tasks:
– Named Entity Disambiguation, that is, the task of linking entity mentions in a text to a knowledge base
– Wikification, that is, the automatic annotation of text by linking its relevant fragments to the appropriate Wikipedia articles


The multilingual aspect of disambiguation

• In both tasks, WSD and EL, knowledge-based approaches have been shown to perform well.

• What about multilinguality?

• What kinds of resources are available out there?

Open Multilingual WordNet


A Joint approach to WSD and EL

The main difference between WSD and EL is the kind of inventory used


But BabelNet can be used as a multilingual inventory for both:

1. Concepts
• Calcio in Italian can denote different concepts, e.g. kick, calcium, or soccer

2. Named Entities
• The mention Mario can refer to different things, such as the video game character, a soccer player (Gomez), or even a music album


Calcio/Kick in BabelNet 2.5


Calcio/Calcium in BabelNet 2.5


Calcio/Soccer in BabelNet 2.5


Disambiguation and Entity Linking together!

BabelNet is a huge multilingual inventory for both word senses and named entities!


So what?

http://babelfy.org


Babelfy: A Joint approach to WSD and EL [Moro et al., TACL 2014]

• Based on Personalized PageRank, the state-of-the-art method for graph-based WSD.

• However, Personalized PageRank cannot be run on huge graphs for each new input.

• Idea: precompute semantic signatures for the nodes!

• The semantic signature of a node is the set of nodes most relevant to it in the graph, computed using random walk with restart.
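To make the idea of a semantic signature concrete, here is a minimal sketch of a random walk with restart on a tiny, unweighted toy graph. It is not the published Babelfy implementation: the graph, the restart probability, the number of steps and the `top_k` cut-off are arbitrary illustrative choices, and the function name `semantic_signature` is hypothetical.

```python
import random
from collections import Counter

def semantic_signature(graph, start, restart=0.2, steps=10_000, top_k=3, seed=0):
    """Approximate the nodes most relevant to `start` by a random walk with restart."""
    rng = random.Random(seed)      # fixed seed so the sketch is repeatable
    visits = Counter()
    node = start
    for _ in range(steps):
        neighbours = graph.get(node, [])
        if not neighbours or rng.random() < restart:
            node = start           # restart: jump back to the source node
        else:
            node = rng.choice(neighbours)   # otherwise follow a random edge
        if node != start:
            visits[node] += 1      # count how often each other node is reached
    # The most frequently visited nodes form the signature of `start`.
    return [n for n, _ in visits.most_common(top_k)]

# Toy undirected graph (adjacency lists); purely illustrative.
toy_graph = {
    "striker": ["soccer", "goal"],
    "soccer": ["striker", "goal", "Munich"],
    "goal": ["striker", "soccer"],
    "Munich": ["soccer", "beer"],
    "beer": ["Munich"],
}
signature = semantic_signature(toy_graph, "striker")
print(signature)
```

Since signatures depend only on the graph, they can be computed once per node and reused for every new input text, which is the point of the precomputation step.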

Andrea Moro, Alessandro Raganato, and Roberto Navigli. 2014. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics (TACL), 2.


http://babelfy.org

Demo tomorrow afternoon!


Babelfying ISWC!


Multilingual Word Sense Disambiguation and Entity Linking – COLING 2014 Tutorial
Roberto Navigli and Andrea Moro


Annotating with BabelNet: all in one!

• Annotating with BabelNet implies annotating with WordNet and Wikipedia

• (now also OmegaWiki, Open Multilingual WordNet, Wiktionary and Wikidata!)

Key fact!


Open Problems: grammar-agnostic

• All current approaches exploit:
– POS tagging
– Lemmatization

• How to improve?
– Waiting for better POS taggers
– Character-based analysis of text

Noisy (>90% for English, but much less on morphologically rich languages).


Open Problems: language-agnostic

• All current approaches exploit:
– Knowledge of the input language
– Automatic language recognition

• How to improve?
– Waiting for better language recognition systems
– Unify the lexicalizations of different languages

Noisy (>90% for English, but much less on resource-poor languages). Moreover, text that mixes multiple languages will certainly be analyzed wrongly!


Open Problems: fragment recognition

• Most of the current approaches exploit:
– Named Entity Recognition
– Non-overlapping text assumption

• How to improve?
– Waiting for better NER systems
– Overlap and match everything

Noisy (>80% for English, but much less on resource-poor languages). Moreover, when assuming that entities and word senses should not overlap, you lose information!


Hands-on Session: Babelfy


Exercise 1

• Go to babelfy.org

• Type in or copy/paste your favourite text in your favourite language in the text area

• Select the text language

• Click on «Babelfy!»

• Understand the difference between green and yellow balloons


Part 3: Producing multilingual linked data


Babelfy 2 NIF (hackathon at SEMANTiCS 2014)

Free text → Text annotated with Babelfy → Annotated text in RDF


From the previous talk you should already know everything about NIF, shouldn’t you?

But just in case…


NIF

• The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. NIF consists of specifications, ontologies and software

Page 72

NIF

• Reuse of existing standards (such as RDF, OWL 2, the PROV Ontology, etc.)

• NIF identifiers are used in the Internationalization Tag Set (ITS) Version 2.0

• Royalty-free and published under an open license.

• Driven by its open community project NLP2RDF

• Good uptake by industry, open-source projects and developers.

Page 74

Annotate text with a few lines of code!

• Set the text

• Set the language

• Obtain the annotations

• Convert into RDF

Page 75

Babelfy2Nif: an example

Toy sentence:

“the semantic web is a collaborative movement led by the international standards body world wide web consortium”

Page 76

Babelfy2Nif: an example

“the semantic web is a collaborative movement led by the international standards body world wide web consortium”

• nif:Context defines the overall text

• Text is modelled by fragments

• Fragments are identified by left and right indices

• The BabelNet synset (i.e., the annotation of the fragment)
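The modelling described on this slide can be illustrated with a minimal Python sketch that emits NIF-style N-Triples for the toy sentence. It is not the output of the Babelfy2NIF tool: the document URI and the synset URI are hypothetical placeholders, the choice of nif:Phrase for the fragment is an assumption, and only properties from the NIF 2.0 Core Ontology and the ITS RDF vocabulary are used.

```python
# Sketch only: hand-built N-Triples, no RDF library involved.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"

BASE = "http://example.org/doc1"  # hypothetical document URI
# Hypothetical placeholder, NOT a real BabelNet synset identifier.
PLACEHOLDER_SYNSET = "http://example.org/babelnet/synset-placeholder"

TEXT = ("the semantic web is a collaborative movement led by the "
        "international standards body world wide web consortium")


def char_uri(begin, end):
    """URI of the fragment covering characters [begin, end) of TEXT."""
    return f"{BASE}#char={begin},{end}"


def literal(value, datatype=None):
    """Serialise a value as an N-Triples literal, optionally typed."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    suffix = f"^^<{XSD}{datatype}>" if datatype else ""
    return f'"{escaped}"{suffix}'


def triple(subject, predicate, obj):
    """Serialise one triple; obj must already be in N-Triples syntax."""
    return f"<{subject}> <{predicate}> {obj} ."


def context_triples():
    """nif:Context describes the overall text."""
    ctx = char_uri(0, len(TEXT))
    return [
        triple(ctx, RDF_TYPE, f"<{NIF}Context>"),
        triple(ctx, NIF + "isString", literal(TEXT)),
        triple(ctx, NIF + "beginIndex", literal(0, "nonNegativeInteger")),
        triple(ctx, NIF + "endIndex",
               literal(len(TEXT), "nonNegativeInteger")),
    ]


def fragment_triples(surface, synset_uri):
    """A fragment is identified by its left and right character indices."""
    begin = TEXT.index(surface)
    end = begin + len(surface)
    frag = char_uri(begin, end)
    return [
        triple(frag, RDF_TYPE, f"<{NIF}Phrase>"),
        triple(frag, NIF + "referenceContext",
               f"<{char_uri(0, len(TEXT))}>"),
        triple(frag, NIF + "anchorOf", literal(surface)),
        triple(frag, NIF + "beginIndex", literal(begin, "nonNegativeInteger")),
        triple(frag, NIF + "endIndex", literal(end, "nonNegativeInteger")),
        # The annotation of the fragment: a link to the synset.
        triple(frag, ITSRDF + "taIdentRef", f"<{synset_uri}>"),
    ]


lines = context_triples() + fragment_triples("semantic web", PLACEHOLDER_SYNSET)
print("\n".join(lines))
```

In real output the placeholder would be replaced by the synset that Babelfy assigns to the fragment.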

Page 77

Hands-on Session: Babelfy 2 NIF

Page 78

Overview

1. Open the babelfy directory

2. Open the file babelfy2nif.properties under the config/ directory

– Select your language (language variable)

– Type in your favourite text (text variable)

– Select the algorithm for handling overlapping annotations (algorithm variable)

– Select the output format (rdf_format variable)

– Select the output stream (output_stream variable)

3. Launch “sh run_babelfy2nif-demo.sh”
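As a rough illustration of step 2, the Python sketch below parses a properties-style text and checks the five variables named above before the demo script is launched. The variable names, the two algorithm names and the two format names come from these slides; the sample values EN and STDOUT are assumptions, and the real file may accept other values that this check would flag.

```python
# Hypothetical sample; the actual file shipped with the demo may differ.
SAMPLE = """\
# config/babelfy2nif.properties (illustrative values only)
language=EN
text=The Semantic Web is a collaborative movement.
algorithm=LONGEST_ANNOTATION_GREEDY_ALGORITHM
rdf_format=TURTLE
output_stream=STDOUT
"""

REQUIRED_KEYS = ("language", "text", "algorithm", "rdf_format", "output_stream")

# Only the values mentioned in the tutorial slides are listed here.
NAMED_VALUES = {
    "algorithm": {
        "LONGEST_ANNOTATION_GREEDY_ALGORITHM",
        "FIRST_COME_FIRST_SERVED_ALGORITHM",
    },
    "rdf_format": {"NTRIPLE", "TURTLE"},
}


def parse_properties(content):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in content.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


def check(props):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for key in REQUIRED_KEYS:
        if not props.get(key):
            problems.append(f"missing value for '{key}'")
    for key, allowed in NAMED_VALUES.items():
        value = props.get(key)
        if value and value not in allowed:
            problems.append(f"'{key}' has a value not named in the slides: {value}")
    return problems


props = parse_properties(SAMPLE)
for problem in check(props):
    print("warning:", problem)
print("settings:", props)
```

To check a real file, pass its contents to parse_properties instead of SAMPLE.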

Page 79

Exercise 1

• Take the first paragraph of the English ISWC Wikipedia page: http://en.wikipedia.org/wiki/International_Semantic_Web_Conference

• Put it in the properties file and produce 2 outputs:

1. With the LONGEST_ANNOTATION_GREEDY_ALGORITHM algorithm and the NTRIPLE RDF format

2. With the FIRST_COME_FIRST_SERVED_ALGORITHM algorithm and the TURTLE RDF format
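To see why the two settings can produce different outputs, here is a small Python sketch of two ways to resolve overlapping annotations. It is an interpretation based only on the algorithm names, not the Babelfy2NIF implementation: the function names, the Annotation type and the candidate spans are all hypothetical.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    begin: int   # left character index (inclusive)
    end: int     # right character index (exclusive)
    label: str   # surface text, used here only for readability


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True if the two character spans share at least one character."""
    return a.begin < b.end and b.begin < a.end


def select(candidates: List[Annotation]) -> List[Annotation]:
    """Keep each candidate unless it overlaps one that was already kept."""
    kept: List[Annotation] = []
    for cand in candidates:
        if not any(overlaps(cand, k) for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda a: a.begin)


def first_come_first_served(annotations: List[Annotation]) -> List[Annotation]:
    """Assumed behaviour: earlier annotations in the input order win."""
    return select(list(annotations))


def longest_greedy(annotations: List[Annotation]) -> List[Annotation]:
    """Assumed behaviour: longer spans win; ties go to the leftmost span."""
    by_length = sorted(annotations, key=lambda a: (-(a.end - a.begin), a.begin))
    return select(by_length)


# Hypothetical candidates over the string "world wide web consortium".
CANDIDATES = [
    Annotation(0, 5, "world"),
    Annotation(0, 14, "world wide web"),
    Annotation(0, 25, "world wide web consortium"),
    Annotation(15, 25, "consortium"),
]

print([a.label for a in first_come_first_served(CANDIDATES)])
print([a.label for a in longest_greedy(CANDIDATES)])
```

Under these assumptions the first strategy keeps "world" and "consortium", while the second keeps only the full span "world wide web consortium".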

Page 81

To summarize

We have taken you on a tour of:

• A very large multilingual semantic network: BabelNet

• A state-of-the-art WSD and EL system: Babelfy

Page 82

Acknowledgements

• European Research Council and the EU Commission for funding our research

• Maud Ehrmann and Andrea Moro for their help with slides

Page 84

http://lcl.uniroma1.it

http://babelnet.org

http://babelfy.org

Google group: babelnet-group