Introduction to "Entity Linking meets Word Sense Disambiguation: a Unified Approach" (TACL 2014)


Upload: koji-matsuda

Post on 24-Jan-2015


Category:

Engineering



DESCRIPTION

My presentation of the paper "Entity Linking meets Word Sense Disambiguation: a Unified Approach" (TACL 2014) by Andrea Moro, Alessandro Raganato, and Roberto Navigli (University of Roma)

TRANSCRIPT

Page 1

Entity Linking meets Word Sense Disambiguation: a Unified Approach

Andrea Moro, Alessandro Raganato, Roberto Navigli (University of Roma)

TACL Vol. 2 (5/2014), pp. 231-244

Presenter: Koji Matsuda (Tohoku University)

State-of-the-Art NLP Study Group (最先端 NLP 勉強会) #6 @ The University of Tokyo

Entity Linking meets Word Sense Disambiguation: a Unified Approach 1

Page 2

WSD and Entity Linking Together

[Figure: the input text "Thomas and Mario are strikers playing in Munich. They are …" is interpreted against an Integrated Knowledge Base, which is built from a Lexical Knowledge Base and an Encyclopedic Knowledge Base. Semantic Signatures link the candidate meanings (Thomas Muller, Thomas Millan, Mario Gomez, striker, playing, Munich, FC Bayern Munich) into a Semantic Interpretation Graph. The most suitable meaning is then selected on the graph.]

Page 3

Background: Word Sense Disambiguation (WSD) in a Nutshell

“Thomas and Mario are strikers playing in Munich”

[Figure: a WSD system takes the sentence and a knowledge base (e.g. WordNet) and outputs the sense of each target word.]

Page 4

Background: Entity Linking (EL) in a Nutshell

“Thomas and Mario are strikers playing in Munich”

[Figure: an EL system takes the sentence and a knowledge base (e.g. Wikipedia) and links each named entity to its entry.]

• Mention Detection
• Link Detection
• Entity Disambiguation

Page 5

Page 6

A Joint Approach to WSD and EL

• Knowledge-based approaches perform well on both tasks
  – The main difference is the kind of inventory (knowledge base) used
• If we had a knowledge base containing both concepts and named entities, we could solve both tasks together!

Page 7

BabelNet

• Multilingual encyclopedic dictionary
  – Lexicographic & encyclopedic knowledge
  – Based on the automatic integration of:
    • WordNet, Wikipedia, Wiktionary, …
• 50 languages, 21M definitions, 62M entries

[Figure: BabelNet covers concepts from WordNet, named entities and specialized concepts from Wikipedia, and concepts integrated from both resources.]

Page 8

Babelfy: A Joint Approach to WSD and EL

1. Precompute semantic signatures;
2. Select all possible candidate meanings from BabelNet by matching mentions with BabelNet lexicalizations;
3. Connect all candidate meanings using the semantic signatures (= the Semantic Interpretation Graph);
4. Extract a dense subgraph containing semantically coherent candidates;
5. Select the most connected candidate for each fragment.
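Step 2 can be sketched in Python as a dictionary lookup over token n-grams. This is a minimal sketch built on our own assumptions. `LEXICON` is a hypothetical toy stand-in for BabelNet lexicalizations, and the input tokens are assumed to be lemmatized already. Only exact n-gram matches are considered, so this is not Babelfy's actual matching procedure. Each result is a (candidate meaning, fragment) pair, the same shape as the SI-graph vertices in Step 3.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon standing in for BabelNet lexicalizations:
# lowercased surface form -> IDs of the candidate meanings it can express.
LEXICON: Dict[str, Set[str]] = {
    "thomas": {"Thomas_Muller", "Thomas_Milan"},
    "mario": {"Mario_Gomez", "Mario_Basler", "Mario_Adorf"},
    "striker": {"striker", "forward"},
    "munich": {"Munich", "FC_Bayern_Munich"},
}


def candidate_meanings(tokens: List[str], max_len: int = 3) -> List[Tuple[str, str]]:
    """Return a (meaning, fragment) pair for every lexicon match of a token n-gram."""
    pairs: List[Tuple[str, str]] = []
    for start in range(len(tokens)):
        for n in range(1, max_len + 1):
            if start + n > len(tokens):
                break
            fragment = " ".join(tokens[start:start + n])
            # sorted() only makes the output order deterministic.
            for meaning in sorted(LEXICON.get(fragment.lower(), set())):
                pairs.append((meaning, fragment))
    return pairs


# Lemmatized form of "Thomas and Mario are strikers playing in Munich".
tokens = ["Thomas", "and", "Mario", "be", "striker", "play", "in", "Munich"]
for pair in candidate_meanings(tokens):
    print(pair)
```

On this sentence the lookup yields the nine (meaning, fragment) pairs listed in Step 3 (1). "play" has no lexicon entry here, so it gets no candidates.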


Page 9

Step 1: Compute Semantic Signatures

• Semantic signature: the set of vertices relevant to a given vertex in the semantic network
  – computed with Random Walk with Restart (RWR) over BabelNet

1. Start from the target vertex of the semantic network;
2. Randomly move to a neighbor of the current vertex, or restart from the target vertex;
3. Keep count of how often each vertex is hit;
4. Take the most visited vertices.
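The four RWR steps above can be sketched as a Monte Carlo walk. The assumptions here are ours, not the slides'. The semantic network is a small unweighted adjacency list, and `GRAPH` is a hypothetical toy fragment rather than real BabelNet data. The values of `restart_prob`, `steps` and `top_k` are illustrative, not the paper's settings. The signature is simply the `top_k` most frequently hit vertices other than the target.

```python
import random
from collections import Counter
from typing import Dict, List, Set


def semantic_signature(graph: Dict[str, List[str]], target: str,
                       restart_prob: float = 0.15, steps: int = 100_000,
                       top_k: int = 5, seed: int = 0) -> Set[str]:
    """Approximate semSign(target) with a Random Walk with Restart."""
    rng = random.Random(seed)
    hits: Counter = Counter()
    current = target  # 1. start from the target vertex
    for _ in range(steps):
        neighbours = graph.get(current, [])
        # 2. restart from the target (also forced at a dead end) ...
        if not neighbours or rng.random() < restart_prob:
            current = target
        else:  # ... or move to a random neighbour
            current = rng.choice(neighbours)
        if current != target:
            hits[current] += 1  # 3. count hitting frequencies
    # 4. keep the most visited vertices
    return {vertex for vertex, _ in hits.most_common(top_k)}


# Hypothetical toy fragment of a semantic network (undirected, stored both ways).
GRAPH: Dict[str, List[str]] = {
    "striker": ["offside", "soccer_player", "sport"],
    "offside": ["striker", "sport"],
    "soccer_player": ["striker", "athlete", "sport"],
    "athlete": ["soccer_player", "sport"],
    "sport": ["athlete", "offside", "soccer_player", "striker"],
    "Munich": ["Bavaria"],
    "Bavaria": ["Munich"],
}

print(sorted(semantic_signature(GRAPH, "striker", top_k=4)))
# ['athlete', 'offside', 'soccer_player', 'sport']
```

"Munich" and "Bavaria" cannot be reached from "striker", so they never enter its signature. On a real network, the cut-off (`top_k` here) is what keeps the signature small.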


Page 10

Example of Semantic Signature

[Figure: the vertex "striker" and the vertices in its signature: "offside", "athlete", "sport", "soccer player".]

semSign(“striker”) = { “sport”, “offside”, “soccer player”, “athlete”, … }


Page 12

Step 3: Construct the SI Graph (1)

Create a vertex for every candidate meaning of the text, as a (meaning, fragment) pair (Algorithm 2, lines 6-8)

Example: Thomas and Mario are strikers playing in Munich

(forward, striker)
(striker, striker)
(Munich, Munich)
(FC Bayern Munich, Munich)
(Thomas Milan, Thomas)
(Thomas Muller, Thomas)
(Mario Basler, Mario)
(Mario Gomez, Mario)
(Mario Adorf, Mario)

Page 13

Step 3: Construct the SI Graph (2)

Connect related meanings based on their semantic signatures (Algorithm 2, lines 9-11)

Example: Thomas and Mario are strikers playing in Munich

[Figure: the nine candidate vertices from Step 3 (1), now linked by edges between related meanings.]
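The two SI-graph construction steps can be sketched as follows. The slides only say that related meanings are connected "based on Semantic Signature", so the exact edge condition is our reading. We add a directed edge from (v, f) to (v2, f2) whenever the two fragments differ and v2 belongs to semSign(v). `SEM_SIGN` is a hypothetical, hand-written set of signatures, not real Babelfy output.

```python
from typing import Dict, List, Set, Tuple

Vertex = Tuple[str, str]  # (candidate meaning, text fragment)


def build_si_graph(candidates: List[Vertex],
                   sem_sign: Dict[str, Set[str]]) -> Dict[Vertex, Set[Vertex]]:
    """Directed SI graph as an adjacency map with one vertex per candidate pair."""
    graph: Dict[Vertex, Set[Vertex]] = {c: set() for c in candidates}
    for v, f in candidates:
        for v2, f2 in candidates:
            # Link only interpretations of different fragments, and only when
            # the second meaning is in the first meaning's semantic signature.
            if f != f2 and v2 in sem_sign.get(v, set()):
                graph[(v, f)].add((v2, f2))
    return graph


CANDIDATES: List[Vertex] = [
    ("forward", "striker"), ("striker", "striker"),
    ("Munich", "Munich"), ("FC_Bayern_Munich", "Munich"),
    ("Thomas_Milan", "Thomas"), ("Thomas_Muller", "Thomas"),
    ("Mario_Basler", "Mario"), ("Mario_Gomez", "Mario"), ("Mario_Adorf", "Mario"),
]
# Hypothetical signatures, written by hand for this example.
SEM_SIGN: Dict[str, Set[str]] = {
    "Thomas_Muller": {"FC_Bayern_Munich", "striker", "Mario_Gomez"},
    "Mario_Gomez": {"FC_Bayern_Munich", "striker", "Thomas_Muller"},
    "FC_Bayern_Munich": {"Thomas_Muller", "Mario_Gomez", "Munich"},
    "striker": {"Thomas_Muller", "Mario_Gomez"},
    "Munich": {"FC_Bayern_Munich"},
}

si_graph = build_si_graph(CANDIDATES, SEM_SIGN)
for vertex, targets in si_graph.items():
    print(vertex, "->", sorted(targets))
```

(Munich, Munich) gets no outgoing edge, even though FC_Bayern_Munich is in its signature. Both interpretations belong to the same fragment, so the edge condition excludes them.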


Page 15

Step 4: Densest Subgraph Heuristic

• Reducing the ambiguity of the SI graph is helpful
  – Main idea: the most suitable meaning of each fragment is likely to lie in the densest area of the graph
  – But identifying the densest subgraph is NP-hard
  – Heuristic: iterative removal of low-coherence vertices
• Algorithm 3
  – Iteratively identify the most ambiguous fragment f_max and discard its weakest interpretation
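Algorithm 3, as described on this slide, can be sketched as below. Two simplifications are ours. First, the "weakest interpretation" is measured by plain degree (incoming plus outgoing edges). Second, the loop stops once every fragment has at most `mu` candidates (`mu` is a hypothetical parameter name), and it returns the vertex set with the highest |E|/|V| seen along the way. Ties are broken alphabetically only to make the result reproducible. The toy graph is invented for illustration.

```python
from typing import Dict, Set, Tuple

Vertex = Tuple[str, str]      # (candidate meaning, text fragment)
Edge = Tuple[Vertex, Vertex]  # directed edge of the SI graph


def densest_subgraph(vertices: Set[Vertex], edges: Set[Edge], mu: int = 1) -> Set[Vertex]:
    """Greedy pruning: repeatedly drop the weakest candidate of the most ambiguous fragment.

    mu >= 1 guarantees that every fragment keeps at least one candidate.
    """
    V, E = set(vertices), set(edges)

    def density() -> float:
        return len(E) / len(V) if V else 0.0  # average degree, up to a factor of 2

    def degree(v: Vertex) -> int:
        return sum(1 for a, b in E if v in (a, b))  # in-degree + out-degree

    best_V, best_density = set(V), density()
    while V:
        by_fragment: Dict[str, Set[Vertex]] = {}
        for v in V:
            by_fragment.setdefault(v[1], set()).add(v)
        # Most ambiguous fragment (ties broken by fragment name).
        f_max = max(by_fragment, key=lambda f: (len(by_fragment[f]), f))
        if len(by_fragment[f_max]) <= mu:
            break
        weakest = min(by_fragment[f_max], key=lambda v: (degree(v), v))
        V.discard(weakest)
        E = {(a, b) for a, b in E if weakest not in (a, b)}
        if density() > best_density:
            best_V, best_density = set(V), density()
    return best_V


TM, TMI = ("Thomas_Muller", "Thomas"), ("Thomas_Milan", "Thomas")
MG, MB, MA = ("Mario_Gomez", "Mario"), ("Mario_Basler", "Mario"), ("Mario_Adorf", "Mario")
FCB, MU = ("FC_Bayern_Munich", "Munich"), ("Munich", "Munich")
ST, FW = ("striker", "striker"), ("forward", "striker")

EDGES = {(TM, MG), (MG, TM), (TM, FCB), (FCB, TM), (MG, FCB), (FCB, MG),
         (ST, TM), (ST, MG), (MB, ST), (FW, MB), (TMI, MU)}
kept = densest_subgraph({TM, TMI, MG, MB, MA, FCB, MU, ST, FW}, EDGES)
print(sorted(kept))
```

With `mu=1`, the loop runs until each fragment has a single candidate. A larger `mu` would stop earlier and leave some ambiguity for Step 5 to resolve.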


Page 16

Step 4: Densest Subgraph Heuristic

Iteratively remove the weakest interpretation of the most ambiguous fragment (Algorithm 2, line 12)

Example: Thomas and Mario are strikers playing in Munich

[Figure: the SI graph from Step 3, from which weak candidate vertices are pruned.]


Page 18

Step 5: Select the Most Reliable Meanings

[Figure: the vertices remaining after Step 4: (Thomas Milan, Thomas), (Thomas Muller, Thomas), (Mario Basler, Mario), (Mario Gomez, Mario), (FC Bayern Munich, Munich), (forward, striker), (striker, striker).]

Example: Thomas and Mario are strikers playing in Munich

Select the most suitable meaning for each fragment by its normalized weighted degree (Eq. (2)) (Algorithm 2, lines 14-18)

Page 19

Step 5: Select the Most Reliable Meanings

• Lexical coherence: the fraction of fragments the candidate is related to
• Semantic coherence: a graph centrality measure among the candidate meanings
  – deg(v): the number of incoming and outgoing edges of v

[Equation image: the score, annotated with its lexical coherence and semantic coherence factors]
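The selection step can be sketched as follows. The slides define the two factors only in words, so the formula here is our reading of them, not Eq. (2) verbatim. Lexical coherence is the fraction of the other fragments a candidate is connected to, and semantic coherence is deg(v). Their product is normalized over the candidates of the same fragment, and the candidate with the highest normalized score is selected.

```python
from typing import Dict, Set, Tuple

Vertex = Tuple[str, str]      # (candidate meaning, text fragment)
Edge = Tuple[Vertex, Vertex]  # directed edge of the (pruned) SI graph


def select_meanings(vertices: Set[Vertex], edges: Set[Edge]) -> Dict[str, Tuple[Vertex, float]]:
    """For each fragment, return its best candidate and that candidate's normalized score."""
    fragments = {f for _, f in vertices}

    def degree(v: Vertex) -> int:  # semantic coherence: incoming + outgoing edges
        return sum(1 for a, b in edges if v in (a, b))

    def lexical_coherence(v: Vertex) -> float:
        # Fragments of all neighbours, in either edge direction, except v's own.
        linked = {b[1] for a, b in edges if a == v} | {a[1] for a, b in edges if b == v}
        linked.discard(v[1])
        return len(linked) / (len(fragments) - 1) if len(fragments) > 1 else 0.0

    raw = {v: lexical_coherence(v) * degree(v) for v in vertices}
    result: Dict[str, Tuple[Vertex, float]] = {}
    for f in sorted(fragments):
        cands = [v for v in vertices if v[1] == f]
        total = sum(raw[v] for v in cands)
        # Normalize within the fragment (scores sum to 1 when any candidate is connected).
        scores = {v: raw[v] / total if total else 0.0 for v in cands}
        best = max(cands, key=lambda v: (scores[v], v))
        result[f] = (best, scores[best])
    return result


TM, TMI = ("Thomas_Muller", "Thomas"), ("Thomas_Milan", "Thomas")
MG, MB = ("Mario_Gomez", "Mario"), ("Mario_Basler", "Mario")
choice = select_meanings({TM, TMI, MG, MB}, {(TM, MG), (MG, TM), (MB, TM)})
for fragment, (vertex, score) in choice.items():
    print(fragment, vertex[0], round(score, 3))
# Mario Mario_Gomez 0.667
# Thomas Thomas_Muller 1.0
```

Normalizing does not change which candidate wins within a fragment. It matters only when the score is compared against something else, such as a confidence cut-off.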

Page 20

Experiment

• WSD datasets (nominal mentions only):
  – SemEval-2013 Task 12: Multilingual WSD
    • English, French, German, Italian, Spanish
  – SemEval-2007 Task 7: Coarse-grained WSD
  – SemEval-2007 Task 17: Fine-grained WSD
  – Senseval-3 WSD: Fine-grained WSD
• Entity Linking datasets:
  – KORE50: 50 short sentences, YAGO2
  – AIDA-CoNLL: 1392 articles, YAGO

Page 21

Result: WSD (fine-grained)

[Figure: F1 scores of the proposed method vs. the baseline.]

Page 22

Result: Entity Linking

[Figure: accuracy of the proposed method vs. EL only.]

Page 23

Impact of Components (Word Sense Disambiguation: Sens3, Sem07, Sem07 (coarse), Sem13 (EN, BN); Entity Linking: KORE50, CoNLL)

System             | Sens3 | Sem07 | Sem07 (coarse) | Sem13 (EN, BN) | KORE50 | CoNLL
Babelfy (proposed) | 68.3  | 62.7  | 85.5           | 69.2           | 71.5   | 82.1
+ unif. weight     | 67.0  | 65.2  | 85.7           | 68.5           | 69.4   | 81.7
+ w/o dens. sub.   | 68.3  | 63.3  | 84.9           | 68.7           | 62.5   | 78.1
+ only concepts    | 68.2  | 62.7  | 85.3           | 68.7           | -      | -
+ only NE          | -     | -     | -              | -              | 68.1   | 78.8
+ on sentences     | 66.0  | 65.2  | 82.3           | 67.1           | -      | -

Observations:
• Triangle-based weighting has a relatively small impact (compare "+ unif. weight" with the full system)
• The densest subgraph heuristic is effective on the EL tasks (KORE50: 71.5 vs. 62.5 without it) but not on the WSD datasets
• Jointly using lexicographic and encyclopedic knowledge benefits each task

Page 24

Conclusion

• Integrated approach to EL and WSD
  – Semantic signatures (random walk over the knowledge base)
  – Unconstrained identification of candidate meanings
  – Linking based on a high-coherence densest subgraph heuristic
• The approach exploits two key features of BabelNet
  – Multilinguality and the integration of lexicographic and encyclopedic knowledge
• The experiments show state-of-the-art performance
  – Robustness across languages
  – Ablation tests show which components contribute to performance

Page 25

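Impressions and comments
• This research is possible only because of the knowledge base that Roberto Navigli's team has built over many years
  – BabelNet: WordNet concepts + Wikipedia NEs (50 languages) + etc…
• The method itself is not especially elaborate, but it makes good use of the properties of the knowledge base it relies on
  – The semantic signature simplifies the definition in [Pilehvar and Navigli 2013] (a multinomial distribution over senses => a plain set of senses)
• This is the first study to argue that WSD helps EL (i.e., that solving the two tasks jointly is meaningful)
  – However, I don't think the experimental results show that EL improves WSD performance
  – Whether BabelNet's multilinguality actually helps cannot be stated clearly, since only the Multilingual WSD results address it
• I suspected that Random Walk with Restart (RWR) is basically PageRank, so I asked the authors. Their answer: "Yes, if you run it infinitely many times, you get the same values as Personalized PageRank."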
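The authors' remark can be checked numerically on a toy graph. The sketch below is our own illustration, not code from the paper. It computes Personalized PageRank by power iteration, p ← α·e_s + (1 − α)·pP, and compares the result with the visit frequencies of a simulated walk that jumps back to the source with probability α at every step. It assumes every vertex has at least one out-neighbour. The graph `G` and the parameter values are hypothetical.

```python
import random
from typing import Dict, List

Graph = Dict[str, List[str]]


def ppr_power_iteration(graph: Graph, source: str, alpha: float = 0.15,
                        iters: int = 200) -> Dict[str, float]:
    """Personalized PageRank with restart probability alpha, by power iteration."""
    p = {v: 0.0 for v in graph}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        nxt[source] += alpha  # restart mass
        for u, neighbours in graph.items():
            share = (1 - alpha) * p[u] / len(neighbours)  # spread mass to neighbours
            for w in neighbours:
                nxt[w] += share
        p = nxt
    return p


def rwr_frequencies(graph: Graph, source: str, alpha: float = 0.15,
                    steps: int = 200_000, seed: int = 0) -> Dict[str, float]:
    """Fraction of steps a simulated random walk with restart spends at each vertex."""
    rng = random.Random(seed)
    counts = {v: 0 for v in graph}
    current = source
    for _ in range(steps):
        current = source if rng.random() < alpha else rng.choice(graph[current])
        counts[current] += 1
    return {v: c / steps for v, c in counts.items()}


G: Graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a"], "d": ["a"]}
exact = ppr_power_iteration(G, "a")
sampled = rwr_frequencies(G, "a")
for v in sorted(G):
    print(v, round(exact[v], 3), round(sampled[v], 3))
```

The two columns agree up to sampling noise, which shrinks as `steps` grows. Vertex "d" is unreachable from "a" and gets zero mass in both.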


Page 26

http://babelfy.org


※ An SDK and a REST API are reportedly available as well.