Introduction to "Entity Linking meets Word Sense Disambiguation: a Unified Approach" (TACL 2014)
DESCRIPTION
My presentation of the paper "Entity Linking meets Word Sense Disambiguation: a Unified Approach" (TACL 2014) by Andrea Moro, Alessandro Raganato, and Roberto Navigli (Sapienza University of Rome)
TRANSCRIPT
Entity Linking meets Word Sense Disambiguation: a Unified Approach
Andrea Moro, Alessandro Raganato, Roberto Navigli (Sapienza University of Rome)
TACL Vol. 2 (5/2014), pp. 231-244
Presenter: Koji Matsuda (Tohoku University)
Cutting-Edge NLP Study Group (最先端 NLP 勉強会) #6 @ The University of Tokyo
Entity Linking meets Word Sense Disambiguation: a Unified Approach 1
WSD and Entity Linking Together
[Overview figure]
Input text: "Thomas and Mario are strikers playing in Munich. They are …"
Knowledge bases: Lexical Knowledge Base, Encyclopedic Knowledge Base, Integrated Knowledge Base
Semantic Signature
Semantic Interpretation Graph, with candidate meanings: Thomas Muller, Thomas Milan, Mario Gomez, striker, playing, Munich, FC Bayern Munich
→ Select the most suitable meaning on the graph
Background: Word Sense Disambiguation (WSD) in a Nutshell
“Thomas and Mario are strikers playing in Munich”
Knowledge base (e.g., WordNet)
WSD System
Sense of target word
Background: Entity Linking (EL) in a Nutshell
“Thomas and Mario are strikers playing in Munich”
Knowledge base (e.g., Wikipedia)
EL System
Named Entity
• Mention Detection
• Link Detection
• Entity Disambiguation
A Joint approach to WSD and EL
• Knowledge-based approaches perform well on both tasks:
  – The main difference is the kind of inventory (knowledge base) used
• If we had a knowledge base that contains both concepts and named entities, we could solve both tasks together!
BabelNet
• Multilingual Encyclopedic Dictionary
  – Lexicographic & encyclopedic knowledge
  – Based on automatic integration of:
    • WordNet, Wikipedia, Wiktionary, …
• Named entities and specialized concepts from Wikipedia
• Concepts from WordNet
• Concepts integrated from both resources
• 50 languages, 21M definitions, 62M entries
Babelfy: A Joint Approach to WSD and EL
1. Precompute semantic signatures;
2. Select all the possible candidate meanings from BabelNet by matching mentions with BabelNet lexicalizations;
3. Connect all the candidate meanings by using semantic signatures (= Semantic Interpretation Graph);
4. Extract a dense subgraph containing semantically coherent candidates;
5. Select the most connected candidate for each fragment.
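Step 2 can be illustrated with a toy lexicon. The sketch below is only an illustration under stated assumptions: `LEXICON` and `candidate_meanings` are hypothetical names, the dictionary stands in for BabelNet lexicalizations, and matching is plain case-insensitive exact matching of n-grams with no lemmatization, which may differ from the matching used in the paper.

```python
# Hypothetical toy lexicon: lower-cased surface form -> candidate meanings.
# It stands in for BabelNet lexicalizations; no real BabelNet data is used.
LEXICON = {
    "thomas": ["Thomas Muller", "Thomas Milan"],
    "mario": ["Mario Gomez", "Mario Basler", "Mario Adorf"],
    "strikers": ["striker", "forward"],  # keyed by surface form (no lemmatizer)
    "munich": ["Munich", "FC Bayern Munich"],
}


def candidate_meanings(tokens, lexicon, max_len=3):
    """Return {fragment: [meanings]} for every n-gram found in the lexicon."""
    candidates = {}
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(tokens):
                break
            fragment = " ".join(tokens[start:end])
            meanings = lexicon.get(fragment.lower())
            if meanings:
                candidates[fragment] = list(meanings)
    return candidates


tokens = "Thomas and Mario are strikers playing in Munich".split()
candidates = candidate_meanings(tokens, LEXICON)
for fragment, meanings in candidates.items():
    print(fragment, "->", meanings)
```

Every fragment keeps all of its candidate meanings at this point; disambiguation happens only in the later steps.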
Step 1: Compute Semantic Signatures
• Semantic signature: the set of relevant vertices for a given vertex in the semantic network
  – Computed by using Random Walk with Restart (RWR) over BabelNet
1. Start from the target vertex of the semantic network;
2. Randomly select a neighbor of the current vertex, or restart from the target vertex;
3. Keep count of how often each vertex is hit;
4. Take the most visited vertices.
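The four steps above can be sketched as a small simulation. This is a minimal sketch, not the paper's implementation: the function name `semantic_signature`, the toy graph, and the values of `restart_prob`, `steps`, and `top_k` are all assumptions made for illustration, and neighbors are chosen uniformly rather than with edge weights.

```python
import random
from collections import Counter


def semantic_signature(graph, target, restart_prob=0.15, steps=10000,
                       top_k=3, seed=0):
    """Approximate the semantic signature of `target` by Random Walk with Restart.

    graph maps each vertex to a list of its neighbors.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    hits = Counter()
    current = target  # 1. start from the target vertex
    for _ in range(steps):
        neighbors = graph.get(current, [])
        if not neighbors or rng.random() < restart_prob:
            current = target  # 2. restart from the target vertex ...
        else:
            current = rng.choice(neighbors)  # ... or move to a random neighbor
            hits[current] += 1  # 3. keep the hitting counts
    hits.pop(target, None)  # the target itself is not part of its signature
    return [vertex for vertex, _ in hits.most_common(top_k)]  # 4. most visited


# Hypothetical toy semantic network (not real BabelNet data).
TOY_GRAPH = {
    "striker": ["soccer player", "offside", "sport"],
    "soccer player": ["athlete", "striker", "sport"],
    "offside": ["striker", "sport"],
    "athlete": ["sport", "soccer player"],
    "sport": ["athlete"],
    "opera": ["music"],
    "music": ["opera"],
}

signature = semantic_signature(TOY_GRAPH, "striker")
print(signature)
```

Vertices that cannot be reached from the target, such as "opera" in the toy graph, never enter the signature.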
Example of Semantic Signature
[Figure: the vertex "striker" and its related vertices: offside, athlete, sport, soccer player]
semSign("striker") = { "sport", "offside", "soccer player", "athlete", … }
Step 3: Construct SI Graph (1)
Create a vertex for every candidate meaning in the text (Algorithm 2, Lines 6-8)
Example: Thomas and Mario are strikers playing in Munich
(Thomas Milan, Thomas)
(Thomas Muller, Thomas)
(Mario Basler, Mario)
(Mario Gomez, Mario)
(Mario Adorf, Mario)
(forward, striker)
(striker, striker)
(Munich, Munich)
(FC Bayern Munich, Munich)
Step 3: Construct SI Graph (2)
Connect related meanings based on semantic signatures (Algorithm 2, Lines 9-11)
Example: Thomas and Mario are strikers playing in Munich
(Thomas Milan, Thomas)
(Thomas Muller, Thomas)
(Mario Basler, Mario)
(Mario Gomez, Mario)
(Mario Adorf, Mario)
(forward, striker)
(striker, striker)
(Munich, Munich)
(FC Bayern Munich, Munich)
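The construction of the Semantic Interpretation (SI) graph can be sketched as follows. This is a hedged illustration, not the paper's Algorithm 2: `build_si_graph`, `CANDIDATES`, and `SEM_SIGN` are hypothetical names, the signatures are made up for the example, and the rule that edges only link candidates of different fragments is an assumption of this sketch.

```python
def build_si_graph(candidates, sem_sign):
    """Build a toy SI graph.

    candidates: {fragment: [meanings]}
    sem_sign:   {meaning: set of meanings in its semantic signature}
    Returns (vertices, edges). Each vertex is a (meaning, fragment) pair.
    A directed edge (v, f) -> (v2, f2) is added when v2 is in semSign(v)
    and f != f2 (the f != f2 condition is an assumption of this sketch).
    """
    vertices = [(m, f) for f, meanings in candidates.items() for m in meanings]
    edges = set()
    for (v, f) in vertices:
        for (v2, f2) in vertices:
            if f != f2 and v2 in sem_sign.get(v, set()):
                edges.add(((v, f), (v2, f2)))
    return vertices, edges


CANDIDATES = {
    "Thomas": ["Thomas Muller", "Thomas Milan"],
    "Mario": ["Mario Gomez", "Mario Basler", "Mario Adorf"],
    "striker": ["striker", "forward"],
    "Munich": ["Munich", "FC Bayern Munich"],
}

# Hypothetical semantic signatures, invented for illustration only.
SEM_SIGN = {
    "Thomas Muller": {"FC Bayern Munich", "striker", "Mario Gomez"},
    "Mario Gomez": {"FC Bayern Munich", "striker", "Thomas Muller"},
    "FC Bayern Munich": {"Thomas Muller", "Mario Gomez", "Munich"},
    "striker": {"forward", "Thomas Muller"},
    "Mario Adorf": {"Munich"},
}

vertices, edges = build_si_graph(CANDIDATES, SEM_SIGN)
for src, dst in sorted(edges):
    print(src, "->", dst)
```

Candidates that share no signature entries with the rest of the text, such as (Thomas Milan, Thomas) here, end up isolated, which is what the later steps exploit.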
Step 4: Densest Subgraph Heuristic
• Reducing the level of ambiguity of the SI graph is helpful
  – Main idea: the most suitable meaning of each fragment will belong to the densest area of the graph
  – However, identifying the densest subgraph (of at least a given size) is NP-hard
  – Instead: iterative removal of low-coherence vertices
    • Algorithm 3
    • Identify the most ambiguous fragment fmax and discard the weakest interpretation of fmax, iteratively
Step 4: Densest Subgraph Heuristic
Remove the weakest interpretation of the most ambiguous fragment, iteratively (Algorithm 2, Line 12)
Example: Thomas and Mario are strikers playing in Munich
(Thomas Milan, Thomas)
(Thomas Muller, Thomas)
(Mario Basler, Mario)
(Mario Gomez, Mario)
(Mario Adorf, Mario)
(forward, striker)
(striker, striker)
(Munich, Munich)
(FC Bayern Munich, Munich)
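The iterative removal can be sketched as a simple greedy loop. This is a simplified sketch, not the paper's Algorithm 3: `prune_si_graph` and the threshold `max_candidates` are hypothetical names, the weakest interpretation is chosen by plain vertex degree instead of the paper's score, ties are broken alphabetically, and the loop simply stops once no fragment has more than `max_candidates` interpretations.

```python
from collections import defaultdict


def prune_si_graph(vertices, edges, max_candidates=1):
    """Greedily drop the weakest candidate of the most ambiguous fragment.

    vertices: iterable of (meaning, fragment) pairs
    edges:    iterable of (source_vertex, destination_vertex) pairs
    """
    vertices = set(vertices)
    edges = set(edges)
    while vertices:
        by_fragment = defaultdict(list)
        for vertex in vertices:
            by_fragment[vertex[1]].append(vertex)
        # Most ambiguous fragment: the one with the most remaining candidates.
        f_max = max(by_fragment, key=lambda f: (len(by_fragment[f]), f))
        if len(by_fragment[f_max]) <= max_candidates:
            break
        # Degree = number of incoming plus outgoing edges.
        degree = defaultdict(int)
        for src, dst in edges:
            degree[src] += 1
            degree[dst] += 1
        weakest = min(by_fragment[f_max], key=lambda v: (degree[v], v))
        vertices.remove(weakest)
        edges = {(s, d) for (s, d) in edges if weakest not in (s, d)}
    return vertices, edges


TM = ("Thomas Muller", "Thomas")
TMI = ("Thomas Milan", "Thomas")
MG = ("Mario Gomez", "Mario")
MA = ("Mario Adorf", "Mario")
FCB = ("FC Bayern Munich", "Munich")
MU = ("Munich", "Munich")

toy_vertices = [TM, TMI, MG, MA, FCB, MU]
toy_edges = [(TM, FCB), (FCB, TM), (MG, FCB), (FCB, MG),
             (TM, MG), (MG, TM), (MA, MU)]

kept_vertices, kept_edges = prune_si_graph(toy_vertices, toy_edges)
print(sorted(kept_vertices))
```

In this toy graph the tightly connected football-related candidates survive, while the weakly connected ones are discarded.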
Step 5: Select most reliable meanings
Select the most suitable meaning for each fragment by normalized weighted degree (Eq. (2)) (Algorithm 2, Lines 14-18)
Example: Thomas and Mario are strikers playing in Munich
(Thomas Milan, Thomas)
(Thomas Muller, Thomas)
(Mario Basler, Mario)
(Mario Gomez, Mario)
(forward, striker)
(striker, striker)
(FC Bayern Munich, Munich)
Step 5: Select most reliable meanings
The score of a candidate is its normalized weighted degree, which combines two terms:
• Lexical coherence: the fraction of fragments that the candidate is related to
• Semantic coherence: a graph centrality measure among the candidate meanings
  – deg(v): the number of incoming and outgoing edges of v
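A normalized weighted degree of this kind can be sketched as below. This is a reconstruction under assumptions, since the slide's formula is not preserved in the transcript: the weight is taken as the fraction of the other fragments a candidate is linked to, it is multiplied by the degree, and the product is normalized over the candidates of the same fragment. The function names and the toy graph are hypothetical.

```python
def lexical_coherence(vertex, edges, fragments):
    """Fraction of the other fragments that `vertex` is linked to."""
    linked = set()
    for src, dst in edges:
        if src == vertex:
            linked.add(dst[1])
        elif dst == vertex:
            linked.add(src[1])
    linked.discard(vertex[1])
    others = len(fragments) - 1
    return len(linked) / others if others > 0 else 0.0


def degree(vertex, edges):
    """Number of incoming plus outgoing edges of `vertex`."""
    return sum(1 for src, dst in edges if vertex in (src, dst))


def scores_for_fragment(fragment, vertices, edges):
    """Normalized weighted degree of every candidate of `fragment`."""
    fragments = {f for (_, f) in vertices}
    candidates = [v for v in vertices if v[1] == fragment]
    raw = {v: lexical_coherence(v, edges, fragments) * degree(v, edges)
           for v in candidates}
    total = sum(raw.values())
    return {v: (raw[v] / total if total > 0 else 0.0) for v in candidates}


TM = ("Thomas Muller", "Thomas")
TMI = ("Thomas Milan", "Thomas")
MG = ("Mario Gomez", "Mario")
FCB = ("FC Bayern Munich", "Munich")
MU = ("Munich", "Munich")

toy_vertices = [TM, TMI, MG, FCB, MU]
toy_edges = [(TM, FCB), (FCB, TM), (TM, MG), (MG, TM),
             (MG, FCB), (FCB, MG), (TMI, MU)]

scores = scores_for_fragment("Thomas", toy_vertices, toy_edges)
best = max(scores, key=scores.get)
print(best, scores)
```

Normalizing within a fragment makes the scores of competing candidates comparable, so the highest-scoring candidate can be selected per fragment.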
Experiment
• WSD datasets (only nominal mentions):
  – SemEval-2013 task 12: Multilingual WSD
    • English, French, German, Italian, Spanish
  – SemEval-2007 task 7: Coarse-grained WSD
  – SemEval-2007 task 17: Fine-grained WSD
  – Senseval-3 WSD: Fine-grained WSD
• Entity Linking datasets:
  – KORE50: 50 short sentences, YAGO2
  – AIDA-CoNLL: 1392 articles, YAGO
Result: WSD (fine-grained)
[Results figure: F1 score of the proposed system and the baseline]
Result: Entity Linking
[Results figure: accuracy of the proposed system and EL-only systems]
Impact of Components
                      Word Sense Disambiguation                      Entity Linking
System                Sens3   Sem07   Sem07 (coarse)  Sem13 (EN, BN)  KORE50  CoNLL
Babelfy (proposed)    68.3    62.7    85.5            69.2            71.5    82.1
+ unif. weight        67.0    65.2    85.7            68.5            69.4    81.7
+ w/o dens. sub.      68.3    63.3    84.9            68.7            62.5    78.1
+ only concepts       68.2    62.7    85.3            68.7            -       -
+ only NE             -       -       -               -               68.1    78.8
+ on sentences        66.0    65.2    82.3            67.1            -       -
Observations:
• Triangle-based weighting has a smaller impact
• The densest subgraph heuristic is effective on the EL tasks, but not on the WSD datasets
• Joint use of lexicographic and encyclopedic knowledge benefits each task
Conclusion
• Integrated approach to EL and WSD
  – Semantic signatures (random walk over the knowledge base)
  – Unconstrained identification of candidate meanings
  – Linking based on a high-coherence densest subgraph heuristic
• The approach exploits two key features of BabelNet
  – Multilinguality and integration of lexicographic and encyclopedic knowledge
• The results of the experiments show state-of-the-art performance
  – Robustness across languages
  – Ablation tests show which components contribute to performance
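Impressions
• This research is possible only because of the knowledge base that Roberto Navigli's team has worked on for many years
  – BabelNet: WordNet concepts + Wikipedia NEs (50 languages) + etc.
• The method itself is not especially elaborate, but it makes good use of the properties of the knowledge base it relies on
  – The semantic signature simplifies the definition in [Pilehvar and Navigli 2013] (a multinomial distribution over word senses => a plain set of word senses)
• This is the first study to state that WSD helps EL (that is, that solving the two jointly is meaningful)
  – However, I do not think the experimental results show that EL helps improve WSD performance
  – Whether BabelNet's multilinguality is effective cannot be stated clearly, since the only evidence is the Multilingual WSD result
• I thought Random Walk with Restart (RWR) was essentially just PageRank, so I asked the authors. Their answer: "Yes, if you run it an infinite number of times, you get the same values as Personalized PageRank."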
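The quantity that the authors' reply refers to can be computed directly by power iteration. The sketch below is an illustration under assumptions, not code from the paper: `personalized_pagerank`, the toy graph, and `restart_prob` are hypothetical, and the walk is unweighted. It computes the stationary distribution of a walk that always restarts at the target vertex, which is the Personalized PageRank vector for that vertex.

```python
def personalized_pagerank(graph, target, restart_prob=0.15, iterations=100):
    """Power iteration for a random walk that restarts at `target`.

    graph maps each vertex to a list of neighbors; every neighbor must
    also be a key of `graph`.
    """
    nodes = list(graph)
    rank = {node: 0.0 for node in nodes}
    rank[target] = 1.0  # all probability mass starts on the target
    for _ in range(iterations):
        updated = {node: 0.0 for node in nodes}
        updated[target] += restart_prob  # mass that restarts at the target
        for node in nodes:
            neighbors = graph[node]
            moving = (1.0 - restart_prob) * rank[node]
            if not neighbors:
                updated[target] += moving  # dead end: go back to the target
                continue
            share = moving / len(neighbors)
            for neighbor in neighbors:
                updated[neighbor] += share
        rank = updated
    return rank


# Hypothetical toy semantic network (not real BabelNet data).
PPR_TOY_GRAPH = {
    "striker": ["soccer player", "offside", "sport"],
    "soccer player": ["athlete", "striker", "sport"],
    "offside": ["striker", "sport"],
    "athlete": ["sport", "soccer player"],
    "sport": ["athlete"],
    "opera": ["music"],
    "music": ["opera"],
}

ppr = personalized_pagerank(PPR_TOY_GRAPH, "striker")
for vertex, value in sorted(ppr.items(), key=lambda item: -item[1]):
    print(f"{vertex}: {value:.3f}")
```

Normalized visit counts from a sufficiently long simulated RWR should approach these values, whereas the semantic signature keeps only the set of most visited vertices.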
http://babelfy.org
Note: I am told that an SDK and a REST API are also available.