Enhancing Named Entity Recognition in Twitter Messages Using Entity Linking
Ikuya Yamada (1,2,3), Hideaki Takeda (3), Yoshiyasu Takefuji (2)
(1) Studio Ousia, (2) Keio University, (3) National Institute of Informatics
Friday, July 31, 2015
STUDIO OUSIA
Background
‣ Twitter NER is difficult because of the noisy, short, and colloquial nature of tweets
‣ The performance of standard NER software suffers significantly
Entity Linking
New Frozen Boutique to Open at Disney's Hollywood Studios
/wiki/Frozen_(2013_film) /wiki/The_Walt_Disney_Company /wiki/Disney’s_Hollywood_Studios
‣ Entity Linking: The task of linking entity mentions to entries in a knowledge base (KB) (e.g., Wikipedia)
‣ Recently, entity linking has received considerable attention:
✦ Many research papers (2006-) [Cucerzan 2007, Milne et al. 2008, etc.]
✦ Competitions (TAC KBP, ERD@SIGIR, #Microposts@WWW, etc.)
New Frozen Boutique to Open at Disney's Hollywood Studios
Detecting “Frozen” from this tweet is difficult
Entity Linking
New Frozen Boutique to Open at Disney's Hollywood Studios
/wiki/Frozen_(2013_film) /wiki/The_Walt_Disney_Company /wiki/Disney’s_Hollywood_Studios
‣ By using entity linking, we can detect “Frozen”:
✦ “Frozen” is a very popular entity (from Wikipedia link structure and page view count)
✦ “Frozen” is semantically related to the context entities
Our Approach
‣ Our system first performs entity linking in an end-to-end manner
‣ Detected entity mentions are used to enhance the NER tasks
‣ The data of entities are extracted from several open knowledge bases (Wikipedia, DBpedia, Freebase)
‣ Segmentation and classification tasks are addressed by using separate components
Pipeline: End-to-End Entity Linking → Segmentation (NER) → Classification (NER)
End-to-End Entity Linking
‣ An entity linking system specifically designed for tweets
✦ Does not depend on NER to detect entity mentions (considering all possible n-grams as mention candidates)
✦ Based on supervised machine-learning (random forest) using various kinds of features (trained using #Microposts2015 dataset)
✦ Winner of a recent Twitter entity linking competition called #Microposts2015 NEEL Challenge at WWW2015
‣ For further details, please refer to: Yamada et al., An End-to-End Entity Linking Approach for Tweets, in Proceedings of #Microposts 2015
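Treating every n-gram as a mention candidate can be sketched as follows. This is only an illustration, not the authors' implementation: the whitespace tokenization, the function name `ngram_candidates`, and the `max_len` cap are assumptions made for the example.

```python
def ngram_candidates(tweet, max_len=3):
    """Return every word n-gram (up to max_len words) as (start, end, text).

    start/end are token indices with end exclusive. Whitespace
    tokenization and the max_len cap are assumptions of this sketch.
    """
    tokens = tweet.split()
    candidates = []
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            candidates.append((start, end, " ".join(tokens[start:end])))
    return candidates


tweet = "New Frozen Boutique to Open at Disney's Hollywood Studios"
candidates = ngram_candidates(tweet)
```

Each candidate would then be scored by the entity linking model, so that a single word such as "Frozen" can be picked up without a separate NER step.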
Image taken from NEEL2015 Challenge Summary: http://www.slideshare.net/giusepperizzo/neel2015-challenge-summary
Segmentation of Named Entities
Segmentation: Approach
‣ Supervised machine-learning is used to assign a binary label to each possible n-gram
‣ Random forest is used as the machine-learning algorithm
‣ Overlaps of mentions are resolved by iteratively selecting the longest entity mention from the beginning of the tweet
‣ Machine-learning features can be classified as follows:
✦ Entity-based features
✦ Linguistic features
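The overlap resolution step can be read as a greedy left-to-right pass. The sketch below shows one plausible interpretation of it, not the authors' code: the function name `resolve_overlaps`, the span representation, and the example spans are all assumptions.

```python
def resolve_overlaps(mentions):
    """Greedy left-to-right selection of non-overlapping mentions.

    mentions: list of (start, end) token spans, end exclusive, that the
    binary classifier accepted. Among spans starting at the earliest
    free position the longest one is kept; spans overlapping an
    already selected mention are dropped.
    """
    selected = []
    next_free = 0
    # Sort by start position, then longest span first.
    for start, end in sorted(mentions, key=lambda m: (m[0], -(m[1] - m[0]))):
        if start >= next_free:
            selected.append((start, end))
            next_free = end
    return selected


# Hypothetical classifier output for a 9-token tweet
spans = [(1, 2), (6, 7), (6, 9), (7, 9)]
result = resolve_overlaps(spans)
```

With these spans, the three-token mention starting at token 6 is kept in preference to the shorter spans it overlaps.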
Segmentation: Entity-based Features
‣ The relevance score assigned by the entity linking system
‣ The popularity of the entity:
✦ The number of inbound links of the entity in Wikipedia
✦ The average page view count of the Wikipedia entity
‣ Mention statistics in Wikipedia:
✦ Link probability
✦ Capitalization probability
Segmentation: Link Probability Feature
Occurrences of "Harajuku" in three Wikipedia articles (each occurrence is either a link or plain text):
Kyary Pamyu Pamyu: "Her public image is associated with Japan's kawaisa culture centered in Harajuku, Tokyo"
Takeshita Street: "Takeshita Street is a street lined with fashion boutiques, and cafes in Harajuku in Tokyo, Japan."
Laforet: "Department Store and Museum is a department store located in the Harajuku..."
LINK_PROBABILITY(Harajuku) = 2/3
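The 2/3 value above is the share of occurrences of the mention text that appear as links. A minimal sketch of that ratio follows; the function name `link_probability` and the boolean encoding of occurrences are assumptions, and the slide does not say which of the three articles contain the links.

```python
from fractions import Fraction


def link_probability(occurrences):
    """Fraction of occurrences of a mention that appear as link anchors.

    occurrences: list of booleans, one per occurrence of the mention
    text in Wikipedia; True means the occurrence is a link.
    """
    if not occurrences:
        return Fraction(0)
    return Fraction(sum(occurrences), len(occurrences))


# Three occurrences of "Harajuku", two of them links. Which two are
# links is an assumption made for this sketch.
harajuku = [True, True, False]
prob = link_probability(harajuku)
```

A low link probability suggests that the text is rarely used as an entity name, which is why it is useful for filtering n-gram candidates.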
Segmentation: Linguistic Features
‣ Whether or not Stanford NER detects the mention
‣ Part-of-speech tags of the current and surrounding words
‣ Whether or not the current and surrounding words are capitalized
‣ Mention length (# of words, # of characters)
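Two of the feature groups above, capitalization and mention length, need no external tools and can be sketched directly. The feature names and the function `linguistic_features` are hypothetical; the part-of-speech tags and the Stanford NER flag are left out because they depend on external software.

```python
def linguistic_features(tokens, start, end):
    """Capitalization and length features for the n-gram tokens[start:end].

    Feature names are hypothetical. "Surrounding words" is taken to mean
    the single word before and after the n-gram, which is an assumption.
    """
    mention = tokens[start:end]
    prev_word = tokens[start - 1] if start > 0 else ""
    next_word = tokens[end] if end < len(tokens) else ""
    return {
        "all_capitalized": all(w[:1].isupper() for w in mention),
        "prev_capitalized": prev_word[:1].isupper(),
        "next_capitalized": next_word[:1].isupper(),
        "num_words": len(mention),
        "num_chars": len(" ".join(mention)),
    }


tokens = "New Frozen Boutique to Open at Disney's Hollywood Studios".split()
features = linguistic_features(tokens, 1, 2)  # the n-gram "Frozen"
```

In headline-style tweets such as this one almost every word is capitalized, so capitalization alone is a weak signal, which is consistent with combining it with the entity-based features.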
Classification of Named Entities
Classification
‣ Supervised machine-learning is used to classify detected mentions into the predefined types
‣ Linear SVM is used as the machine-learning algorithm
‣ Main machine-learning features:
✦ Entity types in knowledge bases (DBpedia Ontology Classes and Freebase Types)
✦ Entity type detected by Stanford NER (i.e., PERSON, ORGANIZATION, LOCATION)
✦ The average of vectors of words in the n-gram using Stanford GloVe word embeddings (840B model)
✦ The relevance score assigned by entity linking
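The word-embedding feature is an element-wise average over the words of the n-gram. The sketch below uses toy three-dimensional vectors in place of the real GloVe vectors; the function name `average_vector`, the lowercasing of words, and the handling of out-of-vocabulary words are assumptions.

```python
def average_vector(words, embeddings, dim):
    """Element-wise mean of the embeddings of the words in an n-gram.

    Words missing from the embedding table are skipped; if none are
    found, a zero vector is returned (both choices are assumptions).
    """
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy 3-dimensional vectors standing in for the 300-dimensional GloVe ones
toy_embeddings = {
    "hollywood": [1.0, 0.0, 2.0],
    "studios": [3.0, 4.0, 0.0],
}
feature = average_vector(["hollywood", "studios"], toy_embeddings, dim=3)
```

The resulting vector would be concatenated with the other features in the list before being passed to the classifier.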
Results
‣ Our method outperformed the 2nd-ranked method by 10.34 F1 on the segmentation task and by 5.01 F1 on the end-to-end task!
Performances of the proposed systems at segmenting entities
Performances of the proposed systems at both segmentation and classification tasks
Conclusion
‣ Twitter NER can be enhanced by using entity linking
‣ Entity linking enables us to use high-quality data in knowledge bases for NER tasks