A Study of Hybrid Similarity Measures for Semantic Relation Extraction
Intelligent Database Systems Lab
Presenter: Bei-Yi Jiang
Authors: Université catholique de Louvain, Belgium
2012. Association for Computing Machinery
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• The quality of the relations provided by existing extractors is still lower than the quality of the manually constructed relations.
• Most studies still do not take the whole range of existing measures into account, and combine different methods only sporadically.
Objectives
• To develop new relation extraction methods.
• The method is a systematic analysis of 16 baseline measures and their combinations with 8 fusion methods and 3 techniques for selecting the combination set.
Methodology
• norm function
• similarity scores
• knn function
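The slide gives only the names of these three components, so the following is a minimal sketch under assumptions rather than the authors' definitions: `norm` is assumed to be a min-max rescaling of similarity scores into [0, 1], and `knn` is assumed to return the k terms with the highest similarity to a target term. The toy scores are invented for illustration.

```python
def norm(scores):
    """Min-max rescale similarity scores into [0, 1] (assumed form of norm)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def knn(term, similarity, k):
    """Return the k terms most similar to `term`, excluding the term itself."""
    neighbours = [(other, s) for other, s in similarity[term].items() if other != term]
    neighbours.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in neighbours[:k]]


# Toy similarity scores; the numbers are invented for illustration.
similarity = {
    "car": {"vehicle": 0.9, "wheel": 0.6, "banana": 0.1},
}
related = knn("car", similarity, k=2)
```

Under these assumptions, the extracted relations for "car" would be the pairs formed with each term in `related`.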
Methodology - Single Similarity Measures
• Measures Based on a Semantic Network (5)
– exploit the lengths of the shortest paths between terms in a network
– probability of terms derived from a corpus
– Wu and Palmer, Leacock and Chodorow, Resnik, Jiang and Conrath, and Lin
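As an illustration of a path-based network measure, here is a small sketch of the Wu and Palmer score, 2 * depth(lcs) / (depth(a) + depth(b)), where lcs is the deepest common ancestor of the two terms. The taxonomy below is a toy is-a hierarchy invented for this example, not the semantic network used in the study.

```python
# Toy is-a taxonomy (child -> parent), invented for illustration.
PARENT = {"animal": "entity", "dog": "animal", "cat": "animal", "car": "entity"}


def path_to_root(term):
    """Return the chain of nodes from `term` up to the root."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(term):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(term))


def lcs(a, b):
    """Deepest node that is an ancestor of (or equal to) both terms."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):  # ordered from b upwards
        if node in ancestors_a:
            return node
    raise ValueError("terms share no common ancestor")


def wu_palmer(a, b):
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))
```

In this toy hierarchy "dog" and "cat" share the ancestor "animal" and score higher than "dog" and "car", which meet only at the root.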
• Web-based Measures (3)
– Web search engines
– rely on the number of times the terms co-occur in the documents
– Normalized Google Distance (NGD)
– Measures of Semantic Relatedness (MSR)
– YAHOO!, BING, GOOGLE over the domain wikipedia.org
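The Normalized Google Distance can be computed from hit counts alone. The sketch below uses the standard NGD formula; the parameter names (`fx`, `fy`, `fxy`, `n`) are chosen here for illustration, and the counts in the usage line are invented rather than taken from any search engine.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from document counts.

    fx, fy : number of documents containing each term
    fxy    : number of documents containing both terms
    n      : total number of indexed documents
    """
    if fxy == 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Invented counts: two terms that co-occur in 10 of their 1000 documents each.
distance = ngd(1000, 1000, 10, 10**6)
```

A distance of 0 means the terms always co-occur; larger values mean weaker association.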
• Corpus-based Measures (5)
– Distributional Measures
› Bag-of-words Distributional Analysis (BDA)
› Syntactic Distributional Analysis (SDA)
– Pattern-based Measure
› PatternWiki
– Other Corpus-based Measures
› Latent Semantic Analysis (LSA)
› Normalized Google Distance (NGD)
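A bag-of-words distributional measure can be sketched as follows: each term is represented by counts of the words that appear near it, and two terms are compared with the cosine of their count vectors. The window size, the whitespace tokenization, and the three-sentence corpus are assumptions made for this example; the study's actual BDA configuration is not given on the slide.

```python
import math
from collections import Counter


def context_vector(target, sentences, window=2):
    """Count the words within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == target:
                lo = max(0, i - window)
                counts.update(tokens[lo:i] + tokens[i + 1:i + 1 + window])
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Tiny invented corpus.
corpus = ["the dog chased the ball", "the cat chased the mouse", "the car needs fuel"]
dog, cat, car = (context_vector(t, corpus) for t in ("dog", "cat", "car"))
```

Here "dog" and "cat" occur in identical contexts, so their cosine is higher than that of "dog" and "car".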
• Definition-based Measures (3)
– WktWiki
– Gloss Vectors
– Extended Lesk
Methodology - Hybrid Similarity Measures
• Combination Methods
– Input: a set of similarity matrices {S_1, ..., S_K} produced by K single measures
– Output: a combined similarity matrix S_cmb
› 1. Mean
› 2. Mean-Nnz
› 3. Mean-Zscore
› 4. Median
› 5. Max
› 6. Rank Fusion
› 7. Relation Fusion
› 8. Logit
• Combination Methods
– Mean. A mean of K pairwise similarity scores.
– Mean-Nnz. A mean of those pairwise similarity scores which have a non-zero value.
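The two definitions above can be sketched for a single term pair as follows. The function names and the example scores are illustrative; returning 0 when every score is zero is an assumption, since the slide does not cover that case.

```python
def mean_fusion(scores):
    """Mean of the K scores that the single measures assign to one term pair."""
    return sum(scores) / len(scores)


def mean_nnz_fusion(scores):
    """Mean over the non-zero scores only; 0.0 if every measure returned zero."""
    nonzero = [s for s in scores if s != 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0


# Four measures score the same pair; two of them have no evidence (score 0).
pair_scores = [0.5, 0.0, 0.25, 0.0]
combined_mean = mean_fusion(pair_scores)
combined_nnz = mean_nnz_fusion(pair_scores)
```

Mean-Nnz avoids penalizing a pair just because some measures have no coverage for it, which is why `combined_nnz` is larger here.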
• Combination Methods
– Mean-Zscore. A mean of K similarity scores transformed into Z-scores.
– Median. A median of K pairwise similarities.
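A minimal sketch of these two methods, assuming each measure is given as a list of scores over the same term pairs in the same order. Standardizing each measure with its population standard deviation is an assumption; the slide does not say which variant is used.

```python
import statistics


def zscores(values):
    """Standardize one measure's scores (population standard deviation assumed)."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def mean_zscore_fusion(measures):
    """Average the Z-scored values of K measures, pair by pair."""
    transformed = [zscores(m) for m in measures]
    return [statistics.mean(column) for column in zip(*transformed)]


def median_fusion(measures):
    """Median of the K raw scores, pair by pair."""
    return [statistics.median(column) for column in zip(*measures)]


# Three measures on very different scales scoring the same three pairs.
measures = [[0.1, 0.5, 0.9], [10.0, 20.0, 30.0], [0.2, 0.2, 0.8]]
by_zscore = mean_zscore_fusion(measures)
by_median = median_fusion(measures)
```

The Z-score step puts measures with different ranges on a common scale before averaging, whereas the median works on the raw values.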
• Combination Methods
– Max. A maximum of K pairwise similarities.
– Rank Fusion.
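Max is defined on the slide, but Rank Fusion is only named. The sketch below therefore assumes one common form of rank fusion: each measure's scores are converted to ranks and the ranks are averaged per pair. Ranking from the lowest score upward (so that a higher fused value still means more similar) and breaking ties by position are choices made for this example.

```python
def max_fusion(measures):
    """Maximum of the K scores, pair by pair."""
    return [max(column) for column in zip(*measures)]


def ranks(values):
    """Rank 1 = lowest score; ties are broken by position (a simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def rank_fusion(measures):
    """Mean rank per pair across the K measures (assumed form of Rank Fusion)."""
    ranked = [ranks(m) for m in measures]
    return [sum(column) / len(column) for column in zip(*ranked)]


# Two measures on different scales scoring the same three pairs.
measures = [[0.1, 0.5, 0.9], [30.0, 10.0, 20.0]]
by_max = max_fusion(measures)
by_rank = rank_fusion(measures)
```

Because ranks ignore the magnitude of the scores, the second measure's larger scale does not dominate `by_rank` the way it dominates `by_max`.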
• Combination Methods
– Relation Fusion.
– Logit.
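The slide names Logit without a formula. One plausible reading, sketched here as an assumption, is a logistic regression whose features are the K single-measure scores of a term pair and whose output probability serves as the combined similarity. The training data, learning rate, and epoch count are invented for illustration, and Relation Fusion is not covered by this sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logit(features, labels, epochs=2000, lr=0.5):
    """Fit logistic regression weights with plain batch gradient descent."""
    k, n = len(features[0]), len(features)
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(features, labels):
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += error
            for j in range(k):
                grad_w[j] += error * x[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def logit_fusion(x, weights, bias):
    """Combined similarity = predicted probability that the pair is related."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Invented training pairs: scores from K=2 measures, label 1 = related.
features = [[0.9, 0.8], [0.7, 0.9], [0.2, 0.1], [0.1, 0.3]]
labels = [1, 1, 0, 0]
weights, bias = train_logit(features, labels)
```

Unlike the unsupervised methods above, this approach needs labelled relations for training, and it learns a separate weight for each measure.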
• Combination Sets
– Expert choice of measures
– Forward stepwise procedure
– Logistic regression
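A forward stepwise procedure can be sketched as a greedy loop: start from an empty set and repeatedly add the measure that most improves an evaluation score, stopping when no addition helps. The evaluation function below (negative mean absolute error of the mean-fused scores against gold values) and all numbers are assumptions for illustration, not the criterion used in the study.

```python
def forward_stepwise(candidates, evaluate):
    """Greedily grow a set of measures while the evaluation score improves."""
    selected, best_score = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        scored = [(evaluate(selected + [c]), c) for c in remaining]
        score, best = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no remaining measure improves the combination
        selected.append(best)
        remaining.remove(best)
        best_score = score
    return selected, best_score


# Invented scores of three measures on three term pairs, plus gold values.
MEASURES = {"A": [0.9, 0.1, 0.9], "B": [0.5, 0.5, 0.5], "C": [1.0, 0.0, 0.5]}
GOLD = [1.0, 0.0, 0.7]


def evaluate(names):
    """Negative mean absolute error of the mean-fused scores (higher is better)."""
    fused = [sum(col) / len(col) for col in zip(*(MEASURES[n] for n in names))]
    return -sum(abs(f - g) for f, g in zip(fused, GOLD)) / len(GOLD)


chosen, score = forward_stepwise(MEASURES, evaluate)
```

In this toy setting the procedure keeps the two complementary measures and leaves out the uninformative one.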
Experiments
• Evaluation
– Human Judgements Datasets
› MC, RG, WordSim353
– Semantic Relations Datasets
› BLESS, SN
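The slide lists the datasets but not the metrics. As a hedged illustration, similarity measures are commonly compared with human judgements through rank correlation, and extracted relations are commonly scored by precision against a gold set; the sketch below shows both under that assumption. The Spearman formula used here is valid only when there are no tied values, and the sample data is invented.

```python
def rank(values):
    """Rank 1 = smallest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(x, y):
    """Spearman rank correlation via the rank-difference formula (no ties)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank(x), rank(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def precision_at_k(ranked_candidates, gold_related, k):
    """Share of the top-k extracted terms that are related in the gold set."""
    top = ranked_candidates[:k]
    return sum(1 for candidate in top if candidate in gold_related) / k


# Invented example: human ratings vs. measure scores for four term pairs.
human = [3.9, 0.4, 2.5, 1.1]
measure = [0.8, 0.1, 0.7, 0.3]
correlation = spearman(human, measure)
```

For a relation dataset, `precision_at_k(["vehicle", "banana", "wheel"], {"vehicle", "wheel"}, 2)` would count one correct term among the top two.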
Conclusions
• The results have shown that the hybrid measures outperform the single measures on all datasets.
• A combination of 15 baseline corpus-, web-, network-, and dictionary-based measures with Logistic Regression provided the best results.
Comments
• Advantages
– higher performance
• Applications