semantic similarity measures for semantic relation extraction
TRANSCRIPT
Introduction Pattern-Based Similarity Measures Hybrid Semantic Similarity Measures
Semantic Similarity Measures for Semantic Relation Extraction
Alexander Panchenko
Center for Natural Language Processing (CENTAL)
Université catholique de Louvain – [email protected]
September 21, 2012
Plan
Introduction
Pattern-Based Similarity Measures
Hybrid Semantic Similarity Measures
Semantic Similarity Measures
1. A similarity measure $s_{ij} = \mathrm{sim}(c_i, c_j) \in [0, 1]$
• $c_i, c_j$ – terms
• $s_{ij}$ – high for semantic relations $\langle c_i, c_j \rangle$
  • synonyms, hyponyms, co-hyponyms
• $s_{ij}$ – low for other pairs $\langle c_i, c_j \rangle$
2. Semantic similarity measures are useful for NLP/IR:
• WSD (Patwardhan et al., 2003)
• Query Expansion (Hsu et al., 2006)
• QA (Sun et al., 2005)
• Text Categorization (Tikk et al., 2003)
• Text Similarity (Saric et al., 2012)
State of the Art
• WordNet-based measures
  • WuPalmer (1994), LeacockChodorow (1998), Resnik (1995)
  • rely on manually crafted resources
  • highest precision, limited coverage
• Dictionary-based measures
  • ExtendedLesk (Banerjee and Pedersen, 2003), GlossVectors (Patwardhan and Pedersen, 2006) and WiktionaryOverlap (Zesch et al., 2008)
  • rely on manually crafted resources
  • high precision, limited coverage
• Corpus-based measures
  • ContextWindow (Van de Cruys, 2010), SyntacticContext (Lin, 1998), LSA (Landauer et al., 1998)
  • no semantic resources are needed
  • low precision, high recall
• Combined measures, e.g. WikiRelate! (Strube and Ponzetto, 2006) . . .
Introduction
Plan
Introduction
Pattern-Based Similarity Measures
• Introduction
• Lexico-Syntactic Patterns
• Semantic Similarity Measures
• Results
• Conclusion
Hybrid Semantic Similarity Measures
• Introduction
• Features: Single Similarity Measures
• Hybrid Similarity Measures
• Results
• Conclusion
Reference Paper
• Panchenko A., Morozova O., Naets H. “A Semantic Similarity Measure Based on Lexico-Syntactic Patterns”. In Proceedings of KONVENS 2012, pp. 174–178, 2012
Try a Demo
• http://serelex.cental.be/
Lexico-Syntactic Patterns
General architecture
• 6 classical Hearst (1992) patterns
• 12 further patterns
• extracting hypernyms, co-hyponyms and synonyms
The main transducer
• A cascade of FSTs
• Unitex
The 2nd pattern
• Allow for language variation while preserving precision
• Compare to surface-based patterns (Bollegala et al., 2007)
Explicit extraction rules
• positive/negative contexts,
• dictionaries,
• insertions of adjectives, . . .
Patterns are applied to corpora
• No preprocessing is needed
• 250 MB blocks
• 1 block ≈ 1 hour @ Intel i5 [email protected]
Patterns extract concordances
• such diverse {[occupations]} as {[doctors]}, {[engineers]} and {[scientists]}[PATTERN=1]
• such {non-alcoholic [sodas]} as {[root beer]} and {[cream soda]}[PATTERN=1]
• {traditional [food]}, such as {[sandwich]}, {[burger]}, and {[fry]}[PATTERN=2]
Number of concordances:
• WaCkypedia – 1,196,468
• ukWaC – 2,227,025
• WaCkypedia+ukWaC – 3,423,493
Semantic Similarity Measures
General procedure
Reranking
• Efreq. No re-ranking:
  $s_{ij} = e_{ij}$
  where $s_{ij}$ is the semantic similarity between terms $c_i, c_j \in C$ and $e_{ij}$ is the frequency of co-occurrence of $c_i$ and $c_j$ in the concordances $K$.
• Efreq-Rfreq. Penalizes terms strongly related to many words:
  $s_{ij} = \frac{2 \cdot \alpha \cdot e_{ij}}{e_{i*} + e_{*j}}$
  where $e_{i*}$ is the number of concordances containing the word $c_i$ and $\alpha$ is the expected number of semantically related words per term.
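A minimal sketch of Efreq and Efreq-Rfreq, assuming each concordance has already been reduced to the list of terms it contains. The function names, the toy input and the default value of alpha are illustrative assumptions, not the released PatternSim code.

```python
from collections import Counter
from itertools import combinations


def efreq(concordances):
    """Efreq: e_ij = number of concordances in which c_i and c_j co-occur."""
    e = Counter()
    for terms in concordances:
        # Each unordered pair is counted once per concordance.
        for pair in combinations(sorted(set(terms)), 2):
            e[pair] += 1
    return e


def efreq_rfreq(concordances, alpha=1.0):
    """Efreq-Rfreq: s_ij = 2 * alpha * e_ij / (e_i* + e_*j)."""
    e = efreq(concordances)
    # e_i*: number of concordances that contain the term c_i.
    containing = Counter()
    for terms in concordances:
        containing.update(set(terms))
    return {
        (ci, cj): 2 * alpha * eij / (containing[ci] + containing[cj])
        for (ci, cj), eij in e.items()
    }


# Toy input: terms of four concordances (the last one is made up).
toy = [
    ["occupations", "doctors", "engineers", "scientists"],
    ["sodas", "root beer", "cream soda"],
    ["food", "sandwich", "burger", "fry"],
    ["food", "burger"],
]
scores = efreq_rfreq(toy, alpha=1.0)
```

Pairs are stored as alphabetically sorted tuples, so `scores[("burger", "food")]` and `scores[("burger", "fry")]` are the keys to look up here.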
• Efreq-Rnum. Penalizes terms strongly related to many words:
  $s_{ij} = \frac{2 \cdot \mu_b \cdot e_{ij}}{b_{i*} + b_{*j}}$
  where $b_{i*} = \sum_{j : e_{ij} \geq \beta} 1$ is the number of extractions with a frequency $\geq \beta$ and $\mu_b = \frac{1}{|C|} \sum_{i=1}^{|C|} b_{i*}$ is the average number of relations per term.
• Efreq-Cfreq. Penalizes relations to general words, e.g. “item”:
  $s_{ij} = \frac{P(c_i, c_j)}{P(c_i) P(c_j)}$
  where $P(c_i, c_j) = \frac{e_{ij}}{\sum_{ij} e_{ij}}$ is the extraction probability of the pair $\langle c_i, c_j \rangle$, $P(c_i) = \frac{f_i}{\sum_i f_i}$ is the probability of the word $c_i$, and $f_i$ is the frequency of $c_i$ in the corpus.
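The two re-ranking formulas above can be sketched as follows. The sketch assumes that pair counts are given as a dictionary keyed by unordered term pairs, that relations are symmetric, and that |C| is the set of terms occurring in at least one extracted pair; these choices and all names are assumptions made for illustration.

```python
from collections import Counter


def efreq_rnum(e, beta=1):
    """Efreq-Rnum: s_ij = 2 * mu_b * e_ij / (b_i* + b_*j).

    e maps an unordered pair (c_i, c_j) to its extraction frequency e_ij.
    """
    # b_i*: number of relations of c_i extracted at least beta times.
    b = Counter()
    for (ci, cj), eij in e.items():
        if eij >= beta:
            b[ci] += 1
            b[cj] += 1
    terms = {t for pair in e for t in pair}
    mu_b = sum(b[t] for t in terms) / len(terms)  # average relations per term
    return {
        (ci, cj): 2 * mu_b * eij / (b[ci] + b[cj])
        for (ci, cj), eij in e.items()
        if b[ci] + b[cj] > 0
    }


def efreq_cfreq(e, f):
    """Efreq-Cfreq: s_ij = P(c_i, c_j) / (P(c_i) * P(c_j)).

    f maps a term to its corpus frequency f_i.
    """
    total_e = sum(e.values())
    total_f = sum(f.values())
    return {
        (ci, cj): (eij / total_e) / ((f[ci] / total_f) * (f[cj] / total_f))
        for (ci, cj), eij in e.items()
    }


# Hypothetical counts, for illustration only.
pair_counts = {("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 2}
corpus_freq = {"a": 10, "b": 20, "c": 70}
rnum = efreq_rnum(pair_counts, beta=2)
cfreq = efreq_cfreq(pair_counts, corpus_freq)
```

In this toy input the frequent word "c" is pushed down by Efreq-Cfreq, which is the intended effect for general words such as “item”.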
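• Efreq-Rnum-Cfreq-Pnum. Combines the previous formulas with pattern redundancy:
  $s_{ij} = \sqrt{p_{ij}} \cdot \frac{2 \cdot \mu_b}{b_{i*} + b_{*j}} \cdot \frac{P(c_i, c_j)}{P(c_i) P(c_j)}$
  where $p_{ij} \in \{1, \ldots, 18\}$ is the number of patterns that extracted the relation $\langle c_i, c_j \rangle$.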
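As a rough illustration, the combined score can be computed from quantities that are assumed to be precomputed for one pair. The function name, the argument names and the numbers in the call are hypothetical.

```python
import math


def efreq_rnum_cfreq_pnum(e_ij, p_ij, b_i, b_j, mu_b, f_i, f_j, total_e, total_f):
    """s_ij = sqrt(p_ij) * 2*mu_b / (b_i* + b_*j) * P(c_i, c_j) / (P(c_i) P(c_j))."""
    p_pair = e_ij / total_e   # extraction probability of the pair
    p_i = f_i / total_f       # probability of the word c_i
    p_j = f_j / total_f       # probability of the word c_j
    return math.sqrt(p_ij) * (2 * mu_b / (b_i + b_j)) * (p_pair / (p_i * p_j))


# Hypothetical values: the pair was extracted 3 times by 4 distinct patterns.
s = efreq_rnum_cfreq_pnum(
    e_ij=3, p_ij=4, b_i=1, b_j=2, mu_b=1.5,
    f_i=10, f_j=20, total_e=6, total_f=100,
)
```

The square root dampens the pattern count, so a relation found by four patterns is weighted twice as much as one found by a single pattern, not four times.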
Results
Correlation with Human Judgements
term, c_i     term, c_j   judgement, s   sim, ŝ   judgement, r   sim, r̂
tiger         cat         7.35           0.85     1              3
book          paper       7.46           0.95     2              2
computer      keyboard    7.62           0.81     3              1
...           ...         ...            ...      ...            ...
possibility   girl        1.94           0.25     64             65
sugar         approach    0.88           0.05     65             23

Data:
• WordSim353 – 353 term pairs (Finkelstein, 2002)
• MC – 30 term pairs (Miller and Charles, 1991)
• RG – 65 term pairs (Rubenstein and Goodenough, 1965)
Criteria:
• Pearson correlation: $\rho = \frac{\mathrm{cov}(s, \hat{s})}{\sigma(s)\,\sigma(\hat{s})}$
• Spearman's correlation: $r = \frac{\mathrm{cov}(r, \hat{r})}{\sigma(r)\,\sigma(\hat{r})}$
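Both criteria can be sketched with the standard library alone. This is an illustrative implementation, not the evaluation script behind the reported numbers: Spearman's correlation is computed as the Pearson correlation of ranks, and ties receive the average of their rank positions (a common convention, assumed here).

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation: cov(x, y) / (sigma(x) * sigma(y))."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


def ranks(values):
    """Rank 1 is the largest value; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the two rank vectors."""
    return pearson(ranks(x), ranks(y))
```

A call such as `spearman(human_scores, measure_scores)` takes two equal-length lists, one entry per term pair; both argument names are placeholders.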
Semantic Relation Ranking
term, c_i   term, c_j    relation type, t
judge       adjudicate   syn
judge       arbitrate    syn
judge       chancellor   syn
...         ...          ...
judge       pc           random
judge       fare         random
judge       lemon        random

• BLESS (Baroni and Lenci, 2011)
  • 26554 relations
  • hypernyms, co-hyponyms, meronyms, associations, attributes, random relations
• SN (Panchenko and Morozova, 2012)
  • 14682 relations
  • synonyms, co-hyponyms, hyponyms, random relations
• $|R_{random}| / |R| \approx 0.5$
• Based on the number of correctly ranked relations.
  • $R$ – all non-random relations
  • $\hat{R}(k)$ – top k% relations of the targets
Criteria
• Precision: $P(k) = \frac{|R \cap \hat{R}(k)|}{|\hat{R}(k)|}$
• Recall: $R(k) = \frac{|R \cap \hat{R}(k)|}{|R|}$
• We use P(10), P(20), P(50), R(50).
• Precision $P(50\%) = \frac{6}{7} \approx 0.86$

term, c_i    term, c_j     relation type   s_ij
aficionado   enthusiast    syn             0.07197
aficionado   fan           syn             0.05195
aficionado   admirer       syn             0.01964
aficionado   addict        syn             0.01326
aficionado   devotee       syn             0.01163
aficionado   foundling     random          0.00777
aficionado   fanatic       syn             0.00414
aficionado   adherent      syn             0.00353
aficionado   capital       random          0.00232
aficionado   statute       random          0.00029
aficionado   blot          random          0.00025
aficionado   meddler       random          0.00005
aficionado   enlargement   random          0.00003
aficionado   bawdyhouse    random          0.00000
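The example can be checked with a short sketch of precision and recall at k for a single target word. The function name and the rounding of k% to a whole number of relations are assumptions; the data are the fourteen rows of the table.

```python
def precision_recall_at_k(relations, k_percent):
    """relations: (relation_type, score) pairs of one target; k_percent: e.g. 50."""
    ranked = sorted(relations, key=lambda rel: -rel[1])
    top_n = max(1, round(len(ranked) * k_percent / 100))
    top = ranked[:top_n]
    # A relation counts as correct if it is not a random one.
    correct_top = sum(1 for rel_type, _ in top if rel_type != "random")
    correct_all = sum(1 for rel_type, _ in ranked if rel_type != "random")
    return correct_top / len(top), correct_top / correct_all


aficionado = [
    ("syn", 0.07197), ("syn", 0.05195), ("syn", 0.01964), ("syn", 0.01326),
    ("syn", 0.01163), ("random", 0.00777), ("syn", 0.00414), ("syn", 0.00353),
    ("random", 0.00232), ("random", 0.00029), ("random", 0.00025),
    ("random", 0.00005), ("random", 0.00003), ("random", 0.00000),
]
p50, r50 = precision_recall_at_k(aficionado, 50)
```

The top 50% holds seven relations, six of them synonyms, so `p50` is 6/7. The recall is also 6/7 here because the target has seven non-random relations in total.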
Figure: Precision-Recall graphs calculated on the BLESS dataset: (a) PatternSim measures; (b) the best PatternSim measure versus baselines.
Semantic Relation Extraction
Figure: Semantic relation extraction: precision at k.
• 49 words – vocabulary of the RG dataset
• three annotators, binary annotations
Conclusion
• We presented a similarity measure based on manually-crafted lexico-syntactic patterns.
• The measure provides results comparable to the baselines and does not require semantic resources.
• Future work – using a supervised model to
  • combine different factors;
  • tune the meta-parameters.
Data: http://cental.fltr.ucl.ac.be/team/~panchenko/sim-eval/
Code: http://github.com/cental/patternsim/
Demo: http://serelex.cental.be/
Introduction
Reference Paper
• Panchenko A., Morozova O. “A Study of Hybrid Similarity Measures for Semantic Relation Extraction”. In Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data, EACL 2012, pp. 10–18, 2012
The State of the Art
• A multitude of complementary measures were proposed to extract synonyms, hypernyms, and co-hyponyms
• Most of them are based on one of the 5 key approaches:
  1. distributional analysis (Lin, 1998b)
  2. web as a corpus (Cilibrasi and Vitanyi, 2007)
  3. lexico-syntactic patterns (Bollegala et al., 2007)
  4. semantic networks (Resnik, 1995)
  5. definitions of dictionaries or encyclopedias (Zesch et al., 2008a)
• Some attempts were made to combine measures (Curran, 2002; Cederberg and Widdows, 2003; Mihalcea et al., 2006; Agirre et al., 2009; Yang and Callan, 2009)
• However, most studies still do not take into account all 5 existing extraction approaches.
Contributions
• A systematic analysis of
  • 16 baseline similarity measures of 5 key extraction principles
  • their combinations with 8 fusion methods
• Hybrid similarity measures based on all the 5 extraction approaches:
  1. distributional analysis
  2. Web as a corpus
  3. lexico-syntactic patterns
  4. semantic networks
  5. definitions of dictionaries or encyclopedias
Single and Hybrid Similarity Measures
• 16 single measures
  • 5 measures based on a semantic network
  • 3 web-based measures
  • 5 corpus-based measures
    • 2 distributional
    • 1 lexico-syntactic patterns
    • 2 other co-occurrence based
  • 3 definition-based measures
• 64 hybrid measures
  • 8 combination methods
  • 8 measure sets obtained with 3 measure selection techniques
Features: Single Similarity Measures
Measures Based on a Semantic Network
1. Wu and Palmer (1994)
2. Leacock and Chodorow (1998)
3. Resnik (1995)
4. Jiang and Conrath (1997)
5. Lin (1998)
Data:
• WordNet 3.0
• SemCor corpus
Variables:
• Lengths of the shortest paths between terms in the network
• Probability of terms derived from a corpus
Coverage: 155,287 English terms encoded in WordNet 3.0.
Web-based Measures
Normalized Google Distance (NGD) (Cilibrasi and Vitanyi, 2007)
6. NGD-Yahoo!
7. NGD-Bing
8. NGD-Google over wikipedia.org domain
Data: number of times the terms co-occur in the documents as indexed by an IR system.
Variables:
• number of hits returned by the query "$c_i$"
• number of hits returned by the query "$c_i$ AND $c_j$"
Coverage: huge vocabulary in dozens of languages.
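For reference, a sketch of the NGD formula of Cilibrasi and Vitanyi (2007) computed from hit counts. The total number of indexed documents is an extra input that is not listed on the slide, so it is an assumed parameter here, as are the function and argument names. NGD is a distance (lower means more related); how it is turned into a similarity score is not specified above and is left out.

```python
import math


def ngd(hits_i, hits_j, hits_ij, total_docs):
    """Normalized Google Distance from hit counts.

    hits_i, hits_j: hits for the queries "c_i" and "c_j" (both must be > 0)
    hits_ij:        hits for the query "c_i AND c_j"
    total_docs:     assumed size of the search engine index
    """
    if hits_ij == 0:
        return math.inf  # the terms never co-occur
    log_i, log_j, log_ij = math.log(hits_i), math.log(hits_j), math.log(hits_ij)
    return (max(log_i, log_j) - log_ij) / (math.log(total_docs) - min(log_i, log_j))
```

Terms that always occur together get a distance of 0, for example `ngd(1000, 1000, 1000, 10**6)`.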
Corpus-based Measures
9. Bag-of-words Distributional Analysis (BDA) (Sahlgren, 2006)
10. Syntactic Distributional Analysis (SDA) (Curran, 2003)
Data: WaCkypedia (800M tokens) and PukWaC (2000M tokens) corpora (Baroni et al., 2009)
Variables:
• feature vector based on the context window
• feature vector based on the syntactic context
Coverage: a word should occur in the corpora.
11. A measure based on lexico-syntactic patterns
Data: WaCkypedia corpus (800M tokens)
Method:
• 10 patterns for hypernymy extraction: 6 Hearst (1992) patterns + 4 other patterns
  • such diverse {[occupations]} as {[doctors]}, {[engineers]} and {[scientists]}[PATTERN=1]
• Efreq: the semantic similarity $s_{ij}$ between terms $c_i, c_j \in C$ is the number of term co-occurrences in the same concordance, $n_{ij}$, normalized by its maximum:
  $\mathrm{sim}(c_i, c_j) = s_{ij} = \frac{n_{ij}}{\max_{ij}(n_{ij})}$
12. Latent Semantic Analysis (LSA) on TASA corpus (Landauer and Dumais, 1997)
13. NGD on Factiva corpus (Veksler et al., 2008)
Definition-based Measures
14. Extended Lesk (Banerjee and Pedersen, 2003)
15. GlossVectors (Patwardhan and Pedersen, 2006)
Data: WordNet glosses.
Variables:
• bag-of-words vector of a term $c_i$ derived from the glosses
• relation between words $(c_i, c_j)$ in the network
Coverage: 117,659 glosses encoded in WordNet 3.0
16. WktWiki – BDA on definitions of Wiktionary and Wikipedia [1]
Data: Wikipedia abstracts, Wiktionary.
Method:
• Definition = abstract of the Wikipedia article with title "$c_i$" + glosses, examples, quotations, related words, categories from Wiktionary for $c_i$
• Represent a definition as a bag-of-words vector
• Calculate similarities with cosine
• Update similarities according to relations in the Wiktionary.
Coverage: Wiktionary: 536,594 glosses, Wikipedia: 3.8M articles
[1] The method stems from the work of Zesch et al. (2008)
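The bag-of-words and cosine steps of the method can be sketched as below. Whitespace tokenization, lowercasing and raw term counts are simplifying assumptions, the two definitions are invented examples, and the final update from Wiktionary relations is not shown.

```python
import math
from collections import Counter


def bag_of_words(definition):
    """Represent a definition as a sparse vector of lowercase token counts."""
    return Counter(definition.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented definitions, for illustration only.
def_cat = bag_of_words("a small domesticated carnivorous mammal")
def_dog = bag_of_words("a domesticated carnivorous mammal that barks")
sim_cat_dog = cosine(def_cat, def_dog)
```

Definitions that share no token get a similarity of 0, which is one reason the coverage of the underlying resources matters for this measure.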
Hybrid Similarity Measures
Combination Methods
• The goal of a combination method is to produce “better” similarity scores than the scores of single measures.
• A combination method takes as input $\{S_1, \ldots, S_K\}$ produced by $K$ single measures and outputs $S_{cmb}$.
• $s_{ij}^k \in S_k$ is the pairwise similarity score of terms $c_i$ and $c_j$ produced by the $k$-th measure.
• We tested 8 combination methods.
1. Mean. A mean of $K$ pairwise similarity scores:
  $S_{cmb} = \frac{1}{K} \sum_{k=1}^{K} S_k \Leftrightarrow s_{ij}^{cmb} = \frac{1}{K} \sum_{k=1}^{K} s_{ij}^k$
2. Mean-Nnz. A mean of scores having non-zero value:
  $s_{ij}^{cmb} = \frac{1}{|\{k : s_{ij}^k > 0,\ k = 1, \ldots, K\}|} \sum_{k=1}^{K} s_{ij}^k$
3. Mean-Zscore. A mean of scores transformed into Z-scores:
  $S_{cmb} = \frac{1}{K} \sum_{k=1}^{K} \frac{S_k - \mu_k}{\sigma_k}$
  where $\mu_k$ and $\sigma_k$ are the mean and the standard deviation of the scores of the $k$-th measure ($S_k$).
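The three mean-based combinations can be sketched as follows, assuming every measure is given as a list of non-negative scores over the same ordered list of term pairs. The function names, the layout and the toy scores are illustrative assumptions.

```python
from statistics import mean, pstdev


def comb_mean(scores):
    """Mean: average the K scores of each pair. scores[k][p] is measure k, pair p."""
    return [mean(column) for column in zip(*scores)]


def comb_mean_nnz(scores):
    """Mean-Nnz: divide the sum by the number of measures with a non-zero score."""
    combined = []
    for column in zip(*scores):
        non_zero = [s for s in column if s > 0]
        combined.append(sum(non_zero) / len(non_zero) if non_zero else 0.0)
    return combined


def comb_mean_zscore(scores):
    """Mean-Zscore: standardize each measure, then average per pair."""
    standardized = []
    for s_k in scores:
        mu_k, sigma_k = mean(s_k), pstdev(s_k)
        standardized.append([(s - mu_k) / sigma_k if sigma_k else 0.0 for s in s_k])
    return comb_mean(standardized)


# Two hypothetical measures scored on three term pairs.
toy_scores = [
    [0.2, 0.0, 0.8],
    [0.4, 0.6, 0.0],
]
```

Mean-Nnz avoids penalizing a pair that one measure simply does not cover, which matters when measures have very different coverage. Mean-Zscore puts measures with different score ranges on a common scale before averaging.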
4. Median. A median of $K$ pairwise similarities:
  $s_{ij}^{cmb} = \mathrm{median}(s_{ij}^1, \ldots, s_{ij}^K)$
5. Max. A maximum of $K$ pairwise similarities:
  $s_{ij}^{cmb} = \max(s_{ij}^1, \ldots, s_{ij}^K)$
6. RankFusion. A mean of scores converted to ranks:
  $s_{ij}^{cmb} = \frac{1}{K} \sum_{k=1}^{K} r_{ij}^k$
  where $r_{ij}^k$ is the rank corresponding to the similarity score $s_{ij}^k$.
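A sketch of Median, Max and RankFusion under the same assumed layout (one list of scores per measure, aligned over the same term pairs). The ranking convention is an assumption: rank 1 is given to the highest score and ties are broken by position, so a lower fused value means a more similar pair.

```python
from statistics import mean, median


def comb_median(scores):
    """Median of the K scores of each pair. scores[k][p] is measure k, pair p."""
    return [median(column) for column in zip(*scores)]


def comb_max(scores):
    """Maximum of the K scores of each pair."""
    return [max(column) for column in zip(*scores)]


def to_ranks(s_k):
    """Convert the scores of one measure to ranks (1 = highest score)."""
    order = sorted(range(len(s_k)), key=lambda p: -s_k[p])
    ranks = [0] * len(s_k)
    for rank, p in enumerate(order, start=1):
        ranks[p] = rank
    return ranks


def comb_rank_fusion(scores):
    """RankFusion: mean rank of each pair over the K measures."""
    rank_lists = [to_ranks(s_k) for s_k in scores]
    return [mean(column) for column in zip(*rank_lists)]


# Three hypothetical measures scored on three term pairs.
toy_scores = [
    [0.2, 0.9, 0.5],
    [0.1, 0.3, 0.8],
    [0.4, 0.7, 0.6],
]
```

Ranks discard the score scale entirely, so RankFusion is insensitive to measures whose raw scores live in very different ranges.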
7. RelationFusion.
• Unions the top relations found by each measure separately.
• A relation extracted by several measures has more weight.
• See (Panchenko and Morozova, 2012) for details.
8. Logit. A supervised combination of similarity measures
• Training a binary classifier (a Logistic Regression) on a set of manually constructed semantic relations $R$ (BLESS or SN)
  • Positive training examples are “meaningful” relations (synonyms, hyponyms, co-hyponyms, associations)
  • Negative training examples are pairs of semantically unrelated words (generated randomly and verified manually).
• A relation $\langle c_i, t, c_j \rangle \in R$ is represented with a $K$-dimensional vector of pairwise similarities: $x_{ij} = (s_{ij}^1, \ldots, s_{ij}^K)$.
• Category $y_{ij}$:
  $y_{ij} = \begin{cases} 0 & \text{if } \langle c_i, t, c_j \rangle \text{ is a random relation} \\ 1 & \text{otherwise} \end{cases}$
• Using the model $(w_0, w_1, \ldots, w_K)$ to combine measures:
  $s_{ij}^{cmb} = \frac{1}{1 + e^{-z}}, \quad z = w_0 + \sum_{k=1}^{K} w_k s_{ij}^k$
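A minimal sketch of the Logit combination. Plain batch gradient descent stands in for whatever Logistic Regression solver was actually used, and the learning rate, the number of epochs and the four training vectors are made-up assumptions for illustration.

```python
import math


def logit_combine(x_ij, w0, w):
    """s_cmb = 1 / (1 + exp(-z)), with z = w0 + sum_k w_k * s_k."""
    z = w0 + sum(w_k * s_k for w_k, s_k in zip(w, x_ij))
    return 1.0 / (1.0 + math.exp(-z))


def train_logit(X, y, learning_rate=0.5, epochs=2000):
    """Fit (w0, w) by batch gradient descent on the logistic loss."""
    w0, w = 0.0, [0.0] * len(X[0])
    for _ in range(epochs):
        grad0, grad = 0.0, [0.0] * len(w)
        for x, target in zip(X, y):
            error = logit_combine(x, w0, w) - target
            grad0 += error
            for k, s_k in enumerate(x):
                grad[k] += error * s_k
        w0 -= learning_rate * grad0 / len(X)
        w = [w_k - learning_rate * g_k / len(X) for w_k, g_k in zip(w, grad)]
    return w0, w


# Hypothetical training data: two measures, label 1 = meaningful, 0 = random.
X = [[0.9, 0.8], [0.8, 0.7], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, 0, 0]
w0, w = train_logit(X, y)
score_related = logit_combine([0.9, 0.8], w0, w)
score_random = logit_combine([0.1, 0.2], w0, w)
```

The learned weights also indicate how much each single measure contributes, which is what the analysis of LR weights in the measure selection step relies on.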
Measure Selection
A problem
Number of ways to choose which of 16 single measures to combine (non-empty subsets):
$2^{16} - 1 = 65{,}535$
• Expert choice of measures – 5, 9 and 15 measures
• Forward Stepwise Procedure – 7, 8a, 8b, 10 measures
• Analysis of LR weights – 12 measures
• The best predictors: C-BDA, C-SDA, C-LSA-Tasa, D-WktWiki, D-GlossVectors, D-ExtendedLesk.
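A generic sketch of a greedy forward stepwise search, which evaluates far fewer subsets than an exhaustive search. The slide does not say which criterion drives the procedure, so the `score` callback and the toy gains below are hypothetical stand-ins (for example, `score` could return precision on a development set).

```python
def forward_stepwise(measures, score, max_size=None):
    """Greedily add the measure that improves score(subset) the most."""
    selected, best = [], float("-inf")
    remaining = list(measures)
    while remaining and (max_size is None or len(selected) < max_size):
        candidate, candidate_score = max(
            ((m, score(selected + [m])) for m in remaining),
            key=lambda item: item[1],
        )
        if candidate_score <= best:
            break  # no remaining measure improves the current subset
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected, best


# Number of non-empty subsets of 16 measures.
n_subsets = 2 ** 16 - 1

# Toy criterion: each measure adds a fixed, made-up gain.
toy_gain = {"A": 0.3, "B": 0.2, "C": -0.1}
chosen, value = forward_stepwise(toy_gain, lambda subset: sum(toy_gain[m] for m in subset))
```

With 16 measures the greedy search calls `score` at most 16 + 15 + ... + 1 = 136 times, compared with 65,535 subsets for exhaustive enumeration.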
Results
Single Similarity Measures
Figure: Performance of 16 single similarity measures on human judgement datasets (MC, RG, WordSim353). The best scores in a group are in bold.
Figure: Performance of 16 single similarity measures on human judgement datasets (MC, RG, WordSim353) and semantic relation datasets (BLESS and SN). The best scores in a group are in bold.
Hybrid Similarity Measures
Figure: Performance of 16 single and 8 hybrid similarity measures on human judgement datasets (MC, RG, WordSim353) and semantic relation datasets (BLESS and SN). The best scores in a group (single/hybrid) are in bold; the very best scores are in grey.
Figure: Precision-Recall graphs calculated on the BLESS dataset of (a) 16 single measures and the best hybrid measure H-Logit-E15; (b) 8 hybrid measures.
Conclusion
• We have undertaken a study of 16 baseline measures, 8 combination methods, and 3 measure selection techniques.
• The proposed hybrid measures:
  • use all 5 main types of baseline measures;
  • outperform the single measures on all datasets.
• The best results were provided by
  • a combination of 15 corpus-, web-, network-, and definition-based measures
  • with Logistic Regression
  • ρ = 0.870, P(20) = 0.987, R(50) = 0.814.
Thank you! Questions?