TRANSCRIPT
Expressing Implicit Semantic Relations without Supervision
ACL 2006
2
Abstract
• For a given input word pair X:Y with unspecified semantic relations
  – The corresponding output is a list of patterns <p1, …, pm>, ranked according to how well each pattern pi expresses the relations between X and Y.
• For example, X = ostrich and Y = bird
  – "X is the largest Y" and "Y such as X"
• An unsupervised learning algorithm:
  – Mines large text corpora for patterns <p1, …, pm>
  – The patterns are sorted by pertinence
3
Introduction
• Hearst (1992): "Y such as the X"
  – X is a hyponym (type) of Y
  – For building a thesaurus
• Berland and Charniak (1999): "Y's X" and "X of the Y"
  – X is a meronym (part) of Y
  – For building a lexicon or ontology, like WordNet
• This paper addresses the inverse of this problem:
  – Given a word pair X:Y with some unspecified semantic relations
  – Mine a large text corpus for lexico-syntactic patterns that express the implicit relations between X and Y.
4
Introduction
• A corpus of web pages: 5×10^10 English words
  – From co-occurrences of the pair ostrich:bird in this corpus:
    • 516 patterns of the form "X … Y"
    • 452 patterns of the form "Y … X"
• Main challenges:
  – To find a way of ranking the patterns
  – To find a way to empirically evaluate the performance
5
Pertinence - 1/3
• mason:stone vs. carpenter:wood show a high degree of relational similarity.
• Assumptions:
  – There is a measure of the relational similarity between pairs of words, sim_r(X1:Y1, X2:Y2).
  – Let W = {X1:Y1, …, Xn:Yn} be a set of word pairs.
  – Let P = {P1, …, Pm} be a set of patterns.
• The pertinence of a pattern Pi to a word pair Xj:Yj is the expected relational similarity between Xj:Yj and a word pair Xk:Yk drawn with probability p(Xk:Yk | Pi).
6
Pertinence - 2/3
• Let f_{k,i} be the number of occurrences of the word pair Xk:Yk with the pattern Pi.
• pertinence(Xj:Yj, Pi) = Σ_{k=1..n} p(Xk:Yk | Pi) × sim_r(Xj:Yj, Xk:Yk)
  – p(Xk:Yk | Pi): conditional probability, estimated from the frequencies f_{k,i}
  – sim_r(Xj:Yj, Xk:Yk): relational similarity
7
Pertinence - 3/3
• Assume p(Xj:Yj) = 1/n for all pairs in W (a uniform prior, playing the role of Laplace smoothing).
• With this assumption, Bayes' theorem gives the conditional probability from the raw frequency data:
  – p(Xk:Yk | Pi) = p(Pi | Xk:Yk) / Σ_{j=1..n} p(Pi | Xj:Yj), with p(Pi | Xk:Yk) estimated as f_{k,i} / Σ_{i'} f_{k,i'}
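A minimal sketch of this computation in Python. The names F, rel_sim, and pertinence are illustrative, not from the paper; it assumes every pair in W co-occurs with at least one pattern (pairs that never co-occur are dropped, per the algorithm slides), and smoothing is omitted:

import numpy as np

def pertinence(j, i, F, rel_sim):
    # Pertinence of pattern P_i to word pair X_j:Y_j.
    # F[k, i] holds f_{k,i}, the co-occurrence count of pair X_k:Y_k
    # with pattern P_i; rel_sim(j, k) returns sim_r(X_j:Y_j, X_k:Y_k).
    n = F.shape[0]
    # p(P_i | X_k:Y_k): row-normalized raw frequencies
    p_pattern_given_pair = F[:, i] / F.sum(axis=1)
    # Bayes with the uniform prior p(X_k:Y_k) = 1/n, which cancels out
    p_pair_given_pattern = p_pattern_given_pair / p_pattern_given_pair.sum()
    # Expected relational similarity under p(X_k:Y_k | P_i)
    return sum(p_pair_given_pattern[k] * rel_sim(j, k) for k in range(n))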
8
The Algorithm
• Goal:
  – Input: a set of word pairs W = {X1:Y1, …, Xn:Yn}
  – Output: a ranked list of patterns <p1, …, pm> for each input pair
• 1. Find phrases:
  – Corpus: 5×10^10 English words
  – For each pair, list the phrases that begin with Xi and end with Yi
  – And a second list for the opposite order
  – One to three intervening words between Xi and Yi
9
The Algorithm
– The first and last words in the phrase need not exactly match Xi and Yi (different suffixes are allowed); a sketch of this phrase search is below.
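A minimal sketch of the phrase search over a text string. The regex approach and the name find_phrases are illustrative; the paper searches a 5×10^10-word web corpus, and the suffix variation mentioned above is omitted here:

import re

def find_phrases(text, x, y, max_gap=3):
    # Phrases that begin with x and end with y, separated by
    # one to max_gap intervening words.
    pattern = re.compile(
        r"\b%s\s+(?:\w+\s+){1,%d}%s\b" % (re.escape(x), max_gap, re.escape(y))
    )
    return [m.group(0) for m in pattern.finditer(text)]

# find_phrases("the carpenter nails the wood today", "carpenter", "wood")
# -> ["carpenter nails the wood"]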
• 2. Generate patterns:
  – For example, the phrase "carpenter nails the wood" yields (see the sketch below):
    • "X nails the Y"
    • "X nails * Y"
    • "X * the Y"
    • "X * * Y"
  – Xi comes first and Yi last, or vice versa
  – Do not allow duplicate patterns in a list
  – Pattern frequency plays the role of term frequency in IR
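A minimal sketch of this wildcard expansion (the name generate_patterns is illustrative): every subset of the intervening words is replaced with "*", giving 2^k patterns for k intervening words.

from itertools import product

def generate_patterns(phrase):
    # "carpenter nails the wood" -> {"X nails the Y", "X nails * Y",
    #                                "X * the Y", "X * * Y"}
    inner = phrase.split()[1:-1]             # the intervening words
    patterns = set()
    for mask in product((True, False), repeat=len(inner)):
        body = [w if keep else "*" for w, keep in zip(inner, mask)]
        patterns.add(" ".join(["X"] + body + ["Y"]))
    return patterns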
10
The Algorithm
• 3. Count pair frequency:
  – Pair frequency (analogous to document frequency in IR) for a pattern is the number of lists that contain the given pattern.
• 4. Map pairs to rows:
  – For each pair Xi:Yi, create a row for Xi:Yi and another row for Yi:Xi.
• 5. Map patterns to columns:
  – For each unique pattern of the form "X … Y" (from step 2), create a column, and another column with X and Y swapped ("Y … X").
11
The Algorithm
• 6. Build a sparse matrix:
  – Build a matrix X, where the value xij is the pattern frequency of the j-th pattern for the i-th word pair.
• 7. Calculate entropy:
  – Apply the log and entropy transformations: each cell xij is replaced by log(xij) weighted by an entropy-based factor H(P) for its pattern (column).
  – H(X) = -Σ_{x∈X} p(x) log2 p(x)
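The slide abbreviates the log-entropy weighting familiar from LSA (Landauer and Dumais, 1997). A common form is sketched below; the exact weighting used in the paper may differ in detail:

import numpy as np

def log_entropy(X):
    # X: raw matrix, rows = word pairs, columns = patterns.
    n = X.shape[0]
    cols = X.sum(axis=0)
    cols[cols == 0] = 1                         # avoid division by zero
    P = X / cols                                # p(row | column)
    with np.errstate(divide="ignore", invalid="ignore"):
        logP = np.where(P > 0, np.log2(P), 0.0)
    H = -(P * logP).sum(axis=0)                 # entropy of each column
    weight = 1.0 - H / np.log2(n)               # informative columns weigh more
    return np.log(X + 1.0) * weight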
• 8. Apply SVD (singular value decomposition):
  – SVD is used to reduce noise and compensate for sparseness.
12
The Algorithm
– X = U Σ V^T
  • U and V are in column-orthonormal form; Σ is a diagonal matrix of singular values.
  • If X is of rank r, then Σ is also of rank r.
  • Let Σ_k (k < r) be the diagonal matrix formed from the top k singular values.
  • Let U_k and V_k be the matrices produced by selecting the corresponding columns from U and V.
  • k = 300
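A minimal sketch of the truncated SVD step with NumPy (dense SVD is used here for simplicity; the paper's matrix is sparse, so a sparse SVD routine would be used in practice):

import numpy as np

def project_rows(X, k=300):
    # X = U @ diag(s) @ Vt, with singular values s in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the top k singular values; the rows of U_k @ diag(s_k)
    # are the reduced vectors used in the cosine computation below.
    return U[:, :k] * s[:k]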
13
The Algorithm
• 9. Calculate cosines:
  – sim_r(X1:Y1, X2:Y2) is given by the cosine of the angle between the corresponding row vectors of the matrix U_k Σ_k V_k^T.
• 10. Calculate conditional probabilities:
  – Using Bayes' theorem and the raw frequency data (as on the Pertinence - 3/3 slide).
• 11. Calculate pertinence:
  – Combine the conditional probabilities from step 10 with the cosines from step 9 in the pertinence formula (a sketch of the cosine step follows).
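A minimal sketch of step 9, using the reduced row vectors from the SVD sketch above; this cosine can serve as rel_sim in the earlier pertinence sketch. Since V_k has orthonormal columns, cosines between rows of U_k Σ_k equal cosines between the corresponding rows of U_k Σ_k V_k^T:

import numpy as np

def row_cosine(A, i, j):
    # A: matrix of reduced row vectors (U_k @ diag(s_k));
    # rows i and j correspond to two word pairs.
    u, v = A[i], A[j]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))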
14
Experiments with Word Analogies
• 374 college-level SAT analogy questions
  – Stem word pair: ostrich:bird
    • Choices: (a) lion:cat (b) goose:flock (c) ewe:sheep (d) cub:bear (e) primate:monkey
  – Rows: 374 × 6 × 2 = 4488 (each question contributes a stem pair plus five choice pairs, in both orders)
    • Pairs that do not co-occur in the corpus are dropped.
    • 4191 rows remain.
  – Columns:
    • 1,706,845 patterns (3,413,690 columns)
    • All patterns with a frequency less than ten are dropped.
    • 42,032 patterns remain (84,064 columns).
  – Matrix density is 0.91%.
15
[Example question and pattern rankings not transcribed]
16
[Results not transcribed]
17
• 15 SAT questions were skipped (their word pairs do not co-occur in the corpus).
• Notation used in the results:
  – f: pattern frequency
  – F: maximum f
  – n: pair frequency
  – N: total number of word pairs
18
Experiments with Noun-Modifiers - 1/3
• 600 noun-modifier pairs
• 5 general classes of labels, with 30 subclasses
  – flu virus: causality relation (the flu is caused by a virus)
  – causality (storm cloud), temporality (daily exercise), spatial (desert storm), participant (student protest), and quality (expensive book)
• Matrix:
  – 1184 rows and 33,698 columns
  – density is 2.57%
19
Experiments with Noun-Modifiers - 2/3
• Leave-one-out cross-validation:
  – The testing set consists of a single noun-modifier pair, and the training set consists of the 599 remaining noun-modifiers (a sketch of this protocol follows).
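A minimal sketch of the leave-one-out protocol described above. The classify function is a placeholder, since the slide does not specify the classifier used:

def leave_one_out(pairs, labels, classify):
    # pairs: the 600 noun-modifier pairs; labels: their relation classes.
    correct = 0
    for i in range(len(pairs)):
        # Train on the 599 remaining pairs, test on the held-out pair.
        train = [(p, c) for j, (p, c) in enumerate(zip(pairs, labels)) if j != i]
        if classify(pairs[i], train) == labels[i]:
            correct += 1
    return correct / len(pairs)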
20
Experiments with Noun-Modifiers - 3/3
[Results not transcribed]
21
Conclusion
• Addresses how word pairs are similar: expressing the relation itself, not just measuring the degree of similarity.
• The main contribution of this paper is the idea of pertinence
• Although the performance on the SAT analogy questions (54.6%) is near the level of the average senior high school student (57%), there is room for improvement.