TRANSCRIPT
Expressing Implicit Semantic Relations without Supervision
ACL 2006
2
Abstract
• For a given input word pair X:Y with unspecified semantic relations
  – The corresponding output is a list of patterns <p1, …, pm>, ranked according to how well each pattern pi expresses the relations between X and Y.
• For example, X = ostrich and Y = bird
  – "X is the largest Y" and "Y such as X"
• An unsupervised learning algorithm:
  – Mines large text corpora for patterns <p1, …, pm>
  – The patterns are sorted by pertinence
3
Introduction
• Hearst (1992): "Y such as the X"
  – X is a hyponym (type) of Y
  – For building a thesaurus
• Berland and Charniak (1999): "Y's X" and "X of the Y"
  – X is a meronym (part) of Y
  – For building a lexicon or ontology, like WordNet
• This paper addresses the inverse of this problem:
  – Given a word pair X:Y with some unspecified semantic relations
  – Mine a large text corpus for lexico-syntactic patterns that express the implicit relations between X and Y.
4
Introduction
• A corpus of web pages: 5×10^10 English words
  – From co-occurrences of the pair ostrich:bird in this corpus:
    • 516 patterns of the form "X … Y"
    • 452 patterns of the form "Y … X"
• Main challenges:
  – To find a way of ranking the patterns
  – To find a way to empirically evaluate the performance
5
Pertinence - 1/3
• mason:stone vs. carpenter:wood show a high degree of relational similarity.
• Assumptions:
  – There is a measure of the relational similarity between pairs of words, sim_r(X1:Y1, X2:Y2).
  – Let W = {X1:Y1, …, Xn:Yn} be a set of word pairs.
  – Let P = {P1, …, Pm} be a set of patterns.
• The pertinence of a pattern Pi to a word pair Xj:Yj is the expected relational similarity between Xj:Yj and a word pair Xk:Yk drawn with probability p(Xk:Yk | Pi).
6
Pertinence - 2/3
• Let f_{k,i} be the number of occurrences of the word pair Xk:Yk with the pattern Pi.
• pertinence(Xj:Yj, Pi) = Σ_{k=1..n} p(Xk:Yk | Pi) × sim_r(Xj:Yj, Xk:Yk)
  – p(Xk:Yk | Pi): conditional probability, estimated from the frequencies f_{k,i}
  – sim_r(Xj:Yj, Xk:Yk): relational similarity
7
Pertinence - 3/3
• Assume p(Xj:Yj) = 1/n for all pairs in W (a uniform prior, playing the role of Laplace smoothing).
• With this assumption, Bayes' theorem gives the conditional probability from the raw frequency data:
  – p(Xk:Yk | Pi) = p(Pi | Xk:Yk) / Σ_{j=1..n} p(Pi | Xj:Yj), with p(Pi | Xk:Yk) estimated as f_{k,i} / Σ_{i'} f_{k,i'}
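A minimal sketch of this computation in Python. The names F, rel_sim, and pertinence are illustrative, not from the paper; it assumes every pair in W co-occurs with at least one pattern (pairs that never co-occur are dropped, per the algorithm slides), and smoothing is omitted:

import numpy as np

def pertinence(j, i, F, rel_sim):
    # Pertinence of pattern P_i to word pair X_j:Y_j.
    # F[k, i] holds f_{k,i}, the co-occurrence count of pair X_k:Y_k
    # with pattern P_i; rel_sim(j, k) returns sim_r(X_j:Y_j, X_k:Y_k).
    n = F.shape[0]
    # p(P_i | X_k:Y_k): row-normalized raw frequencies
    p_pattern_given_pair = F[:, i] / F.sum(axis=1)
    # Bayes with the uniform prior p(X_k:Y_k) = 1/n, which cancels out
    p_pair_given_pattern = p_pattern_given_pair / p_pattern_given_pair.sum()
    # Expected relational similarity under p(X_k:Y_k | P_i)
    return sum(p_pair_given_pattern[k] * rel_sim(j, k) for k in range(n))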
8
The Algorithm
• Goal:
  – Input: a set of word pairs W = {X1:Y1, …, Xn:Yn}
  – Output: a ranked list of patterns <p1, …, pm> for each input pair
• 1. Find phrases:
  – Corpus: 5×10^10 English words
  – For each pair, list the phrases that begin with Xi and end with Yi
  – And a second list for the opposite order
  – One to three intervening words between Xi and Yi
9
The Algorithm
– The first and last words in the phrase need not exactly match Xi and Yi (different suffixes are allowed); a sketch of this phrase search is below.
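A minimal sketch of the phrase search over a text string. The regex approach and the name find_phrases are illustrative; the paper searches a 5×10^10-word web corpus, and the suffix variation mentioned above is omitted here:

import re

def find_phrases(text, x, y, max_gap=3):
    # Phrases that begin with x and end with y, separated by
    # one to max_gap intervening words.
    pattern = re.compile(
        r"\b%s\s+(?:\w+\s+){1,%d}%s\b" % (re.escape(x), max_gap, re.escape(y))
    )
    return [m.group(0) for m in pattern.finditer(text)]

# find_phrases("the carpenter nails the wood today", "carpenter", "wood")
# -> ["carpenter nails the wood"]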
• 2. Generate patterns:
  – For example, the phrase "carpenter nails the wood" yields (see the sketch below):
    • "X nails the Y"
    • "X nails * Y"
    • "X * the Y"
    • "X * * Y"
  – Xi comes first and Yi last, or vice versa
  – Do not allow duplicate patterns in a list
  – Pattern frequency plays the role of term frequency in IR
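A minimal sketch of this wildcard expansion (the name generate_patterns is illustrative): every subset of the intervening words is replaced with "*", giving 2^k patterns for k intervening words.

from itertools import product

def generate_patterns(phrase):
    # "carpenter nails the wood" -> {"X nails the Y", "X nails * Y",
    #                                "X * the Y", "X * * Y"}
    inner = phrase.split()[1:-1]             # the intervening words
    patterns = set()
    for mask in product((True, False), repeat=len(inner)):
        body = [w if keep else "*" for w, keep in zip(inner, mask)]
        patterns.add(" ".join(["X"] + body + ["Y"]))
    return patterns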
10
The Algorithm
• 3. Count pair frequency:
  – Pair frequency (analogous to document frequency in IR) for a pattern is the number of lists that contain the given pattern.
• 4. Map pairs to rows:
  – For each pair Xi:Yi, create a row for Xi:Yi and another row for Yi:Xi.
• 5. Map patterns to columns:
  – For each unique pattern of the form "X … Y" (from step 2), create a column, and another column with X and Y swapped ("Y … X").
11
The Algorithm
• 6. Build a sparse matrix:
  – Build a matrix X, where the value xij is the pattern frequency of the j-th pattern for the i-th word pair.
• 7. Calculate entropy:
  – Apply the log and entropy transformations: each cell xij is replaced by log(xij) weighted by an entropy-based factor H(P) for its pattern (column).
  – H(X) = -Σ_{x∈X} p(x) log2 p(x)
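The slide abbreviates the log-entropy weighting familiar from LSA (Landauer and Dumais, 1997). A common form is sketched below; the exact weighting used in the paper may differ in detail:

import numpy as np

def log_entropy(X):
    # X: raw matrix, rows = word pairs, columns = patterns.
    n = X.shape[0]
    cols = X.sum(axis=0)
    cols[cols == 0] = 1                         # avoid division by zero
    P = X / cols                                # p(row | column)
    with np.errstate(divide="ignore", invalid="ignore"):
        logP = np.where(P > 0, np.log2(P), 0.0)
    H = -(P * logP).sum(axis=0)                 # entropy of each column
    weight = 1.0 - H / np.log2(n)               # informative columns weigh more
    return np.log(X + 1.0) * weight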
• 8. Apply SVD (singular value decomposition):
  – SVD is used to reduce noise and compensate for sparseness.
12
The Algorithm
– X = U Σ V^T
  • U and V are in column-orthonormal form; Σ is a diagonal matrix of singular values.
  • If X is of rank r, then Σ is also of rank r.
  • Let Σ_k (k < r) be the diagonal matrix formed from the top k singular values.
  • Let U_k and V_k be the matrices produced by selecting the corresponding columns from U and V.
  • k = 300
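A minimal sketch of the truncated SVD step with NumPy (dense SVD is used here for simplicity; the paper's matrix is sparse, so a sparse SVD routine would be used in practice):

import numpy as np

def project_rows(X, k=300):
    # X = U @ diag(s) @ Vt, with singular values s in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the top k singular values; the rows of U_k @ diag(s_k)
    # are the reduced vectors used in the cosine computation below.
    return U[:, :k] * s[:k]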
13
The Algorithm
• 9. Calculate cosines:
  – sim_r(X1:Y1, X2:Y2) is given by the cosine of the angle between the corresponding row vectors of the matrix U_k Σ_k V_k^T.
• 10. Calculate conditional probabilities:
  – Using Bayes' theorem and the raw frequency data (as on the Pertinence - 3/3 slide).
• 11. Calculate pertinence:
  – Combine the conditional probabilities from step 10 with the cosines from step 9 in the pertinence formula (a sketch of the cosine step follows).
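A minimal sketch of step 9, using the reduced row vectors from the SVD sketch above; this cosine can serve as rel_sim in the earlier pertinence sketch. Since V_k has orthonormal columns, cosines between rows of U_k Σ_k equal cosines between the corresponding rows of U_k Σ_k V_k^T:

import numpy as np

def row_cosine(A, i, j):
    # A: matrix of reduced row vectors (U_k @ diag(s_k));
    # rows i and j correspond to two word pairs.
    u, v = A[i], A[j]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))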
14
Experiments with Word Analogies
• 374 college-level SAT analogy questions
  – Stem word pair: ostrich:bird
    • Choices: (a) lion:cat (b) goose:flock (c) ewe:sheep (d) cub:bear (e) primate:monkey
  – Rows: 374 × 6 × 2 = 4488 (each question contributes a stem pair plus five choice pairs, in both orders)
    • Pairs that do not co-occur in the corpus are dropped.
    • 4191 rows remain.
  – Columns:
    • 1,706,845 patterns (3,413,690 columns)
    • All patterns with a frequency less than ten are dropped.
    • 42,032 patterns remain (84,064 columns).
  – Matrix density is 0.91%.
15
[Example question and pattern rankings not transcribed]
16
[Results not transcribed]
17
• 15 SAT questions were skipped (their word pairs do not co-occur in the corpus).
• Notation used in the results:
  – f: pattern frequency
  – F: maximum f
  – n: pair frequency
  – N: total number of word pairs
18
Experiments with Noun-Modifiers - 1/3
• 600 noun-modifier pairs
• 5 general classes of labels, with 30 subclasses
  – flu virus: causality relation (the flu is caused by a virus)
  – causality (storm cloud), temporality (daily exercise), spatial (desert storm), participant (student protest), and quality (expensive book)
• Matrix:
  – 1184 rows and 33,698 columns
  – density is 2.57%
19
Experiments with Noun-Modifiers - 2/3
• Leave-one-out cross-validation:
  – The testing set consists of a single noun-modifier pair, and the training set consists of the 599 remaining noun-modifiers (a sketch of this protocol follows).
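A minimal sketch of the leave-one-out protocol described above. The classify function is a placeholder, since the slide does not specify the classifier used:

def leave_one_out(pairs, labels, classify):
    # pairs: the 600 noun-modifier pairs; labels: their relation classes.
    correct = 0
    for i in range(len(pairs)):
        # Train on the 599 remaining pairs, test on the held-out pair.
        train = [(p, c) for j, (p, c) in enumerate(zip(pairs, labels)) if j != i]
        if classify(pairs[i], train) == labels[i]:
            correct += 1
    return correct / len(pairs)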
20
Experiments with Noun-Modifiers - 3/3
[Results not transcribed]
21
Conclusion
• Addresses how word pairs are similar: expressing the relation itself, not just measuring the degree of similarity.
• The main contribution of this paper is the idea of pertinence
• Although the performance on the SAT analogy questions (54.6%) is near the level of the average senior high school student (57%), there is room for improvement.