graph-based analysis of semantic drift in espresso-like bootstrapping algorithms

30
Graph-based Analysis of Semantic Drift in Espresso-like Bootstrapping Algorithms EMNLP 2008 22/06/28 Mamoru Komachi , Taku Kudo , Masashi Shimbo and Yuji Matsumoto Nara Institute of Science and Technology, Japan Google Inc.

Upload: luther

Post on 25-Feb-2016

21 views

Category:

Documents


1 download

DESCRIPTION

Mamoru Komachi † , Taku Kudo ‡ , Masashi Shimbo † and Yuji Matsumoto † † Nara Institute of Science and Technology, Japan ‡ Google Inc. Graph-based Analysis of Semantic Drift in Espresso-like Bootstrapping Algorithms. EMNLP 2008. Bootstrapping and semantic drift. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

Graph-based Analysis of Semantic Drift in

Espresso-like Bootstrapping Algorithms

EMNLP 2008

23/04/22

Mamoru Komachi†, Taku Kudo‡, Masashi Shimbo† and Yuji Matsumoto†

†Nara Institute of Science and Technology, Japan‡Google Inc.

Page 2: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

04/22/232

Bootstrapping and semantic drift Bootstrapping grows small amount of seed instances

Buy # at Apple Store

Generic patterns= Patterns which co-occur with many (irrelevant) instances

Seed instance Extracted instance

iPhone

iPod # for sale

Co-occurrence pattern

iPod

MacBook Air

car

house

Page 3: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

04/22/233

Main contributions of this work

3

1. Suggest a parallel between semantic drift in Espresso-like bootstrapping and topic drift in HITS (Kleinberg, 1999)

2. Solve semantic drift by graph kernels used in link analysis community

Page 4: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

Espresso Algorithm [Pantel and Pennacchiotti, 2006]

Repeat Pattern extraction Pattern ranking Pattern selection Instance extraction Instance ranking Instance selection

Until a stopping criterion is met

04/22/234

Page 5: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

Espresso uses pattern-instance matrix Afor ranking patterns and instances|P|×|I|-dimensional matrix holding the (normalized) pointwise mutual information (pmi) between patterns and instances

04/22/235

1 2 ... i ... |I|

1

2

:

p

:

|P|

[A]p,i = pmi(p,i) / maxp,i pmi(p,i)

instance indices

pattern indices

Page 6: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

Pattern/instance ranking in Espressop = pattern score vectori = instance score vectorA = pattern-instance matrix

04/22/236

p ← 1IAi... pattern ranking

i ← 1PATp... instance ranking

Reliable instances are supported by reliable patterns, and vice versa

|P| = number of patterns|I| = number of instances

normalization factorsto keep score vectors not too large

Page 7: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

Espresso Algorithm [Pantel and Pennacchiotti, 2006]

Repeat Pattern extraction Pattern ranking Pattern selection Instance extraction Instance ranking Instance selection

Until a stopping criterion is met

04/22/237

For graph-theoretic analysis,

we will introduce 3 simplifications to

Espresso

Page 8: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

First simplification Compute the pattern-instance matrix Repeat

Pattern extraction Pattern ranking Pattern selection Instance extraction Instance ranking Instance selection

Until a stopping criterion is met

04/22/238

Simplification 1Remove pattern/instance extraction stepsInstead, pre-compute all patterns and instances once in the beginning of the algorithm

Page 9: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

Second simplification Compute the pattern-instance matrix Repeat

Pattern ranking Pattern selection

Instance ranking Instance selection

Until a stopping criterion is met

04/22/239

Simplification 2Remove pattern/instance selection stepswhich retain only highest scoring k patterns / m instances for the next iterationi.e., reset the scores of other items to 0

Instead,retain scores of all patterns and instances

Page 10: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

Third simplification Compute the pattern-instance matrix Repeat

Pattern ranking

Instance ranking

Until a stopping criterion is met

04/22/2310

Until score vectors p and i converge

Simplification 3No early stopping i.e., run until convergence

Page 11: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

Input Initial score vector of seed instances Pattern-instance co-occurrence matrix A

Main loopRepeat

Until i and p converge

OutputInstance and pattern score vectors i and p

p ← 1IAi... pattern ranking

i ← 1PATp... instance ranking

04/22/2311

Simplified Espresso

i = 0,1,0,0( )

Page 12: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

04/22/2312

Simplified Espresso is HITSSimplified Espresso = HITS in a bipartite graph whose adjacency matrix is A

Problem

No matter which seed you start with, the same instance is always ranked topmostSemantic drift (also called topic drift in HITS)

The ranking vector i tends to the principal eigenvector of ATA as the iteration proceedsregardless of the seed instances!

Page 13: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

How about the original Espresso?Original Espresso has heuristics not present in Simplified Espresso

Early stopping Pattern and instance selection

Do these heuristics really help reduce semantic drift?

04/22/2313

Page 14: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

Experiments on semantic drift

Does the heuristics in original Espresso help reduce drift?

04/22/2314

Page 15: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

04/22/2315

Word sense disambiguation task of Senseval-3 English Lexical Sample

Predict the sense of “bank”

… the financial benefits of the bank (finance) 's employee package ( cheap mortgages and pensions, etc ) , bring this up to …

In that same year I was posted to South Shields on the south bank (bank of the river) of the River Tyne and quickly became aware that I had an enormous burden

Possibly aligned to water a sort of bank(???) by a rushing river.

Training instances are annotated with their sense

Predict the sense of target word in the test set

Page 16: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

04/22/2316

Word sense disambiguation by Original Espresso

Seed instance = the instance to predict its sense

System output = k-nearest neighbor (k=3)Heuristics of Espresso Pattern and instance selection

# of patterns to retain p=20 (increase p by 1 on each iteration)

# of instance to retain m=100 (increase m by 100 on each iteration)

Early stopping

Seed instance

Page 17: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

04/22/2317

Convergence process of EspressoHeuristics in Espresso helps reducing

semantic drift(However, early stopping is required for

optimal performance)

Output the most frequent sense regardless of input

Original Espresso

Simplified Espresso

Most frequent sense (baseline)

Semantic drift occurs (always outputs the most frequent sense)

Page 18: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

Learning curve of Original Espresso:per-sense breakdown

04/22/2318

# of most frequent sense predictions increases

Recall for infrequent senses worsens even with

original Espresso

Most frequent sense

Other senses

Page 19: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

04/22/2319

Summary: Espresso and semantic driftSemantic drift happens because Espresso is designed like HITS HITS gives the same ranking list regardless of seeds

Some heuristics reduce semantic drift Early stopping is crucial for optimal performance

Still, these heuristics require many parameters to be calibrated but calibration is difficult

Page 20: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

04/22/2320

Main contributions of this work

20

1. Suggest a parallel between semantic drift in Espresso-like bootstrapping and topic drift in HITS (Kleinberg, 1999)

2. Solve semantic drift by graph kernels used in link analysis community

Page 21: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

Q. What caused drift in Espresso?A. Espresso's resemblance to HITS

HITS is an importance computation method(gives a single ranking list for any seeds)

Why not use a method for another type of link analysis measure - which takes seeds into account?"relatedness" measure

(it gives different rankings for different seeds)

04/22/2321

Page 22: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

04/22/2322

The regularized Laplacian kernel A relatedness measure Takes higher-order relations into account

Has only one parameter

L = D− AGraph Laplacian

R n ( L)n (I L) 1

n0

Regularized Laplacian matrix

A: adjacency matrix of the graphD: (diagonal) degree matrix

β: parameterEach column of Rβ gives the rankings relative to a node

Page 23: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

Evaluation of regularized Laplacian

Comparison to (Agirre et al. 2006)

04/22/2323

Page 24: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

algorithm F measure

Most frequent sense (baseline)

54.5

HyperLex 64.6PageRank 64.6Simplified Espresso 44.1Espresso (after convergence)

46.9

Espresso (optimal stopping)

66.5

Regularized Laplacian (β=10-2)

67.104/22/2324

WSD on all nouns in Senseval-3

Outperforms other graph-based methods

Espresso needs optimal stopping to achieve an equivalent

performance

Page 25: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

04/22/2325

Conclusions

Semantic drift in Espresso is a parallel form of topic drift in HITS

The regularized Laplacian reduces semantic drift inherently a relatedness measure ( importance measure)

Page 26: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

04/22/2326

Future work Investigate if a similar analysis is applicable to a wider class of bootstrapping algorithms (including co-training)

Try other popular tasks of bootstrapping such as named entity extraction Selection of seed instances matters

Page 27: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

04/22/2327

Page 28: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

Algorithm Most frequent sense

Other senses

Simplified Espresso 100.0 0.0Espresso (after convergence)

100.0 30.2

Espresso (optimal stopping)

94.4 67.4

Regularized Laplacian (β=10-2)

92.1 62.8

04/22/2328

Label prediction of “bank” (F measure)

The regularized Laplacian keeps high recall for infrequent

senses

Espresso suffers from semantic drift

(unless stopped at optimal stage)

Page 29: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

Regularized Laplacian is mostly stable across a parameter

04/22/2329

Page 30: Graph-based Analysis of Semantic Drift in  Espresso-like Bootstrapping Algorithms

Pattern/instance ranking in EspressoScore for pattern p Score for instance

i

08.10.243030

r (p)

pmi(i, p)max pmi

r(i)

iI

I

r(i)

pmi(i, p)max pmi

r (p)

pP

P

,**,,*,,,

log),(pyxypx

pipmi

p: patterni: instanceP: set of patternsI: set of instancespmi: pointwise mutual informationmax pmi: max of pmi in all the patterns

and instances