Max-margin sequential learning methods

William W. Cohen, CALD


TRANSCRIPT

Page 1: Max-margin sequential learning methods

Max-margin sequential learning methods

William W. Cohen, CALD

Page 2: Max-margin sequential learning methods

Announcements

• Upcoming assignments:
– Wed 3/3: project proposal due (personnel + 1-2 pages)
– Spring break next week, no class
– Will get feedback on project proposals by end of break
– Write-ups for the “Distance Metrics for Text” week are due Wed 3/17, not the Monday after spring break

Page 3: Max-margin sequential learning methods

Collins’ paper

• Notation:
– a label (y) is a “tag” t
– an observation (x) is a word w
– a history h is a 4-tuple <t_{-1}, t_{-2}, w_{[1:n]}, i>: the previous two tags, the full word sequence, and the current position
– φ_s(h, t) is a feature of the pair (h, t)

Page 4: Max-margin sequential learning methods

Collins’ papers

• Notation, cont’d:
– Φ_s is the sum of φ_s over all positions i: Φ_s(w_{[1:n]}, t_{[1:n]}) = Σ_i φ_s(h_i, t_i)
– α_s is the weight given to feature φ_s (both are sketched below)
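To make the notation concrete, here is a minimal sketch of one local feature and its global sum; the particular feature (the word “the” tagged DT) and the function names are illustrative, not taken from the paper:

```python
def phi_s(h, t):
    """Local feature phi_s(h, t): fires (returns 1) when the current word is
    'the' and the proposed tag t is 'DT'. The history h is the 4-tuple
    (t_prev1, t_prev2, words, i) from the previous slide."""
    t_prev1, t_prev2, words, i = h
    return 1 if words[i] == "the" and t == "DT" else 0

def Phi_s(words, tags):
    """Global feature Phi_s: the sum of phi_s over every position i."""
    total = 0
    for i in range(len(words)):
        h = (tags[i - 1] if i >= 1 else "*",  # "*" pads the start boundary
             tags[i - 2] if i >= 2 else "*",
             words, i)
        total += phi_s(h, tags[i])
    return total
```

A tagging t_{[1:n]} is then scored by Σ_s α_s Φ_s(w_{[1:n]}, t_{[1:n]}).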

Page 5: Max-margin sequential learning methods

Collins’ paper

Page 6: Max-margin sequential learning methods

The theory

Claim 1: the algorithm is an instance of the following perceptron variant (sketched in code below).

Claim 2: the arguments in the mistake-bound classification results of F&S99 (Freund & Schapire, 1999) extend immediately to this ranking task as well.
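A hedged sketch of that perceptron variant; `argmax_decode` (e.g. Viterbi search) and the other names are placeholders, not identifiers from the paper:

```python
def train_structured_perceptron(examples, Phi, argmax_decode, epochs=5):
    """examples: list of (x, y_correct); Phi(x, y) -> sparse dict of feature
    counts; argmax_decode(x, alpha) -> the highest-scoring y under alpha."""
    alpha = {}  # feature weights, kept sparse
    for _ in range(epochs):
        for x, y_correct in examples:
            y_hat = argmax_decode(x, alpha)  # current best guess
            if y_hat != y_correct:           # mistake-driven additive update:
                for f, v in Phi(x, y_correct).items():  # promote the truth
                    alpha[f] = alpha.get(f, 0.0) + v
                for f, v in Phi(x, y_hat).items():      # demote the guess
                    alpha[f] = alpha.get(f, 0.0) - v
    return alpha
```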

Page 7: Max-margin sequential learning methods
Page 8: Max-margin sequential learning methods

F&S99 algorithm

Page 9: Max-margin sequential learning methods

F&S99 result

Page 10: Max-margin sequential learning methods

Collins’ result

Page 11: Max-margin sequential learning methods

Results

• Two experiments– POS tagging, using the Adwait’s features– NP chunking (Start,Continue,Outside tags)– NER on special AT&T dataset (another paper)

Page 12: Max-margin sequential learning methods

Features for NP chunking

Page 13: Max-margin sequential learning methods

Results

Page 14: Max-margin sequential learning methods

More ideas

• The dual version of a perceptron:
– w is built up by repeatedly adding examples, so w is a weighted sum of the examples x_1, ..., x_m
– the inner product <w, x> can be rewritten:

w = \sum_{j=1}^{m} \alpha_j x_j, \quad \text{where } \alpha_j \text{ is } 0, 1, \text{ or } -1,

so

\langle w, x \rangle = \Big\langle \sum_{j=1}^{m} \alpha_j x_j, \; x \Big\rangle = \sum_{j=1}^{m} \alpha_j \langle x_j, x \rangle
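A minimal sketch of this dual form for plain binary classification, assuming labels in {-1, +1} and explicit feature vectors (the function names are illustrative):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_dual_perceptron(X, y, epochs=10):
    """X: list of feature vectors; y: labels in {-1, +1}."""
    alpha = [0.0] * len(X)  # w = sum_j alpha[j] * x_j, never formed explicitly
    for _ in range(epochs):
        for i, (x, label) in enumerate(zip(X, y)):
            # <w, x> rewritten as a weighted sum of inner products with examples
            score = sum(alpha[j] * dot(X[j], x)
                        for j in range(len(X)) if alpha[j] != 0)
            if label * score <= 0:  # mistake: fold this example into w
                alpha[i] += label
    return alpha
```

Because the examples enter only through `dot`, it can be swapped for any kernel K(x_j, x), which is what makes the dual form interesting.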

Page 15: Max-margin sequential learning methods

Dual version of perceptron ranking

α_{i,j}: i ranges over training examples and j over the (incorrect) candidate tag sequences for example i; implicitly w = Σ_{i,j} α_{i,j} (Φ(x_i, y_i) − Φ(x_i, y_{i,j})), as in the sketch below.
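One way to write the resulting dual ranking score in code; a sketch under the assumption that `K` is the kernel (the plain inner product in the simplest case) and that `support` holds one entry per mistake, with illustrative names:

```python
def dual_rank_score(phi_candidate, support, K):
    """Score a candidate's feature vector phi_candidate = Phi(x, z).
    support: list of (alpha, phi_correct, phi_wrong) triples, one per mistake
    (i, j); implicitly w = sum of alpha * (phi_correct - phi_wrong)."""
    return sum(alpha * (K(phi_c, phi_candidate) - K(phi_w, phi_candidate))
               for alpha, phi_c, phi_w in support)

# Update on a mistake: if candidate z outranks the correct y_i on example i,
# append (1.0, Phi(x_i, y_i), Phi(x_i, z)) to support, i.e. alpha_{i,z} += 1.
```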

Page 16: Max-margin sequential learning methods

NER features for re-ranking MaxEnt tagger output

Page 17: Max-margin sequential learning methods

NER features

Page 18: Max-margin sequential learning methods

NER results

Page 19: Max-margin sequential learning methods

Altun et al. paper

• Starting point: the dual version of Collins’ perceptron algorithm
– the final hypothesis is a weighted sum of inner products with a subset of the examples
– this is a lot like an SVM, except that the perceptron algorithm is used to set the weights rather than quadratic optimization

Page 20: Max-margin sequential learning methods

SVM optimization

• Notation:
– y_i is the correct tag sequence for x_i
– y is an incorrect tag sequence
– F(x_i, y_i) is the feature vector
• Optimization problem:
• find weights w on the examples that maximize the minimal margin while constraining ||w|| = 1, or
• minimize ||w||^2 such that every margin is >= 1 (both written out below)
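In symbols, a standard way to write these two formulations (using the slides’ notation, the margin of (x_i, y_i) against y being \langle w, F(x_i, y_i) - F(x_i, y) \rangle):

\max_{\|w\|=1} \; \min_{i, \, y \neq y_i} \; \langle w, F(x_i, y_i) - F(x_i, y) \rangle

or, equivalently,

\min_{w} \; \|w\|^2 \quad \text{s.t.} \quad \langle w, F(x_i, y_i) - F(x_i, y) \rangle \geq 1 \;\; \text{for all } i \text{ and all } y \neq y_i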

Page 21: Max-margin sequential learning methods

SVMs for ranking

Page 22: Max-margin sequential learning methods

SVMs for ranking

Proposition: (14) and (15) are equivalent:

Let

z_i^{(j)} = F(x_i, y_i) - F(x_i, y^{(j)}), \quad i = 1, \dots, n, \;\; j = 1, \dots, n_i

be the difference vectors between each example’s correct tag sequence and each of its incorrect sequences y^{(1)}, \dots, y^{(n_i)}. The ranking constraints of (14), \langle w, z_i^{(j)} \rangle \geq \theta_i, are then the constraints of the binary classification problem (15) over the z_i^{(j)}.

Page 23: Max-margin sequential learning methods

SVMs for ranking

A binary classification problem, with (x_i, y_i) giving the positive example and each (x_i, y') a negative example, except that the margin threshold θ_i varies from example to example. Why? Because we’re ranking.

Page 24: Max-margin sequential learning methods

SVMs for ranking

• Altun et al.’s paper gives the remaining details
• As in perceptron learning, “negative” data is found by running Viterbi under the learned weights and looking for errors (sketched below):
– each mistake is a possible new support vector
– the data must be iterated over repeatedly
– convergence could take exponential time if the support vectors are dense...
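A hedged sketch of that training loop; `viterbi_decode` and `solve_qp` stand in for the decoder and quadratic optimizer the paper specifies, and the names are illustrative:

```python
def train_max_margin(examples, viterbi_decode, solve_qp, max_sweeps=50):
    """Working-set training: alternately decode with the current weights and
    re-solve a QP over the mistakes collected so far."""
    support = []  # (x, y_correct, y_wrong) margin constraints found so far
    w = {}        # current weights; an empty dict means all-zero
    for _ in range(max_sweeps):
        found_error = False
        for x, y_correct in examples:
            y_hat = viterbi_decode(x, w)   # best sequence under current w
            if y_hat != y_correct:         # a possible new support vector
                support.append((x, y_correct, y_hat))
                w = solve_qp(support)      # re-optimize on the working set
                found_error = True
        if not found_error:                # a full clean sweep: done
            break
    return w, support
```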

Page 25: Max-margin sequential learning methods

Altun et al. results

• NER on 300 sentences from the CoNLL 2002 shared task (Spanish)
– four entity types, nine labels (beginning-T, intermediate-T, other)
• POS tagging on 300 sentences from the Penn TreeBank
• 5-fold cross-validation, window of size 3, simple features

Page 26: Max-margin sequential learning methods

Altun et al. results

Page 27: Max-margin sequential learning methods

Altun et al. results