
Page 1: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Khashayar Rohanimanesh

Joint work with Charles Sutton and Andrew McCallum

University of Massachusetts Amherst

Page 2: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Noun Phrase Segmentation (CoNLL-2000; Tjong Kim Sang and Buchholz, 2000)

NP:   B I I B I I O O O
Text: Rockwell International Corp. 's Tulsa unit said it signed

NP:   B I I O B I O B I
Text: a tentative agreement extending its contract with Boeing Co.

NP:   O O B I O B B I I
Text: to provide structural parts for Boeing 's 747 jetliners.
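To make the B/I/O encoding concrete, here is a minimal Python sketch that turns a label sequence into noun-phrase spans. The function name `bio_to_spans` and the choice to tolerate an `I` with no preceding `B` are assumptions made for illustration; they are not taken from the slides.

```python
def bio_to_spans(labels):
    """Return (start, end) token spans, end exclusive, for B/I/O labels."""
    spans, start = [], None
    for i, label in enumerate(labels):
        if label == "B":               # a new chunk begins; close any open one
            if start is not None:
                spans.append((start, i))
            start = i
        elif label == "I":             # continue a chunk (tolerate a missing B)
            if start is None:
                start = i
        else:                          # "O": outside any chunk
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:              # chunk running to the end of the sentence
        spans.append((start, len(labels)))
    return spans


# First example sentence, with the labels exactly as given on the slide.
words = "Rockwell International Corp. 's Tulsa unit said it signed".split()
labels = "B I I B I I O O O".split()
chunks = [" ".join(words[s:e]) for s, e in bio_to_spans(labels)]
print(chunks)
```

A `B` immediately after an `I` starts a new phrase, which is how the two adjacent noun phrases in the first sentence stay separate.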

Page 3: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Named Entity Recognition

CRICKET - MILLNS SIGNS FOR BOLAND

CAPE TOWN 1996-08-22

South African provincial side Boland said on Thursday they had signed Leicestershire fast bowler David Millns on a one year contract. Millns, who toured Australia with England A in 1992, replaces former England all-rounder Phillip DeFreitas as Boland's overseas professional.

Labels and examples:

PER: Yayuk Basuki, Innocent Butare

ORG: 3M, KDP, Leicestershire

LOC: Leicestershire, Nirmal Hriday, The Oval

MISC: Java, Basque, 1,000 Lakes Rally

[McCallum & Li, 2003]

Page 4: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Information Extraction

Seminar Announcements [Peshkin, Pfeffer 2003]

a seminar entitled "Nanorheology of Polymers & Complex Fluids," at 4:30 p.m, Monday, Feb. 27, in Wean Hall 7500. The seminar will be given by Professor Steven Granick

Labels: STIME (4:30 p.m), LOC (Wean Hall 7500), SPEAK (Professor Steven Granick)

Biological Abstracts [Skounakis, Craven, Ray 2003]

"SNC1, a gene from the yeast Saccharomyces cerevisiae, encodes a homolog of vertebrate synaptic vesicle-associated membrane proteins (VAMPs) or synaptobrevins."

Labels: PROTEIN (SNC1), LOC (vesicle)

Extracted relation: subcellular-localization(SNC1, vesicle)

Page 5: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Simultaneous noun-phrase & part-of-speech tagging

NP:   B I I B I I O O O
POS:  N N N O N N V O V
Text: Rockwell International Corp. 's Tulsa unit said it signed

NP:   B I I O B I O B I
POS:  O J N V O N O N N
Text: a tentative agreement extending its contract with Boeing Co.

Page 6: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Probabilistic Sequence Labeling

Page 7: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Linear-Chain CRFs

Finite-State

[Figure: finite-state view of a linear-chain CRF, with a two-argument potential on each transition and emission]

Page 8: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Linear-Chain CRFs

Graphical Model

[Figure: chain-structured undirected model over labels y and observations x, with a two-argument potential Φ(·,·) on each edge]

Training

Um… what's Φ?

Page 9: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Linear-Chain CRFs

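Graphical Model / Training

[Figure: linear-chain graphical model over labels y and observations x]

Rewrite as:

$$\Phi(\cdot) = \exp\Big(\sum_k \lambda_k f_k(\cdot)\Big)$$

for some features $f_k$ and weights $\lambda_k$.

Now solve for $\lambda_k$ by convex optimization.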
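As a rough illustration of potentials written as exponentiated weighted feature sums, the sketch below scores a labeling of a short sentence and normalizes by enumerating every labeling. The feature templates, label set, start symbol `<s>`, and weight values are all made up for the example, and the transition and observation potentials are folded into one per-position term; brute-force normalization is only workable for tiny inputs.

```python
import itertools
import math

LABELS = ["B", "I", "O"]

def features(prev_label, label, word):
    """Hypothetical binary features f_k, returned as the set of active names."""
    return {
        f"trans:{prev_label}->{label}",
        f"word:{word.lower()}|{label}",
        f"cap:{word[0].isupper()}|{label}",
    }

def log_potential(weights, prev_label, label, word):
    """log of the potential: sum_k lambda_k * f_k at one position."""
    return sum(weights.get(name, 0.0) for name in features(prev_label, label, word))

def score(weights, words, labels):
    """Unnormalized log-score of a whole labeling (sum over positions)."""
    prev, total = "<s>", 0.0
    for word, label in zip(words, labels):
        total += log_potential(weights, prev, label, word)
        prev = label
    return total

def prob(weights, words, labels):
    """p(y | x), with Z(x) computed by enumerating every labeling."""
    log_z = math.log(sum(math.exp(score(weights, words, list(y)))
                         for y in itertools.product(LABELS, repeat=len(words))))
    return math.exp(score(weights, words, labels) - log_z)

# Made-up weights; unlisted features implicitly have weight 0.
weights = {"trans:B->I": 1.5, "cap:True|B": 1.0, "trans:<s>->I": -2.0}
words = ["Tulsa", "unit", "said"]
total = sum(prob(weights, words, list(y))
            for y in itertools.product(LABELS, repeat=len(words)))
print(prob(weights, words, ["B", "I", "O"]), total)
```

Because the model is normalized globally over whole labelings, the probabilities of all 27 labelings of this three-word input sum to one.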

Page 10: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

General CRFs

A CRF is an undirected, conditionally-trained graphical model.

Train $\lambda_k$ by convex optimization to maximize conditional log-likelihood.

Features $f_k$ can be arbitrary, overlapping, domain-specific.

Page 11: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

CRF Training

Train $\lambda_k$ by convex optimization to maximize conditional log-likelihood.
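The following sketch shows one way such training can look on a toy problem: the gradient of the penalized conditional log-likelihood is the empirical feature count minus the expected feature count, minus a Gaussian-prior term. The two-sentence dataset, the feature templates, the prior variance `sigma2`, and the use of plain gradient ascent with a fixed step are all assumptions for illustration; the normalizer is computed by enumeration, which real implementations replace with dynamic programming.

```python
import itertools
import math
from collections import Counter

LABELS = ["N", "V"]

def feature_counts(words, labels):
    """Count hypothetical features f_k over all positions of one sequence."""
    counts, prev = Counter(), "<s>"
    for word, label in zip(words, labels):
        counts[f"trans:{prev}->{label}"] += 1
        counts[f"word:{word}|{label}"] += 1
        prev = label
    return counts

def log_likelihood_and_gradient(weights, data, sigma2=10.0):
    """Penalized conditional log-likelihood and its gradient (brute-force Z)."""
    ll = -sum(w * w for w in weights.values()) / (2 * sigma2)
    grad = Counter({k: -w / sigma2 for k, w in weights.items()})
    for words, gold in data:
        scored = []
        for y in itertools.product(LABELS, repeat=len(words)):
            counts = feature_counts(words, y)
            s = sum(weights.get(k, 0.0) * v for k, v in counts.items())
            scored.append((s, counts))
        log_z = math.log(sum(math.exp(s) for s, _ in scored))
        gold_counts = feature_counts(words, gold)
        ll += sum(weights.get(k, 0.0) * v for k, v in gold_counts.items()) - log_z
        for k, v in gold_counts.items():          # empirical feature counts
            grad[k] += v
        for s, counts in scored:                  # minus expected feature counts
            p = math.exp(s - log_z)
            for k, v in counts.items():
                grad[k] -= p * v
    return ll, grad

# Toy data (made up): two two-word sentences with gold labels.
data = [(["it", "signed"], ["N", "V"]), (["unit", "said"], ["N", "V"])]
weights, history = {}, []
for _ in range(50):                               # plain gradient ascent
    ll, grad = log_likelihood_and_gradient(weights, data)
    history.append(ll)
    for k, g in grad.items():
        weights[k] = weights.get(k, 0.0) + 0.1 * g
print(history[0], history[-1])
```

Because the objective is concave, a small enough fixed step increases it at every iteration; a quasi-Newton or conjugate-gradient optimizer would use the same value-and-gradient function.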

Page 12: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Optimization Methods

• Generalized Iterative Scaling (GIS)
  – Improved Iterative Scaling

• First-order methods
  – Non-linear conjugate gradient

• Second-order methods
  – Limited-memory quasi-Newton (L-BFGS)

Page 13: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

From Generative to Conditional

Model: properties (graphical-model figures not recoverable)

HMMs: models observation

MEMMs: does not model observation; label bias problem

Linear-chain CRFs: does not model observation; eliminates label bias problem

Page 14: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Dynamic CRFs

Page 15: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Simultaneous noun-phrase & part-of-speech tagging

NP:   B I I B I I O O O
POS:  N N N O N N V O V
Text: Rockwell International Corp. 's Tulsa unit said it signed

NP:   B I I O B I O B I
POS:  O J N V O N O N N
Text: a tentative agreement extending its contract with Boeing Co.

Page 16: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Features

• Word identity: "International"
• Capitalization: Xxxxxxx
• Character classes: contains digits
• Character n-gram: …ment
• Lexicon memberships: in list of company names
• WordNet synset: (speak, say, tell)
• …
• Part of speech: Proper Noun
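A minimal sketch of how several of the listed feature types could be computed for one token. The function names, the feature-string format, and the tiny `COMPANY_LEXICON` are hypothetical; the WordNet synset feature is left out because it needs an external resource.

```python
import re

COMPANY_LEXICON = {"rockwell", "boeing"}          # hypothetical lexicon

def word_shape(word):
    """Map uppercase letters to X, lowercase to x, digits to d."""
    shape = re.sub(r"[A-Z]", "X", word)
    shape = re.sub(r"[a-z]", "x", shape)
    return re.sub(r"[0-9]", "d", shape)

def token_features(words, i, pos_tag=None):
    """Return the set of active feature names for the token at position i."""
    word = words[i]
    feats = {
        f"word={word}",                           # word identity
        f"shape={word_shape(word)}",              # capitalization pattern
        f"suffix4={word[-4:].lower()}",           # character n-gram (suffix)
    }
    if any(ch.isdigit() for ch in word):          # character class
        feats.add("has_digit")
    if word.lower() in COMPANY_LEXICON:           # lexicon membership
        feats.add("in_company_lexicon")
    if pos_tag is not None:                       # part of speech, if available
        feats.add(f"pos={pos_tag}")
    return feats

words = ["Boeing", "'s", "747", "agreement"]
for i in range(len(words)):
    print(words[i], sorted(token_features(words, i)))
```

Several features can fire for the same token; the conditional model does not require them to be independent.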

Page 17: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Multiple Nested Predictions on the Same Sequence

Noun phrase

Part-of-speech (output prediction)

Word identity (input observation): Rockwell Int’l Corp. 's Tulsa

Page 18: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Multiple Nested Predictions on the Same Sequence

Noun phrase (output prediction)

Part-of-speech (input observation)

Word identity (input observation): Rockwell Int’l Corp. 's Tulsa

But errors in each stage are compounding. Uncertainty from one stage to the next is not preserved.

Page 19: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Cascaded Predictions

Named-entity tag

Part-of-speech

Segmentation (output prediction)

Chinese character (input observation)

Page 20: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Cascaded Predictions

Named-entity tag

Part-of-speech (output prediction)

Segmentation (input observation)

Chinese character (input observation)

Page 21: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Cascaded Predictions

Named-entity tag (output prediction)

Part-of-speech (input observation)

Segmentation (input observation)

Chinese character (input observation)

Even more stages here, so compounding of errors is worse.

Page 22: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Joint Prediction: Cross-Product over Labels

Segmentation+POS+NE (output prediction)

Chinese character (input observation)

2 x 45 x 11 = 990 possible states

e.g.: state label = (Wordbeg, Noun, Person)

O(T x 990^2) running time

O(|V| x 990^2) parameters
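The arithmetic behind these counts can be checked with a few lines of Python. The label-set sizes are the ones on the slide; the sequence length of 20 is an arbitrary illustrative value.

```python
from math import prod

# Label-set sizes from the slide: segmentation, part-of-speech, named entity.
label_set_sizes = {"segmentation": 2, "part_of_speech": 45, "named_entity": 11}

joint_states = prod(label_set_sizes.values())   # 2 * 45 * 11
transition_pairs = joint_states ** 2            # (state at t-1, state at t) pairs

def exact_decoding_steps(sequence_length):
    """Transition evaluations for exact decoding: sequence length x 990^2."""
    return sequence_length * transition_pairs

print(joint_states, transition_pairs, exact_decoding_steps(20))
```

The quadratic term is what makes the cross-product state space expensive: every position considers every pair of joint states.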

Page 23: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Joint Prediction: Factorial CRF

Named-entity tag (output prediction)

Part-of-speech (output prediction)

Segmentation (output prediction)

Chinese character (input observation)

O(|V| x 990) parameters

Page 24: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

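Linear-Chain to Factorial CRFs: Model Definition

Linear-chain

$$p(y \mid x) = \frac{1}{Z(x)} \prod_{t=1}^{T} \Phi_y(y_t, y_{t-1})\, \Phi_{xy}(x_t, y_t)$$

Factorial

$$p(y \mid x) = \frac{1}{Z(x)} \prod_{t=1}^{T} \Phi_u(u_t, u_{t-1})\, \Phi_v(v_t, v_{t-1})\, \Phi_w(w_t, w_{t-1})\, \Phi_{uv}(u_t, v_t)\, \Phi_{vw}(v_t, w_t)\, \Phi_{wx}(w_t, x_t)$$

where

$$\Phi(\cdot) = \exp\Big(\sum_k \lambda_k f_k(\cdot)\Big)$$

[Figures: linear-chain model with labels y over observations x; factorial model with label chains u, v, w over observations x]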
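To show how the factorial form combines within-chain and between-chain potentials, here is a reduced sketch with two label chains instead of three. The label sets, the weight keys, and the weight values are hypothetical, and the normalizer is again computed by enumeration, so this is only suitable for very short sequences.

```python
import itertools
import math

U_LABELS = ["B", "I", "O"]        # e.g. a noun-phrase chain
V_LABELS = ["N", "V"]             # e.g. a reduced part-of-speech chain

def log_score(weights, words, u, v):
    """Sum of log-potentials over all positions of a two-chain factorial model."""
    total = 0.0
    for t, word in enumerate(words):
        if t > 0:
            total += weights.get(("u", u[t - 1], u[t]), 0.0)   # within chain u
            total += weights.get(("v", v[t - 1], v[t]), 0.0)   # within chain v
        total += weights.get(("uv", u[t], v[t]), 0.0)          # between chains
        total += weights.get(("vx", v[t], word), 0.0)          # chain v to input
    return total

def joint_prob(weights, words, u, v):
    """p(u, v | x), normalizing over every pair of labelings."""
    T = len(words)
    log_z = math.log(sum(
        math.exp(log_score(weights, words, uu, vv))
        for uu in itertools.product(U_LABELS, repeat=T)
        for vv in itertools.product(V_LABELS, repeat=T)))
    return math.exp(log_score(weights, words, u, v) - log_z)

# Made-up weights: noun-phrase starts prefer nouns, "signed" prefers V.
weights = {("uv", "B", "N"): 1.0, ("vx", "V", "signed"): 2.0, ("u", "B", "I"): 0.5}
words = ["it", "signed"]
print(joint_prob(weights, words, ("B", "O"), ("N", "V")))
```

The number of weights grows with the sum of the pairwise tables rather than with the square of the cross-product state space, while the between-chain term still lets the two labelings inform each other.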

Page 25: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

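Linear-Chain to Factorial CRFs: Log-likelihood Training

Linear-chain

$$\frac{\partial L}{\partial \lambda_k} = \sum_i \sum_t f_k(x^{(i)}, y_t^{(i)}, y_{t-1}^{(i)}) - \sum_i \sum_t \sum_{y} p(y \mid x^{(i)})\, f_k(x^{(i)}, y_t, y_{t-1}) - \frac{\lambda_k}{\sigma^2}$$

Factorial

$$\frac{\partial L}{\partial \lambda_k} = \sum_i \sum_t f_k(x^{(i)}, u_t^{(i)}, u_{t-1}^{(i)}) - \sum_i \sum_t \sum_{u} p(u \mid x^{(i)})\, f_k(x^{(i)}, u_t, u_{t-1}) - \frac{\lambda_k}{\sigma^2}$$

[Figures: linear-chain model with labels y over observations x; factorial model with label chains u, v, w over observations x]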

Page 26: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Dynamic CRFs

Undirected conditionally-trained analogue to Dynamic Bayes Nets (DBNs)

[Figure panels: Factorial, Higher-Order, Hierarchical]

Page 27: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Need for Inference

[Figure: two chain-structured models over labels y and observations x]

Marginal distributions $p(y_t, y_{t-1} \mid x)$: used during training

Most-likely (Viterbi) labeling $\arg\max_y p(y \mid x)$: used to label a sequence

9000 training instances x 100 maximizer iterations = 900,000 calls to inference algorithm!
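For a single chain, both inference tasks have exact dynamic-programming solutions. The sketch below computes pairwise marginals with forward-backward and the most likely labeling with Viterbi. It works directly on positive potential tables `node[t][j]` and `edge[i][j]`, which are hypothetical inputs with made-up values, and it works in probability space rather than log space, so it is only meant for short sequences.

```python
def forward_backward(node, edge):
    """Pairwise marginals p(y_{t-1}=i, y_t=j | x) and the normalizer Z(x)."""
    T, K = len(node), len(node[0])
    alpha = [list(node[0])]
    for t in range(1, T):
        alpha.append([node[t][j] * sum(alpha[t - 1][i] * edge[i][j] for i in range(K))
                      for j in range(K)])
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(edge[i][j] * node[t + 1][j] * beta[t + 1][j] for j in range(K))
                   for i in range(K)]
    z = sum(alpha[T - 1])
    pair = [[[alpha[t - 1][i] * edge[i][j] * node[t][j] * beta[t][j] / z
              for j in range(K)] for i in range(K)] for t in range(1, T)]
    return pair, z

def viterbi(node, edge):
    """Most likely labeling argmax_y p(y | x) by dynamic programming."""
    T, K = len(node), len(node[0])
    best, back = [list(node[0])], []
    for t in range(1, T):
        row, ptr = [], []
        for j in range(K):
            i = max(range(K), key=lambda a: best[t - 1][a] * edge[a][j])
            row.append(best[t - 1][i] * edge[i][j] * node[t][j])
            ptr.append(i)
        best.append(row)
        back.append(ptr)
    y = [max(range(K), key=lambda j: best[T - 1][j])]
    for ptr in reversed(back):                    # follow back-pointers
        y.append(ptr[y[-1]])
    return y[::-1]

# Made-up potentials: 3 positions, 2 labels.
node = [[1.0, 2.5], [3.0, 1.0], [1.0, 1.5]]
edge = [[2.0, 1.0], [1.0, 3.0]]
pair, z = forward_backward(node, edge)
best_path = viterbi(node, edge)
print(z, best_path)
```

Each call costs time proportional to the sequence length times the square of the number of labels, which is why the number of inference calls made during training matters.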

Page 28: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Inference (Exact): Junction Tree

[Figure: two-chain model with NP and POS label layers]

Max-clique: 3 x 45 x 45 = 6075 assignments

Page 29: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Inference (Exact): Junction Tree

[Figure: three-chain model with NER, POS, and SEG label layers]

Max-clique: 3 x 45 x 45 x 11 = 66825 assignments

Page 30: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Inference (Approximate): Loopy Belief Propagation

[Figure: 2 x 3 grid of nodes v1 to v6 (rows v1, v2, v3 and v4, v5, v6); each edge carries a message in both directions, written m_i(v_j) for the message from node i to node j]
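A compact sketch of loopy belief propagation on a pairwise model with binary variables, run on the same 2 x 3 grid shape as the figure (nodes renumbered from 0). The synchronous update schedule, the fixed iteration count, and all potential values are assumptions made for illustration; exact marginals are computed by enumeration only so the approximation can be compared against them.

```python
import itertools
import math

def loopy_bp(n_nodes, edges, node_pot, edge_pot, n_iters=50):
    """Synchronous sum-product; returns approximate node marginals.

    node_pot[i][a] and edge_pot[(i, j)][a][b], with a the state of node i
    and b the state of node j, for each (i, j) in edges.
    """
    K = 2
    neighbors = {i: [] for i in range(n_nodes)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def pot(i, j, a, b):
        return edge_pot[(i, j)][a][b] if (i, j) in edge_pot else edge_pot[(j, i)][b][a]

    msgs = {(i, j): [1.0 / K] * K for i in neighbors for j in neighbors[i]}
    for _ in range(n_iters):
        new = {}
        for (i, j) in msgs:                       # message from i to j
            out = []
            for b in range(K):
                total = 0.0
                for a in range(K):
                    incoming = math.prod(msgs[(k, i)][a] for k in neighbors[i] if k != j)
                    total += node_pot[i][a] * pot(i, j, a, b) * incoming
                out.append(total)
            s = sum(out)
            new[(i, j)] = [x / s for x in out]    # normalize for stability
        msgs = new
    beliefs = []
    for i in range(n_nodes):
        b = [node_pot[i][a] * math.prod(msgs[(k, i)][a] for k in neighbors[i])
             for a in range(K)]
        s = sum(b)
        beliefs.append([x / s for x in b])
    return beliefs

def exact_marginals(n_nodes, edges, node_pot, edge_pot):
    """Exact node marginals by enumerating all joint assignments."""
    marg = [[0.0, 0.0] for _ in range(n_nodes)]
    for y in itertools.product(range(2), repeat=n_nodes):
        w = math.prod(node_pot[i][y[i]] for i in range(n_nodes))
        w *= math.prod(edge_pot[(i, j)][y[i]][y[j]] for i, j in edges)
        for i in range(n_nodes):
            marg[i][y[i]] += w
    return [[x / sum(row) for x in row] for row in marg]

# 2 x 3 grid: rows (0, 1, 2) and (3, 4, 5); made-up potentials.
grid_edges = [(0, 1), (1, 2), (0, 3), (1, 4), (2, 5), (3, 4), (4, 5)]
node_pot = [[1.0, 1.5], [1.0, 0.8], [1.2, 1.0], [1.0, 1.0], [0.7, 1.0], [1.0, 1.1]]
attract = [[1.5, 1.0], [1.0, 1.5]]
edge_pot = {e: attract for e in grid_edges}
approx = loopy_bp(6, grid_edges, node_pot, edge_pot)
exact = exact_marginals(6, grid_edges, node_pot, edge_pot)
gap = max(abs(a[0] - e[0]) for a, e in zip(approx, exact))
print(f"largest marginal error on the loopy grid: {gap:.4f}")
```

On a graph without cycles the same message updates give exact marginals; on the grid they are only an approximation, and convergence is not guaranteed in general.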

Page 31: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Inference (Approximate): Tree Re-parameterization

[Wainwright, Jaakkola, Willsky 2001]

[Figure: 2 x 3 grid graph with nodes 1 to 6 and edges 12, 14, 23, 25, 36, 45, 56]

Page 32: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Inference (Approximate): Tree Re-parameterization

[Wainwright, Jaakkola, Willsky 2001]

[Figure: 2 x 3 grid graph with nodes 1 to 6 and edges 12, 14, 23, 25, 36, 45, 56]

Page 33: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

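Inference (Approximate): Tree Re-parameterization

[Wainwright, Jaakkola, Willsky 2001]

[Figure: the same graph re-parameterized over a spanning tree. Nodes carry $p_1, \dots, p_6$; tree edges carry $\frac{p_{12}}{p_1 p_2}$, $\frac{p_{14}}{p_1 p_4}$, $\frac{p_{23}}{p_2 p_3}$, $\frac{p_{25}}{p_2 p_5}$, $\frac{p_{36}}{p_3 p_6}$; edges 45 and 56 are outside the tree]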

Page 34: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

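Inference (Approximate): Tree Re-parameterization

[Wainwright, Jaakkola, Willsky 2001]

[Figure: the same graph re-parameterized over a spanning tree. Nodes carry $p_1, \dots, p_6$; tree edges carry $\frac{p_{12}}{p_1 p_2}$, $\frac{p_{14}}{p_1 p_4}$, $\frac{p_{23}}{p_2 p_3}$, $\frac{p_{25}}{p_2 p_5}$, $\frac{p_{36}}{p_3 p_6}$; edges 45 and 56 are outside the tree]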

Page 35: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Experiments: Simultaneous noun-phrase & part-of-speech tagging

• Data from CoNLL Shared Task 2000 (Newswire)
  – Training subsets of various sizes: from 223 to 894 sentences
  – Features include: word identity, neighboring words, capitalization, lexicons of parts-of-speech, company names (1,358,227 feature functions!)

NP:   B I I B I I O O O
POS:  N N N O N N V O V
Text: Rockwell International Corp. 's Tulsa unit said it signed

NP:   B I I O B I O B I
POS:  O J N V O N O N N
Text: a tentative agreement extending its contract with Boeing Co.

Page 36: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Experiments: Simultaneous noun-phrase & part-of-speech tagging

Two experiments
• Compare exact and approximate inference
• Compare accuracy of cascaded CRFs and Factorial DCRFs

NP:   B I I B I I O O O
POS:  N N N O N N V O V
Text: Rockwell International Corp. 's Tulsa unit said it signed

NP:   B I I O B I O B I
POS:  O J N V O N O N N
Text: a tentative agreement extending its contract with Boeing Co.

Page 37: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Noun Phrase Accuracy

Page 38: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Accuracy

POS-tagger, (Brill, 1994) F1 for NP on 8936: 93.87

Page 39: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Summary

• Many natural language tasks are solved by chaining errorful subtasks.

• Approach: Jointly solve all subtasks in a single graphical model.
  – Learn dependence between subtasks
  – Allow higher-level to inform lower level

• Improved joint and POS accuracy over cascaded model, but NP accuracy lower.

• Current work: Emphasize one subtask

Page 40: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

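Maximize Marginal Likelihood (Ongoing work)

[Figure: two-layer model with NP and POS label chains]

$$O(\Lambda) = \sum_i \log p(np^{(i)} \mid x^{(i)}) = \sum_i \log \sum_{pos} p(np^{(i)}, pos \mid x^{(i)})$$

$$\frac{\partial O}{\partial \lambda_k} = \sum_i \sum_{pos} p(pos \mid np^{(i)}, x^{(i)})\, f_k(pos, np^{(i)}, x^{(i)}) - \sum_i \sum_{pos} \sum_{np} p(pos, np \mid x^{(i)})\, f_k(pos, np, x^{(i)})$$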

Page 41: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Thank you!

Page 43: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

State-of-the-art Performance

• POS tagging:
  – 97% (Brill, 1999)

• NP chunking:
  – 94.38% (Sha and Pereira)
  – 94.39% (?)

Page 44: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Alternatives to Traditional Joint

• Optimize Marginal Likelihood

• Optimize Utility

• Optimize Margin (M3N) [Taskar, Guestrin, Koller 2003]

Page 45: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

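Maximize Marginal Likelihood (Ongoing work)

[Figure: two-layer model with NP and POS label chains]

$$O(\Lambda) = \sum_i \log p(np^{(i)} \mid x^{(i)}) = \sum_i \log \sum_{pos} p(np^{(i)}, pos \mid x^{(i)})$$

$$\frac{\partial O}{\partial \lambda_k} = \sum_i \sum_{pos} p(pos \mid np^{(i)}, x^{(i)})\, f_k(pos, np^{(i)}, x^{(i)}) - \sum_i \sum_{pos} \sum_{np} p(pos, np \mid x^{(i)})\, f_k(pos, np, x^{(i)})$$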

Page 46: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Undirected Graphical Models

Directed

Undirected

Page 47: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Hidden Markov Models

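Graphical Model / Training

[Figure: HMM graphical model with a conditional probability p(·|·) on each transition and emission edge]

$$p(x, y) = p(y_1)\, p(x_1 \mid y_1)\, p(y_2 \mid y_1)\, p(x_2 \mid y_2)\, p(y_3 \mid y_2)\, p(x_3 \mid y_3)$$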

Page 48: Dynamic Conditional Random Fields for Labeling and Segmenting Sequences

Hidden Markov Models

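Finite-State

[Figure: finite-state diagram of an HMM with a conditional probability p(·|·) on each transition and emission]

Graphical Model

[Figure: HMM graphical model with a conditional probability p(·|·) on each transition and emission edge]

$$p(x, y) = p(y_1)\, p(x_1 \mid y_1)\, p(y_2 \mid y_1)\, p(x_2 \mid y_2)\, p(y_3 \mid y_2)\, p(x_3 \mid y_3)$$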
