Beam-Width Prediction for Efficient Context-Free Parsing

Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark. June 2011.

Page 1: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction for Efficient Context-Free Parsing

Nathan Bodenstab, Aaron Dunlop, Keith Hall, Brian Roark

June 2011

Page 2: Beam-Width Prediction for Efficient Context-Free Parsing

OHSU Beam-Search Parser (BUBS)

• Standard bottom-up CYK
• Beam-search per chart cell
• Only the “best” edges are retained

Page 3: Beam-Width Prediction for Efficient Context-Free Parsing

Ranking, Prioritization, and FOMs

• f() = g() + h()
• Figure of Merit
  – Caraballo and Charniak (1997)
• A* search
  – Klein and Manning (2003)
  – Pauls and Klein (2010)
• Other
  – Turian (2007)
  – Huang (2008)
• Apply to beam-search
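As a rough illustration of f() = g() + h(): if g is an edge's inside log-probability and h is some estimate of its outside log-probability, edges can be ranked by the sum. The sketch below assumes exactly that reading; the `Edge` class, its field names, and the numbers are hypothetical and are not taken from the BUBS parser.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    label: str          # nonterminal label (toy data)
    inside: float       # g(): inside log-probability of the edge
    outside_est: float  # h(): estimated outside log-probability

def figure_of_merit(edge: Edge) -> float:
    """f() = g() + h(), computed in log space."""
    return edge.inside + edge.outside_est

def rank_edges(edges):
    """Return edges sorted from best (highest f) to worst."""
    return sorted(edges, key=figure_of_merit, reverse=True)

edges = [
    Edge("NP", inside=-2.0, outside_est=-5.0),
    Edge("VP", inside=-1.5, outside_est=-7.0),
    Edge("S", inside=-3.0, outside_est=-1.0),
]
ranked = rank_edges(edges)
ranked_inside_only = sorted(edges, key=lambda e: e.inside, reverse=True)
print([e.label for e in ranked])
print([e.label for e in ranked_inside_only])
```

In this toy cell, ranking by inside score alone puts VP first, while adding the outside estimate moves S to the top, which is the kind of reordering a figure of merit is meant to provide.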

Page 4: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

• Traditional beam-search uses constant beam-width
• Two definitions of beam-width:
  – Number of local competitors to retain (n-best)
  – Score difference from best entry
• Advantages
  – Heavy pruning compared to CYK
  – Minimal sorting compared to global agenda
• Disadvantages
  – No global pruning: all chart cells are treated equally
  – Beam-width must be set conservatively to keep outliers within the beam
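The two beam-width definitions above can be sketched as two pruning functions over one chart cell. This is a minimal example, assuming each cell entry is a `(label, log_score)` pair; the function names and the toy scores are hypothetical.

```python
def prune_n_best(scored_edges, n):
    """n-best beam: keep the n highest-scoring entries of the cell."""
    ranked = sorted(scored_edges, key=lambda pair: pair[1], reverse=True)
    return ranked[:n]

def prune_by_score_delta(scored_edges, delta):
    """Score beam: keep entries whose score is within `delta` of the best."""
    if not scored_edges:
        return []
    best = max(score for _, score in scored_edges)
    return [(label, score) for label, score in scored_edges
            if best - score <= delta]

# Toy chart cell with log-scores (hypothetical values).
cell = [("NP", -4.0), ("VP", -9.5), ("S", -4.5), ("PP", -6.0)]
kept_n_best = prune_n_best(cell, 2)
kept_delta = prune_by_score_delta(cell, 2.0)
print(kept_n_best)
print(kept_delta)
```

Both variants only look at one cell at a time, which is why neither gives any global pruning: every cell gets the same `n` or `delta` regardless of how hard it is.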

Page 5: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

• How often is the gold edge ranked in the top N per chart cell?
  – Exhaustively parse Section 22 with the Berkeley latent variable grammar

[Figure: cumulative gold edges (y-axis) vs. gold rank <= N (x-axis)]

Page 8: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

• Beam-search + C&C Boundary ranking:
  – How often is the gold edge ranked in the top N per chart cell?

[Figure: cumulative gold edges (y-axis) vs. gold rank <= N (x-axis)]

To maintain baseline accuracy, beam-width must be set to 15 with C&C Boundary ranking (and 50 using only inside score)

• Over 70% of gold edges are already ranked first in the local agenda
• 14 of 15 edges in these cells are unnecessary
• We can do much better than a constant beam-width
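The cumulative curve and the "14 of 15" observation can be reproduced on toy data. The sketch below assumes we already know the local rank of the gold edge in each cell; the ranks are invented for illustration and are not the Section 22 measurements.

```python
from collections import Counter

def cumulative_gold_coverage(gold_ranks, max_n):
    """Fraction of gold edges whose local rank is <= N, for N = 1..max_n."""
    counts = Counter(gold_ranks)
    total = len(gold_ranks)
    coverage = []
    running = 0
    for n in range(1, max_n + 1):
        running += counts[n]
        coverage.append(running / total)
    return coverage

# Hypothetical ranks of the gold edge in ten chart cells (1 = ranked first).
ranks = [1, 1, 1, 1, 1, 1, 1, 2, 3, 15]
coverage = cumulative_gold_coverage(ranks, 15)

# A constant beam must be as wide as the worst-ranked gold edge.
constant_beam = max(ranks)
# In a cell whose gold edge is ranked first, every other retained edge is unneeded.
unneeded_in_rank1_cell = constant_beam - 1

print(coverage[0], constant_beam, unneeded_in_rank1_cell)
```

With these toy ranks, 70% of gold edges are covered at N = 1, yet a single outlier forces a constant beam of 15, so 14 of the 15 entries kept in each rank-1 cell do no useful work.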

Page 9: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

• Method: Train an averaged perceptron (Collins, 2002) to predict the optimal beam-width per chart cell
• Map each chart cell in sentence S spanning words w_i … w_j to a feature vector representation:
  – x: Lexical and POS unigrams and bigrams, relative and absolute span
  – y: 1 if gold rank > k, 0 otherwise (a cell with no gold edge is given rank -1)
• Minimize the loss: [equation not recovered from slide]
• H is the unit step function

Page 10: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

• Method: Use a discriminative classifier to predict the optimal beam-width per chart cell
• Minimize the loss: [equation not recovered from slide]
• L is the asymmetric loss function: [equation not recovered from slide]
• If beam-width is too large, tolerable efficiency loss
• If beam-width is too small, high risk to accuracy
• Lambda set to 10^2 in all experiments
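The slide equations did not survive in this transcript. One form that is consistent with the surrounding bullets (a unit step function H, labels y = 1 when gold rank > k, and a penalty λ for beams that are too small) is sketched below. The symbols θ_k (weights of the classifier for threshold k), the index i over training cells, the prediction ŷ, and the convention H(0) = 0 are assumptions made for this sketch, not a reproduction of the original slide.

```latex
\hat{\theta}_k = \arg\min_{\theta} \sum_{i} L_\lambda\bigl(H(\theta \cdot x_i),\, y_i\bigr),
\qquad
H(z) =
\begin{cases}
1 & z > 0 \\
0 & \text{otherwise}
\end{cases}

L_\lambda(\hat{y}, y) =
\begin{cases}
0       & \hat{y} = y \\
1       & \hat{y} = 1,\ y = 0 \quad \text{(beam predicted wider than needed)} \\
\lambda & \hat{y} = 0,\ y = 1 \quad \text{(beam predicted too narrow)}
\end{cases}
```

Under this reading, a false "rank <= k" prediction costs λ times as much as a false "rank > k" prediction, which matches the stated preference for losing some efficiency over risking accuracy.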

Page 11: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

Special case: Predict if chart cell is open or closed to multi-word constituents

Page 12: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

• A “closed” chart cell may need to be partially open
• Binarized or dotted-rule parsing creates new “factored” productions:

Page 13: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

Method 1: Constituent Closure

Page 14: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

• Constituent Closure is a per-cell generalization of Roark & Hollingshead (2008)
  – O(n^2) classifications instead of O(n)

Page 15: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

Method 2: Complete Closure

Page 17: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

Method 3: Beam-Width Prediction

• Use multiple binary classifiers instead of regression (better performance)
• Local beam-width taken from classifier with smallest beam-width prediction
• Best performance with four binary classifiers: 0, 1, 2, 4
  – 97% of positive examples have beam-width <= 4
  – Don’t need a classifier for every possible beam-width value between 0 and global maximum (15 in our case)
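One way to read "local beam-width taken from classifier with smallest beam-width prediction" is: query the binary classifiers in increasing order of k and stop at the first one that predicts the gold rank does not exceed k. The sketch below assumes that reading. The thresholds and the global maximum come from the slide; the function name, the classifier interface, and the toy stand-in classifiers are hypothetical.

```python
GLOBAL_MAX_BEAM = 15        # global maximum beam-width from the slide
THRESHOLDS = (0, 1, 2, 4)   # one binary classifier per threshold k

def predict_beam_width(features, classifiers,
                       thresholds=THRESHOLDS, global_max=GLOBAL_MAX_BEAM):
    """Return the smallest k whose classifier predicts 'gold rank <= k'.

    classifiers[k](features) is assumed to return True when that classifier
    predicts the gold rank exceeds k, i.e. the beam must be wider than k.
    """
    for k in sorted(thresholds):
        if not classifiers[k](features):
            return k
    # Every classifier asked for a wider beam: fall back to the global maximum.
    return global_max

# Toy stand-ins for trained perceptrons: each fires when a made-up
# "difficulty" feature exceeds its threshold.
toy_classifiers = {
    k: (lambda feats, k=k: feats["difficulty"] > k) for k in THRESHOLDS
}

widths = [predict_beam_width({"difficulty": d}, toy_classifiers)
          for d in (0, 2, 3, 9)]
print(widths)
```

A predicted width of 0 closes the cell entirely, so the open/closed decision falls out as the k = 0 special case of the same procedure.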

Page 18: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

Page 19: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

Page 20: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

• Section 22 development set results
• Decoding time is seconds per sentence averaged over all sentences in Section 22
• Parsing with Berkeley latent variable grammar (4.3 million productions)

Parser                      Secs/Sent  Speedup  F1
CYK                         70.383              89.4
CYK + Constituent Closure   47.870     1.5x     89.3
CYK + Complete Closure      32.619     2.2x     89.3

Page 23: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

Parser                      Secs/Sent  Speedup  F1
CYK                         70.383              89.4
CYK + Constituent Closure   47.870     1.5x     89.3
CYK + Complete Closure      32.619     2.2x     89.3
Beam + Inside FOM (BI)      3.977               89.2
BI + Constituent Closure    2.033      2.0x     89.2
BI + Complete Closure       1.575      2.5x     89.3
BI + Beam-Predict           1.180      3.4x     89.3
Beam + Boundary FOM (BB)    0.326               89.2
BB + Constituent Closure    0.279      1.2x     89.2
BB + Complete Closure       0.199      1.6x     89.3
BB + Beam-Predict           0.143      2.3x     89.3
Most recent numbers         0.053      6.2x     89.x
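The Speedup column appears to be each system's time divided into the time of the baseline in its own group (CYK, BI, or BB). The sketch below recomputes the column from the Secs/Sent values in the table under that assumption. Treating BB as the baseline for the "Most recent numbers" row is an inference from the arithmetic, not something the slide states.

```python
def speedup(baseline_secs, system_secs):
    """Speedup of a system relative to its group's baseline."""
    return baseline_secs / system_secs

# (baseline secs/sent, {system: secs/sent}), copied from the table above.
groups = {
    "CYK": (70.383, {"CYK + Constituent Closure": 47.870,
                     "CYK + Complete Closure": 32.619}),
    "BI": (3.977, {"BI + Constituent Closure": 2.033,
                   "BI + Complete Closure": 1.575,
                   "BI + Beam-Predict": 1.180}),
    "BB": (0.326, {"BB + Constituent Closure": 0.279,
                   "BB + Complete Closure": 0.199,
                   "BB + Beam-Predict": 0.143,
                   "Most recent numbers": 0.053}),  # baseline assumed to be BB
}

computed = {}
for baseline_secs, systems in groups.values():
    for name, secs in systems.items():
        computed[name] = f"{speedup(baseline_secs, secs):.1f}x"

for name, value in computed.items():
    print(f"{name}: {value}")
```

Because each speedup is relative to its own group, the 2.3x for BB + Beam-Predict is on top of the roughly 200x gap between exhaustive CYK (70.383 s) and the BB baseline (0.326 s).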

Page 24: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

• Section 23 test results
• Only MaxRule is marginalizing over latent variables and performing non-Viterbi decoding

Parser                                                       Secs/Sent  F1
CYK                                                          64.610     88.7
Berkeley CTF MaxRule (Petrov and Klein, 2007)                0.213      90.2
Berkeley CTF Viterbi                                         0.208      88.8
Beam + Boundary FOM (BB) (Caraballo and Charniak, 1998)      0.334      88.6
BB + Chart Constraints (Roark and Hollingshead, 2008; 2009)  0.244      88.7
BB + Beam-Prediction                                         0.125      88.7

Page 25: Beam-Width Prediction for Efficient Context-Free Parsing

Thanks.

Page 26: Beam-Width Prediction for Efficient Context-Free Parsing

Beam-Width Prediction

Page 27: Beam-Width Prediction for Efficient Context-Free Parsing

FOM Details

• C&C FOM Details
  – FOM(NT) = Outside_left * Inside * Outside_right
  – Inside = Accumulated grammar score
  – Outside_left = Max_POS [ POS forward prob * POS-to-NT transition prob ]
  – Outside_right = Max_POS [ NT-to-POS transition prob * POS backward prob ]
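The three factors above translate directly into a small function. This is a minimal sketch in probability space, assuming `forward[p]` is the forward probability of tag p at the word just left of the span and `backward[p]` is the backward probability of tag p at the word just right of it. The function name, the dictionary layout, and all numbers are hypothetical.

```python
def boundary_fom(inside, nt, pos_tags, forward, backward, pos_to_nt, nt_to_pos):
    """FOM(NT) = Outside_left * Inside * Outside_right.

    Missing transition entries are treated as probability 0.
    """
    # Outside_left: best POS tag on the left boundary leading into NT.
    outside_left = max(forward[p] * pos_to_nt.get((p, nt), 0.0)
                       for p in pos_tags)
    # Outside_right: best POS tag on the right boundary following NT.
    outside_right = max(nt_to_pos.get((nt, p), 0.0) * backward[p]
                        for p in pos_tags)
    return outside_left * inside * outside_right

# Toy values chosen to be exact in binary floating point.
pos_tags = ["DT", "VB"]
forward = {"DT": 0.5, "VB": 0.25}
backward = {"DT": 0.5, "VB": 0.5}
pos_to_nt = {("DT", "NP"): 0.5, ("VB", "NP"): 0.5}
nt_to_pos = {("NP", "DT"): 0.25, ("NP", "VB"): 0.5}

fom = boundary_fom(0.5, "NP", pos_tags, forward, backward, pos_to_nt, nt_to_pos)
print(fom)
```

A real parser would more likely work with log-probabilities and sums to avoid underflow on long sentences; products are used here only to mirror the formula on the slide.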
