Fast Full Parsing by Linear-Chain Conditional Random Fields Yoshimasa Tsuruoka, Jun’ichi Tsujii, and Sophia Ananiadou The University of Manchester


Page 1: Fast Full Parsing by Linear-Chain Conditional Random Fields

Fast Full Parsing by Linear-Chain Conditional Random Fields

Yoshimasa Tsuruoka, Jun’ichi Tsujii, and Sophia Ananiadou

The University of Manchester

Page 2

Outline

• Motivation
• Parsing algorithm
  – Chunking with conditional random fields
  – Searching for the best parse
• Experiments
  – Penn Treebank
• Conclusions

Page 3

Motivation

• Parsers are useful in many NLP applications
  – Information extraction, summarization, MT, etc.
• But parsing is often the most computationally expensive component in the NLP pipeline
• Fast parsing is useful when
  – the document collection is large
    • e.g. the MEDLINE corpus: 70 million sentences
  – real-time processing is required
    • e.g. web applications

Page 4

Parsing algorithms

• History-based approaches
  – Bottom-up & left-to-right (Ratnaparkhi, 1997)
  – Shift-reduce (Sagae & Lavie, 2006)
• Global modeling
  – Tree CRFs (Finkel et al., 2008; Petrov & Klein, 2008)
  – Reranking (Collins, 2000; Charniak & Johnson, 2005)
  – Forest (Huang, 2008)

Page 5

Chunk parsing

• Parsing algorithm
  1. Identify phrases in the sequence.
  2. Convert the recognized phrases into new non-terminal symbols.
  3. Go back to 1.
• Previous work
  – Memory-based learning (Tjong Kim Sang, 2001): F-score 80.49
  – Maximum entropy (Tsuruoka and Tsujii, 2005): F-score 85.9
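The three-step loop above can be sketched in Python. Here `chunker` is a hypothetical stand-in for the learned chunking model (it returns non-overlapping `(start, end, label)` spans for a sequence of symbols); it is not part of the original slides.

```python
def chunk_parse(symbols, chunker, max_iters=100):
    """Sketch of cascaded chunk parsing: repeatedly chunk the sequence,
    replacing each recognized phrase with its non-terminal symbol, until
    nothing is recognized or only the root symbol remains."""
    # Each item is a subtree: (symbol,) for a leaf, (label, children) otherwise.
    tree = [(s,) for s in symbols]
    for _ in range(max_iters):
        labels = [t[0] for t in tree]
        spans = chunker(labels)          # step 1: identify phrases
        if not spans:
            break
        # Step 2: collapse each span into a new non-terminal node.
        # Process right-to-left so earlier indices stay valid.
        for start, end, label in sorted(spans, reverse=True):
            tree[start:end] = [(label, tree[start:end])]
        if len(tree) == 1:               # reached the root
            break
    return tree                          # step 3: loop
```

A toy rule-based `chunker` is enough to exercise the loop; with real CRF chunkers each iteration would instead decode the current symbol sequence.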

Page 6

Parsing a sentence

[Figure: full parse tree of "Estimated volume was a light 2.4 million ounces ." with POS tags VBN NN VBD DT JJ CD CD NNS . and phrase nodes QP, NP, VP, NP, S]

Page 7

[Figure: 1st iteration. The sequence "Estimated volume was a light 2.4 million ounces ." (VBN NN VBD DT JJ CD CD NNS .) with recognized chunks NP and QP]

Page 8

[Figure: 2nd iteration. The sequence "volume was a light million ounces ." (NP VBD DT JJ QP NNS .) with a recognized chunk NP]

Page 9

[Figure: 3rd iteration. The sequence "volume was ounces ." (NP VBD NP .) with a recognized chunk VP]

Page 10

[Figure: 4th iteration. The sequence "volume was ." (NP VP .) with a recognized chunk S]

Page 11

[Figure: 5th iteration. The sequence is reduced to the single root symbol S]

Page 12

[Figure: the complete parse tree of "Estimated volume was a light 2.4 million ounces ." with POS tags VBN NN VBD DT JJ CD CD NNS . and phrase nodes QP, NP, VP, NP, S]

Complete parse tree

Page 13

Chunking with CRFs

• Conditional random fields (CRFs)

• Features are defined on states and state transitions

P(y_1 ... y_n | x) = (1/Z(x)) exp( Σ_{t=1}^{n} Σ_i λ_i f_i(y_{t-1}, y_t, x, t) )

where f_i is a feature function, λ_i its feature weight, and Z(x) the normalization constant.

[Figure: the sentence "Estimated volume was a light 2.4 million ounces ." with POS tags VBN NN VBD DT JJ CD CD NNS . and chunks NP and QP]

Page 14

Estimated volume was a light 2.4 million ounces .

VBN NN VBD DT JJ CD CD NNS .

Chunking with “IOB” tagging

B-NP I-NP O O O B-QP I-QP O O

NP QP

B: beginning of a chunk
I: inside (continuation) of a chunk
O: outside of any chunk
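The mapping from chunk spans to IOB tags can be written out directly. The `(start, end, label)` span format (with `end` exclusive) is an assumption for illustration, not from the slides.

```python
def spans_to_iob(n_tokens, spans):
    """Convert non-overlapping chunk spans into a per-token IOB tag sequence:
    B- marks the first token of a chunk, I- its continuation, O everything else."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags
```

For the slide's sentence, the NP span over "Estimated volume" and the QP span over "2.4 million" yield exactly the tag sequence shown: B-NP I-NP O O O B-QP I-QP O O.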

Page 15

Features for base chunking

[Figure: feature extraction for base chunking over "Estimated volume was a light 2.4 million ounces ." (VBN NN VBD DT JJ CD CD NNS .), with one position being tagged]

Page 16

Features for non-base chunking

[Figure: feature extraction for non-base chunking over "volume was a light million ounces ." (NP VBD DT JJ QP NNS .); the collapsed NP still carries its internal words "Estimated volume" (VBN NN)]

Page 17

Finding the best parse

• Scoring the entire parse tree

• The best derivation can be found by depth-first search.

score = Π_{i=0}^{h} p_i(y_i | x_i)

where p_i(y_i | x_i) is the probability of the chunking result y_i at iteration i, and h is the number of iterations.

Page 18

Depth-first search

[Figure: search tree. POS tagging is followed by base chunking, and each chunking hypothesis branches into further chunking steps, explored depth-first]
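Slides 17 and 18 together describe a depth-first search over chunking derivations, scoring each complete parse by the product of its step probabilities. A minimal sketch follows; `top_hypotheses(labels, beam)` is a hypothetical stand-in for the CRF chunker that returns up to `beam` `(spans, probability)` hypotheses per step, and is not from the slides.

```python
import math

def best_parse(symbols, top_hypotheses, beam=4):
    """Depth-first search over cascaded chunking derivations, keeping the
    complete parse whose product of step probabilities (summed in log space)
    is highest."""
    best = {"score": -math.inf, "tree": None}

    def dfs(tree, log_score):
        if len(tree) == 1:                     # reached the root: a complete parse
            if log_score > best["score"]:
                best["score"], best["tree"] = log_score, tree
            return
        labels = [t[0] for t in tree]
        for spans, prob in top_hypotheses(labels, beam):
            if not spans:                      # no progress on this branch
                continue
            new_tree = list(tree)
            # Collapse spans right-to-left so earlier indices stay valid.
            for start, end, label in sorted(spans, reverse=True):
                new_tree[start:end] = [(label, new_tree[start:end])]
            dfs(new_tree, log_score + math.log(prob))

    dfs([(s,) for s in symbols], 0.0)
    return best["tree"], best["score"]
```

With `beam=1` this degenerates into the deterministic parser of slide 25; larger beams explore alternative chunking hypotheses at each level.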

Page 19

Finding the best parse

Page 20

Extracting multiple hypotheses from CRF

• A* search– Uses a priority queue– Suitable when top n hypotheses are needed

• Branch-and-bound– Depth-first– Suitable when a probability threshold is given

[Figure: the CRF produces multiple tag-sequence hypotheses with probabilities, e.g. BIOOOB (0.3), BIIOOB (0.2), BIOOOO (0.18)]
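The branch-and-bound idea on this slide, enumerating every tag sequence above a probability threshold depth-first, can be illustrated on a simplified model. For the sketch the model is reduced to independent per-position tag distributions (`dists`, a list of `{tag: prob}` dicts); a real linear-chain CRF would also carry transition scores, but the pruning logic is the same.

```python
def enumerate_above(dists, threshold):
    """Depth-first branch-and-bound: extend tag prefixes position by position,
    pruning any prefix whose probability already falls below the threshold
    (extending a prefix can never increase its probability)."""
    results = []

    def expand(prefix, prob):
        if prob < threshold:                 # bound: prune this branch
            return
        if len(prefix) == len(dists):        # complete sequence above threshold
            results.append(("".join(prefix), prob))
            return
        for tag, p in dists[len(prefix)].items():
            expand(prefix + [tag], prob * p)

    expand([], 1.0)
    return sorted(results, key=lambda r: -r[1])
```

As the slide notes, this depth-first variant suits a fixed probability threshold, while an A* search with a priority queue suits a fixed number of top hypotheses.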

Page 21

Experiments

• Penn Treebank corpus
  – Training: sections 2-21
  – Development: section 22
  – Evaluation: section 23
• Training
  – Three CRF models
    • Part-of-speech tagger
    • Base chunker
    • Non-base chunker
  – Took 2 days on an AMD Opteron 2.2 GHz

Page 22

Training the CRF chunkers

• Maximum likelihood + L1 regularization

• L1 regularization helps avoid overfitting and produces compact models
  – OWL-QN algorithm (Andrew and Gao, 2007)

L = Σ_j log p(y_j | x_j) - C Σ_i |λ_i|

where C controls the strength of the L1 penalty on the feature weights λ_i.
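The objective above is simple enough to write out directly. This toy function only evaluates the objective for given per-example log-likelihoods and weights; it is an illustration, not the OWL-QN optimizer itself.

```python
def l1_objective(log_probs, weights, C):
    """L1-regularized log-likelihood: the sum of per-example log-probabilities
    log p(y_j | x_j) minus C times the L1 norm of the feature weights.
    The penalty drives many weights to exactly zero, giving compact models."""
    return sum(log_probs) - C * sum(abs(w) for w in weights)
```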

Page 23

Chunking performance

Symbol   # Samples   Recall   Precision   F-score
NP         317,597    94.79       94.16     94.47
VP          76,281    91.46       91.98     91.72
PP          66,979    92.84       92.61     92.72
S           33,739    91.48       90.64     91.06
ADVP        21,686    84.25       85.86     85.05
ADJP        14,422    77.27       78.46     77.86
...            ...      ...         ...       ...
All        579,253    92.63       92.62     92.63

Section 22, all sentences

Page 24

Beam width and parsing performance

Beam   Recall   Precision   F-score   Time (sec)
1       86.72       87.83     87.27           16
2       88.50       88.85     88.67           41
3       88.69       89.08     88.88           61
4       88.72       89.13     88.92           92
5       88.73       89.14     88.93          119
10      88.68       89.19     88.93          179

Section 22, all sentences (1,700 sentences)

Page 25

Comparison with other parsers

Parser                       Recall   Prec.   F-score   Time (min)
This work (deterministic)      86.3    87.5      86.9          0.5
This work (beam = 4)           88.2    88.7      88.4          1.7
Huang (2008)                      -       -      91.7          Unk
Finkel et al. (2008)           87.8    88.2      88.0         >250
Petrov & Klein (2008)             -       -      88.3            3
Sagae & Lavie (2006)           87.8    88.1      87.9           17
Charniak & Johnson (2005)      90.6    91.3      91.0          Unk
Charniak (2000)                89.6    89.5      89.5           23
Collins (1999)                 88.1    88.3      88.2           39

Section 23, all sentences (2,416 sentences)

Page 26

Discussions

• Improving chunking accuracy
  – Semi-Markov CRFs (Sarawagi and Cohen, 2004)
  – Higher-order CRFs
• Increasing the size of the training data
  – Create a treebank by parsing a large number of sentences with an accurate parser
  – Train the fast parser on that treebank

Page 27

Conclusion

• Full parsing by cascaded chunking
  – Chunking with CRFs
  – Depth-first search
• Performance
  – F-score = 86.9 (12 msec/sentence)
  – F-score = 88.4 (42 msec/sentence)
• Available soon