Fast Full Parsing by Linear-Chain Conditional Random Fields
Yoshimasa Tsuruoka, Jun’ichi Tsujii, and Sophia Ananiadou
The University of Manchester
Outline
• Motivation
• Parsing algorithm
  – Chunking with conditional random fields
  – Searching for the best parse
• Experiments
  – Penn Treebank
• Conclusions
Motivation
• Parsers are useful in many NLP applications
  – Information extraction, summarization, MT, etc.
• But parsing is often the most computationally expensive component in the NLP pipeline
• Fast parsing is useful when
  – the document collection is large
    • e.g. MEDLINE corpus: 70 million sentences
  – real-time processing is required
    • e.g. web applications
Parsing algorithms
• History-based approaches
  – Bottom-up & left-to-right (Ratnaparkhi, 1997)
  – Shift-reduce (Sagae & Lavie, 2006)
• Global modeling
  – Tree CRFs (Finkel et al., 2008; Petrov & Klein, 2008)
  – Reranking (Collins, 2000; Charniak & Johnson, 2005)
  – Forest (Huang, 2008)
Chunk parsing
• Parsing algorithm
  1. Identify phrases in the sequence.
  2. Convert the recognized phrases into new non-terminal symbols.
  3. Go back to 1.
• Previous work
  – Memory-based learning (Tjong Kim Sang, 2001)
    • F-score: 80.49
  – Maximum entropy (Tsuruoka and Tsujii, 2005)
    • F-score: 85.9
Parsing a sentence

Estimated volume was a light 2.4 million ounces .
  VBN      NN    VBD DT  JJ  CD    CD    NNS   .

1st iteration (base chunking):
  [NP Estimated/VBN volume/NN] was/VBD a/DT light/JJ [QP 2.4/CD million/CD] ounces/NNS ./.

2nd iteration:
  NP was/VBD [NP a/DT light/JJ QP ounces/NNS] ./.

3rd iteration:
  NP [VP was/VBD NP] ./.

4th iteration:
  [S NP VP ./.]

5th iteration:
  S — a single symbol remains, so parsing is complete.

Complete parse tree:
  (S (NP Estimated/VBN volume/NN)
     (VP was/VBD
         (NP a/DT light/JJ
             (QP 2.4/CD million/CD)
             ounces/NNS))
     ./.)
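The iterations above amount to a simple loop: chunk, collapse, repeat. A minimal sketch (illustrative only; `chunk` stands in for the CRF chunker, and `toy_chunk` is a hypothetical stand-in that reproduces just the first pass of the example):

```python
# Cascaded chunk parsing: repeatedly recognize phrases and collapse each one
# into its new non-terminal symbol until no further chunk is found.
# (Sketch only; a real chunker would be the CRF model, not toy_chunk.)

def parse(symbols, chunk):
    """symbols: POS tags / non-terminals; chunk: returns (start, end, label) spans."""
    tree = list(symbols)
    while True:
        phrases = chunk(tree)
        if not phrases:
            return tree
        # Collapse right-to-left so earlier span indices stay valid.
        for start, end, label in sorted(phrases, reverse=True):
            tree[start:end] = [label]

def toy_chunk(tags):
    # Reproduces only the 1st iteration of the example sentence.
    if tags[:2] == ["VBN", "NN"]:
        return [(0, 2, "NP"), (5, 7, "QP")]
    return []

print(parse(["VBN", "NN", "VBD", "DT", "JJ", "CD", "CD", "NNS", "."], toy_chunk))
# → ['NP', 'VBD', 'DT', 'JJ', 'QP', 'NNS', '.']
```

Collapsing right-to-left keeps the span indices of later phrases valid while earlier ones are replaced.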
Chunking with CRFs
• Conditional random fields (CRFs)
• Features are defined on states and state transitions
  $P(y_1 \ldots y_n \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp \sum_{t=1}^{n} \sum_{i} \lambda_i f_i(y_{t-1}, y_t, \mathbf{x}, t)$

  ($f_i$: feature function, $\lambda_i$: feature weight, $Z(\mathbf{x})$: normalization constant)
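The distribution can be illustrated by brute force on a two-token toy example. The features and weights below are made up, not the paper's model, and real implementations compute $Z(\mathbf{x})$ with the forward algorithm rather than by enumerating every tag sequence:

```python
import itertools, math

# Brute-force illustration of the linear-chain CRF distribution
#   P(y_1..y_n | x) = exp(sum_t sum_i lambda_i f_i(y_{t-1}, y_t, x, t)) / Z(x)
# with hypothetical features and weights.

TAGS = ["B", "I", "O"]

def score(ys, xs, w):
    """Weighted feature sum for one tag sequence."""
    s, prev = 0.0, "<s>"
    for t, y in enumerate(ys):
        s += w.get((prev, y), 0.0)     # state-transition feature
        s += w.get((xs[t], y), 0.0)    # observation (state) feature
        prev = y
    return s

def prob(ys, xs, w):
    # Z(x): sum over all candidate tag sequences (exponential; toy-sized only).
    Z = sum(math.exp(score(list(cand), xs, w))
            for cand in itertools.product(TAGS, repeat=len(xs)))
    return math.exp(score(ys, xs, w)) / Z

w = {("<s>", "B"): 1.0, ("B", "I"): 1.5, ("NN", "I"): 0.5}
print(round(prob(["B", "I"], ["VBN", "NN"], w), 3))
```

The normalizer guarantees the probabilities of all candidate tag sequences sum to one.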
Chunking with “IOB” tagging

Estimated volume was a light 2.4  million ounces .
  VBN      NN    VBD DT  JJ  CD   CD      NNS   .
  B-NP     I-NP  O   O   O   B-QP I-QP    O     O
  \___NP___/                 \___QP___/

B: Beginning of a chunk
I: Inside (continuation) of the chunk
O: Outside of chunks
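The mapping from chunk spans to B/I/O tags can be sketched as follows (a hypothetical helper for illustration; span ends are exclusive):

```python
# Convert chunk spans into the B/I/O tag sequence used by the CRF chunker.

def to_iob(n, chunks):
    """n: number of tokens; chunks: list of (start, end, label) with end exclusive."""
    tags = ["O"] * n
    for start, end, label in chunks:
        tags[start] = "B-" + label            # chunk beginning
        for i in range(start + 1, end):
            tags[i] = "I-" + label            # chunk continuation
    return tags

# "Estimated volume" is an NP and "2.4 million" a QP in the 9-token example:
print(to_iob(9, [(0, 2, "NP"), (5, 7, "QP")]))
# → ['B-NP', 'I-NP', 'O', 'O', 'O', 'B-QP', 'I-QP', 'O', 'O']
```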
Features for base chunking
• The chunker tags one position at a time (“?” below), using the surrounding words and POS tags as features:

  Estimated volume was a light 2.4 million ounces .
  VBN       NN    VBD DT  JJ  CD   CD     NNS   .
                   ?

Features for non-base chunking
• In later iterations the sequence contains non-terminal symbols; the words and POS tags inside each symbol (here, the NP covers “Estimated/VBN volume/NN”) are also available as features:

  NP  was a light QP ounces .
  NP  VBD DT JJ   QP NNS   .
       ?
Finding the best parse
• Scoring the entire parse tree:

  $\mathrm{score} = \sum_{i=0}^{h} \log p(\mathbf{y}_i \mid \mathbf{x}_i)$

  where $p(\mathbf{y}_i \mid \mathbf{x}_i)$ is the probability the CRF assigns to the chunking decision at level $i$, and $h$ is the number of chunking levels.
• The best derivation can be found by depth-first search.
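A small sketch of the scoring: the tree score is just the sum of log-probabilities over the levels, so maximizing it is the same as maximizing the product of the per-level CRF probabilities (the probabilities below are made up for illustration):

```python
import math

# Tree score = sum of log-probabilities of the chunking decisions at
# levels 0..h (level 0 being POS tagging).

def tree_score(level_probs):
    return sum(math.log(p) for p in level_probs)

# Five levels (POS tagging + four chunking passes), hypothetical probabilities:
print(round(tree_score([0.9, 0.8, 0.95, 0.7, 0.99]), 4))
```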
Depth-first search

[Figure: the search tree — POS tagging at the root branches into candidate base chunkings, each of which branches into candidate chunkings at the next level, and so on.]
Extracting multiple hypotheses from the CRF
• A* search
  – uses a priority queue
  – suitable when the top n hypotheses are needed
• Branch-and-bound
  – depth-first
  – suitable when a probability threshold is given

Example: tag sequences output by the CRF, ranked by probability
  B I O O O B   0.3
  B I I O O B   0.2
  B I O O O O   0.18
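The branch-and-bound variant can be sketched as a depth-first enumeration that prunes a branch as soon as an optimistic bound on its completions falls below the threshold. Everything below is hypothetical: the toy locally-normalized transition model `P` stands in for the trained CRF, and the bound is a simple per-position maximum, not the paper's implementation:

```python
import math

# Branch-and-bound extraction of all tag sequences whose probability
# exceeds a given threshold (toy transition model, for illustration only).

TAGS = ["B", "I", "O"]

P = {"<s>": {"B": 0.6, "I": 0.1, "O": 0.3},
     "B":   {"B": 0.1, "I": 0.7, "O": 0.2},
     "I":   {"B": 0.2, "I": 0.5, "O": 0.3},
     "O":   {"B": 0.4, "I": 0.1, "O": 0.5}}

def logp(t, prev, y):
    return math.log(P[prev][y])

def enumerate_hypotheses(logp, n, threshold):
    """Return all length-n tag sequences with probability > threshold."""
    # Optimistic per-position bound: best achievable log-prob at each step.
    best = [max(logp(t, prev, y) for prev in P for y in TAGS)
            for t in range(n)]
    suffix = [0.0] * (n + 1)          # bound on the remaining positions
    for t in range(n - 1, -1, -1):
        suffix[t] = suffix[t + 1] + best[t]

    results = []
    def dfs(t, prev, acc, seq):
        if acc + suffix[t] <= math.log(threshold):
            return                    # even the best completion is too improbable
        if t == n:
            results.append(("".join(seq), math.exp(acc)))
            return
        for y in TAGS:
            dfs(t + 1, y, acc + logp(t, prev, y), seq + [y])
    dfs(0, "<s>", 0.0, [])
    return sorted(results, key=lambda r: -r[1])

for seq, p in enumerate_hypotheses(logp, 3, 0.1):
    print(seq, round(p, 3))
```

Because the bound is an upper estimate, pruning never discards a sequence that actually clears the threshold.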
Experiments
• Penn Treebank Corpus
  – Training: sections 2–21
  – Development: section 22
  – Evaluation: section 23
• Training
  – Three CRF models:
    • part-of-speech tagger
    • base chunker
    • non-base chunker
  – Took 2 days on an AMD Opteron 2.2 GHz
Training the CRF chunkers
• Maximum likelihood + L1 regularization:

  $L = \sum_{j} \log p(\mathbf{y}_j \mid \mathbf{x}_j) - C \sum_{i} |\lambda_i|$

• L1 regularization helps avoid overfitting and produces compact models
  – OWL-QN algorithm (Andrew and Gao, 2007)
Chunking performance (section 22, all sentences)

Symbol   # Samples   Recall   Precision   F-score
NP         317,597    94.79       94.16     94.47
VP          76,281    91.46       91.98     91.72
PP          66,979    92.84       92.61     92.72
S           33,739    91.48       90.64     91.06
ADVP        21,686    84.25       85.86     85.05
ADJP        14,422    77.27       78.46     77.86
 :               :        :           :         :
All        579,253    92.63       92.62     92.63
Beam width and parsing performance (section 22, all 1,700 sentences)

Beam   Recall   Precision   F-score   Time (sec)
1       86.72       87.83     87.27           16
2       88.50       88.85     88.67           41
3       88.69       89.08     88.88           61
4       88.72       89.13     88.92           92
5       88.73       89.14     88.93          119
10      88.68       89.19     88.93          179
Comparison with other parsers (section 23, all 2,416 sentences)

                            Recall   Prec.   F-score   Time (min)
This work (deterministic)     86.3    87.5      86.9          0.5
This work (beam = 4)          88.2    88.7      88.4          1.7
Huang (2008)                     -       -      91.7          Unk
Finkel et al. (2008)          87.8    88.2      88.0         >250
Petrov & Klein (2008)            -       -      88.3            3
Sagae & Lavie (2006)          87.8    88.1      87.9           17
Charniak & Johnson (2005)     90.6    91.3      91.0          Unk
Charniak (2000)               89.6    89.5      89.5           23
Collins (1999)                88.1    88.3      88.2           39
Discussions
• Improving chunking accuracy
  – Semi-Markov CRFs (Sarawagi and Cohen, 2004)
  – higher-order CRFs
• Increasing the size of the training data
  – create a treebank by parsing a large number of sentences with an accurate parser
  – train the fast parser on that treebank
Conclusion
• Full parsing by cascaded chunking
  – chunking with CRFs
  – depth-first search
• Performance
  – F-score = 86.9 (12 msec/sentence)
  – F-score = 88.4 (42 msec/sentence)
• Available soon