![Page 1: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/1.jpg)
Slav Petrov – Google
Thanks to:
Dan Klein, Ryan McDonald, Alexander Rush, Joakim Nivre, Greg Durrett, David Weiss
Lisbon Machine Learning School 2016
Syntax and Parsing I Constituency Parsing
![Page 2: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/2.jpg)
XVP
Analyzing Natural Language
[ ][ ]
X X
X
XS
VP
PPNPXVX
NP
P
They solved the problem with statistics
![Page 3: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/3.jpg)
NP
VP
S
V
NP
P PPNP
Syntax and Semantics
They solved the problem with statistics
![Page 4: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/4.jpg)
PRON VERB DET NOUN ADP NOUN
nsubj
det
dobj
prep
pobjROOT
Constituency and Dependency
They solved the problem with statistics
![Page 5: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/5.jpg)
prep
PRON VERB DET NOUN ADP NOUN
nsubj
det
dobjpobjROOT
Constituency and Dependency
They solved the problem with statistics
![Page 6: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/6.jpg)
A “real” Sentence
Influential members of the House Ways and Means Committee introduced legislation that would restrict how the new
savings-and-loan bailout agency can raise capital, creating another potential obstacle to the government's sale of sick thrifts.
![Page 7: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/7.jpg)
Phrase Structure Parsing
• Phrase structure parsing organizes syntax into constituents or brackets
• In general, this involves nested trees
• Linguists can, and do, argue about details
• Lots of ambiguity
• Not the only kind of syntax…
• First part of today’s lecture
new art critics write reviews with computers
![Page 8: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/8.jpg)
Dependency Parsing
• Directed edges between pairs of word (head, dependent)
• Can handle free word-order languages
• Very efficient decoding algorithms exist
• Second part of today’s lecture
![Page 9: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/9.jpg)
Classical NLP: Parsing• Write symbolic or logical rules:
• Use deduction systems to prove parses from words • Minimal grammar on “Fed raises” sentence: 36 parses • Real-size grammar: many millions of parses
• This scaled very badly, didn’t yield broad-coverage tools
Grammar (CFG) LexiconROOT → S S → NP VP NP → DT NN NP → NN NNS
NN → interest NNS → raises VBP → interest VBZ → raises
NP → NP PP VP → VBP NP VP → VBP NP PP PP → IN NP
Fed raises interest rates 0.5 percentNNP NNS NN NNS CD NNVBN VBZ VBP VBZVBD VB
![Page 10: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/10.jpg)
Attachments
• I cleaned the dishes from dinner
• I cleaned the dishes with detergent
• I cleaned the dishes in my pajamas
• I cleaned the dishes in the sink
![Page 11: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/11.jpg)
Probabilistic Context-Free Grammars
• A context-free grammar is a tuple <N, T, S, R> • N : the set of non-terminals
• Phrasal categories: S, NP, VP, ADJP, etc.
• Parts-of-speech (pre-terminals): NN, JJ, DT, VB
• T : the set of terminals (the words) • S : the start symbol
• Often written as ROOT or TOP
• Not usually the sentence non-terminal S
• R : the set of rules • Of the form X → Y1 Y2 … Yk, with X, Yi ∈ N
• Examples: S → NP VP, VP → VP CC VP • Also called rewrites, productions, or local trees
• A PCFG adds: • A top-down production probability per rule P(Y1 Y2 … Yk | X)
![Page 12: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/12.jpg)
Treebank Grammars• Need a PCFG for broad coverage parsing. • Can take a grammar right off the trees (doesn’t work well):
• Better results by enriching the grammar (e.g., lexicalization). • Can also get reasonable parsers without lexicalization.
S → NP VP . 1.0 NP → PRP 0.5
NP → DT NN 0.5
VP → VBD NP 1.0
PRP → She 1.0
…
![Page 13: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/13.jpg)
PLURAL NOUN
NOUNDETDET
ADJNOUN
NP NP
CONJ
NP PP
Treebank Grammar Scale• Treebank grammars can be enormous
• As FSAs, the raw grammar has ~10K states, excluding the lexicon • Better parsers usually make the grammars larger, not smaller
NP
![Page 14: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/14.jpg)
Chomsky Normal Form• Chomsky normal form:
• All rules of the form X → Y Z or X → w • In principle, this is no limitation on the space of (P)CFGs
• N-ary rules introduce new non-terminals
• Unaries / empties are “promoted”
• In practice it’s kind of a pain: • Reconstructing n-aries is easy • Reconstructing unaries is trickier • The straightforward transformations don’t preserve tree scores
• Makes parsing algorithms simpler!
VP
[VP → VBD NP •]
VBD NP PP PP
[VP → VBD NP PP •]
VBD NP PP PP
VP
![Page 15: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/15.jpg)
A Recursive Parser
• Will this parser work? • Why or why not? • Memory requirements?
bestScore(X,i,j,s) if (j = i+1) return tagScore(X,s[i]) else return max score(X->YZ) *
bestScore(Y,i,k) * bestScore(Z,k,j)
![Page 16: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/16.jpg)
A Memoized Parser
• One small change:
bestScore(X,i,j,s) if (scores[X][i][j] == null)
if (j = i+1) score = tagScore(X,s[i]) else score = max score(X->YZ) *
bestScore(Y,i,k) * bestScore(Z,k,j)
scores[X][i][j] = score return scores[X][i][j]
![Page 17: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/17.jpg)
• Can also organize things bottom-up
A Bottom-Up Parser (CKY)
bestScore(s) for (i : [0,n-1])
for (X : tags[s[i]]) score[X][i][i+1] =
tagScore(X,s[i]) for (diff : [2,n])
for (i : [0,n-diff]) j = i + diff for (X->YZ : rule)
for (k : [i+1, j-1]) score[X][i][j] = max score[X][i][j],
score(X->YZ) * score[Y][i][k] * score[Z][k][j]
Y Z
X
i k j
![Page 18: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/18.jpg)
Time: Theory• How much time will it take to parse?
• For each diff (<= n) • For each i (<= n)
• For each rule X → Y Z • For each split point k
Do constant work
• Total time: |rules|*n3 • Something like 5 sec for an unoptimized parse of a
20-word sentences, or 0.2sec for an optimized parser
Y Z
X
i k j
![Page 19: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/19.jpg)
Unary Rules
• Unary rules?
bestScore(X,i,j,s) if (j = i+1) return tagScore(X,s[i]) else return max max score(X->YZ) * bestScore(Y,i,k) * bestScore(Z,k,j) max score(X->Y) * bestScore(Y,i,j)
![Page 20: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/20.jpg)
CNF + Unary Closure• We need unaries to be non-cyclic
• Can address by pre-calculating the unary closure • Rather than having zero or more unaries, always have
exactly one
• Alternate unary and binary layers • Reconstruct unary chains afterwards
NP
DT NN
VP
VBDNP
DT NN
VP
VBD NP
VP
S
SBAR
VP
SBAR
![Page 21: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/21.jpg)
Alternating Layers
bestScoreU(X,i,j,s) if (j = i+1) return tagScore(X,s[i]) else return max max score(X->Y) * bestScoreB(Y,i,j)
bestScoreB(X,i,j,s) return max max score(X->YZ) * bestScoreU(Y,i,k) * bestScoreU(Z,k,j)
![Page 22: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/22.jpg)
Treebank Grammars• Need a PCFG for broad coverage parsing. • Can take a grammar right off the trees (doesn’t work well):
• Better results by enriching the grammar (e.g., lexicalization). • Can also get reasonable parsers without lexicalization.
S → NP VP . 1.0 NP → PRP 0.5
NP → DT NN 0.5
VP → VBD NP 1.0
PRP → She 1.0
…
[Charniak ‘96]
Model F1Charniak ’96 72.0
![Page 23: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/23.jpg)
Conditional Independence?• Not every NP expansion can fill every NP slot
• A grammar with symbols like “NP” won’t be context-free
• Statistically, conditional independence too strong
![Page 24: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/24.jpg)
Non-Independence• Independence assumptions are often too strong.
• Example: the expansion of an NP is highly dependent on the parent of the NP (i.e., subjects vs. objects).
• Also: the subject and object expansions are correlated!
NP PP DT NN PRP
6%9%
11%
NP PP DT NN PRP
21%
9%9%
NP PP DT NN PRP
4%7%
23%All NPs NPs under S NPs under VP
![Page 25: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/25.jpg)
• Structure Annotation [Johnson ’98, Klein & Manning ’03] • Lexicalization [Collins ’99, Charniak ’00] • Latent Variables [Matsuzaki et al. 05, Petrov et al. ’06] • (Neural) CRF Parsing [Hall et al. ’14, Durrett & Klein ’15]
The Game of Designing a Grammar
NP
NP
![Page 26: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/26.jpg)
A Fully Annotated (Unlexicalized) Tree
[Klein & Manning ‘03]
Model F1Charniak ’96 72.0Klein&Manning ’03 86.3
![Page 27: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/27.jpg)
• Annotation refines base treebank symbols to improve statistical fit of the grammar • Head lexicalization [Collins ’99, Charniak ’00]
The Game of Designing a Grammar
![Page 28: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/28.jpg)
Problems with PCFGs
• If we do no annotation, these trees differ only in one rule: • VP → VP PP • NP → NP PP
• Parse will go one way or the other, regardless of words • We addressed this in one way with unlexicalized grammars (how?) • Lexicalization allows us to be sensitive to specific words
![Page 29: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/29.jpg)
Problems with PCFGs
• What’s different between basic PCFG scores here? • What (lexical) correlations need to be scored?
![Page 30: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/30.jpg)
Lexicalized Trees [Charniak ’97, Collins ’97]
• Add “headwords” to each phrasal node • Syntactic vs. semantic heads • Headship not in (most)
treebanks • Usually use head rules, e.g.:
• NP: • Take leftmost NP • Take rightmost N* • Take rightmost JJ • Take right child
• VP: • Take leftmost VB* • Take leftmost VP • Take left child
![Page 31: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/31.jpg)
Lexicalized PCFGs?• Problem: we now have to estimate probabilities like
• Never going to get these atomically off of a treebank
• Solution: break up derivation into smaller steps
![Page 32: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/32.jpg)
Lexical Derivation Steps• A derivation of a local tree [Collins ‘99]
Choose a head tag and word
Choose a complement bag
Generate children (incl. adjuncts)
Recursively derive children
![Page 33: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/33.jpg)
Lexicalized Grammars
• Challenges: • Many parameters to
estimate: requires sophisticated smoothing techniques
• Exact inference is too slow: requires pruning heuristics
• Difficult to adapt to new languages: At least head rules need to be specified, typically more changes needed Model F1
Klein&Manning ’03 86.3Charniak ’00 90.1
![Page 34: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/34.jpg)
Lexicalized CKY
bestScore(X,i,j,h) if (j = i+1) return tagScore(X,s[i]) else return max max score(X[h]->Y[h] Z[h’]) * bestScore(Y,i,k,h) * bestScore(Z,k,j,h’) max score(X[h]->Y[h’] Z[h]) * bestScore(Y,i,k,h’) * bestScore(Z,k,j,h)
Y[h] Z[h’]
X[h]
i h k h’ j
k,h’,X->YZ
(VP->VBD •)[saw] NP[her]
(VP->VBD...NP •)[saw]
k,h’,X->YZ
![Page 35: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/35.jpg)
• Annotation refines base treebank symbols to improve statistical fit of the grammar • Automatic clustering
The Game of Designing a Grammar
![Page 36: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/36.jpg)
Latent Variable Grammars
Parse Tree Sentence Parameters
...
Derivations
[Matsuzaki et al. ’05, Petrov et al. ‘06]
![Page 37: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/37.jpg)
Forward
Learning Latent Annotations
EM algorithm:
X1
X2 X7X4
X5 X6X3
He was right
.
• Brackets are known • Base categories are known • Only induce subcategories
Just like Forward-Backward for HMMs. Backward
Model F1Charniak ’00 90.1Petrov et al. ’06 90.6
![Page 38: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/38.jpg)
• Annotation refines base treebank symbols to improve statistical fit of the grammar • CRF Parsing (+Neural Network Representations)
The Game of Designing a Grammar
NP
NP
![Page 39: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/39.jpg)
Generative vs. Discriminative
Gradient-based algorithm
Generative Discriminative
EM-algorithm
EASY: expectations over observed trees
[Matsuzaki et al. ’05, Petrov et al. ’06]
HARD: expectations over all trees
[Petrov & Klein ’07, ’08]
Maximize joint likelihood of gold tree and sentence
w1 w2 ... wn
w1 w2 ... wn
w1 w2 ... wn
Maximize conditional likelihood of gold tree given sentence
w1 w2 ... wnw1 w2 ... wn
![Page 40: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/40.jpg)
Objective Functions
Generative Objective Function:
w1... wn( )�Lmax
�, [Petrov, Barrett, .... ...
Thibaux & Klein ’06]
Discriminative Objective Function:
w1... wn( )�Lmax
�[Petrov & Klein ’08,
Finkel et. al ’08]
,Bayesian Objective Function:
w1... wn( )�Lmax
�)P(�
[Liang, Petrov, Jordan & Klein ’07]
![Page 41: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/41.jpg)
He gave a speech
NNP VBD DT NN
NP
VP
S
Be atree
(Neural) CRF Parsing[Taskar et al. ’04, Petrov & Klein ’07, Hall et al. ’14, Durrett et al. ’15]
Score of VP overthis span
.w fs
spar
se lo
g-lin
ear m
odel
w
dense neural network
fs.
![Page 42: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/42.jpg)
score(2NP7 → 2NP4 4PP7) = w>f (2NP7 → 2NP4 4PP7)
FirstWord = a
AfterSplit = onPrevWord = gave
FirstWord = a…
& NP → NP PP& NP → NP PP
& NP → NP PP& NP
CRF Parsing Sparse Features
P (T |x) / I[T is a tree]
Y
r2T
exp (score(r))
NP PP
NP
0 1 2 3 4 5 6 7 8He gave a speech on foreign policy .
![Page 43: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/43.jpg)
Neural CRF Model
0 1 2 3 4 5 6 7 8
fs
(arbitrary neural network)fs = g(Hv)
v
score(2NP7 → 2NP4 4PP7) = 2X7 → 2X4 4X7 (NP → NP PP)( )W� f
s
( )f>o
fs
Sparse
He gave a speech on foreign policy .Model F1Petrov et al. ’06 90.6Durrett et al. ’16 91.3
![Page 44: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/44.jpg)
LSTM Parsing• Treat parsing as a sequence-to-sequence prediction
problem • Completely ignores tree structure, uses LSTMs as
black boxes
[Vinyals et al. ’15]
![Page 45: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/45.jpg)
Detailed English Results
92.692.4
92.092.4
91.8
92.3
91.791.591.6
91.491.391.1
89.2
91.0
90.189.7
Single Parser Reranker Product Combination
[Cha
rnia
k ’0
0]
[Pet
rov
et a
l. ’0
6]
[Car
rera
s et
al.
’08]
[Hua
ng &
Har
per
’08]
[Pet
rov
’10]
[Cha
rnia
k &
Joh
nson
’05]
[Hua
ng ’0
8]
[McC
losk
y et
al.
’06]
[Sag
ae &
Lav
ie ’0
6]
[Fos
sum
& K
nigh
t ’0
9]
[Zha
ng e
t al
. ’09
]
[Hua
ng &
Har
per,
Petr
ov ’1
0]
[Hua
ng &
Har
per,
Petr
ov ’1
0]
Self-Trained
[Hal
l ’12
]
[Dur
rett
et
al. ’
15]
[Zhu
et
al. ’
13]
![Page 46: Syntax and Parsing I - Instituto de Telecomunicaçõeslxmls.it.pt/2016/Part_1_Constituency_Parsing_2016.pdf · Syntax and Parsing I ... A Fully Annotated (Unlexicalized) Tree [Klein](https://reader034.vdocuments.site/reader034/viewer/2022052607/5a701ecb7f8b9aa7538bb383/html5/thumbnails/46.jpg)
70
75
80
85
90
95
Arab
ic
Basq
ue
Frenc
h
German
Hebrew
Hunga
rian
Korea
nPo
lish
Swed
ish
Avera
ge
85.183.5
93.0
82.2
90.788.6
81.081.3
85.4
80.2
83.282.0
90.7
80.2
88.387.2
78.479.7
83.4
78.8
80.980.6
86.8
78.6
85.285.4
78.379.8
74.7
78.7
Petrov et al. '06* Hall et al. ’14 Durrett et al. ’15
Test
set
F1 a
ll le
ngth
sMulti-Lingual Results