Probabilistic Context Free Grammars
CMSC 473/673
UMBC
Outline
Recap: MT word alignment
Structure in Language: Constituency
(Probabilistic) Context Free Grammars: definitions; high-level tasks (generating and parsing); some uses for PCFGs
CKY Algorithm: Parsing with a (P)CFG
Machine Translation as a Noisy Channel Model
Decode, then rerank.
[Diagram: observed Russian text (noisy), e.g. “язы́к” (“language”), is decoded into text written in (clean) English via the translation/decode model, with candidate words (“speak,” “text,” “word,” “language”) reranked by a (clean) English language model.]
Slides courtesy Rebecca Knowles
Idea: Learn Word-to-Word Translation via Word Alignment
The cat is on the chair.
Le chat est sur la chaise.
Assumption: Parallel Texts

Whereas recognition of the inherent dignity and of the equal and inalienable rights of all members of the human family is the foundation of freedom, justice and peace in the world,
Whereas disregard and contempt for human rights have resulted in barbarous acts which have outraged the conscience of mankind, and the advent of a world in which human beings shall enjoy freedom of speech and belief and freedom from fear and want has been proclaimed as the highest aspiration of the common people,
Whereas it is essential, if man is not to be compelled to have recourse, as a last resort, to rebellion against tyranny and oppression, that human rights should be protected by the rule of law,
Whereas it is essential to promote the development of friendly relations between nations, …
http://www.un.org/en/universal-declaration-human-rights/
Yolki, pampa ni tlatepanitalotl, ni tlasenkauajkayotl iuan ni kualinemilistli ipan ni tlalpan, yaya ni moneki moixmatis uan monemilis, ijkinoj nochi kuali tiitstosej ika touampoyouaj.
Pampa tlaj amo tikixmatij tlatepanitalistli uan tlen kuali nemilistli ipanni tlalpan, yeka onkatok kualantli, onkatok tlateuilistli, onkatokmajmajtli uan sekinok tlamantli teixpanolistli; yeka moneki ma kualitimouikakaj ika nochi touampoyouaj, ma amo onkaj majmajyotl uanteixpanolistli; moneki ma onkaj yejyektlalistli, ma titlajtlajtokaj uan ma tijneltokakaj tlen tojuantij tijnekij tijneltokasej uan amo tlen ma topanti, kenke, pampa tijnekij ma onkaj tlatepanitalistli.
Pampa ni tlatepanitalotl moneki ma tiyejyekokaj, ma tijchiuakaj uanma tijmanauikaj; ma nojkia kiixmatikaj tekiuajtinij, uejueyij tekiuajtinij, ijkinoj amo onkas nopeka se akajya touampoj san tlen ueli kinekistechchiuilis, technauatis, kinekis technauatis ma tijchiuakaj se tlamantli tlen amo kuali; yeka ni tlatepanitalotl tlauel moneki ipantonemilis ni tlalpan.
Pampa nojkia tlauel moneki ma kuali timouikakaj, ma tielikaj keuaktiiknimej, nochi tlen tlakamej uan siuamej tlen tiitstokej ni tlalpan.…
http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=nhn
• Bitext/parallel texts: sentences with their (human-provided) translations
• Sentences are aligned, words are not
• Commonly used bitext: Europarl (http://www.statmt.org/europarl/)
Alignments
If we had word-aligned text, we could easily estimate P(f|e).
But we don’t usually have word alignments, and they are expensive to produce by hand…
If we had P(f|e) we could produce alignments automatically.
IBM Model 1 (1993)
[Diagram: a joint model; the alignments are unobserved.]
f: vector of French words
(visualization of alignment)
e: vector of English words
a: vector of alignment indices
Le chat est sur la chaise verte
The cat is on the green chair
0 1 2 3 4 6 5
Lexical Translation Model / Word Alignment Model

For all IBM models, see the original paper (Brown et al., 1993): http://www.aclweb.org/anthology/J93-2003
t(fj|ei) : translation probability of the word fj given the word ei
Expectation Maximization (EM)

0. Assume some value for the parameters and compute other parameter values
Two step, iterative algorithm
1. E-step: count alignments and translations under uncertainty, assuming these parameters
2. M-step: maximize log-likelihood (update parameters), using uncertain counts
estimated counts, e.g., P(le chat | “the cat”) under each possible alignment
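The E-step/M-step loop above can be sketched for IBM Model 1's t(f|e) parameters. The toy corpus, uniform initialization, and iteration count below are illustrative assumptions, not the slides' data:

```python
from collections import defaultdict

# A minimal sketch of EM for IBM Model 1's t(f|e) parameters.
# The toy corpus and iteration count are illustrative assumptions.
corpus = [
    (["the", "cat"], ["le", "chat"]),
    (["the", "chair"], ["la", "chaise"]),
]

# 0. Assume some value for the parameters: initialize t(f|e) uniformly.
f_vocab = {f for _, fs in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):
    count = defaultdict(float)   # uncertain counts c(f, e)
    total = defaultdict(float)   # uncertain counts c(e)
    # 1. E-step: count alignments/translations under the current t(f|e).
    for es, fs in corpus:
        for f in fs:
            z = sum(t[(f, e)] for e in es)   # normalize over alignments
            for e in es:
                p = t[(f, e)] / z
                count[(f, e)] += p
                total[e] += p
    # 2. M-step: update parameters using the uncertain counts.
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

# "chat" should now be a likelier translation of "cat" than of "the".
assert t[("chat", "cat")] > t[("chat", "the")]
```

Even on two sentence pairs, the co-occurrence of "the" with many French words pulls its translation probability mass apart, while "cat"/"chat" reinforce each other.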
Outline
Recap: MT word alignment
Structure in Language: Constituency
(Probabilistic) Context Free Grammars: definitions; high-level tasks (generating and parsing); some uses for PCFGs
CKY Algorithm: Parsing with a (P)CFG
Parts of Speech
Adapted from Luke Zettlemoyer
Open class words:
• Nouns: milk, cat, cats, bread, UMBC, Baltimore
• Verbs: speak, give, run; intransitive / transitive / ditransitive; modals and auxiliaries (can, do, may)
• Adjectives: wettest, large, happy, red, fake, would-be; subsective vs. non-subsective (Kamp & Partee, 1995)
• Adverbs: recently, happily, then, there (location)

Closed class words:
• Determiners: a, the, every, what
• Prepositions: in, under
• Conjunctions: and, or, if, because
• Pronouns: I, you, one
• Particles: (set) up, (call) off, so (far), not
• Numbers: one, 1,324
Constituency
spans of words that act (syntactically) as a group
“X phrase” (noun phrase)
Baltimore is a great place to be.
This house is a great place to be.
This red house is a great place to be.
This red house on the hill is a great place to be.

Each subject above (“Baltimore,” “this house,” “this red house,” “this red house on the hill”) is a noun phrase (NP). Constituents behave as units, e.g., under question formation:

Is this house a great place to be?
*This is house a great place to be.
Constituents Help Form Grammars

constituent: spans of words that act (syntactically) as a group
“X phrase” (noun phrase)

Baltimore is a great place to be.
This house is a great place to be.
This red house is a great place to be.
This red house on the hill is a great place to be.
This red house near the hill is a great place to be.
This red house atop the hill is a great place to be.
The hill is a great place to be.

S → NP VP
NP → Det Noun
NP → Noun
NP → Det AdjP
NP → NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore
Outline
Recap: MT word alignment
Structure in Language: Constituency
(Probabilistic) Context Free Grammars: definitions; high-level tasks (generating and parsing); some uses for PCFGs
CKY Algorithm: Parsing with a (P)CFG
Context Free Grammar
Set of rewrite rules, comprised of terminals and non-terminals
Terminals: the words in the language (the lexicon), e.g., Baltimore
Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun
(Sometimes) Pre-terminals: symbols that can only trigger lexical rewrites, e.g., Noun
S → NP VP
NP → Det Noun
NP → Noun
NP → Det AdjP
NP → NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore
Applications: learn more in CMSC 331, 431. Theory: learn more in CMSC 451.
How Do We Robustly Handle Ambiguities?
Add probabilities (to what?)
Probabilistic Context Free Grammar
Set of weighted (probabilistic) rewrite rules, comprised of terminals and non-terminals
Terminals: the words in the language (the lexicon), e.g., Baltimore
Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun
(Sometimes) Pre-terminals: symbols that can only trigger lexical rewrites, e.g., Noun
S → NP VP
NP → Det Noun
NP → Noun
NP → Det AdjP
NP → NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore
…
Q: What are the distributions? What must sum to 1?
1.0 S → NP VP
.4 NP → Det Noun
.3 NP → Noun
.2 NP → Det AdjP
.1 NP → NP PP
1.0 PP → P NP
.34 AdjP → Adj Noun
.26 VP → V NP
.0003 Noun → Baltimore
…
A: P(X → Y Z | X): the probabilities of all rules sharing the same left-hand side X sum to 1.
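That constraint can be checked mechanically. The rules below copy the probabilities shown on the slide; the AdjP, VP, and Noun rules are omitted because the slide truncates their alternatives, so their totals cannot be verified from what is shown:

```python
from collections import defaultdict

# Rules from the slide, as (probability, lhs, rhs) triples.
rules = [
    (1.0, "S", ("NP", "VP")),
    (0.4, "NP", ("Det", "Noun")),
    (0.3, "NP", ("Noun",)),
    (0.2, "NP", ("Det", "AdjP")),
    (0.1, "NP", ("NP", "PP")),
    (1.0, "PP", ("P", "NP")),
]

# In a PCFG, P(X -> Y Z | X): the probabilities of all rules with the
# same left-hand side must sum to 1.
totals = defaultdict(float)
for p, lhs, rhs in rules:
    totals[lhs] += p
for lhs, s in totals.items():
    assert abs(s - 1.0) < 1e-9, f"rules for {lhs} sum to {s}"
```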
Probabilistic Context Free Grammar

p(tree) = product of the probabilities of the individual rules used in the derivation.

For the tree (S (NP (Noun Baltimore)) (VP (Verb is) (NP a great city))):

p(tree) = p(S → NP VP) * p(NP → Noun) * p(Noun → Baltimore) * p(VP → Verb NP) * p(Verb → is) * p(NP → a great city)
Log Probabilistic Context Free Grammar

lp(tree) = sum of the log probabilities of the individual rules used in the derivation:

lp(tree) = lp(S → NP VP) + lp(NP → Noun) + lp(Noun → Baltimore) + lp(VP → Verb NP) + lp(Verb → is) + lp(NP → a great city)
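A quick sketch of the product-vs-sum equivalence. The slide's .0003 for Noun → Baltimore is kept; the other probabilities are illustrative assumptions:

```python
import math

# Score a derivation by multiplying rule probabilities, or equivalently
# by summing their logs. Probabilities other than Noun -> Baltimore
# are made up for illustration.
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Noun",)): 0.3,
    ("Noun", ("Baltimore",)): 0.0003,
    ("VP", ("Verb", "NP")): 0.26,
    ("Verb", ("is",)): 0.01,
    ("NP", ("a", "great", "city")): 0.001,
}

derivation = list(rule_prob)  # the six rules used in the example tree

p = 1.0   # product of probabilities
lp = 0.0  # sum of log probabilities
for rule in derivation:
    p *= rule_prob[rule]
    lp += math.log(rule_prob[rule])

# The log of the product equals the sum of the logs.
assert abs(math.log(p) - lp) < 1e-9
```

Summing logs avoids the numerical underflow that multiplying many small probabilities causes.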
Estimating PCFGs
Attempt 1:
• Get access to a treebank (corpus of syntactically annotated sentences), e.g., the English Penn Treebank
• Count productions
• Smooth these counts
• This gets ~75 F1
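Attempt 1 can be sketched as counting productions and normalizing per left-hand side (maximum-likelihood estimation, before smoothing). The two-tree "treebank" here is invented for illustration; a real treebank such as the Penn Treebank is far larger:

```python
from collections import defaultdict

# Tiny made-up treebank: trees as nested (label, child, child, ...) tuples.
treebank = [
    ("S", ("NP", ("Noun", "Baltimore")),
          ("VP", ("Verb", "is"), ("NP", ("Det", "a"), ("Noun", "city")))),
    ("S", ("NP", ("Det", "the"), ("Noun", "hill")),
          ("VP", ("Verb", "is"), ("NP", ("Noun", "great")))),
]

counts = defaultdict(float)      # count of each production
lhs_totals = defaultdict(float)  # count of each left-hand side

def count_productions(tree):
    if isinstance(tree, str):    # a terminal: no production
        return
    lhs, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(lhs, rhs)] += 1
    lhs_totals[lhs] += 1
    for c in children:
        count_productions(c)

for tree in treebank:
    count_productions(tree)

# Normalize per left-hand side: P(X -> rhs | X).
prob = {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}
# Both trees use S -> NP VP, so its estimate is 1.0.
assert prob[("S", ("NP", "VP"))] == 1.0
```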
Probabilistic Context Free Grammar (PCFG) Tasks
Find the most likely parse (for an observed sequence)
Calculate the (log) likelihood of an observed sequence w1, …, wN
Learn the grammar parameters
Outline
Recap: MT word alignment
Structure in Language: Constituency
(Probabilistic) Context Free Grammars: definitions; high-level tasks (generating and parsing); some uses for PCFGs
CKY Algorithm: Parsing with a (P)CFG
Context Free Grammar
1. Generate: iteratively create a string (or tree derivation) using the rewrite rules
2. Parse: assign a tree (if possible) to an input string
Generate from a Context Free Grammar

S → NP VP
NP → Det Noun
NP → Noun
NP → Det AdjP
NP → NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore
…

Starting from S, repeatedly rewrite using the rules:

S ⇒ NP VP ⇒ Noun VP ⇒ Baltimore VP ⇒ Baltimore Verb NP ⇒ … ⇒ Baltimore is a great city

yielding the tree (S (NP (Noun Baltimore)) (VP (Verb is) (NP a great city))).
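Generation can be sketched as repeatedly rewriting the leftmost non-terminal. The extra lexical rules (city, a, is) below are assumptions so that derivations can terminate, and the recursive rules (NP → NP PP, NP → Det AdjP) are omitted for the same reason:

```python
import random

# Sketch of generation from a CFG: rewrite the leftmost non-terminal
# until only terminals remain. Lexical rules beyond Noun -> Baltimore
# are illustrative additions; recursive rules are omitted so that
# every derivation terminates.
grammar = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Noun"], ["Noun"]],
    "VP": [["V", "NP"]],
    "Noun": [["Baltimore"], ["city"]],
    "Det": [["a"]],
    "V": [["is"]],
}

def generate(symbol):
    if symbol not in grammar:             # terminal: emit it
        return [symbol]
    rhs = random.choice(grammar[symbol])  # pick one rewrite rule
    return [w for s in rhs for w in generate(s)]

random.seed(0)
print(" ".join(generate("S")))
```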
Assign Structure (Parse) with a Context Free Grammar

Given the input string “Baltimore is a great city,” parsing assigns a tree (if possible):

bracket notation: [S [NP [Noun Baltimore] ] [VP [Verb is] [NP a great city]]]

S-expression:
(S (NP (Noun Baltimore))
   (VP (V is)
       (NP a great city)))
Some CFG Terminology: Derivation/Parse Tree

The tree produced for “Baltimore is a great city,” (S (NP (Noun Baltimore)) (VP (Verb is) (NP a great city))), is called the derivation, or parse tree.
Some CFG Terminology: Start Symbol

S, the root of the tree, is the start symbol.
Some CFG Terminology: Rewrite Choices

Alternative rewrites of the same non-terminal are shown with “|” (vertical bar):

S → NP VP
NP → Det Noun | Noun | Det AdjP | NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore | …
Some CFG Terminology: Chomsky Normal Form (CNF)

non-terminal → non-terminal non-terminal (X → Y Z)
non-terminal → terminal (X → a)

Restricted to binary and unary rules only: binary rules may only involve non-terminals, unary rules may only produce terminals, and there are no ternary rules (or above).
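A minimal sketch of checking the CNF restriction; the sample rules are drawn from the grammar used later in the CKY example:

```python
# Check the CNF restriction: every rule must be either binary over
# non-terminals (X -> Y Z) or unary to a terminal (X -> a).
def is_cnf(rules, nonterminals):
    for lhs, rhs in rules:
        binary = len(rhs) == 2 and all(s in nonterminals for s in rhs)
        unary = len(rhs) == 1 and rhs[0] not in nonterminals
        if not (binary or unary):
            return False
    return True

nts = {"S", "NP", "VP", "PP", "N", "V", "P", "Det"}
rules = [("S", ["NP", "VP"]), ("NP", ["Det", "N"]), ("N", ["caviar"])]
assert is_cnf(rules, nts)
assert not is_cnf([("VP", ["V", "NP", "PP"])], nts)  # ternary: not CNF
```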
Outline
Recap: MT word alignment
Structure in Language: Constituency
(Probabilistic) Context Free Grammars: definitions; high-level tasks (generating and parsing); some uses for PCFGs
CKY Algorithm: Parsing with a (P)CFG
What are some benefits to CFGs?
Why should you care about syntax?
Some Uses of CFGs
Clearly disambiguate certain ambiguities
Morphological derivations
Identify “grammatical” sentences
…
Clearly Show Ambiguity

I ate the meal with friends.
I ate the meal with salt.

“I ate the meal with friends” has two parses under S → NP VP: one where the PP “with friends” attaches high, to the verb phrase, and one where it attaches low, to the NP “the meal” (via NP → NP PP).

PP attachment is a common source of errors, even still today.
Clearly Show Ambiguity … But Not Necessarily All Ambiguity

The same structure (S → NP VP, with the PP inside the VP) covers sentences whose interpretations differ:

I ate the meal with friends
I ate the meal with gusto
I ate the meal with a fork
Other Attachment Ambiguity
We invited the students, Chris and Pat.
Coordination Ambiguity
old men and women: [old [men and women]] vs. [[old men] and women]
Grammars Aren’t Just for Syntax
overgeneralization
general (Adj) + -ize (Adj → V) ⇒ generalize (V)
generalize (V) + -tion (V → N) ⇒ generalization (N)
over- (N → N) + generalization (N) ⇒ overgeneralization (N)
Clearly Show Grammaticality (?)

The old man the boats

Two candidate structures: S → NP VP, with “The old” as the NP and “man the boats” as the VP, succeeds; the tempting S → NP NP reading (“the old man” + “the boats”) has no verb and fails.

Idea: define grammatical sentences as those that can be parsed by a grammar

Issue 1: Which grammar?

Issue 2: Discourse demands flexibility
Q: What do you see?
A: [I see] The old man [and] the boats.
Outline
Recap: MT word alignment
Structure in Language: Constituency
(Probabilistic) Context Free Grammars: definitions; high-level tasks (generating and parsing); some uses for PCFGs
CKY Algorithm: Parsing with a (P)CFG
Parsing with a CFG
Top-down backtracking (brute force)
CKY Algorithm: dynamic programming, bottom-up
Earley’s Algorithm: dynamic programming, top-down (not covered due to time)
CKY Precondition
Grammar must be in Chomsky Normal Form (CNF):
non-terminal → non-terminal non-terminal
non-terminal → terminal
S → NP VP
NP → Det N
NP → NP PP
VP → V NP
VP → VP PP
PP → P NP
NP → Papa
N → caviar
N → spoon
V → spoon
V → ate
P → with
Det → the
Det → a

Example from Jason Eisner
Entire grammar. Assume uniform weights.
“Papa ate the caviar with a spoon”
0 1 2 3 4 5 6 7
Goal:
(S, 0, 7)
Check 1: What are the non-terminals?
S, NP, VP, PP, N, V, P, Det
Check 2: What are the terminals?
Papa, caviar, spoon, ate, with, the, a
Check 3: What are the pre-terminals?
N, V, P, Det
Check 4: Is this in CNF?
Yes
First: Let’s find all NPs
(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon
Second: Let’s find all VPs
(VP, 1, 7): ate the caviar with a spoon
(VP, 1, 4): ate the caviar
Third: Let’s find all Ss
(S, 0, 7): Papa ate the caviar with a spoon
(S, 0, 4): Papa ate the caviar
(NP, 0, 1) (VP, 1, 7) (S, 0, 7)
[Chart: a table whose rows are span starts (0 to 6) and whose columns are span ends (1 to 7); the cells (0, 1), (1, 7), and (0, 7) are filled with NP, VP, and S respectively.]
CKY Recognizer
Input: * string of N words
* grammar in CNF
Output: True (with parse)/False
Data structure: N*N table T
Rows indicate span start (0 to N-1)
Columns indicate span end (1 to N)
T[i][j] lists constituents spanning i to j
For Viterbi in HMMs: build the table left-to-right.
For CKY on trees: 1. build smallest-to-largest and 2. left-to-right.
CKY Recognizer

T = Cell[N][N+1]
for(j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → wordj)
}
for(width = 2; width ≤ N; ++width) {
  for(start = 0; start ≤ N - width; ++start) {
    end = start + width
    for(mid = start+1; mid < end; ++mid) {
      for(non-terminal Y : T[start][mid]) {
        for(non-terminal Z : T[mid][end]) {
          T[start][end].add(X for rule X → Y Z : G)
        }
      }
    }
  }
}

(A rule X → Y Z combines a Y over [start, mid) with a Z over [mid, end).)
Q: What do we return?
A: S in T[0][N]
Q: How do we get the parse?
A: Follow backpointers (stored where?)
CKY Recognizer (iterating over rules instead of cell entries)

T = Cell[N][N+1]
for(j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → wordj)
}
for(width = 2; width ≤ N; ++width) {
  for(start = 0; start ≤ N - width; ++start) {
    end = start + width
    for(mid = start+1; mid < end; ++mid) {
      for(rule X → Y Z : G) {
        T[start][end].add(X if Y in T[start][mid] & Z in T[mid][end])
      }
    }
  }
}
CKY Recognizer (boolean table)

T = bool[K][N][N+1]
for(j = 1; j ≤ N; ++j) {
  for(non-terminal X in G if X → wordj) {
    T[X][j-1][j] = True
  }
}
for(width = 2; width ≤ N; ++width) {
  for(start = 0; start ≤ N - width; ++start) {
    end = start + width
    for(mid = start+1; mid < end; ++mid) {
      for(rule X → Y Z : G) {
        T[X][start][end] |= T[Y][start][mid] & T[Z][mid][end]
      }
    }
  }
}
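The pseudocode above translates directly to a runnable sketch, using Jason Eisner's example grammar:

```python
# A runnable sketch of the CKY recognizer on Jason Eisner's grammar.
# T[start][end] holds the set of non-terminals covering words[start:end].
grammar = [
    ("S", ("NP", "VP")), ("NP", ("Det", "N")), ("NP", ("NP", "PP")),
    ("VP", ("V", "NP")), ("VP", ("VP", "PP")), ("PP", ("P", "NP")),
    ("NP", ("Papa",)), ("N", ("caviar",)), ("N", ("spoon",)),
    ("V", ("spoon",)), ("V", ("ate",)), ("P", ("with",)),
    ("Det", ("the",)), ("Det", ("a",)),
]

def cky_recognize(words, grammar, start_symbol="S"):
    n = len(words)
    T = [[set() for _ in range(n + 1)] for _ in range(n)]
    # Width-1 spans: lexical rules X -> word_j.
    for j in range(1, n + 1):
        for lhs, rhs in grammar:
            if rhs == (words[j - 1],):
                T[j - 1][j].add(lhs)
    # Wider spans, smallest to largest.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for lhs, rhs in grammar:
                    if (len(rhs) == 2 and rhs[0] in T[start][mid]
                            and rhs[1] in T[mid][end]):
                        T[start][end].add(lhs)
    return start_symbol in T[0][n]

assert cky_recognize("Papa ate the caviar with a spoon".split(), grammar)
assert not cky_recognize("Papa the".split(), grammar)
```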
Another PCFG Task: Likelihood of the Observed Words
p(w1 w2 w3 … wN) = Σ over parse trees T of p(S ⇒ T ⇒ w1 w2 … wN)

likelihood of word sequence w1 w2 … wN: the sum of the probabilities of every tree whose leaves are w1 … wN,
based on starting at S
“syntactic language model”
CKY is Versatile: PCFG Tasks

Task | PCFG algorithm name | HMM analog
Find any parse | CKY recognizer | none
Find the most likely parse (for an observed sequence) | CKY with weighted Viterbi | Viterbi
Calculate the (log) likelihood of an observed sequence w1, …, wN | Inside algorithm | Forward algorithm
Learn the grammar parameters | Inside-outside algorithm (EM) | Forward-backward/Baum-Welch (EM)
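The inside algorithm is CKY with (sum, product) in place of (max, product): cells hold probabilities, summed over split points and rules. The tiny, unambiguous PCFG below is an assumption for illustration, not the course's grammar:

```python
# A sketch of the inside algorithm on a tiny CNF PCFG (illustrative).
pcfg = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 1.0,
    ("NP", ("Papa",)): 0.5,
    ("NP", ("caviar",)): 0.5,
    ("V", ("ate",)): 1.0,
}

def inside(words, pcfg, start="S"):
    n = len(words)
    beta = {}  # beta[(X, i, j)] = probability that X derives words[i:j]
    # Width-1 spans: lexical rules.
    for j in range(1, n + 1):
        for (lhs, rhs), p in pcfg.items():
            if rhs == (words[j - 1],):
                beta[(lhs, j - 1, j)] = beta.get((lhs, j - 1, j), 0.0) + p
    # Wider spans: sum (not max) over split points and rules.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for (lhs, rhs), p in pcfg.items():
                if len(rhs) != 2:
                    continue
                for mid in range(i + 1, j):
                    left = beta.get((rhs[0], i, mid), 0.0)
                    right = beta.get((rhs[1], mid, j), 0.0)
                    beta[(lhs, i, j)] = beta.get((lhs, i, j), 0.0) \
                        + p * left * right
    return beta.get((start, 0, n), 0.0)

# p(Papa ate caviar) = 1.0 * 0.5 * 1.0 * 1.0 * 0.5 = 0.25
assert abs(inside(["Papa", "ate", "caviar"], pcfg) - 0.25) < 1e-9
```

Replacing the sums with max (and tracking backpointers) turns this into weighted-Viterbi CKY for the most likely parse.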
CKY Algorithms

Algorithm | Weights | ⊕ | ⊗ | ⓪ | ①
Recognizer | Boolean (True/False) | or | and | False | True
Viterbi | [0,1] | max | * | 0 | 1
Inside | [0,1] | + | * | 0 | 1

Outside? Not really (“Semiring Parsing,” Goodman, 1998). But there is a connection between inside-outside and backprop! (“Inside-Outside and Forward-Backward Algorithms are Just Backprop,” Eisner, 2016)

Adapted from Jason Eisner
Outline
Recap: MT word alignment
Structure in Language: Constituency
(Probabilistic) Context Free Grammars: definitions; high-level tasks (generating and parsing); some uses for PCFGs
CKY Algorithm: Parsing with a (P)CFG