
Probabilistic Context Free Grammars

CMSC 473/673

UMBC

Outline

Recap: MT word alignment

Structure in Language: Constituency

(Probabilistic) Context Free Grammars: definitions; high-level tasks (generating and parsing); some uses for PCFGs

CKY Algorithm: Parsing with a (P)CFG

Machine Translation as a Noisy Channel Model

Decode, then rerank: recover the text written in (clean) English from the observed (noisy) Russian text, combining a translation/decode model with a (clean) language model.

[Figure: noisy channel diagram. Candidate English words ("English", "language", "speak", "text", "word") compete to explain the observed Russian word "язы́к".]

Slides courtesy Rebecca Knowles


Idea: Learn Word-to-Word Translation via Word Alignment

The cat is on the chair.
Le chat est sur la chaise.

[Figure: alignment links drawn between corresponding English and French words.]

Slides courtesy Rebecca Knowles

Assumption: Parallel Texts

Whereas recognition of the inherent dignity and of the equal and inalienable rights of all members of the human family is the foundation of freedom, justice and peace in the world,

Whereas disregard and contempt for human rights have resulted in barbarous acts which have outraged the conscience of mankind, and the advent of a world in which human beings shall enjoy freedom of speech and belief and freedom from fear and want has been proclaimed as the highest aspiration of the common people,

Whereas it is essential, if man is not to be compelled to have recourse, as a last resort, to rebellion against tyranny and oppression, that human rights should be protected by the rule of law,

Whereas it is essential to promote the development of friendly relations between nations, …

http://www.un.org/en/universal-declaration-human-rights/

Yolki, pampa ni tlatepanitalotl, ni tlasenkauajkayotl iuan ni kualinemilistli ipan ni tlalpan, yaya ni moneki moixmatis uan monemilis, ijkinoj nochi kuali tiitstosej ika touampoyouaj.

Pampa tlaj amo tikixmatij tlatepanitalistli uan tlen kuali nemilistli ipanni tlalpan, yeka onkatok kualantli, onkatok tlateuilistli, onkatokmajmajtli uan sekinok tlamantli teixpanolistli; yeka moneki ma kualitimouikakaj ika nochi touampoyouaj, ma amo onkaj majmajyotl uanteixpanolistli; moneki ma onkaj yejyektlalistli, ma titlajtlajtokaj uan ma tijneltokakaj tlen tojuantij tijnekij tijneltokasej uan amo tlen ma topanti, kenke, pampa tijnekij ma onkaj tlatepanitalistli.

Pampa ni tlatepanitalotl moneki ma tiyejyekokaj, ma tijchiuakaj uanma tijmanauikaj; ma nojkia kiixmatikaj tekiuajtinij, uejueyij tekiuajtinij, ijkinoj amo onkas nopeka se akajya touampoj san tlen ueli kinekistechchiuilis, technauatis, kinekis technauatis ma tijchiuakaj se tlamantli tlen amo kuali; yeka ni tlatepanitalotl tlauel moneki ipantonemilis ni tlalpan.

Pampa nojkia tlauel moneki ma kuali timouikakaj, ma tielikaj keuaktiiknimej, nochi tlen tlakamej uan siuamej tlen tiitstokej ni tlalpan.…

http://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=nhn

Slides courtesy Rebecca Knowles

• Bitext/parallel texts: sentences with their (human-provided) translations
• Sentences are aligned, words are not
• Commonly used bitext: Europarl (http://www.statmt.org/europarl/)

Alignments

If we had word-aligned text, we could easily estimate P(f|e).

But we don’t usually have word alignments, and they are expensive to produce by hand…

If we had P(f|e) we could produce alignments automatically.

Slides courtesy Rebecca Knowles

IBM Model 1 (1993)

(a joint model; the alignments are unobserved)

f: vector of French words

(visualization of alignment)

e: vector of English words

a: vector of alignment indices

Le chat est sur la chaise verte
The cat is on the green chair
a: 0 1 2 3 4 6 5

Slides courtesy Rebecca Knowles

Lexical Translation Model + Word Alignment Model

For all IBM models, see the original paper (Brown et al., 1993): http://www.aclweb.org/anthology/J93-2003

t(fj | ei): translation probability of the word fj given the word ei

Expectation Maximization (EM)

A two-step, iterative algorithm:

0. Assume some initial value for the parameters t(f|e) and compute the other parameter values
1. E-step: count alignments and translations under uncertainty, assuming these parameters
2. M-step: maximize the log-likelihood (update the parameters), using the uncertain, estimated counts

e.g., estimated counts for P(le | “the cat”) and P(chat | “the cat”)

Slides courtesy Rebecca Knowles
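To make steps 0-2 concrete, here is a minimal runnable sketch of IBM Model 1 EM in Python. The three-pair toy bitext, the uniform initialization, and the fixed iteration count are illustrative assumptions, and the NULL source word of the full model is omitted.

from collections import defaultdict

# Toy sentence-aligned bitext (illustrative; real bitext would be e.g. Europarl).
bitext = [
    ("the cat".split(), "le chat".split()),
    ("the chair".split(), "la chaise".split()),
    ("cat".split(), "chat".split()),
]

# Initialize t(f|e) uniformly over the French vocabulary.
f_vocab = {f for _, fs in bitext for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(20):
    count = defaultdict(float)  # expected count of f aligned to e
    total = defaultdict(float)  # expected count of e aligned to anything
    # E-step: spread each French word's alignment mass over the
    # English words in its sentence, using the current t(f|e).
    for es, fs in bitext:
        for f in fs:
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / norm
                count[(f, e)] += c
                total[e] += c
    # M-step: re-estimate t(f|e) from the expected counts.
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(round(t[("chat", "cat")], 3))  # approaches 1 as EM converges

The one-word pair ties “chat” to “cat” directly; EM then also pushes “le” toward “the”, so the translation table sharpens over iterations.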

Outline

Recap: MT word alignment

Structure in Language: Constituency

(Probabilistic) Context Free Grammars: definitions; high-level tasks (generating and parsing); some uses for PCFGs

CKY Algorithm: Parsing with a (P)CFG

Parts of Speech

Adapted from Luke Zettlemoyer

Open class words:
• Nouns: milk, bread, cat, cats, UMBC, Baltimore
• Verbs: speak, give, run (intransitive, transitive, ditransitive)
• Adjectives: wettest, large, happy, red, fake, would-be (subsective vs. non-subsective; Kamp & Partee, 1995)
• Adverbs: recently, happily, then, there (location)

Closed class words:
• Modals, auxiliaries: can, do, may
• Pronouns: I, you, one
• Numbers: 1,324
• Determiners: a, the, every, what
• Prepositions: in, under, top
• Conjunctions: and, or, if, because
• Particles: (set) up, (call) off, so (far), not

Constituency

spans of words that act (syntactically) as a group

“X phrase” (noun phrase)


Baltimore is a great place to be.

This house is a great place to be.

This red house is a great place to be.

This red house on the hill is a great place to be.

noun phrase (NP)

Constituency (continued)

The subject spans above (“Baltimore”, “This house”, “This red house”, “This red house on the hill”) are all noun phrases (NPs): each fills the same slot.

Constituents move as a unit:

Is this house a great place to be?
*This is house a great place to be.

Constituents Help Form Grammars

constituent: a span of words that acts (syntactically) as a group, an “X phrase” (e.g., noun phrase)

Baltimore is a great place to be.
This house is a great place to be.
This red house is a great place to be.
This red house on the hill is a great place to be.
This red house near the hill is a great place to be.
This red house atop the hill is a great place to be.
The hill is a great place to be.

Building the grammar rule by rule:

S → NP V NP
NP → Det Noun
NP → Noun
NP → Det Adj Noun
NP → NP Prep NP

Then refactoring, introducing prepositional, adjective, and verb phrases:

NP → NP PP, PP → P NP (replacing NP → NP Prep NP)
NP → Det AdjP, AdjP → Adj Noun (replacing NP → Det Adj Noun)
S → NP VP, VP → V NP (replacing S → NP V NP)

And a lexical rule: Noun → Baltimore

Final grammar:

S → NP VP
NP → Det Noun
NP → Noun
NP → Det AdjP
NP → NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore

Outline

Recap: MT word alignment

Structure in Language: Constituency

(Probabilistic) Context Free Grammars: definitions; high-level tasks (generating and parsing); some uses for PCFGs

CKY Algorithm: Parsing with a (P)CFG

Context Free Grammar

A set of rewrite rules, composed of terminals and non-terminals

Terminals: the words in the language (the lexicon), e.g., Baltimore

Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun

(Sometimes) Pre-terminals: symbols that can only trigger lexical rewrites, e.g., Noun

S → NP VP
NP → Det Noun
NP → Noun
NP → Det AdjP
NP → NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore

Applications: learn more in CMSC 331, 431
Theory: learn more in CMSC 451

How Do We Robustly Handle Ambiguities?

Add probabilities (to what?)

Probabilistic Context Free Grammar

A set of weighted (probabilistic) rewrite rules over the same terminals, non-terminals, and (sometimes) pre-terminals:

1.0    S → NP VP
.4     NP → Det Noun
.3     NP → Noun
.2     NP → Det AdjP
.1     NP → NP PP
1.0    PP → P NP
.34    AdjP → Adj Noun
.26    VP → V NP
.0003  Noun → Baltimore
…

Q: What are the distributions? What must sum to 1?

A: P(X → Y Z | X): the probabilities of all rules sharing the same left-hand side X form a distribution, so they must sum to 1.
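As a quick sanity check on that answer, here is a small sketch (Python; the rule encoding is illustrative) that sums the probabilities above per left-hand side. NP's five rules form a complete distribution; the left-hand sides whose remaining rules the slide elides with “…” sum to less than 1 here.

from collections import defaultdict

# (probability, lhs, rhs) triples from the weighted grammar above.
rules = [
    (1.0, "S", ("NP", "VP")),
    (0.4, "NP", ("Det", "Noun")),
    (0.3, "NP", ("Noun",)),
    (0.2, "NP", ("Det", "AdjP")),
    (0.1, "NP", ("NP", "PP")),
    (1.0, "PP", ("P", "NP")),
    (0.34, "AdjP", ("Adj", "Noun")),
    (0.26, "VP", ("V", "NP")),
    (0.0003, "Noun", ("Baltimore",)),
]

# P(X → Y Z | X): rules sharing a left-hand side X must sum to 1.
mass = defaultdict(float)
for p, lhs, _ in rules:
    mass[lhs] += p

for lhs, total in sorted(mass.items()):
    print(lhs, total)  # S, NP, PP sum to 1.0; AdjP, VP, Noun are partial ("…")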

Probabilistic Context Free Grammar

The probability of a derivation is the product of the probabilities of the individual rules used in the derivation:

p( [S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]] )
   = p(S → NP VP)
   * p(NP → Noun)
   * p(Noun → Baltimore)
   * p(VP → Verb NP)
   * p(Verb → is)
   * p(NP → a great city)

Log Probabilistic Context Free Grammar

In log space, the product becomes a sum of the log probabilities of the individual rules used in the derivation:

lp( [S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]] )
   = lp(S → NP VP)
   + lp(NP → Noun)
   + lp(Noun → Baltimore)
   + lp(VP → Verb NP)
   + lp(Verb → is)
   + lp(NP → a great city)
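The two slides above, side by side in code: the product of the rule probabilities equals the exponentiated sum of their logs. The probabilities for the last three rules are made-up placeholders, since the slides leave them unspecified.

import math

# Rules used in the Baltimore derivation, with illustrative probabilities.
derivation = [
    ("S → NP VP", 1.0),
    ("NP → Noun", 0.3),
    ("Noun → Baltimore", 0.0003),
    ("VP → Verb NP", 0.26),
    ("Verb → is", 0.01),           # placeholder value
    ("NP → a great city", 0.001),  # placeholder value
]

prob = 1.0
logprob = 0.0
for rule, p in derivation:
    prob *= p               # product of rule probabilities
    logprob += math.log(p)  # sum of log probabilities

print(prob, math.exp(logprob))  # equal up to floating-point error

Summing logs avoids the underflow that long products of small probabilities cause.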

Estimating PCFGs

Attempt 1:

• Get access to a treebank (corpus of syntactically annotated sentences), e.g., the English Penn Treebank

• Count productions

• Smooth these counts

• This gets ~75 F1
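Attempt 1 as a sketch, assuming NLTK is available for tree handling; the one-tree “treebank” below stands in for the Penn Treebank, and the smoothing step is omitted.

from collections import defaultdict
from nltk import Tree

# A tiny stand-in treebank: syntactically annotated sentences.
trees = [Tree.fromstring(
    "(S (NP (Noun Baltimore))"
    " (VP (Verb is) (NP (Det a) (AdjP (Adj great) (Noun city)))))")]

# Count productions, then normalize per left-hand side (relative frequency).
rule_count = defaultdict(float)
lhs_count = defaultdict(float)
for tree in trees:
    for prod in tree.productions():
        rule_count[prod] += 1
        lhs_count[prod.lhs()] += 1

for prod, c in rule_count.items():
    print(c / lhs_count[prod.lhs()], prod)

With a single tree every estimate is 1.0; over a full treebank the counts differentiate, and smoothing handles productions never seen in training.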

Probabilistic Context Free Grammar (PCFG) Tasks

Find the most likely parse (for an observed sequence)

Calculate the (log) likelihood of an observed sequence w1, …, wN

Learn the grammar parameters

Outline

Recap: MT word alignment

Structure in Language: Constituency

(Probabilistic) Context Free Grammars: definitions; high-level tasks (generating and parsing); some uses for PCFGs

CKY Algorithm: Parsing with a (P)CFG

Context Free Grammar

1. Generate: iteratively create a string (or a tree derivation) using the rewrite rules
2. Parse: assign a tree (if possible) to an input string

Generate from a Context Free Grammar

S → NP VP
NP → Det Noun
NP → Noun
NP → Det AdjP
NP → NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore

Derivation, rewriting one non-terminal at a time:

S
⇒ NP VP
⇒ Noun VP
⇒ Baltimore VP
⇒ Baltimore Verb NP
⇒ Baltimore is a great city
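The generation procedure as a short sketch: repeatedly rewrite non-terminals until only words remain. The grammar encoding is illustrative, and only a terminating subset of the rules above is included (the recursive NP → NP PP rule is left out to keep expansions short).

import random

# A terminating subset of the grammar above, as lhs → list of right-hand sides.
grammar = {
    "S": [["NP", "VP"]],
    "NP": [["Noun"], ["Det", "AdjP"]],
    "VP": [["Verb", "NP"]],
    "AdjP": [["Adj", "Noun"]],
    "Noun": [["Baltimore"], ["city"]],
    "Det": [["a"]],
    "Adj": [["great"]],
    "Verb": [["is"]],
}

def generate(symbol):
    """Rewrite symbol recursively until only terminals remain."""
    if symbol not in grammar:              # terminal: emit the word
        return [symbol]
    rhs = random.choice(grammar[symbol])   # choose a rewrite rule
    words = []
    for sym in rhs:
        words.extend(generate(sym))
    return words

print(" ".join(generate("S")))  # e.g. "Baltimore is a great city"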

Assign Structure (Parse) with a Context Free Grammar

Given the same grammar, parsing assigns a tree to the input string “Baltimore is a great city”:

bracket notation:
[S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]]

S-expression:
(S (NP (Noun Baltimore))
   (VP (V is)
       (NP a great city)))
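A small sketch of reading the S-expression back into a tree (pure Python, illustrative), to show that the bracketed string and the tree carry the same structure:

def parse_sexpr(s):
    """Parse an S-expression like the one above into nested lists:
    ['S', ['NP', ['Noun', 'Baltimore']], ['VP', ...]]."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    def read(pos):
        assert tokens[pos] == "("
        node = [tokens[pos + 1]]           # the constituent label
        pos += 2
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child, pos = read(pos)     # nested constituent
                node.append(child)
            else:
                node.append(tokens[pos])   # terminal word
                pos += 1
        return node, pos + 1
    tree, _ = read(0)
    return tree

print(parse_sexpr("(S (NP (Noun Baltimore)) (VP (V is) (NP a great city)))"))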

Some CFG Terminology: Derivation/Parse Tree

The tree above, rooted at S and yielding “Baltimore is a great city”, is a derivation, or parse tree.

Some CFG Terminology: Start Symbol

S, the root of every derivation, is the start symbol.

Some CFG Terminology: Rewrite Choices

Show alternative right-hand sides with “|” (vertical bar):

S → NP VP
NP → Det Noun | Noun | Det AdjP | NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore | …

Some CFG Terminology: Chomsky Normal Form (CNF)

Restricted binary and unary rules only; no ternary rules (or above):

X → Y Z   (non-terminal → non-terminal non-terminal: binary rules involve only non-terminals)
X → a     (non-terminal → terminal: unary rules rewrite to a single terminal)
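CNF matters for CKY below. As a sketch of one conversion step, here is a binarizer that splits longer rules by introducing fresh intermediate non-terminals (the naming scheme is illustrative; removing unary non-terminal chains is a separate step not shown):

def binarize(rules):
    """Split rules with 3+ right-hand-side symbols into binary rules."""
    out = []
    for lhs, rhs in rules:
        while len(rhs) > 2:
            new = f"{lhs}|{'+'.join(rhs[1:])}"   # fresh non-terminal
            out.append((lhs, (rhs[0], new)))     # lhs → first-symbol new
            lhs, rhs = new, rhs[1:]
        out.append((lhs, tuple(rhs)))
    return out

# NP → Det Adj Noun becomes NP → Det NP|Adj+Noun and NP|Adj+Noun → Adj Noun
print(binarize([("NP", ("Det", "Adj", "Noun"))]))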

Outline

Recap: MT word alignment

Structure in Language: Constituency

(Probabilistic) Context Free Grammars: definitions; high-level tasks (generating and parsing); some uses for PCFGs

CKY Algorithm: Parsing with a (P)CFG

What are some benefits of CFGs?

Why should you care about syntax?

Some Uses of CFGs

Clearly disambiguate certain ambiguities

Morphological derivations

Identify “grammatical” sentences

Clearly Show Ambiguity

I ate the meal with friends
I ate the meal with salt

Two parses:

PP attached to the verb phrase: [S NP [VP V NP PP]]
PP attached to the noun phrase: [S NP [VP V [NP NP PP]]]

PP attachment: a common source of errors, even still today.

Clearly Show Ambiguity… But Not Necessarily All Ambiguity

I ate the meal with friends
I ate the meal with gusto
I ate the meal with a fork

One structure, [S NP [VP V NP PP]], covers “with gusto” and “with a fork” alike, even though the meanings differ.

Other Attachment Ambiguity

We invited the students, Chris and Pat.

Coordination Ambiguity

old men and women: [old men] and women, or old [men and women]?

Grammars Aren’t Just for Syntax

Morphological derivation of “overgeneralization”:

V → A -ize:   general + -ize → generalize
N → V -tion:  generalize + -tion → generalization
N → over- N:  over- + generalization → overgeneralization

[N over- [N [V [A general] -ize] -tion]]

Clearly Show Grammaticality (?)

The old man the boats

S → NP VP: [NP The old] [VP man the boats] (parses; “man” read as the verb)
S → NP NP: [NP The old man] [NP the boats] (fails; the grammar has no S → NP NP rule)

Idea: define grammatical sentences as those that can be parsed by a grammar

Issue 1: Which grammar?

Issue 2: Discourse demands flexibility:

Q: What do you see?
A: [I see] The old man [and] the boats.

Outline

Recap: MT word alignment

Structure in Language: Constituency

(Probabilistic) Context Free Grammars: definitions; high-level tasks (generating and parsing); some uses for PCFGs

CKY Algorithm: Parsing with a (P)CFG

Parsing with a CFG

Top-down backtracking (brute force)

CKY Algorithm: dynamic programming, bottom-up

Earley’s Algorithm: dynamic programming, top-down (not covered due to time)

CKY Precondition

Grammar must be in Chomsky Normal Form (CNF):

non-terminal → non-terminal non-terminal
non-terminal → terminal

Entire grammar (assume uniform weights); example from Jason Eisner:

S → NP VP
NP → Det N
NP → NP PP
VP → V NP
VP → VP PP
PP → P NP
NP → Papa
N → caviar
N → spoon
V → spoon
V → ate
P → with
Det → the
Det → a

“Papa ate the caviar with a spoon”

0 Papa 1 ate 2 the 3 caviar 4 with 5 a 6 spoon 7

(span positions index the gaps between words)

Goal: (S, 0, 7)

Check 1: What are the non-terminals? S, NP, VP, PP, N, V, P, Det
Check 2: What are the terminals? Papa, caviar, spoon, ate, with, the, a
Check 3: What are the pre-terminals? N, V, P, Det
Check 4: Is this in CNF? Yes

First: let’s find all NPs

(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon

Second: let’s find all VPs

(VP, 1, 4): ate the caviar
(VP, 1, 7): ate the caviar with a spoon

Third: let’s find all Ss

(S, 0, 4): Papa ate the caviar
(S, 0, 7): Papa ate the caviar with a spoon

Enter these into the chart (rows = span start, 0 to 6; columns = span end, 1 to 7):

(NP, 0, 1) goes in cell [0][1], (VP, 1, 7) in cell [1][7], (S, 0, 7) in cell [0][7], and so on for the other constituents found above.

CKY Recognizer

Input: a string of N words; a grammar in CNF
Output: True (with parse) / False

Data structure: N*N table T
Rows indicate span start (0 to N-1)
Columns indicate span end (1 to N)
T[i][j] lists constituents spanning i → j

For Viterbi in HMMs: build the table left-to-right.
For CKY on trees: build 1. smallest-to-largest and 2. left-to-right.

CKY Recognizer

T = Cell[N][N+1]

for(j = 1; j ≤ N; ++j) {
    T[j-1][j].add(X for non-terminal X in G if X → wordj)
}

for(width = 2; width ≤ N; ++width) {
    for(start = 0; start ≤ N - width; ++start) {   // ≤ (not <), so the full span [0, N] is reached
        end = start + width
        for(mid = start+1; mid < end; ++mid) {
            for(non-terminal Y : T[start][mid]) {
                for(non-terminal Z : T[mid][end]) {
                    T[start][end].add(X for rule X → Y Z : G)
                }
            }
        }
    }
}

A rule X → Y Z combines Y spanning [start, mid] with Z spanning [mid, end].

Q: What do we return?
A: S in T[0][N]

Q: How do we get the parse?
A: Follow backpointers, stored in the chart: with each X added to T[start][end], record the rule and split point (mid) that produced it.

Equivalent formulation, looping over rules:

for(j = 1; j ≤ N; ++j) {
    T[j-1][j].add(X for non-terminal X in G if X → wordj)
}

for(width = 2; width ≤ N; ++width) {
    for(start = 0; start ≤ N - width; ++start) {
        end = start + width
        for(mid = start+1; mid < end; ++mid) {
            for(rule X → Y Z : G) {
                T[start][end].add(X if Y in T[start][mid] & Z in T[mid][end])
            }
        }
    }
}

With a Boolean table over the K non-terminals:

T = bool[K][N][N+1]

for(j = 1; j ≤ N; ++j) {
    for(non-terminal X in G if X → wordj) {
        T[X][j-1][j] = True
    }
}

for(width = 2; width ≤ N; ++width) {
    for(start = 0; start ≤ N - width; ++start) {
        end = start + width
        for(mid = start+1; mid < end; ++mid) {
            for(rule X → Y Z : G) {
                T[X][start][end] |= T[Y][start][mid] & T[Z][mid][end]   // |=, so an earlier success is not overwritten
            }
        }
    }
}
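The recognizer above as runnable Python on the “Papa” grammar (using the corrected start ≤ N - width bound; the set-based chart encoding is illustrative):

from collections import defaultdict

# The example grammar, in CNF.
binary = [("S", "NP", "VP"), ("NP", "Det", "N"), ("NP", "NP", "PP"),
          ("VP", "V", "NP"), ("VP", "VP", "PP"), ("PP", "P", "NP")]
lexical = [("NP", "Papa"), ("N", "caviar"), ("N", "spoon"), ("V", "spoon"),
           ("V", "ate"), ("P", "with"), ("Det", "the"), ("Det", "a")]

def cky_recognize(words):
    N = len(words)
    T = defaultdict(set)   # T[(i, j)] = non-terminals spanning words i..j
    # Width-1 spans: apply lexical rules X → word_j.
    for j in range(1, N + 1):
        for X, w in lexical:
            if w == words[j - 1]:
                T[(j - 1, j)].add(X)
    # Wider spans: smallest to largest, left to right.
    for width in range(2, N + 1):
        for start in range(N - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for X, Y, Z in binary:
                    if Y in T[(start, mid)] and Z in T[(mid, end)]:
                        T[(start, end)].add(X)
    return "S" in T[(0, N)], T

ok, chart = cky_recognize("Papa ate the caviar with a spoon".split())
print(ok, chart[(0, 7)])  # True {'S'}; (NP,0,1), (VP,1,7), … are also in chart

To recover the parse, store with each added X the (rule, mid) backpointer that produced it, then walk down from (S, 0, N).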

Another PCFG Task: Likelihood of the Observed Words

p(w1 w2 w3 … wN): the likelihood of the word sequence w1 w2 … wN

Marginalize over parses: p(w1 … wN) is the sum of p(tree) over every tree rooted at the start symbol S whose yield is w1 … wN (e.g., for w1 w2 w3 w4, one term per distinct bracketing).

Based on starting at S: a “syntactic language model”.

CKY is Versatile: PCFG Tasks

Task                                                    PCFG algorithm           HMM analog
Find any parse                                          CKY recognizer           none
Find the most likely parse (for an observed sequence)   CKY (weighted Viterbi)   Viterbi
Calculate the (log) likelihood of w1, …, wN             Inside algorithm         Forward algorithm
Learn the grammar parameters                            Inside-outside (EM)      Forward-backward/Baum-Welch (EM)

CKY Algorithms

Algorithm     Weights                 ⊕     ⊗     ⓪       ①
Recognizer    Boolean (True/False)    or    and   False    True
Viterbi       [0, 1]                  max   *     0        1
Inside        [0, 1]                  +     *     0        1

Outside? Not really (“Semiring Parsing,” Goodman, 1998). But there is a connection between inside-outside and backprop! (“Inside-Outside and Forward-Backward Algorithms Are Just Backprop,” Eisner, 2016)

Adapted from Jason Eisner
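The table in code: one chart loop parameterized by the semiring operations. Boolean (or, and, False) gives the recognizer, (max, *, 0) Viterbi, and (+, *, 0) the inside algorithm. A sketch, assuming weights attached to the grammar tuples; backpointers are omitted.

def cky_semiring(words, lexical, binary, plus, times, zero):
    """Generic weighted CKY. lexical: (X, word, w); binary: (X, Y, Z, w)."""
    N = len(words)
    chart = {}  # chart[(X, start, end)] = aggregated weight, default ⓪
    def get(X, i, j):
        return chart.get((X, i, j), zero)
    for j in range(1, N + 1):
        for X, w, p in lexical:
            if w == words[j - 1]:
                chart[(X, j - 1, j)] = plus(get(X, j - 1, j), p)
    for width in range(2, N + 1):
        for start in range(N - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for X, Y, Z, p in binary:
                    w = times(p, times(get(Y, start, mid), get(Z, mid, end)))
                    chart[(X, start, end)] = plus(get(X, start, end), w)
    return get("S", 0, N)

# Uniform weights on the "Papa" grammar:
binary = [("S", "NP", "VP", 0.5), ("NP", "Det", "N", 0.5), ("NP", "NP", "PP", 0.5),
          ("VP", "V", "NP", 0.5), ("VP", "VP", "PP", 0.5), ("PP", "P", "NP", 0.5)]
lexical = [("NP", "Papa", 0.5), ("N", "caviar", 0.5), ("N", "spoon", 0.5),
           ("V", "spoon", 0.5), ("V", "ate", 0.5), ("P", "with", 0.5),
           ("Det", "the", 0.5), ("Det", "a", 0.5)]
words = "Papa ate the caviar with a spoon".split()
print(cky_semiring(words, lexical, binary, max, lambda a, b: a * b, 0.0))  # Viterbi
print(cky_semiring(words, lexical, binary, lambda a, b: a + b,
                   lambda a, b: a * b, 0.0))                               # inside

The inside value is twice the Viterbi value here: both PP attachments score the same under uniform weights, and the inside algorithm sums over them.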

Outline

Recap: MT word alignment

Structure in Language: Constituency

(Probabilistic) Context Free Grammars: definitions; high-level tasks (generating and parsing); some uses for PCFGs

CKY Algorithm: Parsing with a (P)CFG
