Edinburgh MT lecture 12: Synchronous context-free grammar

Upload: alopezfoo

Post on 25-Jul-2015

Synchronous context-free grammar

Phrase-based models

•Exact decoding is NP-hard.

•As a consequence of arbitrary permutation…

•…but real permutations are not arbitrary!

•Parameterization of reordering is weak.

•No generalization!

Garcia and associates .
Garcia y asociados .

Carlos Garcia has three associates .
Carlos Garcia tiene tres asociados .

his associates are not strong .
sus asociados no son fuertes .

Garcia has a company also .
Garcia tambien tiene una empresa .

its clients are angry .
sus clientes estan enfadados .

the associates are also angry .
los asociados tambien estan enfadados .

the company has strong enemies in Europe .
la empresa tiene enemigos fuertes en Europa .

the clients and the associates are enemies .
los clientes y los asociados son enemigos .

the company has three groups .
la empresa tiene tres grupos .

its groups are in Europe .
sus grupos estan en Europa .

the modern groups sell strong pharmaceuticals .
los grupos modernos venden medicinas fuertes .

the groups do not sell zanzanine .
los grupos no venden zanzanina .

the small groups are not modern .
los grupos pequenos no son modernos .

Same pattern: NN JJ → JJ NN

Phrase-based models do not capture this generalization.

Context-free grammar

S → NP VP
NP → watashi wa
NP → hako wo
VP → NP V
V → akemasu

Derivation:

S
NP VP
watashi wa VP
watashi wa NP V
watashi wa hako wo V
watashi wa hako wo akemasu

Synchronous context-free grammar

Originally: syntax-directed translation (Lewis & Stearns 1966; Aho and Ullman 1969)

Japanese grammar:

S → NP VP
NP → watashi wa
NP → hako wo
VP → NP V
V → akemasu

English grammar:

S → NP VP
NP → I
NP → the box
VP → V NP
V → open

Japanese is SOV. English is SVO.

Synchronous grammar:

S → NP₁ VP₂ / NP₁ VP₂
NP → watashi wa / I
NP → hako wo / the box
VP → NP₁ V₂ / V₂ NP₁
V → akemasu / open

REQUIREMENT: one-to-one mapping between source and target nonterminals, indicated by coindexes.

Synchronous derivation (source / target):

S / S
NP VP / NP VP
watashi wa VP / I VP
watashi wa NP V / I V NP
watashi wa hako wo V / I V the box
watashi wa hako wo akemasu / I open the box
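A minimal sketch of the synchronous derivation above, in Python. The rule identifiers (`r_s`, `r_vp`, ...) and the tree encoding are illustrative assumptions, not part of the lecture. Integers in a right-hand side stand for coindexed nonterminals, so the VP rule's target side `[2, 1]` is what swaps object and verb.

```python
# Each rule: id -> (lhs, source RHS, target RHS).
# An integer k in a RHS refers to the k-th child (the coindex); strings are words.
RULES = {
    "r_s":   ("S",  [1, 2], [1, 2]),                     # S  -> NP1 VP2 / NP1 VP2
    "r_i":   ("NP", ["watashi", "wa"], ["I"]),           # NP -> watashi wa / I
    "r_box": ("NP", ["hako", "wo"], ["the", "box"]),     # NP -> hako wo / the box
    "r_vp":  ("VP", [1, 2], [2, 1]),                     # VP -> NP1 V2 / V2 NP1
    "r_v":   ("V",  ["akemasu"], ["open"]),              # V  -> akemasu / open
}


def yield_pair(derivation):
    """Return (source words, target words) for a derivation tree.

    A derivation is (rule_id, [child derivations]), children in coindex order.
    """
    rule_id, children = derivation
    _, src_rhs, tgt_rhs = RULES[rule_id]
    child_yields = [yield_pair(child) for child in children]

    def realise(rhs, side):
        words = []
        for sym in rhs:
            if isinstance(sym, int):          # coindexed nonterminal
                words.extend(child_yields[sym - 1][side])
            else:                             # terminal word
                words.append(sym)
        return words

    return realise(src_rhs, 0), realise(tgt_rhs, 1)


derivation = ("r_s", [("r_i", []),
                      ("r_vp", [("r_box", []), ("r_v", [])])])
source, target = yield_pair(derivation)
print(" ".join(source))
print(" ".join(target))
```

One derivation tree yields both strings; only the order in which children are read off differs between the two sides.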

Translation as parsing

Input: watashi wa hako wo akemasu

Source parse: [S [NP watashi wa] [VP [NP hako wo] [V akemasu]]]

Target tree: [S [NP I] [VP [V open] [NP the box]]]

Output: I open the box

Preliminaries

S → NP VP
NP → watashi wa
NP → hako wo
VP → NP V
V → akemasu

Transform source grammar into Chomsky normal form: all productions in form X → w or X → Y Z.

S → NP VP
NP → X Y
X → watashi
Y → wa
NP → Z W
Z → hako
W → wo
VP → NP V
V → akemasu

Q: how do synchronous productions interact with this transformation?
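The monolingual transformation can be sketched in Python as follows. This is a simplified version that assumes the grammar has no empty rules and no unit rules between nonterminals; the fresh symbol names (`T1`, `B1`, ...) are hypothetical and play the role of X, Y, Z, W on the slide. It does not address the question about synchronous productions.

```python
NONTERMINALS = {"S", "NP", "VP", "V"}
RULES = [
    ("S",  ("NP", "VP")),
    ("NP", ("watashi", "wa")),
    ("NP", ("hako", "wo")),
    ("VP", ("NP", "V")),
    ("V",  ("akemasu",)),
]


def to_cnf(rules, nonterminals):
    """Rewrite rules so every production is X -> w or X -> Y Z.

    Assumes no empty rules and no unit rules X -> Y.
    """
    out = []
    fresh = 0
    terminal_symbol = {}                      # word -> fresh preterminal
    for lhs, rhs in rules:
        if len(rhs) == 1:                     # already X -> w
            out.append((lhs, rhs))
            continue
        # Step 1: replace each terminal by a preterminal that rewrites to it.
        symbols = []
        for sym in rhs:
            if sym in nonterminals:
                symbols.append(sym)
                continue
            if sym not in terminal_symbol:
                fresh += 1
                terminal_symbol[sym] = f"T{fresh}"
                out.append((terminal_symbol[sym], (sym,)))
            symbols.append(terminal_symbol[sym])
        # Step 2: binarize left to right until two symbols remain.
        while len(symbols) > 2:
            fresh += 1
            new = f"B{fresh}"
            out.append((new, (symbols[0], symbols[1])))
            symbols = [new] + symbols[2:]
        out.append((lhs, tuple(symbols)))
    return out


cnf = to_cnf(RULES, NONTERMINALS)
for lhs, rhs in cnf:
    print(lhs, "->", " ".join(rhs))
```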

Decoding

•A binary-branching (i.e. CNF) grammar can produce a Catalan number of parses of an input sentence:

$$O\left(\frac{(2n)!}{(n+1)!\,n!}\right)$$

•Dynamic programming to the rescue!
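To get a feel for how fast this grows, here is a short Python sketch that evaluates the formula exactly. The function name `catalan` is an illustrative choice; it uses the identity (2n)! / ((n+1)! n!) = C(2n, n) / (n+1).

```python
from math import comb


def catalan(n):
    """n-th Catalan number, (2n)! / ((n+1)! n!)."""
    return comb(2 * n, n) // (n + 1)


for n in (1, 2, 5, 10, 20):
    print(n, catalan(n))
```

The count is already in the billions at n = 20, so enumerating parses one by one is not an option for ordinary sentences.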

Parsing

NN → duck, pato
PRP → I, yo
VBD → saw, vi
PRP$ → her, su
NP → PRP$₁ NN₂, PRP$₁ NN₂
VP → VBD₁ NP₂, VBD₁ NP₂
S → PRP₁ VP₂, PRP₁ VP₂
PRP → her, ella
VB → duck, agacharse
SBAR → PRP₁ VB₂, PRP₁ VB₂
VP → VBD₁ SBAR₂, VBD₁ SBAR₂

Parsing

Source side of the grammar:

NN → duck
PRP → I
VBD → saw
PRP$ → her
NP → PRP$ NN
VP → VBD NP
S → PRP VP
PRP → her
VB → duck
SBAR → PRP VB
VP → VBD SBAR

I₁ saw₂ her₃ duck₄

Lexical rules:

$$X_{i,i+1} \leftarrow (w_{i+1} = w) \wedge (X \rightarrow w)$$

For example:

$$\mathrm{PRP}_{0,1} \leftarrow (w_1 = \text{I}) \wedge (\mathrm{PRP} \rightarrow \text{I})$$

Items added: $\mathrm{PRP}_{0,1}$, $\mathrm{VBD}_{1,2}$, $\mathrm{PRP\$}_{2,3}$, $\mathrm{PRP}_{2,3}$, $\mathrm{NN}_{3,4}$, $\mathrm{VB}_{3,4}$

Binary rules:

$$X_{i,j} \leftarrow Y_{i,k} \wedge Z_{k,j} \wedge (X \rightarrow Y\,Z)$$

For example:

$$\mathrm{NP}_{2,4} \leftarrow \mathrm{PRP\$}_{2,3} \wedge \mathrm{NN}_{3,4} \wedge (\mathrm{NP} \rightarrow \mathrm{PRP\$}\ \mathrm{NN})$$

Items added: $\mathrm{NP}_{2,4}$, $\mathrm{SBAR}_{2,4}$, $\mathrm{VP}_{1,4}$, $\mathrm{S}_{0,4}$

Parsing

I₁ saw₂ her₃ duck₄

Chart: $\mathrm{PRP}_{0,1}$, $\mathrm{VBD}_{1,2}$, $\mathrm{PRP\$}_{2,3}$, $\mathrm{NN}_{3,4}$, $\mathrm{PRP}_{2,3}$, $\mathrm{VB}_{3,4}$, $\mathrm{NP}_{2,4}$, $\mathrm{SBAR}_{2,4}$, $\mathrm{VP}_{1,4}$, $\mathrm{S}_{0,4}$

The chart packs two parses, which share every item except $\mathrm{NP}_{2,4}$ and $\mathrm{SBAR}_{2,4}$ and their children.

Parse 1:

[S [PRP I] [VP [VBD saw] [NP [PRP$ her] [NN duck]]]]
[S [PRP yo] [VP [VBD vi] [NP [PRP$ su] [NN pato]]]]

yo vi su pato

Parse 2:

[S [PRP I] [VP [VBD saw] [SBAR [PRP her] [VB duck]]]]
[S [PRP yo] [VP [VBD vi] [SBAR [PRP ella] [VB agacharse]]]]

yo vi ella agacharse
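The two translations can be reproduced with a small CKY sketch in Python that stores target strings in the chart. It assumes the lexical pairing implied by the outputs shown on the slides (possessive her / su, pronoun her / ella). The `swapped` flag is an illustrative addition for rules that reorder their children; no rule in this grammar uses it. Keeping every target string per item is exponential in general, so this is only workable for toy inputs.

```python
from collections import defaultdict

LEXICAL = [  # (lhs, source word, target word)
    ("NN", "duck", "pato"), ("PRP", "I", "yo"), ("VBD", "saw", "vi"),
    ("PRP$", "her", "su"), ("PRP", "her", "ella"), ("VB", "duck", "agacharse"),
]
BINARY = [  # (lhs, left child, right child, target side swapped?)
    ("NP", "PRP$", "NN", False),
    ("VP", "VBD", "NP", False),
    ("S", "PRP", "VP", False),
    ("SBAR", "PRP", "VB", False),
    ("VP", "VBD", "SBAR", False),
]


def translate(words, start="S"):
    """Parse the source with CKY and collect every target string for the goal."""
    n = len(words)
    chart = defaultdict(set)          # (i, X, j) -> set of target strings
    for i, word in enumerate(words):
        for lhs, src, tgt in LEXICAL:
            if src == word:
                chart[i, lhs, i + 1].add(tgt)
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for lhs, left, right, swapped in BINARY:
                    for a in chart.get((i, left, k), ()):
                        for b in chart.get((k, right, j), ()):
                            chart[i, lhs, j].add(f"{b} {a}" if swapped else f"{a} {b}")
    return sorted(chart.get((0, start, n), ()))


translations = translate("I saw her duck".split())
print(translations)
```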

Analysis

I₁ saw₂ her₃ duck₄

Chart: $\mathrm{PRP}_{0,1}$, $\mathrm{VBD}_{1,2}$, $\mathrm{PRP\$}_{2,3}$, $\mathrm{NN}_{3,4}$, $\mathrm{PRP}_{2,3}$, $\mathrm{VB}_{3,4}$, $\mathrm{NP}_{2,4}$, $\mathrm{SBAR}_{2,4}$, $\mathrm{VP}_{1,4}$, $\mathrm{S}_{0,4}$

$O(Nn^2)$ nodes

$O(Gn^3)$ edges

Wait a second!

•Phrase-based MT is NP-hard because of permutations (there are a factorial number).

•SCFGs also permute sentences.

•But the decoding algorithm is polynomial…

•What are we giving up for this efficiency?

Permutations

X → X₁ X₂, X₁ X₂
X → X₁ X₂, X₂ X₁
X → a, a
X → b, b
X → c, c
X → d, d

What permutations of a b c d can this grammar produce?

a b c d    b a c d    c a b d    d a b c
a b d c    b a d c    c a d b    d a c b
a c b d    b c a d    c b a d    d b a c
a c d b    b c d a    c b d a    d b c a
a d b c    b d a c    c d a b    d c a b
a d c b    b d c a    c d b a    d c b a

It produces 22 of the 24. The two it cannot produce are the inside-outside alignments:

b d a c
c a d b

These need rules of rank 4:

X → X₁ X₂ X₃ X₄, X₂ X₄ X₁ X₃
X → X₁ X₂ X₃ X₄, X₃ X₁ X₄ X₂

No equivalent binary-branching SCFG.

Complexity of many problems is exponential in rank.
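The answer to the question on this slide can be checked by brute force. The Python sketch below (function name `itg_permutations` is an illustrative choice) applies the two binary rules recursively: split the sequence in two, permute each half, then concatenate the halves in either order.

```python
from itertools import permutations


def itg_permutations(items):
    """All orders reachable with X -> X1 X2, X1 X2 and X -> X1 X2, X2 X1."""
    items = tuple(items)
    if len(items) == 1:
        return {items}
    results = set()
    for split in range(1, len(items)):
        for left in itg_permutations(items[:split]):
            for right in itg_permutations(items[split:]):
                results.add(left + right)     # monotone rule
                results.add(right + left)     # inverting rule
    return results


reachable = itg_permutations("abcd")
missing = sorted(set(permutations("abcd")) - reachable)
print(len(reachable))
print([" ".join(p) for p in missing])
```

Every reachable permutation keeps some contiguous source block contiguous in the output; the two missing ones interleave all four symbols so that no such block exists.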

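Parsing as deduction

$$X_{i,i+1} \leftarrow (w_{i+1} = w) \wedge (X \rightarrow w)$$

$$X_{i,j} \leftarrow Y_{i,k} \wedge Z_{k,j} \wedge (X \rightarrow Y\,Z)$$

For sentence w1…wn, grammar G with nonterminals N

items:

$$[i, X, j] \quad \forall i, j \in 0, \ldots, n,\ X \in N$$

axioms:

$$[X \rightarrow w] \quad \forall X \rightarrow w \in P_G$$

$$[X \rightarrow Y\,Z] \quad \forall X \rightarrow Y\,Z \in P_G$$

$$[w_i = w] \quad \forall i \in 1, \ldots, n$$

inference rules:

$$\frac{[X \rightarrow w] \quad [w_{i+1} = w]}{[i, X, i+1]}$$

$$\frac{[X \rightarrow Y\,Z] \quad [i, Y, k] \quad [k, Z, j]}{[i, X, j]}$$

goal:

$$[0, S, n]$$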

From proof to (pseudo)code

Input: w1…wn, grammar G
for i in 1,…,n:
    for X->w_i in P(G):
        chart[i-1,X,i] := true
for l in 2,…,n:
    for i in 0,…,n-l:
        j := i+l
        for k in i+1,…,j-1:
            for X->YZ in P(G):
                if chart[i,Y,k] and chart[k,Z,j]:
                    chart[i,X,j] := true
return chart[0,S,n]
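A runnable Python version of this pseudocode, as a sketch. The grammar is the source side of the "I saw her duck" example; the dictionary and list layouts (`LEXICAL`, `BINARY`) are illustrative assumptions. The chart is a set of proved items, so "chart[i,X,j] := true" becomes adding `(i, X, j)` to the set.

```python
LEXICAL = {  # word -> nonterminals X with a rule X -> word
    "I": {"PRP"},
    "saw": {"VBD"},
    "her": {"PRP$", "PRP"},
    "duck": {"NN", "VB"},
}
BINARY = [  # (X, Y, Z) for each rule X -> Y Z
    ("NP", "PRP$", "NN"),
    ("VP", "VBD", "NP"),
    ("S", "PRP", "VP"),
    ("SBAR", "PRP", "VB"),
    ("VP", "VBD", "SBAR"),
]


def recognize(words, goal="S"):
    """CKY recognition; returns (accepted?, set of proved items)."""
    n = len(words)
    chart = set()
    for i in range(1, n + 1):                       # axioms: lexical rules
        for x in LEXICAL.get(words[i - 1], ()):
            chart.add((i - 1, x, i))
    for l in range(2, n + 1):                       # span length
        for i in range(0, n - l + 1):
            j = i + l
            for k in range(i + 1, j):               # split point
                for x, y, z in BINARY:
                    if (i, y, k) in chart and (k, z, j) in chart:
                        chart.add((i, x, j))
    return (0, goal, n) in chart, chart


accepted, items = recognize("I saw her duck".split())
print(accepted, len(items))
```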

That’s nice, but…

•We need probabilities.

•We need to compute the most probable parse.

•We need to compute expectations.

Most probable parse

I₁ saw₂ her₃ duck₄

Rule probabilities:

NN → duck (1.0)
PRP → I (0.7)
VBD → saw (1.0)
PRP$ → her (1.0)
NP → PRP$ NN (1.0)
VP → VBD NP (0.8)
S → PRP VP (1.0)
PRP → her (0.3)
VB → duck (1.0)
SBAR → PRP VB (1.0)
VP → VBD SBAR (0.2)

Chart values:

$\mathrm{PRP}_{0,1} = 0.7$, $\mathrm{VBD}_{1,2} = 1.0$, $\mathrm{PRP\$}_{2,3} = 1.0$, $\mathrm{NN}_{3,4} = 1.0$, $\mathrm{PRP}_{2,3} = 0.3$, $\mathrm{VB}_{3,4} = 1.0$

$\mathrm{NP}_{2,4} = 1.0$, $\mathrm{SBAR}_{2,4} = 0.3$

$\mathrm{VP}_{1,4}$: 0.06 via SBAR, 0.8 via NP; keep the maximum, 0.8

$\mathrm{S}_{0,4} = 0.7 \times 0.8 \times 1.0 = 0.56$

$$X_{i,j} = \max\bigl(X_{i,j},\; Y_{i,k} \times Z_{k,j} \times p(X \rightarrow Y\,Z)\bigr)$$
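A Python sketch of the max update, extended with backpointers so the best tree can be read off. The rule probabilities are the assignment implied by the chart values on the slides (p(PRP → I) = 0.7, p(PRP → her) = 0.3, p(VP → VBD NP) = 0.8, p(VP → VBD SBAR) = 0.2, all others 1.0); the backpointer table `back` and the bracketed tree format are illustrative additions.

```python
LEXICAL = {  # (X, word) -> p(X -> word)
    ("PRP", "I"): 0.7, ("PRP", "her"): 0.3, ("VBD", "saw"): 1.0,
    ("PRP$", "her"): 1.0, ("NN", "duck"): 1.0, ("VB", "duck"): 1.0,
}
BINARY = {  # (X, Y, Z) -> p(X -> Y Z)
    ("NP", "PRP$", "NN"): 1.0, ("VP", "VBD", "NP"): 0.8,
    ("VP", "VBD", "SBAR"): 0.2, ("S", "PRP", "VP"): 1.0,
    ("SBAR", "PRP", "VB"): 1.0,
}


def viterbi(words, goal="S"):
    """Return (probability of the best parse, bracketed best tree)."""
    n = len(words)
    best = {}   # (i, X, j) -> probability of the best subtree
    back = {}   # (i, X, j) -> word, or (k, Y, Z) for a binary rule
    for i, word in enumerate(words):
        for (x, w), p in LEXICAL.items():
            if w == word and p > best.get((i, x, i + 1), 0.0):
                best[i, x, i + 1] = p
                back[i, x, i + 1] = word
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (x, y, z), p in BINARY.items():
                    if (i, y, k) in best and (k, z, j) in best:
                        score = best[i, y, k] * best[k, z, j] * p
                        if score > best.get((i, x, j), 0.0):   # the max update
                            best[i, x, j] = score
                            back[i, x, j] = (k, y, z)

    def tree(i, x, j):
        entry = back[i, x, j]
        if isinstance(entry, str):
            return f"({x} {entry})"
        k, y, z = entry
        return f"({x} {tree(i, y, k)} {tree(k, z, j)})"

    if (0, goal, n) not in best:
        return 0.0, None
    return best[0, goal, n], tree(0, goal, n)


prob, best_tree = viterbi("I saw her duck".split())
print(round(prob, 3), best_tree)
```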

Rule expectations

I₁ saw₂ her₃ duck₄

Rule probabilities:

NN → duck (1.0)
PRP → I (0.7)
VBD → saw (1.0)
PRP$ → her (1.0)
NP → PRP$ NN (1.0)
VP → VBD NP (0.8)
S → PRP VP (1.0)
PRP → her (0.3)
VB → duck (1.0)
SBAR → PRP VB (1.0)
VP → VBD SBAR (0.2)

Chart values:

$\mathrm{PRP}_{0,1} = 0.7$, $\mathrm{VBD}_{1,2} = 1.0$, $\mathrm{PRP\$}_{2,3} = 1.0$, $\mathrm{NN}_{3,4} = 1.0$, $\mathrm{PRP}_{2,3} = 0.3$, $\mathrm{VB}_{3,4} = 1.0$

$\mathrm{NP}_{2,4} = 1.0$, $\mathrm{SBAR}_{2,4} = 0.3$

$\mathrm{VP}_{1,4}$: 0.06 via SBAR plus 0.8 via NP, giving 0.86

$\mathrm{S}_{0,4} = 0.7 \times 0.86 \times 1.0 = 0.602$

$$X_{i,j} = X_{i,j} + \bigl(Y_{i,k} \times Z_{k,j} \times p(X \rightarrow Y\,Z)\bigr)$$

Similarities

inside:

$$X_{i,j} = X_{i,j} + \bigl(Y_{i,k} \times Z_{k,j} \times p(X \rightarrow Y\,Z)\bigr) \qquad \langle \mathbb{R}^{+}, +, 0, \times, 1 \rangle$$

tropical:

$$X_{i,j} = \max\bigl(X_{i,j},\; Y_{i,k} \times Z_{k,j} \times p(X \rightarrow Y\,Z)\bigr) \qquad \langle \mathbb{R}^{+}, \max, 0, \times, 1 \rangle$$

boolean:

$$X_{i,j} = X_{i,j} \vee \bigl(Y_{i,k} \wedge Z_{k,j} \wedge (X \rightarrow Y\,Z)\bigr) \qquad \langle \{T, F\}, \vee, F, \wedge, T \rangle$$

semiring:

$$X_{i,j} = X_{i,j} \oplus \bigl(Y_{i,k} \otimes Z_{k,j} \otimes R(X \rightarrow Y\,Z)\bigr) \qquad \langle A, \oplus, \bar{0}, \otimes, \bar{1} \rangle$$

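Parsing as weighted deduction

For sentence w1…wn, grammar G with nonterminals N

items:

$$[i, X, j] \quad \forall i, j \in 0, \ldots, n,\ X \in N$$

axioms:

$$[X \rightarrow w] \quad \forall X \rightarrow w \in P_G$$

$$[X \rightarrow Y\,Z] \quad \forall X \rightarrow Y\,Z \in P_G$$

$$[w_i = w] \quad \forall i \in 1, \ldots, n$$

inference rules:

$$\frac{[X \rightarrow w] : u \quad [w_{i+1} = w] : v}{[i, X, i+1] : u \otimes v}$$

$$\frac{[X \rightarrow Y\,Z] : u \quad [i, Y, k] : v \quad [k, Z, j] : y}{[i, X, j] : u \otimes v \otimes y}$$

goal:

$$[0, S, n]$$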

From proof to (pseudo)code

Input: w1…wn, grammar G
for i in 1,…,n:
    for X->w_i in P(G):
        chart[i-1,X,i] := u(X->w_i)
for l in 2,…,n:
    for i in 0,…,n-l:
        j := i+l
        for k in i+1,…,j-1:
            for X->YZ in P(G):
                chart[i,X,j] += u(X->YZ)*chart[i,Y,k]*chart[k,Z,j]
return chart[0,S,n]
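A Python sketch of the weighted pseudocode with the semiring passed in as a parameter, so one loop serves as recognizer, Viterbi, and inside algorithm. The `Semiring` tuple and its `lift` field (which maps a rule probability into the semiring) are illustrative assumptions, as are the rule probabilities, which are the assignment implied by the chart values in the example (0.7 / 0.3 for the two PRP rules, 0.8 / 0.2 for the two VP rules).

```python
from collections import namedtuple

Semiring = namedtuple("Semiring", "plus times zero one lift")

BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b,
                   False, True, lambda p: p > 0)
VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0, lambda p: p)  # max-times, labelled "tropical" on the slide
INSIDE = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0, lambda p: p)

LEXICAL = {  # (X, word) -> p(X -> word)
    ("PRP", "I"): 0.7, ("PRP", "her"): 0.3, ("VBD", "saw"): 1.0,
    ("PRP$", "her"): 1.0, ("NN", "duck"): 1.0, ("VB", "duck"): 1.0,
}
BINARY = {  # (X, Y, Z) -> p(X -> Y Z)
    ("NP", "PRP$", "NN"): 1.0, ("VP", "VBD", "NP"): 0.8,
    ("VP", "VBD", "SBAR"): 0.2, ("S", "PRP", "VP"): 1.0,
    ("SBAR", "PRP", "VB"): 1.0,
}


def parse(words, sr, goal="S"):
    """Weighted CKY: returns the semiring value of the goal item."""
    n = len(words)
    chart = {}
    for i, word in enumerate(words):
        for (x, w), p in LEXICAL.items():
            if w == word:
                key = (i, x, i + 1)
                chart[key] = sr.plus(chart.get(key, sr.zero), sr.lift(p))
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (x, y, z), p in BINARY.items():
                    if (i, y, k) in chart and (k, z, j) in chart:
                        # u (x) v (x) y from the inference rule
                        value = sr.times(sr.lift(p),
                                         sr.times(chart[i, y, k], chart[k, z, j]))
                        key = (i, x, j)
                        chart[key] = sr.plus(chart.get(key, sr.zero), value)
    return chart.get((0, goal, n), sr.zero)


sentence = "I saw her duck".split()
for name, sr in (("boolean", BOOLEAN), ("viterbi", VITERBI), ("inside", INSIDE)):
    print(name, parse(sentence, sr))
```

Only the semiring changes between the three runs; the loops and the chart are shared.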

Semiring parsing

•Viterbi, inside, boolean (Goodman 1999)

•Expectation and variance semirings (Li & Eisner 2009)

•Feature expectations

•Minimum Bayes Risk

•Gradients, etc.

•minimum error upper envelope (Kumar et al. 2009)

Remaining questions

•How do we get the grammar?

•How do n-gram language models fit in?

•Is this really a plausible model of translation?

•Does it actually work? Why or why not?