chapter 11 tree-based models - machine translation · chapter 11: tree-based models 11. learning...
TRANSCRIPT
![Page 1: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/1.jpg)
Chapter 11
Tree-based models
Statistical Machine Translation
![Page 2: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/2.jpg)
Tree-Based Models
• Traditional statistical models operate on sequences of words
• Many translation problems can be best explained by pointing to syntax
– reordering, e.g., verb movement in German–English translation– long distance agreement (e.g., subject-verb) in output
⇒ Translation models based on tree representation of language
– significant ongoing research– state-of-the art for some language pairs
Chapter 11: Tree-Based Models 1
![Page 3: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/3.jpg)
Phrase Structure Grammar
• Phrase structure
– noun phrases: the big man, a house, ...– prepositional phrases: at 5 o’clock, in Edinburgh, ...– verb phrases: going out of business, eat chicken, ...– adjective phrases, ...
• Context-free Grammars (CFG)
– non-terminal symbols: phrase structure labels, part-of-speech tags– terminal symbols: words– production rules: nt → [nt,t]+
example: np → det nn
Chapter 11: Tree-Based Models 2
![Page 4: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/4.jpg)
Phrase Structure Grammar
PRPI
MDshall
VBbe
VBGpassing
RPon
TOto
PRPyou
DTsome
NNScomments
NP-APP
VP-AVP-A
VP-AS
Phrase structure grammar tree for an English sentence(as produced Collins’ parser)
Chapter 11: Tree-Based Models 3
![Page 5: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/5.jpg)
Synchronous Phrase Structure Grammar
• English rule
np → det jj nn
• French rule
np → det nn jj
• Synchronous rule (indices indicate alignment):
np → det1 nn2 jj3 | det1 jj3 nn2
Chapter 11: Tree-Based Models 4
![Page 6: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/6.jpg)
Synchronous Grammar Rules
• Nonterminal rules
np → det1 nn2 jj3 | det1 jj3 nn2
• Terminal rules
n → maison | house
np → la maison bleue | the blue house
• Mixed rules
np → la maison jj1 | the jj1 house
Chapter 11: Tree-Based Models 5
![Page 7: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/7.jpg)
Tree-Based Translation Model
• Translation by parsing
– synchronous grammar has to parse entire input sentence– output tree is generated at the same time– process is broken up into a number of rule applications
• Translation probability
score(tree,e, f) =∏
i
rulei
• Many ways to assign probabilities to rules
Chapter 11: Tree-Based Models 6
![Page 8: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/8.jpg)
Aligned Tree Pair
PRPI
MDshall
VBbe
VBGpassing
RPon
TOto
PRPyou
DTsome
NNScomments
NP-APP
VP-AVP-A
VP-AS
IchPPER
werdeVAFIN
IhnenPPER
dieART
entsprechendenADJ
AnmerkungenNN
aushändigenVVFIN
NP
VPS VP
Phrase structure grammar trees with word alignment(German–English sentence pair.)
Chapter 11: Tree-Based Models 7
![Page 9: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/9.jpg)
Reordering Rule
• Subtree alignmentvp
pper
...
np
...
vvfin
aushandigen
↔ vp
vbg
passing
rp
on
pp
...
np
...
• Synchronous grammar rulevp → pper1 np2 aushandigen | passing on pp1 np2
• Note:
– one word aushandigen mapped to two words passing on ok– but: fully non-terminal rule not possible
(one-to-one mapping constraint for nonterminals)
Chapter 11: Tree-Based Models 8
![Page 10: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/10.jpg)
Another Rule
• Subtree alignment
pro
Ihnen
↔ pp
to
to
prp
you
• Synchronous grammar rule (stripping out English internal structure)
pro/pp → Ihnen | to you
• Rule with internal structure
pro/pp → Ihnento
to
prp
you
Chapter 11: Tree-Based Models 9
![Page 11: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/11.jpg)
Another Rule
• Translation of German werde to English shall be
vp
vafin
werde
vp
...
↔ vp
md
shall
vp
vb
be
vp
...• Translation rule needs to include mapping of vp
⇒ Complex rule
vp →vafin
werde
vp1md
shall
vp
vb
be
vp1
Chapter 11: Tree-Based Models 10
![Page 12: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/12.jpg)
Internal Structure
• Stripping out internal structure
vp → werde vp1 | shall be vp1
⇒ synchronous context free grammar
• Maintaining internal structure
vp →vafin
werde
vp1md
shall
vp
vb
be
vp1
⇒ synchronous tree substitution grammar
Chapter 11: Tree-Based Models 11
![Page 13: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/13.jpg)
Learning Synchronous Grammars
• Extracting rules from a word-aligned parallel corpus
• First: Hierarchical phrase-based model
– only one non-terminal symbol x– no linguistic syntax, just a formally syntactic model
• Then: Synchronous phrase structure model
– non-terminals for words and phrases: np, vp, pp, adj, ...– corpus must also be parsed with syntactic parser
Chapter 11: Tree-Based Models 12
![Page 14: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/14.jpg)
Extracting Phrase Translation Rules
Ishall
bepassing
some
onto
you
comments
Ich
werd
eIh
nen
die
ents
prec
hend
enAn
mer
kung
enau
shän
dige
n
shall be = werde
Chapter 11: Tree-Based Models 13
![Page 15: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/15.jpg)
Extracting Phrase Translation Rules
Ishall
bepassing
some
onto
you
comments
Ich
werd
eIh
nen
die
ents
prec
hend
enAn
mer
kung
enau
shän
dige
nsome comments = die entsprechenden Anmerkungen
Chapter 11: Tree-Based Models 14
![Page 16: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/16.jpg)
Extracting Phrase Translation Rules
Ishall
bepassing
some
onto
you
comments
Ich
werd
eIh
nen
die
ents
prec
hend
enAn
mer
kung
enau
shän
dige
n
werde Ihnen die entsprechenden Anmerkungen aushändigen = shall be passing on to you some comments
Chapter 11: Tree-Based Models 15
![Page 17: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/17.jpg)
Extracting Hierarchical Phrase Translation Rules
Ishall
bepassing
some
onto
you
comments
Ich
werd
eIh
nen
die
ents
prec
hend
enAn
mer
kung
enau
shän
dige
n
werde X aushändigen= shall be passing on X
subtractingsubphrase
Chapter 11: Tree-Based Models 16
![Page 18: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/18.jpg)
Formal Definition
• Recall: consistent phrase pairs
(e, f) consistent with A⇔∀ei ∈ e : (ei, fj) ∈ A→ fj ∈ f
and ∀fj ∈ f : (ei, fj) ∈ A→ ei ∈ eand ∃ei ∈ e, fj ∈ f : (ei, fj) ∈ A
• Let P be the set of all extracted phrase pairs (e, f)
Chapter 11: Tree-Based Models 17
![Page 19: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/19.jpg)
Formal Definition
• Extend recursively:
if (e, f) ∈ P and (esub, fsub) ∈ Pand e = epre + esub + epost
and f = fpre + fsub + fpost
and e 6= esub and f 6= fsub
add (epre + x + epost, fpre + x + fpost) to P
(note: any of epre, epost, fpre, or fpost may be empty)
• Set of hierarchical phrase pairs is the closure under this extension mechanism
Chapter 11: Tree-Based Models 18
![Page 20: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/20.jpg)
Comments
• Removal of multiple sub-phrases leads to rules with multiple non-terminals,such as:
y → x1 x2 | x2 of x1
• Typical restrictions to limit complexity [Chiang, 2005]
– at most 2 nonterminal symbols– at least 1 but at most 5 words per language– span at most 15 words (counting gaps)
Chapter 11: Tree-Based Models 19
![Page 21: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/21.jpg)
Learning Syntactic Translation Rules
PRP IMD shall
VB beVBG passing
DT some
RP onTO to
PRP you
NNS comments
Ich
PPE
R
werd
e V
AFIN
Ihne
n P
PER
die
ART
ents
pr.
ADJ
Anm
. N
N
aush
änd.
VV
FIN
NP
PPVP
VP
VP
S
NP
VPVP
S
pro
Ihnen
= pp
to
to
prp
you
Chapter 11: Tree-Based Models 20
![Page 22: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/22.jpg)
Constraints on Syntactic Rules
• Same word alignment constraints as hierarchical models
• Hierarchical: rule can cover any span⇔ syntactic rules must cover constituents in the tree
• Hierarchical: gaps may cover any span⇔ gaps must cover constituents in the tree
• Much less rules are extracted (all things being equal)
Chapter 11: Tree-Based Models 21
![Page 23: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/23.jpg)
Impossible Rules
PRP IMD shall
VB beVBG passing
DT some
RP onTO to
PRP you
NNS comments
Ich
PPE
R
werd
e V
AFIN
Ihne
n P
PER
die
ART
ents
pr.
ADJ
Anm
. N
N
aush
änd.
VV
FIN
NP
PPVP
VP
VP
S
NP
VPVP
S
English span not a constituentno rule extracted
Chapter 11: Tree-Based Models 22
![Page 24: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/24.jpg)
Rules with Context
PRP IMD shall
VB beVBG passing
DT some
RP onTO to
PRP you
NNS comments
Ich
PPE
R
werd
e V
AFIN
Ihne
n P
PER
die
ART
ents
pr.
ADJ
Anm
. N
N
aush
änd.
VV
FIN
NP
PPVP
VP
VP
S
NP
VPVP
S
Rule with this phrase pair
requires syntactic context
vp
vafin
werde
vp
... =
vp
md
shall
vp
vb
be
vp
...
Chapter 11: Tree-Based Models 23
![Page 25: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/25.jpg)
Too Many Rules Extractable
• Huge number of rules can be extracted(every alignable node may or may not be part of a rule → exponential number of rules)
• Need to limit which rules to extract
• Option 1: similar restriction as for hierarchical model(maximum span size, maximum number of terminals and non-terminals, etc.)
• Option 2: only extract minimal rules (”GHKM” rules)
Chapter 11: Tree-Based Models 24
![Page 26: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/26.jpg)
Minimal Rules
I shall be passing on to you some comments
PRP MD VB VBG RP TO PRP DT NNS
NPPP
VP
VP
VP
S
Ich werde Ihnen die entsprechenden Anmerkungen aushändigen
Extract: set of smallest rules required to explain the sentence pair
Chapter 11: Tree-Based Models 25
![Page 27: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/27.jpg)
Lexical Rule
I shall be passing on to you some comments
PRP MD VB VBG RP TO PRP DT NNS
NPPP
VP
VP
VP
S
Ich werde Ihnen die entsprechenden Anmerkungen aushändigen
Extracted rule: prp → Ich | I
Chapter 11: Tree-Based Models 26
![Page 28: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/28.jpg)
Lexical Rule
I shall be passing on to you some comments
PRP MD VB VBG RP TO PRP DT NNS
NPPP
VP
VP
VP
S
Ich werde Ihnen die entsprechenden Anmerkungen aushändigen
Extracted rule: prp → Ihnen | you
Chapter 11: Tree-Based Models 27
![Page 29: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/29.jpg)
Lexical Rule
I shall be passing on to you some comments
PRP MD VB VBG RP TO PRP DT NNS
NPPP
VP
VP
VP
S
Ich werde Ihnen die entsprechenden Anmerkungen aushändigen
Extracted rule: dt → die | some
Chapter 11: Tree-Based Models 28
![Page 30: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/30.jpg)
Lexical Rule
I shall be passing on to you some comments
PRP MD VB VBG RP TO PRP DT NNS
NPPP
VP
VP
VP
S
Ich werde Ihnen die entsprechenden Anmerkungen aushändigen
Extracted rule: nns → Anmerkungen | comments
Chapter 11: Tree-Based Models 29
![Page 31: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/31.jpg)
Insertion Rule
I shall be passing on to you some comments
PRP MD VB VBG RP TO PRP DT NNS
NPPP
VP
VP
VP
S
Ich werde Ihnen die entsprechenden Anmerkungen aushändigen
Extracted rule: pp → x | to prp
Chapter 11: Tree-Based Models 30
![Page 32: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/32.jpg)
Non-Lexical Rule
I shall be passing on to you some comments
PRP MD VB VBG RP TO PRP DT NNS
NPPP
VP
VP
VP
S
Ich werde Ihnen die entsprechenden Anmerkungen aushändigen
Extracted rule: np → x1 x2 | dt1 nns2
Chapter 11: Tree-Based Models 31
![Page 33: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/33.jpg)
Lexical Rule with Syntactic Context
I shall be passing on to you some comments
PRP MD VB VBG RP TO PRP DT NNS
NPPP
VP
VP
VP
S
Ich werde Ihnen die entsprechenden Anmerkungen aushändigen
Extracted rule: vp → x1 x2 aushandigen | passing on pp1 np2
Chapter 11: Tree-Based Models 32
![Page 34: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/34.jpg)
Lexical Rule with Syntactic Context
I shall be passing on to you some comments
PRP MD VB VBG RP TO PRP DT NNS
NPPP
VP
VP
VP
S
Ich werde Ihnen die entsprechenden Anmerkungen aushändigen
Extracted rule: vp → werde x | shall be vp (ignoring internal structure)
Chapter 11: Tree-Based Models 33
![Page 35: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/35.jpg)
Non-Lexical Rule
I shall be passing on to you some comments
PRP MD VB VBG RP TO PRP DT NNS
NPPP
VP
VP
VP
S
Ich werde Ihnen die entsprechenden Anmerkungen aushändigen
Extracted rule: s → x1 x2 | prp1 vp2
done — note: one rule per alignable constituent
Chapter 11: Tree-Based Models 34
![Page 36: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/36.jpg)
Unaligned Source Words
I shall be passing on to you some comments
PRP MD VB VBG RP TO PRP DT NNS
NPPP
VP
VP
VP
S
Ich werde Ihnen die entsprechenden Anmerkungen aushändigen
Attach to neighboring words or higher nodes → additional rules
Chapter 11: Tree-Based Models 35
![Page 37: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/37.jpg)
Too Few Phrasal Rules?
• Lexical rules will be 1-to-1 mappings (unless word alignment requires otherwise)
• But: phrasal rules very beneficial in phrase-based models
• Solutions
– combine rules that contain a maximum number of symbols(as in hierarchical models, recall: ”Option 1”)
– compose minimal rules to cover a maximum number of non-leaf nodes
Chapter 11: Tree-Based Models 36
![Page 38: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/38.jpg)
Composed Rules
• Current rules x1 x2 = np
dt1 nns1
die = dt
some
entsprechenden Anmerkungen = nns
comments
• Composed ruledie entsprechenden Anmerkungen = np
dt
some
nns
comments
(1 non-leaf node: np)
Chapter 11: Tree-Based Models 37
![Page 39: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/39.jpg)
Composed Rules
• Minimal rule: x1 x2 aushandigen = vp
prp
passing
prp
on
pp1 np2
3 non-leaf nodes:vp, pp, np
• Composed rule: Ihnen x1 aushandigen = vp
prp
passing
prp
on
pp
to
to
prp
you
np1
3 non-leaf nodes:vp, pp and np
Chapter 11: Tree-Based Models 38
![Page 40: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/40.jpg)
Relaxing Tree Constraints
• Impossible rulex
werde
= md
shall
vb
be
• Create new non-terminal label: md+vb
⇒ New rulex
werde
= md+vb
md
shall
vb
be
Chapter 11: Tree-Based Models 39
![Page 41: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/41.jpg)
Zollmann Venugopal Relaxation
• If span consists of two constituents , join them: x+y
• If span conststs of three constituents, join them: x+y+z
• If span covers constituents with the same parent x and include
– every but the first child y, label as x\y– every but the last child y, label as x/y
• For all other cases, label as fail
⇒ More rules can be extracted, but number of non-terminals blows up
Chapter 11: Tree-Based Models 40
![Page 42: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/42.jpg)
Special Problem: Flat Structures
• Flat structures severely limit rule extraction
np
dt
the
nnp
Israeli
nnp
Prime
nnp
Minister
nnp
Sharon
• Can only extract rules for individual words or entire phrase
Chapter 11: Tree-Based Models 41
![Page 43: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/43.jpg)
Relaxation by Tree Binarizationnp
dt
the
np
nnp
Israeli
np
nnp
Prime
np
nnp
Minister
nnp
Sharon
More rules can be extracted
Left-binarization or right-binarization?
Chapter 11: Tree-Based Models 42
![Page 44: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/44.jpg)
Scoring Translation Rules
• Extract all rules from corpus
• Score based on counts
– joint rule probability: p(lhs,rhsf ,rhse)– rule application probability: p(rhsf ,rhse|lhs)– direct translation probability: p(rhse|rhsf , lhs)– noisy channel translation probability: p(rhsf |rhse, lhs)– lexical translation probability:
∏ei∈rhse
p(ei|rhsf , a)
Chapter 11: Tree-Based Models 43
![Page 45: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/45.jpg)
Syntactic DecodingInspired by monolingual syntactic chart parsing:
During decoding of the source sentence,a chart with translations for the O(n2) spans has to be filled
SiePPER
willVAFIN
eineART
TasseNN
KaffeeNN
trinkenVVINF
NP
VPS
Chapter 11: Tree-Based Models 44
![Page 46: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/46.jpg)
Syntax Decoding
SiePPER
willVAFIN
eineART
TasseNN
KaffeeNN
trinkenVVINF
NP
VPS
VBdrink
➏
German input sentence with tree
Chapter 11: Tree-Based Models 45
![Page 47: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/47.jpg)
Syntax Decoding
SiePPER
willVAFIN
eineART
TasseNN
KaffeeNN
trinkenVVINF
NP
VPS
PROshe
VBdrink
➏
➊
Purely lexical rule: filling a span with a translation (a constituent in the chart)
Chapter 11: Tree-Based Models 46
![Page 48: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/48.jpg)
Syntax Decoding
SiePPER
willVAFIN
eineART
TasseNN
KaffeeNN
trinkenVVINF
NP
VPS
PROshe
VBdrink
NNcoffee
➏
➊ ➋
Purely lexical rule: filling a span with a translation (a constituent in the chart)
Chapter 11: Tree-Based Models 47
![Page 49: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/49.jpg)
Syntax Decoding
SiePPER
willVAFIN
eineART
TasseNN
KaffeeNN
trinkenVVINF
NP
VPS
PROshe
VBdrink
NNcoffee
➏
➊ ➋ ➌
Purely lexical rule: filling a span with a translation (a constituent in the chart)
Chapter 11: Tree-Based Models 48
![Page 50: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/50.jpg)
Syntax Decoding
SiePPER
willVAFIN
eineART
TasseNN
KaffeeNN
trinkenVVINF
NP
VPS
PROshe
VBdrink
NN|
cup
IN|of
NP
PP
NN
NP
DET|a
NNcoffee
➏
➊ ➋ ➌
➍
Complex rule: matching underlying constituent spans, and covering words
Chapter 11: Tree-Based Models 49
![Page 51: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/51.jpg)
Syntax Decoding
SiePPER
willVAFIN
eineART
TasseNN
KaffeeNN
trinkenVVINF
NP
VPS
PROshe
VBdrink
NN|
cup
IN|of
NP
PP
NN
NP
DET|a
VBZ|
wantsVB
VPVP
NPTO|to
NNcoffee
➏
➊ ➋ ➌
➍
➎
Complex rule with reordering
Chapter 11: Tree-Based Models 50
![Page 52: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/52.jpg)
Syntax Decoding
SiePPER
willVAFIN
eineART
TasseNN
KaffeeNN
trinkenVVINF
NP
VPS
PROshe
VBdrink
NN|
cup
IN|
of
NP
PP
NN
NP
DET|a
VBZ|
wantsVB
VPVP
NPTO|
to
NNcoffee
S
PRO VP
➏
➊ ➋ ➌
➍
➎
Chapter 11: Tree-Based Models 51
![Page 53: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/53.jpg)
Bottom-Up Decoding
• For each span, a stack of (partial) translations is maintained
• Bottom-up: a higher stack is filled, once underlying stacks are complete
Chapter 11: Tree-Based Models 52
![Page 54: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/54.jpg)
Naive Algorithm
Input: Foreign sentence f = f1, ...flf , with syntax treeOutput: English translation e
1: for all spans [start,end] (bottom up) do2: for all sequences s of hypotheses and words in span [start,end] do3: for all rules r do4: if rule r applies to chart sequence s then5: create new hypothesis c6: add hypothesis c to chart7: end if8: end for9: end for
10: end for11: return English translation e from best hypothesis in span [0,lf ]
Chapter 11: Tree-Based Models 53
![Page 55: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/55.jpg)
Chart Organization
SiePPER
willVAFIN
eineART
TasseNN
KaffeeNN
trinkenVVINF
NP
VPS
• Chart consists of cells that cover contiguous spans over the input sentence
• Each cell contains a set of hypotheses1
• Hypothesis = translation of span with target-side constituent
1In the book, they are called chart entries.
Chapter 11: Tree-Based Models 54
![Page 56: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/56.jpg)
Dynamic Programming
Applying rule creates new hypothesis
eineART
TasseNN
KaffeeNN
trinkenVVINF
NP: coffee
NP+P: a cup of
NP: a cup of coffee
apply rule:NP → NP Kaffee ; NP → NP+P coffee
Chapter 11: Tree-Based Models 55
![Page 57: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/57.jpg)
Dynamic Programming
Another hypothesis
eineART
TasseNN
KaffeeNN
trinkenVVINF
NP: coffee
NP+P: a cup of
NP: a cup of coffee
apply rule:NP → eine Tasse NP ; NP → a cup of NP
NP: a cup of coffee
Both hypotheses are indistiguishable in future search→ can be recombined
Chapter 11: Tree-Based Models 56
![Page 58: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/58.jpg)
Recombinable States
Recombinable?
NP: a cup of coffee
NP: a cup of coffee
NP: a mug of coffee
Chapter 11: Tree-Based Models 57
![Page 59: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/59.jpg)
Recombinable States
Recombinable?
NP: a cup of coffee
NP: a cup of coffee
NP: a mug of coffee
Yes, iff max. 2-gram language model is used
Chapter 11: Tree-Based Models 58
![Page 60: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/60.jpg)
Recombinability
Hypotheses have to match in
• span of input words covered
• output constituent label
• first n–1 output wordsnot properly scored, since they lack context
• last n–1 output wordsstill affect scoring of subsequently added words,
just like in phrase-based decoding
(n is the order of the n-gram language model)
Chapter 11: Tree-Based Models 59
![Page 61: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/61.jpg)
Language Model Contexts
When merging hypotheses, internal language model contexts are absorbed
NP
(minister)the foreign ... ... of Germany
S
(minister of Germany met with Condoleezza Rice)the foreign ... ... in Frankfurt
VP
(Condoleezza Rice)met with ... ... in Frankfurt
relevant history un-scored wordspLM(met | of Germany)
pLM(with | Germany met)
Chapter 11: Tree-Based Models 60
![Page 62: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/62.jpg)
Stack Pruning
• Number of hypotheses in each chart cell explodes
⇒ need to discard bad hypothesese.g., keep 100 best only
• Different stacks for different output constituent labels?
• Cost estimates
– translation model cost known– language model cost for internal words known→ estimates for initial words
– outside cost estimate?(how useful will be a NP covering input words 3–5 later on?)
Chapter 11: Tree-Based Models 61
![Page 63: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/63.jpg)
Naive Algorithm: Blow-ups
• Many subspan sequences
for all sequences s of hypotheses and words in span [start,end]
• Many rules
for all rules r
• Checking if a rule applies not trivial
rule r applies to chart sequence s
⇒ Unworkable
Chapter 11: Tree-Based Models 62
![Page 64: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/64.jpg)
Solution
• Prefix tree data structure for rules
• Dotted rules
• Cube pruning
Chapter 11: Tree-Based Models 63
![Page 65: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/65.jpg)
Storing Rules
• First concern: do they apply to span?→ have to match available hypotheses and input words
• Example rule
np → x1 des x2 | np1 of the nn2
• Check for applicability
– is there an initial sub-span that with a hypothesis with constituent label np?– is it followed by a sub-span over the word des?– is it followed by a final sub-span with a hypothesis with label nn?
• Sequence of relevant information
np • des • nn • np1 of the nn2
Chapter 11: Tree-Based Models 64
![Page 66: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/66.jpg)
Rule Applicability Check
Trying to cover a span of six words with given rule
das Haus des Architekten Frank Gehry
NP • des • NN → NP: NP of the NN
Chapter 11: Tree-Based Models 65
![Page 67: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/67.jpg)
Rule Applicability Check
First: check for hypotheses with output constituent label np
das Haus des Architekten Frank Gehry
NP • des • NN → NP: NP of the NN
Chapter 11: Tree-Based Models 66
![Page 68: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/68.jpg)
Rule Applicability Check
Found np hypothesis in cell, matched first symbol of rule
das Haus des Architekten Frank Gehry
NP
NP • des • NN → NP: NP of the NN
Chapter 11: Tree-Based Models 67
![Page 69: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/69.jpg)
Rule Applicability Check
Matched word des, matched second symbol of rule
das Haus des Architekten Frank Gehry
NP
NP • des • NN → NP: NP of the NN
Chapter 11: Tree-Based Models 68
![Page 70: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/70.jpg)
Rule Applicability Check
Found a nn hypothesis in cell, matched last symbol of rule
das Haus des Architekten Frank Gehry
NP NN
NP • des • NN → NP: NP of the NN
Chapter 11: Tree-Based Models 69
![Page 71: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/71.jpg)
Rule Applicability Check
Matched entire rule → apply to create a np hypothesis
das Haus des Architekten Frank Gehry
NP NN
NP • des • NN → NP: NP of the NN
NP
Chapter 11: Tree-Based Models 70
![Page 72: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/72.jpg)
Rule Applicability Check
Look up output words to create new hypothesis(note: there may be many matching underlying np and nn hypotheses)
das Haus des Architekten Frank Gehry
NP: the house NN: architect Frank Gehry
NP • des • NN → NP: NP of the NN
NP: the house of the architect Frank Gehry
Chapter 11: Tree-Based Models 71
![Page 73: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/73.jpg)
Checking Rules vs. Finding Rules
• What we showed:
– given a rule– check if and how it can be applied
• But there are too many rules (millions) to check them all
• Instead:
– given the underlying chart cells and input words– find which rules apply
Chapter 11: Tree-Based Models 72
![Page 74: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/74.jpg)
Prefix Tree for Rules
NP: NP1 of IN2 NP3
NP
PP …
DETNP …
desum
......
NN
NN
NP: NP1 IN2 NP3
NP: NP1 of DET2 NP3
NP: NP1 of the NN2
VP …
VP …
DET NN NP: DET1 NN2
......
NP: NP1
das Haus NP: the house
NP: NP1 of NP2
NP: NP2 NP1
......
... ...
... ...
...
Highlighted Rulesnp → np1 det2 nn3 | np1 in2 nn3
np → np1 | np1
np → np1 des nn2 | np1 of the nn2
np → np1 des nn2 | np2 np1
np → det1 nn2 | det1 nn2
np → das Haus | the house
Chapter 11: Tree-Based Models 73
![Page 75: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/75.jpg)
Dotted Rules: Key Insight
• If we can apply a rule like
p → A B C | x
to a span
• Then we could have applied a rule like
q → A B | y
to a sub-span with the same starting word
⇒ We can re-use rule lookup by storing A B • (dotted rule)
Chapter 11: Tree-Based Models 74
![Page 76: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/76.jpg)
Finding Applicable Rules in Prefix Tree
das Haus des Architekten Frank Gehry
Chapter 11: Tree-Based Models 75
![Page 77: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/77.jpg)
Covering the First Cell
das Haus des Architekten Frank Gehry
Chapter 11: Tree-Based Models 76
![Page 78: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/78.jpg)
Looking up Rules in the Prefix Tree
● das ❶
das Haus des Architekten Frank Gehry
Chapter 11: Tree-Based Models 77
![Page 79: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/79.jpg)
Taking Note of the Dotted Rule
● das ❶
das ❶
das Haus des Architekten Frank Gehry
Chapter 11: Tree-Based Models 78
![Page 80: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/80.jpg)
Checking if Dotted Rule has Translations
● das ❶ DET: theDET: that
das ❶
das Haus des Architekten Frank Gehry
Chapter 11: Tree-Based Models 79
![Page 81: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/81.jpg)
Applying the Translation Rules
● das ❶ DET: theDET: that
das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
Chapter 11: Tree-Based Models 80
![Page 82: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/82.jpg)
Looking up Constituent Label in Prefix Tree
● das ❶
DET ❷
das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
Chapter 11: Tree-Based Models 81
![Page 83: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/83.jpg)
Add to Span’s List of Dotted Rules
● das ❶
DET ❷
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
Chapter 11: Tree-Based Models 82
![Page 84: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/84.jpg)
Moving on to the Next Cell
● das ❶
DET ❷
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
Chapter 11: Tree-Based Models 83
![Page 85: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/85.jpg)
Looking up Rules in the Prefix Tree
● das ❶
DET ❷
Haus ❸
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
Chapter 11: Tree-Based Models 84
![Page 86: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/86.jpg)
Taking Note of the Dotted Rule
● das ❶
DET ❷
Haus ❸
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
house ❸
Chapter 11: Tree-Based Models 85
![Page 87: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/87.jpg)
Checking if Dotted Rule has Translations
● das ❶
DET ❷
Haus ❸NN: houseNP: house
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
house ❸
Chapter 11: Tree-Based Models 86
![Page 88: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/88.jpg)
Applying the Translation Rules
● das ❶
DET ❷
Haus ❸NN: houseNP: house
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
house ❸
NN: houseNP: house
Chapter 11: Tree-Based Models 87
![Page 89: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/89.jpg)
Looking up Constituent Label in Prefix Tree
● das ❶
DET ❷
Haus ❸
NN ❹NP ❺
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
house ❸
NN: houseNP: house
Chapter 11: Tree-Based Models 88
![Page 90: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/90.jpg)
Add to Span’s List of Dotted Rules
● das ❶
DET ❷
Haus ❸
NN ❹NP ❺
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
NN ❹ NP ❺ house ❸
NN: houseNP: house
Chapter 11: Tree-Based Models 89
![Page 91: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/91.jpg)
More of the Same
● das ❶
DET ❷
Haus ❸
NN ❹NP ❺
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
NN ❹ NP ❺ house ❸
NN: houseNP: house
DET ❷des•
DET: theIN: of
NN ❹Architekten•
NN: architectNP: architect
NNP•Frank•
NNP: FrankNNP•Gehry•
NNP: Gehry
Chapter 11: Tree-Based Models 90
![Page 92: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/92.jpg)
Moving on to the Next Cell
● das ❶
DET ❷
Haus ❸
NN ❹NP ❺
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
NN ❹ NP ❺ house ❸
NN: houseNP: house
DET ❷des•
DET: theIN: of
NN ❹Architekten•
NN: architectNP: architect
NNP•Frank•
NNP: FrankNNP•Gehry•
NNP: Gehry
Chapter 11: Tree-Based Models 91
![Page 93: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/93.jpg)
Covering a Longer Span
Cannot consume multiple words at once
All rules are extensions of existing dotted rules
Here: only extensions of span over das possible
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
NN ❹ NP ❺ house ❸
NN: houseNP: house
DET ❷des•
DET: theIN: of
NN ❹Architekten•
NN: architectNP: architect
NNP•Frank•
NNP: FrankNNP•Gehry•
NNP: Gehry
Chapter 11: Tree-Based Models 92
![Page 94: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/94.jpg)
Extensions of Span over das
● das ❶
DET ❷
Haus ❸
NN ❹NP ❺
NN, NP, Haus?
NN, NP, Haus?
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
NN ❹ NP ❺ house ❸
NN: houseNP: house
DET ❷des•
DET: theIN: of
NN ❹Architekten•
NN: architectNP: architect
NNP•Frank•
NNP: FrankNNP•Gehry•
NNP: Gehry
Chapter 11: Tree-Based Models 93
![Page 95: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/95.jpg)
Looking up Rules in the Prefix Tree
● das ❶
DET ❷
Haus ❻NN ❼
Haus ❽NN ❾
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
NN ❹ NP ❺ house ❸
NN: houseNP: house
DET ❷des•
DET: theIN: of
NN ❹Architekten•
NN: architectNP: architect
NNP•Frank•
NNP: FrankNNP•Gehry•
NNP: Gehry
Chapter 11: Tree-Based Models 94
![Page 96: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/96.jpg)
Taking Note of the Dotted Rule
● das ❶
DET ❷
Haus ❻NN ❼
Haus ❽NN ❾
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
NN ❹ NP ❺ house ❸
NN: houseNP: house
DET ❷des•
DET: theIN: of
NN ❹Architekten•
NN: architectNP: architect
NNP•Frank•
NNP: FrankNNP•Gehry•
NNP: Gehry
DET NN❾DET Haus❽das NN❼das Haus❻
Chapter 11: Tree-Based Models 95
![Page 97: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/97.jpg)
Checking if Dotted Rules have Translations
● das ❶
DET ❷
Haus ❻NN ❼
Haus ❽NN ❾
NP: the house
NP: the NN
NP: DET house
NP: DET NN
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
NN ❹ NP ❺ house ❸
NN: houseNP: house
DET ❷des•
DET: theIN: of
NN ❹Architekten•
NN: architectNP: architect
NNP•Frank•
NNP: FrankNNP•Gehry•
NNP: Gehry
DET NN❾DET Haus❽das NN❼das Haus❻
Chapter 11: Tree-Based Models 96
![Page 98: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/98.jpg)
Applying the Translation Rules
● das ❶
DET ❷
Haus ❻NN ❼
Haus ❽NN ❾
NP: the house
NP: the NN
NP: DET house
NP: DET NN
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
NN ❹ NP ❺ house ❸
NN: houseNP: house
DET ❷des•
DET: theIN: of
NN ❹Architekten•
NN: architectNP: architect
NNP•Frank•
NNP: FrankNNP•Gehry•
NNP: Gehry
DET NN❾DET Haus❽das NN❼das Haus❻
NP: the houseNP: that house
Chapter 11: Tree-Based Models 97
![Page 99: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/99.jpg)
Looking up Constituent Label in Prefix Tree
● das ❶
DET ❷
Haus ❻NN ❼
Haus ❽NN ❾
NP: the house
NP: the NN
NP: DET house
NP: DET NN
NP ❺
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
NN ❹ NP ❺ house ❸
NN: houseNP: house
DET ❷des•
DET: theIN: of
NN ❹Architekten•
NN: architectNP: architect
NNP•Frank•
NNP: FrankNNP•Gehry•
NNP: Gehry
DET NN❾DET Haus❽das NN❼das Haus❻
NP: the houseNP: that house
Chapter 11: Tree-Based Models 98
![Page 100: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/100.jpg)
Add to Span’s List of Dotted Rules
● das ❶
DET ❷
Haus ❻NN ❼
Haus ❽NN ❾
NP: the house
NP: the NN
NP: DET house
NP: DET NN
NP ❺
DET ❷das ❶
das Haus des Architekten Frank Gehry
DET: theDET: that
NN ❹ NP ❺ house ❸
NN: houseNP: house
DET ❷des•
DET: theIN: of
NN ❹Architekten•
NN: architectNP: architect
NNP•Frank•
NNP: FrankNNP•Gehry•
NNP: Gehry
DET NN❾ NP❺ DET Haus❽das NN❼das Haus❻
NP: the houseNP: that house
Chapter 11: Tree-Based Models 99
![Page 101: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/101.jpg)
Even Larger Spans
Extend lists of dotted rules with cell constituent labels
span’s dotted rule list (with same start)plus neighboring
span’s constituent labels of hypotheses (with same end)
das Haus des Architekten Frank Gehry
Chapter 11: Tree-Based Models 100
![Page 102: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/102.jpg)
Reflections
• Complexity O(rn3) with sentence length n and size of dotted rule list r
– may introduce maximum size for spans that do not start at beginning– may limit size of dotted rule list (very arbitrary)
• Does the list of dotted rules explode?
• Yes, if there are many rules with neighboring target-side non-terminals
– such rules apply in many places– rules with words are much more restricted
Chapter 11: Tree-Based Models 101
![Page 103: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/103.jpg)
Difficult Rules
• Some rules may apply in too many ways
• Neighboring input non-terminals
vp → gibt x1 x2 | gives np2 to np1
– non-terminals may match many different pairs of spans– especially a problem for hierarchical models (no constituent label restrictions)– may be okay for syntax-models
• Three neighboring input non-terminals
vp → trifft x1 x2 x3 heute | meets np1 today pp2 pp3
– will get out of hand even for syntax models
Chapter 11: Tree-Based Models 102
![Page 104: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/104.jpg)
Where are we now?
• We know which rules apply
• We know where they apply (each non-terminal tied to a span)
• But there are still many choices
– many possible translations– each non-terminal may match multiple hypotheses→ number choices exponential with number of non-terminals
Chapter 11: Tree-Based Models 103
![Page 105: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/105.jpg)
Rules with One Non-Terminal
Found applicable rules pp → des x | ... np ...
the architect ... NP
architect Frank ...the famous ...Frank Gehry
NP
NP
PP ➝ of NPNPPP ➝ by NP
PP ➝ in NP
PP ➝ on to NP
• Non-terminal will be filled any of h underlying matching hypotheses
• Choice of t lexical translations
⇒ Complexity O(ht)
(note: we may not group rules by target constituent label,
so a rule np → des x | the np would also be considered here as well)
Chapter 11: Tree-Based Models 104
![Page 106: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/106.jpg)
Rules with Two Non-Terminals
Found applicable rule np → x1 des x2 | np1 ... np2
the architect NP
architect Frank ...the famous ...Frank Gehry
NP
NP
NP ➝ NP of NPNPNP ➝ NP by NP
NP ➝ NP in NP
NP ➝ NP on to NP
a housea building
the buildinga new house
• Two non-terminal will be filled any of h underlying matching hypotheses each
• Choice of t lexical translations
⇒ Complexity O(h2t) — a three-dimensional ”cube” of choices
(note: rules may also reorder differently)
Chapter 11: Tree-Based Models 105
![Page 107: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/107.jpg)
Cube Pruning
a house 1.0
a building 1.3
the building 2.2
a new house 2.6
1.5
in th
e ...
1.7
by a
rchi
tect
...
2.6
by th
e ...
3.2
of th
e ...
Arrange all the choices in a ”cube”
(here: a square, generally a orthotope, also called a hyperrectangle)
Chapter 11: Tree-Based Models 106
![Page 108: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/108.jpg)
Create the First Hypothesis
2.1a house 1.0
a building 1.3
the building 2.2
a new house 2.6
1.5
in th
e ...
1.7
by a
rchi
tect
...
2.6
by th
e ...
3.2
of th
e ...
2.1
• Hypotheses created in cube: (0,0)
Chapter 11: Tree-Based Models 107
![Page 109: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/109.jpg)
Add (”Pop”) Hypothesis to Chart Cell
2.1a house 1.0
a building 1.3
the building 2.2
a new house 2.6
1.5
in th
e ...
1.7
by a
rchi
tect
...
2.6
by th
e ...
3.2
of th
e ...
• Hypotheses created in cube: ε
• Hypotheses in chart cell stack: (0,0)
Chapter 11: Tree-Based Models 108
![Page 110: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/110.jpg)
Create Neighboring Hypotheses
2.1a house 1.0
a building 1.3
the building 2.2
a new house 2.6
1.5
in th
e ...
1.7
by a
rchi
tect
...
2.6
by th
e ...
3.2
of th
e ...
2.5
2.7
• Hypotheses created in cube: (0,1), (1,0)
• Hypotheses in chart cell stack: (0,0)
Chapter 11: Tree-Based Models 109
![Page 111: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/111.jpg)
Pop Best Hypothesis to Chart Cell
2.1a house 1.0
a building 1.3
the building 2.2
a new house 2.6
1.5
in th
e ...
1.7
by a
rchi
tect
...
2.6
by th
e ...
3.2
of th
e ...
2.5
2.7
• Hypotheses created in cube: (0,1)
• Hypotheses in chart cell stack: (0,0), (1,0)
Chapter 11: Tree-Based Models 110
![Page 112: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/112.jpg)
Create Neighboring Hypotheses
2.1a house 1.0
a building 1.3
the building 2.2
a new house 2.6
1.5
in th
e ...
1.7
by a
rchi
tect
...
2.6
by th
e ...
3.2
of th
e ...
2.5
2.7 2.4
3.1
• Hypotheses created in cube: (0,1), (1,1), (2,0)
• Hypotheses in chart cell stack: (0,0), (1,0)
Chapter 11: Tree-Based Models 111
![Page 113: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/113.jpg)
More of the Same
2.1a house 1.0
a building 1.3
the building 2.2
a new house 2.6
1.5
in th
e ...
1.7
by a
rchi
tect
...
2.6
by th
e ...
3.2
of th
e ...
2.5
2.7 2.4
3.1
3.0
3.8
• Hypotheses created in cube: (0,1), (1,2), (2,1), (2,0)
• Hypotheses in chart cell stack: (0,0), (1,0), (1,1)
Chapter 11: Tree-Based Models 112
![Page 114: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/114.jpg)
Queue of Cubes
• Several groups of rules will apply to a given span
• Each of them will have a cube
• We can create a queue of cubes
⇒ Always pop off the most promising hypothesis, regardless of cube
• May have separate queues for different target constituent labels
Chapter 11: Tree-Based Models 113
![Page 115: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/115.jpg)
Bottom-Up Chart Decoding Algorithm
1: for all spans (bottom up) do2: extend dotted rules3: for all dotted rules do4: find group of applicable rules5: create a cube for it6: create first hypothesis in cube7: place cube in queue8: end for9: for specified number of pops do
10: pop off best hypothesis of any cube in queue11: add it to the chart cell12: create its neighbors13: end for14: extend dotted rules over constituent labels15: end for
Chapter 11: Tree-Based Models 114
![Page 116: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/116.jpg)
Two-Stage Decoding
• First stage: decoding without a language model (-LM decoding)
– may be done exhaustively– eliminate dead ends– optionably prune out low scoring hypotheses
• Second stage: add language model
– limited to packed chart obtained in first stage
• Note: essentially, we do two-stage decoding for each span at a time
Chapter 11: Tree-Based Models 115
![Page 117: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/117.jpg)
Coarse-to-Fine
• Decode with increasingly complex model
• Examples
– reduced language model [Zhang and Gildea, 2008]– reduced set of non-terminals [DeNero et al., 2009]– language model on clustered word classes [Petrov et al., 2008]
Chapter 11: Tree-Based Models 116
![Page 118: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/118.jpg)
Outside Cost Estimation
• Which spans should be more emphasized in search?
• Initial decoding stage can provide outside cost estimates
SiePPER
willVAFIN
eineART
TasseNN
KaffeeNN
trinkenVVINF
NP
• Use min/max language model costs to obtain admissible heuristic(or at least something that will guide search better)
Chapter 11: Tree-Based Models 117
![Page 119: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/119.jpg)
Open Questions
• Where does the best translation fall out the beam?
• How accurate are LM estimates?
• Are particular types of rules too quickly discarded?
• Are there systemic problems with cube pruning?
Chapter 11: Tree-Based Models 118
![Page 120: Chapter 11 Tree-based models - Machine translation · Chapter 11: Tree-Based Models 11. Learning Synchronous Grammars Extracting rules from a word-aligned parallel corpus First: Hierarchical](https://reader031.vdocuments.site/reader031/viewer/2022022513/5af095507f8b9abc788d5cd0/html5/thumbnails/120.jpg)
Summary
• Synchronous context free grammars
• Extracting rules from a syntactically parsed parallel corpus
• Bottom-up decoding
• Chart organization: dynamic programming, stacks, pruning
• Prefix tree for rules
• Dotted rules
• Cube pruning
Chapter 11: Tree-Based Models 119