tutorial to statistical machine...
TRANSCRIPT
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Statistical Machine Translation
Part I: Khalil Sima’anData and Models
Universiteit van Amsterdam
Part II: Trevor CohnDecoding and efficiencyUniversity of Sheffield
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Statistical Machine Translation: PART I
Dr. Khalil Sima’anStatistical Language Processing and LearningInstitute for Logic, Language and Computation
Universiteit van Amsterdam
Some slides use figures from Philipp Koehn, Barry Haddow and Sophie Arnoult
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Data and Models: Structure of lecture
General statistical frameworkWord-based models: word alignmentsPhrase-based models: phrase-alignmentsTree-based models: tree-alignments
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Statistical Approach: Parallel Corpora
Task: Translate a source sentence f to a target sentence e.Data: Parallel corpus (source-target sentence pairs).
Daily Briefing │ Press Releases │ Radio, TV, Photo │ Documents, Maps │ Publications, Stamps, Databases │ UN Works Peace & Security│ Economic & Social Development │ Human Rights │ Humanitarian Affairs │ International Law
Welcome to the United Nations
UN MillenniumDevelopment Goals
UN News Centre
About the United Nations
Main Bodies
Conferences & Events
Member States
General Assembly President
International Conference on Financing forDevelopment, Doha, Qatar
29 November - 2 December 2008
Secretary-General
Situation in the Middle East
Situation in Iraq
Renewing the United Nations
UN Action against Terrorism
Issues on the UN Agenda
Civil Society & Business
UN Webcast
CyberSchoolBus
Home │ Recent Additions │ Employment │ Procurement │ Comments │ Q&A │ UN System Sites │ Index │ Search
中文 | English | Français | Русский | Español | عربي
Copyright, United Nations, 2000-2008 | Terms of Use | Privacy Notice | Help
فهرس الموقع مواقع منظومة األمم المتحدة أسئلة وأجوبة مالحظات شعبة المشتريات فرص عمل إضافات جديدة صفحة االستقبال
中文 English Français Русский Español عربي
أهال بكم في موقع األمم
المتحدة
بحث األمم المتحدة تعمل منشورات وطوابع وقواعد البيانات وثائق وخرائط وسائط اإلعالم المتعددة
القانون الدولي الشؤون اإلنسانية حقوق اإلنسان التنمية االقتصادية واالجتماعية السلم واألمن
نحن الشعوب
األهداف اإلنمائية لألمم
المتحدة
مركز أنباء األمم المتحدة
معلومات عن األمم المتحدة
األجهزة الرئيسية
مؤتمرات ومناسبات
الدول األعضاء
رئيس الجمعية العامة
األمين العام
الحالة في الشرق األوسط
الحالة في العراق
تجديد األمم المتحدة
األمم المتحدة في مواجهة اإلرهاب
قضايا على جدول أعمال األمم المتحدة
المجتمع المدني / المؤسسات
التجارية
البث الشبكي
الحافلة المدرسية اإللكترونية
مؤتمر المتابعة الدولي لتمويل التنمية المعنيباستعراض تنفيذ توافق آراء مونتيري
الدوحة-قطر كانون األول/ديسمبر2 تشرين الثاني/نوفمبر - 29
2008
بيان الخصوصيات شروط استخدام الموقع صفحة المساعدة
2008-2000حقوق الطبع األمم المتحدة
Source-Channel Approach: IBM Models (1990’s)
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Parallel Corpus Example
Parallel corpus C = a collection of text-chunks and theirtranslations.Parallel corpora are the by-product of human translation.Every source chunk is paired with a target chunk.
Dutch EnglishDe prijs van het huis is gestegen. The price of the house has risen.Het huis kan worden verkocht. The house can be sold.Als het de marktprijs daalt zullen sommigegezinnen een zware tijd doormaken.
If the market price goes down, some familieswill go through difficult times.
.
.
....
.
.
....
.
.
....
.
.
....
.
.
....
.
.
....
Hansards Canadian Parliament Proc. (English-French).
European Parliament Proc. (23 languages).
United Nations documents.
Newspapers: Chinese-English; Arabic-English; Urdu-English.
TAUS corpora.
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Generative Source-Channel Framework
Given source sentence f, select target sentence e
arg maxe∈E(f){ P(e | f) } = arg maxe∈E(f){L.M.︷ ︸︸ ︷
P(e)×T .M.︷ ︸︸ ︷
P(f | e) }
Set E(f) is the set of hypothesized translations of f.
P(f | e): accounts for divergence in . . .word ordermorphologysyntactic relationsidiomatic ways of expression...
How to estimate P(e | f)? Sparse-data problem!
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Inducing The Structure of Translation Data
e = Mary did not slap the green witch .
? ? ? ?f = Maria no dio una bofetada a la bruja verde .
The latent structure of translation equivalence
Graphical representations ∆f and ∆e for f and e.Relation a between ∆f and ∆e
arg maxe∈E(f){ P(e | f) } =
arg maxe∈E(f){∑
〈∆f,a,∆e〉 P(e,∆f,∆e,a | f) }
The difficult question: Which ∆f/e and a fit data best?
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Structure in current models
∆fa→ ∆e
In most current models structure of reordering:∆f/e are structures over word positions.a is an alignment between groups of word positions in∆f and ∆e.
Challenge: Number of permutations of n words is n!
Structure shows translation units composing togetherWhat are the atomic translation units?How these compose together efficiently?How to put probs. on these structures?
Structure helps combat sparsity and complexity
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Structure in Existing Models: Sketch
Word-based1
aj
l
j
m
1
Alignment a"translation dictionary"
Language Model on e
1
l
e
e
e
aj
f1
fm
fj
Phrase-based
Tree-basedj
m
1
Language Model on e
1
l
e
e
e
aj
f1
fm
fj
1
aj
l
Problem: No sufficient stats to estimate P(e | f) from data
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Word-Based Models: Word Alignments
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Some History and References
Statistical models with word-alignments:Brown, Cocke, Della Pietra, Della Pietra, Jelinek,Lafferty, Mercer and Roossin. A statistical approach tomachine translation. Computational Linguistics, 1990.Brown, Della Pietra, Della Pietra and Mercer. Themathematics of statistical machine translation:parameter estimation., Computational Linguistics,1993.Och and Ney: A Systematic Comparison of VariousStatistical Alignment Models. ComputationalLinguistics, 2003.
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Word-Based Models and Word-Alignment
a is a mapping between word positions.f
ea
∆f and ∆e are sequences of word positions.e = el
1 = e1 . . . el and f = f m1 = f1 . . . fm
A hidden word-alignment a:
P(f | e) =∑
a
P(a, f | e)
Assume that a target word-position ei translates intozero or more source word-positions
a : {posf} → ({pose} ∪ {0})
ai or a(i), i.e., word position in e with which fi is aligned.
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Word Alignment Example
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Word Alignment Example
(Le reste appartenait aux autochtones |
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Word Alignment Example: Not covered in thissetting
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Word Alignment Matrix Example
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Translation model with word alignment
arg maxeP(e | f) = arg maxeP(e)×P(f | e)
P(f | e) =∑
a P(a, f | e) =∑
a P(a| e)× P(f | a,e)
Questions
How to parametrize the model?How are e, f and a composed from basic units?How to train the model?How to acquire word alignment?How to translate with this model?Decoding and computational issues (for second part)
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Word-Alignment As Hidden Structure
1
aj
l
j
m
1
Alignment a"translation dictionary"
Language Model on e
1
l
e
e
e
aj
f1
fm
fj
We need to decomposeThe alignment a and the length m: P(a | e)
“Translation dictionary” P(f | e,a)
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Word Alignment Models: General Scheme
Alignment of positions in f with positions in e:a = am
1 = a1 . . . am
Markov process over a
P(am1 , f
m1 | el
1) = P(m | e) ×m∏
j=1
P(aj | aj−11 , f j−1
1 ,m,e)× P(fj | aj1, f
j−11 ,m,e)
In words: to generate alignment a and foreign sentence f1 Choose a length m for f2 Generate alignment aj given the preceding alignments,
words in f, m, and e3 Generate word fj conditioned on structure so far and e.
IBM models are obtained by simplifications of this formula.
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
IBM Model I
P(am1 , f
m1 | e1 . . . el) = P(m | e)×
m∏
j=1
P(aj | aj−11 , f j−1
1 ,m,e)× P(fj | aj1, f
j−11 ,m,e)
IBM Model I:Length: P(m | e) =≈ P(m | l) ≈ = ε A fixed probability ε.
Align with uniform probability j with any aj in el1 or
NULL: P(aj | aj−11 , f j−1
1 ,m,e) ≈ (l + 1)−1
Note that aj can be linked with l positions in e or with NULL.
Lexicon: lexicon parameters πt (f | e)
P(fj | aj1, f
j−11 ,m,e) ≈ P(fj | eaj ) = πt (fj | eaj )
Parameters: ε and {πt (f | e) | 〈f ,e〉 ∈ C}.
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Sketch IBM Model I
f1
fm
f j
1
aj
l
j
m
1
e1
e l
e a j
Language Model on e
"translation dictionary"
choose length m given llabel every f position with an e position
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
IBM Model I Parameters and Data Likelhood
Data Likelihood:
P(f |e) =∑
am1
P(am1 , f
m1 | e1 . . . el)
=ε
(l + 1)m ×l∑
a1=0
. . .
l∑
am=0
m∏
j=1
πt (fj | eaj )
Parameters: ε and {πt (f | e) | 〈f ,e〉 ∈ C}.Fix ε, i.e., in practice put a uniform probability over a range [1..m], for some natural number m.
Dilemma
To estimate these parameters we need word-alignmentTo get word-alignment we need these parameters.
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
IBM Model II
Extends IBM Model I at alignment probs:
P(am1 , f
m1 | e1 . . . el) ≈ ε×
m∏
j=1
P(aj | aj−11 , f j−1
1 ,m,e)×πt (fj | eaj )
IBM Model II: changes only one element in IBM Model I:IBM Model I does not take into account the position ofwords in both strings
P(aj | aj−11 , f j−1
1 ,m,e) = P(aj | j , l ,m) := πA(aj | j , l ,m)
Where πA(.|.) are parameters to be learned from data.IBM Models III, IV and V concentrate on more complexalignments allowing, e.g., 1− to − n (fertility)
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
IBM Model II Parameters
P(am1 , f
m1 | e1 . . . el) ≈ ε×
m∏
j=1
πA(aj | j , l ,m)× πt (fj | eaj )
Parameters: {πA(aj | j , l ,m)} and {πt (fj | eaj )}
Dilemma
To estimate these parameters we need word-alignmentTo get word-alignment we need these parameters.
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Estimating Model Parameters
Maximum-Likelihood Estimation of model M on parallelcorpus C
arg maxm∈M
P(C | m) = arg maxm∈M
∏
〈e,f〉in C
Pm(e | f)
Example IBM Model I:Model M is defined by model parameters.Data is incomplete: no closed form solution.Expectation-Maximization (EM) sketchInit: Set the parameters at some m0 and let i = 0Repeat until convergence (in perplexity)
EMi(C) = C completed using estimate miEMi (C) contains mi -expectations over 〈e, f, a〉: P(a | f, e)
mi+1 = Relative Frequency Estimates from EMi(C).
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
EM for Lexicon and Word Alignment Probs
... la maison ... la maison blue ... la fleur ...
... the house ... the blue house ... the flower ...
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
EM for Lexicon and Word Alignment Probs
... la maison ... la maison blue ... la fleur ...
... the house ... the blue house ... the flower ...
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
EM for Lexicon and Word Alignment Probs
... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
EM for Lexicon and Word Alignment Probs
... la maison ... la maison bleu ... la fleur ...
... the house ... the blue house ... the flower ...
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Translation Using EM Estimates
Lexicon probability estimates: {π̂t (fj | eaj )}Alignment probabilities: {π̂A(aj | j ,m, l)}Translation Model + Language Model + Decoder
arg maxeP(e | f) = arg maxeP(e)×∑
a
P(a, f | e)
e* = argmax p(e|f) e
Source Language Text
Target Language Text
Preprocessing
Preprocessing
Global search
p(e)
p(f|e)
Language model
Translation model
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Viterbi Word-Alignment using EM estimates
After EM has stabilized on estimates
{π̂t (fj | eaj )} and {π̂A(aj | j ,m, l)}
For every 〈f,e〉 in C apply the following
arg maxam1
P(am1 , f
m1 | e1 . . . el) ≈
arg maxam1ε ×
m∏
j=1
π̂A(aj | j ,m, l)× π̂t (fj | eaj )
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
HMM Alignment Model: General Form
P(am1 , f
m1 | e1 . . . el) ≈ ε×
m∏
j=1
P(aj | aj−11 , f j−1
1 ,m,e)×πt (fj | eaj )
Words do not move independently of each other:condition word movement on previous word movement
P(aj | aj−11 , f j−1
1 ,m,e) ≈ P(aj | aj−1,m)
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
IBM Model III (and IV): Example
A hidden word-alignment a: P(f | e) =∑
a P(a, f | e)
Estimate alignment + lexicon + reordering + fertilityparameters.
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Word-based Models (Och & Ney 2003)
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Word-Alignment As Hidden Structure:Sufficient?
1
aj
l
j
m
1
Alignment a"translation dictionary"
Language Model on e
1
l
e
e
e
aj
f1
fm
fj
We assumed alignment between words and dictionary:Alignment a and the length m: P(a | e)
Dictionary P(f | e,a)
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Limitations of Word-based Models
Limitations of word-based translation:Many-to-one and many-to-many is common:“Makes more difficult”/bemoeilijkt “Dat richtte (hen)ten gronde”/”That destroyed (them)”Reordering takes place (often) by whole blocks.Reordering individual words increases ambiguity.“The (big heavy) cow/la vaca (pesada grande)”Translation works by “fixed expressions” (idiomatic).Concatenating word-translations increases ambiguity.
Estimates of P(f | e) by word-based models are inaccurate.
Instead of words as basic events: multi-word events incorpus.
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Obtaining Symmetrized Word Alignments
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Asymetric Alignments: Limitations
Word-based models presented so far are based onasymmetric word alignment.Each position i in f is aligned with at most one positionin e: ai
What about such word alignments?
Or when a word in f translated into two or more in e?
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Symmetrization Heursitics
Obtain Af→e and Ae→fFrom Intersection Af→e ∩ Ae→f to Union Af→e ∪ Ae→f
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Symmetrization Heursitics Algoritm
Obtain Af→e and Ae→fFrom Intersection Af→e ∩ Ae→f to Union Af→e ∪ Ae→f
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Phrase-based Models: Alignment between Phrases
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Statistical “Memory-based” Translation
Store arbitrary length source-target translation unitsfrom training parallel corpus.
Translate new input by “covering” it with translationunits replayed from memory.
Idiomatic = Tiling: Phrase-Based SMT
Assume word-alignment a is given in parallel corpus.Phrase-pair = contiguous source-target 〈n,m〉-gramsthat are translational equivalents under a.Estimate phrase-pair probabilities Θ(f i | ei)
Translate f by “tiling it with phrases with orderpermutation”
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
PBSMT some references
Alignment-template Approach to Stat. MachineTranslation (RWTH Aachen 1999)Phrase-based statistical machine translation (Zens,Och and Ney 2002)Phrase based SMT (Koehn, Och and Marcu 2003)Joint Phrase-based SMT (Wang and Marcu 2005)Statistical Machine Translation. (Ph. Koehn, CambridgeUniversity Press 2010)
Relation to EBMT 1984; Data-Oriented Translation (2000).
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Phrase-Based Models: Conceptual
Segment foreign sentence finto I phrases f
I1
The president︸ ︷︷ ︸ meets Saudi economic officials
arg maxeP(e | f) = arg maxeP(e)× P(f | e)
P(f | e) =∑〈f I1,e
I1〉
P(fI1,e
I1 | e)
∏Ii=1 P(f i | ei )×d(starti−endi−1−1)
arg maxeP(f | e) ≈ arg max〈f I1,e
I1〉
ph.table︷ ︸︸ ︷∏Ii=1 Θ(f i | ei )×
Dist.reord.︷ ︸︸ ︷d(starti−endi−1−1)
starti /endi are positions of first/last words of f i (translateing to ei ).d(x) = αx exponentially decaying in words skipped (α ∈ (0,1]).
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Phrase-Based Models: Linear-interpolation
Segment foreign sentence finto I phrases f
I1
The president︸ ︷︷ ︸ meets Saudi economic officials
Log-linear interpolation of factors:
score(e|f) =∑
f∈F
λf × log Hf(e, f)
Where set F consists of:Bag of phrases translation =
∏Ii=1 Θ(f i | ei)
d() Phrases reordered with reordering model d(.)
lm Language model (5-grams or even 7-grams).other Smoothing + length penalty terms.
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Topics to discuss
Phrase table extractionEstimating {Θ(f i | ei)} and {Θ(ei | f i)}Lexicalized and hierarchical phrase reordering modelsOther: phrase, length penalty . . .Log-linear interpolation and minimum error-rate trainingDecoding and optimization
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Extracting phrase pairs
A phrase pair 〈f ,e〉 is consistent with alignment a iffNon-empty: at least one alignment pair from a is in〈f ,e〉No foreign positions inside 〈f ,e〉 aligned to positionsoutside itNo english positions inside 〈f ,e〉 aligned to positionsoutside it
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Extracting phrase pairs
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Extracting phrase pairs
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Extracting phrase pairs
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Extracting phrases from permutations
3
11
1
5
5
3 4 2
2 4 3
11
1
5
5
3 4 2
2 4
3
11
1
5
5
3 4 2
2 4 1
3 4 2
2 43
5
5
11
1
3 4 2
2 43
5
5
11
1
3 4 2
2 43
5
5
11
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Phrase pair weights
Extract phrase pairs from corpus into multiset
Tab = {〈f ,e〉, freq(f ,e)}
Weights for 〈f ,e〉
Θ(f | e) = freq(f ,e)∑〈f ′,e〉∈Tab
freq(f′,e)
Θ(e | f ) = freq(e,f )∑〈f ,e′〉∈Tab freq(f ,e′)
Smoothing with lexical word alignment estimates fromIBM models
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Distance-Based Reordering Sketch
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Lexicalized Reordering Sketch (Tillmann 2004)
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Limited generalization over parallel data (1)
Non-productive Phrase Table: Phrase Variants?
Morphological e.g., changing inflection, agreement
Al$arekat Alhindiyya $areka hindiyyathe-companies the-Indian company Indianthe Indian companies (an) Indian company
Syntactic e.g. adding adjective/proposition/adverbials
the fish in the deep sea swims the fish swims
Reordering minor reordering of same words not allowedIn Arabic V-S-O and S-V-O are allowed.
Semantic e.g. synonyms, paraphrases
Non-productive Phrase Table = Data Sparseness
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Limited generalization over parallel data (2)
Reordering
Local, monotone, almost non-lexicalized reordering.What about long range reordering?
Five phrases need to be reversed: see Chiang 2007 (J. CL).
Reordering target phrases with a coarse “source roadmap”?
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Limitations: Data-Sparseness
Non-productive phrase table + Local, Uncharted reordering⇓
Data-sparseness: Shorter phrases will apply down to wordlevel.⇓
Shorter phrases combined assuming independence.⇓
Target phrase selection hard due to large hypotheses latticeTarget Language Model = Only “GLUE” over target phrases.
The Shorter the Phrases, the Greater the Risk
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Idiomatic Approach: GOOD, BAD and UGLY
Phrases as atomic units
Good: Less ambiguity in lexical choice and reordering.Match-Retrieve exactly is largely safe.
Bad: Weak generalization over data.No phrase variants, weak reordering
Ugly: Fall-back on shorter phrases downtoword-to-wordLM as “glue” is insufficient.
Idiomatic approach does not alleviate data-sparseness
How Should We Translate Novel Phrases?
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Towards the land of bi-treesAlignments between Tree Pairs, ITG, Hierarchical
Models and Syntax
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Hidden Structure of Translation: Tree Pairs
j
m
1
Language Model on e
1
l
e
e
e
aj
f1
fm
fj
1
aj
l
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Reordering n words
Permutations of n words: n!3
11
1
5
5
3 4 2
2 4
Surely not all permutations are needed! (Wu 1995)Use trees and allow permutations on the nodes?
There is an exponential number of trees in n
ITG hypothesis (Wu 1995)
Assume binary trees with two operations3 4 21
[ ]
[ ]
< >
5
[ ]
Phrase-based forms of ITG (Chiang 2005; 2007): Hiero
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Syntax-Driven Phrase Translation
Syntax-driven Re-Ordering
Hierarchical (ITG)
XP
11
1
5
5
3 4 2
2 43
XP
XP
XP
Linguistic Syntax
Is translation syntactically cohesive?
Reordering == Moving children in parse tree?Binary: monotone or inverted order at every node.Lexical elements can be phrase pairs.Covers word-alignments in parallel corpora?
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Word order difference and syntax: Impression
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Phrase-based Hierarchical Model: Hiero(Chiang 2005; 2007)
Extracting phrase-pairs with gaps (hierarchical trees):
X → X1 dar una bufetada a X2 / X1 slap X2
X → Maria no X la bruja verde / Mary did not X the green witch
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
ITG with syntactic labels
Tutorial toStatisticalMachine
Translation
Dr KhalilSima’an
Word-BasedModels
AlignmentSymmetriza-tion
Phrase-BasedModels
Limitations ofPB Models
Syntax
Part II: Trevor CohnDecoding algorithms and efficiency