tutorial to statistical machine...

66
Tutorial to Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment Symmetriza- tion Phrase-Based Models Limitations of PB Models Syntax Statistical Machine Translation Part I: Khalil Sima’an Data and Models Universiteit van Amsterdam Part II: Trevor Cohn Decoding and efficiency University of Sheffield

Upload: others

Post on 31-Jul-2020

19 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Statistical Machine Translation

Part I: Khalil Sima’anData and Models

Universiteit van Amsterdam

Part II: Trevor CohnDecoding and efficiencyUniversity of Sheffield

Page 2: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Statistical Machine Translation: PART I

Dr. Khalil Sima’anStatistical Language Processing and LearningInstitute for Logic, Language and Computation

Universiteit van Amsterdam

Some slides use figures from Philipp Koehn, Barry Haddow and Sophie Arnoult

Page 3: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Data and Models: Structure of lecture

General statistical frameworkWord-based models: word alignmentsPhrase-based models: phrase-alignmentsTree-based models: tree-alignments

Page 4: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Statistical Approach: Parallel Corpora

Task: Translate a source sentence f to a target sentence e.Data: Parallel corpus (source-target sentence pairs).

Daily Briefing │ Press Releases │ Radio, TV, Photo │ Documents, Maps │ Publications, Stamps, Databases │ UN Works Peace & Security│ Economic & Social Development │ Human Rights │ Humanitarian Affairs │ International Law

Welcome to the United Nations

UN MillenniumDevelopment Goals

UN News Centre

About the United Nations

Main Bodies

Conferences & Events

Member States

General Assembly President

International Conference on Financing forDevelopment, Doha, Qatar

29 November - 2 December 2008

Secretary-General

Situation in the Middle East

Situation in Iraq

Renewing the United Nations

UN Action against Terrorism

Issues on the UN Agenda

Civil Society & Business

UN Webcast

CyberSchoolBus

Home │ Recent Additions │ Employment │ Procurement │ Comments │ Q&A │ UN System Sites │ Index │ Search

中文 | English | Français | Русский | Español | عربي

Copyright, United Nations, 2000-2008 | Terms of Use | Privacy Notice | Help

فهرس الموقع مواقع منظومة األمم المتحدة أسئلة وأجوبة مالحظات شعبة المشتريات فرص عمل إضافات جديدة صفحة االستقبال

中文 English Français Русский Español عربي

أهال بكم في موقع األمم

المتحدة

بحث األمم المتحدة تعمل منشورات وطوابع وقواعد البيانات وثائق وخرائط وسائط اإلعالم المتعددة

القانون الدولي الشؤون اإلنسانية حقوق اإلنسان التنمية االقتصادية واالجتماعية السلم واألمن

نحن الشعوب

األهداف اإلنمائية لألمم

المتحدة

مركز أنباء األمم المتحدة

معلومات عن األمم المتحدة

األجهزة الرئيسية

مؤتمرات ومناسبات

الدول األعضاء

رئيس الجمعية العامة

األمين العام

الحالة في الشرق األوسط

الحالة في العراق

تجديد األمم المتحدة

األمم المتحدة في مواجهة اإلرهاب

قضايا على جدول أعمال األمم المتحدة

المجتمع المدني / المؤسسات

التجارية

البث الشبكي

الحافلة المدرسية اإللكترونية

مؤتمر المتابعة الدولي لتمويل التنمية المعنيباستعراض تنفيذ توافق آراء مونتيري

الدوحة-قطر كانون األول/ديسمبر2 تشرين الثاني/نوفمبر - 29

2008

بيان الخصوصيات شروط استخدام الموقع صفحة المساعدة

2008-2000حقوق الطبع األمم المتحدة

Source-Channel Approach: IBM Models (1990’s)

Page 5: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Parallel Corpus Example

Parallel corpus C = a collection of text-chunks and theirtranslations.Parallel corpora are the by-product of human translation.Every source chunk is paired with a target chunk.

Dutch EnglishDe prijs van het huis is gestegen. The price of the house has risen.Het huis kan worden verkocht. The house can be sold.Als het de marktprijs daalt zullen sommigegezinnen een zware tijd doormaken.

If the market price goes down, some familieswill go through difficult times.

.

.

....

.

.

....

.

.

....

.

.

....

.

.

....

.

.

....

Hansards Canadian Parliament Proc. (English-French).

European Parliament Proc. (23 languages).

United Nations documents.

Newspapers: Chinese-English; Arabic-English; Urdu-English.

TAUS corpora.

Page 6: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Generative Source-Channel Framework

Given source sentence f, select target sentence e

arg maxe∈E(f){ P(e | f) } = arg maxe∈E(f){L.M.︷ ︸︸ ︷

P(e)×T .M.︷ ︸︸ ︷

P(f | e) }

Set E(f) is the set of hypothesized translations of f.

P(f | e): accounts for divergence in . . .word ordermorphologysyntactic relationsidiomatic ways of expression...

How to estimate P(e | f)? Sparse-data problem!

Page 7: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Inducing The Structure of Translation Data

e = Mary did not slap the green witch .

? ? ? ?f = Maria no dio una bofetada a la bruja verde .

The latent structure of translation equivalence

Graphical representations ∆f and ∆e for f and e.Relation a between ∆f and ∆e

arg maxe∈E(f){ P(e | f) } =

arg maxe∈E(f){∑

〈∆f,a,∆e〉 P(e,∆f,∆e,a | f) }

The difficult question: Which ∆f/e and a fit data best?

Page 8: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Structure in current models

∆fa→ ∆e

In most current models structure of reordering:∆f/e are structures over word positions.a is an alignment between groups of word positions in∆f and ∆e.

Challenge: Number of permutations of n words is n!

Structure shows translation units composing togetherWhat are the atomic translation units?How these compose together efficiently?How to put probs. on these structures?

Structure helps combat sparsity and complexity

Page 9: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Structure in Existing Models: Sketch

Word-based1

aj

l

j

m

1

Alignment a"translation dictionary"

Language Model on e

1

l

e

e

e

aj

f1

fm

fj

Phrase-based

Tree-basedj

m

1

Language Model on e

1

l

e

e

e

aj

f1

fm

fj

1

aj

l

Problem: No sufficient stats to estimate P(e | f) from data

Page 10: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Word-Based Models: Word Alignments

Page 11: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Some History and References

Statistical models with word-alignments:Brown, Cocke, Della Pietra, Della Pietra, Jelinek,Lafferty, Mercer and Roossin. A statistical approach tomachine translation. Computational Linguistics, 1990.Brown, Della Pietra, Della Pietra and Mercer. Themathematics of statistical machine translation:parameter estimation., Computational Linguistics,1993.Och and Ney: A Systematic Comparison of VariousStatistical Alignment Models. ComputationalLinguistics, 2003.

Page 12: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Word-Based Models and Word-Alignment

a is a mapping between word positions.f

ea

∆f and ∆e are sequences of word positions.e = el

1 = e1 . . . el and f = f m1 = f1 . . . fm

A hidden word-alignment a:

P(f | e) =∑

a

P(a, f | e)

Assume that a target word-position ei translates intozero or more source word-positions

a : {posf} → ({pose} ∪ {0})

ai or a(i), i.e., word position in e with which fi is aligned.

Page 13: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Word Alignment Example

Page 14: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Word Alignment Example

(Le reste appartenait aux autochtones |

Page 15: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Word Alignment Example: Not covered in thissetting

Page 16: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Word Alignment Matrix Example

Page 17: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Translation model with word alignment

arg maxeP(e | f) = arg maxeP(e)×P(f | e)

P(f | e) =∑

a P(a, f | e) =∑

a P(a| e)× P(f | a,e)

Questions

How to parametrize the model?How are e, f and a composed from basic units?How to train the model?How to acquire word alignment?How to translate with this model?Decoding and computational issues (for second part)

Page 18: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Word-Alignment As Hidden Structure

1

aj

l

j

m

1

Alignment a"translation dictionary"

Language Model on e

1

l

e

e

e

aj

f1

fm

fj

We need to decomposeThe alignment a and the length m: P(a | e)

“Translation dictionary” P(f | e,a)

Page 19: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Word Alignment Models: General Scheme

Alignment of positions in f with positions in e:a = am

1 = a1 . . . am

Markov process over a

P(am1 , f

m1 | el

1) = P(m | e) ×m∏

j=1

P(aj | aj−11 , f j−1

1 ,m,e)× P(fj | aj1, f

j−11 ,m,e)

In words: to generate alignment a and foreign sentence f1 Choose a length m for f2 Generate alignment aj given the preceding alignments,

words in f, m, and e3 Generate word fj conditioned on structure so far and e.

IBM models are obtained by simplifications of this formula.

Page 20: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

IBM Model I

P(am1 , f

m1 | e1 . . . el) = P(m | e)×

m∏

j=1

P(aj | aj−11 , f j−1

1 ,m,e)× P(fj | aj1, f

j−11 ,m,e)

IBM Model I:Length: P(m | e) =≈ P(m | l) ≈ = ε A fixed probability ε.

Align with uniform probability j with any aj in el1 or

NULL: P(aj | aj−11 , f j−1

1 ,m,e) ≈ (l + 1)−1

Note that aj can be linked with l positions in e or with NULL.

Lexicon: lexicon parameters πt (f | e)

P(fj | aj1, f

j−11 ,m,e) ≈ P(fj | eaj ) = πt (fj | eaj )

Parameters: ε and {πt (f | e) | 〈f ,e〉 ∈ C}.

Page 21: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Sketch IBM Model I

f1

fm

f j

1

aj

l

j

m

1

e1

e l

e a j

Language Model on e

"translation dictionary"

choose length m given llabel every f position with an e position

Page 22: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

IBM Model I Parameters and Data Likelhood

Data Likelihood:

P(f |e) =∑

am1

P(am1 , f

m1 | e1 . . . el)

(l + 1)m ×l∑

a1=0

. . .

l∑

am=0

m∏

j=1

πt (fj | eaj )

Parameters: ε and {πt (f | e) | 〈f ,e〉 ∈ C}.Fix ε, i.e., in practice put a uniform probability over a range [1..m], for some natural number m.

Dilemma

To estimate these parameters we need word-alignmentTo get word-alignment we need these parameters.

Page 23: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

IBM Model II

Extends IBM Model I at alignment probs:

P(am1 , f

m1 | e1 . . . el) ≈ ε×

m∏

j=1

P(aj | aj−11 , f j−1

1 ,m,e)×πt (fj | eaj )

IBM Model II: changes only one element in IBM Model I:IBM Model I does not take into account the position ofwords in both strings

P(aj | aj−11 , f j−1

1 ,m,e) = P(aj | j , l ,m) := πA(aj | j , l ,m)

Where πA(.|.) are parameters to be learned from data.IBM Models III, IV and V concentrate on more complexalignments allowing, e.g., 1− to − n (fertility)

Page 24: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

IBM Model II Parameters

P(am1 , f

m1 | e1 . . . el) ≈ ε×

m∏

j=1

πA(aj | j , l ,m)× πt (fj | eaj )

Parameters: {πA(aj | j , l ,m)} and {πt (fj | eaj )}

Dilemma

To estimate these parameters we need word-alignmentTo get word-alignment we need these parameters.

Page 25: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Estimating Model Parameters

Maximum-Likelihood Estimation of model M on parallelcorpus C

arg maxm∈M

P(C | m) = arg maxm∈M

〈e,f〉in C

Pm(e | f)

Example IBM Model I:Model M is defined by model parameters.Data is incomplete: no closed form solution.Expectation-Maximization (EM) sketchInit: Set the parameters at some m0 and let i = 0Repeat until convergence (in perplexity)

EMi(C) = C completed using estimate miEMi (C) contains mi -expectations over 〈e, f, a〉: P(a | f, e)

mi+1 = Relative Frequency Estimates from EMi(C).

Page 26: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

EM for Lexicon and Word Alignment Probs

... la maison ... la maison blue ... la fleur ...

... the house ... the blue house ... the flower ...

Page 27: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

EM for Lexicon and Word Alignment Probs

... la maison ... la maison blue ... la fleur ...

... the house ... the blue house ... the flower ...

Page 28: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

EM for Lexicon and Word Alignment Probs

... la maison ... la maison bleu ... la fleur ...

... the house ... the blue house ... the flower ...

Page 29: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

EM for Lexicon and Word Alignment Probs

... la maison ... la maison bleu ... la fleur ...

... the house ... the blue house ... the flower ...

Page 30: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Translation Using EM Estimates

Lexicon probability estimates: {π̂t (fj | eaj )}Alignment probabilities: {π̂A(aj | j ,m, l)}Translation Model + Language Model + Decoder

arg maxeP(e | f) = arg maxeP(e)×∑

a

P(a, f | e)

e* = argmax p(e|f) e

Source Language Text

Target Language Text

Preprocessing

Preprocessing

Global search

p(e)

p(f|e)

Language model

Translation model

Page 31: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Viterbi Word-Alignment using EM estimates

After EM has stabilized on estimates

{π̂t (fj | eaj )} and {π̂A(aj | j ,m, l)}

For every 〈f,e〉 in C apply the following

arg maxam1

P(am1 , f

m1 | e1 . . . el) ≈

arg maxam1ε ×

m∏

j=1

π̂A(aj | j ,m, l)× π̂t (fj | eaj )

Page 32: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

HMM Alignment Model: General Form

P(am1 , f

m1 | e1 . . . el) ≈ ε×

m∏

j=1

P(aj | aj−11 , f j−1

1 ,m,e)×πt (fj | eaj )

Words do not move independently of each other:condition word movement on previous word movement

P(aj | aj−11 , f j−1

1 ,m,e) ≈ P(aj | aj−1,m)

Page 33: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

IBM Model III (and IV): Example

A hidden word-alignment a: P(f | e) =∑

a P(a, f | e)

Estimate alignment + lexicon + reordering + fertilityparameters.

Page 34: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Word-based Models (Och & Ney 2003)

Page 35: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Word-Alignment As Hidden Structure:Sufficient?

1

aj

l

j

m

1

Alignment a"translation dictionary"

Language Model on e

1

l

e

e

e

aj

f1

fm

fj

We assumed alignment between words and dictionary:Alignment a and the length m: P(a | e)

Dictionary P(f | e,a)

Page 36: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Limitations of Word-based Models

Limitations of word-based translation:Many-to-one and many-to-many is common:“Makes more difficult”/bemoeilijkt “Dat richtte (hen)ten gronde”/”That destroyed (them)”Reordering takes place (often) by whole blocks.Reordering individual words increases ambiguity.“The (big heavy) cow/la vaca (pesada grande)”Translation works by “fixed expressions” (idiomatic).Concatenating word-translations increases ambiguity.

Estimates of P(f | e) by word-based models are inaccurate.

Instead of words as basic events: multi-word events incorpus.

Page 37: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Obtaining Symmetrized Word Alignments

Page 38: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Asymetric Alignments: Limitations

Word-based models presented so far are based onasymmetric word alignment.Each position i in f is aligned with at most one positionin e: ai

What about such word alignments?

Or when a word in f translated into two or more in e?

Page 39: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Symmetrization Heursitics

Obtain Af→e and Ae→fFrom Intersection Af→e ∩ Ae→f to Union Af→e ∪ Ae→f

Page 40: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Symmetrization Heursitics Algoritm

Obtain Af→e and Ae→fFrom Intersection Af→e ∩ Ae→f to Union Af→e ∪ Ae→f

Page 41: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Phrase-based Models: Alignment between Phrases

Page 42: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Statistical “Memory-based” Translation

Store arbitrary length source-target translation unitsfrom training parallel corpus.

Translate new input by “covering” it with translationunits replayed from memory.

Idiomatic = Tiling: Phrase-Based SMT

Assume word-alignment a is given in parallel corpus.Phrase-pair = contiguous source-target 〈n,m〉-gramsthat are translational equivalents under a.Estimate phrase-pair probabilities Θ(f i | ei)

Translate f by “tiling it with phrases with orderpermutation”

Page 43: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

PBSMT some references

Alignment-template Approach to Stat. MachineTranslation (RWTH Aachen 1999)Phrase-based statistical machine translation (Zens,Och and Ney 2002)Phrase based SMT (Koehn, Och and Marcu 2003)Joint Phrase-based SMT (Wang and Marcu 2005)Statistical Machine Translation. (Ph. Koehn, CambridgeUniversity Press 2010)

Relation to EBMT 1984; Data-Oriented Translation (2000).

Page 44: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Phrase-Based Models: Conceptual

Segment foreign sentence finto I phrases f

I1

The president︸ ︷︷ ︸ meets Saudi economic officials

arg maxeP(e | f) = arg maxeP(e)× P(f | e)

P(f | e) =∑〈f I1,e

I1〉

P(fI1,e

I1 | e)

∏Ii=1 P(f i | ei )×d(starti−endi−1−1)

arg maxeP(f | e) ≈ arg max〈f I1,e

I1〉

ph.table︷ ︸︸ ︷∏Ii=1 Θ(f i | ei )×

Dist.reord.︷ ︸︸ ︷d(starti−endi−1−1)

starti /endi are positions of first/last words of f i (translateing to ei ).d(x) = αx exponentially decaying in words skipped (α ∈ (0,1]).

Page 45: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Phrase-Based Models: Linear-interpolation

Segment foreign sentence finto I phrases f

I1

The president︸ ︷︷ ︸ meets Saudi economic officials

Log-linear interpolation of factors:

score(e|f) =∑

f∈F

λf × log Hf(e, f)

Where set F consists of:Bag of phrases translation =

∏Ii=1 Θ(f i | ei)

d() Phrases reordered with reordering model d(.)

lm Language model (5-grams or even 7-grams).other Smoothing + length penalty terms.

Page 46: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Topics to discuss

Phrase table extractionEstimating {Θ(f i | ei)} and {Θ(ei | f i)}Lexicalized and hierarchical phrase reordering modelsOther: phrase, length penalty . . .Log-linear interpolation and minimum error-rate trainingDecoding and optimization

Page 47: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Extracting phrase pairs

A phrase pair 〈f ,e〉 is consistent with alignment a iffNon-empty: at least one alignment pair from a is in〈f ,e〉No foreign positions inside 〈f ,e〉 aligned to positionsoutside itNo english positions inside 〈f ,e〉 aligned to positionsoutside it

Page 48: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Extracting phrase pairs

Page 49: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Extracting phrase pairs

Page 50: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Extracting phrase pairs

Page 51: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Extracting phrases from permutations

3

11

1

5

5

3 4 2

2 4 3

11

1

5

5

3 4 2

2 4

3

11

1

5

5

3 4 2

2 4 1

3 4 2

2 43

5

5

11

1

3 4 2

2 43

5

5

11

1

3 4 2

2 43

5

5

11

Page 52: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Phrase pair weights

Extract phrase pairs from corpus into multiset

Tab = {〈f ,e〉, freq(f ,e)}

Weights for 〈f ,e〉

Θ(f | e) = freq(f ,e)∑〈f ′,e〉∈Tab

freq(f′,e)

Θ(e | f ) = freq(e,f )∑〈f ,e′〉∈Tab freq(f ,e′)

Smoothing with lexical word alignment estimates fromIBM models

Page 53: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Distance-Based Reordering Sketch

Page 54: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Lexicalized Reordering Sketch (Tillmann 2004)

Page 55: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Limited generalization over parallel data (1)

Non-productive Phrase Table: Phrase Variants?

Morphological e.g., changing inflection, agreement

Al$arekat Alhindiyya $areka hindiyyathe-companies the-Indian company Indianthe Indian companies (an) Indian company

Syntactic e.g. adding adjective/proposition/adverbials

the fish in the deep sea swims the fish swims

Reordering minor reordering of same words not allowedIn Arabic V-S-O and S-V-O are allowed.

Semantic e.g. synonyms, paraphrases

Non-productive Phrase Table = Data Sparseness

Page 56: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Limited generalization over parallel data (2)

Reordering

Local, monotone, almost non-lexicalized reordering.What about long range reordering?

Five phrases need to be reversed: see Chiang 2007 (J. CL).

Reordering target phrases with a coarse “source roadmap”?

Page 57: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Limitations: Data-Sparseness

Non-productive phrase table + Local, Uncharted reordering⇓

Data-sparseness: Shorter phrases will apply down to wordlevel.⇓

Shorter phrases combined assuming independence.⇓

Target phrase selection hard due to large hypotheses latticeTarget Language Model = Only “GLUE” over target phrases.

The Shorter the Phrases, the Greater the Risk

Page 58: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Idiomatic Approach: GOOD, BAD and UGLY

Phrases as atomic units

Good: Less ambiguity in lexical choice and reordering.Match-Retrieve exactly is largely safe.

Bad: Weak generalization over data.No phrase variants, weak reordering

Ugly: Fall-back on shorter phrases downtoword-to-wordLM as “glue” is insufficient.

Idiomatic approach does not alleviate data-sparseness

How Should We Translate Novel Phrases?

Page 59: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Towards the land of bi-treesAlignments between Tree Pairs, ITG, Hierarchical

Models and Syntax

Page 60: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Hidden Structure of Translation: Tree Pairs

j

m

1

Language Model on e

1

l

e

e

e

aj

f1

fm

fj

1

aj

l

Page 61: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Reordering n words

Permutations of n words: n!3

11

1

5

5

3 4 2

2 4

Surely not all permutations are needed! (Wu 1995)Use trees and allow permutations on the nodes?

There is an exponential number of trees in n

ITG hypothesis (Wu 1995)

Assume binary trees with two operations3 4 21

[ ]

[ ]

< >

5

[ ]

Phrase-based forms of ITG (Chiang 2005; 2007): Hiero

Page 62: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Syntax-Driven Phrase Translation

Syntax-driven Re-Ordering

Hierarchical (ITG)

XP

11

1

5

5

3 4 2

2 43

XP

XP

XP

Linguistic Syntax

Is translation syntactically cohesive?

Reordering == Moving children in parse tree?Binary: monotone or inverted order at every node.Lexical elements can be phrase pairs.Covers word-alignments in parallel corpora?

Page 63: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Word order difference and syntax: Impression

Page 64: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Phrase-based Hierarchical Model: Hiero(Chiang 2005; 2007)

Extracting phrase-pairs with gaps (hierarchical trees):

X → X1 dar una bufetada a X2 / X1 slap X2

X → Maria no X la bruja verde / Mary did not X the green witch

Page 65: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

ITG with syntactic labels

Page 66: Tutorial to Statistical Machine Translationivan-titov.org/teaching/elements.ws13-14/elements-7.pdf · Statistical Machine Translation Dr Khalil Sima’an Word-Based Models Alignment

Tutorial toStatisticalMachine

Translation

Dr KhalilSima’an

Word-BasedModels

AlignmentSymmetriza-tion

Phrase-BasedModels

Limitations ofPB Models

Syntax

Part II: Trevor CohnDecoding algorithms and efficiency