statistical nlp spring 2010
DESCRIPTION
Statistical NLP Spring 2010. Lecture 17: Word / Phrase MT. Dan Klein – UC Berkeley. See you tomorrow. Corpus-Based MT. Hasta pronto. Yo lo haré pronto. I will do it soon. I will do it around. Novel Sentence. See you around. Model of translation. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/1.jpg)
Statistical NLPSpring 2010
Lecture 17: Word / Phrase MTDan Klein – UC Berkeley
![Page 2: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/2.jpg)
Corpus-Based MTModeling correspondences between languages
Sentence-aligned parallel corpus:
Yo lo haré mañanaI will do it tomorrow
Hasta prontoSee you soon
Hasta prontoSee you around
Yo lo haré prontoNovel Sentence
I will do it soon
I will do it around
See you tomorrow
Machine translation system:
Model of translation
![Page 3: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/3.jpg)
Unsupervised Word Alignment Input: a bitext: pairs of translated sentences
Output: alignments: pairs oftranslated words When words have unique
sources, can represent asa (forward) alignmentfunction a from French toEnglish positions
nous acceptons votre opinion .
we accept your view .
![Page 4: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/4.jpg)
Alignment Error Rate Alignment Error Rate
Sure align.Possible
align.Predicted
align.
===
![Page 5: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/5.jpg)
A:
IBM Models 1/2
Thank you , I shall do so gladly .
1 3 7 6 9
1 2 3 4 5 76 8 9
Model ParametersTransitions: P( A2 = 3)Emissions: P( F1 = Gracias | EA1 = Thank )
Gracias , lo haré de muy buen grado .
8 8 88
E:
F:
![Page 6: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/6.jpg)
Problems with Model 1 There’s a reason they
designed models 2-5! Problems: alignments jump
around, align everything to rare words
Experimental setup: Training data: 1.1M
sentences of French-English text, Canadian Hansards
Evaluation metric: alignment error Rate (AER)
Evaluation data: 447 hand-aligned sentences
![Page 7: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/7.jpg)
Intersected Model 1 Post-intersection: standard
practice to train models in each direction then intersect their predictions [Och and Ney, 03]
Second model is basically a filter on the first
Precision jumps, recall drops End up not guessing hard
alignments
Model P/R AERModel 1 EF 82/58 30.6Model 1 FE 85/58 28.7Model 1 AND 96/46 34.8
![Page 8: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/8.jpg)
Joint Training? Overall:
Similar high precision to post-intersection But recall is much higher More confident about positing non-null alignments
Model P/R AERModel 1 EF 82/58 30.6Model 1 FE 85/58 28.7Model 1 AND 96/46 34.8Model 1 INT 93/69 19.5
![Page 9: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/9.jpg)
Monotonic Translation
Le Japon secoué par deux nouveaux séismes
Japan shaken by two new quakes
![Page 10: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/10.jpg)
Local Order Change
Le Japon est au confluent de quatre plaques tectoniques
Japan is at the junction of four tectonic plates
![Page 11: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/11.jpg)
IBM Model 2 Alignments tend to the diagonal (broadly at least)
Other schemes for biasing alignments towards the diagonal: Relative vs absolute alignment Asymmetric distances Learning a full multinomial over distances
![Page 12: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/12.jpg)
EM for Models 1/2
Model parameters:Translation probabilities (1+2)Distortion parameters (2 only)
Start with uniform, including For each sentence:
For each French position j Calculate posterior over English positions
(or just use best single alignment) Increment count of word fj with word ei by these amounts Also re-estimate distortion probabilities for model 2
Iterate until convergence
![Page 13: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/13.jpg)
Example: Model 2 Helps
![Page 14: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/14.jpg)
Phrase Movement
Des tremblements de terre ont à nouveau touché le Japon jeudi 4 novembre.
On Tuesday Nov. 4, earthquakes rocked Japan once again
![Page 15: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/15.jpg)
A:
The HMM Model
Thank you , I shall do so gladly .
1 3 7 6 9
1 2 3 4 5 76 8 9
Model ParametersTransitions: P( A2 = 3 | A1 = 1)Emissions: P( F1 = Gracias | EA1 = Thank )
Gracias , lo haré de muy buen grado .
8 8 88
E:
F:
![Page 16: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/16.jpg)
The HMM Model
Model 2 preferred global monotonicity We want local monotonicity:
Most jumps are small HMM model (Vogel 96)
Re-estimate using the forward-backward algorithm Handling nulls requires some care
What are we still missing?
-2 -1 0 1 2 3
![Page 17: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/17.jpg)
HMM Examples
![Page 18: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/18.jpg)
AER for HMMs
Model AERModel 1 INT 19.5HMM EF 11.4HMM FE 10.8HMM AND 7.1HMM INT 4.7GIZA M4 AND 6.9
![Page 19: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/19.jpg)
IBM Models 3/4/5
Mary did not slap the green witch
Mary not slap slap slap the green witch
Mary not slap slap slap NULL the green witch
n(3|slap)
Mary no daba una botefada a la verde bruja
Mary no daba una botefada a la bruja verde
P(NULL)
t(la|the)
d(j|i)
[from Al-Onaizan and Knight, 1998]
![Page 20: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/20.jpg)
Examples: Translation and Fertility
![Page 21: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/21.jpg)
Example: Idioms
il hoche la tête
he is nodding
![Page 22: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/22.jpg)
Example: Morphology
![Page 23: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/23.jpg)
Some Results [Och and Ney 03]
![Page 24: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/24.jpg)
Decoding In these word-to-word models
Finding best alignments is easy Finding translations is hard (why?)
![Page 25: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/25.jpg)
Bag “Generation” (Decoding)
![Page 26: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/26.jpg)
Bag Generation as a TSP Imagine bag generation
with a bigram LM Words are nodes Edge weights are P(w|
w’) Valid sentences are
Hamiltonian paths Not the best news for
word-based MT!
it
is
not
clear
.
![Page 27: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/27.jpg)
IBM Decoding as a TSP
![Page 28: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/28.jpg)
Greedy Decoding
![Page 29: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/29.jpg)
Stack Decoding Stack decoding:
Beam search Usually A* estimates for completion cost One stack per candidate sentence length
Other methods: Dynamic programming decoders possible if we make
assumptions about the set of allowable permutations
![Page 30: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/30.jpg)
Stack Decoding Stack decoding:
Beam search Usually A* estimates for completion cost One stack per candidate sentence length
Other methods: Dynamic programming decoders possible if we make
assumptions about the set of allowable permutations
![Page 31: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/31.jpg)
Phrase-Based Systems
Sentence-aligned corpus
cat ||| chat ||| 0.9 the cat ||| le chat ||| 0.8dog ||| chien ||| 0.8 house ||| maison ||| 0.6 my house ||| ma maison ||| 0.9language ||| langue ||| 0.9 …
Phrase table(translation model)Word alignments
![Page 32: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/32.jpg)
Phrase-Based Decoding
这 7 人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .
Decoder design is important: [Koehn et al. 03]
![Page 33: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/33.jpg)
The Pharaoh “Model”
[Koehn et al, 2003]
Segmentation Translation Distortion
![Page 34: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/34.jpg)
The Pharaoh “Model”
Where do we get these counts?
![Page 35: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/35.jpg)
Phrase Weights
![Page 36: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/36.jpg)
![Page 37: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/37.jpg)
Phrase Scoring
les chatsaiment
lepoisson
cats
like
fresh
fish
.
.frais
.
Learning weights has been tried, several times: [Marcu and Wong, 02] [DeNero et al, 06] … and others
Seems not to work well, for a variety of partially understood reasons
Main issue: big chunks get all the weight, obvious priors don’t help Though, [DeNero et al 08]
![Page 38: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/38.jpg)
Phrase Size Phrases do help
But they don’t need to be long
Why should this be?
![Page 39: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/39.jpg)
Lexical Weighting
![Page 40: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/40.jpg)
The Pharaoh Decoder
Probabilities at each step include LM and TM
![Page 41: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/41.jpg)
Hypotheis Lattices
![Page 42: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/42.jpg)
Pruning
Problem: easy partial analyses are cheaper Solution 1: use beams per foreign subset Solution 2: estimate forward costs (A*-like)
![Page 43: Statistical NLP Spring 2010](https://reader036.vdocuments.site/reader036/viewer/2022070503/568155c6550346895dc39a65/html5/thumbnails/43.jpg)
WSD? Remember when we discussed WSD?
Word-based MT systems rarely have a WSD step Why not?