![Page 1: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/1.jpg)
Discriminative Learning of Extraction Sets for Machine Translation
John DeNero and Dan Klein
UC Berkeley
![Page 2: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/2.jpg)
Identifying Phrasal Translations
In the past two years , a number of US citizens …
过去 两 年 中 , 一 批 美国 公民 …
past two year in , one lots US citizen
Phrase alignment models: Choose a segmentation and a one-to-one phrase alignment
Past Go over
Underlying assumption: There is a correct phrasal segmentation
![Page 3: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/3.jpg)
Unique Segmentations?
In the past two years , a number of US citizens …
过去 两 年 中 , 一 批 美国 公民 …
past two year in , one lots US citizen
Problem 1: Overlapping phrases can be useful (and complementary)
Problem 2: Phrases and their sub-phrases can both be useful
Hypothesis: This is why models of phrase alignment don’t work well
![Page 4: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/4.jpg)
Identifying Phrasal Translations
This talk: Modeling sets of overlapping, multi-scale phrase pairs
In the past two years , a number of US citizens …
过去 两 年 中 , 一 批 美国 公民 …
past two year in , one lots US citizen
Input: sentence pairs
Output: extracted phrases
![Page 5: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/5.jpg)
… But the Standard Pipeline has Overlap!
M O T I V A T I O N
In the past two years
过去 两 年 中
past two year in
Sentence Pair
Word Alignment
Extracted Phrases
![Page 6: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/6.jpg)
Related Work
Sentence Pair
Word Alignment
Extracted Phrases
Translation models: Sinuhe system (Kääriäinen, 2009)
Fixed alignments; learned phrase pair weights
Combining aligners: Deng and Zhou (2009)
Fixed directional alignments; learned symmetrization
Extraction models: Moore and Quirk (2007)
Fixed alignments; learned phrase pair weights
![Page 7: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/7.jpg)
Our Task: Predict Extraction Sets
Sentence Pair
Extracted Phrases
Conditional model of extraction sets given sentence pairs
[Figure: two alignment grids pairing “In the past two years” (positions 0–5) with 过去 两 年 中 (positions 0–4)]
Extracted Phrases + “Word Alignments”
![Page 8: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/8.jpg)
Alignments Imply Extraction Sets
M O D E L
In the past two years
过去 两 年 中
past two year in
[Figure: alignment grid with English positions 0–5 and Chinese positions 0–4]
Word-level alignment links
Word-to-span alignments
Extraction set of bispans
![Page 9: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/9.jpg)
Nulls and Possibles
据 (according to) 报道 (news report)
it is reported
[Figure: the example above is shown twice, once labeled “Nulls” and once labeled “Possibles”]
![Page 10: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/10.jpg)
Incorporating Possible Alignments
In the past two years
过去 两 年 中
past two year in
[Figure: alignment grid with English positions 0–5 and Chinese positions 0–4]
Sure and possible word links
Word-to-span alignments
Extraction set of bispans
![Page 11: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/11.jpg)
Linear Model for Extraction Sets
In the past two years
过去 两 年 中
[Figure: alignment grid with English positions 0–5 and Chinese positions 0–4]
Features on sure links
Features on all bispans
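The two feature groups on this slide suggest a score that adds up weighted features over every sure link and every bispan in a candidate extraction set. The following is a minimal Python sketch under that reading. The sparse-dictionary representation, the feature names, and the weight values are all hypothetical and are not taken from the talk.

```python
def dot(weights, features):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def score_extraction_set(weights, sure_link_features, bispan_features):
    """Linear score: weighted features on sure links plus weighted features on bispans.

    Each argument after `weights` is a list of feature dicts, one per sure
    link or per bispan in the candidate extraction set.
    """
    link_score = sum(dot(weights, f) for f in sure_link_features)
    bispan_score = sum(dot(weights, f) for f in bispan_features)
    return link_score + bispan_score


# Hypothetical weights and features, for illustration only.
weights = {"link:hmm_posterior": 2.0, "link:in_dictionary": 0.5, "bispan:rel_freq": 1.0}
sure_link_features = [
    {"link:hmm_posterior": 0.9, "link:in_dictionary": 1.0},
    {"link:hmm_posterior": 0.5},
]
bispan_features = [{"bispan:rel_freq": 0.25}]
example_score = score_extraction_set(weights, sure_link_features, bispan_features)
```

Because the score decomposes over links and bispans, a dynamic program over bispans can in principle search for the highest-scoring extraction set.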
![Page 12: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/12.jpg)
Features on Bispans and Sure Links
F E A T U R E S
过 (go over) 地球 (Earth)
over the Earth
Some features on sure links
HMM posteriors
Presence in dictionary
Numbers & punctuation
Features on bispans
HMM phrase table features: e.g., phrase relative frequencies
Lexical indicator features for phrases with common words
Monolingual phrase features: e.g., “the _____”
Shape features: e.g., Chinese character counts
![Page 13: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/13.jpg)
Getting Gold Extraction Sets
T R A I N I N G
Hand aligned: Sure and possible word links
Deterministic: Find min and max alignment index for each word
Word-to-span alignments
Deterministic: A bispan is included iff every word within the bispan aligns within the bispan
Extraction set of bispans
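The two deterministic steps can be sketched in Python as follows. This is an illustration, not the authors' code: it ignores the sure/possible distinction, uses inclusive word indices rather than the fencepost positions on the grids, and assumes that a bispan must contain at least one aligned word on each side. The links are hypothetical, read off the glosses of the running example (过去–past, 两–two, 年–years, 中–In, with "the" left unaligned).

```python
def word_to_span(links, n_words, side):
    """Project each word on `side` (0 = source, 1 = target) to the
    (min, max) index range it links to on the other side, or None."""
    spans = [None] * n_words
    for link in links:
        word, other = link[side], link[1 - side]
        if spans[word] is None:
            spans[word] = (other, other)
        else:
            lo, hi = spans[word]
            spans[word] = (min(lo, other), max(hi, other))
    return spans


def inside(projections, lo, hi):
    """True if every projected span lies within the inclusive range [lo, hi]."""
    return all(lo <= a and b <= hi for a, b in projections)


def extraction_set(links, n_src, n_tgt):
    """Enumerate bispans ((i, j), (k, l)) whose words all align within the bispan."""
    src_proj = word_to_span(links, n_src, side=0)
    tgt_proj = word_to_span(links, n_tgt, side=1)
    bispans = set()
    for i in range(n_src):
        for j in range(i, n_src):
            for k in range(n_tgt):
                for l in range(k, n_tgt):
                    src = [p for p in src_proj[i:j + 1] if p is not None]
                    tgt = [p for p in tgt_proj[k:l + 1] if p is not None]
                    # Assumption: skip bispans with no aligned word on either side.
                    if not src or not tgt:
                        continue
                    if inside(src, k, l) and inside(tgt, i, j):
                        bispans.add(((i, j), (k, l)))
    return bispans


# Source = 过去 两 年 中 (indices 0-3); target = In the past two years (indices 0-4).
links = {(0, 2), (1, 3), (2, 4), (3, 0)}
gold = extraction_set(links, n_src=4, n_tgt=5)
```

With these links the set contains both 过去 ↔ "past" and the larger 过去 两 ↔ "past two", so nested phrase pairs appear together. The unaligned "the" can attach to a neighbor, as in 过去 ↔ "the past".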
![Page 14: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/14.jpg)
Discriminative Training with MIRA
Loss function: F-score of bispan errors (precision & recall)
Training Criterion: Minimal change to w such that the gold is preferred to the guess by a loss-scaled margin
Gold (annotated)
Guess (arg max w ∙ ɸ)
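The training criterion corresponds to the standard single-constraint MIRA update, which has a closed form. A hedged Python sketch follows. The sparse feature dictionaries and the step cap `max_step` are assumptions for illustration; the slide states only the minimal-change criterion and the loss-scaled margin.

```python
def mira_update(weights, phi_gold, phi_guess, loss, max_step=1.0):
    """Smallest change to `weights` so that gold outscores guess by `loss`.

    `max_step` caps the step size (an assumed regularizer, not from the slide).
    Returns a new weight dict; the input is left unmodified.
    """
    names = set(phi_gold) | set(phi_guess)
    diff = {n: phi_gold.get(n, 0.0) - phi_guess.get(n, 0.0) for n in names}
    margin = sum(weights.get(n, 0.0) * d for n, d in diff.items())
    norm_sq = sum(d * d for d in diff.values())
    if norm_sq == 0.0:
        return dict(weights)  # identical feature vectors: nothing to update
    step = min(max_step, max(0.0, (loss - margin) / norm_sq))
    updated = dict(weights)
    for n, d in diff.items():
        updated[n] = updated.get(n, 0.0) + step * d
    return updated


# Hypothetical example: the guess fires feature "b", the gold fires feature "a".
new_weights = mira_update({}, {"a": 1.0}, {"b": 1.0}, loss=0.5)
```

In this example the update moves `a` up and `b` down by the same amount, so the gold outscores the guess by exactly the loss after the step.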
![Page 15: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/15.jpg)
Inference: An ITG Parser
I N F E R E N C E
ITG captures some bispans
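An ITG derivation builds each bispan by joining two adjacent bispans in straight or inverted order, which is why only some reorderings and bispans are reachable. As a simplified illustration, the Python sketch below tests whether a one-to-one alignment, written as a permutation, can be built that way. It is not the parser used in the talk, and the function name is hypothetical.

```python
def itg_parsable(perm):
    """True if the permutation can be built by recursively joining two
    adjacent blocks in straight or inverted order (binary ITG rules).

    `perm` is a list of distinct integers; perm[i] is the target position
    aligned to source position i.
    """
    n = len(perm)
    if n <= 1:
        return True
    for k in range(1, n):
        left, right = perm[:k], perm[k:]
        left_contiguous = max(left) - min(left) + 1 == len(left)
        right_contiguous = max(right) - min(right) + 1 == len(right)
        if left_contiguous and right_contiguous:
            # Any valid split suffices: sub-blocks of a parsable
            # permutation are themselves parsable.
            return itg_parsable(left) and itg_parsable(right)
    return False
```

The two classic "inside-out" patterns, `[1, 3, 0, 2]` and `[2, 0, 3, 1]`, are the smallest permutations this check rejects.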
![Page 16: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/16.jpg)
Coarse-to-Fine Approximation
Coarse Pass: Features that are local to terminal productions
Fine Pass: Agenda search using coarse pass as a heuristic
We use an agenda-based parser. It’s fast!
![Page 17: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/17.jpg)
Experimental Setup
R E S U L T S
Chinese-to-English newswire
Parallel corpus: 11.3 million words; sentence length ≤ 40
MT systems: Tuned and tested on NIST ’04 and ’05
Supervised data: 150 training & 191 test sentences (NIST ’02)
Unsupervised Model: Jointly trained HMM (Berkeley Aligner)
![Page 18: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/18.jpg)
Baselines and Limited Systems
HMM: State-of-the-art unsupervised baseline
Joint training & competitive posterior decoding
Source of many features for supervised models
ITG: State-of-the-art supervised baseline
Supervised ITG aligner with block terminals
Re-implementation of Haghighi et al., 2009
Coarse: Coarse pass of full extraction set model
Supervised block ITG + possible alignments
![Page 19: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/19.jpg)
Word Alignment Performance
| System | Precision | Recall | 1 - AER |
| --- | --- | --- | --- |
| HMM | 84.0 | 76.9 | 80.4 |
| ITG | 83.4 | 83.8 | 83.6 |
| Coarse | 82.2 | 84.2 | 83.1 |
| Full | 84.7 | 84.0 | 84.4 |
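For reference, 1 - AER is conventionally computed from sure and possible gold links using the standard alignment error rate definition. The Python sketch below follows that definition; the example links are hypothetical.

```python
def one_minus_aer(predicted, sure, possible):
    """1 - AER for sets of (source_index, target_index) links.

    Follows the standard definition, where sure links also count as possible:
    1 - AER = (|A & S| + |A & P|) / (|A| + |S|).
    """
    possible = possible | sure
    denominator = len(predicted) + len(sure)
    if denominator == 0:
        return 1.0  # nothing predicted and nothing required
    return (len(predicted & sure) + len(predicted & possible)) / denominator


# Hypothetical example: two correct sure links and one link that is neither sure nor possible.
predicted = {(0, 0), (1, 1), (2, 3)}
sure = {(0, 0), (1, 1)}
possible = {(2, 2)}
example_value = one_minus_aer(predicted, sure, possible)
```

Predicting a possible link costs nothing under this metric, while missing a sure link or predicting a link outside the possible set lowers the score.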
![Page 20: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/20.jpg)
Extracted Bispan Performance
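| System | Precision | Recall | F1 | F5 |
| --- | --- | --- | --- | --- |
| HMM | 69.5 | 59.5 | 64.1 | 59.9 |
| ITG | 75.8 | 62.3 | 68.4 | 62.8 |
| Coarse | 70.0 | 72.9 | 71.4 | 72.8 |
| Full | 69.0 | 74.2 | 71.6 | 74.0 |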
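F1 and F5 are instances of the general F-measure, which weights recall β times as heavily as precision. The short Python sketch below uses that standard formula. The inputs in the usage lines are a precision/recall pair reported on this slide, and the results match the reported F1 and F5 only approximately because the inputs are already rounded.

```python
def f_beta(precision, recall, beta=1.0):
    """General F-measure: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision 69.0 and recall 74.2, as reported on this slide.
f1 = f_beta(69.0, 74.2, beta=1)
f5 = f_beta(69.0, 74.2, beta=5)
```

With β = 5 the score stays close to recall, so F5 favors systems that extract more of the gold bispans even when precision is lower.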
![Page 21: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/21.jpg)
Translation Performance (BLEU)
| System | Moses | Joshua |
| --- | --- | --- |
| HMM | 33.2 | 34.5 |
| ITG | 33.6 | 34.7 |
| Coarse | 34.2 | 35.7 |
| Full | 34.4 | 35.9 |
Supervised conditions also included HMM alignments
![Page 22: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/22.jpg)
Conclusions
Extraction set model directly learns what phrases to extract
The system performs well as an aligner or a rule extractor
Are segmentations always bad?
Idea: get overlap and multi-scale into the learning!
![Page 23: Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint](https://reader036.vdocuments.site/reader036/viewer/2022062322/56649d3f5503460f94a18562/html5/thumbnails/23.jpg)
Thank you!
nlp.cs.berkeley.edu