TRANSCRIPT
Statistical NLP, Spring 2011
Lecture 10: Phrase Alignment
Dan Klein, UC Berkeley
Phrase Weights
Phrase Scoring
[Figure: word alignment grid for the sentence pair "les chats aiment le poisson frais ." / "cats like fresh fish ."]
- Learning weights has been tried, several times:
  - [Marcu and Wong, 02]
  - [DeNero et al., 06]
  - … and others
- Seems not to work well, for a variety of partially understood reasons
- Main issue: big chunks get all the weight, and obvious priors don't help
  - Though see [DeNero et al., 08]
Phrase Size
- Phrases do help
- But they don't need to be long
- Why should this be?
Lexical Weighting
Phrase Alignment
Identifying Phrasal Translations
In the past two years , a number of US citizens …
过去 两 年 中 , 一 批 美国 公民 …
past two year in , one lots US citizen
Phrase alignment models: Choose a segmentation and a one-to-one phrase alignment
Underlying assumption: There is a correct phrasal segmentation
Unique Segmentations?
In the past two years , a number of US citizens …
过去 两 年 中 , 一 批 美国 公民 …
past two year in , one lots US citizen
Problem 1: Overlapping phrases can be useful (and complementary)
Problem 2: Phrases and their sub-phrases can both be useful
Hypothesis: This is why models of phrase alignment don’t work well
Identifying Phrasal Translations
This talk: Modeling sets of overlapping, multi-scale phrase pairs
In the past two years , a number of US citizens …
过去 两 年 中 , 一 批 美国 公民 …
past two year in , one lots US citizen
Input: sentence pairs
Output: extracted phrases
… But the Standard Pipeline has Overlap!
MOTIVATION
[Figure: the standard pipeline from sentence pair ("In the past two years" / 过去 两 年 中, gloss: past two year in) through word alignment to extracted phrases]
Our Task: Predict Extraction Sets
Conditional model of extraction sets given sentence pairs

[Figure: the sentence pair "In the past two years" / 过去 两 年 中 mapped directly to extracted phrases, shown on alignment grids (English positions 0–5, Chinese positions 0–4) as extracted phrases + "word alignments"]
Alignments Imply Extraction Sets
MODEL

[Figure: alignment grid for "In the past two years" / 过去 两 年 中 (gloss: past two year in), showing word-level alignment links, word-to-span alignments, and the extraction set of bispans]
Incorporating Possible Alignments
[Figure: alignment grid for "In the past two years" / 过去 两 年 中, showing sure and possible word links, word-to-span alignments, and the extraction set of bispans]
Linear Model for Extraction Sets
[Figure: extraction-set model over the alignment grid for "In the past two years" / 过去 两 年 中]
Features on sure links
Features on all bispans
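The score of a candidate extraction set is linear in its features, summed over sure links and over bispans. A minimal sketch, with toy stand-in features (the names and feature templates here are illustrative, not the lecture's actual feature set):

```python
# Hypothetical sketch of the linear extraction-set model: the score of a
# candidate (sure links + bispans) is a dot product w . phi, where phi
# decomposes into features on each sure link and on each bispan.
from collections import Counter

def link_features(link):
    # link = (i, j): toy stand-in for real link features (e.g. HMM posteriors)
    i, j = link
    return Counter({f"link_diag_dist={abs(i - j)}": 1.0})

def bispan_features(bispan):
    # bispan = (e_start, e_end, f_start, f_end): toy shape feature
    e1, e2, f1, f2 = bispan
    return Counter({f"bispan_size={(e2 - e1, f2 - f1)}": 1.0})

def score(weights, sure_links, bispans):
    phi = Counter()
    for link in sure_links:
        phi.update(link_features(link))
    for bispan in bispans:
        phi.update(bispan_features(bispan))
    return sum(weights.get(k, 0.0) * v for k, v in phi.items())

w = {"link_diag_dist=0": 1.0, "bispan_size=(1, 1)": 0.5}
print(score(w, sure_links=[(0, 0), (1, 1)], bispans=[(0, 1, 0, 1)]))  # 2.5
```

Because the score decomposes over links and bispans, inference can search over structured sets (here, via the ITG parser) rather than scoring each candidate set exhaustively.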
Features on Bispans and Sure Links
FEATURES

[Example: the bispan "over the Earth" / 过 地球, glossed "go over" "Earth"]

Some features on sure links:
- HMM posteriors
- Presence in dictionary
- Numbers & punctuation

Features on bispans:
- HMM phrase table features, e.g., phrase relative frequencies
- Lexical indicator features for phrases with common words
- Monolingual phrase features, e.g., "the _____"
- Shape features, e.g., Chinese character counts
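The phrase relative frequencies named above are conditional counts over extracted phrase pairs, count(e, f) / count(e). A toy sketch (the corpus and counts are illustrative only):

```python
# Sketch of one bispan feature family: phrase relative frequencies,
# i.e. count(e, f) / count(e) estimated from extracted phrase pairs.
# The "corpus" below is a toy illustration.
from collections import Counter

extracted = [("past two years", "过去 两 年"),
             ("past two years", "过去 两 年"),
             ("past two years", "两 年 中"),
             ("in", "中")]

pair_counts = Counter(extracted)
e_counts = Counter(e for e, _ in extracted)

def rel_freq(e, f):
    # relative frequency of target phrase f given source phrase e
    return pair_counts[(e, f)] / e_counts[e]

print(rel_freq("past two years", "过去 两 年"))  # 2/3
```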
Getting Gold Extraction Sets
TRAINING

Hand-aligned: sure and possible word links
→ Word-to-span alignments (deterministic: find the min and max alignment index for each word)
→ Extraction set of bispans (deterministic: a bispan is included iff every word within the bispan aligns within the bispan)
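The deterministic bispan rule above can be sketched directly: enumerate candidate bispans and keep those where every link touching the bispan lies entirely inside it. A minimal sketch with a toy 2x2 alignment (names and limits are illustrative):

```python
# Sketch of the deterministic gold-extraction rule: a bispan
# (e1..e2, f1..f2) is included iff every word within it aligns within it,
# i.e. every link touching its rows or columns lies fully inside.
def extract_bispans(n_e, n_f, links, max_len=3):
    bispans = []
    for e1 in range(n_e):
        for e2 in range(e1, min(e1 + max_len, n_e)):
            for f1 in range(n_f):
                for f2 in range(f1, min(f1 + max_len, n_f)):
                    touching = [(i, j) for (i, j) in links
                                if e1 <= i <= e2 or f1 <= j <= f2]
                    # consistent iff every touching link is fully inside
                    if touching and all(e1 <= i <= e2 and f1 <= j <= f2
                                        for (i, j) in touching):
                        bispans.append((e1, e2, f1, f2))
    return bispans

# toy diagonal alignment of two words to two words
links = {(0, 0), (1, 1)}
print(extract_bispans(2, 2, links))
# [(0, 0, 0, 0), (0, 1, 0, 1), (1, 1, 1, 1)]
```

Note that the two single-word bispans and the full 2x2 bispan all survive, which is exactly the multi-scale overlap the extraction-set model is built to represent.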
Discriminative Training with MIRA
Loss function: F-score of bispan errors (precision & recall)

Training criterion: minimal change to w such that the gold is preferred to the guess by a loss-scaled margin

[Figure: gold (annotated) extraction set vs. guess (arg max w·φ)]
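The criterion above is the standard 1-best MIRA update: the smallest change to w that makes the gold outscore the guess by a margin equal to the loss. A minimal sketch with dict feature vectors (all names illustrative; the capped step size C is an assumption):

```python
# Minimal sketch of a 1-best MIRA update: minimal change to w such that
# gold beats guess by a loss-scaled margin. Feature vectors are dicts.
def mira_update(w, phi_gold, phi_guess, loss, C=1.0):
    keys = set(phi_gold) | set(phi_guess)
    delta = {k: phi_gold.get(k, 0.0) - phi_guess.get(k, 0.0) for k in keys}
    margin = sum(w.get(k, 0.0) * d for k, d in delta.items())
    norm_sq = sum(d * d for d in delta.values())
    if norm_sq == 0.0:
        return w
    # step size: hinge violation divided by ||delta||^2, capped at C
    tau = min(C, max(0.0, loss - margin) / norm_sq)
    return {k: w.get(k, 0.0) + tau * delta.get(k, 0.0)
            for k in set(w) | keys}

w = mira_update({"f1": 0.0}, {"f1": 1.0}, {"f2": 1.0}, loss=1.0)
print(w)  # {'f1': 0.5, 'f2': -0.5}: gold now outscores the guess
```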
Inference: An ITG Parser
INFERENCE
ITG captures some bispans
Experimental Setup
RESULTS

- Chinese-to-English newswire
- Parallel corpus: 11.3 million words; sentence length ≤ 40
- MT systems: tuned and tested on NIST '04 and '05
- Supervised data: 150 training & 191 test sentences (NIST '02)
- Unsupervised model: jointly trained HMM (Berkeley Aligner)
Baselines and Limited Systems
HMM: state-of-the-art unsupervised baseline
  - Joint training & competitive posterior decoding
  - Source of many features for the supervised models

ITG: supervised ITG aligner with block terminals
  - State-of-the-art supervised baseline
  - Re-implementation of Haghighi et al., 2009

Coarse: supervised block ITG + possible alignments
  - Coarse pass of the full extraction set model
Word Alignment Performance
          Precision   Recall   1 - AER
HMM          84.0      76.9      80.4
ITG          83.4      83.8      83.6
Coarse       82.2      84.2      83.1
Full         84.7      84.0      84.4
Extracted Bispan Performance
          Precision   Recall    F1      F5
HMM          69.5      59.5     64.1    59.9
ITG          75.8      62.3     68.4    62.8
Coarse       70.0      72.9     71.4    72.8
Full         69.0      74.2     71.6    74.0
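The F5 column is the recall-weighted F-measure (beta = 5, so recall counts 25x as much as precision), which matters because extraction-set recall drives downstream phrase-table coverage. A quick check of the formula (recomputed from the rounded Full-row values, so the last digit may differ slightly from the table):

```python
# F-beta measure: F1 balances precision and recall; F5 (beta = 5)
# weights recall beta^2 = 25 times more heavily than precision.
def f_beta(p, r, beta):
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

print(round(f_beta(69.0, 74.2, 1), 1))  # 71.5
print(round(f_beta(69.0, 74.2, 5), 1))  # 74.0
```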
Translation Performance (BLEU)
          Moses   Joshua
HMM        33.2    34.5
ITG        33.6    34.7
Coarse     34.2    35.7
Full       34.4    35.9
Supervised conditions also included HMM alignments