Better MT Using Parallel Dependency Trees
Yuan Ding
University of Pennsylvania
Outline
Motivation
The alignment algorithm
  Algorithm at a glance
  The framework
  Heuristics
Walking through an example
Evaluation
Conclusion
Motivation (1): Statistical MT Approaches
Statistical MT approaches
  Pioneered by (Brown et al., 1990, 1993)
  Leverage a large training corpus
  Outperform traditional transfer-based approaches
Major criticism
  No internal representation of syntax/semantics
Motivation (2): Hybrid Approaches
Hybrid approaches
  (Wu, 1997), (Alshawi et al., 2000), (Yamada and Knight, 2001, 2002), (Gildea, 2003)
  Apply statistical learning to structured data
Problems with hybrid MT approaches
  Structural divergence (Dorr, 1994)
  Vagaries of loose translations in real corpora
Motivation (3)
Holy grail: syntax-based MT that captures structural divergence
Accomplished work
  A new approach to the alignment of parallel dependency trees (paper published at MT Summit IX)
  Allows non-isomorphism of dependency trees
We are here…
  Word alignment using dependency trees (details given in this talk)
  Collecting dependency treelet pairs and constructing a synchronous dependency grammar
  Syntax-based MT
  (The remaining stages are ongoing work)
Outline
Motivation
The alignment algorithm
  Algorithm at a glance
  The framework
  Heuristics
Walking through an example
Evaluation
Conclusion
Define the Alignment Problem
Define the alignment problem
  In natural language: find word mappings between the English and foreign sentences
  In math: for each foreign word f_j in f, find a labeling a_j such that f_j is aligned to e_{a_j}, where e_{a_j} ∈ e ∪ {e_0} and e_0 denotes the empty (NULL) English word
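Read concretely, the labeling can be stored as one English index per foreign word, with index 0 standing for the empty word e_0. A minimal sketch of this encoding (names are illustrative, not from the talk):

```python
from typing import List

def render_alignment(english: List[str], foreign: List[str], a: List[int]) -> List[str]:
    """For each foreign word f_j, report the English word e_{a_j} it is labeled with.
    a[j] = 0 means f_j is aligned to the empty (NULL) word e_0."""
    assert len(a) == len(foreign)
    links = []
    for j, f_word in enumerate(foreign):
        e_word = "NULL" if a[j] == 0 else english[a[j] - 1]
        links.append(f"{f_word} -> {e_word}")
    return links

# Toy usage: the second foreign word is left unaligned.
print(render_alignment(["the", "cat"], ["le", "ne", "chat"], [1, 0, 2]))
```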
The IBM Models
The IBM way
  Model 1: word order does not matter, i.e. a "bag of words" model (a training sketch follows below)
  Model 2: condition the probabilities on sentence length and word position
  Models 3, 4, 5:
    A. Generate the fertility of each English word
    B. Generate the identity of the foreign words
    C. Generate their positions
  Gradually adding positioning information
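For orientation, the Model 1 lexical probabilities t(f | e) that later steps rely on can be trained with a short EM loop over a parallel corpus. The sketch below is a simplified stand-in (uniform initialization, a NULL word added on the English side), not the GIZA implementation used for the results:

```python
from collections import defaultdict

def train_model1(bitext, iterations=5):
    """EM for IBM Model 1 lexical probabilities t(f|e); bitext is a list of (english_words, foreign_words)."""
    e_vocab = {e for es, _ in bitext for e in es + ["NULL"]}
    t = defaultdict(lambda: 1.0 / max(len(e_vocab), 1))   # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for es, fs in bitext:
            es = ["NULL"] + es                  # allow alignment to the empty word
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # normalize over all English candidates
                for e in es:
                    c = t[(f, e)] / z           # expected count of the link (f, e)
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():         # M-step: renormalize per English word
            t[(f, e)] = c / total[e]
    return t

# Toy corpus: word order never enters the computation, matching the "bag of words" view.
bitext = [(["the", "girl"], ["la", "fille"]), (["the", "cat"], ["le", "chat"])]
t = train_model1(bitext)
print(round(t[("fille", "girl")], 3))
```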
Using Dependency Trees
Positioning information can be acquired from parse trees
  Parsers: (Collins, 1999), (Bikel, 2002)
Problems with using parse trees directly
  Two types of nodes
  Unlexicalized non-terminals control the domain
Using dependency trees instead
  (Fox, 2002): best phrasal cohesion properties (see the sketch below)
  (Xia, 2001): constructing dependency trees from parse trees using the Tree Adjoining Grammar
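To make the phrasal cohesion point concrete: under a dependency tree, the words of any subtree should map onto a contiguous block of the other sentence. The check below is only an illustrative reading of that property (head indices and a source-to-target link map are assumptions, not Fox's exact measure):

```python
def descendants(head, parents):
    """All word positions in the subtree rooted at `head`; parents[i] is the head of word i, -1 for the root."""
    nodes = {head}
    changed = True
    while changed:
        changed = False
        for i, p in enumerate(parents):
            if p in nodes and i not in nodes:
                nodes.add(i)
                changed = True
    return nodes

def cohesion_violations(parents, links):
    """Count subtrees whose aligned target positions do not form a contiguous block.
    `links` maps source position -> set of target positions (unaligned words map to an empty set)."""
    violations = 0
    for head in range(len(parents)):
        sub = descendants(head, parents)
        inside = {t for i in sub for t in links.get(i, set())}
        outside = {t for i in range(len(parents)) if i not in sub for t in links.get(i, set())}
        if inside and any(min(inside) < t < max(inside) for t in outside):
            violations += 1
    return violations

# Toy tree: "the(0) girl(1) slept(2)" with "slept" as the root, aligned word-for-word.
print(cohesion_violations([1, 2, -1], {0: {0}, 1: {1}, 2: {2}}))  # 0 violations
```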
The Framework (1)
Step 1: Train IBM Model 1 for the lexical mapping probabilities
Step 2: Find and fix high-confidence mappings according to a heuristic function h(f, e) (see the sketch below)
A pseudo-translation example:
  The girl kissed her kitty cat
  The girl gave a kiss to her cat
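A schematic of Step 2: score every candidate word pair with h(f, e) and greedily fix the pairs whose confidence clears a threshold, using each word at most once. The threshold value and the one-to-one restriction here are illustrative assumptions, not settings from the paper:

```python
def fix_high_confidence(english, foreign, h, threshold=0.9):
    """Return a set of (e_word, f_word) pairs whose heuristic confidence exceeds `threshold`,
    taking each word at most once (greedy, best-first)."""
    candidates = sorted(
        ((h(f, e), e, f) for e in english for f in foreign),
        reverse=True,
    )
    fixed, used_e, used_f = set(), set(), set()
    for score, e, f in candidates:
        if score < threshold:
            break
        if e not in used_e and f not in used_f:
            fixed.add((e, f))
            used_e.add(e)
            used_f.add(f)
    return fixed

# Toy heuristic: pretend only identical words are confident.
print(fix_high_confidence(["girl", "cat"], ["girl", "chat"], lambda f, e: 1.0 if f == e else 0.1))
```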
The Framework (2)
Step 3: Partition the dependency trees on both sides with respect to the fixed mappings (see the sketch after the diagram)
  One fixed mapping creates one new "treelet"
  This yields a new set of parallel dependency structures
[Diagram: the dependency trees of "The girl kissed her kitty cat" and "The girl gave a kiss to her cat", split into treelet pairs at the fixed word mappings]
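Step 3 can be sketched as cutting each tree at the fixed words: every word joins the treelet rooted at its nearest fixed ancestor (itself, if it is fixed), and anything with no fixed ancestor stays with the original root. This is one plausible reading of the partition, shown on the kissed-example tree:

```python
def partition_treelets(parents, fixed):
    """Split a dependency tree into treelets.
    parents[i] is the head position of word i (-1 for the root); fixed is a set of fixed word positions.
    Each word is assigned to the treelet rooted at its nearest fixed ancestor, or at the sentence root."""
    def treelet_root(i):
        while i not in fixed and parents[i] != -1:
            i = parents[i]
        return i
    treelets = {}
    for i in range(len(parents)):
        treelets.setdefault(treelet_root(i), []).append(i)
    return treelets

# "The(0) girl(1) kissed(2) her(3) kitty(4) cat(5)", with "kissed" as the root.
# Fixing "girl" splits off the treelet {The, girl}; the rest stays rooted at "kissed".
parents = [1, 2, -1, 5, 5, 2]
print(partition_treelets(parents, fixed={1}))   # {1: [0, 1], 2: [2, 3, 4, 5]}
```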
The Framework (3)
Step 4: Go back to Step 1 unless enough nodes have been fixed
Algorithm properties
  An iterative algorithm
  Time complexity O(n * T(h)), where T(h) is the time for the heuristic function in Step 2
  P(f | e) in IBM Model 1 has a unique global maximum
  Guaranteed convergence
  Results depend only on the heuristic function h(f, e)
Heuristics
Heuristic functions for Step 2
  Objective: estimate the confidence of a mapping between a pair of words
First heuristic: entropy
  Intuition: model the shape of the probability distribution (see the sketch below)
Second heuristic: inside-outside probability
  Idea borrowed from PCFG parsing
Fertility threshold: rule out mappings with unlikely fertility ratios (> 2.0)
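The entropy heuristic rewards translation distributions that are sharply peaked: if t(· | e) puts most of its mass on one foreign word, a link from e to that word is trusted more. The sketch below is one plausible instantiation of such a score, not the formula from the paper:

```python
import math

def entropy_heuristic(t, e_word, f_word, f_vocab):
    """Confidence of linking e_word to f_word: the lexical probability t(f|e),
    discounted by the entropy of the whole distribution t(. | e_word)."""
    probs = [t.get((f, e_word), 0.0) for f in f_vocab]
    z = sum(probs) or 1.0
    probs = [p / z for p in probs]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)     # 0 for a one-point distribution
    peak = t.get((f_word, e_word), 0.0) / z
    return peak / (1.0 + entropy)                                  # peaked, high-probability links score highest

# Toy table: "girl" translates almost deterministically, "the" is spread over two articles.
t = {("fille", "girl"): 0.95, ("la", "girl"): 0.05, ("la", "the"): 0.5, ("le", "the"): 0.5}
vocab = ["fille", "la", "le"]
print(entropy_heuristic(t, "girl", "fille", vocab) > entropy_heuristic(t, "the", "la", vocab))  # True
```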
Outline
Motivation
The alignment algorithm
  Algorithm at a glance
  The framework
  Heuristics
Walking through an example
Evaluation
Conclusion
Walking through an Example (1)
[English] I have been here since 1947. [Chinese] 1947 nian yilai wo yizhi zhu zai zheli.
Iteration 1: One dependency tree pair. Align “I” and “wo”
[Diagram: the English dependency tree rooted at "been" (children: I, have, here, since; "1947" under "since") and the Chinese tree rooted at "zhu" (children: wo, yizhi, yilai, zai, zheli; "1947 nian" under "yilai")]
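For readers following along in code, the tree pair on this slide can be written down with head indices and run through the same partition idea; the attachments below are approximated from the figure, and the helper repeats the partition sketch given earlier:

```python
def partition_treelets(parents, fixed):
    """Same partition sketch as before: assign each word to its nearest fixed ancestor (or the root)."""
    def treelet_root(i):
        while i not in fixed and parents[i] != -1:
            i = parents[i]
        return i
    treelets = {}
    for i in range(len(parents)):
        treelets.setdefault(treelet_root(i), []).append(i)
    return treelets

# English: I(0) have(1) been(2) here(3) since(4) 1947(5); root "been", "1947" attached under "since".
en_parents = [2, 2, -1, 2, 2, 4]
# Chinese: 1947(0) nian(1) yilai(2) wo(3) yizhi(4) zhu(5) zai(6) zheli(7); root "zhu", "1947 nian" under "yilai".
zh_parents = [2, 0, 5, 5, 5, -1, 5, 5]

# Iteration 1 fixes "I" (English position 0) and "wo" (Chinese position 3): each tree splits in two.
print(partition_treelets(en_parents, fixed={0}))   # {0: [0], 2: [1, 2, 3, 4, 5]}
print(partition_treelets(zh_parents, fixed={3}))   # {5: [0, 1, 2, 4, 5, 6, 7], 3: [3]}
```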
Walking through an Example (2)
Iteration 2: Partition and form two treelet pairs. Align “since” and “yilai”
[Diagram: the trees partitioned into two treelet pairs, with the treelet pair {I : wo} split off; "since" and "yilai" are aligned next]
Walking through an Example (3)
Iteration 3: Partition and form three treelet pairs. Align “1947” and “1947”, “here” and “zheli”
[Diagram: the trees partitioned into three treelet pairs, with "1947"/"1947" and "here"/"zheli" aligned next]
Outline
Motivation
The alignment algorithm
  Algorithm at a glance
  The framework
  Heuristics
Walking through an example
Evaluation
Conclusion
Evaluation
Training: LDC Xinhua newswire Chinese-English parallel corpus
  Roughly 50% filtered out; 60K+ sentence pairs used
  The parser produced 53,130 parsed sentence pairs
Evaluation: 500 sentence pairs provided by Microsoft Research Asia, word-level aligned by hand
F-score:
  A: set of word pairs aligned by the automatic alignment
  G: set of word pairs aligned in the gold file
F = 2 |A ∩ G| / (|A| + |G|)
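As a sanity check on the metric, the F-score can be computed directly from the two sets of aligned word pairs; a minimal sketch:

```python
def f_score(auto_pairs, gold_pairs):
    """F = 2 |A ∩ G| / (|A| + |G|) over sets of aligned word-position pairs."""
    if not auto_pairs and not gold_pairs:
        return 1.0
    return 2 * len(auto_pairs & gold_pairs) / (len(auto_pairs) + len(gold_pairs))

A = {(0, 0), (1, 1), (2, 3)}    # word pairs from the automatic alignment
G = {(0, 0), (1, 1), (2, 2)}    # word pairs from the gold file
print(round(f_score(A, G), 3))  # 0.667
```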
Results (1)
Results for IBM Model 1 through Model 4 (GIZA)
  Bootstrapped from Model 1 up to Model 4
  Signs of overfitting
  Suspected cause: difference between the genres of the training and test data
F-scores by iteration:
Itn#  IBM 1   IBM 2   IBM 3   IBM 4
1 0.0000 0.5128 0.5082 0.5130
2 0.2464 0.5288 0.5077 0.5245
3 0.4607 0.5274 0.5106 0.5240
4 0.4935 0.5275 0.5130 0.5247
5 0.5039 0.5245 0.5138 0.5236
6 0.5073 0.5215 0.5149 0.5220
7 0.5092 0.5191 0.5142 0.5218
8 0.5099 0.5160 0.5138 0.5212
9 0.5111 0.5138 0.5138 0.5195
10 0.5121 0.5127 0.5132 0.5195
Results (2)
Results for our algorithm
  Heuristic h1: entropy
  Heuristic h2: inside-outside probability
  The table shows F-scores after one iteration of our algorithm; "M1 Itn#" counts the IBM Model 1 (M1) iterations
Overfitting here is mainly caused by violations of the partition assumption in fine-grained dependency structures
M1 Itn#  Model h1  Model h2
1 0.5549 0.5151
2 0.5590 0.5497
3 0.5632 0.5515
4 0.5615 0.5521
5 0.5615 0.5540
6 0.5603 0.5543
7 0.5612 0.5539
8 0.5604 0.5540
9 0.5611 0.5542
10 0.5622 0.5535
Outline
Motivation
Algorithm at a glance
The framework
Heuristics
Walking through an example
Evaluation
Conclusion
Conclusion
A model based on partitioning sentences according to their dependency structure
  Avoids the unrealistic isomorphism assumption
  Outperforms the unstructured IBM models on a large data set
  "Orthogonal" to the IBM models: uses syntactic structure but no linear ordering information
Thank You!