Better MT Using Parallel Dependency Trees
Yuan Ding, University of Pennsylvania (2003)

TRANSCRIPT

Page 1:

Better MT Using Parallel Dependency Trees

Yuan Ding

University of Pennsylvania

Page 2:

Outline

- Motivation
- The alignment algorithm
  - Algorithm at a glance
  - The framework
  - Heuristics
- Walking through an example
- Evaluation
- Conclusion

Page 3:

Motivation (1): Statistical MT Approaches

Statistical MT approaches:
- Pioneered by (Brown et al., 1990, 1993)
- Leverage large training corpora
- Outperform traditional transfer-based approaches

Major criticism: no internal representation of syntax/semantics

Page 4:

Motivation (2): Hybrid Approaches

Hybrid approaches:
- (Wu, 1997), (Alshawi et al., 2000), (Yamada and Knight, 2001, 2002), (Gildea, 2003)
- Apply statistical learning to structured data

Problems with hybrid MT approaches:
- Structural divergence (Dorr, 1994)
- Vagaries of loose translations in real corpora

Page 5:

Motivation (3)

Holy grail: syntax-based MT that captures structural divergence

Accomplished work:
- A new approach to the alignment of parallel dependency trees (paper published at MT Summit IX)
- Allows non-isomorphism of dependency trees

Page 6:

We are here…

[Diagram: the MT pipeline. Word alignment using dependency trees (details given in this talk) → collecting dependency treelet pairs and constructing a synchronous dependency grammar → syntax-based MT (ongoing work).]

Page 7:

Outline

- Motivation
- The alignment algorithm
  - Algorithm at a glance
  - The framework
  - Heuristics
- Walking through an example
- Evaluation
- Conclusion

Page 8:

Define the Alignment Problem

- In natural language: find word mappings between the English and foreign sentences
- In math (definition): for each foreign word f_j, find a labeling a_j = e_i, where e_i ∈ {e_1, …, e_l} ∪ {e_0} and e_0 is the empty (null) English word
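For concreteness, a minimal sketch of this labeling, assuming a list-of-indices representation where 0 stands for the null word e_0 (the sentences are the pseudo-translation example used later in the talk):

```python
# English words e_1 .. e_l, with index 0 reserved for the empty word e_0.
english = ["the", "girl", "kissed", "her", "kitty", "cat"]
foreign = ["the", "girl", "gave", "a", "kiss", "to", "her", "cat"]

# a[j] is the (1-based) English index that foreign word f_{j+1} maps to;
# one plausible labeling: "gave" and "kiss" both map to "kissed",
# while "a" and "to" map to the empty word.
a = [1, 2, 3, 0, 3, 0, 4, 6]
```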

Page 9:

The IBM Models

The IBM way:
- Model 1: word order doesn't matter, i.e. a "bag of words" model
- Model 2: condition the probabilities on sentence length and word position
- Models 3, 4, 5:
  A. generate the fertility of each English word
  B. generate the identity of each translated word
  C. generate the position
- Gradually adding positioning information
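The slides don't show training code, but a minimal EM loop for Model 1 might look like the sketch below (the `bitext` format and the NULL token are my assumptions, not from the talk):

```python
from collections import defaultdict

def train_model1(bitext, iterations=5):
    """EM for IBM Model 1. bitext: list of (foreign_words, english_words)."""
    t = defaultdict(lambda: 1.0)  # t[(f, e)] ~ P(f | e), uniform start
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in bitext:
            es = ["NULL"] + es  # e_0: lets a foreign word align to nothing
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # normalizer for this f
                for e in es:                    # E-step: fractional counts
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():         # M-step: re-estimate t
            t[(f, e)] = c / total[e]
    return t
```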

Page 10:

Using Dependency Trees

Positioning information can be acquired from parse trees:
- Parsers: (Collins, 1999), (Bikel, 2002)

Problems with using parse trees directly:
- Two types of nodes
- Unlexicalized non-terminals control the domain

Using dependency trees instead:
- (Fox, 2002): best* phrasal cohesion properties
- (Xia, 2001): constructing dependency trees from parse trees using the Tree Adjoining Grammar
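As a concrete data structure, a dependency tree is just a lexicalized node with a list of dependents. A minimal sketch (the class and the tree rendering of the upcoming example are my own, not from the talk):

```python
from dataclasses import dataclass, field

@dataclass
class DepNode:
    word: str
    children: list = field(default_factory=list)  # dependent DepNodes

# One plausible tree for "The girl kissed her kitty cat", headed by the verb:
tree = DepNode("kissed", [
    DepNode("girl", [DepNode("The")]),
    DepNode("cat", [DepNode("her"), DepNode("kitty")]),
])
```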

Page 11:

The Framework (1)

- Step 1: train IBM Model 1 for lexical mapping probabilities
- Step 2: find and fix high-confidence mappings according to a heuristic function h(f, e)

A pseudo-translation example:
  The girl kissed her kitty cat
  The girl gave a kiss to her cat
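Step 2 amounts to thresholding candidate word pairs by the heuristic score; a hedged sketch (the helper name and threshold value are illustrative, not from the paper):

```python
def fix_high_confidence(candidates, h, t, threshold=0.5):
    # candidates: (f, e) word pairs co-occurring within a treelet pair.
    # Keep only the pairs the heuristic is confident about.
    return [(f, e) for f, e in candidates if h(f, e, t) > threshold]
```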

Page 12:

The Framework (2)

Step 3: Partition the dependency trees on both sides w.r.t. the fixed mappings
- Each fixed mapping creates one new "treelet"
- This creates a new set of parallel dependency structures

[Diagram: the two dependency trees, partitioned at the fixed mappings. English: "kissed" heads "girl" (→ "The") and "cat" (→ "her", "kitty"). Paraphrase: "gave" heads "girl" (→ "The"), "kiss" (→ "a"), "to", and "cat" (→ "her").]
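A sketch of the partitioning step on the DepNode structure above (my own rendering of Step 3; `fixed` is the set of words fixed so far):

```python
def partition(root, fixed):
    """Split a dependency tree into treelets: each fixed word starts a new
    treelet rooted at that word, and the edge into it is cut."""
    treelets = []

    def copy_below(node):
        kept = []
        for child in node.children:
            if child.word in fixed:
                treelets.append(copy_below(child))  # cut edge, new treelet
            else:
                kept.append(copy_below(child))      # stays in this treelet
        return DepNode(node.word, kept)

    treelets.append(copy_below(root))
    return treelets
```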

Page 13:

The Framework (3)

Step 4: Go back to Step 1 unless enough nodes have been fixed

Algorithm properties:
- An iterative algorithm
- Time complexity O(n * T(h)), where T(h) is the time for the heuristic function in Step 2
- P(f | e) in IBM Model 1 has a unique global maximum
- Guaranteed convergence
- Results depend only on the heuristic function h(f, e)
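Putting Steps 1-4 together, the outer loop might look like the sketch below (a hypothetical rendering; `extract_bitext`, `candidate_pairs`, and `repartition` are assumed helpers that read word sequences off the treelet pairs, enumerate co-occurring word pairs, and apply `partition` to both sides of each pair):

```python
def align(treelet_pairs, h, threshold, max_rounds=10):
    # treelet_pairs starts as one pair per sentence: the full trees.
    fixed = set()  # high-confidence word mappings accumulated so far
    for _ in range(max_rounds):
        t = train_model1(extract_bitext(treelet_pairs))      # Step 1
        new = fix_high_confidence(                           # Step 2
            candidate_pairs(treelet_pairs), h, t, threshold)
        if not new:  # "enough nodes fixed": nothing else clears the bar
            break
        fixed.update(new)
        treelet_pairs = repartition(treelet_pairs, new)      # Step 3
    return fixed                                             # Step 4 loops
```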

Page 14:

Heuristics

Heuristic functions for Step 2:
- Objective: estimate the confidence of a mapping between a pair of words
- First heuristic: entropy
  - Intuition: model the shape of the probability distribution
- Second heuristic: inside-outside probability
  - Idea borrowed from PCFG parsing
- Fertility threshold: rule out unlikely fertility ratios (> 2.0)
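The slide gives only the intuition behind the entropy heuristic; one illustrative way to realize it (my own scoring formula, not the paper's exact definition) is to favor pairs where t(f | e) is high and the whole distribution t(· | e) is peaked:

```python
import math

def h_entropy(f, e, t, foreign_vocab):
    # Score a candidate pair: a strong lexical probability t(f | e),
    # discounted when the distribution t(. | e) is flat (high entropy).
    probs = [p for p in (t.get((f2, e), 0.0) for f2 in foreign_vocab) if p > 0]
    z = sum(probs)
    entropy = -sum((p / z) * math.log(p / z) for p in probs)
    return t.get((f, e), 0.0) / (1.0 + entropy)
```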

Page 15:

Outline

- Motivation
- The alignment algorithm
  - Algorithm at a glance
  - The framework
  - Heuristics
- Walking through an example
- Evaluation
- Conclusion

Page 16:

Walking through an Example (1)

[English] I have been here since 1947.
[Chinese] 1947 nian yilai wo yizhi zhu zai zheli.

Iteration 1: One dependency tree pair. Align “I” and “wo”

[Diagram: one dependency tree pair. English: "been" heads "I", "have", "here", and "since" (→ "1947"). Chinese: "zhu" heads "wo", "yizhi", "zai" (→ "zheli"), and "yilai" (→ "nian" → "1947").]

Page 17:

Walking through an Example (2)

Iteration 2: Partition and form two treelet pairs. Align “since” and “yilai”

[Diagram: the same trees partitioned into two treelet pairs: {"I"} / {"wo"}, and the remainders rooted at "been" / "zhu".]

Page 18:

Walking through an Example (3)

Iteration 3: Partition and form three treelet pairs. Align “1947” and “1947”, “here” and “zheli”

[Diagram: the trees partitioned into three treelet pairs ({"I"} / {"wo"}, {"since"} / {"yilai"}, and the remainders rooted at "been" / "zhu"), with "1947" / "1947" and "here" / "zheli" newly aligned.]

Page 19:

Outline

- Motivation
- The alignment algorithm
  - Algorithm at a glance
  - The framework
  - Heuristics
- Walking through an example
- Evaluation
- Conclusion

Page 20:

Evaluation

Training:
- LDC Xinhua newswire Chinese-English parallel corpus
- Filtered roughly 50%; 60K+ sentence pairs used
- The parser generated 53,130 parsed sentence pairs

Evaluation:
- 500 sentence pairs provided by Microsoft Research Asia, word-level aligned by hand

F-score:
- A: set of word pairs aligned by the automatic alignment
- G: set of word pairs aligned in the gold file

  F = 2|A ∩ G| / (|A| + |G|)
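In code, with alignments as sets of (English index, foreign index) pairs (the pair representation is my assumption):

```python
def f_score(A, G):
    # A: word pairs from the automatic alignment; G: gold-standard pairs.
    # F = 2|A & G| / (|A| + |G|), the harmonic mean of precision and recall.
    return 2 * len(A & G) / (len(A) + len(G))
```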

Page 21:

Results (1)

Results for IBM Model 1 through Model 4 (GIZA), bootstrapped from Model 1 to Model 4:
- Signs of overfitting
- Suspected cause: difference between genres in the training and testing data

Itn#   IBM 1    IBM 2    IBM 3    IBM 4
 1     0.0000   0.5128   0.5082   0.5130
 2     0.2464   0.5288   0.5077   0.5245
 3     0.4607   0.5274   0.5106   0.5240
 4     0.4935   0.5275   0.5130   0.5247
 5     0.5039   0.5245   0.5138   0.5236
 6     0.5073   0.5215   0.5149   0.5220
 7     0.5092   0.5191   0.5142   0.5218
 8     0.5099   0.5160   0.5138   0.5212
 9     0.5111   0.5138   0.5138   0.5195
10     0.5121   0.5127   0.5132   0.5195

Page 22:

Results (2)

Results for our algorithm:
- Heuristic h1: entropy
- Heuristic h2: inside-outside probability
- The table shows results after one iteration of our algorithm; M1 Itn# is the number of IBM Model 1 iterations
- The overfitting problem is mainly caused by violation of the partition assumption in fine-grained dependency structures

M1 Itn#   Model h1   Model h2
 1        0.5549     0.5151
 2        0.5590     0.5497
 3        0.5632     0.5515
 4        0.5615     0.5521
 5        0.5615     0.5540
 6        0.5603     0.5543
 7        0.5612     0.5539
 8        0.5604     0.5540
 9        0.5611     0.5542
10        0.5622     0.5535

Page 23:

Outline

- Motivation
- The alignment algorithm
  - Algorithm at a glance
  - The framework
  - Heuristics
- Walking through an example
- Evaluation
- Conclusion

Page 24:

Conclusion

- A model based on partitioning sentences according to their dependency structure
- Avoids the unrealistic isomorphism assumption
- Outperforms the unstructured IBM models on a large data set
- "Orthogonal" to the IBM models: uses syntactic structure but no linear ordering information

Page 25:

Thank You!