
Translation Memory Retrieval Methods [Bloodgood and Strauss, 2014], in Proc. of the 14th EACL

Koichi Akabe and Philip Arthur

NAIST MT Study

2014-07-03


Introduction

Translation Memory (TM)

▶ Most widely used computer-assisted translation (CAT) tool

▶ Suggest translations using other translations

En The dog opened the door.

Ja 犬がドアを開けた。

En I saw a girl with a telescope.

Ja 僕は望遠鏡で少女を見た。

En John opened the door.

1. Find the nearest source sentence

2. Suggest a translation: Ja 犬がドアを開けた。 (fuzzy)

3. Post-editing: Ja ジョンがドアを開けた。

How to find the nearest source sentence?

TM finds the nearest source sentence using similarity metrics

▶ Edit distance (Levenshtein distance) −→ Widely used metric

▶ MT evaluation metrics [Simard and Fujita, 2012] −→ WER, BLEU, NIST, VMeteor, Meteor as TM metrics

▶ This paper

Threshold of helpfulness

The matching algorithm always returns the nearest sentence. However, low-scoring suggestions should not be shown.

TM software sets the threshold at 70% in practice −→ Why?
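
The retrieve-then-threshold behaviour described above can be sketched as follows. This is only an illustration: the function names `overlap` and `retrieve`, the toy memory, and the use of a simple unigram-overlap score as the similarity function are assumptions made for the example; only the 70% cut-off comes from the slide.

```python
from typing import Callable, List, Optional, Tuple

Sentence = List[str]


def overlap(m: Sentence, c: Sentence) -> float:
    """Toy similarity (assumed): share of distinct words of m that also occur in c."""
    m_words, c_words = set(m), set(c)
    return len(m_words & c_words) / len(m_words) if m_words else 0.0


def retrieve(
    m: Sentence,
    memory: List[Tuple[Sentence, str]],
    similarity: Callable[[Sentence, Sentence], float] = overlap,
    threshold: float = 0.70,
) -> Optional[Tuple[str, float]]:
    """Return (translation, score) of the nearest entry, or None if it scores below the threshold."""
    if not memory:
        return None
    # The nearest entry is always found, even when it is a poor match.
    source, target = max(memory, key=lambda entry: similarity(m, entry[0]))
    score = similarity(m, source)
    # Low-scoring suggestions are suppressed rather than shown.
    return (target, score) if score >= threshold else None


memory = [
    ("The dog opened the door .".split(), "犬がドアを開けた。"),
    ("I saw a girl with a telescope .".split(), "僕は望遠鏡で少女を見た。"),
]
suggestion = retrieve("John opened the door .".split(), memory)
no_suggestion = retrieve("Please restart the server .".split(), memory)
print(suggestion)
print(no_suggestion)
```

With this toy score the first query reaches 0.8 and is shown as a fuzzy match, while the second query stays below the threshold and produces no suggestion.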

Translation Memory Similarity Metrics

Definitions

TM similarity metrics compare M and C.

M: workload sentence

C: source-language side of a candidate pre-existing translation

En The dog opened the door .

En I saw a girl with a telescope .

En John opened the door .

M = John opened the door .

C1 = The dog opened the door .

C2 = I saw a girl with a telescope .

...


Translation Memory Similarity Metrics

Compare the following metrics:

▶ Percent Match

▶ Weighted Percent Match

▶ Edit Distance

▶ N-gram Precision

▶ Weighted N-gram Precision

▶ Modified Weighted N-gram Precision

Percent Match (PM)

The simplest metric

\[ \mathrm{PM}(M, C) = \frac{|M_{\text{unigrams}} \cap C_{\text{unigrams}}|}{|M_{\text{unigrams}}|} \]

M = John opened the door .

C = The dog opened the door .

\[ \mathrm{PM}(M, C) = \frac{4}{5} = 0.80 \]
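
The worked example above can be reproduced with a few lines of Python. This is a minimal sketch, assuming whitespace tokenization, case-sensitive matching, and unigrams treated as sets; the function name `percent_match` is hypothetical.

```python
from typing import List


def percent_match(m: List[str], c: List[str]) -> float:
    """PM(M, C): fraction of M's distinct unigrams that also appear in C."""
    m_unigrams, c_unigrams = set(m), set(c)
    if not m_unigrams:
        return 0.0
    return len(m_unigrams & c_unigrams) / len(m_unigrams)


m = "John opened the door .".split()
c = "The dog opened the door .".split()
pm_score = percent_match(m, c)
print(f"{pm_score:.2f}")  # 0.80: "opened", "the", "door", "." match; "John" does not
```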


Weighted Percent Match (WPM)

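We want to know the translations of rare words

PM with IDF weighting

\[ \mathrm{WPM}(M, C) = \frac{\sum_{u \in M_{\text{unigrams}} \cap C_{\text{unigrams}}} \mathrm{idf}(u, D)}{\sum_{u \in M_{\text{unigrams}}} \mathrm{idf}(u, D)} \]

where D is the set of all source sentences in the parallel corpus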
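
A small sketch of WPM follows. The slide does not spell out idf, so the standard definition idf(u, D) = log(|D| / df(u)) is assumed here, and words unseen in D are given the weight log |D| (as if they occurred in one sentence). The two-sentence corpus and all function names are illustrative only.

```python
import math
from typing import Dict, List


def build_idf(corpus: List[List[str]]) -> Dict[str, float]:
    """Assumed definition: idf(u, D) = log(|D| / number of sentences containing u)."""
    document_frequency: Dict[str, int] = {}
    for sentence in corpus:
        for word in set(sentence):
            document_frequency[word] = document_frequency.get(word, 0) + 1
    return {w: math.log(len(corpus) / df) for w, df in document_frequency.items()}


def weighted_percent_match(
    m: List[str], c: List[str], idf: Dict[str, float], unseen_idf: float
) -> float:
    """WPM(M, C): idf mass of the shared unigrams divided by the idf mass of M."""
    m_unigrams, c_unigrams = set(m), set(c)
    total = sum(idf.get(u, unseen_idf) for u in m_unigrams)
    if total == 0.0:
        return 0.0
    matched = sum(idf.get(u, unseen_idf) for u in m_unigrams & c_unigrams)
    return matched / total


corpus = [
    "The dog opened the door .".split(),
    "I saw a girl with a telescope .".split(),
]
idf = build_idf(corpus)
unseen = math.log(len(corpus))  # assumption: treat unseen words as occurring once
m = "John opened the door .".split()
wpm_score = weighted_percent_match(m, corpus[0], idf, unseen)
print(round(wpm_score, 2))  # 0.75
```

In this toy corpus the period occurs in every sentence, so its idf is 0 and it no longer contributes to the score, which is why the value drops from the 0.80 of PM to 0.75.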


Problem of PM and WPM

PM and WPM only consider the coverage of words

−→ They cannot see any context

The next slides show methods that take context into account

Edit Distance (ED)

Widely used metric

\[ \mathrm{ED}(M, C) = \max\left(1 - \frac{\text{edit-dist}(M, C)}{|M_{\text{unigrams}}|},\ 0\right) \]

where edit-dist(M, C) is the number of word insertions, deletions, and substitutions required to transform M into C

M = John opened the door .

C = The dog opened the door .

substitution: 1, insertion: 1

\[ \mathrm{ED}(M, C) = 1 - \frac{2}{5} = 0.60 \]
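
The ED score can be sketched with the standard word-level Levenshtein dynamic programme. The function names are hypothetical, and the number of tokens in M is used for |M unigrams|, which is an assumption (it coincides with the number of distinct words in the example).

```python
from typing import List


def edit_distance(m: List[str], c: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(c) + 1))  # distance from the empty prefix of m
    for i, m_word in enumerate(m, start=1):
        current = [i]
        for j, c_word in enumerate(c, start=1):
            substitution = previous[j - 1] + (m_word != c_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def ed_similarity(m: List[str], c: List[str]) -> float:
    """ED(M, C) = max(1 - edit-dist(M, C) / |M|, 0)."""
    if not m:
        return 0.0
    return max(1.0 - edit_distance(m, c) / len(m), 0.0)


m = "John opened the door .".split()
c = "The dog opened the door .".split()
distance = edit_distance(m, c)
ed_score = ed_similarity(m, c)
print(distance, f"{ed_score:.2f}")  # 2 0.60
```

The clamp at 0 matters when C is much longer than M: the edit distance can then exceed the length of M, and the raw value would be negative.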


N-gram Precision (NGP)

Mean of N-gram precisions (like the BLEU metric)

However, BLEU → 0 when the precision of longer N-grams is 0

This work uses the arithmetic mean instead of the geometric mean

\[ \mathrm{NGP} = \frac{1}{N} \sum_{n=1}^{N} p_n \]

\[ p_n = \frac{|M_{n\text{-grams}} \cap C_{n\text{-grams}}|}{Z \cdot |M_{n\text{-grams}}| + (1 - Z) \cdot |C_{n\text{-grams}}|} \]

where Z is a parameter that controls normalization and N is the maximum N-gram length

N = 4 and Z = 0.75 in the main experiments (discussed later)
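
A minimal sketch of NGP with the slide's settings N = 4 and Z = 0.75. It assumes n-grams are treated as sets and that an order for which neither sentence has any n-gram contributes 0; the function names are hypothetical.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Set of distinct n-grams of order n (empty if the sentence is too short)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngp(m: List[str], c: List[str], max_n: int = 4, z: float = 0.75) -> float:
    """Arithmetic mean over n = 1..max_n of the Z-normalized n-gram precision p_n."""
    total = 0.0
    for n in range(1, max_n + 1):
        m_n, c_n = ngrams(m, n), ngrams(c, n)
        denominator = z * len(m_n) + (1.0 - z) * len(c_n)
        if denominator > 0.0:
            total += len(m_n & c_n) / denominator
    return total / max_n


m = "John opened the door .".split()
c = "The dog opened the door .".split()
ngp_score = ngp(m, c)
print(f"{ngp_score:.3f}")
```

Because the mean is arithmetic, a sentence pair with no matching 4-grams still receives credit for its shorter matches, unlike a geometric mean, which would collapse to 0.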


Weighted N-gram Precision (WNGP)

NGP with IDF weighting

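\[ \mathrm{WNGP} = \frac{1}{N} \sum_{n=1}^{N} \frac{\sum_{i \in M_{n\text{-grams}} \cap C_{n\text{-grams}}} w(i)}{Z \cdot \sum_{i \in M_{n\text{-grams}}} w(i) + (1 - Z) \cdot \sum_{i \in C_{n\text{-grams}}} w(i)} \]

\[ w(i) = \sum_{\text{1-gram} \in i} \mathrm{idf}(\text{1-gram}, D) \]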
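
WNGP can be sketched by replacing the n-gram counts of NGP with sums of n-gram weights w(i). As assumptions for this example: idf is the standard log(|D| / df), unseen words get the weight log |D|, the result is averaged over the N orders as in NGP, and the two-sentence corpus is a toy. All function names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> Set[NGram]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_idf(corpus: List[List[str]]) -> Dict[str, float]:
    """Assumed definition: idf(u, D) = log(|D| / number of sentences containing u)."""
    df: Dict[str, int] = {}
    for sentence in corpus:
        for word in set(sentence):
            df[word] = df.get(word, 0) + 1
    return {word: math.log(len(corpus) / count) for word, count in df.items()}


def wngp(m: List[str], c: List[str], idf: Dict[str, float], unseen_idf: float,
         max_n: int = 4, z: float = 0.75) -> float:
    def weight(ngram: NGram) -> float:
        # w(i): sum of the idf values of the words inside the n-gram
        return sum(idf.get(word, unseen_idf) for word in ngram)

    total = 0.0
    for n in range(1, max_n + 1):
        m_n, c_n = ngrams(m, n), ngrams(c, n)
        matched = sum(weight(i) for i in m_n & c_n)
        denominator = (z * sum(weight(i) for i in m_n)
                       + (1.0 - z) * sum(weight(i) for i in c_n))
        if denominator > 0.0:
            total += matched / denominator
    return total / max_n


corpus = [
    "The dog opened the door .".split(),
    "I saw a girl with a telescope .".split(),
]
idf = build_idf(corpus)
unseen = math.log(len(corpus))  # assumption for words not found in D
m = "John opened the door .".split()
wngp_score = wngp(m, corpus[0], idf, unseen)
print(f"{wngp_score:.3f}")
```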


Modified Weighted N-gram Precision (MWNGP)

Shorter N-grams may help translators more than longer N-grams

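\[ \mathrm{WNGP} = \frac{1}{N} \sum_{n=1}^{N} wp_n \]

\[ \mathrm{MWNGP} = \frac{2^N}{2^N - 1} \sum_{n=1}^{N} \frac{1}{2^n}\, wp_n \]

where wp_n is the weighted precision for N-grams of length n (the summand of WNGP)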
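
The difference between the two ways of combining per-order precisions can be illustrated numerically. The sketch below assumes per-order weights of 1/2^n, which is what the normalizer 2^N / (2^N − 1) implies, and uses made-up precision values; the function names are hypothetical.

```python
from typing import List


def combine_uniform(precisions: List[float]) -> float:
    """WNGP-style combination: every n-gram order counts equally."""
    return sum(precisions) / len(precisions)


def combine_modified(precisions: List[float]) -> float:
    """MWNGP-style combination: order n is weighted by 1 / 2**n, then renormalized."""
    max_n = len(precisions)
    normalizer = 2 ** max_n / (2 ** max_n - 1)
    return normalizer * sum(p / 2 ** n for n, p in enumerate(precisions, start=1))


# Hypothetical per-order precisions for n = 1..4 (not taken from the paper).
precisions = [0.8, 0.6, 0.4, 0.2]
uniform_score = combine_uniform(precisions)
modified_score = combine_modified(precisions)
print(f"{uniform_score:.3f} {modified_score:.3f}")  # 0.500 0.653
```

The normalizer keeps the score in [0, 1]: when every order has precision 1, the weights 1/2 + 1/4 + ... + 1/2^N sum to (2^N − 1) / 2^N, and the product is exactly 1.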


Experiment

Two different technical domains with two different language pairs (Fr-En, Zn-En).

▶ Zn-En: OpenOffice3

▶ Fr-En: EMEA

Preprocessing is performed on both source sides to produce valid segments.

Sentences are randomly sampled from each corpus as M and C.

▶ Zn-En: 400 M and 10,000 C.

▶ Fr-En: 300 M and 10,000 C.


Evaluation

Evaluation is performed by human judges using Amazon Mechanical Turk.

Scores range from 1 (Not Helpful) to 5 (Extremely Helpful).

Each segment M is rated by 5 Turkers, and we keep track of which metric performs best (ties are allowed).

The scores for each M are averaged into a Mean Opinion Score (MOS).
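
The bookkeeping described above can be sketched as follows. The ratings are invented for illustration, and the sketch assumes that each metric's suggestion for a given M receives its own five ratings; the function names are hypothetical.

```python
from typing import Dict, List


def mean_opinion_scores(ratings: Dict[str, List[int]]) -> Dict[str, float]:
    """Average the 1-5 ratings given to each metric's suggestion."""
    return {metric: sum(scores) / len(scores) for metric, scores in ratings.items()}


def best_metrics(mos: Dict[str, float]) -> List[str]:
    """All metrics sharing the top MOS count as best, so ties are allowed."""
    top = max(mos.values())
    return sorted(metric for metric, score in mos.items() if score == top)


# Hypothetical ratings from 5 Turkers for one workload sentence M.
ratings_for_one_m = {
    "PM": [3, 3, 4, 2, 3],
    "ED": [4, 4, 3, 4, 5],
    "MWNGP": [5, 4, 4, 3, 4],
}
mos = mean_opinion_scores(ratings_for_one_m)
winners = best_metrics(mos)
print(mos)
print(winners)  # ['ED', 'MWNGP']
```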


Result and Analysis

Result: Which metric performs best?

Table OO3 Zn-En

Metric   Found Best   Total
PM       178          400
WPM      200          400
ED       193          400
NGP      251          400
WNGP     271          400
MWNGP    282          400

Table EMEA Fr-En

Metric   Found Best   Total
PM       166          300
WPM      184          300
ED       148          300
NGP      188          300
WNGP     198          300
MWNGP    201          300

Modified Weighted N-Gram Precision (MWNGP) achieved the best result of all the metrics.

There is only a slight difference between WNGP and MWNGP.
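
To make the counts easier to compare across the two corpora, they can be converted into the share of workload sentences for which each metric was found best. The snippet only restates the numbers from the two tables; the variable names are illustrative.

```python
# (total workload sentences, {metric: times found best}) per corpus, from the tables.
found_best = {
    "OO3 Zn-En": (400, {"PM": 178, "WPM": 200, "ED": 193,
                        "NGP": 251, "WNGP": 271, "MWNGP": 282}),
    "EMEA Fr-En": (300, {"PM": 166, "WPM": 184, "ED": 148,
                         "NGP": 188, "WNGP": 198, "MWNGP": 201}),
}

shares = {
    corpus: {metric: count / total for metric, count in counts.items()}
    for corpus, (total, counts) in found_best.items()
}

for corpus, by_metric in shares.items():
    for metric, share in by_metric.items():
        print(f"{corpus:11s} {metric:6s} {share:.1%}")
```

Because ties are allowed, the shares within one corpus add up to more than 100%.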


[Figures: three scatterplots for OO3, one each for Percent Match, Edit Distance, and Modified N-Gram Precision; one axis runs from 1.0 to 5.0.]

The effect of Z: Adjusting for length preferences

Many of the metrics use Z as a parameter.

The Z parameter can be used to control length preferences.

Table EMEA Fr-En

Z Value   Avg Length
0.00      9.9298
0.25      13.204
0.50      16.0134
0.75      19.6355
1.00      27.8829

Table OO3 Zn-En

A smaller Z prefers shorter matches, which are more precise (higher precision).

A larger Z prefers longer matches, which contain more correct translations (higher recall).
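
One way to see this is to write out the two extremes of the N-gram precision term p_n defined earlier. This is a short derivation from that formula, not a result reported on the slides; M_n and C_n abbreviate the n-gram sets of M and C.

```latex
\begin{align*}
p_n &= \frac{|M_n \cap C_n|}{Z \cdot |M_n| + (1 - Z) \cdot |C_n|} \\[4pt]
Z = 0:\quad p_n &= \frac{|M_n \cap C_n|}{|C_n|}
  && \text{(share of the candidate's n-grams found in } M\text{: precision)} \\[4pt]
Z = 1:\quad p_n &= \frac{|M_n \cap C_n|}{|M_n|}
  && \text{(share of } M\text{'s n-grams covered by the candidate: recall)}
\end{align*}
```

With Z = 0 every extra word in the candidate lowers the score, so short candidates win; with Z = 1 extra material in the candidate costs nothing, so long candidates that cover more of M win.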


Conclusion

▶ This paper compares TM similarity metrics.

▶ The best method is Modified Weighted N-Gram Precision.

▶ All the discussed metrics consider only the source side in the calculation.

▶ The Z parameter is used to adjust the length preference of the retrieved TM matches.


Thank you for your attention!
