Corpus Effects on the Evaluation of Automated Transliteration Systems


Page 1: Corpus Effects on the Evaluation of Automated Transliteration Systems

Corpus Effects on the Evaluation of Automated Transliteration Systems

Sarvnaz Karimi, Andrew Turpin, Falk Scholer

School of Computer Science and Information Technology, RMIT University, Melbourne, Australia

26 June 2007

Outline: Introduction, Corpus, Experiments, Conclusion

Page 2: Corpus Effects on the Evaluation of Automated Transliteration Systems

Transliteration

Machine transliteration: automatically transforming a word written in a source language into a word in a target language.

Example: Prague (source) to پراگ (target)

Evaluation: machine-generated words are compared with human-generated ones. Human judgment is the gold standard.

Page 3: Corpus Effects on the Evaluation of Automated Transliteration Systems

Transliteration is Subjective

How to define correct transliteration?

Prague: پراگ or پرگ?

[Figure: diagram relating Source Word, Automatic Transliterator, Human Transliterator, Target Word and STANDARD, with the example spellings Praha, Prague, Prago, Prag and Praag.]

Page 4: Corpus Effects on the Evaluation of Automated Transliteration Systems

Evaluating Algorithms or Corpus?

When evaluating a transliteration algorithm, can a testing corpus mislead us in our judgments?

[Figure: two diagram labels, "Algorithm + Corpus" and "Algorithm + Corpus Specification".]

Page 5: Corpus Effects on the Evaluation of Automated Transliteration Systems

Experimental Scheme

◮ Transliteration systems: two grapheme-based algorithms previously examined for the English-Persian language pair. We refer to them as system A and system B.

◮ Corpus: we constructed a controlled corpus (language origin, number of transliterators, transliterators' language knowledge).

◮ Evaluation measures: word accuracy and its variants, human agreement, and entropy of transliteration rules (transliterator's consistency).

Page 6: Corpus Effects on the Evaluation of Automated Transliteration Systems

Controlled corpus

We made a corpus with the following specifications:

◮ Three datasets (English, Arabic and Dutch) containing 500 word-pairs each.

◮ Seven transliterators (Persian native speakers).

◮ All of the transliterators knew English and Arabic and had no Dutch knowledge.

◮ All of the transliterators had at least a Bachelor's degree.

◮ The origin of the words was not given to them.

Page 7: Corpus Effects on the Evaluation of Automated Transliteration Systems

Word Accuracy

$$\mathrm{WA} = \frac{\text{number of correct transliterations}}{\text{total number of test words}}$$

If more than one judgment is available we can define:

1. Uniform Word Accuracy (UWA): all the variants suggested by the transliterators are equally valid.

2. Weighted Word Accuracy (WWA): a weight is assigned to each transliteration based on the number of people who suggested that variant.

3. Majority Word Accuracy (MWA): only the one transliteration suggested by the majority of the transliterators is chosen as the correct one.
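The three variants differ only in how the human judgments for a word are turned into a notion of "correct". The following is a minimal Python sketch, not the authors' implementation. It assumes that each system returns a single top-ranked output per word and that the WWA weight is the fraction of transliterators who suggested the variant (the slides do not give the exact weighting). All function names, variable names and the toy data are illustrative.

```python
from collections import Counter


def uniform_word_accuracy(outputs, judgments):
    """UWA: an output is correct if any transliterator suggested it."""
    hits = sum(1 for word, out in outputs.items() if out in judgments[word])
    return hits / len(outputs)


def weighted_word_accuracy(outputs, judgments):
    """WWA: an output earns the fraction of transliterators who suggested it.

    The normalisation by the number of judgments is an assumption.
    """
    total = 0.0
    for word, out in outputs.items():
        counts = Counter(judgments[word])
        total += counts[out] / len(judgments[word])
    return total / len(outputs)


def majority_word_accuracy(outputs, judgments):
    """MWA: only the most frequent human variant counts as correct.

    Ties are broken by first occurrence, which is an arbitrary choice.
    """
    hits = 0
    for word, out in outputs.items():
        majority, _ = Counter(judgments[word]).most_common(1)[0]
        hits += out == majority
    return hits / len(outputs)


# Toy data: placeholder strings stand in for target-language variants,
# with three hypothetical transliterators per source word.
judgments = {"w1": ["a", "a", "b"], "w2": ["c", "d", "d"]}
outputs = {"w1": "a", "w2": "c"}

uwa = uniform_word_accuracy(outputs, judgments)
wwa = weighted_word_accuracy(outputs, judgments)
mwa = majority_word_accuracy(outputs, judgments)
print(uwa, wwa, mwa)
```

On this toy data the same system output is fully correct under UWA but only half correct under MWA, which is the kind of gap between measures that the experiments examine.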

Page 8: Corpus Effects on the Evaluation of Automated Transliteration Systems

Language Origin and the Ranking of the Systems

Corpora with different language origins:

[Figure: word accuracy (%) on the corpora E7, D7, A7 and EDA7; series: UWA (SYS-B), UWA (SYS-A), MWA (SYS-B), MWA (SYS-A).]

Randomly selected EDA sub-corpora:

[Figure: word accuracy (%) against corpus (x-axis 0 to 100); series: UWA (SYS-B), UWA (SYS-A), MWA (SYS-B), MWA (SYS-A).]

The ranking of the systems remains constant, but the accuracy values do not.

Page 10: Corpus Effects on the Evaluation of Automated Transliteration Systems

Accuracy and Single Transliterators

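System A:

[Figure: word accuracy (%) on the corpora E7, D7, A7 and EDA7, one series per transliterator (T1 to T7); labelled values: 17.2 and 39.0.]

System B:

[Figure: word accuracy (%) on the corpora E7, D7, A7 and EDA7, one series per transliterator (T1 to T7); labelled values: 23.2 and 56.2.]

Evaluation can be heavily biased towards the judgments of an individual transliterator.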

Page 12: Corpus Effects on the Evaluation of Automated Transliteration Systems

Accuracy and Number of Transliterators

Transliteration using a combination of transliterators (EDA corpus)

Creating a corpus for training and testing of a transliteration system should be done using more than one transliterator.
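One way to study the effect of the number of transliterators is to score a system against every group of k transliterators and average the result for each k. The sketch below is only an illustration of that idea in Python and is not necessarily the procedure used in these experiments. It assumes uniform word accuracy as the measure and a hypothetical data layout in which `judgments[word][t]` is the variant given by transliterator `t`.

```python
from itertools import combinations
from statistics import mean


def uwa_for_group(outputs, judgments, group):
    """Uniform word accuracy using only the transliterators in `group`."""
    hits = sum(
        1
        for word, out in outputs.items()
        if any(judgments[word][t] == out for t in group)
    )
    return hits / len(outputs)


def mean_uwa_by_group_size(outputs, judgments, transliterators):
    """Average UWA over all groups of k transliterators, for each k."""
    result = {}
    for k in range(1, len(transliterators) + 1):
        scores = [
            uwa_for_group(outputs, judgments, group)
            for group in combinations(transliterators, k)
        ]
        result[k] = mean(scores)
    return result


# Toy data with three hypothetical transliterators and placeholder variants.
transliterators = ["T1", "T2", "T3"]
judgments = {
    "w1": {"T1": "a", "T2": "a", "T3": "b"},
    "w2": {"T1": "c", "T2": "d", "T3": "d"},
}
outputs = {"w1": "a", "w2": "c"}

by_size = mean_uwa_by_group_size(outputs, judgments, transliterators)
print(by_size)
```

With a single transliterator the score depends heavily on who was chosen; averaging over larger groups smooths out that dependence.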

Page 14: Corpus Effects on the Evaluation of Automated Transliteration Systems

Human Agreement

How far do humans themselves agree on transliteration?

Raw agreement adapted to calculate human agreement:

$$P_A = \frac{\text{total number of actual agreements}}{\text{total number of possible agreements}}$$
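As a rough illustration, raw agreement can be computed by counting agreeing pairs of transliterators. The Python sketch below assumes that an "agreement" is a pair of transliterators who produced the same variant for a word, so a word judged by n people contributes n(n-1)/2 possible agreements. The slides do not spell out the counting scheme, so this pairwise reading and all names in the code are assumptions.

```python
from collections import Counter


def raw_agreement(judgments):
    """Ratio of agreeing transliterator pairs to all possible pairs."""
    actual = 0
    possible = 0
    for variants in judgments.values():
        n = len(variants)
        # Every unordered pair of transliterators could agree on this word.
        possible += n * (n - 1) // 2
        # A variant chosen by c transliterators yields c*(c-1)/2 agreeing pairs.
        actual += sum(c * (c - 1) // 2 for c in Counter(variants).values())
    return actual / possible


# Toy data: three hypothetical transliterators, placeholder variants.
judgments = {
    "w1": ["a", "a", "b"],
    "w2": ["c", "d", "d"],
    "w3": ["e", "e", "e"],
}

agreement = raw_agreement(judgments)
print(agreement)
```

Here 5 of the 9 possible pairs agree, so the raw agreement is 5/9.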

Page 15: Corpus Effects on the Evaluation of Automated Transliteration Systems

Inter-Transliterator Agreement and Perceived Difficulty

Transliterators' perception of the task (H: hard, M: medium, E: easy)

Transliterator  English  Dutch  Arabic
1               H        H      M
2               M        M      E
3               M        H      M
4               M        M      E
5               M        H      E
6               M        H      E
7               M        H      M

Measured agreement:

English: 33.6%

Dutch: 15.5%

Arabic: 33.3%

There is a direct relation between the transliterators' knowledge of the source language from which the words come and their agreement.

Page 17: Corpus Effects on the Evaluation of Automated Transliteration Systems

Transliterator Consistency

A transliterator's habit of transliteration defines the rules of transforming words.

Rules:

C → ([Persian target 1], 0.6)

C → ([Persian target 2], 0.3)

C → ([Persian target 3], 0.1)

[Figure: two panels over the corpora E7, D7, A7 and EDA7, one showing entropy (axis 0.0 to 0.6) and one showing word accuracy (%); legend: T1 to T7.]

The consistency with which transliterators employ their own rules has a direct effect on the system's accuracy.
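To make the link between rule probabilities and consistency concrete, the Python sketch below computes the Shannon entropy of the targets one transliterator uses for a single source segment. It assumes entropy in bits with no normalisation; the slides do not state the base or how entropy is aggregated over rules, so the absolute values need not match the plotted scale. The function name and the placeholder target names are illustrative.

```python
import math


def rule_entropy(target_counts):
    """Shannon entropy (bits) of the targets chosen for one source segment."""
    total = sum(target_counts.values())
    entropy = 0.0
    for count in target_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Counts matching the rule probabilities 0.6, 0.3 and 0.1 for the segment "C";
# "t1", "t2" and "t3" are placeholders for the three Persian targets.
inconsistent = rule_entropy({"t1": 6, "t2": 3, "t3": 1})

# A transliterator who always maps "C" to the same target has zero entropy.
consistent = rule_entropy({"t1": 10})

print(inconsistent, consistent)
```

Lower entropy means the transliterator applies the same rule more often, which is the notion of consistency used here.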

Page 19: Corpus Effects on the Evaluation of Automated Transliteration Systems

Conclusions

Main findings of our experiments:

1. Although different transliteration systems may have different accuracy levels on different corpora, their ranking holds across these corpora.

2. One transliteration system can achieve different accuracy on corpora constructed by different transliterators. The variation can be up to 30% in terms of word accuracy.

3. The origin of source words has a direct effect on system performance. English-origin words are generally transliterated more accurately than Arabic-origin and Dutch-origin words.

Page 20: Corpus Effects on the Evaluation of Automated Transliteration Systems

Suggestions

◮ We conclude that, when making a collection for transliteration, we should construct it with the assistance of multiple transliterators (> 4), or make sure that the transliterations come from different sources that are more likely to reflect different people's knowledge.

◮ When we report our results we should report:

1. The origin of source words.

2. The number of transliterators who constructed the corpus, or the exact process of corpus construction.

Corpus specifications are as important as algorithms, and must be stated clearly in our experiments.

Page 21: Corpus Effects on the Evaluation of Automated Transliteration Systems

Thank You!