Corpus Effects on the Evaluation of Automated Transliteration Systems
Sarvnaz Karimi, Andrew Turpin, Falk Scholer
Introduction
Corpus
Experiments
Conclusion
Corpus Effects on the Evaluation of Automated Transliteration Systems
Sarvnaz Karimi, Andrew Turpin, Falk Scholer
School of Computer Science and Information Technology, RMIT University, Melbourne, Australia
26 June 2007
Transliteration
Machine transliteration: automatically transforming a word written in a source language into a word in a target language.
Example: Prague (source) to پراگ (target)
Evaluation: machine-generated words are compared with human-generated ones; human judgment serves as the gold standard.
Transliteration is Subjective
How do we define a correct transliteration?
Prague: پراگ or پرگ?
[Figure: a Source Word is transliterated by an Automatic Transliterator and by a Human Transliterator, each producing a Target Word, with a STANDARD marked "???". Example forms shown: Praha, Prague, Prago, Prag, Praag.]
Evaluating Algorithms or Corpus?
When evaluating a transliteration algorithm, can a test corpus mislead us in our judgments?
[Figure: Algorithm + Corpus vs. Algorithm + Corpus Specification]
Experimental Scheme
◮ Transliteration systems: two grapheme-based algorithms previously examined for the English-Persian language pair. We refer to them as system A and system B.
◮ Corpus: we constructed a controlled corpus (controlling language origin, number of transliterators, and the transliterators' language knowledge).
◮ Evaluation measures: word accuracy and its variants, human agreement, and the entropy of transliteration rules (transliterator consistency).
Controlled corpus
We made a corpus with the following specifications:
◮ Three datasets (English-, Arabic- and Dutch-origin words) containing 500 word pairs each.
◮ Seven transliterators (Persian native speakers).
◮ All of the transliterators knew English and Arabic and had no knowledge of Dutch.
◮ All of the transliterators had at least a Bachelor's degree.
◮ The origin of the words was not given to them.
Word Accuracy
$$\mathrm{WA} = \frac{\text{number of correct transliterations}}{\text{total number of test words}}$$
If more than one judgment is available for each word, we can define:
1. Uniform Word Accuracy (UWA): all the variants suggested by the transliterators are equally valid.
2. Weighted Word Accuracy (WWA): each variant is weighted by the number of transliterators who suggested it.
3. Majority Word Accuracy (MWA): only the transliteration suggested by the majority of the transliterators is accepted as correct.
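The three variants differ only in how a system output is credited against the pool of human answers. Below is a minimal Python sketch, assuming each source word comes with one answer per transliterator, that the WWA weight is the fraction of transliterators who proposed the variant, and that MWA ties go to the variant seen first (the slide fixes neither detail). The words and romanized targets are hypothetical stand-ins for Persian strings.

```python
from collections import Counter

# Hypothetical toy data: romanized stand-ins for Persian targets.
# judgments[word] lists one transliteration per human transliterator.
judgments = {
    "Prague": ["prag", "prag", "prag", "perag"],
    "Paris": ["paris", "paris", "paris", "pari"],
    "Berlin": ["berlin", "berlin", "berlin", "berlin"],
}
# outputs[word] is the single transliteration produced by the system.
outputs = {"Prague": "perag", "Paris": "paris", "Berlin": "berlin"}


def uniform_word_accuracy(outputs, judgments):
    """UWA: an output is correct if any transliterator suggested it."""
    correct = sum(outputs[w] in set(judgments[w]) for w in outputs)
    return correct / len(outputs)


def weighted_word_accuracy(outputs, judgments):
    """WWA (assumed weighting): credit = share of transliterators who suggested the output."""
    score = sum(judgments[w].count(outputs[w]) / len(judgments[w]) for w in outputs)
    return score / len(outputs)


def majority_word_accuracy(outputs, judgments):
    """MWA: only the most frequent variant counts; ties go to the variant seen first."""
    correct = 0
    for w in outputs:
        majority, _ = Counter(judgments[w]).most_common(1)[0]
        correct += outputs[w] == majority
    return correct / len(outputs)


print(uniform_word_accuracy(outputs, judgments))   # 1.0
print(weighted_word_accuracy(outputs, judgments))  # 0.6666666666666666
print(majority_word_accuracy(outputs, judgments))  # 0.6666666666666666
```

With only one transliterator per word, all three functions reduce to plain WA.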
Language Origin and the Ranking of the Systems
Corpora with different language origins:
[Figure: word accuracy (%) of system A and system B under UWA and MWA on the E7, D7, A7 and EDA7 corpora.]
Randomly selected EDA sub-corpora:
[Figure: word accuracy (%) of system A and system B under UWA and MWA across randomly selected EDA sub-corpora (x-axis: sub-corpus, 0 to 100).]
The ranking of the systems remains constant, but the accuracy values do not.
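The sub-corpus panel re-scores both systems on random subsets of the EDA corpus. A minimal sketch of that kind of check, assuming UWA as the measure: draw random sub-corpora and count how often system B scores at least as high as system A. The sub-corpus size, number of draws, seed and toy data are hypothetical, not the authors' settings.

```python
import random


def uwa(outputs, judgments, words):
    """Uniform word accuracy restricted to the given subset of source words."""
    return sum(outputs[w] in set(judgments[w]) for w in words) / len(words)


def ranking_stability(outputs_a, outputs_b, judgments, size, trials, seed=0):
    """Fraction of random sub-corpora on which system B scores at least as high as A.

    Each sub-corpus is drawn without replacement from the full word list;
    `size`, `trials` and `seed` are hypothetical parameters.
    """
    rng = random.Random(seed)
    words = sorted(judgments)
    b_not_worse = 0
    for _ in range(trials):
        sample = rng.sample(words, size)
        if uwa(outputs_b, judgments, sample) >= uwa(outputs_a, judgments, sample):
            b_not_worse += 1
    return b_not_worse / trials


# Hypothetical toy corpus: 20 words; system B always matches a judgment,
# system A only matches on even-numbered words.
judgments = {f"w{i:02d}": [f"t{i}", f"u{i}"] for i in range(20)}
outputs_b = {w: v[0] for w, v in judgments.items()}
outputs_a = {w: (v[1] if i % 2 == 0 else "x") for i, (w, v) in enumerate(judgments.items())}

print(ranking_stability(outputs_a, outputs_b, judgments, size=10, trials=100))  # 1.0
```

A value close to 1.0 means the ordering of the two systems rarely flips, even though the absolute UWA values move from one sub-corpus to the next.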
Accuracy and Single Transliterators
System A:
[Figure: word accuracy (%) of system A on the E7, D7, A7 and EDA7 corpora, evaluated against each transliterator T1 to T7 separately; annotated values: 17.2 and 39.0.]
System B:
[Figure: word accuracy (%) of system B on the E7, D7, A7 and EDA7 corpora, evaluated against each transliterator T1 to T7 separately; annotated values: 23.2 and 56.2.]
Evaluation can be heavily biased by whose judgments are used as the gold standard.
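To see how much a single gold standard can move the score, one can evaluate the same system once per transliterator and look at the spread. This is a minimal sketch assuming judgments[word][t] holds transliterator t's answer in a fixed transliterator order; the data and romanized targets are hypothetical.

```python
def word_accuracy(outputs, reference):
    """Plain WA against a single reference transliteration per word."""
    return sum(outputs[w] == reference[w] for w in outputs) / len(outputs)


def per_transliterator_accuracy(outputs, judgments):
    """WA of one system when each transliterator alone is the gold standard.

    Assumes judgments[word][t] is transliterator t's answer, with the same
    transliterator order for every word.
    """
    n = len(next(iter(judgments.values())))
    return [
        word_accuracy(outputs, {w: judgments[w][t] for w in outputs})
        for t in range(n)
    ]


# Hypothetical toy data (romanized stand-ins for Persian targets).
judgments = {
    "Prague": ["prag", "prag", "prag", "perag"],
    "Paris": ["paris", "paris", "paris", "pari"],
    "Berlin": ["berlin", "berlin", "berlin", "berlin"],
}
outputs = {"Prague": "prag", "Paris": "paris", "Berlin": "berlin"}

scores = per_transliterator_accuracy(outputs, judgments)
print(scores)                     # [1.0, 1.0, 1.0, 0.3333333333333333]
print(max(scores) - min(scores))  # spread caused only by the choice of gold standard
```

The system and its outputs are identical in every run; only the reference changes, which is the effect the slide is pointing at.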
Accuracy and Number of Transliterators
Transliteration using a combination of transliterators (EDA corpus)
A corpus for training and testing a transliteration system should be created by more than one transliterator.
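One way to combine transliterators is to accept an output if any of k chosen transliterators produced it (UWA) and average over every choice of k people. The sketch below does that with itertools.combinations; the averaging over all subsets is an assumption, since the slide does not say how the combinations were selected, and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def mean_uwa_for_k(outputs, judgments, k):
    """Average UWA over every subset of k transliterators.

    A system output counts as correct if any transliterator in the subset
    produced it. Assumes judgments[word][t] is transliterator t's answer.
    """
    n = len(next(iter(judgments.values())))
    scores = []
    for subset in combinations(range(n), k):
        correct = sum(outputs[w] in {judgments[w][t] for t in subset} for w in outputs)
        scores.append(correct / len(outputs))
    return mean(scores)


# Hypothetical toy data (romanized stand-ins for Persian targets).
judgments = {
    "Prague": ["prag", "prag", "prag", "perag"],
    "Paris": ["paris", "paris", "paris", "pari"],
    "Berlin": ["berlin", "berlin", "berlin", "berlin"],
}
outputs = {"Prague": "perag", "Paris": "pari", "Berlin": "berlin"}

for k in range(1, 5):
    print(k, round(mean_uwa_for_k(outputs, judgments, k), 3))
# 1 0.5, 2 0.667, 3 0.833, 4 1.0
```

Because UWA only adds acceptable variants as transliterators are added, the average can never fall as k grows; a single transliterator gives the most pessimistic and most variable picture.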
Human Agreement
How far do humans themselves agree on transliteration?
Raw agreement, adapted to measure agreement among transliterators:
$$\mathrm{PA} = \frac{\text{total number of actual agreements}}{\text{total number of possible agreements}}$$
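The slide does not spell out what counts as a "possible agreement". A common reading of raw agreement for many annotators is pairwise: every pair of transliterators is one possible agreement per word, and a pair that wrote the same string is an actual agreement. A minimal sketch under that assumption, pooled over the corpus, with hypothetical data:

```python
from collections import Counter
from math import comb


def pairwise_agreement(judgments):
    """Raw agreement over all transliterator pairs, pooled over the corpus.

    Possible agreements per word: every pair of transliterators, C(n, 2).
    Actual agreements per word: pairs that produced the same transliteration.
    """
    actual = 0
    possible = 0
    for variants in judgments.values():
        possible += comb(len(variants), 2)
        actual += sum(comb(c, 2) for c in Counter(variants).values())
    return actual / possible


# Hypothetical toy data (romanized stand-ins for Persian targets).
judgments = {
    "Prague": ["prag", "prag", "prag", "perag"],
    "Paris": ["paris", "paris", "paris", "pari"],
    "Berlin": ["berlin", "berlin", "berlin", "berlin"],
}
print(pairwise_agreement(judgments))  # 0.6666666666666666 (12 of 18 pairs agree)
```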
Inter-Transliterator Agreement and Perceived Difficulty
Transliterators' perception of the task (H: hard, M: medium, E: easy):
Transliterator  English  Dutch  Arabic
1               H        H      M
2               M        M      E
3               M        H      M
4               M        M      E
5               M        H      E
6               M        H      E
7               M        H      M
Measured agreement:
English: 33.6%
Dutch: 15.5%
Arabic: 33.3%
There is a direct relation between the transliterators' knowledge of the source language of the words and their agreement.
Transliterator Consistency
A transliterator's transliteration habits define the rules for transforming words.
Rules: C→ (�, 0.6)
C→ (¸, 0.3)
C→ (úæ�, 0.1)
[Figure: left, entropy of transliteration rules (0.0 to 0.6) on the E7, D7, A7 and EDA7 corpora; right, word accuracy (%) on the same corpora; one series per transliterator, T1 to T7.]
The consistency with which transliterators employ their own rules has a direct effect on the system's accuracy.
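Each rule set like the one for C above is a probability distribution over target graphemes, so its Shannon entropy measures how predictable the transliterator is. The slide does not say how per-rule entropies are combined into one value per transliterator; the sketch below assumes a frequency-weighted average (the conditional entropy of target given source grapheme) and takes already-aligned grapheme pairs as input, with a hypothetical alignment step not shown.

```python
from collections import Counter, defaultdict
from math import log2


def rule_entropy(probabilities):
    """Shannon entropy (bits) of one source grapheme's target distribution."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


def transliterator_entropy(aligned_pairs):
    """Frequency-weighted average rule entropy for one transliterator.

    aligned_pairs: (source_grapheme, target_grapheme) pairs, assumed to come
    from some prior alignment step (hypothetical; not shown here).
    """
    by_source = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        by_source[src][tgt] += 1
    total = len(aligned_pairs)
    entropy = 0.0
    for counts in by_source.values():
        n = sum(counts.values())
        entropy += (n / total) * rule_entropy([c / n for c in counts.values()])
    return entropy


# The slide's example rule set for C: probabilities 0.6, 0.3 and 0.1.
print(round(rule_entropy([0.6, 0.3, 0.1]), 3))  # 1.295

# Hypothetical romanized targets: a fully consistent transliterator versus
# one who renders "c" two different ways.
consistent = [("c", "k"), ("c", "k"), ("a", "a"), ("a", "a")]
inconsistent = [("c", "k"), ("c", "s"), ("a", "a"), ("a", "a")]
print(transliterator_entropy(consistent))    # 0.0
print(transliterator_entropy(inconsistent))  # 0.5
```

Lower entropy means the transliterator applies the same rule each time, which gives a learned model a more consistent target to reproduce.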
Conclusions
Main findings of our experiments:
1. Although different transliteration systems may have different accuracy levels on different corpora, their ranking holds across these corpora.
2. One transliteration system can achieve different accuracy on corpora constructed by different transliterators. The variation can be up to 30% in terms of word accuracy.
3. The origin of source words has a direct effect on system performance. English-origin words are generally transliterated more accurately than Arabic- and Dutch-origin words.
Suggestions
◮ We conclude that a transliteration collection should be constructed with the assistance of multiple transliterators (more than four), or that its transliterations should come from different sources that are more likely to reflect different people's knowledge.
◮ When we report our results, we should state:
1. The origin of the source words.
2. The number of transliterators who constructed the corpus, or the exact process of corpus construction.
Corpus specifications are as important as algorithms, and must be stated clearly in our experiments.
Thank You!