simplification and explicitation universals

Translation Studies: Simplification and Explicitation Universals

Claudiu Mihaila

Faculty of Computer Science, “Alexandru Ioan Cuza” University of Iasi

21 April 2010

Outline

• Introduction
  ◦ Motivation
  ◦ Translation studies

• Simplification
  ◦ Definition
  ◦ Simplification pros
  ◦ Simplification cons

• Explicitation
  ◦ Definition
  ◦ Explicitation pros
  ◦ Explicitation cons

Conclusions

Motivation

• The questions
  ◦ Is there a difference between original and translated language?
  ◦ If so, is it automatically detectable?
  ◦ And if so, does it improve NLP quality?

• The answers
  ◦ Yes!
  ◦ Yes: up to 97.62% for simplification
  ◦ Yes:
    • Human translator (self-)assessment
    • Statistical machine translation
    • Multilingual plagiarism detection

Translation studies

• Specific lexico-grammatical and syntactic characteristics

• Translationese - Gellerstam (1986)
  ◦ “Fingerprints” left behind by the translation process

• Translation laws - Toury (1983)
  ◦ Standardisation, Interference

• Translation universals - Baker (1993)
  ◦ Simplification, Explicitation, Convergence, Normalisation

Simplification

• Tendency to produce simpler and easier-to-follow texts

• Laviosa (2002)
  ◦ Study on small corpus
  ◦ Features for simplification
  ◦ Insufficient evidence

Simplification pros

• Baroni (2006)
  ◦ Detect originals and translations in an Italian corpus
  ◦ Uni-, bi-, tri-grams, word forms, lemmas, and POS tags
  ◦ Supervised learning system
  ◦ Accuracy up to 87%

• Corpas (2008a)
  ◦ English-into-Spanish and Spanish medical and technical texts
  ◦ Validated for lexical richness
  ◦ Contradicted for complex sentences, sentence length, ambiguity, information load, depth of syntactic trees

• Corpas (2008b)
  ◦ Validated for lexical richness and density, number of discourse markers, complex sentences, sentence length
  ◦ More visible for technical domain
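The n-gram features mentioned for Baroni (2006) can be illustrated with a minimal sketch. It is not the study's actual pipeline: the function name `ngram_counts` and the whitespace tokenisation are assumptions made for illustration, and the same counting would apply equally to lemmas or POS tags.

```python
from collections import Counter


def ngram_counts(tokens, max_n=3):
    """Count all uni-, bi- and tri-grams (up to max_n) in a token sequence.

    `tokens` can hold word forms, lemmas or POS tags; the counts could then
    serve as input features for a supervised classifier.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical example sentence, tokenised on whitespace.
features = ngram_counts("the cat saw the cat".split())
```

Each key is a tuple of one to three tokens, so `features[("the", "cat")]` gives the frequency of that bigram.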

Simplification pros

• Ilisei (2010)
  ◦ 21 language-independent features
  ◦ Supervised machine learning - 8 classifiers
  ◦ Accuracy of 97.62%
  ◦ Most salient features - InfoGain, ChiSquare
    • Lexical richness
    • Sentence length
    • Proportions of pronouns, conjunctions, grammatical and lexical words
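A few of the salient features listed above can be approximated as in the sketch below. It is an illustration only, not the feature set or implementation of Ilisei (2010): the regex tokeniser, the punctuation-based sentence splitter, and the small English word lists `PRONOUNS` and `CONJUNCTIONS` are all assumptions, and a language-independent system would rely on POS tags instead.

```python
import re

# Hypothetical, deliberately small closed-class word lists (English only).
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
            "me", "him", "her", "us", "them"}
CONJUNCTIONS = {"and", "or", "but", "because", "although", "while", "if"}


def simplification_features(text):
    """Return a dict of simple surface features for one text."""
    # Crude sentence split on terminal punctuation (assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Crude tokenisation: lowercase alphabetic strings and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens or not sentences:
        raise ValueError("text contains no tokens")
    n = len(tokens)
    return {
        # Type/token ratio as a proxy for lexical richness.
        "lexical_richness": len(set(tokens)) / n,
        "mean_sentence_length": n / len(sentences),
        "pronoun_ratio": sum(t in PRONOUNS for t in tokens) / n,
        "conjunction_ratio": sum(t in CONJUNCTIONS for t in tokens) / n,
    }


features = simplification_features("She saw it. She left and it stayed.")
```

Feature vectors of this kind, computed per document, are what a supervised classifier would be trained on to separate originals from translations.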

Simplification cons

• Jantunen (2001)
  ◦ Boosters in Finnish translations - hyvin, kovin, oikein
  ◦ Typical lexical combinations in most cases

• Jantunen (2004)
  ◦ Boosters in Finnish translations - hyvin, kovin, oikein
  ◦ Untypical lexical combinations in translations
  ◦ Similar colligations in originals and translations

Explicitation

• Introducing overt information into the translation that is implicit in the source language

• Classification - Pym (2005)
  ◦ Obligatory explicitation
    • Forced by language specificity or grammar
  ◦ Voluntary explicitation
    • Optional information to avoid misinterpretations

Explicitation pros

• Burnett (1999)
  ◦ BNC vs. TEC
  ◦ suggest, admit, claim, think, believe, hope, know

• Olohan (2000)
  ◦ BNC vs. TEC
  ◦ say / tell + that / zero connective

• Olohan (2001)
  ◦ BNC vs. TEC
  ◦ promise + that / zero connective
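The "that vs. zero connective" comparison can be sketched as a simple count over a corpus. This is a rough heuristic under stated assumptions, not the method used in the studies above: the function `that_usage`, the verb form lists, and the two-token look-ahead window are hypothetical, and every verb occurrence without a nearby "that" is counted as "other", which overestimates true zero-connective uses (for example "said hello").

```python
import re
from collections import Counter

# Hypothetical inflected-form lists for three reporting verbs.
REPORTING_VERBS = {
    "say": {"say", "says", "said", "saying"},
    "tell": {"tell", "tells", "told", "telling"},
    "promise": {"promise", "promises", "promised", "promising"},
}


def that_usage(text, window=2):
    """Count, per verb, occurrences followed by 'that' within `window` tokens.

    A window of 2 also covers 'told me that'. Sentence boundaries are
    ignored, so this is only an approximation.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    form_to_lemma = {form: lemma
                     for lemma, forms in REPORTING_VERBS.items()
                     for form in forms}
    counts = {lemma: Counter() for lemma in REPORTING_VERBS}
    for i, token in enumerate(tokens):
        lemma = form_to_lemma.get(token)
        if lemma is None:
            continue
        following = tokens[i + 1:i + 1 + window]
        counts[lemma]["that" if "that" in following else "other"] += 1
    return counts


counts = that_usage("He said that she left. She said he stayed. "
                    "They told me that it rained.")
```

Running the same count over a corpus of originals (such as the BNC) and a corpus of translations (such as the TEC) would give the two proportions to compare.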

Explicitation cons

• Cheong (2006)
  ◦ Explicitation vs. implicitation
  ◦ English-into-Korean translations
  ◦ The phenomena appear equally
  ◦ The direction of translation influences their behaviour

Conclusions

• Simplification
  ◦ Many studies supporting it
  ◦ Many studies contradicting it
  ◦ Not yet clearly confirmed

• Explicitation
  ◦ Occurring often to avoid misinterpretations
  ◦ Implicitation needs to be considered as well

• Usefulness
  ◦ SMT
  ◦ Multilingual plagiarism detection
  ◦ (Self-)assessment of translators' work

Thank you!

• Questions?
