Translation Studies
Simplification and Explicitation Universals
Claudiu Mihaila
Faculty of Computer Science, "Alexandru Ioan Cuza" University of Iasi
21 April 2010
Outline
• Introduction
  ◦ Motivation
  ◦ Translation studies
• Simplification
  ◦ Definition
  ◦ Simplification pros
  ◦ Simplification cons
• Explicitation
  ◦ Definition
  ◦ Explicitation pros
  ◦ Explicitation cons
• Conclusions
Motivation
• The questions
  ◦ Is there a difference between original and translated language?
  ◦ If so, is it automatically detectable?
  ◦ And if so, does it improve NLP quality?
• The answers
  ◦ Yes!
  ◦ Yes: up to 97.62% accuracy for simplification
  ◦ Yes, in:
    • Human translator (self-)assessment
    • Statistical machine translation
    • Multilingual plagiarism detection
Translation studies
• Specific lexico-grammatical and syntactic characteristics
• Translationese - Gellerstam (1986)
  ◦ "Fingerprints" left behind by the translation process
• Translation laws - Toury (1983)
  ◦ Standardisation, Interference
• Translation universals - Baker (1993)
  ◦ Simplification, Explicitation, Convergence, Normalisation
Simplification
• Tendency to produce simpler and easier-to-follow texts
• Laviosa (2002)
  ◦ Study on a small corpus
  ◦ Features for simplification
  ◦ Insufficient evidence
Simplification pros
• Baroni (2006)
  ◦ Detect originals and translations in an Italian corpus
  ◦ Uni-, bi-, tri-grams, word forms, lemmas, and POS tags
  ◦ Supervised learning system
  ◦ Accuracy up to 87%
• Corpas (2008a)
  ◦ English-into-Spanish translations and original Spanish medical and technical texts
  ◦ Validated for lexical richness
  ◦ Contradicted for complex sentences, sentence length, ambiguity, information load, depth of syntactic trees
• Corpas (2008b)
  ◦ Validated for lexical richness and density, number of discourse markers, complex sentences, sentence length
  ◦ More visible in the technical domain
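The surface features named on this slide (lexical richness, lexical density, sentence length) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the cited studies: the tokeniser and sentence splitter are crude regular expressions, lexical richness is taken as the type/token ratio, and FUNCTION_WORDS is a tiny hypothetical English list standing in for a proper grammatical-word lexicon.

```python
import re

# Hypothetical, deliberately tiny list of grammatical (function) words.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "and", "or", "but", "to",
    "is", "are", "was", "were", "it", "he", "she", "they", "we",
    "you", "i", "that", "this",
}

def sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (a crude heuristic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokens(text):
    """Lowercased word tokens; punctuation is dropped."""
    return re.findall(r"\w+", text.lower())

def surface_features(text):
    """Return type/token ratio, lexical density and mean sentence length."""
    toks = tokens(text)
    sents = sentences(text)
    if not toks or not sents:
        raise ValueError("text must contain at least one word")
    lexical = [t for t in toks if t not in FUNCTION_WORDS]
    return {
        "lexical_richness": len(set(toks)) / len(toks),    # types / tokens
        "lexical_density": len(lexical) / len(toks),       # lexical / all words
        "mean_sentence_length": len(toks) / len(sents),    # tokens per sentence
    }

sample = "The cat sat on the mat. The dog barked."
feats = surface_features(sample)
```

Under the simplification hypothesis, translated texts would be expected to score lower on such measures than comparable originals; the studies listed above disagree on which of them actually do.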
Simplification pros
• Ilisei (2010)
  ◦ 21 language-independent features
  ◦ Supervised machine learning - 8 classifiers
  ◦ Accuracy of 97.62%
  ◦ Most salient features - InfoGain, ChiSquare
    • Lexical richness
    • Sentence length
    • Proportions of pronouns, conjunctions, grammatical and lexical words
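Ranking features by information gain can be illustrated as follows. This is a minimal sketch, not the evaluation setup of the cited study: the toy values, the labels and the split thresholds are invented for illustration, and each numeric feature is reduced to a single binary split.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting a numeric feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part)
                    for part in (left, right) if part)
    return entropy(labels) - remainder

# Invented toy data: two originals and two translations.
labels = ["orig", "orig", "trans", "trans"]
features = {
    "lexical_richness": ([0.62, 0.58, 0.41, 0.45], 0.5),    # (values, threshold)
    "sentence_length": ([21.0, 18.0, 20.0, 19.0], 19.5),
}

gains = {name: information_gain(vals, labels, thr)
         for name, (vals, thr) in features.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
```

In this toy data lexical richness separates the two classes perfectly while sentence length carries no information, so lexical richness is ranked first. Real feature values would of course overlap far more.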
Simplification cons
• Jantunen (2001)
  ◦ Boosters in Finnish translations - hyvin, kovin, oikein
  ◦ Typical lexical combinations in most cases
• Jantunen (2004)
  ◦ Boosters in Finnish translations - hyvin, kovin, oikein
  ◦ Untypical lexical combinations in translations
  ◦ Similar colligations in originals and translations
Explicitation
• Introducing overt information into the translation that is implicit in the source language
• Classification - Pym (2005)
  ◦ Obligatory explicitation
    • Forced by language specificity or grammar
  ◦ Voluntary explicitation
    • Optional information to avoid misinterpretations
Explicitation pros
• Burnett (1999)
  ◦ BNC (British National Corpus) vs. TEC (Translational English Corpus)
  ◦ suggest, admit, claim, think, believe, hope, know
• Olohan (2000)
  ◦ BNC vs. TEC
  ◦ say / tell + that / zero connective
• Olohan (2001)
  ◦ BNC vs. TEC
  ◦ promise + that / zero connective
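The "that" vs. zero connective comparison can be approximated with a simple pattern count. This is a rough sketch rather than the method of the cited corpus studies: it covers only forms of "say", treats a personal pronoun directly after the verb as evidence of a zero connective, and will miscount cases such as demonstrative "that". The patterns and the sample text are assumptions made for illustration.

```python
import re

# Forms of the reporting verb "say" (illustrative; other verbs are not covered).
SAY = r"\b(?:say|says|said|saying)\b"

# Explicit connective: verb immediately followed by "that".
THAT_RE = re.compile(SAY + r"\s+that\b", re.IGNORECASE)

# Zero connective (heuristic): verb immediately followed by a subject pronoun.
ZERO_RE = re.compile(SAY + r"\s+(?:i|you|he|she|it|we|they)\b", re.IGNORECASE)

def connective_counts(text):
    """Count 'say + that' and 'say + zero connective' candidates in a text."""
    return {
        "that": len(THAT_RE.findall(text)),
        "zero": len(ZERO_RE.findall(text)),
    }

text = "She said that he left. He said they were late. They say it works."
counts = connective_counts(text)
# counts == {"that": 1, "zero": 2}
```

Running the same count over a corpus of originals and a corpus of translations, and comparing the proportion of "that", gives a rough indicator of explicitation: a higher share of the optional connective in translations would support the hypothesis.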
Explicitation cons
• Cheong (2006)
  ◦ Explicitation vs. implicitation
  ◦ English-into-Korean translations
  ◦ Both phenomena occur equally often
  ◦ The direction of translation influences their behaviour
Conclusions
• Simplification
  ◦ Many studies supporting it
  ◦ Many studies contradicting it
  ◦ Not yet clearly confirmed
• Explicitation
  ◦ Occurs often, to avoid misinterpretations
  ◦ Implicitation needs to be considered as well
• Usefulness
  ◦ SMT
  ◦ Multilingual plagiarism detection
  ◦ (Self-)assessment of translators' work
Thank you!
• Questions?