FROM TRANSLATION MACHINE THEORY TO MACHINE TRANSLATION THEORY
– SOME INITIAL THOUGHTS
Oliver Čulo, Universität Mainz
MT AS TRANSLATION MACHINE THEORY
TOPICS OF (EARLY) SMT
• Calculating translation models (Brown et al. 1993)
• sentence alignment (Gale & Church 1993)
• word alignment (Och & Ney 2003)
…and a plethora of papers on how to improve these
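To make "calculating translation models" concrete: the simplest model in Brown et al. (1993), IBM Model 1, estimates word translation probabilities t(f | e) from sentence-aligned text with expectation maximisation. The following is a minimal Python sketch of that EM loop, not a reproduction of any particular toolkit. The toy corpus, the NULL-token handling and the iteration count are illustrative assumptions.

```python
from collections import defaultdict


def train_ibm_model1(bitext, iterations=10):
    """Estimate word translation probabilities t(f | e) with EM (IBM Model 1 style).

    bitext: list of (foreign_tokens, english_tokens) sentence pairs.
    A NULL token is added to every English sentence so that a foreign
    word can also be generated by "no English word".
    """
    foreign_vocab = {f for fs, _ in bitext for f in fs}
    # Start from a uniform distribution over the foreign vocabulary.
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in bitext:
            es = ["NULL"] + es
            for f in fs:
                # E-step: share one unit of count for f among all candidate e.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: turn expected counts into conditional probabilities t(f | e).
        t = defaultdict(float)
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


# Toy German-English sentence pairs (illustrative only).
bitext = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t = train_ibm_model1(bitext, iterations=20)
print(f"t(haus | house) = {t[('haus', 'house')]:.2f}")
print(f"t(das | house)  = {t[('das', 'house')]:.2f}")
```

Because "das" is already well explained by "the" in two sentence pairs, EM gradually shifts the probability mass of "house" towards "haus". Word alignment models such as those compared by Och & Ney (2003) build on this kind of estimation.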
RECENT RESURGENCE OF LINGUISTICS
• MT and the phrase (Fox 2002, Koehn et al. 2003, Eisele 2006)
• MT and dependency (Ding & Palmer 2005, Quirk et al. 2005, Žabokrtský et al. 2008)
• hybrid architectures (Eisele et al. 2008)
• domain adaptation (Koehn & Schroeder 2007, Bertoldi & Federico 2009)
• factored models (Koehn & Hoang 2007)
• …
TRANSLATION-THEORETIC MODELLING OF MT
MT AND FUNCTIONAL TRANSLATION THEORY (1)
• Skopos theory (Reiss & Vermeer 1984)
• pragmalinguistic model (House 1997)
• function and loyalty (Nord 1997, 2006)
Diagram: functional equivalence vs. change in function; documentary vs. instrumental; overt vs. covert
MT AND FUNCTIONAL TRANSLATION THEORY (2)
• aimed at functional equivalence (but does the machine, or a Google Translate user, know this?)
• aimed at instrumental translation (but in fact rather documentary; ethical implications?)
MT AND FUNCTIONAL TRANSLATION THEORY (3)
• MT and its lack of translation-functional considerations in system design (Schmidt in print)
• the “human, purposeful action”-theoretic conception of translation as a hindrance to the acceptance of MT (Rozmyslowicz in print)
KNOWLEDGE TRANSFER TS -> MT
CROCO
• reference corpora: ER (English), GR (German); 17 registers, 2,000-word samples each; 68,000 words
• register-controlled corpora: EO (English originals), GO (German originals)
• translation corpora: GTrans (German translations), ETrans (English translations)
• register-controlled and translation corpora: 8 registers, at least 10 texts each, 3,125 words per text on average; 1 million words
CROCO STRUCTURE: MULTILINGUAL
• register-controlled corpus and translation corpus, each annotated on a word, chunk, clause and sentence layer and connected by alignment layers
• annotation: metainformation, PoS tagging, morphology, sense relations; phrase structure, grammatical functions
Example (aligned chunks with grammatical functions):
EN: [Tray 1]SUBJ [holds]FIN [up to 125 sheets]DOBJ
DE: [In Fach 1]PROBJ [können]FIN [bis zu 125 Blatt Papier]SUBJ [eingelegt werden]PRED
‘In tray 1 can up to 125 sheets paper inserted be’
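The multi-layer design can be pictured as annotated units on each side plus alignment links between them. The following Python sketch encodes the "Tray 1" example at the chunk layer and lists the aligned pairs whose grammatical function changes. The `Chunk` class and the list-of-index-pairs alignment are hypothetical illustrations, not CroCo's actual storage format. Linking "holds" to "können" alone is also an assumption, since the meaning of the English verb is spread over "können … eingelegt werden".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """One chunk-layer unit with its grammatical function (hypothetical model)."""
    text: str
    function: str


# Chunk layer of the English original and of its German translation.
english = [
    Chunk("Tray 1", "SUBJ"),
    Chunk("holds", "FIN"),
    Chunk("up to 125 sheets", "DOBJ"),
]
german = [
    Chunk("In Fach 1", "PROBJ"),
    Chunk("können", "FIN"),
    Chunk("bis zu 125 Blatt Papier", "SUBJ"),
    Chunk("eingelegt werden", "PRED"),
]

# Alignment layer as (English index, German index) pairs.
# Assumption: "holds" is linked only to the finite verb "können".
alignment = [(0, 0), (1, 1), (2, 2)]


def function_pairs(src, tgt, links):
    """Return (source function, target function) for every aligned chunk pair."""
    return [(src[i].function, tgt[j].function) for i, j in links]


def function_shifts(src, tgt, links):
    """Keep only the aligned pairs whose grammatical function changed."""
    return [(a, b) for a, b in function_pairs(src, tgt, links) if a != b]


print(function_shifts(english, german, alignment))
# [('SUBJ', 'PROBJ'), ('DOBJ', 'SUBJ')]
```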
FUNCTION SHIFTS (TYPOLOGICAL DIFFERENCES)
Chart: function shifts per register and translation direction (E2G and G2E for the registers ESSAY, FICTION and INSTR); shift types: subj-*adv, subj-*obj, *adv-subj, *obj-subj
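Comparing function shifts across registers and translation directions requires tallying, for each register/direction cell, how often each source-target function combination occurs among aligned units. The following is a minimal Python sketch of that tally. The records are invented toy data, not CroCo results, and the "src-tgt" label format is an assumption that simplifies the chart's asterisk notation.

```python
from collections import Counter, defaultdict

# Invented toy records (not CroCo data):
# (register, direction, source function, target function) per aligned unit.
records = [
    ("INSTR", "E2G", "subj", "adv"),
    ("INSTR", "E2G", "subj", "subj"),
    ("INSTR", "E2G", "obj", "subj"),
    ("FICTION", "G2E", "adv", "subj"),
    ("FICTION", "G2E", "subj", "subj"),
    ("ESSAY", "E2G", "subj", "obj"),
]


def count_shifts(rows):
    """Tally function shifts (source != target) per (register, direction)."""
    table = defaultdict(Counter)
    for register, direction, src, tgt in rows:
        if src != tgt:  # an unchanged function is not a shift
            table[(register, direction)][f"{src}-{tgt}"] += 1
    return table


for (register, direction), shifts in sorted(count_shifts(records).items()):
    print(f"{direction}_{register}", dict(shifts))
```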
GRAMMATICAL FUNCTIONS IN THEME POSITION
Chart: shares (EO_SHARE, ETRANS_SHARE, GO_SHARE, GTRANS_SHARE) of the grammatical functions subj, obj, compl, adv, verb and other in theme position
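Shares of this kind are relative frequencies. For each subcorpus, count how often each grammatical function occupies the theme position, divide by the total number of themes in that subcorpus, and multiply by 100. The following Python sketch assumes invented toy labels in place of real annotations. Only the function categories are taken from the chart.

```python
from collections import Counter

# Invented toy data: grammatical function of each theme, per subcorpus.
themes = {
    "EO": ["subj", "subj", "adv", "subj", "obj"],
    "GO": ["subj", "adv", "adv", "obj", "verb"],
}


def theme_shares(labels):
    """Percentage share of each grammatical function in theme position."""
    counts = Counter(labels)
    n = len(labels)
    return {func: 100 * c / n for func, c in counts.items()}


for corpus, labels in themes.items():
    shares = theme_shares(labels)
    print(corpus, {f: round(s, 1) for f, s in sorted(shares.items())})
```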
MT AND TRANSLATION FACTORS: REGISTER AND TRANSLATION DIRECTION
• in MT, one often speaks of “domains”, but that term is too vague
• Kurokawa et al. (2009): translation models trained with regard to translation direction (A) vs. without it (B)
  – (A) matched the performance of (B) with only about 1/5 of the training data
• feature selection problem: which features matter for which register and translation direction? (e.g. Diwersy et al. 2013; overview in Oakes & Ji 2012)
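Training "according to translation direction" presupposes knowing, for each sentence pair, which side is the original. The following Python sketch shows direction-aware data selection, assuming such a metadata field exists. The `SentencePair` class, its `original_lang` field and the toy pairs are hypothetical. Where this metadata is missing, Kurokawa et al. (2009) turn to automatic detection of translated text, which is not shown here.

```python
from dataclasses import dataclass


@dataclass
class SentencePair:
    source: str         # sentence in the MT source language
    target: str         # sentence in the MT target language
    original_lang: str  # hypothetical metadata: language of the original text


def select_by_direction(pairs, src_lang):
    """Keep pairs translated in the same direction as the MT task,
    i.e. pairs whose original language is the MT source language."""
    return [p for p in pairs if p.original_lang == src_lang]


# Toy English-German pairs (illustrative only).
corpus = [
    SentencePair("Tray 1 holds up to 125 sheets.",
                 "In Fach 1 können bis zu 125 Blatt Papier eingelegt werden.", "en"),
    SentencePair("The nurse was sentenced today.",
                 "Der Krankenpfleger wurde heute verurteilt.", "de"),
]

en_de_training = select_by_direction(corpus, src_lang="en")
print(len(en_de_training))
```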
POST-EDITING
INCREASING ROLE OF MT IN TRANSLATION
• MT is integrated into translation memories and many translation workflows (SDL 2011, Bajon et al. 2012, O’Brien 2012)
• since MT output needs to be post-edited, post-editing is becoming an increasingly important part of the translator’s job profile
CRITT TPR DATABASE
• project coordinator: Copenhagen Business School
• English-German data collection at FTSK in Germersheim
• translation vs. post-editing vs. (blind) editing
• 6 source texts (ST) with different complexity levels (Hvelplund 2011)
• 12 professional translators, 12 semi-professional translators
• MT system: Google Translate
• eye-tracking (Tobii TX 300), key-logging (Translog II), retrospective questionnaires
EYE-TRACKING AND KEY-LOGGING POST-EDITING
PROCESSING TIMES
cf. Carl, Gutermuth & Hansen-Schirra in print
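One way to compare processing times across translation, post-editing and editing is to normalise each session's duration by the length of the source text. The following Python sketch assumes hypothetical session records containing a task label, a duration in seconds and a source word count. All numbers are invented. They are not results from the CRITT TPR data or from Carl, Gutermuth & Hansen-Schirra.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical session records: (task, duration in seconds, source-text words).
# Values are invented for illustration only.
sessions = [
    ("translation", 900, 150),
    ("translation", 780, 130),
    ("post-editing", 600, 150),
    ("post-editing", 520, 130),
    ("editing", 400, 150),
]


def seconds_per_word(rows):
    """Mean processing time per source word for each task type."""
    per_task = defaultdict(list)
    for task, seconds, words in rows:
        per_task[task].append(seconds / words)
    return {task: mean(values) for task, values in per_task.items()}


for task, spw in seconds_per_word(sessions).items():
    print(f"{task}: {spw:.2f} s/word")
```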
PROCESSING STYLES
Two graphs: time plotted against word number
PROCESSING PATTERNS
Two graphs: time plotted against word number
INTERFERENCE
ST: In a gesture sure to rattle the Chinese Government, Steven Spielberg pulled out of the Beijing Olympics to protest against China's backing for Sudan's policy in Darfur.
HT: Als Zeichen des Widerstands gegen die Chinesische Regierung…
‘As sign the.GEN resistance against the Chinese government…’
LACK OF CONSISTENCY
ST: Killer nurse receives four life sentences. Hospital nurse C.N. was imprisoned for life today for the killing of four of his patients.
PE: Killer-Krankenschwester zu viermal lebenslanger Haft verurteilt. Der Krankenpfleger C.N. wurde heute auf Lebenszeit eingesperrt für die Tötung von vier seiner Patienten.
‘Killer nurse.FEM to four times lifetime imprisonment sentenced. The nurse.MASC C.N. was today on lifetime imprisoned for the killing of four his.GEN patients.’
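The inconsistency above, with one English term ("nurse") rendered once as "Krankenschwester" and once as "Krankenpfleger", is the kind of problem a simple automatic check could flag during post-editing. The following Python sketch uses a hypothetical term list (`TERM_VARIANTS`) that maps each source term to its known target renderings. Matching is plain substring search, which is a deliberate simplification. The sketch does not describe the behaviour of any particular QA tool.

```python
from collections import defaultdict

# Hypothetical term list: source term -> known target renderings.
TERM_VARIANTS = {
    "nurse": ["Krankenschwester", "Krankenpfleger"],
}


def renderings_used(segments, term_variants):
    """Collect which target renderings were used for each source term.
    Simplification: plain substring matching (e.g. "nurse" would also match "nursery")."""
    used = defaultdict(set)
    for source, target in segments:
        for term, variants in term_variants.items():
            if term in source.lower():
                for variant in variants:
                    if variant in target:
                        used[term].add(variant)
    return used


def inconsistent_terms(segments, term_variants):
    """Return the source terms that were rendered in more than one way."""
    return {term: found
            for term, found in renderings_used(segments, term_variants).items()
            if len(found) > 1}


segments = [
    ("Killer nurse receives four life sentences.",
     "Killer-Krankenschwester zu viermal lebenslanger Haft verurteilt."),
    ("Hospital nurse C.N. was imprisoned for life today for the killing of four of his patients.",
     "Der Krankenpfleger C.N. wurde heute auf Lebenszeit eingesperrt für die Tötung von vier seiner Patienten."),
]

print(inconsistent_terms(segments, TERM_VARIANTS))
```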
OVERVIEW
CONCLUSIONS AND SUGGESTIONS
FUTURE WORK
• Entrenchment of MT in TS (theory):
  – common ground
  – more acceptance
  – improved description of the MT workflow for the translator
  – improved task descriptions for PE
SOME TENTATIVE SUGGESTIONS TO OURSELVES FOR BETTER TASK DESCRIPTION BASED ON TRANSLATOR CONCEPTS
Task description, with the function of the text (e.g. Nord 2006, House 1997), terminology and idiomaticity:
• As little as possible (rapid PE)
  – function: documentary
  – terminology: conceptually equivalent non-terms, but also dispreferred or deprecated terms, may be used
  – idiomaticity: unidiomatic but understandable wording may remain (disambiguated at word level!)
• As much as possible (full PE)
  – function: covert instrumental
  – terminology: only allowed terms can be used
  – idiomaticity: phraseology according to the domain
• Intermediate levels
  – function: overt instrumental (usable, but identifiable as a translation)
  – terminology: only terms, but also dispreferred and possibly deprecated ones
  – idiomaticity: idiomatic, but non-standard phraseology is also possible
THANK YOU FOR YOUR ATTENTION!
... AND YOUR QUESTIONS, COMMENTS, ...
REFERENCES (1)
Bertoldi, Nicola, and Marcello Federico. 2009. “Domain Adaptation for Statistical Machine Translation with Monolingual Resources.” In Proceedings of the Fourth Workshop on Statistical Machine Translation, 182–189. Athens, Greece: Association for Computational Linguistics.
Brown, Peter F., Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. “The Mathematics of Statistical Machine Translation: Parameter Estimation.” Computational Linguistics 19 (2): 263–311.
Eisele, Andreas. 2006. “Parallel Corpora and Phrase-based Statistical Machine Translation for New Language Pairs via Multiple Intermediaries.” In 5th International Conference on Language Resources and Evaluation (LREC) 2006.
Eisele, Andreas, Christian Federmann, Hans Uszkoreit, Hervé Saint-Amand, Martin Kay, Michael Jellinghaus, Sabine Hunsicker, Teresa Herrmann, and Yu Chen. 2008. “Hybrid Architectures for Multi-Engine Machine Translation.” In Translating and the Computer 30. London, UK.
Fox, Heidi J. 2002. “Phrasal Cohesion and Statistical Machine Translation.” In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 304–311. Philadelphia: ACL.
Gale, William A., and Kenneth W. Church. 1993. “A Program for Aligning Sentences in Bilingual Corpora.” Computational Linguistics 19 (1): 75–102.
House, Juliane. 1997. Translation Quality Assessment. A Model Revisited. Tübingen: Gunter Narr Verlag.
Koehn, Philipp, Franz Josef Och, and Daniel Marcu. 2003. “Statistical Phrase-Based Translation.” In Proceedings of HLT-NAACL 2003, 127–133.
Koehn, Philipp, and Josh Schroeder. 2007. “Experiments in Domain Adaptation for Statistical Machine Translation.” In ACL Workshop on Machine Translation 2007.
REFERENCES (2)
Kurokawa, David, Cyril Goutte, and Pierre Isabelle. 2009. “Automatic Detection of Translated Text and Its Impact on Machine Translation.” In Proceedings of MT Summit XII, the Twelfth Machine Translation Summit. International Association for Machine Translation, hosted by the Association for Machine Translation in the Americas.
Lapshinova-Koltunski, Ekaterina. 2013. “VARTRA: A Comparable Corpus for the Analysis of Translation Variation.” In Proceedings of the 6th Workshop on Building and Using Comparable Corpora, 77–86. Sofia, Bulgaria.
Lembersky, Gennadi, Noam Ordan, and Shuly Wintner. 2012. “Language Models for Machine Translation: Original Vs. Translated Texts.” Computational Linguistics 38 (4): 799–825.
Nord, Christiane. 1997. Translating as a Purposeful Activity. Functionalist Approaches Explained. Translation Theories Explained 1. Manchester: St. Jerome.
———. 2006. “Translating for Communicative Purposes Across Culture Boundaries.” Journal of Translation Studies 9 (1): 43–60.
Och, Franz Josef, and Hermann Ney. 2003. “A Systematic Comparison of Various Statistical Alignment Models.” Computational Linguistics 29 (1): 19–51.
Reiss, Katharina, and Hans J. Vermeer. 1984. Grundlegung einer allgemeinen Translationstheorie. Linguistische Arbeiten 147. Tübingen: M. Niemeyer.