

  • Introduction in Machine Translation

    Maria Sukhareva, Christian Chiarcos

    Goethe University Frankfurt

    [email protected]

    April 15, 2015

    Maria Sukhareva, Christian Chiarcos (Goethe University) Introduction in Machine Translation April 15, 2015 1 / 23

  • Overview

    1 Motivation

    2 Statistical Machine Translation

    3 Organisation of the Seminar


  • What do we want to translate?

    Most translated      Title                                   Author                     Languages
    Book                 Bible                                   -                          2883
    Non-fictional book   Universal Declaration of Human Rights   UN                         440
    Fictional book       The Little Prince                       Antoine de Saint-Exupéry   253
    Website              www.jw.org                              Jehovah’s Witnesses        440

    (a) Most translated literary works

    Language   %
    Mandarin   14.4
    Spanish    6.15
    English    5.43
    Hindi      4.7
    Arabic     4.43

    (b) Most spoken languages

    But! Texts of low literary status are translated most frequently: scientific and technical documents, commercial and business transactions, administrative memoranda, legal documentation, instruction manuals, agricultural and medical textbooks, industrial patents, publicity leaflets, newspaper reports, etc.


  • Machine Translation Use Cases

    1 MT for information assimilation

    2 MT for information dissemination

    3 MT for communication


  • Machine Translation for assimilation

    Definition

    The class of translation in which an individual or organization wants to gather material written by others in a variety of languages and convert it all into his or her own language.

    Requirements:

    1 Fast translation of large volumes of data

    2 Support of multiple foreign languages

    3 Quantity over quality

    4 Domain independence


  • Examples of MT for assimilation

    1 Online MT systems are frequently used for assimilation (Google Translate, Bing Translator, PROMT, etc.)

    2 The user has little control over the input

    3 The translation quality is usually not publishable

    (Happy Birthday! I have not seen you for a while. How is your husband?)


  • Machine Translation for dissemination

    Definition

    The class in which an individual or organization wants to broadcast his or her own material, written in one language, in a variety of languages to the world.

    Requirements:

    1 Quality over quantity

    2 Publishable output

    3 No need for fast translation


  • Examples of MT for dissemination

    1 Commercial MT Systems

    2 The output of MT must be revised

    3 Human-aided MT Systems

    Dangers of MT dissemination


  • Machine Translation for communication

    Definition

    The class in which two or more individuals are in more or less immediate interaction, typically via email or otherwise online, with an MT system mediating between them.

    Requirements:

    1 Fast and robust translation

    2 No need to translate large volumes

    3 Robust towards mistakes in the input


  • Examples of MT for communication

    1 Speech translation, e.g. various apps for voice translation

    2 Translation of emails, chats, SMS, etc.

    3 VoxOx, JANUS, Jibbigo

    Prof. Alex Waibel, LREC Antonio Zampolli Prize Talk: https://www.youtube.com/watch?v=g1uHRFPhnMA


  • History of MT

    1 The Georgetown-IBM experiment (1954) claimed to have solved the MT problem

    2 The Russian-English translator running on an IBM 701 used only 6 grammar rules for 250 items

    3 After no further progress was reported, MT research "fell into deep sleep"...

    there is no immediate or predictable prospect of useful machine translation

    — ALPAC Report, 1966

    Figure: R. Reagan and H. Grosch at an IBM 701 in 1954


  • History: MT after ALPAC Report

    1 Most of the research in Europe and Canada

    2 Transfer and interlingua systems:
    TAUM-METEO, TAUM-AVIATION (Montreal), SUSY (Saar), ARIANE-78 (Grenoble), METAL (Austin), etc.

    Figure: Transfer-based and interlingual systems

    Interlingual systems demand:

    1 Dictionaries for TL and SL

    2 Grammar rules for parsing and generation

    3 Transition rules

    4 Conceptual lexicon


  • Motivation for Statistical Machine Translation

    1 It does not demand expensive manual labour to build rules for various linguistic layers of representation

    2 Large parallel corpora (EuroParl, OPUS, EAPCOUNT [1], Parallel Bible Corpus, etc.) are available and constantly growing

    3 Statistical approaches have proved successful in multiple areas of NLP

    4 Hybrid systems that build on SMT and integrate rule-based modules are possible

    5 Dynamic learning from user feedback

    [1] The English-Arabic Parallel Corpus of United Nations Texts

  • Statistical Machine Translation

    1 Probabilistic view of MT: E is the target language, F is the source language (conventionally English and French)

    2 Find the most likely target sentence e for a source sentence f: $\hat{e} = \arg\max_e P(e \mid f)$

    3 By Bayes' rule, dropping $P(f)$, which is constant over e:

    $$\hat{e} = \arg\max_e P(f \mid e)\,P(e)$$

    1 Translation model $P(f \mid e)$: how likely the source sentence f is as a translation of a target sentence e

    2 Language model $P(e)$: how likely it is to observe e

    3 Decoding: the argmax operation, which searches the space of possible target translations
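
    As a minimal sketch of this decomposition, the Python snippet below picks the best target sentence from a tiny explicit candidate list. The tables `TRANSLATION_MODEL` and `LANGUAGE_MODEL`, their values, and the function `decode` are illustrative assumptions, not part of the slides; a real decoder searches a far larger space and does not enumerate candidates.

```python
import math

# Hypothetical toy values for P(f | e): how likely the French source
# sentence is, given each English candidate.
TRANSLATION_MODEL = {
    ("la maison", "the house"): 0.6,
    ("la maison", "house the"): 0.6,
    ("la maison", "the home"): 0.3,
}

# Hypothetical toy values for P(e): how fluent each English candidate is.
LANGUAGE_MODEL = {
    "the house": 0.05,
    "house the": 0.0001,
    "the home": 0.02,
}


def decode(f, candidates):
    """Return argmax_e P(f | e) * P(e) over an explicit candidate list."""
    def log_score(e):
        # Log space avoids underflow when many probabilities are multiplied.
        return math.log(TRANSLATION_MODEL[(f, e)]) + math.log(LANGUAGE_MODEL[e])
    return max(candidates, key=log_score)


best = decode("la maison", ["the house", "house the", "the home"])
print(best)
```

    In this toy setting the translation model cannot tell "the house" from "house the"; the language model is what separates them.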


  • Word-based Machine Translation

    Motivation

    1 Context-independent approach

    2 Word-based IBM Models

    3 IBM Model 1: only lexical translation probabilities

    4 IBM Model 1 example: on the blackboard

    Translation model: $P(f \mid e)$

    Lexical translation probability: $t(\mathrm{WORD}_e \mid \mathrm{WORD}_f)$, estimated as the relative count divided by the total count:

    $$t(e \mid f) = \frac{c(e \mid f)}{\mathrm{total}(f)}$$

    IBM Model 1: find the alignment for which the product of $t(e \mid f)$ is at its maximum
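
    The estimation of the lexical translation probabilities can be sketched with a few lines of expectation-maximisation. The sketch below is an assumption-laden toy, not the seminar's reference code: the names `train_ibm1` and `best_alignment`, the three-sentence corpus, the fixed number of iterations, and the omission of the NULL word are all illustrative choices. It follows the $t(e \mid f)$ notation used on this slide, so probabilities are normalised per foreign word f.

```python
from collections import defaultdict


def train_ibm1(corpus, iterations=10):
    """Estimate t(e | f) with EM from (foreign_words, english_words) pairs."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    e_vocab = {e for _, es in corpus for e in es}
    # Uniform initialisation over the English vocabulary.
    t = {(e, f): 1.0 / len(e_vocab) for e in e_vocab for f in f_vocab}

    for _ in range(iterations):
        count = defaultdict(float)  # expected count c(e | f)
        total = defaultdict(float)  # expected total(f)
        for fs, es in corpus:
            for e in es:
                # E-step: spread one unit of count for e over all f in the pair.
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: relative count divided by total count.
        # Pairs that never co-occur end up with probability 0.
        t = {(e, f): count[(e, f)] / total[f] for (e, f) in t}
    return t


def best_alignment(fs, es, t):
    """For each English word, pick the foreign word with the highest t(e | f)."""
    return [(e, max(fs, key=lambda f: t[(e, f)])) for e in es]


# Hypothetical toy corpus of (foreign, English) sentence pairs.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t = train_ibm1(corpus)
print(best_alignment(["das", "haus"], ["the", "house"], t))
```

    Because Model 1 scores an alignment as a product of independent word-level factors, choosing the best foreign word for each English word separately already maximises the product.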


  • Phrase-based Machine Translation

    Motivation

    1 Phrases: n-grams

    2 Context resolves ambiguity

    3 Collocations

    Translation model:

    $$P(f \mid e) = \prod_{i=1}^{I} \phi(f_i \mid e_i)\, d(\mathrm{start}_i - \mathrm{end}_{i-1} - 1)$$

    Needs word alignment.

    Relative frequency:

    $$\phi(f_i \mid e_i) = \frac{\mathrm{count}(f_i, e_i)}{\sum_{f} \mathrm{count}(f, e_i)}$$
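
    The relative-frequency estimate can be illustrated as follows. This is only a sketch: it assumes that phrase pairs have already been extracted from a word-aligned corpus, and the function name `estimate_phrase_table` and the list `extracted` are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_phrase_table(phrase_pairs):
    """Return phi[(f, e)] = count(f, e) / sum over f' of count(f', e)."""
    pair_counts = Counter(phrase_pairs)
    e_totals = defaultdict(int)
    for (f, e), c in pair_counts.items():
        e_totals[e] += c
    return {(f, e): c / e_totals[e] for (f, e), c in pair_counts.items()}


# Hypothetical extracted (foreign phrase, English phrase) pairs.
extracted = [
    ("das haus", "the house"),
    ("das haus", "the house"),
    ("das gebäude", "the house"),
    ("ein buch", "a book"),
]
phi = estimate_phrase_table(extracted)
print(phi[("das haus", "the house")])
```

    For each English phrase the estimates sum to one over the foreign phrases it was seen with, which is what makes them usable as the $\phi$ factors of the translation model.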


  • Distance-Based Reordering

    $\mathrm{start}_i - \mathrm{end}_{i-1} - 1$: $\mathrm{start}_i$ is the starting position of the foreign phrase that translates to the $i$-th English phrase; $\mathrm{end}_i$ is the ending position of the foreign phrase that translates to the $i$-th English phrase

    Reordering probability $d$: $d(x) = \alpha^{|x|}$ for $\alpha \in [0, 1]$, exponential in the distance

    d is not learnt from the data
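
    A small worked example may help with the distance computation. The sketch below assumes 1-based foreign word positions, the convention that position 0 precedes the first foreign word, and a hand-picked value of 0.5 for alpha (since d is not learnt); the function names are hypothetical.

```python
def reordering_distance(start_i, end_prev):
    """x = start_i - end_{i-1} - 1, in foreign word positions (1-based)."""
    return start_i - end_prev - 1


def d(x, alpha=0.5):
    """Distance-based reordering probability d(x) = alpha ** |x|."""
    return alpha ** abs(x)


def reordering_score(spans, alpha=0.5):
    """Multiply d over all phrases.

    spans holds (start, end) foreign positions, listed in the order of the
    English phrases they translate to.
    """
    score = 1.0
    end_prev = 0  # assumed convention: position 0 precedes the first word
    for start, end in spans:
        score *= d(reordering_distance(start, end_prev), alpha)
        end_prev = end
    return score


# Monotone translation: every phrase starts right after the previous one.
monotone = [(1, 3), (4, 5), (6, 6), (7, 7)]
# Reordered: the third foreign phrase is translated before the second.
reordered = [(1, 3), (6, 6), (4, 5), (7, 7)]
print(reordering_score(monotone), reordering_score(reordered))
```

    In the reordered case the distances are 0, +2, -3 and +1, so the product is penalised, while the monotone order incurs no penalty at all.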


  • Grading

    1 Work engagement 33.3%

    2 Presentation 33.3%

    3 Technical report 33.3%

    Work engagement

    1 Two questions on each paper, sent to my email

    2 Or questions during the seminar

    3 Absence: at most twice without a medical certificate; notify me by email (mandatory)


  • Grading

    1 Work engagement 33.3%

    2 Presentation 33.3%

    3 Technical report 33.3%

    Presentation

    1 A paper of your choice: 30 min presentation, 10 min discussion

    2 Two papers per session

    3 Papers will be assigned at the next session


  • Grading

    1 Work engagement 33.3%

    2 Presentation 33.3%

    3 Technical report 33.3%

    Technical report

    1 A practical experiment with machine translation or text normalisation tools

    2 Training, testing and evaluation

    3 Groups of two or three students

    4 No more than 3 pages.


  • Topics

    1 IBM Models

    2 Word alignment

    3 Language models (KenLM, SRILM, IRSTLM)

    4 Phrase-based MT (Moses Toolkit)

    5 Character-based MT

    6 SMT Evaluation

    7 SMT for under-resourced languages


  • Contacts

    Maria Sukhareva
    Doctoral [email protected]

    Prof. Dr. Christian Chiarcos
    chiarcos@informatik.uni-frankfurt.de


  • The End
