


  • Machine Translation

    Om Damani

    (Ack: Material taken from Jurafsky & Martin 2nd Ed., Brown et al. 1993)

    2

    The spirit is willing but the flesh is weak

    English-Russian Translation System

    Дух охотно готов но плоть слаба

    Russian-English Translation System

    The vodka is good, but the meat is rotten

    State of the Art

    Babelfish: Spirit is willingly ready but flesh it is weak

    Google: The spirit is willing but the flesh is week

    3

    The spirit is willing but the flesh is weak

    Google English-Hindi Translation System

    आत्मा पर शरीर दुर्बल है

    Google Hindi-English Translation System

    Spirit on the flesh is weak

    State of the Art (English-Hindi) – March 19, 2009

    4

    Is state of the art so bad

    Google English-Hindi Translation System

    कला की हालत इतनी खराब है

    Google Hindi-English Translation System

    The state of the art is so bad

    Is State of the Art (English-Hindi) so

    bad

    5

    State of the english hindi translation is not so bad

    Google English-Hindi Translation System

    राज्य के अंग्रेज़ी हिन्दी अनुवाद का इतना बुरा नहीं है

    Google Hindi-English Translation System

    State of the English translation of English is not so bad

    State of the english-hindi translation is

    not so bad

    OK. Maybe it is __ bad.

    6

    State of the English Hindi translation is not so bad

    Google English-Hindi Translation System

    राज्य में अंग्रेज़ी से हिंदी अनुवाद का इतना बुरा नहीं है

    Google Hindi-English Translation System

    English to Hindi translation in the state is not so bad

    State of the English-Hindi translation is

    not so bad

    OK. Maybe it is __ __ bad.

    राज्य के अंग्रेज़ी हिन्दी अनुवाद का इतना बुरा नहीं है

  • 7

    Your Approach to Machine Translation

    8

    Translation Approaches

    9

    Direct Transfer – What Novices do

    10

    Direct Transfer: Limitations

    कई बंगाली कवियों ने इस भूमि के गीत गाए हैं (Kai Bangali kaviyon ne is bhoomi ke geet gaaye hain)

    Morph: कई बंगाली कवि-PL,OBL ने इस भूमि के गीत {गाए है}-PrPer,Pl (Kai Bangali kavi-PL,OBL ne is bhoomi ke geet {gaaye hai}-PrPer,Pl)

    Lexical Transfer: Many Bengali poet-PL,OBL this land of songs {sing has}-PrPer,Pl

    Local Reordering: Many Bengali poet-PL,OBL of this land songs {has sing}-PrPer,Pl

    Final: Many Bengali poets of this land songs have sung

    (cf. Many Bengali poets have sung songs of this land)

    11

    Syntax Transfer

    (Analysis-Transfer-Generation)

    Here phrases NP, VP etc. can be arbitrarily large

    12

    Syntax Transfer Limitations

    He went to Patna -> Vah Patna gaya

    He went to Patil -> Vah Patil ke pas gaya

    Translation of went depends on the semantics of the object of went

    Fatima eats salad with spoon – what happens if you change spoon

    Semantic properties need to be included in transfer rules – Semantic Transfer

  • 13

    Interlingua Based Transfer

    [Figure: interlingua (semantic graph) representation of the sentence below – nodes: contact (:01), you, farmer, this, region, Manchar, taluka, Khatav; relations: agt, obj, pur, plc, nam, or]

    For this, you contact the farmers of Manchar region or of Khatav taluka.

    In theory: N analysis and N transfer modules instead of N²

    In practice: Amazingly complex system to tackle N² language pairs

    14

    Difficulties in Translation – Language Divergence

    (Concepts from Dorr 1993, Text/Figures from Dave, Parikh and Bhattacharyya 2002)

    Constituent Order, Prepositional Stranding, Null Subject, Conflational Divergence, Categorical Divergence

    15

    Lost in Translation: We are talking mostly about

    syntax, not semantics, or pragmatics

    You: Could you give me a glass of water
    Robot: Yes.
    ….wait..wait..nothing happens..wait……Aha, I see…
    You: Will you give me a glass of water
    …wait…wait..wait..

    Image from http://inicia.es/de/rogeribars/blog/lost_in_translation.gif

    16

    CheckPoint

    � State of the Art

    � Different Approaches

    � Translation Difficulty

    � Need for a novel approach

    17

    Statistical Machine Translation: Most ridiculous idea ever

    Consider all possible partitions of a sentence.
    For a given partition, consider all possible translations of each part.
    Consider all possible combinations of all possible translations.
    Consider all possible permutations of each combination.

    And somehow select the best partition/translation/permutation

    कई बंगाली कवियों ने इस भूमि के गीत गाए हैं (Kai Bangali kaviyon ne is bhoomi ke geet gaaye hain)

    [Figure: candidate phrase translations for each segment]

    कई बंगाली कवियों | ने इस | भूमि | के | गीत गाए हैं
    Many Bengali Poets | this | land | of | have sung poem
    Several Bengali | to this | place | 's | sing songs
    Many poets from Bangal | in this | space |  | song sung
    Poets from Bangladesh |  | farm |  | have sung songs

    To this space have sung songs of many poets from Bangal

    18

    How many combinations are we talking about

    Number of choices for an N-word sentence

    N=20 ??

    Number of possible chess games

  • 19

    How do we get the Phrase Table

    Collect a large amount of bilingual parallel text.
    For each sentence pair,
    consider all possible partitions of both sentences.
    For a given partition pair,
    consider all possible mappings between parts (phrases) on the two sides.

    Somehow assign the probability to each phrase pair

    इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए

    For this you contact the farmers of Manchar region

    For this you contact the farmers of Manchar region

    इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए

    Fatima eats rice
    फातिमा चावल खाती है

    20

    Data Sparsity Problems in Creating Phrase Table

    Sunil is eating mango -> Sunil aam khata hai
    Noori is eating banana -> Noori kela khati hai
    Sunil is eating banana -> We need examples of everyone eating everything !!

    We want to figure out that eating can be either khata hai or khati hai

    And let Language Model select from ‘Sunil kela khata hai’ and ‘Sunil kela khati hai’

    Select well-formed sentences among all candidates using LM

    21

    Formulating the Problem

    . A language model to compute P(E)

    . A translation model to compute P(F|E)

    . A decoder, which is given F and produces the most probable E
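    Written out as one formula (a restatement of the three components above, in the notation used later in these slides):

    Ê = argmax_E P(E|F) = argmax_E P(F|E) · P(E)

    The decoder searches for the E that maximizes this product; the next slide discusses why modeling P(F|E)·P(E) is preferred over estimating P(E|F) directly.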

    22

    P(F|E) vs. P(E|F)

    P(F|E) is the translation probability – we need to look at the generation process by which the pair is obtained.

    Parts of F correspond to parts of E. With suitable independence assumptions, P(F|E) measures whether all parts of E are covered by F.

    E can be quite ill-formed.

    It is OK if {P(F|E) for an ill-formed E} is greater than the {P(F|E) for a well formed E}. Multiplication by P(E) should hopefully take care of it.

    We do not have that luxury in estimating P(E|F) directly – we will need to ensure that well-formed E score higher.

    Summary: For computing P(F|E), we may make several independence assumptions that are not valid. P(E) compensates for that.

    P(बारिश हो रही है | It is raining) = .02
    P(बरसात आ रही है | It is raining) = .03
    P(बारिश हो रही है | rain is happening) = .420

    We need to estimate P(It is raining | बारिश हो रही है) vs. P(rain is happening | बारिश हो रही है)

    23

    CheckPoint

    � From a parallel corpus, generate probabilistic phrase table

    � Given a sentence, generate various candidate translations using the phrase table

    � Evaluate the candidates using Translation and Language Models

    24

    What is the meaning of Probability of

    Translation

    - What is the meaning of P(F|E)?
    - By Magic: you simply know P(F|E) for every (E,F) pair – counting in a parallel corpus
    - Or, each word in E generates one word of F, independent of every other word in E or F
    - Or, we need a 'random process' to generate F from E
    - A semantic graph G is generated from E, and F is generated from G
      - We are no better off. We now have to estimate P(G|E) and P(F|G) for various G and then combine them – How?
      - We may have a deterministic procedure to convert E to G, in which case we still need to estimate P(F|G)
    - A parse tree TE is generated from E; TE is transformed into TF; finally TF is converted into F
    - Can you write the mathematical expression?

  • 25

    The Generation Process

    � Partition: Think of all possible partitions of the source language

    � Lexicalization: For a given partition, translate each phrase into the foreign language

    � Spurious insertion: add foreign words that are not attributable to any source phrase

    � Reordering: permute the set of all foreign words -words possibly moving across phrase boundaries

    Try writing the probability expression for the generation process

    We need the notion of alignment

    26

    Generation Example: Alignment

    27

    Simplify Generation: Only 1->Many

    Alignments allowed

    28

    Alignment

    A function from target position to source position:

    The alignment sequence is: 2,3,4,5,6,6,6
    Alignment function A: A(1) = 2, A(2) = 3, ...
    A different alignment function will give the sequence 1,2,1,2,3,4,3,4 for A(1), A(2), ...

    To allow spurious insertion, allow alignment with word 0 (NULL).
    No. of possible alignments (I, J: lengths of the English and foreign sentences): (I+1)^J
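    As a quick sanity check of that count (a toy sketch, not part of the slides): enumerate every alignment function for small I and J, allowing position 0 for the NULL word.

    ```python
    from itertools import product

    I, J = 3, 4   # toy lengths: 3 English words, 4 foreign words

    # an alignment maps each foreign position j = 1..J to an English position 0..I (0 = NULL)
    alignments = list(product(range(I + 1), repeat=J))

    print(len(alignments), (I + 1) ** J)   # both print 256
    ```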

    29

    IBM Model 1: Generative Process

    30

    IBM Model 1: Basic Formulation

    P(F|E) = Σ_{J'} P(J'|E) · P(F|J',E) = P(J|E) · P(F|J,E)
    (J' different from J means P(F|J',E) = 0)

    P(F|J,E) = Σ_A P(F,A|J,E) = Σ_A P(A|J,E) · P(F|A,J,E)

    Putting it together:
    P(F|E) = Σ_A P(J|E) · P(A|J,E) · P(F|A,J,E)

  • 31

    IBM Model 1: Details

    � No assumptions. Above formula is exact.

    � Choosing length: P(J|E) = P(J|E,I) = P(J|I) = ε

    � Choosing Alignment: all alignments equiprobable: P(A|J,E) = 1/(I+1)^J

    � Translation Probability

    P(F|E) = Σ_A P(J|E) · P(A|J,E) · P(F|A,J,E)

    P(F|E) = ε/(I+1)^J · Σ_A Π_{j=1..J} t(f_j | e_{a_j})

    In general,
    P(A|J,E) · P(F|A,J,E) = Π_{j=1..J} P(a_j | a_1^{j-1}, f_1^{j-1}, J, e_1^I) · P(f_j | a_1^j, f_1^{j-1}, J, e_1^I)

    Model 1 assumptions:
    P(a_j | a_1^{j-1}, f_1^{j-1}, J, e_1^I) = 1/(I+1)
    P(f_j | a_1^j, f_1^{j-1}, J, e_1^I) = t(f_j | e_{a_j})
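    A brute-force sketch of the formula above (the t(f|e) values below are made-up toy numbers, not estimates from a corpus): it sums the product of t(f_j | e_{a_j}) over every alignment and scales by ε/(I+1)^J.

    ```python
    from itertools import product

    def model1_p_f_given_e(f_words, e_words, t, eps=1.0):
        """P(F|E) = eps/(I+1)^J * sum over all alignments of prod_j t(f_j | e_{a_j})."""
        e_with_null = ["NULL"] + e_words              # position 0 is the NULL word
        I, J = len(e_words), len(f_words)
        total = 0.0
        for alignment in product(range(I + 1), repeat=J):   # all (I+1)^J alignments
            p = 1.0
            for j, a_j in enumerate(alignment):
                p *= t.get((f_words[j], e_with_null[a_j]), 0.0)
            total += p
        return eps / (I + 1) ** J * total

    # toy illustration with hypothetical t(f|e) values
    t = {("casa", "house"): 0.8, ("verde", "green"): 0.7,
         ("casa", "green"): 0.1, ("verde", "house"): 0.1}
    print(model1_p_f_given_e(["casa", "verde"], ["green", "house"], t))
    ```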

    32

    HMM Alignment

    � All alignments are not equally likely

    � Can you guess what properties an alignment has

    � Alignments tend to be locality preserving – neighboring words tend to get aligned together

    � We would like P(aj) to depend on aj-1

    33

    HMM Alignment: Details

    - P(F,A|J,E) decomposed as P(A|J,E)*P(F|A,J,E) in Model 1

    - Now we will decompose it differently (J is implicit, not mentioned in the conditional expressions)

    - Alignment Assumption (Markov): the alignment probability of the j-th word, P(a_j), depends only on the alignment of the previous word, a_{j-1}

    - Translation assumption: the probability of the foreign word f_j depends only on the aligned English word e_{a_j}

    P(F,A|E) = P(f_1^J, a_1^J | e_1^I)
             = Π_{j=1..J} P(f_j, a_j | f_1^{j-1}, a_1^{j-1}, e_1^I)
             = Π_{j=1..J} P(a_j | f_1^{j-1}, a_1^{j-1}, e_1^I) · P(f_j | f_1^{j-1}, a_1^j, e_1^I)

    Markov assumption: P(a_j | f_1^{j-1}, a_1^{j-1}, e_1^I) = P(a_j | a_{j-1}, I)

    Translation assumption: P(f_j | f_1^{j-1}, a_1^j, e_1^I) = P(f_j | e_{a_j})

    P(F|E) = Σ_A P(F,A|E) = Σ_A P(J|I) · Π_{j=1..J} P(a_j | a_{j-1}, I) · P(f_j | e_{a_j})

    34

    Computing the Alignment Probability

    � P(aj|aj-1, I) is written as P(i|i’, I)

    � Assume - probability does not depend on absolute word positions but on the jump-width (i-i’) between words: P (4 | 6, 17) = P (5 | 7, 17)

    � Note: Denominator counts are collected over sentences of all lengths. But the sum is performed over only those jump-widths relevant to (i, i') – for i' = 6: −5 to 11 is relevant
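    One common way to write this parameterization (a sketch in the style of HMM-based alignment models; not taken verbatim from the slide):

    P(i | i', I) = c(i − i') / Σ_{i''=1..I} c(i'' − i')

    where c(d) is the count of jump-width d collected over the whole training corpus, and the denominator sums only over the jump-widths that are possible for the given i' and I.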

    35

    HMM Model - Example

    P(F|E) = Σ_A P(F,A|E) = Σ_A P(J|I) · Π_{j=1..J} P(a_j | a_{j-1}, I) · P(f_j | e_{a_j})

    P(F,A|E) = P(J|I) · Π_{j=1..J} P(a_j | a_{j-1}, I) · P(f_j | e_{a_j})

    P(F,A|E) = P(J=10|I=9)*P(2|start,9)*P(इसके|this)*P(1|2,9)*P(लिए|this)*P(3|1,9)*….*P(4|4,9)*P(कीजिए|contact)
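    A small sketch of evaluating P(F,A|E) for one fixed alignment under these HMM assumptions (the length, jump, and lexicon probabilities below are made-up placeholders, not trained values):

    ```python
    def hmm_p_f_a_given_e(f_words, alignment, e_words, p_len, p_jump, p_lex):
        """P(F,A|E) = P(J|I) * prod_j P(a_j | a_{j-1}, I) * P(f_j | e_{a_j}).

        alignment holds 1-based English positions, one per foreign word."""
        I, J = len(e_words), len(f_words)
        prob = p_len(J, I)
        prev = None                                   # 'start' state before the first word
        for f_j, a_j in zip(f_words, alignment):
            prob *= p_jump(a_j, prev, I)              # alignment (jump) probability
            prob *= p_lex(f_j, e_words[a_j - 1])      # lexicon probability
            prev = a_j
        return prob

    # toy usage with hypothetical component probabilities
    print(hmm_p_f_a_given_e(
        ["casa", "verde"], [2, 1], ["green", "house"],
        p_len=lambda J, I: 0.1,
        p_jump=lambda i, prev, I: 0.5,
        p_lex=lambda f, e: {("casa", "house"): 0.8, ("verde", "green"): 0.7}.get((f, e), 0.01)))
    ```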

    36

    Enhancing the HMM model

    � Add NULL words in the English to which foreign words can align

    � Condition the alignment on the word class (say POS tag) of the previous English word

    � Other suggestions ??

    � What is the problem in making more realistic assumptions

    � How to estimate the parameters of the model

    P(a_j | a_{j-1}, I, C(e_{a_{j-1}}))

  • 37

    Checkpoint

    � Generative Process is important for computing probability expressions

    � Model1 and HMM model

    � What about Phrase Probabilities

    38

    Training Alignment Models

    � Given a parallel corpus, for each (F,E) learn the best alignment A and the component probabilities:

    � t(f|e) for Model 1

    � lexicon probability P(f|e) and alignment probability P(ai|ai-1,I) for the HMM model

    � How will you compute these probabilities if all you have is a parallel corpus

    39

    Intuition : Interdependence of

    Probabilities

    � If you knew which words are probable translations of each other then you can guess which alignment is probable and which one is improbable

    � If you were given alignments with probabilities then you can compute translation probabilities

    � Looks like a chicken and egg problem

    � Can you write equations expressing one in terms of other

    40

    Computing Alignment Probabilities

    - Alignment prob. in terms of translation prob.: P(A,F|J,E)
    - Compute P(A) in terms of P(A,F)
    - Note: the prior prob. of all alignments is equal. We are interested in posterior probabilities.
    - Can you specify translation prob. in terms of alignment prob.?

    P(A,F|J,E) = P(A|J,E) · P(F|A,J,E) = ε/(I+1)^J · Π_{j=1..J} t(f_j | e_{a_j})

    P(A|F,E) = P(A,F|E) / Σ_A P(A,F|E)

    41

    Computing Translation probabilities

    P(संपर्क | contact) = 2/6

    What if alignments had probabilities, say .5, .3 and .9?

    P(संपर्क | contact) = (.5*1 + .3*1 + .9*0) / (.5*3 + .3*2 + .9*1) = .8/3

    Note: It is not .5*1/3 + .3*1/2 + .9*0 ??

    42

    Computing Translation Probabilities –

    Maximum Likelihood Estimate

    count(f|e) = Σ_{(F,E)} Σ_A P(A|F,E) · C(f, e | A, F, E)

    t(f|e) = count(f|e) / Σ_f count(f|e)

  • 43

    Expectation Maximization (EM)

    Algorithm

    Used when we want a maximum likelihood estimate of the parameters of a model that depends on hidden variables.
    In the present case, the parameters are the translation probabilities, and the hidden variables are the alignments.

    Init: Start with an arbitrary estimate of the parameters
    E-step: Compute the expected value of the hidden variables
    M-step: Re-estimate the parameters to maximize the likelihood of the data, given the expected values of the hidden variables from the E-step

    44

    Working out alignments for a simplified

    Model 1

    � Ignore the NULL words

    � Assume that every english word aligns with some foreign word (just to reduce the number of alignments for the illustration)

    45

    Example of EM

    Green house – Casa verde

    The house – La casa

    Init: Assume that any word can generate any word with equal prob:

    P(la|house) = 1/3

    46

    E-Step

    P(A,F|J,E) = P(A|J,E) · P(F|A,J,E) = ε/(I+1)^J · Π_{j=1..J} t(f_j | e_{a_j})

    E-Step:
    P(A|F,E) = P(A,F|E) / Σ_A P(A,F|E)

    47

    M-Step

    count(f|e) = Σ_{(F,E)} Σ_A P(A|F,E) · C(f, e | A, F, E)

    t(f|e) = count(f|e) / Σ_f count(f|e)

    48

    E-Step again

    P(A,F|J,E) = P(A|J,E) · P(F|A,J,E) = ε/(I+1)^J · Π_{j=1..J} t(f_j | e_{a_j})

    P(A|F,E) = P(A,F|E) / Σ_A P(A,F|E)

    1/3  2/3  2/3  1/3

    Repeat till convergence
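    A compact sketch of this EM loop on the two-sentence toy corpus, with a brute-force E-step over all alignments as in the worked example (NULL is ignored, as on slide 44; all names are mine):

    ```python
    from itertools import product
    from collections import defaultdict

    corpus = [(["casa", "verde"], ["green", "house"]),   # (foreign, english)
              (["la", "casa"],    ["the", "house"])]

    # Init: uniform t(f|e)
    e_vocab = {e for _, E in corpus for e in E}
    f_vocab = {f for F, _ in corpus for f in F}
    t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}

    for _ in range(10):
        counts, totals = defaultdict(float), defaultdict(float)
        for F, E in corpus:
            # E-step: posterior P(A|F,E) over all |E|^|F| alignments
            alignments = list(product(range(len(E)), repeat=len(F)))
            scores = []
            for A in alignments:
                p = 1.0
                for j, i in enumerate(A):
                    p *= t[(F[j], E[i])]
                scores.append(p)
            z = sum(scores)
            # accumulate expected counts of e generating f
            for A, s in zip(alignments, scores):
                w = s / z
                for j, i in enumerate(A):
                    counts[(F[j], E[i])] += w
                    totals[E[i]] += w
        # M-step: renormalize to get the new t(f|e)
        t = {(f, e): counts[(f, e)] / totals[e] for (f, e) in t if totals[e] > 0}

    print(round(t[("casa", "house")], 3))   # converges towards 1.0
    ```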

  • 49

    Computing Translation Probabilities in

    Model 1

    � E-M algo is fine, but it requires exponential computation

    � For each alignment we recompute alignment probability

    � Translation probability is computed from all alignment probabilities

    � We need efficient algo

    50

    Form of Eq. 10 suggests that EM algorithm can be used

    51 52

    From Exponential to polynomial computation
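    The key rearrangement behind this step (standard for Model 1; sketched here under the Model 1 assumptions above): Σ_A Π_j t(f_j|e_{a_j}) = Π_j Σ_i t(f_j|e_i), so the exponential sum over alignments collapses into an O(I·J) product, and the expected counts have a closed form.

    ```python
    from collections import defaultdict

    def model1_em_step(corpus, t):
        """One Model 1 EM iteration in O(I*J) per sentence pair (corpus: list of (F, E) word lists).

        Uses sum_A prod_j t(f_j|e_{a_j}) = prod_j sum_i t(f_j|e_i): the expected count
        of the pair (f, e) in a sentence is simply t(f|e) / sum_{e'} t(f|e')."""
        counts, totals = defaultdict(float), defaultdict(float)
        for F, E in corpus:
            for f in F:
                norm = sum(t.get((f, e), 0.0) for e in E)   # assumes t was initialized > 0 for co-occurring pairs
                for e in E:
                    c = t.get((f, e), 0.0) / norm           # posterior of f aligning to e
                    counts[(f, e)] += c
                    totals[e] += c
        return {(f, e): counts[(f, e)] / totals[e] for (f, e) in counts}
    ```

    Running model1_em_step repeatedly on the toy corpus above reproduces the hand-worked numbers without ever enumerating alignments.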

    53

    Checkpoint

    � Use of EM algorithm for estimating word translation probabilities t(f|e) under IBM Model 1

    � An example

    � And an efficient algorithm

    54

    Generating Bi-directional Alignments

    � Existing models only generate uni-directional alignments

    � Combine two uni-directional alignments to get many-to-many bi-directional alignments

    [Figure: the candidate phrase-translation grid for कई बंगाली कवियों ने इस भूमि के गीत गाए हैं, as on the earlier slide]

  • 55

    Eng-Hindi Alignment

    [Figure: English–Hindi word-alignment grid for "Goa is a premier beach vacation destination" and its Hindi translation (ending "... गंतव्य है")]

  • 61

    Phrase Table

    62

    Checkpoint

    � Generating Phrase Table from sentence alignment

    63

    IBM Model 3

    Model 1 story seems bizarre – who would first choose the sentence length, then align, and then generate?

    A more likely case: generate a translation for each word and then reorder

    Model 1 Generative story

    64

    Model 3 Generative Story

    65

    Model 3 Formula – P(F,A|E)

    � Ignore generation from NULL

    � Choosing Fertility: Π_{i=1..I} n(φ_i | e_i)

    � Generating words: Π_{i=1..I} (φ_i!) · Π_{j=1..J} t(f_j | e_{a_j})

    � Aligning words: Π_{j: a_j ≠ 0} d(j | a_j, I, J)

    आप किसानों से संपर्क कीजिए

    you contact the farmers

    66

    Generating Spurious Words

    � Instead of using n(2|NULL) or n(1|NULL)

    � With probability p1, generate a spurious word every time a valid word is generated

    � Ensures that longer sentences generate more spurious words
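    A toy sketch of the Model 3 product from the formula slide for one fixed (F, A, E) triple; NULL generation and the spurious-word probability p1 are left out, and the probability tables n, t, d are hypothetical callables:

    ```python
    from math import factorial

    def model3_p_f_a_given_e(f_words, alignment, e_words, n, t, d):
        """P(F,A|E) ~ prod_i n(phi_i|e_i)*phi_i! * prod_j t(f_j|e_{a_j}) * prod_{j: a_j!=0} d(j|a_j,I,J).

        alignment is 1-based; a_j = 0 (NULL) is skipped in this sketch."""
        I, J = len(e_words), len(f_words)
        phi = [alignment.count(i) for i in range(1, I + 1)]    # fertility of each English word
        prob = 1.0
        for i, e in enumerate(e_words):
            prob *= n(phi[i], e) * factorial(phi[i])           # choose fertilities
        for j, (f, a_j) in enumerate(zip(f_words, alignment), start=1):
            if a_j == 0:
                continue
            prob *= t(f, e_words[a_j - 1])                     # translate each word
            prob *= d(j, a_j, I, J)                            # place it (distortion)
        return prob
    ```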

  • 67

    Training for Model 3

    � We used EM algo to estimate t(f|e) for Model1

    � Many parameters in Model 3

    � Given the alignments, all parameters can be estimated easily

    P(F,A|J,E) = P(A|J,E) · P(F|A,J,E) = ε/(I+1)^J · Π_{j=1..J} t(f_j | e_{a_j})

    68

    Model 3 Training cont..

    � We can use EM algo

    � For Model 3 – no efficient conversions of exponential computation to polynomial

    � Use Model 1, Model 2 to estimate the best few alignments

    � Use these best few alignments to estimate other parameters

    69

    Model 4 and Model 5

    � Model 3 allows independent movement of different foreign words aligned to a given English word

    � Model 4 allows the movement of the phrase as a whole

    � Models 1-4 are deficient: they assign probability to strings that are not sentences at all – various words stacked on top of each other

    � Model 5 removes this deficiency

    � Models 4 and 5 are quite complicated

    70

    Checkpoint

    � Bi-directional alignments – phrase table

    � Model 3 generative story

    � Model 3 parameters estimation

    71

    SMT Decoding

    72

    Search for the Best Translation

    Consider all possible partitions of a sentence.
    For a given partition, consider all possible translations of each part.
    Consider all possible combinations of all possible translations.
    Consider all possible permutations of each combination.

    And somehow select the best partition/translation/permutation

  • 73

    The Search Process

    74

    A Tall Task

    कई बंगाली कवियों ने इस भूमि के गीत गाए हैं (Kai Bangali kaviyon ne is bhoomi ke geet gaaye hain)

    [Figure: the candidate phrase-translation grid for the sentence above, as on the earlier slide]

    To this space have sung songs of many poets from Bangal

    512 possible segmentations

    584 possible phrase translation combinations

    120 possible reorderings

    For a 20-word sentence, the numbers will be tens of orders of magnitude higher

    75

    Even Simple Decoding is NP-Complete

    � Just pick one partition

    � Just pick one set of translations

    � Deciding among all possible orderings for the language model score is NP-Complete
    � Traveling Salesman Problem can be reduced to it

    w(ei->ej)=p(ej|ei)

    Source: www.umiacs.umd.edu/~nmadnani/pdf/decoding-slides.pdf

    76

    Cost of a path

    � If we are going to consider all possible candidates the cost can be evaluated at the end

    � Use of partial costs to prune the candidate space

    � do not take the unpromising paths

    � Keep top k candidates only at each stage

    � We can use language model probability of the candidate generated so far

    � Or we can take translation, language model, and distortion costs all into account

    cost(E, F) = P(E) · Π_{j=1..J} t(f_j | e_{a_j}) · Π_{j: a_j ≠ 0} d(j | a_j, I, J)

    At each round of the search:
    Expand all states on the stack
    Estimate cost for all states
    Keep top k candidates only

    77

    A* Search

    � Current scheme biases the search process towards high probability beginnings

    � We need a cost-estimate for the remaining sentence

    � Accurate estimation of future cost means exploring the entire solution space

    � Make some heuristic approximation

    � If we always underestimate the future cost,

    � find one complete solution

    � discard all partial solutions whose expected cost is higher than the cost of the solution found

    Total Cost f(p) = Current Cost g(p) + Future Cost h(p)

    78

    Beam Search in place of A*

    � Coming up with a heuristic that always underestimates is expensive

    � Approximate by ignoring the distortion cost and just take sum of the approximate language model cost and the translation cost

    � Best segmentation for Translation Model cost can be found using Dynamic Programming ??

    � Language Model cost of the best segments approximated as sum of the LM cost of the segments ignoring any relation between the segments

    At each round of the search:
    Expand all states on the stack by one phrase
    Estimate cost for all states
    Keep top k only
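    A skeletal stack/beam decoder in the spirit of the loop above (a sketch only: phrase_table and lm_cost are simplified placeholders, and the search covers the source left to right, i.e. no reordering):

    ```python
    import heapq

    def beam_decode(src_words, phrase_table, lm_cost, beam_size=10):
        """phrase_table: {source phrase tuple: [(target phrase tuple, translation cost), ...]}
        lm_cost: cost of a partial translation (list of words); lm_cost([]) is assumed to be 0."""
        # hypothesis = (cost so far, number of source words covered, partial translation)
        stack = [(0.0, 0, [])]
        while True:
            if all(covered == len(src_words) for _, covered, _ in stack):
                return min(stack)[2]                      # cheapest complete hypothesis
            expanded = []
            for cost, covered, output in stack:
                if covered == len(src_words):
                    expanded.append((cost, covered, output))
                    continue
                for end in range(covered + 1, len(src_words) + 1):
                    src_phrase = tuple(src_words[covered:end])
                    for tgt_phrase, trans_cost in phrase_table.get(src_phrase, []):
                        new_out = output + list(tgt_phrase)
                        # add translation cost plus the incremental LM cost of the new words
                        new_cost = cost + trans_cost + lm_cost(new_out) - lm_cost(output)
                        expanded.append((new_cost, end, new_out))
            if not expanded:
                raise ValueError("some source span has no phrase translation")
            stack = heapq.nsmallest(beam_size, expanded)   # prune: keep top-k lowest-cost states
    ```

    Real decoders additionally allow covering source words out of order, which is why hypotheses covering different numbers of foreign words are kept in separate stacks, as the next slide notes.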

  • 79

    Using Multiple Stacks

    � Comparing the costs across hypotheses covering different foreign words is not meaningful

    कई बंगाली कवियों ने – Lower Cost
    कई कवियों ने गीत गाए हैं

    80

    Final Algo

    81

    Decoding Errors

    � Model Error

    � Solution does not belong to the search space

    � Right answer cannot be found even with perfect search

    � Results from unseen usage

    � Search Error

    � Most decoding operations result in search error

    � Best solution not found

    82

    Decoding: Further Issues

    � Hypotheses Recombination

    � Length Penalty

    � Discriminative Models

    83

    Hypothesis Recombination

    84

    Hypothesis Recombination

  • 85

    Length Penalty

    Language Model cost is typically lower for shorter sentences.
    Compensate by introducing a length penalty.

    कई बंगाली कवियों ने इस भूमि के गीत गाए हैं
    कई कवियों ने इस के गीत गाए हैं – lower cost

    86

    Have we taken all relevant facts into

    account

    � Our generative model considers various factors:

    � Language Model Probability

    � Phrase Translation Probability

    � Distortion Probability

    � Equal Weight Given to each factor

    � That is nice in theory

    � In practice, some factors more important than others

    � Approximations involved in modeling and computation

    � Unequal weightage becomes even more important

    cost(E, F) = P(E) · Π_{j=1..J} t(f_j | e_{a_j}) · Π_{j: a_j ≠ 0} d(j | a_j, I, J)

    87

    Many more Factors

    � Length Penalty

    � Reverse Translation Model P(E|F)

    � Word Coverage

    � Unknown Word Penalty

    Depending on the chosen phrase, some source words may not be covered, or unknown target words may be introduced

    88

    Discriminative Model

    Learning Task: Pick the best set of feature weights

    Used in practical decoders like Moses
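    In practice this is the familiar log-linear setup (a sketch of the standard formulation, not a quote from the slide):

    score(E, F) = Σ_k λ_k · h_k(E, F),   Ê = argmax_E score(E, F)

    where typical features h_k are log P(E) from the language model, log P(F|E) and log P(E|F) from the phrase table, the distortion cost, and word/phrase penalties; the weights λ_k are tuned on held-out data (for example with minimum error rate training in Moses-style systems).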

    89

    Discriminative vs. Generative Models

    - Generative: cheap

    - Discriminative: better results

    90

    Making Discriminative Model Usable

    � Billions of features for discriminative models:
    � Sample feature: poets occurs in E, कवियों occurs in F
    � Sample feature: poet occurs in E, कवि occurs in F

    � Merge Generative and Discriminative Models

    � Features are based on Generative Models

    � Sample Feature: Translation Model cost, LM cost

  • 91

    Checkpoint

    � A simple Beam-Search based algorithm for decoding

    � Good cost function is important for search efficiency and goodness

    � Discriminative models

    92

    MT Evaluation

    � How will you evaluate whether the generated translation is any good

    Fluency of the given translation is:

    (4) Perfect: Good grammar

    (3) Fair: Easy-to-understand but flawed grammar

    (2)Acceptable: Broken - understandable with effort

    (1) Nonsense: Incomprehensible

    Adequacy: How much meaning of the reference

    sentence is conveyed in the translation?

    (4) All: No loss of meaning

    (3) Most: Most of the meaning is conveyed

    (2) Some: Some of the meaning is conveyed

    (1) None: Hardly any meaning is conveyed

    Somewhat corresponds to P(F|E)*P(E) – Language Model and Translation Model

    93

    Sample Evaluation

    Adequacy | Fluency | Hindi Output
    4 | 4 | जीवाण्विक संक्रमण से इसकी जड़ें प्रभावित होती हैं
    3 | 2 | इमु का पक्षी रेटाइट का परिवार को संबंधित होता है और थोड़ा यह शुतुरमुर्ग के साथ समान दिखती हैं
    1 | 1 | हमें मेथी का फसल के बाद बोता है मेथी और धनिया फसल को अच्छे बढ़ने या अच्छे बढ़ने बता
    4 | 3 | परीक्षण के अनुरूप नियमित रूप से फलों की अच्छी वृद्धि के लिए खादों की खुराकें दी जानी चाहिए
    3 | 2 | कवक के कारण आम की नाजुक पत्तियां यदि झुलसे रहे हैं 0.5 प्रतिशत का बोर्डो मिश्रण 10 लीटर पानी के साथ तो छिड़का जाना चाहिए

    94

    Other Measures

    � Informativeness: Is the translation output sufficiently good for some task:

    � Comprehension: Multiple choice questions given based on the original passage. Raters answer the questions based on the translation. Percentage correct is the score

    � Topic Identification

    � Cross Lingual Information Retrieval: identifying relevant documents

    � Edit-cost : Effort required to edit the output to make it acceptable – number of edit operations needed, or time taken

    95

    Manual Evaluation

    � Issues ??

    96

    Automatic Evaluation: BLEU

    � Generate Reference Translations

    � Use Precision as the measure

    Candidate: The military always obeys the commands of the party.

    Reference: The military forces are always under the command of the Party.

    Unigram Match: 8/9, Bigram Match: 4/8, Trigram Match: ??

    Issues ??

  • 97

    Improving over Raw Precision

    Multiple Reference Translations

    Cap the candidate word occurrence by maximum occurrence in reference

    From Papineni et al., ACL 2002

    98

    BLEU Formulae

    Combine the various n-gram precisions using the geometric mean

    From Callison-Burch et al.

    Brevity Penalty: Shorter sentences can have higher precision.
    Candidate: The military. Reference: The military forces are always under the command of the Party.

    c: candidate length, r: reference length

    Precision: Average n-gram match over all sentences S in the reference corpus C. For a given sentence, consider the best reference among all candidate references.
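    A self-contained sketch of modified n-gram precision plus the brevity penalty for a single candidate and a single reference (illustration only; real BLEU is corpus-level and normally computed with existing tools):

    ```python
    import math
    from collections import Counter

    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def bleu(candidate, reference, max_n=4):
        cand, ref = candidate.lower().split(), reference.lower().split()
        precisions = []
        for n in range(1, max_n + 1):
            cand_ngrams, ref_ngrams = ngrams(cand, n), ngrams(ref, n)
            # clip candidate counts by the maximum count seen in the reference
            overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
            total = max(sum(cand_ngrams.values()), 1)
            precisions.append(max(overlap, 1e-9) / total)   # smooth zeros for this toy example
        bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
        return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

    print(bleu("the military always obeys the commands of the party",
               "the military forces are always under the command of the party"))
    ```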

    99

    BLEU Limitations

    Candidate: कीर्ति मंदिर एक राष्ट्रीय स्मारक है क्योंकि यह birthplace है महात्मा गांधी का है .
    Reference: कीर्ति मंदिर राष्ट्रीय इमारत है क्योंकि यह महात्मा गांधी का जन्म स्थल है .
    4-gram BLEU: ??

    Candidate: चामुंडा देवी का मंदिर वर्ष भर एक आदर्श पिकनिक स्थल है क्योंकि इसमें एक आसान approach और एक प्रभावशाली दृश्य है

    Reference: चामुंडा देवी मंदिर वर्ष भर एक आदर्श प्रमोद स्थल है क्योंकि इसकी पहुंच आसान है और दृश्य प्रभावी

    4-gram BLEU:

    Despite these limitations, BLEU stays popular because we need some automatic measure.

    100

    Evaluation Checkpoint

    � Automatic Evaluation Metric is important

    � BLEU is the most popular metric

    � Serious limitations for Indian Languages

    101

    Alignment/SMT applications

    � Not just an application, but a fundamental building block

    � We saw the application to Transliteration

    � Can be applied to Question-Answering, as well.

    102

    QA Patterns

    � Q : Who invented the gramophone?

    � 1. invented the gramophone

    � 2. was the inventor of the gramophone

    � 3.’s invention is the gramophone

    � 4. was the father of the gramophone

    � 5. received a patent for the gramophone

  • 103

    Diagrams converted into pictures in

    next slides

    104

    इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए

    For this you contact the farmers of Manchar region

    आप किसानों से संपर्क कीजिए

    you contact the farmers

    105

    इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए

    For this you contact the farmers of Manchar region

    106

    इसके लिए किसानों से मिलिये

    For this you contact the farmers

    107

    इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए

    For this you contact the farmers of Manchar region

    108

    OchNey03 Heuristic: Intuition

    � Decide the intersection

    � Extend it by adding alignments from the union if both the words in union alignment are not already aligned in the final alignment

    � Then add an alignment only if:� It already has an adjacent alignment in the final alignment, and,

    � Adding it will not cause any final alignment to have both horizontal and vertical neighbors as final alignments
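    A rough sketch of this style of symmetrization (start from the intersection, then grow using the union), with simplified acceptance tests rather than the exact OchNey03 conditions:

    ```python
    def symmetrize(e2f, f2e):
        """Combine two uni-directional alignments given as sets of (e_pos, f_pos) links."""
        final = set(e2f) & set(f2e)            # start with the intersection
        union = set(e2f) | set(f2e)
        added = True
        while added:
            added = False
            for (i, j) in sorted(union - final):
                e_aligned = any(i == i2 for (i2, _) in final)
                f_aligned = any(j == j2 for (_, j2) in final)
                has_neighbour = any(abs(i - i2) + abs(j - j2) == 1 for (i2, j2) in final)
                # add a union link if it touches an unaligned word and sits next to an existing link
                if has_neighbour and (not e_aligned or not f_aligned):
                    final.add((i, j))
                    added = True
        return final

    # toy usage: positions are 0-based indices into the two sentences
    print(symmetrize({(0, 1), (1, 0), (2, 2)}, {(0, 1), (2, 2), (3, 2)}))
    ```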