
CSA4050: Advanced Techniques in NLP

Machine Translation III: Statistical MT

Jan 2005


Statistical Translation

• Robust

• Domain independent

• Extensible

• Does not require language specialists

• Uses noisy channel model of translation


Noisy Channel Model: Sentence Translation (Brown et al. 1990)

[Diagram: noisy channel model – a source sentence passes through a noisy channel to produce the target sentence; decoding recovers the source sentence.]


The Problem of Translation

• Given a sentence T of the target language, seek the sentence S from which a translator produced T, i.e. find the S that maximises P(S|T).

• By Bayes' theorem: P(S|T) = P(S) × P(T|S) / P(T), whose denominator is independent of S.

• Hence it suffices to maximise P(S) × P(T|S).


A Statistical MT System

[Diagram: a source language model supplies P(S); a translation model supplies P(T|S); given T, the decoder searches for the S that maximises P(S) × P(T|S), which is proportional to P(S|T).]


The Three Components of a Statistical MT model

1. Method for computing language model probabilities (P(S))

2. Method for computing translation probabilities (P(T|S))

3. Method for searching amongst source sentences for one that maximises P(S) × P(T|S)
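
A minimal sketch of how the three components fit together at decoding time, assuming hypothetical lm_prob and tm_prob functions (components 1 and 2) and a small explicit candidate set standing in for the search of component 3:

```python
def decode(target, candidates, lm_prob, tm_prob):
    """Return the candidate source sentence S that maximises P(S) * P(T|S).
    `candidates` stands in for the huge space a real decoder has to search."""
    return max(candidates, key=lambda s: lm_prob(s) * tm_prob(target, s))
```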


Probabilistic Language Models

• General (chain rule):
P(s1 s2 ... sn) = P(s1) × P(s2|s1) × ... × P(sn|s1 ... s(n-1))

• Trigram:
P(s1 s2 ... sn) = P(s1) × P(s2|s1) × P(s3|s1 s2) × ... × P(sn|s(n-2) s(n-1))

• Bigram:
P(s1 s2 ... sn) = P(s1) × P(s2|s1) × ... × P(sn|s(n-1))
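
As an illustration of the bigram case, a minimal sketch that estimates the probabilities from counts over tokenised sentences; the add-one smoothing and the `<s>` start symbol are assumptions made for the sketch, not part of the slide:

```python
from collections import Counter

def train_bigram(sentences):
    """Collect unigram and bigram counts from tokenised training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(sentence, unigrams, bigrams, vocab_size):
    """P(s1 s2 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s(n-1)),
    with add-one smoothing so unseen bigrams get a small non-zero probability."""
    p, prev = 1.0, "<s>"
    for w in sentence:
        p *= (bigrams[(prev, w)] + 1) / (unigrams[prev] + vocab_size)
        prev = w
    return p
```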


A Simple Alignment Based Translation Model

Assumption: target sentence is generated from the source sentence word-by-word

S: John loves Mary

T: Jean aime Marie


Sentence Translation Probability

• According to this model, the translation probability of the sentence is just the product of the translation probabilities of the words.

• P(T|S) = P(Jean aime Marie | John loves Mary) = P(Jean|John) × P(aime|loves) × P(Marie|Mary)
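
A direct rendering of that product, assuming a dictionary `t_prob` of word translation probabilities P(t|s) and the one-to-one, in-order pairing this simple model uses:

```python
from math import prod

def sentence_translation_prob(target, source, t_prob):
    """P(T|S) as a product of word translation probabilities, assuming target
    word j is generated by source word j (one-to-one, in sentence order)."""
    return prod(t_prob.get((t, s), 0.0) for t, s in zip(target, source))

# e.g. sentence_translation_prob("Jean aime Marie".split(),
#                                "John loves Mary".split(), t_prob)
```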


More Realistic Example

The proposal will not now be implemented

Les propositions ne seront pas mises en application maintenant


Some Further Parameters

• Word Translation Probability: P(t|s)

• Fertility: the number of words in the target that are paired with each source word: (0 – N)

• Distortion: the difference in sentence position between the source word and the target word: P(i|j,l)
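
A rough sketch of how these three parameter types might combine to score one candidate alignment; the dictionaries t_prob, fert_prob and dist_prob stand in for the estimated tables, and the sketch ignores the NULL word and the combinatorial factors of the full IBM models, so it illustrates the idea rather than the exact formula of Brown et al.:

```python
from math import prod

def alignment_score(source, target, alignment, t_prob, fert_prob, dist_prob):
    """Score one (target, alignment) pair. alignment[i] = position j of the
    source word assumed to have produced the target word at position i."""
    l = len(target)
    # fertility: how many target words each source word is paired with
    fertility = [alignment.count(j) for j in range(len(source))]
    score = prod(fert_prob.get((f, s), 1e-12) for f, s in zip(fertility, source))
    for i, j in enumerate(alignment):
        score *= t_prob.get((target[i], source[j]), 1e-12)   # word translation P(t|s)
        score *= dist_prob.get((i, j, l), 1e-12)             # distortion P(i|j,l)
    return score
```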


Searching

• Maintain a list of hypotheses. Initial hypothesis: (Jean aime Marie | *)

• Search proceeds iteratively. At each iteration we extend the most promising hypotheses with additional words, e.g.:
Jean aime Marie | John(1) *
Jean aime Marie | * loves(2) *
Jean aime Marie | * Mary(3) *
Jean aime Marie | Jean(1) *
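
A minimal sketch of this kind of hypothesis-list search, assuming a hypothetical `score` function that estimates P(S) × P(T|S) for a partial source hypothesis and a fixed beam width; the actual decoder in Brown et al. is considerably more elaborate:

```python
import heapq

def beam_decode(target, vocabulary, score, beam_width=10, max_len=10):
    """Iteratively extend the most promising partial source hypotheses."""
    beam = [()]                      # start from the empty hypothesis, i.e. (T | *)
    best_hyp, best_score = (), float("-inf")
    for _ in range(max_len):
        candidates = []
        for hyp in beam:
            for word in vocabulary:  # extend each hypothesis with one more word
                new_hyp = hyp + (word,)
                candidates.append((score(new_hyp, target), new_hyp))
        top = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        beam = [hyp for _, hyp in top]
        if top and top[0][0] > best_score:
            best_score, best_hyp = top[0]
    return best_hyp
```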


Parameter Estimation

• In general, large quantities of data are required.

• For language model, we need only source language text.

• For translation model, we need pairs of sentences that are translations of each other.

• Use EM Algorithm (Baum 1972) to optimize model parameters.
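
A minimal sketch of EM for the simplest word-translation model (an IBM Model 1 style estimator with no fertility or distortion and no NULL word), assuming `pairs` is a list of (source_words, target_words) sentence pairs; it illustrates the expectation/maximisation loop rather than the full model actually trained by Brown et al.:

```python
from collections import defaultdict

def train_word_translation(pairs, iterations=10):
    """EM estimation of word translation probabilities t(target_word | source_word)."""
    # initialise uniformly: every target word equally likely for every source word
    t = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts for (target_word, source_word)
        total = defaultdict(float)   # expected counts for source_word
        for source, target in pairs:
            for t_word in target:
                # E-step: share each target word's unit count over the source words
                norm = sum(t[(t_word, s_word)] for s_word in source)
                for s_word in source:
                    c = t[(t_word, s_word)] / norm
                    count[(t_word, s_word)] += c
                    total[s_word] += c
        # M-step: re-estimate the translation probabilities from the expected counts
        t = defaultdict(float, {pair: count[pair] / total[pair[1]] for pair in count})
    return t

# e.g. train_word_translation([("John loves Mary".split(), "Jean aime Marie".split()),
#                              ("Mary loves John".split(), "Marie aime Jean".split())])
```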


Experiment 1 (Brown et al. 1990)

• Hansard corpus: 40,000 pairs of sentences, approx. 800,000 words in each language.

• Considered 9,000 most common words in each language.

• Assumptions (initial parameter values):
– each of the 9,000 target words equally likely as translations of each of the source words
– each of the fertilities from 0 to 25 equally likely for each of the 9,000 source words
– each target position equally likely given each source position and target length
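
In code, those starting values are just uniform constants (the 9,000-word vocabularies and the 0 to 25 fertility range are the figures from this slide):

```python
NUM_TARGET_WORDS = 9000
MAX_FERTILITY = 25

t_init = 1 / NUM_TARGET_WORDS        # each target word equally likely per source word
fert_init = 1 / (MAX_FERTILITY + 1)  # fertilities 0..25 equally likely
# distortion: each target position equally likely given source position and
# target length, i.e. 1 / target_length for a sentence of that length
```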


English: the

French Probability

le .610

la .178

l’ .083

les .023

ce .013

il .012

de .009

à .007

que .007

Fertility Probability

1 .871

0 .124

2 .004


English: not

French Probability

pas .469

ne .460

non .024

pas du tout .003

faux .003

plus .002

ce .002

que .002

jamais .002

Fertility Probability

2 .758

0 .133

1 .106


English: hear

French Probability

bravo .992

entendre .005

entendu .002

entends .001

Fertility Probability

0 .584

1 .416


Bajada 2003/4

• 400 sentence pairs from Malta/EU accession treaty

• Three different types of alignment:
– Paragraph (precision 97%, recall 97%)
– Sentence (precision 91%, recall 95%)
– Word: two translation models
• Model 1: distortion independent
• Model 2: distortion dependent


Bajada 2003/4

                       Model 1   Model 2
word pairs present        244       244
word pairs identified     145       145
correct                    58        77
incorrect                  87        68
precision                 40%       53%
recall                    24%       32%
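
For reference, the precision and recall figures above are the usual ratios over the 244 word pairs present and the 145 identified:

```python
def precision_recall(correct, identified, present):
    """precision = correct / identified, recall = correct / present"""
    return correct / identified, correct / present

print(precision_recall(58, 145, 244))   # Model 1: approximately (0.40, 0.24)
print(precision_recall(77, 145, 244))   # Model 2: approximately (0.53, 0.32)
```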


Experiment 2

• Translation performed using the 1,000 most frequent words in the English corpus.

• The 1,700 most frequently used French words in translations of sentences completely covered by the 1,000-word English vocabulary.

• 117,000 pairs of sentences completely covered by both vocabularies.

• Parameters of the English language model estimated from 570,000 sentences in the English part of the corpus.


Experiment 2 (contd.)

• 73 French sentences from elsewhere in the corpus were tested. Results were classified as:
– Exact: same as actual translation
– Alternate: same meaning
– Different: legitimate translation but different meaning
– Wrong: could not be interpreted as a translation
– Ungrammatical: grammatically deficient

• Corrections to the last three categories were made and keystrokes were counted.


Results

Category        # sentences   percent
Exact                 4           5
Alternate            18          25
Different            13          18
Wrong                11          15
Ungrammatical        27          37
Total                73


Results - Discussion

• According to Brown et al., the system performed successfully 48% of the time (the first three categories).

• 776 keystrokes were needed to repair the system's output, compared with 1,916 keystrokes to produce all 73 translations from scratch.

• According to the authors, the system therefore reduces the work by about 60%.
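
A quick check of those two figures, using only the numbers already on these slides:

```python
# Exact + Alternate + Different out of 73 test sentences
print((4 + 18 + 13) / 73)      # ≈ 0.48, i.e. 48% judged successful
# keystrokes to repair vs. keystrokes to type all 73 translations from scratch
print(1 - 776 / 1916)          # ≈ 0.595, i.e. roughly a 60% reduction in work
```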


Bibliography

• Statistical MT: Brown et al., "A Statistical Approach to Machine Translation", Computational Linguistics 16(2), 1990, pp. 79-85 (search "ACL Anthology").