
CSA4050: Advanced Techniques in NLP

Machine Translation III: Statistical MT

Jan 2005


Statistical Translation

• Robust

• Domain independent

• Extensible

• Does not require language specialists

• Uses noisy channel model of translation


Noisy Channel Model: Sentence Translation (Brown et al. 1990)

[Diagram: noisy channel model – a source sentence passes through a noisy channel to produce the target sentence; decoding recovers the source sentence.]


The Problem of Translation

• Given a sentence T of the target language, seek the sentence S from which a translator produced T, i.e. find the S that maximises P(S|T).

• By Bayes' theorem: P(S|T) = P(S) × P(T|S) / P(T), whose denominator is independent of S.

• Hence it suffices to maximise P(S) × P(T|S).


A Statistical MT System

[Diagram: a source language model supplies P(S); a translation model supplies P(T|S); given T, the decoder searches for the S that maximises P(S) × P(T|S), which is proportional to P(S|T).]


The Three Components of a Statistical MT model

1. Method for computing language model probabilities (P(S))

2. Method for computing translation probabilities (P(T|S))

3. Method for searching amongst source sentences for one that maximises P(S) × P(T|S)
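
A minimal sketch of how the three components fit together at decoding time, assuming hypothetical lm_prob and tm_prob functions (components 1 and 2) and a small explicit candidate set standing in for the search of component 3:

```python
def decode(target, candidates, lm_prob, tm_prob):
    """Return the candidate source sentence S that maximises P(S) * P(T|S).
    `candidates` stands in for the huge space a real decoder has to search."""
    return max(candidates, key=lambda s: lm_prob(s) * tm_prob(target, s))
```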


Probabilistic Language Models

• General (chain rule):
P(s1 s2 ... sn) = P(s1) × P(s2|s1) × ... × P(sn|s1 ... s(n-1))

• Trigram:
P(s1 s2 ... sn) = P(s1) × P(s2|s1) × P(s3|s1 s2) × ... × P(sn|s(n-2) s(n-1))

• Bigram:
P(s1 s2 ... sn) = P(s1) × P(s2|s1) × ... × P(sn|s(n-1))
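
As an illustration of the bigram case, a minimal sketch that estimates the probabilities from counts over tokenised sentences; the add-one smoothing and the `<s>` start symbol are assumptions made for the sketch, not part of the slide:

```python
from collections import Counter

def train_bigram(sentences):
    """Collect unigram and bigram counts from tokenised training sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(sentence, unigrams, bigrams, vocab_size):
    """P(s1 s2 ... sn) = P(s1) * P(s2|s1) * ... * P(sn|s(n-1)),
    with add-one smoothing so unseen bigrams get a small non-zero probability."""
    p, prev = 1.0, "<s>"
    for w in sentence:
        p *= (bigrams[(prev, w)] + 1) / (unigrams[prev] + vocab_size)
        prev = w
    return p
```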


A Simple Alignment Based Translation Model

Assumption: target sentence is generated from the source sentence word-by-word

S: John loves Mary

T: Jean aime Marie


Sentence Translation Probability

• According to this model, the translation probability of the sentence is just the product of the translation probabilities of the words.

• P(T|S) = P(Jean aime Marie | John loves Mary) = P(Jean|John) × P(aime|loves) × P(Marie|Mary)
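
A direct rendering of that product, assuming a dictionary `t_prob` of word translation probabilities P(t|s) and the one-to-one, in-order pairing this simple model uses:

```python
from math import prod

def sentence_translation_prob(target, source, t_prob):
    """P(T|S) as a product of word translation probabilities, assuming target
    word j is generated by source word j (one-to-one, in sentence order)."""
    return prod(t_prob.get((t, s), 0.0) for t, s in zip(target, source))

# e.g. sentence_translation_prob("Jean aime Marie".split(),
#                                "John loves Mary".split(), t_prob)
```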


More Realistic Example

The proposal will not now be implemented

Les propositions ne seront pas mises en application maintenant


Some Further Parameters

• Word Translation Probability: P(t|s)

• Fertility: the number of words in the target that are paired with each source word: (0 – N)

• Distortion: the difference in sentence position between the source word and the target word: P(i|j,l)
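
A rough sketch of how these three parameter types might combine to score one candidate alignment; the dictionaries t_prob, fert_prob and dist_prob stand in for the estimated tables, and the sketch ignores the NULL word and the combinatorial factors of the full IBM models, so it illustrates the idea rather than the exact formula of Brown et al.:

```python
from math import prod

def alignment_score(source, target, alignment, t_prob, fert_prob, dist_prob):
    """Score one (target, alignment) pair. alignment[i] = position j of the
    source word assumed to have produced the target word at position i."""
    l = len(target)
    # fertility: how many target words each source word is paired with
    fertility = [alignment.count(j) for j in range(len(source))]
    score = prod(fert_prob.get((f, s), 1e-12) for f, s in zip(fertility, source))
    for i, j in enumerate(alignment):
        score *= t_prob.get((target[i], source[j]), 1e-12)   # word translation P(t|s)
        score *= dist_prob.get((i, j, l), 1e-12)             # distortion P(i|j,l)
    return score
```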


Searching

• Maintain a list of hypotheses. Initial hypothesis: (Jean aime Marie | *)

• Search proceeds iteratively. At each iteration we extend the most promising hypotheses with additional words, e.g.:
Jean aime Marie | John(1) *
Jean aime Marie | * loves(2) *
Jean aime Marie | * Mary(3) *
Jean aime Marie | Jean(1) *
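
A minimal sketch of this kind of hypothesis-list search, assuming a hypothetical `score` function that estimates P(S) × P(T|S) for a partial source hypothesis and a fixed beam width; the actual decoder in Brown et al. is considerably more elaborate:

```python
import heapq

def beam_decode(target, vocabulary, score, beam_width=10, max_len=10):
    """Iteratively extend the most promising partial source hypotheses."""
    beam = [()]                      # start from the empty hypothesis, i.e. (T | *)
    best_hyp, best_score = (), float("-inf")
    for _ in range(max_len):
        candidates = []
        for hyp in beam:
            for word in vocabulary:  # extend each hypothesis with one more word
                new_hyp = hyp + (word,)
                candidates.append((score(new_hyp, target), new_hyp))
        top = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
        beam = [hyp for _, hyp in top]
        if top and top[0][0] > best_score:
            best_score, best_hyp = top[0]
    return best_hyp
```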


Parameter Estimation

• In general, large quantities of data are required.

• For language model, we need only source language text.

• For translation model, we need pairs of sentences that are translations of each other.

• Use EM Algorithm (Baum 1972) to optimize model parameters.
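
A minimal sketch of EM for the simplest word-translation model (an IBM Model 1 style estimator with no fertility or distortion and no NULL word), assuming `pairs` is a list of (source_words, target_words) sentence pairs; it illustrates the expectation/maximisation loop rather than the full model actually trained by Brown et al.:

```python
from collections import defaultdict

def train_word_translation(pairs, iterations=10):
    """EM estimation of word translation probabilities t(target_word | source_word)."""
    # initialise uniformly: every target word equally likely for every source word
    t = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts for (target_word, source_word)
        total = defaultdict(float)   # expected counts for source_word
        for source, target in pairs:
            for t_word in target:
                # E-step: share each target word's unit count over the source words
                norm = sum(t[(t_word, s_word)] for s_word in source)
                for s_word in source:
                    c = t[(t_word, s_word)] / norm
                    count[(t_word, s_word)] += c
                    total[s_word] += c
        # M-step: re-estimate the translation probabilities from the expected counts
        t = defaultdict(float, {pair: count[pair] / total[pair[1]] for pair in count})
    return t

# e.g. train_word_translation([("John loves Mary".split(), "Jean aime Marie".split()),
#                              ("Mary loves John".split(), "Marie aime Jean".split())])
```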


Experiment 1 (Brown et al. 1990)

• Hansard corpus: 40,000 pairs of sentences, approx. 800,000 words in each language.

• Considered 9,000 most common words in each language.

• Assumptions (initial parameter values):
– each of the 9,000 target words equally likely as translations of each of the source words
– each of the fertilities from 0 to 25 equally likely for each of the 9,000 source words
– each target position equally likely given each source position and target length
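
In code, those starting values are just uniform constants (the 9,000-word vocabularies and the 0 to 25 fertility range are the figures from this slide):

```python
NUM_TARGET_WORDS = 9000
MAX_FERTILITY = 25

t_init = 1 / NUM_TARGET_WORDS        # each target word equally likely per source word
fert_init = 1 / (MAX_FERTILITY + 1)  # fertilities 0..25 equally likely
# distortion: each target position equally likely given source position and
# target length, i.e. 1 / target_length for a sentence of that length
```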


English: the

French Probability

le .610

la .178

l’ .083

les .023

ce .013

il .012

de .009

à .007

que .007

Fertility Probability

1 .871

0 .124

2 .004


English: not

French Probability

pas .469

ne .460

non .024

pas du tout .003

faux .003

plus .002

ce .002

que .002

jamais .002

Fertility Probability

2 .758

0 .133

1 .106


English: hear

French Probability

bravo .992

entendre .005

entendu .002

entends .001

Fertility Probability

0 .584

1 .416


Bajada 2003/4

• 400 sentence pairs from Malta/EU accession treaty

• Three different types of alignment:
– Paragraph (precision 97%, recall 97%)
– Sentence (precision 91%, recall 95%)
– Word: two translation models
• Model 1: distortion independent
• Model 2: distortion dependent


Bajada 2003/4

                       Model 1   Model 2
word pairs present        244       244
word pairs identified     145       145
correct                    58        77
incorrect                  87        68
precision                 40%       53%
recall                    24%       32%
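
For reference, the precision and recall figures above are the usual ratios over the 244 word pairs present and the 145 identified:

```python
def precision_recall(correct, identified, present):
    """precision = correct / identified, recall = correct / present"""
    return correct / identified, correct / present

print(precision_recall(58, 145, 244))   # Model 1: approximately (0.40, 0.24)
print(precision_recall(77, 145, 244))   # Model 2: approximately (0.53, 0.32)
```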


Experiment 2

• Translation performed using the 1,000 most frequent words in the English corpus.

• The 1,700 most frequently used French words in translations of sentences completely covered by the 1,000-word English vocabulary.

• 117,000 pairs of sentences completely covered by both vocabularies.

• Parameters of the English language model estimated from 570,000 sentences in the English part of the corpus.


Experiment 2 (contd.)

• 73 French sentences from elsewhere in the corpus were tested. Results were classified as:
– Exact: same as actual translation
– Alternate: same meaning
– Different: legitimate translation but different meaning
– Wrong: could not be interpreted as a translation
– Ungrammatical: grammatically deficient

• Corrections to the last three categories were made and keystrokes were counted.


Results

Category        # sentences   percent
Exact                 4           5
Alternate            18          25
Different            13          18
Wrong                11          15
Ungrammatical        27          37
Total                73


Results - Discussion

• According to Brown et al., the system performed successfully 48% of the time (the first three categories).

• 776 keystrokes were needed to repair the system's output, compared with 1,916 keystrokes to produce all 73 translations from scratch.

• According to the authors, the system therefore reduces the work by about 60%.
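
A quick check of those two figures, using only the numbers already on these slides:

```python
# Exact + Alternate + Different out of 73 test sentences
print((4 + 18 + 13) / 73)      # ≈ 0.48, i.e. 48% judged successful
# keystrokes to repair vs. keystrokes to type all 73 translations from scratch
print(1 - 776 / 1916)          # ≈ 0.595, i.e. roughly a 60% reduction in work
```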


Bibliography

• Statistical MT: Brown et al., "A Statistical Approach to Machine Translation", Computational Linguistics 16(2), 1990, pp. 79-85 (search "ACL Anthology").