CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 17 – Alignment in SMT), Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 14th Feb, 2011


Page 1:

CS460/626 : Natural Language Processing/Speech, NLP and the Web

(Lecture 17 – Alignment in SMT)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

14th Feb, 2011

Page 2:

Language Divergence Theory: Lexico-Semantic Divergences (ref: Dave, Parikh, Bhattacharyya, Journal of MT, 2002)

• Conflational divergence: F: vomir, E: to be sick; E: stab, H: churaa se maaranaa (knife-with hit); S: Utrymningsplan, E: escape plan
• Structural divergence: E: SVO; H: SOV
• Categorial divergence: the change is in POS category (many examples discussed)
• Head-swapping divergence: E: Prime Minister of India; H: bhaarat ke pradhaan mantrii (India-of Prime Minister)
• Lexical divergence: E: advise; H: paraamarsh denaa (advice give); noun incorporation, a very common Indian-language phenomenon

Page 3:

Language Divergence Theory: Syntactic Divergences

• Constituent-order divergence: E: Singh, the PM of India, will address the nation today; H: bhaarat ke pradhaan mantrii, singh, … (India-of PM, Singh, …)
• Adjunction divergence: E: She will visit here in the summer; H: vah yahaa garmii meM aayegii (she here summer-in will come)
• Preposition-stranding divergence: E: Who do you want to go with?; H: kisake saath aap jaanaa chaahate ho? (who with …)
• Null-subject divergence: E: I will go; H: jaauMgaa (subject dropped)
• Pleonastic divergence: E: It is raining; H: baarish ho rahii hai (rain happening is: no translation of "it")

Page 4:

Alignment

• Completely aligned: Your answer is right ↔ Votre réponse est juste
• Problematic alignment: We first met in Paris ↔ Nous nous sommes rencontrés pour la première fois à Paris

Page 5:

The Statistical MT model: notation

• Source language: F
• Target language: E
• Source language sentence: f
• Target language sentence: e
• Source language word: $w_f$
• Target language word: $w_e$

Page 6:

The Statistical MT model

To translate f:

1. Assume that all sentences in E are translations of f with some probability!
2. Choose the translation with the highest probability:

$\hat{e} = \arg\max_{e} P(e \mid f)$

Page 7:

SMT Model

• What is a good translation?
  – Faithful to source
  – Fluent in target

By Bayes' rule, $P(e \mid f) \propto P(e)\,P(f \mid e)$, so

$\hat{e} = \arg\max_{e} P(e)\,P(f \mid e)$

where $P(e)$ captures fluency and $P(f \mid e)$ captures faithfulness.

Page 8:

Language Modeling

• Task: to find P(e) (assigning probabilities to sentences)

If $e = w_1 w_2 \ldots w_n$,

$P(e) = P(w_1 w_2 \ldots w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 w_2 \ldots w_{n-1})$

$P(w_n \mid w_1 w_2 \ldots w_{n-1}) = \dfrac{count(w_1 w_2 \ldots w_{n-1} w_n)}{count(w_1 w_2 \ldots w_{n-1})}$

Page 9:

Language Modeling: The N-gram approximation

• Probability of a word given the previous N-1 words
• N=2: bigram approximation
• N=3: trigram approximation
• Bigram approximation:

$P(e) = P(w_1 w_2 \ldots w_n) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_2) \cdots P(w_n \mid w_{n-1})$

$P(w_n \mid w_{n-1}) = \dfrac{count(w_{n-1} w_n)}{count(w_{n-1})}$
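As a concrete illustration of the bigram estimate above (not from the original slides), here is a minimal sketch in Python; the toy corpus, the function names, and the sentence-boundary markers are invented for the example:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Estimate bigram probabilities P(w_n | w_{n-1}) by MLE counts."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(words[:-1])
        bigrams.update(zip(words[:-1], words[1:]))
    # P(w_n | w_{n-1}) = count(w_{n-1} w_n) / count(w_{n-1})
    return {(prev, w): c / unigrams[prev] for (prev, w), c in bigrams.items()}

def sentence_prob(lm, sentence):
    """P(e) as a product of bigram probabilities (0 for unseen bigrams)."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, w in zip(words[:-1], words[1:]):
        p *= lm.get((prev, w), 0.0)
    return p

lm = train_bigram_lm(["your answer is right", "your answer is here"])
print(sentence_prob(lm, "your answer is right"))  # 0.5: "is right" seen once of two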

Page 10:

Translation Modeling

• Task: to find P(f|e)
• Cannot use the counts of f and e directly
• Approximation: compute P(f|e) using the product of word translation probabilities (IBM Model 1)
• Problem: how to calculate the word translation probabilities?
• Note: we do not have the counts; the training corpus is sentence-aligned, not word-aligned
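To make the approximation concrete, here is a hedged sketch of scoring P(f|e) under IBM Model 1 once a table of word translation probabilities exists (using the Model-1 likelihood form given later, on Page 33); the toy t-table values and the function name are invented for illustration:

```python
def model1_likelihood(f_words, e_words, t, epsilon=1.0):
    """IBM Model 1: P(f|e) = eps / (l+1)^m * prod_j sum_i t(f_j | e_i),
    where the English sentence gets a NULL word at position 0."""
    e_full = ["NULL"] + e_words
    l, m = len(e_words), len(f_words)
    p = epsilon / (l + 1) ** m
    for f in f_words:
        p *= sum(t.get((f, e), 0.0) for e in e_full)
    return p

# invented toy t-table: t[(foreign_word, english_word)]
t = {("votre", "your"): 0.9, ("response", "answer"): 0.8,
     ("est", "is"): 0.9, ("just", "right"): 0.7}
print(model1_likelihood("votre response est just".split(),
                        "your answer is right".split(), t))
```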

Page 11:

Word-alignment example

Ram(1) has(2) an(3) apple(4)
राम(1) के(2) पास(3) एक(4) सेब(5) है(6)

Page 12:

Expectation Maximization for the translation model

Page 13:

Expectation-Maximization algorithm

1. Start with uniform word translation probabilities
2. Use these probabilities to find the (fractional) counts
3. Use these new counts to recompute the word translation probabilities
4. Repeat the above steps till the values converge

Works because of the co-occurrence of words that are actually translations of each other.

It can be proven that EM converges. A code sketch of these steps follows.
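A minimal sketch of the four steps above in Python, under simplifying assumptions (no NULL word, no smoothing, a fixed number of iterations instead of a convergence test); em_model1 and the data layout are invented names for this illustration, not the lecture's code:

```python
from collections import defaultdict

def em_model1(corpus, iterations=10):
    """EM for IBM Model 1 word-translation probabilities.
    corpus: list of (f_words, e_words) pairs, each a list of words.
    Returns t[(f_word, e_word)], an estimate of t(f | e)."""
    f_vocab = {f for f_words, _ in corpus for f in f_words}
    # Step 1: start with uniform word translation probabilities
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):  # Step 4: repeat till values converge
        count = defaultdict(float)  # fractional counts c(f | e)
        total = defaultdict(float)  # normalizers, one per English word
        # Step 2: E-step, distribute each f word's mass over the e words
        for f_words, e_words in corpus:
            for f in f_words:
                norm = sum(t[(f, e)] for e in e_words)
                for e in e_words:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # Step 3: M-step, recompute probabilities from the new counts
        t = defaultdict(float,
                        {fe: count[fe] / total[fe[1]] for fe in count})
    return t
```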

Page 14:

The counts in IBM Model 1

Works by maximizing P(f|e) over the entire corpus.

For IBM Model 1, we get the following relationship:

$c(w_f \mid w_e; f, e) = \dfrac{t(w_f \mid w_e)}{t(w_f \mid w_{e_0}) + \cdots + t(w_f \mid w_{e_l})} \times A \times B$

where
• $c(w_f \mid w_e; f, e)$ is the fractional count of the alignment of $w_f$ with $w_e$ in f and e
• $t(w_f \mid w_e)$ is the probability of $w_f$ being the translation of $w_e$
• A is the count of $w_f$ in f
• B is the count of $w_e$ in e

Page 15:

The translation probabilities in IBM Model 1

$t(w_f \mid w_e) \propto \sum_{s=1}^{S} c(w_f \mid w_e; f^{(s)}, e^{(s)})$

To get $t(w_f \mid w_e)$, normalize such that

$\sum_{w_f} t(w_f \mid w_e) = 1$

Page 16:

English-French example of alignment

Completely aligned:

Your(1) answer(2) is(3) right(4)
Votre(1) réponse(2) est(3) juste(4)

Alignment: 1→1, 2→2, 3→3, 4→4

Problematic alignment:

We(1) first(2) met(3) in(4) Paris(5)
Nous(1) nous(2) sommes(3) rencontrés(4) pour(5) la(6) première(7) fois(8) à(9) Paris(10)

Alignment: 1→(1,2), 2→(5,6,7,8), 3→4, 4→9, 5→10

Fertility? Yes.
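The one-to-many links above can be read off as fertilities (the number of French words each English word generates, a notion that becomes an explicit parameter in IBM Models 3-5). A tiny illustration with invented structures:

```python
# alignment: English position -> French positions (1-based, from the slide)
alignment = {1: (1, 2), 2: (5, 6, 7, 8), 3: (4,), 4: (9,), 5: (10,)}
fertility = {e: len(fs) for e, fs in alignment.items()}
print(fertility)  # {1: 2, 2: 4, 3: 1, 4: 1, 5: 1} -- "first" has fertility 4
```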

Page 17:

EM for word alignment from sentence alignment: example

English:
(1) three rabbits (a b)
(2) rabbits of Grenoble (b c d)

French:
(1) trois lapins (w x)
(2) lapins de Grenoble (x y z)

Page 18:

Initial probabilities: each cell denotes t(a→w), t(a→x), etc.

      a     b     c     d
w    1/4   1/4   1/4   1/4
x    1/4   1/4   1/4   1/4
y    1/4   1/4   1/4   1/4
z    1/4   1/4   1/4   1/4

Page 19:

The counts in IBM Model 1 (restated from Page 14)

$c(w_f \mid w_e; f, e) = \dfrac{t(w_f \mid w_e)}{t(w_f \mid w_{e_0}) + \cdots + t(w_f \mid w_{e_l})} \times A \times B$

where A is the count of $w_f$ in f and B is the count of $w_e$ in e.

Page 20:

Example of expected count

c[a→w; (a b)(w x)]
  = t(a→w) / (t(a→w) + t(a→x)) × #(a in 'a b') × #(w in 'w x')
  = (1/4) / (1/4 + 1/4) × 1 × 1
  = 1/2
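The same arithmetic, checked numerically (a trivial sketch; the variable names are invented):

```python
t = {("a", "w"): 0.25, ("a", "x"): 0.25}  # uniform initial values from Page 18
# normalize t(a->w) over the French words of the pair (a b) <-> (w x)
c_aw = t[("a", "w")] / (t[("a", "w")] + t[("a", "x")]) * 1 * 1
print(c_aw)  # 0.5
```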

Page 21:

“counts”

Sentence pair (b c d) ↔ (x y z):

      a     b     c     d
w     0     0     0     0
x     0    1/3   1/3   1/3
y     0    1/3   1/3   1/3
z     0    1/3   1/3   1/3

Sentence pair (a b) ↔ (w x):

      a     b     c     d
w    1/2   1/2    0     0
x    1/2   1/2    0     0
y     0     0     0     0
z     0     0     0     0

Page 22:

Revised probability: example

t_revised(a→w) = (1/2) / [ (1/2 + 1/2 + 0 + 0) from (a b)(w x) + (0 + 0 + 0 + 0) from (b c d)(x y z) ] = 1/2

Page 23:

Revised probabilities table

      a      b     c     d
w    1/2    1/4    0     0
x    1/2   5/12   1/3   1/3
y     0    1/6    1/3   1/3
z     0    1/6    1/3   1/3

Page 24:

“revised counts”

Sentence pair (b c d) ↔ (x y z):

      a     b     c     d
w     0     0     0     0
x     0    5/9   1/3   1/3
y     0    2/9   1/3   1/3
z     0    2/9   1/3   1/3

Sentence pair (a b) ↔ (w x):

      a     b     c     d
w    1/2   3/8    0     0
x    1/2   5/8    0     0
y     0     0     0     0
z     0     0     0     0

Page 25:

Re-revised probabilities table

      a       b      c     d
w    1/2    3/16     0     0
x    1/2   85/144   1/3   1/3
y     0     1/9     1/3   1/3
z     0     1/9     1/3   1/3

Continue until convergence; notice that the (b, x) binding gets progressively stronger.
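Feeding this toy corpus to the em_model1 sketch from Page 13 shows the same trend. Note that the sketch normalizes counts per English word, while the tables above normalize differently, so individual numbers will not match exactly; the (b, x) pairing of rabbits with lapins still comes to dominate:

```python
corpus = [("trois lapins".split(), "three rabbits".split()),
          ("lapins de Grenoble".split(), "rabbits of Grenoble".split())]
t = em_model1(corpus, iterations=20)
# the largest t(. | "rabbits") entry; strengthens with more iterations
print(round(t[("lapins", "rabbits")], 3))
```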

Page 26:

Another Example

A corpus of two sentence pairs (four sentences):

a b ↔ x y (illustrated book ↔ livre illustré)
b c ↔ x z (book shop ↔ livre magasin)

Assuming no null alignments, the possible alignments are:

a b ↔ x y: {a→x, b→y} or {a→y, b→x}
b c ↔ x z: {b→x, c→z} or {b→z, c→x}

Page 27:

Iteration 1

Initialize: uniform probabilities for all word translations.

$t(a \mid x) = t(b \mid x) = t(c \mid x) = \tfrac{1}{3}$
$t(a \mid y) = t(b \mid y) = t(c \mid y) = \tfrac{1}{3}$
$t(a \mid z) = t(b \mid z) = t(c \mid z) = \tfrac{1}{3}$

Compute the fractional counts:

$c(a \mid x; ab, xy) = \tfrac{1}{2}, \quad c(a \mid x; bc, xz) = 0$
$c(a \mid y; ab, xy) = \tfrac{1}{2}, \quad c(a \mid y; bc, xz) = 0$
$c(b \mid x; ab, xy) = \tfrac{1}{2}, \quad c(b \mid x; bc, xz) = \tfrac{1}{2}$

(similarly for the remaining counts)

Page 28:

Iteration 2

From these counts, recompute the probabilities:

$\tilde{t}(a \mid x) = \tfrac{1}{2} + 0 = \tfrac{1}{2};\quad \tilde{t}(a \mid y) = \tfrac{1}{2} + 0 = \tfrac{1}{2};\quad \tilde{t}(a \mid z) = 0$
$\tilde{t}(b \mid x) = \tfrac{1}{2} + \tfrac{1}{2} = 1;\quad \tilde{t}(b \mid y) = \tfrac{1}{2} + 0 = \tfrac{1}{2};\quad \tilde{t}(b \mid z) = 0 + \tfrac{1}{2} = \tfrac{1}{2}$
$\tilde{t}(c \mid x) = 0 + \tfrac{1}{2} = \tfrac{1}{2};\quad \tilde{t}(c \mid y) = 0;\quad \tilde{t}(c \mid z) = 0 + \tfrac{1}{2} = \tfrac{1}{2}$

These probabilities are not normalized (indicated by $\tilde{t}$).

Page 29:

Normalized probabilities: after iteration 2

$t(a \mid x) = \dfrac{1/2}{1/2 + 1 + 1/2} = \tfrac{1}{4};\quad t(a \mid y) = \dfrac{1/2}{1/2 + 1/2 + 0} = \tfrac{1}{2};\quad t(a \mid z) = 0$
$t(b \mid x) = \tfrac{1}{2};\quad t(b \mid y) = \tfrac{1}{2};\quad t(b \mid z) = \tfrac{1}{2}$
$t(c \mid x) = \tfrac{1}{4};\quad t(c \mid y) = 0;\quad t(c \mid z) = \tfrac{1}{2}$

Page 30:

Normalized probabilities: after iteration 3

$t(a \mid x) = 0.15;\quad t(a \mid y) = 0.64;\quad t(a \mid z) = 0$
$t(b \mid x) = 0.70;\quad t(b \mid y) = 0.36;\quad t(b \mid z) = 0.36$
$t(c \mid x) = 0.15;\quad t(c \mid y) = 0;\quad t(c \mid z) = 0.64$

The probabilities converge (after a few iterations) as expected: a↔y, b↔x, c↔z.

Page 31:

Translation Model: Exact expression

Five models for estimating parameters in the expression [2]: Model-1, Model-2, Model-3, Model-4, Model-5.

The expression decomposes the generation of f from e into three choices:

• Choose the length of the foreign-language string given e
• Choose the alignment given e and m
• Choose the identity of each foreign word given e, m, a

Page 32:

Proof of Translation Model: Exact expression

$\Pr(f \mid e) = \sum_{a} \Pr(f, a \mid e)$   ; marginalization over alignments

$\Pr(f, a \mid e) = \sum_{m} \Pr(f, a, m \mid e)$   ; marginalization over lengths

$\Pr(f, a, m \mid e) = \Pr(m \mid e)\,\Pr(f, a \mid m, e)$

m is fixed for a particular f, hence the sum over m collapses:

$\Pr(f, a \mid e) = \Pr(m \mid e)\,\Pr(f, a \mid m, e)$

$\Pr(f, a \mid m, e) = \prod_{j=1}^{m} \Pr(f_j, a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) = \prod_{j=1}^{m} \Pr(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e)\;\Pr(f_j \mid a_1^{j}, f_1^{j-1}, m, e)$

Therefore

$\Pr(f, a \mid e) = \Pr(m \mid e) \prod_{j=1}^{m} \Pr(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e)\;\Pr(f_j \mid a_1^{j}, f_1^{j-1}, m, e)$

Page 33:

Model-1

Simplest model. Assumptions:

• Pr(m|e) is independent of m and e and is equal to ε
• The alignment of foreign-language words (FLWs) depends only on the length of the English sentence:
  $\Pr(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) = (l+1)^{-1}$, where l is the length of the English sentence

The likelihood function will be

$\Pr(f \mid e) = \dfrac{\epsilon}{(l+1)^m} \sum_{a} \prod_{j=1}^{m} t(f_j \mid e_{a_j})$

Maximize the likelihood function subject to the constraint $\sum_{f} t(f \mid e) = 1$ for each e.
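A step worth making explicit (standard in Brown et al.'s treatment, though not spelled out on the slide): the sum over all $(l+1)^m$ alignments factorizes, which is what makes Model-1 training tractable:

```latex
\Pr(f \mid e)
  = \frac{\epsilon}{(l+1)^m} \sum_{a_1=0}^{l} \cdots \sum_{a_m=0}^{l}
      \prod_{j=1}^{m} t(f_j \mid e_{a_j})
  = \frac{\epsilon}{(l+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)
```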

Page 34:

Model-1: Parameter estimation

Using a Lagrange multiplier for the constrained maximization, the solution for the Model-1 parameters is

$t(f \mid e) = \lambda_e^{-1} \sum_{s=1}^{S} c(f \mid e; f^{(s)}, e^{(s)})$

with the expected count

$c(f \mid e; f, e) = \dfrac{t(f \mid e)}{t(f \mid e_0) + \cdots + t(f \mid e_l)} \sum_{j=1}^{m} \delta(f, f_j) \sum_{i=0}^{l} \delta(e, e_i)$

where $\lambda_e$ is a normalization constant, $c(f \mid e; f, e)$ is the expected count, and $\delta(f, f_j)$ is 1 if f and $f_j$ are the same, zero otherwise.

Estimate t(f|e) using the Expectation-Maximization (EM) procedure.