machine translation paper 1 daniel montalvo, chrysanthia cheung-lau, jonny wang cs159 spring 2011

19
MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

Upload: todd-dickerson

Post on 18-Jan-2016

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

MACHINE TRANSLATION PAPER 1

Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny WangCS159 Spring 2011

Page 2: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

Paper

Intersecting multilingual data for faster and better statistical translations. Association for Computational Linguistics, June 2009.

Yu Chen Martin Kay Andreas Eisele

Page 3: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

Introduction

Statistical Machine Translation (SMT):

Spanish(foreign)

Translationmodel

languagemodel

BrokenEnglish

English

model

p(e | f )∝ p( f | e)p(e)

translation model language model

Page 4: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

Statistical MT Overview

Bilingual data

model

monolingual data

learned parameters

Foreign sentence

Translation

Find the besttranslation given the foreign sentence and the model

Englishsentence

Page 5: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

Problems

Noise from word/phrase alignment Efficiency: time and space

Page 6: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

Problems

Sie lieben ihre Kinder nichtThey love their children notThey don’t love their children

ihre Kinder nicht => a) their children are not b) their children

Language model =/= They love their children are not

Solution: remove b) from translation model

Page 8: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

Solution: Triangulation

SOURCE LANGUAGE TARGET LANGUAGE

BRIDGE LANGUAGE

Page 9: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

Triangulation

Look at source-target phrase pair (S,T)

S => X, T => X, S => T (keep) S => X, T => Y, S =/> T (drop) S => ?, T => ?, ??? (keep)

Page 10: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

Experimental Design

Database: Europarl (1.3 million) Divided by maximal sentence length Spanish => English; bridges: French,

Portuguese, Danish, German, Finnish Models from Moses toolkit Train, develop, test

Page 11: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011
Page 12: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

Results

Page 13: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011
Page 14: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

Phrase Length

"Long phrases suck” (Koehn, 2003)

Page 15: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011
Page 16: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

Further Research

Page 17: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

Further Research

German => English

Page 18: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

Problems / Conclusion

Throwing away a lot of data (94%) Picking a bridge language (sometimes) Limitations of data set (Arabic?

Chinese?) Overall, great!

Page 19: MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011

Thanks!