Statistical Machine Translation of Texts with Misspelled Words

Nicola Bertoldi, Mauro Cettolo, Marcello Federico
FBK - Fondazione Bruno Kessler, Trento, Italy

ACL 2010




Outline

◆ Introduction
◆ System
◆ Data
◆ Evaluation
◆ Conclusions


Introduction

Non-word error: a misspelling that produces a string that is not a valid word (e.g. "exellent" for "excellent").


Introduction

Real-word error: a misspelling that produces a valid but unintended word (e.g. "form" for "from").
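The distinction between the two error types can be made concrete with a dictionary lookup: a non-word error yields a string that is not in the lexicon, while a real-word error yields a valid word the writer did not intend. The sketch below is a minimal illustration, assuming a tiny hypothetical lexicon (`LEXICON`) and a hypothetical helper `classify_token`; it is not the detection method used in the paper.

```python
# Hypothetical toy lexicon; a real spell checker would use a full dictionary.
LEXICON = {
    "we", "had", "just", "come", "came", "in", "from", "form",
    "australia", "the", "room", "was", "excellent",
}


def classify_token(observed: str, intended: str, lexicon: set[str] = LEXICON) -> str:
    """Classify a typed token given the word the writer intended.

    Hypothetical helper, for illustration only.
    """
    if observed.lower() == intended.lower():
        return "correct"
    if observed.lower() not in lexicon:
        # The typed string is not a dictionary word, so lookup alone reveals it.
        return "non-word error"
    # The typed string is a valid word, just not the intended one.
    return "real-word error"


if __name__ == "__main__":
    print(classify_token("exellent", "excellent"))  # non-word error
    print(classify_token("form", "from"))           # real-word error
```

The intended word is needed here only to tell the two types apart. Without it, a lookup can still flag non-word errors, but real-word errors pass through unnoticed, which is what makes them harder to handle.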


Introduction

Six different typing error operations:

◆ Substitution
Target: [We] had just come in from Australia.
Error: [Ww] had just come in from Australia.

◆ Insertion
Target: is a good place to stay, if you are looking for a hotel [around] LAX airport.
Error: is a good place to stay, if you are looking for a hotel [arround] LAX airport.

◆ Deletion
Target: The room was [excellent] but the hallway was [filthy].
Error: The room was [exellent] but the hallway was [filty].


Introduction

◆ Transposition
Target: The staff was [friendly].
Error: The staff was [freindly].

◆ Run-On
Target: I saw a teacher[.] who cares?
Error: I saw a teacher[ ] who cares?

◆ Split
Target: [We] had just come in from Australia.
Error: [W e] had just come in from Australia.
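Each of the six operations is a single edit on a string, so they can be sketched as small Python functions that reproduce the examples above. The function names and the explicit position arguments are illustrative assumptions (the slides do not say how error positions were chosen); run-on errors are modelled here as deleting the boundary character between two tokens, which matches the slide example.

```python
def substitution(word: str, i: int, ch: str) -> str:
    """Replace the character at position i with ch."""
    return word[:i] + ch + word[i + 1:]


def insertion(word: str, i: int, ch: str) -> str:
    """Insert ch before position i."""
    return word[:i] + ch + word[i:]


def deletion(word: str, i: int) -> str:
    """Drop the character at position i."""
    return word[:i] + word[i + 1:]


def transposition(word: str, i: int) -> str:
    """Swap the adjacent characters at positions i and i + 1."""
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def run_on(text: str, i: int) -> str:
    """Drop the boundary character (space or punctuation) at position i,
    merging the two pieces it separated."""
    return text[:i] + text[i + 1:]


def split_word(word: str, i: int) -> str:
    """Insert a spurious space before position i, breaking the word in two."""
    return word[:i] + " " + word[i:]


if __name__ == "__main__":
    # Reproduce the examples from the slides.
    print(substitution("We", 1, "w"))    # Ww
    print(insertion("around", 2, "r"))   # arround
    print(deletion("excellent", 2))      # exellent
    print(transposition("friendly", 2))  # freindly
    s = "I saw a teacher. who cares?"
    print(run_on(s, s.index(".")))       # I saw a teacher who cares?
    print(split_word("We", 1))           # W e
```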



System


System

Step 1.

Step 2.


System

Step 3.


System

Step 4.


System

Step 5. The confusion network (CN) e is translated with the Moses decoder (Koehn et al., 2007).
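Reading CN as confusion network (an input type the Moses decoder can translate directly), the structure can be sketched as a sequence of columns, each listing alternative words with probabilities. The surviving slides do not describe how the network is built, so the toy network, its probabilities, and the `*EPS*` empty-word marker below are hypothetical. The sketch shows how a CN compactly encodes many spelling hypotheses and how its single most probable reading is obtained.

```python
from math import prod

EPS = "*EPS*"  # hypothetical marker for the empty word (no token in this column)

# A confusion network: a list of columns, each a list of (word, probability) pairs.
ConfusionNetwork = list[list[tuple[str, float]]]


def num_paths(cn: ConfusionNetwork) -> int:
    """Number of distinct word sequences the network encodes."""
    return prod(len(column) for column in cn)


def best_path(cn: ConfusionNetwork) -> list[str]:
    """Most probable path: columns are independent, so take each column's argmax."""
    words = []
    for column in cn:
        word, _ = max(column, key=lambda alt: alt[1])
        if word != EPS:  # the empty word contributes no token
            words.append(word)
    return words


if __name__ == "__main__":
    # Toy CN for the noisy input "The staff was freindly" (probabilities invented).
    cn: ConfusionNetwork = [
        [("The", 1.0)],
        [("staff", 1.0)],
        [("was", 1.0)],
        [("freindly", 0.2), ("friendly", 0.8)],
        [(EPS, 0.7), ("!", 0.3)],
    ]
    print(num_paths(cn))  # 4
    print(best_path(cn))  # ['The', 'staff', 'was', 'friendly']
```

With CN input, the decoder is not limited to this single path: it searches over all paths jointly with its translation and language models, which is the benefit of passing a CN rather than one pre-corrected sentence.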


Data


Data

Evaluation Data

Non-word Noise: words in the text are randomly replaced according to a Wikipedia list of 4,100 common non-word errors.

Real-word Noise: real-word errors are introduced automatically using a second Wikipedia list, of commonly misused words.
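Both list-based noise types can be sketched with one routine that looks each token up in a misspelling list and, with some probability, swaps in one of the listed variants. The miniature dictionaries and the per-token `rate` parameter below are hypothetical stand-ins: the real lists come from Wikipedia (4,100 entries for non-word errors), and the slides do not say exactly how the replacement probability was set.

```python
import random

# Hypothetical miniature error lists (intended word -> observed misspellings).
NON_WORD_ERRORS = {
    "around": ["arround"],
    "excellent": ["exellent", "excelent"],
    "friendly": ["freindly"],
}
REAL_WORD_ERRORS = {
    "from": ["form"],
    "their": ["there"],
    "lose": ["loose"],
}


def inject_word_noise(tokens: list[str], error_list: dict[str, list[str]],
                      rate: float, rng: random.Random) -> list[str]:
    """Replace each listed token by one of its misspellings with probability `rate`.

    Lookup is case-insensitive; the casing of a replaced token is not preserved.
    """
    noisy = []
    for tok in tokens:
        variants = error_list.get(tok.lower())
        if variants and rng.random() < rate:
            noisy.append(rng.choice(variants))
        else:
            noisy.append(tok)
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    sent = "a hotel around LAX airport".split()
    # rate=1.0 corrupts every listed token; "around" has a single variant.
    print(" ".join(inject_word_noise(sent, NON_WORD_ERRORS, 1.0, rng)))
    # -> a hotel arround LAX airport
    sent = "we had just come in from Australia".split()
    print(" ".join(inject_word_noise(sent, REAL_WORD_ERRORS, 1.0, rng)))
    # -> we had just come in form Australia
```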

Random-word Noise: the original text is corrupted by randomly replacing, inserting, and deleting characters.
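Random-word noise can be sketched as a per-character corruption pass: each non-space character is, with probability `rate`, replaced by a random letter, preceded by an inserted random letter, or deleted. Leaving spaces untouched, drawing from the lowercase alphabet, and choosing uniformly among the three operations are assumptions for illustration, not details given in the slides.

```python
import random
import string


def inject_char_noise(text: str, rate: float, rng: random.Random,
                      alphabet: str = string.ascii_lowercase) -> str:
    """Corrupt each non-space character with probability `rate` by
    replacing it, inserting a random letter before it, or deleting it."""
    out = []
    for ch in text:
        if ch.isspace() or rng.random() >= rate:
            out.append(ch)  # keep the character unchanged
            continue
        op = rng.choice(("replace", "insert", "delete"))
        if op == "replace":
            out.append(rng.choice(alphabet))  # may occasionally pick the same letter
        elif op == "insert":
            out.append(rng.choice(alphabet))
            out.append(ch)
        # op == "delete": append nothing
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(42)
    for rate in (0.0, 0.05, 0.2):
        print(rate, inject_char_noise("the room was excellent", rate, rng))
```

Raising `rate` gives progressively noisier copies of the same text, in the spirit of the increasing noise levels used in the evaluation.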


Evaluation


Evaluation


Conclusions

◆ This paper addressed the issue of automatically translating written texts that are corrupted by misspelling errors.

◆ The enhanced MT system was tested on texts corrupted with increasing levels of noise from three different sources: random, non-word, and real-word errors.

◆ The impact of misspelling errors on MT performance depends on the noise rate, but not on the noise source.