Beyond the Hype of Neural Machine Translation Tauyou & Prompsit (Diego) [email protected] | (Gema) [email protected]


Post on 14-Apr-2017


TRANSCRIPT

Page 1: Beyond the Hype of Neural Machine Translation, Diego Bartolome (tauyou) and Gema Ramirez (Prompsit Language Engineering)

Beyond the Hype of Neural Machine

Translation

Tauyou & Prompsit

(Diego) [email protected] | (Gema) [email protected]

Page 2:

Why neural nets?

“artificial neural networks [...] are able to be trained from examples without the need for a thorough understanding of the task in hand, and able to show surprising generalization performance and predicting power”

Mikel L. Forcada (Neural Networks: Automata and Formal Models of Computation)

Page 3:

Why neural nets in MT now?

MT maturity
➔ MT is widely used (with plans to use it everywhere)
➔ MT for some languages is still not good enough (it is for others)
➔ RBMT, SMT and hybrid MT approaches widely exploited

Resource availability
➔ Computational power is available and cheap (GPUs)
➔ Deep learning algorithms and frameworks are available
➔ Data to learn from is also available (corpora)

Page 4:

So, why not?

Promising results from the WMT16 competition: all the best systems are NMT ones.

            SMT            NMT
          BLEU   TER     BLEU   TER
en-fi*    14.8   0.76    17.8   0.72
en-ro     27.4   0.61    28.7   0.60
en-ru     24.0   0.68    26.0   0.65
en-de     31.4   0.58    34.8   0.54
en-cz     24.1   0.67    26.3   0.63

* en-fi are Prompsit’s + DCU systems
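The BLEU scores above compare each system's output against human reference translations by overlapping n-grams. As a rough illustration of what is being measured, here is a minimal sentence-level BLEU sketch (geometric mean of clipped n-gram precisions times a brevity penalty; real WMT scoring is corpus-level and smoothed, so this is a simplification, not the official metric):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Plain sentence-level BLEU: clipped n-gram precisions (n = 1..4),
    geometric mean, times a brevity penalty. No smoothing, for clarity."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum((cand & ref).values())      # clipped counts
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the cat sat on the mat".split()
print(round(bleu(ref, ref), 2))                        # 1.0
print(round(bleu("the cat sat on mat".split(), ref), 2))  # 0.58
```

A perfect match scores 1.0; dropping a single word already costs a lot, which is why even the best WMT systems stay well below 0.5 (the table reports BLEU × 100).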

Page 5:

Neural nets are...
➔ ...computational models inspired by biology
➔ ...playing increasingly key roles in graphics and pattern recognition
➔ ...experiencing a resurgence thanks to hardware and deep learning
➔ ...made of encoding/decoding ‘neurons’
➔ ...applied to translation (= neural MT = NMT):
  ◆ encode SL words as vectors that represent the relevant information
  ◆ decode vectors into words, preserving syntactic and semantic information in the TL
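The encode/decode idea can be sketched at its very smallest: words become vectors, and decoding picks the target word whose vector is nearest. The embeddings below are made up for the sketch (a real NMT system learns them, and encodes whole sentences, not single words):

```python
import numpy as np

# Toy, hand-made embeddings -- NOT a trained model, just the geometry.
src_emb = {"house": np.array([0.9, 0.1]), "cat": np.array([0.1, 0.9])}
tgt_emb = {"hus":   np.array([0.88, 0.12]), "katt": np.array([0.12, 0.88])}

def encode(word):
    # SL word -> vector representing its relevant information
    return src_emb[word]

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def decode(vec):
    # vector -> nearest TL word in embedding space
    return max(tgt_emb, key=lambda w: cos(vec, tgt_emb[w]))

print([decode(encode(w)) for w in ["house", "cat"]])  # ['hus', 'katt']
```

The point of the geometry: words with related meanings end up close together, which is what lets the decoder preserve semantic information in the TL.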

Page 6:

NMT requires...
➔ Hardware: raw 10x CPUs or a GPU (times get shorter with GPUs)
➔ Software: a deep learning framework (Theano, Torch, etc.) + NMT libraries
➔ Data: bilingual corpora (monolingual only for the LM)
➔ Learning & (early) stopping: translation models are created iteratively
➔ Picking a model: evaluation and selection of the best model(s)
➔ Translating: the model(s) are used to translate
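The learn / stop / pick-best steps above can be sketched as a generic early-stopping loop. `train_one_epoch` and `evaluate` are stand-ins for whatever the framework provides (e.g. a dev-set BLEU score); the loop itself is the pattern the slide describes:

```python
def train_with_early_stopping(train_one_epoch, evaluate, patience=3, max_epochs=100):
    """Iteratively create models, score each on held-out data,
    stop when the score stops improving, and keep the best model."""
    best_score, best_model, bad_epochs = float("-inf"), None, 0
    model = None
    for epoch in range(max_epochs):
        model = train_one_epoch(model)   # one more pass over the bilingual corpus
        score = evaluate(model)          # e.g. BLEU on a dev set
        if score > best_score:
            best_score, best_model, bad_epochs = score, model, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:   # plateau: stop early
                break
    return best_model, best_score        # the selected model is used to translate

# Toy run: dev scores improve for 3 epochs, then plateau.
scores = iter([10.0, 12.0, 13.5, 13.4, 13.3, 13.2, 13.1])
model, score = train_with_early_stopping(
    train_one_epoch=lambda m: (m or 0) + 1,  # "model" is just the epoch count here
    evaluate=lambda m: next(scores),
)
print(model, score)  # 3 13.5
```

Early stopping matters for NMT in particular because, as the timing tables later show, each extra epoch on a GPU is expensive.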

Page 7:

Down to the NMT business

Page 8:

Applying NMT to generic and in-domain use cases

Generic English - Swedish SMT vs. NMT
➔ Same generic corpus (8M segments), same training and test sets
➔ SMT: Moses-based, with no tuning, on CPU
➔ NMT: Theano-based Groundhog NMT toolkit on GPU

Domain-specific English - Norwegian SMT vs. NMT
➔ Same in-domain corpus (800K segments), same training and test sets
➔ SMT: Moses-based + tuning on CPU
➔ NMT: Theano-based Groundhog NMT toolkit on GPU
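"Same training and test sets" is the detail that makes the comparison fair: both systems must be scored on segments neither has seen, carved from the one shared corpus. A minimal sketch of such a reproducible split (sizes and a fixed seed are illustrative, not from the slides):

```python
import random

def split_corpus(pairs, test_size=2000, seed=42):
    """Shuffle a parallel corpus with a fixed seed and carve off a test
    set, so SMT and NMT see exactly the same train/test split."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)           # fixed seed: same split every run
    return pairs[test_size:], pairs[:test_size]  # (train, test)

corpus = [(f"src {i}", f"tgt {i}") for i in range(10000)]
train, test = split_corpus(corpus)
print(len(train), len(test))  # 8000 2000
```

With a fixed seed the split is deterministic, so both pipelines can regenerate it independently from the same corpus file.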

Page 9:

Comparison for generic English - Swedish

                            SMT                       NMT
Training time               48 hours (CPU)            2 weeks (GPU)
Translation time            00:12:35 (866 segments)   01:38:47 (866 segments)
CPU usage in translation    56%                       100%
Disk space                  37.7 GB                   9.1 GB
BLEU score                  0.440                     0.404
Identical matches           19.33% (161/866)          12% (104/866)
Edit distance similarity    0.78                      0.746
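The last two rows above can be computed without any MT machinery. "Edit distance similarity" is read here as a Levenshtein distance normalized to [0, 1] (an assumption about the slide's exact metric), and "identical matches" is simply the fraction of output segments equal to their reference:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def similarity(hyp, ref):
    # Assumed normalization: 1 - distance / length of the longer string.
    return 1 - levenshtein(hyp, ref) / max(len(hyp), len(ref), 1)

def identical_match_rate(hyps, refs):
    return sum(h == r for h, r in zip(hyps, refs)) / len(refs)

print(levenshtein("kitten", "sitting"))  # 3
print(similarity("katten sitter", "katten satt"))
```

Both metrics approximate post-editing effort: a similarity of 0.78 roughly means a translator rewrites about a fifth of the characters.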

Page 10:

Comparison for in-domain English - Norwegian

                            SMT                         NMT
Training time               1.8 hours (3 CPUs)          7 days (1 GPU)
Translation time            00:01:22 (1,000 segments)   02:08:00 (1,000 segments)
CPU usage in translation    56%                         100%
Disk space                  2.3 GB                      6.5 GB
BLEU score                  0.53                        0.62
Identical matches           27.76% (276/1,000)          30% (300/1,000)
Edit distance similarity    0.77                        0.83

Page 11:

Conclusions SMT vs. NMT: technical insight

                            SMT        NMT
Disk space                  ✘          ✓ Smaller
CPU during translation      ✓          ✘
RAM during translation      ✘          ✓ Less
Training speed              ✓ Faster   ✘ Can be optimized with hardware
Translation speed           ✓ Faster   ✘ Can be optimized with hardware

Page 12:

Conclusions SMT vs. NMT: qualitative insight

In-domain                   SMT   NMT
BLEU                        ✘     ✓
Identical matches           ✘     ✓
Edit distance similarity    ✘     ✓
Translators' feedback       ✓     ✘

Generic                     SMT   NMT
BLEU                        ≈     ≈
Identical matches           ✓     ✘
Edit distance similarity    ≈     ≈
Translators' feedback       ✓     ✘

Page 13:

Final conclusions
➔ NMT is a big new player in MT:
  ◆ Research is now focusing heavily on NMT: it already outperforms SMT in many cases
  ◆ Use case results: with little effort, it is on par with SMT
  ◆ Hardware requirements are more demanding for NMT: a higher budget is needed
  ◆ Translators' feedback: SMT is still better

Page 14:

Final conclusions
➔ SMT, and other approaches, remain more robust and alive:
  ◆ Better quality and consistency in MT output
  ◆ Better ROI, especially for real-time translation applications where speed is critical
➔ Deep learning for other NLP applications?
  ◆ Of course! It is thriving in quality estimation, terminology, sentiment analysis, etc.

Page 15:

Thanks! Go raibh maith agaibh! (“Thank you!” in Irish)

Tauyou & Prompsit

(Diego) [email protected] | (Gema) [email protected]