TRANSCRIPT
Beyond the Hype of Neural Machine Translation
Tauyou & Prompsit
(Diego) [email protected] | (Gema) [email protected]
Why neural nets?

“Artificial neural networks [...] are able to be trained from examples without the need for a thorough understanding of the task in hand, and able to show surprising generalization performance and predicting power”
Mikel L. Forcada (Neural Networks: Automata and Formal Models of Computation)
Why neural nets in MT now?

MT maturity
➔ MT is widely used (with plans to use it everywhere)
➔ MT for some languages is still not good enough (it is for others)
➔ RBMT, SMT and hybrid MT approaches are widely exploited

Resource availability
➔ Computational power is available and cheap (GPUs)
➔ Deep learning algorithms and frameworks are available
➔ Data to learn from is also available (corpora)
So, why not?

Promising results from the WMT16 competition: all the best systems are NMT ones.

                           SMT            NMT
                        BLEU   TER     BLEU   TER
en-fi*                  14.8   0.76    17.8   0.72
en-ro                   27.4   0.61    28.7   0.60
en-ru                   24.0   0.68    26.0   0.65
en-de                   31.4   0.58    34.8   0.54
en-cz                   24.1   0.67    26.3   0.63

* en-fi are Prompsit’s + DCU systems
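The BLEU scores above (on a 0–1 or 0–100 scale) come from comparing n-gram overlap between MT output and a human reference. The following is a simplified, single-reference, sentence-level sketch of the metric, not the exact WMT evaluation script (which works at corpus level and handles multiple references and smoothing).

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with one reference (no smoothing): geometric mean
    of clipped 1..max_n-gram precisions, times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * geo_mean
```

A perfect match scores 1.0; an output sharing no n-grams with the reference scores 0.0.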
Neural nets are...
➔ ...computational models inspired by Biology
➔ ...playing an increasingly key role in graphics and pattern recognition
➔ ...experiencing a resurgence thanks to hardware and deep learning
➔ ...made of encoding/decoding ‘neurons’
➔ ...applied to translation (= neural MT = NMT):
◆ encode source-language (SL) words as vectors that represent the relevant information
◆ decode those vectors into target-language (TL) words, preserving syntactic and semantic information
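The encode/decode idea can be sketched as a tiny, untrained recurrent encoder-decoder. Everything here (vocabularies, dimensions, weights) is made up for illustration; real NMT systems learn these weights from millions of segments, so the output below is arbitrary rather than a correct translation.

```python
import numpy as np

SRC = ["the", "cat", "sleeps"]           # toy source vocabulary (assumed)
TGT = ["<eos>", "el", "gato", "duerme"]  # toy target vocabulary (assumed)
H = 8                                    # hidden-state dimensionality

rng = np.random.default_rng(42)
E_src = rng.normal(0, 0.1, (len(SRC), H))   # source word embeddings
W_enc = rng.normal(0, 0.1, (H, H))          # encoder recurrence weights
W_dec = rng.normal(0, 0.1, (H, H))          # decoder recurrence weights
W_out = rng.normal(0, 0.1, (H, len(TGT)))   # hidden state -> target-word scores

def encode(words):
    """Fold the source word vectors into one sentence vector (simple RNN)."""
    h = np.zeros(H)
    for w in words:
        h = np.tanh(E_src[SRC.index(w)] + W_enc @ h)
    return h

def decode(h, max_len=5):
    """Greedily emit target words from the sentence vector until <eos>."""
    out = []
    for _ in range(max_len):
        w = TGT[int(np.argmax(h @ W_out))]
        if w == "<eos>":
            break
        out.append(w)
        h = np.tanh(W_dec @ h)   # update the decoder state
    return out

sentence_vec = encode(["the", "cat", "sleeps"])
translation = decode(sentence_vec)   # untrained weights: output is arbitrary
```

The point of the sketch is the shape of the computation: the whole source sentence is compressed into vectors, and target words are read back out of them.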
NMT requires...
➔ Hardware: roughly 10x CPUs, or a GPU (training times get shorter with GPUs)
➔ Software: a deep learning framework (Theano, Torch, etc.) + NMT libraries
➔ Data: bilingual corpora (monolingual corpora for the LM only)
➔ Learning & (early) stopping: translation models are created iteratively
➔ Picking a model: evaluation and selection of the best model(s)
➔ Translating: the selected model(s) are used to translate
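The last three steps (iterative learning, early stopping, picking the best model) can be sketched as a generic train/evaluate loop. The model, data and scoring function here are stand-ins, not any toolkit's real API; actual NMT training wraps the same control flow around GPU gradient updates and a held-out validation metric such as BLEU.

```python
import random

def train_one_epoch(model, data):
    """Stand-in for one pass of gradient updates; returns the updated model."""
    return model + random.uniform(-0.5, 1.0)  # pretend quality drifts upward

def validation_score(model):
    """Stand-in for an automatic metric (e.g. BLEU) on held-out data."""
    return model

def train_with_early_stopping(data, patience=3, max_epochs=50):
    model, best_model, best_score, bad_epochs = 0.0, 0.0, float("-inf"), 0
    for epoch in range(max_epochs):
        model = train_one_epoch(model, data)     # learning, iteratively
        score = validation_score(model)          # evaluate each checkpoint
        if score > best_score:
            best_model, best_score, bad_epochs = model, score, 0  # keep best
        else:
            bad_epochs += 1
        if bad_epochs >= patience:               # early stopping
            break
    return best_model, best_score                # the picked model translates

random.seed(0)
best, score = train_with_early_stopping(data=None)
```

The "patience" counter is what makes stopping *early*: training halts once the validation metric has failed to improve for a few consecutive checkpoints, and the best checkpoint (not the last one) is kept.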
Down to the NMT business
Applying NMT to generic and in-domain use cases

Generic English–Swedish: SMT vs. NMT
➔ Same generic corpus (8M segments), same training and test sets
➔ SMT: Moses-based, with no tuning, on CPU
➔ NMT: Theano-based Groundhog NMT toolkit on GPU
Domain-specific English–Norwegian: SMT vs. NMT
➔ Same in-domain corpus (800K segments), same training and test sets
➔ SMT: Moses-based, with tuning, on CPU
➔ NMT: Theano-based Groundhog NMT toolkit on GPU
Comparison for generic English–Swedish

                           SMT                       NMT
Training time              48 hours (CPU)            2 weeks (GPU)
Translation time           00:12:35 (866 segments)   01:38:47 (866 segments)
CPU usage in translation   56%                       100%
Disk space                 37.7 GB                   9.1 GB
BLEU score                 0.440                     0.404
Identical matches          19.33% (161/866)          12% (104/866)
Edit distance similarity   0.78                      0.746
Comparison for in-domain English–Norwegian

                           SMT                         NMT
Training time              1.8 hours (3 CPUs)          7 days (1 GPU)
Translation time           00:01:22 (1,000 segments)   02:08:00 (1,000 segments)
CPU usage in translation   56%                         100%
Disk space                 2.3 GB                      6.5 GB
BLEU score                 0.53                        0.62
Identical matches          27.76% (276/1000)           30% (300/1000)
Edit distance similarity   0.77                        0.83
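Two of the table metrics are easy to compute directly: "identical matches" is the share of MT segments that exactly equal the reference, and "edit distance similarity" can be taken as 1 minus the Levenshtein distance divided by the length of the longer string (the talk's exact formula is not given, so this character-level version is an assumption).

```python
def levenshtein(a, b):
    """Character-level Levenshtein distance via the standard DP recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def edit_similarity(a, b):
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a, b) / longest

def evaluate(mt_segments, ref_segments):
    """Return (identical-match rate, mean edit-distance similarity)."""
    pairs = list(zip(mt_segments, ref_segments))
    identical = sum(mt == ref for mt, ref in pairs) / len(pairs)
    similarity = sum(edit_similarity(mt, ref) for mt, ref in pairs) / len(pairs)
    return identical, similarity

ident, sim = evaluate(["hej världen", "en katt"], ["hej världen", "en hund"])
```

Applied to the 866- and 1,000-segment test sets above, `evaluate` would yield exactly the kind of figures reported in the two tables.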
Conclusions SMT vs. NMT: technical insight

                           SMT        NMT
Disk space                 ✘          ✓ Smaller
CPU during translation     ✓          ✘
RAM during translation     ✘          ✓ Lower
Training speed             ✓ Faster   ✘ Can be optimized with better hardware
Translation speed          ✓ Faster   ✘ Can be optimized with better hardware
Conclusions SMT vs. NMT: qualitative insight

In domain
                           SMT   NMT
BLEU                       ✘     ✓
Identical matches          ✘     ✓
Edit distance similarity   ✘     ✓
Translators’ feedback      ✓     ✘

Generic
                           SMT   NMT
BLEU                       ≈     ≈
Identical matches          ✓     ✘
Edit distance similarity   ≈     ≈
Translators’ feedback      ✓     ✘
Final conclusions
➔ NMT is a big new player in MT:
◆ Research is now focusing heavily on NMT: it already outperforms SMT in many cases
◆ Use-case results: with little effort, it is on par with SMT
◆ Hardware requirements are more demanding for NMT: a higher budget is needed
◆ Translators’ feedback: SMT is still better
Final conclusions
➔ SMT, and other approaches, remain more robust and alive:
◆ Better quality and consistency in MT output
◆ Better ROI, especially for real-time translation applications where speed is critical
➔ Deep learning for other NLP applications?
◆ Of course! It is thriving in quality estimation, terminology, sentiment analysis, etc.
Thanks! Go raibh maith agaibh! (Irish: “Thank you all!”)
Tauyou & Prompsit
(Diego) [email protected] | (Gema) [email protected]