Beyond the Hype of Neural Machine Translation, Diego Bartolome (tauyou) and Gema Ramirez (Prompsit Language Engineering)


  • Beyond the Hype of Neural Machine Translation

    Tauyou & Prompsit

    (Diego) dbc@tauyou.com | (Gema) gramirez@prompsit.com


  • Why neural nets?

    "Artificial neural networks [...] are able to be trained from examples without the need for a thorough understanding of the task in hand, and able to show surprising generalization performance and predicting power."

    Mikel L. Forcada (Neural Networks: Automata and Formal Models of Computation)

  • Why neural nets in MT now?

    MT maturity:
    - MT is already widely used (with plans to use it everywhere)
    - MT for some languages is still not good enough (it is for others)
    - RBMT, SMT and hybrid MT approaches are widely exploited

    Resource availability:
    - Computational power is available and cheap (GPUs)
    - Deep learning algorithms and frameworks are available
    - Data to learn from is also available (corpora)

  • So, why not?

    Promising results from the WMT16 competition: all of the best systems are NMT ones. (BLEU and TER, the two metrics below, are sketched in code after the table.)

              SMT           NMT
              BLEU   TER    BLEU   TER
    en-fi*    14.8   0.76   17.8   0.72
    en-ro     27.4   0.61   28.7   0.60
    en-ru     24.0   0.68   26.0   0.65
    en-de     31.4   0.58   34.8   0.54
    en-cz     24.1   0.67   26.3   0.63

    * The en-fi systems are Prompsit's + DCU's.
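
    As a toy illustration of what those two metrics measure: BLEU rewards n-gram overlap with a reference (higher is better), while TER counts the edits needed to turn the output into the reference (lower is better). A minimal sketch using the sacrebleu package; the table's numbers come from the official WMT16 evaluation, and the sentences below are made up.

    import sacrebleu

    hypotheses = [
        "the cat sits on the mat",         # MT output, one segment per line
        "he bought two loaves of bread",
    ]
    references = [
        "the cat sat on the mat",          # human reference translations
        "he bought two loaves of bread",
    ]

    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    ter = sacrebleu.corpus_ter(hypotheses, [references])

    # sacrebleu reports both on a 0-100 scale; the WMT16 table above
    # shows BLEU as 0-100 but TER as a 0-1 fraction.
    print(f"BLEU: {bleu.score:.1f}")
    print(f"TER:  {ter.score:.1f}")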

  • Neural nets are...

    - computational models inspired by biology
    - playing increasingly key roles in graphics and pattern recognition
    - experiencing a resurgence thanks to hardware and deep learning
    - made of encoding/decoding neurons
    - applied to translation (= neural MT = NMT): encode source-language (SL) words as vectors that represent the relevant information, then decode those vectors into words in the target language (TL), preserving syntactic and semantic information (sketched below)
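
    A minimal sketch of that encoder-decoder idea, written in PyTorch for brevity (the systems in this talk used the Theano/Groundhog stack, and every dimension here is an arbitrary toy value):

    import torch
    import torch.nn as nn

    SRC_VOCAB, TGT_VOCAB, EMB, HID = 1000, 1000, 64, 128

    class TinyNMT(nn.Module):
        def __init__(self):
            super().__init__()
            self.src_emb = nn.Embedding(SRC_VOCAB, EMB)   # SL words -> vectors
            self.encoder = nn.GRU(EMB, HID, batch_first=True)
            self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB)
            self.decoder = nn.GRU(EMB, HID, batch_first=True)
            self.out = nn.Linear(HID, TGT_VOCAB)          # vectors -> TL words

        def forward(self, src_ids, tgt_ids):
            # Encode: the final hidden state summarizes the source sentence.
            _, state = self.encoder(self.src_emb(src_ids))
            # Decode: produce TL word scores conditioned on that summary.
            dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
            return self.out(dec_out)                      # logits per position

    model = TinyNMT()
    src = torch.randint(0, SRC_VOCAB, (2, 7))   # batch of 2 source sentences
    tgt = torch.randint(0, TGT_VOCAB, (2, 9))   # teacher-forced target input
    print(model(src, tgt).shape)                # torch.Size([2, 9, 1000])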

  • NMT requires...

    - Hardware: raw power, roughly 10x the CPUs, or a GPU (times get shorter with GPUs)
    - Software: a deep learning framework (Theano, Torch, etc.) + NMT libraries
    - Data: bilingual corpora (monolingual corpora are needed for the LM only)
    - Learning & (early) stopping: translation models are created iteratively
    - Picking up a model: evaluation and selection of the best model(s)
    - Translating: the selected model(s) are used to translate

    The last three steps are sketched in code below.
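
    The training / early-stopping / selection loop in miniature; train_one_round and validation_bleu are hypothetical stand-ins for whatever the NMT toolkit provides:

    # Train in rounds, score each intermediate model on held-out data,
    # keep the best one, and stop once validation quality stalls.
    def select_model(train_one_round, validation_bleu, patience=3, max_rounds=50):
        best_score, best_model, rounds_without_gain = float("-inf"), None, 0
        for _ in range(max_rounds):
            model = train_one_round()          # one epoch / one checkpoint
            score = validation_bleu(model)     # evaluate on held-out data
            if score > best_score:
                best_score, best_model = score, model
                rounds_without_gain = 0
            else:
                rounds_without_gain += 1
                if rounds_without_gain >= patience:   # early stopping
                    break
        return best_model, best_score

    # Toy demo: validation BLEU rises, then plateaus.
    scores = iter([10.2, 14.5, 17.1, 17.0, 16.8, 16.9])
    model, bleu = select_model(
        train_one_round=lambda: "checkpoint",
        validation_bleu=lambda m: next(scores),
    )
    print(bleu)   # 17.1 -- the best intermediate model is kept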

  • Down to the NMT business

  • Applying NMT to generic and in-domain use cases

    Generic English -- Swedish, SMT vs. NMT:
    - Same generic corpus (8M segments), same training and test sets (see the split sketch below)
    - SMT: Moses-based, with no tuning, on CPU
    - NMT: Theano-based Groundhog NMT toolkit, on GPU

    Domain-specific English -- Norwegian, SMT vs. NMT:
    - Same in-domain corpus (800K segments), same training and test sets
    - SMT: Moses-based + tuning, on CPU
    - NMT: Theano-based Groundhog NMT toolkit, on GPU
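
    Both experiments keep the data identical across engines and vary only the MT approach. A minimal sketch of carving a shared, reproducible test set out of a parallel corpus (the segment counts and fixed seed here are illustrative assumptions, not the talk's actual split):

    import random

    def split_parallel(src_lines, tgt_lines, test_size=1000, seed=42):
        assert len(src_lines) == len(tgt_lines)   # parallel corpus: 1:1 segments
        pairs = list(zip(src_lines, tgt_lines))
        random.Random(seed).shuffle(pairs)        # fixed seed -> same split for both engines
        return pairs[test_size:], pairs[:test_size]   # train, test

    src = [f"source segment {i}" for i in range(10_000)]
    tgt = [f"target segment {i}" for i in range(10_000)]
    train, test = split_parallel(src, tgt)
    print(len(train), len(test))   # 9000 1000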

  • Comparison for generic English - Swedish

                                 SMT                       NMT
    Training time                48 hours (CPU)            2 weeks (GPU)
    Translation time             00:12:35 (866 segments)   01:38:47 (866 segments)
    CPU usage in translation     56%                       100%
    Disk space                   37.7 GB                   9.1 GB
    BLEU score                   0.440                     0.404
    Identical matches            19.33% (161/866)          12% (104/866)
    Edit distance similarity     0.78                      0.746

    (How the last two rows are computed is sketched in code below.)
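
    Identical matches and edit distance similarity can be recomputed from the MT output and reference files. A minimal sketch, assuming "similarity" means 1 minus the Levenshtein distance normalized by the longer string (the talk does not spell out its exact formula):

    def levenshtein(a: str, b: str) -> int:
        # Classic dynamic-programming edit distance, one row at a time.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                # deletion
                               cur[j - 1] + 1,             # insertion
                               prev[j - 1] + (ca != cb)))  # substitution
            prev = cur
        return prev[-1]

    def evaluate(outputs, references):
        identical = sum(o == r for o, r in zip(outputs, references))
        sims = [1 - levenshtein(o, r) / max(len(o), len(r), 1)
                for o, r in zip(outputs, references)]
        return identical / len(outputs), sum(sims) / len(sims)

    mt  = ["the cat sat on the mat", "he buy bread"]
    ref = ["the cat sat on the mat", "he bought bread"]
    exact, sim = evaluate(mt, ref)
    print(f"Identical matches: {exact:.0%}")        # 50%
    print(f"Edit distance similarity: {sim:.2f}")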

  • Comparison for in-domain English - Norwegian

                                 SMT                         NMT
    Training time                1.8 hours (3 CPUs)          7 days (1 GPU)
    Translation time             00:01:22 (1,000 segments)   02:08:00 (1,000 segments)
    CPU usage in translation     56%                         100%
    Disk space                   2.3 GB                      6.5 GB
    BLEU score                   0.53                        0.62
    Identical matches            27.76% (276/1000)           30% (300/1000)
    Edit distance similarity     0.77                        0.83

  • Conclusions SMT vs. NMT: technical insight

                                 SMT                    NMT
    Disk space                                          Smaller
    CPU during translation       Lower (56% vs. 100%)
    RAM during translation                              Lesser
    Training speed               Faster                 Can be optimized by hardware
    Translation speed            Faster                 Can be optimized by hardware

  • Conclusions SMT vs. NMT: qualitative insight

    In domain                    SMT      NMT
    BLEU                                  better (0.62 vs. 0.53)
    Identical matches                     better (30% vs. 27.76%)
    Edit distance similarity              better (0.83 vs. 0.77)
    Translators' feedback        better

    Generic                      SMT      NMT
    BLEU                         better (0.440 vs. 0.404)
    Identical matches            better (19.33% vs. 12%)
    Edit distance similarity     better (0.78 vs. 0.746)
    Translators' feedback        better

  • Final conclusions: NMT is a new big player in MT

    - Research is now focusing heavily on NMT: it already outperforms SMT in many cases
    - Use case results: with little effort, it is on par with SMT
    - Hardware requirements are more demanding for NMT: a higher budget is needed
    - Translators' feedback: SMT is still better

  • Final conclusions: SMT, and other approaches, remain more robust and alive

    - Better quality and consistency in MT output
    - Better ROI, especially for real-time translation applications where speed is critical

    Deep learning for other NLP applications? Of course! It is thriving in quality estimation, terminology, sentiment analysis, etc.

  • Thanks! Go raibh maith agaibh! (Irish: "Thank you all!")

    Tauyou & Prompsit

    (Diego) dbc@tauyou.com | (Gema) gramirez@prompsit.com
