Meta-Learning for Low-Resource NMT
Introduction
● Historically, machine translation was dominated by statistical methods
● Neural Machine Translation has recently outperformed statistical systems
● Statistical models still outperformed NMT on low-resource language pairs
NMT Previous Work
Single Task Mixed Datasets
Monolingual Corpora
Direct Transfer Learning
Meta-Learning in NMT
Idea: improve on direct transfer learning through better fine-tuning, i.e. learn an initialization that adapts well to a new language pair
[Diagram: 17 high-resource languages (Danish, French, Spanish, Greek, Polish, Portuguese, Italian, ...) and 4 low-resource languages (Turkish, Romanian, Latvian, Finnish)]
MAML for NMT
[Same diagram, annotated: meta-train on the 17 high-resource pairs (e.g. Spanish→English); meta-test on the 4 low-resource pairs (e.g. Turkish→English)]
Note: they simulate low-resource by sub-sampling
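The transcript doesn't say how the sub-sampling was done; as a minimal sketch (the function name, the choice of k, and the corpus variables are illustrative, not taken from the paper), simulating a low-resource pair could look like:

```python
import random

def simulate_low_resource(parallel_corpus, k, seed=0):
    """Sub-sample k sentence pairs from a full parallel corpus
    to simulate a low-resource language pair."""
    rng = random.Random(seed)
    return rng.sample(parallel_corpus, k)

# Illustrative usage (corpus variable and size are hypothetical):
# ro_en_small = simulate_low_resource(ro_en_full, k=16_000)
```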
Gradient Update
Meta-Gradient Update
1st-order Approximate Meta-Gradient Update
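The equations behind these three labels weren't captured in the transcript. Assuming the standard MAML formulation the talk is referencing, with θ the NMT parameters, D_T / D'_T the support and held-out splits of a meta-training task T, and α, β the inner and meta learning rates (notation mine, not the slides'), they would read:

```latex
% Task-specific gradient update on the support split D_T:
\theta'_T = \theta - \alpha \,\nabla_\theta \mathcal{L}_{D_T}(\theta)

% Meta-gradient update on the held-out split D'_T
% (differentiating through the inner update brings in a Hessian term):
\theta \leftarrow \theta - \beta \,\nabla_\theta \sum_T \mathcal{L}_{D'_T}(\theta'_T),
\quad
\nabla_\theta \mathcal{L}_{D'_T}(\theta'_T)
  = \nabla_{\theta'_T} \mathcal{L}_{D'_T}(\theta'_T)\,
    \bigl(I - \alpha \,\nabla^2_\theta \mathcal{L}_{D_T}(\theta)\bigr)

% 1st-order approximate meta-gradient update: drop the Hessian term
\nabla_\theta \mathcal{L}_{D'_T}(\theta'_T) \approx \nabla_{\theta'_T} \mathcal{L}_{D'_T}(\theta'_T)
```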
Issue: Meta-train and meta-test input spaces should match!
Meta-train: "En un lugar de la mancha, de cuyo nombre no puedo..." → "In some place in the Mancha, whose name..."
Meta-test: "Benim adım kırmızı..." → "My name is Red..."
[Diagram: the Spanish embedding for "nombre" and the Turkish embedding for "adım" come from Spanish and Turkish word embeddings that were trained independently]
Universal Lexical Representation
[Diagram: Spanish, English, French, and Turkish word embeddings, each trained independently on monolingual corpora, alongside a shared set of Universal Embedding Keys and Universal Embedding Values]
Universal Lexical Representation
[Diagram: the Spanish embedding of "nombre", a Transformation Matrix, the Universal Embedding Keys (transposed), and the Universal Embedding Values]
Key: we represent "nombre" as a linear combination of tokens in the ULR!
And these are the weights of the linear combination!
Universal Lexical Representation
[Diagram: the same computation for the Turkish embedding of "adım": Transformation Matrix, Universal Embedding Keys (transposed), Universal Embedding Values]
Same embedding space as Spanish!
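A minimal numpy sketch of the lookup these slides describe: the token's monolingual embedding acts as a query, the transformation matrix projects it onto the universal keys, and the resulting softmax weights form the linear combination of universal values. The dimensions, the temperature tau, and the use of a single shared transformation matrix for both languages are my assumptions, not details from the slides.

```python
import numpy as np

def ulr_embedding(query_vec, A, universal_keys, universal_values, tau=0.05):
    """Map one language-specific word embedding into the universal space."""
    scores = universal_keys @ (A @ query_vec) / tau   # one score per universal token
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax: weights of the linear combination
    return weights @ universal_values                 # linear combination of universal values

# Toy example: Spanish "nombre" and Turkish "adım" land in the same space,
# even though their monolingual embeddings were trained independently.
rng = np.random.default_rng(0)
d_word, d_key, d_val, M = 300, 256, 512, 1000         # illustrative sizes
keys = rng.normal(size=(M, d_key))
values = rng.normal(size=(M, d_val))
A = rng.normal(size=(d_key, d_word))                  # transformation matrix (shared here for simplicity)

e_nombre = rng.normal(size=d_word)                    # stand-in for the Spanish embedding of "nombre"
e_adim = rng.normal(size=d_word)                      # stand-in for the Turkish embedding of "adım"
u_nombre = ulr_embedding(e_nombre, A, keys, values)
u_adim = ulr_embedding(e_adim, A, keys, values)
assert u_nombre.shape == u_adim.shape                 # both live in the universal value space
```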
Training
[Same diagram for "nombre", with the components labeled as either trainable or fixed]
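The transcript doesn't record which components the "trainable" and "fixed" labels point to. A plausible reading, since the monolingual embeddings were pre-trained independently, is that those stay fixed while the transformation matrix (and here also the universal embeddings) receive gradients; the PyTorch sketch below encodes exactly that assumption, with illustrative names and sizes.

```python
import torch
import torch.nn as nn

class ULRLayer(nn.Module):
    """Sketch of which ULR pieces receive gradients during (meta-)training."""

    def __init__(self, mono_embeddings, n_universal=1000, d_key=256, d_val=512):
        super().__init__()
        # Monolingual query embeddings: pre-trained independently, kept fixed (buffer = no gradient).
        self.register_buffer("query_emb", mono_embeddings)
        # Transformation matrix: trainable.
        self.A = nn.Parameter(torch.randn(d_key, mono_embeddings.size(1)))
        # Universal keys and values: trainable here (an assumption, see lead-in).
        self.keys = nn.Parameter(torch.randn(n_universal, d_key))
        self.values = nn.Parameter(torch.randn(n_universal, d_val))

    def forward(self, token_ids, tau=0.05):
        q = self.query_emb[token_ids]                        # (batch, d_word), no gradient flows here
        scores = (q @ self.A.t()) @ self.keys.t() / tau      # (batch, n_universal)
        return torch.softmax(scores, dim=-1) @ self.values   # (batch, d_val)

# Illustrative usage with random stand-in "Spanish" embeddings:
# ulr = ULRLayer(torch.randn(32000, 300))
# universal_vecs = ulr(torch.tensor([1234, 42]))
```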
Experiments
Comment: Best to leave the decoder be! Why?
Experiments
Comment: The gap narrows as more training examples are included
Critique: They don't evaluate on any real low-resource languages!
Critique: We don't know how many training examples are used per task. It's k-shot, but what is k?