Multilingual Grammar Induction with Continuous Language Identification
Wenjuan Han, Ge Wang, Yong Jiang, Kewei Tu
ShanghaiTech University, Shanghai, China / Alibaba Group
{hanwj,wangge,tukw}@shanghaitech.edu.cn, {yongjiang.jy}@alibaba-inc.com
November 9, 2019
Han et al., 2019 M-NDMV November 9, 2019 1 / 24
Outline
1 Motivation
2 Model
3 Learning
4 Experiments
5 Conclusion
Motivation
Grammar induction is the task of learning grammars from an unannotated corpus.
Multilingual grammar induction couples the grammar parameters of different languages together and learns them simultaneously.
→ The key is to exploit the similarities between languages.
Motivation
Existing approaches to this problem:
Treating all languages equally (Iwata et al., 2010). → Language similarity is ignored.
Utilizing a hand-crafted phylogenetic tree to encode language similarity (Berg-Kirkpatrick and Klein, 2010). → This requires linguistic knowledge and can sometimes be misleading. Example: English is dominantly SVO while German is not, although they are both Germanic languages.
Model
We represent language identities with continuous vectors, i.e., language embeddings, and use them to encode language similarity.
Model
Model Architecture
[Figure: the Multilingual Grammar Model (G) and the Auxiliary Language Identification Model (I) share a language embedding matrix; a sentence x1 x2 x3 … xn is encoded into a representation h, which feeds both Pattach(c|·) in G (via weights Wdir, Wc, and the valence features) and P(l|x) in I.]
Neural DMV grammar rule probability: PATTACH(child | head, direction, valence)
Model
Model Architecture
Neural DMV grammar rule probability: PATTACH(child | head, direction, valence)
Now we have: PATTACH(child | head, direction, valence, language)
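To make the language-conditioned rule probability concrete, here is a minimal numpy sketch of computing PATTACH(child | head, direction, valence, language) as a softmax over child tags from concatenated features. All dimensions, weight names, and the single-hidden-layer form are illustrative assumptions, not the authors' exact network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 35 POS tags, 15 languages, small embedding dimensions.
NUM_TAGS, NUM_LANGS, TAG_DIM, LANG_DIM, HID = 35, 15, 10, 5, 16

tag_emb = rng.normal(size=(NUM_TAGS, TAG_DIM))     # head-tag embeddings
lang_emb = rng.normal(size=(NUM_LANGS, LANG_DIM))  # the language embedding matrix
W_hid = rng.normal(size=(TAG_DIM + LANG_DIM + 2, HID))
W_out = rng.normal(size=(HID, NUM_TAGS))

def p_attach(head, direction, valence, language):
    """PATTACH(child | head, direction, valence, language): a softmax over
    all child tags, computed from the concatenated conditioning features."""
    feats = np.concatenate([tag_emb[head], lang_emb[language],
                            [float(direction)], [float(valence)]])
    h = np.tanh(feats @ W_hid)          # hidden representation
    scores = h @ W_out                  # one score per candidate child tag
    e = np.exp(scores - scores.max())   # numerically stable softmax
    return e / e.sum()

probs = p_attach(head=3, direction=1, valence=0, language=7)
```

The language embedding enters the rule probability exactly like any other conditioning feature, so similar embeddings yield similar grammars.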
Model
Model Architecture
We predict the language identity of each sentence from the language embeddings and a sentence representation.
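The identification model can be sketched as a softmax over the dot products between a sentence representation h and the rows of the shared language embedding matrix. The dimensions and the dot-product scoring function here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
NUM_LANGS, DIM = 15, 8
lang_emb = rng.normal(size=(NUM_LANGS, DIM))  # shared language embedding matrix

def p_language(h):
    """P(l | x): softmax over the scores between the sentence
    representation h and each language embedding."""
    scores = lang_emb @ h
    e = np.exp(scores - scores.max())
    return e / e.sum()

h = rng.normal(size=DIM)  # e.g. the final hidden state of an encoder over x
dist = p_language(h)
```

Because the same embedding matrix parameterizes both the grammar model and this classifier, the auxiliary task pushes the embeddings to separate languages that the grammar term alone might conflate.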
Model
Model Architecture
For each training sentence x(i) from language l(i), the model defines:
P(x(i) | Gl(i)), the probability of x(i) being generated from grammar Gl(i).
P(l(i) | x(i)), the probability of correctly identifying the language of x(i).
Learning
Objective
For each training sentence x(i) from language l(i):
P(x(i) | Gl(i)), the probability of x(i) being generated from grammar Gl(i).
P(l(i) | x(i)), the probability of correctly identifying the language of x(i).
The training objective is:
L(Θ) = ∑_{(x,l)∈D} ( log P_Θ(x | G_l) + λ log P_Θ(l | x) )
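The objective is a straightforward sum over the corpus. A minimal sketch, with stub log-probability functions standing in for the grammar and identification models:

```python
import math

def objective(data, log_p_x_given_g, log_p_l_given_x, lam=1.0):
    """L(Theta) = sum over (x, l) in D of
    [log P(x | G_l) + lam * log P(l | x)]."""
    return sum(log_p_x_given_g(x, l) + lam * log_p_l_given_x(l, x)
               for x, l in data)

# Toy check with placeholder log-probabilities (not the real models).
data = [("sentence-1", "EN"), ("sentence-2", "DE")]
val = objective(data,
                log_p_x_given_g=lambda x, l: math.log(0.5),
                log_p_l_given_x=lambda l, x: math.log(0.25),
                lam=1.0)
```

The hyperparameter λ trades off fitting the grammars against keeping the embeddings discriminative of language identity.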
Learning
Learning
P(x(i) | Gl(i)) → this term is optimized with EM (with Adam used in the M-step).
P(l(i) | x(i)) → this term is optimized directly with Adam.
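Both terms rely on Adam for the gradient-based updates. Below is a sketch of one standard Adam step (the textbook update rule, not the authors' exact code), applied to a toy scalar objective as a stand-in for the real parameters.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected first/second moment estimates,
    then a scaled gradient step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected mean
    v_hat = v / (1 - b2 ** t)          # bias-corrected variance
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy use: minimize f(theta) = theta^2, a stand-in for the negative objective.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2.0 * theta                 # gradient of theta^2
    theta, m, v = adam_step(theta, grad, m, v, t)
```

In the full model, the M-step applies such updates to the neural grammar parameters against the EM expected counts, while the language identification term is optimized with Adam directly.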
Experiments
Dataset
We selected 15 languages across 8 language families and subfamilies from the UD dataset to ensure diversity.
Code UD Treebank Language Family Corpus Size
ET Estonian Finnic 11404
FI Finnish Finnic 9648
NL Dutch Germanic 8783
EN English Germanic 7674
DE German Germanic 7447
NO Norwegian Germanic 10017
GRC Ancient Greek Hellenic 9387
HI Hindi Indo-Iranian 4997
JA Japanese Japonic 7441
FR French Romance 4976
IT Italian Romance 6492
LA Latin-ITTB Romance 10136
BG Bulgarian Slavonic 6507
SL Slovenian Slavonic 3800
EU Basque Vasconic 4271
Han et al., 2019 M-NDMV November 9, 2019 19 / 24
Experiments
Comparison of monolingual and multilingual approaches.
G: our multilingual grammar model.
G+I: our multilingual grammar model with the auxiliary language identification task.
Code  Monolingual   Multilingual
      DMV   NDMV    DMV   NDMV   G     G+I
ET    51.8  52.9    43.1  45.3   56.0  56.4
FI    31.8  27.6    39.1  40.0   50.7  49.3
NL    42.4  35.6    46.5  47.8   50.4  50.6
EN    51.8  53.7    47.7  50.8   51.7  52.7
DE    52.8  50.4    55.5  57.2   59.6  61.4
NO    58.9  59.2    55.7  58.8   61.0  61.3
GRC   40.4  37.7    41.1  40.8   46.8  46.2
HI    52.6  53.9    29.2  31.1   47.4  46.8
JA    39.8  37.1    27.8  29.6   43.4  44.2
FR    58.8  38.1    59.6  59.4   58.4  60.1
IT    60.8  63.6    66.7  66.4   64.4  65.9
LA    32.6  36.3    39.8  42.0   45.1  45.0
BG    58.9  61.8    65.9  69.4   71.3  71.3
SL    70.7  67.5    62.1  63.3   68.3  68.6
EU    42.1  45.5    45.7  45.2   54.2  53.6
Avg   49.7  48.1    48.4  49.8   55.3  55.6
Each language is indicated by its ISO 639 code.
Experiments
Visualization of the language embeddings
Language       X          Y          Language Family
Estonian       281.8368   294.4548   Finnic
Latin          99.28075   -133.8807  Romance
Norwegian      -58.05113  -1.362061  Germanic
Finnish        291.2706   254.6438   Finnic
Ancient_Greek  170.2816   -160.4296  Hellenic
Dutch          -274.3047  -71.46408  Germanic
English        -90.90612  -27.69737  Germanic
German         -291.0553  -31.99229  Germanic
Japanese       229.3594   -182.7461  Japonic
Bulgarian      4.288813   -12.05062  Slavonic
Italian        -306.261   126.6668   Romance
Hindi          52.45371   -182.9534  Indo-Iranian
French         -291.0037  165.9962   Romance
Basque         116.8331   4.031695   Vasconic
Slovenian      65.97718   -41.21712  Slavonic
[Figure: 2-D scatter plot of the learned language embeddings at the coordinates above, with points grouped by language family: Finnic, Romance, Germanic, Slavonic, Hellenic, Japonic, Indo-Iranian, Vasconic.]
Conclusion
Conclusion
We represent language identities with language embeddings and use them to encode language similarity.
The language embeddings are used for grammar parameter prediction and for an auxiliary language identification task.
The language embeddings learned by our model can capture language similarity that cannot be inferred from phylogenetic knowledge.