Multilingual Grammar Induction with Continuous Language Identification
Wenjuan Han, Ge Wang, Yong Jiang, Kewei Tu
ShanghaiTech University, Shanghai, China / Alibaba Group
{hanwj,wangge,tukw}@shanghaitech.edu.cn, {yongjiang.jy}@alibaba-inc.com
November 9, 2019
Han et al., 2019 M-NDMV November 9, 2019 1 / 24
Outline
1 Motivation
2 Model
3 Learning
4 Experiments
5 Conclusion
Motivation
Grammar induction is the task of learning grammars from an unannotated corpus.
Multilingual grammar induction couples the grammar parameters of different languages together and learns them simultaneously.
→ The key is to exploit the similarities between languages.
Motivation
Existing approaches to this problem:
Treating all languages equally (Iwata et al., 2010). → Language similarity is ignored.
Utilizing a hand-crafted phylogenetic tree to encode language similarity (Berg-Kirkpatrick and Klein, 2010). → This requires linguistic knowledge and can sometimes be misleading. Example: English is dominantly SVO while German is not, although they are both Germanic languages.
Model
We represent language identities with continuous vectors, i.e., language embeddings, and use them to encode language similarity.
Model
Model Architecture
[Figure: the Multilingual Grammar Model (G) and the Auxiliary Language Identification Model (I) share a language embedding matrix; a sentence x1 x2 x3 … xn is encoded into a representation h, which feeds both Pattach(c|·) in G (via weights Wdir, Wc, and the valence features) and P(l|x) in I.]
Neural DMV grammar rule probability: PATTACH(child | head, direction, valence)
Model
Model Architecture
Neural DMV grammar rule probability: PATTACH(child | head, direction, valence)
Now we have: PATTACH(child | head, direction, valence, language)
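To make the language-conditioned rule probability concrete, here is a minimal numpy sketch of computing PATTACH(child | head, direction, valence, language) as a softmax over child tags from concatenated features. All dimensions, weight names, and the single-hidden-layer form are illustrative assumptions, not the authors' exact network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 35 POS tags, 15 languages, small embedding dimensions.
NUM_TAGS, NUM_LANGS, TAG_DIM, LANG_DIM, HID = 35, 15, 10, 5, 16

tag_emb = rng.normal(size=(NUM_TAGS, TAG_DIM))     # head-tag embeddings
lang_emb = rng.normal(size=(NUM_LANGS, LANG_DIM))  # the language embedding matrix
W_hid = rng.normal(size=(TAG_DIM + LANG_DIM + 2, HID))
W_out = rng.normal(size=(HID, NUM_TAGS))

def p_attach(head, direction, valence, language):
    """PATTACH(child | head, direction, valence, language): a softmax over
    all child tags, computed from the concatenated conditioning features."""
    feats = np.concatenate([tag_emb[head], lang_emb[language],
                            [float(direction)], [float(valence)]])
    h = np.tanh(feats @ W_hid)          # hidden representation
    scores = h @ W_out                  # one score per candidate child tag
    e = np.exp(scores - scores.max())   # numerically stable softmax
    return e / e.sum()

probs = p_attach(head=3, direction=1, valence=0, language=7)
```

The language embedding enters the rule probability exactly like any other conditioning feature, so similar embeddings yield similar grammars.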
Model
Model Architecture
We predict the language identity of each sentence from the language embeddings and a sentence representation.
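The identification model can be sketched as a softmax over the dot products between a sentence representation h and the rows of the shared language embedding matrix. The dimensions and the dot-product scoring function here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
NUM_LANGS, DIM = 15, 8
lang_emb = rng.normal(size=(NUM_LANGS, DIM))  # shared language embedding matrix

def p_language(h):
    """P(l | x): softmax over the scores between the sentence
    representation h and each language embedding."""
    scores = lang_emb @ h
    e = np.exp(scores - scores.max())
    return e / e.sum()

h = rng.normal(size=DIM)  # e.g. the final hidden state of an encoder over x
dist = p_language(h)
```

Because the same embedding matrix parameterizes both the grammar model and this classifier, the auxiliary task pushes the embeddings to separate languages that the grammar term alone might conflate.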
Model
Model Architecture
For each training sentence x(i) from language l(i), the model defines:
P(x(i) | Gl(i)), the probability of x(i) being generated from grammar Gl(i).
P(l(i) | x(i)), the probability of correctly identifying the language of x(i).
Learning
Objective
For each training sentence x(i) from language l(i):
P(x(i) | Gl(i)), the probability of x(i) being generated from grammar Gl(i).
P(l(i) | x(i)), the probability of correctly identifying the language of x(i).
The training objective is:
L(Θ) = ∑_{(x,l)∈D} ( log P_Θ(x | G_l) + λ log P_Θ(l | x) )
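The objective is a straightforward sum over the corpus. A minimal sketch, with stub log-probability functions standing in for the grammar and identification models:

```python
import math

def objective(data, log_p_x_given_g, log_p_l_given_x, lam=1.0):
    """L(Theta) = sum over (x, l) in D of
    [log P(x | G_l) + lam * log P(l | x)]."""
    return sum(log_p_x_given_g(x, l) + lam * log_p_l_given_x(l, x)
               for x, l in data)

# Toy check with placeholder log-probabilities (not the real models).
data = [("sentence-1", "EN"), ("sentence-2", "DE")]
val = objective(data,
                log_p_x_given_g=lambda x, l: math.log(0.5),
                log_p_l_given_x=lambda l, x: math.log(0.25),
                lam=1.0)
```

The hyperparameter λ trades off fitting the grammars against keeping the embeddings discriminative of language identity.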
Learning
Learning
P(x(i) | Gl(i)) → this term is optimized with EM (with Adam used in the M-step).
P(l(i) | x(i)) → this term is optimized directly with Adam.
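Both terms rely on Adam for the gradient-based updates. Below is a sketch of one standard Adam step (the textbook update rule, not the authors' exact code), applied to a toy scalar objective as a stand-in for the real parameters.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected first/second moment estimates,
    then a scaled gradient step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected mean
    v_hat = v / (1 - b2 ** t)          # bias-corrected variance
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy use: minimize f(theta) = theta^2, a stand-in for the negative objective.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2.0 * theta                 # gradient of theta^2
    theta, m, v = adam_step(theta, grad, m, v, t)
```

In the full model, the M-step applies such updates to the neural grammar parameters against the EM expected counts, while the language identification term is optimized with Adam directly.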
Experiments
Dataset
We selected 15 languages across 8 language families and subfamilies from the UD dataset to ensure diversity.
Code UD Treebank Language Family Corpus Size
ET Estonian Finnic 11404
FI Finnish Finnic 9648
NL Dutch Germanic 8783
EN English Germanic 7674
DE German Germanic 7447
NO Norwegian Germanic 10017
GRC Ancient Greek Hellenic 9387
HI Hindi Indo-Iranian 4997
JA Japanese Japonic 7441
FR French Romance 4976
IT Italian Romance 6492
LA Latin-ITTB Romance 10136
BG Bulgarian Slavonic 6507
SL Slovenian Slavonic 3800
EU Basque Vasconic 4271
Han et al., 2019 M-NDMV November 9, 2019 19 / 24
Experiments
Comparison of monolingual and multilingual approaches.
G: our multilingual grammar model.
G+I: our multilingual grammar model with the auxiliary language identification task.
Code  Monolingual   Multilingual
      DMV   NDMV    DMV   NDMV   G     G+I
ET    51.8  52.9    43.1  45.3   56.0  56.4
FI    31.8  27.6    39.1  40.0   50.7  49.3
NL    42.4  35.6    46.5  47.8   50.4  50.6
EN    51.8  53.7    47.7  50.8   51.7  52.7
DE    52.8  50.4    55.5  57.2   59.6  61.4
NO    58.9  59.2    55.7  58.8   61.0  61.3
GRC   40.4  37.7    41.1  40.8   46.8  46.2
HI    52.6  53.9    29.2  31.1   47.4  46.8
JA    39.8  37.1    27.8  29.6   43.4  44.2
FR    58.8  38.1    59.6  59.4   58.4  60.1
IT    60.8  63.6    66.7  66.4   64.4  65.9
LA    32.6  36.3    39.8  42.0   45.1  45.0
BG    58.9  61.8    65.9  69.4   71.3  71.3
SL    70.7  67.5    62.1  63.3   68.3  68.6
EU    42.1  45.5    45.7  45.2   54.2  53.6
Avg   49.7  48.1    48.4  49.8   55.3  55.6
Each language is indicated by its ISO 639 code.
Experiments
Visualization of the language embeddings
Language       X          Y          Language Family
Estonian       281.8368   294.4548   Finnic
Latin          99.28075   -133.8807  Romance
Norwegian      -58.05113  -1.362061  Germanic
Finnish        291.2706   254.6438   Finnic
Ancient_Greek  170.2816   -160.4296  Hellenic
Dutch          -274.3047  -71.46408  Germanic
English        -90.90612  -27.69737  Germanic
German         -291.0553  -31.99229  Germanic
Japanese       229.3594   -182.7461  Japonic
Bulgarian      4.288813   -12.05062  Slavonic
Italian        -306.261   126.6668   Romance
Hindi          52.45371   -182.9534  Indo-Iranian
French         -291.0037  165.9962   Romance
Basque         116.8331   4.031695   Vasconic
Slovenian      65.97718   -41.21712  Slavonic
[Figure: 2-D scatter plot of the learned language embeddings at the coordinates above, with points grouped by language family: Finnic, Romance, Germanic, Slavonic, Hellenic, Japonic, Indo-Iranian, Vasconic.]
Conclusion
Conclusion
We represent language identities with language embeddings and use them to encode language similarity.
The language embeddings are used for grammar parameter prediction and for an auxiliary language identification task.
The language embeddings learned by our model can capture language similarity that cannot be inferred from phylogenetic knowledge.