universal dependency parsing - inriapauillac.inria.fr/~seddah/nivrespmrl.pdf · this study parsing...

85
Universal Dependency Parsing Joakim Nivre Uppsala University Department of Linguistics and Philology

Upload: others

Post on 02-Oct-2020

16 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Universal Dependency Parsing

Joakim Nivre !

Uppsala University Department of Linguistics and Philology

Page 2: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

The Grand Vision

Page 3: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

The Grand Vision

The problem • Why 90% parsing accuracy for English but only 80% for Finnish?

• Are some languages intrinsically harder to parse?

• Not just morphological richness – many typological parameters

Page 4: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

The Grand Vision

The problem • Why 90% parsing accuracy for English but only 80% for Finnish?

• Are some languages intrinsically harder to parse?

• Not just morphological richness – many typological parameters

An ideal solution • A universal parser for all languages

• Linguistic universals are hard-coded

• Typological parameters are learned from data

Page 5: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

The Grand Vision

The problem • Why 90% parsing accuracy for English but only 80% for Finnish?

• Are some languages intrinsically harder to parse?

• Not just morphological richness – many typological parameters

An ideal solution • A universal parser for all languages

• Linguistic universals are hard-coded

• Typological parameters are learned from data

Gee! What a neat idea!

Page 6: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Pieces of the Puzzle

1. Morphosyntactic disambiguation

2. Universal dependency annotation

3. Parsing with universal dependencies

Page 7: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Part I

Morphosyntactic Disambiguation

Joint work with Bernd Bohnet, Igor Boguslavsky, Richárd Farkas, Filip Ginter and Jan Hajič

Page 8: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Background

Parsing accuracy for morphologically rich languages tends to be lower than for languages like English (Nivre et al., 2007)

Hypothesized explanations (Tsarfaty et al. 2010, 2013): • Strict separation morphology-syntax

• Data sparsity due to high type-token ratio

Suggested remedies • Joint morphological and syntactic analysis (Lee et al., 2011)

• Lexical resources (Hajič, 2000; Goldberg and Elhadad, 2013)

Background

Page 9: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

This StudyThis StudyParsing techniques:

• Transition-based model for joint morphological and syntactic analysis

• Lexical resources integrated as hard or soft constraints

Evaluation on five morphologically rich languages: • Czech, Finnish, German, Hungarian, Russian

• New state of the art in dependency parsing for all languages

Bernd Bohnet, Joakim Nivre, Igor Boguslavsky, Richárd Farkas, Filip Ginter, and Jan Hajič. 2013. Joint Morphological and Syntactic Analysis for Richly Inflected Languages. Transactions of the Association for Computational Linguistics 1:415–428

Page 10: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

RepresentationsRepresentations

Page 11: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

For each word of a sentence w1, …, wn:

Representations

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Representations

Page 12: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

For each word of a sentence w1, …, wn:• A part-of-speech tag p ∈ P

Representations

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Representations

Page 13: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

For each word of a sentence w1, …, wn:• A part-of-speech tag p ∈ P• A morphological feature bundle m ∈ M

Representations

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Representations

Page 14: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

For each word of a sentence w1, …, wn:• A part-of-speech tag p ∈ P• A morphological feature bundle m ∈ M• A lemma l ∈ Z*

Representations

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Representations

Page 15: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

For each word of a sentence w1, …, wn:• A part-of-speech tag p ∈ P• A morphological feature bundle m ∈ M• A lemma l ∈ Z* • A syntactic head h ∈ {0, wi, …, wn}

Representations

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Representations

Page 16: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

For each word of a sentence w1, …, wn:• A part-of-speech tag p ∈ P• A morphological feature bundle m ∈ M• A lemma l ∈ Z* • A syntactic head h ∈ {0, wi, …, wn}• A dependency label d ∈ D

Representations

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

Ein Haus hat er in Ulm gebaut .ART NN VAFIN PPER APPR NE VVPP $.

acc|sg|neut acc|sg|neut sg|3|pres|ind nom|sg|masc|3 — dat|sg|neut — —ein haus haben er in ulm bauen —

NK

OA

SB

MO

NK

OC

7

Representations

Page 17: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Parsing Framework

Transition system (Bohnet and Nivre, 2012): • Arc-standard system with online reordering (Nivre, 2009)

• Select morphology when shifting words to the stack

Beam search and structured learning (Zhang and Clark, 2008)

Preprocessing (at learning and parsing time) • Tagger assigns k best tags and feature bundles

• Parser can only select analyses licensed by preprocessor

Page 18: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Experiment 1Preprocessor assigns a single tag and feature bundle per word

Beam = 40 distinct trees

!Preprocessor assigns up to 2 tags and feature bundles per word

Beam = 40 distinct trees + 8 tag variants + 8 feature variants

Pipeline

Treebank Train Dev Test P M D

cs PDT (Hajič et al., 2001) 1249K 88K 70K 12 1851 49

de Tiger (Brants et al., 2002) 648K 32K 32K 54 257 43

fi TDT (Haverinen et al., 2013) 183K 21K 12 1917 47

hu Szeged (Farkas et al., 2012) 1101K 210K 171K 22 1105 33

ru SynTagRus (Boguslavsky et al., 2000) 575K 73K 72K 14 454 78

Joint

Page 19: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

10

Morphological Accuracy

85

90

95

100

cs de fi hu ru

92.8

96.2

89.190.0

93.7

92.6

96.1

88.889.1

93.0

Pipeline Joint

Page 20: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

10

Morphological Accuracy

85

90

95

100

cs de fi hu ru

92.8

96.2

89.190.0

93.7

92.6

96.1

88.889.1

93.0

Pipeline Joint

Page 21: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

11

Syntactic Accuracy

75

80

85

90

95

100

cs de fi hu ru

87.688.9

80.6

92.0

83.3

87.488.4

79.9

91.8

83.1

Pipeline Joint

Page 22: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

11

Syntactic Accuracy

75

80

85

90

95

100

cs de fi hu ru

87.688.9

80.6

92.0

83.3

87.488.4

79.9

91.8

83.1

Pipeline Joint

Page 23: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Experiment 2Morphological lexicons

• Lookup of tag, feature bundle and lemma

• Added as hard or soft constraints to preprocessor and parser

• Lemma selected deterministically from form + tag + feature bundle

Lexicon

cs PDT (Hajič and Hladká, 1998)

de SMOR (Schmid et al., 2004)

fi OMorFi (Pirinen, 2011)

hu morphdb.hu (Trón et al., 2006)

ru ETAP-3 (Apresian et al., 2003)

POS MOR

0

1

2

3

4+

POS MOR

0

1

2

3

4+

cs fi de hu ru

Page 24: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

13

Morphological Accuracy

80

85

90

95

100

cs de fi hu ru

95.1

97.4

91.691.2

94.4 94.5

97.1

91.1

92.9 92.8

96.2

89.190.0

93.7

Joint LexHard LexSoft

Page 25: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

13

Morphological Accuracy

80

85

90

95

100

cs de fi hu ru

95.1

97.4

91.691.2

94.4 94.5

97.1

91.1

92.9 92.8

96.2

89.190.0

93.7

Joint LexHard LexSoft

?

Page 26: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

14

Syntactic Accuracy

75

80

85

90

95

100

cs de fi hu ru

87.789.1

82.3

92.1

83.5

88.089.1

82.5

92.0

83.4

87.688.9

80.6

92.0

83.3

Joint LexHard LexSoft

Page 27: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

14

Syntactic Accuracy

75

80

85

90

95

100

cs de fi hu ru

87.789.1

82.3

92.1

83.5

88.089.1

82.5

92.0

83.4

87.688.9

80.6

92.0

83.3

Joint LexHard LexSoft

Page 28: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Discussion

Joint inference benefits both morphology and syntax • Strongest effect on morphology for Czech and German – syncretism?

• Strongest effect on syntax for Finnish and Hungarian – why?

Lexical resources mitigate data sparseness • Strongest effect on morphology – soft constraints always best?

• Weaker effect on syntax – relevant errors fixed by joint inference?

• Strong effect on syntax for Finnish – sparse data?

Page 29: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Discussion

Joint inference benefits both morphology and syntax • Strongest effect on morphology for Czech and German – syncretism?

• Strongest effect on syntax for Finnish and Hungarian – why?

Lexical resources mitigate data sparseness • Strongest effect on morphology – soft constraints always best?

• Weaker effect on syntax – relevant errors fixed by joint inference?

• Strong effect on syntax for Finnish – sparse data?

Nice! But why only 82%

for Finnish?

Can we even compare the

numbers?

Page 30: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Part 2

Universal Dependency Annotation

Joint work with Ryan McDonald, Slav Petrov, Chris Manning, Marie de Marneffe, Jinho Choi, Filip Ginter, Yoav Goldberg, Jan Hajič and Reut Tsarfaty

Page 31: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Apples and Oranges

Page 32: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Apples and Oranges

Treebank annotation schemes vary across languages • Hard to compare parsing results across languages (Nivre et al., 2007)

• Hard to evaluate cross-lingual learning (McDonald et al., 2013)

• Hard to make progress towards a universal parser?

Page 33: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Apples and Oranges

Treebank annotation schemes vary across languages • Hard to compare parsing results across languages (Nivre et al., 2007)

• Hard to evaluate cross-lingual learning (McDonald et al., 2013)

• Hard to make progress towards a universal parser?

Recent initiatives: • HamleDT: Conversion of 29 existing treebanks to a PDT-like

annotation scheme (Zeman et al., 2012)

• Universal Dependency Treebank Project: New annotation, conversion and harmonization to Google universal PoS tags and Stanford dependencies (McDonald et al., 2013)

Page 34: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Delexicalized transfer parsing with universal PoS tags

Case Study

SourceTraining

Language

Target Test LanguageUnlabeled Attachment Score (UAS) Labeled Attachment Score (LAS)Germanic Romance Germanic Romance

DE EN SV ES FR KO DE EN SV ES FR KODE 74.86 55.05 65.89 60.65 62.18 40.59 64.84 47.09 53.57 48.14 49.59 27.73EN 58.50 83.33 70.56 68.07 70.14 42.37 48.11 78.54 57.04 56.86 58.20 26.65SV 61.25 61.20 80.01 67.50 67.69 36.95 52.19 49.71 70.90 54.72 54.96 19.64ES 55.39 58.56 66.84 78.46 75.12 30.25 45.52 47.87 53.09 70.29 63.65 16.54FR 55.05 59.02 65.05 72.30 81.44 35.79 45.96 47.41 52.25 62.56 73.37 20.84KO 33.04 32.20 27.62 26.91 29.35 71.22 26.36 21.81 18.12 18.63 19.52 55.85

Table 3: Cross-lingual transfer parsing results. Bolded are the best per target cross-lingual result.

mon to both studies (DE, EN, SV and ES). In thatstudy, UAS was in the 38–68% range, as comparedto 55–75% here. For Swedish, we can even mea-sure the difference exactly, because the test setsare the same, and we see an increase from 58.3%to 70.6%. This suggests that most cross-lingualparsing studies have underestimated accuracies.

4 Conclusion

We have released data sets for six languages withconsistent dependency annotation. After the ini-tial release, we will continue to annotate data inmore languages as well as investigate further au-tomatic treebank conversions. This may also leadto modifications of the annotation scheme, whichshould be regarded as preliminary at this point.Specifically, with more typologically and morpho-logically diverse languages being added to the col-lection, it may be advisable to consistently en-force the principle that content words take func-tion words as dependents, which is currently vi-olated in the analysis of adpositional and copulaconstructions. This will ensure a consistent analy-sis of functional elements that in some languagesare not realized as free words or are not obliga-tory, such as adpositions which are often absentdue to case inflections in languages like Finnish. Itwill also allow the inclusion of language-specificfunctional or morphological markers (case mark-ers, topic markers, classifiers, etc.) at the leaves ofthe tree, where they can easily be ignored in appli-cations that require a uniform cross-lingual repre-sentation. Finally, this data is available on an opensource repository in the hope that the communitywill commit new data and make corrections to ex-isting annotations.

Acknowledgments

Many people played critical roles in the pro-cess of creating the resource. At Google, Fer-

nando Pereira, Alfred Spector, Kannan Pashu-pathy, Michael Riley and Corinna Cortes sup-ported the project and made sure it had the re-quired resources. Jennifer Bahk and Dave Orrhelped coordinate the necessary contracts. AndreaHeld, Supreet Chinnan, Elizabeth Hewitt, Tu Tsaoand Leigha Weinberg made the release processsmooth. Michael Ringgaard, Andy Golding, TerryKoo, Alexander Rush and many others providedtechnical advice. Hans Uszkoreit gave us per-mission to use a subsample of sentences from theTiger Treebank (Brants et al., 2002), the source ofthe news domain for our German data set. Anno-tations were additionally provided by Sulki Kim,Patrick McCrae, Laurent Alamarguy and HectorFernandez Alcalde.

ReferencesAlena Bohmova, Jan Hajic, Eva Hajicova, and Barbora

Hladka. 2003. The Prague Dependency Treebank:A three-level annotation scenario. In Anne Abeille,editor, Treebanks: Building and Using Parsed Cor-pora, pages 103–127. Kluwer.

Sabine Brants, Stefanie Dipper, Silvia Hansen, Wolf-gang Lezius, and George Smith. 2002. The TIGERTreebank. In Proceedings of the Workshop on Tree-banks and Linguistic Theories.

Sabine Buchholz and Erwin Marsi. 2006. CoNLL-Xshared task on multilingual dependency parsing. InProceedings of CoNLL.

Miriam Butt, Helge Dyvik, Tracy Holloway King,Hiroshi Masuichi, and Christian Rohrer. 2002.The parallel grammar project. In Proceedings ofthe 2002 workshop on Grammar engineering andevaluation-Volume 15.

Pi-Chuan Chang, Huihsin Tseng, Dan Jurafsky, andChristopher D. Manning. 2009. Discriminativereordering with Chinese grammatical relations fea-tures. In Proceedings of the Third Workshop on Syn-tax and Structure in Statistical Translation (SSST-3)at NAACL HLT 2009.

Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, and Jungmee Lee. 2013. Universal Dependency Annotation for Multilingual Parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 92–97.

Page 35: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Delexicalized transfer parsing with universal PoS tags

Case Study

SourceTraining

Language

Target Test LanguageUnlabeled Attachment Score (UAS) Labeled Attachment Score (LAS)Germanic Romance Germanic Romance

DE EN SV ES FR KO DE EN SV ES FR KODE 74.86 55.05 65.89 60.65 62.18 40.59 64.84 47.09 53.57 48.14 49.59 27.73EN 58.50 83.33 70.56 68.07 70.14 42.37 48.11 78.54 57.04 56.86 58.20 26.65SV 61.25 61.20 80.01 67.50 67.69 36.95 52.19 49.71 70.90 54.72 54.96 19.64ES 55.39 58.56 66.84 78.46 75.12 30.25 45.52 47.87 53.09 70.29 63.65 16.54FR 55.05 59.02 65.05 72.30 81.44 35.79 45.96 47.41 52.25 62.56 73.37 20.84KO 33.04 32.20 27.62 26.91 29.35 71.22 26.36 21.81 18.12 18.63 19.52 55.85

Table 3: Cross-lingual transfer parsing results. Bolded are the best per target cross-lingual result.

mon to both studies (DE, EN, SV and ES). In thatstudy, UAS was in the 38–68% range, as comparedto 55–75% here. For Swedish, we can even mea-sure the difference exactly, because the test setsare the same, and we see an increase from 58.3%to 70.6%. This suggests that most cross-lingualparsing studies have underestimated accuracies.

4 Conclusion

We have released data sets for six languages withconsistent dependency annotation. After the ini-tial release, we will continue to annotate data inmore languages as well as investigate further au-tomatic treebank conversions. This may also leadto modifications of the annotation scheme, whichshould be regarded as preliminary at this point.Specifically, with more typologically and morpho-logically diverse languages being added to the col-lection, it may be advisable to consistently en-force the principle that content words take func-tion words as dependents, which is currently vi-olated in the analysis of adpositional and copulaconstructions. This will ensure a consistent analy-sis of functional elements that in some languagesare not realized as free words or are not obliga-tory, such as adpositions which are often absentdue to case inflections in languages like Finnish. Itwill also allow the inclusion of language-specificfunctional or morphological markers (case mark-ers, topic markers, classifiers, etc.) at the leaves ofthe tree, where they can easily be ignored in appli-cations that require a uniform cross-lingual repre-sentation. Finally, this data is available on an opensource repository in the hope that the communitywill commit new data and make corrections to ex-isting annotations.

Acknowledgments

Many people played critical roles in the pro-cess of creating the resource. At Google, Fer-

nando Pereira, Alfred Spector, Kannan Pashu-pathy, Michael Riley and Corinna Cortes sup-ported the project and made sure it had the re-quired resources. Jennifer Bahk and Dave Orrhelped coordinate the necessary contracts. AndreaHeld, Supreet Chinnan, Elizabeth Hewitt, Tu Tsaoand Leigha Weinberg made the release processsmooth. Michael Ringgaard, Andy Golding, TerryKoo, Alexander Rush and many others providedtechnical advice. Hans Uszkoreit gave us per-mission to use a subsample of sentences from theTiger Treebank (Brants et al., 2002), the source ofthe news domain for our German data set. Anno-tations were additionally provided by Sulki Kim,Patrick McCrae, Laurent Alamarguy and HectorFernandez Alcalde.

ReferencesAlena Bohmova, Jan Hajic, Eva Hajicova, and Barbora

Hladka. 2003. The Prague Dependency Treebank:A three-level annotation scenario. In Anne Abeille,editor, Treebanks: Building and Using Parsed Cor-pora, pages 103–127. Kluwer.

Sabine Brants, Stefanie Dipper, Silvia Hansen, Wolf-gang Lezius, and George Smith. 2002. The TIGERTreebank. In Proceedings of the Workshop on Tree-banks and Linguistic Theories.

Sabine Buchholz and Erwin Marsi. 2006. CoNLL-Xshared task on multilingual dependency parsing. InProceedings of CoNLL.

Miriam Butt, Helge Dyvik, Tracy Holloway King,Hiroshi Masuichi, and Christian Rohrer. 2002.The parallel grammar project. In Proceedings ofthe 2002 workshop on Grammar engineering andevaluation-Volume 15.

Pi-Chuan Chang, Huihsin Tseng, Dan Jurafsky, andChristopher D. Manning. 2009. Discriminativereordering with Chinese grammatical relations fea-tures. In Proceedings of the Third Workshop on Syn-tax and Structure in Statistical Translation (SSST-3)at NAACL HLT 2009.

Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, and Jungmee Lee. 2013. Universal Dependency Annotation for Multilingual Parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 92–97.

First evaluation of labeled accuracy

Page 36: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Delexicalized transfer parsing with universal PoS tags

Case Study

SourceTraining

Language

Target Test LanguageUnlabeled Attachment Score (UAS) Labeled Attachment Score (LAS)Germanic Romance Germanic Romance

DE EN SV ES FR KO DE EN SV ES FR KODE 74.86 55.05 65.89 60.65 62.18 40.59 64.84 47.09 53.57 48.14 49.59 27.73EN 58.50 83.33 70.56 68.07 70.14 42.37 48.11 78.54 57.04 56.86 58.20 26.65SV 61.25 61.20 80.01 67.50 67.69 36.95 52.19 49.71 70.90 54.72 54.96 19.64ES 55.39 58.56 66.84 78.46 75.12 30.25 45.52 47.87 53.09 70.29 63.65 16.54FR 55.05 59.02 65.05 72.30 81.44 35.79 45.96 47.41 52.25 62.56 73.37 20.84KO 33.04 32.20 27.62 26.91 29.35 71.22 26.36 21.81 18.12 18.63 19.52 55.85

Table 3: Cross-lingual transfer parsing results. Bolded are the best per target cross-lingual result.

mon to both studies (DE, EN, SV and ES). In thatstudy, UAS was in the 38–68% range, as comparedto 55–75% here. For Swedish, we can even mea-sure the difference exactly, because the test setsare the same, and we see an increase from 58.3%to 70.6%. This suggests that most cross-lingualparsing studies have underestimated accuracies.

4 Conclusion

We have released data sets for six languages withconsistent dependency annotation. After the ini-tial release, we will continue to annotate data inmore languages as well as investigate further au-tomatic treebank conversions. This may also leadto modifications of the annotation scheme, whichshould be regarded as preliminary at this point.Specifically, with more typologically and morpho-logically diverse languages being added to the col-lection, it may be advisable to consistently en-force the principle that content words take func-tion words as dependents, which is currently vi-olated in the analysis of adpositional and copulaconstructions. This will ensure a consistent analy-sis of functional elements that in some languagesare not realized as free words or are not obliga-tory, such as adpositions which are often absentdue to case inflections in languages like Finnish. Itwill also allow the inclusion of language-specificfunctional or morphological markers (case mark-ers, topic markers, classifiers, etc.) at the leaves ofthe tree, where they can easily be ignored in appli-cations that require a uniform cross-lingual repre-sentation. Finally, this data is available on an opensource repository in the hope that the communitywill commit new data and make corrections to ex-isting annotations.

Acknowledgments

Many people played critical roles in the pro-cess of creating the resource. At Google, Fer-

nando Pereira, Alfred Spector, Kannan Pashu-pathy, Michael Riley and Corinna Cortes sup-ported the project and made sure it had the re-quired resources. Jennifer Bahk and Dave Orrhelped coordinate the necessary contracts. AndreaHeld, Supreet Chinnan, Elizabeth Hewitt, Tu Tsaoand Leigha Weinberg made the release processsmooth. Michael Ringgaard, Andy Golding, TerryKoo, Alexander Rush and many others providedtechnical advice. Hans Uszkoreit gave us per-mission to use a subsample of sentences from theTiger Treebank (Brants et al., 2002), the source ofthe news domain for our German data set. Anno-tations were additionally provided by Sulki Kim,Patrick McCrae, Laurent Alamarguy and HectorFernandez Alcalde.

ReferencesAlena Bohmova, Jan Hajic, Eva Hajicova, and Barbora

Hladka. 2003. The Prague Dependency Treebank:A three-level annotation scenario. In Anne Abeille,editor, Treebanks: Building and Using Parsed Cor-pora, pages 103–127. Kluwer.

Sabine Brants, Stefanie Dipper, Silvia Hansen, Wolf-gang Lezius, and George Smith. 2002. The TIGERTreebank. In Proceedings of the Workshop on Tree-banks and Linguistic Theories.

Sabine Buchholz and Erwin Marsi. 2006. CoNLL-Xshared task on multilingual dependency parsing. InProceedings of CoNLL.

Miriam Butt, Helge Dyvik, Tracy Holloway King,Hiroshi Masuichi, and Christian Rohrer. 2002.The parallel grammar project. In Proceedings ofthe 2002 workshop on Grammar engineering andevaluation-Volume 15.

Pi-Chuan Chang, Huihsin Tseng, Dan Jurafsky, andChristopher D. Manning. 2009. Discriminativereordering with Chinese grammatical relations fea-tures. In Proceedings of the Third Workshop on Syn-tax and Structure in Statistical Translation (SSST-3)at NAACL HLT 2009.

Ryan McDonald, Joakim Nivre, Yvonne Quirmbach-Brundage, Yoav Goldberg, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Täckström, Claudia Bedini, Núria Bertomeu Castelló, and Jungmee Lee. 2013. Universal Dependency Annotation for Multilingual Parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 92–97.

First evaluation of labeled accuracyResults make typological sense

Page 37: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Too Many Standards?

Page 38: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Too Many Standards?

English SD de Marneffe et al. (2006)

Page 39: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Too Many Standards?

Google SD McDonald et al. (2013)

English SD de Marneffe et al. (2006)

Page 40: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Too Many Standards?

Google SD McDonald et al. (2013)

Hebrew SD Tsarfaty (2013)

Finnish SD Haverinen et al. (2013)

English SD de Marneffe et al. (2006)

Page 41: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Too Many Standards?

Google SD McDonald et al. (2013)

CLEAR Choi (2012)

HamleDT Zeman et al. (2012)

Hebrew SD Tsarfaty (2013)

Finnish SD Haverinen et al. (2013)

English SD de Marneffe et al. (2006)

Page 42: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Too Many Standards?

Google SD McDonald et al. (2013)

CLEAR Choi (2012)

HamleDT Zeman et al. (2012)

Hebrew SD Tsarfaty (2013)

Finnish SD Haverinen et al. (2013)

English SD de Marneffe et al. (2006)

Universal Dependencies

!

Marie-Catherine De Marneffe, Timothy Dozat, Natalia Silveira, Katri Haverinen, Filip Ginter, Joakim Nivre and Christopher D. Manning. 2014. Universal Stanford Dependencies: a Cross-Linguistic Typology. In Proceedings of the Ninth International Conference on Language Resources and Evaluation.

Page 43: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Universal Dependencieshttp://universaldependencies.github.io/docs/

In dem Restaurant isst Maria den Fisch .

ADP DET NOUN VERB PNOUN DET NOUN .CASE=DAT CASE=DAT TENSE=PRES CASE=NOM CASE=ACC CASE=ACCNUM=SIN NUM=SIN NUM=SIN NUM=SINGEN=NEU GEN=NEU GEN=MAS GEN=MAS

1.1 1.2 2 3 4 5 6 7Im Restaurant isst Maria den Fisch .

casedet advmod nsubj det

dobjpunc

1

In dem Restaurant isst Maria den Fisch .

ADP DET NOUN VERB PNOUN DET NOUN .CASE=DAT CASE=DAT TENSE=PRES CASE=NOM CASE=ACC CASE=ACCNUM=SIN NUM=SIN NUM=SIN NUM=SINGEN=NEU GEN=NEU GEN=MAS GEN=MAS

1.1 1.2 2 3 4 5 6 7Im Restaurant isst Maria den Fisch .

casedet advmod nsubj det

dobjpunc

1

In dem Restaurant isst Maria den Fisch .

ADP DET NOUN VERB PNOUN DET NOUN .CASE=DAT CASE=DAT TENSE=PRES CASE=NOM CASE=ACC CASE=ACCNUM=SIN NUM=SIN NUM=SIN NUM=SINGEN=NEU GEN=NEU GEN=MAS GEN=MAS

1.1 1.2 2 3 4 5 6 7Im Restaurant isst Maria den Fisch .

casedet advmod nsubj det

dobjpunc

1

In dem Restaurant isst Maria den Fisch .

ADP DET NOUN VERB PNOUN DET NOUN .CASE=DAT CASE=DAT TENSE=PRES CASE=NOM CASE=ACC CASE=ACCNUM=SIN NUM=SIN NUM=SIN NUM=SINGEN=NEU GEN=NEU GEN=MAS GEN=MAS

1.1 1.2 2 3 4 5 6 7Im Restaurant isst Maria den Fisch .

casedet advmod nsubj det

dobjpunc

1

Page 44: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Universal Dependencieshttp://universaldependencies.github.io/docs/

In dem Restaurant isst Maria den Fisch .

ADP DET NOUN VERB PNOUN DET NOUN .CASE=DAT CASE=DAT TENSE=PRES CASE=NOM CASE=ACC CASE=ACCNUM=SIN NUM=SIN NUM=SIN NUM=SINGEN=NEU GEN=NEU GEN=MAS GEN=MAS

1.1 1.2 2 3 4 5 6 7Im Restaurant isst Maria den Fisch .

casedet advmod nsubj det

dobjpunc

1

In dem Restaurant isst Maria den Fisch .

ADP DET NOUN VERB PNOUN DET NOUN .CASE=DAT CASE=DAT TENSE=PRES CASE=NOM CASE=ACC CASE=ACCNUM=SIN NUM=SIN NUM=SIN NUM=SINGEN=NEU GEN=NEU GEN=MAS GEN=MAS

1.1 1.2 2 3 4 5 6 7Im Restaurant isst Maria den Fisch .

casedet advmod nsubj det

dobjpunc

1

In dem Restaurant isst Maria den Fisch .

ADP DET NOUN VERB PNOUN DET NOUN .CASE=DAT CASE=DAT TENSE=PRES CASE=NOM CASE=ACC CASE=ACCNUM=SIN NUM=SIN NUM=SIN NUM=SINGEN=NEU GEN=NEU GEN=MAS GEN=MAS

1.1 1.2 2 3 4 5 6 7Im Restaurant isst Maria den Fisch .

casedet advmod nsubj det

dobjpunc

1

In dem Restaurant isst Maria den Fisch .

ADP DET NOUN VERB PNOUN DET NOUN .CASE=DAT CASE=DAT TENSE=PRES CASE=NOM CASE=ACC CASE=ACCNUM=SIN NUM=SIN NUM=SIN NUM=SINGEN=NEU GEN=NEU GEN=MAS GEN=MAS

1.1 1.2 2 3 4 5 6 7Im Restaurant isst Maria den Fisch .

casedet advmod nsubj det

dobjpunc

1

Stanford Universal Dependencies

Page 45: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Universal Dependencieshttp://universaldependencies.github.io/docs/

In dem Restaurant isst Maria den Fisch .

ADP DET NOUN VERB PNOUN DET NOUN .CASE=DAT CASE=DAT TENSE=PRES CASE=NOM CASE=ACC CASE=ACCNUM=SIN NUM=SIN NUM=SIN NUM=SINGEN=NEU GEN=NEU GEN=MAS GEN=MAS

1.1 1.2 2 3 4 5 6 7Im Restaurant isst Maria den Fisch .

casedet advmod nsubj det

dobjpunc

1

In dem Restaurant isst Maria den Fisch .

ADP DET NOUN VERB PNOUN DET NOUN .CASE=DAT CASE=DAT TENSE=PRES CASE=NOM CASE=ACC CASE=ACCNUM=SIN NUM=SIN NUM=SIN NUM=SINGEN=NEU GEN=NEU GEN=MAS GEN=MAS

1.1 1.2 2 3 4 5 6 7Im Restaurant isst Maria den Fisch .

casedet advmod nsubj det

dobjpunc

1

In dem Restaurant isst Maria den Fisch .

ADP DET NOUN VERB PNOUN DET NOUN .CASE=DAT CASE=DAT TENSE=PRES CASE=NOM CASE=ACC CASE=ACCNUM=SIN NUM=SIN NUM=SIN NUM=SINGEN=NEU GEN=NEU GEN=MAS GEN=MAS

1.1 1.2 2 3 4 5 6 7Im Restaurant isst Maria den Fisch .

casedet advmod nsubj det

dobjpunc

1

In dem Restaurant isst Maria den Fisch .

ADP DET NOUN VERB PNOUN DET NOUN .CASE=DAT CASE=DAT TENSE=PRES CASE=NOM CASE=ACC CASE=ACCNUM=SIN NUM=SIN NUM=SIN NUM=SINGEN=NEU GEN=NEU GEN=MAS GEN=MAS

1.1 1.2 2 3 4 5 6 7Im Restaurant isst Maria den Fisch .

casedet advmod nsubj det

dobjpunc

1

Google Universal Part-of-Speech Tags + Morphological Features

Page 46: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Universal Dependencieshttp://universaldependencies.github.io/docs/

In dem Restaurant isst Maria den Fisch .

ADP DET NOUN VERB PNOUN DET NOUN .CASE=DAT CASE=DAT TENSE=PRES CASE=NOM CASE=ACC CASE=ACCNUM=SIN NUM=SIN NUM=SIN NUM=SINGEN=NEU GEN=NEU GEN=MAS GEN=MAS

1.1 1.2 2 3 4 5 6 7Im Restaurant isst Maria den Fisch .

casedet advmod nsubj det

dobjpunc

1

In dem Restaurant isst Maria den Fisch .

ADP DET NOUN VERB PNOUN DET NOUN .CASE=DAT CASE=DAT TENSE=PRES CASE=NOM CASE=ACC CASE=ACCNUM=SIN NUM=SIN NUM=SIN NUM=SINGEN=NEU GEN=NEU GEN=MAS GEN=MAS

1.1 1.2 2 3 4 5 6 7Im Restaurant isst Maria den Fisch .

casedet advmod nsubj det

dobjpunc

1

In dem Restaurant isst Maria den Fisch .

ADP DET NOUN VERB PNOUN DET NOUN .CASE=DAT CASE=DAT TENSE=PRES CASE=NOM CASE=ACC CASE=ACCNUM=SIN NUM=SIN NUM=SIN NUM=SINGEN=NEU GEN=NEU GEN=MAS GEN=MAS

1.1 1.2 2 3 4 5 6 7Im Restaurant isst Maria den Fisch .

casedet advmod nsubj det

dobjpunc

1

In dem Restaurant isst Maria den Fisch .

ADP DET NOUN VERB PNOUN DET NOUN .CASE=DAT CASE=DAT TENSE=PRES CASE=NOM CASE=ACC CASE=ACCNUM=SIN NUM=SIN NUM=SIN NUM=SINGEN=NEU GEN=NEU GEN=MAS GEN=MAS

1.1 1.2 2 3 4 5 6 7Im Restaurant isst Maria den Fisch .

casedet advmod nsubj det

dobjpunc

1

Explicit Mapping from Tokens to Words

Page 47: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Guiding Principles

Page 48: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Guiding Principles

Maximize parallelism • Don’t annotate the same thing in different ways

• Don’t make different things look the same

Page 49: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Guiding Principles

Maximize parallelism • Don’t annotate the same thing in different ways

• Don’t make different things look the same

But don’t overdo it • Don’t annotate things that are not there

• Languages select from a universal pool of categories

• Allow language-specific extensions

Page 50: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

• Keeping content words as heads promotes parallelism

• Function words often correlate with morphology

The dog was chased by the cat .DET NOUN AUX VERB ADP DET NOUN .

detnsubjpass

auxpasscase

det

agentpunc

The dog was chased by the cat .DET NOUN AUX VERB ADP DET NOUN .

detnsubjpass

auxpasscase

det

agentpunc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

2

Dependency Structure

The dog was chased by the cat .DET NOUN AUX VERB ADP DET NOUN .

detnsubjpass

auxpasscase

det

agentpunc

The dog was chased by the cat .DET NOUN AUX VERB ADP DET NOUN .

detnsubjpass

auxpasscase

det

agentpunc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

2

Page 51: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

• Keeping content words as heads promotes parallelism

• Function words often correlate with morphology

The dog was chased by the cat .DET NOUN AUX VERB ADP DET NOUN .

detnsubjpass

auxpasscase

det

agentpunc

The dog was chased by the cat .DET NOUN AUX VERB ADP DET NOUN .

detnsubjpass

auxpasscase

det

agentpunc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

2

The dog was chased by the cat .DET NOUN AUX VERB ADP DET NOUN .

detnsubjpass

auxpasscase

det

agentpunc

The dog was chased by the cat .DET NOUN AUX VERB ADP DET NOUN .

detnsubjpass

auxpasscase

det

agentpunc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

2

Dependency Structure

Page 52: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

• Keeping content words as heads promotes parallelism

• Function words often correlate with morphology

The dog was chased by the cat .DET NOUN AUX VERB ADP DET NOUN .

detnsubjpass

auxpasscase

det

agentpunc

The dog was chased by the cat .DET NOUN AUX VERB ADP DET NOUN .

detnsubjpass

auxpasscase

det

agentpunc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

2

The dog was chased by the cat .DET NOUN AUX VERB ADP DET NOUN .

detnsubjpass

auxpasscase

det

agentpunc

The dog was chased by the cat .DET NOUN AUX VERB ADP DET NOUN .

detnsubjpass

auxpasscase

det

agentpunc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

Hunden jagades av katten .NOUN VERB ADP NOUN .DEF=+ VOICE=PAS DEF=+

nsubjpass caseagent

punc

2

Dependency Structure

Page 53: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Dependency Relations

• Taxonomy of 42 universal grammatical relations, broadly supported across many languages in language typology

• Language specific subtypes can be added

Page 54: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

• The lexicalist hypothesis • Grammatical relations hold between words (including clitics)

• Morphological categories are properties of words

• Morphological annotation • Revised Google Universal Part-of-Speech Tags (Petrov et al., 2012)

• Universal inventory of morphological features (under construction)

Morphology

ADJ ADP ADV AUX CONJ DET INTJ NOUN NUM PNOUN PRON PRT PUNCT SCONJ SYM VERB X

Page 55: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Tokens and Words

Principle of recoverability • Clitics and contractions are split to allow meaningful annotation

• Mapping from basic tokenization is explicitly represented

• Heuristic mapping of annotation to basic tokens is provided

Encoded in revised CoNLL format (CoNLL-U)

Vamos nos a el mar .VERB PRON ADP DET NOUN .

1.1 1.2 2.1 2.2 3 4Vamonos al mar .

explcase

det

nmodpunc

Vamonos al mar .VERB PRON ADP DET NOUN .

Vamos nos a el mar .

explcase

det

nmodpunc

3

Page 56: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Tokens and Words

Principle of recoverability • Clitics and contractions are split to allow meaningful annotation

• Mapping from basic tokenization is explicitly represented

• Heuristic mapping of annotation to basic tokens is provided

Encoded in revised CoNLL format (CoNLL-U)

Vamos nos a el mar .VERB PRON ADP DET NOUN .

1.1 1.2 2.1 2.2 3 4Vamonos al mar .

explcase

det

nmodpunc

Vamonos al mar .VERB PRON ADP DET NOUN .

Vamos nos a el mar .

explcase

det

nmodpunc

Vamos nos a el mar .VERB PRON ADP DET NOUN .

explcase

det

nmodpunc

3

Page 57: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Tokens and Words

Principle of recoverability • Clitics and contractions are split to allow meaningful annotation

• Mapping from basic tokenization is explicitly represented

• Heuristic mapping of annotation to basic tokens is provided

Encoded in revised CoNLL format (CoNLL-U)

Vamos nos a el mar .VERB PRON ADP DET NOUN .

1.1 1.2 2.1 2.2 3 4Vamonos al mar .

explcase

det

nmodpunc

Vamonos al mar .VERB PRON ADP DET NOUN .

Vamos nos a el mar .

explcase

det

nmodpunc

3

Page 58: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Tokens and Words

Principle of recoverability • Clitics and contractions are split to allow meaningful annotation

• Mapping from basic tokenization is explicitly represented

• Heuristic mapping of annotation to basic tokens is provided

Encoded in revised CoNLL format (CoNLL-U)

Vamos nos a el mar .VERB PRON ADP DET NOUN .

1.1 1.2 2.1 2.2 3 4Vamonos al mar .

explcase

det

nmodpunc

Vamonos al mar .VERB PRON ADP DET NOUN .

Vamos nos a el mar .

explcase

det

nmodpunc

3

Page 59: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Work in Progress

Current time plan: • Stable annotation guidelines by end of September

• First release of data sets before the end of 2014

Follow our progress and give feedback: • Universal Dependencies: http://universaldependencies.github.io/docs/

Check out old releases: • Uni-Dep-TB: https://code.google.com/p/uni-dep-tb/

Let’s work together!

Page 60: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Work in Progress

Current time plan: • Stable annotation guidelines by end of September

• First release of data sets before the end of 2014

Follow our progress and give feedback: • Universal Dependencies: http://universaldependencies.github.io/docs/

Check out old releases: • Uni-Dep-TB: https://code.google.com/p/uni-dep-tb/

Let’s work together!

Hey, wait a minute!

How is this going to improve

parsing?

Page 61: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Part 3

Parsing with Universal Dependencies

Confessions of a converted dependency parser

Page 62: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Taking Stock

Page 63: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Taking Stock

Motivation for universal dependencies

• Improve comparability of parsing results across languages

• Facilitate development of multilingual systems

• Enable typological studies of syntactic structure

Page 64: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Taking Stock

Motivation for universal dependencies

• Improve comparability of parsing results across languages

• Facilitate development of multilingual systems

• Enable typological studies of syntactic structure

What about parsing?

• Not likely to improve parsing accuracy with existing parsers

• Parsers tend to prefer function words as heads (Schwarz et al., 2012)

• We risk bringing English down to 80% instead of Finnish up to 90%

Page 65: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

What’s the Problem?

Page 66: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

• Dependency parsers know only one syntactic relation

What’s the Problem?

h d

Page 67: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

• Dependency parsers know only one syntactic relation

• They do not interpret dependency labels

What’s the Problem?

h d

?

Page 68: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

• Dependency parsers know only one syntactic relation

• They do not interpret dependency labels

• They represent a construction primarily by its head

What’s the Problem?

h d

?h

Page 69: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

A Simple Case

The dog was chased by the black cat.

Page 70: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

• All criteria point to cat being the head

The dog was chased by the .black cat

amod

A Simple Case

Page 71: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

• All criteria point to cat being the head

• Little (syntactic) information is lost by dropping black

black cat

amodThe dog was chased by the .

cat

A Simple Case

Page 72: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

The dog was chased by the black cat.

A Tricky Case

Page 73: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

The dog by the black cat.was chased

?

• Some criteria point to was, others to chased as the head

A Tricky Case

Page 74: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

was chased

aux

chased

chasedwas

main

was

The dog by the black cat.

• Some criteria point to was, others to chased as the head

• Neither word can represent the whole

A Tricky Case

Page 75: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

• Some criteria point to was, others to chased as the head

• Neither word can represent the whole

• The same problem arises with by (the black cat)

A Tricky Case

The dog was chased by the black cat.

Page 76: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

• Three fundamental syntactic relations (Tesnière, 1959)

• Tesnière-style dependency treebank (Sangati and Mazza, 2009)

Back to the Roots

cat

black

Dependency single head

was chased

Transfer split head

cats and mice

Coordination multiple heads

Page 77: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Universal Dependencies

Page 78: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Universal Dependencies

• The UD representation is a simple dependency tree

Fido was chased by Kitty and Tiger .PNOUN AUX VERB ADP PNOUN CONJ PNOUN .

nsubjpassauxpass case

agent conjcc

punc

Fido was chased by Kitty and Tiger .PNOUN AUX VERB ADP PNOUN CONJ PNOUN .

nsubjpassauxpass case

agent conjcc

punc

4

Page 79: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Universal Dependencies

• The UD representation is a simple dependency tree

• But labels are universal and can be interpreted

Fido was chased by Kitty and Tiger .PNOUN AUX VERB ADP PNOUN CONJ PNOUN .

nsubjpassauxpass case

agent conjcc

punc

Fido was chased by Kitty and Tiger .PNOUN AUX VERB ADP PNOUN CONJ PNOUN .

nsubjpassauxpass case

agent conjcc

punc

4

Page 80: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Universal Dependencies

• The UD representation is a simple dependency tree

• But labels are universal and can be interpreted

• Therefore, we can put more knowledge into the parser

was chased

Fido by Kitty and Tiger

Page 81: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Final Remarks

Page 82: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Final Remarks

Constituency parsing – largely driven by PTB • Perhaps too much emphasis on English (until recently)

• But deep analysis of categories and representations

Page 83: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Final Remarks

Constituency parsing – largely driven by PTB • Perhaps too much emphasis on English (until recently)

• But deep analysis of categories and representations

Dependency parsing – largely driven by CoNLL data • More attention to typological diversity

• But ignorance and blind faith with respect to representations

Page 84: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Final Remarks

Constituency parsing – largely driven by PTB • Perhaps too much emphasis on English (until recently)

• But deep analysis of categories and representations

Dependency parsing – largely driven by CoNLL data • More attention to typological diversity

• But ignorance and blind faith with respect to representations

Can UD give us the best of both worlds?

Page 85: Universal Dependency Parsing - Inriapauillac.inria.fr/~seddah/NivreSPMRL.pdf · This Study Parsing techniques:! • Transition-based model for joint morphological and syntactic analysis!

Thanks for Your Attention! Questions?

http://stp.lingfil.uu.se/~nivre/