Towards a Universal Grammar for Natural Language Processing

Joakim Nivre

Uppsala University
Department of Linguistics and Philology

Based on collaborative work with Filip Ginter, Yoav Goldberg, Jan Hajič, Chris Manning, Ryan McDonald, Natalia Silveira, Marie de Marneffe, Slav Petrov, Sampo Pyysalo, Reut Tsarfaty, Daniel Zeman and many others


Post on 03-Jun-2020



“In its substance, grammar is one and the same in all languages, even if it accidentally varies.”

Universal Grammar

• All human languages are species of a common genus

• Language structure is constrained by a universal cause

• There is order in the chaos of linguistic variation

Natural Language Processing

• Linguistic diversity makes our life harder
  Why 90% parsing accuracy for English but only 80% for Finnish?
  Can we even compare the numbers?

• Current NLP relies heavily on linguistic annotation:
  “In its substance, grammar is the same in all languages, even if the annotation accidentally varies.”

• We need to bring some order into the chaos

[Figure: the same sentence in three languages, each with its own dependency annotation. Relation labels as extracted: det, nsubj, dobj, cc, conj, and one arc marked "?".]

Language X: En katt jagar råttor och möss

Language Y: En kat jager rotter og mus

Language Z: A cat chases rats and mice

Which languages are most closely related?

[Figure: fractions shown for the three language pairs: 1/5, 2/5, 2/5]

Language X: Swedish

Language Y: Danish

Language Z: English

Why is this a problem?

• Hard to compare empirical results across languages

• Hard to evaluate cross-lingual learning

• Hard to build and maintain multilingual systems

• Hard to make progress towards a universal parser

Universal Dependencies
http://universaldependencies.org

Part-of-speech tags

Morphological features

Dependency relations

Form       POS    Features                         Relation
Toutefois  ADV    -                                advmod
,          PUNCT  -                                punct
les        DET    Definite=Def Number=Plur         det
filles     NOUN   Gender=Fem Number=Plur           nsubj
adorent    VERB   Number=Plur Person=3 Tense=Pres  root
les        DET    Definite=Def Number=Plur         det
desserts   NOUN   Gender=Masc Number=Plur          dobj
.          PUNCT  -                                punct

Universal Dependencies
http://universaldependencies.org

Stanford Dependencies

Google UD

Stanford UD

HamleDT

Interset

Google universal tags

Universal Dependencies

• Milestones:
  • Kick-off meeting at EACL in Gothenburg, April 2014
  • Release of annotation guidelines, Version 1, October 2014
  • Release of treebanks for 10 languages, January 2015
  • Release of treebanks for 18 languages, May 2015
  • Release of treebanks for 33 languages, November 2015

• Open community effort – anyone can contribute!

Goals and Requirements

• Cross-linguistically consistent grammatical annotation

• Support multilingual research and development in NLP

• Based on common usage and existing de facto standards

Design Principles

• Dependency
  • Widely used in practical NLP systems
  • Available in treebanks for many languages

• Lexicalism
  • Basic annotation units are words – syntactic words
  • Words have morphological properties
  • Words enter into syntactic relations

Golden Rules

Maximize parallelism
  • Don’t annotate the same thing in different ways
  • Don’t make different things look the same

But don’t overdo it
  • Don’t annotate things that are not there
  • Languages select from a universal pool of categories
  • Allow language-specific extensions

Morphology

• Lemma representing the semantic content of the word

• Part-of-speech tag representing the abstract lexical category associated with the word

• Features representing lexical and grammatical properties associated with the lemma or the particular word form

Form       Lemma      POS    Features
Toutefois  toutefois  ADV    -
,          ,          PUNCT  -
les        le         DET    Definite=Def Number=Plur
filles     fille      NOUN   Gender=Fem Number=Plur
adorent    adorer     VERB   Number=Plur Person=3 Tense=Pres
les        le         DET    Definite=Def Number=Plur
desserts   dessert    NOUN   Gender=Masc Number=Plur
.          .          PUNCT  -
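The three morphological layers can be pictured as one small record per word. The following is only a sketch of that idea in Python: the `Word` class and the `plural_words` helper are hypothetical names chosen for illustration, and the values are the ones shown for the French example on this slide (with "le" as the lemma of "les").

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    """One syntactic word with its lemma, part-of-speech tag and features."""
    form: str
    lemma: str
    upos: str
    feats: dict = field(default_factory=dict)


# "Toutefois , les filles adorent les desserts ." as annotated on the slide.
SENTENCE = [
    Word("Toutefois", "toutefois", "ADV"),
    Word(",", ",", "PUNCT"),
    Word("les", "le", "DET", {"Definite": "Def", "Number": "Plur"}),
    Word("filles", "fille", "NOUN", {"Gender": "Fem", "Number": "Plur"}),
    Word("adorent", "adorer", "VERB",
         {"Number": "Plur", "Person": "3", "Tense": "Pres"}),
    Word("les", "le", "DET", {"Definite": "Def", "Number": "Plur"}),
    Word("desserts", "dessert", "NOUN", {"Gender": "Masc", "Number": "Plur"}),
    Word(".", ".", "PUNCT"),
]


def plural_words(words):
    """Return the forms of all words that carry Number=Plur."""
    return [w.form for w in words if w.feats.get("Number") == "Plur"]


if __name__ == "__main__":
    for w in SENTENCE:
        print(w.form, w.lemma, w.upos, w.feats)
    print(plural_words(SENTENCE))
```

Keeping the features in a dictionary means a word with no features is simply an empty record, which matches the "don't annotate things that are not there" rule.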

Part-of-Speech Tags

• Taxonomy of 17 universal part-of-speech tags, based on the Google Universal Tagset (Petrov et al., 2012)

• All languages use the same inventory, but not all tags have to be used by all languages

Open    Closed  Other
ADJ     ADP     PUNCT
ADV     AUX     SYM
INTJ    CONJ    X
NOUN    DET
PROPN   NUM
VERB    PART
        PRON
        SCONJ
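A minimal Python sketch of the tag inventory on this slide, grouped into the same three classes. The helper names `tag_class` and `unused_tags` are hypothetical; the second one illustrates the point that a language, or a corpus sample, does not have to use every tag.

```python
# The 17 universal part-of-speech tags, grouped as on the slide.
UPOS = {
    "open": {"ADJ", "ADV", "INTJ", "NOUN", "PROPN", "VERB"},
    "closed": {"ADP", "AUX", "CONJ", "DET", "NUM", "PART", "PRON", "SCONJ"},
    "other": {"PUNCT", "SYM", "X"},
}
ALL_TAGS = set().union(*UPOS.values())


def tag_class(tag):
    """Return 'open', 'closed' or 'other' for a universal tag."""
    for name, tags in UPOS.items():
        if tag in tags:
            return name
    raise ValueError(f"not a universal POS tag: {tag!r}")


def unused_tags(tag_sequence):
    """Tags of the shared inventory that never occur in the given sample."""
    return ALL_TAGS - set(tag_sequence)


if __name__ == "__main__":
    french = ["ADV", "PUNCT", "DET", "NOUN", "VERB", "DET", "NOUN", "PUNCT"]
    print([tag_class(t) for t in french])
    print(sorted(unused_tags(french)))
```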

Features

• Standardized inventory of morphological features, based on the Interset system (Zeman, 2008)

• Languages select relevant features and can add language-specific features or values with documentation

Lexical   Inflectional Nominal  Inflectional Verbal
PronType  Gender                VerbForm
NumType   Animacy               Mood
Poss      Number                Tense
Reflex    Case                  Aspect
          Definite              Voice
          Degree                Person
                                Negative
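As a rough illustration of how such features might be handled in code, the sketch below parses a string of Name=Value pairs and reports any feature names outside the inventory listed on this slide. The "|" separator and the "_" placeholder for "no features" are assumptions borrowed from common treebank file conventions, not something the slide specifies; `parse_feats` and `language_specific` are hypothetical helper names.

```python
# Feature names from the slide, grouped by its three columns.
FEATURES = {
    "lexical": {"PronType", "NumType", "Poss", "Reflex"},
    "nominal": {"Gender", "Animacy", "Number", "Case", "Definite", "Degree"},
    "verbal": {"VerbForm", "Mood", "Tense", "Aspect", "Voice", "Person",
               "Negative"},
}
KNOWN = set().union(*FEATURES.values())


def parse_feats(text, sep="|"):
    """Parse 'Name=Value' pairs; an empty string or '_' means no features."""
    if text in ("", "_"):
        return {}
    feats = {}
    for pair in text.split(sep):
        name, _, value = pair.partition("=")
        if not name or not value:
            raise ValueError(f"malformed feature: {pair!r}")
        feats[name] = value
    return feats


def language_specific(feats):
    """Feature names that are not part of the shared inventory."""
    return sorted(set(feats) - KNOWN)


if __name__ == "__main__":
    verb = parse_feats("Number=Plur|Person=3|Tense=Pres")
    print(verb)
    print(language_specific(verb))
```

A non-empty result from `language_specific` is not an error in itself, since languages may add their own features; it only flags names that would need documentation.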

Syntax

• Content words are related by dependency relations

• Function words attach to the content word they modify

• Punctuation attaches to the head of the phrase or clause

Word      The  cat    could  have  chased  all  the  dogs  down  the  street  .
UPOS      DET  NOUN   AUX    AUX   VERB    DET  DET  NOUN  ADP   DET  NOUN    PUNCT
Relation  det  nsubj  aux    aux   root    det  det  dobj  case  det  nmod    punct
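The three attachment principles can be sketched on the example sentence as follows. The words, tags and relation labels are taken from the slide; the head indices are an assumption, reconstructed by applying the principles, since the arcs of the original figure are not recoverable here. The helper names are hypothetical.

```python
# One tuple per word: (index, word, upos, head, relation); head 0 is the root.
# Words, tags and relation labels are from the slide. The head indices are an
# assumption, filled in by applying the three principles above.
SENTENCE = [
    (1, "The", "DET", 2, "det"),
    (2, "cat", "NOUN", 5, "nsubj"),
    (3, "could", "AUX", 5, "aux"),
    (4, "have", "AUX", 5, "aux"),
    (5, "chased", "VERB", 0, "root"),
    (6, "all", "DET", 8, "det"),
    (7, "the", "DET", 8, "det"),
    (8, "dogs", "NOUN", 5, "dobj"),
    (9, "down", "ADP", 11, "case"),
    (10, "the", "DET", 11, "det"),
    (11, "street", "NOUN", 5, "nmod"),
    (12, ".", "PUNCT", 5, "punct"),
]

# Relations that attach function words in this sentence.
FUNCTION_RELATIONS = {"det", "aux", "case"}


def content_edges(sentence):
    """Return (head word, relation, dependent word) for content words only."""
    words = {index: word for index, word, _, _, _ in sentence}
    words[0] = "ROOT"
    return [
        (words[head], relation, word)
        for _, word, _, head, relation in sentence
        if relation not in FUNCTION_RELATIONS and relation != "punct"
    ]


def function_words_attach_to_content(sentence):
    """Check that every function word's head is itself a content word."""
    relation_of = {index: relation for index, _, _, _, relation in sentence}
    return all(
        relation_of[head] not in FUNCTION_RELATIONS | {"punct"}
        for _, _, _, head, relation in sentence
        if relation in FUNCTION_RELATIONS
    )


for edge in content_edges(SENTENCE):
    print(edge)
print(function_words_attach_to_content(SENTENCE))
```

Dropping the function-word and punctuation relations leaves only nsubj, root, dobj and nmod, which is the content-word skeleton of the sentence.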

Word      The  dog        was      chased  by    the  cat   .
UPOS      DET  NOUN       AUX      VERB    ADP   DET  NOUN  PUNCT
Relation  det  nsubjpass  auxpass  root    case  det  nmod  punct

Word      Hunden        jagades     av    katten        .
UPOS      NOUN          VERB        ADP   NOUN          PUNCT
Features  Definite=Def  Voice=Pass        Definite=Def
Relation  nsubjpass     root        case  nmod          punct
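The English and Swedish analyses above differ in function words and morphological marking, but appear to share the same relations between content words. A minimal sketch of that comparison, using only the word and relation labels shown on the slide (the function name `skeleton` is hypothetical):

```python
# (word, relation) pairs as shown on the slide.
ENGLISH = [
    ("The", "det"), ("dog", "nsubjpass"), ("was", "auxpass"),
    ("chased", "root"), ("by", "case"), ("the", "det"),
    ("cat", "nmod"), (".", "punct"),
]
SWEDISH = [
    ("Hunden", "nsubjpass"), ("jagades", "root"), ("av", "case"),
    ("katten", "nmod"), (".", "punct"),
]

# Relations that attach function words in these two sentences.
FUNCTION_RELATIONS = {"det", "auxpass", "case"}


def skeleton(sentence):
    """Relations that remain, in surface order, once function words are dropped."""
    return [relation for _, relation in sentence if relation not in FUNCTION_RELATIONS]


print(skeleton(ENGLISH))
print(skeleton(SWEDISH))
print(skeleton(ENGLISH) == skeleton(SWEDISH))
```

What English expresses with the determiner and the passive auxiliary, Swedish expresses with the features Definite=Def and Voice=Pass, so those differences do not show up in the relation skeleton.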

Dependency Relations

• Taxonomy of 40 universal grammatical relations, broadly attested in language typology (de Marneffe et al., 2014)

  • Language-specific subtypes may be added

• Organizing principles

  • Three types of structures: nominals, clauses, modifiers

  • Core arguments vs. other dependents (not complements vs. adjuncts)

Dependents of Clausal Predicates

          Nominal                          Clausal                         Other
Core      nsubj, nsubjpass, dobj, iobj     csubj, csubjpass, ccomp, xcomp
Non-Core  nmod, vocative, discourse, expl  advcl                           advmod, neg, aux, auxpass, cop, mark, punct

Word      Mary   was  quietly  reading  a    book  in    the  garden  .
UPOS      PROPN  AUX  ADV      VERB     DET  NOUN  ADP   DET  NOUN    PUNCT
Relation  nsubj  aux  advmod   root     det  dobj  case  det  nmod    punct

Word      If     you    are  sick   ,      you    should  not  exercise  .
UPOS      SCONJ  PRON   AUX  ADJ    PUNCT  PRON   AUX     ADV  VERB      PUNCT
Relation  mark   nsubj  cop  advcl  punct  nsubj  aux     neg  root      punct

Word      Peter  thought  that   he     should  stop   smoking  .
UPOS      PROPN  VERB     SCONJ  PRON   AUX     VERB   VERB     PUNCT
Relation  nsubj  root     mark   nsubj  aux     ccomp  xcomp    punct

Word      Cairo   ,      the  lovely  capital  of    Egypt
UPOS      PROPN   PUNCT  DET  ADJ     NOUN     ADP   PROPN
Relation  -       punct  det  amod    appos    case  nmod

Word      Helsinki  ,      the  lovely  capital  of    Finland
UPOS      PROPN     PUNCT  DET  ADJ     NOUN     ADP   PROPN
Relation  -         punct  det  amod    appos    case  nmod

Word      Huey    ,      Dewey  and   Louie
UPOS      PROPN   PUNCT  PROPN  CONJ  PROPN
Relation  -       punct  conj   cc    conj

Dependents of Nominals

Nominal               Clausal  Other
nummod, appos, nmod   acl      amod, det, case

Coordination

• Coordinate structures are headed by the first conjunct

  • Subsequent conjuncts depend on it via the conj relation

  • Conjunctions depend on it via the cc relation

  • Punctuation marks depend on it via the punct relation

Relations: conj, cc, (punct)
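The coordination analysis can be sketched as a small function that attaches everything in a coordinate structure to its first conjunct. This is an illustrative sketch only: it assumes every conjunct is a single word, and the function name `coordinate` is hypothetical.

```python
def coordinate(tokens):
    """Attach a flat coordination to its first conjunct.

    tokens: list of (word, upos) pairs, where the first token is the first
    conjunct. Returns (index, word, head, relation) tuples. The first
    conjunct gets head None because its own attachment lies outside the
    coordinate structure. Assumes single-word conjuncts.
    """
    analysis = []
    for index, (word, upos) in enumerate(tokens, start=1):
        if index == 1:
            analysis.append((index, word, None, None))
        elif upos == "PUNCT":
            analysis.append((index, word, 1, "punct"))
        elif upos == "CONJ":
            analysis.append((index, word, 1, "cc"))
        else:
            analysis.append((index, word, 1, "conj"))
    return analysis


example = [("Huey", "PROPN"), (",", "PUNCT"), ("Dewey", "PROPN"),
           ("and", "CONJ"), ("Louie", "PROPN")]
for row in coordinate(example):
    print(row)
```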

Multiword Expressions

• UD annotation does not permit “words with spaces”

  • Multiword expressions are analysed using special relations

  • The mwe, name and goeswith relations are always head-initial

  • The compound relation reflects the internal structure

Relation  Examples
mwe       in spite of, as well as, ad hoc
name      Roger Bacon, Carl XVI Gustaf, New York
compound  phone book, four thousand, dress up
goeswith  notwith standing, with out
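A minimal sketch of what "always head-initial" means for mwe, name and goeswith: every later word attaches directly to the first word. The function name `analyse_multiword` is hypothetical, and compound is deliberately rejected because its structure depends on the expression rather than on a fixed rule.

```python
# Relations that the slide describes as always head-initial.
HEAD_INITIAL = {"mwe", "name", "goeswith"}


def analyse_multiword(words, relation):
    """Return (index, word, head, relation) tuples for a head-initial expression.

    The first word is the head; its own attachment lies outside the
    expression, so its head and relation are left as None.
    """
    if relation not in HEAD_INITIAL:
        raise ValueError(f"{relation!r} is not a head-initial relation")
    first = [(1, words[0], None, None)]
    rest = [(index, word, 1, relation)
            for index, word in enumerate(words[1:], start=2)]
    return first + rest


for row in analyse_multiword(["in", "spite", "of"], "mwe"):
    print(row)
```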

Other Relations

Relation Explanation

parataxis Loosely linked clauses of same rank

list Lists without syntactic structure

remnant Orphans in ellipsis linked to parallel elements

reparandum Disfluency linked to (speech) repair

foreign Elements within opaque stretches of code switching

dep Unspecified dependency

root Syntactically independent element of clause/phrase

Language-Specific Relations

• Language-specific relations are subtypes of universal relations added to capture important phenomena

• Subtyping permits us to “back off” to universal relations

Relation Explanation

acl:relcl Relative clause

compound:prt Verb particle (dress up)

nmod:poss Genitive nominal (Mary’s book)

nmod:agent Agent in passive (saved by the bell)

cc:preconj Preconjunction (both … and)

det:predet Predeterminer (all those …)
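Because each subtype in the table is written as universal:subtype, backing off can be sketched as stripping everything after the colon. The function name `back_off` is hypothetical.

```python
def back_off(relation):
    """Return the universal relation for a possibly subtyped label."""
    return relation.split(":", 1)[0]


for label in ["acl:relcl", "compound:prt", "nmod:poss", "nsubj"]:
    print(label, "->", back_off(label))
```

A label without a subtype is returned unchanged, so the same function can be applied to every relation in a treebank.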

Where are we now?

• Universal Dependencies, Version 1

  • Guidelines released October 2014

  • Latest treebank release November 2015 (v1.2): Ancient Greek, Arabic, Basque, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Gothic, Greek, Hebrew, Hindi, Hungarian, Indonesian, Irish, Italian, Japanese, Latin, Norwegian, Old Church Slavonic, Persian, Polish, Portuguese, Romanian, Slovenian, Spanish, Swedish, Tamil

• Future plans:

  • New releases every six months (May, November)

  • Revision of guidelines as needed

• Have a look at http://universaldependencies.org

So what exactly is UD?

• A new linguistic theory?
  Not at all, but we like to think it is informed by linguistic theory and potentially useful also for linguistic studies

• A better parsing framework?
  Probably not, since parsers seem to prefer function words as heads so we may have to tweak the representations for parsing

• The ultimate annotation scheme?
  Not quite, more like a lingua franca for treebank developers and definitely useful for some annotation projects

• A universal grammar?
  Not in the Chomskyan sense, but hopefully in the more practical sense of facilitating multilingual NLP by bringing a little order into the chaos

Well, who knows?

Acknowledgments

• Core UD group: Filip Ginter, Yoav Goldberg, Jan Hajič, Chris Manning, Ryan McDonald, Natalia Silveira, Marie de Marneffe, Slav Petrov, Sampo Pyysalo, Reut Tsarfaty, Dan Zeman

• UD contributors: Željko Agić, Riyaz Ahmad, Maria Jesus Aranzabe, Masayuki Asahara, Aitziber Atutxa, Cristina Bosco, Giuseppe G. A. Celano, Jinho Choi, Çağrı Çöltekin, Kaja Dobrovoljc, Timothy Dozat, Binyam Ephrem, Tomaž Erjavec, Richárd Farkas, Jennifer Foster, Koldo Gojenola, Iakes Goenaga, Bruno Guillaume, Nizar Habash, Dag Haug, Anders Trærup Johannsen, Hiroshi Kanayama, Jenna Kanerva, Simon Krek, Juha Kuokkala, Veronika Laippala, Alessandro Lenci, Krister Lindén, Nikola Ljubešić, Olga Lyashevskaya, Teresa Lynn, Aibek Makazhanov, Catalina Maranduc, Héctor Martínez Alonso, Anna Missilä, Verginica Mititelu, Yusuke Miyao, Simonetta Montemagni, Shinsuke Mori, Hanna Nurmi, Petya Osenova, Lilja Øvrelid, Elena Pascual, Marco Passarotti, Jussi Piitulainen, Barbara Plank, Prokopis Prokopidis, Loganathan Ramasamy, Wolfgang Seeker, Mojgan Seraji, Maria Simi, Kiril Simov, Arne Skjærholt, Aaron Smith, Jan Štěpánek, Takaaki Tanaka, Francis Tyers, Sumire Uematsu, Veronika Vincze, Rob Voigt, Jonathan Washington