CS11-737: Multilingual Natural Language Processing
Yulia Tsvetkov
Morphological Analysis and Inflection
What is a word?
Bob’s handy man is a do-it-yourself kinda guy, isn’t he?
Morphology
The study of the formation and internal structure of words
Morpheme
Image from Lori Levin and David R. Mortensen’s draft book “Human Languages for Artificial Intelligence”
Words are made of morphemes
Bob’s handy man is a do-it-yourself kinda guy, isn’t he?
free morpheme
bound morphemes
Example by Austin Matthews
Morphological processes
● concatenation
● affixation = stem + affix
○ prefix
○ suffix
● non-concatenative affixation
○ infix
● compounding = stem + stem
● circumfixation = prefix + stem + suffix
Tagalog
● Tagalog
○ stem - bundok
○ singular - mabundok
○ plural - mabubundok
○ gloss - ‘mountainous’
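The bundok / mabundok / mabubundok forms can be reproduced with a small sketch of partial reduplication. It assumes, from this one example only, that the plural copies the stem's initial consonant(s) plus first vowel and places the copy between the prefix ma- and the stem. The function names `first_cv` and `ma_adjective` are hypothetical, and the rule is not claimed to cover Tagalog in general.

```python
VOWELS = set("aeiou")


def first_cv(stem: str) -> str:
    """Return the stem's initial consonant(s) plus its first vowel."""
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[: i + 1]
    return stem  # no vowel found: fall back to copying the whole stem


def ma_adjective(stem: str, plural: bool = False) -> str:
    """Prefix ma-; for the plural, insert a copy of the first CV before the stem.

    Assumption: this rule is inferred from the single example on the slide.
    """
    reduplicant = first_cv(stem) if plural else ""
    return "ma" + reduplicant + stem


print(ma_adjective("bundok"))               # mabundok
print(ma_adjective("bundok", plural=True))  # mabubundok
```

The copied material (bu) lands between the prefix and the stem, not at a word edge, which is why this process cannot be described as adding a single fixed affix.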
Example from Lori Levin and David R. Mortensen’s draft book “Human Languages for Artificial Intelligence”
Arabic, Chinese
● Arabic
○ root and pattern morphology
● Chinese
○ compound words
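Root and pattern morphology can be illustrated with a minimal sketch in which a consonantal root is interleaved with a vocalic template. The romanized root k-t-b, the templates, and the function name `interdigitate` are illustrative assumptions and are not taken from the slides.

```python
def interdigitate(root: str, pattern: str) -> str:
    """Fill each 'C' slot of the pattern with the next root consonant.

    Assumption: roots are written as hyphen-separated consonants and
    patterns use 'C' for consonant slots (a simplified romanization).
    """
    consonants = root.split("-")
    if pattern.count("C") != len(consonants):
        raise ValueError("pattern slots and root consonants do not match")
    out = []
    next_consonant = 0
    for ch in pattern:
        if ch == "C":
            out.append(consonants[next_consonant])
            next_consonant += 1
        else:
            out.append(ch)  # vowels and affixal material come from the pattern
    return "".join(out)


print(interdigitate("k-t-b", "CaCaCa"))  # kataba 'he wrote'
print(interdigitate("k-t-b", "CiCaaC"))  # kitaab 'book'
print(interdigitate("k-t-b", "maCCaC"))  # maktab 'office, desk'
```

The root consonants are never contiguous in the output, so splitting the surface string at a single boundary cannot recover the root.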
Morphological functions
● Derivational morphemes
○ bound morphemes used to create new words
○ if these affixes are attached to a new base, the resulting combination yields a word with a new meaning
○ the derived word often belongs to a different syntactic class
● Inflectional morphemes
○ bound morphemes used to mark grammatical distinctions
○ change the form but not the POS tag or the key meaning of the word
Interlinear glossed text (IGT)
● https://www.eva.mpg.de/lingua/resources/glossing-rules.php
Types of morphological categories and functions
1. Nouns
a. NUMBER: Singular, Dual, Plural
b. GENDER (natural & grammatical): Masculine, Feminine, Neuter (Animate, Vegetable); and AGREEMENT
c. DEFINITENESS: Definite, Indefinite
d. POSSESSION: 1st, 2nd, 3rd; Singular & Plural
e. NOUN CLASS (grammatical gender): Declension types I, II, III, etc.
f. CASE PARADIGM (DECLENSION)
2. Adjectives
a. RELATIONAL : QUALITATIVE : DEFECTIVE
b. DEGREE: Comparative and Superlative
3. Verbs
a. TRANSITIVITY: Transitive, Intransitive
b. ASPECT: Perfective, Imperfective
c. TENSE: Distant Past, Past, Present, Future, Distant Future
d. VOICE: Active, Passive
e. MOOD: Indicative, Imperative, Subjunctive
f. CONJUGATION CLASS: I, II, III Conjugations; and Conjugations: 1st, 2nd, 3rd Person, Sg, Pl Agreement
Morphological typology
● Isolating or Analytic
○ Vietnamese, Chinese, English
● Synthetic
○ Fusional or Flexional
■ German, Greek, Russian
■ Templatic: Hebrew and Arabic
○ Agglutinative or Agglutinating
■ Finnish, Turkish, Malayalam, Swahili
○ Polysynthetic
■ Inuit, Yupik
(Cettolo, Girardi, & Federico, 2012)
Type-token curves
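A type-token curve plots the number of distinct word forms (types) seen after reading each running word (token). The sketch below computes such a curve for a whitespace-tokenized toy sentence. The function name `type_token_curve`, the `step` parameter, and the toy sentence are assumptions for illustration, not the data behind the figure.

```python
def type_token_curve(tokens, step=1):
    """Return (tokens_read, types_seen) pairs, sampled every `step` tokens."""
    seen = set()
    curve = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if n % step == 0:
            curve.append((n, len(seen)))
    return curve


# Toy corpus (hypothetical); real curves are computed over large corpora.
tokens = "the cat saw the dog and the dog saw the cat".split()
curve = type_token_curve(tokens)
n_tokens, n_types = curve[-1]
print(curve[-1])                     # (11, 5)
print(round(n_types / n_tokens, 3))  # type-token ratio: 0.455
```

On comparable corpora, a morphologically rich language tends to produce a steeper curve, because each lemma surfaces in many distinct inflected forms.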
Why is rich morphology a challenge for NLP?
● High type-token ratio due to the large variety of grammatical features expressed with morphology
○ This leads to lexical sparsity and out-of-vocabulary words
● In language generation, long-range relations between words need to be enforced to model morphological agreement
○ This leads to agreement errors
● Morphological properties vary across languages and language families, and mapping of morphological features across languages is a challenge
○ This is exacerbated by the variability of morphological rules and irregularities (e.g. dance → danced → danced but eat → ate → eaten)
○ This leads to problems in transfer learning, translation errors, and biases in translation
Types of morphological processing
● Analysis
○ morphological parsing
○ morphological segmentation
● Generation
○ inflection generation
○ paradigm completion
● Acquisition of inflectional morphology
Morphological analysis
Morphological analysis with FSTs
Morphological analysis with RNNs
Canonical segmentation
a. surface segmentation: achievability → achiev+abil+ity
b. canonical segmentation: achievability → achieve+able+ity
1. Character bidirectional GRU encoder with attention
2. GRU decoder produces output characters
3. Neural reranker for segments to identify canonical segments
Evaluation of morphological analysis
● Error rate
● Edit distance
● Morpheme F1
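The three metrics can be sketched on the achievability example, treating the surface segmentation as a hypothetical system output and the canonical segmentation as gold. Two assumptions are made here and may differ from a given paper's definitions: error rate is the fraction of words whose full segmentation is not an exact match, and morpheme F1 compares predicted and gold morphemes as multisets.

```python
from collections import Counter


def error_rate(preds, golds):
    """Fraction of words whose predicted segmentation differs from gold."""
    wrong = sum(p != g for p, g in zip(preds, golds))
    return wrong / len(golds)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def morpheme_f1(pred: str, gold: str) -> float:
    """F1 over morphemes, compared as multisets (an assumption)."""
    p, g = pred.split("+"), gold.split("+")
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


pred = "achiev+abil+ity"    # hypothetical system output
gold = "achieve+able+ity"   # canonical segmentation
print(error_rate([pred], [gold]))         # 1.0
print(edit_distance(pred, gold))          # 3
print(round(morpheme_f1(pred, gold), 3))  # 0.333
```

The metrics differ in granularity: error rate gives no credit for a near miss, while edit distance and morpheme F1 give partial credit for the correctly recovered +ity.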
Morphological generation
1. Inflection generation
2. Paradigm completion
The SIGMORPHON shared tasks
● Cross-lingual transfer for morphological inflection
● Morphological analysis in context
● Morphological paradigm completion
Morphological inflection generation
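The input/output format of the inflection task (lemma plus feature bundle → inflected form) can be shown with a rule-plus-exceptions sketch. This is not the neural approach discussed in class; it is a hand-written baseline using the dance/eat example from earlier in these slides. The UniMorph-style tag strings, the `IRREGULAR` table, and the function name `inflect` are assumptions for illustration.

```python
# Irregular forms must be memorized; keys are (lemma, feature bundle).
IRREGULAR = {
    ("eat", "V;PST"): "ate",
    ("eat", "V;V.PTCP;PST"): "eaten",
}

PAST_LIKE = {"V;PST", "V;V.PTCP;PST"}  # past tense and past participle


def inflect(lemma: str, tags: str) -> str:
    """Map a lemma and a feature bundle to an inflected English form."""
    if (lemma, tags) in IRREGULAR:
        return IRREGULAR[(lemma, tags)]
    if tags in PAST_LIKE:
        # Regular rule: add -d after a final e, otherwise -ed.
        return lemma + ("d" if lemma.endswith("e") else "ed")
    raise ValueError(f"unsupported feature bundle: {tags}")


print(inflect("dance", "V;PST"))        # danced
print(inflect("eat", "V;PST"))          # ate
print(inflect("eat", "V;V.PTCP;PST"))   # eaten
```

A learned model has to induce both the regular rule and the exception table from (lemma, tags, form) triples, which is hard when few triples are available for a language.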
Paper for class discussion
● https://www.aclweb.org/anthology/D19-1091.pdf
● Read the paper
● Provide a critique of one part of the paper (e.g., an individual component of the proposed model architecture or a part of the experimental setup)
● Propose directions for follow-up work