Carnegie Mellon
Christian Monson
Evaluating an Agglutinative Segmentation Model for ParaMor
Christian Monson
Jaime Carbonell
Alon Lavie
Lori Levin
Carnegie Mellon University
Turkish Morphology – Beads on a String
One Turkish Word: götürülmüyorsun ("You are not being taken")
götür + ül + m + üyor + sun
take + passive + negative + present progressive + 2nd person singular
Computational Morphology Improves:
Machine Translation: Turkish-English (Oflazer, 2007); Czech-English (Goldwater and McClosky, 2005)
Speech Recognition: Finnish (Creutz, 2006)
Grapheme-to-Phoneme Conversion: German (Demberg, 2007)
Information Retrieval: English, German, Finnish (Kurimo et al., 2008)
Morphology is Complex
Operations: Suffix, Prefix, Reduplication, …
Purpose: Inflection vs. Derivation
Morphophonology
Ambiguity
Complexity Demands Time and Expertise
Expertise: Kemal Oflazer, expert on Turkish and on computational morphology
Time: 3-4 months to manually build a basic Turkish analyzer, plus lexicon development and maintenance
The Solution: Unsupervised Morphology Induction from Raw Text
Techniques for Unsupervised Morphology Induction
Transition Likelihood: Harris (1955) – Finite State Automata; Bernhard (2007)
Statistical or Minimum Description Length: Goldsmith (2001, 2006); Creutz's Morfessor (2006)
The Paradigm: Snover (2002); ParaMor (2004, 2007)
What is a Paradigm?
götür + ül + m + üyor + sun
take + passive + negative + present progressive + 2nd person singular
Paradigms Structure Inflectional Morphology
Person & Number: the final slot of götür + ül + m + üyor can hold um (1st person singular), Ø (3rd person singular), or uz
Paradigm: Mutually substitutable morphological operations
Voice: ül
Polarity: m
Tense & Aspect: üyor, yecek
Person & Number: um, Ø, uz
Paradigms: each of these sets holds mutually substitutable morphological operations
The ParaMor Algorithm
Paradigm: Mutually substitutable strings
Each candidate paradigm pairs candidate stems with its suffixes across 1 morpheme boundary
Simplifying Assumptions:
Suffixes only: 70% of the World's Languages are Suffixing (Dryer, 2005)
No morphophonology
Only a High-Level Overview
Identify Paradigms in 3 Steps
1. Search for candidate paradigms
2. Cluster candidates modeling the same paradigm
3. Filter
Segment Words Using the discovered paradigms
This Presentation
Example Search (Full Description in Monson et al., SIGMORPHON 2007)
Agglutinative Segmentation Model
This Paper
2 Filters Adapted from Harris (1955) and Goldsmith (2006)
Outline: ParaMor (Identify: Search, Cluster, Filter; Segment), Evaluation, Results
Search for Candidate Paradigms
Spanish Example
Word List (50,000 Types): autorizaciones, buscabamos, costas, importadoras, vallas, …
Propose a morpheme boundary at every character boundary in every word
Consolidate identical candidate suffixes into paradigm seeds
Paradigm seed: s (10697 stems)
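The two steps on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than ParaMor's implementation: the function name `candidate_suffix_seeds` and the seven-word toy list are assumptions made for the example, and the empty string stands in for Ø.

```python
from collections import defaultdict

def candidate_suffix_seeds(words):
    """Map each candidate suffix to the set of candidate stems it follows."""
    stems_by_suffix = defaultdict(set)
    for word in set(words):
        # Propose a boundary at every character boundary. The last split
        # leaves an empty suffix, which plays the role of Ø.
        for i in range(1, len(word) + 1):
            stem, suffix = word[:i], word[i:]
            # Identical candidate suffixes are consolidated under one key.
            stems_by_suffix[suffix].add(stem)
    return stems_by_suffix

# Toy word list (an assumption; the slides use 50,000 Spanish types).
words = ["costa", "costas", "costar", "valla", "vallas",
         "importadora", "importadoras"]
seeds = candidate_suffix_seeds(words)

# Rank the non-empty seeds by stem count, most frequent first.
ranked = sorted((s for s in seeds if s), key=lambda s: (-len(seeds[s]), s))
```

On this toy list the seed s collects the stems costa, valla, and importadora. The empty suffix trivially follows every word, so the sketch leaves it out of the ranking, in line with the slides, whose seeds begin with s.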
Identify the most frequent mutually replaceable candidate suffix
Stems that occur with one suffix in a paradigm will likely occur with other suffixes in that paradigm
s (10697) grows to Ø s (5513): costaØ costas, importadoraØ importadoras, vallaØ vallas, …
A parameter halts the introduction of suffixes when the most frequent mutually replaceable candidate suffix severely decreases the stem count
Adding r to Ø s (5513) gives Ø r s (281): costar costaØ costas
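The grow-and-halt behaviour described here can be sketched as a greedy loop. The slides do not state the halting rule precisely, so the ratio test below, the parameter name `min_keep`, its default value, and the toy stem sets are all assumptions for illustration. Only the counts quoted in the comments come from the slides.

```python
def grow_paradigm(seed, stems_by_suffix, min_keep=0.25):
    """Greedily add the suffix that keeps the most stems, and stop when
    the stem count would fall below min_keep times its current size."""
    suffixes = {seed}
    stems = set(stems_by_suffix[seed])
    while True:
        best, best_stems = None, set()
        for cand in sorted(stems_by_suffix):  # sorted so ties are repeatable
            if cand in suffixes:
                continue
            shared = stems & stems_by_suffix[cand]
            if len(shared) > len(best_stems):
                best, best_stems = cand, shared
        # Slide counts: s -> Ø s keeps 5513 of 10697 stems (about 0.52),
        # while Ø s -> Ø r s would keep only 281 of 5513 (about 0.05).
        if best is None or len(best_stems) < min_keep * len(stems):
            return suffixes, stems
        suffixes.add(best)
        stems = best_stems

# Toy data (an assumption): "" stands for Ø.
toy = {
    "s": {"costa", "valla", "importadora", "mesa"},
    "": {"costa", "valla", "importadora"},
    "r": {"costa"},
}
suffixes, stems = grow_paradigm("s", toy, min_keep=0.5)
```

With `min_keep=0.5` the toy search accepts Ø, which keeps 3 of 4 stems, and then rejects r, which would keep only 1 of 3. A ratio is one simple way to express "severely decreases"; other criteria would fit the slide's wording equally well.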
Move on to the next most frequent paradigm seed
Search paths from each paradigm seed, as candidate suffixes (stem count):
s (10697); Ø s (5513); Ø r s (281)
a (9020); a o (2325); a o os (1418); a as o os (899)
n (6039); Ø n (1863); Ø n r (512); Ø do n r (357); Ø da das do dos n ndo r ron (115)
es (2750); Ø es (845)
an (1784); a an (1045); a an ar (417); a an ar ó (355); a ada adas ado ados an ar aron ó (148)
...
rado (167); rada rado (89); rada rado rados (67); rada radas rado rados (53); ra rada radas rado rados ran rar raron ró (23)
strado (15); strada strado (12); strada strado stró (9); strada strado strar stró (8); strada stradas strado strar stró (7)
A Few of the 42 Final Paradigms
4 Suffixes: Ø menente mente s
11 Suffixes: a amente as illa illas o or ora oras ores os
41 Suffixes: a aba aban acion aciones ación ada adas ado ador adora adoras adores ados amos an ando ante antes ar ara aran aremos arla arlas arlo arlos arme aron arse ará arán aré aría arían ase e en ándose é ó
29 Suffixes: e edor edora edoras edores en er erlo erlos erse erá erán ería erían ida idas ido idos iendo iera ieran ieron imiento imientos iéndose ió í ía ían
20 Suffixes: ida idas ido idor idores idos imos ir iremos irle irlo irlos irse irá irán iré iría irían ía ían
29 Suffixes: ce cedores cemos cen cer cerlo cerlos cerse cerá cerán cería cida cidas cido cidos ciendo ciera cieran cieron cimiento cimientos cimos ció cí cía cían zca zcan zco
6 Suffixes: Ø es idad idades mente ísima
These paradigms cover Number on Nouns, Number & Gender on Adjectives, and Verbal Suffixes
Segment Words Using the discovered paradigms: Agglutinative Segmentation Model
Segment Words Using the Paradigms
administradas: ‘Feminine gender nouns under administration’
administr + ad + a + s
ad: Past Participle; a: Feminine; s: Plural
administrada is also in the corpus, so Ø and s are mutually replaceable on the stem administrada, which supports the boundary administrada + s
Recovers multiple morpheme boundaries from candidate paradigms which each propose single morpheme boundaries
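One way to read the administradas example is as a suffix-substitution test: cut the word before a suffix when swapping in another suffix from the same paradigm yields a word that is also in the corpus, then combine the cuts from all paradigms. The sketch below follows that reading. The function name `segment`, the substitution rule as coded, and the two-word vocabulary are assumptions for illustration; the paradigms are taken from the slides, with the 41-suffix paradigm shortened to an excerpt.

```python
def segment(word, paradigms, vocabulary):
    """Collect one boundary per supporting paradigm, then split the word."""
    cuts = set()
    for paradigm in paradigms:
        for suffix in paradigm:
            if not suffix or not word.endswith(suffix):
                continue
            stem = word[:-len(suffix)]
            if not stem:
                continue
            # Keep the boundary if another suffix of this paradigm forms
            # a corpus word on the same stem ("" stands for Ø).
            if any(stem + other in vocabulary
                   for other in paradigm if other != suffix):
                cuts.add(len(stem))
    pieces, start = [], 0
    for cut in sorted(cuts):
        pieces.append(word[start:cut])
        start = cut
    pieces.append(word[start:])
    return pieces

paradigms = [
    {"", "menente", "mente", "s"},
    {"a", "amente", "as", "illa", "illas", "o", "or", "ora", "oras",
     "ores", "os"},
    # Excerpt of the 41-suffix verbal paradigm.
    {"a", "ada", "adas", "ado", "ados", "an", "ar", "aron", "ó"},
]
vocabulary = {"administradas", "administrada"}
pieces = segment("administradas", paradigms, vocabulary)
# pieces == ["administr", "ad", "a", "s"]
```

Each paradigm contributes a single boundary (administrada + s, administrad + as, administr + adas), and taking their union gives the multi-boundary analysis shown on the slide.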
Baseline ParaMor: single morpheme boundary in each analysis of each word
![Page 57: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/57.jpg)
Morpho Challenge 2007
Today
Peer-operated competition for unsupervised morphology induction algorithms
4 languages:
English (384,904 types)
German (1,266,160 types)
Finnish (2,206,720 types)
Turkish (617,299 types)
2 methods of evaluation:
Linguistic: morpheme identification
Information retrieval
![Page 58: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/58.jpg)
Morpho Challenge 2007
ParaMor was developed on Spanish, with its parameters frozen
![Page 59: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/59.jpg)
Combine ParaMor and Morfessor
Morfessor: a freely available unsupervised morphology induction system (Creutz, 2006)
Combining ParaMor and Morfessor performs better than either system alone (Monson et al., 2007)
![Page 60: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/60.jpg)
Linguistic Evaluation
[Figure: bar chart of F1 (axis ticks at 20, 40, 60) for English, German, Finnish, and Turkish, built up one bar at a time over slides 60-70. Systems compared: Bernhard, Morfessor, ParaMor & Morfessor, and Agglutinative P & M. Agglutinative P & M: English 56.3, German 54.1, Finnish 48.5, Turkish 52.0. Remaining bar values: English 60.8, 50.7, 47.2; German 52.9, 53.4, 47.8; Finnish 48.2, 43.2, 40.6; Turkish 46.7, 38.5, 24.7.]
![Page 71: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/71.jpg)
ParaMor: State-of-the-Art Unsupervised Morphology Induction System
ParaMor identifies paradigms, the organizing structure of inflectional morphology
ParaMor segments words as the discovered paradigms suggest
Our agglutinative segmentation model significantly improves morpheme identification, particularly for agglutinative languages
![Page 72: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/72.jpg)
The Next Steps for ParaMor
Beyond Suffixes
English, German, Finnish, Turkish, and Spanish are all primarily suffixing
Straightforward extension to ParaMor for prefixes
More challenging: reduplication, infixation, etc.
![Page 73: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/73.jpg)
Beyond ParaMor
Improve Performance
Segmentation F1 of 50-60% is state of the art!
Morphophonology is the primary culprit
Simply splitting words cannot identify alternate forms of the same morpheme
![Page 76: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/76.jpg)
The ParaMor Algorithm
Identify paradigms in 3 steps:
1. Search for candidate paradigms
2. Cluster candidates modeling the same paradigm
3. Filter
Segment words using the discovered paradigms
![Page 77: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/77.jpg)
Cluster Paradigm Fragments
15 suffixes: a aba ada adas ado ados an ando ar aron arse ará arán aría ó
22 stems: anunci- aplic- apoy- celebr- concentr- …
15 suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán ó
23 stems: anunci- apoy- confirm- consider- declar- …
![Page 80: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/80.jpg)
Cluster Paradigm Fragments
15 suffixes: a aba ada adas ado ados an ando ar aron arse ará arán aría ó
22 stems: anunci- aplic- apoy- celebr- concentr- …
Boundary-annotated types: anunci+a aplic+a apoy+a … anunci+aba aplic+aba apoy+aba … anunci+aría …
15 suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán ó
23 stems: anunci- apoy- confirm- consider- declar- …
Boundary-annotated types: anunci+a apoy+a confirm+a … anunci+aba apoy+aba confirm+aba … anunci+ara …
Cosine similarity: 0.664
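A small sketch of how a cosine similarity between two candidate paradigms can be computed over their sets of boundary-annotated types. It assumes the usual set form of cosine similarity (shared types divided by the geometric mean of the two set sizes) and that each candidate's types are every stem paired with every suffix. The function names and the toy stems are illustrative, and the toy data does not reproduce the 0.664 on the slide.

```python
from math import sqrt


def set_cosine(x, y):
    """Cosine similarity of two sets viewed as binary vectors."""
    if not x or not y:
        return 0.0
    return len(x & y) / sqrt(len(x) * len(y))


def annotated_types(stems, suffixes):
    """Boundary-annotated types such as 'anunci+a' (assumed: full stem x suffix product)."""
    return {f"{stem}+{suffix}" for stem in stems for suffix in suffixes}


# Toy fragments of the two candidates shown on the slide.
first = annotated_types({"anunci", "aplic", "apoy"}, {"a", "aba", "aría"})
second = annotated_types({"anunci", "apoy", "confirm"}, {"a", "aba", "ara"})

# 4 shared types out of 9 and 9.
print(round(set_cosine(first, second), 3))
```

Because the comparison is over annotated types rather than bare suffixes, two candidates score as similar only when they share both stems and suffixes.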
![Page 81: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/81.jpg)
Cluster Paradigm Fragments
Merged cluster, 16 suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán aría ó
Cosine similarity: 0.664
![Page 82: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/82.jpg)
Cluster Paradigm Fragments
15 suffixes: a aba aban ada adas ado ados an ando ar aron arse ará arán ó
25 stems: anunci- aplic- apoy- celebr- consider- …
16 suffixes: a aba ada adas ado ados an ando ar ara aron arse ará arán aría ó (cosine similarity: 0.664)
![Page 83: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/83.jpg)
Cluster Paradigm Fragments
Merged cluster, 17 suffixes: a aba aban ada adas ado ados an ando ar ara aron arse ará arán aría ó
Cosine similarity: 0.715
![Page 84: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/84.jpg)
Cluster Paradigm Fragments
Continue clustering until any merger would place in the same cluster 2 suffixes which share no stem in the corpus
![Page 85: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/85.jpg)
Cluster Paradigm Fragments
11 suffixes: a e en ida idas ido idos iendo ieron ió ía
15 stems: cumpl- discut- emit- part- recib- reun- transmit- un- vend- viv- …
![Page 86: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/86.jpg)
Cluster Paradigm Fragments
In all, 23 initial candidate paradigms joined this cluster
![Page 88: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/88.jpg)
Cluster Overlapping Candidates
Greedy bottom-up agglomerative clustering
Merge the most similar candidate paradigms
Cosine similarity over sets of boundary-annotated supporting types: $\cos(X, Y) = \frac{|X \cap Y|}{\sqrt{|X|\,|Y|}}$
Halting condition: the corpus must contain paradigmatic evidence for each pair of suffixes in a cluster
Two suffixes may not be in the same cluster if they share no common candidate stem in the corpus
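The clustering loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not ParaMor's implementation: each candidate is a dict with a `suffixes` set and a `types` set, `stems_of` maps every suffix to the stems it attaches to in the corpus, and the names `cluster` and `may_merge` are hypothetical. Mergers that would violate the halting condition are skipped, and clustering stops when no admissible merger remains.

```python
from itertools import combinations
from math import sqrt


def cosine(x, y):
    """Set cosine similarity: shared items over the geometric mean of the sizes."""
    return len(x & y) / sqrt(len(x) * len(y)) if x and y else 0.0


def may_merge(c1, c2, stems_of):
    """True if every pair of suffixes in the merged cluster shares a corpus stem."""
    merged = c1["suffixes"] | c2["suffixes"]
    return all(stems_of[s] & stems_of[t] for s, t in combinations(merged, 2))


def cluster(candidates, stems_of):
    """Greedy bottom-up agglomerative clustering of candidate paradigms."""
    clusters = [dict(c) for c in candidates]
    while True:
        best = None  # (similarity, i, j) of the best admissible merger
        for i, j in combinations(range(len(clusters)), 2):
            if not may_merge(clusters[i], clusters[j], stems_of):
                continue
            sim = cosine(clusters[i]["types"], clusters[j]["types"])
            if best is None or sim > best[0]:
                best = (sim, i, j)
        if best is None:
            return clusters  # every remaining merger is blocked
        _, i, j = best
        merged = {
            "suffixes": clusters[i]["suffixes"] | clusters[j]["suffixes"],
            "types": clusters[i]["types"] | clusters[j]["types"],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]


def candidate(stems, suffixes):
    """Toy candidate: every stem paired with every suffix."""
    return {"suffixes": set(suffixes),
            "types": {f"{st}+{su}" for st in stems for su in suffixes}}


# Toy corpus evidence: which stems each suffix attaches to.
stems_of = {
    "a": {"anunci", "apoy", "viv"}, "aba": {"anunci", "apoy"},
    "aría": {"anunci"}, "ara": {"anunci", "apoy"},
    "ió": {"viv"}, "ía": {"viv"},
}
result = cluster(
    [candidate({"anunci"}, {"a", "aba", "aría"}),
     candidate({"anunci", "apoy"}, {"a", "aba", "ara"}),
     candidate({"viv"}, {"a", "ió", "ía"})],
    stems_of,
)
for c in result:
    print(sorted(c["suffixes"]))
```

In the toy data the two ar-style candidates merge, while the third stays separate because aba and ió share no stem.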
![Page 90: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/90.jpg)
A Few of the 42 Final Paradigm Clusters
4 Suffixes Ø menente mente s
11 Suffixes a amente as illa illas o or ora oras ores os
41 Suffixes a aba aban acion aciones ación ada adas ado ador adora adoras adores ados amos an ando ante antes ar ara aran aremos arla arlas arlo arlos arme aron arse ará arán aré aría arían ase e en ándose é ó
29 Suffixes e edor edora edoras edores en er erlo erlos erse erá erán ería erían ida idas ido idos iendo iera ieran ieron imiento imientos iéndose ió í ía ían
20 Suffixes ida idas ido idor idores idos imos ir iremos irle irlo irlos irse irá irán iré iría irían ía ían
29 Suffixes ce cedores cemos cen cer cerlo cerlos cerse cerá cerán cería cida cidas cido cidos ciendo ciera cieran cieron cimiento cimientos cimos ció cí cía cían zca zcan zco
6 Suffixes Ø es idad idades mente ísima
![Page 91: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/91.jpg)
Spanish Derivation and Clitics
4 Suffixes Ø menente mente s
11 Suffixes a amente as illa illas o or ora oras ores os
41 Suffixes a aba aban acion aciones ación ada adas ado ador adora adoras adores ados amos an ando ante antes ar ara aran aremos arla arlas arlo arlos arme aron arse ará arán aré aría arían ase e en ándose é ó
29 Suffixes e edor edora edoras edores en er erlo erlos erse erá erán ería erían ida idas ido idos iendo iera ieran ieron imiento imientos iéndose ió í ía ían
20 Suffixes ida idas ido idor idores idos imos ir iremos irle irlo irlos irse irá irán iré iría irían ía ían
29 Suffixes ce cedores cemos cen cer cerlo cerlos cerse cerá cerán cería cida cidas cido cidos ciendo ciera cieran cieron cimiento cimientos cimos ció cí cía cían zca zcan zco
6 Suffixes Ø es idad idades mente ísima
![Page 92: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/92.jpg)
Morphophonology in ParaMor
4 Suffixes Ø menente mente s
11 Suffixes a amente as illa illas o or ora oras ores os
41 Suffixes a aba aban acion aciones ación ada adas ado ador adora adoras adores ados amos an ando ante antes ar ara aran aremos arla arlas arlo arlos arme aron arse ará arán aré aría arían ase e en ándose é ó
29 Suffixes e edor edora edoras edores en er erlo erlos erse erá erán ería erían ida idas ido idos iendo iera ieran ieron imiento imientos iéndose ió í ía ían
20 Suffixes ida idas ido idor idores idos imos ir iremos irle irlo irlos irse irá irán iré iría irían ía ían
29 Suffixes ce cedores cemos cen cer cerlo cerlos cerse cerá cerán cería cida cidas cido cidos ciendo ciera cieran cieron cimiento cimientos cimos ció cí cía cían zca zcan zco
6 Suffixes Ø es idad idades mente ísima
![Page 94: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/94.jpg)
Morpheme to Feature Mapping
El árbolØ cayó (The tree fell)
((TENSE past) (LEXICAL-ASPECT activity) ...
(SUBJ ((NUM sg) (PERSON 3sg) ...)))
Los árboles cayeron (The trees fell)
((TENSE past) (LEXICAL-ASPECT activity) ...
(SUBJ ((NUM pl) (PERSON 3sg) ...)))
Both sentences share one tree: S → NP VP, NP → Det N, VP → V
Subject number is marked in 3 places:
1. on the N head, with Ø = sg, es = pl
2. on the dependent Det, with El = sg, Los = pl
3. on the governing V, with ó = sg, eron = pl
![Page 96: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/96.jpg)
IR Evaluation
Task-based evaluation: information retrieval
Data from CLEF (Cross Language Evaluation Forum)
Short two-sentence queries about international news topics
Binary relevance assessments
About 50 queries and 20K relevance judgments for each language
Okapi term weighting
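For readers unfamiliar with the metric, here is a short Python sketch of average precision for one query under binary relevance judgments. It uses the standard uninterpolated definition; the slides do not spell out the exact computation used in the challenge, so treat this as an assumption. The function name and document ids are made up for illustration.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """Uninterpolated average precision for one query with binary relevance."""
    if not relevant_ids:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    return total / len(relevant_ids)


# Hypothetical ranking: relevant documents appear at ranks 2 and 4.
ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(average_precision(ranking, relevant))
```

Averaging this value over all queries of a language gives a single score per system, which is the kind of number a per-language bar chart would report.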
![Page 97: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/97.jpg)
IR Evaluation
[Figure: bar chart of Average Precision (axis ticks at 25, 35, 45) for English, German, and Finnish, built up over slides 97-100. Systems compared: Bernhard, Morfessor, ParaMor, and ParaMor & Morfessor. Bar values: English 39.4, 39.6, 39.3, 37.2; German 47.3, 48.1, 46.0, 39.6; Finnish 49.2, 38.8, 37.9, 36.9. A "No Morphological Analysis" reference is marked at 31.2 on the English-only build and at 32.3 on the later builds.]
Turkish Was Not Evaluated for the IR Task
![Page 102: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/102.jpg)
Is the Beads-on-a-String Model Adequate?
![Page 103: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/103.jpg)
A Sample of 894 Languages (Dryer, 2005)
![Page 104: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/104.jpg)
86% Have Affixational Morphology
Affixing Languages
Little Affixation
![Page 105: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/105.jpg)
70% are Suffixing
Map legend: Significant Suffixation, Primarily Prefixation, Little Affixation
![Page 107: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/107.jpg)
Paradigms Do Not Describe Derivation
[Diagram: the stems inform and manage shown with the affixes mis, er, ation, and ment]
![Page 111: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/111.jpg)
Morphology is Complex – Fusion
götür (take) + ül (passive) + me (negative) + yecek (present) + sin (2nd person singular)
You are not taken
![Page 112: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/112.jpg)
Morphology is Complex – Fusion
götür (take) + ül (passive) + mez (negative-present) + sin (2nd person singular)
You are not taken
![Page 115: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/115.jpg)
Search for Candidate Paradigms
Size of Search Space
Huge: $2^{|\text{candidate suffixes}|}$
Most candidate suffixes have no common stems
Still exponential
Greedily searched space: $O(|\text{candidate suffixes}|)$
This example: 0.1% of the searched space
s (10697): autorizaciones, buscabamos, costas, importadoras, vallas, …
Ø s (5513): costaØ costas, importadoraØ importadoras, vallaØ vallas, …
![Page 117: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/117.jpg)
Some Candidates are Errors
1st Ø s
2nd a as o os
3rd Ø ba ban da das do dos n ndo r ron rse rá rán
5th a aba aban ada adas ado ados an ando ar aron arse ará arán ó
11th ta tamente tas to tos
12th Ø ba ción da das do dos n ndo r ron rá rán ría
13th a aba ada adas ado ados an ando ar aron ará arán e en ó
30th a e en ida idas ido idos iendo ieron ió ía
1000th Ø g gs
1566th ido idos ir iré
2000th lia liana
3000th Ø a anar
4000th Ø e ince
8000th trada trarnos
Stem-internal boundary hypothesis
Correct
Incorrect
![Page 119: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/119.jpg)
Enable Cross-Lingual Communication
7000 languages in the world
6.66 billion people
Half speak one of the 10 largest languages
Half don’t!
![Page 121: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/121.jpg)
Preliminary Linguistic Evaluation
Built 2 styles of answer keys: Inflectional Only, and Inflectional & Derivational
For 2 languages: English and German
Results reported as precision (P) and recall (R)
![Page 124: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/124.jpg)
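Preliminary Linguistic Evaluation

| System | Infl. Only: English P | Infl. Only: English R | Infl. Only: German P | Infl. Only: German R | Infl. & Deriv.: English P | Infl. & Deriv.: English R | Infl. & Deriv.: German P | Infl. & Deriv.: German R |
|---|---|---|---|---|---|---|---|---|
| ParaMor | 33.0 | 81.4 | 42.8 | 68.6 | 48.9 | 53.6 | 60.0 | 33.5 |
| Morfessor | 53.3 | 47.0 | 38.7 | 44.2 | 73.6 | 34.0 | 66.9 | 37.1 |

P = precision, R = recall. Infl. Only = Inflectional Only answer key; Infl. & Deriv. = Inflectional & Derivational answer key.

Morfessor
Freely available unsupervised morphology induction system
Statistical model of morphology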
![Page 126: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/126.jpg)
Where are the Statistics?
ParaMor has no explicit statistical model
Each candidate paradigm is a minimal description
MDL has close ties to Bayesian statistics
[Figure: search paths through the candidate paradigms; each candidate suffix set is followed by the count shown on the slide]
s (10697) → Ø s (5513) → Ø r s (281)
a (9020) → a o (2325) → a o os (1418) → a as o os (899)
n (6039) → Ø n (1863) → Ø n r (512) → Ø do n r (357) → Ø da das do dos n ndo r ron (115)
es (2750) → Ø es (845)
an (1784) → a an (1045) → a an ar (417) → a an ar ó (355) → a ada adas ado ados an ar aron ó (148)
...
rado (167) → rada rado (89) → rada rado rados (67) → rada radas rado rados (53) → ra rada radas rado rados ran rar raron ró (23)
strado (15) → strada strado (12) → strada strado stró (9) → strada strado strar stró (8) → strada stradas strado strar stró (7)
![Page 128: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/128.jpg)
1st Ø s
2nd a as o os
3rd Ø ba ban da das do dos n ndo r ron rse rá rán
5th a aba aban ada adas ado ados an ando ar aron arse ará arán ó
11th ta tamente tas to tos
12th Ø ba ción da das do dos n ndo r ron rá rán ría
13th a aba ada adas ado ados an ando ar aron ará arán e en ó
30th a e en ida idas ido idos iendo ieron ió ía
1000th Ø g gs
1566th ido idos ir iré
2000th lia liana
3000th Ø a anar
4000th Ø e ince
8000th trada trarnos
Some Model Fragments of Paradigms
15 suffixes from the ar verbal paradigm (which has more than 30)
![Page 129: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/129.jpg)
Some Model Fragments of Paradigms
Here are 15 more suffixes from the ar verbal paradigm
![Page 131: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/131.jpg)
The ParaMor Algorithm
Spanish Example
Raw Text
![Page 132: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/132.jpg)
The ParaMor Algorithm
Raw Text
ParaMor
Word List: 50,000 types
autorizaciones, buscabamos, costar, importadora, vallas, …
![Page 133: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/133.jpg)
The ParaMor Algorithm
Raw Text
ParaMor
autorizaciones, buscabamos, costar, importadora, vallas, …
A priori, each character boundary is a candidate morpheme boundary
Propose multiple analyses of each word
Each analysis contains exactly 1 morpheme boundary
v + allas, va + llas, val + las, vall + as, valla + s, vallas + Ø
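The single-boundary analyses listed on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `single_boundary_analyses` is hypothetical and not part of ParaMor, and the sketch assumes that the stem is never empty while the suffix may be the null suffix Ø.

```python
def single_boundary_analyses(word):
    """Return every (stem, suffix) split of `word` with exactly one boundary.

    Assumption for this sketch: the stem is never empty, and an empty
    suffix is written as the null suffix "Ø".
    """
    analyses = []
    for i in range(1, len(word) + 1):
        stem, suffix = word[:i], word[i:]
        analyses.append((stem, suffix if suffix else "Ø"))
    return analyses


# Reproduce the analyses of "vallas" shown on the slide.
for stem, suffix in single_boundary_analyses("vallas"):
    print(f"{stem} + {suffix}")
```

A word of n characters yields n analyses under this scheme, so the candidate space grows linearly with word length before any paradigm search takes place.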
![Page 134: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/134.jpg)
The ParaMor Algorithm
Paradigms
Paradigm: Mutually substitutable morphological operations
ül m umumØuz
üyor yecek
![Page 135: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/135.jpg)
The ParaMor Algorithm
Paradigms
Paradigm: Mutually substitutable strings
ül m umumØuz
üyor yecek
![Page 136: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/136.jpg)
The ParaMor Algorithm
Raw Text
ParaMor
autorizaciones, buscabamos, costar, importadora, vallas, …
v + allas, va + llas, val + las, vall + as, valla + s, vallas + Ø
Consolidate identical candidate suffixes into paradigm seeds
s (10697)
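One way to picture the consolidation step is as a table from each candidate suffix to the set of candidate stems it attaches to. The sketch below is an assumption about how such a table could be built, not ParaMor's actual code; the names `consolidate_suffixes` and `NULL` are hypothetical, and the five-word list stands in for the 50,000-type vocabulary.

```python
from collections import defaultdict

NULL = "Ø"  # label for the empty suffix, as on the slides


def consolidate_suffixes(words):
    """Group all single-boundary analyses by their candidate suffix.

    Returns a dict mapping each candidate suffix to the set of candidate
    stems that occur with it somewhere in the word list.
    """
    stems_by_suffix = defaultdict(set)
    for word in set(words):
        for i in range(1, len(word) + 1):
            stems_by_suffix[word[i:] or NULL].add(word[:i])
    return stems_by_suffix


words = ["autorizaciones", "buscabamos", "costas", "importadoras", "vallas"]
seeds = consolidate_suffixes(words)
print(len(seeds["s"]), sorted(seeds["s"]))
print(len(seeds["as"]), sorted(seeds["as"]))
```

In this toy list the candidate suffix s collects five stems and as collects three, which mirrors on a small scale how s ends up with 10697 candidate stems in the full Spanish vocabulary.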
![Page 137: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/137.jpg)
Search for Candidate Paradigms
Bottom-Up
Greedy
Begin search with the most frequent candidate suffix
s (10697): autorizaciones, buscabamos, costas, importadoras, vallas, …
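A bottom-up greedy search of this kind might look like the following sketch. The slides only state that the search is bottom-up, greedy, and starts from a frequent candidate suffix; everything else here is an assumption made for illustration: the growth rule (add the suffix that keeps the most stems), the stopping rule (`min_ratio`, a hypothetical parameter), and the function names.

```python
from collections import defaultdict

NULL = "Ø"  # label for the empty suffix


def stems_by_suffix(words):
    """Map each candidate suffix to the stems it occurs with."""
    table = defaultdict(set)
    for word in set(words):
        for i in range(1, len(word) + 1):
            table[word[i:] or NULL].add(word[:i])
    return table


def greedy_bottom_up(seed, table, min_ratio=0.25):
    """Grow a candidate paradigm from one seed suffix.

    Assumed rule: at each step add the suffix that shares the most stems
    with the current candidate; stop when no suffix shares any stem or
    when fewer than `min_ratio` of the current stems would survive.
    """
    suffixes = {seed}
    stems = set(table[seed])
    while True:
        best_suffix, best_stems = None, set()
        for suffix in sorted(table):  # sorted for deterministic ties
            if suffix in suffixes:
                continue
            shared = stems & table[suffix]
            if len(shared) > len(best_stems):
                best_suffix, best_stems = suffix, shared
        if best_suffix is None or len(best_stems) / len(stems) < min_ratio:
            return suffixes, stems
        suffixes.add(best_suffix)
        stems = best_stems


words = ["costa", "costas", "valla", "vallas", "importadora",
         "importadoras", "autorizaciones", "buscabamos"]
suffixes, stems = greedy_bottom_up("s", stems_by_suffix(words))
print(sorted(suffixes), sorted(stems))
```

On this toy list the search moves from s to Ø s and then stops, keeping only the stems that occur with both suffixes. That is the same shape as the step from s (10697) to Ø s (5513) in the Spanish example.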
![Page 139: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/139.jpg)
8240 Selected Candidate Paradigms
1st Ø s
2nd a as o os
3rd Ø ba ban da das do dos n ndo r ron rse rá rán
5th a aba aban ada adas ado ados an ando ar aron arse ará arán ó
11th ta tamente tas to tos
12th Ø ba ción da das do dos n ndo r ron rá rán ría
13th a aba ada adas ado ados an ando ar aron ará arán e en ó
30th a e en ida idas ido idos iendo ieron ió ía
1000th Ø g gs
1566th ido idos ir iré
2000th lia liana
3000th Ø a anar
4000th Ø e ince
8000th trada trarnos
![Page 140: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/140.jpg)
Some Candidates Model Paradigms
![Page 141: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/141.jpg)
Some Candidates are Errors
![Page 142: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/142.jpg)
The ParaMor Algorithm
Identify Paradigms in 3 Steps
1. Search for candidate paradigms
2. Cluster candidates modeling the same paradigm
3. Filter
2 Filters New to ParaMor
Segment Words Using the discovered paradigms
![Page 143: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/143.jpg)
1st Ø s
2nd a as o os
3rd Ø ba ban da das do dos n ndo r ron rse rá rán
5th a aba aban ada adas ado ados an ando ar aron arse ará arán ó
11th ta tamente tas to tos
12th Ø ba ción da das do dos n ndo r ron rá rán ría
13th a aba ada adas ado ados an ando ar aron ará arán e en ó
30th a e en ida idas ido idos iendo ieron ió ía
1000th Ø g gs
1566th ido idos ir iré
2000th lia liana
3000th Ø a anar
4000th Ø e ince
8000th trada trarnos
1. Spurious String Similarities
![Page 145: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/145.jpg)
1. Spurious String Similarities
3000th Ø a anar
From: allØ amØ gØ sØ, alla ama ga sa, allanar amanar ganar sanar
Supported by 8 Short Types
![Page 146: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/146.jpg)
1. Spurious String Similarities
Exclude Short Types from the Induction Vocabulary
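Excluding short types amounts to a length filter on the vocabulary before paradigm induction. The sketch below uses the twelve types behind the Ø a anar candidate; the cutoff `min_length=5` is a hypothetical value chosen only so that the 8 short types are dropped, since the slides do not state the threshold ParaMor uses.

```python
def exclude_short_types(vocabulary, min_length=5):
    """Keep only word types with at least `min_length` characters.

    `min_length` is an assumed parameter for this sketch, not a value
    taken from the slides.
    """
    return {word for word in vocabulary if len(word) >= min_length}


types = {"all", "am", "g", "s",                     # stem + Ø
         "alla", "ama", "ga", "sa",                 # stem + a
         "allanar", "amanar", "ganar", "sanar"}     # stem + anar

kept = exclude_short_types(types)
print(sorted(kept))
```

With the short types gone, no stem in this set still occurs with Ø or a, so the spurious Ø a anar candidate loses its support.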
![Page 147: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/147.jpg)
1st Ø s
2nd a as o os
3rd Ø ba ban da das do dos n ndo r ron rse rá rán
5th a aba aban ada adas ado ados an ando ar aron arse ará arán ó
11th ta tamente tas to tos
12th Ø ba ción da das do dos n ndo r ron rá rán ría
13th a aba ada adas ado ados an ando ar aron ará arán e en ó
30th a e en ida idas ido idos iendo ieron ió ía
1000th Ø g gs
1566th ido idos ir iré
2000th lia liana
3000th Ø a anar
4000th Ø e ince
8000th trada trarnos
2. Suffix-Internal Boundary Hypotheses
![Page 148: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/148.jpg)
2. Suffix-Internal Boundary Hypotheses
Incorrect: 3rd Ø ba ban da das do dos n ndo r ron rse rá rán
Correct: 5th a aba aban ada adas ado ados an ando ar aron arse ará arán ó
![Page 150: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/150.jpg)
2. Suffix-Internal Boundary Hypotheses
Adapt the Transition Likelihood Approach
Similar to Goldsmith (2006)
![Page 154: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/154.jpg)
2. Suffix-Internal Boundary Hypotheses

From The Candidate Stems
acompañ- anunci- aplic- apoy- celebr- consider- desarroll- desplaz- disput- elev- enfrent- form- hall- integr- lanz- llam- lleg- llev- ocup- pas- present- realiz- registr- tom-
Entropy: 3.49

From The Candidate Stems
acompaña- anuncia- aplica- apoya- celebra- considera- controla- desarrolla- desplaza- disputa- eleva- enfrenta- forma- halla- integra- lanza- llama- llega- lleva- ocupa- pasa- presenta- realiza- registra- toma-
Entropy: 0.00
Removed

ParaMor discards candidates whose entropy falls below a threshold parameter
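The two entropy values on this slide can be reproduced with a short calculation, under the assumption that the entropy is taken (in bits) over the distribution of stem-final characters. The slide does not spell this out, so treat the sketch as one reading that happens to match the numbers; the function name and the `threshold` value are hypothetical.

```python
import math
from collections import Counter


def stem_final_entropy(stems):
    """Entropy in bits of the distribution of each stem's last character."""
    counts = Counter(stem[-1] for stem in stems)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


stems_without_a = ("acompañ anunci aplic apoy celebr consider desarroll desplaz "
                   "disput elev enfrent form hall integr lanz llam lleg llev "
                   "ocup pas present realiz registr tom").split()
stems_with_a = ("acompaña anuncia aplica apoya celebra considera controla "
                "desarrolla desplaza disputa eleva enfrenta forma halla integra "
                "lanza llama llega lleva ocupa pasa presenta realiza registra "
                "toma").split()

threshold = 0.5  # hypothetical value; the slides only mention "a threshold parameter"
for name, stems in [("without a", stems_without_a), ("with a", stems_with_a)]:
    entropy = stem_final_entropy(stems)
    verdict = "kept" if entropy >= threshold else "removed"
    print(f"{name}: entropy = {entropy:.2f} ({verdict})")
```

When every candidate stem ends in the same character, as with the stems ending in a, the next character is fully predictable and the entropy is zero. That is the signal that the proposed boundary sits inside a suffix rather than at its edge.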
![Page 158: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/158.jpg)
Morphology is Complex – Operations
Prefixation
Reduplication
Infixation
Suffixation
![Page 164: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/164.jpg)
Morphology is Complex – Purpose
Inflection
götür  ül       m         üyor                 sun
take   passive  negative  present progressive  2nd person singular
Derivation
mis       inform  ation
negative          abstract noun
![Page 165: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/165.jpg)
Morphology is Complex – Morphophonology
götür  ül       m         üyor                 sun
take   passive  negative  present progressive  2nd person singular
You are not being taken
![Page 166: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/166.jpg)
Morphology is Complex – Morphophonology
götür  ül       m         yecek   sun
take   passive  negative  future  2nd person singular
You will not be taken
![Page 168: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/168.jpg)
Morphology is Complex – Morphophonology
götür  ül       me        yecek   sun
take   passive  negative  future  2nd person singular
You will not be taken
![Page 169: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/169.jpg)
Morphology is Complex – Morphophonology
götür  ül       me        yecek   sin
take   passive  negative  future  2nd person singular
You will not be taken
![Page 172: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/172.jpg)
Morphology is Complex – Ambiguity
Hungarian: mentek
men + tek: go + Present.2nd.Plural, ‘yinz go’
men + t + ek: go + PastParticiple + Plural, ‘those who have gone’
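The ambiguity can be made concrete by enumerating every way a word decomposes into a known stem followed by known suffixes. The toy lexicon below (one stem, three suffixes) is a hypothetical illustration built from the two analyses on the slide; it is not a Hungarian morphological analyzer, and the function name is an assumption.

```python
def segmentations(word, stems, suffixes):
    """List every split of `word` into one stem plus one or more suffixes.

    Assumes all suffixes are non-empty strings, so the recursion always
    consumes at least one character per step.
    """
    results = []

    def extend(rest, parts):
        if not rest:
            results.append(parts)
            return
        for suffix in sorted(suffixes):
            if rest.startswith(suffix):
                extend(rest[len(suffix):], parts + [suffix])

    for stem in sorted(stems):
        if word.startswith(stem) and len(word) > len(stem):
            extend(word[len(stem):], [stem])
    return results


for parts in segmentations("mentek", {"men"}, {"t", "ek", "tek"}):
    print(" + ".join(parts))
```

Both analyses are consistent with the surface string, so a segmenter needs information beyond the lexicon, such as context, to choose between them.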
![Page 177: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/177.jpg)
Paradigms Do Not Describe Derivation
inform ation mis er
ement
Paradigm Based Implies: Strong at inflectional morphology
![Page 179: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/179.jpg)
The Next Steps for ParaMor
Scaling Paradigm Induction
Currently 50,000 types
Up to larger vocabularies
Down for languages with few resources
Parameter settings need tuning
Scaling Down Segmentation
Currently 300,000 to 2.2 million types
The larger the vocabulary, the more likely a particular stem will occur in more than one surface form
![Page 180: Evaluating an Agglutinative Segmentation Model for ParaMor](https://reader035.vdocuments.site/reader035/viewer/2022070408/568143ee550346895db072ba/html5/thumbnails/180.jpg)