cmsc 723: computational linguistics i session #3 finite...
![Page 1: CMSC 723: Computational Linguistics I Session #3 Finite ...lintool.github.io/UMD-courses/CMSC723-2009-Fall/session3-slides.pdf · Finite-State Morphology CMSC 723: Computational Linguistics](https://reader030.vdocuments.site/reader030/viewer/2022041221/5e0a48362c752d4ed5469f98/html5/thumbnails/1.jpg)
Finite-State Morphology
CMSC 723: Computational Linguistics I, Session #3
Jimmy Lin
The iSchool, University of Maryland
Wednesday, September 16, 2009

Today’s Agenda
Computational tools
  Regular expressions
  Finite-state automata (deterministic vs. non-deterministic)
  Finite-state transducers
Overview of morphological processes
Computational morphology with finite-state methods

Regular Expressions
A metalanguage for specifying simple classes of strings
Very useful in searching and matching text strings
Everyone does it!
  Implementations in the shell, Perl, Java, Python, …

Regular Expressions
Basic regular expressions
  /happy/ → happy
  /[abcd]/ → a, b, c, d
  /[a-d]/ → a, b, c, d
  /[^a-d]/ → e, f, g, … z
  /[Tt]he/ → the, The
  /(dog|cat)/ → dog, cat
Special metacharacters
  /colou?r/ → color, colour
  /oo*h!/ → oh!, ooh!, oooh!, …
  /oo+h!/ → ooh!, oooh!, ooooh!, …
  /beg.n/ → began, begin, begun, begbn, …
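The patterns above can be checked directly with Python's `re` module. This is a minimal sketch: the helper name `fully_matches` and the choice of non-matching strings are illustrative assumptions, not part of the slides. Note that `[^a-d]` matches any single character outside a to d, not only the letters e to z.

```python
import re

# Each slide pattern paired with strings it should and should not match in full.
# The rejected strings are illustrative choices, not taken from the slides.
EXAMPLES = {
    r"colou?r": (["color", "colour"], ["colouur"]),
    r"oo*h!": (["oh!", "ooh!", "oooh!"], ["h!"]),
    r"oo+h!": (["ooh!", "oooh!"], ["oh!"]),
    r"beg.n": (["began", "begin", "begun", "begbn"], ["begn"]),
    r"[Tt]he": (["the", "The"], ["THE"]),
    r"[^a-d]": (["e", "z"], ["a", "d"]),
}


def fully_matches(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None


for pattern, (accepted, rejected) in EXAMPLES.items():
    assert all(fully_matches(pattern, s) for s in accepted)
    assert not any(fully_matches(pattern, s) for s in rejected)
```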

NLP* with Regular Expressions
Transcript with Eliza, simulation of a Rogerian psychotherapist (Weizenbaum, 1966)
User: Men are all alike
ELIZA: IN WHAT WAY
User: They’re always bugging us about something or other
ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE?
User: Well, my boyfriend made me come here
ELIZA: YOUR BOYFRIEND MADE YOU COME HERE
User: He says I’m depressed much of the time
ELIZA: I AM SORRY TO HEAR THAT YOU ARE DEPRESSED

How did it work?
.* all .* → IN WHAT WAY
.* always .* → CAN YOU THINK OF A SPECIFIC EXAMPLE
.* I’m (depressed|sad) .* → I AM SORRY TO HEAR YOU ARE \1
.* I’m (depressed|sad) .* → WHY DO YOU THINK YOU ARE \1?
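The substitution rules above can be sketched as an ordered list of pattern and response pairs. This is a toy illustration, not Weizenbaum's program: the function name `respond`, the fallback reply `PLEASE GO ON`, and the choice to keep only the first of the two "depressed|sad" responses are all assumptions made for the example.

```python
import re

# (pattern, response template) pairs from the slide, tried in order.
# The character class ['’] accepts both straight and curly apostrophes.
RULES = [
    (re.compile(r".* all .*"), "IN WHAT WAY"),
    (re.compile(r".* always .*"), "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
    (re.compile(r".* I['’]m (depressed|sad) .*"), r"I AM SORRY TO HEAR YOU ARE \1"),
]

# Hypothetical default reply when no rule matches (not from the slides).
FALLBACK = "PLEASE GO ON"


def respond(utterance):
    """Return the response of the first rule whose pattern matches the utterance."""
    for pattern, template in RULES:
        match = pattern.fullmatch(utterance)
        if match:
            # expand() substitutes \1 with the captured word.
            return match.expand(template).upper()
    return FALLBACK


print(respond("He says I'm depressed much of the time"))
```

The rules rely only on surface patterns; nothing in the program represents what the user means.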

Aside…
What is intelligence?
What does Eliza tell us about intelligence?

Equivalence Relations
We can say the following
  Regular expressions describe a regular language
  Regular expressions can be implemented by finite-state automata
  Regular languages can be generated by regular grammars
So what?
[Figure: diagram relating regular languages to regular expressions, finite-state automata, and regular grammars]

Sheeptalk!
Language: baa!, baaa!, baaaa!, baaaaa!, ...
Regular Expression: /baa+!/
Finite-State Automaton:
[Figure: q0 -b-> q1 -a-> q2 -a-> q3 -!-> q4, with a self-loop on q3 labeled a]

Finite-State Automata
What are they?
What do they do?
How do they work?

FSA: What are they?
Q: a finite set of N states
  Q = {q0, q1, q2, q3, q4}
  The start state: q0
  The set of final states: F = {q4}
Σ: a finite input alphabet of symbols
  Σ = {a, b, !}
δ(q,i): transition function
  Given state q and input symbol i, return new state q'
  δ(q3,!) → q4
[Figure: q0 -b-> q1 -a-> q2 -a-> q3 -!-> q4, with a self-loop on q3 labeled a]

FSA: State Transition Table
Rows are states; columns are input symbols.

| State | b | a | ! |
|---|---|---|---|
| 0 | 1 | ∅ | ∅ |
| 1 | ∅ | 2 | ∅ |
| 2 | ∅ | 3 | ∅ |
| 3 | ∅ | 3 | 4 |
| 4 | ∅ | ∅ | ∅ |

[Figure: q0 -b-> q1 -a-> q2 -a-> q3 -!-> q4, with a self-loop on q3 labeled a]

FSA: What do they do?
Given a string, an FSA either rejects or accepts it
  ba! → reject
  baa! → accept
  baaaz! → reject
  baaaa! → accept
  baaaaaa! → accept
  baa → reject
  moooo → reject
What does this have to do with NLP?
  Think grammaticality!

FSA: How do they work?
Input: b a a a !
States visited: q0 → q1 → q2 → q3 → q3 → q4
Result: ACCEPT
[Figure: q0 -b-> q1 -a-> q2 -a-> q3 -!-> q4, with a self-loop on q3 labeled a]

FSA: How do they work?
Input: b a ! ! !
States visited: q0 → q1 → q2, then no transition on ! from q2
Result: REJECT
[Figure: q0 -b-> q1 -a-> q2 -a-> q3 -!-> q4, with a self-loop on q3 labeled a]

D-RECOGNIZE
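The pseudocode for this slide is not present in the transcript. A table-driven deterministic recognizer for the sheeptalk automaton might look like the following sketch; the names `TRANSITIONS`, `START_STATE`, `FINAL_STATES`, and `d_recognize` are assumptions for illustration, and the table mirrors the state transition table given earlier.

```python
# Transition table for the sheeptalk automaton /baa+!/.
# A missing (state, symbol) entry plays the role of ∅ in the table.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
START_STATE = 0
FINAL_STATES = {4}


def d_recognize(tape):
    """Return True if the automaton accepts the string `tape`."""
    state = START_STATE
    for symbol in tape:
        next_state = TRANSITIONS.get((state, symbol))
        if next_state is None:
            # No transition defined: reject immediately.
            return False
        state = next_state
    # Accept only if the input ends in a final state.
    return state in FINAL_STATES


for s in ["ba!", "baa!", "baaaz!", "baaaa!", "baa", "moooo"]:
    print(s, "accept" if d_recognize(s) else "reject")
```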

Accept or Generate?
Formal languages are sets of strings
  Strings composed of symbols drawn from a finite alphabet
Finite-state automata define formal languages
  Without having to enumerate all the strings in the language
Two views of FSAs:
  Acceptors that can tell you if a string is in the language
  Generators to produce all and only the strings in the language
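To illustrate the generator view, the sheeptalk automaton can be walked breadth-first to enumerate its strings. Because the language is infinite, this sketch stops at a length bound; the names `ARCS`, `FINAL_STATES`, and `generate`, and the bound itself, are assumptions for the example.

```python
from collections import deque

# Sheeptalk automaton as adjacency lists: state -> [(symbol, next state), ...].
ARCS = {
    0: [("b", 1)],
    1: [("a", 2)],
    2: [("a", 3)],
    3: [("a", 3), ("!", 4)],
    4: [],
}
FINAL_STATES = {4}


def generate(max_length):
    """Yield every accepted string with at most `max_length` symbols."""
    queue = deque([(0, "")])  # (state, string produced so far)
    while queue:
        state, prefix = queue.popleft()
        if state in FINAL_STATES:
            yield prefix
        if len(prefix) < max_length:
            for symbol, target in ARCS[state]:
                queue.append((target, prefix + symbol))


print(list(generate(6)))
```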

Simple NLP with FSAs

Introducing Non-Determinism
Deterministic vs. Non-deterministic FSAs
Epsilon (ε) transitions

Using NFSAs to Accept Strings
What does it mean?
  Accept: there exists at least one path (need not be all paths)
  Reject: no paths exist
General approaches:
  Backup: add markers at choice points, then possibly revisit unexplored arcs at marked choice points
  Look-ahead: look ahead in input to provide clues
  Parallelism: look at alternatives in parallel
Recognition with NFSAs as search through state space
  Agenda holds (state, tape position) pairs
ND-RECOGNIZE

State Orderings
Stack (LIFO): depth-first
Queue (FIFO): breadth-first
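A sketch of agenda-based recognition follows, with the agenda treated as a stack or as a queue. The automaton used here is an assumed non-deterministic variant of sheeptalk (two arcs labeled a leaving state 2) without ε-transitions; `NFSA`, `FINAL_STATES`, and `nd_recognize` are illustrative names, not the slide's pseudocode.

```python
from collections import deque

# Assumed non-deterministic sheeptalk automaton: state 2 has two "a" arcs.
NFSA = {
    (0, "b"): [1],
    (1, "a"): [2],
    (2, "a"): [2, 3],
    (3, "!"): [4],
}
FINAL_STATES = {4}


def nd_recognize(tape, depth_first=True):
    """Search the space of (state, tape position) pairs for an accepting path."""
    agenda = deque([(0, 0)])
    while agenda:
        # Stack (LIFO) gives depth-first search; queue (FIFO) gives breadth-first.
        state, position = agenda.pop() if depth_first else agenda.popleft()
        if position == len(tape):
            if state in FINAL_STATES:
                return True  # one accepting path is enough
            continue
        for target in NFSA.get((state, tape[position]), []):
            agenda.append((target, position + 1))
    return False  # agenda exhausted: no path exists


print(nd_recognize("baaa!"), nd_recognize("baaa!", depth_first=False))
```

Both orderings return the same answer; they differ only in the order in which alternatives are explored.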
ND-RECOGNIZE: Example
ACCEPT

What’s the point?
NFSAs and DFSAs are equivalent
  For every NFSA, there is an equivalent DFSA (and vice versa)
Equivalence between regular expressions and FSA
  Easy to show with NFSAs
Why use NFSAs?
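One standard way to see the NFSA-to-DFSA direction is the subset construction, in which each DFSA state is a set of NFSA states. The slides do not name a construction, so the following is a sketch under that assumption, restricted to automata without ε-transitions; all identifiers are illustrative.

```python
# Assumed non-deterministic sheeptalk automaton (no ε-transitions).
NFSA = {
    (0, "b"): [1],
    (1, "a"): [2],
    (2, "a"): [2, 3],
    (3, "!"): [4],
}


def determinize(nfsa, start, finals):
    """Subset construction: return (dfsa table, start state, final states)."""
    start_set = frozenset([start])
    dfsa, dfsa_finals = {}, set()
    pending, seen = [start_set], {start_set}
    while pending:
        current = pending.pop()
        if current & finals:
            dfsa_finals.add(current)
        # Symbols that label at least one arc leaving a state in `current`.
        symbols = {symbol for (state, symbol) in nfsa if state in current}
        for symbol in symbols:
            target = frozenset(
                t for state in current for t in nfsa.get((state, symbol), [])
            )
            dfsa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                pending.append(target)
    return dfsa, start_set, dfsa_finals


def accepts(dfsa, start, finals, tape):
    """Run the deterministic automaton produced by `determinize`."""
    state = start
    for symbol in tape:
        state = dfsa.get((state, symbol))
        if state is None:
            return False
    return state in finals


DFSA, DSTART, DFINALS = determinize(NFSA, 0, {4})
print(accepts(DFSA, DSTART, DFINALS, "baaa!"))
```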

Regular Language: Definition
∅ is a regular language
∀a ∈ Σ ∪ ε, {a} is a regular language
If L1 and L2 are regular languages, then so are:
  L1 · L2 = {x y | x ∈ L1, y ∈ L2}, the concatenation of L1 and L2
  L1 ∪ L2, the union or disjunction of L1 and L2
  L1*, the Kleene closure of L1
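The three closure operations can be tried out on small finite sets of strings. This is only a sketch: a true Kleene closure is infinite, so `kleene_closure` below takes an assumed bound `max_repeats`, and all function names are illustrative.

```python
def concatenate(l1, l2):
    """L1 · L2: every x followed by every y."""
    return {x + y for x in l1 for y in l2}


def union(l1, l2):
    """L1 ∪ L2."""
    return l1 | l2


def kleene_closure(language, max_repeats):
    """Finite approximation of L*: at most `max_repeats` concatenated strings."""
    result = {""}  # zero repetitions give the empty string
    layer = {""}
    for _ in range(max_repeats):
        layer = concatenate(layer, language)
        result |= layer
    return result


# Build a bounded slice of sheeptalk, b a a a* !, from single-symbol languages.
a_star = kleene_closure({"a"}, 2)
sheeptalk = concatenate(
    concatenate(concatenate(concatenate({"b"}, {"a"}), {"a"}), a_star), {"!"}
)
print(sorted(sheeptalk))
```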

Regular Languages: Starting Points

Regular Languages: Concatenation

Regular Languages: Disjunction

Regular Languages: Kleene Closure

Finite-State Transducers (FSTs)
A two-tape automaton that recognizes or generates pairs of strings
Think of an FST as an FSA with two symbol strings on each arc
  One symbol string from each tape

Four-fold view of FSTs
As a recognizer
As a generator
As a translator
As a set relater
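Two of these views, translator and recognizer, can be sketched with a toy transducer whose arcs carry one symbol per tape. Everything here is an assumption for illustration: the arc table, the feature symbols `+N`, `+SG`, `+PL` on the upper tape, and the fact that the machine is deterministic on its upper tape, which is what lets the recognizer reuse the translator.

```python
# Arcs: state -> [(upper symbol, lower symbol, next state)]; "" stands for ε.
ARCS = {
    0: [("c", "c", 1)],
    1: [("a", "a", 2)],
    2: [("t", "t", 3)],
    3: [("+N", "", 4)],
    4: [("+SG", "", 5), ("+PL", "s", 5)],
}
FINAL_STATES = {5}


def translate(upper):
    """Translator view: map upper-tape symbols to a lower-tape string, or None."""
    state, output = 0, []
    for symbol in upper:
        for arc_upper, arc_lower, target in ARCS.get(state, []):
            if arc_upper == symbol:
                output.append(arc_lower)
                state = target
                break
        else:
            return None  # no arc reads this symbol
    return "".join(output) if state in FINAL_STATES else None


def recognize(upper, lower):
    """Recognizer view: accept the pair only if the FST relates the two strings."""
    result = translate(upper)
    return result is not None and result == lower


print(translate(["c", "a", "t", "+N", "+PL"]))
```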

Summary: Computational Tools
Regular expressions
Finite-state automata (deterministic vs. non-deterministic)
Finite-state transducers

Computational Morphology
Definitions and problems
  What is morphology?
  Topology of morphologies
Computational morphology
  Finite-state methods

Morphology
Study of how words are constructed from smaller units of meaning
Smallest unit of meaning = morpheme
  fox has morpheme fox
  cats has two morphemes cat and -s
  Note: it is useful to distinguish morphemes from orthographic rules
Two classes of morphemes:
  Stems: supply the “main” meaning
  Affixes: add “additional” meaning

Topology of Morphologies
Concatenative vs. non-concatenative
Derivational vs. inflectional
Regular vs. irregular

Concatenative Morphology
Morpheme+Morpheme+Morpheme+…
Stems (also called lemma, base form, root, lexeme):
  hope+ing → hoping
  hop+ing → hopping
Affixes:
  Prefixes: Antidisestablishmentarianism
  Suffixes: Antidisestablishmentarianism
Agglutinative languages (e.g., Turkish)
  uygarlaştıramadıklarımızdanmışsınızcasına → uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına
  Meaning: behaving as if you are among those whom we could not cause to become civilized

Non-Concatenative Morphology
Infixes (e.g., Tagalog)
  hingi (borrow)
  humingi (borrower)
Circumfixes (e.g., German)
  sagen (say)
  gesagt (said)
Reduplication (e.g., Motu, spoken in Papua New Guinea)
  mahuta (to sleep)
  mahutamahuta (to sleep constantly)
  mamahuta (to sleep, plural)

Templatic Morphologies
Common in Semitic languages
Roots and patterns

| | Arabic | Hebrew |
|---|---|---|
| Root | ك ت ب | כ ת ב |
| Pattern | مَ??و? | ??ו? |
| Word | مكتوب | כתוב |
| Transliteration | maktuub | ktuuv |
| Gloss | written | written |

Derivational Morphology
Stem + morpheme →
  Word with different meaning or different part of speech
  Exact meaning difficult to predict
Nominalization in English:
  -ation: computerization, characterization
  -ee: appointee, advisee
  -er: killer, helper
Adjective formation in English:
  -al: computational, derivational
  -less: clueless, helpless
  -able: teachable, computable

Inflectional Morphology
Stem + morpheme →
  Word with same part of speech as the stem
Adds: tense, number, person, …
Plural morpheme for English nouns
  cat+s
  dog+s
Progressive form in English verbs
  walk+ing
  rain+ing

Noun Inflections in English
Regular
  cat/cats
  dog/dogs
Irregular
  mouse/mice
  ox/oxen
  goose/geese

Verb Inflections in English

Verb Inflections in Spanish

Morphological Parsing
Computationally decompose input forms into component morphemes
Components needed:
  A lexicon (stems and affixes)
  A model of how stems and affixes combine
  Orthographic rules

Morphological Parsing: Examples

| WORD | STEM (+FEATURES)* |
|---|---|
| cats | cat +N +PL |
| cat | cat +N +SG |
| cities | city +N +PL |
| geese | goose +N +PL |
| ducks | (duck +N +PL) or (duck +V +3SG) |
| merging | merge +V +PRES-PART |
| caught | (catch +V +PAST-PART) or (catch +V +PAST) |

Different Approaches
Lexicon only
Rules only
Lexicon and rules
  finite-state automata
  finite-state transducers

Lexicon-only
Simply enumerate all surface forms and analyses
So what’s the problem?
When might this be useful?
  acclaim acclaim $N$
  acclaim acclaim $V+0$
  acclaimed acclaim $V+ed$
  acclaimed acclaim $V+en$
  acclaiming acclaim $V+ing$
  acclaims acclaim $N+s$
  acclaims acclaim $V+s$
  acclamation acclamation $N$
  acclamations acclamation $N+s$
  acclimate acclimate $V+0$
  acclimated acclimate $V+ed$
  acclimated acclimate $V+en$
  acclimates acclimate $V+s$
  acclimating acclimate $V+ing$
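The lexicon-only approach amounts to a table lookup. A minimal sketch, using a handful of the rows listed above; the names `ENTRIES`, `LEXICON`, and `analyze` are assumptions for illustration.

```python
from collections import defaultdict

# A few rows from the listing: (surface form, stem, analysis tag).
ENTRIES = [
    ("acclaim", "acclaim", "$N$"),
    ("acclaim", "acclaim", "$V+0$"),
    ("acclaimed", "acclaim", "$V+ed$"),
    ("acclaimed", "acclaim", "$V+en$"),
    ("acclaims", "acclaim", "$N+s$"),
    ("acclaims", "acclaim", "$V+s$"),
]

# Index every analysis under its surface form.
LEXICON = defaultdict(list)
for surface, stem, tag in ENTRIES:
    LEXICON[surface].append((stem, tag))


def analyze(word):
    """Return all listed analyses of `word`; unlisted words get an empty list."""
    return list(LEXICON.get(word, []))


print(analyze("acclaims"))
print(analyze("acclaimer"))
```

Lookup is fast and handles ambiguity, but any form missing from the table, such as a newly coined word, receives no analysis at all.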

Rule-only: Porter Stemmer
Cascading set of rules
  ational → ate (e.g., relational → relate)
  ing → ε (e.g., walking → walk)
  sses → ss (e.g., grasses → grass)
  …
Examples
  cities → citi
  city → citi
  generalizations → generalization → generalize → general → gener
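The three rules shown can be sketched as a single pass of suffix rewriting. This is deliberately not the Porter stemmer: the real algorithm has many more rules, applies them in several steps, and attaches conditions to them. The names `RULES` and `apply_rules` are assumptions for the example.

```python
# Suffix rewrite rules from the slide: (suffix, replacement).
RULES = [
    ("ational", "ate"),
    ("sses", "ss"),
    ("ing", ""),
]


def apply_rules(word):
    """Rewrite the first matching suffix; return the word unchanged otherwise."""
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["relational", "walking", "grasses", "sing"]:
    print(w, "->", apply_rules(w))
```

Without any conditions on the stem, the last example turns "sing" into "s", which shows why unconditioned rules produce errors.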

Porter Stemmer: What’s the Problem?
Errors…
Why is it still useful?

Lexicon + Rules
FSA: for recognition
  Recognize all grammatical input and only grammatical input
FST: for analysis
  If grammatical, analyze surface form into component morphemes
  Otherwise, declare input ungrammatical

FSA: English Noun Morphology
Lexicon

| reg-noun | irreg-pl-noun | irreg-sg-noun | plural |
|---|---|---|---|
| fox | geese | goose | -s |
| cat | sheep | sheep | |
| dog | mice | mouse | |

Rule
[Figure: finite-state automaton for the rule]
Note problem with orthography!
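The rule diagram is not present in the transcript. Assuming the usual layout (a regular noun optionally followed by the plural -s, with irregular singular and plural nouns accepted as whole words), the recognizer can be sketched as below; the set names and `recognize_noun` are illustrative.

```python
# Lexicon classes from the table above.
REG_NOUN = {"fox", "cat", "dog"}
IRREG_SG_NOUN = {"goose", "sheep", "mouse"}
IRREG_PL_NOUN = {"geese", "sheep", "mice"}


def recognize_noun(word):
    """Accept reg-noun, reg-noun + s, irreg-sg-noun, or irreg-pl-noun."""
    if word in REG_NOUN or word in IRREG_SG_NOUN or word in IRREG_PL_NOUN:
        return True
    # Path through the plural arc: a regular noun followed by -s.
    return word.endswith("s") and word[:-1] in REG_NOUN


for w in ["cats", "geese", "gooses", "foxs", "foxes"]:
    print(w, recognize_noun(w))
```

The orthography problem shows up immediately: pure concatenation accepts "foxs" and rejects "foxes".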

FSA: English Noun Morphology

FSA: English Verb Morphology
Lexicon
  reg-verb-stem: walk, fry, talk, impeach
  irreg-verb-stem: cut, speak, spoken, sing
  irreg-past-verb: caught, ate, eaten, sang
  past: -ed
  past-part: -ed
  pres-part: -ing
  3sg: -s
Rule
[Figure: finite-state automaton for the rule]

FSA: English Adjectival Morphology
Examples:
  big, bigger, biggest
  small, smaller, smallest
  happy, happier, happiest, happily
  unhappy, unhappier, unhappiest, unhappily
Morphemes:
  Roots: big, small, happy, etc.
  Affixes: un-, -er, -est, -ly

FSA: English Adjectival Morphology
adj-root1: {happy, real, …}
adj-root2: {big, small, …}

FSA: Derivational Morphology

Morphological Parsing with FSTs
Limitation of FSA:
  Accepts or rejects an input… but doesn’t actually provide an analysis
Use FSTs instead!
  One tape contains the input, the other tape has the analysis
  What if both tapes contain symbols?
  What if only one tape contains symbols?

Terminology
Transducer alphabet (pairs of symbols):
  a:b = a on the upper tape, b on the lower tape
  a:ε = a on the upper tape, nothing on the lower tape
  If a:a, write a for shorthand
Special symbols
  # = word boundary
  ^ = morpheme boundary
  (For now, think of these as mapping to ε)

FST for English Nouns
First try:
What’s the problem here?

FST for English Nouns

Handling Orthography

Complete Morphological Parser

FSTs and Ambiguity
unionizable
  union +ize +able
  un+ ion +ize +able
assess
  assess +V
  ass +N +essN

Optimizations

Practical NLP Applications
In practice, it is almost never necessary to write FSTs by hand…
Typically, one writes rules:
  Chomsky and Halle Notation: a → b / c__d
  = rewrite a as b when it occurs between c and d
  E-Insertion rule: ε → e / {x, s, z} ^ __ s #
Rule → FST compiler handles the rest…
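The effect of the E-insertion rule can be imitated with a regular-expression substitution over an intermediate form such as `fox^s#`. This sketch is a stand-in for a compiled FST, not how a rule compiler works; the function name `e_insertion` and the final step that deletes the boundary symbols are assumptions.

```python
import re


def e_insertion(intermediate):
    """Apply ε → e / {x, s, z} ^ __ s # and then erase the boundary symbols."""
    # Insert e after a morpheme boundary that follows x, s, or z
    # and precedes a word-final s.
    with_e = re.sub(r"(?<=[xsz])\^(?=s#)", "^e", intermediate)
    # Treat ^ and # as mapping to ε on the surface tape.
    return with_e.replace("^", "").replace("#", "")


for form in ["fox^s#", "cat^s#", "buzz^s#", "fox#"]:
    print(form, "->", e_insertion(form))
```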

What we covered today…
Computational tools
  Regular expressions
  Finite-state automata (deterministic vs. non-deterministic)
  Finite-state transducers
Overview of morphological processes
Computational morphology with finite-state methods
One final question: is morphology actually finite state?