Lecture 5: Morphology
Kai-Wei Chang, CS @ University of Virginia
Course webpage: http://kwchang.net/teaching/NLP16
6501 Natural Language Processing
This lecture
- What is the structure of words?
- Can we build an analyzer to model the structure of words?
- Finite-state automata and regular expressions
Words
- Finite-state methods are particularly useful in dealing with a lexicon
  - Compact representations of words
- Agenda
  - Some facts about words
  - Computational methods
A Turkish word
- How about English?
Example from Julia Hockenmaier, Intro to NLP
Longest word in English
- Longest word in Shakespeare: Honorificabilitudinitatibus (27 letters)
- Longest non-technical word: Antidisestablishmentarianism (28 letters)
- Longest word in a major dictionary: Pneumonoultramicroscopicsilicovolcanoconiosis (45 letters)
- Longest word in literature: Lopadotemachoselachogaleokranioleipsano...pterygon (182 letters), a transliteration from Ancient Greek
- Methionylthreonylthreonylglutaminylarginyl...isoleucine (189,819 letters), the chemical name of a protein
What is Morphology?
- The ways that words are built up from smaller meaningful units (morphemes)
- Two classes of morphemes
  - Stems: the core meaning-bearing units
  - Affixes: adhere to stems to change their meanings and grammatical functions
    - e.g., dis-grace-ful-ly
Inflectional Morphology
Create different forms of the same word:
- Examples:
  - Verbs: walk, walked, walks
  - Nouns: book, books, book's
  - Personal pronouns: he, she, her, them, us
- Serves a grammatical/semantic purpose that is different from the original but is transparently related to the original
Derivational Morphology
Create different words from the same lemma:
- Nominalization:
  - V + -ation: e.g., computerization
  - V + -er: killer
- Negation:
  - Un-: undo, unseen, …
  - Mis-: mistake, misunderstand, ...
- Adjectivization:
  - V + -able: doable
  - N + -al: national
What else?
- Combines words into a new word:
  - Cream, ice cream, ice cream cone, ice cream cone bakery
- Word formation is productive
  - Google, Googler, to google, to misgoogle, to googlefy, googlification
  - Google Map, Google Book, …
Morphological parsing and generation
- Morphological parsing:
- Morphological generation
  - What words can be generated from grace?
    grace, graceful, gracefully, disgrace, ungrace, undisgraceful, undisgracefully
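Generation from a stem can be sketched as filling an affix template. The template (un-)(dis-)stem(-ful(-ly)), the affix lists, and the function name below are illustrative assumptions, not part of the lecture. The sketch deliberately overgenerates: it produces every form on the slide plus others such as undisgrace.

```python
from itertools import product

# Hypothetical toy template: two optional prefixes and an optional suffix chain.
PREFIX_UN = ["", "un"]
PREFIX_DIS = ["", "dis"]
SUFFIXES = ["", "ful", "fully"]

def generate(stem):
    """Return every form licensed by the template (un-)(dis-)stem(-ful(-ly))."""
    return [un + dis + stem + suffix
            for un, dis, suffix in product(PREFIX_UN, PREFIX_DIS, SUFFIXES)]

forms = generate("grace")
print(len(forms))  # 2 * 2 * 3 = 12 candidate forms
```

Deciding which of these candidates are real words needs a lexicon or a more constrained machine.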
Finite State Automata
- FSAs and regular expressions have the same expressive power
- The above FSA accepts the strings matched by the regular expression /baa+!/
Finite State Automata
- Terminology:
  - It has 5 states
  - Alphabet: {b, a, !}
  - Start state: q0
  - Accept state: q4
  - 5 transitions
- Are there other machines that correspond to the same language /baa+!/?
  - Yes
Alphabet just means a finite set of symbols in the input
Can have many accept states
Formal definition
- You can specify an FSA by enumerating the following things:
  - The set of states: Q
  - A finite alphabet: Σ
  - A start state
  - A set of accept/final states
  - A transition function that maps Q × Σ to Q
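The five components can be written down directly. The following is a minimal sketch of a deterministic FSA for /baa+!/, assuming states numbered 0 to 4 and a dictionary encoding of the transition function (both are choices made for this example). The function here is partial: a missing entry is treated as rejection, where a strictly total function would add an explicit sink state.

```python
# Deterministic FSA for the language /baa+!/ (hypothetical dict encoding).
STATES = {0, 1, 2, 3, 4}
ALPHABET = {"b", "a", "!"}
START = 0
ACCEPT = {4}
# Transition function Q x Sigma -> Q; a missing entry means "no transition".
DELTA = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # loop: any number of further a's
    (3, "!"): 4,
}

def accepts(string):
    """Run the machine over `string`; accept only if it ends in an accept state."""
    state = START
    for symbol in string:
        if (state, symbol) not in DELTA:
            return False
        state = DELTA[(state, symbol)]
    return state in ACCEPT

print(accepts("baaa!"), accepts("ba!"))
```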
Example: Dollars and Cents
Yet another view – table representation
State   b    a     !    ε
0       1    -     -    -
1       -    2     -    -
2       -    2,3   -    -
3       -    -     4    -
4       -    -     -    -
(- means no transition)
If you're in state 1 and you're looking at an a, go to state 2
Non-Deterministic FSA
- ε-transitions
- More than one possible next state
- Equivalent to deterministic FSA
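One way to run a non-deterministic machine is to track the set of all states it could be in. The sketch below assumes the table representation of the /baa+!/ machine, in which state 2 can move to state 2 or state 3 on an a, and uses the empty string as the ε label (an encoding chosen for this example). Tracking state sets is the same idea that underlies the conversion to a deterministic FSA.

```python
EPS = ""  # label used here for epsilon transitions (an assumption of this sketch)

# Non-deterministic table: (state, symbol) -> set of possible next states.
NFA = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},
    (3, "!"): {4},
}
START, ACCEPT = 0, {4}

def eps_closure(states, table):
    """All states reachable from `states` using only epsilon transitions."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in table.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def nfa_accepts(string, table=NFA):
    """Accept if at least one path through the machine ends in an accept state."""
    current = eps_closure({START}, table)
    for symbol in string:
        moved = set()
        for q in current:
            moved |= table.get((q, symbol), set())
        current = eps_closure(moved, table)
    return bool(current & ACCEPT)

print(nfa_accepts("baaa!"), nfa_accepts("ba!"))
```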
Regular expression
- Equivalent to FSA
- Matching strings with regular expressions (e.g., perl, python, grep):
  - translating the regular expression into a machine (a table), and
  - passing the table and the string to an interpreter
Model morphology with FSA
- Regular singular nouns are OK
- Regular plural nouns have an -s on the end
- Irregulars are OK as is
Now plug in the words
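A minimal sketch of the noun machine with words plugged in, assuming a toy lexicon (cat, dog, book, goose/geese, mouse/mice) that is not from the lecture. States are named by the prefix read so far, so the machine is a letter trie in which regular nouns get an extra -s path.

```python
def build_noun_fsa(reg, irreg_sg, irreg_pl):
    """Build a letter-level FSA; each state is the prefix read so far."""
    delta = {}      # (state, char) -> next state
    accept = set()

    def add_word(word):
        for i in range(len(word)):
            delta[(word[:i], word[i])] = word[:i + 1]
        accept.add(word)

    for w in reg:
        add_word(w)         # regular singular nouns are accepted
        add_word(w + "s")   # regular plurals add -s
    for w in irreg_sg | irreg_pl:
        add_word(w)         # irregulars are accepted as is, with no -s path
    return delta, accept

def accepts(word, fsa):
    delta, accept = fsa
    state = ""              # start state: the empty prefix
    for ch in word:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in accept

# Hypothetical toy lexicon.
FSA = build_noun_fsa({"cat", "dog", "book"}, {"goose", "mouse"}, {"geese", "mice"})
print(accepts("cats", FSA), accepts("geese", FSA), accepts("gooses", FSA))
```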
Derivational Rules
From recognition to parsing
- Now we can use these machines to recognize strings
- Can we use the machines to assign a structure to a string? (parsing)
- Example:
  - From “cats” to “cat +N +p”
Transitions
- c:c reads a c and writes a c
- ε:+N reads nothing and writes +N
c:c a:a t:t ε:+N s:+p
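The single path c:c a:a t:t ε:+N s:+p can be sketched as a list of arcs plus a small interpreter. The arc encoding, state numbers, and function name are assumptions of this example, and the walker assumes the machine has no cycles of ε arcs.

```python
# Toy transducer for the single path shown above.
# Each arc is (source state, input symbol, output string, target state);
# "" as the input symbol stands for epsilon (read nothing).
ARCS = [
    (0, "c", "c", 1),
    (1, "a", "a", 2),
    (2, "t", "t", 3),
    (3, "", " +N", 4),
    (4, "s", " +p", 5),
]
START, ACCEPT = 0, {5}

def transduce(word):
    """Return the output of every accepting path that reads exactly `word`."""
    results = []

    def walk(state, pos, output):
        if pos == len(word) and state in ACCEPT:
            results.append(output)
        for src, inp, out, dst in ARCS:
            if src != state:
                continue
            if inp == "":                      # epsilon arc: consume nothing
                walk(dst, pos, output + out)
            elif pos < len(word) and word[pos] == inp:
                walk(dst, pos + 1, output + out)

    walk(START, 0, "")
    return results

print(transduce("cats"))  # ['cat +N +p']
```

Only the plural path is encoded, so "cat" alone yields no analysis in this sketch.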
Challenge: Ambiguity
- books: book +N +p or book +V +z (3rd person)
- Non-deterministic FSA: allows multiple paths through a machine that lead to the same accept state
- Bias the search (or learn) so that only a few likely paths are explored
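The ambiguity of "books" can be made concrete with a non-deterministic transducer that branches after the stem. This is a sketch under assumed state numbers and arc encoding; it enumerates every accepting path exhaustively, whereas a practical system would rank or prune them.

```python
# Arcs are (source state, input symbol, output string, target state);
# "" as the input symbol stands for epsilon. The two branches after
# state 4 lead to the same accept state.
ARCS = [
    (0, "b", "b", 1), (1, "o", "o", 2), (2, "o", "o", 3), (3, "k", "k", 4),
    (4, "", " +N", 5), (5, "s", " +p", 7),   # noun reading: plural
    (4, "", " +V", 6), (6, "s", " +z", 7),   # verb reading: 3rd person
]
START, ACCEPT = 0, {7}

def analyses(word):
    """Collect the outputs of all accepting paths (assumes no epsilon cycles)."""
    results = []

    def walk(state, pos, output):
        if pos == len(word) and state in ACCEPT:
            results.append(output)
        for src, inp, out, dst in ARCS:
            if src != state:
                continue
            if inp == "":
                walk(dst, pos, output + out)
            elif pos < len(word) and word[pos] == inp:
                walk(dst, pos + 1, output + out)

    walk(START, 0, "")
    return results

print(analyses("books"))  # ['book +N +p', 'book +V +z']
```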
Challenge: Spelling rules
- The underlying morphemes (e.g., plural -s) can have different surface realizations (-s, -es)
  - cat + s = cats
  - fox + s = foxes
  - make + ing = making
- How can we model it?
Intermediate representation
Overall Scheme
- One FST that has explicit information about the lexicon
  - Lexical level to intermediate forms
- Large set of machines that capture spelling rules
  - Intermediate forms to surface
Lexical to intermediate level
Intermediate level to surface
- The “add an e” rule for -s
- Example: fox^s# ↔ foxes#
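The intermediate-to-surface direction of this rule can be approximated with a regular-expression substitution. This is a sketch, not an explicit FST: it assumes the rule fires after x, s, or z (the slide only shows fox), that ^ marks a morpheme boundary and # the word end, and that any remaining boundary is simply deleted.

```python
import re

def intermediate_to_surface(form):
    """Apply e-insertion, then erase the remaining morpheme boundaries."""
    # x^s#, s^s#, z^s# become xes#, ses#, zes#
    form = re.sub(r"([xsz])\^(s#)", r"\g<1>e\g<2>", form)
    return form.replace("^", "")

print(intermediate_to_surface("fox^s#"))  # foxes#
print(intermediate_to_surface("cat^s#"))  # cats#
```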
Other applications of FST
- ELIZA: https://en.wikipedia.org/wiki/ELIZA
  - Implemented using pattern matching -- FST
ELIZA as an FST cascade
Human: You don't argue with me.
Computer: WHY DO YOU THINK I DON'T ARGUE WITH YOU
A simple rule:
1. Replace you with I and me with you:
   I don't argue with you.
2. Replace <...> with Why do you think <...>:
   Why do you think I don't argue with you.
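The two-step rule can be sketched as a cascade of substitutions. The function names are assumptions of this example, and regular-expression substitution stands in for explicit transducers. The pronoun swap is done in a single pass so that the "you" produced from "me" is not rewritten again.

```python
import re

SWAPS = {"you": "I", "me": "you"}

def swap_pronouns(text):
    """Step 1: replace you with I and me with you, in one pass."""
    return re.sub(r"\b(you|me)\b",
                  lambda m: SWAPS[m.group(1).lower()],
                  text, flags=re.IGNORECASE)

def respond(text):
    """Step 2: wrap the swapped sentence in the 'Why do you think' template."""
    swapped = swap_pronouns(text.rstrip(".!?"))
    return ("Why do you think " + swapped).upper()

print(respond("You don't argue with me."))
# WHY DO YOU THINK I DON'T ARGUE WITH YOU
```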
What about compounds?
- Compounds have hierarchical structure:
  - (((ice cream) cone) bakery) not (ice ((cream cone) bakery))
  - ((computer science) (graduate student)) not (computer ((science graduate) student))
- We need context-free grammars to capture this underlying structure
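To see why bracketing matters, the sketch below enumerates every binary tree over a compound, which corresponds to all derivations under a single hypothetical rule N -> N N. The rule and function names are assumptions of this example; choosing the right tree among the candidates is a separate problem.

```python
def bracketings(words):
    """All binary trees over `words`, i.e. all derivations under N -> N N."""
    if len(words) == 1:
        return [words[0]]
    trees = []
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                trees.append((left, right))
    return trees

def show(tree):
    """Render a tree in the parenthesized notation used above."""
    if isinstance(tree, str):
        return tree
    return "(" + show(tree[0]) + " " + show(tree[1]) + ")"

candidates = [show(t) for t in bracketings(["ice", "cream", "cone", "bakery"])]
print(len(candidates))  # 5 possible bracketings for a four-word compound
```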