


Lecture 5: Morphology

Kai-Wei Chang
CS @ University of Virginia


Course webpage: http://kwchang.net/teaching/NLP16

6501 Natural Language Processing


This lecture

- What is the structure of words?
- Can we build an analyzer to model the structure of words?
- Finite-state automata and regular expressions


Words

- Finite-state methods are particularly useful in dealing with a lexicon
  - Compact representations of words
- Agenda
  - Some facts about words
  - Computational methods


A Turkish word

- How about English?


Example from Julia Hockenmaier, Intro to NLP


Longest word in English

- Longest word in Shakespeare’s works: Honorificabilitudinitatibus (27 letters)
- Longest non-technical word: Antidisestablishmentarianism (28 letters)
- Longest word in a major dictionary: Pneumonoultramicroscopicsilicovolcanoconiosis (45 letters)
- Longest word in literature: Lopadotemachoselachogaleokranioleipsano...pterygon (182 letters), an Ancient Greek transliteration
- Methionylthreonylthreonylglutaminylarginyl...isoleucine (189,819 letters), the chemical name of a protein


What is Morphology?

- The ways that words are built up from smaller meaningful units (morphemes)
- Two classes of morphemes:
  - Stems: the core meaning-bearing units
  - Affixes: attach to stems to change their meanings and grammatical functions
  - e.g., dis-grace-ful-ly (stem: grace; affixes: dis-, -ful, -ly)


Inflectional Morphology

Creates different forms of the same word
- Examples:
  - Verbs: walk, walked, walks
  - Nouns: book, books, book’s
  - Personal pronouns: he, she, her, them, us
- Serves a grammatical/semantic purpose that differs from the original form but is transparently related to it


Derivational Morphology

Creates different words from the same lemma:
- Nominalization:
  - V + -ation: e.g., computerization
  - V + -er: killer
- Negation:
  - Un-: undo, unseen, …
  - Mis-: mistake, misunderstand, …
- Adjectivization:
  - V + -able: doable
  - N + -al: national


What else?

- Compounding combines words into a new word:
  - cream, ice cream, ice cream cone, ice cream cone bakery
- Word formation is productive:
  - Google, Googler, to google, to misgoogle, to googlefy, googlification
  - Google Map, Google Book, …


Morphological parsing and generation

- Morphological parsing
- Morphological generation
  - What words can be generated from grace?
    - grace, graceful, gracefully, disgrace, ungrace, undisgraceful, undisgracefully
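Generation can be pictured as walking a small machine that picks optional prefixes, the stem, and optional suffixes. The Python sketch below assumes a toy ordering (un-, then dis-, then the stem, then -ful, then -ly). The slot lists and the `generate` function are hypothetical and are not taken from the slide. The sketch deliberately overgenerates: some outputs, such as undisgrace, are not ordinary English words. A practical generator therefore needs lexical restrictions on top of the affix order.

```python
from itertools import product

# Hypothetical morphotactics (an assumption, not from the slide):
# optional un-, then optional dis-, then the stem, then optional -ful, then optional -ly.
PREFIX_SLOTS = [["", "un"], ["", "dis"]]
SUFFIX_PATHS = ["", "ful", "ful" + "ly"]  # -ly is only reachable after -ful


def generate(stem: str = "grace") -> list[str]:
    """Enumerate every word the toy prefix/suffix machine can build around `stem`."""
    words = []
    for prefixes in product(*PREFIX_SLOTS):  # every combination of the optional prefixes
        for suffix in SUFFIX_PATHS:
            words.append("".join(prefixes) + stem + suffix)
    return words
```

Calling `generate()` returns 12 candidates. Every form listed on the slide is among them.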


Finite State Automata

- FSAs and regular expressions have the same expressive power
- The FSA pictured on this slide accepts exactly the strings matching r/baa+!/: baa!, baaa!, baaaa!, …
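To see this language concretely, the following Python sketch enumerates every string over the alphabet {b, a, !} up to a given length. It keeps the strings that the regular expression matches in full. Python's `re` module serves only as a convenient matcher here, and `accepted_strings` is a hypothetical helper.

```python
import itertools
import re

SHEEP = re.compile(r"baa+!")  # b, then two or more a's, then !


def accepted_strings(max_len: int) -> list[str]:
    """All strings over {b, a, !} of length <= max_len that match baa+! in full."""
    out = []
    for n in range(max_len + 1):
        for chars in itertools.product("ba!", repeat=n):
            s = "".join(chars)
            if SHEEP.fullmatch(s):  # whole-string match, like an FSA run to the end
                out.append(s)
    return out
```

With `max_len=6` the result is the three strings baa!, baaa!, and baaaa!.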


Finite State Automata

- Terminology:
  - It has 5 states
  - Alphabet: {b, a, !} (an alphabet just means a finite set of symbols in the input)
  - Start state: q0
  - Accept state: q4 (an FSA can have many accept states)
  - 5 transitions
- Are there other machines that correspond to the same language r/baa+!/?
  - Yes


Formal definition

- You can specify an FSA by enumerating the following:
  - The set of states: Q
  - A finite alphabet: Σ
  - A start state q0 ∈ Q
  - A set of accept/final states F ⊆ Q
  - A transition function δ that maps Q × Σ to Q
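The five components map directly onto a small data structure. This minimal Python sketch stores the transition function as a dictionary keyed by (state, symbol). It assumes that a missing key means an implicit reject (dead) state, so δ is treated as partial rather than total. The `FSA` class and the `SHEEP` instance are hypothetical names. `SHEEP` encodes states q0 to q4 and the five transitions of the baa+! machine.

```python
from dataclasses import dataclass


@dataclass
class FSA:
    states: set[str]                    # Q
    alphabet: set[str]                  # Σ
    start: str                          # start state q0
    accept: set[str]                    # F, the accept/final states
    delta: dict[tuple[str, str], str]   # δ: (state, symbol) -> state

    def accepts(self, tape: str) -> bool:
        """Deterministic run: follow exactly one transition per input symbol."""
        q = self.start
        for symbol in tape:
            if (q, symbol) not in self.delta:
                return False            # undefined transition = implicit dead state
            q = self.delta[(q, symbol)]
        return q in self.accept


SHEEP = FSA(
    states={"q0", "q1", "q2", "q3", "q4"},
    alphabet={"b", "a", "!"},
    start="q0",
    accept={"q4"},
    delta={
        ("q0", "b"): "q1",
        ("q1", "a"): "q2",
        ("q2", "a"): "q3",
        ("q3", "a"): "q3",              # loop: any number of extra a's
        ("q3", "!"): "q4",
    },
)
```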


Example: dollars and cents


Yet another view – table representation


State   b   a     !   ε
0       1   ∅     ∅   ∅
1       ∅   2     ∅   ∅
2       ∅   2,3   ∅   ∅
3       ∅   ∅     4   ∅
4       ∅   ∅     ∅   ∅   (accept state)

If you’re in state 1 and you’re looking at an a, go to state 2.
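The entry for state 2 on input a lists two next states (2 and 3), so a recognizer for this table has to keep track of alternatives. The Python sketch below transcribes the table as a dictionary, with empty cells as missing keys. It then runs a depth-first search over (state, input position) pairs. The names `NFA_TABLE` and `nd_recognize` are illustrative and do not come from the slide.

```python
# Table transcribed as {(state, symbol): set of next states}; empty cells are omitted.
NFA_TABLE = {
    (0, "b"): {1},
    (1, "a"): {2},
    (2, "a"): {2, 3},   # the non-deterministic cell
    (3, "!"): {4},
}
ACCEPT = {4}


def nd_recognize(tape, table=NFA_TABLE, start=0, accept=ACCEPT):
    """Return True if some path through the non-deterministic table accepts `tape`."""
    agenda = [(start, 0)]       # search states: (machine state, position in tape)
    visited = set()
    while agenda:
        state, i = agenda.pop()  # depth-first: take the most recent alternative
        if (state, i) in visited:
            continue
        visited.add((state, i))
        if i == len(tape):
            if state in accept:
                return True
            continue             # input used up in a non-accepting state
        for nxt in table.get((state, tape[i]), ()):
            agenda.append((nxt, i + 1))
    return False
```

A wrong choice at state 2 only fails that branch. The search backs up to the other alternative, so the machine accepts whenever at least one path succeeds.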


Non-Deterministic FSA

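- ε-transitions (moving to another state without reading any input)
- More than one possible next state for the same input
- Equivalent to deterministic FSA: every non-deterministic FSA can be converted into a deterministic one that accepts the same language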
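A standard way to show this equivalence is the subset construction. Each DFA state is the set of NFA states the machine could be in, closed under ε-transitions. The Python sketch below assumes a hypothetical NFA for baa+! in which the a-loop is written as an ε-arc from state 3 back to state 2. The `EPS` label and all function names are illustrative.

```python
from collections import deque

EPS = "ε"  # label used for ε-transitions in this sketch

# Hypothetical NFA for baa+!: instead of an a-loop, state 3 has an ε-arc back to state 2.
NFA = {(0, "b"): {1}, (1, "a"): {2}, (2, "a"): {3}, (3, EPS): {2}, (3, "!"): {4}}


def eps_closure(states, nfa):
    """All states reachable from `states` using only ε-arcs."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in nfa.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def subset_construction(nfa, start, accept, alphabet):
    """Return (start, delta, accept) of an equivalent DFA whose states are sets of NFA states."""
    d_start = eps_closure({start}, nfa)
    delta, seen, queue = {}, {d_start}, deque([d_start])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            moved = {r for q in S for r in nfa.get((q, a), ())}
            if not moved:
                continue  # no arc: implicit dead state
            T = eps_closure(moved, nfa)
            delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    d_accept = {S for S in seen if S & accept}  # any NFA accept state inside S
    return d_start, delta, d_accept


def dfa_accepts(tape, start, delta, accept):
    S = start
    for ch in tape:
        S = delta.get((S, ch))
        if S is None:
            return False
    return S in accept


DFA_START, DFA_DELTA, DFA_ACCEPT = subset_construction(NFA, 0, {4}, "ba!")
```

For this NFA the construction yields five DFA states: {0}, {1}, {2}, {2,3}, and {4}.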


Regular expression

- Equivalent to FSA
- Matching strings with regular expressions (e.g., in Perl, Python, grep) amounts to
  - translating the regular expression into a machine (a table), and
  - passing the table and the string to an interpreter


Model morphology with FSA

- Regular singular nouns are OK as is
- Regular plural nouns have an -s on the end
- Irregulars are OK as is
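These three bullets suggest a machine whose arcs read morpheme classes rather than letters. The slide's figure is not reproduced here, so the Python sketch below assumes a three-state layout for it. The class labels, state names, and the `accepts` function are hypothetical.

```python
# Assumed layout of the noun machine (not taken from the slide's figure):
#   q0 --reg-noun--> q1 --plural-s--> q2
#   q0 --irreg-sg-noun--> q2,   q0 --irreg-pl-noun--> q2
# q1 accepts a bare regular singular; q2 accepts plurals and irregulars.
NOUN_ARCS = {
    ("q0", "reg-noun"): "q1",
    ("q1", "plural-s"): "q2",
    ("q0", "irreg-sg-noun"): "q2",
    ("q0", "irreg-pl-noun"): "q2",
}
START, ACCEPT = "q0", {"q1", "q2"}


def accepts(morpheme_classes: list[str]) -> bool:
    """Run the class-level FSA over a sequence of morpheme-class labels."""
    q = START
    for label in morpheme_classes:
        q = NOUN_ARCS.get((q, label))
        if q is None:
            return False  # no arc for this class from the current state
    return q in ACCEPT
```

For example, the sequence reg-noun, plural-s is accepted. The sequence irreg-pl-noun, plural-s is rejected, which corresponds to "irregulars are OK as is".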


Now plug in the words
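Plugging in the words replaces each class arc with the letters of actual lexicon entries. This minimal Python sketch assumes a hypothetical three-list mini-lexicon. It spells out every form the noun machine allows and stores those forms in a trie, which is itself a deterministic letter-level FSA. Attaching -s naively also accepts foxs and rejects foxes. That is the spelling problem taken up on the later slides.

```python
# Hypothetical mini-lexicon standing in for the slide's word lists.
REG_NOUNS = ["cat", "dog", "fox"]
IRREG_SG = ["goose", "mouse"]
IRREG_PL = ["geese", "mice"]


def build_fsa(words):
    """Letter-level deterministic FSA (a trie): shared prefixes share states."""
    delta, accept, fresh = {}, set(), 1  # state 0 is the start state
    for w in words:
        q = 0
        for ch in w:
            if (q, ch) not in delta:
                delta[(q, ch)] = fresh   # create a new state for this letter
                fresh += 1
            q = delta[(q, ch)]
        accept.add(q)                    # the end of each listed form is accepting
    return delta, accept


def accepts(delta, accept, word):
    q = 0
    for ch in word:
        q = delta.get((q, ch))
        if q is None:
            return False
    return q in accept


# Regular nouns with and without -s, plus irregular forms as is.
FORMS = REG_NOUNS + [n + "s" for n in REG_NOUNS] + IRREG_SG + IRREG_PL
DELTA, ACCEPT = build_fsa(FORMS)
```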


Derivational Rules


From recognition to parsing

- Now we can use these machines to recognize strings
- Can we use the machines to assign a structure to a string? (parsing)
- Example:
  - From “cats” to “cat +N +p”


Transitions

- c:c reads a c and writes a c
- ε:+N reads nothing and writes +N

Transducer path for “cats”: c:c, a:a, t:t, ε:+N, s:+p
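A transducer arc x:y consumes x from the input and appends y to the output, with ε meaning "nothing". The Python sketch below encodes only the slide's path for "cats" as a list of arcs. It then searches for every way to reach a final state after consuming the whole input. The arc tuple format, the state numbers, and the `transduce` function are assumptions made for illustration. The search also assumes the machine has no ε-cycles.

```python
EPS = ""  # ε: an arc with this input label consumes nothing

# (source, input, output, target), following the path c:c a:a t:t ε:+N s:+p
ARCS = [
    (0, "c", "c", 1),
    (1, "a", "a", 2),
    (2, "t", "t", 3),
    (3, EPS, " +N", 4),
    (4, "s", " +p", 5),
]
FINALS = {5}


def transduce(word, arcs=ARCS, start=0, finals=FINALS):
    """Return every output string produced along an accepting path for `word`."""
    results = []
    stack = [(start, 0, "")]           # (state, input position, output so far)
    while stack:
        state, i, out = stack.pop()
        if i == len(word) and state in finals:
            results.append(out)
        for src, inp, outp, dst in arcs:
            if src != state:
                continue
            if inp == EPS:             # ε-input arc: write without reading
                stack.append((dst, i, out + outp))
            elif i < len(word) and word[i] == inp:
                stack.append((dst, i + 1, out + outp))
    return results
```

Running `transduce("cats")` yields the single analysis "cat +N +p". Running `transduce("cat")` yields nothing, because this toy machine has no singular path.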


Challenge: Ambiguity

- books: book +N +p or book +V +z (3rd person singular)
- Non-deterministic FSA: allows multiple paths through a machine to lead to the same accept state
- Bias the search (or learn which paths are likely) so that only a few likely paths are explored
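When two ε-arcs leave the same state, the transducer is non-deterministic, and an exhaustive search returns every analysis. This Python sketch hard-codes a hypothetical machine for "book" with a noun branch and a verb branch, and it covers only the -s forms. It enumerates all analyses. Biasing the search or learning which path is likely is left out of scope.

```python
EPS = ""  # ε on the input side: consume nothing

# Hypothetical machine: copy "book", then branch on ε to +N or +V, then read the final s.
ARCS = [
    (0, "b", "b", 1), (1, "o", "o", 2), (2, "o", "o", 3), (3, "k", "k", 4),
    (4, EPS, " +N", 5), (5, "s", " +p", 7),   # noun plural
    (4, EPS, " +V", 6), (6, "s", " +z", 7),   # verb, 3rd person singular
]
FINALS = {7}


def transduce(word, arcs=ARCS, start=0, finals=FINALS):
    """Exhaustive search: collect the output of every accepting path (assumes no ε-cycles)."""
    results, stack = [], [(start, 0, "")]
    while stack:
        state, i, out = stack.pop()
        if i == len(word) and state in finals:
            results.append(out)
        for src, inp, outp, dst in arcs:
            if src != state:
                continue
            if inp == EPS:
                stack.append((dst, i, out + outp))
            elif i < len(word) and word[i] == inp:
                stack.append((dst, i + 1, out + outp))
    return results
```

Here `sorted(transduce("books"))` gives both "book +N +p" and "book +V +z". A biased or learned search would explore or rank these paths instead of returning all of them.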


Challenge: Spelling rules

- The underlying morphemes (e.g., plural -s) can have different surface realizations (-s, -es)
  - cat + s = cats
  - fox + s = foxes
  - make + ing = making
- How can we model this?


Intermediate representation


Overall Scheme

- One FST that has explicit information about the lexicon
  - Lexical level to intermediate forms
- A large set of machines that capture spelling rules
  - Intermediate forms to surface


Lexical to intermediate level


Intermediate level to surface

- The “add an e” rule for -s (e-insertion)
  - Example: fox^s# ↔ foxes#
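The two-level scheme can be sketched as a pipeline of string rewrites. This Python sketch assumes the intermediate notation fox^s#, where ^ marks a morpheme boundary and # marks the end of the word. It states the e-insertion rule as: insert e after x, s, or z plus a morpheme boundary when the next morpheme is a word-final s. Regular-expression substitution stands in for real transducers. The sketch handles only the analyses "+N" and "+N +p", and all function names are hypothetical.

```python
import re


def lexical_to_intermediate(analysis: str) -> str:
    """'fox +N +p' -> 'fox^s#'  (^ = morpheme boundary, # = word boundary)."""
    stem, *tags = analysis.split()
    if tags == ["+N", "+p"]:
        return f"{stem}^s#"
    if tags == ["+N"]:
        return f"{stem}#"
    raise ValueError(f"unsupported analysis: {analysis!r}")


def e_insertion(form: str) -> str:
    """Insert e between {x, s, z}^ and a word-final s: 'fox^s#' -> 'fox^es#'."""
    return re.sub(r"([xsz])\^(?=s#)", r"\1^e", form)


def intermediate_to_surface(form: str) -> str:
    """Erase the boundary symbols once the spelling rules have applied."""
    return form.replace("^", "").replace("#", "")


def analysis_to_surface(analysis: str) -> str:
    """Cascade: lexical level -> intermediate form -> spelling rule -> surface."""
    return intermediate_to_surface(e_insertion(lexical_to_intermediate(analysis)))
```

With this cascade, "fox +N +p" becomes foxes and "cat +N +p" becomes cats. The rule fires only when its left context (x, s, or z before ^) is present.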


Other applications of FSTs

- ELIZA: https://en.wikipedia.org/wiki/ELIZA
  - Implemented using pattern matching, which can be viewed as an FST


ELIZA as an FST cascade

Human: You don't argue with me.
Computer: WHY DO YOU THINK I DON'T ARGUE WITH YOU

A simple rule:
1. Replace you with I and me with you:
   I don't argue with you.
2. Replace <...> with Why do you think <...>:
   Why do you think I don't argue with you.
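Each step of this rule is a rewrite over the whole sentence, so the cascade can be sketched as two functions applied in sequence. The Python sketch below assumes a two-entry pronoun table and a single question template. Real ELIZA scripts contain many more patterns. The function names are hypothetical.

```python
import re

SWAPS = {"you": "I", "me": "you"}  # hypothetical minimal pronoun table


def swap_pronouns(sentence: str) -> str:
    """Step 1: swap you -> I and me -> you in a single pass over the sentence."""
    return re.sub(r"\b(you|me)\b",
                  lambda m: SWAPS[m.group(1).lower()],
                  sentence, flags=re.IGNORECASE)


def wrap_question(sentence: str) -> str:
    """Step 2: replace <...> with 'Why do you think <...>'."""
    return "Why do you think " + sentence.rstrip(".!?")


def eliza(sentence: str) -> str:
    """Run the two-stage cascade and shout the reply, as in the transcript."""
    return wrap_question(swap_pronouns(sentence)).upper()
```

The swap happens in one pass so that the two replacements cannot interfere. A sequential version that first rewrote me to you and then you to I would turn every me into I.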


What about compounds?

- Compounds have hierarchical structure:
  - (((ice cream) cone) bakery), not (ice ((cream cone) bakery))
  - ((computer science) (graduate student)), not (computer ((science graduate) student))
- We need context-free grammars to capture this underlying structure
