Posted on 14-May-2018

Lecture 5: Morphology

Kai-Wei Chang, CS @ University of Virginia

kw@kwchang.net

Course webpage: http://kwchang.net/teaching/NLP16

6501 Natural Language Processing

This lecture

- What is the structure of words?
- Can we build an analyzer to model the structure of words?
- Finite-state automata and regular expressions

Words

- Finite-state methods are particularly useful in dealing with a lexicon
  - Compact representations of words

- Agenda
  - Some facts about words
  - Computational methods

A Turkish word

- How about English?

Example from Julia Hockenmaier, Intro to NLP

Longest word in English

- Longest word in Shakespeare’s works: Honorificabilitudinitatibus (27 letters)
- Longest non-technical word: Antidisestablishmentarianism (28 letters)
- Longest word in a major dictionary: Pneumonoultramicroscopicsilicovolcanoconiosis (45 letters)
- Longest word in literature: Lopadotemachoselachogaleokranioleipsano...pterygon (182 letters), an Ancient Greek transliteration
- Methionylthreonylthreonylglutaminylarginyl...isoleucine (189,819 letters): the chemical name of a protein

What is Morphology?

- The ways that words are built up from smaller meaningful units (morphemes)

- Two classes of morphemes
  - Stems: the core meaning-bearing units
  - Affixes: adhere to stems to change their meanings and grammatical functions
  - e.g., dis-grace-ful-ly
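As an illustration of splitting a word into a stem and affixes, here is a minimal Python sketch that segments dis-grace-ful-ly by peeling prefixes off the front and suffixes off the back. The affix and stem lists (`PREFIXES`, `SUFFIXES`, `STEMS`) and the function names are hypothetical toy data chosen for this example, not a real lexicon, and the sketch ignores spelling changes at morpheme boundaries.

```python
# Hypothetical toy lexicon, for illustration only.
PREFIXES = ["dis", "un", "mis"]
SUFFIXES = ["ly", "ful", "able", "al"]
STEMS = {"grace", "do", "nation", "understand"}


def stem_and_suffixes(word):
    """Return [stem, suffix, ...] if word = stem + suffixes, else None."""
    if word in STEMS:
        return [word]
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            rest = stem_and_suffixes(word[: -len(suffix)])
            if rest is not None:
                return rest + [suffix]
    return None


def segment(word):
    """Return [prefix, ..., stem, suffix, ...], or None if no split is found."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            rest = segment(word[len(prefix):])
            if rest is not None:  # otherwise backtrack and try the next option
                return [prefix] + rest
    return stem_and_suffixes(word)


print(segment("disgracefully"))  # ['dis', 'grace', 'ful', 'ly']
```

The recursion backtracks: for misunderstand, stripping un- from understand fails, so the sketch falls back to treating understand as the stem.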

Inflectional Morphology

Create different forms of the same word. Examples:
- Verbs: walk, walked, walks
- Nouns: book, books, book’s
- Personal pronouns: he, she, her, them, us

- Serves a grammatical/semantic purpose that is different from the original but is transparently related to the original

Derivational Morphology

Create different words from the same lemma:
- Nominalization:
  - V + -ation: e.g., computerization
  - V + -er: killer
- Negation:
  - Un-: undo, unseen, …
  - Mis-: mistake, misunderstand, …
- Adjectivization:
  - V + -able: doable
  - N + -al: national

What else?

- Compounding combines words into a new word:
  - cream, ice cream, ice cream cone, ice cream cone bakery

- Word formation is productive:
  - Google, Googler, to google, to misgoogle, to googlefy, googlification
  - Google Map, Google Book, …

Morphological parsing and generation

- Morphological parsing

- Morphological generation
  - What words can be generated from grace?

grace, graceful, gracefully, disgrace, ungrace, undisgraceful, undisgracefully
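Generation can be sketched as enumerating the paths through a small finite-state template. The template below, (un)?(dis)?grace(ful(ly)?)?, is a hypothetical one chosen to cover the list above; it is not from the slides, and like that list it over-generates (it also yields, e.g., undisgrace).

```python
from itertools import product

# Hypothetical template: (un)? (dis)? grace (ful (ly)?)?
OPTIONAL_PREFIXES = [["", "un"], ["", "dis"]]
STEM = "grace"
SUFFIX_CHAINS = ["", "ful", "fully"]  # -ly only attaches after -ful here


def generate():
    """Enumerate every word the template allows, one per path."""
    words = []
    for (p1, p2), suffix in product(product(*OPTIONAL_PREFIXES), SUFFIX_CHAINS):
        words.append(p1 + p2 + STEM + suffix)
    return words


for word in generate():
    print(word)
```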

Finite State Automata

- FSAs and regular expressions have the same expressive power.

- The FSA above accepts the language r/baa+!/ (b, followed by two or more a's, followed by !).
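In Python, the same language can be recognized with the standard `re` module. `re.fullmatch` requires the entire string to match, which mirrors an FSA that must be in an accept state after consuming all of its input; `SHEEP` and `is_sheep` are just illustrative names.

```python
import re

# b, an a, then one or more further a's, then !
SHEEP = re.compile(r"baa+!")


def is_sheep(s):
    # fullmatch anchors both ends: the whole string must be in the language.
    return SHEEP.fullmatch(s) is not None


for s in ["baa!", "baaaaa!", "ba!", "baa", "baa!!"]:
    print(s, is_sheep(s))
```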

Finite State Automata

- Terminology:
  - It has 5 states
  - Alphabet: {b, a, !}
  - Start state: q0
  - Accept state: q4
  - 5 transitions

- Are there other machines that correspond to the same language r/baa+!/? Yes.

An alphabet just means a finite set of symbols in the input.

An FSA can have many accept states.

Formal definition

- You can specify an FSA by enumerating the following things:
  - The set of states: Q
  - A finite alphabet: Σ
  - A start state
  - A set of accept/final states
  - A transition function that maps Q × Σ to Q
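The five components map directly onto a small data structure. This is a minimal Python sketch, assuming the r/baa+!/ machine has states q0 to q4 with start state q0 and accept state q4; `DFA` and `accepts` are illustrative names. One liberty: the formal definition makes the transition function total on Q × Σ, while the dictionary here is partial, and a missing entry is treated as a move to an implicit dead (rejecting) state.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: set     # Q
    alphabet: set   # Σ
    start: str      # start state
    accept: set     # accept/final states
    delta: dict     # transition function: (state, symbol) -> state

    def accepts(self, string):
        state = self.start
        for symbol in string:
            # A missing entry stands for a move to an implicit dead state.
            if (state, symbol) not in self.delta:
                return False
            state = self.delta[(state, symbol)]
        return state in self.accept


# The 5-state, 5-transition machine for r/baa+!/
sheep = DFA(
    states={"q0", "q1", "q2", "q3", "q4"},
    alphabet={"b", "a", "!"},
    start="q0",
    accept={"q4"},
    delta={
        ("q0", "b"): "q1",
        ("q1", "a"): "q2",
        ("q2", "a"): "q3",
        ("q3", "a"): "q3",  # loop: any number of further a's
        ("q3", "!"): "q4",
    },
)

print(sheep.accepts("baaa!"), sheep.accepts("ba!"))
```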

Example: dollars and cents

Yet another view – table representation

State   b   a     !   ε
0       1   -     -   -
1       -   2     -   -
2       -   2,3   -   -
3       -   -     4   -
4       -   -     -   -

If you’re in state 1 and you’re looking at an a, go to state 2.

Non-Deterministic FSA

- ε-transitions
- More than one possible next state
- Equivalent in expressive power to deterministic FSAs
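One way to run a non-deterministic machine without backtracking is to track the set of all states it could currently be in; the same idea underlies the subset construction that shows non-deterministic and deterministic FSAs are equivalent. The sketch below assumes the r/baa+!/ transition table from the table-representation slide (state 2 moves to 2 or 3 on a), encodes ε as the empty string, and uses hypothetical names. That table has no ε entries, so the closure step is a no-op here but is kept to show where ε-transitions fit.

```python
EPSILON = ""  # hypothetical encoding of ε as a table key

# state -> {symbol: set of possible next states}
TABLE = {
    0: {"b": {1}},
    1: {"a": {2}},
    2: {"a": {2, 3}},  # two possible next states: non-deterministic
    3: {"!": {4}},
    4: {},
}
ACCEPT = {4}


def epsilon_closure(states):
    """All states reachable from `states` using only ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in TABLE[q].get(EPSILON, set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def nfsa_accepts(string, start=0):
    # Follow every possible path at once by tracking a set of states.
    current = epsilon_closure({start})
    for symbol in string:
        moved = set()
        for q in current:
            moved |= TABLE[q].get(symbol, set())
        current = epsilon_closure(moved)
        if not current:
            return False
    return bool(current & ACCEPT)


print(nfsa_accepts("baaa!"), nfsa_accepts("ba!"))
```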

Regular expression

- Equivalent to FSAs
- Matching strings with regular expressions (e.g., Perl, Python, grep):
  - translating the regular expression into a machine (a table), and
  - passing the table and the string to an interpreter
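The two steps (translate, then interpret) can be sketched for a deliberately tiny regex dialect: literal characters, each optionally followed by `+`, with no escaping, grouping, alternation, or `*`. `compile_pattern` and `interpret` are hypothetical names; production engines such as Python's `re` compile to their own internal representations and execute them differently, so the sketch only cross-checks its results against `re.fullmatch`.

```python
import re


def compile_pattern(pattern):
    """Translate a restricted regex (literals, each optionally followed by '+')
    into a transition table: state -> {symbol: set of next states}."""
    table = {0: {}}
    state = 0
    i = 0
    while i < len(pattern):
        symbol = pattern[i]
        nxt = state + 1
        table[nxt] = {}
        table[state].setdefault(symbol, set()).add(nxt)
        if i + 1 < len(pattern) and pattern[i + 1] == "+":
            # x+ = one x, then a self-loop for any further x's
            table[nxt].setdefault(symbol, set()).add(nxt)
            i += 1
        state = nxt
        i += 1
    return table, {state}  # the last state is the only accept state


def interpret(table, accept, string):
    """Run the table on the string, tracking every reachable state."""
    current = {0}
    for symbol in string:
        current = {r for q in current for r in table[q].get(symbol, set())}
        if not current:
            return False
    return bool(current & accept)


table, accept = compile_pattern("baa+!")
for s in ["baa!", "baaaa!", "ba!"]:
    # Cross-check against Python's own regex engine.
    assert interpret(table, accept, s) == (re.fullmatch("baa+!", s) is not None)
```

A pattern such as a+a produces a genuinely non-deterministic table (state 1 moves to 1 or 2 on a), which is why the interpreter tracks a set of states.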

Model morphology with FSA

- Regular singular nouns are ok
- Regular plural nouns have an -s on the end
- Irregulars are ok as is

Now plug in the words
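The slides' FSA figures are not part of this transcript, so here is a hedged sketch of plugging in the words: a toy lexicon (the word lists are hypothetical) is compiled into a character-level FSA in which irregular singular and plural forms are accepted as they are, each regular noun is accepted bare, and a single shared arc labelled s leads from every regular stem to a plural accept state. Spelling changes such as fox → foxes are deliberately out of scope; `build_noun_fsa` and `accepts` are illustrative names.

```python
def build_noun_fsa(reg_nouns, irreg_sg, irreg_pl):
    """Compile a toy noun lexicon into a deterministic character-level FSA."""
    delta = {}       # transition function: (state, char) -> state
    accept = set()
    next_state = 1   # state 0 is the start state

    def add_path(word):
        # Follow or create one arc per character; return the state reached.
        nonlocal next_state
        state = 0
        for ch in word:
            if (state, ch) not in delta:
                delta[(state, ch)] = next_state
                next_state += 1
            state = delta[(state, ch)]
        return state

    # Irregular singulars and plurals are accepted as they are.
    for word in irreg_sg + irreg_pl:
        accept.add(add_path(word))

    # Lay down every regular stem first, then add the -s arcs, so no later
    # path can run through the shared plural state.
    reg_ends = [add_path(word) for word in reg_nouns]
    plural = next_state
    accept.add(plural)
    for word, end in zip(reg_nouns, reg_ends):
        accept.add(end)  # regular singular noun
        if (end, "s") in delta:
            raise ValueError(f"-s arc after {word!r} clashes with another entry")
        delta[(end, "s")] = plural  # regular plural: stem + s
    return delta, accept


def accepts(fsa, word):
    delta, accept = fsa
    state = 0
    for ch in word:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in accept


nouns = build_noun_fsa(["cat", "dog", "aardvark"], ["goose", "mouse"], ["geese", "mice"])
print([w for w in ["cats", "goose", "geese", "gooses", "mices"] if accepts(nouns, w)])
```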

Derivational Rules

From recognition to parsing

- Now we can use these machines to recognize strings.

- Can we use the machines to assign a structure to a string? (parsing)

- Example: from “cats” to “cat +N +p”

Transitions

- c:c reads a c and writes a c
- ε:+N reads nothing and writes +N

Path for “cats”: c:c → a:a → t:t → ε:+N → s:+p
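A finite-state transducer can be represented as a list of arcs labelled input:output. The sketch below assumes an arc-list encoding with "" standing for ε and uses hypothetical names (`ARCS`, `FINAL`, `transduce`); it encodes exactly the c:c a:a t:t ε:+N s:+p path and searches depth-first for accepting paths. It analyzes only "cats"; a real analyzer would share such arcs across a whole lexicon.

```python
# Each arc: (source state, input symbol, output string, target state).
# An empty input symbol "" plays the role of ε (read nothing).
ARCS = [
    (0, "c", "c", 1),
    (1, "a", "a", 2),
    (2, "t", "t", 3),
    (3, "", " +N", 4),
    (4, "s", " +p", 5),
]
FINAL = {5}


def transduce(arcs, finals, word, start=0):
    """Return the output of every accepting path for `word` (depth-first).
    Assumes the machine has no cycles made only of ε-input arcs."""
    results = []
    stack = [(start, 0, "")]  # (state, input position, output so far)
    while stack:
        state, pos, out = stack.pop()
        if pos == len(word) and state in finals:
            results.append(out)
        for src, inp, outp, dst in arcs:
            if src != state:
                continue
            if inp == "":  # ε-input arc: write without reading
                stack.append((dst, pos, out + outp))
            elif pos < len(word) and word[pos] == inp:
                stack.append((dst, pos + 1, out + outp))
    return results


print(transduce(ARCS, FINAL, "cats"))  # ['cat +N +p']
```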

Challenge: Ambiguity

- books: book +N +p or book +V +z (3rd person)
- Non-deterministic FSA: allows multiple paths through a machine that lead to the same accept state

- Bias the search (or learn) so that only a few likely paths are explored
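Handling ambiguity means collecting every accepting path rather than stopping at the first. The sketch below uses a hypothetical toy lexicon in which book can be a noun or a verb, together with the two suffix analyses named above (+N +p and +V +z); bare singular and base forms are omitted. It returns all analyses; choosing between them would need the biased or learned search the slide mentions, which is not implemented here.

```python
# Hypothetical toy lexicon: each stem lists the categories it can belong to.
LEXICON = {"book": {"N", "V"}, "cat": {"N"}, "walk": {"V"}}

# Suffix arcs: (category, surface suffix, emitted tags).
SUFFIX_ARCS = [
    ("N", "s", "+N +p"),  # plural noun
    ("V", "s", "+V +z"),  # 3rd-person verb
]


def analyses(word):
    """Collect every analysis, i.e. every accepting path, not just the first."""
    results = []
    for stem, categories in LEXICON.items():
        if not word.startswith(stem):
            continue
        rest = word[len(stem):]
        for category, suffix, tags in SUFFIX_ARCS:
            if category in categories and rest == suffix:
                results.append(f"{stem} {tags}")
    return results


print(analyses("books"))  # ['book +N +p', 'book +V +z']
```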

Challenge: Spelling rules

- The underlying morphemes (e.g., plural -s) can have different surface realizations (-s, -es)
  - cat+s = cats
  - fox+s = foxes
  - make+ing = making

- How can we model it?

Intermediate representation

Overall Scheme

- One FST that has explicit information about the lexicon
  - Lexical level to intermediate forms

- A large set of machines that capture spelling rules
  - Intermediate forms to surface
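The first stage of the scheme can be sketched as a function from lexical forms to intermediate forms, using ^ for a morpheme boundary and # for the end of the word, as in fox^s#. It is a stand-in for a real lexicon transducer: `lexical_to_intermediate` is a hypothetical name, the irregular-plural table is hypothetical toy data, and mapping a bare +N tag to a singular form is an assumption.

```python
# Hypothetical lexicon entries: irregular plurals are listed explicitly.
IRREGULAR_PLURALS = {"goose": "geese", "mouse": "mice"}


def lexical_to_intermediate(analysis):
    """Map a lexical form such as 'fox +N +p' to an intermediate form such as
    'fox^s#' (^ marks a morpheme boundary, # marks the end of the word)."""
    stem, *tags = analysis.split()
    if tags == ["+N", "+p"]:
        if stem in IRREGULAR_PLURALS:  # the lexicon supplies the whole form
            return IRREGULAR_PLURALS[stem] + "#"
        return stem + "^s#"
    if tags == ["+N"]:  # assumption: a bare +N tag means singular
        return stem + "#"
    raise ValueError(f"unsupported tags: {tags}")


for form in ["fox +N +p", "cat +N +p", "goose +N +p", "cat +N"]:
    print(form, "->", lexical_to_intermediate(form))
```

Spelling rules such as e-insertion are left to the second stage, which operates on the intermediate forms.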

Lexical to intermediate level

Intermediate level to surface

- The e-insertion (“add e”) rule for -s
- Example: fox^s# ↔ foxes#
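The e-insertion rule (insert e between a stem ending in x, s, or z and the suffix s, often written ε → e / {x, s, z} ^ __ s #) can be approximated with a regular-expression rewrite on intermediate forms. This is a sketch, not a compiled transducer; `intermediate_to_surface` is a hypothetical name, and, unlike the slide's foxes#, the function also erases the # boundary.

```python
import re


def intermediate_to_surface(form):
    # e-insertion: a stem ending in x, s, or z, followed by ^s#,
    # gets an e before the boundary, e.g. "fox^s#" -> "foxe^s#".
    form = re.sub(r"([xsz])\^(?=s#)", r"\1e^", form)
    # Erase the boundary symbols to get the surface string.
    return form.replace("^", "").replace("#", "")


for form in ["fox^s#", "cat^s#", "buzz^s#"]:
    print(form, "->", intermediate_to_surface(form))
```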

Other applications of FSTs

- ELIZA: https://en.wikipedia.org/wiki/ELIZA
  - Implemented using pattern matching, which can be modeled as an FST

ELIZA as an FST cascade

H: You don't argue with me.
Computer: WHY DO YOU THINK I DON'T ARGUE WITH YOU

A simple rule:
1. Replace you with I and me with you:
   I don't argue with you.
2. Replace <...> with Why do you think <...>:
   Why do you think I don't argue with you.
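The two rules can be imitated with regular-expression substitutions chained into a cascade. This is a hedged sketch: `swap_pronouns`, `why_template`, and `eliza` are hypothetical names, and the final upper-casing only reproduces the transcript's style. Stage 1 swaps the pronouns in a single pass so the result does not depend on the order of the two replacements (doing me → you before you → I would wrongly turn me into I).

```python
import re

# Stage 1 table: each pronoun and its replacement.
SWAPS = {"you": "I", "me": "you"}


def swap_pronouns(sentence):
    # One pass over the sentence, so a replaced word is never rewritten again.
    return re.sub(
        r"\b(you|me)\b",
        lambda m: SWAPS[m.group(1).lower()],
        sentence,
        flags=re.IGNORECASE,
    )


def why_template(sentence):
    # Stage 2: wrap whatever is left in a question template.
    return "Why do you think " + sentence


def eliza(sentence):
    reply = why_template(swap_pronouns(sentence))
    return reply.rstrip(".!?").upper()


print(eliza("You don't argue with me."))
```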

What about compounds?

- Compounds have hierarchical structure:
  - (((ice cream) cone) bakery), not (ice ((cream cone) bakery))
  - ((computer science) (graduate student)), not (computer ((science graduate) student))

- We need context-free grammars to capture this underlying structure
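A flat string does not say which bracketing is intended, and the number of candidates grows quickly (it follows the Catalan numbers: 5 binary bracketings for 4 words). The sketch below simply enumerates them with memoized recursion over spans; it is not a context-free parser, but it shows the space of structures that a grammar (for example, one parsed with CKY) would have to choose from. `bracketings` is a hypothetical name.

```python
from functools import lru_cache


def bracketings(words):
    """Every binary bracketing of a compound, i.e. every binary tree over it."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def span(i, j):
        # All trees covering words[i:j], returned as a tuple (safe to cache).
        if j - i == 1:
            return (words[i],)
        trees = []
        for k in range(i + 1, j):  # split point between left and right parts
            for left in span(i, k):
                for right in span(k, j):
                    trees.append(f"({left} {right})")
        return tuple(trees)

    return list(span(0, len(words)))


for tree in bracketings("ice cream cone bakery".split()):
    print(tree)
```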
