Morphology What is morphology? Finite State Transducers Two Level Morphology

Posted on 19-Dec-2015

Morphology

What is morphology?

Finite State Transducers

Two Level Morphology

What is morphology?

• Decomposition of words into meaningful units:

anti + dis + establish + ment + arian + ism

• Interacts with syntax (categories and word order): [establish] = verb, [establish]+ment = noun

• Interacts with phonology: divine/divinity, obscene/obscenity

• Interacts with semantics: boy/boys, Peter/Peterchen (German diminutive, ‘little Peter’)

Phonological string
→ morphological analyzer
→ dictionary lookup
→ syntactic analyzer
→ lexical-semantic analysis
→ discourse processing

Why store words as morphemes rather than storing all morphological combinations as whole words?

What does the morphological analyzer have to output?

The what and the how:

• We need an efficient and effective algorithm to decompose categories into, or build categories from, their component morphemes. What this algorithm looks like depends on the problems it has to solve, which in turn depends on the representations it computes.

• Given a stem/lemma (e.g. ‘jump’), add material to change the category or grammatical properties of the word: ‘jumped’, ‘jumpable’.

• Order of composition matters:

ride/riding
ennoble/ennobling/*nobling (en-: Adj → V, then -ing: V → V+ing)
trance/*trancing, entrance/entrancing
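The effect of composition order can be sketched as a check on each affix's input and output category. This is a minimal illustration, not a morphological analyzer: the affix table, the toy lexicon, and the function name `derive` are hypothetical, letting en- also attach to nouns (for entrance) is an assumption, and spelling (noble vs. ennoble) is ignored so that only categories are tracked.

```python
from typing import Optional

# Hypothetical affix table: affix -> (accepted input categories, output category).
AFFIXES = {
    "en-": ({"Adj", "N"}, "V"),  # noble (Adj) -> ennoble; trance (N) -> entrance
    "-ing": ({"V"}, "V"),        # V -> V+ing, as on the slide
}

# Hypothetical toy lexicon: stem -> category.
LEXICON = {"ride": "V", "noble": "Adj", "trance": "N"}


def derive(stem: str, affixes: list) -> Optional[str]:
    """Apply affixes in the given order and return the resulting category,
    or None if some affix cannot attach to the category built so far."""
    category = LEXICON[stem]
    for affix in affixes:
        accepted, result = AFFIXES[affix]
        if category not in accepted:
            return None  # e.g. *nobling: -ing needs a verb
        category = result
    return category
```

For example, `derive("noble", ["-ing"])` fails while `derive("noble", ["en-", "-ing"])` succeeds, mirroring *nobling vs. ennobling.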

CONCATENATIVE MORPHOLOGICAL PROCESSES:

COMPOUNDING: firefighter (fire + fighter)

PREFIXATION: un + well

INFIXATION (Tagalog): fikas ‘strong’ → fumikas ‘be strong’ (infix -um-)

SUFFIXATION: kick + er

CIRCUMFIXATION (German): ge + sag + t → gesagt ‘said’ (past prefix ge- + stem ‘say’ + past suffix -t)
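All five processes above are concatenative: each places material before, after, inside, or around a stem. A minimal sketch in Python; the function names are hypothetical, and placing the infix before the stem's first vowel is a simplifying assumption that happens to fit the -um- example.

```python
VOWELS = "aeiou"


def compound(left: str, right: str) -> str:
    return left + right


def prefix(pre: str, stem: str) -> str:
    return pre + stem


def suffix(stem: str, suf: str) -> str:
    return stem + suf


def infix(stem: str, inf: str) -> str:
    """Insert inf before the first vowel of stem (simplified placement)."""
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[:i] + inf + stem[i:]
    return stem + inf  # fallback for vowelless input


def circumfix(pre: str, stem: str, post: str) -> str:
    """A circumfix is a prefix and a suffix that attach together."""
    return pre + stem + post
```

Usage: `infix("fikas", "um")` gives "fumikas" and `circumfix("ge", "sag", "t")` gives "gesagt".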

Inflectional Morphology

• Non-category-changing; required by syntax

• Agreement (person/number): je parle ‘I speak’, nous parlons ‘we speak’, ils parlent ‘they speak’

• Gender: la petite ‘the little one (fem.)’, le petit ‘the little one (masc.)’, le squelette ‘the skeleton’ (masc.)

Derivational Morphology

• Changes category; not required by syntax

Deverbal nominals:

-er: bak+er, catch+er
-tion: destroy/destruction, as in the Romans’ destruction of the city

-er = agent of the action: the catcher of the ball. But in John’s catcher of the ball, ‘John’ ≠ the one who caught.

Regular vs Irregular

jump/jumped, hit/hit, bring/brought, sing/sang

Productive vs. non-productive

Productive: adore/adorable, kick/kickable, fax/faxable
Non-productive: produce/production, destroy/destruction, graft/*graftuction; bring/brought

Regular (English) Verbs

Morphological Form Classes Regularly Inflected Verbs

Stem walk merge try map

-s form walks merges tries maps

-ing form walking merging trying mapping

Past form or –ed participle walked merged tried mapped

Irregular (English) Verbs

Morphological Form Classes Irregularly Inflected Verbs

Stem eat catch cut

-s form eats catches cuts

-ing form eating catching cutting

Past form ate caught cut

-ed participle eaten caught cut

“To love” in Spanish

• Productive and rule-governed:

fax → fax+er

??? Crudoy cruduction

• Category sensitivity: breakable/*manable, sensitivity/*hittivity

• Semantic sensitivity: un + well, un + happy, but *un + ill, *un + sad

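Lebensversicherungsgesellschaftsangestellter

leben+s + versicherung+s + gesellschaft+s + angestellter
life + insurance + company + employee (‘life insurance company employee’; the linking -s- is historically a genitive/possessive marker)

Turkish:

Turkish verbs have 40,000 forms.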

Store morphemes or words?

Non-concatenative Morphology

• Templatic morphology (Semitic languages): lmd ‘learn’, lamad ‘he studied’, limed ‘he taught’, lumad ‘he was taught’
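Templatic morphology can be sketched as interdigitation: the consonantal root fills the C slots of a template and a vowel melody fills the V slots. This is a simplified illustration; the CVCVC template, the vowel melodies ("aa", "ie", "ua") read off the example forms, and the name `interdigitate` are assumptions, and real Hebrew morphology involves more than this.

```python
def interdigitate(root: str, template: str, vowels: str) -> str:
    """Fill C slots with root consonants and V slots with the vowel melody,
    both consumed left to right; other template characters are copied."""
    consonants = iter(root)
    melody = iter(vowels)
    out = []
    for slot in template:
        if slot == "C":
            out.append(next(consonants))
        elif slot == "V":
            out.append(next(melody))
        else:
            out.append(slot)
    return "".join(out)
```

Usage: the same root lmd with different melodies yields `interdigitate("lmd", "CVCVC", "aa")` = "lamad" and `interdigitate("lmd", "CVCVC", "ua")` = "lumad". The root is never a contiguous substring of the output, which is why plain concatenation cannot model it.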

Concatenation: beads on a string

Agglutinative (concatenative) languages are well behaved for FSAs, as long as we don’t include phonological or spelling changes.

Verb lexicon (surface form ← stem):
jump+ed ← jump
kiss+ed ← kiss
stream+ed ← stream
*hopp+ed ← hop ???

FSA: q0 --verb--> q1 --ed--> q2
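The verb FSA above can be sketched as a transition table over morpheme classes. This is a minimal recognizer under assumptions: the stem set is a toy lexicon, and taking q2 as the only final state is a guess at the diagram.

```python
# Toy verb lexicon (assumption).
VERB_STEMS = {"jump", "kiss", "stream", "hop"}

# delta: (state, symbol class) -> next state
TRANSITIONS = {("q0", "verb"): "q1", ("q1", "ed"): "q2"}
FINAL = {"q2"}


def symbol_class(morpheme: str) -> str:
    """Map a lexicon stem to its class; affixes stand for themselves."""
    return "verb" if morpheme in VERB_STEMS else morpheme


def recognize(morphemes: list) -> bool:
    state = "q0"
    for m in morphemes:
        state = TRANSITIONS.get((state, symbol_class(m)))
        if state is None:
            return False  # no transition: reject
    return state in FINAL
```

The recognizer accepts ["jump", "ed"] but rejects ["hopp", "ed"], and plain concatenation of ["hop", "ed"] spells "hoped", not "hopped". Handling the doubled consonant requires the spelling machinery discussed later.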

FSA for adjectives: q0 --un--> q1 --adj-root--> q2 --(-er, -est, -ly)--> q3

The lexicon stores the lemmas and divides them into adjective classes:

really/clearly, but *bigly/*redly

Morphotactics:

The state sequence indicates the order of morpheme composition, e.g. comparative or adverb formation is by suffixation.
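Adjective classes can be sketched by letting the root chosen at q1→q2 determine which suffixes are allowed out of q2. This is an illustrative recognizer, not the slide's exact machine: the class memberships (which roots take un- and -ly) are hypothetical, and remembering the root is equivalent to splitting q2 into one state per class.

```python
# Hypothetical adjective classes: root -> suffixes it accepts.
ROOT_SUFFIXES = {
    "clear": {"er", "est", "ly"},
    "happy": {"er", "est", "ly"},
    "big": {"er", "est"},
    "red": {"er", "est"},
}
UN_OK = {"clear", "happy"}  # roots that accept un- (assumption)
FINALS = {"q2", "q3"}


def recognize_adj(morphemes: list) -> bool:
    state, root = "q0", None
    for m in morphemes:
        if state == "q0" and m == "un":
            state = "q1"
        elif state in ("q0", "q1") and m in ROOT_SUFFIXES:
            # Going q0 -> q2 directly plays the role of the optional un- arc.
            if state == "q1" and m not in UN_OK:
                return False
            state, root = "q2", m
        elif state == "q2" and m in ROOT_SUFFIXES[root]:
            state = "q3"
        else:
            return False
    return state in FINALS
```

Usage: ["clear", "ly"] and ["un", "happy", "er"] are accepted, while ["big", "ly"] (*bigly) and ["un", "big"] are rejected.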

Pieces of a Morphological Analyzer

Lexicon

• Arranged as a trie (letter strings that share a prefix share a path), e.g. d-o-g and d-o-n-k-e-y

• Classed by part-of-speech category (noun, verb), by morphotactics (which other affixes can precede or follow), or by orthographic considerations.

Orthography

• Spelling rules handle phonological or spelling variation in the orthographic form of a morpheme:

try/trying/tries, cringe/cringing/cringes

FSA for Inflectional Morphology: English Nouns

FSA for Inflectional Morphology: English Verbs

FSA for Derivational Morphology: Adjectival Formation

More Complex Derivational Morphology

Using FSAs for Recognition: English Nouns and their Inflection

Orthographic

• Want an association between a morpheme and its semantic function

• Want an association between allographs or allophones of the same phoneme

Allographs: city/cities, bake/baking, divine/divinity, try/tried

Two-Level Morphology:
- A finite-state transducer moves a pair of read heads simultaneously along two strings and maps one string into the other, making sure that the steps are aligned and the derivation proceeds in parallel.
- Parallelism is important to limit the growth of transducers.
- Total machine = sum of the individual machines.
- Parallelism is important for efficiency.
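Moving two read heads in lockstep can be sketched as checking, position by position, that each lexical/surface symbol pair is a feasible pair. This is a simplification under assumptions: the feasible-pair set (default pairs for letters plus o:e for goose:geese) is hypothetical, and real two-level rules also constrain the *contexts* in which a non-default pair may occur, which this check ignores.

```python
import string

# Default pairs x:x for every letter, plus one listed irregular pair (assumption).
DEFAULT_PAIRS = {(c, c) for c in string.ascii_lowercase}
FEASIBLE = DEFAULT_PAIRS | {("o", "e")}


def aligned(lexical: str, surface: str) -> bool:
    """Advance both heads one symbol at a time; every step must use a feasible pair."""
    if len(lexical) != len(surface):
        return False
    return all((lex, surf) in FEASIBLE for lex, surf in zip(lexical, surface))
```

Usage: `aligned("goose", "geese")` succeeds because each step is either a default pair or o:e, while `aligned("cat", "cot")` fails because a:o is not feasible.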

Finite State Transducers (FSTs): the Big Idea

We need to relate the lexical level, which gives us the morphological analysis (+plural, +able), to the surface level, which keeps track of phonological or graphological (spelling) changes.

Parsing vs. recognition

• An FSA can give you the string composition of a morphological sequence and can tell you whether a given morphological string is or is not in the language. It recognizes the string.

• An FST parses the string: it tells you the morphological structure associated with the string. Other instances of parsing?

Formal definition

• An FST defines a relation between two sets of strings, i.e. a set of pairs of strings.

• It contains at least a lexical level, which is a concatenation of morphemes, and a surface level, which shows the correct spelling of each morpheme in a given context.

e.g. noun (instantiated from the lexicon) + plural: cat + pl → cat^s → cats; sheep + pl → sheep (plural realized as ε)

$Q$ = finite set of states $q_0, \ldots, q_n$

$\Sigma$ = finite alphabet of complex symbols (feasible pairs) $i{:}o$, with $i$ from the input alphabet $I$ and $o$ from the output alphabet $O$, so $\Sigma \subseteq I \times O$

$q_0$ = the start state

$F \subseteq Q$ = set of final states

$\delta(q, i{:}o)$ = the transition function (or matrix) between states: it takes a state from $Q$ and a complex symbol $i{:}o$ from $\Sigma$ and returns a new state.

Feasible pair: a correspondence between a symbol on one tape and a symbol on the other tape, e.g. can + [pl:^s]
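The definition can be sketched directly: states, a transition relation labeled with i:o pairs, a start state, and final states. This is a minimal nondeterministic FST in Python, not a production toolkit; the class name, the symbol "+pl", and the toy noun machine (default pairs for letters plus pl:^s) are assumptions. Indexing δ by (state, input) and storing (output, next state) options is equivalent to δ(q, i:o) for transduction in the lexical-to-surface direction.

```python
import string


class FST:
    """Nondeterministic FST over symbol lists; delta is indexed by (state, input)."""

    def __init__(self, transitions, start, finals):
        self.delta = {}
        for state, inp, out, nxt in transitions:
            self.delta.setdefault((state, inp), []).append((out, nxt))
        self.start = start
        self.finals = set(finals)

    def transduce(self, symbols):
        """Return every output string paired with the input symbols."""
        results = []

        def walk(state, pos, out):
            if pos == len(symbols):
                if state in self.finals:
                    results.append("".join(out))
                return
            for o, nxt in self.delta.get((state, symbols[pos]), []):
                walk(nxt, pos + 1, out + [o])

        walk(self.start, 0, [])
        return results


# Toy noun machine: default pairs x:x loop on q0; the pair +pl:^s moves to q1.
noun_fst = FST(
    [("q0", c, c, "q0") for c in string.ascii_lowercase] + [("q0", "+pl", "^s", "q1")],
    start="q0",
    finals={"q0", "q1"},
)
```

Usage: `noun_fst.transduce(["c", "a", "t", "+pl"])` returns ["cat^s"], the intermediate form that spelling rules then map to the surface.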

• Default pair: the upper tape is the same as the lower tape (same input as output): c a t pl:^s is shorthand for c:c a:a t:t pl:^s

• Feasible pairs are either stated in the lexicon, if irregular: g:g o:e o:e s:s e:e (goose:geese)

or specified by an automaton that stipulates the correspondence in a rule-governed way, if the relation is regular. If regular, they are indicated as default pairs and usually represented by one symbol.

• FSTs are closed under:
 - inversion: switches the input and output labels
 - composition: runs two transducers one after the other, feeding the output of the first into the second
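Inversion and composition can both be sketched as operations on transition sets. This assumes letter-to-letter transducers (exactly one input and one output symbol per transition, no ε), which keeps composition a simple product construction; the tuple representation and function names are hypothetical.

```python
# A transducer is (transitions, start, finals); transitions are (q, in, out, next).

def invert(fst):
    """Swap input and output labels on every transition."""
    transitions, start, finals = fst
    return ({(q, o, i, r) for (q, i, o, r) in transitions}, start, finals)


def compose(a, b):
    """Product construction: a's output symbol must match b's input symbol."""
    ta, sa, fa = a
    tb, sb, fb = b
    transitions = {
        ((qa, qb), i, o2, (ra, rb))
        for (qa, i, o1, ra) in ta
        for (qb, i2, o2, rb) in tb
        if o1 == i2
    }
    finals = {(x, y) for x in fa for y in fb}
    return (transitions, (sa, sb), finals)


def transduce(fst, s):
    """All outputs for input string s, one character per step."""
    transitions, start, finals = fst
    frontier = {(start, "")}
    for ch in s:
        frontier = {
            (r, out + o)
            for (state, out) in frontier
            for (q, i, o, r) in transitions
            if q == state and i == ch
        }
    return {out for (state, out) in frontier if state in finals}


# Example machines (assumption): t1 rewrites a -> b, t2 rewrites b -> c.
t1 = ({("s", x, "b" if x == "a" else x, "s") for x in "abc"}, "s", {"s"})
t2 = ({("s", x, "c" if x == "b" else x, "s") for x in "abc"}, "s", {"s"})
```

Usage: `transduce(compose(t1, t2), "abc")` gives {"ccc"}, the same as running t1 and then t2. Inverting t1 makes it nondeterministic: an input b can map back to either a or b.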

Trie: in the lexicon, entries are arranged one letter at a time, with the class at the end. This allows parallel search as long as the letters match, e.g. m-e-t-a-l <N> and m-e-t-a <root>:

metal, meta-language
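The trie lookup can be sketched as a single walk along the input that reports every stored entry it passes. This is a minimal illustration: the node layout and the function names `insert` and `prefixes` are hypothetical, and the three-entry lexicon is a toy.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # letter -> TrieNode
        self.classes = []   # classes of lexicon entries that end at this node


def insert(root: TrieNode, word: str, cls: str) -> None:
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.classes.append(cls)


def prefixes(root: TrieNode, text: str) -> list:
    """Walk the trie along text and collect every entry found on the way."""
    found, node = [], root
    for i, ch in enumerate(text):
        node = node.children.get(ch)
        if node is None:
            break  # no entry continues with this letter
        for cls in node.classes:
            found.append((text[: i + 1], cls))
    return found


lexicon = TrieNode()
for entry, cls in [("meta", "root"), ("metal", "N"), ("language", "N")]:
    insert(lexicon, entry, cls)
```

Usage: `prefixes(lexicon, "metalanguage")` returns both candidates, meta <root> and metal <N>, from one pass. Only meta + language can then complete the word.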

Kimmo-Based Morphological Parsing

• Two-level morphology: lexical level + surface level (Koskenniemi 1983)

• Finite-state transducers (FST): input-output pair

Four-Fold View of FSTs

• As a recognizer

• As a generator

• As a translator

• As a set relater

Terminology for Kimmo

• Upper = lexical tape
• Lower = surface tape
• Characters correspond to pairs, written a:b
• If a = b, write a as shorthand
• Two-level lexical entries
• # = word boundary
• ^ = morpheme boundary
• Other = “any feasible pair that is not in this transducer”

Nominal Inflection FST

Lexical and Intermediate Tapes

Spelling Rules

Name | Rule description | Example
Consonant doubling | 1-letter consonant doubled before -ing/-ed | beg/begging
E-deletion | silent e dropped before -ing and -ed | make/making
E-insertion | e added after s, z, x, ch, sh before -s | watch/watches
Y-replacement | -y changes to -ie before -s, -i before -ed | try/tries
K-insertion | verbs ending with vowel + -c add -k | panic/panicked
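The five spelling rules in the table can be sketched as ordered checks applied when a suffix is attached. This uses regular expressions as a stand-in for the transducers, and the conditions are simplifications: consonant doubling is restricted to one-syllable stems ending in a single vowel plus consonant (real English doubling also depends on stress), and `attach` is a hypothetical name.

```python
import re


def attach(stem: str, suffix: str) -> str:
    """Concatenate stem + suffix, applying simplified English spelling rules."""
    if suffix == "s" and re.search(r"(s|z|x|ch|sh)$", stem):
        return stem + "es"                              # E-insertion: watch -> watches
    if suffix in ("s", "ed") and re.search(r"[^aeiou]y$", stem):
        return stem[:-1] + ("ies" if suffix == "s" else "ied")  # Y-replacement
    if suffix in ("ed", "ing"):
        if re.search(r"[aeiou]c$", stem):
            return stem + "k" + suffix                  # K-insertion: panic -> panicked
        if re.search(r"[^e]e$", stem):
            return stem[:-1] + suffix                   # E-deletion: make -> making
        if re.fullmatch(r"[^aeiou]*[aeiou][^aeiouwxy]", stem):
            return stem + stem[-1] + suffix             # Doubling: beg -> begging
    return stem + suffix
```

Usage: `attach("try", "s")` gives "tries", `attach("beg", "ing")` gives "begging", and `attach("walk", "ed")` falls through to plain concatenation.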

Notation

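$$\varepsilon \rightarrow e \;/\; \{x, s, z\}\,\hat{}\;\_\_\;s\,\#$$

(insert e after x, s, or z and a morpheme boundary ^, before a word-final s; # = word boundary)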
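The E-insertion rule in the notation above can be sketched on the intermediate tape, where ^ marks a morpheme boundary and # a word boundary. This uses a regular-expression substitution in place of the intermediate-to-surface transducer; the function name is hypothetical.

```python
import re


def e_insertion(intermediate: str) -> str:
    """Insert e in the context {x,s,z} ^ __ s #, then erase boundary symbols."""
    with_e = re.sub(r"([xsz])\^(?=s#)", r"\1^e", intermediate)
    return with_e.replace("^", "").replace("#", "")
```

Usage: "fox^s#" becomes "foxes", while "cat^s#" only loses its boundary symbols and becomes "cats".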

Intermediate-to-Surface Transducer

Two-Level Morphology

Sample Run

FSTs and ambiguity

Parse Example 1: unionizable

Parse Example 2: assess

What to do about Global Ambiguity?

• Accept first successful structure

• Run parser through all possible paths

• Bias the search in some manner
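The three strategies can be illustrated on unionizable by enumerating every segmentation into known morphemes. This is a sketch under assumptions: the morpheme inventory is a toy, "iz" stands in for ize after E-deletion, and preferring the parse with the fewest morphemes is only one hypothetical way to bias the search.

```python
# Toy morpheme inventory (assumption); "iz" = ize with its final e deleted.
MORPHEMES = {"un", "union", "ion", "iz", "able"}


def all_parses(word: str, morphemes=MORPHEMES) -> list:
    """Run through all paths: every split of word into known morphemes."""
    if word == "":
        return [[]]
    parses = []
    for end in range(1, len(word) + 1):
        head = word[:end]
        if head in morphemes:
            for rest in all_parses(word[end:], morphemes):
                parses.append([head] + rest)
    return parses


parses = all_parses("unionizable")
first = parses[0]                    # accept the first successful structure
biased = min(parses, key=len)        # bias: prefer fewer morphemes
```

Here `parses` holds both un+ion+iz+able and union+iz+able. Accepting the first successful structure picks the former, while the fewest-morphemes bias picks the latter.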

Latin nominative deletion (from Sproat):

Stem | Genitive | Nominative | Gloss
leg | legis | legs | law
lit | litis | lis | strife
fraud | fraudis | fraus | deceit
front | frontis | frons | brow
dent | dentis | dens | tooth
sort | sortis | sors | lot

$$\{t, d\} \rightarrow \emptyset \;/\; [+\text{son}]\;\_\_\;s$$

([+son] = sonorants: vowels, nasals, and liquids, as in sors)
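The t/d deletion rule above can be sketched as a regular-expression substitution applied before the nominative -s is added. The sonorant set (vowels plus l, r, m, n) is an assumption chosen to cover the table, and `nominative` is a hypothetical name.

```python
import re

SONORANTS = "aeioulrmn"  # vowels, liquids, nasals (assumption)


def nominative(stem: str) -> str:
    """Delete a stem-final t/d after a sonorant, then add -s."""
    return re.sub(f"(?<=[{SONORANTS}])[td]$", "", stem) + "s"
```

Usage: front → frons, fraud → fraus, sort → sors, while leg → legs is untouched because g is not t/d.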

Some Limitations

Rule ordering:

Finnish:"T-deletion"

t ------> 0 V______V ( simplified)

naura( to laugh) + ten ( 2nd infinitive)nauraen

i ------> j/ V________V

talo + i + tenhouse pl Gen

talo + i +entalo + j +entalojen
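Why ordering matters here can be sketched by applying the two rules as sequential rewrites: t-deletion creates the V_V context that i → j needs (a feeding relation). This is a simplified illustration; the vowel set, the removal of + boundaries before the rules apply, and the function names are assumptions.

```python
import re

V = "aeiouyäö"  # Finnish vowel letters


def t_deletion(s: str) -> str:
    return re.sub(f"(?<=[{V}])t(?=[{V}])", "", s)


def i_to_j(s: str) -> str:
    return re.sub(f"(?<=[{V}])i(?=[{V}])", "j", s)


def derive(form: str, rules) -> str:
    """Strip morpheme boundaries, then apply rules in the given order."""
    s = form.replace("+", "")
    for rule in rules:
        s = rule(s)
    return s
```

Usage: with t-deletion first, "talo+i+ten" becomes "talojen"; in the reverse order the i is still followed by t when i → j runs, so the result is "taloien".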

Kimmo treatment of Finnish:
- No rule ordering, because feeding relationships block parallelism.
- Two forms of the genitive plural suffix, -iten and -ien, with a selector feature to trigger the i:j rule:
 lexical: t a l o & + i e n
 surface: t a l o 0 0 j e n
- Insert & in the context V __ V (simplified); this diacritic triggers the i:j automaton.
- Cost: something that is rule-governed is listed in the lexicon, so the grammar does not express the true generalization.
- Gain: parallelism and, potentially, the efficiency of finite-state transduction.

Stemming

• For some applications, we don’t need full morphological analysis.

• IR: we don’t care that e.g. ‘logician’ is related to ‘logical’; we just want to know that someone interested in articles about ‘logic’ may want the former two classes as well. So we just want to get back to a root list.

• Relate two forms by a literal rewrite rule, e.g. al# → 0 (delete word-final -al).

• Is it useful? In a big document it may not be necessary, because the word will appear in many forms, including the form in the query.

• Stemming is morphologically impoverished and therefore error-prone: it can’t distinguish rules that apply at morpheme boundaries from rules internal to a root:
patronization = patron + ize + ation
organization = organize + ation
But the stemmer will treat these as a single class and derive ‘organ’ as an underlying root. Similarly: ‘adverse’/‘adversity’, ‘universe’/‘university’.
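A literal-rule stemmer of the kind described in the last two slides can be sketched as ordered suffix rewrites applied until none matches. The rule list and the minimum stem length of 3 are hypothetical, and this is not the Porter stemmer; it only shows how al# → 0 style rules conflate logical with logic and, wrongly, organization with organ.

```python
# Hypothetical ordered rewrite rules (suffix, replacement), e.g. al# -> 0.
RULES = [("ization", "ize"), ("ize", ""), ("al", "")]
MIN_STEM = 3  # assumption: never strip below three letters


def stem(word: str) -> str:
    changed = True
    while changed:
        changed = False
        for suffix, replacement in RULES:
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
                word = word[: -len(suffix)] + replacement
                changed = True
                break  # restart from the first rule
    return word
```

Usage: patronization → patron is the intended result, but organization → organ shows the error the slide describes, since the rules cannot tell that -ize is part of the root in organize.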

Psycholinguistics

• Is the human lexicon efficient in the way computational lexica are?
- Stanners et al. (1979): where two words are related inflectionally, the root is stored and the other forms are derived by rule. Where there is a derivational relationship, both forms are stored.

Paradigm = repetition priming: ‘great, happy, peachy, adorable, round, short, great, small’

Repetition priming for ‘turns’ given ‘turning’, but not for ‘select’ given ‘selective’

• Marslen-Wilson et al. (1994): there may be priming for semantically similar, derivationally related words:

permit/permission, *create/creativity

On-line versus long term storage lexicon:

Speech errors: ‘we have screw looses’