October 2004 CSA3050 NLP Algorithms 1

CSA3050: Natural Language Algorithms

Morphological Parsing

Morphology

• Morphemes: the smallest units in a word that bear some meaning, such as rabbit and s.

• Morphology: the combination of morphemes to form words that are legal in some language.

• Two kinds of morphology
– Inflectional
– Derivational

Inflectional/Derivational Morphology

• Inflectional
– +s plural, +ed past
– category preserving
– productive: always applies (esp. new words, e.g. fax)
– systematic: same semantic effect

• Derivational
– +ment
– category changing: escape+ment
– not completely productive: detractment*
– not completely systematic: apartment

Noun Inflections

            Regular            Irregular
Singular    cat     church     mouse   ox
Plural      cats    churches   mice    oxen

Morphological Parsing

Input word: cats → Morphological Parser → Output analysis: cat N PL

• Output is a string of morphemes
• Reversibility?

Morphological Parsing

• The goal of morphological parsing is to find out what morphemes a given word is built from.
mouse → mouse N SG
mice → mouse N PL
foxes → fox N PL

2 Steps

1. Split the word up into its possible components, using + to indicate possible morpheme boundaries.
cats → cat + s
foxes → fox + s
foxes → foxe + s

2. Look up the categories of the stems and the meaning of the affixes, using a lexicon of stems and affixes.
cat + s → cat + N + PL
fox + s → fox + N + PL
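
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the lexicon entries and the function names `split_candidates` and `lookup` are assumptions made for this example, and the splitter only knows about a final s and a possibly inserted e.

```python
# Hypothetical toy lexicon; the entries are illustrative assumptions.
STEMS = {"cat": "N", "fox": "N"}
AFFIXES = {"s": "PL"}

def split_candidates(word):
    """Step 1: propose stem + s splits, including one that undoes e-insertion."""
    candidates = []
    if word.endswith("s"):
        candidates.append((word[:-1], "s"))      # cats -> cat + s, foxes -> foxe + s
        if word.endswith("es"):
            candidates.append((word[:-2], "s"))  # foxes -> fox + s
    return candidates

def lookup(candidates):
    """Step 2: keep only the splits whose stem and affix are in the lexicon."""
    return [f"{stem} + {STEMS[stem]} + {AFFIXES[affix]}"
            for stem, affix in candidates
            if stem in STEMS and affix in AFFIXES]

print(lookup(split_candidates("foxes")))  # ['fox + N + PL']
```

Step 1 deliberately over-generates: the spurious split foxe + s is proposed and then discarded in step 2 because foxe is not in the lexicon.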

Step 1: Surface → Intermediate FST

Step 1: Surface → Intermediate Operation

2. Intermediate → Morphemes

Possible inputs to the transducer are:

• Regular noun stem: cat
• Regular noun stem + s: cat+s
• Singular irregular noun stem: mouse
• Plural irregular noun stem: mice
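
A minimal sketch of the four input cases listed above, written as a plain Python lookup rather than as a real transducer. The stem tables and the function name `intermediate_to_morphemes` are hypothetical and exist only for this example.

```python
# Toy stem lists (assumptions for illustration only).
REGULAR_NOUNS = {"cat", "fox"}
IRREGULAR_SG = {"mouse": "mouse"}  # singular irregular form -> stem
IRREGULAR_PL = {"mice": "mouse"}   # plural irregular form -> stem

def intermediate_to_morphemes(form):
    """Map an intermediate form such as 'cat+s' to an analysis such as 'cat N PL'."""
    if form in IRREGULAR_SG:
        return f"{IRREGULAR_SG[form]} N SG"
    if form in IRREGULAR_PL:
        return f"{IRREGULAR_PL[form]} N PL"
    if form in REGULAR_NOUNS:
        return f"{form} N SG"
    stem, sep, affix = form.partition("+")
    if sep and affix == "s" and stem in REGULAR_NOUNS:
        return f"{stem} N PL"
    return None  # not one of the four accepted input shapes

for form in ["cat", "cat+s", "mouse", "mice"]:
    print(form, "->", intermediate_to_morphemes(form))
```

Forms outside the four cases, such as foxe+s, yield `None`, which is how spurious splits from step 1 would be rejected.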

2. Intermediate → Morphemes Transducer

Handling Stems

cat/cat

mice/mouse

Completed Stage 2

Joining Stages 1 and 2

• If the two transducers run in a cascade (i.e. we let the second transducer run on the output of the first one), we can do a morphological parse of (some) English nouns.

• We can also change the direction of translation (in translation mode).

• This transducer can also be used for generating a surface form from an underlying form.
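
The cascade and its reversal can be illustrated by treating each stage as a relation, i.e. a set of (input, output) pairs. This is only a sketch under simplifying assumptions: the two relations are written out as small finite sets (a real transducer encodes them compactly as states and arcs), and the names `STAGE1`, `STAGE2`, `compose`, `invert` and `apply_relation` are made up for this example.

```python
# Stage 1: surface form -> intermediate form (toy sample, assumed entries).
STAGE1 = {("cats", "cat+s"), ("foxes", "fox+s"), ("mice", "mice")}
# Stage 2: intermediate form -> morphemes (toy sample, assumed entries).
STAGE2 = {("cat+s", "cat N PL"), ("fox+s", "fox N PL"), ("mice", "mouse N PL")}

def compose(r1, r2):
    """Cascade: feed every output of r1 into r2."""
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}

def invert(relation):
    """Swap input and output, turning a parser into a generator."""
    return {(b, a) for a, b in relation}

def apply_relation(relation, x):
    """All outputs the relation pairs with input x."""
    return sorted(b for a, b in relation if a == x)

PARSE = compose(STAGE1, STAGE2)
print(apply_relation(PARSE, "foxes"))               # ['fox N PL']
print(apply_relation(invert(PARSE), "mouse N PL"))  # ['mice']
```

Because inversion just swaps the two sides of every pair, the same cascade serves for both parsing and generation.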

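Prolog

• The transducer specifications we have seen translate easily into Prolog format, except for the “other” transition.

arc(1,3,z:z).
arc(1,3,s:s).
arc(1,3,x:x).
arc(1,2,'#':'+').   % quoted: an unquoted #:+ is read as a single atom
arc(1,3,<other>).   % placeholder for the "other" transition, not valid Prolog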

Handling other arcs

arc(1,3,z:z) :- !.
arc(1,3,s:s) :- !.
arc(1,3,x:x) :- !.
arc(1,2,'#':'+') :- !.
arc(1,3,X:X) :- !.   % "other": reached only if no clause above matched
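
For comparison, the same "explicit arcs first, other last" idea can be sketched in Python. This is an illustrative analogue, not a translation of the Prolog program: the table `EXPLICIT_ARCS` and the function `arc_from_state_1` are hypothetical names, and only the arcs leaving state 1 are modelled.

```python
# Explicit arcs leaving state 1: input symbol -> (target state, output symbol).
EXPLICIT_ARCS = {
    "z": (3, "z"),
    "s": (3, "s"),
    "x": (3, "x"),
    "#": (2, "+"),
}

def arc_from_state_1(symbol):
    """Return (target state, output symbol) for an input symbol read in state 1."""
    # Explicit arcs take priority, like the clauses that end in a cut;
    # any other symbol falls through to the "other" arc, which copies it.
    return EXPLICIT_ARCS.get(symbol, (3, symbol))

print(arc_from_state_1("#"))  # (2, '+')
print(arc_from_state_1("a"))  # (3, 'a')
```

The dictionary default plays the role of the final catch-all clause: it is consulted only when no explicit arc matches the input symbol.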

Combining Rules

• Consider the word “berries”.

• Two rules are involved
– berry + s
– y → ie under certain circumstances.

• Combinations of such rules can be handled in two ways
– Cascade, i.e. sequentially
– Parallel

• Algorithms exist for combining transducers together in series or in parallel.

• Such algorithms involve computations over regular relations.
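
A small sketch of the cascade option for “berries”, in the generation direction. The slide only says that y → ie applies "under certain circumstances"; the condition used here (y after a consonant letter and directly before +s at the end of the word) is an assumption for illustration, as are the function names. Intermediate forms are written without spaces, e.g. berry+s.

```python
import re

def y_to_ie(form):
    """Rule 1 (assumed condition): y -> ie after a consonant, before final +s."""
    return re.sub(r"(?<=[^aeiou])y(?=\+s$)", "ie", form)

def remove_boundary(form):
    """Rule 2: delete the morpheme boundary symbol."""
    return form.replace("+", "")

def generate(form):
    """Cascade: apply the rules one after the other."""
    return remove_boundary(y_to_ie(form))

print(generate("berry+s"))  # berries
print(generate("boy+s"))    # boys
```

The order of the cascade matters here: the boundary must still be present when the y → ie rule runs, because the rule's context refers to it.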

3 Related Frameworks

REGULAR LANGUAGES

REGULAR EXPRESSIONS

FSA

REGULAR RELATIONS

REGULAR RELATIONS

AUGMENTED REGULAR EXPRESSIONS

FINITE STATE TRANSDUCERS

Putting it all together

Execution of FSTi takes place in parallel.

Kaplan and Kay / The Xerox View

FSTi are aligned but separate

FSTi intersected together

Summary

• Morphological processing can be handled by finite state machinery.

• Finite State Transducers are formally very similar to Finite State Automata.

• They are formally equivalent to regular relations, i.e. sets of pairings of sentences of regular languages.
