Natural Language Processing, Chapter 3: Morphological Analysis
04/18/23 NLP 2
Definition
• Morphology is the study of word formation – how words are built up from smaller pieces. When we do morphological analysis, then, we’re asking questions like, what pieces does this word have? What does each of them mean? How are they combined?
• Goal: Given a word that’s not in the dictionary, can we derive a root form that is in the dictionary?
• Morphological analysis is the process of recognizing the root form and type of a morphological variant (prefix, suffix). Given a word W:
Algorithm
• 1- If W is in the dictionary, then return its definition.
• 2- Else apply morphology rules to identify possible root forms of W.
  - Each morphology rule strips a prefix or suffix from W, and sometimes adds back replacement characters, to produce a possible root form. If the root form is in the dictionary, success!
  - When a morphology rule succeeds, the root word definition is returned along with properties of the morphological variant.
• Rules must be applied recursively! Multiple derivations are common!
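The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary entries, the rule list, the property labels, and the names `DICTIONARY`, `RULES`, and `analyze` are all hypothetical and do not come from the slides.

```python
# Hypothetical toy dictionary: root form -> definition (illustrative data only).
DICTIONARY = {
    "happy": "feeling pleasure",
    "walk": "to move on foot",
    "hope": "to wish for something",
}

# Hypothetical rules: (kind, affix to strip, replacement characters, property).
RULES = [
    ("suffix", "ness", "", "noun formed from adjective"),
    ("suffix", "iness", "y", "noun formed from adjective"),
    ("suffix", "ed", "", "past tense"),
    ("suffix", "ed", "e", "past tense"),
    ("suffix", "ing", "", "progressive"),
    ("prefix", "un", "", "negation"),
]


def analyze(word, max_depth=3):
    """Return every derivation of word as (root, definition, [properties])."""
    # Step 1: the word itself is in the dictionary.
    if word in DICTIONARY:
        return [(word, DICTIONARY[word], [])]
    if max_depth == 0:
        return []
    derivations = []
    # Step 2: strip an affix, add back replacement characters, then recurse.
    for kind, affix, replacement, prop in RULES:
        if len(word) <= len(affix):
            continue
        if kind == "suffix" and word.endswith(affix):
            candidate = word[: -len(affix)] + replacement
        elif kind == "prefix" and word.startswith(affix):
            candidate = replacement + word[len(affix):]
        else:
            continue
        for root, definition, props in analyze(candidate, max_depth - 1):
            derivations.append((root, definition, props + [prop]))
    return derivations


if __name__ == "__main__":
    for derivation in analyze("unhappiness"):
        print(derivation)
```

With these toy rules, "unhappiness" reaches the root "happy" by two different rule orders, which illustrates why rules must be applied recursively and why multiple derivations are common.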
Basic Parts of Speech
• Parts of Speech: adjective, adverb, article, conjunction, noun, verb, preposition, pronoun, ...
• A closed class is a class that contains a relatively fixed set of words; new words are rarely introduced into the language.
• Ex: articles, conjunctions, pronouns, prepositions, ...
• An open class is a class that readily accepts new members: it contains a constantly changing set of words, and new words are often introduced into the language.
• Ex: adjectives, adverbs, nouns, verbs
• Examples of Closed Classes
• Articles: a, an, the
• Conjunctions: and, but, or, ...
• Demonstratives: this, that, these, ...
• Prepositions: to, for, with, between, at, of, ...
• Pronouns: I, you, me, we, he, she, him, her, ...
• Quantifiers (words of indefinite quantity): some, every, most, any, both, ...
Articles
• Articles are especially problematic for natural language generation.
• Many noun phrases begin with an article.
• Ex: a newspaper, an apple, the movie
• But there are many exceptions, for example:
• The bowl was full of rice. - The bowl was full of apple.
• I go to college. - I go to university.
• She went on vacation. - She went on trip.
Nouns
• Nouns: Words that represent objects, places, concepts, events.
• Ex: dog, city, idea, marathon
• Proper nouns: names of persons, cities
• Count nouns: describe specific objects or sets of objects.
• Ex: dogs, cities, ideas, marathons
• Mass nouns: describe composites or substances.
• Ex: dirt, water, garbage, deer
Modifiers
• Adjectives: words that attribute qualities to objects.
• Ex: wet, loud, happy, funny
• Noun modifiers: nouns that modify other nouns.
• Ex: dog food, aluminum can, song book
Prepositions and Particles
• Prepositions represent relationships, such as time, location, modification, and complements. For example:
• He put the book on the table.
• Sam gave the book to Mary.
• Jane walked up the stairs.
• Particles follow verbs and create a new meaning. For example:
• Greg passed out.
• Charlie threw up his lunch.
• Sometimes there is preposition/particle ambiguity:
• Sarah looked over the paper.
Verbs
• Verbs: represent actions, commands, or assertions.
• Main verbs: walk, eat, believe, claim, ask, ...
• Auxiliary verbs: be, do, have
• Modals: would, should, could, can, will, may, ...
• Transitive verbs: take a direct object complement.
• Ex: eat an apple, read a book, sing a song
• Intransitive verbs: do not take a direct object.
• Ex: she laughed, he lied, I slept.
• Bitransitive (ditransitive) verbs: take both a direct object and an indirect object.
• I gave Mary a gift.
• She sang the baby a lullaby.
Part-of-Speech Tagging
Tagging: The process of assigning a part-of-speech or other lexical class marker to each word in a corpus.
Example:
WORDS  the  girl  kissed  the  baby  on  the  cheek
TAGS   DET  N     V       DET  N     P   DET  N
WORD    LEMMA  TAG
the     the    +DET
girl    girl   +NOUN
kissed  kiss   +VPAST
the     the    +DET
baby    baby   +NOUN
on      on     +PREP
the     the    +DET
cheek   cheek  +NOUN
Rule-Based Tagging
• Basic Idea:
– Assign all possible tags to words
– Remove tags according to a set of rules of the type: if word+1 is an adj, adv, or quantifier and the following is a sentence boundary and word-1 is not a verb like “consider” then eliminate non-adv else eliminate adv.
– Typically more than 1000 hand-written rules, but may be machine-learned.
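The assign-then-eliminate idea can be sketched as follows. The slide does not say which word its example rule targets; this sketch assumes it is written for an ambiguous word such as "that". The lexicon, the tag names, and the names `LEXICON`, `CONSIDER_LIKE_VERBS`, and `tag` are hypothetical.

```python
# Hypothetical lexicon listing every tag a word may take (illustrative only).
LEXICON = {
    "it": {"PRON"},
    "i": {"PRON"},
    "is": {"V"},
    "not": {"ADV"},
    "consider": {"V"},
    "that": {"ADV", "DET", "PRON", "COMP"},
    "odd": {"ADJ"},
}

# Assumed list of verbs that behave like "consider".
CONSIDER_LIKE_VERBS = {"consider"}


def tag(words):
    """Assign all possible tags, then eliminate tags with one hand-written rule."""
    # Step 1: assign all possible tags to each word.
    candidates = [set(LEXICON.get(w.lower(), {"UNKNOWN"})) for w in words]

    # Step 2: apply the elimination rule (assumed here to target "that").
    for i, word in enumerate(words):
        if word.lower() != "that":
            continue
        # word+1 is an adjective, adverb, or quantifier
        next_is_modifier = (
            i + 1 < len(words)
            and bool(candidates[i + 1] & {"ADJ", "ADV", "QUANT"})
        )
        # what follows word+1 is a sentence boundary
        then_boundary = i + 2 >= len(words)
        # word-1 is a verb like "consider"
        prev_is_consider = i > 0 and words[i - 1].lower() in CONSIDER_LIKE_VERBS

        if next_is_modifier and then_boundary and not prev_is_consider:
            candidates[i] &= {"ADV"}   # eliminate non-adv
        else:
            candidates[i] -= {"ADV"}   # eliminate adv
    return candidates


if __name__ == "__main__":
    print(tag(["it", "is", "not", "that", "odd"]))
    print(tag(["I", "consider", "that", "odd"]))
```

In the first sentence only the adverb reading of "that" survives; in the second the adverb reading is removed and the remaining tags stay ambiguous until other rules apply.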
Stochastic Tagging
• Based on the probability of a certain tag occurring, given the various possibilities (e.g. the word itself and the tags around it)
• Requires a training corpus
• No probabilities are available for words that do not appear in the corpus.
• The training corpus may differ from the test corpus.
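A minimal sketch of the simplest stochastic tagger: estimate P(tag | word) from counts in a tagged training corpus and pick the most probable tag for each word. Taggers used in practice also condition on neighboring tags, which this sketch leaves out. The toy corpus and the names `TRAINING_CORPUS`, `train`, `tag_probability`, and `tag_sentence` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy training corpus of (word, tag) pairs.
TRAINING_CORPUS = [
    [("the", "DET"), ("girl", "NOUN"), ("kissed", "VPAST"), ("the", "DET"), ("baby", "NOUN")],
    [("the", "DET"), ("kiss", "NOUN"), ("surprised", "VPAST"), ("the", "DET"), ("girl", "NOUN")],
    [("they", "PRON"), ("kiss", "VERB"), ("the", "DET"), ("baby", "NOUN")],
    [("we", "PRON"), ("kiss", "VERB"), ("the", "DET"), ("cheek", "NOUN")],
]


def train(corpus):
    """Count how often each word occurs with each tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return counts


def tag_probability(counts, word, tag):
    """Estimate P(tag | word); None if the word never appeared in training."""
    if word not in counts:
        return None
    total = sum(counts[word].values())
    return counts[word][tag] / total


def tag_sentence(counts, words):
    """Give each word its most probable tag, or None for unseen words."""
    tagged = []
    for word in words:
        if word in counts:
            best_tag, _ = counts[word].most_common(1)[0]
        else:
            best_tag = None  # no probabilities for words not in the corpus
        tagged.append((word, best_tag))
    return tagged


if __name__ == "__main__":
    model = train(TRAINING_CORPUS)
    print(tag_probability(model, "kiss", "VERB"))
    print(tag_sentence(model, ["the", "kiss", "dog"]))
```

The unseen word "dog" receives no tag at all, which is the limitation the slide points out for words missing from the training corpus.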
Transformation-Based Tagging (Brill Tagging)
• Combination of rule-based and stochastic tagging methodologies
– Like rule-based because rules are used to specify tags in a certain environment
– Like stochastic approach because machine learning is used, with tagged corpus as input
• Input:
– tagged corpus
– dictionary (with most frequent tags)
• Usually constructed from the tagged corpus
• Basic Idea:
– Set the most probable tag for each word as a start value
– Change tags according to rules of type “if word-1 is a determiner and word is a verb then change the tag to noun” in a specific order
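The tagging phase of this idea can be sketched as below. The sketch covers only applying an already ordered rule list; learning the rules from a tagged corpus is not shown. The dictionary contents, the single transformation, the choice of NOUN as a default for unknown words, and the names `MOST_FREQUENT_TAG`, `TRANSFORMATIONS`, and `brill_tag` are assumptions for illustration.

```python
# Hypothetical dictionary giving each word's most frequent tag.
MOST_FREQUENT_TAG = {
    "the": "DET",
    "kiss": "VERB",
    "was": "VERB",
    "long": "ADJ",
    "they": "PRON",
}

# Ordered transformations: (tag of word-1, current tag, new tag).
# The single rule mirrors the slide's example: determiner + verb -> noun.
TRANSFORMATIONS = [
    ("DET", "VERB", "NOUN"),
]


def brill_tag(words, default_tag="NOUN"):
    """Start from the most probable tags, then apply the rules in order."""
    # Start value: most frequent tag (default_tag for unknown words is assumed).
    tags = [MOST_FREQUENT_TAG.get(w.lower(), default_tag) for w in words]

    # Apply each transformation, in its listed order, across the sentence.
    for prev_tag, from_tag, to_tag in TRANSFORMATIONS:
        for i in range(1, len(tags)):
            if tags[i - 1] == prev_tag and tags[i] == from_tag:
                tags[i] = to_tag
    return list(zip(words, tags))


if __name__ == "__main__":
    print(brill_tag(["the", "kiss", "was", "long"]))
    print(brill_tag(["they", "kiss"]))
```

Here "kiss" starts as a verb in both sentences, and the transformation retags it as a noun only when a determiner precedes it.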