natural language processing parts of speech tagging, its classes, and how to process it
DESCRIPTION
Natural Language processing Parts of speech tagging, classes, processingTRANSCRIPT
![Page 1: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/1.jpg)
Part of Speech Tagging
![Page 2: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/2.jpg)
Perpectivising NLP: Areas of AI andtheir inter-dependencies
KnowledgeSearch Logic Representation
MachineLearning Planning
ExpertSystemsNLP Vision Robotics
![Page 3: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/3.jpg)
Two pictures
ProblemNLP
Semantics NLPnity
Parsing
MorphVision SpeechAnalysis
HMMStatistics and Probability Hindi English
LanguageCRF+
Knowledge Based
MEMM
Algorithm
Semantics N
Tri
Part of SpeechTagging
Marathi French
![Page 4: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/4.jpg)
What it is
POS Tagging is a process that attaches
each word in a sentence with a suitable
tag from a given set of tags.
The set of tags is called the Tag-set.
Standard Tag-set : Penn Treebank (for
English).
![Page 5: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/5.jpg)
Definition
Tagging is the assignment of a
singlepart-of-speech tag to each word
(and punctuation marker) in a corpus.
“_“ The_DT guys_NNS that_WDT
make_VBP traditional_JJ hardware_NN
are_VBP really_RB being_VBG
obsoleted_VBN by_IN microprocessor-
based_JJ machines_NNS ,_, ”_” said_VBD
Mr._NNP Benton_NNP ._.
![Page 6: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/6.jpg)
POS Tags
NN – Noun; e.g.
VM – Main Verb;
Dog_NN
e.g. Run_VM
VAUX – Auxiliary Verb; e.g. Is_VAUX
JJ – Adjective; e.g. Red_JJ
PRP – Pronoun; e.g. You_PRP
NNP – Proper Noun; e.g. John_NNP
etc.
![Page 7: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/7.jpg)
POS Tag Ambiguity
In English : I bank1 on the bank2 on the
river bank3 for
Bank1 is verb,
my transactions.
the other two banks are
noun
In Hindi :
”Khaanaa” : can be noun (food) or
eat)
verb (to
![Page 8: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/8.jpg)
For Hindi
Rama achhaa gaata hai. (hai is VAUX :
Auxiliary verb); Ram sings well
Rama achha ladakaa hai. (hai is VCOP :
Copula verb); Ram is a good boy
![Page 9: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/9.jpg)
Process
List all possible tag for each word in
sentence.
Choose best suitable tag sequence.
![Page 10: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/10.jpg)
Example
”People jump high”.
People : Noun/Verb
jump : Noun/Verb
high : Noun/Verb/Adjective
We can start with probabilities.
![Page 11: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/11.jpg)
![Page 12: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/12.jpg)
Importance of POS tagging
Ack: presentation by ClaireGardent on POS tagging by
NLTK
![Page 13: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/13.jpg)
What is Part of Speech (POS)
Words can be divided into classesbehave similarly.
Traditionally eight parts of speech
that
inEnglish: noun, verb, pronoun,preposition, adverb,adjective and article
More recently larger
conjunction,
sets have beenused: e.g. Penn Treebank (45 tags),Susanne (353 tags).
![Page 14: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/14.jpg)
Why POS POS tell us a lot about a word (and the
words near it). E.g, adjectives often followed by nouns
personal pronouns often followed by verbs
possessive pronouns by nouns
Pronunciations depends on POS, e.g. object (first syllable NN, second syllable VM), content, discount
First step in many NLP applications
![Page 15: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/15.jpg)
Categories of POSOpen and closed classes
Closed classes have a fixed membership of words: determiners, pronouns, prepositions
Closed class words are usually functionword: frequently occurring, grammatically important, often short (e.g. of, it, the, in)
Open classes: nouns, verbs, adjectives and adverbs(allow new addition of word)
![Page 16: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/16.jpg)
Open Class (1/2) Nouns:
Proper nouns (Scotland, BBC),
common nouns count nouns (goat, glass)
mass nouns (snow, pacifism)
Verbs: actions and processes (run, hope)
also auxiliary verbs (is, are, am, will, can)
![Page 17: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/17.jpg)
Open Class (2/2) Adjectives:
properties and qualitiesvalue)
Adverbs:
(age, colour,
modify verbs, or verb phrases, or otheradverbs- Unfortunately John walked home extremely slowly yesterday
Sentential adverb: unfortunately
Manner adverb: extremely, slowly
Time adverb: yesterday
![Page 18: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/18.jpg)
Closed class Prepositions: on, under, over, to, with,
by
Determiners: the, a, an, some
Pronouns: she, you, I, who
Conjunctions: and, but, or, as, when, if Auxiliary verbs: can, may, are
![Page 19: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/19.jpg)
Penn tagset (1/2)
![Page 20: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/20.jpg)
Penn tagset (2/2)
![Page 21: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/21.jpg)
IndianNoun
Language Tagset:
![Page 22: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/22.jpg)
Indian Language Tagset:Pronoun
![Page 23: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/23.jpg)
Indian Language Tagset:Quantifier
![Page 24: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/24.jpg)
Indian Language Tagset:Demonstrative
3 Demonstrative DM DM Vaha, jo, yaha,
3.1 Deictic DMD DM DMD Vaha, yaha
3.2 Relative DMR DM DMR jo, jis
3.3 Wh-word DMQ DM DMQ kis, kaun
Indefinite DMI DM DMI KoI, kis
![Page 25: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/25.jpg)
Indian Language Tagset:Verb, Adjective, Adverb
![Page 26: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/26.jpg)
Indian Language Tagset:Postposition, conjunction
![Page 27: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/27.jpg)
Indian Language Tagset:Particle
![Page 28: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/28.jpg)
Indian Language Tagset:Residuals
![Page 29: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/29.jpg)
BigramBest tag sequence
Assumption
===
T*argmax P(T|W)argmax P(T)P(W|T) (by Baye’s Theorem)
P(T) = P(t0=^ t1t2 … tn+1=.)
= P(t0)P(t1|t0)P(t2|t1t0)P(t3|t2t1t0) …P(tn|tn-1tn-2…t0)P(tn+1|tntn-1…t0)
= P(t0)P(t1|t0)P(t2|t1) … P(tn|tn-1)P(tn+1|tn)
N+1
∏i = 0
= P(ti|ti-1) Bigram Assumption
![Page 30: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/30.jpg)
Lexical Probability AssumptionP(W|T) = P(w0|t0-tn+1)P(w1|w0t0-tn+1)P(w2|w1w0t0-tn+1) …
P(wn|w0-wn-1t0-tn+1)P(wn+1|w0-wnt0-tn+1)
Assumption: A word is determined completely by inspired by speech recognition
its tag. This is
= P(wo|to)P(w1|t1) … P(wn+1|tn+1)
n+1
=∏ P(wi|ti)i = 0
n+1
= ∏ P(wi|ti)i = 1
(Lexical Probability Assumption)
![Page 31: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/31.jpg)
Generative Model
^_^ People_N Jump_V High_R ._.
LexicalProbabilities
^ N V A .
V N N BigramProbabilities
NA A
This model is called Generative model.Here words are observed from tags as states.This is similar to HMM.
![Page 32: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/32.jpg)
Bigram probabilities
N V A
N 0.2 0.7 0.1
V 0.6 0.2 0.2
A 0.5 0.2 0.3
![Page 33: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/33.jpg)
Lexical Probability
10
10
10
People jump high
N- 5
10- 3
0.4x10 - 7
V- 7
10-2
10 - 7
A 0 0 - 1
values in cell are P(col-heading/row-heading)
![Page 34: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/34.jpg)
Calculation from Corpus
actual data
^ Ram got many NLP books. He found themall very interesting.
Pos Tagged ^ N V A N N . N V N A R A .
![Page 35: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/35.jpg)
Recording numbers^ N V A R .
^ 0 2 0 0 0 0
N 0 1 2 1 0 1
V 0 1 0 1 0 0
A 0 1 0 0 1 1
R 0 0 0 1 0 0
. 1 0 0 0 0 0
![Page 36: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/36.jpg)
Probabilities^ N V A R .
^ 0 1 0 0 0 0
N 0 1/5 2/5 1/5 0 1/5
V 0 1/2 0 1/2 0 0
A 0 1/3 0 0 1/3 1/3
R 0 0 0 1 0 0
. 1 0 0 0 0 0
![Page 37: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/37.jpg)
To find
T* = argmax (P(T) P(W/T)) P(T).P(W/T) = Π P( ti / ti+1 ).P(wi /ti)
i=1 n
) : Bigram probability P( ti / ti+1
P(wi /ti): Lexical probability
![Page 38: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/38.jpg)
Bigram probabilities
N V A R
N 0.15 0.7 0.05 0.1
V 0.6 0.2 0.1 0.1
A 0.5 0.2 0.3 0
R 0.1 0.3 0.5 0.1
![Page 39: Natural Language processing Parts of speech tagging, its classes, and how to process it](https://reader034.vdocuments.site/reader034/viewer/2022042715/558903a2d8b42a8a1b8b46d4/html5/thumbnails/39.jpg)
Lexical Probability
10
10
10
People jum p high
N-5
10-3
0.4x10 -7
V-7
10-2
10 -7
A 0 0 -1
R 0 0 0
values in cell are P(col-heading/row-heading)