Part-of-Speech Tagging using Neural Networks - Ankur Parikh, LTRC, IIIT Hyderabad
TRANSCRIPT
Part-Of-Speech Tagging
using Neural Networks
Ankur Parikh, LTRC
IIIT Hyderabad
[email protected]
Outline
1. Introduction
2. Background and Motivation
3. Experimental Setup
4. Preprocessing
5. Representation
6. Single-neuro tagger
7. Experiments
8. Multi-neuro tagger
9. Results
10. Discussion
11. Future Work
Introduction
POS Tagging:
It is the process of assigning a part-of-speech tag to each word of natural language (NL) text, based on both its definition and its context.
Uses: parsing of sentences, MT, IR, word sense disambiguation, speech synthesis, etc.
Methods:
1. Statistical approach
2. Rule based
Background: Previous Approaches
Lots of work has been done for Hindi using various machine learning algorithms such as TNT and CRF.
Trade-off: performance versus training time
- Lower precision affects later stages
- For a new domain or new corpus, parameter tuning is a non-trivial task
Background: Previous Approaches & Motivation
Empirically chosen context
Effective handling of corpus-based features
Need of the hour:
- Good performance
- Less training time
- Multiple contexts
- Exploit corpus-based features effectively
Two approaches and their comparison with TNT and CRF
Word-level tagging
Experimental Setup: Corpus Statistics
Tag set of 25 tags

Corpus        Size (in words)   Unseen words (in percentage)
Training      187,095           -
Development   23,565            5.33%
Testing       23,281            8.15%
Experimental Setup: Tools and Resources
Tools
- CRF++
- TNT
- Morfessor Categories-MAP
Resources
- Universal Word - Hindi Dictionary
- Hindi WordNet
- Morph Analyzer
Preprocessing
XC tag is removed (Gadde et al., 2008).
Lexicon
- For each unique word w of the training corpus => ENTRY(t1, ..., t24)
- where tj = c(posj, w) / c(w)
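A minimal sketch of how such a lexicon could be built from a tagged training corpus (the function and variable names are illustrative, not from the slides):

```python
from collections import defaultdict

def build_lexicon(tagged_corpus, tagset):
    """For each unique word w, build ENTRY(t1, ..., tn) with tj = c(posj, w) / c(w).
    `tagged_corpus` is assumed to be an iterable of (word, tag) pairs."""
    word_tag_counts = defaultdict(lambda: defaultdict(int))  # c(posj, w)
    word_counts = defaultdict(int)                           # c(w)
    for word, tag in tagged_corpus:
        word_tag_counts[word][tag] += 1
        word_counts[word] += 1
    lexicon = {}
    for word, counts in word_tag_counts.items():
        lexicon[word] = [counts[tag] / word_counts[word] for tag in tagset]
    return lexicon
```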
Representation: Encoding & Decoding
Each word w is encoded as an n-element vector INPUT(t1, t2, ..., tn), where n = size of the tag set.
INPUT(t1, t2, ..., tn) comes from the lexicon if the training corpus contains w.
If w is not in the training corpus:
- N(w) = number of possible POS tags for w
- tj = 1/N(w) if posj is a candidate
     = 0 otherwise
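A small sketch of this encoding step under the assumptions above (candidate_tags is a hypothetical helper standing in for whatever resource supplies the possible tags of an unseen word):

```python
def encode_word(word, lexicon, tagset, candidate_tags):
    """Encode word w as an n-element vector INPUT(t1, ..., tn), n = |tagset|."""
    if word in lexicon:                  # seen in training: use the lexicon entry
        return list(lexicon[word])
    candidates = candidate_tags(word)    # unseen: uniform weight over candidate tags
    n_w = len(candidates)                # N(w)
    return [1.0 / n_w if tag in candidates else 0.0 for tag in tagset]
```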
Representation: Encoding & Decoding
For each word w, the desired output is encoded as D = (d1, d2, ..., dn).
- dj = 1 if posj is the desired output
     = 0 otherwise
In testing, for each word w, an n-element vector OUTPUT(o1, ..., on) is returned.
- Result = posj, if oj = max(OUTPUT)
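The desired-output encoding and the argmax decoding could look like this (illustrative helper names, not from the slides):

```python
def encode_desired(gold_tag, tagset):
    """Desired output D = (d1, ..., dn): dj = 1 for the gold tag, 0 otherwise."""
    return [1.0 if tag == gold_tag else 0.0 for tag in tagset]

def decode_output(output, tagset):
    """Return posj with oj = max(OUTPUT); the full sorted list could also be kept."""
    best = max(range(len(output)), key=lambda j: output[j])
    return tagset[best]
```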
Single-neuro tagger: Structure
Single-neuro tagger: Training & Tagging
Error back-propagation learning algorithm
Weights are initialized with random values
Sequential mode
Momentum term
Eta = 0.4 and Alpha = 0.1
In tagging, it can give multiple outputs or a sorted list of all tags.
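A minimal sketch of the training loop as described: per-pattern (sequential) error back-propagation with a momentum term, eta = 0.4 and alpha = 0.1. The layer sizes, activation, and initialization range are assumptions; the slides only give the hyper-parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(patterns, n_in, n_hidden, n_out, epochs=50, eta=0.4, alpha=0.1):
    """`patterns` is assumed to be a list of (input_vector, desired_vector) pairs."""
    rng = np.random.default_rng(0)
    W1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in))    # random initial weights
    W2 = rng.uniform(-0.5, 0.5, (n_out, n_hidden))
    dW1_prev = np.zeros_like(W1)                     # previous updates (momentum term)
    dW2_prev = np.zeros_like(W2)
    for _ in range(epochs):
        for x, d in patterns:                        # sequential mode: update per pattern
            h = sigmoid(W1 @ x)                      # forward pass
            o = sigmoid(W2 @ h)
            delta_o = (d - o) * o * (1 - o)          # output-layer error term
            delta_h = (W2.T @ delta_o) * h * (1 - h) # back-propagated hidden error
            dW2 = eta * np.outer(delta_o, h) + alpha * dW2_prev
            dW1 = eta * np.outer(delta_h, x) + alpha * dW1_prev
            W2 += dW2
            W1 += dW1
            dW2_prev, dW1_prev = dW2, dW1
    return W1, W2
```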
Experiments: Development Data

Features                        Precision
Corpus based and contextual     93.19%
Root of the word                93.38%
Length of the word              94.04%
Handling of unseen words        95.62%
  (Root -> Dictionary -> WordNet -> Morfessor;
   tj = (c(posj, s) + c(posj, p)) / (c(s) + c(p)))
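A hedged sketch of the unseen-word fallback chain from the last row; here s and p are assumed to denote the suffix and prefix segments produced by Morfessor, and the dictionary/WordNet lookups are hypothetical helpers (the slides do not spell these details out):

```python
def unseen_word_vector(word, dictionary, wordnet, morfessor_segment,
                       seg_tag_counts, tagset):
    # Try the lexical resources first; both are assumed to map a word
    # (or its root) to its set of possible tags.
    for resource in (dictionary, wordnet):
        tags = resource.get(word)
        if tags:
            return [1.0 / len(tags) if t in tags else 0.0 for t in tagset]
    # Otherwise back off to Morfessor segments, using
    # tj = (c(posj, s) + c(posj, p)) / (c(s) + c(p)).
    p, s = morfessor_segment(word)                    # assumed (prefix, suffix) split
    cs, cp = seg_tag_counts.get(s, {}), seg_tag_counts.get(p, {})
    denom = sum(cs.values()) + sum(cp.values())       # c(s) + c(p)
    return [(cs.get(t, 0) + cp.get(t, 0)) / denom if denom else 0.0 for t in tagset]
```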
Development of the system
Multi-neuro tagger: Structure
Multi-neuro tagger: Training
Multi-neuro tagger: Learning curves
Multi-neuro tagger: Results
Structure    Context   Development   Test
97-48-24     3         95.44%        91.87%
121-48-24    4_prev    95.64%        92.05%
121-48-24    4_next    95.66%        91.95%
145-72-24    5         95.55%        92.15%
169-72-24    6_prev    95.56%        92.14%
169-72-24    6_next    95.54%        92.14%
193-96-24    7         95.46%        92.07%
Multi-neuro tagger: Comparison
Precision after voting: 92.19%
Tagger               Development   Test     Training Time
TNT                  95.18%        91.58%   1-2 seconds
Multi-neuro tagger   95.78%        92.19%   13-14 minutes
CRF                  96.05%        92.92%   2-2.5 hours
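The slides say only that the per-context taggers are combined by voting; a simple majority vote over their argmax outputs is one plausible reading (a sketch, not necessarily the exact scheme used):

```python
from collections import Counter

def vote(per_context_outputs, tagset):
    """Combine the output vectors of the individually trained context taggers
    by majority vote over their argmax predictions."""
    predicted = [tagset[max(range(len(o)), key=lambda j: o[j])]
                 for o in per_context_outputs]        # argmax per tagger
    return Counter(predicted).most_common(1)[0][0]    # most frequent tag wins
```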
Conclusion
Single versus multi-neuro tagger
Multi-neuro tagger versus TNT and CRF
Corpus and dictionary based features
More parameters need to be tuned
24^5 = 7,962,624 n-grams, while only 250,560 weights
Well suited for Indian languages
Future Work
Better voting schemes (confidence point based)
Finding the right context (probability based)
Various structures and algorithms
- Sequential neural network
- Convolutional neural network
- Combination with SVM
Thank You!!
Queries???