slides 07 neurallm
![Page 1: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/1.jpg)
POS tagging
CMSC 723 / LING 723 / INST 725
Marine Carpuat
![Page 2: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/2.jpg)
Parts of Speech
• “Equivalence class” of linguistic entities
  • “Categories” or “types” of words
• Study dates back to the ancient Greeks
  • Dionysius Thrax of Alexandria (c. 100 BC)
  • 8 parts of speech: noun, verb, pronoun, preposition, adverb, conjunction, participle, article
• Remarkably enduring list!
![Page 3: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/3.jpg)
How can we define POS?
• By meaning?
  • Verbs are actions
  • Adjectives are properties
  • Nouns are things
• By the syntactic environment
  • What occurs nearby?
  • What does it act as?
• By what morphological processes affect it
  • What affixes does it take?
• Typically a combination of syntax + morphology
![Page 4: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/4.jpg)
Parts of Speech
• Open class
  • Impossible to completely enumerate
  • New words continuously being invented, borrowed, etc.
• Closed class
  • Closed, fixed membership
  • Reasonably easy to enumerate
  • Generally, short function words that “structure” sentences
![Page 5: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/5.jpg)
Open Class POS
• Four major open classes in English
  • Nouns
  • Verbs
  • Adjectives
  • Adverbs
• All languages have nouns and verbs... but may not have the other two
![Page 6: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/6.jpg)
Nouns
• Open class
  • New inventions all the time: muggle, webinar, ...
• Semantics:
  • Generally, words for people, places, things
  • But not always (bandwidth, energy, ...)
• Syntactic environment:
  • Occurring with determiners
  • Pluralizable, possessivizable
• Other characteristics:
  • Mass vs. count nouns
![Page 7: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/7.jpg)
Verbs
• Open class
  • New inventions all the time: google, tweet, ...
• Semantics
  • Generally, denote actions, processes, etc.
• Syntactic environment
  • E.g., intransitive, transitive
• Other characteristics
  • Main vs. auxiliary verbs
  • Gerunds (verbs behaving like nouns)
  • Participles (verbs behaving like adjectives)
![Page 8: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/8.jpg)
Adjectives and Adverbs
• Adjectives
  • Generally modify nouns, e.g., tall girl
• Adverbs
  • A semantic and formal hodge-podge…
  • Sometimes modify verbs, e.g., sang beautifully
  • Sometimes modify adjectives, e.g., extremely hot
![Page 9: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/9.jpg)
Closed Class POS
• Prepositions
  • In English, occurring before noun phrases
  • Specifying some type of relation (spatial, temporal, …)
  • Examples: on the shelf, before noon
• Particles
  • Resemble prepositions, but are used with a verb (“phrasal verbs”)
  • Examples: find out, turn over, go on
![Page 10: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/10.jpg)
Particle vs. Prepositions
• He came by the office in a hurry (by = preposition)
• He came by his fortune honestly (by = particle)
• We ran up the phone bill (up = particle)
• We ran up the small hill (up = preposition)
• He lived down the block (down = preposition)
• He never lived down the nicknames (down = particle)
![Page 11: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/11.jpg)
More Closed Class POS
• Determiners
  • Establish reference for a noun
  • Examples: a, an, the (articles), that, this, many, such, …
• Pronouns
  • Refer to persons or entities: he, she, it
  • Possessive pronouns: his, her, its
  • Wh-pronouns: what, who
![Page 12: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/12.jpg)
Closed Class POS: Conjunctions
• Coordinating conjunctions
  • Join two elements of “equal status”
  • Examples: cats and dogs, salad or soup
• Subordinating conjunctions
  • Join two elements of “unequal status”
  • Examples: We’ll leave after you finish eating. While I was waiting in line, I saw my friend.
• Complementizers are a special case: I think that you should finish your assignment
![Page 13: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/13.jpg)
Beyond English…
• Chinese
  • No verb/adjective distinction!
  • 漂亮: beautiful / to be beautiful
• Riau Indonesian/Malay
  • No articles
  • No tense marking
  • 3rd person pronouns neutral to both gender and number
  • No features distinguishing verbs from nouns
• Ayam (chicken) Makan (eat):
  • The chicken is eating
  • The chicken ate
  • The chicken will eat
  • The chicken is being eaten
  • Where the chicken is eating
  • How the chicken is eating
  • Somebody is eating the chicken
  • The chicken that is eating
![Page 14: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/14.jpg)
POS tagging
![Page 15: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/15.jpg)
POS Tagging: What’s the task?
• Process of assigning part-of-speech tags to words
• But what tags are we going to assign?
  • Coarse grained: noun, verb, adjective, adverb, …
  • Fine grained: {proper, common} noun
  • Even finer-grained: {proper, common} noun ± animate
• Important issues to remember
  • Choice of tags encodes certain distinctions/non-distinctions
  • Tagsets will differ across languages!
• For English, Penn Treebank is the most common tagset
![Page 16: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/16.jpg)
Penn Treebank Tagset: 45 Tags
![Page 17: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/17.jpg)
Penn Treebank Tagset: Choices
• Example:
  • The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
• Distinctions and non-distinctions
  • Prepositions and subordinating conjunctions are tagged “IN” (“Although/IN I/PRP ..”)
  • Except the preposition/complementizer “to” is tagged “TO”
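The word/TAG notation in the example above is easy to read programmatically. The following is a minimal sketch, assuming tokens are whitespace-separated and the tag follows the last slash; the helper name `parse_tagged` is hypothetical and not part of any treebank toolkit.

```python
def parse_tagged(sentence):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        # rpartition splits on the LAST slash, so './.' yields ('.', '.')
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


example = ("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT "
           "number/NN of/IN other/JJ topics/NNS ./.")
pairs = parse_tagged(example)
for word, tag in pairs:
    print(f"{word:10s} {tag}")
```

Splitting on the last slash rather than the first keeps tokens whose word part itself contains a slash intact.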
![Page 18: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/18.jpg)
Why do POS tagging?
• One of the most basic NLP tasks
  • Nicely illustrates principles of statistical NLP
• Useful for higher-level analysis
  • Needed for syntactic analysis
  • Needed for semantic analysis
• Sample applications that require POS tagging
  • Machine translation
  • Information extraction
  • Lots more…
![Page 19: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/19.jpg)
Try your hand at tagging…
• The back door
• On my back
• Win the voters back
• Promised to back the bill
![Page 20: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/20.jpg)
Try your hand at tagging…
• I hope that she wins
• That day was nice
• You can go that far
![Page 21: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/21.jpg)
Why is POS tagging hard?
• Ambiguity!
• Ambiguity in English
  • 11.5% of word types ambiguous in Brown corpus
  • 40% of word tokens ambiguous in Brown corpus
  • Annotator disagreement in Penn Treebank: 3.5%
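The gap between the type and token figures arises because ambiguous words tend to be frequent ones. Here is a small sketch of how the two rates can be computed, using a made-up toy corpus (not Brown corpus data) and a hypothetical helper `ambiguity_rates`.

```python
from collections import defaultdict


def ambiguity_rates(tagged_tokens):
    """Return (type_rate, token_rate): the share of ambiguous word types and tokens."""
    tags_by_word = defaultdict(set)
    for word, tag in tagged_tokens:
        tags_by_word[word].add(tag)
    # A word type is ambiguous if it was seen with more than one tag.
    ambiguous = {w for w, tags in tags_by_word.items() if len(tags) > 1}
    type_rate = len(ambiguous) / len(tags_by_word)
    token_rate = sum(1 for w, _ in tagged_tokens if w in ambiguous) / len(tagged_tokens)
    return type_rate, token_rate


# Toy data for illustration only.
toy = [("the", "DT"), ("back", "JJ"), ("door", "NN"), ("on", "IN"),
       ("my", "PRP$"), ("back", "NN"), ("the", "DT"), ("door", "NN")]
type_rate, token_rate = ambiguity_rates(toy)
print(type_rate, token_rate)
```

In the toy corpus only "back" is ambiguous, but it occurs twice, so the token rate is higher than the type rate.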
![Page 22: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/22.jpg)
POS tagging: how to do it?
• Given Penn Treebank, how would you build a system that can POS tag new text?
• Baseline: pick most frequent tag for each word type
  • 90% accuracy if train + test sets are drawn from Penn Treebank
• Can we do better?
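The most-frequent-tag baseline can be sketched in a few lines. This is an illustrative version on toy data, not the setup behind the 90% figure; falling back to the overall most frequent tag for unseen words is an assumption, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sentences):
    """Learn the most frequent tag per word type, plus a fallback tag."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
            all_tags[tag] += 1
    most_frequent = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    # Assumption: unseen words receive the overall most frequent tag.
    default_tag = all_tags.most_common(1)[0][0]
    return most_frequent, default_tag


def tag_baseline(words, most_frequent, default_tag):
    """Tag each word independently of its context."""
    return [most_frequent.get(w, default_tag) for w in words]


# Toy training data for illustration only.
train = [
    [("my", "PRP$"), ("back", "NN"), ("hurts", "VBZ")],
    [("on", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("promised", "VBD"), ("to", "TO"), ("back", "VB"), ("the", "DT"), ("bill", "NN")],
]
model, default = train_baseline(train)
predicted = tag_baseline(["to", "back", "the", "bill", "now"], model, default)
print(predicted)
```

Because the baseline ignores context, it tags "back" as a noun even after "to", which is the kind of error a sequence model can correct.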
![Page 23: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/23.jpg)
How to POS tag automatically?
![Page 24: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/24.jpg)
How can we POS tag automatically?
• POS tagging as multiclass classification
  • What is x? What is y?
• POS tagging as sequence labeling
  • Models sequences of predictions
![Page 25: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/25.jpg)
Linear Models for Classification
Feature function representation
Weights
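The formula on this slide did not survive in the transcript. A common way to write a linear classifier is to score each candidate label with a dot product between a weight vector and a feature function of the input and label, then predict the highest-scoring label. The sketch below assumes that form; the feature templates and weight values are hypothetical.

```python
from collections import Counter


def features(word, tag):
    """Hypothetical feature function f(x, y): sparse counts keyed by feature name."""
    f = Counter()
    f[f"word={word},tag={tag}"] += 1
    f[f"suffix={word[-2:]},tag={tag}"] += 1
    return f


def score(weights, word, tag):
    """Dot product w . f(x, y) over the sparse feature representation."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features(word, tag).items())


def predict(weights, word, tags):
    """Return the tag with the highest linear score."""
    return max(tags, key=lambda t: score(weights, word, t))


# Hand-set weights for illustration only; in practice they are learned.
weights = {"suffix=ly,tag=RB": 1.5, "word=back,tag=NN": 1.0}
tags = ["NN", "VB", "RB"]
print(predict(weights, "quickly", tags), predict(weights, "back", tags))
```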
![Page 26: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/26.jpg)
Multiclass perceptron
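The algorithm box for this slide is missing from the transcript. What follows is a sketch of the standard multiclass perceptron update, not necessarily the exact variant shown in lecture: predict with the current weights and, on a mistake, add the features of the gold label and subtract those of the predicted label. The feature templates, epoch count, and toy data are assumptions.

```python
from collections import defaultdict


def features(word, tag):
    """Hypothetical binary features pairing the word and its suffix with the tag."""
    return [f"word={word},tag={tag}", f"suffix={word[-2:]},tag={tag}"]


def predict(weights, word, tags):
    """Return the tag whose features have the highest total weight."""
    return max(tags, key=lambda t: sum(weights[f] for f in features(word, t)))


def train_perceptron(examples, tags, epochs=5):
    """Mistake-driven training: update weights only when the prediction is wrong."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for word, gold in examples:
            guess = predict(weights, word, tags)
            if guess != gold:
                for f in features(word, gold):
                    weights[f] += 1.0   # promote the correct label
                for f in features(word, guess):
                    weights[f] -= 1.0   # demote the wrongly predicted label
    return weights


# Toy data for illustration only.
tags = ["NN", "VB", "RB"]
examples = [("quickly", "RB"), ("slowly", "RB"), ("dog", "NN"), ("run", "VB")]
weights = train_perceptron(examples, tags)
print(predict(weights, "badly", tags))
```

The unseen word "badly" is tagged through the shared suffix feature, which shows why features more general than word identity help.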
![Page 27: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/27.jpg)
POS tagging
Sequence labeling with the perceptron
Sequence labeling problem
• Input:
  • sequence of tokens x = [x1 … xK]
  • Variable length K
• Output (aka label):
  • sequence of tags y = [y1 … yK]
  • Size of output space?
Structured Perceptron
• Perceptron algorithm can be used for sequence labeling
• But there are challenges
  • How to compute argmax efficiently?
  • What are appropriate features?
• Approach: leverage structure of output space
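To see why the argmax is a challenge: with T possible tags and K tokens there are T^K candidate tag sequences, so exhaustive search is only feasible for tiny inputs. The sketch below counts the output space and runs a brute-force argmax on a three-word sentence; the lexicon-based scoring function is a hypothetical stand-in for a learned model.

```python
from itertools import product


def output_space_size(num_tags, length):
    """Number of distinct tag sequences for a sentence of the given length."""
    return num_tags ** length


def brute_force_argmax(words, tags, score):
    """Exhaustive search over all tag sequences; exponential in sentence length."""
    return max(product(tags, repeat=len(words)), key=lambda ys: score(words, ys))


# Hypothetical scoring function: one point per tag that matches a tiny lexicon.
LEXICON = {"the": "DT", "dog": "NN", "barks": "VBZ"}


def toy_score(words, ys):
    return sum(1 for w, y in zip(words, ys) if LEXICON.get(w) == y)


tags = ["DT", "NN", "VBZ"]
best = brute_force_argmax(["the", "dog", "barks"], tags, toy_score)
size_small = output_space_size(len(tags), 3)   # 3 tags, 3 tokens
size_ptb = output_space_size(45, 20)           # 45 tags, 20-token sentence
print(best, size_small, size_ptb)
```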
![Page 28: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/28.jpg)
Feature functions for sequence labeling
• Example features?
  • Number of times “monsters” is tagged as noun
  • Number of times “noun” is followed by “verb”
  • Number of times “tasty” is tagged as “verb”
  • Number of times two verbs are adjacent
  • …
![Page 29: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/29.jpg)
Feature functions for sequence labeling
• Standard features of POS tagging
  • Unary features: # times word w has been labeled with tag l, for all words w and all tags l
  • Markov features: # times tag l is adjacent to tag l’ in output, for all tags l and l’
• Size of feature representation is constant wrt input length
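Unary and Markov features can be sketched as sparse counts over a tagged sentence. In this illustration "adjacent" is read as "tag l immediately followed by tag l’", which is an assumption; the example sentence and the tuple-based feature keys are also hypothetical.

```python
from collections import Counter


def sequence_features(words, tags):
    """Count unary (word, tag) and Markov (previous tag, tag) features."""
    f = Counter()
    for word, tag in zip(words, tags):
        f[("unary", word, tag)] += 1
    # Pair each tag with its successor to count adjacent tag pairs.
    for prev, cur in zip(tags, tags[1:]):
        f[("markov", prev, cur)] += 1
    return f


words = ["monsters", "eat", "tasty", "bunnies"]
tags = ["noun", "verb", "adj", "noun"]
f = sequence_features(words, tags)
print(f[("unary", "monsters", "noun")], f[("markov", "noun", "verb")])
```

The set of possible feature keys depends only on the vocabulary and the tagset, which is why the representation has the same dimensionality for a 4-word or a 40-word sentence.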
![Page 30: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/30.jpg)
Solving the argmax problem for sequences
• Efficient algorithms possible if the feature function decomposes over the input
• This holds for unary and Markov features
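One way to write the decomposition is shown below. The notation is assumed rather than taken from the slides: φ is the feature function, w the weight vector, φ_k the features that fire at position k, and y_0 a start-of-sentence symbol.

```latex
\[
\hat{y} = \arg\max_{y}\; w \cdot \phi(x, y),
\qquad
w \cdot \phi(x, y) = \sum_{k=1}^{K} w \cdot \phi_k(x, y_{k-1}, y_k)
\]
```

Unary features at position k depend only on (x_k, y_k) and Markov features only on (y_{k-1}, y_k), so each term in the sum involves at most two neighboring tags.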
![Page 31: slides 07 neurallm](https://reader031.vdocuments.site/reader031/viewer/2022013009/61ce65985da5c947d84d3352/html5/thumbnails/31.jpg)
Solving the argmax problem for sequences
• Trellis sequence labeling
  • Any path represents a labeling of input sentence
  • Gold standard path in red
  • Each edge receives a weight such that adding weights along the path corresponds to score for input/output configuration
• Any max-weight path algorithm can find the argmax
  • e.g. Viterbi algorithm O(LK^2)
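A compact sketch of Viterbi over such a trellis follows. It assumes a non-empty sentence and takes two hypothetical weight lookups, `unary(word, tag)` and `markov(prev_tag, tag)`; the weights below are hand-set for illustration. Reading the bound with L as the sentence length and K as the number of tags is an assumption, since the transcript does not define the symbols on this slide.

```python
def viterbi(words, tags, unary, markov):
    """Return the max-weight tag path through the trellis (words must be non-empty)."""
    # best[i][t]: score of the best path that ends with tag t at position i
    best = [{t: unary(words[0], t) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Pick the best predecessor tag for t: K choices for each of K tags.
            prev = max(tags, key=lambda p: best[i - 1][p] + markov(p, t))
            best[i][t] = best[i - 1][prev] + markov(prev, t) + unary(words[i], t)
            back[i][t] = prev
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hand-set edge weights for illustration only.
U = {("to", "TO"): 2.0, ("back", "NN"): 1.0, ("back", "VB"): 1.0,
     ("the", "DT"): 2.0, ("bill", "NN"): 2.0}
M = {("TO", "VB"): 1.5, ("VB", "DT"): 0.5, ("DT", "NN"): 1.0}
tags = ["DT", "NN", "VB", "TO"]
path = viterbi(["to", "back", "the", "bill"], tags,
               lambda w, t: U.get((w, t), 0.0),
               lambda p, t: M.get((p, t), 0.0))
print(path)
```

The unary weights alone cannot decide between noun and verb for "back"; the Markov weight on TO followed by VB settles it, which is the context a per-word baseline lacks.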