lecture: lexical semantics - umass...
TRANSCRIPT
Lecture:Lexical Semantics
[most slides borrowed from J&M 3rd ed. website]
CS 585, Fall 2017Introduction to Natural Language Processing
http://people.cs.umass.edu/~brenocon/inlp2017
Brendan O’ConnorCollege of Information and Computer Sciences
University of Massachusetts Amherst
Tuesday, October 17, 17
• Wordsparsityisaproblem!• Showmetweetsaboutvo6ngirregulari6es• Trainaclassifieron50documents
• Idea:externaldatabaseofwordmeaninginforma6on,forwordtypes.• Today:wordsensesandtaxonomies• Thursday:sen6mentlexiconsandlexiconexpansion• Post-midterm:vectors,wordembeddings,distribu6onalseman6cs
2
Tuesday, October 17, 17
Terminology:lemmaandwordform
• Alemmaorcita(onform• Samestem,partofspeech,roughseman6cs
• Awordform• Theinflectedwordasitappearsintext
Wordform Lemma
banks bank
sung sing
Tuesday, October 17, 17
Lemmashavesenses
• Onelemma“bank”canhavemanymeanings:• …a bank can hold the investments in a custodial account…
• “…as agriculture burgeons on the east bank the river will shrink even more”
• Sense(orwordsense)• Adiscreterepresenta6onofanaspectofaword’smeaning.
• ThelemmabankherehastwosensesTuesday, October 17, 17
Lemmashavesenses
• Onelemma“bank”canhavemanymeanings:• …a bank can hold the investments in a custodial account…
• “…as agriculture burgeons on the east bank the river will shrink even more”
• Sense(orwordsense)• Adiscreterepresenta6onofanaspectofaword’smeaning.
• Thelemmabankherehastwosenses
1Sense1:
Tuesday, October 17, 17
Lemmashavesenses
• Onelemma“bank”canhavemanymeanings:• …a bank can hold the investments in a custodial account…
• “…as agriculture burgeons on the east bank the river will shrink even more”
• Sense(orwordsense)• Adiscreterepresenta6onofanaspectofaword’smeaning.
• Thelemmabankherehastwosenses
1
2
Sense1:
Sense2:
Tuesday, October 17, 17
Homonymy
Homonyms:wordsthatshareaformbuthaveunrelated,dis6nctmeanings:
• bank1:financialins6tu6on,bank2:slopingland• bat1:clubforhiSngaball,bat2:nocturnalflyingmammal
1. Homographs(bank/bank,bat/bat)2. Homophones:
1. Writeandright2. Pieceandpeace
Tuesday, October 17, 17
HomonymycausesproblemsforNLPapplica6ons
• Informa6onretrieval• “bat care”
• MachineTransla6on• bat:murciélago(animal)orbate(forbaseball)
• Text-to-Speech• bass(stringedinstrument)vs.bass(fish)
Tuesday, October 17, 17
Polysemy
• 1.Thebankwasconstructedin1875outoflocalredbrick.• 2.Iwithdrewthemoneyfromthebank• Arethosethesamesense?
• Sense2:“Afinancialins6tu6on”• Sense1:“Thebuildingbelongingtoafinancialins6tu6on”
• Apolysemouswordhasrelatedmeanings• Mostnon-rarewordshavemul6plemeanings
Tuesday, October 17, 17
• Lotsoftypesofpolysemyaresystema6c• School, university, hospital• Allcanmeantheins6tu6onorthebuilding.
• Asystema6crela6onship:• BuildingOrganiza6on
• Othersuchkindsofsystema6cpolysemy:Author(Jane Austen wrote Emma) WorksofAuthor(I love Jane Austen)
MetonymyorSystema6cPolysemy:Asystema6crela6onshipbetweensenses
Tuesday, October 17, 17
• Lotsoftypesofpolysemyaresystema6c• School, university, hospital• Allcanmeantheins6tu6onorthebuilding.
• Asystema6crela6onship:• BuildingOrganiza6on
• Othersuchkindsofsystema6cpolysemy:Author(Jane Austen wrote Emma) WorksofAuthor(I love Jane Austen)Tree(Plums have beautiful blossoms)
MetonymyorSystema6cPolysemy:Asystema6crela6onshipbetweensenses
Tuesday, October 17, 17
• Lotsoftypesofpolysemyaresystema6c• School, university, hospital• Allcanmeantheins6tu6onorthebuilding.
• Asystema6crela6onship:• BuildingOrganiza6on
• Othersuchkindsofsystema6cpolysemy:Author(Jane Austen wrote Emma) WorksofAuthor(I love Jane Austen)Tree(Plums have beautiful blossoms) ! Fruit(I ate a preserved plum)
MetonymyorSystema6cPolysemy:Asystema6crela6onshipbetweensenses
Tuesday, October 17, 17
Howdoweknowwhenawordhasmorethanonesense?
• The“zeugma”test:Twosensesofserve?• Which flights serve breakfast?• Does Lufthansa serve Philadelphia?• ?DoesLu_hansaservebreakfastandSanJose?
Tuesday, October 17, 17
Howdoweknowwhenawordhasmorethanonesense?
• The“zeugma”test:Twosensesofserve?• Which flights serve breakfast?• Does Lufthansa serve Philadelphia?• ?DoesLu_hansaservebreakfastandSanJose?
• Sincethisconjunc6onsoundsweird,• wesaythatthesearetwodifferentsensesof“serve”
Tuesday, October 17, 17
Word meaning relations
• Relationships between pairs of word meanings
• Synonymy: same meaning
• Antonymy: opposite meanings
• Hypernymy/hyponymy: more general/specific meanings
• Meronymy: part-whole relations
• etc.
10
Tuesday, October 17, 17
Synonyms
• Wordthathavethesamemeaninginsomeorallcontexts.• filbert/hazelnut• couch/sofa• big/large• automobile/car• vomit/throwup• Water/H20
• Twolexemesaresynonyms• iftheycanbesubs6tutedforeachotherinallsitua6ons• Ifsotheyhavethesameproposi(onalmeaning
Tuesday, October 17, 17
Synonyms
• Buttherearefew(orno)examplesofperfectsynonymy.• Evenifmanyaspectsofmeaningareiden6cal• S6llmaynotpreservetheacceptabilitybasedonno6onsofpoliteness,slang,register,genre,etc.
Tuesday, October 17, 17
Synonyms
• Buttherearefew(orno)examplesofperfectsynonymy.• Evenifmanyaspectsofmeaningareiden6cal• S6llmaynotpreservetheacceptabilitybasedonno6onsofpoliteness,slang,register,genre,etc.
• Example:• Water/H20
• Big/large• Brave/courageous
Tuesday, October 17, 17
Synonymyisarela6onbetweensensesratherthanwords
• Considerthewordsbigandlarge
Tuesday, October 17, 17
Synonymyisarela6onbetweensensesratherthanwords
• Considerthewordsbigandlarge• Aretheysynonyms?
• Howbigisthatplane?• WouldIbeflyingonalargeorsmallplane?
Tuesday, October 17, 17
Synonymyisarela6onbetweensensesratherthanwords
• Considerthewordsbigandlarge• Aretheysynonyms?
• Howbigisthatplane?• WouldIbeflyingonalargeorsmallplane?
• Howabouthere:• MissNelsonbecameakindofbigsistertoBenjamin.• ?MissNelsonbecameakindoflargesistertoBenjamin.
Tuesday, October 17, 17
Synonymyisarela6onbetweensensesratherthanwords
• Considerthewordsbigandlarge• Aretheysynonyms?
• Howbigisthatplane?• WouldIbeflyingonalargeorsmallplane?
• Howabouthere:• MissNelsonbecameakindofbigsistertoBenjamin.• ?MissNelsonbecameakindoflargesistertoBenjamin.
• Why?• bighasasensethatmeansbeingolder,orgrownup• largelacksthissense
Tuesday, October 17, 17
Antonyms
• Sensesthatareoppositeswithrespecttoonefeatureofmeaning• Otherwise,theyareverysimilar!
dark/light short/long! fast/slow! rise/fallhot/cold! up/down! in/out
• Moreformally:antonymscan• defineabinaryopposi6on
orbeatoppositeendsofascale• long/short, fast/slow
• bereversives:• rise/fall, up/down
Tuesday, October 17, 17
HyponymyandHypernymy
• Onesenseisahyponymofanotherifthefirstsenseismorespecific,deno6ngasubclassoftheother• carisahyponymofvehicle• mangoisahyponymoffruit
• Converselyhypernym/superordinate(“hyperissuper”)• vehicleisahypernymofcar• fruitisahypernymofmango
Superordinate/hyper vehicle fruit furnitureSubordinate/hyponym car mango chair
Tuesday, October 17, 17
Hyponymymoreformally
• Extensional:• Theclassdenotedbythesuperordinateextensionallyincludestheclassdenotedbythehyponym
• Entailment:• AsenseAisahyponymofsenseBifbeinganAentailsbeingaB
• Hyponymyisusuallytransi6ve• (AhypoBandBhypoCentailsAhypoC)
• Anothername:theIS-Ahierarchy• AIS-AB(orAISAB)• BsubsumesA
Tuesday, October 17, 17
HyponymsandInstances
• WordNethasbothclassesandinstances.• Aninstanceisanindividual,apropernounthatisauniqueen6ty
• San Francisco isaninstanceofcity• Butcityisaclass• cityisahyponymofmunicipality...location...
17
Tuesday, October 17, 17
Meronymy
• Thepart-wholerela6on• Alegispartofachair;awheelispartofacar.
• Wheelisameronymofcar,andcarisaholonymofwheel.
18
Tuesday, October 17, 17
WordNet3.0
• Ahierarchicallyorganizedlexicaldatabase• On-linethesaurus+aspectsofadic6onary
• Someotherlanguagesavailableorunderdevelopment• (Arabic,Finnish,German,Portuguese…)
Category UniqueStrings
Noun 117,798Verb 11,529Adjec6ve 22,479Adverb 4,481
Tuesday, October 17, 17
Howis“sense”definedinWordNet?
• Thesynset(synonymset),thesetofnear-synonyms,instan6atesasenseorconcept,withagloss
• Example:chumpasanounwiththegloss:“apersonwhoisgullibleandeasytotakeadvantageof”
• Thissenseof“chump”issharedby9words:chump1, fool2, gull1, mark9, patsy1, fall guy1, sucker1, soft touch1, mug2
• Eachofthesesenseshavethissamegloss• (Noteverysense;sense2ofgullistheaqua6cbird)
Tuesday, October 17, 17
Hyponyms of “person” in WN7588 total -- with most freq. sense restriction. [from Michael Heilman]
• vintager• matrisib• horseback rider• ceo• seeker• fieldhand• radiologist• captain• moujik• research director• damsel• nibbler• nailer• nude person• seismologist• oddball• prankster• radiotherapist• nebraskan• cupbearer• psychic
• accompanist• plagiariser• timberman• photographer's
model• lombard• debaser• courtier• dutch uncle• schlemiel• dizygotic twin• mental case• matriarch• vocalist• internist• transplanter• techie• sniffler• marrano• first baseman• government man
• child prodigy• athenian• hospital chaplain• dominatrix• bibliopole• hombre• east indian• ballet master• bad person• rock 'n' roll
musician• flack catcher• telephoner• dominus• cheater• groveler• accomplice• herb doctor• schoolfriend• preteen• gastronome
• concierge• shogun• flutist• bottom dog• imperialist• emir• libeler• manichaean• abnegator• cousin-german• masorite• trouble maker• villainess• rajpoot• calapooya• overlord• bank guard• tumbler• polycarp• radiographer• slave owner
• stick-in-the-mud• audile• deadbeat• maltman• jeweler• pasha• screwballer• prioress• crosspatch• persecutor• movie maker• capo• class act• navvy• golden boy• sweet talker• junior• feminist• villager• specialiser• scotsman
25
Tuesday, October 17, 17
“Supersenses”
26
(countsfromSchneiderandSmith2013’sStreuselcorpus)
“Supersenses”The*top*level*hypernyms in*the*hierarchy
26
(counts%from%Schneider%and%Smith%2013’s%Streusel%corpus)Noun Verb
GROUP 1469 place STATIVE 2922 isPERSON 1202 people COGNITION 1093 knowARTIFACT 971 car COMMUNIC.∗ 974 recommendCOGNITION 771 way SOCIAL 944 useFOOD 766 food MOTION 602 goACT 700 service POSSESSION 309 payLOCATION 638 area CHANGE 274 fixTIME 530 day EMOTION 249 loveEVENT 431 experience PERCEPTION 143 seeCOMMUNIC.∗ 417 review CONSUMPTION 93 havePOSSESSION 339 price BODY 82 get. . . doneATTRIBUTE 205 quality CREATION 64 cookQUANTITY 102 amount CONTACT 46 putANIMAL 88 dog COMPETITION 11 winBODY 87 hair WEATHER 0 —STATE 56 pain all 15 VSSTs 7806NATURAL OBJ. 54 flowerRELATION 35 portion N/A (see §3.2)SUBSTANCE 34 oil `a 1191 haveFEELING 34 discomfort ` 821 anyonePROCESS 28 process `j 54 friedMOTIVE 25 reasonPHENOMENON 23 result ∗COMMUNIC.
is short forCOMMUNICATION
SHAPE 6 squarePLANT 5 treeOTHER 2 stuffall 26 NSSTs 9018
Table 1: Summary of noun and verb supersense cate-gories. Each entry shows the label along with the countand most frequent lexical item in the STREUSLE corpus.
enrich the MWE annotations of the CMWE corpus1
(Schneider et al., 2014b), are publicly released underthe name STREUSLE.2 This includes new guidelinesfor verb supersense annotation. Our open-sourcetagger, implemented in Python, is available from thatpage as well.
2 Background: Supersense Tags
WordNet’s supersense categories are the top-levelhypernyms in the taxonomy (sometimes known assemantic fields) which are designed to be broadenough to encompass all nouns and verbs (Miller,1990; Fellbaum, 1990).3
1http://www.ark.cs.cmu.edu/LexSem/2Supersense-Tagged Repository of English with a Unified
Semantics for Lexical Expressions3WordNet synset entries were originally partitioned into
lexicographer files for these coarse categories, which becameknown as “supersenses.” The lexname function in WordNet/
The 26 noun and 15 verb supersense categories arelisted with examples in table 1. Some of the namesoverlap between the noun and verb inventories, butthey are to be considered separate categories; here-after, we will distinguish the noun and verb categorieswith prefixes, e.g. N:COGNITION vs. V:COGNITION.
Though WordNet synsets are associated with lex-ical entries, the supersense categories are unlexical-ized. The N:PERSON category, for instance, containssynsets for both principal and student. A differentsense of principal falls under N:POSSESSION.
As far as we are aware, the supersenses wereoriginally intended only as a method of organizingthe WordNet structure. But Ciaramita and Johnson(2003) pioneered the coarse word sense disambigua-tion task of supersense tagging, noting that the su-persense categories provided a natural broadeningof the traditional named entity categories to encom-pass all nouns. Ciaramita and Altun (2006) laterexpanded the task to include all verbs, and applieda supervised sequence modeling framework adaptedfrom NER. Evaluation was against manually sense-tagged data that had been automatically converted tothe coarser supersenses. Similar taggers have sincebeen built for Italian (Picca et al., 2008) and Chi-nese (Qiu et al., 2011), both of which have their ownWordNets mapped to English WordNet.
Although many of the annotated expressions in ex-isting supersense datasets contain multiple words, therelationship between MWEs and supersenses has notreceived much attention. (Piao et al. (2003, 2005) didinvestigate MWEs in the context of a lexical taggerwith a finer-grained taxonomy of semantic classes.)Consider these examples from online reviews:(1) IT IS NOT A HIGH END STEAK HOUSE
(2) The white pages allowed me to get in touch withparents of my high school friends so that I couldtrack people down one by one
HIGH END functions as a unit to mean ‘sophis-ticated, expensive’. (It is not in WordNet, though
NLTK (Bird et al., 2009) returns a synset’s lexicographer file.A subtle difference is that a special file called noun.Tops
contains each noun supersense’s root synset (e.g., group.n.01for N:GROUP) as well as a few miscellaneous synsets, such asliving_thing.n.01, that are too abstract to fall under any singlesupersense. Following Ciaramita and Altun (2006), we treat thelatter cases under an N:OTHER supersense category and mergethe former under their respective supersense.
Noun Verb
GROUP 1469 place STATIVE 2922 isPERSON 1202 people COGNITION 1093 knowARTIFACT 971 car COMMUNIC.∗ 974 recommendCOGNITION 771 way SOCIAL 944 useFOOD 766 food MOTION 602 goACT 700 service POSSESSION 309 payLOCATION 638 area CHANGE 274 fixTIME 530 day EMOTION 249 loveEVENT 431 experience PERCEPTION 143 seeCOMMUNIC.∗ 417 review CONSUMPTION 93 havePOSSESSION 339 price BODY 82 get. . . doneATTRIBUTE 205 quality CREATION 64 cookQUANTITY 102 amount CONTACT 46 putANIMAL 88 dog COMPETITION 11 winBODY 87 hair WEATHER 0 —STATE 56 pain all 15 VSSTs 7806NATURAL OBJ. 54 flowerRELATION 35 portion N/A (see §3.2)SUBSTANCE 34 oil `a 1191 haveFEELING 34 discomfort ` 821 anyonePROCESS 28 process `j 54 friedMOTIVE 25 reasonPHENOMENON 23 result ∗COMMUNIC.
is short forCOMMUNICATION
SHAPE 6 squarePLANT 5 treeOTHER 2 stuffall 26 NSSTs 9018
Table 1: Summary of noun and verb supersense cate-gories. Each entry shows the label along with the countand most frequent lexical item in the STREUSLE corpus.
enrich the MWE annotations of the CMWE corpus1
(Schneider et al., 2014b), are publicly released underthe name STREUSLE.2 This includes new guidelinesfor verb supersense annotation. Our open-sourcetagger, implemented in Python, is available from thatpage as well.
2 Background: Supersense Tags
WordNet’s supersense categories are the top-levelhypernyms in the taxonomy (sometimes known assemantic fields) which are designed to be broadenough to encompass all nouns and verbs (Miller,1990; Fellbaum, 1990).3
1http://www.ark.cs.cmu.edu/LexSem/2Supersense-Tagged Repository of English with a Unified
Semantics for Lexical Expressions3WordNet synset entries were originally partitioned into
lexicographer files for these coarse categories, which becameknown as “supersenses.” The lexname function in WordNet/
The 26 noun and 15 verb supersense categories arelisted with examples in table 1. Some of the namesoverlap between the noun and verb inventories, butthey are to be considered separate categories; here-after, we will distinguish the noun and verb categorieswith prefixes, e.g. N:COGNITION vs. V:COGNITION.
Though WordNet synsets are associated with lex-ical entries, the supersense categories are unlexical-ized. The N:PERSON category, for instance, containssynsets for both principal and student. A differentsense of principal falls under N:POSSESSION.
As far as we are aware, the supersenses wereoriginally intended only as a method of organizingthe WordNet structure. But Ciaramita and Johnson(2003) pioneered the coarse word sense disambigua-tion task of supersense tagging, noting that the su-persense categories provided a natural broadeningof the traditional named entity categories to encom-pass all nouns. Ciaramita and Altun (2006) laterexpanded the task to include all verbs, and applieda supervised sequence modeling framework adaptedfrom NER. Evaluation was against manually sense-tagged data that had been automatically converted tothe coarser supersenses. Similar taggers have sincebeen built for Italian (Picca et al., 2008) and Chi-nese (Qiu et al., 2011), both of which have their ownWordNets mapped to English WordNet.
Although many of the annotated expressions in ex-isting supersense datasets contain multiple words, therelationship between MWEs and supersenses has notreceived much attention. (Piao et al. (2003, 2005) didinvestigate MWEs in the context of a lexical taggerwith a finer-grained taxonomy of semantic classes.)Consider these examples from online reviews:(1) IT IS NOT A HIGH END STEAK HOUSE
(2) The white pages allowed me to get in touch withparents of my high school friends so that I couldtrack people down one by one
HIGH END functions as a unit to mean ‘sophis-ticated, expensive’. (It is not in WordNet, though
NLTK (Bird et al., 2009) returns a synset’s lexicographer file.A subtle difference is that a special file called noun.Tops
contains each noun supersense’s root synset (e.g., group.n.01for N:GROUP) as well as a few miscellaneous synsets, such asliving_thing.n.01, that are too abstract to fall under any singlesupersense. Following Ciaramita and Altun (2006), we treat thelatter cases under an N:OTHER supersense category and mergethe former under their respective supersense.
Noun Verb
GROUP 1469 place STATIVE 2922 isPERSON 1202 people COGNITION 1093 knowARTIFACT 971 car COMMUNIC.∗ 974 recommendCOGNITION 771 way SOCIAL 944 useFOOD 766 food MOTION 602 goACT 700 service POSSESSION 309 payLOCATION 638 area CHANGE 274 fixTIME 530 day EMOTION 249 loveEVENT 431 experience PERCEPTION 143 seeCOMMUNIC.∗ 417 review CONSUMPTION 93 havePOSSESSION 339 price BODY 82 get. . . doneATTRIBUTE 205 quality CREATION 64 cookQUANTITY 102 amount CONTACT 46 putANIMAL 88 dog COMPETITION 11 winBODY 87 hair WEATHER 0 —STATE 56 pain all 15 VSSTs 7806NATURAL OBJ. 54 flowerRELATION 35 portion N/A (see §3.2)SUBSTANCE 34 oil `a 1191 haveFEELING 34 discomfort ` 821 anyonePROCESS 28 process `j 54 friedMOTIVE 25 reasonPHENOMENON 23 result ∗COMMUNIC.
is short forCOMMUNICATION
SHAPE 6 squarePLANT 5 treeOTHER 2 stuffall 26 NSSTs 9018
Table 1: Summary of noun and verb supersense cate-gories. Each entry shows the label along with the countand most frequent lexical item in the STREUSLE corpus.
enrich the MWE annotations of the CMWE corpus1
(Schneider et al., 2014b), are publicly released underthe name STREUSLE.2 This includes new guidelinesfor verb supersense annotation. Our open-sourcetagger, implemented in Python, is available from thatpage as well.
2 Background: Supersense Tags
WordNet’s supersense categories are the top-levelhypernyms in the taxonomy (sometimes known assemantic fields) which are designed to be broadenough to encompass all nouns and verbs (Miller,1990; Fellbaum, 1990).3
1http://www.ark.cs.cmu.edu/LexSem/2Supersense-Tagged Repository of English with a Unified
Semantics for Lexical Expressions3WordNet synset entries were originally partitioned into
lexicographer files for these coarse categories, which becameknown as “supersenses.” The lexname function in WordNet/
The 26 noun and 15 verb supersense categories arelisted with examples in table 1. Some of the namesoverlap between the noun and verb inventories, butthey are to be considered separate categories; here-after, we will distinguish the noun and verb categorieswith prefixes, e.g. N:COGNITION vs. V:COGNITION.
Though WordNet synsets are associated with lex-ical entries, the supersense categories are unlexical-ized. The N:PERSON category, for instance, containssynsets for both principal and student. A differentsense of principal falls under N:POSSESSION.
As far as we are aware, the supersenses wereoriginally intended only as a method of organizingthe WordNet structure. But Ciaramita and Johnson(2003) pioneered the coarse word sense disambigua-tion task of supersense tagging, noting that the su-persense categories provided a natural broadeningof the traditional named entity categories to encom-pass all nouns. Ciaramita and Altun (2006) laterexpanded the task to include all verbs, and applieda supervised sequence modeling framework adaptedfrom NER. Evaluation was against manually sense-tagged data that had been automatically converted tothe coarser supersenses. Similar taggers have sincebeen built for Italian (Picca et al., 2008) and Chi-nese (Qiu et al., 2011), both of which have their ownWordNets mapped to English WordNet.
Although many of the annotated expressions in ex-isting supersense datasets contain multiple words, therelationship between MWEs and supersenses has notreceived much attention. (Piao et al. (2003, 2005) didinvestigate MWEs in the context of a lexical taggerwith a finer-grained taxonomy of semantic classes.)Consider these examples from online reviews:(1) IT IS NOT A HIGH END STEAK HOUSE
(2) The white pages allowed me to get in touch withparents of my high school friends so that I couldtrack people down one by one
HIGH END functions as a unit to mean ‘sophis-ticated, expensive’. (It is not in WordNet, though
NLTK (Bird et al., 2009) returns a synset’s lexicographer file.A subtle difference is that a special file called noun.Tops
contains each noun supersense’s root synset (e.g., group.n.01for N:GROUP) as well as a few miscellaneous synsets, such asliving_thing.n.01, that are too abstract to fall under any singlesupersense. Following Ciaramita and Altun (2006), we treat thelatter cases under an N:OTHER supersense category and mergethe former under their respective supersense.
Tuesday, October 17, 17
Supersenses
• Aword’ssupersensecanbeausefulcoarse-grainedrepresenta6onofwordmeaningforNLPtasks
27
• See“STREUSEL”systemhpp://www.cs.cmu.edu/~ark/LexSem/
Tuesday, October 17, 17
• TouseWordNet,oranylexicaldatabase,forNLP:• 1.WordSenseDisambigua6on(WSD)a.k.a.En6tyLinking• The[bank]3wasopenearly.
• 2.Uselexicalentryinforma6onforfeaturesorinferences• Whenwasthatbusinessopen?• [bank]3<-hypo-[commercialins6tu6on]
28
Tuesday, October 17, 17
WordNet3.0
• Whereitis:• hpp://wordnetweb.princeton.edu/perl/webwn
• Libraries• Python:WordNetfromNLTK• Java:JWNL,extJWNL
Tuesday, October 17, 17
• MeSH(MedicalSubjectHeadings)• 177,000entrytermsthatcorrespondto26,142biomedical“headings”
• HemoglobinsEntryTerms:Eryhem,FerrousHemoglobin,HemoglobinDefini(on:Theoxygen-carryingproteinsofERYTHROCYTES.Theyarefoundinallvertebratesandsomeinvertebrates.Thenumberofglobinsubunitsinthehemoglobinquaternarystructurediffersbetweenspecies.Structuresrangefrommonomerictoavarietyofmul6mericarrangements
MeSH:MedicalSubjectHeadingsthesaurusfromtheNa6onalLibraryofMedicine
Tuesday, October 17, 17
Synset
• MeSH(MedicalSubjectHeadings)• 177,000entrytermsthatcorrespondto26,142biomedical“headings”
• HemoglobinsEntryTerms:Eryhem,FerrousHemoglobin,HemoglobinDefini(on:Theoxygen-carryingproteinsofERYTHROCYTES.Theyarefoundinallvertebratesandsomeinvertebrates.Thenumberofglobinsubunitsinthehemoglobinquaternarystructurediffersbetweenspecies.Structuresrangefrommonomerictoavarietyofmul6mericarrangements
MeSH:MedicalSubjectHeadingsthesaurusfromtheNa6onalLibraryofMedicine
Tuesday, October 17, 17
UsesoftheMeSHOntology
• Providesynonyms(“entryterms”)• E.g.,glucoseanddextrose
• Providehypernyms(fromthehierarchy)• E.g.,glucoseISAmonosaccharide
• IndexinginMEDLINE/PubMEDdatabase• NLM’sbibliographicdatabase:• 20millionjournalar6cles• Eachar6clehand-assigned10-20MeSHterms
Tuesday, October 17, 17
En(ty-focusedlexicaldatabases
• General-domain,derivedfromWikipedia• DBpediahpp://wiki.dbpedia.org/• Freebase:nowdiscon6nued
• GoogleKnowledgeGraphandotherproprietarydatabases(Bing,Facebook,etc.)• Lotsofrela6ons/apributes,aimedforconsumerinternetuse• Canbeuseddirectlytoanswerqueries• Internally,theyen6ty-linkdocumenttextsagainstthem
33
Tuesday, October 17, 17
WordSimilarity
• Synonymy:abinaryrela6on• Twowordsareeithersynonymousornot
• Similarity(ordistance):aloosermetric• Twowordsaremoresimilariftheysharemorefeaturesofmeaning
• Similarityisproperlyarela6onbetweensenses• Theword“bank”isnotsimilartotheword“slope”• Bank1issimilartofund3
• Bank2issimilartoslope5
• Butwe’llcomputesimilarityoverbothwordsandsensesTuesday, October 17, 17
Whywordsimilarity
• Aprac6calcomponentinlotsofNLPtasks• Ques6onanswering• Naturallanguagegenera6on• Automa6cessaygrading• Plagiarismdetec6on
• Atheore6calcomponentinmanylinguis6candcogni6vetasks• Historicalseman6cs• Modelsofhumanwordlearning• Morphologyandgrammarinduc6on
Tuesday, October 17, 17
Wordsimilarityandwordrelatedness
• Weo_endis6nguishwordsimilarityfromwordrelatedness• Similarwords:near-synonyms• Relatedwords:canberelatedanyway• car, bicycle:similar• car, gasoline:related,notsimilar
Tuesday, October 17, 17
Twoclassesofsimilarityalgorithms
• Thesaurus-basedalgorithms• Arewords“nearby”inhypernymhierarchy?• Dowordshavesimilarglosses(defini6ons)?
• Distribu6onalalgorithms• Dowordshavesimilardistribu6onalcontexts?• Distribu6onal(Vector)seman6csonThursday!
Tuesday, October 17, 17
38
Path*based*similarity
• Two%concepts%(senses/synsets)%are%similar%if%they%are%near%each%other%in%the%thesaurus%hierarchy%• =have%a%short%path%between%them• concepts%have%path%1%to%themselves
Tuesday, October 17, 17
Evalua6ngsimilarity
• Extrinsic(task-based,end-to-end)Evalua6on:• Ques6onAnswering• SpellChecking• Essaygrading
• IntrinsicEvalua6on:• Correla6onbetweenalgorithmandhumanwordsimilarityra6ngs• Wordsim353:353nounpairsrated0-10.sim(plane,car)=5.77
• TakingTOEFLmul6ple-choicevocabularytests• Levied is closest in meaning to: imposed, believed, requested, correlated
Tuesday, October 17, 17