Word Similarity: Distributional Similarity...
TRANSCRIPT
Word Meaning and Similarity
Word Similarity: Distributional Similarity (I)
Problems with thesaurus-based meaning
• We don't have a thesaurus for every language
• Even if we do, they have problems with recall
  • Many words are missing
  • Most (if not all) phrases are missing
  • Some connections between senses are missing
  • Thesauri work less well for verbs and adjectives
    • Adjectives and verbs have less structured hyponymy relations
Distributional models of meaning
• Also called vector-space models of meaning
• Offer much higher recall than hand-built thesauri
  • Although they tend to have lower precision
• Zellig Harris (1954): "oculist and eye-doctor … occur in almost the same environments…. If A and B have almost identical environments we say that they are synonyms."
• Firth (1957): "You shall know a word by the company it keeps!"
Intuition of distributional word similarity
• Nida example:
  A bottle of tesgüino is on the table
  Everybody likes tesgüino
  Tesgüino makes you drunk
  We make tesgüino out of corn.
• From context words humans can guess tesgüino means
  • an alcoholic beverage like beer
• Intuition for algorithm:
  • Two words are similar if they have similar word contexts.
Reminder: Term-document matrix
• Each cell: count of term t in a document d: tf_{t,d}
• Each document is a count vector in ℕ^V: a column below

             As You Like It   Twelfth Night   Julius Caesar   Henry V
battle              1               1                8           15
soldier             2               2               12           36
fool               37              58                1            5
clown               6             117                0            0
• Two documents are similar if their vectors are similar: compare two columns of the table above
The words in a term-document matrix
• Each word is a count vector in ℕ^D: a row of the table above
• Two words are similar if their vectors are similar
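As a minimal sketch (not from the lecture), here is the Shakespeare table as a NumPy matrix, making concrete that documents are columns and words are rows:

```python
# The Shakespeare counts above as a word-by-document count matrix.
import numpy as np

words = ["battle", "soldier", "fool", "clown"]
docs  = ["As You Like It", "Twelfth Night", "Julius Caesar", "Henry V"]

F = np.array([
    [ 1,   1,  8, 15],   # battle
    [ 2,   2, 12, 36],   # soldier
    [37,  58,  1,  5],   # fool
    [ 6, 117,  0,  0],   # clown
])

print(F[:, docs.index("Henry V")])   # a document is a column vector
print(F[words.index("fool"), :])     # a word is a row vector
```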
The Term-Context matrix
• Instead of using entire documents, use smaller contexts
  • Paragraph
  • Window of 10 words
• A word is now defined by a vector over counts of context words
Sample contexts: 20 words (Brown corpus)
• equal amount of sugar, a sliced lemon, a tablespoonful of apricot preserve or jam, a pinch each of clove and nutmeg,
• on board for their enjoyment. Cautiously she sampled her first pineapple and another fruit whose taste she likened to that of
• of a recursive type well suited to programming on the digital computer. In finding the optimal R-stage policy from that of
• substantially affect commerce, for the purpose of gathering data and information necessary for the study authorized in the first section of this
Term-context matrix for word similarity
• Two words are similar in meaning if their context vectors are similar

             aardvark  computer  data  pinch  result  sugar  …
apricot          0         0       0      1      0       1
pineapple        0         0       0      1      0       1
digital          0         2       1      0      1       0
information      0         1       6      0      4       0
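A sketch of how such a matrix is counted from running text; the toy sentence and the 4-word window are illustrative assumptions, not from the slides:

```python
# Count context words within +/- `window` tokens of each target word.
from collections import defaultdict

def term_context_counts(tokens, window=4):
    counts = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[w][tokens[j]] += 1
    return counts

tokens = "a tablespoonful of apricot preserve and a pinch of sugar".split()
counts = term_context_counts(tokens)
print(dict(counts["apricot"]))   # neighbors within 4 tokens of 'apricot'
```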
Should we use raw counts?
• For the term-document matrix
  • We used tf-idf instead of raw term counts
• For the term-context matrix
  • Positive Pointwise Mutual Information (PPMI) is common
Pointwise Mutual Information
• Pointwise mutual information:
  • Do events x and y co-occur more than if they were independent?

$$\mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$

• PMI between two words (Church & Hanks 1989):
  • Do words x and y co-occur more than if they were independent?

$$\mathrm{PMI}(\mathit{word}_1, \mathit{word}_2) = \log_2 \frac{P(\mathit{word}_1, \mathit{word}_2)}{P(\mathit{word}_1)\,P(\mathit{word}_2)}$$

• Positive PMI between two words (Niwa & Nitta 1994):
  • Replace all PMI values less than 0 with zero
Computing PPMI on a term-context matrix
• Matrix F with W rows (words) and C columns (contexts)
• f_{ij} is the number of times w_i occurs in context c_j

$$p_{ij} = \frac{f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}} \qquad p_{i*} = \frac{\sum_{j=1}^{C} f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}} \qquad p_{*j} = \frac{\sum_{i=1}^{W} f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}}$$

$$\mathrm{pmi}_{ij} = \log_2 \frac{p_{ij}}{p_{i*}\,p_{*j}} \qquad \mathrm{ppmi}_{ij} = \begin{cases}\mathrm{pmi}_{ij} & \text{if } \mathrm{pmi}_{ij} > 0\\[2pt] 0 & \text{otherwise}\end{cases}$$
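A direct NumPy translation of these formulas, as a sketch (the function name `ppmi` is mine, not from the lecture):

```python
# PPMI of a W x C count matrix F, following the formulas above.
import numpy as np

def ppmi(F):
    F = np.asarray(F, dtype=float)
    total = F.sum()                              # sum over all f_ij
    p_ij = F / total                             # joint probabilities
    p_i = F.sum(axis=1, keepdims=True) / total   # row marginals p_{i*}
    p_j = F.sum(axis=0, keepdims=True) / total   # column marginals p_{*j}
    with np.errstate(divide="ignore"):           # log2(0) -> -inf, clipped below
        pmi = np.log2(p_ij / (p_i * p_j))
    return np.maximum(pmi, 0)                    # PPMI: negative values -> 0
```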
The marginals are computed over the total count N:

$$p(w_i) = \frac{\sum_{j=1}^{C} f_{ij}}{N} \qquad p(c_j) = \frac{\sum_{i=1}^{W} f_{ij}}{N}$$

For the term-context counts above (N = 19):

p(w = information, c = data) = 6/19 = .32
p(w = information) = 11/19 = .58
p(c = data) = 7/19 = .37

p(w, context)                                         p(w)
             computer   data   pinch  result  sugar
apricot        0.00     0.00    0.05   0.00    0.05   0.11
pineapple      0.00     0.00    0.05   0.00    0.05   0.11
digital        0.11     0.05    0.00   0.05    0.00   0.21
information    0.05     0.32    0.00   0.21    0.00   0.58
p(context)     0.16     0.37    0.11   0.26    0.11
$$\mathrm{pmi}_{ij} = \log_2 \frac{p_{ij}}{p_{i*}\,p_{*j}}$$

• pmi(information, data) = log2(.32 / (.37 × .58)) = .57

PPMI(w, context)
             computer   data   pinch  result  sugar
apricot         -        -     2.25     -     2.25
pineapple       -        -     2.25     -     2.25
digital        1.66     0.00     -     0.00     -
information    0.00     0.57     -     0.47     -

(- marks zero-count cells, whose PPMI is 0)
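Running the `ppmi()` sketch from above on the count matrix (rows apricot, pineapple, digital, information; columns computer, data, pinch, result, sugar) reproduces this table:

```python
# Counts from the term-context matrix above (N = 19).
import numpy as np

F = np.array([
    [0, 0, 1, 0, 1],   # apricot
    [0, 0, 1, 0, 1],   # pineapple
    [2, 1, 0, 1, 0],   # digital
    [1, 6, 0, 4, 0],   # information
])
print(np.round(ppmi(F), 2))
# the "information" row comes out [0.  0.57 0.  0.47 0.], matching the table
```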
Weighting PMI
• PMI is biased toward infrequent events
  • Various weighting schemes help alleviate this
  • See Turney and Pantel (2010)
• Add-one smoothing can also help
Add-2 smoothed count(w, context)
             computer   data   pinch  result  sugar
apricot         2         2      3      2       3
pineapple       2         2      3      2       3
digital         4         3      2      3       2
information     3         8      2      6       2

p(w, context) [add-2]                                 p(w)
             computer   data   pinch  result  sugar
apricot        0.03      0.03   0.05   0.03   0.05    0.20
pineapple      0.03      0.03   0.05   0.03   0.05    0.20
digital        0.07      0.05   0.03   0.05   0.03    0.24
information    0.05      0.14   0.03   0.10   0.03    0.36
p(context)     0.19      0.25   0.17   0.22   0.17
PPMI(w, context) [add-2]
             computer   data   pinch  result  sugar
apricot        0.00      0.00   0.56   0.00   0.56
pineapple      0.00      0.00   0.56   0.00   0.56
digital        0.62      0.00   0.00   0.00   0.00
information    0.00      0.58   0.00   0.37   0.00

PPMI(w, context) [unsmoothed, for comparison]
             computer   data   pinch  result  sugar
apricot         -        -     2.25     -     2.25
pineapple       -        -     2.25     -     2.25
digital        1.66     0.00     -     0.00     -
information    0.00     0.57     -     0.47     -
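The add-2 tables can be reproduced by adding the pseudocount before the same PPMI computation, continuing the sketch above (`F` and `ppmi` as defined there):

```python
# Add-2 smoothing: a pseudocount added to every cell of F.
print(np.round(ppmi(F + 2), 2))
# Inflated scores for rare events shrink (apricot-pinch: 2.25 -> 0.56),
# while frequent events barely move (information-data: 0.57 -> 0.58).
```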
Using syntax to define a word's context
• Zellig Harris (1968): "The meaning of entities, and the meaning of grammatical relations among them, is related to the restriction of combinations of these entities relative to other entities"
• Two words are similar if they have similar parse contexts
• Duty and responsibility (Chris Callison-Burch's example):

Modified by adjectives:  additional, administrative, assumed, collective, congressional, constitutional …
Objects of verbs:        assert, assign, assume, attend to, avoid, become, breach …
Co-occurrence vectors based on syntactic dependencies
• The contexts C are different dependency relations
  • Subject-of "absorb"
  • Prepositional-object of "inside"
• Counts for the word cell (table from Dekang Lin, 1998, "Automatic Retrieval and Clustering of Similar Words"; not reproduced here)
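As a sketch of how such dependency contexts might be extracted today (spaCy and its en_core_web_sm model are my choice here, not the lecture's; Lin 1998 used his own parser):

```python
# Count (word, relation-of, head) triples as contexts using a dependency parse.
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")   # assumes this model is installed

def dependency_contexts(text):
    contexts = Counter()
    for token in nlp(text):
        if token.dep_ != "ROOT":
            # e.g. ("wine", "dobj-of", "drink")
            contexts[(token.text.lower(), token.dep_ + "-of",
                      token.head.text.lower())] += 1
    return contexts

print(dependency_contexts("We drink wine. You drink it."))
```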
PMI applied to dependency relations
• "Drink it" is more common than "drink wine"
• But "wine" is a better "drinkable" thing than "it"

Object of "drink"   Count   PMI
it                    3      1.3
anything              3      5.2
wine                  2      9.3
tea                   2     11.8
liquid                2     10.5

Sorted by PMI:

Object of "drink"   Count   PMI
tea                   2     11.8
liquid                2     10.5
wine                  2      9.3
anything              3      5.2
it                    3      1.3

Hindle, Don. 1990. Noun Classification from Predicate-Argument Structure. ACL.
Word Meaning and Similarity
Word Similarity: Distributional Similarity (II)
Reminder: cosine for computing similarity

$$\cos(\vec v, \vec w) = \frac{\vec v \cdot \vec w}{|\vec v|\,|\vec w|} = \frac{\vec v}{|\vec v|} \cdot \frac{\vec w}{|\vec w|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\;\sqrt{\sum_{i=1}^{N} w_i^2}}$$

• The numerator is the dot product; dividing each vector by its length gives unit vectors
• v_i is the PPMI value for word v in context i; w_i is the PPMI value for word w in context i
• cos(v, w) is the cosine similarity of v and w
Cosine as a similarity metric
• -1: vectors point in opposite directions
• +1: vectors point in the same direction
• 0: vectors are orthogonal
• Raw frequency or PPMI values are non-negative, so cosine ranges 0-1
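A one-function sketch of the cosine formula (the function name is mine):

```python
# Cosine similarity of two count (or PPMI) vectors.
import numpy as np

def cosine(v, w):
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    return (v @ w) / (np.linalg.norm(v) * np.linalg.norm(w))
```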
Which pair of words is more similar?

             large   data   computer
apricot        1       0       0
digital        0       1       2
information    1       6       1

$$\mathrm{cosine}(\textit{apricot}, \textit{information}) = \frac{1 + 0 + 0}{\sqrt{1 + 0 + 0}\;\sqrt{1 + 36 + 1}} = \frac{1}{\sqrt{38}} = .16$$

$$\mathrm{cosine}(\textit{digital}, \textit{information}) = \frac{0 + 6 + 2}{\sqrt{0 + 1 + 4}\;\sqrt{1 + 36 + 1}} = \frac{8}{\sqrt{5}\,\sqrt{38}} = .58$$

$$\mathrm{cosine}(\textit{apricot}, \textit{digital}) = \frac{0 + 0 + 0}{\sqrt{1 + 0 + 0}\;\sqrt{0 + 1 + 4}} = 0$$
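The same numbers fall out of the `cosine()` sketch above, with contexts ordered (large, data, computer):

```python
print(round(cosine([1, 0, 0], [1, 6, 1]), 2))  # apricot vs information: 0.16
print(round(cosine([0, 1, 2], [1, 6, 1]), 2))  # digital vs information: 0.58
print(round(cosine([1, 0, 0], [0, 1, 2]), 2))  # apricot vs digital: 0.0
```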
Other possible similarity measures
• D: KL divergence [the remaining formulas on this slide were not recovered]