Word Similarity: Distributional Similarity...
TRANSCRIPT
Word Meaning and Similarity
Word Similarity: Distributional Similarity (I)
Problems with thesaurus-based meaning
• We don't have a thesaurus for every language
• Even if we do, they have problems with recall
  • Many words are missing
  • Most (if not all) phrases are missing
  • Some connections between senses are missing
  • Thesauri work less well for verbs and adjectives
    • Adjectives and verbs have less structured hyponymy relations
Distributional models of meaning
• Also called vector-space models of meaning
• Offer much higher recall than hand-built thesauri
  • Although they tend to have lower precision
• Zellig Harris (1954): "oculist and eye-doctor … occur in almost the same environments…. If A and B have almost identical environments we say that they are synonyms."
• Firth (1957): "You shall know a word by the company it keeps!"
Intuition of distributional word similarity
• Nida example:
  A bottle of tesgüino is on the table
  Everybody likes tesgüino
  Tesgüino makes you drunk
  We make tesgüino out of corn.
• From context words humans can guess tesgüino means
  • an alcoholic beverage like beer
• Intuition for algorithm:
  • Two words are similar if they have similar word contexts.
Reminder: Term-document matrix
• Each cell: count of term t in a document d: tf_{t,d}
• Each document is a count vector in ℕ^V: a column below

             As You Like It   Twelfth Night   Julius Caesar   Henry V
battle              1               1                8           15
soldier             2               2               12           36
fool               37              58                1            5
clown               6             117                0            0
• Two documents are similar if their vectors are similar: compare two columns of the table above
The words in a term-document matrix
• Each word is a count vector in ℕ^D: a row of the table above
• Two words are similar if their vectors are similar
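As a minimal sketch (not from the lecture), here is the Shakespeare table as a NumPy matrix, making concrete that documents are columns and words are rows:

```python
# The Shakespeare counts above as a word-by-document count matrix.
import numpy as np

words = ["battle", "soldier", "fool", "clown"]
docs  = ["As You Like It", "Twelfth Night", "Julius Caesar", "Henry V"]

F = np.array([
    [ 1,   1,  8, 15],   # battle
    [ 2,   2, 12, 36],   # soldier
    [37,  58,  1,  5],   # fool
    [ 6, 117,  0,  0],   # clown
])

print(F[:, docs.index("Henry V")])   # a document is a column vector
print(F[words.index("fool"), :])     # a word is a row vector
```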
The Term-Context matrix
• Instead of using entire documents, use smaller contexts
  • Paragraph
  • Window of 10 words
• A word is now defined by a vector over counts of context words
Sample contexts: 20 words (Brown corpus)
• equal amount of sugar, a sliced lemon, a tablespoonful of apricot preserve or jam, a pinch each of clove and nutmeg,
• on board for their enjoyment. Cautiously she sampled her first pineapple and another fruit whose taste she likened to that of
• of a recursive type well suited to programming on the digital computer. In finding the optimal R-stage policy from that of
• substantially affect commerce, for the purpose of gathering data and information necessary for the study authorized in the first section of this
Term-context matrix for word similarity
• Two words are similar in meaning if their context vectors are similar

             aardvark  computer  data  pinch  result  sugar  …
apricot          0         0       0      1      0       1
pineapple        0         0       0      1      0       1
digital          0         2       1      0      1       0
information      0         1       6      0      4       0
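A sketch of how such a matrix is counted from running text; the toy sentence and the 4-word window are illustrative assumptions, not from the slides:

```python
# Count context words within +/- `window` tokens of each target word.
from collections import defaultdict

def term_context_counts(tokens, window=4):
    counts = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[w][tokens[j]] += 1
    return counts

tokens = "a tablespoonful of apricot preserve and a pinch of sugar".split()
counts = term_context_counts(tokens)
print(dict(counts["apricot"]))   # neighbors within 4 tokens of 'apricot'
```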
Should we use raw counts?
• For the term-document matrix
  • We used tf-idf instead of raw term counts
• For the term-context matrix
  • Positive Pointwise Mutual Information (PPMI) is common
Pointwise Mutual Information
• Pointwise mutual information:
  • Do events x and y co-occur more than if they were independent?

$$\mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$

• PMI between two words (Church & Hanks 1989):
  • Do words x and y co-occur more than if they were independent?

$$\mathrm{PMI}(\mathit{word}_1, \mathit{word}_2) = \log_2 \frac{P(\mathit{word}_1, \mathit{word}_2)}{P(\mathit{word}_1)\,P(\mathit{word}_2)}$$

• Positive PMI between two words (Niwa & Nitta 1994):
  • Replace all PMI values less than 0 with zero
Computing PPMI on a term-context matrix
• Matrix F with W rows (words) and C columns (contexts)
• f_{ij} is the number of times w_i occurs in context c_j

$$p_{ij} = \frac{f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}} \qquad p_{i*} = \frac{\sum_{j=1}^{C} f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}} \qquad p_{*j} = \frac{\sum_{i=1}^{W} f_{ij}}{\sum_{i=1}^{W}\sum_{j=1}^{C} f_{ij}}$$

$$\mathrm{pmi}_{ij} = \log_2 \frac{p_{ij}}{p_{i*}\,p_{*j}} \qquad \mathrm{ppmi}_{ij} = \begin{cases}\mathrm{pmi}_{ij} & \text{if } \mathrm{pmi}_{ij} > 0\\[2pt] 0 & \text{otherwise}\end{cases}$$
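A direct NumPy translation of these formulas, as a sketch (the function name `ppmi` is mine, not from the lecture):

```python
# PPMI of a W x C count matrix F, following the formulas above.
import numpy as np

def ppmi(F):
    F = np.asarray(F, dtype=float)
    total = F.sum()                              # sum over all f_ij
    p_ij = F / total                             # joint probabilities
    p_i = F.sum(axis=1, keepdims=True) / total   # row marginals p_{i*}
    p_j = F.sum(axis=0, keepdims=True) / total   # column marginals p_{*j}
    with np.errstate(divide="ignore"):           # log2(0) -> -inf, clipped below
        pmi = np.log2(p_ij / (p_i * p_j))
    return np.maximum(pmi, 0)                    # PPMI: negative values -> 0
```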
The marginals are computed over the total count N:

$$p(w_i) = \frac{\sum_{j=1}^{C} f_{ij}}{N} \qquad p(c_j) = \frac{\sum_{i=1}^{W} f_{ij}}{N}$$

For the term-context counts above (N = 19):

p(w = information, c = data) = 6/19 = .32
p(w = information) = 11/19 = .58
p(c = data) = 7/19 = .37

p(w, context)                                         p(w)
             computer   data   pinch  result  sugar
apricot        0.00     0.00    0.05   0.00    0.05   0.11
pineapple      0.00     0.00    0.05   0.00    0.05   0.11
digital        0.11     0.05    0.00   0.05    0.00   0.21
information    0.05     0.32    0.00   0.21    0.00   0.58
p(context)     0.16     0.37    0.11   0.26    0.11
$$\mathrm{pmi}_{ij} = \log_2 \frac{p_{ij}}{p_{i*}\,p_{*j}}$$

• pmi(information, data) = log2(.32 / (.37 × .58)) = .57

PPMI(w, context)
             computer   data   pinch  result  sugar
apricot         -        -     2.25     -     2.25
pineapple       -        -     2.25     -     2.25
digital        1.66     0.00     -     0.00     -
information    0.00     0.57     -     0.47     -

(- marks zero-count cells, whose PPMI is 0)
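Running the `ppmi()` sketch from above on the count matrix (rows apricot, pineapple, digital, information; columns computer, data, pinch, result, sugar) reproduces this table:

```python
# Counts from the term-context matrix above (N = 19).
import numpy as np

F = np.array([
    [0, 0, 1, 0, 1],   # apricot
    [0, 0, 1, 0, 1],   # pineapple
    [2, 1, 0, 1, 0],   # digital
    [1, 6, 0, 4, 0],   # information
])
print(np.round(ppmi(F), 2))
# the "information" row comes out [0.  0.57 0.  0.47 0.], matching the table
```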
Weighting PMI
• PMI is biased toward infrequent events
  • Various weighting schemes help alleviate this
  • See Turney and Pantel (2010)
• Add-one smoothing can also help
Add-2 smoothed count(w, context)
             computer   data   pinch  result  sugar
apricot         2         2      3      2       3
pineapple       2         2      3      2       3
digital         4         3      2      3       2
information     3         8      2      6       2

p(w, context) [add-2]                                 p(w)
             computer   data   pinch  result  sugar
apricot        0.03      0.03   0.05   0.03   0.05    0.20
pineapple      0.03      0.03   0.05   0.03   0.05    0.20
digital        0.07      0.05   0.03   0.05   0.03    0.24
information    0.05      0.14   0.03   0.10   0.03    0.36
p(context)     0.19      0.25   0.17   0.22   0.17
PPMI(w, context) [add-2]
             computer   data   pinch  result  sugar
apricot        0.00      0.00   0.56   0.00   0.56
pineapple      0.00      0.00   0.56   0.00   0.56
digital        0.62      0.00   0.00   0.00   0.00
information    0.00      0.58   0.00   0.37   0.00

PPMI(w, context) [unsmoothed, for comparison]
             computer   data   pinch  result  sugar
apricot         -        -     2.25     -     2.25
pineapple       -        -     2.25     -     2.25
digital        1.66     0.00     -     0.00     -
information    0.00     0.57     -     0.47     -
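The add-2 tables can be reproduced by adding the pseudocount before the same PPMI computation, continuing the sketch above (`F` and `ppmi` as defined there):

```python
# Add-2 smoothing: a pseudocount added to every cell of F.
print(np.round(ppmi(F + 2), 2))
# Inflated scores for rare events shrink (apricot-pinch: 2.25 -> 0.56),
# while frequent events barely move (information-data: 0.57 -> 0.58).
```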
Using syntax to define a word's context
• Zellig Harris (1968): "The meaning of entities, and the meaning of grammatical relations among them, is related to the restriction of combinations of these entities relative to other entities"
• Two words are similar if they have similar parse contexts
• Duty and responsibility (Chris Callison-Burch's example):

Modified by adjectives:  additional, administrative, assumed, collective, congressional, constitutional …
Objects of verbs:        assert, assign, assume, attend to, avoid, become, breach …
Co-occurrence vectors based on syntactic dependencies
• The contexts C are different dependency relations
  • Subject-of "absorb"
  • Prepositional-object of "inside"
• Counts for the word cell (table from Dekang Lin, 1998, "Automatic Retrieval and Clustering of Similar Words"; not reproduced here)
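As a sketch of how such dependency contexts might be extracted today (spaCy and its en_core_web_sm model are my choice here, not the lecture's; Lin 1998 used his own parser):

```python
# Count (word, relation-of, head) triples as contexts using a dependency parse.
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")   # assumes this model is installed

def dependency_contexts(text):
    contexts = Counter()
    for token in nlp(text):
        if token.dep_ != "ROOT":
            # e.g. ("wine", "dobj-of", "drink")
            contexts[(token.text.lower(), token.dep_ + "-of",
                      token.head.text.lower())] += 1
    return contexts

print(dependency_contexts("We drink wine. You drink it."))
```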
PMI applied to dependency relations
• "Drink it" is more common than "drink wine"
• But "wine" is a better "drinkable" thing than "it"

Object of "drink"   Count   PMI
it                    3      1.3
anything              3      5.2
wine                  2      9.3
tea                   2     11.8
liquid                2     10.5

Sorted by PMI:

Object of "drink"   Count   PMI
tea                   2     11.8
liquid                2     10.5
wine                  2      9.3
anything              3      5.2
it                    3      1.3

Hindle, Don. 1990. Noun Classification from Predicate-Argument Structure. ACL.
Word Meaning and Similarity
Word Similarity: Distributional Similarity (II)
Reminder: cosine for computing similarity

$$\cos(\vec v, \vec w) = \frac{\vec v \cdot \vec w}{|\vec v|\,|\vec w|} = \frac{\vec v}{|\vec v|} \cdot \frac{\vec w}{|\vec w|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\;\sqrt{\sum_{i=1}^{N} w_i^2}}$$

• The numerator is the dot product; dividing each vector by its length gives unit vectors
• v_i is the PPMI value for word v in context i; w_i is the PPMI value for word w in context i
• cos(v, w) is the cosine similarity of v and w
Cosine as a similarity metric
• -1: vectors point in opposite directions
• +1: vectors point in the same direction
• 0: vectors are orthogonal
• Raw frequency or PPMI values are non-negative, so cosine ranges 0-1
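A one-function sketch of the cosine formula (the function name is mine):

```python
# Cosine similarity of two count (or PPMI) vectors.
import numpy as np

def cosine(v, w):
    v = np.asarray(v, dtype=float)
    w = np.asarray(w, dtype=float)
    return (v @ w) / (np.linalg.norm(v) * np.linalg.norm(w))
```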
Which pair of words is more similar?

             large   data   computer
apricot        1       0       0
digital        0       1       2
information    1       6       1

$$\mathrm{cosine}(\textit{apricot}, \textit{information}) = \frac{1 + 0 + 0}{\sqrt{1 + 0 + 0}\;\sqrt{1 + 36 + 1}} = \frac{1}{\sqrt{38}} = .16$$

$$\mathrm{cosine}(\textit{digital}, \textit{information}) = \frac{0 + 6 + 2}{\sqrt{0 + 1 + 4}\;\sqrt{1 + 36 + 1}} = \frac{8}{\sqrt{5}\,\sqrt{38}} = .58$$

$$\mathrm{cosine}(\textit{apricot}, \textit{digital}) = \frac{0 + 0 + 0}{\sqrt{1 + 0 + 0}\;\sqrt{0 + 1 + 4}} = 0$$
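The same numbers fall out of the `cosine()` sketch above, with contexts ordered (large, data, computer):

```python
print(round(cosine([1, 0, 0], [1, 6, 1]), 2))  # apricot vs information: 0.16
print(round(cosine([0, 1, 2], [1, 6, 1]), 2))  # digital vs information: 0.58
print(round(cosine([1, 0, 0], [0, 1, 2]), 2))  # apricot vs digital: 0.0
```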
Other possible similarity measures
• D: KL divergence [the remaining formulas on this slide were not recovered]