multi-relational latent semantic analysis
DESCRIPTION
Multi-Relational Latent Semantic Analysis. Kai-Wei Chang Joint work with Scott Wen-tau Yih, Chris Meek Microsoft Research. Natural Language Understanding. Build an intelligent system that can interact with human using natural language Research challenge Meaning representation of text - PowerPoint PPT PresentationTRANSCRIPT
Multi-Relational Latent Semantic Analysis
Kai-Wei ChangJoint work with Scott Wen-tau Yih, Chris MeekMicrosoft Research
Natural Language UnderstandingBuild an intelligent system that can interact with human using natural language
Research challengeMeaning representation of textSupport useful inferential tasks
Semantic word representation is the foundation
Language is compositionalWord is the basic semantic unit
Continuous Semantic Representations
A lot of popular methods for creating word vectors!
Vector Space Model [Salton & McGill 83]Latent Semantic Analysis [Deerwester+ 90]Latent Dirichlet Allocation [Blei+ 01]Deep Neural Networks [Collobert & Weston 08]
Encode term co-occurrence informationMeasure semantic similarity well
Continuous Semantic Representations
sunnyrainy
windycloudy
car
wheel
cab sad
joy
emotion
feeling
Semantics Needs More Than SimilarityTomorrow
will be rainy. Tomorrow
will be sunny.
rainy, sunny?
rainy, sunny?
Leverage Linguistic ResourcesCan’t we just use the existing linguistic resources?
Knowledge in these resources is never completeOften lack of degree of relations
Create a continuous semantic representation that
Leverages existing rich linguistic resourcesDiscovers new relationsEnables us to measure the degree of multiple relations (not just similarity)
RoadmapIntroductionBackground
Latent Semantic Analysis (LSA)Polarity Inducing LSA (PILSA)
Multi-Relational Latent Semantic Analysis (MRLSA)
Encoding multi-relational data in a tensorTensor decomposition & measuring degree of a relation
ExperimentsConclusions
RoadmapIntroductionBackground
Latent Semantic Analysis (LSA)Polarity Inducing LSA (PILSA)
Multi-Relational Latent Semantic Analysis (MRLSA)
Encoding multi-relational data in a tensorTensor decomposition & measuring degree of a relation
ExperimentsConclusions
Latent Semantic Analysis [Deerwester+ 1990]
Data representationEncode single-relational data in a matrix
Co-occurrence (e.g., from a general corpus)Synonyms (e.g., from a thesaurus)
FactorizationApply SVD to the matrix to find latent components
Measuring degree of relationCosine of latent vectors
Cosine Score
Encode Synonyms in MatrixInput: Synonyms from a thesaurus
Joyfulness: joy, gladdenSad: sorrow, sadden
joy gladden
sorrow sadden
goodwill
Group 1: “joyfulness”
1 1 0 0 0
Group 2: “sad” 0 0 1 1 0Group 3: “affection”
0 0 0 0 1
Target word: row-vector
Term: column-vector
SVD generalizes the original dataUncovers relationships not explicit in the thesaurusTerm vectors projected to -dim latent space
Word similarity: cosine of two column vectors in
Mapping to Latent Space via SVD
𝐖 𝐔 𝐕𝑇≈
𝑑×𝑛 𝑑×𝑘𝑘×𝑘 𝑘×𝑛
𝚺terms
Problem: Handling Two Opposite RelationsSynonyms & AntonymsLSA cannot distinguish antonyms [Landauer
2002]“Distinguishing synonyms and antonyms is still perceived as a difficult open problem.” [Poon & Domingos 09]
Polarity Inducing LSA [Yih, Zweig, Platt 2012]
Data representationEncode two opposite relations in a matrix using “polarity”
Synonyms & antonyms (e.g., from a thesaurus)
FactorizationApply SVD to the matrix to find latent components
Measuring degree of relationCosine of latent vectors
joy gladden
sorrow sadden
goodwill
Group 1: “joyfulness”
1 1 -1 -1 0
Group 2: “sad” -1 -1 1 1 0Group 3: “affection”
0 0 0 0 1
Encode Synonyms & Antonyms in MatrixJoyfulness: joy, gladden; sorrow, sadden
Sad: sorrow, sadden; joy, gladdenInducing polarity
Cosine Score:
Target word: row-vector
joy gladden
sorrow sadden
goodwill
Group 1: “joyfulness”
1 1 -1 -1 0
Group 2: “sad” -1 -1 1 1 0Group 3: “affection”
0 0 0 0 1
Encode Synonyms & Antonyms in MatrixJoyfulness: joy, gladden; sorrow, sadden
Sad: sorrow, sadden; joy, gladdenInducing polarityTarget word: row-
vector
Cosine Score:
Limitation of the matrix representationEach entry captures a particular type of relation between two entities, orTwo opposite relations with the polarity trick
Encoding other binary relationsIs-A (hyponym) – ostrich is a birdPart-whole – engine is a part of car
Problem: How to Handle More Relations?
Encode multiple relations in a 3-way tensor (3-dim array)!
Multi-Relational LSAData representation
Encode multiple relations in a tensorSynonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)
FactorizationApply tensor decomposition to the tensor to find latent components
Measuring degree of relationCosine of latent vectors after projection
Multi-Relational LSAData representation
Encode multiple relations in a tensorSynonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)
FactorizationApply tensor decomposition to the tensor to find latent components
Measuring degree of relationCosine of latent vectors after projection
Multi-Relational LSAData representation
Encode multiple relations in a tensorSynonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)
FactorizationApply tensor decomposition to the tensor to find latent components
Measuring degree of relationCosine of latent vectors after projection
Multi-Relational LSAData representation
Encode multiple relations in a tensorSynonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)
FactorizationApply tensor decomposition to the tensor to find latent components
Measuring degree of relationCosine of latent vectors after projection
Multi-Relational LSAData representation
Encode multiple relations in a tensorSynonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)
FactorizationApply tensor decomposition to the tensor to find latent components
Measuring degree of relationCosine of latent vectors after projection
Represent word relations using a tensorEach slice encodes a relation between terms and target words.
0 0 0 00 0 1 01 0 0 00 0 0 0
joy gladd
ensa
dden
feelin
gjoyfulness
gladdensad
anger
gladd
en
joy sadd
enfee
ling
joyfulnessgladden
sadanger
Synonym layer Antonym layer
1 1 0 01 1 0 00 0 1 00 0 0 0
Construct a tensor with two slices
Encode Multiple Relations in Tensor
Can encode multiple relations in the tensor
0 0 0 10 0 0 00 0 0 10 0 0 1
gladd
en
joy sadd
enfee
ling
joyfulnessgladden
sadanger
Hyponym layer
1 1 0 01 1 0 00 0 1 00 0 0 0
1 1 0 01 1 0 00 0 1 00 0 0 0
Encode Multiple Relations in Tensor
Multi-Relational LSAData representation
Encode multiple relations in a tensorSynonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)
FactorizationApply tensor decomposition to the tensor to find latent components
Measuring degree of relationCosine of latent vectors after projection
Derive a low-rank approximation to generalize the data and to discover unseen relationsApply Tucker decomposition and reformulate the results
Tensor Decomposition – Analogy to SVD
𝑤 1,𝑤
2,…,𝑤
𝑑
𝑡1 , 𝑡2 , …, 𝑡𝑛
~~ × ×
𝑡1 , 𝑡2 ,…, 𝑡𝑛
𝑟 𝑟𝑟
𝑟
𝑤 1,𝑤
2,…,𝑤
𝑑
latent representation of words
𝑤 1,𝑤
2,…,𝑤
𝑑
𝑡1 , 𝑡2 , …, 𝑡𝑛
~~ × ×
𝑡1 , 𝑡2 ,…, 𝑡𝑛
𝑟 𝑟𝑟
𝑟
Derive a low-rank approximation to generalize the data and to discover unseen relationsApply Tucker decomposition and reformulate the results
~~ × ×
𝑟𝑟
𝑅1 ,𝑅
2 ,…,𝑅
𝑚 𝑟
latent representation of words
latent representation of a relation
Tensor Decomposition – Analogy to SVD
Multi-Relational LSAData representation
Encode multiple relations in a tensorSynonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)
FactorizationApply tensor decomposition to the tensor to find latent components
Measuring degree of relationCosine of latent vectors after projection
Measure Degree of RelationSimilarity
Cosine of the latent vectors
Other relation (both symmetric and asymmetric)
Take the latent matrix of the pivot relation (synonym)Take the latent matrix of the relationCosine of the latent vectors after projection
𝑎𝑛𝑡 ( joy ,sadden )=cos (𝓦: , joy , 𝑠𝑦𝑛 ,𝓦: ,sadden ,𝑎𝑛𝑡 )
0 0 0 00 0 1 01 0 0 00 0 0 0
joy gladd
ensa
dden
fellin
gjoyfulness
gladdensad
anger
gladd
en
joy sadd
enfel
ling
joyfulnessgladden
sadanger
Synonym layer Antonym layer
1 1 0 01 1 0 00 0 1 00 0 0 0
Measure Degree of RelationRaw Representation
𝑎𝑛𝑡 ( joy ,sadden )=cos (𝓦: , joy , 𝑠𝑦𝑛 ,𝓦: ,sadden ,𝑎𝑛𝑡 )
0 0 0 00 0 1 01 0 0 00 0 0 0
joy gladd
ensa
dden
fellin
gjoyfulness
gladdensad
anger
gladd
en
joy sadd
enfel
ling
joyfulnessgladden
sadanger
Synonym layer Antonym layer
1 1 0 01 1 0 00 0 1 00 0 0 0
Measure Degree of RelationRaw Representation
𝐻𝑦𝑝𝑒𝑟 ( joy , feeling )=cos (𝑾 : , joy ,𝑠𝑦𝑛 ,𝑾 : ,feeling , h𝑦𝑝𝑒𝑟 )
joy gladd
ensa
dden
fellin
gjoyfulness
gladdensad
angerSynonym layer
1 0 0 01 1 0 00 0 1 00 0 0 0
Estimate the Degree of a RelationRaw Representation
0 0 0 10 0 0 00 0 0 10 0 0 1
gladd
en
joy sadd
enfee
ling
joyfulnessgladden
sadanger
Hypernym layer
𝑟𝑒𝑙 ( w 𝑖 ,w 𝑗 )=cos (𝑊 : , w 𝑖 , 𝑠𝑦𝑛 ,𝑊 : , w 𝑗 ,𝑟𝑒𝑙 )
Measure Degree of RelationRaw Representation
w 𝑗w𝑖
Synonym layer
The slice of the specific relation
Cos ( , )
~~ × ×
𝑣1 ,𝑣2, …,𝑣𝑛
𝑟𝑟
𝑆,𝑆
2 ,…,𝑆𝑚 𝑟~~
× ×
𝑟𝑒𝑙 ( w 𝑖 ,w 𝑗 )=cos (𝑺:, : ,𝑠𝑦𝑛𝐕𝑖 , :𝑇 ,𝑺: , :, 𝑟𝑒𝑙𝐕 𝑗 ,:
𝑇 )
Measure Degree of RelationLatent Representation
w 𝑗w𝑖
R 𝑠𝑦𝑛
R 𝑟𝑒𝑙 v 𝑗 v 𝑖
RoadmapIntroductionBackground
Latent Semantic Analysis (LSA)Polarity Inducing LSA (PILSA)
Multi-Relational Latent Semantic Analysis (MRLSA)
Encoding multi-relational data in a tensorTensor decomposition & measuring degree of a relation
ExperimentsConclusions
Experiment: Data for Building MRLSA ModelEncarta Thesaurus
Record synonyms and antonyms of target wordsVocabulary of 50k terms and 47k target words
WordNetHas synonym, antonym, hyponym, hypernym relationsVocabulary of 149k terms and 117k target words
Goals:MRLSA generalizes LSA to model multiple relationsImprove performance by combing heterogeneous data
Example Antonyms Output by MRLSA
Target High Score Wordsinanimate
alive, living, bodily, in-the-flesh, incarnate
alleviate exacerbate, make-worse, in-flame, amplify, stir-up
relish detest, abhor, abominate, despise, loathe
* Words in blue are antonyms listed in the Encarta thesaurus.
Results – GRE Antonym TestTask: GRE closest-opposite questions
Which is the closest opposite of adulterate?(a) renounce (b) forbid (c) purify (d) criticize (e) correct
1 2 3 40.5
0.550.6
0.650.7
0.750.8
0.85
0.64
0.56
0.74 0.77
Accu
racy
Example Hyponyms Output by MRLSATarget High Score Wordsbird ostrich, gamecock, nighthawk, amazon,
parrotautomobile
minivan, wagon, taxi, minicab, gypsy cab
vegetable buttercrunch, yellow turnip, romaine, chipotle, chilli
Results – Relational Similarity (SemEval-2012)Task: Class-Inclusion Relation ( is-a kind of )
Most/least illustrative word pairs(a) art:abstract (b) song:opera (c) footwear:boot (d) hair:brown
1 2 30.3
0.350.4
0.450.5
0.550.6
0.340.37
0.56
Accu
racy
ConclusionsContinuous semantic representation that
Leverages existing rich linguistic resourcesDiscovers new relationsEnables us to measure the degree of multiple relations
ApproachesBetter data representationMatrix/Tensor decomposition
Challenges & Future WorkCapture more types of knowledge in the modelSupport more sophisticated inferential tasks