Multi-Relational Latent Semantic Analysis
Kai-Wei Chang, joint work with Scott Wen-tau Yih and Chris Meek, Microsoft Research


Page 1: Multi-Relational Latent Semantic  Analysis

Multi-Relational Latent Semantic Analysis.

Kai-Wei Chang, joint work with Scott Wen-tau Yih and Chris Meek, Microsoft Research

Page 2: Multi-Relational Latent Semantic  Analysis

Natural Language Understanding
Build an intelligent system that can interact with humans using natural language.

Research challenge: a meaning representation of text that supports useful inferential tasks.

Semantic word representation is the foundation:

Language is compositional; the word is the basic semantic unit.

Page 3: Multi-Relational Latent Semantic  Analysis

Continuous Semantic Representations

Many popular methods for creating word vectors:

Vector Space Model [Salton & McGill 83]
Latent Semantic Analysis [Deerwester+ 90]
Latent Dirichlet Allocation [Blei+ 01]
Deep Neural Networks [Collobert & Weston 08]

These models encode term co-occurrence information and measure semantic similarity well.

Page 4: Multi-Relational Latent Semantic  Analysis

Continuous Semantic Representations

[Figure: words embedded in a continuous semantic space; related words cluster together: {sunny, rainy, windy, cloudy}, {car, wheel, cab}, {sad, joy, emotion, feeling}]

Page 5: Multi-Relational Latent Semantic  Analysis

Semantics Needs More Than Similarity

"Tomorrow will be rainy."
"Tomorrow will be sunny."

rainy and sunny appear in the same contexts. Are they similar, or opposite? Similarity alone cannot tell.

Page 6: Multi-Relational Latent Semantic  Analysis

Leverage Linguistic Resources
Can't we just use the existing linguistic resources?

Knowledge in these resources is never complete, and they often lack the degree (strength) of relations.

Goal: create a continuous semantic representation that

Leverages existing rich linguistic resources
Discovers new relations
Enables us to measure the degree of multiple relations (not just similarity)

Page 7: Multi-Relational Latent Semantic  Analysis

Roadmap
Introduction
Background: Latent Semantic Analysis (LSA), Polarity Inducing LSA (PILSA)
Multi-Relational Latent Semantic Analysis (MRLSA): encoding multi-relational data in a tensor; tensor decomposition & measuring the degree of a relation
Experiments
Conclusions


Page 9: Multi-Relational Latent Semantic  Analysis

Latent Semantic Analysis [Deerwester+ 1990]

Data representation: encode single-relational data in a matrix

Co-occurrence (e.g., from a general corpus)
Synonyms (e.g., from a thesaurus)

Factorization: apply SVD to the matrix to find latent components

Measuring degree of relation: cosine of latent vectors

Page 10: Multi-Relational Latent Semantic  Analysis

Encode Synonyms in Matrix
Input: synonyms from a thesaurus

Joyfulness: joy, gladden
Sad: sorrow, sadden

Each thesaurus group is a target word (row vector); each word is a term (column vector):

                       joy  gladden  sorrow  sadden  goodwill
Group 1 "joyfulness":   1      1       0       0        0
Group 2 "sad":          0      0       1       1        0
Group 3 "affection":    0      0       0       0        1

Degree of relation: cosine score of the vectors.

Page 11: Multi-Relational Latent Semantic  Analysis

Mapping to Latent Space via SVD

$\mathbf{W} \approx \mathbf{U}\,\boldsymbol{\Sigma}\,\mathbf{V}^{T}$, with $\mathbf{W}$: $d \times n$ (terms as columns), $\mathbf{U}$: $d \times k$, $\boldsymbol{\Sigma}$: $k \times k$, $\mathbf{V}^{T}$: $k \times n$

SVD generalizes the original data and uncovers relationships not explicit in the thesaurus. Term vectors are projected to a $k$-dimensional latent space.

Word similarity: cosine of two column vectors in $\boldsymbol{\Sigma}\mathbf{V}^{T}$.
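Since the slides give no implementation, here is a minimal numpy sketch of this LSA pipeline on the toy synonym matrix from the previous page (the data, the rank k, and the word list are the slides' toy example, not the actual experimental setup):

```python
import numpy as np

# Toy synonym matrix: rows = thesaurus groups (joyfulness, sad, affection),
# columns = terms (joy, gladden, sorrow, sadden, goodwill).
terms = ["joy", "gladden", "sorrow", "sadden", "goodwill"]
W = np.array([[1, 1, 0, 0, 0],
              [0, 0, 1, 1, 0],
              [0, 0, 0, 0, 1]], dtype=float)

# Rank-k SVD: W ~ U Sigma V^T.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
latent = np.diag(s[:k]) @ Vt[:k]          # term vectors = columns of Sigma V^T

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

i, j = terms.index("joy"), terms.index("gladden")
print(cosine(latent[:, i], latent[:, j]))  # ~1.0: synonyms score high
```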

Page 12: Multi-Relational Latent Semantic  Analysis

Problem: Handling Two Opposite Relations
Synonyms & antonyms

LSA cannot distinguish antonyms [Landauer 2002]
"Distinguishing synonyms and antonyms is still perceived as a difficult open problem." [Poon & Domingos 09]

Page 13: Multi-Relational Latent Semantic  Analysis

Polarity Inducing LSA [Yih, Zweig, Platt 2012]

Data representation: encode two opposite relations in a matrix using "polarity"

Synonyms & antonyms (e.g., from a thesaurus)

Factorization: apply SVD to the matrix to find latent components

Measuring degree of relation: cosine of latent vectors

Page 14: Multi-Relational Latent Semantic  Analysis

Encode Synonyms & Antonyms in Matrix
Joyfulness: joy, gladden; sorrow, sadden
Sad: sorrow, sadden; joy, gladden
(synonyms before the semicolon, antonyms after)

Inducing polarity: antonym entries are negated:

                       joy  gladden  sorrow  sadden  goodwill
Group 1 "joyfulness":   1      1      -1      -1        0
Group 2 "sad":         -1     -1       1       1        0
Group 3 "affection":    0      0       0       0        1

Target word: row vector. Degree of relation: cosine score; synonyms score positive, antonyms negative.
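A matching numpy sketch of the polarity trick, again on the slides' toy matrix (rank and data are illustrative): synonym pairs come out with cosine near +1, antonym pairs near -1.

```python
import numpy as np

terms = ["joy", "gladden", "sorrow", "sadden", "goodwill"]
# Polarity-inducing matrix: +1 for a group's synonyms, -1 for its antonyms.
W = np.array([[ 1,  1, -1, -1, 0],
              [-1, -1,  1,  1, 0],
              [ 0,  0,  0,  0, 1]], dtype=float)

U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
latent = np.diag(s[:k]) @ Vt[:k]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(latent[:, 0], latent[:, 1]))  # joy vs gladden -> ~ +1 (synonyms)
print(cosine(latent[:, 0], latent[:, 3]))  # joy vs sadden  -> ~ -1 (antonyms)
```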


Page 16: Multi-Relational Latent Semantic  Analysis

Problem: How to Handle More Relations?

Limitation of the matrix representation: each entry captures a particular type of relation between two entities, or two opposite relations with the polarity trick.

We also want to encode other binary relations:
Is-A (hyponym): ostrich is a bird
Part-whole: engine is a part of a car

Solution: encode multiple relations in a 3-way tensor (a 3-dimensional array)!

Page 17: Multi-Relational Latent Semantic  Analysis

Multi-Relational LSA
Data representation: encode multiple relations in a tensor; synonyms, antonyms, hyponyms (is-a), … (e.g., from a linguistic knowledge base)

Factorization: apply tensor decomposition to the tensor to find latent components

Measuring degree of relation: cosine of latent vectors after projection


Page 22: Multi-Relational Latent Semantic  Analysis

Encode Multiple Relations in Tensor
Represent word relations using a tensor: each slice encodes one relation between terms (columns) and target words (rows). Construct a tensor with two slices:

Synonym layer (rows: joyfulness, gladden, sad, anger; columns: joy, gladden, sadden, feeling):
joyfulness: 1 1 0 0
gladden:    1 1 0 0
sad:        0 0 1 0
anger:      0 0 0 0

Antonym layer (same rows and columns):
joyfulness: 0 0 0 0
gladden:    0 0 1 0
sad:        1 0 0 0
anger:      0 0 0 0

Page 23: Multi-Relational Latent Semantic  Analysis

Encode Multiple Relations in Tensor (cont.)
More relations can be encoded as further slices stacked on the synonym and antonym layers, e.g. a hyponym layer, as shown in the sketch below:

Hyponym layer (rows: joyfulness, gladden, sad, anger; columns: joy, gladden, sadden, feeling):
joyfulness: 0 0 0 1
gladden:    0 0 0 0
sad:        0 0 0 1
anger:      0 0 0 1
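A minimal numpy sketch of this tensor encoding; the axis convention (rows = target words, columns = terms, slices = relations) follows the slides, and the data is the toy example:

```python
import numpy as np

words = ["joyfulness", "gladden", "sad", "anger"]   # rows: target words
terms = ["joy", "gladden", "sadden", "feeling"]     # columns: terms
relations = ["syn", "ant", "hyper"]                 # slices: one per relation

# W[i, j, k] = 1 if relation k holds between target word i and term j.
W = np.zeros((len(words), len(terms), len(relations)))
W[:, :, 0] = [[1, 1, 0, 0],   # synonym layer
              [1, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 0]]
W[:, :, 1] = [[0, 0, 0, 0],   # antonym layer (gladden-sadden, sad-joy)
              [0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 0]]
W[:, :, 2] = [[0, 0, 0, 1],   # is-a layer ("joyfulness is a feeling", ...)
              [0, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1]]
```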


Page 25: Multi-Relational Latent Semantic  Analysis

Tensor Decomposition – Analogy to SVD
Derive a low-rank approximation to generalize the data and to discover unseen relations: apply Tucker decomposition and reformulate the results.

Each relation slice factors like an SVD: the $d \times n$ slice over target words $w_1, \ldots, w_d$ and terms $t_1, \ldots, t_n$ is approximated by a $d \times r$ matrix, an $r \times r$ core slice, and an $r \times n$ matrix whose columns are the latent representations of words.

Page 26: Multi-Relational Latent Semantic  Analysis

Tensor Decomposition – Analogy to SVD (cont.)

$\mathcal{W}_{:,:,k} \approx \mathbf{U}\,\mathbf{S}_k\,\mathbf{V}^{T}$ for each relation $R_1, R_2, \ldots, R_m$

The rows of $\mathbf{V}$ are the latent representations of words; each $r \times r$ core slice $\mathbf{S}_k$ is the latent representation of a relation.
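The slides do not spell out the factorization; below is a hedged numpy sketch using a truncated HOSVD, one common way to compute a Tucker decomposition (the ranks and helper names are illustrative, not the paper's exact algorithm). It recovers per-relation core slices S_k such that each relation slice factors as U S_k V^T:

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: rearrange a 3-way tensor into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker_hosvd(T, ranks):
    """Truncated HOSVD, one way to fit a Tucker model T ~ G x1 U x2 V x3 C."""
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    U, V, C = factors
    G = np.einsum('ijk,ia,jb,kc->abc', T, U, V, C)  # core tensor
    return G, U, V, C

# W: the 4x4x3 relation tensor from the previous sketch.
W = np.zeros((4, 4, 3))
W[:, :, 0] = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
W[:, :, 1] = [[0, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
W[:, :, 2] = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]]

# Full ranks give an exact fit on this toy tensor; in practice r << d,
# and the truncation is what generalizes to unseen relations.
G, U, V, C = tucker_hosvd(W, ranks=(4, 4, 3))
S = np.einsum('abc,kc->abk', G, C)   # one r x r core slice S_k per relation

print(np.linalg.norm(W[:, :, 0] - U @ S[:, :, 0] @ V.T))  # ~0 at full rank
```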


Page 28: Multi-Relational Latent Semantic  Analysis

Measure Degree of Relation
Similarity: cosine of the latent vectors.

Other relations (both symmetric and asymmetric): take the latent matrix of the pivot relation (synonym) and the latent matrix of the target relation, then compute the cosine of the latent vectors after projection.

Page 29: Multi-Relational Latent Semantic  Analysis

Measure Degree of Relation – Raw Representation

$ant(\text{joy}, \text{sadden}) = \cos\big(\mathcal{W}_{:,\text{joy},syn},\ \mathcal{W}_{:,\text{sadden},ant}\big)$

i.e., compare the joy column of the synonym layer with the sadden column of the antonym layer:

Synonym layer (rows: joyfulness, gladden, sad, anger; columns: joy, gladden, sadden, feeling):
joyfulness: 1 1 0 0
gladden:    1 1 0 0
sad:        0 0 1 0
anger:      0 0 0 0

Antonym layer (same rows and columns):
joyfulness: 0 0 0 0
gladden:    0 0 1 0
sad:        1 0 0 0
anger:      0 0 0 0


Page 31: Multi-Relational Latent Semantic  Analysis

Estimate the Degree of a Relation – Raw Representation

$hyper(\text{joy}, \text{feeling}) = \cos\big(\mathcal{W}_{:,\text{joy},syn},\ \mathcal{W}_{:,\text{feeling},hyper}\big)$

Synonym layer (rows: joyfulness, gladden, sad, anger; columns: joy, gladden, sadden, feeling):
joyfulness: 1 1 0 0
gladden:    1 1 0 0
sad:        0 0 1 0
anger:      0 0 0 0

Hypernym layer (same rows and columns):
joyfulness: 0 0 0 1
gladden:    0 0 0 0
sad:        0 0 0 1
anger:      0 0 0 1

Page 32: Multi-Relational Latent Semantic  Analysis

Measure Degree of Relation – Raw Representation (general form)

$rel(w_i, w_j) = \cos\big(\mathcal{W}_{:,w_i,syn},\ \mathcal{W}_{:,w_j,rel}\big)$

the cosine between the $w_i$ column of the synonym layer (the pivot) and the $w_j$ column of the slice of the specific relation.
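A short numpy sketch of this raw-representation score on the toy tensor (same toy data and axis convention as the earlier sketches):

```python
import numpy as np

terms = ["joy", "gladden", "sadden", "feeling"]
REL = {"syn": 0, "ant": 1, "hyper": 2}   # slice index per relation

# Toy tensor: rows = target words (joyfulness, gladden, sad, anger),
# columns = terms, slices = relations (slice 2 is the is-a layer).
W = np.zeros((4, 4, 3))
W[:, :, 0] = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
W[:, :, 1] = [[0, 0, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
W[:, :, 2] = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 1]]

def rel_score(wi, wj, rel):
    """rel(w_i, w_j) = cos( W[:, w_i, syn], W[:, w_j, rel] )"""
    a = W[:, terms.index(wi), REL["syn"]]
    b = W[:, terms.index(wj), REL[rel]]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

print(rel_score("joy", "sadden", "ant"))     # degree of antonymy
print(rel_score("joy", "feeling", "hyper"))  # degree of hypernymy
```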

Page 33: Multi-Relational Latent Semantic  Analysis

Measure Degree of Relation – Latent Representation

After decomposition, the raw columns are replaced by their latent counterparts:

$rel(w_i, w_j) = \cos\big(\mathbf{S}_{:,:,syn}\,\mathbf{V}_{i,:}^{T},\ \mathbf{S}_{:,:,rel}\,\mathbf{V}_{j,:}^{T}\big)$

where $\mathbf{V}_{i,:}$ is the latent vector $v_i$ of word $w_i$, and $\mathbf{S}_{:,:,syn}$ ($R_{syn}$) and $\mathbf{S}_{:,:,rel}$ ($R_{rel}$) are the core slices of the pivot and target relations among $S_1, \ldots, S_m$.
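Continuing the Tucker sketch above, a hedged sketch of the latent-representation score; it reuses W, terms, and tucker_hosvd from the earlier sketches, and the names are illustrative:

```python
# Continues the earlier sketches: assumes W, terms, tucker_hosvd are defined.
import numpy as np

G, U, V, C = tucker_hosvd(W, ranks=(4, 4, 3))
S = np.einsum('abc,kc->abk', G, C)   # one r x r core slice per relation
SYN, ANT, HYPER = 0, 1, 2

def latent_rel_score(wi, wj, rel_slice):
    """rel(w_i, w_j) = cos( S_syn V_i^T, S_rel V_j^T ) from the slide."""
    a = S[:, :, SYN] @ V[terms.index(wi)]        # project w_i via the pivot slice
    b = S[:, :, rel_slice] @ V[terms.index(wj)]  # project w_j via the relation slice
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# At full rank this matches the raw score; truncating the ranks is what
# smooths the data and generalizes to unseen word pairs.
print(latent_rel_score("joy", "sadden", ANT))     # latent degree of antonymy
print(latent_rel_score("joy", "feeling", HYPER))  # latent degree of hypernymy
```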

Page 34: Multi-Relational Latent Semantic  Analysis

Roadmap
Introduction
Background: Latent Semantic Analysis (LSA), Polarity Inducing LSA (PILSA)
Multi-Relational Latent Semantic Analysis (MRLSA): encoding multi-relational data in a tensor; tensor decomposition & measuring degree of a relation
Experiments
Conclusions

Page 35: Multi-Relational Latent Semantic  Analysis

Experiment: Data for Building the MRLSA Model
Encarta Thesaurus: records synonyms and antonyms of target words; vocabulary of 50k terms and 47k target words.

WordNet: has synonym, antonym, hyponym, and hypernym relations; vocabulary of 149k terms and 117k target words.

Goals:
Show that MRLSA generalizes LSA to model multiple relations
Improve performance by combining heterogeneous data

Page 36: Multi-Relational Latent Semantic  Analysis

Example Antonyms Output by MRLSA

Target      High-Scoring Words
inanimate   alive, living, bodily, in-the-flesh, incarnate
alleviate   exacerbate, make-worse, in-flame, amplify, stir-up
relish      detest, abhor, abominate, despise, loathe

* Words in blue are antonyms listed in the Encarta thesaurus.

Page 37: Multi-Relational Latent Semantic  Analysis

Results – GRE Antonym Test
Task: GRE closest-opposite questions, e.g.:

Which is the closest opposite of adulterate?
(a) renounce (b) forbid (c) purify (d) criticize (e) correct

Accuracy:
Mohammad et al. 08: 0.64
Lookup:             0.56
PILSA:              0.74
MRLSA:              0.77

Page 38: Multi-Relational Latent Semantic  Analysis

Example Hyponyms Output by MRLSA

Target      High-Scoring Words
bird        ostrich, gamecock, nighthawk, amazon, parrot
automobile  minivan, wagon, taxi, minicab, gypsy cab
vegetable   buttercrunch, yellow turnip, romaine, chipotle, chilli

Page 39: Multi-Relational Latent Semantic  Analysis

Results – Relational Similarity (SemEval-2012)
Task: Class-Inclusion relation (X is a kind of Y); pick the most/least illustrative word pairs, e.g.:
(a) art:abstract (b) song:opera (c) footwear:boot (d) hair:brown

Accuracy:
UTD:    0.34
Lookup: 0.37
MRLSA:  0.56

Page 40: Multi-Relational Latent Semantic  Analysis

Problem: Use Relational Domain Knowledge

Relational domain knowledge: entity types. A relation can only hold between the right types of entities:

Words in an is-a relation have the same part of speech
For the relation born-in, the entity types are (person, location)

Leverage type information to improve MRLSA.
Idea #3: change the objective function.

Page 41: Multi-Relational Latent Semantic  Analysis

Conclusions
A continuous semantic representation that:

Leverages existing rich linguistic resources
Discovers new relations
Enables us to measure the degree of multiple relations

Approaches: better data representation; matrix/tensor decomposition

Challenges & future work: capture more types of knowledge in the model; support more sophisticated inferential tasks