Discovering Coherent Topics Using General Knowledge
Zhiyuan (Brett) Chen Arjun Mukherjee
Bing Liu
Meichun Hsu Malu Castellanos Riddhiman Ghosh
http://www.cs.uic.edu/~zchen/
Topic Model: Document 1, Document 2, …, Document M → Topic 1, Topic 2, …, Topic T
Coherent Topics
Coherent: Price, Cheap, Expensive, Cost, Money, Pricey, Dollar
Incoherent: Price, Family, Cheap, Expensive, Politics, Cost, Size
Issues of Unsupervised Topic Models
Objective functions (e.g., held-out likelihood or perplexity) do not correlate well with human judgments (Chang et al., 2009).
Many topics are not coherent.
Remedy: Knowledge-based Topic Models
Knowledge-based Topic Models
DF-LDA (Andrzejewski et al., 2009)
Must-Link: {Picture, Photo}
Cannot-Link: {Picture, Price}
Seeded models (Burns et al., 2012; Jagarlamudi et al., 2012; Lu et al., 2011; Mukherjee and Liu, 2012)
Knowledge Assumptions
Knowledge is correct for a domain.
Knowledge is domain dependent.
Existing Model Flow
Our Proposed Model Flow
General Knowledge: domain independent, but may be wrong for a domain
General Knowledge: Lexical Semantic Relations
Synonyms {Expensive, Pricey}
Antonyms {Expensive, Cheap}
Adjective-Attributes {Expensive, Price}
Sources: WordNet; Fei et al. (2012)
LR-Sets (Lexical Relation Sets)
Example: {Expensive, Pricey, Cheap, Price}
Words in the same LR-set should be in the same topic.
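The example LR-set can be reproduced by merging a word with every word it is lexically related to. A minimal sketch, assuming a hypothetical toy lexicon in place of WordNet and the adjective-attribute resource; the real construction (for instance, one set per relation or per word sense) may differ.

```python
from typing import Dict, Set

# Hypothetical toy lexicon standing in for WordNet synonyms/antonyms and
# adjective-attribute pairs; real resources are far larger.
SYNONYMS: Dict[str, Set[str]] = {"expensive": {"pricey"}}
ANTONYMS: Dict[str, Set[str]] = {"expensive": {"cheap"}}
ATTRIBUTES: Dict[str, Set[str]] = {"expensive": {"price"}}


def build_lr_set(word: str) -> Set[str]:
    """Return the word plus every word linked to it by any lexical relation."""
    lr_set = {word}
    for relation in (SYNONYMS, ANTONYMS, ATTRIBUTES):
        lr_set |= relation.get(word, set())
    return lr_set


print(sorted(build_lr_set("expensive")))  # ['cheap', 'expensive', 'price', 'pricey']
```

A word with no entry in any relation gets the trivial set containing only itself.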
Issues of LR-Sets
Partially wrong knowledge
No correct LR-sets for a word
Issues of LR-Sets: No correct LR-sets for a word
Card: {Card, Menu}, {Card, Bill}
Neither LR-set of “Card” is correct for the domain.
Issues of LR-Sets: Partially wrong knowledge
Picture: {Picture, Pic, Flick}
“Pic” fits “Picture”, while “Flick” (its movie sense) does not.
Addressing Issues
Partially wrong knowledge → Word Correlation + GPU
No correct LR-sets for a word → Relaxing wrong sets for a word
Relaxing Wrong LR-sets
{Card, Menu}, {Card, Bill} → {Card}
Estimate Knowledge
{Picture, Image}
{Picture, Painting}
Word Distributions From LDA
Word | Prob
Picture | 0.20
Image | 0.15
Photo | 0.12
Quality | 0.10
Resolution | 0.05
… | …
Painting | 0.0002
Estimate Word Correlation: Word Correlation Matrix C
C(Picture, Image) = 0.15 / 0.20 = 0.75, so {Picture, Image} fits this domain.
C(Picture, Painting) = 0.0002 / 0.20 = 0.001, so {Picture, Painting} does not.
Quality of LR-set s Towards w: Q(s, w), estimated from C
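The ratios above can be computed directly from a topic's word distribution. A minimal sketch using the slide's numbers; it assumes a single topic in which w is the reference word, and the mean used for Q(s, w) is a hypothetical choice, since the exact definitions of C and Q are not spelled out on the slides.

```python
from typing import Dict, Set

# Top of one LDA topic's word distribution, copied from the slide.
TOPIC: Dict[str, float] = {
    "picture": 0.20, "image": 0.15, "photo": 0.12,
    "quality": 0.10, "resolution": 0.05, "painting": 0.0002,
}


def correlation(phi: Dict[str, float], w: str, v: str) -> float:
    """C(w, v) as in the slide's example: p(v | topic) / p(w | topic)."""
    return phi.get(v, 0.0) / phi[w]


def quality(phi: Dict[str, float], lr_set: Set[str], w: str) -> float:
    """Hypothetical Q(s, w): mean correlation of w with the other words of s."""
    others = [v for v in lr_set if v != w]
    if not others:
        return 0.0
    return sum(correlation(phi, w, v) for v in others) / len(others)


print(round(quality(TOPIC, {"picture", "image"}, "picture"), 4))     # 0.75
print(round(quality(TOPIC, {"picture", "painting"}, "picture"), 4))  # 0.001
```

A set whose extra words barely appear next to w in the topic gets a quality near zero, which is what makes it a candidate for relaxation.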
Relaxing Wrong LR-sets
s1 = {Card, Menu}, s2 = {Card, Bill}
Q(s1, “Card”) < ɛ and Q(s2, “Card”) < ɛ → relax to {Card}
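The relaxation rule can be sketched as a filter on the LR-sets of a word. A minimal sketch, assuming Q is supplied as a function; the score values and ɛ below are hypothetical, and dropping only the failing sets when some sets pass is an assumption rather than a rule stated on the slides.

```python
from typing import Callable, FrozenSet, List

Quality = Callable[[FrozenSet[str], str], float]


def relax_lr_sets(w: str, lr_sets: List[FrozenSet[str]],
                  q: Quality, eps: float) -> List[FrozenSet[str]]:
    """Drop LR-sets whose quality toward w is below eps.

    If every set is dropped, w keeps only the trivial set {w}, i.e. no
    extra knowledge is applied to it.
    """
    kept = [s for s in lr_sets if q(s, w) >= eps]
    return kept if kept else [frozenset({w})]


# Hypothetical quality scores for the two LR-sets of "card".
scores = {frozenset({"card", "menu"}): 0.01, frozenset({"card", "bill"}): 0.02}
sets = list(scores)
print(relax_lr_sets("card", sets, lambda s, w: scores[s], eps=0.1))
# [frozenset({'card'})]
```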
Simple Pólya Urn Model (SPU)
Draw a ball at random, then put it back together with one new ball of the same color.
The richer get richer!
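The "richer get richer" dynamic can be seen by simulating the urn. A minimal sketch; the colors, number of draws, and seed are arbitrary illustration values.

```python
import random
from collections import Counter
from typing import Dict


def simple_polya_urn(initial: Dict[str, int], draws: int, seed: int = 0) -> Counter:
    """Simulate a simple Pólya urn: each drawn ball is returned with one extra
    ball of the same color, so frequent colors become ever more likely."""
    rng = random.Random(seed)
    urn = Counter(initial)
    for _ in range(draws):
        colors = list(urn)
        drawn = rng.choices(colors, weights=[urn[c] for c in colors], k=1)[0]
        urn[drawn] += 1
    return urn


print(simple_polya_urn({"red": 1, "blue": 1}, draws=100))
```

Each draw adds exactly one ball, so after n draws the urn holds n more balls than it started with.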
Interpreting LDA Under SPU
Topic 0: each time “picture” is assigned to Topic 0, one more “picture” ball is added to that topic's urn.
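In LDA's collapsed Gibbs sampler, the SPU view corresponds to the count update after a topic is sampled: the word-topic count grows by exactly one. A minimal sketch of the standard sampling step; the toy sizes, hyperparameters, and function names are hypothetical.

```python
import random
from typing import List


def sample_topic(w: int, d: int, n_dt: List[List[int]], n_tw: List[List[int]],
                 n_t: List[int], alpha: float, beta: float,
                 rng: random.Random) -> int:
    """Collapsed Gibbs conditional for LDA (current token already removed):
    p(z = t) is proportional to (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta)."""
    V = len(n_tw[0])
    weights = [(n_dt[d][t] + alpha) * (n_tw[t][w] + beta) / (n_t[t] + V * beta)
               for t in range(len(n_t))]
    return rng.choices(range(len(n_t)), weights=weights, k=1)[0]


def add_ball(w: int, d: int, t: int, n_dt: List[List[int]],
             n_tw: List[List[int]], n_t: List[int]) -> None:
    """SPU view: assigning word w to topic t adds one ball of w to t's urn."""
    n_dt[d][t] += 1
    n_tw[t][w] += 1
    n_t[t] += 1


# Toy state: 1 document, 2 topics, 3 word types (word 0 = "picture").
n_dt, n_tw, n_t = [[0, 0]], [[0, 0, 0], [0, 0, 0]], [0, 0]
rng = random.Random(0)
t = sample_topic(0, 0, n_dt, n_tw, n_t, alpha=0.1, beta=0.01, rng=rng)
add_ball(0, 0, t, n_dt, n_tw, n_t)
print(t, n_tw[t])
```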
Generalized Pólya Urn Model (GPU)
When a ball of a certain color is drawn, it is put back together with one more ball of that color and a number of balls of some other colors.
Applying GPU
Topic 0: assigning “picture” adds another “picture” ball and also promotes related words such as “image” and “painting”.
Word Correlation: the number of balls added for each related word depends on its correlation with “picture”, so “image” is promoted far more than “painting”.
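The GPU update can be sketched with a promotion matrix that says how many balls of each word to add when a word is drawn. A minimal sketch, assuming a hypothetical promotion rule of 1 ball for the drawn word and mu * C(w, v) balls for each other word v in its LR-set; mu and this linear form are illustration choices, not necessarily the model's actual formula. The correlations are the slide's 0.75 and 0.001.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set


def build_promotion(lr_sets: Dict[str, Set[str]],
                    corr: Dict[str, Dict[str, float]],
                    mu: float) -> Dict[str, Dict[str, float]]:
    """Hypothetical promotion matrix: drawing w adds 1 ball of w itself and
    mu * C(w, v) balls of every other word v in w's LR-set."""
    promotion: Dict[str, Dict[str, float]] = {}
    for w, lr_set in lr_sets.items():
        row = {w: 1.0}
        for v in lr_set - {w}:
            row[v] = mu * corr.get(w, {}).get(v, 0.0)
        promotion[w] = row
    return promotion


def gpu_update(topic_counts: DefaultDict[str, float], w: str,
               promotion: Dict[str, Dict[str, float]]) -> None:
    """Add balls to one topic's urn after w is assigned to it. Words without
    an entry fall back to the simple urn: one extra ball of w only."""
    for v, amount in promotion.get(w, {w: 1.0}).items():
        topic_counts[v] += amount


lr_sets = {"picture": {"picture", "image", "painting"}}
corr = {"picture": {"image": 0.75, "painting": 0.001}}  # ratios from the slides
promotion = build_promotion(lr_sets, corr, mu=0.3)       # mu is hypothetical
topic0: DefaultDict[str, float] = defaultdict(float)
gpu_update(topic0, "picture", promotion)
print(dict(topic0))
```

Because the promoted amount scales with the correlation, a wrong word in an LR-set (here "painting") receives almost nothing, which is how partially wrong knowledge is damped.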
Evaluation
Four domains
Metrics: Human Evaluation, Topic Coherence, KL-Divergence
Model Comparison
LDA (Blei et al., 2003)
DF-LDA (Andrzejewski et al., 2009)
MDK-LDA (Chen et al., 2013)
LDA-GPU (Mimno et al., 2011)
GK-LDA (proposed)
KL-Divergence
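A sketch of a KL-based score, assuming (as is common) that KL-divergence is used here to measure how distinct the discovered topics are from one another, with higher meaning more distinct; averaging the symmetric KL over all topic pairs is an assumption about the aggregation, and the toy distributions are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two distributions over the same vocabulary (q > 0 where p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_symmetric_kl(topics: List[Sequence[float]]) -> float:
    """Average of 0.5 * (KL(p||q) + KL(q||p)) over all pairs of topics."""
    pairs = list(combinations(topics, 2))
    return sum(0.5 * (kl(p, q) + kl(q, p)) for p, q in pairs) / len(pairs)


topics = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.4, 0.4, 0.2]]
print(round(mean_symmetric_kl(topics), 4))
```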
Topic Coherence (#T = 15)
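A sketch of Topic Coherence, assuming it is the document co-occurrence measure introduced by Mimno et al. (2011), the LDA-GPU paper cited above: for a topic's top words v_1, …, v_M, sum log((D(v_m, v_l) + 1) / D(v_l)) over all m > l, where D counts documents containing the given words. The toy documents are hypothetical.

```python
import math
from typing import List, Set


def topic_coherence(top_words: List[str], docs: List[Set[str]]) -> float:
    """Sum over m > l of log((D(v_m, v_l) + 1) / D(v_l)), where D counts the
    documents containing all given words. Each top word must occur in some doc."""
    def doc_freq(*words: str) -> int:
        return sum(1 for doc in docs if all(w in doc for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for j in range(m):  # j plays the role of l in the formula
            co = doc_freq(top_words[m], top_words[j])
            score += math.log((co + 1) / doc_freq(top_words[j]))
    return score


docs = [{"price", "cheap"}, {"price", "cost"}, {"cheap", "dollar"}]
print(round(topic_coherence(["price", "cheap", "cost"], docs), 4))  # -0.6931
```

Scores are at most zero only when every pair co-occurs in each document of the earlier word; pairs that never co-occur, such as "cost" and "cheap" here, pull the score down.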
Human Evaluation
Example Topics
love
Conclusions
Discovering Coherent Topics Using General Knowledge
Partially wrong knowledge → Word Correlation + GPU
No correct LR-sets for a word → Relaxing wrong sets for a word
Datasets: http://www.cs.uic.edu/~zchen/