Discovering Coherent Topics Using General Knowledge

Zhiyuan (Brett) Chen, Arjun Mukherjee, Bing Liu

Meichun Hsu, Malu Castellanos, Riddhiman Ghosh

http://www.cs.uic.edu/~zchen/

Topic Model

[Figure: Documents 1, 2, …, M go into a topic model, which outputs Topics 1, 2, …, T.]

Coherent Topics

Coherent: Price, Cheap, Expensive, Cost, Money, Pricey, Dollar

Not coherent: Price, Family, Cheap, Expensive, Politics, Cost, Size

Issues of Unsupervised Topic Models

Objective functions do not correlate well with human judgments (Chang et al., 2009).

Many topics are not coherent.

Remedy: Knowledge-based Topic Models

Knowledge-based Topic Models

DF-LDA (Andrzejewski et al., 2009)

Must-Link: {Picture, Photo}

Cannot-Link: {Picture, Price}

Seeded models (Burns et al., 2012; Jagarlamudi et al., 2012; Lu et al., 2011; Mukherjee and Liu, 2012)

Knowledge Assumptions

Knowledge is correct for a domain.

Knowledge is domain dependent.

Existing Model Flow

Our Proposed Model Flow

General Knowledge

May be wrong for a domain

Domain independent

Lexical Semantic Relations

Synonyms: {Expensive, Pricey}

Antonyms: {Expensive, Cheap}

Adjective-Attributes: {Expensive, Price}

WordNet

(Fei et al. 2012)

LR-Sets (Lexical Relation)

Example: {Expensive, Pricey, Cheap, Price}

Words should be in the same topic
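
As a rough illustration of the example above, an LR-set for a word can be pictured as the word together with its synonyms, antonyms, and attribute nouns. The sketch below uses a tiny hand-written relation table in place of WordNet. The `RELATIONS` dictionary and the `build_lr_set` helper are hypothetical names, and the slides do not say how the sets are actually assembled.

```python
# Toy stand-in for a lexical resource such as WordNet (hypothetical data).
RELATIONS = {
    "expensive": {
        "synonyms": {"pricey"},
        "antonyms": {"cheap"},
        "attributes": {"price"},
    },
}


def build_lr_set(word, relations=RELATIONS):
    """Union a word with every word related to it by any lexical relation."""
    lr_set = {word}
    for related in relations.get(word, {}).values():
        lr_set |= related
    return lr_set


print(sorted(build_lr_set("expensive")))
```

A word with no entry in the table simply yields a singleton set in this sketch.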

Issues of LR-Sets

Partially wrong knowledge

Example for the word "Picture": {Picture, Pic, Flick}

No correct LR-sets for a word

Example for the word "Card": {Card, Menu}, {Card, Bill}

Addressing Issues

Partially wrong knowledge: Word Correlation + GPU

No correct LR-sets for a word: Relaxing wrong sets for a word

Relaxing Wrong LR-sets

{Card, Menu}

{Card, Bill}

Relaxed to: {Card}

Estimate Knowledge

{Picture, Image}

{Picture, Painting}

Word Distributions From LDA

Word        Prob
Picture     0.20
Image       0.15
Photo       0.12
Quality     0.10
Resolution  0.05
…
Painting    0.0002

Estimate Word Correlation

Word Correlation Matrix C

{Picture, Image}: 0.15 / 0.20

{Picture, Painting}: 0.0002 / 0.20

Quality of LR-set s Towards w
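
The ratios shown for the word correlation matrix can be reproduced from the LDA word probabilities with a few lines of code. The sketch below does that, and it also includes a possible quality score for an LR-set towards a word. The transcript does not preserve the definition of Q, so the `quality` function here (the mean correlation between w and the other words in the set) is only an assumption made for illustration, and both function names are hypothetical.

```python
# Topic-word probabilities copied from the "Word Distributions From LDA" slide.
phi = {
    "picture": 0.20,
    "image": 0.15,
    "photo": 0.12,
    "quality": 0.10,
    "resolution": 0.05,
    "painting": 0.0002,
}


def correlation(w, other, phi):
    """Ratio of the other word's probability to w's, e.g. 0.15 / 0.20."""
    return phi[other] / phi[w]


def quality(lr_set, w, phi):
    """ASSUMPTION: mean correlation between w and the other words in the set."""
    others = [v for v in lr_set if v != w]
    if not others:
        return 0.0
    return sum(correlation(w, v, phi) for v in others) / len(others)


print(correlation("picture", "image", phi))     # about 0.75
print(correlation("picture", "painting", phi))  # about 0.001
print(quality({"picture", "painting"}, "picture", phi))
```

In this sketch the ratio can exceed 1 when `other` is more probable than `w`. The slides only show cases where the denominator is the larger probability, so how the other case is handled is left open here.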

Relaxing Wrong LR-sets

s1 = {Card, Menu}: Q(s1, “Card”) < ε

s2 = {Card, Bill}: Q(s2, “Card”) < ε

Both sets fall below the threshold, leaving {Card}.
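
The thresholding step can be sketched as a small filter: keep every LR-set whose quality towards the word reaches ε, and fall back to the singleton set when none does. The function name `relax_lr_sets`, the callback signature, and the numeric scores below are all hypothetical. The slides give no actual Q values, and whether the comparison at exactly ε is strict is not stated.

```python
def relax_lr_sets(word, lr_sets, quality_of, epsilon):
    """Keep sets whose quality towards `word` reaches epsilon; else use {word}."""
    kept = [s for s in lr_sets if quality_of(s, word) >= epsilon]
    return kept if kept else [{word}]


# Hypothetical quality scores, for illustration only (not values from the slides).
scores = {
    frozenset({"card", "menu"}): 0.01,
    frozenset({"card", "bill"}): 0.02,
}


def toy_quality(lr_set, word):
    """Look up the made-up score for an LR-set; `word` is unused in this toy."""
    return scores[frozenset(lr_set)]


result = relax_lr_sets(
    "card", [{"card", "menu"}, {"card", "bill"}], toy_quality, epsilon=0.1
)
print(result)  # [{'card'}]
```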

Simple Pólya Urn Model (SPU)

The richer get richer!
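
The "richer get richer" effect is easy to see in a simulation. In a simple Pólya urn, a ball is drawn at random and returned together with one more ball of the same color, so colors that are already common become even more likely to be drawn. The sketch below is a generic illustration; the function name, colors, and seed are arbitrary choices, not taken from the slides.

```python
import random


def simulate_spu(initial_counts, draws, seed=0):
    """Simple Pólya urn: draw a ball, put it back with one more of its color."""
    rng = random.Random(seed)
    counts = dict(initial_counts)
    for _ in range(draws):
        colors = list(counts)
        weights = [counts[c] for c in colors]
        drawn = rng.choices(colors, weights=weights, k=1)[0]
        counts[drawn] += 1  # the drawn ball returns with a copy of itself
    return counts


final = simulate_spu({"red": 1, "blue": 1}, draws=1000)
print(final)
```

Running it with different seeds typically gives very different final proportions, because early draws are reinforced by every later one.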

Interpreting LDA Under SPU

[Figure: the word "picture" placed in the urn for Topic 0.]

Generalized Pólya Urn Model (GPU)

Applying GPU

[Figure: urn for Topic 0 holding "picture", with "image" and "painting" added through Word Correlation.]
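
One way to read the figure: in a generalized Pólya urn, drawing a ball of one color puts back extra balls of related colors as well. Applied to a topic, assigning "picture" would also add some weight to "image" and "painting". The sketch below shows a single such update. The `gpu_update` helper and the `promotion` table are hypothetical, and using the ratios 0.15 / 0.20 and 0.0002 / 0.20 directly as promotion weights is an assumption, not something the transcript confirms.

```python
from collections import Counter


def gpu_update(topic_counts, word, promotion):
    """One GPU step: add the observed word plus weighted related words."""
    topic_counts[word] += 1.0
    for related, weight in promotion.get(word, {}).items():
        topic_counts[related] += weight
    return topic_counts


# Hypothetical promotion weights standing in for word-correlation values.
promotion = {"picture": {"image": 0.75, "painting": 0.001}}

topic0 = Counter()
gpu_update(topic0, "picture", promotion)
print(topic0)
```

With these weights a strongly correlated word such as "image" gains almost as much as the observed word, while a weakly correlated one such as "painting" barely changes, which is how wrong members of an LR-set could be kept from dominating a topic.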

Evaluation

Four domains

KL-Divergence

Topic Coherence

Human Evaluation

Model Comparison

LDA (Blei et al., 2003)

DF-LDA (Andrzejewski et al., 2009)

MDK-LDA (Chen et al., 2013)

LDA-GPU (Mimno et al., 2011)

GK-LDA

KL-Divergence
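
For reference, KL-divergence between two discrete distributions can be computed as below. The transcript does not say which distributions the evaluation compares, so this is only a generic sketch of the measure itself; the function name and the toy distributions are illustrative.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions given as dicts."""
    total = 0.0
    for outcome, p_x in p.items():
        if p_x == 0.0:
            continue  # a zero-probability outcome contributes nothing
        q_x = q.get(outcome, 0.0)
        if q_x == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_x * math.log(p_x / q_x)
    return total


p = {"price": 0.5, "cheap": 0.5}
q = {"price": 0.9, "cheap": 0.1}
print(kl_divergence(p, q))
```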

Topic Coherence (#T = 15)
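
Assuming the metric meant here is the document co-occurrence score of Mimno et al. (2011), which the deck cites elsewhere, topic coherence sums log((D(v_m, v_l) + 1) / D(v_l)) over pairs of a topic's top words, where D counts the documents containing the given words. The sketch below implements that formula on a toy corpus. The function name and the documents are made up, and every top word is assumed to occur in at least one document.

```python
import math


def topic_coherence(top_words, documents):
    """Sum of log((D(v_m, v_l) + 1) / D(v_l)) over pairs with l before m."""
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            co_freq = doc_freq(top_words[m], top_words[l])
            score += math.log((co_freq + 1) / doc_freq(top_words[l]))
    return score


# Toy corpus (hypothetical), each document given as a list of words.
docs = [["price", "cheap", "cost"], ["price", "expensive"], ["family", "politics"]]
coherent = topic_coherence(["price", "cheap", "cost"], docs)
mixed = topic_coherence(["price", "family", "politics"], docs)
print(coherent, mixed)
```

Higher (less negative) scores indicate top words that tend to appear in the same documents, so the first word list scores above the second on this corpus.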

Human Evaluation

Example Topics

Conclusions

Discovering Coherent Topics Using General Knowledge

Partially wrong knowledge: Word Correlation + GPU

No correct LR-sets for a word: Relaxing wrong sets for a word

Datasets: http://www.cs.uic.edu/~zchen/
