Post on 25-Feb-2016

Page 1: Anant Pradhan

Anant Pradhan

PET: A Statistical Model for Popular Events Tracking in Social Communities
Cindy Xide Lin, Bo Zhao, Qiaozhu Mei, Jiawei Han (UIUC)

Introduction

Challenge: Tracking the evolution of a popular topic

Introduction

• Observing and tracking:
  – Popular events
  – Topics that evolve over time

• Existing approaches focus on:
  – Burstiness
  – Evolution of networks

• Existing approaches ignore the interplay between textual topics and network structures.

Introduction

• The paper proposes a novel statistical method (PET) that:
  – Models the popularity of events over time
  – Considers the burstiness of user interest
  – Considers information diffusion on the network structure
  – Considers the evolution of textual topics

Introduction

• A Gibbs Random Field is used to model:
  – The influence of historical status
  – Dependency relationships in the graph

• Topic Model:
  – Designed to explain the generation of text data

• The two models interact by regularizing each other.

Problem Definition

• Set of vertices: $V_k$
• Set of edges: $E_k$
• Network stream: $G = \{G_1, G_2, \dots, G_T\}$
• Snapshot of the network: $G_k = \{V_k, E_k\}$
• Document stream: $D = \{D_1, D_2, \dots, D_T\}$
• Topic: $\theta$
• Event: $\Theta_E = \{\theta^E_0, \theta^E_1, \theta^E_2, \dots, \theta^E_T\}$
• Interest: $H_k = \{h_k(1), h_k(2), \dots, h_k(N)\}$
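The notation on this slide maps onto a few simple containers. The following is only a sketch of one possible in-memory layout; the class names `Snapshot` and `EventStreams`, the field names, and the sample texts are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Snapshot:
    """State at time step k: network G_k = (V_k, E_k) and documents D_k."""
    vertices: Set[int]                 # V_k, user ids
    edges: Set[Tuple[int, int]]        # E_k, pairs of user ids
    documents: List[str]               # D_k, texts observed at step k
    # Latent quantities, empty until inferred (layout is an assumption):
    interest: Dict[int, float] = field(default_factory=dict)  # H_k: user id -> h_k(i)
    topic: Dict[str, float] = field(default_factory=dict)     # theta_k: word -> probability

    def __post_init__(self) -> None:
        # Every edge must connect two vertices of the same snapshot.
        for u, v in self.edges:
            if u not in self.vertices or v not in self.vertices:
                raise ValueError(f"edge ({u}, {v}) has an endpoint outside V_k")


@dataclass
class EventStreams:
    """Aligned streams G = {G_1..G_T} and D = {D_1..D_T}, one Snapshot per step."""
    snapshots: List[Snapshot] = field(default_factory=list)

    @property
    def horizon(self) -> int:
        """Number of time steps T."""
        return len(self.snapshots)


# Two time steps over three users; the network gains one edge at step 2.
streams = EventStreams([
    Snapshot({1, 2, 3}, {(1, 2)}, ["avatar opens friday"]),
    Snapshot({1, 2, 3}, {(1, 2), (2, 3)}, ["saw avatar", "avatar again"]),
])
```

Keeping the observed fields (network, documents) and the latent fields (interest, topic) in the same per-step record mirrors the four streams listed in the problem definition.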

Problem Definition

• Event-related information in a social community:
  – An observed stream of network structures
  – An observed stream of text documents
  – A latent stream of topics about the event
  – A latent stream of interests

The General Model

• The task is cast as the inference of the current $H_k$ and $\Theta_k$, given the current observations and the previous interest status: $P(H_k, \Theta_k \mid G_k, D_k, H_{k-1})$

• Assumption 1: The current interest status $H_k$ is independent of the document collection $D_k$.

• Assumption 2: Given $H_k$, the current topic model $\Theta_k$ is independent of the network structure $G_k$ and the previous interest status $H_{k-1}$.

The General Model

• From the assumptions:

$$P(H_k, \Theta_k \mid G_k, D_k, H_{k-1}) = \underbrace{P(H_k \mid G_k, H_{k-1})}_{\text{Interest Model}} \cdot \underbrace{P(\Theta_k \mid H_k, D_k)}_{\text{Topic Model}}$$
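The factorization follows from the chain rule plus the two assumptions. A short derivation, reading Assumption 2 as holding conditionally on the current interest status (which is what the final formula requires):

```latex
\begin{align*}
P(H_k, \Theta_k \mid G_k, D_k, H_{k-1})
  &= P(H_k \mid G_k, D_k, H_{k-1}) \, P(\Theta_k \mid H_k, G_k, D_k, H_{k-1})
     && \text{(chain rule)} \\
  &= P(H_k \mid G_k, H_{k-1}) \, P(\Theta_k \mid H_k, G_k, D_k, H_{k-1})
     && \text{(Assumption 1)} \\
  &= P(H_k \mid G_k, H_{k-1}) \, P(\Theta_k \mid H_k, D_k)
     && \text{(Assumption 2)}
\end{align*}
```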

The Interest Model

• Modelled as a Gibbs Random Field on the network Gk

• Uses specially designed potential functions

• Uses a weighting scheme motivated by real-world networks
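To make the Gibbs Random Field idea concrete, here is a toy sketch. It does not reproduce the paper's potential functions or weighting scheme, which the slides do not spell out. It assumes simple quadratic potentials: a temporal term that penalizes deviation from the previous interest status and a structural term that penalizes disagreement between neighbors, weighted by hypothetical parameters `lambda_t` and `lambda_a`. The distribution is computed by brute-force enumeration, which is only feasible for tiny graphs.

```python
import itertools
import math


def energy(h, h_prev, edges, lambda_t=1.0, lambda_a=3.0):
    """Assumed quadratic energy of an interest configuration h (not the paper's)."""
    # Temporal potential: stay close to the previous interest status.
    temporal = sum((h[i] - h_prev[i]) ** 2 for i in range(len(h)))
    # Structural potential: neighbors in the network tend to agree.
    structural = sum((h[i] - h[j]) ** 2 for i, j in edges)
    return lambda_t * temporal + lambda_a * structural


def gibbs_distribution(h_prev, edges, levels=(0, 1, 2), lambda_t=1.0, lambda_a=3.0):
    """Return P(h) = exp(-energy(h)) / Z over all discrete configurations."""
    configs = list(itertools.product(levels, repeat=len(h_prev)))
    weights = [math.exp(-energy(h, h_prev, edges, lambda_t, lambda_a)) for h in configs]
    z = sum(weights)  # partition function
    return {h: w / z for h, w in zip(configs, weights)}


# Three users on a path 0 - 1 - 2; only user 0 was interested at the previous step.
h_prev = (2, 0, 0)
edges = [(0, 1), (1, 2)]
dist = gibbs_distribution(h_prev, edges)
best = max(dist, key=dist.get)
print(best)  # (1, 1, 1): interest spreads along the edges and evens out
```

With the structural weight larger than the temporal weight, the most probable configuration smooths interest across connected users rather than preserving the previous status exactly.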

The Topic Model

• Models historical interest status and relationships on the network.

• Allows the topics and popularity of the events to mutually influence each other over time.

• $P(\Theta_k \mid H_k, D_k) \propto P(D_k \mid H_k, \Theta_k) \, P(\Theta_k \mid H_k)$
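The relation on this slide is Bayes' rule applied conditionally on the interest status. The normalizer can be dropped because it does not depend on the topic model:

```latex
P(\Theta_k \mid H_k, D_k)
  = \frac{P(D_k \mid H_k, \Theta_k) \, P(\Theta_k \mid H_k)}{P(D_k \mid H_k)}
  \propto P(D_k \mid H_k, \Theta_k) \, P(\Theta_k \mid H_k)
```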

Connection to Existing Models

• Existing models are special cases of PET under certain conditions.

• The State Automaton Model:
  – When the network effect is omitted

• The Contagion Model:
  – When the topic effect is omitted

Complexity Analysis

• PLSA (Probabilistic Latent Semantic Analysis): $O((N + M)mt)$

• PET: $O(NMmT)$

• Here there are $N$ documents involving $t$ topics with $M$ words, $m$ rounds, and time $T$.

• The complexity is judged reasonable.
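A quick back-of-the-envelope comparison can show how the two expressions scale. The sketch below takes the slide's formulas at face value as operation counts, ignoring constants. All input sizes are hypothetical and are not taken from the paper or its datasets.

```python
def plsa_cost(n_docs, n_words, rounds, topics):
    """Operation count for the slide's PLSA bound, O((N + M) * m * t)."""
    return (n_docs + n_words) * rounds * topics


def pet_cost(n_docs, n_words, rounds, time_steps):
    """Operation count for the slide's PET bound, O(N * M * m * T)."""
    return n_docs * n_words * rounds * time_steps


# Hypothetical sizes chosen only for illustration.
n_docs, n_words, rounds, topics, time_steps = 10_000, 5_000, 50, 10, 100

plsa = plsa_cost(n_docs, n_words, rounds, topics)
pet = pet_cost(n_docs, n_words, rounds, time_steps)
ratio = pet / plsa
```

Because the PET expression multiplies the document and word counts where the PLSA expression adds them, the gap widens quickly as either count grows.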

Experiments

• JonK: State automaton model. First baseline.
• Cont: The contagion model. Second baseline.
• PET-: PET minus network structures.
• BOM: Box office earnings. Gold standard for movie-related events.
• GInt: Google Insights for Search. Gold standard for news-related events.

Experiments

• Twitter
  – 5000 users
  – 1,438,826 tweets
  – From Oct 2009 to Jan 2010
  – Events: 2 movies (Avatar, Twilight) and 2 news events (Tiger Woods affair, Copenhagen Climate Conference)

Experiments

• Setup:
  – $\lambda_T$: Interest model. Weight for historical information.
  – $\lambda_A$: Interest model. Weight for structural information.
  – $\mu_E$: Topic model.
  – Values used: $\lambda_T = 1$, $\lambda_A = 3$, $\mu_E = 1$

Result Analysis

• PET has the best performance.

• Cont has the worst performance.

• JonK generally performs well but is less accurate than PET.

Network Diffusion Analysis

• Cont can’t tell the difference between interest levels.

• Both PET and PET- are able to catch the rising trend of popularity.

• PET is still superior.

Events Analysis on DBLP

• For popular events, PET generates:
  – More accurate trends
  – Smoother diffusion
  – Meaningful content evolution

Future Work

• Apply this model to track the evolution of ideas and scientific innovation.

• Build a real-time event search system.

Conclusion

• A novel approach.

• Experimental evidence is convincing.

• Complexity might be a cause for concern.

Thank you.

Questions?