Using Content and Interactions for Discovering Communities in Social Networks IBM Research India

Upload: moresmile

Post on 11-May-2015


TRANSCRIPT

Page 1: Using content and interactions for discovering communities in

Using Content and Interactions for Discovering Communities in Social Networks

IBM Research India

Page 2: Using content and interactions for discovering communities in

Abstract

Problem: discovering meaningful communities from a social network.

We propose generative models that can discover communities based on the discussed topics, interaction types and the social connections among people.

Person->multiple communities->multiple topics

We discover both community interests and user interests based on the information and linked associations.

Page 3: Using content and interactions for discovering communities in

Introduction

Background:

rich data -> academia & business; discover relationships -> discover community

A community is a collection of users as a group such that there is high relatedness among people within the group.

One common approach treats communities as groups of nodes in a social network that are more densely connected among themselves than with the rest of the network. This makes community discovery a graph clustering problem.
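The density-based view above can be illustrated with a minimal sketch. It assumes an undirected graph given as a list of edge pairs with no self-loops; the function name `edge_densities` and the toy graph are hypothetical and do not come from the slides.

```python
def edge_densities(edges, group, all_nodes):
    """Return (internal, external) edge density for a candidate community.

    internal: fraction of possible within-group pairs that are linked.
    external: fraction of possible group-to-outside pairs that are linked.
    """
    group = set(group)
    outside = set(all_nodes) - group
    # Undirected edges; self-loops are dropped (assumption of this sketch).
    edge_set = {frozenset(e) for e in edges if e[0] != e[1]}
    internal = sum(1 for e in edge_set if e <= group)
    crossing = sum(1 for e in edge_set if len(e & group) == 1)
    possible_internal = len(group) * (len(group) - 1) // 2
    possible_crossing = len(group) * len(outside)
    internal_density = internal / possible_internal if possible_internal else 0.0
    external_density = crossing / possible_crossing if possible_crossing else 0.0
    return internal_density, external_density


# Toy graph: two triangles joined by the single edge c-d.
toy_nodes = ["a", "b", "c", "d", "e", "f"]
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
inside, across = edge_densities(toy_edges, ["a", "b", "c"], toy_nodes)
```

A group looks like a community in this sense when the first value is clearly larger than the second.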

Page 4: Using content and interactions for discovering communities in

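We consider communities as “groups of users (nodes) who are interconnected and communicate on shared topics”.

1. We use Bayesian models to extract latent communities. The models assume that community membership depends on the topics users are interested in and on the links between them. This approach helps discover a user's interests and role in the network, as well as the topics that are popular within a community. Given a topic or interest, one can therefore find the related communities.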

Page 5: Using content and interactions for discovering communities in

2. We also utilize the “type” of interactions between users to emphasize their interest in topics, and thus community membership.

e.g., conversation vs. broadcast

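3. Two kinds of social networks: (1) a user's posts are broadcast to his neighbors; (2) a user can only send posts directly to other people (e.g., email networks). This paper therefore proposes two different methods, one for each network structure.

Assumption: a post discusses only a single topic, which reduces model training time. When posts are long this assumption no longer fits, so the paper also gives another model that handles this case.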

Page 6: Using content and interactions for discovering communities in

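PRIOR WORK

First kind: considers only the links between users, ignoring other node properties and user interactions. It does not allow a user to belong to multiple communities.

Second kind: Bayesian probabilistic models. These handle one-to-many membership, but still rely too heavily on link structure to discover communities.

Third kind: uses semantic content to discover communities. Communities are modeled as random mixtures over users who in turn have a topical distribution (interest) associated with them. Link information is not used.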

Page 7: Using content and interactions for discovering communities in

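CUT (Community-User-Topic):

Assumes that members who form a community by discussing particular topics are connected to one another.

graph structure and interactions between users

Two models, CUT1 and CUT2:

CUT1 considers only the relationship between communities and members. The sub-communities it finds emphasize how tightly members are connected, and the results are very similar to those of graph-theoretic community discovery algorithms.

CUT2 considers only the relationship between communities and topics. The sub-communities it finds emphasize how closely related the topics followed by members are.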

Page 8: Using content and interactions for discovering communities in

CART (Community-Author-Recipient-Topic)

Combines content and link relationships.

Suited to extracting communities in email networks; not suited to broadcast networks such as Twitter.

Page 9: Using content and interactions for discovering communities in

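COMMUNITY DISCOVERY MODELS

Two classes of networks:

1. Twitter, Facebook: a post is about the user's own topics of interest; the recipients' interests are not considered.

2. Email: the topic of a post reflects interests shared by both the sender and the recipient.

Notation: U, Ri, Pij, Pi, P, Np, Wp, Xp, c, z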

Page 10: Using content and interactions for discovering communities in

Topic User Community Model

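Assumes a user can belong to multiple communities and can be interested in multiple topics.

The model uses interaction types to improve community discovery; the interaction type reflects the strength of the connection between two users and their interest in a topic.

Each user has their own interaction space.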

Page 11: Using content and interactions for discovering communities in
Page 12: Using content and interactions for discovering communities in

Parameter estimation: Gibbs sampling: random assignment -> update equations

Update equations:

Page 13: Using content and interactions for discovering communities in

The procedure for Gibbs inference:

Worst-case time complexity: O(IPCXZ + IW)

Page 14: Using content and interactions for discovering communities in

Topics can be computed using the approximation

P(u|c), P(z|c)

Page 15: Using content and interactions for discovering communities in

Topic User Recipient Community Model 1

Page 16: Using content and interactions for discovering communities in

Topic User Recipient Community Model 2

Page 17: Using content and interactions for discovering communities in

Full TURCM

Generates a topic for each word in a post (instead of generating a topic per post).

Page 18: Using content and interactions for discovering communities in
Page 19: Using content and interactions for discovering communities in

Parameter estimation: update equations

Page 20: Using content and interactions for discovering communities in
Page 21: Using content and interactions for discovering communities in

EXPERIMENTS

Datasets:

Twitter over a period of six months in 2009

Enron Email corpus

We set the number of communities C to 10 and the number of topics Z to 20.

We ran 1000 iterations of burn-in and took 250 samples (every fourth sample) over the next 1000 iterations.
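The sampling schedule above (burn-in followed by thinning) can be sketched as follows. The slides do not say which of every four iterations was kept; this sketch assumes the fourth of each block, and the function name `kept_iterations` is hypothetical.

```python
def kept_iterations(burn_in=1000, sampling_iters=1000, thin=4):
    """Return the 1-based Gibbs iteration numbers whose samples are kept.

    The first `burn_in` iterations are discarded; afterwards every
    `thin`-th iteration is kept (assumed: the last one of each block).
    """
    return [burn_in + k for k in range(thin, sampling_iters + 1, thin)]


schedule = kept_iterations()
```

With the defaults, `schedule` holds 250 iteration numbers, matching the sample count stated on the slide.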

Page 22: Using content and interactions for discovering communities in

Qualitative Analysis

Page 23: Using content and interactions for discovering communities in
Page 24: Using content and interactions for discovering communities in
Page 25: Using content and interactions for discovering communities in
Page 26: Using content and interactions for discovering communities in

Community Analysis

Page 27: Using content and interactions for discovering communities in
Page 28: Using content and interactions for discovering communities in

Perplexity Analysis

It measures the log likelihood of generating unseen data after learning from a fraction of the data.
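As a rough illustration, perplexity is commonly computed as the exponential of the negative average per-word log likelihood on held-out text. The slides do not give the exact formula used, so the definition below is the standard one and is an assumption here, as is the function name `perplexity`.

```python
import math


def perplexity(word_log_probs):
    """Perplexity of held-out words from their natural-log probabilities.

    perplexity = exp(-(1/N) * sum_i log p(w_i)); lower is better.
    """
    n = len(word_log_probs)
    if n == 0:
        raise ValueError("need at least one held-out word")
    return math.exp(-sum(word_log_probs) / n)


# A model that spreads probability uniformly over 8 words
# has perplexity 8 on any held-out text (up to rounding).
uniform_pp = perplexity([math.log(1 / 8)] * 5)
```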

Page 29: Using content and interactions for discovering communities in
Page 30: Using content and interactions for discovering communities in

Runtime Analysis

Page 31: Using content and interactions for discovering communities in

CONCLUSION

We proposed probabilistic schemes that incorporate topics, social relationships and the nature of posts for more effective community discovery.

Interaction types are important

Page 32: Using content and interactions for discovering communities in

Community Detection in Incomplete Information Networks

Page 33: Using content and interactions for discovering communities in

Abstract

Detecting communities in incomplete information networks with missing edges.

1. Learn a distance metric to reproduce the link-based distance between nodes from the observed edges in the local information regions.

2. Use the learned distance metric to estimate the distance between any pair of nodes in the network.

A hierarchical clustering approach

Page 34: Using content and interactions for discovering communities in

INTRODUCTION

The community is defined as a group of nodes which are densely connected inside the group, while loosely connected with the nodes outside the group.

The local regions with complete linkage information are called local information regions.

Terrorist-attack network. Food web.

Page 35: Using content and interactions for discovering communities in
Page 36: Using content and interactions for discovering communities in

CONTRIBUTIONS

We identify and define the problem of community detection in incomplete information networks with local information regions.

Then a metric, which can be used to measure the distance between any pair of nodes, is learned.

Based on the learned metric, we devise a distance-based modularity function to evaluate the quality of the communities.

We propose a distance-based algorithm, DSHRINK, which can discover hierarchical and overlapping communities.

Page 37: Using content and interactions for discovering communities in

RELATED WORK

1. Methods focused on topological structures.

2. Graph clustering methods based on attributes.

3. Clustering methods based on both links and attributes were also proposed.

Page 38: Using content and interactions for discovering communities in

PROBLEM DEFINITION

Page 39: Using content and interactions for discovering communities in
Page 40: Using content and interactions for discovering communities in

OPTIMIZATION FRAMEWORK

Page 41: Using content and interactions for discovering communities in

Diagonal form of M

full matrix of M
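A minimal sketch of how a learned matrix M would define a distance between two nodes' attribute vectors, in both diagonal and full form. It assumes M is a Mahalanobis-style matrix (symmetric, positive semi-definite) applied as sqrt((x - y)^T M (x - y)); the function names and the example numbers are hypothetical, and how M is learned is not shown.

```python
import math


def metric_distance_full(x, y, M):
    """sqrt((x - y)^T M (x - y)) for a full matrix M given as a list of rows.

    M is assumed symmetric positive semi-definite.
    """
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    q = sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(q)


def metric_distance_diag(x, y, m):
    """Same distance when M is diagonal; m holds the diagonal entries,
    so each attribute is simply re-weighted."""
    d = [a - b for a, b in zip(x, y)]
    return math.sqrt(sum(w * di * di for w, di in zip(m, d)))


x, y = [1.0, 2.0], [0.0, 0.0]
d_diag = metric_distance_diag(x, y, [4.0, 1.0])
d_same = metric_distance_full(x, y, [[4.0, 0.0], [0.0, 1.0]])
d_full = metric_distance_full(x, y, [[2.0, 1.0], [1.0, 2.0]])
```

The diagonal form has one weight per attribute, while the full form can also capture interactions between attributes at the cost of many more parameters.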

Page 42: Using content and interactions for discovering communities in

DISTANCE-BASED CLUSTERING

Distance-based Modularity

Page 43: Using content and interactions for discovering communities in
Page 44: Using content and interactions for discovering communities in

Clustering Algorithm

Page 45: Using content and interactions for discovering communities in
Page 46: Using content and interactions for discovering communities in

Speeding up the Clustering Process with Approximation

Page 47: Using content and interactions for discovering communities in

EXPERIMENTS

Data Sets

DBLP-A Dataset: DBLP-A is the data set extracted from the DBLP database, which provides bibliographic information on computer science journals and proceedings.

DBLP-B Dataset:

Page 48: Using content and interactions for discovering communities in

Incomplete Information Network Generation

Snowball sampling

Parameter p, called the sample ratio

Parameter q, called the local information region size
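A minimal sketch of snowball sampling, under stated assumptions: the slides do not define exactly how p and q enter the procedure, so this version simply grows a region outward from a seed node in breadth-first waves until a target number of nodes is reached, then keeps the edges among the collected nodes. The names `snowball_sample`, `induced_edges` and the toy graph are hypothetical.

```python
from collections import deque


def snowball_sample(adjacency, seed, size):
    """Collect up to `size` nodes (size >= 1) by expanding from `seed`.

    Neighbors are visited in sorted order so the result is deterministic.
    """
    sampled = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(sampled) < size:
        node = queue.popleft()
        for nb in sorted(adjacency.get(node, ())):
            if nb not in seen:
                seen.add(nb)
                sampled.append(nb)
                queue.append(nb)
                if len(sampled) == size:
                    break
    return sampled


def induced_edges(adjacency, nodes):
    """Edges with both endpoints inside `nodes`: the links kept for a region."""
    keep = set(nodes)
    return sorted({tuple(sorted((u, v)))
                   for u in keep for v in adjacency.get(u, ()) if v in keep})


toy_graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"],
             "d": ["b", "e"], "e": ["d"]}
region = snowball_sample(toy_graph, "a", 4)
region_edges = induced_edges(toy_graph, region)
```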

Page 49: Using content and interactions for discovering communities in

Evaluation Measures

The definition of purity is as follows: each cluster is first assigned the most frequent class in the cluster, and then the purity is measured by computing the number of instances assigned the same labels in all clusters.
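The purity definition above can be sketched in a few lines. Dividing the matched count by the total number of instances is the usual convention and is an assumption here, since the slide only mentions counting; the function name `purity` and the toy labels are hypothetical.

```python
from collections import Counter


def purity(clusters, labels):
    """Purity of a clustering.

    clusters[i] is the cluster id and labels[i] the true class of instance i.
    Each cluster is labeled with its most frequent class; purity is the
    fraction of instances whose class matches their cluster's label.
    """
    if len(clusters) != len(labels):
        raise ValueError("clusters and labels must have the same length")
    if not labels:
        raise ValueError("need at least one instance")
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    matched = sum(max(counts.values()) for counts in by_cluster.values())
    return matched / len(labels)


toy_purity = purity([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "a"])
```

In the toy case each cluster has two instances of its majority class, so four of six instances match.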

Page 50: Using content and interactions for discovering communities in

Compared Methods

Kmeans:

Md + DSHRINK: We learn a diagonal Mahalanobis matrix Md and use it as the input of M for DSHRINK.

Mf + DSHRINK: We learn a full Mahalanobis matrix Mf and use it as the input of M for DSHRINK.

Page 51: Using content and interactions for discovering communities in

Effectiveness Results

Page 52: Using content and interactions for discovering communities in
Page 53: Using content and interactions for discovering communities in

Efficiency Results

Page 54: Using content and interactions for discovering communities in

CONCLUSION

A global metric

A distance-based modularity function

A distance-based clustering algorithm, DSHRINK

Approximation strategies