Presentation: Genetic clustering of social networks using random walks ELSEVIER Computational Statistics & Data Analysis February 2007 Genetic clustering of social networks using random walks Aykut Firat, Sangit Chatterjee, Mustafa Yilmaz College of Business Administration, Northeastern University, Boston, MA 02115, USA Presented by Oleg Kolgushev Computational Epidemiology Research Lab (CERL) - Department of Computer Science and Engineering - University of North Texas - 2011/03/21


Page 1

Presentation: Genetic clustering of social networks using random walks

ELSEVIER

Computational Statistics & Data Analysis, February 2007

Genetic clustering of social networks using random walks

Aykut Firat, Sangit Chatterjee, Mustafa Yilmaz

College of Business Administration, Northeastern University, Boston, MA 02115, USA

Presented by Oleg Kolgushev

Computational Epidemiology Research Lab (CERL) - Department of Computer Science and Engineering - University of North Texas - 2011/03/21

Page 2

Contents

• Introduction to clustering in networks

• Random walk based distance measure

• Genetic representation

• Experiments
– Synthetic data creation
– Network clustering experiments
– Spatial data experiments

• Conclusion


Page 3

Introduction

• Popularity of social networks.

• An exact mathematical model is out of reach; use heuristic techniques instead.

• Clustering is an NP-hard problem.

• Genetic algorithm (GA) with a medoid-based representation.

• The random walk measure is superior to Euclidean distance.


Page 4

Background

• The network is represented by a weighted graph (V, E, w), where w is a measure of similarity between vertices.

• The objective is to find a decomposition into k clusters (non-overlapping subgraphs of highly connected vertices).

• A random walker is likely to stay inside a cluster until most of its vertices have been visited.

• Calculating “escape probabilities”.

• A GA fitness function classifies a node by comparing the sum of its edges within a cluster against the sum of its edges leading to other sets.


Page 5

Random walk based distance

• Average first passage time m(i,j): the expected number of steps a random walker needs to first reach one of the two nodes when starting from the other.

• Average commute time (ACT): the expected number of steps for the round trip between the two nodes.

• In matrix and vector form the ACT is written in terms of u_i = [0 1 0 0 … 0], L = D − A, where A is the similarity matrix (w_ij), and e, a column vector of ones [1 1 1 1 … 1]. [The slide's equations are not preserved in the transcript.]
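The quantities listed on this slide match the standard Laplacian-based expression for the average commute time, n(i,j) = V_G (u_i − u_j)ᵀ L⁺ (u_i − u_j), where u_i is the vector with a single 1 in position i, L⁺ = (L − eeᵀ/n)⁻¹ + eeᵀ/n is the pseudoinverse of L for a connected graph, and V_G is the sum of all vertex degrees. The sketch below computes it with NumPy under that assumption. The function name `average_commute_time` and the three-node example graph are illustrative and are not taken from the paper; the graph is assumed to be connected and undirected.

```python
import numpy as np

def average_commute_time(A):
    """Return the matrix of average commute times n(i, j).

    A is a symmetric similarity matrix (w_ij) with a zero diagonal,
    describing a connected, undirected graph.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    degrees = A.sum(axis=1)
    L = np.diag(degrees) - A            # graph Laplacian, L = D - A
    volume = degrees.sum()              # V_G, the sum of all degrees
    J = np.full((n, n), 1.0 / n)        # e e^T / n
    L_plus = np.linalg.inv(L - J) + J   # pseudoinverse of L (connected graph)
    d = np.diag(L_plus)
    # (u_i - u_j)^T L+ (u_i - u_j) expands to l+_ii + l+_jj - 2 l+_ij
    return volume * (d[:, None] + d[None, :] - 2.0 * L_plus)

# Illustrative example: a path 0 - 1 - 2 with unit weights.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
act = average_commute_time(path)
# Adjacent nodes are 4 steps apart, the two end nodes 8 steps (up to rounding).
```

The matrix inversion is the dominant cost, which is where the O(n³) limit mentioned in the conclusion comes from.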


Page 6

Random walk based distance

• This measure is appealing for social networks because nodes in the same cluster are connected by many short paths, and clusters need not be similar in size or spherical in shape.


Page 7

Genetic Representation

• A GA is a computer simulation of evolutionary processes (inheritance, mutation, selection, and crossover).

• The choice of representation is key:
– Array of size N (nodes in the graph) with elements restricted by k (clusters)
– k bins with elements restricted by N (nodes)
– k medoids: each cluster is represented by one node, and the other nodes are assigned to the nearest cluster

• A possible gene is [3,7], with assignment [{1,2,3,4},{5,6,7,8}].

• Small genome, tight clustering.
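The nearest-medoid decoding described on this slide can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `decode_medoid_gene`, the 0-based node indices, and the toy distance matrix (eight points on a line) are assumptions made for the example.

```python
import numpy as np

def decode_medoid_gene(gene, dist):
    """Assign every node to its nearest medoid.

    gene: list of medoid node indices (0-based), one per cluster.
    dist: square matrix of pairwise distances between nodes.
    Returns one list of node indices per medoid.
    """
    dist = np.asarray(dist, dtype=float)
    # For each node, find which medoid column holds the smallest distance.
    nearest = np.argmin(dist[:, gene], axis=1)
    clusters = [[] for _ in gene]
    for node, medoid_slot in enumerate(nearest):
        clusters[medoid_slot].append(node)
    return clusters

# Hypothetical data: two well-separated groups of four points on a line.
positions = np.array([0, 1, 2, 3, 10, 11, 12, 13], dtype=float)
dist = np.abs(positions[:, None] - positions[None, :])

# The slide's gene [3,7] written with 0-based indices is [2, 6].
clusters = decode_medoid_gene([2, 6], dist)
print(clusters)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

In 1-based numbering the result is [{1,2,3,4},{5,6,7,8}], the assignment given on the slide. The genome holds only k entries, which is what keeps it small.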


Page 8

Medoid-based representation with exception bins

• An exception bin contains nodes that do not follow the representation by the medoid.

• A possible gene [3,7] suggests the allocation [{1,2,3,4},{5,6,7,8}]; with exceptions the gene becomes [3,7,{5,6},{2}].

• Crossover is defined by randomly interchanging genes.

• Mutation is the mode of exception creation, based on proximity.

• Fitness functions used: the inverse of the sum of the distances to the medoids; the inverse of the sum of all pairwise distances within a group; the minimum sum of all pairwise distances between nodes.
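The exception bins and the first two fitness functions on this slide can be sketched as below. The reading of the exception notation is an assumption: the i-th bin is taken to list nodes that are forced into cluster i regardless of their nearest medoid. The function names, the 0-based indices, and the toy distance matrix are likewise illustrative and not taken from the paper.

```python
import numpy as np

def apply_exceptions(clusters, exception_bins):
    """Move the nodes listed in each exception bin into that bin's cluster.

    Assumed semantics: exception_bins[i] overrides nearest-medoid assignment.
    """
    moved = {node for bin_ in exception_bins for node in bin_}
    result = []
    for members, bin_ in zip(clusters, exception_bins):
        kept = [node for node in members if node not in moved]
        result.append(sorted(kept + list(bin_)))
    return result

def medoid_fitness(gene, clusters, dist):
    """Inverse of the summed distance from each node to its cluster's medoid."""
    total = sum(dist[node, medoid]
                for medoid, members in zip(gene, clusters)
                for node in members)
    return 1.0 / total

def pairwise_fitness(clusters, dist):
    """Inverse of the summed pairwise distance inside every cluster."""
    total = 0.0
    for members in clusters:
        block = dist[np.ix_(members, members)]
        total += np.triu(block, k=1).sum()  # count each unordered pair once
    return 1.0 / total

# Hypothetical data: eight points on a line, medoids at indices 2 and 6.
positions = np.array([0, 1, 2, 3, 10, 11, 12, 13], dtype=float)
dist = np.abs(positions[:, None] - positions[None, :])
gene = [2, 6]
base = [[0, 1, 2, 3], [4, 5, 6, 7]]

# The slide's exceptions {5,6} and {2}, written with 0-based indices.
with_exceptions = apply_exceptions(base, [[4, 5], [1]])
print(with_exceptions)                     # [[0, 2, 3, 4, 5], [1, 6, 7]]
print(medoid_fitness(gene, base, dist))    # 0.125
print(pairwise_fitness(base, dist))        # 0.05
```

Both fitness functions divide by a sum of distances, so a real implementation would need to guard against a zero total (for example, when every cluster is a single node).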


Page 9

Experiments

• How accurate are the clustering results compared with Euclidean distance clustering?

• How efficient is this approach, and what is the algorithm's complexity?

• Synthetic data creation:


Page 10

Network clustering experiments

• Example shown: a 50-node network with 6 clusters.


Page 11

Network clustering experiments


Page 12

Spatial data experiments

• Results of transformation and clustering of 150 iris specimens, 50 from each of three species (Fisher’s Iris data).
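One way to transform spatial data into a network, consistent with the conclusion's mention of nearest neighbors, is a symmetric k-nearest-neighbor graph. The sketch below is an assumption about that transformation, not the paper's exact procedure: the function name `knn_similarity_graph`, the unit edge weights, and the toy points are illustrative.

```python
import numpy as np

def knn_similarity_graph(points, k):
    """Build a symmetric k-nearest-neighbor adjacency matrix.

    points: array of shape (n, d) with one row per observation.
    k: number of nearest neighbors each point links to.
    Edges get unit weight (an assumption made for this sketch).
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))   # Euclidean distances
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    A = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    A[rows, neighbors.ravel()] = 1.0
    return np.maximum(A, A.T)                 # keep an edge if either end chose it

# Hypothetical points on a line, chosen so that no distances tie.
points = [[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [10.0, 0.0], [11.0, 0.0]]
A = knn_similarity_graph(points, k=1)
# Edges: 0-1, 1-2 and 3-4; the two groups are not linked to each other.
```

With k = 1 this example graph is disconnected, and commute times between disconnected parts are not finite, so k has to be large enough to connect the graph before random walk distances can be computed on it.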


Page 13

Conclusion

• The O(n³) complexity limits the applicability of random walk distances to large networks.

• Excellent results when the number of clusters is known. What k is right?

• Superior results compared with Euclidean distances, regardless of the clustering algorithm used.

• Exceptionally good clustering results when spatial data is represented as a network, provided the optimum number of nearest neighbors is used.
