graph analysis: community detection - github pages · giri iyengar (cornell tech) graph analysis...

28
Graph Analysis: Community Detection Giri Iyengar Cornell University [email protected] March 19, 2018 Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 1 / 25

Upload: others

Post on 24-Oct-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

  • Graph Analysis: Community Detection

    Giri Iyengar

    Cornell University

    [email protected]

    March 19, 2018

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 1 / 25

  • Overview

    1 Overview

    2 Spectral Clustering

    3 Markov Cluster Algorithm

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 2 / 25

  • Overview

    1 Overview

    2 Spectral Clustering

    3 Markov Cluster Algorithm

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 3 / 25

  • Graph Partitioning

    Partitioning a GraphIn mathematics, the graph partition problem is defined on data representedin the form of a graph G = (V,E), with V vertices and E edges, such thatit is possible to partition G into smaller components with specificproperties. For instance, a k-way partition divides the vertex set into ksmaller components. A good partition is defined as one in which thenumber of edges running between separated components is small. Uniformgraph partition is a type of graph partitioning problem that consists ofdividing a graph into components, such that the components are of aboutthe same size and there are few connections between the components.

    Source: Wikipedia

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 4 / 25

  • Graph Partitioning

    Partitioning a GraphImportant applications of graph partitioning include scientific computing,partitioning various stages of a VLSI design circuit and task scheduling inmulti-processor systems. Recently, the graph partition problem has gainedimportance due to its application for clustering and detection of cliques insocial, pathological and biological networks. For a survey on recent trendsin computational methods and applications see Buluc et al. (2013).

    Source: Wikipedia

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 5 / 25

  • Graph Partitioning

    Natural clusters have nodes withmany edges between themAnd few edges across clustersA random walk on the graphwill infrequently go from onecluster to anotherAdjacency matrix will haveblock-like structureEigenvalues of Adjacency /Laplacian matrix will berevealing

    Figure: Carlos Castillo, SapienzaUniversity

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 6 / 25

  • Overview

    1 Overview

    2 Spectral Clustering

    3 Markov Cluster Algorithm

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 7 / 25

  • Spectral Clustering

    Spectral ClusteringIn multivariate statistics and the clustering of data, spectral clusteringtechniques make use of the spectrum (eigenvalues) of the similarity matrixof the data to perform dimensionality reduction before clustering in fewerdimensions. The similarity matrix is provided as an input and consists of aquantitative assessment of the relative similarity of each pair of points inthe dataset.

    Source: Wikipedia

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 8 / 25

  • Spectral Clustering

    Initially used for Image Segmentation

    Generally recognized as a technique to cluster dataAn alternative to K-MeansWidely used in Graph Partitioning / Community Detection

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 9 / 25

  • Spectral Clustering

    Initially used for Image SegmentationGenerally recognized as a technique to cluster data

    An alternative to K-MeansWidely used in Graph Partitioning / Community Detection

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 9 / 25

  • Spectral Clustering

    Initially used for Image SegmentationGenerally recognized as a technique to cluster dataAn alternative to K-Means

    Widely used in Graph Partitioning / Community Detection

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 9 / 25

  • Spectral Clustering

    Initially used for Image SegmentationGenerally recognized as a technique to cluster dataAn alternative to K-MeansWidely used in Graph Partitioning / Community Detection

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 9 / 25

  • Spectral Clustering: Definitions

    Graph: G(V, E)Weighted Graph. That is between vi and vj , there is a weight wijUndirected. wij == wjiDefine degree of vertex di =

    ∑nj=1 wij

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 10 / 25

  • Spectral Clustering: Definitions

    W(A,B) =∑

    i∈A,j∈B wij . A, B ⊂ V|A|: Number of vertices in Avol(A) =

    ∑i∈A di

    A ∩ Ā = ∅

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 11 / 25

  • Spectral Clustering: Properties of Laplacian

    fT Lf = 12∑n

    i,j=1 wij(fi − fj)2

    fT Lf = fT Df − fT Af =∑n

    i=1 dif2i −

    ∑ni,j=1 wijfifj

    fT Lf = 12(∑n

    i=1 dif2i − 2

    ∑ni,j=1 wijfifj +

    ∑nj=1 djf

    2j )

    L is symmetric (follows from symmetry of D and A)L is positive semi-definiteSmallest eigenvalue of L is 0 and has 1n as the eigenvector

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 12 / 25

  • Spectral Clustering: Properties of Laplacian

    Number of connected componentsNumber of connected components in the Graph is given by multiplicity ofeigenvalue 0. Trivial to see from previous slide.

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 13 / 25

  • Spectral Clustering Algorithm

    Compute the un-normalized Laplacian LCompute the first k eigenvectors of L. These are the smallest keigenvectorsU matrix of the first k eigenvectors is an n× k matrixTake the Rk row-vectors of UPerform k-way K-Means on these vectors

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 14 / 25

  • Variations of Spectral Clustering

    Instead of the un-normalized Laplacian, start with normalizedLaplacian matricesInstead of solving the eigenvalue problem, solve the generalizedeigenvalue problem instead

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 15 / 25

  • Spectral Clustering for Image Segmentation

    Figure: Source - Scikit-learn Figure: Source - Scikit-learn

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 16 / 25

  • Overview

    1 Overview

    2 Spectral Clustering

    3 Markov Cluster Algorithm

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 17 / 25

  • Markov Cluster Algorithm

    MCL Simulation

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 18 / 25

    https://micans.org/mcl/ani/mcl-animation.html

  • MCL Algorithm

    MCLThe MCL algorithm is short for the Markov Cluster Algorithm, a fast andscalable unsupervised cluster algorithm for graphs (also known asnetworks) based on simulation of (stochastic) flow in graphs.

    Heavily used in clustering Proteins based on their Amino acid sequences

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 19 / 25

  • MCL Algorithm

    Finds clusters by simulating random walksAssumption: Random walks in a graph with tend to spend more timein Natural clustersRandom walks will infrequently go between clustersWithin a cluster, much more likely to find longer-length random walkbetween any pair of nodes

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 20 / 25

  • MCL Algorithm

    Start with a Column Stochastic matrix of the GraphExpansion Step: Power iteration of the MatrixInflation Step: Hadamard power followed by normalizationRepeat Expansion and Inflation till convergence

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 21 / 25

  • MCL Algorithm

    M1: Column Stochastic MatrixExpansion: M2 = M1 ×M1Hadamard: M3 = [mr2ij ]. Typically r = 2Normalization: M1 = [ m3ij∑

    jm3ij

    ]

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 22 / 25

  • Hadamard power operation on Matrix

    [0.8 0.6 0.90.2 0.4 0.1

    ] [0.64 0.36 0.810.04 0.16 0.01

    ]→

    [0.94 0.69 0.990.06 0.31 0.01

    ]

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 23 / 25

  • MCL Algorithm

    Figure: Source - Stijn van Dongen

    Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 24 / 25

  • Giri Iyengar (Cornell Tech) Graph Analysis March 19, 2018 25 / 25

    OverviewSpectral ClusteringMarkov Cluster Algorithm