

Page 1: Network theory III David Lusseau BIOL4062/5062 d.lusseau@dal.ca

Network theory III

David Lusseau

BIOL4062/5062

d.lusseau@dal.ca

Page 2:

Outline

16 March: community structure

Suggested readings: Newman M.E.J. 2003. The structure and function of complex networks. SIAM Review 45, 167-256

Page 3:

What is a community?

A cluster of individuals that are more linked to one another than to others

Page 4:

Traditional techniques

Cluster analysis (hierarchical)

Multi-Dimensional Scaling

Principal Coordinate Analysis

Page 5:

Traditional techniques

How representative is the result?

Loss of information measure: stress in MDS

What is the best division?

Cluster analysis: peripheral individuals are lumped together

Page 6:

Girvan-Newman algorithm

Divisive clustering algorithm

Divide a population of n vertices into 1 to n communities

Find the boundaries of communities

Weakest link between communities: edge betweenness

Standardise betweenness at each step

Re-calculate edge betweenness at each step
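The divisive procedure above can be sketched in a few lines of Python. This is a minimal illustration for small, unweighted, undirected graphs and not the published implementation. The function names and the edge-list input format are assumptions made for the example, and the standardisation step is left out.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph (BFS from every vertex)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, sharing path counts among edges.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    # Every pair of vertices was visited from both ends, so halve the totals.
    return {e: b / 2.0 for e, b in score.items()}


def components(adj):
    """Connected components, each returned as a set of vertices."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman(edges):
    """Return the successive divisions, from 1 community down to n singletons."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    divisions = [components(adj)]
    while any(adj.values()):
        bet = edge_betweenness(adj)      # re-calculated at each step
        u, v = max(bet, key=bet.get)     # edge with the highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > len(divisions[-1]):
            divisions.append(comps)
    return divisions


# Two triangles joined by a single bridge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
divisions = girvan_newman(edges)
print(divisions[1])  # first split: the two triangles
```

The function returns every division along the way; choosing among them is a separate step.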

Page 7:

Zachary karate club

Girvan & Newman 2002 PNAS

Page 8:

Finding the best division

For each step calculate a modularity coefficient

Best division will have the most edges within communities and the fewest between

Take community size into consideration

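$$Q = \sum_i \left( e_{ii} - a_i^2 \right)$$

      1    2    3
1    30    2    5
2     2   10    2
3     5    2   50

$$a_i = \sum_j e_{ij}$$

$$Q = \left( \frac{30}{108} - \left( \frac{37}{108} \right)^2 \right) + \left( \frac{10}{108} - \left( \frac{14}{108} \right)^2 \right) + \left( \frac{50}{108} - \left( \frac{57}{108} \right)^2 \right)$$

Q=0.42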
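The worked example can be checked numerically. The sketch below assumes, as the arithmetic on the slide implies, that each entry of the 3 x 3 matrix is divided by the matrix total (108) to give e_ij. The function name is chosen for illustration only.

```python
def modularity_from_counts(counts):
    """Q = sum_i (e_ii - a_i^2), with e_ij = counts[i][j] / total."""
    total = sum(sum(row) for row in counts)
    q = 0.0
    for i, row in enumerate(counts):
        e_ii = counts[i][i] / total   # fraction of links within community i
        a_i = sum(row) / total        # fraction of links attached to community i
        q += e_ii - a_i ** 2
    return q


counts = [[30, 2, 5],
          [2, 10, 2],
          [5, 2, 50]]
print(round(modularity_from_counts(counts), 2))  # 0.42
```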

Page 9:

Zachary karate club

Newman & Girvan 2003 Physical Review E

Page 10:

Modularity coefficient

The principle of modularity coefficient optimisation can be applied to any community structure algorithm

Page 11:

Extension to weighted matrices

Edge betweenness

Transform similarity matrix into dissimilarity matrix

Calculate geodesic paths using Dijkstra's algorithm

Problem: more likely to remove edges between strongly connected pairs
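A small sketch of the weighted geodesic step follows. The slide does not say which transformation is used, so the reciprocal (distance = 1 / similarity) is an assumption here, and the function names are illustrative.

```python
import heapq


def similarity_to_distance(sim):
    """Assumed transform: distance = 1 / similarity; zero similarity means no edge."""
    n = len(sim)
    return [[1.0 / sim[i][j] if i != j and sim[i][j] > 0 else None
             for j in range(n)] for i in range(n)]


def dijkstra(dist, source):
    """Length of the geodesic path from source to every vertex."""
    n = len(dist)
    best = [float("inf")] * n
    best[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue  # stale heap entry
        for v in range(n):
            w = dist[u][v]
            if w is not None and d + w < best[v]:
                best[v] = d + w
                heapq.heappush(heap, (best[v], v))
    return best


# Individuals 0-1 and 1-2 are strongly associated; 0-2 only weakly.
sim = [[0.0, 0.5, 0.1],
       [0.5, 0.0, 0.5],
       [0.1, 0.5, 0.0]]
print(dijkstra(similarity_to_distance(sim), 0))  # [0.0, 2.0, 4.0]
```

In this toy case the geodesic from 0 to 2 runs through the two strong ties (length 4) rather than the weak direct tie (length 10). Strong edges therefore collect more geodesic paths and a higher betweenness, which is the problem named on the slide.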

Page 12:

Alternative: Modularity optimisation

Forget edge betweenness

Optimise for high Q!

Computer intensive

Prone to false (local) optima

Difficult to find out

Iterate the optimisation to detect them

Not always successful

Page 13:

Modularity: greedy algorithm

Start with n communities (agglomerative clustering method)

At each step, join the pair of communities that provides the greatest increase (or the smallest decrease) in Q
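The agglomerative idea can be illustrated with a brute-force sketch that tries every pairwise merge and recomputes Q from scratch. This is far slower than published greedy implementations and is only meant for very small, unweighted graphs. The function names are assumptions for the example.

```python
from itertools import combinations


def modularity(edges, communities):
    """Q = sum over communities of (within-edges / m) - (degree sum / 2m)^2."""
    m = len(edges)
    label = {v: c for c, members in enumerate(communities) for v in members}
    within = [0] * len(communities)
    degree = [0] * len(communities)
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            within[label[u]] += 1
    return sum(w / m - (d / (2 * m)) ** 2 for w, d in zip(within, degree))


def greedy_modularity(edges):
    """Merge communities pairwise, keeping the division with the highest Q seen."""
    nodes = sorted({v for e in edges for v in e})
    communities = [frozenset([v]) for v in nodes]   # start with n communities
    best_q, best = modularity(edges, communities), communities
    while len(communities) > 1:
        # Evaluate every possible merge and keep the one giving the highest Q,
        # i.e. the greatest increase or the smallest decrease.
        candidates = []
        for a, b in combinations(communities, 2):
            merged = [c for c in communities if c not in (a, b)] + [a | b]
            candidates.append((modularity(edges, merged), merged))
        q, communities = max(candidates, key=lambda t: t[0])
        if q > best_q:
            best_q, best = q, communities
    return best_q, best


# Two triangles joined by a single bridge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
best_q, best = greedy_modularity(edges)
print(round(best_q, 3), sorted(sorted(c) for c in best))
```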

Page 14:

Q optimisation

Girvan-Newman

Modularity: greedy algorithm

Page 15:

Overlapping communities

Recognise that some individuals sit on the fence

Do not force them into one community or the other, but identify them as overlapping

Palla et al. 2005 Nature

Page 16:

Palla algorithm

Based on the k-clique principle: a community is composed of a number of k-cliques

k-cliques: fully connected subgraphs of k vertices

Adjacent k-cliques share k-1 vertices

Community: series of adjacent cliques

Palla et al. 2005 Nature

Page 17:

Palla algorithm

Find all k-cliques

Calculate the clique-clique overlap matrix

Define adjacent cliques

Issues (and advantages):

k is user-defined; find the ‘best’ k by trial and error

Works only on binary networks (weighted networks need a transformation)

Palla et al. 2005 Nature
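The three steps (find k-cliques, compare their overlap, chain adjacent cliques) can be sketched as below. The clique search here is a brute-force enumeration suited to small binary networks only and is not the search used in the published method. The function name is an assumption.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Communities as unions of k-cliques chained through shared k-1 vertices."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Step 1: enumerate all k-cliques (fully connected subgraphs of k vertices).
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    # Steps 2-3: union-find over cliques; two cliques are adjacent when
    # their overlap is exactly k-1 vertices.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return list(groups.values())


edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]
communities = k_clique_communities(edges, 3)
print(sorted(sorted(c) for c in communities))  # [[0, 1, 2, 3], [3, 4, 5]]
```

Vertex 3 appears in both communities: it sits on the fence and is reported as overlapping rather than forced into one side.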

Page 18:

Simply the best method

Page 19:

Modularity matrix

A matrix? Let’s eigenanalyse!

Let’s rewrite the modularity coefficient:

$$Q = \frac{1}{4m} \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) s_i s_j$$

$k_i k_j / 2m$: links distributed at random

$s_i s_j$: community identification

Newman 2006 PNAS

Page 20:

Modularity matrix

Sums of rows and sums of columns = 0

One eigenvector (1,1,1,…) with eigenvalue 0

Same property as the graph Laplacian

Eigenvector of the dominant eigenvalue gives the best community division into 2 communities (negative and positive elements)

$$B_{ij} = A_{ij} - \frac{k_i k_j}{2m}$$
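As a rough illustration of the bisection step, the sketch below builds the modularity matrix B for a small unweighted graph and approximates its dominant eigenvector with shifted power iteration. The shift, the starting vector, the iteration count and the function names are choices made for this example. A numerical library's eigen-solver would normally be used instead.

```python
def modularity_matrix(edges, nodes):
    """B_ij = A_ij - k_i * k_j / (2m) for an unweighted, undirected graph."""
    index = {v: i for i, v in enumerate(nodes)}
    n, m = len(nodes), len(edges)
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[index[u]][index[v]] = A[index[v]][index[u]] = 1.0
    k = [sum(row) for row in A]
    return [[A[i][j] - k[i] * k[j] / (2.0 * m) for j in range(n)]
            for i in range(n)]


def leading_eigenvector(B, iterations=500):
    """Power iteration on B + shift*I, which has the same eigenvectors as B."""
    n = len(B)
    # The shift makes every eigenvalue non-negative, so the iteration converges
    # to the eigenvector of the largest (most positive) eigenvalue of B.
    shift = max(sum(abs(x) for x in row) for row in B)
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i]
             for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def bisect(edges):
    """Split vertices by the sign of their element in the dominant eigenvector."""
    nodes = sorted({v for e in edges for v in e})
    vec = leading_eigenvector(modularity_matrix(edges, nodes))
    positive = {v for v, x in zip(nodes, vec) if x > 0}
    return positive, set(nodes) - positive


# Two triangles joined by a single bridge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
group_a, group_b = bisect(edges)
print(sorted(group_a), sorted(group_b))
```

Which triangle ends up with the positive sign is arbitrary; only the split itself is meaningful.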

Page 21:

Magnitude of eigenvector elements

Tells us how well a vertex is classified (whether it belongs to the core or the periphery of the community)

Zachary karate club

Page 22:

Finding the best division

Repeat the process on each subgraph

Recalculate the modularity coefficient for the whole graph

If a new division makes a zero or negative contribution to modularity, then do not make it

Else continue
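The stopping rule can be sketched by recalculating Q for the whole graph before and after a proposed split. How the split itself is proposed is left outside this example; the helper `try_split` and its arguments are hypothetical names.

```python
def modularity(edges, communities):
    """Q for the whole graph under a given division into communities."""
    m = len(edges)
    label = {v: c for c, members in enumerate(communities) for v in members}
    within = [0] * len(communities)
    degree = [0] * len(communities)
    for u, v in edges:
        degree[label[u]] += 1
        degree[label[v]] += 1
        if label[u] == label[v]:
            within[label[u]] += 1
    return sum(w / m - (d / (2 * m)) ** 2 for w, d in zip(within, degree))


def try_split(edges, communities, index, part_a, part_b):
    """Split communities[index] into part_a and part_b only if Q increases."""
    proposal = communities[:index] + [part_a, part_b] + communities[index + 1:]
    gain = modularity(edges, proposal) - modularity(edges, communities)
    # A zero or negative contribution means the division is not made.
    return (proposal, gain) if gain > 0 else (communities, gain)


# Two triangles joined by a single bridge (2-3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
whole = [{0, 1, 2, 3, 4, 5}]
division, gain1 = try_split(edges, whole, 0, {0, 1, 2}, {3, 4, 5})
division2, gain2 = try_split(edges, division, 0, {0, 1}, {2})
print(len(division), round(gain1, 3), len(division2), round(gain2, 3))
```

The first split is accepted because it raises Q; the attempt to break a triangle lowers Q and is rejected, so the division stays at two communities.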

Page 23:

Power of modularity matrix method

Different types of null models can be tested

As long as we have one eigenvector (1,1,1,…) with eigenvalue 0

To do so, subtract the sum of each row from the diagonal

$$Q = \frac{1}{4m} \sum_{ij} \left( A_{ij} - P_{ij} \right) s_i s_j$$
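The diagonal correction is easy to show in code. The uniform null model used below (every P_ij equal to 2m / n^2) is a hypothetical choice made only to give a matrix whose rows do not already sum to zero; it is not taken from the slides.

```python
def generalised_modularity_matrix(A, P):
    """B = A - P, with each row sum subtracted from the diagonal."""
    n = len(A)
    B = [[A[i][j] - P[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # After this step every row sums to zero, so (1,1,1,...) is an
        # eigenvector of B with eigenvalue 0.
        B[i][i] -= sum(B[i])
    return B


# Path graph 0-1-2 with a uniform (hypothetical) null model.
A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
n, m = 3, 2
P = [[2.0 * m / n ** 2] * n for _ in range(n)]
B = generalised_modularity_matrix(A, P)
print([round(sum(row), 12) for row in B])
```

Only the diagonal changes, so B stays symmetric whenever A and P are symmetric.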

Page 24:

Uncertainty

Bootstrapped algorithm

m results from community algorithm

Matrix: likelihood that 2 individuals belong to the same community

Coarse-grain community identity

Provides uncertainty overlap
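A minimal sketch of the co-membership matrix, assuming the m community divisions have already been produced (the slide does not say how the bootstrap replicates are generated). The function name and input format are illustrative.

```python
def comembership(partitions, individuals):
    """Proportion of runs in which each pair falls in the same community."""
    index = {v: i for i, v in enumerate(individuals)}
    n = len(individuals)
    counts = [[0] * n for _ in range(n)]
    for partition in partitions:
        for community in partition:
            for u in community:
                for v in community:
                    counts[index[u]][index[v]] += 1
    runs = len(partitions)
    return [[c / runs for c in row] for row in counts]


# Four hypothetical results for three individuals.
partitions = [
    [{0, 1}, {2}],
    [{0}, {1, 2}],
    [{0, 1}, {2}],
    [{0, 1, 2}],
]
matrix = comembership(partitions, [0, 1, 2])
print(matrix[0][1], matrix[1][2], matrix[0][2])  # 0.75 0.5 0.25
```

Individual 1 is grouped with 0 in most runs but with 2 in half of them, the kind of uncertainty in community identity the slide refers to.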

Page 25:

Girvan-Newman in Netdraw

Modularity matrix in Socprog