A Geometric Interpretation of Gene Co-Expression Network Analysis Steve Horvath, Jun Dong

Outline

• Network and network concepts

• Approximately factorizable networks

• Gene Co-expression Network
  – Eigengene Factorizability, Eigengene Conformity
  – Eigengene-based network concepts

• What can we learn from the geometric interpretation?

Network = Adjacency Matrix

• A network can be represented by an adjacency matrix, $A = [a_{ij}]$, that encodes whether/how a pair of nodes is connected.
  – A is a symmetric matrix with entries in [0, 1].
  – For an unweighted network, entries are 1 or 0 depending on whether or not two nodes are adjacent (connected).
  – For weighted networks, the adjacency matrix reports the connection strength between node pairs.
  – Our convention: the diagonal elements of A are all 1.
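The conventions above can be checked mechanically. A minimal sketch in Python with NumPy; `is_valid_adjacency` is a hypothetical helper (not from any package) and the two matrices are made-up examples.

```python
import numpy as np

def is_valid_adjacency(A, tol=1e-12):
    """Check the conventions above: square, symmetric, entries in [0, 1], unit diagonal."""
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False
    symmetric = np.allclose(A, A.T, atol=tol)
    in_unit_interval = bool(np.all((A >= 0.0) & (A <= 1.0)))
    unit_diagonal = np.allclose(np.diag(A), 1.0, atol=tol)
    return symmetric and in_unit_interval and unit_diagonal

# Unweighted path network 1 - 2 - 3: entries are 0 or 1.
A_unweighted = np.array([[1, 1, 0],
                         [1, 1, 1],
                         [0, 1, 1]])
# Weighted network: entries are connection strengths in [0, 1].
A_weighted = np.array([[1.0, 0.8, 0.1],
                       [0.8, 1.0, 0.5],
                       [0.1, 0.5, 1.0]])
print(is_valid_adjacency(A_unweighted), is_valid_adjacency(A_weighted))  # True True
```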

Motivational example I: pairwise relationships between genes across different mouse tissues and genders

Challenge: Develop simple descriptive measures of these patterns.

Solution: The following network concepts are useful: density, centralization, clustering coefficient, heterogeneity.

Motivational example (continued)

Challenge: Find a simple measure for describing the relationship between gene significance and connectivity.

Solution: a network concept called hub gene significance.

Background

• Network concepts are also known as network statistics or network indices.
  – Examples: connectivity (degree), clustering coefficient, topological overlap, etc.

• Network concepts underlie network language and systems biological modeling.

• Dozens of potentially useful network concepts are known from graph theory.

Review of some fundamental network concepts, which are defined for all networks (not just co-expression networks)

Connectivity

• Node connectivity = row sum of the adjacency matrix, excluding the diagonal element
  – For unweighted networks: the number of direct neighbors
  – For weighted networks: the sum of connection strengths to other nodes

$$\text{Connectivity}_i = k_i = \sum_{j \neq i} a_{ij}$$

$$\text{Scaled connectivity } K_i = \frac{k_i}{\max(k)}$$
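A sketch of connectivity and scaled connectivity in Python with NumPy, assuming A follows the unit-diagonal convention; the function names are hypothetical, and the diagonal is subtracted so that only the terms with j ≠ i contribute.

```python
import numpy as np

def connectivity(A):
    """k_i = sum_{j != i} a_ij, i.e. the row sum without the (unit) diagonal element."""
    A = np.asarray(A, dtype=float)
    return A.sum(axis=1) - np.diag(A)

def scaled_connectivity(A):
    """K_i = k_i / max(k); the most highly connected node gets K = 1."""
    k = connectivity(A)
    return k / k.max()

A = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.5],
              [0.1, 0.5, 1.0]])
k = connectivity(A)          # approximately [0.9, 1.3, 0.6]
K = scaled_connectivity(A)   # approximately [0.692, 1.0, 0.462]
```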

Density

• Density = mean adjacency
• Highly related to mean connectivity

$$\text{Density} = \frac{\sum_i \sum_{j \neq i} a_{ij}}{n(n-1)} = \frac{\text{mean}(k)}{n-1}$$

where $n$ is the number of network nodes.
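Density as defined above, sketched in NumPy on a made-up unweighted star network; for a star with n nodes the density works out to 2/n, here 0.4. The helper name `density` is hypothetical.

```python
import numpy as np

def density(A):
    """Mean off-diagonal adjacency: sum_i sum_{j != i} a_ij / (n (n - 1))."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    k = A.sum(axis=1) - np.diag(A)      # connectivity k_i
    return k.sum() / (n * (n - 1))

# Unweighted star with n = 5 nodes: the hub has k = 4, each leaf has k = 1.
star = np.eye(5)
star[0, 1:] = star[1:, 0] = 1.0
k = star.sum(axis=1) - 1.0
print(density(star), k.mean() / (5 - 1))   # both equal 0.4 for this star
```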

Centralization

[Figure: two example networks; one has Centralization = 1 because it has a star topology, the other has Centralization = 0 because all nodes have the same connectivity of 2.]

$$\text{Centralization} = \frac{n}{n-2}\left(\frac{\max(k)}{n-1} - \text{Density}\right) \approx \frac{\max(k)}{n} - \text{Density}$$

= 1 if the network has a star topology

= 0 if all nodes have the same connectivity
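A sketch of the exact centralization formula with two hypothetical test networks: a star, which should give 1, and a cycle in which every node has connectivity 2, which should give 0. The helpers `star` and `ring` are made up for illustration.

```python
import numpy as np

def centralization(A):
    """n / (n - 2) * (max(k) / (n - 1) - Density)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    k = A.sum(axis=1) - np.diag(A)
    density = k.sum() / (n * (n - 1))
    return n / (n - 2) * (k.max() / (n - 1) - density)

def star(n):
    """Unweighted star: node 0 is adjacent to every other node."""
    A = np.eye(n)
    A[0, 1:] = A[1:, 0] = 1.0
    return A

def ring(n):
    """Unweighted cycle: every node has connectivity 2."""
    A = np.eye(n)
    for i in range(n):
        j = (i + 1) % n
        A[i, j] = A[j, i] = 1.0
    return A

print(centralization(star(6)), centralization(ring(6)))  # approximately 1.0 and 0.0
```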

Heterogeneity

• Heterogeneity: coefficient of variation of the connectivity

• Highly heterogeneous networks exhibit hubs

$$\text{Heterogeneity} = \frac{\sqrt{\text{variance}(k)}}{\text{mean}(k)}$$
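A sketch of heterogeneity in NumPy. The slide does not say whether variance(k) uses a 1/n or a 1/(n − 1) normalization, so the population (1/n) variance used here is an assumption; `heterogeneity` is a hypothetical helper.

```python
import numpy as np

def heterogeneity(A):
    """Coefficient of variation of the connectivity: sqrt(variance(k)) / mean(k)."""
    A = np.asarray(A, dtype=float)
    k = A.sum(axis=1) - np.diag(A)
    return np.sqrt(np.var(k)) / np.mean(k)   # np.var defaults to the 1/n (population) variance

star5 = np.eye(5)
star5[0, 1:] = star5[1:, 0] = 1.0             # k = [4, 1, 1, 1, 1]
print(heterogeneity(star5))                   # approximately 0.75: the hub makes k heterogeneous
```

A network in which every node has the same connectivity (for example a complete graph) has heterogeneity 0.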

Clustering Coefficient

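Measures the cliquishness of a particular node: « A node is cliquish if its neighbors know each other »

[Figure: two example networks; in the first, the clustering coefficient of the black node = 0; in the second, the clustering coefficient = 1.]

$$\text{ClusterCoef}_i = \frac{\sum_{l \neq i} \sum_{m \neq i,l} a_{il}\, a_{lm}\, a_{mi}}{\left(\sum_{l \neq i} a_{il}\right)^2 - \sum_{l \neq i} a_{il}^2}$$

This generalizes directly to weighted networks (Zhang and Horvath 2005).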
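The clustering coefficient can be evaluated for all nodes at once: once the diagonal is zeroed, the numerator is the diagonal of A³. A NumPy sketch; returning 0 for nodes whose denominator vanishes (fewer than two neighbors) is our convention, not something the slides specify.

```python
import numpy as np

def cluster_coef(A):
    """ClusterCoef_i for a weighted or unweighted adjacency matrix (diagonal ignored)."""
    A0 = np.asarray(A, dtype=float).copy()
    np.fill_diagonal(A0, 0.0)            # a zero diagonal drops the l = i, m = i and m = l terms
    numer = np.diag(A0 @ A0 @ A0)        # sum_{l != i} sum_{m != i, l} a_il a_lm a_mi
    k = A0.sum(axis=1)
    denom = k**2 - (A0**2).sum(axis=1)   # (sum_l a_il)^2 - sum_l a_il^2
    out = np.zeros_like(numer)
    np.divide(numer, denom, out=out, where=denom > 0)
    return out

# Triangle 0-1-2 plus a pendant node 3 attached to node 0.
A = np.array([[1, 1, 1, 1],
              [1, 1, 1, 0],
              [1, 1, 1, 0],
              [1, 0, 0, 1]], dtype=float)
print(cluster_coef(A))   # node 0: only 1 of its 3 neighbour pairs is connected, so 1/3
```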

The topological overlap dissimilarity is used as input to hierarchical clustering

• Generalized in Zhang and Horvath (2005) to the case of weighted networks
• Generalized in Li and Horvath (2006) to multiple nodes
• Generalized in Yip and Horvath (2007) to higher order interactions

$$\text{TOM}_{ij} = \frac{\sum_{u \neq i,j} a_{iu}\, a_{uj} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}$$

$$\text{DistTOM}_{ij} = 1 - \text{TOM}_{ij}$$
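A NumPy sketch of the topological overlap and its dissimilarity. With a zero diagonal, the matrix product A·A already excludes u = i and u = j; setting the diagonal of TOM to 1 is our convention. `tom` is a hypothetical helper name.

```python
import numpy as np

def tom(A):
    """TOM_ij = (sum_{u != i,j} a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    A0 = np.asarray(A, dtype=float).copy()
    np.fill_diagonal(A0, 0.0)            # (A0 @ A0)_ij then skips u = i and u = j
    k = A0.sum(axis=1)
    T = (A0 @ A0 + A0) / (np.minimum.outer(k, k) + 1.0 - A0)
    np.fill_diagonal(T, 1.0)             # convention: a node fully overlaps with itself
    return T

# Path 0 - 1 - 2: nodes 0 and 2 are not adjacent but share their only neighbour.
path = np.array([[1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [0.0, 1.0, 1.0]])
T = tom(path)
dist_tom = 1.0 - T                       # DistTOM, the input to hierarchical clustering
print(T[0, 2], T[0, 1])                  # 0.5 1.0
```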

Network Significance

• Defined as the average gene significance
• We often refer to the network significance of a module network as module significance.

$$\text{NetworkSignif} = \frac{\sum_i GS_i}{n}$$

Hub Gene Significance = slope of the regression line of $GS_i$ on $K_i$ (intercept = 0)

$$\text{HubGeneSignif} = \frac{\sum_i GS_i\, K_i}{\sum_i K_i^2}$$
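Network significance and hub gene significance, sketched in NumPy with made-up values. The least-squares fit through the origin (`np.linalg.lstsq` on a single column, no intercept) is included only to confirm that the closed-form ratio is that regression slope; the helper names are hypothetical.

```python
import numpy as np

def network_significance(gs):
    """Average gene significance (module significance when the network is a module)."""
    return float(np.mean(gs))

def hub_gene_significance(gs, K):
    """Slope of the least-squares line GS_i = b * K_i with the intercept fixed at 0."""
    gs, K = np.asarray(gs, dtype=float), np.asarray(K, dtype=float)
    return float(np.sum(gs * K) / np.sum(K**2))

K = np.array([1.0, 0.8, 0.5, 0.3, 0.1])          # hypothetical scaled connectivities
gs = np.array([0.72, 0.55, 0.37, 0.20, 0.08])    # hypothetical gene significances
slope = np.linalg.lstsq(K[:, None], gs, rcond=None)[0][0]   # no-intercept regression
print(network_significance(gs), hub_gene_significance(gs, K), slope)
```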

Q: What do all of these fundamental network concepts have in common?

They are functions of the adjacency matrix A and/or a gene significance measure GS.

CHALLENGE: Find relationships between these and other seemingly disparate network concepts.
• For general networks, this is a difficult problem.
• But a solution exists for a special subclass of networks: approximately factorizable networks.

Definition of an approximately factorizable network

Definitions:

The adjacency matrix A is approximately factorizable if there exists a vector CF with non-negative elements such that

$$a_{ij} \approx CF_i\, CF_j \quad \text{for all } i \neq j.$$

$CF_i$ is referred to as the conformity of the i-th node.

Why is this relevant? Answer: because modules are often approximately factorizable.

Algorithmic definition of the conformity and a measure of factorizability

We define the conformity as a maximizer of the factorizability function

$$F_A(v) = 1 - \frac{\sum_i \sum_{j \neq i} \left(a_{ij} - v_i v_j\right)^2}{\sum_i \sum_{j \neq i} a_{ij}^2}$$

We use an iterative algorithm to approximate the conformity vector CF.

A measure of factorizability is then defined as $F(A) = F_A(CF)$. This is conceptually related to a factor analysis of A.
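The slides do not spell out the iterative algorithm, so the following is one plausible choice, an assumption rather than the authors' method: coordinate-wise maximization of F_A(v), where each step sets v_i to the exact minimizer of Σ_{j≠i}(a_ij − v_i v_j)², namely v_i = Σ_{j≠i} a_ij v_j / Σ_{j≠i} v_j². The starting value k_i / √(Σ_j k_j) is also our choice; it equals CF_i approximately when A is factorizable. `conformity` and `factorizability` are hypothetical names.

```python
import numpy as np

def conformity(A, max_sweeps=1000, tol=1e-12):
    """Approximate CF by coordinate-wise maximization of F_A(v) (sketch)."""
    A0 = np.asarray(A, dtype=float).copy()
    np.fill_diagonal(A0, 0.0)                  # F_A ignores the diagonal
    k = A0.sum(axis=1)
    v = k / np.sqrt(k.sum())                   # starting value, ~CF when a_ij = CF_i CF_j
    for _ in range(max_sweeps):
        v_old = v.copy()
        for i in range(v.size):
            s2 = v @ v - v[i] ** 2             # sum_{j != i} v_j^2
            if s2 > 0:
                v[i] = (A0[i] @ v) / s2        # A0[i, i] = 0, so this sums over j != i
        if np.max(np.abs(v - v_old)) < tol:
            break
    return v

def factorizability(A, v):
    """F_A(v) = 1 - sum_{i != j} (a_ij - v_i v_j)^2 / sum_{i != j} a_ij^2."""
    A = np.asarray(A, dtype=float)
    off = ~np.eye(A.shape[0], dtype=bool)
    resid = (A - np.outer(v, v))[off]
    return 1.0 - np.sum(resid**2) / np.sum(A[off] ** 2)

cf_true = np.array([0.9, 0.8, 0.6, 0.5, 0.3])   # hypothetical conformities
A = np.outer(cf_true, cf_true)
np.fill_diagonal(A, 1.0)                         # exactly factorizable off the diagonal
cf = conformity(A)
print(np.round(cf, 4), factorizability(A, cf))   # recovers cf_true, F(A) close to 1
```

Each coordinate step can only lower the residual sum of squares, which is why this scheme is a reasonable default for a sketch; other optimizers would serve equally well.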

Empirical Observation 1

• Sub-networks composed of module genes tend to be approximately factorizable, i.e.

$$a_{ij} \approx CF_i\, CF_j \quad \text{for all } i \neq j$$

This observation implies the following Observation 2…

Empirical evidence is provided in the following article: Dong J, Horvath S (2007) Understanding Network Concepts in Modules. BMC Systems Biology 2007, 1:24

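Observation 2: Approximate relationships among network concepts in approximately factorizable networks

$$\text{mean}(\text{ClusterCoef}) \approx \left(1 + \text{Heterogeneity}^2\right)^2 \times \text{Density}$$

$$\text{TopOverlap}_{ij} \approx \frac{\max(k_i, k_j)}{n}\left(1 + \text{Heterogeneity}^2\right)$$

$$\text{TopOverlap}_{[1],j} \approx \left(\text{Centralization} + \text{Density}\right)\left(1 + \text{Heterogeneity}^2\right)$$

where [1] denotes the index of the most highly connected hub.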
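The relationships of Observation 2 can be checked numerically on a hypothetical, exactly factorizable network with a_ij = CF_i·CF_j. A NumPy sketch; the conformities are drawn uniformly from [0.3, 0.9], which is an arbitrary choice, and with n = 400 nodes we expect the approximations to hold to within a few percent.

```python
import numpy as np

def network_concepts(A):
    """Connectivity, Density, Centralization, Heterogeneity, ClusterCoef and TOM (diagonal ignored)."""
    A0 = np.asarray(A, dtype=float).copy()
    np.fill_diagonal(A0, 0.0)
    n = A0.shape[0]
    k = A0.sum(axis=1)
    density = k.sum() / (n * (n - 1))
    centralization = n / (n - 2) * (k.max() / (n - 1) - density)
    heterogeneity = np.sqrt(np.var(k)) / np.mean(k)
    cluster_coef = np.diag(A0 @ A0 @ A0) / (k**2 - (A0**2).sum(axis=1))
    tom = (A0 @ A0 + A0) / (np.minimum.outer(k, k) + 1.0 - A0)
    return k, density, centralization, heterogeneity, cluster_coef, tom

# Hypothetical, exactly factorizable network: a_ij = CF_i * CF_j for i != j.
rng = np.random.default_rng(0)
n = 400
cf = rng.uniform(0.3, 0.9, size=n)
A = np.outer(cf, cf)
np.fill_diagonal(A, 1.0)

k, dens, cent, het, cc, tom = network_concepts(A)
off = ~np.eye(n, dtype=bool)
hub = int(np.argmax(k))                   # index [1]: the most highly connected hub
others = np.arange(n) != hub

cc_lhs, cc_rhs = cc.mean(), (1 + het**2) ** 2 * dens
tom_pair_rhs = np.maximum.outer(k, k) / n * (1 + het**2)
tom_hub, tom_hub_rhs = tom[hub, others], (cent + dens) * (1 + het**2)

print(f"mean ClusterCoef: {cc_lhs:.4f} vs {cc_rhs:.4f}")
print(f"max relative error, TopOverlap_ij: {np.max(np.abs(tom[off] / tom_pair_rhs[off] - 1)):.3f}")
print(f"max relative error, TopOverlap_[1]j: {np.max(np.abs(tom_hub / tom_hub_rhs - 1)):.3f}")
```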

Drosophila PPI module networks: the relationship between fundamental network concepts.

What if we focus on gene co-expression networks?

Weighted Gene Co-expression Network

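$$A = [a_{ij}] = \left[\,|\text{cor}(x_i, x_j)|^{\beta}\,\right]$$

where $x_i$ is the expression profile for gene $i$, mathematically a vector of expression values across multiple samples, and $\beta$ is the exponent of the power adjacency function.

Note: an unweighted network is

$$A = [a_{ij}] = \left[\,I\left(|\text{cor}(x_i, x_j)| > \tau\right)\right]$$

where $I(\cdot)$ is an indicator function and $\tau$ is a hard threshold.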
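Both adjacency definitions can be computed from an expression matrix with one row per gene and one column per sample. A NumPy sketch; `np.corrcoef` treats rows as variables by default, and the power β = 6 and threshold τ = 0.8 below are illustrative values, not ones the slides prescribe. The data are simulated.

```python
import numpy as np

def weighted_adjacency(X, beta=6.0):
    """a_ij = |cor(x_i, x_j)|**beta; X has one row per gene and one column per sample."""
    return np.abs(np.corrcoef(X)) ** beta

def unweighted_adjacency(X, tau=0.8):
    """a_ij = I(|cor(x_i, x_j)| > tau): hard thresholding gives a 0/1 matrix."""
    return (np.abs(np.corrcoef(X)) > tau).astype(float)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 30))                  # hypothetical data: 5 genes x 30 samples
X[1] = X[0] + 0.3 * rng.normal(size=30)       # make genes 0 and 1 co-expressed
A_w = weighted_adjacency(X)
A_u = unweighted_adjacency(X)
```

The soft (power) version keeps the ranking of all correlations, while the hard threshold discards everything below τ.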

[Figure: heat map of the brown module (rows = genes, columns = microarrays) and a bar plot of the brown module eigengene across samples.]

Module eigengene = measure of over-expression = average redness

Recall that the module eigengene is defined by the singular value decomposition of X
• X = gene expression data of a module
• Aside: gene expressions (rows) have been standardized across samples (columns)

$$X = U D V^{T}$$

$$U = (u_1, u_2, \ldots, u_m), \qquad V = (v_1, v_2, \ldots, v_m), \qquad D = \text{diag}(|d_1|, |d_2|, \ldots, |d_m|)$$

Message: $v_1$ is the module eigengene E.
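A NumPy sketch of the eigengene computation on a simulated module. Rows (genes) are standardized, `np.linalg.svd` returns V transposed, so v_1 is the first row of `Vt`. Flipping the sign of E so that it agrees with the average standardized gene is our convention, since singular vectors are only defined up to sign; `module_eigengene` is a hypothetical name.

```python
import numpy as np

def module_eigengene(X):
    """First right singular vector v_1 of the row-standardized module expression matrix."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    U, d, Vt = np.linalg.svd(Z, full_matrices=False)   # Z = U diag(d) V^T, d sorted descending
    E = Vt[0]                                          # v_1: one value per sample
    # The sign of a singular vector is arbitrary; flip E to agree with the average gene.
    return E if E @ Z.mean(axis=0) >= 0 else -E

rng = np.random.default_rng(1)
m = 50                                   # hypothetical number of samples
z = rng.normal(size=m)                   # hidden module-wide signal
X = z + 0.5 * rng.normal(size=(10, m))   # 10 genes that all follow z, plus noise
E = module_eigengene(X)
print(np.corrcoef(E, z)[0, 1])           # close to 1 for such a tight module
```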

Question: When are co-expression modules factorizable?

Question: Characterize gene expression data X that lead to an approximately factorizable correlation matrix.

Solution: Define the eigengene-based factorizability as follows

$$\text{EF}(X) = \frac{|d_1|^4}{\sum_j |d_j|^4} = 1 - \frac{\|\text{cor}(X) - C C^{T}\|_F^2}{\|\text{cor}(X)\|_F^2}$$

where $C_i = \text{cor}(x_i, E)$.

Thus, cor(X) is approximately factorizable if EF(X) ≈ 1.
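Both expressions for EF(X) can be computed side by side. In this NumPy sketch each gene is centered and scaled to unit length, so that cor(X) = Z Zᵀ exactly, and C_i = cor(x_i, E) then equals d_1 times the i-th entry of u_1; that is why the two values agree up to rounding. The simulated "module" and "noise" matrices and the function name are hypothetical.

```python
import numpy as np

def eigengene_factorizability(X):
    """EF(X) computed two ways: from singular values and from the Frobenius-norm residual."""
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=1, keepdims=True)
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)     # now cor(X) = Z @ Z.T
    _, d, Vt = np.linalg.svd(Z, full_matrices=False)
    ef_svd = d[0] ** 4 / np.sum(d ** 4)                  # |d_1|^4 / sum_j |d_j|^4
    E = Vt[0]                                            # module eigengene
    R = np.corrcoef(X)
    C = np.array([np.corrcoef(x, E)[0, 1] for x in X])   # C_i = cor(x_i, E)
    resid = np.linalg.norm(R - np.outer(C, C), "fro") ** 2
    ef_frob = 1.0 - resid / np.linalg.norm(R, "fro") ** 2
    return ef_svd, ef_frob

rng = np.random.default_rng(2)
m = 50
z = rng.normal(size=m)
module = z + 0.5 * rng.normal(size=(10, m))   # tight hypothetical module
noise = rng.normal(size=(10, m))              # genes with no shared signal
ef_mod = eigengene_factorizability(module)
ef_noise = eigengene_factorizability(noise)
print(ef_mod)     # both values close to 1
print(ef_noise)   # both values much smaller
```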

Note that a factorizable correlation matrix implies a factorizable weighted co-expression network:

$$a_{ij} = |\text{cor}(x_i, x_j)|^{\beta} \approx |\text{cor}(x_i, E)|^{\beta}\, |\text{cor}(x_j, E)|^{\beta} = a_{e,i}\, a_{e,j}$$

We refer to the following as the weighted eigengene conformity:

$$a_{e,i} = |\text{cor}(x_i, E)|^{\beta}$$
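A numerical sketch of the factorization a_ij ≈ a_{e,i}·a_{e,j} on a simulated module in which gene i follows a shared signal with a hypothetical loading between 0.5 and 0.95. It uses β = 1 to keep the example simple and scores the fit with the off-diagonal factorizability F_A(v) evaluated at v = a_e.

```python
import numpy as np

def eigengene(X):
    """Module eigengene E = v_1 of the row-centered, row-normalized data (sign irrelevant here)."""
    Z = X - X.mean(axis=1, keepdims=True)
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return np.linalg.svd(Z, full_matrices=False)[2][0]

rng = np.random.default_rng(3)
m = 200
lam = np.linspace(0.5, 0.95, 20)            # hypothetical loadings on a shared signal z
z = rng.normal(size=m)
X = lam[:, None] * z + np.sqrt(1 - lam**2)[:, None] * rng.normal(size=(lam.size, m))

beta = 1.0
A = np.abs(np.corrcoef(X)) ** beta           # a_ij = |cor(x_i, x_j)|^beta
E = eigengene(X)
a_e = np.abs([np.corrcoef(x, E)[0, 1] for x in X]) ** beta   # a_{e,i} = |cor(x_i, E)|^beta

off = ~np.eye(len(X), dtype=bool)            # the factorization is claimed for i != j only
resid = (A - np.outer(a_e, a_e))[off]
F = 1 - np.sum(resid**2) / np.sum(A[off] ** 2)
print(f"F_A(a_e) = {F:.3f}")                 # close to 1 when a_ij ~ a_{e,i} a_{e,j}
```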

If $\text{EF}(X) \approx 1$

Theoretical relationships in co-expression modules with high eigengene factorizability

Result: "Group conform behavior leads to a lot of friends."

More precisely, the scaled intramodular connectivity $K_i$ approximates the eigengene conformity, i.e. $K_i \approx a_{e,i} = |\text{cor}(x_i, E)|^{\beta}$.

Message: the smaller the angle between $x_i$ and E, the more connected is the i-th gene.
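A simulation sketch of this result with β = 1 on a hypothetical module (loadings between 0.5 and 0.95). Because the scaled connectivity is normalized by its maximum, it is compared with a_e / max(a_e); the angle between x_i and E is taken as arccos |cor(x_i, E)|, which is the angle between the centered profiles up to sign.

```python
import numpy as np

def eigengene(X):
    """Module eigengene E = v_1 of the row-centered, row-normalized expression matrix."""
    Z = X - X.mean(axis=1, keepdims=True)
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return np.linalg.svd(Z, full_matrices=False)[2][0]

# Hypothetical module: gene i follows a shared signal z with loading lam[i].
rng = np.random.default_rng(4)
m = 200
lam = np.linspace(0.5, 0.95, 20)
z = rng.normal(size=m)
X = lam[:, None] * z + np.sqrt(1 - lam**2)[:, None] * rng.normal(size=(lam.size, m))

beta = 1.0
A = np.abs(np.corrcoef(X)) ** beta
k = A.sum(axis=1) - np.diag(A)                 # intramodular connectivity
K = k / k.max()                                # scaled intramodular connectivity
E = eigengene(X)
c = np.array([np.corrcoef(x, E)[0, 1] for x in X])
a_e = np.abs(c) ** beta                        # eigengene conformity
angles = np.degrees(np.arccos(np.abs(c)))      # angle between x_i and E, in degrees
print(np.corrcoef(K, a_e)[0, 1], np.corrcoef(K, angles)[0, 1])   # near +1 and near -1
```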

Result about hub gene significance:

Given a trait-based gene significance measure $GS_i = |\text{cor}(x_i, T)|^{\beta}$, the hub gene significance approximates the eigengene significance, $\text{HGS} \approx |\text{cor}(E, T)|^{\beta}$.

Message: the smaller the angle between E and T, the higher is the trait significance of intramodular hubs and the higher is the module significance (average GS).
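A simulation sketch of this result with β = 1, using two hypothetical traits: T2 is built from the module signal plus noise (small angle to E), and T1 is random noise projected to be exactly orthogonal to E. The projection works because E, as computed here, is centered and has unit length. We expect a high HGS close to |cor(E, T2)| for T2 and a low HGS for T1.

```python
import numpy as np

def eigengene(X):
    """Module eigengene E = v_1 of the row-centered, row-normalized data; E is centered with unit norm."""
    Z = X - X.mean(axis=1, keepdims=True)
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return np.linalg.svd(Z, full_matrices=False)[2][0]

def hub_gene_significance(gs, K):
    """Slope of GS on K through the origin."""
    return np.sum(gs * K) / np.sum(K**2)

# Hypothetical module: gene i follows a shared signal z with loading lam[i].
rng = np.random.default_rng(5)
m = 200
lam = np.linspace(0.5, 0.95, 20)
z = rng.normal(size=m)
X = lam[:, None] * z + np.sqrt(1 - lam**2)[:, None] * rng.normal(size=(lam.size, m))

beta = 1.0
A = np.abs(np.corrcoef(X)) ** beta
k = A.sum(axis=1) - np.diag(A)
K = k / k.max()
E = eigengene(X)

t2 = z + 0.5 * rng.normal(size=m)           # trait T2: small angle to the module signal
t = rng.normal(size=m)
t = t - t.mean()
t1 = t - (t @ E) * E                        # trait T1: projected to be orthogonal to E

def gene_significance(T):
    return np.abs([np.corrcoef(x, T)[0, 1] for x in X]) ** beta

hgs1 = hub_gene_significance(gene_significance(t1), K)
hgs2 = hub_gene_significance(gene_significance(t2), K)
es1 = abs(np.corrcoef(E, t1)[0, 1]) ** beta   # eigengene significance |cor(E, T1)|^beta
es2 = abs(np.corrcoef(E, t2)[0, 1]) ** beta
print(f"T1: HGS = {hgs1:.3f}, |cor(E,T1)| = {es1:.3f}")
print(f"T2: HGS = {hgs2:.3f}, |cor(E,T2)| = {es2:.3f}")
```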

What can network theorists learn from the geometric interpretation?

Some examples…

Problem

• Show that genes that lie intermediate between two distinct co-expression modules cannot be hub genes in these modules.

Geometric Solution

[Figure: gene 1 and gene 2 drawn as vectors relative to eigengene E1 and eigengene E2; labels include "hub in module 1", "intermediate" and k(2).]
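The geometric argument can be written out as a short derivation. It rests on two assumptions that are ours rather than the slide's: every expression profile and eigengene is centered and scaled to unit length, so that cor(x, E) is the inner product of x and E (the cosine of the angle between them), and the two module eigengenes are uncorrelated, so E1 and E2 are orthonormal. "Intermediate" is taken to mean that the gene lies halfway between the two eigengenes.

```latex
\begin{align*}
\mathrm{cor}(x,E_1)^2 + \mathrm{cor}(x,E_2)^2 &\le \|x\|^2 = 1
  && \text{(Bessel's inequality, since } E_1 \perp E_2\text{)}\\
|\mathrm{cor}(x,E_1)| = |\mathrm{cor}(x,E_2)| &\le \tfrac{1}{\sqrt{2}} \approx 0.71
  && \text{(gene halfway between } E_1 \text{ and } E_2\text{)}\\
K^{(q)} \approx a_{e,q} = |\mathrm{cor}(x,E_q)|^{\beta} &\le 2^{-\beta/2}, \quad q = 1, 2
  && \text{(e.g. } 0.125 \text{ for } \beta = 6\text{)}
\end{align*}
```

Since an intramodular hub has scaled connectivity K ≈ 1, such an intermediate gene cannot be a hub in either module.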

Problem

• Setting: a co-expression network and a trait-based gene significance measure GS(i) = |cor(x(i), T)|

• Describe a situation when the sample trait (T1) leads to a trait-based gene significance measure with low hub gene significance

• Describe a situation when the sample trait (T2) leads to a trait-based gene significance measure with high hub gene significance

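Another way of stating the problem: find T2 and T1 such that, plotted against the intramodular connectivity k, GS2(x) = |cor(x,T2)| is high for hub genes while GS1(x) = |cor(x,T1)| is not.

[Figure: gene significance versus intramodular connectivity k for GS1(x) = |cor(x,T1)| and GS2(x) = |cor(x,T2)|.]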

Solution

[Figure: gene 1, gene 2, sample trait T1 and sample trait T2 drawn as vectors relative to eigengene E; labels include k(1), k(2), GS1(1) and cor(E,T2).]

What can a microarray data analyst learn from the geometric interpretation?

Some insights

• Intramodular hub gene = a gene that is highly correlated with the module eigengene, i.e. it is a good representative of a module

• Gene screening strategies that use intramodular connectivity amount to pathway-based gene screening methods

• Intramodular connectivity is a highly reproducible “fuzzy” measure of module membership.

• Network concepts are useful for describing pairwise interaction patterns.

The module eigengene is highly correlated with the most highly connected hub gene.

Dictionary for translating between general network terms and their eigengene-based counterparts.

If, in addition, $\max_j(a_{e,j}) \approx 1$

Summary

• The unification of co-expression network methods with traditional data mining methods can inform the application and development of systems biologic methods.

• We study network concepts in special types of networks, which we refer to as approximately factorizable networks.

• We find that modules often are approximately factorizable.

• We characterize co-expression modules that are approximately factorizable.

• We provide a dictionary for relating fundamental network concepts to eigengene-based concepts.

• We characterize co-expression networks where hub genes are significant with respect to a microarray sample trait.

• We show that intramodular connectivity can be interpreted as a fuzzy measure of module membership.

Summary Cont’d

• We provide a geometric interpretation of important network concepts (e.g. hub gene significance, module significance)

• These theoretical results have important applications for describing pathways of interacting genes

• They also inform novel module detection procedures and gene selection procedures.

Acknowledgement

• Biostatistics/Bioinformatics: Tova Fuller, Peter Langfelder, Ai Li, Wen Lin, Mike Mason, Angela Presson, Lin Wang, Andy Yip, Wei Zhao

• Brain Cancer/Yeast: Paul Mischel, Stan Nelson, Marc Carlson

• Comparison Human-Chimp: Dan Geschwind, Mike Oldham, Giovanni

• Mouse Data: Jake Lusis, Tom Drake, Anatole Ghazalpour, Atila Van Nas

APPENDIX (backup slides)

Steps for constructing a co-expression network

A) Microarray gene expression data

B) Measure concordance of gene expression with a Pearson correlation

C) The Pearson correlation matrix is either dichotomized to arrive at an adjacency matrix (unweighted network) or transformed continuously with the power adjacency function (weighted network)

Definition of module (cluster)

• Module = cluster of highly connected nodes
  – Any clustering method that results in such sets is suitable

• We define modules as branches of a hierarchical clustering tree using the topological overlap matrix
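A sketch of module detection as branches of a hierarchical clustering tree built on the topological overlap dissimilarity, using NumPy and SciPy on two simulated modules. The slides do not say which linkage is used or how branches are cut; average linkage and a cut into a fixed number of clusters with SciPy's `fcluster(..., criterion="maxclust")` are simple stand-ins, and β = 6 is an illustrative power.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def tom_dissimilarity(A):
    """DistTOM = 1 - TOM for an adjacency matrix A (diagonal ignored)."""
    A0 = np.asarray(A, dtype=float).copy()
    np.fill_diagonal(A0, 0.0)
    k = A0.sum(axis=1)
    tom = (A0 @ A0 + A0) / (np.minimum.outer(k, k) + 1.0 - A0)
    dist = 1.0 - tom
    dist = (dist + dist.T) / 2.0          # remove round-off asymmetry before clustering
    np.fill_diagonal(dist, 0.0)
    return dist

def detect_modules(X, beta=6.0, n_modules=2):
    """Average-linkage tree on DistTOM, cut into a fixed number of branches (an assumption)."""
    A = np.abs(np.corrcoef(X)) ** beta
    condensed = squareform(tom_dissimilarity(A), checks=False)
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_modules, criterion="maxclust")

rng = np.random.default_rng(6)
m = 100
z1, z2 = rng.normal(size=m), rng.normal(size=m)      # two independent module signals
module1 = 0.9 * z1 + np.sqrt(1 - 0.9**2) * rng.normal(size=(8, m))
module2 = 0.9 * z2 + np.sqrt(1 - 0.9**2) * rng.normal(size=(8, m))
labels = detect_modules(np.vstack([module1, module2]))
print(labels)     # the first 8 genes share one label, the last 8 the other
```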

Relationship between Module significance and hub gene significance

Application: Brain Cancer Data
