Meta Analysis and Differential Network Analysis with Applications in Mouse Expression Data

Steve Horvath

Outline
• Standard differential expression analysis
• Statistical power studies
• Important network concepts
• Single versus differential network analysis
• Differential network construction

Standard (gene based) differential expression analysis

• Many software packages and R functions calculate T tests, p-values, false discovery rates, fold changes, etc.

• WGCNA R functions:
– For a binary trait (e.g. case control status), use standardScreeningBinaryTrait
– For a numeric trait (e.g. body weight), use standardScreeningNumericTrait
– For a right censored time variable, use standardScreeningCensoredTime

metaAnalysis R function in the WGCNA R package

helpfile metaAnalysis

Stouffer Z statistics from metaAnalysis
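As a rough illustration of what a Stouffer Z statistic is, the sketch below combines per-study Z statistics for one gene using the textbook formula Z = sum(w_s * Z_s) / sqrt(sum(w_s^2)). It is written in plain Python and is not the WGCNA implementation; the function names `stouffer_z` and `two_sided_p` are made up for this example.

```python
import math

def stouffer_z(z_scores, weights=None):
    """Combine per-study Z statistics for one gene into one meta-analysis Z."""
    if weights is None:
        # Equal weights reduce the formula to sum(Z_s) / sqrt(number of studies).
        weights = [1.0] * len(z_scores)
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator

def two_sided_p(z):
    """Two-sided p-value of a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Four studies that each give a moderate Z of 2 combine to a Z of 4.
meta_z = stouffer_z([2.0, 2.0, 2.0, 2.0])
meta_p = two_sided_p(meta_z)
```

The example shows why meta analysis helps: four studies that are individually borderline yield a clearly significant combined statistic.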

Ranking based metaAnalysis statistics

Combine several gene rankings using the rankPvalue function

Statistical Power Studies

Statistical power calculations

According to Google Scholar, it had been cited 11708 times (July 2013).

Network concept=network statistics

Network=Adjacency Matrix

• A network can be represented by an adjacency matrix, A=[aij], that encodes whether/how a pair of nodes is connected.
– A is a symmetric matrix with entries in [0,1]
– For unweighted networks, entries are 1 or 0 depending on whether or not 2 nodes are adjacent (connected)
– For weighted networks, the adjacency matrix reports the connection strength between node pairs
– Our convention: diagonal elements of A are all 1.

Motivational example I: Pair-wise relationships between genes across different mouse tissues and genders

Challenge: Develop simple descriptive measures that describe the patterns.
Solution: The following network concepts are useful: density, centralization, clustering coefficient, heterogeneity

Motivational example (continued)

Challenge: Find a simple measure for describing the relationship between gene significance and connectivity

Solution: network concept called hub gene significance

Background
• Network concepts are also known as network statistics or network indices
– Examples: connectivity (degree), clustering coefficient, topological overlap, etc.
• Network concepts underlie network language and systems biological modeling.

• Dozens of potentially useful network concepts are known from graph theory.

Review of some fundamental network concepts which are defined for all networks (not just co-expression networks)

Horvath (2011) Weighted Network Analysis. Springer Book. Hardcover ISBN: 978-1-4419-8818-8
Dong, Horvath (2007) Understanding network concepts in modules. BMC Syst Biol
Horvath, Dong (2008) Geometric Interpretation of Gene Co-expression Network Analysis. PLoS Comp Biol

Connectivity
• Node connectivity = row sum of the adjacency matrix
– For unweighted networks = number of direct neighbors
– For weighted networks = sum of connection strengths to other nodes

$$\text{Connectivity}_i = k_i = \sum_{j \neq i} a_{ij}$$

$$\text{ScaledConnectivity}_i = K_i = \frac{k_i}{\max(k)}$$
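A minimal sketch of the connectivity and scaled connectivity definitions, using a nested list as the adjacency matrix. The helper names and the 3-node toy network are invented for illustration; the diagonal is skipped so that the convention a_ii = 1 does not inflate k.

```python
def connectivity(adj):
    """k_i = sum of a_ij over all j != i (the diagonal is skipped)."""
    n = len(adj)
    return [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]

def scaled_connectivity(adj):
    """K_i = k_i / max(k), so the most connected node has K = 1."""
    k = connectivity(adj)
    k_max = max(k)
    return [k_i / k_max for k_i in k]

# Toy weighted network with 3 nodes; diagonal entries are 1 by convention.
adj = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
k = connectivity(adj)         # approximately [1.0, 1.2, 0.6]
K = scaled_connectivity(adj)  # node 1 is the hub, so K[1] equals 1
```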

Density
• Density = mean adjacency
• Highly related to mean connectivity

$$\text{Density} = \frac{\sum_i \sum_{j \neq i} a_{ij}}{n(n-1)} = \frac{\text{mean}(k)}{n-1}$$

where n is the number of network nodes.

Centralization

Centralization = 1 because it has a star topology

Centralization = 0 because all nodes have the same connectivity of 2

$$\text{Centralization} = \frac{n}{n-2}\left(\frac{\max(k)}{n-1} - \text{Density}\right) \approx \frac{\max(k)}{n} - \text{Density}$$

= 1 if the network has a star topology
= 0 if all nodes have the same connectivity

Heterogeneity
• Heterogeneity: coefficient of variation of the connectivity
• Highly heterogeneous networks exhibit hubs

$$\text{Heterogeneity} = \frac{\sqrt{\text{variance}(k)}}{\text{mean}(k)}$$
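The density, centralization, and heterogeneity definitions can be checked on the two textbook cases mentioned on the centralization slide: a star network (centralization 1) and a ring in which every node has connectivity 2 (centralization 0). This is a sketch in plain Python; the helpers `star` and `ring` are invented for the example, and the use of the sample variance (divisor n - 1) in the heterogeneity is an assumption.

```python
import math

def connectivity(adj):
    n = len(adj)
    return [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]

def density(adj):
    """Mean off-diagonal adjacency, equal to mean(k) / (n - 1)."""
    n = len(adj)
    return sum(connectivity(adj)) / (n * (n - 1))

def centralization(adj):
    """n / (n - 2) * (max(k) / (n - 1) - Density); needs n > 2."""
    n = len(adj)
    k = connectivity(adj)
    return n / (n - 2) * (max(k) / (n - 1) - density(adj))

def heterogeneity(adj):
    """Coefficient of variation of k (sample variance is an assumption)."""
    k = connectivity(adj)
    mean_k = sum(k) / len(k)
    var_k = sum((x - mean_k) ** 2 for x in k) / (len(k) - 1)
    return math.sqrt(var_k) / mean_k

def star(n):
    """Unweighted star: node 0 is connected to every other node."""
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for leaf in range(1, n):
        adj[0][leaf] = adj[leaf][0] = 1.0
    return adj

def ring(n):
    """Unweighted ring: every node has exactly two neighbors."""
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        j = (i + 1) % n
        adj[i][j] = adj[j][i] = 1.0
    return adj

star_net, ring_net = star(5), ring(5)
```

On these inputs the star has density 0.4 and a positive heterogeneity (one hub), while the ring has density 0.5 and heterogeneity 0.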

Clustering Coefficient
Measures the cliquishness of a particular node
« A node is cliquish if its neighbors know each other »

Clustering Coef of the black node = 0

Clustering Coef = 1

$$\text{ClusterCoef}_i = \frac{\sum_{l \neq i} \sum_{m \neq i,l} a_{il}\, a_{lm}\, a_{mi}}{\left(\sum_{l \neq i} a_{il}\right)^2 - \sum_{l \neq i} a_{il}^2}$$

This generalizes directly to weighted networks (Zhang and Horvath 2005)
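A direct, unoptimized sketch of the clustering coefficient formula for one node. The function name is invented, and returning 0 when the denominator vanishes (a node with fewer than two neighbors) is an assumed convention, not something stated on the slide.

```python
def cluster_coef(adj, i):
    """Clustering coefficient of node i for a weighted or unweighted network."""
    n = len(adj)
    others = [l for l in range(n) if l != i]
    # Sum over ordered pairs (l, m) of distinct neighbors candidates of i.
    numerator = sum(
        adj[i][l] * adj[l][m] * adj[m][i]
        for l in others
        for m in others
        if m != l
    )
    s1 = sum(adj[i][l] for l in others)
    s2 = sum(adj[i][l] ** 2 for l in others)
    denominator = s1 ** 2 - s2
    # Assumed convention: undefined cases are reported as 0.
    return numerator / denominator if denominator > 0 else 0.0

# Triangle: all neighbors of each node know each other.
triangle = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
# Star with center 0: the neighbors of the center are not connected.
star4 = [[1, 1, 1, 1], [1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]
```

For the triangle every node has coefficient 1, and for the center of the star it is 0, matching the two pictured cases.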

The topological overlap dissimilarity is used as input of hierarchical clustering

• Generalized in Zhang and Horvath (2005) to the case of weighted networks
• Generalized in Li and Horvath (2006) to multiple nodes
• Generalized in Yip and Horvath (2007) to higher order interactions

$$\text{TOM}_{ij} = \frac{\sum_{u \neq i,j} a_{iu}\, a_{uj} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}$$

$$\text{DistTOM}_{ij} = 1 - \text{TOM}_{ij}$$
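The topological overlap formula can be sketched as a plain double loop. This is for illustration only and is far slower than a matrix implementation; the function names are invented, and setting the diagonal of TOM to 1 is an assumption in line with the adjacency convention.

```python
def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]  # assumed convention: TOM_ii = 1
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(
                adj[i][u] * adj[u][j] for u in range(n) if u != i and u != j
            )
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom

def tom_dissimilarity(adj):
    """DistTOM_ij = 1 - TOM_ij, the input of hierarchical clustering."""
    return [[1.0 - t for t in row] for row in topological_overlap(adj)]

# Path 0 - 1 - 2: nodes 0 and 2 are not adjacent but share neighbor 1.
path = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
tom = topological_overlap(path)
```

In the path example nodes 0 and 2 have adjacency 0 but topological overlap 0.5, which illustrates how TOM picks up shared neighbors.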

Network Significance
• Defined as average gene significance
• We often refer to the network significance of a module network as module significance.

$$\text{NetworkSignif} = \frac{\sum_i GS_i}{n}$$

Maximum adjacency ratio

Network concepts for comparing two networks

Differential network concepts
• Node specific statistics:
– Diff.ClusterCoef(i) = CC1(i) – CC2(i)
– Diff.Mar(i) = MAR1(i) – MAR2(i)
• Global statistics
– Diff.MeanClusterCoef = Mean.CC1 – Mean.CC2
– Diff.MeanConnectivity = Mean.k1 – Mean.k2
– Diff.MeanMAR = Mean.MAR1 – Mean.MAR2
– Diff.MeanKME = Mean.KME
– Diff.Density = Density1 – Density2
– can be calculated via the modulePreservation function

Measuring the similarity between two networks

R code for computing network concepts

R code, help file

Data analysis strategies
Single network analysis versus differential network analysis

Goals of Single Network Analysis

• Identifying genetic pathways (modules)
• Finding key drivers (hub genes)
• Modeling the relationships between:
– Transcriptome
– Clinical traits / Phenotypes
– Genetic marker data

Single Network WGCNA
1 gene co-expression network
Multiple data sets may be used for validation (Validation set 1, Validation set 2)

Goals of Differential Network Analysis

• Uncover differences in modules and connectivity in different data sets
– Ex: Human versus chimpanzee brains (Oldham et al. 2006)
• Differing topology in multiple networks reveals genes/pathways that are wired differently in different sample populations

Fuller TF, Ghazalpour A, Aten JE, Drake TA, Lusis AJ, …(2007) "Weighted Gene Co-expression Network Analysis Strategies Applied to Mouse Weight", Mamm Genome. 18(6):463-472

Oldham MC, …Geschwind DH (2006) Conservation and evolution of gene coexpression networks in human and chimpanzee brains. Proc Natl Acad Sci U S A 103, 17973-17978.

Differential Network WGCNA
2+ gene co-expression networks (NETWORK 1, NETWORK 2)
Identify genes and pathways that are:
1. Differentially expressed
2. Differentially wired

BxH Mouse Data from AJ Lusis
• Single network analysis of female BxH mice revealed a weight-related module (Ghazalpour et al. 2006)
• Samples: Constructed networks from mice from extrema of weight spectrum:
– Network 1: 30 leanest mice
– Network 2: 30 heaviest mice
• Transcripts: Used 3421 most connected and varying transcripts

Ghazalpour A, Doss S, Zhang B, Wang S, Plaisier C, Castellanos R, Brozell A, Schadt EE, Drake TA, Lusis AJ, Horvath S (2006) Integrating genetic and network analysis to characterize genes related to mouse weight. PLoS genetics 2, e130

[Figure: 135 FEMALES; NETWORK 1, NETWORK 2]

Methods

Compute Comparison Metrics
• Difference in expression: t-test statistic
• Compare difference in connectivity: DiffK

Identify significantly different genes/pathways
Permutation test

Functional analysis of significant genes/pathways
DAVID database
Primary literature

Computing Comparison Metrics

DIFFERENTIAL EXPRESSION
t-test statistic computed for each gene, t(i)

DIFFERENTIAL CONNECTIVITY
K1(i) = k1(i) / max(k1)
K2(i) = k2(i) / max(k2)

DiffK(i): difference in normalized connectivities for each gene:

DiffK(i) = K1(i) – K2(i)
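A small sketch of the DiffK computation, assuming the two networks are given as adjacency matrices over the same genes in the same order. The function names and the two toy networks are invented for illustration.

```python
def connectivity(adj):
    n = len(adj)
    return [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]

def diff_k(adj1, adj2):
    """DiffK(i) = k1(i) / max(k1) - k2(i) / max(k2) for every gene i."""
    k1, k2 = connectivity(adj1), connectivity(adj2)
    K1 = [k / max(k1) for k in k1]
    K2 = [k / max(k2) for k in k2]
    return [a - b for a, b in zip(K1, K2)]

# Network 1: gene 0 is a hub. Network 2: all three genes are fully connected.
net1 = [[1, 1, 1], [1, 1, 0], [1, 0, 1]]
net2 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
dk = diff_k(net1, net2)  # [0.0, -0.5, -0.5]
```

Because each network is scaled by its own maximum connectivity, DiffK always lies between -1 and 1, which makes a fixed cutoff meaningful across data sets.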

Sector Plot
We visualize the comparison metrics via a sector plot:
• x-axis: DiffK
• y-axis: t statistics

We establish sector boundaries to identify regions of differentially expressed and/or differentially connected genes
• |t| = 1.96 corresponding to p = 0.05
• |DiffK| = 0.4
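The sector boundaries amount to a two-way classification of each gene. The sketch below uses the cutoffs from the slide; the label strings and the function name are hypothetical, and it deliberately does not reproduce the sector numbering of the plot, which is not recoverable from the text.

```python
def classify_gene(t, diffk, t_cut=1.96, diffk_cut=0.4):
    """Return (expression label, connectivity label) for one gene."""
    if t > t_cut:
        expr = "t_high"
    elif t < -t_cut:
        expr = "t_low"
    else:
        expr = "t_ns"  # not differentially expressed at p = 0.05
    if diffk > diffk_cut:
        conn = "diffk_high"
    elif diffk < -diffk_cut:
        conn = "diffk_low"
    else:
        conn = "diffk_ns"  # not differentially connected
    return expr, conn

# A gene with a large t statistic and a large DiffK falls in a corner sector.
label = classify_gene(3.0, 0.5)
```

Counting genes per label pair gives the observed sector sizes that the permutation test then evaluates.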

Permutation test: Identifying significant sectors

no.perms: number of permutations

For each sector j, we compare the number of genes in unpermuted and permuted sectors (nobs and nperm)

$$p_j = \frac{\#\,\text{times}\left(n^{j}_{obs} \le n^{j}_{perm}\right) + 1}{\text{no.perms} + 1}$$

[Figure: NETWORK 1, NETWORK 2, PERMUTE]
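The sector p-value can be sketched as follows, assuming the count records how often a permuted sector holds at least as many genes as the observed one (the comparison symbol is not legible in the slide text, so this direction is an assumption). The function name and the example counts are invented.

```python
def sector_p_value(n_obs, n_perm_counts):
    """p_j = (# permutations with n_perm >= n_obs + 1) / (no.perms + 1)."""
    no_perms = len(n_perm_counts)
    exceed = sum(1 for n_perm in n_perm_counts if n_perm >= n_obs)
    # Adding 1 to numerator and denominator keeps p strictly above 0.
    return (exceed + 1) / (no_perms + 1)

# Hypothetical sector with 63 observed genes and four permuted counts.
p_small_example = sector_p_value(63, [10, 12, 70, 5])  # (1 + 1) / (4 + 1)
```

With this formula the smallest attainable p-value is 1 / (no.perms + 1), so the number of permutations limits the resolution of the test.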

Sector Plot Results

[Figure: sector plot annotated with the values 0.01, 0.001, 0.001, 0.001; remaining sectors marked X]

Functional Analysis

SECTOR 3
High t statistic, High DiffK
Yellow module in lean, Grey in obese (63 genes)

SECTOR 5
Low t statistic, High DiffK (28 genes)

Genes in these sectors have higher connectivity in lean than obese mice: pathways potentially dysregulated in obesity

Sector 3: Functional Analysis Results

DAVID Database
• “Extracellular”:
– extracellular region (38% of genes, p = 1.8 x 10^-4)
– extracellular space (34% of genes, p = 5.7 x 10^-4)
• signaling (36% of genes, p = 5.4 x 10^-4)
• cell adhesion (16% of genes, p = 7.7 x 10^-4)
• glycoproteins (34% of genes, p = 1.6 x 10^-3)
• 12 terms for epidermal growth factor or its related proteins
– EGF-like 1 (8.2% of genes, p = 8.7 x 10^-4)
– EGF-like 3 (6.6% of genes, p = 1.6 x 10^-3)
– EGF-like 2 (6.6% of genes, p = 6.0 x 10^-3)
– EGF (8.2% of genes, p = 0.013)
– EGF_CA (6.6% of genes, p = 0.015)

Sector 3: Functional Analysis Results

Primary Literature
• Results supported by a study on EGF levels in mice (Kurachi et al. 1993)
– EGF found to be increased in obese mice
– Obesity was reversed in these mice by:
• Administration of anti-EGF
• Sialoadenectomy

Kurachi H, Adachi H, Ohtsuka S, Morishige K, Amemiya K, Keno Y, Shimomura I, Tokunaga K, Miyake A, Matsuzawa Y, et al. (1993) Involvement of epidermal growth factor in inducing obesity in ovariectomized mice. The American journal of physiology 265, E323-331

Sector 5: Functional Analysis Results

DAVID Database
• Enzyme inhibitor activity (p = 2.9 x 10^-3)*
• Protease inhibitor activity (p = 6.0 x 10^-3)
• Endopeptidase inhibitor activity (p = 6.0 x 10^-3)
• Dephosphorylation (p = 0.012)
• Protein amino acid dephosphorylation (p = 0.012)
• Serine-type endopeptidase inhibitor activity (p = 0.042)

* p values shown are corrected using Bonferroni correction

Sector 5: Functional Analysis Results

Primary Literature

Itih1 and Itih3
• Enriched for all categories shown previously
• Located near a QTL for hyperinsulinemia (Almind and Kahn 2004)
• Itih3 identified as a gene candidate for obesity-related traits based on differential expression in murine hypothalamus (Bischof and Wevrick 2005)

Serpina3n and Serpina10
• Enriched for enzyme inhibitor, protease inhibitor, and endopeptidase inhibitor
• Serpina10, or Protein Z-dependent protease inhibitor (ZPI), has been found to be associated with venous thrombosis (Van de Water et al. 2004)

Almind K, Kahn CR (2004) Genetic determinants of energy expenditure and insulin resistance in diet-induced obesity in mice. Diabetes 53, 3274-3285

Bischof JM, Wevrick R (2005) Genome-wide analysis of gene transcription in the hypothalamus. Physiological genomics 22, 191-196

Van de Water N, Tan T, Ashton F, O'Grady A, Day T, Browett P, Ockelford P, Harper P (2004) Mutations within the protein Z-dependent protease inhibitor gene are associated with venous thromboembolic disease: a new form of thrombophilia. British Journal of Haematology 127, 190-194

Discussion
• If applicable, always report findings from a standard differential expression analysis as well.
• A host of network concepts exists for describing the network topology.
• Relatively few people use differential network analysis, which may reflect the fact that large sample sizes are needed.
– A large sample size is needed to compare two correlation coefficients
• To check whether a module is preserved in another network use the modulePreservation function.
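To make the sample size point concrete, here is a sketch of one standard way to compare two independent correlation coefficients, the Fisher z transformation test. The slides do not name this test, so treating it as the comparison is an assumption; the function name and the example correlations (0.6 versus 0.2) are invented.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Fisher z test for the difference of two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    # Standard error of the difference of two Fisher-transformed correlations.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p

# Same pair of correlations, tested with 30 and with 100 samples per network.
z30, p30 = compare_correlations(0.6, 30, 0.2, 30)
z100, p100 = compare_correlations(0.6, 100, 0.2, 100)
```

With 30 samples per network, a difference as large as 0.6 versus 0.2 is not significant at the 0.05 level under this test, while with 100 samples per network it is.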

Acknowledgements

HORVATH LAB
Dissertation work of Tova Fuller
Jun Dong
Peter Langfelder

An R tutorial may be found at: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/DifferentialNetworkAnalysis

Mouse data collaboration

LUSIS LAB
Jake Lusis
Anatole Ghazalpour
Thomas Drake
