
Threshold selection in gene co-expression networks using spectral graph theory techniques

Andy D Perkins*, Michael A Langston
BMC Bioinformatics


Outline

• Introduction
― How to construct a gene co-expression network?
― Steps and our criterion
• Method
• Result & Analysis

Introduction

• In gene co-expression networks, nodes represent gene transcripts.
• Two genes are connected by an edge if their expression values are highly correlated.
• The definition of “high” correlation is somewhat tricky.
― One can use statistical significance…
― But we propose a criterion for picking the threshold parameter: spectral graph theory.


Methods

• Microarray data sets
― Homo sapiens
― Saccharomyces cerevisiae: baker’s yeast


• Network construction
― Construct a complete graph.
― Compute the Pearson correlation coefficient between each pair of nodes.
― Apply a high-pass filter, with thresholds from 0.70 to 0.95.
• Network representation
― Laplacian of the graph G
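The construction steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function names `threshold_network` and `laplacian` are hypothetical, the Laplacian is assumed to be the unnormalized form L = D - A, and the filter is assumed to keep an edge when the correlation itself (not its absolute value) is at least the threshold t.

```python
import numpy as np

def threshold_network(expr, t):
    """Build a 0/1 adjacency matrix from a genes-by-samples expression array.

    Assumption: an edge is kept when the Pearson correlation is >= t.
    """
    corr = np.corrcoef(expr)        # Pearson correlation between every pair of rows
    adj = (corr >= t).astype(int)   # high-pass filter at threshold t
    np.fill_diagonal(adj, 0)        # no self-loops
    return adj

def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A (assumed form)."""
    deg = adj.sum(axis=1)           # node degrees
    return np.diag(deg) - adj

# Toy data: genes 0 and 1 are perfectly correlated, gene 2 is anti-correlated.
expr = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.0],
                 [4.0, 3.0, 2.0, 1.0]])
adj = threshold_network(expr, t=0.9)
lap = laplacian(adj)
```

In the toy data only genes 0 and 1 end up connected. Every row of the Laplacian sums to zero, which is why 0 is always one of its eigenvalues.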


• Eigenvalue and eigenvector computation
― Aim to solve the eigenvalue problem for the Laplacian of G.
― This yields the eigenvalues and their associated eigenvectors.
― The eigenvector associated with λ1 was extracted and sorted in increasing order.
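As a rough sketch of this step, the eigenvalue problem can be solved with `numpy.linalg.eigh`, which returns the eigenvalues of a symmetric matrix in ascending order. The helper name `fiedler_pair` is hypothetical. The sketch assumes a connected graph and an unnormalized Laplacian, so that the second-smallest eigenvalue is λ1.

```python
import numpy as np

def fiedler_pair(lap):
    """Return (lambda_1, v_1, order) for a symmetric Laplacian.

    Assumption: the graph is connected, so index 0 holds the single zero
    eigenvalue and index 1 holds the smallest nonzero eigenvalue.
    """
    vals, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    lam1 = float(vals[1])
    v1 = vecs[:, 1]                   # eigenvector associated with lambda_1
    order = np.argsort(v1)            # node indices sorted by eigenvector entry
    return lam1, v1, order

# Toy example: a path graph on four nodes, 0 - 1 - 2 - 3.
path_adj = np.array([[0, 1, 0, 0],
                     [1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [0, 0, 1, 0]])
path_lap = np.diag(path_adj.sum(axis=1)) - path_adj
lam1, v1, order = fiedler_pair(path_lap)
```

For the path graph the sorted eigenvector lists the nodes in the order they appear along the path (possibly reversed, since the sign of an eigenvector is arbitrary).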

Example

[Figure: worked example with λ1 = 0.7216, the eigenvector V1, and the result]


• Cluster detection
― Using a sliding window technique
― Significant difference: m + s/2, where m is the median and s is the standard deviation
― If a cluster has fewer than 10 nodes, discard it
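One way to read the cluster detection rule is sketched below. The slides do not give the window width or say which quantity m and s are computed over, so this sketch assumes a window of two adjacent entries of the sorted eigenvector, with m and s taken over the differences between those entries. The function name `detect_clusters` is hypothetical.

```python
import numpy as np

def detect_clusters(v1, min_size=10):
    """Split nodes into clusters at large jumps in the sorted eigenvector.

    Assumptions: the window covers two adjacent sorted entries, and the
    cutoff m + s/2 uses the median (m) and standard deviation (s) of the
    adjacent differences.
    """
    order = np.argsort(v1)                   # node indices, sorted by eigenvector entry
    gaps = np.diff(v1[order])                # differences between adjacent sorted entries
    cutoff = np.median(gaps) + np.std(gaps) / 2
    breaks = np.where(gaps > cutoff)[0] + 1  # positions where a new cluster starts
    groups = np.split(order, breaks)
    # Discard clusters with fewer than min_size nodes.
    return [g.tolist() for g in groups if len(g) >= min_size]

# Toy eigenvector: two tight groups of 12 nodes and a stray group of 3.
toy_v1 = np.concatenate([np.linspace(-0.21, -0.20, 12),
                         np.linspace(0.20, 0.21, 12),
                         np.array([0.900, 0.901, 0.902])])
clusters = detect_clusters(toy_v1)
```

On the toy vector the two groups of 12 are kept and the group of 3 is discarded by the size rule.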


• Paraclique extraction [17.]
― The base maximum clique size is 3.


• Functional comparisons
― Analyze some of the resulting paracliques in yeast and human, using the Saccharomyces Genome Database GO Slim Viewer and Ingenuity Pathways Analysis software, respectively.


Results and discussion

• Nearly-disconnected components [10.]

[Figure: result, labeled λ1]

The ability to find the nearly-disconnected pieces allows us to identify those nodes sharing a well-connected, or dense, cluster.


• Spectral properties & algebraic connectivity
― The multiplicity of the zero eigenvalue is equal to the number of connected components in the graph.
― When analyzing only the spectrum of the largest component, the smallest nonzero eigenvalue (λ1) is the algebraic connectivity.
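Both properties can be checked numerically on a small graph. The sketch below is illustrative only: the helper names are hypothetical, the Laplacian is assumed to be the unnormalized L = D - A, and the tolerance used to decide that an eigenvalue is zero is an arbitrary choice.

```python
import numpy as np

def count_components(lap, tol=1e-8):
    """Number of connected components = multiplicity of the zero eigenvalue."""
    vals = np.linalg.eigvalsh(lap)
    return int(np.sum(np.abs(vals) < tol))

def largest_component(adj):
    """Node indices of the largest connected component (depth-first search)."""
    seen, best = set(), []
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        comp, stack = [start], [start]
        while stack:
            u = stack.pop()
            for w in np.flatnonzero(adj[u]):
                w = int(w)
                if w not in seen:
                    seen.add(w)
                    comp.append(w)
                    stack.append(w)
        if len(comp) > len(best):
            best = comp
    return sorted(best)

def algebraic_connectivity(adj):
    """Smallest nonzero Laplacian eigenvalue of the largest component."""
    nodes = largest_component(adj)
    if len(nodes) < 2:
        return 0.0                        # a single node has no second eigenvalue
    sub = adj[np.ix_(nodes, nodes)]       # adjacency restricted to that component
    lap = np.diag(sub.sum(axis=1)) - sub
    return float(np.linalg.eigvalsh(lap)[1])

# Toy graph: a triangle (nodes 0-2) plus a separate edge (nodes 3-4).
adj2 = np.array([[0, 1, 1, 0, 0],
                 [1, 0, 1, 0, 0],
                 [1, 1, 0, 0, 0],
                 [0, 0, 0, 0, 1],
                 [0, 0, 0, 1, 0]])
lap2 = np.diag(adj2.sum(axis=1)) - adj2
n_comp = count_components(lap2)
alg_conn = algebraic_connectivity(adj2)
```

The toy graph has two components, so the zero eigenvalue appears twice. Its largest component is a triangle, whose Laplacian eigenvalues are 0, 3, and 3.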

[Figure: algebraic connectivity for yeast and human]


• Spectral clustering: potential threshold
― Resulting in a likely nearly-disconnected component.
― Potential thresholds: 0.78 (yeast) and 0.83 (human)

• Comparison with other results
― Traditional methods
― Previous studies (1) [19.]: based on a random matrix theory (RMT) approach to determining the correlation threshold; the result, 0.77, corresponds approximately to 0.78.
― Previous studies (2) [14.]: we select g = 3 to enumerate paracliques.


• Functional comparisons: SGD & IPA
― Yeast

t = 0.78: The three largest paracliques have sizes 21, 17, and 15. Nine of the 21 genes had unknown molecular function.

t = 0.55: The three largest paracliques have sizes 93, 53, and 37. Many more of these genes have unknown molecular function (40, 13, 17).

Largely the same categories appeared within the three largest paracliques in both groups.


• Functional comparisons: SGD & IPA
― Human

t = 0.83: The 1st paraclique is related to cellular organization, gene expression, genetic disorder, drug metabolism, and cell signaling; the 2nd to protein synthesis; these were related to reproductive system development and disease, respectively.

t = 0.65: The networks seem to be annotated with a larger range of functions. For example, the 2nd paraclique matched 13 networks, ranging from cellular assembly and organization and genetic disorder to inflammatory disease and many others.


Conclusion

• We presented a systematic threshold selection method that makes use of spectral graph theory.
• The results are in agreement with a previous study.
• At a higher threshold
― Fewer of these genes fail to be categorized based upon the gene ontology.
― Fewer networks were identified as being enriched in the paracliques, making interpretation of the results easier.


References

• [10.] Ding CHQ, He X and Zha H: A spectral method to separate disconnected and nearly-disconnected web graph components. Proceedings of the Seventh ACM International Conference on Knowledge Discovery and Data Mining: 26–29 August 2001; San Francisco 2001.

• [14.] Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D and Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell 1998, 9

• [17.] Chesler EJ and Langston MA: Combinatorial genetic regulatory network analysis tools for high throughput transcriptomic data. RECOMB Satellite Workshop on Systems Biology and Regulatory Genomics: 2–4 December 2005; San Diego 2005.