ChIP-seq and its applications in GRN construction Jin Chen 2012 Fall CSE891-001


ChIP-seq and its applications in GRN construction

Jin Chen, 2012 Fall

CSE891-001


Layout

• Genome-scale evidence from microarray measurements may be used to identify regulatory interactions between TFs and targets

• Hu et al used a genetic approach to identify targets of transcription factors in Yeast and reconstruct a functional regulatory network

• Reimand et al re-analyzed Hu’s data using improved statistical techniques


Hu et al’s work

• Grew each of 263 transcription factor knockout strains and compared mRNA expression of each of these strains with a wildtype strain using microarrays

• Defined the unrefined transcription factor target network as the cumulative set of genes that were significantly differentially expressed in each deletion strain

• There was overlap between transcription factor targets identified in the unrefined network and targets identified by ChIP-chip


2-level Refinement

• First level of network refinement
– If TF A activated TF B and gene M, B activated gene M, and the confidence of A regulating gene M was lower than that of B regulating gene M, then the regulation of gene M by A was presumed to be indirect and was therefore erased

• Additional refinement step
– Similar to the previous step, except that the indirect edge that was removed bridged a three-step direct interaction series at the preceding level, resulting in a level 3 refined network

• Note that the logical consistency for regulatory edges was maintained at all times
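
The first refinement level can be sketched in Python as follows. This is a minimal illustration, not Hu et al's implementation. The edge representation (a dict mapping (regulator, target) to a (sign, confidence) pair), the function name `refine_level1`, and the reading of "logical consistency" as sign agreement between the direct edge and the two-step path are all assumptions made for the example.

```python
def refine_level1(edges):
    """Remove edges A->M that look indirect given a path A->B->M.

    edges: dict mapping (regulator, target) -> (sign, confidence),
           where sign is +1 (activation) or -1 (repression) and a larger
           confidence means stronger evidence (hypothetical encoding).
    """
    indirect = set()
    for (a, m), (sign_am, conf_am) in edges.items():
        for (a2, b), (sign_ab, _) in edges.items():
            # Look for an intermediate TF B regulated by the same A.
            if a2 != a or b == m or b == a:
                continue
            if (b, m) not in edges:
                continue
            sign_bm, conf_bm = edges[(b, m)]
            # Assumed consistency rule: the path A->B->M must have the
            # same net sign as the direct edge A->M.
            consistent = sign_ab * sign_bm == sign_am
            if consistent and conf_am < conf_bm:
                indirect.add((a, m))
    # All decisions are made on the original network, then applied at once.
    return {edge: value for edge, value in edges.items() if edge not in indirect}


example = {
    ("A", "B"): (+1, 0.9),
    ("B", "M"): (+1, 0.8),
    ("A", "M"): (+1, 0.5),  # weaker than B->M, so treated as indirect
}
refined = refine_level1(example)
print(sorted(refined))  # the edge ("A", "M") is no longer present
```

The additional refinement step would apply the same idea to edges that bridge a three-step path in the refined network; it is not shown here.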


Hu et al’s work

• When the transcription factor bound to a promoter was deleted, the expression of the downstream gene was much more likely to be affected than at the background rate

• Expression from promoters that were detectably occupied by a single TF was even more likely to be affected by deletion of that potentially major or sole TF

• Thus, there was significant overlap between binding targets defined by ChIP-chip and functional targets defined by TF deletion


Hu et al’s work – problems

• However, Hu et al’s study used relatively dated and insensitive approaches for microarray data processing

• As a result, the published P-values and target-gene rankings are likely to be unreliable
– P-values were not corrected for multiple testing
– Lack of background and print-tip correction during normalization

• Reimand et al re-analyzed the same dataset with state-of-the-art software and obtained a much larger network

Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777


False Discovery Rate

• False discovery rate (FDR) control is a statistical approach used in multiple hypothesis testing to correct for multiple comparisons. The q-value is defined to be the FDR analogue of the p-value

• FDR is the expected proportion of false positives among all hypotheses called significant

• For example, if 1000 observations were experimentally predicted to be different, and the FDR for these observations was 0.1, then 100 of them would be expected to be false positives

• FDR is determined from the observed p-value distribution, and hence adapts to the number of tests
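
The arithmetic in the example is simply 1000 × 0.1 = 100 expected false positives. As a sketch of how FDR-adjusted p-values can be computed, the code below uses the Benjamini-Hochberg step-up procedure. The slides do not say which FDR procedure was used, so this choice and the function name `benjamini_hochberg` are assumptions.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(pvals)
# Keep the tests whose adjusted p-value passes an FDR cut-off of 0.05.
significant = [i for i, q in enumerate(adjusted) if q <= 0.05]
print(adjusted, significant)
```

Because each adjusted value depends on the rank of the p-value among all tests, the same raw p-value can pass or fail the cut-off depending on how many tests were run and how their p-values are distributed.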


Redo the Preprocessing

• Microarrays were normalized using the VSN package, including print-tip and background correction

• Differential expression was calculated using a moderated eBayes t-test as implemented in the Limma Bioconductor package

• FDR cut-off of 0.05 was used to detect significant differential gene expression

Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777


Re-analyze TF binding data

• DNA–protein interactions derived from ChIP-chip experiments were obtained, and only those with a P-value < 0.001 were considered

• A set of ‘trusted’ position weight matrices (PWMs) for 72 regulatory factors was derived by running the PROCSE and PhyloGibbs algorithms on a set of experimentally derived TF binding sites from SCPD

• These PWMs were then used to scan multiple alignments of each intergenic region in yeast with the orthologous regions of another four yeast species

Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777
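
To make the PWM scanning step concrete, here is a minimal sketch that converts per-position base counts into log-odds scores and slides the matrix along a single sequence. It is a simplification: the slides describe scanning multiple alignments across yeast species, which is not modelled here. The toy motif, the pseudocount, the uniform background and the score threshold are all assumptions.

```python
import math


def pwm_log_odds(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log2-odds PWM.

    counts: list of dicts mapping base -> count, one dict per motif position.
    """
    pwm = []
    for column in counts:
        total = sum(column.get(base, 0) for base in "ACGT") + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, site, score) for every window scoring >= threshold.

    Assumes the sequence contains only the letters A, C, G and T.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical 3-bp motif built from 10 aligned sites that all read "TGA".
toy_counts = [{"T": 10}, {"G": 10}, {"A": 10}]
toy_pwm = pwm_log_odds(toy_counts)
for start, site, score in scan("AATGACC", toy_pwm, threshold=4.0):
    print(start, site, round(score, 2))
```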


Re-analyze knockout expression and ChIP binding data

• Overlap between TF-binding and TF knockout data
– Collect binding sites for 142 TFs, comprising 5,188 ChIP-chip interactions and 17,091 motif predictions
– Calculate the intersection between the list of differentially expressed genes from the TF knockout and targets identified by ChIP-chip or binding-site predictions
– 2,230 regulatory relations

Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777
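
The intersection step amounts to a per-TF set intersection. The sketch below uses made-up TF and gene names; the data structures and the function name `regulatory_relations` are assumptions for illustration, not the format used by Reimand et al.

```python
def regulatory_relations(knockout_de, binding_targets):
    """Pair each TF with genes that are both differentially expressed in its
    knockout and identified as binding targets (ChIP-chip or motif prediction).

    knockout_de:     dict TF -> set of differentially expressed genes
    binding_targets: dict TF -> set of bound or predicted target genes
    """
    relations = set()
    for tf, de_genes in knockout_de.items():
        bound = binding_targets.get(tf, set())
        for gene in de_genes & bound:
            relations.add((tf, gene))
    return relations


# Hypothetical toy data.
knockout_de = {"TF1": {"geneA", "geneB", "geneC"}, "TF2": {"geneD"}}
binding_targets = {"TF1": {"geneB", "geneC", "geneE"}, "TF3": {"geneD"}}
print(sorted(regulatory_relations(knockout_de, binding_targets)))
```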


Re-analyze knockout expression and ChIP binding data

• Checked the expression levels of the TFs
– Intuitively, one expects the TF under consideration to have lower expression in the mutant strain compared with the wild type strain
– The re-analysis confirms this for 155 TFs
– 78 TFs display a negative fold change at statistically non-significant levels
– 36 TFs are lethal

Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777


Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777


Re-analyze knockout expression and ChIP binding data

• Examine functional annotations of differentially expressed genes
– As most TFs are considered to regulate distinct cellular processes, their target genes should be associated with a coherent set of molecular and biological functions
– Used g:Profiler to identify GO, KEGG and Reactome pathway annotations
– Across all TF knockouts, the re-analysis has a higher score than the original analysis

Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777


Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777


SUMMARY - exploring biological networks


Topology Approaches

• What comes next after constructing biological networks?

• First of all, simple approaches
– Degree, betweenness, clustering coefficient, topological coefficient, shortest path
– Shared neighbors, neighborhood connectivity, closeness centrality


Clustering Coefficient

• Clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together
• Clustering coefficient (local version): do my neighbors connect with each other?
• Evidence suggests that in most real-world networks, nodes tend to create tightly knit groups characterized by a relatively high density of ties

$$C_i = \frac{2 e_i}{k_i (k_i - 1)}$$

where $k_i$ is the number of neighbors of node i and $e_i$ is the number of connected pairs between all neighbors of node i

Luciano da F. Costa, Francisco A. Rodrigues, Alexandre S. Cristino. Complex networks: the key to systems biology. Genet. Mol. Biol. vol.31 no.3. 2008; http://med.bioinf.mpi-inf.mpg.de/netanalyzer
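
As a small worked example of the local clustering coefficient, the sketch below counts the connected pairs among a node's neighbors in an undirected graph stored as an adjacency dict. The toy graph and the convention of returning 0 for nodes with fewer than two neighbors are assumptions.

```python
def local_clustering(adj, node):
    """Local clustering coefficient 2*e / (k*(k-1)) of `node`.

    adj: dict mapping each node to the set of its neighbors (undirected).
    Nodes with fewer than two neighbors get 0.0 by convention here.
    """
    neighbors = list(adj[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    # e = number of neighbor pairs that are themselves connected.
    e = sum(
        1
        for x in range(k)
        for y in range(x + 1, k)
        if neighbors[y] in adj[neighbors[x]]
    )
    return 2 * e / (k * (k - 1))


# Triangle a-b-c with an extra node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
for node in sorted(graph):
    print(node, local_clustering(graph, node))
```

For node a, k = 3 and only the pair (b, c) is connected, so the coefficient is 2·1 / (3·2) = 1/3, while b and c each score 1.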


Average Clustering Coefficient Distribution

Define function C(k) as the average clustering coefficient of all nodes with k links

For many real networks, $C(k) \sim k^{-1}$

Nodes with only a few links have a high C(k) and belong to highly interconnected small modules

By contrast, the highly connected hubs have a low C(k), with their role being to link different, and otherwise not communicating, modules
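
A minimal sketch of how C(k) can be tabulated: compute each node's local clustering coefficient, then average within each degree class. The adjacency-dict representation, the toy graph and the function name are assumptions. Node labels must be mutually comparable because the code uses `u < v` to count each neighbor pair once.

```python
from collections import defaultdict


def average_clustering_by_degree(adj):
    """Return {k: C(k)}, the mean local clustering coefficient of nodes with k links."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, neighbors in adj.items():
        k = len(neighbors)
        # Count connected neighbor pairs once each (u < v).
        links = sum(1 for u in neighbors for v in neighbors if u < v and v in adj[u])
        coefficient = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        sums[k] += coefficient
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


# Triangle a-b-c with an extra node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(average_clustering_by_degree(graph))
```

Plotting the resulting values against k on log-log axes is one way to check whether a network follows the inverse relationship between C(k) and k.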


Closeness centrality

• Closeness centrality is a measure of how many steps are required to access every other node from a given node
• Closeness centrality: how long will it take information to spread from a given node to other reachable nodes in the network?

$$CC(i) = \frac{\sum_{t \in V} d_G(i, t)}{|V| - 1}$$

where $d_G(i, t)$ is the length of the shortest path from i to t, and V is the set of nodes in G

Freeman, 1978; Opsahl et al., 2010; Wasserman and Faust, 1994
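
The quantity on this slide, the sum of shortest-path lengths d_G(i, t) divided by |V| − 1, can be computed with a breadth-first search in an unweighted graph. The sketch below assumes a connected, undirected graph with at least two nodes stored as an adjacency dict. Note that closeness in the sense of Freeman (1978) is usually reported as the reciprocal of this average distance, so the code returns both.

```python
from collections import deque


def average_distance(adj, source):
    """Mean shortest-path length from `source` to all other nodes.

    Assumes an unweighted, connected graph with at least two nodes.
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    return sum(distance.values()) / (len(adj) - 1)


def closeness(adj, source):
    """Reciprocal of the average distance (higher means more central)."""
    return 1.0 / average_distance(adj, source)


# Path graph a - b - c: the middle node is closest to everyone.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
for node in sorted(path):
    print(node, average_distance(path, node), closeness(path, node))
```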


Distribution of closeness centrality

Closeness centrality is successful in distinguishing the important members of the community

Its distribution resembles a normal curve, while the other centrality measures have a long tail distribution similar to a power law


Limitations of simple approaches

• Each node/edge is studied individually; enrichment analysis cannot be applied

• Only topology is studied; it is difficult to integrate other knowledge

• Nodes with high scores are not necessarily key genes/proteins

Study a group of genes simultaneously


Advanced approaches

• Dense subgraph detection
• Network motif detection
• Graph clustering
• Graph classification
• etc.


Dense subgraph detection

Software available at http://zhoulab.usc.edu/CODENSE/


Dense subgraph detection

• A subgraph is considered coherent and dense if and only if every edge is well supported, and its corresponding second-order graph is dense

CODENSE


Network Motif Detection



Perform graph join operation to find repeated size-k graphs

Join each tree with its cousins to produce frequent motif candidates C_k.

[Figure: example join of trees t4_1 and t4_2; result labels h1 to h5]


Graph Clustering

• Graph clustering is an organization process with the goal to put similar nodes together; the result is a partition of the network into a set of communities

• The MCL algorithm is a fast and scalable unsupervised clustering algorithm for graphs, based on simulation of stochastic flow in graphs, available at http://www.micans.org/mcl

Van Dongen, S. (2000) Graph Clustering by Flow Simulation. PhD Thesis, University of Utrecht, The Netherlands
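
The flow simulation behind MCL alternates two matrix operations: expansion (squaring a column-stochastic matrix) and inflation (raising entries to a power and renormalizing columns). The pure-Python sketch below illustrates that loop on a small dense matrix. It is a teaching example, not the optimized implementation distributed at micans.org; the parameter defaults, convergence test and cluster read-out threshold are assumptions.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return [[matrix[r][c] / sums[c] for c in range(n)] for r in range(n)]


def multiply(a, b):
    """Plain dense matrix product of two n x n matrices."""
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def mcl(adjacency, inflation=2.0, max_iterations=50, tolerance=1e-9):
    """Cluster an undirected graph given as a 0/1 adjacency matrix."""
    n = len(adjacency)
    # Add self-loops so every column has mass, then normalize.
    m = [[float(adjacency[r][c]) + (1.0 if r == c else 0.0) for c in range(n)]
         for r in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iterations):
        expanded = multiply(m, m)                       # expansion
        inflated = normalize_columns(                   # inflation
            [[value ** inflation for value in row] for row in expanded])
        change = max(abs(inflated[r][c] - m[r][c])
                     for r in range(n) for c in range(n))
        m = inflated
        if change < tolerance:
            break
    # Rows that keep non-negligible mass define clusters: the nonzero
    # columns of such a row are the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(c for c, value in enumerate(row) if value > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Two triangles (nodes 0-2 and 3-5) joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adjacency[u][v] = adjacency[v][u] = 1
print(mcl(adjacency))
```

A larger inflation value tends to produce more, smaller clusters, and a smaller value tends to produce fewer, larger ones.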


Graph Clustering

[Figure: a graph and its graph clusters]