Architecture of the human regulatory network derived from ENCODE data
DESCRIPTION
Transcription factors bind in a combinatorial fashion to specify the on-and-off states of genes; the ensemble of these binding events forms a regulatory network, constituting the wiring diagram for a cell. To examine the principles of the human transcriptional regulatory network, we determined the genomic binding information of 119 transcription-related factors in over 450 distinct experiments. We found the combinatorial co-association of transcription factors to be highly context specific: distinct combinations of factors bind at specific genomic locations. In particular, there are significant differences in the binding proximal and distal to genes. We organized all the transcription factor binding into a hierarchy and integrated it with other genomic information (for example, microRNA regulation), forming a dense meta-network. Factors at different levels have different properties; for instance, top-level transcription factors more strongly influence expression and middle-level ones co-regulate targets to mitigate information-flow bottlenecks. Moreover, these co-regulations give rise to many enriched network motifs (for example, noise-buffering feed-forward loops). Finally, more connected network components are under stronger selection and exhibit a greater degree of allele-specific activity (that is, differential binding to the two parental alleles). The regulatory information obtained in this study will be crucial for interpreting personal genome sequences and understanding basic principles of human biology and disease.
TRANSCRIPT
ARCHITECTURE OF THE HUMAN REGULATORY NETWORK DERIVED FROM ENCODE DATA
Gerstein MB, Kundaje A, Hariharan M, Landt SG, Yan KK, Cheng C, Mu XJ, Khurana E, Rozowsky J, Alexander R, Min R, Alves P, Abyzov A, Addleman N, Bhardwaj N, Boyle AP, Cayting P, Charos A, Chen DZ, Cheng Y, Clarke D, Eastman C, Euskirchen G, Frietze S, Fu Y, Gertz J, Grubert F, Harmanci A, Jain P, Kasowski M, Lacroute P, Leng J, Lian J, Monahan H, O'Geen H, Ouyang Z, Partridge EC, Patacsil D, Pauli F, Raha D, Ramirez L, Reddy TE, Reed B, Shi M, Slifer T, Wang J, Wu L, Yang X, Yip KY, Zilberman-Schapira G, Batzoglou S, Sidow A, Farnham PJ, Myers RM, Weissman SM, Snyder M.
Paper Presentation | Physiology | M.Sc., ITMB UoA
Anaxagoras Fotopoulos – Thanos Papathanasiou | 2014
Nature, 489(7414):91-100, 2012
INTRODUCTION
▪ System-wide analyses of transcription-factor-binding patterns have been performed in unicellular model organisms, such as Escherichia coli.
▪ For humans, systems-level analyses have been a challenge because of the size of the transcription factor repertoire and of the genome.
▪ Large-scale data from the ENCODE project are beginning to enable such analyses.
▪ The genome-wide binding profiles of 119 transcription-related factors, derived from 450 distinct experiments, are analysed to find correlations and multi-transcription-factor motifs.
▪ The results are integrated with other genomic information to form a multi-level meta-network in which different levels have distinct properties.
▪ The information obtained in this study will be crucial for interpreting variants in the many personal genome sequences expected in the future and for understanding basic principles of human biology and disease.
Data & Methods Analysis
▪ ENCODE: ChIP-seq data sets for 119 TFs over five main cell lines.
▪ Peak detection for each TF.
▪ For every peak of a focus factor (e.g. GATA1), find the intensities of the overlapping peaks of all other factors.
▪ Generation of co-binding maps: the observed co-binding map is the positive set.
▪ Negative set: a randomized co-binding map, created by independently shuffling the peak intensity values in each row of the co-binding map.
▪ RuleFit algorithm: combinations of factors in the positive set are compared with the randomized co-binding map, giving the Relative Importance and Co-association scores.
▪ Aggregate across all focus-factor contexts to obtain the importance correlation matrix and the maximal co-association matrix.
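The negative set described above can be illustrated with a minimal sketch. It assumes, hypothetically, that a co-binding map is a list of rows (one per focus-factor peak) holding the peak intensities of the other factors; the function name `shuffle_rows` and the toy values are illustrative only and are not taken from the paper.

```python
import random


def shuffle_rows(cobinding_map, seed=0):
    """Return a randomized copy of the map in which the values of each row
    are permuted independently (the row-wise shuffle named on the slide)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    randomized = []
    for row in cobinding_map:
        shuffled = list(row)   # copy, so the positive set is left untouched
        rng.shuffle(shuffled)
        randomized.append(shuffled)
    return randomized


# Toy positive set: 3 peaks (rows) x 4 partner factors (columns), made-up values.
positive = [
    [9.1, 0.0, 3.2, 0.0],
    [7.4, 5.5, 0.0, 0.0],
    [8.0, 0.0, 0.0, 2.1],
]
negative = shuffle_rows(positive, seed=42)
```

Each row keeps the same set of intensity values but loses the link between a value and the factor it came from, which is what makes the shuffled map usable as a background for the comparison.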
Example of GATA1 Relative Importance & Co-association scores
[Figure labels: POL2-(4H8), TAL1, GATA2]
Relative Importance (RI) gives the overall importance of each transcription factor in the model. It reflects the ‘size’ of the biclusters to which a particular transcription factor belongs, and it is related to the number of co-binding factors and the fraction of peak locations involved.
In the GATA1 context, the primary partners POL2, TAL1 and GATA2, as well as the local partners MAX and JUN, have high RI scores.
Co-association scores measure the impact of the co-dependency implicit in a particular pair on the model as a whole, and they more directly probe the co-occupancy of transcription factors in the focus-factor context than does the RI score.
• MYC–MAX–E2F6: expected pairings
• CCNT2–HMGN3: novel pair
Many genes that lie near clusters of co-associated factors are enriched for specific biological functions. For example:
• Bicluster {E2F6–GATA1–GATA2–TAL1} was enriched for genes related to myeloid differentiation.
• Bicluster {E2F6–SP1–SP2–FOS–IRF1} was involved in the DNA damage response.
Distinct combinations of factors regulate specific types of genes.
CORRELATIONS OF TRANSCRIPTION FACTORS {1/7} WITH DISTAL EDGES
Distal edges have a different degree distribution than proximal ones.
Some transcription factors have low in-degree values in the proximal network but high in-degree values in the distal one, indicating that they are heavily regulated through enhancers.
[Figure legend: Top Level, Middle Level, Bottom Level; Downward-Pointing Edges, Upward-Pointing Edges]
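The proximal-versus-distal in-degree comparison can be sketched as follows. This is only an illustration: the edge lists, the names `TF_A` to `TF_D` and the helper `in_degrees` are hypothetical, and each network is assumed to be a list of (regulator, target) pairs.

```python
from collections import Counter


def in_degrees(edges):
    """Count incoming edges per node for a list of (regulator, target) pairs."""
    return Counter(target for _, target in edges)


# Toy networks with made-up factor names.
proximal_edges = [("TF_A", "TF_B"), ("TF_A", "TF_C"), ("TF_B", "TF_C")]
distal_edges = [("TF_A", "TF_D"), ("TF_B", "TF_D"),
                ("TF_C", "TF_D"), ("TF_A", "TF_C")]

proximal_in = in_degrees(proximal_edges)
distal_in = in_degrees(distal_edges)

# Factors whose distal in-degree exceeds their proximal in-degree, i.e.
# candidates for being regulated mainly through enhancers.
# A Counter returns 0 for nodes that have no incoming edges.
nodes = {n for edge in proximal_edges + distal_edges for n in edge}
enhancer_biased = sorted(n for n in nodes if distal_in[n] > proximal_in[n])
```

In this toy example only `TF_D` is selected, because it has three distal incoming edges and no proximal ones.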
CORRELATIONS OF TRANSCRIPTION FACTORS {2/7} WITHIN THE PROXIMAL NETWORK
• Upper-level transcription factors tend to have more targets than lower-level ones (less shaded TFs in the figure).
• Middle-level TFs concentrate many incoming and outgoing edges, forming an information-flow bottleneck between the top and bottom levels.
[Figure legend: Top Level, Middle Level, Bottom Level; Downward-Pointing Edges, Upward-Pointing Edges]
CORRELATIONS OF TRANSCRIPTION FACTORS {3/7} WITH PROTEIN INTERACTIONS AND THE PHOSPHORYLOME
Top-level transcription factors tend to have more partners in the protein–interaction network than do lower-level ones.
Kinases at the bottom tend not to phosphorylate transcription factors.
Kinases at the bottom tend to be regulated by transcription factors.
The phosphorylome is the entire set of phosphoproteins (the phosphorylated part of the proteome).
CORRELATIONS OF TRANSCRIPTION FACTORS {4/7} WITH ncRNAs
Highly connected transcription factors tend to regulate more miRNAs and to be more regulated by them.
Top-level and middle-level transcription factors have the highest total number of ncRNA targets.
[Figure legend: enriched for miRNA → TF edges; balanced number of edges; enriched for TF → miRNA edges]
CORRELATIONS OF TRANSCRIPTION FACTORS {5/7} WITH FAMILIES AND FUNCTIONAL CATEGORIES
• Transcription factors at the top of the hierarchy tend to have more general functions, and those at the bottom tend to have more specific functions.
• Sequence-specific transcription factors (TFSSs) show a greater degree of tissue specificity and are more highly regulated by miRNAs than the general and chromatin-related factors.
Chromatin-related factors are enriched at the top of the hierarchy.
TFSSs are enriched in the middle.
CORRELATIONS OF TRANSCRIPTION FACTORS {6/7} WITH GENE EXPRESSION
• Highly connected factors tend to be highly expressed.
• The top and middle levels show a greater correlation.
• More ‘influential’ transcription factors tend to be better connected and higher in the hierarchy.
• A model integrating the binding–expression relationships of the highly connected transcription factors has the same predictive power as one built from the less connected ones, even though the latter individually show weak binding–expression relationships.
Top–middle and middle–middle transcription factor pairs influence gene expression cooperatively.
CORRELATIONS OF TRANSCRIPTION FACTORS {7/7} WITH NETWORK DYNAMICS
Transcription factors change their binding patterns among different cell types.
Targets of lower-level transcription factors tend to change more between cell types, consistent with their role in more specialized processes.
The ‘rewiring score’ is negatively correlated with hierarchy level.
The rewiring score quantifies the difference between the two sets of binding targets of a TF in two cell lines (GM12878 and K562):
$$\text{rewiring score} = 1 - \frac{|Bs_1 \cap Bs_2|}{|Bs_1 \cup Bs_2|}$$
where Bs1 and Bs2 are Binding Set 1 and Binding Set 2, and their intersection is the set of common binding sites.
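As a worked illustration of the rewiring score (one minus the ratio of common targets to all targets), here is a minimal sketch. The function name `rewiring_score` and the target names are hypothetical; the two binding sets are assumed to be plain collections of target identifiers.

```python
def rewiring_score(bs1, bs2):
    """1 - |Bs1 ∩ Bs2| / |Bs1 ∪ Bs2| for two sets of binding targets."""
    bs1, bs2 = set(bs1), set(bs2)
    union = bs1 | bs2
    if not union:
        raise ValueError("at least one binding set must be non-empty")
    return 1 - len(bs1 & bs2) / len(union)


# Made-up targets of one TF in the two cell lines.
gm12878_targets = {"geneA", "geneB", "geneC", "geneD"}
k562_targets = {"geneC", "geneD", "geneE", "geneF"}

# 2 common targets out of 6 in total, so the score is 1 - 2/6.
score = rewiring_score(gm12878_targets, k562_targets)
```

A score of 0 means the TF binds exactly the same targets in both cell lines, and a score of 1 means the two target sets do not overlap at all.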
ENRICHED NETWORK MOTIFS {1/4} AUTO-REGULATOR MOTIFS
Network motifs are small connectivity patterns that carry out canonical functions.
Motifs, defined as broad template patterns, can be over- or under-represented relative to a random control.
• The human regulatory network is enriched with auto-regulators.
• Auto-regulators tend to be repressors, representing a well-known design principle for maintaining steady state.
• Auto-regulators have more ncRNAs as their targets.
The auto-regulator is a simple but important motif that is commonly found in networks exhibiting multistability.
90 TFs are non-auto-regulators.
28 TFs are auto-regulators.
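In network terms, an auto-regulator is a factor with an edge to itself. The sketch below shows how such factors could be separated from the rest, assuming a hypothetical edge list of (regulator, target) pairs; the names `TF_A` to `TF_C` and the function name are illustrative only.

```python
def split_by_autoregulation(edges):
    """Split the regulators of a directed network into auto-regulators
    (self-loop present) and non-auto-regulators."""
    regulators = set()
    auto = set()
    for regulator, target in edges:
        regulators.add(regulator)
        if regulator == target:  # a self-loop marks an auto-regulator
            auto.add(regulator)
    return auto, regulators - auto


# Toy network with made-up factor names.
edges = [
    ("TF_A", "TF_A"),  # TF_A regulates itself
    ("TF_A", "TF_B"),
    ("TF_B", "TF_C"),
    ("TF_C", "TF_A"),
]
auto_regulators, non_auto_regulators = split_by_autoregulation(edges)
```

Enrichment would then be judged by comparing the number of auto-regulators with the number found in randomized networks, a step this sketch does not cover.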
ENRICHED NETWORK MOTIFS {2/4} THREE TRANSCRIPTION FACTOR MOTIFS
• The most enriched three-transcription-factor motif in the proximal network is the feed-forward loop (FFL).
• Across many tissues, the expression levels of the genes in many FFLs were positively correlated.
• Other enriched three-transcription-factor motifs contain an additional regulation on top of that in an FFL. This creates a mutual regulation between a pair of transcription factors, instantiating a toggle switch, which has an essential role in cell determination.
[Figure legend: Enrichment / Depletion]
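A feed-forward loop is a triple of distinct factors X, Y, Z with edges X → Y, Y → Z and X → Z. The following sketch enumerates FFLs in a small directed network by brute force; the edge list and the function name `find_feed_forward_loops` are hypothetical, and the approach is only practical for small networks.

```python
from itertools import permutations


def find_feed_forward_loops(edges):
    """Return all (X, Y, Z) triples of distinct nodes with X->Y, Y->Z and X->Z."""
    edge_set = {(a, b) for a, b in edges if a != b}  # ignore self-loops
    nodes = {n for edge in edge_set for n in edge}
    return sorted(
        (x, y, z)
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set
    )


# Toy network: X regulates Z both directly and through Y.
edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
ffls = find_feed_forward_loops(edges)
```

Counting such triples in the real network and in degree-preserving randomizations of it is one common way to decide whether the motif is enriched or depleted; the randomization step is left out here.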
ENRICHED NETWORK MOTIFS {3/4} PPI-MIMs MOTIFS
• Co-regulating transcription factors are likely to interact physically, indicating that they work together as a complex.
• The motif ranking second in enrichment consists of a distal regulatory relationship, a promoter regulatory relationship, and a protein–protein interaction. This is consistent with a DNA loop in which an interacting complex of transcription factors binds to the promoter and the enhancer simultaneously.
[Figure caption: Possible Multiple-Input Modules involving promoter and distal regulation and a Protein–Protein Interaction (PPI-MIMs)]
ENRICHED NETWORK MOTIFS {4/4} miRNA REGULATION MOTIFS
• miRNAs are more likely to regulate a pair of physically interacting factors.
• To avoid unwanted cross-talk, a miRNA tends to shut down an entire functional unit (a transcription factor complex) rather than just a single component.
• miRNAs tend to target a pair of transcription factors that bind both proximally and distally. This suggests that the miRNA represses the expression of both the promoter and the distal regulators to shut down a target completely.
ALLELIC BEHAVIOR IN A NETWORK FRAMEWORK
• The degree of allele-specific behaviour of each transcription factor can be quantified by a statistic that we call ‘allelicity’.
• Of the 4,798 allele-specific binding cases (paternal or maternal targets) of a single transcription factor, 57% showed coordinated allelic binding and expression.
• As the degree of combinatorial regulation increases, the relationship between expressed and regulated alleles becomes progressively stronger.
• Small insertions and deletions in TF sequences cause more allelic behaviour than SNPs.
Examining relationships between sequence variation and transcription factor regulation
[Figure legend: TF, Target, Pat/Mat; every line denotes allele-specific binding]
CONCLUSIONS
• Human transcription factors co-associate in a combinatorial and context-specific fashion.
• Different combinations of factors bind near different targets, and the binding of one factor often affects the preferred binding partners of others.
• Transcription factors often show different co-association patterns in gene-proximal and distal regions.
• Different parts of the hierarchical transcription factor network exhibit distinct properties.
• The network contains a number of motifs in which two genes co-regulated by a factor are bridged by a protein–protein interaction or a regulating miRNA.
• Highly connected network elements (both transcription factors and targets) are under strong evolutionary selection and exhibit stronger allele-specific activity; however, elements with allelic activity are under weaker selection than non-allelic ones.
Thank you
National & Kapodistrian University of Athens, Department of Informatics
Technological Education Institute of Athens, Department of Biomedical Engineering
Biomedical Research Foundation, Academy of Athens
Demokritos National Center for Scientific Research
Physiology | Information Technologies in Medicine and Biology