Topological Associated Domains identification using Hi‐C
Modified from Djekidel Mohamed Nadhir
Structural Organization of Chromatin
Interactions between TADs of the same epigenetic type give rise to compartments
Chromosome territories are formed by coalescence of compartments
A compartments are active and localize near nuclear speckles
B compartments are inactive and localize near the nuclear envelope
Chromatin is organized into TADs
From Hansen et al., Nucleus 9, 20 (2018)
Hi‐C for understanding 3D structure
• Although the sequence of the genome has been determined, little is known about its 3D structure
• High‐throughput chromosome conformation capture (Hi‐C) is a 3C‐based technology that can detect chromatin interactions between loci across the entire genome
Biological experiment:
Hu, M., et al. (2013). "Understanding spatial organizations of chromosomes via statistical analysis of Hi‐C data." Quantitative Biology 1.
Hi‐C in the chromatin conformation study map
Smallwood, A. and B. Ren (2013). "Genome organization and long‐range regulation of gene expression by enhancers." Current opinion in cell biology 25(3):387‐394.
Data Processing Pipeline
• 4 main steps:
• Read mapping: each end (50 bp) of a read pair is mapped independently to the reference genome
• Read‐level filtering
• Fragment filtering: filter out fragments with a low mappability score
• Creation of the Hi‐C contact matrix
Hu, M., et al. (2013). "Understanding spatial organizations of chromosomes via statistical analysis of Hi‐C data." Quantitative Biology 1.
Read filtering step
• The following types of reads should be removed:
• Self‐ligation reads
• Dangling reads: un‐ligated reads
• PCR amplification reads: many reads that map to the same location
• Random breaking reads: reads located far from the enzyme cutting site ($d_1 + d_2 > 500$ bp)
Fragment filtering step
• Remove fragments with a low mappability score (< 0.5)
• Fragments near centromere or telomere regions tend to contain a large proportion of repetitive sequence, which leads to a low mappability score
• Additional suggestions:
• Remove fragments shorter than 100 bp or longer than 100 kb
• Remove the 0.5% of fragments with the highest number of reads (these can be a source of PCR artifacts)
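The fragment filters listed above can be sketched as a single function. This is a minimal illustration, not the pipeline used in the lecture: the `Fragment` record and the `filter_fragments` name are hypothetical, and applying the 0.5% cut after the mappability and length filters (rather than before) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical record for one restriction fragment."""
    length_bp: int
    mappability: float
    read_count: int


def filter_fragments(fragments, min_mappability=0.5, min_len=100,
                     max_len=100_000, top_fraction=0.005):
    """Apply the three fragment-level filters from the slide."""
    # Filters 1 and 2: mappability score and fragment length.
    kept = [f for f in fragments
            if f.mappability >= min_mappability
            and min_len <= f.length_bp <= max_len]

    # Filter 3: drop the top 0.5% of the remaining fragments by read count
    # (assumption: the percentage is taken over the fragments that survived).
    n_drop = int(len(kept) * top_fraction)
    if n_drop == 0:
        return kept
    by_reads = sorted(range(len(kept)),
                      key=lambda i: kept[i].read_count, reverse=True)
    drop = set(by_reads[:n_drop])
    return [f for i, f in enumerate(kept) if i not in drop]
```

The thresholds are keyword arguments so that the suggested values (0.5, 100 bp, 100 kb, 0.5%) can be varied per data set.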
Construction of the Hi‐C interaction matrix
• The number of possible enzyme cut‐site pairs is about $10^{12}$, whereas a typical Hi‐C experiment generates only about $10^8$ reads
• Thus, we need to partition the genome into large‐scale bins.
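As a rough sketch of the binning step, the function below counts read pairs per pair of fixed-size bins on a single chromosome. The function name, the input format (a list of `(pos1, pos2)` coordinates) and the single-chromosome restriction are assumptions made for illustration.

```python
import numpy as np


def contact_matrix(pairs, chrom_length, bin_size):
    """Build a symmetric bin-by-bin contact matrix from read-pair positions."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = np.zeros((n_bins, n_bins), dtype=np.int64)
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i, j] += 1
        if i != j:
            # Keep the matrix symmetric without double-counting the diagonal.
            matrix[j, i] += 1
    return matrix
```

With a larger `bin_size` each cell collects more reads, which is the trade-off between resolution and coverage that motivates binning.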
Hi‐C vs FISH
Discussed paper
• Aim :
• Investigate the 3D organization of the human and mouse genomes in ES and differentiated cells.
• Data :
• Mouse :
• Mouse embryonic stem cell (mESC)
• Cortex cell (generated by another group)
• Human :
• Human embryonic stem cell (hESC)
• IMR90
Data control (1)
• Remove cut site bias
Figure panels: raw data vs. normalized data
Data control (2)
• Compare with 5C‐generated data for the HoxA locus (correlation > 0.73)
• Compare with Phc1 locus 3C data
• Compare with FISH data of 6 loci
Data control (3)
Pearson correlation between replicates
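One possible way to compute such a replicate correlation is shown below. It is only a sketch: whether the original analysis compared all matrix entries, only the upper triangle, or distance-stratified entries is not stated in the slides, and the function name is hypothetical.

```python
import numpy as np


def replicate_correlation(m1, m2):
    """Pearson correlation between two contact matrices of the same shape."""
    # Use the upper triangle (including the diagonal) so that each
    # symmetric bin pair is counted once (an assumption of this sketch).
    upper = np.triu_indices_from(m1)
    x = m1[upper].astype(float)
    y = m2[upper].astype(float)
    return float(np.corrcoef(x, y)[0, 1])
```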
Visualization of interactions
A Topological Associated Domain (TAD) structure becomes visible at bin sizes < 100 kb
Identification of topological domains
Step 1: Detection of the interaction bias
We notice that, within a TAD:
• The upstream portion is highly biased to interact downstream
• The downstream portion is highly biased to interact upstream
A directionality index (DI) was defined to quantify this bias:
• $DI < 0$: upstream bias
• $DI > 0$: downstream bias
• $|DI|$: the extent of the interaction bias
DI calculation
Steps:
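• The genome was split into bins of length 40 kb
• Let:
• A: number of reads that map from the bin to the 2 Mb region upstream of it
• B: number of reads that map from the bin to the 2 Mb region downstream of it
• E: expected number of reads, $E = \frac{A + B}{2}$
• Then:
• $DI = \frac{B - A}{|B - A|} \left( \frac{(A - E)^2}{E} + \frac{(B - E)^2}{E} \right)$
Figure: a 40 kb bin with window A covering the 2 Mb upstream (−2 Mb) and window B covering the 2 Mb downstream (+2 Mb).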
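The DI definition above translates directly into a few lines of code. The sketch below assumes a binned, symmetric contact matrix for one chromosome as input; the function name is hypothetical, and returning 0 when A = B (where the sign term is undefined) is a convention chosen here, not something stated on the slide.

```python
import numpy as np


def directionality_index(matrix, bin_size=40_000, window=2_000_000):
    """Compute the DI of every bin from a binned contact matrix."""
    w = window // bin_size  # window width in bins (50 for 2 Mb / 40 kb)
    n = matrix.shape[0]
    di = np.zeros(n)
    for i in range(n):
        a = matrix[i, max(0, i - w):i].sum()   # A: contacts with upstream bins
        b = matrix[i, i + 1:i + 1 + w].sum()   # B: contacts with downstream bins
        e = (a + b) / 2                        # E: expected count under no bias
        if e == 0 or a == b:
            continue                           # no reads or no bias: DI stays 0
        di[i] = np.sign(b - a) * ((a - e) ** 2 / e + (b - e) ** 2 / e)
    return di
```

On a matrix with two isolated blocks, the first bin of each block gets a large positive DI and the last bin a large negative DI, which is the pattern the domain caller looks for.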
Domain detection (1)
• Each bin can have 3 states:
• Upstream biased
• Downstream biased
• No bias
• Use an HMM based on the DI to infer the bias state
• We define:
• $Y = [Y_1, Y_2, \ldots, Y_n]$: the observed DI
• $Q = [Q_1, Q_2, \ldots, Q_n]$: the hidden bias, $Q_i \in \{D, U, N\}$
• $M = [M_1, M_2, \ldots, M_m]$: the mixtures, $m \in [1, 20]$
• The probabilities are calculated as follows:
• $P(Y_t \mid Q_t = i, M_t = m) = \mathcal{N}(Y_t; \mu_{im}, \Sigma_{im})$
• $P(M_t = m \mid Q_t = i) = C(i, m)$
• $C(i, m)$: the mixture weight
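Figure: example state sequence D D D D U U U | N N N | D D D U U, read as domain, boundary, domain. The diagrams show the states D, U and N, the mixtures $M_1$, $M_2$, $M_3$, and the HMM nodes $Q_t$, $M_t$, $y_t$ and $Q_{t+1}$, $M_{t+1}$, $y_{t+1}$.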
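To make the decoding step concrete, here is a deliberately simplified sketch: a 3-state HMM decoded with the Viterbi algorithm, using one Gaussian per state instead of the mixture of up to 20 Gaussians described above. All parameter values (`MEANS`, `SIGMAS`, `TRANS`, `START`) are hand-picked for illustration and are not fitted or taken from the lecture; in practice they would be estimated from the data.

```python
import numpy as np

STATES = ("D", "U", "N")  # downstream bias, upstream bias, no bias

# Illustrative, hand-picked parameters (assumptions, not fitted values).
MEANS = np.array([25.0, -25.0, 0.0])         # mean DI per state
SIGMAS = np.array([8.0, 8.0, 3.0])           # standard deviation per state
TRANS = np.full((3, 3), 0.1) + 0.7 * np.eye(3)  # "sticky" transitions
START = np.full(3, 1.0 / 3.0)


def log_gauss(y, mu, sigma):
    """Log density of a normal distribution, vectorized over states."""
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)


def viterbi(di, means=MEANS, sigmas=SIGMAS, trans=TRANS, start=START):
    """Return the most likely state letter for each bin given its DI."""
    n, k = len(di), len(means)
    log_trans = np.log(trans)
    score = np.log(start) + log_gauss(di[0], means, sigmas)
    back = np.zeros((n, k), dtype=int)
    for t in range(1, n):
        cand = score[:, None] + log_trans      # cand[i, j]: move from i to j
        back[t] = cand.argmax(axis=0)          # best predecessor of each state
        score = cand.max(axis=0) + log_gauss(di[t], means, sigmas)
    path = [int(score.argmax())]
    for t in range(n - 1, 0, -1):              # trace the best path backwards
        path.append(int(back[t, path[-1]]))
    return [STATES[s] for s in reversed(path)]
```

A run of D states followed by a run of U states would then be read as one domain, matching the example sequence in the figure.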
Domain detection (2)
• The region between two TADs is termed:
• Topological boundary: if size < 400 kb
• Unrecognized chromatin: if size ≥ 400 kb
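The size rule above can be sketched as follows, assuming the called domains are available as a sorted list of `(start, end)` coordinates in base pairs; the function name and input format are hypothetical.

```python
def classify_gaps(domains, threshold=400_000):
    """Label the region between each pair of consecutive domains."""
    gaps = []
    for (_, end), (next_start, _) in zip(domains, domains[1:]):
        size = next_start - end
        if size <= 0:
            continue  # adjacent or overlapping domains leave no gap
        label = ("topological boundary" if size < threshold
                 else "unrecognized chromatin")
        gaps.append((end, next_start, label))
    return gaps
```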
What separates two TADs
• Studied the HoxA locus known to be separated into two compartments
• Found that the CS5 insulator resides in the boundary
• Hypothesis: insulators may be enriched at the boundaries
CTCF role in the boundary
• Studied another known insulator, CTCF
Heterochromatin and boundary
• The H3K9me3 profile changed between hESC and IMR90 cells, but the boundary structure did not change
• Potential link between the topological domains and transcriptional control in the mammalian genome
Characteristics of TADs
• TADs are stable between cell lines
hESC
IMR90
Characteristics of TADs
• TADs are conserved between species
Cell type specific interactions
• A binomial test is performed for each 20 kb bin to determine whether it is cell-type specific
• Calculate $n = I_{mESC} + I_{cortex}$, the total number of interactions at a distance $d$
• Calculate the expected proportion $p = \frac{I_{mESC}}{n}$ or $p = \frac{I_{cortex}}{n}$
• Then, for each bin, do a binomial test to see whether the number of cell-specific interactions deviates from this expectation
Example (figure): at distance $d$, mESC has 3 + 2 + 1 + 1 = 7 interactions and cortex has 2 + 1 + 4 + 1 = 8, so $n = 15$ and $p = \frac{7}{15}$ or $p = \frac{8}{15}$.
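A two-sided exact binomial test can be written with the standard library alone, as sketched below. How the per-bin counts are fed into the test is an assumption here: for one bin, `k` is taken as the number of reads seen in one cell type, `n` as the reads in both cell types combined, and `p` as the expected proportion at that distance (for example 7/15 in the slide's figure). The function name is hypothetical.

```python
from math import comb


def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value for k successes in n trials."""
    # Probability of every possible outcome 0..n under the null proportion p.
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Sum the outcomes that are at most as likely as the observed one;
    # the small tolerance guards against floating-point ties.
    cutoff = pmf[k] * (1 + 1e-9)
    return min(1.0, sum(q for q in pmf if q <= cutoff))
```

A small p-value for a bin would then flag its interactions as enriched in one of the two cell types; in a genome-wide scan the p-values would also need multiple-testing correction.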
Cell type specific interactions
• 20% of the genes that have a fold change (FC) ≥ 4 are found in dynamic interacting loci.
• > 96% of the dynamic interactions occur within the same domain.
• Model:
• domain organization is stable between cell types,
• but the regions within each domain may be dynamic.
Factors forming the boundary (1)
• Boundaries are enriched for active promoter signals and gene bodies
Factors forming the boundary (2)
TAD vs A/B compartments (1)
• At a higher order, the chromatin is organized into A and B compartments
• Loci found clustered in A compartments are generally:
• gene rich,
• transcriptionally active,
• and DNase I hypersensitive
• Loci found clustered in B compartments are generally:
• gene poor,
• transcriptionally silent,
• and DNase I insensitive
Figure labels: Compartment A, Compartment B
Lieberman-Aiden, E., et al. (2009), Science (New York, N.Y.) 326(5950): 289-293.
TAD vs A/B compartments (2)
TADs are smaller than A/B compartments
Summary
• The mammalian genome is segmented into megabase-scale domains
• Domain boundaries are stable between cell lines and species, suggesting that they are a basic property of the chromosome architecture.
• Domain boundaries:
• are enriched for transcriptionally active genes
• coincide with heterochromatin boundaries
• are enriched with insulator proteins
• are enriched with tRNA, SINE and housekeeping genes
• The study developed many data-analysis approaches