topological associated domains- hi-c
![Page 1: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/1.jpg)
Topological Associated Domains identification using Hi-C
Speaker: Djekidel Mohamed Nadhir
Date: 03/03/2014
![Page 2: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/2.jpg)
Outline
![Page 3: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/3.jpg)
Background
• Although the sequence of the genome has been determined, little is known about its 3D structure
• High-throughput chromosome conformation capture (Hi-C) is a 3C-based technology
• It can detect chromatin interactions between loci across the entire genome
Biological experiment:
Ming, H., et al. (2013). "Understanding spatial organizations of chromosomes via statistical analysis of Hi-C data." Quantitative Biology 1.
![Page 4: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/4.jpg)
Background
• Hi-C in the chromatin conformation study map
Smallwood, A. and B. Ren (2013). "Genome organization and long-range regulation of gene expression by enhancers." Current opinion in cell biology 25(3): 387-394.
![Page 5: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/5.jpg)
Background - Processing pipeline
• 4 main steps:
  • Read mapping: each side (50 bp) is mapped independently to the reference genome
  • Read-level filtering
  • Fragment filtering: filter fragments with a low mappability score
  • Creation of the Hi-C contact matrix
Ming, H., et al. (2013). "Understanding spatial organizations of chromosomes via statistical analysis of Hi-C data." Quantitative Biology 1.
![Page 6: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/6.jpg)
Background - Processing pipeline
• Read filtering step: the following types of reads should be removed:
  • Self-ligation reads
  • Dangling reads: un-ligated reads
  • PCR amplification reads: many reads that map to the same location
  • Random breaking reads: reads located far from the enzyme cutting site ( )
![Page 7: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/7.jpg)
Background - Processing pipeline
• Fragment filtering step: remove fragments with a low mappability score (< 0.5)
  • Fragments near centromere or telomere regions tend to contain a large proportion of repetitive sequence, which leads to a low mappability score
• Additional suggestions:
  • Remove fragments shorter than 100 bp or longer than 100 kb
  • Remove the 0.5% of fragments with the highest number of reads (these can be a source of PCR artifacts)
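The fragment filters above can be sketched in a few lines of Python. This is only an illustration: the `Fragment` record, the function name and the choice to apply the 0.5% cut to the fragments that survive the first two filters are assumptions, not something the slides specify.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical record for one restriction fragment."""
    name: str
    length_bp: int
    mappability: float
    read_count: int


def filter_fragments(fragments, min_mappability=0.5, min_len=100,
                     max_len=100_000, top_fraction=0.005):
    """Return the fragments that pass the mappability, size and coverage filters."""
    kept = [f for f in fragments
            if f.mappability >= min_mappability
            and min_len <= f.length_bp <= max_len]
    # Drop the most heavily covered fragments (possible PCR artifacts).
    # Assumption: the fraction is taken over the fragments kept so far.
    n_drop = int(len(kept) * top_fraction)
    if n_drop:
        by_coverage = sorted(kept, key=lambda f: f.read_count, reverse=True)
        dropped = {id(f) for f in by_coverage[:n_drop]}
        kept = [f for f in kept if id(f) not in dropped]
    return kept


# Toy input: only f3 and f5 pass the mappability and size filters.
frags = [
    Fragment("f1", 50, 0.9, 10),        # too short
    Fragment("f2", 5_000, 0.3, 12),     # low mappability
    Fragment("f3", 5_000, 0.9, 15),
    Fragment("f4", 200_000, 0.9, 8),    # too long
    Fragment("f5", 800, 0.7, 9_999),    # very high coverage
]
passing = [f.name for f in filter_fragments(frags)]
```

With only two surviving fragments the 0.5% cut removes nothing; on a real fragment table with hundreds of thousands of entries it would remove the extreme tail.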
![Page 8: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/8.jpg)
Background
• Construction of the Hi-C interaction matrix:
  • The number of enzyme cut sites is , whereas a typical Hi-C experiment generates reads
  • Thus, we need to partition the genome into large-scale bins.
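As a rough illustration of the binning step, the sketch below counts read pairs into a symmetric bin-by-bin matrix for a single chromosome. The function name, the 0-based coordinates and the toy read pairs are assumptions made for the example.

```python
def build_contact_matrix(read_pairs, chrom_length, bin_size):
    """Count read pairs per pair of fixed-size bins on one chromosome.

    read_pairs: iterable of (pos1, pos2) 0-based mapping positions.
    Returns a symmetric list-of-lists matrix of counts.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in read_pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix


# Toy example: a 120 kb chromosome split into three 40 kb bins.
pairs = [(10_000, 90_000), (15_000, 85_000), (50_000, 55_000)]
contacts = build_contact_matrix(pairs, chrom_length=120_000, bin_size=40_000)
```

Larger bins give denser, less noisy matrices at the cost of resolution, which is the trade-off behind partitioning the genome into bins rather than working at cut-site level.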
Processing pipeline:
Hi-C vs FISH
![Page 9: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/9.jpg)
Discussed paper
• Aim:
  • Investigate the 3D organization of the human and mouse genomes in ES and differentiated cells.
• Data:
  • Mouse:
    • Mouse embryonic stem cell (mESC)
    • Cortex cell (generated by another group)
  • Human:
    • Human embryonic stem cell (hESC)
    • IMR90
![Page 10: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/10.jpg)
Data control (1)
• Remove cut site bias
Figure: raw data vs. normalized data
![Page 11: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/11.jpg)
Data control (2)
• Compare with 5C-generated data for the HoxA locus (correlation > 0.73)
• Compare with Phc1 locus 3C data
• Compare with FISH data of 6 loci
![Page 12: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/12.jpg)
Data control (3)
• Pearson correlation between replicates
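A replicate comparison of this kind can be sketched as a Pearson correlation over the upper triangles of two contact matrices. The helper names and the toy matrices are hypothetical, and the slides do not say which matrix entries were actually compared.

```python
import math


def upper_triangle(matrix):
    """Flatten the upper triangle (diagonal included) of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy replicates: rep2 is rep1 sequenced twice as deep.
rep1 = [[8, 3, 1], [3, 9, 4], [1, 4, 7]]
rep2 = [[2 * v for v in row] for row in rep1]
r = pearson(upper_triangle(rep1), upper_triangle(rep2))
```

Because the correlation is invariant to scaling, a pure difference in sequencing depth between replicates does not lower it.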
![Page 13: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/13.jpg)
Visualization of interactions
We can notice a Topological Associated Domain (TAD) structure at bins < 100 kb
![Page 14: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/14.jpg)
Identification of topological domains
Step 1: Detection of the interaction bias
We notice that within a TAD:
• The upstream portion is highly biased to interact downstream
• The downstream portion is highly biased to interact upstream
A directionality index (DI) was defined to quantify this bias. It captures:
• the upstream bias
• the downstream bias
• the extent of the interaction
![Page 15: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/15.jpg)
DI calculation
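Steps:
• The genome was split into bins of length 40 kb
• Let:
  • A: number of reads that map from the bin to the 2 Mb upstream of it
  • B: number of reads that map from the bin to the 2 Mb downstream of it
  • E: expected number of reads
• Then:
Figure: a 40 kb bin flanked by its 2 Mb upstream window (A, -2 Mb) and its 2 Mb downstream window (B, +2 Mb)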
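The formula itself did not survive in the transcript. The sketch below assumes the commonly used definition of the directionality index: a signed chi-square-like statistic with E = (A + B) / 2, positive when the bin interacts more with downstream sequence. Function names and the toy matrix are hypothetical.

```python
def directionality_index(a, b):
    """Signed bias score for one bin (assumed definition, see lead-in).

    a: reads linking the bin to its upstream window
    b: reads linking the bin to its downstream window
    """
    if a == b:
        return 0.0  # no bias; also covers the empty case a == b == 0
    e = (a + b) / 2  # assumed expected count under no bias
    sign = 1.0 if b > a else -1.0
    return sign * ((a - e) ** 2 / e + (b - e) ** 2 / e)


def di_track(matrix, window_bins=50):
    """DI for every bin of a contact matrix.

    window_bins=50 corresponds to 2 Mb when bins are 40 kb wide.
    """
    track = []
    for i in range(len(matrix)):
        a = sum(matrix[i][max(0, i - window_bins):i])      # upstream contacts
        b = sum(matrix[i][i + 1:i + 1 + window_bins])      # downstream contacts
        track.append(directionality_index(a, b))
    return track


# Toy 3-bin matrix with a 2-bin window.
toy = [[0, 5, 1],
       [5, 0, 1],
       [1, 1, 0]]
toy_di = di_track(toy, window_bins=2)
```

Under this sign convention the first bins of a domain get positive values and the last bins get negative values, so a domain boundary shows up as a jump from negative to positive DI.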
![Page 16: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/16.jpg)
Domain detection (1)
• Each bin can have 3 states:
• Upstream biased
• Downstream biased
• No bias
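• Use an HMM based on the DI to infer the biased state
• We define:
  • $y_t$: the observed DI
  • $Q_t$: the hidden bias
• The probabilities are calculated as follows:
  • : the mixture weight
Example state sequence: D D D D U U U | N N N | D D D U U (domain | boundary | domain)
Figure: HMM graph linking the hidden states $Q_t$, $Q_{t+1}$ (values D, U, N), the mixture components $M_t$, $M_{t+1}$ ($M_1$, $M_2$, $M_3$) and the observations $y_t$, $y_{t+1}$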
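To make the decoding step concrete, here is a deliberately simplified sketch: a three-state HMM decoded with the Viterbi algorithm, using one Gaussian per state instead of the mixture of Gaussians described on the slide. All parameters (means, standard deviations, transition probabilities) are hand-picked, hypothetical values; in practice they would be learned from the DI track.

```python
import math

STATES = ("U", "D", "N")  # upstream biased, downstream biased, no bias


def log_gauss(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi(di_values, means, sds, stay_prob=0.8):
    """Most likely state string for a DI track (simplified, single Gaussian per state)."""
    n = len(STATES)
    switch = (1 - stay_prob) / (n - 1)
    log_trans = [[math.log(stay_prob if i == j else switch) for j in range(n)]
                 for i in range(n)]
    # Uniform initial distribution over the three states.
    score = [math.log(1 / n) + log_gauss(di_values[0], means[s], sds[s])
             for s in range(n)]
    back = []
    for y in di_values[1:]:
        prev = score
        pointers, score = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][j]
                         + log_gauss(y, means[j], sds[j]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(STATES[s] for s in reversed(path))


# Hypothetical parameters: positive DI means downstream bias (state D).
MEANS = [-20.0, 20.0, 0.0]   # order follows STATES: U, D, N
SDS = [10.0, 10.0, 5.0]
di = [25, 30, 22, -18, -25, -30, 1, -2, 0, 28, 24, -26, -21]
decoded = viterbi(di, MEANS, SDS)
```

In the decoded string, a run of D followed by a run of U forms one domain, and the N bins in between mark the region separating two domains.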
![Page 17: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/17.jpg)
Domain detection (2)
• The region between two TADs is termed:
  • Topological boundary: if size < 400 kb
  • Unrecognized chromatin: if size ≥ 400 kb
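The size rule can be written as a small helper. The function name, the `(start, end)` representation of domains and the choice to assign a gap of exactly 400 kb to the second class are assumptions for illustration.

```python
def classify_gaps(domains, max_boundary_bp=400_000):
    """Label the region between each pair of consecutive domains.

    domains: list of (start, end) coordinates, sorted and non-overlapping.
    Returns (gap_start, gap_end, label) tuples; touching domains yield no gap.
    """
    labels = []
    for (_, prev_end), (next_start, _) in zip(domains, domains[1:]):
        gap = next_start - prev_end
        if gap <= 0:
            continue  # domains are adjacent, nothing to label
        if gap < max_boundary_bp:
            label = "topological boundary"
        else:
            label = "unrecognized chromatin"
        labels.append((prev_end, next_start, label))
    return labels


# Toy domains: a 200 kb gap, a 600 kb gap, then two touching domains.
toy_domains = [(0, 1_000_000), (1_200_000, 2_000_000),
               (2_600_000, 3_000_000), (3_000_000, 3_500_000)]
gaps = classify_gaps(toy_domains)
```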
![Page 18: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/18.jpg)
What separates two TADs
• Studied the HoxA locus, known to be separated into two compartments
• Found that the CS5 insulator resides in the boundary
• Maybe insulators are enriched at the boundary?
![Page 19: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/19.jpg)
CTCF role in the boundary
• Studied another known insulator protein, CTCF
![Page 20: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/20.jpg)
Heterochromatin and boundary
• The H3K9me3 profile changed between hESC and IMR90 cells, but the boundary structure didn't change
• Potential link between the topological domains and transcriptional control in the mammalian genome
![Page 21: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/21.jpg)
Characteristics of TAD
• TADs are stable between cell lines
Figure: hESC vs. IMR90
![Page 22: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/22.jpg)
Characteristics of TAD
• TADs are conserved between species
![Page 23: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/23.jpg)
Cell type specific interactions
• A binomial test is performed for each 20 kb bin to determine whether it is cell-type specific
• Calculate $n$, the number of possible interactions at a distance $d$
• Calculate the expected value or
• Then, for each bin, do a binomial test to see whether the number of cell-specific interactions deviates from expectation
Figure: interactions at distance $d$ in mESC and cortex; example: $n = (3+2+1+1) + (2+1+4+1) = 15$
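A two-sided exact binomial test of this kind can be sketched with the standard library alone. The setup is an assumption: n is taken as the pooled number of interactions at a given distance in both cell types (15 in the slide's example), k as the count observed in one cell type, and the null probability as 0.5, which presumes comparable sequencing depth. The expected value actually used on the slide is not recoverable from the transcript.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_test_two_sided(k, n, p=0.5):
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    observed = binom_pmf(k, n, p)
    tolerance = observed * 1e-9  # guard against float ties
    total = sum(pmf for pmf in (binom_pmf(i, n, p) for i in range(n + 1))
                if pmf <= observed + tolerance)
    return min(1.0, total)


# Slide example: 7 of the 15 pooled interactions come from one cell type.
p_balanced = binomial_test_two_sided(7, 15)
# A strongly skewed bin: 14 of 15 interactions from one cell type.
p_skewed = binomial_test_two_sided(14, 15)
```

A bin would be called cell-type specific when its p-value falls below a chosen threshold, ideally after correcting for the number of bins tested.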
![Page 24: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/24.jpg)
Cell type specific interactions
• 20% of the genes that have a FC 4 are found in dynamic interacting loci.
• > 96% of the dynamic interactions occur in the same domain.
• Model:
  • domain organization is stable between cell types
  • but the regions within each domain may be dynamic
![Page 25: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/25.jpg)
Factors forming the boundary (1)
• Boundaries are enriched for active promoter signals and gene bodies
![Page 26: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/26.jpg)
Factors forming the boundary (2)
![Page 27: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/27.jpg)
TAD vs A/B compartments (1)
At a higher order, the chromatin is organized into A and B compartments
• Loci found clustered in A compartments are generally:
  • gene rich
  • transcriptionally active
  • DNase I hypersensitive
• Loci found clustered in B compartments are generally:
  • gene poor
  • transcriptionally silent
  • DNase I insensitive
Lieberman-Aiden, E., et al. (2009). Science (New York, N.Y.) 326(5950): 289-293.
![Page 28: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/28.jpg)
TAD vs A/B compartments (2)
TADs are smaller than A/B compartments
![Page 29: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/29.jpg)
TAD vs A/B compartments (3)
In summary:
Gibcus, J. and J. Dekker (2013). "The hierarchy of the 3D genome." Molecular cell 49(5): 773-782.
![Page 30: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/30.jpg)
TAD vs A/B compartments (4)
In summary:
Gibcus, J. and J. Dekker (2013). "The hierarchy of the 3D genome." Molecular cell 49(5): 773-782.
![Page 31: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/31.jpg)
TAD vs Lamina associated domains (LAD) (1)
![Page 32: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/32.jpg)
TAD vs Lamina associated domains (LAD) (2)
Nora, E., et al. (2013). BioEssays : news and reviews in molecular, cellular and developmental biology 35(9): 818-828.
![Page 33: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/33.jpg)
TAD vs LOCKs
• LOCK: Large Organized Chromatin K9-modifications
• Conserved regions exhibiting large H3K9me2 differences between cell lines
![Page 34: Topological associated domains- Hi-C](https://reader033.vdocuments.site/reader033/viewer/2022061307/589ad9871a28abc93a8b6cf1/html5/thumbnails/34.jpg)
Summary
• The mammalian genome is segmented into megabase-scale domains
• Domain boundaries are stable between cell lines and species, suggesting that they are a basic property of chromosome architecture.
• Domain boundaries:
  • are enriched for transcriptionally active genes
  • coincide with heterochromatin boundaries
  • are enriched for insulator proteins
  • are enriched for tRNA, SINE and housekeeping genes
• Developed many data-analysis approaches