chip-seq data and analysis · analysis of chip-seq data differential binding analysis...
TRANSCRIPT
![Page 1: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/1.jpg)
ChIP-Seq data and analysis
Bori Mifsud
Postdoc in Luscombe group
18.09.2014
Computational biology UCL-LRI
![Page 2: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/2.jpg)
Why do Chromatin Immunoprecipitation (ChIP)?
~99.9% identical genetic material
![Page 3: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/3.jpg)
100% identical genetic material
![Page 4: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/4.jpg)
Proteins DNA RNA
transcription translation
![Page 5: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/5.jpg)
ChIP to understand transcriptional regulation!
Map regulatory elements: Transcription Factors
–ChIP Histone marks
–ChIP DNA Methylation
–MeDIP etc. Nucleosomes RNA Polymerase
–Pol II ChIP
![Page 6: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/6.jpg)
ChIP-seq protocol
![Page 7: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/7.jpg)
Analysis of ChIP-seq data
Differential binding analysis –Occupancy-based analysis –Affinity-based analysis
Validation and downstream analysis
–Motif analysis –Annotation –Integrating binding and expression data
Experimental design –Controls and replicates
QC/Read processing –Library QC –Alignment and filtering –QC measures and assessment
Peak calling –Peak callers
![Page 8: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/8.jpg)
ENCODE project
Landt et al. (2012) ChIP-seq guidelines and practices of the ENCODE and modENCODE Consortia. Genome Research 22: 1813-1831
Chen et al. (2012) Systematic evaluation of factors influencing ChIP-seq fidelity. Nat Methods 9: 609
![Page 9: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/9.jpg)
![Page 10: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/10.jpg)
![Page 11: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/11.jpg)
![Page 12: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/12.jpg)
![Page 13: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/13.jpg)
![Page 14: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/14.jpg)
![Page 15: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/15.jpg)
Consideration 2: Why do you need controls?
![Page 16: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/16.jpg)
Consideration 2: Why do you need controls?
![Page 17: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/17.jpg)
• Non-uniform fragmentation (euchromatin-heterochromatin)
• GC sequncing bias
Consideration 2: Why do you need controls?
[Chen et al, 2012]
![Page 18: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/18.jpg)
Consideration 2: Why do you need controls?
![Page 19: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/19.jpg)
![Page 20: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/20.jpg)
Consideration 2: Why do you need controls?
The more sequencing depth you have for the input the better you can identify peaks!
[Chen et al, 2012]
![Page 21: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/21.jpg)
(over 100 million reads – HiSeq)
![Page 22: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/22.jpg)
Transcription factor – tight, highly-peaked binding region
RNA PolII – enriched at TSS but bound throughout gene
body
ChIP-Seq data from fly S2 cells
Proteins bind in different ways
![Page 23: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/23.jpg)
Activating mark (near TSS)
Peaks within body of active genes
Peaks within body of inactive genes
![Page 24: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/24.jpg)
![Page 25: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/25.jpg)
10
Supplementary Figure 10. The change in identified ChIP‐enriched regions of (a) Su(Hw) and (b) H3K36me3 with respect to the regions that were identified using the complete data is shown with the increase of sequencing depth for different algorithms. Macs‐f3 and Useq‐f3 denote the Su(Hw) regions that have more than 3 fold enrichment and were identified by Macs and Useq, respectively.
Nature Methods: doi:10.1038/nmeth.1985
Consideration 3: Sequencing depth (optimum is different for different peak finder software)
[Chen et al, 2012]
Plateau for most peak finders ~16.2 M reads in Drosophila (corresponding to ~327 M reads in human)
![Page 26: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/26.jpg)
• There is a difference when you assess the complexity of the sample
Reproducibility information gives confidence in peaks, helps choosing thresholds (IDR)
![Page 27: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/27.jpg)
Data processing steps
schematic of
ChIP-seq
experiments
[Park et al, 2009]
ChIP
sequencing
alignment
peak-finding
![Page 28: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/28.jpg)
Quality control
• Read quality
• Sequence content • Duplication (PCR artefacts) • Library complexity (overrepresented sequences)
• Contamination
Many tools (SAMstat, htSeqTools, fastQC etc.)
![Page 29: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/29.jpg)
Quality control
• Read quality
• Sequence content • Duplication (PCR artefacts) • Library complexity (overrepresented sequences)
• Contamination
Many tools (SAMstat, htSeqTools, fastQC etc.)
![Page 30: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/30.jpg)
Quality control
• Read quality
• Sequence content • Duplication (PCR artefacts) • Library complexity (overrepresented sequences)
• Contamination
Many tools (SAMstat, htSeqTools, fastQC etc.)
![Page 31: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/31.jpg)
Quality control
• Read quality
• Sequence content • Duplication (PCR artefacts) • Library complexity (overrepresented sequences)
• Contamination
Many tools (SAMstat, htSeqTools, fastQC etc.)
![Page 32: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/32.jpg)
e.g. BWA, Bowtie
![Page 33: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/33.jpg)
![Page 34: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/34.jpg)
Strand information for quality control
[Landt et al, 2012]
![Page 35: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/35.jpg)
Basic idea: Count the number of reads in windows and determine whether this number is above background – if so, define that region as bound
![Page 36: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/36.jpg)
MACS 2.0 USeq SISSRs
Calculating peakshift for 1000 best peaks Shift reads 3’ Identify potentially bound regions Calculate enrichment and significance using poisson distribution with local λ
Calculating peakshift Shift reads 3’ Define windows Calculate enrichment per window, significance using negative binomial Join regions that are within max gap eFDR
Estimate fragment length (mean sense-antisense dist) Windows with w/2 shift through genome Define potential peaks by transition in net tag count (n sense-nantisense) Calculate enrichment and significance using poisson
![Page 37: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/37.jpg)
[Park 2009]
Downstream of ChIP
![Page 38: ChIP-Seq data and analysis · Analysis of ChIP-seq data Differential binding analysis –Occupancy-based analysis –Affinity-based analysis Validation and downstream analysis –Motif](https://reader033.vdocuments.site/reader033/viewer/2022043008/5f99e26ef1e42a7991769329/html5/thumbnails/38.jpg)
Landt et al. (2012) ChIP-seq guidelines and practices of the ENCODE and modENCODE Consortia. Genome Research 22: 1813-1831
Chen et al. (2012) Systematic evaluation of factors influencing ChIP-seq fidelity. Nat Methods 9: 609 Meyer & Liu (2014) Identifying and mitigating bias in next generation sequencing methods for chromatin biology. Nature Reviews Genetics doi:10.1038/nrg3788
References: