Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data DATA SCIENCE AND MACHINE LEARNING FOR BIOINFORMATICS ALEX ESSEBIER


Post on 07-Aug-2020


Page 1: Processing, integrating and analysing chromatin ...bioinformatics.org.au/winterschool/wp-content/... · Processing, integrating and analysing chromatin immunoprecipitation followed

Processing, integrating and analysing chromatin immunoprecipitation followed by sequencing (ChIP-seq) data

DATA SCIENCE AND MACHINE LEARNING FOR BIOINFORMATICS

ALEX ESSEBIER


The regulatory system

Dynamic, sequential gene expression changes required for normal development

Histone modifications

Chromatin accessibility

Chromatin state

Sequence features

Transcription factor binding

Cis regulatory modules

Gene expression

Distal or proximal interactions


High throughput sequencing (HTS) data

• Chromatin immunoprecipitation (ChIP-seq)

– To determine protein binding sites in vivo

– Transcription factor binding & histone modifications

• DNase I hypersensitivity (DHS) (also ATAC-seq and FAIRE-seq)

– Chromatin accessibility

– Protein binding footprints


High throughput sequencing (HTS) data

• RNA-seq and cap analysis of gene expression (CAGE)

– Gene expression

– Alternate transcription start site usage

– Changes in expression (temporal or perturbation)


High throughput sequencing (HTS) data

• Chromosome conformation capture

– DNA looping

– Long distance enhancer/promoter interactions

– 3C, 4C, 5C, Hi-C, CHi-C, ChIA-PET…


ChIP-seq

GENERATING AND INTERPRETING CHIP-SEQ DATA


ChIP-seq experiment

Wet lab

• Extract DNA fragments bound by protein of interest


ChIP-seq sequence analysis

Sequencing & quality control

• Sequence depth

– Depends on size of genome and type of protein

– E.g. mammalian transcription factor: 20 million reads

• Sequence quality

– Read quality summary

– Unusual base pair patterns

– Adapter sequences

– E.g. FastQC tool
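
To make the "read quality summary" idea concrete, here is a minimal sketch that parses a FASTQ stream and counts reads whose mean base quality falls below a threshold. It assumes Phred+33 encoded qualities and four-line FASTQ records; the function names, the threshold of 20 and the two toy reads are illustrative assumptions, not FastQC's method or output.

```python
import io


def mean_phred(quality_line, offset=33):
    """Mean Phred score of one read (assumes Phred+33 encoding)."""
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores)


def summarise_fastq(handle, min_mean_quality=20):
    """Return (total reads, reads with mean quality below the threshold).

    Assumes strict four-line FASTQ records with non-empty quality strings.
    """
    total = 0
    low = 0
    while True:
        header = handle.readline()
        if not header:
            break                      # end of file
        handle.readline()              # sequence line (unused here)
        handle.readline()              # '+' separator line
        quality = handle.readline().strip()
        total += 1
        if mean_phred(quality) < min_mean_quality:
            low += 1
    return total, low


# Toy input: one high-quality read ('I' = Q40) and one low-quality read ('!' = Q0).
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n")
total_reads, low_quality_reads = summarise_fastq(example)
print(total_reads, low_quality_reads)
```

A dedicated tool such as FastQC reports far more (per-base quality, adapter content, overrepresented sequences); the sketch only shows the kind of per-read summary that sits behind such reports.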


ChIP-seq alignment

Alignment & quality control

• Alignment summary

– Generated by alignment tools

– Uniquely aligned reads

• Read distribution quality

– ChIP-seq peaks create bimodal pattern

– Strand cross correlation analysis (SCCA)
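
The bimodal pattern can be illustrated with a small sketch of strand cross-correlation: forward-strand and reverse-strand read starts pile up on either side of a binding site, so the correlation between the two strand profiles should peak when one is shifted by roughly the fragment length. The sketch assumes read 5' positions on a single toy region; the function names and data are hypothetical and this is not the implementation used by any particular tool.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def strand_cross_correlation(forward_starts, reverse_starts, length, max_shift):
    """Correlation between forward and reverse read-start counts at each shift."""
    fwd = [0] * length
    rev = [0] * length
    for pos in forward_starts:
        fwd[pos] += 1
    for pos in reverse_starts:
        rev[pos] += 1
    profile = {}
    for shift in range(max_shift + 1):
        # Compare fwd[i] with rev[i + shift] over the positions where both exist.
        profile[shift] = pearson(fwd[:length - shift], rev[shift:])
    return profile


# Toy region: two binding sites, reverse reads offset 50 bp from forward reads.
forward = [100, 101, 102, 300, 301]
reverse = [150, 151, 152, 350, 351]
profile = strand_cross_correlation(forward, reverse, length=500, max_shift=100)
best_shift = max(profile, key=profile.get)
print(best_shift)
```

In this toy case the best shift recovers the 50 bp offset built into the data; on real data the height of that peak relative to the background is what is used as a quality signal.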


Peak calling

GENERATING AND EXPLORING CHIP-SEQ PEAKS


Basic principles of peak calling

Sample (exposed to antibody)

Compared to

Input (no antibody exposure)

To generate

Peak (with statistical significance)
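
The sample-versus-input comparison can be sketched as a simple per-window test. The example below assumes read counts have already been binned into fixed windows, scales the input to the sample's total depth, and asks how surprising each sample count is under a Poisson model. The window counts, the significance cut-off and the floor on the expected count are all illustrative assumptions; real peak callers use more careful background models.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cumulative = 0.0
    term = math.exp(-lam)              # P(X = 0)
    for i in range(k):
        cumulative += term             # add P(X = i)
        term *= lam / (i + 1)          # advance to P(X = i + 1)
    return max(0.0, 1.0 - cumulative)


def call_enriched_windows(sample_counts, input_counts, alpha=1e-3):
    """Return (window index, p-value) for windows enriched in sample over input."""
    scale = sum(sample_counts) / sum(input_counts)   # crude depth normalisation
    calls = []
    for index, (s, c) in enumerate(zip(sample_counts, input_counts)):
        expected = max(c * scale, 1.0)  # floor avoids a zero background (assumption)
        p_value = poisson_sf(s, expected)
        if p_value < alpha:
            calls.append((index, p_value))
    return calls


# Toy windows: only the third window shows a clear excess over input.
sample = [3, 4, 40, 5, 2, 6]
control = [4, 5, 6, 5, 3, 7]
calls = call_enriched_windows(sample, control)
print([index for index, _ in calls])
```

Testing every window genome-wide would also need a multiple-testing correction, which is left out here to keep the principle visible.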


Peak calling tools

• Everyone seems to have built their own!

• Omic Tools reports 86 ChIP-seq tools

– 51 in 2016

• In-house tools

• Find or adapt a tool that fits your needs

• Potentially, develop your own

Be aware! Not all tools are well documented or tested

https://xkcd.com/927/


So, how do I choose a tool?

• Choice of tool depends on problem

– Based on experiment itself

– Based on sequence data and read distribution


Combining tools and replicates

Peak caller    Total peaks    Unique peaks
MACS2          42,536         12%
HOMER          45,044         19%
SPP            19,474         0.7%

• Use multiple peak callers

– Combine outcomes (e.g. common peaks)

– Take advantage of strengths of different peak callers

– Bolster weaknesses

• Biological replicates can vary significantly

– Call peaks for replicates individually

– Compare/overlap to achieve ‘gold standard’

– Comparisons are dominated by the poorer replicate

(Figure: two peak sets of 10,000 peaks and 3,000 peaks)
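
Combining outcomes usually comes down to interval overlap. The sketch below splits one caller's peaks into those shared with a second peak set and those unique to it, assuming BED-style half-open intervals given as (chromosome, start, end). The peak coordinates and function names are made up for illustration; the same logic applies when overlapping replicates rather than callers.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def split_common_unique(peaks, other_peaks):
    """Split peaks into those overlapping any peak in other_peaks and the rest."""
    common = []
    unique = []
    for peak in peaks:
        if any(overlaps(peak, other) for other in other_peaks):
            common.append(peak)
        else:
            unique.append(peak)
    return common, unique


# Hypothetical peak sets from two callers (or two replicates).
caller_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
caller_b = [("chr1", 150, 250), ("chr2", 300, 400)]

common, unique = split_common_unique(caller_a, caller_b)
unique_fraction = len(unique) / len(caller_a)
print(common, unique_fraction)
```

The pairwise comparison is quadratic in the number of peaks, which is fine for a toy example; for genome-scale peak sets a sorted sweep or an interval tool such as bedtools is the more practical choice.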


Peak quality control

• Number of peaks

• Motif enrichment

• Read coverage

– Fraction of reads in peaks (FRiP) > 1%

– Generally observe > 10%

– E.g. 6/50 reads in peaks -> 12%

(Figure: CTCF and DNA, PDB 5T0U, Hashimoto et al. 2017)
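
The fraction of reads in peaks (FRiP) is a simple ratio, sketched here on toy data that reproduces the 6/50 example. The sketch assumes each read is represented by a single position on one chromosome and peaks are half-open (start, end) intervals; the positions and the function name are illustrative.

```python
def fraction_of_reads_in_peaks(read_positions, peaks):
    """FRiP: share of reads whose position falls inside any (start, end) peak."""
    in_peaks = sum(
        1
        for pos in read_positions
        if any(start <= pos < end for start, end in peaks)
    )
    return in_peaks / len(read_positions)


# 50 toy reads spaced 100 bp apart; one peak covering six of them.
reads = list(range(0, 5000, 100))
peaks = [(1000, 1600)]

frip = fraction_of_reads_in_peaks(reads, peaks)
print(f"FRiP = {frip:.0%}")
```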


ChIP-seq complications

• ChIP-seq generates peaks for all of these events


Integrating data

COMBINING DATA SETS TO IMPROVE OUTCOMES


Data integration

• Experiments capture dependent regulatory events

– ChIP-seq: regulatory elements

– DHS: chromatin accessibility

– RNA-seq: expression patterns

• Consider multiple datasets to:

– Improve confidence

– Improve understanding

– Support hypotheses

https://marketoonist.com/2014/01/big-data.html


Supporting histone modifications

• Explore chromatin environment

– Layer/overlap histone modifications

• DHS – chromatin accessibility


Supporting transcription factors

• Transcription factors preferentially bind open/active chromatin and regulatory regions

• Alter expression of genes

• RNA-seq on knock-out of transcription factor

– Identify genes with significant change in expression
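
One way to connect binding to expression is to ask which differentially expressed genes have a peak near their transcription start site (TSS). The sketch below assumes a list of peaks, a TSS per gene and a set of differentially expressed genes from the knock-out experiment; the gene names, coordinates and the 1 kb window are all hypothetical.

```python
def genes_near_peaks(peaks, tss_by_gene, window=1000):
    """Genes whose TSS lies within `window` bp of any (chrom, start, end) peak."""
    bound = set()
    for gene, (chrom, tss) in tss_by_gene.items():
        for peak_chrom, start, end in peaks:
            if peak_chrom == chrom and start - window <= tss < end + window:
                bound.add(gene)
                break                  # one nearby peak is enough
    return bound


# Hypothetical inputs: ChIP-seq peaks, gene TSSs, and knock-out RNA-seq results.
peaks = [("chr1", 1000, 1200), ("chr2", 5000, 5300)]
tss_by_gene = {
    "geneA": ("chr1", 1500),
    "geneB": ("chr1", 9000),
    "geneC": ("chr2", 5100),
    "geneD": ("chr2", 20000),
}
differentially_expressed = {"geneA", "geneB", "geneD"}

bound_genes = genes_near_peaks(peaks, tss_by_gene)
bound_and_changed = bound_genes & differentially_expressed
fraction_bound = len(bound_and_changed) / len(differentially_expressed)
print(sorted(bound_and_changed), fraction_bound)
```

Assigning peaks to the nearest TSS is a simplification, since distal elements can regulate genes far away; the window size is a choice that should be reported with any result.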


System complexity

• Only a small number of differentially expressed genes are bound by the target transcription factor

• System redundancy

• Indirect changes in expression

Ma et al. 2014


More complicated relationships

• Overlapping is simple

• Next steps: pattern identification, prediction, classification

• Machine learning approaches

• Hidden Markov model to classify epigenetic state

• Bayesian network to predict transcription factor binding events

• Random forest of decision trees to predict long distance enhancer/promoter interactions
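
As a flavour of the hidden Markov model approach, the sketch below runs the Viterbi algorithm on a toy two-state model where each genomic bin either has a histone mark (1) or not (0). The states, probabilities and observations are invented for illustration; real chromatin state models learn their parameters from several marks at once, so this is a teaching example rather than how any published tool is implemented.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely state path for a sequence of observations (log-space Viterbi)."""
    scores = [{
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }]
    back = [{}]
    for obs in observations[1:]:
        column = {}
        pointers = {}
        for s in states:
            # Best previous state to transition from into state s.
            best_prev = max(
                states, key=lambda p: scores[-1][p] + math.log(trans_p[p][s])
            )
            column[s] = (
                scores[-1][best_prev]
                + math.log(trans_p[best_prev][s])
                + math.log(emit_p[s][obs])
            )
            pointers[s] = best_prev
        scores.append(column)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return path[::-1]


states = ("active", "inactive")
start_p = {"active": 0.5, "inactive": 0.5}
trans_p = {
    "active": {"active": 0.9, "inactive": 0.1},
    "inactive": {"active": 0.1, "inactive": 0.9},
}
emit_p = {
    "active": {1: 0.8, 0: 0.2},      # mark usually present in active bins
    "inactive": {1: 0.1, 0: 0.9},    # mark usually absent in inactive bins
}

marks = [0, 0, 1, 1, 1, 0, 0]        # mark presence per genomic bin
path = viterbi(marks, states, start_p, trans_p, emit_p)
print(path)
```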


Take home messages

• Understand your data and how best to use it

• Quality control!

• Peak calling

– Use multiple tools where possible

– Keep up to date with advances

• Data integration

– Combine resources and data to gain a more complete picture


Resources

• Data/figures

– Klisch, T. J., Xi, Y., Flora, A., Wang, L., Li, W., & Zoghbi, H. Y. (2011). In vivo Atoh1 targetome reveals how a proneural transcription factor regulates cerebellar development. Proceedings of the National Academy of Sciences, 108(8), 3288-3293.

– Frank, C. L., Liu, F., Wijayatunge, R., Song, L., Biegler, M. T., Yang, M. G., ... & West, A. E. (2015). Regulation of chromatin accessibility and Zic binding at enhancers in the developing cerebellum. Nature Neuroscience, 18(5), 647-656.

– Ma, S., Kemmeren, P., Gresham, D., & Statnikov, A. (2014). De-novo learning of genome-scale regulatory networks in S. cerevisiae. PLoS ONE, 9(9), e106479.

– Hashimoto, H., Wang, D., Horton, J. R., Zhang, X., Corces, V. G., & Cheng, X. (2017). Structural basis for the versatile and methylation-dependent binding of CTCF to DNA. Molecular Cell, 66(5), 711-720.

• Useful papers

– Bailey, T., Krajewski, P., Ladunga, I., Lefebvre, C., Li, Q., Liu, T., ... & Zhang, J. (2013). Practical guidelines for the comprehensive analysis of ChIP-seq data. PLoS Computational Biology, 9(11), e1003326.

– Landt, S. G., Marinov, G. K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., ... & Chen, Y. (2012). ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Research, 22(9), 1813-1831.

– Farnham, P. J. (2009). Insights from genomic profiling of transcription factors. Nature Reviews Genetics, 10(9), 605-616.

– Zhang, Y., Liu, T., Meyer, C. A., Eeckhoute, J., Johnson, D. S., Bernstein, B. E., ... & Liu, X. S. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biology, 9(9), R137.

– Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y. C., Laslo, P., ... & Glass, C. K. (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Molecular Cell, 38(4), 576-589.


ChIP-seq complications

• Possible to observe multiple states at one genomic location

• False negatives

– Can’t detect small sub-populations

• False positives

– General non-specific chromatin being pulled down

– Bias not removed by input

• Replicates can resolve variation


Transcription Factors

• Confirm in vitro and in silico results

– Overlapping peaks with motifs

• Identify consensus motif

– For transcription factors which do not have an existing/known motif

– To identify variations in motif

• Differential peak binding

– To identify differences in binding patterns

– Compare cell types or time points
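
Overlapping peaks with a known motif can be sketched as a string search over the sequence under each peak, checking both strands. The example assumes peak sequences have already been extracted from the genome and uses an exact-match consensus; the motif and sequences are made up, and real analyses typically score position weight matrices rather than exact strings.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def fraction_with_motif(peak_sequences, motif):
    """Share of peak sequences containing the motif on either strand."""
    rc = reverse_complement(motif)
    hits = sum(1 for seq in peak_sequences if motif in seq or rc in seq)
    return hits / len(peak_sequences)


# Hypothetical consensus and peak sequences.
motif = "GATAAG"
peak_sequences = [
    "TTGATAAGCC",   # motif on the forward strand
    "ACCTTATCGG",   # motif on the reverse strand (CTTATC)
    "ACGTACGTAC",   # no motif
]

motif_fraction = fraction_with_motif(peak_sequences, motif)
print(round(motif_fraction, 2))
```

Comparing this fraction with the same measure on shuffled or background sequences gives a rough sense of motif enrichment.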


Histone Modifications

• Epigenetic analysis

– Generate epigenetic profiles

• Identify chromatin states genome wide

– E.g. ChromHMM

• Identify regulatory modules

– E.g. promoters or enhancers

• Differential peak binding

– Identify differences in epigenetic patterns


Long distance regulation

• Chromosome conformation capture reports different types of interactions

• Histone modifications can identify enhancer-promoter interactions

• Filter out structural interactions
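
The filtering step can be sketched as keeping only those interactions where one anchor overlaps an enhancer-associated mark and the other overlaps a promoter-associated mark. The sketch assumes both sets of mark peaks are already available as (chromosome, start, end) intervals, and leaves open which histone modifications are used to define them; all coordinates and names are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def touches(anchor, regions):
    """True if the anchor overlaps any region in the list."""
    return any(overlaps(anchor, region) for region in regions)


def enhancer_promoter_interactions(interactions, enhancer_marks, promoter_marks):
    """Keep interactions linking an enhancer-marked anchor to a promoter-marked one."""
    kept = []
    for left, right in interactions:
        forward = touches(left, enhancer_marks) and touches(right, promoter_marks)
        backward = touches(left, promoter_marks) and touches(right, enhancer_marks)
        if forward or backward:
            kept.append((left, right))
    return kept


# Hypothetical interaction anchors and histone mark peaks.
interactions = [
    (("chr1", 1000, 2000), ("chr1", 50000, 51000)),
    (("chr1", 3000, 4000), ("chr1", 80000, 81000)),
]
enhancer_marks = [("chr1", 1500, 1800)]
promoter_marks = [("chr1", 50500, 50900), ("chr1", 80000, 80500)]

kept = enhancer_promoter_interactions(interactions, enhancer_marks, promoter_marks)
print(kept)
```

Interactions dropped by this filter are not necessarily structural; they are simply unsupported by the marks supplied, so the result depends on the quality of those peak sets.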