epigenomicenrichment...

Post on 16-Jul-2020

4 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

EuroBioc 2019 – Brussels

Dario Righelli – PhD

Istituto per le Applicazioni del Calcolo «M. Picone» – CNR - Napoli

d.righelli@na.iac.cnr.it || dario.righelli@gmail.com

Epigenomic enrichmentanalysis using Bioconductor

drighelli

What’s the aim?Compare methods and provide guidelines on epigenomic data analysis

ATAC-seq dataset

Yijing Su et al. 2017 - Nature Neuroscience - Neuronal activity modifies the chromatin accessibility landscape in the adult brain

Before Fear Induction Condition (E0)

After Fear Induction Condition (E1)

}

}

4 biological replicates

4 biological replicates

}Catching differencesin open chromatineregions

ChIP-seq dataset (NULL dataset)

Home Cage Controls - Histon 3, Lysine 9 Acetilation (H3K9ac)

} 9 biological replicates} How many random differences are weable to catch inside a control dataset?

BWA and Bowtie2 perform the same

o Most used aligners for

epigenomics data

o Correlation computed on

ChIP-seq data coverages

o used DeepTools

plotCorrelation tool

o Computed correlations on the

coverages of the same

samples on BWA and Bowtie2

bams have value of 1.

o MACS2 (No Bioconductor)

o Most used peak caller

o Broad and Narrow peaks option

o DEScan2

o Has a peak detector in R

o Peak resolution -> bin size

o Can work with external peaks

o DiffBind

o No peak detection

o Fast on matrix construction

o Uses external peaks

o CSAW

o Starts from BAM files

o Computes matrix of bins x samples

o edgeR

o Widely used method

o Very flexible in usage

A Bioconductor Approach

PeakCallers DESCan2 MACS2 CSAW

NarrowBroad

PeakConsensus &Matrices

DESCan2 DiffBind CSAW

DifferentialEnrichment edgeR

Counts Normalization AffectsDifferentially Accessible Regions (DARs)

o Pay attention to the normalizationprocess

o One tryes to apply a classic RNA-Seqnormalization

o The process does not always give the same results

o Maybe some more specificnormalization is required for this kind of data

ATAC-seq dataset

16652

11982

79567505

6597 6530

4491

976 976654 599 523 514 344 282

0

5000

10000

15000

Inte

rsec

tion

Size

DiffBindBroad

CSAW

DiffBindNarrow

DEScan_Z10_K4_DARs

02000040000Set Size

ATAC−seq DARs

Comparing DARs across methods

o All the methods have the biggest

overlap on the detected peaks

o CSAW and DiffBind show a big amount

of not-overlapping regions

o DEScan2 shows the lowest number of

not-overlapping regions

o The big amount of not-overlapping

regions by CSAW and DiffBind suggests

a possible high-level of false positive

regions detected.

o Ad-hoc designed UpsetPlot on GRanges

o Based on findOverlaps method

Peaks contrasts on NULL dataset show no results

H3K9ac ChIP-seq dataset o Compared performances on a nulldataset of ChIP-seq H3K9ac samples

o Performed 126 permutations of samples

o Samples are randomly divided in two groups

o All the possible permutations on 9 samples (126)

o All the methods find mostly 0 Differential Enriched Peaks on the random conditions.

o Sometimes some differences have beenfound

o With and without normalization

0

2

4

6

8

DEScan2_NoNorm

DEScan2_Norm

DES2_M2_Broa_NoNorm

DES2_M2_Broa_Norm

DES2_M2_Narr_NoNorm

DES2_M2_Narr_Norm

DiffBind_Narr_Norm

DiffBind_Broad_Norm

method

nElem normalized

NOYES

What’s Next?On-going and future works

Some comparisons are still needed

o Compare CSAW on ChIP-seq

o Compare normalization methods with all epigenomics methods

o Explore in-silico biological functions of results

o Testing ATAC—seq Single Cell dataset

• Dr. Claudia Angelini – Istituto per le Applicazioni del Calcolo-CNR

• Dr. Davide Risso – Univeristy of Padua

• Dr. Lucia Peixoto – Elon S. Floyd College of Medicine, Washington State University

• Dr. Timothy Triche Jr. - Van Andel Research Institute

• Dr. Ben Johnson - Van Andel Research Institute

• Thank you for your Attention!

Acknowledgements

http://lists.moo.gs/mailman/listinfo/biocmeetup.naplesnapoli.r.bioc@gmail.com

https://www.facebook.com/pg/NapoliRBiocMeetup

Napoli R/Bioconductor Meetupo Since Nov 2018

o R Consortium Array Group

o At least 25 people any event with a goodturn-over of attendees

o Eight meetups until now

o R Package Creation

o scRNA-seq Analysis

o Differentially Methylated Regions Analysis

o Microscope Image Processing

o Chromosomal Copy Number ChangesDetection

o Bulk RNA-seq Differential Expression

o Hi-C data analysis using HiCeekR

o Metagenomics analysis workflow

Napoli R/Bioconductor Meetup

• Part of a wider idea

• Third city in the World

• Boston (USA)

• New York (USA)

• Napoli (IT)

• Useful to

• share ideas and workflows

• create new collaborations

• extend bioinfo community

Is there a best Aligner?Bowtie2 vs BWA

Comparing DARs across methods (2)

ATAC-seq dataset

73600

60794

24236

1612013994

58544376

2879242421221913185714571046722 703 466 410 351 350 327 241 137 100 93 89 9 5 5 3 20

20000

40000

60000

80000

Inte

rsec

tion

Size

DEScanMACSNarrow

DEScanMACSBroad

DiffBindMACSBroad

DEScanZ10K4

DiffBindMACSNarrow

0250005000075000100000125000Set Size

ATAC−seq Regions Nar/Broa & DEScan2 o Ad-hoc designed UpsetPlot on Granges

o Based on findOverlaps method

o Results description

Duplicates Removal doesn’t impact peak detection

o Diagonal Correlations on

counts matrices show that

there is no big differences

between duplicates and

no-duplicates samples

o rmDup with samtools

o DEScan2 counts matrices

o DiffBind counts matrices

−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

noDup_E0_1

noDup_E0_2

noDup_E0_3

noDup_E0_4

noDup_E1_1

noDup_E1_2

noDup_E1_3

noDup_E1_4

withDup_E0_1

withDup_E0_2

withDup_E0_3

withDup_E0_4

withDup_E1_1

withDup_E1_2

withDup_E1_3

withDup_E1_4

DEScan2DEScan2

−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

noDup_E0_1

noDup_E0_2

noDup_E0_3

noDup_E0_4

noDup_E1_1

noDup_E1_2

noDup_E1_3

noDup_E1_4

withDup_E0_1

withDup_E0_2

withDup_E0_3

withDup_E0_4

withDup_E1_1

withDup_E1_2

withDup_E1_3

withDup_E1_4

DiffBind

DiffBind

Dup_DEScan2 noDup_DEScan2 Dup_DiffBind noDup_DiffBind

Final Peaks with/without Duplicates

010000

20000

30000

40000

DEScan2 – Differential Enriched Scan 2

• Filter out the peaks with a score lower than a

user-defined threshold

• Aligns the peaks over user-defined number of

samples

• Different thresholds produce different trends

in number of final peaks detected

top related