methods for the detection of conservation of methylation ...marjoram...

Posted on 19-Aug-2020


Methods for the detection of conservation of methylation

(R21, year 2)

(or, how statisticians can make your life more complicated)

Let’s detect conservation!

Darryl Shibata

1. The phenotype of a cell is determined by its epigenome.

2. Develop methods to rank genes/pathways/regions according to their degree of conservation of methylation status.

3. Thus, we will identify the genes/pathways most important for the growth of an individual colorectal cancer.

4. (A small step towards) Personalized medicine!

That sounds great, but…

1. What exactly do you mean by conservation? How do we measure it?

2. How does your technology work?

3. What kind of error rates do you get using your technology (Illumina Infinium MethylationEPIC BeadChip Kit)?

4. You need this by when?!?

What does conservation mean?

Scherer, et al., Nucleic Acids Research, 2020, Vol. 48, No. 8


Pooled Data


Samples from the left and right sides of the tumor.

• Our preliminary data are pooled, rather than cell-level.
• Output = “beta values”.
• After much QC, we can think of the beta value for a given site as the “proportion of cells that are methylated at that site in that sample”.
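Beta values are conventionally computed from each probe's methylated (M) and unmethylated (U) signal intensities as β = M / (M + U + 100), where the offset of 100 (Illumina's usual default) keeps the ratio stable when both signals are low. A minimal sketch under that assumption, with hypothetical intensities and none of the QC steps:

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Estimated fraction methylated at one CpG site in one pooled sample.

    Assumes the common Illumina convention beta = M / (M + U + offset).
    """
    return methylated / (methylated + unmethylated + offset)


# Hypothetical (M, U) intensities for three CpG sites in one sample.
intensities = [(9000.0, 900.0), (500.0, 8000.0), (4000.0, 4100.0)]
betas = [beta_value(m, u) for m, u in intensities]
print([round(b, 3) for b in betas])  # prints [0.9, 0.058, 0.488]
```

Because of the offset, a site with almost no signal gets a beta near 0 rather than an unstable ratio, which is one reason low-intensity probes are usually removed during QC.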

Statistics to measure conservation

• Variance, by site or by region
• Manhattan distance
• Proportion of sites with extreme methylation proportions ([0,0.2] or [0.8,1])
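A minimal sketch of the three statistics for one region, given left-side and right-side beta values at the same sites. The slide does not pin down the exact definitions, so these are assumptions: "by site" is taken as the mean over sites of the between-sample variance, "by region" as the variance of the two samples' region-mean betas, and the extreme-proportion statistic pools both samples. Lower variance or distance suggests stronger conservation.

```python
LOW, HIGH = 0.2, 0.8  # a beta is "extreme" if it lies in [0, 0.2] or [0.8, 1]


def _var(xs):
    """Variance with divisor n."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def variance_by_site(left, right):
    """Assumed definition: average over sites of the left-vs-right variance."""
    return sum(_var([a, b]) for a, b in zip(left, right)) / len(left)


def variance_by_region(left, right):
    """Assumed definition: variance between the two samples' region-mean betas."""
    return _var([sum(left) / len(left), sum(right) / len(right)])


def manhattan_distance(left, right):
    """Sum over sites of |left - right|."""
    return sum(abs(a - b) for a, b in zip(left, right))


def extreme_proportion(betas):
    """Fraction of beta values in [0, 0.2] or [0.8, 1]."""
    return sum(1 for b in betas if b <= LOW or b >= HIGH) / len(betas)


left = [0.05, 0.10, 0.90, 0.50]   # hypothetical left-side betas for a 4-site region
right = [0.07, 0.12, 0.85, 0.30]  # right-side betas at the same sites
print(variance_by_site(left, right), variance_by_region(left, right),
      manhattan_distance(left, right), extreme_proportion(left + right))
```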


Overall strategy

• e.g., for a gene-based analysis:
  • Calculate the observed statistic value for each gene.
  • Rank genes according to those values.
  • Pick off the genes with the highest (or lowest, depending on the statistic) values as the most conserved.
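The ranking step amounts to a sort. In this sketch the gene names and statistic values are hypothetical, and the `lower_is_more_conserved` flag stands in for the highest/lowest choice, which depends on the statistic being used:

```python
def most_conserved(stat_by_gene, k, lower_is_more_conserved=True):
    """Return the k genes judged most conserved by their statistic value."""
    ranked = sorted(stat_by_gene.items(), key=lambda item: item[1],
                    reverse=not lower_is_more_conserved)
    return [gene for gene, _ in ranked[:k]]


# Hypothetical per-gene statistic values (e.g. a variance: lower = more conserved).
stats = {"GENE_A": 0.31, "GENE_B": 0.02, "GENE_C": 0.11, "GENE_D": 0.47}
print(most_conserved(stats, 2))                                 # prints ['GENE_B', 'GENE_C']
print(most_conserved(stats, 1, lower_is_more_conserved=False))  # prints ['GENE_D']
```

Ranking raw values like this ignores that genes with few CpG sites have noisier statistics; the bootstrapping step later in the talk addresses that by comparing each gene only with null genes of the same size.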


• Look for genes in which methylation is statistically significantly conserved across normal tissue.
• Belief: those genes are “important”.
• ‘Validate’: look at expression of those conserved genes in the Expression Atlas database. Is it high?
  • Colon
  • Small intestine
  • Endometrium

Proof of concept / QC

Variance of Statistic

• Suppose n sites in the region/gene.
• Proportion of sites with extreme methylation proportions ([0,0.2] or [0.8,1]):
  • Binomial distribution: variance is O(1/n).
• Variance:
  • Variance of variance (σ̂² = sample variance with divisor n, μ_k = k-th central moment):

$$\operatorname{Var}(\hat{\sigma}^2) = \frac{(n-1)^2\,\mu_4}{n^3} - \frac{(n-1)(n-3)\,\mu_2^2}{n^3} = O(1/n)$$
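Both O(1/n) claims can be checked numerically. This sketch evaluates Var(σ̂²) = ((n−1)²μ₄ − (n−1)(n−3)μ₂²)/n³ and compares it with a Monte Carlo estimate from standard normal draws (μ₂ = 1, μ₄ = 3, for which the formula reduces to 2(n − 1)/n²). The normal data and the sample sizes are illustrative assumptions, and the binomial helper assumes each site is extreme independently with a common probability p, which neighboring CpG sites need not satisfy.

```python
import random


def population_variance(xs):
    """Sample variance with divisor n (the sigma-hat^2 in the formula)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def var_of_variance(n, mu2, mu4):
    """Var(sigma-hat^2) = ((n-1)^2 mu4 - (n-1)(n-3) mu2^2) / n^3 for n i.i.d. values."""
    return ((n - 1) ** 2 * mu4 - (n - 1) * (n - 3) * mu2 ** 2) / n ** 3


def binomial_proportion_variance(n, p):
    """Var of the fraction of extreme sites if each of n sites is extreme independently with prob p."""
    return p * (1 - p) / n


def simulated_var_of_variance(n, reps, rng):
    """Monte Carlo Var(sigma-hat^2) using standard normal data (mu2 = 1, mu4 = 3)."""
    estimates = [population_variance([rng.gauss(0.0, 1.0) for _ in range(n)])
                 for _ in range(reps)]
    return population_variance(estimates)


rng = random.Random(2020)
for n in (10, 40, 160):
    exact = var_of_variance(n, mu2=1.0, mu4=3.0)
    simulated = simulated_var_of_variance(n, reps=4000, rng=rng)
    # n * exact settles near a constant (2 for normal data): the variance is O(1/n).
    print(n, round(exact, 5), round(simulated, 5), round(n * exact, 4),
          binomial_proportion_variance(n, 0.3))
```

Multiplying each variance by n gives a roughly constant value, which is what O(1/n) means in practice: a gene with four times as many sites has a statistic with about a quarter of the variance.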

Bootstrapping:

• For a gene with m CpG sites:
  • Construct a large number of ‘null genes’ that also have m CpG sites.
  • Calculate the statistic value for each null gene.
  • Rank the statistic value for the observed gene within the set of values for null genes to get a p-value.
• Do this for all genes of all lengths, and look at those with the lowest resulting p-values.
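A minimal sketch of the null-gene procedure. How MethCon5 actually builds its null genes is not stated here, so this version assumes each null gene is m CpG sites drawn at random, without replacement, from a genome-wide pool of (left, right) beta pairs, and it uses mean |left − right| as an illustrative statistic (lower = more conserved). The p-value uses the common add-one Monte Carlo form (1 + #{null ≤ observed}) / (1 + B).

```python
import random


def mean_abs_difference(sites):
    """Illustrative statistic: mean |left - right| beta difference (lower = more conserved)."""
    return sum(abs(left - right) for left, right in sites) / len(sites)


def null_gene_p_value(gene_sites, all_sites, n_null=999,
                      statistic=mean_abs_difference, rng=None):
    """Add-one Monte Carlo p-value for the observed gene against random 'null genes'.

    Assumption: each null gene is len(gene_sites) sites sampled without replacement
    from all_sites, a genome-wide list of (left_beta, right_beta) pairs.
    """
    if rng is None:
        rng = random.Random()
    observed = statistic(gene_sites)
    m = len(gene_sites)
    null_values = [statistic(rng.sample(all_sites, m)) for _ in range(n_null)]
    # Count null genes that are at least as conserved as the observed gene.
    at_least_as_conserved = sum(1 for value in null_values if value <= observed)
    return (1 + at_least_as_conserved) / (1 + n_null)


rng = random.Random(7)
all_sites = [(rng.random(), rng.random()) for _ in range(5000)]    # hypothetical pool
conserved_gene = [(b, b) for b in (0.10, 0.20, 0.90, 0.85, 0.50)]  # identical on both sides
print(null_gene_p_value(conserved_gene, all_sites, rng=rng))       # prints 0.001
```

A perfectly conserved gene reaches the smallest attainable p-value, 1/(B + 1), so the number of null genes B sets a floor on the p-values and needs to be large when many genes are tested.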


Software: MethCon5


JCO Clin Cancer Inform 4:100-107. © 2020 by American Society of Clinical Oncology

github.com/USCbiostats/MethCon5

More complex models?

• α_k: region-specific tendency to be methylated
• ρ_n: density of CpGs at the nth CpG
• β_k: region-specific dependence on density
• d_n: distance between the nth CpG and the (n−1)th CpG
• γ_k: region-specific correlation parameter
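Two of the quantities listed, d_n and ρ_n, can be computed directly from sorted CpG coordinates. The slide does not say how density is measured, so this sketch assumes, hypothetically, that ρ_n counts the CpGs (including the nth one) within ±window bp of the nth CpG; the default window of 500 bp is an arbitrary illustrative choice.

```python
from bisect import bisect_left, bisect_right


def cpg_spacing_and_density(positions, window=500):
    """Return (d, rho) for CpG positions on one chromosome.

    d[n]   = positions[n] - positions[n-1] after sorting (None for the first CpG)
    rho[n] = number of CpGs within +/- window bp of CpG n (hypothetical density definition)
    """
    pos = sorted(positions)
    d = [None] + [b - a for a, b in zip(pos, pos[1:])]
    rho = [bisect_right(pos, p + window) - bisect_left(pos, p - window) for p in pos]
    return d, rho


d, rho = cpg_spacing_and_density([100, 150, 400, 1000], window=100)
print(d)    # prints [None, 50, 250, 600]
print(rho)  # prints [2, 2, 1, 1]
```

Binary search keeps the density step at O(N log N) for N CpGs, which matters when the positions come from a whole array rather than a single region.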

Software: DNAMeda (Shiny App - Visualization)

(Figure panels: Islands, Shores, Non-island)

github.com/USCbiostats/DNAMeda

Software: mutational signatures


A Unifying Model to Test Difference?

Somatic mutations

pmsignature: estimated signature proportions for each tumor

Are these proportions different in the two groups?

e.g., Wilcoxon
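The naive comparison on this slide can be sketched as a per-signature Wilcoxon rank-sum (Mann-Whitney U) test on the estimated proportions. This standard-library version uses the normal approximation with a tie correction, and the proportions in the usage lines are hypothetical. It treats each estimated proportion as if it were known exactly, which is the limitation the "Uncertainty in Proportions" slide raises.

```python
import math


def wilcoxon_rank_sum(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate two-sided p-value). Both groups must be non-empty.
    """
    n1, n2 = len(x), len(y)
    total = n1 + n2
    # Pool the values, remembering which group (0 = x, 1 = y) each came from.
    values = sorted((v, g) for g, group in enumerate((x, y)) for v in group)
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < total:
        j = i
        while j + 1 < total and values[j + 1][0] == values[i][0]:
            j += 1
        avg_rank = (i + j + 2) / 2  # average of the 1-based ranks i+1 .. j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j + 1) if values[k][1] == 0)
        i = j + 1
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((total + 1) - tie_term / (total * (total - 1)))
    if var_u <= 0:  # every value tied: no evidence of a difference
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical proportions of one signature in tumors from two groups.
group1 = [0.62, 0.55, 0.70, 0.48, 0.66]
group2 = [0.31, 0.40, 0.52, 0.28, 0.35]
print(wilcoxon_rank_sum(group1, group2))
```

With K signatures this test would be run K times, once per signature column, so some multiple-testing adjustment would also be needed.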

Uncertainty in Proportions

HiLDA

HiLDA = “Hierarchical Latent Dirichlet Allocation”

Signature proportions for tumors in group 1 ~ Dirichlet(group-1 parameters)
Signature proportions for tumors in group 2 ~ Dirichlet(group-2 parameters)

github.com/USCbiostats/HiLDA
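To make the hierarchical structure concrete, the sketch below only simulates from it: each tumor's signature proportions are drawn from its group's Dirichlet distribution, using the standard construction of normalizing independent Gamma(α_k, 1) draws. The Dirichlet parameters and group sizes are hypothetical, and this is not HiLDA's estimation or testing procedure.

```python
import random


def sample_dirichlet(alpha, rng):
    """One probability vector from Dirichlet(alpha), via normalized Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_two_groups(alpha_group1, alpha_group2, n_per_group, rng):
    """Signature proportions for tumors in two groups, each group with its own Dirichlet parameters."""
    group1 = [sample_dirichlet(alpha_group1, rng) for _ in range(n_per_group)]
    group2 = [sample_dirichlet(alpha_group2, rng) for _ in range(n_per_group)]
    return group1, group2


rng = random.Random(16)
# Hypothetical parameters for K = 3 signatures.
g1, g2 = simulate_two_groups([2.0, 3.0, 5.0], [5.0, 3.0, 2.0], n_per_group=2000, rng=rng)
mean1 = [sum(t[k] for t in g1) / len(g1) for k in range(3)]
mean2 = [sum(t[k] for t in g2) / len(g2) for k in range(3)]
print([round(m, 2) for m in mean1], [round(m, 2) for m in mean2])
```

With 2,000 tumors per group, the group means should land near α/Σα, i.e. about (0.2, 0.3, 0.5) and (0.5, 0.3, 0.2). Asking whether the two groups differ then becomes a question about the group-level parameters rather than about individual noisy proportion estimates.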

Software: iMutSig


iMutSig - Shiny app.


Acknowledgements


USC: Emil Hvitfeldt, Kim Siegmund, Darryl Shibata, Zhi Yang

JH: Hari Easwaran, Tom Pisanic

And many thanks to ITCR
