Introduction to epigenetics: chromatin modifications, DNA methylation and the CpG Island landscape (part 2) Héctor Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Page 1: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Introduction to epigenetics: chromatin modifications, DNA methylation and the CpG Island landscape (part 2)

Héctor Corrada Bravo

CMSC858P Spring 2012

(many slides courtesy of Rafael Irizarry)

Page 2: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

How do we measure DNA methylation?

Page 3: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Microarray Data

Page 4: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

One question…

• Where do we measure?

• At least 7 arrays are needed to measure the entire genome

• CpGs are depleted

• Remaining CpGs cluster
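
One common way to quantify CpG depletion and clustering is the observed-to-expected CpG ratio of a sequence. The sketch below is illustrative only: the helper name `cpg_obs_exp` and the toy sequences are assumptions, and the formula (number of CpGs times sequence length, divided by number of Cs times number of Gs) is the commonly used ratio, which the slides themselves do not state.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio for a DNA string (hypothetical helper)."""
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if n_c == 0 or n_g == 0:
        return 0.0
    return n_cpg * len(seq) / (n_c * n_g)


# Toy sequences (assumptions): one CpG-rich, one with the same base
# composition but only a single CpG.
island_like = "CGCGCGCG"
depleted = "CCCCGGGG"
print(cpg_obs_exp(island_like), cpg_obs_exp(depleted))
```

A ratio well below 1 means fewer CpGs than the base composition alone would predict, which is what "depleted" refers to; regions where the ratio stays high are candidates for CpG islands.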

Page 5: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

CpG Islands

Page 6: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

But variation seen outside

Page 7: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

McrBC

No Methylation

Cuts at AmCG or GmCG

Input

Page 8: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

McrBC

Methylation

Page 9: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

McrBC after gel

Methylation

Page 10: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

McrBC after gel

Methylation

Page 11: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Now unmethylated

No Methylation

Page 12: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

McrBC after gel

No Methylation

Page 13: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)
Page 14: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)
Page 15: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)
Page 16: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Gene Expression Normalization does not work well here

Page 17: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

We use control probes
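
The slides do not say how the control probes enter the normalization, so the following is only a sketch of one plausible approach: shift each array so that its control probes are centered at zero. It assumes (this is not stated in the source) that control probes are probes expected to carry no methylation signal; the function name `normalize_with_controls` and its arguments are hypothetical.

```python
from statistics import median


def normalize_with_controls(log_ratios, is_control):
    """Center one array's log-ratios on the median of its control probes.

    Assumption: control probes should show no methylation signal, so their
    median log-ratio estimates the array-specific offset.
    """
    control_values = [v for v, c in zip(log_ratios, is_control) if c]
    if not control_values:
        raise ValueError("need at least one control probe")
    offset = median(control_values)
    return [v - offset for v in log_ratios]


# Toy array (made-up numbers): three control probes and one probe of interest.
normalized = normalize_with_controls([1.0, 2.0, 3.0, 5.0], [True, True, True, False])
print(normalized)
```

Unlike quantile-style normalization borrowed from gene expression, this does not force the overall distributions of two arrays to match, which matters when samples genuinely differ in total methylation.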

Page 18: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

There are also waves

Page 19: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Smoothing

Page 20: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

McrBC on tiling two-channel array

We smooth

Page 21: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Proportion of neighboring CpG also methylated/not methylated

Page 22: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

True signal (simulated)

Page 23: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Observed data

Page 24: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Observed data and true signal

Page 25: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

What is methylated (above 50%)?

Page 26: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Naïve approach

Page 27: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Many false positives (FP)

Page 28: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Smooth

Page 29: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

No FP, but one false negative

Page 30: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Smooth less? No FN, lots of FP

Page 31: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

We prefer this!
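
The trade-off walked through in the preceding slides (thresholding raw probe values versus thresholding smoothed values) can be reproduced on toy data. This is a minimal sketch, not the lecture's simulation: the signal, the deterministic "noise" pattern, the window size, and the helper names `moving_average` and `count_errors` are all assumptions chosen so the example is reproducible.

```python
def moving_average(values, half_width):
    """Centered moving average; windows are truncated at the array ends."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def count_errors(estimate, truth, threshold=0.5):
    """Return (false positives, false negatives) for calls above the threshold."""
    fp = sum(1 for e, t in zip(estimate, truth) if e > threshold and t <= threshold)
    fn = sum(1 for e, t in zip(estimate, truth) if e <= threshold and t > threshold)
    return fp, fn


# Toy "true" signal: a methylated block (0.7) flanked by unmethylated probes (0.3).
truth = [0.3] * 20 + [0.7] * 10 + [0.3] * 20

# Deterministic stand-in for measurement noise, so the example is reproducible.
noise = [0.25 if i % 4 == 0 else -0.25 if i % 4 == 2 else 0.0
         for i in range(len(truth))]
observed = [t + n for t, n in zip(truth, noise)]

naive_errors = count_errors(observed, truth)
smooth_errors = count_errors(moving_average(observed, 2), truth)
print("naive (FP, FN):", naive_errors)
print("smoothed (FP, FN):", smooth_errors)
```

Changing `half_width` shows the same tension as the slides: a wider window suppresses isolated noisy probes but blurs the edges of the methylated block, while a narrower window does the opposite.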

Page 32: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

CHARM

DMR for three tissues (five replicates)

Irizarry et al., Nature Genetics, 2009

Page 33: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Some findings

[Irizarry et al., 2009, Nat. Genetics]

Page 34: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Tissue easily distinguished

Page 35: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Cancer DMR

Page 36: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Many Regions like this

Note: hypo and hyper methylation

Page 37: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Both hyper and hypo methylated

Page 38: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Cancer and Tissue DMRs coincide

Page 39: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

DMR enriched in Shores

Page 40: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Still affects expression

T-DMRs

Page 41: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Still affects expression

C-DMRs

Page 42: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

USING SEQUENCING (BS-SEQ)

Page 43: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

[Figure: the double-stranded sequence TTCGATTACGA / AAGCTAATGCT drawn twice with CH3 (methyl) marks; panels labeled Liver and Brain]

Page 44: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

[Figure: repeated copies of the double-stranded sequence TTCGATTACGA / AAGCTAATGCT, each with CH3 (methyl) marks]

85% Methylation

chr3:44,031,616-44,031,626

Page 45: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Bisulfite Treatment

Page 46: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Bisulfite Treatment

GGGGAGCAGCATGGAGGAGCCTTCGGCTGACT

GGGGAGCAGTATGGAGGAGTTTTCGGTTGATT
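
The two sequences on this slide can be reproduced with a few lines of code. The sketch below assumes the usual reading of bisulfite treatment: unmethylated C is read as T, methylated C stays C. The function name `bisulfite_convert` is hypothetical, and the methylated positions are not given in the slide; they are inferred here by comparing the two sequences.

```python
def bisulfite_convert(seq, methylated_positions):
    """Read every C as T unless its 0-based index is marked as methylated."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


original = "GGGGAGCAGCATGGAGGAGCCTTCGGCTGACT"  # first sequence on the slide
treated = "GGGGAGCAGTATGGAGGAGTTTTCGGTTGATT"   # second sequence on the slide

# Positions where a C survives treatment are taken to be methylated (inferred).
protected = {i for i, (a, b) in enumerate(zip(original, treated))
             if a == "C" and b == "C"}

print(sorted(protected))
print(bisulfite_convert(original, protected) == treated)
```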

Page 47: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

BS-seq

GTCGTAGTATTTGTCT GTCGTAGTATTTGTNN TGTCGTAGTATCTGTC TATGTCGTAGTATTTG TATATCGTAGTATTTT TATATCGTAGTATTTG NATATCGTAGTATNTG TTTTATATCGCAGTAT ATATTTTATGTCGTA ATATTTTATCTCGTA ATATTTTATGTCGTA GA-TATTTTATGTCGT

GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTAC

GTTCAATATT

Coverage: 13

Methylation Evidence: 13

Methylation Percentage: 100%

Page 48: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

BS-seq

GTCGTAGTATTTGTCT GTCGTAGTATTTGTNN TGTCGTAGTATCTGTC TATGTCGTAGTATTTG TATATTGTAGTATTTT TATATCGTAGTATTTG NATATTGTAGTATNTG TTTTATATTGCAGTAT ATATTTTATGTCGTA ATATTTTATCTTGTA ATATTTTATGTCGTA GA-TATTTTATGTCGT

GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTAC

GTTCAATATT

Coverage: 13

Methylation Evidence: 9

Methylation Percentage: 69%

Page 49: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

BS-seq

GTCGTAGTATTTGTCT GTCGTAGTATTTGTNN TGTTGTAGTATCTGTC TATGTTGTAGTATTTG TATATTGTAGTATTTT TATATTGTAGTATTTG NATATTGTAGTATNTG TTTTATATTGCAGTAT ATATTTTATGTCGTA ATATTTTATCTTGTA ATATTTTATGTTGTA GA-TATTTTATGTCGT

GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTTCGTCTGGGGGGTATGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATCCTATTATTTATCGCACCTAC

GTTCAATATT

Coverage: 13

Methylation Evidence: 4

Methylation Percentage: 31%
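
The coverage, evidence, and percentage figures on the three BS-seq slides follow from a simple count at one cytosine: reads that still show C count as methylation evidence, reads that show T count as converted. A minimal sketch, with the hypothetical helper `methylation_percentage` and base calls written as a plain string (the alignment columns themselves are not recoverable from the transcript):

```python
def methylation_percentage(base_calls):
    """Percent of informative reads (C or T) that still show C at a cytosine."""
    methylated = base_calls.count("C")    # C survived bisulfite: methylation evidence
    unmethylated = base_calls.count("T")  # C read as T: converted, so unmethylated
    coverage = methylated + unmethylated
    if coverage == 0:
        return None  # no informative reads at this position
    return round(100 * methylated / coverage)


# Coverage 13 with 13, 9, and 4 reads showing C, as on the three slides.
for evidence in (13, 9, 4):
    calls = "C" * evidence + "T" * (13 - evidence)
    print(evidence, methylation_percentage(calls))
```

Calls other than C or T (for example N) are ignored here, which is one simple choice; a real pipeline would also have to account for base quality and sequencing error.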

Page 50: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

BS-seq

• Alignment is much trickier:
  – Naïve strategy: do nothing, hope there are not many CpGs in a single read
  – Smarter strategy: “bisulfite convert” reference: turn all Cs to Ts
    • Also needs to be done on reverse complement reference and reads
  – Smartest strategy: be unbiased and try all combinations of methylated/un-methylated CpGs in each read
    • Computationally expensive (see Hansen et al, 2011, for a strategy)
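
The "bisulfite convert the reference" strategy can be sketched with exact string matching standing in for a real aligner. Everything here is a toy under stated assumptions: `find_bisulfite_hits` is a hypothetical helper, it only finds exact matches, and the reference and read are taken from the bisulfite treatment slide earlier in the deck.

```python
def c_to_t(seq):
    """In-silico bisulfite conversion: turn every C into T."""
    return seq.upper().replace("C", "T")


def reverse_complement(seq):
    """Reverse complement of a DNA string over A, C, G, T, N."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_bisulfite_hits(reference, read):
    """Exact-match search of a converted read against both converted strands."""
    hits = []
    converted_read = c_to_t(read)
    for strand, ref in (("+", reference), ("-", reverse_complement(reference))):
        converted_ref = c_to_t(ref)
        start = converted_ref.find(converted_read)
        while start != -1:
            hits.append((strand, start))
            start = converted_ref.find(converted_read, start + 1)
    return hits


reference = "GGGGAGCAGCATGGAGGAGCCTTCGGCTGACT"
read = "CAGTATGGAG"  # a bisulfite-treated fragment; one C has been read as T

print(reference.find(read))                  # direct search on the raw reference
print(find_bisulfite_hits(reference, read))  # search after converting both
```

Converting both read and reference removes the mismatches caused by unmethylated Cs, at the cost of a less specific three-letter alphabet; the methylation calls are then made by going back to the original read bases at each matched position.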

Page 51: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

BS-seq

• There are similarities to SNP calling (we’ll see this in a couple of weeks)

• EXCEPT: we want to measure percentages
  – Use a binomial model to estimate p, percentage of methylation
  – Allow for sequencing errors, coverage differences, etc.
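
Under the binomial model, the number of reads showing methylation at a site is Binomial(n, p) with n the coverage, and the natural estimate of p is the observed fraction. The sketch below adds a Wilson score interval to show how coverage affects uncertainty. It is an illustration only: the helper name `methylation_estimate` is hypothetical, the Wilson interval is one standard choice that the slides do not name, and sequencing error is not modeled.

```python
import math


def methylation_estimate(methylated, coverage, z=1.96):
    """Binomial estimate of p with a Wilson score interval (z=1.96 for ~95%)."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    p_hat = methylated / coverage
    denom = 1 + z**2 / coverage
    center = (p_hat + z**2 / (2 * coverage)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / coverage + z**2 / (4 * coverage**2)
    ) / denom
    return p_hat, center - half, center + half


# Same observed fraction at two coverages (toy numbers): 9/13 and 90/130.
low_cov = methylation_estimate(9, 13)
high_cov = methylation_estimate(90, 130)
print(low_cov)
print(high_cov)
```

Both sites give the same point estimate, but the interval at coverage 13 is much wider, which is why coverage differences have to be carried through any downstream comparison.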

Page 52: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Measuring DNA Methylation

• Estimating percentages

• Use “local-likelihood” method
  – Based on loess

(Plot courtesy of Kasper Hansen)
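
To give a feel for the idea, the sketch below pools methylated and total read counts from nearby CpGs with tricube kernel weights, the weight function used by loess. This is a deliberately simplified, local-constant stand-in and not the local-likelihood method referred to on the slide; the function names, the bandwidth, and the toy counts are all assumptions.

```python
def tricube(u):
    """Tricube kernel weight; zero outside |u| < 1."""
    u = abs(u)
    return (1 - u**3) ** 3 if u < 1 else 0.0


def local_methylation(positions, meth_counts, coverages, x0, bandwidth):
    """Kernel-weighted binomial estimate of methylation near position x0."""
    num = 0.0
    den = 0.0
    for pos, m, n in zip(positions, meth_counts, coverages):
        w = tricube((pos - x0) / bandwidth)
        num += w * m  # weighted methylated reads
        den += w * n  # weighted total reads
    return num / den if den > 0 else None


# Toy CpG positions with methylated-read counts out of 10 reads each.
positions = [100, 150, 200, 260, 300]
meth = [9, 8, 10, 1, 0]
cov = [10, 10, 10, 10, 10]

print(local_methylation(positions, meth, cov, x0=150, bandwidth=100))
print(local_methylation(positions, meth, cov, x0=300, bandwidth=100))
```

Because read counts rather than per-site percentages are pooled, sites with higher coverage contribute more to the local estimate, which is the main benefit of a likelihood-based smoother over smoothing raw percentages.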

Page 53: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

BS-seq

Lister et al. 2009, Nature

Page 54: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Gene Expression Regulation: DNA methylation in promoter regions

Lister et al. 2009, Nature

Page 55: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

DNA methylation patterns within genomic regions

Lister et al. 2009

Page 56: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Putting it together

Page 57: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)
Page 58: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

What were we after?

• The epigenetic progenitor origin of human cancer
  • [Feinberg, et al., Nature Reviews Genetics, 2006]
• Stochastic epigenetic variation as driving force of disease
  • [Feinberg & Irizarry, PNAS, 2009]
• Phenotypic variation, perhaps epigenetically mediated, increases disease susceptibility
• Increased epigenetic and gene expression variability of specific genes/regions is a defining characteristic of cancer

Page 59: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

What did we do?

• Custom Illumina methylation microarray

• Confirmed increased epigenetic variability in specific regions across five cancer types

Page 61: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

What did we do?

• Custom Illumina methylation microarray

• Confirmed increased epigenetic variability in specific regions across five cancer types

• Confirmed same sites are involved in tissue differentiation
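
"Increased variability" can be made concrete by comparing, site by site, the spread of methylation values across cancer samples with the spread across normal samples. The slides do not say which statistic was used, so this is only a sketch with a plain variance ratio; the helper name `variance_ratio` and the numbers are made up for illustration.

```python
from statistics import variance


def variance_ratio(cancer_values, normal_values):
    """Ratio of sample variances (cancer / normal) at one site."""
    normal_var = variance(normal_values)
    if normal_var == 0:
        return float("inf")
    return variance(cancer_values) / normal_var


# Made-up methylation fractions at one site, five samples per group.
normal = [0.50, 0.52, 0.48, 0.51, 0.49]
cancer = [0.20, 0.80, 0.45, 0.65, 0.30]

print(variance_ratio(cancer, normal))
```

A ratio well above 1 flags a site where cancer samples disagree with each other far more than normal samples do, even if the group means are similar.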

Page 62: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

What did we do?

• Custom Illumina methylation microarray

• Whole genome sequencing of bisulfite treated DNA
  – Found large blocks of hypo-methylation (sometimes Mbps long) in colon cancer

Page 63: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

What did we do?

• Custom Illumina methylation microarray

• Whole genome sequencing of bisulfite treated DNA
  – Found large blocks of hypo-methylation (sometimes Mbps long) in colon cancer
  – These regions coincide with hyper-variable regions across cancer types

Page 64: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

What did we do?

• Custom Illumina methylation microarray
• Whole genome sequencing of bisulfite treated DNA
• Gene Expression Analysis

Page 65: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Gene Expression Data

Page 66: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Gene Expression Data

When using multiple microarray experiments, proper normalization is key

[McCall, et al., Biostatistics 2010]

Page 67: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Normalization is key

• fRMA: a single-chip normalization procedure
• GNUSE: a single-chip quality metric
• Barcode: a single-chip common-scale measurement

Page 68: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

What did we do?

• Custom Illumina methylation microarray
• Whole genome sequencing of bisulfite treated DNA
• Gene Expression Analysis

– Genes with hyper-variable gene expression in colon cancer are enriched in hypo-methylation blocks

[Corrada Bravo, et al., under review]

Page 69: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

What are we doing next?

• Custom Illumina methylation microarray
• Whole genome sequencing of bisulfite treated DNA
• Gene Expression Analysis

– Genes with hyper-variable gene expression in colon cancer are enriched in hypo-methylation blocks

Page 70: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Bigger gene expression study

• 7,741 HGU133plus2 samples
• 598 normal tissue samples, 4,886 tumor samples
• 176 different tissue types
• 175 different GEO studies

Page 71: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

Bigger gene expression study

[Corrada Bravo, et al., under review]

Page 72: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

What are we doing next?

• Custom Illumina methylation microarray
• Whole genome sequencing of bisulfite treated DNA
• Gene Expression Analysis

– Genes with hyper-variable gene expression in colon cancer are enriched in hypo-methylation blocks

– Tissue-specific genes have hyper-variable gene expression across cancer types

[Corrada Bravo, et al., under review]

Page 73: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

[Corrada Bravo, et al., under review]

Page 74: Héctor  Corrada Bravo CMSC858P Spring 2012 (many slides courtesy of Rafael Irizarry)

[Corrada Bravo, et al., under review]