epigenome roadmap ge-mvt2016-amb.slideshare

Epigenomics Roadmap where the road has led Uppsala, 20160208 Álvaro Martínez Barrio, PhD [email protected] linkedin.com/in/ambarrio @ambarrio

Upload: barrioam

Post on 15-Apr-2017


Category:

Health & Medicine



TRANSCRIPT

  • Epigenomics Roadmap where the road has led

    Uppsala, 20160208

    Álvaro Martínez Barrio, PhD [email protected] linkedin.com/in/ambarrio @ambarrio

  • Today

    10:15-11:00 Epigenomics Roadmap 1st part (C8:321)
    11:00-11:15 Leg stretcher
    11:15-12:00 Epigenomics Roadmap 2nd part (C8:321)
    12:00-13:15 Lunch
    13:15-14:45 Computer lab Epigenomics Roadmap (A6:001)
    14:45-15:00 Wrap up Epigenomics Roadmap
    15:15-16:00 SciLifeLab/The Svedberg (A1:111a); Human and great apes genome diversity and evolution of DNA methylation by Tomas Marques-Bonet, Barcelona

  • About me

    Álvaro Martínez Barrio, PhD [email protected] linkedin.com/in/ambarrio @ambarrio

    PhD Bioinformatics 2010. Postdoc Pop. Genetics/Comp. Biol. 2014, L. Andersson + H. Ronne

    Bioinformatics Scientist @scilifelab

  • Objectives

    Understand the importance of an international research effort in epigenomics

    Know the highlights of the Epigenomics Roadmap project

    Extract some practical cases where the roadmap may help medical research

    Get practical experience using the Epigenomics Roadmap data

  • Very short knowledge review

    epigenetic, adjective
    1 Biology: resulting from external rather than genetic influences: epigenetic carcinogens. Relating to or of the nature of epigenesis.
    2 Geology: formed later than the surrounding or underlying rock formation.
    DERIVATIVES: epigenetically (adverb), epigenetics (plural noun)

  • Waddington's "Epigenetic Landscape" during differentiation

    (C.H Waddington, 1957)

  • Leveling Waddington: the emergence of cell reprogramming and transdifferentiation

    Ladewig J, Koch P & Brüstle O (2013) Nature Reviews Molecular Cell Biology 14, 225-236

  • After Waddington's "Epigenetic Landscape" (C.H Waddington, 1957)

  • (C.H Waddington, 1957)

  • Epigenome affects gene expression

    Figure labels: CpG; High, Medium, Low

    Ball M et al. Nature Biotechnology 27:361, 2009

  • Epigenetics

    Upon/on/above/over/around the genome. External or environmental factors can switch on/off the cellular machinery

    With(out) underlying changes in DNA sequence

    Changes in the regulation of gene expression that can be passed on to a cell's progeny

    Changes can be reversed

  • Nature vs Nurture (in DNA methylation)

    Study of 648 twins: 97 MZ pairs, 162 DZ pairs and 130 singletons.

    Genotyping and HumanMeth450 chip. Variation in DNA methylation is highly heritable (37%).

    Common environment explains 2% of variation. Remaining variation is due to non-shared environment and stochastic factors.

    Grundberg E et al. (2013) AJHG 93:876
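    The variance split on the twin slide can be checked with a few lines of arithmetic. The sketch below is only illustrative: the 37% and 2% shares come from the slide, while the twin correlations `r_mz` and `r_dz` are hypothetical values chosen so that Falconer's textbook formulas reproduce those shares. It is not the model fitted by Grundberg et al., and all function names are made up for this example.

```python
def remaining_share(heritable: float, common_env: float) -> float:
    """Share of variance left for non-shared environment and stochastic factors."""
    rest = 1.0 - heritable - common_env
    if heritable < 0 or common_env < 0 or rest < 0:
        raise ValueError("variance shares must be non-negative and sum to at most 1")
    return rest


def falconer_ace(r_mz: float, r_dz: float) -> tuple:
    """Textbook Falconer estimates from MZ and DZ twin correlations.

    A (additive genetic)  = 2 * (r_mz - r_dz)
    C (common environment) = 2 * r_dz - r_mz
    E (non-shared)         = 1 - r_mz
    """
    a = 2.0 * (r_mz - r_dz)
    c = 2.0 * r_dz - r_mz
    e = 1.0 - r_mz
    return a, c, e


if __name__ == "__main__":
    # Shares quoted on the slide.
    print(f"non-shared + stochastic: {remaining_share(0.37, 0.02):.2f}")

    # Hypothetical correlations, picked only to match the slide's shares.
    a, c, e = falconer_ace(r_mz=0.39, r_dz=0.205)
    print(f"A={a:.2f} C={c:.2f} E={e:.2f}")
```

    The point of the second function is the design of a twin study: MZ pairs share all of their genome and DZ pairs on average half, so the gap between the two correlations is what isolates the heritable part.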

  • By NIH, Public Domain

  • Higher-Order Chromatin Structure (www.annualreviews.org, p. 61)

    Panel labels: (a) 11-nm fiber (primary level); (b) nucleosome stacking (folded 11-nm fiber with zigzag linker DNA); (c) 30-nm fiber (secondary level); (d) loops of 30-nm fiber (tertiary level); (e) interdigitating layers of irregularly organized nucleosomes (tertiary level); (f) organization of whole chromosomes inside the nucleus (quaternary level).

    Other labels: odd-numbered nucleosome, even-numbered nucleosome, plane of nucleosome layers, DNA, protein scaffold, chromatin loop, metaphase chromosome, nucleus.

    Figure 1. Different levels of chromatin compaction. (a) Multiple nucleosomes in a row form the 11-nm fiber that is the primary level of chromatin compaction. Alternating nucleosomes are depicted with blue and green surfaces. (b) The 11-nm fiber folds on itself to form two stacks/columns of nucleosomes such that odd-numbered nucleosomes interact with other odd-numbered nucleosomes and even-numbered nucleosomes interact with other even-numbered nucleosomes. The linker DNA zigzags between the two nucleosome stacks. (c) The folded 11-nm fiber forms a two-start helix to produce the 30-nm chromatin fiber that is the secondary level of compaction. (d) The 30-nm fiber twists further and forms a more compact fiber that is arranged in loops (blue), with some portions attached to a protein scaffold (red). This is one of the tertiary levels of compaction. (e) The 30-nm fiber may also result in the formation of interdigitating layers of irregularly oriented nucleosomes, particularly in metaphase chromosomes. Note that these plates do contain nucleosome fibers, but it is unclear whether they are 30-nm fibers or another type. Regardless, this is another tertiary level of compaction. (f) The quaternary level refers to the three-dimensional organization of entire chromosomes inside the nucleus and their relationships with one another as well as with the inner nuclear membrane. The black lines on the pink chromosome represent planes of nucleosome layers as viewed from above.

    Quaternary structure of chromatin: the 3D positioning of chromatin domains relative to one another and to the nuclear lamina inside the nucleus

    in metaphase chromosomes (8, 9, 26) (Figure 1e). These, too, are considered to represent the tertiary level of chromatin packaging.

    The quaternary structure of chromatin refers to the actual positioning of the chromosomes with respect to one another in the nucleus and with respect to the lamina of the inner nuclear membrane (Figure 1f). It is known that expression of a gene is affected by its three-dimensional (3D) position within the nucleus, with the general consensus being that transcriptionally active genomic regions are further away from the nuclear periphery than those that are silent (80). The former

    Annu. Rev. Genom. Human Genet. 2012.13:59-82. Downloaded from www.annualreviews.org. Access provided by University of Uppsala on 11/26/15. For personal use only.


    Sajan S.A. and Hawkins R.D., Annu. Rev. Genomics Hum. Genet. 2012


    DamID: A method for mapping the distribution of chromatin-associated proteins by fusing a protein of interest with E. coli DNA adenine methyltransferase (Dam), which methylates adenines proximal to the binding sites of a protein, thus circumventing the need for antibodies.

    Giemsa band: Also known as a G-band. A characteristic banding pattern is obtained by treating chromosomes with Giemsa stain. The intensity of Giemsa staining is correlated with genomic features. For instance, dark Giemsa bands usually are AT rich, have low gene density and have higher densities of repeat elements.

    Polycomb body: A discrete nuclear focus containing Polycomb proteins and their silenced target genes. Polycomb bodies have been observed in D. melanogaster and human cells by imaging and in situ hybridization.

    H3K9me2 and lamina-associated domains. The nuclear lamina is thought to bind and silence large regions of heterochromatin. Two studies that analysed distinct genomic features identified similar sets of domains enriched for H3K9 methylation and lamina contact [96,97]. Guelen et al. globally mapped the interaction between the genome and nuclear lamina in human fibroblasts using DamID. These authors observed two discrete chromatin environments: lamina-associated domains (LADs) and regions outside LADs. Both regions were approximately 0.1-10 Mb in size. LADs were found to have low gene density, low transcriptional activity and a paucity of active chromatin modifications. Although the nuclear lamina had previously been associated with inactivity, for the first time, these studies defined the locations and extents of LADs and the correlated chromatin patterns. Remarkably, tethering experiments show that interaction with the nuclear lamina is not only correlative but is also causal in reducing gene expression [98-100].

    Wen et al. identified a similar set of genomic domains by analysing genome-wide maps of H3K9me2 in differentiated and undifferentiated cells [97]. They found large and diffuse regions of K9 methylation that cover up to 4.9 Mb and collectively represent up to 46% of the genome, which they termed large organized chromatin K modifications (LOCKs). These investigators also showed that LOCKs are conserved between human and mouse, and that the H3K9me2 mark was dependent on the G9A H3K9 methyltransferase. Furthermore, a close relationship between LOCKs and LADs was indicated by a striking overlap of 82% between placental LOCKs and LADs found in fibroblasts. Thus, genomic regions diffusely marked by H3K9 methylation seem to be in contact with the nuclear lamina; these findings have prompted a model in which chromatin is partitioned into distinct environments in different cell types. It was initially proposed that LOCKs are relatively scarce in ES cells, as few such chromatin domains could be detected. However, whether this reflects a true distinction in modification patterns between cell types or a detection bias has been questioned [101]. The nature of these compartments remains an area of active investigation, as these structures could play a crucial part in sequestering unused regions of the genome, and thereby reducing the effective search space for gene regulatory machinery.

    H3K27me3 blocks and Polycomb bodies. Genome-wide histone modification maps have also revealed large blocks of H3K27me3 in differentiated cells. Identification of these domains relied on new algorithms for identifying broad regions rather than sharp peaks of enrichment, as two recent studies illustrate. Pauler et al. used an algorithm called broad local enrichments (BLOCs) to identify H3K27me3 blocks that are on average 43 kb and overlap silent genes and intergenic regions [102]. They found this pattern in numerous ChIP-chip and ChIP-seq data sets, and suggest that this is a common feature of H3K27me3 in differentiated cell types. The authors speculate that these H3K27me3 blocks may relate to Giemsa bands, as they observe alternating chromatin patterns along chromosomes. Hawkins et al. used ChromaBlocks to find similar H3K27me3 blocks in human IMR90 fibroblasts and characterized their dynamics during differentiation [74]. This study suggested that these repressive domains are often seeded in ES cells and expand in differentiated cell types, apparently to confer cell type-specific repression (FIG. 4d). As these domains have only recently been observed, little is known about their establishment or functional consequences. It is tempting to consider the possibility that, like H3K9me2 domains, H3K27me3 blocks mark distinct nuclear structures or regions. They potentially correspond to Polycomb bodies, which are discrete foci of silenced genes that have been observed by imaging and in situ hybridization in fly and human cells [103]. Although there are no data yet that directly link H3K27me3 blocks to these structures, there is indirect evidence of H3K27me association with compacted chromatin; H3K27me3 can promote recruitment of PRC1 (REF. 6), and PRC1 may be required

    Figure 5 | Histone modification signatures associated with features in the mammalian cell nucleus. Signature histone modifications correlate with various nuclear features, although the relationships might be indirect. Chromatin with modifications generally associated with active transcription (green dots) often replicates early, whereas chromatin with generally repressive modifications (purple dots) replicates late. Regions enriched for some sets of active modifications (blue dots) may converge into transcription factories (TRFs). Blocks of histone H3 lysine 27 trimethylation (H3K27me3; red dots) may form Polycomb bodies (Pc) and diffuse domains marked by H3K9me2 or H3K9me3 (purple dots) may contact the nuclear lamina.

    Nature Reviews Genetics, Volume 12, January 2011, p. 15. © 2011 Macmillan Publishers Limited. All rights reserved


    CpG island: A genomic region enriched for CpG dinucleotides that often occurs near constitutively active promoters. Mammalian genomes are otherwise depleted of CpGs owing to the preferential deamination of methylated cytosines.

    developments that have punctuated the shift from a gene-centric to genome-wide view. Then we discuss our current knowledge of primary chromatin structure, focusing on the global patterns, functions and dynamics of histone modifications that overlay sequence features such as promoters, enhancers and gene bodies. Finally, we will discuss notable recent studies that illuminate the link between histone modifications and higher-order chromatin domains.

    From gene-centric to genome-wide

    For the past several decades, chromatin biology has been guided by a succession of methods for probing features such as chromatin accessibility; DNA methylation; the location, composition and turnover of nucleosomes; and the patterns of post-translational histone modifications. Technological advances in microarrays and next-generation sequencing have enabled many of these assays to be scaled genome-wide. Notable examples include: the DNase I-seq [9,10], FAIRE-seq [11] and Sono-seq [12] assays for chromatin accessibility; whole-genome and reduced-representation bisulphite sequencing (BS-seq) [13,14] and MeDIP-seq [15] assays for DNA methylation; and the MNase-seq [16,17] and CATCH-IT [18] assays for elucidating nucleosome position and turnover, respectively. These technologies and their integration have been extensively reviewed elsewhere [19,20]. In this section, we focus on histone modifications and, in particular, on how genome-wide ChIP-seq-mapping studies have enhanced our understanding of the chromatin landscape.

    Mapping histone modifications genome-wide. Although ChIP has been used since 1988 (REF. 21) to probe chromatin structure at individual loci, its combination with microarrays and, more recently, next-generation sequencing has provided far more precise and comprehensive views of histone modification landscapes, which have highlighted roles for chromatin structures across diverse genomic features and elements that were not appreciated in targeted studies. The basis of ChIP is the immunoprecipitation step, in which an antibody is used to enrich chromatin that carries a histone modification (or other epitope) of interest. In ChIP-seq, next-generation technology is used to deep sequence the immunoprecipitated DNA molecules and thereby produce digital maps of ChIP enrichment (BOX 1). An example is the comprehensive work by Keji Zhao's group to profile 39 different histone methylation and acetylation marks genome-wide in human CD4+ T cells [22,23]. These maps and similar data sets [24-26] have associated particular modifications with gene activation or repression and with various genomic features, including promoters, transcribed regions, enhancers and insulators (FIG. 2). These and subsequent studies highlight the value of comprehensive and less-biased sequencing approaches for testing the generality of insights gleaned through gene-specific studies, as well as for identifying altogether new associations and biological phenomena.

    Integrating ChIP-seq maps. The expanding body of chromatin data in the public domain has fostered many computational efforts that aim to integrate different data types, identify novel relationships among histone modifications and related chromatin structures, and develop new hypotheses regarding the regulatory functions of these chromatin features. Integration of histone modification maps with chromatin accessibility, nucleosome positions, transcription factor binding, RNA expression and sequence-based genome annotations is providing increasingly unified views of chromatin structure and function [17,19,27].

    Two recent studies have presented innovative approaches for integrating genome-wide chromatin maps [28,29], both of which were demonstrated on a compendium of ChIP-seq data for human CD4+ T cells [22,23].

    Figure 1 | Layers of chromatin organization in the mammalian cell nucleus. Broadly, features at different levels of chromatin organization are generally associated with inactive (off) or active (on) transcription. From the top, genomic DNA is methylated (Me) on cytosine bases in specific contexts and is packaged into nucleosomes, which vary in histone composition and histone modifications (for example, histone H3 lysine 9 trimethylation (H3K9me3)); these features constitute the primary layer of chromatin structure. Here, different histone modifications are indicated by coloured dots and histone variants such as H2A.Z are brown. DNA in chromatin may remain accessible to DNA-binding proteins such as transcription factors (TFs) and RNA polymerase II (RNAPII) or may be further compacted. Chromatin can also organize into higher-order structures such as nuclear lamina-associated domains and transcription factories. Each layer of organization reflects aspects of gene and genome regulation.

    Nature Reviews Genetics, Volume 12, January 2011, p. 8. www.nature.com/reviews/genetics. © 2011 Macmillan Publishers Limited. All rights reserved

    Ong C-T and Corces V.G., Nature Reviews Genetics 2014
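    The CpG island glossary entry above describes regions "enriched for CpG dinucleotides" against a genome that is otherwise CpG-depleted. One common way to quantify that enrichment is the observed/expected CpG ratio, (number of CpG x length) / (number of C x number of G). The sketch below assumes the widely quoted thresholds of at least 200 bp, GC content above 50% and a ratio above 0.6; these thresholds and all function names are assumptions for illustration and do not come from the slides.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC content and observed/expected CpG ratio of a DNA string."""
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so a plain count is exact
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (c * g) / n.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_content": gc_content, "obs_exp": obs_exp}


def looks_like_cpg_island(seq: str, min_length: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the assumed length, GC-content and obs/exp thresholds."""
    stats = cpg_stats(seq)
    return (stats["length"] >= min_length
            and stats["gc_content"] > min_gc
            and stats["obs_exp"] > min_ratio)


if __name__ == "__main__":
    # Toy sequences, not real genomic data.
    for name, seq in [("CG repeat", "CG" * 100), ("AT repeat", "AT" * 100)]:
        print(name, cpg_stats(seq), looks_like_cpg_island(seq))
```

    Real island callers scan a sliding window along a chromosome; this version scores a single sequence only, to keep the formula visible.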

  • http://www.nature.com/epigenomeroadmap


  • A bit of recent history

    2012, Nature published mod/ENCODE (pilot launched by US NHGRI in 2003-2007), which aims to describe all the functional elements encoded in the human genome by mapping epigenetic modifications. Pioneer effort but clinically limited.

    2014, Roadmap Epigenomics Project, US NIH initiative. Material: stem cells, mature cells from various tissues from healthy and diseased donors (cancer, neurodegenerative, autoimmune, ...)

  • Epigenome changes during disease

    Previous techniques: HumanMeth450k chip, MeDIP-Seq, MBDCap-Seq. Current techniques: whole-genome bisulphite sequencing (WGB-Seq). A causal link between epigenetic changes and disease has so far been hard to establish.

    2014, Roadmap Epigenomics Project (Nature): consistent alteration in the epigenetic landscape could identify candidate genes and pathways to further follow up

    2014, Roadmap Epigenomics Project (Nature): time-course studies of the epigenetics of a cell type relevant to a specific disease could indicate whether epigenetic changes have a role in disease progression, or only in its onset

    Epigenomic maps should help to navigate poorly understood regions of the genome
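    For the whole-genome bisulphite sequencing mentioned on this slide, the basic readout is simple: bisulphite treatment converts unmethylated cytosines so that they are read as T, while methylated cytosines are still read as C. The methylation level at a site is therefore the fraction of C reads among all C and T reads covering it. The sketch below assumes per-site read counts are already available; the function name, the toy counts and the minimum coverage of 5 reads are hypothetical choices, not values from the slides or from any specific pipeline.

```python
from typing import Optional


def methylation_level(c_reads: int, t_reads: int,
                      min_coverage: int = 5) -> Optional[float]:
    """Fraction of reads supporting methylation at one cytosine.

    Returns None when coverage is below `min_coverage`, because a ratio
    built from very few reads is too noisy to interpret.
    """
    if c_reads < 0 or t_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = c_reads + t_reads
    if total < min_coverage:
        return None
    return c_reads / total


if __name__ == "__main__":
    # Toy (C reads, T reads) pairs for three hypothetical CpG sites.
    for c_reads, t_reads in [(8, 2), (0, 10), (1, 1)]:
        print(c_reads, t_reads, methylation_level(c_reads, t_reads))
```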

  • Epigenome in cancer

    Cancer: the disease of the genome, BUT linked most unambiguously to epigenetic aberrations

    Epigenomic organization affects the genomic location of the mutations that provoke cancer

    Epigenome of a cancer cell carries a fingerprint of the cell type that originated the cancer

  • 8 tracks; 21 Nature Publishing Group articles; 58 additional research articles with NIH Roadmap Epigenomics funding

  • http://www.nature.com/epigenomeroadmap


  • http://www.nature.com/epigenomeroadmap


  • What about Europe?

    http://www.epigenome.org/index.php
    https://en.wikipedia.org/wiki/Human_Epigenome_Project
    http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.0000082

  • What about Europe?

    http://www.nature.com/news/2011/110928/full/477518a.html
    http://www.blueprint-epigenome.eu/

  • http://ihec-epigenomes.org/

    http://ihec-epigenomes.org/outcomes/datasets/

  • ARTICLE OPENdoi:10.1038/nature14248

    Integrative analysis of 111 referencehuman epigenomesRoadmap Epigenomics Consortium{, Anshul Kundaje1,2,3*, Wouter Meuleman1,2*, Jason Ernst1,2,4*, Misha Bilenky5*,Angela Yen1,2, Alireza Heravi-Moussavi5, Pouya Kheradpour1,2, Zhizhuo Zhang1,2, Jianrong Wang1,2, Michael J. Ziller2,6,Viren Amin7, John W. Whitaker8, Matthew D. Schultz9, Lucas D. Ward1,2, Abhishek Sarkar1,2, Gerald Quon1,2,Richard S. Sandstrom10, Matthew L. Eaton1,2, Yi-Chieh Wu1,2, Andreas R. Pfenning1,2, Xinchen Wang1,2,11, Melina Claussnitzer1,2,Yaping Liu1,2, Cristian Coarfa7, R. Alan Harris7, Noam Shoresh2, Charles B. Epstein2, Elizabeta Gjoneska2,12, Danny Leung8,13,Wei Xie8,13, R. David Hawkins8,13, Ryan Lister9, Chibo Hong14, Philippe Gascard15, Andrew J. Mungall5, Richard Moore5,Eric Chuah5, Angela Tam5, Theresa K. Canfield10, R. Scott Hansen16, Rajinder Kaul16, Peter J. Sabo10, Mukul S. Bansal1,2,17,Annaick Carles18, Jesse R. Dixon8,13, Kai-How Farh2, Soheil Feizi1,2, Rosa Karlic19, Ah-Ram Kim1,2, Ashwinikumar Kulkarni20,Daofeng Li21, Rebecca Lowdon21, GiNell Elliott21, Tim R. Mercer22, Shane J. Neph10, Vitor Onuchic7, Paz Polak2,23,Nisha Rajagopal8,13, Pradipta Ray20, Richard C. Sallari1,2, Kyle T. Siebenthall10, Nicholas A. Sinnott-Armstrong1,2,Michael Stevens21,42, Robert E. Thurman10, Jie Wu24,25, Bo Zhang21, Xin Zhou21, Arthur E. Beaudet26, Laurie A. Boyer11,Philip L. De Jager2,23,27, Peggy J. Farnham28, Susan J. Fisher29, David Haussler30, Steven J. M. Jones5,31,32, Wei Li33,Marco A. Marra5,32, Michael T. McManus34, Shamil Sunyaev2,23,27, James A. Thomson35,41, Thea D. Tlsty15, Li-Huei Tsai2,12,Wei Wang8, Robert A. Waterland36, Michael Q. Zhang20,37, Lisa H. Chadwick38, Bradley E. Bernstein2,39,401,Joseph F. Costello141, Joseph R. Ecker91, Martin Hirst5,181, Alexander Meissner2,61, Aleksandar Milosavljevic71, Bing Ren8,131,John A. Stamatoyannopoulos101, Ting Wang211 & Manolis Kellis1,21

    The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

    While the primary sequence of the human genome is largely preserved in all human cell types, the epigenomic landscape of each cell can vary considerably, contributing to distinct gene expression programs and biological functions [1–4]. Epigenomic information, such as covalent histone modifications, DNA accessibility and DNA methylation can be interrogated in each cell and tissue type using high-throughput molecular assays [2,5–8]. The resulting maps have been instrumental for annotating cis-regulatory elements and other non-exonic genomic features with characteristic epigenomic signatures [9,10], and for dissecting gene regulatory programs in development and disease [7,9,11–14]. Despite these technological advances, we still lack a systematic understanding of how the epigenomic landscape contributes to cellular circuitry, lineage specification, and the onset and progression of human disease.

    To facilitate and spearhead these efforts, the NIH Roadmap Epigenomics Program was established with the goal of elucidating how epigenetic processes contribute to human biology and disease. One of the major components of this programme consists of the Reference Epigenome Mapping Centers (REMCs) [15], which systematically characterized the epigenomic landscapes of representative primary human tissues and cells. We used a diversity of assays, including chromatin immunoprecipitation (ChIP) [9,10,16,17], DNA digestion by DNase I (DNase) [7,18], bisulfite treatment [1,2,19,20], methylated DNA immunoprecipitation (MeDIP) [21], methylation-sensitive restriction enzyme digestion (MRE) [22], and RNA profiling [8], each followed by massively parallel short-read sequencing (-seq). The resulting data sets were assembled into publicly accessible websites and databases, which serve as a broadly useful resource for the scientific and biomedical community. Here we report the integrative analysis of 111 reference epigenomes (Fig. 1 and Extended Data Fig. 1a–d), which we analyse jointly with an additional 16 epigenomes previously reported by the Encyclopedia of DNA Elements (ENCODE) project [9,23].

    We integrate information about histone marks, DNA methylation, DNA accessibility and RNA expression to infer high-resolution maps of regulatory elements annotated jointly across a total of 127 reference epigenomes spanning diverse cell and tissue types. We use these annotations to recognize epigenome differences that arise during lineage specification and cellular differentiation, to recognize modules of regulatory regions with coordinated activity across cell types, and to identify key regulators of these modules based on motif enrichments and regulator

    A special issue: nature.com/epigenomeroadmap

    Nature EPIGENOME ROADMAP

    †Lists of participants and their affiliations appear at the end of the paper. *These authors contributed equally to this work. §These authors jointly supervised this work.

    A list of affiliations appears at the end of the paper.

    19 FEBRUARY 2015 | VOL 518 | NATURE | 317

    © 2015 Macmillan Publishers Limited. All rights reserved

    4 Reference Epigenome Mapping Centers (REMCs)

    Roadmap Epigenomics Consortium (2015) Nature 518, 317–330. doi:10.1038/nature14248

  • Roadmap Epigenomics Consortium (2015) Nature 518, 317–330. doi:10.1038/nature14248

    http://www.roadmapepigenomics.org/participants


  • 4 Reference Epigenome Mapping Centers (REMCs)

    111 reference human epigenomes from primary cells and tissues

    Roadmap Epigenomics Consortium (2015) Nature 518, 317–330. doi:10.1038/nature14248

  • We computed several quality control measures (Fig. 2 and Supplementary Table 1) including the number of distinct uniquely mapped reads; the fraction of mapped reads overlapping areas of enrichment [18,36]; genome-wide strand cross-correlation [37] (Fig. 2e–g); inter-replicate correlation; multidimensional scaling of data sets from different production centres (Supplementary Fig. 1); correlation across pairs of data sets (Extended Data Fig. 1e); consistency between assays carried out in multiple mapping centres (Supplementary Table 2); read mapping quality for bisulfite-treated reads [38,39]; and agreement with imputed data [40]. Outlier data sets were flagged, removed or replaced, and lower-coverage data sets were combined where possible (see Methods).
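    One of the quality measures listed above, strand cross-correlation, can be illustrated with a toy sketch: read starts on the plus strand are correlated with read starts on the minus strand at increasing shifts, and a good ChIP-seq data set peaks near the fragment length. The code below is a pure-Python illustration on made-up counts; the function names and data are hypothetical, and it is not the consortium's pipeline.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def strand_cross_correlation(plus, minus, max_shift):
    """Correlate plus-strand counts with minus-strand counts moved left by each shift."""
    n = len(plus)
    return {k: pearson(plus[: n - k], minus[k:]) for k in range(max_shift + 1)}


def make_toy_track(length, positions):
    """Build a per-base count track with one read start at each given position."""
    track = [0] * length
    for p in positions:
        track[p] += 1
    return track


# Toy data (made up): minus-strand read starts sit 30 bp downstream of plus-strand starts.
plus_starts = [20, 60, 61, 100, 150]
fragment_offset = 30
plus = make_toy_track(200, plus_starts)
minus = make_toy_track(200, [p + fragment_offset for p in plus_starts])

profile = strand_cross_correlation(plus, minus, max_shift=60)
best_shift = max(profile, key=profile.get)
print(best_shift, round(profile[best_shift], 3))
```

    In real data the profile is computed genome-wide, and the normalized score (NSC) reported in Fig. 2 divides the peak value by the background (minimum) cross-correlation.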

    The resulting data sets provide global views of the epigenomic landscape in a wide range of human cell and tissue types (Fig. 3), including the largest and most diverse collection to date of chromatin state annotations (Fig. 3a); some of the deepest surveys of individual cell types using diverse epigenomic assays (with 21–31 distinct epigenomic marks for seven deeply profiled epigenomes; Fig. 3b); and some of the broadest surveys of individual epigenomic marks across multiple cell types (Fig. 3c). These data sets enable genome-wide epigenomic analyses across multiple dimensions (Fig. 3d). All data sets, standards and protocols are publicly available from web portals, linked from the main consortium homepage http://www.roadmapepigenomics.org, and also at http://compbio.mit.edu/roadmap.

    Chromatin states, DNA methylation and DNA accessibility

    As a foundation for integrative analysis, we used a common set of combinatorial chromatin states [41] across all 111 epigenomes, plus 16 additional epigenomes generated by the ENCODE project (127 epigenomes in total), using the core set of five histone modification marks that were common to all. We trained a 15-state model (Fig. 4a, b and Supplementary Table 3a) consisting of 8 active states and 7 repressed states (Fig. 4c) that were recurrently recovered (Extended Data Fig. 2a), and showed distinct levels of DNA methylation (Fig. 4d), DNA accessibility (Fig. 4e), regulator binding (Extended Data Fig. 2b and Supplementary Fig. 2) and evolutionary conservation (Fig. 4f and Supplementary Fig. 3). The active states (associated with expressed genes) consist of active transcription start site (TSS) proximal promoter states (TssA, TssAFlnk), a transcribed state at the 5′ and 3′ end of genes showing both promoter and enhancer signatures (TxFlnk), actively transcribed states (Tx, TxWk), enhancer states (Enh, EnhG), and a state associated with zinc finger protein genes (ZNF/Rpts). The inactive states consist of constitutive heterochromatin (Het), bivalent regulatory states (TssBiv, BivFlnk, EnhBiv), repressed Polycomb states (ReprPC, ReprPCWk), and a quiescent state (Quies), which covered on average 68% of each reference epigenome. Enhancer and promoter states covered approximately 5% of each reference epigenome on average, and showed enrichment for evolutionarily conserved non-exonic regions [42].
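    The coverage figures quoted above (for example, Quies covering on average 68% of an epigenome) are fractions of genomic bases assigned to each state. A minimal sketch of that calculation follows, assuming segments are given as (chromosome, start, end, state) tuples with BED-style half-open coordinates. The tuple layout, the function name and the toy segments are assumptions for illustration; only the state names and their active/inactive grouping come from the text.

```python
from collections import defaultdict

# State mnemonics of the 15-state model, grouped as in the paragraph above.
ACTIVE_STATES = {"TssA", "TssAFlnk", "TxFlnk", "Tx", "TxWk", "EnhG", "Enh", "ZNF/Rpts"}
INACTIVE_STATES = {"Het", "TssBiv", "BivFlnk", "EnhBiv", "ReprPC", "ReprPCWk", "Quies"}


def state_coverage(segments):
    """Return the fraction of annotated bases assigned to each chromatin state."""
    bases = defaultdict(int)
    for _chrom, start, end, state in segments:
        bases[state] += end - start  # half-open interval length
    total = sum(bases.values())
    return {state: n / total for state, n in bases.items()}


# Made-up segmentation of a 10 kb region, for illustration only.
toy_segments = [
    ("chr1", 0, 200, "TssA"),
    ("chr1", 200, 1000, "Tx"),
    ("chr1", 1000, 1400, "Enh"),
    ("chr1", 1400, 10000, "Quies"),
]

coverage = state_coverage(toy_segments)
active_fraction = sum(f for s, f in coverage.items() if s in ACTIVE_STATES)
print(coverage["Quies"], active_fraction)
```

    Applied to a real segmentation file, the same function would be called once per epigenome and the fractions averaged across epigenomes.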

    To capture the greater complexity afforded by additional marks, we trained additional chromatin state models in subsets of cell types. In the subset of 98 reference epigenomes that also included H3K27ac data, we also learned an 18-state model (Extended Data Fig. 2c and Supplementary Table 3b), enabling us to distinguish enhancer states containing strong H3K27ac signal (EnhA1, EnhA2), which showed higher DNA

    [Figure 2 (image not reproduced): matrix of the 127 reference epigenomes (rows) against available data sets (columns). Row annotations: sample type (ES cell derived, primary cell, primary tissue, primary culture, cell line), cell type/tissue group, EID and epigenome name. Columns and data set counts: H3K4me1, H3K4me3, H3K36me3, H3K27me3 and H3K9me3 (127 each), H3K27ac (98), H3K9ac (62), DNase-seq (53), DNA methylation (95), gene expression (78), additional marks (184), chromatin states (127). Legend: WGBS (n = 37), RRBS (n = 51), mCRF (n = 16); RNA-seq (n = 56), microarray (n = 22); highest-quality epigenomes (n = 60; ChromHMM model trained and applied), remaining epigenomes (n = 67; ChromHMM model only applied).]

    Figure 2 | Data sets available for each reference epigenome. List of 127 epigenomes including 111 by the Roadmap Epigenomics program (E001–E113) and 16 by ENCODE (E114–E129). See Supplementary Table 1 for a full list of names and quality scores. a–d, Tissue and cell types grouped by type of biological material (a), anatomical location (b), reference epigenome identifier (EID, c) and abbreviated name (d). PB, peripheral blood. ENCODE 2012 reference epigenomes are shown separately. e–g, Normalized strand cross-correlation quality scores (NSC) [37] for the core set of five histone marks (e), additional acetylation marks (f) and DNase-seq (g). h, Methylation data by WGBS (red), RRBS (blue) and mCRF (green). A total of 104 methylation data sets available in 95 distinct reference epigenomes. i, Gene expression data using RNA-seq (brown) and microarray expression (yellow). j, A total of 26 epigenomes contain 184 additional histone modification marks. k, Sixty highest-quality epigenomes (purple) were used for training the core chromatin state model, which was then applied to the full set of epigenomes (purple and orange).


    4 Reference Epigenome Mapping Centers (REMCs)

    111 reference human epigenomes from primary cells and tissues

    2,805 datasets, including 1,821 histone modification, 360 DNA accessibility, 277 DNA methylation, and 166 RNA-seq datasets

    150.21 billion (uniquely) mapped sequencing reads

    3,174x coverage of hg19

    Roadmap Epigenomics Consortium (2015) Nature 518, 317–330. doi:10.1038/nature14248
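    The headline numbers on this slide can be sanity-checked with a few lines of arithmetic. The sketch below sums the four listed assay categories and derives a rough implied number of mapped bases per read from the read count and fold coverage. The genome size of 3.1 Gb used for hg19 is an assumption of this sketch, not a figure from the slide, so the last result is only a back-of-envelope estimate.

```python
# Counts as stated on the slide.
listed_datasets = {
    "histone modifications": 1821,
    "DNA accessibility": 360,
    "DNA methylation": 277,
    "RNA-seq": 166,
}
TOTAL_DATASETS = 2805
MAPPED_READS = 150.21e9
FOLD_COVERAGE = 3174

# Assumption (not from the slide): approximate size of the hg19 assembly in bp.
ASSUMED_GENOME_BP = 3.1e9

listed_total = sum(listed_datasets.values())
other_datasets = TOTAL_DATASETS - listed_total

# Fold coverage = mapped bases / genome size, so bases per read follows directly.
implied_bases_per_read = FOLD_COVERAGE * ASSUMED_GENOME_BP / MAPPED_READS

print(listed_total, other_datasets, round(implied_bases_per_read, 1))
```

    The four listed categories do not add up to 2,805, which is consistent with the total also counting data set types that the slide does not itemize.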

  • 1,936 datasets, 111 epigenomes (+16 ENCODE)

    Core set: H3K4me3, H3K4me1, H3K36me3, H3K27me3, H3K9me3 (Fig 2e)

    Additional acetylation marks: H3K27ac, H3K9ac (Fig 2f)

    Chromatin accessibility: DNase-seq (Fig 2g)

    Methylation data: WGBS (red), RRBS (blue) and mCRF (green) (Fig 2h)

    Gene expression: RNA-seq (brown) and microarray expression (yellow) (Fig 2i)

    Deep set: 16 histone modification marks (on average) across 7 cell types (Fig 2j)

    Fig 2k: Sixty highest-quality epigenomes (purple) were used for training the core chromatin state model, which was then applied to the full set of epigenomes (purple and orange).

    Roadmap Epigenomics Consortium (2015) Nature 518, 317–330. doi:10.1038/nature14248

  • Epigenome Classes as defined by the Roadmap

    http://www.ncbi.nlm.nih.gov/books/NBK45786/#epi_help_doc.About_Data_Sources

    Class 1 Epigenomes: DNA methylation (whole genome bisulfite sequencing); core histone modifications and an expanded set of histone modifications; RNA sequencing data (RNA-seq); Chromatin accessibility

    Class 2 Epigenomes: DNA methylation (whole genome bisulfite sequencing); core histone modifications; RNA sequencing data (RNA-seq); Chromatin accessibility

    Class 3 Epigenomes: DNA methylation (RRBS, MeDIP-seq, MRE-seq); core histone modifications; RNA sequencing data (gene expression microarray); Chromatin accessibility (if possible)

    Class 4 Epigenomes: DNA methylation (RRBS, MeDIP-seq, MRE-seq); core histone modifications; RNA sequencing data (gene expression microarray)
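    The four classes above amount to a small decision rule over which assays are available for an epigenome. The sketch below encodes one possible reading of that rule. The `Assays` record, its field names and the function are hypothetical, and treating the presence of chromatin accessibility as the only difference between Class 3 and Class 4 is an interpretation of the slide, not an official definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assays:
    """Which assay types are available for one epigenome (hypothetical record)."""
    wgbs: bool = False               # whole genome bisulfite sequencing
    other_methylation: bool = False  # RRBS, MeDIP-seq or MRE-seq
    core_histone: bool = False
    expanded_histone: bool = False
    rna_seq: bool = False
    expression_array: bool = False
    accessibility: bool = False


def epigenome_class(a: Assays) -> Optional[int]:
    """Return 1-4 following the slide's class descriptions, or None if nothing fits."""
    if not a.core_histone:
        return None  # every class requires the core histone modifications
    if a.wgbs and a.rna_seq and a.accessibility:
        return 1 if a.expanded_histone else 2
    if a.other_methylation and a.expression_array:
        return 3 if a.accessibility else 4
    return None


deep = Assays(wgbs=True, core_histone=True, expanded_histone=True,
              rna_seq=True, accessibility=True)
basic = Assays(other_methylation=True, core_histone=True, expression_array=True)
print(epigenome_class(deep), epigenome_class(basic))
```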


  • Epigenomic information across tissues and marks

    Roadmap Epigenomics Consortium (2015) Nature 518, 317–330. doi:10.1038/nature14248

    accessibility (Extended Data Fig. 3a), lower methylation (Extended Data Fig. 3b) and higher transcription factor binding (Extended Data Fig. 2c) than enhancers lacking H3K27ac. In a subset of 7 epigenomes with an average of 24 epigenomic marks, we learned separate 50-state chromatin state models based on all the available histone marks and DNA accessibility in each epigenome (Supplementary Fig. 4), which additionally distinguished: a DNase state with distinct transcription factor binding enrichments (Supplementary Fig. 4f), including for mediator/cohesin components [43] (even though CTCF was not included as an input track to learn the model) and repressor NRSF; transcribed states showing H3K79me1 and H3K79me2 and associated with the 5′ ends of genes and introns; and a large number of putative regulatory and neighbouring regions showing diverse acetylation marks even in the absence of the H3K4 methylation signatures characteristic of enhancer and promoter regions.

    We used chromatin states to study the relationship between histone modification patterns, RNA expression levels, DNA methylation and DNA accessibility. Consistent with previous studies [19,23,44,45], we found low DNA methylation and high accessibility in promoter states, high DNA methylation and low accessibility in transcribed states, and intermediate DNA methylation and accessibility in enhancer states (Fig. 4d, e and Extended Data Fig. 3a, b). These differences in methylation level were stronger for higher-expression genes than for lower-expression genes, leading to a more pronounced DNA methylation profile (Extended Data Fig. 3c, Supplementary Fig. 5 and Supplementary Table 4f). Genes proximal to H3K27ac-marked enhancers show significantly higher expression levels (Extended Data Fig. 3d), and conversely, higher-expression genes were significantly more likely to be neighbouring H3K27ac-containing enhancers (Extended Data Fig. 3e).

    Chromatin states sometimes captured differences in RNA expression that are missed by DNA methylation or accessibility. For example, TxFlnk, Enh, TssBiv and BivFlnk states show similar distributions of DNA accessibility but widely differing enrichments for expressed genes (Fig. 4c, d). Enh and ReprPC states show intermediate DNA methylation, but very different distributions of DNA accessibility and different enrichments for expressed genes (Fig. 4c–e). Lack of DNA methylation, typically associated with de-repression, is associated with both the active TssA promoter state and the bivalent TssBiv and BivFlnk states. Bivalent states TssBiv and BivFlnk also show overall lower DNA methylation and higher DNA accessibility than enhancer states Enh and EnhG, and binding by both activating and repressive regulatory factors (Extended Data Fig. 2b). These results also held for alternative methylation measurement platforms (Extended Data Fig. 4a–c), and for the 18-state chromatin state model (Extended Data Fig. 4d, e). Overall, these results highlight the complex relationship between DNA methylation, DNA accessibility and RNA transcription and the value of interpreting DNA methylation and DNA accessibility in the context of integrated chromatin states that better distinguish active and repressed regions.
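    The comparisons in the two paragraphs above boil down to summarizing a per-CpG signal, such as methylation level, within each chromatin state. A minimal sketch of that grouping is shown below. The methylation values are invented to mimic the reported pattern (low in promoter states, high in transcribed states, intermediate in enhancer states), and the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Made-up (state, methylation fraction) pairs, one per CpG, for illustration only.
toy_cpgs = [
    ("TssA", 0.05), ("TssA", 0.10),
    ("Tx", 0.90), ("Tx", 0.85),
    ("Enh", 0.40), ("Enh", 0.55),
]


def mean_methylation_by_state(cpgs):
    """Average the methylation fraction of all CpGs falling in each chromatin state."""
    by_state = defaultdict(list)
    for state, fraction in cpgs:
        by_state[state].append(fraction)
    return {state: mean(values) for state, values in by_state.items()}


summary = mean_methylation_by_state(toy_cpgs)
for state in sorted(summary, key=summary.get):
    print(state, round(summary[state], 3))
```

    The same grouping works for DNA accessibility or expression, which is how states with similar accessibility but different methylation can be told apart.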

    Given the intermediate methylation levels of tissue-specific enhancer regions, we directly annotated intermediate methylation regions, based on 25 complementary DNA methylation assays of MeDIP [31,46] and MRE-seq [22,39] from 9 reference epigenomes [47]. This resulted in more than 18,000 intermediate methylation regions, showing 57% CpG methylation on average, that are strongly enriched in genes, enhancer chromatin states (EnhBiv, EnhG, Enh) and evolutionarily conserved regions. Intermediate methylation was associated with intermediate levels of active histone modifications and DNase I hypersensitivity. Near TSSs, intermediate methylation correlated with intermediate gene expression, and in exons it was associated with an intermediate level of exon inclusion [47]. Intermediate methylation signatures were equally strong within tissue samples, peripheral blood and purified cell types, suggesting that intermediate methylation is not simply reflecting differential methylation between cell types, but probably reflects a stable state of cell-to-cell variability within a population of cells of the same type.

    Epigenomic differences during lineage specification

    We next studied the relationship between DNA methylation dynamics and histone modifications across 95 epigenomes with methylation data, extending previous studies that focused on individual lineages [19,48–50]. We found that the distribution of methylation levels for CpGs in some chromatin states varied significantly across tissue and cell type (Fig. 4g, Extended Data Fig. 4f and Supplementary Table 4a). For example, TssAFlnk states were largely unmethylated in terminally differentiated cells and tissues, but frequently methylated for several pluripotent and embryonic-stem-cell-derived cells (Bonferroni-corrected F-test P < 0.01);

    [Figure 3 panels. a: chromatin state annotations in 127 reference epigenomes (rows labelled by epigenome ID, E017, E002, E008, ... E129), grouped as IMR90, ES cell, iPSC, ES-deriv., Blood & T cell, HSC & B cell, Mesench., Epithelial, Brain, Muscle, Heart, Sm. musc., Digestive, Other, ENCODE2012. b: 33 data sets in IMR90 fetal lung fibroblasts (genome-wide measurements for all marks): RefSeq genes, chromatin states, RNA-seq, H3K36me3, H4K20me1, H3K79me2, H3K79me1, H3K9me1, DNase, DGF, Input, H3K4me3, H3K9ac, H3K56ac, H2A.Z, H2AK9ac, H2BK5ac, H3K4me2, H3K18ac, H3K4me1, H3K27ac, H4K5ac, H4K8ac, H3K4ac, H3K14ac, H3K23ac, H2AK5ac, H4K91ac, H2BK120ac, H2BK12ac, H2BK15ac, H2BK20ac, H3K27me3, H3K9me3, WGBS, Hi-C. c: individual mark data sets across epigenomes (H3K4me1, DNase, WGBS, H3K4me3, RNA-seq). d: relationship of panels. Genes in the displayed region: FAM205B, ATP8B5P, SIT1, NPR2, RECK, RNF38, MELK, PAX5, GRHPR, FRMPD1, SHB, ALDH1B1.]

    Figure 3 | Epigenomic information across tissues and marks. a, Chromatin state annotations across 127 reference epigenomes (rows, Fig. 2) in a ~3.5-Mb region on chromosome 9. Promoters are primarily constitutive (red vertical lines), while enhancers are highly dynamic (dispersed yellow regions). b, Signal tracks for IMR90 showing RNA-seq, a total of 28 histone modification marks, whole-genome bisulfite DNA methylation, DNA accessibility, digital genomic footprints (DGF), input DNA and chromatin conformation information72. c, Individual epigenomic marks across all epigenomes in which they are available. d, Relationship of figure panels highlights data set dimensions.
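
The contrast in panel a (constitutive promoters, dynamic enhancers) can be expressed as a simple per-bin statistic: the fraction of epigenomes in which a genomic bin carries a given chromatin state. The sketch below is a minimal illustration on a toy matrix; the function name, the 4 x 3 toy annotation and the use of `Quies` as a filler state are assumptions for this example, not data from the paper.

```python
def state_fraction(states_by_epigenome, state):
    """Per genomic bin, the fraction of epigenomes annotated with `state`."""
    n_epigenomes = len(states_by_epigenome)
    n_bins = len(states_by_epigenome[0])
    fractions = []
    for b in range(n_bins):
        hits = sum(1 for row in states_by_epigenome if row[b] == state)
        fractions.append(hits / n_epigenomes)
    return fractions


# Toy annotation: 4 hypothetical epigenomes (rows) x 3 genomic bins (columns).
toy = [
    ["TssA", "Enh", "Quies"],
    ["TssA", "Quies", "Quies"],
    ["TssA", "Quies", "Enh"],
    ["TssA", "Quies", "Quies"],
]

promoter_fraction = state_fraction(toy, "TssA")  # bin 0 is TssA everywhere
enhancer_fraction = state_fraction(toy, "Enh")   # Enh appears only sporadically
```

A bin with a fraction near 1 behaves like the red vertical lines in the figure, while low but non-zero fractions spread over many bins correspond to the dispersed yellow enhancer regions.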

    RESEARCH ARTICLE

    320 | NATURE | VOL 518 | 19 FEBRUARY 2015

    © 2015 Macmillan Publishers Limited. All rights reserved

  • Epigenomic information across tissues and marks

    Roadmap Epigenomics Consortium (2015) Nature 518, 317–330 doi:10.1038/nature14248

    accessibility (Extended Data Fig. 3a), lower methylation (Extended Data Fig. 3b) and higher transcription factor binding (Extended Data Fig. 2c) than enhancers lacking H3K27ac. In a subset of 7 epigenomes with an average of 24 epigenomic marks, we learned separate 50-state chromatin state models based on all the available histone marks and DNA accessibility in each epigenome (Supplementary Fig. 4), which additionally distinguished: a DNase state with distinct transcription factor binding enrichments (Supplementary Fig. 4f), including for mediator/cohesin components43 (even though CTCF was not included as an input track to learn the model) and repressor NRSF; transcribed states showing H3K79me1 and H3K79me2 and associated with the 5′ ends of genes and introns; and a large number of putative regulatory and neighbouring regions showing diverse acetylation marks even in the absence of the H3K4 methylation signatures characteristic of enhancer and promoter regions.

    We used chromatin states to study the relationship between histone modification patterns, RNA expression levels, DNA methylation and DNA accessibility. Consistent with previous studies19,23,44,45, we found low DNA methylation and high accessibility in promoter states, high DNA methylation and low accessibility in transcribed states, and intermediate DNA methylation and accessibility in enhancer states (Fig. 4d, e and Extended Data Fig. 3a, b). These differences in methylation level were stronger for higher-expression genes than for lower-expression genes, leading to a more pronounced DNA methylation profile (Extended Data Fig. 3c, Supplementary Fig. 5 and Supplementary Table 4f). Genes proximal to H3K27ac-marked enhancers show significantly higher expression levels (Extended Data Fig. 3d), and conversely, higher-expression genes were significantly more likely to be neighbouring H3K27ac-containing enhancers (Extended Data Fig. 3e).

    Chromatin states sometimes captured differences in RNA expression that are missed by DNA methylation or accessibility. For example, TxFlnk, Enh, TssBiv and BivFlnk states show similar distributions of DNA accessibility but widely differing enrichments for expressed genes (Fig. 4c, d). Enh and ReprPC states show intermediate DNA methylation, but very different distributions of DNA accessibility and different enrichments for expressed genes (Fig. 4c–e). Lack of DNA methylation, typically associated with de-repression, is associated with both the active TssA promoter state and the bivalent TssBiv and BivFlnk states. Bivalent states TssBiv and BivFlnk also show overall lower DNA methylation and higher DNA accessibility than enhancer states Enh and EnhG, and binding by both activating and repressive regulatory factors (Extended Data Fig. 2b). These results also held for alternative methylation measurement platforms (Extended Data Fig. 4a–c), and for the 18-state chromatin state model (Extended Data Fig. 4d, e). Overall, these results highlight the complex relationship between DNA methylation, DNA accessibility and RNA transcription and the value of interpreting DNA methylation and DNA accessibility in the context of integrated chromatin states that better distinguish active and repressed regions.
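
The comparison described above (methylation and accessibility summarised per chromatin state) can be sketched as a simple group-by. This is only an illustration under assumptions: the record layout `(state, methylation, accessibility)`, the function name and the toy values are hypothetical, and `Tx` is used here as a stand-in label for a transcribed state.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_state(bins):
    """Return {state: (mean methylation, mean accessibility)} over genomic bins."""
    grouped = defaultdict(list)
    for state, methylation, accessibility in bins:
        grouped[state].append((methylation, accessibility))
    return {
        state: (mean(m for m, _ in values), mean(a for _, a in values))
        for state, values in grouped.items()
    }


# Toy bins: (chromatin state, methylated fraction, accessibility score in 0-1).
bins = [
    ("TssA", 0.05, 0.9),
    ("TssA", 0.15, 0.7),
    ("Tx", 0.90, 0.1),
    ("Tx", 0.80, 0.2),
    ("Enh", 0.50, 0.5),
    ("Enh", 0.60, 0.4),
]
summary = summarize_by_state(bins)
```

In this toy input the promoter state has the lowest methylation and highest accessibility, the transcribed state the reverse, and the enhancer state sits in between, mirroring the pattern reported in the text.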

    Given the intermediate methylation levels of tissue-specific enhancer regions, we directly annotated intermediate methylation regions, based on 25 complementary DNA methylation assays of MeDIP31,46 and MRE-seq22,39 from 9 reference epigenomes47. This resulted in more than 18,000 intermediate methylation regions, showing 57% CpG methylation on average, that are strongly enriched in genes, enhancer chromatin states (EnhBiv, EnhG, Enh) and evolutionarily conserved regions. Intermediate methylation was associated with intermediate levels of active histone modifications and DNase I hypersensitivity. Near TSSs, intermediate methylation correlated with intermediate gene expression, and in exons it was associated with an intermediate level of exon inclusion47. Intermediate methylation signatures were equally strong within tissue samples, peripheral blood and purified cell types, suggesting that intermediate methylation is not simply reflecting differential methylation between cell types, but probably reflects a stable state of cell-to-cell variability within a population of cells of the same type.

    Epigenomic differences during lineage specification

    We next studied the relationship between DNA methylation dynamics and histone modifications across 95 epigenomes with methylation data, extending previous studies that focused on individual lineages19,48–50. We found that the distribution of methylation levels for CpGs in some chromatin states varied significantly across tissue and cell type (Fig. 4g, Extended Data Fig. 4f and Supplementary Table 4a). For example, TssAFlnk states were largely unmethylated in terminally differentiated cells and tissues, but frequently methylated for several pluripotent and embryonic-stem-cell-derived cells (Bonferroni-corrected F-test P < 0.01);


