Summary of ENCODE Accomplishments Michael Snyder On Behalf of the ENCODE Consortium March 10, 2015 Conflicts: Personalis, Genapsys, AxioMx

Page 1: Michael Snyder On Behalf of the ENCODE Consortium March 10 ... · H1-hESC A549 GM12878 MCF-7 HepG2 K562 Histone ChIP-Seq TF ChIP-Seq RBP iCLIP RBP RNAi/RNA-seq TF RNAi/RNA-seq2 ChIA-PET

Summary of ENCODE Accomplishments

Michael Snyder

On Behalf of the ENCODE Consortium

March 10, 2015

Conflicts: Personalis, Genapsys, AxioMx

ENCODE: Three Phases (2003-present)

Many Experimental Assays

Modified from PLoS Biol 9:e1001046, 2011; Science 306:636, 2004

ENCODE Dimensions

Methods/Factors

Cells

387 Cell Lines/Tissues

1,700 Assays (~886 different TFs/~300 RBPs)

3,331 (8,832) Experiments

ChIP-seq (239 TFs + histone marks; 5,101 data sets)

iCLIP (308 RBPs)

RNA-seq (643)

DNase-seq (449)

High Quality Data

• > Two biological replicates

• Multiple quality control measures

IDR Processing, QC and Blacklist Filtering

[Figure: Rep1 vs. Rep2 plots contrasting poor reproducibility with good reproducibility]
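The blacklist-filtering step named on this slide can be illustrated with a minimal sketch. It assumes peaks and blacklisted regions are half-open (chromosome, start, end) intervals; the function names and example coordinates are hypothetical, and the sketch neither implements IDR nor reproduces the consortium's actual pipeline.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates, as in BED files.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_blacklisted(peaks: List[Interval],
                       blacklist: List[Interval]) -> List[Interval]:
    """Keep only the peaks that do not overlap any blacklisted region."""
    return [p for p in peaks if not any(overlaps(p, b) for b in blacklist)]


# Hypothetical coordinates, for illustration only.
example_peaks = [("chr1", 100, 400), ("chr1", 5000, 5300), ("chr2", 100, 400)]
example_blacklist = [("chr1", 350, 1000)]
kept_peaks = filter_blacklisted(example_peaks, example_blacklist)
```

The pairwise comparison is quadratic in the worst case; for genome-scale inputs a sorted sweep or an interval tree would be the more usual choice.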

ENCODE Uniform Analysis Pipeline

Mapped reads

Uniform Peak Calling Pipeline (SPP, PeakSeq)

[Figure: Rep1 vs. Rep2 plots contrasting poor reproducibility with good reproducibility]

Anshul Kundaje

Uniform Peak Calling (SPP, PeakSeq)

Quality Control

Derived Data (Chromosome Segments, Expression)

Processing & Element Calling Compatible with Other Projects: GTEx, REMC, IHEC

Established Standards For Community

• ChIP-Seq

• DNase HS

• RNA-Seq

Antibody characterization, biological replicates, QC measures

Genome Res. 2012 …

Assays/Data Types

[Figure: bar charts of the number of experiments per assay/data type, one panel each for Human and Mouse (log-scale axis "Experiments", 1 to 10,000); series: Started/Proposed and Done/Submitted]

Deep Exploration of Some Lines Using Many Assays

[Figure: bar chart of the number of factors assayed per cell line (axis "Number of Factors", 0 to 2,000). Cell lines: HeLa-S3, H1-hESC, A549, GM12878, MCF-7, HepG2, K562. Assays: Histone ChIP-Seq, TF ChIP-Seq, RBP iCLIP, RBP RNAi/RNA-seq, TF RNAi/RNA-seq2, ChIA-PET, RIP-seq]

Some Assays Were Conducted Across a Broad Range of Biosamples

[Figure: bar chart of unique biosamples per assay (axis "Unique Biosamples", 0 to 350). Assay labels as extracted, with … marking labels truncated in the source: RNA-seq; TF ChIP-Seq; DNase-seq; Histone ChIP-Seq; RRBS; RAMPAGE; RNA profiling by array…; DNaseI comparative genomic; DNA methylation; DNase-seq; WGBS; DNase-seq (deep); ATAC-Seq; CAGE; FAIRE-seq; Repli-chip; whole-genome…; Repli-seq; Nanostring; RNA-PET; methylC-seq; 5C; microRNA profiling by…; ChIA-PET; MeDIP-SEQ assay; MRE-seq; DNase-seq (long); RIP-seq; shRNA knockdown; iCLIP; DNA-PET; MNase-seq; RNA-Seq following TF…; RNA Bind-n-Seq]

Unique Biosample Types

[Figure: bar chart of the number of unique biosample types for Human and Mouse (axis 0 to 200)]

ENCODE Data

Cloud Storage and Computing

Data available at Amazon Web Services (AWS)

Uniform processing pipelines will be available at DNAnexus

related projects

Highly Searchable

ENCODE Data Open Access

New ENCODE Portal https://www.encodeproject.org

ENCODE Encyclopedia Prototype

http://encodeproject.org

10 Computational Groups

Analyzing data in a variety of different ways

• GWAS

• Cancer

• Regulatory principles

Software Tools

• >30 Different algorithms

• Wide variety of areas. Examples:

– Segmentation

– Allele calling

– 3D nuclear analysis

– Data processing and peak calling

– Data quality control

Summary of Impact

• Lots of open, diverse data types on the same cell lines/tissues

• Experimental standards

• New analysis methods

• Methods and standards adopted by other large communities, e.g., GTEx, REMC, betaCell, CIRM

• Data are widely used

Feb 2015: >750 Papers Non-ENCODE + 150 modENCODE

ENCODE Publications

ENCODE Community Publications

Outreach Activities

• Tutorials: https://www.encodeproject.org/tutorials

• (http://www.genome.gov/27553900)

• CHARGE-ENCODE workshop

• User’s meeting in 2015

Additional High Level Impact

1) Segmenting genome into types of elements

2) Gene regulatory principles

3) GWAS

>85% of lead SNPs lie outside of coding regions

[Figure, panels a-e (image not preserved). Recoverable labels:

SNP sets: Phenotype-associated SNPs; Random sampling of matched SNPs; Genotyped SNPs; 1000 Genomes; 24 Personal genomes

Axes: Fraction of SNPs that overlap feature (0 to 0.3), for DNaseI peaks and TF; log2 (fold enrichment or depletion) (−0.5 to 1.5), by category E, WE, TSS, PF, CTCF, T, D/R

Genome browser: Human Feb. 2009 (GRCh37/hg19) chr5:39,274,501-40,819,500 (1,545,000 bp) and chr5:40,390,001-40,440,000 (50,000 bp)

Genes: C9, DAB2, BC026261, PTGER4, TTC33, OSRF, PRKAA1

Tracks: DNase I; TFs; HUVEC GATA2; HUVEC cFOS; HUVEC Input; HUVEC; Jurkat; Th1; Th2; GWAS Catalog

GWAS Catalog phenotypes: Crohn’s disease; ulcerative colitis; multiple sclerosis

SNPs: rs4613763, rs17234657, rs11742570, rs6451493, rs1992660, rs6896969, rs1373692, rs9292777

Other labels: GM12878 genes above threshold; GO:0006955 immune response; GWAS enrichment (-log10 p-value)]

Table (rows follow). Column labels, in extraction order: Phenotype; SNP-Pheno associations; overlap any TF occupancy; Gm12878Ebf; Gm12878Pol24h8; Gm12878Pol2; Helas3CebpbIggrab; K562Ctcf; Gm12878Pu1; Gm12878Mef2a; K562Pol2V0416101; Gm12878Pax5c20; Gm12878NfkbIggrab; Hepg2Ctcf; HuvecGata2Ucd; Gm12878Elf1sc631V0416101; Gm12878Egr1V0416101; Hepg2Mafkab50322Iggrab; HuvecCfosUcd; Gm12878Bcl11a; Gm12878Irf4; Gm12878Batf; Gm12878Tcf12; Hepg2Fosl2; K562Tal1sc12984Iggmus; K562MaxV0416102; Hepg2Tcf4Ucd; Hepg2Foxa2; Hepg2P300V0416101; Hepg2Foxa1c20; HuvecCtcf; Helas3Ctcf; Hepg2Jund; CACO2.DS8235; HUVEC.all; HepG2.all; Jurkat.DS12659; hTH1.all; hTH2.DS7842; CD34.DS16814

TOTAL 4860 600 78 57 69 69 72 47 47 71 54 35 54 29 44 28 48 50 38 35 45 37 37 44 62 33 57 46 62 40 55 47 70 85 118 62 192 57 81

Height 204 34 7 3 3 7 6 1 3 2 3 2 6 0 4 6 3 2 3 5 5 2 0 2 3 1 2 0 2 5 4 3 3 6 5 4 9 3 7

Systemic_lupus_erythematosus 62 10 4 6 6 2 1 1 4 0 1 4 1 1 4 2 0 1 2 3 4 2 1 0 1 0 0 0 0 1 1 1 1 2 0 0 4 2 1

Crohn's_disease 105 20 2 2 2 2 1 2 2 0 2 1 2 5 1 1 1 3 2 1 1 0 2 1 1 2 1 2 3 2 3 1 3 6 5 3 9 5 5

Ulcerative_colitis 85 11 2 3 3 0 1 2 3 1 3 3 1 2 3 2 1 1 2 1 2 2 0 2 2 1 0 2 2 0 1 1 3 2 5 3 7 2 3

Multiple_sclerosis 71 15 4 3 3 1 0 3 4 2 4 2 0 2 2 1 0 2 4 3 2 3 0 3 1 0 0 0 0 0 0 0 0 1 1 3 5 4 3

Rheumatoid_arthritis 57 11 4 2 2 1 0 4 3 0 4 4 0 0 1 1 0 0 1 0 2 2 0 1 0 0 0 0 0 0 0 0 0 2 2 1 11 3 1

LDL_cholesterol 45 8 0 0 0 2 2 1 0 4 1 0 1 0 1 0 1 0 0 0 0 0 3 0 3 2 3 2 2 1 1 3 1 0 2 1 0 1 0

Bone_mineral_density 65 9 1 1 1 1 2 2 2 1 2 1 1 0 2 2 2 0 1 2 1 1 0 0 1 0 2 2 3 1 1 1 2 2 4 3 3 2 3

Coronary_heart_disease 107 17 2 0 0 2 4 0 0 4 1 2 0 2 0 0 1 1 1 0 0 1 1 1 3 1 2 2 2 1 1 1 3 2 3 0 6 0 1

Chronic_lymphocytic_leukemia 17 8 1 4 5 0 0 3 1 0 2 1 0 0 2 0 1 0 2 1 1 2 0 1 0 1 0 0 0 0 0 0 1 0 0 0 2 0 1

Prostate_cancer 56 8 0 0 0 0 0 0 0 1 0 0 2 1 0 0 3 2 0 0 0 0 2 1 1 4 3 3 3 0 0 2 2 3 1 1 2 0 1

Triglycerides 48 10 0 0 0 1 2 0 0 2 1 0 2 1 1 0 2 2 0 0 0 0 3 1 2 1 2 2 1 2 2 3 0 2 1 0 2 1 0

Celiac_disease 54 11 4 3 3 0 2 2 1 1 1 2 0 0 1 0 0 0 0 1 1 1 1 0 1 1 1 1 2 0 1 2 0 0 0 2 2 1 2

Colorectal_cancer 18 5 0 0 0 1 0 0 0 1 0 0 2 0 0 0 0 0 0 0 0 0 2 0 0 2 3 3 3 0 0 2 2 0 1 0 2 0 1

Hematological_parameters 85 12 3 0 0 3 1 0 3 0 1 1 2 0 0 0 2 0 1 1 2 2 1 1 0 0 1 1 1 0 0 0 1 1 3 3 6 3 5

HIV-1_control 55 10 0 2 4 1 2 0 1 2 2 0 0 0 1 0 0 0 1 1 1 1 1 1 2 1 1 2 1 0 0 1 0 0 0 0 2 1 0

Protein_quantitative_trait_loci 48 7 2 2 2 0 0 2 1 1 1 0 0 1 2 2 1 0 2 1 2 1 0 0 1 0 0 0 0 0 0 1 1 1 2 1 2 1 1

Alzheimer's_disease 42 5 0 0 0 1 2 0 0 1 0 0 2 0 0 0 0 1 0 0 0 0 1 1 1 0 1 1 0 2 2 1 1 2 0 0 2 0 1

HDL_cholesterol 55 8 1 0 0 1 1 0 0 2 1 0 1 0 0 0 1 2 0 0 0 1 2 1 1 1 2 2 2 1 1 2 0 1 2 0 3 1 0

Cholesterol 16 6 1 0 0 0 2 0 0 2 2 0 1 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 0 2 1 1 0 0 2 0 1 1 0

Longevity 30 5 0 2 3 1 1 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 2 1 2 0 2 1 0 0 1

Attention_deficit_hyperactivity_disorder 102 9 0 0 0 1 2 0 0 1 0 1 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 2 0 1 0 0

Cognitive_performance 111 8 0 0 2 1 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 1 3 0 0 0 0

Type_2_diabetes 97 13 0 0 0 1 1 2 1 1 1 0 1 1 0 0 2 1 1 1 1 0 0 3 1 0 0 0 0 2 1 0 1 0 2 0 4 0 0

Conduct_disorder 38 5 0 1 1 1 3 0 1 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 2 0 0 1 2 2 1 0 1

Type_1_diabetes 67 7 2 1 1 0 0 2 1 0 1 1 0 0 1 0 0 0 1 0 1 0 0 1 1 1 1 2 2 0 1 0 1 0 2 1 5 1 1

Dialysis-related_mortality 26 6 1 0 0 1 1 1 0 0 0 0 2 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 1 2 1 0 0 1

Bipolar_disorder 110 6 1 0 0 2 1 0 0 1 0 0 0 1 0 0 0 3 0 0 0 0 1 0 1 0 0 0 0 0 0 1 2 3 1 0 2 0 1

Body_mass 98 5 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 3 0 1 0 0

C-reactive_protein 34 7 0 0 0 0 0 0 0 1 0 0 0 0 0 0 2 0 0 0 0 0 0 0 1 0 1 0 0 1 1 0 1 0 2 1 0 0 1

Menarche_and_menopause 62 6 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 2 0 0 0 0 0 0 0 1 0 0 2 1 1 0

Breast_cancer 43 6 1 0 0 2 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 2 2 0 0

Mean_platelet_volume 15 5 1 0 0 0 0 1 0 0 1 0 0 0 0 0 2 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 2 0 1

Soluble_levels_of_adhesion_molecules 5 5 1 0 1 1 0 1 0 2 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1

Psoriasis 38 6 1 1 2 0 0 2 1 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 4 0 0

Parkinson's_disease 46 5 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0

Obesity 36 5 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Fasting_glucose-related_traits 17 5 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 1 1 1 1 1 0 0
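
Two quantities named on this slide, the fraction of SNPs that overlap a feature and the log2 fold enrichment or depletion relative to matched control SNPs, can be sketched as follows. This is a minimal illustration only: the function names, coordinates, and SNP positions are hypothetical, and it is not the analysis that produced the figure or the table.

```python
import math
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open
Snp = Tuple[str, int]          # (chromosome, 0-based position)


def fraction_overlapping(snps: List[Snp], regions: List[Region]) -> float:
    """Fraction of SNPs whose position falls inside at least one region."""
    if not snps:
        return 0.0
    hits = sum(
        1
        for chrom, pos in snps
        if any(c == chrom and start <= pos < end for c, start, end in regions)
    )
    return hits / len(snps)


def log2_enrichment(observed: float, expected: float) -> float:
    """log2(observed / expected): positive is enrichment, negative is depletion."""
    if observed <= 0 or expected <= 0:
        raise ValueError("both fractions must be positive to take a log ratio")
    return math.log2(observed / expected)


# Hypothetical feature regions and SNP positions, for illustration only.
feature_regions = [("chr5", 100, 200), ("chr5", 500, 600)]
phenotype_snps = [("chr5", 150), ("chr5", 550), ("chr5", 700), ("chr5", 50)]
matched_snps = [("chr5", 10), ("chr5", 120), ("chr5", 300), ("chr5", 800)]

observed_fraction = fraction_overlapping(phenotype_snps, feature_regions)
expected_fraction = fraction_overlapping(matched_snps, feature_regions)
enrichment = log2_enrichment(observed_fraction, expected_fraction)
```

In practice the control set would be sampled many times to obtain a distribution rather than a single expected fraction.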

Reported Association

Functional SNP

GWAS (Japanese): T. Yamauchi et al. A genome-wide association study in the Japanese population identifies susceptibility loci for type 2 diabetes at UBE2E2 and C2CD4A-C2CD4B. Nature Genetics 42, 864–868 (2010).

GWAS (Danish): N. Grarup et al. The diabetogenic VPS13C/C2CD4A/C2CD4B rs7172432 variant impairs glucose-stimulated insulin response in 5,722 non-diabetic Danish individuals. Diabetologia 54, 789–794 (2011).

Example: rs7172432 in Type 2 Diabetes

1.0

Alan P Boyle RegulomeDB

Altered View of the Human Genome

2003

• 25,000 Protein Coding Genes (1.5%)

• Few Non Coding Genes (Mostly tRNAs, snoRNAs)

• Little regulatory information mapped

2015

• 20,000 Protein Coding Genes

• Thousands of noncoding genes

• More potential regulatory DNA than protein coding DNA

The ENCODE 3 Consortium

http://www.genome.gov/26525220

Alternative/Additional Slides

Three Phases

I) Pilot Phase: 1% of Genome (2003-2007)

II) Scale Up Phase I (2007-2012)

III) Current Production Phase (2012-2016)

Related Projects:

Mouse ENCODE (2009-2012)

modENCODE (2007-2012)

Categories of Disease-Related ENCODE Community Publications

[Pie chart. Categories: Cancer (35%); Autoimmunity/Allergy (15%); Neurological/Psychiatric (13%); Cardiovascular (7%); Developmental; Metabolism; Infectious disease; Musculo-skeletal; Reproductive/dimorphism; Ophthalmology; Dermatological; Hematology; Human Genetics; Multiple diseases; Respiratory; Aging]

Standard ENCODE Use Cases: Hypothesis Generation

• Prediction of:

• causal variants/regulatory elements

• target genes

• target cell types

• mechanism for phenotype changes

The Genome Is Active!

RNA

Kellis et al 2014

ENCODE Publications

ENCODE Timeline

The ENCODE Consortium Phase 3

Brad Bernstein (Eric Lander, Manolis Kellis, Tony Kouzarides)

Ewan Birney (Jim Kent, Mark Gerstein, Bill Noble, Peter Bickel, Ross Hardison, Zhiping Weng)

Greg Crawford (Ewan Birney, Jason Lieb, Terry Furey, Vishy Iyer)

Jim Kent (David Haussler, Kate Rosenbloom)

Mike Cherry

John Stamatoyannopoulos (Evan Eichler, George Stamatoyannopoulos, Job Dekker, Maynard Olson, Michael Dorschner, Patrick Navas, Phil Green)

Mike Snyder (Kevin Struhl, Mark Gerstein, Peggy Farnham, Sherman Weissman)

Rick Myers (Barbara Wold)

Scott Tenenbaum (Luiz Penalva)

Tim Hubbard (Alexandre Reymond, Alfonso Valencia, David Haussler, Ewan Birney, Jim Kent, Manolis Kellis, Mark Gerstein, Michael Brent, Roderic Guigo)

Tom Gingeras (Alexandre Reymond, David Spector, Greg Hannon, Michael Brent, Roderic Guigo, Stylianos Antonarakis, Yijun Ruan, Yoshihide Hayashizaki)

Zhiping Weng (Nathan Trinklein, Rick Myers)

Brenton Graveley (John Rinn, Others)

... and many senior scientists, postdocs, students, technicians, computer scientists, statisticians and administrators in these groups

NHGRI: Elise Feingold, Mike Pazin, Peter Good

Metadata-driven searches

ENCODE Consortium Phase 3

Goals of ENCODE

• Catalog the functional elements in human and mouse genomes

• Generate high quality data using high throughput pipelines

• Develop new technologies and analytical tools to generate, analyze and validate data

• Provide data and tools to the community in as useful a form as possible

RNA-Sequencing

Wang et al. 2009 Nat. Rev. Genet.

Immunoprecipitation

Transcription Factor

Sequence and align ChIP-seq

Peak 300-500 bp

Antibody

Functional data: ChIP-seq

ChIP-exo Histone Marks

Motif (8-12 bp)

DNaseI

Histone Histone

Transcription Factor

Sequence and align

DNaseI hypersensitivity peak

Functional data: DNase-seq

Region of open chromatin

DNaseI

Histone Histone

Transcription Factor

Sequence and align

DNaseI Footprint

Functional data: DNase footprints

Region of open chromatin

Comparing Mouse and Human with Mouse ENCODE Data

Number of Datasets Per Biosamples

[Figure: two bar-chart panels (Human, Mouse) of datasets per biosample type, with axes 0 to 7,000 and 0 to 1,400 respectively. Biosample type labels as extracted, with … marking labels truncated in the source: immortalized cell…; tissue; primary cell; stem cell; in vitro; induced…; cell line. Series: Started/Proposed and Released/Submitted]

Assays Per Biosample

[Figure: two bar-chart panels of assays per biosample; series: Proposed and Released. Labels as extracted, with … marking labels truncated in the source.

First panel (axis 0 to 2,000): K562; HepG2; MCF-7; GM12878; A549; H1-hESC; HeLa-S3; liver; SK-N-SH; endotheli…; HCT116; IMR-90; keratinoc…; stomach; fibroblast; pancreas; thyroid; adrenal; spleen; adipose

Second panel (axis 0 to 150): liver; heart; forebrain; hindbrain; midbrain; limb; neural tube; MEL cell line; kidney; lung; intestine; stomach; CH12.LX; craniofacial; embryonic; G1E-ER4; ES-E14; myocyte; spleen; thymus]