Who's in charge here? Jim Kent ENCODE Data Coordinating Center (DCC) University of California Santa Cruz Finding and characterizing regulatory regions in the human genome.

Upload: simon-newman

Post on 13-Dec-2015


TRANSCRIPT

Page 1: Who's in charge here? Jim Kent ENCODE Data Coordinating Center (DCC) University of California Santa Cruz Finding and characterizing regulatory regions

 Who's in charge here?

Jim Kent

ENCODE Data Coordinating Center (DCC)

University of California Santa Cruz

Finding and characterizing regulatory regions in the human genome.

Page 2:

The Paradox of the Genome

How does a long, static, one-dimensional string of DNA turn into the remarkably complex, dynamic, and three-dimensional human body?

GTTTGCCATCTTTTGCTGCTCTAGGGAATCCAGCAGCTGTCACCATGTAAACAAGCCCAGGCTAGACCAGTTACCCTCATCATCTTAGCTGATAGCCAGCCAGCCACCACAGGCATGAGT

Page 3:

Early explanations of development

• A little man in the sperm is in charge of making the baby.

• Begs the question of what makes the little man.

• Theory later disproved by better microscopes.

Page 4:

More modern thinking

• An organism is created by the cooperative/competitive actions of cells that make it up.

• Though all cells (save some specialized blood cells) share the same DNA, which parts of the DNA are used by cells varies.

• As cells divide they differentiate into different cell types based on signals from other cells, the environment, a bit of randomness, and the cell’s internal state.

• Most of the differentiation decisions ultimately take place in the cell nucleus.

Page 5:

Nucleus Used to Appear Simple

• Cheek cells stained with basic dyes. Nuclei are readily visible.

Page 6:

Mammalian nuclei stained in various ways reveal additional structure within the nucleus

Image from Tom Misteli lab

Page 7:

Focusing on Chromatin

Page 8:

Turning on/off a gene:

• Opening/closing chromatin.

• Binding expressive/inhibitory transcription factors.

• mRNA transcription (or not)

• Additional regulation occurs after transcription, but that is beyond the scope of this talk.

Page 9:

ENCODE Project

• Not to be confused with the ENCODE pilot project, which covered just 1% of the genome.

• 23 biology labs organized into 8 grants, plus an Analysis Working Group and a Data Coordinating Center (DCC)

• I’m the principal investigator of the DCC

• ENCODE’s overall goal is to identify and characterize all functional elements of the genome.

• ENCODE DCC’s job is to make data accessible and clear, to put it in the UCSC Genome Browser, and to help other databases at NCBI, EBI, and elsewhere import ENCODE data as well.

Page 10:

ENCODE assays on regulation of transcription

• Opening/closing chromatin

– DNase hypersensitivity

– Chromatin immunoprecipitation & sequencing (ChIP-seq) of histone marks

• Binding expressive/inhibitory transcription factors.

– ChIP-seq of various transcription factors

• RNA transcription (or not)

– mRNA sequencing of ENCODE cell lines

– Exotic RNA sequencing also (see Tom Gingeras’ talk)

Page 11:

ENCODE DNase Hypersensitivity

• Several genome-wide, high-throughput methods are being used in ENCODE. All involve DNA-seq.

• Data currently available for >50 cell lines. Plans for >300 cell lines.

• Main artifacts to watch for:

– DNA present in the cell in multiple copies:

• Mitochondria, centromeric repeats, other repeats

• Generally such regions are ignored except in “raw” data.

– Sequencing biases (highly GC-rich regions, etc.)

– In general, these artifacts are easier to work around than those associated with DNA-chip based assays.
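The "ignore multi-copy regions" step above can be pictured as a simple interval filter. The following is only a minimal sketch, not the ENCODE pipeline: the `Interval` type, the `filter_artifacts` function, the `chrM` naming for mitochondrial DNA, and all coordinates are assumptions made up for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_artifacts(peaks, repeats, excluded_chroms=frozenset({"chrM"})):
    """Drop peaks on excluded chromosomes or overlapping a repeat region.

    `repeats` stands in for an annotation of centromeric and other
    repeats; where that annotation comes from is outside this sketch.
    """
    kept = []
    for peak in peaks:
        if peak.chrom in excluded_chroms:
            continue  # multi-copy mitochondrial DNA
        if any(overlaps(peak, rep) for rep in repeats):
            continue  # peak falls in an annotated repeat
        kept.append(peak)
    return kept


# Made-up coordinates, for illustration only.
peaks = [
    Interval("chr11", 5_200_000, 5_200_150),
    Interval("chrM", 100, 250),
    Interval("chr1", 121_500_000, 121_500_150),
]
repeats = [Interval("chr1", 121_000_000, 125_000_000)]
kept = filter_artifacts(peaks, repeats)
```

Here only the `chr11` peak survives; the unfiltered list corresponds to what the slide calls “raw” data.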

Page 12:

UW DNaseI at Hemoglobin Beta

Top track shows genes in the Hemoglobin beta (HBB) locus. Next track shows RNA levels in GM12878 and K562 cell lines. The last track is density plots of DNase hypersensitivity in many cell lines. K562, a cell line similar to a red blood cell precursor, shows much RNA and DNase activity.

Page 13:

A more typical locus - PICALM

DNase patterns typically are less specific to a single cell type as seen here

Page 14:

Histone Mark and related ChIP-seq

• Various histone marks give a broad picture of promoters, enhancers, repressed regions, transcribed regions

• ENCODE data sets currently include 9 histone marks + CTCF (insulator mark) in 9 cell lines. More planned.

Page 15:

Histone marks on 2 cell lines

Histone mark data at the same locus in two cell lines, GM12878 (red) and K562 (blue). Different marks are associated with promoters, transcribed regions, silencers, enhancers, etc. Most marks are darker in K562, which is more actively transcribing this region.

Page 16:

Transcription Factor ChIP-Seq

ENCODE has data on 57 factors – most in several cell lines where they are expressed. More coming.

Page 17:

Making data fit on a single screen

• All of the ENCODE data is excellent, but there is so much of it, it can be hard to know if you’ve seen everything relevant.

• Problem most acute in transcription factor ChIP-seq, but really a problem everywhere.

• Lately UCSC has developed several ways of visually summarizing the data.

Page 18:

Integrating DNase across cell lines

[Figure: HBB gene locus, with tracks labeled DNaseI signal, peaks, and clustered peaks]
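One way to picture the "clustered peaks" track is as a merge of overlapping per-cell-line peaks into a single region that records which cell lines contributed. The slides do not describe how UCSC actually builds this track, so the sketch below is an assumption: the `Peak` type, the `cluster_peaks` function, the overlap rule, and the coordinates are all hypothetical.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    cell_line: str


def cluster_peaks(peaks):
    """Merge overlapping peaks and collect the cell lines in each cluster."""
    clusters = []
    for peak in sorted(peaks, key=lambda p: (p.chrom, p.start, p.end)):
        last = clusters[-1] if clusters else None
        if last and last["chrom"] == peak.chrom and peak.start < last["end"]:
            # Peak overlaps the current cluster: extend it.
            last["end"] = max(last["end"], peak.end)
            last["cell_lines"].add(peak.cell_line)
        else:
            # No overlap: start a new cluster.
            clusters.append({
                "chrom": peak.chrom,
                "start": peak.start,
                "end": peak.end,
                "cell_lines": {peak.cell_line},
            })
    return clusters


# Made-up coordinates and cell-line assignments, for illustration only.
peaks = [
    Peak("chr11", 100, 200, "K562"),
    Peak("chr11", 150, 260, "GM12878"),
    Peak("chr11", 250, 300, "HepG2"),
    Peak("chr11", 500, 600, "K562"),
]
clusters = cluster_peaks(peaks)
for c in clusters:
    print(c["chrom"], c["start"], c["end"], len(c["cell_lines"]))
```

The number of cell lines per cluster is the kind of value a summary track could map to shading, so that one row stands in for many per-cell-line rows.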

Page 19:

Rainbow overlay for histone marks

Page 20:

Integrated regulatory tracks in context with other genomics information at UCSC

Page 21:

Acknowledgements

• Programming – Tim Dreszer, Brian Raney, Galt Barber

• Wrangling – Cricket Sloan, Venkat Malladi, Melissa Cline

• Testing – Katrina Learned and colleagues

• Systems – Erich Weiler, Victoria Lin, Jorge Garcia

• Cat Herding – Kate Rosenbloom, Jim Kent

• Funding – NHGRI, HHMI, QB3

Page 22:

The End