Gene regulation: ENCODE project and the Human Epigenome Atlas
Joshua W. K. Ho, PhD
Head, Bioinformatics and Systems Medicine Laboratory, Victor Chang Cardiac Research Institute
NHMRC/National Heart Foundation Career Development Fellow
Senior Lecturer (Conjoint), St Vincent’s Clinical School, UNSW
http://bioinformatics.victorchang.edu.au
UNSW, 20th Apr 2017
Human genome has 23 pairs of chromosomes
Human somatic cells (i.e., not gametes) have a diploid genome: one haploid genome from each parent
Each chromosome is a double helix
A haploid genome contains ~20,000 protein-coding genes
22 pairs of autosomes
1 pair of sex chromosomes
Complete* human genome sequenced by 2003
Beyond DNA sequence: chromatin organisation is important for epigenetic regulation
Nucleosome core = 146 bp of DNA + histone octamer
Linker DNA = ~10-80 bp in length
Nucleosome density is an important determinant of chromatin accessibility
One nucleosome
Transcription factor (TF), RNA polymerase, histone modification, chromatin accessibility, and DNA methylation
Wang et al. (2016) Computational Biology & Bioinformatics: Gene Regulation
Histone modification is associated with genomic regulatory elements
Ho (2012) Biophysical Reviews
H3K4me3 = Trimethylation of histone H3 at lysine 4
Promoters: H3K4me3 & some H3K4me1 & H3K27ac
Enhancers: H3K4me1 & H3K27ac and some H3K4me3
Repressed: H3K27me3 or H3K9me3
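
The rule of thumb on this slide can be written as a small decision function. This is only a sketch for illustration: the function name `classify_region`, the idea of passing one signal value per mark, and the `threshold` cut-off are all assumptions, not part of any ENCODE pipeline. Checking the repressive marks first is also a simplification.

```python
def classify_region(signal, threshold=1.0):
    """Label a genomic region from its histone-mark signals.

    signal: dict mapping a mark name (e.g. "H3K4me3") to an enrichment value.
    threshold: hypothetical cut-off above which a mark counts as present.
    """
    def level(mark):
        return signal.get(mark, 0.0)

    def present(mark):
        return level(mark) >= threshold

    # Repressed: H3K27me3 or H3K9me3
    if present("H3K27me3") or present("H3K9me3"):
        return "repressed"
    # Promoters: mainly H3K4me3, with some H3K4me1 and H3K27ac
    if present("H3K4me3") and level("H3K4me3") >= level("H3K4me1"):
        return "promoter-like"
    # Enhancers: mainly H3K4me1 and H3K27ac, with some H3K4me3
    if present("H3K4me1") and present("H3K27ac"):
        return "enhancer-like"
    return "unclassified"


# Made-up signal values, for illustration only
print(classify_region({"H3K4me3": 8.0, "H3K4me1": 1.5, "H3K27ac": 4.0}))
print(classify_region({"H3K4me3": 1.2, "H3K4me1": 6.0, "H3K27ac": 5.0}))
print(classify_region({"H3K27me3": 3.0}))
```

In practice, chromatin states are usually learned from the data rather than hard-coded, so a rule like this is only a first approximation.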
The ENCODE Project
https://www.encodeproject.org
The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.
ENCODE – past and present
• Launched in September 2003 to identify all functional elements in the human genome sequence, as a follow-up to the Human Genome Project.
• 2003-2007: The pilot phase and technology development phase. Test and compare existing methods to rigorously analyse a defined portion (1%) of the human genome sequence.
• 2007-now: genome-wide data production phase
• Major finding: ~80% of the genome participates in at least one biochemical RNA- and/or chromatin-associated event in at least one cell type (ENCODE 2012, Nature)
Who are the ENCODE team?
https://www.encodeproject.org/about/contributors/
List of current ENCODE research groups: http://www.genome.gov/26525220
UK
Spain
Israel
USA
ENCODE summary slides and videos
https://www.genome.gov/27561910
https://www.encodeproject.org/tutorials/
Roadmap Epigenomics
http://www.roadmapepigenomics.org/
Defining the human epigenome
GTEx – Genotype-tissue expression
http://www.gtexportal.org/
Discovering associations between genotype (e.g., SNP) and gene expression in humans
FANTOM – another international effort
http://fantom.gsc.riken.jp/
Discovery of functional elements in H. sapiens (human) and M. musculus (mouse)
Related project: mouse ENCODE
http://www.mouseencode.org/
ENCODE for M. musculus (mouse)
Related project - modENCODE
http://www.modencode.org/
ENCODE for D. melanogaster (fruit fly) and C. elegans (roundworm)
What can we learn from ENCODE?
• Genome-wide data
– RNA-seq, ChIP-seq, DNase-seq, …
• Genome annotation
– Gene annotation, enhancers, chromatin state, …
• Experimental protocols and best practices
• Bioinformatics analysis methods and software
https://www.encodeproject.org/
Landt et al. (2012) Genome Research
ENCODE human cell lines
• ENCODE data were mostly generated from human cell lines. Tier 1 cell lines are more widely assayed than tier 2 cell lines, which in turn are more widely assayed than tier 3 cell lines.
• Tier 1: GM12878 (lymphoblastoid cell line, from a female HapMap individual); H1-hESC (embryonic stem cells); K562 (leukemia cell line)
• Tier 2: 15 cell lines, including IMR90 (fetal lung fibroblasts), HUVEC (umbilical endothelial cells), HeLa-S3 (cervical carcinoma), MCF-7 (mammary gland), etc.
• Tier 3: 338 cell lines
https://genome.ucsc.edu/ENCODE/cellTypes.html
Experimental assays
• DNA methylation
– Methyl Array, RRBS
• Chromatin accessibility
– DNase-seq, FAIRE-seq
• RNA binding proteins
– RIP tiling array, RIP-seq
• Chromatin modification
– ChIP-seq
• TF binding
– ChIP-seq
• Higher order chromatin structure
– 5C, ChIA-PET
• Replication
– Repli-chip, Repli-seq
https://genome.ucsc.edu/ENCODE/dataMatrix/encodeDataMatrixHuman.html
RNA-seq
RNA-seq reads in a browser
RNA-seq reads
Gene annotation
Applications:
1) Discovery of novel transcripts or alternative promoters
2) Look at alternative splicing
3) Profiling of gene expression
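
As a toy illustration of application 3, expression profiling amounts to counting how many reads fall on each annotated gene. The sketch below is a deliberately naive version under stated assumptions: the function name `count_reads_per_gene`, the tuple layout, and the example coordinates are hypothetical, and it ignores splicing, strand, multi-mapping reads, and normalisation.

```python
def count_reads_per_gene(reads, genes):
    """Count reads overlapping each gene.

    reads: list of (chrom, start, end) tuples, 0-based, end exclusive.
    genes: dict mapping gene name -> (chrom, start, end), same convention.
    """
    counts = {name: 0 for name in genes}
    for r_chrom, r_start, r_end in reads:
        for name, (g_chrom, g_start, g_end) in genes.items():
            # Two half-open intervals overlap if each starts before the other ends
            if r_chrom == g_chrom and r_start < g_end and g_start < r_end:
                counts[name] += 1
    return counts


# Made-up annotation and reads, for illustration only
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 800, 1200)}
reads = [
    ("chr1", 90, 140),     # overlaps geneA
    ("chr1", 450, 500),    # overlaps geneA
    ("chr1", 500, 550),    # starts exactly where geneA ends: no overlap
    ("chr1", 1000, 1050),  # overlaps geneB
    ("chr2", 100, 150),    # different chromosome
]
print(count_reads_per_gene(reads, genes))  # {'geneA': 2, 'geneB': 1}
```

The nested loop is quadratic and only suitable for small examples; dedicated read-counting tools use indexed interval structures.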
ChIP-seq maps genome-wide DNA-protein interactions
• ChIP = Chromatin-immunoprecipitation
• Enrich for sequence fragments that are bound by a specific protein
– Transcription factors
– Chromatin-associated proteins
– Histone proteins
Park (2009) Nature Rev. Genetics
ChIP-seq profiles: transcription factor, histone modification, bulk nucleosome
Figure labels: nucleosome, histone with a modified tail, DNA, TF
ChIP-seq
Ho et al. (2011) Tag-based Approaches for Next Generation Sequencing
Identifying chromatin landscape in human cell lines
Ernst et al. (2011) Nature
Chromatin state = a combination of histone modifications
Ho et al. (2014) Nature
Chromatin state can differ between organisms
Accessing ENCODE data from UCSC Genome Browser
Search for gene, or genomic interval
Ref. genome assembly
Species
Many genomic ‘tracks’ in UCSC GB
Select track
Click “Refresh”
ENCODE data and annotation on UCSC GB
Gene annotation
Chromatin state track
ENCODE track’s naming format: cell type and experiment, e.g., K562 H3K4me3
BLAT result
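
The naming format on this slide (cell type, then experiment) is simple enough to split programmatically when sorting through many tracks. A minimal sketch, assuming the two parts are separated by whitespace as in the slide's example; `parse_track_label` is a hypothetical helper, and real track labels may carry extra fields that this does not handle.

```python
def parse_track_label(label):
    """Split a label such as 'K562 H3K4me3' into cell type and experiment.

    Assumes the cell type is the first whitespace-delimited token and
    everything after it describes the experiment.
    """
    parts = label.split(maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"expected '<cell type> <experiment>', got {label!r}")
    cell_type, experiment = parts
    return {"cell_type": cell_type, "experiment": experiment}


print(parse_track_label("K562 H3K4me3"))
# {'cell_type': 'K562', 'experiment': 'H3K4me3'}
```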
My custom track
Custom track data (BED format in this example)
You can upload custom tracks into UCSC Genome Browser
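
Custom track text in BED format can be generated with a few lines of code and then pasted into the browser's custom track form. The sketch below assumes the basic four-column BED layout (chromosome, 0-based start, exclusive end, name) preceded by a `track` line; the function name `make_bed_track` and the example features are hypothetical.

```python
def make_bed_track(name, description, features):
    """Return custom-track text: one 'track' line followed by BED lines.

    features: iterable of (chrom, start, end, label) tuples, where start is
    0-based and end is exclusive, as the BED format requires.
    """
    lines = [f'track name="{name}" description="{description}"']
    for chrom, start, end, label in features:
        if not 0 <= start < end:
            raise ValueError(f"invalid interval for {label}: {start}-{end}")
        # BED fields are tab-separated: chrom, chromStart, chromEnd, name
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


# Made-up regions, for illustration only
track_text = make_bed_track(
    "My custom track",
    "Example regions of interest",
    [("chr5", 172644665, 172645665, "region1"),
     ("chr5", 172650000, 172651000, "region2")],
)
print(track_text)
```

Note the off-by-one convention: a browser position written as chr5:1-100 corresponds to the BED interval with start 0 and end 100.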
Using encodeproject.org
Download data
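
Searches on the portal can also be scripted, which helps when selecting many files to download. The sketch below only builds a search URL and does not contact the server. The query parameters used here (`type`, `searchTerm`, `format`, `limit`) are assumptions that should be checked against the portal's own REST API help page, and `build_encode_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

ENCODE_SEARCH = "https://www.encodeproject.org/search/"


def build_encode_search_url(search_term, object_type="Experiment", limit=10):
    """Build a search URL for the ENCODE portal.

    The parameter names are assumptions to verify against the portal's
    REST API documentation before relying on them.
    """
    query = {
        "type": object_type,        # kind of object to search for
        "searchTerm": search_term,  # free-text search, e.g. cell type and mark
        "format": "json",           # ask for JSON instead of an HTML page
        "limit": limit,             # maximum number of results
    }
    return ENCODE_SEARCH + "?" + urlencode(query)


print(build_encode_search_url("K562 H3K4me3"))
```

The resulting URL can be opened in a browser or fetched with any HTTP client.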
Visualise data
Use RegulomeDB to identify all SNPs around NKX2-5, and predict if they are potential regulatory elements (promoters/enhancers)
Try chr5:172,644,666-172,676,755 at http://regulomedb.org/
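
When the same interval is reused across tools, it helps to parse the browser-style region string once. A minimal sketch, assuming the `chrom:start-end` form with optional thousands separators and 1-based inclusive coordinates as displayed in genome browsers; `parse_region` is a hypothetical helper.

```python
import re


def parse_region(region):
    """Parse 'chr5:172,644,666-172,676,755' into (chrom, start, end).

    Coordinates are returned as integers, 1-based and inclusive, exactly as
    written in the input string.
    """
    match = re.fullmatch(r"(chr\w+):([\d,]+)-([\d,]+)", region.strip())
    if match is None:
        raise ValueError(f"unrecognised region: {region!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end in {region!r}")
    return chrom, start, end


chrom, start, end = parse_region("chr5:172,644,666-172,676,755")
print(chrom, start, end)             # chr5 172644666 172676755
print("length:", end - start + 1)    # length: 32090
```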
FactorBook: The ENCODE knowledgebase of transcription factors
http://www.factorbook.org/