browsing genes and genomes with ensembl · 2016-06-08 · variation data in ensembl and the ensembl...
Emily Perry
Ensembl Outreach Project Leader
EMBL-EBI
Browsing Genes and Genomes with Ensembl
Objectives
• What is Ensembl?
• What type of data can you get in Ensembl?
• How to navigate the Ensembl browser website.
• Where to go for help and documentation.
This webinar course
Date | Webinar topic | Instructor
24th March | Introduction to Ensembl | Emily Perry
31st March | Ensembl genes | Denise Carvalho-Silva
7th April | Data export with BioMart | Helen Sparrow
14th April | Variation data in Ensembl and the Ensembl VEP | Denise Carvalho-Silva
21st April | Comparing genes and genomes with Ensembl Compara | Helen Sparrow
28th April | Finding features that regulate genes – the Ensembl Regulatory Build | Emily Perry
5th May | Uploading your data to Ensembl and advanced ways to access Ensembl data | Ben Moore
Structure
Presentation: What the Ensembl Regulatory Build is; how we produce/process the data
Demo: Getting regulatory data
Exercises: On the train online course
Questions?
• We’ve muted all the mics
• Ask questions in the Chat box in the webinar interface
• My Ensembl colleagues will respond during the talk
• There’s no threading so please respond with @username
Helen Sparrow Ben Moore Denise Carvalho-Silva
Course exercises
http://www.ebi.ac.uk/training/online/course/ensembl-browser-webinar-series-2016
This text will be replaced by a YouTube (link to YouKu too) video of the webinar and a pdf of the slides.
The “next page” will be the exercises.
A link to exercises and their solutions will appear in the page hierarchy.
Get help with the exercises
• Use the exercise solutions in the online course
• Join our Facebook group and discuss the exercises with everybody (see the online course for the link)
• Email us [email protected]
Quick polls
• Poll 1: Did you attend the previous webinars?
• Poll 2: Have you done the previous exercises?
EBI is an Outstation of the European Molecular Biology Laboratory.
Module 6: Ensembl Regulation
Overview
• Epigenetics in gene regulation
  • Methods in epigenetics
• Ensembl Regulation
  • Data sources
  • Our build
Epigenetics
The study of heritable changes in gene function that occur without changes in the DNA sequence.
Epigenetic changes are known to regulate gene expression.
Epigenetic change -> cell differentiation
• Cells carry out different functions
• Cells are morphologically different
• Cells express different genes
• Cells have different epigenomes
Epigenetic change -> cell differentiation
[Figure: a stem cell and a differentiated cell]
Promoters, enhancers etc.
• TF-binding at promoters and enhancers is necessary for transcription
• Combinations of epigenetic marks affect the ability and probability of TF-binding at these sites
Methods of gene regulation
Method of regulation | Technique
Histone modifications | ChIP-seq
Transcription factor binding | ChIP-seq
Open/closed chromatin | DNase sensitivity
DNA methylation | Bisulfite sequencing
Histone modifications
We describe histone modifications using the form Subunit, Amino acid, Position, Modification, e.g. H3K36me3 (histone H3, lysine, position 36, trimethylation).
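The naming scheme above can be made concrete with a small parser. This is only a sketch: the function name `parse_histone_mod` is made up for illustration, and the pattern accepts only the modifications that appear on these slides (me1, me2, me3, ac).

```python
import re

# Pattern for names such as H3K36me3: subunit, amino acid, position, modification.
# Only the modifications shown on the slides are accepted (an assumption of this
# sketch); other modification types would need to be added to the last group.
HISTONE_MOD = re.compile(r"^(H2A|H2B|H3|H4)([A-Z])(\d+)(me[123]|ac)$")


def parse_histone_mod(name):
    """Split a histone modification name into its four parts."""
    match = HISTONE_MOD.match(name)
    if match is None:
        raise ValueError(f"not a recognised histone modification: {name!r}")
    subunit, amino_acid, position, modification = match.groups()
    return {
        "subunit": subunit,
        "amino_acid": amino_acid,
        "position": int(position),
        "modification": modification,
    }


print(parse_histone_mod("H3K36me3"))
```

For H3K36me3 this separates the subunit (H3), the amino acid (K, lysine), the position (36) and the modification (me3).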
Histone code
[Table: modifications (me1, me2, me3, ac) against histone residues (H3K4, H3K9, H3K14, H3K27, H3K79, H4K20, H2BK5); cell contents not recoverable]
ChIP-seq for histone mods & TF-binding
• Crosslink the DNA-binding protein to the DNA (covalent bond)
• Shear the genome
• Pull down the protein with an antibody
• Remove crosslinks and wash
• Sequence fragments
ACGCTGACTAGAATCAATGGCTTCTCTTCGCATATGGCTGACTA
TF motifs vs ChIP-seq peaks
• CTCF binding motif: 19 bases
• ChIP-seq sonication fragment: 200 bases
• Read length: 50 bases
ATTTAGTTCCCTAGATCTGATCTAATCATCGGATCTATAGCCGATCGTAG
[Figure: reads aligned to the DNA form a peak about 400 bp wide around the 19 bp motif]
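A rough piece of arithmetic shows why the peak is so much wider than the motif. The sketch below assumes every sonication fragment is exactly 200 bases and must contain the whole motif, which is a simplification; the function name `fragment_window` is made up for illustration.

```python
def fragment_window(fragment_len, motif_len):
    """Width of the region covered by all fragments that fully contain the motif."""
    if motif_len > fragment_len:
        raise ValueError("motif longer than fragment")
    # The leftmost such fragment ends where the motif ends; the rightmost one
    # starts where the motif starts. Together they span 2 * fragment - motif.
    return 2 * fragment_len - motif_len


window = fragment_window(fragment_len=200, motif_len=19)
print(window)  # 381
```

Under these assumptions the reads can come from a window of 381 bases, which is close to the 400 bp peak shown on the slide.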
Open/closed chromatin
Open chromatin is transcriptionally active.
Closed chromatin is inactive.
DNase hypersensitivity
• DNase treatment
• Purify
• Sequence and compare to reference
DNA methylation -> inactive
GGCGGGATTGCGCGTTAGATCGCGCGCTTATGCTAGCCGCGCTGATAGC
GCTATCAGCGCGGCTAGCATAAGCGCGCGATCTAACGCGCAATCCCGCC
[Figure: CH3 (methyl) groups drawn on both strands; exact positions not recoverable]
Bisulfite sequencing
GGCGGGATTGCGCGTTAGATCGCGCGCTTATGCTAGCCGCGCTGATAGC
[Five CH3 groups are drawn on cytosines in the sequence above]
Bisulfite treatment
GGUGGGATTGUGUGTTAGATCGCGCGUTTATGUTAGUCGCGUTGATAGC
Sequence and compare to reference
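The comparison step can be sketched in a few lines. The two sequences are the ones on the slide; the function names are made up for illustration, and the code follows the slide in writing converted cytosines as U (a sequencer would report them as T).

```python
def bisulfite_convert(sequence, methylated_positions):
    """Convert every unmethylated C to U; methylated Cs are protected."""
    methylated = set(methylated_positions)
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(sequence)
    )


def call_methylation(reference, converted):
    """Return 0-based positions where a reference C is still read as C."""
    if len(reference) != len(converted):
        raise ValueError("sequences must be the same length")
    return [
        i
        for i, (ref, obs) in enumerate(zip(reference, converted))
        if ref == "C" and obs == "C"
    ]


# Sequences copied from the slide: before and after bisulfite treatment.
reference = "GGCGGGATTGCGCGTTAGATCGCGCGCTTATGCTAGCCGCGCTGATAGC"
treated = "GGUGGGATTGUGUGTTAGATCGCGCGUTTATGUTAGUCGCGUTGATAGC"
print(call_methylation(reference, treated))
```

Any cytosine that survives treatment is called as methylated; every other cytosine in the reference has been converted.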
Ensembl Regulation
The goal of the Ensembl Regulation team is to annotate the genome with features that may play a role in the transcriptional regulation of genes.
• Predicted open/closed chromatin
  • DNase I sensitivity
  • FAIRE
• Transcription factor binding sites
• Epigenetic marks
  • Histone modifications
  • DNA methylation
• RNA Pol binding
Current data
The future
A subset of cell types
• Only a subset of available data is displayed in Ensembl.
• We display cell types that have, at a minimum:
  • CTCF binding (not Blueprint)
  • DNase or FAIRE data (not Blueprint)
  • H3K4me3, H3K27me3, H3K36me3 data
• We display all TFBS and histone modification data known in these cell types.
• We process these data to predict activity.
• Further data can be added using track hubs.
Processing the data
• The raw data is taken from the various sources.
• This is processed to predict the positions of regulatory features, such as promoters, enhancers and insulators.
• The activity of these features is predicted in the different cell types.
• All of this can be viewed in the genome browser.
Raw data
[Figure: signal tracks for Transcription Factor A, Transcription Factor B, Transcription Factor C, Histone mod1, Histone mod2 and Histone mod3]
Searching for patterns
[Figure: three known promoters marked]
Segmentation
[Figure: signal tracks for Transcription Factor A, Transcription Factor B, Transcription Factor C, Histone mod1, Histone mod2 and Histone mod3]
Segmentation is blind
• Our algorithm has no idea what the patterns it is looking at are
• e.g. it doesn’t know if a histone modification is activating or repressing
• Later analysis reveals that activating modifications are found at activating segments and vice versa
• i.e. we think it’s working!
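The idea of a blind segmentation can be illustrated with a toy example. This is not the algorithm used in the Ensembl Regulatory Build, which the slides do not describe; it is a deliberately simple stand-in in which bins with the same signal pattern get the same state number, and known promoters are only consulted afterwards. All track names and values are hypothetical.

```python
from collections import defaultdict


def segment(tracks):
    """Assign each genomic bin a state number based only on its signal pattern."""
    n_bins = len(next(iter(tracks.values())))
    names = sorted(tracks)
    state_of_pattern = {}
    states = []
    for i in range(n_bins):
        pattern = tuple(tracks[name][i] for name in names)
        # A new pattern gets the next free state number; no biology is consulted.
        state = state_of_pattern.setdefault(pattern, len(state_of_pattern))
        states.append(state)
    return states


def promoter_fraction(states, known_promoter_bins):
    """After the fact: fraction of each state's bins that overlap known promoters."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for i, state in enumerate(states):
        totals[state] += 1
        if i in known_promoter_bins:
            hits[state] += 1
    return {state: hits[state] / totals[state] for state in totals}


# Hypothetical binarised signals (1 = peak present) over eight bins.
tracks = {
    "TF_A": [0, 1, 1, 0, 0, 0, 1, 0],
    "mod_1": [0, 1, 1, 0, 0, 0, 1, 0],
    "mod_2": [0, 0, 0, 1, 1, 0, 0, 0],
}
states = segment(tracks)
print(states)
print(promoter_fraction(states, known_promoter_bins={1, 2, 6}))
```

The state numbers carry no meaning on their own. Only the second step, which compares states with known promoters, suggests which state behaves like a promoter.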
MultiCell features
[Figure: tracks for Cell type 1, Cell type 2, Cell type 3 and Cell type 4]
Cell-specific features
[Figure: tracks for Cell type 1, Cell type 2, Cell type 3 and Cell type 4, with a MultiCell track]
Coverage
Label | Count | Mean length (bp) | Max length (bp) | Total length (Mbp)
TSS | 40,249 | 973.2 | 11,400 | 39.2
Proximal Reg. | 101,206 | 1005.5 | 15,000 | 101.8
Distal Reg. | 209,081 | 526.1 | 8,400 | 110.0
CTCF | 108,284 | 550.1 | 5,200 | 59.6
Unannotated TFBS | 163,528 | 155.8 | 1,630 | 25.5
Union | | | | 299.2
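The totals in the coverage figures can be checked with a little arithmetic: count multiplied by mean length should give the total length. The sketch below uses only the numbers from the slide; the helper name `total_mbp` is made up for illustration.

```python
# (label, count, mean length in bp, reported total length in Mbp), from the slide.
rows = [
    ("TSS", 40249, 973.2, 39.2),
    ("Proximal Reg.", 101206, 1005.5, 101.8),
    ("Distal Reg.", 209081, 526.1, 110.0),
    ("CTCF", 108284, 550.1, 59.6),
    ("Unannotated TFBS", 163528, 155.8, 25.5),
]
reported_union_mbp = 299.2


def total_mbp(count, mean_length_bp):
    """Total length in Mbp implied by a feature count and a mean length in bp."""
    return count * mean_length_bp / 1e6


for label, count, mean_bp, reported in rows:
    print(f"{label}: computed {total_mbp(count, mean_bp):.1f} Mbp, reported {reported} Mbp")

sum_of_totals = sum(reported for _, _, _, reported in rows)
print(f"sum of totals {sum_of_totals:.1f} Mbp, union {reported_union_mbp} Mbp")
```

Each computed total agrees with the reported one to one decimal place. The per-label totals add up to more than the union, which is what you would expect if features from different labels overlap.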
We do not…
• …link promoters/enhancers/insulators or any other regulatory features to genes. We allow you to see what is where and make your own inferences.
• …link regulatory features to gene expression. We have cell-line-specific regulation data and tissue-specific expression data; make of it what you will.
Regulatory data is incredibly complex and the field is still in its relative infancy. There is no comprehensive database of regulation data.
Real promoters etc?
• Our predictions are based on real biological data
• We have strong evidence to suggest that they are doing what we think they are
• Most of them have not been experimentally validated (i.e. none have been cloned alongside a gene and tested)
• More data will further refine and improve our pipeline
Regulation BioMart
Dataset: Motifs, Reg-feats, Evidence
Filters: Location, Cell Types, Class
Attributes: Reg-feat IDs, locations, activity
Results: table
Hands on
• We’re going to look at the region of the gene LIMD2 to find regulatory features, and explore which cell types they are active in and what evidence there is to show this.
Next webinar – Advanced access
As well as exploring genomic data through the web interface, you are also able to upload your own data to view within the browser.
The first part of this final webinar will show you how you can view custom data, such as BED or BAM files, in the Ensembl browser.
We will then introduce some of the more advanced methods of accessing Ensembl data, such as the REST API, the Perl API, the FTP site and MySQL queries.
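As a taste of the REST API, the sketch below builds the two requests you might use to find regulatory features around a gene such as LIMD2. It relies on the `lookup/symbol` and `overlap/region` endpoints of rest.ensembl.org as I understand them; treat the exact paths, parameters and response fields as assumptions and check them against the current REST documentation. The helper names are made up for illustration, and nothing is fetched unless you call `regulatory_features_for_gene` yourself.

```python
import json
import urllib.request

SERVER = "https://rest.ensembl.org"


def lookup_symbol_url(symbol, species="homo_sapiens"):
    """URL that returns the location of a gene, given its symbol."""
    return f"{SERVER}/lookup/symbol/{species}/{symbol}?content-type=application/json"


def regulatory_overlap_url(chromosome, start, end, species="homo_sapiens"):
    """URL that returns regulatory features overlapping a region."""
    region = f"{chromosome}:{start}-{end}"
    return (
        f"{SERVER}/overlap/region/{species}/{region}"
        "?feature=regulatory;content-type=application/json"
    )


def fetch_json(url):
    """Fetch a URL and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def regulatory_features_for_gene(symbol):
    """Look up a gene's coordinates, then ask for overlapping regulatory features."""
    # Assumes the lookup response carries seq_region_name, start and end fields.
    gene = fetch_json(lookup_symbol_url(symbol))
    url = regulatory_overlap_url(gene["seq_region_name"], gene["start"], gene["end"])
    return fetch_json(url)


print(lookup_symbol_url("LIMD2"))
```

The coordinates are looked up rather than hard-coded, so the same two calls work for any gene symbol.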
Questions?
• You can continue to use the chat box
• I will read out loud any further questions and answer on the screen
• You can also try hands-up, and I will unmute your mic
Helen Sparrow Ben Moore Denise Carvalho-Silva
Help and documentation
Course online: http://www.ebi.ac.uk/training/online/subjects/11
Tutorials: www.ensembl.org/info/website/tutorials
Flash animations: www.youtube.com/user/EnsemblHelpdesk, http://u.youku.com/Ensemblhelpdesk
Email us: [email protected]
Ensembl public mailing lists: [email protected], [email protected]
Follow us
www.facebook.com/Ensembl.org
@Ensembl
www.ensembl.info
Publications
Yates, A. et al. Ensembl 2016. Nucleic Acids Research. http://europepmc.org/articles/4702834
Xosé M. Fernández-Suárez and Michael K. Schuster. Using the Ensembl Genome Server to Browse Genomic Sequence Data. Current Protocols in Bioinformatics 1.15.1-1.15.48 (2010). www.ncbi.nlm.nih.gov/pubmed/20521244
Giulietta M Spudich and Xosé M Fernández-Suárez. Touring Ensembl: A practical guide to genome browsing. BMC Genomics 11:295 (2010). www.biomedcentral.com/1471-2164/11/295
http://www.ensembl.org/info/about/publications.html
Ensembl 2015
Acknowledgements
The Entire Ensembl Team
Funding
Co-funded by the European Union