Genes and transcripts: Ensembl online webinar series
TRANSCRIPT
Denise Carvalho-Silva, PhD, Ensembl Outreach team
European Molecular Biology Laboratory
European Bioinformatics Institute
Ensembl online
training series 2016
Course Objectives
What is Ensembl?
What type of data can you get in Ensembl?
How to navigate the Ensembl browser website?
How to connect with Ensembl
This online course
Date | Webinar topic | Instructor
24th March | Introduction to Ensembl | Emily Perry
31st March | Ensembl genes | Denise Carvalho-Silva
7th April | Data export with BioMart | Helen Sparrow
14th April | Variation data in Ensembl and the Ensembl VEP | Denise Carvalho-Silva
21st April | Comparing genes and genomes with Ensembl Compara | Helen Sparrow
28th April | Finding features that regulate genes – the Ensembl Regulatory Build | Emily Perry
5th May | Uploading your data to Ensembl and advanced ways to access Ensembl data | Ben Moore
http://www.ebi.ac.uk/training/events/2016/ensembl-online-training-series-2016
Our polls: finding out more about you
• Previous webinar on the 24th March
  • Poll 1: Attendance
  • Poll 2: Exercises
• This webinar on the 31st March
  • Poll 3: Species of interest
Structure for this one-hour webinar
Presentation: what the Ensembl genes are and how we annotate them
Demo: View genes/transcripts
Exercises: on the Train online course
Questions?
• We’ve muted all the microphones
• Ask questions in the Chat box in the webinar interface
• My Ensembl colleagues will respond during my talk
• There’s no threading so please respond with @username
Helen Sparrow Ben Moore Emily Perry
EBI is an Outstation of the European Molecular Biology Laboratory.
Comparative Genomics
Gene models
Regulation
Variation
Custom data display
Programmatic access
Toolkit
Ensembl Features
Module 2: Ensembl Genes and Transcripts
Automatic annotation*
• many species
• genome-wide at once
• ~ 4 months
Manual annotation*
• fewer species
• gene by gene
• many years
* based on experimental, biological evidence (INSDC, UniProtKB…)
Transcript legend:
Automatic and coding (20_)
Manual and coding (00_)
Automatic + Manual (“gold”)
Manual and non-coding (00_)
Ensembl genes & transcripts
• merged annotation
• higher confidence and quality
• comprehensive: alternatively spliced transcripts
UTR Exon Intron
5’ UTR 3’ UTR
Gold (identical annotation) = Automatic + Manual
CCDS project
• annotate a consensus coding DNA sequence set
• EBI, WTSI, UCSC and NCBI
Genome Res. 19:1316-23 (2009)
http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi
CCDS transcript
APPRIS*: annotates splice isoforms (using reliable data) and selects a primary variant
http://appris.bioinfo.cnio.es/#/
Transcript Support Levels (TSL), Mark Diekhans (UCSC)
GENCODE set
cDNAs/ESTs alignments
Alignment over full length of the transcript
Scoring and categories
How supported is a transcript model?
*A transcript is not analysed if:
• It’s a pseudogene
• It’s in the MHC region (HLA, Ig, TCR)
• It’s a single-exon transcript (to be included)
TSL1: at least one mRNA
TSL2: multiple ESTs
TSL3: single EST
TSL4: EST is suspect
TSL5: no mRNA/EST support
TSLNA: not analysed*
TSL categories
TSL1 > TSL2 > TSL3 > TSL4 > TSL5
well supported → poorly supported
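The ordering above can be sketched in a few lines of Python. This is only an illustration of the ranking on the slide, not Ensembl code: the function names and the transcript labels are hypothetical, and placing TSLNA (not analysed) after TSL5 is an assumption made here so that every label has a position.

```python
# Rank order from the slide: TSL1 > TSL2 > TSL3 > TSL4 > TSL5.
TSL_RANK = {"TSL1": 1, "TSL2": 2, "TSL3": 3, "TSL4": 4, "TSL5": 5}

# Assumption: anything without a rank (e.g. TSLNA) sorts after TSL5.
UNRANKED = 6


def tsl_sort_key(tsl_label):
    """Return a number where smaller means better supported."""
    return TSL_RANK.get(tsl_label.upper(), UNRANKED)


def order_by_support(transcripts):
    """Sort (transcript_name, tsl_label) pairs from well to poorly supported."""
    return sorted(transcripts, key=lambda pair: tsl_sort_key(pair[1]))


# Hypothetical transcript names, used only to show the ordering.
example = [("tx_a", "TSL3"), ("tx_b", "TSL1"), ("tx_c", "TSLNA"), ("tx_d", "TSL5")]
ordered = order_by_support(example)
```

Here `ordered` starts with `tx_b` (TSL1) and ends with `tx_c` (TSLNA). As the disclaimer slide says, a ranking like this is one input among several, not a rule for picking a transcript.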
APPRIS and TSLs in Ensembl
http://www.ensembl.org/Help/Glossary?id=493
http://www.ensembl.org/Help/Glossary?id=492
APPRIS
TSLs
Disclaimer: Which transcript to use
No single method will tell us which transcript to use.
Decision on a case-by-case basis:
• All transcripts OR one/two well-supported ones?
List of transcripts: we offer choices based on
• CCDS (Ensembl, HAVANA, NCBI, UCSC)
• Golden transcripts (identical Ensembl and HAVANA)
• Cross reference entries (e.g. UniProtKB, RefSeq)
• APPRIS
• TSLs
Ensembl stable identifiers
• ENSG########### Ensembl Gene ID
• ENST########### Ensembl Transcript ID
• ENSP########### Ensembl Peptide ID
• ENSE########### Ensembl Exon ID
• For non-human species a species code is added after ENS:
  ENSMUSG: MUS (Mus musculus) for mouse
  ENSRNOG: RNO (Rattus norvegicus) for rat
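The identifier layout on this slide can be checked with a short regular expression. The sketch below is illustrative only: it assumes a three-letter species code and eleven digits, as shown on the slide, knows only the three species mentioned here, and does not handle version suffixes or other feature types. The function name and the example identifier are made up.

```python
import re

# ENS, an optional three-letter species code, a feature letter, eleven digits.
STABLE_ID = re.compile(r"^ENS([A-Z]{3})?([GTPE])(\d{11})$")

FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# Only the codes shown on the slide; no species code means human.
SPECIES_CODES = {None: "human", "MUS": "mouse", "RNO": "rat"}


def parse_stable_id(stable_id):
    """Split a stable ID into species, feature type and number."""
    match = STABLE_ID.match(stable_id)
    if match is None:
        raise ValueError("not a recognised stable ID: " + stable_id)
    code, feature, number = match.groups()
    return {
        # Unknown codes are returned unchanged rather than guessed.
        "species": SPECIES_CODES.get(code, code),
        "feature": FEATURE_TYPES[feature],
        "number": number,
    }


# Made-up identifier: ENSMUSG followed by eleven digits.
example_id = "ENSMUSG" + "0" * 10 + "1"
parsed = parse_stable_id(example_id)
```

For the made-up mouse identifier, `parsed` reports species `mouse` and feature `gene`.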
Live demo
The ESPN gene products are active in the inner ear, where they appear to play an essential role in normal hearing and balance.
Let’s explore ESPN
A) How can I find the genomic sequence of this gene? What is the ID of its first exon?
B) Can I display the genomic coordinates and variants on this sequence?
C) Can I find information on the expression of this gene in different tissues?
Human ESPN: gene tab
A) How many exons does the longest ESPN transcript have? Are there any completely untranslated exons?
B) Can I find its cDNA sequence?
C) What are the UniProt and RefSeq entries cross referenced to this transcript?
Human ESPN: transcript tab
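The demo answers these questions in the browser. For readers who prefer programmatic access, the same kind of question (which transcript is longest, and how many exons it has) could be approached through the Ensembl REST API. The sketch below is not part of the webinar: it assumes the public server at rest.ensembl.org and its `lookup/symbol` endpoint with `expand=1`, and it takes "longest" to mean the greatest summed exon length. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

REST_SERVER = "https://rest.ensembl.org"  # assumed public REST server


def lookup_url(symbol, species="homo_sapiens"):
    """Build the lookup URL for a gene symbol, with transcripts and exons expanded."""
    path = "/lookup/symbol/{}/{}".format(
        urllib.parse.quote(species), urllib.parse.quote(symbol)
    )
    return REST_SERVER + path + "?expand=1"


def fetch_gene(symbol, species="homo_sapiens"):
    """Fetch the gene record as a dict (needs network access)."""
    request = urllib.request.Request(
        lookup_url(symbol, species),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def spliced_length(transcript):
    """Sum the exon lengths of one transcript record (coordinates are inclusive)."""
    return sum(exon["end"] - exon["start"] + 1 for exon in transcript["Exon"])


def longest_transcript(gene):
    """Return the transcript record with the greatest summed exon length."""
    return max(gene["Transcript"], key=spliced_length)
```

A call such as `longest_transcript(fetch_gene("ESPN"))` would return one transcript record, and `len(record["Exon"])` gives its exon count. No result is shown here because it depends on the Ensembl release being served.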
Next webinar – BioMart
Ensembl provides an extensive toolkit so that you can access our data or process your own. BioMart is the first Ensembl tool we will look into during this online series. It allows you to export data from Ensembl in table format or as a list of sequences, with no programming skills required. See you next week, same time.
Helen Sparrow
Course exercises
http://www.ebi.ac.uk/training/online/course/ensembl-browser-webinar-series-2016
This text will be replaced by a YouTube (link to YouKu too) video of the webinar and a pdf of the slides.
The “next page” will be the exercises
A link to exercises and their solutions will appear in the page hierarchy
Get help with the exercises
• Use the exercise solutions in the online course
• Join our Facebook group and discuss the exercises with everybody (see the online course for the link)
• Email us [email protected]
Connect with Ensembl
www.youtube.com/user/EnsemblHelpdesk
www.ensembl.org/info/genome/genebuild/index.html
Publications
Yates, A. et al. Ensembl 2016. Nucleic Acids Research. http://europepmc.org/articles/4702834
Xosé M. Fernández-Suárez and Michael K. Schuster. Using the Ensembl Genome Server to Browse Genomic Sequence Data. Current Protocols in Bioinformatics 1.15.1-1.15.48 (2010). www.ncbi.nlm.nih.gov/pubmed/20521244
Giulietta M Spudich and Xosé M Fernández-Suárez. Touring Ensembl: A practical guide to genome browsing. BMC Genomics 11:295 (2010). www.biomedcentral.com/1471-2164/11/295
http://www.ensembl.org/info/about/publications.html