Genes and Transcripts: Ensembl Online Webinar Series

Denise Carvalho-Silva, PhD, Ensembl Outreach team, European Molecular Biology Laboratory, European Bioinformatics Institute. Ensembl online training series 2016.

Upload: denise-carvalho-silva

Post on 11-Jan-2017



Course Objectives

• What is Ensembl?
• What type of data can you get in Ensembl?
• How to navigate the Ensembl browser website?
• How to connect with Ensembl

This online course

Date         Webinar topic                                                            Instructor
24th March   Introduction to Ensembl                                                  Emily Perry
31st March   Ensembl genes                                                            Denise Carvalho-Silva
7th April    Data export with BioMart                                                 Helen Sparrow
14th April   Variation data in Ensembl and the Ensembl VEP                            Denise Carvalho-Silva
21st April   Comparing genes and genomes with Ensembl Compara                         Helen Sparrow
28th April   Finding features that regulate genes: the Ensembl Regulatory Build       Emily Perry
5th May      Uploading your data to Ensembl and advanced ways to access Ensembl data  Ben Moore

http://www.ebi.ac.uk/training/events/2016/ensembl-online-training-series-2016

Our Polls: finding out more about you

• Previous webinar on the 24th March
  • Poll 1: Attendance
  • Poll 2: Exercises
• This webinar on the 31st March
  • Poll 3: Species of interest

Structure of this one-hour webinar

Presentation: What the Ensembl genes are and how we annotate them

Demo: View genes/transcripts

Exercises: On the Train online course

Questions?

• We’ve muted all the microphones
• Ask questions in the Chat box in the webinar interface
• My Ensembl colleagues will respond during my talk
• There’s no threading, so please respond with @username

Helen Sparrow, Ben Moore, Emily Perry

EBI is an Outstation of the European Molecular Biology Laboratory.

Ensembl Features

• Comparative Genomics
• Gene models
• Regulation
• Variation
• Custom data display
• Programmatic access
• Toolkit


Module 2: Ensembl Genes and Transcripts

Gene models in Ensembl

Goal: Generate a set of well-supported genes

Automatic (Ensembl automatic annotation)
• many species
• genome-wide at once
• ~4 months

Manual
• fewer species
• gene by gene
• many years

Automatic and coding (20_)

Manual and coding (00_)

Automatic + Manual (“gold”)

Manual and non-coding (00_)

Automatic annotation* Manual annotation*

* based on experimental, biological evidence (INSDC, UniProtKB…)

Ensembl genes & transcripts

• merged annotation
• higher confidence and quality
• comprehensive: alternatively spliced transcripts

Figure labels: UTR, exon, intron, 5’ UTR, 3’ UTR

Gold (identical annotation) = Automatic + Manual

CCDS project

• annotate a consensus coding DNA sequence set
• EBI, WTSI, UCSC and NCBI

Genome Res. 19:1316-23 (2009)

http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi

CCDS transcript

Alternatively spliced transcripts

rich and comprehensive annotation

Helping out with the decision

APPRIS: annotates splice isoforms (using reliable data) and selects a primary variant

http://appris.bioinfo.cnio.es/#/

How supported is a transcript model?

Full-length versus partial sequences to support splice junctions

Transcript Support Levels (TSL), Mark Diekhans (UCSC)

• GENCODE set
• cDNA/EST alignments
• Alignment over the full length of the transcript
• Scoring and categories

How supported is a transcript model?

TSL categories

• TSL1: at least one mRNA
• TSL2: multiple ESTs
• TSL3: single EST
• TSL4: EST is suspect
• TSL5: no mRNA/EST support
• TSLNA: not analysed*

*A transcript is not analysed if:
• it is a pseudogene
• it is in the MHC region (HLA, Ig, TCR)
• it is a single-exon transcript (to be included)

TSL1 > TSL2 > TSL3 > TSL4 > TSL5

well supported → poorly supported
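The ordering of the TSL categories can be sketched as a sort key in Python. This is only an illustration and not part of Ensembl: the helper names, the transcript names and the lowercase label spelling are assumptions made for the example, and placing the unanalysed category last is a choice made here, since the scheme does not rank it.

```python
# Hypothetical helper: rank transcripts by Transcript Support Level (TSL).
# A lower rank means better support, following TSL1 > TSL2 > ... > TSL5.
# Putting "tslna" last is an assumption of this sketch.
TSL_ORDER = ["tsl1", "tsl2", "tsl3", "tsl4", "tsl5", "tslna"]


def tsl_rank(label):
    """Return a sortable rank for a TSL label such as 'TSL1' or 'tslNA'."""
    key = label.strip().lower()
    if key not in TSL_ORDER:
        raise ValueError(f"unknown TSL label: {label!r}")
    return TSL_ORDER.index(key)


def sort_by_support(transcripts):
    """Sort (name, tsl_label) pairs from best to worst supported."""
    return sorted(transcripts, key=lambda pair: tsl_rank(pair[1]))


# Made-up transcript names, for illustration only.
example = [("tx_c", "TSL5"), ("tx_a", "TSL1"), ("tx_d", "TSLNA"), ("tx_b", "TSL3")]
ranked = sort_by_support(example)
print(ranked)
```

Sorting by rank keeps the comparison explicit, so a different policy for unanalysed transcripts only requires changing the order of the list.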

APPRIS and TSLs in Ensembl

http://www.ensembl.org/Help/Glossary?id=493
http://www.ensembl.org/Help/Glossary?id=492

APPRIS

TSLs

Disclaimer: Which transcript to use

No single method will tell us which transcript to use. Decide on a case-by-case basis:
• All transcripts OR one/two well-supported ones?

List of transcripts: we offer choices based on
• CCDS (Ensembl, HAVANA, NCBI, UCSC)
• Golden transcripts (identical Ensembl and HAVANA)
• Cross reference entries (e.g. UniProtKB, RefSeq)
• APPRIS
• TSLs
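One possible way to combine the criteria listed above into a shortlist is sketched below. It is not an Ensembl method: the record layout, every field name and the cut-off treating TSL1 and TSL2 as "well supported" are assumptions made for this example. In keeping with the disclaimer, it only counts how many criteria agree and leaves the final decision to the user.

```python
# Hypothetical record layout: every field name below is an assumption
# made for this sketch, not an Ensembl data format.
def support_count(tx):
    """Count how many of the listed criteria a transcript satisfies."""
    checks = [
        tx.get("ccds", False),              # in the CCDS set
        tx.get("golden", False),            # identical Ensembl and HAVANA models
        bool(tx.get("xrefs")),              # cross references, e.g. UniProtKB, RefSeq
        tx.get("appris_principal", False),  # chosen by APPRIS as primary variant
        tx.get("tsl") in ("tsl1", "tsl2"),  # assumed cut-off for "well supported"
    ]
    return sum(1 for ok in checks if ok)


def shortlist(transcripts, min_support=3):
    """Keep transcripts meeting at least `min_support` criteria, best first."""
    kept = [tx for tx in transcripts if support_count(tx) >= min_support]
    return sorted(kept, key=support_count, reverse=True)


# Made-up records, for illustration only.
candidates = [
    {"name": "tx_a", "ccds": True, "golden": True, "xrefs": ["xref_1"],
     "appris_principal": True, "tsl": "tsl1"},
    {"name": "tx_b", "golden": True, "xrefs": ["xref_2"], "tsl": "tsl2"},
    {"name": "tx_c", "tsl": "tsl5"},
]
chosen = [tx["name"] for tx in shortlist(candidates)]
print(chosen)
```

Setting `min_support=0` returns all transcripts, which corresponds to the "all transcripts" option on the slide.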

 

RNASeq and gene annotation

http://www.ensembl.org/info/genome/genebuild/rnaseq_annotation.html

ncRNA gene annotation

http://www.ensembl.org/info/genome/genebuild/ncrna.html

Ensembl stable identifiers

• ENSG########### Ensembl Gene ID
• ENST########### Ensembl Transcript ID
• ENSP########### Ensembl Peptide ID
• ENSE########### Ensembl Exon ID

• For non-human species, a species code is inserted after ENS:
  ENSMUSG: MUS (Mus musculus) for mouse
  ENSRNOG: RNO (Rattus norvegicus) for rat
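The identifier layout can be illustrated with a small parser. This is a minimal sketch that assumes only what the slide shows: the letters ENS, an optional three-letter species code, one feature letter (G, T, P or E) and eleven digits. The function name is hypothetical, the digits in the examples are arbitrary, and real identifiers may carry other species codes or feature letters that this pattern does not cover.

```python
import re

# Pattern assumed from the slide: "ENS", optional three-letter species code,
# one feature letter, then eleven digits.
STABLE_ID = re.compile(
    r"^ENS(?P<species>[A-Z]{3})?(?P<feature>[GTPE])(?P<number>\d{11})$"
)
FEATURES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}
# Only the species named on the slide; no code means human.
SPECIES = {None: "human", "MUS": "mouse", "RNO": "rat"}


def parse_stable_id(stable_id):
    """Split a stable ID into species, feature type and number, or return None."""
    match = STABLE_ID.match(stable_id)
    if match is None:
        return None
    code = match.group("species")
    return {
        "species": SPECIES.get(code, f"unknown species code {code}"),
        "feature": FEATURES[match.group("feature")],
        "number": match.group("number"),
    }


# Arbitrary digits, for illustration only.
print(parse_stable_id("ENSG00000012345"))
print(parse_stable_id("ENSMUSG00000012345"))
```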

Live demo

The ESPN gene products are active in the inner ear, where they appear to play an essential role in normal hearing and balance.

Let’s explore ESPN

A) How can I find the genomic sequence of this gene? What is the ID of its first exon?

B) Can I display the genomic coordinates and variants on this sequence?

C) Can I find information on the expression of this gene in different tissues?

Human ESPN: gene tab

A) How many exons does the longest ESPN transcript have? Are there any completely untranslated exons?

B) Can I find its cDNA sequence?

C) What are the UniProt and RefSeq entries cross referenced to this transcript?

Human ESPN: transcript tab
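The demo answers these questions in the browser. As a rough programmatic counterpart, the sketch below builds a lookup request for a gene symbol and counts exons per transcript. It assumes the public Ensembl REST server at rest.ensembl.org and its lookup-by-symbol endpoint with `expand=1`; the response field names (`Transcript`, `Exon`, `id`) are assumptions about the JSON layout and should be checked against the REST documentation. The function names and the toy record are made up for illustration, and no network request is made unless `fetch_gene` is called.

```python
import json
import urllib.parse
import urllib.request

REST_SERVER = "https://rest.ensembl.org"  # assumed public REST server


def lookup_url(symbol, species="homo_sapiens"):
    """Build a lookup-by-symbol URL that also asks for transcripts and exons."""
    path = f"/lookup/symbol/{urllib.parse.quote(species)}/{urllib.parse.quote(symbol)}"
    query = urllib.parse.urlencode({"expand": 1})
    return f"{REST_SERVER}{path}?{query}"


def fetch_gene(symbol):
    """Download the gene record as JSON (requires network access)."""
    request = urllib.request.Request(
        lookup_url(symbol), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def exon_counts(gene):
    """Map each transcript ID to its number of exons (assumed JSON layout)."""
    return {tx["id"]: len(tx.get("Exon", [])) for tx in gene.get("Transcript", [])}


# Toy record with made-up IDs, shaped like the assumed response.
toy_gene = {
    "id": "GENE_X",
    "Transcript": [
        {"id": "TX_1", "Exon": [{"id": "EX_1"}, {"id": "EX_2"}, {"id": "EX_3"}]},
        {"id": "TX_2", "Exon": [{"id": "EX_1"}]},
    ],
}
print(lookup_url("ESPN"))
print(exon_counts(toy_gene))
```

With network access, `exon_counts(fetch_gene("ESPN"))` would give the per-transcript exon counts to compare with the transcript table shown in the browser.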

Next webinar: BioMart

Ensembl provides an extensive toolkit so that you can access our data or process your own. BioMart is the first Ensembl tool we will look into during this online series. It allows you to export data from Ensembl as a table or a list of sequences, with no programming skills required. See you next week, same time.

Helen Sparrow

Course exercises

http://www.ebi.ac.uk/training/online/course/ensembl-browser-webinar-series-2016

This text will be replaced by a YouTube (link to YouKu too) video of the webinar and a pdf of the slides.

The “next page” will be the exercises

A link to exercises and their solutions will appear in the page hierarchy

Get help with the exercises

• Use the exercise solutions in the online course
• Join our Facebook group and discuss the exercises with everybody (see the online course for the link)
• Email us [email protected]

Connect with Ensembl


[email protected]  

www.youtube.com/user/EnsemblHelpdesk
www.ensembl.org/info/genome/genebuild/index.html

 

Publications

Yates, A. et al. Ensembl 2016. Nucleic Acids Research. http://europepmc.org/articles/4702834

Xosé M. Fernández-Suárez and Michael K. Schuster. Using the Ensembl Genome Server to Browse Genomic Sequence Data. Current Protocols in Bioinformatics 1.15.1-1.15.48 (2010). www.ncbi.nlm.nih.gov/pubmed/20521244

Giulietta M. Spudich and Xosé M. Fernández-Suárez. Touring Ensembl: A practical guide to genome browsing. BMC Genomics 11:295 (2010). www.biomedcentral.com/1471-2164/11/295

http://www.ensembl.org/info/about/publications.html

Ensembl 2015

Acknowledgements

The Entire Ensembl Team

Funding

Co-funded by the European Union