How to access genomic information using Ensembl Damian Smedley and Xosé Fernández Ensembl Project European Bioinformatics Institute Cambridge, UK November 2004


How to access genomic information using Ensembl

Damian Smedley and Xosé Fernández

Ensembl Project

European Bioinformatics Institute, Cambridge, UK

November 2004


Schedule

Today

Introduction to the Ensembl system

Hands-on examples to introduce the system

Evaluating genes and transcripts

Variation in Ensembl (SNPs, haplotypes)

Tomorrow

Data mining with EnsMart

Comparative genomics and proteomics in Ensembl

BioMart

Advanced topics (Upload your own data, DAS)


Our goal


Assembly

From 325,109 initial contigs to 26,720 overlapping clones

Other ordering data

Non-redundant, “virtual contig” view
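
The step from overlapping clones to a non-redundant "virtual contig" view can be illustrated with a small sketch. This is only a toy model under stated assumptions: clones are given as (name, start, end) positions on one chromosome, the clone names and coordinates are invented, and the function `tiling_path` is hypothetical rather than part of Ensembl.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]    # (name, chromosome start, chromosome end), 1-based inclusive
Segment = Tuple[str, int, int]  # (clone name, start, end) of the stretch actually used


def tiling_path(clones: List[Clone]) -> List[Segment]:
    """Pick a non-redundant set of segments from overlapping clones.

    Clones are visited in order of start position; each one contributes
    only the stretch that extends beyond what is already covered.
    """
    path: List[Segment] = []
    covered_to = 0  # last chromosome position already covered
    for name, start, end in sorted(clones, key=lambda c: (c[1], c[2])):
        if end <= covered_to:
            continue  # clone lies entirely inside sequence we already have
        use_from = max(start, covered_to + 1)
        path.append((name, use_from, end))
        covered_to = end
    return path


# Invented toy data: three overlapping clones and one fully contained clone.
clones = [
    ("cloneA", 1, 150),
    ("cloneB", 100, 260),
    ("cloneC", 120, 200),  # already covered by cloneA + cloneB
    ("cloneD", 250, 400),
]

for name, start, end in tiling_path(clones):
    print(f"{name}\t{start}\t{end}")
```

Each base of the region is then represented exactly once, which is the property the "virtual contig" view relies on; a real assembly also has to handle gaps, orientation and sequence quality, which this sketch ignores.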


Mapping and Sequencing the human genome

[Figure: mapping and sequencing workflow. Labels: fragment; BACs (bacterial artificial chromosomes, avg size 150 kb); map; fragment; pUCs (avg size 2-4 kb); WGS; sequence assembly; draft; finished BAC]

Citations on the figure: Shizuya et al 1992; Dib et al 1996; Deloukas et al 1998; Osoegawa et al 2001; Bentley et al 2001; Bruls et al 2001; McPherson et al 2001; Montgomery et al 2001; Tilford et al 2001


Status of the human sequence

finished (red/orange): ~96% (99.999% accurate)

heterochromatin (grey): ~4%

30-40% repetitive elements (e.g. Alpha satellite, Alu repeats)

All known genes, correctly identified (99.74%)

Assembled draft sequence totals 2.85 Gb


Human genome: Current status

• 22,287 ‘gene loci’ defined, consisting of 19,599 protein-coding genes in the human genome and 2,188 additional DNA segments ‘predicted’ to be protein-coding genes

– 1,183 genes ‘were born’ in the last 60-100 My

– ~30 genes ‘died’ in a similar time period

Finishing the euchromatic sequence of the human genome, Nature 431:931-45 (2004)


Ensembl - project aims

• funded to provide metazoan genomes to the world

• aims to provide the world’s best automated genome annotation

• a leading group for human and mouse analysis

• all software, data and results freely available


Ensembl - project background

• group split between EBI and Sanger

• mainly Wellcome Trust funded

• largest dedicated compute in biology in Europe

• developer community > 100 people, including companies


Ensembl – Open source

Freely available. Community development.

– >51 Ensembl installs worldwide.

– Both public and commercial, e.g. Gramene (CSHL), Fugu-sg (IMCB), Ciona-sg (Temasek)


Ensembl

[Figure: Ensembl system overview. Labels: Analysis DB, CPU, Final DB, Supporting Databases, SNP, Manual Annotation]


Genome browsing: why present the whole genome?

• Explore what is in a chromosome region

• See features in and around a specific gene

• Search & retrieve across the whole genome

• Investigate genome organization

• Compare to other genomes


Genome browsers

• Ensembl – public site + installable system: http://www.ensembl.org

• NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview

• UCSC Human Genome Browser: http://genome.ucsc.edu


Introduction to the Ensembl web site

Ensembl …

… takes genomic sequence assemblies: human build 34, mouse, rat, Fugu, mosquito

… adds annotation and links: automated process

… presents all the data on a web site


Annotation: genes

Known genes

• where?

• genomic structure?

• transcript(s)?

• protein(s)?

• orthologues?

• attach useful links

Novel genes

• how to predict? require evidence

• transcript(s)?

• protein(s)?

• orthologues?

• attach useful links


Annotation: other features

• markers and SNPs

• cytogenetic bands

• repeated sequences

• ESTs & other sequence records: where do they show sequence similarity?

• regions homologous to other species


How to get started … …

• Species homepage

• Site map

• Map View

• Text search

• BLAST

• SSAHA

• Disease View


Homepage


Site map


MapView

AnchorView


BLAST and SSAHA



Regions, maps and markers

MarkerView

SNPView

ContigView

CytoView

SyntenyView

MultiContigView


Ensembl ContigView


ContigView close-up

Evidence

Transcripts: red & black (Ensembl predictions), blue (Vega)

Customising & short cuts

Pop-up menu


ContigView - Chromosome 20 close-up

Manual annotation via Vega

Ensembl predictions

Ensembl EST-based predictions

Forward strand

Reverse strand

Other chromosomes with manual annotation from http://vega.sanger.ac.uk: 6, 7, 9, 10, 13, 14, 20, 22, X
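
ContigView draws features on both the forward and the reverse strand. As a reminder of how the two relate, the sequence read along the reverse strand is the reverse complement of the forward-strand sequence. A minimal sketch, assuming plain DNA letters (A, C, G, T, N); the function name and the example sequence are invented for illustration.

```python
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read 5'->3' on the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


forward = "ATGGCCTTAG"  # invented forward-strand fragment
print(reverse_complement(forward))
```

Applying the function twice returns the original sequence, which is a quick sanity check when converting coordinates or sequences between strands.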


CytoView


GeneSNPView


MarkerView

SNPView


SyntenyView


MultiContigView


Genes & gene products

GeneView

TransView

ExonView

ProteinView

FamilyView

DomainView

GOView

DiseaseView


Ensembl GeneView


TransView ExonView


ProteinView


FamilyView


GOView


DiseaseView


Data retrieval

EnsMart

Data sets on ftp site

MySQL queries of databases

Perl API access to databases

ExportView
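
Of the retrieval routes listed above, direct MySQL queries are the easiest to sketch in a few lines. The example below is a hedged illustration, not an official recipe: it assumes the public server `ensembldb.ensembl.org` with the read-only `anonymous` user, the default MySQL port, the `<species>_core_...` database naming convention, and the third-party `pymysql` driver. The helper names `core_db_pattern` and `list_core_databases` are hypothetical. It only lists database names, so it makes no assumptions about table layout.

```python
from typing import List


def core_db_pattern(species: str) -> str:
    """Build a SQL LIKE pattern for a species' core databases,
    e.g. 'Homo sapiens' -> 'homo_sapiens_core%'."""
    return f"{species.strip().lower().replace(' ', '_')}_core%"


def list_core_databases(
    species: str,
    host: str = "ensembldb.ensembl.org",  # assumed public server
    user: str = "anonymous",              # assumed read-only account
    port: int = 3306,                     # assumed port; may differ per release
) -> List[str]:
    """Return the names of core databases for a species on the server."""
    import pymysql  # third-party driver, imported here so the helper above works without it

    conn = pymysql.connect(host=host, user=user, port=port)
    try:
        with conn.cursor() as cur:
            cur.execute("SHOW DATABASES LIKE %s", (core_db_pattern(species),))
            return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()


# Usage (needs network access and pymysql installed):
#   for name in list_core_databases("Homo sapiens"):
#       print(name)
```

Once a database name is known, the same connection can be reused for ordinary SQL queries; for anything beyond simple lookups, EnsMart or the Perl API is usually the more convenient route.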


ExportView


EnsMart


Mouse differences

• Genomic sequence assembly based on whole genome shotgun, with finished ‘stitched’ BACs

• BACs are shown in CytoView (FPC map), but for most no sequence is available


Mouse CytoView


Help!

• context sensitive help pages - click

• access other documentation via generic home page

• email the helpdesk: HelpDesk / Suggestions


Thanks

Ensembl Team


Database Schema and Core API

Arne Stabenau

Yuan Chen

Ian Longden

Craig Melsopp

Glenn Proctor

Daniel Ríos

Guy Slater

Distributed Annotation System

Andreas Kähäri

Project Leader

Ewan Birney (EBI)

Tim Hubbard (Sanger)

Ensembl Web Team

James Stalker

Fiona Cunningham

James Smith

Vega Web Team

Patrick Meidl

Steve Trevanion

Analysis and Annotation Pipeline

Val Curwen

Steve Searle

Dan Andrews

Mario Caccamo

Laura Clarke

Martin Hammond

Jan-Hinnerk Vogel

Kevin Howe

Vivek Iyer

Kerstin Jekosch

Felix Kokocinski

Simon White

User Support

Xosé Mª Fernández

Michael Schuster

Comparative Genomics

Abel Ureta-Vidal

Javier Herrero Sánchez

Jessica Severin

Cara Woodwark

EnsMart & BioMart

Arek Kasprzyk

Damian Keefe

Darin London

Damian Smedley

Ensembl Team

November 2004