
EBI is an Outstation of the European Molecular Biology Laboratory.

GOA: Looking after GO annotations

Emily Dimmer

Gene Ontology Annotation (GOA) Database

European Bioinformatics Institute

Cambridge

UK


2 EMBRACE Workshop 7-9th November 2007

http://www.geneontology.org

Reactome

E. coli hub


Gene Ontology Annotation (GOA) Database

• Member of the GO Consortium since 2001

• Largest open-source contributor of annotations to GO

• Provides annotation for more than 139,000 species

• GOA’s priority is to annotate the human proteome

• GOA is responsible for human, chicken and bovine annotations in the GO Consortium


GOA Group

[email protected]

EMBL-EBI

Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

GOA office


GOA Group

• Emily Dimmer (GOA coordinator)

• Evelyn Camon (senior GOA curator)

• Rachael Huntley (GOA curator)

• Daniel Barrell (GOA file releases & database)

• David Binns (QuickGO, protein2go tools)

Along with the help of UniProt curators at the EBI, UniProt controlled vocabularies, HAMAP group, InterPro group, IntAct curators, the IPI group, Ensembl, other EBI groups

…and of course the GO editors and the other GO Consortium annotation groups


How does GOA annotate to the GO?

Electronic Annotation

Manual Annotation

• Both these methods have their advantages

• They can be easily distinguished by the evidence code used.


Status of GOA Annotation

October 2007 Stats

Evidence Source          Annotations   Proteins    UniProt coverage
Electronic annotations   22,774,674    3,362,148   63.7 %
Manual annotations       450,489       86,778      1.6 %

• Annotations provided to over 140,000 taxa

• Total of 415,576 PubMed references included as evidence.

• Manual annotations integrated from external model organism and multi-species databases: AgBase, DictyBase, Ensembl, FlyBase, GDB, GeneDB (S. pombe), Gramene, HGNC, MGI, Reactome, RGD, Roslin, SGD, TAIR, TIGR, WormBase, ZFIN, the IntAct protein-protein interaction database, LIFEdb and the Proteome Inc dataset


Core information needed for a GO annotation

1. Gene or gene product identifier, e.g. Q9ARH1

2. GO term ID, e.g. GO:0004674 (protein serine/threonine kinase activity)

3. Reference ID, e.g. PubMed ID: 12374299 or GO_REF:0000001

4. Evidence code, e.g. IDA

...and also in some cases:

- Qualifiers available to modify interpretation of annotation: NOT, contributes_to, colocalizes_with

- ‘With’ column information, to provide further information on the method (evidence code)
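The fields listed above can be pictured as a small record. The following is a minimal sketch in Python; the names `GOAnnotation` and `ALLOWED_QUALIFIERS` are hypothetical and chosen for illustration, not taken from any GOA tool.

```python
from dataclasses import dataclass
from typing import Optional

# Qualifier values named on the slide; the constant name is hypothetical.
ALLOWED_QUALIFIERS = {"NOT", "contributes_to", "colocalizes_with"}


@dataclass(frozen=True)
class GOAnnotation:
    """One GO annotation: four core fields plus two optional ones."""
    object_id: str                    # gene or gene product identifier, e.g. Q9ARH1
    go_id: str                        # GO term ID, e.g. GO:0004674
    reference: str                    # PMID:... or GO_REF:...
    evidence: str                     # evidence code, e.g. IDA
    qualifier: Optional[str] = None   # optional, modifies the interpretation
    with_info: Optional[str] = None   # optional, extra detail on the method

    def __post_init__(self):
        # Reject qualifiers that are not in the allowed set.
        if self.qualifier is not None and self.qualifier not in ALLOWED_QUALIFIERS:
            raise ValueError(f"unknown qualifier: {self.qualifier}")


# The PERK1 example used in this talk.
perk1 = GOAnnotation("Q9ARH1", "GO:0004674", "PMID:12374299", "IDA")
print(perk1)
```

The two optional fields default to `None` because most annotations carry neither a qualifier nor 'With' information.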


Electronic Annotation

• A number of different techniques used by different GO Consortium annotation groups.

• All resulting annotations must be high-quality and provide an explanation of the method (GO_REF)

1. Mapping of external concepts to GO terms

2. Automatic transfer of annotations to orthologs


Electronic annotation: GO mappings

Fatty acid biosynthesis (SwissProt keyword) -> GO: fatty acid biosynthesis (GO:0006633)

EC:6.4.1.2 (EC number) -> GO: acetyl-CoA carboxylase activity (GO:0003989)

IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) -> GO: acetyl-CoA carboxylase activity (GO:0003989)

MF_00527: Putative 3-methyladenine DNA glycosylase (HAMAP) -> GO: DNA repair (GO:0006281)

Camon et al. BMC Bioinformatics. 2005; 6 Suppl 1:S17
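The mapping approach on this slide can be sketched as a lookup table from external concepts to GO terms. This is a rough illustration in Python, not GOA's pipeline: the key prefixes (`SP_KW`, `HAMAP`), the function name and the protein identifier are assumptions, and only the four example mappings from the slide are included.

```python
# External concept -> (GO ID, GO term name), taken from the slide's examples.
# The key prefixes are assumed labels for this sketch.
EXTERNAL_TO_GO = {
    "SP_KW:Fatty acid biosynthesis": ("GO:0006633", "fatty acid biosynthesis"),
    "EC:6.4.1.2": ("GO:0003989", "acetyl-CoA carboxylase activity"),
    "InterPro:IPR000438": ("GO:0003989", "acetyl-CoA carboxylase activity"),
    "HAMAP:MF_00527": ("GO:0006281", "DNA repair"),
}


def electronic_annotations(protein_id, cross_refs):
    """Return (protein, GO ID, evidence code, source) for every mapped cross-reference."""
    annotations = []
    for xref in cross_refs:
        if xref in EXTERNAL_TO_GO:
            go_id, _name = EXTERNAL_TO_GO[xref]
            # Mapping-derived annotations carry the IEA evidence code;
            # the source concept is kept so the method stays traceable.
            annotations.append((protein_id, go_id, "IEA", xref))
    return annotations


# "EXAMPLE_PROTEIN" and the Pfam entry are hypothetical inputs.
result = electronic_annotations(
    "EXAMPLE_PROTEIN", ["EC:6.4.1.2", "InterPro:IPR000438", "Pfam:PF00000"]
)
for row in result:
    print(row)
```

Cross-references with no mapping are skipped, so a protein only receives a term when one of its existing annotations has been mapped to GO.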

http://www.geneontology.org/GO.indices.shtml

Automatic transfer of annotations to orthologs

Ensembl COMPARA (http://www.ensembl.org/info/data/compara)

• Homologies between different species calculated

• GO terms projected from MANUAL annotation only (IDA, IEP, IGI, IMP, IPI)

• Only one-to-one and apparent one-to-one orthologies are used.

[Figure: projection of annotations between species. Species shown: Human, Mouse, Rat, Chimpanzee, Macaque, Guinea Pig, Dog, Chicken, Xenopus, Zebrafish, Tetraodon, Fugu, Drosophila, Anopheles, Aedes aegypti]
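The two filters described on this slide (experimental evidence only, one-to-one orthologies only) can be sketched as follows. This is an illustrative outline in Python, not the Ensembl Compara or GOA code: the orthology labels, the gene names and the choice to tag projected annotations as IEA are assumptions.

```python
# Evidence codes the slide lists as eligible for projection.
EXPERIMENTAL_CODES = {"IDA", "IEP", "IGI", "IMP", "IPI"}

# Orthology labels are assumed names for "one-to-one" and "apparent one-to-one".
ALLOWED_ORTHOLOGY = {"one2one", "apparent_one2one"}


def project(annotations, orthologs):
    """Project GO terms from source genes onto their orthologs.

    annotations: list of (source_gene, go_id, evidence_code)
    orthologs:   list of (source_gene, target_gene, orthology_type)
    """
    # Keep only the orthology relationships the slide allows.
    targets = {}
    for src, tgt, kind in orthologs:
        if kind in ALLOWED_ORTHOLOGY:
            targets.setdefault(src, []).append(tgt)

    projected = []
    for src, go_id, evidence in annotations:
        if evidence not in EXPERIMENTAL_CODES:
            continue  # only experimentally supported manual annotations are projected
        for tgt in targets.get(src, []):
            # Assumption: a projected annotation is electronic (IEA) and
            # records the source gene it was transferred from.
            projected.append((tgt, go_id, "IEA", src))
    return projected


# Hypothetical genes and relationships, for illustration only.
annotations = [
    ("human_geneA", "GO:0004674", "IDA"),
    ("human_geneA", "GO:0005515", "TAS"),
    ("human_geneB", "GO:0006281", "IMP"),
]
orthologs = [
    ("human_geneA", "mouse_geneA", "one2one"),
    ("human_geneB", "mouse_geneB", "one2many"),
]
projected = project(annotations, orthologs)
print(projected)
```

In this example only the IDA annotation of `human_geneA` is transferred: the TAS annotation fails the evidence filter and `human_geneB` fails the orthology filter.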


Manual Annotation

• High-quality, specific annotations made using:

  • Peer-reviewed papers

  • A range of evidence codes to categorize the types of evidence found in a paper

• Very time consuming and requires trained biologists


Finding Annotations

"In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity. In addition, the location of a PERK1-GFP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…"

Phrases highlighted in the text: serine/threonine kinase activity; integral membrane protein; wound response

Resulting annotations for B. napus PERK1 protein (Q9ARH1), PubMed ID: 12374299:

FUNCTION    protein serine/threonine kinase activity   GO:0004674
COMPONENT   integral to plasma membrane                GO:0005887
PROCESS     response to wounding                       GO:0009611


Evidence Codes

IEA Inferred from Electronic Annotation

IDA Inferred from Direct Assay

IMP Inferred from Mutant Phenotype

IPI Inferred from Physical Interaction

IEP Inferred from Expression Pattern

IGI Inferred from Genetic Interaction

ISS* Inferred from Sequence or Structural Similarity

IGC Inferred from Genomic Context

RCA Reviewed Computational Analysis

TAS Traceable Author Statement

NAS Non-traceable Author Statement

IC Inferred by Curator

ND No Data available

IDA:

• Enzyme assays

• In vitro reconstitution

• Immunofluorescence

• Cell fractionation

TAS:

• The literature source cites the original experiments it refers to.
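Because the evidence code distinguishes electronic from manual annotation, the split can be computed from the code alone. The sketch below, in Python, treats IEA as the only electronic code and every other code on this slide as curator-assigned; the function and constant names are hypothetical.

```python
# Codes listed on this slide. IEA is the only one not assigned by a curator.
ELECTRONIC_CODES = {"IEA"}
MANUAL_CODES = {
    "IDA", "IMP", "IPI", "IEP", "IGI", "ISS",
    "IGC", "RCA", "TAS", "NAS", "IC", "ND",
}


def annotation_method(evidence_code):
    """Classify an annotation as 'electronic' or 'manual' from its evidence code."""
    if evidence_code in ELECTRONIC_CODES:
        return "electronic"
    if evidence_code in MANUAL_CODES:
        return "manual"
    raise ValueError(f"unknown evidence code: {evidence_code}")


print(annotation_method("IEA"))
print(annotation_method("IDA"))
```

Raising on an unknown code, rather than guessing, keeps a typo in an annotation file from being silently counted in either group.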


Core information needed for a GO annotation

1. Gene or gene product identifier, e.g. Q9ARH1

2. GO term ID, e.g. GO:0004674 (protein serine/threonine kinase activity)

3. Reference ID, e.g. PubMed ID: 12374299 or GO_REF:0000001

4. Evidence code, e.g. IDA

...and also in some cases:

- Qualifiers available to modify interpretation of annotation: NOT, contributes_to, colocalizes_with

- ‘With’ column information, to provide further information on the method (evidence code)


The ‘Qualifier’ Column

The Qualifier column is used to modify the interpretation of an annotation.

Allowable values are: NOT, colocalizes_with, contributes_to


The ‘NOT’ qualifier

• 'NOT' is used to make an explicit note that the gene product is not associated with the GO term.

• This is particularly important when an association between a GO term and a gene product should be avoided but might otherwise be made, especially by an automated method.

• Also used to document conflicting claims in the literature.

• NOT can be used with ALL three GO Ontologies.

e.g. This protein does not have ‘kinase activity’ because it has been found to have a disrupted or missing ‘ATP binding’ domain.


The ‘colocalizes_with’ qualifier

Only used with GO Component Ontology

• Gene products that are transiently or peripherally associated with an organelle or complex may be annotated to the relevant cellular component term, using the 'colocalizes_with' qualifier.


The ‘contributes_to’ qualifier

Only used with GO Function Ontology

• Used where an individual gene product that is part of a complex can be annotated to terms that describe the action (function or process) of the whole complex, i.e. annotating 'to the potential of the complex'.

• Distinguishes an individual subunit from complex functions.

• All gene products annotated using 'contributes_to' must also be annotated to a cellular component term representing the complex that possesses the activity.
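The ontology restrictions stated on the three qualifier slides can be collected into one check. A minimal sketch in Python, assuming the single-letter aspect codes F (Function), P (Process) and C (Component); the table and function names are hypothetical.

```python
# Which ontologies (aspects) each qualifier may be used with, per the slides:
# NOT with all three, colocalizes_with with Component only,
# contributes_to with Function only.
QUALIFIER_ASPECTS = {
    "NOT": {"F", "P", "C"},
    "colocalizes_with": {"C"},
    "contributes_to": {"F"},
}


def qualifier_allowed(qualifier, aspect):
    """Return True if the qualifier may be used with the given aspect."""
    if qualifier not in QUALIFIER_ASPECTS:
        raise ValueError(f"unknown qualifier: {qualifier}")
    return aspect in QUALIFIER_ASPECTS[qualifier]


print(qualifier_allowed("NOT", "P"))
print(qualifier_allowed("colocalizes_with", "F"))
print(qualifier_allowed("contributes_to", "F"))
```

This check does not cover the further rule that a 'contributes_to' annotation needs a companion cellular component annotation for the complex; that would require looking at all annotations for the gene product together.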


Where does GOA data go?


etc.

QuickGO browser:

http://www.ebi.ac.uk/quickgo

Human Insulin Receptor (P06213)…


GO data in Ensembl


GOA data in Entrez Gene

http://amigo.geneontology.org/cgi-bin/amigo/go.cgi


Gene Association Files

Tab delimited files: http://www.geneontology.org/GO.current.annotations.shtml

Columns 1-8:

DB       DB_Object_ID  DB_Object_Symbol  Qualifier*  GO_id       DB:Ref          Evidence  With*
UniProt  Q9H2K8        TAOK3_HUMAN                   GO:0004674  PMID:10559204   IDA
UniProt  O00110        O00110_HUMAN                  GO:0003676  GO_REF:0000002  IEA       InterPro:IPR007087
UniProt  P09884        DPOLA_HUMAN       NOT         GO:0000731  PMID:1730053    IMP
UniProt  P09936        UCHL1_HUMAN                   GO:0005515  PMID:12082530   IPI       UniProt:P46527

Columns 9-15 (same four rows):

Aspect  DB_Object_Name*              DB_Object_Synonym*  DB_Object_Type  Taxon       Date      Assigned By
F       Serine/threonine-protein..   IPI00410485         protein         taxon:9606  20070720  HGNC
F                                                        protein         taxon:9606  20070720  UniProt
P       DNA polymerase alpha..       IPI00220317         protein         taxon:9606  20060825  UniProt
F       UCHL1: Ubiquitin carboxyl..  IPI00018352         protein         taxon:9606  20070720  IntAct

* = optional field
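Since the files are tab delimited with a fixed column order, one line can be read into a record by position. The following is a minimal sketch in Python that uses the 15 column names shown above; the function name is hypothetical, and the example row is the DPOLA_HUMAN line from this slide with its truncated name kept as shown.

```python
# Column names in file order, as listed on the slide.
GAF_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_id",
    "DB:Ref", "Evidence", "With", "Aspect", "DB_Object_Name",
    "DB_Object_Synonym", "DB_Object_Type", "Taxon", "Date", "Assigned_By",
]


def parse_gaf_line(line):
    """Parse one gene association line into a dict; return None for comments/blanks."""
    if not line.strip() or line.startswith("!"):
        return None  # header/comment lines start with "!"
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GAF_COLUMNS):
        raise ValueError(f"expected {len(GAF_COLUMNS)} columns, got {len(fields)}")
    # Optional fields are present but empty, so positions never shift.
    return dict(zip(GAF_COLUMNS, fields))


example = "\t".join([
    "UniProt", "P09884", "DPOLA_HUMAN", "NOT", "GO:0000731",
    "PMID:1730053", "IMP", "", "P", "DNA polymerase alpha..",
    "IPI00220317", "protein", "taxon:9606", "20060825", "UniProt",
])
record = parse_gaf_line(example)
print(record["DB_Object_ID"], record["Qualifier"], record["GO_id"], record["Evidence"])
```

Any code that consumes these records should check the Qualifier field first: a row marked NOT states that the gene product is not associated with the term.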


http://www.geneontology.org/GO.current.annotations.shtml


ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/

http://www.ebi.ac.uk/GOA/downloads.html


Output from the GOA database

Non-Redundant: based on IPI (International Protein Index)

Cow

ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/

Redundant

625 proteome sets



… annotations are also displayed in:

• All GO Consortium Model Organism Databases integrate and exchange GO annotation data to ensure a comprehensive set of annotations for their organism/area of interest.

• Array Products and data analysis

Affymetrix

Spotfire

Almac


… and Numerous Third Party Tools

(http://www.geneontology.org/GO.tools.shtml)


What’s new on the GO annotation front?


Reference Genomes

Arabidopsis thaliana, Caenorhabditis elegans, Danio rerio (zebrafish), Dictyostelium discoideum, Drosophila melanogaster, Escherichia coli, Homo sapiens, Saccharomyces cerevisiae, Mus musculus, Schizosaccharomyces pombe, Gallus gallus, Rattus norvegicus

• Comprehensive annotation of a set of conserved pathway and disease-related proteins in human and orthologs in 11 other selected genomes

• Empowers comparative methods used in first pass annotation of other proteomes.

E. coli hub


GOA annotation focuses

Cardiovascular GO annotation

Grant with the British Heart Foundation to support a collaboration with HGNC curators to provide full Gene Ontology annotation to genes associated with cardiovascular processes.

wiki: http://wiki.geneontology.org/index.php/Cardiovascular

Immune GO annotation

Interest in actively GO annotating immune-relevant genes. GOA, UCL and MGI are collaborating to improve annotation for immunologically important genes; WT grant pending.

wiki: http://wiki.geneontology.org/index.php/Immunology


Electronic Annotation developments

New mappings:

• Swiss-Prot Subcellular Location to GO (just released)

• Swiss-Prot UniPathway

Expansion of existing methods

• Ensembl Compara species expansion


Acknowledgements

The Gene Ontology Consortium and 1.5 members of GOA are currently supported by a P41 grant from the National Human Genome Research Institute (NHGRI) [grant HG002273]. GOA is also supported by core EMBL funding and a BBSRC Tools and Resources grant.

Rolf Apweiler, Head of the EBI protein sequence database group

Emily Dimmer

Evelyn Camon

Rachael Huntley

Daniel Barrell

David Binns

Contact the GOA team: [email protected]

GOA web page: http://www.ebi.ac.uk/goa