Transcript
Page 1: The Gene Ontology & Gene Ontology Annotation resources

The Gene Ontology and Gene Ontology Annotation resources

Mélanie Courtot, Ph.D.
EMBL-EBI
GO/GOA Project leader
SPOT/UniProt content [email protected]

Industry workshop
March 17, 2016

Page 2: The Gene Ontology & Gene Ontology Annotation resources

In 1999, collaboration between 3 Model Organism Databases

Ashburner et al., Nat Genet. 2000 May;25(1):25-9.

Page 3: The Gene Ontology & Gene Ontology Annotation resources

• A way to capture biological knowledge for individual gene products in a written and computable form

• A set of concepts and their relationships to each other, arranged as a hierarchy

http://www.ebi.ac.uk/QuickGO

Less specific concepts

More specific concepts

The Gene Ontology
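The "less specific / more specific" arrangement can be pictured as a graph walk. The sketch below is a minimal illustration only: the `PARENTS` table is a simplified toy built from term names used in this talk, and its edges are an assumption for the example, not the actual relationships stored in GO.

```python
# Toy parent links (assumption for illustration, not real GO edges).
PARENTS = {
    "mitochondrial matrix": ["mitochondrion"],
    "mitochondrial inner membrane": ["mitochondrion"],
    "mitochondrion": ["cellular component"],
    "cellular component": [],
}


def ancestors(term, parents):
    """Return every less specific concept reachable from `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("mitochondrial matrix", PARENTS)))
```

Walking upward from a specific term collects all of its less specific concepts, which is the property that tools such as QuickGO rely on when browsing the ontology.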

Page 4: The Gene Ontology & Gene Ontology Annotation resources

1. Molecular Function: an elemental activity or task or job
• protein kinase activity
• insulin receptor activity

2. Biological Process: a commonly recognized series of events
• cell division

3. Cellular Component: where a gene product is located
• mitochondrion
• mitochondrial matrix
• mitochondrial inner membrane

Page 5: The Gene Ontology & Gene Ontology Annotation resources

Aims of the GO project

Develop the ontology

Annotate gene products using ontology terms

Provide a public resource of data and tools

Page 6: The Gene Ontology & Gene Ontology Annotation resources

Develop the ontology
• An OWL ontology of >41,000 classes
• biological process, cellular component, molecular function
• >14,000 imported classes (CL, Uberon, ChEBI, NCBI_tax)
• >136,000 logical axioms, including:
  • ~72,000 subClassOf axioms between named GO classes
  • ~41,000 simple existential restrictions (subClassOf R some C)
• EL expressivity => fast, scalable reasoning (with ELK)

https://www.cs.ox.ac.uk/isg/tools/ELK/

Page 7: The Gene Ontology & Gene Ontology Annotation resources

Building the GO
• The GO editorial team
• Submission via GitHub, https://github.com/geneontology/
• Submissions via TermGenie, http://go.termgenie.org
• ~80% of terms are now created this way

Page 8: The Gene Ontology & Gene Ontology Annotation resources

Annotate gene products

gene -> GO term

associated genes

GO

Database

genome and protein databases

Page 9: The Gene Ontology & Gene Ontology Annotation resources
Page 10: The Gene Ontology & Gene Ontology Annotation resources

A GO annotation is …

…a statement that a gene product:

Accession  Name  GO ID       GO term name                     Reference     Evidence code
P00505     GOT2  GO:0004069  aspartate transaminase activity  PMID:2731362  IDA

Page 11: The Gene Ontology & Gene Ontology Annotation resources

A GO annotation is …

…a statement that a gene product:

1. has a particular molecular function, or is involved in a particular biological process, or is located within a certain cellular component

Accession  Name  GO ID       GO term name                     Reference     Evidence code
P00505     GOT2  GO:0004069  aspartate transaminase activity  PMID:2731362  IDA

Page 12: The Gene Ontology & Gene Ontology Annotation resources

A GO annotation is …

…a statement that a gene product:

1. has a particular molecular function, or is involved in a particular biological process, or is located within a certain cellular component

2. as described in a particular reference

Accession  Name  GO ID       GO term name                     Reference     Evidence code
P00505     GOT2  GO:0004069  aspartate transaminase activity  PMID:2731362  IDA

Page 13: The Gene Ontology & Gene Ontology Annotation resources

A GO annotation is …

…a statement that a gene product:

1. has a particular molecular function, or is involved in a particular biological process, or is located within a certain cellular component

2. as described in a particular reference

3. as determined by a particular method

Accession  Name  GO ID       GO term name                     Reference     Evidence code
P00505     GOT2  GO:0004069  aspartate transaminase activity  PMID:2731362  IDA
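The three parts of an annotation (what, according to which reference, determined by which method) map naturally onto a small record type. The sketch below is one possible representation, populated with the example from this slide; the class and field names are assumptions made for illustration and do not reflect any official GOA file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    """One GO annotation; field names are illustrative, not an official schema."""
    accession: str      # gene product identifier
    name: str           # gene product name
    go_id: str          # 1. the GO term being asserted
    go_term_name: str
    reference: str      # 2. where the statement is described
    evidence_code: str  # 3. how it was determined


def describe(a: GOAnnotation) -> str:
    """Render the annotation as the sentence the slide spells out."""
    return (
        f"{a.name} ({a.accession}) has {a.go_term_name} [{a.go_id}], "
        f"as described in {a.reference}, as determined by {a.evidence_code}"
    )


example = GOAnnotation(
    accession="P00505",
    name="GOT2",
    go_id="GO:0004069",
    go_term_name="aspartate transaminase activity",
    reference="PMID:2731362",
    evidence_code="IDA",
)

if __name__ == "__main__":
    print(describe(example))
```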

Page 14: The Gene Ontology & Gene Ontology Annotation resources

Experimental data

Computational analysis

Author statements/curator inference

(+ Inferred from electronic annotations)

http://www.evidenceontology.org/

Tracking provenance
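As a rough illustration of how evidence codes fall into these groups, the sketch below uses a partial lookup table. The code lists are an assumption based on commonly used GO evidence codes and are deliberately incomplete; the Evidence Ontology link above remains the place to check the full, current set.

```python
# Partial, illustrative grouping of GO evidence codes (not exhaustive).
EVIDENCE_GROUPS = {
    "Experimental data": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "Computational analysis": {"ISS", "ISO", "ISA", "ISM", "RCA"},
    "Author statements/curator inference": {"TAS", "NAS", "IC"},
    "Inferred from electronic annotation": {"IEA"},
}


def evidence_group(code):
    """Return the group an evidence code belongs to, or None if it is not in the table."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return None


if __name__ == "__main__":
    for code in ("IDA", "ISS", "TAS", "IEA"):
        print(code, "->", evidence_group(code))
```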

Page 15: The Gene Ontology & Gene Ontology Annotation resources

Manual annotations

• Time-consuming process producing lower numbers of annotations (~2,800 taxa covered)
• More specific GO terms
• Manual annotation is essential for creating predictions

Aleksandra Shypitsyna

Elena Speretta

Alex Holmes

Tony Sawford

Page 16: The Gene Ontology & Gene Ontology Annotation resources

Electronic annotations
• Quick way of producing large numbers of annotations
• Annotations use less-specific GO terms
• Only source of annotation for ~438,000 non-model organism species

orthology taxon constraints

Page 17: The Gene Ontology & Gene Ontology Annotation resources

Provide a public resource of data and tools

Number of annotations in UniProt-GOA database (March 2016)

2,752,604 manual annotations*
269,207,317 electronic annotations

* Includes manual annotations integrated from external model organism and specialist groups

http://www.ebi.ac.uk/GOA

https://www.ebi.ac.uk/QuickGO/
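To put the two March 2016 counts in proportion, the short calculation below works out the share of manual annotations. It uses only the figures quoted on this slide; the variable names are arbitrary.

```python
# Counts quoted for the UniProt-GOA database, March 2016.
manual = 2_752_604
electronic = 269_207_317

total = manual + electronic
manual_share = manual / total

if __name__ == "__main__":
    print(f"total annotations: {total:,}")
    print(f"manual share: {manual_share:.2%}")
```

Manual annotations come out at roughly 1% of the total, which is why electronic annotation is the only practical route for most non-model organisms.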

Page 18: The Gene Ontology & Gene Ontology Annotation resources
Page 19: The Gene Ontology & Gene Ontology Annotation resources

Enrichment analysis

Sample  Reference
40%     20%
20%     20%

=> The sample is over-enriched for
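One common way to decide whether 40% in the sample versus 20% in the reference is more than chance is a one-sided hypergeometric test. The sketch below is an illustration under stated assumptions, not the method used by any particular tool in this talk: the gene counts (50 sampled genes, 1,000 reference genes) are hypothetical numbers chosen to reproduce the 40% and 20% on the slide, and the sample is assumed to be drawn from the reference set.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which are annotated to the term of interest."""
    total = comb(N, n)
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical counts matching the slide's percentages.
reference_size = 1000      # N
reference_with_term = 200  # K, 20% of the reference
sample_size = 50           # n
sample_with_term = 20      # k, 40% of the sample

p_value = hypergeom_upper_tail(
    sample_with_term, sample_size, reference_with_term, reference_size
)

if __name__ == "__main__":
    print(f"one-sided p-value: {p_value:.3g}")
```

In practice many terms are tested at once, so a multiple-testing correction would normally be applied on top of the raw p-values.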

Page 20: The Gene Ontology & Gene Ontology Annotation resources

Spinocerebellar ataxia type 28

Paola Roncaglia

Page 21: The Gene Ontology & Gene Ontology Annotation resources

Novel biomarkers of rectal radiotherapy

Page 22: The Gene Ontology & Gene Ontology Annotation resources

Biomarker for diagnosis and prognosis

Page 23: The Gene Ontology & Gene Ontology Annotation resources

Gene expression changes in diabetes

Page 24: The Gene Ontology & Gene Ontology Annotation resources

Improved network analysis

Page 25: The Gene Ontology & Gene Ontology Annotation resources


Page 26: The Gene Ontology & Gene Ontology Annotation resources

Many gene products are associated with a large number of descriptive, leaf GO nodes:

GO slims

Page 27: The Gene Ontology & Gene Ontology Annotation resources

…however annotations can be mapped up to a smaller set of parent GO terms:

GO slims
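Mapping annotations up to a slim amounts to walking from each annotated term towards the root and keeping the ancestors that belong to the slim. The sketch below is a minimal illustration: the hierarchy, the slim and the gene names are all hypothetical toy data, and every slim ancestor is kept rather than only the nearest one.

```python
# Toy hierarchy, slim and annotations (all hypothetical, for illustration only).
PARENTS = {
    "mitochondrial matrix": ["mitochondrion"],
    "mitochondrial inner membrane": ["mitochondrion"],
    "mitochondrion": ["cellular component"],
    "cellular component": [],
}
SLIM = {"mitochondrion"}
ANNOTATIONS = {
    "gene A": ["mitochondrial matrix", "mitochondrial inner membrane"],
    "gene B": ["mitochondrion"],
}


def map_to_slim(term, parents, slim):
    """Return the slim terms that `term` is, or descends from."""
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim:
            found.add(current)
        stack.extend(parents.get(current, []))
    return found


def slim_annotations(annotations, parents, slim):
    """Collapse each gene's specific annotations onto the slim."""
    return {
        gene: set().union(*(map_to_slim(t, parents, slim) for t in terms))
        for gene, terms in annotations.items()
    }


if __name__ == "__main__":
    print(slim_annotations(ANNOTATIONS, PARENTS, SLIM))
```

Two distinct leaf annotations on "gene A" collapse onto a single slim term, which is the reduction in redundancy that a slim is meant to provide.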

Page 28: The Gene Ontology & Gene Ontology Annotation resources

Slim generation for industry
• Collaboration funded by Roche
• Need a custom GO slim for analysis of gene sets of interest
• Needs to be descriptive enough
• Without redundancy
• Internal proprietary vocabulary – hard to maintain
• Desire to automatically map to GO

http://www.swat4ls.org/wp-content/uploads/2015/10/SWAT4LS_2015_paper_44.pdf

Page 29: The Gene Ontology & Gene Ontology Annotation resources

ROCHE CV

GSEA with full GO

GSEA with Roche CV

Courtesy Laura Badi

Page 30: The Gene Ontology & Gene Ontology Annotation resources

• Mapping query: participant_OR_reg_participant some cannabinoid

• Description: “A process in which a cannabinoid participates, or that regulates a process in which a cannabinoid participates.”
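A query of the form "R some C" asks for every class that has a relationship R to C or to something more specific than C. The sketch below imitates that over toy data to show the idea; it is a simplification, not how a reasoner such as ELK works internally. Only the relation name `participant_OR_reg_participant` and the class `cannabinoid` come from the slide; every other class name is a hypothetical placeholder.

```python
# Toy axioms (hypothetical placeholders, not taken from GO or ChEBI).
SUBCLASS_OF = {
    "example cannabinoid A": ["cannabinoid"],
    "process X": [],
    "process Y": ["process X"],
    "process Z": [],
}
# class -> (relation, filler) pairs, read as: class subClassOf (relation some filler)
RESTRICTIONS = {
    "process X": [("participant_OR_reg_participant", "example cannabinoid A")],
    "process Z": [("participant_OR_reg_participant", "glucose")],
}


def superclasses(cls):
    """Return `cls` together with all of its named superclasses."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, []))
    return seen


def query_some(relation, filler):
    """Find classes that assert or inherit `relation some D`, where D is `filler` or a subclass of it."""
    hits = set()
    for cls in set(SUBCLASS_OF) | set(RESTRICTIONS):
        for ancestor in superclasses(cls):
            for rel, fil in RESTRICTIONS.get(ancestor, []):
                if rel == relation and filler in superclasses(fil):
                    hits.add(cls)
    return hits


if __name__ == "__main__":
    print(sorted(query_some("participant_OR_reg_participant", "cannabinoid")))
```

The query picks up "process X" through its restriction on a specific cannabinoid and "process Y" by inheritance, which is how such queries can surface terms that a manual mapping missed.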

Page 31: The Gene Ontology & Gene Ontology Annotation resources

Results
• We have successfully mapped 84% of terms from RCV (308/365) to OWL queries that can be used to replicate some proportion of the original manual mapping.

• In addition, these queries find 1000s of terms that were missed in the original mapping.

David Osumi-Sutherland

Page 32: The Gene Ontology & Gene Ontology Annotation resources

GO SLIM (generic)

Page 33: The Gene Ontology & Gene Ontology Annotation resources

ROCHE CV – MANUAL ONLY

Page 34: The Gene Ontology & Gene Ontology Annotation resources

ROCHE CV MANUAL + AUTO

Page 35: The Gene Ontology & Gene Ontology Annotation resources

Acknowledgements

• GO editors and developers
• GO annotators
• The Gene Ontology (GO) Consortium
• Samples, Phenotype and Ontology team (Helen Parkinson)
• Protein Function Content team (Claire O’Donovan)
• Funding: EMBL-EBI, National Human Genome Research Institute (NHGRI)

