Transcript
Page 1: Introduction to the Gene Ontology and GO annotation resources

EBI is an Outstation of the European Molecular Biology Laboratory.

Introduction to the Gene Ontologyand GO annotation resources

Rachael Huntley

UniProtKB-GOA Curator

EBI

Page 2: Introduction to the Gene Ontology and GO annotation resources

2

Talk Overview

• Intro to GO and GO terms

• Annotating to GO

• Practical use of GO

• GO analysis tools

• Precautions

• Accessing GO annotations

Page 3: Introduction to the Gene Ontology and GO annotation resources

3

Reasons for the Gene Ontology

www.geneontology.org

• Inconsistency in naming of biological concepts

• Large datasets need to be interpreted quickly

• Increasing amounts of biological data available

Page 4: Introduction to the Gene Ontology and GO annotation resources

4

Search on ‘DNA repair’...get over 60,000 results

Increasing amounts of biological data available

Expansion of sequence information

Page 5: Introduction to the Gene Ontology and GO annotation resources

5

Reasons for the Gene Ontology

• Inconsistency in naming of biological concepts

• Large datasets need to be interpreted quickly

• Increasing amounts of biological data available

Page 6: Introduction to the Gene Ontology and GO annotation resources

6

• Need to organise the data, analyse it, share it in a timely manner to benefit other researchers

• Inconsistencies in analyses make cross-species or cross-database comparison difficult

Large datasets need to be interpreted quickly

Page 7: Introduction to the Gene Ontology and GO annotation resources

7

Reasons for the Gene Ontology

• Inconsistency in naming of biological concepts

• Large datasets need to be interpreted quickly

• Increasing amounts of biological data available

Page 8: Introduction to the Gene Ontology and GO annotation resources

8

English is not a very precise language

• Same name for different concepts• Different names for the same concept

Inconsistency in naming of biological concepts

?

An example …

Tactition Tactile sense

Taction

Sensory perception of touch ; GO:0050975

Page 9: Introduction to the Gene Ontology and GO annotation resources

9

• A way to capture biological knowledge in a written and computable form

The Gene Ontology

• A set of concepts and their relationships to each other arrangedas a hierarchy

www.ebi.ac.uk/QuickGO

Less specific concepts

More specific concepts

Page 10: Introduction to the Gene Ontology and GO annotation resources

10

The Concepts in GO

1. Molecular Function

2. Biological Process

3. Cellular Component

An elemental activity or task or job

• protein kinase activity• insulin receptor activity

A commonly recognised series of events

• cell division

Where a gene product is located

• mitochondrion

• mitochondrial matrix

• mitochondrial inner membrane

Page 11: Introduction to the Gene Ontology and GO annotation resources

11

Anatomy of a GO term

Unique identifierUnique identifier

Term nameTerm name

DefinitionDefinitionSynonymsSynonyms

Cross-referencesCross-references

Page 12: Introduction to the Gene Ontology and GO annotation resources

12

Ontology structure

• Directed acyclic graph

Terms can have more than one parent

• Terms are linked by relationships

is_a

part_of

regulates

+ve regulates

-ve regulateswww.ebi.ac.uk/QuickGO

Page 13: Introduction to the Gene Ontology and GO annotation resources

13

http://amigo.geneontology.org/cgi-bin/amigo/go.cgi

… there are more browsers available on the GO Consortium Tools page:

http://www.geneontology.org/GO.tools.shtml

Searching for GO terms

The latest OBO format Gene Ontology file can be downloaded from:

http://www.geneontology.org/ontology/gene_ontology.obo

http://www.ebi.ac.uk/QuickGO

Page 14: Introduction to the Gene Ontology and GO annotation resources

14 http://www.ebi.ac.uk/QuickGO

The EBI's QuickGO browser

Search GO terms or proteins

Page 15: Introduction to the Gene Ontology and GO annotation resources

15

GO Annotation

Page 16: Introduction to the Gene Ontology and GO annotation resources

16

Reactome

http://www.geneontology.org

Page 17: Introduction to the Gene Ontology and GO annotation resources

17

Aims of the GO project

• Compile the ontologies

- currently over 34,000 terms- constantly increasing and improving

• Annotate gene products using ontology terms

- around 30 groups provide annotations • Provide a public resource of data and tools

- regular releases of annotations- tools for browsing/querying annotations and editing the ontology

Page 18: Introduction to the Gene Ontology and GO annotation resources

18

Gene Ontology Annotation (UniProtKB-GOA) database at the EBI

• Largest open-source contributor of annotations to GO

• Member of the GO Consortium since 2001

• Provides annotation for more than 360,000 species

• UniProtKB-GOA’s priority is to annotate the human proteome

• UniProtKB-GOA is responsible for human, chicken and bovine annotations in the GO Consortium

Page 19: Introduction to the Gene Ontology and GO annotation resources

19

A GO annotation is … …a statement that a gene product;

1. has a particular molecular function or is involved in a particular biological process

or is located within a certain cellular component

2. as determined by a particular method

3. as described in a particular reference

P00505

Accession Name GO ID GO term name Reference Evidence code

IDAPMID:2731362aspartate transaminase activityGO:0004069GOT2

Page 20: Introduction to the Gene Ontology and GO annotation resources

20

Electronic

Manual

GOA makes annotations using two methods;

• Quick way of producing large numbers of annotations• Annotations use less-specific GO terms

• Time-consuming process producing lower numbers of annotations

• Annotations are very detailed and accurate

Page 21: Introduction to the Gene Ontology and GO annotation resources

21

Electronic

Manual

Evidence Codes

Inferred from Direct Assay (IDA) – e.g. enzyme assay

Inferred from Physical Interaction (IPI) – e.g. yeast 2-hybrid

Inferred from Electronic Annotation (IEA) – e.g. UniProtKB keyword mapping

Some examples…

See the full list on the GO Consortium websitehttp://www.geneontology.org/GO.evidence.shtml

Page 22: Introduction to the Gene Ontology and GO annotation resources

22

1. Mapping of external concepts to GO terms

e.g. InterPro2GO, Swiss-Prot Keyword2GO, Enzyme Commission2GO

Annotations are high-quality and have an explanation of the method (GO_REF)

Electronic annotation by UniProtKB-GOA

Macaque

Mouse

Dog

e.g. Human

Cow

Guinea PigChimpanzee Rat

Chicken

Ensembl compara

2. Automatic transfer of annotations to orthologs

Page 23: Introduction to the Gene Ontology and GO annotation resources

23

Mapping of concepts from UniProtKB files

GO:0004715 ; non-membrane spanning protein tyrosine kinase activity

GO:0005856: cytoskeleton

GO:0007155 ; cell adhesion

Page 24: Introduction to the Gene Ontology and GO annotation resources

24

1. Mapping of external concepts to GO terms

e.g. InterPro2GO, Swiss-Prot Keyword2GO, Enzyme Commission2GO

Annotations are high-quality and have an explanation of the method (GO_REF)

Electronic annotation by UniProtKB-GOA

Macaque

Mouse

DogCow

Guinea PigChimpanzee Rat

Chicken

Ensembl compara

2. Automatic transfer of annotations to orthologs

...and more

e.g. Human

Page 25: Introduction to the Gene Ontology and GO annotation resources

25

Manual annotation by UniProtKB-GOA

High–quality, specific annotations made using:

• Peer-reviewed papers

• A range of evidence codes to categorise the types of evidence found in a

paper

http://www.ebi.ac.uk/GOA

Page 26: Introduction to the Gene Ontology and GO annotation resources

26

Finding annotations in a paper

In this study, we report the isolation and molecular characterization of the B. napus PERK1 cDNA, that is predicted to encode a novel receptor-like kinase. We have shown that like other plant RLKs, the kinase domain of PERK1 has serine/threonine kinase activity, In addition, the location of a PERK1-GTP fusion protein to the plasma membrane supports the prediction that PERK1 is an integral membrane protein…these kinases have been implicated in early stages of wound response…

Process: response to wounding GO:0009611

wound response

serine/threonine kinase activity,

Function: protein serine/threonine kinase activity GO:0004674

integral membrane protein

Component: integral to plasma membrane GO:0005887

…for B. napus PERK1 protein (Q9ARH1)

PubMed ID: 12374299

Page 27: Introduction to the Gene Ontology and GO annotation resources

27

Additional information

• QualifiersModify the interpretation of an annotation

• NOT (protein is not associated with the GO term)

• colocalizes_with (protein associates with complex but is not a bona fide member)

• contributes_to (describes action of a complex of proteins)

• 'With' column

Can include further information on the method being referenced

e.g. the protein accession of an interacting protein

Page 28: Introduction to the Gene Ontology and GO annotation resources

28

GO:0005739 Mitochondrion

HumanProteinAtlas

UniProtKB-GOA integrates manual annotations from external groups

LifeDB (subcellular locations)

Reactome (pathways)

also;

Page 29: Introduction to the Gene Ontology and GO annotation resources

29

* Includes manual annotations integrated from external model organism and multi-species databases

Status of UniProtKB-GOA Annotation

0.9%

66%

159,630 910,536Manual annotations*

11,261,38298,037,844Electronic annotations

Proteins Annotations Evidence Source UniProt

Coverage

Aug 2011 Statistics

Page 30: Introduction to the Gene Ontology and GO annotation resources

30

How to access and use

GO annotation data

Page 31: Introduction to the Gene Ontology and GO annotation resources

31

Where can you find annotations?

UniProtKB

Ensembl

Entrez gene

Page 32: Introduction to the Gene Ontology and GO annotation resources

32

GO Consortium website

UniProtKB-GOA website

Gene Association Files17 column files containing all information for each annotation

Page 33: Introduction to the Gene Ontology and GO annotation resources

33

GO browsers

Page 34: Introduction to the Gene Ontology and GO annotation resources

34 http://www.ebi.ac.uk/QuickGO

The EBI's QuickGO browser

Search GO terms or proteins

Find sets of GO annotations

Page 35: Introduction to the Gene Ontology and GO annotation resources

35

• Access gene product functional information

• Analyse high-throughput genomic or proteomic datasets

• Validation of experimental techniques

• Get a broad overview of a proteome

• Obtain functional information for novel gene products 

How scientists use the GO

Some examples…

Page 36: Introduction to the Gene Ontology and GO annotation resources

36

Selected Gene Tree: pearson lw n3d ...Branch color classification:Set_LW_n3d_5p_...

Colored by: Copy of Copy of C5_RMA (Defa...Gene List: all genes (14010)

attacked

time

control

Puparial adhesionMolting cycleHemocyanin

Defense responseImmune responseResponse to stimulusToll regulated genesJAK-STAT regulated genes

Immune responseToll regulated genes

Amino acid catabolismLipid metobolism

Peptidase activityProtein catabolismImmune response

Selected Gene Tree: pearson lw n3d ...Branch color classification:Set_LW_n3d_5p_...

Colored by: Copy of Copy of C5_RMA (Defa...Gene List: all genes (14010)

Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.

MicroArray data analysis

Analysis of high-throughput genomic datasets

Page 37: Introduction to the Gene Ontology and GO annotation resources

37

(Orrù et al., Molecular and Cellular Proteomics 2007)

Characterisation of proteins interacting with ribosomal protein S19

Analysis of high-throughput proteomic datasets

Page 38: Introduction to the Gene Ontology and GO annotation resources

38

Validation of experimental techniques

(Cao et al., Journal of Proteome Research 2006)

Rat liver plasma membrane isolation

Page 39: Introduction to the Gene Ontology and GO annotation resources

39

Obtain functional information for novel gene products 

MPYVSQSQHIDRVRGAIEGRLPAPGNSSRLVSSWQRSYEQYRLDPGSVIGPRVLTSSELR DVQGKEEAFLRASGQCLARLHDMIRMADYCVMLTDAHGVTIDYRIDRDRRGDFKHAGLYI GSCWSEREEGTCGIASVLTDLAPITVHKTDHFRAAFTTLTCSASPIFAPTGELIGVLDAS AVQSPDNRDSQRLVFQLVRQSAALIEDGYFLNQTAQHWMIFGHASRNFVEAQPEVLIAFD ECGNIAASNRKAQECIAGLNGPRHVDEIFDTSAVHLHDVARTDTIMPLRLRATGAVLYAR IRAPLKRVSRSACAVSPSHSGQGTHDAHNDTNLDAISRFLHSRDSRIARNAEVALRIAGK HLPILILGETGVGKEVFAQALHASGARRAKPFVAVNCGAIPDSLIESELFGYAPGAFTGA RSRGARGKIAQAHGGTLFLDEIGDMPLNLQTRLLRVLAEGEVLPLGGDAPVRVDIDVICA THRDLARMVEEGTFREDLYYRLSGATLHMPPLRERADILDVVHAVFDEEAQSAGHVLTLD GRLAERLARFSWPGNIRQLRNVLRYACAVCDSTRVELRHVSPDVAALLAPDEAALRPALA LENDERARIVDALTRHHWRPNAAAEALGM

www.ebi.ac.uk/InterProScan

InterProScan

Page 40: Introduction to the Gene Ontology and GO annotation resources

40

Annotating novel sequences

• Can use BLAST queries to find similar sequences with GO annotation which can be transferred to the new sequence

• Two tools currently available;

AmiGO BLAST (from GO Consortium) http://amigo.geneontology.org/cgi-bin/amigo/blast.cgi

– searches the GO Consortium database

BLAST2GO (from Babelomics) http://www.blast2go.org/

– searches the NCBI database

Page 41: Introduction to the Gene Ontology and GO annotation resources

41

AmiGO BLAST

amigo.geneontology.org/cgi-bin/amigo/blast.cgi

Exportin-T from Pongo abelii (Sumatran orangutan)

Page 42: Introduction to the Gene Ontology and GO annotation resources

42

Analysis of large datasets using GO

Page 43: Introduction to the Gene Ontology and GO annotation resources

43

Using the GO to provide a functional overview for a large dataset

• Many GO analysis tools use GO slims to give a broad overview of the dataset

• GO slims are cut-down versions of the GO andcontain a subset of the terms in the whole GO

• GO slims usually contain less-specialised GO terms

Page 44: Introduction to the Gene Ontology and GO annotation resources

44

Slimming the GO using the ‘true path rule’ Many gene products are associated with a large number of descriptive, leaf GO nodes:

Page 45: Introduction to the Gene Ontology and GO annotation resources

45

Slimming the GO using the ‘true path rule’ …however annotations can be mapped up to a smaller set of parent GO terms:

Page 46: Introduction to the Gene Ontology and GO annotation resources

46

GO slims

or you can make your own using;

Custom slims are available for download;

http://www.geneontology.org/GO.slims.shtml

• AmiGO's GO slimmer

• QuickGOhttp://www.ebi.ac.uk/QuickGO

http://amigo.geneontology.org/cgi-bin/amigo/slimmer

Page 47: Introduction to the Gene Ontology and GO annotation resources

47 www.ebi.ac.uk/QuickGO

Map-up annotations with GO slims

The EBI's QuickGO browser

Search GO terms or proteins

Find sets of GO annotations

Page 48: Introduction to the Gene Ontology and GO annotation resources

48

GO analysis tools

Page 49: Introduction to the Gene Ontology and GO annotation resources

49

Numerous Third Party Tools

www.geneontology.org/GO.tools.shtml

Page 50: Introduction to the Gene Ontology and GO annotation resources

50

Term enrichment

• Most popular type of GO analysis

• Determines which GO terms are more oftenassociated with a specified list of genes/proteins compared with a control list or rest of genome

• Many tools available to do this analysis

• User must decide which is best for their analysis

Page 51: Introduction to the Gene Ontology and GO annotation resources

51

Precautions when using GO annotations for analysis

• Recommended that ‘NOT’ annotations are removed before analysis - only ~5000 out of ~100 million annotations are ‘NOT’- can confuse the analysis

• The Gene Ontology is always changing and GO annotations are continually being created

- always use a current version of both

- if publishing your analyses please report the versions/dates you used

http://www.geneontology.org/GO.cite.shtml

Page 52: Introduction to the Gene Ontology and GO annotation resources

52

Precautions when using GO annotations for analysis

• Unannotated is not unknown

- where there is no evidence in the literature for a process, function orlocation the gene product is annotated to the appropriate ontology’sroot node with an ‘ND’ evidence code (no biological data), thereby distinguishing between unannotated and unknown

• Pay attention to under-represented GO terms- a strong under-representation of a pathway may mean that normalfunctioning of that pathway is necessary for the given condition

Page 53: Introduction to the Gene Ontology and GO annotation resources

53

The UniProtKB-GOA group

Curators:

Software developer:

Team leaders:

Emily DimmerRachael Huntley

Rolf Apweiler

Email: [email protected]

http://www.ebi.ac.uk/GOA

Claire O’Donovan

Tony Sawford

Yasmin Alam-Faruque

Page 54: Introduction to the Gene Ontology and GO annotation resources

54

Acknowledgements

GO ConsortiumMembers of;

UniProtKB

InterPro

IntAct

HAMAP

FundingNational Human Genome Research Institute (NHGRI)Kidney Research UKEMBL


Top Related