The Gene Ontology project and its application to fission yeast functional genomics data Valerie Wood


Post on 30-Dec-2015


Page 1: The Gene Ontology project and its application to  fission yeast  functional genomics data

The Gene Ontology project and its application to

fission yeast functional genomics data

Valerie Wood

Page 2: The Gene Ontology project and its application to  fission yeast  functional genomics data

• Introduction to the Gene Ontology (GO) project

• Data mining the fission yeast genome data

• What is GO? (requirement, implementation)

• How does it work? (annotation and ontology development)

• What can I use it for? (applications)

• Tools for using GO for data analysis

• How can I use it? Practical exercises

Page 3: The Gene Ontology project and its application to  fission yeast  functional genomics data

Gene Ontology Why?

Page 4: The Gene Ontology project and its application to  fission yeast  functional genomics data

Traditional analysis

Gene 1: mRNA export, protein phosphorylation, transcription, mitotic cell cycle, …

Gene 2: mRNA export, DNA recombination, RNA elongation (pol II), …

•requires literature searching

•time-consuming

•gene by gene basis

Page 5: The Gene Ontology project and its application to  fission yeast  functional genomics data

Not scalable!

Gene 1: mRNA export, protein phosphorylation, transcription, mitotic cell cycle, …

Gene 2: mRNA export, DNA recombination, RNA elongation (pol II), …

Gene 3: mRNA export, transcription (pol II), …

Gene 4: mRNA export, transcription polyadenylation, …

Gene 5: mRNA export, RNA elongation, …

Gene 6: mRNA export, rRNA transcription, DNA topological change, …

Gene 5000: cell cycle, chromosome segregation, kinetochore assembly, protein localization, …

Page 6: The Gene Ontology project and its application to  fission yeast  functional genomics data

http://www.teamtechnology.co.uk/f-scientist.jpg

Help! The problem gets bigger and bigger and bigger!

Page 7: The Gene Ontology project and its application to  fission yeast  functional genomics data

The literature corpus

What is the size of the ‘annotation problem’?

Fission yeast + pombe gives 8170 results

Including cell cycle gives 3467

Including DNA repair gives 555

How will we ever extract all of this information?

Page 8: The Gene Ontology project and its application to  fission yeast  functional genomics data

Grouping by process

mRNA export: Gene 1, Gene 2, Gene 3, Gene 4, Gene 5

transcription: Gene 1, Gene 2, Gene 3, Gene 4, Gene 5, …

protein phosphorylation: Gene 1, Gene 7, Gene 10, …

cell wall organization and biogenesis: Gene 10, Gene 15, Gene 18, …

cell cycle: Gene 1, Gene 7, Gene 8, …

Page 9: The Gene Ontology project and its application to  fission yeast  functional genomics data

GO can be used to spot patterns among the thousands of genes typically returned by functional genomics experiments

[Figure: clustered gene tree ("pearson lw n3d ...", gene list: all genes (14010)) comparing attacked and control samples over time. Clusters are labelled with GO terms: puparial adhesion, molting cycle, hemocyanin; defense response, immune response, response to stimulus, Toll regulated genes, JAK-STAT regulated genes; immune response, Toll regulated genes; amino acid catabolism, lipid metabolism; peptidase activity, protein catabolism, immune response.]

Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.

Page 10: The Gene Ontology project and its application to  fission yeast  functional genomics data

A controlled vocabulary

GO is also necessary for handling different terminology used between and within scientific communities:

• The same phrase is used to describe different ‘entities’

• Different phrases have the same or related meanings

Page 11: The Gene Ontology project and its application to  fission yeast  functional genomics data

late endosome to vacuole transport

multivesicular body sorting

MVB sorting

→ late endosome to vacuole transport ; GO:0045324

Page 12: The Gene Ontology project and its application to  fission yeast  functional genomics data

Bud initiation?

Page 13: The Gene Ontology project and its application to  fission yeast  functional genomics data

Bud initiation?

tooth bud initiation

cellular bud initiation

flower bud initiation

Page 14: The Gene Ontology project and its application to  fission yeast  functional genomics data

http://www.geneontology.org/

So what is GO?

• GO provides a “controlled vocabulary” for biological knowledge that can be interpreted identically both within and between genomes

• Species independent, therefore enabling cross-species comparisons

• Provides a way to capture and represent biological knowledge in a computable form

Page 15: The Gene Ontology project and its application to  fission yeast  functional genomics data

Gene Ontology Content and structure

Page 16: The Gene Ontology project and its application to  fission yeast  functional genomics data

What is Ontology?

• Dictionary: A branch of metaphysics concerned with the nature and relations of being.

• In philosophy, the most fundamental branch of metaphysics. It studies being or existence as well as the basic categories thereof—trying to find out what entities and what types of entities exist. – Wikipedia

1606 1700s

Page 17: The Gene Ontology project and its application to  fission yeast  functional genomics data

So what does that mean?

From a practical view, ontology is the representation of something we know about. “Ontologies” consist of a representation of things that are detectable or directly observable, and the relationships between those things.

Ontologies provide controlled, consistent vocabularies to describe concepts and relationships, thereby enabling knowledge sharing (Gruber 1993).

Page 18: The Gene Ontology project and its application to  fission yeast  functional genomics data

Ontology

Includes:

1. A vocabulary of terms (names for concepts)

2. Definitions

3. Defined logical relationships to each other

Page 19: The Gene Ontology project and its application to  fission yeast  functional genomics data

What information might we want to capture about a gene product?

• What does the gene product do?

• Where and when does it act?

• Why does it perform these activities?

GO is divided into three parts:

• molecular function (what it does)

• cellular component (where it acts)

• biological process (why)

Page 20: The Gene Ontology project and its application to  fission yeast  functional genomics data

Cellular Component

• Where a gene product acts (location or complex)

Images from http://microscopy.fsu.edu

Page 21: The Gene Ontology project and its application to  fission yeast  functional genomics data

Molecular Function

• What a gene product does (activity)

glucose-6-phosphate isomerase activity

insulin binding

insulin receptor activity

drug transporter activity

Page 22: The Gene Ontology project and its application to  fission yeast  functional genomics data

Biological Process

Broad objective or goal

cell division

transcription

gluconeogenesis

Page 23: The Gene Ontology project and its application to  fission yeast  functional genomics data

Analogy: Gene Product = hammer

Function (what)            Process (why)

Drive nail (into wood)     Carpentry

Drive stake (into soil)    Gardening

Smash roach                Pest Control

Page 24: The Gene Ontology project and its application to  fission yeast  functional genomics data

Ontology Structure

• The Gene Ontology is structured as a directed acyclic graph (DAG)

• A DAG is similar to a hierarchy except terms can have more than one parent

• Terms can have zero, one or more children

• Terms are linked by two relationships:

– is-a

– part-of
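As a rough illustration of the structure described above, a DAG with typed edges can be held in a plain dictionary. This is only a sketch: the names `PARENTS`, `children` and `is_acyclic` are invented for this example, and the edges are illustrative rather than copied from the ontology.

```python
# Each term maps to a list of (relationship, parent) pairs.
# The edges are illustrative; consult the ontology itself for the real ones.
PARENTS = {
    "cell": [],
    "nucleus": [("part_of", "cell")],
    "chromosome": [("part_of", "cell")],
    # Two parents, one per relationship type: allowed in a DAG, not in a hierarchy.
    "nuclear chromosome": [("is_a", "chromosome"), ("part_of", "nucleus")],
}


def children(term):
    """Return the terms that list `term` as a direct parent."""
    return sorted(child for child, links in PARENTS.items()
                  if any(parent == term for _, parent in links))


def is_acyclic(parents):
    """Depth-first search: a DAG never revisits a term on the current path."""
    done, on_path = set(), set()

    def visit(term):
        if term in on_path:
            return False  # walked back onto the current path: a cycle
        if term in done:
            return True
        on_path.add(term)
        ok = all(visit(parent) for _, parent in parents.get(term, []))
        on_path.discard(term)
        done.add(term)
        return ok

    return all(visit(term) for term in parents)


print(children("cell"))
print(is_acyclic(PARENTS))
```

Storing parents rather than children keeps the "more than one parent" case explicit, which is the feature that distinguishes a DAG from a simple hierarchy.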

Page 25: The Gene Ontology project and its application to  fission yeast  functional genomics data

Parent-Child Relationships

Hierarchy: one-to-many parental relationship. Each child has only one parent.

DAG (Directed Acyclic Graph): many-to-many parental relationship. Each child may have one or more parents.

Page 26: The Gene Ontology project and its application to  fission yeast  functional genomics data

Ontology Structure

[Diagram: cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relationships]

Page 27: The Gene Ontology project and its application to  fission yeast  functional genomics data

Ontology structure

• This allows the modelling of biology more realistically than a hierarchy

Page 28: The Gene Ontology project and its application to  fission yeast  functional genomics data

Ontology structure

An important feature of GO is that broader parents give rise to more specific children. When a gene is annotated to a term, it is automatically annotated to all of its parent terms.

This allows curators to assign terms at different levels of granularity, depending on what is known or can be inferred.

gene A
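The upward propagation of annotations can be sketched as follows. This is a toy example under stated assumptions: the ontology fragment is simplified (intermediate GO terms are omitted), and `PARENTS`, `ancestors` and `propagate` are names made up for the illustration.

```python
# Toy ontology: child -> direct parents. Intermediate GO terms are omitted.
PARENTS = {
    "mitotic cell cycle": ["cell cycle"],
    "cell cycle": ["biological process"],
    "mRNA export": ["biological process"],
    "biological process": [],
}


def ancestors(term):
    """Collect every term reachable by following parent links upward."""
    found = set()
    stack = list(PARENTS[term])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(PARENTS[parent])
    return found


def propagate(direct):
    """Expand direct annotations (gene -> terms) to include all parent terms."""
    return {gene: set(terms).union(*(ancestors(t) for t in terms))
            for gene, terms in direct.items()}


annotations = propagate({"gene A": {"mitotic cell cycle"}})
print(sorted(annotations["gene A"]))
```

A query for "cell cycle" then finds gene A even though the curator only recorded the more specific term.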

Page 29: The Gene Ontology project and its application to  fission yeast  functional genomics data

True Path Rule

• Every path from any term back to its top-level parent(s) must always be true (biologically accurate), or the ontology must be revised

[Diagram: cell, cytoplasm, nucleus, chromosome, nuclear chromosome, cytoplasmic chromosome and mitochondrial chromosome, linked by is-a and part-of relationships]

Page 30: The Gene Ontology project and its application to  fission yeast  functional genomics data

Anatomy of a GO term

id: GO:0006094 (unique GO ID)

name: gluconeogenesis (term name)

namespace: process (ontology)

def: The formation of glucose from noncarbohydrate precursors, such as pyruvate, amino acids and glycerol. (definition; def source: http://cancerweb.ncl.ac.uk/)

exact_synonym: glucose biosynthesis (synonym)

is_a: GO:0006006 (parentage)

is_a: GO:0006092 (parentage)
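A term record like this is a list of "tag: value" lines, so it can be read with a few lines of code. The sketch below is a simplified reader written for this example, not a full parser for the ontology file format; `STANZA` and `parse_stanza` are hypothetical names, and the definition line is left out for brevity.

```python
STANZA = """\
id: GO:0006094
name: gluconeogenesis
namespace: process
exact_synonym: glucose biosynthesis
is_a: GO:0006006
is_a: GO:0006092
"""


def parse_stanza(text):
    """Parse 'tag: value' lines; tags may repeat, so values are kept in lists."""
    term = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first ': ' only, so 'GO:0006094' stays intact.
        tag, _, value = line.partition(": ")
        term.setdefault(tag, []).append(value.strip())
    return term


term = parse_stanza(STANZA)
print(term["id"][0], term["name"][0], term["is_a"])
```

Keeping every value in a list avoids special-casing tags such as `is_a`, which appears once per parent.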

Page 31: The Gene Ontology project and its application to  fission yeast  functional genomics data

No GO Areas

• GO covers ‘normal’ functions and processes

– No pathological processes

– No experimental conditions

• NO evolutionary relationships

• A function term refers to a reaction or activity, NOT a gene product

• NOT a system of nomenclature for genes

Page 32: The Gene Ontology project and its application to  fission yeast  functional genomics data

So how does GO annotation happen?

Page 33: The Gene Ontology project and its application to  fission yeast  functional genomics data

For each gene (*****):

Read and record paper (PMID: *****)

Identify GO terms (GO:****)

What type of evidence? (IDA)

Resulting annotation: ******* GO:******* IDA PMID:*******

Page 34: The Gene Ontology project and its application to  fission yeast  functional genomics data

Submit to the GO Consortium

Page 35: The Gene Ontology project and its application to  fission yeast  functional genomics data

Annotation appears in GO database

Page 36: The Gene Ontology project and its application to  fission yeast  functional genomics data

Who uses GO?

Page 37: The Gene Ontology project and its application to  fission yeast  functional genomics data

http://www.geneontology.org

Page 38: The Gene Ontology project and its application to  fission yeast  functional genomics data

Because many groups annotate, we see the results of research across species

GO:0019789 SUMO ligase activity

GeneDB S. pombe: pli1, nse2

SGD: CST9, MMS21, SIZ1, NFI1

RGD: Pias4, Miz1, Pias3

MGI: Pias4, Pias3, Pias2

TAIR: ATSIZ1

Page 39: The Gene Ontology project and its application to  fission yeast  functional genomics data

Fission yeast GO annotation status

Page 40: The Gene Ontology project and its application to  fission yeast  functional genomics data

Fission yeast annotation progress

[Figure: citrate synthase (Acetyl-CoA, CoA-SH) in the TCA cycle, labelled with Cellular Component, Molecular Function and Biological Process. Counts shown on the figure: 7519, 13494, 9459.]

Total 30,616 annotations to 3080 terms

Data from 06/06/07

Page 41: The Gene Ontology project and its application to  fission yeast  functional genomics data

Evidence Codes used

8618  IDA  inferred from direct assay

 776  IPI  inferred from physical interaction

 901  IGI  inferred from genetic interaction

1089  TAS  traceable author statement

1073  IC   inferred by curator

9045  ISS  inferred from sequence similarity

1912  IMP  inferred from mutant phenotype

 522  NAS  non-traceable author statement

6397  IEA  inferred from electronic annotation

Total: 30333
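The total can be checked, and the electronic share worked out, with a short tally. The counts are taken from the slide; treating every code other than IEA as curator-assigned is an assumption made for this sketch.

```python
# Annotation counts per evidence code, as listed on the slide.
COUNTS = {
    "IDA": 8618, "IPI": 776, "IGI": 901, "TAS": 1089, "IC": 1073,
    "ISS": 9045, "IMP": 1912, "NAS": 522, "IEA": 6397,
}

total = sum(COUNTS.values())
electronic = COUNTS["IEA"]
# Assumption: every code except IEA involves a curator's judgement.
manual = total - electronic

print(total, manual, electronic)
print(f"{electronic / total:.1%} of annotations are electronic")
```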

Page 42: The Gene Ontology project and its application to  fission yeast  functional genomics data

GO Curation Strategy

Manual Curation

• Emphasis on Primary Literature (IDA, IMP, IGI, IPI)

• Manual inspection of sequence similarity (ISS)

Computational Mappings (IEA)

• InterPro (domain or family) to GO

• UniProt (Swissprot keyword to GO)

• E.C. number to GO

Counts shown on the slide: 1617 PMIDs, 15230 annotations; 5815 annotations; 9569 annotations

Data from 06/06/07

Page 43: The Gene Ontology project and its application to  fission yeast  functional genomics data

GO Curation Progress

[Chart: number of associations by date, Jan-04 to May-07 (y-axis 0 to 35000), with series for pombe manual, pombe electronic, pombe total and cerevisiae total. Data labels on the chart: 14364, 16682, 19008, 20108, 22530, 30343, 30616.]

Total 30,616 annotations to 3080 GO terms

S. cerevisiae has 27662 annotations to 2971 GO terms (no IEA)

Data from 06/06/07

Page 44: The Gene Ontology project and its application to  fission yeast  functional genomics data

GO aspect coverage

Function: 3542 (includes protein binding)

Biological Process: 4019

Cellular Component: 4821

All three aspects unknown: 105 (564 S. cerevisiae)

Total: 5004 (5780 S. cerevisiae)

[Venn diagram of aspect overlap; region labels: 14672679, 3279 (3455), 191, 54, 18, 993]

Page 45: The Gene Ontology project and its application to  fission yeast  functional genomics data

• A gene product can have several functions and cellular locations, and can be involved in many processes

• Groups of functions make up a biological process

• Annotation of a gene product to one ontology is independent from its annotation to other ontologies

• Genes with ‘no data’ are annotated to the ‘root node’

Page 46: The Gene Ontology project and its application to  fission yeast  functional genomics data

Developing GO

Page 47: The Gene Ontology project and its application to  fission yeast  functional genomics data

Developing GO

Adding new terms and biological concepts to the Gene Ontology

• GO under constant development

• International group of developers (all the major model organism databases contribute)

– central editorial office at EBI, 4 members

• Developed in consultation with domain experts

– Term suggestions handled through online tracking system

Page 48: The Gene Ontology project and its application to  fission yeast  functional genomics data

Why GO changes

• Advances in biology

• New organisms join, need new terms

• Fix errors and legacy terms

• Improve logical consistency

Suggestions for changes come from:

• the GO editors and organism curators

• the user community

• analysis of logical consistency

Page 49: The Gene Ontology project and its application to  fission yeast  functional genomics data
Page 50: The Gene Ontology project and its application to  fission yeast  functional genomics data

flybase

SGD

SGD

MGI

Page 51: The Gene Ontology project and its application to  fission yeast  functional genomics data
Page 52: The Gene Ontology project and its application to  fission yeast  functional genomics data
Page 53: The Gene Ontology project and its application to  fission yeast  functional genomics data

• Provides a standard for annotation

• Has 2 components: the ontology and the annotations

• Allows experimental work to be evaluated in the context of other experimental data which may be annotated at different levels of granularity

• Allows biologists to search and analyse data (particularly for identifying groups of overrepresented genes in large scale experiments)

• Becomes increasingly powerful as the ontologies and annotations are refined

• More here?

Page 54: The Gene Ontology project and its application to  fission yeast  functional genomics data

Applications

Page 55: The Gene Ontology project and its application to  fission yeast  functional genomics data

What can scientists do with GO?

• Access gene product functional information (GeneDB, AmiGO)

• Do cross-species comparison (AmiGO)

• “Slimming”: grouping data into broad (user-defined) categories to provide an overview of a gene set or genome

• “Term Enrichment” (or depletion): provides a link between biological knowledge and functional genomics data

• Data mining

– Simple: all genes annotated to a term (GeneDB, AmiGO)

– Complex: using Boolean operators (union/intersect) (GeneDB)

Page 56: The Gene Ontology project and its application to  fission yeast  functional genomics data

Slimming

• High level view of GO (genes annotated to granular terms are mapped to higher level terms)

• Allows users to group genes into broader categories to assess their distribution, useful for large scale, genome wide analyses or smaller gene sets

• Different annotation groups have created specific GO slims, which are available at GO’s FTP site

• You can create and use your own GO slim with high level terms of interest

• CARE: not a gene product count, as gene products have multiple annotations
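The mapping step and the caveat about counts can be sketched with a toy example. Everything here is hypothetical: the ontology fragment, the slim set and the helper names `to_slim` and `slim_counts` were invented for illustration and are not taken from any published GO slim.

```python
from collections import Counter

# Toy ontology (child -> direct parents) and a user-defined slim.
PARENTS = {
    "mitotic cell cycle": ["cell cycle"],
    "mRNA export": ["transport"],
    "transcription (pol II)": ["transcription"],
    "cell cycle": [], "transport": [], "transcription": [],
}
SLIM = {"cell cycle", "transport", "transcription"}


def to_slim(term):
    """Walk upward from `term` and return every slim term encountered."""
    hits, stack, seen = set(), [term], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in SLIM:
            hits.add(current)
        stack.extend(PARENTS.get(current, []))
    return hits


def slim_counts(annotations):
    """Count genes per slim term; each gene counts once per slim term."""
    counts = Counter()
    for terms in annotations.values():
        counts.update(set().union(*(to_slim(t) for t in terms)))
    return counts


GENES = {
    "gene 1": ["mRNA export", "mitotic cell cycle", "transcription (pol II)"],
    "gene 2": ["mRNA export"],
}
counts = slim_counts(GENES)
print(dict(counts), "genes:", len(GENES), "sum of counts:", sum(counts.values()))
```

The sum of the slim counts (4) exceeds the number of genes (2) because gene 1 falls into three categories, which is why slim totals should not be read as gene product counts.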

Page 57: The Gene Ontology project and its application to  fission yeast  functional genomics data

Term Enrichment

What is it?

• Finds significant GO terms shared among a list of genes

• Helps discover what these genes may have in common

• Gives a statistical measure of how likely it is that your differentially regulated genes fall into that category by chance

Example: microarray experiment, 1000 genes; 100 genes differentially regulated

mitosis – 80/100

apoptosis – 40/100

p. ctrl. cell prol. – 30/100

glucose transp. – 20/100

How?

GO Term mapper/Onto express
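One common way to attach a probability to such counts is a hypergeometric test; the tools named on the slide may use a different statistic. The sketch below assumes that approach, and the background count of 200 mitosis genes among the 1000 is a made-up figure, since the slide gives only the 80/100 result. No multiple-testing correction is applied.

```python
from math import comb


def enrichment_p(N, K, n, k):
    """Hypergeometric upper tail: P(X >= k) when n genes are drawn from N,
    of which K carry the term."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Slide example: 1000 genes on the array, 100 differentially regulated,
# 80 of those annotated to mitosis. K=200 is a hypothetical background count.
p = enrichment_p(N=1000, K=200, n=100, k=80)
print(f"p = {p:.3g}")
```

Exact integer arithmetic via `math.comb` avoids the rounding problems that factorial-based floating-point formulas run into at this scale.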

Page 58: The Gene Ontology project and its application to  fission yeast  functional genomics data

Data mining, complex

                                A     B     C     D     E     F     G     H     I     J
A  cell division             1018   356   224    31    49     2   271   132     -     -
B  transcription>translat.         1367    53    66   172     0   111    47     -     -
C  cytoskeletal/morph/vmt                 842   152    32    30    78   160     -     -
D  metabolic pathways                           800   196    61    36    52     -     -
E  mitochondrial function                             732    98    47    14     -     -
F  membrane transport                                       299     6     2     -     -
G  stress                                                         422    65     -     -
H  signal transduction                                                  369     -     -
I  other                                                                      323     -
J  none                                                                             988

What: You can data mine the entire genome to find overlaps and intersections between terms of interest to target genes for further study
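Overlap queries of this kind reduce to set operations on the genes annotated to each term. The gene names and memberships below are invented for illustration, and `overlap_table` is a hypothetical helper; it builds a small upper-triangular table in the same spirit as the one on the slide.

```python
# Hypothetical gene sets per category.
ANNOTATED = {
    "cell division": {"gene 1", "gene 7", "gene 8"},
    "stress": {"gene 7", "gene 12"},
    "signal transduction": {"gene 8", "gene 12", "gene 20"},
}

both = ANNOTATED["cell division"] & ANNOTATED["stress"]      # intersect
either = ANNOTATED["cell division"] | ANNOTATED["stress"]    # union
only_division = ANNOTATED["cell division"] - ANNOTATED["signal transduction"]


def overlap_table(groups):
    """Pairwise overlap sizes for the upper triangle, diagonal included."""
    names = sorted(groups)
    return {(a, b): len(groups[a] & groups[b])
            for i, a in enumerate(names) for b in names[i:]}


table = overlap_table(ANNOTATED)
print(sorted(both), sorted(only_division))
print(table[("cell division", "stress")])
```

The diagonal entries are simply the sizes of each category, matching the layout of the table above.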

Page 59: The Gene Ontology project and its application to  fission yeast  functional genomics data

Tools

• AmiGO

• GO Slim Mapper: maps the specific, granular GO terms used to annotate a list of gene products to corresponding more general parent GO slim terms.

• GO Term Finder: searches for significant shared GO terms, or parents of the GO terms, used to annotate gene products in a given list.

• GeneDB Boolean querying

Page 60: The Gene Ontology project and its application to  fission yeast  functional genomics data

Acknowledgements

• Martin Aslett (database support)

• Adrian Tivey (GeneDB programmer)

• Midori Harris and the GO editorial office

• Pfam group

• SGD curators

Page 61: The Gene Ontology project and its application to  fission yeast  functional genomics data

Additional points

Page 62: The Gene Ontology project and its application to  fission yeast  functional genomics data

Modifying the interpretation of an annotation: the Qualifier column

1. NOT

• a gene product is NOT associated with the GO term

• used to document conflicting claims in the literature

2. Contributes to

• distinguishes between individual subunit functions and whole complex functions

• used with GO Function Ontology

3. Colocalizes with

• transiently or peripherally associated with an organelle or complex

• used with GO Component Ontology

Page 63: The Gene Ontology project and its application to  fission yeast  functional genomics data

Electronic Annotations

Fatty acid biosynthesis (Swiss-Prot Keyword) → GO: fatty acid biosynthesis (GO:0006633)

EC:6.4.1.2 (EC number) → GO: acetyl-CoA carboxylase activity (GO:0003989)

IPR000438: Acetyl-CoA carboxylase carboxyl transferase beta subunit (InterPro entry) → GO: acetyl-CoA carboxylase activity (GO:0003989)
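The three mappings on this slide amount to a lookup table from an external identifier to a GO ID, with the result tagged IEA. The sketch below shows the idea only; the key prefixes (`SP_KW:`, `InterPro:`), the function name and the gene name are hypothetical and do not reflect any real mapping file format.

```python
# External feature -> GO ID, using the three examples from the slide.
# The key prefixes are made up for this sketch.
MAPPINGS = {
    "SP_KW:Fatty acid biosynthesis": "GO:0006633",
    "EC:6.4.1.2": "GO:0003989",
    "InterPro:IPR000438": "GO:0003989",
}


def electronic_annotations(gene, features):
    """Return sorted, de-duplicated (gene, GO ID, 'IEA') tuples for mapped features."""
    go_ids = {MAPPINGS[f] for f in features if f in MAPPINGS}
    return [(gene, go_id, "IEA") for go_id in sorted(go_ids)]


features = ["EC:6.4.1.2", "InterPro:IPR000438", "SP_KW:Fatty acid biosynthesis"]
for row in electronic_annotations("geneX", features):
    print(row)
```

Two of the three features map to the same GO term, so the gene receives two electronic annotations rather than three.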

Page 64: The Gene Ontology project and its application to  fission yeast  functional genomics data

Unknown vs. Unannotated

• “Unknown” is used when the curator has determined that there is no existing literature to support an annotation.

– Biological process unknown GO:0000004

– Molecular function unknown GO:0005554

– Cellular component unknown GO:0008372

• NOT the same as having no annotation at all

– No annotation means that no one has looked yet