gene ontology overview and perspective lung development ontology workshop

25
Gene Ontology Overview and Perspective Lung Development Ontology Workshop

Upload: alvin-henderson

Post on 30-Dec-2015

224 views

Category:

Documents


1 download

TRANSCRIPT

Gene OntologyOverview and Perspective

Lung Development Ontology Workshop

2

what kinds of things exist?

what are the relationships between these things?

eye

_part of

sclera

_is a

sense organ

developsfrom

Optic placode

A biological ontology is:

A (machine) interpretable representation of some aspect of biological reality

http://www.macula.org/anatomy/eyeframe.html

3

Gene Ontology (GO) Consortium

aa

www.geneontology.org Formed to develop a shared language adequate for the annotation of molecular characteristics across organisms; a common language to

share knowledge.

Seeks to achieve a mutual understanding of the definition and meaning of any word used; thus we are able to support cross-database queries.

Members agree to contribute gene product annotations and associated sequences to GO database; thus facilitating data analysis and semantic interoperability.

4

Gene Ontology widely adopted

AgBase

5

Molecular Function = elemental activity/task

the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity

Biological Process = biological goal or objective broad biological goals, such as mitosis or purine metabolism, that

are accomplished by ordered assemblies of molecular functions

Cellular Component = location or complex subcellular structures, locations, and macromolecular complexes;

examples include nucleus, telomere, and RNA polymerase II holoenzyme

GO represents three biological domains

Terms are defined graphically relative to other terms

7

Build and maintain logically rigorous and biologically accurate ontologies

Comprehensively annotate reference genomes

Support genome annotation projects for all organisms

Freely provide ontologies, annotations and tools to the research community

The Gene Ontology (GO)

1. Build and maintain logically rigorous and biologically accurate ontologies

2. Comprehensively annotate reference genomes

3. Support genome annotation projects for all organisms

4. Freely provide ontologies, annotations and tools to the research community

8

The GO is still developing daily both in ontological structures and in domain knowledge

Ontology development workshops focus on specific domains needing revision and bring together ontology developers and domain experts

Currently running ~2 workshops / year1. Metabolism and cell cycle (Aug, 2004)2. Immunology and defense response (Nov05, Apr06)3. Early CNS development (June, 2006)4. Peripheral nervous system development (Feb, 2007)5. Blood Pressure Regulation (June, 2007)6. Muscle Development (July, 2007)

Building the ontologies

9

725 new terms related to immunology

127 new terms added to cell type ontology

Building the ontology: Immune System Process

Red part_of

Blue is_aAlex Diehl

10

P05147

PMID: 2976880

GO:0047519IDA

P05147 GO:0047519 IDA PMID:2976880

GO Term

Reference

Evidence

Annotating Gene Products using GO

Gene Product

11

Annotations are assertions

There is evidence that this gene product can be best classified using this term

The source of the evidence and other information is included

There is agreement on the meaning of the term

12

Annotations are the connections between genomic information and the GO.Experiments provide the data that enables us to annotate gene products with terms from the ontologies.

Annotations for App: amyloid beta (A4) precursor protein

Annotations are assertions

13

NO Direct ExperimentInferred from evidence

Direct Experiment in organism

We use evidence codes to describe the basis of the annotation

IDA: Inferred from direct assay IPI: Inferred from physical interaction IMP: Inferred from mutant phenotype IGI: Inferred from genetic interaction IEP: Inferred from expression pattern IEA: Inferred from electronic annotation ISS: Inferred from sequence or structural

similarity TAS: Traceable author statement NAS: Non-traceable author statement IC: Inferred by curator RCA: Reviewed Computational Analysis ND: no data available

14

GO Annotation Stats:

I

GO Annotations

Total manual GO annotations - 388,633

Total proteins with manual annotations – 80,402

Contributing Groups (including MGI): - 19

Total Pub Med References – 346,002

Total number predicted annotations – 17,029,553

Total number taxa – 129,318

Total number distinct proteins – 2,971,374

April 24, 2007

15Now we can query across all annotations based on shared biological activity.

Annotations of gene products to GO are genome specific

16

GO is a functional annotation system of great utility to the data-driven biologist

17

GO enables genomic data analysis

Microarrays allow biologists to record changes in gene function across entire genomes

Result: Vast amounts of gene expression data desperately needing cataloging and tagging

Many data analysis tools use GO graph structure to statistically evaluate clusters of co-expressed genes based on shared functional annotations 680 pub (of 1517) on GO list 46 microarray tools contributed

18

OCT 13, 2006

Cancer Genome Projects

GO supports functional classifications

19

GO is wildly successful

FIGURE 3. Representative cell-type-specific genes and corresponding molecular functions.

Nature: January 2007

20

Comprehensively annotate Reference Genomes

Saccharomyces cerevisiae Schizosaccharomyces

pombe Arabidopsis thaliana

Human Mouse Fly Rat Chicken Zebrafish Worm Dicty E.coli

21

Priority genes: those implicated in human diseases

Determine orthologs/homologs in reference genomes

For these genes, comprehensively curate biomedical literature

Reference Genome Annotation Project

Mary Dolan

22

Shared annotation focus = Coordinated attention to ontology structure

Orthology/homology set across primary model organisms

Reference ID mappings including associations of sequences, gene/proteins, and human diseases

Ultimately, transparent access to comprehensive information about genes among the primary data providers

Reference Genome Development Projects

23

1. Verifying and maintaining domain representations in the ontology that reflect best knowledge of the real world.

- Depends on the involvement of biologists (domain experts)- Difficult to automate- Must accommodate continuing changes in what we think we

understand about biological systems

2. Providing comprehensive annotations, where experimental evidence is available, for all genes

- Dependant on the quality of annotations from experimental literature

- Combines manual curation by highly-trained scientists supplemented by computational inference prediction annotations

- Comprehensiveness may depend on changes in biomedical publishing

Ongoing Challenges for the GO Consortium

24

acknowledgements

MGI

Carol BultJanan EppigJim KadinJoel RichardsonMartin Ringwald

Lois Maltais TBK Reddy

Monica McAndrews-Hill

Nancy Butler

GO

Michael Ashburner (Cambridge) J. Michael Cherry (Stanford) Suzanna Lewis (LBNL)

Rex Chisholm (NWU) David Hill (Jackson Lab) Midori Harris (EBI) Chris Mungall (LBNL) Jane Lomax (EBI) Eurie Hong (Stanford) Jen Clark (EBI)

GO @ MGI

Alex Diehl Mary Dolan Harold Drabkin David Hill

Li Ni Dmitry Sitnikov

25

Gene Ontology www.geneontology.org

MGI projects are supported by NIH [NHGRI, NICH, and NCI].

Bar Harbor, Maine, USA

Mouse Genome Informaticswww.informatics.jax.org

GO Consortium is supported by NIH-NHGRI and by the European Union RTD Programme

PRO is supported by NIGMS

Corpora is supported by NLM