Gene Ontology and Functional Annotation

  • Gene Ontology and Functional Annotation

    Donghui Li

    ASPB Plant Biology, June 29, 2008, Merida

  • TAIR literature statistics

                                      May 2007    May 2008
    References                          31,058      34,179
    Research articles                   22,640      25,001
    Full-text papers                    15,572      16,638
    Average new papers/month               204         216
    Loci with valid references           9,289      10,847

  • Outline

    Functional annotation

    Controlled vocabularies: GO and PO

    Functional annotation at TAIR

    Community annotation

  • Functional annotation

    Functional annotation is defined as the process of collecting information about a gene's biological identity:

    molecular function (protein kinase)
    biological roles (protein phosphorylation)
    subcellular localization (cytoplasm)
    aliases
    mutant phenotype
    expression domain

  • What is an annotation?

    An annotation is a statement that a gene product

    has a particular molecular function
    is involved in a particular biological process
    is located within a certain cellular component

    as determined by a particular method and as described in a particular reference.

    Adapted from Harold J Drabkin, The Jackson Laboratory

  • Adapted from Harold J Drabkin, The Jackson Laboratory

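    Smith et al. (2006) determined by an enzyme assay that Abc2 has protein kinase activity, is involved in the process of protein phosphorylation, and is located in the cytoplasm.

    Reference: Smith et al. (2006)
    Evidence code: determined by an enzyme assay
    Controlled vocabularies: protein kinase activity, protein phosphorylation, cytoplasm
    Gene product: Abc2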
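
As a minimal sketch of the four parts labelled on this slide, the example sentence can be written as structured records. The class name `Annotation`, its field names, and the choice to emit one record per term are illustrative assumptions, not the data model used by TAIR or the GO Consortium.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One gene product linked to one controlled-vocabulary term (hypothetical layout)."""
    gene_product: str
    term: str        # controlled-vocabulary term
    aspect: str      # which kind of statement the term makes
    evidence: str    # how the association was determined
    reference: str   # where it was reported


def annotate(gene_product, terms_by_aspect, evidence, reference):
    """Expand one curated sentence into one record per term."""
    return [
        Annotation(gene_product, term, aspect, evidence, reference)
        for aspect, term in terms_by_aspect.items()
    ]


# The slide's example sentence, split into its four labelled parts.
abc2 = annotate(
    gene_product="Abc2",
    terms_by_aspect={
        "molecular function": "protein kinase activity",
        "biological process": "protein phosphorylation",
        "cellular component": "cytoplasm",
    },
    evidence="enzyme assay",
    reference="Smith et al. (2006)",
)

for record in abc2:
    print(f"{record.gene_product}: {record.term} [{record.aspect}]")
```

Under this layout the single sentence yields three records that share the same gene product, evidence, and reference, which is one way to keep each statement separately searchable.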

  • Controlled vocabulary (CV)

    Non-controlled vocabulary:
    same name, different concept
    different name, same concept

    Controlled vocabulary:
    A standardized, restricted set of defined terms designed to reduce ambiguity in describing a concept

  • Same name, different concept

    Cell

  • Same name, different concept

    germination

    seed germination
    pollen germination
    spore germination

  • Different name, same concept

    glucose biosynthesis
    glucose synthesis
    glucose formation
    glucose anabolism
    gluconeogenesis

    noncarbohydrate precursors (pyruvate, amino acids and glycerol) → glucose

    phytochromobilin synthase activity = phytochromobilin:ferredoxin oxidoreductase activity
    (3Z)-phytochromobilin + oxidized ferredoxin = biliverdin IXa + reduced ferredoxin (EC:1.3.7.4)

    translation = protein biosynthesis = protein formation
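
The "different name, same concept" problem can be sketched as a lookup that maps every alternative name to one preferred term. The table below uses only the names from this slide. Which name in each group is treated as preferred, and the names `SYNONYMS`, `PREFERRED`, and `normalize`, are assumptions made for illustration.

```python
# Hypothetical synonym table built from the slide's examples. The choice of
# preferred term in each group is an assumption for illustration.
SYNONYMS = {
    "gluconeogenesis": [
        "glucose biosynthesis",
        "glucose synthesis",
        "glucose formation",
        "glucose anabolism",
    ],
    "translation": ["protein biosynthesis", "protein formation"],
    "phytochromobilin:ferredoxin oxidoreductase activity": [
        "phytochromobilin synthase activity",
    ],
}

# Invert the table so that any known name maps to its preferred term.
PREFERRED = {}
for preferred, alternatives in SYNONYMS.items():
    PREFERRED[preferred.lower()] = preferred
    for name in alternatives:
        PREFERRED[name.lower()] = preferred


def normalize(name):
    """Return the preferred term for a name, or None if the name is unknown."""
    return PREFERRED.get(name.strip().lower())


print(normalize("Glucose synthesis"))
print(normalize("protein biosynthesis"))
print(normalize("cell"))
```

Two databases that normalize their labels this way can be compared term by term, while a name outside the vocabulary is reported as unknown instead of being matched by guesswork.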

  • Cross-species cross-database comparison is problematic without CV

    translation
    protein biosynthesis

    phytochromobilin synthase activity
    phytochromobilin:ferredoxin oxidoreductase activity

  • Cross-species cross-database comparison is problematic without CV

    pollen
    spore
    germination

    seed germination
    pollen germination
    spore germination

  • Controlled vocabularies used by TAIR

    GO: The Gene Ontology, Gene Ontology Consortium

    PO: The Plant Ontology, Plant Ontology Consortium

  • Gene Ontology

    molecular function: catalytic / binding activities
    kinase activity, DNA binding activity, transcription factor

    biological process: biological goal or objective
    signal transduction, mitosis, purine metabolism

    cellular component: location or complex
    nucleus, ribosome, proteasome

  • Term

  • Ontology structure: directed acyclic graph (DAG)

    DAG: each child may have one or more parents

    Diagram labels: parent 1, child, parent 2

  • Ontology structure: directed acyclic graph (DAG)

    protein complex
    organelle
    mitochondrion
    fatty acid beta-oxidation multienzyme complex

  • Ontology structure: term-term relationships

    protein complex
    organelle
    mitochondrion
    fatty acid beta-oxidation multienzyme complex

    Relationship types: is-a, part-of
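
A small sketch of why the DAG matters in practice: because a child may have more than one parent, collecting everything a term implies means following every parent link, not a single path. The four terms are the ones on these slides. The specific edges in `EDGES` are an assumed reading of the diagram (the transcript does not preserve the arrows), and the function name `ancestors` is hypothetical.

```python
from collections import deque

# term -> list of (relationship, parent). The edges are an assumption about
# what the slide's diagram showed; only the term names come from the slide.
EDGES = {
    "fatty acid beta-oxidation multienzyme complex": [
        ("is_a", "protein complex"),
        ("part_of", "mitochondrion"),
    ],
    "mitochondrion": [("is_a", "organelle")],
    "protein complex": [],
    "organelle": [],
}


def ancestors(term, edges=EDGES):
    """Collect every term reachable by following parent links upward."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relationship, parent in edges.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("fatty acid beta-oxidation multienzyme complex")))
```

With these assumed edges, a gene product annotated to the multienzyme complex would also be found by a search for protein complex, mitochondrion, or organelle, which a strict single-parent tree could not express.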

  • Gene ontology browser: AmiGO

    http://www.geneontology.org

    http://amigo.geneontology.org

  • Plant Ontology

    Plant structure

    morphological and anatomical structures

    stamen, petal, guard cell

    Growth and developmental stages

    whole plant growth stages and plant structure developmental stages

    seedling growth, rosette growth, leaf development stages, embryo development stages

  • How are annotations made?

    gene: AT5G27620
    term: GO:0004672 protein kinase activity
    evidence: kinase assay
    reference: The Plant Journal (2006) 47:701
    association: the gene linked to the term, with its evidence and reference

  • Evidence codes

    Experimental evidence codes
    EXP - Inferred from Experiment
    IMP - Inferred from Mutant Phenotype
    IDA - Inferred from Direct Assay
    IGI - Inferred from Genetic Interaction
    IPI - Inferred from Physical Interaction
    IEP - Inferred from Expression Pattern

    Computational analysis evidence codes
    ISS - Inferred from Sequence or structural Similarity
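
One common use of evidence codes is to separate experimentally supported annotations from computational ones. The sketch below covers only the codes listed on this slide. The function names, the tuple layout, the gene name `GeneX`, and the pairing of the kinase assay example with `IDA` are assumptions for illustration.

```python
EXPERIMENTAL = {
    "EXP": "Inferred from Experiment",
    "IMP": "Inferred from Mutant Phenotype",
    "IDA": "Inferred from Direct Assay",
    "IGI": "Inferred from Genetic Interaction",
    "IPI": "Inferred from Physical Interaction",
    "IEP": "Inferred from Expression Pattern",
}
COMPUTATIONAL = {
    "ISS": "Inferred from Sequence or structural Similarity",
}


def evidence_category(code):
    """Classify a code using only the two groups shown on the slide."""
    code = code.upper()
    if code in EXPERIMENTAL:
        return "experimental"
    if code in COMPUTATIONAL:
        return "computational"
    return "not listed on this slide"


def experimental_only(annotations):
    """Keep (gene, term, code) tuples whose code is experimental."""
    return [a for a in annotations if evidence_category(a[2]) == "experimental"]


annotations = [
    # Assumption: a kinase assay is recorded as a direct assay (IDA).
    ("AT5G27620", "GO:0004672", "IDA"),
    # Hypothetical gene, included only to show a computational annotation.
    ("GeneX", "GO:0004672", "ISS"),
]

print(experimental_only(annotations))
```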

  • Functional annotation of Arabidopsis genome using GO

    [Figure: annotation status of Arabidopsis genes, May 2008. Categories: Known; Known, EXP; Unannotated; Unknown]

  • Search GO Annotations

  • Papers entered into TAIR (May 07 to May 08)

                                     Total   With gene-related data   Indexed   Curated
    Papers in priority 1 journals      222                      166      100%   144 (86%)
    Papers in priority 2 journals      546                      385      100%   207 (54%)
    Papers in priority 3 journals      517                      314      100%   31 (10%)
    Papers in priority 4 journals     1291                      461      100%   11 (2%)
    Total                             2576                     1326      1326   393 (30%)
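
The table's totals can be checked with a few lines of arithmetic. This sketch assumes the Curated percentage is the curated count divided by the number of papers with gene-related data, which is how the figures appear to be derived. The names `ROWS`, `REPORTED_TOTAL`, and `curated_share` are hypothetical.

```python
# (total, with gene-related data, curated) per journal priority, from the table.
ROWS = {
    "priority 1": (222, 166, 144),
    "priority 2": (546, 385, 207),
    "priority 3": (517, 314, 31),
    "priority 4": (1291, 461, 11),
}
REPORTED_TOTAL = (2576, 1326, 393)

# Sum each column across the four priority levels.
column_sums = tuple(sum(column) for column in zip(*ROWS.values()))
print(column_sums == REPORTED_TOTAL)


def curated_share(with_gene_data, curated):
    """Curated papers as a percentage of papers with gene-related data."""
    return 100 * curated / with_gene_data


overall = curated_share(REPORTED_TOTAL[1], REPORTED_TOTAL[2])
print(f"{overall:.1f}%")
```

Per-row shares computed the same way may differ from the slide by a percentage point depending on whether values are rounded or truncated.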

  • TAIR - Plant Physiology collaboration

    Author submits annotation after the paper is accepted

    Web-based interface

    AGI locus identifier (At1g01040)Gene function annotation linked to loci with method

    Will expand to include other journals (Plant Cell ...)

  • Functional annotation submission [email protected]

  • Add your comment on TAIR

    What is functional annotation;

    I will also introduce controlled vocabularies and gene ontology;

    This is followed by a brief introduction to the current status of functional annotation at TAIR;

    Finally I will demonstrate how to use the tools available at TAIR to find functions of Arabidopsis genes.

    So what is an annotation? An annotation is a statement that a gene product: has a particular molecular function; is involved in a particular biological process; is located within a certain cellular component; as determined by a particular method and as described in a particular reference.

    To give you a simple example, an annotation would look like this: Smith et al. determined by a direct assay that Abc2 has protein kinase activity, is involved in the process of protein phosphorylation, and is located in the cytoplasm.

    Each annotation contains four pieces of information: 1) the gene product; 2) the vocabularies or terms used to describe the biological identity of the gene product; 3) an evidence code that indicates how the association between the gene product and the terms is made; and 4) a reference.

    The key to high-quality annotations is to use standardized controlled vocabularies or terms to describe various aspects of a gene product. For example, in this annotation we used terms such as kinase activity, protein phosphorylation, and cytoplasm to describe Abc2.

    Presentation of GO annotations: when created and why. What they are and how they are used. Also presentation of how GO is being used at TAIR.

    Meanings of "cell":
    the basic structural and functional unit of all living organisms
    a device that delivers an electric current as the result of a chemical reaction
    a room where a prisoner is kept
    any small compartment (e.g. cells of a honeycomb)
    a small unit serving as part of or as the nucleus of a larger political movement

    A cell can be a whole organism or a part of it.

    GO is the designation of a project as well as the product of the project.

    A gene product might be associated with or located in one or more cellular components; it is active in one or more biological processes, during which it performs one or more molecular functions. For example, the gene product cytochrome c can be described by the molecular function term oxidoreductase activity, the biological process terms oxidative phosphorylation and induction of cell death, and the cellular component terms mitochondrial matrix and mitochondrial inner membrane.

    Starting with the cellular level, we are not distinguishing cell types, organs, etc.

    Gene Ontology is a collaboration among the fly (FlyBase), mouse (MGD), and yeast (SGD) genome databases. All three groups had started independent projects to produce controlled vocabularies for the biology of their organisms.

    You will all be familiar with hierarchical systems to classify enzymes (EC) or functions (YPD, SwissPROT, MIPS, ...).

    We have divided our project into the creation of three ontologies. These are not necessarily hierarchical; rather, they can be a network of associations, that is, a directed acyclic graph (DAG).

    Process: cell cycle, nutrient transport, behavior, ...
    Function: alcohol dehydrogenase, ...
    Cellular Location: organelle, protein complex, subcellular compartment

    Plant structure

    A controlled vocabulary of botanical terms describing morphological and anatomical structures representing organ, tissue and cell types and their relationships.

    stamen, petal, guard cell etc.

    Growth and developmental stages

    A controlled vocabulary of terms describing (i) whole plant growth stages and (ii) plant structure developmental stages.

    seedling growth, rosette growth, leaf development stages, embryo development stages, flower development stages, etc.

    The Plant Journal (2006), 47: 701
    AT5g27620
    Diverse phosphoregulatory mechanisms controlling cyclin-dependent kinase-activating kinases in Arabidopsis

    Each annotation must also include an evidence code to indicate how the annotation to a particular term is supported.

    TAIR has been annotating Arabidopsis gene products using controlled vocabularies since 2002. This is an ongoing effort. As of December this year, we have made over 100 thousand GO annotations.

    This figure shows that 59% of the Arabidopsis genes have at least one Molecular Function term annotation. Unknown means that a curator has looked at the relevant literature but could not capture GO information. Unannotated means these loci have not yet been looked at by a curator.

    As of December this year, we have made over 100 thousand GO annotations. This is a breakdown of the annotation numbers: they break down about equally among the three ontologies. These annotations were made to a total of 28,795 distinct Arabidopsis genes. Total number of GO terms: 3102. 50 thousand
