Introduction to the GO: a user’s guide
NCSU GO Workshop
29 October 2009
Genomic Annotation
Genome annotation is the process of attaching biological information to genomic sequences. It consists of two main steps:
1. identifying functional elements in the genome: “structural annotation”
2. attaching biological information to these elements: “functional annotation”
Biologists often use the term “annotation” when they are referring only to structural annotation.
Structural annotation:
[Figure: CHICK_OLF6, showing DNA annotation and protein annotation; panel labels: TRAF 1, 2 and 3; TRAF 1 and 2. Data from Ensembl Genome browser.]
Functional annotation:
[Figure: catenin]
Structural & Functional Annotation
Structural Annotation:
• Open reading frames (ORFs) predicted during genome assembly
• predicted ORFs require experimental confirmation
• the Sequence Ontology (SO) provides a structured controlled vocabulary for sequence annotation
Functional Annotation:
• annotation of gene products = Gene Ontology (GO) annotation
• initially, predicted ORFs have no functional literature and GO annotation relies on computational methods (rapid)
• functional literature exists for many genes/proteins prior to genome sequencing
• GO annotation does not rely on a completed genome sequence!
Introduction to GO
1. Bio-ontologies
2. The Gene Ontology (GO)
• a GO annotation example
• GO evidence codes
• literature biocuration & computational analysis
• ND vs no GO
• sources of GO
3. Using the GO
4. The gene association file
1. Bio-ontologies
Bio-ontologies
Bio-ontologies are used to capture biological information in a way that can be read by both humans and computers.
• necessary for high-throughput “omics” datasets
• allows data sharing across databases
Objects in an ontology (e.g. genes, cell types, tissue types, stages of development) are well defined.
The ontology shows how the objects relate to each other.
Bio-ontologies: http://www.obofoundry.org/
Ontologies
• digital identifier (computers)
• description (humans)
• relationships between terms
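As a rough illustration of these three parts of an ontology term, the sketch below models terms in Python. The `Term` class and the `ancestors` helper are hypothetical names, not part of any GO software. The two GO IDs and names are taken from the NDUFAB1 example later in these slides, and the single is_a link between them is included only to show how relationships can be followed.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One ontology term: an ID for computers, a name for humans, links to parents."""
    term_id: str
    name: str
    is_a: list = field(default_factory=list)  # IDs of parent terms


# Illustrative two-term ontology; a real ontology has many more terms
# and more relationship types than is_a.
ONTOLOGY = {
    "GO:0008610": Term("GO:0008610", "lipid biosynthetic process"),
    "GO:0006633": Term("GO:0006633", "fatty acid biosynthetic process",
                       is_a=["GO:0008610"]),
}


def ancestors(term_id, ontology):
    """Return the IDs of all terms reachable from term_id via is_a links."""
    found = set()
    stack = list(ontology[term_id].is_a)
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(ontology[parent].is_a)
    return found


print(ancestors("GO:0006633", ONTOLOGY))
```

Because relationships are stored explicitly, a gene product annotated to a specific term can also be found when searching for any of its more general ancestor terms.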
2. The Gene Ontology
Functional Annotation
• Gene Ontology (GO) is the de facto method for functional annotation
• Widely used for functional genomics (high throughput)
• Many tools available for gene expression analysis using GO
• The GO Consortium homepage: http://www.geneontology.org
GO Mapping Example
NDUFAB1 (UniProt P52505)
Bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa
Biological Process (BP or P)
• GO:0006633 fatty acid biosynthetic process TAS
• GO:0006120 mitochondrial electron transport, NADH to ubiquinone TAS
• GO:0008610 lipid biosynthetic process IEA
Cellular Component (CC or C)
• GO:0005759 mitochondrial matrix IDA
• GO:0005747 mitochondrial respiratory chain complex I IDA
• GO:0005739 mitochondrion IEA
Molecular Function (MF or F)
• GO:0005504 fatty acid binding IDA
• GO:0008137 NADH dehydrogenase (ubiquinone) activity TAS
• GO:0016491 oxidoreductase activity TAS
• GO:0000036 acyl carrier activity IEA
Parts of a GO annotation:
• aspect or ontology
• GO:ID (unique)
• GO term name
• GO evidence code
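To make the parts of an annotation concrete, here is a minimal Python sketch that splits one annotation line from the NDUFAB1 example into GO ID, term name and evidence code. The function name `parse_annotation` and the regular expression are assumptions made for this illustration; the aspect is not parsed because in the example it is given by the heading above each group of lines.

```python
import re

# GO ID, then the term name, then an evidence code of 2-3 capital letters.
ANNOTATION = re.compile(r"^(GO:\d{7})\s+(.+?)\s+([A-Z]{2,3})$")


def parse_annotation(line):
    """Split 'GO:ID term name CODE' into its three parts."""
    match = ANNOTATION.match(line.strip())
    if match is None:
        raise ValueError(f"not an annotation line: {line!r}")
    go_id, name, evidence = match.groups()
    return {"go_id": go_id, "name": name, "evidence": evidence}


parsed = parse_annotation(
    "GO:0006120 mitochondrial electron transport, NADH to ubiquinone TAS"
)
print(parsed)
```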
GO EVIDENCE CODES
Direct Evidence Codes
• IDA - inferred from direct assay
• IEP - inferred from expression pattern
• IGI - inferred from genetic interaction
• IMP - inferred from mutant phenotype
• IPI - inferred from physical interaction
Indirect Evidence Codes
inferred from literature
• IGC - inferred from genomic context
• TAS - traceable author statement
• NAS - non-traceable author statement
• IC - inferred by curator
inferred by sequence analysis
• RCA - inferred from reviewed computational analysis
• IS* - inferred from sequence*
• IEA - inferred from electronic annotation
Other
• NR - not recorded (historical)
• ND - no biological data available
* IS* codes:
• ISS - inferred from sequence or structural similarity
• ISA - inferred from sequence alignment
• ISO - inferred from sequence orthology
• ISM - inferred from sequence model
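The grouping of evidence codes can be captured as a simple lookup table, which is handy when filtering annotations by how they were made. The sketch below is only one possible encoding; the names `EVIDENCE_GROUPS` and `evidence_group` are made up for this example, and the groups follow the slide rather than any official file.

```python
# Grouping taken from the slide; the dictionary layout is just one way to encode it.
EVIDENCE_GROUPS = {
    "direct": {"IDA", "IEP", "IGI", "IMP", "IPI"},
    "literature": {"IGC", "TAS", "NAS", "IC"},
    "sequence analysis": {"RCA", "ISS", "ISA", "ISO", "ISM", "IEA"},
    "other": {"NR", "ND"},
}


def evidence_group(code):
    """Return the slide's group name for an evidence code, or None if not listed."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return None


# Evidence codes on NDUFAB1's Molecular Function annotations in the example.
for code in ["IDA", "TAS", "TAS", "IEA"]:
    print(code, "->", evidence_group(code))
```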
Biocuration of literature
• detailed function
• “depth”
• slower (manual)
P05147
PMID: 2976880
Find a paper about the protein.
Biocuration of Literature: detailed gene function
Read paper to get experimental evidence of function
Use most specific term possible
experiment assayed kinase activity: use IDA evidence code
Sequence analysis
• rapid (computational)
• “breadth” of coverage
• less detailed
Unknown Function vs No GO
ND – no data
• Biocurators have tried to add GO but there is no functional data available
• Previously: “process_unknown”, “function_unknown”, “component_unknown”
• Now: “biological process”, “molecular function”, “cellular component”
No annotations (including no “ND”): biocurators have not annotated
• this is important for your dataset: what % has GO?
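The “what % has GO?” question can be answered with a short script once annotations are loaded. The sketch below is one possible approach, not a standard tool: `go_coverage` is a hypothetical helper, the `HYPOTHETICAL_*` identifiers are invented, and the input is assumed to be a mapping from gene product ID to (GO ID, evidence code) pairs. It keeps “ND only” separate from “no annotation at all”, as described above.

```python
def go_coverage(gene_products, annotations):
    """Count gene products with real GO, with only ND, and with no annotation.

    annotations maps a gene product ID to a list of (go_id, evidence_code) pairs.
    """
    counts = {"annotated": 0, "nd_only": 0, "no_go": 0}
    for product in gene_products:
        codes = [evidence for _go_id, evidence in annotations.get(product, [])]
        if not codes:
            counts["no_go"] += 1      # biocurators have not annotated it
        elif all(code == "ND" for code in codes):
            counts["nd_only"] += 1    # annotated, but no functional data found
        else:
            counts["annotated"] += 1
    return counts


# Hypothetical dataset: only P52505 and its GO IDs come from the slides.
dataset = ["P52505", "HYPOTHETICAL_1", "HYPOTHETICAL_2", "HYPOTHETICAL_3"]
annotations = {
    "P52505": [("GO:0005739", "IEA"), ("GO:0005504", "IDA")],
    "HYPOTHETICAL_1": [("GO:0003674", "ND")],  # root term "molecular function"
}
counts = go_coverage(dataset, annotations)
for label, count in counts.items():
    print(f"{label}: {count} ({100.0 * count / len(dataset):.0f}%)")
```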
Sources of GO
1. Primary sources of GO: from the GO Consortium (GOC) & GOC members
• most up to date
• most comprehensive
2. Secondary sources: other resources that use GO provided by GOC members
• public databases (e.g. NCBI, UniProtKB)
• genome browsers (e.g. Ensembl)
• array vendors (e.g. Affymetrix)
• GO expression analysis tools
Different tools and databases display the GO annotations differently.
Since GO terms are continually changing and GO annotations are continually added, you need to know when the GO annotations were last updated.
Secondary Sources of GO annotation
EXAMPLES:
• public databases (e.g. NCBI, UniProtKB)
• genome browsers (e.g. Ensembl)
• array vendors (e.g. Affymetrix)
CONSIDERATIONS:
• What is the original source?
• When was it last updated?
• Are evidence codes displayed?
For more information about GO
• GO Evidence Codes: http://www.geneontology.org/GO.evidence.shtml
• gene association file information: http://www.geneontology.org/GO.format.annotation.shtml
• tools that use the GO: http://www.geneontology.org/GO.tools.shtml
• GO Consortium wiki: http://wiki.geneontology.org/index.php/Main_Page
3. Using the GO
Use GO Browsers for:
• searching for GO terms
• searching for gene product annotation
• filtering sets of annotations and downloading results
• creating/using GO slims
GO Browsers
QuickGO Browser (EBI GOA Project)
http://www.ebi.ac.uk/ego/
• Can search by GO Term or by UniProt ID
• Includes IEA annotations
AmiGO Browser (GO Consortium Project)
http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
• Can search by GO Term or by UniProt ID
• Does not include IEA annotations
Use GO for…
• Determining which classes of gene products are over-represented or under-represented.
• Grouping gene products by biological function.
• Relating a protein’s location to its function.
• Focusing on particular biological pathways and functions (hypothesis-driven data interrogation).
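The slides do not say how over-representation is evaluated. One commonly used choice, assumed here purely for illustration, is a one-sided hypergeometric test: given a background of gene products of which some carry a GO term, how likely is it to see at least the observed number of hits in a list drawn from that background? The function name and the numbers in the example are invented.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) when `sample` gene products are drawn without replacement
    from `population`, of which `annotated` carry the GO term."""
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)


# Invented numbers: 10 gene products in the background, 5 carry the term,
# and both gene products in a list of 2 carry it.
p = enrichment_p_value(population=10, annotated=5, sample=2, hits=2)
print(f"p = {p:.4f}")
```

Real tools differ in the test they use and in how they correct for testing many GO terms at once, which is one reason to check how a given tool evaluates over- and under-representation.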
http://www.geneontology.org/
However…
• many of these tools do not support non-model organisms
• the tools have different computing requirements
• may be difficult to determine how up-to-date the GO annotations are…
Need to evaluate tools for your system.
Evaluating GO tools
Some criteria for evaluating GO Tools:
1. Does it include my species of interest (or do I have to “humanize” my list)?
2. What does it require to set up (computer usage/online)?
3. What was the source for the GO (primary or secondary) and when was it last updated?
4. Does it report the GO evidence codes (and is IEA included)?
5. Does it report which of my gene products has no GO?
6. Does it report both over/under represented GO groups and how does it evaluate this?
7. Does it allow me to add my own GO annotations?
8. Does it represent my results in a way that facilitates discovery?
4. Gene association files
The gene association (ga) file
• standard file format used to capture GO annotation data
• tab-delimited file containing 15* fields of information:
• information about the gene product (database, accession, name, symbol, synonyms, species)
• information about the function: GO ID, ontology, reference, evidence, qualifiers, context (with/from)
• data about the functional annotation: date, annotator
* 2 additional fields will soon be added to capture information about isoforms and other ontologies.
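Since the ga file is tab-delimited with a fixed number of fields, one line can be read with a few lines of Python. The sketch below assumes the 15-column layout (GAF 1.0) described above; the column names in `GAF_COLUMNS` are informal labels chosen for this example, and the sample record is made up apart from the values taken from the NDUFAB1 slide.

```python
# The 15 columns of a GAF 1.0 line, in file order (labels are informal).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by",
]


def parse_gaf_line(line):
    """Turn one tab-delimited annotation line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GAF_COLUMNS):
        raise ValueError(f"expected {len(GAF_COLUMNS)} fields, got {len(fields)}")
    return dict(zip(GAF_COLUMNS, fields))


# Made-up record for illustration: only the accession, symbol, GO ID and
# evidence code come from the NDUFAB1 slide; other values are placeholders.
example = "\t".join([
    "UniProtKB", "P52505", "NDUFAB1", "", "GO:0005739",
    "PLACEHOLDER_REF", "IEA", "", "C",
    "placeholder name", "", "protein",
    "taxon:9913", "20091029", "PLACEHOLDER",
])
record = parse_gaf_line(example)
print(record["go_id"], record["aspect"], record["evidence_code"])
```

Files with the two additional fields mentioned above would need two more entries in the column list.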
(additional column added to this example)
gene product information
metadata: when & who
function information
Gene association files
GO Consortium ga files
• many organism specific files
• also includes EBI GOA files
EBI GOA ga files
• UniProt file contains GO annotation for all species represented in UniProtKB
AgBase ga files
• organism specific files
• AgBase GOC file – submitted to GO Consortium & EBI GOA
• AgBase Community file – GO annotations not yet submitted or not supported
• all files are quality checked