Post on 20-Jan-2016

From Functional Genomics to Physiological Model:

Using the Gene Ontology

Fiona McCarthy, Shane Burgess, Susan Bridges

The AgBase Databases, Institute of Digital Biology, Mississippi State University

From Functional Genomics to Physiological Model

1. A user’s guide to the Gene Ontology (GO)

2. Finding GO for farm animal species

3. Adding GO to your dataset

4. GO based tools for biological modeling

5. Examples: using GO for biological modeling

• Presentation available at AgBase

• Websites available as handout

1. A User’s Guide to GO

What is the Gene Ontology?

Emily Dimmer, GOA EBI: “a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing”

• assigns functions to gene products at different levels, depending on how much is known about a gene product

• is used for a diverse range of species

• is structured to be queried at different levels, e.g. find all the chicken gene products in the genome that are involved in signal transduction, then zoom in on all the receptor tyrosine kinases

• each human-readable GO function has a digital tag to allow computational analysis of large datasets
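Querying at different levels amounts to walking the ontology graph: a gene product annotated to a specific term also counts under every broader term above it. The following is a minimal sketch of that idea, assuming a toy three-term hierarchy and made-up gene product names (neither is taken from GO or from AgBase data).

```python
# Toy child -> parents map. The hierarchy and the gene product names are
# illustrative assumptions, not real GO data.
PARENTS = {
    "receptor tyrosine kinase signaling": ["cell surface receptor signaling"],
    "cell surface receptor signaling": ["signal transduction"],
    "signal transduction": [],
}

ANNOTATIONS = {
    "geneA": ["receptor tyrosine kinase signaling"],
    "geneB": ["cell surface receptor signaling"],
    "geneC": ["signal transduction"],
}


def ancestors(term):
    """Return the term itself plus every term reachable through PARENTS."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def products_for(term):
    """Gene products annotated to the term or to any of its descendants."""
    return sorted(
        gene
        for gene, terms in ANNOTATIONS.items()
        if any(term in ancestors(t) for t in terms)
    )


# Broad query first, then "zoom in" on the more specific term.
broad = products_for("signal transduction")
narrow = products_for("receptor tyrosine kinase signaling")
```

The broad query returns all three toy gene products, while the narrow query returns only the one annotated at the most specific level.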

GO Mapping Example

NDUFAB1 (UniProt P52505)
Bovine NADH dehydrogenase (ubiquinone) 1, alpha/beta subcomplex, 1, 8kDa

Biological Process (BP or P)
GO:0006633 fatty acid biosynthetic process TAS
GO:0006120 mitochondrial electron transport, NADH to ubiquinone TAS
GO:0008610 lipid biosynthetic process IEA

Cellular Component (CC or C)
GO:0005759 mitochondrial matrix IDA
GO:0005747 mitochondrial respiratory chain complex I IDA
GO:0005739 mitochondrion IEA

Molecular Function (MF or F)
GO:0005504 fatty acid binding IDA
GO:0008137 NADH dehydrogenase (ubiquinone) activity TAS
GO:0016491 oxidoreductase activity TAS
GO:0000036 acyl carrier activity IEA

Parts of a GO annotation:

• aspect or ontology

• GO:ID (unique)

• GO term name

• GO evidence code
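These four parts map naturally onto a small record type. The sketch below parses annotation lines written the way they appear in the NDUFAB1 example; the `GOAnnotation` class and `parse_line` helper are hypothetical names introduced here for illustration, not part of any AgBase or GO Consortium tool.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    aspect: str      # P, F or C
    go_id: str       # unique identifier, e.g. GO:0006633
    term_name: str   # human-readable term name
    evidence: str    # GO evidence code, e.g. TAS


# "GO:" plus seven digits, then the term name, then a 2-3 letter code.
LINE = re.compile(r"(GO:\d{7})\s+(.+)\s+([A-Z]{2,3})")


def parse_line(aspect, line):
    """Split one 'GO:ID term name CODE' line into its parts."""
    match = LINE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a GO annotation line: {line!r}")
    go_id, term_name, evidence = match.groups()
    return GOAnnotation(aspect, go_id, term_name, evidence)


ndufab1 = [
    parse_line("P", "GO:0006633 fatty acid biosynthetic process TAS"),
    parse_line("P", "GO:0006120 mitochondrial electron transport, NADH to ubiquinone TAS"),
    parse_line("C", "GO:0005739 mitochondrion IEA"),
]
```

Keeping the GO:ID separate from the term name matters because the ID is the stable "digital tag", while term names can be reworded over time.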

GO EVIDENCE CODES

Direct Evidence Codes
IDA - inferred from direct assay
IEP - inferred from expression pattern
IGI - inferred from genetic interaction
IMP - inferred from mutant phenotype
IPI - inferred from physical interaction

Indirect Evidence Codes

inferred from literature
IGC - inferred from genomic context
TAS - traceable author statement
NAS - non-traceable author statement
IC - inferred by curator

inferred by computational analysis
RCA - inferred from reviewed computational analysis
ISS - inferred from sequence or structural similarity
IEA - inferred from electronic annotation

Other
NR - not recorded (historical)
ND - no biological data available
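Because the evidence code travels with every annotation, a dataset can be grouped or filtered by how each annotation was made. The sketch below uses the groupings from the evidence code list above and the molecular function annotations of the NDUFAB1 example; the function names are hypothetical and chosen only for this illustration.

```python
# Groupings follow the evidence code list in this presentation.
DIRECT = {"IDA", "IEP", "IGI", "IMP", "IPI"}
INDIRECT_LITERATURE = {"IGC", "TAS", "NAS", "IC"}
INDIRECT_COMPUTATIONAL = {"RCA", "ISS", "IEA"}
OTHER = {"NR", "ND"}


def evidence_group(code):
    """Name the group an evidence code belongs to."""
    if code in DIRECT:
        return "direct"
    if code in INDIRECT_LITERATURE:
        return "indirect: literature"
    if code in INDIRECT_COMPUTATIONAL:
        return "indirect: computational"
    if code in OTHER:
        return "other"
    raise ValueError(f"unknown evidence code: {code}")


def drop_electronic(annotations):
    """Keep every annotation except those inferred electronically (IEA)."""
    return [(go_id, code) for go_id, code in annotations if code != "IEA"]


# (GO:ID, evidence code) pairs from the NDUFAB1 molecular function example.
molecular_function = [
    ("GO:0005504", "IDA"),
    ("GO:0008137", "TAS"),
    ("GO:0016491", "TAS"),
    ("GO:0000036", "IEA"),
]

groups = [evidence_group(code) for _, code in molecular_function]
curated = drop_electronic(molecular_function)
```

Whether to drop IEA annotations is a judgment call for each analysis; the filter is shown only to make that choice explicit.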

Unknown Function vs No GO

ND (no data): biocurators have tried to add GO but there is no functional data available

• Previously: “process_unknown”, “function_unknown”, “component_unknown”

• Now: “biological process”, “molecular function”, “cellular component”

No annotations (including no “ND”): biocurators have not annotated
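The distinction can be made explicit when screening a dataset. This is a minimal sketch, assuming annotations are held as (GO:ID, evidence code) pairs per gene product; the product names and the `annotation_status` helper are hypothetical.

```python
def annotation_status(annotations):
    """Classify one gene product by what its annotation list contains."""
    if not annotations:
        # Nothing at all, not even ND: biocurators have not looked yet.
        return "not annotated"
    if all(code == "ND" for _, code in annotations):
        # Biocurators looked and found no functional data.
        return "no data"
    return "annotated"


# Hypothetical gene products. GO:0008150 is the root term
# "biological_process", which is where an ND annotation is attached.
dataset = {
    "product_1": [("GO:0006633", "TAS")],
    "product_2": [("GO:0008150", "ND")],
    "product_3": [],
}

statuses = {name: annotation_status(a) for name, a in dataset.items()}
```

Products reported as "not annotated" are the ones worth sending through further annotation, whereas "no data" products have already been checked.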

2. Finding GO for Farm Animals

GO Browsers

QuickGO Browser (EBI GOA Project)
http://www.ebi.ac.uk/ego/
• Can search by GO Term or by UniProt ID
• Includes IEA annotations

AmiGO Browser (GO Consortium Project)
http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
• Can search by GO Term or by UniProt ID
• Does not include IEA annotations

Getting GO

http://www.ebi.ac.uk/GOA/downloads.html (includes farm animals)

http://www.geneontology.org/GO.current.annotations.shtml#filter

http://www.agbase.msstate.edu/

3. Adding GO to your dataset

GO analysis of array data

• Probe data is linked to gene product data (gene, cDNA and EST IDs)

• For some arrays, the vendor provides corresponding GO data for the gene products (but has it been updated?)

• Not all gene products will have GO annotation; those without will not be included in modeling

• Need to get the maximum amount of GO data to do biological modeling

Example: Netaffx

Secondary source of GO annotation

GORetriever

+ many more

GORetriever

GORetriever Results

Save as a text file for GOSlimViewer

But what about IDs not supported by GORetriever?

GOanna

GOanna Results

Query IDs are hyperlinked to BLAST data (files must be in the same directory)

If there is a good alignment* to a protein with GO: transfer the GO to your record

If there is not a good alignment, or the aligned record doesn’t have GO: go to the literature

*What is a good alignment?

Add the results to a GO summary file (tab-delimited text file containing ID, GO:ID, aspect)
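A GO summary file of this shape is simple to produce by hand or by script. The sketch below writes one row per annotation with the three columns named above; the column order follows the description, while the absence of a header row is an assumption, so check what the receiving tool expects. The rows reuse annotations from the NDUFAB1 example.

```python
import csv
import io

# One row per annotation: ID, GO:ID, aspect (P, F or C).
rows = [
    ("P52505", "GO:0006633", "P"),
    ("P52505", "GO:0005759", "C"),
    ("P52505", "GO:0005504", "F"),
]


def write_summary(rows, handle):
    """Write tab-delimited rows with no header (an assumption)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)


# Written to memory here; pass an open text file to save to disk instead.
buffer = io.StringIO()
write_summary(rows, buffer)
summary_text = buffer.getvalue()
```

To save to disk, open a file with `open("go_summary.txt", "w", newline="")` and pass it as `handle`; the file name is only an example.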

Contact AgBase to request GO annotation of specific gene products.

GOSlimViewer: summarizing results

GOSlimViewer results

Pie chart legend (biological process GO Slim categories): response to stimulus; amino acid and derivative metabolic process; transport; behavior; cell differentiation; metabolic process; regulation of biological process; cell communication; nucleobase, nucleoside, nucleotide and nucleic acid metabolic process; cell death; cell motility; macromolecule metabolic process; multicellular organismal development; catabolic process; biological_process

“process unknown”, “function unknown”, “component unknown”

??

Figure labels: B-cells, Stroma; immune response, apoptosis, cell-cell signaling

Looking at function, not genes

Pie Graphs: relative proportions
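The relative proportions behind such a pie graph are just category counts divided by the total. A minimal sketch follows, with made-up counts for three of the GO terms named in this presentation; the numbers are illustrative assumptions, not results.

```python
# Hypothetical number of gene products per GO category in one sample.
counts = {
    "immune response": 6,
    "apoptosis": 3,
    "cell-cell signaling": 1,
}


def relative_proportions(counts):
    """Convert raw category counts into fractions of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no annotations to summarize")
    return {category: n / total for category, n in counts.items()}


proportions = relative_proportions(counts)

# Print the slices as percentages, largest first.
for category, share in sorted(proportions.items(), key=lambda item: -item[1]):
    print(f"{category}: {share:.0%}")
```

Proportions make two samples of different sizes comparable, but they say nothing about whether a difference is statistically meaningful.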

GOModeler: quantitative, hypothesis-driven modeling. Coming soon (contact AgBase)

GOModeler

McCarthy et al. “AgBase: a functional genomics resource for agriculture.” BMC Genomics. 2006 Sep 8;7:229.

4. GO based tools for biological modeling

http://www.geneontology.org/

However…

• many of these tools do not support farm animal species

• the tools have different computing requirements

• it may be difficult to determine how up-to-date the GO annotations are

Need to evaluate tools for your system.

Evaluating GO tools

Some criteria for evaluating GO Tools:

1. Does it include my species of interest (or do I have to “humanize” my list)?

2. What does it require to set up (computer usage/online)?

3. What was the source for the GO (primary or secondary) and when was it last updated?

4. Does it report the GO evidence codes (and is IEA included)?

5. Does it report which of my gene products has no GO?

6. Does it report both over/under represented GO groups and how does it evaluate this?

7. Does it allow me to add my own GO annotations?

8. Does it allow me to represent my results in a way that facilitates discovery?
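Criterion 6 asks how a tool decides that a GO group is over- or under-represented. The slides do not say which statistic any particular tool uses; one common choice is the hypergeometric test, sketched here as an assumption with made-up numbers.

```python
from math import comb


def over_representation_p(population, annotated, sample, hits):
    """P(at least `hits` annotated items in a random sample).

    population: gene products on the array
    annotated:  of those, how many carry the GO term
    sample:     size of the gene list of interest
    hits:       how many in the list carry the GO term
    """
    upper = min(sample, annotated)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)


def under_representation_p(population, annotated, sample, hits):
    """P(at most `hits` annotated items in a random sample)."""
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(0, hits + 1)
    )
    return favourable / comb(population, sample)


# Hypothetical numbers: 20 gene products, 5 with the term, and a list
# of 5 in which 3 carry the term.
p_over = over_representation_p(20, 5, 5, 3)
p_under = under_representation_p(20, 5, 5, 3)
```

When many GO groups are tested at once, a real analysis would also need a multiple-testing correction, which this sketch leaves out.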

5. Using GO for biological modeling

Using GO for biological modeling:

• hypothesis generating

• hypothesis driven
