Modifying GO
How changes are made to GO, and how you can be involved
Modifying GO:
Why we have to make changes to GO
How changes are made to GO
The GO editorial office
How you can make changes to GO
  SourceForge
  Interest groups
  Content meetings
Why we have to make changes to GO
Why not just keep it the same?
  annotations would never need fixing
  wouldn’t need an editorial office
Why we have to make changes to GO
Because GO reflects current knowledge of biology
  biology is always changing
New organisms being added
  makes existing term arrangements incorrect
  we generally only add terms in response to annotation needs
  if an organism is not being annotated, its terms won’t be present in GO
Not everything was perfect from the outset
Growth of GO
GO term history 2001 - 2005
[Figure: number of terms (axis 0 to 25000) by date, Jan-01 to Mar-05; series: defined terms, undefined terms, obsoletes]
Evolution of GO
Original GO
  FlyBase (Drosophila)
  MGI (Mouse)
  SGD (S. cerevisiae)
Later
  TAIR (Arabidopsis)
  TIGR (microbes including prokaryotes)
  SWISS-PROT (several thousand species inc. human)
  PSU (P. falciparum)
Recent additions
  PAMGO (plant pathogens)
Example - parasites
Original GO:
Example - parasites
Annotation of P. falciparum
  protozoan cellular parasite
  intracellular infection (erythrocytes)
Parasite proteins located in host nucleus
What cellular component term to annotate to?
  ‘nucleus’ refers to parasite nucleus when annotating parasite
Example - parasites
Added new term ‘host’:
Example - parasites
parasite gene products located in host nucleus annotated here
parasite gene products located in parasite nucleus annotated here
Improving GO
Some parts of GO need expanding/improving
In progress:
  immunology
  cell cycle
  development
  fungal toxin metabolism
Still to do:
  transporters and transport
  signal transducer/signalling pathways
Improving GO - example
Interactions between organisms
  e.g. symbiosis, host/pathogen interaction, biofilm formation
Not well covered in GO
  very few terms
  some inconsistencies
PAMGO developed node last year
Improving GO - example
Editing GO
Logistics
  file formats
  DAG-Edit
  cvs
Communication
  monthly reports
  diff emails
  updating annotations
  mailing lists
Editing GO - file formats
GO available in different formats:
  OBO flat file (terms & definitions only)
  GO flat file (terms & definitions only)
  XML (terms & definitions only)
  OWL (terms & definitions only)
  MySQL (terms, definitions & annotations)
OBO flat file is the primary editing format for ontologies
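As a rough illustration of what the OBO flat file holds, the sketch below reads [Term] stanzas made of "tag: value" lines into Python dictionaries. The function name parse_obo_terms and the sample stanza are assumptions made for this example, not part of any GO tool, and the sketch ignores most of the format (header tags, escapes, trailing modifiers).

```python
# Illustrative sample only; the synonym syntax shown may differ between
# versions of the OBO format.
SAMPLE = """\
! lines starting with '!' are comments

[Term]
id: GO:0005634
name: nucleus
namespace: cellular_component
synonym: "cell nucleus" EXACT []

[Typedef]
id: part_of
name: part of
"""


def parse_obo_terms(text):
    """Collect every [Term] stanza as a dict mapping tag -> list of values."""
    terms = []
    current = None  # dict for the stanza being read, or None outside [Term]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            # A stanza header: only [Term] stanzas are kept.
            if line == "[Term]":
                current = {}
                terms.append(current)
            else:
                current = None
            continue
        if current is not None and ":" in line:
            tag, value = line.split(":", 1)
            current.setdefault(tag.strip(), []).append(value.strip())
    return terms


if __name__ == "__main__":
    for term in parse_obo_terms(SAMPLE):
        print(term["id"][0], term["name"][0])
```

Values are kept as lists because a tag such as synonym can occur more than once in a stanza.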
GO file formats
Different formats have different update times:
  OBO flat file: every 30 minutes
  GO flat file, XML, OWL: daily
  MySQL: weekly without IEAs, monthly with IEAs
AmiGO runs from the MySQL database so will not show new terms immediately
QuickGO updates weekly
Editing GO - DAG-Edit
Generic ontology editing tool developed by GO consortium
Java-based stand-alone tool
Used to do almost all ontology edits
demo
Downloading DAG-Edit: http://sourceforge.net/project/showfiles.php?group_id=36855
DAG-Edit help: http://www.godatabase.org/dev/java/dagedit/docs/index.html
Editing GO example: adding new term
Suggestion of new term from annotator:
Check whether term exists under another name
  search terms and synonyms
Determine if valid GO term
  e.g. disease process, individual gene products not allowed
Decide on placement in ontology
Editing GO example: adding new term
Write definition if not provided
  from biological dictionaries, experts, papers, online sources
  some types of terms e.g. metabolism have standardised definitions, see:
    http://www.geneontology.org/GO.function.guidelines.shtml?all#defs
    http://www.geneontology.org/GO.process.guidelines.shtml
Add term with new id in DAG-Edit
Inform annotator of new term name and id
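The first check in this procedure, whether a suggested term already exists under another name, amounts to searching term names and synonyms together. A minimal sketch in Python follows; the function find_existing_terms and the small TERMS list are hypothetical stand-ins for the real ontology and for the search built into the editing tool.

```python
# Illustrative data only: a tiny stand-in for the ontology.
TERMS = [
    {"id": "GO:0005634", "name": "nucleus", "synonyms": ["cell nucleus"]},
    {"id": "GO:0005739", "name": "mitochondrion", "synonyms": ["mitochondria"]},
]


def find_existing_terms(query, terms):
    """Return ids of terms whose name or any synonym equals the query,
    ignoring case and surrounding whitespace."""
    wanted = query.strip().lower()
    hits = []
    for term in terms:
        labels = [term["name"]] + term.get("synonyms", [])
        if any(wanted == label.lower() for label in labels):
            hits.append(term["id"])
    return hits


if __name__ == "__main__":
    print(find_existing_terms("Cell Nucleus", TERMS))
    print(find_existing_terms("chloroplast", TERMS))
```

An empty result only means no exact match was found; a curator would still look for near-matches before adding a new term.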
Editing GO - cvs
Amended ontology file committed to cvs (concurrent versions system) repository located at Stanford
cvs prevents changes being overwritten by other editors
Allows files to be reverted to former versions
log files detailing changes
Anonymous cvs available: http://www.geneontology.org/GO.downloads.shtml?all#cvs
Editing GO - monthly reports
Every month a full report released with all changes made to ontologies that month:
http://www.geneontology.org/MonthlyReports/
Generated with set of Perl scripts available on GO FTP site
Includes:
  new terms
  term name changes
  new definitions
  term movements
  term obsoletions
  SF items closed
  overall statistics
Editing GO - diff files
Daily email with all changes made to file that day
Subscribe to go-diff mailing list: http://www.geneontology.org/GO.mailing.lists.shtml?all#godiff
Example diff:
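To give a rough idea of what a line-based diff of the ontology file looks like, the sketch below compares two versions of a single stanza with Python's standard difflib module. Both versions are invented for illustration and are not taken from a real go-diff email; the helper name ontology_diff is also an assumption.

```python
import difflib

# Invented "before" and "after" versions of one stanza (illustration only).
OLD = """\
[Term]
id: GO:0005634
name: nucleus
namespace: cellular_component
"""

NEW = """\
[Term]
id: GO:0005634
name: nucleus
namespace: cellular_component
synonym: "cell nucleus" EXACT []
"""


def ontology_diff(old_text, new_text):
    """Return unified-diff lines describing how old_text became new_text."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


if __name__ == "__main__":
    for line in ontology_diff(OLD, NEW):
        print(line)
```

Added lines are prefixed with "+" and removed lines with "-", so a reader can scan a day's changes without opening the full file.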
Mailing lists
Several GO mailing lists:
GO (main)
  discussion of ontology development
  general queries/error reporting
  high-traffic
GO-friends
  mainly announcements
  low-traffic
annotation
  all annotation issues
GO-diff
GO-database
  all database/techy issues
Mailing lists
All mailing lists archived: http://www.geneontology.org/GO.contents.archives.mail.shtml?all
Subscribe: http://www.geneontology.org/GO.mailing.lists.shtml
Interest group mailing lists
Editing GO - updating annotations
Annotations become out-of-sync with ontologies
  term name changes
  term obsoletions
  term merges
Databases have individual strategies for flagging
2 week notice given on obsoletions, provided no objections
Example email: http://www.geneontology.org/email-go/go-arc/go-2005/0012.html
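One possible flagging strategy can be sketched in Python: compare each annotation's term id against the sets of obsoleted and merged ids and report the ones that need attention. Everything here is hypothetical, including the function name flag_annotations, the "EX:" placeholder ids and the gene names; each database has its own approach.

```python
def flag_annotations(annotations, obsolete_ids, merged_into):
    """Return (gene_product, term_id, problem) for out-of-sync annotations.

    annotations  -- list of (gene_product, term_id) pairs
    obsolete_ids -- set of term ids that have been made obsolete
    merged_into  -- dict mapping a merged-away id to the surviving id
    """
    flagged = []
    for gene_product, term_id in annotations:
        if term_id in obsolete_ids:
            flagged.append((gene_product, term_id, "obsolete: re-annotate"))
        elif term_id in merged_into:
            flagged.append(
                (gene_product, term_id, "merged into " + merged_into[term_id])
            )
    return flagged


if __name__ == "__main__":
    # Placeholder data: the EX: ids are not real GO terms.
    annotations = [("geneA", "EX:0001"), ("geneB", "EX:0002"), ("geneC", "EX:0003")]
    for row in flag_annotations(annotations, {"EX:0002"}, {"EX:0003": "EX:0001"}):
        print(row)
```

A merge can usually be fixed automatically by swapping in the surviving id, whereas an obsoletion needs a curator to choose a replacement term.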
The GO editorial office
Located at European Bioinformatics Institute, Cambridge UK
Four full-time editors of the ontologies:
  Midori Harris
  Jane Lomax
  Amelia Ireland
  Jennifer Clark
The GO editorial office
Primary responsibility to edit ontologies in response to community needs
Also:
  website
  documentation
  outreach
    GO in other systems
    new annotation groups
  training
Requesting changes to GO
Curator requests tracker
  demo of how to add an item
  types of changes
    new terms
    errors, e.g. true path violations (TPVs)
    obsolete terms
Interest groups
Content meetings
Requesting changes to GO - curator requests tracker
Web-based tracking system hosted at SourceForge.net
Tracker item for each new request or question
Allows requests/suggestions/comments to be added by anyone
Daily digest of new tracker items goes to GO mailing list
Curator requests tracker
Requesting changes to GO - curator requests tracker
Common types of changes suggested:
  new term request
    https://sourceforge.net/tracker/index.php?func=detail&aid=1207105&group_id=36855&atid=440764
  reporting errors
    https://sourceforge.net/tracker/index.php?func=detail&aid=1206995&group_id=36855&atid=440764
  obsoletion/merge requests
    https://sourceforge.net/tracker/index.php?func=detail&aid=1200109&group_id=36855&atid=440764
  add synonym
    https://sourceforge.net/tracker/index.php?func=detail&aid=1202748&group_id=36855&atid=440764
  queries
  term move
Requesting changes to GO - curator requests tracker
Obtaining a SourceForge account
demo: https://sourceforge.net/
Requesting changes to GO - curator requests tracker
Submitting a request
demo: https://sourceforge.net/tracker/?atid=440764&group_id=36855&func=browse
Requesting changes to GO - curator requests tracker
Things to bear in mind when submitting a request:
Have you given us enough information?
  useful things to include are references to papers, name/id of gene being annotated, EC numbers
  if you’re requesting e.g. an obsoletion or merge, have you put a reason?
Have you included a definition?
  very useful where requests are very organism-specific or if you’re an expert
  source of definition, PubMed id, ISBN etc.
Are there any synonyms you would like included for a new term? What type?
  synonym types are exact, broader and narrower
Have you suggested parentage for a new term?
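The checklist above can be read as a set of simple checks on a new-term request. The sketch below expresses them in Python; the NewTermRequest class, its field names and the review_request function are hypothetical and do not correspond to the actual tracker form, which is free text.

```python
from dataclasses import dataclass, field

# The three synonym types named in the checklist.
SYNONYM_TYPES = {"exact", "broader", "narrower"}


@dataclass
class NewTermRequest:
    """A hypothetical structured version of a new-term request."""
    name: str
    definition: str = ""
    definition_source: str = ""                       # e.g. PubMed id or ISBN
    references: list = field(default_factory=list)    # papers, gene ids, EC numbers
    synonyms: dict = field(default_factory=dict)      # synonym text -> type
    parents: list = field(default_factory=list)       # suggested parent terms


def review_request(req):
    """Return a list of checklist items the request does not yet satisfy."""
    problems = []
    if not req.references:
        problems.append("no supporting information (papers, gene name/id, EC numbers)")
    if not req.definition:
        problems.append("no definition")
    elif not req.definition_source:
        problems.append("definition has no source (PubMed id, ISBN etc.)")
    for text, kind in req.synonyms.items():
        if kind not in SYNONYM_TYPES:
            problems.append(f"synonym {text!r} has unknown type {kind!r}")
    if not req.parents:
        problems.append("no suggested parentage")
    return problems


if __name__ == "__main__":
    request = NewTermRequest(
        name="example term",
        definition="An example definition.",
        synonyms={"sample term": "related"},
    )
    for problem in review_request(request):
        print("-", problem)
```

None of these checks blocks a submission; they mirror the questions an editor is likely to ask before the item can be closed.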
Tracker volume
average 65.9 new items/month
Tracker volume
24.5 days to complete on average
Other trackers
Listed at: http://www.geneontology.org/GO.sourceforge.links.shtml?all#track
  annotation
  website
  integrity checks
  big ideas
Content meetings
Short meetings aimed at developing specific areas of GO
  ontology content proposals defined and discussed before meeting
  small number of people
  invited experts
  specific topics
Next content meeting - November?
Possible topics for discussion:
  immunology
  transport/transporters
  cell cycle
  signal transduction/signal transducer activity
  response to/defense terms
Interest groups
Groups of experts for a specific topic e.g. development, cell cycle, plants
Includes GO curators/annotators and external experts
Communicate by mailing lists and at meetings
Interest groups
We actively encourage annotators to join interest groups for their field
Complete list of groups: http://www.geneontology.org/GO.interests.shtml?all
GO documentation
Much documentation: http://www.geneontology.org/GO.contents.doc.shtml
Acknowledgements:
The GO Consortium