How We Annotated Genomes for Free: Fast and Accurate Functional Analysis Using Subsystems Technology
Rob Edwards
Depts of Computer Science and Biology, San Diego State University
Mathematics and Computer Science Division, Argonne National Laboratory
ASM Philadelphia, May 2009
http://rast.nmpdr.org/?page=Conference
Pigeons
If it’s good enough for Google – it’s good enough for me
Annotation Servers
• Metagenomes – http://metagenomics.theseed.org
• Complete genomes – http://rast.nmpdr.org
How much has been sequenced?
[Figure: number of known sequences by year, marking the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and environmental sequencing]
How much will be sequenced?
[Figure: projected sequencing milestones: 100 people; everybody at an ASM meeting; everybody in the USA; all cultured bacteria; one genome from every species; most major microbial environments]
The SEED Family
Subsystem Spreadsheet
Organism                    Chaperone   Subunit   Usher   Adhesin
S. enterica Enteritidis     2389        2388      2387    2386
E. coli HS                  3068        3067      3066    3065
B. cenocepacia J2315        2604        2603      2602    2601
S. maltophilia              1085        1088      1087    1086
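A spreadsheet like this is essentially a genome × role table. The minimal sketch below, assuming the cell values are gene identifiers numbered in chromosome order (the slide does not say so), stores the rows and checks whether each genome's genes for the subsystem sit next to each other. `genes_for_role`, `is_clustered`, and the `max_span` threshold are hypothetical names chosen for illustration, not SEED functions.

```python
# Hypothetical representation of the subsystem spreadsheet above:
# one row per genome, one column per functional role. The cell values are
# the gene identifiers from the slide; reading them as feature numbers in
# chromosome order is an assumption made for this sketch.
ROLES = ["Chaperone", "Subunit", "Usher", "Adhesin"]

SPREADSHEET = {
    "S. enterica Enteritidis": [2389, 2388, 2387, 2386],
    "E. coli HS": [3068, 3067, 3066, 3065],
    "B. cenocepacia J2315": [2604, 2603, 2602, 2601],
    "S. maltophilia": [1085, 1088, 1087, 1086],
}


def genes_for_role(spreadsheet, role):
    """Return {genome: gene id} for one column (role) of the spreadsheet."""
    col = ROLES.index(role)
    return {genome: ids[col] for genome, ids in spreadsheet.items()}


def is_clustered(gene_ids, max_span):
    """True if the largest and smallest gene numbers differ by at most max_span."""
    return max(gene_ids) - min(gene_ids) <= max_span


for genome, ids in SPREADSHEET.items():
    status = "clustered" if is_clustered(ids, max_span=3) else "scattered"
    print(f"{genome}: {status}")
```

Every row spans four consecutive numbers, so under that assumption all four genomes report "clustered", even S. maltophilia, whose roles appear in a different order.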
Over 1,000 Subsystems
Three-level “hierarchy”
• Amino Acids and Derivatives
  – Alanine, serine, and glycine
    • Serine Biosynthesis
• Amino Acids and Derivatives
  – Lysine, threonine, methionine, and cysteine
    • Methionine Biosynthesis
Make your own subsystems!
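The three levels (class, subclass, subsystem) map naturally onto a nested dictionary. This is a sketch only: it holds just the two example paths from the slide, and `add_subsystem` is a hypothetical helper illustrating where a user-defined subsystem would slot in, not the SEED's own subsystem editor.

```python
# Hypothetical nested-dict view of the three-level hierarchy:
# class -> subclass -> list of subsystems. Only the slide's examples are included.
HIERARCHY = {
    "Amino Acids and Derivatives": {
        "Alanine, serine, and glycine": ["Serine Biosynthesis"],
        "Lysine, threonine, methionine, and cysteine": ["Methionine Biosynthesis"],
    },
}


def iter_paths(hierarchy):
    """Yield (class, subclass, subsystem) triples for every leaf."""
    for cls, subclasses in hierarchy.items():
        for subclass, subsystems in subclasses.items():
            for subsystem in subsystems:
                yield cls, subclass, subsystem


def add_subsystem(hierarchy, cls, subclass, subsystem):
    """Insert a user-defined subsystem, creating class/subclass levels as needed."""
    hierarchy.setdefault(cls, {}).setdefault(subclass, []).append(subsystem)


for path in iter_paths(HIERARCHY):
    print(" > ".join(path))
```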
Class                                   # Subsystems
Amino Acids and Derivatives               56
Carbohydrates                             97
Cell Division and Cell Cycle              10
Cell Wall and Capsule                     50
Clustering-based subsystems              193
Cofactors, Vitamins, Pigments             43
DNA Metabolism                            30
Fatty Acids, Lipids, and Isoprenoids      22
Membrane Transport                        41
Metabolism of Aromatic Compounds          30
Motility and Chemotaxis                    8
Nitrogen Metabolism                       11
Nucleosides and Nucleotides               14
Phosphorus Metabolism                      6
Photosynthesis                             9
Potassium metabolism                       3
Protein Metabolism                        52
RNA Metabolism                            39
Regulation and Cell signaling             23
Respiration                               44
Secondary Metabolism                      24
Stress Response                           37
Sulfur Metabolism                         12
Virulence                                116
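The per-class counts can be tallied directly. The listed classes add up to 970, a little under the "over 1,000" headline, which suggests (an inference, not something the slides state) that the table does not show every class. The function names below are illustrative.

```python
# Subsystem counts per class, transcribed from the table above.
SUBSYSTEMS_PER_CLASS = {
    "Amino Acids and Derivatives": 56,
    "Carbohydrates": 97,
    "Cell Division and Cell Cycle": 10,
    "Cell Wall and Capsule": 50,
    "Clustering-based subsystems": 193,
    "Cofactors, Vitamins, Pigments": 43,
    "DNA Metabolism": 30,
    "Fatty Acids, Lipids, and Isoprenoids": 22,
    "Membrane Transport": 41,
    "Metabolism of Aromatic Compounds": 30,
    "Motility and Chemotaxis": 8,
    "Nitrogen Metabolism": 11,
    "Nucleosides and Nucleotides": 14,
    "Phosphorus Metabolism": 6,
    "Photosynthesis": 9,
    "Potassium metabolism": 3,
    "Protein Metabolism": 52,
    "RNA Metabolism": 39,
    "Regulation and Cell signaling": 23,
    "Respiration": 44,
    "Secondary Metabolism": 24,
    "Stress Response": 37,
    "Sulfur Metabolism": 12,
    "Virulence": 116,
}


def total_subsystems(counts):
    """Sum of subsystems over all listed classes."""
    return sum(counts.values())


def largest_classes(counts, n):
    """The n classes with the most subsystems, largest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


print(total_subsystems(SUBSYSTEMS_PER_CLASS))     # 970
print(largest_classes(SUBSYSTEMS_PER_CLASS, 3))
```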
The Annotation Process
• Find the phylogenetic neighborhood of your genome
• Look for proteins that related organisms have
  – Core proteins
  – A subset of all subsystems
• Use those calls as a training set for CRITICA/Glimmer
  – An intrinsic training set!
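The first two steps can be sketched in a few lines. This is a toy illustration, not RAST's method: it assumes neighbors are ranked by Jaccard similarity of k-mer sets (a stand-in for real marker-based placement), and that each reference genome comes with a list of annotated roles. "Core" roles are those every neighbor has; the union gives the subset of subsystems worth searching for. Training CRITICA/Glimmer on the resulting calls is left to those tools and is not shown. All function names are hypothetical.

```python
# Hypothetical sketch of steps 1 and 2 of the annotation process.


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Size of intersection over size of union; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbors(query_seq, reference_seqs, k=8, n=3):
    """Names of the n reference genomes whose k-mer sets best match the query."""
    q = kmers(query_seq, k)
    scored = sorted(
        ((jaccard(q, kmers(seq, k)), name) for name, seq in reference_seqs.items()),
        reverse=True,
    )
    return [name for _, name in scored[:n]]


def core_roles(neighbors, roles_by_genome):
    """Roles found in every neighbor: 'core' proteins to look for first."""
    role_sets = [set(roles_by_genome[g]) for g in neighbors]
    return set.intersection(*role_sets) if role_sets else set()


def candidate_roles(neighbors, roles_by_genome):
    """Roles found in any neighbor: the subset of all subsystems worth searching."""
    found = set()
    for g in neighbors:
        found |= set(roles_by_genome[g])
    return found
```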
This one’s for Gary
Automatic Metabolic Reconstruction
• Subsystem, GO, and KEGG connections
  – KEGG EC numbers
  – KEGG reaction numbers
  – SEED reaction numbers (Chris Henry)
• Metabolic flux models
  – Automatically generate FBA matrices (Aaron Best/Matt DeJongh; Hope College)
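An FBA matrix is the stoichiometric matrix S (metabolites × reactions), and flux balance analysis requires S·v = 0 at steady state. The sketch below, assuming a hypothetical role-to-reaction mapping in place of the real Subsystem/EC/reaction links, builds S from a genome's annotated roles and checks the mass-balance constraint. The linear program that FBA then solves (maximize an objective subject to S·v = 0 and flux bounds) is not shown.

```python
# Hypothetical toy network. Each reaction maps metabolite -> stoichiometric
# coefficient (negative = consumed, positive = produced).
REACTION_DB = {
    "uptake_A": {"A": 1},
    "A_to_B": {"A": -1, "B": 1},
    "B_to_C": {"B": -1, "C": 1},
    "secrete_C": {"C": -1},
}

# Hypothetical links from annotated roles to reaction IDs, standing in for
# the Subsystem -> EC number -> reaction connections.
ROLE_TO_REACTIONS = {
    "A transporter": ["uptake_A"],
    "A-to-B enzyme": ["A_to_B"],
    "B-to-C enzyme": ["B_to_C"],
    "C exporter": ["secrete_C"],
}


def reactions_for_roles(roles, role_to_reactions, reaction_db):
    """Collect the reactions catalysed by a genome's annotated roles."""
    selected = {}
    for role in roles:
        for rid in role_to_reactions.get(role, []):
            selected[rid] = reaction_db[rid]
    return selected


def stoichiometric_matrix(reactions):
    """Return (metabolites, reaction_ids, S), where S[i][j] is the
    coefficient of metabolite i in reaction j."""
    rxn_ids = list(reactions)
    mets = sorted({m for coeffs in reactions.values() for m in coeffs})
    S = [[reactions[r].get(m, 0) for r in rxn_ids] for m in mets]
    return mets, rxn_ids, S


def is_steady_state(S, fluxes, tol=1e-9):
    """Check the FBA mass-balance constraint S . v = 0."""
    return all(abs(sum(c * v for c, v in zip(row, fluxes))) <= tol for row in S)


roles = ["A transporter", "A-to-B enzyme", "B-to-C enzyme", "C exporter"]
mets, rxn_ids, S = stoichiometric_matrix(
    reactions_for_roles(roles, ROLE_TO_REACTIONS, REACTION_DB)
)
```

A uniform flux of 1 through the linear pathway A → B → C balances every metabolite; raising only the uptake flux leaves A accumulating and violates S·v = 0.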
The Populated Subsystem
Automatically Compare Metabolic Reconstructions
Find And Suggest Candidate Functions
• Rapidly correct missing annotations
• Add more members to subsystems
• Improve future genome annotations! (especially with new subsystems)
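Comparing reconstructions and suggesting candidate functions both reduce to set operations on roles. In this minimal sketch, assuming that a genome carrying most of a subsystem's roles probably has the rest too, the missing roles are flagged as candidates for missed annotations. The 0.75 threshold and the function names are hypothetical choices, not values from the talk.

```python
# Hypothetical helpers for comparing two reconstructions and flagging gaps.


def compare_reconstructions(roles_a, roles_b):
    """Split two genomes' role sets into shared and genome-specific roles."""
    a, b = set(roles_a), set(roles_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


def suggest_missing_roles(subsystem_roles, genome_roles, min_fraction=0.75):
    """If a genome already carries at least min_fraction of a subsystem's
    roles, return the roles it lacks as candidate missed annotations;
    otherwise return an empty set."""
    subsystem_roles, genome_roles = set(subsystem_roles), set(genome_roles)
    if not subsystem_roles:
        return set()
    present = subsystem_roles & genome_roles
    if len(present) / len(subsystem_roles) >= min_fraction:
        return subsystem_roles - genome_roles
    return set()


usher_pathway = {"Chaperone", "Subunit", "Usher", "Adhesin"}
print(suggest_missing_roles(usher_pathway, {"Chaperone", "Subunit", "Usher"}))
```

Requiring most of the subsystem to be present keeps the suggestions conservative: a genome with a single stray role is not asked to "fill in" an entire pathway.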
The Real Live Test
• 10 genomes submitted on Thursday at 6 pm
• First annotation complete before 8 am Friday
• Remaining annotations completed Friday before noon
• (there were others in the pipeline too!)
Subsystems Coverage
Genome                       Percent of proteins in subsystems
Haloferax denitrificans      20%
Haloferax mediterranei       19%
Haloferax sulfurifontis      19%
Haloferax volcanii DS2       19%
Haloarcula sp. 33800         19%
Haloarcula sp. 33799         18%
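The coverage figure is the fraction of a genome's proteins assigned to at least one subsystem. A minimal sketch, assuming a hypothetical input format that maps each protein ID to its (possibly empty) list of subsystems; a protein in several subsystems is counted once.

```python
def subsystem_coverage(assignments):
    """Percent of proteins assigned to at least one subsystem.

    assignments maps a protein ID to the (possibly empty) list of
    subsystems it belongs to; this input format is a hypothetical choice.
    """
    if not assignments:
        raise ValueError("no proteins supplied")
    covered = sum(1 for subsystems in assignments.values() if subsystems)
    return 100.0 * covered / len(assignments)


example = {"p1": ["Serine Biosynthesis"], "p2": [], "p3": [], "p4": [], "p5": []}
print(subsystem_coverage(example))  # 20.0
```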
Prophages
PHANTOME
Mya Breitbart, Matt Sullivan, Jeff Elhai, Rob Edwards
NSF
Haloferax sulfurifontis prophage
Metagenome Comparisons
Metagenomics RAST has 300 public metagenomes
Compared using tblastx
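tblastx compares nucleotide sequences at the protein level by translating both the query and the database in all six reading frames. The sketch below shows only that six-frame translation step, using the standard genetic code (NCBI translation table 1); the alignment and scoring are left to BLAST itself, and the function names are illustrative.

```python
# Standard genetic code, codons enumerated in TCAG order (NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with non-ACGT bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Protein translations of frames +1..+3 (forward) and -1..-3 (reverse strand)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frames("ATGGCC"))
```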
Human Poop
High Salinity Salterns, San Diego, July 2004
Thanks Beltran Rodriguez-Mueller, Mya Breitbart, & Forest Rohwer
[Figure: low salinity salterns vs. high salinity salterns, July 2004 and Nov 2005]
Free workshops on NMPDR, RAST, mg-RAST, SEED
Contact Leslie McNeil or visit http://www.nmpdr.org/
Acknowledgements
Environmental Genomics: Forest Rohwer, Beltran Rodriguez-Mueller
Annotation Servers: Rick Stevens, Ross Overbeek, Folker Meyer, Bob Olson, Daniel Paarmann, Mark D'Souza, Jared Wilkening, Andreas Wilke
FIG: Ross Overbeek, Veronika Vonstein, Annotators
Artist: Paula Morris