"the opera of phantome": phage annotation tools at the 20th biennial evergreen...
DESCRIPTION
Tools and methods developed under the PhAnToMe (http://www.phantome.org) project between 2009 and 2012 using the Subsystems Technology, the SEED (http://theseed.org) environment, and the RAST server (http://rast.nmpdr.org). Third presentation at the Phage Genomics Workshop at the 20th Biennial Evergreen International Phage Meeting.
The Opera of PhAnToMe
Ramy K. Aziz (Twitter: @azizrk)
Aug 04 2013
opus (LT) = work (Pl. opera)
The environment, the toolbox, and the community
Phage Genomics Workshop, Evergreen 2013
08/04/2013
Past,
Phage Genomics - Evergreen 2013
NSF-funded, 3-year project (09-12) to develop
Phage Annotation Tools and Methods
Four Centers:
- SDSU, San Diego, CA
- VCU, Richmond, VA
- USF, St. Pete, FL
- UA, Tucson, AZ
http://www.phantome.org
… present, ...
?TBA
… and future
Aims
• Direct
– Discuss the concepts behind RAST
– Quickly preview several tools developed under (or under the influence of) the PhAnToMe project
– Demonstrate online, community annotation using SEED
• Indirect {hidden agenda ;)}
– PhAnToMe 2.0?
– Establish community annotation efforts/crowdsourcing
– Seek funding? Crowdfunding?
Outline
• The environment (the SEED)
– The SEED and the ‘Subsystems Technology’
• The toolbox (PhAnToMe and sequels)
– PHAST and RAST
– PHACTS
– PhiSpy
– iVireons
• The community
– Online annotation process
– Annotation jamboree(s)
– Course design
$$
Writing proposals, applying for grants
I. THE ENVIRONMENT
The Opera of PhAnToMe
I. The Environment: SEED
http://theseed.org
Aziz RK, et al. (2012) PLoS ONE 7(10): e48053. doi:10.1371/journal.pone.0048053
SEED: Main concept
One genome
All genomes
“Subsystems-based technologies were developed in the SEED with the view that the interpretation of one genome can be made more efficient and consistent if hundreds of genomes are simultaneously annotated in one subsystem at a time”
Aziz RK, et al. (2012) PLoS ONE 7(10): e48053. doi:10.1371/journal.pone.0048053
SEED: Main concept
• Protein-based database
Jargon: PEG = protein-encoding gene
• The subsystems approach
and
• FIGfams: protein families based on
– sequence similarity
– chromosomal co-occurrence, gene order, synteny
– human curation, evidence-based expert assertions
RAST: automated annotation
What is a subsystem?
• “A subset of functional roles studied across genomes”
• A spreadsheet where:
– each row represents a genome
– each column represents a functional role/feature/protein
– different patterns = variants

            Function 1   Function 2   …   Function n
Genome a
Genome b
…
Genome z
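The spreadsheet view above can be sketched in Python as a mapping from each genome to the functional roles it carries; each distinct presence/absence pattern then falls out as a variant. The genome names, role names, and the idea of grouping by exact pattern are illustrative assumptions, not how the SEED itself assigns variant codes.

```python
from collections import defaultdict

# Hypothetical subsystem: rows are genomes, columns are functional roles.
ROLES = ["Function 1", "Function 2", "Function 3"]
SUBSYSTEM = {
    "Genome a": {"Function 1", "Function 2", "Function 3"},
    "Genome b": {"Function 1", "Function 3"},
    "Genome c": {"Function 1", "Function 2", "Function 3"},
    "Genome d": {"Function 1", "Function 3"},
}


def presence_pattern(roles_present, roles=ROLES):
    """Return a tuple of 1/0 flags, one per column of the spreadsheet."""
    return tuple(int(role in roles_present) for role in roles)


def group_variants(subsystem, roles=ROLES):
    """Group genomes whose rows show the same presence/absence pattern."""
    variants = defaultdict(list)
    for genome, roles_present in subsystem.items():
        variants[presence_pattern(roles_present, roles)].append(genome)
    return dict(variants)


if __name__ == "__main__":
    for pattern, genomes in group_variants(SUBSYSTEM).items():
        print(pattern, genomes)
```

Genomes that share a row pattern end up in the same group, which is the sense in which "different patterns = variants".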
Advantages of subsystems
Subsystems-based annotation
Annotation / Reconstruction
• Annotation from genome (Credit: Andrew Kropinski)
– complete
– accurate
• Reconstruction from metagenome (Credit: Bas Dutilh)
– incomplete
– frameshift
– faulty assembly
II. THE TOOLBOX
The Opera of PhAnToMe
II. PhAnToMe ToolBox
http://www.phantome.org
II. The ToolBox: RAST
• (At least) four ways to annotate a genome via RAST:
– myRAST (local)
  • uses the server, but you can edit offline
– RAST (http://rast.nmpdr.org)
  • annotates online, saves your genome on the server
– “PhAST” (http://www.phantome.org/PhageSeed/Phage.cgi?page=phast)
  • optimized gene-calling
– Use your favorite gene caller, then upload the gbk file to RAST
http://rast.nmpdr.org
phiRAST complaints
• ORF/Gene calling
• tRNA
– bug fixed, but still follow Andrew’s advice
• Too many hypotheticals, etc.
– manual annotation, see later
– need for expert annotations, community contribution
– funding
“PhAST”: some improvement?
PHAST: Disambiguation
Other tools
• PHACTS:
– classifies and predicts lifestyle
• PhiSpy:
– finds prophages
• iVireons:
– predicts phage structural proteins, holins, more to come
II. The ToolBox: PHACTS
• PHAge Classification Tool Set
• Uses a novel similarity algorithm and a supervised Random Forest classifier to predict whether the lifestyle of a phage, described by its proteome, is virulent or temperate.
• The similarity algorithm builds a training set from phages with known lifestyles; that set, together with the lifestyle annotations, is used to train the Random Forest that classifies a phage’s lifestyle.
• PHACTS predictions have a 99% precision rate.
Kate McNair
PHACTS
• Out of the 227 phages with a known lifestyle, PHACTS confidently and correctly predicted the lifestyle of 197.
• Only 2 phages were confidently misclassified; both were virulent phages that contained a functional integrase.
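As a quick check on the counts above, the short Python sketch below computes precision over confident calls only. That reading of "precision" (confident-correct divided by all confident predictions) is an assumption; the slides do not spell out the formula, and treating every remaining phage as "no confident call" is likewise assumed.

```python
# Counts reported on the slide.
TOTAL_KNOWN = 227        # phages with a known lifestyle
CONFIDENT_CORRECT = 197  # confident and correct predictions
CONFIDENT_WRONG = 2      # confident but incorrect predictions


def precision(correct, wrong):
    """Fraction of confident predictions that were correct (assumed definition)."""
    return correct / (correct + wrong)


def not_confident(total, correct, wrong):
    """Phages left without a confident call (assumed: everything else)."""
    return total - correct - wrong


if __name__ == "__main__":
    print(f"precision = {precision(CONFIDENT_CORRECT, CONFIDENT_WRONG):.1%}")
    print("no confident call:",
          not_confident(TOTAL_KNOWN, CONFIDENT_CORRECT, CONFIDENT_WRONG))
```

Under these assumptions, 197 of 199 confident calls are correct, which rounds to 99%, and 28 phages receive no confident call.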
PHACTS
• http://www.phantome.org/PHACTS/
• Other applications
– Host prediction: whether a phage infects Gram-positive or Gram-negative bacteria
– Taxonomy prediction: a phage’s family
PHACTS
II. The ToolBox: PhiSpy
1) Calculate genomic characteristics
• Transcriptional strand orientation
• Customized AT skew
• Customized GC skew
• Protein length
• Abundance of phage words
2) Classify prophage region
• Random Forest
• Pre-calculated training genome
• Input bacterial genome
• Produce a rank for each gene
3) Evaluate predicted prophages
• Phage insertion points
• Similarity of phage proteins
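To make the AT and GC skew characteristics listed above concrete, here is a minimal Python sketch of the textbook definitions, (G - C)/(G + C) and (A - T)/(A + T), computed over sliding windows. PhiSpy uses "customized" versions that the slides do not describe, so the function names, window size, and step here are illustrative assumptions only.

```python
def skew(window, plus, minus):
    """Return (plus - minus) / (plus + minus) for two bases in a window."""
    p = window.count(plus)
    m = window.count(minus)
    return 0.0 if p + m == 0 else (p - m) / (p + m)


def sliding_skews(sequence, window_size, step):
    """Yield (start, GC skew, AT skew) for each full window along the sequence."""
    sequence = sequence.upper()
    for start in range(0, len(sequence) - window_size + 1, step):
        window = sequence[start:start + window_size]
        yield start, skew(window, "G", "C"), skew(window, "A", "T")


if __name__ == "__main__":
    # Toy sequence; real input would be a bacterial genome.
    for start, gc, at in sliding_skews("GGGCAAATGCGCATAT", 4, 4):
        print(start, round(gc, 2), round(at, 2))
```

A shift in skew along the genome is one of several signals; in the workflow above it is combined with the other characteristics before classification.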
Sajia Akhter
PhiSpy
• Performance comparison in 50 complete bacterial genomes

Application     % Identified   % FN   % FP
Prophinder      89%            11%    12%
Phage_finder    82%            18%    1.33%
PhiSpy          94%            6%     0.66%
PhiSpy
• Download: PhiSpy – http://sourceforge.net/projects/phispy
• PhiSpy is on KBase – http://kbase.science.energy.gov
• Web version under final development
• Ran PhiSpy on 4,335 bacterial genomes
• Predicted 12,826 prophages in 3,203 genomes
– 9,101 known prophages
– 3,723 undefined prophages
Sajia Akhter
iVIREONS – http://vdm.sdsu.edu/ivireons
Victor Seguritan
Application of Artificial Neural Networks (ANNs) to Viral Dark Matter
[Flowchart]
• Input: viral hypothetical protein sequences
• Filters: keep sequences ≥ 200 aa; remove ≥ 80% identical sequences
• Conserved Domain DB (rpsblast): e-value <= 0.001 → known; no hit OR e-value > 0.001 → continue
• Reference Sequence DB (tblastp): e-value <= 0.001 → known; no hit OR e-value > 0.001 → continue
• Artificial Neural Networks (ANNs)
• Synthesize ANN-predicted hypothetical protein genes
• Clone in E. coli
• Purification by cobalt affinity
• Validation by TEM or X-ray crystallography
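The screening part of this flowchart can be sketched as a small triage function in Python. The 0.001 e-value cutoff and the 200 aa length threshold come from the slide; the function name, the return labels, and the order in which the length filter and the two searches are applied are assumptions made for illustration.

```python
E_VALUE_CUTOFF = 0.001  # threshold shown on the slide
MIN_LENGTH_AA = 200     # "Keep sequences >= 200 aa"


def triage(length_aa, cdd_evalue, refseq_evalue):
    """Classify one hypothetical protein.

    cdd_evalue / refseq_evalue: best e-value from each database search,
    or None when the search returned no hit.
    """
    # Assumed order: length filter first, then the two database searches.
    if length_aa < MIN_LENGTH_AA:
        return "too short"
    for evalue in (cdd_evalue, refseq_evalue):
        if evalue is not None and evalue <= E_VALUE_CUTOFF:
            return "known"
    return "candidate for ANN"


if __name__ == "__main__":
    print(triage(250, None, 0.5))   # no significant hit in either search
    print(triage(250, 1e-5, None))  # significant conserved-domain hit
```

Only sequences without a significant hit in either database would be passed on to the neural networks in this reading.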
“FAMILIES” OF ANNs
1) General structural proteins
2) Phage major capsid proteins
3) Phage tail/tail fibers/collar, etc.
4) Holins
5) Portals
• Trained with all types of proteins
• Both phages & viruses
iVIREONS – http://vdm.sdsu.edu/ivireons
1
2 Enter User Info
VibrioPhage
3 Upload Sequences
4 View Results
5 Copy Results to a Spreadsheet
Stringencies Reported
- Structural 1:1
- MCP 1:1
- MCP 2:1
- MCP 3:1
- MCP 4:1
- MCP 7:1
- MCP 22:1 (lambda)
- Tail 1:1
- Tail 2:1
- Tail 4:1
- Tail 7:1
- Tail 6.6:1 (lambda)
III. THE COMMUNITY
The Opera of PhAnToMe
SEED allows continuous annotation
[Diagram: SEED, RAST, Genomes, Subsystems, SEED Viewer, New Genomes, Subsystems Editor]
SEED allows community annotation
Later in the meeting,
• Who might be interested in putting together:
a) an outline for an annotation jamboree/workshop with phage experts
b) a syllabus/outline for a course to get undergraduate/graduate students to annotate specific subsystems
c) a proposal to get funding for community annotation efforts
d) all of the above
POST SCRIPTUM
The Opera of PhAnToMe
Aims
• Direct
– Discuss the concepts behind RAST
– Quickly preview several tools developed under (or under the influence of) the PhAnToMe project
– Demonstrate online, community annotation using SEED
• Indirect {hidden agenda ;)}
– PhAnToMe 2.0?
– Establish community annotation efforts/crowdsourcing
– Seek funding? Crowdfunding?
If you use, please cite
• SEED, RAST, myRAST, phiRAST, PHAST:
– RAST: BMC Genomics 2008; SEED servers: PLoS ONE 2012
• Other tools
– PHACTS: McNair et al. PMID: 22238260; PhiSpy: Akhter et al. PMID: 22584627; iVireons: Seguritan et al. PMID: 22927809
• Letters of support
Acknowledgments
Robert A. Edwards, PhD
• PhiRAST development: Ross Overbeek, Robert Olson, Gordon Pusch, Terry Disz, Bruce Parrello
• Phage annotators (Phantomers): Bhakti Dwivedi, Mya Breitbart, et al.
• FIG and all SEED annotators: VeronikaV, SvetaG, OlgaV/Z, et al.
Sajia Akhter
$$ & NSF
Acknowledgments
• PHACTS: Katelyn McNair
• iVireons: Victor Seguritan
$$ & NSF