from sequence to knowledge (phage genomics workshop intro at the 22nd biennial evergreen phage...
TRANSCRIPT
From Sequence to Knowledge
Assembly, Annotation, and Analysis
of Phage genomes from Genomic
and Metagenomic Data Sets
A helping hand through
The Annotation Bottleneck
Ramy K. Aziz
Workshop presenters
6 Aug 2017 Phage Genomics - Evergreen 2017
Alejandro Reyes
AR
Ramy Aziz
RA
Jason Gill
JG
A bit of history…
• Since 2009, the Genomics Workshop has
become an essential part of the Evergreen
phage meeting
• The challenge always is: how to meet
needs/expectations that are so many and
so diverse, in ~4 hours
• The answer is:
…….
“The analysis bottleneck”
• Observation:
– We generate more data than we can analyze.
– We generate sequence data faster than
we can analyze them.
• Opinion:
– Not all bottlenecks are
created equal!
– It is important to define the question(s)
before working on the answer(s)!
Workshop audience
• Who (how many) among you have:
– annotated at least one phage genome?
– worked on a viral metagenome?
– used the command line (Unix, Linux, Mac
Terminal) for sequence analysis?
• We actually ran an online survey,
and here is what we found …
Quick group activity
Defining the question(s):
• Introduce yourself, your institution, and your
favorite phage
• Do you have a genome sequenced? Planning to?
– Why have you sequenced your phage genome?
– Why do you want to sequence your phage genome?
• What is the single most pressing question you
want to have answered from genome analysis?
What you want … is
- complete
- accurate
[Figure: sequences obtained from a genome and from a metagenome, with the problems labeled: incomplete, frameshift, faulty assembly. Credit: Andrew Kropinski; Credit: Bas Dutilh]
A process of reconstruction
• Experimentally
• Computationally
DNA
“Any phage one can get!”
“eDNA”
TGATTGTGTGTTTGCGCAATGCG
ATGTGTATATATAGTGAGCTTGCCC
GTCTCTCTNNNTCTCTTG
TGATTGGTCTNNNTCTCTTGCGCAATGCG
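The computational side of reconstruction comes down to piecing overlapping reads back together. The sketch below is a minimal illustration of that idea only, not the method of any particular assembler: it merges two reads when the end of one exactly matches the start of the other. The function names, the `min_overlap` parameter, and the two toy fragments are assumptions made for illustration. Real assemblers also handle sequencing errors, ambiguous bases (N), and reverse-complement reads.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no exact overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first and shrink until one matches.
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Merge two reads on their exact suffix/prefix overlap, or return None."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep `left` whole and append only the non-overlapping tail of `right`.
    return left + right[size:]


# Hypothetical toy fragments (not real sequencing reads).
read_a = "TGATTGTGTG"
read_b = "GTGTGTTTGCG"
contig = merge_reads(read_a, read_b)
print(contig)
```

Exact matching keeps the example short; an unmatched pair simply returns `None` instead of forcing a join, which mirrors how an incomplete or faulty assembly can arise when reads do not overlap cleanly.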
From Sequence to Knowledge
From raw sequence data to genome submission/publication
• Assembly
• Gene finding/ORF calling
• tRNA calling
• Annotation (assigning functions)
• Orienting
• Validation
• Fixing frameshifts
• Introns and inteins
• Subsystem assignment
• Refinement/secondary annotation loop
• Special purpose: toxins, morons, integrases, lifestyle prediction
• Regulatory elements (promoters, terminators)
• Output: files and graphics
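To make the gene finding/ORF calling step concrete, here is a minimal sketch of a naive six-frame ORF scan. It is an illustration under simplifying assumptions, not how the workshop tools call genes: it only accepts ATG as a start codon, uses the three standard stop codons, and reports coordinates on the strand being scanned. The function names, the `min_codons` threshold, and the toy sequence are hypothetical. Dedicated gene callers rely on statistical models and also consider alternative start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 3):
    """Scan all six reading frames for ATG...stop open reading frames.

    Returns a list of (strand, start, end, orf_sequence) tuples. Coordinates
    are 0-based, end-exclusive, and relative to the scanned strand.
    `min_codons` counts every codon in the ORF, including the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG seen since the last stop
            for pos in range(frame, len(strand_seq) - 2, 3):
                codon = strand_seq[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos
                elif start is not None and codon in STOP_CODONS:
                    end = pos + 3
                    if (end - start) // 3 >= min_codons:
                        orfs.append((strand, start, end, strand_seq[start:end]))
                    start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical toy sequence, far shorter than any real gene.
toy_sequence = "CCATGAAATTTTAGGG"
for strand, start, end, orf in find_orfs(toy_sequence):
    print(strand, start, end, orf)
```

An ORF that runs off the end of the sequence without a stop codon is not reported here, which is one reason incomplete assemblies lead to missed or truncated gene calls.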
Classification
• The phage sequence space (Lima-Mendez et al.)
• The phage proteomic tree (Edwards & Rohwer)
• New: ViPTree http://www.genome.jp/viptree
This workshop: outline
1. Annotation overview
2. Automated tools for genome annotation:
– PhAnToMe/RAST related tools
– Galaxy/Apollo
3. Tools for metagenome-based analyses
– Assembly
– Functional prediction via protein families
Where to go from here?
• Part I:
General introduction of genome annotation
• Part II:
Two levels
– Level 1: Novices and beginners:
Automated annotation tools
– Level 2: Intermediate to advanced users:
Command-line based tools
Online resources/Slideshare
• Data & links:
– http://egybio.net/tutorial
• Slides
– http://bit.ly/annotation2016
– http://bit.ly/phantome4
– Old tutorials (more detailed, but missing the latest):
• Evergreen 2011: http://slidesha.re/phantome1
• http://slidesha.re/phiRAST1 (by Karin Holmfeldt)
• Evergreen 2013: http://bit.ly/phantome2
• Evergreen 2015: http://bit.ly/phantome3