Fungal ITS meeting presentation
DESCRIPTION
My talk from the Fungal ITS meeting in Boulder, Colorado (sponsored by the Sloan Foundation). It discusses metagenomic tools for fungal studies and how we can increase support for fungal researchers in the computational pipelines being developed at UC Davis.
TRANSCRIPT
Metagenomic tools for the fungal community
Holly Bik, UC Davis, 19 October 2012
http://phylosift.wordpress.com
Explicitly Phylogenetic Approaches
Aligned environmental sequences
Guide Tree
Evolutionary Placement of short reads
We provide:
• Support for paired-end (raw) Illumina data
• Marker gene data for Bacteria, Archaea, Eukaryotes, Viruses
• Taxonomy assignments based on probability distributions over a reference phylogeny
• Complement to existing tools
– QIIME/VAMPS
– Inputs/outputs will be compatible for use with other software tools
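To make the "probability distributions over a reference phylogeny" point concrete, here is a minimal sketch of how placement probabilities for one read could be rolled up into per-taxon probabilities. It is an illustration only, not PhyloSift's implementation: the function name, the edge IDs, and the taxon labels are all hypothetical.

```python
from collections import defaultdict

def taxon_probabilities(placements, edge_to_taxon):
    """Sum placement probability mass per taxon for a single read.

    placements: list of (edge_id, probability) pairs for one read.
    edge_to_taxon: dict mapping reference-tree edge IDs to taxon names.
    """
    totals = defaultdict(float)
    for edge_id, prob in placements:
        totals[edge_to_taxon[edge_id]] += prob
    mass = sum(totals.values())
    # Normalise so the result sums to 1 even if low-probability
    # placements were dropped upstream.
    return {taxon: p / mass for taxon, p in totals.items()}

# Hypothetical example: one read placed on three edges of a reference tree.
edge_to_taxon = {1: "Ascomycota", 2: "Ascomycota", 3: "Basidiomycota"}
read_placements = [(1, 0.5), (2, 0.25), (3, 0.25)]
result = taxon_probabilities(read_placements, edge_to_taxon)
```

In this toy case the read is assigned to Ascomycota with probability 0.75 and to Basidiomycota with probability 0.25, rather than being forced into a single best hit.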
Markers
• PMPROK – Dongying Wu's Bac/Arch markers
• Eukaryotic Orthologs – Parfrey 2011 paper
• 16S/18S rRNA
• Mitochondria - protein-coding genes
• Viral Markers – Markov clustering on genomes
• Codon Subtrees – finer scale taxonomy
• Extended Markers – plastids, gene families
Reference Marker Genes
The Monkey – Build Marker Packages
Inputs
• Alignment File (marker sequences in FASTA format)
• Mapping File (sequence name, NCBI taxon ID)
Execute build_marker mode
• Generate unique IDs for input sequences
• Create profile HMMs (or CMs for rRNA data) using input sequences: hmmbuild (ssu-build)
• Build tree and collapse topology according to a user-specified PD cutoff (e.g. 99%): FastTree
Tree Reconciliation
• Reconcile NCBI taxonomy IDs with phylogenetic topology
• Quantitative metric (minimum Hamming distance) used to match edges between NCBI taxon tree and molecular phylogeny
Built Marker Packages
• Clean and package new marker genes
• New marker gene packages placed into shared PhyloSift marker directory
• Built PhyloSift marker package: Tree, HMM profile (CMs for rRNA), Taxon map, Representative sequences, Alignment
• PD cutoff
Index Marker Database
• Execute index mode
• Indexes the marker databases needed for LAST and Bowtie
• Locally indexed marker packages will not interfere with automatic updates to PhyloSift core markers
NOTE: New marker packages are named according to input filenames (e.g. MarkerAlignment.fasta). Core marker data will be overwritten during new marker builds if input files do not have unique names compared to existing PhyloSift markers.
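The tree reconciliation step above matches edges by minimum Hamming distance. A minimal sketch of that idea follows, assuming each edge is represented by the set of leaves beneath it, encoded as a 0/1 vector over a fixed leaf order. The function names, edge labels, and toy data are hypothetical and are not taken from the PhyloSift code.

```python
def hamming(a, b):
    """Number of positions at which two equal-length 0/1 tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def match_edges(taxonomy_edges, phylogeny_edges):
    """For each taxonomy edge, find the phylogeny edge with the closest leaf set.

    Both arguments map an edge label to a 0/1 tuple that marks which
    leaves sit below that edge. Returns label -> (best edge, distance).
    """
    matches = {}
    for tax_name, tax_vec in taxonomy_edges.items():
        best = min(phylogeny_edges,
                   key=lambda e: hamming(tax_vec, phylogeny_edges[e]))
        matches[tax_name] = (best, hamming(tax_vec, phylogeny_edges[best]))
    return matches

# Leaves in fixed order: A, B, C, D, E (toy data).
# The taxonomy puts C with A and B; the molecular tree puts C with D and E.
taxonomy_edges = {"Fungi": (1, 1, 1, 0, 0), "Metazoa": (0, 0, 0, 1, 1)}
phylogeny_edges = {
    "e1": (1, 1, 0, 0, 0),
    "e2": (0, 0, 1, 1, 1),
    "e3": (0, 0, 0, 1, 1),
}
matches = match_edges(taxonomy_edges, phylogeny_edges)
```

Here "Metazoa" maps cleanly onto edge e3 (distance 0), while "Fungi" maps onto e1 with one disagreeing leaf, which is the kind of imperfect match a quantitative metric is needed for.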
The Kangaroo – Simulation Data
Select Taxa (PD on concatenated tree)
• Determines PD contributions for taxa present in concatenated guide tree in PhyloSift marker directory
• Two separate approaches used:
1. Select some number of taxa that contribute to PD (user input, default = 10 taxa)
2. Sample taxa uniformly without replacement
Generated Simulated Reads
• Execute sim mode
• Genome Directory: define the number of genomes to pick (default = 10) and number of reads to generate per file (default = 100,000)
• Grinder algorithm randomly generates reads from selected genomes, outputs simulated PE-Illumina and 454 datasets
Knockout Swaths of Taxa
• Workflow plugs into updateDB to remove genomes which have been used to simulate metagenome data, as well as a swath of related taxa.
Simulation Marker Directory
• A new marker directory is created, where simulated genomes have been knocked out from marker packages.
Compute metrics between target and remaining taxa
• Calculated metrics include: the distance to nearest neighbors, connecting branch lengths, and the number of sampled nodes within various PD units of connecting nodes.
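The two taxon-selection approaches above can be sketched on a toy tree. This is an illustration under stated assumptions, not the Kangaroo code: PD is taken here as the total branch length connecting a taxon set to the root, the PD-based pick is done greedily, and the tree, branch lengths, and function names are all hypothetical.

```python
import random

# Toy rooted tree: node -> (parent, branch length). Hypothetical values.
TREE = {
    "A": ("n1", 0.1), "B": ("n1", 0.2),
    "C": ("n2", 0.5), "D": ("n2", 0.05),
    "n1": ("root", 0.3), "n2": ("root", 0.4),
}
LEAVES = ["A", "B", "C", "D"]

def path_to_root(node):
    """Return the edges (named by their child node) from node up to the root."""
    edges = set()
    while node in TREE:
        edges.add(node)
        node = TREE[node][0]
    return edges

def phylogenetic_diversity(taxa):
    """Total length of the branches connecting the taxa to the root."""
    edges = set()
    for taxon in taxa:
        edges |= path_to_root(taxon)
    return sum(TREE[e][1] for e in edges)

def select_by_pd(n):
    """Approach 1: greedily add the taxon that increases PD the most."""
    chosen = []
    for _ in range(min(n, len(LEAVES))):
        remaining = [t for t in LEAVES if t not in chosen]
        best = max(remaining,
                   key=lambda t: phylogenetic_diversity(chosen + [t]))
        chosen.append(best)
    return chosen

def select_uniform(n, seed=None):
    """Approach 2: sample taxa uniformly without replacement."""
    return random.Random(seed).sample(LEAVES, min(n, len(LEAVES)))

pd_pick = select_by_pd(2)
uniform_pick = select_uniform(2, seed=1)
```

On this tree the PD-based pick takes C first (longest path to the root) and then B, which adds more new branch length than A or D. The uniform pick ignores branch lengths entirely, so comparing the two gives simulated datasets with different phylogenetic spread.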
DBupdate – Mining new genomes
Genome sources: EBI Genomes, JGI Genomes, NCBI Genomes, Private Genomes
Execute phylosift_dbupdate.pl
Run PhyloSift (search + align)
• Add new sequences to marker packages
Update reference sequences with new data
• New sequences added at 0.25 PD for amino acid tree; higher PD threshold enables more aggressive searches of reference database, since LAST searching is faster with fewer sequences.
Infer Updated Tree (Amino Acid Tree, Nucleotide Tree)
Prune Tree
• A taxa set is selected with a maxPD cutoff of 0.02 and a new tree is inferred
Codon Subtrees
• PD metric used to split guide tree into smaller subtrees; subsets of taxa are selected such that no branch connecting them has length >0.X for some value of X
Tree Reconciliation
• Reconcile NCBI taxonomy IDs with phylogenetic topologies, for both amino acid tree and codon subtrees
Package Markers
Automated Download to PhyloSift Users
• Users' local marker databases are automatically scanned each time PhyloSift is run and any new updates are automatically downloaded if available
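The slides do not spell out how the PD cutoff is applied when selecting a taxa set. One plausible reading is distance-based thinning: keep a genome only if it is further than the cutoff from every genome already kept. The sketch below assumes precomputed pairwise tree distances; the function name, strain names, and distances are hypothetical, and the actual phylosift_dbupdate.pl logic may differ.

```python
def thin_taxa(taxa, distance, cutoff):
    """Keep a taxon only if it is more than `cutoff` from every kept taxon.

    distance maps frozenset({a, b}) to the tree distance between a and b.
    Taxa are considered in the order given, so earlier taxa win ties.
    """
    kept = []
    for taxon in taxa:
        if all(distance[frozenset((taxon, k))] > cutoff for k in kept):
            kept.append(taxon)
    return kept

# Toy data: two pairs of near-identical strains.
taxa = ["strain_1", "strain_2", "strain_3", "strain_4"]
distance = {
    frozenset(("strain_1", "strain_2")): 0.01,
    frozenset(("strain_1", "strain_3")): 0.30,
    frozenset(("strain_1", "strain_4")): 0.31,
    frozenset(("strain_2", "strain_3")): 0.30,
    frozenset(("strain_2", "strain_4")): 0.31,
    frozenset(("strain_3", "strain_4")): 0.015,
}
kept = thin_taxa(taxa, distance, cutoff=0.02)
```

With a cutoff of 0.02, one representative survives from each near-identical pair (strain_1 and strain_3). A larger cutoff such as 0.25 thins the reference set more aggressively, which matches the trade-off noted above between reference size and LAST search speed.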
Tree Reconciliation in PhyloSift
Environmental Sequences
Named Taxa
Great!
Not Bad
Getting Tricky…
Tree Placement Fat Tree - Guppy
Marine Metagenome
Chemoautotrophic bacteria – oxidize ammonia into nitrite
Alveolate Protists
Common seawater Archaea
Tree Placement Tog Tree - Guppy
Marine Metagenome
Marine Metagenome
Tree Placement Sing Tree - Guppy
Linking with the Fungal ITS community
• How does fungal ITS sequence data relate to your project?
– PhyloSift has the capability to add any marker gene reference packages that are relevant for specific taxonomic communities
• What fungal ITS data does your project currently provide?
– None – but we do mine other marker genes from fungal genomes
• What fungal ITS data is your project hoping to provide?
– We wouldn't provide data, but can work with users to increase support for fungal analyses
• Is your project involved with curating fungal ITS sequences?
– No, but we would curate alignments and marker packages of ITS sequences mined from public databases
• If so, what curation strategies are being implemented for your project?
– Alignment filtering and masking, pruning reference trees
• What tools for working with fungal ITS sequences does your project currently provide?
– None so far – but can be implemented if given a reference dataset (e.g. alignment)
• What tools are you developing / planning to develop?
– Current focus is on multisample comparisons
– Gene tree reconciliation
– Probability distribution over tree topology to delimit OTUs (Phylogenetic OTUs)
• What framework of fungal taxonomy does your project use?
– NCBI-derived taxonomy (because of tree mapping/reconciliation issues)
SATELLITE MEETING
Eukaryotic Metagenomics
March/April 2013 UC Davis
Acknowledgements
UC Davis
• Jonathan Eisen
• Aaron Darling
• Guillaume Jospin
• Dongying Wu
• David Coil
PhyloSift Software Development on GitHub: https://github.com/gjospin/PhyloSift
Google Group for user support: https://groups.google.com/d/forum/phylosift
Twitter: @PhyloSift