High Throughput Computational Sequence Analysis
Rob Edwards (redwards@salmonella.org)
Argonne National Laboratory, San Diego State University
Post on 19-Dec-2015
How much has been sequenced
[Figure: number of known sequences by year, with labels for the first bacterial genome, 100 bacterial genomes, 1,000 bacterial genomes, and environmental sequencing]
How much will be sequenced
[Figure labels: 100 people; everybody in San Diego; everybody in USA; all cultured bacteria; one genome from every species; most major microbial environments]
High Performance Computing
TeraGrid
The TeraGrid National Resource
Life Sciences Gateway to TeraGrid
Subsystems
Subsystems make up metabolism
Wikipedia Metabolism http://en.wikipedia.org/wiki/Portal:Metabolism
Subsystems are not just metabolism
http://aig.cs.man.ac.uk/gallery/Utopia/
Enzyme complex
http://webdeptos.uma.es/
Cell Machinery
http://www.brown.edu/
Cell Processes
http://www.theseed.org
Growth in generation of subsystems
Microbial Genomics Annotation Platform
• Goal 1: Automate the generation of high quality annotations by leveraging the information contained in SubSystems and FIGfams.
• Goal 2: Minimize turnaround time. Initial target: 48 hours.
Freely available annotation service
http://www.nmpdr.org/anno-server/index48.cgi
• Automated process consisting of:
  – Gene calling
  – Initial annotation of function
  – Initial metabolic reconstruction
• Process takes 1-7 hours depending on size and complexity of the genome
• ~20 genomes per day
• Password protected, secure, private
• Release to public databases if required
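As a rough capacity check on the figures above (1-7 hours per genome, about 20 genomes per day), the sketch below estimates how many annotation jobs would need to run concurrently to sustain that rate. The function name and the model (independent jobs, no queueing or scheduling overhead) are assumptions for illustration, not details of the actual service.

```python
import math

def slots_needed(genomes_per_day: float, hours_per_genome: float) -> int:
    """Concurrent job slots needed to sustain a daily rate (hypothetical model)."""
    # Total compute hours demanded per day, spread over a 24-hour day.
    busy_hours = genomes_per_day * hours_per_genome
    return math.ceil(busy_hours / 24)

# Run times spanning the 1-7 hour range quoted on the slide.
for hours in (1, 4, 7):
    print(hours, slots_needed(20, hours))
```

Under these assumptions a single slot is enough if every genome takes 1 hour, while 7-hour genomes would need six slots running in parallel.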
Some estimate of annotation quality
[Chart: y-axis from 0 to 50. Series: % in SS SEED; % in SS SP1Ke; % hypothetical SP1Ke; % hypothetical SEED. Genomes shown:
Bacillus anthracis str. Sterne (260799)
Mycobacterium tuberculosis CDC1551 (83331)
Listeria monocytogenes EGD-e (169963)
Streptococcus pyogenes M1 GAS (160490)
Staphylococcus aureus subsp. aureus MW2 (196620)]
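The two measures in the chart, percent of genes in subsystems and percent annotated as hypothetical, can be computed from a list of gene annotations. The sketch below is a minimal illustration only: the `Gene` record, its fields, and the rule that any function text containing the word "hypothetical" counts as hypothetical are assumptions, not the SEED's actual data model.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    function: str        # assigned function text
    in_subsystem: bool   # True if the gene is placed in any subsystem

def quality_summary(genes):
    """Return (% in subsystems, % hypothetical) for a list of Gene records."""
    if not genes:
        return 0.0, 0.0
    n = len(genes)
    in_ss = sum(g.in_subsystem for g in genes)
    hypo = sum("hypothetical" in g.function.lower() for g in genes)
    return 100.0 * in_ss / n, 100.0 * hypo / n

# Made-up example records, not real annotation data.
genes = [
    Gene("DNA polymerase III alpha subunit", True),
    Gene("hypothetical protein", False),
    Gene("Alpha-amylase", True),
    Gene("Conserved hypothetical protein", False),
]
print(quality_summary(genes))
```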
Evaluation / Viewing
Download results
• We provide a number of export formats:
  – GenBank, FASTA, GFF3, Excel
  – can easily be extended to all formats supported by BioPerl
• Genomes can be deleted by the user at any time (we keep them for a maximum of 120 days)
• Genomes can be directly imported into the SEED if the user wishes
• All genomes are password protected
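To illustrate one of the export formats listed above, the sketch below writes gene calls as GFF3, a tab-separated format with nine columns. The `Feature` record, the `source` label, and the sample values are hypothetical; this is not the service's own export code.

```python
from dataclasses import dataclass

@dataclass
class Feature:
    seqid: str     # contig or chromosome identifier
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    strand: str    # "+" or "-"
    gene_id: str
    product: str   # assigned function

def to_gff3(features, source="example"):
    """Format features as GFF3 text (one CDS row per feature, phase 0)."""
    lines = ["##gff-version 3"]
    for f in features:
        # Reserved characters in attribute values are percent-encoded;
        # "%" is replaced first so later escapes are not double-encoded.
        product = (f.product.replace("%", "%25").replace(";", "%3B")
                   .replace("=", "%3D").replace(",", "%2C").replace("\t", "%09"))
        attrs = f"ID={f.gene_id};product={product}"
        cols = [f.seqid, source, "CDS", str(f.start), str(f.end),
                ".", f.strand, "0", attrs]
        lines.append("\t".join(cols))
    return "\n".join(lines) + "\n"

# Made-up feature, for illustration only.
print(to_gff3([Feature("contig1", 10, 99, "+", "gene1", "Alpha-amylase; putative")]), end="")
```

A plain-text writer like this is enough for a sketch; a real exporter would more likely rely on an established library (the slide names BioPerl) to handle the many details of each format.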
Metagenomics SEED
http://metagenomics.theseed.org
Metagenome Metabolic Reconstruction
Starch utilization in cow rumens
Metabolic potential in environments
Too much will be sequenced
[Figure labels: 100 people; everybody in San Diego; everybody in USA; all cultured bacteria; one genome from every species; most major microbial environments]
Acknowledgements
Argonne National Laboratory: Rick Stevens, Bob Olson, Folker Meyer
San Diego State University: Forest Rohwer
Fellowship for Interpretation of Genomes: Ross Overbeek, Veronika Vonstein, The Annotators