Amazon resources for bioinformatics
Brad Chapman
Bioinformatics Interest Group, 18 Oct 2012
Goals
Automate: reduce steps, remove activation energy, increase abstraction
Improve: sharing, reproducibility, teaching
Installation
Easier installation
No installation
Challenge
Biology computing platform
Widely accessible
Customizable
Community driven
Not only Amazon
http://gigaom.com/cloud/what-google-compute-engine-means-for-cloud-computing/
CloudBioLinux
Amazon image with bioinformatics software and libraries
Automated build framework
Community effort to maintain and extend
http://cloudbiolinux.org
CloudMan
SGE cluster plus automation
Web interface and monitoring
Persistence and sharing
Powers the Galaxy Cloud offering
http://usecloudman.org/
BioCloudCentral
Automate setup of Amazon instance
Launch CloudBioLinux and CloudMan
Provide easy ssh access, no key pairs
http://biocloudcentral.org
Acknowledgments
CloudBioLinux: Ntino Krampis, Tim Booth, Dawn Field, Pjotr Prins, John Chilton and CloudBioLinux community.
CloudMan: Enis Afgan, James Taylor
BioCloudCentral: Enis Afgan, John Chilton, Dannon Baker
Documentation
http://cda.currentprotocols.com/WileyCDA/CPUnit/refId-bi1109.html
What we'll do
1 Sign up for Amazon
2 Start a CloudBioLinux/CloudMan instance
3 Add nodes to create a compute cluster
4 Run variant calling pipeline
Everything done through the web
Getting started
Sign up for Amazon Web Services: http://aws.amazon.com
Get security credentials (Access Key and Secret Key): http://portal.aws.amazon.com/gp/aws/securityCredentials
Launch: http://biocloudcentral.org
Ready two minutes later
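The Access Key and Secret Key from the steps above are pasted into the BioCloudCentral launch form. If you also keep them on your own machine, a minimal Python sketch for reading them might look like the following. It assumes the keys are stored in the conventional `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables; the function name `load_aws_credentials` is hypothetical and is not part of BioCloudCentral or CloudMan.

```python
import os


def load_aws_credentials(env=None):
    """Return (access_key, secret_key) from an environment-style mapping.

    Hypothetical helper for illustration only. Assumes the keys live in the
    conventional AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY variables.
    """
    env = os.environ if env is None else env
    access_key = env.get("AWS_ACCESS_KEY_ID", "").strip()
    secret_key = env.get("AWS_SECRET_ACCESS_KEY", "").strip()

    # Report every missing variable at once rather than failing on the first.
    missing = [
        name
        for name, value in (
            ("AWS_ACCESS_KEY_ID", access_key),
            ("AWS_SECRET_ACCESS_KEY", secret_key),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return access_key, secret_key
```

Reading the keys from the environment keeps them out of scripts and shared notebooks, which matters because anyone holding both keys can start instances on your account.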
Login to CloudMan
Shared CloudMan images
Package a complete analysis environment: data and customizations
Sharable with other users
Share string with NGS analysis platform:
cm-b53c6f1223f966914df347687f6fc818/shared/2012-07-23--19-23/
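To make the share string less opaque, here is a small Python sketch that splits it into its parts. It assumes, judging only from the example above, that a share string has the form `<bucket>/shared/<YYYY-MM-DD--HH-MM>/`, where the first part names the storage bucket and the last part is a timestamp. The names `SharedCluster` and `parse_share_string` are hypothetical and not part of CloudMan.

```python
from datetime import datetime
from typing import NamedTuple


class SharedCluster(NamedTuple):
    """Parts of a share string (hypothetical structure for illustration)."""
    bucket: str
    created: datetime


def parse_share_string(share):
    """Split '<bucket>/shared/<YYYY-MM-DD--HH-MM>/' into its components.

    The layout is an assumption inferred from the single example string.
    """
    parts = share.strip().strip("/").split("/")
    if len(parts) != 3 or parts[1] != "shared":
        raise ValueError("Unexpected share string layout: %r" % share)
    bucket, _, stamp = parts
    # Interpreting the last component as a date and time is an assumption.
    created = datetime.strptime(stamp, "%Y-%m-%d--%H-%M")
    return SharedCluster(bucket=bucket, created=created)


cluster = parse_share_string(
    "cm-b53c6f1223f966914df347687f6fc818/shared/2012-07-23--19-23/"
)
print(cluster.bucket)
print(cluster.created.isoformat())
```

A check like this can catch a truncated copy and paste before the string goes into the launch form.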
Start CloudMan
CloudMan console
CloudMan admin page
CloudMan: managing a cluster
Associated Galaxy instance
Analysis data on shared instance
Graphical variant-calling pipeline
Analysis data linked to pipeline
Configure pipeline
Run pipeline
Shut everything down
What happened
1 Sign up for Amazon
2 Start a CloudBioLinux/CloudMan instance
3 Add nodes to create a compute cluster
4 Run variant calling pipeline
Everything done through the web
ssh to the machine
$ ssh [email protected]
[email protected]'s password:
Welcome to Ubuntu 12.04 LTS
(GNU/Linux 3.2.0-23-virtual x86_64)
ubuntu@ip-10-72-197-11:~$
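If you want to script the login shown above, a minimal Python sketch for building the ssh command could look like this. The `ubuntu` user name is taken from the shell prompt in the session; the host name `cloudman.example.org` is a placeholder and the function `ssh_command` is hypothetical. Substitute the public address of your own instance from the CloudMan console.

```python
def ssh_command(host, user="ubuntu"):
    """Build the argument list for an ssh login to a running instance.

    Hypothetical helper: 'host' is the public address of your instance,
    and 'ubuntu' is the login user seen in the session prompt.
    """
    if not host or any(ch.isspace() for ch in host):
        raise ValueError("host must be a non-empty name without whitespace")
    return ["ssh", "%s@%s" % (user, host)]


# Placeholder host name, not a real instance.
print(" ".join(ssh_command("cloudman.example.org")))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.call` without shell quoting problems.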
NX graphical client: login
http://www.nomachine.com/download.php
NX graphical client: desktop
Summary
Use cloud resources to build:
Machines with standard software
Cluster management
Analysis pipelines
Reproducible, sharable instances
Web-based interfaces