Why the Cloud Is a Computational Biologist's Best Friend
Description: Pros and cons of cloud computing for biocomputation
Yannick Pouliot 2/23/2012
Amazon Cloud: A Religious Experience
Amazon Cloud services in a nutshell:
Highly flexible storage and compute power, sold on a pay-per-use basis
Why the Cloud?
• Complete flexibility of computing power and storage
• Grow or diminish as needed
• Arbitrary number of machines
• Ridiculously powerful machines made affordable on a short-lease basis to address a particular task (e.g., 15B ANOVAs)
• Unusual architectures (e.g., GPUs)
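
To make the "arbitrary number of machines" point concrete, here is a rough back-of-the-envelope sketch of how many machines a job like the 15B ANOVAs might need to finish by a deadline. The per-core throughput, core count, and deadline are made-up assumptions for illustration, not measurements from the talk, and `machines_needed` is a hypothetical helper.

```python
import math

def machines_needed(n_tasks, tasks_per_core_per_sec, cores_per_machine, deadline_hours):
    """Smallest number of identical machines that finishes n_tasks within the deadline."""
    tasks_per_machine_per_hour = tasks_per_core_per_sec * cores_per_machine * 3600
    return math.ceil(n_tasks / (tasks_per_machine_per_hour * deadline_hours))

# Hypothetical figures (assumptions, not benchmarks):
N_ANOVAS = 15_000_000_000   # the 15B ANOVAs mentioned on the slide
RATE = 2_000                # assumed ANOVAs per second per core
CORES = 8                   # assumed cores per machine

# Wall-clock time on a single machine versus fleet size for a one-day deadline.
one_machine_hours = N_ANOVAS / (RATE * CORES * 3600)
fleet = machines_needed(N_ANOVAS, RATE, CORES, deadline_hours=24)

print(f"One machine: about {one_machine_hours:.0f} hours")
print(f"Machines needed to finish within 24 hours: {fleet}")
```

Under these assumed numbers a single machine would need well over a week, while a small short-lived fleet finishes in a day, which is the kind of trade-off the slide is pointing at.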
There Are Many Cloud Providers…
… but Amazon is the clear leader, IMO
Q: What does working with a Cloud machine feel like?
A: It’s not materially different from accessing a machine on our cluster, except that you can do anything you want
Main Services Provided by Amazon Cloud
• Storage
▫ Traditional disk volumes
▫ S3 buckets (“Simple Storage Service”)
• Computing (EC2 – “Elastic Compute Cloud”)
▫ Single machine instances
▫ Clusters of various types
• Machine types
▫ Compute servers
▫ Database servers
▫ Cluster
▫ Specialized architectures
▫ Variety of operating systems (Linux flavors, Windows)
Types of Instances
• Based on the definition of the virtual machine
▫ I/O bus
▫ Number of CPUs
▫ Memory
▫ Type of CPU, cluster
• Deployment: Spot market vs. “Reserved”
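
The spot market idea can be sketched with a toy simulation. This assumes a deliberately simplified model: the instance runs only in hours where the spot price is at or below your maximum bid, and you are billed the spot price for those hours. The price series, the bid, and the fixed hourly rate used for comparison are all hypothetical, and `simulate_spot` is an illustrative helper, not part of any Amazon tool.

```python
def simulate_spot(hourly_spot_prices, max_bid):
    """Return (hours_run, total_cost) under a simplified spot-market model.

    The instance runs in any hour whose spot price is <= max_bid and is
    charged that hour's spot price; otherwise it sits idle for that hour.
    """
    hours_run = 0
    total_cost = 0.0
    for price in hourly_spot_prices:
        if price <= max_bid:
            hours_run += 1
            total_cost += price
    return hours_run, total_cost

# Hypothetical hourly spot prices in dollars (not real market data).
prices = [0.10, 0.12, 0.30, 0.11, 0.09, 0.45]

hours, spot_cost = simulate_spot(prices, max_bid=0.15)

# Hypothetical fixed hourly rate, used only as a point of comparison.
fixed_rate = 0.20
fixed_cost = hours * fixed_rate

print(f"Ran {hours} of {len(prices)} hours on the spot market for ${spot_cost:.2f}")
print(f"The same hours at a fixed ${fixed_rate:.2f}/hour would cost ${fixed_cost:.2f}")
```

The trade-off the sketch shows is that a low bid buys cheap hours but not guaranteed ones: in this made-up series the job is interrupted twice.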
Costs
• You pay for (almost) everything you do
▫ Data transfers (out)
▫ Storage
▫ CPU cycles (depends on instance type; one instance is free)
• Can purchase cycles at below-average market price
▫ Can provide access to vast amounts of computing power at a price you can afford
• Research grants from Amazon
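
The three billed items listed above can be combined into a rough monthly estimate. The sketch below is only an illustration: every rate in it is a hypothetical placeholder rather than an actual Amazon price, and `Rates` and `monthly_cost` are names invented for this example. Uploads are treated as free, as the deck states for data transfers.

```python
from dataclasses import dataclass

@dataclass
class Rates:
    """Hypothetical unit prices in dollars (placeholders, not Amazon's rates)."""
    compute_per_hour: float
    storage_per_gb_month: float
    transfer_out_per_gb: float

def monthly_cost(rates, instance_hours, stored_gb, gb_out, gb_in=0.0):
    """Itemized monthly estimate; gb_in is accepted but not billed."""
    compute = rates.compute_per_hour * instance_hours
    storage = rates.storage_per_gb_month * stored_gb
    transfer_out = rates.transfer_out_per_gb * gb_out
    return {
        "compute": compute,
        "storage": storage,
        "transfer_out": transfer_out,
        "total": compute + storage + transfer_out,
    }

# Assumed workload: 200 instance-hours, 500 GB stored, 1 TB uploaded, 50 GB downloaded.
rates = Rates(compute_per_hour=0.50, storage_per_gb_month=0.10, transfer_out_per_gb=0.12)
bill = monthly_cost(rates, instance_hours=200, stored_gb=500, gb_out=50, gb_in=1000)

for item, dollars in bill.items():
    print(f"{item:>12}: ${dollars:,.2f}")
```

Swapping in current published rates for the placeholders gives a first-order budget before launching anything.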
Controlling Your Services
• Web-based console
• Command-line tools
▫ EC2 API tools
• Third-party systems: RightScale
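
Besides the console and command-line tools, services can also be driven from a script. The sketch below assumes the boto3 Python SDK, which is not mentioned in the slides, and uses a placeholder image ID and an arbitrarily chosen instance type. `build_launch_request` and `launch` are hypothetical helper names; only `boto3.client("ec2")` and its `run_instances` call belong to the SDK. Calling `launch` requires boto3 to be installed and AWS credentials to be configured, and it will start billable instances.

```python
def build_launch_request(image_id, instance_type, count=1):
    """Build the keyword arguments for an EC2 RunInstances request."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
    }

def launch(request, region="us-east-1"):
    """Send the request to EC2 and return the new instance IDs."""
    import boto3  # imported lazily so the rest of the sketch runs without the SDK

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**request)
    return [instance["InstanceId"] for instance in response["Instances"]]

# "ami-00000000" is a placeholder, not a real image ID.
request = build_launch_request("ami-00000000", "m1.large", count=2)
print(request)
# instance_ids = launch(request)  # uncomment only with credentials in place
```

Keeping the request-building step separate from the network call makes it easy to inspect what would be launched before paying for it.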
Using & Distributing Instances
• You can always make images of your instances for later use/backup
• Images can be made public
• You can launch other people’s images (i.e., public images), e.g.,
▫ CloudBioLinux: pre-made biocomputational instances
▫ Galaxy Cloud: pre-made cluster-based Galaxy instance (Web-based, no less)
▫ PathSeq: pre-made comprehensive bowtie engine that uses Hadoop
Issues
• Security
▫ Lots of it
• Data transfers
▫ Free for upload; $ for download
▫ No big deal, so far
▫ Can send drives…
• Latency
▫ No big deal
• Small “ephemeral” storage
▫ Gotcha if you don’t know
• Max 1 terabyte per disk
▫ Hum…
• “Max” 20 disks per instance
▫ Can be circumvented
• No sharing of disks between instances, usually
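
The two disk limits quoted above (1 terabyte per disk, nominally 20 disks per instance) imply a ceiling of about 20 TB of attached disk per instance unless the limit is circumvented. A small planning sketch, with the limits taken from the slide and the dataset sizes chosen arbitrarily; `plan_volumes` is a hypothetical helper:

```python
import math

MAX_TB_PER_DISK = 1          # per-disk limit quoted on the slide
MAX_DISKS_PER_INSTANCE = 20  # nominal per-instance limit quoted on the slide

def plan_volumes(dataset_tb):
    """Return (disks_needed, fits_on_one_instance) for a dataset of the given size."""
    disks_needed = math.ceil(dataset_tb / MAX_TB_PER_DISK)
    fits_on_one_instance = disks_needed <= MAX_DISKS_PER_INSTANCE
    return disks_needed, fits_on_one_instance

# Arbitrary example sizes in terabytes.
for size_tb in (3.5, 20, 25):
    disks, fits = plan_volumes(size_tb)
    print(f"{size_tb} TB -> {disks} disks, fits on one instance: {fits}")
```

Since disks usually cannot be shared between instances, a dataset that fails this check has to be split across instances or moved to other storage such as S3.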
Support
• Unless you purchase support, you’re on your own
• Hasn’t been an issue for me, though finding a solution can consume time…
Support options:
Questions?