Why The Cloud Is A Computational Biologist's Best Friend

Yannick Pouliot 2/23/2012 Amazon Cloud: A Religious Experience

Upload: yannick-pouliot

Post on 17-Dec-2014

Category:

Health & Medicine


DESCRIPTION

Pros and Cons of cloud computing for biocomputation

TRANSCRIPT

Page 1: Why The Cloud Is A Computational Biologist's Best Friend

Yannick Pouliot 2/23/2012

Amazon Cloud: A Religious Experience

Page 2: Why The Cloud Is A Computational Biologist's Best Friend

Amazon Cloud services in a nutshell:

Highly flexible storage and compute power sold on a use basis

Page 3: Why The Cloud Is A Computational Biologist's Best Friend

Why the Cloud?

• Complete flexibility of computing power and storage
• Grow or diminish as needed
• Arbitrary number of machines
• Ridiculously powerful machines made affordable on a short-lease basis to address a particular task (e.g., 15B ANOVAs)
• Unusual architectures (e.g., GPUs)
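
The "arbitrary number of machines" point can be made concrete with a back-of-the-envelope sketch. Only the 15B ANOVA count comes from the slide; the per-machine throughput is a made-up figure, and per-started-hour billing is an assumption about the pricing model rather than something stated in the talk.

```python
import math

def wall_clock_hours(total_tasks, tasks_per_machine_hour, n_machines):
    """Hours until the job finishes if the work splits evenly across machines."""
    return total_tasks / (tasks_per_machine_hour * n_machines)

def billed_machine_hours(total_tasks, tasks_per_machine_hour, n_machines):
    """Total machine-hours paid for, assuming each machine is billed per started hour."""
    hours_each = math.ceil(
        wall_clock_hours(total_tasks, tasks_per_machine_hour, n_machines)
    )
    return hours_each * n_machines

TOTAL_ANOVAS = 15_000_000_000          # from the slide
ANOVAS_PER_MACHINE_HOUR = 50_000_000   # hypothetical throughput, for illustration only

for n in (1, 10, 100):
    hours = wall_clock_hours(TOTAL_ANOVAS, ANOVAS_PER_MACHINE_HOUR, n)
    billed = billed_machine_hours(TOTAL_ANOVAS, ANOVAS_PER_MACHINE_HOUR, n)
    print(f"{n:>3} machines: {hours:6.1f} h wall clock, {billed} machine-hours billed")
```

Under these assumptions, the billed machine-hours stay the same while the wall-clock time shrinks with the number of machines, which is the appeal of renting many machines briefly.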

Page 4: Why The Cloud Is A Computational Biologist's Best Friend

There Are Many Cloud Providers…

… but Amazon is the clear leader, IMO

Page 5: Why The Cloud Is A Computational Biologist's Best Friend

Q: What does working with a Cloud machine feel like?

A: It’s not materially different from accessing a machine on our cluster, except you can do anything you want

Page 6: Why The Cloud Is A Computational Biologist's Best Friend

Main Services Provided by Amazon Cloud

• Storage
▫ Traditional disk volumes
▫ S3 buckets (“Simple Storage Service”)
• Computing (EC2 – “Elastic Compute Cloud”)
▫ Single machine instances
▫ Clusters of various types
• Machine types
▫ Compute servers
▫ Database servers
▫ Cluster
▫ Specialized architectures
▫ Variety of operating systems (Linux flavors, Windows)

Page 7: Why The Cloud Is A Computational Biologist's Best Friend

Types of Instances

• Based on the definition of the virtual machine
▫ I/O bus
▫ Number of CPUs
▫ Memory
▫ Type of CPU, cluster
• Deployment: Spot market vs. “Reserved”

Page 8: Why The Cloud Is A Computational Biologist's Best Friend

Costs

• You pay for (almost) everything you do
▫ Data transfers (out)
▫ Storage
▫ CPU cycles (depends on instance type; one instance is free)
• Can purchase cycles at below average market price
▫ Can provide access to vast amounts of computing power at a price you can afford
• Research grants from Amazon
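
The three billed items above (outbound transfer, storage, CPU time) can be combined into a rough estimate. This is only a sketch: every price below is a hypothetical placeholder, not an Amazon rate, and the `Rates` class and `estimate_cost` function are names invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Rates:
    """Hypothetical prices in USD; real prices depend on region, instance type, and date."""
    per_instance_hour: float
    per_gb_month_storage: float
    per_gb_transfer_out: float

def estimate_cost(rates, instance_hours, storage_gb, months, gb_out):
    """Break a project's bill into the three cost items listed on the slide."""
    compute = rates.per_instance_hour * instance_hours
    storage = rates.per_gb_month_storage * storage_gb * months
    transfer_out = rates.per_gb_transfer_out * gb_out  # uploads are not charged here
    return {
        "compute": compute,
        "storage": storage,
        "transfer_out": transfer_out,
        "total": compute + storage + transfer_out,
    }

# Placeholder numbers chosen only to make the arithmetic easy to follow.
example_rates = Rates(per_instance_hour=0.5,
                      per_gb_month_storage=0.125,
                      per_gb_transfer_out=0.25)
bill = estimate_cost(example_rates, instance_hours=300,
                     storage_gb=500, months=2, gb_out=40)
for item, usd in bill.items():
    print(f"{item:>12}: ${usd:,.2f}")
```

Swapping in current published prices for the instance type you actually use turns this into a quick sanity check before launching a large job.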

Page 9: Why The Cloud Is A Computational Biologist's Best Friend

Controlling Your Services

• Web-based console
• Command-line tools
▫ EC2 API tools
• Third-party systems: RightScale

Page 10: Why The Cloud Is A Computational Biologist's Best Friend

Using & Distributing Instances

• You can always make images of your instances for later use/backup
• Images can be made public
• You can launch other people’s images (i.e., public images), e.g.,
▫ CloudBioLinux: pre-made biocomputational instances
▫ Galaxy Cloud: pre-made Cluster-based Galaxy instance (Web-based, no less)
▫ PathSeq: pre-made comprehensive bowtie engine that uses Hadoop

Page 11: Why The Cloud Is A Computational Biologist's Best Friend

Issues

• Security
▫ Lots of it
• Data transfers
▫ Free for upload; $ for download
▫ No big deal, so far
▫ Can send drives…
• Latency
▫ No big deal
• Small “ephemeral” storage
▫ Gotcha if you don’t know
• Max 1 terabyte per disk
▫ Hum…
• “Max” 20 disks per instance
▫ Can be circumvented
• No sharing of disks between instances, usually
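
The two disk limits above translate into a simple capacity check. A minimal sketch, assuming the limits exactly as quoted on the slide (1 terabyte per disk, a soft "max" of 20 disks per instance); `plan_volumes` is a hypothetical helper, not part of any Amazon tool.

```python
import math

MAX_TB_PER_DISK = 1           # limit quoted on the slide
MAX_DISKS_PER_INSTANCE = 20   # soft limit per the slide ("can be circumvented")

def plan_volumes(dataset_tb):
    """Return (disks needed, whether they fit on one instance under the quoted limits)."""
    disks_needed = math.ceil(dataset_tb / MAX_TB_PER_DISK)
    return disks_needed, disks_needed <= MAX_DISKS_PER_INSTANCE

for size_tb in (0.5, 7.5, 25):
    disks, fits = plan_volumes(size_tb)
    verdict = "fits on one instance" if fits else "exceeds the per-instance limit"
    print(f"{size_tb} TB -> {disks} disk(s), {verdict}")
```

Because disks are usually not shared between instances, a dataset that exceeds the per-instance limit has to be split across machines or the limit worked around.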

Page 12: Why The Cloud Is A Computational Biologist's Best Friend

Support

• Unless you purchase support, you’re on your own
• Hasn’t been an issue for me, though finding a solution can take time…

Support options:

Page 13: Why The Cloud Is A Computational Biologist's Best Friend

Questions?