Talk at Microsoft Cloud Futures 2010

Scientific Computing with Amazon Web Services, Deepak Singh, Cloud Futures 2010

Upload: deepak-singh

Post on 11-May-2015

DESCRIPTION

My talk from Cloud Futures 2010, organized by MSR

TRANSCRIPT

Page 1: Talk at Microsoft Cloud Futures 2010

Scientific Computing with Amazon Web Services

Deepak Singh

Cloud Futures 2010

Page 2: Talk at Microsoft Cloud Futures 2010
Page 3: Talk at Microsoft Cloud Futures 2010
Page 5: Talk at Microsoft Cloud Futures 2010
Page 6: Talk at Microsoft Cloud Futures 2010
Page 7: Talk at Microsoft Cloud Futures 2010

life science industry

Page 8: Talk at Microsoft Cloud Futures 2010

Credit: Bosco Ho

Page 9: Talk at Microsoft Cloud Futures 2010
Page 10: Talk at Microsoft Cloud Futures 2010

By ~Prescott under a CC-BY-NC license

Page 11: Talk at Microsoft Cloud Futures 2010
Page 12: Talk at Microsoft Cloud Futures 2010

data

Page 13: Talk at Microsoft Cloud Futures 2010

Image: Wikipedia

Page 14: Talk at Microsoft Cloud Futures 2010
Page 15: Talk at Microsoft Cloud Futures 2010

Image via image editor under a CC-BY License

Page 16: Talk at Microsoft Cloud Futures 2010

Image: Matt Wood

Page 18: Talk at Microsoft Cloud Futures 2010
Page 19: Talk at Microsoft Cloud Futures 2010
Page 20: Talk at Microsoft Cloud Futures 2010
Page 21: Talk at Microsoft Cloud Futures 2010
Page 22: Talk at Microsoft Cloud Futures 2010

gigabytes

Page 23: Talk at Microsoft Cloud Futures 2010

terabytes

Page 24: Talk at Microsoft Cloud Futures 2010

petabytes

Page 25: Talk at Microsoft Cloud Futures 2010

petabytes

Page 26: Talk at Microsoft Cloud Futures 2010

petabytes

exabytes?

Page 27: Talk at Microsoft Cloud Futures 2010

really fast

Page 28: Talk at Microsoft Cloud Futures 2010

Image: http://www.broadinstitute.org/~apleite/photos.html

Page 29: Talk at Microsoft Cloud Futures 2010
Page 30: Talk at Microsoft Cloud Futures 2010
Page 31: Talk at Microsoft Cloud Futures 2010
Page 32: Talk at Microsoft Cloud Futures 2010

data management

Page 33: Talk at Microsoft Cloud Futures 2010

data processing

Page 34: Talk at Microsoft Cloud Futures 2010

data sharing

Page 35: Talk at Microsoft Cloud Futures 2010

Image: Chris Dagdigian

Page 36: Talk at Microsoft Cloud Futures 2010

compute & storage limited

Page 37: Talk at Microsoft Cloud Futures 2010
Page 38: Talk at Microsoft Cloud Futures 2010

amazon web services

Page 39: Talk at Microsoft Cloud Futures 2010

the cloud

Page 40: Talk at Microsoft Cloud Futures 2010

has_many :definitions

Page 41: Talk at Microsoft Cloud Futures 2010

infrastructure as a service

Page 42: Talk at Microsoft Cloud Futures 2010
Page 43: Talk at Microsoft Cloud Futures 2010

Compute: Amazon Elastic Compute Cloud (EC2)
- Elastic Load Balancing
- Auto Scaling

Storage: Amazon Simple Storage Service (S3)
- AWS Import/Export

Database: Amazon RDS and SimpleDB

Page 44: Talk at Microsoft Cloud Futures 2010

Compute: Amazon Elastic Compute Cloud (EC2)
- Elastic Load Balancing
- Auto Scaling

Storage: Amazon Simple Storage Service (S3)
- AWS Import/Export

Content Delivery: Amazon CloudFront

Messaging: Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS)

Payments: Amazon Flexible Payments Service (FPS)

On-Demand Workforce: Amazon Mechanical Turk

Parallel Processing: Amazon Elastic MapReduce

Database: Amazon RDS and SimpleDB

Page 45: Talk at Microsoft Cloud Futures 2010

Compute: Amazon Elastic Compute Cloud (EC2)
- Elastic Load Balancing
- Auto Scaling

Storage: Amazon Simple Storage Service (S3)
- AWS Import/Export

Content Delivery: Amazon CloudFront

Messaging: Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS)

Payments: Amazon Flexible Payments Service (FPS)

On-Demand Workforce: Amazon Mechanical Turk

Parallel Processing: Amazon Elastic MapReduce

Monitoring: Amazon CloudWatch

Management: AWS Management Console

Tools: AWS Toolkit for Eclipse, AWS SDK for .NET

Isolated Networks: Amazon Virtual Private Cloud

Database: Amazon RDS and SimpleDB

Page 46: Talk at Microsoft Cloud Futures 2010

Your Custom Applications and Services

Compute: Amazon Elastic Compute Cloud (EC2)
- Elastic Load Balancing
- Auto Scaling

Storage: Amazon Simple Storage Service (S3)
- AWS Import/Export

Content Delivery: Amazon CloudFront

Messaging: Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS)

Payments: Amazon Flexible Payments Service (FPS)

On-Demand Workforce: Amazon Mechanical Turk

Parallel Processing: Amazon Elastic MapReduce

Monitoring: Amazon CloudWatch

Management: AWS Management Console

Tools: AWS Toolkit for Eclipse, AWS SDK for .NET

Isolated Networks: Amazon Virtual Private Cloud

Database: Amazon RDS and SimpleDB

Page 47: Talk at Microsoft Cloud Futures 2010
Page 48: Talk at Microsoft Cloud Futures 2010
Page 49: Talk at Microsoft Cloud Futures 2010
Page 50: Talk at Microsoft Cloud Futures 2010
Page 51: Talk at Microsoft Cloud Futures 2010

elasticity

Page 52: Talk at Microsoft Cloud Futures 2010

3000 CPUs for one firm’s risk management application

Chart: Number of EC2 Instances, by day from Wednesday 4/22/2009 to Tuesday 4/28/2009

Axis marks: 3000 and 300

Annotation: 300 CPUs on weekends
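The elasticity point on this slide can be put into numbers. The following is a rough sketch only: it assumes the fleet holds 3000 instances on each of five weekdays and drops to 300 on each of two weekend days, and that every day is a full 24 hours of use. The slide gives the two fleet sizes but not the exact daily schedule, so the weekday/weekend split is an assumption.

```python
# Assumption: 3000 instances on 5 weekdays, 300 on 2 weekend days,
# each day counted as 24 hours. Only the two fleet sizes come from the slide.
HOURS_PER_DAY = 24
WEEKDAY_INSTANCES = 3000
WEEKEND_INSTANCES = 300


def weekly_instance_hours(weekday_n, weekend_n, weekdays=5, weekend_days=2):
    """Instance-hours used in one week by a fleet that resizes on weekends."""
    return HOURS_PER_DAY * (weekday_n * weekdays + weekend_n * weekend_days)


# Elastic fleet: shrink to 300 instances on the weekend.
elastic = weekly_instance_hours(WEEKDAY_INSTANCES, WEEKEND_INSTANCES)
# Fixed fleet: keep all 3000 instances running every day.
fixed = weekly_instance_hours(WEEKDAY_INSTANCES, WEEKDAY_INSTANCES)
saved_fraction = 1 - elastic / fixed

print(f"elastic: {elastic} instance-hours/week")
print(f"fixed:   {fixed} instance-hours/week")
print(f"saved:   {saved_fraction:.1%}")
```

Under these assumptions the elastic fleet uses about a quarter fewer instance-hours per week than a fleet sized for the weekday peak.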

Page 53: Talk at Microsoft Cloud Futures 2010

scalability

Page 54: Talk at Microsoft Cloud Futures 2010

> 1PB of data in S3

Page 55: Talk at Microsoft Cloud Futures 2010
Page 56: Talk at Microsoft Cloud Futures 2010

highly available

Page 57: Talk at Microsoft Cloud Futures 2010

Image: Chris Dagdigian

Page 58: Talk at Microsoft Cloud Futures 2010
Page 59: Talk at Microsoft Cloud Futures 2010

“Everything fails, all the time” -- Werner Vogels

Page 60: Talk at Microsoft Cloud Futures 2010
Page 61: Talk at Microsoft Cloud Futures 2010

“Things will crash. Deal with it” -- Jeff Dean

Page 62: Talk at Microsoft Cloud Futures 2010

2-4% of servers will die annually

Source: Jeff Dean, LADIS 2009

Page 63: Talk at Microsoft Cloud Futures 2010

1-5% of disk drives will die every year

Source: Jeff Dean, LADIS 2009
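To see what these rates mean in practice, here is a small worked calculation. The failure-rate ranges are the ones quoted on the two slides above; the fleet sizes (1000 servers, 4000 disks) are hypothetical and chosen only to make the arithmetic concrete.

```python
# Annual failure-rate ranges quoted on the slides (Jeff Dean, LADIS 2009).
SERVER_FAILURE_RATE = (0.02, 0.04)
DISK_FAILURE_RATE = (0.01, 0.05)

# Hypothetical fleet sizes, not taken from the talk.
SERVERS = 1000
DISKS = 4000


def expected_failures(count, rate_range):
    """Return the (low, high) expected number of failures per year."""
    low, high = rate_range
    return count * low, count * high


server_low, server_high = expected_failures(SERVERS, SERVER_FAILURE_RATE)
disk_low, disk_high = expected_failures(DISKS, DISK_FAILURE_RATE)

print(f"servers lost per year: {server_low:.0f} to {server_high:.0f}")
print(f"disks lost per year:   {disk_low:.0f} to {disk_high:.0f}")
```

For a fleet of that size, this works out to roughly 20 to 40 dead servers and 40 to 200 dead disks a year, so hardware failure is routine and needs to be designed for.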

Page 64: Talk at Microsoft Cloud Futures 2010

human errors

Page 65: Talk at Microsoft Cloud Futures 2010

human errors

~20% of admin issues have unintended consequences

Source: James Hamilton

Page 66: Talk at Microsoft Cloud Futures 2010

scalable & available

Page 67: Talk at Microsoft Cloud Futures 2010

assume sw/hw failure

Page 68: Talk at Microsoft Cloud Futures 2010

design apps to be resilient
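One common way to make an application tolerate transient failures is to retry with exponential backoff and jitter. The sketch below is illustrative and is not taken from the talk: `call_with_retries` and `flaky_fetch` are hypothetical names, and a real application would likely catch only the exception types it knows to be transient rather than every `Exception`.

```python
import random
import time


def call_with_retries(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying on any exception with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            # Double the delay after each failure and add random jitter
            # so many clients do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))


# Hypothetical flaky dependency: fails twice, then succeeds.
attempts = {"count": 0}


def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "payload"


# The demo passes a no-op sleep so it does not actually wait.
result = call_with_retries(flaky_fetch, sleep=lambda seconds: None)
print(result, "after", attempts["count"], "attempts")
```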

Page 69: Talk at Microsoft Cloud Futures 2010

automation & alarming

Page 70: Talk at Microsoft Cloud Futures 2010

US East Region

Availability Zone A

Availability Zone B

Availability Zone C

Availability Zone D

Page 71: Talk at Microsoft Cloud Futures 2010
Page 72: Talk at Microsoft Cloud Futures 2010

elastic load balancing

CloudWatch

auto scaling

elastic block store

elastic IP

SQS

SNS

Page 73: Talk at Microsoft Cloud Futures 2010

cost effective

Page 74: Talk at Microsoft Cloud Futures 2010

cost effective

pay as you go

Page 75: Talk at Microsoft Cloud Futures 2010

on-demand instances

reserved instances

spot instances

Page 76: Talk at Microsoft Cloud Futures 2010
Page 77: Talk at Microsoft Cloud Futures 2010

AMAZON VPC ARCHITECTURE

Your Network

Secure VPN Connection over the Internet

VPN Gateway

Amazon Web Services Cloud

Customer’s isolated AWS resources

Subnets: 10.32.1.0/24, 10.32.2.0/24, 10.32.3.0/24

External Customers
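The three subnets on this slide can be checked with Python's standard `ipaddress` module. This is only a sketch: the enclosing `10.32.0.0/16` block is an assumption made for illustration, since the slide lists the subnets but does not state the address range of the VPC itself.

```python
import ipaddress

# Subnets as listed on the slide.
subnets = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.32.1.0/24", "10.32.2.0/24", "10.32.3.0/24")
]

# Assumption: the VPC's overall block is 10.32.0.0/16 (not stated on the slide).
vpc_block = ipaddress.ip_network("10.32.0.0/16")

for net in subnets:
    # A /24 leaves 8 host bits, so each subnet spans 256 addresses.
    print(net, "addresses:", net.num_addresses, "inside VPC block:", net.subnet_of(vpc_block))

# Subnets in one VPC must not overlap each other.
any_overlap = any(
    a.overlaps(b) for i, a in enumerate(subnets) for b in subnets[i + 1:]
)
print("any overlap:", any_overlap)
```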

Page 78: Talk at Microsoft Cloud Futures 2010

AWS + science = win

Page 79: Talk at Microsoft Cloud Futures 2010
Page 80: Talk at Microsoft Cloud Futures 2010

3.7 million classifications in just over three days

~15 million in less than a month

>2.6 million clicks in 100 hours

Page 81: Talk at Microsoft Cloud Futures 2010
Page 82: Talk at Microsoft Cloud Futures 2010

lots and lots and lots and lots and lots and lots of data and lots and lots of lots of data

Page 83: Talk at Microsoft Cloud Futures 2010

scalability & availability

Page 84: Talk at Microsoft Cloud Futures 2010

we are data geeks not data center geeks

Page 85: Talk at Microsoft Cloud Futures 2010

data management

Page 86: Talk at Microsoft Cloud Futures 2010

Shaq Image: Keith Allison under a CC-BY-SA license

Page 87: Talk at Microsoft Cloud Futures 2010

Page 88: Talk at Microsoft Cloud Futures 2010

Page 89: Talk at Microsoft Cloud Futures 2010

Page 90: Talk at Microsoft Cloud Futures 2010

Page 91: Talk at Microsoft Cloud Futures 2010
Page 92: Talk at Microsoft Cloud Futures 2010
Page 93: Talk at Microsoft Cloud Futures 2010

Biomarker Warehouse

pre-clinical, clinical, 3rd party data and publications

Estimated cost: 10 TB warehouse over 3 years

Options compared: In house, Managed Service, Cloud (AWS)

Cost components: Servers, Storage, Overhead

(dollar amounts not recoverable from the extracted text)

Page 94: Talk at Microsoft Cloud Futures 2010

data processing

Page 95: Talk at Microsoft Cloud Futures 2010
Page 96: Talk at Microsoft Cloud Futures 2010

http://cyclecomputing.com

Page 99: Talk at Microsoft Cloud Futures 2010

http://www.rightscale.com

Page 100: Talk at Microsoft Cloud Futures 2010

http://leonardo.phys.washington.edu/feff/

XAFS

Page 101: Talk at Microsoft Cloud Futures 2010

http://bioteam.net

Page 102: Talk at Microsoft Cloud Futures 2010

ASSEMBLING GENOMES

140 million 454 reads

Image: Matt Wood

Page 103: Talk at Microsoft Cloud Futures 2010

Map 100 million 100-base paired-end reads

Quad core with 5 GB of RAM would take 16 days

30 high-memory instances; 32 hours; $195

BLAT @ U. PENN
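The figures on this slide imply a speedup and an effective hourly price, worked out below. All four inputs are taken directly from the slide. The per-instance-hour figure is derived from them and is not a quoted AWS price.

```python
# Figures from the slide.
INSTANCES = 30
CLOUD_HOURS = 32
TOTAL_COST_USD = 195
SINGLE_MACHINE_DAYS = 16

# Total compute consumed in the cloud run.
instance_hours = INSTANCES * CLOUD_HOURS

# Effective price implied by the slide's total (derived, not a list price).
cost_per_instance_hour = TOTAL_COST_USD / instance_hours

# Wall-clock comparison against the single quad-core machine.
single_machine_hours = SINGLE_MACHINE_DAYS * 24
speedup = single_machine_hours / CLOUD_HOURS

print(f"instance-hours used:    {instance_hours}")
print(f"cost per instance-hour: ${cost_per_instance_hour:.3f}")
print(f"wall-clock speedup:     {speedup:.0f}x")
```

The run finishes 12 times sooner in wall-clock terms while consuming 960 instance-hours, at about 20 cents per instance-hour.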

Page 104: Talk at Microsoft Cloud Futures 2010

HEAVY-ION COLLISIONS

Problem: Quark matter physics conference imminent but no compute resources handy

Solution: NIMBUS context broker allowed researchers to provision 300 nodes and get the simulations done

Page 105: Talk at Microsoft Cloud Futures 2010

BELLE MONTE CARLO

Credit: Tom Fifield

Page 106: Talk at Microsoft Cloud Futures 2010

disk read/writes

slow & expensive

Page 107: Talk at Microsoft Cloud Futures 2010

data processing

fast & cheap

Page 108: Talk at Microsoft Cloud Futures 2010

distribute the data

parallel reads

Page 109: Talk at Microsoft Cloud Futures 2010
Page 110: Talk at Microsoft Cloud Futures 2010

data processing for the cloud

Page 111: Talk at Microsoft Cloud Futures 2010

distributed file system (HDFS)

Page 112: Talk at Microsoft Cloud Futures 2010

map/reduce
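As a reminder of the programming model, here is a minimal single-process sketch of the map, shuffle and reduce phases, counting bases across a few short reads. It only illustrates the shape of the model and does not use Hadoop. The function names and the sample reads are made up for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(read):
    """Map: emit a (key, value) pair for every base in one read."""
    for base in read:
        yield base, 1


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values seen for one key."""
    return key, sum(values)


# Hypothetical input records; in Hadoop these would be splits of a large file.
reads = ["ACGT", "AACG", "TTGA"]

pairs = chain.from_iterable(map_phase(read) for read in reads)
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

Because each map call sees one record and each reduce call sees one key, both phases can be spread across many machines without changing the functions.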

Page 113: Talk at Microsoft Cloud Futures 2010

Via Cloudera under a Creative Commons License

Page 114: Talk at Microsoft Cloud Futures 2010

Via Cloudera under a Creative Commons License

Page 115: Talk at Microsoft Cloud Futures 2010

http://www.cascading.org/

Page 116: Talk at Microsoft Cloud Futures 2010

apache pig

http://hadoop.apache.org/pig/

Page 117: Talk at Microsoft Cloud Futures 2010

apache hive

http://hadoop.apache.org/hive/

Page 118: Talk at Microsoft Cloud Futures 2010

work by @peteskomoroch

Page 119: Talk at Microsoft Cloud Futures 2010
Page 120: Talk at Microsoft Cloud Futures 2010

High Throughput Sequence Analysis

Mike Schatz, University of Maryland

Page 121: Talk at Microsoft Cloud Futures 2010

CloudBurst

1. Catalog k-mers
2. Collect seeds
3. End-to-end alignment

http://cloudburst-bio.sourceforge.net; Bioinformatics 2009 25: 1363-1369
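The three stages named on this slide follow the general seed-and-extend pattern, which can be sketched on a single machine as below. This is a toy illustration and not CloudBurst's implementation: it uses exact k-mer seeds, scores the end-to-end step by simple mismatch counting with no gaps, and the reference, read, `k` and mismatch limit are all made-up values.

```python
from collections import defaultdict


def catalog_kmers(reference, k):
    """Stage 1: index every k-mer in the reference by its start position."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def collect_seeds(read, index, k):
    """Stage 2: find reference start positions implied by k-mers the read shares."""
    candidates = set()
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            start = ref_pos - offset
            if start >= 0:
                candidates.add(start)
    return candidates


def align_end_to_end(read, reference, start, max_mismatches):
    """Stage 3: compare the whole read at a candidate position (ungapped)."""
    window = reference[start:start + len(read)]
    if len(window) < len(read):
        return None  # read would run off the end of the reference
    mismatches = sum(a != b for a, b in zip(read, window))
    return mismatches if mismatches <= max_mismatches else None


# Hypothetical inputs: the read matches the reference at position 8 with one mismatch.
reference = "ACGTACGTTAGCCGATAG"
read = "TAGCCTAT"
k = 4

index = catalog_kmers(reference, k)
hits = {}
for start in collect_seeds(read, index, k):
    mismatches = align_end_to_end(read, reference, start, max_mismatches=1)
    if mismatches is not None:
        hits[start] = mismatches

print(hits)
```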

Page 122: Talk at Microsoft Cloud Futures 2010

Crossbow: Rapid whole genome SNP analysis

Ben Langmead

http://bowtie-bio.sourceforge.net/crossbow/index.shtml

Page 123: Talk at Microsoft Cloud Futures 2010

Preprocessed reads

Map: Bowtie

Sort: Bin and partition

Reduce: SoapSNP

Crossbow: Rapid whole genome SNP analysis

Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL. Genome Biol 10(11): R134.
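The map, sort and reduce stages listed on this slide can be sketched in miniature as follows. This is a toy under loud assumptions and is not Crossbow's code: `map_align` is a brute-force stand-in for the Bowtie step, `reduce_call_snps` is a majority-vote stand-in for the SoapSNP step, and the reference sequence, reads and bin size are all invented for illustration.

```python
from collections import Counter, defaultdict

REFERENCE = {"chr1": "GATTACAGCTTGACCA"}  # hypothetical toy reference
BIN_SIZE = 8                               # hypothetical partition width


def map_align(read):
    """Map (aligner stand-in): best ungapped placement by mismatch count."""
    best = None
    for chrom, seq in REFERENCE.items():
        for pos in range(len(seq) - len(read) + 1):
            mismatches = sum(a != b for a, b in zip(read, seq[pos:pos + len(read)]))
            if best is None or mismatches < best[0]:
                best = (mismatches, chrom, pos)
    _, chrom, pos = best
    # Emit one observation per base, keyed by (chromosome, genomic bin).
    for i, base in enumerate(read):
        yield (chrom, (pos + i) // BIN_SIZE), (pos + i, base)


def sort_into_bins(pairs):
    """Sort: group observations by bin and order them by position."""
    bins = defaultdict(list)
    for key, value in pairs:
        bins[key].append(value)
    return {key: sorted(values) for key, values in sorted(bins.items())}


def reduce_call_snps(chrom, observations):
    """Reduce (SNP caller stand-in): report positions whose majority base differs."""
    by_pos = defaultdict(Counter)
    for pos, base in observations:
        by_pos[pos][base] += 1
    for pos in sorted(by_pos):
        base, _ = by_pos[pos].most_common(1)[0]
        if base != REFERENCE[chrom][pos]:
            yield chrom, pos, REFERENCE[chrom][pos], base


# Hypothetical reads: three carry a T-to-G change at reference position 9.
reads = ["CAGCGTGA", "GCGTGAC", "TTACAGCG", "GATTAC"]

pairs = (pair for read in reads for pair in map_align(read))
snps = [
    snp
    for (chrom, _bin), observations in sort_into_bins(pairs).items()
    for snp in reduce_call_snps(chrom, observations)
]
print(snps)
```

Binning by genomic region is what lets each reduce call work on one partition independently of the others.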

Page 124: Talk at Microsoft Cloud Futures 2010

Crossbow condenses over 1,000 hours of resequencing computation into a few hours without requiring the user to own or operate a computer cluster

Page 125: Talk at Microsoft Cloud Futures 2010

Assembly of Large Genomes with Cloud Computing.
Schatz MC, Sommer D, Kelley D, Pop M, et al. In Preparation.
http://contrail-bio.sourceforge.net

Scalable Genome Assembly

Page 126: Talk at Microsoft Cloud Futures 2010

Amazon Elastic MapReduce

Input Data

Amazon S3: Input S3 bucket (input dataset), Output S3 bucket (output results)

Deploy Application: Web Console, Command line tools

Elastic MapReduce: Hadoop on Amazon EC2 Instances

Notify

Get Results

End

Page 127: Talk at Microsoft Cloud Futures 2010
Page 128: Talk at Microsoft Cloud Futures 2010

data storage & distribution

public & private

Page 130: Talk at Microsoft Cloud Futures 2010
Page 131: Talk at Microsoft Cloud Futures 2010

sharing and collaboration

Page 132: Talk at Microsoft Cloud Futures 2010

software distribution

Page 133: Talk at Microsoft Cloud Futures 2010
Page 134: Talk at Microsoft Cloud Futures 2010

http://www.cloudbiolinux.com/

Page 135: Talk at Microsoft Cloud Futures 2010

http://usegalaxy.org/cloud

Page 136: Talk at Microsoft Cloud Futures 2010

application platforms

Page 137: Talk at Microsoft Cloud Futures 2010

http://heroku.com

Page 138: Talk at Microsoft Cloud Futures 2010

http://chempedia.com/

Page 139: Talk at Microsoft Cloud Futures 2010

Image: O’Reilly Radar

Page 140: Talk at Microsoft Cloud Futures 2010

business models

Page 141: Talk at Microsoft Cloud Futures 2010
Page 142: Talk at Microsoft Cloud Futures 2010
Page 143: Talk at Microsoft Cloud Futures 2010

to conclude

Page 144: Talk at Microsoft Cloud Futures 2010
Page 145: Talk at Microsoft Cloud Futures 2010

built for scale

Page 146: Talk at Microsoft Cloud Futures 2010

built for availability

Page 147: Talk at Microsoft Cloud Futures 2010

shared dataspaces

global namespaces

Page 148: Talk at Microsoft Cloud Futures 2010

task-based resources

Page 149: Talk at Microsoft Cloud Futures 2010

new software architectures

Page 150: Talk at Microsoft Cloud Futures 2010

new computing platforms

Page 151: Talk at Microsoft Cloud Futures 2010

available today

Page 152: Talk at Microsoft Cloud Futures 2010

http://aws.amazon.com/education

Page 153: Talk at Microsoft Cloud Futures 2010

[email protected]

Twitter: @mndoci

Presentation ideas from James Hamilton, @mza and @lessig

Thank you!