EC2 PERFORMANCE, SPOT INSTANCE ROI AND EMR SCALABILITY Jesse Anderson

Upload: jesse-anderson

Post on 06-Dec-2014

Category:

Technology


DESCRIPTION

The presentation accompanying my research into Amazon Web Services EC2 performance, Spot instance ROI and EMR (Hadoop) scalability.

TRANSCRIPT

Page 1: EC2 Performance, Spot Instance ROI and EMR Scalability

EC2 PERFORMANCE, SPOT INSTANCE ROI AND EMR SCALABILITY

Jesse Anderson

Page 2: EC2 Performance, Spot Instance ROI and EMR Scalability

AMAZON WEB SERVICES (AWS)

Elastic Compute Cloud (EC2): virtual machine in the cloud

Simple Storage Service (S3): network share in the cloud

Elastic MapReduce (EMR): cluster of EC2 instances running Hadoop

Page 3: EC2 Performance, Spot Instance ROI and EMR Scalability

EC2 PRICE TYPES

Spot Instances: system for bidding on unused instances; same performance; go away (abruptly) if outbid

On Demand: ad hoc starting

Reserved: not covered here
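Because a Spot Instance can go away abruptly, work running on one generally needs to be resumable. The following is a minimal sketch of checkpoint-and-resume, not code from the presentation. The file name, the `process` callback, and the `stop_after` parameter (used only to simulate an abrupt stop) are all hypothetical, and a local file stands in for durable storage such as S3, since local disk would disappear with the instance.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the index of the next item to process, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_item"]


def save_checkpoint(path, next_item):
    """Write the checkpoint to a temp file, then rename it into place,
    so a stop in the middle of a write cannot leave a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_item": next_item}, f)
    os.replace(tmp, path)


def run(path, total_items, process, checkpoint_every=100, stop_after=None):
    """Process items starting from the last checkpoint.

    stop_after is a hypothetical knob that simulates the instance
    disappearing after that many items. Returns the next unprocessed index.
    """
    start = load_checkpoint(path)
    done = 0
    for i in range(start, total_items):
        if stop_after is not None and done >= stop_after:
            return i  # simulated abrupt termination: no final checkpoint
        process(i)
        done += 1
        if (i + 1) % checkpoint_every == 0:
            save_checkpoint(path, i + 1)
    save_checkpoint(path, total_items)
    return total_items
```

Calling `run` again with the same checkpoint path picks up from the last saved index. Items processed after the last checkpoint but before the stop are repeated, so `process` should be safe to run twice on the same item.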

Page 4: EC2 Performance, Spot Instance ROI and EMR Scalability

SPOT INSTANCE SAVINGS

Page 5: EC2 Performance, Spot Instance ROI and EMR Scalability

MILLION MONKEYS PROJECT

Randomly recreated Shakespeare

Open source

Good metric for CPU and memory

Page 6: EC2 Performance, Spot Instance ROI and EMR Scalability

EC2 SPECIFICATIONS

Instance Name     Memory    EC2 Compute Units/Cores   Platform   I/O Performance
Small             1.7 GB    1 ECU on 1 core           32-bit     Moderate
Large             7.5 GB    4 ECU on 2 cores          64-bit     High
Extra Large       15 GB     8 ECU on 8 cores          64-bit     High
High-CPU Medium   1.7 GB    5 ECU on 2 cores          32-bit     Moderate
High-CPU Large    7 GB      20 ECU on 8 cores         64-bit     High
Quad XL           23 GB     33.5 ECU on 8 cores       64-bit     Very High

EC2 Compute Unit (ECU) – One EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.

Page 7: EC2 Performance, Spot Instance ROI and EMR Scalability

EC2 PERFORMANCE

My Core 2 Duo 2.66 GHz did 50,000,000,000 character groups

Page 8: EC2 Performance, Spot Instance ROI and EMR Scalability

EC2 COST PER HOUR ON DEMAND/SPOT

Page 9: EC2 Performance, Spot Instance ROI and EMR Scalability

PRICE PER UNIT

Page 10: EC2 Performance, Spot Instance ROI and EMR Scalability

EMR (HADOOP) CLUSTERING

Tests of 1, 2, 3, 4, 5, 10, 20 node clusters

Price

Scalability

Page 11: EC2 Performance, Spot Instance ROI and EMR Scalability

EMR COST

Page 12: EC2 Performance, Spot Instance ROI and EMR Scalability

PRICE PER UNIT IN A CLUSTER

Page 13: EC2 Performance, Spot Instance ROI and EMR Scalability

CLUSTERED CHARACTER GROUPS

Page 14: EC2 Performance, Spot Instance ROI and EMR Scalability

EMR/HADOOP SCALABILITY PERCENTAGE

Page 15: EC2 Performance, Spot Instance ROI and EMR Scalability

EMR/HADOOP SCALABILITY ABSOLUTE

Page 16: EC2 Performance, Spot Instance ROI and EMR Scalability

BREAKDOWNS

Original project would have run in 3 days 9 hours

Took 1.5 months before

20-node cluster costs $45.44 per day

5-day run cost $317

11-day run cost $528

Page 17: EC2 Performance, Spot Instance ROI and EMR Scalability

ENGINEERING FOR THE CLOUD

Establish if a good fit

Test the EC2 performance

Figure out a unit or widget

Find the most cost-efficient EC2 performer with price per unit/widget

Engineer with Spot Instances in mind
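The price per unit/widget comparison can be sketched as follows. This is an illustration only: the instance names, prices, and throughput figures in the usage note are placeholders, not measurements from the presentation, and the function names are hypothetical.

```python
def price_per_unit(hourly_price, units_per_hour):
    """Cost of producing one unit of work (one "widget") on an instance type."""
    if units_per_hour <= 0:
        raise ValueError("units_per_hour must be positive")
    return hourly_price / units_per_hour


def cheapest(instances):
    """Return the name of the instance type with the lowest price per unit.

    instances maps a name to a (hourly_price, units_per_hour) pair,
    where units_per_hour comes from your own benchmark.
    """
    return min(instances, key=lambda name: price_per_unit(*instances[name]))
```

For example, with placeholder values `{"type-a": (0.10, 1000), "type-b": (0.40, 5000)}`, `cheapest` picks `"type-b"`: it costs four times as much per hour but does five times the work, so each unit is cheaper. Feeding in Spot prices instead of On Demand prices gives the same comparison for Spot.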

Page 18: EC2 Performance, Spot Instance ROI and EMR Scalability

CONCLUSIONS

Spot Instance savings: from $2.20 to $1.30 per hour; saved $1,000 in one run

Hadoop/EMR scalability: 95% efficiency at 2-5 nodes; 87% efficiency at 10 nodes; 84% efficiency at 20 nodes
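The arithmetic behind these two conclusions can be sketched as below. The slides do not define "efficiency", so this assumes the common definition: cluster throughput divided by the node count times single-node throughput. The function names are hypothetical.

```python
def savings_fraction(on_demand_hourly, spot_hourly):
    """Fraction of the On Demand hourly price saved by paying the Spot price."""
    return (on_demand_hourly - spot_hourly) / on_demand_hourly


def scaling_efficiency(nodes, cluster_throughput, single_node_throughput):
    """Assumed definition: 1.0 means the cluster scales perfectly linearly."""
    return cluster_throughput / (nodes * single_node_throughput)


def effective_nodes(nodes, efficiency):
    """How many perfectly scaling nodes a cluster is worth at a given efficiency."""
    return nodes * efficiency
```

With the figures above, `savings_fraction(2.20, 1.30)` is roughly 0.41, a saving of about 41% per hour. Under the assumed definition, 84% efficiency at 20 nodes means the cluster does the work of about 16.8 ideal nodes.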