aws august webinar series - ec2 spot instances - 08192015

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. EC2 Spot: Save Up to 90% on your Amazon EC2 Bill with Spot Instances. Tipu Qureshi, Jafar Shameem. 19th August 2015

Upload: amazon-web-services

Post on 20-Mar-2017


Page 1: AWS August Webinar Series -  EC2 Spot Instances - 08192015

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

EC2 Spot
Save Up to 90% on your Amazon EC2 Bill with Spot Instances

Tipu Qureshi, Jafar Shameem
19th August 2015

Page 2: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Name your own price for EC2 Compute

• A market where price of compute changes based upon Supply and Demand

• When Bid Price exceeds Spot Market Price, instance is launched

• Instance is terminated (with 2 minute warning) if market price exceeds bid price

• Unused On-Demand Instances

What is Spot?
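The launch and termination rules on this slide can be sketched as a small decision function. This is only an illustration of the rules as worded above; the function name and the returned labels are made up for the example, and the handling of a bid exactly equal to the market price is an assumption.

```python
def spot_instance_action(bid_price, market_price, running):
    """Return what happens to a Spot request or instance under the slide's rules."""
    if not running:
        # A request is fulfilled once the bid exceeds the Spot market price.
        return "launch" if bid_price > market_price else "wait"
    if market_price > bid_price:
        # A running instance is reclaimed when the market price exceeds the bid.
        return "terminate_with_2_minute_warning"
    return "keep_running"


print(spot_instance_action(0.10, 0.05, running=False))
print(spot_instance_action(0.10, 0.25, running=True))
```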

Page 3: AWS August Webinar Series -  EC2 Spot Instances - 08192015

• Spot prices are determined via supply and demand
• There are hundreds of uncorrelated Spot markets
• Prices can, but often don’t, fluctuate wildly

About Spot…

Page 4: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Spot is not one market

Each market is a combination of Type, Size, OS and AZ:

Type:
  General-purpose: M1, M3, T2
  Compute-optimized: C1, CC2, C3, C4
  Memory-optimized: M2, CR1, R3, M4
  Dense-storage: HS1, D2
  I/O-optimized: HI1, I2
  GPU: CG1, G2
  Micro: T1, T2
Size: .micro, .medium, .large, .xlarge, .2xlarge, .4xlarge, .8xlarge
OS: Windows, Linux
AZ: -1a, -1b, -1c, …

Page 5: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Uncorrelated pools of Spot Capacity

Each instance family (e.g. r3) and size (e.g. 4xlarge), in each Availability Zone (e.g. us-east-1b), is a separate pool of Spot capacity.
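To make the "many pools" idea concrete, the sketch below enumerates pools as combinations of instance type, operating system and Availability Zone. The three short lists are arbitrary examples chosen for illustration, not a complete catalogue.

```python
from itertools import product

# Illustrative values only; a real account sees many more of each.
instance_types = ["r3.4xlarge", "c3.xlarge", "m3.large"]
operating_systems = ["Linux/UNIX", "Windows"]
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]

# Every combination is its own pool with its own price and availability.
pools = list(product(instance_types, operating_systems, availability_zones))

print(len(pools))  # 3 types * 2 operating systems * 3 zones
for pool in pools[:3]:
    print(pool)
```

Even this small selection yields 18 separate pools, which is why a price spike in one of them need not affect the others.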

Page 6: AWS August Webinar Series -  EC2 Spot Instances - 08192015

50% Bid

70% Bid

You pay the market price

Bid Price and Market Price

Page 7: AWS August Webinar Series -  EC2 Spot Instances - 08192015

cc2.8xlarge: 32 cores, 60.5 GB memory

On-Demand Price: $2.00/hr

$0.00936/core/hr
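The per-core figure can be put next to the On-Demand price with a little arithmetic. The slide does not say what the $0.00936 figure refers to; the sketch assumes it is a Spot price per core-hour for the same instance type, so the implied hourly price and savings below are derived values, not numbers from the deck.

```python
CORES = 32
ON_DEMAND_PER_HOUR = 2.00
QUOTED_PER_CORE_HOUR = 0.00936  # assumed to be a Spot price per core-hour

on_demand_per_core_hour = ON_DEMAND_PER_HOUR / CORES
implied_spot_per_hour = QUOTED_PER_CORE_HOUR * CORES
implied_savings = 1 - implied_spot_per_hour / ON_DEMAND_PER_HOUR

print(f"On-Demand per core-hour: ${on_demand_per_core_hour:.4f}")
print(f"Implied Spot price per instance-hour: ${implied_spot_per_hour:.4f}")
print(f"Implied savings vs On-Demand: {implied_savings:.0%}")
```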

Page 8: AWS August Webinar Series -  EC2 Spot Instances - 08192015

On average, AWS adds enough new server capacity every day to support Amazon’s global infrastructure when it was a $7B business.

Page 9: AWS August Webinar Series -  EC2 Spot Instances - 08192015
Page 10: AWS August Webinar Series -  EC2 Spot Instances - 08192015

EC2 Spot - best practices

Page 11: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Check the Price History

Describe Spot Price History API:
• Provides historical prices on a per-pool basis
• Goes back 90 days (3 months)
• Popular instance types tend to have Spot prices that are somewhat more volatile
• Older generations (including c1.xlarge, m1.small, cr1.8xlarge, and cc2.8xlarge) tend to be much more stable and have lower cost in general
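Once price history has been downloaded, it can be summarised per pool. The sketch below assumes records shaped like the entries returned by the Describe Spot Price History API (for example via `aws ec2 describe-spot-price-history`), with `SpotPrice` as a string; the sample values are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_pool(records):
    """Group price records by pool and report min, max and mean price."""
    prices = defaultdict(list)
    for record in records:
        pool = (
            record["InstanceType"],
            record["ProductDescription"],
            record["AvailabilityZone"],
        )
        prices[pool].append(float(record["SpotPrice"]))
    return {
        pool: {"min": min(values), "max": max(values), "mean": mean(values)}
        for pool, values in prices.items()
    }


# Invented sample data, not real prices.
sample = [
    {"InstanceType": "c3.xlarge", "ProductDescription": "Linux/UNIX",
     "AvailabilityZone": "us-east-1a", "SpotPrice": "0.0500"},
    {"InstanceType": "c3.xlarge", "ProductDescription": "Linux/UNIX",
     "AvailabilityZone": "us-east-1a", "SpotPrice": "0.0700"},
    {"InstanceType": "c3.xlarge", "ProductDescription": "Linux/UNIX",
     "AvailabilityZone": "us-east-1b", "SpotPrice": "0.2500"},
]

for pool, stats in summarize_by_pool(sample).items():
    print(pool, stats)
```

A wide gap between min and max for a pool is a rough hint that it has been volatile over the window.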

Page 12: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Capacity pools

Set of EC2 instances with the same properties:
• Availability Zone
• Product/Operating system (Linux/Unix or Windows)
• EC2 instance type

Each EC2 capacity pool has its own:
• Availability – number of Spot instances
• Price – based on supply and demand

Page 13: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Use Multiple Capacity Pools

• Run applications across multiple capacity pools to reduce your application’s sensitivity to price spikes that affect a pool

• In general, there is very little correlation between prices in different capacity pools.

• For example, if you run in five different pools, your price swings and interruptions can be cut by 80%.
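One way to read the 80% figure: if capacity is spread evenly over five uncorrelated pools, a spike in a single pool touches only one fifth of the fleet. The sketch below encodes that reading; the even spread and the single-pool spike are assumptions of the example, not statements from the slide.

```python
def fraction_unaffected(num_pools, pools_spiking=1):
    """Share of an evenly spread fleet untouched when some pools spike."""
    if num_pools < 1 or not 0 <= pools_spiking <= num_pools:
        raise ValueError("need 0 <= pools_spiking <= num_pools and num_pools >= 1")
    return 1 - pools_spiking / num_pools


for n in (1, 2, 5, 10):
    print(n, f"{fraction_unaffected(n):.0%}")
```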

Page 14: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Use Multiple Capacity Pools

Run across multiple Availability Zones in conjunction with:
• Auto Scaling
• Spot Fleet API

Run your application across different sizes of instances within the same family
• Amazon EMR takes this approach

Your application could figure out how many vCPUs it is running on, and then launch enough worker threads to keep all of them occupied.
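A minimal sketch of sizing a worker pool to the instance it lands on, using only the Python standard library. The helper name `run_on_all_vcpus` is made up for the example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_on_all_vcpus(task, items):
    """Run task over items with one worker thread per vCPU reported by the OS."""
    workers = os.cpu_count() or 1  # cpu_count() can return None
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, items))


results = run_on_all_vcpus(lambda n: n * n, range(8))
print(os.cpu_count(), results)
```

The same code then adapts whether it starts on an .xlarge or an .8xlarge. In CPython, threads suit I/O-bound work; for CPU-bound work `ProcessPoolExecutor` from the same module is the usual substitute.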

Page 15: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Use Normalized pools of Compute

CPU and cores
• What kind of performance does your application require?
• How many cores does your application need?

Memory/core
• How much memory per core does your application need?

Networking
• Does your application need high, moderate, low network bandwidth?

Disk
• How much local disk does your application need?

Page 16: AWS August Webinar Series -  EC2 Spot Instances - 08192015

What about Bidding Strategy?

You only pay the market price, so bid what you are willing to pay.

You are charged the Spot price in effect as you enter each instance hour, and you pay for it at the end of the hour.

If you get interrupted, you don’t pay for that hour.

(By default, bids are limited to 10 × the On-Demand price.)
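The billing rules on this slide can be sketched as a short function. It models only what the slide states (hourly billing at the price in effect at the start of each hour, and no charge for the hour in which Spot interrupts the instance); the function name and sample prices are invented.

```python
def spot_charge(hourly_start_prices, interrupted_by_spot):
    """Total charge given the Spot price at the start of each instance hour.

    hourly_start_prices includes the final hour, even if it was only partly used.
    """
    if interrupted_by_spot:
        # The hour in which the interruption happened is not charged.
        billable = hourly_start_prices[:-1]
    else:
        billable = hourly_start_prices
    return sum(billable)


prices = [0.10, 0.12, 0.11]  # invented market prices, one per instance hour
print(f"{spot_charge(prices, interrupted_by_spot=False):.2f}")
print(f"{spot_charge(prices, interrupted_by_spot=True):.2f}")
```

Note that the bid never appears in the calculation: it only decides whether the instance keeps running.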

Page 17: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Finding the best pools of Compute Capacity

AWS Spot Labs
• https://github.com/awslabs/aws-spot-labs

Helps to find capacity pools (defined as instance type and AZ) with lower price volatility by ordering these pools based on duration of time since the Spot price last exceeded the bid price. It uses AWS CLI to programmatically obtain Spot price history data.
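The ordering described above can be sketched as follows. This is not the code of the AWS Spot Labs tool, only an illustration of the idea; it assumes each pool's history is a list of (timestamp, price) change points, that a price stays in effect until the next change, and all names and sample data are invented.

```python
from datetime import datetime, timedelta


def hours_since_price_exceeded_bid(history, bid, now):
    """Hours since the Spot price was last above the bid within the window."""
    if not history:
        raise ValueError("history must contain at least one price point")
    ordered = sorted(history)
    last_above_ended = ordered[0][0]  # never above the bid: use the window start
    for index, (timestamp, price) in enumerate(ordered):
        if price > bid:
            is_last = index + 1 == len(ordered)
            # A price stays in effect until the next change point (or until now).
            last_above_ended = now if is_last else ordered[index + 1][0]
    return (now - last_above_ended).total_seconds() / 3600


def rank_pools(histories, bids, now):
    """Order pools by time since the price last exceeded the bid, longest first."""
    durations = {
        pool: hours_since_price_exceeded_bid(points, bids[pool[0]], now)
        for pool, points in histories.items()
    }
    return sorted(durations.items(), key=lambda item: item[1], reverse=True)


now = datetime(2015, 8, 15)
histories = {
    ("c3.xlarge", "us-east-1a"): [
        (now - timedelta(hours=168), 0.05),
        (now - timedelta(hours=100), 0.30),
        (now - timedelta(hours=90), 0.06),
    ],
    ("c3.xlarge", "us-east-1b"): [(now - timedelta(hours=168), 0.04)],
}
bids = {"c3.xlarge": 0.105}

for pool, hours in rank_pools(histories, bids, now):
    print(pool, hours)
```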

Page 18: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Using the Spot Tools Lab

python get_spot_duration.py \
  --region us-east-1 \
  --product-description 'Linux/UNIX' \
  --bids c3.xlarge:0.105,c3.2xlarge:0.21,c3.4xlarge:0.42,c3.8xlarge:0.84,c4.xlarge:0.110,c4.2xlarge:0.220,c4.4xlarge:0.440,c4.8xlarge:0.880,cc2.8xlarge:1.000,c1.xlarge:0.26 \
  --hours 168

Note:
• Price as of 8/15/2015
• AZ mappings may differ
• 168 hours = 1 week
• In this example, bidding the on-demand price

Page 19: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Helping with the undifferentiated heavy lifting

• Build stateless, distributed, scalable applications
• Choose which instance types fit your workload the best
• Ingest price feed data for AZs and regions
• Make run time decisions on which Spot pools to launch in based on price and volatility
• Manage interruptions
• Monitor and manage market prices across AZs and instance types
• Manage the capacity footprint in the fleet
• And all of this while you don’t know where the capacity is
• Serve your customers

Page 20: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Introducing Spot Fleet

Instead of writing all that code to manage Spot Instances, simply specify:

• Target Capacity - The number of EC2 instances that you want in your fleet.
• Maximum Bid Price - The maximum bid price that you are willing to pay.
• Launch Specifications - Number and types of instances, AMI ID, VPC, subnets or AZs, etc.
• IAM Fleet Role - The name of an IAM role. It must allow EC2 to launch and terminate instances on your behalf.
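These four inputs map onto a small request document. The sketch below builds one as a Python dictionary using the same keys as the JSON file shown at the end of this deck; the helper name, role ARN and AMI ID are placeholders invented for the example.

```python
import json


def build_spot_fleet_config(target_capacity, max_bid, iam_fleet_role_arn,
                            image_id, instance_types):
    """Assemble a Spot Fleet request config with one launch spec per type."""
    return {
        "TargetCapacity": target_capacity,
        "SpotPrice": f"{max_bid:.2f}",  # the price is passed as a string
        "IamFleetRole": iam_fleet_role_arn,
        "LaunchSpecifications": [
            {"ImageId": image_id, "InstanceType": instance_type}
            for instance_type in instance_types
        ],
    }


config = build_spot_fleet_config(
    target_capacity=5,
    max_bid=1.00,
    iam_fleet_role_arn="arn:aws:iam::123456789012:role/fleetRole",  # placeholder
    image_id="ami-12345678",  # placeholder
    instance_types=["m1.small", "m1.medium", "m1.large"],
)
print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and passed to `aws ec2 request-spot-fleet --spot-fleet-request-config file://...`.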

Page 21: AWS August Webinar Series -  EC2 Spot Instances - 08192015

EC2 Spot - Use Cases

Page 22: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Stateless Web/App Server Fleets

Hadoop Workloads

Continuous Integration (CI)

High Performance Computing (HPC)

Grid Computing

Media Rendering / Transcoding

Spot Use Cases

Page 23: AWS August Webinar Series -  EC2 Spot Instances - 08192015

EC2 Spot - Web Architecture

Page 24: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Considerations

High availability

Cost

Elasticity

Stateless Web tier

Parallelism

Page 25: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Stateless Web/App/API Architecture with Spot

[Diagram: Elastic Load Balancing in front of stateless web servers in one On-Demand Auto Scaling group and two Spot Auto Scaling groups, spread across Availability Zone A and Availability Zone B, with session state data stored outside the web tier.]

Page 26: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Web Application - Auto Scaling

Multiple Auto Scaling groups
• On-Demand instances for fallback.
• Multiple EC2 Spot instance Auto Scaling groups
• Each Spot Auto Scaling group using a different capacity pool (e.g. AZ, bid, instance size, instance type)

Auto Scaling groups behind the same Elastic Load Balancer.

Pick the right instance type for the job based on the price history.

Page 27: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Auto Scaling Policies

Aggressive scaling policies for Spot Auto Scaling groups (e.g. scale up at 75% CPU utilization and scale down at 25% CPU utilization, with a large capacity range)

More conservative scaling policies for On-Demand Auto Scaling groups.

Page 28: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Session state for the web application can be stored in DynamoDB.

• Data replicated across availability zones.

You can also choose other databases to maintain state in your architecture.

• Amazon RDS using Multi-AZ deployments
• Amazon ElastiCache

Where to store the state?

Page 29: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Spot termination considerations

Availability of Spot instances can vary based on supply and demand

Architect application to be resilient to instance termination

When the Spot price exceeds the price you named (i.e. the bid price), the instance will receive a two-minute warning that the instance will be terminated

Page 30: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Spot termination considerations

Check for the 2 minute Spot instance termination notification every 5 seconds, using a script invoked at instance launch. Upon notification:
• Place any session information into DynamoDB
• Use IAM roles so that the Spot instances can de-register themselves from the ELB upon termination notification

Page 31: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Since the Auto Scaling groups span across multiple availability zones, we highly recommend enabling cross-zone load balancing for the load balancer.

To allow in-flight requests to complete when de-registering Spot instances that are about to be terminated, connection draining can be enabled on the load balancer with a timeout of 90 seconds.

Elastic Load Balancing

Page 32: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Sample script

#!/bin/bash
# Poll the instance metadata service every 5 seconds for a Spot termination notice.
while true
do
  if curl -s http://169.254.169.254/latest/meta-data/spot/termination-time | \
      grep -q '.*T.*Z'; then
    instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    aws elb deregister-instances-from-load-balancer \
      --load-balancer-name my-load-balancer \
      --instances "$instance_id"
    /env/bin/flushsessiontoDBonterminationscript.sh
    break  # Notice handled; stop polling.
  else
    # Spot instance not yet marked for termination.
    sleep 5
  fi
done
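The same polling idea can be sketched in Python with the detection logic separated from the network call, which makes it easy to try without an EC2 instance. The metadata URL is the one used in the script above; the function names are invented, and the stricter timestamp pattern (for example 2015-08-19T12:34:56Z) is an assumption about the notice format.

```python
import re
import time
import urllib.request

TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"
# Assumed notice format: an ISO 8601 UTC timestamp such as 2015-08-19T12:34:56Z.
_TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


def is_termination_notice(body):
    """True if the metadata response body looks like a termination time."""
    return bool(_TIMESTAMP.match(body.strip()))


def fetch_termination_time(url=TERMINATION_URL, timeout=2):
    """Return the response body, or '' when the endpoint is missing or unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", "replace")
    except OSError:  # covers HTTP 404, connection errors and timeouts
        return ""


def wait_for_termination_notice(fetch=fetch_termination_time, interval=5,
                                sleep=time.sleep):
    """Poll until a notice appears, then return the termination time."""
    while True:
        body = fetch()
        if is_termination_notice(body):
            return body.strip()
        sleep(interval)


# Dry run with canned responses instead of the real metadata service.
canned = iter(["", "<html>404 - Not Found</html>", "2015-08-19T12:34:56Z\n"])
print(wait_for_termination_notice(fetch=lambda: next(canned), sleep=lambda s: None))
```

After `wait_for_termination_notice()` returns, the caller would deregister from the load balancer and flush session state, as the shell script does.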

Page 33: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Web Application Architecture with Spot

[Diagram: the same architecture as on Page 25, with Elastic Load Balancing in front of one On-Demand and two Spot Auto Scaling groups of stateless web servers across Availability Zones A and B, and separate session state data.]

Page 34: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Studyplus Case Study

Page 35: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Batch Processing with Amazon EC2 Spot

Page 36: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Batch-oriented applications can leverage on-demand processing using EC2 Spot to save up to 90% cost:
• Claims processing
• Large scale transformation
• Media processing
• Multi-part data processing work

You can also leverage EMR with Spot instances.

Batch Processing with Amazon EC2 Spot

Page 37: AWS August Webinar Series -  EC2 Spot Instances - 08192015

• Multi-part job processing architecture
• Auto Scaling groups to set up a heterogeneous, scalable “grid” of EC2 Spot instances with multiple capacity pools as worker nodes
• Use S3 to invoke AWS Lambda upon object upload
• Use SQS for decoupling
• DynamoDB for tracking job status
• Complete large batch processing tasks in parallel

Batch Processing with Amazon EC2 Spot

Page 38: AWS August Webinar Series -  EC2 Spot Instances - 08192015

About Lambda and SQS

AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you, making it easy to build applications that respond quickly to new information.

Amazon Simple Queue Service (SQS) is a fast, reliable, scalable, fully managed message queuing service to decouple components.

Depending on the application’s needs, multiple SQS queues might be required for functions and priorities.

Page 39: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Batch Processing with Amazon EC2 Spot

[Diagram: objects are uploaded into an input S3 bucket; a job SQS queue feeds an EC2 instance worker fleet (one On-Demand Auto Scaling group plus Spot Auto Scaling groups 1 and 2 across Availability Zones A and B) with EFS; job status is kept in DynamoDB; results go to an output S3 bucket.]

• Uploads to S3 will trigger a Lambda function to put jobs in SQS and DynamoDB
• Workers will check for jobs in the queue
• Workers will update job status (start time, SLA end time, etc.) in DynamoDB
• Auto Scaling groups will scale up based on queue depth and scale down based on CPU utilization CW metrics

Page 40: AWS August Webinar Series -  EC2 Spot Instances - 08192015

IAM Role for Lambda Policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1438283855455",
      "Action": [
        "dynamodb:PutItem"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:dynamodb:us-east-1::table/demojobtable"
    },
    {
      "Sid": "Stmt1438283929844",
      "Action": [
        "sqs:SendMessage"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:sqs:us-east-1::demojobqueue"
    }
  ]
}

Page 41: AWS August Webinar Series -  EC2 Spot Instances - 08192015

AWS Lambda function for SQS and DynamoDB updates

// dependencies
var AWS = require('aws-sdk');

// get reference to clients
var s3 = new AWS.S3();
var sqs = new AWS.SQS();
var dynamodb = new AWS.DynamoDB();

console.log('Loading function');

exports.handler = function(event, context) {
  // Read options from the event.
  var srcBucket = event.Records[0].s3.bucket.name;
  // Object key may have spaces or unicode non-ASCII characters.
  var srcKey = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, " "));

Page 42: AWS August Webinar Series -  EC2 Spot Instances - 08192015

AWS Lambda function for SQS and DynamoDB updates (continued)

  // prepare SQS message
  var params = {
    MessageBody: 'object ' + srcKey + ' ',
    QueueUrl: 'https://sqs.us-east-1.amazonaws.com//demojobqueue',
    DelaySeconds: 0
  };
  // send SQS message
  sqs.sendMessage(params, function (err, data) {
    if (err) { // an error occurred
      console.error('Unable to put object ' + srcKey + ' into SQS queue due to an error: ' + err);
      context.fail(srcKey, 'Unable to send message to SQS');
    } else {
      // define DynamoDB table variables
      var tableName = "demojobtable";
      var datetime = new Date().getTime().toString();

Page 43: AWS August Webinar Series -  EC2 Spot Instances - 08192015

AWS Lambda function for SQS and DynamoDB updates (continued)

      // Put item into DynamoDB table where srcKey is the hash key and datetime is the range key
      dynamodb.putItem({
        "TableName": tableName,
        "Item": {
          "srcKey": {"S": srcKey},
          "datetime": {"S": datetime}
        }
      }, function(err, data) {
        if (err) {
          console.error('Unable to put object ' + srcKey + ' into DynamoDB table due to an error: ' + err);
          context.fail(srcKey, 'Unable to put data to DynamoDB Table');
        } else {
          console.log('Successfully put object ' + srcKey + ' into SQS queue and DynamoDB');
          context.succeed(srcKey, 'Data put into SQS and DynamoDB');
        }
      });
    }
  });
};

Page 44: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Batch Processing with Amazon EC2 Spot

• Worker nodes get job parts from SQS and perform single tasks based on the job task state in DynamoDB

• Store the input objects in a file system such as Amazon Elastic File System (Amazon EFS), local instance store or Amazon Elastic Block Store (EBS)

• Each job can be further split into multiple sub-parts if there is a mechanism to stitch the outputs together

• Once completed, the objects will be uploaded back to S3 using multi-part upload.
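The worker behaviour above can be sketched with in-memory stand-ins: a `queue.Queue` in place of the SQS queue and a dictionary in place of the DynamoDB job table. The status labels and function names are invented for the example, and a real worker would call the AWS SDK instead.

```python
import queue


def run_worker(jobs, status_table, process):
    """Drain the job queue, recording each job's state as it changes."""
    while True:
        try:
            key = jobs.get_nowait()
        except queue.Empty:
            return  # nothing left to do; the instance can be scaled in
        status_table[key] = "IN_PROGRESS"
        try:
            process(key)
        except Exception:
            # Leave a trace so another worker or an operator can retry the job.
            status_table[key] = "FAILED"
        else:
            status_table[key] = "DONE"


def fake_transcode(key):
    """Stand-in for the real processing step; fails on one key on purpose."""
    if key.endswith(".bad"):
        raise RuntimeError("cannot process " + key)


jobs = queue.Queue()
for name in ("a.mp4", "b.mp4", "c.bad"):
    jobs.put(name)

status = {}
run_worker(jobs, status, fake_transcode)
print(status)
```

Because job state lives outside the worker, a Spot interruption loses at most the task in progress, which another worker can pick up again.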

Page 45: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Batch Processing with Amazon EC2 Spot

[Diagram: the same batch processing architecture as on Page 39, from the input S3 bucket through the job SQS queue and the On-Demand and Spot worker fleet to the output S3 bucket.]

Page 46: AWS August Webinar Series -  EC2 Spot Instances - 08192015

More automation?

Use a Lambda function to dynamically manage Auto Scaling groups based on the Spot market

• The Lambda function could periodically invoke the EC2 Spot APIs to assess market prices and availability and respond by creating new Auto Scaling launch configurations and groups automatically.

• This function could also delete any Spot Auto Scaling groups and launch configurations that have no instances.

AWS Data Pipeline can be used to invoke the Lambda function using the AWS CLI at regular intervals by scheduling pipelines
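The decision such a scheduled function might make can be sketched as pure logic, without any AWS calls. The rule used here (create a group for any pool priced at or below its bid, delete groups with no instances) is an assumption chosen for illustration; the function name and all data are invented.

```python
def plan_spot_groups(current_prices, bids, existing_groups):
    """Decide which Spot Auto Scaling groups to create and which to delete.

    current_prices: {pool: current Spot price}
    bids: {pool: maximum price we are willing to pay}
    existing_groups: {pool: number of instances currently in the group}
    """
    to_create = sorted(
        pool for pool, price in current_prices.items()
        if pool in bids and price <= bids[pool] and pool not in existing_groups
    )
    to_delete = sorted(
        pool for pool, count in existing_groups.items() if count == 0
    )
    return to_create, to_delete


current_prices = {
    ("c3.xlarge", "us-east-1a"): 0.04,
    ("c3.xlarge", "us-east-1b"): 0.30,
    ("c3.2xlarge", "us-east-1a"): 0.09,
}
bids = {pool: 0.105 for pool in current_prices}
existing_groups = {
    ("c3.2xlarge", "us-east-1a"): 4,
    ("c3.xlarge", "us-east-1b"): 0,
}

create, delete = plan_spot_groups(current_prices, bids, existing_groups)
print("create:", create)
print("delete:", delete)
```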

Page 47: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Automated Batch Architecture with Spot

[Diagram: objects are uploaded into an input S3 bucket; a job SQS queue feeds workers in one On-Demand Auto Scaling group and Spot Auto Scaling groups 1 and 2 across Availability Zones A and B, with EFS; job status is kept in DynamoDB; results go to an output S3 bucket.]

• Uploads to S3 will trigger a Lambda function to put jobs in DynamoDB and SQS
• Workers will check for jobs in the queue
• Workers will update job status (start time, SLA end time, etc.) in DynamoDB
• Auto Scaling groups will scale up based on queue depth and scale down based on CPU utilization CW metrics
• Data Pipeline can invoke a Lambda function in a scheduled manner which can manage Auto Scaling groups based on the Spot market

Page 48: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Further cost optimization with Trusted Advisor

Save money on AWS by eliminating unused and idle resources

Cost Optimization TA Checks:
• Amazon EC2 Reserved Instances Optimization
• Low Utilization Amazon EC2 Instances
• Idle Load Balancers
• Underutilized Amazon EBS Volumes
• Unassociated Elastic IP Addresses
• Amazon RDS Idle DB Instances

Page 49: AWS August Webinar Series -  EC2 Spot Instances - 08192015

AWS re:Invent 2015 – October 6-9

AWS re:Invent is the largest annual gathering of the global cloud community. Whether you are an existing customer or new to the cloud, AWS re:Invent will provide you with the knowledge and skills to refine your cloud strategy, improve developer productivity, increase application performance and security, and reduce infrastructure costs.

Though AWS re:Invent tickets are sold out, you can still register to view the Live Stream Broadcasts of the keynote addresses and select technical sessions on October 7 and October 8. Register now.

Details:

Wednesday, October 7
9:00am - 10:30am PT: Andrew Jassy, Sr. Vice President, AWS
11:00am - 5:15pm PT: 5 of the most popular breakout sessions (to be announced)

Thursday, October 8
9:00am - 10:30am PT: Dr. Werner Vogels, CTO, Amazon
11:00am - 6:15pm PT: 6 of the most popular breakout sessions (to be announced)

Register now for the Live Stream Broadcast by submitting your email where prompted on the AWS re:Invent home page.

Stay Connected: Follow event activities on Twitter @awsreinvent (#reinvent), or like us on Facebook.

Page 50: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Thank you!

Questions?

Page 51: AWS August Webinar Series -  EC2 Spot Instances - 08192015

What have customers done with Spot?

Some case studies..

Page 52: AWS August Webinar Series -  EC2 Spot Instances - 08192015

World’s Largest F500 Cloud Run
Transforming drive design to store the world’s data

• Workloads: new drive head design
• Run 1 million drive head designs = 70.75 core-years
• 90x throughput: ran in 8 hours, not 30 days
• 3 days from idea to running
• Cluster: 70,908 cores with Spot Instances, 729 TFLOPS
• c3, r3 with Intel E5-2670 v2
• Cost: $5,594 (Spot Instances)
• Submit jobs, orchestrate HPC clusters over VPC
• Encrypt, route data to AWS, return results
• EBS
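The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch assumes a 365-day year and that every core is busy for the whole run, so the derived values are rough estimates rather than numbers from the case study.

```python
HOURS_PER_YEAR = 365 * 24

core_years = 70.75
cores = 70_908
cost_usd = 5_594

core_hours = core_years * HOURS_PER_YEAR     # total compute consumed
wall_clock_hours = core_hours / cores        # if all cores run in parallel
cost_per_core_hour = cost_usd / core_hours
speedup = (30 * 24) / 8                      # 30 days versus 8 hours

print(f"core-hours: {core_hours:,.0f}")
print(f"ideal wall-clock hours: {wall_clock_hours:.2f}")
print(f"cost per core-hour: ${cost_per_core_hour:.4f}")
print(f"speedup: {speedup:.0f}x")
```

The ideal wall-clock time comes out at a little under 9 hours, consistent with the quoted 8-hour run, and 30 days against 8 hours gives the quoted 90x.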

Page 53: AWS August Webinar Series -  EC2 Spot Instances - 08192015

AWS Delivered Unheard-of Processing

39 years of science

10,600 AWS Instances

Saved equivalent of $40M infrastructure

10 Million compounds screened

39 drug design years in 11 hours for a cost of… $4,232

3 promising compounds identified

Page 54: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Scaling Hadoop Jobs with Spot
http://engineering.bloomreach.com/strategies-for-reducing-your-amazon-emr-costs/

Bloomreach launches 1,500 to 2,000 Amazon EMR clusters and runs 6,000 Hadoop jobs every day.

Page 55: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Continuous Integration & Testing with Spot

• Tapjoy - Premier Mobile Ad Network Across iOS & Android
• Global Network (435 Million Monthly Reach)
• Jenkins + Spot Instances
• https://github.com/bwall/ec2-plugin (thanks to an RIT senior project)
• Go wide during business hours, scale back in the evenings. Automatically kicks online at 06:00 ET
• Workers scale horizontally to support dozens of simultaneous regression tests spread out over dozens of workers
• Jenkins automatically guards against Spot termination

Page 56: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Queue-based media transcoding

Ooyala
• Video technology platform that serves ESPN, Bloomberg, ...
• Uses combo of OD/RI/Spot to ensure it can cover predicted volumes while keeping costs low
• http://aws.amazon.com/solutions/case-studies/ooyala/

Vevo
• Library of over 75,000 HD videos
• Must be able to rapidly transcode library to a new screen format
• Can spin up 100s of Spot instances to transcode entire library in a matter of days (instead of weeks)

Page 57: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Using Spot Fleet

An example..

Page 58: AWS August Webinar Series -  EC2 Spot Instances - 08192015

Using Spot Fleet

Create EC2 Spot Fleet IAM Role

Requesting a fleet:
• aws ec2 request-spot-fleet --spot-fleet-request-config file://mySmallFleet.json

Describe fleet:
• aws ec2 describe-spot-fleet-requests
• aws ec2 describe-spot-fleet-requests --spot-fleet-request-ids <sfr-………..>

Describe instances within the fleet:
• aws ec2 describe-spot-fleet-instances --spot-fleet-request-id <sfr-…………>

Cancel Spot Fleet (with termination):
• aws ec2 cancel-spot-fleet-requests --spot-fleet-request-ids <sfr-…………..> --terminate-instances

Page 59: AWS August Webinar Series -  EC2 Spot Instances - 08192015

mySpotFleet.json

{
  "TargetCapacity": 5,
  "SpotPrice": "1.00",
  "IamFleetRole": "arn:aws:iam::962872214910:role/fleetRole",
  "LaunchSpecifications": [
    {
      "ImageId": "ami-ff527ecf",
      "InstanceType": "m1.small"
    },
    {
      "ImageId": "ami-ff527ecf",
      "InstanceType": "m1.medium"
    },
    {
      "ImageId": "ami-ff527ecf",
      "InstanceType": "m1.large"
    }
  ]
}