aws re:invent 2016: running lean architectures: how to optimize for cost efficiency (arc313)

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Markus Ostertag, Head of Development, Team Internet AG, @osterjour

Constantin Gonzalez, Principal Solutions Architect, AWS, @zalez

November 30, 2016

Running Lean ArchitecturesHow to Optimize for Cost Efficiency

ARC313

What you’ll get out of this session

• Best practices on how to lower your AWS bill

• A more scalable, robust, dynamic architecture

• More time to innovate

• Real-world customer examples

• Easy to implement

Structure

Business

Architecture

Operations

Business Goals

“Pay as little as possible

for what we use.”

AWS pricing philosophy

Ecosystem

Global Footprint

New Features

New Services

More AWS

Usage

More

Infrastructure

Lower

Infrastructure

Costs

Reduced

Prices

More

CustomersInfrastructure

Innovation

57 price reductions

since 2006

Economies

of Scale

AWS TCO calculator

AWS simple monthly calculator

AWS billing alerts

AWS billing console

AWS Trusted Advisoraws.amazon.com/premiumsupport/trustedadvisor/

Free with Business or Enterprise Support

Free Trusted Advisor Trial!

• Free trial begins on 12/6/16

• Runs for 30 days

• Full suite of checks and best practice

recommendations available

• For customers not already on

business/enterprise support plans

• No action required:

Just log in and start using!

https://console.aws.amazon.com/trustedadvisor

Reserved Instances

Reserved Instances basics

1y RI

Break even

3y RI

Break even

Markus OstertagHead of Development @ Team Internet

Who is Team Internet?

Domain monetization business

30 people

HQ in Munich, Germany

Tech focused

Reserved Instances

Reserved Instances – our usage

Use the regional benefit

Change existing Reserved Instances to scope “Region”

Capacity reservation decoupled from cost optimization

Convertible Reserved Instances

Use case: “(Very) Long running, but flexible”

Architecture Goals

“Avoid waste as much

as possible.”

Turn off unused instances

• Developer, test, training instances

• Use simple instance start and stop

• Or tear down and build up all together

using AWS CloudFormation

• Instances are disposable!

Customer example

Monday Friday End of Vacation Season35% saved

Automate, automate, automate

• AWS SDKs

• AWS CLI

• AWS CloudFormation

• AWS OpsWorks

• Netflix Janitor Monkey

• Cloudlytics EC2 Scheduler

• Auto Scaling

How Auto Scaling works

AWS CloudFormation example launch configuration

"LaunchConfig": {

"Type" : "AWS::AutoScaling::LaunchConfiguration",

"Metadata" : {

"AWS::CloudFormation::Init" : {

"config" : {

… packages, sources, files, services …

}

}

},

"Properties": {

"ImageId" : "ami-149f7863",

"InstanceType" : "m1.small",

"SecurityGroups" : [ {"Ref" : "WebServerSecurityGroup"} ],

"KeyName" : "MySSHKey",

"UserData" : { "Fn::Base64" : { "Fn::Join" : ["", [

"#!/bin/bash -v\n",

… your user data script …

]]}}

}

}

AWS CloudFormation exampleAuto Scaling group definition

"WebServerGroup" : {

"Type" : "AWS::AutoScaling::AutoScalingGroup",

"Properties" : {

"AvailabilityZones" : [

"us-east-1a","us-east-1b","us-east-1c",

],

"LaunchConfigurationName" : { "Ref" : "LaunchConfig" },

"MinSize" : “3",

"MaxSize" : “6",

"DesiredCapacity" : “3",

"LoadBalancerNames" : [ { "Ref" : "ElasticLoadBalancer" } ]

}

}

Align resources with demand

Use Spot Instances

• Price based on supply/demand

• You choose your maximum price/hour

• Your instance is started if the Spot price is lower

• Your instance is terminated if

the Spot price is higher, with 2 minutes notice

• But: You did plan for fault tolerance, didn’t you?

Spot Instance example

On-Demand:

$0.24

$0.028 (11.7%) $0.026 (10,8%)

$3.28

(1367%)

Spot Instance use cases

• Stateless Web/App server fleets

• Amazon EMR

• Continuous Integration (CI)

• High Performance Computing (HPC)

• Grid Computing

• Media Rendering/Transcoding

aws.amazon.com/ec2/spot

Spot Bid Advisor

Spot Instances recap

• Very dynamic pricing

• Opportunity to save 80-90% cost• But there are risks

• Different prices per AZ

• Leverage Auto Scaling!• One group with Spot Instances

• One group with On-Demand

• Get the best of both worlds

• Spot fleets – Manage thousands of

Spot Instances with one API call

“But my applications are

too small

for Auto Scaling!”

Amazon EC2 Container Service

• Easily manage Docker containers

• Flexible container placement

• Designed for use with other AWS services

• Extensible

• Performance at Scale

• Secure

10%

15%

7%

12%

20%

9%

Consolidate with Amazon ECS

App 1 App 2

App 3 App 4

App 5 App 6

6

12 345

Amazon ECS

cluster

AWS Lambda

Amazon S3 bucket events AWS Lambda

Original object Compressed object

1

2

3

Get rid of idle time with AWS Lambda

• Automatic scaling

• Automatic provisioning

• No need to manage infrastructure

• Just bring your code

• $0.20 per million requests, 1M free

• 100 ms payment granularity

• Never pay for idle

Less than 40% utilization?

Consider using AWS Lambda instead!

Optimizing database utilization

Database optimization through caching

DynamoDB Amazon RDS DynamoDB Amazon RDS

ElastiCache

Caching saves money

DynamoDB reads

Saved 3k reads per second (>20k reads per second in total)

Cache really everything!

Cache hit ratio 25-30% Cache hit ratio 89-95%

Without negative caching With negative caching

Hit

Miss

Cache invalidation

“There are only two hard things in Computer Science:

cache invalidation and naming things” -- Phil Karlton

Two ways to cache:

Time to Live (TTL) with invalidation

Keep the cache in sync all the time

Synchronous writes

Sync/check in the app with after-write return values

DynamoDB

ElastiCache

write/update always

write/update always

Uncoupled writes/invalidation

Sync/updates via Lambda (uncoupled)

DynamoDB

ElastiCache

write/update always

no update

LambdaDynamoDB

Stream

update

Think of strategies for optimizing CU use

• Use multiple tables to support varied access patterns

• Understand access patterns for time series data

• Compress large attribute values

Consider read (4K) vs. write (1K) sizes

Use Amazon SQS to buffer over-capacity writes

Resize capacity units dynamically

Amazon SQS can buffer requests

Dynamic DynamoDB

Offload popular traffic to

Amazon S3 and/or Amazon CloudFront

Operational Goals

“Focus on what you do best,

let AWS do the rest.”

Leverage existing services

• Use Amazon RDS, DynamoDB,

ElastiCache for Redis or

Amazon Redshift

• Instead of running your own database

• Amazon Elasticsearch Service• Instead of running your own cluster

• Amazon SQS

• Amazon Kinesis,

Amazon Kinesis Firehose, Amazon SNS, and more …

AWS has experts for each service

RDSAmazon Redshift

Amazon

Elasticsearch

Service

Amazon KinesisSQS

DynamoDB

Pick the right tool for the job

Key/Value

Scalable

throughput

Low latency

Amazon Aurora

More complex

data/queries

Scalable

storage

Amazon

Redshift

Big (complex)

data

Higher

latency

ElastiCache

for Redis

Key/Value

In-Memory

(Very) low

latency

There is not one database to rule them all

MongoDB

Tracking

API

RTB

Engine

User&Stats

API

Tracking

API

RTB

Engine

DynamoDB

Decoupled

Amazon

Aurora

Amazon

Redshift

User&Stats

API

Advantages

• No undifferentiated heavy lifting

(= cost savings)

• AWS operates the DB

infrastructure for us

• Simple and more granular scale out

• No interference between

RTB/Tracking and User/Stats

Let’s recap

1. Use AWS TCO/cost/billing tools

2. Use Reserved Instances

3. Avoid idle instances through automation

4. Use Spot Instances

5. Optimize database utilization

6. Pick the right tool for the job

7. Offload your architecture

But wait! There’s even more!

https://youtu.be/SG1DsYgeGEk

AWS re:Invent 2015

ARC302 Running Lean Architectures: Optimizing for Cost Efficiency

• DynamoDB GSI Tweaking

• Using CloudFront to avoid

multi-region setups

• Amazon S3 storage tiering

• … and much more!

Thank you!

Remember to complete

your evaluations!

aws re:invent 2016: running lean architectures: how to optimize for cost efficiency (arc313)

Technology