design for failure : architecture blueprints for achieving high availability in aws

“Design for Failure”

Blueprints for Achieving High Availability in AWS

Harish Ganesan

Co-founder & CTO

8KMiles

www.twitter.com/harish11g

http://www.linkedin.com/in/harishganesan

Design for Failure – recent AWS outage

• "Anything that can go wrong will go wrong." (Murphy's law)

• O'Reilly Media's George Reese: "You either deemed an outage of this nature an acceptable risk or you failed to design for Amazon's cloud computing model."

URL: http://www.eweek.com/c/a/Cloud-Computing/Final-Thoughts-on-the-FiveDay-AWS-Outage-236462/

• "Yesterday's AWS outage is a good opportunity to take a step back and be realistic about the approach to cloud." (CloudAve)

URL: http://www.cloudave.com/11886/some-lessons-from-aws-outage/

• "Design for failure and nothing will really fail." (Werner Vogels, CTO, Amazon)

Why this presentation?

• Many developers have little awareness of how to design highly available applications on AWS

• In AWS infrastructure, "the application is partly responsible for its own availability and failover"

• Help startups and companies that are new to AWS with some blueprints

• Provide architectural insights for building high availability

Sample Use Case

• Multi-tiered LAMP/LAMJ website on AWS

• The database tier uses MySQL / AWS RDS

• A transparent switch between the main site and the High Availability (HA) site is needed within seconds during an outage

• Achieving high availability, failover and scalability are the top priorities in this use case

Blueprint Objectives

• "Design for failure and nothing will really fail"

• Avoid single points of failure (SPOF)

• Apply the "AWS Best Practices Framework" while architecting the infrastructure

• No compromise on scalability

• The website should be available at all times

• Infrastructure should align closely with load requirements

• Cost of infrastructure vs. price of unhappy customers

List of AWS Services Used in This Use Case

• AWS Security Groups

• AWS Elastic Load Balancing

• AWS Auto Scaling

• AWS EC2 (Elastic Compute Cloud)

• AWS EBS (Elastic Block Store)

• AWS CloudWatch

• AWS Elastic IP

• AWS S3 (Simple Storage Service)

HA Architecture Blueprints on AWS

• Blueprint 1: How to achieve high availability across AWS regions?

• Blueprint 2: How to achieve high availability across AWS Availability Zones (AZs)?

– Note: According to AWS, "By launching instances in separate Availability Zones, you can protect your applications from failure of a single location." However, the current outage spanned multiple AZs in USA East. Hopefully AWS will fix these problems soon, once and for all.

Blueprint 1: High Availability across AWS Regions

High Availability across AWS Regions

The main website is hosted in the AWS USA East and USA West regions

Blueprint 1: HA across regions in AWS

[Diagram: main website in AWS region 1 (AWS USA East region) and main website in AWS region 2 (AWS West/Europe/APAC region)]

Blueprint 1: HA across AWS regions

• Leverages AWS inter-region application hosting

• The website is hosted in multiple AWS regions (for example USA East and West, or USA and EU)

• Geographic traffic distribution and HA across continents are possible with this blueprint

• Suitable for companies that demand a high level of scalability, load balancing and availability across the globe

• This blueprint can currently be applied to all 5 AWS regions (USA East, USA West, EU, APAC Singapore and APAC Tokyo)

Blueprint 1: Website in Multiple AWS Regions (using MySQL)

[Diagram: Dynamic/Managed/Directional DNS servers in front of two sites. Main site in AWS USA East region: AWS Elastic Load Balancer, Auto Scaling groups in USA-East-1A and USA-East-1C, MySQL masters with M-M replication, S3, CloudWatch. Main site in AWS USA West region: USA-WEST-1A and USA-WEST-1B, MySQL master, S3, CloudWatch.]

1. Directional DNS servers direct user requests to the main site in the AWS USA East region. In case of an outage in the USA East region, the web requests are directed to the same website hosted in the USA West region.

2. AWS ELB balances the requests between the Auto Scaled EC2 instances launched in multiple AZs inside the East region.

3. MySQL is launched in multiple AZs inside the East region in master-master (M-M) replication mode.

4. MySQL master-master replication is set up between the USA East and West regions.
Blueprint 1: Website in Multiple AWS Regions (using RDS)

[Diagram: Dynamic/Managed/Directional DNS servers in front of two sites. Main site in AWS USA East region: Auto Scaling groups in USA-East-1A and USA-East-1C, RDS master and RDS standby, S3, CloudWatch. Main site in AWS USA West region: USA-WEST-1A and USA-WEST-1B, RDS master and RDS standby, S3, CloudWatch. Programmatic replication between the two regions.]

1. Directional DNS servers direct user requests to the main site in the AWS USA East region. In case of an outage in the USA East region, the web requests are directed to the same website hosted in the USA West region.

2. AWS ELB balances the requests between the Auto Scaled EC2 instances launched in multiple AZs inside the East region.

3. MySQL (RDS) is launched in multiple AZs inside the East region in HA replication mode.

4. Replication between the RDS masters in the USA East and West regions is done programmatically.

Blueprint 1: Information Flow

1. User traffic hits the managed/directional DNS server

2. The directional DNS server routes the load to the nearest available AWS region

3. AWS ELB load balances the requests between the Auto Scaled Web/App instances

4. Web/App instances perform transactions/queries against the MySQL DB / RDS

5. MySQL DB/RDS is configured for replication both across and inside AWS regions

High Availability @ DNS Layer

• The managed DNS server provides automatic failover at the DNS level in case of an outage at the primary website location

• Transparent switch between the websites hosted in AWS East and AWS West in under 60 seconds during an outage

• Automatic traffic diversion to the nearest site location

• Managed/directional DNS servers should be globally distributed (no SPOF)
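The sub-60-second switch depends on two things: how quickly the DNS provider's health checks detect the failed site, and how long resolvers keep caching the old answer. The sketch below models that logic under stated assumptions: a hypothetical managed DNS service that probes each region every `interval` seconds, marks a region down after `threshold` consecutive failed probes, and answers with a TTL of `ttl` seconds. The region labels and numbers are illustrative, not settings from any particular DNS product.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    region: str                   # hypothetical region label, e.g. "us-east-1"
    priority: int                 # lower value = preferred site
    consecutive_failures: int = 0

    def is_healthy(self, threshold: int) -> bool:
        return self.consecutive_failures < threshold


def record_probe(ep: Endpoint, ok: bool) -> None:
    """Apply one health-check result: success resets the counter, failure increments it."""
    ep.consecutive_failures = 0 if ok else ep.consecutive_failures + 1


def resolve(endpoints: list[Endpoint], threshold: int) -> str:
    """Return the region the DNS answer should point to: the preferred healthy one."""
    healthy = [ep for ep in endpoints if ep.is_healthy(threshold)]
    if not healthy:
        raise RuntimeError("no healthy region available")
    return min(healthy, key=lambda ep: ep.priority).region


def worst_case_failover_seconds(interval: int, threshold: int, ttl: int) -> int:
    """Upper bound on the time until clients reach the backup site.

    Detection takes at most `threshold` probe intervals after the failure;
    resolvers may then keep serving the cached (old) answer for up to one TTL.
    """
    return interval * threshold + ttl
```

With 10-second probes, a threshold of 3 and a 20-second TTL, `worst_case_failover_seconds(10, 3, 20)` gives 50 seconds, which fits the 60-second target; a longer TTL would quickly push the switch past it.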

High Availability @ Load Balancing Layer

• AWS ELB provides a load balancing service backed by a large fleet of EC2 servers

• AWS ELB avoids a single point of failure at the load balancing layer

• ELB scales automatically with traffic and can handle thousands of concurrent requests with ease
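To make "no single point of failure at the load balancing layer" concrete, here is a toy model of what a load balancer does with its back end: rotate requests across the registered instances and skip any instance that has failed its health check. This is an illustrative sketch only, not ELB's actual algorithm or API; the instance names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Toy load balancer: rotates over registered instances, skipping unhealthy ones."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.unhealthy = set()
        self._cursor = itertools.cycle(range(len(self.instances)))

    def mark(self, instance, healthy):
        """Record a health-check result for one instance."""
        if healthy:
            self.unhealthy.discard(instance)
        else:
            self.unhealthy.add(instance)

    def next_instance(self):
        # Try each position at most once per call so a fully failed pool
        # raises an error instead of looping forever.
        for _ in range(len(self.instances)):
            candidate = self.instances[next(self._cursor)]
            if candidate not in self.unhealthy:
                return candidate
        raise RuntimeError("no healthy instances registered")
```

Once an instance is marked unhealthy, its share of the traffic simply flows to the remaining instances, which is what keeps a single failed web/app server from becoming an outage.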

High Availability @ Web/App Layer

• Amazon EC2 instances that host the Web/App layers are launched in multiple AZs inside a region

• Multi-AZ deployments avoid a single point of failure at the zone level

• Amazon EC2 instances are configured with Auto Scaling in each AZ inside a region

• AWS Auto Scaling ensures that a minimum number of Web/App EC2 instances is always running
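The minimum-instance guarantee behaves like a reconciliation loop: compare what is running and healthy against the configured minimum, and launch replacements for the difference. The sketch below shows one pass of such a loop under simplifying assumptions; `launch_instance` is a hypothetical stand-in for a real launch call, and in practice the health flags would come from EC2 or ELB health checks.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


_next_id = itertools.count(1)


def launch_instance() -> Instance:
    """Hypothetical stand-in for a real launch call; returns a fresh healthy instance."""
    return Instance(instance_id=f"i-new-{next(_next_id)}")


def enforce_minimum(group: list[Instance], min_size: int) -> list[Instance]:
    """One reconciliation pass: drop unhealthy instances, then launch
    replacements until the group has at least min_size instances."""
    survivors = [inst for inst in group if inst.healthy]
    while len(survivors) < min_size:
        survivors.append(launch_instance())
    return survivors
```

Running this pass repeatedly (Auto Scaling does the equivalent continuously) means a crashed web/app server is replaced without an operator stepping in.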

High Availability @ Database Layer (RDS)

• AWS RDS (MySQL) configured with the HA (Multi-AZ) feature has no single point of failure (SPOF)

• The HA feature keeps a standby RDS instance running in another AZ

• When the master RDS fails, the standby automatically takes over the DB traffic in under 180 seconds

• Read replicas should be launched in multiple AZs to avoid a SPOF

• Programmatic replication of data between RDS instances in different AWS regions is needed
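The slides do not say how the cross-region copy between RDS instances is done. One common programmatic approach, assumed here, is incremental copying keyed on a last-modified column: read rows changed since the last run from the primary region and upsert them into the other region. In this sketch, two in-memory SQLite databases stand in for the two RDS instances, and the `orders` table with a monotonically increasing `updated_at` column is hypothetical. It does not handle deletes, clock skew, or rows committed out of `updated_at` order.

```python
import sqlite3

# Hypothetical table; every replicated table needs a primary key and a
# monotonically increasing last-modified column for this approach to work.
SCHEMA = "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, item TEXT, updated_at INTEGER)"


def replicate_once(source: sqlite3.Connection, target: sqlite3.Connection, watermark: int) -> int:
    """Copy rows changed after `watermark` from source to target; return the new watermark."""
    rows = source.execute(
        "SELECT id, item, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    with target:  # commits on success, rolls back on error
        target.executemany(
            "INSERT OR REPLACE INTO orders (id, item, updated_at) VALUES (?, ?, ?)",
            rows,
        )
    return max((row[2] for row in rows), default=watermark)
```

Run `replicate_once` periodically and persist the returned watermark. Because the copy is asynchronous, the secondary region can lag the primary by up to one polling interval.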

High Availability @ Database (Non-RDS)

• Master-master MySQL is configured in two different AZs (offering HA inside a region)

• Configure asynchronous read slaves in multiple AZs

• Master-master replication is configured between MySQL servers in different regions (HA across regions)

• Elastic IP and health-check-based promotion within 60 seconds during a failure

• Asynchronous data replication
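When two MySQL masters both accept writes, their AUTO_INCREMENT keys can collide. MySQL's `auto_increment_increment` and `auto_increment_offset` server variables avoid this by giving each master its own interleaved key sequence. The sketch below renders the relevant option-file lines for each master and checks that the generated key sequences do not overlap. It assumes server IDs are numbered 1 to N so they can double as offsets; the `log-bin` base name is illustrative.

```python
def mysql_master_settings(server_id: int, total_masters: int) -> str:
    """Render [mysqld] options for one master in an N-master setup.

    Assumes server IDs run from 1 to total_masters, so each ID is also a
    valid auto_increment_offset (the offset must not exceed the increment).
    """
    if not 1 <= server_id <= total_masters:
        raise ValueError("server_id must be between 1 and total_masters")
    return "\n".join([
        "[mysqld]",
        f"server-id = {server_id}",
        "log-bin = mysql-bin",
        f"auto_increment_increment = {total_masters}",
        f"auto_increment_offset = {server_id}",
    ])


def generated_ids(offset: int, increment: int, count: int) -> list[int]:
    """Keys a master hands out on an empty table: offset, offset + increment, ..."""
    return [offset + i * increment for i in range(count)]
```

For a master-master pair, one master gets offset 1 (keys 1, 3, 5, ...) and the other offset 2 (keys 2, 4, 6, ...), so rows inserted on either side can replicate to the other without primary-key conflicts.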

High Availability @ Storage Layer

• S3 should be used for storing uploaded data files and snapshots

• Companies can store all their media assets in S3 with proper access controls

Blueprint 1: Positives

• Scalable and highly available architecture

• Inter-regional high availability in AWS

• In the event of a failure in the USA East region, traffic can be directed to USA West in seconds

• The website deployed in both regions can scale out and shrink according to load

• Cost effective for large server farm deployments

• Low latency achieved through directional traffic routing

• No customers are lost because of load or availability problems. Ops are happy!

Blueprint 1: Negatives

• Complete dependency on the AWS cloud

• Technically complex and intricate setup

• Costlier to build and operate (sophistication comes at a cost)

• No unified infrastructure management currently exists for this architecture

– Example: Directional DNS and AWS are managed from two separate consoles

Blueprint 2: High Availability across AWS AZs

Blueprint 2: High Availability across AWS AZs

The website is hosted in the AWS USA East region, spread across 2 AZs (1A and 1B)

Blueprint 2: HA across AWS AZs

[Diagram: main website in USA-EAST-1A and main website in USA-EAST-1B, both inside the AWS USA East region]

Blueprint 2: HA across AWS AZs

• Leverages AWS inter-AZ application hosting

• The main site is spread across multiple AZs (1A and 1B) inside a single AWS region (USA East)

• Suitable for companies that demand scalability, load balancing and high availability

Blueprint 2: HA inside an AWS Region

[Diagram: AWS ELB in front of Auto Scaling groups in USA-EAST-1A and USA-EAST-1B protected by AWS security groups; RDS master and RDS standby in different AZs; read replicas in multiple AZs; S3; CloudWatch monitoring]

1. Web requests are sent to the Amazon ELB.

2. AWS ELB forwards the requests to Amazon EC2 instances launched in multiple AZs using round robin with sticky sessions.

3. The Web/App tier hosted on Amazon EC2 transacts with the RDS master and reads from the read replicas.

4. The RDS standby is launched in a different AZ from the RDS master for HA. Read replicas should be launched in multiple AZs for HA.

5. AWS CloudWatch monitors this entire multi-AZ infrastructure deployed on the AWS cloud.
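Step 2 mentions round robin with sticky sessions. The toy model below shows the idea: a new client is assigned an instance by round robin and given a cookie naming that instance; later requests carrying the cookie go back to the same instance as long as it is healthy, and are reassigned otherwise. The cookie format and instance names are hypothetical; this is not how ELB actually encodes its stickiness cookie.

```python
import itertools


class StickyBalancer:
    """Toy model of round robin with sticky sessions (not ELB's actual implementation)."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._rr = itertools.cycle(self.instances)

    def _round_robin(self):
        # Visit each instance at most once per call, skipping unhealthy ones.
        for _ in range(len(self.instances)):
            candidate = next(self._rr)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances")

    def route(self, session_cookie=None):
        """Return (instance, cookie). Reuse the cookie's instance while it is healthy."""
        if session_cookie in self.healthy:
            return session_cookie, session_cookie
        target = self._round_robin()
        return target, target  # the hypothetical cookie simply names the instance
```

When a sticky instance fails, its in-memory session data is lost, so session state that must survive an AZ outage is better kept outside the web/app instances.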

Blueprint 2: Information Flow

1. User traffic hits AWS ELB

2. AWS ELB load balances the requests between the Auto Scaled Web/App instances

3. Web/App instances perform transactions/queries against the MySQL DB / RDS

4. MySQL RDS is configured in master / HA standby / read replica mode across AZs inside an AWS region
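Writing to the RDS master while reading from read replicas implies read/write splitting in the application. A minimal sketch, assuming the leading SQL verb is enough to classify a statement; the endpoint names are hypothetical. Because read replicas are updated asynchronously, reads that must see a just-committed write (and statements such as `SELECT ... FOR UPDATE`) should still be sent to the master.

```python
import itertools


class ReadWriteRouter:
    """Route writes to the RDS master and spread reads across read replicas (round robin)."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master: str, replicas: list[str]):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        # Writes, and every statement when no replica exists, go to the master.
        if verb in self.WRITE_VERBS or self._replicas is None:
            return self.master
        return next(self._replicas)
```

Spreading the replicas across both AZs means read traffic keeps flowing from the surviving AZ if one replica's zone goes down.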

Blueprint 2: High Availability

• Load balancing: AWS ELB provides a load balancing service with no single point of failure

• AWS ELB switches traffic across AZs in the event of a failure

• Web/App layers: Amazon EC2 instances that host the Web/App layers are launched in multiple AZs inside a region

• AWS Auto Scaling ensures that a minimum number of Web/App EC2 instances is always running in both AZs (balancing algorithm)
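The "balancing algorithm" idea can be sketched as: when instances must be launched, place each one in the available AZ that currently runs the fewest, and ignore an AZ that is down so its lost capacity is rebuilt in the healthy AZ. This is a simplified model of the behavior described above, not Auto Scaling's exact placement logic; AZ names follow the `us-east-1a` style and the counts are illustrative.

```python
def plan_launches(running: dict[str, int], available: set[str], desired: int) -> dict[str, int]:
    """Decide how many instances to launch per AZ to reach `desired`.

    Instances in AZs that are not `available` are treated as lost. Each new
    instance goes to the available AZ with the fewest instances (ties broken
    by AZ name so the result is deterministic).
    """
    counts = {az: n for az, n in running.items() if az in available}
    for az in available:
        counts.setdefault(az, 0)
    if not counts:
        raise RuntimeError("no Availability Zone available")
    launches = {az: 0 for az in counts}
    while sum(counts.values()) < desired:
        az = min(counts, key=lambda z: (counts[z], z))
        counts[az] += 1
        launches[az] += 1
    return launches
```

For example, with 2 instances in each of `us-east-1a` and `us-east-1b` and a desired size of 4, losing `us-east-1a` yields a plan of 2 new instances in `us-east-1b`.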

• Even if there is an outage in one AZ, the Web/App servers in the other AZs remain active

• Multi-AZ deployments avoid a single point of failure at the Web/App layer

• Database layer: AWS RDS (MySQL) is configured in HA standby mode

• The AWS RDS master and HA standby reside in different AZs

• Read replicas should be configured in multiple AZs to avoid a SPOF

• Read replicas offload read-only queries from the master, improving performance

• Multi-AZ deployments of AWS RDS with a standby avoid a single point of failure at the DB layer

• S3 should be used for storing uploaded data files and snapshots

• Companies can store all their media assets in S3 with proper access controls
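When the RDS master fails over to its standby, existing database connections drop and the application must reconnect until the standby has taken over (the slides quote under 180 seconds for this). A minimal retry sketch with capped exponential backoff and an overall deadline; `connect` is whatever function opens a connection in your driver, and catching `ConnectionError` is an assumption, since real MySQL drivers raise their own exception classes that you would catch instead. `sleep` and `clock` are injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    deadline_s: float = 180.0,
    base_delay_s: float = 1.0,
    max_delay_s: float = 16.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Retry `connect` with capped exponential backoff until it succeeds.

    Re-raises the last error if the next wait would push past `deadline_s`.
    """
    start = clock()
    delay = base_delay_s
    while True:
        try:
            return connect()
        except ConnectionError:
            if clock() - start + delay > deadline_s:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay_s)
```

Capping the backoff keeps retries frequent enough to reconnect soon after the standby is promoted, while the deadline stops a request from hanging indefinitely if the failover never completes.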


Blueprint 2: Positives

• Scalable and highly available architecture

• Leverages inter-AZ high availability in AWS

• No SPOF at the AZ level

• An outage in a single AWS AZ will not impact the website

• Cost effective compared to the AWS regional HA architecture (Blueprint 1)

• Auto Scaling and read replicas are distributed across both AZs inside an AWS region

Blueprint 2: Negatives

• An AWS regional outage will impact website operations

• Complete dependency on a single AWS region

• Geographic traffic diversion is not possible

• Complete dependency on the AWS cloud

How do I leverage High Availability architecture on AWS?

Cloud Architecture Consulting

Cloud Migration & Implementation

Cloud Application Development

Leave it to the experts; we will handle this

Cloud Adoption Strategy

“Let's get the job done”

Why 8KMiles?

• Amazon Systems Integrator and Solution Developer

• Migrated 350+ servers for start-ups, small businesses and enterprises

• Prior expertise in architecting HA solutions on AWS

• In-depth understanding of cloud infrastructure services

8KMiles in Media

Contact Us

"All you need is an idea and the cloud will execute it for you." (Structure 2010 event) - Dr Werner Vogels, CTO of Amazon, on 8KMiles

For more details on how 8KMiles Cloud Consulting can help your business, contact

[email protected]

http://cloud.8kmiles.com

http://cloudblog.8kmiles.com

http://www.8kmiles.com

http://www.livestream.com/gigaomtv/video?clipId=pla_0aa31b29-9dd0-44a9-9a13-f2958bb81cec&utm_source=lslibrary&utm_medium=ui-thumb
