Design for Failure: Architecture Blueprints for Achieving High Availability in AWS
DESCRIPTION
Every cloud expert will acknowledge that the mantra for a successful architecture in the AWS cloud is the "Design for Failure" methodology, yet we observe that many companies did not adhere to that rule in the recent outage. The reasons could range from limited technical awareness of how to configure high availability to the cost of operating a complex global HA setup in AWS. In this article, we share some of our prior experience architecting high availability systems on AWS as blueprints. We feel this small gesture will create HA awareness and help the strong AWS user community build better solutions in the future.
The sample blueprints are:
Blueprint 1: How to achieve High Availability across AWS Regions?
Blueprint 2: How to achieve High Availability across AWS Availability Zones (AZs)?
TRANSCRIPT
“Design for Failure”
Blueprints for Achieving High Availability in AWS
Harish Ganesan
Co-founder & CTO
8KMiles
www.twitter.com/harish11g
http://www.linkedin.com/in/harishganesan
• “Anything that can go wrong will go wrong” - Murphy's law
• “You either deemed an outage of this nature an acceptable risk or you failed to design for Amazon's cloud computing model” - George Reese, O'Reilly Media
URL: http://www.eweek.com/c/a/Cloud-Computing/Final-Thoughts-on-the-FiveDay-AWS-Outage-236462/
• “Yesterday’s AWS outage is a good opportunity to take a step back and be realistic about the approach to cloud” - CloudAve
URL: http://www.cloudave.com/11886/some-lessons-from-aws-outage/
• “Design for Failure and nothing will really fail” - Werner Vogels, CTO, Amazon
Design for Failure – recent AWS outage
• Very little awareness among developers of how to design highly available applications in AWS
• In AWS infrastructure, the “application is partly responsible for its availability and failover”
• Help startups and companies that are new to AWS with some blueprints
• Provide some architectural insights for building High Availability
Why this presentation?
• Multi-tiered LAMP/LAMJ website on AWS
• Database tier uses MySQL / AWS RDS
• Needs a transparent switch between the Main and High Availability (HA) sites within seconds during an outage
• Achieving High Availability, Failover and Scalability are the top priorities in this use case
Sample Use Case
• “Design for Failure and nothing will really fail“
• Avoid Single Point Of Failure (SPOF)
• Apply “AWS Best Practices Framework” while architecting the Infrastructure
• No compromise on Scalability
• Website should be available at all times
• Infrastructure should align closely with load requirements
• Cost of infrastructure (vs) price of unhappy customers
Blueprint Objectives
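The trade-off between infrastructure cost and unhappy customers can be made concrete with some simple availability arithmetic. The sketch below is illustrative only: the 99% figure is a hypothetical number, not an AWS statistic, and the parallel formula assumes the redundant copies fail independently, which a region-wide outage does not respect.

```python
HOURS_PER_YEAR = 24 * 365


def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant sites when any one of them can serve traffic.

    Assumes the copies fail independently of each other.
    """
    return 1.0 - (1.0 - single) ** copies


def serial_availability(*tiers: float) -> float:
    """Availability of a chain of tiers that must all be up at the same time."""
    result = 1.0
    for tier in tiers:
        result *= tier
    return result


def downtime_hours_per_year(availability: float) -> float:
    """Expected yearly downtime in hours for a given availability fraction."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    single_site = 0.99  # hypothetical figure, not an AWS number
    two_sites = parallel_availability(single_site, 2)
    print(f"one site : {downtime_hours_per_year(single_site):.1f} hours down per year")
    print(f"two sites: {downtime_hours_per_year(two_sites):.2f} hours down per year")
```

The serial helper shows the other side of the argument: every tier that lacks a redundant copy (a single point of failure) multiplies the overall availability down.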
• AWS Security Groups
• AWS Elastic Load Balancing
• AWS Auto Scaling
• AWS EC2 (Elastic Compute Cloud)
• AWS EBS (Elastic Block Store)
• AWS CloudWatch
• AWS Elastic IP
• AWS S3 (Simple Storage Service)
List of AWS Services Used in this Use Case
HA Architecture Blueprints on AWS
• Blueprint 1: How to achieve High Availability across AWS regions?
• Blueprint 2: How to achieve High Availability across AWS Availability Zones (AZs)?
– Note: As per AWS, “By launching instances in separate Availability Zones, you can protect your applications from failure of a single location”, but the current outage spanned multiple AZs in USA East. Hopefully AWS will rectify these problems very soon, once and for all.
Blue Print 1 : HA across regions in AWS
[Diagram: main website in AWS region 1 (AWS USA East Region) and main website in AWS region 2 (AWS West/Europe/APAC Region)]
• Leverages AWS inter-region application hosting
• Website is hosted in multiple AWS regions (for example USA East and USA West, or USA and EU)
• Geographic traffic distribution and HA across continents are possible in this architecture blueprint
• Suitable for companies that demand a high level of scalability, load balancing and availability across the globe
• This blueprint can currently be applied to all 5 AWS regions (USA East, USA West, EU, APAC-Singapore and APAC-Tokyo)
Blueprint 1: HA across AWS regions
Blueprint 1: Website in Multiple AWS regions (using MySQL)
[Diagram: Dynamic/Managed/Directional DNS servers in front of two copies of the main site, one in the AWS USA East region (AZs USA-East-1A and USA-East-1C) and one in the AWS USA West region (AZs USA-WEST-1A and USA-WEST-1B). Each region shows an AWS Elastic Load Balancer, Auto Scaling in both AZs, two MySQL Masters with MySQL M-M replication, S3 and CloudWatch.]
1. Directional DNS servers direct user requests to the main site in the AWS USA East region. In case of an outage in the USA East region, web requests are directed to the same website hosted in the USA West region.
2. AWS ELB balances the requests between the auto scaled EC2 instances launched in multiple AZs inside the East region.
3. MySQL is launched in multiple AZs inside the East region in M-M replication mode.
4. MySQL Master replication is set up between the USA East and West regions.
Blueprint 1: Website in Multiple AWS regions (using RDS)
[Diagram: Dynamic/Managed/Directional DNS servers in front of two copies of the main site, one in the AWS USA East region (AZs USA-East-1A and USA-East-1C) and one in the AWS USA West region (AZs USA-WEST-1A and USA-WEST-1B). Each region shows an AWS Elastic Load Balancer, Auto Scaling in both AZs, an RDS Master and an RDS Standby, S3 and CloudWatch, with programmatic replication between the regions.]
1. Directional DNS servers direct user requests to the main site in the AWS USA East region. In case of an outage in the USA East region, web requests are directed to the same website hosted in the USA West region.
2. AWS ELB balances the requests between the auto scaled EC2 instances launched in multiple AZs inside the East region.
3. MySQL is launched in multiple AZs inside the East region in HA replication mode.
4. RDS Master replication between the USA East and West regions is done programmatically.
1. User traffic hits the Managed DNS/Directional DNS server
2. The Directional DNS server directs the load to the nearest available AWS region
3. AWS ELB load balances the requests between the auto scaled Web/App instances
4. Web/App instances run transactions and queries against MySQL DB / RDS
5. MySQL DB/RDS is configured for replication across and inside an AWS region
Blue print 1: Information flow
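The first two steps of this flow can be sketched as a small routing function. This is a minimal illustration, not the logic of any particular managed DNS product: the `Region` type, the endpoint hostnames and the distance table are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Region:
    name: str
    endpoint: str        # hypothetical ELB hostname for this region
    healthy: bool = True


def pick_region(user_location: str,
                regions: List[Region],
                distance: Dict[Tuple[str, str], int]) -> Region:
    """Return the nearest region that is currently healthy.

    `distance` maps (user_location, region_name) to a relative cost;
    a real directional DNS service would derive this from geo-IP data.
    """
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=lambda r: distance[(user_location, r.name)])


if __name__ == "__main__":
    regions = [
        Region("us-east", "east-elb.example.com"),
        Region("us-west", "west-elb.example.com"),
    ]
    distance = {("new-york", "us-east"): 1, ("new-york", "us-west"): 4}

    print(pick_region("new-york", regions, distance).endpoint)
    regions[0].healthy = False  # simulate an outage in the East region
    print(pick_region("new-york", regions, distance).endpoint)
```

When the East region is marked unhealthy, the same user is answered with the West endpoint, which is the DNS-level failover the blueprint relies on.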
• The managed DNS server provides automatic failover at DNS level in case of an outage at the primary website location
• Transparent switch between websites hosted in AWS East and AWS West within <60 seconds during an outage
• Automatic traffic diversion to the nearest site location
• Managed/Directional DNS servers should be globally distributed (no SPOF)
High Availability @ DNS Layer
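Whether a switch within 60 seconds is achievable depends on how fast the DNS provider detects the outage and how long resolvers cache the old answer. The following is a rough budget calculation under stated assumptions: the parameter names and the sample values are hypothetical, and the model ignores resolvers that disregard the TTL.

```python
def worst_case_failover_seconds(check_interval: int,
                                failures_to_trip: int,
                                dns_ttl: int) -> int:
    """Rough upper bound on the time until clients reach the standby site.

    detection = consecutive failed health checks needed to declare an outage
    dns_ttl   = how long a resolver may keep serving the old record
    """
    detection = check_interval * failures_to_trip
    return detection + dns_ttl


def fits_budget(check_interval: int, failures_to_trip: int,
                dns_ttl: int, budget_seconds: int = 60) -> bool:
    """True if the worst case stays within the failover budget."""
    return worst_case_failover_seconds(
        check_interval, failures_to_trip, dns_ttl) <= budget_seconds


if __name__ == "__main__":
    # Hypothetical settings: probe every 10 s, trip after 3 failures, 30 s TTL.
    print(worst_case_failover_seconds(10, 3, 30))
    print(fits_budget(10, 3, 30))
    print(fits_budget(30, 3, 300))
```

The practical consequence is that a low record TTL and a short health-check interval are both needed; a long TTL alone can exceed the budget regardless of how fast detection is.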
• AWS ELB provides a load balancing service that can sit in front of thousands of EC2 servers
• AWS ELB avoids a single point of failure at the load balancing layer
• AWS ELB scales its capacity with incoming traffic; there is no fixed theoretical ceiling on its response rate, and it can handle thousands of concurrent requests with ease
High Availability @ Load Balancing Layer
• Amazon EC2 instances that host the Web/App layers are launched in multiple AZs inside a region
• Multiple AZ deployments avoid a single point of failure at zone level
• Amazon EC2 instances are configured with Auto Scaling in each AZ inside a region
• AWS Auto Scaling ensures that a minimum number of Web/App EC2 instances is always running
High Availability @ Web/App Layer
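The idea of keeping a minimum number of instances running across AZs can be illustrated with a small planning function. This is a simplified model, not how AWS Auto Scaling is implemented; the function name and the AZ labels in the demo are assumptions made for the example.

```python
from typing import Dict, List


def plan_launches(running: Dict[str, int],
                  minimum: int,
                  healthy_azs: List[str]) -> Dict[str, int]:
    """Return how many instances to launch per healthy AZ.

    Instances in AZs that are not listed as healthy are treated as lost.
    New instances go to whichever healthy AZ currently has the fewest,
    so the group stays as evenly balanced as possible.
    """
    if not healthy_azs:
        raise RuntimeError("no healthy availability zone to launch into")
    counts = {az: running.get(az, 0) for az in healthy_azs}
    launches = {az: 0 for az in healthy_azs}
    while sum(counts.values()) < minimum:
        target = min(healthy_azs, key=lambda az: counts[az])
        counts[target] += 1
        launches[target] += 1
    return launches


if __name__ == "__main__":
    running = {"us-east-1a": 2, "us-east-1c": 2}
    # Normal operation: the minimum of 4 is already met.
    print(plan_launches(running, 4, ["us-east-1a", "us-east-1c"]))
    # Outage in 1a: the surviving AZ has to absorb the lost capacity.
    print(plan_launches(running, 4, ["us-east-1c"]))
```

The second call shows why spare launch capacity in the surviving AZ matters: after a zone outage, the whole minimum has to be rebuilt there.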
• AWS RDS (MySQL) configured with the HA feature offers no single point of failure (SPOF)
• The HA feature ensures that an active standby RDS instance is running in another AZ
• When the Master RDS fails, the active standby automatically takes over the DB traffic in <180 seconds
• Read replicas should be launched in multiple AZs to avoid SPOF
• Data must be replicated programmatically between RDS instances in different AWS regions
High Availability @ Database Layer (RDS)
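The slides do not say how the programmatic replication between regions is done. One simple possibility is an incremental copy driven by a watermark, sketched below with two in-memory SQLite databases standing in for the RDS instances in each region. The `orders` table, its columns and the function names are hypothetical, and this approach only carries over newly inserted rows, not updates or deletes.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a stand-in database with a sample table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    return db


def replicate(source: sqlite3.Connection,
              target: sqlite3.Connection,
              last_id: int) -> int:
    """Copy rows with id > last_id from source to target.

    Returns the new watermark (highest id copied so far).
    INSERT OR REPLACE is SQLite syntax; MySQL would use REPLACE INTO
    or INSERT ... ON DUPLICATE KEY UPDATE instead.
    """
    rows = source.execute(
        "SELECT id, item FROM orders WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    target.executemany(
        "INSERT OR REPLACE INTO orders (id, item) VALUES (?, ?)", rows
    )
    target.commit()
    return rows[-1][0] if rows else last_id


if __name__ == "__main__":
    east, west = make_db(), make_db()
    east.executemany("INSERT INTO orders (item) VALUES (?)", [("book",), ("lamp",)])
    watermark = replicate(east, west, 0)
    print(watermark, west.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

Running `replicate` on a schedule keeps the second region close to the first, with a lag bounded by the schedule interval.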
• Master-Master MySQL is configured in two different AZs (offering HA inside a region)
• Configure asynchronous read slaves in multiple AZs
• Master-Master replication is configured between MySQL instances in different regions (HA across regions)
• Elastic IP and health check based elevation of the standby within 60 seconds during a failure
• Asynchronous data replication
High Availability @ Database (Non RDS)
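Health check based elevation can be sketched as a small state machine that promotes the standby after a run of failed checks. This is a simulation only: the class and node names are hypothetical, and in a real deployment the promotion step would also re-associate the Elastic IP with the standby through the EC2 API.

```python
from typing import Optional


class FailoverMonitor:
    """Tracks health-check results and decides which node holds the Elastic IP."""

    def __init__(self, primary: str, standby: str, failures_to_trip: int = 3):
        self.active = primary
        self.standby: Optional[str] = standby
        self.failures_to_trip = failures_to_trip
        self.consecutive_failures = 0

    def record_check(self, ok: bool) -> str:
        """Feed one health-check result for the active node.

        Returns the node that should currently receive DB traffic.
        """
        if ok:
            self.consecutive_failures = 0
            return self.active
        self.consecutive_failures += 1
        if (self.consecutive_failures >= self.failures_to_trip
                and self.standby is not None):
            # Promote the standby; a real setup would move the Elastic IP here.
            self.active, self.standby = self.standby, None
            self.consecutive_failures = 0
        return self.active


if __name__ == "__main__":
    monitor = FailoverMonitor("mysql-east-1a", "mysql-east-1c")
    for result in (True, False, False, False):
        print(monitor.record_check(result))
```

Requiring several consecutive failures avoids promoting the standby on a single dropped probe; with a check every 20 seconds and a threshold of 3, for example, detection alone already consumes the 60 second target.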
• S3 should be used for Storing Uploaded data files and snapshots
• Companies can store all their media assets with proper Access controls in S3
High Availability @ Storage Layer
• Scalable and Highly available Architecture
• Inter Regional High Availability in AWS
• In the event of a failure in the USA East region, traffic can be directed to USA West in seconds
• Websites deployed in both regions can scale and shrink according to load
• Cost effective for large server farm deployments
• Low latency achieved through traffic direction
• No customers are lost because of load or availability problems. Ops are happy!
Blueprint 1: Positives
• Complete Dependency on AWS cloud
• Technically complex and intricate setup
• Costlier to build and operate (Sophistication comes at a cost)
• No Unified Infra Management currently for this architecture
– Example: Directional DNS and AWS are managed from two separate consoles
Blueprint 1: Negatives
Blueprint 2: High Availability across AWS AZs
The website is hosted in the AWS USA East region, spread across 2 AZs (1A & 1B)
Blueprint 2: HA across AWS AZs
[Diagram: main website in USA-EAST-1A and main website in USA-EAST-1B, both inside the AWS USA East Region]
• Leverages AWS inter-AZ application hosting
• The main site is spread across multiple AZs (1A and 1B) inside the single AWS USA East region
• Suitable for companies that demand scalability, load balancing and High Availability
Blue Print 2: HA across AWS AZ’s
Blueprint 2: HA inside an AWS Region
[Diagram: an AWS Elastic Load Balancer in front of Auto Scaling in USA-EAST-1A and USA-EAST-1B, with AWS Security groups, an RDS Master, an RDS Standby, two Read Replicas, S3 and CloudWatch.]
1. Web requests are sent to the Amazon ELB.
2. AWS ELB transfers the requests to Amazon EC2 instances launched in multiple AZs, using round robin with sticky sessions.
3. The Web/App tier hosted on Amazon EC2 transacts with the RDS master and reads from the read replicas.
4. The RDS standby is launched in a different AZ from the RDS master for HA. Read replicas should be launched in multiple AZs for HA.
5. AWS CloudWatch monitors the entire Multi-AZ infrastructure deployed on the AWS cloud.
1. User traffic hits AWS ELB
2. AWS ELB load balances the requests between the auto scaled Web/App instances
3. Web/App instances run transactions and queries against MySQL DB / RDS
4. MySQL RDS is configured in master, HA standby and read replica mode across AZs inside an AWS region
Information flow
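Steps 3 and 4 imply that the application sends writes to the RDS master and spreads reads over the read replicas. A minimal sketch of that routing decision is shown here; the `DbRouter` class and the hostnames are hypothetical, and the statement check is deliberately naive (it only looks at the first keyword).

```python
import itertools
from typing import List


class DbRouter:
    """Chooses a database endpoint per statement: master for writes, replicas for reads."""

    def __init__(self, master: str, replicas: List[str]):
        self.master = master
        # Cycle through replicas in round-robin order; None if there are none.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # Writes, and anything we cannot classify, go to the master.
        return self.master


if __name__ == "__main__":
    router = DbRouter(
        "rds-master.example.com",
        ["replica-1a.example.com", "replica-1b.example.com"],
    )
    print(router.endpoint_for("SELECT * FROM orders"))
    print(router.endpoint_for("SELECT * FROM users"))
    print(router.endpoint_for("INSERT INTO orders (item) VALUES ('book')"))
```

Because read replicas are updated asynchronously, a read issued right after a write may not see it yet; queries that need the latest data should be sent to the master instead.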
• Load Balancing: AWS ELB provides a load balancing service with no single point of failure
• AWS ELB switches traffic across AZs in the event of a failure
• Web/App Layers: Amazon EC2 instances that host the Web/App layers are launched in multiple AZs inside a region
• AWS Auto Scaling ensures that a minimum number of Web/App EC2 instances is always running in both AZs (balancing algorithm)
Blueprint 2: High Availability
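Round robin with sticky sessions, combined with switching traffic away from a failed AZ, can be modelled in a few lines. This is an illustrative model rather than ELB's actual implementation; the class name, instance labels and session IDs are hypothetical.

```python
from typing import Dict, Iterable, List, Set


class StickyRoundRobin:
    """Round-robin balancer that pins each session to one healthy instance."""

    def __init__(self, instances: Iterable[str]):
        self.instances: List[str] = list(instances)
        self.healthy: Set[str] = set(self.instances)
        self.sessions: Dict[str, str] = {}
        self._next = 0

    def mark_down(self, instance: str) -> None:
        """Record that an instance (or its whole AZ) failed its health check."""
        self.healthy.discard(instance)

    def route(self, session_id: str) -> str:
        pinned = self.sessions.get(session_id)
        if pinned in self.healthy:
            return pinned  # sticky: keep the session where it already lives
        if not self.healthy:
            raise RuntimeError("no healthy instance available")
        # New session, or its instance went down: take the next healthy one.
        while True:
            candidate = self.instances[self._next % len(self.instances)]
            self._next += 1
            if candidate in self.healthy:
                self.sessions[session_id] = candidate
                return candidate


if __name__ == "__main__":
    lb = StickyRoundRobin(["web-1a", "web-1b"])
    print(lb.route("alice"), lb.route("bob"), lb.route("alice"))
    lb.mark_down("web-1a")  # simulate an outage in AZ 1A
    print(lb.route("alice"))
```

A session that is moved to another instance loses any state kept only in the failed instance's memory, so session data that must survive an AZ outage needs to live outside the web tier.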
• Even if there is an outage in one AZ, the Web/App servers in the other AZs remain active
• Multiple AZ deployments avoid a single point of failure at the Web/App layer
• Database Layer: AWS RDS (MySQL) is configured in HA standby mode
• The AWS RDS Master and HA standby reside in different AZs
• Read replicas should be configured in multiple AZs to avoid SPOF
Blueprint 2: High Availability
• Read replicas provide higher performance for read-only operations
• Multiple AZ deployments of AWS RDS with a standby avoid a single point of failure at the DB layer
• S3 should be used for storing uploaded data files and snapshots
• Companies can store all their media assets in S3 with proper access controls
Blueprint 2: High Availability
• Scalable and Highly available Architecture
• Leverages the inter-AZ High Availability of AWS
• No SPOF at AZ level
• An outage in a single AWS AZ will not impact the website
• Cost effective compared to the AWS regional HA architecture (Blueprint 1)
• Auto Scaling and read replicas are distributed across both AZs inside an AWS region
Blueprint 2: Positives
• AWS Regional Outage will impact the website operations
• Complete Dependency on AWS Region
• GEO Traffic diversion is not possible
• Complete Dependency on AWS cloud
Blueprint 2: Negatives
Cloud Architecture Consulting
Cloud Migration & Implementation
Cloud Application Development
Leave it to the experts, we will handle this
Cloud Adoption Strategy
“Let's get the job done”
• Amazon Systems Integrator and Solution Developer
• Migrated 350+ servers for start-ups, small businesses and Enterprises
• Prior expertise in architecting HA solutions over AWS
• In-depth understanding of Cloud infrastructure services
Why 8KMiles ?
Contact Us
“All you need is an idea and the cloud will execute it for you.” (Structure 2010 event) - Dr Werner Vogels , CTO of Amazon on 8KMiles
For more details on how 8KMiles Cloud Consulting can help your business, contact
http://cloud.8kmiles.com
http://cloudblog.8kmiles.com
http://www.8kmiles.com