How to Design for High Availability & Scale with AWS
Posted on 21-Oct-2014
DESCRIPTION
This presentation talks about how you can optimize your application architecture on the AWS Cloud and create a fault-tolerant architecture that will have zero downtime! It covers the best practices for a fault-tolerant web application.
TRANSCRIPT
Blazeclan
Agenda
Introduction
High Availability
Scalability
Fault Tolerance
AWS Global Infrastructure
Key Design Concepts
Design for Failure
Scaling
Self Healing / Fault Tolerant
Multiple AZ Architecture
Loose Coupling
Sample Architectures
Cloud IT Better
Introduction
How Often Do You See This?
Cost of Downtime
A report published in 2010 on the top 412 eCommerce sites says:
• The median length of downtime was 840 minutes
• On average, each of them saw 3,291 minutes of downtime
Lost Revenue
• On average, each of them lost $800,099 in revenue due to downtime
• The total revenue lost to downtime across all 412 companies was $329,640,928!
Online Business & Downtime Facts
The Average Hourly Loss Because of Data Center Downtime in 2012
Source: http://www.techrepublic.com/blog/data-center/infographic-the-outrageous-costs-of-data-center-downtime
How to Build a HIGHLY AVAILABLE, SCALABLE, DURABLE AND RESILIENT Web Application
High Availability
• Up Time of an Application
• Planned or Unplanned Outage or Downtime
• Offline, Unreachable, or Partially Available
• Slow to Use
• Goal
• No Downtime
• Always Available
Uptime: 99.999%
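To put an uptime figure such as 99.999% in perspective, the sketch below converts an availability target into a yearly downtime budget. The function name and the 365-day year are illustrative assumptions, not part of the presentation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = (100 - availability_percent) / 100
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # Print the budget for a few common targets.
    for target in (99.0, 99.9, 99.95, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime -> {minutes:.2f} minutes/year")
```

Under these assumptions, a 99.999% target leaves roughly 5.26 minutes of downtime per year.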
Scalability
[Figure: Demand and Resources over Time]
• Ability of an application to accommodate change in traffic without architectural changes
• Scalability doesn’t guarantee availability
• Availability may be impacted if the application cannot scale
Fault Tolerance
• Built-in redundancy so applications can continue functioning when components fail
• Fault tolerance is crucial to High Availability
Image courtesy: Gigamone.com
AWS Global Infrastructure
AWS democratizes High Availability
• Multiple Servers
• Isolated Redundant Data Centers
• Regions across the Globe
• Availability Zones within Regions
Source: http://aws.amazon.com/about-aws/globalinfrastructure/#reglink-sa
AWS Capacity
Source: http://www.slideshare.net/AmazonWebServices/aws-webinar-scaling-on-aws-for-the-first-10-million-users
AWS Platform
Source: http://www.slideshare.net/AmazonWebServices/aws-webinar-scaling-on-aws-for-the-first-10-million-users
AWS Building Blocks
Inherently Highly Available and Fault Tolerant Services (Span Across AZs)
Amazon S3
Amazon SQS
Amazon DynamoDB
Amazon SNS
Amazon CloudFront
Amazon SES
Amazon Route53
Amazon SWF
Elastic Load Balancer
…
Highly Available with Right Architecture (Architect Across AZs)
Amazon EC2
Amazon EBS
Amazon RDS
Amazon VPC
Design For Failure
• Avoid impact on business
• Avoid single points of failure
• Application should continue to function
• Assume everything fails, and work backwards
Everything fails, all the time
– Werner Vogels, CTO, Amazon
Image: Obama’s prized limo after it broke down during his Israel visit!
Ask Questions for Right Architecture
What happens if a node in your system fails?
If there are master and slaves in your architecture, what if the master node fails?
If a load balancer is sitting in front of an array of application servers, what if that load balancer fails?
What are my single points of failure?
What kind of scenarios do I have to plan for?
Lots of Questions
How do you recognize that failure?
How do I replace that node?
What if the cache keys grow beyond the memory limit of an instance?
How does the failover occur, and how is a new slave instantiated and brought into sync with the master?
What if a downstream service times out or returns an exception?
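One common answer to the last question is to retry the call with an increasing delay and to give up after a fixed number of attempts. The sketch below illustrates that idea only; the function name, the default values, and the injectable `sleep` parameter are assumptions made for the example, not something the presentation prescribes.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on failure, wait with exponential backoff and retry.

    Re-raises the last exception once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Delay doubles each time: base_delay, 2 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
```

The caller still has to decide what to do when the final attempt fails, for example returning a degraded response instead of an error page.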
Build Mechanisms to Handle Failure
• Build process threads that resume on reboot
• Allow the state of the system to re-sync by reloading messages from queues
• Keep pre-configured and pre-optimized virtual images to support the above points on launch/boot
• Avoid in-memory sessions or stateful user context; move that to data stores
• Have a coherent backup and restore strategy for your data, and automate it
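The first two points can be sketched with a worker that acknowledges a message only after handling it, so that a restarted worker picks up where the previous one stopped. The `DurableQueue` class below is an in-memory stand-in invented for this example (it is not the Amazon SQS API), and `crash_after` exists only to simulate a reboot.

```python
from collections import deque


class DurableQueue:
    """In-memory stand-in for a durable queue: messages stay until acknowledged."""

    def __init__(self, messages):
        self._pending = deque(messages)

    def peek(self):
        """Return the oldest unacknowledged message, or None if the queue is empty."""
        return self._pending[0] if self._pending else None

    def ack(self):
        """Remove the oldest message once it has been handled."""
        self._pending.popleft()


def run_worker(queue, handle, crash_after=None):
    """Process messages in order; acknowledge only after handling succeeds.

    crash_after simulates a reboot after that many messages.
    Returns the number of messages handled in this run.
    """
    handled = 0
    while (message := queue.peek()) is not None:
        if crash_after is not None and handled == crash_after:
            return handled  # simulated crash: unacknowledged messages stay queued
        handle(message)
        queue.ack()
        handled += 1
    return handled
```

With this ordering, a crash between handling and acknowledging would cause the same message to be handled again after restart, so handlers in such a design are usually written to be idempotent.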
Image courtesy: http://www.outsmarthormones.com/wp-content/uploads/2011/06/Fix.jpg
Design for Failure
Source: http://media.amazonwebservices.com/architecturecenter/AWS_ac_ra_ftha_04.pdf
Scaling
Auto Scaling
• Lets you automatically scale Amazon EC2 capacity up or down
• Lets you terminate server instances at will
• Adds more instances in response to an increasing load
• Launches a replacement instance immediately in case of a failure
• Lets the application transition seamlessly if the primary server fails
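The scaling decision itself can be pictured as a small threshold rule. The sketch below is a simplified illustration, not the Auto Scaling API: the function name, the CPU thresholds, and the minimum and maximum sizes are all assumed values chosen for the example.

```python
def desired_capacity(current, avg_cpu_percent, minimum=2, maximum=10,
                     scale_out_above=70.0, scale_in_below=30.0):
    """Return the next instance count for a simple threshold policy."""
    if avg_cpu_percent > scale_out_above:
        target = current + 1  # load is high: add an instance
    elif avg_cpu_percent < scale_in_below:
        target = current - 1  # load is low: remove an instance
    else:
        target = current      # within the comfortable band: no change
    # Never go below the minimum or above the maximum group size.
    return max(minimum, min(maximum, target))
```

Clamping to the minimum also covers the replacement case: if a failure leaves the group with fewer instances than the minimum, the rule asks for the group to be brought back up to it.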
Image Courtesy: http://www.knovelblogs.com/wp-content/uploads
Elastic Load Balancing (ELB)
• Distributes incoming traffic to an application across several Amazon EC2 instances
• ELB is given a DNS host name, and requests sent to this host name are delegated to a pool of Amazon EC2 instances
• ELB detects unhealthy instances within its pool of Amazon EC2 instances and automatically reroutes traffic to healthy instances until the unhealthy instances have been restored
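The routing behaviour described above can be sketched as round-robin selection that skips instances marked unhealthy. This is a toy model for illustration only; the class, its method names, and the instance identifiers used with it are hypothetical and do not reflect how ELB is implemented or configured.

```python
import itertools


class LoadBalancer:
    """Toy round-robin balancer that routes only to healthy instances."""

    def __init__(self, instances):
        self._instances = list(instances)
        self._healthy = set(self._instances)
        self._cycle = itertools.cycle(self._instances)

    def mark_unhealthy(self, instance):
        """Stop sending traffic to an instance that failed its health check."""
        self._healthy.discard(instance)

    def mark_healthy(self, instance):
        """Resume sending traffic to a known instance that has recovered."""
        if instance in self._instances:
            self._healthy.add(instance)

    def route(self):
        """Return the next healthy instance in round-robin order."""
        if not self._healthy:
            raise RuntimeError("no healthy instances available")
        for instance in self._cycle:
            if instance in self._healthy:
                return instance
```

Callers keep using the single `route()` entry point throughout, which mirrors the way clients keep using one DNS name while the pool behind it changes.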
ELB & Auto Scaling
• Auto Scaling & ELB are an ideal combination
• ELB gives a single DNS name for addressing
• Auto Scaling ensures there is always the right number of healthy Amazon EC2 instances to accept requests
Fault Tolerant
Fault Tolerance
• In order to build fault-tolerant applications on Amazon EC2, it’s important to follow best practices such as:
• Being able to commission replacement instances quickly
• Using Amazon EBS for persistent storage
• Using multiple Availability Zones and Elastic IP addresses
Multi-AZ Architecture
Multi-AZ Design Considerations
• Achieve greater fault tolerance by distributing your application geographically
• The Amazon EC2 service level agreement commitment is 99.95% availability for each Amazon EC2 Region
• Deploy an application that spans multiple Availability Zones
• Redundant instances for each tier of an application could be placed in distinct Availability Zones
• ELB can automatically balance traffic across multiple instances & multiple Availability Zones
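The benefit of placing redundant instances in distinct Availability Zones can be shown with a small placement sketch. The function names and the zone labels used with them are hypothetical; real placement is handled by the AWS services themselves.

```python
def place_instances(count, zones):
    """Assign count instances to zones round-robin; return a list of zone names."""
    if not zones:
        raise ValueError("at least one zone is required")
    return [zones[i % len(zones)] for i in range(count)]


def survivors_after_zone_failure(placement, failed_zone):
    """Count instances still running if failed_zone becomes unavailable."""
    return sum(1 for zone in placement if zone != failed_zone)


if __name__ == "__main__":
    spread = place_instances(4, ["az-1", "az-2"])
    single = place_instances(4, ["az-1"])
    print("spread over two zones:", survivors_after_zone_failure(spread, "az-1"))
    print("all in one zone:", survivors_after_zone_failure(single, "az-1"))
```

In this toy model, four instances spread over two zones keep two running when one zone fails, while four instances in a single zone keep none.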
Image Courtesy: http://chriscampcommunications.blogspot.in
Multi-AZ Architecture
Loose Coupling
Loosely Coupled Systems
• Loosely coupled systems are more fault tolerant and can achieve a bigger scale
• Loosely coupled systems on AWS
• De-coupling systems allows for hybrid models (in-cloud + in-physical data center)
• Balancing between clusters enables easier scaling
• Using queues (Amazon SQS) buffers against failures
• Design for a jumble of black boxes
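How a queue buffers against failures can be sketched with Python's standard `queue` module standing in for Amazon SQS. The `produce` and `drain` helpers and the order identifiers used with them are made up for this illustration.

```python
import queue


def produce(buffer, orders):
    """Front end: enqueue work and return immediately, without calling the worker."""
    for order in orders:
        buffer.put(order)


def drain(buffer, process):
    """Back end: process everything currently queued; return how many were handled."""
    handled = 0
    while True:
        try:
            order = buffer.get_nowait()
        except queue.Empty:
            return handled
        process(order)
        handled += 1


if __name__ == "__main__":
    buffer = queue.Queue()
    # The back end is "down" here, yet the front end keeps accepting orders.
    produce(buffer, ["order-1", "order-2"])
    # Once the back end is running again, it works through the backlog.
    print(drain(buffer, print), "orders processed")
```

Because the front end never calls the back end directly, either side can be restarted or scaled without the other noticing more than a change in queue depth.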
Decoupling using SQS
Loose Coupling - Best Practices on AWS
• Use Amazon SQS to isolate components
• Use Amazon SQS as buffers between components
• Design every component such that it exposes a service interface, is responsible for its own scalability, and interacts with other components asynchronously
• Bundle the logical construct of a component into an Amazon Machine Image so that it can be deployed more often
• Make your applications as stateless as possible. Store session state outside of the component (in Amazon SimpleDB, if appropriate)
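The last point can be sketched as web servers that keep no per-user state of their own and read and write every session through a shared store. `SessionStore` below is an in-memory placeholder for an external data store, and all class and method names are assumptions for this example.

```python
class SessionStore:
    """Stand-in for an external data store shared by all web servers."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        """Return a copy of the stored session, or an empty one if unknown."""
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, session):
        """Persist the session so that any server can read it later."""
        self._data[session_id] = dict(session)


class WebServer:
    """Keeps no per-user state between requests; everything lives in the store."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def add_to_cart(self, session_id, item):
        """Load the session, update the cart, and write it back to the store."""
        session = self._store.load(session_id)
        cart = list(session.get("cart", []))
        cart.append(item)
        session["cart"] = cart
        self._store.save(session_id, session)
        return cart
```

Since any server can serve any request, a load balancer is free to send consecutive requests from the same user to different instances, and a failed instance takes no session data with it.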
Sample Architectures
High Availability Architecture in RDS
Web Hosting on AWS
Scalable Reader Farm
Design for High Availability & Scale
Don’t let this happen to your business.
Our AWS expert Solution Architects can help you review your architecture.
Avail yourself of our 2-hour free consultancy!
For any assistance please contact us at
Upcoming Webinars
Check out Our Upcoming Webinars
www.blazeclan.com/webinars
Follow Us On :
Our Blog : http://blog.blazeclan.com/
Thank you