leveraging the public cloud for disaster recovery

Leveraging the Public Cloud for Disaster Recovery Lahav Savir, Architect & CEO Emind systems Ltd. [email protected]


Post on 30-Oct-2014


Page 1: Leveraging the Public Cloud for Disaster Recovery

Leveraging the Public Cloud for Disaster Recovery

Lahav Savir, Architect & CEO, Emind Systems Ltd. [email protected]

Page 2: Leveraging the Public Cloud for Disaster Recovery

About

Lahav Savir
• 15+ years’ experience in the online industry
• Architect and CEO @ Emind Systems

Emind Systems (est. 2006)
• Boutique system integrator
• ~100 AWS customers
• AWS solution provider

Page 3: Leveraging the Public Cloud for Disaster Recovery

Amazon (AWS) Certification

Amazon Solution Provider & Consulting Partner

https://aws.amazon.com/solution-providers/si/emind-systems-ltd

Page 4: Leveraging the Public Cloud for Disaster Recovery

Disaster Recovery in a Nutshell

• Business continuity
• Minimize downtime and data loss
• Recovery Time Objective (RTO)
• Recovery Point Objective (RPO)
• Price

Page 5: Leveraging the Public Cloud for Disaster Recovery

DR Approaches
• Complete server mirroring
• Data mirroring / replication
• Configuration replication

Page 6: Leveraging the Public Cloud for Disaster Recovery

Emind’s Best Practice

[Diagram: Server Mirror, Configuration Mirror, Data Mirror]

Page 7: Leveraging the Public Cloud for Disaster Recovery

Why Amazon?

Flexible, Global Infrastructure
• N. Virginia
• Oregon
• N. California
• Ireland
• Singapore
• Tokyo
• Sydney
• São Paulo
• GovCloud

Page 8: Leveraging the Public Cloud for Disaster Recovery

Secure

• VPC - Virtual Private Cloud on AWS's infrastructure

• Specify private IP address range

• Bridge your onsite IT infrastructure and the VPC with a VPN connection or Direct Connect

• Extending your existing security and management policies to the cloud

Page 9: Leveraging the Public Cloud for Disaster Recovery

A different cost model

[Chart: infrastructure cost (y-axis) over time (x-axis). Curves: 2nd site cost, AWS cost, demand. Events on the time axis: test, test, failover, failback. Callouts: cost savings with AWS; ability to scale, with no arbitrary time limit to failback.]

Page 10: Leveraging the Public Cloud for Disaster Recovery

Zooming in on the technical details

Page 11: Leveraging the Public Cloud for Disaster Recovery

Disaster Recovery Terms
• RTO: Recovery Time Objective
  – Acceptable time period within which normal operation (or degraded operation) needs to be restored after an event
• RPO: Recovery Point Objective
  – Acceptable data loss, measured in time
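The two terms above can be checked against a single outage with simple date arithmetic. The following is a minimal sketch, not part of the original deck: the `Incident` record, the function name and the example timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple


@dataclass
class Incident:
    """Timestamps for one outage (hypothetical record, for illustration)."""
    last_good_copy: datetime    # most recent usable backup or replica state
    failure: datetime           # moment the primary site was lost
    service_restored: datetime  # moment normal (or degraded) operation resumed


def meets_objectives(incident: Incident, rto: timedelta, rpo: timedelta) -> Tuple[bool, bool]:
    """Return (RTO met, RPO met) for a single incident."""
    downtime = incident.service_restored - incident.failure
    data_loss = incident.failure - incident.last_good_copy
    return downtime <= rto, data_loss <= rpo


# Illustrative values only: last backup at 02:00, failure at 09:30,
# service back at 11:00, measured against a 2-hour RTO and a 4-hour RPO.
example = Incident(
    last_good_copy=datetime(2014, 10, 30, 2, 0),
    failure=datetime(2014, 10, 30, 9, 30),
    service_restored=datetime(2014, 10, 30, 11, 0),
)
rto_met, rpo_met = meets_objectives(example, timedelta(hours=2), timedelta(hours=4))
print(rto_met, rpo_met)
```

In this example the downtime is 1.5 hours, so the RTO is met, while 7.5 hours of data are lost, so the RPO is missed.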

Page 12: Leveraging the Public Cloud for Disaster Recovery

Backup and Restore

[Diagram: on-premises infrastructure (traditional server) with data copied to S3 (S3 bucket with objects). Also shown: AWS Import/Export, Amazon Route 53.]

Page 13: Leveraging the Public Cloud for Disaster Recovery

Backup and Restore

[Diagram: inside an AWS Region and Availability Zone, an Amazon EC2 instance is quickly provisioned from an AMI pre-bundled with OS and applications. Its data volume is filled with data copied from objects in the Amazon S3 bucket.]

Page 14: Leveraging the Public Cloud for Disaster Recovery

Backup and Restore

• Advantages
  – Simple to get started
  – Extremely cost effective (mostly backup storage)
• Preparation Phase
  – Take backups of current systems
  – Store backups in S3
  – Describe procedure to restore from backup on AWS
    • Know which AMI to use, build your own as needed
    • Know how to restore system from backups
    • Know how to switch to new system

Page 15: Leveraging the Public Cloud for Disaster Recovery

Backup and Restore

• In Case of Disaster
  – Retrieve backups from S3
  – Bring up required infrastructure
    • EC2 instances with prepared AMIs, Load Balancing, etc.
  – Restore system from backup
  – Switch over to the new system
    • Adjust DNS records to point to AWS
• Objectives
  – RTO: as long as it takes to bring up infrastructure and restore system from backups
  – RPO: time since last backup
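"RPO: time since last backup" can be made concrete by picking the newest backup taken before the failure and measuring the gap. This is a rough sketch under stated assumptions: the `Backup` tuple and the object keys are hypothetical, and the listing of bucket contents is assumed to have been fetched already by whatever tool is in use.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class Backup(NamedTuple):
    key: str            # object key in the backup bucket (hypothetical naming)
    taken_at: datetime  # when the backup was taken


def latest_backup_before(backups: List[Backup], failure: datetime) -> Optional[Backup]:
    """Newest backup taken at or before the failure, or None if there is none."""
    usable = [b for b in backups if b.taken_at <= failure]
    return max(usable, key=lambda b: b.taken_at, default=None)


def data_loss_window(backups: List[Backup], failure: datetime) -> Optional[timedelta]:
    """Time since the last usable backup, i.e. the data lost on restore."""
    chosen = latest_backup_before(backups, failure)
    return None if chosen is None else failure - chosen.taken_at


# Illustrative listing; keys and times are made up for the example.
listing = [
    Backup("db/2014-10-28.dump", datetime(2014, 10, 28, 2, 0)),
    Backup("db/2014-10-29.dump", datetime(2014, 10, 29, 2, 0)),
    Backup("db/2014-10-30.dump", datetime(2014, 10, 30, 2, 0)),
]
failure_time = datetime(2014, 10, 29, 18, 0)
print(latest_backup_before(listing, failure_time).key)
print(data_loss_window(listing, failure_time))
```

A backup taken after the failure is ignored, so the restore here uses the 29 October backup and loses 16 hours of data.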

Page 16: Leveraging the Public Cloud for Disaster Recovery

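Pilot Light

[Diagram: user or system reaches the primary site (web server, application server, database server, data volume) through Amazon Route 53. On AWS, the web server and application server are not running; a smaller database server instance with its data volume receives data mirroring/replication from the primary site.]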

Page 17: Leveraging the Public Cloud for Disaster Recovery

Pilot Light

[Diagram: user or system, Amazon Route 53, primary-site web, application and database servers with data volume. On AWS: web server and application server (not running), database server (smaller instance) with data volume, fed by data mirroring/replication.]

Page 18: Leveraging the Public Cloud for Disaster Recovery

Pilot Light

[Diagram: user or system reaches AWS through Amazon Route 53. The AWS web server and application server start in minutes; the AWS database server and data volume are resized as desired. Also shown: primary-site web, application and database servers with data volume, and data mirroring/replication.]

Page 19: Leveraging the Public Cloud for Disaster Recovery

Pilot Light

• Advantages
  – Very cost effective (fewer 24/7 resources)
• Preparation Phase
  – Enable replication of all critical data to AWS
  – Prepare all required resources for automatic start
    • AMIs, Network Settings, Load Balancing, etc.

Page 20: Leveraging the Public Cloud for Disaster Recovery

Pilot Light

• In Case of Disaster
  – Automatically bring up resources around the replicated core data set
  – Scale the system as needed to handle current production traffic
  – Switch over to the new system
    • Adjust DNS records to point to AWS
• Objectives
  – RTO: as long as it takes to detect need for DR and automatically scale up replacement system
  – RPO: depends on replication type
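The order of the disaster steps matters: DNS should only move once the replacement system is up and scaled. One possible way to express that ordering is sketched below. The step names and the callables passed in are hypothetical stand-ins for whatever automation actually starts instances, scales them and updates DNS.

```python
from typing import Callable, List, Tuple

# A step is a label plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_failover(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first one that reports failure."""
    completed: List[str] = []
    for name, action in steps:
        if not action():
            break
        completed.append(name)
    return completed


def pilot_light_plan(
    start_servers: Callable[[], bool],
    scale_out: Callable[[], bool],
    switch_dns: Callable[[], bool],
) -> List[Step]:
    """Order the three disaster steps so that DNS is switched last."""
    return [
        ("start servers around replicated data", start_servers),
        ("scale to production traffic", scale_out),
        ("switch DNS to AWS", switch_dns),
    ]
```

Because the runner stops at the first failed step, traffic is never pointed at a system that did not finish starting or scaling.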

Page 21: Leveraging the Public Cloud for Disaster Recovery

Fully-Working Low Capacity Standby

[Diagram: user or system reaches the primary site (web server, application server, database server, data volume) through Amazon Route 53. On AWS, a low-capacity copy (web server, app server, DB server, data volume) is kept current by data mirroring/replication.]

Page 22: Leveraging the Public Cloud for Disaster Recovery

Fully-Working Low Capacity Standby

[Diagram: user or system, Amazon Route 53, primary-site web, application and database servers with data volume. On AWS: low-capacity web server, app server and DB server with data volume, fed by data mirroring/replication.]

Page 23: Leveraging the Public Cloud for Disaster Recovery

Fully-Working Low Capacity Standby

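[Diagram: user or system reaches AWS through Amazon Route 53. The AWS site grows capacity: additional web, application and database servers join the standby web server, app server and DB server with data volume. Also shown: primary-site web, application and database servers with data volume, and data mirroring/replication.]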

Page 24: Leveraging the Public Cloud for Disaster Recovery

Fully-Working Low-Capacity Standby

[Diagram: user or system, Amazon Route 53, AWS site growing capacity (standby web server, app server and DB server with data volume, plus additional web, application and database servers), primary-site web, application and database servers with data volume, and data mirroring/replication.]

Page 25: Leveraging the Public Cloud for Disaster Recovery

Fully-Working Low-Capacity Standby

• Advantages
  – Can take some production traffic at any time
  – Cost savings (IT footprint smaller than full DR)
• Preparation
  – Similar to Pilot Light
  – All necessary components running 24/7, but not scaled for production traffic
  – Best practice: continuous testing
    • “Trickle” a statistical subset of production traffic to DR site
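The slide does not say how the subset is selected. One common mechanism is weighted routing, where each site receives traffic in proportion to its weight relative to the sum of all weights. The sketch below simulates that proportion in plain Python; the site names and the 95/5 split are assumptions for illustration, and no DNS service is actually called.

```python
import random
from typing import Dict


def pick_site(weights: Dict[str, int], rng: random.Random) -> str:
    """Pick one site with probability weight / sum of all weights."""
    sites = list(weights)
    return rng.choices(sites, weights=[weights[s] for s in sites], k=1)[0]


def observed_share(weights: Dict[str, int], site: str, requests: int, seed: int = 0) -> float:
    """Fraction of simulated requests that land on the given site."""
    rng = random.Random(seed)
    hits = sum(pick_site(weights, rng) == site for _ in range(requests))
    return hits / requests


# Hypothetical split: about 5% of requests trickle to the DR site.
split = {"primary": 95, "dr": 5}
print(observed_share(split, "dr", requests=20000))
```

A small, steady share of real requests exercises the standby continuously, which is what makes this a test rather than a one-off drill.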

Page 26: Leveraging the Public Cloud for Disaster Recovery

Fully-Working Low-Capacity Standby

• In Case of Disaster
  – Immediately fail over most critical production load
    • Adjust DNS records to point to AWS
  – (Auto) Scale the system further to handle all production load
• Objectives
  – RTO: for critical load, as long as it takes to fail over; for all other load, as long as it takes to scale further
  – RPO: depends on replication type

Page 27: Leveraging the Public Cloud for Disaster Recovery

Multi-Site Hot Standby

[Diagram: user or system reaches both sites through Amazon Route 53. The primary site (web server, application server, database server, data volume) and a full-capacity AWS site (multiple web, application and database servers, with app server, DB server and data volume) are kept in sync by data mirroring/replication.]

Page 28: Leveraging the Public Cloud for Disaster Recovery

Multi-Site Hot Standby

• Advantages
  – Can take all production load at any moment
• Preparation
  – Similar to Low-Capacity Standby
  – Fully scaling in/out with production load
• In Case of Disaster
  – Immediately fail over all production load
    • Adjust DNS records to point to AWS
• Objectives
  – RTO: as long as it takes to fail over
  – RPO: depends on replication type

Page 29: Leveraging the Public Cloud for Disaster Recovery

Summary

• Plan
  – Analyze your existing applications and services
  – Find the right approach per case
• Adapt
  – Match your plan to RTO, RPO and Budget
• POC
  – Validate your plan
• Test
  – Periodic testing
• Monitor
  – Ensure continuous operation of all
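"Match your plan to RTO, RPO and Budget" can be read as a filter over the four approaches in this deck: drop those that miss an objective or exceed the budget, then take the cheapest of what remains. The sketch below assumes that reading. Every figure in `options` is a placeholder, not a measurement; real values would come from your own POC and testing.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List, Optional


@dataclass
class Approach:
    name: str
    rto: timedelta       # estimated time to recover with this approach
    rpo: timedelta       # estimated data loss window
    monthly_cost: float  # estimated running cost, in any one currency


def cheapest_fit(
    options: List[Approach], rto: timedelta, rpo: timedelta, budget: float
) -> Optional[Approach]:
    """Cheapest approach meeting both objectives within budget, else None."""
    fits = [
        o for o in options
        if o.rto <= rto and o.rpo <= rpo and o.monthly_cost <= budget
    ]
    return min(fits, key=lambda o: o.monthly_cost, default=None)


# Placeholder estimates only; replace with figures from your own tests.
options = [
    Approach("Backup and Restore", timedelta(hours=24), timedelta(hours=24), 100.0),
    Approach("Pilot Light", timedelta(hours=1), timedelta(minutes=5), 500.0),
    Approach("Low-Capacity Standby", timedelta(minutes=15), timedelta(minutes=5), 1500.0),
    Approach("Multi-Site Hot Standby", timedelta(minutes=1), timedelta(minutes=1), 5000.0),
]
choice = cheapest_fit(options, rto=timedelta(hours=2), rpo=timedelta(minutes=15), budget=2000.0)
print(choice.name if choice else "no approach fits")
```

A result of `None` means the objectives and the budget cannot be met together, so one of them has to be relaxed.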

Page 30: Leveraging the Public Cloud for Disaster Recovery

• goCloud – Emind’s optimal road to the cloud
  – Secure cloud architecture
  – Scalable & high-availability design
  – Customized system deployment
  – Orchestrating cloud and software
  – Cloud operation team
  – Monitoring and alerting
  – 24x7 SLA

Page 31: Leveraging the Public Cloud for Disaster Recovery

Contact
[email protected]
@lahavsavir
054-4321688