
ARITNET#3 Disaster Recovery Site: Experiments, 30-31 Mar 2017


Trainers: Lecturer ประกาย นาดี (RMUT Isan), Lecturer เชาวลิต สมบรูณ์พัฒนากิจ (RMUT Phra Nakhon), Asst. Prof. อิฐอารัญ ปิติมล (RMUT Thanyaburi)


Contents

What’s Disaster Recovery?

Why a Disaster Recovery site? RMUTP case study

DC-DR Implementation Methods


What’s Disaster Recovery?


Data Center

Physical Layout

ANSI/TIA-942-2005 Standard


Data Center Network Topologies


Common Challenges in the Data Center

Improve IT service performance

Respond to growing demand for IT services

Handle and protect rapidly expanding volumes of data

Be agile to deal with emerging IT trends

Business continuity after a disaster


“By failing to prepare, you are preparing to fail” Benjamin Franklin


Service level agreement (SLA)


Business Continuity generally refers to the program that develops, exercises, and maintains plans to continue business operations at an acceptable level within a pre-agreed time period after a disaster is declared.

There are two major components of Business Continuity planning:

• Human Environment: Deals with people, communications, processes, and other logistics.

• IT Environment: Deals with data, applications, and their supporting infrastructure.


What is Business Continuity? Taxonomy

HA/DR are subsets of Business Continuity


High Availability (HA): HA protects and/or recovers systems or their components against minor outages within a relatively short time frame.
– Recovery is mostly automated to minimize the potential for failure.
– It focuses on uptime rather than recovery time.
– No single point of failure; fault-tolerant systems; seamless failover.
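The standard availability arithmetic shows why HA designs remove single points of failure. In the minimal sketch below, the 99% per-component figure is hypothetical. It also assumes components fail independently, which real systems only approximate.

```python
def series_availability(*components: float) -> float:
    """All components are required, so each one is a single point of failure."""
    result = 1.0
    for a in components:
        result *= a
    return result


def parallel_availability(a: float, copies: int) -> float:
    """Redundant copies: the service is down only if every copy is down.

    Assumes failures are independent (an idealization).
    """
    return 1.0 - (1.0 - a) ** copies


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime in a 365-day year."""
    return (1.0 - availability) * 365 * 24


single = 0.99  # hypothetical availability of one server
print(f"1 server:            {downtime_hours_per_year(single):.1f} h/yr down")
print(f"2 redundant servers: "
      f"{downtime_hours_per_year(parallel_availability(single, 2)):.2f} h/yr down")
print(f"server + switch in series (both 99%): "
      f"{downtime_hours_per_year(series_availability(0.99, 0.99)):.1f} h/yr down")
```

With these assumptions, a second redundant server cuts expected downtime from about 87.6 hours to under one hour per year. Chaining two 99% components in series roughly doubles it.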

Disaster Recovery (DR): DR is the ability to continue services in case of major outages, often with reduced capabilities or performance.
– DR addresses situations in which an entire metropolitan area is unavailable due to a disaster.
– It deals with the technical/IT aspects of how to continue business after a disaster is declared.

Business Continuity

Business Continuity Life Cycle

• Risk Assessment (RA)
– Identify the most probable, high-impact risks
– Some risks might have an enormous impact but are highly improbable
– Other risks have moderate impact but are highly probable and frequent

• Business Impact Analysis (BIA)
– Classify business processes by criticality
– Determine the cost of downtime
– Map all dependent resources (IT/non-IT assets)
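Annualized loss expectancy (ALE) is one common way to rank risks. It is computed as single loss expectancy × annual rate of occurrence, and puts rare-but-enormous risks and frequent-but-moderate risks on the same scale. A BIA can similarly estimate downtime cost as hourly loss × outage hours. This sketch uses those standard formulas with hypothetical risks, costs, and tier thresholds, none of which come from this case study.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    loss_per_event: float   # single loss expectancy (SLE), currency units
    events_per_year: float  # annual rate of occurrence (ARO)

    @property
    def ale(self) -> float:
        """Annualized loss expectancy = SLE * ARO."""
        return self.loss_per_event * self.events_per_year


def downtime_cost(hourly_loss: float, outage_hours: float) -> float:
    """BIA estimate: revenue/productivity lost while a process is down."""
    return hourly_loss * outage_hours


def criticality(hourly_loss: float) -> str:
    """Classify a process by downtime cost (hypothetical thresholds)."""
    if hourly_loss >= 10_000:
        return "critical"
    if hourly_loss >= 1_000:
        return "important"
    return "deferrable"


# Hypothetical risks: one rare and enormous, one frequent and moderate.
risks = [
    Risk("regional flood", loss_per_event=2_000_000, events_per_year=0.01),
    Risk("power outage", loss_per_event=20_000, events_per_year=4),
]
for r in sorted(risks, key=lambda r: r.ale, reverse=True):
    print(f"{r.name}: ALE = {r.ale:,.0f}")

print(criticality(15_000), downtime_cost(15_000, 8))
```

With these placeholder numbers, the frequent power outage (ALE 80,000) outranks the rare flood (ALE 20,000). This is the trade-off the risk assessment bullets describe.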

Disaster Recovery Aspects

Figure: BC Program Lifecycle (Risk Management, Business Impact Analysis, BC Strategy Development, BC Plan Development, and BC Plan Test, with plan maintenance as an ongoing activity)

Understanding RPO and RTO

Figure: timeline from 5 a.m. to 7 p.m. with the disaster declared at 10 a.m.; the Recovery Point Objective (RPO) looks back from that moment and the Recovery Time Objective (RTO) looks forward.

RPO: the amount of data lost in a failure, measured as the time between the last recoverable point and the disaster event.

RTO: the targeted amount of time to restart a business service after a disaster.

© 2014 Cisco and/or its affiliates. All rights reserved.

RPO/RTO Definitions

BRKDCT-2487 Cisco Public

Recovery Point Objective (RPO): the extent of data loss, measured as a period of time, that a business process can tolerate.

It is the maximum allowable data loss following a large-scale disaster.

An RPO defines:
– The point in time to which the system must be recovered after an outage
– The amount of data loss an organization can endure
– Recovery objectives, which may differ between applications or businesses

Recovery Time Objective (RTO) indicates the maximum allowable downtime for applications following a large-scale disaster.

It is the time available to recover a disrupted service.

An RTO defines:
– How long it takes to recover services after the disaster
– The amount of downtime an organization can endure
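Applications can have different objectives, so a single protection scheme may satisfy some applications and fail others. This minimal sketch checks a backup schedule against per-application RPO/RTO. The application names and numbers are hypothetical. It also assumes that worst-case data loss with periodic backups equals the backup interval, ignoring backup run time.

```python
from dataclasses import dataclass


@dataclass
class AppObjectives:
    name: str
    rpo_hours: float  # maximum tolerable data loss, expressed as time
    rto_hours: float  # maximum tolerable downtime


def evaluate(app: AppObjectives, backup_interval_hours: float,
             restore_hours: float) -> dict:
    """Compare one protection scheme against an application's objectives.

    Assumption: a disaster just before the next backup loses a full interval
    of changes, so worst-case data loss equals the backup interval.
    """
    return {
        "app": app.name,
        "rpo_ok": backup_interval_hours <= app.rpo_hours,
        "rto_ok": restore_hours <= app.rto_hours,
    }


# Hypothetical applications with different objectives.
apps = [
    AppObjectives("email", rpo_hours=24, rto_hours=72),
    AppObjectives("student registration", rpo_hours=1, rto_hours=4),
]
# Nightly backups with an estimated 12-hour restore (hypothetical).
for app in apps:
    print(evaluate(app, backup_interval_hours=24, restore_hours=12))
```

Nightly backups satisfy the email objectives but miss both objectives for the registration system, which would need a different method.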


RPO related to RTO (example)

• Last backup: Tuesday at 5:00 PM
• Disaster declared: Wednesday at 3:00 PM
• The help desk system must be returned to service within 72 hours of the disaster
• Its operating state must match 5:00 PM of the day before the disaster
• RTO = 72 hours
• RPO = 22 hours (Tuesday 5:00 PM to Wednesday 3:00 PM)
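The example's arithmetic can be checked with Python's `datetime` module. The calendar dates are hypothetical and chosen only so the weekdays match: 28 March 2017 was a Tuesday and 29 March 2017 a Wednesday.

```python
from datetime import datetime, timedelta

# Hypothetical dates whose weekdays match the example.
last_backup = datetime(2017, 3, 28, 17, 0)        # Tuesday 5:00 PM
disaster_declared = datetime(2017, 3, 29, 15, 0)  # Wednesday 3:00 PM
rto = timedelta(hours=72)  # help desk must be back within 72 hours

# RPO: how far back the restored state lies relative to the disaster.
rpo = disaster_declared - last_backup
# Latest moment the help desk system may return to service.
restore_deadline = disaster_declared + rto

print(f"RPO = {rpo.total_seconds() / 3600:.0f} hours")
print(f"RTO = {rto.total_seconds() / 3600:.0f} hours")
print("Restore deadline:", restore_deadline.strftime("%a %d %b %Y %I:%M %p"))
```

The restore deadline falls on Saturday at 3:00 PM, three days after the declaration.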


Traditional ‘DR’ implementations

• Geographically separated DR in a colocation facility (colo)
– More CAPEX; difficult to set up and maintain (hardware + colo + tools, etc.)
– Hardware obsolescence
– Several vendors to manage and control

• Remote backup
– Backups protect the data, not the applications
– In a disaster, the IT team needs time to recover the site, rebuild, etc.
– It won’t meet RTO/RPO targets

• Storage replication
– Very expensive; distance and bandwidth (BW) limitations
– Compatibility issues; requires the same storage on both sides
– Complexity
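Propagation delay explains much of storage replication's distance limitation. With synchronous replication, each write is acknowledged only after the remote array confirms it. This sketch computes a lower bound on the added write latency. It assumes about 5 µs of one-way delay per kilometre of fiber (light travels at roughly 200,000 km/s in glass) and ignores switching, protocol, and array processing time.

```python
# Light in optical fiber covers roughly 200,000 km/s, i.e. about
# 5 microseconds of one-way delay per kilometre.
FIBER_US_PER_KM = 5.0


def sync_write_penalty_ms(distance_km: float, round_trips: int = 1) -> float:
    """Lower bound on latency added to every write by synchronous replication.

    Each round trip to the remote array costs twice the one-way propagation
    delay. round_trips > 1 models protocols that need several exchanges per
    write. Equipment and protocol processing time is ignored (assumption).
    """
    return round_trips * 2 * distance_km * FIBER_US_PER_KM / 1000.0


for km in (10, 100, 1000):
    print(f"{km:>5} km: at least {sync_write_penalty_ms(km):.1f} ms per write")
```

Because this penalty applies to every write, synchronous replication is usually confined to metro distances. Asynchronous replication, with a non-zero RPO, is used beyond that.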


Disaster Recovery as a Service (DRaaS)


• DR failover to a cloud environment

• Provider runs customers’ production environments out of the cloud during disaster declarations or testing.

• DRaaS benefits over in-house DR solutions for enterprises:
– No need to invest in in-house DR capacity, technology, a remote location, etc.
– Less technical expertise needed; lower OPEX
– Pay-per-use purchasing
– Rates based on the defined RPO/RTO
– Flexibility to meet demand
– No idle resources
– High availability, guaranteed SLAs
– Rapid turn-up time
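A simple annual cost model makes the pay-per-use argument concrete. This sketch assumes a hypothetical DRaaS pricing model: a monthly standby fee for replication and storage, plus a daily fee for each day the provider runs production. All figures are invented for illustration, and real provider pricing will differ.

```python
def in_house_annual_cost(capex: float, lifetime_years: float,
                         annual_opex: float) -> float:
    """Owned DR site: hardware amortized over its lifetime plus running
    costs (space, power, staff), paid whether or not DR is ever invoked."""
    return capex / lifetime_years + annual_opex


def draas_annual_cost(monthly_standby_fee: float, daily_active_fee: float,
                      active_days: float) -> float:
    """Assumed pay-per-use model: standby fee plus charges only for days the
    provider runs production (declared disasters or DR tests)."""
    return 12 * monthly_standby_fee + daily_active_fee * active_days


def break_even_days(in_house: float, monthly_standby_fee: float,
                    daily_active_fee: float) -> float:
    """Active days per year at which DRaaS costs as much as in-house DR."""
    return (in_house - 12 * monthly_standby_fee) / daily_active_fee


# Hypothetical figures, for illustration only.
own = in_house_annual_cost(capex=300_000, lifetime_years=5, annual_opex=40_000)
cloud = draas_annual_cost(monthly_standby_fee=2_000, daily_active_fee=1_500,
                          active_days=10)
print(own, cloud, round(break_even_days(own, 2_000, 1_500), 1))
```

Under these assumptions, DRaaS stays cheaper until the provider runs production for about 51 days a year. The idle in-house site costs the same whether or not it is used.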

Cloud Based DR Summary


Merits

• Reduces IT investment in:
– DC space
– IT infrastructure & human resources
– Expertise

• Enables DR methods in SMBs which previously couldn’t afford DR

• Rapid implementation time

• Dynamic expansion capability

• DR automation

• Guaranteed RPO/RTO SLAs

Considerations

• Is data securely transferred to and stored in the cloud?

• How are users authenticated?

• Does the provider meet regulatory requirements?

• Is there enough bandwidth and network capacity to redirect all users to the cloud?

• RPO/RTO requirements vs. costs
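Rough transfer-time arithmetic can answer the bandwidth questions. This sketch assumes 1 GB = 10^9 bytes and an 80% usable share of the nominal link rate. It also assumes periodic replication, in which a change waits up to one interval and then for that interval's delta to cross the link. The data volumes, link speed, and per-user bandwidth are hypothetical.

```python
def transfer_hours(data_gb: float, link_mbps: float,
                   efficiency: float = 0.8) -> float:
    """Hours to move data_gb (1 GB = 10**9 bytes) over a link_mbps link.

    efficiency is an assumed share of the nominal rate left after protocol
    overhead and competing traffic.
    """
    bits = data_gb * 8e9
    return bits / (link_mbps * 1e6 * efficiency) / 3600


def worst_case_lag_hours(change_gb_per_hour: float, link_mbps: float,
                         interval_hours: float, efficiency: float = 0.8) -> float:
    """Periodic replication: a change made just after one cycle waits a full
    interval, then for that interval's delta to cross the link."""
    t = transfer_hours(change_gb_per_hour * interval_hours, link_mbps, efficiency)
    if t > interval_hours:
        return float("inf")  # link cannot keep up; the backlog keeps growing
    return interval_hours + t


def can_redirect_users(users: int, mbps_per_user: float, link_mbps: float) -> bool:
    """Can the link to the cloud carry all users after failover?"""
    return users * mbps_per_user <= link_mbps


# Hypothetical: 1 TB initial seed, 5 GB/h of changes, 100 Mbps link.
print(f"initial seed: {transfer_hours(1000, 100):.1f} h")
print(f"worst-case lag: {worst_case_lag_hours(5, 100, interval_hours=1):.2f} h")
print(can_redirect_users(2000, 0.5, 1000))
```

If the worst-case lag exceeds the required RPO, the link or the replication interval must change. That requirement feeds directly into the RPO/RTO vs. cost trade-off.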

Many Implementation Methods

Figure: production data centers (primary storage) replicating to a DRaaS provider (primary storage) via:
• Remote backups
• Storage array replication
• Host replication
• Application replication
• Hypervisor replication
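Choosing among these methods means weighing RPO/RTO requirements against cost. This sketch picks the cheapest method whose achievable RPO and RTO meet a requirement. The RPO, RTO, and relative-cost profiles are illustrative placeholders, not vendor data. Real values depend on the product, the data volume, and the link, and have to come from testing or quotes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Method:
    name: str
    rpo_hours: float      # achievable worst-case data loss
    rto_hours: float      # achievable time to restore service
    relative_cost: float  # placeholder cost index


# Placeholder profiles for illustration only.
METHODS = [
    Method("remote backups", rpo_hours=24, rto_hours=48, relative_cost=1),
    Method("hypervisor replication", rpo_hours=0.25, rto_hours=1,
           relative_cost=3),
    Method("storage array replication (synchronous)", rpo_hours=0,
           rto_hours=1, relative_cost=5),
]


def cheapest_method(required_rpo: float, required_rto: float,
                    methods: list[Method] = METHODS) -> Optional[Method]:
    """Cheapest method whose achievable RPO and RTO meet the requirement."""
    candidates = [m for m in methods
                  if m.rpo_hours <= required_rpo and m.rto_hours <= required_rto]
    return min(candidates, key=lambda m: m.relative_cost, default=None)


print(cheapest_method(24, 72).name)  # relaxed objectives
print(cheapest_method(4, 8).name)    # tighter objectives
```

Tightening the objectives moves the choice toward more expensive methods. When no candidate qualifies, the function returns `None`.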


UniNet & RMUT case study.

WAN Infrastructure

High Bandwidth

Network split based on function

RMUT had reliable DCs.


Thank you.
