ARITNET#3 Disaster Recovery Site: Experiments, 30-31 Mar 2017
1
Trainers: Lecturer ประกาย นาดี (RMUT Isan); Lecturer เชาวลิต สมบรูณ์พัฒนากิจ (RMUT Phra Nakhon); Asst. Prof. อิฐอารัญ ปิติมล (RMUT Thanyaburi)
2
Contents
What’s Disaster Recovery?
Why a Disaster Recovery site? RMUTP case study.
DC-DR Implementation Methods
3
What’s Disaster Recovery?
4
Data Center
Physical Layout
ANSI/TIA-942-2005 Standard
5
Data Center Network Topologies
6
Common Challenges in the Data Center
Improve IT service performance
Respond to growing demand for IT services
Handle and protect rapidly expanding volumes of data
Be agile to deal with emerging IT trends
Business continuity after a disaster
7
“By failing to prepare, you are preparing to fail” Benjamin Franklin
8
Service level agreement (SLA)
9
Business Continuity is generally referred to as the program that develops, exercises, and maintains plans to continue business operations at an acceptable level within a pre-agreed time period after a disaster is declared.
There are two major components of Business Continuity planning:
• Human Environment: Deals with people, communications, processes, and other logistics.
• IT Environment: Deals with data, applications, and their supporting infrastructure.
10
What is Business Continuity? Taxonomy
HA/DR are subsets of Business Continuity
11
High Availability (HA)
HA protects and/or recovers a system or its components against minor outages in a relatively short time frame.
– Recovery is mostly automated to minimize the potential for failure.
– It focuses on uptime rather than recovery time.
– No single point of failure; fault-tolerant systems; seamless failover.
Disaster Recovery (DR)
DR is the ability to continue services during major outages, often with reduced capabilities or performance.
– DR covers the case where the whole metropolitan area is unavailable due to a disaster.
– It deals with the technical/IT aspects of how to continue business after a disaster is declared.
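The "no single point of failure" idea can be made concrete with a rough availability calculation. The sketch below is not from the slides: it assumes redundant components fail independently, and the 99% figure for a single server is a hypothetical input chosen for illustration.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components.

    Assumes failures are independent, so the service is down only
    when every copy is down at the same time.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    return 1.0 - (1.0 - component_availability) ** copies


def downtime_hours_per_year(availability: float) -> float:
    """Expected downtime over a 365-day year for a given availability."""
    return (1.0 - availability) * 365 * 24


if __name__ == "__main__":
    single = 0.99  # hypothetical availability of one server
    for copies in (1, 2, 3):
        a = parallel_availability(single, copies)
        print(f"{copies} copies: availability={a:.6f}, "
              f"downtime={downtime_hours_per_year(a):.2f} h/year")
```

Redundancy of this kind covers minor outages only. If all copies sit in the same site or metropolitan area, a disaster takes them down together, which is the case DR addresses.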
Business Continuity
Business Continuity Life Cycle
• Risk Assessment (RA)
– Identify the most probable, high-impact risks
– Some risks might have an enormous impact but are highly improbable
– Other risks have moderate impact but are highly probable and frequent
• Business Impact Analysis (BIA)
– Classify business processes by criticality
– Determine the cost of downtime
– Map all dependent resources (IT/non-IT assets)
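One simple way to compare a rare, high-impact risk with a frequent, moderate one is to rank risks by expected yearly loss (probability times impact), and to price downtime as hourly cost times outage length. The slides do not prescribe a scoring method, so this is only a sketch under that assumption; the `Risk` class, the risk names, and every number are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # assumed chance of occurring in a year, 0..1
    impact: float       # assumed loss if it occurs, in any currency unit

    @property
    def exposure(self) -> float:
        """Expected yearly loss: probability times impact."""
        return self.probability * self.impact


def rank_risks(risks):
    """Return risks ordered from highest to lowest exposure."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)


def downtime_cost(hourly_cost: float, outage_hours: float) -> float:
    """BIA-style estimate of what an outage costs a business process."""
    return hourly_cost * outage_hours


if __name__ == "__main__":
    risks = [
        Risk("regional flood", probability=0.01, impact=5_000_000),
        Risk("power outage", probability=0.5, impact=200_000),
        Risk("disk failure", probability=0.9, impact=20_000),
    ]
    for r in rank_risks(risks):
        print(f"{r.name}: exposure={r.exposure:,.0f}")
    print("8 h outage at 1,500/h costs", downtime_cost(1_500, 8))
```

With these made-up inputs the frequent power outage ranks above the rare flood, which illustrates why both kinds of risk named above need to be assessed.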
Disaster Recovery Aspects
[Figure: BC Program Lifecycle. Labels: Risk Management, Business Impact Analysis, BC Strategy Development, BC Plan Development, BC Plan Test, Plan Maintenance]
12
Understanding RPO and RTO
[Figure: timeline from 5 a.m. to 7 p.m. with the disaster declared at 10 a.m.; spans labelled Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO)]
RPO: Amount of data lost from a failure, measured as the span of time before the disaster event
RTO: Targeted amount of time to restart a business service after a disaster
13
© 2014 Cisco and/or its affiliates. All rights reserved.
RPO/RTO Definitions
BRKDCT-2487 Cisco Public 14
Recovery Point Objective (RPO) - extent of data loss measured in terms of a time period that can be tolerated by a business process.
It is the maximum allowable data loss following a large-scale disaster.
An RPO defines:
– The point in time to which the system must be recovered after an outage
– The amount of data loss an organization can endure
– Different applications/businesses that may have different recovery objectives
Recovery Time Objective (RTO) indicates the maximum allowable downtime for applications following a large-scale disaster.
It is the time available to recover a disrupted service.
An RTO defines:
– How long it takes to recover services after the disaster
– The amount of downtime an organization can endure
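The two definitions translate directly into a check of whether a given recovery stayed within its targets. This is a minimal sketch rather than a standard tool: the function names and the incident values are hypothetical.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_good_copy: datetime, disaster: datetime) -> timedelta:
    """Data-loss window: everything written after the last good copy is lost."""
    return disaster - last_good_copy


def achieved_rto(disaster: datetime, service_restored: datetime) -> timedelta:
    """Downtime: from the disaster until the service is usable again."""
    return service_restored - disaster


def meets_objectives(last_good_copy: datetime, disaster: datetime,
                     service_restored: datetime,
                     rpo_target: timedelta, rto_target: timedelta) -> bool:
    """True only if both the data-loss window and the downtime are within target."""
    return (achieved_rpo(last_good_copy, disaster) <= rpo_target
            and achieved_rto(disaster, service_restored) <= rto_target)


if __name__ == "__main__":
    # Hypothetical incident: replica 10 minutes stale, service back after 3 hours.
    disaster = datetime(2017, 3, 30, 10, 0)
    ok = meets_objectives(
        last_good_copy=disaster - timedelta(minutes=10),
        disaster=disaster,
        service_restored=disaster + timedelta(hours=3),
        rpo_target=timedelta(minutes=15),
        rto_target=timedelta(hours=4),
    )
    print("objectives met:", ok)
```

Because different applications may have different objectives, the same check would be run per application with its own targets.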
RPO related to RTO (example)
• Last backup Tues at 5:00 PM
• Disaster declared Wed at 3:00 PM
• Help desk system must be returned to service within 72 hours of the disaster
• Its operating state must match 5 PM of the day before the disaster
• RTO = 72 hours
• RPO = 22 hours
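The arithmetic behind this example can be reproduced with plain date subtraction. The calendar dates below are an assumption, since the slide gives only weekdays and times; any Tuesday and following Wednesday would give the same result.

```python
from datetime import datetime, timedelta

# Assumed dates: 28 Mar 2017 was a Tuesday, 29 Mar 2017 a Wednesday.
last_backup = datetime(2017, 3, 28, 17, 0)  # Tuesday, 5:00 PM
disaster = datetime(2017, 3, 29, 15, 0)     # Wednesday, 3:00 PM
rto = timedelta(hours=72)                   # from the slide

# RPO: time between the last usable backup and the disaster.
data_loss_window = disaster - last_backup
# Latest moment the help desk system may return to service.
restore_deadline = disaster + rto

print(data_loss_window.total_seconds() / 3600)  # 22.0
print(restore_deadline.isoformat())
```

Here 22 hours is the gap between the last backup and the disaster, and the 72-hour RTO puts the restore deadline three days after the declaration.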
15
Traditional ‘DR’ implementations
• Geographically separated DR in a colocation (colo) facility
– More CAPEX; difficult to set up and maintain (hardware + colo + tools, etc.)
– Hardware obsolescence
– Several vendors to manage and control
• Remote Backup
– Backups protect the data, not the applications
– After a disaster, the IT team needs time to recover the site, rebuild, etc.
– Unlikely to meet RTO/RPO targets
• Storage Replication
– Very expensive; distance and bandwidth limitations
– Compatibility issues: needs the same storage on both sides
– Complexity
16
Disaster Recovery as a Service (DRaaS)
17
• DR failover to a cloud environment
• Provider runs customers’ production environments out of the cloud during disaster declarations or testing.
• DRaaS benefits over in-house DR solutions for enterprises:
– No need to invest in in-house DR capacity, technology, remote location, etc.
– Less technical expertise required, lower OPEX
– Buy on a pay-per-use basis
– Rates based upon defined RPO/RTO
– Demand flexibility
– No need for idling resources
– High availability, guaranteed SLAs
– Rapid turn-up time
Cloud Based DR Summary
18
Merits
• Reduces IT investment:
– DC space
– IT infrastructure & human resources
– Expertise
• Enables DR methods in SMBs which previously couldn’t afford DR
• Rapid implementation time
• Dynamic expansion capability
• DR automation
• Guaranteed RPO/RTO SLAs
Considerations
• Is data securely transferred to and stored in the cloud?
• How are users authenticated?
• Does the provider meet regulatory requirements?
• Is there enough bandwidth and network capacity to redirect all users to the cloud?
• RPO/RTO requirements vs. costs
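The bandwidth and cost considerations are linked: a tighter RPO means changed data must reach the DR site faster, which needs a bigger link. The sketch below is a rough sizing aid, not a vendor formula. The function name, the 70% link-efficiency default, and the 10 GB change volume are all hypothetical assumptions.

```python
def required_mbps(changed_gb_per_window: float, window_minutes: float,
                  efficiency: float = 0.7) -> float:
    """Link speed in Mbit/s needed to ship the changed data within the RPO window.

    changed_gb_per_window: data changed during one window, in decimal GB.
    window_minutes: the RPO window length.
    efficiency: assumed usable fraction of the link (protocol overhead, sharing).
    """
    if window_minutes <= 0 or not 0 < efficiency <= 1:
        raise ValueError("window must be positive and efficiency in (0, 1]")
    bits = changed_gb_per_window * 8 * 1000 ** 3
    seconds = window_minutes * 60
    return bits / seconds / efficiency / 1e6


if __name__ == "__main__":
    # Hypothetical workload: 10 GB of changes, compared across three RPO targets.
    for rpo_minutes in (240, 60, 15):
        mbps = required_mbps(10, rpo_minutes)
        print(f"RPO {rpo_minutes:>3} min -> about {mbps:.0f} Mbit/s")
```

Under these assumptions, cutting the RPO from 4 hours to 15 minutes multiplies the required bandwidth by 16, which is one reason RPO/RTO requirements have to be weighed against cost.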
Many Implementation Methods
[Figure: primary storage in the production data centers and primary storage at the DRaaS provider, linked by the following methods]
• Remote backups
• Storage array replication
• Host replication
• Application replication
• Hypervisor replication
19
UniNet & RMUT case study.
WAN Infrastructure
High Bandwidth
Network split based on function
RMUT had a reliable DC.
20
21
Thank you.
22