Page 1: SnapMirror Business Continuity (SM-BC)

Tech Data

Manish Thakur, Principal Product Manager
Cheryl George, Technical Marketing Engineer

March 2021
© 2021 NetApp, Inc. All rights reserved. NETAPP CONFIDENTIAL

Page 2: Agenda

• Evolving with business continuity
• Introduce new business continuity solution SM-BC
• Failure scenarios
• The Right BC Solution for You

Page 3: Business Continuity and Disaster Recovery (BCDR) Technology
NetApp SnapMirror

• Data loss avoidance
• Rapid application recovery
• Simplicity in management and orchestration

[Diagram labels: Enterprise applications; Production (Normal); Secondary (Failover); NetApp® SnapMirror®; Public cloud; SMtape; Tape/VTL; BCDR]

NetApp INSIGHT © 2020 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use Only

Page 4: What Is Business Continuity?
Ability for an application to fail over to a secondary copy in storage, without application reconnect or user disruption

• Recovery Point Objective (RPO): time period (amount) of lost data
• Recovery Time Objective (RTO): time required to resume business
• Business continuity with sync replication:
  • RPO=0: guaranteed zero data loss
  • RTO=0: Transparent Application Failover to prevent application disruption

[Chart: data loss before the event and recovery time after the event, on an axis from -36 to 72, comparing Tape/VTL vaulting, Tape/VTL backup, D2D/D2C backup, async replication, and sync replication across backup and recovery, disaster recovery, and business continuity]
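The RPO and RTO definitions on this page can be illustrated with a small calculation. This is only a sketch: the function names `achieved_rpo` and `achieved_rto`, the timestamps, and the 12-hour asynchronous lag are hypothetical values chosen for illustration, not figures from the deck.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_replicated_write: datetime, event: datetime) -> timedelta:
    """Data written after the last replicated write and before the event is lost."""
    return max(event - last_replicated_write, timedelta(0))


def achieved_rto(event: datetime, service_resumed: datetime) -> timedelta:
    """Time the business waits between the event and resumed service."""
    return max(service_resumed - event, timedelta(0))


event = datetime(2021, 3, 1, 12, 0, 0)

# Hypothetical asynchronous replication that last shipped data 12 hours earlier.
async_rpo = achieved_rpo(event - timedelta(hours=12), event)

# Synchronous replication: every acknowledged write is already on both copies.
sync_rpo = achieved_rpo(event, event)

# Hypothetical recovery that takes 90 seconds after the event.
rto = achieved_rto(event, event + timedelta(seconds=90))
```

With synchronous replication the last replicated write coincides with the last acknowledged write, which is why the RPO term collapses to zero.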

Page 5: Evolving with business continuity
Granular Application Availability

Page 6: Design an Optimal Solution
Asymmetric configuration and value from mirror copy

• Consolidate all your workloads on a single cluster
• Protect only the critical SAN workloads
• Secondary cluster hosts the mirror as well as other workloads
• Primary cluster can be a high-end system with an entry-level system as secondary
• Leverage the mirror copy for development and testing

[Diagram labels: Site A (production application); Site B (mirror copy); SnapMirror® Business Continuity between the sites]

Page 7: SnapMirror Business Continuity (SM-BC) Overview
Granular business continuity solution for SAN applications

• No new license: part of data protection bundle
• SAN protocols: FC, iSCSI
• Highly resilient: external mediator for transparent failover
• Continuous Availability: active workloads on both clusters
• Platform Flexibility: any 2-node AFF or ASA clusters
• Easy Administration: SnapMirror® simplicity
• Application support: consistency group; monolithic and distributed applications
• Setup simplicity: ONTAP® System Manager simplicity

Page 8: Continuous Availability for the Application
Synchronous replication with Transparent Application Failover

• Automated failover of business-critical applications
• Simplified application management with granular control
• Consistency Group (CG) for dependent write-order consistency
• Flexibility of DR Test for each application
• Create instantaneous clones of mirror, without impacting application availability
• Optimize cost by using existing ONTAP 9.8 AFF clusters
• Ease of management with intuitive workflows

[Diagram labels: Enterprise Applications (Normal, Automated failover); Primary Datacenter with a Storage Virtual Machine (SVM) holding the Primary Consistency Group (CG) and unmirrored volumes; SnapMirror Synchronous; Secondary Datacenter with an SVM holding the mirror copy]

Page 9: SnapMirror Business Continuity (SM-BC)
Transparent application failover for SAN only

1. NetApp SnapMirror® Synchronous technology for RPO 0
2. Consistency group for application granularity
3. LUN identity is same on both sides
4. Transparent application failover

[Diagram labels: Enterprise applications (Normal, Automated failover); Single virtual device; Primary data center with a storage virtual machine (SVM) holding the primary consistency group and unmirrored volumes; SnapMirror Synchronous; Secondary data center with an SVM holding the mirrored consistency group; ONTAP Mediator at a third site]

Page 10: ONTAP Mediator 1.2
Enables automated failover during disaster

• Linux physical or virtual server running RHEL or CentOS 7.6 – 8.2
• Establish quorum
  • Both ONTAP® storage systems (including nodes) periodically send heartbeat and status of replication
• Avoid split-brain
  • Scenario in which each cluster may simultaneously try to become the master
• Orchestrates automated failover upon detection of failure
  • Switchover of application host with RTO <= 120 seconds
• ONTAP Mediator can manage a total of 10 cluster pairs

[Diagram labels: Primary Datacenter; Secondary Datacenter; Replication Status; ONTAP Mediator at a third site]
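The quorum and split-brain bullets above describe a third-party arbiter. The sketch below models that idea in general terms; it is not the ONTAP Mediator implementation or API. The class `Arbiter`, its methods, and the `HEARTBEAT_TIMEOUT_S` value are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

HEARTBEAT_TIMEOUT_S = 10.0  # hypothetical timeout, not a value from the deck


@dataclass
class Arbiter:
    """Toy third-site arbiter: tracks heartbeats and allows one master at a time."""
    master: Optional[str] = None
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, cluster: str, now: float) -> None:
        self.last_seen[cluster] = now

    def is_alive(self, cluster: str, now: float) -> bool:
        seen = self.last_seen.get(cluster)
        return seen is not None and now - seen <= HEARTBEAT_TIMEOUT_S

    def request_master(self, cluster: str, now: float) -> bool:
        """Grant the master role only if no other live cluster already holds it."""
        if self.master is None or self.master == cluster:
            self.master = cluster
            return True
        if not self.is_alive(self.master, now):
            self.master = cluster  # current master is silent: fail over
            return True
        return False  # refusing here is what prevents split-brain


arb = Arbiter(master="C1")
arb.heartbeat("C1", 0.0)
arb.heartbeat("C2", 0.0)

# Inter-cluster link drops at t=5: both clusters ask to be master.
c1_granted = arb.request_master("C1", 5.0)
c2_granted = arb.request_master("C2", 5.0)

# Later C1 goes silent while C2 keeps sending heartbeats.
arb.heartbeat("C2", 20.0)
c2_granted_after_c1_silent = arb.request_master("C2", 20.0)
```

Because both clusters ask the same arbiter, at most one of them is told to serve I/O when they can no longer see each other.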

Page 11: SM-BC Relationship
Relationship between two consistency groups (CG)

• CG preserves write order across volumes
• Relationship is between CGs across 2 clusters
• CG is categorized as master or mirror
  • Master CG is preferred to serve I/O if both copies are available and connectivity between clusters is lost
• All volumes within a CG are part of the relationship
• A cluster can have multiple master or mirror CGs
• SVMs containing the related CGs can have different namespaces and LIFs

[Diagram labels: Cluster A with Prod - Storage Virtual Machine (Master CG, un-mirrored volumes); Cluster B with NDR - Storage Virtual Machine (Mirror CG). Each volume contains one or more LUNs.]

Page 12: Typical data layout for Enterprise Applications
Oracle, Microsoft SQL Server, vMSC, etc.

FlexVol 1 (datafiles) and FlexVol 2 (log files)
§ Typical enterprise application layout
§ Uses a dedicated volume/LUN for each database
§ Suitable for large critical databases
§ Better for BC, DR and cloning purposes

FlexVol 3 (datafiles in LUN1, log files in LUN2)
§ Small to medium-size non-critical databases
§ Better consolidation
§ Loose application granularity

FlexVol 4 (VMs)
§ Virtualization

FlexVol 5 (binaries)
§ Other application related files

[Diagram label: SVM]

Page 13: Fan-out Enterprise Application Protection

• Primary Datacenter (Production): SVM with the Primary Consistency Group (CG) holding Data, Logs, and VMs
• Local Disaster Recovery Datacenter (Stand-by): SVM` with the Mirrored CG
  • SM-BC with SnapMirror Synchronous (SM-S) replication
  • ~150 km max
  • Round Trip Time (RTT) <10 ms
• Remote Disaster Recovery Datacenter (Stand-by): SVM``
  • Asynchronous SnapMirror
  • FlexClone
• Manual step: map the mirror LUNs to the respective initiator group
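The distance and RTT limits on this page can be related with a rough propagation estimate. This is a sketch under a stated assumption: light in optical fiber takes roughly 5 microseconds per kilometer, a general rule of thumb that the deck does not state. The function names are hypothetical, and real links add switching and protocol delay on top of propagation.

```python
FIBER_DELAY_US_PER_KM = 5.0  # assumption: rule-of-thumb fiber propagation delay
MAX_RTT_MS = 10.0            # limit quoted on the slide: RTT < 10 ms


def propagation_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay only; ignores equipment and protocol overhead."""
    return 2 * distance_km * FIBER_DELAY_US_PER_KM / 1000.0


def link_qualifies(measured_rtt_ms: float) -> bool:
    """Check a measured RTT against the slide's limit."""
    return measured_rtt_ms < MAX_RTT_MS


rtt_at_max_distance = propagation_rtt_ms(150)  # the slide's ~150 km maximum
```

Under that assumption, propagation alone at 150 km accounts for about 1.5 ms of the 10 ms budget, so a measured RTT is the value to check rather than distance.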

Page 14: Host Path to LUNs
ALUA provides host optimal path to the LUN

• Active optimized path to LUN is through cluster hosting master CG
• Both copies of LUN have same ID, hence host sees a single virtual LUN
• LUN can be accessed for read/write through the mirror CG
  • Host has an active non-optimized path to it
• Access through mirror CG is redirected to the master CG
  • This increases latency perceived by host

[Diagram labels: host with ALUA; Cluster A SVM with Master CG (active optimized path); Cluster B SVM with Mirror CG (active non-optimized path); proxy path between clusters; common LUN ID across copies]
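The path preference described above can be sketched as a simple selection rule. This models the general ALUA idea of preferring optimized paths; it is not host multipath code, and `PathState`, `Path`, and `choose_path` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class PathState(Enum):
    ACTIVE_OPTIMIZED = "active optimized"
    ACTIVE_NON_OPTIMIZED = "active non-optimized"
    UNAVAILABLE = "unavailable"


@dataclass
class Path:
    cluster: str
    state: PathState


def choose_path(paths: List[Path]) -> Path:
    """Prefer an active optimized path; fall back to an active non-optimized one."""
    for wanted in (PathState.ACTIVE_OPTIMIZED, PathState.ACTIVE_NON_OPTIMIZED):
        for path in paths:
            if path.state is wanted:
                return path
    raise RuntimeError("no active path to the LUN")


# Both paths lead to the same virtual LUN; Cluster A hosts the master CG.
paths = [
    Path("Cluster B", PathState.ACTIVE_NON_OPTIMIZED),
    Path("Cluster A", PathState.ACTIVE_OPTIMIZED),
]
normal_choice = choose_path(paths)

# If the optimized path disappears, the host can still reach the LUN via the mirror.
paths[1].state = PathState.UNAVAILABLE
fallback_choice = choose_path(paths)
```

The fallback path still works because the mirror side redirects I/O to the master CG, at the cost of extra latency.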

Page 15: SM-BC Replication
Host write op completes only when NVLOG on both clusters is written

• Write to a master CG results in write to mirror cluster
• Master and mirror designation of CG is meant to avoid split brain
• Host receives ACK only upon writes to both master and mirror
• Periodic CG snapshot scheduled on both clusters
• Optimal for host to read/write to master CG

[Diagram: host write to a CG on the cluster hosting the master copy (Cluster A master CG NVRAM, Cluster B mirror CG NVRAM). Step labels: Host Write (1); ONTAP WR (2); Mirror WR (2); ONTAP WR (3); R ACK (4); ACK (5)]

[Diagram: host write to a CG on the cluster hosting the mirror copy (Cluster B mirror CG NVRAM, Cluster A master CG NVRAM). Step labels: Host Write (1); Proxy Write (2); Write (3); Write (4); ACK (5); R ACK (6); ACK (7)]
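The acknowledgement rule on this page (the host sees an ACK only after both sides have logged the write) can be sketched as follows. This is a toy model, not ONTAP code: the `Cluster` class, its `nvlog` list, and `host_write` are hypothetical, and the sketch does not model what a real system does after a failed mirror write.

```python
from typing import List


class Cluster:
    """Toy cluster holding an in-memory stand-in for its NVLOG."""

    def __init__(self, name: str, reachable: bool = True) -> None:
        self.name = name
        self.reachable = reachable
        self.nvlog: List[bytes] = []

    def log_write(self, data: bytes) -> bool:
        if not self.reachable:
            return False
        self.nvlog.append(data)
        return True


def host_write(master: Cluster, mirror: Cluster, data: bytes) -> bool:
    """Return True (ACK to the host) only if both NVLOGs hold the write."""
    logged_on_master = master.log_write(data)
    logged_on_mirror = mirror.log_write(data)
    return logged_on_master and logged_on_mirror


cluster_a = Cluster("Cluster A")  # hosts the master CG
cluster_b = Cluster("Cluster B")  # hosts the mirror CG

acked = host_write(cluster_a, cluster_b, b"block-1")

# If the mirror cannot log the write, the host does not get a synchronous ACK.
cluster_b.reachable = False
acked_without_mirror = host_write(cluster_a, cluster_b, b"block-2")
```

Holding the ACK until both logs are written is what makes RPO 0 possible: any write the host believes is complete exists on both clusters.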

Page 16: SM-BC Transparent Failover on a Disaster
Mediator initiates a transparent failover

• Automated failover initiated by mediator in case of disaster
• Mirror CG changed to master after in-flight write ops committed
• Sole path remains active optimized
  • Erstwhile non-optimized paths become optimized
• Replication from master CG suspended

[Diagram, before: Master CG on Cluster A (active optimized) and Mirror CG on Cluster B (active non-optimized), In Sync. After: Master CG on Cluster B (active optimized)]
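The failover bullets above amount to an ordered state change on the surviving copy. The sketch below records that order in a toy data structure; `CGCopy`, its fields, and `fail_over` are hypothetical names and do not reflect ONTAP internals.

```python
from dataclasses import dataclass


@dataclass
class CGCopy:
    cluster: str
    role: str                  # "master" or "mirror"
    path_state: str            # "AO" = active optimized, "ANO" = active non-optimized
    in_flight_writes: int = 0  # writes received but not yet committed
    replicating: bool = True


def fail_over(mirror: CGCopy) -> CGCopy:
    """Promote the surviving mirror copy in the order listed on the slide."""
    mirror.in_flight_writes = 0  # 1. commit in-flight write ops
    mirror.role = "master"       # 2. mirror CG becomes master
    mirror.path_state = "AO"     # 3. formerly non-optimized paths become optimized
    mirror.replicating = False   # 4. replication is suspended while the peer is down
    return mirror


survivor = fail_over(CGCopy("Cluster B", "mirror", "ANO", in_flight_writes=3))
```

Because the host already had an active path to this copy of the same virtual LUN, it keeps issuing I/O without reconnecting.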

Page 17: VMware vSphere Metro Storage Cluster – Uniform host access configuration
Clustered SAN application

• Active workloads served simultaneously from both clusters
  • Bidirectional replication
• Best practice:
  • Plan for VMs to be stored in separate datastores, locally
  • Otherwise, a VM from the secondary site will incur
    • 2x Round Trip Time (RTT) for writes
    • 1x RTT for reads
  • Then, perform a planned failover (PFO) to avoid using the proxy path

[Diagram labels: vSphere stretched ESX cluster with ESXi1, ESXi2, ESXi3, ESXi4; Site A with VM1 and VM1-vmdk; Site B with VM2 and VM2-vmdk; each vmdk present at both sites]
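The RTT multipliers in the best-practice bullet can be turned into a quick estimate. The multipliers come from the slide; the function name `proxy_latency_ms` and the 1.5 ms sample RTT are hypothetical, and the result covers only the inter-site round trips, not storage service time.

```python
# Multipliers stated on the slide for a VM whose I/O goes through the proxy path.
PROXY_RTT_MULTIPLIER = {"write": 2, "read": 1}


def proxy_latency_ms(rtt_ms: float, op: str) -> float:
    """Inter-site latency added to one I/O served through the proxy path."""
    return PROXY_RTT_MULTIPLIER[op] * rtt_ms


sample_rtt_ms = 1.5  # hypothetical measured RTT between the two sites
write_penalty = proxy_latency_ms(sample_rtt_ms, "write")
read_penalty = proxy_latency_ms(sample_rtt_ms, "read")
```

Keeping each VM on a datastore whose master CG is local, or running a planned failover, removes this penalty by moving the active optimized path to the VM's own site.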

Page 18: Failure scenarios

Page 19: Normal Operations
Steady State

[Diagram labels: Host; ONTAP Mediator; clusters C1 and C2; nodes N1, N2, N3, N4; VS1 and VS2; L1P and L1S (In Sync); path states AO (active optimized) and ANO (active non-optimized)]

Page 20: Normal Operations
Steady State

Parameter: Details
Host access to storage: C1 and C2
SM-BC relationship state: In Sync

[Diagram labels: Mediator; C1 (Primary); C2 (Secondary)]

Page 21: ONTAP Mediator failure

• ONTAP Mediator failure
  1. VM
  2. Link (single or double)

Parameter: Details
SM-BC action upon failure: NA
Host access to storage: C1, C2 (no disruption to I/O)
SM-BC relationship state: In Sync
Failover operation: Not possible (planned or unplanned failover)

[Diagram labels: Mediator; C1 (Primary); C2 (Secondary)]

Page 22: Replication Link failure
Link failure between the sites (Split brain scenario)

§ Replication Link failure
  1. Transient
  2. Persistent (tries 3 times every 3 seconds = 9 seconds)

Parameter: Details
SM-BC action upon failure: No action
Host access to storage: C1 after consensus
SM-BC relationship state: Out of sync
Failover operation: NA

[Diagram labels: Mediator; C1 (Primary); C2 (Secondary)]
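The transient versus persistent distinction above (3 tries, 3 seconds apart, 9 seconds in total) can be sketched as a retry loop. The retry count and interval come from the slide; `classify_link_failure` and its injectable `link_is_up` and `sleep` callables are hypothetical, and whether the real check waits before or after each try is an assumption.

```python
import time
from typing import Callable, List

RETRIES = 3       # from the slide: tries 3 times
INTERVAL_S = 3.0  # from the slide: every 3 seconds


def classify_link_failure(
    link_is_up: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return "transient" if the link recovers within the retries, else "persistent"."""
    for _ in range(RETRIES):
        sleep(INTERVAL_S)  # assumption: wait one interval before each re-check
        if link_is_up():
            return "transient"
    return "persistent"


# Use a recording stand-in for sleep so the example runs instantly.
waited: List[float] = []
verdict_down = classify_link_failure(lambda: False, waited.append)

# A link that comes back on the second check.
checks = iter([False, True])
waited_short: List[float] = []
verdict_flap = classify_link_failure(lambda: next(checks), waited_short.append)
```

Passing `sleep` as a parameter keeps the timing rule testable without actually waiting 9 seconds.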

Page 23: Disaster at Site A

§ SM-BC detects failure at primary and triggers an automatic failover
§ When C1 recovers, automatic resync completes to bring C2->C1 relationship “In Sync”
§ Planned failback to restore normal steady state operations

Parameter: Details
SM-BC action upon failure: Automatic unplanned failover (AUFO)
Host access to storage: C2 after consensus (mirror copy -> active copy)
SM-BC relationship state: Out of sync
Failover operation: Possible

[Diagram labels: Mediator; C1 (Primary); C2 (Secondary)]

Page 24: Disaster at Site B

§ When C2 recovers, automatic resync completes to bring C1->C2 relationship “In Sync”

Parameter: Details
SM-BC action upon failure: No action
Host access to storage: C1 after consensus
SM-BC relationship state: Out of sync
Failover operation: NA

[Diagram labels: Mediator; C1 (Primary); C2 (Secondary)]
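The four failure scenarios in this section share the same parameters, so they can be collected into one lookup structure. The values are copied from the slides; the `Outcome` type, the scenario keys, and `hosts_move_to_c2` are hypothetical names used only for illustration.

```python
from typing import Dict, NamedTuple


class Outcome(NamedTuple):
    action: str
    host_access: str
    relationship_state: str
    failover_operation: str


SCENARIOS: Dict[str, Outcome] = {
    "mediator_failure": Outcome(
        "NA", "C1, C2 (no disruption to I/O)", "In Sync",
        "Not possible (planned or unplanned failover)",
    ),
    "replication_link_failure": Outcome(
        "No action", "C1 after consensus", "Out of sync", "NA",
    ),
    "disaster_at_site_a": Outcome(
        "Automatic unplanned failover (AUFO)", "C2 after consensus",
        "Out of sync", "Possible",
    ),
    "disaster_at_site_b": Outcome(
        "No action", "C1 after consensus", "Out of sync", "NA",
    ),
}


def hosts_move_to_c2(scenario: str) -> bool:
    """True when hosts end up served by C2, the former mirror, in this scenario."""
    return SCENARIOS[scenario].host_access.startswith("C2")


moved = [name for name in SCENARIOS if hosts_move_to_c2(name)]
```

Laid out this way, it is easy to see that only a disaster at Site A moves host access to C2; in every other case C1 keeps serving I/O.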

Page 25: The Right Solution for You

Page 26: Integrated ONTAP Data Protection
Business continuity

Continuous Availability
§ RPO 0
§ RTO 0

SnapMirror Business Continuity (SM-BC): select SAN applications protected
MetroCluster (MCC): all workloads protected

Page 27: Key takeaways

• Zero data loss: SnapMirror Synchronous (SM-S) to achieve zero Recovery Point Objective (RPO)
• Application granularity: Consistency Group (CG) to maintain dependent write-order consistency
• Transparent application failover: near-zero recovery time objective (RTO) for continuous application availability

Page 28: NetApp unlocks the best of cloud