2008 SRDF/Star for Open Systems SRG
TRANSCRIPT
Copyright © 2008 EMC Corporation. Do not Copy - All Rights Reserved.
SRDF/Star for Open Systems - 1
© 2008 EMC Corporation. All rights reserved.
SRDF/Star for Open Systems
Welcome to SRDF/Star for Open Systems Training.
Copyright © 2008 EMC Corporation. All rights reserved.

These materials may not be copied without EMC's written consent. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

EMC2, EMC, EMC ControlCenter, AlphaStor, ApplicationXtender, Captiva, Catalog Solution, Celerra, CentraStar, CLARalert, CLARiiON, ClientPak, Connectrix, Co-StandbyServer, Dantz, Direct Matrix Architecture, DiskXtender, DiskXtender 2000, Documentum, EmailXaminer, EmailXtender, EmailXtract, eRoom, FLARE, HighRoad, InputAccel, Navisphere, OpenScale, PowerPath, Rainfinity, RepliStor, ResourcePak, Retrospect, Smarts, SnapShotServer, SnapView/IP, SRDF, Symmetrix, TimeFinder, VisualSAN, VSAM-Assist, WebXtender, where information lives, Xtender, Xtender Solutions are registered trademarks; and EMC Developers Program, EMC OnCourse, EMC Proven, EMC Snap, EMC Storage Administrator, Acartus, Access Logix, ArchiveXtender, Authentic Problems, Automated Resource Manager, AutoStart, AutoSwap, AVALONidm, C-Clip, Celerra Replicator, Centera, CLARevent, Codebook Correlation Technology, EMC Common Information Model, CopyCross, CopyPoint, DatabaseXtender, Direct Matrix, EDM, E-Lab, Enginuity, FarPoint, Global File Virtualization, Graphic Visualization, InfoMover, Infoscape, Invista, Max Retriever, MediaStor, MirrorView, NetWin, NetWorker, nLayers, OnAlert, Powerlink, PowerSnap, RecoverPoint, RepliCare, SafeLine, SAN Advisor, SAN Copy, SAN Manager, SDMS, SnapImage, SnapSure, SnapView, StorageScope, SupportMate, SymmAPI, SymmEnabler, Symmetrix DMX, UltraPoint, UltraScale, Viewlets, VisualSRM are trademarks of EMC Corporation.
All other trademarks used herein are the property of their respective owners.
Course Objectives

Upon completion of this course, you will be able to:
List the benefits of SRDF/Star over other replication technologies
Explain the underlying technologies for SRDF/Star:
– Synchronous SRDF consistency groups using RDF-ECA
– SRDF/A Multi-session Consistency (MSC)
– Special SRDF features in support of Star
Explain Concurrent and Cascaded SRDF/Star concepts
Describe the steps needed to perform:
– Normal Operation
– Transient Fault
– Unplanned switch caused by a major outage
The objectives for this course are shown here. Please take a moment to read them.
Course Outline
Module 1 – Introduction and Overview
Module 2 – Underlying technologies that support SRDF/Star
– Existing SRDF features
– Consistency technology and SRDF
– Special RDF and SDDF capabilities in support of SRDF/Star
Module 3 – Using SRDF/Star
The outline for this course is shown here. Please take a moment to review it.
Introduction and Overview of SRDF/Star

Upon completion of this module, you will be able to:
Describe two kinds of Star configurations
Describe the origins of Star and its place in the SRDF family
The objectives for this module are shown here. Please take a moment to read them.
The History of SRDF/Star
2001: Special version of SRDF/AR for a US company
2003: SRDF/A and concurrent SRDF/S and SRDF/A released
2003: A European company in a similar business visited the American company
– After their visit, the Europeans met with EMC Engineering and requested features that led to Concurrent Star
2004: Release of Concurrent SRDF/Star on Mainframes
2005: Release of Concurrent Star on Open Systems
2008: Release of Cascaded SRDF/Star on Open Systems
In 2001, EMC built a special version of a multi-hop Mainframe SRDF/AR (known at the time as SAR, which stood for Symmetrix Automated Replication), for a New York-based financial services company. This version of Mainframe SRDF/AR maintained a differential relationship between the source (site A) and the remote site (site C). If the bunker site (site B) failed, a full resynchronization between A and C was no longer necessary. By virtue of using DeltaMark (SDDF) sessions, an incremental relationship could be maintained between site A and site C.
In 2003, SRDF/A was released and concurrent SRDF/A and SRDF/S became possible. When a large European company (in the same line of business as the American company) paid a visit to their friends in New York, they got the idea for a product with the functionality of Star. Late in 2003, the Europeans came to Hopkinton and had a long meeting with EMC Engineering in which they outlined their requirements.
EMC decided to implement a product as specified by the European customers and call it STAR, which was supposed to be an acronym for Symmetrix Triangulated Automated Replication. It took two years from the first conversation with the customer, including 18 months of development, to produce a GA version of Star on Mainframe in 2004. The Open Systems version was released in 2005. To conform to EMC’s naming architecture for the SRDF products, the name SRDF/Star was chosen.
In 2008 when Cascaded SRDF was released, Star functionality was enhanced to support this feature.
Concurrent SRDF/Star
3-site disaster recovery over extended distances
Concurrent SRDF: source to two concurrent targets
SRDF link between two targets in standby mode
Synchronous and Asynchronous targets can be differentially synchronized if Workload site fails
[Slide diagram: The Workload Site replicates via SRDF/S (< 200 km) to a Nearby Synchronous Target (short distance, zero data lag) and, concurrently, via SRDF/A (> 200 km) to a Remote Asynchronous Target (extended distance, variable data lag of seconds to minutes, no performance impact). The SRDF link between the Sync and Async targets is in standby mode.]
Concurrent SRDF/Star enables concurrent SRDF/S and SRDF/A operations from the same source volumes.
The primary business benefit of Star is that in the event of a workload site outage, it is possible to undertake a differential resynchronization between the two remaining sites followed by the resumption of production at either site.
Concurrent Star can be reconfigured to run in cascaded mode without stopping replication between the Workload and Synchronous target sites. For example, if the link between the Workload and the Asynchronous target sites fails, a reconfiguration would allow the three sites to run in Cascaded Star mode with Star protection.
Concurrent Star Operating States After Failure
Failure Case | Operating State (without Reconfiguration) | Operating State (with Reconfiguration)
1. Link failure between A & B | A (R11) → C (R2) | n/a
2. Site B failure | A (R11) → C (R2) | n/a
3. Link failure between A & C | A (R11) → B (R2) | A (R1) → B (R21) → C (R2), cascaded Star
4. Site C failure | A (R11) → B (R2) | n/a
5. Link failure between B & C | No change; concurrent Star continues | n/a
6. Site A failure | n/a | B (R11) → C (R2) or C (R11) → B (R2)
This is a list of possible actions after a failure occurs in a concurrent Star setup.
1. If the link between A and B fails, it is still possible to run production at A with remote protection available at site C.
2. The same holds true if site B fails.
3. If the link between A and C fails, there are two possibilities. The first is to continue running production at A with remote protection at B. The second is to reconfigure concurrent Star to cascaded Star and run in Star protected mode.
4. If site C fails, the only option is to continue running at site A with remote protection at B.
5. If the link between B and C fails there is no effect on Star operations because the standby links between B and C are not used unless there is a failure at site A.
6. If site A fails, production has to be switched to site B or site C. This necessitates a reconfiguration of the RDF devices. The devices at the site to which production was switched become R1 devices and the remaining site is reconfigured to become R2 targets to the new production site.
The choice of which location to fail over to depends on customer needs and the location of customer resources.
Cascaded SRDF/Star
3-site disaster recovery over extended distances

Cascaded SRDF: source to two cascaded targets
SRDF link between source and async target in standby mode
Depending on the nature of the failure, can be reconfigured to concurrent Star
[Slide diagram: The Workload Site replicates via SRDF/S (< 200 km) to a Nearby Synchronous Target (short distance, zero data lag), which in turn replicates via SRDF/A (> 200 km) to a Remote Asynchronous Target (extended distance, variable data lag of seconds to minutes, no performance impact). The SRDF link between the Workload Site and the Async target is in standby mode.]
Cascaded SRDF/Star was introduced in 2008 with the release of Enginuity 5773. Cascaded RDF allows a synchronous R2 target to also act as a source for SRDF/A. The long distance site in Cascaded RDF uses this source to receive its data feed. In the event of a failure of the workload site, the synchronous target has up-to-date data. The asynchronous target data is not more than two SRDF/A cycles behind the source site data.
Cascaded Star can be reconfigured to run in concurrent mode. If the link between the Workload and Synchronous target sites fails while running in Star protected mode, the Workload and Asynchronous target sites can be directly connected using differential resynchronization.
Cascaded Star Operating States After Failure
Failure Case | Operating State (without Reconfiguration) | Operating State (with Reconfiguration)
1. Link failure between A & B | n/a | A (R11) → C (R2), concurrent
2. Site B failure | n/a | A (R11) → C (R2), concurrent
3. Link failure between A & C | A (R1) → B (R21) → C (R2); no change | n/a
4. Site C failure | A (R1) → B (R21) | n/a
5. Link failure between B & C | A (R1) → B (R21) | A (R11) → B (R2) and A → C (R2), concurrent Star
6. Site A failure | n/a | B (R1) → C (R2) or C (R1) → B (R2)
This is a list of possible actions after a failure occurs in a cascaded Star setup.
1. If the link between A and B fails, it is still possible to run production at A with remote protection available at site C but only after a reconfiguration to concurrent Star.
2. The same holds true if site B fails.
3. If the link between A and C fails, there is no effect on Star operations because the standby links between A and C are not used unless there is a failure of site A.
4. If site C fails, the only option is to continue running at site A with remote protection at B.
5. If the link between B and C fails, there are two possibilities. The first is to continue running production at A with remote protection at B. The second is to reconfigure cascaded Star to concurrent Star and run in Star protected mode.
6. If site A fails, production has to be switched to site B or site C. This necessitates a reconfiguration of the RDF devices. The devices at the site to which production was switched become R1 devices and the remaining site is reconfigured to become R2 targets to the new production site.
The choice of which location to fail over to depends on customer needs and the location of customer resources. As is obvious from this diagram, 3 of the 6 failure scenarios necessitate an RDF reconfiguration in Cascaded Star.
Benefits of SRDF/Star
If one site fails, production can continue without losing remote data protection
If the workload site fails, the two remaining target sites can be incrementally synchronized
If loss of primary data center is the principal risk, choose cascaded SRDF/Star
If loss of primary data center and the loss of the synchronous target are possible risks, choose concurrent SRDF/Star
The events of Sep. 11, 2001 made businesses more aware of the critical need to recover their data after a disaster. A few years ago at Share, a major bank from New York did a presentation entitled “The Effects of 9/11”. After the attacks on Sep. 11, this bank failed their data processing over to their New Jersey site. Later that week they were asked by federal regulators, “How are you protected now?”
As the importance of information continues to increase, companies are increasingly interested in protecting their data and minimizing their down time after a failure. SRDF/Star offers customers the business benefits that are a high priority for institutions with mission-critical data.
Both cascaded and concurrent Star have their uses depending on the application environment. If the loss of the primary data center is the principal concern, cascaded Star is a good choice.
If there is a risk of losing the synchronous target as well as the workload site, concurrent Star is a better choice.
SRDF Solutions Comparison
Solution | Mode | Recommended Distances | RTO | RPO | Bandwidth | Configuration
SRDF/S | Synchronous | ~200 km | Fast | No Data Loss | High | Two Site
SRDF/A | Asynchronous | Unlimited | Fast | sec/min | Medium | Two Site
SRDF/AR Single Hop | Adaptive Copy | Unlimited | Fast | min/hr | Lowest | Two Site
SRDF/AR Multi-hop | Synchronous ==> Adaptive Copy | 1–200 km Sync; 200 km–Unlimited | Fast | No Data Loss | High / Low | Three Site
Concurrent SRDF/S and SRDF/A | Concurrent SRDF/S and SRDF/A | 1–200 km Sync; 200 km–Unlimited | Fast | No Data Loss to sec/min | High / Medium | Three Site
Cascaded SRDF/S and SRDF/A | Cascaded SRDF/S and SRDF/A | 1–200 km Sync; 200 km–Unlimited | Fast | No Data Loss to sec/min | High / Medium | Three Site
SRDF/Star w/ Concurrent SRDF | Concurrent SRDF/S and SRDF/A | 1–200 km Sync; 200 km–Unlimited | Fast | No Data Loss to sec/min | High / Medium | Three Site
SRDF/Star w/ Cascaded SRDF | Cascaded SRDF/S and SRDF/A | 1–200 km Sync; 200 km–Unlimited | Fast | No Data Loss to sec/min | High / Medium | Three Site
SRDF/DM | Adaptive Copy | Unlimited | Data Migration | Data Migration | Low | Migration
SRDF is a mature and proven technology with over 33,000 licenses sold. This slide shows the comparative merits of each SRDF technology.
Module Summary
Key points covered in this module:
Two kinds of Star configurations
Origins of Star and its place in the SRDF family
These are the key points covered in this module. Please take a moment to review them.
Underlying Technologies for SRDF/Star

After completion of this module you will be able to:
Describe SRDF technologies that support Star:
– Dynamic, concurrent, and cascaded RDF devices and groups
– SRDF/Synchronous and SRDF/Asynchronous
– Synchronous SRDF consistency groups managed by the SRDF daemon
– Cycle switching in an SRDF/A Multi-session Consistency (MSC) environment
– MSC Cleanup
– Special use of SDDF sessions in tracking changes
– Half delete, half swap, and special pair creation commands
The objectives for this module are shown here. Please take a moment to read them.
Lesson 1
Upon completion of this lesson, you will be able to:
Describe dynamic, concurrent, and cascaded SRDF devices and groups
The objective for this lesson is shown here.
Dynamic RDF Devices

The symrdf command permits quick creation and deletion of dynamic RDF pairs
Devices can be:
– R1 capable
– R2 capable
– R1 and R2 capable
Dynamic RDF attribute of a device can be examined in the output of symdev show:
# symdev show 95 -sid 35
Dynamic RDF Capability : RDF1_OR_RDF2_Capable
Dynamic RDF devices can only exist in a Symmetrix that has the Dynamic RDF feature enabled. They can be created to be RDF1 capable, RDF2 capable, or RDF1 and RDF2 capable (as shown above). These devices can be used to build dynamic RDF pairs.
Dynamic RDF pairs can be created or dissolved using the symrdf command. They can belong to static or dynamic RDF groups. Below is a command-line example of how a dynamic RDF pair is created:

symrdf createpair -file <device file> -sid <xx> -rdfg <n> -type RDF1 -establish

where
• the device file contains device pairs
• -sid refers to the Symmetrix ID
• -rdfg refers to the RDF group number
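The device file and the resulting command line can be sketched as follows. This is an illustrative helper, not part of SYMCLI; the file name, Symmetrix ID, and device numbers are hypothetical, and the assumed file layout (one local/remote pair per line, '#' comments ignored) is a simplification.

```python
# Illustrative sketch (not EMC code): parse the contents of a
# createpair device file -- one "local remote" device pair per
# line, with blank lines and '#' comments ignored -- and build
# the matching symrdf command line shown above.

def parse_pairs(text):
    """Return a list of (local, remote) Symmetrix device tuples."""
    pairs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        local, remote = line.split()
        pairs.append((local, remote))
    return pairs

def createpair_cmd(device_file, sid, rdfg, rdf_type="RDF1", establish=True):
    """Assemble the symrdf createpair invocation string."""
    cmd = (f"symrdf createpair -file {device_file} "
           f"-sid {sid} -rdfg {rdfg} -type {rdf_type}")
    return cmd + (" -establish" if establish else "")

pairs = parse_pairs("""
# local  remote
0095     00A5
0096     00A6
""")
cmd = createpair_cmd("pairs.txt", 35, 1)
```

Running the helper over the two-line file above yields two device tuples and the createpair command shown on the slide.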
Dynamic RDF Groups
Allow creation and deletion of SRDF groups using the symrdf command
Work over switched SRDF network
The Symmetrix must allow switched RDF:
Switched RDF Configuration State : Enabled
Dynamic RDF Configuration State : Enabled

Sample command:
symrdf addgrp -label <label> -sid <xx> -rdfg <m> -dir <i> -remote_sid <yy> -remote_rdfg <n> -dir <j>
Additional documentation is located in Chapters 3 and 7 of the SYMCLI SRDF manual
While dynamic groups are not essential for SRDF/Star, they are preferable for ease of reconfiguration.
Concurrent RDF

Permits a single source to communicate with two targets
Allows a combination of:
– Synchronous
– Adaptive Copy
– Asynchronous
Two asynchronous connections are not allowed
The Symmetrix must allow concurrent RDF:
Concurrent RDF Configuration State : Enabled
Dynamic RDF Configuration State : Enabled
Additional documentation is located in Chapters 2 and 7 of the SYMCLI SRDF manual
In an SRDF configuration, a single source (R1) device can concurrently be remotely mirrored to two target (R2) devices. This feature, available with Enginuity Version 5568-based Symmetrix arrays and higher, is known as a concurrent RDF configuration and is supported with ESCON, GigE, and Fibre interfaces. It provides the availability of two remote copies at any point in time. It is valuable for duplicate restarts or disaster recovery, or for increased flexibility in data mobility and migrating applications.
Concurrent RDF technology can use two different RA adapters (RAs, RFs, or REs) in the interface link to achieve the connection between the R1 device and its two concurrent R2 mirrors. Each of the two concurrent mirrors must belong to a different RDF (RA) group.
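The mode rule above, any combination of modes on the two mirrors except two asynchronous legs, can be captured in a small check. This is an illustrative sketch; the mode labels are informal shorthand, not SYMCLI keywords.

```python
# Illustrative sketch: the two concurrent mirrors of an R11 may
# run in synchronous, asynchronous, or adaptive copy mode, but
# not both in asynchronous mode. Labels are informal shorthand.

VALID_MODES = {"sync", "async", "acp"}  # acp = adaptive copy

def concurrent_modes_allowed(mirror1, mirror2):
    """Return True if the pair of mirror modes is permitted."""
    if not {mirror1, mirror2} <= VALID_MODES:
        raise ValueError("unknown RDF mode")
    # The only forbidden combination is async + async.
    return not (mirror1 == "async" and mirror2 == "async")
```

For example, sync + async (the SRDF/Star layout) is allowed, while async + async is rejected.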
Cascaded SRDF

[Slide diagram: R1 → R21 → R2]
Single device assumes dual roles of Primary and Secondary SRDF simultaneously
Data received by this device as a Secondary can be transferred automatically by this device as a Primary
RPO on the order of seconds or minutes, compared to an RPO of hours as in the case of SRDF/AR
Can be used to build SRDF/Star configurations
Cascaded SRDF Device
Prior to Enginuity™ version 5773, an SRDF device could be a source device (R1) or a target device (R2), but could not function in both roles simultaneously. Cascaded SRDF introduces the concept of the dual role R1/R2 device, referred to as an R21 device. The R21 device is both an R1 mirror and an R2 mirror.
The first leg in a cascaded configuration can be set to run in synchronous, asynchronous, or adaptive copy modes.
The second leg may run in asynchronous or adaptive copy disk mode. If the first leg is running in asynchronous mode, the second leg may only run in adaptive copy disk mode.
Cascaded SRDF requires additional cache to support SRDF/A in the middle Symmetrix. If multi-session consistency is enabled from the R21 to the R2, there may be some performance degradation between the R1 and the R21 during MSC cycle switching.
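The leg-mode rules in the notes above can be summarized as a small validity check. This is an illustrative sketch with informal mode labels, not SYMCLI syntax.

```python
# Illustrative sketch of the cascaded SRDF mode rules stated above:
#   first leg (R1 -> R21): sync, async, or adaptive copy;
#   second leg (R21 -> R2): async or adaptive copy disk;
#   if the first leg is async, the second leg may only be
#   adaptive copy disk. Labels are informal shorthand.

def cascaded_modes_allowed(first_leg, second_leg):
    """Return True if the leg-mode combination is permitted."""
    if first_leg not in {"sync", "async", "acp"}:
        return False
    if second_leg not in {"async", "acp_disk"}:
        return False
    if first_leg == "async" and second_leg != "acp_disk":
        return False
    return True
```

So sync → async (the SRDF/Star layout) passes, while async → async does not.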
Lesson 2
Upon completion of this lesson, you will be able to:
Describe SRDF/S and SRDF/A
The objective for this lesson is shown here.
SRDF/S Architecture
[Slide diagram: Source and Target Symmetrix connected by SRDF/S links.]
1. I/O write received from host/server into source cache
2. I/O is transmitted to target cache
3. Receipt acknowledgment is provided by target back to cache of source
4. Ending status is presented to host/server
Synchronous SRDF mode is primarily used in campus environments. In this mode, Symmetrix maintains a real-time mirror image of the data from remotely mirrored volumes.
Data on the source (R1) volumes and target (R2) volumes are always fully synchronized. Data movement is at the block level.
The sequence of operations is:
1. An I/O write is received from the host/server into the source cache.
2. The I/O is transmitted to the target cache.
3. A receipt acknowledgment is provided by the target back to the cache of the source.
4. An ending status is presented to the host/server.
Synchronous mode is one of three modes in which SRDF can operate. The other modes are Asynchronous and Adaptive copy. Unlike competitive products, SRDF can be dynamically switched to operate in another mode without interrupting host I/O.
Like all synchronous replication solutions, synchronous SRDF has architectural limitations that must be understood:
The maximum distance over which Synchronous SRDF can be used is limited by application timeouts and speed-of-light issues. Link bandwidth must be sized for peak workload at all times.
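The four-step sequence above can be sketched as a toy model. Plain Python lists stand in for the source and target caches; this illustrates the ordering of the steps only and does not reflect real Enginuity internals.

```python
# Toy model of the four-step SRDF/S write sequence described
# above. Lists stand in for Symmetrix source and target caches;
# the returned list mirrors the numbered steps on the slide.

def srdf_s_write(block, source_cache, target_cache):
    steps = []
    source_cache.append(block)   # 1. write lands in source cache
    steps.append("1: write received into source cache")
    target_cache.append(block)   # 2. sent over the SRDF/S link
    steps.append("2: write transmitted to target cache")
    steps.append("3: target acknowledges receipt to source")
    steps.append("4: ending status presented to host")
    return steps

src, tgt = [], []
srdf_s_write("track-0095", src, tgt)
# After the call, source and target caches hold identical data:
# the host sees completion only once the remote copy exists.
```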
SRDF/A Architecture
SRDF/A performs Write Folding, which transmits only the final writes from the Capture Delta Set

[Slide diagram: Delta Sets cycle from Source to Target; the sequence Capture → Transmit → Receive → Apply repeats.]
– CAPTURE (N): collects application write I/O
– TRANSMIT (N-1): sends the final set of writes to the target
– RECEIVE (N-1): receives writes from the Transmit Delta Set
– APPLY (N-2): once Receive is complete, data is applied to disk
SRDF/A’s architecture delivers replication over extended distances with no performance impact.
SRDF/A uses Delta Sets to maintain a group of writes over a short period of time. Delta Sets are discrete buckets of data that reside in different sections of the Symmetrix cache. Starting at 1, each Delta Set is assigned a numerical value that is one more than the preceding one.
There are four types of Delta Sets to manage the data flow process.
The Capture Delta Set in the source Symmetrix (numbered N in this example), captures (in cache) all incoming writes to the source volumes in the SRDF/A group.
The Transmit Delta Set in the source Symmetrix (numbered N-1 in this example), contains data from the immediately preceding Delta Set. This data is being transferred to the remote Symmetrix.
The Receive Delta Set in the target system is in the process of receiving data from the transmit Delta Set N-1.
The target Symmetrix contains an older Delta Set, numbered N-2, called the Apply Delta Set. Data from the Apply Delta set is being assigned to the appropriate cache slots ready for de-staging to disk. The data in the Apply Delta set is guaranteed to be consistent and restartable should there be a failure of the source Symmetrix.
The Symmetrix performs a cycle switch once data in the N-1 set is completely received, data in the N-2 set is completely applied, and the 30-second minimum cycle time has elapsed. During the cycle switch, a new Delta Set (N+1) becomes the Capture set, N is promoted to the Transmit/Receive set, and N-1 becomes the Apply Delta Set.
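The delta-set mechanics can be sketched as a toy model. Dicts stand in for the delta sets (a dict naturally "folds" repeated writes to the same track), and the source and target arrays are collapsed into one object for brevity; none of this reflects real Enginuity internals.

```python
# Toy model of SRDF/A delta-set cycling with write folding.
# Dicts stand in for the Capture / Transmit-Receive / Apply
# delta sets; source and target are collapsed into one object.

class SrdfAToy:
    def __init__(self):
        self.capture, self.receive, self.apply = {}, {}, {}
        self.disk = {}

    def write(self, track, data):
        # Write folding: rewriting a track within one capture
        # cycle overwrites the earlier value, so only the final
        # write is sent across the link.
        self.capture[track] = data

    def cycle_switch(self):
        # Assumes N-1 is fully received and N-2 fully applied.
        self.disk.update(self.apply)  # destage old Apply set (N-2)
        self.apply = self.receive     # N-1 becomes the Apply set
        self.receive = self.capture   # N goes onto the link
        self.capture = {}             # new set (N+1) starts capturing
```

Two writes to the same track within one cycle reach the target as a single write; after three cycle switches the folded value has been destaged to the remote disk.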
Dependent Writes Ensure Data Consistency
Dependent write logic:
– If ‘A’ is a predecessor and ‘B’ is a dependent write, any I/O ‘B’ that arrives after I/O ‘A’ has completed must be dependent on ‘A’
SRDF/A ensures that:
– ‘A’ and ‘B’ are in the same Delta Set, or
– ‘B’ is in a later Delta Set
These Delta Sets (cycles) of I/Os, not the I/Os themselves, are ordered by SRDF/A
Symmetrix ensures that dependent write relationships are honored during Delta Set switch or Write Folding
Database application consistency forms the backbone of SRDF/A design. Inherently, all database applications are consistent, which means that a database application does not issue a dependent write unless a predecessor write is completed.
For example, a DBMS does not issue a dependent data write unless a predecessor write to the log was successfully completed. EMC’s consistency technology honors this dependent write logic. By honoring write ordering at the time of the Delta Set switch, SRDF/A guarantees dependent write consistency.
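The guarantee can be stated as a checkable invariant: a dependent write must land in the same delta set as its predecessor or a later one. This is an illustrative sketch; the write names and cycle numbers are hypothetical.

```python
# Illustrative invariant check for the dependent-write rule
# above: if write B depends on write A (B was issued only after
# A completed), B's delta-set number must equal or exceed A's.
# Write names and cycle numbers are hypothetical.

def dependent_writes_ok(cycle_of, dependencies):
    """cycle_of maps write-id -> delta-set number; dependencies
    is a list of (predecessor, dependent) pairs."""
    return all(cycle_of[dep] >= cycle_of[pred]
               for pred, dep in dependencies)

# A DBMS log write and its dependent page (data) write:
same_set  = dependent_writes_ok({"log": 7, "page": 7}, [("log", "page")])
violation = dependent_writes_ok({"log": 7, "page": 6}, [("log", "page")])
```

The first case (log and page in the same delta set) satisfies the rule; the second (page write in an earlier delta set than the log write it depends on) is exactly the ordering SRDF/A never produces.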
Lesson 3
Upon completion of this lesson, you will be able to:
Describe Synchronous SRDF Consistency Groups
The objective for this lesson is shown here.
SRDF Daemon (storrdfd)
System process on Unix and Windows
Interacts with:
– Base Daemon (storapid)
– Enginuity Consistency Assist (RDF-ECA)
– Group Name Services (GNS) daemon
Maintains consistency:
– On SRDF/S composite groups with consistency enabled
– Performs cycle switching in SRDF/A when MSC is active
– Performs MSC Cleanup in SRDF/A
Cooperates with daemons running on other hosts
Storrdfd (pronounced “store” R-D-F-D) is a process that runs as a daemon on Unix systems and as a service in Windows. It is referred to as the SRDF daemon and uses the base daemon for all its communications with the Symmetrix, such as the issuing of syscalls.
In an SRDF/S environment, the RDF daemon cooperates with RDF-ECA to maintain consistency for composite groups.
If GNS is enabled on the host, the SRDF daemon interacts with the GNS daemon to acquire composite group definitions. Otherwise, it gets definitions from the SYMAPI database.
In an SRDF/A environment, the SRDF daemon is responsible for cycle switching when Multi-Session Consistency is enabled.
The RDF daemon is designed for full cooperation with other RDF daemons. Any task for which the daemon is responsible, such as an MSC cycle switch, can be initiated by one RDF daemon and completed by another RDF daemon. At no time is there a single point of failure if there are two or more RDF daemons monitoring the same processes.
It is therefore advisable to have more than one host running the SRDF daemon in an environment where the daemon’s services are necessary. Such a configuration provides redundancy in case one of the daemons stops unexpectedly.
Consistency: The Dependent Write I/O Principle

Logical dependency between write I/Os
– Embedded in the logic of an application, operating system, or DBMS
A write I/O is not issued by an application until a prior related write I/O is completed
– A logical dependency, not a time dependency
– Inherent in all Database Management Systems (DBMS)
Page (data) write is a dependent write I/O based on a successful log write
– Power failures create a dependent write consistent image
– Restart transforms dependent write consistent to transactionally consistent
Ensuring ‘dependent write consistency’ is the basic principle behind all EMC Consistency Technology solutions
– SRDF Consistency Groups
– TimeFinder Consistent Split
– Consistent SNAPs and Consistent Clones
– SRDF/A
– Open Replicator for Symmetrix
Almost all commercial applications, such as databases, are inherently consistent by design. EMC’s consistency technology makes it possible for consistency to be maintained when replicas of production data are made.
All logging database management systems use the consistency principles described on this slide to maintain integrity. This is required for the protection against local power outages, loss of local channel connectivity, or storage devices. There is a logical dependency between I/Os built into database management systems, certain applications, middleware tools such as MQ Series, and operating systems.
EMC can create a dependent write consistent local image with its TimeFinder family of products, whereas SRDF Consistency Groups and SRDF/A create a consistent image on one or more remote Symmetrix arrays.
Enginuity Consistency Assist (ECA)
ECA is a feature that works inside the Symmetrix array
Stalls write I/Os to a user-defined list of Symmetrix devices prior to splitting a source volume and its replica
Used for:
– Open Replicator consistent activation
– TimeFinder consistency
  - TF/Mirror consistent split
  - TF/Clone consistent activation
  - TF/Snap consistent activation
Enginuity Consistency Assist is a feature introduced with Enginuity 5x67. It stalls write I/O to a user-provided list of devices prior to a consistent TimeFinder split or a consistent activation of Open Replicator, TimeFinder Clone, or TimeFinder Snap. Reads are allowed to continue during this time. Once the split or activation is complete, I/O is allowed to flow again. The stalling of write I/Os guarantees that the copy of data being split or activated is dependent write consistent.
RDF-ECA
Used with SRDF/S to hold write I/Os to a consistency group until all relevant links are suspended
Interacts with RDF daemons on one or more control hosts to manage consistency
Can replace PowerPath and MF Consistency Group Task to manage consistency on FBA and CKD volumes
Supports synchronous consistency in concurrent and cascaded RDF composite groups
RDF ECA is an extension to ECA released in Enginuity 5x71. It interacts with the RDF daemon to manage consistency of a user-defined RDF consistency group. RDF ECA can manage consistency for CKD and FBA devices.
ECA Window
ECA is activated by Enginuity when:
– Host issues a TimeFinder consistent split or activate command
– Host issues an Open Replicator consistent activation command
– An SRDF/S I/O directed at a consistency group fails to complete on the remote side
At activation, a 30 second timer (ECA window) starts
While the ECA window is open, Enginuity requests host HBAs to retry write I/Os to affected devices
When the desired action (split/activate/suspend) completes, the ECA window is closed and I/O can flow again
If the action fails to complete within 30 seconds, I/O is allowed to flow again, but an error message is logged
In the context of TimeFinder, ECA window is the name given to a 30 second timer that starts when a consistent split or consistent activation is initiated.
In the context of RDF-ECA, the 30 second timer is started by the Symmetrix after it determines that a write I/O to a device in a consistency group cannot complete on the remote array.
Once the ECA timer starts, Enginuity does not accept write I/Os to the affected devices. Instead, it asks the host HBAs to retry the I/O. When the required action completes, the ECA window is closed and I/O is permitted to flow again.
If, for some unexpected reason, the required action does not complete before 30 seconds are up, Enginuity closes the ECA window. It allows I/O to flow again while recording an error message in the host-based log file.
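The window rules above can be summarized in a small Python sketch. This is a hypothetical model: the real timer lives inside Enginuity, and the 30-second value is the only detail taken from the text.

```python
# Sketch of the ECA window disposition rules described above.
ECA_WINDOW_SECONDS = 30

def eca_window(action_duration, log):
    """Return how long writes were held and whether I/O resumed.

    While the window is open, writes get RETRY responses; once the action
    (split/activate/suspend) completes, or the window expires, I/O resumes.
    """
    if action_duration <= ECA_WINDOW_SECONDS:
        # Normal case: the action completes and the window closes.
        return {"held_for": action_duration, "io_resumed": True}
    # Unexpected case: the window expires first; I/O still resumes,
    # but an error is recorded in the host-based log.
    log.append("ECA window expired before action completed")
    return {"held_for": ECA_WINDOW_SECONDS, "io_resumed": True}

log = []
ok = eca_window(action_duration=2, log=log)    # completes inside the window
late = eca_window(action_duration=45, log=log) # exceeds the window
print(ok, late, log)
```

The key property modeled here is that I/O is never held longer than the window, regardless of whether the action succeeded.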
Composite Groups
Similar to Device Groups
– Four types: Regular, RDF1, RDF2, and RDF21
– Can be used for all Control operations (e.g., TimeFinder, SRDF, etc.)
– Can participate in consistency operations such as TimeFinder/Mirror consistent splits and TimeFinder/Snap activate
Different from Device Groups
– Can span multiple RDF groups and Symmetrix arrays
– Managed by RDF daemon if created with –rdf_consistency option and consistency is enabled
– Required for SRDF/A Multi-session Consistency (MSC)
Composite groups are similar to device groups. They can be type RDF1, RDF21, RDF2, or Regular, and used for every action that is available with device groups.
Composite groups can span RDF groups and Symmetrix arrays. For example, a host running a database application spanning two Symmetrix arrays can use a composite group to perform a consistent TimeFinder split of the application data. If the type of the Composite Group is RDF1, RDF21, or RDF2, the CG (Composite Group) can span RDF groups.
Composite groups are a requirement for building SRDF consistency in SRDF/S and SRDF/A. When a CG is created for SRDF consistency using RDF-ECA, the “–rdf_consistency” option must be specified at group creation time.
RDF Daemon and SRDF/S Consistency
A link failure causes Symmetrix to pause writes to the CG devices in that Symmetrix
[Diagram: host I/O to R1 devices in a consistency group (CG) spanning two Symmetrix arrays, replicating over SRDF links to R2 devices on two remote arrays; Capture/Transmit and Receive/Apply delta sets shown; RDF daemon monitoring]
When synchronous RDF consistency is enabled for a consistency group, the RDF daemon polls the Symmetrix every second to monitor the health of the consistency group.
Assume that the links connecting one of the Symmetrix pairs fail. When the source Symmetrix fails to complete writes to the remote devices, it starts the ECA timer window. All subsequent writes to the devices belonging to the composite group in that Symmetrix are turned back with retry requests issued to the host HBA. During this time, no dependent writes are issued by the application, because the host database application has not been notified of the completion of the predecessor write.
RDF Daemon and SRDF/S Consistency (Cont)
Next, the RDF daemon requests logical link suspension of the remaining devices…
[Diagram: same configuration; the RDF daemon suspends the remaining SRDF links]
When the RDF daemon recognizes that one of the Symmetrix pairs has lost connectivity, it requests the remaining Symmetrix arrays to open an ECA window which will hold incoming writes as well.
Once all ECA windows are open and write I/O is stopped to the entire consistency group, the daemon logically suspends the remaining communication links.
RDF Daemon and SRDF/S Consistency (Cont)
…leaving the R2 devices dependent-write consistent
[Diagram: all SRDF links suspended; the R2 devices are dependent-write consistent; RDF daemon monitoring]
Once all the links have been suspended, the ECA windows are closed and the writes to the local arrays are allowed to complete. Note that writes to the first group of devices were held as soon as the links failed and the remote writes did not complete. Thus, if the host was running a database application, no dependent write could have been issued by the host application between the time that links on the first Symmetrix failed and I/O flow was restored by the RDF daemon to all devices. This makes the target site data consistent.
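The full sequence (Enginuity opens the first window on the failed write; the daemon opens windows on the surviving arrays, suspends the links, then closes all windows) can be modeled in a short Python sketch. The dict-based data model is invented for illustration; the real daemon drives Enginuity, not Python objects.

```python
# Sketch of the trip-handling sequence the RDF daemon coordinates.
def handle_link_failure(arrays, failed):
    """Hold writes everywhere, suspend surviving links, then resume I/O.

    arrays: dict mapping array name -> {"link_up": bool, "eca_open": bool}
    """
    events = []
    # Enginuity opens the first ECA window when the remote write fails
    arrays[failed]["link_up"] = False
    arrays[failed]["eca_open"] = True
    events.append(f"{failed}: ECA window opened on remote write failure")
    # The daemon asks every remaining array to open an ECA window as well
    for name, a in arrays.items():
        if a["link_up"] and not a["eca_open"]:
            a["eca_open"] = True
            events.append(f"{name}: ECA window opened by daemon")
    # With all writes held, logically suspend the surviving links
    for name, a in arrays.items():
        if a["link_up"]:
            a["link_up"] = False
            events.append(f"{name}: link suspended")
    # Close every window; host I/O resumes and the R2 image is consistent
    for name, a in arrays.items():
        a["eca_open"] = False
        events.append(f"{name}: ECA window closed, I/O resumed")
    return events

pairs = {"symm1": {"link_up": True, "eca_open": False},
         "symm2": {"link_up": True, "eca_open": False}}
for event in handle_link_failure(pairs, failed="symm1"):
    print(event)
```

Note the ordering: no window closes until every link has been suspended, which is what guarantees no dependent write slips through to only some of the R2 arrays.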
RDF Daemon and MSC Cycle Switching
[Diagram: two SRDF/A sessions in the MSC composite group “CG: MSC”; host I/O feeds Capture delta sets on the R1 arrays, Transmit delta sets cross the SRDF links to Receive/Apply delta sets on the R2 arrays; RDF daemon monitoring signals “Time to perform cycle switch!”]
Daemon monitors RDF devices that belong to MSC group
A second function of the RDF Daemon is to maintain Multi-session Consistency (MSC) in an SRDF/A environment. MSC is important when consistency must be maintained between multiple production applications running on multiple SRDF groups. The example on this slide illustrates how the RDF daemon maintains consistency while cycle switching during normal MSC operations.
The RDF Daemon (or daemons) monitors all groups and manages cycle switching for all R1 Symmetrix arrays whose sessions are managed by MSC.
When the minimum cycle time, which by default is set to 30 seconds, has elapsed, the RDF Daemon verifies that:
– each R1 Symmetrix array has completed transferring the Transmit Delta Set to the R2s, and
– each R2 Symmetrix array has completed applying the Apply Delta Set.
Until the conditions above are satisfied for each RDF group in each Symmetrix array, the cycle switching is not initiated and the present cycle gets elongated.
Once all RDF groups indicate their readiness to switch, the RDF daemon briefly holds writes to the source arrays and switches the cycles, first on the source and then on the target arrays. Cycle switching is an asynchronous process, so the source and target arrays do not all switch at the same instant; they switch one after the other. Host writes are allowed to flow into a source array as soon as that array has switched, and transmit data is allowed to flow into a target array as soon as that array has switched.
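The readiness check the daemon performs at each minimum-cycle-time boundary can be sketched as follows. The field names and return strings are assumptions made for this illustration; only the 30-second default minimum cycle time comes from the text.

```python
# Readiness check sketch for MSC cycle switching.
def ready_to_switch(groups):
    """True when every RDF group has drained its Transmit and Apply delta sets."""
    return all(g["transmit_done"] and g["apply_done"] for g in groups)

def next_action(groups, elapsed, min_cycle_time=30):
    if elapsed < min_cycle_time:
        return "wait"            # minimum cycle time has not elapsed yet
    if not ready_to_switch(groups):
        return "elongate cycle"  # some array is still transferring or applying
    return "switch cycles"       # hold writes briefly; switch R1s, then R2s

groups = [{"transmit_done": True, "apply_done": True},
          {"transmit_done": True, "apply_done": False}]
print(next_action(groups, elapsed=31))  # one group still applying
```

The "elongate cycle" branch is why a single slow RDF group stretches the current cycle for the whole MSC session.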
Lesson 4
Upon completion of this lesson, you will be able to:
Describe MSC Cleanup using the SRDF Daemon
The objective for this lesson is shown here.
MSC Cleanup After an SRDF/A Trip
A trip can occur at different times in the SRDF/A cycle
From the viewpoint of a single R2 Symmetrix, there are only 2 possible states for a receive delta set:
– The receive delta set is incomplete
  - Symmetrix knows it is incomplete, so it is automatically discarded
  - No MSC Cleanup needed
– The receive delta set is complete
  - Symmetrix marks the session as Needing MSC Cleanup
  - Disposition of the delta set depends on the status of the other R2 Symmetrix arrays in the same SRDF/A MSC protected CG
    - If all are complete, it is all right to commit the delta set
    - If any are incomplete, the delta set must be discarded
The third and final function of the RDF daemon is to manage SRDF/A multi-session consistency in the event of a failure of communication between source and target.
When there are multiple Symmetrix arrays or SRDF groups participating in a multi-session consistency group, the Symmetrix sets the “MSC cleanup required” flag if the receive Delta Set was completely received at the time the failure occurred.
A single Symmetrix, with the SRDF/A MSC flag set, cannot determine the correct action to take for a completely received Delta Set without information from other Symmetrix arrays in the SRDF/A MSC protected consistency group.
MSC Cleanup can be invoked by any of the following methods:
– The RDF daemon performs MSC cleanup automatically if it can communicate with the target arrays.
– The API/CLI automatically performs MSC cleanup during the processing of any RDF control command.
– The user can manually execute MSC cleanup through the CLI.
The MSC Cleanup Needed status is exported to user-visible displays such as query output. MSC Cleanup commits receive cycle data in case of failure during cycle switch instead of discarding it unnecessarily.
RDF Daemon and MSC Cleanup
MSC Cleanup is needed in the bottom Symmetrix only
[Diagram: MSC composite group “CG: MSC” with two SRDF/A sessions; the Receive Delta Set on the top R2 array is incomplete, while the Receive Delta Set on the bottom R2 array is complete; RDF daemon monitoring]
Assume that the links between the source and target arrays have tripped. The Receive Delta set in the top array is incomplete while the Receive delta set in the bottom array is complete.
In the top case, because the Receive Delta set is incomplete, the only valid choice for the Symmetrix is to discard it because the dependent write principle only works for complete Delta Sets.
For the bottom case, the Receive Delta set is complete. Since this is an MSC protected group, the Symmetrix cannot decide what to do on its own.
– If all Receive Delta sets were complete, it would be correct to Apply the data.
– However, if any of the Receive Delta sets are incomplete, then the data must be discarded.
– The Symmetrix sets the MSC Cleanup Needed flag.
In the example displayed on this slide, MSC Cleanup is undertaken by one of the three methods mentioned earlier:
– The RDF daemon;
– Any RDF control command issued by the API/CLI;
– An explicit user-issued “symstar cleanup” command.
MSC Cleanup Logic
All SRDF/A MSC sessions in the CG are inventoried– Is the MSC Cleanup Needed flag set?– What are the Apply and Receive Delta set cycle numbers?
MSC Cleanup logic decides what to do (4 possibilities)
#   R2 Symm                           R2 Symm                           Action
1   No Cleanup Needed                 No Cleanup Needed                 All discarded, no MSC Cleanup
2   MSC Cleanup Needed (A=N)          No Cleanup Needed (A=N)           MSC Cleanup Needed Symms must discard
3   MSC Cleanup Needed (A=N)          MSC Cleanup Needed (A=N)          All complete, all are committed
4   MSC Cleanup Needed (A=N-1, R=N)   No Cleanup Needed (A=N, R=N+1)    Failure occurred during a cycle switch, all are committed
Once the RDF daemon on the source side notices a trip event, it runs the MSC cleanup logic on the target arrays if it can communicate with them. The legend A=N means the Apply Delta set is numbered N. Similarly, R=N+1 means that the number of the Receive Delta Set is N+1. Though the table shown here uses two Symmetrix units, the logic works for larger numbers of arrays.
1. In this case, none of the SRDF/A sessions have the “MSC Cleanup Needed” flag set. This occurs when all the Receive Delta sets were incomplete and all were automatically discarded. There is no Cleanup action to take and it is not invoked automatically.
2. Only some Symmetrix arrays have the “MSC Cleanup Needed” flag raised. Also, ALL Apply Delta set numbers are the same. This means that some Symmetrixes had to discard their incomplete Receive Delta Sets. Consequently, all the Symmetrixes needing MSC Cleanup must discard their completely received Delta Sets.
3. All Symmetrixes have the “MSC Cleanup Needed” flag raised. In this case, ALL Apply Delta Set numbers must be the same. This indicates that all Receive Delta Sets are complete and all the Receive Delta Sets can be applied.
4. Only some Symmetrix units have their flag raised. Also, one or more Symmetrixes with the flag raised has a Receive Delta Set number that matches the Apply cycle number for a Symmetrix which discarded its incomplete Receive cycle. This indicates a failure in the middle of a cycle switch. So, all the completely received Receive Delta Sets in the Symmetrix arrays with the flag raised are applied.
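The four cases above can be expressed as a small decision function. The dictionary shape and return strings are invented for illustration; the case logic follows the narration (`flag` is the "MSC Cleanup Needed" flag, `apply`/`receive` are delta set cycle numbers).

```python
# Sketch of the MSC cleanup decision across the R2 arrays of one MSC group.
def msc_cleanup(symms):
    flagged = [s for s in symms if s["flag"]]
    if not flagged:
        # Case 1: every Receive Delta Set was incomplete and already discarded.
        return "no cleanup needed"
    if len(flagged) == len(symms):
        # Case 3: every Receive Delta Set is complete; apply them all.
        return "commit all receive delta sets"
    unflagged = [s for s in symms if not s["flag"]]
    if any(f["receive"] == u["apply"] for f in flagged for u in unflagged):
        # Case 4: a flagged array's Receive cycle number matches another
        # array's Apply cycle number: the failure hit mid cycle switch,
        # so the complete Receive Delta Sets are committed.
        return "commit flagged receive delta sets"
    # Case 2: Apply numbers agree everywhere; the arrays that discarded an
    # incomplete set force the flagged arrays to discard their complete sets.
    return "discard flagged receive delta sets"

print(msc_cleanup([{"flag": True, "apply": 7, "receive": 8},
                   {"flag": False, "apply": 8, "receive": 9}]))
```

Although the table shows two arrays, the same function works unchanged for any number of arrays in the group.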
Lesson 5
Upon completion of this lesson, you will be able to:
Describe special SDDF and RDF features in support of STAR
The objective for this lesson is shown here.
Symmetrix Differential Data Facility
Each Symmetrix logical volume can support up to 16 sessions
SDDF sessions comprise bitmaps that flip a bit for every track that changed since the session was initiated
SDDF sessions are used to monitor changes in:– Clones– Snaps– BCVs– Change Tracker– Open Replicator
Enhanced to support SRDF/Star
Each Symmetrix logical volume is allotted a quota of 16 SDDF sessions. These sessions allow the Symmetrix to track changes using bitmaps, which flip from a zero to a one whenever a monitored track changes.
SDDF sessions are used to monitor changes in BCVs, Clones, Snaps, Change Tracker, and Open Replicator.
SDDF functionality was enhanced for SRDF/Star to enable differential resynchronization between two target sites. Once Star is enabled, two sessions are created and activated at the Synchronous target site, and one SDDF session is created at the Asynchronous target site.
SDDF Session Usage in Concurrent Star
[Diagram: Site A (R11) replicates via SRDF/S to Site B (R2) and via SRDF/A to Site C (R2), with a passive link between B and C]
Site A – RDF daemon from control host manages SDDF sessions
Site B – 2 Active SDDF Sessions per each device
Site C – 1 Inactive SDDF Session per each device
When Star Protection is enabled, two SDDF sessions are created at site B and one SDDF bitmap is created at site C. The bitmaps at site B are always active during normal Star operation. They are alternately marked and cleared after every two or more SRDF/A MSC cycles elapse between sites A and C.
The bitmap at site C stays inactive during normal Star operation.
Concurrent STAR – When Site A Fails
[Diagram: Site A (R11) fails; the SRDF/S link to Site B and the SRDF/A link to Site C stop; the 2 SDDF bitmaps at B are combined with an inclusive OR (IOR) against the 1 SDDF bitmap at C]
SDDF sessions at Site B frozen, since data flow to B stops
Inclusive OR of 2 SDDF bitmaps at B used to resolve track differences between B and C
If the primary site fails, data transmission to both sites stops simultaneously.
Under these circumstances, the data at Synchronous Target B is more recent than the data at Asynchronous target C.
In the course of recovery, an inclusive OR of the two bitmaps is performed at site B. This operation marks all tracks updated in the current bitmap and all tracks updated in the previous bitmap as owed to site C. Since the bitmap initialization at site B occurs every two plus cycles, it is possible that the inclusive OR will result in more than the minimum required tracks being marked as invalid. This is not a problem since by copying a few more tracks than needed, we err on the side of caution.
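The inclusive OR itself is simple bit arithmetic. The sketch below uses Python integers as track bitmaps (the bit patterns are the ones shown on the slide; the helper function is invented for illustration).

```python
# Inclusive OR of the two site-B SDDF bitmaps: one bit per track.
current  = 0b1001011001001   # tracks changed since the last bitmap reset
previous = 0b0001011100100   # tracks changed in the prior interval
owed_to_c = current | previous   # a track is owed if EITHER bitmap saw a change

def tracks_to_copy(bitmap):
    """List the track numbers flagged in the bitmap (bit 0 = track 0)."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]

print(bin(owed_to_c))
print(tracks_to_copy(owed_to_c))
```

Because the OR can only add tracks, never drop them, the result may over-mark slightly, which matches the "err on the side of caution" behavior described above.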
If needed, MSC cleanup is run at site C.
If a business decision is made to run production at site B, the RDF devices at site B are turned into R1 volumes and paired with corresponding R2 volumes at site C. An RDF establish now copies the invalid tracks from site B to site C.
If the decision is to run at site C, the devices at B are turned into R2s and those at site C into R1s. An RDF restore updates the C site with tracks owed by B to C.
Concurrent STAR – Rolling Disaster
[Diagram: Site A (R11) replicates via SRDF/S to Site B and via SRDF/A to Site C; the first failure breaks the A-to-B link, the second failure takes down Site A; the 2 SDDF bitmaps at B and the 1 SDDF bitmap at C are combined with an inclusive OR (IOR)]
When link to Site B fails
– SDDF bitmaps at Site B are frozen since data flow to B stops
– SDDF bitmap and Token Counter at C (not shown in diagram) activated at SRDF/A cycle boundary
– Token counter at Site C counts elapsed cycles since activation
After failure of Site A, inclusive OR of both SDDF bitmaps at B and bitmap at C used to resolve track differences between B and C
The failure described here is often referred to as a rolling disaster, where the first failure is succeeded by a second one. Here, the first fault disrupts the links between A and B. This causes the synchronous consistency group to trip, leaving the data at site B consistent. The SDDF sessions at site B are frozen for later conversion to invalid tracks. Data processing continues at site A, and site C continues to get updated.
When the synchronous link fails, the SDDF session at site C is activated on a cycle boundary just prior to the next cycle switch. This SDDF session records new writes coming into site C. Additionally, a token counter is started at C. It starts counting the number of cycle switches after activation.
Shortly after the first failure, the primary site fails, causing data transmission to site C to stop. If the second failure occurs more than two SRDF/A cycle switches after the first failure (as recorded by the token counter), site C will be more current than site B.
A Star query after the final primary site failure indicates which side is more current.
An inclusive OR between the two SDDF bitmaps at site B and an inclusive OR between the resulting bitmap and the bitmap at site C, creates the invalid track table that must be resolved when the two sides are synchronized.
If data at site C is more current, the synchronization should cause tracks to flow from C to B. If the token counter indicates that B is more current than C, new data flows from B to C.
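The currency decision driven by the token counter reduces to a one-line rule. The site names and the function itself are illustrative; the more-than-two-cycle threshold comes from the text.

```python
# Which side is more current after a rolling disaster?
def more_current_site(cycle_switches_since_trip):
    """Site C pulls ahead only after 2+ SRDF/A cycle switches post-trip."""
    if cycle_switches_since_trip > 2:
        return "C"   # restore: tracks flow from C back to B
    return "B"       # establish: tracks flow from B to C

print(more_current_site(1))   # second failure came quickly: B is more current
print(more_current_site(5))   # A kept updating C for a while: C is more current
```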
SDDF Session Usage in Cascaded Star
[Diagram: Cascaded Star: Site A (R1) replicates via SRDF/S to Site B (R21), which replicates via SRDF/A to Site C (R2), with a passive link between A and C]
Site A – RDF daemon manages 2 Active SDDF sessions per each device
Site C – 1 Inactive SDDF Session per each device
When Star Protection is enabled, two SDDF sessions are created at site A and one SDDF bitmap is created at site C. The bitmaps at site A are always active during normal Star operation. They are alternately marked and cleared after every two or more SRDF/A MSC cycles elapse between sites A and C.
The bitmap at site C stays inactive during normal Star operation.
Cascaded STAR – When Site B Fails
[Diagram: Site B (R21) fails; the 2 SDDF bitmaps at Site A are combined with an inclusive OR (IOR) against the 1 SDDF bitmap at Site C]
If Sites A and C are reconfigured in concurrent mode:
Inclusive OR of 2 SDDF bitmaps at A used to resolve track differences between A and C at reconfiguration time
The failure of site B in Cascaded Star is a major failure, since reconfiguration from cascaded to concurrent Star must be undertaken in order to provide remote data protection. When the link between A and C is activated, the SDDF bitmaps at site A are used to determine the invalid tracks that must be moved from A to C.
Cascaded STAR – When Site A Fails
[Diagram: Site A (R1) fails; Site B (R21) and Site C (R2) remain, connected by the SRDF/A link, with the passive link unused]
SDDF sessions at Site A frozen
Since B and C already have a track table relationship, there is no need for SDDF sessions
If the workload site fails in a cascaded star environment and the decision is made to switch production to either target site, the SDDF sessions are not needed because the differences between the B and C sites are recorded in the track tables.
Half Delete SRDF Pair
Requires 5x71 or later version of Enginuity
Deletes half of the RDF pair relationship
Can be used to dissolve RDF relationships if partner device is unavailable
RDF pair relationship shows up as a half pair
[Diagram: Normal Configuration, Suspended State: R1 paired with R2; after half delete, the R1 becomes a regular 2-Way Mirrored device and the R2 retains its identity]
A half delete operation can be executed on a dynamic RDF pair using SYMCLI commands. After the half delete command is executed, the device in the Symmetrix on the left turns into a regular device, and the one on the right retains its identity. A SYMCLI query shows it as a half pair. The SRDF pair state must be suspended, failed over, split or partitioned before a half delete can be performed.
The half delete of SRDF pairs is used by SRDF/Star in a disaster situation.
The command is also available for general use, but only in special cases. If an existing RDF relationship is rendered null and void by the physical removal of one of the Symmetrix arrays, without the termination of the SRDF relationships, the half delete command can be used to dissolve remaining RDF volumes.
Do not use the half delete command when both arrays in an RDF relationship have visibility to each other.
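The precondition on the pair state can be checked with a trivial lookup. The function is invented for illustration; the list of valid states comes from the text above.

```python
# A half delete is only permitted from these SRDF pair states.
VALID_STATES = {"Suspended", "Failed Over", "Split", "Partitioned"}

def can_half_delete(pair_state):
    """True if the pair state allows a half delete to proceed."""
    return pair_state in VALID_STATES

print(can_half_delete("Suspended"))      # permitted
print(can_half_delete("Synchronized"))   # not permitted: links are active
```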
Half Swap
Changes personality of one side of an RDF relationship
After a half swap
– the RDF pair configuration for the device shows up as “Duplicate”, OR
– the RDF pair configuration shows up as normal if the pair state was “Duplicate”
[Diagrams: BEFORE HALF SWAP – Normal Configuration, Suspended State (R1 paired with R2); AFTER HALF SWAP – Duplicate Configuration (R1 paired with R1). BEFORE HALF SWAP – R1 and R1 in duplicate pair state; AFTER HALF SWAP – Normal Configuration, Suspended State (R1 paired with R2)]
The half swap operation changes the personality of one SRDF volume, irrespective of whether the other RDF volume is visible or not.
There are two uses for the half swap command while reconfiguring devices during a Star action:
1. Sometimes during a site reconfiguration, an R2 device is half swapped so it becomes an R1 device. This makes the pair relationship “duplicate” since there are now two R1 devices in the pair pointing at each other.
2. At other times in the course of a site reconfiguration, a half swap converts a duplicate device pair into a normal device pair by turning one member of the duplicate pair from an R1 to an R2.
Special Create Pair Options
Note: These commands are not available to users
Two forms of RDF pair creation used only in SRDF/Star
createpair with “nocopy” option
– Creates a dynamic RDF pair without copying data
– Declares both sides equal without any tracks being moved
– Used during a planned switch from one workload site to another
createpair with “refresh” option
– Uses SDDF sessions at synchronous and asynchronous targets to perform an incremental resynchronization
The two functions described on this slide were created for the purpose of Star and are not available to users.
Creating a dynamic RDF pair without copying data would risk data corruption if it were not 100% certain that the devices in the pair did, in fact, contain identical data. This function is used in the case of a planned workload site switch, when applications are halted and all three sites are made equal prior to the switch.
Creation of a dynamic RDF pair with an incremental refresh is only possible based on the SDDF bitmaps at the synchronous and asynchronous target sites. This is the key behind SRDF/Star’s ability to switch workload sites without a full refresh.
Module Summary
Key points covered in this module:
– Dynamic, concurrent, and cascaded RDF devices and groups
– SRDF/Synchronous and SRDF/Asynchronous
– Synchronous SRDF consistency groups managed by the SRDF daemon
– Cycle switching in an SRDF/A Multi-session Consistency (MSC) environment
– MSC Cleanup
– Special use of SDDF sessions in tracking changes
– Half delete, half swap, and special pair creation commands
These are the key points covered in this module. Please take a moment to review them.
Using SRDF/Star
Upon completion of this module, you will be able to:
Describe Symmetrix parameters required to run SRDF/Star
List the host software components needed for Star
Explain concurrent SRDF/Star operations
Explain cascaded SRDF/Star operations
The objectives for this module are shown here. Please take a moment to read them.
Minimum Hardware Requirements
3 Symmetrix DMX systems, one for each site
2 SRDF director boards per Symmetrix
Equal number and size of SRDF devices for each Symmetrix at each site
Primary control host with Solutions Enabler at each site from where SRDF/Star will be managed
[Diagrams: Concurrent SRDF/Star Configuration – Symmetrix 1 at Site A (R11) replicates via SRDF/S to Symmetrix 2 at Site B (R2) and via SRDF/A to Symmetrix 3 at Site C (R2), with a passive link between B and C. Cascaded SRDF/Star Configuration – Symmetrix 1 at Site A (R1) replicates via SRDF/S to Symmetrix 2 at Site B (R21), which replicates via SRDF/A to Symmetrix 3 at Site C (R2), with a passive link between A and C]
At EMC, people sometimes use the words “director” and “director board” interchangeably. For the sake of the discussion below, a director board has 8 ports managed by 4 microprocessors. When configured for SRDF, it is recommended practice to assign one microprocessor per RDF connection, leaving the other port on that microprocessor open.
When running concurrent RDF as Star does, it is recommended that at least 2 of the 4 microprocessors on each 8-port director be dedicated to SRDF traffic. This is in accordance with Symmetrix performance engineering guidelines that SRDF/A and SRDF/S traffic should not be allowed to run on the same microprocessor.
Two director boards are required to guarantee redundancy.
In addition to RDF devices, it is highly recommended that an equal number of Clone capable devices (e.g., BCVs) be provisioned for each Symmetrix target to which production could be switched.
It is recommended that redundant control hosts are run at each site from where SRDF/Star will be managed. Redundant hosts allow for redundant RDF daemons, which are necessary to avoid a single point of failure.
Though not shown on this diagram, it is assumed that each site to which production can be switched, is provisioned with hosts capable of running production applications.
System Requirements
Solutions Enabler V6.2 or higher
– Need 6.5 for Cascaded SRDF/Star
Minimum Enginuity level – 5671 or 5771– Need 5773 at Sites A and B for Cascaded Star
Symmetrix level settings– Switched RDF Configuration State is Enabled– Concurrent RDF Configuration State is Enabled– Dynamic RDF Configuration State is Enabled– Concurrent Dynamic RDF Configuration is Enabled– RDF Data Mobility Configuration State is Disabled– RDF Directors are Fibre-Switched or GigE
SRDF Group Level Settings– Prevent Auto Link Recovery is Enabled– Prevent RAs Online Upon Power On is Enabled
The system requirements for Star are listed on this slide. Please take a moment to review them. The information related to the Symmetrix and the SRDF groups can be verified by the use of SYMCLI commands.
SRDF/Star Licensing and Hardware

                         Workload Site           Synchronous Target      Asynchronous Target
Required Hardware        DMX running 5x71/5773   DMX running 5x71/5773   DMX running 5x71
                         2+ remote adapters      2+ remote adapters      2+ remote adapters
                         (GigE or FC)            (GigE or FC)            (GigE or FC)
Recommended Hardware     -                       Available BCV Cap.      Available BCV Cap.
                                                 for all R2s             for all R2s
Required Licenses                                (if failing over        (if failing over
                                                 to this site)           to this site)
                         SE Base license         SE Base license         SE Base license
                         SRDF/Star license       SRDF/Star license       SRDF/Star license
                         SRDF/S license          SRDF/S license          -
                         SRDF/A license          SRDF/A license          SRDF/A license
                         SRDF/CG                 SRDF/CG                 SRDF/CG
Optional (Workload) /
Recommended Licenses     TimeFinder/Clone        TimeFinder/Clone        TimeFinder/Clone
(Targets, if failing
over to this site)       TimeFinder/CG           TimeFinder/CG           TimeFinder/CG
Concurrent Star can be run in environments capable of running Enginuity 5x71. Cascaded Star can be run in environments that support 5773 or later revisions of microcode. If Cascaded Star is never used, the SRDF/CR licenses are not needed.
SRDF/Star for Open Systems - 54
GNS in the SRDF/Star Environment

GNS Advantages
– Consistent common composite group definition maintained for all management hosts
– Reduces possibility of human error when there are several management hosts at each location

GNS Disadvantages
– Concurrent group definitions not propagated over SRDF links

SYMAPI_USE_GNS=ENABLE in options file to start GNS

Command to determine if database is in GNS:
# symcfg -db
...
GNS State : Enabled
When planning an SRDF/Star implementation, a consideration is the use of Global Naming Services. GNS can be started by setting the value of SYMAPI_USE_GNS to ENABLE in the options file on the management host. This file is located on Windows hosts at \Program Files\EMC\SYMAPI\config\options. It is located in /var/symapi/config/options on Unix hosts.
The use of GNS in an SRDF/Star environment can simplify management tasks when there are several management hosts at each site cooperatively managing Star. In such a case, it greatly reduces the chance of errors caused by someone changing a CG definition on one management host, but not on the other.
GNS cannot propagate concurrent CG or DG definitions across SRDF links. Using GNS does not obviate the need for copying the Star definitions file to the other management hosts.
If GNS is enabled, the RDF daemon must be explicitly started on the management host. Details on how to do this are provided on a later page.
More information on GNS is available in the Array Management CLI product guide.
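A minimal sketch of the steps the notes describe, assuming a Unix management host (the options-file path is per the slide; storrdfd is the RDF daemon's service name in Solutions Enabler):

```shell
# Enable GNS in the SYMAPI options file on the management host
echo "SYMAPI_USE_GNS = ENABLE" >> /var/symapi/config/options

# With GNS enabled, start the RDF daemon explicitly
stordaemon start storrdfd

# Confirm GNS is active in the SYMAPI database
symcfg -db
```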
SRDF/Star for Open Systems - 55
Star Options and Internal Definitions Files
Star Options File
– Created with text editor on a host at Workload site
– Used for:
  - Defining site names for the 3 sites
  - Specifying parameters that govern SRDF/Star behavior
  - Creating the SRDF/Star internal definitions file with help of symstar setup command

Star Internal Definitions File
– Created from Star options file
– Copied from Workload site host to other management hosts
– Used by the symstar command
– Should not be modified by user
The Star options file is created by the user with a text editor. It specifies parameters shown on the next page. The setup command translates the contents of the options file and writes them into the Star internal definitions file. This file is used by the symstar command for all its actions. The internal definitions file should not be modified by users. Any changes should be instituted through the options file.
SRDF/Star for Open Systems - 56
Action Categories for SRDF/Star

Normal Operation
– Used for configuration setup
– Connecting, protecting, and enabling configuration
– Isolation of sites

Transient Fault Operation
– Caused by temporary loss of network connectivity or of either target site
– Reset the environment

Unplanned Switch Operation
– Caused by Workload Site fault
– Cleanup
– Unplanned switch, keep local/remote data

Planned Switch Operation
– Purposeful switching of workload to another site
– Halt, Halt -reset
– Planned switch
During normal operation of Star, the list of activities consists of configuring and setting up Star. Connecting, Protecting and Enabling are the steps required to achieve Star protection. Site isolation is available to temporarily isolate a site for maintenance purposes.
A temporary failure caused by an outage of the network or of either remote array is classified as an SRDF/Star transient fault. It does not disrupt the production Workload site, and only requires remote site recovery and protection procedures executed at the Workload site.
A fault caused by a Workload site loss is classified as a disaster. A disaster necessitates an unplanned switch of the workload to either of the remaining remote sites. Even after the move, disaster protection is available because of the asynchronous SRDF relationship created between the remaining remote sites.
A planned switch operation moves the production workload from one site to another in a controlled procedure. It is typically undertaken when returning to the original Workload site after a disaster had forced a move of production activity to one of the target sites.
This type of operation assumes and enforces the following behavior: the customer stops the workload at the current production site, drains and synchronizes both remote sites, halts the system, and then "switches" the workload to either of the remote production sites.
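The planned-switch behavior described above maps onto symstar actions roughly as follows. This is a sketch using this course's composite group and site names; confirm the exact syntax against the symstar manual page for your Solutions Enabler version:

```shell
# Stop the application workload at the current production site first.

# Halt: disable Star consistency protection, stop writes to the R1s,
# and drain invalid tracks and SRDF/A cycles so that all three sites
# hold the same data
symstar -cg ConcStar halt

# Switch the workload to the synchronous target in a planned fashion
symstar -cg ConcStar switch -site ConcStarB

# From the new workload site, connect, protect, and enable to
# re-establish Star protection
```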
SRDF/Star for Open Systems - 57
Actions Used for Normal Operation

setup (workload)
– Reads/verifies the CG definition and options file– Generates the SRDF/Star internal definition file
buildcg (workload or target)– Reads the internal definition file and creates a composite group suitable for the site
connect (workload)– Performs SRDF reconfiguration and starts data flow in Adaptive Copy Disk Mode
disconnect (workload or target)– Suspends SRDF data flow– Places the path in Adaptive Copy Disk Mode
protect (workload)– Transitions to correct mode (sync/async) and enables consistency protection
unprotect (workload)– Deactivates consistency protection and transitions target to Adaptive Copy Disk Mode
enable (workload)– Provides SRDF/Star consistency protection and activates SDDF sessions
disable (workload)– Deactivates SRDF/Star consistency protection and optionally deletes SDDF sessions
isolate (workload)– Isolates a target site and makes the R2 devices R/W enabled
A brief description of the actions used in Star are provided on this and the next two slides.
Unlike RDF commands, which can be issued from the source or target site, Star actions can only be initiated from a particular site. Most actions are allowed from the workload site only, as indicated by the word "workload" in parentheses next to the command.
The disconnect action can be issued from either the source or the target site.
The buildcg action is a utility that assists the user in creating the Star composite group based on the information contained in the internal definition file created at the time of setup. Based on the site from which it is run, buildcg can be used both from the workload or the target sites.
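Taken together, the normal-operation actions above are issued in this order when first bringing up Star protection (the composite group and site names are the examples used later in this course):

```shell
# At the Workload site: generate the internal definition file
symstar -cg ConcStar setup -options ConcStar.opt

# On each target-site management host, after copying the definition file
symstar -cg ConcStar buildcg -site ConcStarB
symstar -cg ConcStar buildcg -site ConcStarC

# Back at the Workload site: connect, protect, and enable
symstar -cg ConcStar connect -site ConcStarB
symstar -cg ConcStar connect -site ConcStarC
symstar -cg ConcStar protect -site ConcStarB
symstar -cg ConcStar protect -site ConcStarC
symstar -cg ConcStar enable
```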
SRDF/Star for Open Systems - 58
Transient Fault and Informational Commands

reset (workload)
– Cleans up internal metadata and Symmetrix cache at remote site after temporary problem (such as loss of connectivity) has been resolved
query (workload or target)– Displays status of the SRDF/Star configuration– Last action performed from that management host
show (workload or target)– Displays SRDF/Star internal definition file contents
– Options selected
– Symmetrix resources
– Optionally, all devices in the configuration
The reset action is used after the loss of a target. It performs MSC Cleanup, if needed. It can only be run from the workload site. The informational commands can be run from any site.
SRDF/Star for Open Systems - 59
Actions for a Planned or Unplanned Switch

cleanup (target)
– Performs MSC Cleanup at the asynchronous target– Allows for “Gold” copy capture prior to resynchronization
switch (target)– Switches workload to a remote site, either synchronous or asynchronous
halt (workload or target)
– Disables Star consistency protection
– Stops application workload from writing to R1 devices
– Allows all invalid tracks and cycles to drain, resulting in all 3 sites having the same data

halt –reset (workload)
– Write enables the R1 devices at a halted workload site
reconfigure (workload)– Changes the Star setup from cascaded to concurrent and vice versa
cleanup is performed on the asynchronous site if the target site is in PathFail;CleanReq state. While reset and some symrdf commands perform MSC Cleanup without requiring the user to issue an explicit cleanup, it is always a good idea to issue a cleanup command after the failure of the asynchronous site.
The switch action is executed to move the workload to either target site. It is used both for a planned as well as an unplanned workload switch.
reconfigure allows the changing of a star configuration from concurrent to cascaded and vice versa.
SRDF/Star for Open Systems - 60
BCV Usage in Star

symstar command does not manage BCVs
BCVs are recommended at target location
BCVs are used to preserve a consistent data copy before resynchronizing a target site in the process of recovering from a link or site failure
Although the symstar command does not manage BCVs at target sites, BCVs are an important piece of Star operation. The purpose of BCVs is to preserve a gold copy of the data at the target site after:
– Loss of connectivity between the source and either target
– Return of connectivity between the source and that target
Using BCVs at the target site preserves a good data copy to guard against a source site disaster during resynchronization.
BCV operations in Star must be managed by the user.
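As a sketch of the user-managed BCV handling described above, using TimeFinder/Mirror syntax against the target-site composite group (this assumes BCVs have already been associated with and established against the R2 devices; verify the options against your TimeFinder documentation):

```shell
# On the target-site management host, before resynchronization begins:
# split the BCVs consistently to preserve a gold copy of the
# point-in-time consistent R2 data
symmir -cg ConcStar split -consistent

# After resynchronization completes and Star protection is re-enabled,
# re-establish the BCVs so a gold copy can be taken at the next incident
symmir -cg ConcStar establish
```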
SRDF/Star for Open Systems - 61
Lesson 1
Upon completion of this lesson, you will be able to:
Describe concurrent Star operations
The objective for this lesson is shown here.
SRDF/Star for Open Systems - 62
SRDF/Star Site Designations
Workload Site: ConcStarA
Synchronous Target Site: ConcStarB
Asynchronous Target Site: ConcStarC
[Diagram: Host I/O enters the Workload Site (Symmetrix ID 000190300992, R1), which replicates to the Synchronous Target Site (Symmetrix ID 000190300994, R2) and the Asynchronous Target Site (Symmetrix ID 000190103734, R2). Labeled RDF groups are 5, 6, 20, 21, 7, and 8, with recovery links between the two target sites.]
The Star documentation uses the names Workload Site, Synchronous Target, and Asynchronous Target to identify the three sites participating in Star. These names are functional descriptors of the sites and not rigidly tied to a geographical location.
The names ConcStarA, ConcStarB, and ConcStarC refer to this specific Star configuration. At customer sites these names could correspond to geographical locations, e.g., New York, New Jersey, and London. In this Star configuration, the Workload Site could move to ConcStarB or ConcStarC after a switch operation.
The Symmetrix IDs are displayed in the diagram. The RDF group number on Symmetrix 92 connecting it to Symmetrix 94 is 5. The RDF group number on Symmetrix 92 connecting it to Symmetrix 34 is 6. The recovery groups between Symmetrix arrays 34 and 94 must be empty. They are used in the event of a workload site disaster.
SRDF/Star for Open Systems - 63
Star Options File
The Star options file is created on host1 (at Workload site)
It names the 3 sites and assigns values to SRDF/Star parameters
Example:
# cat ConcStar.opt
SYMCLI_STAR_WORKLOAD_SITE_NAME = ConcStarA
SYMCLI_STAR_SYNCTARGET_SITE_NAME = ConcStarB
SYMCLI_STAR_ASYNCTARGET_SITE_NAME = ConcStarC
SYMCLI_STAR_ADAPTIVE_COPY_TRACKS = 30000
SYMCLI_STAR_ACTION_TIMEOUT = 1800
SYMCLI_STAR_TERM_SDDF = No
SYMCLI_STAR_ALLOW_CASCADED_CONFIGURATION = Yes
While the names ConcStarA, ConcStarB, and ConcStarC are used for this example, names at customer sites could be chosen according to other criteria, such as geographic locations: New York, New Jersey, and London.
The adaptive copy tracks value is the number of invalid tracks that must accumulate before transitioning from Adaptive Copy mode into synchronous or asynchronous mode. The default is 30,000.
The action timeout value is the maximum time (in seconds) that the system waits for a particular condition before returning a time-out failure. The wait condition may be one of:
– Time to achieve Star consistency after a symstar enable command
– Time for a site to reach protected state after a symstar protect command
The default is 1800 seconds (30 minutes). The smallest value allowed is 300 seconds (5 minutes).
Setting SYMCLI_STAR_TERM_SDDF to Yes terminates SDDF sessions any time Star is disabled. Setting this option to No deactivates the SDDF sessions instead of terminating them. For performance reasons, the default is No.
The last option refers to whether cascaded star configuration should be allowed.
SRDF/Star for Open Systems - 64
Create CG and Define SRDF Group Names

Create a composite group named ConcStar whose consistency will be managed by the RDF daemon:
symcg create ConcStar -type RDF1 -rdf_consistency

Add all concurrent RDF devices belonging to Groups 5 and 6 to the composite group called ConcStar (adding devices from RDFG 5 includes devices in RDF Group 6):
symcg -cg ConcStar addall dev -sid 92 -rdfg 5

Assign the name ConcStarB to the devices belonging to RDFG 5:
symcg -cg ConcStar set -name ConcStarB -rdfg 92:5

Assign the name ConcStarC to the devices belonging to RDFG 6:
symcg -cg ConcStar set -name ConcStarC -rdfg 92:6

In the event of failure, devices from Group 5 on Symm 92 will be inherited by Group 20 on Symm 94:
symcg -cg ConcStar set -recovery_rdfg 20 -rdfg 92:5

In the event of failure, devices from Group 6 on Symm 92 will be inherited by Group 21 on Symm 34:
symcg -cg ConcStar set -recovery_rdfg 21 -rdfg 92:6
[Diagram: R11 devices at site A (Symm 92) pair with R2 devices at site B (Symm 94) over RDF group 5 and at site C (Symm 34) over group 6; recovery groups 20 (on Symm 94) and 21 (on Symm 34) link the target sites, and groups 7 and 8 also appear at the target arrays.]
This graphic illustrates one group of devices that belong to SRDF groups 5 and 6. Group 5 refers to the RDF group which is attached to the Synchronous target. Group 6 connects the same devices to the asynchronous target.
The commands shown create a composite group called ConcStar and place all devices in the RDF groups 5 and 6 in that composite group.
The Recovery group 20 at the Synchronous Target site connects to group 21 at the Asynchronous target site. It must be connected but not contain any devices. The assignment of the recovery group numbers tells SRDF/Star which SRDF groups are used between the Synchronous and Asynchronous targets to communicate with each other in the event of a Workload site failure.
The commands specify that, in the event of a failure:
– The RDF group 20 at the Synchronous site ConcStarB inherits devices in the RDF group 5 in ConcStarA.
– The RDF group 21 at the Asynchronous site ConcStarC inherits devices in the RDF group 6 in ConcStarA.
Since the same devices belong to groups 5 and 6 in ConcStarA, it follows that group 20 in ConcStarB is connected to group 21 in ConcStarC.
SRDF/Star for Open Systems - 65
Perform Setup and Copy File
[Diagram: site ConcStarA (R11) is connected to ConcStarB and ConcStarC (R2) via RDF groups 5 and 6; recovery groups 20 and 21 link the target sites.]

Command issued to set up Star at Workload site ConcStarA:
symstar -cg ConcStar setup -options ConcStar.opt
This creates a Star internal definition file with the CG name ConcStar in the directory:
- /var/symapi/config/STAR/def (Unix)
- C:\Program Files\emc\symapi\config\STAR\def (Windows)
ConcStar is then copied to management hosts at sites ConcStarB and ConcStarC
Command issued on management host at Synchronous Target ConcStarB:
symstar -cg ConcStar buildcg -site ConcStarB

Command issued on host at Asynchronous Target ConcStarC:
symstar -cg ConcStar buildcg -site ConcStarC
The setup command reads and verifies the CG definition and options file. If the Enginuity, Solutions Enabler, and Symmetrix pre-requisites have not been met, the setup command fails. A successful execution of the setup command generates the SRDF/Star internal definition file.
This internal definition file resides on the host at the Workload site in the directory locations shown on this slide. This file should be copied manually from the host on the source site to the management hosts on the target site(s) to the same directory location.
The buildcg command can be issued at each remote site host to which the internal Star definition file, created in the setup state, is copied. This command builds a matching R2 composite group at each target site. This composite group is used in the event of failure.
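On Unix hosts, the manual copy could look like this (the hostnames hostB and hostC are hypothetical; any file-transfer method that preserves the path works):

```shell
# From the Workload-site host, push the internal definition file to the
# target-site management hosts, into the same directory on each
scp /var/symapi/config/STAR/def/ConcStar hostB:/var/symapi/config/STAR/def/
scp /var/symapi/config/STAR/def/ConcStar hostC:/var/symapi/config/STAR/def/
```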
SRDF/Star for Open Systems - 66
Connect Both Target Sites
symstar –cg ConcStar connect –site ConcStarC
[Diagram: each connect action transitions a target site from the Disconnected to the Connected state; ConcStarA (R11) connects to ConcStarB over RDF group 5 and to ConcStarC over group 6, with recovery groups 20 and 21 between the targets.]
Each connect command
• Performs the commands to reconfigure the SRDF devices
• Establishes the SRDF devices in Adaptive Copy Disk Mode
• Transitions the site in question to a ‘Connected’ state
symstar –cg ConcStar connect –site ConcStarB
The next step is to connect the Synchronous and Asynchronous target sites using the connect command. If this command is executed at the beginning of a new Star setup, the connect command begins to establish the SRDF pairs in Adaptive Copy Disk mode if synchronization is necessary.
Later on, we will see that if the Disconnected state is reached from the PathFail state after a failure or from a Halted state, the connect command reconfigures the RDF devices (if needed), and then starts the adaptive copy synchronization.
SRDF/Star for Open Systems - 67
Star Query After Connect

# symstar query -cg ConcStar
Site Name                      : ConcStarA
...
Workload Site                  : ConcStarA
1st Target Site                : ConcStarB
2nd Target Site                : ConcStarC
Composite Group Name           : ConcStar
Composite Group Type           : RDF1
Workload Data Image Consistent : Yes
System State:
{
  1st_Target_Site : Connected
  2nd_Target_Site : Connected
  STAR            : Unprotected
}
...
2nd Target Site Information:
{
  Source Site Name            : ConcStarA
  Target Site Name            : ConcStarC
  RDF Consistency Capability  : MSC
  RDF Consistency Mode        : NONE
  Site Data Image Consistent  : No
...
An excerpt from the Star query command shows the state of the configuration. The 1st target site in the query refers to ConcStarB and the 2nd target site refers to ConcStarC. Both sites are in the connected state.
The ellipsis represents truncation or omission of output.
Another important piece of information is available in the section under 2nd target information. Here we see that ConcStarA is the source for ConcStarC, the asynchronous target. This means that this is a concurrent Star configuration.
If this had been a cascaded star configuration, the source for the asynchronous target would have been ConcStarB.
SRDF/Star for Open Systems - 68
Protect Both Target Sites
The protect command will:
• Set SRDF mode to synchronous for site B, asynchronous for C
• Enable SRDF/S Consistency Protection for site ConcStarB
• Enable MSC protection for site ConcStarC
• Transition each site to the ‘Protected’ state
[Diagram: each protect action transitions a target site from the Connected to the Protected state; ConcStarA (R11) is connected to ConcStarB over RDF group 5 and to ConcStarC over group 6, with recovery groups 20 and 21 between the targets.]
symstar –cg ConcStar protect –site ConcStarC
symstar –cg ConcStar protect –site ConcStarB
The protect command issued on the Synchronous target site first checks to see if the number of invalid tracks is lower than the threshold specified in the SYMCLI_STAR_ADAPTIVE_COPY_TRACKS parameter in the Star Options file.
If the invalid track count is lower than the specified threshold, the command switches the SRDF mode to synchronous and enables RDF-ECA consistency.
If the number of invalid tracks is above the specified threshold, the protect command waits for the invalid track count to fall below the threshold before it executes.
The protect command issued on the Asynchronous target site switches the SRDF mode to asynchronous and enables multi-session consistency if the invalid track count between the Workload site and Asynchronous target site is less than the invalid track count specified in the option file. Otherwise, it waits for the invalid track count to fall below that threshold and then switches the SRDF mode and enables consistency.
SRDF/Star for Open Systems - 69
Star Query After Protect

# symstar query -cg ConcStar
Site Name                      : ConcStarA
...
Workload Site                  : ConcStarA
1st Target Site                : ConcStarB
2nd Target Site                : ConcStarC
Composite Group Name           : ConcStar
Composite Group Type           : RDF1
Workload Data Image Consistent : Yes
System State:
{
  1st_Target_Site : Protected
  2nd_Target_Site : Protected
  STAR            : Unprotected
}
As seen earlier, the excerpt from the Star query command shows the state of the configuration: both target sites are now in the Protected state, while Star itself is still Unprotected.
SRDF/Star for Open Systems - 70
Enable SRDF/Star Protection
symstar -cg ConcStar enable
• Creates and activates SDDF sessions at the target sites
• RDF daemon "ties together" the sync and async consistency
• Target device state allows for future differential resynchronization
• Transitions the Star environment to a ‘Star Protected’ state
[Diagram: SRDF/Synchronous replication from ConcStarA to ConcStarB with synchronous consistency group protection, SRDF/Asynchronous replication to ConcStarC with Multi-Session Consistency group protection, SDDF sessions at all three sites, and SRDF/A recovery links between the target sites; the enable action transitions the environment from Protected to Star Protected.]
The enable command creates the SDDF sessions at the Synchronous and Asynchronous target sites and activates sessions at the Synchronous target. Once the sessions are set up, it becomes possible to differentially resynchronize the Synchronous and Asynchronous target sites and Star protection is achieved.
SRDF/Star for Open Systems - 71
Query After Enabling Star

# symstar query -cg ConcStar
Site Name                      : ConcStarA
...
Workload Site                  : ConcStarA
1st Target Site                : ConcStarB
2nd Target Site                : ConcStarC
Composite Group Name           : ConcStar
Composite Group Type           : RDF1
Workload Data Image Consistent : Yes
System State:
{
  1st_Target_Site : Protected
  2nd_Target_Site : Protected
  STAR            : Protected
}
Last Action Performed : Enable
Last Action Status    : Successful
This query output shows that Star protection is enabled. Consequently, differential resynchronization between ConcStarB and ConcStarC is possible in the event of a disaster.
SRDF/Star for Open Systems - 72
State Flow Diagram: Transient Fault
After a ‘PathFail’ condition occurs, it is recommended to capture the “gold copy” after the reset and prior to the connect command.

Firm lines indicate user actions; dotted lines indicate fault occurrences.
[State flow: Disconnected -> (connect) -> Connected -> (protect) -> Protected -> (enable) -> Star Protected. A transient fault from any of the protected states leads to PathFail; reset returns the failed site to Disconnected.]
The next section deals with the events following a transient fault. A transient fault does not disrupt the production workload site. Thus, remote site recovery and protection procedures can be executed at the workload site.
After a transient fault, the Star state changes from Star enabled to PathFail. In the PathFail state, there is no data flow between the Workload and Target site. The data at both sites is consistent, since consistency protection was in force.
SRDF/Star for Open Systems - 73
Transient Fault: Async Link Failure
[Diagram: the SRDF/Asynchronous links between Workload Site ConcStarA and Asynchronous Target ConcStarC fail; SRDF/Synchronous replication to ConcStarB continues, and ConcStarC transitions from Protected to PathFail while the environment leaves the Star Protected state.]
Asynchronous Link Failure
• RDF daemon performs an MSC “trip”
• Dependent-write consistent image on R2s
• Star environment is now considered “tripped”
• Transitions ConcStarC to PathFail state
Let us presume that the link between the Workload site and Asynchronous target site fails. The target devices are in a consistent state and the state transitions from Star Enabled to PathFail.
If the asynchronous targets were in the middle of a cycle switch when the failure occurred, the state would transition to PathFailCleanReq. In such a case, an MSC cleanup would first be required to assure consistency on the target.
MSC Cleanup can be performed explicitly with a symstar cleanup command. The reset command also performs a cleanup in the course of cleaning up the metadata on the failed site.
SRDF/Star for Open Systems - 74
Reset Fault Condition After Link Restoration
[Diagram: after the asynchronous links are restored, reset is issued at Workload Site ConcStarA; ConcStarC transitions from PathFail to Disconnected while ConcStarB remains Protected.]
symstar -cg ConcStar reset -site ConcStarC
• Initiates cleanup if necessary
• Disables Star protection
• Disables consistency protection of failed site
• Allows for "Gold" copy capture prior to resynchronization
• Transitions ConcStarC to ‘Disconnected’
The reset action discussed earlier performs cleanup at the Asynchronous target site, if necessary. It disables Star protection of the failed Asynchronous target site, unprotects the site and transitions it to the Disconnected state. This is also the time when a BCV copy of the consistent Asynchronous target site data from the point in time of the failure should be made. Once the resynchronization between the Workload and Asynchronous site starts, the consistency is lost.
SRDF/Star for Open Systems - 75
Resume SRDF/Star Protection
Connect asynchronous (failed) site:
symstar -cg ConcStar connect -site ConcStarC

Protect asynchronous (failed) site:
symstar -cg ConcStar protect -site ConcStarC

Enable SRDF/Star protection:
symstar -cg ConcStar enable
[State flow: ConcStarC moves from Disconnected to Connected (connect) to Protected (protect); enable then returns the environment to the Star Protected state.]
Once the transient fault is remedied, the “connect”, “protect”, and “enable” actions shown earlier allow the failed site to rejoin the other two sites in Star Enabled mode.
SRDF/Star for Open Systems - 76
State Flow Diagram: Unplanned Switch
[State flow: from Star Protected, a Workload site loss (STAR Tripped) leaves the synchronous target in PathFail and the asynchronous target in PathFail or PathFail;CleanReq; cleanup transitions PathFail;CleanReq to PathFail. A switch that keeps the local data leaves both remaining sites Disconnected; a switch that keeps the remote data leaves them Disconnected/Connected.]
An Unplanned Switch Operation becomes necessary when the Workload site disaster warrants a move of the production workload to the Synchronous or Asynchronous target site. After the disaster, the system transitions from the Star Protected to the Star Tripped state. The Synchronous target transitions to the PathFail state, the Asynchronous Targets to the PathFail or PathFail; CleanReq state. Recovery operations must be undertaken to start production at one of the target sites.
The distinction between PathFail and PathFail; CleanReq states is the need for MSC cleanup. PathFail; CleanReq indicates the need for MSC cleanup at the Asynchronous Target site.
If it is decided to switch to one of the remote sites and preserve the data at that site, the switch command transitions the sites to the Disconnected state. From that state, it is necessary to issue a connect command to arrive at the Connected state.
If the decision is made to switch to one of the remote sites and preserve the data of the other remote site, the switch command transitions the sites to the Connected state.
SRDF/Star for Open Systems - 77
Workload Site Failure, Switch Variations
There are 4 main variations for Unplanned Workload Switch
– Switch to Sync Target Site, Keep Sync Target data
– Switch to Sync Target Site, Keep Async Target data
– Switch to Async Target Site, Keep Async Target data
– Switch to Async Target Site, Keep Sync Target data
Switch to site, decision based on customer needs
Keep data decision from symstar query output, telling if Async Target data is most current
Best practice is to save "Gold" copy before initiating synchronization
The decision about which site to switch to depends on the customer's infrastructure capabilities and the nature of the disaster, which may have affected the campus Synchronous Target Site.
The Asynchronous Target site can be more up-to-date than the Synchronous target in the case of a rolling disaster. This can happen if the links to the Synchronous target site fail first and the Asynchronous target continues to receive data for a while. Then the Workload site fails completely. The symstar query command can assist in making the decision about which data is most recent and must be preserved.
In the example that follows, Workload is switched to the Asynchronous target site while keeping the data of the Synchronous target site.
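The "keep which data" decision described above can be automated by inspecting the symstar query output. The sketch below assumes the field name quoted in this course's narration ("Asynchronous Target Site Data Most Current"); verify the exact label against your Solutions Enabler release:

```python
# Sketch: decide which remote site's data to preserve after a
# Workload-site loss, by parsing captured `symstar query` output.
def keep_site(query_output: str, sync_site: str, async_site: str) -> str:
    """Return the site whose data should be preserved."""
    for line in query_output.splitlines():
        if "Data Most Current" in line:
            value = line.split(":", 1)[1].strip()
            # "Yes" means the asynchronous target is more recent
            return async_site if value == "Yes" else sync_site
    # Field absent: default to the synchronous target, which is
    # normally the most current in a concurrent configuration.
    return sync_site

sample = """
Workload Site                              : ConcStarA
Asynchronous Target Site Data Most Current : No
"""
print(keep_site(sample, "ConcStarB", "ConcStarC"))  # prints ConcStarB
```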
SRDF/Star for Open Systems - 78
Workload Site Fault: Synch Site More Current
[Diagram: Workload Site ConcStarA fails; both the synchronous link to ConcStarB and the asynchronous link to ConcStarC are down. SDDF sessions at the target sites permit differential resynchronization over the SRDF/A recovery links, and each target-site host holds a copy of the Star internal definition file.]
Workload Site Failure
• Both consistency groups "trip"
  – Dependent-write consistent image on R2s
• Star environment is now considered "tripped"
• Target devices can be differentially synchronized
• ConcStarB site more current than ConcStarC site
In the example shown here, the Workload site has failed. Assume that a query command shows the value of “Asynchronous Target Site Data Most Current” is “No”. This means that the data at the Synchronous Target is more recent.
The system state is StarTripped.
The Synchronous Target site transitions to the PathFail state.
The Asynchronous target site transitions to the PathFail; CleanReq state if the failure occurred in the middle of a Delta Set switch. Otherwise, it transitions to the PathFail state.
Workload Site Disaster: Cleanup Asynch Target
Commands performed at the ConcStarC site (or ConcStarB):

symstar -cg ConcStar cleanup -site ConcStarC

• Cleans up internal metadata and Symmetrix cache at ConcStarC
• Transitions the 2nd target site from 'PathFail;CleanReq' to 'PathFail'
• Allows for "Gold" copy capture prior to resynchronization
[Diagram: cleanup issued against ConcStarC. ConcStarA (R11) and ConcStarB (R2) are in PathFail; ConcStarC (R2) transitions from PathFail;CleanReq to PathFail.]
This step is necessary only if the state of the Asynchronous target site was PathFail; CleanReq after the failure of the Workload site.
The cleanup command can be issued from either remaining site. This performs MSC Cleanup at the Asynchronous target site.
Recall that the earlier buildcg step had already created the consistency groups at both sites and the Star internal definitions file has a record of the site names and locations. After cleanup is performed, a BCV copy of the data should be taken to preserve a consistent data copy from the point of time of the failure.
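The cleanup-then-gold-copy step can be sketched as a dry-run script. The run() wrapper only echoes commands unless DRYRUN=0, and the TimeFinder device group name GoldDg is a hypothetical example, not part of the original configuration:

```shell
# Dry-run wrapper: echo commands instead of executing them unless DRYRUN=0.
run() {
  if [ "${DRYRUN:-1}" = "1" ]; then
    echo "WOULD RUN: $*"
  else
    "$@"
  fi
}

# MSC cleanup at the asynchronous target (needed only from PathFail;CleanReq)
run symstar -cg ConcStar cleanup -site ConcStarC -nop
# Split a previously established BCV pair to preserve a consistent "Gold"
# copy before resynchronization (GoldDg is a hypothetical device group)
run symmir -g GoldDg split -nop
```

Reviewing the echoed sequence before setting DRYRUN=0 gives a chance to confirm site names against the symstar query output.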
Workload Site Fault: Switch to Asynch Target
Switches production to the remote site, ConcStarC, keeping sync data:

symstar -cg ConcStar switch -site ConcStarC -keep_data ConcStarB

• ConcStarC devices swap personality to become R1s
• ConcStarB devices are reconfigured so they become R2s of ConcStarC
• ConcStarC devices are made Read-Write to the host
• Allowed from a 'PathFail;PathFail;Tripped' state
• ConcStarA is 'Disconnected' and ConcStarB is 'Connected'
In the example shown here, the Workload site is being moved from ConcStarA to ConcStarC, while retaining data in ConcStarB. Note that this represents the “Keep Remote Data” option on the Unplanned Switch state flow diagram shown earlier.
The switch command reconfigures the RDF devices at ConcStarB and ConcStarC. Since the data at ConcStarB is more recent, a differential RDF restore operation from ConcStarB to ConcStarC is undertaken. Once the restore is complete, the R1 devices at ConcStarC are enabled for reading and writing.
Workload Site Fault: Protect Synch Target
Initiate remote data protection:

symstar -cg ConcStar protect -site ConcStarB

• Waits for the invalid track count to reach the specified amount
• Sets SRDF mode to asynchronous
• Enables SRDF/A MSC Consistency Protection
• Transitions ConcStarB to a 'Protected' state
The protect command now enables MSC protection between ConcStarC and ConcStarB. Star protection is not possible because three sites are not available. Note that no reconfiguration is undertaken at site ConcStarA, which is expected to be inaccessible after a workload site disaster.
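The switch and protect steps of this unplanned recovery can be sketched together as a dry-run script (commands taken from the slides above; echo-only by default):

```shell
# Dry-run wrapper: echo commands instead of executing them unless DRYRUN=0.
run() { if [ "${DRYRUN:-1}" = "1" ]; then echo "WOULD RUN: $*"; else "$@"; fi; }

# Move the workload to ConcStarC while keeping ConcStarB's (more recent) data
run symstar -cg ConcStar switch -site ConcStarC -keep_data ConcStarB -nop
# Re-establish MSC protection from the new workload site to ConcStarB
run symstar -cg ConcStar protect -site ConcStarB -nop
```

Star protection itself cannot be re-enabled at this point, since only two of the three sites are available.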
Query After Protecting ConcStarB

# symstar -cg ConcStar query
Site Name                      : ConcStarC

Workload Site                  : ConcStarC
1st Target Site                : ConcStarA
2nd Target Site                : ConcStarB

Composite Group Name           : ConcStar
Composite Group Type           : RDF1

Workload Data Image Consistent : Yes
System State:
    {
    1st_Target_Site : Disconnected
    2nd_Target_Site : Protected
    STAR            : Unprotected
    }
The output of the query command shows that the Workload site is now at ConcStarC and that ConcStarA is disconnected. ConcStarA is now referred to as the 1st target site and ConcStarB as the 2nd target site.
Planned Switch Operation: Command Flow
[State flow: Disconnected,Disconnected -> connect -> Connected,Connected -> protect -> Protected,Protected -> enable -> Star Protected. A halt issued from any of these states leads to Halted,Halted; switch then returns the targets to Disconnected,Disconnected.]
The final example shows the steps to switch the Workload site back to the original site, ConcStarA, in a planned fashion. The key command here is halt. A planned workload switch is typically used either to move back home after the resolution of a Workload site failure or in the course of a disaster drill. All site moves are allowed as long as the sites are functional and the RDF connectivity is present.
Permitted Workload site moves include:
A to B
A to C
B to A
B to C
C to A
C to B
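The planned switch back to ConcStarA, walked through on the following slides, can be collected into one dry-run script (echo-only by default; the sequence follows the connect / halt / switch / connect / protect / enable flow above):

```shell
# Dry-run wrapper: echo commands instead of executing them unless DRYRUN=0.
run() { if [ "${DRYRUN:-1}" = "1" ]; then echo "WOULD RUN: $*"; else "$@"; fi; }

run symstar -cg ConcStar connect -site ConcStarA -nop   # resynchronize former workload site
run symstar -cg ConcStar halt -nop                      # drain data; all 3 sites identical
run symstar -cg ConcStar switch -site ConcStarA -nop    # move workload home
run symstar -cg ConcStar connect -site ConcStarB -nop   # rebuild protection from A
run symstar -cg ConcStar connect -site ConcStarC -nop
run symstar -cg ConcStar protect -site ConcStarB -nop
run symstar -cg ConcStar protect -site ConcStarC -nop
run symstar -cg ConcStar enable -nop                    # back to Star Protected
```

Applications at ConcStarC must be shut down before the halt step, as the next slides note.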
Workload Site Problem Resolved: Bring Online
Propagate data back to ConcStarA:

symstar -cg ConcStar connect -site ConcStarA

• Performs commands to reconfigure RDF devices
• Establishes RDF devices in Adaptive Copy Disk Mode
• Transitions ConcStarA to a 'Connected' state
The halt command has been explained earlier as part of a planned switch procedure. It write disables the R1 devices, then drains data from the production site to the two target sites. It therefore ensures that the data at all three sites is identical.
Continuing the example shown earlier, let us assume that the problem that caused the Workload site at ConcStarA to be shut down has been resolved. When the connect command is used at ConcStarC, the RDF volumes in ConcStarA are reconfigured so that they become concurrent targets of the R1 devices in ConcStarC. An adaptive copy synchronization is initiated between ConcStarA and ConcStarC.
Planned Switch to ConcStarA: Halt Replication
Shut down applications, then:

symstar -cg ConcStar halt

• Completely synchronizes both remote target sites
• Allows all invalid tracks and cycles to drain
• Write disables (WD) or makes Not Ready (NR) the R1 devices
• Results in all 3 sites having the same data
Next, the halt command ensures that all three sites are identical and write disables the R1 devices if they are mapped to an FA.
Query after Halt

# symstar query -cg ConcStar
Site Name                      : ConcStarC

Workload Site                  : ConcStarC
1st Target Site                : ConcStarA
2nd Target Site                : ConcStarB

Composite Group Name           : ConcStar
Composite Group Type           : RDF2

Workload Data Image Consistent : Yes
System State:
    {
    1st_Target_Site : Halted
    2nd_Target_Site : Halted
    STAR            : Unprotected
    }

Last Action Performed : Halt
Last Action Status    : Successful
The query shows that the halt was successful. The Workload site still remains at ConcStarC.
Planned Switch Back to ConcStarA
Commands entered at ConcStarA:

symstar -cg ConcStar switch -site ConcStarA

• ConcStarA devices swap personality to become R1s
• ConcStarC devices become R2s
• ConcStarA devices are made Read-Write to the host
• ConcStarB and ConcStarC are in 'Disconnected' state
The switch command now resets the RDF relationships so that ConcStarA devices have the RDF1 attribute and are concurrently connected to ConcStarB and ConcStarC. Both targets transition to the “Disconnected” state. Now, the “connect”, “protect”, and “enable” action sequence transitions the system to the Star protected state.
Lesson 2
Upon completion of this lesson, you will be able to:
Describe Cascaded Star operations
The objective for this lesson is shown here.
Differences Between Cascaded and Concurrent Star
Since the integrity of the asynchronous data depends on the data at the synchronous target:
– Connect the synchronous target before the asynchronous target
– Protect the synchronous target before the asynchronous target
– May not unprotect the synchronous target if the asynchronous target is protected
– May not connect the synchronous target if it is disconnected and the asynchronous target is protected
There are a few important differences between the normal operating conditions of Concurrent Star and Cascaded Star.
1. While connecting the sites from the Disconnected state, the synchronous site must be connected first, the asynchronous site, second.
2. Since the consistency of the asynchronous site data is dependent on the consistency of the synchronous site data, the asynchronous target can only be protected if the synchronous target is protected as well. Consequently, after the two sites have been connected, the synchronous target must be protected first.
3. While both the synchronous and asynchronous targets are in the protected state, an unprotect action on the synchronous site will not be permitted.
4. If the synchronous target is disconnected while the asynchronous target is protected (as can happen after a failure of the links between the workload site and the synchronous target), a connect action will not be permitted on the synchronous target. The asynchronous target must be tripped or unprotected before the connect with the synchronous target is allowed.
5. Since only the asynchronous site can be taken out of service without disrupting remote data protection, it is only permissible to isolate the asynchronous target from the Protected, Protected state.
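The ordering rule above can be modeled with a small guard function. The local SYNC_PROTECTED flag is illustrative only; a real script would parse symstar query output to determine the actual site states:

```shell
# Enforce Cascaded Star ordering: the synchronous target (CascStarB)
# must be protected before the asynchronous target (CascStarC).
SYNC_PROTECTED=0

protect_site() {
  case "$1" in
    CascStarB)
      SYNC_PROTECTED=1
      echo "protect $1: ok"
      ;;
    CascStarC)
      if [ "$SYNC_PROTECTED" = "1" ]; then
        echo "protect $1: ok"
      else
        echo "protect $1: refused (protect CascStarB first)"
        return 1
      fi
      ;;
  esac
}

protect_site CascStarC || true   # refused: sync target not yet protected
protect_site CascStarB           # ok
protect_site CascStarC           # ok now
```

The same pattern could guard the connect ordering, which has the identical sync-before-async constraint.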
Setting Up a Cascaded SRDF/Star Configuration

Create an R1-type composite group:

symcg create CascStar -rdf_consistency -type r1
# Add all devices in RDF group 17 to the CG
symcg -cg CascStar addall dev -rdfg 17
# Assign a name to the B location only
symcg -cg CascStar set -name CascStarB -rdfg 92:17
# Define the recovery group
symcg -cg CascStar -rdfg 92:17 set -recovery_rdfg 37
# Run Star setup
symstar -cg CascStar -options <filename> setup
[Diagram: Site CascStarA (Symm 92) R1 devices replicate via SRDF/S over RDFG 17 to Site CascStarB (Symm 94) R21 devices, which cascade via SRDF/A over RDFG 57 to Site CascStarC (Symm 34) R2 devices. A passive recovery link, RDFG 37, runs from CascStarA to CascStarC.]
The CG creation in Cascaded Star differs from Concurrent Star in a couple of ways.
First, since the source devices have only one set of connections, only the RDF group(s) connecting A to B needs to be named. Secondly, only one recovery group statement is required, because the RDF group(s) from CascStarA to CascStarC is the only one that needs to be identified as such.
Excerpt from Cascaded Star Query

# symstar query -cg CascStar
Site Name                      : CascStarA

Workload Site                  : CascStarA
1st Target Site                : CascStarB
2nd Target Site                : CascStarC

Composite Group Name           : CascStar
Composite Group Type           : RDF1

Workload Data Image Consistent : Yes
System State:
    {
    1st_Target_Site : Disconnected
    2nd_Target_Site : Disconnected
    STAR            : Unprotected
    }

2nd Target Site Information:
    {
    Source Site Name            : CascStarB
    Target Site Name            : CascStarC
    RDF Consistency Capability  : MSC
    RDF Consistency Mode        : NONE
    Site Data Image Consistent  : No
As in the case of Concurrent Star, the two sites are in the Disconnected state, because the RDF links are suspended.
Note that the second target site information displays that the source for that site is CascStarB. This indicates that this is a Cascaded Star configuration.
After setup, Cascaded Star is brought up just like Concurrent Star with use of the connect, protect, and enable commands.
The only caveats to observe are the order in which the synchronous and asynchronous target sites can be connected and protected. The sequence of commands would therefore be:
# symstar -cg CascStar connect -site CascStarB -nop
# symstar -cg CascStar connect -site CascStarC -nop
# symstar -cg CascStar protect -site CascStarB -nop
# symstar -cg CascStar protect -site CascStarC -nop
# symstar -cg CascStar enable -nop
Cascaded Star: Failure of Synchronous Links

This example using composite group CascStar makes the following assumptions:
– Links to site B have failed, i.e., the system state is: PathFail, Protected, Tripped
– Site C is still working
– A to C links still work
# symstar query -cg CascStar

Workload Site   : CascStarA
1st Target Site : CascStarB
2nd Target Site : CascStarC

System State:
    {
    1st_Target_Site : PathFail
    2nd_Target_Site : Protected
    STAR            : Tripped
    }

2nd Target Site Information:
    {
    Source Site Name : CascStarB
    Target Site Name : CascStarC
If the synchronous target of a Cascaded Star configuration goes down, either because of a link or because of a site failure, remote protection is lost. It is possible then to reconfigure Cascaded Star so the recovery links are turned into a live RDF connection between CascStarA and CascStarC. However, this configuration is considered to be a Concurrent Star configuration.
In the example shown, the composite group CascStar has experienced a loss of links to site CascStarB. Star has tripped. The CascStarB to CascStarC links are intact so the second target state is Protected.
The query indicates that the configuration is cascaded, because the source for the second target is CascStarB.
Disconnect and Reconfigure Asynchronous Target

Since the asynchronous links are still up, trip them consistently:

# symstar -cg CascStar disconnect -trip -site CascStarC -nop

Now, a query should reveal the asynchronous site in the PathFail state:

System State:
    {
    1st_Target_Site : PathFail
    2nd_Target_Site : PathFail
    STAR            : Tripped
    }

A reconfiguration activates the A to C links in a Concurrent Star configuration, leaving the C site disconnected:

# symstar -cg CascStar reconfigure -reset -site CascStarC -path CascStarA:CascStarC -nop
Note that a query after the connect and protect steps reveals that the A site is now connected to C:

# symstar -cg CascStar connect -site CascStarC -nop
# symstar -cg CascStar protect -site CascStarC -nop

System State:
    {
    1st_Target_Site : PathFail
    2nd_Target_Site : Protected
    STAR            : Unprotected
    }

2nd Target Site Information:
    {
    Source Site Name : CascStarA
    Target Site Name : CascStarC
If the synchronous target site had failed, both links would appear in the PathFail state and the disconnect –trip action would not have been necessary. In this example the asynchronous target must be tripped with a disconnect –trip command. This leaves the data at the asynchronous target in a consistent, PathFail state.
Now a reconfiguration causes the A to C links to be used for data replication. The query reveals that the second target site is receiving its data feed from site A.
Once the connection between A and B is reinstated, a connect CascStarB, protect CascStarB followed by enable CascStar will result in a Concurrent Star configuration in Star protected state.
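The reconfiguration sequence on this slide, plus the later reinstatement of site B, can be sketched as one dry-run script (echo-only by default; commands as shown above):

```shell
# Dry-run wrapper: echo commands instead of executing them unless DRYRUN=0.
run() { if [ "${DRYRUN:-1}" = "1" ]; then echo "WOULD RUN: $*"; else "$@"; fi; }

# Trip the still-live asynchronous leg consistently, then repoint it at A
run symstar -cg CascStar disconnect -trip -site CascStarC -nop
run symstar -cg CascStar reconfigure -reset -site CascStarC -path CascStarA:CascStarC -nop
run symstar -cg CascStar connect -site CascStarC -nop
run symstar -cg CascStar protect -site CascStarC -nop

# Once A-to-B connectivity is restored, return to Star protection
run symstar -cg CascStar connect -site CascStarB -nop
run symstar -cg CascStar protect -site CascStarB -nop
run symstar -cg CascStar enable -nop
```

Note that after the reconfigure step the topology is Concurrent Star, so the sync-before-async ordering constraint of Cascaded Star no longer applies.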
Reconfiguration of Star
Star can often be reconfigured from Cascaded to Concurrent and vice versa while the Workload site is running
Other than a workload failure, which requires changing of RDF personalities, cases where reconfiguration is practical are:
– Failure cases for Cascaded Star:
  1. Loss of links between A and B
  2. Loss of site B
  5. Loss of links between B and C
– Failure case for Concurrent Star:
  3. Loss of links between A and C
Other than a Workload site failure, which always requires a reconfiguration of the RDF personalities of the source and target sites, there are several failure cases in Cascaded Star and Concurrent Star where a reconfiguration of the Star setup might be desirable. The following failure cases were discussed in the introductory section:
1. Loss of Links between A and B
2. Site B failure
3. Link failure between A and C
4. Site C failure
5. Link failure between B and C
6. Site A failure
In the case of Cascaded star, a reconfiguration to Concurrent Star can allow the Workload site to continue functioning with remote data protection after failure cases 1 and 2.
A reconfiguration from Cascaded to Concurrent after failure case 5, and a reconfiguration from Concurrent to Cascaded in failure case 3, make it possible for the three sites to continue in Star protected mode despite the above-named failures.
Module Summary
Key points covered in this module:
Symmetrix parameters required to run SRDF/Star
Host software components needed for Star
Concurrent SRDF/Star operations
Cascaded SRDF/Star operations
These are the key points covered in this module. Please take a moment to review them.
Course Summary
Key points covered in this course:
Benefits of SRDF/Star over other replication technologies
Underlying technologies for SRDF/Star– Synchronous SRDF consistency groups using RDF-ECA– SRDF/A Multi-session Consistency– Special SRDF features in support of Star
Concurrent and Cascaded SRDF/Star concepts
Steps to perform:– Normal Operation– Transient Fault– Unplanned switch caused by a major outage
These are the key points covered in this training. Please take a moment to review them.
This concludes the training. Please proceed to the Course Completion slide to take the assessment.