
IBM DS8000 Multi Target PPRC &

IBM Copy Services Manager White Paper

IBM DS8000 four site replication management with IBM Copy

Services Manager

This document can be found on the web: www.ibm.com/support/techdocs

Author: Thomas Luther

IBM Consulting IT Specialist

Version 1.0, 1. December 2016

IBM® Systems Storage Platform

Disclaimer and Trademarks

No part of this document may be reproduced or transmitted in any form without written permission from IBM Corporation. Product data has been reviewed for accuracy as of the date of initial publication. Product data is subject to change without notice. This information may include technical inaccuracies or typographical errors. IBM may make improvements and/or changes in the product(s) and/or programs(s) at any time without notice. References in this document to IBM products, programs, or services does not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATIONS "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

IBM shall have no responsibility to update this information. IBM products are warranted according to the terms and conditions of the agreements (e.g., IBM Customer Agreement, Statement of Limited Warranty, International Program License Agreement, etc.) under which they are provided. IBM is not responsible for the performance or interoperability of any non-IBM products discussed herein. The performance data contained herein was obtained in a controlled, isolated environment. Actual results that may be obtained in other operating environments may vary significantly. While IBM has reviewed each item for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Statements regarding IBM’s future direction and intent are subject to change or withdraw without notice, and represent goals and objectives only.

The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents or copyrights. Inquiries regarding patent or copyright licenses should be made, in writing, to:

IBM Director of Licensing

IBM Corporation

North Castle Drive

Armonk, NY 10504-1785

U.S.A.

IBM®, the IBM logo are trademarks of the International Business Machines Corporation in the United States, other countries, or both. A full list of U.S. trademarks owned by IBM may be found at http://www.ibm.com/legal/copytrade.shtml

Microsoft®, Windows® are registered trademarks of Microsoft Corporation.

Other company, product, or service names may be trademarks or service marks of others.

Copyright © 2016 by International Business Machines Corporation.

IBM Systems Spectrum Storage Page 3 © IBM Copyright, 2016

Acknowledgements

The author would like to say “thank you” to Randy Blea, William Rooney and Nick Clayton for their support in clarifying various topics and sharing their expertise to draft this solution together. I also want to thank Terry Russel for helping me to validate the solution during an actual customer implementation.

A Note to the Reader

This White Paper assumes familiarity with the general concepts of IBM DS8000 Copy Services, in particular Metro Mirror, Global Mirror, Global Copy and Multi Target PPRC.

Additionally, the reader will find it helpful to be familiar with the general usage and concepts of IBM Copy Services Manager (or the former product TPC for Replication).

For readers unfamiliar with these topics and for additional information, please refer to the references listed in the References section.

Document History

Version/Date       Remarks
V1.0 12/01/2016    Initial release for DS8880 and Copy Services Manager 6.1.3

Table of contents

1 Purpose of Document
2 Solution overview
    2.1 Customer Requirements
    2.2 High level layout
    2.3 Volume naming conventions
    2.4 Considerations for Disaster Recovery Practice (DR Tests)
    2.5 Volume distribution across the 4 systems
3 Replication management
    3.1 Considerations for managing the replication with IBM Copy Services Manager (CSM)
    3.2 CSM session naming convention details
    3.3 CSM session and volume mapping
    3.4 Avoiding configuration and operational errors
    3.5 Rules for hardware relationship assimilations in CSM sessions
    3.6 Automated PPRC path management
    3.7 Managing configuration changes
4 z/OS & HyperSwap considerations
    4.1 Common z/OS configuration considerations
    4.2 Couple Dataset (CDS) placement
5 Operational scenarios
    5.1 (Re-) Establish replication from DC1 to DC2
    5.2 Create a DR practice copy on remote DC (4th system)
    5.3 Finishing the DR practice copy on remote DC (4th system)
    5.4 HyperSwap the production workload on local site
    5.5 Failover production workload on local site
    5.6 Planned remote failover - Switch datacenters
    5.7 Unplanned remote failover - Switch datacenters
    5.8 Revert a mistaken StartGC H3->H2->H1 action
6 References

1 Purpose of Document

The IBM DS8000 storage products support various types of copy services which can easily be used to design two, three and four site replication solutions. In addition to the former Metro Global Mirror three site cascaded topology, the DS8870 and later models also support three site Multi Target PPRC topologies, which provide much more flexibility and can be combined with cascaded Global Copy or Global Mirror to compose a four site replication topology.

IBM Copy Services Manager (CSM), the former TPC for Replication (TPC-R), is the central IBM software product to manage all the various types of copy services, not only for the DS8000, but also for the majority of other IBM storage products like the Storwize Family, XIV and Flash Systems.

While CSM supports three site replication topologies for DS8000, it does not yet have a built-in session topology for a cascaded four site topology. There are various concepts for a four site replication solution, depending mainly on the customer requirements. Nevertheless, CSM manages copy services in a way that allows multiple sessions to be combined (cascaded) when they manage the same set of volumes in a specific role of those sessions. However, this introduces implications and limitations for CSM session management and for the various Failover/Failback scenarios that might be required.

This White Paper describes a specific four site replication solution and shows how it can be managed with a set of CSM sessions, while preventing full copies when active CSM sessions need to be toggled with inactive sessions to manage specific scenarios in the reversed direction.

It also provides best practices for managing configuration changes in such an environment and for avoiding configuration mistakes in CSM, by leveraging CSM features such as volume protection, site awareness and session configuration export/import capabilities.

2 Solution overview

2.1 Customer Requirements

The described four site replication solution fulfills the following customer requirements:

• Four DS8880 storage systems must be supported, with two storage systems installed per datacenter.
• The two local DS8880 systems in DC1 should provide High Availability (HA) capabilities, while Disaster Recoverability (DR) is maintained by Global Mirror replication to the remote datacenter DC2 because of the extended distance of about 100 km.
• The remote datacenter site needs to provide practice capabilities with minimal impact on remote replication and DR capabilities.
• Once in a while (approximately 1-2 times a year), the active production site is switched from the local to the remote DC and needs to run there for a while.
• If a site switch is performed to the remote datacenter DC2, the systems in DC2 should provide local HA capabilities, as well as asynchronous replication back to the original site DC1 for DR.
• During the site switch period, DR test capabilities should be maintained at the original datacenter DC1.
• The production workload is distributed among multiple parallel sysplexes accessing the storage systems in either DC1 or DC2, depending on which site is currently active.
• For workload distribution across both storage systems per site, some sysplexes should have their primary volumes on the first storage system, while the remaining sysplexes have their primary volumes on the other storage system in the same datacenter. This allows similar capacity allocations for all four systems, and the Global Mirror journaling capacity can be distributed across both target systems.
• The storage systems must provide sufficient performance and capacity to cover a local storage system failure by switching the workload of the failed primary volumes to the other system in the same datacenter.

2.2 High level layout

The following picture illustrates the high-level layout. The information in brackets () indicates the meaning after a site switch to DC2. The setup is symmetrical from a site switch perspective as well as from a workload distribution perspective among the systems per datacenter.

[Figure: High-level layout. DC1 (Local DC, Prod. Site; DR site after a site switch) holds systems DC11 and DC12 with volumes A, Ja and B. DC2 (Remote DC, DR Site; Production site after a site switch) holds systems DC21 and DC22 with volumes C, Jc and D. LP1 to LP8 form the active (inactive) Sysplex and LP1' to LP8' the inactive (active) Sysplex. The relations are labeled HA (Test), DR and Test (HA).]

The volumes shown in this picture cover only the set of volumes for a single Sysplex that has its primary volumes on system DC11. For workload balancing, the displayed volumes just need to be flipped horizontally between the storage systems in the same datacenter (A & Ja with B, C & Jc with D). See section 2.5, Volume distribution across the 4 systems, for more details.

2.3 Volume naming conventions

To fulfill the solution requirements, we can combine DS8000 Multi Target PPRC utilizing Metro Mirror between the local systems from volumes A to B, and Global Mirror from A to C. The Jc volumes are used for Global Mirror journaling, which forms regular consistency groups. On top of that, we can cascade an inconsistent Global Copy from the C to the D volumes at the remote datacenter. This ensures that the majority of the data is already replicated between the systems in the remote datacenter, so that local HA capability can be re-enabled quickly after a site switch.

Furthermore, these D volumes are used for practicing a disaster recovery. This minimizes the impact on the DR Recovery Point Objective (RPO) during the test phase, because only the last Global Mirror consistency group from the C volumes needs to be drained to the D volumes. Once the D volumes are synchronized, the relationship can be suspended and recovered on D while the Global Mirror relation to the C volumes is resumed. This allows tests on consistent production data on the D volumes without further impact on the remote DR replication during the test phase.

Therefore, the following six volume roles are defined in this solution to realize the required replication layout:

• A: Primary volumes containing the Sysplex production data on the local site
    • If sites are switched, they act as the Global Mirror target volumes of the remote site
• Ja: Journal volumes required in the site switch case to form consistency groups of the A volumes when they are Global Mirror targets
    • Only used if sites are switched
    • Can be thin provisioned volumes to save capacity
• B: Local synchronous Metro Mirror copy of the production data, which can be used for local Storage Failover/Failback or for Storage High Availability via z/OS HyperSwap
    • If sites are switched, they are used for DR tests (corresponding to the D volumes on the remote site)
• C: Designated Global Mirror target of either the A or the B volumes for the remote asynchronous copy of the production data
    • If sites are switched, they are used as primary volumes for the Sysplex production data while running in the remote site
• Jc: Journal volumes required to form consistency groups of the C volumes while the C volumes act as Global Mirror targets
    • If sites are switched, they are not used
    • Can be thin provisioned volumes to save capacity
• D: Cascaded asynchronous Global Copy for DR tests
    • Must be made consistent via a manual command procedure
    • If sites are switched, they contain the synchronous copy of the production data while running in the remote site and can be used for local Storage Failover/Failback or for Storage High Availability via z/OS HyperSwap (corresponding to the B volumes on the local site)
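The symmetry of the six roles can be made explicit in a small lookup table. The following Python sketch is only an illustration of the list above: the class, the dictionary names and the short role descriptions are assumptions made for this example and are not part of CSM or the DS8000.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeRole:
    """Function of a volume role in each production-site state."""
    production_in_dc1: str  # normal operation, production runs in DC1
    production_in_dc2: str  # after a site switch, production runs in DC2


# Short descriptions paraphrase the role list above (illustrative wording).
VOLUME_ROLES = {
    "A": VolumeRole("production primary", "Global Mirror target"),
    "Ja": VolumeRole("not used", "Global Mirror journal"),
    "B": VolumeRole("Metro Mirror copy (HA/HyperSwap)", "DR test copy"),
    "C": VolumeRole("Global Mirror target", "production primary"),
    "Jc": VolumeRole("Global Mirror journal", "not used"),
    "D": VolumeRole("DR test copy", "Metro Mirror copy (HA/HyperSwap)"),
}

# Each role has a counterpart in the other datacenter with mirrored duties.
COUNTERPART = {"A": "C", "Ja": "Jc", "B": "D", "C": "A", "Jc": "Ja", "D": "B"}


def function_of(role: str, production_site: str) -> str:
    """Return what a volume role is used for while production runs at a site."""
    entry = VOLUME_ROLES[role]
    if production_site == "DC1":
        return entry.production_in_dc1
    if production_site == "DC2":
        return entry.production_in_dc2
    raise ValueError(f"unknown production site: {production_site}")
```

For example, `function_of("B", "DC2")` and `function_of("D", "DC1")` both describe the DR test copy, which reflects the symmetrical design of the two datacenters.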

2.4 Considerations for Disaster Recovery Practice (DR Tests)

In this solution design, DR is practiced on the 4th system (D volumes) for several reasons:

The D volumes are required and need to be reserved anyway for site switch scenarios in order to maintain local DR/HA capabilities when running production at the remote datacenter

Using them for remote DR tests makes the best use of the D volumes while running on the local datacenter

It also avoids reserving extra FlashCopy volumes (capacity) on the 3rd system which otherwise would be required just for DR testing if the Global Mirror replication must be maintained during tests.

Using the D volumes for testing allows quick resynchronization and creation of a consistent data point for independent testing, while local and remote DR capabilities are maintained

The D volumes will be a continuous cascaded Global Copy Target of the C volumes. Therefore, only a short remote RPO impact occurs for the creation of the consistent data point on C volumes and draining them to the D volumes before the Suspend/Recover of the C->D relations can take place. The Global Mirror to the C volumes can then be resumed immediately

Testing from the D volumes allows testing how to recover from the 4th system, since the D volumes are not special test volumes but normal host volumes at the DR site which can be used for IPL

IPL / IODF configuration for the 4th system can be used and tested

A real recovery from the 4th system can be performed in the same way as it was tested (IPL from 4th system)

Testing from FC targets would be a different scenario and would require another specific IPL/IODF configuration which would never be used for site switch or recovery IPLs

All changes between C and D volumes during DR tests are tracked. This allows quick resynchronization once tests are completed in order to prepare the D volumes for another test or a real site switch.

The existing Global Copy relation between the C and D volumes also avoids the need for a full copy in real site switch scenarios when the local Metro Mirror relations are established to provide local DR/HA capabilities within the datacenter

The Global Copy pairs used to create test copies can be assimilated and converted into a Metro Mirror relation after a production site switch. As such, local DR/HA capability is established much quicker after a site switch than without an ongoing cascaded copy relation.

Optionally, DR testing (IPL/IODF testing) is also possible from the 3rd system (C volumes), but this requires suspending the Global Mirror relation for the whole test duration (remote DR/RPO impact during the test)
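The order of the steps for creating a practice copy on the D volumes matters, because each step depends on the state reached by the previous one. The following Python sketch is a toy state model of that sequence under stated assumptions: it performs no storage operations, and the class name, method names and the track counter are hypothetical and do not correspond to CSM or DSCLI commands.

```python
class PracticeCopyModel:
    """Toy model of the C->D practice-copy sequence (no storage calls)."""

    def __init__(self, out_of_sync_tracks: int) -> None:
        self.global_mirror_running = True   # A->C is forming consistency groups
        self.out_of_sync_tracks = out_of_sync_tracks  # C->D Global Copy backlog
        self.d_recovered = False            # True once D is usable for tests

    def pause_global_mirror(self) -> None:
        # C now holds a consistent data point; the remote RPO starts to grow.
        self.global_mirror_running = False

    def drain_c_to_d(self) -> None:
        # Draining only ends in a consistent D copy if C no longer changes.
        if self.global_mirror_running:
            raise RuntimeError("C keeps changing while Global Mirror runs")
        self.out_of_sync_tracks = 0

    def suspend_and_recover_d(self) -> None:
        # Suspend/Recover of C->D requires that D is fully synchronized.
        if self.out_of_sync_tracks:
            raise RuntimeError("D is not yet synchronized with C")
        self.d_recovered = True

    def resume_global_mirror(self) -> None:
        # Resuming earlier would send new, inconsistent updates on to D.
        if not self.d_recovered:
            raise RuntimeError("suspend and recover D before resuming")
        self.global_mirror_running = True
```

Calling the four methods in the order shown ends with Global Mirror running again and a recovered, consistent D copy. The remote RPO impact is limited to the time between `pause_global_mirror()` and `resume_global_mirror()`.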

2.5 Volume distribution across the 4 systems

For workload balancing reasons, the workload of multiple sysplexes should be distributed across both systems in a datacenter. The following two layouts show the volume distribution when 2 or more sysplexes are used.

Sysplex A, primary volumes on DC11 (or alternatively on DC21 after a site switch):

[Figure: Sysplex A, primary volumes on DC11 (A) or DC21 (C). DC1 (Prod. Site): A and Ja on DC11, B on DC12. DC2 (DR Site): C and Jc on DC21, D on DC22. LP1 to LP8 and LP1' to LP8' access the storage systems.]

Sysplex B, primary volumes on DC12 (or alternatively on DC22 after a site switch):

[Figure: Sysplex B, primary volumes on DC12 (A) or DC22 (C). DC1 (Prod. Site): B on DC11, A and Ja on DC12. DC2 (DR Site): D on DC21, C and Jc on DC22. LP1 to LP8 and LP1' to LP8' access the storage systems.]
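The two layouts differ only in which system of each datacenter holds the primary side. A minimal Python sketch of this rule is shown below; the function name and the boolean parameter are assumptions made for illustration, while the system names come from the layouts above.

```python
from typing import Dict

# The two storage systems per datacenter, in the order used in this paper.
SYSTEMS = {"DC1": ("DC11", "DC12"), "DC2": ("DC21", "DC22")}


def volume_placement(primary_on_first_system: bool) -> Dict[str, str]:
    """Map each volume role to the storage system that hosts it.

    Sysplexes with their primary volumes on the first system of a datacenter
    (like Sysplex A) use one layout; all other sysplexes (like Sysplex B)
    use the horizontally flipped layout.
    """
    primary = 0 if primary_on_first_system else 1
    secondary = 1 - primary
    return {
        "A": SYSTEMS["DC1"][primary],
        "Ja": SYSTEMS["DC1"][primary],
        "B": SYSTEMS["DC1"][secondary],
        "C": SYSTEMS["DC2"][primary],
        "Jc": SYSTEMS["DC2"][primary],
        "D": SYSTEMS["DC2"][secondary],
    }
```

`volume_placement(True)` yields the Sysplex A layout and `volume_placement(False)` the Sysplex B layout, so journal capacity and primary workload end up spread over both systems of each datacenter.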

3 Replication management

3.1 Considerations for managing the replication with IBM Copy Services Manager (CSM)

CSM provides session topologies to manage 2 and 3 site replication for IBM DS8000 systems. It does not yet provide a fully integrated 4 site session topology, so one might think that CSM does not support this replication solution. However, CSM allows sessions that share the same volumes in a dedicated role to be combined. We can therefore combine a 3 site Metro Mirror - Global Mirror Multi Target PPRC session (MM-GM) with a 2 site Metro Mirror Failover/Failback session (MM FO/FB), which both share the C volumes in a dedicated role. The following picture illustrates how to accomplish this 4 site replication solution with two active sessions when the primary data of the Sysplex is on system DC11:

[Figure: Two active sessions for production in DC1. The Multi Target Metro Mirror - Global Mirror session covers A (DC11), B (DC12), and C with Jc (DC21). The Metro Mirror Failover/Failback session running in Global Copy mode covers C (DC21) and D (DC22); it is cascaded off the H3 volumes, the Global Mirror targets, so that H3 of the first session = H1 of the second.]

Note that an MM FO/FB session can also be started in Global Copy mode, which is possible with any Metro Mirror role pair in the various types of sessions that CSM supports for DS8000. This means that the Metro Mirror role pair between the A and B volumes in the MM-GM session can also be started in Global Copy mode if desired.

This set of 2 (dependent) active CSM sessions can cover the replication management when running the production workload in the local datacenter DC1. In order to switch to the remote DC2 and run from there, another set of 2 (dependent) sessions is required, which are defined in the opposite direction as shown in the following picture:

[Figure: Two active sessions for production in DC2. The Multi Target Metro Mirror - Global Mirror session covers C (DC21), D (DC22), and A with Ja (DC11). The Metro Mirror Failover/Failback session running in Global Copy mode covers A (DC11) and B (DC12); it is cascaded off the H3 volumes, the Global Mirror targets, so that H3 of the first session = H1 of the second.]

Switching production sites therefore also means switching (or toggling) between active and inactive CSM sessions. Fortunately, CSM provides capabilities to assimilate existing PPRC pairs when an inactive Session is started. This eliminates the need for a full copy when the sessions need to be toggled.

Of course there are dependencies for the operational actions issued against any of the 2 active sessions that share the same volumes or for toggling from the active to the inactive sessions which is required after a site switch. This is however a manageable approach and the operational procedures described in this White Paper have been tested and validated to be working without problems.

3.2 CSM session naming convention details

In order to manage 4 site replication for a single Sysplex in this solution, a set of 4 CSM sessions is required, of which 2 are active at a time. To limit the complexity of operational management, the following CSM session naming convention is defined:

<pref>_
• This is the session prefix, common to all 4 sessions managing data of a specific Sysplex
• E.g. the Sysplex abbreviation might be used as <pref>

_LocationID_
• This is a single character indicating the actual production site for the Sysplex
• Can be _L_ for DC1 (Local) and _R_ for DC2 (Remote)

_SessionType_
• This is _MMGM_ for the Multi Target Metro Mirror - Global Mirror session, running either from DC1 to DC2 or vice versa
    • It has Metro Mirror within the same datacenter with optional HyperSwap capability, and Global Mirror to the system in the other datacenter that also has the journal volumes defined
• This is _GC_ for the cascaded Global Copy session to create a practice copy at the 4th system
    • It continuously replicates the Global Mirror target volumes at the remote site to the 4th system to create a consistent practice copy whenever required via manual operation
    • In CSM this is a Metro Mirror Failover/Failback session that normally runs in Global Copy mode only

_CopyDirection
• This is a 2 character identifier showing the major copy direction of the session
    • LR means copy across DC from DC1 to DC2
    • RL means copy across DC from DC2 to DC1
    • LL means copy within DC between systems at DC1
    • RR means copy within DC between systems at DC2

With the naming convention above, the set of 4 sessions is named as follows:

When running production at local DC1, the active sessions start with <pref>_L_:

<pref>_L_MMGM_LR
<pref>_L_GC_RR

The following picture illustrates how these two sessions compose the 4 site replication solution for Sysplex A, which has its primary production data on DC11:

[Figure: Sessions for Sysplex A with production at DC1 (Prod. Site) and DC2 as DR Site. <pref>_L_MMGM_LR: H1 = A (DC11), H2 = B (DC12), H3 = C and J3 = Jc (DC21), with Metro Mirror from H1 to H2 and Global Mirror from H1 to H3. <pref>_L_GC_RR: H1 = C (DC21), H2 = D (DC22), with Global Copy from H1 to H2. Relations marked * (Metro Mirror*, Global Mirror*) show the replication after a local HyperSwap.]

When running production at remote DC2, the active sessions start with <pref>_R_:

<pref>_R_MMGM_RL
<pref>_R_GC_LL

The following picture illustrates how these two sessions compose the 4 site replication solution for Sysplex A, whose primary volumes are on DC21 after a site switch:

[Figure: Sessions for Sysplex A with production at DC2. <pref>_R_MMGM_RL: H1 = C (DC21), H2 = D (DC22), H3 = A and J3 = Ja (DC11), with Metro Mirror from H1 to H2 and Global Mirror from H1 to H3. <pref>_R_GC_LL: H1 = A (DC11), H2 = B (DC12), with Global Copy from H1 to H2. Relations marked * (Metro Mirror*, Global Mirror*) show the replication after a local HyperSwap.]

The 4 site replication for another Sysplex B, with its primary workload on the lower DS8000 systems, is composed as follows with its 4 CSM sessions:

[Figure: Sessions for Sysplex B. B on DC11, A and Ja on DC12, D on DC21, C and Jc on DC22. Both session sets are shown: <pref>_L_MMGM_LR with <pref>_L_GC_RR, and <pref>_R_MMGM_RL with <pref>_R_GC_LL, each with roles H1, H2, H3 and J3 and with Metro Mirror, Global Mirror and Global Copy relations. Relations marked * show the replication after a local HyperSwap.]
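The naming convention can be expressed as a small function that derives a session name from its components. This is a minimal Python sketch; the function name and the example prefix PLXA are hypothetical, and only the name components themselves come from the convention described above.

```python
# Location identifier per production datacenter, as defined by the convention.
LOCATION_IDS = {"DC1": "L", "DC2": "R"}


def session_name(prefix: str, production_site: str, session_type: str) -> str:
    """Build a CSM session name following <pref>_LocationID_SessionType_CopyDirection."""
    location = LOCATION_IDS[production_site]
    other = "R" if location == "L" else "L"
    if session_type == "MMGM":
        # Global Mirror crosses from the production DC to the other DC.
        direction = location + other
    elif session_type == "GC":
        # The practice copy stays within the DC that is not running production.
        direction = other + other
    else:
        raise ValueError(f"unknown session type: {session_type}")
    return f"{prefix}_{location}_{session_type}_{direction}"


def active_sessions(prefix: str, production_site: str) -> list:
    """Return the two session names that are active for a production site."""
    return [session_name(prefix, production_site, t) for t in ("MMGM", "GC")]
```

With the hypothetical prefix PLXA, `active_sessions("PLXA", "DC1")` gives `PLXA_L_MMGM_LR` and `PLXA_L_GC_RR`, and `active_sessions("PLXA", "DC2")` gives `PLXA_R_MMGM_RL` and `PLXA_R_GC_LL`.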

3.3 CSM session and volume mapping

The following tables provide a mapping overview that shows which physical volume needs to be assigned to which CSM session volume role.

Volume mapping for CSM sessions managing volumes of Sysplex A:

Datacenter           DC1 (Local)          DC2 (Remote)
System               DC11   DC11   DC12   DC21   DC21   DC22
Volumes Sysplex A    A      Ja     B      C      Jc     D
<pref>_L_MMGM_LR     H1     -      H2     H3     J3     -
<pref>_L_GC_RR       -      -      -      H1     -      H2
<pref>_R_MMGM_RL     H3     J3     -      H1     -      H2
<pref>_R_GC_LL       H1     -      H2     -      -      -

Volume mapping for CSM sessions managing volumes of Sysplex B:

Datacenter           DC1 (Local)          DC2 (Remote)
System               DC11   DC12   DC12   DC21   DC22   DC22
Volumes Sysplex B    B      A      Ja     D      C      Jc
<pref>_L_MMGM_LR     H2     H1     -      -      H3     J3
<pref>_L_GC_RR       -      -      -      H2     H1     -
<pref>_R_MMGM_RL     -      H3     J3     H2     H1     -
<pref>_R_GC_LL       H2     H1     -      -      -      -
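The role assignments in the tables are the same for both sysplexes when expressed in terms of volume roles (A, Ja, B, C, Jc, D) rather than storage systems. The following Python sketch captures that mapping and derives which volumes two sessions share; the dictionary and function names are assumptions for illustration only.

```python
# Session role -> volume role, taken from the mapping tables above.
# The keys are the session name suffixes without the <pref>_ prefix.
SESSION_ROLES = {
    "L_MMGM_LR": {"H1": "A", "H2": "B", "H3": "C", "J3": "Jc"},
    "L_GC_RR": {"H1": "C", "H2": "D"},
    "R_MMGM_RL": {"H1": "C", "H2": "D", "H3": "A", "J3": "Ja"},
    "R_GC_LL": {"H1": "A", "H2": "B"},
}


def volumes_of(session_suffix: str) -> set:
    """Return the set of volume roles managed by one session."""
    return set(SESSION_ROLES[session_suffix].values())


def shared_volumes(first: str, second: str) -> list:
    """Return the volume roles that two sessions manage in common."""
    return sorted(volumes_of(first) & volumes_of(second))
```

The two sessions that are active together share exactly one volume role: C while production runs in DC1 and A while it runs in DC2. This is the H3 = H1 cascading point between the MMGM session and the GC session.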

3.4 Avoiding configuration and operational errors

Managing 4 individual but dependent sessions per Sysplex might become a difficult task for operational staff. The most critical operations are configuration changes, which must be managed across the 4 dependent sessions at the same time. Judging from the volume mapping tables, configuration changes seem difficult, but they are not if certain rules are followed.

Fortunately, CSM provides a site awareness feature, which allows us to define per CSM session which storage system may be used for each individual volume role. This prevents the wrong volume from being chosen when it is assigned to a certain session and volume role. It is accomplished by defining a unique location label for each defined storage system:

Datacenter      System   Location Label
DC1 (Local)     DC11     DC11
DC1 (Local)     DC12     DC12
DC2 (Remote)    DC21     DC21
DC2 (Remote)    DC22     DC22

When sessions are configured, the appropriate location label can be assigned to each site in the session. This activates the site awareness filtering capability when volumes are added to the session. CSM does not allow a volume to be added to a role when the location labels of the session site and the storage system do not match.
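The effect of the site awareness check can be illustrated with a few lines of Python. This sketch only models the rule described above and is not the CSM implementation or API; the function name and the data structures are hypothetical, while the location labels are those from the table.

```python
from typing import Dict, List

# Location label per storage system, as defined in the table above.
SYSTEM_LABEL = {"DC11": "DC11", "DC12": "DC12", "DC21": "DC21", "DC22": "DC22"}


def mismatched_roles(site_labels: Dict[str, str],
                     copy_set: Dict[str, str]) -> List[str]:
    """Return the roles whose volume sits on a system with a non-matching label.

    site_labels: location label assigned to each site (role) of the session.
    copy_set:    storage system that hosts the volume chosen for each role.
    """
    mismatches = []
    for role, system in copy_set.items():
        if SYSTEM_LABEL[system] != site_labels[role]:
            mismatches.append(role)
    return mismatches


# Site labels of the <pref>_L_MMGM_LR session for Sysplex A.
SYSPLEX_A_L_MMGM_LR = {"H1": "DC11", "H2": "DC12", "H3": "DC21"}
```

A copy set such as `{"H1": "DC11", "H2": "DC11", "H3": "DC21"}` would be reported with a mismatch for H2, because the H2 site of this session is labeled DC12.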

Since only 2 of the 4 sessions are supposed to be active at a time, we need to consider how to toggle between the active and inactive CSM sessions when a site switch is performed. Toggling between active and inactive CSM sessions without terminating the hardware relationships on the storage systems requires that all copy sets are removed from the active sessions with a special option that does not delete the hardware relations of the removed copy sets. The existing hardware relations can then be assimilated by the inactive session that is going to be started. This avoids a full copy when we need to toggle between the CSM sessions. To prevent the wrong session from being started in such a site switch scenario, the configuration of the emptied sessions is re-imported only after the inactive sessions have been started. Since empty sessions cannot be started, only the 2 correct sessions can be started when toggling the CSM sessions.

The following high-level procedure needs to be used for toggling CSM sessions during any site switch scenario:

1. Remove the copy sets from the original sessions without deleting the hardware relations. This brings the original sessions into the Defined state without any copy sets.
    • Defined sessions without copy sets cannot be started by mistake
2. Start the defined <pref>_x_MMGM_xy session for the opposite direction.
    • This will assimilate the existing hardware relations
3. Start the defined <pref>_x_GC_yy session for the practice copy as required.
    • This will assimilate the existing hardware relations
4. Only once everything is in the expected state, re-import the copy sets into the original sessions to prepare them for another site switch back to the original site.
    • Copy sets can be re-imported from a previously saved *.csv file (e.g. from a session configuration export prior to the copy set removal)

To avoid configuration errors across the set of 4 dependent CSM sessions, we will use a central *.csv file per Sysplex to define all 6 volumes. Any of the 4 sessions can then import its required volume configuration from that central *.csv file
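The order of the toggle steps can be written down as a short sketch. The helper below is hypothetical and does not talk to CSM; it only lists the steps in sequence. The session names follow the naming of the four sessions used in this paper, and the action texts are free-form descriptions, not CSM commands.

```python
# Hypothetical helper that lists the toggle steps in order; it does not talk to CSM.
SESSION_NAMES = {
    "L": ("{p}_L_MMGM_LR", "{p}_L_GC_RR"),
    "R": ("{p}_R_MMGM_RL", "{p}_R_GC_LL"),
}


def toggle_plan(prefix, active_site):
    """Return ordered (action, session) steps for switching away from active_site."""
    if active_site not in SESSION_NAMES:
        raise ValueError("active_site must be 'L' or 'R'")
    new_site = "R" if active_site == "L" else "L"
    old_mmgm, old_gc = (n.format(p=prefix) for n in SESSION_NAMES[active_site])
    new_mmgm, new_gc = (n.format(p=prefix) for n in SESSION_NAMES[new_site])
    return [
        ("remove copy sets, keep hardware relations", old_mmgm),
        ("remove copy sets, keep hardware relations", old_gc),
        ("start session (assimilates hardware relations)", new_mmgm),
        ("start session for practice copy, as required", new_gc),
        ("re-import copy sets from csv", old_mmgm),
        ("re-import copy sets from csv", old_gc),
    ]
```

The re-import steps come last on purpose: as long as the original sessions are empty, they cannot be started by mistake.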

3.5 Rules for hardware relationship assimilations in CSM sessions

CSM Sessions can assimilate existing DS8000 PPRC relationships without a full copy when the session is (re-)started or when new copy sets are added to an active session. However, the hardware relation must be in one of the following states to be assimilated properly by CSM:

• Copy pending (PPRC source and target)
• Full Duplex (PPRC source and target)
• Suspended (where the PPRC target is either target suspended or target full duplex)

CSM Sessions cannot assimilate the following PPRC relationships; session Start commands, or adding such relationships to an active session, will therefore fail:

• PPRC Failover state where both volumes (PPRC source and target) are primary suspended
  - As a workaround, a manual PPRC failback can be performed via DSCLI in the direction in which the copy set is started (refresh the CSM session state afterwards to verify proper assimilation)
• Either the PPRC source or the target is in simplex state
  - This condition must be cleaned up manually via DSCLI

To ensure that an active CSM session that is going to be deactivated leaves hardware relations that can be assimilated, the copy sets should only be removed if they are in one of the following states:

• Prepared
• Preparing
• Suspended

The commands that bring the Session into the above states are:

• Start…, StartGM… and StartGC… with a resulting state of Preparing or Prepared for the role pairs
• Suspend with a resulting state of Suspended for the role pairs

If copy sets show Target Available for any of the role pairs, the relation is in a PPRC failover state that cannot be assimilated by another session! The commands with a resulting state of Target Available, which should NOT be the last commands prior to the Copy Set removal, are:

• Recover… with a resulting state of Target Available for the role pair
• Failover… with a resulting state of Suspended (Partial) for the session and Target Available for the role pair

Note:

CSM only maintains the base PPRC relations (Metro Mirror and Global Copy). The logical Global Mirror configuration and the FlashCopy relations for the GM journaling are cleaned up to ensure proper assimilation by other sessions. The new session can quickly re-establish these configurations as required once the base relations are assimilated.

Common rules for this 4 site replication solution:

• Before the copy sets can be removed from an active session to switch the sites, a successful Start… command must be the last command executed against the active session. This Start… command needs to be in the direction in which the hardware relationships are to be assimilated by the inactive Session.
• If the Start… command fails (e.g. due to primary system problems after an unplanned recovery on the remote site), leave the session active until the Start… can be repeated successfully. The Start… in the active Session must perform the required PPRC failback sequences on the hardware relationships to allow an out of sync copy.
• Do not try to start the inactive session for the active role pairs in a PPRC failover condition; it will fail. Any deviation from the above rules might require a full copy from scratch for the copy sets.
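The assimilation and removal rules of this section can be condensed into two small checks. This is a sketch for illustration only: the state names are plain strings chosen here and are not values returned by CSM or DSCLI, and both function names are hypothetical.

```python
# Illustrative encoding of the rules in this section; the state names are plain
# strings chosen for this sketch, not values returned by CSM or DSCLI.
ASSIMILABLE_PAIRS = {
    ("copy pending", "copy pending"),
    ("full duplex", "full duplex"),
    ("suspended", "target suspended"),
    ("suspended", "target full duplex"),
}
SAFE_REMOVAL_STATES = {"Prepared", "Preparing", "Suspended"}


def can_assimilate(source_state, target_state):
    """True if a relation in these PPRC states can be assimilated by a session."""
    return (source_state, target_state) in ASSIMILABLE_PAIRS


def safe_to_remove(role_pair_states):
    """True if every role pair of a copy set is in a state that may be removed."""
    return all(state in SAFE_REMOVAL_STATES for state in role_pair_states)
```

A copy set with one role pair in Target Available would fail the second check, which mirrors the rule that Recover… or Failover… must not be the last command before the copy sets are removed.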

3.6 Automated PPRC path management

CSM provides an FCP port pairing definition file that can be used to define to CSM the hardware FCP link port pairs for PPRC paths. When copy relations need to be (re-)started, CSM then makes sure that the required PPRC paths are established across all the physical port pairs that are defined between a system pair. Those port pairs are used for logical PPRC paths in either direction. The following link to the CSM Knowledge Center describes this feature in more detail:

http://www.ibm.com/support/knowledgecenter/SSESK4_6.1.3/com.ibm.storage.csm.help.doc/frg_t_addpath_csv.html

3.7 Managing configuration changes

Since there are 4 different but corresponding sessions to manage the replication for a logically dependent set of data (e.g. for a Sysplex), any configuration change carries the risk of misconfiguration across the 4 dependent sessions that share the same <pref>_. Adding or removing Copy Sets individually or through different *.csv files invites manual errors, especially across the 4 dependent sessions. The following approach avoids most human errors:

• Use a single <pref>_volumes.csv file that contains all 6 devices of the replication solution in a fixed column order, e.g.:
  - A, Ja, B, C, Jc, D
• Define 4 different column headers in this file (one for each <pref>_ session) to assign the required CSM Session roles to the fixed device columns (a recommended header section can be found below)
  - Only one of the 4 headers is uncommented at a time; the others are commented out (inactive)
  - Add a commented description line for each header to indicate the associated Session name
  - The order of the volume roles within the header is flexible; CSM assigns each role to the corresponding device column during import. For the required role order in each header, refer to section 3.3 CSM session and volume mapping
  - You need to use dummy volume roles in the header to skip device columns that a particular session does not use
    • E.g. X1, X2, X3, X4
• Use the single csv file to create or update the Copy Sets for all 4 sessions together
  - Uncomment the appropriate header line and re-save the csv file prior to the import into the corresponding CSM session
  - If the active header does not match the corresponding session, no valid copy sets will be found, since the site awareness feature of CSM prevents this
  - Already existing Copy Sets won’t be re-added. In the Copy Set result panel you can validate the Copy Sets that are allowed to be added. Since volumes are used in more than a single session, CSM will issue warnings for most of the added copy sets. But since we require the same volumes to be used in multiple sessions and use dedicated operational procedures, those warnings can be ignored

• When you add volumes, use the following order:
  - Update the active <pref>_x_MMGM_xy session first
    • Added volumes will be started directly
  - Then update the active cascaded <pref>_x_GC_yy session
    • Added volumes will directly be cascaded off the GM target volumes. This order avoids copying the cascaded volumes twice
  - Update the inactive <pref>_y_MMGM_yx and <pref>_y_GC_xx sessions
  - Make sure the copy set count is the same across all 4 <pref>_ sessions
• When you remove volumes, use the following order:
  - Remove the corresponding copy sets from the csv file and re-save it
  - Remove the corresponding copy sets from the active <pref>_x_MMGM_xy session. Use the default removal options to also clean up the hardware relationships
  - Remove the corresponding copy sets from the active <pref>_x_GC_yy session. Use the default removal options to also clean up the hardware relationships
  - Remove the corresponding copy sets from the inactive <pref>_y_MMGM_yx and <pref>_y_GC_xx sessions
  - Make sure the copy set count is the same across all 4 <pref>_ sessions

Hint:

The recommended approach for performing configuration changes to multiple sessions is to use the Session overview panel in the GUI. It allows you to set a filter to display only the <pref>_ sessions you need to work with. You can then add or remove copy sets on the selected sessions in the order specified above.

The following is a header proposal for the common <pref>_volumes.csv file:

## central CSV file for Sysplex <pref>
## enable corresponding header line prior copy set import
## <pref>_L_MMGM_LR header
#H1, X1, H2, H3, J3, X2
## <pref>_L_GC_RR header
#X1, X2, X3, H1, X4, H2
## <pref>_R_MMGM_RL header
#H3, J3, X1, H1, X2, H2
## <pref>_R_GC_LL header
#H1, X1, H2, X2, X3, X4
## fixed device columns
## Sysplex A: A,Ja = DC11; B = DC12; C,Jc = DC21; D = DC22
## Sysplex B: A,Ja = DC12; B = DC11; C,Jc = DC22; D = DC21
## Copy set definitions (full volume syntax from CSM required per device)
## e.g. DS8000:2107.ZA571:VOL:5000
## A, Ja, B, C, Jc, D
aaaa,jaja,bbbb,cccc,jcjc,dddd
aaaa,jaja,bbbb,cccc,jcjc,dddd
…
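Switching the active header by hand is error prone, so the step could be scripted. The following is a minimal sketch under stated assumptions: it assumes each role header line directly follows a description line of the form "## <session name> header", that a single leading # marks an inactive header, and that lines are passed in without trailing newlines. The function name activate_header is hypothetical and is not part of CSM.

```python
# Minimal sketch: activate exactly one role header in the central csv file.
# Assumes every role header line directly follows a '## <session> header' line.
def activate_header(lines, session):
    """Return a copy of lines in which only the header of `session` is uncommented."""
    result = []
    pending = None  # session named by the preceding description line, if any
    found = False
    for line in lines:
        stripped = line.strip()
        if pending is not None:
            roles = stripped.lstrip("#").strip()
            if pending == session:
                result.append(roles)  # active header: no comment marker
                found = True
            else:
                result.append("#" + roles)  # inactive header: commented out
            pending = None
            continue
        if stripped.startswith("##") and stripped.endswith(" header"):
            pending = stripped[2:-len(" header")].strip()
        result.append(line)
    if not found:
        raise ValueError("no header found for session " + session)
    return result
```

All other comment lines and the copy set rows pass through unchanged, so the same file can be re-saved and imported into the matching session.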


4 z/OS & HyperSwap considerations

This 4 site solution optionally supports z/OS HyperSwap for the synchronous replication within a datacenter. That means HyperSwap can be used to provide local storage High Availability for a Sysplex. If a local disaster strikes and production needs to be switched to the remote datacenter, a manual failover/site switch becomes necessary and the Sysplex LPARs need to be IPL’ed from the other datacenter. However, local High Availability can be re-established quickly at the other datacenter, even if the 2 primary systems are still down. This is the benefit of this 4 site solution: by assimilating the former cascaded Global Copy relations at the remote site, the surviving storage systems can be resynchronized quickly without a full copy transfer.

4.1 Common z/OS configuration considerations

The following references provide more detailed information on configuring and using z/OS HyperSwap, as well as the requirements to support DS8000 Multi Target PPRC on z/OS:

• Prepare all z/OS LPARs to support HyperSwap
  - Review the HyperSwap configuration recommendations in Chapter 6 of the TPC-R for System z Redbook and apply them (an IPL of the LPARs might be required to apply changes)
    • http://www.redbooks.ibm.com/abstracts/sg247563.html?Open
  - Review the z/OS Hot Topics Newsletter of August 2016, page 59, for recommended settings to speed up recovery times and raise HyperSwap triggers faster
    • https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?htmlfid=ZS103027USEN&
• Prepare all z/OS LPARs to support DS8000 Multi Target PPRC
  - Multi Target PPRC and HyperSwap are supported on z/OS V1.13 and V2.1 or later with special APARs
    • APAR OA44240: Provides the IOS infrastructure for MT-PPRC and HyperSwap
      - PTFs UA90740/UA90741/UA90742
    • APAR OA47113: Problems with OA44240 - IOSHSAPI ABEND878, z/OS HS disabled, or z/OS HS Policy incorrect
    • APAR OA46683: Enables MT-PPRC for Basic HyperSwap and GDPS HyperSwap
    • APAR OA43661: Enables MT-PPRC support for DFSMS
      - Full support requires OCEOV APAR (OA43661), Device Support APAR (OA43662), AOM/DEVSERV APAR (OA43663), DEVMAN APAR (OA46198), IOS/BCP APAR (OA46173), SDM APAR (OA43654), ICKDSF APAR (PM99490)
  - Check the latest HOLDDATA for a complete list of the latest APARs

4.2 Couple Dataset (CDS) placement

In contrast to some older CDS placement recommendations, the latest Basic HyperSwap configuration recommendation at the time of writing is that both LOGR Couple Data Sets (primary and alternate) should be part of the PPRC configuration for HyperSwap, to ensure that current Sysplex recovery information is available on either site. However, all other types of CDS must NOT be replicated as part of the PPRC configuration. Instead, they are replicated via the Active/Alternate CDS mechanism of the Sysplex coupling facility only. Those CDS should be placed on simplex volumes, preferably on a separate LCU that is not replicated at all, to avoid any kind of Freeze impact. For primary and alternate CDS redundancy, those volumes should be located on different systems. Furthermore, spare CDS volumes should be prepared on the other systems in order to re-create the CDS redundancy after any PSWITCH event in the Sysplex.

The following picture illustrates the recommended CDS placement in this 4 site solution, where the Sysplex LPARs should be IPL’ed from either the DC1 or the DC2 systems. This example is explicitly for a Sysplex that has its primary data on the DC11 system:

[Figure: recommended CDS placement on the systems DC11 and DC12 in DC1 (Prod. Site, LPARs LP1 to LP8) and DC21 and DC22 in DC2 (DR Site, LPARs LP1' to LP8'). Data volumes are PPRCed. Legend:
  P    Primary LOGR CDS (PPRC source)
  A    Alternate LOGR CDS (PPRC source)
  P’   Primary LOGR CDS (PPRC target)
  A’   Alternate LOGR CDS (PPRC target)
  P    All other primary CDS (simplex) for the Sysplex running in DC1
  A    All other alternate CDS (simplex) for the Sysplex running in DC1
  S    Temp spares for CDS during outage (simplex) for the Sysplex running in DC1
  P    All other primary CDS (simplex) for the Sysplex running in DC2
  A    All other alternate CDS (simplex) for the Sysplex running in DC2
  S    Temp spares for CDS during outage (simplex) for the Sysplex running in DC2]

Note that the LOGR CDS must be consistent with the latest state of the Sysplex. They are therefore part of the HyperSwap configuration, which ensures that they are swapped together with the productive workload and are consistent on either site. All other CDS on the simplex volumes contain mostly static information and Sysplex status information that can be restored as part of an IPL.

However, they also contain Coupling Facility (XCF) policy definitions. If XCF policy changes are applied to the active Sysplex in production, those policy changes also need to be applied to the simplex CDS at the other site. To access those simplex volumes at the inactive site, a DR practice copy can be created and DR LPARs can be started off the 4th system to apply these XCF policy changes in parallel.


5 Operational scenarios

This chapter describes operational scenarios for managing this 4 site replication solution. The scenarios consider the dependencies of the two active sessions as well as toggling the 2 active with the 2 inactive CSM sessions upon a site switch. They ensure that no full copy is required and that relationships can be assimilated without problems when inactive CSM sessions are started.

Note:

The scenarios have been validated in a real environment. Any deviation from them might result in undesired errors, which in the worst case require a Copy Services relationship cleanup with a resulting full copy.

5.1 (Re-) Establish replication from DC1 to DC2

The following procedure can be used either to start the replication from scratch or to restart the replication and, where present, assimilate existing hardware relationships for the environment <pref>:

• For an inactive Session <pref>_L_MMGM_LR, ensure that the Active Host site is on H1
  - If not, issue Set Production to Site 1 to the session
• Start Session <pref>_L_MMGM_LR (Start H1->H2 H1->H3)
  - This will (re-)start or assimilate the copy relations between the local systems and the designated remote system, where the H3 volumes reside
  - Wait until the Session is Prepared and in green Normal state
  - If HyperSwap is enabled for the Session on the H1-H2 role pair, HyperSwap should get enabled automatically (see the HS indicator in the Session topology graphic; the H1-H2 role pair in the session details will also indicate HS)
• For an inactive Session <pref>_L_GC_RR, ensure that the Active Host site is on H1
  - If not, issue Enable Copy to Site 2 to the session
• Issue StartGC H1->H2 for Session <pref>_L_GC_RR
  - This will (re-)start or assimilate the cascaded copy relations between the remote systems for an optional practice copy on the 4th system
  - Note: The practice copy must be made consistent by manual operation (see following procedure)
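As a compact restatement, the command sequence of this procedure can be derived from the current Active Host site of each session. The helper below is a hypothetical sketch that only builds the list of commands named in the steps above; it does not issue them and does not model the wait for the Prepared state.

```python
# Hypothetical sketch: derive the command sequence of section 5.1.
# The command names are the ones used in the procedure; nothing is sent to CSM.
def establish_steps(prefix, mmgm_active_host="H1", gc_active_host="H1"):
    """Return ordered (session, command) pairs to (re-)establish DC1 -> DC2."""
    mmgm = prefix + "_L_MMGM_LR"
    gc = prefix + "_L_GC_RR"
    steps = []
    if mmgm_active_host != "H1":
        steps.append((mmgm, "Set Production to Site 1"))
    steps.append((mmgm, "Start H1->H2 H1->H3"))
    if gc_active_host != "H1":
        steps.append((gc, "Enable Copy to Site 2"))
    steps.append((gc, "StartGC H1->H2"))
    return steps
```

In practice the second session would only be started after the first one has reached the Prepared state.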

The final replication for the <pref>_ environment with production volumes on DC11 will be as follows:

[Figure: replication topology across DC1 (Prod. Site; DC11, DC12; LPARs LP1 to LP8) and DC2 (DR Site; DC21, DC22; LPARs LP1' to LP8') with volumes A, Ja, B, C, Jc, D. Session <pref>_L_MMGM_LR (H1, H2, H3, J3): Metro Mirror, Global Mirror. Session <pref>_L_GC_RR (H1, H2): Global Copy.]

5.2 Create a DR practice copy on remote DC (4th system)

This procedure describes how a consistent practice copy can be created with the least impact on production and DR capability. It can be used independent of which datacenter is running the actual production workload; the names of both active Sessions start either with <pref>_L or with <pref>_R.

If the <pref>_x_MMGM_ Session is in a Normal, Prepared state with the Global Mirror role pair H1-J3 forming Consistency Groups, and the corresponding <pref>_x_GC_ Session is in a Warning, Preparing state, a consistent DR practice copy can be created as follows:

• In the <pref>_x_MMGM_ Session, issue the following:
  - SuspendH1H3
    • This will stop forming Global Mirror CGs and suspend the Global Copy pairs of the H1-H3 role pair
    • The Session state will change to Severe, Suspended (Partial)
    • Local DR (and optional HyperSwap) capability is maintained between H1 and H2
  - FailoverH3
    • The failover is required in case Global Mirror could not be paused with the consistency option for the H3 volumes
    • This restores the last Global Mirror CG on the H3 volumes to ensure they are consistent
    • Wait until the H3 volume status in the session changes to consistent. The relationships in the H1-H3 role pair will become Target Available
• In the corresponding <pref>_x_GC_ Session, issue the following:
  - Suspend
    • This will convert the cascaded Global Copy relations into Metro Mirror and suspend the relationships consistently once all are synchronized
    • Wait until the Session state changes to Severe, Suspended
  - Recover
    • This will do a PPRC failover at the volumes on the 4th system to allow full access to the DR practice copy
    • Wait until the Session state changes to Normal, Target Available
    • Changes will now be tracked on either site to allow an incremental resynchronization of the cascaded relationship later on, once the DR practice tests are completed
• In the <pref>_x_MMGM_ Session, issue the following:
  - StartGM H1->H3
    • This will resume Global Copy and restart forming Global Mirror CGs
    • The Session state will change back to Normal, Prepared
• The volumes on the 4th system can now be used for IPL and for DR testing
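The sequence above alternates between the two active sessions, so it can help to write it down as data together with its precondition. This is a sketch only: the command and state names are the ones used in the procedure, while the function names and the tuple layout are hypothetical and nothing here calls CSM.

```python
# Sketch of the practice copy sequence as data; nothing here is sent to CSM.
# Each entry: (session kind, command, state to wait for before continuing).
PRACTICE_COPY_STEPS = [
    ("MMGM", "SuspendH1H3", "Severe, Suspended (Partial)"),
    ("MMGM", "FailoverH3", "H1-H3 role pair Target Available"),
    ("GC", "Suspend", "Severe, Suspended"),
    ("GC", "Recover", "Normal, Target Available"),
    ("MMGM", "StartGM H1->H3", "Normal, Prepared"),
]


def ready_for_practice_copy(mmgm_state, gc_state):
    """Precondition: MMGM session Normal, Prepared and GC session Warning, Preparing."""
    return mmgm_state == "Normal, Prepared" and gc_state == "Warning, Preparing"


def practice_copy_plan(mmgm_session, gc_session):
    """Return ordered (session, command, wait_for) tuples for the two active sessions."""
    names = {"MMGM": mmgm_session, "GC": gc_session}
    return [(names[kind], command, wait_for)
            for kind, command, wait_for in PRACTICE_COPY_STEPS]
```

Keeping the expected state next to each command makes it explicit that the next command should only be issued once that state has been reached.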

The final replication for the <pref>_ environment with production volumes on DC11 and DR tests on DC22 will be as follows:

[Figure: replication topology across DC1 (Prod. Site; DC11, DC12; LPARs LP1 to LP8) and DC2 (DR Site; DC21, DC22; LPARs LP1' to LP8') with volumes A, Ja, B, C, Jc, D. Session <pref>_L_MMGM_LR (H1, H2, H3, J3): Metro Mirror, Global Mirror. Session <pref>_L_GC_RR (H1, H2): Target Available.]

Creating the consistent DR copy at the 4th system in this replication context has the following benefits:

• Local DR (and optional HyperSwap) capability between H1 and H2 is maintained throughout this procedure
• Remote DR capability is restored on the 3rd system while testing can commence on the 4th system
  - The impact on remote DR capability is minimal. It lasts only for the duration of restoring the last Global Mirror CG, which needs a final incremental synchronization with the volumes at the 4th system before the Global Mirror relations can be restarted

5.3 Finishing the DR practice copy on remote DC (4th system)

Once the DR practice tests are completed, this procedure restores the original replication state of the active Sessions, whose names both begin with either <pref>_L or <pref>_R.

• Vary all volumes on the 4th system offline again in all LPARs accessing them (or power down the LPARs)
• In the corresponding <pref>_x_GC_ Session, issue the following:
  - StartGC H1->H2
    • This will resume PPRC and continue Global Copy for another practice test in the future
    • The session will go to the Warning, Preparing state and remain there

The following procedures can be used for troubleshooting:

• In case the StartGC H1->H2 fails, it might be because volumes are still in a grouped state, which indicates they are online to a host
  - This sanity check can be overruled by disabling the following Session property in the corresponding <pref>_x_GC_ session:
    • “Fail MM/GC if target is online (CKD)”
  - Re-issue the StartGC H1->H2 after all volumes have been taken offline at the 4th system or the above property has been disabled
    • You should re-enable the property afterwards to keep the sanity check for future start commands
• In case a synchronous Start H1->H2 was issued by mistake to the <pref>_x_GC_ session, it will fail
  - DS8000 does not allow a synchronous copy to be cascaded off an asynchronous copy pair
  - This prevents a wrong start command from causing I/O impact through a synchronous cascaded copy over distance
  - Re-issue the correct asynchronous StartGC H1->H2 for the session instead

The final replication for the <pref>_ environment with production volumes on DC11 will be as follows:

[Figure: replication topology across DC1 (Prod. Site; DC11, DC12; LPARs LP1 to LP8) and DC2 (DR Site; DC21, DC22; LPARs LP1' to LP8') with volumes A, Ja, B, C, Jc, D. Session <pref>_L_MMGM_LR (H1, H2, H3, J3): Metro Mirror, Global Mirror. Session <pref>_L_GC_RR (H1, H2): Global Copy.]

5.4 HyperSwap the production workload on local site

If HyperSwap is enabled for the active _MMGM_ Session, the production I/O can be switched transparently to the other system at the same datacenter. If there are HyperSwap triggers, it will swap automatically (unplanned HyperSwap).

Optionally, you can perform a manual HyperSwap (planned HyperSwap):

• In the active <pref>_x_MMGM_ session, issue the HyperSwap H1H2 command
  - This will concurrently switch the production workload between the H1 and H2 volumes
  - The session will change into the Normal, Target Available state and the Active Site will have changed between the H1 and H2 volumes
  - HyperSwap will be disabled

The final replication will be as follows for the <pref>_ environment with production volumes on DC11:

[Figure: replication topology after the HyperSwap, same layout as above. Session <pref>_L_MMGM_LR (H1, H2, H3, J3) and session <pref>_L_GC_RR (H1, H2); relation labels shown: Target Available, Suspended, Global Copy.]

Independent of whether a planned or unplanned HyperSwap occurred, at a defined point later on you can resynchronize the local Metro Mirror relations (H1-H2) and/or the Global Mirror relations to the designated remote system from the new active production site (H2-H3 or H1-H3). Due to the DS8000 Multi Target PPRC capabilities, these will all be incremental resyncs instead of full copies. The cascaded Global Copy relation on the remote site can continue to run throughout.

The relations that can be resynced after a HyperSwap depend on the actual situation:

• Any resync for the local and the remote site can be done immediately after a planned HyperSwap
  - If H2 is the new active production site, issue the Start H2->H1 H2->H3 command to get back to a normal state
  - If H1 is the new active production site, issue the Start H1->H2 H1->H3 command to get back to a normal state
  - Wait until both legs reach a state of Prepared and the session goes into a Prepared state with Normal status
  - When HyperSwap has been re-enabled, the H1-H2 role pair will change automatically from MM to HS type, with HyperSwap capability back into the opposite direction
• An unplanned HyperSwap might require additional repair actions until the local sites can be resynchronized
  - An unplanned HyperSwap might have been triggered due to primary storage controller issues
    • To resynchronize individual legs only when production is active on H2, issue the StartGM H2->H3 or Start H2->H1 command as required
    • To resynchronize individual legs only when production is active on H1, issue the StartGM H1->H3 or Start H1->H2 command as required
  - When all systems are in a normal state again and PPRC links are up
    • If H2 is the new active production site, issue the Start H2->H1 H2->H3 command to get back to a normal state
    • If H1 is the new active production site, issue the Start H1->H2 H1->H3 command to get back to a normal state
    • Wait until both legs reach a state of Prepared and the session goes into a Prepared state with Normal status
    • When HyperSwap has been re-enabled, the H1-H2 role pair will change automatically from MM to HS type, with HyperSwap capability back into the opposite direction
• The corresponding active _GC_ session can remain in the Preparing state throughout the whole local failover and resync scenario
  - Since Global Mirror can always be restored to the last successfully formed consistency group, it doesn’t make much sense to restore it on the H3 volumes and sync it with the _GC_ session on the 4th system for a ‘golden copy’ prior to the Global Mirror resync
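Which resync command applies depends only on the new active production site and on whether both legs or a single leg are to be resynchronized. The sketch below returns the command strings used in this section; the function name and the legs parameter are hypothetical, and the function does not issue anything to CSM.

```python
# Sketch: pick the resync command named in this section for the active host.
def resync_command(active_host, legs="both"):
    """Return the command text for resyncing from active_host ('H1' or 'H2').

    legs: 'both' (local and remote), 'local' (Metro Mirror leg only)
    or 'remote' (Global Mirror leg only).
    """
    if active_host not in ("H1", "H2"):
        raise ValueError("active_host must be 'H1' or 'H2'")
    other = "H2" if active_host == "H1" else "H1"
    if legs == "both":
        return "Start {0}->{1} {0}->H3".format(active_host, other)
    if legs == "local":
        return "Start {0}->{1}".format(active_host, other)
    if legs == "remote":
        return "StartGM {0}->H3".format(active_host)
    raise ValueError("legs must be 'both', 'local' or 'remote'")
```

The single-leg variants correspond to the case after an unplanned HyperSwap, where one leg may have to wait for repair actions.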

When resync is completed after a HyperSwap to H2, the final replication will be as following for the <pref>_ environment with production volumes on DC12

5.5 Failover production workload on local site

If HyperSwap is not enabled for the Session, a failover to the secondaries is not transparent and requires re-IPL from secondary devices. Following is the procedure to manually failover production workload on the local site:

In the active <pref>_x _MMGM_ session, issue the Suspend command

This will Freeze/Unfreeze the H1-H2 Metro Mirror pairs and suspend the H1-H3 or H2-H3 Global Mirror pairs, which depends the active production site (H1 or H2)

If production should be switched to H2:

A

Ja

B

C

Jc

D

DC2

(DR Site)

H1

H2

H3

J3

Global

Copy

DC1

(Prod. Site)LP1

LP2

LP3

LP4

LP5

LP6

LP7

LP8

LP1‘

LP2‘

LP3‘

LP4‘

LP5‘

LP6‘

LP7‘

LP8‘

DC11

DC12

DC21

DC22

<pref>_L_MMGM_LR

<pref>_L_GC_RR

H2

H1

Metro

Mirror Global Mirror

Page 30: IBM DS8000 four site replication management with IBM Copy ... · Furthermore, these D volumes are used for practicing a disaster recovery. This minimizes the DR Recovery Point Objective

IBM Systems Spectrum Storage Page 30 © IBM Copyright, 2016

Issue the RecoverH2 command in the active <pref>_x_MMGM_ session

This will enable updates on the H2 volumes and track changes for incremental resynchronization later on

Re-IPL LPARs from H2 volumes (B or D volumes)

If production should be switched to H1:

Issue the RecoverH1 command in the active <pref>_x_MMGM_ session

This will enable updates on the H1 volumes and track changes for incremental resynchronization later on

Re-IPL LPARs from H1 volumes (A or C volumes)

Once IPL is successful, the production swap can be confirmed to the <pref>_x_MMGM_ session. This will enable the commands for reversed copy direction and disable commands for original copy direction between H1 and H2

If production was switched to H2, issue Confirm Production at Site 2

If production was switched to H1, issue Confirm Production at Site 1

The final replication will be as following for the <pref>_ environment with production volumes manually switched from DC11 to DC12 (Switch production to H2)

At a defined point later on, resynchronize the local Metro Mirror volumes and/or Global Mirror to the designated remote system (out of sync copy)

This depends on the actual system states, it can be done immediately after a planned local production Failover

An unplanned local production Failover might require additional repair actions until the original local site can be resynchronized

The relation to designated remote system can be restarted any time as required:

• If Production was switched to H2, restart Global Mirror incremental resync via StartGM H2->H3

A

Ja

B

C

Jc

D

DC2

(DR Site)

H1

H2

H3

J3

Global

Copy

DC1

(Prod. Site)LP1

LP2

LP3

LP4

LP5

LP6

LP7

LP8

LP1‘

LP2‘

LP3‘

LP4‘

LP5‘

LP6‘

LP7‘

LP8‘

DC11

DC12

DC21

DC22

<pref>_L_MMGM_LR

<pref>_L_GC_RR

H2

H1

Target

Available

Suspended

Page 31: IBM DS8000 four site replication management with IBM Copy ... · Furthermore, these D volumes are used for practicing a disaster recovery. This minimizes the DR Recovery Point Objective

IBM Systems Spectrum Storage Page 31 © IBM Copyright, 2016

• If Production was switched to H1, restart Global Mirror incremental resync via StartGM H1->H3

When all systems are in a normal state again and PPRC links are up, bring them all back to a normal replication

If Production was switched to H2, issue the Start H2->H1 H2->H3

If Production was switched to H1, issue the Start H1->H2 H1->H3

The corresponding active _GC_ session can remain in the preparing state throughout the whole local failover and resync scenario

Since Global Mirror can always be restored to the last successfully formed consistency group, it does not make much sense to restore it on the H3 volumes and sync it with the _GC_ session on the 4th system to preserve a ‘golden copy’ prior to the Global Mirror resync
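The follow-up after a local production switch depends only on which local site now hosts production and on whether all systems are back to normal. The following is a minimal sketch of one reading of the steps above, in plain Python. It does not call any CSM interface; the function and dictionary names are illustrative assumptions, and only the command labels are taken from the text.

```python
# Command labels are copied from the procedure text above.
# Nothing here talks to a CSM server; all names are illustrative only.
FOLLOW_UP = {
    "H2": ("Confirm Production at Site 2", "StartGM H2->H3", "Start H2->H1 H2->H3"),
    "H1": ("Confirm Production at Site 1", "StartGM H1->H3", "Start H1->H2 H1->H3"),
}


def local_failover_follow_up(production_on, all_systems_normal):
    """Return the MMGM session commands to issue after a local production switch."""
    confirm, restart_gm, full_start = FOLLOW_UP[production_on]
    if all_systems_normal:
        # All systems and PPRC links are up: return to normal replication.
        return [confirm, full_start]
    # Otherwise only the Global Mirror leg is restarted (incremental resync).
    return [confirm, restart_gm]


print(local_failover_follow_up("H2", all_systems_normal=False))
```

Keeping the labels in one table makes it harder to mix up the H1 and H2 variants when writing a runbook.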

When the resync is completed after a manual production switch to H2, the final replication state for the <pref>_ environment with production volumes on DC12 is as follows

5.6 Planned remote failover – Switch datacenters

Switching datacenters (or remote failover) in this solution means the production is completely switched to the remote datacenter and will keep running there for a longer period. Therefore the local DR/HA capability should be enabled at the remote site, while the remote DR capability should also be restored. In terms of CSM session management, this means the active Sessions need to be toggled from <pref>_x_ to <pref>_y_. This procedure allows these transitions without any full copy since it utilizes assimilation of existing relations and incremental resync capabilities of the DS8000.

As a starting point for this procedure, production can run in DC1 or DC2 and the active <pref>_x_MMGM_ session can have the active host on either H1 or H2. In other words, production could run on either A, B, C or D volumes, while the local Metro Mirror and remote Global Mirror replication is in place in the active <pref>_x_MMGM_ session, as well as the cascaded Global Copy replication in the active <pref>_x_GC_ session.

[Figure: final replication state after the resync following a manual production switch to H2, with production volumes on DC12. Sessions shown: <pref>_L_MMGM_LR, <pref>_L_GC_RR. Relation labels: Metro Mirror, Global Mirror, Global Copy.]

The following steps are required if a planned datacenter switch needs to be performed for Sysplex <pref>:

In the active <pref>_x_MMGM_ Session issue:

Suspend

• This will stop forming Global Mirror CGs and suspend its Global Copy pairs as well as the Metro Mirror pairs

• The Session state will change to Severe, Suspended

RecoverH3

• This restores the last Global Mirror CG on the H3 volumes to ensure they are consistent

In the corresponding active <pref>_x_GC_ Session issue the following:

Suspend

• This will convert the cascaded Global Copy into Metro Mirror and once they are all full duplex it will Freeze/Unfreeze the relationships consistently

• It will save a consistent data point at the 4th system before IPL from 3rd system

• It should not take long since the cascaded Global Copy should be almost in sync most of the time

IPL LPARs from the H3 volumes

These are the C volumes if you did a remote failover to DC2

These are the A volumes if you did a remote failover to DC1
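The first phase of the planned switch can be written down as an ordered list of (session, action) pairs. This is only a sketch under stated assumptions: the function name and the example session names with the prefix `PRD` are hypothetical, no CSM interface is called, and the action labels are the ones used in the steps above.

```python
def planned_switch_steps(mmgm_session, gc_session, target_dc):
    """List (session, action) pairs for the first phase of a planned datacenter switch.

    target_dc is "DC2" or "DC1"; per the text the IPL uses the C volumes
    for a remote failover to DC2 and the A volumes for one to DC1.
    """
    ipl_volumes = {"DC2": "C", "DC1": "A"}[target_dc]
    return [
        (mmgm_session, "Suspend"),      # stops GM CG formation, suspends GC and MM pairs
        (mmgm_session, "RecoverH3"),    # restores the last GM CG on the H3 volumes
        (gc_session, "Suspend"),        # saves a consistent data point at the 4th system
        (None, f"IPL LPARs from the H3 volumes ({ipl_volumes} volumes)"),
    ]


# Hypothetical session names for illustration only.
for session, action in planned_switch_steps("PRD_L_MMGM_LR", "PRD_L_GC_RR", "DC2"):
    print(session or "host", "->", action)
```

Passing the session names in as parameters keeps the sketch independent of any particular naming convention.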

The following picture shows the resulting replication states after a successful IPL from the C volumes on DC21. It does not matter whether the production was previously on DC11 or DC12.


Note:

In this special situation, we have a PPRC failover of a multi target relation plus a suspended cascaded relation to a third target on the C or A volumes. DS8000 Multi Target PPRC supports up to 3 different target relations on a primary volume, as long as not more than 2 of them are active in copy.
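The limit stated in the note can be expressed as a small check. The sketch below is an illustration only: the function and constant names are assumptions, and the two limits are taken from the note, not from any DS8000 interface.

```python
MAX_TARGET_RELATIONS = 3  # per the note above
MAX_ACTIVE_RELATIONS = 2  # per the note above


def multi_target_ok(relations_active):
    """relations_active holds one bool per target relation of a primary volume,
    True if that relation is actively copying."""
    return (len(relations_active) <= MAX_TARGET_RELATIONS
            and sum(relations_active) <= MAX_ACTIVE_RELATIONS)


# Three relations on the volume, none of them actively copying.
print(multi_target_ok([False, False, False]))  # prints True
```

A volume with three relations passes as long as at most two of them are copying, which is why the failed-over and suspended relations described in the note can coexist.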

If the IPL is not successful, you can restore the good data that was saved earlier on the 4th system onto the H3 volumes by doing the following in the suspended <pref>_x_GC_ Session:

Recover

Enable Copy to Site 1

Start H2->H1

• This will resync the good data back to the H3 volumes that are used for IPL (C or A volumes)

When session state is prepared: Suspend

Recover

When session state is Target Available with active host on H1:

• Retry IPL of LPARs from 3rd system (C or A volumes)

• You can always restart the <pref>_x_GC_ session from H2 to H1 and suspend and recover on H1 again for additional IPL trials

Following is the resulting replication state after a data restore from the 4th system, when the planned production switch was to DC21 (C volumes)

[Figure: resulting replication states after a successful IPL from the C volumes on DC21. Sessions shown: <pref>_L_MMGM_LR, <pref>_L_GC_RR. Relation labels: Suspended, Target Available (3x).]


After successful IPL, prepare the relations for a CSM Session swap from the <pref>_x_ Sessions to the <pref>_y_ Sessions

In the active <pref>_x_MMGM_ Session issue

• Confirm production at site 3

- This enables commands for replication start off site 3

• StartGC H3->H1->H2

- This will establish cascaded Global Copy relations back from H3 to H2 and H1 and change the session state to Warning, Preparing

- Preparing is a role pair state that can be assimilated later on by the other inactive CSM session

- Never issue StartGC H3->H2->H1, since the H3-H2 relation between the datacenters cannot be assimilated by the other <pref>_y_MMGM_ session. If that happened by mistake, see section 5.8 Revert a mistaken StartGC H3->H2->H1 action

In the active <pref>_x_GC_ Session, check that the active Host is on H1 and the session state is either Suspended or Target Available.

• If the active Host is still set to H2 and the state is Target Available from a previous data restore from the 4th system, you need to issue Enable Copy to Site 2

In the active <pref>_x_GC_ Session issue Start H1->H2

• This will resume copy of the active production volumes between the local systems and bring the pairs into a state that can be assimilated later on by the other CSM session

Following is the resulting state after restarting replication with the original active sessions, when the planned production switch was to DC21 (C volumes).

[Figure: resulting replication state after a data restore from the 4th system, when the planned production switch was to DC21 (C volumes). Sessions shown: <pref>_L_MMGM_LR, <pref>_L_GC_RR. Relation labels: Target Available (4x).]


Now all relations have a state that can be assimilated when we toggle the active CSM sessions as follows:

If no central <pref>_volumes.csv file is used, preserve actual Session configuration and deactivate both active sessions

Export the Copy Sets of the active <pref>_x_MMGM_ and <pref>_x_GC_ sessions

• Save the copy set files to a location from where you can re-import them later on

Remove all Copy Sets from the active <pref>_x_MMGM_ and <pref>_x_GC_ sessions

Be sure to select the option ‘Yes, keep the base hardware relationships’ in the copy set removal dialog. Otherwise a full copy will be required later on since the PPRC pairs will be removed completely.

Once all copy sets are removed, the sessions will go back to an inactive, defined state

At this point, all 4 <pref>_ sessions of the Sysplex are inactive, but the copy relations are still running on the hardware

The emptied <pref>_x_ Sessions cannot be restarted, which prevents restarting them by mistake during the Session swap procedure.

Assimilate the hardware copy relations in the <pref>_y_ sessions defined in the opposite direction

In the corresponding <pref>_y_MMGM_ Session defined in the opposite direction

• Verify the active host is set to H1

- If not, issue Set Production to Site 1 to set it back to H1

• Then issue Start H1->H2 H1->H3

[Figure: resulting state after restarting replication with the original active sessions, when the planned production switch was to DC21 (C volumes). Sessions shown: <pref>_L_MMGM_LR, <pref>_L_GC_RR. Relation labels: Global Copy, Metro Mirror, Global Copy.]


- This will assimilate the still existing Global Copy pairs between the datacenters as well as the local Global Mirror or Global Copy pairs

• The session should finally go into a normal, prepared state with a local Metro Mirror and a remote Global Mirror relationship

• If the Session has HyperSwap enabled on the H1-H2 role pair, it will also send the Metro Mirror configuration again to the z/OS HyperSwap manager which will finally enable local HyperSwap for the Sysplex.

In the corresponding <pref>_y_GC_ Session defined at the other site issue StartGC H1->H2

This will assimilate the still existing Global Copy pairs at the other site to be prepared to take a future consistent copy for DR tests.

Following is the resulting state after swapping to the active <pref>_ sessions defined in the opposite direction, when the planned production switch was to DC21 (C volumes).

Add the Copy Sets again into the empty <pref>_x_ Sessions

Do this step last to avoid start of the wrong sessions during the session swap

Either use the previously saved copy set files or activate the corresponding header in a central <pref>_volumes.csv file to restore previous configuration

Check that there are no errors when adding the copy sets

• There will be only warnings if the volume association is correct. The warnings occur because the volumes are already defined in other CSM sessions

Check that all 4 <pref>_ sessions have the same copy set count at the end

Make sure the active host is set to H1 in the inactive <pref>_x_ Sessions

• If not, issue ‘Set Production to Site 1’ or ‘Enable Copy to site 2’ to set it back to H1
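The closing checks of the session swap (equal copy set counts in all 4 sessions, active host H1 in the inactive sessions) lend themselves to a simple script. The sketch below assumes the counts and host roles have been read from the CSM GUI and typed in by hand; the function name and the example session names with the prefix `PRD` are hypothetical, and no CSM interface is called.

```python
def swap_checks(copy_set_counts, inactive_hosts):
    """Return a list of problems; an empty list means the checks passed.

    copy_set_counts: session name -> number of copy sets (all 4 sessions)
    inactive_hosts:  inactive session name -> its active host role, e.g. "H1"
    """
    problems = []
    if len(copy_set_counts) != 4:
        problems.append(f"expected 4 sessions, found {len(copy_set_counts)}")
    if len(set(copy_set_counts.values())) > 1:
        problems.append(f"copy set counts differ: {copy_set_counts}")
    for session, host in sorted(inactive_hosts.items()):
        if host != "H1":
            problems.append(f"{session}: active host is {host}, expected H1")
    return problems


# Hypothetical values for illustration only.
counts = {"PRD_L_MMGM_LR": 120, "PRD_L_GC_RR": 120,
          "PRD_R_MMGM_RL": 120, "PRD_R_GC_LL": 118}
hosts = {"PRD_L_MMGM_LR": "H1", "PRD_L_GC_RR": "H2"}
for problem in swap_checks(counts, hosts):
    print(problem)
```

In this example the check reports the deviating copy set count and the inactive session whose active host still has to be set back to H1.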

[Figure: resulting state after swapping to the active <pref>_ sessions defined in the opposite direction, when the planned production switch was to DC21 (C volumes). Sessions shown: <pref>_R_MMGM_RL, <pref>_R_GC_LL. Relation labels: Metro Mirror, Global Mirror, Global Copy.]


5.7 Unplanned remote failover – Switch Datacenters

Switching datacenters due to a local disaster (or unplanned remote failover) in this solution means the production is completely switched to the remote datacenter to recover from a local outage, which prevented a local production failover or HyperSwap. The Sysplex LPARs need to be IPLed from the remote site and keep running there for an undetermined period. Therefore the local DR/HA capability should be enabled at the remote site as soon as possible, while the remote DR capability (Global Mirror) should be restored once the original site is operational again. It is the same scenario as for a planned remote failover, except that the remote replication possibly cannot be restarted yet.

In terms of CSM session management, this means the active Sessions cannot be switched from <pref>_x_ to <pref>_y_ until the active <pref>_x_MMGM_ Session can perform the necessary PPRC failback actions to bring all relations into a state that can be assimilated by the other CSM session.

As a starting point for this procedure, production can run in DC1 or DC2 and the active <pref>_x_MMGM_ session can have the active host on either H1 or H2. In other words, production could run on either A, B, C or D volumes, while the local Metro Mirror and remote Global Mirror replication is in place in the active <pref>_x_MMGM_ session, as well as the cascaded Global Copy replication in the active <pref>_x_GC_ session.

The following steps are required if an unplanned datacenter switch has to be performed for Sysplex <pref>:

In the active <pref>_x_MMGM_ Session issue:

Suspend (if the Session is not suspended yet as a result of a disaster)

• This will stop forming Global Mirror CGs and suspend its Global Copy pairs as well as the Metro Mirror pairs

• The Session state will change to Severe, Suspended

- If the Session state remains in Suspending and cannot complete due to communication loss to the H1 and/or H2 systems, issue the Stop command to bring the Session into a Suspended state.

RecoverH3

• This restores the last Global Mirror CG on the H3 volumes to ensure they are consistent

In the corresponding active <pref>_x_GC_ Session issue the following:

Suspend

• This will convert the cascaded Global Copy into Metro Mirror and once they are all full duplex it will Freeze/Unfreeze the relationships consistently

• It will save a consistent data point at the 4th system before IPL from 3rd system

• It should not take long since the cascaded Global Copy should be almost in sync most of the time

IPL LPARs from the H3 volumes

These are the C volumes if you did a remote failover to DC2

These are the A volumes if you did a remote failover to DC1
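In the unplanned case, the first command for the active _MMGM_ session depends on the state the disaster left it in. The following sketch captures that decision. It is an assumption-laden illustration: the function name and the `suspend_stuck` flag are hypothetical, the state and command labels are the ones used in the steps above, and no CSM interface is called.

```python
def next_mmgm_action(session_state, suspend_stuck=False):
    """Return the next command for the active MMGM session, or None to wait.

    suspend_stuck: True if the session stays in Suspending because the
    H1 and/or H2 systems cannot be reached.
    """
    if session_state == "Suspended":
        # Already suspended (by the disaster or by a previous command).
        return "RecoverH3"
    if session_state == "Suspending":
        # Stop forces the session into Suspended when Suspend cannot complete.
        return "Stop" if suspend_stuck else None
    return "Suspend"


print(next_mmgm_action("Prepared"))
print(next_mmgm_action("Suspending", suspend_stuck=True))
print(next_mmgm_action("Suspended"))
```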


The following picture shows the resulting replication states after a successful IPL from the C volumes on DC21. It does not matter whether the production was previously on DC11 or DC12.

If the IPL is not successful, you can restore the good data that was saved earlier on the 4th system onto the H3 volumes by doing the following in the suspended <pref>_x_GC_ Session:

Recover

Enable Copy to Site 1

Start H2->H1

• This will resync the good data back to the H3 volumes that are used for IPL (C or A volumes)

When session state is prepared: Suspend

Recover

When session state is Target Available with active host on H1:

• Retry IPL of LPARs from 3rd system (C or A volumes)

• You can always restart the <pref>_x_GC_ session from H2 to H1 and suspend and recover on H1 again for additional IPL trials

If the IPL is successful, restore local DR/HA capability. We use the active <pref>_x_GC_ Session for the time being until the remote replication can be restarted and a complete Session swap can be performed:

In the active <pref>_x_GC_ Session, check that the active Host is on H1 and the session state is either Suspended or Target Available.

• If the active Host is still set to H2 and the state is Target Available from a previous data restore from the 4th system, you need to issue Enable Copy to Site 2

In the active <pref>_x_GC_ Session issue Start H1->H2

[Figure: resulting replication states after a successful IPL from the C volumes on DC21. Sessions shown: <pref>_L_MMGM_LR, <pref>_L_GC_RR. Relation labels: Suspended, Target Available (3x).]


• This will resume copy of the active production volumes between the local systems to provide local DR capability

• If local HA capability is required as well, you can enable HyperSwap in the properties of the active <pref>_x_GC_ session. This ensures that the Metro Mirror HyperSwap configuration is passed to the z/OS HyperSwap manager and HyperSwap is enabled

• Any local DR or HyperSwap action that is needed before the active CSM sessions can be swapped can be performed with the active <pref>_x_GC_ session, either by

- Manual or triggered HyperSwap

- Or by the command sequence Suspend, Recover, Enable Copy to H1

• A resync after a local DR or HyperSwap action can be done with the command Start H2->H1

- This will bring the session back into a final Normal, Prepared state

- If HyperSwap is enabled on the session, it will also send the Metro Mirror configuration again to the z/OS HyperSwap manager, which will finally enable HyperSwap for the Sysplex.

Following are the resulting states after recovery on C volumes and activation of the local HyperSwap capability to D volumes

After the systems on the original site are operational again, prepare the relations for a Session swap

First ensure that production is running on the H3 volumes of the active <pref>_x_MMGM_ Session (C or A volumes)

• If a local DR or HyperSwap occurred and production was switched to the other local system (D or B volumes), switch the production back locally.

If production is active on the H3 volumes of the active <pref>_x_MMGM_ Session issue

[Figure: resulting states after recovery on the C volumes and activation of the local HyperSwap capability to the D volumes. Sessions shown: <pref>_L_MMGM_LR, <pref>_L_GC_RR. Relation labels: HS enabled, Target Available (3x).]


• Confirm production at site 3

- This enables commands for replication start off site 3

• StartGC H3->H1->H2

- This will establish cascaded Global Copy relations back from H3 to H2 and H1 and change the session state to Warning, Preparing

- Preparing is a role pair state that can be assimilated later on by the other inactive CSM session

- Never issue StartGC H3->H2->H1, since the H3-H2 relation between the datacenters cannot be assimilated by the other <pref>_y_MMGM_ session. If that happened by mistake, see section 5.8 Revert a mistaken StartGC H3->H2->H1 action

- If StartGC H3->H1->H2 fails due to any kind of H1, H2 or link errors, the necessary PPRC failback sequence was not issued yet and this session must be kept active until a successful StartGC H3->H1->H2 can be issued

In the active <pref>_x_GC_ Session, check that the active Host is on H1 and it has a state of Prepared

• If the active Host is still set to H2 and the state is Target Available from a previous data restore from the 4th system, you need to issue Enable Copy to Site 2

• If the state is not Prepared issue Start H1->H2

- This will resume copy of the active production volumes between the local systems and bring the pairs into a state that can be assimilated later on by the other CSM session

• If HyperSwap is still enabled on the session, remove HyperSwap again in the Session properties.
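The checks on the active _GC_ session before the Session swap amount to three independent conditions. The sketch below derives the required actions from the observed session properties. The function name and parameters are hypothetical, the labels follow the bullets above, and the properties are assumed to be read manually from the CSM GUI.

```python
def gc_session_prep(active_host, state, hyperswap_enabled):
    """Return the actions needed to make the active GC session ready for the swap."""
    actions = []
    if active_host == "H2" and state == "Target Available":
        # Left over from a previous data restore from the 4th system.
        actions.append("Enable Copy to Site 2")
    if state != "Prepared":
        actions.append("Start H1->H2")
    if hyperswap_enabled:
        actions.append("Remove HyperSwap in the Session properties")
    return actions


print(gc_session_prep("H1", "Prepared", hyperswap_enabled=False))  # prints []
print(gc_session_prep("H2", "Target Available", hyperswap_enabled=True))
```

An empty list means the session already has the active host on H1, is Prepared, and has HyperSwap removed.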

Following is the resulting state after restarting replication with the original active sessions, when production is active on DC21 (C volumes).


Now all relations have a state that can be assimilated when we toggle the active CSM sessions as follows:

If no central <pref>_volumes.csv file is used, preserve actual Session configuration and deactivate both active sessions

Export the Copy Sets of the active <pref>_x_MMGM_ and <pref>_x_GC_ sessions

• Save the copy set files to a location from where you can re-import them later on

Remove all Copy Sets from the active <pref>_x_MMGM_ and <pref>_x_GC_ sessions

Be sure to select the option ‘Yes, keep the base hardware relationships’ in the copy set removal dialog. Otherwise a full copy will be required later on since the PPRC pairs will be removed completely.

Once all copy sets are removed, the sessions will go back to an inactive, defined state

At this point, all 4 <pref>_ sessions of the Sysplex are inactive, but the copy relations are still running on the hardware

The emptied <pref>_x_ Sessions cannot be restarted, which prevents restarting them by mistake during the Session swap procedure.

Assimilate the hardware copy relations in the <pref>_y_ sessions defined in the opposite direction

In the corresponding <pref>_y_MMGM_ Session defined in the opposite direction

• Verify the active host is set to H1

- If not, issue Set Production to Site 1 to set it back to H1

• Then issue Start H1->H2 H1->H3

[Figure: resulting state after restarting replication with the original active sessions, when production is active on DC21 (C volumes). Sessions shown: <pref>_L_MMGM_LR, <pref>_L_GC_RR. Relation labels: Global Copy, Metro Mirror, Global Copy.]


- This will assimilate the still existing Global Copy pairs between the datacenters as well as the local Global Mirror or Global Copy pairs

• The session should finally go into a Normal, Prepared state with a local Metro Mirror and a remote Global Mirror relationship

• If the Session has HyperSwap enabled on the H1-H2 role pair, it will also send the Metro Mirror configuration again to the z/OS HyperSwap manager which will finally enable local HyperSwap for the Sysplex.

In the corresponding <pref>_y_GC_ Session defined at the other site issue StartGC H1->H2

• This will assimilate the still existing Global Copy pairs at the other site to be prepared to take a future consistent copy for DR tests.

Following is the resulting state after swapping to the active <pref>_ sessions defined in the opposite direction, when the production switch was to DC21 (C volumes).

Add the Copy Sets again into the empty <pref>_x_ Sessions

Do this step last to avoid start of the wrong sessions during the session swap

Either use the previously saved copy set files or activate the corresponding header in a central <pref>_volumes.csv file to restore previous configuration

Check that there are no errors when adding the copy sets

• There will be only warnings if the volume association is correct. The warnings occur because the volumes are already defined in other CSM sessions

Check that all 4 <pref>_ sessions have the same copy set count at the end

Make sure the active host is set to H1 in the inactive <pref>_x_ Sessions

• If not, issue ‘Set Production to Site 1’ or ‘Enable Copy to site 2’ to set it back to H1

[Figure: resulting state after swapping to the active <pref>_ sessions defined in the opposite direction, with production on DC21 (C volumes). Sessions shown: <pref>_R_MMGM_RL, <pref>_R_GC_LL. Relation labels: Global Mirror, Global Copy, HS enabled.]


5.8 Revert a mistaken StartGC H3->H2->H1 action

After a recovery on H3 of any _MMGM_ session, the following two Start options are available after confirming production on H3:

StartGC H3->H1->H2 => Always use this

StartGC H3->H2->H1 => Never use this

Fortunately, the correct StartGC H3->H1->H2 is selected by default in the command popup window in the GUI. However, if the wrong StartGC H3->H2->H1 was selected by mistake, the resumed Global Copy relations between the datacenters cannot be assimilated by the other _MMGM_ session, because their H2-H3 role pairs are defined for different physical volume pairs and an active H2-H3 role pair cannot be assimilated at all by the corresponding _MMGM_ session in this solution.
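If the Start command after an H3 recovery is ever issued from a script or written into a runbook generator, a simple guard can reject the wrong variant before it reaches the session. The sketch below is a hypothetical helper, not part of CSM: only the two command labels are taken from the text, and the function name is an assumption.

```python
ALLOWED_AFTER_H3_RECOVERY = "StartGC H3->H1->H2"
FORBIDDEN_AFTER_H3_RECOVERY = "StartGC H3->H2->H1"


def check_start_command(command):
    """Return the command unchanged, or raise if it is the forbidden variant."""
    if command == FORBIDDEN_AFTER_H3_RECOVERY:
        raise ValueError(
            f"{command} creates relations that the other MMGM session cannot "
            f"assimilate; use {ALLOWED_AFTER_H3_RECOVERY} instead"
        )
    return command


print(check_start_command("StartGC H3->H1->H2"))
```

Raising an error instead of silently correcting the command keeps the operator aware that the wrong option was about to be used.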

Following are the replication relations when the wrong StartGC H3->H2->H1 was issued

Two options are possible to reactivate only the H3-H1 and H1-H2 copy relations to ensure they can be assimilated:

1. Switch the production back to previous datacenter as soon as possible by using the original _MMGM_ session that has been started with the wrong H3-H2-H1 relations

Do not activate the other 2 sessions defined for the opposite directions

Switching the site back allows the local and remote DR capabilities to be re-enabled

2. Accept there is no unplanned remote datacenter DR capability with cascaded Global Copy H3-H2-H1 while running production at the recovery site

Because no GM consistency groups can be formed between H3-H2 in the active _MMGM_ session

Consider that a planned site switch back is needed at a later point in time, and that production on H3 should be shut down prior to the final resync and suspend of the cascaded H3-H2-H1 Global Copy relation.

[Figure: replication relations when the wrong StartGC H3->H2->H1 was issued. Sessions shown: <pref>_L_MMGM_LR, <pref>_L_GC_RR. Relation labels: Global Copy, Global Copy, Suspended.]


The procedure for a planned site switch back using the active _MMGM_ session is as follows:

Shut down production on H3

Issue the Suspend when Drained command

Issue RecoverH2 command

IPL production from H2 volumes

After successful IPL, Confirm Production on Site 2

Issue Start H2->H1 H2->H3 command to get back to full HA and DR capability with production active on H2 volumes (B or D volumes)

Following are the replication states after the planned switch back to DC12 after the active _MMGM_ session was started with the wrong H3-H2-H1 relation by mistake

[Figure: replication states after the planned switch back to DC12, after the active _MMGM_ session was started with the wrong H3-H2-H1 relation by mistake. Sessions shown: <pref>_L_MMGM_LR, <pref>_L_GC_RR. Relation labels: Global Copy, Metro Mirror, Global Mirror.]


6 References

1. IBM Copy Services Manager Knowledge Center http://www.ibm.com/support/knowledgecenter/SSESK4/csm_kcwelcome.html

2. Redbook: Tivoli Storage Productivity Center for Replication for Open Systems, SG24-8149-00 http://www.redbooks.ibm.com/abstracts/sg248149.html?Open

3. Redbook: IBM Tivoli Storage Productivity Center for Replication for System z, SG24-7563-00 http://www.redbooks.ibm.com/abstracts/sg247563.html?Open

4. Redpaper: Securing IBM HyperSwap and IBM Tivoli Storage Productivity Center for Replication Communication Using AT-TLS, REDP-5061-00 http://www.redbooks.ibm.com/abstracts/redp5061.html?Open

5. Redbook: IBM DS8870 Copy Services for Open Systems, SG24-6788-07 http://www.redbooks.ibm.com/abstracts/sg246788.html?Open

6. Redbook: IBM DS8870 Copy Services for IBM z Systems, SG24-6787-07 http://www.redbooks.ibm.com/abstracts/sg246787.html?Open