
ibm.com/redbooks

High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

Vasfi Gucer
Satoko Egawa
David Oswald
Geoff Pusey
John Webb
Anthony Yen

Implementing high availability for ITWS and Tivoli Framework

Windows 2000 Cluster Service and HACMP scenarios

Best practices and tips

Front cover


High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

March 2004

International Technical Support Organization

SG24-6632-00


© Copyright International Business Machines Corporation 2004. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

First Edition (March 2004)

This edition applies to IBM Tivoli Workload Scheduler Version 8.2 and IBM Tivoli Management Framework Version 4.1.

Note: Before using this information and the product it supports, read the information in “Notices” on page vii.


Contents

Notices . . . . . . . . . . vii
Trademarks . . . . . . . . . . viii

Preface . . . . . . . . . . ix
The team that wrote this redbook . . . . . . . . . . ix
Become a published author . . . . . . . . . . xi
Comments welcome . . . . . . . . . . xi

Chapter 1. Introduction . . . . . . . . . . 1
1.1 IBM Tivoli Workload Scheduler architectural overview . . . . . . . . . . 2
1.2 IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework . . . . . . . . . . 4
1.3 High availability terminology used in this book . . . . . . . . . . 7
1.4 Overview of clustering technologies . . . . . . . . . . 8

1.4.1 High availability versus fault tolerance . . . . . . . . . . 8
1.4.2 Server versus job availability . . . . . . . . . . 10
1.4.3 Standby versus takeover configurations . . . . . . . . . . 12
1.4.4 IBM HACMP . . . . . . . . . . 16
1.4.5 Microsoft Cluster Service . . . . . . . . . . 21

1.5 When to implement IBM Tivoli Workload Scheduler high availability . . . . . . . . . . 24
1.5.1 High availability solutions versus Backup Domain Manager . . . . . . . . . . 24
1.5.2 Hardware failures to plan for . . . . . . . . . . 26
1.5.3 Summary . . . . . . . . . . 27

1.6 Material covered in this book. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

Chapter 2. High level design and architecture . . . . . . . . . . 31
2.1 Concepts of high availability clusters . . . . . . . . . . 32

2.1.1 A bird's-eye view of high availability clusters . . . . . . . . . . 32
2.1.2 Software considerations . . . . . . . . . . 39
2.1.3 Hardware considerations . . . . . . . . . . 41

2.2 Hardware configurations . . . . . . . . . . 43
2.2.1 Types of hardware cluster . . . . . . . . . . 43
2.2.2 Hot standby system . . . . . . . . . . 46

2.3 Software configurations . . . . . . . . . . 46
2.3.1 Configurations for implementing IBM Tivoli Workload Scheduler in a cluster . . . . . . . . . . 46
2.3.2 Software availability within IBM Tivoli Workload Scheduler . . . . . . . . . . 57
2.3.3 Load balancing software . . . . . . . . . . 59
2.3.4 Job recovery . . . . . . . . . . 60


Chapter 3. High availability cluster implementation . . . . . . . . . . 63
3.1 Our high availability cluster scenarios . . . . . . . . . . 64

3.1.1 Mutual takeover for IBM Tivoli Workload Scheduler . . . . . . . . . . 64
3.1.2 Hot standby for IBM Tivoli Management Framework . . . . . . . . . . 66

3.2 Implementing an HACMP cluster . . . . . . . . . . 67
3.2.1 HACMP hardware considerations . . . . . . . . . . 67
3.2.2 HACMP software considerations . . . . . . . . . . 67
3.2.3 Planning and designing an HACMP cluster . . . . . . . . . . 67
3.2.4 Installing HACMP 5.1 on AIX 5.2 . . . . . . . . . . 92

3.3 Implementing a Microsoft Cluster . . . . . . . . . . 138
3.3.1 Microsoft Cluster hardware considerations . . . . . . . . . . 139
3.3.2 Planning and designing a Microsoft Cluster installation . . . . . . . . . . 139
3.3.3 Microsoft Cluster Service installation . . . . . . . . . . 141

Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster . . . . . . . . . . 183
4.1 Implementing IBM Tivoli Workload Scheduler in an HACMP cluster . . . . . . . . . . 184

4.1.1 IBM Tivoli Workload Scheduler implementation overview . . . . . . . . . . 184
4.1.2 Preparing to install . . . . . . . . . . 188
4.1.3 Installing the IBM Tivoli Workload Scheduler engine . . . . . . . . . . 191
4.1.4 Configuring the IBM Tivoli Workload Scheduler engine . . . . . . . . . . 192
4.1.5 Installing IBM Tivoli Workload Scheduler Connector . . . . . . . . . . 194
4.1.6 Setting the security . . . . . . . . . . 198
4.1.7 Add additional IBM Tivoli Workload Scheduler Connector instance . . . . . . . . . . 201
4.1.8 Verify IBM Tivoli Workload Scheduler behavior in HACMP cluster . . . . . . . . . . 202
4.1.9 Applying IBM Tivoli Workload Scheduler fix pack . . . . . . . . . . 204
4.1.10 Configure HACMP for IBM Tivoli Workload Scheduler . . . . . . . . . . 210
4.1.11 Add IBM Tivoli Management Framework . . . . . . . . . . 303
4.1.12 Production considerations . . . . . . . . . . 340
4.1.13 Just one IBM Tivoli Workload Scheduler instance . . . . . . . . . . 345

4.2 Implementing IBM Tivoli Workload Scheduler in a Microsoft Cluster . . . . . . . . . . 347
4.2.1 Single instance of IBM Tivoli Workload Scheduler . . . . . . . . . . 347
4.2.2 Configuring the cluster group . . . . . . . . . . 379
4.2.3 Two instances of IBM Tivoli Workload Scheduler in a cluster . . . . . . . . . . 383
4.2.4 Installation of the IBM Tivoli Management Framework . . . . . . . . . . 396
4.2.5 Installation of Job Scheduling Services . . . . . . . . . . 401
4.2.6 Installation of Job Scheduling Connector . . . . . . . . . . 402
4.2.7 Creating Connector instances . . . . . . . . . . 405
4.2.8 Interconnecting the two Tivoli Framework Servers . . . . . . . . . . 405
4.2.9 Installing the Job Scheduling Console . . . . . . . . . . 408
4.2.10 Scheduled outage configuration . . . . . . . . . . 410

Chapter 5. Implement IBM Tivoli Management Framework in a cluster . . . . . . . . . . 415
5.1 Implement IBM Tivoli Management Framework in an HACMP cluster . . . . . . . . . . 416


5.1.1 Inventory hardware . . . . . . . . . . 417
5.1.2 Planning the high availability design . . . . . . . . . . 418
5.1.3 Create the shared disk volume . . . . . . . . . . 420
5.1.4 Install IBM Tivoli Management Framework . . . . . . . . . . 453
5.1.5 Tivoli Web interfaces . . . . . . . . . . 464
5.1.6 Tivoli Managed Node . . . . . . . . . . 464
5.1.7 Tivoli Endpoints . . . . . . . . . . 466
5.1.8 Configure HACMP . . . . . . . . . . 480

5.2 Implementing Tivoli Framework in a Microsoft Cluster . . . . . . . . . . 503
5.2.1 TMR server . . . . . . . . . . 503
5.2.2 Tivoli Managed Node . . . . . . . . . . 536
5.2.3 Tivoli Endpoints . . . . . . . . . . 555

Appendix A. A real-life implementation . . . . . . . . . . 571
Rationale for IBM Tivoli Workload Scheduler and HACMP integration . . . . . . . . . . 572
Our environment . . . . . . . . . . 572
Installation roadmap . . . . . . . . . . 573
Software configuration . . . . . . . . . . 574
Hardware configuration . . . . . . . . . . 575
Installing the AIX operating system . . . . . . . . . . 576
Finishing the network configuration . . . . . . . . . . 577
Creating the TTY device within AIX . . . . . . . . . . 577
Testing the heartbeat interface . . . . . . . . . . 578
Configuring shared disk storage devices . . . . . . . . . . 579
Copying installation code to shared storage . . . . . . . . . . 580
Creating user accounts . . . . . . . . . . 581
Creating group accounts . . . . . . . . . . 581
Installing IBM Tivoli Workload Scheduler software . . . . . . . . . . 581
Installing HACMP software . . . . . . . . . . 582
Installing the Tivoli TMR software . . . . . . . . . . 583

Patching the Tivoli TMR software . . . . . . . . . . 583
TMR versus Managed Node installation . . . . . . . . . . 583

Configuring IBM Tivoli Workload Scheduler start and stop scripts . . . . . . . . . . 584
Configuring miscellaneous start and stop scripts . . . . . . . . . . 584
Creating and modifying various system files . . . . . . . . . . 585
Configuring the HACMP environment . . . . . . . . . . 585
Testing the failover procedure . . . . . . . . . . 585

HACMP Cluster topology . . . . . . . . . . 586
HACMP Cluster Resource Group topology . . . . . . . . . . 588
ifconfig -a . . . . . . . . . . 589

Skills required to implement IBM Tivoli Workload Scheduling/HACMP . . . . . . . . . . 590
Observations and questions . . . . . . . . . . 594


Appendix B. TMR clustering for Tivoli Framework 3.7b on MSCS . . . . . . . . . . 601
Setup . . . . . . . . . . 602

Configure the wlocalhost . . . . . . . . . . 602
Install Framework on the primary node . . . . . . . . . . 602
Install Framework on the secondary node . . . . . . . . . . 603

Configure the TMR . . . . . . . . . . 603
Set the root administrators login . . . . . . . . . . 603
Force the oserv to bind to the virtual IP . . . . . . . . . . 603
Change the name of the DBDIR . . . . . . . . . . 604
Modify the setup_env.cmd and setup_env.sh . . . . . . . . . . 604
Configure the registry . . . . . . . . . . 604
Rename the Managed Node . . . . . . . . . . 604
Rename the TMR . . . . . . . . . . 605
Rename the top-level policy region . . . . . . . . . . 605
Rename the root administrator . . . . . . . . . . 605
Configure the ALIDB . . . . . . . . . . 606

Create the cluster resources . . . . . . . . . . 606
Create the oserv cluster resource . . . . . . . . . . 606
Create the trip cluster resource . . . . . . . . . . 606
Set up the resource dependencies . . . . . . . . . . 607

Validate and backup . . . . . . . . . . 607
Test failover . . . . . . . . . . 607
Back up the Tivoli databases . . . . . . . . . . 607

Abbreviations and acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609

Related publications . . . . . . . . . . 611
IBM Redbooks . . . . . . . . . . 611
Other publications . . . . . . . . . . 611
Online resources . . . . . . . . . . 612
How to get IBM Redbooks . . . . . . . . . . 613

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615


Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.


TrademarksThe following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

AFS®, AIX®, Balance®, DB2®, DFS™, Enterprise Storage Server®, IBM®, LoadLeveler®, Maestro™, NetView®, Planet Tivoli®, PowerPC®, pSeries®, Redbooks™, Redbooks (logo)™, RS/6000®, SAA®, Tivoli Enterprise™, Tivoli®, TotalStorage®, WebSphere®, eServer™, z/OS®

The following terms are trademarks of other companies:

Intel, Intel Inside (logos), and Pentium are trademarks of Intel Corporation in the United States, other countries, or both.

Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Other company, product, and service names may be trademarks or service marks of others.


Preface

This IBM® Redbook is intended to be used as a major reference for designing and creating highly available IBM Tivoli® Workload Scheduler and Tivoli Framework environments. IBM Tivoli Workload Scheduler Version 8.2 is the IBM strategic scheduling product that runs on many different platforms, including the mainframe. Here, we describe how to install ITWS Version 8.2 in a high availability (HA) environment and configure it to meet high availability requirements. The focus is on the IBM Tivoli Workload Scheduler Version 8.2 Distributed product, although some issues specific to Version 8.1 and IBM Tivoli Workload Scheduler for z/OS® are also briefly covered.

When implementing a highly available IBM Tivoli Workload Scheduler environment, you have to consider high availability for both IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework environments, because IBM Tivoli Workload Scheduler uses IBM Tivoli Management Framework's services for authentication. Therefore, we discuss techniques you can use to successfully implement IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework (TMR server, Managed Nodes and Endpoints), and we present two major case studies: High-Availability Cluster Multiprocessing (HACMP) for AIX®, and Microsoft® Windows® Cluster Service.

The implementation of IBM Tivoli Workload Scheduler within a high availability environment will vary from platform to platform and from customer to customer, based on the needs of the installation. Here, we cover the most common scenarios and share practical implementation tips. We also make recommendations for other high availability platforms; although there are many different clustering technologies in the market today, they are similar enough to allow us to offer useful advice regarding the implementation of a highly available scheduling system.

Finally, although we basically cover highly available scheduling systems, we also offer a section for customers who want to implement a highly available IBM Tivoli Management Framework environment, but who are not currently using IBM Tivoli Workload Scheduler.

The team that wrote this redbook
This redbook was produced by a team of specialists from around the world working at the International Technical Support Organization, Austin Center.


Vasfi Gucer is an IBM Certified Consultant IT Specialist at the ITSO Austin Center. He has been with IBM Turkey for 10 years, and has worked at the ITSO since January 1999. He has more than 10 years of experience in systems management, networking hardware, and distributed platform software. He has worked on various Tivoli customer projects as a Systems Architect and Consultant in Turkey and in the United States, and is also a Certified Tivoli Consultant.

Satoko Egawa is an I/T Specialist with IBM Japan. She has five years of experience in systems management solutions. Her area of expertise is job scheduling solutions using Tivoli Workload Scheduler. She is also a Tivoli Certified Consultant, and in the past has worked closely with the Tivoli Rome Lab.

David Oswald is a Certified IBM Tivoli Services Specialist in New Jersey, United States, who works on IBM Tivoli Workload Scheduling and Tivoli storage architectures/deployments (TSRM, TSM, TSANM) for IBM customers located in the United States, Europe, and Latin America. He has been involved in disaster recovery, UNIX administration, shell scripting and automation for 17 years, and has worked with TWS Versions 5.x, 6.x, 7.x, and 8.x. While primarily a Tivoli services consultant, he is also involved in Tivoli course development, Tivoli certification exams, and Tivoli training efforts.

Geoff Pusey is a Senior I/T Specialist in the IBM Tivoli Services EMEA region. He is a Certified IBM Tivoli Workload Scheduler Consultant and has been with Tivoli/IBM since January 1998, when Unison Software was acquired by Tivoli Systems. He has worked with the IBM Tivoli Workload Scheduling product for the last 10 years as a consultant, performing customer training, implementing and customizing IBM Tivoli Workload Scheduler, creating customized scripts to generate specific reports, and enhancing IBM Tivoli Workload Scheduler with new functions.

John Webb is a Senior Consultant for Tivoli Services Latin America. He has been with IBM since 1998. Since joining IBM, John has made valuable contributions to the company through his knowledge and expertise in enterprise systems management. He has deployed and designed systems for numerous customers, and his areas of expertise include the Tivoli Framework and Tivoli PACO products.

Anthony Yen is a Senior IT Consultant with IBM Business Partner Automatic IT Corporation, <http://www.AutomaticIT.com>, in Austin, Texas, United States. He has delivered 19 projects involving 11 different IBM Tivoli products over the past six years. His areas of expertise include Enterprise Console, Monitoring, Workload Scheduler, Configuration Manager, Remote Control, and NetView®. He has given talks at Planet Tivoli® and Automated Systems And Planning OPC and TWS Users Conference (ASAP), and has taught courses on IBM Tivoli Workload Scheduler. Before that, he worked in the IT industry for 10 years as a UNIX and Windows system administrator. He has been an IBM Certified Tivoli Consultant since 1998.

Thanks to the following people for their contributions to this project:

Octavian Lascu, Dino Quintero
International Technical Support Organization, Poughkeepsie Center

Jackie Biggs, Warren Gill, Elaine Krakower, Tina Lamacchia, Grant McLaughlin, Nick Lopez
IBM USA

Antonio Gallotti
IBM Italy

Become a published author
Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You'll team with IBM technical professionals, Business Partners and/or customers.

Your efforts will help increase product acceptance and customer satisfaction. As a bonus, you'll develop a network of contacts in IBM development labs, and increase your productivity and marketability.

Find out more about the residency program, browse the residency index, and apply online at:

ibm.com/Redbooks/residencies.html

Comments welcome
Your comments are important to us!

We want our Redbooks™ to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:

- Use the online Contact us review Redbook form found at:

ibm.com/Redbooks

- Send your comments in an Internet note to:

[email protected]


- Mail your comments to:

IBM Corporation, International Technical Support Organization
Dept. JN9B Building 003 Internal Zip 2834
11400 Burnet Road
Austin, Texas 78758-3493


Chapter 1. Introduction

In this chapter, we introduce the IBM Tivoli Workload Scheduler suite and identify the need for high availability by IBM Tivoli Workload Scheduler users. Important ancillary concepts in IBM Tivoli Management Framework (also referred to as Tivoli Framework, or TMF) and clustering technologies are introduced for new users as well.

The following topics are covered in this chapter:

- “IBM Tivoli Workload Scheduler architectural overview” on page 2
- “IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework” on page 4
- “High availability terminology used in this book” on page 7
- “Overview of clustering technologies” on page 8
- “When to implement IBM Tivoli Workload Scheduler high availability” on page 24
- “Material covered in this book” on page 27


1.1 IBM Tivoli Workload Scheduler architectural overview

IBM Tivoli Workload Scheduler Version 8.2 is the IBM strategic scheduling product that runs on many different platforms, including the mainframe. This redbook covers installing ITWS Version 8.2 in a high availability (HA) environment and configuring it to meet high availability requirements. The focus is on the IBM Tivoli Workload Scheduler Version 8.2 Distributed product, although some issues specific to Version 8.1 and IBM Tivoli Workload Scheduler for z/OS are also briefly covered.

Understanding specific aspects of IBM Tivoli Workload Scheduler’s architecture is key to a successful high availability implementation. In-depth knowledge of the architecture is necessary for resolving some problems that might present themselves during the deployment of IBM Tivoli Workload Scheduler in an HA environment. We will only identify those aspects of the architecture that are directly involved in a high availability deployment. For a detailed discussion of IBM Tivoli Workload Scheduler’s architecture, refer to Chapter 2, “Overview”, in IBM Tivoli Workload Scheduling Suite Version 8.2, General Information, SC32-1256.

IBM Tivoli Workload Scheduler uses the TCP/IP-based network connecting an enterprise’s servers to accomplish its mission of scheduling jobs. A job is an executable file, program, or command that is scheduled and launched by IBM Tivoli Workload Scheduler. All servers that run jobs using IBM Tivoli Workload Scheduler make up the scheduling network.

A scheduling network contains at least one domain, the master domain, in which a server designated as the Master Domain Manager (MDM) is the management hub. This server contains the definitions of all scheduling objects that define the batch schedule, stored in a database. Additional domains can be used to divide a widely distributed network into smaller, locally managed groups. The management hubs for these additional domains are called Domain Manager servers.

Each server in the scheduling network is called a workstation, or by the interchangeable term CPU. There are different types of workstations that serve different roles. For the purposes of this publication, it is sufficient to understand that a workstation can be one of the following types. You have already been introduced to one of them, the Master Domain Manager. The other types of workstations are Domain Manager (DM) and Fault Tolerant Agent (FTA).
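
To make these roles concrete, the following sketch shows how a Fault Tolerant Agent might be defined to the Master Domain Manager from the command line. This example is ours, not part of the original scenarios: the workstation name, node name, domain, and TCP port are placeholders, and the syntax should be checked against the IBM Tivoli Workload Scheduler 8.2 reference manual before use.

   # Hypothetical workstation definition for a Fault Tolerant Agent; all names,
   # the domain, and the TCP port are examples only.
   cat > /tmp/fta1_cpu.txt <<'EOF'
   CPUNAME FTA1
    DESCRIPTION "Fault Tolerant Agent in DomainA"
    OS UNIX
    NODE fta1.example.com TCPADDR 31111
    DOMAIN DOMAINA
    FOR MAESTRO
     TYPE FTA
     AUTOLINK ON
     FULLSTATUS OFF
    END
   EOF
   # Add the definition to the database, running as the TWS user on the Master Domain Manager
   composer add /tmp/fta1_cpu.txt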

Figure 1-1 on page 3 shows the relationship between these architectural elements in a sample scheduling network.


Figure 1-1 Main architectural elements of IBM Tivoli Workload Scheduler relevant to high availability

The lines between the workstations show how IBM Tivoli Workload Scheduler communicates between them. For example, if the MDM needs to send a command to FTA2, it would pass the command via DM_A. In this example scheduling network, the Master Domain Manager is the management hub for two Domain Managers, DM_A and DM_B. Each Domain Manager in turn is the management hub for two Fault Tolerant Agents. DM_A is the hub for FTA1 and FTA2, and DM_B is the hub for FTA3 and FTA4.

IBM Tivoli Workload Scheduler operations revolve around a production day, a 24-hour cycle initiated by a job called Jnextday that runs on the Master Domain Manager. Interrupting or delaying this process has serious ramifications for the proper functioning of the scheduling network.

Based upon this architecture, we determined that making IBM Tivoli Workload Scheduler highly available requires configuring at least the Master Domain Manager server for high availability. This delivers high availability of the scheduling object definitions. In some sites, even the Domain Manager and Fault Tolerant Agent servers are configured for high availability, depending upon specific business requirements.
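
As a simple illustration (ours, not taken from the scenarios in this book), after a fallover an operator might confirm that the Master Domain Manager's production processes and the FINAL job stream that runs Jnextday are healthy. The user name, paths, and workstation name below are assumptions:

   # Confirm that batchman is running and check the FINAL job stream (Jnextday).
   # The TWS user (tws82), home directory, and workstation name are examples only.
   su - tws82 -c ". /tws/tws82/.profile; conman status"
   su - tws82 -c ". /tws/tws82/.profile; conman 'sj MASTER#FINAL'"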

1.2 IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework

IBM Tivoli Workload Scheduler provides out-of-the-box integration with up to six other IBM products:

- IBM Tivoli Management Framework
- IBM Tivoli Business Systems Manager
- IBM Tivoli Enterprise Console
- IBM Tivoli NetView
- IBM Tivoli Distributed Monitoring (Classic Edition)
- IBM Tivoli Enterprise Data Warehouse

Other IBM Tivoli products, such as IBM Tivoli Configuration Manager, can also be integrated with IBM Tivoli Workload Scheduler but require further configuration not provided out of the box. Best practices call for implementing IBM Tivoli Management Framework on the same Master Domain Manager server used by IBM Tivoli Workload Scheduler.

Figure 1-2 on page 5 shows a typical configuration of all six products, hosted on five servers (IBM Tivoli Business Systems Manager is often hosted on two separate servers).


Figure 1-2 Typical site configuration of all Tivoli products that can be integrated with IBM Tivoli Workload Scheduler out of the box

In this redbook, we show how to configure IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework for high availability, corresponding to the upper left server in the preceding example site configuration. Sites that want to implement other products on an IBM Tivoli Workload Scheduler Master Domain Manager server for high availability should consult their IBM service provider.

IBM Tivoli Workload Scheduler uses IBM Tivoli Management Framework to deliver authentication services for the Job Scheduling Console GUI client, and to communicate with the Job Scheduling Console in general. Two components are used within IBM Tivoli Management Framework to accomplish these responsibilities: the Connector, and Job Scheduling Services (JSS). These components are only required on the Master Domain Manager server.

For the purposes of this redbook, be aware that high availability of IBM Tivoli Workload Scheduler requires proper configuration of IBM Tivoli Management Framework, all Connector instances, and the Job Scheduling Services component.
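
As an informal illustration of what this means in practice, the commands below show how one might confirm from the Tivoli Management Framework side that the Connector instances are registered, and create an additional instance. The resource name and the wtwsconn.sh options are quoted from memory and should be treated as assumptions to verify against the product documentation; the instance name and path are placeholders.

   # List Connector engine resources registered in the TMR (resource name as we recall it)
   wlookup -ar MaestroEngine
   # Create an additional Connector instance for a second scheduling network
   # (option syntax is an assumption; the instance name and TWS home are examples)
   wtwsconn.sh -create -n Production_B -t /tws/tws82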

Figure 1-3 on page 6 shows the relationships between IBM Tivoli Management Framework, the Job Scheduling Services component, the IBM Tivoli Workload Scheduler job scheduling engine, and the Job Scheduling Console.


Figure 1-3 Relationship between major components of IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework

In this example, Job Scheduling Console instances on three laptops are connected to a single instance of IBM Tivoli Management Framework. This instance of IBM Tivoli Management Framework serves two different scheduling networks called Production_A and Production_B via two Connectors called Connector_A and Connector_B. Note that there is only ever one instance of the Job Scheduling Services component, no matter how many instances of the Connector and Job Scheduling Console exist in the environment.

It is possible to install IBM Tivoli Workload Scheduler without using the Connector and Job Scheduling Services components. However, without these components the benefits of the Job Scheduling Console cannot be realized. This is only an option if a customer is willing to perform all operations from just the command line interface.

In practice, IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework are typically deployed together in a high availability environment. In this redbook, we show how to deploy IBM Tivoli Workload Scheduler both with and without IBM Tivoli Management Framework.

1.3 High availability terminology used in this book
It helps to share a common terminology for concepts used in this redbook. The high availability field often uses multiple terms for the same concept, but in this redbook, we adhere to conventions set by International Business Machines Corporation whenever possible.

Cluster This refers to a group of servers configured for high availability of one or more applications.

Node This refers to a single server in a cluster.

Primary This refers to a node that initially runs an application when a cluster is started.

Backup This refers to one or more nodes that are designated as the servers an application will be migrated to if the application’s primary node fails.

Joining This refers to the process of a node announcing its availability to the cluster.

Fallover This refers to the process of a backup node taking over an application from a failed primary node.

Reintegration This refers to the process of a failed primary node that was repaired rejoining a cluster. Note that the primary node’s application does not necessarily have to migrate back to the primary node. See fallback.

Fallback This refers to the process of migrating an application from a backup node to a primary node. Note that the primary node does not have to be the original primary node (for example, it can be a new node that joins the cluster).


For more terms commonly used when configuring high availability, refer to High Availability Cluster Multi-Processing for AIX Master Glossary, Version 5.1, SC23-4867.

1.4 Overview of clustering technologies
In this section we give an overview of clustering technologies with respect to high availability. A cluster is a group of loosely coupled machines networked together, sharing disk resources. While clusters can be used for more than just their high availability benefits (like cluster multi-processing), in this document we are only concerned with illustrating the high availability benefits; consult your IBM service provider for information about how to take advantage of the other benefits of clusters for IBM Tivoli Workload Scheduler.

Clusters provide a highly available environment for mission-critical applications. For example, a cluster could run a database server program which services client applications on other systems. Clients send queries to the server program, which responds to their requests by accessing a database stored on a shared external disk. A cluster takes measures to ensure that the applications remain available to client processes even if a component in a cluster fails. To ensure availability, in case of a component failure, a cluster moves the application (along with resources that ensure access to the application) to another node in the cluster.

1.4.1 High availability versus fault tolerance
It is important for you to understand that we are detailing how to install IBM Tivoli Workload Scheduler in a highly available, but not a fault-tolerant, configuration.

Fault tolerance relies on specialized hardware to detect a hardware fault and instantaneously switch to a redundant hardware component (whether the failed component is a processor, memory board, power supply, I/O subsystem, or storage subsystem). Although this cut-over is apparently seamless and offers non-stop service, a high premium is paid in both hardware cost and performance because the redundant components do no processing. More importantly, the fault-tolerant model does not address software failures, by far the most common reason for downtime.

High availability views availability not as a series of replicated physical components, but rather as a set of system-wide, shared resources that cooperate to guarantee essential services. High availability combines software with industry-standard hardware to minimize downtime by quickly restoring essential services when a system, component, or application fails. While not instantaneous, services are restored rapidly, often in less than a minute.


The difference between fault tolerance and high availability, then, is this: a fault-tolerant environment has no service interruption, while a highly available environment has a minimal service interruption. Many sites are willing to absorb a small amount of downtime with high availability rather than pay the much higher cost of providing fault tolerance. Additionally, in most highly available configurations, the backup processors are available for use during normal operation.

High availability systems are an excellent solution for applications that can withstand a short interruption should a failure occur, but which must be restored quickly. Some industries have applications so time-critical that they cannot withstand even a few seconds of downtime. Many other industries, however, can withstand small periods of time when their database is unavailable. For those industries, HACMP can provide the necessary continuity of service without total redundancy.

Figure 1-4 shows the costs and benefits of availability technologies.

Figure 1-4 Cost and benefits of availability technologies

As you can see, availability is not an all-or-nothing proposition. Think of availability as a continuum. Reliable hardware and software provide the base level of availability. Advanced features such as RAID devices provide an enhanced level of availability. High availability software provides near-continuous access to data and applications. Fault-tolerant systems ensure the constant availability of the entire system, but at a higher cost.

1.4.2 Server versus job availability
You should also be aware of the difference between availability of the server and availability of the jobs the server runs. This redbook shows how to implement a highly available server. Ensuring the availability of the jobs is addressed on a job-by-job basis.

For example, Figure 1-5 shows a production day with four job streams, labeled A, B, C and D. In this example, a failure incident occurs between job streams B and D, during a period of the production day when no other job streams are running.

Figure 1-5 Example disaster recovery incident where no job recovery is required

Because no jobs or job streams are running at the moment of the failure, making IBM Tivoli Workload Scheduler itself highly available is sufficient to bring back scheduling services. No recovery of interrupted jobs is required.

Now suppose that job streams B and D must complete before a database change is committed. If the failure happened during job stream D as in Figure 1-6 on page 11, then before IBM Tivoli Workload Scheduler is restarted on a new server, the database needs to be rolled back so that when job stream B is restarted, it will not corrupt the database.


Figure 1-6 Example disaster recovery incident where job recovery not related to IBM Tivoli Workload Scheduler is required

This points out some important observations about high availability with IBM Tivoli Workload Scheduler.

- It is your responsibility to ensure that the application-specific business logic of your application is preserved across a disaster incident.

For example, IBM Tivoli Workload Scheduler cannot know that a database needs to be rolled back before a job stream is restarted as part of a high availability recovery.

- Knowing what job streams and jobs to restart after IBM Tivoli Workload Scheduler falls over to a backup server is dependent upon the specific business logic of your production plan.

In fact, it is critical to the success of a recovery effort that the precise state of the production day at the moment of failure is communicated to the team performing the recovery.
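
One simple way to capture that state, sketched here as an example of our own (the user name, paths, and file names are placeholders), is to snapshot the plan with conman immediately after the fallover so the recovery team can see which job streams and jobs were running:

   #!/bin/ksh
   # Hypothetical post-fallover snapshot of the plan for the recovery team.
   TWS_HOME=/tws/tws82
   STAMP=$(date +%Y%m%d%H%M)
   su - tws82 -c ". $TWS_HOME/.profile; conman 'ss @#@'"   > /tmp/schedules.$STAMP
   su - tws82 -c ". $TWS_HOME/.profile; conman 'sj @#@.@'" > /tmp/jobs.$STAMP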

Let’s look at Figure 1-7 on page 12, which illustrates an even more complex situation: multiple job streams are interrupted, each requiring its own, separate recovery activity.


Figure 1-7 Example disaster recovery incident requiring multiple, different job recovery actions

The recovery actions for job stream A in this example are different from the recovery actions for job stream B. In fact, depending upon the specifics of what your jobs and job streams run, the recovery actions required for a job stream after a disaster incident could differ depending upon which jobs in the job stream finished before the failure.

The scenario to which this redbook is most directly applicable is restarting an IBM Tivoli Workload Scheduler Master Domain Manager server on a highly available cluster where no job streams other than FINAL are executed. The contents of this redbook can also be applied to Master Domain Manager, Domain Manager, and Fault Tolerant Agent servers that run job streams requiring specific recovery actions as part of a high availability recovery. But implementing these scenarios requires simultaneous implementation of high availability for the individual jobs. The exact details of such implementations are specific to your jobs, and cannot be generalized in a “cookbook” manner.

If high availability at the job level is an important criterion, your IBM service provider can help you implement it.

1.4.3 Standby versus takeover configurations
There are two basic types of cluster configurations:

Standby This is the traditional redundant hardware configuration. One or more standby nodes are set aside idling, waiting for a primary server in the cluster to fail. This is also known as hot standby.


Takeover In this configuration, all cluster nodes process part of the cluster’s workload. No nodes are set aside as standby nodes. When a primary node fails, one of the other nodes assumes the workload of the failed node in addition to its existing primary workload. This is also known as mutual takeover.

Typically, implementations of both configurations will involve shared resources. Disks or mass storage like a Storage Area Network (SAN) are most frequently configured as a shared resource.

Figure 1-8 shows a standby configuration in normal operation, where Node A is the primary node and Node B is the standby node, currently idling. While Node B has a connection to the shared mass storage resource, it is not active during normal operation.

Figure 1-8 Standby configuration in normal operation

After Node A falls over to Node B, the connection to the mass storage resource from Node B will be activated, and because Node A is unavailable, its connection to the mass storage resource is inactive. This is shown in Figure 1-9 on page 14.


Figure 1-9 Standby configuration in fallover operation

By contrast, in a takeover configuration of this environment both nodes access the shared disk resource at the same time. For IBM Tivoli Workload Scheduler high availability configurations, this usually means that the shared disk resource has separate, logical filesystem volumes, each accessed by a different node. This is illustrated by Figure 1-10 on page 15.


Figure 1-10 Takeover configuration in normal operation

During normal operation of this two-node highly available cluster in a takeover configuration, the filesystem Node A FS is accessed by App 1 on Node A, while the filesystem Node B FS is accessed by App 2 on Node B. If either node fails, the other node will take on the workload of the failed node. For example, if Node A fails, App 1 is restarted on Node B, and Node B opens a connection to filesystem Node A FS. This fallover scenario is illustrated by Figure 1-11 on page 16.


Figure 1-11 Takeover configuration in fallover operation

Takeover configurations are more efficient with hardware resources than standby configurations because there are no idle nodes. Performance can degrade after a node failure, however, because the overall load on the remaining nodes increases.

In this redbook we will be showing how to configure IBM Tivoli Workload Scheduler for takeover high availability.
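
To give a flavor of what this involves, here is a sketch of the kind of application start script a cluster manager such as HACMP might call when the resource group that holds an IBM Tivoli Workload Scheduler instance comes online. It is an assumption-laden outline, not one of the scripts developed later in this book: the user name, home directory, and shared-filesystem layout are placeholders.

   #!/bin/ksh
   # Illustrative start script for an ITWS instance kept on a shared volume group.
   TWS_USER=tws82
   TWS_HOME=/tws/tws82          # filesystem that moves with the resource group

   # Start netman, then bring up the production processes and relink the network.
   su - $TWS_USER -c "$TWS_HOME/StartUp"
   su - $TWS_USER -c ". $TWS_HOME/.profile; conman 'start;noask'"
   su - $TWS_USER -c ". $TWS_HOME/.profile; conman 'link @!@;noask'"
   exit 0

A matching stop script would typically unlink the network, stop batchman, and shut down netman before the cluster manager releases the filesystem.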

1.4.4 IBM HACMP
The IBM tool for building UNIX-based, mission-critical computing platforms is the HACMP software. The HACMP software ensures that critical resources, such as applications, are available for processing. HACMP has two major components: high availability (HA) and cluster multi-processing (CMP). In this document we focus upon the HA component.

The primary reason to create HACMP Clusters is to provide a highly available environment for mission-critical applications. For example, an HACMP Cluster could run a database server program that services client applications. The clients send queries to the server program, which responds to their requests by accessing a database stored on a shared external disk.


In an HACMP Cluster, to ensure the availability of these applications, the applications are put under HACMP control. HACMP takes measures to ensure that the applications remain available to client processes even if a component in a cluster fails. To ensure availability, in case of a component failure, HACMP moves the application (along with resources that ensure access to the application) to another node in the cluster.

Benefits
HACMP helps you with each of the following:

- The HACMP planning process and documentation include tips and advice on the best practices for installing and maintaining a highly available HACMP Cluster.
- Once the cluster is operational, HACMP provides the automated monitoring and recovery for all the resources on which the application depends.
- HACMP provides a full set of tools for maintaining the cluster, while keeping the application available to clients.

HACMP lets you:

- Set up an HACMP environment using online planning worksheets to simplify initial planning and setup.
- Ensure high availability of applications by eliminating single points of failure in an HACMP environment.
- Leverage high availability features available in AIX.
- Manage how a cluster handles component failures.
- Secure cluster communications.
- Set up fast disk takeover for volume groups managed by the Logical Volume Manager (LVM).
- Manage event processing for an HACMP environment.
- Monitor HACMP components and diagnose problems that may occur.

For a general overview of all HACMP features, see the IBM Web site:

http://www-1.ibm.com/servers/aix/products/ibmsw/high_avail_network/hacmp.html

Enhancing availability with the AIX software
HACMP takes advantage of the features in AIX, which is the high-performance UNIX operating system.

AIX Version 5.1 adds new functionality to further improve security and system availability. This includes improved availability of mirrored data and enhancements to Workload Manager that help solve problems of mixed workloads by dynamically providing resource availability to critical applications. Used with the IBM eServer™ pSeries®, HACMP can provide both horizontal and vertical scalability, without downtime.

The AIX operating system provides numerous features designed to increase system availability by lessening the impact of both planned (data backup, system administration) and unplanned (hardware or software failure) downtime. These features include:

- Journaled File System and Enhanced Journaled File System

- Disk mirroring

- Process control

- Error notification

The IBM HACMP software provides a low-cost commercial computing environment that ensures that mission-critical applications can recover quickly from hardware and software failures. The HACMP software is a high availability system that ensures that critical resources are available for processing. High availability combines custom software with industry-standard hardware to minimize downtime by quickly restoring services when a system, component, or application fails. While not instantaneous, the restoration of service is rapid, usually 30 to 300 seconds.

Physical components of an HACMP Cluster

HACMP provides a highly available environment by identifying a set of resources essential to uninterrupted processing, and by defining a protocol that nodes use to collaborate to ensure that these resources are available. HACMP extends the clustering model by defining relationships among cooperating processors, where one processor provides the service offered by a peer should the peer be unable to do so.

An HACMP Cluster is made up of the following physical components:

- Nodes

- Shared external disk devices

- Networks

- Network interfaces

- Clients

The HACMP software allows you to combine physical components into a wide range of cluster configurations, providing you with flexibility in building a cluster that meets your processing requirements. Figure 1-12 on page 19 shows one example of an HACMP Cluster. Other HACMP Clusters could look very different, depending on the number of processors, the choice of networking and disk technologies, and so on.

Figure 1-12 Example HACMP Cluster

Nodes

Nodes form the core of an HACMP Cluster. A node is a processor that runs both AIX and the HACMP software. The HACMP software supports pSeries uniprocessor and symmetric multiprocessor (SMP) systems, and Scalable POWERparallel (SP) systems, as cluster nodes. To the HACMP software, an SMP system looks just like a uniprocessor. SMP systems provide a cost-effective way to increase cluster throughput. Each node in the cluster can be a large SMP machine, extending an HACMP Cluster far beyond the limits of a single system and allowing thousands of clients to connect to a single database.


In an HACMP Cluster, up to 32 RS/6000® or pSeries stand-alone systems, pSeries systems divided into LPARs, SP nodes, or a combination of these cooperate to provide a set of services or resources to other entities. Clustering these servers to back up critical applications is a cost-effective high availability option. A business can use more of its computing power, while ensuring that its critical applications resume running after a short interruption caused by a hardware or software failure.

In an HACMP Cluster, each node is identified by a unique name. A node may own a set of resources (disks, volume groups, filesystems, networks, network addresses, and applications). Typically, a node runs a server or a “back-end” application that accesses data on the shared external disks.

The HACMP software supports from 2 to 32 nodes in a cluster, depending on the disk technology used for the shared external disks. A node in an HACMP Cluster has several layers of software components.

Shared external disk devices

Each node must have access to one or more shared external disk devices. A shared external disk device is a disk physically connected to multiple nodes. The shared disk stores mission-critical data, typically mirrored or RAID-configured for data redundancy. A node in an HACMP Cluster must also have internal disks that store the operating system and application binaries, but these disks are not shared.

Depending on the type of disk used, the HACMP software supports two types of access to shared external disk devices: non-concurrent access, and concurrent access.

- In non-concurrent access environments, only one connection is active at any given time, and the node with the active connection owns the disk. When a node fails, disk takeover occurs when the node that currently owns the disk leaves the cluster and a surviving node assumes ownership of the shared disk. This is what we show in this redbook. (A brief illustration of what disk takeover looks like at the operating system level follows this list.)

- In concurrent access environments, the shared disks are actively connected to more than one node simultaneously. Therefore, when a node fails, disk takeover is not required. We do not show this here because concurrent access does not support the use of the Journaled File System (JFS), and JFS is required to use either IBM Tivoli Workload Scheduler or IBM Tivoli Management Framework.
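As a rough illustration of what non-concurrent disk takeover amounts to at the operating system level, the commands below show a surviving AIX node acquiring a shared volume group and mounting its journaled filesystem by hand. The volume group, logical volume, and mount point names are assumptions, and in a real cluster HACMP performs the equivalent steps automatically.

   #!/usr/bin/ksh
   # Illustrative manual equivalent of a non-concurrent disk takeover.
   varyonvg tws_vg            # acquire the shared volume group (name assumed)
   fsck -y /dev/tws_lv        # check the journaled filesystem after the failure
   mount /tws                 # mount the shared filesystem at its usual point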

Networks

As an independent, layered component of AIX, the HACMP software is designed to work with any TCP/IP-based network. Nodes in an HACMP Cluster use the network to allow clients to access the cluster nodes, enable cluster nodes to exchange heartbeat messages and, in concurrent access environments, serialize access to data. The HACMP software has been tested with Ethernet, Token-Ring, ATM, and other networks.

The HACMP software defines two types of communication networks, characterized by whether these networks use communication interfaces based on the TCP/IP subsystem (TCP/IP-based), or communication devices based on non-TCP/IP subsystems (device-based).

Clients

A client is a processor that can access the nodes in a cluster over a local area network. Clients each run a front-end or client application that queries the server application running on the cluster node.

The HACMP software provides a highly available environment for critical data and applications on cluster nodes. Note that the HACMP software does not make the clients themselves highly available. AIX clients can use the Client Information (Clinfo) services to receive notice of cluster events. Clinfo provides an API that displays cluster status information. The /usr/es/sbin/cluster/clstat utility, a Clinfo client shipped with the HACMP software, provides information about all cluster service interfaces.
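For example, an administrator logged on to any cluster node can check the state of the cluster service interfaces from the command line. This is simply a usage example of the utility named above; the exact output format varies by HACMP release.

   # Display cluster, node, and service interface status (Clinfo must be running)
   /usr/es/sbin/cluster/clstat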

The clients for IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework are the Job Scheduling Console and the Tivoli Desktop applications, respectively. These clients do not support the Clinfo API, but feedback that the cluster server is not available is provided immediately within these clients.

1.4.5 Microsoft Cluster Service

Microsoft Cluster Service (MSCS) provides three primary services:

Availability: Continue providing a service even during hardware or software failure. This redbook focuses upon leveraging this feature of MSCS.

Scalability: Enable additional components to be configured as system load increases.

Simplification: Manage groups of systems and their applications as a single system.

MSCS is a built-in feature of Windows NT/2000 Server Enterprise Edition. It is software that supports the connection of two servers into a cluster for higher availability and easier manageability of data and applications. MSCS can automatically detect and recover from server or application failures. It can be used to move server workload to balance utilization and to provide for planned maintenance without downtime.


MSCS uses software heartbeats to detect failed applications or servers. In the event of a server failure, it employs a shared nothing clustering architecture that automatically transfers ownership of resources (such as disk drives and IP addresses) from a failed server to a surviving server. It then restarts the failed server’s workload on the surviving server. All of this, from detection to restart, typically takes under a minute. If an individual application fails (but the server does not), MSCS will try to restart the application on the same server. If that fails, it moves the application’s resources and restarts it on the other server.

MSCS does not require any special software on client computers, so the user experience during failover depends on the nature of the client side of the client-server application in use. Client reconnection is often transparent because MSCS restarts the application using the same IP address.

If a client is using stateless connections (such as a browser connection), then it would be unaware of a failover if it occurred between server requests. If a failure occurs when a client is connected to the failed resources, then the client will receive whatever standard notification is provided by the client side of the application in use.

For a client-side application that has stateful connections to the server, a new logon is typically required following a server failure.

No manual intervention is required when a server comes back online following a failure. As an example, when a server that is running Microsoft Cluster Server (server A) boots, it starts the MSCS service automatically. MSCS in turn checks the interconnects to find the other server in its cluster (server B). If server A finds server B, then server A rejoins the cluster and server B updates it with current cluster information. Server A can then initiate a failback, moving back failed-over workload from server B to server A.

Microsoft Cluster Service concepts

Microsoft provides an overview of MSCS in a white paper that is available at:

http://www.microsoft.com/ntserver/ProductInfo/Enterprise/clustering/ClustArchit.asp

The key concepts of MSCS are covered in this section.

Shared nothing

Microsoft Cluster employs a shared nothing architecture in which each server owns its own disk resources (that is, they share nothing at any point in time). In the event of a server failure, a shared nothing cluster has software that can transfer ownership of a disk from one server to another.


Cluster Services

Cluster Services is the collection of software on each node that manages all cluster-specific activity.

Resource

A resource is the canonical item managed by the Cluster Service. A resource may include physical hardware devices (such as disk drives and network cards), or logical items (such as logical disk volumes, TCP/IP addresses, entire applications, and databases).

Group

A group is a collection of resources to be managed as a single unit. A group contains all of the elements needed to run a specific application and for client systems to connect to the service provided by the application. Groups allow an administrator to combine resources into larger logical units and manage them as a unit. Operations performed on a group affect all resources within that group.

Fallback

Fallback (also referred to as failback) is the ability to automatically rebalance the workload in a cluster when a failed server comes back online. This is a standard feature of MSCS. For example, say server A has crashed, and its workload failed over to server B. When server A reboots, it finds server B and rejoins the cluster. It then checks to see if any of the cluster groups running on server B would prefer to be running on server A. If so, it automatically moves those groups from server B to server A. Fallback properties include information such as which groups can fall back, which server is preferred, and during which hours a fallback is allowed. These properties can all be set from the cluster administration console.
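Group status and manual group moves can also be driven from the cluster.exe command line that ships with MSCS, as an alternative to the cluster administration console. The lines below are a usage sketch only: the group and node names are assumptions, and the exact option syntax should be confirmed with cluster /? on your Windows version.

   REM List all cluster groups and the node that currently owns each one
   cluster group

   REM Manually move a group back to its preferred server after it rejoins
   cluster group "Cluster Group" /moveto:SERVERA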

Quorum Disk

A Quorum Disk is a disk spindle that MSCS uses to determine whether another server is up or down.

When a cluster member is booted, it checks whether the cluster software is already running in the network:

- If it is running, the cluster member joins the cluster.

- If it is not running, the booting member establishes the cluster in the network.

A problem may occur if two cluster members are restarting at the same time, thus trying to form their own clusters. This potential problem is solved by the Quorum Disk concept. This is a resource that can be owned by only one server at a time, and for which servers negotiate ownership. The member that owns the Quorum Disk creates the cluster. If the member that owns the Quorum Disk fails, the resource is reallocated to another member, which in turn creates the cluster.


Negotiating for the quorum drive allows MSCS to avoid split-brain situations where both servers are active and think the other server is down.

Load balancing

Load balancing is the ability to move work from a very busy server to a less busy server.

Virtual server

A virtual server is the logical equivalent of a file or application server. There is no physical component in MSCS that is a virtual server. Resources are associated with a virtual server. At any point in time, different virtual servers can be owned by different cluster members. The virtual server entity can also be moved from one cluster member to another in the event of a system failure.

1.5 When to implement IBM Tivoli Workload Scheduler high availability

Specifying the appropriate level of high availability for IBM Tivoli Workload Scheduler often depends upon how much reliability needs to be built into the environment, balanced against the cost of the solution. High availability is a spectrum of options, driven by what kinds of failures you want IBM Tivoli Workload Scheduler to survive. These options lead to innumerable permutations of high availability configurations and scenarios. Our goal in this redbook is to demonstrate enough of the principles of configuring IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework to be highly available in a specific, non-trivial scenario such that you can use the principles to implement other configurations.

1.5.1 High availability solutions versus Backup Domain Manager

IBM Tivoli Workload Scheduler provides a degree of high availability through its Backup Domain Manager feature, which can also be implemented as a Backup Master Domain Manager. This works by duplicating the changes to the production plan from a Domain Manager to a Backup Domain Manager. When a failure is detected, a switchmgr command is issued to all workstations in the Domain Manager server's domain, causing these workstations to recognize the Backup Domain Manager as their new Domain Manager.

However, properly implementing a Backup Domain Manager is difficult. Custom scripts have to be developed to implement sensing a failure, transferring the scheduling objects database, and starting the switchmgr command. The code for sensing a failure is by itself a significant effort. Possible failures to code for include network adapter failure, disk I/O adapter failure, network communications failure, and so on.

If any jobs are run on the Domain Manager, the difficulty of implementing a Backup Domain Manager becomes even more obvious. In this case, the custom scripts also have to convert the jobs to run on the Backup Domain Manager, for instance by changing all references to the workstation name of the Domain Manager to the workstation name of the Backup Domain Manager, and changing references to the hostname of the Domain Manager to the hostname of the Backup Domain Manager.

Then even more custom scripts have to be developed to migrate scheduling object definitions back to the Domain Manager, because once the failure has been addressed, the entire process has to be reversed. The effort required can be more than the cost of acquiring a high availability product, which addresses many of the coding issues that surround detecting hardware failures. The Total Cost of Ownership of maintaining the custom scripts also has to be taken into account, especially if jobs are run on the Domain Manager. All the nuances of ensuring that the same resources that jobs expect on the Domain Manager are met on the Backup Domain Manager have to be coded into the scripts, then documented and maintained over time, presenting a constant drain on internal programming resources.
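To make the scale of that effort concrete, the fragment below sketches only the simplest piece of such a solution: detecting an unreachable Domain Manager and issuing the switchmgr command from the backup. The host, domain, and workstation names are assumptions, and a production version would also have to handle adapter, disk, and process failures, database synchronization, and the eventual switch back.

   #!/usr/bin/ksh
   # Minimal sketch: promote the Backup Domain Manager when the Domain
   # Manager stops answering. Assumes it runs as the TWSuser on the backup
   # node, with conman on the PATH.
   DM_HOST=dm1.example.com        # assumed hostname of the Domain Manager
   DM_DOMAIN=MASTERDM             # assumed ITWS domain name
   BACKUP_CPU=BDM1                # assumed workstation name of the backup
   FAILURES=0
   while true; do
       if ping -c 1 "$DM_HOST" >/dev/null 2>&1; then
           FAILURES=0
       else
           FAILURES=$((FAILURES + 1))
       fi
       if [ "$FAILURES" -ge 3 ]; then
           # Tell the workstations in the domain to recognize the backup
           conman "switchmgr $DM_DOMAIN;$BACKUP_CPU"
           break
       fi
       sleep 60
   done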

High availability products like IBM HACMP and Microsoft Cluster Service provide a well-documented, widely-supported means of expressing the required resources for jobs that run on a Domain Manager. This makes it easy to add computational resources (for example, disk volumes) that jobs require into the high availability infrastructure, and keep it easily identified and documented.

Software failures like a critical IBM Tivoli Workload Scheduler process crashing are addressed by both the Backup Domain Manager feature and IBM Tivoli Workload Scheduler configured for high availability. In both configurations, recovery at the job level is often necessary to resume the production day.

Implementing high availability for Fault Tolerant Agents cannot be accomplished using the Backup Domain Manager feature. Providing hardware high availability for a Fault Tolerant Agent server can be accomplished through custom scripting, but using a high availability solution is strongly recommended.

Table 1-1 on page 26 illustrates the comparative advantages of using a high availability solution versus the Backup Domain Manager feature to deliver a highly available IBM Tivoli Workload Scheduler configuration.


Table 1-1 Comparative advantages of using a high availability solution

Solution    Hardware    Software    FTA    Cost
HA          Yes         Yes         Yes    TCO: $$
BMDM        -           Yes         -      Initially: $; TCO: $$

1.5.2 Hardware failures to plan for

When identifying the level of high availability for IBM Tivoli Workload Scheduler, potential hardware failures you want to plan for can affect the kind of hardware used for the high availability solution. In this section, we address some of the hardware failures you may want to consider when planning for high availability for IBM Tivoli Workload Scheduler.

Site failure occurs when an entire computer room or data center becomes unavailable. Mitigating this failure involves geographically separate nodes in a high availability cluster. Products like IBM High Availability Geographic Cluster system (HAGEO) deliver a solution for geographic high availability. Consult your IBM service provider for help with implementing geographic high availability.

Server failure occurs when a node in a high availability cluster fails. The minimum response to mitigate this failure mode is to make a backup node available. However, you might also want to consider providing more than one backup node if the workstation you are making highly available is important enough to warrant redundant backup nodes. In this redbook we show how to implement a two-node cluster, but additional nodes are an extension to a two-node configuration. Consult your IBM service provider for help with implementing multiple-node configurations.

Network failures occur when either the network itself (through a component like a router or switch), or network adapters on the server, fail. This type of failure is often addressed with redundant network paths in the former case, and redundant network adapters in the latter case.

Disk failure occurs when a shared disk in a high availability cluster fails. Mitigating this failure mode often involves a Redundant Array of Independent Disks (RAID) array. However, even a RAID array can fail catastrophically if two or more disk drives fail at the same time, if a power supply fails, or if a backup power supply fails at the same time as the primary power supply. Planning for these catastrophic failures usually involves creating one or more mirrors of the RAID array, sometimes even on separate array hardware. Products like the IBM TotalStorage® Enterprise Storage Server® (ESS) and TotalStorage 7133 Serial Disk System can address these kinds of advanced disk availability requirements.



These are only the most common hardware failures to plan for. Other failures may also be considered while planning for high availability.

1.5.3 Summary

In summary, for all but the simplest configuration of IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework, using a high availability solution to deliver high availability services is the recommended approach to satisfy high availability requirements. Identifying the kinds of hardware and software failures you want your IBM Tivoli Workload Scheduler installation to address with high availability is a key part of creating an appropriate high availability solution.

1.6 Material covered in this book

In the remainder of this redbook, we focus upon the applicable high availability concepts for IBM Tivoli Workload Scheduler, and two detailed implementations of high availability for IBM Tivoli Workload Scheduler, one using IBM HACMP and the other using Microsoft Cluster Service.

In particular, we show you:

- Key architectural design issues and concepts to consider when designing highly available clusters for IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework; refer to Chapter 2, "High level design and architecture" on page 31.

- How to implement an AIX HACMP and Microsoft Cluster Service cluster; refer to Chapter 3, "High availability cluster implementation" on page 63.

- How to implement a highly available installation of IBM Tivoli Workload Scheduler, and a highly available IBM Tivoli Workload Scheduler with IBM Tivoli Management Framework, on AIX HACMP and Microsoft Cluster Service; refer to Chapter 4, "IBM Tivoli Workload Scheduler implementation in a cluster" on page 183.

- How to implement a highly available installation of IBM Tivoli Management Framework on AIX HACMP and Microsoft Cluster Service; refer to Chapter 5, "Implement IBM Tivoli Management Framework in a cluster" on page 415.

The chapters are generally organized around the products we cover in this redbook: AIX HACMP, Microsoft Cluster Service, IBM Tivoli Workload Scheduler, and IBM Tivoli Management Framework. The nature of high availability design and implementation requires that some products and the high availability tool be considered simultaneously, especially during the planning stage. This tends to lead to a haphazard sequence when applied along any thematic organization, except a straight cookbook recipe approach.

We believe the best results are obtained when we present enough of the theory and practice of implementing highly available IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework installations so that you can apply the illustrated principles to your own requirements. This rules out a cookbook recipe approach in the presentation, but readers who want a “recipe” will still find value in this redbook.

If you are particularly interested in following a specific configuration we show in this redbook from beginning to end, the following chapter road map gives the order that you should read the material.

If you are not familiar with high availability in general, and AIX HACMP or Microsoft Cluster Service in particular, we strongly recommend that you use the introductory road map shown in Figure 1-13.

Figure 1-13 Introductory high availability road map

If you want an installation of IBM Tivoli Workload Scheduler in a highly available configuration by itself, without IBM Tivoli Management Framework, the road map shown in Figure 1-14 on page 29 gives the sequence of chapters to read. This would be appropriate, for example, for implementing a highly available Fault Tolerant Agent.

(Figure 1-13 road map: Chapter 1, then Chapter 2.)


Figure 1-14 Road map for implementing highly available IBM Tivoli Workload Scheduler (no IBM Tivoli Management Framework, no Job Scheduling Console access through cluster nodes)

If you want to implement an installation of IBM Tivoli Workload Scheduler with IBM Tivoli Management Framework, use the road map shown in Figure 1-15.

Figure 1-15 Road map for implementing IBM Tivoli Workload Scheduler in a highly available configuration, with IBM Tivoli Management Framework

If you want to implement an installation of IBM Tivoli Management Framework in a highly available configuration by itself, without IBM Tivoli Workload Scheduler, the road map shown in Figure 1-16 on page 30 should be used. This would be appropriate, for example, for implementing a stand-alone IBM Tivoli Management Framework server as a prelude to installing and configuring other IBM Tivoli products.

(Figure 1-14 road map: Chapter 3, then Chapter 4 excluding the Framework sections. Figure 1-15 road map: Chapter 3, then Chapter 4.)


Figure 1-16 Road map for implementing IBM Tivoli Management Framework by itself

High availability design is a very broad subject. In this redbook, we provide representative scenarios meant to demonstrate to you the issues that must be considered during implementation. Many ancillary issues are briefly mentioned but not explored in depth here. For further information, we encourage you to read the material presented in “Related publications” on page 611.

(Figure 1-16 road map: Chapter 3, then Chapter 5.)


Chapter 2. High level design and architecture

Implementing a high availability cluster is an essential task for most mission-critical systems. In this chapter, we present a high level overview of HA clusters. We cover the following topics:

- "Concepts of high availability clusters" on page 32

- "Hardware configurations" on page 43

- "Software configurations" on page 46


2.1 Concepts of high availability clusters

Today, as more and more business and non-business organizations rely on their computer systems to carry out their operations, ensuring high availability (HA) of their computer systems has become a key issue. A failure of a single system component could result in an extended denial of service. To avoid or minimize the risk of denial of service, many sites turn to an HA cluster as the solution. In this section we describe what an HA cluster is normally comprised of, then discuss software/hardware considerations and introduce possible ways of configuring an HA cluster.

2.1.1 A bird's-eye view of high availability clusters

We start by defining the components of a high availability cluster.

Basic elements of a high availability cluster

A typical HA cluster, as introduced in Chapter 1, "Introduction" on page 1, is a group of machines networked together sharing external disk resources. The ultimate purpose of setting up an HA cluster is to eliminate any possible single points of failure. By eliminating single points of failure, the system can continue to run, or recover in an acceptable period of time, with minimal impact to the end users.

Two major elements make a cluster highly available:

- A set of redundant system components

- Cluster software that monitors and controls these components in case of a failure

Redundant system components provide backup in case of a single component failure. In an HA cluster, one or more additional servers are added to provide server-level backup in case of a server failure. Components in a server, such as network adapters, disk adapters, disks, and power supplies, are also duplicated to eliminate single points of failure. However, simply duplicating system components does not provide high availability, and cluster software is usually employed to control them.

Cluster software is the core element in HA clusters. It is what ties system components into clusters and takes control of those clusters. Typical cluster software provides a facility to configure clusters and predefine actions to be taken in case of a component failure.

The basic function of cluster software in general is to detect component failure and control the redundant components to restore service after a failure. In the event of a component failure, cluster software quickly transfers whatever service the failed component provided to a backup component, thus ensuring minimum downtime. There are several cluster software products in the market today; Table 2-1 lists common cluster software for each platform.

Table 2-1 Commonly used cluster software - by platform

Platform type        Cluster software
AIX                  HACMP
HP-UX                MC/Service Guard
Solaris              Sun Cluster, Veritas Cluster Service
Linux                SCYLD Beowulf, Open Source Cluster Application Resources (OSCAR), IBM Tivoli System Automation
Microsoft Windows    Microsoft Cluster Service

Each cluster software product has its own unique benefits, and the terminologies and technologies may differ from product to product. However, the basic concepts and functions that most cluster software provides have much in common. In the following sections we describe how an HA cluster is typically configured and how it works, using simplified examples.

Typical high availability cluster configuration

Most cluster software offers various options to configure an HA cluster. Configurations depend on the system's high availability requirements and the cluster software used. Though there are several variations, the two configuration types most often discussed are idle or hot standby, and mutual takeover.

Basically, a hot standby configuration assumes a second physical node capable of taking over for the first node. The second node sits idle except in the case of a fallover. Meanwhile, the mutual takeover configuration consists of two nodes, each with their own set of applications, that can take on the function of the other in case of a node failure. In this configuration, each node should have sufficient machine power to run jobs of both nodes in the event of a node failure. Otherwise, the applications of both nodes will run in a degraded mode after a fallover, since one node is doing the job previously done by two. Mutual takeover is usually considered to be a more cost effective choice since it avoids having a system installed just for hot standby.

Figure 2-1 on page 34 shows a typical mutual takeover configuration. Using this figure as an example, we will describe what comprises an HA cluster. Keep in mind that this is just an example of an HA cluster configuration. Mutual takeover is a popular configuration; however, it may or may not be the best high availability solution for you. For a configuration that best matches your requirements, consult your service provider.

Figure 2-1 A typical HA cluster configuration

As you can see in Figure 2-1, Cluster_A has Node_A and Node_B. Each node is running an application. The two nodes are set up so that each node is able to provide the function of both nodes in case a node or a system component on a node fails. In normal production, Node_A runs App_A and owns Disk_A, while Node_B runs App_B and owns Disk_B. When one of the nodes fails, the other node will acquire ownership of both disks and run both applications.

Redundant hardware components are the bottom-line requirement to enable a high availability scenario. In the scenario shown here, notice that most hardware components are duplicated. The two nodes are each connected to two physical TCP/IP networks, subnet1 and subnet2, providing an alternate network connection in case of a network component failure. They share the same set of external disks, Disk_A and Disk_B, each mirrored to prevent the loss of data in case of a disk failure. Both nodes have a path to connect to the external disks. This enables one node to acquire ownership of an external disk owned by another node in case of a node failure. For example, if Node_A fails, Node_B can acquire ownership of Disk_A and resume whatever service requires Disk_A. Disk adapters connecting the nodes and the external disks are duplicated to provide backup in the event of a disk adapter failure.

In some cluster configurations, there may be an additional non-TCP/IP network that directly connects the two nodes, used for heartbeats. This is shown in the figure as net_hb. To detect failures such as network and node failure, most cluster software uses the heartbeat mechanism.

Each node in the cluster sends "heartbeat" packets to its peer nodes over a TCP/IP network and/or a non-TCP/IP network. If heartbeat packets are not received from a peer node for a predefined amount of time, the cluster software interprets this as a node failure.

When using only TCP/IP networks to send heartbeats, it is difficult to differentiate node failures from network failures. Because of this, most cluster software recommends (or requires) a dedicated point-to-point network for sending heartbeat packets. Used together with TCP/IP networks, the point-to-point network prevents cluster software from misinterpreting a network component failure as a node failure. The network type for this point-to-point network may vary depending on the type of network the cluster software supports. RS-232C, Target Mode SCSI, and Target Mode SSA are supported as point-to-point networks by some cluster software.
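The sketch below illustrates the reasoning only, not any particular product's implementation: a node failure is declared only when the peer is silent on both the TCP/IP network and the point-to-point path. The peer address and the point-to-point check are assumptions.

   #!/usr/bin/ksh
   # Conceptual sketch of how a heartbeat loss is interpreted.
   PEER_IP=192.168.1.2            # assumed peer address on the TCP/IP network
   check_point_to_point() {
       # Placeholder for a non-TCP/IP check (RS-232C, Target Mode SSA, ...);
       # a real cluster manager performs this internally.
       return 0
   }
   if ping -c 1 "$PEER_IP" >/dev/null 2>&1; then
       echo "Peer answers on the TCP/IP network: no failure"
   elif check_point_to_point; then
       echo "TCP/IP heartbeat lost but peer alive: treat as a network failure"
   else
       echo "No heartbeat on any network: treat as a node failure, start fallover"
   fi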

Managing system components

Cluster software is responsible for managing system components in a cluster. It is typically installed on the local disk of each cluster node. There is usually a set of processes or services running constantly on the cluster nodes, monitoring system components and taking control of those resources when required. These processes or services are often referred to as the cluster manager.

On a node, applications and other system components that are required by those applications are bundled into a group. Here, we refer to each application and system component as a resource, and refer to a group of these resources as a resource group.

A resource group is generally comprised of one or more applications, one or more logical storage volumes residing on an external disk, and an IP address that is not bound to a node. There may be more or fewer resources in the group, depending on application requirements and how much the cluster software is able to support.


A resource group is associated with two or more nodes in the cluster, and it is the unit that a cluster manager uses to move resources from one node to another. In normal production, a resource group resides on its primary node; in the event of a node or component failure on the primary node, the cluster manager will move the group to another node. Figure 2-2 shows an example of resources and resource groups in a cluster.

Figure 2-2 Resource groups in a cluster

In Figure 2-2, a resource group called GRP_1 is comprised of an application called APP1, and external disks DISK1 and DISK2. IP address 192.168.1.101 is associated to GRP_1. The primary node for GRP_1 is Node_A, and the secondary node is Node_B.

GRP_2 is comprised of application APP2, disks DISK3 and DISK4, and IP address 192.168.1.102. For GRP_2, Node_B is the primary node and Node_A is the secondary node.
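Expressed as a configuration sketch (the format below is illustrative only; it is not actual HACMP or MSCS syntax), the two resource groups look like this:

   resource_group GRP_1
       application   APP1
       disks         DISK1 DISK2
       service_ip    192.168.1.101
       nodes         Node_A Node_B     # primary node listed first

   resource_group GRP_2
       application   APP2
       disks         DISK3 DISK4
       service_ip    192.168.1.102
       nodes         Node_B Node_A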



Fallover and fallback of a resource group

In normal production, cluster software constantly monitors the cluster resources for any signs of failure. As soon as the cluster manager running on a surviving node detects a node or component failure, it will quickly acquire ownership of the resource group and restart the application.

In our example, assume a case where Node_A crashed. Through heartbeats, Node_B detects Node_A's failure. Because Node_B is configured as the secondary node for resource group GRP_1, Node_B's cluster manager acquires ownership of resource group GRP_1. As a result, DISK1 and DISK2 are mounted on Node_B, and the IP address associated with GRP_1 moves to Node_B.

Using these resources, Node_B will restart APP1, and resume application processing. Because these operations are initiated automatically based on pre-defined actions, it is a matter of minutes before processing of APP1 is restored. This is called a fallover. Figure 2-3 on page 38 shows an image of the cluster after fallover.


Figure 2-3 Fallover of a resource group

Note that this is only a typical scenario of a fallover. Most cluster software is capable of detecting both hardware and software component failures, if configured to do so. In addition to basic resources such as nodes, networks, and disks, which other resources can be monitored differs by product. Some cluster software may require more or less configuration to monitor the same set of resources. For details on what your choice of cluster software can monitor, consult your service provider.

After a node recovers from a failure, it rejoins the cluster. Depending on the cluster configuration, the resource group that failed over to a standby node is returned to the primary node at the time of rejoining. In this Redbook, we refer to this cluster behavior as fallback.



To describe this behavior using our example, when fallback is initiated, resource group GRP_1 moves back to Node_A and returns to its normal production state as shown in Figure 2-2 on page 36. There are some considerations about fallback. These are summarized in 2.1.2, “Software considerations” on page 39 under Fallback policy.

As described, cluster software addresses node failure by initiating a fallover of a resource group from the failed node to the standby node. A failed node would eventually recover from a failure and rejoin the cluster. After the rejoining of the failed node, you would have the choice of either keeping the resource group on the secondary node, or relocating the resource group to the original node. If you choose the latter option, then you should consider the timing of when to initiate the fallback.

Most cluster software provides options on how a resource group should be managed in the event of a node rejoining the cluster. Typically you would have the option of either initiating a fallback automatically when the node rejoins the cluster, or have the node just rejoin the cluster and manually initiate a fallback whenever appropriate. When choosing to initiate an automatic fallback, be aware that this initiates a fallback regardless of the application status. A fallback usually requires stopping the application on the secondary node and restarting the application on the primary node. Though a fallback generally takes place in a short period of time, this may disrupt your application processing.

To implement a successful HA cluster, certain software and hardware considerations must be addressed. In the following sections, we describe what you need to consider prior to implementing HA clusters.

2.1.2 Software considerations

In order to make your application highly available, you must either use the high availability functions that your application provides, or put the application under the control of cluster software. Many sites look to cluster software as a solution to ensure application high availability, as it is usually the case that high availability functions within an application do not withstand hardware failure.

Though most software programs are able to run in a multi-node HA cluster environment and are controllable by cluster software, there are certain considerations to take into account. If you plan to put your application under control of any cluster software, check the following criteria to make sure your application is serviced correctly by cluster software.

Application behavior

First think about how your application behaves in a single-node environment. Then consider how your application may behave in a multi-node HA cluster. This determines how you should set up your application. Consider where you should place your application executables, and how you should configure your application to achieve maximum availability. Depending on how your application works, you may have to install it on a shared disk, or just have a copy of the software on the local disk of the other node. If several instances of the same application may run on one node in the event of a fallover, make sure that your application supports such a configuration.

Licensing

Understand your application licensing requirements and make sure the configuration you plan is not breaching the application license agreements. Some applications are license-protected by incorporating processor-specific information into each instance of application installed. This means that even though you implement your application appropriately and the cluster hardware handles the application correctly in case of a fallover, the application may not be able to start because of your license restrictions. Make sure you have licenses for each node in the cluster that may run your applications. If you plan to have several instances of the same application running on one node, ensure you have the license for each instance.

Dependencies

Check your application dependencies. When configuring your software for an HA cluster, it is important that you know what your applications are dependent upon, but it is even more important to know what your application should not be dependent upon.

Make sure your application is independent of any node-bound resources. Any applications dependent on a resource that is bound to a particular node may have dependency problems, as those resources are usually not attached or accessible to the standby node. Things like binaries or configuration files installed on locally attached drives, hard coding to a particular device in a particular location, and hostname dependencies could become potential dependency issues.

Once you have confirmed that your application does not depend on any local resource, define which resource needs to be in place to run your application. Common dependencies are data on external disks and an IP address for client access. Check to see if your application needs other dependencies.

Automation

Most cluster software uses scripts or agents to control software and hardware components in a cluster. For this reason, most cluster software requires that any application handled by it must be able to start and stop by command without manual intervention. Scripts to start and stop your applications are generally required. Make sure your application provides startup and shutdown commands.


Also, make sure that those commands do not prompt you for operator replies. If you plan to have your application monitored by the cluster software, you may have to develop a script to check the health of your application.
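For IBM Tivoli Workload Scheduler itself, such scripts typically wrap the standard StartUp script and conman commands. The sketch below assumes an instance installed under /tws and run as the TWSuser with conman on the PATH; the paths are assumptions, and the scripts used in the scenario chapters of this redbook are more elaborate.

   #!/usr/bin/ksh
   # start_tws.ksh - called by the cluster software when the group comes online
   cd /tws || exit 1
   ./StartUp                      # start netman
   conman "start"                 # start the remaining ITWS processes
   exit 0

   #!/usr/bin/ksh
   # stop_tws.ksh - called by the cluster software before the group moves
   cd /tws || exit 1
   conman "unlink @;noask"        # unlink from the rest of the network
   conman "stop;wait"             # stop the ITWS production processes
   conman "shut;wait"             # stop netman as well
   exit 0

   #!/usr/bin/ksh
   # monitor_tws.ksh - optional health check: is batchman still running?
   ps -ef | grep "[b]atchman" >/dev/null || exit 1
   exit 0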

Robustness

Applications should be stable enough to withstand sudden hardware failure. This means that your application should be able to restart successfully on the other node after a node failure. Tests should be executed to determine if a simple restart of the application is sufficient to recover your application after a hardware failure. If further steps are needed, verify that your recovery procedure can be automated.

Fallback policy

As described in "Fallover and fallback of a resource group" on page 37, cluster software addresses node failure by initiating a fallover of the resource group from the failed node to the standby node. A failed node would eventually recover from a failure and rejoin the cluster. After the rejoining of the failed node, you would have the choice of either keeping the resource group on the secondary node or relocating the resource group to the original node. If you choose to relocate the resource group to the original node, then you should consider the timing of when to initiate the fallback.

Most cluster software gives you options on how a resource group should be managed in the event of a node rejoining the cluster. Typically you would have the option of either initiating a fallback automatically when the node rejoins the cluster, or having the node just rejoin the cluster and manually initiate a fallback whenever appropriate.

When choosing to initiate an automatic fallback, be aware that this initiates a fallback regardless of the application status. A fallback usually requires stopping the application on the secondary node and restarting the application on the primary node. Though a fallback generally takes place in a short period of time, this may disrupt your application processing.

2.1.3 Hardware considerations

Hardware considerations mainly involve how to provide redundancy. A cluster that provides maximum high availability is a cluster with no single points of failure. A single point of failure exists when a critical cluster function is provided by a single component. If that component fails, the cluster has no way of providing that function, and the application or service dependent on that component becomes unavailable.

An HA cluster is able to provide high availability for most hardware components when redundant hardware is supplied and the cluster software is configured to take control of them. Preventing hardware components from becoming single points of failure is not a difficult task; simply duplicating them and configuring the cluster software to handle them in the event of a failure should solve the problem for most components.

However, we remind you again that adding redundant hardware components is usually associated with a cost. You may have to make compromises at some point. Consider the priority of your application. Balance the cost of a failure against the cost of additional hardware and the workload it takes to configure high availability. Depending on the priority and the required level of availability for your application, manual recovery procedures after notifying the system administrator may be enough.

In Table 2-2 we point out basic hardware components which could become a single point of failure, and describe how to address them. Some components simply need to be duplicated, with no additional configuration, because the hardware in which they reside automatically switches over to the redundant component in the event of a failure. For other components you may have to perform further configuration to handle them, or write custom code to detect their failure and trigger recovery actions. This may vary depending on the cluster software you use, so consult your service provider for detailed information.

Table 2-2 Eliminating single points of failure

(Each entry lists a hardware component and the measures that eliminate it as a single point of failure.)

Node: Set up a standby node. An additional node could be a standby for one or more nodes. If an additional node will just be a "hot standby" for one node during production, a node with the same machine power as the active node is sufficient. If you are planning a mutual takeover, make sure the node has enough power to execute all the applications that will run on that server in the event of a fallover.

Power source: Use multiple circuits or uninterruptible power supplies (UPS).

Network adapter: To recover from a network adapter failure, you will need at least two network adapters per node. If your cluster software requires a dedicated TCP/IP network for heartbeats, additional network adapters may be added.

Network: Have multiple networks to connect nodes.

TCP/IP subsystem: Use a point-to-point network to connect nodes in the cluster. Most cluster software requires, or recommends, at least one active network (TCP/IP or non-TCP/IP) to send "heartbeats" to the peer nodes. By providing a point-to-point network, cluster software will be able to distinguish a network failure from a node failure. For cluster software that does not support a non-TCP/IP network for heartbeats, consult your service provider for ways to eliminate the TCP/IP subsystem as a single point of failure.

Disk adapter: Add an additional disk adapter to each node. When cabling your disks, make sure that each disk adapter has access to each external disk. This enables an alternate access path to external disks in case of a disk adapter failure.

Disk controller: Use redundant disk controllers.

Disk: Provide redundant disks and enable RAID to protect your data from disk failures.


2.2 Hardware configurations

In this section, we discuss the different types of hardware clusters, concentrating on disk clustering rather than network or IP load balancing scenarios. We also examine the differences between a hardware cluster and a hot standby system.

2.2.1 Types of hardware cluster

There are many types of hardware clustering configurations, but here we concentrate on four different configurations: two-node cluster, multi-node cluster, grid computing, and disk mirroring (these terms may vary, depending on the hardware manufacturer).

Two-node cluster

A two-node cluster is probably the most common form of hardware cluster configuration; it consists of two nodes which are able to access a disk system that is externally attached to the two nodes, as shown in Figure 2-4 on page 44. The external drive system can be attached over the LAN or SAN network (SSA Disk system), or even by local SCSI cables.

This type of cluster is used when configuring only a couple of applications in a high availability cluster. This type of configuration can accommodate either Active/Passive or Active/Active operation, depending on the operating system and cluster software that is used.

Figure 2-4 Two-node cluster

Multi-node cluster

In a multi-node cluster, we have two or more nodes that can access the same disk system, which is externally attached to this group of nodes, as shown in Figure 2-5 on page 45. The external disk system can be attached over the LAN or SAN.

This type of configuration can be used for extra fault tolerance: if Node1 were to fail, all work would move onto Node2; if Node2 were to fail as well, all work would then move on to the next node, and so on.

It also can support many applications running simultaneously across all nodes configured in this cluster. The number of nodes that this configuration can support depends on the hardware and software manufacturers.



Figure 2-5 Multi-node cluster

Grid computing

Even though grid computing is not necessarily considered a cluster, it acts like one, so we will explain the concepts involved. Grid computing is based on the concept that the IT infrastructure can be managed as a collection of distributed computing resources available over a network that appear to an end user or application as one large virtual computing system.

A grid can span locations, organizations, machine architectures, and software boundaries to provide unlimited power, collaboration, and information access to everyone connected to the grid. Grid computing enables you to deliver computing power to applications and users on demand, that is, only when they need it to meet business objectives.

Disk mirroring

Disk mirroring is more commonly used in a hot standby mode, but it is also used in some clustering scenarios, especially when mirroring two systems across large distances; this will depend on the software and/or hardware capabilities.

Disk mirroring functionality can be performed by software in some applications and in some clustering software packages, but it can also be performed at the hardware level, where you have a local disk on each side of a cluster and any changes made to one side are automatically sent across to the other side, thus keeping the two sides in synchronization.

2.2.2 Hot standby system

This terminology is used for a system that is connected to the network and fully configured, with all the applications loaded but not enabled. It is normally identical, in both hardware and software, to the system it is on standby for.

One hot standby system can be on standby for several live systems, which can include application servers that have a Fault Tolerant Agent, an IBM Tivoli Workload Scheduler Master Domain Manager, or a Domain Manager.

The advantage over a hardware cluster is that one server can be configured for several systems, which cuts the cost dramatically.

The disadvantages over a hardware cluster are as follows:

� It is not an automatic switchover and can take several minuets or even hours to bring up the standby server.

� The work that was running on the live server has no visibility on the standby server, so an operator would have to know where to restart the standby server.

� The standby server has a different name, so the IBM Tivoli Workload Scheduler jobs would not run on this system as defined in the database. Therefore, the IBM Tivoli Workload Scheduler administrator would have to submit the rest of the jobs by hand or create a script to do this work.

2.3 Software configurations

In this section we cover the different ways to implement IBM Tivoli Workload Scheduler in a cluster and also look at some of the currently available software configurations built into IBM Tivoli Workload Scheduler.

2.3.1 Configurations for implementing IBM Tivoli Workload Scheduler in a cluster

Here we describe the different configurations of IBM Tivoli Workload Scheduler workstations, how they are affected in a clustered environment, and why each configuration would be put into a cluster. We will also cover the different types of Extended Agents and how they work in a cluster.


Master Domain Manager

The Master Domain Manager is the most critical of all the IBM Tivoli Workload Scheduler workstation configurations. It is strongly recommended to configure this workstation into a cluster, as it manages and controls the scheduling database. From this database, it generates and distributes the 24-hour daily scheduling plan called a symphony file. It also controls, coordinates and keeps track of all the scheduling dependencies throughout the entire IBM Tivoli Workload Scheduler network.

Keep the following considerations in mind when setting up a Master Domain Manager in a cluster:

• Connectivity to the IBM Tivoli Workload Scheduler database

• Ability of the IBM Tivoli Workload Scheduler installation to locate the components file (this only applies to versions prior to IBM Tivoli Workload Scheduler Version 8.2)

• Ability of the user interface (IBM Tivoli Workload Scheduler Console) to connect to the new location where IBM Tivoli Workload Scheduler is now running

• Starting all the IBM Tivoli Workload Scheduler processes and services

• Coordinating all messages from and to the IBM Tivoli Workload Scheduler network

• Linking all workstations in its domain

Let’s examine these considerations in more detail.

IBM Tivoli Workload Scheduler database

The IBM Tivoli Workload Scheduler database is held in the same file system as the installation directory of IBM Tivoli Workload Scheduler. Therefore, provided this directory is not mounted from or linked to a separate file system, the database will follow the IBM Tivoli Workload Scheduler installation.

If the version of IBM Tivoli Workload Scheduler used is prior to Version 8.2, then you will have to consider the TWShome/../unison/ directory, as this is where part of the database is held (workstation, NT user information); the working security file is also held here.

The directory TWShome/../unison/ may not be part of the same file system as the TWShome directory, so this will have to be added as part of the cluster package. Because the database is a sequential index link database, there is no requirement to start the database before IBM Tivoli Workload Scheduler can read it.


IBM Tivoli Workload Scheduler components file

All versions prior to IBM Tivoli Workload Scheduler Version 8.2 require a components file. This file contains the location of both the maestro and netman installations, and it is installed in the directory c:\win32app\TWS\Unison\netman on Windows and under /usr/unison on UNIX. The components file needs to be accessible on both sides of the cluster.
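As a purely hypothetical illustration (the file is created and maintained by the installer, and its exact layout differs between versions), the components file simply maps each installed component to its location, along these lines:

   # Illustrative sketch only - not the exact layout of any particular release
   MAESTRO   /usr/maestro
   NETMAN    /usr/unison/netman

The point for a cluster is simply that whichever node is currently running IBM Tivoli Workload Scheduler must be able to read this file and resolve the paths it contains.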

IBM Tivoli Workload Scheduler console

The IBM Tivoli Workload Scheduler console (called the Job Scheduling Console) connects to the IBM Tivoli Workload Scheduler engine through the IBM Tivoli Management Framework (referred to here simply as the Framework).

The Framework authenticates the logon user, and communicates to the IBM Tivoli Workload Scheduler engine through two Framework modules (Job Scheduling Services and Job Scheduling Connector). Therefore, you need to consider both the IP address of the Framework and the location of the IBM Tivoli Workload Scheduler engine code.

• When a user starts the Job Scheduling Console, it prompts for a user name, the password for that user, and the address where the Framework is located. This address can be a fully qualified domain name or an IP address, but it must allow a connection to wherever the Framework is running (after a cluster takeover).

• The Job Scheduling Console displays a symbol of an engine. If the IBM Tivoli Workload Scheduler engine is active, the engine symbol displays without a red cross through it. If the IBM Tivoli Workload Scheduler engine is not active, then the engine symbol has a red crossmark through it, as shown in Figure 2-6.

Figure 2-6 Symbol of IBM Tivoli Workload Scheduler engine availability

Domain Manager

The Domain Manager is the second critical workstation that needs to be protected in an HA cluster, because it controls, coordinates and keeps track of all scheduling dependencies between workstations that are defined in the domain that this Domain Manager is managing (which may be hundreds or even a thousand workstations).

The considerations that should be kept in mind when setting up a Domain Manager in a cluster are:

• The ability of the IBM Tivoli Workload Scheduler installation to locate the components file (this only applies to versions prior to IBM Tivoli Workload Scheduler Version 8.2).

• The ability of the user interface (Job Scheduling Console) to connect to the new location where IBM Tivoli Workload Scheduler is now running (this is optional, as it is not essential to run the console on this workstation).

In addition, the starting of all IBM Tivoli Workload Scheduler processes and services, the coordination of all messages from and to the IBM Tivoli Workload Scheduler network, and the linking of all workstations in its domain should be taken into account.

Fault Tolerant Agent

The Fault Tolerant Agent may be put in a cluster because a critical application needs to be in an HA environment, so the Fault Tolerant Agent that schedules and controls all the batch work needs to be in this same cluster.

Keep the following considerations in mind when setting up a Fault Tolerant Agent in a cluster:

• The ability of the IBM Tivoli Workload Scheduler installation to locate the components file (this only applies to versions prior to IBM Tivoli Workload Scheduler Version 8.2).

• The ability of the user interface (Job Scheduling Console) to connect to the new location where IBM Tivoli Workload Scheduler is now running (this is optional, as it is not essential to run the console on this workstation).

In addition, the starting of all IBM Tivoli Workload Scheduler processes and services should be taken into account.

Extended Agents

An Extended Agent (xa or x-agent) serves as an interface to an external, non-IBM Tivoli Workload Scheduler system or application. It is defined as an IBM Tivoli Workload Scheduler workstation with an access method and a host. The access method communicates with the external system or application to launch and monitor jobs and to test open file dependencies. The host is another IBM Tivoli Workload Scheduler workstation (except another xa) that resolves dependencies and issues job launch requests via the method.

In this section, we consider the implications of implementing the different Extended Agents currently available in an HA cluster. All of the Extended Agents are currently installed partly in the application itself and partly on an IBM Tivoli Workload Scheduler workstation (which can be a Master Domain Manager, a Domain Manager or a Fault Tolerant Agent), so we need to consider the needs of the type of workstation the Extended Agent is installed on.

We will cover each type of Extended Agent in turn. The types of agents that are currently supported are: SAP R/3; Oracle e-Business Suite; PeopleSoft; z/OS access method; and Local and Remote UNIX access. For each Extended Agent, we describe how the access method will work in a cluster.

SAP R/3 access method

When you install and configure the SAP Extended Agent and then create a workstation definition for the SAP instance you wish to communicate with, there will be an R3batch method in the methods directory.

This is a C program that communicates with the remote R3 system. It finds where to run the job by reading the r3batch.opts file, and then matching the workstation name with the first field in the r3batch.opts file. R3batch then reads all the parameters in the matched workstation line, and uses these to communicate with the R/3 system.

The parameter that we are interested in is the second field of the r3batch.opts file: the R/3 Application Server. This will be an IP address or domain name. In order for the Extended Agent to operate correctly, this system must be accessible from wherever IBM Tivoli Workload Scheduler is running. (This operates in the same way for the Microsoft or the UNIX cluster.)
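As a hedged illustration of this point, an r3batch.opts entry pairs the Extended Agent workstation name (first field) with the R/3 application server address (second field); the remaining connection and logon parameters are omitted here because their exact layout depends on the access method version:

   SAP_XA1   r3appsrv.cluster.example.com   <remaining connection and logon parameters>

The workstation name SAP_XA1 and the server address are invented; what matters in a cluster is that the address in the second field stays reachable from whichever node currently hosts the IBM Tivoli Workload Scheduler instance.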

Oracle e-Business Suite access method

The Oracle e-Business Suite Extended Agent is installed and configured on the same system as the Oracle Application server. When setting this up in a cluster, you must first configure the Fault Tolerant Agent and Extended Agent to be in the same part of the cluster.

When the Oracle Applications x-agent is started, the IBM Tivoli Workload Scheduler host executes the access method mcmagent. Using the x-agent’s workstation name as a key, mcmagent looks up the corresponding entry in the mcmoptions file to determine which instance of Oracle Applications it will connect to. The Oracle Applications x-agent can then launch jobs on that instance of Oracle Applications and monitor the jobs through completion, writing job progress and status information to the job’s standard list file.

PeopleSoft access method

The PeopleSoft Extended Agent is installed and configured on the same system as the PeopleSoft client. It also requires an IBM Tivoli Workload Scheduler Fault Tolerant Agent to host the PeopleSoft Extended Agent, which is also installed and configured on the same system as the PeopleSoft client.


When setting this configuration up in a cluster, you must first configure the Fault Tolerant Agent and Extended Agent to be in the same part of the cluster as the PeopleSoft Client.

To launch a PeopleSoft job, IBM Tivoli Workload Scheduler executes the psagent method, passing it information about the job. An options file provides the method with path, executable and other information about the PeopleSoft process scheduler and application server used to launch the job. The Extended Agent can then access the PeopleSoft process request table and make an entry in the table to launch the job. Job progress and status information are written to the job’s standard list file.

z/OS access method

The IBM Tivoli Workload Scheduler z/OS access method has three separate methods, depending on which subsystem you would like to communicate with on the z/OS system. All of these methods work in the same way, and they are: JES, OPC and CA7. The Extended Agent communicates with the z/OS gateway over TCP/IP, and uses the HOST parameter in the workstation definition to reach the gateway.

When configuring a z/OS Extended Agent in a cluster, be aware that this Extended Agent is hosted by a Fault Tolerant Agent; the considerations for a Fault Tolerant Agent are described in 2.3.1, “Configurations for implementing IBM Tivoli Workload Scheduler in a cluster” on page 46.

The parameter that we are interested in is the HOST parameter in the workstation definition. This will be an IP address or domain name. In order for the Extended Agent to operate correctly, this system must be accessible from wherever IBM Tivoli Workload Scheduler is running. (This operates in the same way for the Microsoft or the UNIX cluster.)

Figure 2-7 on page 52 shows the architecture of the z/OS access method.

Figure 2-7 z/OS access method

Local UNIX access method

When IBM Tivoli Workload Scheduler sends a job to a local UNIX Extended Agent, the access method, unixlocl, is invoked by the host to execute the job. The method starts by executing the standard configuration script on the host workstation (jobmanrc). If the job's logon user is permitted to use a local configuration script and the script exists as $HOME/.jobmanrc, the local configuration script is also executed. The job itself is then executed either by the standard or the local configuration script. If neither configuration script exists, the method starts the job.

For the local UNIX Extended Agent to function properly in a cluster, the parameter that we are interested in is host, which is in the workstation definition. This will be an IP address or domain name; provided that this system can be accessed from wherever IBM Tivoli Workload Scheduler is running, the Extended Agent will still operate correctly.
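The following is a rough sketch of how such an Extended Agent workstation definition might look in composer. The workstation, node, and host names are invented, and the exact keywords can vary between IBM Tivoli Workload Scheduler versions, so treat it as an illustration rather than a template:

   CPUNAME UX_XAGENT
    DESCRIPTION "Local UNIX extended agent hosted by the clustered FTA"
    OS UNIX
    NODE twscluster.example.com
    FOR MAESTRO
     HOST CLUSTER_FTA
     ACCESS unixlocl
   END

The detail that matters for a cluster is that the address in the NODE field and the hosting workstation named in the HOST field must remain reachable after a fallover.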

Remote UNIX access method

Note: In this section we explain how this access method works in a cluster; this explanation is not meant to be used as a way to set up and configure this Extended Agent.

When the IBM Tivoli Workload Scheduler sends a job to a remote UNIX Extended Agent, the access method, unixrsh, creates a /tmp/maestro directory on the non-IBM Tivoli Workload Scheduler computer. It then transfers a wrapper script to the directory and executes it. The wrapper then executes the scheduled job. The wrapper is created only once, unless it is deleted, moved, or outdated.

For the remote UNIX Extended Agent to function properly in a cluster, the parameter that we are interested in is host, which is in the workstation definition. This will be an IP address or domain name; provided that this system can be accessed from wherever IBM Tivoli Workload Scheduler is running, the Extended Agent will still operate correctly.

One instance of IBM Tivoli Workload Scheduler

In this section, we discuss the circumstances under which you might install one instance of IBM Tivoli Workload Scheduler in a high availability cluster.

The first consideration is where the product is to be installed: it must be in the shared file system that moves between the two servers in the cluster.

The second consideration is how the IBM Tivoli Workload Scheduler instance is addressed: that must be the IP address that is associated to the cluster.

Why to install only one copy of IBM Tivoli Workload Scheduler

In this configuration there may be three reasons for installing only one copy of IBM Tivoli Workload Scheduler in this cluster:

• Installing a Master Domain Manager (MDM) in a cluster removes the single point of failure of the IBM Tivoli Workload Scheduler database and makes the entire IBM Tivoli Workload Scheduler network more fault tolerant against failures.

• Installing a Domain Manager (DM) in a cluster makes the segment of the IBM Tivoli Workload Scheduler network that the Domain Manager manages more fault tolerant against failures.

• If an application is running in a clustered environment and is very critical to the business, it may have some critical batch scheduling; you could install a Fault Tolerant Agent in the same cluster to handle the batch work.

When to install only one copy of IBM Tivoli Workload Scheduler

You would install the workstation in this cluster in order to provide high availability to an application or to the IBM Tivoli Workload Scheduler network by installing the Master Domain Manager in the cluster.

Where to install only one copy of IBM Tivoli Workload Scheduler

To take advantage of the cluster, install this instance of IBM Tivoli Workload Scheduler on the shared disk system that moves between the two sides of the cluster.

What to install

Depending on why you are installing one instance of IBM Tivoli Workload Scheduler, you may install a Master Domain Manager, Domain Manager or Fault Tolerant Agent in the cluster.

Two instances of IBM Tivoli Workload Scheduler

In this section, we discuss the circumstances under which you might install two instances of IBM Tivoli Workload Scheduler.

The first consideration is where the product is to be installed: each IBM Tivoli Workload Scheduler instance must have a different installation directory, and that must be in the shared file system that moves between the two servers in the cluster. Each instance will also have its own installation user.

The second consideration is how the IBM Tivoli Workload Scheduler instance is addressed: that must be the IP address that is associated with the cluster. Each IBM Tivoli Workload Scheduler instance must also have its own port number. If the version of IBM Tivoli Workload Scheduler is older than 8.2, then it will need to access the components file from both sides of the cluster to run. If the version of IBM Tivoli Workload Scheduler is 8.2 or higher, then the components file is only needed when upgrading IBM Tivoli Workload Scheduler.
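For example, the two instances could be distinguished by the netman port set in each instance's localopts file; the port numbers below are illustrative only:

   # localopts for the first instance (for example, /usr/maestro/localopts)
   nm port =31111

   # localopts for the second instance (for example, /usr/maestro2/localopts)
   nm port =31112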

Why to install two instances of IBM Tivoli Workload Scheduler

In this configuration there may be two reasons for installing two copies of IBM Tivoli Workload Scheduler in this cluster:

• Installing a Master Domain Manager and a Domain Manager in the cluster not only removes the single point of failure of the IBM Tivoli Workload Scheduler database, but also makes the entire IBM Tivoli Workload Scheduler network more fault tolerant against failures.

• If two applications are running in a clustered environment and they are very critical to the business, they may have some critical batch scheduling; you could install a Fault Tolerant Agent for each application running in the cluster to handle the batch work.

When to install two instances of IBM Tivoli Workload Scheduler

You would install both instances of IBM Tivoli Workload Scheduler in this cluster in order to provide high availability to an application or to the IBM Tivoli Workload Scheduler network by installing the Master Domain Manager or Domain Manager in this cluster.

Where to install two instances of IBM Tivoli Workload Scheduler

To take advantage of the cluster, you would install the two instances of IBM Tivoli Workload Scheduler on the shared disk system that moves between the two sides of the cluster. You would set up the cluster software in such a way that the first instance of IBM Tivoli Workload Scheduler would have a preference of running on server A and the second instance would have a preference of running on server B.

What to install

Depending on why you are installing two instances of IBM Tivoli Workload Scheduler, you may install a combination of a Master Domain Manager, Domain Manager or Fault Tolerant Agent in the cluster.

Three instances of IBM Tivoli Workload Scheduler

In this section, we discuss the circumstances under which you might install three instances of IBM Tivoli Workload Scheduler.

The first consideration is where the product is to be installed. When two instances of IBM Tivoli Workload Scheduler are running on the same system, each instance must be installed in a different directory, and one of the instances must be installed in the shared file system that moves between the two servers in the cluster. Each instance will have its own installation user.

The second consideration is how the IBM Tivoli Workload Scheduler instance is addressed. In this case, one will have the IP address that is associated to the cluster, and the other two will have the IP address of each system that is in this cluster. Each IBM Tivoli Workload Scheduler instance must have its own port number. If the version of IBM Tivoli Workload Scheduler is older than 8.2, then it will need to access the components file from both sides of the cluster to run. If the version of IBM Tivoli Workload Scheduler is 8.2 or higher, then the components file is only needed to be sourced when upgrading IBM Tivoli Workload Scheduler.

Why to install three instances of IBM Tivoli Workload Scheduler

In this configuration, only one instance is installed in a high availability mode; the other two are installed on the local disks, as shown in Figure 2-8 on page 56. Why would you install IBM Tivoli Workload Scheduler in this configuration? Because an application that cannot be configured in a cluster is running on both sides of the cluster, so you need to install an IBM Tivoli Workload Scheduler workstation with each copy of the application. At the same time, you may wish to install the Master Domain Manager in the cluster, or a third, cluster-aware application may move between the nodes.

When to install three instances of IBM Tivoli Workload Scheduler

You would install one instance of IBM Tivoli Workload Scheduler in this cluster in order to provide high availability to an application or to the IBM Tivoli Workload Scheduler network by installing the Master Domain Manager or Domain Manager in this cluster, and one instance of IBM Tivoli Workload Scheduler on each local disk. This second instance may be scheduling batch work for the systems in the cluster, or for an application that only runs on the local disk subsystem.

Where to install three instances of IBM Tivoli Workload Scheduler

Install one instance of IBM Tivoli Workload Scheduler on the shared disk system that moves between the two sides of the cluster, and one instance of IBM Tivoli Workload Scheduler on the local disk allocated to each side of the cluster, as shown in Figure 2-8.

What to install

Depending on why you are installing one instance of IBM Tivoli Workload Scheduler as described above, you may install a Master Domain Manager, Domain Manager or Fault Tolerant Agent in the cluster. You would install a Fault Tolerant Agent on each side of the cluster.

Figure 2-8 Three-instance configuration

Multiple instances of IBM Tivoli Workload Scheduler

In this section, we discuss the circumstances under which you might install multiple instances of IBM Tivoli Workload Scheduler.

The first consideration is where the product is to be installed, because each IBM Tivoli Workload Scheduler instance must have a different installation directory. These installation directories must be in the shared file system that moves between the two servers in the cluster. Each instance will also have its own installation user.

The second consideration is how the IBM Tivoli Workload Scheduler instance is addressed. In this case, one instance will have the IP address that is associated with the cluster, and the other two will have the IP address of each system that is in this cluster. Each IBM Tivoli Workload Scheduler instance must have its own port number. If the version of IBM Tivoli Workload Scheduler is older than 8.2, then it will need to access the components file from both sides of the cluster to run. If the version of IBM Tivoli Workload Scheduler is 8.2 or higher, then the components file is only needed when upgrading IBM Tivoli Workload Scheduler.

Why to install multiple instances of IBM Tivoli Workload Scheduler

In this configuration there may be many applications running in this cluster, and each application would need to have its own workstation associated with it. You might also want to install the Master Domain Manager and even the Domain Manager in the cluster to make the entire IBM Tivoli Workload Scheduler network more fault tolerant against failures.

When to install multiple instances of IBM Tivoli Workload Scheduler

You would install multiple instances of IBM Tivoli Workload Scheduler in this cluster to give high availability to an application and to the IBM Tivoli Workload Scheduler network by installing the Master Domain Manager or Domain Manager in this cluster.

Where to install multiple instances of IBM Tivoli Workload Scheduler

All instances of IBM Tivoli Workload Scheduler would be installed on the shared disk system that moves between the two sides of the cluster. Each instance would need its own installation directory, its own installation user, and its own port address.

What to install

Depending on why you are installing multiple instances of IBM Tivoli Workload Scheduler, you may install a combination of a Master Domain Manager, Domain Manager or Fault Tolerant Agent in the cluster.

2.3.2 Software availability within IBM Tivoli Workload Scheduler

In this section we discuss software options currently available with IBM Tivoli Workload Scheduler that will give you a level of high availability if you do not have, or do not want to use, a hardware cluster.

Backup Master Domain Manager

A Backup Master Domain Manager (BMDM) and the Master Domain Manager (MDM) are critical parts of a highly available IBM Tivoli Workload Scheduler environment. If the production Master Domain Manager fails and cannot be immediately recovered, a backup Master Domain Manager will allow production to continue.

The Backup Master Domain Manager must be identified when defining your IBM Tivoli Workload Scheduler network architecture; it must be a member of the same domain as the Master Domain Manager, and the workstation definition must have the Full Status and Resolve Dependencies modes selected.

It may be necessary to transfer files between the Master Domain Manager and its standby. For this reason, the computers must have compatible operating systems. Do not combine UNIX with Windows NT® computers. Also, do not combine little-endian and big-endian computers.

When a Backup Master Domain Manager is correctly configured, the Master Domain Manager will send any changes and updates to the production file to the BMDM. However, any changes or updates that are made to the database are not automatically sent to the BMDM. In order to keep the BMDM and MDM databases synchronized, you must manually copy the TWShome\mozart and TWShome\..\unison\network directories on a daily basis, following start-of-day processing (the unison directory applies only to versions older than 8.2). Any changes to the security file must be replicated to the BMDM, and configuration files such as the localopts and globalopts files must also be replicated to the BMDM.
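As a sketch only, this daily synchronization can be scripted. The following example assumes IBM Tivoli Workload Scheduler Version 8.2 installed in /usr/maestro on both machines, ssh connectivity as the maestro user, and a backup master with the invented host name bkpmdm; adapt the paths, and add the unison/network directory for older versions:

   #!/bin/ksh
   # Illustrative sketch: copy the scheduling database directory from the
   # MDM to the BMDM after start-of-day processing. Security and
   # configuration files would be replicated in the same way.
   BMDM=bkpmdm
   TWSHOME=/usr/maestro

   tar -cf /tmp/tws_db.tar -C $TWSHOME mozart &&
   scp /tmp/tws_db.tar maestro@$BMDM:/tmp/tws_db.tar &&
   ssh maestro@$BMDM "tar -xf /tmp/tws_db.tar -C $TWSHOME"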

The main advantages over a hardware HA solution are that this capability already exists in the IBM Tivoli Workload Scheduler product, and that the basic configuration, where the BMDM takes over the IBM Tivoli Workload Scheduler network for a short-term loss of the MDM, is fairly easy to set up. Also, no extra hardware or software is needed to configure this solution.

The main disadvantages are that the IBM Tivoli Workload Scheduler database is not automatically synchronized and it is the responsibility of the system administrator to keep both databases in sync. Also, for a long-term loss of the MDM, the BMDM will have to generate a new production day plan and for this an operator will have to submit a Jnextday job on the BMDM. Finally, any jobs or job streams that ran on the MDM will not run on the BMDM, because the workstation names are different.

Backup Domain Manager

The management of a domain can be assumed by any Fault Tolerant Agent that is a member of the same domain. The workstation definition has to have the Full Status and Resolve Dependencies modes selected. When the management of a domain is passed to another workstation, all workstations that are members of the domain are informed of the switch, and the old Domain Manager is converted to a Fault Tolerant Agent in the domain.

The identification of domain managers is carried forward to each new day’s symphony file, so that switches remain in effect until a subsequent switchmgr command is executed.

Once a new workstation has taken over the responsibility of the domain, it has the ability to resolve any dependencies for the domain it is managing, and also the ability to process any messages to or from the network.

Switch manager command

The switch manager command is used to transfer the management of an IBM Tivoli Workload Scheduler domain to another workstation. This command can be used on the Master Domain Manager or on a Domain Manager.

To use the switchmgr command, the workstation that you would like to have take over the management of a domain must be a member of the same domain. It must also have Resolve Dependencies and Full Status set to work correctly. The syntax of the command is switchmgr domain;newmgr.
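For example, to have a Fault Tolerant Agent named TWS_BDM take over the domain MASTERDM, you could issue the following from the command line (both names are invented for illustration):

   conman "switchmgr MASTERDM;TWS_BDM"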

The command stops a specified workstation and restarts it as the Domain Manager. All domain member workstations are informed of the switch, and the old Domain Manager is converted to a Fault Tolerant Agent in the domain.

The identification of Domain Managers is carried forward to each new day’s symphony file, so that switches remain in effect until a subsequent switchmgr command is executed. However, if new day processing (the Jnextday job) is performed on the old domain manager, the domain will act as though another switchmgr command had been executed and the old Domain Manager will automatically resume domain management responsibilities.

2.3.3 Load balancing software

Using load balancing software is another way of bringing a form of high availability to IBM Tivoli Workload Scheduler jobs. One way to do this is by integrating IBM Tivoli Workload Scheduler with IBM LoadLeveler®, because IBM LoadLeveler will detect that a system is unavailable and reschedule the work on one that is available.

IBM LoadLeveler is a job management system that allows users to optimize job execution and performance by matching job processing needs with available resources. IBM LoadLeveler schedules jobs and provides functions for submitting and processing jobs quickly and efficiently in a dynamic environment. This distributed environment consists of a pool of machines or servers, often referred to as a LoadLeveler cluster.

Jobs are allocated to machines in the cluster by the IBM LoadLeveler scheduler. The allocation of the jobs depends on the availability of resources within the cluster and on rules defined by the IBM LoadLeveler administrator. A user submits a job to IBM LoadLeveler and the scheduler attempts to find resources within the cluster to satisfy the requirements of the job.


At the same time, the objective of IBM LoadLeveler is to maximize the efficiency of the cluster. It attempts to do this by maximizing the utilization of resources, while at the same time minimizing the job turnaround time experienced by users.
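For readers unfamiliar with IBM LoadLeveler, work is normally described in a job command file and submitted with the llsubmit command. The following fragment is a generic, minimal illustration only (the file name and paths are invented, and it is not specific to the IBM Tivoli Workload Scheduler integration):

   # nightly_report.cmd - minimal LoadLeveler job command file (illustrative)
   # @ job_name   = nightly_report
   # @ executable = /opt/app/scripts/nightly_report.sh
   # @ output     = /tmp/nightly_report.out
   # @ error      = /tmp/nightly_report.err
   # @ queue

The job would then be submitted with llsubmit nightly_report.cmd, and LoadLeveler chooses an available machine in its cluster on which to run it.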

2.3.4 Job recovery

In this section we explain how IBM Tivoli Workload Scheduler treats a job that has failed; this is covered in three scenarios.

A job abends in a normal job run

Prior to IBM Tivoli Workload Scheduler Version 8.2, if a job finished with a return code other than 0, the job was treated as ABENDED. If this was the correct return code for this job, the IBM Tivoli Workload Scheduler administrator would run a wrapper script around the job or change the .jobmanrc to change the job status to a successful state (SUCC).

In IBM Tivoli Workload Scheduler Version 8.2, however, a new field in the job definition allows you to set a boolean expression for the return code of the job. This new field is called rccondsucc. In this field you are allowed to type in a boolean expression which determines the return code (RC) required to consider a job successful. For example, you can define a successful job as a job that terminates with a return code equal to 3 or with a return code greater than or equal to 5, and less than 10, as follows:

rccondsucc "RC=3 OR (RC>=5 AND RC<10)"

Job process is terminated

A job can be terminated in a number of ways, and in this section we look at some of the more common ones. Keep in mind, however, that it is not the responsibility of IBM Tivoli Workload Scheduler to roll back any actions that a job may have performed during the time that it was executing. It is the responsibility of the person creating the script or command to allow for a rollback or recovery action.

When a job abends, IBM Tivoli Workload Scheduler can rerun the abended job, stop, or continue with the next job. You can also generate a prompt that needs to be replied to, or launch a recovery job. The full combination of the job flow is shown in Figure 2-9 on page 61.
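These recovery actions are declared in the job definition itself. As a hedged sketch (all names are invented and the exact syntax may vary by version), a job that issues a recovery prompt and reruns after a recovery job could be defined along these lines:

   TIVAIX1#LOAD_DW
    SCRIPTNAME "/opt/app/scripts/load_dw.sh"
    STREAMLOGON maestro
    RECOVERY RERUN AFTER TIVAIX1#CLEANUP_DW ABENDPROMPT "Reply YES to rerun the warehouse load"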


Figure 2-9 IBM Tivoli Workload Scheduler job flow

Here are the details of this job flow:

• When a job is killed through the conman CLI or Job Scheduling Console, the job will be terminated by terminating the parent process. The termination of any child processes that the parent has started will be the responsibility of the operating system and not IBM Tivoli Workload Scheduler.

After the job has been terminated, it will be displayed in the current plan in the Abend state. Any jobs or job streams that are dependent on a killed job are not released. Killed jobs can be rerun.

• When the process ID is “killed”, either in UNIX or Microsoft operating systems, the job will be terminated by terminating the parent process. The termination of any child processes that the parent has started will be the responsibility of the operating system and not IBM Tivoli Workload Scheduler.

After the job has been terminated, it will be displayed in the current plan in the Abend state. Any jobs or job streams that are dependent on a killed job are not released. Killed jobs can be rerun.

• When the system crashes or is powered off, the job is killed by the crash or by the system being powered down. In that case, when the system is rebooted and IBM Tivoli Workload Scheduler is restarted, IBM Tivoli Workload Scheduler will check to see if there are any jobs left in the jobtable file:

– If jobs are left in the job table, IBM Tivoli Workload Scheduler will read the process ID of each job and then check whether that process ID is still running.

– If the process is no longer running, it will mark the job as Abend and the normal recovery action will run.


Chapter 3. High availability cluster implementation

In this chapter, we provide step-by-step installation procedures to help you plan and implement a high availability cluster using High Availability Cluster Multiprocessing for AIX (HACMP) and Microsoft Cluster Service (MSCS), for a mutual takeover scenario of Tivoli Framework and Tivoli Workload Scheduler.

We cover the following procedures:

• “Our high availability cluster scenarios” on page 64

• “Implementing an HACMP cluster” on page 67

• “Implementing a Microsoft Cluster” on page 138


3.1 Our high availability cluster scenarios

With numerous cluster software packages on the market, each offering a variety of configurations, there are many ways of configuring a high availability (HA) cluster. We cannot cover all possible scenarios, so in this redbook we focus on two scenarios which we believe are applicable to many sites: a mutual takeover scenario for IBM Tivoli Workload Scheduler, and a hot standby scenario for IBM Tivoli Management Framework. We discuss these scenarios in detail in the following sections.

3.1.1 Mutual takeover for IBM Tivoli Workload Scheduler

In our scenario, we assume a customer who plans to manage jobs for two mission-critical business applications. The customer plans to have the two business applications running on separate nodes, and would like to install separate IBM Tivoli Workload Scheduler Master Domain Managers on each node to control the jobs for each application. The customer is seeking a cost-effective, high availability solution to minimize the downtime of their business application processing in case of a system component failure. Possible solutions for this customer would be the following:

• Create separate HA clusters for each node by adding two hot standby nodes and two sets of external disks.

• Create one HA cluster by adding an additional node and a set of external disks. Designate the additional node as a hot standby node for the two application servers.

• Create one HA cluster by adding a set of external disks. Each node is designated as a standby for the other node.

The first two solutions require additional machines to sit idle until a fallover occurs, while the third solution utilizes all machines in a cluster and no node is left to sit idle. Here we assume that the customer chose the third solution. This type of configuration is called a mutual takeover, as discussed in Chapter 2, “High level design and architecture” on page 31.

Note that this type of cluster configuration is allowed under the circumstance that the two business applications in question and IBM Tivoli Workload Scheduler itself have no software or hardware restrictions to run on the same physical machine. Figure 3-1 on page 65 shows a diagram of our cluster.


Figure 3-1 Overview of our HA cluster scenario

In Figure 3-1, node Node1 controls TWS1 and the application APP1. Node Node2 controls TWS2 and application APP2. TWS1 and TWS2 are installed on the shared external disk so that each instance of IBM Tivoli Workload Scheduler could fall over to another node.

We assume that system administrators would like to use the Job Scheduling Console (JSC) to manage the scheduling objects and production plans. To enable the use of JSC, Tivoli Management Framework (TMF) and the IBM Tivoli Workload Scheduler Connector must be installed.

Because each IBM Tivoli Workload Scheduler instance requires a running Tivoli Management Framework Server or a Managed Node, we need two Tivoli Management Region (TMR) servers. Keep in mind that in our scenario, when a node fails, everything installed on the external disk will fall over to another node.

Note that it is not officially supported to run two TMR servers or Managed Nodes in one node. So the possible configuration of TMF in this scenario would be to install TMR servers on the local disks of each node.


IBM Tivoli Workload Scheduler connector will also be installed on the local disks. To enable JSC access to both IBM Tivoli Workload Scheduler instances during a fallover, each IBM Tivoli Workload Scheduler Connector needs two connector instances defined: Instance1 to control TWS1, and Instance2 to control TWS2.

3.1.2 Hot standby for IBM Tivoli Management Framework

The mutual takeover scenario covers high availability for IBM Tivoli Workload Scheduler. Here, we cover a simple hot standby scenario for IBM Tivoli Management Framework (TMF). Because running multiple instances of the Tivoli Management Region server (TMR server) on one node is not supported, a possible configuration to provide high availability would be to configure a cluster with a primary node, a hot standby node and a disk subsystem.

Figure 3-2 shows a simple hot standby HA cluster with two nodes and a shared external disk. IBM Tivoli Management Framework is installed on the shared disk, and normally resides on Node1. When Node1 fails, TMF will fall over to Node2.

Figure 3-2 A hot standby cluster for a TMR server


3.2 Implementing an HACMP cluster

HACMP is clustering software provided by IBM for implementing high availability solutions on AIX platforms. In the following sections we describe the process of planning, designing, and implementing a high availability scenario using HACMP. For each implementation procedure discussed in this section, we provide examples by planning an HACMP cluster for the IBM Tivoli Workload Scheduler high availability scenario.

3.2.1 HACMP hardware considerations

As mentioned in Chapter 2, “High level design and architecture” on page 31, the ultimate goal in implementing an HA cluster is to eliminate all possible single points of failure. Keep in mind that cluster software alone does not provide high availability; appropriate hardware configuration is also required to implement a highly available cluster. This applies to HACMP as well. For general hardware considerations about an HA cluster, refer to 2.2, “Hardware configurations” on page 43.

3.2.2 HACMP software considerations

HACMP not only provides high availability solutions for hardware, but for mission-critical applications that utilize those hardware resources as well. Consider the following before you plan high availability for your applications in an HACMP cluster:

• Application behavior
• Licensing
• Dependencies
• Automation
• Robustness
• Fallback policy

For details on what you should consider for each criterion, refer to 2.1.2, “Software considerations” on page 39.

3.2.3 Planning and designing an HACMP cluster

As mentioned in Chapter 2, “High level design and architecture” on page 31, the sole purpose of implementing an HACMP cluster is to eliminate possible single points of failure in order to provide high availability for both hardware and software. Thoroughly planning the use of both hardware and software components is required prior to HACMP installation.


To plan our HACMP cluster, we followed the steps described in HACMP for AIX Version 5.1, Planning and Installation Guide, SC23-4861. Because we cannot cover all possible high availability scenarios, in this section we discuss only the planning tasks needed to run IBM Tivoli Workload Scheduler in a mutual takeover scenario. Planning tasks for a mutual takeover scenario can be extended for a hot standby scenario.

The following planning tasks are described in this section.

• Planning the cluster nodes
• Planning applications for high availability
• Planning the cluster network
• Planning the shared disk device
• Planning the shared LVM components
• Planning the resource groups
• Planning the cluster event processing

Use planning worksheets

A set of offline and online planning worksheets is provided for HACMP 5.1. For a complete and detailed description of planning an HACMP cluster using these worksheets, refer to HACMP for AIX Version 5.1, Planning and Installation Guide, SC23-4861.

By filling out these worksheets, you will be able to plan your HACMP cluster easily. Here we describe some of the offline worksheets. (Note, however, that our description is limited to the worksheets and fields that we used; fields and worksheets that were not essential to our cluster plan are omitted.)

Draw a cluster diagram

In addition to using these worksheets, it is also advisable to draw a diagram of your cluster as you plan. A cluster diagram should provide an image of where your cluster resources are located. In the following planning tasks, we show diagrams of what we planned in each task.

Planning the cluster nodes

The initial step in planning an HACMP cluster is to plan the size of your cluster. This is the phase where you define how many nodes and disk subsystems you need in order to provide high availability for your applications. If you plan high availability for one application, a cluster of two nodes and one disk subsystem may be sufficient. If you are planning high availability for two or more applications installed on several servers, you may want to add more than one node to provide high availability. You may also need more than one disk subsystem, depending on the amount of data you plan to store on external disks.


For our mutual takeover scenario, we plan a cluster with two AIX platforms sharing an SSA disk subsystem. The machine types used in the scenario are simply those available in our lab environment. When planning for a mutual takeover configuration, make sure that each node has sufficient machine power to perform its own work and the work of the other node in the event that a fallover occurs. Otherwise, you may not achieve maximum application performance during a fallover.

Figure 3-3 shows a diagram of our cluster node plan. The cluster name is cltivoli. There are two nodes in the cluster, tivaix1 and tivaix2, sharing an external disk subsystem. Each node will run one business application and one instance of IBM Tivoli Workload Scheduler to manage that application. Note that we left some blank space in the diagram for adding cluster resources to this diagram as we plan.

In this section and the following sections, we describe the procedures to plan an HACMP cluster using our scenario as an example. Some of the planning tasks may be extended to configure high availability for other applications; however, we cannot account for application-specific considerations and high availability requirements.

Figure 3-3 Cluster node plan


Planning applications for high availability

After you have planned the cluster nodes, the next step is to define where your application executables and data should be located, and how you would like HACMP to control them in the event of a fallover or fallback. For each business application or any other software packages that you plan to make highly available, create an application definition and an application server.

Application definition means giving a user-defined name to your application, and then defining the location of your application and how it should be handled in the event of fallover. An application server is a cluster resource that associates the application and the names of specially written scripts to start and stop the application. Defining an application server enables HACMP to resume application processing on the takeover node when a fallover occurs.

When planning for applications, the following HACMP worksheets may help to record any required information.

• Application Worksheet

• Application Server Worksheet

Completing the Application Worksheet

The Application Worksheet helps you to define which applications should be controlled by HACMP, and how they should be controlled. After completing this worksheet, you should have at least the following information defined:

Application Name Assign a name for each application you plan to put under HACMP control. This is a user-defined name associated with an application.

Location of Key Application Files For each application, define the following information for the executables and data. Make sure you enter the full path when specifying the path of the application files.

- Directory/path where the files reside
- Location (internal disk/external disk)
- Sharing (shared/not shared)

Cluster Name Name of the cluster where the application resides.

Node Relationship Specify the takeover relationship of the nodes in the cluster (choose from cascading, concurrent, or rotating). For a description of each takeover relationship, refer to HACMP for AIX Version 5.1, Concepts and Facilities Guide, SC23-4864.

Fallover Strategy Define the fallover strategy for the application. Specify which node would be the primary and which node will be the takeover.

- The primary node will control the application in normal production.

- The takeover node will control the application in the event of a fallover deriving from a primary node failure or a component failure on the primary node.

Start Commands/Procedures Specify the commands or procedures for starting the application. This is the command or procedure you will write in your application start script. HACMP invokes the application start script in the event of a cluster start or a fallover.

Verification Commands Specify the commands to verify that your application is up and running.

Stop Commands/Procedures Specify the commands or procedures for stopping the application. This is the command or procedure you will write in your application stop script. HACMP will invoke the application stop script in the event of a cluster shutdown or a fallover.

Verification Commands Specify the commands to verify that your application has stopped.

Table 3-1 on page 72 and Table 3-2 on page 73 show examples of how we planned IBM Tivoli Workload Scheduler for high availability. Because we plan to have two instances of IBM Tivoli Workload Scheduler running in one cluster, we defined two applications, TWS1 and TWS2. In normal production, TWS1 resides on node tivaix1, while TWS2 resides on node tivaix2.

Note: Start, stop, and verification commands specified in this worksheet should not require operator intervention; otherwise, cluster startup, shutdown, fallover, and fallback may halt.


Notice that we placed the IBM Tivoli Workload Scheduler file systems on the external shared disk, because both nodes must be able to access the two IBM Tivoli Workload Scheduler instances for fallover. The two instances of IBM Tivoli Workload Scheduler should be located in different file systems to allow both instances of IBM Tivoli Workload Scheduler to run on the same node. Node relationship is set to cascading because each IBM Tivoli Workload Scheduler instance should return to its primary node when it rejoins the cluster.

Note: If you are installing an IBM Tivoli Workload Scheduler version older than 8.2, you cannot use /usr/maestro and /usr/maestro2 as the installation directories. Why? Because in such a case, both installations would use the same Unison directory, and the Unison directory should be unique for each installation.

Therefore, if installing a version older than 8.2, we suggest using /usr/maestro1/TWS and /usr/maestro2/TWS as the installation directories, which will make the Unison directory unique.

For Version 8.2, this is not important, since the Unison directory is not used in this version.

Table 3-1   Application definition for IBM Tivoli Workload Scheduler 1 (TWS1)

Items to define                       Value

Application Name                      TWS1

Location of Key Application Files     1. Directory/path where the files reside: /usr/maestro
                                      2. Location (internal disk/external disk): external disk
                                      3. Sharing (shared/not shared): shared

Cluster Name                          cltivoli

Node Relationship                     cascading

Fallover Strategy                     tivaix1: primary
                                      tivaix2: takeover


Start Commands/Procedures             1. Run conman start to start the IBM Tivoli Workload Scheduler processes as the maestro user
                                      2. Run conman link @;noask to link all FTAs

Verification Commands                 1. Run ps -ef | grep -v grep | grep '/usr/maestro'
                                      2. Check that netman, mailman, batchman and jobman are running

Stop Commands/Procedures              1. Run conman unlink @;noask to unlink all FTAs as the maestro user
                                      2. Run conman shut to stop the IBM Tivoli Workload Scheduler processes as the maestro user

Verification Commands                 1. Run ps -ef | grep -v grep | grep '/usr/maestro'
                                      2. Check that netman, mailman, batchman and jobman are not running
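The start and stop commands in this worksheet are what the HACMP application server scripts (for example, start_tws1.sh and stop_tws1.sh, referenced later in the Application Server Worksheet) end up wrapping. The following is only a sketch; the installation path, user name, and error handling are assumptions that must be adapted to your environment:

   #!/bin/ksh
   # start_tws1.sh - illustrative HACMP application server start script for TWS1
   TWSHOME=/usr/maestro

   su - maestro -c "$TWSHOME/bin/conman start"
   su - maestro -c "$TWSHOME/bin/conman 'link @;noask'"

   # Report success to HACMP only if the TWS processes are running
   ps -ef | grep -v grep | grep "$TWSHOME" > /dev/null 2>&1
   exit $?

A matching stop_tws1.sh would run conman 'unlink @;noask' and then conman shut as the maestro user, and verify that netman, mailman, batchman and jobman are no longer running.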

Table 3-2   Application definition for IBM Tivoli Workload Scheduler 2 (TWS2)

Items to define                       Value

Application Name                      TWS2

Location of Key Application Files     1. Directory/path where the files reside: /usr/maestro2
                                      2. Location (internal disk/external disk): external disk
                                      3. Sharing (shared/not shared): shared

Cluster Name                          cltivoli

Node Relationship                     cascading

Fallover Strategy                     tivaix2: primary
                                      tivaix1: takeover


Start Commands/Procedures             1. Run conman start to start the IBM Tivoli Workload Scheduler processes as the maestro user
                                      2. Run conman link @;noask to link all FTAs

Verification Commands                 1. Run ps -ef | grep -v grep | grep '/usr/maestro2'
                                      2. Check that netman, mailman, batchman and jobman are running

Stop Commands/Procedures              1. Run conman unlink @;noask to unlink all FTAs as the maestro user
                                      2. Run conman shut to stop the IBM Tivoli Workload Scheduler processes as the maestro user

Verification Commands                 1. Run ps -ef | grep -v grep | grep '/usr/maestro2'
                                      2. Check that netman, mailman, batchman and jobman are not running

Completing the Application Server Worksheet

This worksheet helps you to plan the application server cluster resource. Define an application server resource for each application that you defined in the Application Worksheet. If you plan to have more than one application server in a cluster, then add a server name and define the corresponding start/stop script for each application server.

Cluster Name Enter the name of the cluster. This must be the same name you specified for Cluster Name in the Application Worksheet.

Server Name For each application in the cluster, specify an application server name.

Start Script Specify the full path of the application start script for the application server.

Stop Script Specify the full path of the application stop script for the application server.

We defined two application servers, tws_svr1 and tws_svr2, in our cluster; tws_svr1 is for controlling application TWS1, and tws_svr2 is for controlling application TWS2. Table 3-3 shows the values we defined for tws_svr1.

74 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

Page 89: High availability scenarios with ibm tivoli workload scheduler and ibm tivoli framework sg246632

Table 3-3 Application server definition for tws_svr1

Table 3-4 shows the values we defined for tws_svr2.

Table 3-4 Application server definition for tws_svr2

After planning your application, add the information about your applications into your diagram. Figure 3-4 shows an example of our cluster diagram populated with our application plan. We omitted specifics such as start scripts and stop scripts, because the purpose of the diagram is to show the names and locations of cluster resources.

Items to define

Value

Cluster Name cltivoli

Server Name tws_svr1

Start Script /usr/es/sbin/cluster/scripts/start_tws1.sh

Stop Script /usr/es/sbin/cluster/scripts/stop_tws1.sh

Items to define

Value

Cluster Name cltivoli

Server Name tws_svr2

Start Script /usr/es/sbin/cluster/scripts/start_tws2.sh

Stop Script /usr/es/sbin/cluster/scripts/stop_tws2.sh


Figure 3-4 Cluster diagram with applications added
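The start and stop scripts referenced in Table 3-3 and Table 3-4 only have to drive the commands listed in Table 3-1 and Table 3-2 as the appropriate user. The following is a minimal sketch of what such a pair of scripts could look like for TWS1; the user name, the reliance on the maestro user's environment to find conman, and the lack of error handling are assumptions, and this is not the actual script used in our scenario:

  #!/bin/ksh
  # start_tws1.sh - sketch of an HACMP application server start script for TWS1
  TWSUSER=maestro                                  # user that owns the /usr/maestro instance (assumed)
  su - $TWSUSER -c "conman start"                  # start the scheduler processes
  su - $TWSUSER -c "conman 'link @;noask'"         # link all FTAs
  exit 0

  #!/bin/ksh
  # stop_tws1.sh - sketch of the matching stop script
  TWSUSER=maestro
  su - $TWSUSER -c "conman 'unlink @;noask'"       # unlink all FTAs
  su - $TWSUSER -c "conman shut"                   # stop the scheduler processes
  exit 0

A start_tws2.sh/stop_tws2.sh pair for the /usr/maestro2 instance would differ only in the user and instance it addresses.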

Planning the cluster network
The cluster network must be planned so that network components (networks, network interface cards, TCP/IP subsystems) are eliminated as single points of failure. When planning the cluster network, complete the following tasks:

- Design the cluster network topology.

  Network topology is the combination of IP and non-IP (point-to-point) networks that connect the cluster nodes, and the number of connections each node has to each network.

- Determine whether service IP labels will be made highly available with IP Address Takeover (IPAT) via IP aliases or IPAT via IP replacement. Also determine whether IPAT will be done with or without hardware address takeover.

  Service IP labels are relocatable virtual IP labels that HACMP uses to ensure client connectivity in the event of a fallover. Service IP labels are not bound to a particular network adapter; they can be moved from one adapter to another, or from one node to another.



We used the TCP/IP Network Worksheet, TCP/IP Network Interface Worksheet, and Point-to-point Networks Worksheet to plan our cluster network.

Completing the TCP/IP Network Worksheet
Enter information about all elements of your TCP/IP network that you plan to have in your cluster. The following items should be identified when you complete this worksheet.

Cluster Name The name of your cluster.

Then, for each network, specify the following.

Network Name Assign a name for the network.

Network Type Enter the type of the network (Ethernet, Token Ring, and so on.)

Netmask Enter the subnet mask for the network.

Node Names Enter the names of the nodes you plan to include in the network.

IPAT via IP Aliases Choose whether to enable IP Address Takeover (IPAT) over IP Aliases or not. If you do not enable IPAT over IP Aliases, it will be IPAT via IP Replacement. For descriptions of the two types of IPAT, refer to HACMP for AIX Version 5.1, Planning and Installation Guide, SC23-4861.

IP Address Offset for Heartbeating over IP Aliases

Complete this field if you plan heartbeating over IP Aliases. For a detailed description, refer to HACMP for AIX Version 5.1, Planning and Installation Guide, SC23-4861.

Table 3-5 lists the values we specified in the worksheet. We defined one TCP/IP network called net_ether_01.

Note: A network in HACMP is a group of network adapters that share one or more service IP labels. Include all physical and logical networks that act as backups for one another in one HACMP network.

For example, if two nodes are connected to two redundant physical networks, define one HACMP network that includes the two physical networks.


Table 3-5 TCP/IP Network definition

Cluster Name                                          cltivoli
Network Name                                          net_ether_01
Network Type                                          Ethernet
Netmask                                               255.255.255.0
Node Names                                            tivaix1, tivaix2
IPAT via IP Aliases                                   enable
IP Address Offset for Heartbeating over IP Aliases    172.16.100.1

Completing the TCP/IP Network Interface Worksheet
After you have planned your TCP/IP network definition, plan your network interfaces. Associate your IP labels and IP addresses with network interfaces. Complete this worksheet for each node you plan to have in your cluster; the following items should be defined.

Node Name Enter the node name.

IP Label Assign an IP label for each IP address you plan to have for the node.

Network Interface Assign a physical network interface (for example, en0 or en1) to the IP label.

Network Name Assign an HACMP network name. This network name must be one of the networks you defined in the TCP/IP Network Worksheet.

Interface Function Specify the function of the interface: service, boot, or persistent.

IP Address Associate an IP address with the IP label.

Note: In HACMP, there are several kinds of IP labels you can define. A boot IP label is a label that is bound to one particular network adapter. This label is used when the system starts.

A Service IP label is a label that is associated with a resource group and is able to move from one adapter to another on the same node, or from one node to another. It floats among the physical TCP/IP network interfaces to provide IP address consistency to an application serviced by HACMP. This IP label exists only when the cluster is active.

A Persistent IP label is a label bound to a particular node. This IP label also floats among two or more adapters in one node, to provide constant access to a node, regardless of the cluster state.


Netmask Enter the netmask.

Hardware Address Specify hardware address of the network adapter if you plan IPAT with hardware address takeover.

Table 3-6 and Table 3-7 show the values we entered in our worksheet. We omitted the hardware address because we do not plan to have hardware address takeover.

Table 3-6 TCP/IP network interface plan for tivaix1

Node Name: tivaix1

IP Label       Network Interface   Network Name    Interface Function   IP Address        Netmask
tivaix1_svc    -                   net_ether_01    service              9.3.4.3           255.255.254.0
tivaix1_bt1    en0                 net_ether_01    boot                 192.168.100.101   255.255.254.0
tivaix1_bt2    en1                 net_ether_01    boot                 10.1.1.101        255.255.254.0
tivaix1        -                   net_ether_01    persistent           9.3.4.194         255.255.254.0

Table 3-7 TCP/IP network interface plan for tivaix2

Node Name: tivaix2

IP Label       Network Interface   Network Name    Interface Function   IP Address        Netmask
tivaix2_svc    -                   net_ether_01    service              9.3.4.4           255.255.254.0
tivaix2_bt1    en0                 net_ether_01    boot                 192.168.100.102   255.255.254.0
tivaix2_bt2    en1                 net_ether_01    boot                 10.1.1.102        255.255.254.0
tivaix2        -                   net_ether_01    persistent           9.3.4.195         255.255.254.0

Completing the Point-to-Point Networks Worksheet
You may need a non-TCP/IP point-to-point network in the event of a TCP/IP subsystem failure. The Point-to-Point Networks Worksheet helps you to plan non-TCP/IP point-to-point networks. When you complete this worksheet, you should have the following items defined.

Cluster name Enter the name of your cluster.

Then, for each of your point-to-point networks, enter the values for the following items:

Network Name Enter the name of your point-to-point network.


Network Type Enter the type of your network (disk heartbeat, Target Mode SCSI, Target Mode SSA, and so on). Refer to HACMP for AIX Version 5.1, Planning and Installation Guide, SC23-4861, for more information.

Node Names Enter the name of the nodes you plan to connect with the network.

Hdisk Enter the name of the physical disk (required only for disk heartbeat networks).

Table 3-8 lists the definition for the point-to-point network we planned in our scenario. We omitted the value for Hdisk because we did not plan disk heartbeats.

Table 3-8 Point-to-point network definition

Cluster Name    cltivoli

Network Name    Network Type       Node Names
net_tmssa_01    Target Mode SSA    tivaix1, tivaix2

After you have planned your network, add your network plans to the diagram. Figure 3-5 on page 81 shows our cluster diagram with our cluster network plans added. There is a TCP/IP network definition, net_ether_01. For a point-to-point network, we added net_tmssa_01. For each node, we have two boot IP labels, a service IP label, and a persistent IP label.


Figure 3-5 Cluster diagram with network topology added
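At this stage it can be useful to make the planned IP labels resolvable on both nodes and to confirm that the point-to-point hardware is visible. The following is only a sketch, built from the addresses in Table 3-6 and Table 3-7; the exact /etc/hosts layout and the device names reported on your systems may differ:

  # /etc/hosts entries for the planned IP labels (same on both nodes)
  192.168.100.101  tivaix1_bt1
  10.1.1.101       tivaix1_bt2
  9.3.4.3          tivaix1_svc
  9.3.4.194        tivaix1
  192.168.100.102  tivaix2_bt1
  10.1.1.102       tivaix2_bt2
  9.3.4.4          tivaix2_svc
  9.3.4.195        tivaix2

  # Quick checks on each node
  host tivaix1_svc              # verify the label resolves
  netstat -in                   # list configured interfaces and addresses
  lsdev -C | grep tmssa         # look for Target Mode SSA devices used by net_tmssa_01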

Planning the shared disk device
Shared disk is an essential part of an HACMP cluster. It is usually one or more external disks shared between two or more cluster nodes. In a non-concurrent configuration, only one node at a time has control of the disks. If a node fails in a cluster, the node with the next highest priority in the cluster acquires ownership of the disks and restarts applications to restore mission-critical services. This ensures constant access to application executables and data stored on those disks. When you complete this task, at a minimum the following information should be defined:

- Type of shared disk technology
- The number of disks required
- The number of disk adapters



HACMP supports several disk technologies, such as SCSI and SSA. For a complete list of supported disk devices, consult your service provider. We used an SSA disk subsystem for our scenario, because that was the environment available in our lab.

Because we planned to have two instances of IBM Tivoli Workload Scheduler installed in separate volume groups, we needed at least two physical disks. Mirroring SSA disks is recommended, as mirroring an SSA disk enables the replacement of a failed disk drive without powering off the entire system. Mirroring requires an additional disk for each physical disk, so the minimum number of disks is four.

To avoid having disk adapters become single points of failure, redundant disk adapters are recommended. In our scenario, we had one disk adapter for each node, due to the limitations of our lab environment.

Figure 3-6 on page 83 shows a cluster diagram with at least four available disks in the SSA subsystem. While more than one disk adapter per node is recommended, we only have one disk adapter on each node due to the limitations of our environment.


Figure 3-6 Cluster diagram with disks added
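As a quick check that the disk hardware matches the plan, the SSA adapters and the candidate disks can be listed on each node. This is only a sketch; device names and the amount of output vary by system:

  lsdev -Cc adapter | grep -i ssa    # confirm the SSA adapter(s) are Available on this node
  lspv                               # list physical volumes; the shared disks appear on both nodes
                                     # (match them by PVID, not by hdisk name)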

Planning the shared LVM components
AIX uses the Logical Volume Manager (LVM) to manage disks. LVM components (physical volumes, volume groups, logical volumes, file systems) map data between physical and logical storage. For more information on AIX LVM, refer to the AIX System Management Guide.

To share and control data in an HACMP cluster, you need to define LVM components. When planning for LVM components, we used the Shared Volume Group/Filesystem Worksheet.



Completing the Shared Volume Group/Filesystem Worksheet
For each field in the worksheet, you should have at least the following information defined. This worksheet should be completed for each shared volume group you plan to have in your cluster.

Node Names Record the node name of each node in the cluster.

Shared Volume Group Name Specify a name for the volume group shared by the nodes in the cluster.

Major Number Record the planned major number for the volume group. This field can be left blank to use the system default if you do not plan to have an NFS-exported file system.

When configuring the shared volume group, take note of the major number. You may need it when importing the volume group on peer nodes.

Log Logical Volume Name Specify a name for the log logical volume (jfslog). The name of the jfslog must be unique in the cluster. (Do not use the system default name, because a log logical name on another node may be assigned the identical name.) When creating jfslog, make sure you rename it to the name defined in this worksheet.

Physical Volumes For each node, record the names of physical volumes you plan to include in the volume group. Physical volume names may differ by node, but PVIDs (16-digit IDs for physical volumes) for the shared physical volume must be the same on all nodes. To check the PVID, use the lspv command.

Then, for each logical volume you plan to include in the volume group, fill out the following information:

Logical Volume Name Assign a name for the logical volume.

Number of Copies of Logical Partition

Specify the number of copies of the logical volume. This number is needed for mirroring the logical volume. If you plan mirroring, the number of copies must be 2 or 3.


Filesystem Mount Point Assign a mount point for the logical volume name.

Size Specify the size of the file system in 512-byte blocks.

Table 3-9 and Table 3-10 show the definition of volume groups planned for our scenario. Because we plan to have a shared volume group for each instance of IBM Tivoli Workload Scheduler, we defined volume groups tiv_vg1 and tiv_vg2. Then, we defined one logical volume in each of the volume groups to host a file system. We assigned major numbers instead of using the system default, but this is not mandatory when you are not using NFS-exported file systems.

Table 3-9 Definitions for shared volume group/file system (tiv_vg1)

Items to define                Value
Node Names                     tivaix1, tivaix2
Shared Volume Group Name       tiv_vg1
Major Number                   45
Log Logical Volume Name        lvtws1_log
Physical Volumes on tivaix1    hdisk6
Physical Volumes on tivaix2    hdisk7
Logical Volume Name            lvtws1
Number of Copies               2
Filesystem Mount Point         /usr/maestro
Size                           1048576

Table 3-10 Definitions for shared volume group/file system (tiv_vg2)

Items to define                Value
Node Names                     tivaix1, tivaix2
Shared Volume Group Name       tiv_vg2
Major Number                   46
Log Logical Volume Name        lvtws2_log
Physical Volumes on tivaix1    hdisk7
Physical Volumes on tivaix2    hdisk20
Logical Volume Name            lvtws2
Number of Copies               2
Filesystem Mount Point         /usr/maestro2
Size                           1048576


Figure 3-7 shows the cluster diagram with shared LVM components added.

Figure 3-7 Cluster diagram with shared LVM added


Planning the resource groups
A resource group refers to a set of resources that moves from one node to another in the event of an HACMP fallover or fallback. A resource group usually consists of volume groups and a service IP address. For this task, we used the Resource Group Worksheet. One worksheet must be completed for each resource group that you plan. The following items should be defined when you complete the worksheet.

Cluster Name Specify the name of the cluster where the resource group resides. This should be the name that you defined when planning the cluster nodes.

Resource Group Name Assign a name for the resource group you are planning.

Management Policy Choose the management policy of the resource group (Cascading, Rotating, Concurrent or Custom). For details on management policy, refer to HACMP for AIX Version 5.1, Concepts and Facilities Guide, SC23-4864.

Participating Nodes/Default Node Priority
Specify the names of the nodes that may acquire the resource group. When specifying the nodes, make sure they are listed in order of priority (nodes with higher priority should be listed first).

Service IP Label Specify the service IP label for IP Address Takeover (IPAT). This IP label is associated to the resource group, and it is transferred to another adapter or a node in the event of a resource group fallover.

Volume Groups Specify the name of the volume group(s) to include in the resource group.

Filesystems Specify the name of the file systems to include in the resource group.


Filesystems Consistency Check Specify fsck or logredo. This is the method to check consistency of the file system.

Filesystem Recovery Method Specify parallel or sequential. This is the recovery method for the file systems.

Automatically Import Volume Groups Set it to true if you wish to have the volume group imported automatically on any cluster node in the resource chain.

Inactive Takeover Set it to true or false. If you want the resource groups acquired only by the primary node, set this attribute to false.

Cascading Without Fallback Activated Set it to true or false. If you set this to true, then a resource group that has failed over to another node will not fall back automatically in the event that its primary node rejoins the cluster. This option is useful if you do not want HACMP to move resource groups during application processing.

Disk Fencing Activated Set it to true or false.

File systems Mounted before IP Configured

Set it to true or false.

Table 3-11 on page 89 and Table 3-12 on page 89 show how we planned our resource groups. We defined one resource group for each of the two instances of IBM Tivoli Workload Scheduler, rg1 and rg2.

Notice that we set Inactive Takeover Activated to false, because we want the resource group to always be acquired by the node that has the highest priority in the resource chain.

Note: There is no need to specify file system names if you have specified a name of a volume group, because all the file systems in the specified volume group will be mounted by default. In the worksheet, leave the file system field blank unless you need to include individual file systems.


We set Cascading Without Fallback Activated to true because we do not want IBM Tivoli Workload Scheduler to fall back to the original node while jobs are running.

Table 3-11 Definition for resource group rg1

Items to define                              Value
Cluster Name                                 cltivoli
Resource Group Name                          rg1
Management Policy                            cascading
Participating Nodes/Default Node Priority    tivaix1, tivaix2
Service IP Label                             tivaix1_svc
Volume Groups                                tiv_vg1
Filesystems Consistency Check                fsck
Filesystem Recovery Method                   sequential
Automatically Import Volume Groups           false
Inactive Takeover Activated                  false
Cascading Without Fallback Activated         true
Disk Fencing Activated                       false
File systems Mounted before IP Configured    false

Table 3-12 Definition for resource group rg2

Items to define                              Value
Cluster Name                                 cltivoli
Resource Group Name                          rg2
Management Policy                            cascading
Participating Nodes/Default Node Priority    tivaix2, tivaix1
Service IP Label                             tivaix2_svc
Volume Groups                                tiv_vg2
Filesystems Consistency Check                fsck
Filesystem Recovery Method                   sequential
Automatically Import Volume Groups           false
Inactive Takeover Activated                  false
Cascading Without Fallback Activated         true
Disk Fencing Activated                       false
File systems Mounted before IP Configured    false

Figure 3-8 shows the cluster diagram with resource groups added.


Figure 3-8 Cluster diagram with resource group added
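Once the cluster is configured and started (covered later in this chapter), the placement of rg1 and rg2 can be checked from either node. Assuming your HACMP 5.1 installation provides the clRGinfo utility, a quick check might look like this:

  /usr/es/sbin/cluster/utilities/clRGinfo    # shows each resource group and the node it is currently online on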

Planning the cluster event processing
A cluster event is a change of status in the cluster. For example, if a node leaves the cluster, that is a cluster event. HACMP takes action based on these events by invoking scripts related to each event. A default set of cluster events and related scripts is provided. If you want some specific action to be taken when one of these events occurs, you can define a command or script to execute before or after each event. You can also define events of your own. For details on cluster events and on customizing events to suit your needs, refer to the HACMP documentation.

In this section, we give you an example of customized cluster event processing. In our scenario, we planned our resource groups with CWOF because we do not want HACMP to fall back IBM Tivoli Workload Scheduler during job execution. However, this leaves two instances of IBM Tivoli Workload Scheduler running on one node, even after the failed node has reintegrated into the cluster. The resource group must be manually transferred to the reintegrated node, or some implementation must be done to automate this procedure.

Completing the Cluster Event Worksheet
To plan cluster event processing, you will need to define several items. The Cluster Event Worksheet helps you to plan your cluster events. Here we describe the items that we defined for our cluster events.

Cluster Name The name of the cluster.

Cluster Event Name The name of the event you would like to configure.

Post-Event Command The name of the command or script you would like to execute after the cluster event you specified in the Cluster Event Name field.

Table 3-13 shows the values we defined for each item.

Table 3-13 Definition for cluster event

Items to define       Value
Cluster Name          cltivoli
Cluster Event Name    node_up_complete
Post-Event Command    /usr/es/sbin/cluster/sh/quiesce_tws.sh
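The post-event command in Table 3-13 is a site-specific script. Purely as an illustration (this is not the quiesce_tws.sh used in our scenario, and the log path is an assumption), a post-event script for node_up_complete might look something like the following sketch, which simply records the event and leaves any resource group movement to the operator:

  #!/bin/ksh
  # Illustrative node_up_complete post-event script (not the actual quiesce_tws.sh)
  LOG=/tmp/cluster_postevent.log
  echo "$(date) node_up_complete post-event fired, arguments: $*" >> $LOG
  # With Cascading Without Fallback, the TWS resource group stays where it is;
  # an operator (or additional logic added here) later moves it back with a planned fallback.
  exit 0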

3.2.4 Installing HACMP 5.1 on AIX 5.2
This section provides step-by-step instructions for installing HACMP 5.1 on AIX 5.2. First we cover the steps to prepare the system for installing HACMP, then we go through the installation and configuration steps.

Preparation
Before you install HACMP software, complete the following tasks:

- Meet all hardware and software requirements
- Configure the disk subsystems
- Define the shared LVM components
- Configure Network Adapters


Meet all hardware and software requirements
Make sure your system meets the hardware and software requirements for HACMP software. The requirements may vary based on the hardware type and software version that you use. Refer to the release notes for requirements.

Configure the disk subsystems
Disk subsystems are an essential part of an HACMP cluster. The external disk subsystems enable physically separate nodes to share the same set of disks. Disk subsystems must be cabled and configured properly so that all nodes in a cluster are able to access the same set of disks. Configuration may differ depending on the type of disk subsystem you use. In our scenario, we used an IBM 7133 Serial Storage Architecture (SSA) Disk Subsystem Model 010.

Figure 3-9 shows how we cabled our 7133 SSA Disk Subsystem.

Figure 3-9 SSA Cabling for high availability scenario



The diagram shows a single 7133 disk subsystem containing eight disk drives connected between two nodes in a cluster. Each node has one SSA Four Port Adapter. The disk drives in the 7133 are cabled to the two machines in two loops. Notice that there is a loop that connects Disk Group1 and the two nodes, and another loop that connects Disk Group2 and the two nodes. Each loop is connected to a different port pair on the SSA Four Port Adapters, which enables the two nodes to share the same set of disks.

Once again, keep in mind that this is only an example scenario of a 7133 disk subsystem configuration. Configuration may vary depending on the hardware you use. Consult your system administrator for precise instruction on configuring your external disk device.

Define the shared LVM components
Prior to installing HACMP, shared LVM components such as volume groups and file systems must be defined. In this section, we provide a step-by-step example of the following tasks:

- Defining volume groups
- Defining file systems
- Renaming logical volumes
- Importing volume groups
- Testing volume group migrations

Defining volume groups

1. Log in as root user on tivaix1.

2. Open smitty. The following command takes you to the Volume Groups menu.

# smitty vg

a. In the Volume Groups menu, select Add a Volume Group as seen in Figure 3-10 on page 95.

Important: In our scenario, we used only one SSA adapter per node. In actual production environments, we recommend that an additional SSA adapter be added to each node to eliminate single points of failure.


Figure 3-10 Volume Group SMIT menu

b. In the Add a Volume Group screen (Figure 3-11 on page 96), enter the following values for each field. Note that physical volume names and the volume group major number may vary according to your system configuration.

VOLUME GROUP Name: tiv_vg1

Physical Partition SIZE in megabytes: 4

PHYSICAL VOLUME names: hdisk6, hdisk7

Activate volume group AUTOMATICALLY at system restart?: no

Volume Group MAJOR NUMBER: 45

Create VG Concurrent Capable?: no

Volume Groups

Move cursor to desired item and press Enter.

[TOP] List All Volume Groups Add a Volume Group Set Characteristics of a Volume Group List Contents of a Volume Group Remove a Volume Group Activate a Volume Group Deactivate a Volume Group Import a Volume Group Export a Volume Group Mirror a Volume Group Unmirror a Volume Group Synchronize LVM Mirrors Back Up a Volume Group Remake a Volume Group Preview Information about a Backup[MORE...4]

F1=Help F2=Refresh F3=Cancel Esc+8=ImageEsc+9=Shell Esc+0=Exit Enter=Do


Figure 3-11 Defining a volume group

c. Verify that the volume group you specified in the previous step (step b) is successfully added and varied on.

# lsvg -o

Example 3-1 shows the command output. With the -o option, you will only see the volume groups that are successfully varied on. Notice that volume group tiv_vg1 is added and varied on.

Example 3-1 lsvg -o output

# lsvg -o
tiv_vg1
rootvg
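The same volume group can also be created without SMIT. The following is a command-line sketch using the values above; verify the options against your AIX level before using it:

  mkvg -n -y tiv_vg1 -s 4 -V 45 hdisk6 hdisk7
  # -n: do not activate the volume group automatically at system restart
  # -y: volume group name, -s: 4 MB physical partitions, -V: major number 45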

Defining file systems

1. To create a file system, enter the following command. This command takes you to the Add a Journaled File System menu.

# smitty crjfs

Add a Volume Group

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields] VOLUME GROUP name [tiv_vg1] Physical partition SIZE in megabytes 4 +* PHYSICAL VOLUME names [hdisk6 hdisk7] + Force the creation of a volume group? no + Activate volume group AUTOMATICALLY yes + at system restart? Volume Group MAJOR NUMBER [45] +# Create VG Concurrent Capable? no + Create a big VG format Volume Group? no + LTG Size in kbytes 128 +

F1=Help F2=Refresh F3=Cancel F4=ListEsc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=ImageEsc+9=Shell Esc+0=Exit Enter=Do


2. Select Add a Standard Journaled File System (Figure 3-12). You are prompted to select a volume group in which the shared filesystem should reside. Select the shared volume group that you defined previously, and proceed to the next step.

Figure 3-12 Add a Journaled File System menu

3. Specify the following values for the new journaled file system.

Volume Group Name: tiv_vg1

SIZE of file system (unit size): Megabytes

Number of Units: 512

MOUNT POINT: /usr/maestro

Mount AUTOMATICALLY at system restart?: no

Start Disk Accounting?: no

Add a Journaled File System

Move cursor to desired item and press Enter.

Add a Standard Journaled File System Add a Compressed Journaled File System Add a Large File Enabled Journaled File System

F1=Help F2=Refresh F3=Cancel Esc+8=ImageEsc+9=Shell Esc+0=Exit Enter=Do


Figure 3-13 shows our selections.

Figure 3-13 Defining a journaled file system

4. Mount the file system using the following command:

# mount /usr/maestro

5. Using the following command, verify that the filesystem is successfully added and mounted:

# lsvg -l tiv_vg1

Note: When creating a file system that will be put under control of HACMP, do not set the attribute of Mount AUTOMATICALLY at system restart to YES. HACMP will mount the file system after cluster start.

Add a Standard Journaled File System

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields] Volume group name tiv_vg1 SIZE of file system Unit Size Megabytes +* Number of units [512] #* MOUNT POINT [/usr/maestro] Mount AUTOMATICALLY at system restart? no + PERMISSIONS read/write + Mount OPTIONS [] + Start Disk Accounting? no + Fragment Size (bytes) 4096 + Number of bytes per inode 4096 + Allocation Group Size (MBytes) 8 +

F1=Help F2=Refresh F3=Cancel F4=ListEsc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=ImageEsc+9=Shell Esc+0=Exit Enter=Do


Example 3-2 shows a sample of the command output.

Example 3-2 lsvg -l tiv_vg1 output

tiv_vg1:
LV NAME       TYPE     LPs  PPs  PVs  LV STATE     MOUNT POINT
loglv00       jfslog   1    1    1    open/syncd   N/A
lv06          jfs      1    1    1    open/syncd   /usr/maestro

6. Unmount the file system using the following command:

# umount /usr/maestro
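For reference, the same journaled file system can also be created from the command line instead of SMIT. A sketch using our values (option behavior may vary slightly by AIX level):

  crfs -v jfs -g tiv_vg1 -m /usr/maestro -a size=1048576 -A no -p rw
  # size is in 512-byte blocks (1048576 = 512 MB); -A no prevents automatic mounting,
  # because HACMP mounts the file system when the resource group comes online
  mount /usr/maestro       # mount once to verify
  umount /usr/maestro      # then leave it unmounted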

Renaming logical volumes

Before we proceed to configuring network adapters, we need to rename the logical volume for the file system we created, because in an HACMP cluster all shared logical volumes must have unique names.

1. Determine the name of the logical volume and the logical log volume by entering the following command.

# lsvg -l tiv_vg1

Example 3-3 shows the command output. Note that the logical log volume is loglv00, and the logical volume for the file system is lv06.

Example 3-3 lsvg -l tiv_vg1 output

# lsvg -l tiv_vg1
tiv_vg1:
LV NAME       TYPE     LPs  PPs  PVs  LV STATE       MOUNT POINT
loglv00       jfslog   1    1    1    closed/syncd   N/A
lv06          jfs      1    1    1    closed/syncd   /usr/maestro

2. Enter the following command. This will take you to the Change a Logical Volume menu.

# smitty chlv

3. Select Rename a Logical Volume (see Figure 3-14 on page 100).


Figure 3-14 Changing a Logical Volume menu

4. Select or type the current logical volume name, and enter the new logical volume name. In our example, we use lv06 for the current name, and lvtiv1 for the new name (see Figure 3-15 on page 101).

Change a Logical Volume

Move cursor to desired item and press Enter.

Change a Logical Volume Rename a Logical Volume

F1=Help F2=Refresh F3=Cancel Esc+8=ImageEsc+9=Shell Esc+0=Exit Enter=Do


Figure 3-15 Renaming a logical volume

5. Perform steps 1 through 4 for the logical log volume. We specified loglv00 as the current logical volume name and lvtiv1_log as the new logical volume name.

6. Verify that the logical volume name has been changed successfully by entering the following command.

# lsvg -l tiv_vg1

Example 3-4 shows the command output.

Example 3-4 Command output of lsvg

# lsvg -l tiv_vg1
tiv_vg1:
LV NAME       TYPE     LPs  PPs   PVs  LV STATE     MOUNT POINT
lvtws1_log    jfslog   1    2     2    open/syncd   N/A
lvtws1        jfs      512  1024  2    open/syncd   /usr/maestro

7. After renaming the logical volume and the logical log volume, check the entry for the file system in the /etc/filesystems file. Make sure the attributes dev and log reflect the change.

Rename a Logical Volume

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields]* CURRENT logical volume name [lv06] +* NEW logical volume name [lvtiv1]

F1=Help F2=Refresh F3=Cancel F4=ListEsc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=ImageEsc+9=Shell Esc+0=Exit Enter=Do


The value for dev should be the new name of the logical volume, while the value for log should be the name of the jfs log logical volume. If the log attribute does not reflect the change, issue the following command. (We used /dev/lvtws1_log in our example.)

# chfs -a log=/dev/lvtws1_log /usr/maestro

Example 3-5 shows how the entry for the file system should look in the /etc/filesystems file. Notice that the value for attribute dev is the new logical volume name (/dev/lvtws1), and the value for attribute log is the new logical log volume name (/dev/lvtws1_log).

Example 3-5 An entry in the /etc/filesystems file

/usr/maestro:
        dev             = /dev/lvtws1
        vfs             = jfs
        log             = /dev/lvtws1_log
        mount           = false
        options         = rw
        account         = false

Importing the volume groups

At this point, you should have a volume group and a file system defined on one node. The next step is to set up the volume group and the file system so that both nodes are able to access them. We do this by importing the volume group from the source node to the destination node. In our scenario, we import volume group tiv_vg1 to tivaix2.

The following steps describe how to import a volume group from one node to another. In these steps we refer to tivaix1 as the source server, and tivaix2 as the destination server.

1. Log in to the source server.

2. Check the physical volume name and the physical volume ID of the disks on which your volume group resides. In Example 3-6 on page 103, notice that the first column indicates the physical volume name, and the second column indicates the physical volume ID. The third column shows which volume group resides on each physical volume.

Check the physical volume ID (shown in the second column) for the physical volumes related to your volume group, as this information is required in the steps to come.

# lspv

Example 3-6 on page 103 shows example output from tivaix1. You can see that volume group tiv_vg1 resides on hdisk6 and hdisk7.


Example 3-6 Output of an lspv command

# lspv
hdisk0          0001813fe67712b5    rootvg     active
hdisk1          0001813f1a43a54d    rootvg     active
hdisk2          0001813f95b1b360    rootvg     active
hdisk3          0001813fc5966b71    rootvg     active
hdisk4          0001813fc5c48c43    None
hdisk5          0001813fc5c48d8c    None
hdisk6          000900066116088b    tiv_vg1    active
hdisk7          000000000348a3d6    tiv_vg1    active
hdisk8          00000000034d224b    None
hdisk9          none                None
hdisk10         none                None
hdisk11         none                None
hdisk12         00000000034d7fad    None

3. Vary off tiv_vg1 from the source node:

# varyoffvg tiv_vg1

4. Log into the destination node as root.

5. Check the physical volume name and ID on the destination node. Look for the same physical volume ID that you identified in step 2.

Example 3-7 shows output of the lspv command run on node tivaix2. Note that hdisk5 has the same physical volume ID as hdisk6 on tivaix1, and hdisk6 has the same physical volume ID as hdisk7 on tivaix1.

# lspv

Example 3-7 Output of lspv on node tivaix2

# lspv
hdisk0          0001814f62b2a74b    rootvg     active
hdisk1          none                None
hdisk2          none                None
hdisk3          none                None
hdisk4          none                None
hdisk5          000900066116088b    None
hdisk6          000000000348a3d6    None
hdisk7          00000000034d224b    tiv_vg2    active
hdisk16         0001814fe8d10853    None
hdisk17         none                None
hdisk18         none                None
hdisk19         none                None
hdisk20         00000000034d7fad    tiv_vg2    active


Importing volume groups

To import a volume group, enter the following command. This will take you to the Import a Volume Group screen.

# smitty importvg

1. Specify the following values.

VOLUME GROUP name: tiv_vg1

PHYSICAL VOLUME name: hdisk5

Volume Group MAJOR NUMBER: 45

Our selections are shown in Figure 3-16 on page 105.

Note: The physical volume name has to be the one with the same physical volume ID (PVID) on which the volume group being imported resides. Also, note that the value for Volume Group MAJOR NUMBER should be the same value as specified when creating the volume group.


Figure 3-16 Import a Volume Group

2. Use the following command to verify that the volume group is imported on the destination node.

# lsvg -o

Example 3-8 shows the command output on the destination node. Note that tiv_vg1 is now varied on to tivaix2 and is available.

Example 3-8 lsvg -o output

# lsvg -o
tiv_vg1
rootvg

Import a Volume Group

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields] VOLUME GROUP name [tiv_vg1]* PHYSICAL VOLUME name [hdisk6] + Volume Group MAJOR NUMBER [45] +#

F1=Help F2=Refresh F3=Cancel F4=ListEsc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=ImageEsc+9=Shell Esc+0=Exit Enter=Do

Note: By default, the imported volume group is set to be varied on automatically at system restart. In an HACMP cluster, the HACMP software varies on the volume group. We need to change the property of the volume group so that it will not be automatically varied on at system restart.


3. Enter the following command.

# smitty chvg

4. Select the volume group imported in the previous step. In our example, we use tiv_vg1 (Figure 3-17).

Figure 3-17 Changing a Volume Group screen

5. Specify the following, as seen in Figure 3-18 on page 107.

Activate volume group AUTOMATICALLY at system restart: no

Change a Volume Group

Type or select a value for the entry field.Press Enter AFTER making all desired changes.

[Entry Fields]* VOLUME GROUP name [tiv_vg1] +

F1=Help F2=Refresh F3=Cancel F4=ListEsc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=ImageEsc+9=Shell Esc+0=Exit Enter=Do
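For reference, the import and the auto-varyon change described in the steps above can also be done from the command line. The following is only a sketch using our values; your physical volume name and major number must match your own configuration, and the import leaves the volume group varied on (as Example 3-8 shows), so we vary it off again:

  importvg -y tiv_vg1 -V 45 hdisk5    # import using the PVID-matched disk and the planned major number
  chvg -a n tiv_vg1                   # do not activate the volume group automatically at system restart
  varyoffvg tiv_vg1                   # leave it offline; HACMP varies it on when needed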


Figure 3-18 Changing the properties of a volume group

Testing volume group migrations

You should manually test the migration of volume groups between cluster nodes before installing HACMP, to ensure each cluster node can use every volume group.

To test volume group migrations in our environment:

1. Log on to tivaix1 as root user.

2. Ensure all volume groups are available. Run the command lsvg. You should see local volume group(s) like rootvg, and all shared volume groups. In our environment, we see the shared volume groups tiv_vg1 and tiv_vg2 from the SSA disk subsystem, as shown in Example 3-9 on page 108.

Change a Volume Group

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[Entry Fields]* VOLUME GROUP name tiv_vg1* Activate volume group AUTOMATICALLY no + at system restart?* A QUORUM of disks required to keep the volume yes + group on-line ? Convert this VG to Concurrent Capable? no + Change to big VG format? no + LTG Size in kbytes 128 + Set hotspare characteristics n + Set synchronization characteristics of stale n + partitions

F1=Help F2=Refresh F3=Cancel F4=ListEsc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=ImageEsc+9=Shell Esc+0=Exit Enter=Do

Note: At this point, you should now have shared resources defined on one of the nodes. Perform steps “Defining the file systems” through “Testing volume group migrations” to define another set of shared resources that reside on the other node.


Example 3-9 Verifying all shared volume groups are available on a cluster node

[root@tivaix1:/home/root] lsvg
rootvg
tiv_vg1
tiv_vg2

3. While all shared volume groups are available, they should not be online. Use the following command to verify that no shared volume groups are online:

lsvg -o

In our environment, the output from the command, as shown in Example 3-10, indicates only the local volume group rootvg is online.

Example 3-10 Verifying no shared volume groups are online on a cluster node

[root@tivaix1:/home/root] lsvg -o
rootvg

If you do see shared volume groups listed, vary them offline by running the command:

varyoffvg volume_group_name

where volume_group_name is the name of the volume group.

4. Vary on all available shared volume groups. Run the command:

varyonvg volume_group_name

where volume_group_name is the name of the volume group, for each shared volume group.

Example 3-11 shows how we varied on all shared volume groups.

Example 3-11 How to vary on all shared volume groups on a cluster node

[root@tivaix1:/home/root] varyonvg tiv_vg1
[root@tivaix1:/home/root] lsvg -o
tiv_vg1
rootvg
[root@tivaix1:/home/root] varyonvg tiv_vg2
[root@tivaix1:/home/root] lsvg -o
tiv_vg2
tiv_vg1
rootvg

Note how we used the lsvg command to verify at each step that the vary on operation succeeded.

5. Determine the corresponding logical volume(s) for each shared volume group varied on.


Use the following command to list the logical volume(s) of each volume group:

lsvg -l volume_group_name

where volume_group_name is the name of a shared volume group. As shown in Example 3-12, in our environment shared volume group tiv_vg1 has two logical volumes, lvtws1_log and lvtws1, and shared volume group tiv_vg2 has logical volumes lvtws2_log and lvtws2.

Example 3-12 Logical volumes in each shared volume group varied on in a cluster node

[root@tivaix1:/home/root] lsvg -l tiv_vg1
tiv_vg1:
LV NAME       TYPE     LPs  PPs   PVs  LV STATE       MOUNT POINT
lvtws1_log    jfslog   1    2     2    closed/syncd   N/A
lvtws1        jfs      512  1024  2    closed/syncd   /usr/maestro
[root@tivaix1:/home/root] lsvg -l tiv_vg2
tiv_vg2:
LV NAME       TYPE     LPs  PPs   PVs  LV STATE       MOUNT POINT
lvtws2_log    jfslog   1    2     2    closed/syncd   N/A
lvtws2        jfs      128  256   2    closed/syncd   /usr/maestro2

6. Mount the corresponding JFS logical volume(s) for each shared volume group. Use the mount command to mount each JFS logical volume to its defined mount point. Example 3-13 shows how we mounted the JFS logical volumes in our environment.

Example 3-13 Mounts of logical volumes on shared volume groups on a cluster node

[root@tivaix1:/home/root] df /usr/maestro
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd2          2523136    148832   95%    51330     9% /usr
[root@tivaix1:/home/root] mount /usr/maestro
[root@tivaix1:/home/root] df /usr/maestro
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/lvtws1       2097152   1871112   11%     1439     1% /usr/maestro
[root@tivaix1:/home/root] df /usr/maestro2
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd2          2523136    148832   95%    51330     9% /usr
[root@tivaix1:/home/root] mount /usr/maestro2
[root@tivaix1:/home/root] df /usr/maestro2
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/lvtws1        524288    350484   34%     1437     2% /usr/maestro2

Note how we use the df command to verify that the mount point before the mount command is in one file system, and after the mount command is attached to a different filesystem. The different file systems before and after the mount commands are highlighted in bold in Example 3-13.


7. Unmount each logical volume on each shared volume group. Example 3-14 shows how we unmount all logical volumes from all shared volume groups.

Example 3-14 Unmount logical volumes on shared volume groups on a cluster node

[root@tivaix1:/home/root] umount /usr/maestro
[root@tivaix1:/home/root] df /usr/maestro
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd2          2523136    148832   95%    51330     9% /usr
[root@tivaix1:/home/root] umount /usr/maestro2
[root@tivaix1:/home/root] df /usr/maestro2
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd2          2523136    148832   95%    51330     9% /usr

Again, note how we use the df command to verify a logical volume is unmounted from a shared volume group.

8. Vary off each shared volume group on the cluster node. Use the following command to vary off a shared volume group:

varyoffvg volume_group_name

where volume_group_name is the name of the volume group, for each shared volume group. The following example shows how we vary off the shared volume groups tiv_vg1 and tiv_vg2:

Example 3-15 How to vary off shared volume groups on a cluster node

[root@tivaix1:/home/root] varyoffvg tiv_vg1
[root@tivaix1:/home/root] lsvg -o
tiv_vg2
rootvg
[root@tivaix1:/home/root] varyoffvg tiv_vg2
[root@tivaix1:/home/root] lsvg -o
rootvg

Note how we use the lsvg command to verify that a shared volume group is varied off.

9. Repeat this procedure for the remaining cluster nodes. You must test that all volume groups and logical volumes can be accessed through the appropriate varyonvg and mount commands on each cluster node.

You now know that if volume groups fail to migrate between cluster nodes after installing HACMP, then there is likely a problem with HACMP and not with the configuration of the volume groups themselves on the cluster nodes.
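The manual test above can also be wrapped in a small script and run on each node in turn. The following is a sketch only; the volume group names are the ones planned in our worksheets, and the mount points are taken from each volume group's jfs logical volumes:

  #!/bin/ksh
  # Sketch: verify each shared volume group can be activated and its file system mounted
  for VG in tiv_vg1 tiv_vg2; do
      varyonvg $VG || { echo "varyonvg $VG failed"; continue; }
      for FS in $(lsvg -l $VG | awk '$2 == "jfs" {print $NF}'); do
          mount $FS && echo "$FS mounted OK" && umount $FS
      done
      varyoffvg $VG
  done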


Configure Network Adapters
Network adapters should be configured prior to installing HACMP.

1. Log in as root on the cluster node.

2. Enter the following command. This command will take you to the SMIT TCP/IP menu.

# smitty tcpip

3. From the TCP/IP menu, select Minimum Configuration & Startup (Figure 3-19 on page 112). You are prompted to select a network interface from the Available Network Interface list. Select the network interface you want to configure.

Important: When configuring Network Adapters, bind only the boot IP address to each network adapter. No configuration for service IP address and persistent IP address is needed at this point.

Do not bind a service or persistent IP address to any adapters. A service and persistent IP address is configured after HACMP is installed.


Figure 3-19 The TCP/IP SMIT menu

4. For the network interface you have selected, specify the following items and press Enter. Figure 3-20 on page 113 shows the configuration for our cluster.

HOSTNAME Hostname for the node.

Internet ADDRESS Enter the IP address for the adapter. This must be the boot address that you planned for the adapter.

Network MASK Enter the network mask.

NAME SERVER Enter the IP address and the domain name of your name server.

Default Gateway Enter the IP address of the default Gateway.

TCP/IP

Move cursor to desired item and press Enter.

Minimum Configuration & Startup Further Configuration Use DHCP for TCPIP Configuration & Startup IPV6 Configuration Quality of Service Configuration & Startup

F1=Help F2=Refresh F3=Cancel Esc+8=ImageEsc+9=Shell Esc+0=Exit Enter=Do


Figure 3-20 Configuring network adapters

5. Repeat steps 1 through 4 for all network adapters in the cluster.
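The same minimum TCP/IP configuration can also be done from the command line. The following is a sketch for en0 on tivaix1 using the values shown in Figure 3-20; verify the options on your AIX level before using it:

  mktcpip -h tivaix1 -a 192.168.100.101 -m 255.255.254.0 -i en0 \
          -n 9.3.4.2 -d itsc.austin.ibm.com -g 9.3.4.41
  # Only the boot address is bound here; service and persistent addresses
  # are configured through HACMP after it is installed.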

Install HACMP
The best results when installing HACMP are obtained if you plan the procedure before attempting it. We recommend that you read through the following installation procedures before undertaking them.

If you make a mistake, uninstall HACMP; refer to “Remove HACMP” on page 134.

Minimum Configuration & Startup

To Delete existing configuration data, please use Further Configuration menus

Type or select values in entry fields.Press Enter AFTER making all desired changes.

[TOP] [Entry Fields]* HOSTNAME [tivaix1]* Internet ADDRESS (dotted decimal) [192.168.100.101] Network MASK (dotted decimal) [255.255.254.0]* Network INTERFACE en0 NAMESERVER Internet ADDRESS (dotted decimal) [9.3.4.2] DOMAIN Name [itsc.austin.ibm.com] Default Gateway Address (dotted decimal or symbolic name) [9.3.4.41] Cost [0] # Do Active Dead Gateway Detection? no +[MORE...2]

F1=Help F2=Refresh F3=Cancel F4=ListEsc+5=Reset Esc+6=Command Esc+7=Edit Esc+8=ImageEsc+9=Shell Esc+0=Exit Enter=Do

Attention: To implement an HA cluster for IBM Tivoli Workload Scheduler, install IBM Tivoli Workload Scheduler before proceeding to 3.2.4, “Installing HACMP 5.1 on AIX 5.2” on page 92. For instructions on installing IBM Tivoli Workload Scheduler in an HA cluster environment, refer to 4.1, “Implementing IBM Tivoli Workload Scheduler in an HACMP cluster” on page 184.


The major steps to install HACMP are covered in the following sections:

- “Preparation” on page 114
- “Install base HACMP 5.1” on page 122
- “Update HACMP 5.1” on page 126
- (Optional, use only if installation or configuration fails) “Remove HACMP” on page 134

The details of each step follow.

Preparation
By now you should have all the requirements fulfilled and all the preparation completed. In this section, we provide a step-by-step description of how to install HACMP Version 5.1 on AIX Version 5.2. Installation procedures may differ depending on which version of HACMP software you use. For versions other than 5.1, refer to the installation guide for the HACMP version that you install.

Ensure that you are running AIX 5.2 Maintenance Level 02. To verify your current level of AIX, run the oslevel and lslpp commands, as shown in Example 3-16.

Example 3-16 Verifying the currently installed maintenance level of AIX 5.2

[root@tivaix1:/home/root] oslevel -r
5200-02
[root@tivaix1:/home/root] lslpp -l bos.rte.commands
  Fileset                      Level  State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  bos.rte.commands          5.2.0.12  COMMITTED  Commands

Path: /etc/objrepos
  bos.rte.commands           5.2.0.0  COMMITTED  Commands

If you need to upgrade your version of AIX 5.2, visit the IBM Fix Central Web site:

http://www-912.ibm.com/eserver/support/fixes/fcgui.jsp

Be sure to upgrade from AIX 5.2.0.0 to Maintenance Level 01 first, then to Maintenance Level 02.
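As a quick check before and after each upgrade, you can query the installed maintenance level from the command line. The sketch below is our own illustration and assumes the downloaded maintenance package has been unpacked into /tmp/aix52ml (a hypothetical directory):

oslevel -r                            # reports 5200-01 or 5200-02 once the level is complete
instfix -ik 5200-02_AIX_ML            # confirms whether all filesets of Maintenance Level 02 are installed
install_all_updates -d /tmp/aix52ml   # applies every update found in that directory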

Tip: Install HACMP after all application servers are installed, configured, and verified operational. This simplifies troubleshooting because if the application server does not run after HACMP is installed, you know that addressing an HACMP issue will fix the error. You will not have to spend time identifying whether the problem is with your application or HACMP.


Figure 3-21 shows the IBM Fix Central Web page, and the settings you use to select the Web page with AIX 5.2 maintenance packages. (We show the entire Web page in Figure 3-21, but following figures omit the banners in the left-hand, upper, and bottom portions of the page.)

Figure 3-21 IBM Fix Central Web page for downloading AIX 5.2 maintenance packages

At the time of writing, Maintenance Level 02 is the latest available. We recommend that if you are currently running AIX Version 5.2, you upgrade to Maintenance Level 02.

Maintenance Level 01 can be downloaded from:

https://techsupport.services.ibm.com/server/mlfixes/52/01/00to01.html

Maintenance Level 02 can be downloaded from:

https://techsupport.services.ibm.com/server/mlfixes/52/02/01to02.html


Note: Check the IBM Fix Central Web site before applying any maintenance packages.

After you ensure the AIX prerequisites are satisfied, you may prepare HACMP 5.1 installation media. To prepare HACMP 5.1 installation media on a cluster node, follow these steps:

1. Copy the HACMP 5.1 media to the hard disk on the node. We used /tmp/hacmp on our nodes to hold the HACMP 5.1 media.

2. Copy the latest fixes for HACMP 5.1 to the hard disk on the node. We used /tmp/hacmp on our nodes to hold the HACMP 5.1 fixes.

3. If you do not have the latest fixes for HACMP 5.1, download them from the IBM Fix Central Web site:

http://www-912.ibm.com/eserver/support/fixes/fcgui.jsp

4. From this Web page, select pSeries, RS/6000 for the Server pop-up, AIX OS, Java™, compilers for the Product or fix type pop-up, Specific fixes for the Option pop-up, and AIX 5.2 for the OS level pop-up, then press Continue, as shown in Figure 3-22 on page 117.


Figure 3-22 Using the IBM Fix Central Web page for downloading HACMP 5.1 patches

5. The Select fixes Web page is displayed, as shown in Figure 3-23 on page 118.

We use this page to search for and download the fixes for APAR IY45695 and also the following PTF numbers:

U496114, U496115, U496116, U496117, U496118, U496119, U496120, U496121, U496122, U496123, U496124, U496125, U496126, U496127, U496128, U496129, U496130, U496138, U496274, U496275

We used /tmp/hacmp_fixes1 for storing the fix downloads of APAR IY45695, and /tmp/hacmp_fixes2 for storing the fix downloads of the individual PTFs.


Figure 3-23 Select fixes page of IBM Fix Central Web site

6. To download the fixes for APAR IY45695, select APAR number or abstract for the Search by pop-up, enter IY45695 in the Search string field, and press Go. A browser dialog (Figure 3-24) may appear, depending upon previous actions within IBM Fix Central; if it does, press OK to continue.

Figure 3-24 Confirmation dialog presented in IBM Fix Central Select fixes page

7. The Select fixes page displays the fixes found, as shown in Figure 3-25 on page 119.


Figure 3-25 Select fixes page showing fixes found that match APAR IY45695

8. Highlight the APAR in the list box, then press the Add to my download list link. Press Continue, which displays the Packaging options page.

9. Select AIX 5200-01 for the Indicate your current maintenance level pop-up. At the time of writing, the only available download servers are in North America, so selecting a download server is an optional step. Select a download server if a more appropriate server is available in the pop-up. Now press Continue, as shown in Figure 3-26 on page 120.


Figure 3-26 Packaging options page for packaging fixes for APAR IY45695

10.The Download fixes page is displayed as shown in Figure 3-27 on page 121. Choose an appropriate option from the Download and delivery options section of the page, then follow the instructions given to download the fixes.


Figure 3-27 Download fixes page for fixes related to APAR IY45695

11.Downloading fixes for PTFs follows the same procedure as for downloading the fixes for APAR IY45695, except you select Fileset or PTF number in the Search by pop-up in the Select fixes Web page.


12.Copy the installation media to each cluster node or make it available via a remote filesystem like NFS, AFS®, or DFS™.
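The media preparation above boils down to a few commands on each node. The following sketch is our own illustration; the CD-ROM path is an assumption, and the /tmp/hacmp* directories are the ones used in this chapter:

mkdir -p /tmp/hacmp /tmp/hacmp_fixes1 /tmp/hacmp_fixes2
cp /cdrom/usr/sys/inst.images/* /tmp/hacmp    # copy the HACMP 5.1 filesets (CD path is an assumption)
inutoc /tmp/hacmp                             # build the .toc file that installp expects
inutoc /tmp/hacmp_fixes1
inutoc /tmp/hacmp_fixes2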

Install base HACMP 5.1

After the installation media is prepared on a cluster node, install the base HACMP 5.1 Licensed Program Products (LPPs):

1. Enter the command smitty install to start installing the software. The Software Installation and Maintenance SMIT panel is displayed as in Figure 3-28.

Figure 3-28 Screen displayed after running command smitty install

2. Go to Install and Update Software > Install Software and press Enter. This brings up the Install Software SMIT panel (Figure 3-29 on page 123).

Software Installation and Maintenance

Move cursor to desired item and press Enter.

  Install and Update Software
  List Software and Related Information
  Software Maintenance and Utilities
  Software Service Management
  Network Installation Management
  EZ NIM (Easy NIM Tool)
  System Backup Manager

F1=Help F2=Refresh F3=Cancel F8=Image F9=Shell F10=Exit Enter=Do


Figure 3-29 Filling out the INPUT device/directory for software field in the Install Software smit panel

3. Enter the directory that the HACMP 5.1 software is stored under into the INPUT device / directory for software field and press Enter, as shown in Figure 3-29.

In our environment we entered the directory /tmp/hacmp into the field and pressed Enter.

This displays the Install Software SMIT panel with all the installation options (Figure 3-30 on page 124).

Install Software

Type or select a value for the entry field.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* INPUT device / directory for software                 [/tmp/hacmp]           +

F1=Help      F2=Refresh     F3=Cancel      F4=List
F5=Reset     F6=Command     F7=Edit        F8=Image
F9=Shell     F10=Exit       Enter=Do


Figure 3-30 Install Software SMIT panel with all installation options

4. Press Enter to install all HACMP 5.1 Licensed Program Products (LPPs) in the selected directory.

5. SMIT displays an installation confirmation dialog as shown in Figure 3-31 on page 125. Press Enter to continue. The COMMAND STATUS SMIT panel is displayed.

Throughout the rest of this redbook, we assume that you know how to respond to SMIT confirmation dialogs when they are displayed, so we do not show this step again.

Install Software

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* INPUT device / directory for software                 /tmp/hacmp
* SOFTWARE to install                                   [_all_latest]          +
  PREVIEW only? (install operation will NOT occur)      no                     +
  COMMIT software updates?                              yes                    +
  SAVE replaced files?                                  no                     +
  AUTOMATICALLY install requisite software?             yes                    +
  EXTEND file systems if space needed?                  yes                    +
  OVERWRITE same or newer versions?                     no                     +
  VERIFY install and check file sizes?                  no                     +
  Include corresponding LANGUAGE filesets?              yes                    +
  DETAILED output?                                      no                     +
  Process multiple volumes?                             yes                    +
  ACCEPT new license agreements?                        no                     +
  Preview new LICENSE agreements?                       no                     +

F1=Help      F2=Refresh     F3=Cancel      F4=List
F5=Reset     F6=Command     F7=Edit        F8=Image
F9=Shell     F10=Exit       Enter=Do


Figure 3-31 Installation confirmation dialog for SMIT

6. The COMMAND STATUS SMIT panel displays the progress of the installation. Installation will take several minutes, depending upon the speed of your machine. When the installation completes, the panel looks similar to Figure 3-32.

Figure 3-32 COMMAND STATUS SMIT panel showing successful installation of HACMP 5.1
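If you prefer to script the base installation instead of using SMIT, a non-interactive equivalent is sketched below. This is our own illustration, assuming all HACMP 5.1 filesets are in /tmp/hacmp; it is not the exact geninstall command that SMIT generates:

inutoc /tmp/hacmp                   # make sure the .toc file is current
installp -acgXY -d /tmp/hacmp all   # apply and commit, pull in requisites, extend file systems, accept licenses
lslpp -l "cluster.*"                # confirm the base 5.1.0.0 filesets are installed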

+--------------------------------------------------------------------------+¦ ARE YOU SURE? ¦¦ ¦¦ Continuing may delete information you may want ¦¦ to keep. This is your last chance to stop ¦¦ before continuing. ¦¦ Press Enter to continue. ¦¦ Press Cancel to return to the application. ¦¦ ¦¦ F1=Help F2=Refresh F3=Cancel ¦¦ F8=Image F10=Exit Enter=Do ¦+--------------------------------------------------------------------------+

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

[TOP]
geninstall -I "a -cgNpQqwX -J" -Z -d /usr/sys/inst.images/hacmp/hacmp_510 -f File 2>&1

File:
  I:cluster.hativoli.client   5.1.0.0
  I:cluster.hativoli.server   5.1.0.0
  I:cluster.haview.client     4.5.0.0
  I:cluster.haview.server     4.5.0.0

*******************************************************************************

[MORE...90]

F1=Help      F2=Refresh     F3=Cancel      F6=Command
F8=Image     F9=Shell       F10=Exit       /=Find
n=Find Next


Update HACMP 5.1

After installing the base HACMP 5.1 Licensed Program Products (LPPs), you must upgrade it to the latest fixes available. To update HACMP 5.1:

1. Enter the command smitty update to start updating HACMP 5.1. The Update Software by Fix (APAR) SMIT panel is displayed as shown in Figure 3-33.

Figure 3-33 Update Software by Fix (APAR) SMIT panel displayed by running command smitty update

2. In the INPUT device / directory for software field, enter the directory that you used to store the fixes for APAR IY45695, then press Enter.

We used /tmp/hacmp_fixes1 in our environment, as shown in Figure 3-35 on page 128.

Update Software by Fix (APAR)

Type or select a value for the entry field.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* INPUT device / directory for software                 []                     +

F1=Help      F2=Refresh     F3=Cancel      F4=List
F5=Reset     F6=Command     F7=Edit        F8=Image
F9=Shell     F10=Exit       Enter=Do


Figure 3-34 Entering directory of APAR IY45695 fixes into Update Software by Fix (APAR) SMIT panel

3. The Update Software by Fix (APAR) SMIT panel is displayed with all the update options. Move the cursor to the FIXES to install item as shown in Figure 3-35 on page 128, and press F4 (or Esc 4) to select the HACMP fixes to update.

Update Software by Fix (APAR)

Type or select a value for the entry field.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* INPUT device / directory for software                 [/tmp/hacmp_fixes1]    +

F1=Help      F2=Refresh     F3=Cancel      F4=List
F5=Reset     F6=Command     F7=Edit        F8=Image
F9=Shell     F10=Exit       Enter=Do


Figure 3-35 Preparing to select fixes for APAR IY45695 in Update Software by Fix (APAR) SMIT panel

4. The FIXES to install SMIT dialog is displayed as in Figure 3-36 on page 129. This lists all the fixes for APAR IY45695 that can be applied.

Update Software by Fix (APAR)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* INPUT device / directory for software                 /tmp/hacmp_fixes1
* FIXES to install                                      []                     +
  PREVIEW only? (update operation will NOT occur)       no                     +
  COMMIT software updates?                              yes                    +
  SAVE replaced files?                                  no                     +
  EXTEND file systems if space needed?                  yes                    +
  VERIFY install and check file sizes?                  no                     +
  DETAILED output?                                      no                     +
  Process multiple volumes?                             yes                    +

F1=Help      F2=Refresh     F3=Cancel      F4=List
F5=Reset     F6=Command     F7=Edit        F8=Image
F9=Shell     F10=Exit       Enter=Do


Figure 3-36 Selecting fixes for APAR IY45695 in FIXES to install SMIT dialog

5. Select all fixes in the dialog by pressing F7 (or Esc 7) on each line so that a selection symbol (>) is added in front of each line as shown in Figure 3-37 on page 130.

Press Enter after all fixes are selected.

+--------------------------------------------------------------------------+¦ FIXES to install ¦¦ ¦¦ Move cursor to desired item and press F7. Use arrow keys to scroll. ¦¦ ONE OR MORE items can be selected. ¦¦ Press Enter AFTER making all selections. ¦¦ ¦¦ [TOP] ¦¦ IY45538 - ENH: Updated Online Planning Worksheets for HACMP R510 ¦¦ IY45539 - ENH: clrgmove support of replicated resources ¦¦ IY47464 UPDATE WILL PUT IN TWO NAME_SERVER STANZAS ¦¦ IY47503 HAES,HAS: BROADCAST ROUTES EXIST ON LO0 INTERFACE AFTER ¦¦ IY47577 WITH TCB ACTIVE, MANY MSG 3001-092 IN HACMP.OUT DURING SYNCLVO ¦¦ IY47610 HAES: FAILURE TO UMOUNT EXPORTED FILESYSTEM WITH DEVICE BUSY - ¦¦ IY47777 IF ONE NODE UPGRADED TO HAES 5.1 SMIT START CLUSTER SERVICES ¦¦ IY48184 Fixes for Multiple Site Clusters ¦¦ [MORE...36] ¦¦ ¦¦ F1=Help F2=Refresh F3=Cancel ¦¦ F7=Select F8=Image F10=Exit ¦¦ Enter=Do /=Find n=Find Next ¦+--------------------------------------------------------------------------+


Figure 3-37 Selecting all fixes of APAR IY45695 in FIXES to install SMIT dialog

6. The Update Software by Fix (APAR) SMIT panel is displayed again (Figure 3-38 on page 131), showing all the selected fixes from the FIXES to install SMIT dialog in the FIXES to install field.

Press Enter to begin applying all fixes of APAR IY45695.

+--------------------------------------------------------------------------+¦ FIXES to install ¦¦ ¦¦ Move cursor to desired item and press F7. Use arrow keys to scroll. ¦¦ ONE OR MORE items can be selected. ¦¦ Press Enter AFTER making all selections. ¦¦ ¦¦ [MORE...36] ¦¦ > IY48918 CSPOC:Add a Shared FS gives error in cspoc.log ¦¦ > IY48922 CSPOC:disk replacement does not work ¦¦ > IY48926 incorrect version info on node_up ¦¦ > IY49152 cluster synch changes NW attribute from private to public ¦¦ > IY49490 ENH: relax clverify check for nodes in fast connect mt rg ¦¦ > IY49495 clstrmgr has memory leaks ¦¦ > IY49497 ENH: Need option to leave log files out of cluster snapshot ¦¦ > IY49498 Verification dialogs use inconsistent terminology. ¦¦ [BOTTOM] ¦¦ ¦¦ F1=Help F2=Refresh F3=Cancel ¦¦ F7=Select F8=Image F10=Exit ¦¦ Enter=Do /=Find n=Find Next ¦+--------------------------------------------------------------------------+


Figure 3-38 Applying all fixes of APAR IY45695 in Update Software by Fix (APAR) SMIT panel

7. The COMMAND STATUS SMIT panel is displayed. It shows the progress of the selected fixes for APAR IY45695 applied to the system. A successful update will appear similar to Figure 3-39 on page 132.

Update Software by Fix (APAR)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* INPUT device / directory for software                 /tmp/hacmp_fixes1
* FIXES to install                                      [IY45538 IY45539 IY474> +
  PREVIEW only? (update operation will NOT occur)       no                     +
  COMMIT software updates?                              yes                    +
  SAVE replaced files?                                  no                     +
  EXTEND file systems if space needed?                  yes                    +
  VERIFY install and check file sizes?                  no                     +
  DETAILED output?                                      no                     +
  Process multiple volumes?                             yes                    +

F1=Help      F2=Refresh     F3=Cancel      F4=List
F5=Reset     F6=Command     F7=Edit        F8=Image
F9=Shell     F10=Exit       Enter=Do


Figure 3-39 COMMAND STATUS SMIT panel showing all fixes of APAR IY45695 successfully applied

8. Confirm that the fixes were installed by first exiting SMIT. Press F10 (or Esc 0) to exit SMIT. Then enter the following command:

lslpp -l "cluster.*"

The output should be similar to that shown in Example 3-17. Note that some of the Licensed Program Products (LPPs) show a version other than the 5.1.0.0 base version of HACMP. This confirms that the fixes were successfully installed.

Example 3-17 Confirming installation of fixes for APAR IY45695

[root@tivaix1:/home/root]lslpp -l "cluster.*" Fileset Level State Description ----------------------------------------------------------------------------Path: /usr/lib/objrepos cluster.adt.es.client.demos 5.1.0.0 COMMITTED ES Client Demos cluster.adt.es.client.include 5.1.0.2 COMMITTED ES Client Include Files cluster.adt.es.client.samples.clinfo

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

[TOP]
instfix -d /usr/sys/inst.images/hacmp/hacmp_510_fixes -f /tmp/.instfix_selections.12882 > File

installp -acgNpqXd /usr/sys/inst.images/hacmp/hacmp_510_fixes -f File

File:
  cluster.adt.es.client.include          05.01.0000.0002
  cluster.adt.es.client.samples.clinfo   05.01.0000.0002
  cluster.adt.es.client.samples.clstat   05.01.0000.0001
  cluster.adt.es.client.samples.libcl    05.01.0000.0001
  cluster.es.client.lib                  05.01.0000.0002
  cluster.es.client.rte                  05.01.0000.0002
[MORE...67]

F1=Help      F2=Refresh     F3=Cancel      F6=Command
F8=Image     F9=Shell       F10=Exit       /=Find
n=Find Next


5.1.0.2 COMMITTED ES Client CLINFO Samples cluster.adt.es.client.samples.clstat 5.1.0.1 COMMITTED ES Client Clstat Samples cluster.adt.es.client.samples.demos 5.1.0.0 COMMITTED ES Client Demos Samples cluster.adt.es.client.samples.libcl 5.1.0.1 COMMITTED ES Client LIBCL Samples cluster.adt.es.java.demo.monitor 5.1.0.0 COMMITTED ES Web Based Monitor Demo cluster.adt.es.server.demos 5.1.0.0 COMMITTED ES Server Demos cluster.adt.es.server.samples.demos 5.1.0.1 COMMITTED ES Server Sample Demos cluster.adt.es.server.samples.images 5.1.0.0 COMMITTED ES Server Sample Images cluster.doc.en_US.es.html 5.1.0.1 COMMITTED HAES Web-based HTML Documentation - U.S. English cluster.doc.en_US.es.pdf 5.1.0.1 COMMITTED HAES PDF Documentation - U.S. English cluster.es.cfs.rte 5.1.0.1 COMMITTED ES Cluster File System Support cluster.es.client.lib 5.1.0.2 COMMITTED ES Client Libraries cluster.es.client.rte 5.1.0.2 COMMITTED ES Client Runtime cluster.es.client.utils 5.1.0.2 COMMITTED ES Client Utilities cluster.es.clvm.rte 5.1.0.0 COMMITTED ES for AIX Concurrent Access cluster.es.cspoc.cmds 5.1.0.2 COMMITTED ES CSPOC Commands cluster.es.cspoc.dsh 5.1.0.0 COMMITTED ES CSPOC dsh cluster.es.cspoc.rte 5.1.0.2 COMMITTED ES CSPOC Runtime Commands cluster.es.plugins.dhcp 5.1.0.1 COMMITTED ES Plugins - dhcp cluster.es.plugins.dns 5.1.0.1 COMMITTED ES Plugins - Name Server cluster.es.plugins.printserver 5.1.0.1 COMMITTED ES Plugins - Print Server cluster.es.server.diag 5.1.0.2 COMMITTED ES Server Diags cluster.es.server.events 5.1.0.2 COMMITTED ES Server Events cluster.es.server.rte 5.1.0.2 COMMITTED ES Base Server Runtime cluster.es.server.utils 5.1.0.2 COMMITTED ES Server Utilitiescluster.es.worksheets 5.1.0.2 COMMITTED Online Planning Worksheets cluster.license 5.1.0.0 COMMITTED HACMP Electronic License cluster.msg.en_US.cspoc 5.1.0.0 COMMITTED HACMP CSPOC Messages - U.S. English cluster.msg.en_US.es.client 5.1.0.0 COMMITTED ES Client Messages - U.S. English cluster.msg.en_US.es.server 5.1.0.0 COMMITTED ES Recovery Driver Messages - U.S. English

Path: /etc/objrepos cluster.es.client.rte 5.1.0.0 COMMITTED ES Client Runtime cluster.es.clvm.rte 5.1.0.0 COMMITTED ES for AIX Concurrent Access


cluster.es.server.diag 5.1.0.0 COMMITTED ES Server Diags cluster.es.server.events 5.1.0.0 COMMITTED ES Server Events cluster.es.server.rte 5.1.0.2 COMMITTED ES Base Server Runtime cluster.es.server.utils 5.1.0.0 COMMITTED ES Server Utilities

Path: /usr/share/lib/objrepos cluster.man.en_US.es.data 5.1.0.2 COMMITTED ES Man Pages - U.S. English

9. Repeat this procedure for each node in the cluster to install the LPPs for APAR IY45695.

10. Repeat this entire procedure for all the fixes corresponding to the preceding PTFs, entering the directory where those fixes are stored into the INPUT device / directory for software field referred to in step 2. We used /tmp/hacmp_fixes2 in our environment.
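For reference, the same updates can be applied without SMIT. The commands below are our own sketch, assuming the fixes were downloaded to the directories used above:

instfix -k IY45695 -d /tmp/hacmp_fixes1    # install all filesets shipped for the APAR
instfix -ik IY45695                        # verify that the APAR is now fully installed
install_all_updates -d /tmp/hacmp_fixes2   # apply the individual PTF updates
lslpp -l "cluster.*"                       # levels above 5.1.0.0 confirm the fixes are on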

Remove HACMP

If you make a mistake with the HACMP installation, or if subsequent configuration fails due to Object Data Manager (ODM) errors or another error that prevents successful configuration, you can remove HACMP to recover to a known state.

Removing HACMP resets all ODM entries and removes all HACMP files. Reinstalling creates new ODM entries, which often solves problems with corrupted HACMP ODM entries.

To remove HACMP:

1. Enter the command smitty remove.

2. The Remove Installed Software SMIT panel is displayed. Enter the following text in the SOFTWARE name field: cluster.*, as shown in Figure 3-40 on page 135.


Figure 3-40 How to specify removal of HACMP in Remove Installed Software SMIT panel

3. Move the cursor to the PREVIEW only? (remove operation will NOT occur) field and press Tab to change the value to no, change the EXTEND file systems if space needed? field to yes, and change the DETAILED output field to yes, as shown in Figure 3-41 on page 136.

Remove Installed Software

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* SOFTWARE name                                         [cluster.*]            +
  PREVIEW only? (remove operation will NOT occur)       yes                    +
  REMOVE dependent software?                            no                     +
  EXTEND file systems if space needed?                  no                     +
  DETAILED output?                                      no                     +

F1=Help      F2=Refresh     F3=Cancel      F4=List
F5=Reset     F6=Command     F7=Edit        F8=Image
F9=Shell     F10=Exit       Enter=Do


Figure 3-41 Set options for removal of HACMP in Installed Software SMIT panel

4. Press Enter to start removal of HACMP. The COMMAND STATUS SMIT panel displays the progress and final status of the removal operation. A successful removal looks similar to Figure 3-42 on page 137.

Remove Installed Software

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* SOFTWARE name                                         [cluster.*]            +
  PREVIEW only? (remove operation will NOT occur)       no                     +
  REMOVE dependent software?                            no                     +
  EXTEND file systems if space needed?                  yes                    +
  DETAILED output?                                      yes                    +

F1=Help      F2=Refresh     F3=Cancel      F4=List
F5=Reset     F6=Command     F7=Edit        F8=Image
F9=Shell     F10=Exit       Enter=Do


Figure 3-42 Successful removal of HACMP as shown by COMMAND STATUS SMIT panel

5. Press F10 (or Esc 0) to exit SMIT.
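The same removal can also be performed from the command line. This is our own sketch, not the exact geninstall command that SMIT runs:

installp -ug "cluster.*"   # deinstall all HACMP filesets and their dependents
lslpp -l "cluster.*"       # should now report that no matching filesets are installed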

When you finish the installation of HACMP, you need to configure it for the application servers you want to make highly available.

In this redbook, we show how to do this with IBM Tivoli Workload Scheduler first in 4.1.10, “Configure HACMP for IBM Tivoli Workload Scheduler” on page 210, then IBM Tivoli Management Framework in 4.1.11, “Add IBM Tivoli Management Framework” on page 303.

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

[TOP]
geninstall -u -I "pX -V2 -J -w" -Z -f File 2>&1

File: cluster.*

+-----------------------------------------------------------------------------+
                        Pre-deinstall Verification...
+-----------------------------------------------------------------------------+
Verifying selections...done
Verifying requisites...done
Results...

[MORE...134]

F1=Help      F2=Refresh     F3=Cancel      F6=Command
F8=Image     F9=Shell       F10=Exit       /=Find
n=Find Next


3.3 Implementing a Microsoft Cluster

In this section, we walk you through the installation process for a Microsoft Cluster (also referred to as Microsoft Cluster Service or MSCS throughout the book). We also discuss the hardware and software aspects of MSCS, as well as the installation procedure.

The MSCS environment that we create in this chapter is a two-node hot standby cluster. The system will share two external SCSI drives connected to each of the nodes via a Y-cable. Figure 3-43 illustrates the system configuration.

Figure 3-43 Microsoft Cluster environment

The cluster is connected using four Network Interface Cards (NICs). Each node has a private NIC and a public NIC. In an MSCS, the heartbeat connection is referred to as a private connection. The private connection is used for internal cluster communications and is connected between the two nodes using a crossover cable.

The public NIC is the adapter that is used by the applications that are running locally on the server, as well as cluster applications that may move between the nodes in the cluster. The operating system running on our nodes is Windows 2000 Advanced Edition with Service Pack 4 installed.

In our initial cluster installation, we will set up the default cluster group. Cluster groups in an MSCS environment are logical groups of resources that can be moved from one node to another. The default cluster group that we will set up will

tivw2k1tivw2k2

Public NICIP 9.3.4.197

Private NICIP 192.168.1.1

Private NICIP 192.168.1.2

Public NICIP 9.3.4.198

X:SCSI ID-5

Y: & Z:SCSI ID-4

SCSI

SCSI ID

-7

SCSI

SCSI

ID-6

C: C:

138 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

Page 153: High availability scenarios with ibm tivoli workload scheduler and ibm tivoli framework sg246632

contain the shared drive X: an IP address (192.168.1.197) and a network name (tivw2kv1).

3.3.1 Microsoft Cluster hardware considerations

When designing a Microsoft Cluster, it is important to make sure all the hardware you would like to use is compatible with the Microsoft Cluster software. To make this easy, Microsoft maintains a Hardware Compatibility List (HCL), found at:

http://www.microsoft.com/whdc/hcl/search.mspx

Check the HCL before you order your hardware to ensure your cluster configuration will be supported.

3.3.2 Planning and designing a Microsoft Cluster installation

You need to execute some setup tasks before you start installing Microsoft Cluster Service. Following are the requirements for a Microsoft Cluster:

� Configure the Network Interface Cards (NICs)

Each node in the cluster will need two NICs: one for public communications, and one for private cluster communications. The NICs will have to be configured with static IP addresses. Table 3-14 shows our configuration.

Table 3-14 NIC IP addresses

� Set up the Domain Name System (DNS)

Make sure all IP addresses for your NICs, and IP addresses that will be used by the cluster groups, are added to the Domain Name System (DNS). The private NIC IP addresses do not have to be added to DNS.

Our configuration will require that the IP addresses and names listed in Table 3-15 on page 140 be added to the DNS.

Node                 IP
tivw2k1 (public)     9.3.4.197
tivw2k1 (private)    192.168.1.1
tivw2k2 (public)     9.3.4.198
tivw2k2 (private)    192.168.1.2


Table 3-15 DNS entries required for the cluster

� Set up the shared storage

When setting up the shared storage devices, ensure that all drives are partitioned correctly and that they are all formatted with the NT filesystem (NTFS). When setting up the drives, ensure that both nodes are assigned the same drive letters for each partition and that the drives are set up as basic drives.

We chose to set up our drive letters starting from the end of the alphabet so we would not interfere with any domain login scripts or temporary storage devices.

If you are using SCSI drives, ensure that the drives are all using different SCSI IDs and that the drives are terminated correctly.

When you partition your drives, ensure you set up a partition specifically for the quorum. The quorum is a partition used by the cluster service to store cluster configuration database checkpoints and log files. The quorum partition needs to be at least 100 MB in size.

Table 3-16 illustrates how we set up our drives.

Table 3-16 Shared drive partition table

Note: When configuring the disks, make sure that you configure them on one node at a time and that the node that is not being configured is powered off. If both nodes try to control the disk at the same time, they may cause disk corruption.

Hostname     IP Address
tivw2k1      9.3.4.197
tivw2k2      9.3.4.198
tivw2kv1     9.3.4.199
tivw2kv2     9.3.4.175
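Once the entries are in place, a quick check from either node confirms that the names resolve as expected. This is our own illustration, not part of the original procedure:

nslookup tivw2k1
nslookup tivw2k2
nslookup tivw2kv1
nslookup tivw2kv2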

Important: Microsoft recommends that the quorum partition be on a separate disk, and also recommends that the partition be 500 MB in size.

Disk      Drive Letter     Size       Label
Disk 1    X:               34 GB      Partition 1
Disk 2    Y:               33.9 GB    Partition 2
          Z:               100 MB     Quorum


� Update the operating system

Before installing the cluster service, connect to the Microsoft Software Update Web site to ensure you have all the latest hardware drivers and software patches installed. The Microsoft Software Update Web site can be found at:

http://windowsupdate.microsoft.com

� Create a domain account for the cluster

The cluster service requires that a domain account be created under which the cluster service will run. The domain account must be a member of the administrator group on each of the nodes in the cluster. Make sure you set the account so that the user cannot change the password and that the password never expires. We created the account “cluster_service” for our cluster.

� Add nodes to the domain

The cluster service runs under a domain account. In order for the domain account to be able to authenticate against the domain controller, the nodes must join the domain where the cluster user has been created.
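As an illustration only (not part of the original procedure), the service account can also be created and granted local administrator rights from a command prompt. The password is a placeholder and ITSO is a hypothetical domain name:

net user cluster_service <password> /add /domain           (run once against the domain)
net localgroup Administrators ITSO\cluster_service /add    (run on each cluster node)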

3.3.3 Microsoft Cluster Service installation

Here we discuss the Microsoft Cluster Service installation process. The installation is broken into three sections: installation of the primary node; installation of the secondary node; and configuration of the cluster resources.

Following is a high-level overview of the installation procedure. Detailed information for each step in the process is provided in the following sections.

Installation of the MSCS node 1

The cluster service is installed as a Windows component. To install the service, the Windows 2000 Advanced Server CD-ROM should be in the CD-ROM drive. You can save time by copying the i386 directory from the CD to the local drive.

1. To start the installation, open the Start menu and select Settings -> Control Panel and then double-click Add/Remove Programs.

Important: Before starting the installation on Node 1, make sure that Node 2 is powered off.


2. Click Add/Remove Windows Components, located on the left side of the window. Select Cluster Service from the list of components as shown in Figure 3-44, then click Next.

Figure 3-44 Windows Components Wizard


3. Make sure that Remote administration mode is checked and click Next (Figure 3-45). You will be asked to insert the Windows 2000 Advanced Server CD if it is not already inserted. If you copied the CD to the local drive, select the location where it was copied.

Figure 3-45 Windows Components Wizard


4. Click Next at the welcome screen (Figure 3-46).

Figure 3-46 Welcome screen


5. The next window (Figure 3-47) is used by Microsoft to verify that you are aware that it will not support hardware that is not included in its Hardware Compatibility List (HCL).

To move on to the next step of the installation, click I Understand and then click Next.

Figure 3-47 Hardware Configuration


6. Now that we have located the installation media and have acknowledged the support agreement, we can start the actual installation. The next screen is used to select whether you will be installing the first node or any additional node.

We will install the first node in the cluster at this point so make sure that the appropriate radio button is selected and click Next (Figure 3-48). We will return to this screen again later when we install the second node in the cluster.

Figure 3-48 Create or Join a Cluster


7. We must now name our cluster. The name is the local name associated with the whole cluster. This is not the virtual name that is associated with a cluster group. This is used by the Microsoft Cluster Administrator utility to administer the cluster resources.

We prefer to use the same name as the virtual host name to prevent confusion. In this case, we call it TIVW2KV1. After you have entered a name for your cluster, click Next (Figure 3-49).

Figure 3-49 Cluster Name


8. The next step is to enter the domain account that the cluster service will use. See the pre-installation setup section for details on setting up the domain account that the cluster service will use. Click Next (Figure 3-50).

Figure 3-50 Select an Account


9. The next window is used to determine the disks that the cluster service will manage. In the example we have two partitions, one for the quorum and another for the data. Make sure both are set up as managed disks. Click Next (Figure 3-51).

Figure 3-51 Add or Remove Managed Disks


10. We now need to select where the cluster checkpoint and log files will be stored. This disk is referred to as the Quorum Disk. The quorum is a vital part of the cluster, as it is used for storing critical cluster files. If the data on the Quorum Disk becomes corrupt, the cluster will be unusable.

It is important to back up this data regularly so you will be able to recover your cluster. It is recommended that you have at least 100 MB on a separate partition reserved for this purpose; refer to the preinstallation setup section on disk preparation.

After you select your Quorum Disk, click Next (Figure 3-52).

Figure 3-52 Cluster File Storage


11.The next step is to configure networking. A window will pop up to recommend that you use multiple public adapters to remove any single point of failure. Click Next to continue (Figure 3-53).

Figure 3-53 Warning window


12. The next section will prompt you to identify each NIC as either public, private, or both. Since we named our adapters ahead of time, this is easy. Set the adapter that is labeled Public Network Connection to All communications (mixed network) so that it can also serve as a backup path for internal cluster traffic, as described in step 14. Click Next (Figure 3-54).

Figure 3-54 Network Connections - All communications


13.Now we will configure the private network adapter. This adapter is used as a heartbeat connection between the two nodes of the cluster and is connected via a crossover cable.

Since this adapter is not accessible from the public network, this is considered a private connection and should be configured as Internal cluster communications only (private network). Click Next (Figure 3-55).

Figure 3-55 Network Connections - Internal cluster communications only (private network)


14.Because we configured two adapters to be capable of communicating as private adapters, we need to select the priority in which the adapters will communicate.

In our case, we want the Private Network Connection to serve as our primary private adapter. We will use the Public Network Connection as our backup adapter. Click Next to continue (Figure 3-56).

Figure 3-56 Network priority setup


15. Once the network adapters have been configured, it is time to create the cluster resources. The first cluster resource is the cluster IP address. The cluster IP address is the IP address associated with the cluster resource group; it will follow the resource group when it is moved from node to node. This cluster IP address is commonly referred to as the virtual IP.

To set up the cluster IP address, enter the IP address and subnet mask that you plan to use, and select the Public Network Connection as the network to use. Click Next (Figure 3-57).

Figure 3-57 Cluster IP Address


16.Click Finish to complete the cluster service configuration (Figure 3-58).

Figure 3-58 Cluster Service Configuration Wizard

17. The next window is an informational pop-up letting you know that the Cluster Administrator application is now available. The cluster service is managed using the Cluster Administrator tool. Click OK (Figure 3-59).

Figure 3-59 Cluster Service Configuration Wizard


18.Click Finish one more time to close the installation wizard (Figure 3-60).

Figure 3-60 Windows Components Wizard

At this point, the installation of the cluster service on the primary node is complete. Now that we have created a cluster, we will need to add additional nodes to the cluster.

Installing the second node

The next step is to install the second node in the cluster. To add the second node, you will have to perform the following steps on the secondary node. The installation of the secondary node is relatively easy, since the cluster is configured during the installation on the primary node. The first few steps are identical to installing the cluster service on the primary node.

To install the cluster service on the secondary node:

1. Go to the Start Menu and select Settings -> Control Panel and double-click Add/Remove Programs.


2. Click Add/Remove Windows Components, located on the left side of the window, and then select Cluster Service from the list of components. Click Next to start the installation (Figure 3-61).

Figure 3-61 Windows Components Wizard


3. Make sure the Remote administration mode is selected (Figure 3-62); it should be the only option available. Click Next to continue.

Figure 3-62 Windows Components Wizard


4. Click Next past the welcome screen (Figure 3-63).

Figure 3-63 Windows Components Wizard


5. Once again you will have to verify that the hardware you have selected is compatible with the software you are installing, and that you understand that Microsoft will not support hardware that is not on the HCL. Click I Understand and then Next to continue (Figure 3-64).

Figure 3-64 Hardware Configuration


6. The next step is to indicate that you will be adding the second node to the cluster. Once the second node option is selected, click Next to continue (Figure 3-65).

Figure 3-65 Create or Join a Cluster


7. You will now have to type in the name of the cluster that you would like the second node to be a member of. Since we set up a domain account to be used for the cluster service, we will not need to check the connect to cluster box. Click Next (Figure 3-66).

Figure 3-66 Cluster Name


8. The next window prompts you for a password for the domain account that we installed the primary node with. Enter the password and click Next (Figure 3-67).

Figure 3-67 Select an Account


9. Click Finish to complete the installation (Figure 3-68).

Figure 3-68 Finish the installation


10.The next step is to verify that the cluster works. To verify that the cluster is operational, we will need to open the Cluster Administrator. You can open the Cluster Administrator in the Start Menu by selecting Programs -> Administrative Tools -> Cluster Administrator (Figure 3-69).

You will notice that the cluster will have two groups: one called cluster group, and the other called Disk Group 1:

– The cluster group is the group that contains the virtual IP address, the network name, and the shared cluster disk.

– Disk Group 1 at this time only contains our quorum disk.

In order to verify that the cluster is functioning properly, we need to move the cluster group from one node to the other. You can move the cluster group by right-clicking the icon and selecting Move Group.

After you have done this, you should see the group icon change for a few seconds while the resources are moved to the secondary node. Once the group has been moved, the icon returns to normal and the owner of the group should now be the second node in the cluster.

Figure 3-69 Verifying that the cluster works

The cluster service is now installed and we are ready to start adding applications to our cluster groups.
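The same check can be made from a command prompt with the cluster.exe utility that ships with the cluster administration tools. This is our own illustration, not part of the original procedure; Cluster Group is the default group name created by the wizard:

cluster group                          (lists all groups, their owning node, and their status)
cluster group "Cluster Group" /move    (moves the group to the other node)
cluster group                          (confirms the new owner)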


Configuring the cluster resources

Now it is time to configure the cluster resources. The default setup using the cluster service installation wizard is not optimal for our Tivoli environment. For the scenarios used later in this book, we have to set up the cluster resources for a mutual takeover scenario. To support this, we have to modify the current resource groups and add two resources. Figure 3-70 illustrates the desired configuration.

Figure 3-70 Cluster resource diagram

(Figure 3-70 details: on tivw2k1, the TIVW2KV1 resource group holds Drive X:, IP address 9.3.4.199, and network name TIVW2KV1; on tivw2k2, the TIVW2KV2 resource group holds Drives Y: and Z:, IP address 9.3.4.175, and network name TIVW2KV2.)

The following steps will guide you through the cluster configuration.


1. The first step is to rename the cluster resource groups.

a. Right-click the cluster group containing the Y: and Z: drive resource and select Rename (Figure 3-71). Enter the name TIVW2KV1.

b. Right-click the cluster group containing the X: drive resource and select Rename. Enter the name TIVW2KV2.

Figure 3-71 Rename the cluster resource groups


2. Now we will need to move the disk resources to the correct groups.

a. Right-click the Disk Y: Z: resource under the TIVW2KV1 resource group and select Change Group -> TIVW2KV2 as shown in Figure 3-72.

Figure 3-72 Changing resource groups

b. Press Yes to complete the move (Figure 3-73).

Figure 3-73 Resource move confirmation

c. Right-click the Disk X: resource under the TIVW2KV2 resource group and select Change Group -> TIVW2KV1.

d. Press Yes to complete the move.


3. The next step is to rename the resources. We do this so we can determine which resource group a resource belongs to by its name.

a. Right-click the Cluster IP Address resource under the TIVW2KV1 resource group and select Rename (Figure 3-74). Enter the name TIVW2KV1 - Cluster IP Address.

b. Right-click the Cluster Name resource under the TIVW2KV1 resource group and select Rename. Enter the name TIVW2KV1 - Cluster Name.

c. Right-click the Disk X: resource under the TIVW2KV1 resource group and select Rename. Enter the name TIVW2KV1 - Disk X:.

d. Right-click the Disk Y: Z: resource under the TIVW2KV2 resource group and select Rename. Enter the name TIVW2KV2 - Disk Y: Z:.

Figure 3-74 Rename resources


4. We now need to add two resources under the TIVW2KV2 resource group. The first resource we will add is the IP Address resource.

a. Right-click the TIVW2KV2 resource group and select New -> Resource (Figure 3-75).

Figure 3-75 Add a new resource


b. Enter TIVW2KV2 - IP Address in the name field and set the resource type to IP address. Click Next (Figure 3-76).

Figure 3-76 Name resource and select resource type


c. Select both TIVW2K1 and TIVW2K2 as possible owners of the resource. Click Next (Figure 3-77).

Figure 3-77 Select resource owners


d. Click Next past the dependencies screen; no dependencies need to be defined at this time (Figure 3-78).

Figure 3-78 Dependency configuration


e. The next step is to configure the IP address associated with the resource. Enter the IP address 9.3.4.175 in the Address field and add the subnet mask of 255.255.255.254. Make sure the Public Network Connection is selected in the Network field and the Enable NetBIOS for this address box is checked. Click Next (Figure 3-79).

Figure 3-79 Configure IP address

f. Click OK to complete the installation (Figure 3-80).

Figure 3-80 Completion dialog


5. Now that the IP address resource has been created, we need to create the Name resource for the TIVW2KV2 cluster group.

a. Right-click the TIVW2KV2 resource group and select New -> Resource (Figure 3-81).

Figure 3-81 Adding a new resource


b. Set the name of the resource to TIVW2KV2 - Cluster Name and specify the resource type to be Network Name. Click Next (Figure 3-82).

Figure 3-82 Specify resource name and type


c. Next select both TIVW2K1 and TIVW2K2 as possible owners of the resource. Click Next (Figure 3-83).

Figure 3-83 Select resource owners


d. Click Next in the Dependencies screen (Figure 3-84). We do not need to configure these at this time.

Figure 3-84 Resource dependency configuration


e. Next we will enter the cluster name for the TIVW2KV2 resource group. Enter the cluster name TIVW2KV2 in the Name field. Click Next (Figure 3-85).

Figure 3-85 Cluster name

f. Click OK to complete the cluster name configuration (Figure 3-86).

Figure 3-86 Completion dialog


6. The final step of the cluster configuration is to bring the TIVW2KV2 resource group online. To do this, right-click the TIVW2KV2 resource group and select Bring Online (Figure 3-87).

Figure 3-87 Bring resource group online

This concludes our cluster configuration.
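If you prefer the command line, the new group can also be brought online and checked with cluster.exe; again, this is our own sketch rather than part of the original steps:

cluster group "TIVW2KV2" /online   (brings the TIVW2KV2 resource group online)
cluster resource                   (lists every resource, its group, and its state)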


Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster

In this chapter, we cover the implementation of IBM Tivoli Workload Scheduler in an HACMP cluster and in an MSCS cluster.

The chapter is divided into the following main sections:

� “Implementing IBM Tivoli Workload Scheduler in an HACMP cluster” on page 184

� “Implementing IBM Tivoli Workload Scheduler in a Microsoft Cluster” on page 347


4.1 Implementing IBM Tivoli Workload Scheduler in an HACMP cluster

In this section, we describe the steps to implement IBM Tivoli Workload Scheduler in an HACMP cluster. We use the mutual takeover scenario described in 3.1.1, “Mutual takeover for IBM Tivoli Workload Scheduler” on page 64.

4.1.1 IBM Tivoli Workload Scheduler implementation overview

Figure 4-1 on page 185 shows a diagram of an IBM Tivoli Workload Scheduler implementation in a mutual takeover HACMP cluster. Using this diagram, we will describe how IBM Tivoli Workload Scheduler could be implemented, and what you should be aware of. Though we do not describe a hot standby scenario of IBM Tivoli Workload Scheduler, the steps used to configure IBM Tivoli Workload Scheduler for a mutual takeover scenario also cover what should be done for a hot standby scenario.

Note: In this section we assume that you have finished planning your cluster and have also finished the preparation tasks to install HACMP. If you have not finished these tasks, perform the steps described in Chapter 3, “Planning and Designing an HACMP Cluster”, and the preparation tasks described in Chapter 3 “Installing HACMP”. We strongly recommend that you install IBM Tivoli Workload Scheduler before HACMP, and confirm that IBM Tivoli Workload Scheduler runs without any problem.

It is important that you also confirm that IBM Tivoli Workload Scheduler is able to fallover and fallback between nodes, by manually moving the volume group between nodes. This verification procedure is described in “Verify IBM Tivoli Workload Scheduler behavior in HACMP cluster” on page 202.


Figure 4-1 IBM Tivoli Workload Scheduler implementation overview

To make IBM Tivoli Workload Scheduler highly available in an HACMP cluster, the IBM Tivoli Workload Scheduler instance should be installed on the external shared disk. This means that the /TWShome directory should reside on the shared disk and not on a locally attached disk. This is the minimum requirement for enabling HACMP to relocate the IBM Tivoli Workload Scheduler engine from one node to another, along with other system components such as external disks and service IP labels.

When implementing IBM Tivoli Workload Scheduler in a cluster, there are certain items you should be aware of, such as the location of the IBM Tivoli Workload Scheduler engine and the IP address used for IBM Tivoli Workload Scheduler workstation definition. Specifically for a mutual takeover scenario, you have more to consider, as there will be multiple instances of IBM Tivoli Workload Scheduler running on one node.

Following are the considerations to keep in mind when implementing IBM Tivoli Workload Scheduler in an HACMP cluster. They apply to the Master Domain Manager, Domain Manager, Backup Domain Manager, and FTA.

(Figure 4-1 details: cluster cltivoli consists of nodes tivaix1 and tivaix2. TWS Engine1 uses netman port 31111 and service IP tivaix1_svc, with mount point /usr/maestro and user maestro; TWS Engine2 uses netman port 31112 and service IP tivaix2_svc, with mount point /usr/maestro2 and user maestro2. Both mount points and both users are defined on both nodes.)


� Location of IBM Tivoli Workload Scheduler engine executables

As mentioned earlier, the IBM Tivoli Workload Scheduler engine should be installed on the external disk so that it can be serviced by HACMP. In order to have the same instance of IBM Tivoli Workload Scheduler process its jobs on another node after a fallover, the executables must be installed on the external disk. For Version 8.2, all files essential to IBM Tivoli Workload Scheduler processing are installed in the /TWShome directory. The /TWShome directory should reside on file systems on the shared disk.

For versions prior to 8.2, IBM Tivoli Workload Scheduler executables should be installed in a file system with the mount point above the /TWShome directory. For example, if /TWShome is /usr/maestro/maestro, the mount point should be /usr/maestro.

In a mutual takeover scenario, you may have a case where multiple instances of IBM Tivoli Workload Scheduler are installed on the shared disk. In such a case, make sure these instances are installed on separate file systems residing on separate volume groups.

� Creating mount points on standby nodes

Create a mount point for the IBM Tivoli Workload Scheduler file system on all nodes that may run that instance of IBM Tivoli Workload Scheduler. When configuring for mutual takeover, make sure that you create mount points for every IBM Tivoli Workload Scheduler instance that may run on a node.

In Figure 4-1 on page 185, nodes tivaix1 and tivaix2 may both have two instances of IBM Tivoli Workload Scheduler engine running in case of a node failure. Note that in the diagram, both nodes have mount points for TWS Engine1 and TWS Engine2.

� IBM Tivoli Workload Scheduler user account and group account

On each node, create an IBM Tivoli Workload Scheduler user and group for all IBM Tivoli Workload Scheduler instances that may run on the node. The user’s home directory must be set to /TWShome.

If an IBM Tivoli Workload Scheduler instance will fallover and fallback among several nodes in a cluster, make sure all of those nodes have the IBM Tivoli Workload Scheduler user and group defined to control that instance. In the mutual takeover scenario, you may have multiple instances running at the same time on one node. Make sure you create separate users for each IBM Tivoli Workload Scheduler instance in your cluster so that you can control them separately.

In our scenario, we add user maestro and user maestro2 on both nodes because TWS Engine1 and TWS Engine2 should be able to run on both nodes. The same group accounts should be created on both nodes to host these users.


� Netman port

When there will be only one instance of IBM Tivoli Workload Scheduler running on a node, using the default port (31111) is sufficient.

For a mutual takeover scenario, you need to set a different port number for each IBM Tivoli Workload Scheduler instance in the cluster. This is because several instances of IBM Tivoli Workload Scheduler may run on the same node, and no two IBM Tivoli Workload Scheduler instances on the same node should have the same netman port. In our scenario, we set the netman port of TWS Engine1 to 31111, and the netman port of TWS Engine2 to 31112.

� IP address

The IP address or IP label specified in the workstation definition should be the service IP address or service IP label used by HACMP. If you plan a fallover or a fallback for an IBM Tivoli Workload Scheduler instance, it should not use an IP address or IP label that is bound to a particular node. (Boot addresses and persistent addresses used in an HACMP cluster are normally bound to one node, so these should not be used.) This ensures that the IBM Tivoli Workload Scheduler instance does not lose its connection with other IBM Tivoli Workload Scheduler instances in case of a fallover or a fallback.

In our diagram, note that TWS Engine1 uses a service IP label called tivaix1_svc, and TWS Engine2 uses a service IP label called tivaix2_svc. These service IP addresses move along with the IBM Tivoli Workload Scheduler instance from one node to another.

� Starting and stopping IBM Tivoli Workload Scheduler instances

IBM Tivoli Workload Scheduler instances should be started and stopped from HACMP application server start and stop scripts. Create a custom script to start and stop each IBM Tivoli Workload Scheduler instance in your cluster; then, when configuring HACMP, associate your custom scripts with the resource groups that your IBM Tivoli Workload Scheduler instances reside in (a minimal sketch of such scripts follows this list).

If you put IBM Tivoli Workload Scheduler under the control of HACMP, it should not be started from /etc/inittab or in any other way except through the HACMP application start and stop scripts.

� Files installed on the local disk

Though most IBM Tivoli Workload Scheduler executables are installed in the IBM Tivoli Workload Scheduler file system, some files are installed on local disks. You may have to copy these local files to other nodes.

For IBM Tivoli Workload Scheduler 8.2, copy the /usr/Tivoli/TWS/TKG/3.1.5/lib/libatrc.a file.


For IBM Tivoli Workload Scheduler 8.1, you may need to copy the following files to any node in the cluster that will host the IBM Tivoli Workload Scheduler instance:

– /usr/unison/components
– /usr/lib/libatrc.a
– /usr/Tivoli/TWS/TKG/3.1.5/lib/libatrc.a

� Monitoring the IBM Tivoli Workload Scheduler process

HACMP is able to monitor application processes, and it can be configured to initiate a cluster event based on application process failures. When considering monitoring IBM Tivoli Workload Scheduler with HACMP’s application monitoring, keep in mind that IBM Tivoli Workload Scheduler stops and restarts all of its processes (excluding the netman process) every 24 hours. This recycling of the processes is initiated by the FINAL jobstream, which is set to run at a certain time every day.

Be aware that if you configure HACMP to initiate an action in the event of an IBM Tivoli Workload Scheduler process failure, this expected behavior could be interpreted as a failure of the IBM Tivoli Workload Scheduler processes and could trigger unwanted actions. If you simply want to monitor process failures, we recommend that you use monitoring software (for example, IBM Tivoli Monitoring).
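As an illustration of the start and stop scripts referred to above, the following is a minimal sketch of an HACMP application server start script and stop script for the TWS Engine1 instance in our scenario (user maestro, home directory /usr/maestro). The script names and contents are simplified assumptions of our own; the scripts actually used in this redbook are covered in “Add custom start and stop HACMP scripts” on page 234.

#!/bin/ksh
# start_tws_engine1 - minimal sketch of an HACMP application server start script.
# Assumption: TWS Engine1 is owned by user maestro and installed in /usr/maestro.
TWS_USER=maestro
TWS_HOME=/usr/maestro

# Start netman, then the rest of the engine, as the TWS user.
su - ${TWS_USER} -c "${TWS_HOME}/StartUp"
su - ${TWS_USER} -c "${TWS_HOME}/bin/conman start"
exit 0

#!/bin/ksh
# stop_tws_engine1 - minimal sketch of the matching stop script.
TWS_USER=maestro
TWS_HOME=/usr/maestro

# Stop all TWS processes, including netman, and wait until they are down.
su - ${TWS_USER} -c "${TWS_HOME}/bin/conman 'shut ;wait'"
exit 0

A second pair of scripts, with TWS_USER=maestro2 and TWS_HOME=/usr/maestro2, would be associated with the resource group of TWS Engine2.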

4.1.2 Preparing to install

Before installing IBM Tivoli Workload Scheduler in an HACMP cluster, define the IBM Tivoli Workload Scheduler group and user account on each node that will host IBM Tivoli Workload Scheduler. The following procedure presents an example of how to prepare for an installation of IBM Tivoli Workload Scheduler 8.2 on AIX 5.2. We assume that the IBM Tivoli Workload Scheduler file system is already created, as described in 3.2.3, “Planning and designing an HACMP cluster” on page 67.

In our scenario, we added a group named tivoli and users maestro and maestro2 on each node.

1. Creating group accounts

Execute the following steps on all nodes on which an IBM Tivoli Workload Scheduler instance will run.

a. Enter the following command; this will take you to the SMIT Groups menu:

# smitty groups

b. From the Groups menu, select Add a Group.

c. Enter a value for each of the following items:


Group NAME Assign a name for the group.

ADMINISTRATIVE Group true

Group ID Assign a group ID. Assign the same ID for all nodes in the cluster.

Figure 4-2 shows an example of adding a group. We added group tivoli with an ID 2000.

Figure 4-2 Adding a group

                                    Add a Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                     [Entry Fields]
* Group NAME                                      [tivoli]
  ADMINISTRATIVE group?                           true                   +
  Group ID                                        [2000]                 #
  USER list                                       []                     +
  ADMINISTRATOR list                              []                     +

F1=Help            F2=Refresh         F3=Cancel          F4=List
Esc+5=Reset        Esc+6=Command      Esc+7=Edit         Esc+8=Image
Esc+9=Shell        Esc+0=Exit         Enter=Do

2. Adding IBM Tivoli Workload Scheduler users

Perform the following procedures for all nodes in the cluster:

a. Enter the following command; this will take you to the SMIT Users menu:

# smitty user

b. From the Users menu, select Add a User.

c. Enter values for the following items, then press Enter. Leave the other items as they are.

User NAME Assign a name for the user.



User ID Assign an ID for the user. This ID for the user should be the same on all nodes.

ADMINISTRATIVE USER? false

Primary GROUP Set the group that you have defined in the previous step.

Group SET Set the primary group and the staff group.

HOME directory Set /TWShome.

Figure 4-3 shows an example of an IBM Tivoli Workload Scheduler user definition. In the example, we defined the maestro user.

Figure 4-3 Defining a user

                                    Add a User

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                [Entry Fields]
* User NAME                                       [maestro]
  User ID                                         [1001]                 #
  ADMINISTRATIVE USER?                            false                  +
  Primary GROUP                                   [tivoli]               +
  Group SET                                       [tivoli,staff]         +
  ADMINISTRATIVE GROUPS                           []                     +
  ROLES                                           []                     +
  Another user can SU TO USER?                    true                   +
  SU GROUPS                                       [ALL]                  +
  HOME directory                                  [/usr/maestro]
  Initial PROGRAM                                 []
  User INFORMATION                                []
  EXPIRATION date (MMDDhhmmyy)                    [0]
[MORE...37]

F1=Help            F2=Refresh         F3=Cancel          F4=List
Esc+5=Reset        Esc+6=Command      Esc+7=Edit         Esc+8=Image
Esc+9=Shell        Esc+0=Exit         Enter=Do

d. After you have added the user, modify the user’s $HOME/.profile. Modify the PATH variable to include the /TWShome and /TWShome/bin directories. This enables you to run IBM Tivoli Workload Scheduler commands from any directory as long as you are logged in as the IBM Tivoli Workload Scheduler user. Also add the TWS_TISDIR variable. The value for TWS_TISDIR should be the /TWShome directory. TWS_TISDIR enables IBM Tivoli Workload Scheduler to display messages in the correct language codeset. Example 4-1 shows how the variables should be defined. In the example, /usr/maestro is the /TWShome directory.

Example 4-1 An example .profile for TWSusr

PATH=/TWShome:/TWShome/bin:$PATH
export PATH
TWS_TISDIR=/usr/maestro
export TWS_TISDIR
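If you prefer the command line over SMIT, the same group and user could also be created with the standard AIX mkgroup and mkuser commands. The following is a minimal sketch for one node, assuming the group ID (2000), user ID (1001), and home directory (/usr/maestro) used in our example; repeat it with the same IDs on every node in the cluster.

# mkgroup -a id=2000 tivoli
# mkuser id=1001 pgrp=tivoli groups=tivoli,staff home=/usr/maestro maestro
# passwd maestro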

4.1.3 Installing the IBM Tivoli Workload Scheduler engine

In this section, we show you the steps to install the IBM Tivoli Workload Scheduler 8.2 engine (Master Domain Manager) from the command line. For procedures to install IBM Tivoli Workload Scheduler using the graphical user interface, refer to IBM Tivoli Workload Scheduler Planning and Installation Guide Version 8.2, SC32-1273.

In our scenario, we installed two TWS instances called TIVAIX1 and TIVAIX2 on a shared external disk. TIVAIX1 was installed from node tivaix1, and TIVAIX2 was installed from tivaix2. We used the following steps to do this.

1. Before installing, identify the following items. These items are required when running the installation script.

– workstation type - master

– workstation name - The name of the workstation. This is the value for the host field that you specify in the workstation definition. It will also be recorded in the globalopts file.

– netman port - Specify the listening port for netman. We remind you again that if you plan to have several instances of IBM Tivoli Workload Scheduler running on a machine, make sure you specify a different port number for each IBM Tivoli Workload Scheduler instance.

– company name - Specify this if you would like your company name in reports produced by IBM Tivoli Workload Scheduler report commands.

2. Log in to the node where you want to install the IBM Tivoli Workload Scheduler engine, as a root user.

3. Confirm that the IBM Tivoli Workload Scheduler file system is mounted. If it is not mounted, use the mount command to mount the IBM Tivoli Workload Scheduler file system.

4. Insert IBM Tivoli Workload Scheduler Installation Disk 1.


5. Locate the twsinst script in the directory for the platform on which you want to run the script. The following shows the general syntax for installing a Master Domain Manager:

# ./twsinst -new -uname twsusr -cputype master -thiscpu cpuname -master cpuname -port port_no -company company_name

Where:

– twsusr - The name of the IBM Tivoli Workload Scheduler user.

– master - The workstation type. Refer to IBM Tivoli Workload Scheduler Planning and Installation Guide Version 8.2, SC32-1273, for other options

– cpuname - The name of the workstation. For -thiscpu, specify the name of the workstation that you are installing. For -master, specify the name of the Master Domain Manager. When installing the Master Domain Manager, specify the same value for -thiscpu and -master.

– port_no - Specify the port number that netman uses to receive incoming messages from other workstations.

– company_name - The name of your company (optional)

Example 4-2 shows sample command syntax for installing Master Domain Manager TIVAIX1.

Example 4-2 twsinst script example for TIVAIX1

# ./twsinst -new -uname maestro -cputype master -thiscpu tivaix1 -master tivaix1 -port 31111 -company IBM

Example 4-3 shows sample command syntax for installing Master Domain Manager TIVAIX2.

Example 4-3 twsinst script example for TIVAIX2

# ./twsinst -new -uname maestro2 -cputype master -thiscpu tivaix2 -master tivaix2 -port 31112 -company IBM

4.1.4 Configuring the IBM Tivoli Workload Scheduler engine

After you have installed the IBM Tivoli Workload Scheduler engine as a Master Domain Manager, perform the following configuration tasks. These are the minimum tasks that you should perform to get the IBM Tivoli Workload Scheduler Master Domain Manager running. For instructions on configuring other types of workstations, such as Fault Tolerant Agents and Domain Managers, refer to Tivoli Workload Scheduler Job Scheduling Console User’s Guide, SH19-4552, or Tivoli Workload Scheduler Version 8.2, Reference Guide, SC32-1274.


Checking the workstation definition

In order to have IBM Tivoli Workload Scheduler serviced correctly by HACMP in the event of a fallover, it must have the service IP label or the service IP address defined in its workstation definition. When you install a Master Domain Manager (master), the workstation definition is added automatically. After you have installed IBM Tivoli Workload Scheduler, check the workstation definition of the master and verify that the service IP label or address is associated with the master.

1. Log in to the master workstation as the TWSuser.

2. Execute the following command; this opens a text editor with the master’s CPU definition:

$ composer "modify cpu=master_name"

Where:

– master_name - the workstation name of the master.

Example 4-4 and Example 4-5 give the workstation definition for workstations TIVAIX1 and TIVAIX2 that we installed. Notice that the value for NODE is set to the service IP label in each workstation definition.

Example 4-4 Workstation definition for TIVAIX1

CPUNAME TIVAIX1
  DESCRIPTION "MASTER CPU"
  OS UNIX
  NODE tivaix1_svc
  DOMAIN MASTERDM
  TCPADDR 31111
FOR MAESTRO
  AUTOLINK ON
  RESOLVEDEP ON
  FULLSTATUS ON
END

Example 4-5 Workstation definition for TIVAIX2

CPUNAME TIVAIX2
  DESCRIPTION "MASTER CPU"
  OS UNIX
  NODE tivaix2_svc
  DOMAIN MASTERDM
  TCPADDR 31112
FOR MAESTRO
  AUTOLINK ON
  RESOLVEDEP ON
  FULLSTATUS ON
END


3. If the value for NODE is correctly set to the service IP label, close the workstation definition. If it is not set correctly, modify the definition and save it.

Adding the FINAL jobstream

The FINAL jobstream is responsible for generating the daily production files. Without this jobstream, IBM Tivoli Workload Scheduler is unable to perform daily job processing. IBM Tivoli Workload Scheduler provides a definition file that you can use to add the FINAL jobstream. The following steps describe how to add the FINAL jobstream using this file.

1. Log in as the IBM Tivoli Workload Scheduler user.

2. Add the FINAL schedule by running the following command.

$ composer "add Sfinal"

3. Run Jnextday to create the production file.

$ Jnextday

4. Check the status of IBM Tivoli Workload Scheduler by issuing the following command.

$ conman status

If IBM Tivoli Workload Scheduler started correctly, the status should be Batchman=LIVES.

5. Check that all IBM Tivoli Workload Scheduler processes (netman, mailman, batchman, jobman) are running. Example 4-6 illustrates checking for the IBM Tivoli Workload Scheduler process.

Example 4-6 Checking for IBM Tivoli Workload Scheduler process

$ ps -ef | grep -v grep | grep maestro
maestro2 14484 31270   0 16:59:41      -  0:00 /usr/maestro2/bin/batchman -parm 32000
maestro2 16310 13940   1 16:00:29  pts/0  0:00 -ksh
maestro2 26950     1   0 22:38:59      -  0:00 /usr/maestro2/bin/netman
maestro2 28658 16310   2 17:00:07  pts/0  0:00 ps -ef
    root 29968 14484   0 16:59:41      -  0:00 /usr/maestro2/bin/jobman
maestro2 31270 26950   0 16:59:41      -  0:00 /usr/maestro2/bin/mailman -parm 32000 -- 2002 TIVAIX2 CONMAN UNIX 8.2 MESSAGE
$

4.1.5 Installing IBM Tivoli Workload Scheduler Connector

If you plan to use JSC to perform administration tasks for IBM Tivoli Workload Scheduler, install the IBM Tivoli Workload Scheduler Connector. The IBM Tivoli Workload Scheduler Connector must be installed on any TMR server or Managed Node that is running an IBM Tivoli Workload Scheduler Master Domain Manager. Optionally, it can be installed on any Domain Manager or FTA, provided that a Managed Node is also installed there.

Here we describe the steps to install Job Scheduling Services (a prerequisite to install IBM Tivoli Workload Scheduler Connector) and IBM Tivoli Workload Scheduler Connector by using the command line. For instructions on installing IBM Tivoli Workload Scheduler Connector from the Tivoli Desktop, refer to Tivoli Workload Scheduler Job Scheduling Console User’s Guide, SH19-4552.

For our mutual takeover scenario, each node in our two-node HACMP cluster (tivaix1, tivaix2) hosts a TMR server. We installed IBM Tivoli Workload Scheduler Connector on each of the two cluster nodes.

1. Before installing, identify the following items. These items are required when running the IBM Tivoli Workload Scheduler Connector installation script.

– Node name to install IBM Tivoli Workload Scheduler Connector - This must be the name defined in the Tivoli Management Framework.

– The full path to the installation image - For Job Scheduling Services, it is the directory with the TMF_JSS.IND file. For IBM Tivoli Workload Scheduler Connector, it is the directory with the TWS_CONN.IND file.

– IBM Tivoli Workload Scheduler installation directory - The /TWShome directory.

– Connector instance name - A name for the connector instance.

– Instance Owner - The name of the IBM Tivoli Workload Scheduler user.

2. Insert the IBM Tivoli Workload Scheduler Installation Disk 1.

3. Log in on the TMR server as root user.

4. Run the following command to source the Tivoli environment variables:

# . /etc/Tivoli/setup_env.sh

5. Run the following command to install Job Scheduling Services:

# winstall -c install_dir -i TMF_JSS nodename

Where:

– install_dir - the path to the installation image

Note: Tivoli Management Framework must be installed prior to the IBM Tivoli Workload Scheduler Connector installation. For instructions on installing a TMR server, refer to Chapter 5, “Implement IBM Tivoli Management Framework in a cluster” on page 415, or to Tivoli Enterprise Installation Guide Version 4.1, GC32-0804. In this section, we assume that you have already installed Tivoli Management Framework and have applied the latest set of fix packs.


– nodename - the name of the TMR server or the Managed Node that you are installing JSS on.

The command will perform a prerequisite verification, and you will be prompted to proceed with the installation or not.

Example 4-7 illustrates the execution of the command.

Example 4-7 Installing JSS from the command line

# winstall -c /usr/sys/inst.images/tivoli/wkb/TWS820_1/TWS_CONN -i TMF_JSS tivaix1

Checking product dependencies...
  Product TMF_3.7.1 is already installed as needed.
Dependency check completed.
Inspecting node tivaix2...
Installing Product: Tivoli Job Scheduling Services v1.2

Unless you cancel, the following operations will be executed:
  For the machines in the independent class:
    hosts: tivaix2
    need to copy the CAT (generic) to:
      tivaix2:/usr/local/Tivoli/msg_cat

  For the machines in the aix4-r1 class:
    hosts: tivaix2
    need to copy the BIN (aix4-r1) to:
      tivaix2:/usr/local/Tivoli/bin/aix4-r1
    need to copy the ALIDB (aix4-r1) to:
      tivaix2:/usr/local/Tivoli/spool/tivaix2.db

Continue([y]/n)?

Creating product installation description object...Created.
Executing queued operation(s)
Distributing machine independent Message Catalogs --> tivaix2
  Completed.

Distributing architecture specific Binaries --> tivaix2
  Completed.

Distributing architecture specific Server Database --> tivaix2
....Product install completed successfully.
  Completed.

Registering product installation attributes...Registered.


6. Verify that Job Scheduling Services was installed by running the following command:

# wlsinst -p

This command shows a list of all the Tivoli products installed in your environment. You should see “Tivoli Job Scheduling Services v1.2” in the list. Example 4-8 shows an example of the command output; the Tivoli Job Scheduling Services entry confirms that JSS was installed successfully.

Example 4-8 wlsinst -p command output

# wlsinst -p
Tivoli Management Framework 4.1
Tivoli ADE, Version 4.1 (build 09/19)
Tivoli AEF, Version 4.1 (build 09/19)
Tivoli Java Client Framework 4.1
Java 1.3 for Tivoli
Tivoli Java RDBMS Interface Module (JRIM) 4.1
JavaHelp 1.0 for Tivoli 4.1
Tivoli Software Installation Service Client, Version 4.1
Tivoli Software Installation Service Depot, Version 4.1
Tivoli Job Scheduling Services v1.2
Distribution Status Console, Version 4.1
#

7. To install IBM Tivoli Workload Scheduler Connector, run the following command:

# winstall -c install_dir -i TWS_CONN twsdir=/TWShome iname=instance owner=twsuser createinst=1 nodename

Where:

– Install_dir - the path of the installation image.

– twsdir - set this to /TWShome.

– iname - the name of the IBM Tivoli Workload Scheduler Connector instance.

– owner - the name of the IBM Tivoli Workload Scheduler user.

8. Verify that IBM Tivoli Workload Scheduler Connector was installed by running the following command.

# wlsinst -p

This command shows a list of all the Tivoli products installed in your environment. You should see in the list “TWS Connector 8.2”.

Example 4-9 shows an example of the command output; the Tivoli TWS Connector 8.2 entry shows that the IBM Tivoli Workload Scheduler Connector was installed successfully.


Example 4-9 wlsinst -p command output

# wlsinst -p
Tivoli Management Framework 4.1
Tivoli ADE, Version 4.1 (build 09/19)
Tivoli AEF, Version 4.1 (build 09/19)
Tivoli Java Client Framework 4.1
Java 1.3 for Tivoli
Tivoli Java RDBMS Interface Module (JRIM) 4.1
JavaHelp 1.0 for Tivoli 4.1
Tivoli Software Installation Service Client, Version 4.1
Tivoli Software Installation Service Depot, Version 4.1
Tivoli Job Scheduling Services v1.2
Tivoli TWS Connector 8.2
Distribution Status Console, Version 4.1

#

4.1.6 Setting the security

After you have installed the IBM Tivoli Workload Scheduler Connectors, apply changes to the IBM Tivoli Workload Scheduler Security file so that users can access IBM Tivoli Workload Scheduler through JSC. If you grant access to a Tivoli Administrator, then any operating system user associated with that Tivoli Administrator is granted access through JSC. For more information on the IBM Tivoli Workload Scheduler Security file, refer to Tivoli Workload Scheduler Version 8.2 Installation Guide, SC32-1273. To modify the Security file, follow the procedures described in this section.

For our scenario, we added the names of two Tivoli Administrators, Root_tivaix1-region and Root_tivaix2-region, to the Security file of each Master Domain Manager. Root_tivaix1-region is a Tivoli Administrator on tivaix1, and Root_tivaix2-region is a Tivoli Administrator on tivaix2. This makes each IBM Tivoli Workload Scheduler Master Domain Manager accessible from either of the two TMR servers. In the event of a fallover, the IBM Tivoli Workload Scheduler Master Domain Manager remains accessible from JSC through the Tivoli Administrator on the surviving node.

1. Log in to the IBM Tivoli Workload Scheduler master as the TWSuser (the user you used to install IBM Tivoli Workload Scheduler).

2. Run the following command to dump the Security file to a text file.

# dumpsec > /tmp/sec.txt

3. Modify the text file and save your changes. Add the names of the Tivoli Administrators to the LOGON clause.


Example 4-10 illustrates such a Security file. This Security file grants full privileged access to the Tivoli Administrators Root_tivaix1-region and Root_tivaix2-region.

Example 4-10 Example of a security file

USER MAESTRO
    CPU=@+LOGON=maestro,root,Root_tivaix2-region,Root_tivaix1-region
BEGIN
    USEROBJ   CPU=@   ACCESS=ADD,DELETE,DISPLAY,MODIFY,ALTPASS
    JOB       CPU=@   ACCESS=ADD,ADDDEP,ALTPRI,CANCEL,CONFIRM,DELDEP,DELETE,DISPLAY,KILL,MODIFY,RELEASE,REPLY,RERUN,SUBMIT,USE,LIST
    SCHEDULE  CPU=@   ACCESS=ADD,ADDDEP,ALTPRI,CANCEL,DELDEP,DELETE,DISPLAY,LIMIT,MODIFY,RELEASE,REPLY,SUBMIT,LIST
    RESOURCE  CPU=@   ACCESS=ADD,DELETE,DISPLAY,MODIFY,RESOURCE,USE,LIST
    PROMPT            ACCESS=ADD,DELETE,DISPLAY,MODIFY,REPLY,USE,LIST
    FILE      NAME=@  ACCESS=CLEAN,DELETE,DISPLAY,MODIFY
    CPU       CPU=@   ACCESS=ADD,CONSOLE,DELETE,DISPLAY,FENCE,LIMIT,LINK,MODIFY,SHUTDOWN,START,STOP,UNLINK,LIST
    PARAMETER CPU=@   ACCESS=ADD,DELETE,DISPLAY,MODIFY
    CALENDAR          ACCESS=ADD,DELETE,DISPLAY,MODIFY,USE
END

4. Verify your security file by running the following command. Make sure that no errors or warnings are displayed.

$ makesec -v /tmp/sec.txt

Example 4-11 shows the sample output of the makesec -v command:

Example 4-11 Output of makesec -v command

$ makesec -v /tmp/sec.txt
TWS for UNIX (AIX)/MAKESEC 8.2 (9.3.1.1)
Licensed Materials Property of IBM
5698-WKB
(C) Copyright IBM Corp 1998,2003
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
MAKESEC:Starting user MAESTRO [/tmp/sec.txt (#2)]
MAKESEC:Done with /tmp/sec.txt, 0 errors (0 Total)
$

Note: Running the makesec command with the -v option only verifies the security file for syntax errors. It does not update the security database.

5. If there are no errors, compile the security file with the following command:

$ makesec /tmp/sec.txt

Example 4-12 illustrates output of the makesec command:

Example 4-12 Output of makesec command

$ makesec /tmp/sec.txt
TWS for UNIX (AIX)/MAKESEC 8.2 (9.3.1.1)
Licensed Materials Property of IBM
5698-WKB
(C) Copyright IBM Corp 1998,2003
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
MAKESEC:Starting user MAESTRO [/tmp/sec.txt (#2)]
MAKESEC:Done with /tmp/sec.txt, 0 errors (0 Total)
MAKESEC:Security file installed as /usr/maestro/Security
$

6. When applying changes to the security file, the connector instance should be stopped to allow the change to take effect. Run the following commands to source the Tivoli environment variables and stop the connector instance:

$ . /etc/Tivoli/setup_env.sh
$ wmaeutil inst_name -stop "*"

where inst_name is the name of the instance you would like to stop.

Example 4-13 shows an example of wmaeutil command to stop a connector instance called TIVAIX1.

Example 4-13 Output of wmaeutil command

$ . /etc/Tivoli/setup_env.sh
$ wmaeutil TIVAIX1 -stop "*"
AWSBCT758I Done stopping the ENGINE server
AWSBCT758I Done stopping the DATABASE server
AWSBCT758I Done stopping the PLAN server
$

Note: You do not need to manually restart the connector instance, as it is automatically started when a user logs in to JSC.


7. Verify that the changes in the security file are effective by running the dumpsec command. This dumps the current contents of the security file into a text file. Open the text file and confirm that the change you made is reflected:

$ dumpsec > filename

where filename is the name of the text file.

8. Verify that the changes are effective by logging into JSC as a user you have added in the security file.

4.1.7 Add additional IBM Tivoli Workload Scheduler Connector instance

One IBM Tivoli Workload Scheduler Connector instance can be mapped to only one IBM Tivoli Workload Scheduler instance. In our mutual takeover scenario, one TMR server would host two instances of IBM Tivoli Workload Scheduler if a fallover occurs. An additional IBM Tivoli Workload Scheduler Connector instance is required on each node so that a user can access both instances of IBM Tivoli Workload Scheduler on the surviving node.

We added a connector instance to each node to control both IBM Tivoli Workload Scheduler Master Domain Manager TIVAIX1 and TIVAIX2. To add an additional IBM Tivoli Workload Scheduler Connector Instance, perform the following tasks.

1. Log into a cluster node as root.

2. Source the Tivoli environment variables by running the following command:

# . /etc/Tivoli/setup_env.sh

3. List the existing connector instance:

# wlookup -ar MaestroEngine

Example 4-14 on page 201 shows one IBM Tivoli Workload Scheduler Connector instance called TIVAIX1.

Example 4-14 Output of wlookup command before adding additional instance

# wlookup -ar MaestroEngine
TIVAIX1         1394109314.1.661#Maestro::Engine#

Note: You must install the Job Scheduling Services and IBM Tivoli Workload Scheduler Connector Framework products before performing these tasks.


4. Add an additional connector instance:

# wtwsconn.sh -create -n instance_name -t TWS_directory

where:

instance_name - the name of the instance you would like to add.

TWS_directory - the path where the IBM Tivoli Workload Scheduler engine associated with the instance resides.

Example 4-15 shows output of the wtwsconn.sh command. We added an IBM Tivoli Workload Scheduler Connector instance called TIVAIX2. This instance is for accessing the IBM Tivoli Workload Scheduler engine installed in the /usr/maestro2 directory.

Example 4-15 Sample wtwsconn.sh command

# wtwsconn.sh -create -n TIVAIX2 -t /usr/maestro2
Scheduler engine created
Created instance: TIVAIX2, on node: tivaix1
MaestroEngine 'maestroHomeDir' attribute set to: /usr/maestro2
MaestroPlan 'maestroHomeDir' attribute set to: /usr/maestro2
MaestroDatabase 'maestroHomeDir' attribute set to: /usr/maestro2

5. Run the wlookup -ar command again to verify that the instance was successfully added. The IBM Tivoli Workload Scheduler Connector that you have just added should show up in the list.

# wlookup -ar MaestroEngine

Example 4-16 shows that IBM Tivoli Workload Scheduler Connector instance TIVAIX2 is added to the list.

Example 4-16 Output of wlookup command after adding additional instance

# wlookup -ar MaestroEngine
TIVAIX1         1394109314.1.661#Maestro::Engine#
TIVAIX2         1394109314.1.667#Maestro::Engine#
#

4.1.8 Verify IBM Tivoli Workload Scheduler behavior in HACMP cluster

When you have finished installing IBM Tivoli Workload Scheduler, verify that IBM Tivoli Workload Scheduler is able to move from one node to another, and that it is able to run on the standby node(s) in the cluster.

It is important that you perform this task manually before applying fix packs, and also before you install HACMP. Making sure that IBM Tivoli Workload Scheduler behaves as expected before each major change simplifies troubleshooting in case you have issues with IBM Tivoli Workload Scheduler. If you apply IBM Tivoli Workload Scheduler fix packs and install HACMP, and then find out that IBM Tivoli Workload Scheduler behaves unexpectedly, it is difficult to determine the cause of the problem. Though it may seem cumbersome, we strongly recommend that you verify IBM Tivoli Workload Scheduler behavior before you make any change to the system. The sequence of the verification is as follows.

1. Stop IBM Tivoli Workload Scheduler on a cluster node. Log in as TWSuser and run the following command:

$ conman "shut ;wait"

2. Migrate the volume group to another node. Refer to the volume group migration procedure described in “Define the shared LVM components” on page 94.

3. Start IBM Tivoli Workload Scheduler on the node by running the conman start command:

$ conman start

4. Verify the batchman status. Make sure the Batchman status is LIVES.

$ conman status

5. Verify that all IBM Tivoli Workload Scheduler processes are running by issuing the ps command:

$ ps -ef | grep -v grep | grep maestro

Example 4-17 shows an example of ps command output. Check that netman, mailman, batchman and jobman processes are running for each IBM Tivoli Workload Scheduler instance installed.

Example 4-17 Output of ps command

$ ps -ef | grep -v grep | grep maestro
 maestro 26378 43010   1 18:46:58  pts/1  0:00 -ksh
    root 30102 34192   0 18:49:59      -  0:00 /usr/maestro/bin/jobman
 maestro 33836 38244   0 18:49:59      -  0:00 /usr/maestro/bin/mailman -parm 32000 -- 2002 TIVAIX1 CONMAN UNIX 8.2 MESSAGE
 maestro 34192 33836   0 18:49:59      -  0:00 /usr/maestro/bin/batchman -parm 32000
 maestro 38244     1   0 18:49:48      -  0:00 /usr/maestro/bin/netman
 maestro 41214 26378   4 18:54:52  pts/1  0:00 ps -ef
$


6. If using JSC, log into the IBM Tivoli Workload Scheduler Master Domain Manager. Verify that you are able to see the scheduling objects and the production plan.
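The checks in steps 3 through 5 can also be scripted. The following is a minimal sketch of our own (it is not shipped with IBM Tivoli Workload Scheduler) that the TWS user could run after each manual volume group move; the script name and the default home directory are assumptions taken from our TWS Engine1 instance.

#!/bin/ksh
# check_tws - minimal verification sketch; run as the TWS user so that conman is in the PATH.
TWS_HOME=${1:-/usr/maestro}     # assumption: TWS Engine1 home directory

# Batchman status: conman reports LIVES when the engine is processing normally.
if conman status 2>/dev/null | grep LIVES > /dev/null; then
    echo "batchman status: LIVES"
else
    echo "batchman status: not LIVES - investigate before continuing"
fi

# Core processes: netman, mailman, batchman and jobman should all be running.
for p in netman mailman batchman jobman; do
    if ps -ef | grep -v grep | grep "${TWS_HOME}/bin/${p}" > /dev/null; then
        echo "${p}: running"
    else
        echo "${p}: NOT running"
    fi
done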

4.1.9 Applying IBM Tivoli Workload Scheduler fix pack

When you have completed installing IBM Tivoli Workload Scheduler and the IBM Tivoli Workload Scheduler Connector, apply the latest fix pack available. For instructions on installing the fix pack for the IBM Tivoli Workload Scheduler engine, refer to the README file included in each fix pack. The IBM Tivoli Workload Scheduler engine fix pack can be applied either from the command line by using the twspatch script, or from the Java-based graphical user interface.

The IBM Tivoli Workload Scheduler Connector fix pack is applied from the Tivoli Desktop. Because instructions on applying the IBM Tivoli Workload Scheduler Connector fix pack are not documented in the fix pack README, we describe the procedure here.

Before applying any of the fix packs, make sure you have a viable backup.

Applying IBM Tivoli Workload Scheduler Connector fix pack from Tivoli Desktop

Install the IBM Tivoli Workload Scheduler Connector fix pack as follows:

1. Set the installation media. If you are using a CD, then insert the CD. If you have downloaded the fix pack from the fix pack download site, then extract the tar file in a temporary directory.

2. Log in to TMR server using the Tivoli Desktop. Enter the host machine name, user name, and password, then press OK as seen in Figure 4-4 on page 205.

Note: The same level of fix pack should be applied to the IBM Tivoli Workload Scheduler engine and the IBM Tivoli Workload Scheduler Connector. If you apply a fix pack to the IBM Tivoli Workload Scheduler engine, make sure you apply the same level of fix pack for IBM Tivoli Workload Scheduler Connector.


Figure 4-4 Logging into IBM Tivoli Management Framework through the Tivoli Desktop

3. Select Desktop -> Install -> Install Patch as seen in Figure 4-5.

Figure 4-5 Installing the fix pack

4. If the error message in Figure 4-6 on page 206 is shown, press OK and proceed to the next step.


Figure 4-6 Error message

5. In the Path Name field, enter the full path of the installation image, as shown in Figure 4-7. The full path should be the directory where the U2_TWS.IND file resides.

Figure 4-7 Specifying the path to the installation image

6. In the Install Patch dialog (Figure 4-8 on page 207), select the fix pack from the Select Patches to Install list. Then make sure the node to install the fix pack is shown in the Clients to Install On list. Press Install.


Figure 4-8 Install Patch

7. Pre-installation verification is performed, and then you are prompted to continue or not. If there are no errors or warnings shown in the dialog, press Continue Install (Figure 4-9 on page 208).


Figure 4-9 Patch Installation

8. Confirm the “Finished Patch Installation” message, then press Close.

9. Log in, as root user, to the node where you just installed the fix pack.

10. Source the Tivoli environment variables:

# . /etc/Tivoli/setup_env.sh

11. Verify that the fix pack was installed successfully:

# wlsinst -P

For IBM Tivoli Workload Scheduler Connector Fix Pack 01, confirm that Tivoli TWS Connector upgrade to v8.2 patch 1 is included in the list. For Fix Pack 02, confirm that Tivoli TWS Connector upgrade to v8.2 patch 2 is included in the list. Example 4-18 on page 209 shows an output of the wlsinst command after installing Fix Pack 01.


Example 4-18 Verifying the fix pack installation

# wlsinst -P
4.1-TMF-0008 Tier 2 3.7 Endpoint Bundles for Tier1 Gateways
Tivoli Framework Patch 4.1-TMF-0013 (build 05/28)
Tivoli Framework Patch 4.1-TMF-0014 (build 05/30)
Tivoli Framework Patch 4.1-TMF-0015 for linux-ppc (LCF41) (build 05/14)
Tivoli Management Agent 4.1 for iSeries Endpoint (41016)
Tivoli Framework Patch 4.1-TMF-0034 (build 10/17)
Java 1.3 for Tivoli, United Linux
Tivoli Management Framework, Version 4.1 [2928] os400 Endpoint French language
Tivoli Management Framework, Version 4.1 [2929] os400 Endpoint German language
Tivoli Management Framework, Version 4.1 [2931] os400 Endpoint Spanish language
Tivoli Management Framework, Version 4.1 [2932] os400 Endpoint Italian language
Tivoli Management Framework, Version 4.1 [2962] os400 Endpoint Japanese language
Tivoli Management Framework, Version 4.1 [2980] os400 Endpoint Brazilian Portuguese language
Tivoli Management Framework, Version 4.1 [2984] os400 Endpoint DBCS English language
Tivoli Management Framework, Version 4.1 [2986] os400 Endpoint Korean language
Tivoli Management Framework, Version 4.1 [2987] os400 Endpoint Traditional Chinese language
Tivoli Management Framework, Version 4.1 [2989] os400 Endpoint Simplified Chinese language
Tivoli TWS Connector upgrade to v8.2 patch 1
#

Best practices for applying IBM Tivoli Workload Scheduler fix pack

As of December 2003, the latest fix pack for IBM Tivoli Workload Scheduler 8.2 is 8.2-TWS-FP02. Because 8.2-TWS-FP02 is dependent on 8.2-TWS-FP01, we applied both fix packs. Here are some hints and tips for applying these fix packs.

Additional disk space required for backup files

Though not mentioned in the README for 8.2-TWS-FP01, a backup copy of the existing binaries is created under the home directory of the user applying the fix. We apply the IBM Tivoli Workload Scheduler fix pack as the root user, which means that the backup is created under the home directory of the root user.

Before applying the fix, confirm that you have enough space in that directory for the backup file; for UNIX systems, it is 25 MB. If you do not have enough space in that directory, the fix pack installation may fail with the message shown in Example 4-19. This example shows the installation failure message when installation of the fix pack was initiated from the command line.

Example 4-19 IBM Tivoli Workload Scheduler fix pack installation error

# ./twspatch -install -uname maestro2

Licensed Materials Property of IBM
TWS-WSH
(C) Copyright IBM Corp 1998,2003
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
TWS for UNIX/TWSPATCH 8.2
Revision: 1.5

AWSFAF027E Error: Operation INSTALL failed. For more details see the
/tmp/FP_TWS_AIX_maestro2^8.2.0.01.log log file.
#

Check the fix pack installation log file. On UNIX systems, fix pack installation log files are saved in the /tmp directory and are named twspatchXXXXX.log, where XXXXX is a 5-digit random number. Example 4-20 shows the log file we received when we had insufficient disk space.

Example 4-20 The contents of /tmp/twspatchXXXXX.log

Tue Dec 2 19:24:53 CST 2003

DISSE0006E Operation unsuccessful: fatal failure.

If you do not have sufficient disk space in the desired directory, you can either add disk space or change the backup directory to another directory with sufficient disk space. For instructions on how to change the backup directory, refer to the README file attached to 8.2-TWS-FP02.

Note: Changing the backup directory requires a modification of a file used by IBM Tivoli Configuration Manager 4.2, and changes may affect the behavior of TCM 4.2 if you have it installed on your system. Consult your IBM service provider for more information.
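Before starting the fix pack installation, it is worth checking that the file system holding the installing user’s home directory has at least the 25 MB required for the backup. The following is a minimal sketch using standard AIX commands; the 25 MB figure is the one given above.

# Determine the home directory of the user applying the fix pack (root in our case).
HOME_DIR=$(lsuser -a home root | sed 's/.*home=//')

# Display free space, in MB, for the file system that contains that directory.
# The Free column should show at least 25 MB before you apply the fix pack.
df -m "${HOME_DIR}"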

4.1.10 Configure HACMP for IBM Tivoli Workload Scheduler

After you complete the installation of the application server (IBM Tivoli Workload Scheduler, in this redbook) and then HACMP, you configure HACMP as you planned in 3.2.3, “Planning and designing an HACMP cluster” on page 67, so that the application server can be made highly available.

In this section we show how to configure HACMP specifically for IBM Tivoli Workload Scheduler. Configuration of HACMP 5.1 can be carried out through the HACMP menu of the SMIT interface, or by the Online Planning Worksheets tool shipped with the HACMP 5.1 software. In this and in the following sections, we describe the steps to configure HACMP using the SMIT interface to support IBM Tivoli Workload Scheduler. We walk you through a series of steps that are specifically tailored to make the following scenarios highly available:

� IBM Tivoli Workload Scheduler

� IBM Tivoli Workload Scheduler with IBM Tivoli Management Framework

� IBM Tivoli Management Framework (shown in Chapter 5, “Implement IBM Tivoli Management Framework in a cluster” on page 415)

The Online Planning Worksheet is a Java-based worksheet that helps you plan your HACMP configuration. It generates a configuration file, based on the information you have entered, that can be loaded directly into a live HACMP cluster, and it also generates a convenient HTML page documenting the configuration. We do not show how to use this worksheet here; for a complete and detailed explanation of this worksheet, see Chapter 16, “Using Online Planning Worksheets”, in High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00.

Note: We strongly recommend that you install your application servers and ensure they function properly before installing HACMP. In the environment we used for this redbook, we installed IBM Tivoli Workload Scheduler and/or IBM Tivoli Management Framework as called for by the scenarios we implement. This section is specifically oriented towards showing you how to configure HACMP for IBM Tivoli Workload Scheduler in a mutual takeover cluster.

Note: There are many other possible scenarios, and many features that are not used by our configuration in this redbook are not covered in the following sections. Any other scenario should be planned and configured using the HACMP manuals and IBM Redbooks, or consult your IBM service provider for assistance in planning and implementing other scenarios.

Following is an overview of the steps we use to configure HACMP for our IBM Tivoli Workload Scheduler environment, and where you can find each step:

� “Configure heartbeating” on page 213

� “Configure HACMP topology” on page 219

� “Configure HACMP service IP labels/addresses” on page 252

� “Configure application servers” on page 223

� “Configure application monitoring” on page 227

� “Add custom start and stop HACMP scripts” on page 234

� “Add a custom post-event HACMP script” on page 242

� “Modify /etc/hosts and name resolution order” on page 250

� “Configure HACMP networks and heartbeat paths” on page 254

� “Configure HACMP resource groups” on page 257


� “Configure cascading without fallback” on page 264

� “Configure pre-event and post-event commands” on page 267

� “Configure pre-event and post-event processing” on page 269

� “Configure HACMP persistent node IP label/addresses” on page 272

� “Configure predefined communication interfaces” on page 276

� “Verify the configuration” on page 280

� “Start HACMP Cluster services” on page 287

� “Verify HACMP status” on page 292

� “Test HACMP resource group moves” on page 294

� “Live test of HACMP fallover” on page 298

� “Configure HACMP to start on system restart” on page 300

� “Verify IBM Tivoli Workload Scheduler fallover” on page 301

Note: One of the major drawbacks of the Online Planning Worksheet is that certain HACMP configurations that are accepted by the tool might cause problems on a live HACMP cluster. The SMIT screens that we show in this redbook tend to catch these problems.

Our recommendation, as of HACMP Version 5.1, is to use the Online Planning Worksheet to create convenient HTML documentation of the configuration, and then manually configure the cluster through the SMIT screens.

The details of each step are as follows.

Configure heartbeating

The configuration we used implements two heartbeat mechanisms: one over the IP network, and one over the SSA disk subsystem (called target mode SSA). Best practices call for implementing at least one non-IP point-to-point network for exchanging heartbeat keepalive packets between cluster nodes, in case the TCP/IP subsystem, networks, or network interface cards (NICs) fail. The available non-IP heartbeat mechanisms are:

� Target Mode SSA

� Target Mode SCSI

� Serial (also known as RS-232C)

� Heartbeating over disk (only available for enhanced concurrent mode volume groups)

In this section, we describe how to configure a target mode SSA connection between HACMP nodes sharing disks connected to SSA on Multi-Initiator RAID adapters (FC 6215 and FC 6219). The adapters must be at Microcode Level 1801 or later.

You can define a point-to-point network to HACMP that connects all nodes on an SSA loop. The major steps of configuring target mode SSA are:

� “Changing node numbers on systems in SSA loop” on page 213

� “Configuring Target Mode SSA devices” on page 215

The details of each step follows.

Changing node numbers on systems in SSA loop

By default, the SSA node number on all systems is zero. These numbers must be changed to unique, non-zero values on the nodes to enable target mode SSA.

To configure the target mode SSA devices:

1. Assign a unique non-zero SSA node number to all systems on the SSA loop.

Note: The ID on a given SSA node should match the HACMP node ID, which is contained in the node_id field of the HACMP node ODM entry.


The following command retrieves the HACMP node ID:

odmget -q "name = node_name" HACMPnode

where node_name is the HACMP node name of the cluster node. In our environment, we used tivaix1 and tivaix2 as the values for node_name.

Example 4-21 shows how we determined the HACMP node ID for tivaix1. Here we determined that tivaix1 uses node ID 1, based upon the information in the line highlighted in bold that starts with the string “node_id”.

Example 4-21 How to determine a cluster node’s HACMP node ID

[root@tivaix1:/home] odmget -q "name = tivaix1" HACMPnode | grep -p COMMUNICATION_PATH
HACMPnode:
        name = "tivaix1"
        object = "COMMUNICATION_PATH"
        value = "9.3.4.194"
        node_id = 1
        node_handle = 1
        version = 6

Note that we piped the output from the odmget command to the grep command to extract a single stanza. If you omit this part of the command string, multiple stanzas are displayed that all have the same node_id field.

2. To change the SSA node number:

chdev -l ssar -a node_number=number

where number is the new SSA node number. Best practice calls for using the same number as the HACMP node ID determined in the preceding step.

In our environment, we assigned SSA node number 1 to tivaix1 and SSA node number 2 to tivaix2.

3. To show the system’s SSA node number:

lsattr -El ssar

Example 4-22 shows the output of this command for tivaix1, where the node number is highlighted in bold.

Example 4-22 Show a system’s SSA node number, taken from tivaix1

[root@tivaix1:/home] lsattr -El ssar
node_number 1 SSA Network node number True

Note: If you are using IBM AIX General Parallel File System (GPFS), you must make the SSA node number match the HACMP cluster node ID.


Repeat this procedure on each cluster node, assigning a different SSA node number for each cluster node. In our environment, Example 4-23 shows that tivaix2 was assigned SSA node number 2.

Example 4-23 Show a system’s SSA node number, taken from tivaix2

[root@tivaix2:/home] lsattr -El ssar
node_number 2 SSA Network node number True
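Because best practice is to keep the SSA node number equal to the HACMP node ID, it can be convenient to compare the two values on each node. The following is a minimal sketch of our own that reuses only the odmget and lsattr commands shown in the preceding examples; the script name and usage are assumptions.

#!/bin/ksh
# compare_ssa_ids - minimal sketch; run locally on each cluster node.
# Usage: compare_ssa_ids <hacmp_node_name>   (for example, tivaix1)
NODE_NAME=$1

HACMP_ID=$(odmget -q "name = ${NODE_NAME}" HACMPnode | awk '/node_id/ {print $3; exit}')
SSA_NUM=$(lsattr -El ssar -a node_number | awk '{print $2}')

echo "HACMP node ID for ${NODE_NAME}: ${HACMP_ID}"
echo "SSA node number on this system: ${SSA_NUM}"
if [ "${HACMP_ID}" = "${SSA_NUM}" ]; then
    echo "Values match."
else
    echo "Values differ - review the chdev setting for ssar."
fi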

Configuring Target Mode SSA devices

After enabling the target mode interface, run cfgmgr to create the initiator and target devices and make them available.

To create the initiator and target devices:

1. Enter: smit devices. SMIT displays a list of devices.

2. Select Install/Configure Devices Added After IPL and press Enter.

3. Exit SMIT after the cfgmgr command completes.

4. Ensure that the devices are paired correctly:

lsdev -C | grep tmssa

Example 4-24 shows this command’s output on tivaix1 in our environment.

Example 4-24 Ensure that target mode SSA is configured on a cluster node, taken from tivaix1

[root@tivaix1:/home] lsdev -C | grep tmssa
tmssa2 Available       Target Mode SSA Device
tmssar Available       Target Mode SSA Router

Example 4-25 shows this command’s output on tivaix2 in our environment.

Example 4-25 Ensure that target mode SSA is configured on a cluster node, taken from tivaix2

# lsdev -C | grep tmssa
tmssa1 Available       Target Mode SSA Device
tmssar Available       Target Mode SSA Router

Note how each cluster node uses the same target mode SSA router but different target mode SSA devices, as shown in the preceding examples: cluster node tivaix1 uses target mode SSA device tmssa2, while cluster node tivaix2 uses tmssa1.

Repeat the procedures for enabling and configuring the target mode SSA devices for other nodes connected to the SSA adapters.


Configuring the target mode connection creates two target mode files in the /dev directory of each node:

- /dev/tmssan.im - the initiator file, which transmits data

- /dev/tmssan.tm - the target file, which receives data

where n is a number that uniquely identifies the target mode file. Note that this number is different from the SSA node number and HACMP node ID from the preceding section. These numbers are deliberately set differently.

Example 4-26 shows the target mode files created in the /dev directory for tivaix1 in our environment.

Example 4-26 Display the target mode SSA files for tivaix1

[root@tivaix1:/home] ls /dev/tmssa*.im /dev/tmssa*.tm
/dev/tmssa2.im  /dev/tmssa2.tm

Example 4-27 shows the target mode files created in the /dev directory for tivaix2 in our environment.

Example 4-27 Display the target mode SSA files for tivaix2

[root@tivaix2:/home] ls /dev/tmssa*.im /dev/tmssa*.tm
/dev/tmssa1.im*  /dev/tmssa1.tm*
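As a quick cross-check, the device special files and the lsdev output on each node should refer to the same remote node number. The following commands are a sketch of our own (the REMOTE value of 2 matches tivaix1 in our environment; substitute the SSA node number of the other cluster node):

# On tivaix1, the remote cluster node uses SSA node number 2
REMOTE=2
ls -l /dev/tmssa${REMOTE}.im /dev/tmssa${REMOTE}.tm
lsdev -C | grep "tmssa${REMOTE}"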

Testing the target mode connection

In order for the target mode connection to work, initiator and target devices must be paired correctly.

To ensure that devices are paired and that the connection is working after enabling the target mode connection on both nodes:

1. Enter the following command on a node connected to the SSA disks:

cat < /dev/tmssa#.tm

where # must be the number of the target node. (This command hangs and waits for the next command.)

Note: On page 273, in section “Configuring Target Mode SSA Devices” of High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00, these target mode SSA files are referred to as /dev/tmscsinn.im and /dev/tmscsinn.tm. We believe this is incorrect, because these are the files used for target mode SCSI heartbeating. This redbook shows what we believe are the correct file names. This includes the corrected unique identifiers, changed from two digits (nn) to one digit (n).


In our environment, on tivaix1 we ran the command:

cat < /dev/tmssa2.tm

2. On the target node, enter the following command:

cat filename > /dev/tmssa#.im

where # must be the number of the node on which you entered the first command (the receiving node), and filename is any short ASCII file.

The contents of the specified file are displayed on the node on which you entered the first command.

In our environment, on tivaix2 we ran the command:

cat /etc/hosts > /dev/tmssa1.im

The contents of /etc/hosts on tivaix2 are then displayed in the terminal session on tivaix1.

3. You can also check that the tmssa devices are available on each system:

lsdev -C | grep tmssa

Defining the Target Mode SSA network to HACMP

To configure the Target Mode SSA point-to-point network in the HACMP cluster, follow these steps:

1. Enter: smit hacmp.

2. In SMIT, select Extended Configuration -> Extended Topology Configuration -> Configure HACMP Networks -> Add a Network to the HACMP Cluster and press Enter.

SMIT displays a choice of types of networks.

3. Select the type of network to configure (select tmssa because we are using target mode SSA) and press Enter. The Add a Serial Network screen is displayed as shown in Figure 4-10 on page 218.


Figure 4-10 Filling out the Add a Serial Network to the HACMP Cluster SMIT screen

4. Fill in the fields on the Add a Serial Network screen as follows:

Network Name Name the network, using no more than 32 alphanumeric characters and underscores; do not begin the name with a numeric.

Do not use reserved names to name the network. For a list of reserved names see High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862.

Network Type Valid types are RS232, tmssa, tmscsi, diskhb. This is filled in for you by the SMIT screen.

5. Press Enter to configure this network.

6. Return to the Add a Serial Network SMIT screen to configure more networks if necessary.

For our environment, we configured net_tmssa_01 as shown in Figure 4-10. No other serial networks were necessary.

Add a Serial Network to the HACMP Cluster

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                [Entry Fields]
* Network Name                                  [net_tmssa_01]
* Network Type                                  tmssa

F1=Help      F2=Refresh     F3=Cancel      F4=List
F5=Reset     F6=Command     F7=Edit        F8=Image
F9=Shell     F10=Exit       Enter=Do


Configure HACMP topology

Complete the following procedures to define the cluster topology. You only need to perform these steps on one node. When you verify and synchronize the cluster topology, its definition is copied to the other nodes. To define and configure nodes for the HACMP cluster topology:

1. Enter: smitty hacmp. The HACMP for AIX SMIT screen is displayed as shown in Figure 4-11.

Figure 4-11 HACMP for AIX SMIT screen

2. Go to Initialization and Standard Configuration -> Add Nodes to an HACMP Cluster and press Enter. The Configure Nodes to an HACMP Cluster (standard) SMIT screen is displayed as shown in Figure 4-12 on page 220.

HACMP for AIX

Move cursor to desired item and press Enter.

  Initialization and Standard Configuration
  Extended Configuration
  System Management (C-SPOC)
  Problem Determination Tools

F1=Help      F2=Refresh     F3=Cancel      F8=Image
F9=Shell     F10=Exit       Enter=Do


Figure 4-12 Configure nodes to an HACMP Cluster

3. Enter field values on the Configure Nodes to an HACMP Cluster screen as follows:

Cluster Name Enter an ASCII text string that identifies the cluster. The cluster name can include alpha and numeric characters and underscores, but cannot have a leading numeric. Use no more than 32 characters. It can be different from the hostname.

Do not use reserved names. For a list of reserved names see Chapter 6, “Verifying and Synchronizing a Cluster Configuration”, in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862.

New nodes (via selected communication paths)

Enter (or add) one resolvable IP Label (this may be the hostname), IP address, or Fully Qualified Domain Name for each new node in the cluster, separated by spaces.

Configure Nodes to an HACMP Cluster (standard)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                 [Entry Fields]
* Cluster Name                                   [cltivoli]
  New Nodes (via selected communication paths)   [tivaix1 tivaix2]   +
  Currently Configured Node(s)

F1=Help      F2=Refresh     F3=Cancel      F4=List
F5=Reset     F6=Command     F7=Edit        F8=Image
F9=Shell     F10=Exit       Enter=Do


This path will be taken to initiate communication with the node (for example, NodeA, 10.11.12.13, NodeC.ibm.com). Use F4 to see the picklist display of the hostnames and/or addresses in /etc/hosts that are not already HACMP-configured IP Labels/Addresses.

You can add node names or IP addresses in any order.

Currently configured node(s)

If nodes are already configured, they are displayed here.

In our environment, we entered cltivoli in the Cluster Name field and tivaix1 tivaix2 in the New Nodes (via selected communication paths) field.

4. Press Enter to configure the nodes of the HACMP cluster. A COMMAND STATUS SMIT screen displays the progress of the cluster node configurations.

The HACMP software uses this information to create the cluster communication paths for the ODM. Once communication paths are established, HACMP runs the discovery operation and prints results to the SMIT screen.

5. Verify that the results are reasonable for your cluster.

At this point, HACMP does not yet know how to locate the cluster nodes; this step only reserves placeholders for these nodes. The following steps fill in the remaining information that enables HACMP to associate actual computing resources, such as disks, processes, and networks, with these newly reserved cluster nodes.

Configure HACMP service IP labels/addresses

A service IP label/address is used to establish communication between client nodes and the server node. Services, such as a database application, are provided using the connection made over the service IP label. This connection can be node-bound or taken over by multiple nodes.

For the standard configuration, it is assumed that the connection will allow IP Address Takeover (IPAT) via aliases. The /etc/hosts file on all nodes must contain all IP labels and associated IP addresses that you want to discover.

Follow this procedure to define service IP labels for your cluster:

1. Enter: smit hacmp.

2. Go to HACMP -> Initialization and Standard Configuration -> Configure Resources to Make Highly Available -> Configure Service IP Labels/Addresses and press Enter.


3. Fill in field values as follows as shown in Figure 4-13:

IP Label/IP Address Enter, or select from the picklist, the IP label/IP address to be kept highly available.

Network Name Enter the symbolic name of the HACMP network on which this Service IP label/address will be configured. If you leave the field blank, HACMP fills in this field automatically with the network type plus a number appended, starting with 1 (for example, netether1).

Figure 4-13 Enter service IP label for tivaix1

Figure 4-13 shows how we entered the service address label for tivaix1. In our environment, we used tivaix1_svc as the IP label and net_ether_01 as the network name.

4. Press Enter after filling in all required fields. HACMP now checks the validity of the IP interface configuration.

5. Repeat the previous steps until you have configured all IP service labels for each network, as needed.

Add a Service IP Label/Address (standard)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                [Entry Fields]
* IP Label/Address                              [tivaix1_svc]       +
* Network Name                                  [net_ether_01]      +

F1=Help      F2=Refresh     F3=Cancel      F4=List
F5=Reset     F6=Command     F7=Edit        F8=Image
F9=Shell     F10=Exit       Enter=Do


In our environment, we created another service IP label for cluster node tivaix2, as shown in Figure 4-14. We used tivaix2_svc as the IP label and net_ether_01 as the network name. Note how we assigned the network name net_ether_01 in both cases, so that both sets of service IP labels are in the same HACMP network.

Figure 4-14 Enter service IP labels for tivaix2

Configure application servers

An application server is a cluster resource used to control an application that must be kept highly available. Configuring an application server does the following:

- Associates a meaningful name with the server application. For example, you could give an installation of IBM Tivoli Workload Scheduler a name such as itws. You then use this name to refer to the application server when you define it as a resource.

- Points the cluster event scripts to the scripts that they call to start and stop the server application.

- Allows you to then configure application monitoring for that application server.

Add a Service IP Label/Address (standard)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                [Entry Fields]
* IP Label/Address                              [tivaix2_svc]       +
* Network Name                                  [net_ether_01]      +

F1=Help      F2=Refresh     F3=Cancel      F4=List
F5=Reset     F6=Command     F7=Edit        F8=Image
F9=Shell     F10=Exit       Enter=Do


We show you in “Add custom start and stop HACMP scripts” on page 234 how to write the start and stop scripts for IBM Tivoli Workload Scheduler.

Complete the following steps to create an application server on any cluster node:

1. Enter smitty hacmp.

2. Go to Initialization and Standard Configuration -> Configure Resources to Make Highly Available -> Configure Application Servers and press Enter. The Configure Resources to Make Highly Available SMIT screen is displayed as shown in Figure 4-15.

Figure 4-15 Configure Resources to Make Highly Available SMIT screen

Go to Add an Application Server and press Enter (Figure 4-16 on page 225).

Note: Ensure that the server start and stop scripts exist on all nodes that participate as possible owners of the resource group where this application server resides.

Configure Resources to Make Highly Available

Move cursor to desired item and press Enter.

  Configure Service IP Labels/Addresses
  Configure Application Servers
  Configure Volume Groups, Logical Volumes and Filesystems
  Configure Concurrent Volume Groups and Logical Volumes

F1=Help      F2=Refresh     F3=Cancel      F8=Image
F9=Shell     F10=Exit       Enter=Do


Figure 4-16 Configure Application Servers SMIT screen

3. The Add Application Server SMIT screen is displayed as shown in Figure 4-17 on page 226. Enter field values as follows:

Server Name Enter an ASCII text string that identifies the server. You will use this name to refer to the application server when you define resources during node configuration. The server name can include alphabetic and numeric characters and underscores. Use no more than 64 characters.

Start Script Enter the name of the script and its full pathname (followed by arguments) called by the cluster event scripts to start the application server (maximum: 256 characters). This script must be in the same location on each cluster node that might start the server. The contents of the script, however, may differ.

Stop Script Enter the full pathname of the script called by the cluster event scripts to stop the server (maximum: 256 characters). This script must be in the same location

Configure Application Servers

Move cursor to desired item and press Enter.

  Add an Application Server
  Change/Show an Application Server
  Remove an Application Server

F1=Help      F2=Refresh     F3=Cancel      F8=Image
F9=Shell     F10=Exit       Enter=Do


on each cluster node that may start the server. The contents of the script, however, may differ.

Figure 4-17 Fill out the Add Application Server SMIT screen for application server tws_svr1

As shown in Figure 4-17, in our environment on tivaix1 we named the instance of IBM Tivoli Workload Scheduler that normally runs on that cluster node tws_svr1. For the instance of IBM Tivoli Workload Scheduler on tivaix2, we name the corresponding application server object tws_svr2. Note that no mention is made of the cluster nodes when defining an application server. We only mention them to make you aware of the conventions we used in our environment.

For the start script of application server tws_svr1, we entered the following in the Start Script field:

/usr/es/sbin/cluster/utils/start_tws1.sh

The stop script of this application server is:

/usr/es/sbin/cluster/utils/stop_tws1.sh

This is entered in the Stop Script field.

4. Press Enter to add this information to the ODM on the local node.

Add Application Server

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                [Entry Fields]
* Server Name                                   [tws_svr1]
* Start Script                                  [/usr/es/sbin/cluster/>
* Stop Script                                   [/usr/es/sbin/cluster/>

F1=Help      F2=Refresh     F3=Cancel      F4=List
F5=Reset     F6=Command     F7=Edit        F8=Image
F9=Shell     F10=Exit       Enter=Do


5. Repeat the procedure for all additional application servers.

In our environment, we added a definition for application server tws_svr2, using the start script for the Start Script field:

/usr/es/sbin/cluster/utils/start_tws2.sh

For tws_svr2, we entered the following stop script in the Stop Script field:

/usr/es/sbin/cluster/utils/stop_tws2.sh

Figure 4-18 shows how we filled out the SMIT screen to define application server tws_svr2.

Figure 4-18 Fill out the Add Application Server SMIT screen for application server tws_svr2

You only need to perform this on one cluster node. When you verify and synchronize the cluster topology, the new application server definitions are copied to the other nodes.

Configure application monitoring

HACMP can monitor specified applications and automatically take action to restart them upon detecting process death or other application failures.

Add Application Server

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                [Entry Fields]
* Server Name                                   [tws_svr2]
* Start Script                                  [/usr/es/sbin/cluster/>
* Stop Script                                   [/usr/es/sbin/cluster/>

F1=Help      F2=Refresh     F3=Cancel      F4=List
F5=Reset     F6=Command     F7=Edit        F8=Image
F9=Shell     F10=Exit       Enter=Do


You can select either of two application monitoring methods:

- Process application monitoring detects the death of one or more processes of an application, using RSCT Event Management.

- Custom application monitoring checks the health of an application with a custom monitor method at user-specified polling intervals.

Process monitoring is easier to set up, because it uses the built-in monitoring capability provided by RSCT and requires no custom scripts. However, process monitoring may not be an appropriate option for all applications. Custom monitoring can monitor more subtle aspects of an application’s performance and is more customizable, but it takes more planning, because you must create the custom scripts.

In this section, we show how to configure process monitoring for IBM Tivoli Workload Scheduler. Remember that an application must be defined to an application server before you set up the monitor.

For IBM Tivoli Workload Scheduler, we configure process monitoring for the netman process because it will always run under normal conditions. If it fails, we want the cluster to automatically fall over, and not attempt to restart netman.

Because netman starts very quickly, we only give it 60 seconds to start before monitoring begins. For cleanup and restart scripts, we will use the same scripts as the start and stop scripts discussed in “Add custom start and stop HACMP scripts” on page 234.

Note: If a monitored application is under control of the System Resource Controller, check that the action:multi values are -O and -Q. The -O value specifies that the subsystem is not restarted if it stops abnormally. The -Q value specifies that multiple instances of the subsystem are not allowed to run at the same time. These values can be checked using the following command:

lssrc -Ss Subsystem | cut -d : -f 10,11

If the values are not -O and -Q, they must be changed using the chssys command.
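For example, for a hypothetical subsystem named mysubsys, the check and the change could look like the following sketch. The chssys flags shown mirror the mkssys -O and -Q options; verify them against the documentation for your AIX level before using them:

# Display the action and multi fields for the subsystem (expected output: -O:-Q)
lssrc -Ss mysubsys | cut -d : -f 10,11

# If the values differ, set the subsystem so it is not restarted (-O) and
# so that multiple instances are not allowed (-Q)
chssys -s mysubsys -O -Q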


Set up your process application monitor as follows:

1. Enter: smit hacmp.

2. Go to Extended Configuration -> Extended Resource Configuration -> Extended Resources Configuration -> Configure HACMP Application Monitoring -> Configure Process Application Monitor -> Add Process Application Monitor and press Enter. A list of previously defined application servers appears.

3. Select the application server for which you want to add a process monitor.

In our environment, we selected tws_svr1, as shown in Figure 4-19.

Figure 4-19 How to select an application server to monitor

Application Server to Monitor

Move cursor to desired item and press Enter.

  tws_svr1
  tws_svr2

F1=Help      F2=Refresh     F3=Cancel
F8=Image     F10=Exit       Enter=Do
/=Find       n=Find Next

4. In the Add Process Application Monitor screen, fill in the field values as follows:

Monitor Name

This is the name of the application monitor. If this monitor is associated with an application server, the monitor has the same name as the application server. This field is informational only and cannot be edited.

Tip: For more comprehensive application monitoring by HACMP, configure process monitoring for the IBM Tivoli Workload Scheduler processes batchman, jobman, mailman, and writer. Define application server resources for each of these processes before defining the process monitoring for them.

If you do this, be sure to use the cl_RMupdate command to suspend monitoring before Jnextday starts and to resume monitoring after Jnextday completes. Otherwise, the cluster will interpret the Jnextday-originated shutdown of these processes as a failure of the cluster node and inadvertently start a fallover.

Application Server Name

(This field can be chosen from the picklist. It is already filled in with the name of the application server you selected.)

Processes to Monitor

Specify the process(es) to monitor. You can type more than one process name. Use spaces to separate the names.

Process Owner

Specify the user id of the owner of the processes specified above (for example: root). Note that the process owner must own all processes to be monitored.

Instance Count

Specify how many instances of the application to monitor. The default is 1 instance. The number of instances must match the number of processes to monitor exactly. If you put one instance, and another instance of the application starts, you will receive an application monitor error.

Stabilization Interval

Specify the time (in seconds) to wait before beginning monitoring. For instance, with a database application, you may wish to delay monitoring until after the start script and initial database search have been completed. You may need to experiment with this value to balance performance with reliability.

Note: To be sure you are using correct process names, use the names as they appear from the ps -el command (not ps -f), as explained in section “Identifying Correct Process Names” in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862.

Note: This number must be 1 if you have specified more than one process to monitor (one instance for each process).


Restart Count

Specify the restart count, that is the number of times to attempt to restart the application before taking any other actions. The default is 3.

Restart Interval

Specify the interval (in seconds) that the application must remain stable before resetting the restart count. Do not set this to be shorter than (Restart Count) x (Stabilization Interval). The default is 10% longer than that value. If the restart interval is too short, the restart count will be reset too soon and the desired fallover or notify action may not occur when it should.

Action on Application Failure

Specify the action to be taken if the application cannot be restarted within the restart count. You can keep the default choice notify, which runs an event to inform the cluster of the failure, or select fallover, in which case the resource group containing the failed application moves over to the cluster node with the next highest priority for that resource group.

For more information, refer to “Note on the Fallover Option and Resource Group Availability” in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862.

Notify Method

(Optional) Define a notify method that will run when the application fails. This custom method runs during the restart process and during notify activity.

Cleanup Method

(Optional) Specify an application cleanup script to be invoked when a failed application is detected, before invoking the restart method. The default is the application server stop script defined when the application server was set up.

Note: In most circumstances, this value should not be zero.

Note: Make sure you enter a Restart Method if your Restart Count is any non-zero value.

Restart Method

(Required if Restart Count is not zero.) The default restart method is the application server start script defined previously, when the application server was set up. You can specify a different method here if desired.

In our environment, we entered the process /usr/maestro/bin/netman in the Processes to Monitor field, maestro in the Process Owner field, 60 in the Stabilization Interval field, 0 in the Restart Count field, and fallover in the Action on Application Failure field; all other fields were left unchanged, as shown in Figure 4-20.

Figure 4-20 Add Process Application Monitor SMIT screen for application server tws_svr1

Note: With application monitoring, since the application is already stopped when this script is called, the server stop script may fail.

Add Process Application Monitor

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                [Entry Fields]
* Monitor Name                                  tws_svr1
* Application Server Name                       tws_svr1              +
* Processes to Monitor                          [/usr/maestro/bin/netm>
* Process Owner                                 [maestro]
  Instance Count                                []                    #
* Stabilization Interval                        [60]                  #
* Restart Count                                 [0]                   #
  Restart Interval                              []                    #
* Action on Application Failure                 [fallover]            +
  Notify Method                                 []
  Cleanup Method                                [/usr/es/sbin/cluster/>
  Restart Method                                [/usr/es/sbin/cluster/>

F1=Help      F2=Refresh     F3=Cancel      F4=List
F5=Reset     F6=Command     F7=Edit        F8=Image
F9=Shell     F10=Exit       Enter=Do


In our environment, the COMMAND STATUS SMIT screen displays two warnings, as shown in Figure 4-21, which we could safely ignore because the default values applied are the desired values.

Figure 4-21 COMMAND STATUS SMIT screen after creating HACMP process application monitor

5. Press Enter when you have entered the desired information.

The values are then checked for consistency and entered into the ODM. When the resource group comes online, the application monitor starts.

6. Repeat the operation for remaining application servers.

In our environment, we repeated the operation for application server tws_svr2. We entered the field values as shown in Figure 4-22 on page 234.

COMMAND STATUS

Command: OK            stdout: yes           stderr: no

Before command completion, additional instructions may appear below.

claddappmon warning: The parameter "INSTANCE_COUNT" was not specified. Will use 1.
claddappmon warning: The parameter "RESTART_INTERVAL" was not specified. Will use 0.

F1=Help      F2=Refresh     F3=Cancel      F6=Command
F8=Image     F9=Shell       F10=Exit       /=Find
n=Find Next


Figure 4-22 Add Process Application Monitor SMIT screen for application server tws_svr2

We entered the process /usr/maestro2/bin/netman in the Processes to Monitor field, maestro2 in the Process Owner field, 60 in the Stabilization Interval field, 0 in the Restart Count field, and fallover in the Action on Application Failure field; all other fields were left unchanged.

Add custom start and stop HACMP scripts

For IBM Tivoli Workload Scheduler, custom scripts for HACMP are required to start and stop the application server. These are used when HACMP starts an application server that is part of a resource group, and gracefully shuts down the application server when a resource group is taken offline or moved. The stop script, of course, does not get an opportunity to execute if a cluster node is unexpectedly halted. We developed the following basic versions of the scripts for our environment. You may need to write your own version to accommodate your site's specific requirements.

Both of these example scripts are designed to recognize how they were called. That is, the name of the script is passed into itself, and based upon this name it

Add Process Application Monitor

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                [Entry Fields]
* Monitor Name                                  tws_svr2
* Application Server Name                       tws_svr2              +
* Processes to Monitor                          [/usr/maestro/bin/netm>
* Process Owner                                 [maestro2]
  Instance Count                                []                    #
* Stabilization Interval                        [60]                  #
* Restart Count                                 [0]                   #
  Restart Interval                              []                    #
* Action on Application Failure                 [fallover]            +
  Notify Method                                 []
  Cleanup Method                                [/usr/es/sbin/cluster/>
  Restart Method                                [/usr/es/sbin/cluster/>

F1=Help      F2=Refresh     F3=Cancel      F4=List
F5=Reset     F6=Command     F7=Edit        F8=Image
F9=Shell     F10=Exit       Enter=Do


performs certain actions. Our environment’s design has two variable factors when starting and stopping IBM Tivoli Workload Scheduler:

- Name of the TWSuser user account associated with a particular instance of IBM Tivoli Workload Scheduler. In our environment, there are two instances of the application, and the user accounts maestro and maestro2 are associated with these instances.

- Path to the installation of each instance of IBM Tivoli Workload Scheduler, called the TWShome directory. In our environment, the two instances are installed under /usr/maestro and /usr/maestro2.

The scripts are designed so that when they are called with a name that follows a certain format, they will compute these variable factors depending upon the name. The format is start_twsn.sh and stop_twsn.sh, where n matches the cluster node number by convention.

- When n equals 1, it is treated as a special case: TWSuser is assumed to be maestro and TWShome is assumed to be /usr/maestro.

- When n equals any other number, TWSuser is assumed to be maestron. For example, if n is 4, TWSuser is maestro4.

- TWShome is assumed to be /usr/maestron. Using the same example, TWShome is /usr/maestro4.

You need one pair of start and stop scripts for each instance of IBM Tivoli Workload Scheduler that will run in the cluster. For mutual takeover configurations like the two-node cluster environment we show in this redbook, you need each pair of start and stop scripts on each cluster node that participates in the mutual takeover architecture.

In our environment, we used the start script shown in Example 4-28 on page 236. Most of the script deals with starting correctly.

The key line that actually starts IBM Tivoli Workload Scheduler is towards the end of the script, which reads:

su - ${clusterized_TWSuser} -c "./StartUp ; conman start"

This means the su command will execute as the TWSuser user account the command:

./StartUp ; conman start

This is a simple command to start IBM Tivoli Workload Scheduler. Your site may require a different start procedure, so you can replace this line with your own; one possible variation is sketched below.
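For example, a site that prefers to start only the netman process during fallover, and to issue conman start manually after verifying the node, could use a variation such as this sketch:

# Start only netman; an operator (or a later step) issues "conman start" manually
su - ${clusterized_TWSuser} -c "./StartUp"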


Example 4-28 Sample start script for IBM Tivoli Workload Scheduler under HACMP

#!/bin/sh

#
# Sample script for starting IBM Tivoli Workload Scheduler Version 8.2
# under IBM HACMP Version 5.1.
#
# Comments and questions to Anthony Yen <[email protected]>
#

#-----------------------------
# User-Configurable Constants
#-----------------------------
#
# Base TWShome path. Modify this to match your site's standards.
#
root_TWShome=/usr

#
# Base TWSuser. Modify this to match your site's standards.
#
TWSuser="maestro"

#
# Debugging directory. This just holds a flag file; it won't grow more than 1 KB.
#
DBX_DIR=/tmp/ha_cfg

#-------------------
# Main Program Body
#-------------------
#
# Ensure debugging directory is available, create it if necessary
if [ -d ${DBX_DIR} ] ; then
    DBX=1
else
    mkdir ${DBX_DIR}
    rc=$?
    if [ $rc -ne 0 ] ; then
        echo "WARNING: no debugging directory could be created, no debug"
        echo "information will be issued..."
        DBX=0
    else
        DBX=1
    fi
fi

#
# Determine how we are called
CALLED_AS=`basename $0`

#
# Disallow being called as root name
if [ "${CALLED_AS}" = "start_tws.sh" ] ; then
    echo "FATAL ERROR: This script cannot be called as itself. Please create a"
    echo "symbolic link to it of the form start_twsN.sh where N is an integer"
    echo "corresponding to the cluster node number and try again."
    exit 1
fi

#
# Determine cluster node number we are called as.
extracted_node_number=`echo ${CALLED_AS} | sed 's/start_tws\(.*\)\.sh/\1/g'`

#
# Set TWShome path to correspond to cluster node number.
if [ ${extracted_node_number} -eq 1 ] ; then
    clusterized_TWShome=${root_TWShome}/${TWSuser}
    clusterized_TWSuser=${TWSuser}
else
    clusterized_TWShome=${root_TWShome}/${TWSuser}${extracted_node_number}
    clusterized_TWSuser=${TWSuser}${extracted_node_number}
fi

echo "clusterized_TWShome = $clusterized_TWShome"
echo "clusterized_TWSuser = $clusterized_TWSuser"

if [ $DBX -eq 1 ] ; then
    echo "Script for starting TWS ${extracted_node_number} at "`date` > \
${DBX_DIR}/start${extracted_node_number}.flag
fi

echo "Starting TWS ${extracted_node_number} at "`date`
su - ${clusterized_TWSuser} -c "./StartUp ; conman start"
echo "Netman on TWS ${extracted_node_number} started, conman start issued"
sleep 10
echo "Process list of ${clusterized_TWSuser}-owned processes..."
ps -ef | grep -v grep | grep ${clusterized_TWSuser}

exit 0

In our environment, we used a stop script that follows the same execution semantics as the start script described in the preceding discussion. The exact commands it runs depend upon the name under which the stop script is called.

Most of the script deals with determining how it was called and setting itself up correctly. The script is oriented towards stopping the instance on the cluster node, which in our environment is a Master Domain Manager. The key lines that actually stop IBM Tivoli Workload Scheduler are


towards the end of the script, which are extracted and shown in Example 4-29 on page 238.

Example 4-29 Commands used by stop script to stop IBM Tivoli Workload Scheduler

su - ${clusterized_TWSuser} -c "conman 'unlink cpu=@ ; noask'"
su - ${clusterized_TWSuser} -c "conman 'stop @ ; wait ; noask'"
su - ${clusterized_TWSuser} -c "conman 'shutdown ; wait'"
. . .
wmaeutil ${connector} -stop "*"

This means the su command will execute, as the TWSuser user account, the following command:

conman 'unlink cpu=@ ; noask'

This unlinks all CPUs in the scheduling network. This is followed by another su command that executes, as the TWSuser user account, the following command:

conman 'stop @ ; wait ; noask'

This stops the IBM Tivoli Workload Scheduler engine on all CPUs in the scheduling network. A third and final su command executes, as the TWSuser user account, the following command:

conman 'shutdown ; wait'

This stops the netman process of the instance of IBM Tivoli Workload Scheduler on the cluster node.

Finally, the wmaeutil command is executed within a loop that passes the name of each IBM Tivoli Workload Scheduler Connector found on the cluster node to each iteration of the command. This stops all Connectors associated with the instance of IBM Tivoli Workload Scheduler that is being stopped.

This is a simple set of commands to stop IBM Tivoli Workload Scheduler. Your site may require a different stop procedure, so you can replace these commands with your own; one possible variation is sketched below. Example 4-30 shows our complete sample stop script.
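For example, a site that does not want a stop on the Master Domain Manager to unlink and stop the entire scheduling network could limit the sequence to the local workstation, along the lines of this sketch (issued without a workstation argument, conman stop acts on the workstation where it runs):

# Stop only the local workstation's production processes, then netman
su - ${clusterized_TWSuser} -c "conman 'stop ; wait'"
su - ${clusterized_TWSuser} -c "conman 'shutdown ; wait'"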

Example 4-30 Sample stop script for IBM Tivoli Workload Scheduler under HACMP

#!/bin/ksh

#
# Sample script for stopping IBM Tivoli Workload Scheduler Version 8.2
# under IBM HACMP Version 5.1.
#
# Comments and questions to Anthony Yen <[email protected]>
#

#-----------------------------
# User-Configurable Constants
#-----------------------------
#
# Base TWShome path. Modify this to match your site's standards.
#
root_TWShome=/usr

#
# Base TWSuser. Modify this to match your site's standards.
#
TWSuser="maestro"

#
# Debugging directory. This just holds a flag file; it won't grow more than 1 KB.
#
DBX_DIR=/tmp/ha_cfg

#-------------------
# Main Program Body
#-------------------
#
# Source in environment variables for IBM Tivoli Management Framework.
if [ -d /etc/Tivoli ] ; then
    . /etc/Tivoli/setup_env.sh
else
    echo "FATAL ERROR: Tivoli environment could not be sourced, exiting..."
    exit 1
fi

#
# Ensure debugging directory is available, create it if necessary
if [ -d ${DBX_DIR} ] ; then
    DBX=1
else
    mkdir ${DBX_DIR}
    rc=$?
    if [ $rc -ne 0 ] ; then
        echo "WARNING: no debugging directory could be created, no debug"
        echo "information will be issued..."
        DBX=0
    else
        DBX=1
    fi
fi

#
# Determine how we are called
CALLED_AS=`basename $0`

#
# Disallow being called as root name
if [ "${CALLED_AS}" = "stop_tws.sh" ] ; then
    echo "FATAL ERROR: This script cannot be called as itself. Please create a"
    echo "symbolic link to it of the form stop_twsN.sh where N is an integer"
    echo "corresponding to the cluster node number and try again."
    exit 1
fi

#
# Determine cluster node number we are called as.
extracted_node_number=`echo ${CALLED_AS} | sed 's/stop_tws\(.*\)\.sh/\1/g'`

#
# Set TWShome path to correspond to cluster node number.
if [ ${extracted_node_number} -eq 1 ] ; then
    clusterized_TWShome=${root_TWShome}/${TWSuser}
    clusterized_TWSuser=${TWSuser}
else
    clusterized_TWShome=${root_TWShome}/${TWSuser}${extracted_node_number}
    clusterized_TWSuser=${TWSuser}${extracted_node_number}
fi

#
# Source IBM Tivoli Workload Scheduler environment variables.
if [ -f ${clusterized_TWShome}/tws_env.sh ] ; then
    . ${clusterized_TWShome}/tws_env.sh
else
    echo "FATAL ERROR: Unable to source ITWS environment from:"
    echo "    ${clusterized_TWShome}/tws_env.sh"
    echo "Exiting..."
    exit 1
fi

echo "clusterized_TWShome = $clusterized_TWShome"
echo "clusterized_TWSuser = $clusterized_TWSuser"

if [ $DBX -eq 1 ] ; then
    echo "Script for stopping TWS ${extracted_node_number} at "`date` > ${DBX_DIR}/stop${extracted_node_number}.flag
fi

echo "Stopping TWS ${extracted_node_number} at "`date`
su - ${clusterized_TWSuser} -c "conman 'unlink cpu=@ ; noask'"
su - ${clusterized_TWSuser} -c "conman 'stop @ ; wait ; noask'"
su - ${clusterized_TWSuser} -c "conman 'shutdown ; wait'"
echo "Shutdown for TWS ${extracted_node_number} issued..."

echo "Verify netman is stopped..."
ps -ef | grep -v grep | grep ${clusterized_TWShome}/bin/netman > /dev/null
rc=$?
while ( [ ${rc} -ne 1 ] )
do
    sleep 10
    ps -ef | grep -v grep | grep ${clusterized_TWShome}/bin/netman > /dev/null
    rc=$?
done

echo "Stopping all Connectors..."
#
# Identify all Connector object labels
connector_labels=`wlookup -Lar MaestroEngine`
for connector in ${connector_labels}
do
    echo "Stopping connector ${connector}..."
    wmaeutil ${connector} -stop "*"
done

echo "Process list of ${clusterized_TWSuser}-owned processes:"
ps -ef | grep -v grep | grep ${clusterized_TWSuser}

exit 0

To add the custom start and stop HACMP scripts:

1. Copy both scripts to the directory /usr/es/sbin/cluster/utils on each cluster node.

2. Run the commands in Example 4-31 to install the scripts. These create symbolic links to the scripts. When the script is called via one of these symbolic links, it will know which instance of IBM Tivoli Workload Scheduler to manage.

Example 4-31 Commands to run to install custom HACMP start and stop scripts for IBM Tivoli Workload Scheduler

ln -s /usr/es/sbin/cluster/utils/start_tws.sh /usr/es/sbin/cluster/utils/start_tws1.sh
ln -s /usr/es/sbin/cluster/utils/start_tws.sh /usr/es/sbin/cluster/utils/start_tws2.sh
ln -s /usr/es/sbin/cluster/utils/stop_tws.sh /usr/es/sbin/cluster/utils/stop_tws2.sh
ln -s /usr/es/sbin/cluster/utils/stop_tws.sh /usr/es/sbin/cluster/utils/stop_tws1.sh

The symbolic links mean that no matter how many instances of IBM Tivoli Workload Scheduler you configure in a mutual takeover HACMP cluster, only two actual scripts need to be maintained. If you ensure that there are no unique variations between installations of IBM Tivoli Workload Scheduler, then maintaining the scripts among all installations is very easy. Only two scripts ever need to be modified, vastly simplifying maintenance and reducing copying errors.
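If you later configure additional instances, the symbolic links can also be created in a loop rather than one ln command at a time. The following is a sketch of our own (N=2 matches our two-instance environment; adjust it as needed):

# Create start/stop links for instances 1..N of IBM Tivoli Workload Scheduler
UTILS=/usr/es/sbin/cluster/utils
N=2
i=1
while [ $i -le $N ]
do
    ln -s ${UTILS}/start_tws.sh ${UTILS}/start_tws${i}.sh
    ln -s ${UTILS}/stop_tws.sh ${UTILS}/stop_tws${i}.sh
    i=`expr $i + 1`
done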


Note: Keep in mind that, after a modification is made to either or both scripts, they need to be copied back to all the cluster nodes.

Add a custom post-event HACMP script

IBM Tivoli Workload Scheduler presents a special case situation that HACMP can be configured to handle. If IBM Tivoli Workload Scheduler falls back to a cluster node, ideally it should fall back only after all currently running jobs have had a chance to finish.

For example, consider our environment of a two-node mutual takeover HACMP cluster, shown in Figure 4-23 when it is running normally. Here, cluster node tivaix1 runs an instance of IBM Tivoli Workload Scheduler we will call TWS Engine 1 from disk volume group tiv_vg1. Meanwhile, cluster node tivaix2 runs TWS Engine 2 from disk volume group tiv_vg2.

Figure 4-23 Normal operation of two-node mutual takeover HACMP cluster

Tip: Console output from the start and stop scripts is sent to /tmp/hacmp.out on the cluster nodes. This is useful for debugging the start and stop scripts while you develop them.


Suppose cluster node tivaix2 suffers an outage, and falls over to tivaix1. This means TWS Engine2 now also runs on tivaix1, and tivaix1 picks up the connection to disk volume group tiv_vg2, as shown in Figure 4-24.

Figure 4-24 Location of application servers after tivaix2 falls over to tivaix1

Due to the sudden nature of a catastrophic failure, the jobs that are in progress on tivaix2 under TWS Engine2 when the disaster incident occurs are lost. When TWS Engine2 starts on tivaix1, you would perform whatever job recovery is necessary.


Figure 4-25 State of cluster after tivaix2 returns to service and reintegrates with the cluster

When tivaix2 is restored to service, it reintegrates with the cluster, but because we chose to use the Cascading WithOut Fallback (CWOF) feature, TWS Engine2 is not immediately transferred back to tivaix2 when it reintegrates with the cluster. This is shown in Figure 4-25, where tivaix2 is shown as available and back in the cluster, but TWS Engine2 is not shut down and transferred over to it yet.

Here is where the special case situation presents itself. If we simply shut down TWS Engine2 and transfer it back to tivaix2, any jobs TWS Engine2 currently is running on tivaix1 can possibly lose their job state information, or in the worst case where the jobs are executed from the same disk volume group as TWS Engine2 (or use the same disk volume group to read and write their data), be interrupted in mid-execution. This is shown in Figure 4-26 on page 245.

As long as there are running jobs under TWS Engine2 in the memory of tivaix1, moving TWS Engine2 to tivaix2 can cause undesirable side effects because we cannot move the contents of memory from one machine to another, only the contents of a disk volume group.


Figure 4-26 Running jobs under TWS Engine2 on tivaix1 prevent TWS Engine2 from transferring back to tivaix2

It is usually too inconvenient to wait for a lull in the jobs that are running under TWS Engine2 on tivaix1. In many environments there simply is no such “dead zone” in currently running jobs. When this occurs, the jobs currently executing on the cluster node in question (tivaix1, in this example) need to run through to completion, without any new jobs releasing on the cluster node, before moving the application server (TWS Engine2, in this example). The new jobs that are prevented from releasing will have a delayed launch time, but this is often the least disruptive approach to gracefully transferring an application server back to a reintegrated cluster node.

This process is called “quiescing the application server”. For IBM Tivoli Workload Scheduler, as long as there are no currently running jobs on the cluster node itself that an instance of IBM Tivoli Workload Scheduler needs to move away from, all information that needs to be transferred intact is held on disk. This makes it easy and safe to restart IBM Tivoli Workload Scheduler on the reintegrated cluster node. The job state information that needs to be transferred can be thought of as “in hibernation” after no jobs are actively running.


We quiesce an instance of IBM Tivoli Workload Scheduler by raising the job fence of the instance on a cluster node high enough that all new jobs on the cluster node will not release. See IBM Tivoli Workload Scheduling Suite Version 8.2, General Information, SC32-1256, for more details on job fences. Raising the job fence does not affect currently running jobs.

We do not recommend using a job stream or CPU limit to quiesce the currently running jobs under an instance of IBM Tivoli Workload Scheduler on a CPU. Schedulers and users can still override a limit by forcing the priority of a job to the GO state, which can cause problems for falling back to a cluster node if a job is released at an inopportune time during the fallback.

It is very important to understand that when and how to quiesce an instance of IBM Tivoli Workload Scheduler is wholly dependent upon business considerations. When designing a schedule, collect information from the business users of IBM Tivoli Workload Scheduler on which jobs and job streams must not be delayed, which can be delayed if necessary, and for how long can they be delayed. This information is used to determine when to quiesce the server, and the impact of the operation. It can also be used to automate the decision and process of falling back an application server.

Some considerations external to IBM Tivoli Workload Scheduler usually affect this process as well. For example, if a database is used by jobs running on the cluster, or is hosted on the disk volume group that the application server uses, falling back would require shutting down the database. In some environments, this can be very time consuming, difficult to obtain authorization for on short notice, or simply unacceptable during certain times of the year (such as quarter-end processing periods). A highly available environment that takes these considerations into account is part of the design process of an actual production deployment. Consult your IBM service provider for advice on how to mitigate these additional considerations.

Tip: While you can quiesce IBM Tivoli Workload Scheduler at any time, you still gain benefits from planning when during the production day you quiesce it. Quiesce when there is as little time left as possible for currently running jobs to complete, because the sooner currently running jobs complete, the less time new jobs will be kept on hold. Use the available reports in IBM Tivoli Workload Scheduler to predict when currently running jobs will complete.

246 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

Page 261: High availability scenarios with ibm tivoli workload scheduler and ibm tivoli framework sg246632

Figure 4-27 TWS Engine2 on tivaix1 is quiesced, only held jobs exist on tivaix1 under TWS Engine2. TWS Engine2 can now fall back to tivaix2

Once an instance of IBM Tivoli Workload Scheduler is quiesced on a CPU, all remaining jobs for that instance on that CPU are held, either because their dependencies have not been satisfied yet or because the job fence holds their priority. This is shown in Figure 4-27, in which only TWS Engine1 still has running jobs on tivaix1, while TWS Engine2’s jobs are all held, with their state recorded to the production file on disk volume group tiv_vg2.

Due to the business and other non-IBM Tivoli Workload Scheduler considerations that affect the decision and process of quiescing an application server in preparation for falling it back to its original cluster node, we do not show in this redbook a sample quiesce script. In our lab environment, because we are not running actual production applications, our quiesce script simply exits.


However, when you develop your own quiesce script, we recommend that you design it as a script to be called as a post-event script for the node_up_complete event. Before raising the fence, the script should check for at least the following conditions:

- All business conditions are met for raising the fence. For example, do not raise the fence if a business user still requires scheduling services for a critical job that needs to execute in the near future.

- HACMP is already running on the cluster node to which the quiesced application server's resource group needs to fall back.

- The cluster node is reintegrated within the cluster, but the resource group that normally belongs on the cluster node is not on that node. This prevents the quiescing process from accidentally running on a new node that joins the cluster and unnecessarily shutting down an application server, for example.

- The resource group that falls back is in the ONLINE state on another cluster node. This prevents the quiescing from accidentally moving resource groups taken down for business reasons, for example.

Example 4-32 shows Korn shell script code that can be used to determine whether HACMP is running. It simply checks the status of the basic HACMP subsystems. You may need to modify it to suit your particular HACMP environment if other HACMP subsystems are used.

Example 4-32 How to determine in a script whether or not HACMP is running

PATH=${PATH}:/usr/es/sbin/cluster/utilities
clstrmgrES=`clshowsrv clstrmgrES | grep -v '^Subsystem' | awk '{ print $3 }'`
clinfoES=`clshowsrv clinfoES | grep -v '^Subsystem' | awk '{ print $3 }'`
clsmuxpdES=`clshowsrv clsmuxpdES | grep -v '^Subsystem' | awk '{ print $3 }'`
if [ "${clstrmgrES}" = 'inoperative' \
  -o "${clinfoES}" = 'inoperative' \
  -o "${clsmuxpdES}" = 'inoperative' ] ; then
    echo "FATAL ERROR: HACMP does not appear to be running, exiting..."
    exit 1
fi

Example 4-33 shows the clRGinfo command and sample output from our environment. This can be used to determine whether or not a resource group is ONLINE, and if so, which cluster node it currently runs upon.

Example 4-33 Using the clRGinfo command to determine the state of resource groups in a cluster

[root@tivaix1:/home/root] clRGinfo
-----------------------------------------------------------------------------
Group Name     Type          State        Location
-----------------------------------------------------------------------------
rg1            cascading     OFFLINE      tivaix1
                             OFFLINE      tivaix2

rg2            cascading     ONLINE       tivaix2
                             OFFLINE      tivaix1
[root@tivaix1:/home/root] clRGinfo -s
rg1:OFFLINE:tivaix1:cascading
rg1:OFFLINE:tivaix2:cascading
rg2:ONLINE:tivaix2:cascading
rg2:OFFLINE:tivaix1:cascading
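A quiesce script can parse the colon-separated clRGinfo -s output to test the last two conditions listed earlier. The following sketch is ours, not part of HACMP, and assumes the group:state:node:type ordering shown in Example 4-33:

# Sketch: find the node (if any) where resource group rg2 is ONLINE
RG=rg2
ONLINE_NODE=`/usr/es/sbin/cluster/utilities/clRGinfo -s 2>/dev/null | \
awk -F: -v rg=${RG} '$1 == rg && $2 == "ONLINE" { print $3 }'`
if [ -n "${ONLINE_NODE}" ] ; then
    echo "${RG} is ONLINE on ${ONLINE_NODE}"
else
    echo "${RG} is not ONLINE on any cluster node"
fi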

You use conman’s fence command to raise the job fence on a CPU. If we want to raise the job fence on cluster node tivaix1 for an instance of IBM Tivoli Workload Scheduler that is running as CPU TIVAIX2, we log into the TWSuser user account of that instance, then run the command:

conman "fence TIVAIX2 ; go ; noask"

In our environment, we would log into maestro2 on tivaix1. A quiesce script would be running on a reintegrated cluster node, and remotely log into the surviving node to perform the job fence operation.
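In such a quiesce script running on the reintegrated node, the fence could be raised remotely. The following is a sketch only; it assumes rsh access (or an equivalently configured ssh) from tivaix2 to tivaix1 and the maestro2 account that owns the fallen-over instance:

# Raise the job fence for CPU TIVAIX2 on the surviving node tivaix1
rsh tivaix1 "su - maestro2 -c \"conman 'fence TIVAIX2 ; go ; noask'\""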

Example 4-34 shows one way to have a shell script wait for currently executing jobs under an instance of IBM Tivoli Workload Scheduler on a CPU to exit. It is intended to be run as root user. It simply uses the su command to run a command as the TWSuser user account that owns the instance of IBM Tivoli Workload Scheduler. The command that is run lists all jobs in the EXEC state on the CPU TIVAIX1, then counts the number of jobs returned. As long as the number of jobs in the EXEC state is not equal to zero, the code waits for a minute, then checks the number of jobs in the EXEC state again. Again, a quiesce script would remotely run this code on the surviving node against the desired instance of IBM Tivoli Workload Scheduler.

Example 4-34 Wait for currently executing jobs to exit

num_exec_jobs=`su - maestro -c "conman sj TIVAIX1#@.@+state='exec' 2>/dev/null | \
    grep -v 'sj @#@.@+state=exec' | wc -l"`
while [ ${num_exec_jobs} -ne 0 ]
do
    sleep 60
    num_exec_jobs=`su - maestro -c "conman sj TIVAIX1#@.@+state='exec' 2>/dev/null | \
        grep -v 'sj @#@.@+state=exec' | wc -l"`
done

If the implemented quiesce script successfully quiesces the desired instance of IBM Tivoli Workload Scheduler, it can also be designed to automatically perform the resource group move. A script would use the clRGmove command, as shown in Example 4-35, to move resource group rg2 to tivaix2:

Example 4-35 Move a resource group using the clRGmove command

/usr/es/sbin/cluster/utilities/clRGmove -s 'false' -m -i -g 'rg2' -n 'tivaix2'

This command can be run from any cluster node.
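
A quiesce script could also confirm that the move succeeded before exiting; for example, the following hypothetical check relies on the clRGinfo -s output format shown in Example 4-33:

# Verify that rg2 now shows ONLINE on tivaix2 after the clRGmove call
/usr/es/sbin/cluster/utilities/clRGinfo -s | grep '^rg2:ONLINE:tivaix2:' > /dev/null || \
    echo "WARNING: resource group rg2 is not ONLINE on tivaix2"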

In our environment, we copy our stub quiesce script to:

/usr/es/sbin/cluster/sh/quiesce_tws.sh

This script is copied to the same location on both cluster nodes tivaix1 and tivaix2. The stub does not perform any actual work, so it has no effect upon HACMP. In our environment, with CWOF set to true, a full quiesce script would also have to run clRGmove itself to complete the fallback; because our stub does not, we still perform the quiescing manually.

Modify /etc/hosts and name resolution order
The IP hostnames we use for HACMP are configured in /etc/hosts so that local name resolution can be performed if access to the DNS server is lost. In our environment, our /etc/hosts file is the same on both cluster nodes tivaix1 and tivaix2, as shown in Figure 4-28 on page 251.

Tip: Make sure the basic HACMP services work for straight fallover and fallback scenarios before customizing HACMP behavior.

In a production deployment, the quiesce script would be implemented and tested only after basic configuration and testing of HACMP is successful.


Figure 4-28 File /etc/hosts copied to all cluster nodes of the cluster we used

Name resolution order is controlled by the following items, in decreasing order of precedence (the first line overrides the second line, which in turn overrides the third line):

• Environment variable NSORDER

• Host settings in the /etc/netsvc.conf file

• Host settings in the /etc/irs.conf file

In our environment, we used the following line in /etc/netsvc.conf to set the name resolution order on all cluster nodes:

hosts = local, bind

The /etc/netsvc.conf file on all cluster nodes is set to this line.

127.0.0.1               loopback localhost      # loopback (lo0) name/address
# 9.3.4.33              tivdce1.itsc.austin.ibm.com

# Administrative addresses (persistent on each node)
9.3.4.194               tivaix1 tivaix1.itsc.austin.ibm.com
9.3.4.195               tivaix2 tivaix2.itsc.austin.ibm.com

# Base IP labels for en1 on both nodes
10.1.1.101              tivaix1_bt2
10.1.1.102              tivaix2_bt2

# Service IP labels
9.3.4.3                 tivaix1_svc
9.3.4.4                 tivaix2_svc

# Boot IP labels for en0
192.168.100.101         tivaix1_bt1
192.168.100.102         tivaix2_bt1
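
To confirm that local resolution is used as intended, a quick check (hypothetical; any label from /etc/hosts will do) is to resolve one of the entries while forcing the lookup order for the current shell with the NSORDER environment variable, which overrides /etc/netsvc.conf:

# Force local-first resolution for this shell only, then resolve a label defined
# in /etc/hosts; this should succeed even if the DNS server is unreachable
export NSORDER=local,bind
host tivaix1_svc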

Note: In our environment, we used some IP hostnames that include underscores to test HACMP’s handling of name resolution. In a live production environment, we do not recommend this practice.


Underscores are not officially supported in DNS, so some of the host entries we use for our environment can never be managed by strict DNS servers. The rules for legal IP hostnames are set by RFC 952:

http://www.ietf.org/rfc/rfc952.txt

RFC 1123 also sets the rules for legal IP hostnames:

http://www.ietf.org/rfc/rfc1123.txt

All the entries for /etc/hosts are drawn from the planning worksheets that you fill out when planning for HACMP.

Configure HACMP service IP labels/addresses
A service IP label/address is used to establish communication between client nodes and the server node. Services, such as a database application, are provided using the connection made over the service IP label. This connection can be node-bound or taken over by multiple nodes.

For the standard configuration, it is assumed that the connection will allow IP Address Takeover (IPAT) via aliases.

The /etc/hosts file on all nodes must contain all IP labels and associated IP addresses that you want to discover.

Follow this procedure to define service IP labels for your cluster:

1. Enter: smit hacmp.

2. Go to HACMP -> Initialization and Standard Configuration -> Configure Resources to Make Highly Available -> Configure Service IP Labels/Addresses and press Enter.

3. Fill in the field values as follows, as shown in Figure 4-29 on page 253:

IP Label/IP Address Enter, or select from the picklist, the IP label/IP address to be kept highly available.

Network Name Enter the symbolic name of the HACMP network on which this Service IP label/address will be configured. If you leave the field blank, HACMP fills in this field automatically with the network type plus a number appended, starting with 1 (for example, netether1).


Figure 4-29 Enter service IP label for tivaix1

Figure 4-29 shows how we entered the service address label for tivaix1. In our environment, we use tivaix1_svc as the IP label and net_ether_01 as the network name.

4. Press Enter after filling in all required fields. HACMP now checks the validity of the IP Interface configuration.

5. Repeat the previous steps until you have configured all IP service labels for each network, as needed.

In our environment, we create another service IP label for cluster node tivaix2, as shown in Figure 4-30 on page 254.

Add a Service IP Label/Address (standard)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* IP Label/Address                                   [tivaix1_svc]       +
* Network Name                                       [net_ether_01]      +

F1=Help          F2=Refresh       F3=Cancel        F4=List
F5=Reset         F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do


Figure 4-30 How to enter service IP labels for tivaix2

We used tivaix2_svc as the IP label and net_ether_01 as the network name. Note how we assigned the network name net_ether_01 in both cases, so that both sets of service IP labels are in the same HACMP network.

Configure HACMP networks and heartbeat paths
The cluster should have more than one network, to avoid a single point of failure. Often the cluster has both IP and non-IP based networks in order to use different heartbeat paths. Use the Add a Network to the HACMP cluster SMIT screen to define HACMP IP and point-to-point networks. Running HACMP discovery before configuring is recommended, to speed up the process.

In our environment, we use IP-based networks, heartbeating over IP aliases, and point-to-point networks over Target Mode SSA. In this section we show how to configure IP-based networks and heartbeating using IP aliases. Refer to “Configure heartbeating” on page 213 for information about configuring point-to-point networks over Target Mode SSA.

Add a Service IP Label/Address (standard)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* IP Label/Address                                   [tivaix2_svc]       +
* Network Name                                       [net_ether_01]      +

F1=Help          F2=Refresh       F3=Cancel        F4=List
F5=Reset         F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do


Configure IP-Based networks
To configure IP-based networks, take the following steps:

1. Enter: smit hacmp.

2. Go to Extended Configuration -> Extended Topology Configuration -> Configure HACMP Networks -> Add a Network to the HACMP Cluster and press Enter.

3. Select the type of network to configure and press Enter. The Add an IP-Based Network to the HACMP Cluster SMIT screen displays the configuration fields.

In our environment, we selected ether for the type of network to configure.

4. Enter the information as follows:

Network Name If you do not enter a name, HACMP will give the network a default network name made up of the type of network with a number appended (for example, ether1). If you change the name for this network, use no more than 32 alphanumeric characters and underscores.

Network Type This field is filled in depending on the type of network you selected.

Netmask The netmask (for example, 255.255.255.0).

Enable IP Takeover via IP Aliases

The default is True. If the network does not support IP aliases, then IP Replacement will be used. IP Replacement is the mechanism whereby one IP address is removed from an interface, and another IP address is added to that interface. If you want to use IP Replacement on a network that does support aliases, change the default to False.

IP Address Offset for Heartbeating over IP Aliases

Enter the base address of a private address range for heartbeat addresses (for example 10.10.10.1). HACMP will use this address to automatically generate IP addresses for heartbeat for each boot interface in the configuration. This address range must be unique and must not conflict with any other subnets on the network.

Refer to section “Heartbeat Over IP Aliases” in Chapter 3, Planning Cluster Network Connectivity, in High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00, and to your planning worksheet for more information on selecting a base address for use by Heartbeating over IP Aliases.

Clear this entry to use the default heartbeat method.

In our environment, we entered the values for the IP-based network as shown in Figure 4-31. We used the network name of net_ether_01, with a netmask of 255.255.254.0 for our lab network, and set an IP address offset for heartbeating over IP aliases of 172.16.100.1, corresponding to the offset we chose during the planning stage. Because our lab systems use network interface cards capable of supporting IP aliases, we leave the flag Enable IP Address Takeover via IP Aliases toggled to Yes.

Figure 4-31 Add an IP-Based Network to the HACMP Cluster SMIT screen

5. Press Enter to configure this network.

6. Repeat the operation to configure more networks.

In our environment, this is the only network we configured, so we did not configure any other HACMP networks.

Add an IP-Based Network to the HACMP Cluster

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Network Name                                       [net_ether_01]
* Network Type                                       ether
* Netmask                                            [255.255.254.0]     +
* Enable IP Address Takeover via IP Aliases          [Yes]               +
  IP Address Offset for Heartbeating over IP Aliases [172.16.100.1]

F1=Help          F2=Refresh       F3=Cancel        F4=List
F5=Reset         F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do


Configure heartbeating over IP aliases
In HACMP 5.1, you can configure heartbeating over IP Aliases to establish IP-based heartbeat rings over IP Aliases to run over your existing topology. Heartbeating over IP Aliases supports either IP Address Takeover (IPAT) via IP Aliases or IPAT via IP Replacement. The type of IPAT configured determines how HACMP handles the service label:

IPAT via IP Aliases

The service label, as well as the heartbeat alias, is aliased onto the interface.

IPAT via IP Replacement

The service label is swapped with the interface IP address, not the heartbeating alias.

To configure heartbeating over IP Aliases, you specify an IP address offset when configuring an interface. See the preceding section for details. Make sure that this address does not conflict with addresses configured on your network.

When you run HACMP verification, the clverify utility verifies that:

• The configuration is valid for the address range

• All interfaces are the same type (for example, Ethernet) and have the same subnet mask

• The offset address allots sufficient addresses and subnets on the network.

In our environment we use IPAT via IP aliases.
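
Once cluster services are running, a simple way to check that the heartbeat aliases were generated is to list the interface addresses and look for the private range derived from the offset address we chose (172.16.100.1). This is a hypothetical spot check, not an HACMP-provided verification step:

# Heartbeat aliases generated from the IP address offset should appear on the
# boot interfaces while cluster services are up
netstat -in | grep 172.16.100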

Configure HACMP resource groups
This creates a container to organize HACMP resources into logical groups that are defined later. Refer to High Availability Cluster Multi-Processing for AIX Concepts and Facilities Guide Version 5.1, SC23-4864, for an overview of types of resource groups you can configure in HACMP 5.1. Refer to the chapter on planning resource groups in High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00, for further planning information. You should have your planning worksheets in hand.

Using the standard path, you can configure resource groups that use the basic management policies. These policies are based on the three predefined types of startup, fallover, and fallback policies: cascading, rotating, concurrent.

Note: HACMP removes the aliases from the interfaces at shutdown. It creates the aliases again when the network becomes operational. The /tmp/hacmp.out file records these changes.


In addition to these, you can also configure custom resource groups for which you can specify slightly more refined types of startup, fallover and fallback policies.

Once the resource groups are configured, if it seems necessary for handling certain applications, you can use the Extended Configuration path to change or refine the management policies of particular resource groups (especially custom resource groups).

Configuring a resource group involves two phases:

• Configuring the resource group name, management policy, and the nodes that can own it

• Adding the resources and additional attributes to the resource group.

Refer to your planning worksheets as you name the groups and add the resources to each one.

To create a resource group:

1. Enter: smit hacmp.

2. On the HACMP menu, select Initialization and Standard Configuration -> Configure HACMP Resource Groups -> Add a Standard Resource Group and press Enter.

You are prompted to select a resource group management policy.

3. Select Cascading, Rotating, Concurrent or Custom and press Enter.

For our environment, we used Cascading.

Depending on the previous selection, you will see a screen titled Add a Cascading | Rotating | Concurrent | Custom Resource Group. The screen will only show options relevant to the type of the resource group you selected. If you select custom, you will be asked to refine the startup, fallover, and fallback policy before continuing.

4. Enter the field values as follows for a cascading, rotating, or concurrent resource group (Figure 4-32 on page 259):

Resource Group Name

Enter the desired name. Use no more than 32 alphanumeric characters or underscores; do not use a leading numeric.

Do not use reserved words. See “List of Reserved Words” in Chapter 6 of High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862. Duplicate entries are not allowed.

Participating Node Names

Enter the names of the nodes that can own or take over this resource group. Enter the node with the highest priority for ownership first, followed by the nodes with the lower priorities, in the desired order. Leave a space between node names (for example, NodeA NodeB NodeX).

If you choose to define a custom resource group, you define additional fields. We do not use custom resource groups in this redbook for simplicity of presentation.

Figure 4-32 shows how we configured resource group rg1 in the environment implemented by this redbook. We use this resource group to contain the instances of IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework normally running on tivaix1.

Figure 4-32 Configure resource group rg1

Add a Resource Group with a Cascading Management Policy (standard)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Resource Group Name                                [rg1]
* Participating Node Names / Default Node Priority   [tivaix1 tivaix2]   +

F1=Help          F2=Refresh       F3=Cancel        F4=List
F5=Reset         F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do


Figure 4-33 shows how we configured resource group rg2 in our environment. We used this resource group to contain the instances of IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework normally running on tivaix2.

Figure 4-33 How to configure resource group rg2

Configure cascading without fallback, other attributes
We configured all resource groups in our environment for cascading without fallback (CWOF) so IBM Tivoli Workload Scheduler can be given enough time to quiesce before falling back. This is part of the extended resource group configuration.

We use this step to also configure other attributes of the resource groups, such as the associated shared volume group and filesystems.

To configure CWOF and other resource group attributes:

1. Enter: smit hacmp.

Add a Resource Group with a Cascading Management Policy (standard)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Resource Group Name                                [rg2]
* Participating Node Names / Default Node Priority   [tivaix2 tivaix1]   +

F1=Help          F2=Refresh       F3=Cancel        F4=List
F5=Reset         F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do


2. Go to Initialization and Standard Configuration -> Configure HACMP Resource Groups -> Change/Show Resources for a Standard Resource Group and press Enter to display a list of defined resource groups.

3. Select the resource group you want to configure and press Enter. SMIT returns the screen that matches the type of resource group you selected, with the Resource Group Name, and Participating Node Names (Default Node Priority) fields filled in.

If the participating nodes are powered on, you can press F4 to list the shared resources. If a resource group/node relationship has not been defined, or if a node is not powered on, F4 displays the appropriate warnings.

4. Enter the field values as follows:

Service IP Label/IP Addresses

(Not an option for concurrent or custom concurrent-like resource groups.) List the service IP labels to be taken over when this resource group is taken over. Press F4 to see a list of valid IP labels. These include addresses which rotate or may be taken over.

Filesystems (empty is All for specified VGs)

(Not an option for concurrent or custom concurrent-like resource groups.) If you leave the Filesystems (empty is All for specified VGs) field blank and specify the shared volume groups in the Volume Groups field below, all file systems will be mounted in the volume group. If you leave the Filesystems field blank and do not specify the volume groups in the field below, no file systems will be mounted.

You may also select individual file systems to include in the resource group. Press F4 to see a list of the file systems. In this case only the specified file systems will be mounted when the resource group is brought online.

Filesystems (empty is All for specified VGs) is a valid option only for non-concurrent resource groups.

Volume Groups (If you are adding resources to a non-concurrent resource group.) Identify the shared volume groups that should be varied on when this resource group is acquired or taken over. Select the volume groups from the picklist, or enter the desired volume group names in this field.

Note: SMIT displays only valid choices for resources, depending on the type of resource group that you selected. The fields are slightly different for custom, non-concurrent, and concurrent groups.

Pressing F4 will give you a list of all shared volume groups in the resource group and the volume groups that are currently available for import onto the resource group nodes.

Specify the shared volume groups in this field if you want to leave the field Filesystems (empty is All for specified VGs) blank and to mount all file systems in the volume group.

If you specify more than one volume group in this field, then all file systems in all specified volume groups will be mounted; you cannot choose to mount all filesystems in one volume group and not to mount them in another.

For example, in a resource group with two volume groups (vg1 and vg2), if the field Filesystems (empty is All for specified VGs) is left blank, then all the filesystems in vg1 and vg2 will be mounted when the resource group is brought up.

However, if the field Filesystems (empty is All for specified VGs) has only filesystems that are part of the vg1 volume group, then none of the filesystems in vg2 will be mounted, because they were not entered in the Filesystems (empty is All for specified VGs) field along with the filesystems from vg1.

If you have previously entered values in the Filesystems field, the appropriate volume groups are already known to the HACMP software.

Concurrent Volume Groups

(Appears only if you are adding resources to a concurrent or custom concurrent-like resource group.) Identify the shared volume groups that can be accessed simultaneously by multiple nodes. Select the volume groups from the picklist, or enter desired volume groups names in this field.

If you previously requested that HACMP collect information about the appropriate volume groups, then pressing F4 will give you a list of all existing concurrent capable volume groups that are currently available in the resource group, and concurrent capable volume groups available to be imported onto the nodes in the resource group.

Disk fencing is turned on by default.

Application Servers Indicate the application servers to include in the resource group. Press F4 to see a list of application servers.

In our environment, we defined resource group rg1 as shown in Figure 4-34.

Figure 4-34 Define resource group rg1

Note: If you are configuring a custom resource group, and choose to use a dynamic node priority policy for a cascading-type custom resource group, you will see the field where you can select which one of the three predefined node priority policies you want to use.

Change/Show Resources for a Cascading Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Resource Group Name                                rg1
  Participating Node Names (Default Node Priority)   tivaix1 tivaix2

* Service IP Labels/Addresses                        [tivaix1_svc]       +
  Volume Groups                                      [tiv_vg1]           +
  Filesystems (empty is ALL for VGs specified)       []                  +
  Application Servers                                [tws_svr1]          +

F1=Help          F2=Refresh       F3=Cancel        F4=List
F5=Reset         F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do


For resource group rg1, we assigned tivaix1_svc as the service IP label, tiv_vg1 as the sole volume group to use, and tws_svr1 for the application server.

5. Press Enter to add the values to the HACMP ODM.

6. Repeat the operation for other resource groups to configure.

In our environment, we defined resource group rg2 as shown in Figure 4-35.

Figure 4-35 Define resource group rg2

Configure cascading without fallback
We configured all resource groups in our environment for cascading without fallback (CWOF) so IBM Tivoli Workload Scheduler can be given enough time to quiesce before falling back. This is part of the extended resource group configuration. To configure CWOF:

1. Enter: smit hacmp.

2. Go to Extended Configuration -> Extended Resource Configuration -> Extended Resource Group Configuration -> Change/Show Resources and Attributes for a Resource Group and press Enter.

Change/Show Resources for a Cascading Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Resource Group Name                                rg2
  Participating Node Names (Default Node Priority)   tivaix2 tivaix1

* Service IP Labels/Addresses                        [tivaix2_svc]       +
  Volume Groups                                      [tiv_vg2]           +
  Filesystems (empty is ALL for VGs specified)       []                  +
  Application Servers                                [tws_svr2]          +

F1=Help          F2=Refresh       F3=Cancel        F4=List
F5=Reset         F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do


SMIT displays a list of defined resource groups.

3. Select the resource group you want to configure and press Enter. SMIT returns the screen that matches the type of resource group you selected, with the Resource Group Name, Inter-site Management Policy, and Participating Node Names (Default Node Priority) fields filled in as shown in Figure 4-36.

If the participating nodes are powered on, you can press F4 to list the shared resources. If a resource group/node relationship has not been defined, or if a node is not powered on, F4 displays the appropriate warnings.

4. Enter true in the Cascading Without Fallback Enabled field by pressing Tab in the field until the value is displayed.

Figure 4-36 Set cascading without fallback (CWOF) for a resource group

5. Repeat the operation for any other applicable resource groups.

In our environment, we applied the same operation to resource group rg2; all resources and attributes for resource group rg1 are shown in Example 4-36 on page 266.

Change/Show All Resources and Attributes for a Cascading Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Resource Group Name                                 rg1
  Resource Group Management Policy                    cascading
  Inter-site Management Policy                        ignore
  Participating Node Names / Default Node Priority    tivaix1 tivaix2
  Dynamic Node Priority (Overrides default)           []                  +
  Inactive Takeover Applied                           false               +
  Cascading Without Fallback Enabled                  true                +

  Application Servers                                 [tws_svr1]          +
  Service IP Labels/Addresses                         [tivaix1_svc]       +

  Volume Groups                                       [tiv_vg1]           +
  Use forced varyon of volume groups, if necessary    false               +
  Automatically Import Volume Groups                  false               +
[MORE...19]

F1=Help          F2=Refresh       F3=Cancel        F4=List
F5=Reset         F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do


Example 4-36 All resources and attributes for resource group rg1

[TOP]                                                   [Entry Fields]
  Resource Group Name                                 rg1
  Resource Group Management Policy                    cascading
  Inter-site Management Policy                        ignore
  Participating Node Names / Default Node Priority    tivaix1 tivaix2
  Dynamic Node Priority (Overrides default)           []                  +
  Inactive Takeover Applied                           false               +
  Cascading Without Fallback Enabled                  true                +

  Application Servers                                 [tws_svr1]          +
  Service IP Labels/Addresses                         [tivaix1_svc]       +

  Volume Groups                                       [tiv_vg1]           +
  Use forced varyon of volume groups, if necessary    false               +
  Automatically Import Volume Groups                  false               +

  Filesystems (empty is ALL for VGs specified)        [/usr/maestro]      +
  Filesystems Consistency Check                       fsck                +
  Filesystems Recovery Method                         sequential          +
  Filesystems mounted before IP configured            false               +
  Filesystems/Directories to Export                   []                  +
  Filesystems/Directories to NFS Mount                []                  +
  Network For NFS Mount                               []                  +

  Tape Resources                                      []                  +
  Raw Disk PVIDs                                      []                  +

  Fast Connect Services                               []                  +
  Communication Links                                 []                  +

  Primary Workload Manager Class                      []                  +
  Secondary Workload Manager Class                    []                  +

  Miscellaneous Data                                  []
[BOTTOM]

For resource group rg2, all resources and attributes configured for it are shown in Example 4-37.

Example 4-37 All resources and attributes for resource group rg2

[TOP]                                                   [Entry Fields]
  Resource Group Name                                 rg2
  Resource Group Management Policy                    cascading
  Inter-site Management Policy                        ignore
  Participating Node Names / Default Node Priority    tivaix2 tivaix1
  Dynamic Node Priority (Overrides default)           []                  +
  Inactive Takeover Applied                           false               +
  Cascading Without Fallback Enabled                  true                +

  Application Servers                                 [tws_svr2]          +
  Service IP Labels/Addresses                         [tivaix2_svc]       +

  Volume Groups                                       [tiv_vg2]           +
  Use forced varyon of volume groups, if necessary    false               +
  Automatically Import Volume Groups                  false               +

  Filesystems (empty is ALL for VGs specified)        [/usr/maestro2]     +
  Filesystems Consistency Check                       fsck                +
  Filesystems Recovery Method                         sequential          +
  Filesystems mounted before IP configured            false               +
  Filesystems/Directories to Export                   []                  +
  Filesystems/Directories to NFS Mount                []                  +
  Network For NFS Mount                               []                  +

  Tape Resources                                      []                  +
  Raw Disk PVIDs                                      []                  +

  Fast Connect Services                               []                  +
  Communication Links                                 []                  +

  Primary Workload Manager Class                      []                  +
  Secondary Workload Manager Class                    []                  +

  Miscellaneous Data                                  []
[BOTTOM]

We used this SMIT screen to review the resource groups and to configure any resources we may have missed earlier.

Configure pre-event and post-event commands
To define your customized cluster event scripts, take the following steps:

1. Enter: smit hacmp.

2. Go to HACMP Extended Configuration -> Extended Event Configuration -> Configure Pre- or Post-Events -> Add a Custom Cluster Event and press Enter.

3. Enter the field values as follows:

Cluster Event Command Name

Enter a name for the command. The name can have a maximum of 31 characters.


Cluster Event Description

Enter a short description of the event.

Cluster Event Script Filename

Enter the full pathname of the user-defined script to execute.

In our environment, we defined the cluster event quiesce_tws in the Cluster Event Name field for the script we added in “Add a custom post-event HACMP script” on page 242. We entered the following file pathname in the Cluster Event Script Filename field:

/usr/es/sbin/cluster/sh/quiesce_tws.sh

Figure 4-37 shows how we entered these fields.

Figure 4-37 Add a Custom Cluster Event SMIT screen

4. Press Enter to add the information to HACMP custom in the local Object Data Manager (ODM).

Add a Custom Cluster Event

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Cluster Event Name                                 [quiesce_tws]
* Cluster Event Description                          []
* Cluster Event Script Filename                      [/usr/es/sbin/cluster/>

F1=Help          F2=Refresh       F3=Cancel        F4=List
F5=Reset         F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do


5. Go back to the HACMP Extended Configuration menu and select Verification and Synchronization to synchronize your changes across all cluster nodes.

Configure pre-event and post-event processing
Complete the following steps to set up or change the processing for an event. In this step, you tell the Cluster Manager to use your customized pre-event or post-event commands.

You only need to complete these steps on a single node. The HACMP software propagates the information to the other nodes when you verify and synchronize the nodes.

To configure pre- and post-events for customized event processing, and specifically the quiesce_tws post-event script, follow these steps:

1. Enter: smit hacmp.

2. Select HACMP Extended Configuration -> Extended Event Configuration -> Change/Show Pre-defined HACMP Events to display a list of cluster events and subevents.

3. Select an event or subevent that you want to configure and press Enter. SMIT displays the screen with the event name, description, and default event command shown in their respective fields.

Note: Synchronizing does not propagate the actual new or changed scripts; you must add these to each node manually.
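
In our environment that means copying the stub to the other cluster node by hand; for example, a hypothetical copy using rcp (scp can be used where it is configured):

# Propagate the custom event script manually; synchronization does not copy it
rcp /usr/es/sbin/cluster/sh/quiesce_tws.sh tivaix2:/usr/es/sbin/cluster/sh/quiesce_tws.sh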

Note: When resource groups are processed in parallel, fewer cluster events occur in the cluster. In particular, only node_up and node_down events take place, and events such as node_up_local, or get_disk_vg_fs do not occur if resource groups are processed in parallel.

As a result, the use of parallel processing reduces the number of particular cluster events for which you can create customized pre- or post-event scripts. If you start using parallel processing for some of the resource groups in your configuration, be aware that your existing event scripts may not work for these resource groups.

For more information, see Appendix C, “Resource Group Behavior During Cluster Events” in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862, and the chapter on planning events in High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00.


In our environment, we used node_up_complete as the event to configure.

4. Enter field values as follows:

Event Name The name of the cluster event to be customized.

Description A brief description of the event’s function. This information cannot be changed.

Event Command The full pathname of the command that processes the event. The HACMP software provides a default script. If additional functionality is required, it is strongly recommended that you make changes by adding pre- or post-event processing of your own design, rather than by modifying the default scripts or writing new ones.

Notify Command (Optional) Enter the full pathname of a user-supplied script to run both before and after a cluster event. This script can notify the system administrator that an event is about to occur or has occurred.

The arguments passed to the command are: the event name, one keyword (either start or complete), the exit status of the event (if the keyword was complete), and the same trailing arguments passed to the event command.

Pre-Event Command (Optional) If you have defined custom cluster events, press F4 for the list. Or, enter the name of a custom-defined event to run before the HACMP Cluster event command executes. This command provides pre-processing before a cluster event occurs.

The arguments passed to this command are the event name and the trailing arguments passed to the event command. Remember that the Cluster Manager will not process the event until this pre-event script or command has completed.

Post-Event Command

(Optional) If you have defined custom cluster events, press F4 for the list. Or, enter the name of the custom event to run after the HACMP Cluster event command executes successfully. This script provides post-processing after a cluster event. The arguments passed to this command are the event name, event exit status, and the trailing arguments passed to the event command.


Recovery Command (Optional) Enter the full pathname of a user-supplied script or AIX command to execute to attempt to recover from a cluster event command failure. If the recovery command succeeds and the retry count is greater than zero, the cluster event command is rerun. The arguments passed to this command are the event name and the arguments passed to the event command.

Recovery Counter Enter the number of times to run the recovery command. Set this field to zero if no recovery command is specified, and to at least one (1) if a recovery command is specified.

In our environment, we enter the quiesce_tws post-event command for the node_up_complete event, as shown in Figure 4-38.

Figure 4-38 Add quiesce_tws script in Change/Show Cluster Events SMIT screen

5. Press Enter to add this information to the HACMP ODM.

Change/Show Cluster Events

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Event Name                                         node_up_complete

  Description                                        Script run after the >

* Event Command                                      [/usr/es/sbin/cluster/>

  Notify Command                                     []
  Pre-event Command                                  []                  +
  Post-event Command                                 [quiesce_tws]       +
  Recovery Command                                   []
* Recovery Counter                                   [0]                 #

F1=Help          F2=Refresh       F3=Cancel        F4=List
F5=Reset         F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do


6. Return to the HACMP Extended Configuration screen and synchronize your event customization by selecting the Verification and Synchronization option. Note that all HACMP event scripts are maintained in the /usr/es/sbin/cluster/events directory. The parameters passed to a script are listed in the script’s header. If you want to modify the node_up_complete event itself, for example, you could customize it by locating the corresponding script in this directory.

See Chapter 8, “Monitoring an HACMP Cluster” in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862, for a discussion of event emulation to see how to emulate HACMP event scripts without actually affecting the cluster.
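
Because quiesce_tws is registered as a post-event command for node_up_complete, HACMP passes it the event name, the event exit status, and the event's trailing arguments. A minimal hypothetical skeleton for /usr/es/sbin/cluster/sh/quiesce_tws.sh that only acts when node_up_complete succeeded could therefore begin as follows (the quiescing logic from Examples 4-32 through 4-35 would follow the argument checks):

#!/usr/bin/ksh
# Hypothetical skeleton for a node_up_complete post-event script
# Arguments from HACMP: <event name> <event exit status> [trailing event args ...]
EVENT_NAME=$1
EVENT_STATUS=$2
shift 2

# Do nothing unless node_up_complete itself completed successfully
if [ "${EVENT_NAME}" != "node_up_complete" -o "${EVENT_STATUS}" -ne 0 ] ; then
    exit 0
fi

# ... quiescing logic goes here: check the HACMP subsystems, raise the job fence,
# ... wait for executing jobs, then move the resource group with clRGmove
exit 0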

Configure HACMP persistent node IP label/addresses
A persistent node IP label is an IP alias that can be assigned to a network for a specified node. A persistent node IP label is a label which:

• Always stays on the same node (is node-bound).

• Co-exists with other IP labels present on an interface.

• Does not require installing an additional physical interface on that node.

• Is not part of any resource group.

Assigning a persistent node IP label for a network on a node allows you to have a node-bound address on a cluster network that you can use for administrative purposes to access a specific node in the cluster.

Refer to “Configuring HACMP Persistent Node IP Labels/Addresses” in Chapter 3, “Configuring HACMP Cluster Topology and Resources (Extended)“ in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862, for information about persistent node IP labels prerequisites.

To add persistent node IP labels, follow these steps:

1. Enter: smit hacmp.

2. Go to Extended Configuration -> Extended Topology Configuration -> Configure HACMP Persistent Node IP Label/Addresses -> Add a Persistent Node IP Label/Address and press Enter. The Select a Node SMIT dialog shows cluster nodes currently defined for the cluster.

3. Select a node to add a persistent node IP label/address to and then press Enter, as shown in the following figure. The Add a Persistent Node IP Label/Address SMIT screen is displayed.

In our environment, we start with cluster node tivaix1, as shown in Figure 4-39 on page 273.


Figure 4-39 Select a Node SMIT dialog

4. Enter the field values as follows:

Node Name The name of the node on which the IP label/address will be bound.

Network Name The name of the network on which the IP label/address will be bound.

Node IP Label/Address

The IP label/address to keep bound to the specified node.

In our environment, we enter net_ether_01 for the Network Name field, and tivaix1 for the Node IP Label/Address field, as shown in Figure 4-40 on page 274.

  Select a Node

  Move cursor to desired item and press Enter.

    tivaix1
    tivaix2

  F1=Help            F2=Refresh          F3=Cancel
  F8=Image           F10=Exit            Enter=Do
  /=Find             n=Find Next


Figure 4-40 Add a Persistent Node IP Label/Address SMIT screen for tivaix1

We entered these values by pressing F4 to select them from a list. In our environment, the list for the Network Name field is shown in Figure 4-41 on page 275.

Add a Persistent Node IP Label/Address

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Node Name                                          tivaix1
* Network Name                                       [net_ether_01]      +
* Node IP Label/Address                              [tivaix1]           +

F1=Help          F2=Refresh       F3=Cancel        F4=List
F5=Reset         F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do

Note: If you want to use any HACMP IP address over DNS, do not use underscores in the IP hostname, because DNS does not recognize underscores.

The use of underscores in the IP hostnames in our environment was a way to ensure that they were never introduced into the lab’s DNS server.


Figure 4-41 Network Name SMIT dialog

The selection list dialog for the Node IP Label/Address is similar.

5. Press Enter.

In our environment, we also created a persistent node IP label for cluster node tivaix2, as shown in Figure 4-42 on page 276. Note that we entered the same Network Name field value.

  Network Name

  Move cursor to desired item and press Enter.

    net_ether_01 (9.3.4.0/23 192.168.100.0/23 10.1.0.0/23)

  F1=Help            F2=Refresh          F3=Cancel
  F8=Image           F10=Exit            Enter=Do
  /=Find             n=Find Next


Figure 4-42 Add a Persistent Node IP Label/Address SMIT screen for tivaix2

                      Add a Persistent Node IP Label/Address

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Node Name                                          tivaix2
* Network Name                                       [net_ether_01]      +
* Node IP Label/Address                              [tivaix2]           +

F1=Help          F2=Refresh       F3=Cancel        F4=List
F5=Reset         F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do

Configure predefined communication interfaces
In our environment, communication interfaces and devices were already configured to AIX, and needed to be configured to HACMP (that means no HACMP discovery).

To add predefined network interfaces to the cluster, follow these steps:

1. Enter: smit hacmp.

2. Go to Extended Configuration -> Extended Topology Configuration -> Configure HACMP Communication Interfaces/Devices -> Add Communication Interfaces/Devices and press Enter.

A SMIT selector screen appears that lets you add previously discovered, or previously defined network interfaces:

Add Discovered Communication Interfaces and Devices
     Displays a list of interfaces and devices which HACMP has been able to determine as being already configured to the operating system on a node in the cluster.

Add Pre-defined Communication Interfaces and Devices
     Displays a list of all communication interfaces and devices supported by HACMP.

Select the predefined option, as shown in Figure 4-43. SMIT displays a selector screen for the Predefined Communications Type.

Figure 4-43 Select Add a Pre-defined Communication Interface to HACMP Cluster configuration

3. Select Communication Interfaces as shown in Figure 4-44 and press Enter. The Select a Network SMIT selector screen appears.

Figure 4-44 Select the Pre-Defined Communication type SMIT selector screen

  Select a category

  Move cursor to desired item and press Enter.

    Add Discovered Communication Interface and Devices
    Add Pre-defined Communication Interfaces and Devices

  F1=Help            F2=Refresh          F3=Cancel
  F8=Image           F10=Exit            Enter=Do
  /=Find             n=Find Next

  Select the Pre-Defined Communication type

  Move cursor to desired item and press Enter.

    Communication Interfaces
    Communication Devices

  F1=Help            F2=Refresh          F3=Cancel
  F8=Image           F10=Exit            Enter=Do
  /=Find             n=Find Next


4. Select a network, as shown in Figure 4-45, and press Enter.

Figure 4-45 Select a Network SMIT selector screen

The Add a Communication Interface screen appears. In our environment we only had one network, net_ether_01, and we selected that network.

5. Fill in the fields as follows:

Node Name The name of the node on which this network interface physically exists.

Network Name A unique name for this logical network.

Network Interface Enter the network interface associated with the communication interface (for example, en0).

IP Label/Address The IP label/address associated with this communication interface which will be configured on the network interface when the node boots. The picklist filters out IP labels/addresses already configured to HACMP.

Network Type The type of network media/protocol (for example, Ethernet, Token Ring, FDDI, and so on). Select the type from the predefined list of network types.

In our environment, we enter the IP label tivaix1_bt1 for interface en0 on cluster node tivaix1 as shown in Figure 4-46 on page 279.

  Select a Network

  Move cursor to desired item and press Enter.

    net_ether_01 (9.3.4.0/23 192.168.100.0/23 10.1.0.0/23)

  F1=Help            F2=Refresh          F3=Cancel
  F8=Image           F10=Exit            Enter=Do
  /=Find             n=Find Next

Note: The network interface that you are adding has the base or service function by default. You do not specify the function of the network interface as in releases prior to HACMP 5.1, but further configuration defines the function of the interface.


Figure 4-46 Add a Communication Interface SMIT screen

                          Add a Communication Interface

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* IP Label/Address                                   [tivaix1_bt1]       +
* Network Type                                       ether
* Network Name                                       net_ether_01
* Node Name                                          [tivaix1]           +
  Network Interface                                  [en0]

F1=Help          F2=Refresh       F3=Cancel        F4=List
F5=Reset         F6=Command       F7=Edit          F8=Image
F9=Shell         F10=Exit         Enter=Do

6. Repeat this operation for any remaining communication interfaces that you planned for earlier.

In our environment, we configured the communication interfaces shown in Table 4-1 to HACMP network net_ether_01. Note that the first row corresponds to Figure 4-46.

Table 4-1 Communication interfaces to configure for network net_ether_01

Network Interface    IP Label/Address                Node Name
en0                  tivaix1_bt1 (192.168.100.101)   tivaix1
en1                  tivaix1_bt2 (10.1.1.101)        tivaix1
en0                  tivaix2_bt1 (192.168.100.102)   tivaix2
en1                  tivaix2_bt2 (10.1.1.102)        tivaix2


If you configure a Target Mode SSA network as described in “Configure heartbeating” on page 213, you should not have to configure the interfaces listed in Table 4-2; we only show this information so you can verify other HACMP communication interface configurations. For HACMP network net_tmssa_01, we configured the following communication interfaces.

Table 4-2 Communication interfaces to configure for network tivaix1_tmssa2_01

Device Name          Device Path     Node Name
tivaix1_tmssa1_01    /dev/tmssa2     tivaix1
tivaix2_tmssa1_01    /dev/tmssa1     tivaix2

Verify the configuration
When all the resource groups are configured, verify the cluster components and operating system configuration on all nodes to ensure compatibility. If no errors are found, the configuration is then copied (synchronized) to each node in the cluster. If Cluster Services are running on any node, the configuration changes will take effect, possibly causing one or more resources to change state.

Complete the following steps to verify and synchronize the cluster topology and resources configuration:

1. Enter: smit hacmp.

2. Go to Initialization and Standard Configuration -> HACMP Verification and Synchronization and press Enter.

SMIT runs the clverify utility. The output from the verification is displayed in the SMIT Command Status window. If you receive error messages, make the necessary changes and run the verification procedure again. You may see warnings if the configuration has a limitation on its availability (for example, only one interface per node per network is configured).

Figure 4-47 on page 281 shows a sample SMIT screen of a successful verification of an HACMP configuration.


Figure 4-47 COMMAND STATUS SMIT screen for successful verification of an HACMP Cluster configuration

                                 COMMAND STATUS

Command: OK            stdout: yes           stderr: no

Before command completion, additional instructions may appear below.

[TOP]
Verification to be performed on the following:
     Cluster Topology
     Cluster Resources

Retrieving data from available cluster nodes.  This could take a few minutes....

Verifying Cluster Topology...

Verifying Cluster Resources...

WARNING: Error notification stanzas will be added
during synchronization for the following:
[MORE...40]

F1=Help          F2=Refresh       F3=Cancel        F6=Command
F8=Image         F9=Shell         F10=Exit         /=Find
n=Find Next

It is useful to view the cluster configuration to document it for future reference. To display the HACMP Cluster configuration, follow these steps:

1. Enter: smit hacmp.

2. Go to Initialization and Standard Configuration -> Display HACMP Configuration and press Enter.

SMIT displays the current topology and resource information.

The configuration for our environment is shown in Figure 4-48 on page 282.


Figure 4-48 COMMAND STATUS SMIT screen for our environment’s configuration

                                 COMMAND STATUS

Command: OK            stdout: yes           stderr: no

Before command completion, additional instructions may appear below.

[TOP]
Cluster Description of Cluster: cltivoli
Cluster Security Level: Standard
There are 2 node(s) and 3 network(s) defined

NODE tivaix1:
    Network net_ether_01
        tivaix1_svc 9.3.4.3
        tivaix2_svc 9.3.4.4
        tivaix1_bt2 10.1.1.101
        tivaix1_bt1 192.168.100.101
    Network net_tmssa_01
    Network net_tmssa_02
        tivaix1_tmssa2_01 /dev/tmssa2
[MORE...21]

F1=Help          F2=Refresh       F3=Cancel        F6=Command
F8=Image         F9=Shell         F10=Exit         /=Find
n=Find Next

If you want to obtain the same information from the command line, use the cltopinfo command as shown in Example 4-38.

Example 4-38 Obtain the HACMP configuration using the cltopinfo command

[root@tivaix1:/home/root] /usr/es/sbin/cluster/utilities/cltopinfo
Cluster Description of Cluster: cltivoli
Cluster Security Level: Standard
There are 2 node(s) and 3 network(s) defined

NODE tivaix1:
    Network net_ether_01
        tivaix1_svc 9.3.4.3
        tivaix2_svc 9.3.4.4
        tivaix1_bt2 10.1.1.101
        tivaix1_bt1 192.168.100.101
    Network net_tmssa_01
        tivaix1_tmssa2_01 /dev/tmssa2

NODE tivaix2:
    Network net_ether_01
        tivaix1_svc 9.3.4.3
        tivaix2_svc 9.3.4.4
        tivaix2_bt1 192.168.100.102
        tivaix2_bt2 10.1.1.102
    Network net_tmssa_01
        tivaix2_tmssa1_01 /dev/tmssa1

Resource Group rg1
    Behavior                     cascading
    Participating Nodes          tivaix1 tivaix2
    Service IP Label             tivaix1_svc

Resource Group rg2
    Behavior                     cascading
    Participating Nodes          tivaix2 tivaix1
    Service IP Label             tivaix2_svc

The clharvest_vg command can also be used to obtain more detailed configuration information, as shown in Example 4-39.

Example 4-39 Gather detailed shared volume group information with the clharvest_vg command

[root@tivaix1:/home/root] /usr/es/sbin/cluster/utilities/clharvest_vg -wInitializing..Gathering cluster information, which may take a few minutes...Processing...Storing the following information in file/usr/es/sbin/cluster/etc/config/clvg_config

tivaix1:

Hdisk: hdisk0PVID: 0001813fe67712b5VGname: rootvgVGmajor: activeConc-capable: YesVGactive: NoQuorum-required:YesHdisk: hdisk1PVID: 0001813f1a43a54dVGname: rootvgVGmajor: activeConc-capable: YesVGactive: NoQuorum-required:YesHdisk: hdisk2PVID: 0001813f95b1b360

Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster 283

Page 298: High availability scenarios with ibm tivoli workload scheduler and ibm tivoli framework sg246632

VGname: rootvgVGmajor: activeConc-capable: YesVGactive: NoQuorum-required:YesHdisk: hdisk3PVID: 0001813fc5966b71VGname: rootvgVGmajor: activeConc-capable: YesVGactive: NoQuorum-required:YesHdisk: hdisk4PVID: 0001813fc5c48c43VGname: NoneVGmajor: 0Conc-capable: NoVGactive: NoQuorum-required:NoHdisk: hdisk5PVID: 0001813fc5c48d8cVGname: NoneVGmajor: 0Conc-capable: NoVGactive: NoQuorum-required:NoHdisk: hdisk6PVID: 000900066116088bVGname: tiv_vg1VGmajor: 45Conc-capable: NoVGactive: NoQuorum-required:YesHdisk: hdisk7PVID: 000000000348a3d6VGname: tiv_vg1VGmajor: 45Conc-capable: NoVGactive: NoQuorum-required:YesHdisk: hdisk8PVID: 00000000034d224bVGname: tiv_vg2VGmajor: 46Conc-capable: NoVGactive: NoQuorum-required:YesHdisk: hdisk9PVID: none

284 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

Page 299: High availability scenarios with ibm tivoli workload scheduler and ibm tivoli framework sg246632

VGname: NoneVGmajor: 0Conc-capable: NoVGactive: NoQuorum-required:NoHdisk: hdisk10PVID: noneVGname: NoneVGmajor: 0Conc-capable: NoVGactive: NoQuorum-required:NoHdisk: hdisk11PVID: noneVGname: NoneVGmajor: 0Conc-capable: NoVGactive: NoQuorum-required:NoHdisk: hdisk12PVID: 00000000034d7fadVGname: tiv_vg2VGmajor: 46Conc-capable: NoVGactive: NoQuorum-required:YesHdisk: hdisk13PVID: noneVGname: NoneVGmajor: 0Conc-capable: NoVGactive: NoQuorum-required:NoFREEMAJORS: 48...

tivaix2:

Hdisk: hdisk0   PVID: 0001814f62b2a74b   VGname: rootvg    VGmajor: active   Conc-capable: Yes   VGactive: No   Quorum-required: Yes
Hdisk: hdisk1   PVID: none               VGname: None      VGmajor: 0        Conc-capable: No    VGactive: No   Quorum-required: No
Hdisk: hdisk2   PVID: none               VGname: None      VGmajor: 0        Conc-capable: No    VGactive: No   Quorum-required: No
Hdisk: hdisk3   PVID: none               VGname: None      VGmajor: 0        Conc-capable: No    VGactive: No   Quorum-required: No
Hdisk: hdisk4   PVID: none               VGname: None      VGmajor: 0        Conc-capable: No    VGactive: No   Quorum-required: No
Hdisk: hdisk5   PVID: 000900066116088b   VGname: tiv_vg1   VGmajor: 45       Conc-capable: No    VGactive: No   Quorum-required: Yes
Hdisk: hdisk6   PVID: 000000000348a3d6   VGname: tiv_vg1   VGmajor: 45       Conc-capable: No    VGactive: No   Quorum-required: Yes
Hdisk: hdisk7   PVID: 00000000034d224b   VGname: tiv_vg2   VGmajor: 46       Conc-capable: No    VGactive: No   Quorum-required: Yes
Hdisk: hdisk16  PVID: 0001814fe8d10853   VGname: None      VGmajor: 0        Conc-capable: No    VGactive: No   Quorum-required: No
Hdisk: hdisk17  PVID: none               VGname: None      VGmajor: 0        Conc-capable: No    VGactive: No   Quorum-required: No
Hdisk: hdisk18  PVID: none               VGname: None      VGmajor: 0        Conc-capable: No    VGactive: No   Quorum-required: No
Hdisk: hdisk19  PVID: none               VGname: None      VGmajor: 0        Conc-capable: No    VGactive: No   Quorum-required: No
Hdisk: hdisk20  PVID: 00000000034d7fad   VGname: tiv_vg2   VGmajor: 46       Conc-capable: No    VGactive: No   Quorum-required: Yes
Hdisk: hdisk21  PVID: none               VGname: None      VGmajor: 0        Conc-capable: No    VGactive: No   Quorum-required: No
FREEMAJORS: 48...

Start HACMP Cluster services

After verifying the HACMP configuration, start HACMP Cluster services. Before starting HACMP Cluster services, verify that all network interfaces are configured with the boot IP labels. Example 4-40 on page 288 shows how to use the ifconfig and host commands on tivaix1 to verify that the configured IP addresses (192.168.100.101, 9.3.4.194, and 10.1.1.101 in the example, highlighted in bold) on the network interfaces all correspond to boot IP labels.


Example 4-40 Configured IP addresses before starting HACMP Cluster services on tivaix1

[root@tivaix1:/home/root] ifconfig -a
en0: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 192.168.100.101 netmask 0xfffffe00 broadcast 192.168.101.255
        inet 9.3.4.194 netmask 0xfffffe00 broadcast 9.3.5.255
        tcp_sendspace 131072 tcp_recvspace 65536
en1: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 10.1.1.101 netmask 0xfffffe00 broadcast 10.1.1.255
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
        tcp_sendspace 65536 tcp_recvspace 65536
[root@tivaix1:/home/root] host 192.168.100.101
tivaix1_bt1 is 192.168.100.101, Aliases: tivaix1
[root@tivaix1:/home/root] host 9.3.4.194
tivaix1 is 9.3.4.194, Aliases: tivaix1.itsc.austin.ibm.com
[root@tivaix1:/home/root] host 10.1.1.101
tivaix1_bt2 is 10.1.1.101

Example 4-41 shows the configured IP addresses before HACMP starts for tivaix2.

Example 4-41 Configured IP addresses before starting HACMP Cluster services on tivaix2

[root@tivaix2:/home/root] ifconfig -a
en0: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 192.168.100.102 netmask 0xfffffe00 broadcast 192.168.101.255
        inet 9.3.4.195 netmask 0xfffffe00 broadcast 9.3.5.255
en1: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 10.1.1.102 netmask 0xfffffe00 broadcast 10.1.1.255
        tcp_sendspace 131072 tcp_recvspace 65536
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
        tcp_sendspace 65536 tcp_recvspace 65536
[root@tivaix2:/home/root] host 192.168.100.102
tivaix2_bt1 is 192.168.100.102
[root@tivaix2:/home/root] host 9.3.4.195
tivaix2 is 9.3.4.195, Aliases: tivaix2.itsc.austin.ibm.com
[root@tivaix2:/home/root] host 10.1.1.102
tivaix2_bt2 is 10.1.1.102

To start HACMP Cluster services:


1. Enter: smit hacmp.

2. Go to System Management (C-SPOC) -> Manage HACMP Services -> Start Cluster Services and press Enter. The Start Cluster Services SMIT screen is displayed.

3. Add all cluster nodes you want to start to the Start Cluster Services on these nodes field as a comma-separated list of cluster node names. Press Enter to start HACMP Cluster services on the selected cluster nodes. In our environment, we enter the cluster node names tivaix1 and tivaix2 as shown in Figure 4-49.

Figure 4-49 Start Cluster Services SMIT screen

4. The COMMAND STATUS SMIT screen displays the progress of the start operation, and will appear similar to Figure 4-50 on page 303 if successful.

Start Cluster Services

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Start now, on system restart or both                now                   +
  Start Cluster Services on these nodes               [tivaix1,tivaix2]     +
  BROADCAST message at startup?                       true                  +
  Startup Cluster Lock Services?                      false                 +
  Startup Cluster Information Daemon?                 true                  +
  Reacquire resources after forced down ?             false                 +

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do


Figure 4-50 COMMAND STATUS SMIT screen displaying successful start of cluster services

Check the network interfaces again after the start operation is complete. The service IP label and the IP addresses for heartbeating over IP aliases are populated into the network interfaces after HACMP starts.

The service IP address is populated into any available network interface; HACMP selects which network interface. One IP address for heartbeating over IP aliases is populated by HACMP for each available network interface.

Example 4-42 on page 291 shows the configured IP addresses on the network interfaces of tivaix1 after HACMP is started. Note that three new IP addresses are added into our environment, 172.16.100.2, 172.16.102.2, and 9.3.4.3, highlighted in bold in the example output.

The IP addresses for heartbeating over IP aliases are 172.16.100.2 and 172.16.102.2. The service IP address is 9.3.4.3.

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

[TOP]

Starting Cluster Services on node: tivaix1
This may take a few minutes.  Please wait...
tivaix2: start_cluster: Starting HACMP
tivaix2: 0513-029 The portmap Subsystem is already active.
tivaix2: Multiple instances are not supported.
tivaix2: 0513-029 The inetd Subsystem is already active.
tivaix2: Multiple instances are not supported.
tivaix2:  8832      - 0:00 syslogd
tivaix2: Setting routerevalidate to 1
tivaix2: 0513-059 The topsvcs Subsystem has been started. Subsystem PID is 19384
[MORE...30]

F1=Help             F2=Refresh          F3=Cancel           F6=Command
F8=Image            F9=Shell            F10=Exit            /=Find
n=Find Next


Example 4-42 Configured IP addresses after starting HACMP Cluster services on tivaix1

[root@tivaix1:/home/root] ifconfig -a
en0: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 192.168.100.101 netmask 0xfffffe00 broadcast 192.168.101.255
        inet 9.3.4.194 netmask 0xfffffe00 broadcast 9.3.5.255
        inet 172.16.100.2 netmask 0xfffffe00 broadcast 172.16.101.255
        tcp_sendspace 131072 tcp_recvspace 65536
en1: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 10.1.1.101 netmask 0xfffffe00 broadcast 10.1.1.255
        inet 172.16.102.2 netmask 0xfffffe00 broadcast 172.16.103.255
        inet 9.3.4.3 netmask 0xfffffe00 broadcast 9.3.5.255
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
        tcp_sendspace 65536 tcp_recvspace 65536
[root@tivaix1:/home/root] host 172.16.100.2
host: 0827-803 Cannot find address 172.16.100.2.
[root@tivaix1:/home/root] host 172.16.102.2
host: 0827-803 Cannot find address 172.16.102.2.
[root@tivaix1:/home/root] host 9.3.4.3
tivaix1_svc is 9.3.4.3

In our environment we do not assign IP hostnames to the IP addresses for heartbeating over IP aliases, so the host commands for these addresses return an error.

Example 4-43 shows the IP addresses populated by HACMP after it is started on tivaix2. The addresses on tivaix2 are 172.16.100.3, 172.16.102.3 for the IP addresses for heartbeating over IP aliases, and 9.3.4.4 for the service IP label, highlighted in bold.

Example 4-43 Configured IP addresses after starting HACMP Cluster services on tivaix2

[root@tivaix1:/home/root] ifconfig -a
en0: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 192.168.100.102 netmask 0xfffffe00 broadcast 192.168.101.255
        inet 9.3.4.195 netmask 0xfffffe00 broadcast 9.3.5.255
        inet 172.16.100.3 netmask 0xfffffe00 broadcast 172.16.101.255
en1: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 10.1.1.102 netmask 0xfffffe00 broadcast 10.1.1.255
        inet 172.16.102.3 netmask 0xfffffe00 broadcast 172.16.103.255
        inet 9.3.4.4 netmask 0xfffffe00 broadcast 9.3.5.255
        tcp_sendspace 131072 tcp_recvspace 65536
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
        tcp_sendspace 65536 tcp_recvspace 65536
[root@tivaix1:/home/root] host 172.16.100.3
host: 0827-803 Cannot find address 172.16.100.3.
[root@tivaix1:/home/root] host 172.16.102.3
host: 0827-803 Cannot find address 172.16.102.3.
[root@tivaix1:/home/root] host 9.3.4.4
tivaix2_svc is 9.3.4.4

HACMP is now started on the cluster.

Verify HACMP status

Ensure that HACMP has actually started before starting to use its features. Log into the first node as root user and follow these steps:

1. Enter: smit hacmp.

2. Go to System Management (C-SPOC) -> Manage HACMP Services -> Show Cluster Services and press Enter. The COMMAND STATUS SMIT screen is displayed with the current status of all HACMP subsystems on the current node, similar to Figure 4-51 on page 293.


Figure 4-51 Current status of all HACMP subsystems on a cluster node

3. You can also verify the status of each node on an HACMP Cluster by running the following command:

/usr/es/sbin/cluster/utilities/clshowsrv -a

This produces output similar to Example 4-44.

Example 4-44 Using the command line to obtain the current status of all HACMP subsystems on a cluster node

$ /usr/es/sbin/cluster/utilities/clshowsrv -a
Subsystem         Group            PID     Status
 clstrmgrES       cluster          16684   active
 clinfoES         cluster          12950   active
 clsmuxpdES       cluster          26856   active
 cllockdES        lock                     inoperative

Whether you use SMIT or the command line, at a minimum the following HACMP subsystems must be active on each node in the cluster: clstrmgrES, clinfoES, and clsmuxpdES. Other subsystems need to be active only if their services are required by your application(s).

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

Subsystem         Group            PID     Status
 clstrmgrES       cluster          16684   active
 clinfoES         cluster          12950   active
 clsmuxpdES       cluster          26856   active
 cllockdES        lock                     inoperative

F1=Help             F2=Refresh          F3=Cancel           F6=Command
F8=Image            F9=Shell            F10=Exit            /=Find
n=Find Next


Repeat the procedure for all remaining nodes in the cluster. In our cluster, we repeated the procedure on tivaix2, and verified that the same subsystems are active.
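If you prefer to script this check on each node rather than read the clshowsrv output by eye, a small ksh fragment along the following lines can be used. It is our own sketch, not part of HACMP; it simply queries each required subsystem with the standard AIX lssrc command:

   #!/bin/ksh
   # Check that the HACMP subsystems required on every cluster node are active.
   for subsys in clstrmgrES clinfoES clsmuxpdES ; do
      if lssrc -s $subsys 2>/dev/null | grep -w active > /dev/null ; then
         echo "$subsys is active"
      else
         echo "WARNING: $subsys is not active on $(hostname)"
      fi
   done

Run the fragment on each cluster node in turn; any WARNING line indicates a subsystem that must be investigated before you continue.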

Test HACMP resource group moves

Manually testing the movement of resource groups between cluster nodes further validates the HACMP configuration of the resource groups. If a resource group does not fall over to a cluster node even though it can be moved there successfully by hand, you immediately know that the problem lies in the HACMP fallover process, and probably not in the resource group configuration.

To test HACMP resource group moves, follow these steps:

1. Enter: smit hacmp.

2. Go to System Management (C-SPOC) -> HACMP Resource Group and Application Management -> Move a Resource Group to Another Node and press Enter to move a resource group. The Select a Resource Group SMIT dialog is displayed.

3. Move the cursor to resource group rg1, as shown in Figure 4-52, and press Enter.

Figure 4-52 Select a Resource Group SMIT dialog

4. Move the cursor to destination node tivaix2, as shown in Figure 4-53 on page 295, and press Enter.

+--------------------------------------------------------------------------+
¦                         Select a Resource Group                          ¦
¦                                                                          ¦
¦ Move cursor to desired item and press Enter. Use arrow keys to scroll.   ¦
¦                                                                          ¦
¦ #                                                                        ¦
¦ # Resource Group           State               Node(s) / Site            ¦
¦ #                                                                        ¦
¦ rg1                        ONLINE              tivaix1 /                 ¦
¦ #                                                                        ¦
¦ rg2                        ONLINE              tivaix2 /                 ¦
¦                                                                          ¦
¦ F1=Help                 F2=Refresh              F3=Cancel                ¦
¦ F8=Image                F10=Exit                Enter=Do                 ¦
¦ /=Find                  n=Find Next                                      ¦
+--------------------------------------------------------------------------+


Figure 4-53 Select a Destination Node SMIT dialog

5. The Move a Resource Group SMIT dialog is displayed as in Figure 4-54 on page 296. Press Enter to start moving resource group rg1 to destination node tivaix2.

+--------------------------------------------------------------------------+
¦                        Select a Destination Node                         ¦
¦                                                                          ¦
¦ Move cursor to desired item and press Enter. Use arrow keys to scroll.   ¦
¦                                                                          ¦
¦ # To choose the highest priority available node for the                  ¦
¦ # resource group, and to remove any Priority Override Location           ¦
¦ # that is set for the resource group, select                             ¦
¦ # "Restore_Node_Priority_Order" below.                                   ¦
¦     Restore_Node_Priority_Order                                          ¦
¦                                                                          ¦
¦ # To choose a specific node, select one below.                           ¦
¦ #                                                                        ¦
¦ #   Node                   Site                                          ¦
¦ #                                                                        ¦
¦     tivaix2                                                              ¦
¦                                                                          ¦
¦ F1=Help                 F2=Refresh              F3=Cancel                ¦
¦ F8=Image                F10=Exit                Enter=Do                 ¦
¦ /=Find                  n=Find Next                                      ¦
+--------------------------------------------------------------------------+


Figure 4-54 Move a Resource Group SMIT screen

6. A COMMAND STATUS SMIT screen displays the progress of the resource group move. It takes about two minutes to complete the resource group move in our environment (it might take longer, depending upon your environment’s specific details).

When the resource group move is complete, the COMMAND STATUS screen displays the results of the move. This is shown in Figure 4-55 on page 297, where we move resource group rg1 to cluster node tivaix2.

Move a Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Resource Group to be Moved                           rg1
  Destination Node                                     tivaix2
  Persist Across Cluster Reboot?                       false                +

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do


Figure 4-55 COMMAND STATUS SMIT screen for moving a resource group

7. Repeat the process of moving resource groups in comprehensive patterns to verify that all possible resource group moves can be performed by HACMP.

Table 4-3 lists all the resource group moves that we performed to test all possible combinations. (Note that you have already performed the resource group move listed in the first line of this table.)

Table 4-3 Resource group movement combinations to test

COMMAND STATUS

Command: OK stdout: yes stderr: no

Before command completion, additional instructions may appear below.

[TOP]
Attempting to move group rg1 to node tivaix2.

Waiting for cluster to process the resource group movement request.....

Waiting for the cluster to stabilize............

Resource group movement successful.
Resource group rg1 is online on node tivaix2.

-----------------------------------------------------------------------------
Group Name     Type       State      Location       Priority Override
[MORE...8]

F1=Help             F2=Refresh          F3=Cancel           F6=Command
F8=Image            F9=Shell            F10=Exit            /=Find
n=Find Next

Resource Group   Destination Node   Resource Groups in       Resource Groups in
                                    tivaix1 after move       tivaix2 after move

rg1              tivaix2            none                     rg1, rg2
rg2              tivaix1            rg2                      rg1
rg1              tivaix1            rg1, rg2                 none
rg2              tivaix2            rg1                      rg2


Of course, if you add more cluster nodes to a mutual takeover configuration, you will need to test more combinations of resource group moves. We recommend that you automate the testing if possible for clusters of six or more cluster nodes.
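One way to automate part of this testing is to verify, after each move, that every resource group is ONLINE on the node you expect. The following ksh sketch is our own illustration, not an HACMP tool; it assumes the colon-separated output of clRGinfo -s that is used by the one-liner in Example 4-46 later in this chapter (group name in the first field, with the state and node name elsewhere on the same line). Adjust the expected placements to match the row of Table 4-3 you are testing:

   #!/bin/ksh
   # Verify that a resource group is ONLINE on the expected node after a move.
   check_rg()   # usage: check_rg <resource group> <expected node>
   {
      if /usr/es/sbin/cluster/utilities/clRGinfo -s | grep "^$1:" | \
         grep ONLINE | grep -w "$2" > /dev/null ; then
         echo "OK: $1 is ONLINE on $2"
      else
         echo "FAILED: $1 is not ONLINE on $2"
      fi
   }

   # Expected placements after the first move in Table 4-3 (rg1 moved to tivaix2)
   check_rg rg1 tivaix2
   check_rg rg2 tivaix2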

Live test of HACMP falloverAfter testing HACMP manually, perform a live test of its fallover capabilities.

A live test ensures that HACMP performs as expected during fallover and fallback incidents. To perform a live test of HACMP in our environment:

1. Make sure that HACMP is running on all cluster nodes before starting this operation.

2. On the node you want to simulate a catastrophic failure upon, run the sync command several times, followed by the halt command:

sync ; sync ; sync ; halt -q

This flushes disk buffers to the hard disks and immediately halts the machine, simulating a catastrophic failure. Running sync multiple times is not strictly necessary on modern AIX systems, but it is performed as a best practice measure. If the operation is successful, the terminal displays the following message:

....Halt completed....

In our environment, we ran the halt command on tivaix2.

3. If you are logged in remotely to the node, your remote connection is disconnected shortly after this message is displayed. To verify the success of the test, log into the node that will accept the failed node’s resource group(s) and inspect the resource groups reported for that node using the lsvg, ifconfig and clRGinfo commands.

In our environment, we logged into tivaix2, then ran the halt command. We then logged into tivaix1, and ran the lsvg, ifconfig, and clRGinfo commands to identify the volume groups, service label/service IP addresses, and resource groups that fall over from tivaix2, as shown in Example 4-45.

Example 4-45 Using commands on tivaix1 to verify that tivaix2 falls over to tivaix1

[root@tivaix1:/home/root] hostname
tivaix1
[root@tivaix1:/home/root] lsvg -o

Restriction: Do not perform this procedure unless you are absolutely certain that all users are logged off the node and that restarting the node hardware is allowed. This procedure involves restarting the node, which can lead to lost data if it is performed while users are still logged into the node.


tiv_vg2
tiv_vg1
rootvg
[root@tivaix1:/home/root] ifconfig -a
en0: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 192.168.100.101 netmask 0xfffffe00 broadcast 192.168.101.255
        inet 9.3.4.3 netmask 0xfffffe00 broadcast 9.3.5.255
        inet 9.3.4.4 netmask 0xfffffe00 broadcast 9.3.5.255
        tcp_sendspace 131072 tcp_recvspace 65536
en1: flags=4e080863,80<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG,CHAIN>
        inet 10.1.1.101 netmask 0xfffffe00 broadcast 10.1.1.255
        inet 9.3.4.194 netmask 0xfffffe00 broadcast 9.3.5.255
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 127.0.0.1 netmask 0xff000000 broadcast 127.255.255.255
        inet6 ::1/0
        tcp_sendspace 65536 tcp_recvspace 65536
[root@tivaix1:/home/root] /usr/es/sbin/cluster/utilities/clRGinfo
-----------------------------------------------------------------------------
Group Name     Type            State      Location
-----------------------------------------------------------------------------
rg1            cascading       ONLINE     tivaix1
                               OFFLINE    tivaix2

rg2            cascading       OFFLINE    tivaix2
                               ONLINE     tivaix1

Note how volume group tiv_vg2 and the service IP label/IP address 9.3.4.4, both normally found on tivaix2, fall over to tivaix1. Also note that resource group rg2 is listed in the OFFLINE state for tivaix2, but in the ONLINE state for tivaix1.

4. To get a simple list of the resource groups that are in the ONLINE state on a specific node, run the short script shown in Example 4-46 on the node you want to inspect, replacing the string tivaix1 with the cluster node of your choice (a small reusable wrapper for this one-liner is sketched after this procedure):

Example 4-46 List resource groups in ONLINE state for a node

/usr/es/sbin/cluster/utilities/clRGinfo -s | grep ONLINE | grep tivaix1 | \
awk -F':' '{ print $1 }'

In our environment, this script is run on tivaix1 and returns the results shown in Example 4-47 on page 300. This indicates that resource group rg2, which used to run on cluster node tivaix2, is now on cluster node tivaix1.


Example 4-47 Obtain a simple list of resource groups that are in the ONLINE state on a specific node

[root@tivaix1:/home/root] /usr/es/sbin/cluster/utilities/clRGinfo -s | grep ONLINE | \
> grep tivaix1 | awk -F':' '{ print $1 }'
rg1
rg2

5. After the test, power back on the halted node.

In our environment, we powered back on tivaix2.

6. Start HACMP on the node that was halted after it powers back on. The node reintegrates back into the cluster.

7. Verify that Cascading Without Fallback (CWOF) works.

In our environment, we made sure that resource group rg2 still resides on cluster node tivaix1.

8. Move the resource group back to its original node, using the preceding procedure for testing resource group moves.

In our environment, we moved resource group rg2 to tivaix2.

9. Repeat the operation for other potential failure modes.

In our environment, we tested halting cluster node tivaix1, and verified that resource group rg1 moved to cluster node tivaix2.
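If you run the check from step 4 often, it can be convenient to wrap the same pipeline in a small ksh function so that the node name is passed as an argument instead of being edited each time. This is our own convenience wrapper, not part of HACMP:

   rgs_online_on()   # list the resource groups reported ONLINE on the node named in $1
   {
      /usr/es/sbin/cluster/utilities/clRGinfo -s | grep ONLINE | grep -w "$1" | \
      awk -F':' '{ print $1 }'
   }

   rgs_online_on tivaix1     # in the fallover scenario above, prints rg1 and rg2
   rgs_online_on tivaix2     # prints nothing while tivaix2 is halted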

Configure HACMP to start on system restart

When you are satisfied with the verification of HACMP’s functionality, configure AIX to automatically start the cluster subsystems when the node starts. The node then automatically joins the cluster when the machine restarts.

1. Enter: smit hacmp.

2. Go to System Management (C-SPOC) -> Manage HACMP Services -> Start Cluster Services and press Enter to configure HACMP’s cluster start attributes. The Start Cluster Services SMIT dialog is displayed as shown in Figure 4-56 on page 301.


Figure 4-56 How to start HACMP on system restart

3. In the Start now, on system restart or both field, press Tab to change the value to restart, as shown in Figure 4-56, then press Enter so that the cluster subsystems start when the machine restarts.

HACMP now starts automatically on each cluster node when that node restarts.

Verify IBM Tivoli Workload Scheduler fallover

When cluster nodes are halted during the testing described in “Live test of HACMP fallover” on page 298, IBM Tivoli Workload Scheduler should also start appropriately when a resource group is moved. Once you verify that a resource group’s disk and network resources have moved, you must verify that IBM Tivoli Workload Scheduler itself functions on its new cluster node (or in HACMP terms, verify that the application server resource of the resource group functions on the new cluster node).

In our environment, we perform the live test of HACMP operation at least twice: once to test HACMP resource group moves of disk and network resources in response to a sudden halt of a cluster node, and again to verify that IBM Tivoli Workload Scheduler is running on the appropriate cluster node(s).

Start Cluster Services

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Start now, on system restart or both                restart               +
  Start Cluster Services on these nodes               [tivaix2]             +
  BROADCAST message at startup?                       true                  +
  Startup Cluster Lock Services?                      false                 +
  Startup Cluster Information Daemon?                 true                  +
  Reacquire resources after forced down ?             false                 +

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do


To verify that IBM Tivoli Workload Scheduler is running during a test of a cluster node fallover from tivaix2 to tivaix1:

1. Log into the surviving cluster node as any user.

2. Run the following command:

ps -ef | grep -v grep | grep maestro

The output should be similar to Example 4-48. Note that there are two instances of IBM Tivoli Workload Scheduler, because there are two instances of the processes batchman, netman, jobman, and mailman. Each pair of instances is made up of one process owned by the TWSuser user account maestro, and another owned by maestro2.

Example 4-48 Sample output of command to verify IBM Tivoli Workload Scheduler is moved by HACMP

[root@tivaix1:/home/root] ps -ef | grep -v grep | grep maestro
 maestro 13440 38764   0 15:56:41      -  0:00 /usr/maestro/bin/batchman -parm 32000
maestro2 15712     1   0 18:57:44      -  0:00 /usr/maestro2/bin/netman
maestro2 26840 15712   0 18:57:55      -  0:00 /usr/maestro2/bin/mailman -parm 32000 -- 2002 TIVAIX2 CONMAN UNIX 8.2 MESSAGE
 maestro 30738     1   0 15:56:29      -  0:00 /usr/maestro/bin/netman
    root 35410 13440   0 15:56:42      -  0:00 /usr/maestro/bin/jobman
    root 35960 40926   0 18:57:56      -  0:00 /usr/maestro2/bin/jobman
 maestro 38764 30738   0 15:56:40      -  0:00 /usr/maestro/bin/mailman -parm 32000 -- 2002 TIVAIX1 CONMAN UNIX 8.2 MESSAGE
maestro2 40926 26840   0 18:57:56      -  0:00 /usr/maestro2/bin/batchman -parm 32000
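A quick way to summarize this output is to count the processes per user ID. With the processes shown in Example 4-48, the pipeline below prints three processes for maestro, three for maestro2, and two for root (the jobman processes). This is only a convenience check we suggest, not a required procedure:

   ps -ef | grep -v grep | grep maestro | awk '{ print $1 }' | sort | uniq -c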

The command should be repeated while testing that CWOF works. If CWOF works, then the output will remain identical after the halted cluster node reintegrates with the cluster.

The command should be repeated again to verify that falling back works. In our environment, after moving the resource group back to the reintegrated cluster node, so that tivaix1 and tivaix2 each host their original resource groups, the output of the command on tivaix1 shows just one set of IBM Tivoli Workload Scheduler processes, as shown in Example 4-49.

Example 4-49 IBM Tivoli Workload Scheduler processes running on tivaix1 after falling back resource group rg2 to tivaix2

[root@tivaix1:/home/root] ps -ef | grep -v grep | grep maestro
 maestro 13440 38764   0 15:56:41      -  0:00 /usr/maestro/bin/batchman -parm 32000
 maestro 30738     1   0 15:56:29      -  0:00 /usr/maestro/bin/netman
    root 35410 13440   0 15:56:42      -  0:00 /usr/maestro/bin/jobman
 maestro 38764 30738   0 15:56:40      -  0:00 /usr/maestro/bin/mailman -parm 32000 -- 2002 TIVAIX1 CONMAN UNIX 8.2 MESSAGE


The output of the command on tivaix2 in this case also shows only one instance of IBM Tivoli Workload Scheduler. The process IDs are different, but the processes are otherwise the same, as shown in Example 4-50.

Example 4-50 IBM Tivoli Workload Scheduler processes running on tivaix2 after falling back resource group rg2 to tivaix2

[root@tivaix2:/home/root] ps -ef | grep -v grep | grep maestro
maestro2 17926 39660   0 19:02:17      -  0:00 /usr/maestro2/bin/mailman -parm 32000 -- 2002 TIVAIX2 CONMAN UNIX 8.2 MESSAGE
maestro2 39660     1   0 19:02:06      -  0:00 /usr/maestro2/bin/netman
    root 47242 47366   0 19:02:19      -  0:00 /usr/maestro2/bin/jobman
maestro2 47366 17926   0 19:02:18      -  0:00 /usr/maestro2/bin/batchman -parm 32000

4.1.11 Add IBM Tivoli Management Framework

After IBM Tivoli Workload Scheduler is configured for HACMP and made highly available, you can add IBM Tivoli Management Framework so that the Job Scheduling Console component of IBM Tivoli Workload Scheduler can be used. In this section we show how to plan, install and configure IBM Tivoli Management Framework for a highly available installation of IBM Tivoli Workload Scheduler. The steps include:

� “Planning for IBM Tivoli Management Framework” on page 303

� “Planning the installation sequence” on page 312

� “Stage installation media” on page 313

� “Install base Framework” on page 315

� “Load Tivoli environment variable in .profile files” on page 318

� “Install Tivoli Framework components and patches” on page 318

� “Add IP alias to oserv” on page 320

� “Install IBM Tivoli Workload Scheduler Framework components” on page 322

� “Create additional Connectors” on page 328

� “Configure Framework access” on page 330

� “Interconnect Framework servers” on page 331

� “How to log in using the Job Scheduling Console” on page 339

The details of each step follow.

Planning for IBM Tivoli Management Framework

In this section we show the entire process of iteratively planning the integration of IBM Tivoli Management Framework into an HACMP environment specifically


configured for IBM Tivoli Workload Scheduler. We show successively more functional configurations of IBM Tivoli Management Framework.

Configuring multiple instances of IBM Tivoli Management Framework on the same operating system image is not supported by IBM Support. In our highly available IBM Tivoli Workload Scheduler environment of mutual takeover nodes, this means we cannot use two or more instances of IBM Tivoli Management Framework on a single cluster node.

In other words, IBM Tivoli Management Framework cannot be configured as an application server in a resource group configured for mutual takeover in a cluster. At the time of writing, while the configuration is technically feasible and even demonstrated in IBM publications such as the IBM Redbook High Availability Scenarios for Tivoli Software, SG24-2032, IBM Support does not sanction this configuration.

Due to this constraint, we install an instance of IBM Tivoli Management Framework on a local drive on each cluster node. We then create Connectors for both cluster nodes on each instance of IBM Tivoli Management Framework.

The Job Scheduling Console is the primary component of IBM Tivoli Workload Scheduler that uses IBM Tivoli Management Framework. It uses the Job Scheduling Services component in IBM Tivoli Management Framework. The primary object for IBM Tivoli Workload Scheduler administrators to manage in the Job Scheduling Services is the Connector. A Connector holds the specific directory location that an IBM Tivoli Workload Scheduler scheduling engine is installed into. In our environment, this is /usr/maestro for TWS Engine1, which normally runs on tivaix1 and is configured for resource group rg1, and /usr/maestro2 for TWS Engine2, which normally runs on tivaix2 and is configured for resource group rg2.

In our environment, under normal operation the relationship of Connectors to IBM Tivoli Workload Scheduler engines and IBM Tivoli Management Framework on cluster nodes is as shown in Figure 4-57 on page 305.

Note: While we discuss this process after showing you how to configure HACMP for IBM Tivoli Workload Scheduler in this redbook, in an actual deployment this planning occurs alongside the planning for HACMP and IBM Tivoli Workload Scheduler.


Figure 4-57 Relationship of IBM Tivoli Workload Scheduler, IBM Tivoli Management Framework, Connectors, and Job Scheduling Consoles during normal operation of an HACMP Cluster

We use Job Scheduling Console Version 1.3 Fix Pack 1; best practice calls for using at least this level of the Job Scheduling Console or later because it addresses many user interface issues. Its prerequisite is the base install of Job Scheduling Console Version 1.3 that came with your base installation media for IBM Tivoli Workload Scheduler. If you do not already have it installed, download Fix Pack 1 from:

ftp://ftp.software.ibm.com/software/tivoli_support/patches/patches_1.3/1.3-JSC-FP01

You can use the environment in this initial configuration as is. Users can log into either TWS Engine1 or TWS Engine2 by logging into the corresponding service IP address. Users can even log into both, but that requires running two instances of the Job Scheduling Console. Figure 4-58 on page 306 shows the display of a user’s Microsoft Windows 2000 computer running two instances of Job Scheduling Console. Each instance of the Job Scheduling Console is logged into a different cluster node as root user. To run two instances of Job Scheduling Console, simply run it twice.



Figure 4-58 Viewing multiple instances of IBM Tivoli Workload Scheduler on separate cluster nodes on a single display

Note how in the Job Scheduling Console window for Administrator Root_tivaix1-region (root@tivaix1), the scheduling engine for TIVAIX2 is unavailable. The engine for TIVAIX2 is marked by a small icon badge that looks like a red circle with a white “X” inside it, as shown in Figure 4-59 on page 307.


Figure 4-59 Available scheduling engines when logged into tivaix1 during normal operation

In the Job Scheduling Console window for Administrator Root_tivaix2-region (root@tivaix2), the reverse situation exists: the scheduling engine for TIVAIX1 is unavailable. The engine for TIVAIX1 is similarly marked unavailable as shown in Figure 4-60.

Figure 4-60 Available scheduling engines when logged into tivaix2 during normal operation

This happens because in our environment we actually configure two Connectors (one for each instance of IBM Tivoli Workload Scheduler) on each instance of IBM Tivoli Management Framework, as shown in Figure 4-61 on page 308.

If we do not configure multiple Connectors in this manner, then for example, when resource group rg2 on tivaix2 falls over to tivaix1, no Connector for TWS Engine2 will exist on tivaix1 after the fallover.

In normal operation, when a user logs into tivaix1, they use the Connector for TWS Engine1 (called Connector1 in Figure 4-61 on page 308). But on tivaix1 the Connector for TWS Engine2 does not refer to an active instance of IBM Tivoli Workload Scheduler on tivaix1 because /usr/maestro2 is already mounted and in use on tivaix2.


Figure 4-61 How multiple instances of the Connector work during normal operation

If resource groups rg1 and rg2 are running on a single cluster node, each instance of IBM Tivoli Workload Scheduler in each resource group requires its own Connector. This is why we create two Connectors for each instance of IBM Tivoli Management Framework. The Job Scheduling Console clients connect to IBM Tivoli Workload Scheduler through the IBM Tivoli Management Framework oserv process that listens on interfaces that are assigned the service IP labels.

For example, consider the fallover scenario where tivaix2 falls over to tivaix1. It causes resource group rg2 to fall over to tivaix1. As part of this resource group move, TWS Engine2 on /usr/maestro2 is mounted on tivaix1. Connector2 on tivaix1 then determines that /usr/maestro2 contains a valid instance of IBM Tivoli Workload Scheduler, namely TWS Engine2. IBM Tivoli Management Framework is configured to listen to both tivaix1_svc (9.3.4.3) or tivaix2_svc (9.3.4.4).

Because HACMP moves these service IP labels as part of the resource group, it makes both scheduling engines TWS Engine1 and TWS Engine2 available to Job Scheduling Console users who log into either tivaix1_svc or tivaix2_svc, even though both service IP labels in this fallover scenario reside on a single cluster node (tivaix1).



When a Job Scheduling Console session starts, the instance of IBM Tivoli Workload Scheduler it connects to creates authentication tokens for the session. These tokens are held in memory. When the cluster node that hosts this instance of IBM Tivoli Workload Scheduler falls over to another cluster node, the authentication tokens in memory are lost.

Figure 4-62 shows the fallover scenario where tivaix2 falls over to tivaix1, and the effect upon the Connectors.

Figure 4-62 Multiple instances of Connectors after tivaix2 falls over to tivaix1

Note: Users working through the Job Scheduling Console on the instance of IBM Tivoli Workload Scheduler in the cluster node that fails must exit their session and log in through the Job Scheduling Console again. Because the IP service labels are still valid, users simply log into the same service IP label they originally used.

As far as Job Scheduling Console users are concerned, if a fallover occurs, they simply log back into the same IP address or hostname.



Note how Job Scheduling Console sessions that were connected to 9.3.4.4 on port 94 used to communicate with tivaix2, but now communicate instead with tivaix1. Users in these sessions see an error dialog window similar to the one shown in Figure 4-63 the next time they attempt to perform an operation.

Figure 4-63 Sample error dialog box in Job Scheduling Console indicating possible fallover of cluster node

Users should be trained to identify when this dialog indicates a cluster node failure. Best practice is to arrange for appropriate automatic notification whenever a cluster fallover occurs, whether by e-mail, pager, instant messaging, or other means, and to send another notification when the affected resource group(s) are returned to service. When Job Scheduling Console users receive the second notification, they can proceed to log back in again.
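One simple way to implement the second notification is to append a mail message to the end of the HACMP application server start script for the resource group, so that operators are told when the group is back in service. The fragment below is only an illustrative sketch; the placement, recipient address, and wording are our assumptions, not part of the configuration described in this redbook:

   # Hypothetical fragment at the end of the application server start script for rg2
   RG=rg2
   SVC_LABEL=tivaix2_svc
   echo "Resource group $RG is now online on node $(hostname)." \
        "Job Scheduling Console users may log back into $SVC_LABEL." | \
      mail -s "HACMP notification: $RG online" tws-operators@example.com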

Once the resource group falls over, understanding when and how Connectors recognize a scheduling engine is key to knowing why certain scheduling engines appear after certain actions.

The scheduling engine that falls over is not available to the Job Scheduling Console of the surviving node until two conditions are met, in the following order:

1. A Job Scheduling Console session against the engine that fell over is started. In the scenario we are discussing where tivaix2 falls over to tivaix1, this means Job Scheduling Console users must log into tivaix2_svc.

2. The Job Scheduling Console users who originally logged into tivaix1_svc (the users of the surviving node, in other words) log out and log back into tivaix1_svc.

Note: While Job Scheduling Console users from the failed cluster node who log in again will see both scheduling engines, Job Scheduling Console users on the surviving cluster node will not see both engines until at least one user logs into the instance of IBM Tivoli Workload Scheduler that fell over, and they then log out and log back in themselves.


When these conditions are met, Job Scheduling Console users on the surviving node see a scheduling engine pane as shown in Figure 4-64.

Figure 4-64 Available scheduling engines on tivaix1 after tivaix2 falls over to it

Only after a Job Scheduling Console session communicates with the Connector for a scheduling engine is the scheduling engine recognized by other Job Scheduling Console sessions that connect later. Job Scheduling Console sessions that are already connected will not recognize the newly-started scheduling engine because identification of scheduling engines only occurs once during Job Scheduling Console startup.

While the second iteration of the design is a workable solution, it is still somewhat cumbersome because it requires users who need to work with both scheduling engines to remember a set of rules. Fortunately, there is one final refinement to our design that helps address some of this awkwardness.

The TMR interconnection feature of IBM Tivoli Management Framework allows objects on one instance of IBM Tivoli Management Framework to be managed by another instance, and vice versa. We used a two-way interconnection between the IBM Tivoli Management Framework instances on the two cluster nodes in the environment we used for this redbook to expose the Connectors on each cluster node to other cluster nodes. Now when tivaix2 falls over to tivaix1, Job Scheduling Console users see the available scheduling engines, as shown in Figure 4-65.

Figure 4-65 Available Connectors in interconnected Framework environment after tivaix2 falls over to tivaix1

Note that we now define the Connectors by the cluster node and resource group they are used for. So Connector TIVAIX1_rg1 is for resource group rg1 (that is, scheduling engine TWS Engine1) on tivaix1. In Figure 4-65, we see Connector TIVAIX1_rg2 is active. It is for resource group rg2 (that is, TWS Engine2) on


tivaix1, and it is active only when tivaix2 falls over to tivaix1. Connector TIVAIX2_rg1 is used if resource group rg1 falls over to tivaix2. Connector TIVAIX2_rg2 would normally be active, but because resource group rg2 has fallen over to tivaix1, it is inactive in the preceding figure.

During normal operation of the cluster, the active Connectors are TIVAIX1_rg1 and TIVAIX2_rg2, as shown in Figure 4-66.

Figure 4-66 Available Connectors in interconnected Framework environment during normal cluster operation

In this section we show how to install IBM Tivoli Management Framework Version 4.1 into an HACMP Cluster configured to make IBM Tivoli Workload Scheduler highly available, with all available patches as of the time of writing. We specifically show how to install on tivaix1 in the environment we used for this redbook. Installing on tivaix2 is similar, except the IP hostname is changed where applicable.

Planning the installation sequence

Before installing, plan the sequence of packages to install. The publication Tivoli Enterprise Installation Guide Version 4.1, GC32-0804, describes in detail what needs to be installed.

Figure 4-67 on page 313 shows the sequence and dependencies of packages we planned for IBM Tivoli Management Framework Version 4.1 for the environment used for this redbook.


Figure 4-67 IBM Tivoli Framework 4.1.0 application and patch sequence and dependencies as of December 2, 2003

Stage installation media

We first stage the installation media on a hard disk for ease of installation. If your system does not have sufficient disk space to allow this, you can copy the media to a system that does have enough disk space and use Network File System (NFS), Samba, Andrew File System (AFS) or similar remote file systems to mount the media over the network.
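For example, if the media were staged on an NFS server instead of being copied locally, a read-only mount similar to the following could be used on each cluster node; the server name nfssrv and the export path are hypothetical, not part of our environment:

   mkdir -p /usr/sys/inst.images/tivoli
   mount -o ro,soft nfssrv:/export/tivoli /usr/sys/inst.images/tivoli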

In our environment, we created directories and copied the contents of the media and patches to the directories as shown in Table 4-4. The media was copied to both cluster nodes tivaix1 and tivaix2.

Table 4-4 Installation media directories used in our environment


Sub-directory under /usr/sys/inst.images/   Description of contents or disc title (or electronic download)

tivoli                                      Top level of installation media directory.
tivoli/fra                                  Top level of IBM Tivoli Management Framework media.
tivoli/fra/FRA410_1of2                      Tivoli Management Framework v4.1 1 of 2
tivoli/fra/FRA410_2of2                      Tivoli Management Framework v4.1 2 of 2


You can download the patches for IBM Tivoli Management Framework Version 4.1 from:

ftp://ftp.software.ibm.com/software/tivoli_support/patches/patches_4.1

Note that we extracted the contents of each patch’s tar file into the corresponding patch directory, so that the file PATCH.LST is in the top level of the patch directory. For example, for patch 4.1-TMF-0008, we downloaded the tar file:

ftp://ftp.software.ibm.com/software/tivoli_support/patches/patches_4.1/4.1-TMF-0008/4.1-TMF-0008.tar

Then we expanded the tar file in /usr/sys/inst.images/tivoli, resulting in a directory called 41TMF008. One of the files beneath that directory was the PATCH.LST file.
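The staging of a single patch therefore follows this general pattern, shown here for 4.1-TMF-0008 and using the tivoli/fra directory from Table 4-4; the transfer step itself depends on your FTP client:

   cd /usr/sys/inst.images/tivoli/fra
   # ...transfer 4.1-TMF-0008.tar from the FTP site into this directory...
   tar -xvf 4.1-TMF-0008.tar
   ls 41TMF008/PATCH.LST        # confirms the patch is staged at the expected level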

Example 4-51 shows the top two levels of the directory structure.

Example 4-51 Organization of installation media

[root@tivaix1:/home/root] ls /usr/sys/inst.images/tivoli/
./   ../  fra/
[root@tivaix1:/home/root] ls /usr/sys/inst.images/tivoli/*
/usr/sys/inst.images/tivoli/fra:
./            41TMF014/     41TMF017/     FRA410_1of2/
../           41TMF015/     41TMF032/     FRA410_2of2/
41TMF008/     41TMF016/     41TMF034/

/usr/sys/inst.images/tivoli/wkb:
./            1.3-JSC-FP01/   JSC130_1/     TWS820_1/
../           8.2-TWS-FP01/   JSC130_2/     TWS820_2/

Table 4-4 (continued)

Sub-directory under /usr/sys/inst.images/   Description of contents or disc title (or electronic download)

tivoli/fra/41TMFnnn                         Extracted tar file contents of patch 4.1-TMF-0nnn.
tivoli/wkb                                  Top level of IBM Tivoli Workload Scheduler media
tivoli/wkb/TWS820_1                         IBM Tivoli Workload Scheduler V8.2 1 of 2
tivoli/wkb/TWS820_2                         IBM Tivoli Workload Scheduler V8.2 2 of 2
tivoli/wkb/8.2-TWS-FP01                     IBM Tivoli Workload Scheduler V8.2 Fix Pack 1
tivoli/wkb/JSC130_1                         Job Scheduling Console V1.3 1 of 2
tivoli/wkb/JSC130_2                         Job Scheduling Console V1.3 2 of 2
tivoli/wkb/1.3-JSC-FP01                     Job Scheduling Console V1.3 Fix Pack 1


After staging the media, install the base product as shown in the following section.

Install base Framework

In this section we show how to install IBM Tivoli Management Framework so it is specifically configured for IBM Tivoli Workload Scheduler on HACMP. This enables you to transition the instances of IBM Tivoli Management Framework used for IBM Tivoli Workload Scheduler to a mutual takeover environment if that becomes a supported feature in the future. We believe the configuration as shown in this section can be started and stopped directly from HACMP in a mutual takeover configuration.

When installing IBM Tivoli Management Framework on an HACMP Cluster node in support of IBM Tivoli Workload Scheduler, use the primary IP hostname as the hostname for IBM Tivoli Management Framework. Add an IP alias later for the service IP label. When this configuration is used with the multiple Connector object configuration described in “Planning for IBM Tivoli Management Framework” on page 303, Job Scheduling Console users can connect through any instance of IBM Tivoli Management Framework, no matter which cluster node falls over.

IBM Tivoli Management Framework consists of a base install and various components. You must first prepare for the base install by running the commands shown in Example 4-52; in our environment, these are the commands for cluster node tivaix1.

On tivaix2, we change the IP hostname in the first command (shown in bold) from tivaix1 to tivaix2.

Example 4-52 Preparing for installation of IBM Tivoli Management Framework 4.1

[root@tivaix1:/home/root] HOST=tivaix1
[root@tivaix1:/home/root] echo $HOST > /etc/wlocalhost
[root@tivaix1:/home/root] WLOCALHOST=$HOST
[root@tivaix1:/home/root] export WLOCALHOST
[root@tivaix1:/home/root] mkdir /usr/local/Tivoli/install_dir
[root@tivaix1:/home/root] cd /usr/local/Tivoli/install_dir
[root@tivaix1:/home/root] /bin/sh /usr/sys/inst.images/tivoli/fra/FRA410_1of2/WPREINST.SH
to install, type ./wserver -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2
[root@tivaix1:/home/root] DOGUI=no
[root@tivaix1:/home/root] export DOGUI

After you prepare for the base install, perform the initial installation of IBM Tivoli Management Framework by running the command shown in Example 4-53 on page 316. You will see output similar to this example; depending upon the speed of your server, it will take 5 to 15 minutes to complete.


On tivaix2 in our environment, we run the same command except we change the third line of the command highlighted in bold from tivaix1 to tivaix2.

Example 4-53 Initial installation of IBM Tivoli Management Framework Version 4.1

[root@tivaix1:/home/root] sh ./wserver -y \
-c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 \
-a tivaix1 -d \
BIN=/usr/local/Tivoli/bin! \
LIB=/usr/local/Tivoli/lib! \
ALIDB=/usr/local/Tivoli/spool! \
MAN=/usr/local/Tivoli/man! \
APPD=/usr/lib/lvm/X11/es/app-defaults! \
CAT=/usr/local/Tivoli/msg_cat! \
LK=1FN5B4MBXBW4GNJ8QQQ62WPV0RH999P99P77D \
RN=tivaix1-region \
AutoStart=1 SetPort=1 CreatePaths=1 @ForceBind@=yes @EL@=None
Using command line style installation.....

Unless you cancel, the following operations will be executed:
   need to copy the CAT (generic) to:
      tivaix1:/usr/local/Tivoli/msg_cat
   need to copy the CSBIN (generic) to:
      tivaix1:/usr/local/Tivoli/bin/generic
   need to copy the APPD (generic) to:
      tivaix1:/usr/lib/lvm/X11/es/app-defaults
   need to copy the GBIN (generic) to:
      tivaix1:/usr/local/Tivoli/bin/generic_unix
   need to copy the BUN (generic) to:
      tivaix1:/usr/local/Tivoli/bin/client_bundle
   need to copy the SBIN (generic) to:
      tivaix1:/usr/local/Tivoli/bin/generic
   need to copy the LCFNEW (generic) to:
      tivaix1:/usr/local/Tivoli/bin/lcf_bundle.40
   need to copy the LCFTOOLS (generic) to:
      tivaix1:/usr/local/Tivoli/bin/lcf_bundle.40/bin
   need to copy the LCF (generic) to:
      tivaix1:/usr/local/Tivoli/bin/lcf_bundle
   need to copy the LIB (aix4-r1) to:
      tivaix1:/usr/local/Tivoli/lib/aix4-r1
   need to copy the BIN (aix4-r1) to:
      tivaix1:/usr/local/Tivoli/bin/aix4-r1
   need to copy the ALIDB (aix4-r1) to:
      tivaix1:/usr/local/Tivoli/spool/tivaix1.db
   need to copy the MAN (aix4-r1) to:
      tivaix1:/usr/local/Tivoli/man/aix4-r1
   need to copy the CONTRIB (aix4-r1) to:
      tivaix1:/usr/local/Tivoli/bin/aix4-r1/contrib
   need to copy the LIB371 (aix4-r1) to:


      tivaix1:/usr/local/Tivoli/lib/aix4-r1
   need to copy the LIB365 (aix4-r1) to:
      tivaix1:/usr/local/Tivoli/lib/aix4-r1
Executing queued operation(s)
Distributing machine independent Message Catalogs --> tivaix1 ..... Completed.

Distributing machine independent generic Codeset Tables --> tivaix1 .... Completed.

Distributing architecture specific Libraries --> tivaix1 ...... Completed.

Distributing architecture specific Binaries --> tivaix1 ............. Completed.

Distributing architecture specific Server Database --> tivaix1 .......................................... Completed.

Distributing architecture specific Man Pages --> tivaix1 ..... Completed.

Distributing machine independent X11 Resource Files --> tivaix1 ... Completed.

Distributing machine independent Generic Binaries --> tivaix1 ... Completed.

Distributing machine independent Client Installation Bundle --> tivaix1 ... Completed.

Distributing machine independent generic HTML/Java files --> tivaix1 ... Completed.

Distributing architecture specific Public Domain Contrib --> tivaix1 ... Completed.

Distributing machine independent LCF Images (new version) --> tivaix1 ............. Completed.

Distributing machine independent LCF Tools --> tivaix1 ....... Completed.

Distributing machine independent 36x Endpoint Images --> tivaix1 ............ Completed.

Distributing architecture specific 371_Libraries --> tivaix1 .... Completed.


Distributing architecture specific 365_Libraries --> tivaix1 .... Completed.

Registering installation information...
Finished.

Load Tivoli environment variable in .profile files

The Tivoli environment variables contain pointers to important directories that IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework use for many commands. Loading the variables in the .profile file of a user account ensures that these environment variables are always available immediately after logging into the user account.

Use the commands shown in Example 4-54 to modify the .profile files of the root and TWSuser user accounts on all cluster nodes to source in all Tivoli environment variables for IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework.

Example 4-54 Load Tivoli environment variables

PATH=${PATH}:${HOME}/bin
if [ -f /etc/Tivoli/setup_env.sh ] ; then
    . /etc/Tivoli/setup_env.sh
fi
if [ -f `maestro`/tws_env.sh ] ; then
    . `maestro`/tws_env.sh
fi

Also enter these commands on the command line, or log out and log back in to activate the environment variables for the following sections.
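A quick, informal check that the environment loaded is to confirm that commands supplied by each product now resolve on the PATH. This is our own suggestion, and it assumes the .profile changes shown in Example 4-54 are in place and that the IBM Tivoli Workload Scheduler file system for the node is mounted:

   . $HOME/.profile
   type odadmin      # should resolve once setup_env.sh has added the Framework directories to PATH
   type conman       # should resolve once tws_env.sh has added the TWShome/bin directory to PATH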

Install Tivoli Framework components and patches

After the base install is complete, you can install all remaining Framework components and patches by running the script shown in Example 4-55 on page 319.

If you use this script on tivaix2, change the line that starts with the string “HOST=” so that tivaix1 is replaced with tivaix2.
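If you prefer not to maintain a separate copy of the script for each node, the HOST= line can also be derived at run time. This is a minimal sketch, assuming the Managed Node name used by the Framework matches the output of the hostname command on each cluster node:

# Derive the target host from the local hostname so the same script
# runs unchanged on tivaix1 and tivaix2.
HOST=`hostname`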


Example 4-55 Script for installing IBM Tivoli Management Framework Version 4.1 with patches

#!/bin/ksh

if [ -d /etc/Tivoli ] ; then
    . /etc/Tivoli/setup_env.sh
fi

reexec_oserv()
{
    echo "Reexecing object dispatchers..."
    if [ `odadmin odlist list_od | wc -l` -gt 1 ] ; then
        #
        # Determine if necessary to shut down any clients
        tmr_hosts=`odadmin odlist list_od | head -1 | cut -c 36-`
        client_list=`odadmin odlist list_od | grep -v ${tmr_hosts}$`
        if [ "${client_list}" = "" ] ; then
            echo "No clients to shut down, skipping shut down of clients..."
        else
            echo "Shutting down clients..."
            odadmin shutdown clients
            echo "Waiting for all clients to shut down..."
            sleep 30
        fi
    fi
    odadmin reexec 1
    sleep 30
    odadmin start clients
}

HOST="tivaix1"
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JRE130 $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JHELP41 $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JCF41 $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JRIM41 $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i MDIST2GU $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i SISDEPOT $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i SISCLNT $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 -y -i ADE $HOST
winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 -y -i AEF $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF008 -y -i 41TMF008 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF014 -y -i 41TMF014 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF015 -y -i 41TMF015 $HOST
reexec_oserv
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF016 -y -i 41TMF016 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2928 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2929 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2931 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2932 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2962 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2980 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2984 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2986 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2987 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2989 $HOST
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF034 -y -i 41TMF034 $HOST
reexec_oserv
wpatch -c /usr/sys/inst.images/tivoli/fra/41TMF032 -y -i JRE130_0 $HOST

This completes the installation of IBM Tivoli Management Framework Version 4.1.

After installing IBM Tivoli Management Framework, configure it to meet the requirements of integrating with IBM Tivoli Workload Scheduler over HACMP.

Add IP alias to oserv

Installing IBM Tivoli Management Framework using the primary IP hostname of the server binds the Framework server (also called oserv) to the corresponding IP address. It only listens for Framework network traffic on this IP address. This makes it easy to start IBM Tivoli Management Framework before starting HACMP.

In our environment, we also need oserv to listen on the service IP address. The service IP label/address is moved between cluster nodes along with its parent resource group, but the primary IP hostname remains on the cluster node to ease administrative access (that is why it is called the persistent IP label/address). Job Scheduling Console users depend on the service IP address, not the primary IP hostname of the server, to access IBM Tivoli Workload Scheduler services.

As a security precaution, IBM Tivoli Management Framework only listens on the IP address it was initially installed against, unless this restriction is specifically disabled so that the oserv can bind to other addresses. We show you how to disable this feature in this section.

To add the service IP label as a Framework oserv IP alias, follow these steps:

1. Log in as root user on a cluster node.

In our environment, we log in as root user on cluster node tivaix1.

2. Use the odadmin command as shown in Example 4-56 on page 321 to verify the current IP aliases of the oserv, add the service IP label as an IP alias to the oserv, then verify that the service IP label is added to the oserv as an IP alias.


Note that the numeral “1” in the odadmin odlist add_ip_alias command should be replaced by the “dispatcher number” of your Framework installation.

Example 4-56 Add an IP alias to the Framework oserv server

[root@tivaix1:/home/root] odadmin odlist
Region           Disp Flags Port  IPaddr        Hostname(s)
1369588498          1  ct-    94  9.3.4.194     tivaix1,tivaix1.itsc.austin.ibm.com
[root@tivaix1:/home/root] odadmin odlist add_ip_alias 1 tivaix1_svc
[root@tivaix1:/home/root] odadmin odlist
Region           Disp Flags Port  IPaddr        Hostname(s)
1369588498          1  ct-    94  9.3.4.194     tivaix1,tivaix1.itsc.austin.ibm.com
                                  9.3.4.3       tivaix1_svc

The dispatcher number is displayed in the second column of the odadmin odlist command, on the same line as the primary IP hostname of your Framework installation. In Example 4-57, the dispatcher number is 7.

Example 4-57 Identify the dispatcher number of a Framework installation

[root@tivaix1:/home/root] odadmin odlist
Region           Disp Flags Port  IPaddr        Hostname(s)
1369588498          7  ct-    94  9.3.4.194     tivaix1,tivaix1.itsc.austin.ibm.com

The dispatcher number will be something other than “1” if you delete and reinstall Managed Nodes, or if your Framework server is part of an overall Tivoli Enterprise™ installation.

3. Use the odadmin command as shown in Example 4-58 to verify that IBM Tivoli Management Framework currently binds against the primary IP hostname, disable the feature, then verify that it is disabled.

Note that the numeral “1” in the odadmin set_force_bind command should be replaced by the “dispatcher number” of your Framework installation.

Example 4-58 Disable set_force_bind object dispatcher option

[root@tivaix1:/home/root] odadmin | grep Force
Force socket bind to a single address = TRUE
[root@tivaix1:/home/root] odadmin set_force_bind FALSE 1
[root@tivaix1:/home/root] odadmin | grep Force
Force socket bind to a single address = FALSE

The dispatcher number is displayed in the second column of the odadmin odlist command, on the same line as the primary IP hostname of your Framework installation.

In Example 4-59 on page 322, the dispatcher number is 7.


Example 4-59 Identify the dispatcher number of a Framework installation

[root@tivaix1:/home/root] odadmin odlist
Region           Disp Flags Port  IPaddr        Hostname(s)
1369588498          7  ct-    94  9.3.4.194     tivaix1,tivaix1.itsc.austin.ibm.com

The dispatcher number will be something other than “1” if you delete and reinstall Managed Nodes, or if your Framework server is part of an overall Tivoli Enterprise installation.

4. Repeat the operation on all remaining cluster nodes.

For our environment, we repeated the operation on tivaix2, replacing tivaix1 with tivaix2 in the commands.
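If your Framework installation does not use dispatcher number 1, the add_ip_alias and set_force_bind commands in this section can be scripted so that the dispatcher number is looked up rather than hard-coded. This is a minimal ksh sketch, not taken from our scenario, that assumes the node's primary IP hostname appears in the Hostname(s) column of the odadmin odlist output and that the service IP label follows the naming convention used in this chapter (tivaix1_svc on tivaix1):

#!/bin/ksh
# Look up this node's dispatcher number, then add the service IP label
# as an oserv IP alias and relax the single-address bind restriction.
HOSTNAME=`hostname`
SVC_LABEL=${HOSTNAME}_svc
# Second column of the odadmin odlist line whose Hostname(s) field names this node
DISP=`odadmin odlist | awk -v h="$HOSTNAME" '$0 ~ h {print $2; exit}'`
if [ -n "$DISP" ] ; then
    odadmin odlist add_ip_alias $DISP $SVC_LABEL
    odadmin set_force_bind FALSE $DISP
fi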

Install IBM Tivoli Workload Scheduler Framework components

After installing IBM Tivoli Management Framework, install the Framework components for IBM Tivoli Workload Scheduler. The components for IBM Tivoli Workload Scheduler Version 8.2 in the environment we use throughout this redbook are:

- Tivoli Job Scheduling Services v1.2

- Tivoli TWS Connector 8.2

There are separate versions for Linux environments. See Tivoli Workload Scheduler Job Scheduling Console User’s Guide, SH19-4552, to identify the equivalent components for a Linux environment.

Best practice is to back up the Framework object database before installing any Framework components. This enables you to restore the object database to its original state before the installation in case the install operation encounters a problem.

Important: Disabling the set_force_bind variable can cause unintended side effects for installations of IBM Tivoli Management Framework that also run other IBM Tivoli server products, such as IBM Tivoli Monitoring and IBM Tivoli Configuration Manager. Refer to your IBM service provider for advice on how to address this potential conflict if you plan on deploying other IBM Tivoli server products on top of the instance of IBM Tivoli Management Framework that you use for IBM Tivoli Workload Scheduler.

Best practice is to dedicate an instance of IBM Tivoli Management Framework for IBM Tivoli Workload Scheduler, typically on the Master Domain Manager, and not to install other IBM Tivoli server products into it. This simplifies these administrative concerns and does not affect the functionality of a Tivoli Enterprise environment.


Use the wbkupdb command as shown in Example 4-60 to back up the object database.

Example 4-60 Back up the object database of IBM Tivoli Management Framework

[root@tivaix1:/home/root] cd /tmp
[root@tivaix1:/tmp] wbkupdb tivaix1 ; echo DB_`date +%b%d-%H%M`

Starting the snapshot of the database files for tivaix1.............................................................................................

Backup Complete.
DB_Dec09-1958

The last line of the output is produced by the echo command; it returns the name of the backup file created by wbkupdb. All backup files are stored in the directory $DBDIR/../backups. Example 4-61 shows how to list all the available backup files.

Example 4-61 List all available object database backup files

[root@tivaix1:/home/root] ls $DBDIR/../backups
./              ../             DB_Dec08-1705   DB_Dec08-1716
DB_Dec08-1723   DB_Dec08-1724   DB_Dec09-1829

Example 4-61 shows there are five backups taken of the object database on cluster node tivaix1.

A common reason wbkupdb fails is that the current working directory it is executed from either does not grant write permission to the user account running it, or does not have enough space to temporarily hold a copy of the object database directory.

Example 4-62 on page 324 shows how to verify there is enough disk space to run wbkupdb.

Tip: Backing up the object database of IBM Tivoli Management Framework requires that the current working directory that the wbkupdb command is executed from grants write permission to the current user and contains enough disk space to temporarily hold the object database.


Example 4-62 Verifying enough disk space in the current working directory for wbkupdb

[root@tivaix1:/tmp] pwd
/tmp
[root@tivaix1:/tmp] du -sk $DBDIR
15764   /usr/local/Tivoli/spool/tivaix1.db
[root@tivaix1:/tmp] df -k /tmp
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd3          1146880    661388   43%      872     1% /tmp

In Example 4-62, the current working directory is /tmp. The du command in the example shows how much space the object database directory occupies. It is measured in kilobytes, and is 15,764 kilobytes in this example (highlighted in bold).

The df command in the example shows how much space is available in the current working directory. The third column, labeled “Free” in the output of the command, shows the available space in kilobytes. In this example, the available disk space in /tmp is 661,388 kilobytes. As long as the latter number is at least twice as large as the former, proceed with running wbkupdb.
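This check can also be scripted so that wbkupdb only runs when the free space test passes. The following is a minimal ksh sketch, assuming the Tivoli environment (and therefore $DBDIR) has already been sourced and that the Managed Node name matches the output of the hostname command:

#!/bin/ksh
# Back up the object database only if the current directory has at least
# twice as much free space as the database occupies.
cd /tmp || exit 1
DB_KB=`du -sk $DBDIR | awk '{print $1}'`
FREE_KB=`df -k . | awk 'NR==2 {print $3}'`    # "Free" is the third column on AIX
if [ "$FREE_KB" -ge `expr "$DB_KB" \* 2` ] ; then
    wbkupdb `hostname`
else
    echo "Not enough free space in `pwd` to back up the object database" >&2
    exit 1
fi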

If the installation of these critical IBM Tivoli Workload Scheduler components fails, refer to your site's Tivoli administrators for assistance in recovering from the error, and direct them to the file created by wbkupdb (as reported by the echo command).

To install the IBM Tivoli Management Framework components for IBM Tivoli Workload Scheduler:

1. Log in as root user on a cluster node.

In our environment, we logged in as root user on tivaix1.

2. Enter the winstall command as shown in Example 4-63 to install Job Scheduling Services.

Example 4-63 Install Job Scheduling Services component on cluster node tivaix1

[root@tivaix1:/home/root] winstall -c /usr/sys/inst.images/tivoli/wkb/TWS820_2/TWS_CONN \
 -y -i TMF_JSS tivaix1
Checking product dependencies...
  Product TMF_3.7.1 is already installed as needed.
  Dependency check completed.
Inspecting node tivaix1...
Installing Product: Tivoli Job Scheduling Services v1.2

Unless you cancel, the following operations will be executed:
For the machines in the independent class:
  hosts: tivaix1
  need to copy the CAT (generic) to:
    tivaix1:/usr/local/Tivoli/msg_cat

For the machines in the aix4-r1 class:
  hosts: tivaix1
  need to copy the BIN (aix4-r1) to:
    tivaix1:/usr/local/Tivoli/bin/aix4-r1
  need to copy the ALIDB (aix4-r1) to:
    tivaix1:/usr/local/Tivoli/spool/tivaix2.db

Creating product installation description object...Created.
Executing queued operation(s)
Distributing machine independent Message Catalogs --> tivaix1 Completed.

Distributing architecture specific Binaries --> tivaix1 Completed.

Distributing architecture specific Server Database --> tivaix1
....Product install completed successfully. Completed.

Registering product installation attributes...Registered.

3. Enter the winstall command as shown in Example 4-64 on page 326 to install the Connector Framework resource. The command requires two IBM Tivoli Workload Scheduler-specific arguments, twsdir and iname.

These arguments create an initial Connector object. Best practice is to create initial Connector objects on a normally operating cluster. The order in which Connector objects are created does not affect functionality. It is key, however, to ensure that the resource group of the IBM Tivoli Workload Scheduler instance for which the initial Connector is being created is in the ONLINE state on the cluster node you are working on.
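You may want to confirm the resource group state from the node itself before running the command. The following is a minimal sketch, not taken from our scenario, that assumes HACMP 5.x provides the clRGinfo utility in /usr/es/sbin/cluster/utilities and that its output lists the state and node name on the same line; rg1 is the resource group name used in our environment:

RG=rg1
NODE=`hostname`
# Proceed only if the resource group is reported ONLINE on this node
if /usr/es/sbin/cluster/utilities/clRGinfo "$RG" 2>/dev/null | grep -w ONLINE | grep -qw "$NODE" ; then
    echo "$RG is ONLINE on $NODE; safe to create its initial Connector"
else
    echo "$RG is not ONLINE on $NODE; bring it online before creating the Connector" >&2
fi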

Note: Both IBM Tivoli Workload Scheduler Job Scheduling Console User’s Guide Feature Level 1.2, SH19-4552 (released for IBM Tivoli Workload Scheduler Version 8.1) on page 26, and IBM Tivoli Workload Scheduler Job Scheduling Console User’s Guide Feature Level 1.3, SC32-1257 (released for IBM Tivoli Workload Scheduler Version 8.2) on page 45 refer to an owner argument to pass to the winstall command to install the Connector.

We believe this is incorrect, because the index files TWS_CONN.IND for both versions of IBM Tivoli Workload Scheduler do not indicate support for this argument, and using the argument produces errors in the installation.


twsdir Enter the TWShome directory of an active instance of IBM Tivoli Workload Scheduler. The file system of the instance must be mounted and available.

iname Enter a Connector name for the instance of IBM Tivoli Workload Scheduler.

In our environment, we use /usr/maestro for twsdir (making sure it is mounted), and we use TIVAIX1_rg1 as the Connector name for iname because we want to create an initial Connector object for resource group rg1 on tivaix1; with the cluster in normal operation, resource group rg1 is in the ONLINE state on tivaix1.

Example 4-64 Install Connector component for cluster node tivaix1

[root@tivaix1:/home/root] winstall -c \
/usr/sys/inst.images/tivoli/wkb/TWS820_2/TWS_CONN -y -i TWS_CONN \
twsdir=/usr/maestro iname=TIVAIX1_rg1 createinst=1 tivaix1
Checking product dependencies...
  Product TMF_JSS_1.2 is already installed as needed.
  Product TMF_3.7.1 is already installed as needed.
  Dependency check completed.
Inspecting node tivaix1...
Installing Product: Tivoli TWS Connector 8.2

Unless you cancel, the following operations will be executed:
For the machines in the independent class:
  hosts: tivaix1

For the machines in the aix4-r1 class:
  hosts: tivaix1
  need to copy the BIN (aix4-r1) to:
    tivaix1:/usr/local/Tivoli/bin/aix4-r1
  need to copy the ALIDB (aix4-r1) to:
    tivaix1:/usr/local/Tivoli/spool/tivaix1.db

Creating product installation description object...Created.
Executing queued operation(s)
Distributing architecture specific Binaries --> tivaix1 .. Completed.

Distributing architecture specific Server Database --> tivaix1
....Product install completed successfully. Completed.

Registering product installation attributes...Registered.


4. Verify both Framework components are installed using the wlsinst command as shown in the following example. The strings “Tivoli Job Scheduling Services v1.2” and “Tivoli TWS Connector 8.2” (highlighted in bold in Example 4-65) should display in the output of the command.

Example 4-65 Verify installation of Framework components for IBM Tivoli Workload Scheduler

[root@tivaix1:/home/root] wlsinst -p
Tivoli Management Framework 4.1
Tivoli ADE, Version 4.1 (build 09/19)
Tivoli AEF, Version 4.1 (build 09/19)
Tivoli Java Client Framework 4.1
Java 1.3 for Tivoli
Tivoli Java RDBMS Interface Module (JRIM) 4.1
JavaHelp 1.0 for Tivoli 4.1
Tivoli Software Installation Service Client, Version 4.1
Tivoli Software Installation Service Depot, Version 4.1
Tivoli Job Scheduling Services v1.2
Tivoli TWS Connector 8.2
Distribution Status Console, Version 4.1

5. Verify the installation of the initial Connector instance using the wtwsconn.sh command. Pass the same Connector name used for the iname argument in the preceding step as the value to the -n flag argument. Example 4-66 shows the flag argument value TIVAIX1_rg1 (highlighted in bold).

In our environment we passed TIVAIX1_rg1 as the value for the -n flag argument.

Example 4-66 Verify creation of initial Connector

[root@tivaix1:/home/root] wtwsconn.sh -view -n TIVAIX1_rg1
MaestroEngine 'maestroHomeDir' attribute set to: "/usr/maestro"
MaestroPlan 'maestroHomeDir' attribute set to: "/usr/maestro"
MaestroDatabase 'maestroHomeDir' attribute set to: "/usr/maestro"

The output of the command shows the directory path used as the value for the twsdir argument in the preceding step, repeated on three lines (highlighted in bold in Example 4-66).

6. Repeat the operation for the remaining cluster nodes.

In our environment, we repeated the operation for cluster node tivaix2. We used /usr/maestro2 for the twsdir argument and TIVAIX2_rg2 for the iname argument.


Create additional Connectors

The initial Connector objects created as part of the installation of IBM Tivoli Workload Scheduler Framework components only address one resource group that can run on each cluster node. Create additional Connectors to address all possible resource groups that a cluster node can take over, on all cluster nodes.

To create additional Connector objects:

1. Log in as root user on a cluster node.

In our environment we log in as root user on cluster node tivaix1.

2. Use the wlookup command to identify which Connector objects already exist on the cluster node, as shown in Example 4-67.

Example 4-67 Identify which Connector objects already exist on a cluster node

[root@tivaix1:/home/root] wlookup -Lar MaestroEngine
TIVAIX1_rg1

In our environment, the only Connector object that exists is the one created by the installation of the IBM Tivoli Workload Scheduler Framework components, TIVAIX1_rg1, highlighted in bold in Example 4-67.

3. Use the wtwsconn.sh command to create an additional Connector object, as shown in Example 4-68. The command accepts the name of the Connector object to create for the value of the -n flag argument, and the TWShome directory path of the instance of IBM Tivoli Workload Scheduler that the Connector object will correspond to, as the value for the -t flag argument.

The corresponding resource group does not have to be in the ONLINE state on the cluster node. This step only creates the object, but does not require the presence of the resource group to succeed.

In our environment we created the Connector object TIVAIX1_rg2 to manage resource group rg2 on tivaix1 in case tivaix2 falls over to tivaix1. Resource group rg2 contains scheduling engine TWS Engine2. TWS Engine2 is installed in /usr/maestro2. So we pass /usr/maestro2 as the value to the -t flag argument.

Example 4-68 Create additional Connector object

[root@tivaix1:/home/root] wtwsconn.sh -create -n TIVAIX1_rg2 -t /usr/maestro2
Scheduler engine created
Created instance: TIVAIX1_rg2, on node: tivaix1
MaestroEngine 'maestroHomeDir' attribute set to: /usr/maestro2
MaestroPlan 'maestroHomeDir' attribute set to: /usr/maestro2
MaestroDatabase 'maestroHomeDir' attribute set to: /usr/maestro2


4. Verify the creation of the additional Connector objects using the wtwsconn.sh command as shown in Example 4-69.

Example 4-69 Verify creation of additional Connector object

[root@tivaix1:/home/root] wtwsconn.sh -view -n TIVAIX1_rg2
MaestroEngine 'maestroHomeDir' attribute set to: "/usr/maestro2"
MaestroPlan 'maestroHomeDir' attribute set to: "/usr/maestro2"
MaestroDatabase 'maestroHomeDir' attribute set to: "/usr/maestro2"

Pass the name of a new Connector object as the value for the -n flag argument. The output displays the TWShome directory path you use to create the Connector object if the create operation is successful.

5. Repeat the operation for all remaining Connector objects to create on the cluster node. Only create Connector objects for possible resource groups that the cluster node can take over. Using the examples in this section for instance, we would not create any Connector objects on tivaix1 that start with “TIVAIX2”. So the Connector objects TIVAIX2_rg1 and TIVAIX2_rg2 would not be created on tivaix1. They are instead created on tivaix2. In our environment, we did not have any more resource groups to address, so we did not create any more Connectors on tivaix1.
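Where a node can take over several resource groups, Connector creation can also be driven from a short list, as in the following sketch for tivaix1 in our environment; the Connector names and TWShome paths are the ones used earlier in this section, so adjust them to your own configuration:

#!/bin/ksh
# Create one Connector per resource group this node can host.
# Each entry is "ConnectorName:TWShome".
for entry in TIVAIX1_rg1:/usr/maestro TIVAIX1_rg2:/usr/maestro2
do
    name=${entry%%:*}
    twshome=${entry##*:}
    # Skip Connectors that already exist on this node
    wlookup -Lar MaestroEngine | grep -qx "$name" && continue
    wtwsconn.sh -create -n "$name" -t "$twshome"
done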

6. Repeat the operation on all remaining cluster nodes. In our environment we created the Connector object TIVAIX2_rg1 as shown in Example 4-70.

Example 4-70 Create additional Connectors on tivaix2

[root@tivaix2:/home/root] wtwsconn.sh -create -n TIVAIX2_rg1 -t /usr/maestro
Scheduler engine created
Created instance: TIVAIX2_rg1, on node: tivaix2
MaestroEngine 'maestroHomeDir' attribute set to: /usr/maestro
MaestroPlan 'maestroHomeDir' attribute set to: /usr/maestro
MaestroDatabase 'maestroHomeDir' attribute set to: /usr/maestro
[root@tivaix2:/home/root] wtwsconn.sh -view -n TIVAIX2_rg1
MaestroEngine 'maestroHomeDir' attribute set to: "/usr/maestro"
MaestroPlan 'maestroHomeDir' attribute set to: "/usr/maestro"
MaestroDatabase 'maestroHomeDir' attribute set to: "/usr/maestro"

If you make a mistake creating a Connector, remove the Connector using the wtwsconn.sh command as shown in Example 4-71.

Example 4-71 Remove a Connector

[root@tivaix1:/home/root] wtwsconn.sh -remove -n TIVAIX2
Removed 'MaestroEngine' for 'TIVAIX2' instance
Removed 'MaestroPlan' for 'TIVAIX2' instance
Removed 'MaestroDatabase' for 'TIVAIX2' instance


In Example 4-71 on page 329, the Connector TIVAIX2 is removed. You can also use wtwsconn.sh to edit the one value accepted by a Connector when creating it: the TWShome directory of the instance of IBM Tivoli Workload Scheduler the Connector communicates with. Example 4-72 shows how to change the directory.

Example 4-72 Change Connector’s directory value

[root@tivaix1:/home/root] wtwsconn.sh -set -n TIVAIX2 -t /usr/maestro2
MaestroEngine 'maestroHomeDir' attribute set to: /usr/maestro2
MaestroPlan 'maestroHomeDir' attribute set to: /usr/maestro2
MaestroDatabase 'maestroHomeDir' attribute set to: /usr/maestro2

Editing the value of the directory is useful to match changes to the location of TWShome if IBM Tivoli Workload Scheduler is moved.

Configure Framework access

After you install IBM Tivoli Management Framework (see “Implementing IBM Tivoli Workload Scheduler in an HACMP cluster” on page 184), configure Framework access for the TWSuser accounts. This lets the TWSuser accounts have full access to IBM Tivoli Management Framework so you can add Tivoli Enterprise products like IBM Tivoli Workload Scheduler Plus Module, and manage IBM Tivoli Workload Scheduler Connectors.

In this redbook we show how to grant access to the root Framework Administrator object. The Tivoli administrators of some sites do not allow this level of access. Consult your Tivoli administrator if this is the case, because other levels of access can be arranged.

Use the wsetadmin command to grant this level of access to your TWSuser accounts. In our environment, we ran the following command as root user to identify which Framework Administrator object to modify:

wlookup -ar Administrator

This command returns output similar to that shown in Example 4-73, taken from tivaix1 in our environment.

Example 4-73 Identify which Framework Administrator object to modify to grant TWSuser account root-level Framework access

[root@tivaix1:/home/root] wlookup -ar Administrator
Root_tivaix1-region    1394109314.1.179#TMF_Administrator::Configuration_GUI#
root@tivaix1           1394109314.1.179#TMF_Administrator::Configuration_GUI#


This shows that the root account is associated with the Administrator object called root@tivaix1. We then used the following command to add the TWSuser accounts to this Administrator object:

wsetadmin -l maestro -l maestro2 root@tivaix1

This grants root-level Framework access to the user accounts maestro and maestro2. Use the wgetadmin command as shown in Example 4-74 to confirm that the TWSuser accounts were added to the root Framework Administrator object. In line 3, the line that starts with the string “logins:”, the TWSuser accounts maestro and maestro2 (highlighted in bold) indicate these accounts were successfully added to the Administrator object.

Example 4-74 Confirm TWSuser accounts are added to root Framework Administrator object

[root@tivaix1:/home/root] wgetadmin root@tivaix1
Administrator: Root_tivaix1-region
logins: root@tivaix1, maestro, maestro2
roles:  global  super, senior, admin, user, install_client, install_product, policy
        security_group_any_admin       user
        Root_tivaix1-region    admin, user, rconnect
notice groups: TME Administration, TME Authorization, TME Diagnostics, TME Scheduler

Once these are added, you can use the wtwsconn.sh command (and other IBM Tivoli Management Framework commands) to manage Connector objects from the TWSuser user account. If you are not sure which Connectors are available, use the wlookup command to identify the available Connectors, as shown in Example 4-75.

Example 4-75 Identify available Connectors to manage on cluster node

[root@tivaix1:/home/root] wlookup -Lar MaestroEngine
TIVAIX1

In Example 4-75, the Connector called “TIVAIX1” (case is significant for Connector names) is available on tivaix1.

Interconnect Framework servers

The Connectors for each resource group are configured on each cluster node. Interconnect the Framework servers to be able to manage the Connectors on each cluster node from every other cluster node. Framework interconnection is a complex subject. We will show how to interconnect the Framework servers for our environment, but you should plan your interconnection if your installation of IBM Tivoli Workload Scheduler is part of a larger Tivoli Enterprise environment. Consult your IBM service provider for assistance with planning the interconnection.


To interconnect the Framework servers for IBM Tivoli Workload Scheduler for our environment, follow these steps:

1. Before starting, make a backup of the IBM Tivoli Management Framework object database using the wbkupdb command as shown in Example 4-76. Log on to each cluster node as root user, and run a backup of the object database on each.

Example 4-76 Back up object database of IBM Tivoli Management Framework

[root@tivaix1:/home/root] cd /tmp
[root@tivaix1:/tmp] wbkupdb tivaix1

Starting the snapshot of the database files for tivaix1.............................................................................................

Backup Complete.

2. Temporarily grant remote shell access to the root user on each cluster node. Edit or create as necessary the .rhosts file in the home directory of the root user on each cluster node. (This is a temporary measure and we will remove it after we finish the interconnection operation.)

In our environment we created the .rhosts file with the contents as shown in Example 4-77.

Example 4-77 Contents of .rhosts file in home directory of root user

tivaix1 root
tivaix2 root

3. Temporarily grant the generic root user account (root with no hostname qualifier) a Framework login on the root Framework account. Run the wsetadmin command as shown:

wsetadmin -l root root@tivaix1

Tip: When working with Tivoli administrators, be aware that they are used to hearing “Framework resources” called “managed resources”. We use the term “Framework resource” in this redbook to point out that this is a concept applied to IBM Tivoli Management Framework, and to distinguish it from HACMP resources. It is not an official term, however, so when working with staff who are not familiar with HACMP we advise using the official term of “managed resources” to avoid confusion.


If you do not know your root Framework account, consult your Tivoli administrator or IBM service provider. (This is a temporary measure and we will remove it after we finish the interconnection operation.)

In our environment the root Framework account is root@tivaix1, so we grant the generic root user account a login on this Framework account.

4. Run the wlookup commands on the cluster node as shown in Example 4-78 to determine the Framework objects that exist before interconnection, so you can refer back to them later in the operation.

Example 4-78 Sampling Framework objects that exist before interconnection on tivaix1

[root@tivaix1:/home/root] wlookup -Lar ManagedNode
tivaix1
[root@tivaix1:/home/root] wlookup -Lar MaestroEngine
TIVAIX1_rg1
TIVAIX1_rg2

In our environment we ran the commands on tivaix1.

5. Run the same sequence of wlookup commands, but on the cluster node on the opposing side of the interconnection operation, as shown in Example 4-79.

Example 4-79 Sampling Framework objects that exist before interconnection on tivaix2

[root@tivaix2:/home/root] wlookup -Lar ManagedNode
tivaix2
[root@tivaix2:/home/root] wlookup -Lar MaestroEngine
TIVAIX2_rg1
TIVAIX2_rg2

In our environment we ran the commands on tivaix2.

6. Interconnect the Framework servers in a two-way interconnection using the wconnect command as shown in Example 4-80 on page 334.

Refer to Tivoli Management Framework Reference Manual Version 4.1, SC32-0806, for a complete description of how to use wconnect.

Note: If an interconnection is made under a user other than root, the /etc/hosts.equiv file also must be configured. Refer to “Secure and Remote Connections” in Tivoli Management Framework Maintenance and Troubleshooting Guide Version 4.1, GC32-0807, for more information.


Example 4-80 Interconnect the Framework servers on tivaix1 and tivaix2

[root@tivaix1:/home/root] wconnect -c none -l root -m Two-way -r none tivaix2
Enter Password for user root on host tivaix2:

In our environment we configured an interconnection against tivaix2, using the root account of tivaix2 to perform the operation through the remote shell service, as shown in Example 4-80.

Because we do not use interregion encryption (set during Framework installation in the wserver command arguments), we pass none to the -c flag option. Because we do not use encryption in tivaix2’s Tivoli region, we pass none to the -r flag option.

We log into tivaix2 and use the odadmin command to determine the encryption used in tivaix2’s Tivoli region, as shown in Example 4-81. The line that starts with “Inter-dispatcher encryption level” displays the encryption setting of the Tivoli region, which is none in the example (highlighted in bold).

Example 4-81 Determine the encryption used in the Tivoli region of tivaix2

[root@tivaix2:/home/root] odadmin
Tivoli Management Framework (tmpbuild) #1 Wed Oct 15 16:45:40 CDT 2003
(c) Copyright IBM Corp. 1990, 2003. All Rights Reserved.

Region = 1221183877
Dispatcher = 1
Interpreter type = aix4-r1
Database directory = /usr/local/Tivoli/spool/tivaix2.db
Install directory = /usr/local/Tivoli/bin
Inter-dispatcher encryption level = none
Kerberos in use = FALSE
Remote client login allowed = version_2
Install library path = /usr/local/Tivoli/lib/aix4-r1:/usr/lib:/usr/local/Tivoli/install_dir/iblib/aix4-r1:/usr/lib:/usr/local/Tivoli/lib/aix4-r1:/usr/lib
Force socket bind to a single address = FALSE
Perform local hostname lookup for IOM connections = FALSE
Use Single Port BDT = FALSE
Port range = (not restricted)
Single Port BDT service port number = default (9401)
Network Security = none

Note: While writing this redbook, we observed that the wconnect command behaves inconsistently when used in trusted host mode, especially upon frequently restored object databases. Therefore, we enabled trusted host access through .rhosts only as a precaution, and forced wconnect to require a password; then it does not exhibit the same inconsistency.


SSL Ciphers = default
ALLOW_NAT = FALSE
State flags in use = TRUE
State checking in use = TRUE
State checking every 180 seconds
Dynamic IP addressing allowed = FALSE
Transaction manager will retry messages 4 times.

7. Use the wlsconn and odadmin commands to verify the interconnection as shown in Example 4-82.

Example 4-82 Verify Framework interconnection

[root@tivaix1:/home/root] wlsconn
  MODE         NAME               SERVER      REGION
 <---->    tivaix2-region         tivaix2     1221183877
[root@tivaix1:/home/root] odadmin odlist
Region           Disp Flags Port  IPaddr        Hostname(s)
1369588498          1  ct-    94  9.3.4.194     tivaix1,tivaix1.itsc.austin.ibm.com
                                  9.3.4.3       tivaix1_svc
1112315744          1  ct-    94  9.3.4.195     tivaix2,tivaix2.itsc.austin.ibm.com

The output displays the primary IP hostname of the cluster node that is interconnected to in the preceding step. In our environment, the primary IP hostname of cluster node tivaix2 is found under the SERVER column of the output of the wlsconn command (highlighted in bold in Example 4-82, with the value tivaix2). The same value (tivaix2, highlighted in bold in Example 4-82) is found under the Hostname(s) column in the output of the odadmin command, on the row that shows the Tivoli region ID of the cluster node.

The Tivoli region ID is found by entering the odadmin command as shown in Example 4-83. It is on the line that starts with “Region =”.

Example 4-83 Determine Tivoli region ID of cluster node

[root@tivaix2:/home/root] odadmin
Tivoli Management Framework (tmpbuild) #1 Wed Oct 15 16:45:40 CDT 2003
(c) Copyright IBM Corp. 1990, 2003. All Rights Reserved.

Region = 1221183877
Dispatcher = 1
Interpreter type = aix4-r1
Database directory = /usr/local/Tivoli/spool/tivaix2.db
Install directory = /usr/local/Tivoli/bin

Important: Two-way interconnection operations only need to be performed on one side of the connection. If you have two cluster nodes, you only need to run the wconnect command on one of them.


Inter-dispatcher encryption level = none
Kerberos in use = FALSE
Remote client login allowed = version_2
Install library path = /usr/local/Tivoli/lib/aix4-r1:/usr/lib:/usr/local/Tivoli/install_dir/iblib/aix4-r1:/usr/lib:/usr/local/Tivoli/lib/aix4-r1:/usr/lib
Force socket bind to a single address = FALSE
Perform local hostname lookup for IOM connections = FALSE
Use Single Port BDT = FALSE
Port range = (not restricted)
Single Port BDT service port number = default (9401)
Network Security = none
SSL Ciphers = default
ALLOW_NAT = FALSE
State flags in use = TRUE
State checking in use = TRUE
State checking every 180 seconds
Dynamic IP addressing allowed = FALSE
Transaction manager will retry messages 4 times.

In this example, the region ID is shown as 1221183877.
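When scripting, the region ID can be captured directly from the odadmin output; a minimal sketch, assuming the "Region = <number>" line format shown in Example 4-83:

# Extract the Tivoli region ID into a shell variable
REGION=`odadmin | awk -F= '/^Region/ {gsub(/ /, "", $2); print $2; exit}'`
echo "Tivoli region ID: $REGION"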

8. Interconnecting Framework servers only establishes a communication path. The Framework resources that need to be shared between Framework servers have to be pulled across the servers using an explicit updating command.

Sharing a Framework resource shares all the objects that the resource defines. This enables Tivoli administrators to securely control which Framework objects are shared between Framework servers, and control the performance of the Tivoli Enterprise environment by leaving out unnecessary resources from the exchange of resources between Framework servers. Exchange all relevant Framework resources among cluster nodes by using the wupdate command.

In our environment we exchanged the following Framework resources:

– ManagedNode
– MaestroEngine
– MaestroDatabase
– MaestroPlan
– SchedulerEngine
– SchedulerDatabase
– SchedulerPlan

Use the script shown in Example 4-84 on page 337 to exchange resources on all cluster nodes.


Example 4-84 Exchange useful and required resources for IBM Tivoli Workload Scheduler

for resource in ManagedNode \
    MaestroEngine MaestroDatabase MaestroPlan \
    SchedulerEngine SchedulerDatabase SchedulerPlan
do
    wupdate -r ${resource} All
done

The SchedulerEngine Framework resource enables the interconnected scheduling engines to present themselves in the Job Scheduling Console. The MaestroEngine Framework resource enables the wmaeutil command to manage running instances of Connectors.

In our environment, we ran the script in Example 4-84 on tivaix1 and tivaix2.

9. Verify the exchange of Framework resources. Run the wlookup command as shown in Example 4-85 on the cluster node.

Note the addition of Framework objects that used to only exist on the cluster node on the opposite side of the interconnection.

Example 4-85 Verify on tivaix1 the exchange of Framework resources

[root@tivaix1:/home/root] wlookup -Lar ManagedNode
tivaix1
tivaix2
[root@tivaix1:/home/root] wlookup -Lar MaestroEngine

Important: Unlike the wconnect command, the wupdate command must be run on all cluster nodes, even on two-way interconnected Framework servers.

Tip: Best practice is to update the entire Scheduler series (SchedulerDatabase, SchedulerEngine, and SchedulerPlan) and Maestro™ series (MaestroDatabase, MaestroEngine, and MaestroPlan) of Framework resources, if for no other reason than to deliver administrative transparency so that all IBM Tivoli Workload Scheduler-related Framework objects can be managed from any cluster node running IBM Tivoli Management Framework.

It is much easier to remember that any IBM Tivoli Workload Scheduler-related Framework resource can be seen and managed from any cluster node running a two-way interconnected IBM Tivoli Management Framework server, than to remember a list of which resources must be managed locally on each individual cluster nodes, and which can be managed from anywhere in the cluster.


TIVAIX1_rg1
TIVAIX1_rg2
TIVAIX2_rg1
TIVAIX2_rg2

In our environment, we ran the commands on tivaix1.

10.Run the same sequence of wlookup commands, but on the cluster node on the opposite side of the interconnection, as shown in Example 4-86. The output from the commands should be identical to the same commands run on the cluster node in the preceding step.

Example 4-86 Verify on tivaix2 the exchange of Framework resources

[root@tivaix2:/home/root] wlookup -Lar ManagedNode
tivaix1
tivaix2
[root@tivaix2:/home/root] wlookup -Lar MaestroEngine
TIVAIX1_rg1
TIVAIX1_rg2
TIVAIX2_rg1
TIVAIX2_rg2

In our environment, we ran the commands on tivaix2.

11.Log into both cluster nodes through the Job Scheduling Console, using the service IP labels of the cluster nodes and the root user account. All scheduling engines (corresponding to the configured Connectors) on all cluster nodes appear. Those scheduling engines marked inactive are actually Connectors for potential resource groups on a cluster node that are not active because the resource group is not running on that cluster node.

In our environment, the list of available scheduling engines was as shown in Figure 4-68, for a cluster in normal operation.

Figure 4-68 Available scheduling engines after interconnection of Framework servers

12.Remove the .rhosts entries or delete the entire file if the two entries in this operation were the only ones added.


13.Remove the configuration that allows any root user to access Framework. Enter the wsetadmin command as shown.

wsetadmin -L root root@tivaix1

14.Set up a periodic job to exchange Framework resources using the wupdate command shown in the script in the preceding example. The frequency that the job should run at depends upon how often changes are made to the Connector objects. For most sites, best practice is a daily update about an hour before Jnextday. Timing it before Jnextday makes the Framework resource update compatible with any changes to the installation location of IBM Tivoli Workload Scheduler. These changes are often timed to occur right before Jnextday is run.
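A minimal sketch of such a periodic job as a root crontab entry, assuming Jnextday runs at 05:59 (adjust to your own schedule) and assuming the resource-exchange loop from Example 4-84 has been saved as /usr/local/bin/exchange_tws_resources.sh, a path we chose purely for illustration:

# Exchange Framework resources daily at 04:55, about an hour before Jnextday.
# Source the Tivoli environment first so that wupdate is found on the PATH.
55 4 * * * . /etc/Tivoli/setup_env.sh && /usr/local/bin/exchange_tws_resources.sh >/tmp/exchange_tws_resources.log 2>&1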

How to log in using the Job Scheduling Console

Job Scheduling Console users should log in using the service IP label of the scheduling engine they work with the most. Figure 4-69 shows how to log into TWS Engine1, no matter where it actually resides on the cluster, by using tivaix1_svc as the service label.

Figure 4-69 Log into TWS Engine1

Figure 4-70 on page 340 shows how to log into TWS Engine2.


Figure 4-70 Log into TWS Engine2

While using the IP hostnames will also work during normal operation of the cluster, they are not transferred during an HACMP fallover. Therefore, Job Scheduling Console users must use a service IP label for an instance of IBM Tivoli Workload Scheduler that falls over to a foreign cluster node.

4.1.12 Production considerations

In this redbook, we present a very straightforward implementation of a highly available configuration of IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework. An actual production deployment is considerably more complex. In this section, we identify some of the considerations that have to be managed in an actual deployment.

Naming conventions

In this redbook we used names selected to convey their product function as much as possible. However, this may lead to names that are inconvenient for users in a production environment.

The IP service labels in our environment, tivaix1_svc and tivaix2_svc, are the primary means for Job Scheduling Console users to specify what to log into. For these users, the “_svc” string typically holds no significance. We recommend using more meaningful names, such as master1 and master2 for two cluster nodes that implement Master Domain Manager servers, for example.

Connector names in this redbook emphasized the cluster node first. In an actual production environment, we recommend emphasizing the resource group first in the name. Furthermore, the name of the resource group would be more meaningful if it referred to its primary business function. For example, TIVAIX1_rg1 in the environment we used for this redbook would be changed to mdm1_tivaix1 for Master Domain Manager server 1. Job Scheduling Console users would then see in their GUI a list of resource groups in alphabetical order, in terms they already work with.

Dynamically creating and deleting Connectors

The inactive Connector objects do not have to remain in their static configurations. They only have to be created if a resource group falls over to a cluster node. For example, during normal operation of our environment, we do not use Connectors TIVAIX1_rg2 and TIVAIX2_rg1. If the Connectors can be dynamically created and deleted as necessary, then Job Scheduling Console users will only ever see active resource groups.

After a resource group is brought up in a cluster node, the rg_move_complete event is posted. A custom post-event script for the event can be developed to identify which resource group is moving, what cluster node it is moving to, and which Connectors are extraneous as a result of the move. This information, taken together, enables the script to create an appropriate new Connector and delete the old Connector. The result delivered to the Job Scheduling Console users is a GUI that presents the currently active scheduling engines running in the cluster as of the moment in time that the user logs into the scheduling network.
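The following is a minimal ksh sketch of such a post-event script, not a tested implementation: it assumes that HACMP passes the resource group name as the second positional argument to the rg_move_complete post-event script (verify the arguments your HACMP level actually passes before using anything like this), and it reuses the Connector naming convention and TWShome paths from this chapter:

#!/bin/ksh
# Hypothetical rg_move_complete post-event script: create the Connector for the
# resource group that just came online on this node, and remove the Connector
# that another node previously used for the same resource group.
. /etc/Tivoli/setup_env.sh

RG=$2                                   # resource group name (assumed argument position)
NODE=`hostname | tr 'a-z' 'A-Z'`        # for example, TIVAIX1
case "$RG" in
    rg1) TWSHOME=/usr/maestro  ;;
    rg2) TWSHOME=/usr/maestro2 ;;
    *)   exit 0 ;;                      # not a scheduling resource group
esac

NEW_CONN="${NODE}_${RG}"
# Create the Connector here if it does not already exist
wlookup -Lar MaestroEngine | grep -qx "$NEW_CONN" || \
    wtwsconn.sh -create -n "$NEW_CONN" -t "$TWSHOME"

# Remove Connectors for this resource group that are named after other nodes
for conn in `wlookup -Lar MaestroEngine | grep "_${RG}\$" | grep -v "^${NODE}_"`
do
    wtwsconn.sh -remove -n "$conn"
done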

Time synchronization

Best practice is to use a time synchronization tool to keep the clocks on all cluster nodes synchronized to a known time standard. One such tool we recommend is ntp, an Open Source implementation of the Network Time Protocol. For more information on downloading and implementing ntp for time synchronization, refer to:

http://www.ntp.org/

Network Time Protocol typically works by pulling time signals from the Internet or through a clock tuned to a specific radio frequency (which is sometimes not available in certain parts of the world). This suffices for the majority of commercial applications, even though using the Internet for time signals represents a single point of failure. Sites with extremely high availability requirements for applications that require very precise time keeping can use their own onsite reference clocks to eliminate using the Internet or a clock dependent upon a radio frequency as the single point of failure.


Security

In this redbook we present a very simplified implementation, with only as much security detail as necessary so that the HACMP aspects are not obscured. In an actual production deployment, however, security is usually a large part of any planning and implementation. Be aware that some sites may not grant access to the Framework at the level that we show.

Some sites may also enforce a Framework encryption level across the Managed Nodes. This affects the interconnection of servers. Consult your IBM service provider for information about your site’s encryption configuration and about how to interconnect in an encrypted Framework environment.

Other security concerns, such as firewalls between cluster nodes and firewalls between cluster nodes and client systems like Job Scheduling Console sessions, require careful consideration and planning. Consult your IBM service provider for assistance with these additional scenarios.

Monitoring

By design, failures of components in the cluster are handled automatically, but you need to be aware of all such events. Chapter 8, “Monitoring an HACMP Cluster”, in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862, describes various tools you can use to check the status of an HACMP Cluster, the nodes, networks, and resource groups within that cluster, and the daemons that run on the nodes.

HACMP software includes the Cluster Information Program (Clinfo), an SNMP-based monitor. HACMP for AIX software provides the HACMP for AIX MIB, which is associated with and maintained by the HACMP for AIX management agent, the Cluster SMUX peer daemon (clsmuxpd). Clinfo retrieves this information from the HACMP for AIX MIB through the clsmuxpd.

Clinfo can run on cluster nodes and on HACMP for AIX client machines. It makes information about the state of an HACMP Cluster and its components available to clients and applications via an application programming interface (API). Clinfo and its associated APIs enable developers to write applications that recognize and respond to changes within a cluster.

The Clinfo program, the HACMP MIB, and the API are documented in High Availability Cluster Multi-Processing for AIX Programming Client Applications Version 5.1, SC23-4865.

Although the combination of HACMP and the inherent high availability features built into the AIX system keeps single points of failure to a minimum, there are still failures that, although detected, can cause other problems. See the chapter on events in High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00, for suggestions about customizing error notification for various problems not handled by the HACMP events.

Geographic high availability

An extension of cluster-based high availability is geographic high availability. As the name implies, these configurations increase the availability of an application even more when combined with a highly available cluster. The configurations accomplish this by treating the cluster’s entire site as a single point of failure and introducing additional nodes in a geographically separate location. These geographically separate nodes can be clusters in themselves.

Consult your IBM service provider for assistance in planning and implementing a geographic high availability configuration.

Enterprise management

Delivering production-quality clusters often involves implementing enterprise systems management tools and processes to ensure the reliability, availability and serviceability of the applications that depend upon the cluster. This section covers some of the considerations we believe should be given extra attention when implementing a highly available cluster for IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework.

Many IBM Tivoli products shorten the time needed to deliver the additional services that enable you to offer service level guarantees to the users of the cluster. For more information about these products, refer to:

http://www.ibm.com/software/tivoli/

We recommend that you consult your IBM Tivoli service provider for advice on other enterprise systems management issues that should be considered. The issues covered in this section represent only a few of the benefits available for delivery to users of the cluster.

Measuring availability

Availability analysis is a major maintenance tool for clusters. You can use the Application Availability Analysis tool to measure the amount of time that any of your applications is available. The HACMP software collects, time stamps, and logs the following information:

- An application starts, stops, or fails
- A node fails or is shut down, or comes online
- A resource group is taken offline or moved
- Application monitoring is suspended or resumed

Using SMIT, you can select a time period and the tool will display uptime and downtime statistics for a given application during that period.


The tool displays:

- Percentage of uptime
- Amount of uptime
- Longest period of uptime
- Percentage of downtime
- Amount of downtime
- Longest period of downtime
- Percentage of time application monitoring was suspended

The Application Availability Analysis tool reports application availability from the HACMP Cluster infrastructure's point of view. It can analyze only those applications that have been properly configured so that they will be managed by the HACMP software.

When using the Application Availability Analysis tool, keep in mind that the statistics shown in the report reflect the availability of the HACMP application server, resource group, and (if configured) the application monitor that represent your application to HACMP.

The Application Availability Analysis tool cannot detect availability from an end user's point of view. For example, assume that you have configured a client-server application so that the server was managed by HACMP, and after the server was brought online, a network outage severed the connection between the end user clients and the server.

End users would view this as an application outage because their client software could not connect to the server—but HACMP would not detect it, because the server it was managing did not go offline. As a result, the Application Availability Analysis tool would not report a period of downtime in this scenario.

For this reason, best practice is to monitor everything that affects the entire user experience. We recommend using tools like IBM Tivoli Monitoring, IBM Tivoli Service Level Advisor, and IBM Tivoli NetView to perform basic monitoring and reporting of the end-user service experience.

Configuration management

When there are many nodes in a cluster, configuration management often makes a difference of as much as hours or even days between the time a new cluster node is requested by users and when it is available with a fully configured set of highly available applications.

Configuration management tools also enable administrators to enforce the maintenance levels, patches, fix packs and service packs of the operating system and applications on the cluster nodes. They accomplish this by gathering inventory information and comparing against baselines established by the administrators. This eliminates the errors that are caused in a cluster by mismatched versions of operating systems and applications.

We recommend using IBM Tivoli Configuration Manager to implement services that automatically create a new cluster node from scratch, and enforce the software levels loaded on all nodes in the cluster.

Notification

Large, highly available installations are very complex systems, often involving multiple teams of administrators overseeing different subsystems. Proper notification is key to the timely and accurate response to problems identified by a monitoring system. We recommend using IBM Tivoli Enterprise Console and a notification server to implement robust, flexible and scalable notification services.

Provisioning

For large installations of clusters, serving many highly available applications, with many on demand cluster requirements and change requests each week, provisioning software is recommended as a best practice. In these environments, a commercial-grade provisioning system substantially lowers the administrative overhead involved in responding to customer change requests. We recommend using IBM Tivoli ThinkDynamic Orchestrator to implement provisioning for very complex and constantly changing clusters.

Practical lessons learned about high availability

While writing this redbook, a serial disk in the SSA disk tray we use in our environment failed. Our configuration did not use this disk for any of our volume groups, so we continued to use the SSA disk tray. However, the failed drive eventually degraded the performance of the SSA loop to the point that HACMP functionality was adversely affected.

The lesson we learned from this experience was that optimal HACMP performance depends upon a properly maintained system. In other words, using HACMP does not justify delaying normal system preventative and necessary maintenance tasks.

Forced HACMP stops

We observed that forcing HACMP services to stop may leave HACMP in an inconsistent state. If there are problems starting it again, we found that stopping it gracefully before attempting a start clears up the problem.

4.1.13 Just one IBM Tivoli Workload Scheduler instance

The preceding sections show you how to design, plan and implement a two-node HACMP Cluster for an IBM Tivoli Workload Scheduler Master Domain Manager in a mutual takeover configuration. This requires you to design your overall enterprise workload into two independent, or at most loosely coupled, sets of job streams. You can, however, opt to implement only a single instance of IBM Tivoli Workload Scheduler in a two-node cluster in a hot standby configuration.

Best practice is to use a mutual takeover configuration for Master Domain Managers. In this section, we discuss how to implement a single instance of IBM Tivoli Workload Scheduler in a hot standby configuration, which is appropriate for creating highly available Fault Tolerant Agents, for example.

You can create a cluster with just one instance of IBM Tivoli Workload Scheduler by essentially using the same instructions, but eliminating one of the resource groups. You can still use local instances of IBM Tivoli Management Framework. With only one resource group, however, there are some other, minor considerations to address in the resulting HACMP configuration.

Create only one IBM Tivoli Workload Scheduler Connector on each cluster node. If the installation of the single instance of IBM Tivoli Workload Scheduler is in /usr/maestro, the instance normally runs on cluster node tivaix1, and the IBM Tivoli Workload Scheduler Connector is named PROD for “production”, then all instances of IBM Tivoli Management Framework on other cluster nodes also use an IBM Tivoli Workload Scheduler Connector with the same name (“PROD”), configured the same way. When the resource group containing an instance of IBM Tivoli Workload Scheduler falls over to another cluster node, the IP service label associated with the instance falls over with the resource group.

Configure the instances of IBM Tivoli Management Framework on the cluster nodes to support this IP service label as an IP alias for the Managed Node on each cluster node. Job Scheduling Console sessions can then connect to the corresponding IP service address even after a fallover event.
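As an illustration only, and not the exact procedure for every environment, an IP alias is typically added to a Managed Node's dispatcher entry with the odadmin command. In the sketch below, the dispatcher number 1 and the service IP address 9.3.4.3 are assumptions; substitute the values reported by odadmin odlist for your own cluster, and confirm the subcommand syntax in the Tivoli Management Framework reference documentation:

odadmin odlist add_ip_alias 1 9.3.4.3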

Consult your IBM service provider if you need assistance with configuring a hot standby, single instance IBM Tivoli Workload Scheduler installation.

Complex configurations

In this redbook we show how to configure IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework on a cluster with two cluster nodes. More complex configurations include:

- One instance of IBM Tivoli Workload Scheduler across more than two cluster nodes.

Important: Going from a mutual takeover, dual Master Domain Manager configuration to only one instance of IBM Tivoli Workload Scheduler doubles the risk exposure of the scheduling environment.

- More than two instances of IBM Tivoli Workload Scheduler across more than two cluster nodes.

- Multiple instances of IBM Tivoli Workload Scheduler on a single cluster node, in a cluster with multiple nodes.

The number of permutations of fallover scenarios increases with each additional cluster node beyond the two-node environment we show in this redbook. Best practice is to test each permutation.

Consult your IBM service provider if you want assistance with configuring a more complex configuration.

4.2 Implementing IBM Tivoli Workload Scheduler in a Microsoft Cluster

In this section, we describe how to implement a Tivoli Workload Scheduler engine in a Microsoft Cluster using Microsoft Cluster Service. We cover both a single installation of Tivoli Workload Scheduler, and two copies of Tivoli Workload Scheduler in a mutual takeover scenario. We do not cover how to perform patch upgrades.

For more detailed information about installing IBM Tivoli Workload Scheduler on a Windows platform, refer to IBM Tivoli Workload Scheduler Planning and Installation Guide Version 8.2, SC32-1273.

4.2.1 Single instance of IBM Tivoli Workload Scheduler

Figure 4-71 on page 348 shows two Windows 2000 systems in a Microsoft Cluster. In the center of this cluster is a shared disk volume, configured in the cluster as volume X:, where we intend to install the Tivoli Workload Scheduler engine.

Figure 4-71 Network diagram of the Microsoft Cluster

Once the cluster is set up and configured properly, as described in 3.3, “Implementing a Microsoft Cluster” on page 138, you can install the IBM Tivoli Workload Scheduler software in the shared disk volume X:.

The following steps will guide you through a full installation.

1. Ensure you are logged on as the local Administrator.

2. Ensure that the shared disk volume X: is owned by System 1 (tivw2k1) and that it is online. To verify this, open the Cluster Administrator, as shown in Figure 4-72 on page 349.

Figure 4-72 Cluster Administrator

3. Insert the IBM Tivoli Workload Scheduler Installation Disk 1 into the CD-ROM drive.

4. Change directory to the Windows folder and run the setup program, which is the SETUP.exe file.

5. Select the language in which you want the wizard to be displayed, and click OK as seen in Figure 4-73.

Figure 4-73 Installation-Select Language

6. Read the welcome information and click Next, as seen in Figure 4-74.

Figure 4-74 Installation-Welcome Information

7. Read the license agreement, select the acceptance radio button, and click Next, as seen in Figure 4-75.

Figure 4-75 Installation-License agreement

8. The Install a new Tivoli Workload Scheduler Agent option is selected by default. Click Next, as seen in Figure 4-76.

Figure 4-76 Installation-Install new Tivoli Workload Scheduler

9. Specify the IBM Tivoli Workload Scheduler user name. Spaces are not permitted.

On Windows systems, if this user account does not already exist, it is automatically created by the installation program.

Note the following:

The User name must be a domain user (this is mandatory); specify the name as domain_name\user_name.

Also, type and confirm the password.

Click Next, as seen in Figure 4-77.

Figure 4-77 Installation user information

10.If you specified a user name that does not already exist, an information panel is displayed about extra rights that need to be applied. Review the information and click Next.

11.Specify the installation directory under which the product will be installed.

The directory cannot contain spaces. On Windows systems only, the directory must be located on an NTFS file system. If desired, click Browse to select a different destination directory, and click Next as shown in Figure 4-78.

Figure 4-78 Installation install directory

Note: Make sure that the shared disk is attached to the node that you are installing IBM Tivoli Workload Scheduler on.

12.Select the Custom install option and click Next, as shown in Figure 4-79.

This option will allow the custom installation of just the engine and not the Framework or any other features.

Figure 4-79 Type of Installation

13.Select the type of IBM Tivoli Workload Scheduler workstation you would like to install (Master Domain Manager, Backup Master, Fault Tolerant Agent or a Standard Agent), as this installation will only install the parts of the code needed for each configuration.

If needed, you are able to promote the workstation to a different type of IBM Tivoli Workload Scheduler workstation using this installation program.

Select Master Domain Manager and click Next, as shown in Figure 4-80.

Figure 4-80 Type of IBM Tivoli Workload Scheduler workstation

14.Type in the following information and then click Next, as shown in Figure 4-81:

a. Company Name as you would like it to appear in program headers and reports. This name can contain spaces provided that the name is not enclosed in double quotation marks (“).

b. The IBM Tivoli Workload Scheduler 8.2 name for this workstation. This name cannot exceed 16 characters, cannot contain spaces, and it is not case sensitive.

c. The TCP port number used by the instance being installed. It must be a value in the range 1-65535. The default is 31111.

Figure 4-81 Workstation information

15.In this dialog box you are allowed to select the Tivoli Plus Module and/or the Connector. In this case we do not need these options, so leave them blank and click Next, as shown in Figure 4-82.

Figure 4-82 Extra optional features

16.In this dialog box, as shown in Figure 4-83, you have the option of installing additional languages.

We did not select any additional languages to install at this stage, since this requires the Tivoli Management Framework 4.1 Language CD-ROM be available in addition to Tivoli Framework 4.1 Installation CD-ROM during the install phase.

Figure 4-83 Installation of Additional Languages

17.Review the installation settings and then click Next, as shown in Figure 4-84.

Figure 4-84 Review the installation

18.A progress bar indicates that the installation has started, as shown in Figure 4-85.

Figure 4-85 IBM Tivoli Workload Scheduler Installation progress window

19.After the installation is complete a final summary panel will be displayed, as shown in Figure 4-86. Click Finish to exit the setup program.

Figure 4-86 Completion of a successful install

20.Now that the installation is completed on one side of the cluster (tivw2k1), you have to make sure the registry entries are updated on the other side of the cluster pair. The easiest way to do this is to remove the software you just installed on system 1 (tivw2k1), as follows:

a. Make sure that all the services are stopped by opening the Services screen. Go to Start -> Settings -> Control Panel. Then open up Administrative Tools ->Services. Verify that Tivoli Netman, Tivoli Token Service and Tivoli Workload Scheduler services are not running.

b. Using Windows Explorer, go to the IBM Tivoli Workload Scheduler installation directory x:\win32app\TWS\TWS82 and remove all files and directories in this directory.
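If you prefer the command line, the same cleanup can be sketched as follows; the service display names and the installation path are the ones used in this scenario, so adjust them to your environment:

REM Stop the three IBM Tivoli Workload Scheduler services by display name
net stop "Tivoli Workload Scheduler"
net stop "Tivoli Netman"
net stop "Tivoli Token Service"

REM Remove the files installed on the shared disk
rmdir /s /q x:\win32app\TWS\TWS82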

c. Use the Cluster Administrator to verify that the shared disk volume X: is owned by System 2 (tivw2k2), and is online. Open Cluster Administrator, as shown in Figure 4-87.

Figure 4-87 Cluster Administrator

21.Now install IBM Tivoli Workload Scheduler on the second system by repeating steps 3 through 18.

22.To complete the IBM Tivoli Workload Scheduler installation, you will need to add an IBM Tivoli Workload Scheduler user to the database. The install process should have created one for you, but we suggest that you verify that the user exists by running the composer program as shown in Example 4-87.

Example 4-87 Check the user creation

C:\win32app\TWS\maestro82\bin>composer
TWS for WINDOWS NT/COMPOSER 8.2 (1.18.2.1)
Licensed Materials Property of IBM
5698-WKB
(C) Copyright IBM Corp 1998,2001
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Installed for user ''.
Locale LANG set to "en"
-display users tws82#@

CPU id.          User Name
---------------- ---------------------------------------------
TWS82            gb033984
USERNAME TWS82#gb033984
PASSWORD "***************"
END
AWSBIA251I Found 1 users in @.
-

If the user exists in the database, then you will not have to do anything.
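If the user does not exist, you can create it yourself with composer. The following is only a sketch: the definition mirrors the one shown in Example 4-87, and the file name tws_user.txt and the password are placeholders for your own values. Save a definition like this to a text file:

USERNAME TWS82#gb033984
PASSWORD "password_of_this_Windows_account"
END

Then load it into the database with:

Maestrohome\bin\composer add tws_user.txt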

23.Next you need to modify the workstation definition. You can modify this by running the composer modify cpu=TWS82 command. This will display the workstation definition that was created during the IBM Tivoli Workload Scheduler installation in an editor.

The only parameter you will have to change is the argument Node; it will have to be changed to the IP address of the cluster. Table 4-5 lists and describes the arguments.

Table 4-5 IBM Tivoli Workload Scheduler workstation definition

Argument Value Description

cpuname TWS82 Type in a workstation name that is appropriate for this workstation. Workstation names must be unique, and cannot be the same as workstation class and domain names.

Description Master CPU Type in a description that is appropriate for this workstation.

OS WNT Specifies the operating system of the workstation. Valid values include UNIX, WNT, and OTHER.

Node 9.3.4.199 This field is the address of the cluster. This address can be a fully-qualified domain name or an IP address.

Domain Masterdm Specify a domain name for this workstation. The default name is MASTERDM.

TCPaddr 31111 Specifies the TCP port number that is used for communications. The default is 31111. If you have two copies of TWS running on the same system, then the port address number must be different.

For Maestro This field has no value, because it is a key word to start the extra options for the workstation.

Autolink On When set to ON, this specifies whether to open the link between workstations at the beginning of each day during startup.

Resolvedep On With this set to ON, this workstation will track dependencies for all jobs and job streams, including those running on other workstations.

Fullstatus On With this set to ON, this workstation will be updated with the status of jobs and job streams running on all other workstations in its domain and in subordinate domains, but not on peer or parent domains.

End This field has no value, because it is a key word to end the workstation definition.

Figure 4-88 illustrates the workstation definition.

Figure 4-88 IBM Tivoli Workload Scheduler Workstation definition
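For reference, the definition that composer opens should look similar to the following sketch, which is simply the arguments from Table 4-5 written in composer syntax; the workstation name, IP address and port are the values used in our scenario:

cpuname TWS82
 description "Master CPU"
 os WNT
 node 9.3.4.199
 tcpaddr 31111
 domain MASTERDM
 for maestro
  autolink on
  resolvedep on
  fullstatus on
end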

24.After the workstation definition has been modified, you can add the FINAL job stream definition to the database; this job stream runs the script that creates the next day’s production day file. To do this, log in as the IBM Tivoli Workload Scheduler installation user and run this command:

Maestrohome\bin\composer add Sfinal

This will add the job and jobstreams to the database.

25.While still logged in as the IBM Tivoli Workload Scheduler installation user, run the batch file Jnextday:

Maestrohome\Jnextday

Verify that Jnextday has worked correctly by running the conman program:

Maestrohome\bin\conman

In the output, shown in Example 4-88, you should see in the conman header “Batchman Lives”, which indicates that IBM Tivoli Workload Scheduler is installed correctly and is up and running.

Example 4-88 Header output for conman

x:\win32app\TWS\TWS82\bin>conman
TWS for WINDOWS NT/CONMAN 8.2 (1.36.1.7)
Licensed Materials Property of IBM
5698-WKB
(C) Copyright IBM Corp 1998,2001
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Installed for user ''.
Locale LANG set to "en"
Schedule (Exp) 06/11/03 (#1) on TWS82. Batchman LIVES. Limit: 10, Fence: 0, Audit Level: 0
%

26.When a new workstation is created in an IBM Tivoli Workload Scheduler distributed environment, you need to set the workstation limit of concurrent jobs because the default value is set to 0, which means no jobs will run. To change the workstation limit from 0 to 10, enter the following command:

Maestrohome\bin\conman limit cpu=tws82;10

Verify that the command has worked correctly by running the conman show cpus command:

Maestrohome\bin\conman sc=tws82

The conman output, shown in Example 4-89, contains the number 10 in the LIMIT column, indicating that the command has worked correctly.

Example 4-89 conman output

C:\win32app\TWS\maestro82\bin>conman sc=tws82
TWS for WINDOWS NT/CONMAN 8.2 (1.36.1.7)
Licensed Materials Property of IBM
5698-WKB
(C) Copyright IBM Corp 1998,2001
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Installed for user ''.
Locale LANG set to "en"
Schedule (Exp) 06/11/03 (#1) on TWS82. Batchman LIVES. Limit: 10, Fence: 0, Audit Level: 0
sc=tws82
CPUID    RUN NODE         LIMIT FENCE DATE     TIME  STATE METHOD DOMAIN
TWS82      1 *WNT MASTER     10     0 06/11/03 12:08 I J          MASTERDM

27.Before you configure IBM Tivoli Workload Scheduler in the cluster services, you need to set the three IBM Tivoli Workload Scheduler services to manual start up. Do this by opening the Services Screen.

Go to Start -> Settings -> Control Panel and open Administrative Tools -> Services. Change Tivoli Netman, Tivoli Token Service and Tivoli Workload Scheduler to manual startup.
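If you prefer to script this change, and assuming the sc utility is available on your nodes (it is part of the Windows 2000 Resource Kit and is built into later versions of Windows), the same change can be sketched with the service names this installation creates, which are the same names you will look up again in the cluster resource definitions below:

sc config tws_tokensrv_tws8_2 start= demand
sc config tws_netman_tws8_2 start= demand
sc config tws_maestro_tws8_2 start= demand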

28.Now you can configure IBM Tivoli Workload Scheduler in the cluster services by creating a new resource for each of the three IBM Tivoli Workload Scheduler services: Tivoli Netman, Tivoli Token Service, and Tivoli Workload Scheduler. These three new resources have to be created in the same Cluster Services Group as the IBM Tivoli Workload Scheduler installation drive. In this case we used the X: drive, which belongs to cluster group Disk Group1.

29.First create the new resource Tivoli Token Service, as shown in Figure 4-89.

Figure 4-89 New Cluster resource

30.Fill in the first screen (Figure 4-90) as follows, and then click Next:

Name Enter the name you want to use for this resource, such as “Tivoli Token Service”.

Description Enter a description of this resource, such as ”Tivoli Token Service”.

Resource type The resource type of service for “Tivoli Token Service”. Select Generic Service.

Group Select the group in which you want to create this resource. It must be created in the same group as any dependencies (such as the installation disk drive or network).

Figure 4-90 Resource values

31.Now you need to select the possible nodes that this resource can run on. In this case, select both nodes as shown in Figure 4-91. Then click Next.

Figure 4-91 Node selection for resource

32.Select all the dependencies that you would like this resource (Tivoli Token Service) to be dependent on.

In this case, you need the disk, network and IP address to be online before you can start the Tivoli Token Service as shown in Figure 4-92. Then click Next.

Figure 4-92 Dependencies for this resource

33.Add in the service parameters for the service “Tivoli Token Service”, then click Next, as shown in Figure 4-93.

Service name To find the service name, open the Windows services panel; go to Start -> Settings -> Control Panel, then open Administrative Tools -> Services.

Highlight the service, then click Action -> Properties. Under the General tab on the first line you can see the service name, which in this case is tws_tokensrv_tws8_2.

Start parameters Enter any start parameters needed for this service (Tivoli Token Service).

In this case, there are no start parameters, so leave this field blank.

Figure 4-93 Resource parameters

34.This screen (Figure 4-94) allows you to replicate registry data to all nodes in the cluster.

In the case of this service, “Tivoli Token Service”, this is not needed, so leave it blank. Then click Finish.

Figure 4-94 Registry Replication

35.Figure 4-95 should then be displayed, indicating that the resource has been created successfully. Click OK.

Figure 4-95 Cluster resource created successfully
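If you prefer to script the resource creation, the cluster command can create the same Generic Service resource. The following is only a sketch: the group name Disk Group1 and the dependency resource name Disk X: are the names used in our environment and in the wizard screens above, so substitute the resource and group names defined in your own cluster, and add the network name and IP address resources as dependencies in the same way:

cluster res "Tivoli Token Service" /create /group:"Disk Group1" /type:"Generic Service"
cluster res "Tivoli Token Service" /priv ServiceName=tws_tokensrv_tws8_2
cluster res "Tivoli Token Service" /adddep:"Disk X:"
cluster res "Tivoli Token Service" /online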

36.Now create a new resource for the Tivoli Netman service by repeating step 29 (shown in Figure 4-89 on page 369).

37.Fill in the resource values in the following way, then click Next.

Name Enter the name you want to use for this resource, such as “Tivoli Netman Service”.

Description Enter a description of this resource, such as “Tivoli Netman Service”.

Resource type The resource type of service for Tivoli Netman Service. Select Generic Service.

Group Select the group in which you want to create this resource. It must be created in the same group as any dependencies (such as the installation disk drive or network).

38.Select the possible nodes that this resource can run on.

In this case select both nodes, then click Next.

39.Select all the dependencies that you would like this resource (Tivoli Netman Service) to be dependent on.

In this case we only need the Tivoli Token Service to be online before we can start the Tivoli Netman Service, because Tivoli Token Service will not start until the disk, network and IP address are available, as shown in Figure 4-96.

Then click Next.

Figure 4-96 Dependencies for IBM Tivoli Workload Scheduler Netman service

40.Add in the service parameters for the service “Tivoli Netman Service” with the following parameters, then click Next.

Service name To find the service name, open the Windows services panel. Go to Start -> Settings -> Control Panel, then open Administrative Tools -> Services. Highlight the service, then click Action -> Properties. Under the General tab on the first line you can see the service name, which in this case is tws_netman_tws8_2.

Start parameters Enter start parameters needed for the service “Tivoli Netman Service”. In this case, there are no start parameters so leave this field blank.

41.Repeat steps 34 and 35 by clicking Finish, which should then bring you to a window indicating that the resource was created successfully. Then click OK.

42.Now create a new resource for the IBM Tivoli Workload Scheduler by repeating step 29, as shown in Figure 4-89 on page 369.

43.Fill out the resource values in the following way; when you finish, click Next:

Name Enter the name you want to use for this resource, such as “TWS Workload Scheduler”.

Description Enter a description of this resource, such as “TWS Workload Scheduler”.

Resource type Select the resource type of service for “TWS Workload Scheduler”. Select Generic Service.

Group Select the group in which you want to create this resource. It must be created in the same group as any dependencies, such as the installation disk drive or network.

44.Select the possible nodes that this resource can run on.

In this case, select both nodes. Then click Next.

45.Select all dependencies that you would like this resource, “TWS Workload Scheduler”, to be dependent on.

In this case we only need the Tivoli Netman Service to be online before we can start the TWS Workload Scheduler, because Tivoli Netman Service will not start until the Tivoli Token Service is started, and Tivoli Token Service will not start until the disk, network and IP address are available.

When you finish, click Next.

46.Add in the service parameters for this service, “TWS Workload Scheduler”, with the following parameters, then click Next.

Service name To find the service name, open the Windows services panel. Go to Start -> Settings -> Control Panel, then open Administrative Tools -> Services. Highlight the service, then click Action -> Properties. Under the General tab on the first line you can see the service name, which in this case is tws_maestro_tws8_2.

Start parameters Enter any start parameters needed for this service, “TWS Workload Scheduler”. In this case there are no start parameters, so leave this field blank.

47.Repeat steps 34 and 35 by clicking Finish, which should then display a screen indicating that the resource was created successfully. Then click OK.

48.At this point all three resources have been created in the cluster. Now you need to change some of the advanced parameters—but only in the TWS Workload Scheduler resource.

To do this, open the Cluster Administrator tool. Click the Group that you have defined the TWS Workload Scheduler resource in. Highlight the resource and click Action -> Properties, as shown in Figure 4-97.

Figure 4-97 Cluster Administrator

49.Now click the Advanced tab, as shown in Figure 4-98, and change the Restart to Do not Restart. Then click OK.

Figure 4-98 The Advanced tab
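If you maintain the cluster configuration from the command line, the same setting can be sketched with the cluster utility; treat the RestartAction property and its value 0 (do not restart) as an assumption to verify against the Microsoft Cluster Service documentation for your Windows version:

cluster res "TWS Workload Scheduler" /prop RestartAction=0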

4.2.2 Configuring the cluster group

Each cluster group has a set of settings that affect the way the cluster fails over and back again. In this section we cover the different options and how they affect IBM Tivoli Workload Scheduler. We describe the three main tabs used when dealing with the properties of the cluster group.

To modify any of these options:

1. Open Cluster Administrator.

2. In the console tree (usually the left pane), click the Groups folder.

3. In the details pane (usually the right pane), click the appropriate group.

4. On the File menu, click Properties.

5. On the General tab, next to Preferred owners, click Modify.

The General tab is shown in Figure 4-99. Using this tab, you can define the following:

Name Enter the name of the group.

Description Enter a description of this group.

Preferred owner Select the preferred owner of this group. If no preferred owners are specified, failback does not occur; if more than one node is listed under preferred owners, priority is determined by the order of the list. The group always tries to fail back to the highest priority node that is available.

Figure 4-99 General tab for Group Properties

The Failover tab is shown in Figure 4-100. Using this tab, you can define the following:

Threshold Enter a number. This is the number of times the group is allowed to fail over within the set time period. To set an accurate number, consider how long it takes for all products in this group to come back online. Also consider that if a service is not available on both sides of the cluster, the cluster software will continue to move the group from side to side until the service becomes available or the threshold is reached.

Period Enter the period of time over which failovers are counted. If the group fails over more than the threshold number of times within this period, it is not moved again.

Figure 4-100 Failover tab for Group Properties

The Failback tab is shown in Figure 4-101 on page 383. This tab gives you two options that control whether the group fails back:

Prevent failback If Prevent failback is set, and provided that all dependencies of the group are met, the group runs on this side of the cluster until there is a problem, at which point the group moves again. The other way the group can move is through the Cluster Administrator.

Allow failback If Allow failback is set, then you have two further options: Immediately, and Failback between.

If Immediately is set, the group tries to fail back immediately.

If Failback between is set, which is the preferred option, you can define the time window during which you would like the cluster group to move back. We recommend a window before Jnextday that still allows enough time for the group to come back online before Jnextday has to start. Note that if no preferred owners are specified for the group, failback does not occur.
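If you script your group settings, the failback policy can be sketched with the cluster utility. The property names (AutoFailbackType, FailbackWindowStart, FailbackWindowEnd) and the example window of 04:00 to 05:00 are assumptions to validate against the Microsoft Cluster Service documentation and against your own Jnextday time; the group name is the one used in this scenario:

cluster group "Disk Group1" /prop AutoFailbackType=1
cluster group "Disk Group1" /prop FailbackWindowStart=4
cluster group "Disk Group1" /prop FailbackWindowEnd=5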

Figure 4-101 Failback tab for Group Properties

4.2.3 Two instances of IBM Tivoli Workload Scheduler in a cluster

In this section, we describe how to install two instances of the IBM Tivoli Workload Scheduler 8.2 engine (Master Domain Manager) in a Microsoft Cluster. The configuration will be in mutual takeover mode, which means that when one side of the cluster is down, you will have two copies of IBM Tivoli Workload Scheduler running on the same node. This configuration is shown in Figure 4-102 on page 384.

Figure 4-102 Network diagram of the Microsoft Cluster

1. Before starting the installation, some careful planning must take place. To plan most efficiently, you need the following information.

Workstation type You need to understand both types of workstations to be installed in the cluster, as this may have other dependencies (such as JSC and Framework connectivity) as well as installation requirements.

In this configuration we are installing two Master Domain Managers (MDMs).

Location of the code This code should be installed on a file system that is external to both nodes in the cluster, but also accessible by both nodes. The location should also be in the same part of the file system (or at least the same drive) as the application that the IBM Tivoli Workload Scheduler engine is going to manage.

You also need to look at the way the two instances of IBM Tivoli Workload Scheduler will work together, so you need to make sure that the directory structure does not overlap.

Finally, you need sufficient disk space to install IBM Tivoli Workload Scheduler into. Refer to IBM Tivoli Workload Scheduler Release Notes Version 8.2, SC32-1277, for information about these requirements.

In this configuration, we will install one copy of IBM Tivoli Workload Scheduler 8.2 in the X drive and the other in the Y drive.

Installation user Each instance of IBM Tivoli Workload Scheduler needs an individual installation user name, because this user is used to start the services for this instance of IBM Tivoli Workload Scheduler. This installation user must exist on both sides of the cluster, because the IBM Tivoli Workload Scheduler instance can run on both sides of the cluster.

It also needs its own home directory to run in, and this home directory must be in the same location (on the shared disk for that instance), for the same reasons described in the Location of the code section.

In our case, we will use the same names as the Cluster group names. For the first installation, we will use TIVW2KV1; for the second installation, we will use TIVW2KV2.

Naming convention Plan your naming convention carefully, because it is difficult to change some of these objects after installing IBM Tivoli Workload Scheduler (in fact, it is easier to reinstall rather than change some objects).

The naming convention that you need to consider will be used for installation user names, workstation names, cluster group names, and the different resource names in each of the cluster groups. Use a naming convention that makes it easy to understand and identify what is running where, and that also conforms to the allowed maximum characters for that object.

Netman port This port is used for listening for incoming requests. Because we have a configuration where two instances of IBM Tivoli Workload Scheduler can be running on the same node (mutual takeover scenario), we need to set a different port number for each listening instance of IBM Tivoli Workload Scheduler.

The two port numbers that are chosen must not conflict with any other network products installed on these two nodes.

In this installation we use port number 31111 for the first installation, TIVW2KV1, and port 31112 for the second installation, TIVW2KV2.

IP address The IP address that you define in the workstation definition for each IBM Tivoli Workload Scheduler instance should not be an address that is bound to a particular node, but the one that is bound to the cluster group. This IP address should be addressable from the network. If the two IBM Tivoli Workload Scheduler instances are to move separately, then you will need two IP addresses, one for each cluster group.

In this installation, we use 9.3.4.199 for cluster group TIVW2KV1, and 9.3.4.175 for cluster group TIVW2KV2.
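Because netman reads its listening port from the localopts file of each instance, it is worth confirming the two values after installation. As a sketch, assuming the destination directories used in this scenario and the usual localopts option name (verify the exact entry in your own localopts files):

In X:\win32app\tws\tivw2kv1\localopts:
nm port =31111

In Y:\win32app\tws\tivw2kv2\localopts:
nm port =31112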

2. After gathering all the information in step 1 and deciding on a naming convention, you can install the first IBM Tivoli Workload Scheduler engine in the cluster. To do this, repeat steps 1 through to 20 in 4.2.1, “Single instance of IBM Tivoli Workload Scheduler” on page 347, but use the parameters listed in Table 4-6.

Table 4-6 IBM Tivoli Workload Scheduler workstation definition

Argument Value Description

Installation User Name TIVW2KV1 In our case, we used the name of the cluster group as the installation user name.

Password TIVW2KV1 To keep the installation simple, we used the same password as the installation user name. However, in a real customer installation, you would use the password provided by the customer.

Destination Directory X:\win32app\tws\tivw2kv1 This has to be installed on the disk that is associated with cluster group TIVW2KV1. In our case, that is the X drive.

Company Name IBM ITSO This is used for the heading of reports, so enter the name of the company that this installation is for. In our case, we used IBM ITSO.

Master CPU name TIVW2KV1 Because we are installing a Master Domain Manager, the Master CPU name is the same as This CPU name.

TCP port Number 31111 This specifies the TCP port number that is used for communications. The default is 31111. If you have two copies of IBM Tivoli Workload Scheduler running on the same system, then the port number must be different.

3. When you get to step 20, replace the Installation Arguments with the values listed in Table 4-6 on page 386.

4. When you get to step 22, replace the workstation definition with the arguments listed in Table 4-7.

Table 4-7 IBM Tivoli Workload Scheduler workstation definition

Argument Value Description

cpuname TIVW2KV1 Verify that the workstation name is TIVW2KV1, as this should be filled in during the installation.

Description Master CPU for the first cluster group Enter a description that is appropriate for this workstation.

OS WNT Specifies the operating system of the workstation. Valid values include UNIX, WNT, and OTHER.

Node 9.3.4.199 This field is the address that is associated with the first cluster group. This address can be a fully-qualified domain name or an IP address.

Domain Masterdm Specify a domain name for this workstation. The default name is MASTERDM.

TCPaddr 31111 Specifies the TCP port number that is used for communication. The default is 31111. If you have two copies of IBM Tivoli Workload Scheduler running on the same system, then the port address number must be different.

For Maestro This field has no value, because it is a key word to start the extra options for the workstation.

Autolink On When set to ON, this specifies whether to open the link between workstations at the beginning of each day during startup.

Resolvedep On When set to ON, this workstation will track dependencies for all jobs and job streams, including those running on other workstations.

Fullstatus On With this set to ON, this workstation will be updated with the status of jobs and job streams running on all other workstations in its domain and in subordinate domains, but not on peer or parent domains.

End This field has no value, because it is a key word to end the workstation definition.

5. Now finish off the first installation by repeating steps 23 through to 27.

However, at step 26, use the following command:

Maestrohome\bin\conman limit cpu=tivw2kv1;10

To verify that this command has worked correctly, run the conman show cpus command:

Maestrohome\bin\conman sc=tivw2kv1

The conman output, shown in Example 4-90, contains the number 10 in the LIMIT column, illustrating that the command has worked correctly.

Example 4-90 conman output

X:\win32app\TWS\tivw2kv1\bin>conman sc=tivw2kv1
TWS for WINDOWS NT/CONMAN 8.2 (1.36.1.7)
Licensed Materials Property of IBM
5698-WKB
(C) Copyright IBM Corp 1998,2001
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Installed for user ''.
Locale LANG set to "en"
Schedule (Exp) 06/11/03 (#1) on TIVW2KV1. Batchman LIVES. Limit: 10, Fence: 0, Audit Level: 0
sc=tivw2kv1
CPUID    RUN NODE         LIMIT FENCE DATE     TIME  STATE METHOD DOMAIN
TIVW2KV1   1 *WNT MASTER     10     0 06/11/03 12:08 I J          MASTERDM

6. After installing the first IBM Tivoli Workload Scheduler instance in the cluster, you can now install the second IBM Tivoli Workload Scheduler engine in the cluster by repeating steps 1 through to 20 in 4.2.1, “Single instance of IBM Tivoli Workload Scheduler” on page 347, using the parameters listed in Table 4-8.

Table 4-8 IBM Tivoli Workload Scheduler workstation definition

Argument Value Description

Installation User Name TIVW2KV2 In this case, we used the name of the cluster group as the installation user name.

Password TIVW2KV2 To keep this installation simple, we used the same password as the installation user name, but in a real customer installation you would use the password provided by the customer.

Destination Directory Y:\win32app\tws\tivw2kv2 This has to be installed on the disk that is associated with cluster group TIVW2KV2; in this case, that is the Y drive.

Company Name IBM ITSO This is used for the heading of reports, so enter the name of the company that this installation is for. In our case, we used “IBM ITSO”.

Master CPU name TIVW2KV2 Because we are installing a Master Domain Manager, the Master CPU name is the same as This CPU name.

TCP Port Number 31112 Specifies the TCP port number that is used for communication. The default is 31111. If you have two copies of IBM Tivoli Workload Scheduler running on the same system, then the port number must be different.

7. When you get to step 20, replace the Installation Arguments with the values in Table 4-8.

8. When you get to step 22, replace the workstation definition with the arguments listed in Table 4-9.

Table 4-9 IBM Tivoli Workload Scheduler workstation definition

9. Now finish the first installation by repeating steps 23 through 27.

Argument Value Description

cpuname TIVW2KV2 Check that the workstation name is TIVW2KV2, as this should be filled in during the installation.

Description Master CPU for the second cluster group Type in a description that is appropriate for this workstation.

OS WNT Specifies the operating system of the workstation. Valid values include UNIX, WNT, and OTHER.

Node 9.3.4.175 This field is the address that is associated with the second cluster group. This address can be a fully-qualified domain name or an IP address.

Domain Masterdm Specify a domain name for this workstation. The default name is MASTERDM.

TCPaddr 31112 Specifies the TCP port number that is used for communication. The default is 31111. If you have two copies of IBM Tivoli Workload Scheduler running on the same system, then the port address number must be different.

For Maestro This field has no value, because it is a key word to start the extra options for the workstation.

Autolink On When set to ON, it specifies whether to open the link between workstations at the beginning of each day during startup.

Resolvedep On With this set to ON, this workstation will track dependencies for all jobs and job streams, including those running on other workstations.

Fullstatus On With this set to ON, this workstation will be updated with the status of jobs and job streams running on all other workstations in its domain and in subordinate domains, but not on peer or parent domains.

End This field has no value, because it is a key word to end the workstation definition.

However, when you reach step 26, use the following command:

Maestrohome\bin\conman limit cpu=tivw2kv2;10

Run the conman show cpus command to verify that the command has worked correctly:

Maestrohome\bin\conman sc=tivw2kv2

The conman output, shown in Example 4-91, contains the number 10 in the LIMIT column, indicating that the command has worked correctly.

Example 4-91 conman output

Y:\win32app\TWS\tivw2kv2\bin>conman sc=tivw2kv2
TWS for WINDOWS NT/CONMAN 8.2 (1.36.1.7)
Licensed Materials Property of IBM
5698-WKB
(C) Copyright IBM Corp 1998,2001
US Government User Restricted Rights
Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Installed for user ''.
Locale LANG set to "en"
Schedule (Exp) 06/11/03 (#1) on TIVW2KV2. Batchman LIVES. Limit: 10, Fence: 0, Audit Level: 0
sc=tivw2kv2
CPUID    RUN NODE         LIMIT FENCE DATE     TIME  STATE METHOD DOMAIN
TIVW2KV2   1 *WNT MASTER     10     0 06/11/03 12:08 I J          MASTERDM

10.The two instances of IBM Tivoli Workload Scheduler are installed in the cluster. Now you need to configure the cluster software so that the two copies of IBM Tivoli Workload Scheduler will work in a mutual takeover.

11.You can configure the two instances of IBM Tivoli Workload Scheduler in the cluster services by creating two sets of new resources for each of the three IBM Tivoli Workload Scheduler services: Tivoli Netman, Tivoli Token Service and Tivoli Workload Scheduler.

These two sets of three new resources have to be created in the same cluster group as the IBM Tivoli Workload Scheduler installation drive. The first set (TIVW2KV1) was installed in the X drive, so this drive is associated with cluster group “TIVW2KV1” . The second set (TIVW2KV2) was installed in the Y drive, so this drive is associated with cluster group “TIVW2KV2”.

12.Create the new resource “Tivoli Token Service” for the two IBM Tivoli Workload Scheduler engines by repeating steps 28 through to 34 in 4.2.1, “Single instance of IBM Tivoli Workload Scheduler” on page 347. Use the parameters in Table 4-10 on page 392 for the first set (TIVW2KV1), and use the parameters in Table 4-11 on page 392 for the second set (TIVW2KV2).

Table 4-10 Tivoli Token Service definition for first instance

REF figure Argument Value Description

4-90 Name ITIVW2KV1 - Token Service Enter the name of the new resource. In our case, we used the cluster group name followed by the service.

4-90 Description Tivoli Token Service for the first instance Enter a description of this resource.

4-90 Resource type Generic Service Select the resource type of service for “ITIVW2KV1 - Token Service”. Select Generic Service.

4-90 Group ITIVW2KV1 Select the group in which you want to create this resource.

4-93 Service name tws_tokensrv_TIVW2KV1 Enter the service name; this can be found in the services panel.

4-93 Start parameters This service does not need any start parameters, so leave this blank.

Table 4-11 Tivoli Token Service definition for second instance

REF figure Argument Value Description

4-90 Name ITIVW2KV2 - Token Service Enter the name of the new resource. In our case, we used the cluster group name followed by the service.

4-90 Description Tivoli Token Service for the second instance Enter a description of this resource.

4-90 Resource type Generic Service Select the resource type of service for “ITIVW2KV2 - Token Service”. Select Generic Service.

4-90 Group ITIVW2KV2 Select the group in which you want to create this resource.

4-93 Service name tws_tokensrv_TIVW2KV2 Enter the service name; this can be found in the services panel.

4-93 Start parameters This service does not need any start parameters, so leave this blank.

13.Create the new resource “Tivoli Netman Service” for the two IBM Tivoli Workload Scheduler engines by repeating steps 35 through to 40 in 4.2.1, “Single instance of IBM Tivoli Workload Scheduler” on page 347.

Use the parameters in Table 4-12 for the first set (TIVW2KV1) and use the parameters in Table 4-13 for the second set (TIVW2KV2) below.

Table 4-12 Tivoli Netman Service definition for first instance

Table 4-13 Tivoli Netman Service definition for second instance

REF figure Argument Value Description

4-90 Name ITIVW2KV1 - Netman Service

Enter the name of the new resource. In this case, we used the cluster group name followed by the service.

4-90 Description Tivoli Netman Service for the first instance

Enter a description of this resource “Tivoli Netman Service for the first instance”.

4-90 Resource type

Generic Service

Select the resource type of service for “ITIVW2KV1 - Netman Service”. Select Generic Service.

4-90 Group ITIVW2KV1 Select the group where you want to create this resource in.

4-93 Service name

tws_netman_TIVW2KV1

Type in the service name; this can be found in the services panel.

4-93 Start parameters

This service does not need any start parameters, so leave this blank.

4-96 Resource Dependencies

ITIVW2KV1 - Token Service

The only resource dependency is the ITIVW2KV1 - Token Service.

REF Figure Argument Value Description

4-90 Name ITIVW2KV2 - Netman Service

Enter the name of the new resource. In our case, we used the cluster group name followed by the service.

4-90 Description Tivoli Netman Service for the second instance

Enter a description of this resource “Tivoli Netman Service for the second instance”.

Chapter 4. IBM Tivoli Workload Scheduler implementation in a cluster 393

Page 408: High availability scenarios with ibm tivoli workload scheduler and ibm tivoli framework sg246632

14.Create the new resource “Tivoli Workload Scheduler” for the two IBM Tivoli Workload Scheduler engines by repeating steps 41 through to 48 in 4.2.1, “Single instance of IBM Tivoli Workload Scheduler” on page 347.

Use the parameters inTable 4-14 for the first set (TIVW2KV1) and use the parameters in Table 4-15 on page 395 for the second set (TIVW2KV2).

Table 4-14 Tivoli Workload Scheduler definition for first instance

REF figure | Argument | Value | Description
4-90 | Name | ITIVW2KV1 - Tivoli Workload Scheduler | Enter the name of the new resource. In our case, we used the cluster group name followed by the service.
4-90 | Description | Tivoli Workload Scheduler for the first instance | Enter a description of this resource.
4-90 | Resource type | Generic Service | Select the resource type for “ITIVW2KV1 - Tivoli Workload Scheduler”. Select Generic Service.
4-90 | Group | ITIVW2KV1 | Select the group in which you want to create this resource.
4-93 | Service name | tws_maestro_TIVW2KV1 | Enter the service name; this can be found in the Services panel.
4-93 | Start parameters | (leave blank) | This service does not need any start parameters, so leave this blank.
4-96 | Resource Dependencies | ITIVW2KV1 - Netman Service | The only resource dependency is the ITIVW2KV1 - Netman Service.


Table 4-15 Tivoli Workload Scheduler definition for second instance

REF figure | Argument | Value | Description
4-90 | Name | ITIVW2KV2 - Tivoli Workload Scheduler | Enter the name of the new resource. In our case, we used the cluster group name followed by the service.
4-90 | Description | Tivoli Workload Scheduler for the second instance | Enter a description of this resource.
4-90 | Resource type | Generic Service | Select the resource type for “ITIVW2KV2 - Tivoli Workload Scheduler”. Select Generic Service.
4-90 | Group | ITIVW2KV2 | Select the group in which you want to create this resource.
4-93 | Service name | tws_maestro_TIVW2KV2 | Enter the service name; this can be found in the Services panel.
4-93 | Start parameters | (leave blank) | This service does not need any start parameters, so leave this blank.
4-96 | Resource Dependencies | ITIVW2KV2 - Netman Service | The only resource dependency is the ITIVW2KV2 - Netman Service.

15.All resources are set up and configured correctly. Now configure the cluster groups by going through the steps in 4.2.2, “Configuring the cluster group” on page 379.

Use the parameters in Table 4-16 on page 396 for the first set (TIVW2KV1), and use the parameters in Table 4-13 on page 393 for the second set (TIVW2KV2).



Table 4-16 Cluster group settings for first instance

REF figure | Argument | Value | Description
4-99 General tab | Name | ITIVW2KV1 Group | This name should be there by default. If it is not, verify that the correct group is selected.
4-99 General tab | Description | This group is for the first instance of IBM Tivoli Workload Scheduler | Enter a description of this group.
4-99 General tab | Preferred owner | TIVW2KV1 | Select the preferred owner for this group. We selected TIVW2KV1.
4-100 Failover tab | Threshold | 10 | Enter the maximum number of times the group is allowed to fail over within the failover period.
4-100 Failover tab | Period | 6 | Enter the length, in hours, of the failover period. We selected 6 hours.
4-101 Failback tab | Allow failback | Check Allow Failback | This enables the facility to fail back to the preferred owner.
4-101 Failback tab | Failback between | 4 and 6 | Enter the time range in which you would like the group to fail back.

16. You now have the two instances of the IBM Tivoli Workload Scheduler engine installed and configured within the cluster, and the cluster groups configured in the way that best suits IBM Tivoli Workload Scheduler.

17. To test this installation, open the Cluster Administrator and expand Groups to show the two groups. Highlight one of them, for example TIVW2KV1, and select File -> Move Group. All resources should go offline, the owner should change from TIVW2K1 to TIVW2K2, and then all resources should come back online under the new owner.
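You can also drive the same test from the command line with Cluster.exe instead of the Cluster Administrator GUI. The following is only a sketch based on the standard Windows 2000 Cluster.exe syntax; the /moveto option spelling and the group and node names shown here are assumptions for our environment, so verify them with cluster /? before relying on them:

cluster group "TIVW2KV1" /moveto:TIVW2K2
cluster group "TIVW2KV1" /status

The first command should take the group offline, move it to the other node, and bring it back online; the second shows the new owner.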

4.2.4 Installation of the IBM Tivoli Management Framework

The IBM Tivoli Management Framework (Tivoli Framework) is used as an authenticating layer for any user who uses the Job Scheduling Console to connect to the IBM Tivoli Workload Scheduler engine. Two products are installed in the Framework: Job Scheduling Services (JSS) and the Job Scheduling Connector. Together they provide the connection between the Job Scheduling Console and the IBM Tivoli Workload Scheduler engine, as shown in Figure 4-103.


Figure 4-103 IBM Tivoli Workload Scheduler user authentication flow

There are a number of ways to install the Tivoli Framework. You can install the Tivoli Framework separately from the IBM Tivoli Workload Scheduler engine. In this case, install the Tivoli Framework before installing IBM Tivoli Workload Scheduler.

If there is no Tivoli Framework installed on the system, you can use the Full install option when installing IBM Tivoli Workload Scheduler. This will install the Tivoli Management Framework 4.1, Job Scheduling Services (JSS), Job Scheduling Connector (JSC), and add the Tivoli Job Scheduling administration user.

In this section, we describe how to install the IBM Tivoli Management Framework separately. Either before or after IBM Tivoli Workload Scheduler is configured for Microsoft Cluster and made highly available, you can add IBM Tivoli Management Framework so that the Job Scheduling Console component of IBM Tivoli Workload Scheduler can be used.


Because the IBM Tivoli Management Framework is not officially supported in a mutual takeover mode, we will install on the local disk on each side of the cluster, as shown in Figure 4-104.

Figure 4-104 Installation location for TMRs

The following instructions are only a guide to installing the Tivoli Framework. For more detailed information, refer to Tivoli Enterprise Installation Guide Version 4.1, GC32-0804.

To install Tivoli Framework, follow these steps:

1. Select node1 to install the Tivoli Framework on. In our configuration, node 1 is called TIVW2K1.

2. Insert the Tivoli Management Framework (1 of 2) CD into the CD-ROM drive, or map the CD from a drive on a remote system.

3. From the taskbar, click Start, and then select Run to display the Run window.

4. In the Open field, type x:\setup, where x is the CD-ROM drive or the mapped drive. The Welcome window is displayed.

Note: IBM Tivoli Management Framework should be installed prior to IBM Tivoli Workload Scheduler Connector installation. For instructions on installing a TMR server, refer to Chapter 5 of Tivoli Enterprise Installation Guide Version 4.1, GC32-0804.

Here, we assume that you have already installed Tivoli Management Framework, and have applied the latest set of fix packs.


5. Click Next. The License Agreement window is displayed.

6. Read the license agreement and click Yes to accept the agreement. The Accounts and File Permissions window is displayed.

7. Click Next. The Installation Password window is displayed.

8. In the Installation Password window, perform the following steps:

a. In the Password field, type an installation password, if desired. If you specify a password, this password must be used to install Managed Nodes, to create interregional connections, and to perform any installation using Tivoli Software Installation Service.

b. Click Next. The Remote Access Account window is displayed.

9. In the Remote Access Account window, perform the following steps:

a. Type the Tivoli remote access account name and password through which Tivoli programs will access remote file systems. If you do not specify an account name and password and you use remote file systems, Tivoli programs will not be able to access these remote file systems.

b. Click Next. The Setup Type window is displayed.

10.In the Setup Type window, do the following:

a. Select one of the following setup types:

• Typical - Installs the IBM Tivoli Management Framework product and its documentation library.

• Compact - Installs only the IBM Tivoli Management Framework product.

• Custom - Installs the IBM Tivoli Management Framework components that you select.

Note: During installation the specified password becomes the installation and the region password. To change the installation password, use the odadmin region set_install_pw command. To change the region password, use the odadmin region set_region_pw command.

Note that if you change one of these passwords, the other password is not automatically changed.

Note: If you are using remote file systems, the password must be at least one character. If the password is null, the object database is created, but you cannot start the object dispatcher (the oserv service).


b. Accept the default destination directory or click Browse to select a path to another directory on the local system.

c. Click Next. If you selected the Custom option, the Select Components window is displayed. If you selected Compact or Typical, go to step 12.

11.(Custom setup only) In the Select Components window, do the following:

a. Select the components to install. From this window you can preview the disk space required by each component, as well as change the destination directory.

b. If desired, click Browse to change the destination directory.

c. Click Next. The Choose Database Directory window is displayed.

12.In the Choose Database Directory window, do the following:

a. Accept the default destination directory, or click Browse to select a path to another directory on the local system.

b. Click Next. The Enter License Key window is displayed.

13.In the Enter License Key window, do the following:

a. In the Key field, type: “IBMTIVOLIMANAGEMENTREGIONLICENSEKEY41”.

b. Click Next. The Start Copying Files window is displayed.

14.Click Next. The Setup Status window is displayed.

15.After installing the IBM Tivoli Management Framework files, the setup program initializes the Tivoli object dispatcher server database. When the initialization is complete, you are prompted to press any key to continue.

16.If this is the first time you installed IBM Tivoli Management Framework on this system, you are prompted to restart the machine.

17.After the installation completes, configure the Windows operating system for SMTP e-mail. From a command line prompt, enter the following commands:

%SystemRoot%\system32\drivers\etc\Tivoli\setup_env.cmd
bash
wmailhost hostname

18.Tivoli Management Framework is installed on node 1, so now install it on node 2. In our configuration, node 2 is called TIVW2K2.

Note: Do not install on remote file systems or share Tivoli Framework files among systems in a Tivoli environment.

Tip: Rebooting the system loads the TivoliAP.dll file.


19. Log into node 2 (TIVW2K2) and repeat steps 2 through 17.

4.2.5 Installation of Job Scheduling Services

To install IBM Workload Scheduler Job Scheduling Services 8.2, you must have the following component installed within your IBM Tivoli Workload Scheduler 8.2 network:

- Tivoli Framework 3.7.1 or 4.1

You must install the Job Scheduling Services on the Tivoli Management Region server or on a Managed Node on the same workstation where the Tivoli Workload Scheduler engine code is installed.

You can install and upgrade the components of the Job Scheduling Services using any of the following installation mechanisms:

- By using an installation program, which creates a new Tivoli Management Region server and automatically installs or upgrades the IBM Workload Scheduler Connector and Job Scheduling Services

- By using the Tivoli desktop, where you select which product and patches to install on which machine

- By using the winstall command provided by Tivoli Management Framework, where you specify which products and patches to install on which machine

Here we provide an example of installing the Job Scheduling Services using the Tivoli Desktop. Ensure you have set the Tivoli environment by issuing the command %SystemRoot%\system32\drivers\etc\Tivoli\setup_env.cmd, then follow these steps:

1. First select node1 to install the Tivoli Job Scheduling Services on. In our configuration, node 1 is called TIVW2K1.

2. Open the Tivoli Desktop on TIVW2K1.

3. From the Desktop menu choose Install, then Install Product. The Install Product window is displayed.

Note: You only have to install this component if you wish to monitor or access the local data on the Tivoli Workload Scheduler engine by the Job Scheduling Console.

Note: Before installing any new product into the Tivoli Management Region server, make a backup of the Tivoli database.


4. Click Select Media to select the installation directory. The File Browser window is displayed.

5. Type or select the installation path. This path includes the directory containing the CONTENTS.LST file.

6. Click Set Media & Close. You return to the Install Product window.

7. In the Select Product to Install list, select Tivoli Job Scheduling Services v. 1.2.

8. In the Available Clients list, select the nodes to install on and move them to the Clients to Install On list.

9. In the Install Product window, click Install. The Product Install window is displayed, which shows the operations to be performed by the installation program.

10.Click Continue Install to continue the installation, or click Cancel to cancel the installation.

11.The installation program copies the files and configures the Tivoli database with the new classes. When the installation is complete, the message Finished product installation appears. Click Close.

12.Now select node 2 to install the Tivoli Job Scheduling Services on. In our configuration, node 2 is called TIVW2K2.

13.Repeat steps 2 through to 11.

4.2.6 Installation of Job Scheduling Connector

To install IBM Workload Scheduler Connector 8.2, you must have the following components installed within your Tivoli Workload Scheduler 8.2 network:

- Tivoli Framework 3.7.1 or 4.1

- Tivoli Job Scheduling Services 1.3

You must install IBM Tivoli Workload Scheduler Connector on the Tivoli Management Region server or on a Managed Node on the same workstation where the Tivoli Workload Scheduler engine code is installed.

You can install and upgrade the components of IBM Tivoli Workload Scheduler Connector using any of the following installation mechanisms:

Note: You only have to install this component if you wish to monitor or access the local data on the Tivoli Workload Scheduler engine by the Job Scheduling Console.


- By using an installation program, which creates a new Tivoli Management Region server and automatically installs or upgrades IBM Workload Scheduler Connector and Job Scheduling Services

- By using the Tivoli Desktop, where you select which product and patches to install on which machine

- By using the winstall command provided by Tivoli Management Framework, where you specify which products and patches to install on which machine

Connector installation and customization vary, depending on whether your Tivoli Workload Scheduler master is on a Tivoli server or a Managed Node.

- When the Workload Scheduler master is on a Tivoli server, you must install both Job Scheduling Services and the Connector on the Tivoli server of your environment. You must also create a Connector instance for the Tivoli server. You can do this during installation by using the Create Instance check box and completing the required fields. In this example, we are installing the connector in this type of configuration.

- When the Workload Scheduler master is on a Managed Node, you must install Job Scheduling Services on the Tivoli Server and on the Managed Node where the master is located. You must then install the Connector on the Tivoli server and on the same nodes where you installed Job Scheduling Services. Ensure that you do not select the Create Instance check box.

- If you have more than one node where you want to install the Connector (for example, if you want to access the local data of a fault-tolerant agent through the Job Scheduling Console), you can install Job Scheduling Services and the connector on multiple machines. However, in this case you should deselect the Create Instance check box.

Following is an example of how to install the Connector using the Tivoli Desktop. Ensure you have installed Job Scheduling Services and have set the Tivoli environment. Then follow these steps:

1. Select node 1 to install Tivoli Job Scheduling Connector on. In our configuration, node 1 is called TIVW2K1.

2. Open the Tivoli Desktop on TIVW2K1.

3. From the Desktop menu choose Install, then Install Product. The Install Product window is displayed.

4. Click Select Media to select the installation directory. The File Browser window is displayed.

Note: Before installing any new product into the Tivoli Management Region server, make a backup of the Tivoli database.


5. Type or select the installation path. This path includes the directory containing the CONTENTS.LST file.

6. Click Set Media & Close. You will return to the Install Product window.

7. In the Select Product to Install list, select Tivoli TWS Connector v. 8.2. The Install Options window is displayed.

8. This window enables you to:

– Install the Connector only.

– Install the Connector and create a Connector instance.

9. To install the Connector without creating a Connector instance, leave the Create Instance check box blank and leave the General Installation Options fields blank. These fields are used only during the creation of the Connector Instance.

10.To install the Connector and create a Connector Instance:

a. Select the Create Instance check box.

b. In the TWS directory field, specify the directory where IBM Tivoli Workload Scheduler is installed.

c. In the TWS instance name field, specify a name for the IBM Tivoli Workload Scheduler instance on the Managed Node. This name must be unique in the network. It is preferable to use the name of the scheduler agent as the instance name.

11.Click Set to close the Install Options window and return to the Install Product window.

12.In the Available Clients list, select the nodes to install on and move them to the Clients to Install On list.

13.In the Install Product window, click Install. The Product Install window is displayed, which shows you the progress of the installation.

14.Click Continue Install to continue the installation, or click Cancel to cancel the installation.

15.The installation program copies the files and configures the Tivoli database with the new classes. When the installation is complete, the message Finished product installation appears. Click Close.

16.Now select node 2 to install the Tivoli Job Scheduling Connector on. In our configuration, node 2 is called TIVW2K2.

17.Repeat steps 2 through to 15.


4.2.7 Creating Connector instances

You need to create one Connector instance on each Framework server (one on each side of the cluster) for each engine that you want to access with the Job Scheduling Console. If you selected the Create Instance check box when running the installation program or installing from the Tivoli desktop, you do not need to perform the following procedure; in our environment, however, we did need to do this.

To create Connector instances from the command line, ensure you set the Tivoli environment, then enter the following command on the Tivoli server or Managed Node where you installed the Connector that you need to access through the Job Scheduling Console:

wtwsconn.sh -create -h node -n instance_name -t TWS_directory

So in our case we need to run this four times, twice on one Framework server, and twice on the other, using these parameters:

First, on node TIVW2K1

wtwsconn.sh -create -n TIVW2K1_rg1 -t X:\win32app\TWS\TWS82
wtwsconn.sh -create -n TIVW2K2_rg1 -t Y:\win32app\TWS\TWS82

Then on node TIVW2K2

wtwsconn.sh -create -n TIVW2K1_rg2 -t X:\win32app\TWS\TWS82
wtwsconn.sh -create -n TIVW2K2_rg2 -t Y:\win32app\TWS\TWS82
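As a quick check that the Connector instances were registered, you can list the MaestroEngine resources known to each Framework server after setting the Tivoli environment (this simply reuses the wlookup command described later in this chapter):

wlookup -Lar MaestroEngine

On each node, the two instance names you just created should appear in the output.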

4.2.8 Interconnecting the two Tivoli Framework Servers

We have now successfully installed and configured the two instances of the IBM Tivoli Workload Scheduler engine on the shared disk system in the Microsoft Cluster (4.2.3, “Two instances of IBM Tivoli Workload Scheduler in a cluster” on page 383), and the two Tivoli Management Framework servers, one on the local disk of each workstation in the cluster (4.2.4, “Installation of the IBM Tivoli Management Framework” on page 396).

We have also installed the Job Scheduling Services (4.2.5, “Installation of Job Scheduling Services” on page 401) and the Job Scheduling Connector in both Tivoli Management Framework installations.

We now need to share the IBM Tivoli Management Framework resources so that if one side of the cluster is down, the operator can log into the other Tivoli Management Framework and still see both IBM Tivoli Workload Scheduler engines through the Connectors. Sharing resources between the two Tivoli Framework servers in this way is called interconnection.


Framework interconnection is a complex subject. We show how to interconnect the Framework servers for our environment, but you should plan your interconnection carefully if your installation of IBM Tivoli Workload Scheduler is part of a larger Tivoli Enterprise environment.

To interconnect the Framework servers for IBM Tivoli Workload Scheduler in the environment used in this redbook, first ensure you have set the Tivoli environment by issuing %SystemRoot%\system32\drivers\etc\Tivoli\setup_env.cmd.

Then follow these steps:

1. Before starting, make a backup of the IBM Tivoli Management Framework object database using the wbkupdb command. Log onto each node as the Windows Administrator and run a backup of the object database.
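For example, after setting the Tivoli environment on a node, the backup can be written to a file; the -d option and the path shown here are assumptions, so check the wbkupdb syntax in the Tivoli Management Framework Reference Manual for your level:

%SystemRoot%\system32\drivers\etc\Tivoli\setup_env.cmd
bash
wbkupdb -d C:\tivoli\backups\pre_interconnect.bdb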

2. Run the following wlookup commands on cluster node 1 to verify that the Framework objects exist before interconnecting the servers. The syntax of the commands is:

wlookup -Lar ManagedNode

and

wlookup -Lar MaestroEngine

3. Run the same wlookup commands on the other node in the cluster to see if the objects exist.

4. Interconnect the Framework servers in a two-way interconnection using the wconnect command. For a full description of how to use this command, refer to Tivoli Management Framework Reference Manual Version 4.1, SC32-0806. While logged on to node TIVW2K1, enter the following command:

wconnect -c none -l administrator -m Two-way -r none tivw2k2

5. Use the wlsconn and odadmin commands to verify that the interconnection between the two Framework servers has worked. Look at the output of the wlsconn command; it contains the primary IP hostname of the node that was interconnected in the preceding step.

In our environment, the primary IP hostname of cluster node TIVW2K2 is found under the SERVER column in the output of the wlsconn command. The same value is found under the Hostname(s) column in the output of the odadmin command, on the row that shows the Tivoli region ID of the cluster node.
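In practice, this check can be as simple as running both commands and comparing the host names in their output (the exact columns may vary slightly with the Framework level):

wlsconn
odadmin odlist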

Note: The two-way interconnection command only needs to be performed on one of the connections. If you have two cluster nodes, you only need to run the wconnect command on one of them.

406 High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

Page 421: High availability scenarios with ibm tivoli workload scheduler and ibm tivoli framework sg246632

6. Interconnecting Framework servers only establishes a communication path. The Framework resources that need to be shared between Framework servers have to be pulled across the servers by using an explicit updating command.

Sharing a Framework resource shares all the objects that the resource defines. This enables Tivoli administrators to securely control which Framework objects are shared between Framework servers, and to control the performance of the Tivoli Enterprise environment by leaving out unnecessary resources from the exchange of resources between Framework servers. Exchange all relevant Framework resources among cluster nodes by using the wupdate command.

In our environment we exchanged the following Framework resources:

– ManagedNode
– MaestroEngine
– MaestroDatabase
– MaestroPlan
– SchedulerEngine
– SchedulerDatabase
– SchedulerPlan

The SchedulerEngine Framework resource enables the interconnected scheduling engines to present themselves in the Job Scheduling Console. The MaestroEngine Framework resource enables the wmaeutil command to manage running instances of Connectors.
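The wupdate invocations we used follow this general pattern; the -f (force) flag and the placeholder region name are assumptions to adapt, so confirm the exact options in the Tivoli Management Framework Reference Manual Version 4.1, SC32-0806:

wupdate -f -r ManagedNode name_of_remote_region
wupdate -f -r SchedulerEngine name_of_remote_region

Repeat the command for each of the resources listed above, on each cluster node.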

7. Now verify the exchange of the Framework resources has worked. You can use the wlookup command with the following parameters:

wlookup -Lar ManagedNode

and

wlookup -Lar MaestroEngine

When you use the wlookup command with the parameter “ManagedNode”, you will see the two nodes in this cluster. When you use the same command with the parameter “MaestroEngine”, you should see four names, corresponding to the two Connector instances created on each Framework server.

8. Run the same sequence of wlookup commands, but on the cluster node on the opposite side of the interconnection. The output from the commands should be identical to the same commands run on the cluster node in the preceding step.

Important: The wupdate command must be run on all cluster nodes, even on two-way interconnected Framework servers.


9. Log into both cluster nodes through the Job Scheduling Console, using the service IP labels of the cluster nodes and the root user account. All scheduling engines (corresponding to the configured Connectors) on all cluster nodes appear. Those scheduling engines marked inactive are not active because the resource group is not running on that cluster node.

10.Set up a periodic job to exchange Framework resources by using the wupdate command shown in the preceding steps. The frequency that the job should run at depends upon how often changes are made to the Connector objects. For most sites, best practice is a daily update about an hour before Jnextday. Timing it before Jnextday makes the Framework resource update compatible with any changes to the installation location of IBM Tivoli Workload Scheduler. These changes are often timed to occur right before Jnextday is run.
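One way to implement this periodic refresh is a small command file that updates every exchanged resource, defined as an IBM Tivoli Workload Scheduler job that runs daily before Jnextday. This is only a sketch for our environment: the region name is an example, and the assumption that wupdate can be run from a cmd shell should be verified (if wupdate requires the Tivoli bash shell on your system, run the equivalent loop from bash instead):

@echo off
rem update_resources.cmd - refresh the exchanged Framework resources before Jnextday
call %SystemRoot%\system32\drivers\etc\Tivoli\setup_env.cmd
for %%R in (ManagedNode MaestroEngine MaestroDatabase MaestroPlan SchedulerEngine SchedulerDatabase SchedulerPlan) do wupdate -f -r %%R tivw2k2-region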

4.2.9 Installing the Job Scheduling Console

The Job Scheduling Console can be installed on any workstation that has a TCP/IP connection. However, to use the Job Scheduling Console Version 1.3 you should have the following components installed within your IBM Tivoli Workload Scheduler 8.2 network:

- Tivoli Framework 3.7.1 or 4.1
- Tivoli Job Scheduling Services 1.3
- IBM Tivoli Workload Scheduler Connector 8.2

For a full description of the installation, refer to IBM Tivoli Workload Scheduler Job Scheduling Console User’s Guide Feature Level 1.3, SC32-1257, and to IBM Tivoli Workload Scheduler Version 8.2: New Features and Best Practices, SG24-6628.

For the most current information about supported platforms and system requirements, refer to IBM Tivoli Workload Scheduler Job Scheduling Console Release Notes, SC32-1258.

An installation program is available for installing the Job Scheduling Console. You can install directly from the CDs. Alternatively, copy the CD to a network drive and map that network drive. You can install the Job Scheduling Console using any of the following installation mechanisms:

- By using an installation wizard that guides the user through the installation steps

- By using a response file that provides input to the installation program without user intervention

- By using Software Distribution to distribute the Job Scheduling Console files


Here we provide an example of the first method, using the installation wizard interactively. The installation program can perform a number of actions:

- Fresh install
- Add new languages to an existing installation
- Repair an existing installation

Here we assume that you are performing a fresh install. The installation is exactly the same for a non-cluster installation as for a clustered environment.

1. Insert the IBM Tivoli Workload Scheduler Job Scheduling Console CD 1 in the CD-ROM drive.

2. Navigate to the JSC directory.

3. Locate the directory of the platform on which you want to install the Job Scheduling Console, and run the setup program for the operating system on which you are installing:

– Windows: setup.exe

– UNIX: setup.bin

4. The installation program is launched. Select the language in which you want the program to be displayed, and click OK.

5. Read the welcome information and click Next.

6. Read the license agreement, select the acceptance radio button, and click Next.

7. Select the location for the installation, or click Browse to install to a different directory. Click Next.

8. On the dialog displayed, you can select the type of installation you want to perform:

– Typical. English and the language of the locale are installed. Click Next.

– Custom. Select the languages you want to install and click Next.

– Full. All languages are automatically selected for installation. Click Next.

9. A panel is displayed where you can select the locations for the program icons. Click Next.

10.Review the installation settings and click Next. The installation is started.

Note: The Job Scheduling Console installation directory inherits the access rights of the directory where the installation is performed. Because the Job Scheduling Console requires user settings to be saved, it is important to select a directory in which users are granted access rights.


11.When the installation completes, a panel will either display a successful installation, or it will contain a list of which items failed to install and the location of the log file containing the details of the errors.

12.Click Finish.

4.2.10 Scheduled outage configuration

After IBM Tivoli Workload Scheduler is installed as described above and working correctly, there are two separate situations in which an IBM Tivoli Workload Scheduler Master Domain Manager or domain manager that is configured in a cluster does not link to the agents defined in the network it is managing. These situations and their solutions are described here.

Situation 1

The first situation occurs when an IBM Tivoli Workload Scheduler Master Domain Manager or domain manager failover or fail back takes place in the cluster. When the IBM Tivoli Workload Scheduler engine restarts, the Fault Tolerant Agents defined in the network that this Master Domain Manager is managing can remain in the UNLINKED state.

Solution

The solution for this situation is to issue the conman command conman link @;noask. When you issue this command, the IBM Tivoli Workload Scheduler engine links up all the Fault Tolerant Agents that it is managing in its network. To make this an unattended solution, you can put this command into a command file and run that command file as part of the failover/fail back procedure, as sketched below.
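For example, a command file along the following lines can be used; the file name and the installation path (taken from our environment) are examples:

@echo off
rem link_all.cmd - relink all fault-tolerant agents after a failover or fail back
cd /d X:\win32app\TWS\TWS82\bin
conman "link @;noask"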

To make a command file run as a service, use the program srvany.exe, which can be found in the Windows Resource Kit. srvany.exe allows any .bat, .cmd, or .exe file to be run as a service. If the .bat, .cmd, or .exe file is not a “real” service, it is executed once at the start of the service, which is just what is required in this situation.

To set up this unattended solution, follow this procedure on each node in the cluster:

1. Create a service with the command INSTSRV service_name full_path_to_srvany.exe. This service will execute as the IBM Tivoli Workload Scheduler installation userid.

2. Run regedit to edit the created service:

a. Add a ‘Parameters’ key (at the same level as ‘Enum’ and ‘Security’).

b. Add a string value named ‘Application’ to the added key.

c. Set the value of the added string to the full path of the command file that issues the link command:

Figure 4-105 New TWS_Link service
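If you prefer not to edit the registry by hand, the same value can be imported from a .reg file. The service name TWS_Link matches the service shown above, while the script path is only an example:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TWS_Link\Parameters]
"Application"="C:\\TWS\\scripts\\link_all.cmd"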

3. Set up a cluster service that refers to the ‘Link’ node service (similar to the cluster services set up for IBM Tivoli Workload Scheduler).

4. Make the ‘TWS_Link’ cluster service dependent on the IBM Tivoli Workload Scheduler cluster service.

Now when the node fails over or fails back, the cluster will do the following:

- When the Network, IP, and Disk resources are available, the Token cluster service will start.

- When the Token cluster service is available, the Netman cluster service will start.

- When the Netman cluster service is available, the TWS cluster service will start.

- When the TWS cluster service is available, the Link cluster service will start.

Situation 2

The second situation occurs when the IBM Tivoli Workload Scheduler Master Domain Manager executes the Jnextday script. This script is used to create the new production day. When the Jnextday script runs, it shuts down the workstations that are under the control of the Master Domain Manager and restarts them (this is a normal operation). During this process, the Master Domain Manager itself is also shut down and restarted.

During this time the MSCS cluster is monitoring these processes and when the processes are shut down, the MSCS cluster marks them as failed and logs this event in the Windows EventLog.


The MSCS cluster, however, expects its services to be stopped and started using cluster administration commands. A command-line version of these commands (Cluster.exe) exists and is documented on the Microsoft Web site:

http://www.microsoft.com/windows2000/en/datacenter/help/sag_mscsacluscmd_0.htm

Figure 4-106 displays the syntax for the cluster resource command.

The basic cluster resource syntax is:

cluster [[/cluster:]cluster name] resource resource name /option

The cluster name is optional. If no option is specified, the default option is /status.

If the name of your cluster is also a cluster command or its abbreviation, such as cluster or resource, use /cluster: to explicitly specify the cluster name. For a list of all the cluster commands, see Related Topics.

With /online and /offline the option /wait[:timeout in seconds] specifies how long Cluster.exe waits before canceling the command if it does not successfully complete. If no time-out is specified, Cluster.exe waits indefinitely or until the resource state changes.

Figure 4-106 The basic cluster resource syntax

Solution

The solution for the second situation is to create two cmd files, as discussed here.

The first file

The first file will issue offline commands to the cluster resource, as shown in Example 4-92 on page 412.

Example 4-92 Sample script to bring the TWS Cluster OFFLINE

@echo off

rem ********************************************************
rem * Bring TWS Cluster OFFLINE on MSCS Cluster            *
rem ********************************************************

echo ********************************************************************************
echo Show cluster status
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler (for wnc008p)"
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler Linkage"
echo .

echo Set cluster status
rem FIRST bring 'Linkage' offline, then 'TWS' !!! (reverse order from online)
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler Linkage" /offline
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler (for wnc008p)" /offline
echo .

echo Show cluster status
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler (for wnc008p)"
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler Linkage"
echo .

echo ********************************************************************************

Create an IBM Tivoli Workload Scheduler job for this first script and schedule it two minutes before Jnextday runs (the default start time for Jnextday is 0559, so set the first script to run at 0557). The successful execution of this script stops the monitoring of the IBM Tivoli Workload Scheduler service (“tws_maestro” in this MSCS cluster), as the IBM Tivoli Workload Scheduler services are now offline.
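As an illustration, the job and the job stream that runs it at 0557 could be defined in composer along the following lines. This is only a sketch: the workstation name, job and job stream names, logon user, and script path are examples for our environment, not required values, and the exact composer syntax should be checked against the IBM Tivoli Workload Scheduler reference for your level:

$JOBS
TIVW2KV1#TWSCLUOFF
 SCRIPTNAME "X:\win32app\TWS\TWS82\scripts\tws_cluster_offline.cmd"
 STREAMLOGON tws82
 DESCRIPTION "Take the TWS cluster resources offline before Jnextday"
 RECOVERY STOP

SCHEDULE TIVW2KV1#CLUOFF
ON EVERYDAY
AT 0557
:
TIVW2KV1#TWSCLUOFF
END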

During the normal execution of Jnextday, a conman stop command is issued; because the services are already down, this command has no effect and no warning or error messages are produced. Jnextday also issues a conman start command, which brings up the TWS node service, but because the cluster did not start these services, the MSCS cluster still reports them as offline.

The second file

The second file issues online commands to the cluster resource. Define this second file as a job in IBM Tivoli Workload Scheduler, without any dependencies, so that it runs right after Jnextday.

Tip: The first script should not stop the netman process. Why? Because if the netman process were to be stopped, then the master domain manager would not be able to restart this agent or domain manager during Jnextday.


This job to bring the cluster services online will fail because the node services are already present, but the “cluster service status” has been updated and now shows the IBM Tivoli Workload Scheduler cluster service as online.

Example 4-93 on page 414 displays the second file.

Example 4-93 Sample script to bring the TWS Cluster ONLINE

@echo off

rem ********************************************************
rem * Set TWS FTA Cluster status to ONLINE on MSCS Cluster *
rem ********************************************************

echo ********************************************************************************
echo Show cluster status
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler (for wnc008p)"
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler Linkage"
echo .

echo Set cluster status
rem FIRST bring 'TWS' online, then 'Linkage' !!! (reverse order from offline)
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler (for wnc008p)" /online
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler Linkage" /online
echo .

echo Show cluster status
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler (for wnc008p)"
cluster /cluster:ITIVW2KV1 resource "Tivoli Workload Scheduler Linkage"
echo .

echo ********************************************************************************

Because the IBM Tivoli Workload Scheduler cluster resources are brought offline by a cluster command, there are no error entries in the EventLog (only cluster degraded warning entries are displayed, which is normal).


Chapter 5. Implement IBM Tivoli Management Framework in a cluster

In this chapter, we show you how to implement IBM Tivoli Management Framework in a highly available cluster. Unlike in the preceding chapters, we show an implementation that consists only of IBM Tivoli Management Framework; we do not involve high availability considerations for IBM Tivoli Workload Scheduler.

We specifically discuss the following:

- “Implement IBM Tivoli Management Framework in an HACMP cluster” on page 416

- “Implementing Tivoli Framework in a Microsoft Cluster” on page 503

While this is the basis for a highly available Tivoli Enterprise configuration, specific IBM Tivoli products may present unique high availability issues not covered in this redbook. Consult your IBM service provider for assistance with designing and implementing high availability for products such as IBM Tivoli Enterprise Console, IBM Tivoli Configuration Manager, and IBM Tivoli Monitoring.


5.1 Implement IBM Tivoli Management Framework in an HACMP cluster

IBM Support does not officially support implementing two instances of IBM Tivoli Management Framework on a single operating system image. While this configuration is technically possible, it is unsupported; you can read more about it in the IBM Redbook High Availability Scenarios for Tivoli Software, SG24-2032. In this chapter, we show a supported HA configuration for a Tivoli server.

We also discuss how to configure Managed Nodes and Endpoints for high availability. The general steps to implement IBM Tivoli Management Framework for HACMP are:

- “Inventory hardware” on page 417

- “Planning the high availability design” on page 418

- “Create the shared disk volume” on page 420

- “Install IBM Tivoli Management Framework” on page 453

- “Tivoli Web interfaces” on page 464

- “Tivoli Managed Node” on page 464

- “Tivoli Endpoints” on page 466

- “Configure HACMP” on page 480

The following sections describe each of these steps in detail.

Important: Even though both this chapter and 4.1.11, “Add IBM Tivoli Management Framework” on page 303 deal with configuring IBM Tivoli Management Framework for HACMP, they should be treated as separate from each other:

- This chapter describes how to configure IBM Tivoli Management Framework by itself.

- Chapter 4, “IBM Tivoli Workload Scheduler implementation in a cluster” on page 183, in contrast, deals with how to configure IBM Tivoli Management Framework and IBM Tivoli Workload Scheduler as an integrated whole.

This chapter also provides implementation details for IBM Tivoli Management Framework 4.1. For a discussion on how to implement IBM Tivoli Management Framework 3.7b on the MSCS platform, refer to Appendix B, “TMR clustering for Tivoli Framework 3.7b on MSCS” on page 601.


5.1.1 Inventory hardware

Here we present an inventory of the hardware we used for writing this redbook. This enables you to determine what changes you may need to make when using this book as a guide in your own deployment, by comparing your environment against what we used.

Our environment consisted of two identically configured IBM RS/6000 7025-F80 systems. Each system has four PowerPC® RS64-III 450 MHz processors and 1 GB of RAM. We determined the amount of RAM by using the lsattr command:

lsattr -El mem0

The firmware is at level CL030829, which we verified by using the lscfg command:

lscfg -vp | grep -F .CL

Best practice is to bring your hardware up to the latest firmware and microcode levels. Download the most recent firmware and microcode from:

http://www-1.ibm.com/servers/eserver/support/pseries/fixes/hm.html

The following devices are installed in each system:

- SCSI 8mm Tape Drive (20000 MB)

- 5 x 16-bit LVD SCSI Disk Drive (9100 MB)

- 16-bit SCSI Multimedia CD-ROM Drive (650 MB)

There are four adapter cards in each system:

- IBM 10/100 Mbps Ethernet PCI Adapter

- IBM 10/100/1000 Base-T Ethernet PCI Adapter (14100401)

- IBM SSA 160 SerialRAID Adapter

- IBM PCI Token ring Adapter

We did not use the IBM PCI Token ring Adapter.

Shared between the two systems is an IBM 7133 Model 010 Serial Disk System disk tray. Download the most recent SSA drive microcode from:

http://www.storage.ibm.com/hardsoft/products/ssa/index.html

The IBM SSA 160 SerialRAID Adapter is listed in this Web site as the Advanced SerialRAID Adapter. In our environment, the adapters are at loadable microcode level 05, ROS level BD00.


There are 16 SSA drives physically installed in the disk tray, but only 8 are active. The SSA drives are 2 GB type DFHCC2B1, at microcode level 8877. In the preceding Web page, the drives are listed as type DFHC (RAMST).

5.1.2 Planning the high availability design

The restriction against two instances of IBM Tivoli Management Framework on the same operating system image prevents mutual takeover implementations. Instead, we show in this section how to install IBM Tivoli Management Framework and configure it in AIX HACMP for a two-node hot standby cluster.

In this configuration, IBM Tivoli Management Framework is active on only one cluster node at a time, but is installed onto a shared volume group available to all cluster nodes. It is configured to always run from the service IP label and corresponding IP address of the cluster node it normally runs upon. Tivoli Desktop sessions connect to this IP address.
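One quick way to confirm the host name and IP address that the oserv is bound to is to inspect the dispatcher list; the output format varies slightly between Framework levels:

odadmin odlist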

In our environment we configured the file system /opt/hativoli on the shared volume group. In normal operation in our environment, the oserv server of IBM Tivoli Management Framework runs on tivaix1 as shown in Figure 5-1 on page 419.


Figure 5-1 IBM Tivoli Management Framework in normal operation on tivaix1

If IBM Tivoli Management Framework on tivaix1 falls over to tivaix2, the IP service label and shared file system are automatically configured by HACMP onto tivaix2. Tivoli Desktop sessions are restarted when the oserv server is shut down, so users of Tivoli Desktop will have to log back in. The fallover scenario is shown in Figure 5-2 on page 420.


Figure 5-2 State of cluster after IBM Tivoli Management Framework falls over to tivaix2

All managed resources are brought over at the same time because the entire object database is contained in /opt/hativoli. As far as IBM Tivoli Management Framework is concerned, there is no functional difference between running on tivaix1 or tivaix2.

5.1.3 Create the shared disk volume

In this section, we show you how to create and configure a shared disk volume to install IBM Tivoli Management Framework into. Before installing HACMP, we create the shared volume group and install the application servers in it. We can then manually test the fallover of the application server before introducing HACMP.


Plan the shared disk

The cluster needs a shared volume group to host IBM Tivoli Management Framework, so that participating cluster nodes can take over and vary on the volume group during a fallover. Here we show how to plan shared volume groups for an HACMP cluster that uses SSA drives.

Start by making an assessment of the SSA configuration on the cluster.

Assess SSA links

Ensure that all SSA links are viable, to rule out any SSA cabling issues before starting other assessments. To assess SSA links:

1. Enter: smit diag.

2. Go to Current Shell Diagnostics and press Enter. The DIAGNOSTIC OPERATING INSTRUCTIONS diagnostics screen displays some navigation instructions.

3. Press Enter. The FUNCTION SELECTION diagnostics screen displays diagnostic functions.

4. Go to Task Selection (Diagnostics, Advanced Diagnostics, Service Aids, etc.) -> SSA Service Aids -> Link Verification and press Enter. The LINK VERIFICATION diagnostics screen displays a list of SSA adapters to test upon. Go to an SSA adapter to test and press Enter.

In our environment, we selected the SSA adapter ssa0 on tivaix1 as shown in Figure 5-3 on page 422.


                         LINK VERIFICATION                              802385

Move cursor onto selection, then press <Enter>.

  tivaix1:ssa0   2A-08   IBM SSA 160 SerialRAID Adapter (

F3=Cancel    F10=Exit

Figure 5-3 Start SSA link verification on tivaix1

5. The link verification test screen displays the results of the test.

The results of the link verification test in our environment are shown in Figure 5-4 on page 423.


                         LINK VERIFICATION                              802386

SSA Link Verification for:
  tivaix1:ssa0   2A-08   IBM SSA 160 SerialRAID Adapter (

To Set or Reset Identify, move cursor onto selection, then press <Enter>

  Physical           Serial#     Adapter Port A1 A2 B1 B2   Status
  tivaix1:pdisk9     AC7D2457    0  4                       Good
  tivaix1:pdisk8     AC7D200F    1  3                       Good
  tivaix1:pdisk11    AC7D25F9    2  2                       Good
  tivaix1:pdisk13    AC7D2654    3  1                       Good
  tivaix2:ssa0:A                 4  0
  tivaix1:pdisk10    AC7D25A4    0  4                       Good
  tivaix1:pdisk14    AC7D2A94    1  3                       Good
  tivaix1:pdisk12    AC7D25FE    2  2                       Good
  tivaix1:pdisk16    29922C0B    3  1                       Good
  tivaix2:ssa0:B                 4  0

F3=Cancel    F10=Exit

Figure 5-4 Results of link verification test on SSA adapter ssa0 in tivaix1

The link verification test indicates that only the following SSA disks are available on tivaix1: pdisk9, pdisk8, pdisk11, pdisk13, pdisk10, pdisk14, pdisk12, and pdisk16.

6. Repeat the operation for remaining cluster nodes.

In the environment, we tested the link verification for SSA adapter ssa0 on tivaix2, as shown in Figure 5-5 on page 424.


                         LINK VERIFICATION                              802386

SSA Link Verification for:
  tivaix2:ssa0   17-08   IBM SSA 160 SerialRAID Adapter (

To Set or Reset Identify, move cursor onto selection, then press <Enter>

  Physical           Serial#     Adapter Port A1 A2 B1 B2   Status
  tivaix1:ssa0:A                 0  4
  tivaix2:pdisk1     AC7D2457    1  3                       Good
  tivaix2:pdisk0     AC7D200F    2  2                       Good
  tivaix2:pdisk3     AC7D25F9    3  1                       Good
  tivaix2:pdisk5     AC7D2654    4  0                       Good
  tivaix1:ssa0:B                 0  4
  tivaix2:pdisk2     AC7D25A4    1  3                       Good
  tivaix2:pdisk6     AC7D2A94    2  2                       Good
  tivaix2:pdisk4     AC7D25FE    3  1                       Good
  tivaix2:pdisk7     29922C0B    4  0                       Good

F3=Cancel    F10=Exit

Figure 5-5 Results of SSA link verification test on SSA adapter ssa0 in tivaix2

The link verification test indicates only the following SSA disks are available on tivaix2: pdisk0, pdisk1, pdisk2, pdisk3, pdisk4, pdisk5, pdisk6, and pdisk7.

Identify the SSA connection addresses

The connection address uniquely identifies an SSA device. To display the connection address of a physical disk, follow these steps:

1. Enter: smit chgssapdsk. The SSA Physical Disk SMIT selection screen displays a list of known physical SSA disks.

2. Go to an SSA disk and press Enter, as shown in Figure 5-6 on page 425.


Note: You can also enter: smit devices. Then go to SSA Disks -> SSA Physical Disks -> Change/Show Characteristics of an SSA Physical Disk and press Enter.


  SSA Physical Disk

  Move cursor to desired item and press Enter.

  [TOP]
  pdisk0   Defined    2A-08-P  2GB SSA C Physical Disk Drive
  pdisk1   Defined    2A-08-P  2GB SSA C Physical Disk Drive
  pdisk10  Available  2A-08-P  2GB SSA C Physical Disk Drive
  pdisk11  Available  2A-08-P  2GB SSA C Physical Disk Drive
  pdisk12  Available  2A-08-P  2GB SSA C Physical Disk Drive
  pdisk13  Available  2A-08-P  2GB SSA C Physical Disk Drive
  pdisk14  Available  2A-08-P  2GB SSA C Physical Disk Drive
  pdisk15  Defined    2A-08-P  Other SSA Disk Drive
  pdisk16  Available  2A-08-P  2GB SSA C Physical Disk Drive
  pdisk2   Defined    2A-08-P  2GB SSA C Physical Disk Drive
  pdisk3   Defined    2A-08-P  2GB SSA C Physical Disk Drive
  [MORE...6]

  F1=Help   F2=Refresh   F3=Cancel
  F8=Image  F10=Exit     Enter=Do
  /=Find    n=Find Next

Figure 5-6 Select an SSA disk from the SSA Physical Disk SMIT selection screen

3. The Change/Show Characteristics of an SSA Physical Disk SMIT screen displays the characteristics of the selected SSA disk. The Connection address field displays the SSA connection address of the selected disk, as shown in Figure 5-7 on page 426.


              Change/Show Characteristics of an SSA Physical Disk

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                          [Entry Fields]
  Disk                                    pdisk0
  Disk type                               2000mbC
  Disk interface                          ssa
  Description                             2GB SSA C Physical Di>
  Status                                  Defined
  Location                                2A-08-P
  Location Label                          []
  Parent                                  ssar
  adapter_a                               none
  adapter_b                               none
  primary_adapter                         adapter_a              +
  Connection address                      0004AC7D205400D

F1=Help     F2=Refresh    F3=Cancel    F4=List
F5=Reset    F6=Command    F7=Edit      F8=Image
F9=Shell    F10=Exit      Enter=Do

Figure 5-7 Identify the connection address of an SSA disk

4. Repeat the operation for all remaining SSA drives.

5. Repeat the operation for all remaining cluster nodes.

An SSA connection address is unique throughout the cluster. Identify the relationship between each connection address and the AIX physical disk definition it represents on each cluster node. This establishes an actual physical relationship between the defined physical disk in AIX and the hardware disk, as identified by its SSA connection address.

In our environment, we identified the SSA connection address of the disks on tivaix1 and tivaix2 as shown in Table 5-1.

Table 5-1 SSA connection addresses of SSA disks on tivaix1 and tivaix2


Physical disk on tivaix1 | Connection address | Physical disk on tivaix2
pdisk0 | 0004AC7D205400D | pdisk8
pdisk1 | 0004AC7D20A200D | pdisk9
pdisk2 | 0004AC7D22A800D | pdisk10
pdisk3 | 0004AC7D240D00D | pdisk11
pdisk4 | 0004AC7D242500D | pdisk12
pdisk5 | 0004AC7D25BC00D | pdisk13
pdisk6 | 0004AC7D275E00D | pdisk14
pdisk7 | 0004AC7DDACC00D | pdisk15
pdisk8 | 0004AC7D200F00D | pdisk0
pdisk9 | 0004AC7D245700D | pdisk1
pdisk10 | 0004AC7D25A400D | pdisk2
pdisk11 | 0004AC7D25F900D | pdisk3
pdisk12 | 0004AC7D25FE00D | pdisk4
pdisk13 | 0004AC7D265400D | pdisk5
pdisk14 | 0004AC7D2A9400D | pdisk6
pdisk15 | 08005AEA42BC00D | n/a
pdisk16 | 000629922C0B00D | pdisk7


Using the list of disks identified in the link verification test in the preceding section, we highlight (in bold in Table 5-1 on page 426) the disks on each cluster node that are physically available to be shared on both nodes. From this list we identify which disks are also available to be shared as logical elements by using the assessments in the following sections.

Assess tivaix1
In our environment, the available SSA physical disks on tivaix1 are shown in Example 5-1.

Example 5-1 Available SSA disks on tivaix1 before configuring shared volume groups

[root@tivaix1:/home/root] lsdev -C -c pdisk -s ssar -H
name    status     location  description

pdisk0  Defined    2A-08-P   2GB SSA C Physical Disk Drive
pdisk1  Defined    2A-08-P   2GB SSA C Physical Disk Drive
pdisk10 Available  2A-08-P   2GB SSA C Physical Disk Drive
pdisk11 Available  2A-08-P   2GB SSA C Physical Disk Drive
pdisk12 Available  2A-08-P   2GB SSA C Physical Disk Drive



pdisk13 Available  2A-08-P   2GB SSA C Physical Disk Drive
pdisk14 Available  2A-08-P   2GB SSA C Physical Disk Drive
pdisk15 Defined    2A-08-P   Other SSA Disk Drive
pdisk16 Available  2A-08-P   2GB SSA C Physical Disk Drive
pdisk2  Defined    2A-08-P   2GB SSA C Physical Disk Drive
pdisk3  Defined    2A-08-P   2GB SSA C Physical Disk Drive
pdisk4  Defined    2A-08-P   2GB SSA C Physical Disk Drive
pdisk5  Defined    2A-08-P   2GB SSA C Physical Disk Drive
pdisk6  Defined    2A-08-P   2GB SSA C Physical Disk Drive
pdisk7  Defined    2A-08-P   2GB SSA C Physical Disk Drive
pdisk8  Available  2A-08-P   2GB SSA C Physical Disk Drive
pdisk9  Available  2A-08-P   2GB SSA C Physical Disk Drive

The logical disks on tivaix1 are defined as shown in Example 5-2. Note the physical volume ID (PVID) field in the second column, and the volume group assignment field in the third column.

Example 5-2 Logical disks on tivaix1 before configuring shared volume groups

[root@tivaix1:/home/root] lspv
hdisk0          0001813fe67712b5    rootvg      active
hdisk1          0001813f1a43a54d    rootvg      active
hdisk2          0001813f95b1b360    rootvg      active
hdisk3          0001813fc5966b71    rootvg      active
hdisk4          0001813fc5c48c43    None
hdisk5          0001813fc5c48d8c    None
hdisk6          000900066116088b    tiv_vg1
hdisk7          000000000348a3d6    tiv_vg1
hdisk8          00000000034d224b    tiv_vg2
hdisk9          none                None
hdisk10         none                None
hdisk11         none                None
hdisk12         00000000034d7fad    tiv_vg2
hdisk13         none                None

The logical-to-physical SSA disk relationship of configured SSA drives on tivaix1 is shown in Example 5-3.

Example 5-3 How to show logical-to-physical SSA disk relationships on tivaix1

[root@tivaix1:/home/root] for i in $(lsdev -CS1 -t hdisk -s ssar -F name)
> do
> echo "$i: "$(ssaxlate -l $i)
> done
hdisk10: pdisk12
hdisk11: pdisk13
hdisk12: pdisk14
hdisk13: pdisk16


hdisk6: pdisk8
hdisk7: pdisk9
hdisk8: pdisk10
hdisk9: pdisk11
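If you want a single per-node report that joins this logical-to-physical mapping with the PVID and volume group data from lspv, the two commands already used above can be combined in a short loop. This is only a convenience sketch (the output format is our own, and non-SSA disks are silently skipped); it is not required by the procedure.

# Sketch: print hdisk, backing pdisk, PVID, and volume group for every
# SSA-backed logical disk on this node.
lspv | while read HDISK PVID VG STATE
do
    PDISK=$(ssaxlate -l $HDISK 2>/dev/null)
    if [ -n "$PDISK" ] ; then
        printf "%-10s %-10s %-18s %-10s %s\n" $HDISK "$PDISK" $PVID $VG $STATE
    fi
done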

Assess tivaix2
The same SSA disks in the same SSA loop, as they appear on tivaix2, are shown in Example 5-4.

Example 5-4 Available SSA disks on tivaix2 before configuring shared volume groups

[root@tivaix2:/home/root] lsdev -C -c pdisk -s ssar -H
name    status     location  description

pdisk0  Available  17-08-P   2GB SSA C Physical Disk Drive
pdisk1  Available  17-08-P   2GB SSA C Physical Disk Drive
pdisk10 Defined    17-08-P   2GB SSA C Physical Disk Drive
pdisk11 Defined    17-08-P   2GB SSA C Physical Disk Drive
pdisk12 Defined    17-08-P   2GB SSA C Physical Disk Drive
pdisk13 Defined    17-08-P   2GB SSA C Physical Disk Drive
pdisk14 Defined    17-08-P   2GB SSA C Physical Disk Drive
pdisk15 Defined    17-08-P   2GB SSA C Physical Disk Drive
pdisk2  Available  17-08-P   2GB SSA C Physical Disk Drive
pdisk3  Available  17-08-P   2GB SSA C Physical Disk Drive
pdisk4  Available  17-08-P   2GB SSA C Physical Disk Drive
pdisk5  Available  17-08-P   2GB SSA C Physical Disk Drive
pdisk6  Available  17-08-P   2GB SSA C Physical Disk Drive
pdisk7  Available  17-08-P   2GB SSA C Physical Disk Drive
pdisk8  Defined    17-08-P   2GB SSA C Physical Disk Drive
pdisk9  Defined    17-08-P   2GB SSA C Physical Disk Drive

The logical disks on tivaix2 are defined as shown in Example 5-5.

Example 5-5 Logical disks on tivaix2 before configuring shared volume groups

[root@tivaix2:/home/root] lspv
hdisk0          0001814f62b2a74b    rootvg      active
hdisk1          none                None
hdisk2          none                None
hdisk3          none                None
hdisk4          none                None
hdisk5          000900066116088b    tiv_vg1
hdisk6          000000000348a3d6    tiv_vg1
hdisk7          00000000034d224b    tiv_vg2
hdisk8          0001813f72023fd6    None
hdisk9          0001813f72025253    None
hdisk10         0001813f71dd8f80    None
hdisk11         00000000034d7fad    tiv_vg2


hdisk12         0001814f7ce1d08d    None
hdisk16         0001814fe8d10853    None

The logical-to-physical SSA disk relationship of configured SSA drives on tivaix2 is shown in Example 5-6.

Example 5-6 Show logical-to-physical SSA disk relationships on tivaix2

[root@tivaix2:/home/root] for i in $(lsdev -CS1 -t hdisk -s ssar -F name)
> do
> echo "$i: "$(ssaxlate -l $i)
> done
hdisk10: pdisk5
hdisk11: pdisk6
hdisk12: pdisk7
hdisk5: pdisk0
hdisk6: pdisk1
hdisk7: pdisk2
hdisk8: pdisk3
hdisk9: pdisk4

Identify the volume group major numbers
Each volume group is assigned a major device number: a number that is unique on a cluster node and different from the major number of every other device on that node. Creating a new shared volume group requires a new major device number with the following characteristics:

- It is different from any other major number of any device on the cluster node.

- It is exactly the same as the major number assigned to the same shared volume group on all other cluster nodes that share the volume group.

Satisfy these criteria by identifying the volume group major numbers already in use on each cluster node, so that a unique number can be assigned to the new shared volume group. If any other shared volume groups already exist, also identify the major numbers used for them. Whenever possible, keep the major numbers of similar devices in the same range; this eases the administrative burden of keeping track of the numbers to assign.

In our environment, we used the following command to identify all major numbers used by all devices on a cluster node:

ls -al /dev/* | awk '{ print $5 }' | awk -F',' '{ print $1 }' | sort | uniq

In our environment, the major numbers already assigned include the ones shown in Example 5-7 on page 431. We show only a portion of the output for brevity; the parts we left out are indicated by vertical ellipses (...).


Example 5-7 How to list major numbers already in use on tivaix1

[root@tivaix1:/home/root] ls -al /dev/* | awk '{ print $5 }' | \
> awk -F',' '{ print $1 }' | sort -n | uniq
.
.
.
8
11
.
.
.
43
44
45
46
47
512
.
.
.

In this environment, the volume groups tiv_vg1 and tiv_vg2 are shared volume groups that already exist. We use the ls command on tivaix1, as shown in Example 5-8, to identify the major numbers used for these shared volume groups.

Example 5-8 Identify the major numbers used for shared volume groups on tivaix1

[root@tivaix1:/home/root] ls -al /dev/tiv_vg1
crw-rw----   1 root     system       45,  0 Nov 05 15:51 /dev/tiv_vg1
[root@tivaix1:/home/root] ls -al /dev/tiv_vg2
crw-r-----   1 root     system       46,  0 Nov 10 17:04 /dev/tiv_vg2

Example 5-8 shows that shared volume group tiv_vg1 uses major number 45, and shared volume group tiv_vg2 uses major number 46. We perform the same commands on the other cluster nodes that access the same shared volume groups. In our environment, these commands are entered on tivaix2, as shown in Example 5-9.

Example 5-9 Identify the major numbers used for shared volume groups on tivaix2

[root@tivaix2:/home/root] ls -al /dev/tiv_vg1
crw-r-----   1 root     system       45,  0 Dec 15 20:36 /dev/tiv_vg1
[root@tivaix2:/home/root] ls -al /dev/tiv_vg2
crw-r-----   1 root     system       46,  0 Dec 15 20:39 /dev/tiv_vg2

Again, you can see that the major numbers are the same on tivaix2 for the same volume groups. Between the list of all major numbers used by all devices, and


the major numbers already used by the shared volume groups in our cluster, we choose 49 as the major number to assign to the next shared volume group on all cluster nodes that will access the new shared volume group.
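As a cross-check, the AIX lvlstmajor command reports the major numbers that are still free on a node. If it is available at your AIX level, running it on every cluster node is a quick way to confirm that the number you choose (49 in our case) is unused everywhere; treat this as a convenience sketch rather than part of the formal procedure.

# Sketch: run on each cluster node and confirm that the candidate
# major number (49) falls inside the free ranges that are printed.
lvlstmajor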

Analyze the assessments
Use the assessment data gathered in the preceding sections to plan the disk sharing design.

Identify which physical disks are not yet assigned to any logical elements. List the physical disks available on each cluster node, as well as each disk’s physical volume ID (PVID), its corresponding logical disk, and the volume group the physical disk is assigned to.

If a physical disk is not assigned to any logical elements yet, describe the logical elements as “not available”. Disks listed as defined but not available usually indicate connection problems or hardware failure on the disk itself, so do not include these disks in the analysis.

Table 5-2 Identify SSA physical disks on tivaix1 available for logical assignments

Physical Disk   PVID               Logical Disk   Volume Group
pdisk8          000900066116088b   hdisk6         tiv_vg1
pdisk9          000000000348a3d6   hdisk7         tiv_vg1
pdisk10         00000000034d224b   hdisk8         tiv_vg2
pdisk11         n/a                hdisk9         n/a
pdisk12         n/a                hdisk10        n/a
pdisk13         n/a                hdisk11        n/a
pdisk14         00000000034d7fad   hdisk12        tiv_vg2
pdisk16         n/a                hdisk13        n/a

The analysis of tivaix1 indicates that four SSA disks are available as logical elements (highlighted in bold in Table 5-2) because no volume groups are allocated to them: pdisk11, pdisk12, pdisk13, and pdisk16.

We want the two cluster nodes in our environment to share a set of SSA disks, so we have to apply the same analysis of available disks to tivaix2; see Table 5-3 on page 433.



Table 5-3 Identify SSA physical disks on tivaix2 available for logical assignments

Physical Disk   PVID               Logical Disk   Volume Group
pdisk0          000900066116088b   hdisk5         tiv_vg1
pdisk1          000000000348a3d6   hdisk6         tiv_vg1
pdisk2          00000000034d224b   hdisk7         tiv_vg2
pdisk3          0001813f72023fd6   hdisk8         n/a
pdisk4          0001813f72025253   hdisk9         n/a
pdisk5          0001813f71dd8f80   hdisk10        n/a
pdisk6          00000000034d7fad   hdisk11        tiv_vg2
pdisk7          0001814f7ce1d08d   hdisk12        n/a

The analysis of tivaix2 indicates that four SSA disks are available as logical elements (highlighted in bold in Table 5-3) because no volume groups are allocated to them: pdisk3, pdisk4, pdisk5, and pdisk7.

Pooling together the separate analyses from each cluster node, we arrive at the map shown in Table 5-4. The center two columns show the actual, physical SSA drives as identified by their connection address and the shared volume groups hosted on these drives. The outer two columns show the AIX-assigned physical and logical disks on each cluster node, for each SSA drive.

Table 5-4 SSA connection addresses of SSA disks on tivaix1 and tivaix2

tivaix1 disks          Connection address   Volume group   tivaix2 disks
Physical   Logical                                         Physical   Logical
pdisk8     hdisk6      0004AC7D200F00D      tiv_vg1        pdisk0     hdisk5
pdisk9     hdisk7      0004AC7D245700D      tiv_vg1        pdisk1     hdisk6
pdisk10    hdisk8      0004AC7D25A400D      tiv_vg2        pdisk2     hdisk7
pdisk11    hdisk9      0004AC7D25F900D                     pdisk3     hdisk8
pdisk12    hdisk10     0004AC7D25FE00D                     pdisk4     hdisk9
pdisk13    hdisk11     0004AC7D265400D                     pdisk5     hdisk10
pdisk14    hdisk12     0004AC7D2A9400D      tiv_vg2        pdisk6     hdisk11
pdisk16    hdisk13     000629922C0B00D                     pdisk7     hdisk12



You can think of the AIX physical disk as the handle that the SSA drivers in AIX use to communicate with the actual SSA hardware drive. Think of the AIX logical disk as the higher-level construct that presents a uniform interface to the AIX volume management system. These logical disks are allocated to volume groups, and they map back through a chain (logical disk to physical disk to connection address) to reach the actual SSA hardware drive.

Allocate the SSA disks to a new volume group
The assessments and analyses show us that four SSA drives are available to allocate to a volume group for IBM Tivoli Management Framework and to share between both nodes in our two-node cluster. These drives are highlighted in bold in the preceding table.

A basic installation of IBM Tivoli Management Framework requires no more than 2 GB. Our assessments in the preceding sections (“Assess tivaix1” on page 427 and “Assess tivaix2” on page 429) show that our SSA storage system uses 2 GB drives, so we know the physical capacity of each drive.

We will use two of these drives for the volume group that holds IBM Tivoli Management Framework. Table 5-5 distills all the preceding analysis into a summary that identifies the physical SSA disks to use, and the order in which we specify them when defining them into a volume group.

Table 5-5 Summary analysis table of disks to use for new shared volume group

tivaix1 disks          Connection address   Volume group   tivaix2 disks
Physical   Logical                                         Physical   Logical
pdisk11    hdisk9      0004AC7D25F900D      itmf_vg        pdisk3     hdisk8
pdisk12    hdisk10     0004AC7D25FE00D      itmf_vg        pdisk4     hdisk9

The following section describes how to allocate the new volume group on the selected SSA drives.

Configure volume group on SSA drives
Configure a volume group on the SSA drives selected during the analysis. This volume group is shared among all the cluster nodes.

To configure a volume group on SSA drives:

1. Select a cluster node from the final analysis table (Table 5-5). Log into that cluster node as root user.

In our environment, we logged into tivaix1 as root user.



2. Enter the SMIT fast path command: smit mkvg. The Add a Volume Group SMIT screen appears.

3. Enter: itmf_vg in the VOLUME GROUP name field.

4. Go to the PHYSICAL VOLUME names field and press F4. The PHYSICAL VOLUME names SMIT dialog appears.

5. Select the physical volumes to include in the new volume group and press Enter. The Add a Volume Group SMIT selection screen appears.

In our environment, we used the summary analysis table to determine that because we are on tivaix1, we need to select hdisk9 and hdisk10 in the Add a Volume Group SMIT selection screen, as shown in Figure 5-8.

                            PHYSICAL VOLUME names

 Move cursor to desired item and press F7.
     ONE OR MORE items can be selected.
 Press Enter AFTER making all selections.

   hdisk4
   hdisk5
 > hdisk9
 > hdisk10
   hdisk11
   hdisk13

 F1=Help       F2=Refresh     F3=Cancel
 F7=Select     F8=Image       F10=Exit
 Enter=Do      /=Find         n=Find Next

Figure 5-8 Select physical volumes for volume group itmf_vg

6. Go to the Volume Group MAJOR NUMBER field and enter a unique major number. This number must be unique in every cluster node that the volume group is shared in. Ensure the volume group is not automatically activated at system restart (HACMP needs to automatically activate it) by setting the Activate volume group AUTOMATICALLY at system restart field to no.


Tip: Record the volume group major number and the first physical disk you use for the volume group, for later reference in “Import the volume group into the remaining cluster nodes” on page 448.


In our environment, we entered 49 in the Volume Group MAJOR NUMBER field, and set the Activate volume group AUTOMATICALLY at system restart field to no, as shown in Figure 5-9. We use 49 as determined in “Identify the volume group major numbers” on page 430, so it will not conflict with the major numbers chosen for other volume groups and devices.

                              Add a Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  VOLUME GROUP name                                  [itmf_vg]
  Physical partition SIZE in megabytes                4                     +
* PHYSICAL VOLUME names                              [hdisk9 hdisk10]       +
  Force the creation of a volume group?               no                    +
  Activate volume group AUTOMATICALLY                 no                    +
    at system restart?
  Volume Group MAJOR NUMBER                          [49]                   +#
  Create VG Concurrent Capable?                       no                    +
  Create a big VG format Volume Group?                no                    +
  LTG Size in kbytes                                  128                   +

F1=Help       F2=Refresh     F3=Cancel      F4=List
F5=Reset      F6=Command     F7=Edit        F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 5-9 Configure settings to add volume group itmf_vg

7. Press Enter. The volume group is created.

8. Use the lsvg and lspv commands to verify the new volume group exists, as shown in Example 5-10.

Example 5-10 Verify creation of shared volume group itmf_vg on tivaix1

[root@tivaix1:/home/root] lsvg
rootvg
tiv_vg1
tiv_vg2
itmf_vg
[root@tivaix1:/home/root] lspv
hdisk0          0001813fe67712b5    rootvg      active



hdisk1          0001813f1a43a54d    rootvg      active
hdisk2          0001813f95b1b360    rootvg      active
hdisk3          0001813fc5966b71    rootvg      active
hdisk4          0001813fc5c48c43    None
hdisk5          0001813fc5c48d8c    None
hdisk6          000900066116088b    tiv_vg1
hdisk7          000000000348a3d6    tiv_vg1
hdisk8          00000000034d224b    tiv_vg2
hdisk9          0001813f72023fd6    itmf_vg     active
hdisk10         0001813f72025253    itmf_vg     active
hdisk11         0001813f71dd8f80    None
hdisk12         00000000034d7fad    tiv_vg2
hdisk13         none                None
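For reference, the SMIT dialog above corresponds to a single mkvg command. The following sketch uses the values from our environment (itmf_vg, major number 49, hdisk9 and hdisk10 on tivaix1); verify the flags against the mkvg documentation for your AIX level before relying on it.

# Sketch: create itmf_vg on tivaix1 with major number 49 and do not
# activate it automatically at system restart (-n), so that HACMP
# controls when the volume group is varied on.
mkvg -y itmf_vg -V 49 -n hdisk9 hdisk10

# Verify the result, as in Example 5-10.
lsvg
lspv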

Create the logical volume and Journaled File System
Create a logical volume and a Journaled File System (JFS) on the new volume group. This makes the volume group available to applications running on AIX.

To create a logical volume and Journaled File System on the new volume group:

1. Create the mount point for the logical volume’s file system. Do this on all cluster nodes.

In our environment, we used the following command:

mkdir -p /opt/hativoli

2. Enter: smit crjfsstd.

3. The Volume Group Name SMIT selection screen displays a list of volume groups. Go to the new volume group and press Enter. The Add a Standard Journaled File System SMIT screen displays the attributes for a new standard Journaled File System.

In our environment, we selected itmf_vg, as shown in Figure 5-10 on page 438.


                              Volume Group Name

 Move cursor to desired item and press Enter.

   rootvg
   tiv_vg1
   tiv_vg2
   itmf_vg

 F1=Help       F2=Refresh     F3=Cancel
 F8=Image      F10=Exit       Enter=Do
 /=Find        n=Find Next

Figure 5-10 Select a volume group using the Volume Group Name SMIT selection screen

4. Enter values into the fields.

Number of units

Enter the number of megabytes to allocate for the standard Journaled File System.

MOUNT POINT

The mount point, which is the directory where the file system is available or will be made available.

Mount AUTOMATICALLY at system restart?

Indicates whether the file system is mounted at each system restart. Possible values are:

yes - meaning that the file system is automatically mounted at system restart

no - meaning that the file system is not automatically mounted at system restart.

In our environment, we entered 2048 in the Number of units field, /opt/hativoli in the MOUNT POINT field, and yes in the Mount AUTOMATICALLY at system restart? field, as shown in Figure 5-11 on page 439.



                     Add a Standard Journaled File System

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Volume group name                                   itmf_vg
  SIZE of file system
          Unit Size                                   Megabytes             +
*         Number of units                            [2048]                 #
* MOUNT POINT                                        [/opt/hativoli]
  Mount AUTOMATICALLY at system restart?              yes                   +
  PERMISSIONS                                         read/write            +
  Mount OPTIONS                                      []                     +
  Start Disk Accounting?                              no                    +
  Fragment Size (bytes)                               4096                  +
  Number of bytes per inode                           4096                  +
  Allocation Group Size (MBytes)                      8                     +

F1=Help       F2=Refresh     F3=Cancel      F4=List
F5=Reset      F6=Command     F7=Edit        F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 5-11 Create a standard Journaled File System on volume group itmf_vg in tivaix1

5. Press Enter to create the standard Journaled File System. The COMMAND STATUS SMIT screen displays the progress and result of the operation. A successful operation looks similar to Figure 5-12 on page 440.



                                COMMAND STATUS

Command: OK            stdout: yes           stderr: no

Before command completion, additional instructions may appear below.

Based on the parameters chosen, the new /opt/hativoli JFS file system
is limited to a maximum size of 134217728 (512 byte blocks)

New File System size is 4194304

F1=Help       F2=Refresh     F3=Cancel      F6=Command
F8=Image      F9=Shell       F10=Exit       /=Find
n=Find Next

Figure 5-12 Successful creation of JFS file system /opt/hativoli on tivaix1

6. Use the ls, df, mount, and umount commands to verify the new standard Journaled File System, as shown in Example 5-11.

Example 5-11 Verify successful creation of a JFS file system

[root@tivaix1:/home/root] ls /opt/hativoli
[root@tivaix1:/home/root] df -k /opt/hativoli
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd10opt       262144     68724   74%     3544     6% /opt
[root@tivaix1:/home/root] mount /opt/hativoli
[root@tivaix1:/home/root] df -k /opt/hativoli
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/lv09         2097152   2031276    4%       17     1% /opt/hativoli
[root@tivaix1:/home/root] ls /opt/hativoli
lost+found
[root@tivaix1:/home/root] umount /opt/hativoli

The new volume group is now populated with a new standard Journaled File System.
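The SMIT steps above can also be expressed as a single crfs command. This sketch uses our values (2 GB, mount point /opt/hativoli, automatic mount at restart) and assumes the size attribute is given in 512-byte blocks, which matches the 4194304 figure reported in the COMMAND STATUS screen; adjust it for your AIX level if necessary.

# Sketch: create a 2 GB standard JFS on volume group itmf_vg,
# mounted at /opt/hativoli and mounted automatically at restart
# (4194304 x 512-byte blocks = 2 GB).
crfs -v jfs -g itmf_vg -m /opt/hativoli -A yes -a size=4194304

# Quick check, as in Example 5-11.
mount /opt/hativoli
df -k /opt/hativoli
umount /opt/hativoli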



Configure the logical volume
Rename the new logical volume and its log logical volume so that their names are guaranteed to be unique on any cluster node. The new names are the same on any cluster node that varies on the logical volume's volume group, and must differ from every other logical volume name on all cluster nodes. You only need to perform this operation from one cluster node, and the volume group must be online on that node.

In our environment, we wanted to rename logical volume lv09 to itmf_lv, and logical log volume loglv00 to itmf_loglv.

To rename the logical volume and logical log volume:

1. Use the lsvg command as shown in Example 5-12 to identify the logical volumes on the new volume group.

In our environment, the volume group itmf_vg contains two logical volumes. Logical volume lv09 holds the standard Journaled File System /opt/hativoli. Logical volume loglv00 is the log logical volume for lv09.

Example 5-12 Identify logical volumes on new volume group

[root@tivaix1:/home/root] lsvg -l itmf_vg
itmf_vg:
LV NAME      TYPE     LPs   PPs   PVs  LV STATE      MOUNT POINT
loglv00      jfslog   1     1     1    closed/syncd  N/A
lv09         jfs      512   512   1    closed/syncd  /opt/hativoli

2. Enter: smit chlv2. You can also enter: smit storage, go to Logical Volume Manager -> Logical Volumes -> Set Characteristic of a Logical Volume -> Rename a Logical Volume and press Enter. The Rename a Logical Volume SMIT screen is displayed.

3. Enter the name of the logical volume to rename in the CURRENT logical volume name field. Enter the new name of the logical volume in the NEW logical volume name field.

Important: Our environment does not use multiple SSA adapters due to resource constraints. In a production high availability environment, you use multiple disk controllers. Best practice for HACMP is to use multiple disk controllers and multiple disks for volume groups. Specifically, to ensure disk availability, best practice for each cluster node is to split a volume group between at least two disk controllers and three disks, mirroring across all the disks.


In our environment, we entered lv09 in the CURRENT logical volume name field, and itmf_lv in the NEW logical volume name field, as shown in Figure 5-13.

                            Rename a Logical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* CURRENT logical volume name                        [lv09]                 +
* NEW logical volume name                            [itmf_lv]

F1=Help       F2=Refresh     F3=Cancel      F4=List
F5=Reset      F6=Command     F7=Edit        F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 5-13 Rename a logical volume

4. Press Enter to rename the logical volume. The COMMAND STATUS SMIT screen displays the progress and the final status of the renaming operation.

5. Repeat the operation for the logical log volume.

In our environment, we renamed logical volume loglv00 to itmf_loglv, as shown in Figure 5-14 on page 443.



                            Rename a Logical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* CURRENT logical volume name                        [loglv00]              +
* NEW logical volume name                            [itmf_loglv]

F1=Help       F2=Refresh     F3=Cancel      F4=List
F5=Reset      F6=Command     F7=Edit        F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 5-14 Rename the logical log volume

6. Run the chfs command as shown in Example 5-13 to update the relationship between the logical volume itmf_lv and logical log volume itmf_loglv.

Example 5-13 Update relationship between renamed logical volumes and logical log volumes

[root@tivaix1:/home/root] chfs /opt/hativoli

7. Verify the chfs command modified the /etc/filesystems file entry for the file system.

In our environment, we used the grep command as shown in Example 5-14 on page 444 to verify that the /etc/filesystems entry for /opt/hativoli matches the new names of the logical volume and logical log volume.

The attributes dev and log contain the new names itmf_lv and itmf_loglv, respectively.



Example 5-14 Verify the chfs command

[root@tivaix1:/home/root] grep -p /opt/hativoli /etc/filesystems
/opt/hativoli:
        dev             = /dev/itmf_lv
        vfs             = jfs
        log             = /dev/itmf_loglv
        mount           = true
        check           = false
        options         = rw
        account         = false
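The same rename and refresh can be done directly from the command line with chlv and chfs; this sketch simply repeats the SMIT operations above using the names from our environment.

# Sketch: rename the JFS logical volume and its jfslog, then refresh
# the /etc/filesystems stanza for /opt/hativoli.
chlv -n itmf_lv lv09
chlv -n itmf_loglv loglv00
chfs /opt/hativoli

# Confirm the stanza references the new names, as in Example 5-14.
grep -p /opt/hativoli /etc/filesystems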

Export the volume group
Export the volume group from the cluster node it was created on to make it available to the other cluster nodes.

To export a volume group:

1. Log into the cluster node that the volume group was created upon.

In our environment, we logged into tivaix1 as root user.

2. Note that the volume group is varied on as soon as it is created. Vary off the volume group if necessary, so it can be exported.

In our environment, we varied off the volume group itmf_vg by using the following command:

varyoffvg itmf_vg

3. Enter: smit exportvg. The Export a Volume Group SMIT screen displays a VOLUME GROUP name field.

4. Enter the new volume group in the VOLUME GROUP name field.

In our environment, we entered itmf_vg in the VOLUME GROUP name field, as shown in Figure 5-15 on page 445.


                             Export a Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* VOLUME GROUP name                                  [itmf_vg]              +

F1=Help       F2=Refresh     F3=Cancel      F4=List
F5=Reset      F6=Command     F7=Edit        F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 5-15 Export a Volume Group SMIT screen

5. Press Enter to export the volume group. The COMMAND STATUS SMIT screen displays the progress and final result of the export operation.

6. Use the lsvg and lspv commands as shown in Example 5-15 to verify the export of the volume group. Notice that the volume group name does not appear in the output of either command.

Example 5-15 Verify the export of volume group itmf_vg from tivaix1

[root@tivaix1:/home/root] lsvg
rootvg
tiv_vg1
tiv_vg2
[root@tivaix1:/home/root] lspv
hdisk0          0001813fe67712b5    rootvg      active
hdisk1          0001813f1a43a54d    rootvg      active
hdisk2          0001813f95b1b360    rootvg      active
hdisk3          0001813fc5966b71    rootvg      active
hdisk4          0001813fc5c48c43    None
hdisk5          0001813fc5c48d8c    None



hdisk6          000900066116088b    tiv_vg1
hdisk7          000000000348a3d6    tiv_vg1
hdisk8          00000000034d224b    tiv_vg2
hdisk9          0001813f72023fd6    None
hdisk10         0001813f72025253    None
hdisk11         0001813f71dd8f80    None
hdisk12         00000000034d7fad    tiv_vg2
hdisk13         none                None
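From the command line, the export step reduces to the two commands below (a sketch for tivaix1); the SMIT screen in Figure 5-15 runs the same exportvg operation.

# Sketch: take itmf_vg offline and remove its definition from the ODM
# on tivaix1, so it can be re-imported with the chosen major number.
varyoffvg itmf_vg
exportvg itmf_vg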

Re-import the volume group
After exporting a volume group, we import it back into the same cluster node we first exported it from. We then log into the other cluster nodes on the same SSA loop as the node on which we created the volume group in “Configure volume group on SSA drives” on page 434, and import the volume group there so we can make it a shared volume group.

To import the volume group back to the same cluster node we first exported it from:

1. Log into the cluster node as root user.

In our environment, we logged into tivaix1 as root user.

2. Use the lsvg command as shown in Example 5-16 to verify the volume group is not already imported.

Example 5-16 Verify volume group itmf_vg is not already imported into tivaix1

[root@tivaix1:/home/root] lsvg -l itmf_vg
0516-306 : Unable to find volume group itmf_vg in the Device Configuration
Database.

3. Enter: smit importvg. You can also enter: smit storage, go to Logical Volume Manager -> Volume Groups -> Import a Volume Group, and press Enter. The Import a Volume Group SMIT screen is displayed.

4. Enter the following values. Use the values determined in “Configure volume group on SSA drives” on page 434.

VOLUME GROUP name

The volume group name. The name must be unique system-wide, and can range from 1 to 15 characters.

PHYSICAL VOLUME name

The name of the physical volume. Physical volume names are typically in the form “hdiskx” where x is a system-wide unique number. This name is assigned when the disk is detected for the first time on a system


startup or when the system management commands are used at runtime to add a disk to the system.

Volume Group MAJOR NUMBER

The major number of the volume group. The system kernel accesses devices, including volume groups, through a major and minor number combination. To see what major numbers are available on your system, use the SMIT “List” feature.

In our environment, we entered itmf_vg in the VOLUME GROUP name field, hdisk9 in the PHYSICAL VOLUME name field, and 49 in the Volume Group MAJOR NUMBER, as shown in Figure 5-16.

                             Import a Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  VOLUME GROUP name                                  [itmf_vg]
* PHYSICAL VOLUME name                               [hdisk9]               +
  Volume Group MAJOR NUMBER                          [49]                   +#

F1=Help       F2=Refresh     F3=Cancel      F4=List
F5=Reset      F6=Command     F7=Edit        F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 5-16 Import a volume group

5. Press Enter to import the volume group. The COMMAND STATUS SMIT screen displays the progress and final result of the volume group import operation.

6. Vary on the volume group using the varyonvg command.



In our environment, we entered the command:

varyonvg itmf_vg

7. Use the lsvg command as shown in Example 5-17 to verify the volume group import.

Example 5-17 Verify import of volume group itmf_vg into tivaix1

[root@tivaix1:/home/root] lsvg -l itmf_vg
itmf_vg:
LV NAME      TYPE     LPs   PPs   PVs  LV STATE      MOUNT POINT
itmf_loglv   jfslog   1     1     1    closed/syncd  N/A
itmf_lv      jfs      512   512   1    closed/syncd  /opt/hativoli

8. Vary off the volume group using the varyoffvg command so you can import the volume group into the remaining cluster nodes.

In our environment, we entered the command:

varyoffvg itmf_vg
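The re-import (and the import on the remaining nodes described in the next section) can also be done with importvg directly. This sketch uses our values and the disk names from Table 5-5; note that the same physical disk appears under different hdisk names on each node.

# Sketch: re-import itmf_vg with major number 49.
# On tivaix1:
importvg -y itmf_vg -V 49 hdisk9
varyoffvg itmf_vg     # importvg leaves the volume group varied on

# On tivaix2, the same SSA disk is hdisk8:
importvg -y itmf_vg -V 49 hdisk8
varyoffvg itmf_vg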

Import the volume group into the remaining cluster nodes
Import the volume group into the remaining cluster nodes so it becomes a shared volume group.

In our environment, we imported volume group itmf_vg into cluster node tivaix2.

To import a volume group defined on SSA drives so it becomes a shared volume group with other cluster nodes:

1. Log into another cluster node as root user.

In our environment, we logged into tivaix2 as root user.

2. Enter the SMIT fast path command: smit importvg. You can also enter: smit storage, go to Logical Volume Manager -> Volume Groups -> Import a Volume Group, and press Enter. The Import a Volume Group SMIT screen is displayed.

3. Use the same volume group name that you used in the preceding operation for the VOLUME GROUP name field.

In our environment, we entered itmf_vg in the VOLUME GROUP name field.

4. Use the summary analysis table created in “Plan the shared disk” on page 421 to determine the logical disk to use. The volume group major number is the same on all cluster nodes, so use the same volume group major number as in the preceding operation.

Note: Importing a volume group also varies it on, so be sure to vary it off first with the varyoffvg command if it is in the ONLINE state on a cluster node.

In our environment, we observed that hdisk9 on tivaix1 corresponds to hdisk8 on tivaix2, so we used hdisk8 in the PHYSICAL VOLUME name field, as shown in Figure 5-17.

                             Import a Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  VOLUME GROUP name                                  [itmf_vg]
* PHYSICAL VOLUME name                               [hdisk8]               +
  Volume Group MAJOR NUMBER                          [49]                   +#

F1=Help       F2=Refresh     F3=Cancel      F4=List
F5=Reset      F6=Command     F7=Edit        F8=Image
F9=Shell      F10=Exit       Enter=Do

Figure 5-17 Import volume group itmf_vg on tivaix2

5. Press Enter to import the volume group. The COMMAND STATUS SMIT screen displays the progress and final result of the volume group import operation.

6. Use the lsvg and lspv commands to verify the volume group import. The output of these commands contains the name of the imported volume group.

In our environment, we verified the volume group import as shown in Example 5-18 on page 450.



Example 5-18 Verify the import of volume group itmf_vg into tivaix2

[root@tivaix2:/home/root] lsvg
rootvg
tiv_vg1
tiv_vg2
itmf_vg
[root@tivaix2:/home/root] lsvg -l itmf_vg
itmf_vg:
LV NAME      TYPE     LPs   PPs   PVs  LV STATE      MOUNT POINT
itmf_loglv   jfslog   1     1     1    closed/syncd  N/A
itmf_lv      jfs      512   512   1    closed/syncd  /opt/hativoli
[root@tivaix2:/home/root] lspv
hdisk0          0001814f62b2a74b    rootvg      active
hdisk1          none                None
hdisk2          none                None
hdisk3          none                None
hdisk4          none                None
hdisk5          000900066116088b    tiv_vg1
hdisk6          000000000348a3d6    tiv_vg1
hdisk7          00000000034d224b    tiv_vg2
hdisk8          0001813f72023fd6    itmf_vg     active
hdisk9          0001813f72025253    itmf_vg     active
hdisk10         0001813f71dd8f80    None
hdisk11         00000000034d7fad    tiv_vg2
hdisk12         0001814f7ce1d08d    None
hdisk16         0001814fe8d10853    None

7. Vary off the volume group using the varyoffvg command.

In our environment, we entered the following command into tivaix2:

varyoffvg itmf_vg

Verify the volume group sharing
Manually verify that all imported volume groups can be shared between cluster nodes before configuring HACMP. If volume group sharing fails under HACMP, manual verification usually allows you to rule out a problem in the configuration of the volume groups and focus on the definition of the shared volume groups under HACMP.

To verify volume group sharing:

1. Log into a cluster node as root user.

In our environment, we logged into tivaix1 as root user.

2. Verify the volume group is not already active on the cluster node. Use the lsvg command as shown in Example 5-19 on page 451. The name of the


volume group does not appear in the output of the command if the volume group is not active on the cluster node.

Example 5-19 Verify a volume group is not already active on a cluster node

[root@tivaix1:/home/root] lsvg -o
rootvg

3. Vary on the volume group using the varyonvg command.

In our environment, we entered the command:

varyonvg itmf_vg

4. Use the lspv and lsvg commands as shown in Example 5-20 to verify the volume group is put into the ONLINE state. The name of the volume group appears in the output of these commands now, where it did not before.

Example 5-20 How to verify volume group itmf_vg is online on tivaix1

[root@tivaix1:/home/root] lsvg -o
itmf_vg
rootvg
[root@tivaix1:/home/root] lsvg -l itmf_vg
itmf_vg:
LV NAME      TYPE     LPs   PPs   PVs  LV STATE      MOUNT POINT
itmf_loglv   jfslog   1     1     1    closed/syncd  N/A
itmf_lv      jfs      512   512   1    closed/syncd  /opt/hativoli
[root@tivaix1:/home/root] lspv
hdisk0          0001813fe67712b5    rootvg      active
hdisk1          0001813f1a43a54d    rootvg      active
hdisk2          0001813f95b1b360    rootvg      active
hdisk3          0001813fc5966b71    rootvg      active
hdisk4          0001813fc5c48c43    None
hdisk5          0001813fc5c48d8c    None
hdisk6          000900066116088b    tiv_vg1
hdisk7          000000000348a3d6    tiv_vg1
hdisk8          00000000034d224b    tiv_vg2
hdisk9          0001813f72023fd6    itmf_vg     active
hdisk10         0001813f72025253    itmf_vg     active
hdisk11         0001813f71dd8f80    None
hdisk12         00000000034d7fad    tiv_vg2
hdisk13         none                None

5. Use the df, mount, touch, ls, and umount commands to verify the availability of the logical volume, and to create a test file. The file system and mount point change after mounting the logical volume.

In our environment, we created the test file /opt/hativoli/node_tivaix1, as shown in Example 5-21.


Example 5-21 Verify availability of a logical volume in a shared volume group

[root@tivaix1:/home/root] df -k /opt/hativoli
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd10opt       262144     68724   74%     3544     6% /opt
[root@tivaix1:/home/root] mount /opt/hativoli
[root@tivaix1:/home/root] df -k /opt/hativoli
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/itmf_lv      2097152   2031276    4%       17     1% /opt/hativoli
[root@tivaix1:/home/root] touch /opt/hativoli/node_tivaix1
[root@tivaix1:/home/root] ls -l /opt/hativoli/node_tivaix*
-rw-r--r--   1 root     sys            0 Dec 17 15:25 /opt/hativoli/node_tivaix1
[root@tivaix1:/home/root] umount /opt/hativoli

6. Vary off the volume group using the varyoffvg command.

In our environment, we used the command:

varyoffvg itmf_vg

7. Repeat the operation on all remaining cluster nodes. Ensure test files created on other cluster nodes sharing this volume group exist.

In our environment, we repeated the operation on tivaix2 as shown in Example 5-22.

Example 5-22 Verify shared volume group itmf_vg on tivaix2

[root@tivaix2:/home/root] lsvg -o
rootvg
[root@tivaix2:/home/root] varyonvg itmf_vg
[root@tivaix2:/home/root] lsvg -o
itmf_vg
rootvg
[root@tivaix2:/home/root] lsvg -l itmf_vg
itmf_vg:
LV NAME      TYPE     LPs   PPs   PVs  LV STATE      MOUNT POINT
itmf_loglv   jfslog   1     1     1    closed/syncd  N/A
itmf_lv      jfs      512   512   1    closed/syncd  /opt/hativoli
[root@tivaix2:/home/root] lspv
hdisk0          0001814f62b2a74b    rootvg      active
hdisk1          none                None
hdisk2          none                None
hdisk3          none                None
hdisk4          none                None
hdisk5          000900066116088b    tiv_vg1
hdisk6          000000000348a3d6    tiv_vg1
hdisk7          00000000034d224b    tiv_vg2
hdisk8          0001813f72023fd6    itmf_vg     active
hdisk9          0001813f72025253    itmf_vg     active
hdisk10         0001813f71dd8f80    None


hdisk11         00000000034d7fad    tiv_vg2
hdisk12         0001814f7ce1d08d    None
hdisk16         0001814fe8d10853    None
[root@tivaix2:/home/root] df -k /opt/hativoli
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd10opt       262144     29992   89%     3587     6% /opt
[root@tivaix2:/home/root] mount /opt/hativoli
[root@tivaix2:/home/root] df -k /opt/hativoli
Filesystem    1024-blocks      Free %Used    Iused %Iused Mounted on
/dev/itmf_lv      2097152   2031276    4%       17     1% /opt/hativoli
[root@tivaix2:/home/root] touch /opt/hativoli/node_tivaix2
[root@tivaix2:/home/root] ls -l /opt/hativoli/node_tivaix*
-rw-r--r--   1 root     sys            0 Dec 17 15:25 /opt/hativoli/node_tivaix1
-rw-r--r--   1 root     sys            0 Dec 17 15:26 /opt/hativoli/node_tivaix2
[root@tivaix2:/home/root] umount /opt/hativoli
[root@tivaix2:/home/root] varyoffvg itmf_vg
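The manual verification above can be collected into a small script that is run on each cluster node in turn. This is only a sketch for our environment (volume group itmf_vg, mount point /opt/hativoli); it is not part of the HACMP configuration itself.

#!/bin/ksh
# Sketch: verify that the shared volume group can be varied on, mounted,
# and written to on this node, then release it for the other nodes.
VG=itmf_vg
MP=/opt/hativoli
NODE=$(hostname)

varyonvg $VG || exit 1
mount $MP    || { varyoffvg $VG; exit 1; }
touch $MP/node_$NODE
ls -l $MP/node_*        # test files created on other nodes should also appear
umount $MP
varyoffvg $VG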

5.1.4 Install IBM Tivoli Management Framework
In this section we show how to install IBM Tivoli Management Framework Version 4.1 with all available patches as of the time of writing; specifically, how to install on tivaix1 in the environment used for this redbook. We only need to install once, because we used a hot standby configuration. After installing IBM Tivoli Management Framework, we describe how to install and configure HACMP for it on both tivaix1 and tivaix2.

Concurrent access requires application support of the Cluster Lock Manager. IBM Tivoli Management Framework does not support Cluster Lock Manager, so we use shared Logical Volume Manager (LVM) access.

Plan for high availability considerations
We install IBM Tivoli Management Framework before installing and configuring HACMP, so that if IBM Tivoli Management Framework exhibits problems after HACMP is introduced, we will know the root cause is likely an HACMP configuration issue.

It helps the overall deployment if we plan around some of the high availability considerations while installing IBM Tivoli Management Framework.

Installation directories
IBM Tivoli Management Framework uses the following directories on a Tivoli server:

- /etc/Tivoli
- The Tivoli home directory, under which IBM Tivoli Management Framework is installed and in which most Tivoli Enterprise products are usually installed.


In our environment, we left /etc/Tivoli on the local drives of each cluster node. This enabled the flexibility to easily use multiple, local Endpoint installations on each cluster node. Putting /etc/Tivoli on the shared disk volume is possible, but it involves adding customized start and stop HACMP scripts that would “shuffle” the contents of /etc/Tivoli depending upon what Endpoints are active on a cluster node.

We use /opt/hativoli as the Tivoli home directory. Following best practice, we first install IBM Tivoli Management Framework into /opt/hativoli, then install and configure HACMP.

Associated IP addresses
Configuring the Tivoli server as a resource group in a hot standby two-node cluster requires that the IP addresses associated with the server remain with the server, regardless of which cluster node it runs upon. The IP address associated with the installation of the Tivoli server should be the service IP address. When the cluster node the Tivoli server is running on falls over, the service IP label falls over to the new cluster node, along with the resource group that contains the Tivoli server.

Plan the installation sequence
Before installing, plan the sequence of the packages you are going to install. Refer to Tivoli Enterprise Installation Guide Version 4.1, GC32-0804, for detailed information about what needs to be installed. Figure 5-18 on page 455 shows the sequence and dependencies of packages we planned for IBM Tivoli Management Framework Version 4.1 for the environment we used for this redbook.

Important: These are not the only directories used in a Tivoli Enterprise deployment of multiple IBM Tivoli products.

Note: In an actual production deployment, best practice is to implement /etc/Tivoli on a shared volume group because leaving it on the local disk of a system involves synchronizing the contents of highly available Endpoints across cluster nodes.


Figure 5-18 IBM Tivoli Framework 4.1.0 application and patch sequence and dependencies as of December 2, 2003

Stage installation media
Complete the procedures listed in “Stage installation media” on page 313 to stage the IBM Tivoli Management Framework installation media.

Modify /etc/hosts and name resolution order
Complete the procedures in “Modify /etc/hosts and name resolution order” on page 250 to configure IP hostname lookups.

Install base Framework
In this section we show you how to install IBM Tivoli Management Framework so that it is specifically configured for IBM Tivoli Workload Scheduler on HACMP. This enables you to transition the instances of IBM Tivoli Management Framework used for IBM Tivoli Workload Scheduler to a mutual takeover environment if that becomes a supported feature in the future. We believe the configuration as shown in this section can be started and stopped directly from HACMP in a mutual takeover configuration.

When installing IBM Tivoli Management Framework on an HACMP cluster node in support of IBM Tivoli Workload Scheduler, use the primary IP hostname as the hostname for IBM Tivoli Management Framework. Add an IP alias later for the service IP label. When this configuration is used with the multiple Connector object configuration described in this redbook, it enables Job Scheduling Console users to connect through any instance of IBM Tivoli Management Framework, no matter which cluster nodes fall over.

IBM Tivoli Management Framework itself consists of a base install, and various components. You must first prepare for the base install by performing the commands as shown in Example 5-23 for cluster node tivaix1 in our environment. On tivaix2, we replace the IP hostname in the first command from tivaix1_svc to tivaix2_svc.

Example 5-23 Preparing for installation of IBM Tivoli Management Framework 4.1

[root@tivaix1:/home/root] HOST=tivaix1_svc
[root@tivaix1:/home/root] echo $HOST > /etc/wlocalhost
[root@tivaix1:/home/root] WLOCALHOST=$HOST
[root@tivaix1:/home/root] export WLOCALHOST
[root@tivaix1:/home/root] mkdir /opt/hativoli/install_dir
[root@tivaix1:/home/root] cd /opt/hativoli/install_dir
[root@tivaix1:/opt/hativoli/install_dir] /bin/sh \
> /usr/sys/inst.images/tivoli/fra/FRA410_1of2/WPREINST.SH
to install, type ./wserver -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2
[root@tivaix1:/opt/hativoli/install_dir] DOGUI=no
[root@tivaix1:/opt/hativoli/install_dir] export DOGUI

After you prepare for the base install, perform the initial installation of IBM Tivoli Management Framework by running the command shown in Example 5-24. You will see output similar to this example; depending upon the speed of your server, it will take 5 to 15 minutes to complete.

On tivaix2 in our environment, we run the same command except we change the third line of the command from tivaix1_svc to tivaix2_svc.

Example 5-24 Initial installation of IBM Tivoli Management Framework Version 4.1

[root@tivaix1:/home/root] cd /usr/local/Tivoli/install_dir
[root@tivaix1:/usr/local/Tivoli/install_dir] sh ./wserver -y \
-c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 \
-a tivaix1_svc -d \
BIN=/opt/hativoli/bin! \
LIB=/opt/hativoli/lib! \
ALIDB=/opt/hativoli/spool! \
MAN=/opt/hativoli/man! \
APPD=/usr/lib/lvm/X11/es/app-defaults! \
CAT=/opt/hativoli/msg_cat! \
LK=1FN5B4MBXBW4GNJ8QQQ62WPV0RH999P99P77D \
RN=tivaix1_svc-region \
AutoStart=1 SetPort=1 CreatePaths=1 @ForceBind@=yes @EL@=None


Using command line style installation.....

Unless you cancel, the following operations will be executed:
  need to copy the CAT (generic) to:       tivaix1_svc:/opt/hativoli/msg_cat
  need to copy the CSBIN (generic) to:     tivaix1_svc:/opt/hativoli/bin/generic
  need to copy the APPD (generic) to:      tivaix1_svc:/usr/lib/lvm/X11/es/app-defaults
  need to copy the GBIN (generic) to:      tivaix1_svc:/opt/hativoli/bin/generic_unix
  need to copy the BUN (generic) to:       tivaix1_svc:/opt/hativoli/bin/client_bundle
  need to copy the SBIN (generic) to:      tivaix1_svc:/opt/hativoli/bin/generic
  need to copy the LCFNEW (generic) to:    tivaix1_svc:/opt/hativoli/bin/lcf_bundle.40
  need to copy the LCFTOOLS (generic) to:  tivaix1_svc:/opt/hativoli/bin/lcf_bundle.40/bin
  need to copy the LCF (generic) to:       tivaix1_svc:/opt/hativoli/bin/lcf_bundle
  need to copy the LIB (aix4-r1) to:       tivaix1_svc:/opt/hativoli/lib/aix4-r1
  need to copy the BIN (aix4-r1) to:       tivaix1_svc:/opt/hativoli/bin/aix4-r1
  need to copy the ALIDB (aix4-r1) to:     tivaix1_svc:/opt/hativoli/spool/tivaix1.db
  need to copy the MAN (aix4-r1) to:       tivaix1_svc:/opt/hativoli/man/aix4-r1
  need to copy the CONTRIB (aix4-r1) to:   tivaix1_svc:/opt/hativoli/bin/aix4-r1/contrib
  need to copy the LIB371 (aix4-r1) to:    tivaix1_svc:/opt/hativoli/lib/aix4-r1
  need to copy the LIB365 (aix4-r1) to:    tivaix1_svc:/opt/hativoli/lib/aix4-r1
Executing queued operation(s)
Distributing machine independent Message Catalogs --> tivaix1_svc ..... Completed.

Distributing machine independent generic Codeset Tables --> tivaix1_svc .... Completed.

Distributing architecture specific Libraries --> tivaix1_svc ...... Completed.

Distributing architecture specific Binaries --> tivaix1_svc ............. Completed.

Distributing architecture specific Server Database --> tivaix1_svc


.......................................... Completed.

Distributing architecture specific Man Pages --> tivaix1_svc ..... Completed.

Distributing machine independent X11 Resource Files --> tivaix1_svc ... Completed.

Distributing machine independent Generic Binaries --> tivaix1_svc ... Completed.

Distributing machine independent Client Installation Bundle --> tivaix1_svc ... Completed.

Distributing machine independent generic HTML/Java files --> tivaix1_svc ... Completed.

Distributing architecture specific Public Domain Contrib --> tivaix1_svc ... Completed.

Distributing machine independent LCF Images (new version) --> tivaix1_svc ............. Completed.

Distributing machine independent LCF Tools --> tivaix1_svc ....... Completed.

Distributing machine independent 36x Endpoint Images --> tivaix1_svc ............ Completed.

Distributing architecture specific 371_Libraries --> tivaix1_svc .... Completed.

Distributing architecture specific 365_Libraries --> tivaix1_svc .... Completed.

Registering installation information...Finished.

Load Tivoli environment variables in .profile files
The Tivoli environment variables contain pointers to important directories that IBM Tivoli Management Framework uses for many commands. Loading the variables in the .profile file of a user account ensures that these environment variables are always available immediately after logging into the user account.


Use the commands in Example 5-25 to modify the .profile files of the root user account on all cluster nodes to source in all Tivoli environment variables for IBM Tivoli Management Framework.

Example 5-25 Load Tivoli environment variables on tivaix1

PATH=${PATH}:${HOME}/bin
if [ -f /etc/Tivoli/setup_env.sh ] ; then
    . /etc/Tivoli/setup_env.sh
fi

Also enter these commands on the command line, or log out and log back in to activate the environment variables for the following sections.

Install Framework components and patches
After the base install is complete, you can install all remaining Framework components and patches by running the script shown in Example 5-26. If you use this script on tivaix2, change the line that starts with the string “HOST=” so that tivaix1 is replaced with tivaix2.

Example 5-26 Script for installing IBM Tivoli Management Framework Version 4.1 with patches

#!/bin/ksh

if [ -d /etc/Tivoli ] ; then
    . /etc/Tivoli/setup_env.sh
fi

reexec_oserv()
{
    echo "Reexecing object dispatchers..."
    if [ `odadmin odlist list_od | wc -l` -gt 1 ] ; then
        #
        # Determine if necessary to shut down any clients
        tmr_hosts=`odadmin odlist list_od | head -1 | cut -c 36-`
        client_list=`odadmin odlist list_od | grep -v ${tmr_hosts}$`
        if [ "${client_list}" = "" ] ; then
            echo "No clients to shut down, skipping shut down of clients..."
        else
            echo "Shutting down clients..."
            odadmin shutdown clients
            echo "Waiting for all clients to shut down..."
            sleep 30
        fi
    fi
    odadmin reexec 1
    sleep 30
    odadmin start clients


}

HOST="tivaix1_svc"winstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JRE130 $HOSTwinstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JHELP41 $HOSTwinstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JCF41 $HOSTwinstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i JRIM41 $HOSTwinstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i MDIST2GU $HOSTwinstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i SISDEPOT $HOSTwinstall -c /usr/sys/inst.images/tivoli/fra/FRA410_2of2/JAVA -y -i SISCLNT $HOSTwinstall -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 -y -i ADE $HOSTwinstall -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 -y -i AEF $HOSTwpatch -c /usr/sys/inst.images/tivoli/fra/41TMF008 -y -i 41TMF008 $HOSTwpatch -c /usr/sys/inst.images/tivoli/fra/41TMF014 -y -i 41TMF014 $HOSTwpatch -c /usr/sys/inst.images/tivoli/fra/41TMF015 -y -i 41TMF015 $HOSTreexec_oservwpatch -c /usr/sys/inst.images/tivoli/fra/41TMF016 -y -i 41TMF016 $HOSTwpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2928 $HOSTwpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2929 $HOSTwpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2931 $HOSTwpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2932 $HOSTwpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2962 $HOSTwpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2980 $HOSTwpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2984 $HOSTwpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2986 $HOSTwpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2987 $HOSTwpatch -c /usr/sys/inst.images/tivoli/fra/41TMF017 -y -i TMA2989 $HOSTwpatch -c /usr/sys/inst.images/tivoli/fra/41TMF034 -y -i 41TMF034 $HOSTreexec_oservwpatch -c /usr/sys/inst.images/tivoli/fra/41TMF032 -y -i JRE130_0 $HOST

This completes the installation of IBM Tivoli Management Framework Version 4.1. Successful completion of the installation also serves as a coarse, first-level verification that IBM Tivoli Management Framework is working.
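One way to spot-check the result is to list the products and patches registered in the Tivoli object database and to confirm that the object dispatcher is running. The following is a sketch; the exact output depends on your installation:

. /etc/Tivoli/setup_env.sh
wlsinst -ah          # list installed products and patches, with the hosts they are installed on
odadmin odlist       # confirm the object dispatcher is up and bound to the service IP label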

After installing IBM Tivoli Management Framework, configure it to meet the requirements of integrating with IBM Tivoli Workload Scheduler over HACMP.

Add IP alias to oserv

Installing IBM Tivoli Management Framework using the service IP hostname of the server binds the Framework server (also called oserv) to the corresponding service IP address.

The oserv listens for Framework network traffic only on this IP address. This also ensures that a highly available IBM Tivoli Management Framework starts only after HACMP is running and has brought the service IP address online.


In our environment, we also need oserv to listen on the persistent IP address. The persistent IP label/address is not moved between cluster nodes when a resource group is moved, but remains on the cluster node to ease administrative access (that is why it is called the persistent IP label/address). Job Scheduling Console users depend upon using the service IP address to access IBM Tivoli Workload Scheduler services.

As a security precaution, IBM Tivoli Management Framework listens only on the IP address it is initially installed against, unless this behavior is specifically disabled so that the oserv can bind to other addresses as well. We show you how to disable this behavior in this section.

To add the service IP label as a Framework oserv IP alias:

1. Log in as root user on a cluster node.

In our environment, we logged in as root user on cluster node tivaix1.

2. Use the odadmin command as shown in Example 5-27 to verify the current IP aliases of the oserv, add the service IP label as an IP alias to the oserv, and then verify that the service IP label is added to the oserv as an IP alias.

Note that the numeral 1 in the odadmin odlist add_ip_alias command should be replaced by the dispatcher number of your Framework installation.

Example 5-27 Add IP alias to Framework oserv server

[root@tivaix1:/home/root] odadmin odlist
Region           Disp  Flags  Port   IPaddr       Hostname(s)
1369588498          1  ct-      94   9.3.4.3      tivaix1_svc
[root@tivaix1:/home/root] odadmin odlist add_ip_alias 1 tivaix1
[root@tivaix1:/home/root] odadmin odlist
Region           Disp  Flags  Port   IPaddr       Hostname(s)
1369588498          1  ct-      94   9.3.4.3      tivaix1_svc
                                     9.3.4.194    tivaix1,tivaix1.itsc.austin.ibm.com

The dispatcher number is displayed in the second column of the odadmin odlist command, on the same line as the primary IP hostname of your Framework installation. In Example 5-28, the dispatcher number is 7.

Example 5-28 Identify dispatcher number of Framework installation

[root@tivaix1:/home/root] odadmin odlist
Region           Disp  Flags  Port   IPaddr       Hostname(s)
1369588498          7  ct-      94   9.3.4.3      tivaix1_svc

The dispatcher number will be something other than 1 if you delete and re-install Managed Nodes, or if your Framework server is part of an overall Tivoli Enterprise installation.


3. Use the odadmin command as shown in Example 5-29 to verify that IBM Tivoli Management Framework currently binds against the primary IP hostname, then disable the feature, and then verify that it is disabled.

Note that the numeral 1 in the odadmin set_force_bind command should be replaced by the dispatcher number of your Framework installation.

Example 5-29 Disable set_force_bind object dispatcher option

[root@tivaix1:/home/root] odadmin | grep Force
Force socket bind to a single address    = TRUE
[root@tivaix1:/home/root] odadmin set_force_bind FALSE 1
[root@tivaix1:/home/root] odadmin | grep Force
Force socket bind to a single address    = FALSE

The dispatcher number is displayed in the second column of the odadmin odlist command, on the same line as the primary IP hostname of your Framework installation. In Example 5-30, the dispatcher number is 7.

Example 5-30 Identify dispatcher number of Framework installation

[root@tivaix1:/home/root] odadmin odlist
Region           Disp  Flags  Port   IPaddr       Hostname(s)
1369588498          7  ct-      94   9.3.4.3      tivaix1_svc

The dispatcher number will be something other than 1 if you delete and re-install Managed Nodes, or if your Framework server is part of an overall Tivoli Enterprise installation.

4. Repeat the operation on all remaining cluster nodes.

For our environment, we repeat the operation on tivaix2, replacing tivaix1 with tivaix2 in the commands.
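If you script this procedure so that it can run on each cluster node, the dispatcher number can be captured rather than read by eye. The following is a minimal sketch that assumes the odadmin odlist output format shown above, where the local Tivoli server is the first entry listed:

DISP=`odadmin odlist | awk 'NR==2 {print $2}'`   # second line, second column is the Disp number
odadmin odlist add_ip_alias $DISP tivaix1        # use tivaix2 when running on tivaix2
odadmin set_force_bind FALSE $DISP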

Important: Disabling the set_force_bind variable can cause unintended side effects for installations of IBM Tivoli Management Framework that also run other IBM Tivoli server products, such as IBM Tivoli Monitoring and IBM Tivoli Configuration Manager. Consult your IBM service provider for advice on how to address this potential conflict if you plan on deploying other IBM Tivoli server products on top of the instance of IBM Tivoli Management Framework that you use for IBM Tivoli Workload Scheduler.

Best practice is to dedicate an instance of IBM Tivoli Management Framework for IBM Tivoli Workload Scheduler, typically on the Master Domain Manager, and not to install other IBM Tivoli server products into it. This simplifies these administrative concerns and does not affect the functionality of a Tivoli Enterprise environment.


Move the .tivoli directory

The default installation of IBM Tivoli Management Framework on a UNIX system creates the /tmp/.tivoli directory. This directory contains files that are required by the object dispatcher process. In a high availability implementation, the directory needs to move with the resource group that contains IBM Tivoli Management Framework. This means we need to move the directory into the shared volume group’s file system. In our environment, we moved the directory to /opt/hativoli/tmp/.tivoli.

To use a different directory, you must set an environment variable in both the object dispatcher and the shell. After installing IBM Tivoli Management Framework, perform the following steps to set the necessary environment variables:

1. Create a directory. This directory must have at least public read and write permissions. However, define full permissions and set the sticky bit to ensure that users cannot modify files that they do not own.

In our environment, we ran the commands shown in Example 5-31.

Example 5-31 Create the new .tivoli directory

mkdir -p /opt/hativoli/tmp/.tivoli
chmod ugo=rwx /opt/hativoli/tmp/.tivoli
chmod o+t /opt/hativoli/tmp/.tivoli

2. Set the environment variable in the object dispatcher:

a. Enter the following command:

odadmin environ get > envfile

b. Add the following line to the envfile file and save it:

TIVOLI_COMM_DIR=new_directory_name

c. Enter the following command:

odadmin environ set < envfile

3. Edit the Tivoli-provided set_env.csh, setup_env.sh, and oserv.rc files in the /etc/Tivoli directory to set the TIVOLI_COMM_DIR variable.

4. For HP-UX and Solaris systems, add the following line to the file that starts the object dispatcher:

TIVOLI_COMM_DIR=new_directory_name

Insert the line near where the other environment variables are set, in a location that runs before the object dispatcher is started. The following list contains the file that needs to be changed on each operating system:

– For HP-UX operating systems: /sbin/init.d/Tivoli
– For Solaris operating systems: /etc/rc3.d/S99Tivoli


5. Shut down the object dispatcher by entering the following command:

odadmin shutdown all

6. Restart the object dispatcher by entering the following command:

odadmin reexec all
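For reference, the object dispatcher portion of this procedure (steps 2a through 2c) can be combined into a short script. The following is a minimal sketch for our environment; it assumes the Tivoli environment is already sourced and that /opt/hativoli/tmp/.tivoli was created as shown in Example 5-31. The remaining steps (editing the files in /etc/Tivoli and restarting the object dispatcher) still need to be performed as described above:

#!/bin/sh
# Set TIVOLI_COMM_DIR in the object dispatcher environment (steps 2a-2c)
odadmin environ get > /tmp/oserv_env.$$
echo "TIVOLI_COMM_DIR=/opt/hativoli/tmp/.tivoli" >> /tmp/oserv_env.$$
odadmin environ set < /tmp/oserv_env.$$
rm -f /tmp/oserv_env.$$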

5.1.5 Tivoli Web interfaces

IBM Tivoli Management Framework provides access to Web-enabled Tivoli Enterprise applications from a browser. When a browser sends an HTTP request to the Tivoli server, the request is redirected to a Web server. IBM Tivoli Management Framework provides this Web access by using servlets and support files that are installed on the Web server. The servlets establish a secure connection between the Web server and the Tivoli server. The servlets and support files are called the Tivoli Web interfaces.

IBM Tivoli Management Framework provides a built-in Web server called the spider HTTP service. It is not as robust or secure as a third-party Web server, so if you plan on deploying a Tivoli Enterprise product that requires Web access, consult your IBM service provider for advice about selecting a more appropriate Web server.

IBM Tivoli Management Framework supports any Web server that implements the Servlet 2.2 specifications, but the following Web servers are specifically certified for use with IBM Tivoli Management Framework:

- IBM WebSphere® Application Server, Advanced Single Server Edition
- IBM WebSphere Application Server, Enterprise Edition
- IBM WebSphere Enterprise Application Server
- Jakarta Tomcat

The Web server can be hosted on any computer system. If you deploy a Web server on a cluster node, you will likely want to make it highly available. In this redbook we focus upon high availability for IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework. Refer to IBM WebSphere V5.0 Performance, Scalability, and High Availability: WebSphere Handbook Series, SG24-6198-00, for details on configuring WebSphere Application Server for high availability. Consult your IBM service provider for more details on configuring other Web servers for high availability.

5.1.6 Tivoli Managed Node

Managed Nodes are no different from IBM Tivoli Management Framework Tivoli servers in terms of high availability design. They operate under the same constraint of only one instance per operating system instance. While the AutoStart install variable of the wclient command implies we can configure multiple instances of the object dispatcher on a single operating system instance, IBM Tivoli Support staff confirmed for us that this is not a supported configuration at the time of writing.

Use the wclient command to install a Managed Node in a highly available cluster, as shown in Example 5-32.

Example 5-32 Install a Managed Node

wclient -c /usr/sys/inst.images/tivoli/fra/FRA410_1of2 \
    -p ibm.tiv.pr -P @AutoStart@=0 @ForceBind@=yes \
    BIN=/opt/hativoli/bin! \
    LIB=/opt/hativoli/lib! \
    DB=/opt/hativoli/spool! \
    MAN=/opt/hativoli/man! \
    APPD=/usr/lib/lvm/X11/es/app-defaults! \
    CAT=/opt/hativoli/msg_cat! \
    tivaix3_svc

In this example, we installed a Managed Node named tivaix3_svc on a system with the IP hostname tivaix3_svc (the service IP label of the cluster node) from the CD image we copied to the local drive in “Stage installation media” on page 455, into the directory /opt/hativoli. We also placed the managed resource object in the ibm.tiv.pr policy region. See Tivoli Management Framework Reference Manual Version 4.1, SC32-0806, for details about how to use the wclient command.
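After the wclient command completes, a quick check that the new Managed Node registered correctly might look like the following sketch; the dispatcher numbers and addresses will differ in your environment:

wlookup -Lar ManagedNode     # the label of the new Managed Node should be listed
odadmin odlist               # a new object dispatcher entry should appear for tivaix3_svc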

Except for the difference in the initial installation (using the wclient command instead of the wserver command), planning and implementing a highly available Managed Node is the same as for a Tivoli server, as described in the preceding sections.

If the constraint is lifted in future versions of IBM Tivoli Management Framework, or if you still want to install multiple instances of the object dispatcher on a single instance of an operating system, configure each instance with a different directory.

To configure a different directory, change the BIN, LIB, DB, MAN, CAT and (optionally) APPD install variables that are passed to the wclient command. Configure the Tivoli environment files and the oserv.rc executable in /etc/Tivoli to accommodate the multiple installations. Modify external dependencies upon /etc/Tivoli where appropriate. We recommend using multiple, separate directories, one for each instance of IBM Tivoli Management Framework. Consult your IBM service provider for assistance with configuring this design.


5.1.7 Tivoli Endpoints

Endpoints offer more options for high availability designs. When designing a highly available Tivoli Enterprise deployment, best practice is to keep the number of Managed Nodes as low as possible, and to use Endpoints as much as possible. In some cases (such as for very old versions of Plus Modules) this might not be feasible, but the benefits of using Endpoints can often justify the cost of refactoring these older products into an Endpoint form.

Unlike Managed Nodes, multiple Endpoints on a single instance of an operating system are supported. This opens up many possibilities for high availability design. One design is to create an Endpoint to associate with a highly available resource group on a shared volume group, as shown in Figure 5-19.

Figure 5-19 Normal operation of highly available Endpoint


Under normal operation, cluster node tivaix1 runs the highly available Endpoint from the directory /opt/hativoli/lcf on the shared volume group. If tivaix1 becomes unavailable, the resource group falls over to tivaix2. The Endpoint continues to listen on the service IP address of tivaix1, but now runs on tivaix2 instead, as shown in Figure 5-20.

Figure 5-20 Fallover operation of highly available Endpoint

We recommend that you use this configuration to manage system resources that are specific to an HACMP resource group. Examples of ways that complementary IBM Tivoli products can leverage Endpoints in a highly available environment include:

- Monitor a file system in a resource group with IBM Tivoli Monitoring.

- Monitor a highly available database in a resource group with IBM Tivoli Monitoring for Databases.

- Inventory and distribute software used in a resource group with IBM Tivoli Configuration Manager.

- Enforce software license compliance of applications in a resource group with IBM Tivoli License Manager.

Specific IBM Tivoli products may have specific requirements that affect high availability planning and implementation. Consult your IBM service provider for assistance with planning and implementing other IBM Tivoli products on top of a highly available Endpoint.

Another possible design builds on top of a single highly available Endpoint. The highly available Endpoint is sufficient for managing the highly available resource group, but is limited in its ability to manage the cluster hardware. A local instance of an Endpoint can be installed to specifically manage compute resources associated with each cluster node.

For example, assume we use a cluster configured with a resource group for a highly available instance of IBM WebSphere Application Server. The environment uses IBM Tivoli Monitoring for Web Infrastructure to monitor the instance of IBM WebSphere Application Server in the resource group. This is managed through a highly available Endpoint that moves with the Web server’s resource group. It also needs to use IBM Tivoli Monitoring to continuously monitor available local disk space on each cluster node.

In one possible fallover scenario, the resource group moves from one cluster node to another while both the source and destination cluster nodes remain running. A highly available Endpoint can still manage the Web server, because both move with the resource group, but it can no longer manage hardware-based resources, because the underlying cluster node changes when the resource group moves.

Under this design, the normal operation of the cluster we used for this redbook is shown in Figure 5-21 on page 469.


Figure 5-21 Normal operation of local and highly available Endpoints

In normal operation then, three Endpoints are running. If the cluster moves the resource group containing the highly available Endpoint from tivaix1 to tivaix2, the state of the cluster would still leave three Endpoints, as shown in Figure 5-22 on page 470.


Figure 5-22 Cluster state after moving highly available Endpoint to tivaix2

However, if cluster node tivaix1 fell over to tivaix2 instead, it would leave only two Endpoint instances running, as shown in Figure 5-23 on page 471.


Figure 5-23 Cluster state after falling over tivaix1 to tivaix2

In each scenario in this alternate configuration, an Endpoint instance is always running on all cluster nodes that remain operational, even if HACMP on that cluster node is not running. As long as the system is powered up and the operating system functional, the local Endpoint remains to manage that system.

In this redbook we show how to install and configure a highly available Endpoint, then add a local Endpoint to the configuration. We use the same two-node cluster used throughout this document as the platform upon which we implement this configuration.

Endpoints require a Gateway in the Tivoli environment to log into so they can reach the Endpoint Manager. In our environment, we create a Gateway using the wcrtgate command, and verify the operation using the wlookup and wgateway commands as shown in Example 5-33 on page 472.


Example 5-33 Create a Gateway on tivaix1

[root@tivaix1:/home/root] wlookup -Lar Gateway
[root@tivaix1:/home/root] wcrtgate -h tivaix1 -n tivaix1-gateway
1369588498.1.680#TMF_Gateway::Gateway#
[root@tivaix1:/home/root] wlookup -Lar Gateway
tivaix1-gateway
[root@tivaix1:/home/root] wgateway tivaix1-gateway describe
Object           : 1369588498.1.680#TMF_Gateway::Gateway#
Protocols        : TCPIP
Hostname         : tivaix1
TCPIP Port       : 9494
Session Timeout  : 300
Debug level      : 0
Start Time       : 2003/12/22-18:53:05
Log Dir          : /opt/hativoli/spool/tivaix1.db
Log Size         : 1024000
RPC Threads      : 250
Max. Con. Jobs   : 200
Gwy Httpd        : Disabled
mcache_bwcontrol : Disabled

In Example 5-33, we create a Gateway named tivaix1-gateway on the Managed Node tivaix1. Best practice is to design and implement multiple sets of Gateways, each set geographically dispersed when possible, to ensure that Endpoints always have a Gateway to log into.

Gateways are closely related to repeaters. Sites that use IBM Tivoli Configuration Manager might want to consider using two parallel sets of Gateways to enable simultaneous use of inventory and software distribution operations, which require different bandwidth throttling characteristics. See Tivoli Enterprise Installation Guide Version 4.1, GC32-0804, for more information about how to design a robust Gateway architecture.

As long as at least one Gateway is created, all Endpoints in a Tivoli Enterprise installation can log into that Gateway. To install a highly available Endpoint:

1. Use the wlookup command to verify that the Endpoint does not already exist.

In our environment, no Endpoints have been created yet, so the command does not return any output, as shown in Example 5-34.

Example 5-34 Verify no Endpoints exist within a Tivoli Enterprise installation

[root@tivaix1:/home/root] wlookup -Lar Endpoint

[root@tivaix1:/home/root]


2. Use the winstlcf command as shown in Example 5-35 to install the Endpoint. Refer to Tivoli Management Framework Reference Manual Version 4.1, SC32-0806 for details about how to use the winstlcf command.

In our environment, we used the -d flag option to specify the installation destination of the Endpoint, the -g flag option to specify the Gateway we create, the -n flag option to specify the name of the Endpoint, the -v flag option for verbose output, and we use the IP hostname tivaix1_svc to bind the Endpoint to the IP service label of the cluster node.

Example 5-35 Install a highly available Endpoint on cluster node tivaix1

[root@tivaix1:/home/root] winstlcf -d /opt/hativoli/lcf -g tivaix1 -n hativoli \
-v tivaix1_svc

Trying tivaix1_svc...
password for root:

**********
sh -c '

echo "__START_HERE__"

uname -m || hostinfo | grep NeXT
uname -r || hostinfo | grep NeXT
uname -s || hostinfo | grep NeXT
uname -v || hostinfo | grep NeXT

cd /tmp
mkdir .tivoli.lcf.tmp.16552
cd .tivoli.lcf.tmp.16552
tar -xBf - > /dev/null || tar -xf -
tar -xBf tivaix1_svc-16552-lcf.tar generic/epinst.sh tivaix1_svc-16552-lcf.env > /dev/null || tar -xf tivaix1_svc-16552-lcf.tar generic/epinst.sh tivaix1_svc-16552-lcf.env
sh -x generic/epinst.sh tivaix1_svc-16552-lcf.env tivaix1_svc-16552-lcf.tar
cd ..
rm -rf .tivoli.lcf.tmp.16552
'
**********
AIX:2:5:0001813F4C00
locating files in /usr/local/Tivoli/bin/lcf_bundle.41000...
locating files in /usr/local/Tivoli/bin/lcf_bundle...

Ready to copy files to host tivaix1_svc:
       destination: tivaix1_svc:/opt/hativoli/lcf
            source: tivaix1:/usr/local/Tivoli/bin/lcf_bundle.41000
             files: generic/lcfd.sh
                    generic/epinst.sh
                    generic/as.sh
                    generic/lcf_env.sh
                    generic/lcf_env.csh
                    generic/lcf_env.cmd
                    generic/lcf.inv
                    bin/aix4-r1/mrt/lcfd
                    lib/aix4-r1/libatrc.a
                    lib/aix4-r1/libcpl272.a
                    lib/aix4-r1/libdes272.a
                    lib/aix4-r1/libmd2ep272.a
                    lib/aix4-r1/libmrt272.a
                    lib/aix4-r1/libtis272.a
                    lib/aix4-r1/libio.a
                    lib/aix4-r1/libtos.a
                    lib/aix4-r1/libtoslog.a
                    lib/aix4-r1/libtthred.a
            source: tivaix1:/usr/local/Tivoli/bin/lcf_bundle
             files: lib/aix4-r1/libmrt.a
                    lib/aix4-r1/libcpl.a
                    lib/aix4-r1/libdes.a

Continue? [yYna?]y
Tivoli Light Client Framework starting on tivaix1_svc
Dec 22 19:00:53 1 lcfd Command line argv[0]='/opt/hativoli/lcf/bin/aix4-r1/mrt/lcfd'
Dec 22 19:00:53 1 lcfd Command line argv[1]='-Dlcs.login_interfaces=tivaix1_svc'
Dec 22 19:00:53 1 lcfd Command line argv[2]='-n'
Dec 22 19:00:53 1 lcfd Command line argv[3]='hativoli'
Dec 22 19:00:53 1 lcfd Command line argv[4]='-Dlib_dir=/opt/hativoli/lcf/lib/aix4-r1'
Dec 22 19:00:53 1 lcfd Command line argv[5]='-Dload_dir=/opt/hativoli/lcf/bin/aix4-r1/mrt'
Dec 22 19:00:53 1 lcfd Command line argv[6]='-C/opt/hativoli/lcf/dat/1'
Dec 22 19:00:53 1 lcfd Command line argv[7]='-Dlcs.machine_name=tivaix1_svc'
Dec 22 19:00:53 1 lcfd Command line argv[8]='-Dlcs.login_interfaces=tivaix1'
Dec 22 19:00:53 1 lcfd Command line argv[9]='-n'
Dec 22 19:00:53 1 lcfd Command line argv[10]='hativoli'
Dec 22 19:00:53 1 lcfd Starting Unix daemon
Performing auto start configuration
Done.
+ set -a
+ WINSTENV=tivaix1_svc-16552-lcf.env
+ [ -z tivaix1_svc-16552-lcf.env ]
+ . ./tivaix1_svc-16552-lcf.env
+ INTERP=aix4-r1
+ LCFROOT=/opt/hativoli/lcf
+ NOAS=
+ ASYNCH=
+ DEBUG=
+ LCFOPTS= -Dlcs.login_interfaces=tivaix1_svc -n hativoli
+ NOTAR=


+ MULTIINSTALL=
+ BULK_COUNT=
+ BULK_PORT=
+ HOSTNAME=tivaix1_svc
+ VERBOSE=1
+ PRESERVE=
+ LANG=
+ LC_ALL=
+ LCFDVRMP=LCF41015
+ rm -f ./tivaix1_svc-16552-lcf.env
+ [ aix4-r1 != w32-ix86 -a aix4-r1 != w32-axp ]
+ umask 022
+ + pwd
stage=/tmp/.tivoli.lcf.tmp.16552
+ [ -n ]
+ [ aix4-r1 = w32-ix86 -o aix4-r1 = os2-ix86 -o aix4-r1 = w32-axp ]
+ [ -d /opt/hativoli/lcf/bin/aix4-r1 ]
+ [ ! -z ]
+ MKDIR_CMD=/bin/mkdir -p /opt/hativoli/lcf/dat
+ [ -d /opt/hativoli/lcf/dat ]
+ /bin/mkdir -p /opt/hativoli/lcf/dat
+ [ aix4-r1 != w32-ix86 -a aix4-r1 != w32-axp -a aix4-r1 != os2-ix86 ]
+ chmod 755 /opt/hativoli/lcf/dat
+ cd /opt/hativoli/lcf
+ [ aix4-r1 = os2-ix86 -a ! -d /tmp ]
+ [ -n ]
+ [ aix4-r1 = w32-ix86 -a -z ]
+ [ aix4-r1 = w32-axp -a -z ]
+ mv generic/lcf.inv bin/aix4-r1/mrt/LCF41015.SIG
+ PATH=/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/usr/java130/jre/bin:/usr/java130/bin:/opt/hativoli/lcf/generic
+ export PATH
+ [ -n ]
+ [ -n ]
+ K=1
+ fixup=1
+ [ 1 -gt 0 ]
+ unset fixup
+ [ -n ]
+ [ -n ]
+ [ -n ]
+ [ -n ]
+ [ -z ]
+ port=9494
+ [ aix4-r1 = w32-ix86 -o aix4-r1 = w32-axp ]
+ ET=/etc/Tivoli/lcf
+ + getNextDirName /opt/hativoli/lcf/dat /etc/Tivoli/lcf
uniq=1


+ LCF_DATDIR=/opt/hativoli/lcf/dat/1
+ [ aix4-r1 != openstep4-ix86 ]
+ mkdir -p dat/1
+ s=/opt/hativoli/lcf/dat/1/lcfd.sh
+ cp /opt/hativoli/lcf/generic/lcfd.sh /opt/hativoli/lcf/dat/1/lcfd.sh
+ sed -e s!@INTERP@!aix4-r1!g -e s!@LCFROOT@!/opt/hativoli/lcf!g -e s!@LCF_DATDIR@!/opt/hativoli/lcf/dat/1!g
+ 0< /opt/hativoli/lcf/dat/1/lcfd.sh 1> t
+ mv t /opt/hativoli/lcf/dat/1/lcfd.sh
+ [ aix4-r1 != w32-ix86 -a aix4-r1 != w32-axp -a aix4-r1 != os2-ix86 ]
+ chmod 755 /opt/hativoli/lcf/dat/1/lcfd.sh
+ chmod 755 /opt/hativoli/lcf/bin/aix4-r1/mrt/lcfd
+ chmod 755 /opt/hativoli/lcf/lib/aix4-r1/libatrc.a /opt/hativoli/lcf/lib/aix4-r1/libcpl.a /opt/hativoli/lcf/lib/aix4-r1/libcpl272.a /opt/hativoli/lcf/lib/aix4-r1/libdes.a /opt/hativoli/lcf/lib/aix4-r1/libdes272.a /opt/hativoli/lcf/lib/aix4-r1/libio.a /opt/hativoli/lcf/lib/aix4-r1/libmd2ep272.a /opt/hativoli/lcf/lib/aix4-r1/libmrt.a /opt/hativoli/lcf/lib/aix4-r1/libmrt272.a /opt/hativoli/lcf/lib/aix4-r1/libtis272.a /opt/hativoli/lcf/lib/aix4-r1/libtos.a /opt/hativoli/lcf/lib/aix4-r1/libtoslog.a /opt/hativoli/lcf/lib/aix4-r1/libtthred.a
+ s=/opt/hativoli/lcf/generic/lcf_env.sh
+ [ -f /opt/hativoli/lcf/generic/lcf_env.sh ]
+ sed -e s!@LCFROOT@!/opt/hativoli/lcf!g
+ 0< /opt/hativoli/lcf/generic/lcf_env.sh 1> t
+ mv t /opt/hativoli/lcf/generic/lcf_env.sh
+ label=tivaix1_svc
+ [ 1 -ne 1 ]
+ [ aix4-r1 = w32-ix86 -o aix4-r1 = w32-axp ]
+ [ -n ]
+ /opt/hativoli/lcf/dat/1/lcfd.sh install -C/opt/hativoli/lcf/dat/1 -Dlcs.machine_name=tivaix1_svc -Dlcs.login_interfaces=tivaix1 -n hativoli
+ + expr 1 - 1
K=0
+ [ 0 -gt 0 ]
+ set +e
+ ET=/etc/Tivoli/lcf/1
+ [ aix4-r1 = w32-ix86 -o aix4-r1 = w32-axp ]
+ [ aix4-r1 != openstep4-ix86 ]
+ [ ! -d /etc/Tivoli/lcf/1 ]
+ mkdir -p /etc/Tivoli/lcf/1
+ mv /opt/hativoli/lcf/generic/lcf_env.sh /etc/Tivoli/lcf/1/lcf_env.sh
+ sed -e s!@INTERP@!aix4-r1!g -e s!@LCFROOT@!/opt/hativoli/lcf!g -e s!@LCF_DATDIR@!/opt/hativoli/lcf/dat/1!g
+ 0< /etc/Tivoli/lcf/1/lcf_env.sh 1> /etc/Tivoli/lcf/1/lcf_env.sh.12142
+ mv /etc/Tivoli/lcf/1/lcf_env.sh.12142 /etc/Tivoli/lcf/1/lcf_env.sh
+ [ aix4-r1 = w32-ix86 -o aix4-r1 = w32-axp -o aix4-r1 = os2-ix86 ]
+ mv /opt/hativoli/lcf/generic/lcf_env.csh /etc/Tivoli/lcf/1/lcf_env.csh
+ sed -e s!@INTERP@!aix4-r1!g -e s!@LCFROOT@!/opt/hativoli/lcf!g -e s!@LCF_DATDIR@!/opt/hativoli/lcf/dat/1!g
+ 0< /etc/Tivoli/lcf/1/lcf_env.csh 1> /etc/Tivoli/lcf/1/lcf_env.csh.12142


+ mv /etc/Tivoli/lcf/1/lcf_env.csh.12142 /etc/Tivoli/lcf/1/lcf_env.csh
+ cp /etc/Tivoli/lcf/1/lcf_env.csh /etc/Tivoli/lcf/1/lcf_env.sh /opt/hativoli/lcf/dat/1
+ [ aix4-r1 = os2-ix86 ]
+ [ -z ]
+ sh /opt/hativoli/lcf/generic/as.sh 1
+ echo 1
+ 1> /etc/Tivoli/lcf/.instance
+ echo Done.

3. Use the wlookup and wep commands as shown in Example 5-36 to verify the installation of the highly available Endpoint.

Example 5-36 Verify installation of highly available Endpoint

[root@tivaix1:/home/root] wlookup -Lar Endpoint
hativoli
[root@tivaix1:/home/root] wep ls
G      1369588498.1.680                            tivaix1-gateway
       1369588498.2.522+#TMF_Endpoint::Endpoint#   hativoli
[root@tivaix1:/home/root] wep hativoli

                 object   1369588498.2.522+#TMF_Endpoint::Endpoint#
                  label   hativoli
                version   41014
                     id   0001813F4C00
                gateway   1369588498.1.680#TMF_Gateway::Gateway#
           pref_gateway   1369588498.1.680#TMF_Gateway::Gateway#
                netload   OBJECT_NIL
                 interp   aix4-r1
             login_mode   desktop, constant
               protocol   TCPIP
                address   192.168.100.101+9495
                 policy   OBJECT_NIL
                  httpd   tivoli:r)T!*`un
                  alias   OBJECT_NIL
             crypt_mode   NONE
           upgrade_mode   enable
        last_login_time   2003/12/22-19:00:54
    last_migration_time   2003/12/22-19:00:54
       last_method_time   NOT_YET_SET

4. If this is the first time an Endpoint is installed on the system, the Lightweight Client Framework (LCF) environment file is installed in the /etc/Tivoli/lcf/1 directory, as shown in Example 5-37 on page 478. The directory with the highest number in the /etc/Tivoli/lcf directory is the latest installed environment files directory. Identify this directory and record it.


Example 5-37 Identify directory location of LCF environment file

[root@tivaix1:/home/root] ls /etc/Tivoli/lcf/1
./            ../           lcf_env.csh   lcf_env.sh

If you are unsure of which directory contains the appropriate environment files, use the grep command as shown in Example 5-38 to identify which instance of an Endpoint an LCF environment file is used for.

Example 5-38 Identify which instance of an Endpoint an LCF environment file is used for

[root@tivaix1:/home/root] grep LCFROOT= /etc/Tivoli/lcf/1/lcf_env.sh
LCFROOT="/opt/hativoli/lcf"

5. Stop the new Endpoint to prepare it for HACMP to start and stop it. Use the ps and grep commands to identify the running instances of Endpoints, source in the LCF environment, use the lcfd.sh command to stop the Endpoint (the environment that is sourced in identifies the instance of the Endpoint that is stopped), and use the ps and grep commands to verify that the Endpoint is stopped, as shown in Example 5-39.

Example 5-39 Stop an instance of an Endpoint

[root@tivaix1:/home/root] ps -ef | grep lcf | grep -v grep
    root 21520     1   0   Dec 22      -  0:00 /opt/hativoli/bin/aix4-r1/mrt/lcfd
-Dlcs.login_interfaces=tivaix1_svc -n hativoli -Dlib_dir=/opt/hativoli/lib/aix4-r1
-Dload_dir=/opt/hativoli/bin/aix4-r1/mrt -C/opt/hativoli/dat/1
-Dlcs.machine_name=tivaix1_svc -Dlcs.login_interfaces=tivaix1 -n hativoli
[root@tivaix1:/home/root] . /etc/Tivoli/lcf/1/lcf_env.sh
[root@tivaix1:/home/root] lcfd.sh stop
[root@tivaix1:/home/root] ps -ef | grep lcf | grep -v grep

Disable automatic start

Disable the automatic start of any highly available Tivoli server, Managed Node, or Endpoint so that, instead of starting as soon as the system restarts, it starts under the control of HACMP.

Tip: Best practice is to delete all unused instances of the LCF environment directories. This eliminates the potential for misleading configurations.

Important: Ensure the instance of a highly available Endpoint is the same on all cluster nodes that the Endpoint can fall over to. This enables scripts to be the same on every cluster node.


Endpoint installations configure the Endpoint to start every time the system restarts. High availability implementations need to start and stop highly available Endpoints after HACMP is running, so the automatic start after system restart needs to be disabled. Determine how an Endpoint starts on your platform after a system restart and disable it.

In our environment, the highly available Endpoint is installed on an IBM AIX system. Under IBM AIX, the file /etc/rc.tmaN starts an Endpoint, where N is the number of the Endpoint instance that is installed. Example 5-40 shows the content of this file for instance 1. We remove the file to disable automatic start after system restart.

Example 5-40 Identify how an Endpoint starts during system restart

[root@tivaix1:/etc] cat /etc/rc.tma1
#!/bin/sh
#
# Start the Tivoli Management Agent
#
if [ -f /opt/hativoli/dat/1/lcfd.sh ]; then
        /opt/hativoli/dat/1/lcfd.sh start
fi
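In our environment, disabling this start amounts to removing the file (or renaming it, if you prefer to keep a copy for reference). A sketch:

# Disable automatic start of the highly available Endpoint instance 1 on AIX
mv /etc/rc.tma1 /etc/rc.tma1.disabled
# If anything else in /etc references rc.tma1 (for example, an /etc/inittab entry),
# remove that reference too; the find technique shown in Example 5-41 will reveal it.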

The oserv.rc program starts Tivoli servers and Managed Nodes. In our environment, the highly available Tivoli server is installed on an IBM AIX system. We use the find command as shown in Example 5-41 to identify the files in the /etc directory that are used to start the object dispatcher. The files are /etc/inittab and /etc/inetd.conf. We remove the lines found by the find command to disable the automatic start mechanism.

Example 5-41 Find all instances where IBM Tivoli Management Framework is started

[root@tivaix1:/etc] find /etc -type f -exec grep 'oserv.rc' {} \; -print
oserv:2:once:/etc/Tivoli/oserv.rc start > /dev/null 2>&1
/etc/inittab
objcall dgram udp wait root /etc/Tivoli/oserv.rc /etc/Tivoli/oserv.rc inetd
/etc/inetd.conf

You can use the same find command to determine how the object dispatcher starts on your platform. Use the following find command to search for instances of the string “lcfd.sh” in the files in the /etc directory if you need to identify the files where the command is used to start an Endpoint:

find /etc -type f -exec grep 'lcfd.sh' {} \; -print

Note that the line containing the search string appears first, followed by the file location.
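On AIX, the two entries found in Example 5-41 can be removed with standard tools. The following sketch shows one way, assuming the identifiers shown in that output (back up both files before changing them):

rmitab oserv                                     # remove the oserv entry from /etc/inittab
sed '/^objcall/s/^/#/' /etc/inetd.conf > /etc/inetd.conf.new && \
    mv /etc/inetd.conf.new /etc/inetd.conf       # comment out the objcall line
refresh -s inetd                                 # tell inetd to reread its configuration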


5.1.8 Configure HACMP

After verifying that the installation of IBM Tivoli Management Framework that you want to make highly available (whether it is a Tivoli server, Managed Node, or Endpoint) functions correctly, install and configure HACMP on the system. If IBM Tivoli Management Framework subsequently fails to start or function properly, it is then highly likely that the cause is an HACMP issue rather than an IBM Tivoli Management Framework issue.

In this section we show how to install and configure HACMP for an IBM Tivoli Management Framework Tivoli server.

Install HACMP

Complete the procedures in “Install HACMP” on page 113.

Configure HACMP topology

Complete the procedures in “Configure HACMP topology” on page 219 to define the cluster topology.

Configure service IP labels/addresses

Complete the procedures in “Configure HACMP service IP labels/addresses” on page 221 to configure service IP labels and addresses.

Configure application servers

An application server is a cluster resource used to control an application that must be kept highly available. Configuring an application server does the following:

- It associates a meaningful name with the server application. For example, you could give an installation of IBM Tivoli Management Framework a name such as itmf. You then use this name to refer to the application server when you define it as a resource.

- It points the cluster event scripts to the scripts that they call to start and stop the server application.

Restriction: These procedures are mutually exclusive with the instructions given in Chapter 4, “IBM Tivoli Workload Scheduler implementation in a cluster” on page 183.

While some steps are the same, you can implement either the scenario given in that chapter or the one in this chapter, but you cannot implement both at the same time.


- It allows you to then configure application monitoring for that application server.

We show in “Add custom HACMP start and stop scripts” on page 489 how to write the start and stop scripts for IBM Tivoli Management Framework.

Complete the following steps to create an application server on any cluster node:

1. Enter: smitty hacmp.

2. Go to Initialization and Standard Configuration -> Configure Resources to Make Highly Available -> Configure Application Servers and press Enter. The Configure Resources to Make Highly Available SMIT screen is displayed as shown in Figure 5-24.

Figure 5-24 Configure Resources to Make Highly Available SMIT screen

Note: Ensure that the server start and stop scripts exist on all nodes that participate as possible owners of the resource group where this application server resides.

Configure Resources to Make Highly Available

Move cursor to desired item and press Enter.

    Configure Service IP Labels/Addresses
    Configure Application Servers
    Configure Volume Groups, Logical Volumes and Filesystems
    Configure Concurrent Volume Groups and Logical Volumes

F1=Help             F2=Refresh          F3=Cancel           F8=Image
F9=Shell            F10=Exit            Enter=Do


3. Go to Configure Application Servers and press Enter. The Configure Application Servers SMIT screen is displayed as shown in Figure 5-25.

Figure 5-25 Configure Application Servers SMIT screen

4. Go to Add an Application Server and press Enter. The Add Application Server SMIT screen is displayed as shown in Figure 5-26 on page 483. Enter field values as follows:

Server Name
Enter an ASCII text string that identifies the server. You will use this name to refer to the application server when you define resources during node configuration. The server name can include alphabetic and numeric characters and underscores. Use no more than 64 characters.

Start Script
Enter the name of the script and its full pathname (followed by arguments) called by the cluster event scripts to start the application server. (Maximum 256 characters.) This script must be in the same location on each cluster node that might start the server. The contents of the script, however, may differ.

Configure Application Servers

Move cursor to desired item and press Enter.

    Add an Application Server
    Change/Show an Application Server
    Remove an Application Server

F1=Help             F2=Refresh          F3=Cancel           F8=Image
F9=Shell            F10=Exit            Enter=Do


Stop Script
Enter the full pathname of the script called by the cluster event scripts to stop the server. (Maximum 256 characters.) This script must be in the same location on each cluster node that may start the server. The contents of the script, however, may differ.

Figure 5-26 Fill out the Add Application Server SMIT screen for application server itmf

As shown in Figure 5-26, in our environment on tivaix1 we named the instance of IBM Tivoli Management Framework that normally runs on that cluster node “itmf” (for IBM Tivoli Management Framework). Note that no mention is made of the cluster nodes when defining an application server. We only mention the cluster node so you are familiar with the conventions we use in our environment.

For the start script of application server itmf, we enter the following in the Start Script field:

/usr/es/sbin/cluster/utils/start_itmf.sh

The stop script of this application server is:

/usr/es/sbin/cluster/utils/stop_itmf.sh

Add Application Server

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Server Name                                        [itmf]
* Start Script                                       [/usr/es/sbin/cluster/>
* Stop Script                                        [/usr/es/sbin/cluster/>

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do


This is entered in the Stop Script field.

5. Press Enter to add this information to the ODM on the local node.

6. Repeat the procedure for all additional application servers.

For our environment, there are no further application servers to configure.

Configure application monitoring

HACMP can monitor specified applications and automatically take action to restart them upon detecting process death or other application failures.

You can select either of two application monitoring methods:

- Process application monitoring detects the death of one or more processes of an application, using RSCT Event Management.

- Custom application monitoring checks the health of an application with a custom monitor method at user-specified polling intervals.

Process monitoring is easier to set up, as it uses the built-in monitoring capability provided by RSCT and requires no custom scripts. However, process monitoring may not be an appropriate option for all applications. Custom monitoring can monitor more subtle aspects of an application’s performance and is more customizable, but it takes more planning, as you must create the custom scripts.

We show you in this section how to configure process monitoring for IBM Tivoli Management Framework. Remember that an application must be defined to an application server before you set up the monitor.

For IBM Tivoli Management Framework, we configure process monitoring for the oserv process because it will always run under normal conditions. If it fails, we want the cluster to automatically fall over, and not attempt to restart oserv. Because oserv starts very quickly, we only give it 60 seconds to start before monitoring begins. For cleanup and restart scripts, we will use the same scripts as the start and stop scripts discussed in “Add custom HACMP start and stop scripts” on page 489.

Note: If a monitored application is under control of the system resource controller, check to be certain that its action:multi values are -O and -Q. The -O specifies that the subsystem is not restarted if it stops abnormally. The -Q specifies that multiple instances of the subsystem are not allowed to run at the same time. These values can be checked using the following command:

lssrc -Ss Subsystem | cut -d : -f 10,11

If the values are not -O and -Q, then they must be changed using the chssys command.
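For example, if the values need to be changed, the chssys flags mirror the values being checked; mysubsys is a placeholder for the actual subsystem name:

chssys -s mysubsys -O -Q                 # do not restart on abnormal stop, allow only one instance
lssrc -Ss mysubsys | cut -d : -f 10,11   # should now show -O and -Q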


Set up your process application monitor as follows:

1. Enter: smit hacmp.

2. Go to Extended Configuration -> Extended Resource Configuration -> Extended Resources Configuration -> Configure HACMP Application Monitoring -> Configure Process Application Monitor -> Add Process Application Monitor and press Enter. A list of previously defined application servers appears.

3. Select the application server for which you want to add a process monitor.

In our environment, we selected itmf, as shown in Figure 5-27.

Figure 5-27 Select an application server to monitor

4. In the Add Process Application Monitor screen, fill in the field values as follows:

Monitor Name

This is the name of the application monitor. If this monitor is associated with an application server, the monitor has the same name as the application server. This field is informational only and cannot be edited.

Application Server Name

(This field can be chosen from the picklist. It is already filled with the name of the application server you selected.)

Processes to Monitor

Specify the process(es) to monitor. You can type more than one process name. Use spaces to separate the names.

  +--------------------------------------------------------------------------+
  ¦                      Application Server to Monitor                        ¦
  ¦                                                                            ¦
  ¦ Move cursor to desired item and press Enter.                               ¦
  ¦                                                                            ¦
  ¦   itmf                                                                     ¦
  ¦                                                                            ¦
  ¦ F1=Help                 F2=Refresh              F3=Cancel                  ¦
  ¦ F8=Image                F10=Exit                Enter=Do                   ¦
  ¦ /=Find                  n=Find Next                                        ¦
  +--------------------------------------------------------------------------+


Process Owner

Specify the user id of the owner of the processes specified above, for example, root. Note that the process owner must own all processes to be monitored.

Instance Count

Specify how many instances of the application to monitor. The default is 1 instance. The number of instances must match the number of processes to monitor exactly. If you put 1 instance, and another instance of the application starts, you will receive an application monitor error.

Stabilization Interval

Specify the time (in seconds) to wait before beginning monitoring. For instance, with a database application, you may wish to delay monitoring until after the start script and initial database search have been completed. You may need to experiment with this value to balance performance with reliability.

Restart Count

Specify the restart count, that is the number of times to attempt to restart the application before taking any other actions. The default is 3.

Note: To be sure you are using correct process names, use the names as they appear from the ps -el command (not ps -f), as explained in the section “Identifying Correct Process Names” in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862.

Note: This number must be 1 if you have specified more than one process to monitor (one instance for each process).

Note: In most circumstances, this value should not be zero.

Note: Make sure you enter a Restart Method if your Restart Count is any non-zero value.


Restart Interval

Specify the interval (in seconds) that the application must remain stable before resetting the restart count. Do not set this to be shorter than (Restart Count) x (Stabilization Interval). The default is 10% longer than that value. If the restart interval is too short, the restart count will be reset too soon and the desired fallover or notify action may not occur when it should.

Action on Application Failure

Specify the action to be taken if the application cannot be restarted within the restart count. You can keep the default choice notify, which runs an event to inform the cluster of the failure, or select fallover, in which case the resource group containing the failed application moves over to the cluster node with the next highest priority for that resource group.

Refer to “Note on the Fallover Option and Resource Group Availability” in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862 for more information.

Notify Method

(Optional) Define a notify method that will run when the application fails. This custom method runs during the restart process and during notify activity.

Cleanup Method

(Optional) Specify an application cleanup script to be invoked when a failed application is detected, before invoking the restart method. The default is the application server stop script defined when the application server was set up.

Restart Method

(Required if Restart Count is not zero.) The default restart method is the application server start script defined previously, when the application server was set up. You can specify a different method here if desired.

Note: With application monitoring, since the application is already stopped when this script is called, the server stop script may fail.

In our environment, we entered the process /usr/hativoli/bin/aix4-r1/bin/oserv in the Process to Monitor field, root in the Process Owner field, 60 in the Stabilization Interval field, 0 in the Restart Count field, and fallover in the Action on Application Failure field; all other fields were left as is, as shown in Figure 5-28.

Figure 5-28 Add Process Application Monitor SMIT screen for application server itmf

5. Press Enter after you have entered the desired information.

The values are then checked for consistency and entered into the ODM. When the resource group comes online, the application monitor starts.

In the environment we use for this redbook, the COMMAND STATUS SMIT screen displays two warnings, as shown in Figure 5-29 on page 489. We safely ignore these warnings because the default values that are applied are the desired values.

Add Process Application Monitor

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Monitor Name                                       itmf
* Application Server Name                            itmf                    +
* Processes to Monitor                               [/usr/hativoli/bin/aix>
* Process Owner                                      [root]
  Instance Count                                     []                      #
* Stabilization Interval                             [60]                    #
* Restart Count                                      [0]                     #
  Restart Interval                                   []                      #
* Action on Application Failure                      [fallover]              +
  Notify Method                                      []
  Cleanup Method                                     [/usr/es/sbin/cluster/>
  Restart Method                                     [/usr/es/sbin/cluster/>

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do


Figure 5-29 COMMAND STATUS SMIT screen after creating HACMP process application monitor

6. Repeat the operation for remaining application servers.

In the environment we use for this redbook, there are no other IBM Tivoli Management Framework application servers to configure.

You can create similar application monitors for a highly available Endpoint.

Add custom HACMP start and stop scripts

For IBM Tivoli Management Framework, custom scripts for HACMP are required to start and stop the application server (in this case, the object dispatcher for Managed Nodes, or the lightweight client framework for Endpoints). These are used when HACMP starts an application server that is part of a resource group, and gracefully shuts down the application server when a resource group is taken offline or moved. The stop script of course does not get an opportunity to execute if a cluster node is unexpectedly halted. We developed the following basic versions of the scripts for our environment. You may need to write your own version to accommodate your site’s specific requirements.

COMMAND STATUS

Command: OK            stdout: yes           stderr: no

Before command completion, additional instructions may appear below.

claddappmon warning: The parameter "INSTANCE_COUNT" was not specified. Will use 1.
claddappmon warning: The parameter "RESTART_INTERVAL" was not specified. Will use 0.

F1=Help             F2=Refresh          F3=Cancel           F6=Command
F8=Image            F9=Shell            F10=Exit            /=Find
n=Find Next


The following example shows a start script for a highly available object dispatcher (Managed Node or Tivoli server).

Example 5-42 Script to start highly available IBM Tivoli Management Framework

#!/bin/sh

#
# Start IBM Tivoli Management Framework
if [ -f /etc/Tivoli/setup_env.sh ] ; then
    . /etc/Tivoli/setup_env.sh
    /etc/Tivoli/oserv.rc start
else
    exit 1
fi

The following example shows a stop script for a highly available object dispatcher.

Example 5-43 Script to stop highly available IBM Tivoli Management Framework

#!/bin/sh

#
# Shut down IBM Tivoli Management Framework
odadmin shutdown 1

The following example shows a start script for a highly available Endpoint.

Example 5-44 Start script for highly available Endpoint

#!/bin/sh

#
# Starts the highly available Endpoint
if [ -f /etc/Tivoli/lcf/1/lcf_env.sh ] ; then
    . /etc/Tivoli/lcf/1/lcf_env.sh
    lcfd.sh start
else
    exit 1
fi

The stop script for a highly available Endpoint is similar, except that it passes the argument “stop” in the call to lcfd.sh, as shown in the following example.


Example 5-45 Stop script for highly available Endpoint

#!/bin/sh

#
# Stops the highly available Endpoint
if [ -f /etc/Tivoli/lcf/1/lcf_env.sh ] ; then
    . /etc/Tivoli/lcf/1/lcf_env.sh
    lcfd.sh stop
else
    exit 1
fi

If you want to implement a highly available object dispatcher and Endpoint in the same resource group, merge the corresponding start and stop scripts into a single script.
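For example, a merged start script might look like the following sketch, which simply chains the bodies of Example 5-42 and Example 5-44 (a corresponding merged stop script would chain Example 5-43 and Example 5-45):

#!/bin/sh
#
# Start both the object dispatcher and the highly available Endpoint
if [ -f /etc/Tivoli/setup_env.sh ] ; then
    . /etc/Tivoli/setup_env.sh
    /etc/Tivoli/oserv.rc start
else
    exit 1
fi

if [ -f /etc/Tivoli/lcf/1/lcf_env.sh ] ; then
    . /etc/Tivoli/lcf/1/lcf_env.sh
    lcfd.sh start
else
    exit 1
fi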

The configuration we show in this redbook is for a hot standby cluster, so using the same start and stop scripts on all cluster nodes is sufficient. Mutual takeover configurations will need to use more sophisticated scripts that determine the state of the cluster and start (or stop) the appropriate instances of object dispatchers and Endpoints.

Modify /etc/hosts and the name resolution order

Complete the procedures in “Modify /etc/hosts and name resolution order” on page 455 to modify /etc/hosts and the name resolution order on both tivaix1 and tivaix2.

Configure HACMP networks and heartbeat paths

Complete the procedures in “Configure HACMP networks and heartbeat paths” on page 254 to configure HACMP networks and heartbeat paths.

Configure HACMP resource group

This creates a container to organize HACMP resources into logical groups that are defined later. Refer to High Availability Cluster Multi-Processing for AIX Concepts and Facilities Guide Version 5.1, SC23-4864 for an overview of the types of resource groups you can configure in HACMP 5.1. Refer to the chapter on planning resource groups in High Availability Cluster Multi-Processing for AIX, Planning and Installation Guide Version 5.1, SC23-4861-00 for further planning information. You should have your planning worksheets in hand.

Using the standard path, you can configure resource groups that use the basic management policies. These policies are based on the three predefined types of startup, fallover, and fallback policies: cascading, rotating, and concurrent. In addition to these, you can also configure custom resource groups for which you can specify slightly more refined types of startup, fallover, and fallback policies.

Once the resource groups are configured, if it seems necessary for handling certain applications, you can use the Extended Configuration path to change or refine the management policies of particular resource groups (especially custom resource groups).

Configuring a resource group involves two phases:

- Configuring the resource group name, management policy, and the nodes that can own it.

- Adding the resources and additional attributes to the resource group.

Refer to your planning worksheets as you name the groups and add the resources to each one.

To create a resource group:

1. Enter smit hacmp.

2. On the HACMP menu, select Initialization and Standard Configuration > Configure HACMP Resource Groups > Add a Standard Resource Group and press Enter.

You are prompted to select a resource group management policy.

3. Select Cascading, Rotating, Concurrent or Custom and press Enter.

For our environment we use Cascading.

Depending on the previous selection, you will see a screen titled Add a Cascading | Rotating | Concurrent | Custom Resource Group. The screen will only show options relevant to the type of the resource group you selected. If you select custom, you will be asked to refine the startup, fallover, and fallback policy before continuing.

4. Enter the field values as follows for a cascading, rotating, or concurrent resource group:

Resource Group Name

Enter the desired name. Use no more than 32 alphanumeric characters or underscores; do not use a leading numeric. Do not use reserved words. See Chapter 6, section “List of Reserved Words” in High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862. Duplicate entries are not allowed.

Participating Node Names


Enter the names of the nodes that can own or take over this resource group. Enter the node with the highest priority for ownership first, followed by the nodes with the lower priorities, in the desired order. Leave a space between node names. For example, NodeA NodeB NodeX.

If you choose to define a custom resource group, you define additional fields. We do not use custom resource groups in this redbook, for simplicity of presentation.

Figure 5-30 shows how we configured resource group itmf_rg in the environment implemented by this redbook. We use this resource group to contain the instance of IBM Tivoli Management Framework normally running on tivaix1.

Figure 5-30 Configure resource group itmf_rg

      Add a Resource Group with a Cascading Management Policy (standard)

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
* Resource Group Name                                [itmf_rg]
* Participating Node Names / Default Node Priority   [tivaix1 tivaix2]      +

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do

Configure resources in the resource groups

Once you have defined a resource group, you assign resources to it. SMIT can list possible shared resources for the node if the node is powered on (helping you to avoid configuration errors).


When you are adding or changing resources in a resource group, HACMP displays only valid choices for resources, based on the resource group management policies that you have selected.

To assign the resources for a resource group:

1. Enter: smit hacmp.

2. Go to Initialization and Standard Configuration -> Configure HACMP Resource Groups -> Change/Show Resources for a Standard Resource Group and press Enter to display a list of defined resource groups.

3. Select the resource group you want to configure and press Enter. SMIT returns the screen that matches the type of resource group you selected, with the Resource Group Name, and Participating Node Names (Default Node Priority) fields filled in.

If the participating nodes are powered on, you can press F4 to list the shared resources. If a resource group/node relationship has not been defined, or if a node is not powered on, F4 displays the appropriate warnings.

4. Enter the field values as follows:

Service IP Label/IP Addresses

(Not an option for concurrent or custom concurrent-like resource groups.) List the service IP labels to be taken over when this resource group is taken over. Press F4 to see a list of valid IP labels. These include addresses which rotate or may be taken over.

Filesystems (empty is All for specified VGs)

(Not an option for concurrent or custom concurrent-like resource groups.) If you leave the Filesystems (empty is All for specified VGs) field blank and specify the shared volume groups in the Volume Groups field below, all file systems will be mounted in the volume group. If you leave the Filesystems field blank and do not specify the volume groups in the field below, no file systems will be mounted.

You may also select individual file systems to include in the resource group. Press F4 to see a list of the filesystems. In this case only the specified file systems will be mounted when the resource group is brought online.

Filesystems (empty is All for specified VGs) is a valid option only for non-concurrent resource groups.

Note: SMIT displays only valid choices for resources, depending on the type of resource group that you selected. The fields are slightly different for custom, non-concurrent, and concurrent groups.

Volume Groups

(If you are adding resources to a non-concurrent resource group.) Identify the shared volume groups that should be varied on when this resource group is acquired or taken over. Select the volume groups from the picklist or enter the desired volume group names in this field.

Pressing F4 will give you a list of all shared volume groups in the resource group and the volume groups that are currently available for import onto the resource group nodes.

Specify the shared volume groups in this field if you want to leave the field Filesystems (empty is All for specified VGs) blank and to mount all filesystems in the volume group. If you specify more than one volume group in this field, then all filesystems in all specified volume groups will be mounted; you cannot choose to mount all filesystems in one volume group and not to mount them in another.

For example, in a resource group with two volume groups, vg1 and vg2, if the field Filesystems (empty is All for specified VGs) is left blank, then all the file systems in vg1 and vg2 will be mounted when the resource group is brought up. However, if the field Filesystems (empty is All for specified VGs) has only file systems that are part of the vg1 volume group, then none of the file systems in vg2 will be mounted, because they were not entered in the Filesystems (empty is All for specified VGs) field along with the file systems from vg1.

If you have previously entered values in the Filesystems field, the appropriate volume groups are already known to the HACMP software.

Concurrent Volume Groups

(Appears only if you are adding resources to a concurrent or custom concurrent-like resource group.) Identify the shared volume groups that can be accessed simultaneously by multiple nodes. Select the volume groups from the picklist, or enter the desired volume group names in this field.

If you previously requested that HACMP collect information about the appropriate volume groups, then pressing F4 will give you a list of all existing concurrent capable volume groups that are currently available in the resource group, and concurrent capable volume groups available to be imported onto the nodes in the resource group.

Disk fencing is turned on by default.

Application Servers

Indicate the application servers to include in the resource group. Press F4 to see a list of application servers.

In our environment, we defined resource group itmf_rg as shown in Figure 5-31 on page 497.

Note: If you are configuring a custom resource group, and choose to use a dynamic node priority policy for a cascading-type custom resource group, you will see the field where you can select which one of the three predefined node priority policies you want to use.


Figure 5-31 Define resource group itmf_rg

              Change/Show Resources for a Cascading Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                        [Entry Fields]
  Resource Group Name                                 itmf_rg
  Participating Node Names (Default Node Priority)    tivaix1 tivaix2

* Service IP Labels/Addresses                         [tivaix1_svc]         +
  Volume Groups                                       [itmf_vg]             +
  Filesystems (empty is ALL for VGs specified)        []                    +
  Application Servers                                 [itmf]                +

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do

For resource group itmf_rg, we assign tivaix1_svc as the service IP label, itmf_vg as the sole volume group to use, and itmf for the application server.

5. Press Enter to add the values to the HACMP ODM.

6. Repeat the operation for other resource groups to configure.

In our environment, we did not have any further resource groups to configure.

Configure cascading without fallback, other attributesWe configure all resource groups in our environment for cascading without fallback (CWOF) so IBM Tivoli Management Framework can be given enough time to quiesce before falling back. This is part of the extended resource group configuration.

We use this step to also configure other attributes of the resource groups like the associated shared volume group and filesystems.

To configure CWOF and other resource group attributes:

1. Enter: smit hacmp.



2. Go to Extended Configuration -> Extended Resource Configuration -> Extended Resource Group Configuration -> Change/Show Resources and Attributes for a Resource Group and press Enter.

SMIT displays a list of defined resource groups.

3. Select the resource group you want to configure and press Enter. SMIT returns the screen that matches the type of resource group you selected, with the Resource Group Name, Inter-site Management Policy, and Participating Node Names (Default Node Priority) fields filled in.

If the participating nodes are powered on, you can press F4 to list the shared resources. If a resource group/node relationship has not been defined, or if a node is not powered on, F4 displays the appropriate warnings.

4. Enter true in the Cascading Without Fallback Enabled field by pressing Tab in the field until the value is displayed (Figure 5-32).

Figure 5-32 Set cascading without fallback (CWOF) for a resource group

     Change/Show All Resources and Attributes for a Cascading Resource Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Resource Group Name                                 itmf_rg
  Resource Group Management Policy                    cascading
  Inter-site Management Policy                        ignore
  Participating Node Names / Default Node Priority    tivaix1 tivaix2
  Dynamic Node Priority (Overrides default)           []                    +
  Inactive Takeover Applied                           false                 +
  Cascading Without Fallback Enabled                  true                  +

  Application Servers                                 [itmf]                +
  Service IP Labels/Addresses                         [tivaix1_svc]         +

  Volume Groups                                       [itmf_vg]             +
  Use forced varyon of volume groups, if necessary    false                 +
  Automatically Import Volume Groups                  false                 +
[MORE...19]

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do

5. Repeat the operation for any other applicable resource groups.

In our environment, itmf_rg is the only resource group, so no other resource groups required this change.



For the environment in this redbook, all resources and attributes for resource group itmf_rg are shown in Example 5-46.

Example 5-46 All resources and attributes for resource group itmf_rg

[TOP]                                                   [Entry Fields]
  Resource Group Name                                 itmf_rg
  Resource Group Management Policy                    cascading
  Inter-site Management Policy                        ignore
  Participating Node Names / Default Node Priority    tivaix1 tivaix2
  Dynamic Node Priority (Overrides default)           []                    +
  Inactive Takeover Applied                           false                 +
  Cascading Without Fallback Enabled                  true                  +

  Application Servers                                 [itmf]                +
  Service IP Labels/Addresses                         [tivaix1_svc]         +

  Volume Groups                                       [itmf_vg]             +
  Use forced varyon of volume groups, if necessary    false                 +
  Automatically Import Volume Groups                  false                 +

  Filesystems (empty is ALL for VGs specified)        [/usr/local/itmf]     +
  Filesystems Consistency Check                       fsck                  +
  Filesystems Recovery Method                         sequential            +
  Filesystems mounted before IP configured            false                 +
  Filesystems/Directories to Export                   []                    +
  Filesystems/Directories to NFS Mount                []                    +
  Network For NFS Mount                               []                    +

  Tape Resources                                      []                    +
  Raw Disk PVIDs                                      []                    +

  Fast Connect Services                               []                    +
  Communication Links                                 []                    +

  Primary Workload Manager Class                      []                    +
  Secondary Workload Manager Class                    []                    +

  Miscellaneous Data                                  []
[BOTTOM]

Configure HACMP persistent node IP label/addresses

Complete the procedure in “Configure HACMP persistent node IP label/addresses” on page 272 to configure HACMP persistent node IP labels and addresses.


Configure predefined communication interfaces

Complete the procedure in “Configure predefined communication interfaces” on page 276 to configure predefined communication interfaces to HACMP.

Verify the configuration

Complete the procedure in “Verify the configuration” on page 280 to verify the HACMP configuration.

The output of the cltopinfo command for our environment is shown in Example 5-47.

Example 5-47 Output of cltopinfo command for hot standby Framework configuration

Cluster Description of Cluster: cltivoli
Cluster Security Level: Standard
There are 2 node(s) and 3 network(s) defined

NODE tivaix1:
    Network net_ether_01
        tivaix1_svc 9.3.4.3
        tivaix2_svc 9.3.4.4
        tivaix1_bt2 10.1.1.101
        tivaix1_bt1 192.168.100.101
    Network net_tmssa_01
        tivaix1_tmssa2_01 /dev/tmssa2

NODE tivaix2:
    Network net_ether_01
        tivaix1_svc 9.3.4.3
        tivaix2_svc 9.3.4.4
        tivaix2_bt1 192.168.100.102
        tivaix2_bt2 10.1.1.102
    Network net_tmssa_01
        tivaix2_tmssa1_01 /dev/tmssa1

Resource Group itmf_rg
    Behavior               cascading
    Participating Nodes    tivaix1 tivaix2
    Service IP Label       tivaix1_svc

The output would be the same for configurations that add highly available Endpoints, because we use the same resource group in the configuration we show in this redbook.


Start HACMP Cluster services

Complete the procedure in “Start HACMP Cluster services” on page 287 to start HACMP on the cluster.

Verify HACMP status

Complete the procedure in “Verify HACMP status” on page 292 to verify HACMP is running on the cluster.
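For a quick command-line spot check (a sketch, assuming the standard HACMP subsystem names), you can also confirm that the cluster manager subsystem is active on each node:

# List the HACMP subsystems; clstrmgrES should be shown as active
lssrc -g cluster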

Test HACMP resource group moves

Complete the procedure in “Test HACMP resource group moves” on page 294 to test moving resource group itmf_rg from cluster node tivaix1 to tivaix2, then from tivaix2 to tivaix1.

Live test of HACMP fallover

Complete the procedure in “Live test of HACMP fallover” on page 298 to test HACMP fallover of the itmf_rg resource group. Verify that the lsvg command displays the volume group itmf_vg and that the clRGinfo command displays the resource group itmf_rg.
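The following commands are a sketch of that verification on the surviving node; we assume clRGinfo is either in the PATH or in its usual HACMP location under /usr/es/sbin/cluster/utilities.

# Varied-on volume groups: itmf_vg should appear on the surviving node
lsvg -o

# Resource group state: itmf_rg should be ONLINE on the surviving node
/usr/es/sbin/cluster/utilities/clRGinfo itmf_rg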

Configure HACMP to start on system restart

Complete the procedure in “Configure HACMP to start on system restart” on page 300 to set HACMP to start when the system restarts.

Verify Managed Node fallover

When halting cluster nodes during testing in “Live test of HACMP fallover”, a highly available Managed Node (or Tivoli server) will also start appropriately when the itmf_rg resource group is moved. Once you verify that a resource group’s disk and network resources have moved, you must verify that the Managed Node itself functions on its new cluster node (or, in HACMP terms, verify that the application server resource of the resource group functions on the new cluster node).

In our environment, we perform the live test of HACMP operation at least twice: once to test HACMP resource group moves of disk and network resources in response to a sudden halt of a cluster node, and again while verifying the highly available Managed Node is running on the appropriate cluster node(s).

To verify that a highly available Managed Node is running during a test of a cluster node fallover from tivaix1 to tivaix2, follow these steps:

1. Log into the surviving cluster node as any user.

2. Use the odadmin command, as shown in Example 5-48 on page 502.


Example 5-48 Sample output of command to verify IBM Tivoli Management Framework is moved by HACMP

[root@tivaix1:/home/root] . /etc/Tivoli/setup_env.sh
[root@tivaix1:/home/root] odadmin odlist
Region           Disp Flags Port  IPaddr      Hostname(s)
1369588498          1 ct-     94  9.3.4.3     tivaix1_svc
                                  9.3.4.194   tivaix1,tivaix1.itsc.austin.ibm.com

The command should be repeated while testing that CWOF works. If CWOF works, then the output will remain identical after the halted cluster node reintegrates with the cluster.

The command should be repeated again to verify that falling back works. In our environment, after moving a resource group back to the reintegrated cluster node so the cluster is in its normal operating mode (tivaix1 has the itmf_rg resource group, and tivaix2 has no resource group), the output of the odadmin command on tivaix1 verifies that the Managed Node runs on the cluster node, but the same command fails on tivaix2 because the resource group is not on that cluster node.

Verify Endpoint fallover

Verifying an Endpoint fallover is similar to verifying a Managed Node fallover. Instead of using the odadmin command to verify that a cluster node is running a Managed Node, use the ps and grep commands as shown in Example 5-49 to verify that a cluster node is running a highly available Endpoint.

Example 5-49 Identify that an Endpoint is running on a cluster node

[root@tivaix1:/home/root] ps -ef | grep lcf | grep -v grep
    root 21520     1   0   Dec 22      -  0:00 /opt/hativoli/bin/aix4-r1/mrt/lcfd
-Dlcs.login_interfaces=tivaix1_svc -n hativoli -Dlib_dir=/opt/hativoli/lib/aix4-r1
-Dload_dir=/opt/hativoli/bin/aix4-r1/mrt -C/opt/hativoli/dat/1
-Dlcs.machine_name=tivaix1_svc -Dlcs.login_interfaces=tivaix1 -n hativoli

If there are multiple instances of Endpoints, identify the instance by the directory the Endpoint starts from (the argument to the -C option, /opt/hativoli/dat/1 in Example 5-49).
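For example, to check only for the instance that starts from /opt/hativoli/dat/1 (the directory shown in Example 5-49), you can narrow the search on the -C argument:

# Show only the Endpoint instance started from /opt/hativoli/dat/1
ps -ef | grep lcfd | grep "C/opt/hativoli/dat/1" | grep -v grep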

Save HACMP configuration snapshot

Take a snapshot to save a record of the HACMP configuration.

Production considerations

In this document, we show an example implementation, leaving out many ancillary considerations that would obscure the presentation. In this section we discuss some of the issues that you might face in an actual deployment for a production environment.

Security

IBM Tivoli Management Framework offers many configurable security options and mechanisms. One of these is an option to encrypt communications using Secure Sockets Layer (SSL). This requires a certificate authority (CA) to sign the SSL certificates. Highly available instances of IBM Tivoli Management Framework that use this option should plan and implement the means to make the CA highly available as well.

Tivoli Enterprise products

Not all Tivoli Enterprise products that leverage the Tivoli server, Managed Nodes and Endpoints are addressed with the high availability designs presented in this redbook. You should carefully examine each product’s requirements and modify your high availability design to accommodate them.

5.2 Implementing Tivoli Framework in a Microsoft Cluster

In this section we cover the installation of Tivoli on a Microsoft Cluster, which includes the following topics:

• Installation of a TMR server on a Microsoft Cluster

• Installation of a Managed Node on a Microsoft Cluster

• Installation of an Endpoint on a Microsoft Cluster

5.2.1 TMR server

In the following sections, we walk you through the installation of Tivoli Framework in an MSCS environment.

• Installation overview - provides an overview of cluster installation procedures. It also provides a reference for administrators who are already familiar with configuring cluster resources and might not need detailed installation instructions.

• Framework installation on node 1 - provides installation instructions for installing and configuring Tivoli Framework on the first node in the cluster. In this section of the install, node 1 will own the cluster resources required for the installation.

• Framework installation on node 2 - provides installation instructions for installing and configuring Tivoli Framework on the second node in the cluster. The majority of the configuration takes place in this section. The second node is required to own the cluster resources in this section.

• Cluster resource configuration - this describes how the Tivoli Framework services are configured as cluster resources. After configuring the cluster resources, the Framework should be able to be moved between the nodes.

Installation overview

In this section we walk through the installation and configuration of the Framework. The sections following provide greater detail.

Node 1 installation

1. Make sure Node 1 is the owner of the cluster group that contains the drive where the Framework will be installed (X:, in our example).

2. Insert the Tivoli Framework disc 1 in the CD-ROM drive and execute the following command: setup.exe advanced

3. Click Next past the welcome screen.

4. Click Yes at the license screen.

5. Click Next at the accounts and permissions page.

6. Enter the name of the cluster name resource in the advanced screen (tivw2kv1, in our example). Make sure that the start services automatically box is left unchecked.

7. Specify an installation password if you would like. Click Next.

8. Specify a remote administration account and password if applicable. Click Next.

9. Select Typical installation option. Click Browse and specify a location on the shared drive as the installation location (X:\tivoli, in our example).

10.Enter IBMTIVOLIMANAGEMENTREGIONLICENSEKEY41 as the license key. Click Next.

11.Click Next to start copying files.

12.Press any key after the oserv service has been installed.

13.Click Finish to end the installation on node 1.

Node 2 installation

1. Copy tivoliap.dll from node 1 to node 2.

2. Copy the %SystemRoot%\system32\drivers\etc\Tivoli directory from node1 to node 2.


3. Move the cluster group from node 1 to node 2.

4. Source the Tivoli environment.

5. Create tivoli account by running %BINDIR%\TAS\Install\ntconfig -e.

6. Load the tivoliap.dll with the LSA by executing wsettap -a.

7. Set up TRAA account using wsettap.

8. Install TRIP using “trip -install -auto”.

9. Install the Autotrace service using %BINDIR%\bin\atinstall --quietcopy %BINDIR%\bin.

10.Install the object dispatcher using oinstall -install %DBDIR%\oserv.exe.

Cluster resource configuration

1. Open the Microsoft Cluster Administrator.

2. Create a new resource for the TRIP service.

a. Name the TRIP resource (TIVW2KV1 - Trip, in our example). Set the resource type to Generic Service.

b. Select both nodes as possible owners.

c. Select the cluster disk, cluster name and cluster IP as dependencies.

d. Set the service name to “trip” and check the box Use network name for computer name.

e. There is no registry setting required for the TRIP service.

3. Create a new resource for the oserv service.

a. Name the oserv resource (TIVW2KV1 - Oserv, in our example). Set the resource type to Generic Service.

b. Select both nodes as possible owners.

c. Select the cluster disk, cluster name, cluster IP and TRIP as dependencies.

d. Set the service name to “oserv” and check the box Use network name for computer name.

e. Set the registry key “SOFTWARE\Tivoli” as key to replicate across nodes.

4. Bring the cluster group online.

TMR installation on node 1

The installation of a TMR server on an MSCS is very similar to a normal Tivoli Framework installation. In order to perform the installation, make sure that the Framework 4.1 Disk 1 is in the CD-ROM drive or has been copied locally.


1. Start the installation by executing setup.exe advanced. Figure 5-33 illustrates how to initiate the setup using the Windows Run window.

Figure 5-33 Start the installation using setup.exe

2. After the installation is started, [advanced] is displayed after the Welcome to confirm that you are in advanced mode. Click Next to continue (Figure 5-34 on page 506).

Figure 5-34 Framework [advanced] installation screen

3. The license agreement will be displayed; click Yes to accept and continue (Figure 5-35 on page 507).


Figure 5-35 Tivoli License Agreement

4. The next setup screen (Figure 5-36 on page 508) informs you that the tmersrvd account and the Tivoli_Admin_Privileges group will be created. If an Endpoint has been installed on the machine, the account and group will already exist. Click Next to continue.


Figure 5-36 Accounts and file permissions screen

5. Now you need to enter the hostname of the virtual server where you want the TMR to be installed. The hostname that you enter here will override the default value of the local hostname.

Make sure that the Services start automatically box remains unchecked; you will handle the services via the Cluster Administrator. Click Next to continue (Figure 5-37 on page 509).


Figure 5-37 Framework hostname configuration

6. You can now enter an installation password, if desired. An installation password must be entered to install Managed Nodes, create interregion connections, or install software using Tivoli Software Installation Service.

An installation password is not required in this configuration. Click Next to continue (Figure 5-38 on page 510).


Figure 5-38 Framework installation password

7. Next you can specify a Tivoli Remote Access Account (TRAA). The TRAA is the user name and password that Tivoli will use to access remote file systems.

This is not a required field and can be left blank. Click Next to continue (Figure 5-39 on page 511).


Figure 5-39 Tivoli Remote Access Account (TRAA) setup

8. You can now select from the different installation types. In our example, we show a Typical installation. For information about the other types of installations, refer to the Framework 4.1 documentation.

You will want to change the location where the Tivoli Framework is installed. The installation defaults to C:\Program Files\Tivoli, so it needs to be changed to X:\Tivoli. To change the installation directory, click Browse.

Use the Windows browser to select the correct location for the installation directory. In our example, the drive shared by the cluster is the X: drive.

Make sure you select the shared cluster drive as the installation location on your system. After the installation directory has been set, click Next to move to the next step (Figure 5-40 on page 512).


Figure 5-40 Framework installation type

9. In the License key dialog (Figure 5-41 on page 513), enter the following:

IBMTIVOLIMANAGEMENTREGIONLICENSEKEY41

Click Next to continue.


Figure 5-41 Framework license key setup

The setup program will ask you to review the settings that you have specified (Figure 5-42 on page 514).

If settings need to be changed, you can modify them by clicking Back. After you are satisfied with the settings, click Next to continue.


Figure 5-42 Framework setting review

10.After the files have been copied, the oserv will be installed (see Figure 5-43). You will have to select the DOS window and press any key to continue the installation.

Figure 5-43 Tivoli oserv service installation window


11.The Framework installation is now complete on the first node. Click Finish to exit the installation wizard (Figure 5-44). If the installation prompts you to restart the computer, select the option to restart later. You will need to copy some files off node 1 prior to rebooting.

Figure 5-44 Framework installation completion

TMR installation on node 2

The Tivoli Framework installation on the second node is not as straightforward as the installation on the first node. This installation consists of the following manual steps.

1. Before you fail over the X: drive and start the installation on node 2, you need to copy the %SystemRoot%\system32\drivers\etc\Tivoli directory and the %SystemRoot%\system32\tivoliap.dll file from node 1.

The easiest way to do this is to copy the files to the shared drive and simply move the drive. However, you can also copy the files from one machine to another. One way to copy the files is to open a DOS window and copy the files using DOS commands; see Figure 5-45 on page 516.


The commands are as follows:

x:
mkdir tmp
xcopy /E c:\winnt\system32\drivers\etc\tivoli x:\tmp
copy c:\winnt\system32\tivoliap.dll x:\

Figure 5-45 shows the output.

Figure 5-45 File copy output

2. After the files are copied, you can fail over the X: drive to node 2. You can do this manually by using the Cluster Administrator, but in this case you will need to restart node 1 to register the tivoliap.dll, so you can simply restart node 1 and the drive should fail over automatically.

After node 1 has started to reboot, the X: drive should fail over to node 2. To continue the Framework installation on the node, you will need to open a DOS window on node 2.

Create the c:\winnt\system32\drivers\etc\tivoli directory on node 2:

mkdir c:\winnt\system32\drivers\etc\tivoli

This is shown in Figure 5-46 on page 517.


Figure 5-46 Create the etc\tivoli directory on node 2

3. Now you need to copy the Tivoli environment files from the X:\tmp directory to the c:\winnt\system32\drivers\etc\tivoli directory just created in node 2. To do this, execute:

xcopy /E x:\tmp\* c:\winnt\system32\drivers\etc\tivoli

Figure 5-47 shows the output of this command.

Figure 5-47 Copy the Tivoli environment files

4. Source the Tivoli environment:

c:\winnt\system32\drivers\etc\tivoli\setup_env.cmd

Figure 5-48 on page 518 shows the output of this command.


Figure 5-48 Source the Tivoli environment

5. Now that the Tivoli environment is sourced, you can start configuring node 2 of the TMR. First you need to create the tmersrvd account and the Tivoli_Admin_Privileges group.

To do this, execute the ntconfig.exe executable:

%BINDIR%\TAS\Install\ntconfig -e

See Figure 5-49 on page 519.


Figure 5-49 Add the Tivoli account

6. Copy tivoliap.dll from the X: drive to c:\winnt\system32:

copy x:\tivoliap.dll c:\winnt\system32

The output is shown in Figure 5-50 on page 520.


Figure 5-50 Copy the tivoliap.dll

7. After tivoliap.dll has been copied, you can load it with the wsettap.exe utility:

wsettap -a

A reboot will be required before the tivoliap.dll is completely loaded.


Figure 5-51 Register the tivoliap.dll

8. Install the Autotrace service. Framework 4.1 includes a new embedded Autotrace service for use by IBM Support. Autotrace uses shared memory segments for logging purposes.

To install Autotrace:

%BINDIR%\bin\atinstall --quietcopy %BINDIR%\bin


Figure 5-52 Installing Autotrace

9. Finally, you need to install and start the oserv service. To install the oserv service:

oinstall -install %DBDIR%\oserv.exe

Figure 5-53 on page 523 shows the output of the command, indicating that oserv service has been installed.


Figure 5-53 Create the oserv service

After the oserv service is installed, your setup of node 2 is complete. Now you need to restart node 2 to load tivoliap.dll.

Setting up cluster resources

Now that the binaries are installed on both nodes of the cluster, you need to create the cluster resources. You will need to create two cluster resources, one for the oserv service and one for the TRIP service. Because the oserv service depends on the TRIP service, you need to create the TRIP resource first.

Create the resources using the Cluster Administrator.

1. Open the Cluster Administrator by selecting Start -> Programs -> Administrative Tools -> Cluster Administrator.

2. After the Cluster Administrator is open, you can create a new resource by right-clicking your cluster group and selecting New -> Resource, as shown in Figure 5-54 on page 524.


Figure 5-54 Create a new resource

3. Select the type of resource and add a name. You can name the resource however you would like. In our example, we chose TIVW2KV1 - TRIP, in order to adhere to our naming convention (see Figure 5-55 on page 525).

The Description field is optional. Make sure that you change the resource type to a generic service, and that the resource belongs to the cluster group that contains the drive where the Framework was installed. Click Next to continue.


Figure 5-55 Resource name and type setup

4. Define which nodes can own the resource. Since you are configuring your TMR for a hot standby scenario, you need to ensure that both nodes are added as possible owners (see Figure 5-56 on page 526). Click Next to continue.


Figure 5-56 Configure possible resource owners

5. Define the dependencies for the TRIP service. On an MSCS, dependencies are defined as resources that must be active in order for another resource to run properly. If a dependency is not running, the cluster will fail over and attempt to start on the secondary node.

To configure TRIP, you need to select the shared disk, the cluster IP, and the cluster name resources as dependencies, as shown in Figure 5-57 on page 527. Click Next to continue.


Figure 5-57 TRIP dependencies

6. Define which service is associated with your resource. The name of the Tivoli Remote Execution Service is “trip”, so enter that in the Service name field. There are no start parameters.

Make sure that the Use Network Name for computer name check box is selected (see Figure 5-58 on page 528). Click Next to continue.


Figure 5-58 TRIP service name

7. One of the options available with MSCS is to replicate registry keys between the nodes of a cluster. This option is not required for the TRIP service, but you will use it later when you create the oserv service.

Click Finish to continue (see Figure 5-59 on page 529).


Figure 5-59 Registry replication

The resource has now been created. You will notice that when a resource is created, it is offline. This is normal. You will start the resources after the configuration is complete.

Next, create the oserv cluster resource. You do this by using the same process used to create the TRIP resource.

8. Open the Cluster Administrator, right-click your cluster group, and select New -> Resource, as shown in Figure 5-60 on page 530.


Figure 5-60 Create a new resource

9. Select a name for the resource. We used oserv in our example, as seen in Figure 5-61 on page 531. Add a description if desired.

Make sure you specify the resource type to be a Generic Service. Click Next to continue.


Figure 5-61 Resource name and type setup

10.Select both nodes as owners for the oserv resource, as shown in Figure 5-62 on page 532. Click Next to continue.


Figure 5-62 Select owners of the resource

11.Select all the cluster resources in the cluster group as dependencies for the oserv resource, as seen in Figure 5-63 on page 533. Click Next to continue.


Figure 5-63 Select resource dependencies

12.Specify “oserv” as the service name. Make sure to check the box Use Network Name for computer name (see Figure 5-64 on page 534). Click Next to continue.


Figure 5-64 Service and parameter setup

13.Click Add and specify the registry key “SOFTWARE\Tivoli” as the key to replicate (see Figure 5-65 on page 535). Click Finish to complete the cluster setup.


Figure 5-65 Registry replication

14.At this point, the installation of Framework on an MSCS is almost complete. Now you have to bring the cluster resources online.

To do this, right-click the cluster group and select Bring Online, as seen in Figure 5-66 on page 536.


Figure 5-66 Bringing cluster resources online

The Framework service should now fail over whenever the cluster or one of its nodes fails.

5.2.2 Tivoli Managed Node

In this section, we cover the Managed Node Framework installation process on an MSCS. The Managed Node installation method we have chosen is via the Tivoli Desktop. However, the same concepts should apply for a Managed Node installed using Tivoli Software Installation Service (SIS), or using the wclient command. The following topics are covered in this section:

• Installation overview - provides a brief overview of the steps required to install Tivoli Framework on an MSCS Managed Node

• TRIP installation - describes the installation of the Tivoli Remote Execution Protocol (TRIP), which is a required prerequisite for Managed Node installation

• Managed Node installation - covers the steps to install a Managed Node on an MSCS from the Tivoli Desktop

• Managed Node configuration - covers the setup process on the second node, as well as configuring the oserv to bind to the cluster IP address

• Cluster resource configuration - covers the cluster configuration, which consists of the setup of the oserv and TRIP resources

The Managed Node installation process has many installation steps in common with the installation of the TMR server. For these steps, we refer you back to the previous section for the installation directives.

Installation overview

Here we give a brief outline of the Managed Node installation process on an MSCS system. The sections following describe the steps listed here in greater detail.

Figure 5-67 on page 538 illustrates the configuration we use in our example.


Figure 5-67 Tivoli setup

In the figure, the TMR server edinburgh manages a clustered Managed Node on cluster nodes tivw2k1 and tivw2k2. The TIVW2KV1 resource group contains drive X:, IP address 9.3.4.199, and network name TIVW2KV1.

TRIP installation

To install TRIP, follow these steps:

1. Insert Framework CD 2 in the CD-ROM drive and run setup.exe.

2. Click Next at the welcome screen.

3. Click Yes at the license agreement.

4. Select a local installation directory to install TRIP (c:\tivoli\trip, in our example).

5. Click Next to start copying files.

6. Press any key after the TRIP service has been installed.

7. Click Finish to complete the installation.

8. Follow steps 1-7 again on node 2 so TRIP is installed on both nodes of the cluster.



Managed Node installation on node 1

1. Open the Tivoli desktop and log in to the TMR that will own the Managed Node.

2. Open a policy region where the Managed Node should reside and select Create -> ManagedNode.

3. Click Add Clients and enter the name associated with the cluster group where the Managed Node will be installed (tivw2kv1, in our example).

4. Click Select Media and browse to the location of Framework disc 1.

5. Click Install Options and make sure that the installation directories are all located on the cluster’s shared drive (X:\tivoli, in our example). Verify that Arrange for start of the Tivoli daemon at system (re)boot time is unchecked.

6. Select Account as the default access method, and specify an account and password with administrator access to the Managed Node you are installing.

7. Click Install & Close to start the installation.

8. Click Continue Install at the Client Install screen.

9. Specify a Tivoli Remote Access Account if necessary (in our example, we used the default access method option).

10.Click Close at the reboot screen. You do not want to reboot at this time.

11.Click Close after the Client Install window states that it has finished the client install.

Managed Node installation on node 2

1. Copy tivoliap.dll from node 1 to node 2.

2. Copy the %SystemRoot%\system32\drivers\etc\Tivoli directory from node1 to node 2.

3. Move the cluster group from node 1 to node 2.

4. Source the Tivoli environment.

5. Create the tivoli account by running %BINDIR%\TAS\Install\ntconfig -e.

6. Load tivoliap.dll with the LSA by executing wsettap -a.

7. Set up the TRAA account using wsettap.

8. Install the autotrace service %BINDIR%\bin\atinstall --quietcopy %BINDIR%\bin.

9. Install the object dispatcher oinstall -install %DBDIR%\oserv.exe


10.Start the oserv service:

net start oserv /-Nali /-k%DBDIR% /-b%BINDIR%\..

11.Change the IP address of the Managed Node from the physical IP to the cluster IP address:

odadmin odlist change_ip <dispatcher> <cluster ip> TRUE

12.Set the oserv to bind to a single IP:

odadmin set_force_bind TRUE <dispatcher>

Cluster resource configuration

1. Open the Microsoft Cluster Administrator.

2. Create a new resource for the TRIP service.

a. Name the TRIP resource (TIVW2KV1 - Trip, in our example). Set the resource type to Generic Service.

b. Select both nodes as possible owners.

c. Select the cluster disk, cluster name and cluster IP as dependencies.

d. Set the service name to “trip” and check the box Use network name for computer name.

e. There are no registry settings required for the TRIP service.

3. Create a new resource for the oserv service.

a. Name the oserv resource (TIVW2KV1 - Oserv, in our example). Set the resource type to Generic Service.

b. Select both nodes as possible owners.

c. Select the cluster disk, cluster name and cluster IP as dependencies.

d. Set the service name to “oserv” and check the box Use network name for computer name.

e. Set the registry key “SOFTWARE\Tivoli” as key to replicate across nodes.

4. Bring the cluster group online.

TRIP installation

Tivoli Remote Execution Service (TRIP) must be installed before installing a Tivoli Managed Node. Install TRIP as follows:

1. Insert Tivoli Framework CD 2 in the CD-ROM drive of node 1 and execute the setup.exe found in the TRIP directory (see Figure 5-68 on page 541).


Figure 5-68 Start TRIP installation

2. Click Next past the installation Welcome screen (Figure 5-69).

Figure 5-69 TRIP Welcome screen

3. Click Yes at the License agreement (see Figure 5-70 on page 542).


Figure 5-70 The TRIP license agreement

4. Select the desired installation directory. We used the local directory c:\tivoli, as shown in Figure 5-71 on page 543. Click Next to continue.


Figure 5-71 Installation directory configuration

5. Click Next to start the installation (see Figure 5-72 on page 544).


Figure 5-72 Installation confirmation

6. Press any key after the TRIP service has been installed and started (Figure 5-73).

Figure 5-73 TRIP installation screen

7. Click Next to complete the installation (see Figure 5-74 on page 545).


Figure 5-74 TRIP installation completion

8. Repeat the TRIP installation steps 1-7 on node 2.

Managed Node installation on node 1

In this section we describe the steps needed to install the Managed Node software on node 1 of the cluster. The Managed Node software will be installed on the cluster’s shared drive X:, so you need to make sure that node 1 is the owner of the resource group that contains the X: drive.

We will be initiating the installation from the Tivoli Desktop, so log in to the TMR (edinburgh).

1. After you are logged in to the TMR, navigate to a policy region where the Managed Node will reside and click Create -> ManagedNode (see Figure 5-75 on page 546).


Figure 5-75 ManagedNode installation

2. Click the Add Clients button and enter the virtual name of the cluster group. In our case, it is tivw2kv1. Click Add & Close (Figure 5-76).

Figure 5-76 Add Clients dialog


3. Insert the Tivoli Framework CD 1 in the CD-ROM drive on the TMR server and click Select Media....

Navigate to the directory where the Tivoli Framework binaries are located on the CD-ROM. Click Set Media & Close (Figure 5-77).

Figure 5-77 Tivoli Framework installation media

4. Click Install Options.... Set all installation directories to the shared disk (X:). Make sure you check the boxes When installing, create “Specified Directories” if missing and Configure remote start capability of the Tivoli daemon.

Do not check the box Arrange for start of the Tivoli daemon at system (re)boot time. Let the cluster service handle the oserv service. Click Set to continue (see Figure 5-78 on page 548).


Figure 5-78 Tivoli Framework installation options

5. You need to specify the account that Tivoli will use to perform the installation on the cluster. Since you are only installing one Managed Node at this time, use the default access method.

Make sure the Account radio button is selected, then enter the userid and password of an account on node 1 with administrative rights on the machine. If a TMR installation password is used on your TMR, enter it now. Click Install & Close (see Figure 5-79 on page 549).


Figure 5-79 Specify a Tivoli access account

6. Now the Tivoli installation program will attempt to contact the Managed Node and query it to see what needs to be installed. You should see output similar to Figure 5-80 on page 550.

7. If there are no errors, then click Continue Install to begin the installation; see Figure 5-80 on page 550.


Figure 5-80 Client installation screen

8. If your environment requires the use of a Tivoli Remote Access Account (TRAA), then specify the account here. In our example we selected Use Installation ‘Access Method’ Account for our TRAA account.

Click Continue (see Figure 5-81 on page 551).


Figure 5-81 Tivoli Remote Access Account (TRAA) setup

9. Select Close at the client reboot window (Figure 5-82). You do not want your servers to reboot until after you have configured them.

Figure 5-82 Managed Node reboot screen


10.The binaries will now start to copy from the TMR server to the Managed Node. The installation may take a while, depending on the speed of your network and the type of machines where you are installing the Managed Node software.

After the installation is complete, you should see the following message at the bottom of the scrolling installation window: Finished client install.

Click Close to complete the installation (Figure 5-83).

Figure 5-83 Managed Node installation window


Managed Node installation on node 2

Now you need to replicate manually on node 2 what the Tivoli installation performed on node 1. Because steps 1 to 9 of the Managed Node configuration are the same as the TMR installation of node 2 (see 5.2.1, “TMR server” on page 503), we do not cover those steps in great detail here.

1. Copy the tivoliap.dll from node 1 to node 2.

2. Copy the %SystemRoot%\system32\drivers\etc\Tivoli directory from node1 to node 2.

3. Move the cluster group from node 1 to node 2.

4. Source the Tivoli environment on node 2.

5. Create the tivoli account by running %BINDIR%\TAS\Install\ntconfig -e.

6. Load the tivoliap.dll with the LSA by executing wsettap -a.

7. Set up the TRAA account by using wsettap.

8. Install the Autotrace service %BINDIR%\bin\atinstall --quietcopy %BINDIR%\bin.

9. Install the object dispatcher oinstall -install %DBDIR%\oserv.exe.

10.Start the oserv service:

net start oserv /-Nali /-k%DBDIR% /-b%BINDIR%\..S.

Figure 5-84 Starting the oserv service

11.Change the IP address of the Managed Node from the physical IP to the cluster IP address:

odadmin odlist change_ip <dispatcher> <cluster ip> TRUE


12.Set the oserv to bind to a single IP address (steps 11 and 12 are illustrated with a combined example after this procedure):

odadmin set_force_bind TRUE <dispatcher>

Figure 5-85 Configure Managed Node IP address

13.Restart both systems to register tivoliap.dll.
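To illustrate steps 11 and 12, here is a combined example with entirely hypothetical values (dispatcher number 2 and cluster IP address 9.3.4.199); in the sourced Tivoli environment, run odadmin odlist first to confirm the dispatcher number that was assigned to the Managed Node:

odadmin odlist
odadmin odlist change_ip 2 9.3.4.199 TRUE
odadmin set_force_bind TRUE 2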

Cluster resource configuration
The steps needed for cluster resource configuration here are the same as for the cluster resource configuration of a TMR as discussed in 5.2.1, “TMR server” on page 503, so refer to that section for detailed information. In this section, we simply guide you through the overall process.

1. Open the Microsoft Cluster administrator.

2. Create a new resource for the TRIP service.

a. Name the TRIP resource (TIVW2KV1 - Trip, in our example). Set the resource type to Generic Service.

b. Select both nodes as possible owners.

c. Select the cluster disk, cluster name and cluster IP as dependencies.

d. Set the service name to “trip” and check the box Use network name for computer name.

e. There are no registry settings required for the TRIP service.

3. Create a new resource for the oserv service.

a. Name the oserv resource (TIVW2KV1 - Oserv, in our example). Set the resource type to Generic Service.

b. Select both nodes as possible owners.


c. Select the cluster disk, cluster name and cluster IP as dependencies.

d. Set the service name to “oserv” and check the box Use network name for computer name.

e. Set the registry key “SOFTWARE\Tivoli” as key to replicate across nodes.

4. Bring the cluster group online.

5.2.3 Tivoli Endpoints
In this section we provide a detailed overview describing how to install multiple Tivoli Endpoints (TMAs) on a Microsoft Cluster Service (MSCS). The general requirements for this delivery are as follows:

� Install a Tivoli Endpoint on each physical server in the cluster.

� Install a Tivoli Endpoint on a resource group in the cluster (“Logical Endpoint”). This Endpoint will have the hostname and IP address of the virtual server.

� The Endpoint resource will roam with the cluster resources. During a failover, the cluster services will control the startup and shutdown of the Endpoint.

The purpose of this section is to clearly demonstrate what has been put in place (or implemented) by IBM/Tivoli Services, and to provide a detailed record of custom configurations, installation procedures, and information that is generally not provided in the user manuals. This information is intended to be a starting place for troubleshooting, extending the current implementation, and documenting further work.

Points to consider
Note the following points regarding IBM’s current solution for managing HA cluster environments for Endpoints.

� The Endpoint for each physical node, representing the physical characteristics (“Physical Endpoint”):

– Always stays at the local system
– Does not fail over to the alternate node in the cluster
– Monitors only the underlying infrastructure

� The Endpoint for every cluster resource group representing the logical characteristics (“Logical Endpoint”):

– Moves together with the cluster group
– Stops and starts under control of HA
– Monitors only the application components within the resource group

� Several limitations apply (for instance, Endpoints have different labels and listen on different ports)


� Platforms

– Solaris, AIX, HP-UX, Windows NT, Windows 2000
– Platform versions as supported by our products today

Installation and configuration
The complete solution for managing/monitoring the MSCS involves installing three Tivoli Endpoints on the two physical servers. One “Physical Endpoint” will reside on each server, while the third Endpoint will run where the cluster resource is running. For example, if node 1 is the active node or contains the cluster group, this node will also be running the “Logical Endpoint” alongside its own Endpoint (see Figure 5-86).

Figure 5-86 Endpoint overview

An Endpoint is installed on each node to manage the physical components, and we call this the “Physical Endpoint". This Endpoint is installed on the local disk of the system using the standard Tivoli mechanism. This Endpoint is installed first, so its instance id is "1" on both physical servers (for example, \Tivoli\lcf\dat\1).


A second Endpoint instance (its instance id is "2") is installed on the shared file system. This Endpoint represents the application that runs on the cluster, and we call it the “Logical Endpoint”. The Endpoints will not share any path, cache or content; their disk layout is completely separated.

The Logical Endpoint will have an Endpoint label that is different from the physical Endpoint and will be configured to listen on a different port than the physical Endpoint.

The general steps to implementing this configuration are as follows:

1. Install the Tivoli Endpoint on node 1, local disk.

2. Install the Tivoli Endpoint on node 2, local disk.

3. Manually install the Tivoli Endpoint on the logical server, shared drive X: (while logged onto the currently active cluster node).

4. Configure the new LCFD service as a “generic service” in the cluster group (using the Cluster Administrator).

5. Move the cluster group to node 2 and register the new LCFD service on this node by using the lcfd.exe -i command (along with other options).

Environment preparation and configuration
Before beginning the installation, make sure there are no references to “lcfd” in the Windows Registry. Remove any references to previously installed Endpoints, or you may run into problems during the installation.

Verify that you have two-way communication to and from the Tivoli Gateways from the cluster server via hostname and IP address. Do this by updating your name resolution system (DNS, hosts files, and so on). We strongly recommend that you enter the hostname and IP address of the logical node in the hosts file of each physical node. This will locally resolve the logical server’s hostname when issuing the ping -a command.
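For example, an entry along the following lines might be added to the hosts file on both physical nodes; the IP address and hostname shown here are purely hypothetical:

# %SystemRoot%\system32\drivers\etc\hosts on each physical node
9.3.4.199     tivw2kv1      # hypothetical address and hostname of the logical (virtual) node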

Finally, note that this solution works only with version 96 and higher of the Tivoli Endpoint.

Note: This is very important to the success of the installation. If there are any references (typically legacy_lcfd), you will need to delete them using regedt32.exe.


Install the Tivoli Endpoint on node 1
To install the Tivoli Endpoint on node 1, follow these steps:

1. Install the Tivoli Endpoint using the standard CD InstallShield setup program on one of the physical nodes in the cluster.

2. In our case, we leave the ports as default, but enter optional commands to configure the Endpoint and ensure its proper login.

Figure 5-87 Endpoint advanced configuration

The configuration arguments in the Other field are:

-n <ep label> -g <preferred gw> -d3 -D local_ip_interface=<node primary IP> -D bcast_disable=1

3. The Endpoint should install successfully and log in to the preferred Gateway. You can verify the installation and login by issuing the following commands on the TMR or Gateway (Figure 5-88 on page 559).
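Figure 5-88 shows the output in our environment. Commands along the following lines are commonly used for this check; the Endpoint label shown here is hypothetical:

wep ls
wep tivw2kphys1 status

The first command lists the Gateways and the Endpoints that have logged in to them, and the second reports whether a specific Endpoint is reachable.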


Figure 5-88 Endpoint login verification

Install the Tivoli Endpoint on node 2
To install the Tivoli Endpoint on node 2, follow these steps:

1. Install the Tivoli Endpoint on the physical node 2 in the cluster. Follow the same steps and options as in node 1 (refer to “Install the Tivoli Endpoint on node 1” on page 558).

2. Verify that you have a successful installation and that the Endpoint logs in, as described previously.

Manually install the Tivoli Endpoint on the virtual node
To install the Tivoli Endpoint on the virtual node, follow these steps:

1. On the active node, copy only the Tivoli installation directory (c:\Program Files\Tivoli) to the root of the X: drive. Rename X:\Tivoli\lcf\dat\1 to X:\Tivoli\lcf\dat\2.

Note: You will only be able to do this from the active cluster server, because the non-active node will not have access to the shared volume X: drive.


Note: Do not use the “Program Files” naming convention on the X: drive.

2. Edit the X:\Tivoli\lcf\dat\2\last.cfg file, changing all of the references of c:\Program Files\Tivoli\lcf\dat\1 to X:\Tivoli\lcf\dat\2.

3. On both physical node 1 and physical node 2, copy the c:\winnt\Tivoli\lcf\1 directory to c:\winnt\Tivoli\lcf\2.

4. On both physical node 1 and physical node 2, edit the c:\winnt\Tivoli\lcf\2\lcf_env.cmd and lcf_env.sh files, replacing all references of c:\Program Files\Tivoli\lcf\dat\1 with X:\Tivoli\lcf\dat\2.

5. Remove the lcfd.id, lcfd.sh, lcfd.log, lcfd.bk and lcf.dat files from the X:\Tivoli\lcf\dat\2 directory.

6. Add or change the entries listed in Example 5-50 to the X:\Tivoli\lcf\dat\2\last.cfg file.

Example 5-50 X:\Tivoli\lcf\dat\2\last.cfg file

lcfd_port=9497
lcfd_preferred_port=9497
lcfd_alternate_port=9498
local_ip_interface=<IP of the virtual cluster>
lcs.login_interfaces=<gw hostname or IP>
lcs.machine_name=<hostname of virtual Cluster>
UDP_interval=30
UDP_attempts=3
login_interval=120

The complete last.cfg file should resemble the output shown in Figure 5-89 on page 561.
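As an illustration only, a last.cfg for the Logical Endpoint might look similar to the following sketch; the Gateway name, IP address, hostname and lcf_datdir value are hypothetical, and the remaining keys in your file may differ:

lcfd_port=9497
lcfd_preferred_port=9497
lcfd_alternate_port=9498
lcf_datdir=X:\Tivoli\lcf\dat\2
local_ip_interface=9.3.4.199
lcs.login_interfaces=itso-gw1
lcs.machine_name=tivw2kv1
UDP_interval=30
UDP_attempts=3
login_interval=120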


Figure 5-89 Sample last.cfg file

7. Execute the following command:

X:\Tivoli\lcf\bin\w32-ix86\mrt\lcfd.exe -i -n <virtual_name> -C X:\Tivoli\lcf\dat\2 -P 9497 -g <gateway_label> -D local_ip_interface=<virtual_ip_address>

Note: The IP address and name are irrelevant as long as the label specified with -n is unique. Every time the Endpoint logs in, the Gateway registers the IP address that contacted it, and it will use that IP address from that point forward for downcalls.

On a machine with multiple interfaces, the Endpoint cannot be bound to a single interface, so the routing must be correct; otherwise, with every upcall generated, or every time the Endpoint starts, the registered IP address will be changed if it differs from the one the Gateway last recorded.

However, if the Endpoint is routing out of an interface that is not reachable by the Gateway, then all downcalls will fail, even though the Endpoint logged in successfully. This will obviously cause some problems with the Endpoint.


8. Set the Endpoint manager login_interval to a smaller number (the default is 270; we use 20). Run the following command on the TMR:

wepmgr set login_interval 20

Set up physical node 2 to run the Logical Endpoint
To set up physical node 2 to run the Logical Endpoint, follow these steps:

1. Move the cluster group containing the X: drive to node 2, using the Cluster Administrator.

2. On node 2, which is now the active node (the node on which you have not yet registered the Logical Endpoint), open a command prompt window and again run the following command to create and register the lcfd-2 service on this machine:

X:\Tivoli\lcf\bin\w32-ix86\mrt\lcfd.exe -i -n <virtual_name> -C X:\Tivoli\lcf\dat\2 -P 9497 -g <gateway_label> -D local_ip_interface=<virtual_ip_address>

The output listed in Figure 5-90 is similar to what you should see.

Figure 5-90 Registering the lcfd service

3. Verify that the new service was installed correctly by viewing the services list (use the net start command or Control Panel -> Services). Also view the new registry entries using the Registry Editor. You will see two entries for the lcfd service, “lcfd” and “lcfd-2”, as shown in Figure 5-91 on page 563.


Figure 5-91 lcfd and lcfd-2 services in the registry

4. Verify that the Endpoint successfully started and logged into the Gateway/TMR and that it is reachable (Figure 5-92).

Figure 5-92 Endpoint login verification


Configure the cluster resources for the failover
To configure the cluster resources for the failover, follow these steps:

1. Add a new resource to the cluster.

2. Log on to the active cluster node and start the Cluster Administrator, using the virtual IP address or hostname.

3. Click Resource, then right-click in the right-pane and select New -> Resource (Figure 5-93).

Figure 5-93 Add a new cluster resource

4. Fill in the information as shown in the next dialog (see Figure 5-94 on page 565).


Figure 5-94 Name and resource type configuration

5. Select both TIVW2KV1 and TIVW2KV2 as possible owners of the cluster Endpoint resource (see Figure 5-95 on page 566).


Figure 5-95 Possible Owners

6. Move all available resources to the “Resources dependencies” box (see Figure 5-96 on page 567).


Figure 5-96 Dependency configuration

7. Enter the new service name of the Endpoint just installed (see Figure 5-97 on page 568).


Figure 5-97 Add lcfd-2 service name

8. Click Next past the registry replication screen (see Figure 5-98 on page 569). No registry replication is required.


Figure 5-98 Registry replication

9. Click Next at the completion dialog (Figure 5-99).

Figure 5-99 Completion dialog

10.Bring the new service resource online by right-clicking the resource and selecting Bring Online (Figure 5-100 on page 570). You will see the icon first change to the resource “book” with a clock, and then it will come online and display the standard icon indicating it is online.


Figure 5-100 Bring resource group online

11.Test the failover mechanism and failover of the Cluster Endpoint service, as follows:

a. Move the resource group from one server to the other, using the Cluster Administrator.

b. After the resource group has been moved, log into the new active server and verify that the Endpoint service “Tivoli Endpoint-1” is running alongside the physical server’s Endpoint “Tivoli Endpoint”.

c. Fail over again and perform the same check.


Appendix A. A real-life implementation

In this appendix, we describe the implementation tasks within a deployment of an HACMP Version 4.5 and IBM Tivoli Workload Scheduler Version 8.1 scheduling environment at a real customer. We cover the installation roadmap and actual installation steps, and provide our observations from this real-life implementation.

The versions of software used are HACMP Version 4.5 and IBM Tivoli Workload Scheduler Version 8.1.


Rationale for IBM Tivoli Workload Scheduler and HACMP integration

The rationale for the integration of IBM Tivoli Workload Scheduler and HACMP was to use a proactive approach to a highly available scheduling solution, rather than a reactive approach. The IBM AIX/SP frame hardware environment has been an impressively stable system. However, on occasion, when a TCP/IP network issue arises, customers new to IBM Tivoli Workload Scheduler scheduling environments naturally become concerned that IBM Tivoli Workload Scheduler schedules and jobs are not running on FTAs as expected. It is then realized that the IBM Tivoli Workload Scheduler FTAs continue to run their jobs even during these temporary network disruptions. This concern then developed into a risk assessment where the actual loss of the IBM Tivoli Workload Scheduler Master Domain Manager was considered.

Taking the loss of an IBM Tivoli Workload Scheduler Master Domain Manager into consideration can be a serious concern for many customers. While some customers feel an IBM Tivoli Workload Scheduler Backup Domain Manager is sufficient for a failover scenario, other customers will realize that their entire data center, which is now controlled by IBM Tivoli Workload Scheduler, could potentially go idle for several hours during this failover period.

This could be a very serious problem for a large customer environment, especially if an IBM Tivoli Workload Scheduler MDM failure were to occur shortly before the release of the (05:59) Jnextday job. Data centers running business-critical applications or 10,000 to 20,000 jobs a day simply cannot afford a lapse in scheduling service. Therefore, a highly available IBM Tivoli Workload Scheduler scheduling solution must be implemented.

Our environment
Figure A-1 on page 573 shows an overview of the environments used in this implementation.


Figure A-1 Our environment

Installation roadmap
Figure A-2 on page 574 shows our installation roadmap. This flowchart is provided to help visualize the steps required to perform a successful IBM Tivoli Workload Scheduler HACMP integration. The steps are arranged sequentially, although there are certain tasks that can be performed in parallel.

This flowchart can be considered to be at least a partial checklist for the tasks that must be performed in your installation.


Figure A-2 IBM Tivoli Workload Scheduler HACMP configuration flowchart

Software configuration
The following is a description of the IBM Tivoli Workload Scheduler software configuration that is in production.


� AIX 5.1 (Fix Pack 5100-03).

� IBM Tivoli Workload Scheduler 8.1 (Patch 08).

� Anywhere from 500 to 1500 business-critical jobs running per day.

� There are currently 56 FTAs (both AIX and NT), with an average of one FTA node being added per month.

� 125 defined schedules.

� 325 defined jobs.

� Nine different workstation classes.

� Four customized calendars.

Hardware configuration
The hardware design and configuration for this type of work must be carefully planned and thought out before purchasing any devices for the configuration. If this is not done properly, the deployment of your design may be stalled until all component issues are resolved.

There are several groups of people who would be involved in this design, and various team members may be able to assist in the configuration.

Disk storage design
The disk storage design and configuration is a critical component to a successful HACMP failover design. This disk configuration must be able to be seen by all nodes within the cluster.

Our selection for this centralized disk storage is based on IBM 7133 SSA storage arrays.

Heartbeat interface
The HACMP heartbeat design is a critical component to a stable HACMP deployment.

Note: The redundant SSA controllers must be of the same version and revision. Different levels of controllers provide different raid levels, speeds, or other functions, thereby introducing incompatibility problems into the HACMP design.


Our design uses the Non-IP Network Serial Cable method, because of:

� Simplicity; once the cable is installed and tested, the configuration will probably never be touched again.

� There are no electrical or power issues associated with this configuration.

� The design is portable in the event you migrate from one disk technology to another (for example, SCSI to SSA).

� There are no moving parts in this configuration, so there are virtually no mean time between failure (MTBF) issues with a serial cable.

Ethernet connectivity
Proper network connectivity is critical to a successful HACMP deployment. There is little purpose to continuing without it, as HACMP will not validate or accept the configuration if the network is not properly configured.

Currently we have three Ethernet adapters per machine (en0, en1, en2), totaling 6 adapters. This configuration has six IP addresses, plus one more that is actually used for the IBM Tivoli Workload Scheduler service that all IBM Tivoli Workload Scheduler FTAs connect to (the service address).

We will use IP aliasing in the final production environment; this aliasing process promotes a very fast HACMP failover.

Installing the AIX operating system
AIX 5.1 must be installed on both nodes. The same version must be installed on both machines, and both nodes must be running at the same patch level.

The files that should be backed up and restored to the new configuration are:

– Root: .rhosts

– /etc/hosts

Notes:

� Understanding the network configuration is probably one of the most critical components to the HACMP configuration. Find assistance with this step if you do not have a good understanding of the HACMP and networking relationship.

� All adapters to be utilized within the HACMP solution must reside within different network subnets (but the netmask must be the same).


– root: .profile/.kshrc
– opermps: .profile/.kshrc
– maestro: .profile/.kshrc
– operha: .profile/.kshrc

– Installation files: maestro.tar, HACMP, IBM Tivoli Workload Scheduler connectors, Plus module, and the IBM Tivoli Workload Scheduler Windows Java Console code

– /etc/passwd

– /usr/local/HACMP/scripts/*

Patching the operating system
HACMP 4.5 requires that the AIX operating system be patched to version 5100-02; the current HACMP test configuration is at 5100-03. IBM recommends that the latest level of operating system patches be installed on the nodes before going into production. The latest available patch level is 5100-04.
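A quick way to confirm the installed level on each node is shown below; treat the maintenance-level name as an assumption and substitute your own target level:

oslevel -r
instfix -ik 5100-03_AIX_ML

The first command reports the maintenance level (for example, 5100-03), and the second verifies that all filesets for that maintenance level are installed.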

Finishing the network configuration
After the operating system installation (and patching) has been completed, all the network adapters should be reviewed for accuracy.

Creating the TTY device within AIX
The creation of a tty device on each node is required for the serial heartbeat. This is done through the SMIT interface (it must be run by root). At this point, you can connect your serial cable (null modem cable).

Note: If you connect your cable before you define your device, your graphical display may not work because the boot process will see a device connected to the serial port and assume it is a terminal.

Use the following SMIT path to create the TTY device within AIX:

SMIT -> Devices -> TTY -> Add a TTY -> tty rs232 Asynchronous Terminal -> sa0 Available 00-00-S1 Standard I/O Serial Port1
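If you prefer the command line, an mkdev invocation along the following lines should create an equivalent device; the parent adapter (sa0) and port (s1) are assumptions that depend on your hardware:

mkdev -c tty -t tty -s rs232 -p sa0 -w s1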

Tip: To identify the current version on the AIX node, enter:

oslevel -r

Tip: As root, run the command ifconfig -a, which will display all information about the configured adapters in the machine


Figure A-3 shows our selections.

Figure A-3 Add a TTY

Testing the heartbeat interface
To test the heartbeat interface, run the following tests.

The stty test
To test communication over the serial line after creating the tty device on both nodes, do the following:

1. On the first node, enter:

stty < /dev/ttyx

where /dev/ttyx is the newly added tty device. The command line on the first node should hang until the second node receives a return code.

2. On the second node, enter:

stty < /dev/ttyx

where /dev/ttyx is the newly added tty device.


If the nodes are able to communicate over the serial line, both nodes display their tty settings and return to the prompt.

The cat test
To perform the cat test on two nodes connected by an RS232 cable, do the following:

1. On the first node, run:

cat < /dev/ttyN

where ttyN is the tty number which RS232 is using on the first node. Press Enter. The command line on the first node should hang.

2. On the second node, run:

cat /etc/hosts > /dev/ttyN

where ttyN is the tty number which RS232 is using on the second node. Press Enter.

3. If the data is transmitted successfully from one node to another, then the text from the /etc/hosts file from the second node scrolls on the console of the first node.

Note: You can use any text file for this test, and do not need to specifically use the /etc/hosts file.

Configuring shared disk storage devices
Disk storage must be configured between both nodes. Both nodes must be able to mount the file system(s) in the same location. This file system is a non-concurrent volume because IBM Tivoli Workload Scheduler has no way of properly working with “Raw File Systems”.

Note: This is a valid communication test of a newly added serial connection before the HACMP/ES for AIX /usr/es/sbin/cluster/clstrmgr daemon has been started. This test yields different results after the daemon has been started, since this daemon changes the initial settings of the tty devices and applies its own settings. The original settings are restored when the HACMP/ES for AIX software exits.

Note: Testing of disk storage can be done (as root) by issuing the commands:

� varyonvg twsvg, mount /opt/tws

� umount /opt/tws, varyoffvg twsvg
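If the shared volume group and file system do not exist yet, the following is a minimal sketch of how they might be created; the hdisk name, file system size and jfs options are assumptions, and automatic varyon is disabled because HACMP controls when the volume group is varied on:

mkvg -y twsvg hdisk2
chvg -a n twsvg
crfs -v jfs -g twsvg -m /opt/tws -A no -a size=4194304
mount /opt/tws

Here hdisk2 stands for one of the shared SSA disks, and the size value (in 512-byte blocks) corresponds to roughly 2 GB.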


Copying installation code to shared storage
Since the machines in this cluster are not physically accessible, it is not realistic to assume you will be able to put CDs into their CD-ROMs as required in a normal installation. Therefore, it is important to copy the installation code into a central location within the cluster: a shared volume group that all cluster nodes can see.

Following is a list of the code that should be copied into this shared location:

� IBM Tivoli Workload Scheduler installation code: /opt/tws/tmp/swinst/tws_81

� IBM Tivoli Workload Scheduler patch code: /opt/tws/tmp/swinst/tws_81.patch

� IBM Tivoli Workload Scheduler Java Console (latest version): /opt/tws/tmp/swinst/javacon_1.2.x

� Tivoli Framework: /opt/tws/tmp/swinst/framework_3.7

� Tivoli Framework patch code: /opt/tws/tmp/swinst/framework_3.7b.patch

� IBM Tivoli Workload Scheduler Connector for the Framework: /opt/tws/tmp/swinst/connector_1.2

� IBM Tivoli Workload Scheduler Connector patch code: /opt/tws/tmp/swinst/connector_1.2.x.patch

� IBM Tivoli Workload Scheduler Plus Module for the Framework: /opt/tws/tmp/swinst/plusmod_1.2

� IBM Tivoli Workload Scheduler Plus Module patch code: /opt/tws/tmp/swinst/plusmod_1.2.x.patch

� HACMP installation code: /opt/tws/tmp/swinst/hacmp_4.5

� HACMP patch code: /opt/tws/tmp/swinst/hacmp_4.5.x.patch

Documentation will also be located in the same volume group so that users can easily access it.

The Adobe documentation (*.pdf) will be copied into this shared location:

� IBM Tivoli Workload Scheduler documentation: /opt/tws/tmp/docs/tws_v81

� HACMP documentation: /opt/tws/tmp/docs/hacmp_v45

Note: It is critical that all data copied up to the UNIX cluster through FTP be copied in “bin” mode. This will prevent data corruption from dissimilar nodes (for example, Windows and UNIX).


Creating user accounts
Create the user accounts (maestro and operha) after the shared disk storage is configured and tested.

� maestro

The maestro account must be created on both machines while the volume group/file system is mounted to the machine. This means mounting the file system, creating the account, un-mounting the file system, logging onto the next machine in the cluster, mounting the file system on the second machine, creating the maestro account, and then un-mounting the file system.

� operha

The operha account is an administrative account to log in with other than the maestro account (currently we are using an opermps account). The operha account is important because there are times when we need access to one or all nodes in the cluster but cannot be logged in as maestro, because the shared file system (/opt/tws) may not be mounted on that node. Also, during a failover procedure, a user logged in as maestro will create problems as the system tries to unmount the /opt/tws file system.

Creating group accounts
The Tivoli user group should be created after the shared disk storage is configured and tested (a minimal creation sketch follows the considerations below).

Keep the following in mind when creating the group accounts:

� This group was formerly known as unison.

� The Tivoli group must be associated with the creation of the maestro account.

� The Tivoli group must not be associated with the creation of the operha account.
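A minimal sketch of the group and account creation on one node follows, assuming the shared file system is mounted on that node and that maestro’s home directory lives under /opt/tws; the UID, GID and home directory values are assumptions, but they must be identical on both nodes:

mkgroup id=500 tivoli
mkuser id=501 pgrp=tivoli home=/opt/tws/maestro maestro
mkuser id=502 home=/home/operha operha
passwd maestro
passwd operha

Note that operha is deliberately given a local home directory and is not placed in the tivoli group.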

Installing IBM Tivoli Workload Scheduler software
Installation of the IBM Tivoli Workload Scheduler software (Version 8.1) must be done at this time on all nodes in the cluster, which means that if there are two nodes in the cluster, then two IBM Tivoli Workload Scheduler installations must occur.

Note: Because the users are created on both machines, they must have their userids synchronized across both machines. This is critical to the successful configuration of a cluster.

Note: You must complete the maestro user and Tivoli group before starting the installation of the IBM Tivoli Workload Scheduler software. The actual software installation can be done following the creation of the user and group on a single machine, or you can create the user and group on all nodes first, and then cycle around to install the software again (requiring you to issue the umount, varyoffvg/varyonvg and mount commands).

Patching the IBM Tivoli Workload Scheduler configuration
Patching the IBM Tivoli Workload Scheduler engine (on both the master and the FTAs) is highly recommended. The method for deploying the patch will vary among customers; some will patch manually, while others may use a software distribution mechanism.

Note: It is advised that the patching of the IBM Tivoli Workload Scheduler Master be done manually, because the IBM Tivoli Workload Scheduler Administration staff has access to the machine and you need to be very careful about the procedures that are performed, especially when dealing with the added complexities that the HACMP environment introduces.

Installing HACMP software
The installation of the HACMP software must be performed on all nodes within the HACMP Cluster (in our case, we have a two-node cluster).

Patching the HACMP software
Patching the HACMP software is critical within the HACMP environment; it is advisable to patch the HACMP system twice a year.

Whenever an HACMP upgrade occurs, it must be performed on all nodes within the HACMP Cluster. You cannot have multiple nodes within the cluster out of code synchronization for an extended period of time (IBM will not support this configuration).

Notes:

� The current version of HACMP in our environment is 4.5.

� The location for the HACMP documentation (*.pdf) should reside under the volume group (twsvg) and be located in /opt/tws/tmp/docs. These Adobe *.pdf files will be delivered during the installation of HACMP and should be copied into /opt/tws/tmp/docs so that they are easily located.

Installing the Tivoli TMR software
Installing the Tivoli TMR (or Tivoli server) must be done on all nodes in the cluster; if there are two nodes in the cluster, then two Tivoli server installations must occur. This is best done after the HACMP software is up and running, so you can install the TMR over the same intended HACMP TCP/IP Service address.

Patching the Tivoli TMR software
In contrast to the frequent patching of many TMR production environments, it is recommended that you patch your TMR to the latest code during the initial installation and then leave the TMR alone from there. Since IBM Tivoli Workload Scheduler uses the TMR solely for authentication, patching of the TMR rarely provides added benefits to the IBM Tivoli Workload Scheduler/standalone TMR configuration.

TMR versus Managed Node installation
Tivoli recommends that the TMR used to facilitate the connection of the JSC to the IBM Tivoli Workload Scheduler engine be configured as a standalone TMR, for the following reasons:

� As mentioned, the TMR associated with IBM Tivoli Workload Scheduling rarely needs maintenance applied to it. Generally speaking, this has not proven to be the case for Framework infrastructures that are supporting other applications such as monitoring and software distribution.

Having the TMR associated with IBM Tivoli Workload Scheduling separate allows for the mission-critical IBM Tivoli Workload Scheduling application to be isolated from the risks and downtime associated with patching which may be necessary for other Framework applications, but is not necessary for IBM Tivoli Workload Scheduling.

� The Framework is a critical component to the JSC GUI. Unlike monitoring, software distribution, or other applications, IBM Tivoli Workload Scheduling operations can typically tolerate very little downtime. By isolating the IBM Tivoli Workload Scheduling TMR from other Managed Nodes in the environment, different service level agreements can be established and adhered to for the environment.


In some cases, customers may decide to not follow Tivoli's recommended practice of using a dedicated TMR. In such cases, they will need to install a Tivoli Managed Node instead. Regardless of the customers’ decision, they must still install the Managed Node into the HACMP Cluster similarly to installing a TMR.

If customers require a Tivoli Endpoint on the IBM Tivoli Workload Scheduling Master, that is an optional installation procedure that they will need to perform in the HACMP Cluster. In order to save time, this installation step should be coordinated with the TMR installation.

Configuring IBM Tivoli Workload Scheduler start and stop scripts

The start and stop scripts for the IBM Tivoli Workload Scheduler application must be prepared and located on each node within the cluster. Those scripts, located in /usr/local/HACMP/scripts on each machine, are called:

� tws_mdm_up.ksh

� tws_mdm_down.ksh

Keep the following in mind when configuring the IBM Tivoli Workload Scheduler start and stop scripts (a minimal sketch of the scripts follows this list):

� The start and stop scripts must not be located within the shared disk volume. The HACMP verification mechanism will flag this as an error.

� This particular location is consistent with other HACMP installations that reside within the IBM England North Harbor Data Center.

� The start and stop scripts should be tested for their functionality before HACMP integration begins.
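The exact contents of these scripts are site-specific; the following is only a minimal sketch, assuming that maestro’s home directory is the TWShome directory /opt/tws/maestro and that maestro’s .profile puts conman on the PATH:

#!/usr/bin/ksh
# /usr/local/HACMP/scripts/tws_mdm_up.ksh (sketch)
su - maestro -c "./StartUp"            # start netman
su - maestro -c "conman 'start'"       # start the IBM Tivoli Workload Scheduler engine
exit 0

#!/usr/bin/ksh
# /usr/local/HACMP/scripts/tws_mdm_down.ksh (sketch)
su - maestro -c "conman 'unlink @;noask'"   # unlink remote workstations
su - maestro -c "conman 'stop;wait'"        # stop the engine and wait for it to come down
su - maestro -c "conman 'shut;wait'"        # shut down netman
exit 0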

Configuring miscellaneous start and stop scripts
Following the creation of the IBM Tivoli Workload Scheduler start and stop scripts, it is likely that there will be other applications that will need to be included in the HACMP design. Examples of applications that might be included in the IBM Tivoli Workload Scheduler HACMP design are:

� Apache Web Services
� DB2®
� TSM (Tivoli Storage Manager – for data backups)


Creating and modifying various system files
You will need to create or modify various files within this configuration; these files are required in order for IBM Tivoli Workload Scheduler and HACMP to work properly:

� /etc/hosts
� root’s .rhosts file (needed for HACMP communications)
� maestro’s .profile file
� root’s .profile file
� operha / opermps .profile file

Configuring the HACMP environment
After the IBM Tivoli Workload Scheduler start and stop scripts have been developed and tested, you can begin your HACMP configuration.

You will need to configure the following:

� Cluster Definition (Cluster ID)
� Cluster nodes (all nodes in the cluster)
� Cluster adapters (TCPIP network adapters)
� Cluster adapters (Non-TCPIP - Serial Heartbeat)
� Define Application Servers (IBM Tivoli Workload Scheduler start and stop script references)
� Define Resource Groups (IBM Tivoli Workload Scheduler Resource Group)
� Synchronize Cluster Topology
� Synchronize Cluster Resources
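These definitions are normally entered through the SMIT menus. As a hedged pointer only, the SMIT fast path and the display utilities that we used elsewhere in this appendix are:

smitty hacmp
/usr/es/sbin/cluster/utilities/cllscf
/usr/es/sbin/cluster/utilities/clshowres -g twsmdmrg

The first command opens the top-level HACMP configuration menus, and the other two display the resulting cluster topology and resource group definition (see Example A-1 and Example A-2 later in this appendix).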

Note: The creation of these start and stop scripts can occasionally be rather complicated, especially when the application is expected to run under an HACMP environment, so it is useful to have the subject matter expert for the application available, as well as a contact that can provide UNIX startup and shutdown shell scripts for that application.

Testing the failover procedure
Testing the HACMP failover is a procedure that can take several days, depending upon the complexity of the configuration. The configuration that we test here has no complicated failover requirements, but it must still be tested and understood. As we gain further experience in this area, we will begin to understand and tune both our HACMP environment and its test procedures.

Figure A-4 shows our implementation environment in detail.

Figure A-4 Our environment in more detail

The details of the specific configurations in our IBM Tivoli Workload Scheduler HACMP environment are described in the following sections.

HACMP Cluster topology
Example A-1 on page 587 shows our HACMP Cluster topology.


Example: A-1 /usr/es/sbin/cluster/utilities/cllscf > cl_top.txt

Cluster Description of Cluster tws
Cluster ID: 71
There were 2 networks defined : production, serialheartbeat
There are 2 nodes in this cluster.

NODE tehnigaxhasa01:
    This node has 2 service interface(s):

    Service Interface emeamdm:
        IP address:        9.149.248.77
        Hardware Address:
        Network:           production
        Attribute:         public
        Aliased Address?:  Not Supported

    Service Interface emeamdm has 1 boot interfaces.
        Boot (Alternate Service) Interface 1: tehnigaxhasa01
        IP address:        9.149.248.72
        Network:           production
        Attribute:         public

    Service Interface emeamdm has 1 standby interfaces.
        Standby Interface 1: ha01stby
        IP address:        9.149.248.113
        Network:           production
        Attribute:         public

    Service Interface nodetwo:
        IP address:        /dev/tty1
        Hardware Address:
        Network:           serialheartbeat
        Attribute:         serial
        Aliased Address?:  Not Supported

    Service Interface nodetwo has no boot interfaces.
    Service Interface nodetwo has no standby interfaces.

NODE tehnigaxhasa02:
    This node has 2 service interface(s):

    Service Interface tehnigaxhasa02:
        IP address:        9.149.248.74
        Hardware Address:
        Network:           production
        Attribute:         public
        Aliased Address?:  Not Supported

    Service Interface tehnigaxhasa02 has no boot interfaces.

    Service Interface tehnigaxhasa02 has 1 standby interfaces.
        Standby Interface 1: ha02stby
        IP address:        9.149.248.114
        Network:           production
        Attribute:         public

    Service Interface nodeone:
        IP address:        /dev/tty1
        Hardware Address:
        Network:           serialheartbeat
        Attribute:         serial
        Aliased Address?:  Not Supported

    Service Interface nodeone has no boot interfaces.
    Service Interface nodeone has no standby interfaces.

Breakdown of network connections:

Connections to network production
    Node tehnigaxhasa01 is connected to network production by these interfaces:
        tehnigaxhasa01
        emeamdm
        ha01stby

    Node tehnigaxhasa02 is connected to network production by these interfaces:
        tehnigaxhasa02
        ha02stby

Connections to network serialheartbeat
    Node tehnigaxhasa01 is connected to network serialheartbeat by these interfaces:
        nodetwo

    Node tehnigaxhasa02 is connected to network serialheartbeat by these interfaces:
        nodeone

HACMP Cluster Resource Group topology
Example A-2 shows our HACMP Cluster Resource Group topology.

Example: A-2 /usr/es/sbin/cluster/utilities/clshowres -g'twsmdmrg' > rg_top.txt

Resource Group Name                          twsmdmrg
Node Relationship                            cascading
Site Relationship                            ignore
Participating Node Name(s)                   tehnigaxhasa01 tehnigaxhasa02
Dynamic Node Priority
Service IP Label                             emeamdm
Filesystems                                  /opt/tws
Filesystems Consistency Check                fsck
Filesystems Recovery Method                  sequential
Filesystems/Directories to be exported       /opt/tws
Filesystems to be NFS mounted                /opt/tws
Network For NFS Mount
Volume Groups                                twsvg
Concurrent Volume Groups
Disks
GMD Replicated Resources
PPRC Replicated Resources
Connections Services
Fast Connect Services
Shared Tape Resources
Application Servers                          twsmdm
Highly Available Communication Links
Primary Workload Manager Class
Secondary Workload Manager Class
Miscellaneous Data
Automatically Import Volume Groups           false
Inactive Takeover                            false
Cascading Without Fallback                   true
SSA Disk Fencing                             false
Filesystems mounted before IP configured     false
Run Time Parameters:
Node Name                                    tehnigaxhasa01
Debug Level                                  high
Format for hacmp.out                         Standard
Node Name                                    tehnigaxhasa02
Debug Level                                  high
Format for hacmp.out                         Standard

ifconfig -a
Example A-3 shows the output of ifconfig -a in our environment.

Example: A-3 ifconfig -a output

Node01$ ifconfig -a
en0: flags=e080863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 9.164.212.104 netmask 0xffffffe0 broadcast 9.164.212.127
en1: flags=4e080863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG>
        inet 9.149.248.72 netmask 0xffffffe0 broadcast 9.149.248.95
en2: flags=7e080863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD,CHECKSUM_SUPPORT,PSEG>
        inet 9.149.248.113 netmask 0xffffffe0 broadcast 9.149.248.127
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>

Node02$ ifconfig -a
en0: flags=e080863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>
        inet 9.164.212.105 netmask 0xffffffe0 broadcast 9.164.212.127
en1: flags=4e080863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,PSEG>
        inet 9.149.248.74 netmask 0xffffffe0 broadcast 9.149.248.95
en2: flags=7e080863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD,CHECKSUM_SUPPORT,PSEG>
        inet 9.149.248.114 netmask 0xffffffe0 broadcast 9.149.248.127
lo0: flags=e08084b<UP,BROADCAST,LOOPBACK,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT>

Skills required to implement IBM Tivoli Workload Scheduling/HACMP

There are many skills needed to place such a system into production, and it is unlikely that any one person will perform this complex task alone. A large environment requiring this type of solution generally has specialists that administer various technology sectors. Therefore, it is critical that all participants become involved early in the design process so that there are minimal delays in implementing the project.

It should be noted that while this particular exercise was specific to an IBM Tivoli Workload Scheduler/HACMP integration, the complexity and involvement needed would be no different were this a design utilizing HP Service Guard or Sun Cluster to provide the high availability needed in a UNIX-based architecture other than AIX.

Following is a summary of the roles and skill levels needed for this effort.

Networking Administration team
The networking team must have ample time to prepare the network switches and segments for an HACMP Cluster design. They may need to supply multiple network drops at the data center floor location. Since a large HACMP configuration may require six or more network drops, there may also be a need to purchase additional switches or blades.

The skill set for these activities is medium to high. It is likely that several members of a networking team would be involved in these activities.

Required time for activity: 2 to 5 days


AIX Administration team
This team is responsible for the following tasks:

� General setup of the RS/6000 and the AIX operating systems within the cluster

� Patching AIX operating systems

� DASD configuration

� Configuring and testing of the serial heart beat cable at the OS level

� Network configuration and connectivity testing

� Possibly assisting with the HACMP and IBM Tivoli Workload Scheduling installation

The skill level for these tasks is high, and is best performed by an AIX administrator/specialist.

Required time for activity: 5 to 15+ days

HACMP Administration team
The HACMP administration team is responsible for the daily operations of the HACMP environment. Many large customers will have a team dedicated to maintaining these complicated HACMP Clusters. Some of the duties they perform are installations, upgrades, troubleshooting and tuning. It is not unusual to find them having strong AIX skills, and their duties may overlap into AIX administration.

The required skill level for these types of activities is high. The whole purpose of this environment is to provide a highly available 24-hour, 365-day a year operation. HACMP administrators having no training and a minimal skill level place the HACMP system, the application and the business at risk. Therefore, training for HACMP (or any clustering product) is required. Training for seasoned HACMP administrators is also suggested as HACMP has seen significant changes over the last several revisions.

Required time for activity: 10 to 15 days, and ongoing support

Tivoli Framework Administration team
In larger shops, there may exist a Framework team that would install the TMRs (or Managed Nodes, if you decide against a dedicated TMR) for you. This team would need to be aware that, although it is performing multiple installations of a TMR, this effort must be coordinated with the HACMP administrators.


The required skill level for this activity is medium to high. Administrators may have procedures that will make the installation more efficient.

Required time for activity: 10 to 15 days, and ongoing support

IBM Tivoli Workload Scheduler Administration team
The IBM Tivoli Workload Scheduler administration team may be well versed in the installation of the IBM Tivoli Workload Scheduler code (and patches) into the cluster. Otherwise, this task might be handled by the AIX administrators.

The skill level for this type of configuration is high. This is a process requiring a thorough understanding of the following areas:

� The IBM Tivoli Workload Scheduling application and its recommended installation procedures

� The AIX operating system

� RAID levels and file system tuning configurations

� Fundamental understanding of the HACMP environment (which introduces complexities into the normal IBM Tivoli Workload Scheduling application installation)

Required time for activity: 3 to 5 days

Hardware Purchasing Agent
This resource is responsible for purchasing all RS/6000 and AIX-related hardware, software, cables, storage cabinets, DASD, null modem serial cables, additional TCP/IP network switches and other hardware components required for the IBM Tivoli Workload Scheduler/HACMP implementation.

The skill level for this activity is estimated to be low to medium. IBM sales has resources that are capable of quickly generating a robust configuration based on a customer's general hardware requirements.

Required time for activity: 1 to 2 days

Data Center Manager
The tasks that are performed and coordinated by the data center management team can vary greatly. Tasks that need to be coordinated are floor space allocation and various procedures for placing machines into production. They also coordinate with other personnel such as electricians, HVAC specialists, and maintenance teams, who may need to prepare or reinforce the raised floor structure for the new system being delivered.


While the estimated technical skill level of this activity is low, it is an effort requiring a great deal of coordination skills. These activities can be time-consuming and need to be coordinated properly; otherwise, they will negatively impact the implementation schedule.

Required time for activity: 2 to 3 days

Electrical Engineers
Tasks performed by a licensed engineer typically deal with potentially hazardous high voltage situations.

The skill level for this type of activity is high. As this is a specialized trade, it should not be performed by anyone other than a licensed engineer.

Required time for activity: 1 to 2 days

HVAC Engineers
Heating, ventilation and air conditioning configurations are generally installed in large data centers before any equipment is ever delivered onto the data center floor. As the data center equipment population grows, however, cooling requirements should be reviewed as new equipment is placed on the data center floor.

The skill level for these types of activities is high. As this is a specialized trade, it should not be performed by anyone other than a licensed engineer.

Required time for activity: 1 to 3 days

Service Engineers
IBM Service Engineers (SEs) are responsible for installing and testing the base functionality of the RS/6000 and possibly the base AIX operating system. The SE may also consult with the customer and assist in such activities as:

� SSA adapter configurations and tuning

� SSA Raid configurations and tuning

� TCP/IP network configurations and tuning

The skill level for these installation activities is high. The IBM Service Engineer is a resource that is critical to a properly installed cluster configuration (for example, if a cable were improperly installed, you would inadvertently witness false HACMP takeovers).

Required time for activity: 2 to 3 days


Backup Administration team
This team provides the vital service of integrating the HACMP solution into the backup configuration. In the case of this effort, a TSM client was installed into the configuration and the cluster is backed up nightly. This team is also responsible for providing assistance with disaster recovery testing, adding one more level of security to the complete environment.

The skill level for any enterprise backup solution is high. Large backup environments require personnel who are trained and specialized in a very critical business activity.

Required time for activity: 1 to 2 days

Observations and questions
In this section we offer our observations, together with questions and answers related to our implementation.

Observation 1
HACMP startup does not occur until both cluster nodes are running HACMP. After rebooting both nodes, we started the HACMP services on Node01 first and checked whether IBM Tivoli Workload Scheduler had started. But after 10 or 15 minutes, IBM Tivoli Workload Scheduler still had not started.

After waiting for some time, we started the HACMP services on Node02. Shortly after Node02 started its HACMP services, we saw the IBM Tivoli Workload Scheduler application come up successfully on Node01. We have placed UNIX “wall” commands in the IBM Tivoli Workload Scheduler startup (and shutdown) scripts, so we will see exactly when these IBM Tivoli Workload Scheduler-related scripts are invoked.

Question
Our environment is a two-node cluster dedicated to running the IBM Tivoli Workload Scheduler process tree (the second node in the cluster sits idle). Therefore, wouldn’t it make sense for us to start the HACMP IBM Tivoli Workload Scheduler Resource Group as soon as possible, regardless of which node comes up first?

Answer
Yes, and that is normal. Your cascading config, as far as having node priority, is listed to have it start on Node01.


Question
If this is acceptable (and advisable), exactly how is the HACMP configuration modified to accomplish this goal?

Answer
The reason you are dependent on the second node starting is probably related to how your NFS is set up. You can leave the file system as a local mount and export it, but do not NFS-mount it back to itself.

Observation 2
During the startup of the HACMP Cluster, the connection to Node01 is lost. What occurs during this procedure is that the IP address on the Ethernet adapter is migrated to the EMEAMDM Service address (9.149.248.77). During this migration, your connection is broken and you must reconnect to the machine through the EMEAMDM address.

Question
Does the addition of a third IP address (IP aliasing) resolve this issue?

Answer
Yes. You would set up what is called a node alias, and probably also change your topology configuration so that the boot and standby adapters are both boot adapters. This would implement IP address takeover via aliasing (which would also be fast).

However, a node alias by itself may not resolve this if it comes up on the boot adapter, which we believe is normal. So we think you would want to implement both a node alias and IPAT via aliasing.

Question
Would this third IP address require an additional Ethernet adapter?

Answer
No, it does not.

Question
Would this third IP address need to be in a different subnet from the other two addresses?

Answer
Yes, it would. Here is what to do: change your standby adapter to be a type "boot" adapter, and change your existing boot adapter(s) to be on a different subnet than your service adapter subnet. This will give you a total of three subnets in use.


Then you can create a node alias, which can be on the same subnet as the service address; it is actually quite normal to do so.

Figure A-5 shows a generic example of a topology configuration with IPAT via aliasing and the node alias, which is listed as persistent. This configuration requires a minimum of three subnets. The persistent address and service addresses can be on the same subnet (which is normal) or on separate subnets. This is also true when using multiple service addresses.

(This example shows mutual takeover, which means that node B also falls over to node A, so Service 1b does not apply in your case; but it should give you the idea.)

Figure A-5 IPAT via aliasing topology example. The addresses shown in the figure are:

Node A: Boot 1a 10.10.1.9, Boot 2a 10.10.2.9, Persistent 9.19.163.12, Service 1a 9.19.163.15
Node B: Boot 1b 10.10.1.10, Boot 2b 10.10.2.10, Persistent 9.19.163.13, Service 1b 9.19.163.25
Netmask: 255.255.255.0

Observation 3
During the failover process from Node01 to Node02, the service address on Node02 (9.149.248.74) remains unchanged, while the standby adapter (EN2 - 9.149.248.114) is migrated to the EMEAMDM service address (9.149.248.77). (In contrast, when HACMP services are started, we do get disconnected from the primary adapter in Node01, which is what we expected.)

In this configuration, when we telnet into the EN1 adapters (9.149.248.72 and 9.149.248.74) on both machines, we do not get disconnected from the machine during the failover process.

Question
Is this behavior expected (or desired)?


Answer
This is normal when doing traditional IPAT and one-sided takeover, because fallover of a service address always moves it to the standby adapter, either locally for a NIC failure or remotely on a system failure. If you implemented aliasing, you would not see any significant difference.

Question
Is this situation something we would like to see our Node01 do? (For example, have the secondary adapter (EN3) switch over to the EMEAMDM Service address, while EN2 (9.149.248.72) remains untouched and essentially acts as the backup Ethernet adapter.)

Answer
You could see the desired results if you implement aliasing.

Observation 4
Upon starting the HACMP Services on the nodes, we see content like that shown in Example A-4 in our smit logs.

Example A-4 smit logs

Oct 17 2003 20:56:39
Starting execution of /usr/es/sbin/cluster/etc/rc.cluster
with parameters: -boot -N -b
0513-029 The portmap Subsystem is already active.
Multiple instances are not supported.
0513-029 The inetd Subsystem is already active.
Multiple instances are not supported.
Oct 17 2003 20:56:51
Checking for srcmstr active...
Oct 17 2003 20:56:51
complete.
 23026 - 0:00 syslogd
Oct 17 2003 20:56:52
/usr/es/sbin/cluster/utilities/clstart : called with flags -sm -b
0513-059 The topsvcs Subsystem has been started. Subsystem PID is 20992.
0513-059 The grpsvcs Subsystem has been started. Subsystem PID is 17470.
0513-059 The grpglsm Subsystem has been started. Subsystem PID is 20824.
0513-059 The emsvcs Subsystem has been started. Subsystem PID is 19238.

Question
Are the statements outlined in bold (the 0513-029 messages) normal?

Answer
Yes, especially after starting the first time. These services are started by HA on Node01, and by reboot on Node02. When stopping HA, it does not stop these particular services, so this is fine.


Observation 5
When attempting to test failover on the cluster, never be logged in as the maestro user. Because this user's home file system resides in the shared volume group (twsvg, mounted as /opt/tws), we will most likely have problems with:

- The cluster failing to complete the fallover, because it will not be able to unmount the file system

- Possible corruption of a file or the file system
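
A quick pre-check before a fallover test can catch this. The following is a sketch only, assuming the maestro user and the /opt/tws mount point used in this environment:

#!/bin/ksh
# Hedged sketch: warn if maestro is logged in or if processes hold files open under /opt/tws.
if who | grep -w maestro >/dev/null; then
    echo "WARNING: maestro is logged in; log the user off before testing fallover."
fi
# fuser -cu lists the processes (with owning users) that are using the mounted file system.
fuser -cu /opt/tws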

Observation 6
The failover of the HACMP Cluster seems to work fine. We decided to benchmark the failover timings:

- Shutdown of HA services on Node1 - Wed Oct 22 17:45:51 EDT 2003

- Startup of HA services on Node2 - Wed Oct 22 17:47:37 EDT 2003

Result: a failover benchmark of approximately 106 seconds.

The test is performed as follows. Have a machine that is external to the cluster prepared to ping emeamdm (9.149.248.77). This machine is called doswald.pok.ibm.com (you will need two terminals open to this machine).

1. In the first terminal, enter the UNIX “date” command (do not press Enter).

2. In the second terminal, enter the UNIX command ping 9.149.248.77 (do not press Enter).

3. Have terminals open to both nodes in the cluster. (We had both nodes in the cluster running the HACMP services, with the IBM Tivoli Workload Scheduler Resource Group running on Node1.)

Node1 must be positioned at the screen seen when selecting smit hacmp -> Cluster Services -> Stop Cluster Services, with "shutdown mode = takeover" selected (you will press Enter only one time, in step 5).

4. In the first terminal, from doswald, press Enter. This will give you the begin time of the cluster failover.

5. Very quickly go back to node1, and press Enter. This will start the cluster failover.

6. In the second terminal, from doswald, press Enter. This will execute the ping command.

7. In the first terminal, from doswald, enter the UNIX date command again (do not press Enter).

8. Wait for the ping command to resolve. Then press Enter for the final date command.


9. Subtract the first date command results from the second date command results.
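
The same measurement can be scripted from the external machine so that no manual date commands are needed. The following sketch assumes a ping that supports the -c and -w options and a date that supports %s (as on Linux); on other platforms, substitute the equivalent options. Like the manual test, it measures only the IP-level outage of the EMEAMDM service address:

#!/bin/ksh
# Hedged sketch: time the outage of the service address during a fallover,
# run from a machine outside the cluster (for example, doswald.pok.ibm.com).
ADDR=9.149.248.77                      # EMEAMDM service address in this environment

# Wait until the address stops answering (the fallover has begun).
while ping -c 1 -w 2 $ADDR >/dev/null 2>&1; do sleep 1; done
START=$(date +%s)
echo "Service address went down at $(date)"

# Wait until the address answers again (IP takeover has completed).
until ping -c 1 -w 2 $ADDR >/dev/null 2>&1; do sleep 1; done
END=$(date +%s)
echo "Service address came back at $(date)"
echo "IP-level outage: $((END - START)) seconds"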

Question
Does 106 seconds sound like a reasonable amount of time?

Answer
It does sound reasonable. However, the overall time should be measured from the instant of failure until the application is up and running and accepting user connections on the other machine. You seem to be testing only IP connectivity time. You should also test via a hard failure, meaning halt the system.

Question
Would the addition of another IP address possibly improve this failover time of 106 seconds?

Answer
Only implementing IPAT via aliasing should improve this time (by perhaps a few seconds).

Question
Would the addition of another IP address require another physical Ethernet card?

Answer
No.


Appendix B. TMR clustering for Tivoli Framework 3.7b on MSCS

In this appendix, we provide step-by-step instructions on how Tivoli Management Framework 3.7b was configured in a high availability environment on Windows 2000. We guide you through the steps needed to install and configure the TMR.

In this environment, the Windows server is configured with Windows 2000 Advanced Server Edition SP and is running the Microsoft Cluster Manager.


Setup
The setup shown in Table B-1 was used during Windows TMR installation. The cluster includes physical nodes SJC-TDB-01 and SJC-TDB-02, with a virtual node named tivoli-cluster. The shared resource that is configured to fail over is defined as drive D:.

Table B-1 Installation setup

Hostname         IP address       Description
SJC-TDB-01       10.254.47.191    Physical node
SJC-TDB-02       10.254.47.192    Physical node
tivoli-cluster   10.254.47.190    Virtual node

Configure the wlocalhost
Framework 3.7b for Windows does not read the /etc/wlocalhosts file or the wlocalhosts environment variable. Instead, with Framework 3.7b, there is a wlocalhost command that is used to configure the value of the wlocalhost. The command will create the localhost registry key in the HKEY_LOCAL_MACHINE\SOFTWARE\Tivoli\Platform registry path.

If you have installed Framework on another Windows machine, you can copy the $BINDIR/bin/wlocalhost binary from another machine and run it locally to set this value. The syntax we used to set the value of the wlocalhost was “wlocalhost tivoli-cluster”. If you are installing Framework for the first time on the Windows platform, you can manually create this value using regedit.

Install Framework on the primary node
After the wlocalhost is set, the next step is to install Framework on the primary node. This is done by using the same procedures that are provided in the 3.7 Installation guide; the only exception is that you will want to specify the installation directory to be the shared drive (in our case, it is D:\tivoli).

Once Framework is installed, open a command prompt and run the odadmin odlist command to verify that the oserv is bound to the virtual IP and hostname defined by the wlocalhost command. Restart the primary node to register the tivoliap.dll.


Install Framework on the secondary node
Prior to installing Framework 3.7b on the secondary node, you will need to open the Cluster Manager and initiate a failover. Once the failover has occurred, you will need to delete the %DBDIR% directory and set the wlocalhost on the secondary node. If all went well during the installation on the primary node, you will be able to find the wlocalhost binary in the %BINDIR%/bin directory.

After the %DBDIR% has been removed and the wlocalhost has been set, you can install Framework on the secondary node. The Framework installation should be identical to the installation on the primary node, with the installation directory being on the shared drive (D:\tivoli). After the installation, run the odadmin odlist command to verify that the oserv is bound to the virtual IP and hostname. Restart the secondary node, if it has not already been restarted.
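
From a Windows command prompt, the preparation of the secondary node might look like the following sketch. The SJC-TDB-02.db directory name and the w32-ix86 interpreter directory are assumptions based on this setup (at this point the database directory is still named after the primary node); verify the actual %DBDIR% and %BINDIR% values before deleting anything:

REM Hedged sketch: prepare the secondary node after failing the shared drive over to it.
REM Verify these paths first; they are assumptions based on this environment.
rmdir /s /q "D:\tivoli\db\SJC-TDB-02.db"
"D:\tivoli\bin\w32-ix86\bin\wlocalhost" tivoli-cluster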

Configure the TMR
Follow these steps, in the order specified, to configure the TMR.

Set the root administrator's login
When installing Framework, a default high-level administrator is created that is named Root_SJC-TDB-02. This administrator is, by default, bound to a login at the hostname where the TMR was installed. In order to log in to the Tivoli Desktop, you need to modify the login so the user will be able to log in at the virtual host.

First, open a command prompt and run the following command to set up an alias to allow the root user to log in:

odadmin odlist add_hostname_alias 1 10.254.47.190 SJC-TDB-02

Once the alias has been set, log in to the desktop and set the login with the appropriate hostname. Then use the following command to remove the alias:

odadmin odlist del_hostname_alias 1 10.254.47.190 SJC-TDB-02

Force the oserv to bind to the virtual IP
In order for the oserv to work properly, you need to bind it to the virtual IP address. This can be done with the following command:

odadmin set_force_bind TRUE 1


Change the name of the DBDIR
When Framework is installed, it will still point to SJC-TDB-02.db for the DBDIR, regardless of whether or not the wlocalhost is set. To resolve this, manually rename the DBDIR directory from SJC-TDB-02.db to tivoli-cluster.db.

Modify the setup_env.cmd and setup_env.sh
Next, modify the c:\winnt\system32\drivers\etc\tivoli\setup_env.* files that are used to set up the environment variables. Because Framework on Windows installs the DBDIR in the <hostname>.db directory instead of in the <virtual hostname>.db directory, you need to open a text editor and change the directories that the environment variables point to, replacing all references to SJC-TDB-02.db with tivoli-cluster.db.

Once this is done, copy the modified setup_env.cmd and setup_env.sh to the c:\winnt\system32\drivers\etc\tivoli directory on both nodes.
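
If you would rather script the substitution than edit the files by hand, something like the following can be run from the bash shell provided with the Windows Framework installation. The /c/winnt/... path mapping is an assumption about that shell, as is the availability of sed in it; adjust the path, and review the resulting files, before copying them to both nodes:

# Hedged sketch: replace the hostname-based DBDIR with the virtual one in both files.
cd /c/winnt/system32/drivers/etc/Tivoli
for f in setup_env.cmd setup_env.sh; do
  cp "$f" "$f.orig"                                   # keep a backup of the original file
  sed 's/SJC-TDB-02\.db/tivoli-cluster.db/g' "$f.orig" > "$f"
done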

Configure the registry
There are two places to modify in the Windows registry when Tivoli is installed. You can modify these locations by using the regedit command.

- The first place to modify is under the HKEY_LOCAL_MACHINE\SOFTWARE\Tivoli\Platform\oserv94 path. You will need to modify the Service directory key and the Database directory key to point to the new D:\tivoli\db\tivoli-cluster.db directory, instead of to the SJC-TDB-02.db directory.

- The second place to modify is where the oserv service looks for the oserv.exe executable; the location in the registry is HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\oserv. You will only need to modify the path to the oserv.exe so that it points to "d:\tivoli\db\tivoli-cluster.db".

The modifications will have to be made on both primary and secondary nodes.

Rename the Managed Node
The TMR's Managed Node, which was created during the installation of Tivoli, was named after the hostname instead of the virtual hostname. This is not necessarily a problem, since the oserv is bound to the virtual hostname and IP. To maintain consistency, however, in our case we opted to rename the ManagedNode to the name of the virtual hostname.


This was done with the following command from the Windows bash shell:

MN=`wlookup -r ManagedNode SJC-TDB-02`
idlcall $MN _set_label '"tivoli-cluster"'

If you perform this task, run the wlookup -ar tivoli-cluster command afterward to verify that the rename was successful.

Rename the TMR
The default name of the TMR when it was installed on Windows was still SJC-TDB-01-region instead of tivoli-cluster-region. This is not a problem, but to maintain consistency we renamed the TMR using the following command:

wtmrname <virtual hostname>-region

If you perform this task, verify the result of the command by running the wtmrname command and check that the output shows tivoli-cluster-region.

Rename the top-level policy region
When the Framework was installed, it created a top-level policy region called SJC-TDB-02-region. This is not a problem, but to maintain consistency we chose to rename the region.

This can be done from the Tivoli Desktop by right-clicking the SJC-TDB-02-region icon on the root administrator's desktop and selecting Properties. Once the Properties dialog is open, you can change the name to "tivoli-cluster-region", then click Set & Close to activate the changes.

We chose to change the name of the top-level policy region from the command line by using the following command:

PR=`wlookup -r PolicyRegion SJC-TDB-02-region`
idlcall $PR _set_label '"tivoli-cluster-region"'

If you perform this task, run the following command to verify the change:

wlookup -r PolicyRegion tivoli-cluster-region

Rename the root administrator
The default Tivoli administrator that was created was named Root_SJC-TDB-02-region. This is not a problem, but for consistency we chose to change the name to Root_tivoli-cluster-region.

This was done from the Tivoli Desktop by opening the administrator's window, right-clicking the Root_SJC-TDB-02-region administrator, and selecting Properties. Once the Properties window was open, we modified the name to Root_tivoli-cluster-region. If you perform this task, click Save & Close, and the configuration is complete.

Configure the ALIDB
When Tivoli was installed, the ALIDB was set to SJC-TDB-02.db; this is an internal value that is hardcoded into the Tivoli object database. To change this value, we had to output the sequence list to a file, modify the file, and then re-import the sequence list. To get the sequence list, we ran the following command from a bash shell:

MN=`wlookup -r ManagedNode tivoli-cluster`
idlcall $MN _get_locations > c:/locations.txt

We opened the c:\locations.txt file with a text editor and changed all occurrences of SJC-TDB-02 to tivoli-cluster. When the editing was complete, we re-imported the sequence list using the following command:

idlcall $MN _set_locations < c:/locations.txt

If you perform this task, once the value is set you should be able to install software successfully.
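
For reference, the manual edit of c:\locations.txt can also be done non-interactively with sed from the same bash shell. This is simply a scripted version of the steps above, writing to a second file so that the original output is preserved:

# Hedged sketch: scripted version of the ALIDB change described above.
MN=`wlookup -r ManagedNode tivoli-cluster`
idlcall $MN _get_locations > c:/locations.txt
sed 's/SJC-TDB-02/tivoli-cluster/g' c:/locations.txt > c:/locations.new
idlcall $MN _set_locations < c:/locations.new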

Create the cluster resources
We followed these steps to create the cluster resources.

Create the oserv cluster resource
In order for the oserv service to fail over, we created a resource in the cluster manager for both oserv services. We opened the cluster manager first on the primary node, and then on the secondary node. We right-clicked the cluster group and selected New Resource. We defined the oserv as a Generic Service and added the required information.

Create the trip cluster resource
The trip service is required for the oserv to operate correctly, so we also had to create a resource for it in the cluster manager. We opened the cluster manager on either the primary or secondary node, right-clicked the cluster group, and selected New Resource. We defined trip as a Generic Service and added the required information.


Set up the resource dependencies
To set up the resource dependencies, right-click the oserv resource and set it so that the virtual hostname, virtual IP, quorum disk, shared disk, and trip are set as dependencies. Without these dependencies, the oserv could possibly get into an infinite failover loop.
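
In principle, the same resources and dependencies can be created with the cluster.exe command-line utility instead of the Cluster Administrator GUI. The following is only a sketch: the /create, /type, /priv, and /adddep options, the "Cluster Group" name, the Windows service names, and the names of the disk and network name resources are all assumptions for this environment, so check cluster /? and your own resource names before using anything like it:

REM Hedged sketch only; verify option names and resource names on your cluster first.
cluster resource "trip"  /create /group:"Cluster Group" /type:"Generic Service"
cluster resource "trip"  /priv ServiceName="trip"
cluster resource "oserv" /create /group:"Cluster Group" /type:"Generic Service"
cluster resource "oserv" /priv ServiceName="oserv"
REM Make oserv depend on the network name, shared disk, and trip resources.
cluster resource "oserv" /adddep:"tivoli-cluster"
cluster resource "oserv" /adddep:"Disk D:"
cluster resource "oserv" /adddep:"trip"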

Validate and backup
Follow these steps to validate and back up your configuration.

Test failover
Open the cluster manager, initiate a failover, and verify that the oserv service starts on each node. If the failover works, bring down the oserv on each node and verify that the cluster fails over successfully. If these tests and the backup of the Tivoli databases (described next) succeed, you have successfully installed Framework 3.7b on a Windows cluster.

Back up the Tivoli databases
This is the most important part of the installation: if all the validation tests are positive, back up your Tivoli databases by running the wbkupdb command.


Abbreviations and acronyms

AFS Andrew File System

AIX Advanced Interactive Executive

APAR authorized program analysis report

API Application Program Interface

BDM Backup Domain Manager

BMDM Backup Master Domain Manager

CLI command line interface

CMP cluster multi-processing

CORBA Common Object Request Broker Architecture

CPU ITWS workstation

CWOF cascading without fallback

DHCP Dynamic Host Configuration Protocol

DM Domain Manager

DNS Domain Name System

ESS IBM TotalStorage Enterprise Storage Server

FTA Fault Tolerant Agent

FTP File Transfer Protocol

HA high availability

HACMP High Availability Cluster Multi-Processing

HAGEO High Availability Geographic Cluster system

HCL Hardware Compatibility List

IBM International Business Machines Corporation

IP Internet Protocol

IPAT IP Address Takeover


ITSO International Technical Support Organization

ITWS IBM Tivoli Workload Scheduler

JFS Journaled File System

JSC Job Scheduling Console

JSS Job Scheduling Services

JVM Java Virtual Machine

LCF Lightweight Client Framework

LVM Logical Volume Manager

MDM Master Domain Manager

MIB Management Information Base

MSCS Microsoft Cluster Service

NFS Network File System

NIC Network Interface Card

ODM Object Data Manager

PERL Practical Extraction and Report Language

PID process ID

PTF program temporary fix

PV physical volume

PVID physical volume id

RAM random access memory

RC return code

SA Standard Agent

SAF System Authorization Facility

SAN Storage Area Network

SMIT System Management Interface Tool

SNMP Simple Network Management Protocol

SSA Serial Storage Architecture


SCSI Small Computer System Interface

STLIST standard list

TCP Transmission Control Protocol

TMA Tivoli Management Agent

TMF Tivoli Management Framework

TMR Tivoli Management Region

TRIP Tivoli Remote Execution Service

X-agent Extended Agent


Related publications

The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this Redbook.

IBM Redbooks
For information on ordering these publications, see "How to get IBM Redbooks" on page 613. Note that some of the documents referenced here may be available in softcopy only.

- High Availability Scenarios for Tivoli Software, SG24-2032

- IBM Tivoli Workload Scheduler Version 8.2: New Features and Best Practices, SG24-6628

Other publications
These publications are also relevant as further information sources:

- Tivoli Workload Scheduler Version 8.2, Error Message and Troubleshooting, SC32-1275

- IBM Tivoli Workload Scheduler Version 8.2, Planning and Installation, SC32-1273

- Tivoli Workload Scheduler Version 8.2, Reference Guide, SC32-1274

- Tivoli Workload Scheduler Version 8.2, Plus Module User's Guide, SC32-1276

- Tivoli Management Framework Maintenance and Troubleshooting Guide, GC32-0807

- Tivoli Management Framework Reference Manual Version 4.1, SC32-0806

- Tivoli Workload Scheduler for Applications User Guide, SC32-1278

- Tivoli Workload Scheduler Release Notes, SC32-1277

- IBM Tivoli Workload Scheduler Job Scheduling Console Release Notes, SC32-1258

- Tivoli Enterprise Installation Guide Version 4.1, GC32-0804

- HACMP for AIX Version 5.1, Planning and Installation Guide, SC23-4861


- High Availability Cluster Multi-Processing for AIX Master Glossary, Version 5.1, SC23-4867

- HACMP for AIX Version 5.1, Concepts and Facilities Guide, SC23-4864

- High Availability Cluster Multi-Processing for AIX Programming Client Applications Version 5.1, SC23-4865

- High Availability Cluster Multi-Processing for AIX Administration and Troubleshooting Guide Version 5.1, SC23-4862


- IBM Tivoli Workload Scheduler Job Scheduling Console User's Guide Feature Level 1.2, SH19-4552

- IBM Tivoli Workload Scheduler Job Scheduling Console User's Guide Feature Level 1.3, SC32-1257


Online resources
These Web sites and URLs are also relevant as further information sources:

- FTP site for downloading Tivoli patches

ftp://ftp.software.ibm.com/software/tivoli_support/patches/patches_1.3/

- HTTP site for downloading Tivoli patches

http://www3.software.ibm.com/ibmdl/pub/software/tivoli_support/patches_1.3/

- Tivoli public Web site

http://www.ibm.com/software/tivoli/

- IBM Fix Central Web site

http://www-912.ibm.com/eserver/support/fixes/fcgui.jsp

- Microsoft Software Update Web site

http://windowsupdate.microsoft.com

- IBM site for firmware and microcode downloads for storage devices

http://www.storage.ibm.com/hardsoft/products/ssa/index.html

- IBM site for firmware and microcode downloads for pSeries servers

http://www-1.ibm.com/servers/eserver/support/pseries/fixes/hm.html

- Microsoft Hardware Compatibility List Web site

http://www.microsoft.com/hcl


- Microsoft Cluster Server white paper location

http://www.microsoft.com/ntserver/ProductInfo/Enterprise/clustering/ClustArchit.asp

- IBM Web site that summarizes HACMP features

http://www-1.ibm.com/servers/aix/products/ibmsw/high_avail_network/hacmp.html

- RFC 952 document

http://www.ietf.org/rfc/rfc952.txt

- RFC 1123 document

http://www.ietf.org/rfc/rfc1123.txt

- Web page for more information on downloading and implementing NTP for time synchronization

http://www.ntp.org/

How to get IBM Redbooks
You can search for, view, or download Redbooks, Redpapers, Hints and Tips, draft publications and Additional materials, as well as order hardcopy Redbooks or CD-ROMs, at this Web site:

ibm.com/Redbooks


Index

Symbols
.jobmanrc 60
.profile files 318, 458
.rhosts file 332
.tivoli directory 463
/etc/filesystems 101–102
/etc/inittab 187
/etc/wlocalhosts 602

Numerics
7133 Serial Disk System 26
8.2-TWS-FP02 210

AAbend state 61ABENDED 60access method 50active cluster server 559active instance 307Active/Active 44Active/Passive 44add IP alias to oserv 460additional domains 2advanced mode 506AFS 313AIX 33AIX 5.2.0.0 114AIX logical disk 434AIX physical disk 434ALIDB 606Allow failback 382Amount of downtime 344Amount of uptime 344Andrew File System

See AFSAPAR IY45695 117Application Availability Analysis tool 344application healthiness 41application monitoring 228, 487Application Server Worksheet 70Application Worksheet 70

Application name 70


Cluster name 70Fallover Strategy 71Location of key application files 70Node Relationship 70Start Commands/Procedures 71Stop Commands/Procedures 71Verification Commands 71

ATM 21authentication services 5Autolink 387Automatically Import Volume Groups 88Automation 40AutoStart install variable 465Autotrace service 521Availability analysis 343available Connectors 331

BBackup Administrations Team 594Backup Domain Manager 25, 572Backup Master Domain Manager 57–58backup processors 9base Framework install 459batchman 194, 203Batchman Lives 367Batchman=LIVES 194BDM

See Backup Domain Managerbest practice 408big endian 58bind 111BMDM

See Backup Master Domain Managerboolean expression 60boot 78boot IP label 78built-in web server 464business requirements 4

CC program 50CA7 51cascading 72, 257


cascading without fallback 260Cascading Without Fallback Activated 88cat test 579certificate authority 503cfgmgr 215chdev 214cl_RMupdate 229Cleanup Method 487clharvest_vg command 283Client reconnection 22Clinfo 21Clinfo API 21clRGinfo command 298clsmuxpd 342clsmuxpdES 293clstrmgrES 293cltopinfo command 282cluster 7cluster administration console 23Cluster Administrator tool 156Cluster Event Name 92cluster event processing 91Cluster Event Worksheet 92

Cluster Event Name 92Cluster Name 92Post-Event Command 92

cluster events 91cluster group 138, 379cluster hardware 468cluster IP address 155cluster manager 35, 269cluster multi-processing 16Cluster Service 23, 157Cluster Service Configuration Wizard 156Cluster Services 23Cluster Services Group 368cluster software 23, 38cluster state 78cluster status information 21Clustering Technologies

Basic elements 32Managing system component 35Typical configuration 33

Clustering technologies ix, 1, 8High availability versus fault tolerance 8loosely coupled machines 8MC/Service Guard 33Open Source Cluster Application Resources 33overview 8

SCYLD Beowulf 33Sun Cluster 33terminology 7types of cluster configurations 12Veritas Cluster Service 33

clverify utility 257, 280command 2, 270, 388command line 191communication networks 21company name 191, 389component failures 17components file 48, 54–55composer program 365computing power 45concurrent 257concurrent access environments 20concurrent jobs 368Configuration management 344Configure the registry 604Configure the TMR 603Configure the wlocalhost 602Configuring a resource group 492conman CL 61Connector 328Connector Framework resource 325Connector instance 5, 405Connector name 327Connector objects 331container 491CONTENTS.LST file 402cookbook approach 28CPU 2cpuname 387crossover cable 138current plan 61current working directory 323Custom application monitoring 484custom monitor 228customized event processing 269customizing cluster events 91CWOF

See cascading without fallback

DData Center Manager 592database 58Database directory registry key 604databases in sync 58


default cluster group 138default Gateway 112Default Node Priority 87Dependencies 40Destination Directory 389df command 110, 324disaster recovery 11disk adapter 82Disk fencing 496Disk Fencing Activated 88Disk Mirroring 18, 45disk technologies 82dispatcher number 461distributed computing resources 45DM

See Domain ManagerDNS configuration 4DNS server 252, 274domain 2, 24, 47–48domain account 141domain management responsibilities 59Domain Manager 2, 4, 25, 48, 54, 192domain name 50Domain Name System

See DNSdomain user 354domain workstations 58downtime 9, 21du command 324dual Master Domain Manager configuration 346dumpsec 198duplicating system components 32

Eecho command 323efficiency of the cluster 60Electrical Engineers 593Enable NetBIOS 175Endpoint 479, 489, 500, 502–503Endpoint manager login_interval 562Enhanced Journaled File System 18Enterprise management 343environment variable 463Error notification 18ESS 26Ethernet 21Ethernet PCI Adapter 417exchange Framework resources 337, 408

executable file 2Extended Agent 49, 51external disk 35external disk device 20external drive system 43

FFailback 23failed applications 22failed disk drive 82failed job 60fallback 7, 23, 38fallback policy 258, 492fallover 7, 37–38, 229, 257, 294fallover policy 492Fallover Strategy 73Fault tolerance 8fault tolerant 57Fault Tolerant Agent 2, 4, 25, 28, 49–50, 54, 192, 357fence 249Filesystem Recovery Method 88Filesystems Consistency Check 88FINAL 12FINAL job stream definition 367For Maestro 390Forced HACMP stops 345Framework 48Framework 3.7b 601Framework oserv IP alias 320frequency of resource exchange 408front-end application 21fsck 88FTA

See Fault Tolerant AgentFull Status 58

GGateway architecture 472generic service 377, 554Geographic high availability 343geographically dispersed Gateways 472get_disk_vg_fs 269globalopts 58globalopts file 191grid 45Grid Computing 45grid computing 45


Group 23

HHA

See high availabilityHACMP 33, 67–71, 78, 82HACMP 4.5 577HACMP 5.1

Benefits 17Implementing 67Install base 122Removing 134Updating 126

HACMP Administrations Team 591HACMP Cluster topology 586HAGEO 26halt command 298hardware address 79Hardware Compatibility List 139, 145Hardware configurations 43Hardware considerations

Disk 43Disk adapter 43Disk controller 43Network 42Network adapter 42Node 42Power source 42TCP/IP subsystem 43

hardware HA solution 58Hardware Purchasing Agent 592Hdisk 80heartbeat 35heartbeat mechanism 35heartbeat packet 35heartbeating 255–257Heartbeating over disk 213high availability ix, 2, 8, 16, 32high availability design 27High Availability Geographic Cluster system

See HAGEOHigh availability terminology

Backup 7Cluster 7Fallback 7Fallover 7Joining 7Node 7

Primary 7Reintegration 7

High-Availability Cluster MultiprocessingSee HACMP

highest-priority node 380highly available object dispatcher 490–491hostname 250–251hosts files 557hot standby 12, 33hot standby node 66Hot standby scenario 66Hot standby systems 46HP-UX 33HP-UX operating systems 463HVAC Engineers 593

IIBM Fix Central web site 114IBM LoadLeveler, 59IBM PCI Tokenring Adapter 417IBM RS/6000 7025-F80 417IBM service provider 464IBM SSA 160 SerialRAID Adapter 417IBM Tivoli Business Systems Manager 4IBM Tivoli Configuration Manager 4, 345IBM Tivoli Configuration Manager 4.2 210IBM Tivoli Distributed Monitoring (Classic Edition) 4IBM Tivoli Enterprise Console 4, 345IBM Tivoli Enterprise Data Warehouse 4IBM Tivoli Management Framework 4–5, 48, 66, 304, 318IBM Tivoli NetView 4IBM Tivoli ThinkDynamic Orchestrator 345IBM Tivoli Workload Scheduler 5, 49, 54, 260, 318, 324

architectural overview 2Backup Domain Manager 58Backup Domain Manager feature 25Backup Domain Manager feature versus high availability solutions 24Backup Master Domain Manager 57components file 48Console 48CPU 2database 47Domain Manager 2, 48engine code 48Extended Agent 49


Fault Tolerant Agent 25, 49geographically separate nodes 26hardware failures to plan for 26highly available configuration 25instance 72job flow 61Job recovery 60Job Scheduling Console 48managed groups 2Master Domain Manager 2, 4, 47Multiple instances 56out of the box integration 4pre 8.2 versions 56relationship between major components 6scheduling network 2scheduling objects 2Software availability 57Switch manager command 59switchmgr command 24Two instances 54–56when to implement high availability 24workstation 2

IBM Tivoli Workload Scheduler high availabilityAdvantages 26HA solutions versus Backup Domain Manager 24Hardware failures to plan for 26in a nutshell 27Possible failures 24When to implement 24

IBM Tivoli Workload Scheduler Version 8.1 571IBM Tivoli Workload Scheduler z/OS access method 51IBM Tivoli Workload Scheduler/HACMP integration

Add custom post-event HACMP script 242Add custom start and stop HACMP scripts 234Add IBM Tivoli Management Framework 303Adding the FINAL jobstream 194Applying fix pack 204Checking the workstation definition 193Configure application servers 223Configure cascading without fallback 260, 264Configure Framework access 330Configure HACMP networks and heartbeat paths 254Configure HACMP persistent node IP label/ad-dresses 272Configure HACMP resource groups 257Configure HACMP service IP labels/addresses

221, 252Configure HACMP to start on system restart 300Configure heartbeating 213Configure predefined communication interfac-es 276Configure pre-event and post-event com-mands 267Configure pre-event and post-event process-ing 269Configuring the engine 192Create additional Connectors 328Creating mount points on standby nodes 186example .profile 191implementation 184implementation overview 184Install base Framework 315Installing the Connector 194Installing the engine 191Interconnect Framework servers 331lessons learned 345Live test of HACMP fallover 298Log in using Job Scheduling Console 339Modify /etc/hosts and name resolution order 250one IBM Tivoli Workload Scheduler instance 345Planning for IBM Tivoli Management Framework 303Planning the installation sequence 312Poduction considerations

Configuration management 344Dynamically creating and deleting Connec-tors 341Enterprise management 343forced HACMP stops 345Geographic high availability 343Measuring availability 343Monitoring 342Naming conventions 340Notification 345Provisioning 345Security 342Time synchronization 341

Preparing to install 188Required skills 590Setting the security 198Start HACMP cluster services 287Test HACMP resource group moves 294


Things to considerCreating mount points on standby nodes 186Files installed on the local disk 187IP address 187Location of engine executables 186Netman port 187Starting and stopping instances 187user account and group account 186

Verify fallover 301Verify the configuration 280

IBM Tivoli Workload Scheduling Administrations Team 592IBM TotalStorage Enterprise Storage Server

See ESSIBM WebSphere Application Server 464ifconfig 298Inactive Takeover 88index file 325industry-standard hardware 18initializing oserv 400initiator file 216installation code 580installation password 399installation roadmap 573installation user 53–54, 56Installation User Name 389installation wizard 408Installing

additional languages 360Autotrace service 505Base Framework 315Connector 194Connector fix pack 204Framework 37b 602Framework components and patches 459HACMP 92highly available Endpoint 472IBM Tivoli Management Framework Version 4.1 312IBM Tivoli Workload Scheduler engine 191IBM Tivoli Workload Scheduler Framework com-ponents 322IBM Tivoli Workload Scheduler on MSCS 348installation directory 355Job Scheduling Connector 402Job Scheduling Console 408Job Scheduling Services 195, 401Microsoft Cluster Service 141

multiple Tivoli Endpoints 555Tivoli Framework components and patches 318Tivoli Managed Node 536TRIP 538

InstallShield 558Instance Count 486Instance Owner 195instant messaging 310Interconnecting Framework Servers 405Inter-dispatcher encryption level 334Interface Function 78internal cluster communications 138interregion encryption 334interregional connections 399Inter-site Management Policy 498IP 78IP address 155IP Address Takeover 87IP Alias 257IP hostname lookup 455IP label 78IPAT 76IPAT via IP Aliases 77IPAT via IP Replacement 76

JJakarta Tomcat 464Java interface 61JES 51JFS filesystem 437jfs log volume 102JFS logical volume 109jfslog 84Jnextday 194, 367Jnextday job 58job 2, 60job abend 60job definition 60job execution 92job management system 59job progress information 51job recovery 60Job Scheduling Connector 48Job Scheduling Console 5–6, 21, 49, 61, 320Job Scheduling Services 5, 48job status information 51job turnaround time 60job’s standard list file 50


jobman 203jobmanrc 52jobtable file 62joining 7JSC

See Job Scheduling ConsoleJSS

See Job Scheduling Services

Kkill a job 61kill a process 61killed job 61

LLAN 43laptop 6LCF 477less busy server 24License Key 400license restrictions 40licensing requirements 40Lightweight Client Framework

See LCFlightweight client framework 489link verification test 422Linux 33Linux environment 322little endian 58Load balancing 59Load balancing software 59LoadLeveler administrator 59LoadLeveler cluster 59local configuration script 52local disk 56Local UNIX access method 52local UNIX Extended Agent 52local user 354localhost registry key 602localopts 58logical storage 35logical unit 23logical volume 437Logical Volume Manager 17, 83logical volume name 99logredo 88Longest period of downtime 344Longest period of uptime 344

loosely coupled machines 8lsattr 214, 417lspv 84lsvg 298LVD SCSI Disk Drive 417LVM

See Logical Volume Manager

MMaestroDatabase 337, 407MaestroEngine 337, 407MaestroPlan 337, 407mailman 203Maintenance Level 02 115major number 84, 436makesec 199Managed Node 195–196, 321, 462, 489, 536, 552Managed Node software 545ManagedNode resource 407management hub 2Management Policy 87manual startup 368-master 192Master CPU name 389Master Domain Manager 47, 54, 57–58, 192, 357Master’s CPU definition 193MC/Service Guard 33mcmagent 50mcmoptions 50MDM

See Master Domain ManagerMeasuring availability 343method 52methods directory 50Microsoft 33, 145Microsoft Cluster Administrator utility 147Microsoft Cluster Service

cluster group 166concepts 22

Failback 23Group 23Load balancing 24Quorum disk 23Resource 23Shared nothing 22

default cluster group 138hardware considerations 139installation 141


network name 139our environment 138Planning for installation 139Pre-installation setup

Add nodes to the domain 141Configure Network Interface Cards 139Create a domain account for the cluster 141quorum partition size 140Setup Domain Name System 139Setup the shared storage 140Update the operating system 141

Primary servicesAvailability 21Scalability 21Simplification 21

private NIC 138public NIC 138service 22

Microsoft Windows 2000 305Mirroring SSA disks 82modify cpu 193monitor jobs 49mount 109MSCS 21–24MSCS white paper 22multi node cluster 44multiple SSA adapters 441mutual takeover 13, 346, 391mutual takeover scenario 195

Nnaming convention 340, 385, 524netman 203Netmask 77–78, 256network adapter 77, 79, 155Network File System

See NFSnetwork interface 290Network Interface Card

See NICNetwork Name 77Network Time Protocol 341Network Type 77Networking Administrations Team 590new day processing 59new logical volume 441NFS 313NFS exported filesystem 84–85

NIC 138–139node 35Node Name 77node_down event 269node_id 213node_up event 269node_up_local 269node-bound connection 221non-active node 559non-concurrent access 20non-concurrent access environments 20non-TCP/IP subsystems 21normal job run 60Notification 345notification services 345Notify Method 487NT filesystem 140NTFS 140NTFS file system 355

OObject Data Manager

See ODMobject database 406object dispatcher 399, 489observations 594odadmin 406odadmin command 320ODM 134, 268ODM entry 213odmget 214Online Planning Worksheet 211OPC 51Open Source Cluster Application Resources 33Opens file dependency 49Oracle Applications 50Oracle e-Business Suite 50oserv 320oserv service 399, 606oserv.exe 604oserv.rc 479oslevel 114

Pparent process 61Participating Node Names 492Participating Nodes 87Patching


Best practices 209Connector 204HACMP 5.1 117IBM Fix Central web page 117IBM Tivoli Workload Scheduler 204, 582Job Scheduling Console 305log file 210operating system 141, 577patch apply sequence for Framework 4.1 313Tivoli Framework and components 318Tivoli TMR software 583twspatch script 204

PeopleSoft 50PeopleSoft Client 51PeopleSoft Extended Agent 50PeopleSoft job 51Percentage of uptime 344persistent 78Persistent IP label 78physical disks 82physical network 77Planning

applications for high availability 70HA hardware considerations 41HA software considerations

Application behavior 39Automation 40Dependencies 40Fallback policy 41Licensing 40Robustness 41

HACMP Cluster network 76HACMP Cluster nodes 68HACMP resource groups 87HACMP shared disk device 81HACMP shared LVM components 83high availability design 418IBM Tivoli Workload Scheduler in an HACMP Cluster 184MSCS hardware 139MSCS installation 139shared disks for HACMP 421

point-to-point network 35policy region 465, 539, 605polling interval 228port address 57port number 56post-event commands 92, 269PowerPC 417

pre-event commands 269Preferred owner 380Prevent failback 382primary IP hostname 335primary node 72private connection 138Private Network Connection 154private NIC 138Process application monitoring 484Process control 18process ID 62Process monitoring 228Process Owner 486process request table 51Production considerations 340production day 3production file 194production plan 204program 2promote the workstation 357Provisioning 345psagent 51public NIC 138PVID 84

Qquiesce script 247, 269quiescing the application server 245quorum 149Quorum Disk 23, 150

RR/3 Application Server 50R3batch 50RAID 26RAID array 26rccondsucc 60real life implementation 571recovery procedure 41Redbooks Web site 613

Contact us xiredundant disk adapters 82redundant disk controllers 43Redundant hardware 34redundant network adapter 26redundant network path 26redundant physical networks 77regedt32.exe 557


region password 399registry key 534Registry replication 569reintegrated node 92reintegration 7remote filesystem 313, 399remote R3 System 50remote shell access 332Remote UNIX access method 52Remote UNIX Extended Agent 52replicate registry keys 528Required skills 590Resolve Dependencies 58Resolvedep 387resource 23, 35resource group 35–36, 87resource group fallover 87Resource Group Name 87Resource group policy

Cascading 87Concurrent 87Custom 87Rotating 87

Resource Group Worksheet 87Automatically Import Volume Groups 88Cascading Without Fallback Activated 88Cluster Name 87Disk Fencing Activated 88File systems Mounted before IP Configured 88Filesystem Recovery Method 88Filesystems 87Filesystems Consistency Check 88Inactive Takeover 88Management Policy 87Participating Nodes 87Resource Group Name 87Service IP Label 87Volume Groups 87

response file 408Restart Count 486Restart Interval 487Restart Method 487restoration of service 18return code 60RFC 1123 252RFC 952 252Robustness 41root user 195rotating 257

RS-232C 35, 213

SSamba 313Sample last.cfg file 561SAN 13SAN network 43SAP Extended Agent 50SAP instance 50SAP R/3 50SchedulerDatabase 337, 407SchedulerEngine 337, 407SchedulerPlan 337, 407scheduling network 2scheduling objects 65, 204SCSI 82, 417SCSI drives 140Secure Sockets Layer 503Security 342security file 198Serial 213Serial Storage Architecture

See SSASERVER column 406server failure 22Server versus job availability 10service 78Service directory registry key 604Service Engineers 593Service IP Label 78, 87, 320Service Pack 4 138Servlet 2.2 specifications 464set_force_bind 462set_force_bind variable 322setup_env.cmd 604setup_env.sh 604shared disk volume 348shared LVM access 453shared memory segments 521Shared nothing 22shared nothing clustering architecture 22shared resource 13Shared Volume Group/Filesystem Worksheet 84

Filesystem Mount Point 85Log Logical Volume name 84Logical Volume Name 84Major Number 84Node Names 84


Number of Copies of Logical Partition 84Physical Volume 84Shared Volume Group Name 84Size 85

single point of failure 43single points of failure 82Small Computer System Interfaces

SCSISMIT 343SMTP e-mail 400SMUX 342Software configurations 46Software considerations

Application behavior 39Automation 40Dependencies 40Fallback policy 41Licensing 40Robustness 41

software heartbeat 22Solaris 33Solaris operating systems 463spider HTTP service 464SSA 82, 93

Serial Storage ArchitectureSSA connection address 424, 426SSA disk subsystem 82SSA Disk system 43SSA disk tray 345SSA links 421Stabilization Interval 486, 488Standard Agent 357standard list file 51start and stop scripts 584start-of-day processing 58startup policy 492statefull connection 22stateless connection 22static IP address 139Stop Commands/Procedures 74Storage Area Network

See SANstty test 578subevent 269subnet 77, 155, 175, 596subnet mask 257SUCCES 60successful job 60Sun Cluster 33

supported HA configuration for a Tivoli server 416supported platforms 408switch manager command 59switchmgr 58–59symbolic link 241Symphony file 58–59synchronize the configuration 280system crash 61

Ttar file 204target 213target file 216target mode interface 215Target Mode SCSI 35, 213Target Mode SSA 35, 213TCP port number 389TCP/IP Network Interface Worksheet 78

Interface Function 78IP Address 78Netmask 79Network Interface 78Network Name 78Node Name 78

TCP/IP Networks Worksheet 77–78Cluster Name 77IP Address Offset for Heart beating over IP Aliases 77IPAT via IP Aliases 77Netmask 77Network Name 77Network Type 77

TCP/IP subsystem 21TCPaddr 387TCPIP 51-thiscpu 192Threshold 381Time synchronization 341time to quiesce 497Tivoli administrator 330Tivoli database 401Tivoli Desktop 204, 401Tivoli Desktop applications 21Tivoli Desktop users 419Tivoli Endpoint 555, 584Tivoli Enterprise environment 462Tivoli Enterprise products 503Tivoli environment variable 318


Tivoli Framework 3.7.1 408Tivoli Framework Administrations Team 591Tivoli Framework/HACMP integration

Analyze assessments 432Configure HACMP 480Configure the application monitoring 484Configure the logical volume 441Create a logical volume and a JFS filesystem 437Create shared disk volume 420Export the volume group 444Implementing 416Install Tivoli Framework 453Plan for high availability 453Production considerations 502Re-import the volume group 446Security 503Tivoli Endpoints 466Tivoli Enterprise products 503Tivoli Managed Node 464Tivoli Web interfaces 464Verify the volume group sharing 450

Tivoli Job Scheduling administration user 397Tivoli Job Scheduling Services 1.3 408Tivoli Management Region 65

See TMRTivoli Management Region server 66Tivoli Netman 368Tivoli region ID 335Tivoli Remote Access Account 399, 510Tivoli Remote Execution Service

See TRIPTivoli Software Installation Service 399Tivoli TMR software 583Tivoli Token Service 368Tivoli Web interfaces 464Tivoli Workload Scheduler 368Tivoli_Admin_Privleges group 507tivoliap.dll 520, 602TivoliAP.dll file 400tmersrvd account 507TMF_JSS.IND 195TMR 65–66TMR interconnection 311TMR server 65TMR versus Managed Node installation 583Token-Ring 21top-level policy region 605TRIP 540

TRIP resource 540TRIP service 528TTY Device 577two node cluster 43two-way interconnected TMR 337two-way interconnection 335, 406TWS_CONN.IND 325TWShome directory 326twspatch script 204Types of hardware clusters

Disk Mirroring 45Grid Computing 45Multi node cluster 44Two node cluster 43

UUNIX cluster 50–51unixlocl 52unixrsh 52upgrade AIX 114

Vvaryoffvg 110varyonvg 110Verification Commands 71, 74Verify Endpoint fallover 502Verify Managed Node fallover 501Veritas Cluster Service 33virtual IP 155virtual IP label 76virtual server 24, 508volume group 83, 102volume group major number 435

Wwbkupdb 406wclient 465wconnect 333–334, 406wcrtgate 471wgateway 471wgetadmin command 331Windows 2000 Advanced Edition 138Windows 2000 Advanced Server 141Windows Components Wizard 159Windows NT/2000 Server Enterprise Edition 21Windows registry 604winstall command 325


wkbkupdb 323wlocalhost binary 603wlocalhost command 602wlookup 333, 406wlookup command 328wlsconn 335, 406wmaeutil 407workstation 2workstation definition 387workstation limit 368workstation name 50wrapper 52wrapper script 60wserver 465wserver command 334wsetadmin command 330wtmrname 605wtwsconn.sh 327wupdate 407

Xx-agent 49

YY-cable 138

Zz/OS access method 51z/OS gateway 51




SG24-6632-00 ISBN 0738498874

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks

High Availability Scenarios with IBM Tivoli Workload Scheduler and IBM Tivoli Framework

Implementing high availability for ITWS and Tivoli Framework

Windows 2000 Cluster Service and HACMP scenarios

Best practices and tips

In this IBM Redbook, we show you how to design and create highly available IBM Tivoli Workload Scheduler and IBM Tivoli Management Framework (TMR server, Managed Nodes and Endpoints) environments. We present High Availability Cluster Multiprocessing (HACMP) for AIX and Microsoft Windows Cluster Service (MSCS) case studies. The implementation of IBM Tivoli Workload Scheduler within a high availability environment will vary from platform to platform and from customer to customer, based on the needs of the installation. Here, we cover the most common scenarios and share practical implementation tips. We also give recommendations for other high availability platforms; although there are many different clustering technologies in the market today, they are similar enough to allow us to give useful advice regarding the implementation of a highly available scheduling system.

Finally, although we basically address highly available scheduling systems, we also offer a section for customers who want to implement a highly available IBM Tivoli Management Framework environment, but who are not currently using IBM Tivoli Workload Scheduler. This publication is intended to be used as a major reference for designing and creating highly available IBM Tivoli Workload Scheduler and Tivoli Framework environments.

Back cover