
Day-To-Day activities on HACMP.

Overview

This document contains the operational procedures for day-to-day activities related to HACMP.

Contents:

1. Basics

2. HACMP Installation

3. HACMP Configuration

4. Disk Heartbeat

5. HACMP Startup/Stop

6. Resource Group Management

7. Application startup/stop scripts

8. HACMP Logical Volume Management

9. Cluster verification

10. User and Group Administration

Basics:

Cluster topology: the nodes, networks, storage, clients, and persistent node IP labels/devices.

Cluster resources: the components HACMP can move from one node to another, for example service IP labels, file systems, and applications.

HACMP Services:

Cluster communication daemon (clcomdES)

Cluster Manager (clstrmgrES)

Cluster information daemon (clinfoES)

Cluster lock manager (cllockd)

Cluster SMUX peer daemon (clsmuxpd)

HACMP daemons: clstrmgr, clinfo, clsmuxpd, cllockd.
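To see which of these subsystems are active on a node, a quick check with the AIX System Resource Controller can help; a minimal sketch (subsystem and group names may vary slightly by HACMP level):

  # List the cluster subsystems and whether they are active or inoperative
  lssrc -g cluster
  # Check the cluster communication daemon on its own
  lssrc -s clcomdES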

HACMP installation:

smitty install_all → fast path for installation

Start the cluster communication daemon → startsrc -s clcomdES

Upgrading the cluster options: node by node migration and snapshot

conversion

Steps for migration:

Stop cluster services on all nodes

Upgrade the HACMP software on each node

Start cluster services on one node at a time

Convert from a supported version of HAS to HACMP:

Current software should be committed

Save a snapshot

Remove the old version

Install HA 5.1 and verify

Check the previous version of the cluster: lslpp -h "cluster*"

To save your HACMP configuration, create a snapshot in HACMP

Remove HACMP: smitty install_remove (select software name cluster*)

lppchk -v and lppchk -c "cluster*" both run clean if the installation is OK.
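A minimal post-install verification sketch, using the same fileset pattern as above:

  # List installed HACMP filesets and their state (should be COMMITTED)
  lslpp -l "cluster.*"
  # Verify requisites and file checksums; both run clean if the install is OK
  lppchk -v
  lppchk -c "cluster*"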

After you have installed HA on the cluster nodes, you need to convert and apply the snapshot. Converting the snapshot must be performed before rebooting the cluster nodes.

Cluster Configuration:

All HACMP configuration is done through the SMIT menus. The rest of this section tells you what the configuration is and how to do it. Unless noted, this need only be done on one server in the HACMP cluster; HACMP copies everything to the other server.

smitty hacmp

→ cluster configuration

→ cluster topology

→ configure cluster

→ add a cluster definition

→ cluster name: cl_mgmt

→ configure nodes

→ add cluster nodes

→ enter the hostnames of the two nodes, separated by spaces

→ configure adapters

→ There are 8 adapters to configure in a standard implementation, so this screen must be completed 8 times, once for each adapter. The 8 are: service, standby, boot, and serial adapters for each of the servers.

Note that:

· All labels must be in /etc/hosts before you do this step (see the sample /etc/hosts entries after this walkthrough)

· Adapter IP label must match the entry in /etc/hosts

· Network type is "ether" except for the serial adapters, which are "rs232"

· Network attribute is "public" for all adapters except the serial adapters, which are "serial"

· Network name is "ether1" for all adapters except the serial adapters, which are "serial1"

· Node name is required for all adapters

· Other fields can be left blank

→ Show cluster topology

→ Show cluster topology

Check that this output looks like the cluster topology shown below.

→ Synchronize cluster topology

→ Run this with defaults. If it fails, check the output and correct any errors (these may be errors in network or AIX configuration as well as HACMP configuration).

→ Cluster Resources

→ Define Resource Groups

→ Add a resource group

→ See resource group definitions and names below. The first three lines of the definition are defined in this panel.

→ Define Application Servers

→ Add an application server

→ See below for application server configuration details.

→ Change/Show Resources/Attributes for a resource group

→ For the resource group, fill in the attributes as shown below.

→ Synchronize cluster resources

→ Synchronize with the defaults. If it fails, check the output and fix any problems.

Once your cluster has completed a resource synchronization with no errors (and you are happy with any warnings), you have completed the HACMP configuration. You may now start HACMP.

smit hacmp

→ Cluster services

→ Start cluster services
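For reference, the adapter IP labels mentioned in the notes above must already resolve in /etc/hosts on both nodes. A hypothetical fragment for a two-node cluster (all hostnames and addresses here are examples only, not values from this document):

  # node A adapters
  192.168.10.11   nodea_boot
  192.168.20.11   nodea_stby
  10.10.1.11      nodea_svc
  # node B adapters
  192.168.10.12   nodeb_boot
  192.168.20.12   nodeb_stby
  10.10.1.12      nodeb_svc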

Disk Heartbeat:

Disk heartbeating typically requires 4 seeks per second; that is, each of the two nodes will write to the disk and read from the disk once per second.

Configuring disk heartbeat:

Vpaths are configured as member disks of an enhanced concurrent volume group: smitty lvm → select volume groups → Add a volume group → give the VG name, PV names, and VG major number, and set "Create VG concurrent capable" to enhanced concurrent.

Import the new VG on all nodes using smitty importvg, or: importvg -V 53 -y c23vg vpath5

Create the diskhb network → smitty hacmp → extended configuration → extended topology configuration → configure hacmp networks → Add a network to the HACMP cluster → choose diskhb

Add 2 communication devices → smitty hacmp → extended configuration → extended topology configuration → Configure HACMP communication interfaces/devices → Add communication interfaces/devices → Add pre-defined communication interfaces and devices → communication devices → choose the diskhb

Create one communication device for the other node as well.

Testing disk heartbeat connectivity: /usr/sbin/rsct/dhb_read is used to test the validity of a diskhb connection.

dhb_read -p vpath0 -r receives data over the diskhb network

dhb_read -p vpath3 -t transmits data over the diskhb network.

Monitoring disk heartbeat: monitor the activity of the disk heartbeats via lssrc -ls topsvcs.
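A minimal test sequence, assuming the shared heartbeat disk appears as vpath0 on one node and vpath3 on the other (as in the examples above); start the receiver first, then the transmitter on the other node:

  # On node A: wait to receive heartbeat data over the diskhb network
  dhb_read -p vpath0 -r
  # On node B, in a second session: transmit heartbeat data over the same shared disk
  dhb_read -p vpath3 -t
  # On either node afterwards: confirm topology services sees the diskhb network
  lssrc -ls topsvcs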

Cluster Startup/Stop:

Cluster startup: smit cl_admin → Manage HACMP Services → Start Cluster Services

Note: Monitor with /tmp/hacmp.out and check for node_up_complete.

Stop the cluster: smitty cl_admin → Manage HACMP Services → Stop Cluster Services

Note: Monitor with /tmp/hacmp.out and check for node_down_complete.
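A minimal monitoring sketch while starting or stopping cluster services, based on the notes above:

  # Watch cluster event processing in real time
  tail -f /tmp/hacmp.out
  # Confirm that startup completed on this node
  grep node_up_complete /tmp/hacmp.out
  # Confirm that a stop completed
  grep node_down_complete /tmp/hacmp.out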

Resource Group Management:

Resource group takeover relationship:

1. Cascading

2. Rotating

3. Concurrent

4. Custom

Cascading:

A cascading resource group is activated on its home node by default.

The resource group can be activated on a lower priority node if the highest priority node is not available at cluster startup.

On node failure, the resource group falls over to the available node with the next priority.

Upon node reintegration into the cluster, a cascading resource

group falls back to its home node by default.

Attributes:

1. Inactive takeover (IT): initial acquisition of a resource group in case the home node is not available.

2. Fallover priority can be configured in the default node priority list.

3. Cascading without fallback (CWOF) is an attribute that modifies the fallback behavior. If the CWOF flag is set to true, the resource group will not fall back to any joining node. When the flag is false, the resource group falls back to the higher priority node.

Rotating:

At cluster startup, the first available node in the node priority list will activate the resource group.

If the resource group is on a takeover node, it will never fall back to a higher priority node if one becomes available.

Rotating resource groups require the use of IP address takeover.

The nodes in the resource chain must all share the same network

connection to the resource group.

Concurrent:

A concurrent RG can be active on multiple nodes at the same

time.

Custom:

Users have to explicitly specify the desired startup, fallover and

fallback procedures.

This supports only IPAT via aliasing for service IP addresses.

Startup Options:

Online on home node only

Online on first available node

Online on all available nodes

Online using distribution policy → the resource group will only be brought online if the node has no other resource group online.

You can check this with lssrc -ls clstrmgrES
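A quick way to check the cluster manager state mentioned above (the exact output fields vary by HACMP level, so this is a sketch):

  # Show the cluster manager's internal state; look for a line such as "Current state"
  lssrc -ls clstrmgrES | grep -i state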

Fallover Options:

Fallover to next priority node in list

Fallover using dynamic node priority → the fallover node can be selected on the basis of its available CPU, its available memory, or the lowest disk usage. HACMP uses RSCT to gather this information, and the resource group falls over to the node that best meets the chosen criterion.

Bring offline → the resource group will be brought offline in the event of an error. This option is designed for resource groups that are online on all available nodes.

Fallback Options:

Fallback to higher priority node in the list

Never fallback

Resource group Operation:

Bring a resource group offline: smitty cl_admin → select HACMP resource group and application management → Bring a resource group offline.

Bring a resource group online: smitty hacmp → select HACMP resource group and application management → Bring a resource group online.

Move a resource group: smitty hacmp → select HACMP resource group and application management → Move a resource group to another node.

To find the resource group information: clRGinfo -P

Resource group states: online, offline, acquiring, releasing, error, temporary error, or unknown.
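The same information and operations are also available from the command line; a minimal sketch (the resource group and node names are examples, and the clRGmove flags shown follow the common "move" form and should be checked against your HACMP level):

  # Show the state and current location of all resource groups
  /usr/es/sbin/cluster/utilities/clRGinfo -P
  # Move a resource group to another node (example names; verify flags for your release)
  /usr/es/sbin/cluster/utilities/clRGmove -g rg_app01 -n nodeb -m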

Application Startup/Stop Scripts:

smitty hacmp

→ cluster configuration

→ Cluster Resources

→ Define Application Servers

→ Add an application server

Configure HACMP application monitoring: smitty cm_cfg_appmon → Add a process application monitor → give the process names and application startup/stop scripts
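The start and stop scripts must exist at the same path on all nodes, be executable, and exit 0 on success. A minimal hypothetical pair for an application called app01 (paths, user, and commands are illustrative assumptions, not part of this procedure):

  #!/bin/ksh
  # /usr/local/hacmp/start_app01.sh - called by the app01 application server when the RG comes online
  su - app01 -c "/opt/app01/bin/app01 start" >> /tmp/app01_hacmp.log 2>&1
  exit 0

  #!/bin/ksh
  # /usr/local/hacmp/stop_app01.sh - called by the app01 application server when the RG goes offline
  su - app01 -c "/opt/app01/bin/app01 stop" >> /tmp/app01_hacmp.log 2>&1
  exit 0

Returning 0 even when the application was already stopped helps avoid the resource group going into an error state during fallover.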

HACMP Logical Volume Management:

C-SPOC LVM: smitty cl_admin → HACMP Logical Volume Management

Shared Volume groups

Shared Logical volumes

Shared File systems

Synchronize shared LVM mirrors (Synchronize by

VG/Synchronize by LV)

Synchronize a shared VG definition

C-SPOC concurrent LVM: smitty cl_admin → HACMP concurrent LVM

Concurrent volume groups

Concurrent Logical volumes

Synchronize concurrent LVM mirrors

C-SPOC physical volume management: smitty cl_admin → HACMP physical volume management

Add a disk to the cluster

Remove a disk from the cluster

Cluster disk replacement

Cluster datapath device management
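After a C-SPOC change, a quick check with standard AIX LVM commands confirms where a shared volume group is varied on (c23vg follows the earlier disk heartbeat example):

  # List the volume groups currently varied on on this node
  lsvg -o
  # Show the shared volume group's attributes, including concurrent capability
  lsvg c23vg
  # List the logical volumes and file systems in the shared volume group
  lsvg -l c23vg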

Cluster Verification:

smitty hacmp → Extended verification → Extended verification and synchronization. Verification log files are stored in /var/hacmp/clverify.

/var/hacmp/clverify/clverify.log → verification log

/var/hacmp/clverify/pass/nodename → if verification succeeds

/var/hacmp/clverify/fail/nodename → if verification fails

Automatic cluster verification runs each time you start cluster services and every 24 hours.

Configure automatic cluster verification: smitty hacmp → problem determination tools → HACMP verification → Automatic cluster configuration monitoring.
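To review the most recent verification run described above, a minimal sketch:

  # Check the end of the verification log for pass/fail details
  tail -50 /var/hacmp/clverify/clverify.log
  # See which nodes last passed or failed verification
  ls /var/hacmp/clverify/pass /var/hacmp/clverify/fail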

User and group Administration:

smitty cl_usergroup → Users in a HACMP cluster

Add a user to the cluster

List users in the cluster

Change/show characteristics of a user in the cluster

Remove a user from the cluster

smitty cl_usergroup → Groups in a HACMP cluster

Add a group to the cluster

List groups in the cluster

Change a group in the cluster

Remove a group

smitty cl_usergroup → Passwords in an HACMP cluster
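After adding a user through C-SPOC, a quick consistency check with standard AIX commands (the user and node names here are examples):

  # Confirm the user exists with the same UID and home directory on this node
  lsuser -a id home appuser
  # Repeat the same check on the other node, for example over ssh
  ssh nodeb 'lsuser -a id home appuser'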

FAQs

Does HACMP work on different operating systems?

Yes. HACMP is tightly integrated with the AIX 5L operating system and System p servers, allowing for a rich set of features which are not available with any other combination of operating system and hardware. HACMP V5 introduces support for the Linux operating system on POWER servers. HACMP for Linux supports a subset of the features available on AIX 5L; however, this multi-platform support provides a common availability infrastructure for your entire enterprise.

What applications work with HACMP?

All popular applications work with HACMP including DB2, Oracle, SAP, WebSphere, etc.

HACMP provides Smart Assist agents to let you quickly and easily configure HACMP with

specific applications. HACMP includes flexible configuration parameters that let you easily set it

up for just about any application there is.

Does HACMP support dynamic LPAR, CUoD, On/Off CoD,

or CBU?

HACMP supports Dynamic Logical Partitioning, Capacity Upgrade on Demand, On/Off Capacity

on Demand and Capacity Backup Upgrade.

If a server has LPAR capability, can two or more

LPARs be configured with unique instances of HACMP

running on them without incurring additional

license charges?

Yes. HACMP is a server product that has one charge unit: the number of processors on which HACMP will be installed or run. Regardless of how many LPARs or instances of AIX 5L run in the server, you are charged based on the number of active processors in the server that is running HACMP. Note that HACMP configurations containing multiple LPARs within a single server may represent a potential single point of failure. To avoid this, it is recommended that the backup for an LPAR be an LPAR on a different server or a standalone server.

Does HACMP support non-IBM hardware or operating

systems?

Yes. HACMP for AIX 5L supports the hardware and operating systems as specified in the manual; HACMP V5.4 includes support for Red Hat and SUSE Linux.


HACMP interview questions

a. What characters should a hostname contain for HACMP configuration? The hostname cannot contain the following characters: -, _, * or other special characters.

b. Can the Service IP and Boot IP be in the same subnet? No. The service IP address and boot IP address cannot be in the same subnet. This is a basic requirement for HACMP cluster configuration. The verification process does not allow the IP addresses to be in the same subnet, and the cluster will not start.

c. Can multiple Service IP addresses be configured on a single Ethernet card? Yes. Using the SMIT menu, multiple service IP addresses can be configured to run on a single Ethernet card. It only requires selecting the same network name for the specific service IP addresses in the SMIT menu.

d. What happens when a NIC holding a Service IP goes down? When a NIC card running the service IP address goes down, HACMP detects the failure and fails over the service IP address to an available standby NIC on the same node or to another node in the cluster.

e. Can multiple Oracle Database instances be configured on a single node of an HACMP cluster? Yes. Multiple database instances can be configured on a single node of an HACMP cluster. For this, one needs separate service IP addresses over which the listeners for each Oracle database will run. Hence one can have separate resource groups, each owning one Oracle instance. This configuration is useful if a single Oracle database instance on one node fails and must be failed over to another node without disturbing the other running Oracle instances.

f. Can HACMP be configured in an Active-Passive configuration? Yes. For an Active-Passive cluster configuration, do not configure any service IP on the passive node. Also, for all the resource groups on the active node, specify the passive node as the next node in priority to take over in the event of a failure of the active node.

g. Can a file system mounted over the NFS protocol be used for disk heartbeat? No. A volume mounted over the NFS protocol is a file system to AIX, and since a disk device is required for the enhanced concurrent capable volume group used for disk heartbeat, an NFS file system cannot be used for configuring the disk heartbeat. One needs to provide a disk device to the AIX hosts over the FCP or iSCSI protocol.

h. Which HACMP log files are available for troubleshooting? The following log files can be used for troubleshooting:

1. /var/hacmp/clverify/current/nodename/* contains logs from the current execution of cluster verification.

2. /var/hacmp/clverify/pass/nodename/* contains logs from the last time verification passed.

3. /var/hacmp/clverify/fail/nodename/* contains logs from the last time verification failed.

4. /tmp/hacmp.out file records the output generated by the event scripts of HACMP as they execute.

5. /tmp/clstrmgr.debug file contains time-stamped messages generated by HACMP clstrmgrES activity.

6. /tmp/cspoc.log file contains messages generated by HACMP C-SPOC commands.

7. /usr/es/adm/cluster.log file is the main HACMP log file. HACMP error messages and messages about HACMP-related events are appended to this log.

8. /var/adm/clavan.log file keeps track of when each application that is managed by HACMP is started or stopped, and when the node on which an application is running stops.

9. /var/hacmp/clcomd/clcomd.log file contains messages generated by the HACMP cluster communication daemon.

10. /var/ha/log/grpsvcs. file tracks the execution of internal activities of the grpsvcs daemon.

11. /var/ha/log/topsvcs. file tracks the execution of internal activities of the topsvcs daemon.

12. /var/ha/log/grpglsm file tracks the execution of internal activities of the grpglsm daemon.

Key PowerHA terms

The following terms are used throughout this article and are helpful to

know when discussing PowerHA:

Cluster: A logical grouping of servers running PowerHA.

Node: An individual server within a cluster.

Network: Although normally this term would refer to a larger

area of computer-to-computer communication (such as a WAN),

in PowerHA network refers to a logical definition of an area for

communication between two servers. Within PowerHA, even

SAN resources can be defined as a network.

Boot IP: This is a default IP address a node uses when it is first

activated and becomes available. Typically—and as used in this

article—the boot IP is a non-routable IP address set up on an

isolated VLAN accessible to all nodes in the cluster.

Persistent IP: This is an IP address a node uses as its regular

means of communication. Typically, this is the IP through which

systems administrators access a node.

Service IP: This is an IP address that can "float" between the

nodes. Typically, this is the IP address through which users

access resources in the cluster.

Application server: This is a logical configuration to tell

PowerHA how to manage applications, including starting and

stopping applications, application monitoring, and application

tunables. This article focuses only on starting and stopping an

application.

Shared volume group: This is a PowerHA-managed volume

group. Instead of configuring LVM structures like volume

groups, logical volumes, and file systems through the operating

system, you must use PowerHA for disk resources that will be

shared between the servers.

Resource group: This is a logical grouping of service IP

addresses, application servers, and shared volume groups that

the nodes in the cluster can manage.

Failover: This is a condition in which resource groups are

moved from one node to another. Failover can occur when a

systems administrator instructs the nodes in the cluster to do so

or when circumstances like a catastrophic application or server

failure forces the resource groups to move.

Failback/fallback: This is the action of moving resource groups back to the nodes on which they were originally running after a failover has occurred.

Heartbeat: This is a signal transmitted over PowerHA networks

to check and confirm resource availability. If the heartbeat is

interrupted, the cluster may initiate a failover depending on the

configuration.