mcserviceguard2

MC/ServiceGuard
Gwendolyn Wright
01/16/03


TRANSCRIPT

Page 1: Mcserviceguard2

MC/ServiceGuard

Gwendolyn Wright

01/16/03

Page 2: Mcserviceguard2

Agenda
• Introduction
• MC/ServiceGuard Overview
• MC/ServiceGuard Definitions
• MC/ServiceGuard Cluster Configuration for Partitioned Systems
• MC/ServiceGuard Naming Conventions
• MC/ServiceGuard Basic Procedures
• MC/ServiceGuard Starting a Cluster
• MC/ServiceGuard Adding Nodes to a Cluster
• MC/ServiceGuard Removing Nodes from a Cluster
• MC/ServiceGuard Halting the Entire Cluster
• MC/ServiceGuard Reconfiguring the Cluster
• MC/ServiceGuard Halting a Package
• MC/ServiceGuard Moving a Package
• MC/ServiceGuard Reconfiguring a Package

Page 3: Mcserviceguard2

MC/ServiceGuard Overview

An MC/ServiceGuard cluster is a networked grouping of HP 9000 series 800 servers (nodes) with sufficient redundancy of software and hardware that a single point of failure will not significantly disrupt service. MC/ServiceGuard software provides only part of a high availability solution that also includes disk mirroring, redundant disk interface links and uninterruptible power supplies (UPS). Applications and services are grouped together in packages. In case of a service, node or network failure, MC/ServiceGuard can automatically transfer control of all system resources in a designated package to another node within the cluster, allowing your applications to remain available with minimal interruption.

MC/ServiceGuard provides the following features:

• In the case of LAN failure, MC/ServiceGuard switches to a standby LAN on the same node.
• In the case of SPU (Single Processing Unit) failure, the package is transferred from the failed SPU to a functioning SPU automatically and in a minimum amount of time.
• For software failures, an application can be restarted on the same node or on another node with minimum disruption, as per predefined rules.

MC/ServiceGuard also gives you the advantage of easily transferring control of your application to another SPU in order to bring the original SPU down for system administration, maintenance or version upgrades.

Page 4: Mcserviceguard2

MC/ServiceGuard Overview (continued)

HP MC/ServiceGuard Benefits
• Fast detection of failure and automatic restoration of applications
• Elimination of operator error
• Availability during hardware and software maintenance
• Robust cluster architecture
• Matching of system capabilities to application requirements
• Reduction in planned downtime
• Flexibility
• Minimal performance impact on applications within a cluster
• Flexibility in cluster configuration
• Ease of implementation
• Protection of data integrity

Page 5: Mcserviceguard2

MC/ServiceGuard Definitions

Cluster: A cluster is a collection of up to 16 HP-UX servers (8 for MC/ServiceGuard OPS Edition) connected together in order to provide failover functionality to the application(s) that execute on those servers.

Node: A node is a server that is a member of an MC/ServiceGuard cluster. Specifically, the term "node" refers to the server's role as a member of a cluster. So, when a node is down, it means that the cluster daemons on that server do not respond. When a node is up, it is an active member of the cluster.

Failover: A failover is an event that occurs whenever a clustered node or a package fails. At such time, the package is transferred from its primary node to its adoptive node for execution.

Primary Node: For each package defined within a cluster, the primary node is the server that is the first choice for execution of that package.

Adoptive or Secondary Node: An adoptive node is a server that is defined to take over the execution of a package in the event of a failover.

Local LAN Failover: A local LAN failover occurs when there is a communications failure on an initialized LAN interface on a node in the cluster and traffic is automatically transferred to a standby (un-initialized) LAN interface of the same type. The standby LAN must be connected to the same network as the failed LAN. There is no interruption in service to the MC/ServiceGuard package.

Page 6: Mcserviceguard2

MC/ServiceGuard Definitions (continued)

Heartbeat: The signal that is exchanged between clustered nodes is called the heartbeat. Heartbeat failure will trigger a ServiceGuard failover.

Package: A package is a grouping of software (application), disks (volume groups), network addresses and monitoring services that execute on a server. When the package fails, the application, disks, network addresses and monitoring services transfer to an adoptive node for execution. If the whole package cannot be transferred to the adoptive node, it will remain in a halted state.

Cluster Aware or Shared Volume Group: (This does not apply for the PowerPlant cluster, since we are using Veritas Volume Manager.) Any volume group that has been made cluster aware is said to have been "clusterized". Specifically, this means that the vgchange -c y <volume-group-name> command has been executed. This command can be executed only when the cluster is up and the MC/ServiceGuard daemon, cmcld, is running.
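A minimal command sketch of that step (the volume group name /dev/vg_shared is a hypothetical example, and exclusive activation with vgchange -a e is normally performed by the package control script rather than by hand):

Verify that the cluster is up and cmcld is running:
# cmviewcl -v
Mark the volume group as cluster aware:
# vgchange -c y /dev/vg_shared
Activate the volume group in exclusive mode on the node that will run the package:
# vgchange -a e /dev/vg_shared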

Shared Logical Volume: A shared logical volume is one defined in a cluster aware volume group.

Stationary IP Address and Name: The IP address that is configured on a LAN interface and associated with a node in a cluster is referred to as a stationary IP address. The hostname of that node is usually assigned to one of those IP addresses.

Page 7: Mcserviceguard2

MC/ServiceGuard Definitions (continued)

Floating or Relocatable IP Address and Name: More than one IP address can be assigned to a single LAN interface. An additional IP address configured on a LAN interface and associated with a package is called a floating or relocatable IP address. This IP address must be in the same subnet as the stationary IP addresses within the cluster. When a package failover occurs, the floating IP address is de-configured from the LAN interface on the primary node and configured on the appropriate LAN interface on the adoptive node.

NODE_TIMEOUT – The Node Timeout parameter controls cluster timing. The default is 2 seconds, which yields the fastest cluster reformation. However, using the default value increases the potential for spurious reformations due to momentary system hangs or network load spikes.

Cabinet – A cabinet may contain several components, such as one or two PCI (Peripheral Component Interconnect) I/O boxes, as well as disk storage. I/O expansion cabinets can bolt to the main cabinet; each can be comprised of up to five (5) PCI boxes and up to six (6) Digital Power Supplies (DPS). The HP Superdome cabinet can be configured with up to 28 GB of memory per cabinet, peak memory controller bandwidth of 16 GB/s and 64 GB/s per 64-way cabinet, and 16 I/O channels per cabinet at 200 MB/s (33 MHz PCI) or 400 MB/s (66 MHz PCI).

Page 8: Mcserviceguard2

ServiceGuard Cluster Configuration for Partitioned Systems

The MC/ServiceGuard product provides an infrastructure for the design and implementation of highly available HP-UX clusters that can quickly restore mission-critical application services after hardware or software failures. To achieve the highest level of availability, clusters must be configured to eliminate all single points of failure (SPOF). This requires a careful analysis of the hardware and software infrastructure used to build the cluster. Partitioning technologies such as Superdome nPartitions and HP-UX Virtual Partitions (VPARS) present some unique issues that must be considered when utilizing them within a ServiceGuard configuration.

Partitioning technologies such as nPartitions and VPARS provide increased flexibility in effectively managing system resources. They can be used to provide hardware and/or software fault isolation between applications sharing the same hardware platform. These technologies also allow hardware resources to be more efficiently utilized, based on application capacity requirements, and they provide the means to quickly re-deploy the hardware resources should the application requirements change. Given this capability, it is natural to want to utilize these technologies when designing MC/ServiceGuard clusters. Care must be taken, however, as the use of partitioning does present some unique failure scenarios that must be considered when designing a cluster to meet specific uptime requirements.

Page 9: Mcserviceguard2

ServiceGuard Cluster Configuration for Partitioned Systems (continued)

The partitioning provided by nPartitions is done at a hardware level, and each partition is isolated from both hardware and software failures of other partitions. VPARS partitioning is implemented at a software level. While this provides greater flexibility in dividing hardware resources between partitions and allows partitioning on legacy systems, it does not provide any isolation of hardware failures between the partitions.

Page 10: Mcserviceguard2

Sample nPartitions and VPARS Configurations

[Diagram: an nPartitions cabinet divided into three hardware partitions (Partition 1, Partition 2, Partition 3), each with its own CPU, memory and I/O channels, contrasted with a VPARS system in which two software partitions, VPARS partition 1 (p1) and VPARS partition 2 (p2), share the CPU, memory and I/O channels of a single hardware platform.]

Page 11: Mcserviceguard2

ServiceGuard Design Assumptions

Hardware Redundancy – ServiceGuard, like all other HA clustering products, uses hardware redundancy to maintain application availability. For example, the ServiceGuard configuration guidelines require redundant networking paths between the nodes in the cluster. This requirement protects against total loss of communication to a node if a networking interface card fails. If a card should fail, there is a redundant card that can take over for it.

As can be readily seen, this strategy of hardware redundancy relies on an important underlying assumption: the failure of one component is independent of the failure of other components. That is, if the two networking cards were somehow related, there could exist a failure event that would disable them both simultaneously. This represents a SPOF and effectively defeats the purpose of having redundant cards. It is for this reason that the ServiceGuard configuration rules do not allow both heartbeat networks on a node to travel through multiple ports on the same multi-ported networking interface: a single networking interface card failure would disable both heartbeat networks.

Page 12: Mcserviceguard2

ServiceGuard Design Assumptions (continued)

Cluster Membership Protocol – This same philosophy of hardware redundancy is reflected in the clustering concept. If a node in the cluster fails, another node is available to take over applications that were active on the failed node. Determining, with certainty, which nodes in the cluster are currently operational is accomplished through a cluster membership protocol whereby the nodes exchange heartbeat messages and maintain a cluster quorum.

After a failure that results in loss of communication between the nodes, the active cluster nodes execute a cluster reformation algorithm that is used to determine the new cluster quorum. This new quorum, in conjunction with the previous quorum, is used to determine which nodes remain in the cluster.

The algorithm for cluster reformation generally requires a cluster quorum of a strict majority, that is, more than 50% of the nodes that were previously running. However, exactly 50% of the previously running nodes are allowed to re-form as a new cluster, provided there is a guarantee that the other 50% of the previously running nodes do not also re-form. In these cases, some form of quorum arbitration or tie-breaker is needed. For example, if there is a communication failure between the nodes in a two-node cluster and each node is attempting to re-form the cluster, ServiceGuard must allow only one node to form the new cluster. This is accomplished by configuring a cluster lock.

Page 13: Mcserviceguard2

ServiceGuard Design Assumptions (continued)

The important concept to note here is that if more than 50% of the nodes in the cluster fail at the same time, the remaining nodes have insufficient quorum to form a new cluster and fail themselves. This is irrespective of whether or not a cluster lock has been configured. It is for this reason that the cluster configuration must be carefully analyzed to prevent failure modes that are common amongst the cluster nodes.

Quorum Arbitration – One form of quorum arbitration is a shared disk device configured as a cluster lock. The cluster lock disk is a disk area located in a volume group that is shared by all nodes in the cluster. The cluster lock disk is used as a tie-breaker only for situations in which a running cluster fails and, as ServiceGuard attempts to form a new cluster, the cluster is split into two sub-clusters of equal size. Each sub-cluster attempts to acquire the cluster lock. The sub-cluster that gets the cluster lock forms the new cluster, and the nodes that were unable to get the lock cease activity. This prevents the possibility of split-brain activity, that is, two sub-clusters running at the same time. If the two sub-clusters are of unequal size, the sub-cluster with greater than 50% of the previous quorum forms the new cluster and the cluster lock is not used.
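As an illustration, the cluster lock is declared in the cluster ASCII configuration file. The names below (cluster, volume group, physical volume, nodes, addresses) are hypothetical examples, and the parameter names should be confirmed against the template generated by cmquerycl for your release:

CLUSTER_NAME            powerplant_cluster
FIRST_CLUSTER_LOCK_VG   /dev/vglock
NODE_NAME               node1
  NETWORK_INTERFACE     lan0
    HEARTBEAT_IP        192.168.10.1
  FIRST_CLUSTER_LOCK_PV /dev/dsk/c1t2d0
NODE_NAME               node2
  NETWORK_INTERFACE     lan0
    HEARTBEAT_IP        192.168.10.2
  FIRST_CLUSTER_LOCK_PV /dev/dsk/c1t2d0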

Page 14: Mcserviceguard2

ServiceGuard Design Assumptions (continued)

For obvious reasons, two-node cluster configurations are required to configure some form of quorum arbitration. By definition, failure of a node or loss of communication in a two-node cluster results in a 50% partition. Due to the assumption that nodes fail independently of each other (the independent failure assumption), the use of quorum arbitration for cluster configurations with three or more nodes is optional, though highly recommended.

Partition Interactions – We need to examine to what extent the partitioning schemes either meet or violate the independent failure assumption.

The partitioning provided by nPartitions is done at a hardware level, and each partition is isolated from both hardware and software failures of other partitions. This provides very good isolation between the OS instances running within the partitions. So in this sense, nPartitions meets the assumption that the failure of one node (partition) will not affect other nodes. However, within the Superdome infrastructure there does exist a very small possibility of a failure that can affect all partitions within the cabinet. So, to the extent that this infrastructure failure exists, nPartitions violates the independent failure assumption.

Page 15: Mcserviceguard2

ServiceGuard Design Assumptions (continued)

The VPARS form of partitioning is implemented at a software level. While this provides greater flexibility in dividing hardware resources between partitions and allows partitioning on legacy systems, it does not provide any isolation of hardware failures between the partitions. Thus, the failure of a hardware component being used by one partition can bring down other partitions within the same hardware platform. From a software perspective, VPARS provides isolation for most software failures, such as kernel panics, between partitions. Due to the lack of hardware isolation, however, there is no guarantee that a failure, such as a misbehaving kernel that erroneously writes to the wrong memory address, will not affect other OS partitions. Based on these observations, one can conclude that VPARS violates the independent failure assumption to a greater degree than nPartitions does.

In addition to the failure-case interactions, VPARS exhibit a behavior that should also be considered when including a VPARS partition as a node in a ServiceGuard cluster. Due to the nature of the hardware/firmware sharing between VPARS, it is possible for one partition to induce latency in other partitions. For example, during boot up, when the booting partition requests the system firmware to initialize the boot disk, it is possible for other partitions running on the same machine to become blocked until the initialization operation completes. During ServiceGuard qualification testing, delays of up to 13 seconds have been observed on systems with a PCI bus and SCSI disks.

Page 16: Mcserviceguard2

ServiceGuard Design Assumptions (continued)

Cluster Configuration Considerations – Using the information from the preceding sections, we can now assess the impacts or potential issues that arise from utilizing partitions (either VPARS or nPartitions) as part of a ServiceGuard cluster. From a ServiceGuard perspective, an OS instance running in a partition is not treated any differently than an OS instance running on a non-partitioned node.

Quorum Arbitration Requirements – The ServiceGuard configuration rules for non-partitioned systems require the use of a cluster lock only in the two-node cluster case. This requirement is in place to protect against failures that result in a 50% quorum with respect to the membership prior to the failure. Clusters with more than two nodes do not have this as a strict requirement because of the independent failure assumption. As can be seen, this assumption is no longer valid when dealing with partitions: cluster configurations that contain OS instances running within partitions must also guard against the loss of a large portion of the membership based on complete failure of a hardware component that supports more than one partition.

Page 17: Mcserviceguard2

ServiceGuard Design Assumptions (continued)

Rule 1. Configurations containing the potential for a loss of more than 50% of the membership resulting from a single failure are not supported. These include configurations with the majority of nodes running as partitions within a single hardware cabinet. This implies that, in the two-cabinet case, the partitions must be symmetrically divided between the cabinets.

HP Superdome (the 16-, 32- and 64-way systems and the I/O expansion cabinet) successfully passed all twelve criteria required by The Uptime Institute for compliance certification. The systems continued to operate without interruption or loss of functionality through all testing manipulations. The systems were monitored at the operating console and showed no errors, hard or soft, during these tests. Certification was earned at the Tier IV level, the most fault-tolerant classification.

Exception: Where all cluster nodes are running within partitions in a single cabinet (the so-called cluster-in-a-box configuration), the configuration is supported provided users understand and accept the possibility of a complete cluster failure. This configuration is discussed in the section "Cluster In-A-Box".

Page 18: Mcserviceguard2

ServiceGuard Design Assumptions (continued)

Rule 2. Configurations containing the potential for a loss of exactly 50% of the membership resulting from a single failure require the use of a cluster lock.

This includes:
– Cluster configurations where the nodes are running in partitions that are wholly contained within two hardware cabinets.

Example: to be supported, a four-node cluster consisting of two nPartitions in each of two Superdome cabinets would require the use of a cluster lock.

– Cluster configurations where the nodes are running as VPARS partitions that are wholly contained within two nPartitions.

Cluster Configuration and Partitions – Given the configuration requirements described in Rule 1 and Rule 2, a few interesting observations can be made of clusters utilizing partitioning:
– If it is determined that a cluster lock is needed for a particular configuration, the cluster must be configured such that the cluster lock is isolated from failures affecting the cluster nodes. This means that the lock device must be powered independently of the cluster nodes (such as the hardware cabinets containing the partitions that make up the cluster).

Page 19: Mcserviceguard2

ServiceGuard Design Assumptions (continued)

Clusters wholly contained within two hardware cabinets that utilize the cluster lock for quorum arbitration are limited to either two or four nodes. This is due to a combination of Rule 1 and the existing ServiceGuard rule that limits support of the cluster lock to four nodes.

Cluster configurations can contain a mixture of VPARS, nPartitions, and independent nodes as long as the quorum requirements are met.

For a cluster configuration to contain no single points of failure, it must extend beyond a single hardware cabinet and comply with both the quorum rules and the ServiceGuard configuration rules.

Cluster In-A-Box – One unique cluster configuration possibility that is enabled by partitioning is the so-called cluster in-a-box. In this case all the OS instances (nodes) of the cluster are running in partitions within the same hardware cabinet. While this configuration is subject to single points of failure, it may provide adequate availability characteristics for some applications and is thus considered a supported ServiceGuard configuration. Users must carefully assess the potential impact of a complete cluster failure on their availability requirements before choosing to deploy this type of cluster configuration.

Page 20: Mcserviceguard2

ServiceGuard Design Assumptions (continued)

A cluster-in-a-box configuration consisting exclusively of VPARS is susceptible to a wider variety of possible failures that could result in a complete cluster failure than is a cluster made up exclusively of nPartitions.

I/O Considerations – ServiceGuard does not treat OS instances running in a partition any differently than those running on an independent node. Thus, partitions do not provide any exemptions from the normal ServiceGuard connectivity rules (such as redundant paths for heartbeat networks and to storage), nor do they impose any new requirements. There are, however, a couple of interesting aspects related to partitioned systems that should be noted:

– While not strictly a "partitioning" issue per se, the Superdome platform that supports nPartitions contains its interface cards in an I/O chassis, and there can be more than one I/O chassis per partition. Since the I/O chassis represents a potential unit of failure, redundant I/O paths for an nPartition must be configured in separate I/O chassis. Generally speaking, Superdome provides enough I/O capacity that ServiceGuard's redundant path requirement should not constrain the use of partitioning within the cluster.

– VPARS, on the other hand, must share essentially one node's worth of I/O capacity. In this case, the redundant path requirement can be a limiting factor in determining the number of partitions that can be configured on a single hardware platform.

Page 21: Mcserviceguard2

ServiceGuard Design Assumptions (continued)

The use of "combination" cards that combine both network and storage interfaces can help in some situations. However, redundant paths for a particular device must be split across separate interface cards (for example, using multiple ports on the same network interface card for the heartbeat LANs is not supported).

Latency Considerations – As mentioned previously, there is a latency issue, unique to VPARS, that must be considered when configuring a ServiceGuard cluster to utilize VPARS. There are certain operations performed by one partition (such as initializing the boot disk during boot up) that can induce delays in other partitions on the same hardware platform. The net result to ServiceGuard is the loss of cluster heartbeats if the delay exceeds the configured NODE_TIMEOUT parameter. If this should happen, the cluster starts the cluster re-formation protocol and, provided the delay is within the failover time, the delayed node simply rejoins the cluster. This results in cluster re-formation messages appearing in the syslog(1m) file, with diagnostic messages from the ServiceGuard cluster monitor (cmcld) describing the length of the delay.

For this reason, it is recommended that clusters containing nodes running in a VPARS partition increase the NODE_TIMEOUT parameter to fourteen seconds in order to eliminate cluster reformations caused by latency within the VPARS nodes.
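A minimal sketch of that adjustment, assuming the cluster ASCII file described later in these slides; the NODE_TIMEOUT entry in that file is expressed in microseconds (an assumption worth verifying against the template for your ServiceGuard release), so fourteen seconds would be written as:

NODE_TIMEOUT    14000000

Because NODE_TIMEOUT is a cluster timing parameter, the cluster must be halted before the change is checked and applied, as described under Reconfiguring the Cluster later in these slides.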

Page 22: Mcserviceguard2

MC/ServiceGuard Naming Conventions

A naming convention is a commonly understood pattern that is used to name files and directories. The paragraphs below describe the convention used for most MC/ServiceGuard installations. Actual file names and commands will be noted in bold text.

All MC/ServiceGuard configuration files must reside in the /etc/cmcluster directory. The binary cluster configuration file should be named cmclconfig. The ASCII file that is used to create the cluster is commonly named cmcluster.ascii. The ASCII configuration file must be edited in order to change the basic cluster parameters. When the cluster configuration is re-applied using cmapplyconf, the cmclconfig file is recreated.

For each package that is defined within the cluster, there should be a directory below /etc/cmcluster to contain the package definition and control files. That directory should be named after the package name used in the package configuration file. For example, for the 3G EAMS package, the package configuration file is <package name>.conf and the control file is <package name>.ctl.
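Putting the convention together, a hypothetical package named eams would give a layout like the following (the package name is an illustrative assumption, not taken from these slides):

/etc/cmcluster/cmclconfig          binary cluster configuration file (created by cmapplyconf)
/etc/cmcluster/cmcluster.ascii     ASCII cluster configuration file
/etc/cmcluster/eams/eams.conf      package configuration file
/etc/cmcluster/eams/eams.ctl       package control script
/etc/cmcluster/eams/eams.ctl.log   package control script log (see Basic Procedures)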

Page 23: Mcserviceguard2

MC/ServiceGuard Basic Procedures

The state of the cluster should be verified before, during and after performing all cluster and package activities. Before any cluster or package activities are performed, use the cmviewcl -v command to verify that the cluster is in an appropriate state for the action you intend to perform. During the execution of any package commands, monitor the system and package log files. Use the following commands to monitor package starts and stops and to monitor cluster activity:

# tail -f /etc/cmcluster/<database directory>/<package name>.ctl.log
# tail -f /var/adm/syslog/syslog.log

These two commands should be run simultaneously, each in a different window. When all activity has finished, verify the state of the cluster using the cmviewcl -v command.

NOTE: The cm commands must be run as root, so most of these commands are usually administered by the Systems Administrator.
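The two monitoring commands above can also be combined in a small helper script. This is only a sketch; the script name and the assumption that the package log lives in a directory named after the package follow the naming convention from the previous slide rather than anything mandated by MC/ServiceGuard:

#!/usr/bin/sh
# watch_pkg.sh - hypothetical helper: show the cluster state, then follow
# the package control script log and syslog together in one window.
PKG=${1:?usage: watch_pkg.sh <package name>}
cmviewcl -v
tail -f "/etc/cmcluster/$PKG/$PKG.ctl.log" &    # package control script log
LOGPID=$!
trap 'kill $LOGPID' INT TERM EXIT
tail -f /var/adm/syslog/syslog.log              # cluster activity logged by cmcld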

Page 24: Mcserviceguard2

MC/ServiceGuard Commands

Starting a Cluster – when all systems are up but all nodes are down (cluster activity terminated):
# cmruncl -v
# cmruncl -v -n <node-name-1>   - Use only when the cluster is not running.
MC/ServiceGuard cannot guarantee data integrity if you try to start a cluster with the cmruncl -n command while one or more of the nodes of the cluster is already running a cluster.

Adding Nodes to a Cluster – Use the cmrunnode command to add one or more nodes to an already running cluster. Any node you add must already be a part of the cluster configuration. The following example adds node <node-name-2> to the cluster that was previously started with the cmruncl command:
# cmrunnode -v <node-name-2>
Since the cluster is already running, the node joins the cluster and packages may be started on that node. If the node does not find its cluster running, or the node is not part of the cluster configuration, the command fails.

Removing Nodes from a Cluster – To halt a node with a running package, use the -f option. If a package was running that can be switched to an adoptive node, the switch takes place and the package starts on the adoptive node. For example, the following command halts the MC/ServiceGuard daemon on node <node-name-1>; any package running there switches to node <node-name-2>, its adoptive node:
# cmhaltnode -f -v <node-name-1>

Returning a Node to a Cluster – To return a node to the cluster, use cmrunnode.
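For example, a hedged sketch of the typical maintenance round trip using the commands above (node names are placeholders):

Check the current cluster and package state:
# cmviewcl -v
Halt the node; any packages with switching enabled move to their adoptive nodes:
# cmhaltnode -f -v <node-name-1>
Perform the maintenance, then return the node to the running cluster and verify:
# cmrunnode -v <node-name-1>
# cmviewcl -v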

Page 25: Mcserviceguard2

MC/ServiceGuard Commands (continued)

Reconfiguring the Cluster – To make a permanent change in the cluster configuration: halt the cluster on all nodes only if cluster timing parameters are being changed; all other changes can be made dynamically on a running cluster.
• On one node, reconfigure the cluster by editing the cluster definition file: /etc/cmcluster/cluster.ascii
• Use the cmcheckconf command to check the ASCII cluster configuration file. For example:
# cmcheckconf -v -C /etc/cmcluster/cluster.ascii
• Use the cmapplyconf command to copy the binary cluster configuration file to all nodes. This file overwrites any previous version of the binary cluster configuration file. For example:
# cmapplyconf -v -C /etc/cmcluster/cluster.ascii
• If the cluster was brought down, use the cmruncl command to start the cluster on all nodes or on a subset of nodes, as desired.

Note that this procedure is for cluster changes only; for permanent package modifications, you would reconfigure a package. Also note that in order to maintain a package definition in the cluster, there must be appropriate references in the cmcheckconf and cmapplyconf commands:
# cd /etc/cmcluster
# cmcheckconf -v -C cluster.ascii -P <database directory>/<package name>.conf
# cmapplyconf -v -C cluster.ascii -P <database directory>/<package name>.conf
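For example, a consolidated sketch of changing a cluster timing parameter, which does require halting the cluster first. The package name eams is a hypothetical example, and cmhaltcl (the command that halts the entire cluster, not otherwise shown in these slides) is used with -f to force the halt even if packages are running:

# cmviewcl -v
# cmhaltcl -f -v
Edit /etc/cmcluster/cluster.ascii, then verify and apply the configuration, referencing every package with its own -P parameter:
# cd /etc/cmcluster
# cmcheckconf -v -C cluster.ascii -P eams/eams.conf
# cmapplyconf -v -C cluster.ascii -P eams/eams.conf
Restart the cluster and confirm its state:
# cmruncl -v
# cmviewcl -v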

Viewing the Status of the Cluster
# cmviewcl -v

Page 26: Mcserviceguard2

MC/ServiceGuard Commands (continued)

Halting a Package – You halt an MC/ServiceGuard package when you wish to bring the package out of use but wish the node to continue in operation. Halting a package has a different effect than halting a node. When you halt a node, its packages may switch to adoptive nodes, assuming that package switching is enabled for them. When you halt a package, it is disabled from switching to another node and must be restarted manually on another node or on the same node. For example, use the cmhaltpkg command to halt a package, as follows:
# cmhaltpkg <package name>
This command halts the package and disables it from switching to another node.

Moving a Package – Before you can move a package, you must halt it on its current node using the cmhaltpkg command. This action not only halts the package, but also disables package switching back to the node on which it is halted.

After you halt the package, you must restart it and enable package switching. You can do this by issuing the cmrunpkg command followed by the cmmodpkg command. The cmmodpkg command can be used with the -n option to enable a package to run on a node if the package has been disabled from running on that node due to some sort of error. If no node is specified, the node from which the command is issued is the implied node. For example:
# cmhaltpkg -n <node-name-2> <package name>
# cmrunpkg -n <node-name-1> <package name>
# cmmodpkg -e <package name>

This procedure is useful when a failover has occurred and you want to push the package back to its primary node.
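The cmmodpkg -n option mentioned above is not shown in that example; a hedged sketch of re-enabling a specific node as a failover target for the package, followed by re-enabling global switching (names are placeholders):

# cmmodpkg -e -n <node-name-2> <package name>
# cmmodpkg -e <package name>

The -e option enables switching; with -n it applies to the named node only, which is useful after a node has been disabled for the package due to an error.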

Page 27: Mcserviceguard2

MC/ServiceGuard Commands (continued)

Reconfiguring a Package – To make a permanent change in a package configuration, use the following steps (a consolidated sketch follows the list):
• Halt the package.
• On the primary node, reconfigure the package by editing the package configuration file: /etc/cmcluster/<database directory>/<package name>.conf
• To modify the package control script, edit the package control script directly: /etc/cmcluster/<database directory>/<package name>.ctl. Any changes in service names will also require changes in the package configuration file.
• Copy the modified control script to all nodes that can run the package.
• Use the cmcheckconf command to check the ASCII cluster configuration file and package configuration file. For example:
# cd /etc/cmcluster
# cmcheckconf -v -C cluster.ascii -P <database directory>/<package name>.conf
• Use the cmapplyconf command to copy the binary cluster configuration file to all nodes. This file overwrites any previous version of the binary cluster configuration file. For example:
# cmapplyconf -v -C cluster.ascii -P <database directory>/<package name>.conf

NOTE: Remember to reference all package configuration files with separate -P parameters.

• Use the cmrunpkg command to restart the package on the desired node.
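A consolidated, hedged sketch of those steps for a hypothetical package named eams; the use of rcp to distribute the control script is an assumption (any site-standard copy mechanism works):

# cmhaltpkg eams
Edit /etc/cmcluster/eams/eams.conf and /etc/cmcluster/eams/eams.ctl as needed, then copy the control script to every node that can run the package:
# rcp /etc/cmcluster/eams/eams.ctl <node-name-2>:/etc/cmcluster/eams/
Verify and apply the configuration, then restart the package and re-enable switching:
# cd /etc/cmcluster
# cmcheckconf -v -C cluster.ascii -P eams/eams.conf
# cmapplyconf -v -C cluster.ascii -P eams/eams.conf
# cmrunpkg -n <node-name-1> eams
# cmmodpkg -e eams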

Page 28: Mcserviceguard2

MC/ServiceGuard Commands (continued)

NOTE: For package changes that only involve modifications of the package control file, /etc/cmcluster/<database directory>/<package name>.ctl, it is only necessary to halt that package, make the necessary system changes, modify the package control file and distribute it, and then restart that package. Changes in the package control file that do not affect the package configuration file include, but are not limited to, changes in the package run and/or halt commands or logical volume changes in existing package volume groups. A short sketch of this lighter-weight procedure follows.
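A minimal sketch of that lighter-weight procedure for the hypothetical eams package; cmcheckconf and cmapplyconf are not needed here because the package configuration file is unchanged:

# cmhaltpkg eams
Edit /etc/cmcluster/eams/eams.ctl and distribute it to every node that can run the package, then restart the package and re-enable switching:
# cmrunpkg -n <node-name-1> eams
# cmmodpkg -e eams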