IBM Power Systems platform: Advancements in the state of the art in IT availability
G. T. McLaughlin
L. Y. Liu
D. J. DeGroff
K. W. Fleck
This paper surveys the current state of information technology (IT) availability on the
IBM System p5 server platform, then describes how selected hardware and software
features of the next-generation IBM Power Systems platform (by which we
specifically mean IBM POWER6 processor-based systems running the IBM AIX 6
operating system) will enable client IT organizations to more closely approach true
continuous availability. Also presented is information on several IT management
disciplines that are critical to achieving high levels of availability. The objective is to
enable accelerated adoption and success with the new Power Systems platform by
explaining how the technologies can be used to improve IT availability. We define the
underlying dependencies required to implement the new live partition mobility and
the live application mobility features and show how the environment can be best
designed for planned maintenance. Until now, the concept of server virtualization in
the UNIX environment has been limited to a single server, but the Power Systems
platform extends the virtualization realm. A brief discussion is given comparing
software clustering with the new mobility features and illustrating how they are
complementary.
INTRODUCTION
Many IBM clients have come to rely on the IBM
System p5* platform (servers based on IBM
POWER5* or POWER5+* processor technology) to
run their key business functions. Its performance,
reliability, and leadership in virtualization make it a
compelling choice. In addition, there is a rich library
of commercially available software from which to
choose. Properly designed infrastructures based on
the System p5 platform are generally capable of
meeting stated availability requirements. However,
clients often struggle with meeting maintenance
demands given constrained IT budgets and changing
availability requirements; as availability needs
increase, so does the importance of properly
maintaining all infrastructure components. While
the current System p5 platform provides capabilities
to mitigate the impact of planned outages on
information technology (IT) availability, it is evident
that these measures are still too disruptive in cases
where availability requirements are very stringent.
The next-generation IBM System p* platform is built
on IBM POWER6* processor hardware and the IBM
AIX* 6 operating system (referred to herein as the
IBM Power* Systems platform) and addresses the
need for nondisruptive maintenance capabilities.
The Power Systems platform provides new work-
load mobility features and numerous hardware and
operating system (OS) resiliency improvements,
allowing planned and emergency maintenance
actions to be completed at any time rather than
having to be scheduled at times of low use, such as
nights and weekends. These new features comple-
ment existing high-availability (HA) technologies to
advance the state of the art in IT availability.
SETTING THE STAGE
It is useful to briefly describe some key concepts as a
prelude to examining the availability characteristics
of the System p5 and Power Systems platforms.
Assessments done by the IBM High Availability
Center of Competency (HACoC) show that base
hardware, OS technologies, and environmental
factors—areas that are traditionally the focus of
availability solutions—account for only about 20
percent of the total unplanned downtime; operations
(process) and applications are responsible for the
remaining 80 percent. In addition, when considering
strategies to maximize availability, we have found
that planned downtime is often overlooked.
As a rule, it is better to design in availability from
the start than to try to retrofit it. Consider the end-to-
end logical view of a typical infrastructure (Figure 1).
For a business to achieve its availability goals, it is
critical to view the system not only with the
traditional component-focused view, but from an
end user's perspective, by asking, "What are the business objectives that drive availability requirements?" To answer this question, it is necessary to
quantify the costs of an outage and to understand
the capabilities and limitations of current technolo-
gy. If the availability goal is near-continuous
availability, then it must be possible to remove any
infrastructure component without affecting service
delivery; in short, all single points of failure must be
eliminated.
The IT Infrastructure Library*** (ITIL****) v3
Services Design volume defines the term Single
Point of Failure (SPoF) as follows: Any Configura-
tion Item that can cause an Incident when it fails,
and for which a Countermeasure has not been
implemented. A SPoF may be a person, or a step in a
Process or Activity, as well as a Component of the IT
Infrastructure. To that, the IBM HACoC team has
added an informal definition of the term "countermeasure": An action or solution that will mitigate
the impact of the failure of a Configuration Item to
meet the stated availability goals.
Some of the key terms used throughout this paper
are defined as follows:
• High availability (HA): The attribute of a system to provide service during defined periods at acceptable or agreed-upon levels and to mask unplanned outages from end users
• Continuous operations (CO): The attribute of a system to continuously operate and mask planned outages from end users
• Continuous availability (CA): The attribute of a system to deliver nondisruptive service to the end user seven days a week, 24 hours a day (no planned or unplanned outages)
The relationship of these three terms can be stated
informally as: CA = CO + HA.
In general, the less data loss that can be tolerated, the higher the cost, so an appropriate trade-off must be made to achieve the availability goals of the business within budgetary constraints. Similarly, shorter recovery times are
more costly. Recovery time considerations include
fault detection, network and data recovery, and the
bringing online of servers, middleware, and appli-
cations. The desired point in time by which data
must be restored in order to resume transaction
processing is called the recovery point objective
(RPO). The desired length of time required to restore
IT services following a disruptive event is called the
recovery time objective (RTO).
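For example, an RPO of 15 minutes means that, at most, the last 15 minutes of committed transactions may have to be recreated or re-entered after recovery, while an RTO of two hours means that end-to-end service must be restored within two hours of the disruptive event.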
AVAILABILITY LANDSCAPE
In this section, we review some key features of
System p5 servers, server clustering technologies,
and data mirroring technologies. While applications
and networking are also important contributing
factors to achieving business availability goals, our
focus is on the base infrastructure. We then briefly
cover two load-balancing options and conclude with
a summary of HA configurations.
Virtualization and clustering
Through virtualization and capacity management
features, System p5 servers provide benefits that
help clients efficiently achieve their availability
targets. Detailed information on such features as
logical partitioning, dynamic logical partitioning,
cluster systems management, network installation
management, hardware management console, vir-
tual Ethernet, virtual I/O server, and integrated
virtualization manager is given in References 1 and 2.
The System p5 platform also offers capacity on
demand (CoD) capabilities.3
CoD can be included in cluster takeover design and capacity planning, and it helps address the old complaint that IT availability is achieved by buying two of everything, with no easy way to utilize the excess capacity.
One of the characteristics of the System p5 platform
is that there are generally several options to address
any need, including HA clustering. In this section,
we briefly describe IBM PowerHA Cluster Manager
(HACMP*) software,4
IBM Tivoli* System Automa-
tion for Multiplatforms5
(SA MP), and Symantec
Veritas** Cluster Server (VCS) products. These
technologies reduce the duration of an outage with
automated, fast recovery.
HACMP software first shipped in 1991, and it is
recognized as a robust, mature HA clustering
solution. It is also a powerful tool for testing, maintaining, and monitoring hardware, networks,
and applications. Reference 6 describes best prac-
tices for HA cluster design and also testing,
maintenance, and monitoring in a virtualized world.
[Figure 1. Example of a typical IT infrastructure]
Reference 7 covers various clustering options, HA
design considerations for application availability,
and best practices for testing and maintenance.
HACMP software is particularly recommended in
environments that are standardized on AIX.
In the most recent release of IBM Tivoli System Automation, which was generally available in June 2008, the end-to-end component was separated to become its own product, the
System Automation Application Manager (SA AM).
The base component was renamed System Auto-
mation for Multiplatforms (SA MP). SA MP provides
the capability to manage the availability of applica-
tions running on Microsoft Windows**, AIX, and
Linux** on IBM and non-IBM platforms. SA AM
manages different clustering technologies and pro-
vides a single point of control and management for
heterogeneous mixes of various clustering technol-
ogies, such as HACMP, SA MP base, VCS, SA z/OS*,
and Microsoft Cluster Server (MCS). More details
are available in Reference 8.
VCS is an original equipment manufacturer HA
cluster software product that supports the following
platforms: Sun Microsystems Solaris**, AIX, HP-UX,
Linux (Red Hat or Novell SUSE Linux Enterprise),
and Windows. Details of the Veritas Foundation
Suite for AIX are available in Reference 9.
Data availability
Ensuring data availability is a significant component
of an overall availability strategy. This section
briefly describes several options that are used in IT
infrastructures based on System p servers. Please consult the references for further details on data and database mirroring.
The geographic logical volume manager (GLVM) is a
component of AIX that provides real-time geo-
graphic data mirroring over standard TCP/IP
(Transmission Control Protocol/Internet Protocol)
networks. It is built upon the AIX logical volume
manager (LVM) and allows the creation of a mirror
copy of critical data at a geographically distant
location.10
The HACMP extended distance (XD)
feature provides integration of GLVM into an
HACMP cluster environment. If the server cluster
nodes are within approximately 2 to 4 km, then LVM
is a viable solution; otherwise, GLVM should be
considered.
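As a brief sketch of the LVM approach (the volume group and disk names are illustrative, and option details should be verified against the AIX documentation for the installed level), a second copy of the data can be created on a disk presented from the second site and synchronized in the background:

   # Add the site-B disk to the volume group and mirror every logical volume onto it
   extendvg datavg hdisk4
   mirrorvg -S datavg hdisk4      # -S performs the synchronization in the background
   lsvg -l datavg                 # verify that each logical volume now has two copies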
The business continuity (BC) solutions in the IBM
portfolio of system storage resiliency technologies
have been segmented to meet client RPO and RTO requirements for System p environments with HACMP/
XD. Information on the seven tiers of BC (a mapping
of recovery costs compared with recovery time
objectives) and more details on Metro Mirror (MM)
and Global Mirror (GM) solutions (and Metro Global
Mirror, a combination of the two) can be found in
References 11 and 12. Additional comprehensive
storage solutions can be found in Reference 13. The
MM solution is synchronous and is the preferred
solution when data centers are within approxi-
mately 100 km of one another. MM can be used for
both HA and disaster recovery (DR) purposes. The
GM solution is asynchronous and can be used
between data centers that are separated by long
distances, making it the preferred DR solution.
As its name implies, IBM DB2* HADR can be used
for both HA and DR (when configured in an HA
cluster environment) and provides three data
mirroring modes: synchronous, near synchronous, and asynchronous. It is an effective solution to mirror
critical DB2 databases. More detailed information,
advantages, and limitations are available in Refer-
ences 12 and 14.
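As a minimal sketch of how such a pair might be set up (the database name and host names are illustrative, and the full procedure, including the HADR service ports and seeding the standby from a backup, is described in the DB2 documentation), the mirroring mode is chosen with the HADR_SYNCMODE database configuration parameter and HADR is then started on each node:

   # On each node, identify the partner and select a mirroring mode (SYNC, NEARSYNC, or ASYNC);
   # additional HADR parameters (service ports, instance names) are also required
   db2 update db cfg for SALESDB using HADR_LOCAL_HOST nodeA HADR_REMOTE_HOST nodeB HADR_SYNCMODE NEARSYNC
   # Start HADR on the standby first, then on the primary
   db2 start hadr on db SALESDB as standby     # run on nodeB
   db2 start hadr on db SALESDB as primary     # run on nodeA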
The use of Oracle database products is popular on
the System p platform. The Oracle Real Application
Clusters (RAC) product is a good solution for
clustering database servers with data-sharing capa-
bility. Recommendations and samples of how to
implement an Oracle 9i RAC database with IBM
General Parallel File System* (GPFS*) are available
in Reference 15. Reference 16 provides an overview
of the Oracle Maximum Availability Architecture.
Figure 2 shows how cluster configurations can be
altered to address data availability requirements
over various distances. It is also possible to mix
configurations to create a multisite solution, for
instance, combining MM with GM. Because of the
parallel resources, a multisite solution increases
availability beyond the combined availability of a
one-site infrastructure.
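To illustrate why the parallel resources help, assume for simplicity that the sites fail independently and that either site can carry the full workload. The combined availability of two such sites with individual availabilities A1 and A2 is 1 - (1 - A1)(1 - A2); two sites that each deliver 99.9 percent availability (roughly 8.8 hours of downtime per year) together deliver about 99.9999 percent (roughly half a minute per year) in this idealized model.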
LVM and GLVM are less expensive solutions, but
they use some server CPU cycles because they are
OS-level functions. With its different synchroniza-
tion modes, DB2 HADR could be used for database
mirroring in all cluster configurations.
Load balancing and cluster configuration options
HA clustering does not completely eliminate outages, but instead reduces the impact of an outage. If
the goal is near-continuous availability, consider
load-balancing options if the costs of an outage
justify the additional investment.
The IBM brand image hinges heavily on the performance and availability of the company's complex Web site, which is a three-site global server load-balancing implementation that eliminates traditional maintenance windows. All three sites are identical, active
sites with POWER processor-based servers. This
solution is ideal for Web applications with minimal
back-end data synchronization or stateless applica-
tions that do not require any back-end data
synchronization.
Figure 3 shows another load-balancing solution
implemented using the container architecture, in
which a container is an implementation of a
particular function. The interfaces to container 1 and
container 2 are identical, as are the functions for the
two containers, but the implementation of each
container could be in different physical configura-
tions. The front end requires a hardware or software
workload dispatcher, while data synchronization is
accomplished through IBM MQSeries* middleware.
Because the client sessions are stateless (not sticky), container load balancing is possible for
both inbound and outbound traffic. This solution
could be implemented in the same server, within the
same data center, or among multiple data centers
(two or more sites) at various distances. The
benefits of this solution include increased availabil-
ity with additional redundancy, easier introduction
of new technologies and applications, less-risky
platform migrations, easy performance testing, and
DR capability.
Having briefly surveyed various server clustering
products, data mirroring options, and load-balanc-
ing solutions, we turn to considering some sample
cluster implementations drawn from real-life client
case studies. Any option that is chosen must take
into consideration the availability goals of the
business, the IT budget, and the applications.
[Figure 2. Clustering and data replication scenarios: local and campus clusters use OS mirroring (sites up to about 2 km apart), remote clusters use synchronous replication (up to about 100 km), and global clusters use asynchronous replication (about 1,000 km)]
Table 1 illustrates several HA cluster configuration options that can be used effectively in servers with
logical partitions (LPARs). Indeed, the recommend-
ed practice is to deploy different cluster types based
on specific processing needs. The integration of HA
clustering with virtualization (particularly dynamic
resource management) allows the achievement of
HA while efficiently utilizing server resources.
WHY IS IMPROVING IT AVAILABILITY DIFFICULT?
Protecting the business from unplanned outages is
only part of the CA equation. Even if hardware and
software never failed, the changes in an IT
environment would still need to be managed.
Change is driven by many causes, including
business growth, evolving requirements, and soft-
ware and hardware life cycles. The infrastructure
must be able to tolerate changes without disruption
to the users.
A proactive maintenance strategy is an obvious way
to manage planned change and reduce the risk of
outages due to defects for which fixes are already
available. Change is constant, and clients who
achieve very high levels of IT availability have
designed infrastructures, supported by effective
management practices, that can tolerate change.
Options exist today to manage service availability
during planned maintenance actions, including
concurrent maintenance features, rolling upgrades
enabled by clustering, and application load balanc-
ing. Yet many clients are still unable to consistently
make changes nondisruptively, so they hesitate to
make changes, even when they are urgently needed.
In order to save money, many clients consolidate
workloads onto a few large servers. If a planned or
unplanned outage occurs on a consolidated server,
significant production workload must be relocated in a short period. This is complicated by different business units being hosted on the same server, leading to scheduling conflicts that often cause maintenance to be deferred to a planned maintenance window.
[Figure 3. Logical design of a container load-balancing solution: two identical containers in an Internet banking environment, each with Web servers, Internet banking application servers, and application gateways, sharing application, data, and security authorization services behind data center network equipment, and supported by general management, release management, and Internet banking management services and by external applications such as token servers, registrations, FileNet, and the mainframe]
Sufficient capacity on the takeover server (HA
clustering) or on redundant servers (load balancing)
is required to support the resource needs of the
additional workloads that are relocated from other
servers. Often, to reduce costs, this capacity is
undersized, resulting in longer recovery times, failed
cluster takeovers, and degraded performance.
With hot-standby clusters, the secondary node is
often undersized to reduce unused capacity. The
workload runs on the secondary node while the
primary node is being restored, at which time the
workload is moved back to the primary node.
Recovery requires two takeovers (failover and fail-
back) to return to the initial state. A mutual takeover
cluster model removes the primary versus second-
ary role (either node can fully support the workloads
running on both), eliminating the second takeover
and reducing the service impact.
Effective HA clustering requires careful planning,
design, and ongoing monitoring and maintenance
after deployment. Too often, clients install their HA
software, do some basic design and configuration,
and expect the solution to take care of itself. Cluster
testing is often overlooked, which leads to takeover
problems that are not discovered until an outage
occurs. Applications managed by HA clusters must
be cluster-aware, that is, able to be shut down,
restarted, and recovered programmatically.
The use of HA solutions should be second nature to
the IT operations staff so that when an unplanned
outage occurs, the time to recover service to the
business is well known. The same is true for
planned outages; migrating workload to another system should be a documented, tested procedure so that the decision to act can be made confidently and based on business needs.

Table 1. HA cluster options with two virtualized System p servers
Server A LPARs | Comments | Server B LPARs
Application server 1 | Load-balanced application servers; deploy in dynamic LPARs | Application server 2
Application 2 or DB2 primary | Active-passive HA cluster (also known as hot-standby cluster); primary node is a dynamic LPAR; standby node is expanded when failover occurs | Application 2 or DB2 standby micro-partition
Application 3 or DB3 | Active-active HA cluster (also known as mutual takeover cluster); both nodes are dynamic LPARs; additional resources are allocated to the receiving node when failover occurs | Application 4 or DB4
Test 1, Test 2 | Quality assurance clusters strongly recommended to test cluster failover and changes in a production-like safe environment | Test 3, Test 4
Non-production 1, Non-production 2 | Mixing nonproduction or other low-priority workloads increases capacity flexibility and resource utilization | Non-production 3, Non-production 4
CoD pool | CoD allows resource usage on an as-needed basis | CoD pool
POWER SYSTEMS PLATFORM RESHAPES THE AVAILABILITY LANDSCAPE
Any HA solution that mitigates the impact of
unplanned outages by shutting down resources on
the failing node and restarting them on the backup
node typically demonstrates an observable service
interruption. Still, this approach is typically faster
than manually recovering the failed component.
Many clients have also adopted HA clustering as a
planned maintenance tool, yet they find that the
small service interruptions caused by the takeover
process are still too disruptive. These clients would
undoubtedly welcome a solution that would allow
them to perform maintenance actions nondisrup-
tively.
The new features described below advance the
capability of the Power Systems platform to help
eliminate outages for planned maintenance or
administrative changes. The philosophy behind
these features represents a fundamental shift in
thinking toward availability. When an operation can
be performed dynamically, with no service inter-
ruption, it becomes possible to take proactive
actions to avoid problems as soon as the need is
identified.
Live partition mobility
Live partition mobility (LPM) is the ability to
logically move an active partition between Power
Systems servers. This technology builds upon the
virtualization capability of the IBM POWER Hyper-
visor*17
and allows movement of both AIX- and
Linux-based workloads. Mobile partitions provide
additional capability for workload capacity and
energy management in the IT infrastructure.
Virtual partition memory, the processor compati-
bility register, and processor time-base adjustment
facilities are the key POWER6 processor enhance-
ments that make LPM possible.18
These technolo-
gies are necessary to ensure that the OS and
applications are able to function seamlessly after a
live migration to a different managed system.
When migrating the processor and memory states of
active partitions between different physical systems,
two key challenges are ensuring that there is
adequate capacity (so that the partition definition
can run on the target system) and limiting the time
required to move processing from the source to the
target system (so that no outage is perceived by the
user).19
LPM works by virtualizing all storage and network
resources through a virtual I/O (VIO) server so that
they are accessible on the source and target Power
Systems servers. When a migration is requested, the
hardware management console (HMC) validates
that the target system is capable and the LPAR is in a
proper state for migration. It then creates an
identical partition definition on the target system
and copies the current memory state to the target
partition. As copied memory pages are again
changed on the source, they are tracked to be resent,
and once the memory state is copied and the system
has determined it is ready to perform the switch,
processing is stopped and the processing state
copied and started on the target system. Remaining
dirty memory pages are resent and destination faults
on empty pages are given copy priority. Once the
remaining memory pages are sent to the target, the
migration is complete.20
The only potential inter-
ruption is during the processing steps of the
migration: stop, copy state, and start. This interval should be short enough that it is perceived, at most, as a slight pause during runtime; the pause does not impact the running partition.
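As a sketch of how a migration is typically driven from the HMC command line (the managed-system and partition names are illustrative, and the available options vary by HMC level), the migrlpar command supports a validation pass before the actual move:

   # Validate that the partition can be migrated to the target managed system
   migrlpar -o v -m source-570 -t target-570 -p prod_lpar01
   # Perform the active (live) migration
   migrlpar -o m -m source-570 -t target-570 -p prod_lpar01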
There are several planning requirements for enabling LPM:20
• Systems must be managed by the same HMC
• Adequate resources must exist on the target managed system
• All storage is virtualized on an external storage area network (SAN)
• No physical adapters can be used by the mobile partition
• All network and disk access must be virtualized through VIO servers
• Each Shared Ethernet Adapter (SEA) on both VIO servers is configured to bridge to the same Ethernet network
Given the virtualized storage requirement, it is not
surprising that the mobile partition cannot own any
physical adapters, as they would not be available on
the target system. Tape and optical drives may also
be virtualized through VIO servers; however, if
physical tape or optical drives are required for
backup operations, the I/O controllers may be added
to the partition and removed prior to migration
using dynamic LPARs. Inactive migrations (or non-
live partition migrations) may be performed on
partitions that own physical I/O. The hardware must
then be verified or removed from the partition
profile prior to partition activation on the target
managed system.20
The VIO server virtualizes both disk and network
resources for use by LPARs and allows an identical
configuration to be created on the target managed
system during a partition migration. In addition to
the general storage virtualization requirements, the root volume group (rootvg) must also be accessed through the VIO server from storage that resides on the SAN.
The new integrated virtual Ethernet adapter (i.e., the host Ethernet adapter) is a physical Ethernet port on the managed system that can be virtualized across up to 32 partitions. These ports, while virtualized, are
considered to be physical I/O and cannot be
assigned to mobile partitions.
Partition mobility depends on the storage subsystem
being accessible from both the source and target
systems. These considerations are outside the scope of this paper; however, it is extremely important that care and planning be applied to the design of the storage infrastructure to protect against service outages.
Reference 21 provides additional details on deploying a resilient VIO server configuration that utilizes SEA takeover, link aggregation, multipath I/O, and dual VIO servers.
Live application mobility
Workload partitions (WPARs) allow the logical
separation of workloads into virtualized partitions
within the AIX OS. The application runtime environment is virtualized and can be moved to other AIX partitions. This enhances the ability to handle dynamic workloads, capacity management, and planned outages. WPARs are a function of
AIX 6 and can be created on any server supported by
this OS. In most cases, applications can run
unmodified in WPARs. LPARs supporting WPARs
may use shared or dedicated processors that will
then be virtualized to the supported WPARs. Live
application mobility (LAM) is the term used to
describe the movement of applications running in
WPARs between managed systems.
Like LPM, LAM does not protect the application
from unplanned outages. Instead, it facilitates
nondisruptive, planned workload relocations. Un-
like LPM, there is no requirement for the systems to
be managed by the same HMC or for the resources
to be virtualized through a VIO server.
The global environment is similar to the main
operating environment of earlier versions of AIX. It
is the part of AIX that owns all resources, does not belong to any WPAR, and is where WPARs are managed (WPARs cannot be managed from within a WPAR). The global environment shares its allotted
system resources, such as processors, memory, and
file systems, with the defined WPARs. Capacity caps
on processor and memory utilization can be defined.
Because all WPARs run in the global environment, they share the same OS environment. This
simplifies OS and application software maintenance,
but care is needed to ensure that all applications
deployed and running in separate WPARs under a
single AIX global environment are capable of
running at the same AIX level.
The capacity of all resources assigned to the global
environment, either dedicated or virtualized, needs
to be carefully planned and managed in order to
accommodate active WPARs and the potential
relocation of workload through LAM. For example, a
single Ethernet interface on the global environment
can be virtualized to WPARs, each with its own
unique IP address through IP aliases, but the
networking configuration can be changed only from
the global environment. Since a single adapter can
be shared by many WPARs, there may be a need to
balance WPARs across multiple adapters. The
WPARs can access storage only through file systems
defined in the global environment. WPARs cannot
access or be assigned physical or virtual disks.
There are two types of WPARs, each having
different operating characteristics that are important
to understand. A system workload partition (SWP)
is a near fully capable AIX operating environment
with its own /, /usr, /opt, /var, /tmp, and /home
file systems as well as init and all attendant
services (/usr and /opt can optionally be shared
from the global environment). It is a near clone of
the global environment. When an SWP is started, an
init process is created and subsequent services are
spawned. The SWP can have its own network
environment including inetd service. An applica-
tion workload partition (AWP) is more lightweight
and transient; it is started as soon as the application
is started and is removed when the application
stops. An AWP is started by running the wparexec
command and passing the application start com-
mand as an argument. Once the application task
passed to wparexec completes, the AWP is termi-
nated as well.
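As a brief sketch of the difference between the two WPAR types (the WPAR names, IP address, and application path are illustrative, and option syntax may differ by AIX 6 level), an SWP is created and started explicitly, while an AWP exists only as long as the command passed to wparexec:

   # System WPAR: create a near clone of the global environment and boot its init process
   mkwpar -n swp01 -N address=192.0.2.10     # define the SWP with an IP alias for its network
   startwpar swp01
   lswpar                                    # show defined WPARs and their states
   # Application WPAR: created for a single command and removed when it completes
   wparexec -n awp01 /usr/local/bin/nightly_batch.sh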
The new AIX 6 Workload Partitions Manager*
(WPM) provides a central point of management for
creating, changing, and controlling WPARs across
any number of servers and global environments.22
It
is the point of control for initiating manual WPAR
mobility operations and for setting up automated
WPAR mobility. Based on predefined policies, a
WPAR group can be defined that will relocate
WPARs across multiple servers to meet peak
demands and then return to the original configura-
tion when the peak subsides. This is useful for
managing daily workload peaks or end-of-month
processing needs.
LAM works by checkpointing the application and
restarting it on another LPAR running AIX. This is
coordinated by the WPM or by running commands
directly on the source and target global environment
AIX instances. Since the WPM has a cross-system
view, it can provide performance details on all
active WPARs. Checkpoint and restart are the key operations for LAM and can be used to balance workload by initiating a checkpoint-and-kill operation followed by a restart operation at a later time.
Complementary technologies for workload
relocation
Clients will often shut down a workload during a
maintenance window rather than use a workload
relocation mechanism because they lack automation
or because of application characteristics. As avail-
ability requirements continue to rise, this practice is
becoming inadequate. Without the ability to relocate
workloads during a planned outage, clients often
choose to delay or even skip software maintenance.
This practice exposes them to unplanned outages
due to missing service updates. LPM and LAM both
provide dynamic workload relocation capability to
help alleviate this problem.
In a typical HA cluster solution, workload relocation
requires that resources, such as the application, IP
service address, storage, and file systems, be shut
down on one cluster node and then reactivated on
an alternate node. The steps to accomplish this are
automated through the clustering framework once a
resource group takeover is initiated. The end-to-end
process can take several minutes or longer, de-
pending on the amount of disk storage, the number
of file systems involved, and the length of time it
takes for the applications to be stopped and started.
User impact during the cluster takeover may be
significant. A switchover of less than two minutes is
not realistic in most production environments.
In Figure 4, the lower portion of the bar represent-
ing the HA cluster takeover illustrates that it actually
provides fast recovery. The application must be
stopped, file systems unmounted, volume groups
varied off (taken offline), and finally the service IP
address (used for cluster communications) is taken
down. The takeover is then performed, and these
steps must be taken in reverse to return the
application to a usable state. Until the application
has completely started and is accepting connections,
the end user is without service. Start-to-finish
takeover times vary widely based on application
stop and restart characteristics and are typically
measured in minutes.
Both LPM and LAM have a much smaller impact on
the end user; depending on the workload, the
interruption may be imperceptible. In the LPM bar,
the only time the application is not running is
between the stop and start processing states when
the partition is being moved to the target managed
system. Similarly, the pause in runtime shown in the
LAM bar is only during the state migration of the
WPAR. In contrast to HA cluster takeover times,
LPM or LAM pauses typically take only a few
seconds.
For LPM, the stop and start processing time is very
short as the partition is migrated between managed
systems, but the complete time to migrate the
partition will depend on the workload and how long
it takes to copy the memory to the target system.
System and network capacity will affect these times.
Unlike traditional HA clustering, it is not necessary
to keep cluster nodes synchronized for planned
maintenance tasks when LPM or LAM is used, but
the consideration applies for unplanned outages. If
the VIO server on the source is modified, similar
modifications must be made on the target server.
LPM and LAM are highly beneficial for planned
maintenance and workload balancing, but they will
not protect against unplanned outages. Both mobil-
ity features require the source and destination
systems to be operational. A situation in which the
source fails unexpectedly (e.g., an application
termination or an LPAR or system crash) requires
the quick failure detection and automated resource
relocation provided by HA clustering software, such
as HACMP.
HACMP 5.4 leverages WPAR features in order to
realize the benefits that WPARs provide for appli-
cation isolation. HACMP runs in the global envi-
ronment and manages the application execution in a
defined and active WPAR, but it does not manage
the WPAR itself. HACMP has created a new resource
group attribute, a WPAR-enabled resource group
that is configured to run in a specific WPAR, one
that supports only the IP service address, file system, and application-type resources.
[Figure 4. Comparison of HA clustering, LPM, and LAM (no time reference is implied, and time lines are not to scale): the HA clustering time line runs from application stop, unmount/vary off, and IP takedown through HACMP takeover to IP re-enable, mount/vary on, and application start; the LPM time line runs from HMC verification and partition creation through memory-state copy over Ethernet and a brief stop/start of processing to migration complete; the LAM time line runs from verify and checkpoint through restart on the target and deletion of the source WPAR to relocation complete]
The key value of LPM and LAM is to increase
uptime. The capability to move workloads dynam-
ically around the data center makes it possible to
perform firmware maintenance, environmental
maintenance (power or cooling), and other tasks
nondisruptively. Many clients have processes and
plans in place to deal with component failures and
DR, but are still using planned outages to perform
tasks that can now be handled dynamically.
Deciding between LPM and LAM
LPM and LAM are similar because both can move
workloads from one managed system to another
quickly enough that the movement is transparent to
the user. However, they have different require-
ments, and it is important to understand which
technology will best meet business needs. As with
many virtualization technologies, performance im-
plications should be considered.
File system performance is one such consideration.
LPM-enabled partitions must use virtualized storage
on a VIO server which, depending on the VIO server
configuration, could reduce system performance for
I/O-intensive applications. WPARs must use Net-
work File System (NFS) to access application data
and perform checkpoints, which will be limited to
the performance of the NFS server and TCP/IP
network.
While WPARs may offer better CPU and memory
utilization (because they allow management of
multiple workloads to meet the capacity of the
LPAR), this also means a possibility of resource
contention. Mobile LPARs are normal partitions;
they have dedicated resources and are not suscep-
tible to resource contention. By using dynamic
LPARs to adjust CPU and memory capacity to match
the workload, high utilization can also be achieved.
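As an illustration of such dynamic LPAR adjustments (the managed-system and partition names and the quantities are illustrative, and the exact options depend on the HMC level and on whether the partition uses dedicated or shared processors), capacity can be added to a running partition from the HMC command line:

   # Add 4 GB of memory to a running partition
   chhwres -r mem -m prod-570 -o a -p prod_lpar01 -q 4096
   # Add half a processing unit to a shared-processor partition
   chhwres -r proc -m prod-570 -o a -p prod_lpar01 --procunits 0.5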
Because all WPARs share the global environment,
they must all share the same fixes, and an OS fault
would cause all WPARs to fail. Because of the
limited isolation of a WPAR, it is possible that one
WPAR may impact another. LPARs offer greater
fault isolation than WPARs.
WPARs would be suitable for tier 1 (Web server) of
a multitier architecture. Where data may be ac-
cessed in a read-only fashion and application
scalability is an important factor in responding to
dynamic workloads, WPARs would also be a good
fit. Mobile LPARs may be the better choice when
dedicated capacity is required for an application,
such as a database.
LPM is a POWER6 processor feature, while LAM is
an AIX 6 feature. If the application to be moved runs
on Linux but not AIX, then LPM would be used.
Additional enhanced availability features
While this paper focuses on new workload reloca-
tion capabilities, the AIX 6 OS and POWER6
processor-based servers include numerous addi-
tional features and enhancements that can improve
IT service availability.
Concurrent firmware maintenance
Concurrent firmware maintenance (CFM) was in-
troduced in firmware level 2.3.0 for the System p5
platform and is also available for the new POWER6 processor-based Power Systems
managed by an HMC. CFM generally eliminates the
need for a system reboot when applying an update
within a release level. In many cases, all of the fixes
provided in a service pack can be activated
concurrently, allowing the client to apply the fixes
without disrupting the managed system. There are a
small number of fixes that cannot be activated
concurrently. The client usually still has the option
to apply the concurrent portions of these fix packs
and delay the remainder until a system reboot can
be scheduled.
When upgrading the firmware on the managed
system to a new release level, a system reboot is still
required. Because of this, some clients choose to
remain on a release of firmware past the end-of-
support date to avoid a reboot. Beginning with
POWER6 processor-based systems, firmware re-
leases will be supported for two years to help ensure
that most clients are able to run for at least one year
on a supported firmware release. It is strongly
recommended to maintain currency and upgrade
firmware levels once a year as available.
Service processor takeover
Service processor takeover helps reduce outages due
to a failure of the service processor hardware or
other hardware failures that could cause a system
outage. A common misconception is that the
redundant service processor provides the mecha-
nism to eliminate outages due to firmware mainte-
nance. In fact, the redundant service processor
provides the ability to handle certain error condi-
tions and keep them from impacting the managed
system by failing over to the other service processor.
On the Power Systems platform, the dual service
processor capability is being installed by default on
all Model 570 8-, 12-, and 16-way systems. This
feature was previously available only on Model 590/
595 and select 570 models.
AIX concurrent kernel updates
AIX 6 includes the capability to apply some kernel
updates without requiring a reboot.23
This helps
reduce the number of planned outages required to
maintain currency of the AIX system. It also helps
reduce the number of unplanned outages by
allowing the more efficient application of available
fixes.
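Concurrent updates of this kind are typically packaged as interim fixes; a minimal sketch of applying one with the AIX interim fix manager (the fix package name is illustrative, and whether a given fix can be applied concurrently depends on the fix itself) is:

   emgr -p -e IZ12345.080901.epkg.Z    # preview the interim fix installation
   emgr -e IZ12345.080901.epkg.Z       # apply the interim fix
   emgr -l                             # list interim fixes currently installed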
POWER6 processor instruction retry
This feature can shield a running application from a
transient processor fault by retrying the instruction
on the same processor. A solid processor fault
causes the alternate processor recovery capability to
be invoked in order to deallocate the failing processor and relocate the instruction stream to a new processor core. The new processor can be acquired
through CoD or, if available, an unused spare
processor. If neither of these options is available,
then the new partition availability priority capability
will be used to attempt to terminate lower-priority
partitions in order to acquire a spare processor. The
priority is set by the system administrator so that
lower-tier applications or development workloads
can be impacted in order to protect higher-priority
applications.24
RAS features
The reliability, availability, and serviceability (RAS)
capabilities of the POWER platform are extensive
and have been greatly improved for the POWER6
processor release. These capabilities have been
thoroughly documented elsewhere; the following
list provides a few highlights.
• The HMC version 7 offers the capability of remote management through a Web browser, enhancing the serviceability and manageability.25
• Additional enhancements to the HMC and service processor will enable greater reliability and serviceability.26
• Concurrent maintenance has been further extended to include additional devices such as GX adapters.26
CONCLUSION
Today’s System p5 servers can deliver very high IT
availability when properly configured, deployed,
and maintained. However, client IT organizations
increasingly struggle to meet their users’ growing
availability requirements while dealing with IT
budget constraints, constantly evolving infrastruc-
tures, and decreasing time available for critical
maintenance activities.
The new Power Systems platform addresses this
situation because it provides the following:
• New workload mobility features (LPM and WPAR-based LAM) that enable administrators to nondisruptively move workloads between managed systems, creating new opportunities to maximize IT availability while facilitating the dynamic management of maintenance and other changes
• Significant enhancements in hardware and software RAS
• Greater performance with equivalent energy consumption compared with the previous-generation System p platform
As with any new or improved technology, clients
must analyze the benefits, understand the costs, and
learn how to use the new capabilities effectively.
Once this is done, clients should realize increased
availability and greater flexibility to perform main-
tenance tasks whenever necessary.
When POWER6 and AIX 6 workload mobility
features and RAS enhancements are combined with
existing availability technologies—including HA
clustering software, middleware availability capa-
bilities, and host- and storage-based data mirroring
and replication—extremely high levels of IT service
availability can be efficiently achieved.
ACKNOWLEDGMENTS
The authors thank the following people for their
contributions to this paper: Randall Wilson, Alfredo
Fernandez, Creighton M. Hicks, Cam-Thuy T. Do,
James Wang, Stephan D. Linam, Brent W. Jacobs,
Michael L. Trantow, Terrence T. Nixa, Brian O’Leary,
and Christine O’Sullivan.
*Trademark, service mark, or registered trademark of International Business Machines Corporation in the United States, other countries, or both.
**Trademark, service mark, or registered trademark of Symantec Corporation, Microsoft Corporation, Linus Torvalds, Sun Microsystems, Inc., or Hewlett-Packard Development Company, L.P., in the United States, other countries, or both.
***A registered trademark of the Central Computer and Telecommunications Agency, which is now part of the Office of Government Commerce.
****A registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office.
CITED REFERENCES AND NOTES
1. C. Hales, C. Milsted, O. Stadler, and M. Vagmo, PowerVM Virtualization on IBM System p: Introduction and Configuration, Fourth Edition, IBM Redbook SG24-7940-03, IBM Corporation (May 2008), http://www.redbooks.ibm.com/redbooks/pdfs/sg247940.pdf.
2. D. Quintero, L. Rosca, M. Vanous, R. Wale, J. Yoshino, and O. Lascu, Virtualization and Clustering Best Practices Using IBM System p Servers, IBM Redbook SG24-7349-00, IBM Corporation (May 2007), http://www.redbooks.ibm.com/redbooks/pdfs/sg247349.pdf.
3. Capacity on Demand, IBM Corporation, http://www-03.ibm.com/systems/p/advantages/cod/index.html.
4. IBM PowerHA Cluster Manager (HACMP), IBM Corporation, http://www-03.ibm.com/systems/p/advantages/ha/index.html.
5. Tivoli System Automation for Multiplatforms, IBM Corporation, http://www-306.ibm.com/software/tivoli/products/sys-auto-multi/.
6. A. Abderrazag, High Availability Cluster Multiprocessing (HACMP) Best Practices, White Paper, IBM Corporation (January 2008), http://www-03.ibm.com/systems/p/software/whitepapers/hacmp_bestpractices.html.
7. G. McLaughlin, Considerations and Sample Architectures for High Availability on IBM eServer pSeries and System p5 Servers, White Paper, IBM Corporation (March 2006), http://www14.software.ibm.com/webapp/set2/sas/f/best/arch_ha_Systemp5.pdf.
8. Tivoli System Automation for Multiplatforms, IBM Corporation, http://www-306.ibm.com/software/tivoli/products/sys-auto-multi/.
9. A. Govindjee, F. Sherman, J. Littin, S. Robertson, and K. Jeong, Introducing VERITAS Foundation Suite for AIX, IBM Redbook SG24-6619-00, IBM Corporation (December 2006), http://www.redbooks.ibm.com/redbooks/pdfs/sg246619.pdf.
10. S. Tovcimak, Using the Geographic LVM in AIX 5L, White Paper, IBM Corporation (September 2005), http://www-03.ibm.com/systems/p/os/aix/whitepapers/pdf/aix_glvm.pdf.
11. C. Brooks, C. Leung, A. Mirza, C. Neal, Y. L. Qiu, J. Sing, F. T. H. Wong, and I. R. Wright, IBM System Storage Business Continuity Solutions Overview, IBM Redbook SG24-6684-01, IBM Corporation (February 2007), http://www.redbooks.ibm.com/redbooks/pdfs/sg246684.pdf.
12. D. Clitherow, M. Brookbanks, N. Clayton, and G. Spear, "Combining High Availability and Disaster Recovery Solutions for Critical IT Environments," IBM Systems Journal 47, No. 4, 563-575 (2008, this issue).
13. C. Brooks, F. Byrne, L. Higuera, C. Krax, and J. Kuo, IBM System Storage Solutions Handbook, IBM Redbook SG24-5250-06, IBM Corporation (October 2006), http://www.redbooks.ibm.com/redbooks/pdfs/sg245250.pdf.
14. W.-J. Chen, C. Chandrasekaran, D. Gneiting, G. Castro, P. Descovich, and T. Iwahashi, High Availability and Scalability Guide for DB2 on Linux, UNIX, and Windows, IBM Redbook SG24-7363-00, IBM Corporation (September 2007), http://www.redbooks.ibm.com/redbooks/pdfs/sg247363.pdf.
15. O. Lascu, V. Carastanef, L. Li, M. Passet, N. Pistoor, and J. Wang, Deploying Oracle 9i RAC on IBM eServer Cluster 1600 with GPFS, IBM Redbook SG24-6954-00, IBM Corporation (October 2003), http://www.redbooks.ibm.com/redbooks/pdfs/sg246954.pdf.
16. S. Djordjević, Oracle's Maximum Availability Architecture, Oracle Corporation, http://www.oracle.com/global/yu/dogadjaji/db11gl_bg/max_avail.pdf.
17. W. J. Armstrong, R. L. Arndt, D. C. Boutcher, R. G. Kovacs, D. Larson, K. A. Lucke, N. Nayar, and R. W. Swanberg, "Advanced Virtualization Capabilities of POWER5 Systems," IBM Journal of Research & Development 49, No. 4/5, 523-532 (2005).
18. W. J. Armstrong, R. L. Arndt, T. R. Marchini, N. Nayar, and W. M. Sauer, "IBM POWER6 Partition Mobility: Moving Virtual Servers Seamlessly Between Physical Systems," IBM Journal of Research & Development 51, No. 6, 757-762 (2007), http://researchweb.watson.ibm.com/journal/rd/516/armstrong.html.
19. R. L. Arndt, "Power6 for Partition Mobility," Proceedings of the Power Architecture Developer Conference, Austin, TX (2007), http://www.power.org/devcon/07/Session_Downloads/PADC07_Arndt_Partition_Mobility.pdf.
20. S. Vetter, M. Harding, N. Itoh, P. Nutt, G. Somers, F. Vagnini, and J. Wain, PowerVM Live Partition Mobility on IBM System p, IBM Redbook SG24-7460-00, IBM Corporation (December 2007), http://www.redbooks.ibm.com/abstracts/sg247460.html.
21. S. Vetter, J. Abbott, R. Bassemir, C. Hales, O. Plachy, and M. Yeom, Advanced POWER Virtualization on IBM System p Virtual I/O Server Deployment Examples, IBM Redpaper REDP-4224-00, IBM Corporation (February 2007), http://www.redbooks.ibm.com/abstracts/redp4224.html.
22. C. Almond, B. Blanchard, P. Coelho, M. Hazuka, J. Petru, and T. Thitayanun, Introduction to Workload Partition Management in IBM AIX Version 6.1, IBM Redbook SG24-7431-00, IBM Corporation (November 2007), http://www.redbooks.ibm.com/abstracts/sg247431.html.
23. S. Vetter, R. Aleksic, I. N. Castillo, R. Fernandez, A. Roll, and N. Watanabe, IBM AIX Version 6.1 Differences Guide, IBM Redbook SG24-7559-00, IBM Corporation (December 2007), http://www.redbooks.ibm.com/abstracts/sg247559.html.
24. D. Henderson, B. Warner, and J. Mitchell, IBM POWER6 Processor-based Systems: Designed for Availability, White Paper, IBM Corporation (June 2007), http://www-03.ibm.com/systems/p/hardware/whitepapers/power6_availability.html.
25. S. Hochstetler, J. Min, M. Robbins, N. Milliner, N. Chand, and S. Hidayat, Hardware Management Console V7 Handbook, IBM Redbook SG24-7491-00, IBM Corporation (October 2007), http://www.redbooks.ibm.com/abstracts/sg247491.html.
26. J. Mitchell, G. Ahrens, J. Villarreal, and B. Warner, IBM POWER6 Processor-based Systems: Designing and Implementing Serviceability, White Paper, IBM Corporation (June 2007), http://www-03.ibm.com/systems/p/hardware/whitepapers/power6_serviceability.html.
Accepted for publication May 30, 2008.
Grant T. McLaughlin, IBM Systems and Technology Group, 2455 South Road, Poughkeepsie, NY 12601 ([email protected]). Mr. McLaughlin recently became the Business Resilience Offerings manager for Enterprise Systems. He was previously the client engagement program manager and a Power Systems specialist at the IBM HACoC. He received a B.S. degree in computer science from Syracuse University. He is also Project Management Institute and ITIL certified. His responsibilities have spanned mainframe software technical support, multiplatform UNIX software development and support, AIX software development, and software development project and program management.
Leo Y. Liu, IBM Systems and Technology Group, 113 Buckden Place, Cary, NC 27518 ([email protected]). Dr. Liu is a certified engagement leader and System p platform specialist at the IBM HACoC. He received a B.S. degree in electronic engineering and an M.S. degree in management science, both from the National Chiao-Tung University, Taiwan, and a Ph.D. degree in computer science from Pennsylvania State University. He is PMI and ITIL certified.
Daniel J. DeGroff, IBM Systems and Technology Group, 3605 Highway 52 N, Rochester, MN 55901 ([email protected]). Mr. DeGroff is a member of the IBM PowerHA for i development team and has worked as a technical specialist for the IBM HACoC. He received a B.S. degree in electrical engineering and technology from South Dakota State University. He also received the ITIL foundation certification.
Kenneth W. Fleck, IBM Systems and Technology Group, 2455 South Road, Poughkeepsie, NY 12601 ([email protected]). Mr. Fleck is an availability specialist and subject-matter expert with the IBM HACoC and has worked on availability assessments around the world, where he specializes in availability on the IBM System x and System p platforms and the implementation of HA and infrastructure resiliency best practices.
Published online October 27, 2008.