
Virtualizing Business-Critical Applications: Foundational Components

Cognizant 20-20 Insights | August 2013

Executive Summary

It should come as no surprise that the journey to the software-defined data center (SDDC) requires fundamental shifts in how applications are deployed and managed. To fully realize the vision of the SDDC, organizations must first embrace the fact that the journey includes not only moving 100% of their servers into the virtual world, but also 100% of the storage and network components that support them.

As a practical matter, this becomes a journey that is far from easy. Getting all applications migrated into a virtual infrastructure platform alone requires new skills and ways of managing capacity. In addition, licensing issues require special attention as vendors also stay current with the idea that compute workloads will no longer be directly tied to physical hardware components.

But most important to this journey is understanding and successfully migrating the most business-critical applications onto virtual infrastructure such that they not only function well, but thrive.

Separating “run-the-business” from other business applications, and then identifying the IT infrastructure necessary to ensure their high availability, scalability and performance, is a must for organizations that seek to reap the greatest operational benefits from emerging virtual computing architectures.

Put another way, failing to get the most complex and compute-intensive workloads to thrive in virtual infrastructure, such that they are as easily deployed as any other application, is one of the greatest barriers to achieving the goal of the SDDC.

One Size Does Not Fit All

When most organizations first deploy virtual infrastructure environments, they do so with the goal of reducing their data center footprint by consolidating server workloads onto fewer hardware components. This results in immediate and tangible savings. Then, over time, they begin to realize that the average virtual infrastructure environment, when properly tuned and managed, will provide notably higher levels of availability for the applications running on it. When combined with the initial cost savings achieved, organizations are often drawn to virtualize as much as they can.

… And then they hit the wall.


The first time a business-critical application requires higher levels of availability — or far greater compute resources — than are traditionally made available on basic virtual infrastructure, problems quickly arise. At first, the business-critical application runs slowly and can become much more unstable. It is then moved back to its original physical infrastructure at least as quickly as it was moved onto the virtual infrastructure environment in the first place. Then the virtual environment is blamed.

To be fair, the virtual infrastructure environment actually is to blame when this happens. But that’s usually due to a combination of the way the virtual infrastructure environment was configured and how the business-critical application was then deployed on top of it. Generally speaking, it’s not because virtual infrastructure platforms are ill equipped to handle these applications.

What Makes an Application Business-Critical?

Of note, the qualities that make an application business-critical often have little to do with the technology or platform said application uses. In the end, business-criticality is best determined by answering a simple question: Can I run my business without this application? From there, a corollary question emerges: How long can I run my business without this application?

If the answers to these questions are “no,” or “not for very long,” then that application is critical to the business.

Nevertheless, most business-critical applications share key technological characteristics. They include:

• High compute loads – either with heavy threading or heavy math processing.

• High RAM utilization.

• High and specialized I/O – particularly storage.

• High availability configurations – often requiring OS or application clustering.

• Complex networking configurations – public and private networks, often to support clustering.

Applications with any of these qualities will need extra care and attention to configuration and resource management in order to virtualize them successfully. Moreover, the majority of applications that do fall into the business-critical category have more than one of these qualities in play.

Because every application has something unique about the way it runs in any given environment, it’s easy to quickly reach a conclusion that every application will then have its own set of best practices that need to be explicitly defined to make that application thrive in a virtual infrastructure environment. In reality, this is not actually the case. The fact is that virtualization and virtual infrastructure environments do add a layer of abstraction of resources, and this abstraction layer changes the way in which applications can be run. But the way in which virtual infrastructure environments create this abstraction layer is exactly the same regardless of the applications running in that environment. Thus, there exists a set of common practices that must be accounted for that will enable every business-critical application to run successfully on virtual infrastructure. What’s actually different is the way in which these common elements are expressed. This expression is indeed as unique as any application.

Virtualization software vendor VMware identifies the following six key applications that are considered business-critical:

• Oracle – and Oracle RAC.

• Microsoft SQL Server.

• Microsoft Exchange.

• Microsoft SharePoint.

• SAP.

• Custom Java on Linux.

Most organizations run at least one of these six applications; all exhibit at least some of the characteristics listed above. Again, while they are not the only business-critical applications in use at most organizations, the independent research commissioned by VMware shows they are the most common ones. In addition, a second and less often found set of applications exists that businesses will often identify as business-critical. Again, these applications also share qualities that can make virtualization more difficult.


These “honorable mention” business-critical apps include:

• DB2.

• WebSphere.

• WebLogic.

• Hadoop/HBase.

• Cassandra.

• Tomcat.

• Message queue systems such as TIBCO, RabbitMQ, MQSeries, etc.

• Custom, in-house built and maintained “home-grown” applications.

Again, each of these applications will have specific, individual ways in which they should be tuned to thrive on a virtual infrastructure platform. This is no different than how they are optimized when running on bare metal hardware. But compute resources themselves are very consistent. Therefore, if an organization properly accounts for how an application will make use of its compute resources, common themes begin to emerge.

The Four Food Groups of Computing

When planning a virtual infrastructure environment, architects are taught to consider the following four types of compute resources, which are sometimes referred to as the “four food groups” of computing:

• CPU.

• RAM.

• Disk – including both disk space and disk I/O.

• Network – including number of connections and bandwidth.

All applications (not just business-critical ones) consume different quantities of these compute resources at any given point in time depending on the tasks at hand. The difference is that most business-critical applications will consume disproportionate amounts of one or more of these resources compared with other applications. They also will have requirements for higher levels of redundancy, availability and recoverability compared with other applications. Remember, we answered “no” and “not for very long” to the questions about if, and for how long, we could run the business without these applications.
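Before turning to guidelines, it helps to make “disproportionate” measurable. The sketch below is illustrative only: a plain Python profile of a workload’s demand across the four food groups, with placeholder names, numbers and thresholds that are assumptions rather than any vendor’s tooling.

```python
from dataclasses import dataclass

@dataclass
class ComputeProfile:
    """One workload's demand across the four food groups of computing."""
    vcpus: float          # CPU: cores' worth of actual work, not cores assigned
    ram_gb: float         # RAM: working set actually used
    disk_gb: float        # Disk: capacity consumed
    disk_iops: float      # Disk: sustained I/O demand
    network_gbps: float   # Network: bandwidth across all connections

def disproportionate(app: ComputeProfile, typical: ComputeProfile,
                     factor: float = 3.0) -> list[str]:
    """Name the resources this app consumes at >= `factor` times a typical
    consolidated workload; any hit suggests business-critical handling."""
    ratios = {
        "CPU": app.vcpus / typical.vcpus,
        "RAM": app.ram_gb / typical.ram_gb,
        "Disk I/O": app.disk_iops / typical.disk_iops,
        "Network": app.network_gbps / typical.network_gbps,
    }
    return [name for name, ratio in ratios.items() if ratio >= factor]

# Placeholder numbers: an OLTP database VM vs. a typical consolidated server.
oltp = ComputeProfile(vcpus=8, ram_gb=96, disk_gb=2000, disk_iops=15000, network_gbps=2.0)
typical = ComputeProfile(vcpus=2, ram_gb=8, disk_gb=200, disk_iops=500, network_gbps=0.5)
print(disproportionate(oltp, typical))   # ['CPU', 'RAM', 'Disk I/O', 'Network']
```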

The following set of general guidelines will help organizations deploy applications that thrive on virtual infrastructure:

• Always follow the KISS principle: As the Star Trek character Mr. Scott – or “Scotty” – once put it, “The more complicated the plumbing, the easier it is to stop up the drain.” There is elegance in simplicity of design. But more than that, simple designs are generally more stable, more scalable and easier to maintain. Business-critical applications are already inherently more complex, so adding complexity when virtualizing them only makes things worse. Examples of mistakes in this area include:

» Needlessly adding disks and spreading them across multiple data stores. Just because your physical server splits out a separate drive letter for each class of data, logs, etc., isn’t necessarily a reason to do the same in a virtual world. More than one disk — and even more than one data store — is often necessary, but take an eyes-open approach that stresses less rather than more.

» Splitting out base files that are part of a virtual machine’s (VM’s) core components, including vswap and others, is not an effective way of increasing efficiency, performance or storage management. Sadly, it is a good way of introducing complexity, loss of function and loss of portability into your environment.

» Duplicating features for high availability or redundancy through external or home-grown tools that are already present in the base systems or architecture. This often leads to managing or implementing abstracted features that don’t actually do what they are intended to. They also make troubleshooting more difficult.

• Architect hardware from a “total performance” perspective: Your virtual environment should always be optimized from bottom to top – not top to bottom or from the middle out. High school and college students seem to be the most willing to put $6,000 stereos into $3,000 cars. This doesn’t work nearly as well with high-compute, business-critical applications running on general class hardware with virtual infrastructure on top of it. Even though vSphere will support the so-called “monster VM” with 64 vCPUs, 1TB of RAM and a million IOPS, no VM can truly be bigger or faster than the host hardware on which it runs. Make sure all hardware components that are part of the virtual infrastructure environment are appropriately sized to handle the anticipated workloads placed on top of them.

Be sure to also optimize resources across all four of the computing food groups. It’s easy today to become distracted by CPU cores and GHz speeds of the newest generation processors, and then forget about RAM — the compute resource that is almost always exhausted first in a virtual infrastructure environment. From a storage perspective, make sure to spread I/O appropriately across your storage area network (SAN). Take appropriate advantage of solid state drive (SSD) and cache capabilities to boost performance, and do so in a way that is easy to replicate. For IP SAN technologies – iSCSI and NFS – jumbo frames should be enabled as the norm (a configuration sketch follows this guideline).

From a network perspective, Gig-E connections are no longer enough. With today’s price/performance advantages, 10GbE should be the minimum standard for all network connectivity in virtual infrastructure environments. Reserve Gig-E connectivity for out-of-band hardware management only. As standards evolve and prices recede, plan your network investments wisely to be ready to take advantage of 40GbE and 100GbE. These standards will likely creep into your data center faster than anyone expects.
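Picking up the jumbo frames point above, here is a minimal pyVmomi sketch of enabling an MTU of 9000 on a standard vSwitch. The vCenter host, credentials and vSwitch name are placeholder assumptions, and the same MTU must also be set on the physical switches and vmkernel ports for jumbo frames to work end to end.

```python
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Illustrative only: raise the MTU on the IP-SAN vSwitch to 9000.
# Host, vSwitch name and credentials are placeholder assumptions.
si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret")
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)

for host in view.view:                         # every ESXi host in scope
    net_sys = host.configManager.networkSystem
    for vswitch in net_sys.networkInfo.vswitch:
        if vswitch.name == "vSwitch1":         # the vSwitch carrying iSCSI/NFS
            spec = vswitch.spec                # start from the current spec
            spec.mtu = 9000                    # enable jumbo frames
            net_sys.UpdateVirtualSwitch(vswitchName=vswitch.name, spec=spec)

Disconnect(si)
```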

• Understand specific compute needs: Remember, each application will use resources uniquely, but also predictably. The key is to translate how any application would use resources when running on native hardware to the way these would be used when abstracted into the virtual world.

For CPU utilization, assigning more CPU cores is not necessarily better. In fact, assigning too many vCPUs will slow performance. If an application has eight vCPUs but only four vCPUs’ worth of work to do, it will force the hypervisor to find a way to schedule four cores on the processor that are servicing those vCPUs to do nothing. Heavily threaded applications tend to use more cores, while those that crunch numbers use fewer cores and more cycles.

Figure 1: Business-Critical Application Optimization Methodology. Optimize bottom to top. Virtual-infrastructure-oriented optimization: physical hardware (server, storage, network); hypervisor (resource pools, HA, DRS, data stores, parameter tuning); operating system (paravirtual drivers, kernel parameter tuning on Linux); virtual machine hardware (RAM, vCPU, storage, resource limits and reservations). Application-oriented optimization: Java virtual machine (heap size, threads); Java application (resource allocation, app tunables); application (cache, SGA, RAM commitment, app-specific tunables).
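Correcting an oversized VM is then a one-line reconfiguration. A minimal pyVmomi sketch, assuming `vm` is a vim.VirtualMachine already retrieved from the inventory and that the guest is powered off (reducing vCPU count generally requires it):

```python
from pyVmomi import vim
from pyVim.task import WaitForTask

# Right-size an over-provisioned VM from eight vCPUs down to the four it
# actually uses, so the hypervisor stops co-scheduling idle cores.
# Assumes `vm` (a vim.VirtualMachine) was retrieved elsewhere.
WaitForTask(vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(numCPUs=4)))
```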

When it comes to RAM, allocate based on what the application will actually use. Also be sure to set memory reservations for the RAM that will be needed. For example, an Oracle database server should have a memory reservation that is equal to the size of the OS plus the SGA. For Java applications, an appropriate memory reservation would include the OS plus the Java heap size as well as a couple of other smaller items. As a good practice, it’s preferable to assign ever so slightly more RAM for these, as opposed to slightly less. However, it’s also good practice to keep memory reservations as small as practical. Making them too large will interfere with the ability to vMotion a VM from one host to another (by extension impeding the workload balancing capabilities of VMware Distributed Resource Scheduler), complicate HA admission control in the event of a host failure (interfering with HA recovery) or even prevent the VM from being able to start at all.
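As a worked example of the Oracle case, a hedged pyVmomi sketch; the sizes are placeholder assumptions, not recommendations, and `vm` is assumed to be retrieved elsewhere:

```python
from pyVmomi import vim
from pyVim.task import WaitForTask

# Reserve roughly OS footprint + SGA for an Oracle database VM, and give
# the VM only slightly more RAM than the reservation.
os_mb = 4 * 1024            # OS working set (placeholder)
sga_mb = 24 * 1024          # Oracle SGA (placeholder)
reservation_mb = os_mb + sga_mb

spec = vim.vm.ConfigSpec()
spec.memoryMB = reservation_mb + 2 * 1024     # slightly more, never less
spec.memoryAllocation = vim.ResourceAllocationInfo(reservation=reservation_mb)
WaitForTask(vm.ReconfigVM_Task(spec=spec))    # `vm` retrieved elsewhere
```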

Storage is arguably the most complex of all of the resources to manage because it is the component in virtual infrastructure that itself is almost always abstracted in multiple layers and in widely varying ways depending on the make and model of storage system used. As a result, it is also the area where application performance problems tend to arise first and most frequently. As a general rule, storage capabilities should be pushed as low in the hardware stack as practical. That stated, if a given storage system doesn’t have a feature needed or desired, implement and integrate these features at other layers while taking care to not add undue complexity. Make sure that individual components are not easily overwhelmed, just as you would when architecting shared storage for high-capacity I/O systems and applications. Align these capabilities so they are easily identified and presented in standard data stores so your applications using them remain just as logically configured.

Finally, use raw device mappings (RDMs) as a last resort only. With today’s virtual infrastructure systems, there is no performance advantage to using an RDM over a virtual disk located in a properly configured data store. Further, RDMs add complexity to your virtual infrastructure environment from both a configuration and system management perspective. Where feasible, use OS-level storage systems – such as ASM on Oracle – as recommended by the respective application vendors, but layered on top of the optimized storage environment that is created.

Networks should be kept as simple as possible. In almost every conceivable situation, there is no need for things like vNIC teaming and bonding inside a VM; this is already handled by the hypervisor. Instead, use one virtual network interface controller (NIC) for each distinct network to which you need to connect. For example, a typical Oracle RAC node will need two vNICs: one for the public network and one for the private network. The SCAN and associated virtual IPs do not need a vNIC of their own.
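For illustration, a hedged pyVmomi sketch that adds the private-interconnect vNIC to a RAC node; the port group name is a placeholder, and `vm` and `net` (the vim.Network for that port group) are assumed to be retrieved elsewhere:

```python
from pyVmomi import vim
from pyVim.task import WaitForTask

# Add one vmxnet3 vNIC for the RAC private interconnect. The port group
# name is a placeholder; `vm` and `net` are assumed retrieved elsewhere.
nic = vim.vm.device.VirtualVmxnet3()
nic.backing = vim.vm.device.VirtualEthernetCard.NetworkBackingInfo(
    network=net, deviceName="RAC-Private")
nic.connectable = vim.vm.device.VirtualDevice.ConnectInfo(
    startConnected=True, allowGuestControl=False)

nic_change = vim.vm.device.VirtualDeviceSpec(
    operation=vim.vm.device.VirtualDeviceSpec.Operation.add, device=nic)
WaitForTask(vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(deviceChange=[nic_change])))
```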

• Build VMs to be transparent and simple: When building virtual machines, less is definitely more. If you know you will never need a specific feature, you’re probably better off not installing it. Just as is the norm with any OS build, turn off unnecessary services and follow the best practices for hardening the OS in question. The goal here is to have a “squeaky-clean” OS on the VM that feels the same to the application as it would on any other optimized environment.

• Storage should appear as simple, local disks, and networks should appear as simple connections – because all of the optimization of these components has already been accomplished within the virtual infrastructure environment itself.

Then Take Advantage

Only after the virtual environment is optimized should your organization be truly concerned about taking full advantage of its unique benefits and features. At this point, your organization should be able to do so easily. But for business-critical applications, there is still more work to do.

High Availability: When (Not) to Cluster

Business-critical applications naturally have requirements for very high availability and recoverability. In many cases, the enhanced availability provided by a well-engineered virtual infrastructure platform will meet this need. When it does, certain high-availability configurations — system clustering in particular — that are a must for physical infrastructure deployments can be eliminated. Understanding when and when not to cluster, as well as how to best accomplish it, can depend greatly on the capabilities of the application in question, but there are some common guidelines.

First, a properly engineered vSphere HA/DRS cluster can be expected to reliably achieve somewhere between three nines and four nines of availability for all systems running on it. By comparison, traditional database clustering techniques used by the likes of Oracle RAC and Microsoft SQL Cluster Services are intended to provide only three nines of availability at best in and of themselves. To achieve higher levels of availability requires work at the application layer.
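To put those figures in concrete terms, the downtime budget each level of availability allows per year works out as follows:

```python
# Annual downtime budget implied by each availability level.
HOURS_PER_YEAR = 24 * 365

for label, availability in [("three nines", 0.999), ("four nines", 0.9999)]:
    downtime_h = HOURS_PER_YEAR * (1 - availability)
    print(f"{label} ({availability:.2%}): "
          f"{downtime_h:.2f} hours/year ({downtime_h * 60:.0f} minutes)")

# three nines (99.90%): 8.76 hours/year (526 minutes)
# four nines (99.99%): 0.88 hours/year (53 minutes)
```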

What this means is that, unless something is explicitly done at the application layer to enhance availability (which is actually not all that common), a properly optimized vSphere HA/DRS cluster can provide equal or better levels of availability than clustering at the OS layer can. This is an excellent opportunity to consider simplifying some clustered systems.

… But before running off to destroy every cluster in the data center, consider that systems are often clustered for reasons beyond availability. It’s not unusual for clustered systems to be active-active, or to be clustered to minimize downtime during patches – also known as rolling upgrades. If these kinds of operations are part of your organization’s regular maintenance, clustering is still required. Thus, the key to knowing when to cluster systems on virtual infrastructure is to fully understand the specific application requirements, and then validate if the requirements hold up when migrating to a virtual infrastructure environment.

When clustering on top of virtual infrastructure, the high-availability features of each layer should be optimized to complement one another. At the same time, your organization should avoid clustering techniques that might interfere with infrastructure layers above and below. Operating system clusters on virtual infrastructure will generally require that shared disk is used between the individual nodes (voting and quorum drives) and usually involve one of four methods:

• Shared via RDM.

• Shared via iSCSI or NFS on SAN/NAS.

• Shared via multi-writer virtual disk.

• Shared via iSCSI or NFS target VM.

While all of these options can be made to work well, they have distinct advantages and disadvantages. Sharing via RDM is the oldest and most well known, but provides the fewest advantages and the greatest limitations. With this option, VMs in a cluster use an RDM to share data. While well known, this option also introduces a condition of SCSI bus sharing into the cluster between the nodes. Migrating VMs via vMotion is not supported in this configuration, so VMs are fixed to whichever host they are running on unless and until restarted on another node. Data on the shared disk is also kept in a different format, using the native file system of the OS on a LUN, as compared to data on a virtual disk in a data store. This can impact how data is protected. Of all of the options, share via RDM provides the least amount of flexibility and should be used only when other methods are not available.

Share via iSCSI or NFS on SAN/NAS resolves the issue of SCSI bus sharing, thus enabling support for vMotion on cluster nodes. However, this option is not available when using FC SAN storage systems, and organizations with an investment in FC SAN may not wish to change the storage infrastructure just to enable this method. Finally, this option has the same data protection differences that are present with share via RDM.

A very simple way to share a disk is to share via a multi-writer virtual disk. This option allows all data to remain in virtual disk files on a data store. Here, the shared virtual disk is located in a folder where all cluster nodes can access it. It is formatted Eager Zeroed Thick and the multi-writer flag is set, allowing all VMs to write to it at will. There are distinct advantages to this method: it is easy to set up, allows for vMotion and makes data protection consistent. Its primary drawbacks are that the shared virtual disk is associated with more than one virtual machine, so data protection systems must account for this, and that the host HA/DRS cluster where such a configuration is running can have no more than eight ESXi host systems.
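In vSphere releases of this era, the multi-writer flag is exposed as a per-disk advanced setting on each VM sharing the disk. A hedged pyVmomi sketch, with the SCSI address as a placeholder assumption:

```python
from pyVmomi import vim
from pyVim.task import WaitForTask

# Set the multi-writer flag on a shared disk via the VM's advanced
# settings, so several powered-on cluster nodes may write to the same
# eager-zeroed-thick VMDK. Apply the same setting on every node sharing
# the disk. The SCSI address (scsi1:0) is a placeholder assumption.
def enable_multi_writer(vm: vim.VirtualMachine, scsi_addr: str = "scsi1:0"):
    opt = vim.option.OptionValue(key=f"{scsi_addr}.sharing", value="multi-writer")
    WaitForTask(vm.ReconfigVM_Task(spec=vim.vm.ConfigSpec(extraConfig=[opt])))

# Usage: enable_multi_writer(node_vm) for each cluster node VM.
```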

Also, disks that have the multi-writer flag set on them can have support issues with certain vStorage API-based backup tools. While it is expected that future versions of vSphere should address this issue, be sure that your data protection systems take this into account.

The iSCSI/NFS gateway VM method is growing in popularity because it resolves almost all of the limitations of the others. Here, an additional VM is configured as an iSCSI or NFS target to re-share the SAN storage over a private virtual network. This VM can be a single vCPU, which means vSphere Fault Tolerance can be used to increase its availability. The nodes of the OS cluster then use the iSCSI or NFS share provided by the target VM for their shared storage (see Figure 2).

Figure 2: iSCSI Gateway VM Configuration (virtualization schematic). Database node VMs share an iSCSI disk presented by an iSCSI gateway VM, which virtualizes and re-shares VMDK storage on the SAN over the VM network (“virtual SAN on SAN”); the gateway is protected by VMware Fault Tolerance (FT clone). Highlights: all storage is VMDK on SAN; HA, DRS and FT work together; all systems can be vMotioned; portable to any vSphere architecture.


This configuration allows all nodes – and even the iSCSI gateway – to be vMotioned, works with every supported vSphere storage system and can be used on HA/DRS clusters with more than eight nodes. It also clearly associates the shared disk with a specific VM. The primary drawback of this configuration is that it is arguably the most complex to both set up and maintain. Also, when the iSCSI/NFS target is made fault tolerant, its disk is marked Eager Zeroed Thick and the multi-writer flag is set. If using a vStorage API-based tool, organizations may need to add a script to temporarily disable vSphere Fault Tolerance when backing up this VM.

Regardless of the clustering methodology used, anti-affinity rules between the various cluster nodes are a must. These ensure that no two nodes will run on the same physical host at the same time, which would defeat one of the high-availability purposes of clustering. This is true even for share via RDM configurations because, in the event of a host failure, VMware HA will follow DRS rules for placement when deciding where to restart the failed cluster node.
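A hedged pyVmomi sketch of creating such a rule; the cluster and node objects are assumed to be retrieved elsewhere, and the rule name is a placeholder:

```python
from pyVmomi import vim
from pyVim.task import WaitForTask

# Keep the two cluster nodes on different ESXi hosts. `cluster` is a
# vim.ClusterComputeResource and node_a/node_b are vim.VirtualMachine
# objects, all assumed to be retrieved elsewhere.
rule = vim.cluster.AntiAffinityRuleSpec(
    name="db-cluster-node-separation",   # placeholder rule name
    enabled=True,
    vm=[node_a, node_b])

spec = vim.cluster.ConfigSpecEx(
    rulesSpec=[vim.cluster.RuleSpec(operation="add", info=rule)])
WaitForTask(cluster.ReconfigureComputeResource_Task(spec=spec, modify=True))
```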

Looking Forward

Business-critical applications have special compute needs that go well beyond those of other systems usually found in virtual infrastructure. When not carefully attended to, this can cause these applications to perform poorly and deliver reduced functionality. Fortunately, while each application expresses how it consumes resources differently, the four food groups of computing are always involved. As a result, common methods and themes arise when abstracting infrastructure for these applications.

Properly configured, mission-critical applications can thrive on virtual infrastructure, gaining the same benefits of performance, consistency, availability and recoverability as all other systems. Understanding how each application uses available compute resources is the key to successfully virtualizing business-critical applications, and accelerating the journey to both cloud computing and the software-defined data center.

About the Author

Christopher (Chris) A. Williams is a Director of Cognizant Virtual Solutions, within CBC-ITIS Enterprise Computing’s Infrastructure Technology Management Services Practice. In this role, Chris is responsible for designing and developing innovative virtual infrastructure, private and hybrid cloud solutions, and for optimizing business-critical applications and database systems including Oracle RAC, DB2, SQL Server clusters, MySQL and Sybase. Chris has an M.B.A., information systems emphasis, from the University of Colorado, and a bachelor of science degree, with aerospace science and management emphasis, from Metropolitan State University of Denver.