
Technical white paper

HP ProLiant BL660c with VMware virtualization optimization performance report HP ProLiant, HP 3PAR StoreServ, VMware vSphere, Oracle Database and I/O-intensive applications

Table of contents

Executive summary
Disclaimers
Performance test environment
    Workload characteristics
    Hardware and software configuration
    Performance characterization methodology
Performance results
    VMware vSphere 5.1 performance relative to native
    Scalability of VMware vSphere 5.1 relative to native
Technical details on how to improve performance
    Configure BL660c in high performance mode
    Configure system and Oracle to use HugePages
    Configure the block devices to use the NOOP or deadline I/O scheduler
    Configure 8 Oracle DB writers
Summary
For more information


Executive summary

HP and Oracle have over 140,000 joint customers worldwide, with 35% of all Oracle E-Business Suite (EBS), 39% of PeopleSoft, 33% of Siebel, and 29% of JD Edwards installations deployed on HP servers. In addition, HP has a 36% database market share on x86 servers.1 Simply upgrading to the latest HP Converged Infrastructure can cut maintenance costs by up to 75%, achieve server consolidation ratios of up to 20:1, and reduce power and cooling expenses by up to 97% compared to traditional infrastructure.

In addition to the benefits reaped from simply upgrading to HP’s Converged Infrastructure, you will position your environment for further efficiencies by taking advantage of virtualization available through VMware. VMware vSphere 5.1 makes it easier than ever to virtualize demanding applications such as databases and I/O-intensive programs. Running virtual machines on larger systems allows more flexibility in dividing the physical server into virtual machines (VMs) to fit your immediate needs as they change.

Database workloads are widely acknowledged to be extremely resource-intensive. The large number of storage commands issued and the network activity to serve remote clients place significant challenges on the platform. The high consumption of CPU and memory resources leaves little room for inefficient virtualization software. The questions often asked are:

• How well will my heavy-duty database application perform in a virtual machine?

• How well will my virtualized environment handle high storage Input/Output Operations Per Second (IOPS) and high network packets/second?

To address concerns regarding virtualizing database solutions, HP conducted a set of tests using a converged infrastructure consisting of HP ProLiant BL660c Gen8 blades with HP 3PAR StoreServ attached storage. We leveraged an Online Transaction Processing (OLTP) workload (described in Workload Characteristics) with the goal of quantifying:

• Performance differential between VMware vSphere 5.1 virtual and native physical environments

• Performance gains and comparisons of a 4 socket versus a 2 socket server

Results from these tests show that even the most intense database applications can be deployed with excellent performance in a virtual environment. Below are some of the key findings:

• A virtual machine with twenty-four virtual CPUs (vCPUs) running on a VMware vSphere host with twenty-four physical cores (pCPUs) delivers 87-99% of native throughput on the same hardware platform. (For this test, one database was used on each VM and there was only one VM per physical system, for a precise comparison of virtual to native. It’s understood that many customers want multiple databases, each in its own VM, on a larger physical server, possibly with 4 processors or more.)

• When the number of processors was increased from 2 to 4, the virtual machines scaled as well as their native counterparts. In both cases the maximum performance gain was around 50%, with 25% lower CPU utilization for the same load. With very I/O-intensive applications, the system cannot exploit extra CPU as fully as it can with CPU-intensive applications.

• When using Oracle RAC, the 4 processor (4P) systems had 83% better throughput overall than the RAC solution using two 2 processor (2P) servers.

Comparing each virtual machine with its native counterpart, we confirmed that large database applications deployed in virtual machines deliver excellent performance.

By leveraging HP’s years of experience with virtualization and Oracle database technology, especially on ProLiant servers, customers can fully utilize their investments. In addition, HP has decades of experience developing solutions for VMware and Oracle environments. As HP is a known, trusted advisor for many companies, HP can help you get the most out of your application environment. HP is not only investing in acquiring knowledge of our own products, but in thoroughly understanding the applications of our partners to maximize the experience of our customers. HP has captured our extensive experience and knowledge in sizing tools, presentations and white papers detailing diverse approaches and methodologies. This will give you a head start in implementing and virtualizing applications to fit your specific business requirements. Please look at hp.com/go/vmware, hp.com/go/oracle, or hp.com/solutions/activeanswers for more details.

When deployed together, VMware vSphere and HP 3PAR StoreServ Storage deliver a compelling virtual data center solution that increases overall resource utilization, provisioning agility, application availability, administrative efficiency, and a reduction in both capital and operating expenses.

1 2012 Profit Market Share Study


Target audience: This document is intended to give technically trained customers, technical advisors, system integrators, presales consultants, and solution architects a head start in implementing a virtualized environment using VMware and an Oracle or I/O-intensive application. Basic knowledge of VMware and in-depth familiarity with Oracle database are required. Please see the For more information section at the end of this paper for links to additional information on these topic areas.

This white paper describes testing performed and completed in April 2013.

Disclaimers

All data is based on in-lab results with a developmental version of VMware vSphere. Prototype Intel® processors were used. The performance reported here is not a guarantee of the performance of generally available products.

Our throughput is not meant to indicate the absolute performance of Oracle, or to compare its performance to another DBMS. Oracle was simply used to place a DBMS workload on VMware vSphere and to observe and to optimize the performance of a VMware vSphere server.

Our goal was to show virtual performance overhead of VMware vSphere as compared to native or physical performance to understand virtualization’s ability to handle a heavy database workload. Our goal was not to measure the absolute performance of the hardware and software components used in this study.

Performance test environment

This section describes the workload, hardware and software configurations and the performance characterization methodology.

Workload characteristics

Our testing used an OLTP-type workload with a typical read/write ratio of 2:1. The performance characterization was run with a few hundred zero-think-time users connected directly to the database; no think time was used in order to drive CPU load as high as possible. Database size is an important parameter in this characterization: a very small database of less than 50 GB leads to heavy caching and a low I/O rate, producing misleading results. We therefore used a database size of 600 GB, which produced the right disk I/O rate for the desired level of performance.

The workload is an implementation of an online store with products that are purchased online; it simulates an online e-commerce site. The driver program simulates users logging in; browsing for products by title, actor, or category; adding selected products to their shopping cart; and then purchasing those products. We simulated user counts from 50 to 3000 and found that at higher counts, I/O contention on hot blocks caused system CPU and I/O wait to increase dramatically. Through trial and error we settled on an optimal user count of 1700 for our 2P and 4P tests. Additionally, one of the queries carried a NOCACHE hint that caused the same blocks to be re-read many times; removing the hint made the I/O drop dramatically. The driver program can be launched with a list of parameters and/or a configuration file. Some of the workload parameters that can be specified are the number of driver threads, ramp-up rate, think time, and average number of searches per order.


Hardware and software configuration

The test bed consisted of a physical or native server machine, a VMware vSphere virtual system, a client machine to drive the performance characterization, and backend storage. The storage had sufficient capacity such that disk latencies were at acceptable levels. Figure 1 shows the connectivity between the various components. The subsequent subsections provide details about the hardware and software configuration. Figure 2 shows the BladeSystem used for testing.

Figure 1. Performance characterization environment

[Diagram showing the client, the native and VMware vSphere (virtual) servers in the c7000 enclosure, a network switch, redundant FC switches, and the 3PAR StoreServ array.]

Figure 2. HP BladeSystem c7000 Enclosure with HP ProLiant BL660c Gen8 servers and Virtual Connect (VC) FlexFabric modules

Page 5: HP ProLiant BL660c with virtualization optimization ... · VMware virtualization optimization ... HP ProLiant, HP 3PAR StoreServ, VMware vSphere, Oracle Database and I ... Operating

Technical white paper | HP ProLiant BL660c with VMware virtualization optimization performance report

5

The firmware and software versions used in the test environment are listed in Table 1.

Table 1. Firmware and software levels

Component                          Version
HP Onboard Administrator           3.60
HP Virtual Connect Manager         3.70
Servers                            ProLiant BL660c Gen8; Intel® Xeon® CPU E5-4610 @ 2.40 GHz (6 cores); 64GB memory
Client                             ProLiant DL580 G7; Intel® Xeon® CPU X7560 @ 2.26 GHz (8 cores); 128GB memory
Storage                            3PAR StoreServ 10400; 192 x 15K RPM Fibre Channel disks; 32 x Solid State Disks (SSD)
Hypervisor                         VMware vSphere 5.1
Operating System                   Red Hat Enterprise Linux (RHEL) 6.3
Database                           Oracle 11gR2 (11.2.0.3) Single Instance and RAC
HP Integrated Lights-Out (iLO 4)   1.10
Network Adapter                    HP FlexFabric 10Gb 2-port 554FLB Adapter
VC module                          HP VC FlexFabric 10Gb/24-Port Module

The virtual machine and the native configurations were identical in the sense that the same operating system and DBMS software were used for both the VM and native tests. When 4 physical processors were utilized, a total of 24 cores were available on the ProLiant BL660c Gen8 server blades. The same configuration and scripts were used to set up and run the performance characterization. In both cases large memory pages were used to ensure optimum performance. Virtual machine and native tests were run against the same database.

An HP BladeSystem c7000 Enclosure with two HP ProLiant BL660c Gen8 server blades was used for this exercise. Red Hat Enterprise Linux 6.3 was installed on the first blade and VMware vSphere 5.1 on the second. A single virtual machine was created on the VMware vSphere host. More than 500GB of virtual volumes were created on the 3PAR storage array and presented to both blades. On the virtual machine these virtual volumes were added as Raw Device Mapping (RDM) devices.

RDM is an option in the VMware server virtualization environment that enables a storage logical unit number (LUN) to be directly connected to a virtual machine (VM) from the storage area network (SAN).

RDM is one of two methods for enabling disk access in a virtual machine. The other method is Virtual Machine File System (VMFS). While VMFS is recommended by VMware for most data center applications (including databases, customer relationship management (CRM) applications and enterprise resource planning (ERP) applications), RDM can be used for configurations involving clustering between virtual machines, between physical and virtual machines or where SAN-aware applications are running inside a virtual machine.

For random workloads, VMFS and RDM produce similar input/output (I/O) throughput. For sequential workloads with small I/O block sizes, RDM provides a small increase in throughput compared to VMFS. However, the performance gap decreases as the I/O block size increases. In this environment we have tested with both RDM and VMFS and achieved similar results.

Oracle Automatic Storage Management (ASM) diskgroups were created on the virtual volumes on which a single instance Oracle database was installed. The database was opened from either the virtual machine or the native machine one at a time and tests were carried out according to our plan. Whether the tests were done with 2 processors (2P) or 4 processors (4P), the same workload was used. While this was done to make the results consistent, it may have had the effect of not stressing the 4P systems enough.


Performance characterization methodology

All tests were conducted with the number of pCPUs used by VMware vSphere equal to the number of vCPUs configured in the virtual machine. By fully committing CPU resources in this way, we ensured that performance comparisons between VMware vSphere and native were fair.

In an undercommitted test environment, a virtual machine running on a VMware vSphere host can offload certain tasks, such as I/O processing, to processors beyond the number of its virtual CPUs. For example, when a 4-vCPU virtual machine runs on an 8-pCPU VMware vSphere host, throughput is approximately 8% higher than when the same virtual machine runs on a 4-pCPU host. In all our tests, the vCPU count equaled the pCPU count; if vCPU were less than pCPU, VMware vSphere could offload some tasks to unused processors, which would be unfair to the native or physical tests.

Using fewer than the full number of available cores in the test machine required additional consideration. When configured to use fewer than the available physical cores, VMware vSphere round-robins between sockets when selecting cores, whereas native Linux selects cores from the same socket. This would have made comparisons with native unfair in the scaling performance tests. Therefore, in the two- and four-CPU configurations (i.e., 2P and 4P), the same set of cores was made available to both VMware vSphere and native Linux by configuring the appropriate number of cores in the BIOS.

Performance results

Results from tests executing performance characterizations in both native and virtual machine modes are detailed in the section below.

VMware vSphere 5.1 performance relative to native

In our testing, we have compared the performance of a virtual machine with a native machine for these hardware configurations of disk types and processor counts:

• 2P – Database server with two sockets

• 4P – Database server with four sockets

• SSD – Database mounted on a solid state device

• FC – Database mounted on fibre channel device


The main point of Figure 3 is that regardless of whether a 2P or 4P configuration was used, or whether fibre channel or solid state disk drives were used, the overhead of virtualization versus native is between -1.2% and 5% (a negative overhead means that virtualization delivered more throughput than physical).

Figure 3. VMware vSphere 5.1 versus native for different hardware configurations

Here is an example of how to interpret the graph, looking at the leftmost blue bar: on a 2P system with SSD drives used for database storage, VMware virtualization has a 4.3% overhead compared to running the same workload on a native system with the same disk drives and CPUs. Looking at the rightmost bar: a 4P server with FC drives for database storage and VMware virtualization achieved 1.2% better throughput than the same workload on a native system with the same drives and CPUs. While that may seem unrealistic, virtualized systems can occasionally outperform their native counterparts: across multiple runs using the same data on the exact same platform, there is natural run-to-run variance in the amount of CPU consumed, even when the run is repeated over and over.

Scalability of VMware vSphere 5.1 relative to native

To measure the scalability of a virtual machine versus its native counterpart we performed these tests:

• Increasing the load

• Doubling the processor count from 2 processors to 4 processors.

In all the above tests we observed that a virtual machine scales as well as the native machine.

[Figure 3 data — performance delta percentage between native and virtual, by drive type and processor count: SSD 2P: 4.35%; SSD 4P: 4.60%; FC 2P: 4.91%; FC 4P: -1.21%.]


Increasing the load

Tests were performed with different user counts on both the native and virtual machines. Results from the tests showed that the performance delta between native and virtual ranged from about 6% at low user counts to slightly above 10% at the highest user count.

Figure 4 shows the percentage delta between VMware vSphere 5.1 and native for different user counts ranging from 100 to 1700 on a 2P server. Here’s how to read the leftmost point in the graph: With a workload of 100 users, virtualization has a 6% overhead compared to native. This is a different test compared to the test shown in Figure 3, and thus the overhead for virtualization is different.

Figure 4. VMware vSphere 5.1 versus native for different user counts using a 2P server

From the results we see that the delta is about 6% for 100 users, rising to slightly above 10% for 1700 users. As the system gets busier, native gains a slightly larger advantage over virtualization, because higher user counts bring slightly more overhead for virtual than for native. The overhead is still fairly low. Note that if the X-axis were proportional to the number of users, the curve would look much flatter: the 100-600 user range would span five times its current width, and the 700-1700 user range ten times.

The increase in overhead is mainly attributable to database concurrency issues, wherein a growing number of users try to access the same hot blocks. Also, at lower user counts the virtualization overhead is absorbed by idle CPUs. Typically, when we increased the load but saw no increase in throughput, events such as latch: cache buffers chains and buffer busy waits shot up into the top 5 wait events in the Oracle Automatic Workload Repository (AWR) reports. At the operating system level, CPU utilization increases dramatically as the system becomes busy handling the hot-block contention.


Doubling the processors

By increasing the processors from two to four on both native and virtual machines, we saw a throughput increase of 22% to 47%, along with a 25% reduction in CPU utilization.

Figure 5 shows the performance gains achieved by doubling the processor count for each of the combinations of SSD and FC in both physical and virtual machines. An example of how to read the graph looking at the leftmost blue bar on the graph: When using a system with SSD drives and VMware virtualization, the throughput on a 4P system is 42% more than on a 2P system.

Figure 5. Performance gains by doubling the processor count

When the database is on SSD drives, there is a 40-50% improvement when using 4P instead of 2P. We don’t see 100% scaling because this performance characterization test is I/O bound and the full power of 4P cannot be utilized. When the database is on FC disks, we see only a 22-30% improvement from 4P over 2P. The explanation: with SSDs on a 2P server (either of the two leftmost bars on the graph), the disks are fast, so the disk bottleneck is small and a 4P server can take advantage of the additional CPU available; hence the two leftmost bars show much better performance on 4P versus 2P. For the two rightmost bars, the FC disks are slower, so there is more disk wait; the benefit of 4P over 2P is much smaller because in either case the system is waiting on the slower FC disks rather than exploiting the 4P CPU bandwidth.


In addition, we performed tests with an Oracle RAC database. For RAC, we tested with both RDM and VMFS and got similar results.

Figure 6. Performance throughput using Oracle RAC and upgrading to 4P on a native server

For Oracle RAC, the native 4P systems had 83% better overall throughput than the 2P systems, using double the number of processors. Node 1 did not improve as much because it had to manage the larger environment; node 2 had 100% better throughput on 4P than on 2P. While 100% better throughput is the goal, this heavily I/O-intensive database application does not fully utilize the CPU. Oracle RAC shows better scaling than the Oracle Single Instance results in Figures 3, 4, and 5 because Oracle RAC drives not only high disk I/O but also high network I/O, which exploits the power of 4P better than that of 2P.

Technical details on how to improve performance

Configure BL660c in high performance mode

If the system is configured to use the intel_idle cpuidle driver, it will completely ignore the BIOS high performance settings and engage deeper C-states. To ensure that the BIOS high performance setting is honored, add the following boot parameters to the kernel:

intel_idle.max_cstate=0 processor.max_cstate=0 idle=mwait

To ensure that the operating system isn’t reducing the CPU frequency during periods of lower load, set the frequency governor used for each CPU to performance:

# echo "performance" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

Configure system and Oracle to use HugePages

For large System Global Area (SGA) sizes, HugePages can give substantial benefits in virtual memory management. Using HugePages, the page size is increased from the default 4KB to 2MB (and up to 1GB on processors that support it), thereby reducing the total number of pages to be managed by the kernel and the Translation Lookaside Buffer (TLB). Fewer pages also reduce the amount of memory required to hold the page table in memory. In addition, memory associated with HugePages cannot be swapped out, which forces the SGA to stay memory resident. The savings in memory and page-management effort make HugePages mandatory for high-performing Oracle 11g systems running on x86-64 architectures. In our setup we used 2MB pages.

Note

Automatic Memory Management (AMM) is not compatible with HugePages. Instead, use Automatic Shared Memory Management (ASMM) and Automatic Program Global Area (PGA) Management, which are compatible with HugePages.

Page 11: HP ProLiant BL660c with virtualization optimization ... · VMware virtualization optimization ... HP ProLiant, HP 3PAR StoreServ, VMware vSphere, Oracle Database and I ... Operating

Technical white paper | HP ProLiant BL660c with VMware virtualization optimization performance report

11

To enable HugePages, add the vm.nr_hugepages parameter to /etc/sysctl.conf.
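As an illustrative sketch only (the page count below is an assumption; size it to your actual SGA, and the oracle user name is assumed), reserving a 48GB SGA’s worth of 2MB pages looks like this:

# 48GB SGA / 2MB page size = 24576 huge pages; adjust for your SGA
echo "vm.nr_hugepages = 24576" >> /etc/sysctl.conf
sysctl -p
# Confirm the reservation (a reboot may be needed if memory is fragmented)
grep -i huge /proc/meminfo
# The Oracle owner also needs a memlock limit (in KB) >= the HugePages allocation
echo "oracle soft memlock 50331648" >> /etc/security/limits.conf
echo "oracle hard memlock 50331648" >> /etc/security/limits.conf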

Configure the block devices to use the NOOP or deadline I/O scheduler

Enabling the NOOP I/O scheduler helped achieve good transaction response times and also increased the TPS count; NOOP was used for all scheduling. The NOOP scheduler inserts all incoming I/O requests into a simple FIFO queue and implements request merging, assuming that I/O performance optimization will be handled at some other layer of the I/O hierarchy. NOOP is best used with solid state devices such as flash memory, or in general with devices that do not depend on mechanical movement to access data.

To change the scheduler, add the following boot parameters to the kernel:

elevator=noop
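The boot parameter sets the default for every block device. The scheduler can also be switched per device at runtime through sysfs; a minimal sketch, with /dev/sdb standing in as a hypothetical database LUN:

# Show the available schedulers; the active one appears in brackets
cat /sys/block/sdb/queue/scheduler
# Switch this device to noop (immediate, but not persistent across reboots)
echo noop > /sys/block/sdb/queue/scheduler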

Configure 8 Oracle DB writers

When only one DB writer was configured, free buffer waits showed up in the Top 5 Timed Events of the Oracle AWR report, indicating that the DB writer was not flushing dirty buffers in the SGA fast enough. The problem disappeared when the DB writers were increased to 8, which also increased throughput by 12%.

For the LGWR and DBW* processes, configure Oracle parameter:

_high_priority_processes

By default, all the Oracle background processes have the same dispatching priority. However, the priority of the DBWR and LGWR processes should be raised so they can go to the head of the CPU dispatch queue on a busy system; increasing their priorities increased throughput.
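A minimal sqlplus sketch of both changes, assuming the instance uses an spfile and can be restarted; the process list passed to _high_priority_processes is an example value, not necessarily the one used in our tests:

sqlplus / as sysdba <<'EOF'
-- Run 8 database writers instead of the default
ALTER SYSTEM SET db_writer_processes = 8 SCOPE=SPFILE;
-- Raise the dispatch priority of LGWR and the DB writers (example value)
ALTER SYSTEM SET "_high_priority_processes" = 'LGWR|DBW*' SCOPE=SPFILE;
SHUTDOWN IMMEDIATE
STARTUP
EOF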

Scheduler parameters

To tune scheduler parameters for more efficient use of CPUs and fewer CPU migrations:

kernel.sched_min_granularity_ns = 10000000 (increased from the default 4 ms to 10 ms)

kernel.sched_latency_ns = 40000000 (increased from 20 ms to 40 ms)

kernel.sched_wakeup_granularity_ns = 8000000 (increased from 4 ms to 8 ms)

kernel.sched_migration_cost = 4000000 (increased from 0.5 ms to 4 ms)

kernel.sched_nr_migrate = 16 (decreased from 32 to 16)

kernel.sched_compat_yield = 1 (changed from 0 to 1)
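A sketch of persisting and applying these values on RHEL 6 (the keys are assumed to exist on the running kernel):

cat >> /etc/sysctl.conf <<'EOF'
kernel.sched_min_granularity_ns = 10000000
kernel.sched_latency_ns = 40000000
kernel.sched_wakeup_granularity_ns = 8000000
kernel.sched_migration_cost = 4000000
kernel.sched_nr_migrate = 16
kernel.sched_compat_yield = 1
EOF
sysctl -p    # load the new values immediately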

Latch contention

The AWR report showed significant latch contention, likely due to the workload’s SQL statements accessing hot blocks in the database. Oracle can be configured to reduce the amount of latch spinning it performs by tuning the _spin_count parameter, whose default is 2000; we tested with a _spin_count of 200. This does not reduce the latch contention itself, but it does reduce CPU utilization, which may allow other processes, including the latch owners, to use the CPU.
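A minimal sketch of the change (underscore parameters must be quoted; setting the value in the spfile and restarting avoids any question of whether it is dynamically modifiable):

sqlplus / as sysdba <<'EOF'
-- Reduce latch spin iterations from the default 2000 to 200
ALTER SYSTEM SET "_spin_count" = 200 SCOPE=SPFILE;
EOF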

NOCACHE hint

The NOCACHE hint in one of the queries from the original workload was causing high I/O. The NOCACHE hint specifies that blocks retrieved for the table are placed at the least recently used end of the LRU list in the buffer cache when a full table scan is performed. This led to the Oracle process re-reading the same blocks from disk many times. By removing the NOCACHE hint, we achieved more than a 200% performance improvement, and IOPS dropped from 50k to 20k without multipathing and from 120k to 20k with multipathing.
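For illustration only, with a hypothetical table and bind variable rather than the actual workload SQL, the fix amounts to deleting the hint:

-- Before: the hint pushes scanned blocks to the LRU end, forcing disk re-reads
SELECT /*+ NOCACHE(p) */ * FROM products p WHERE title LIKE :search;
-- After: scanned blocks are cached normally
SELECT * FROM products p WHERE title LIKE :search;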

Multipathing

VMware vSphere 5.1 includes active/active multipath support to maintain a constant connection between the VMware vSphere host and the HP 3PAR Storage array. Three path policies are available: “Fixed”, “Most Recently Used” and “Round Robin”. For HP 3PAR storage, Round Robin is the recommended policy for best performance and load balancing; however, it may not be enabled by default. The path policies can be viewed and modified from the vSphere Web Client. With the NOCACHE hint in the query, enabling multipathing gave a 100% boost in performance; without the hint, enabling multipathing brought no further improvement, because the I/O had already dropped drastically once the hint was removed.
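Besides the vSphere Web Client, the policy can be set from the ESXi shell. A minimal sketch for vSphere 5.1; the naa device identifier below is hypothetical:

# List devices with their current path selection policy (PSP)
esxcli storage nmp device list
# Set Round Robin on a specific 3PAR LUN
esxcli storage nmp device set --device naa.60002ac0000000000000000000000001 --psp VMW_PSP_RR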


DISK_ASYNCH_IO

ASM bypasses the file system layer, so ASM I/O is controlled entirely by the DISK_ASYNCH_IO parameter; asynchronous I/O (AIO) is enabled or disabled by setting disk_asynch_io to true or false.

To verify this behavior, the database writer process DBW0 was traced with strace during the test. When AIO is in use, the strace output for DBW0 shows io_submit/io_getevents calls. However, in our environment, setting DISK_ASYNCH_IO to TRUE did not show any performance improvement, as the system had issues with barrier writes.
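A sketch of that check, assuming a single instance whose SID is ORCL (hypothetical):

# Find the first database writer of instance ORCL and trace its I/O syscalls
DBW0_PID=$(pgrep -f ora_dbw0_ORCL)
strace -p "$DBW0_PID" -e trace=io_submit,io_getevents
# io_submit/io_getevents appearing here means AIO is in effect;
# seeing only pwrite/pread instead means the writes are synchronous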

NUMA

NUMA support is enabled by default in the database starting with 10.2.0.4: when the Oracle database software detects a NUMA-enabled machine, it automatically enables NUMA at the Oracle level. However, if the system has not been specifically set up and tuned for this architecture, the Resource Manager may cause unnecessary waits for resmgr:cpu quantum. Since the server used did not have a NUMA-based architecture, NUMA was disabled by setting the parameter:

_enable_NUMA_optimization=FALSE
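A minimal sqlplus sketch of the change (underscore parameter, so it must be quoted; it takes effect at the next restart):

sqlplus / as sysdba <<'EOF'
ALTER SYSTEM SET "_enable_NUMA_optimization" = FALSE SCOPE=SPFILE;
EOF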

Resource Manager

The first of the top five timed events in the Oracle AWR report was resmgr:cpu quantum, consuming 65% of DB Time; therefore, we decided to disable the Resource Manager. To disable the Resource Manager, complete the following steps (a combined sketch of both steps follows below):

• Issue the following SQL statement: ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = ''

• Disassociate the Resource Manager from all Oracle Scheduler windows

After disabling the Resource Manager, we were able to increase the throughput by 10%.
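A combined sketch of both steps; MONDAY_WINDOW is one of the default Oracle Scheduler windows, and the disassociation must be repeated for each window:

sqlplus / as sysdba <<'EOF'
-- Step 1: clear the active resource plan
ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = '' SCOPE=BOTH;
-- Step 2: remove the resource plan from a scheduler window
BEGIN
  DBMS_SCHEDULER.SET_ATTRIBUTE_NULL('"SYS"."MONDAY_WINDOW"', 'RESOURCE_PLAN');
END;
/
EOF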

Redo log switch

The redo log switch was taking place once every three minutes, and frequent log switching affects database performance. To reduce the frequency, we increased the number of redo log groups from 3 to 5 and increased the size of each member from 500MB to 2GB. After that change, the frequency of log switches came down to once every 15 minutes.
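A sketch of the resize, assuming ASM-managed files in a hypothetical +DATA diskgroup; the original 500MB groups are dropped and recreated at 2GB once they go INACTIVE:

sqlplus / as sysdba <<'EOF'
-- Add the new, larger groups first
ALTER DATABASE ADD LOGFILE GROUP 4 ('+DATA') SIZE 2G;
ALTER DATABASE ADD LOGFILE GROUP 5 ('+DATA') SIZE 2G;
-- After a log switch makes an old group INACTIVE, drop and recreate it at 2GB:
-- ALTER DATABASE DROP LOGFILE GROUP 1;
-- ALTER DATABASE ADD LOGFILE GROUP 1 ('+DATA') SIZE 2G;
EOF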

VMXNET3

We used a VMXNET3 adapter in our setup. VMXNET3 is the next generation of paravirtualized NIC designed for performance; it is not related to VMXNET or VMXNET 2. It offers all the features available in VMXNET 2 and adds several new ones, such as multiqueue support (also known as Receive Side Scaling in Microsoft® Windows®), IPv6 offloads, and MSI/MSI-X interrupt delivery.

PVSCSI adapter

PVSCSI adapters were used for accessing the SAN. PVSCSI adapters are high-performance storage adapters that can result in greater throughput and lower CPU utilization. PVSCSI adapters are best suited for environments, especially SAN environments, where hardware or applications drive a very high amount of I/O throughput.

Summary

As a result of these tests using VMware vSphere 5.1 on HP ProLiant servers, you can see that virtualization overhead is usually between 1-10% relative to the same test run natively. This result held whether we used 2P or 4P servers, SSD or FC disks, RDM or VMFS storage, and Oracle Single Instance or Oracle RAC databases. The testing was performed using an extremely heavy Oracle-based database I/O workload.

In addition, even with a high I/O workload, 4P systems provide substantial throughput gains over 2P systems, as one would expect. This is predictable for a high-CPU workload, but the improvement was substantial even for this high I/O workload. With Oracle RAC, the throughput advantage of 4P over 2P is even larger, because the 4P server has the horsepower to manage the high network I/O, the high disk I/O, the CPU interrupts, and the customer’s CPU workload.


For more information

HP BladeSystem, hp.com/go/bladesystem

HP BladeSystem Technical Resources, http://h71028.www7.hp.com/enterprise/cache/316682-0-0-0-121.html

HP 3PAR StoreServ Storage, hp.com/go/3par

HP SAN design reference guide: best practices for SAN design, http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00403562/c00403562.pdf

Oracle 11g Database, oracle.com/pls/db112/homepage

VMware, vmware.com/support

hp.com/go/vmware

To help us improve our documents, please provide feedback at hp.com/solutions/feedback.

Sign up for updates

hp.com/go/getupdated

© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation. Intel and Xeon are trademarks of Intel Corporation in the U.S. and other countries. Oracle is a registered trademark of Oracle and/or its affiliates.

4AA4-7097ENW, May 2013, Rev. 1