eehr-vmware best practices

Allscripts Enterprise

This page contains Allscripts proprietary information and is not to be duplicated or disclosed to unauthorized persons.


VMware Best Practices

Production Database Server

Last Updated 1:00 PM, January 7, 2013



OVERVIEW

Virtualization of the Allscripts Enterprise EHR Production Database Server is supported provided that a predictable, well-configured environment can be verified and maintained. To that end, Allscripts supports production database virtualization only if the following best practices and configuration guidelines for optimal performance are followed.

The configuration settings below, especially the hardware options, should be taken as recommendations for best overall performance, not as absolute requirements, and need to be balanced against available physical assets and technical resources.

SOFTWARE

To meet the technical and performance requirements of the Enterprise EHR and give customers the best possible user experience, Allscripts supports virtualization of a production database server environment only if VMware vSphere 5.0 or higher is used.

HARDWARE

CPU

Use processors that support Hardware-Assisted Virtualization, specifically:

o CPU: VT-x or AMD-V

o Memory: Intel EPT or AMD RVI

o I/O: VT-d or AMD-Vi (optional)

Networking

Use NICs that support the following options:

Checksum offload

TCP Segmentation Offload (TSO)

Ability to handle 64-bit DMA addresses

Ability to handle multiple Scatter Gather elements per TX frame

Jumbo Frames (JF)

Large Receive Offload (LRO)

If using 10GbE NICs:

o NetQueue

o Single-Port NICs should use PCIe x8 (or higher) or PCI-X 266 bus architecture

o Dual-Port NICs should use PCIe x16 (or higher) bus architecture

Storage

Use hardware that supports VMware vStorage APIs for Array Integration (VAAI)

Use fully redundant Storage Network (NICs, HBAs, Switches, Front-End Storage Ports, etc.)

Enable Read/Write Caching on Storage

Server BIOS Settings:

Use latest version available



Enable Turbo Boost

Enable Hyper-Threading

Disable Node Interleaving (Enable NUMA)

Enable ALL Hardware-Assisted Virtualization Features (see CPU section above)

Disable Cache Prefetching Mechanisms

Disable unused hardware (see Host Configuration section below)

HOST CONFIGURATION

Disconnect and/or Disable ALL unused and unnecessary system devices including:

Floppy Drives

COM Ports

LPT Ports

CD-ROM Drives

USB Adapters

Network Interfaces

Storage Controllers

NOTE: Disabling some devices can be complicated and may cause other problems,

so thorough testing of specific changes is recommended.

Use separate virtual switches and physical network adapters for host management

(VMkernel) and Virtual Machine networks.

Use a single vSwitch to optimize internal communication between Enterprise EHR

VMs.

Figure 1.1: ESXi Networking



Virtualization has resource overhead needed to manage the VMs; therefore, leave at least 4 GB of RAM for the physical host.

For hosts supporting Enterprise EHR VMs, set the Power Policy Option to High

Performance.

Figure 1.2: ESXi Power Management Settings

For best performance and availability, the use of Host Clusters with HA and DRS is recommended. For DRS, use at least the Partially Automated setting. For production-level systems, be cautious about using the Fully Automated setting, as it may cause undesired migrations of the VMs.

Figure 1.3: Cluster Settings

Have the VMs contained in a Resource Pool with proper resource reservations.



VMs should be located on optimized shared storage. The optimizations include using multiple HBAs, high-speed disks, and high-speed, uncongested data networks.

The settings of the HBAs in your ESXi hosts may need to be adjusted to optimize

their performance. In general, the default settings should be used unless changes

are recommended by the documentation for your specific SAN storage and switches.

VMware recommends changing the Disk.SchedNumReqOutstanding Setting on your

ESXi hosts to match the Maximum Queue Depth of the HBAs. QLogic has a default

Queue Depth of 32 and Emulex uses 30. Reference VMware knowledge base article

1267. Evaluating your specific environment is recommended. Based on the results

of your testing, you may see a benefit by increasing the values to 64.

Figure 1.4: ESXi Software Advanced Settings
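The comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name and the vendor dictionary are hypothetical, the default depths (QLogic 32, Emulex 30) are the ones quoted in this document, and the actual values must be read from your host's advanced settings and HBA driver configuration.

```python
# Default HBA queue depths as quoted in this document (assumption: your
# driver may have been configured with a different value).
DEFAULT_HBA_QUEUE_DEPTH = {"qlogic": 32, "emulex": 30}


def check_sched_num_req_outstanding(sched_num_req_outstanding, hba_queue_depth):
    """Compare Disk.SchedNumReqOutstanding with the HBA maximum queue depth."""
    if sched_num_req_outstanding == hba_queue_depth:
        return "OK: Disk.SchedNumReqOutstanding matches the HBA queue depth"
    return (
        "MISMATCH: set Disk.SchedNumReqOutstanding to "
        f"{hba_queue_depth} (currently {sched_num_req_outstanding})"
    )


if __name__ == "__main__":
    # A host left at 32 matches a QLogic default but not an Emulex default.
    print(check_sched_num_req_outstanding(32, DEFAULT_HBA_QUEUE_DEPTH["qlogic"]))
    print(check_sched_num_req_outstanding(32, DEFAULT_HBA_QUEUE_DEPTH["emulex"]))
```

The check only reports a mismatch; whether to raise both values (for example to 64) should still be decided by testing in your environment.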

VIRTUAL MACHINE CONFIGURATION

Virtual machines (VMs) must meet normal Enterprise EHR configuration standards: OS level, Service Packs, Hot Fixes, Application versions, etc.

The latest version of VMware Tools must be installed in the VMs. The VMware Tools

package provides optimized Device Drivers and management features that improve

performance and reliability.



Disconnect and/or Disable ALL unused and unnecessary system devices including:

Floppy Drives

COM Ports

LPT Ports

CD-ROM Drives

USB Adapters

Figure 1.6: Virtual Machine BIOS

VMs should use only version 8 virtual hardware, specifically the VMXNET 3 network adapter and the Paravirtual SCSI Controller.



Figure 1.7: Virtual Machine Hardware

For I/O intensive VMs, especially SQL Servers, Allscripts recommends spreading the

disk I/O across 3 or 4 Paravirtual SCSI controllers (see above screenshot).

When a VM has more than eight vCPUs, Virtual NUMA is enabled, so make sure that the total vCPU count of the VM is a multiple of the number of cores per NUMA node on the physical server. (NOTE: Some multi-core processors have NUMA node sizes that differ from the number of cores per socket. For example, some 12-core processors have two six-core NUMA nodes per processor.)

Figure 1.8: Virtual NUMA Settings
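The sizing rule above is simple modular arithmetic. The following Python sketch shows one way to check it; the function names are hypothetical, and the NUMA node size must be taken from your actual hardware (it is not always the same as cores per socket, as noted above).

```python
def vcpu_count_fits_numa(vcpus, cores_per_numa_node):
    """True if the vCPU count is a whole multiple of the NUMA node size."""
    if vcpus <= 0 or cores_per_numa_node <= 0:
        raise ValueError("counts must be positive")
    return vcpus % cores_per_numa_node == 0


def next_valid_vcpu_count(vcpus, cores_per_numa_node):
    """Round the vCPU count up to the next multiple of the NUMA node size."""
    if vcpus <= 0 or cores_per_numa_node <= 0:
        raise ValueError("counts must be positive")
    nodes_needed = -(-vcpus // cores_per_numa_node)  # ceiling division
    return nodes_needed * cores_per_numa_node


if __name__ == "__main__":
    # Example from the text: a 12-core processor with two six-core NUMA nodes.
    print(vcpu_count_fits_numa(10, 6))   # a 10-vCPU VM does not align
    print(next_valid_vcpu_count(10, 6))  # next aligned size
```

Rounding up is only one option; depending on the workload, sizing down to the previous multiple may be the better choice.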



In systems with under-committed resources, the ESXi CPU scheduler spreads load across all sockets by default (if NUMA is disabled). For VMs that exhibit significant data sharing between vCPUs (that is, they benefit from sharing cache), you can force the virtual CPUs to be scheduled together on the same socket by adding the following line to the VM's .vmx configuration file:

sched.cpu.vsmpConsolidate = "TRUE"

If NUMA is enabled, the CPU scheduler restricts the CPUs to the same socket.
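As a rough sketch of how the sched.cpu.vsmpConsolidate entry could be added to a .vmx file programmatically, the Python function below sets a key in the text of a configuration file. The function name is hypothetical, and the quoted key = "value" line format is an assumption about the .vmx layout; the function works on a string and does not touch any real file.

```python
def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with key = "value" set, replacing any existing entry."""
    new_line = f'{key} = "{value}"'
    lines = vmx_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # The option name is everything before the first "=" sign.
        name = line.split("=", 1)[0].strip()
        if name == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative file contents only, not taken from a real VM.
    sample = 'numvcpus = "8"\nmemsize = "32768"\n'
    print(set_vmx_option(sample, "sched.cpu.vsmpConsolidate", "TRUE"))
```

Applying the function twice gives the same result as applying it once, so it is safe to use in a repeatable configuration script.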

Enterprise EHR Virtual Machines should always be configured using Thick Provisioned Eager Zeroed disks.

Figure 1.9: Virtual Disk Thick Provision Eager Zeroed

GUEST OS SETTINGS

The following settings are recommended for optimal performance:

Disable Screensavers and Windows animation in ALL VMs.

Disable IE Enhanced Security Configuration

Disable User Account Control

Disable Write Debugging Information on System Failure

Set Power Plan to High Performance

Disable Scheduled Tasks:

\Microsoft\Windows\Defrag\ScheduledDefrag

\Microsoft\Windows\Registry\RegIdleBackup

\Microsoft\Windows\Time Synchronization\SynchronizeTime
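One possible way to script the scheduled-task changes listed above is to generate a Windows schtasks command for each task. The Python sketch below only builds and prints the commands; the helper name is hypothetical, and running the commands (from an elevated prompt inside the guest) is left to the administrator.

```python
# Task paths taken from the list in this document.
TASKS_TO_DISABLE = [
    r"\Microsoft\Windows\Defrag\ScheduledDefrag",
    r"\Microsoft\Windows\Registry\RegIdleBackup",
    r"\Microsoft\Windows\Time Synchronization\SynchronizeTime",
]


def disable_task_command(task_path):
    """Build the schtasks argument list that disables one scheduled task."""
    return ["schtasks", "/Change", "/TN", task_path, "/Disable"]


if __name__ == "__main__":
    for task in TASKS_TO_DISABLE:
        args = disable_task_command(task)
        # Quote arguments containing spaces so the line can be pasted as-is.
        print(" ".join(f'"{a}"' if " " in a else a for a in args))
```

Keeping the commands as argument lists means they could also be handed to subprocess.run without shell quoting, if execution from Python is preferred.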

STORAGE

The storage requirements and design for the SQL Database Server used for Enterprise EHR Applications have numerous caveats that depend on the unique characteristics of a given customer's environment. First, a brief overview of the storage architecture supported by VMware is needed.

The operating system, applications and user data of a virtual machine are kept in one or more virtual SCSI disks. These virtual disk files (or VMDKs) are typically maintained in a VMFS datastore connected to a physical storage subsystem: Direct Attached Storage in a host, Fibre Channel SAN, iSCSI SAN, or NAS. VMware also supports the use of Raw Device Mapping (RDM), which allows the VM to have direct access to a LUN on the storage (Fibre Channel or iSCSI only).

In vSphere 5.0, a VMFS datastore can be a maximum of 64TB in size but any single

VMDK file can only be 2TB minus 512 bytes. RDMs in physical compatibility mode

can be up to 64TB in size.
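To make the size limits above concrete, the following Python sketch converts them to bytes and reports which disk types can hold a volume of a given size. It assumes that "TB" here means 2^40 bytes; the constant and function names are hypothetical.

```python
TB = 1024 ** 4  # assumption: one terabyte is counted as 2**40 bytes

MAX_VMDK_BYTES = 2 * TB - 512      # single VMDK limit in vSphere 5.0
MAX_PHYSICAL_RDM_BYTES = 64 * TB   # RDM in physical compatibility mode


def storage_options(volume_bytes):
    """List the disk types from this section that can hold the volume."""
    options = []
    if volume_bytes <= MAX_VMDK_BYTES:
        options.append("VMDK")
    if volume_bytes <= MAX_PHYSICAL_RDM_BYTES:
        options.append("RDM (physical)")
    return options


if __name__ == "__main__":
    for size_tb in (1, 3, 65):
        print(size_tb, "TB:", storage_options(size_tb * TB))
```

A database volume expected to grow beyond roughly 2TB therefore cannot stay on a single VMDK under these limits, which is one of the reasons RDMs are recommended in this section.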

For overall best performance, Allscripts recommends using only RDMs (physical

compatibility mode) for the LUNs storing the Enterprise EHR Application SQL data for

the following reasons:

RDMs allow the use of SAN-based snapshots and/or copies.

RDMs are required if you are leveraging Microsoft Failover Clustering, which needs shared volumes.

Individual RDMs can be up to 64TB in size, so the growth of your database environment can be easily accommodated.

RDMs are easier to use for migrations from physical to virtual systems.

VMOTION

As stated by VMware:

Consider using a 10GbE vMotion network. Using a 10GbE network in place of a 1GbE network for vMotion will result in significant improvements in vMotion

performance. When using very large virtual machines (for example, 64GB or

more), consider using multiple 10GbE network adaptors for vMotion to further

improve vMotion performance.

When configuring resource pools, plan to leave at least 10% of the CPU capacity unreserved. CPU reservations that fully commit the capacity of the cluster can

prevent DRS from migrating virtual machines between hosts.



When using the multiple-network adaptor feature, configure all the vMotion vmnics under one vSwitch and create one vMotion vmknic for each vmnic. In the

vmknic properties, configure each vmknic to leverage a different vmnic as its

active vmnic, with the rest marked as standby. This way, if any of the vMotion

vmnics become disconnected or fail, vMotion will transparently switch over to one

of the standby vmnics. When all your vmnics are functional, though, each vmknic

will route traffic over its assigned, dedicated vmnic.
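The active/standby layout described in the passage above can be expressed as a simple mapping. The Python sketch below is illustrative only: the function name and the generated vmknic names are hypothetical, and it produces a plan to review rather than changing any host configuration.

```python
def vmotion_teaming(vmnics):
    """Give each vMotion vmknic one active vmnic and the rest as standby."""
    plan = {}
    for index, active in enumerate(vmnics):
        vmknic = f"vmk-vmotion-{index}"  # hypothetical vmknic name
        standby = [nic for nic in vmnics if nic != active]
        plan[vmknic] = {"active": active, "standby": standby}
    return plan


if __name__ == "__main__":
    # Example with two uplinks dedicated to vMotion (names are examples).
    for vmknic, teaming in vmotion_teaming(["vmnic2", "vmnic3"]).items():
        print(vmknic, teaming)
```

Each vmknic gets a different active uplink, so traffic is spread across all uplinks while they are healthy and falls back to a standby uplink if one fails.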

TROUBLESHOOTING

Virtual Machine

A couple of basic items to validate for a VM are that VMware Tools is installed and running and that the VM is part of an HA cluster.

Figure 1.10: Virtual Machine General Settings

The summary screen of a VM gives a basic summary of its performance, including CPU and Memory usage. If Consumed Host CPU or Active Guest Memory is sustained at a level close to the VM's configured quantity, then further investigation is warranted.



Figure 1.11: Virtual Machine Resource Consumption

The Resource Allocation Tab is a graphical view of the VM's resource utilization.

Figure 1.12: Virtual Machine Resource Consumption

The Performance Tab of a VM provides real-time and historical data about the usage of all resources: CPU, Memory, Disk, and Network. The default view provides a 1 Day Summary, which gives you a good overview of the VM's health. At the bottom of the default view is a 1 Day Summary of the host the VM resides on, for comparison. You should develop baseline numbers for each type of server that you manage so you can better identify abnormal values.

NOTE: Unlike on physical servers, high CPU utilization (70% - 80%) in a virtual server is normal and desired. High Memory utilization is not an issue as long as it is not causing the Guest OS to page the memory contents; you have to balance good resource utilization with good performance.



ESXi Host

The summary screen of a host gives a basic summary of its performance.

Figure 1.13: ESXi Host Summary Tab

The Performance tab of a host is a good starting point for reviewing system

performance given a desired time range.



Figure 1.14: ESXi Host Performance Tab

By using the Hardware Status Tab, the host's physical resources can be reviewed for issues.

Figure 1.15: ESXi Host Hardware Status Tab



DRS and/or HA Cluster

At the cluster level, the Hosts tab gives a good summary view of the resource

consumption.

Figure 1.16: vSphere Cluster Hosts Tab

If a DRS Cluster is set to Fully Automated, monitor the value of Total Migrations using vMotion. A high number may indicate a performance hit due to atypically high

numbers of VM migrations because the cluster is attempting to balance its resources.

Figure 1.17: vSphere Cluster Summary Tab

Datastores

The Datastores tab of the vSphere Datacenter provides an overview of space capacity and consumption. VMware's best practice recommendation is to limit each datastore to 80% utilization.



Figure 1.18: vSphere Datacenter Datastores Tab
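The 80% guideline above is easy to check from the capacity and free-space figures shown on the Datastores tab. A minimal Python sketch follows, with hypothetical function names and example numbers that are not taken from a real environment.

```python
def datastore_utilization(capacity_gb, free_gb):
    """Return used space as a fraction of total capacity."""
    if capacity_gb <= 0:
        raise ValueError("capacity must be positive")
    return (capacity_gb - free_gb) / capacity_gb


def exceeds_utilization_limit(capacity_gb, free_gb, limit=0.80):
    """True if the datastore is above the utilization limit (80% by default)."""
    return datastore_utilization(capacity_gb, free_gb) > limit


if __name__ == "__main__":
    # Illustrative values only: a 1000 GB datastore with 150 GB free.
    print(exceeds_utilization_limit(1000, 150))
```

The limit is a parameter so that a site with a stricter internal threshold can reuse the same check.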

NOTE: For further troubleshooting guidance, please refer to VMware's website and documentation.

ADVANCED TROUBLESHOOTING

Check for Resource Pool CPU Saturation

Select a Resource Pool; Use the Summary Tab to determine the CPU limit:

Figure 1.19: Performance Troubleshooting Resource Pool CPU Limit

Select the Performance Tab; Select the Advanced option; Switch view to CPU; Select

Usage in MHz Counter; Select all CPU objects.



Figure 1.20: Performance Troubleshooting Resource Pool CPU Saturation

Compare the Usage in MHz value to the CPU Limit setting on the Resource Pool. If the values are close, the pool may be experiencing CPU saturation, and additional resources should be allocated to the pool.
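The comparison between Usage in MHz and the pool's CPU Limit can be written as a ratio. In the Python sketch below, the 90% threshold used to decide what counts as "close" is an assumption made for illustration (this document does not give a number), and the function name is hypothetical.

```python
def pool_cpu_saturation(usage_mhz, limit_mhz, threshold=0.90):
    """Return (usage/limit ratio, whether the ratio reaches the threshold).

    The default threshold of 0.90 is an assumed interpretation of "close".
    """
    if limit_mhz <= 0:
        raise ValueError("limit must be a positive number of MHz")
    ratio = usage_mhz / limit_mhz
    return ratio, ratio >= threshold


if __name__ == "__main__":
    # Illustrative values: 9500 MHz used against a 10000 MHz pool limit.
    ratio, saturated = pool_cpu_saturation(9500, 10000)
    print(f"{ratio:.0%} of limit, saturated={saturated}")
```

A pool configured without a fixed limit has no number to compare against, so this check applies only when a CPU Limit is set.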

If the performance problem is specific to a VM in the Resource Pool, use that VM in

the following steps. If not, repeat the steps for all the VMs in the Resource Pool.

Select the VM; Select the Performance Tab; Select the Advanced option; Switch view

to CPU; Select Usage in MHz Counter for the VM object.

If the Average value is greater than 85% and peaks above 90-95%, then CPU

Saturation is an issue.

Figure 1.21: Performance Troubleshooting VM CPU Ready

Check for an Overloaded Storage Device

Select a Host; Select the Performance Tab; Select the Advanced option; Switch view

to Disk; Select Commands Terminated Counter; Select all Datastore objects.

Any value other than zero indicates an issue with the storage device.



Figure 1.22: Performance Troubleshooting Disk Saturation

Using ESXTOP

ESXTOP is a command line utility used to get real-time performance statistics of a

given ESXi Host. You must connect to the host using SSH (use PuTTY or other SSH

friendly client); however, with VMware ESXi 5.0 or later, SSH is disabled by default

and must be manually enabled.

Select a host; Select the Configuration Tab; Under Software, Select Security Profile;

Under Services, Select Properties; Select SSH, Select the Options Button; Under

Services Commands, Select Start to enable SSH; Select OK.



Figure 1.23: ESXi Host Configuration Enabling SSH

Once SSH has been started, you can connect to the host and run ESXTOP.

ESXTOP CPU

NOTE: All ESXTOP Key commands are case sensitive.

The starting screen for ESXTOP is the CPU utilization panel. You can press V to show only VMs instead of all processes.

Figure 1.24: ESXTOP CPU Utilization

Examine the PCPU UTIL(%) line for an unequal load across processor cores, with some at saturation and some remaining near idle. This would indicate that applications within the VM are not utilizing all of the cores provided to them.



Examine the %RDY field for the percentage of time that a virtual machine was ready

but could not get scheduled to run on a physical CPU. This value should remain

below 5%. Anything greater indicates a problem at the host level - such as not

enough resources available.

Examine the %USED field for the percentage of physical CPU resources used by a

vCPU. If the physical CPUs are running near or at full capacity then ensure that the

CPU utilization per vCPU is less than 80%.
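The two thresholds above (%RDY below 5% and, on a busy host, %USED per vCPU below 80%) can be captured in a small helper. The Python sketch below assumes the values have already been read from the ESXTOP screen; the function name and message wording are hypothetical.

```python
def cpu_findings(pct_ready, pct_used, host_near_capacity):
    """Return warnings based on the %RDY and %USED guidance in this section."""
    findings = []
    if pct_ready >= 5.0:
        findings.append(
            f"%RDY is {pct_ready:.1f}% (should stay below 5%): "
            "check for insufficient host resources"
        )
    if host_near_capacity and pct_used >= 80.0:
        findings.append(
            f"%USED is {pct_used:.1f}% per vCPU (should stay below 80% "
            "when the physical CPUs are near capacity)"
        )
    return findings


if __name__ == "__main__":
    # Illustrative readings only.
    for finding in cpu_findings(7.2, 85.0, host_near_capacity=True):
        print(finding)
```

An empty list means neither threshold was reached for the values supplied.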

ESXTOP Memory

To access the Memory utilization panel press m. You can press V to show only VMs instead of all processes.

Figure 1.25: ESXTOP Memory Utilization

Examine the MEMSZ field for the amount of physical memory allocated to the VM.

Examine the SZTGT field for the amount of memory the ESXi VMkernel wants to

allocate to the VM.

Examine the SWCUR field for the amount of memory in Megabytes currently being

swapped. This value should always be zero to maintain optimal performance.

ESXTOP Network

To access the network utilization panel press n.

Figure 1.26: ESXTOP Network Utilization



Examine the %DRPTX and %DRPRX fields, which indicate dropped packets. If high dropped-packet values are consistent, thoroughly review the network configuration of all VMs and especially the hosts.

ESXTOP Storage Adapters

To access the storage adapter utilization panel press d. Press f to add fields; Press j to add Error Stats.

Figure 1.27: ESXTOP Storage Adapter Utilization

Ideally, the DAVG/cmd (device latency) and GAVG/cmd (VM latency) fields should be 5ms or less; values greater than 20ms may indicate a bottleneck at the switch or SAN. The KAVG/cmd field should always be zero; high values indicate an issue with a device driver and/or with device queue depth. Examine the FCMDs/s field for any failed commands, which may indicate queue saturation or hardware issues.
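The storage adapter guidance above maps naturally onto a few threshold checks. The Python sketch below assumes the readings were copied from the ESXTOP storage adapter panel; the function names and the label for the range between 5ms and 20ms are hypothetical, since this document only describes the two ends of that range.

```python
def classify_latency(latency_ms):
    """Classify a DAVG/cmd or GAVG/cmd reading using this section's thresholds."""
    if latency_ms <= 5:
        return "ideal"
    if latency_ms <= 20:
        return "elevated"  # assumed label for the range between 5ms and 20ms
    return "possible switch or SAN bottleneck"


def adapter_findings(davg_ms, kavg_ms, gavg_ms, failed_cmds_per_sec):
    """Summarize one storage adapter's readings as a dictionary."""
    return {
        "DAVG/cmd": classify_latency(davg_ms),
        "GAVG/cmd": classify_latency(gavg_ms),
        "KAVG/cmd": "ok" if kavg_ms == 0 else "check driver or queue depth",
        "FCMDs/s": "ok" if failed_cmds_per_sec == 0
        else "check queue saturation or hardware",
    }


if __name__ == "__main__":
    # Illustrative readings only.
    print(adapter_findings(davg_ms=22.0, kavg_ms=0.5, gavg_ms=22.5,
                           failed_cmds_per_sec=0))
```

Treating any non-zero KAVG/cmd as a finding follows the text literally; in practice you may prefer to compare against a small tolerance from your own baseline.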

ESXTOP SCSI Queue Depth

To access the disk device utilization panel press u.

Figure 1.28: ESXTOP Disk Device Utilization

The ACTV field is the current commands in queue; a metric of less than 20 is

excellent. The QUED field is commands waiting to process; any value over zero is

unhealthy.

ESXTOP Virtual Machine Storage

To access the virtual machine storage utilization panel press v.

Figure 1.29: ESXTOP Virtual Machine Storage Utilization



Examine the LAT/rd & LAT/wr fields for values greater than 5ms which may indicate

a disk configuration issue.

RESOURCES

VMware Web Site:

http://www.vmware.com

VMware vSphere 5.0 Documentation:

http://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-pubs.html

Performance Best Practices for VMware vSphere 5.0:

http://www.vmware.com/resources/techresources/10220

VMware vSphere vMotion Architecture, Performance and Best Practices in VMware vSphere 5:

http://www.vmware.com/files/pdf/vmotion-perf-vsphere5.pdf

VMware vSphere 5.0 Troubleshooting Guide:

http://pubs.vmware.com/vsphere-50/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-501-troubleshooting-guide.pdf
