eehr-vmware best practices

  • Allscripts Enterprise

    This page contains Allscripts proprietary information and is not to be duplicated or disclosed to unauthorized persons. 1

    Allscripts Enterprise

    VMware Best Practices

    Production Database Server

    Last Updated 1:00 PM, January 7, 2013

    OVERVIEW

    Virtualization of the Allscripts Enterprise EHR Production Database Server is supported provided that a predictable, well-configured environment can be verified and maintained. To that end, Allscripts supports production database virtualization only if the best practices and configuration guidelines below are followed.

    The configuration settings below, especially the hardware options, should be taken as recommendations for best overall performance, not absolute requirements, and need to be balanced against available physical assets and technical resources.

    SOFTWARE

    To meet the technical and performance requirements of the Enterprise EHR and give customers the best possible user experience, Allscripts supports virtualization of a production database server environment only if VMware vSphere 5.0 or higher is used.

    HARDWARE

    CPU

    Use processors that support Hardware-Assisted Virtualization, specifically:

    o CPU: VT-x or AMD-V
    o Memory: Intel EPT or AMD RVI
    o I/O: VT-d or AMD-Vi (optional)

    Networking

    Use NICs that support the following options:

    o Checksum offload
    o TCP Segmentation Offload (TSO)
    o Ability to handle 64-bit DMA Addresses
    o Ability to handle multiple Scatter Gather elements per TX frame
    o Jumbo Frames (JF)
    o Large Receive Offload (LRO)
    o If using 10GbE NICs:
        o NetQueue
        o Single-Port NICs should use PCIe x8 (or higher) or PCI-X 266 bus architecture
        o Dual-Port NICs should use PCIe x16 (or higher) bus architecture

    Storage

    o Use hardware that supports VMware vStorage APIs for Array Integration (VAAI)
    o Use fully redundant Storage Network (NICs, HBAs, Switches, Front-End Storage Ports, etc.)
    o Enable Read/Write Caching on Storage

    Server BIOS Settings:

    o Use latest version available
    o Enable Turbo Boost
    o Enable Hyper-Threading
    o Disable Node Interleaving (Enable NUMA)
    o Enable ALL Hardware-Assisted Virtualization Features (see CPU section above)
    o Disable Cache Prefetching Mechanisms
    o Disable unused hardware (see HOST CONFIGURATION below)

    HOST CONFIGURATION

    Disconnect and/or Disable ALL unused and unnecessary system devices, including:

    o Floppy Drives
    o COM Ports
    o LPT Ports
    o CD-ROM Drives
    o USB Adapters
    o Network Interfaces
    o Storage Controllers

    NOTE: Disabling some devices can be complicated and may cause other problems,

    so thorough testing of specific changes is recommended.

    Use separate virtual switches and physical network adapters for host management

    (VMkernel) and Virtual Machine networks.

    Use a single vSwitch to optimize internal communication between Enterprise EHR

    VMs.

    Figure 1.1: ESXi Networking

    Virtualization technology requires overhead resources to manage the VMs; therefore, leave at least 4 GB of RAM for the physical host.

    For hosts supporting Enterprise EHR VMs, set the Power Policy Option to High

    Performance.

    Figure 1.2: ESXi Power Management Settings

    For best performance and availability, the use of Host Clusters with HA and DRS is recommended. For DRS, use at least the Partially Automated setting. For production-level systems, be cautious about using the Fully Automated setting, as it may cause undesired migrations of the VMs.

    Figure 1.3: Cluster Settings

    Place the VMs in a Resource Pool with appropriate resource reservations.

    VMs should be located on optimized shared storage. Optimizations include using multiple HBAs, high-speed disks, and high-speed, uncongested data networks.

    The settings of the HBAs in your ESXi hosts may need to be adjusted to optimize

    their performance. In general, the default settings should be used unless changes

    are recommended by the documentation for your specific SAN storage and switches.

    VMware recommends changing the Disk.SchedNumReqOutstanding Setting on your

    ESXi hosts to match the Maximum Queue Depth of the HBAs. QLogic has a default

    Queue Depth of 32 and Emulex uses 30. Reference VMware knowledge base article

    1267. Evaluating your specific environment is recommended. Based on the results

    of your testing, you may see a benefit by increasing the values to 64.

    Figure 1.4: ESXi Software Advanced Settings

    VIRTUAL MACHINE CONFIGURATION

    Virtual machines (VMs) must meet normal Enterprise EHR configuration standards: OS level, Service Packs, Hot Fixes, Application versions, etc.

    The latest version of VMware Tools must be installed in the VMs. The VMware Tools

    package provides optimized Device Drivers and management features that improve

    performance and reliability.

    Disconnect and/or Disable ALL unused and unnecessary system devices, including:

    o Floppy Drives
    o COM Ports
    o LPT Ports
    o CD-ROM Drives
    o USB Adapters

    Figure 1.6: Virtual Machine BIOS

    VMs should use only version 8 virtual hardware, specifically the VMXNET 3 network adapter and the Paravirtual SCSI Controller.

    Figure 1.7: Virtual Machine Hardware

    For I/O intensive VMs, especially SQL Servers, Allscripts recommends spreading the

    disk I/O across 3 or 4 Paravirtual SCSI controllers (see above screenshot).
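
    One way to picture this spread is a simple round-robin assignment of disks to controllers. The sketch below is illustrative only: the disk names and the choice of four controllers are assumptions, and it does not call any VMware API.

```python
from collections import defaultdict


def spread_disks(disks, controller_count=4):
    """Assign disks to Paravirtual SCSI controllers in round-robin order."""
    if controller_count < 1:
        raise ValueError("controller_count must be at least 1")
    layout = defaultdict(list)
    for index, disk in enumerate(disks):
        layout[index % controller_count].append(disk)
    return dict(layout)


# Hypothetical disk names for a SQL Server VM.
layout = spread_disks(["os", "data1", "data2", "logs", "tempdb"])
for controller, assigned in sorted(layout.items()):
    print(f"SCSI controller {controller}: {', '.join(assigned)}")
```

    In practice the heaviest data and log volumes would be placed deliberately rather than by position in a list; the round-robin order only shows the idea of not stacking all disks behind one controller.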

    When a VM has more than eight vCPUs, Virtual NUMA is enabled, so make sure that the total vCPU count of the VM is a multiple of the number of cores per NUMA node on the physical server. (NOTE: Some multi-core processors have NUMA node sizes that differ from the number of cores per socket. For example, some 12-core processors have two six-core NUMA nodes per processor.)
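
    The sizing rule can be checked with simple arithmetic. The following is a minimal sketch; the function name is hypothetical and the NUMA node size must be taken from the physical server, not guessed.

```python
def vcpu_count_fits_numa(vcpus, cores_per_numa_node):
    """Return True when the vCPU count is a whole multiple of the NUMA node size."""
    if vcpus < 1 or cores_per_numa_node < 1:
        raise ValueError("counts must be positive")
    return vcpus % cores_per_numa_node == 0


# Example from the note above: a 12-core processor with two six-core NUMA nodes.
for vcpus in (10, 12, 16, 18):
    print(vcpus, vcpu_count_fits_numa(vcpus, cores_per_numa_node=6))
```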

    Figure 1.8: Virtual NUMA Settings

    In systems with under-committed resources, the ESXi CPU scheduler spreads load across all sockets by default (if NUMA is disabled). For VMs that exhibit significant data sharing between vCPUs (that is, they benefit from a shared cache), you can force the virtual CPUs to be scheduled together on the same socket by adding the following line to the VM's .vmx configuration file: sched.cpu.vsmpConsolidate=TRUE. If NUMA is enabled, the CPU scheduler restricts the vCPUs to the same socket.
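
    As a rough illustration of the .vmx change, the sketch below adds or updates one option in the text of a configuration file. It assumes the common name = "value" line layout of .vmx files; the helper name and the sample contents are hypothetical, and the VM is expected to be powered off before its file is edited.

```python
def set_vmx_option(vmx_text, key, value):
    """Return .vmx text with key set to value, replacing an existing entry if present."""
    new_line = f'{key} = "{value}"'
    lines = vmx_text.splitlines()
    for i, line in enumerate(lines):
        name = line.split("=", 1)[0].strip()
        if name.lower() == key.lower():
            lines[i] = new_line  # update the existing entry in place
            break
    else:
        lines.append(new_line)  # option not present yet, so append it
    return "\n".join(lines) + "\n"


# Hypothetical excerpt of a .vmx file.
sample = 'numvcpus = "4"\nmemsize = "16384"\n'
print(set_vmx_option(sample, "sched.cpu.vsmpConsolidate", "TRUE"))
```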

    Enterprise EHR Virtual Machines should always be configured using Thick Provision Eager Zeroed disks.

    Figure 1.9: Virtual Disk Thick Provision Eager Zeroed

    GUEST OS SETTINGS

    The following settings are recommended for optimal performance:

    o Disable Screensavers and Windows animation in ALL VMs
    o Disable IE Enhanced Security Configuration
    o Disable User Account Control
    o Disable Write Debugging Information on System Failure
    o Set Power Plan to High Performance
    o Disable Scheduled Tasks:
        o \Microsoft\Windows\Defrag\ScheduledDefrag
        o \Microsoft\Windows\Registry\RegIdleBackup
        o \Microsoft\Windows\Time Synchronization\SynchronizeTime
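
    The scheduled tasks listed above can be disabled from an elevated command prompt with the Windows schtasks utility. The sketch below only builds and prints the commands so they can be reviewed first; the helper name is hypothetical, and running the commands inside the guest is left to the administrator.

```python
TASKS_TO_DISABLE = [
    r"\Microsoft\Windows\Defrag\ScheduledDefrag",
    r"\Microsoft\Windows\Registry\RegIdleBackup",
    r"\Microsoft\Windows\Time Synchronization\SynchronizeTime",
]


def disable_task_command(task_path):
    """Build the schtasks command line that disables one scheduled task."""
    # The task path is quoted because some paths contain spaces.
    return f'schtasks /Change /TN "{task_path}" /Disable'


for task in TASKS_TO_DISABLE:
    print(disable_task_command(task))
```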

    STORAGE

    The storage requirements and design for the SQL Database Server used for Enterprise EHR Applications have numerous caveats that depend on the unique characteristics of a given customer's environment. First, a brief overview of the storage architecture supported by VMware is needed.

    The operating system, applications and user data of a virtual machine are kept in one or more virtual SCSI disks. These virtual disk files (or VMDKs) are typically maintained in a VMFS datastore connected to a physical storage subsystem: Direct Attached Storage in a host, Fibre Channel SAN, iSCSI SAN or NAS. VMware also supports the use of Raw Device Mapping (RDM), which allows the VM to have direct access to a LUN on the storage (Fibre Channel or iSCSI only).

    In vSphere 5.0, a VMFS datastore can be a maximum of 64TB in size but any single

    VMDK file can only be 2TB minus 512 bytes. RDMs in physical compatibility mode

    can be up to 64TB in size.
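
    The size limits above translate into a quick check of which disk type can hold a single volume. This is a sketch under one assumption: TB is read as a binary terabyte (2^40 bytes). The function name is hypothetical.

```python
TIB = 2 ** 40                       # assumption: "TB" means 2^40 bytes
MAX_VMDK_BYTES = 2 * TIB - 512      # 2TB minus 512 bytes (vSphere 5.0)
MAX_RDM_PHYSICAL_BYTES = 64 * TIB   # RDM in physical compatibility mode


def storage_options(required_bytes):
    """List which disk types can hold a single volume of the requested size."""
    options = []
    if required_bytes <= MAX_VMDK_BYTES:
        options.append("VMDK")
    if required_bytes <= MAX_RDM_PHYSICAL_BYTES:
        options.append("RDM (physical compatibility mode)")
    return options


print(MAX_VMDK_BYTES)
print(storage_options(1 * TIB))
print(storage_options(2 * TIB))
```

    A volume of exactly 2TB already exceeds the VMDK limit, which is one reason large SQL data volumes end up on RDMs.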

    For overall best performance, Allscripts recommends using only RDMs (physical compatibility mode) for the LUNs storing the Enterprise EHR Application SQL data, for the following reasons:

    o RDMs allow the use of SAN-based snapshots and/or copies.
    o RDMs are required if you are leveraging Microsoft Failover Clustering that needs shared volumes.
    o Individual RDMs can be up to 64TB in size, so the growth of your database environment can be easily accommodated.
    o RDMs are easier to use for migrations from physical to virtual systems.

    VMOTION

    As stated by VMware:

    Consider using a 10GbE vMotion network. Using a 10GbE network in place of a 1GbE network for vMotion will result in significant improvements in vMotion

    performance. When using very large virtual machines (for example, 64GB or

    more), consider using multiple 10GbE network adaptors for vMotion to further

    improve vMotion performance.

    When configuring resource pools, plan to leave at least 10% of the CPU capacity unreserved. CPU reservations that fully commit the capacity of the cluster can

    prevent DRS from migrating virtual machines between hosts.

    When using the multiple-network-adaptor feature, configure all the vMotion vmnics under one vSwitch and create one vMotion vmknic for each vmnic. In the

    vmknic properties, configure each vmknic to leverage a different vmnic as its

    active vmnic, with the rest marked as standby. This way, if any of the vMotion

    vmnics become disconnected or fail, vMotion will transparently switch over to one

    of the standby vmnics. When all your vmnics are functional, though, each vmknic

    will route traffic over its assigned, dedicated vmnic.
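
    Two of the recommendations quoted above lend themselves to a short worked example: the 10% unreserved CPU headroom and the active/standby layout for multiple vMotion adapters. The sketch below is illustrative only; the function names, the vmknic labels and the cluster figures are assumptions, and nothing here configures a real host.

```python
def max_cpu_reservation_mhz(capacity_mhz, unreserved_percent=10):
    """Largest total CPU reservation that leaves the given percentage unreserved."""
    if not 0 <= unreserved_percent < 100:
        raise ValueError("unreserved_percent must be between 0 and 99")
    return capacity_mhz * (100 - unreserved_percent) // 100


def vmotion_teaming_plan(vmnics):
    """Create one vmknic per vmnic, each with a different active uplink."""
    plan = {}
    for index, active in enumerate(vmnics):
        standby = [nic for nic in vmnics if nic != active]
        plan[f"vmk-vmotion-{index}"] = {"active": active, "standby": standby}
    return plan


# Hypothetical cluster: 4 hosts x 16 cores x 2600 MHz.
capacity = 4 * 16 * 2600
print(capacity, max_cpu_reservation_mhz(capacity))

# Hypothetical uplink names.
for vmknic, uplinks in vmotion_teaming_plan(["vmnic2", "vmnic3"]).items():
    print(vmknic, uplinks)
```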

    TROUBLESHOOTING

    Virtual Machine

    A couple of basic items to validate for a VM are that VMware Tools is installed and running and that the VM is part of an HA cluster.

    Figure 1.10: Virtual Machine General Settings

    The summary screen of a VM gives a basic summary of its performance, including CPU and Memory usage. If Consumed Host CPU or Active Guest Memory is sustained at a level close to the VM's configured quantity, further investigation is warranted.

    Figure 1.11: Virtual Machine Resource Consumption

    The Resource Allocation Tab is a graphical view of the VM's resource utilization.

    Figure 1.12: Virtual Machine Resource Consumption

    The Performance Tab of a VM provides real-time and historical data about the usage of all resources: CPU, Memory, Disk and Network. The default view provides a 1 Day Summary, which gives you a good overview of the VM's health. At the bottom of the default view is a 1 Day Summary of the host the VM resides on, for comparison. You should develop baseline numbers for each type of server that you manage so you can better identify abnormal values.

    NOTE: Unlike on physical servers, high CPU utilization (70% - 80%) in a virtual server is normal and desired. High Memory utilization is not an issue as long as it is not causing the Guest OS to page the memory contents; you have to balance good resource utilization with good performance.

    ESXi Host

    The summary screen of a host gives a basic summary of its performance.

    Figure 1.13: ESXi Host Summary Tab

    The Performance tab of a host is a good starting point for reviewing system

    performance given a desired time range.

    Figure 1.14: ESXi Host Performance Tab

    By using the Hardware Status Tab, the host's physical resources can be reviewed for issues.

    Figure 1.15: ESXi Host Hardware Status Tab

    DRS and/or HA Cluster

    At the cluster level, the Hosts tab gives a good summary view of the resource

    consumption.

    Figure 1.16: vSphere Cluster Hosts Tab

    If a DRS Cluster is set to Fully Automated, monitor the value of Total Migrations using vMotion. A high number may indicate a performance hit due to atypically high

    numbers of VM migrations because the cluster is attempting to balance its resources.

    Figure 1.17: vSphere Cluster Summary Tab

    Datastores

    The Datastores tab of the vSphere Datacenter can provide an overview of the space capacity and consumption. VMware's best practice recommendation is to limit each datastore to 80% utilization.
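
    The 80% guideline is easy to apply to capacity figures read from the Datastores tab. The following is a minimal sketch; the datastore names and sizes are invented for illustration.

```python
def datastores_over_limit(datastores, limit_percent=80):
    """Return (name, percent used) for datastores above the utilization limit."""
    flagged = []
    for name, (capacity_gb, used_gb) in datastores.items():
        percent_used = 100 * used_gb / capacity_gb
        if percent_used > limit_percent:
            flagged.append((name, round(percent_used, 1)))
    return flagged


# Hypothetical inventory: name -> (capacity in GB, used in GB).
inventory = {"ds-sql-01": (2048, 1800), "ds-app-01": (1024, 512)}
print(datastores_over_limit(inventory))
```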

    Figure 1.18: vSphere Datacenter Datastores Tab

    NOTE: For further troubleshooting guidance, please refer to VMware's website and documentation.

    ADVANCED TROUBLESHOOTING

    Check for Resource Pool CPU Saturation

    Select a Resource Pool; Use the Summary Tab to determine the CPU limit:

    Figure 1.19: Performance Troubleshooting Resource Pool CPU Limit

    Select the Performance Tab; Select the Advanced option; Switch view to CPU; Select

    Usage in MHz Counter; Select all CPU objects.

    Figure 1.20: Performance Troubleshooting Resource Pool CPU Saturation

    Compare the Usage in MHz value to the CPU Limit setting on the Resource Pool. If the values are close, the pool may be experiencing CPU saturation, and additional resources should be allocated to the pool.

    If the performance problem is specific to a VM in the Resource Pool, use that VM in

    the following steps. If not, repeat the steps for all the VMs in the Resource Pool.

    Select the VM; Select the Performance Tab; Select the Advanced option; Switch view

    to CPU; Select Usage in MHz Counter for the VM object.

    If the Average value is greater than 85% and peaks above 90-95%, then CPU

    Saturation is an issue.
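
    The thresholds above can be applied to exported Usage in MHz samples. The sketch below makes two assumptions that the text does not state: the percentages are taken relative to the configured CPU limit or capacity, and the lower bound of the 90-95% range is used as the peak threshold. The sample values are invented.

```python
def cpu_saturated(samples_mhz, limit_mhz, avg_threshold=85, peak_threshold=90):
    """Apply the average and peak thresholds to usage expressed as % of the limit."""
    if not samples_mhz or limit_mhz <= 0:
        raise ValueError("need at least one sample and a positive limit")
    percents = [100 * sample / limit_mhz for sample in samples_mhz]
    average = sum(percents) / len(percents)
    return average > avg_threshold and max(percents) > peak_threshold


# Sustained high usage: average 91%, peak 95%.
print(cpu_saturated([9000, 9500, 8800], limit_mhz=10000))
# A single spike with a low average is not treated as saturation.
print(cpu_saturated([4000, 9600, 3000], limit_mhz=10000))
```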

    Figure 1.21: Performance Troubleshooting VM CPU Ready

    Check for an Overloaded Storage Device

    Select a Host; Select the Performance Tab; Select the Advanced option; Switch view

    to Disk; Select Commands Terminated Counter; Select all Datastore objects.

    Any value other than zero indicates an issue with the storage device.

    Figure 1.22: Performance Troubleshooting Disk Saturation

    Using ESXTOP

    ESXTOP is a command line utility used to get real-time performance statistics of a given ESXi Host. You must connect to the host using SSH (use PuTTY or another SSH-friendly client); however, with VMware ESXi 5.0 or later, SSH is disabled by default and must be manually enabled.

    Select a host; Select the Configuration Tab; Under Software, Select Security Profile;

    Under Services, Select Properties; Select SSH, Select the Options Button; Under

    Services Commands, Select Start to enable SSH; Select OK.

    Figure 1.23: ESXi Host Configuration Enabling SSH

    Once SSH has been started, you can connect to the host and run ESXTOP.

    ESXTOP CPU

    NOTE: All ESXTOP Key commands are case sensitive.

    The starting screen for ESXTOP is the CPU utilization panel. You can press V to show only VMs instead of all processes.

    Figure 1.24: ESXTOP CPU Utilization

    Examine the PCPU UTIL(%) line for an unequal load across processor cores, with some at saturation and some remaining near idle. This would indicate that applications within the VM are not utilizing all of the cores provided to them.

    Examine the %RDY field for the percentage of time that a virtual machine was ready

    but could not get scheduled to run on a physical CPU. This value should remain

    below 5%. Anything greater indicates a problem at the host level - such as not

    enough resources available.

    Examine the %USED field for the percentage of physical CPU resources used by a

    vCPU. If the physical CPUs are running near or at full capacity then ensure that the

    CPU utilization per vCPU is less than 80%.
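
    The two CPU checks above can be written down as a small rule set. This is a sketch only: it assumes the %RDY and %USED values have already been read from ESXTOP as per-vCPU percentages, and the function name and message wording are hypothetical.

```python
def cpu_findings(ready_percent, used_percent, host_near_capacity):
    """Return warnings based on the %RDY and %USED guidance."""
    findings = []
    if ready_percent >= 5:
        findings.append("%RDY at or above 5%: check host-level resource contention")
    if host_near_capacity and used_percent >= 80:
        findings.append("%USED at or above 80% on a busy host: review vCPU load")
    return findings


print(cpu_findings(ready_percent=2.1, used_percent=65.0, host_near_capacity=False))
print(cpu_findings(ready_percent=7.4, used_percent=88.0, host_near_capacity=True))
```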

    ESXTOP Memory

    To access the Memory utilization panel press m. You can press V to show only VMs instead of all processes.

    Figure 1.25: ESXTOP Memory Utilization

    Examine the MEMSZ field for the amount of physical memory allocated to the VM.

    Examine the SZTGT field for the amount of memory the ESXi VMkernel wants to

    allocate to the VM.

    Examine the SWCUR field for the amount of memory in Megabytes currently being

    swapped. This value should always be zero to maintain optimal performance.

    ESXTOP Network

    To access the network utilization panel press n.

    Figure 1.26: ESXTOP Network Utilization

    The %DRPTX and %DRPRX fields indicate dropped packets. If dropped-packet values are consistently high, thoroughly review the network configuration of all VMs and especially the hosts.

    ESXTOP Storage Adapters

    To access the storage adapter utilization panel press d. Press f to add fields; Press j to add Error Stats.

    Figure 1.27: ESXTOP Storage Adapter Utilization

    Ideally the DAVG/cmd (device latency) and GAVG/cmd (VM latency) fields should be 5ms or less; values greater than 20ms may indicate a bottleneck at the switch or SAN. The KAVG/cmd field should always be zero; high values indicate an issue with a device driver and/or with device queue depth. Examine the FCMDs/s field for any failed commands, which may indicate queue saturation or hardware issues.
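
    The latency thresholds above can be summarized as a small classifier. The sketch assumes the values have already been read from the ESXTOP panel in milliseconds; the function names are hypothetical, and the "watch" label for readings between 5ms and 20ms is an assumption, since the text does not name that range.

```python
def latency_status(value_ms):
    """Classify a DAVG/cmd or GAVG/cmd reading in milliseconds."""
    if value_ms <= 5:
        return "ideal"
    if value_ms <= 20:
        return "watch"
    return "possible switch or SAN bottleneck"


def adapter_report(davg_ms, gavg_ms, kavg_ms):
    """Summarize the three latency fields for one storage adapter."""
    return {
        "DAVG/cmd": latency_status(davg_ms),
        "GAVG/cmd": latency_status(gavg_ms),
        "KAVG/cmd": "ok" if kavg_ms == 0 else "check driver or queue depth",
    }


print(adapter_report(davg_ms=3.2, gavg_ms=3.4, kavg_ms=0))
print(adapter_report(davg_ms=24.0, gavg_ms=27.5, kavg_ms=3.5))
```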

    ESXTOP SCSI Queue Depth

    To access the disk device utilization panel press u.

    Figure 1.28: ESXTOP Disk Device Utilization

    The ACTV field is the current commands in queue; a metric of less than 20 is

    excellent. The QUED field is commands waiting to process; any value over zero is

    unhealthy.

    ESXTOP Virtual Machine Storage

    To access the virtual machine storage utilization panel press v.

    Figure 1.29: ESXTOP Virtual Machine Storage Utilization

    Examine the LAT/rd & LAT/wr fields for values greater than 5ms which may indicate

    a disk configuration issue.

    RESOURCES

    VMware Web Site:

    http://www.vmware.com

    VMware vSphere 5.0 Documentation:

    http://www.vmware.com/support/pubs/vsphere-esxi-vcenter-server-pubs.html

    Performance Best Practices for VMware vSphere 5.0:

    http://www.vmware.com/resources/techresources/10220

    VMware vSphere vMotion Architecture, Performance and Best Practices in VMware vSphere 5:

    http://www.vmware.com/files/pdf/vmotion-perf-vsphere5.pdf

    VMware vSphere 5.0 Troubleshooting Guide:

    http://pubs.vmware.com/vsphere-50/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-501-troubleshooting-guide.pdf