TRANSCRIPT
Large, High Density VMware ESX Server Platforms
Tony Kay
Systems Virtualization Manager, Sun Microsystems Inc.
Agenda
Introduction
A Fat Node - Quick Overview
Virtualization Motivators - Today and Tomorrow
Why Fat Nodes?
Node Selection - An Overview
Conclusion
Introduction
Introduction Abstract - A Reminder
What are the advantages and considerations of using large processor count ESX Server hosts? As VI3 moves deeper into the data center and is deployed to support more critical applications with higher SLAs, 4-socket systems have become commonplace, with correspondingly higher consolidation ratios. This session investigates the implications of using larger enterprise-class 6, 8, and 16 core systems. For example, larger x86 systems may have different performance characteristics and may exhibit NUMA characteristics that need to be considered. In addition, high consolidation ratios must be accounted for when architecting to maintain SLAs.
Caveats
The Bad News
The X4600 used here as an illustrative platform was not on the ESX Server 3.0.1 HCL at the time of "going to press"

The Good News
The X4600 will be on the HCL very shortly in both Opteron Rev E & Rev F configurations. (There's the first benefit: modular design supports multiple CPU models within a single chassis life cycle...)
Sun has numerous X4600 POCs underway with major Fortune 200 companies from a wide cross section including:
• Major systems integrators
• Finance (many)
• Manufacturing (automotive, NEPs, etc.)
• ISVs and service providers
• Universities and academia
• Transportation
• Retail, etc.
A Fat Node…
Quick Introduction - SunFire x4600
SunFire x4600
Enterprise Data Management, ERP, Virtualization & Server Consolidation, HPC
Compute
4 to 16 way SMP
Up to 128GB memory

I/O
Over 12 GB/sec uni-directional I/O
6 PCI-E slots (40 lanes)
4 10/100/1000 Ethernet ports
4 SAS 2.5" disks (with RAID)

Redundancy
Dual redundant hot swap power supplies and fans

Management
Lights for all FRUs; IPMI 2.0; remote KVM/floppy/CDROM with dedicated 10/100 Ethernet
Solaris, Linux, Windows support
VMware ESX Server 3.0.1 imminently
Virtualization Motivators
Why do people Virtualize?
Please re-order to suit… but typically:
1) Server sprawl: particularly Microsoft's 1 OS instance/1 application model
2) Legacy OS and application support:
3) SLAs: consolidate yet maintain and/or enhance SLAs
4) Environmental issues: heat, power, cooling, footprint
5) Utilization: raise average platform utilization
6) Disaster Recovery
7) TCO: Support costs for aging servers
8) Simplify platforms and infrastructure
9) Flexibility: time to deploy, agile, dynamic data center
10) Security
1) Server Sprawl
Particularly, but not just limited to, Microsoft environments
1 application per OS instance (DLL Hell, scalability etc.)
Solaris & Linux less susceptible
• Also *nix offers stacked virtualization, e.g. Containers within a VM

Small ESX Server hosts will just lead to unnecessary "ESX Server Sprawl"
VM Sprawl coming...

Analysts already noting issues here
VirtualCenter and ease of creation/cloning
Open source operating systems allow "free" deployment

High consolidation ratios can help reduce past and future sprawl
If SLAs and availability can be managed and met
2) Legacy OS and Server Support
Today, for many end users, virtualization is about "legacy problems"
Supporting "old" OEs (NT 3.51, NT4, NetWare, Win 2K...)
Often light/low utilization – even on "ancient" hardware

Going forward, VMware is becoming a strategic deployment platform
"I want x% of all new x86 deployments on VMware..."

Fast time to deploy
Flexibility (cloning, encapsulation, roll back capabilities...)
Enhanced operations (backup, patching, DR)

Some IT directors, CIOs, and data center managers are looking at > 50% of "new" deployments being "virtualized". This means:

Bigger VMs, more vCPUs, more memory, more I/O
Heavier workloads, light to medium databases, messaging, hosting
More mission critical, higher SLAs
3) SLAs and HA - A Virtualization Paradox?
Pressure to enhance SLAs yet lower costs
One common objection: "8 sockets/16 cores is too big, too many VMs"
NB Spreading VMs around on "little boxes" is not a strategy for SLAs

SLAs come through well thought out methodologies and practices
• Classify workloads by SLAs (e.g. 3 categories), virtual & non-virtual
• Non-virtual can be for performance, scalability or availability etc.
• Hybrid environments may play a role – mix 2, 4 and 8 socket platforms
(Stick with same CPU steppings...)

Use clustering capabilities, both virtual and non-virtual
Solaris/Linux physical clusters
Microsoft physical to physical, physical to virtual, virtual to virtual

Use VI 3's new features
VMware HA – A building block towards application availability
VMware DRS – Familiarise yourself with affinity rules functionality
Quick Recap: VMware DRS
Dynamic and intelligent allocation of hardware resources to ensure optimal alignment between business and IT
Dynamic balancing of computing resource pools across VI3 hosts
Intelligent resource allocation based on pre-defined rules
Can be a component in maintaining SLAs and systems availability
[Diagram: resource pool balanced against business demand]
How Does VMware DRS Work
Initial Placement
Power on virtual machine in resource pool
Recommend host (prioritized list)

Dynamic Balancing
Monitor key virtual machine, pool and host metrics
Deliver entitled resources to pools and VMs
Recommend migrations (prioritized list)

Goal of VMware DRS
Balance virtual machines across ESX Server hosts within a cluster
Enforce resource policies accurately
Respect placement constraints
• Affinity and anti-affinity rules
• VMotion compatibility (CPU type, SAN and LAN connectivity)
VMware DRS Cluster Constraints
Anti-affinity rules
Run virtual machines on different hosts
Motivation: avoid resource contention

Affinity rules
Run virtual machines on the same host
Motivation: locality
Can be a component in maintaining SLAs and systems availability
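The affinity and anti-affinity rules above can be sketched as a simple validity check. This is an illustrative model only, not the VMware API; the function name and data shapes are invented.

```python
# Illustrative sketch only -- not the VMware API. A DRS-style engine
# must reject placements that break affinity (keep VMs together) or
# anti-affinity (keep VMs apart) rules before recommending a host.

def placement_valid(placement, affinity_rules, anti_affinity_rules):
    """placement maps VM name -> host name; each rule is a list of VM names."""
    # Affinity: every VM named in a rule must land on one host (locality).
    for group in affinity_rules:
        if len({placement[vm] for vm in group}) > 1:
            return False
    # Anti-affinity: no two VMs in a rule may share a host
    # (avoids resource contention and co-failure on one chassis).
    for group in anti_affinity_rules:
        hosts = [placement[vm] for vm in group]
        if len(hosts) != len(set(hosts)):
            return False
    return True
```

A real DRS cluster additionally checks VMotion compatibility (CPU type, SAN and LAN connectivity) before acting on a recommendation.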
VMware HA
Functionality
Automatic restart of VMs after ESX Server host failure

Cost effective starting point for recovery
A building block in an SLA strategy

Does not recover VM state itself
• Restarts the VM, allowing recovery
Does not recover the application
• The VM should initiate recovery
• e.g. roll backs, redo logs etc.
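The restart behaviour above can be sketched as follows. This is a toy model with a hypothetical slot-based capacity scheme, not VMware HA internals; all names are invented.

```python
# Illustrative sketch only: VMware HA restarts VMs on surviving hosts
# after a host failure; it does NOT recover in-flight VM or application
# state (the guest must run its own recovery, e.g. redo logs).
# The slot-based capacity model here is a hypothetical simplification.

def ha_failover(vms_by_host, failed_host, spare_slots):
    """Reassign VMs from failed_host to surviving hosts with free slots."""
    restarted = {}
    orphans = vms_by_host.pop(failed_host, [])
    for vm in orphans:
        # Pick the surviving host with the most free restart slots.
        target = max(spare_slots, key=spare_slots.get)
        if spare_slots[target] == 0:
            raise RuntimeError("no capacity to restart " + vm)
        spare_slots[target] -= 1
        vms_by_host.setdefault(target, []).append(vm)
        restarted[vm] = target
    return restarted
```

The failure mode at the end is the point of capacity planning: a cluster of fat nodes must reserve enough headroom to absorb one host's entire VM load.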
Why Fat Nodes?
Common Objections
Big x86 servers "don't scale"
Look at TSUBAME: 655 X4600s (10,000 Opterons)
• 7th most powerful supercomputer in the May top500
• Most powerful Opteron cluster (Cray Red Storm is a different architecture)
• Next week, at SuperComputing 06, watch out for the next "TBA"

"Large" (> 4 socket) x86 servers have poor memory & I/O characteristics
Intel SMPs relatively weak (Intel doesn't make a > 4 socket chipset)
AMD Opteron NUMA architecture – 8 socket glueless
• ESX Server 2.5 and ESX Server 3.0 are NUMA aware

Large servers represent a single point of failure
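On the NUMA point: a back-of-the-envelope model shows why a NUMA-aware scheduler such as ESX Server's matters on a glueless 8-socket Opteron. The latencies below are assumed for illustration, not measured figures.

```python
# Toy model (not ESX internals) of NUMA placement cost on a multi-socket
# Opteron: a remote memory access pays an extra HyperTransport hop.
# Latency numbers are assumptions chosen only to illustrate the effect.

LOCAL_NS, PER_HOP_NS = 60, 40  # assumed local latency and per-hop penalty

def avg_latency(access_mix):
    """access_mix maps hop count -> fraction of memory accesses."""
    return sum(frac * (LOCAL_NS + hops * PER_HOP_NS)
               for hops, frac in access_mix.items())

# A NUMA-aware scheduler keeps a VM's vCPUs near its pages...
aware = avg_latency({0: 0.9, 1: 0.1})
# ...while naive placement scatters the VM's pages across nodes.
naive = avg_latency({0: 0.25, 1: 0.5, 2: 0.25})
```

Under these assumptions the NUMA-aware mix averages 64 ns per access against 100 ns for the scattered one, which is why ESX Server's NUMA awareness matters more as socket counts grow.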
Fat Node Advantages
High density (e.g. 16-core 4U X4600 vs. 8-core 4U systems: HP DL585, IBM x440, Dell 6950)

Amortize expensive interconnects:
FC
10G Ethernet (& increasing use of TOE with iSCSI etc.)
More affordable to dual home everything (FC, 10G)

Strong I/O capability – typically high slot counts, e.g. 8 on an X4600
More robust components, e.g. large fans have higher MTBF than small
Typically more modular, longer chassis life – to be discussed
What to look for…
What to look for in a fat node
Expect modular construction
Same chassis should support at least 2 generations of CPU
• NB not speed bumps, 2 generations minimum

Density: expect at least 4 cores per RU (rack unit)
"Big" I/O
Prior to 2008 and Intel CSI, choose Opteron for 4 sockets and above
Expect roadmap to
Big I/O
Bigger workloads are coming
VMware 64-bit support
I/O virtualization

Strong I/O connectivity
> 12 GB/sec of I/O
2 x PCI-X
6 x PCI-E (40 lanes)

Allows strong multipathing, e.g.
2+ x 4Gb FC
2+ x fast networking
• e.g. 10 GigE
• IB (not currently supported for ESX Server)

Strong onboard Ethernet connectivity
4 x 10/100/1000
Modular Construction
Modular construction; the chassis has a long life (2-4 way boxes are typically disposable)
Rev E (single/dual core) -> Rev F (dual core) -> Rev "Next" (quad core)

Also allows multiple memory types; the X4600 can use DDR or DDR2
VMware DRS helps here: offline, upgrade, rejoin resource pool
(Don't VMotion – these represent new CPU steppings!)
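The offline/upgrade/rejoin pattern can be sketched as a rolling upgrade. All names here are invented; a real workflow would go through VirtualCenter, and VMs are cold-migrated (powered off and re-registered) rather than VMotioned, because the new modules are a different CPU stepping.

```python
# Hypothetical orchestration sketch of a rolling chassis upgrade:
# take one host out of the resource pool, swap its CPU modules,
# rejoin, repeat. Assumes the pool has at least two hosts so there
# is always somewhere to move VMs.

def rolling_upgrade(pool, upgrade):
    """pool maps host name -> list of VM names; upgrade(host) swaps modules."""
    for host in list(pool):
        vms = pool.pop(host)                  # take the host offline
        for vm in vms:                        # cold-migrate, not VMotion
            target = min(pool, key=lambda h: len(pool[h]))
            pool[target].append(vm)
        upgrade(host)                         # e.g. Rev E -> Rev F modules
        pool[host] = []                       # rejoin the resource pool
```

After the pass completes, every host has new modules and no VM was lost, at the cost of one restart per VM.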
Why Opteron
Above 2 sockets, Opteron has clearly superior performance
Limited benchmarking available yet (2.5 EULA etc.)
On-board memory controller, glueless DirectConnect architecture

Superior SWaP to Xeon, including Woodcrest
DDR/DDR2 has a huge heat/power advantage over FBDIMMs

Superior virtualization assistance, at least until Intel CSI (2008?)
VMware ESX Server does not take advantage of AMD-V yet...
Onboard memory controller has significant implications for virtualization
• Much of the overhead in virtualization is around memory management
• e.g. AMD-V adds Tagged Translation Lookaside Buffers
• e.g. AMD-V Device Exclusion Vectors
• AMD will add an IOMMU during 2007
• http://www.devx.com/amd/Article/32146
AMD’s Direct Connect
Combination of on-board memory controllers & HyperTransport
Glueless, i.e. no additional chipsets, up to 8 sockets (16 cores today)

HyperTransport is a parallel, point-to-point, chip-to-chip interconnect built using dual, unidirectional links
HyperTransport version 2.0 provides:
2, 4, 8, 16 or 32 data bits, at 200 MHz to 1.4 GHz DDR, in both directions
Aggregate bandwidth of 400 MB/sec - 22.4 GB/sec
Daisy chaining using HyperTransport tunnels
Asymmetric upstream / downstream connections
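The top of the bandwidth range quoted above follows from simple arithmetic: link width x clock x 2 (DDR) x 2 (dual unidirectional links). A quick check:

```python
# Worked example of the HyperTransport 2.0 aggregate-bandwidth figure:
# width (bits) -> bytes, doubled for DDR, doubled again for the two
# unidirectional links running in opposite directions.

def ht_aggregate_gb_s(width_bits, clock_ghz):
    bytes_per_transfer = width_bits / 8
    transfers_per_sec = clock_ghz * 2        # DDR: two transfers per clock
    one_direction = bytes_per_transfer * transfers_per_sec
    return one_direction * 2                 # dual unidirectional links

# Widest, fastest case: 32 bits at 1.4 GHz DDR -> 22.4 GB/sec aggregate,
# matching the top of the quoted range.
```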
HyperTransport
AMD-V, Now and Future
Starts with Socket F systems (X4600 available with Socket F modules)
AMD Rev F CPUs (Rev F does not equal Socket F!)
Virtualization enabled; ESX Server currently (2.5.x, 3.0/3.0.1) does not utilize:
• VMRUN instruction etc. (VMCB)
• Tagged Translation Lookaside Buffers
• Device Exclusion Vector (DEV)

AMD's onboard memory controller – a key feature of DirectConnect
Allows tasks to be done in hardware which Intel VT does in software
For example, both Tagged TLB lookup and DEV are not done by VT...
AMD-V In Action
Not supported today in ESX Server 3, expect in CY07?
Xen embracing AMD-V & Intel VT, as will Viridian (Microsoft)
[Diagram: hypervisor and guest VMs running on the CPU]
1) Hypervisor executes VMRUN
2) Guest runs directly on the CPU
3) World switch back to the hypervisor on privileged instructions, register access, interrupts etc.
4) Virtual memory: Tagged TLBs, DEV etc.
Conclusion
As VMware ESX Server moves into the enterprise, it's time to move onto enterprise server platforms.
Fat nodes offer many compelling advantages:

Density
Typically superior engineering (more modular, more redundancy)
High performance design (large memory footprints, "big I/O")
Hardware virtualization will create demand for powerful ESX Server hosts

VI 3 features ease the move to fat nodes
VMware DRS, VMware HA (plus intelligent use of VMotion and VirtualCenter)
Best of all, we are so confident in these machines that you can "try and buy"! Go to the X4600 page and click "Free 60 day trial"
http://www.sun.com/servers/x64/x4600/
Come see the X4600 on the Sun stand...
Still Nervous?
The X4600 has a "little" brother... the SB8000

10 x 4-socket Opteron Rev E blades
• 64GB memory per blade
• 192 Gb/sec of I/O per blade
• 2 x 8-lane PCI-E per blade
• 4 NEMs (FC, Ethernet, InfiniBand)
Rev F next
(and its already on the 3.0.1 HCL....)
Presentation Download
Please remember to complete your session evaluation form and return it to the room monitors as you exit the session
The presentation for this session can be downloaded at http://www.vmware.com/vmtn/vmworld/sessions/
Enter the following to download (case-sensitive):
Username: cbv_rep
Password: cbvfor9v9r