TRANSCRIPT
Large, High Density VMware ESX Server Platforms
Tony Kay
Systems Virtualization Manager, Sun Microsystems Inc.
Agenda
Introduction
A Fat Node - Quick Overview
Virtualization Motivators - Today and Tomorrow
Why Fat Nodes?
Node Selection - An Overview
Conclusion
Introduction
Introduction Abstract - A Reminder
What are the advantages and considerations of using large processor count ESX Server hosts? As VI3 moves deeper into the data center and is deployed to support more critical applications with higher SLAs, 4-socket systems have become commonplace, with correspondingly higher consolidation ratios. This session investigates the implications of using larger enterprise-class 6, 8, and 16 core systems. For example, larger x86 systems may have different performance characteristics and may exhibit NUMA characteristics that need to be considered. In addition, high consolidation ratios must be accounted for when architecting to maintain SLAs.
Caveats
The Bad News
The X4600 used here as an illustrative platform was not on the ESX Server 3.0.1 HCL at the time of "going to press"

The Good News
The X4600 will be on the HCL very shortly in both Opteron Rev E & Rev F configurations. (There's the first benefit: modular design supports multiple CPU models within a single chassis life cycle...)
Sun has numerous X4600 POCs underway with major Fortune 200 companies from a wide cross section including:
• Major systems integrators
• Finance (many)
• Manufacturing (automotive, NEPs, etc.)
• ISVs and service providers
• Universities and academia
• Transportation
• Retail, etc.
A Fat Node…
Quick Introduction - SunFire x4600
SunFire x4600
Enterprise Data Management, ERP, Virtualization & Server Consolidation, HPC
Compute
4 to 16 way SMP
Up to 128GB memory

I/O
Over 12 GB/sec uni-directional I/O
6 PCI-E slots (40 lanes)
4 10/100/1000 Ethernet ports
4 SAS 2.5" disks (with RAID)

Redundancy
Dual redundant hot swap power supplies and fans

Management
Lights for all FRUs; IPMI 2.0; remote KVM/floppy/CDROM with dedicated 10/100 Ethernet
Solaris, Linux, Windows support
VMware ESX Server 3.0.1 imminently
Virtualization Motivators
Why do people Virtualize?
Please re-order to suit… but typically:
1) Server sprawl: particularly Microsoft's 1 OS instance/1 application model
2) Legacy OS and application support:
3) SLAs: consolidate yet maintain and/or enhance SLAs
4) Environmental issues: heat, power, cooling, footprint
5) Utilization: raise average platform utilization
6) Disaster Recovery
7) TCO: Support costs for aging servers
8) Simplify platforms and infrastructure
9) Flexibility: time to deploy, agile, dynamic data center
10) Security
1) Server Sprawl
Particularly, but not just limited to, Microsoft environments
1 application per OS instance (DLL Hell, scalability etc.)
Solaris & Linux less susceptible
• Also *nix offers stacked virtualization, e.g. Containers within a VM

Small ESX Server hosts will just lead to unnecessary "ESX Server Sprawl"
VM Sprawl coming...

Analysts already noting issues here
VirtualCenter and ease of creation/cloning
Open source operating systems allow "free" deployment

High consolidation ratios can help reduce past and future sprawl
If SLAs and availability can be managed and met
2) Legacy OS and Server Support
Today, for many end users, virtualization is about "legacy problems"
Supporting "old" OEs (NT 3.51, NT4, NetWare, Win 2K...)
Often light/low utilization – even on "ancient" hardware

Going forward, VMware is becoming a strategic deployment platform
"I want x% of all new x86 deployments on VMware..."

Fast time to deploy
Flexibility (cloning, encapsulation, roll back capabilities...)
Enhanced operations (backup, patching, DR)

Some IT directors, CIOs, and data center managers are looking at > 50% of "new" deployments being "virtualized". This means:

Bigger VMs, more vCPUs, more memory, more I/O
Heavier workloads, light to medium databases, messaging, hosting
More mission critical, higher SLAs
3) SLAs and HA - A Virtualization Paradox?
Pressure to enhance SLAs yet lower costs
One common objection: "8 sockets/16 cores is too big, too many VMs"
NB Spreading VMs around on "little boxes" is not a strategy for SLAs

SLAs come through well thought out methodologies and practices
• Classify workloads by SLAs (e.g. 3 categories), virtual & non-virtual
• Non-virtual can be for performance, scalability or availability etc.
• Hybrid environments may play a role – mix 2, 4 and 8 socket platforms
(Stick with same CPU steppings...)

Use clustering capabilities, both virtual and non-virtual
Solaris/Linux physical clusters
Microsoft physical to physical, physical to virtual, virtual to virtual

Use VI 3's new features
VMware HA – A building block towards application availability
VMware DRS – Familiarise yourself with affinity rules functionality
Quick Recap: VMware DRS
Dynamic and intelligent allocation of hardware resources to ensure optimal alignment between business and IT
Dynamic balancing of computing resource pools across VI3 hosts
Intelligent resource allocation based on pre-defined rules
Can be a component in maintaining SLAs and systems availability
[Diagram: resource pool balanced against business demand]
How Does VMware DRS Work
Initial Placement
Power on virtual machine in resource pool
Recommend host (prioritized list)

Dynamic Balancing
Monitor key virtual machine, pool and host metrics
Deliver entitled resources to pools and VMs
Recommend migrations (prioritized list)

Goal of VMware DRS
Balance virtual machines across ESX Server hosts within a cluster
Enforce resource policies accurately
Respect placement constraints
• Affinity and anti-affinity rules
• VMotion compatibility (CPU type, SAN and LAN connectivity)
VMware DRS Cluster Constraints
Anti-affinity rules
Run virtual machines on different hosts
Motivation: avoid resource contention

Affinity rules
Run virtual machines on the same host
Motivation: locality
Can be a component in maintaining SLAs and systems availability
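The affinity and anti-affinity rules above can be sketched as a simple validity check. This is an illustrative model only, not the VMware API; the function name and data shapes are invented.

```python
# Illustrative sketch only -- not the VMware API. A DRS-style engine
# must reject placements that break affinity (keep VMs together) or
# anti-affinity (keep VMs apart) rules before recommending a host.

def placement_valid(placement, affinity_rules, anti_affinity_rules):
    """placement maps VM name -> host name; each rule is a list of VM names."""
    # Affinity: every VM named in a rule must land on one host (locality).
    for group in affinity_rules:
        if len({placement[vm] for vm in group}) > 1:
            return False
    # Anti-affinity: no two VMs in a rule may share a host
    # (avoids resource contention and co-failure on one chassis).
    for group in anti_affinity_rules:
        hosts = [placement[vm] for vm in group]
        if len(hosts) != len(set(hosts)):
            return False
    return True
```

A real DRS cluster additionally checks VMotion compatibility (CPU type, SAN and LAN connectivity) before acting on a recommendation.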
VMware HA
Functionality
Automatic restart of VMs after ESX Server host failure

Cost effective starting point for recovery
A building block in an SLA strategy

Does not recover VM state itself
• Restarts the VM, allowing recovery
Does not recover the application
• The VM should initiate recovery
• e.g. roll backs, redo logs etc.
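The restart behaviour above can be sketched as follows. This is a toy model with a hypothetical slot-based capacity scheme, not VMware HA internals; all names are invented.

```python
# Illustrative sketch only: VMware HA restarts VMs on surviving hosts
# after a host failure; it does NOT recover in-flight VM or application
# state (the guest must run its own recovery, e.g. redo logs).
# The slot-based capacity model here is a hypothetical simplification.

def ha_failover(vms_by_host, failed_host, spare_slots):
    """Reassign VMs from failed_host to surviving hosts with free slots."""
    restarted = {}
    orphans = vms_by_host.pop(failed_host, [])
    for vm in orphans:
        # Pick the surviving host with the most free restart slots.
        target = max(spare_slots, key=spare_slots.get)
        if spare_slots[target] == 0:
            raise RuntimeError("no capacity to restart " + vm)
        spare_slots[target] -= 1
        vms_by_host.setdefault(target, []).append(vm)
        restarted[vm] = target
    return restarted
```

The failure mode at the end is the point of capacity planning: a cluster of fat nodes must reserve enough headroom to absorb one host's entire VM load.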
Why Fat Nodes?
Common Objections
Big x86 servers "don't scale"
Look at TSUBAME: 655 X4600s (10,000 Opterons)
• 7th most powerful supercomputer in the May top500
• Most powerful Opteron cluster (Cray Red Storm is a different architecture)
• Next week, at SuperComputing 06, watch out for the next "TBA"

"Large" (> 4 socket) x86 servers have poor memory & I/O characteristics
Intel SMPs relatively weak (Intel doesn't make a > 4 socket chipset)
AMD Opteron NUMA architecture – 8 socket glueless
• ESX Server 2.5 and ESX Server 3.0 are NUMA aware

Large servers represent a single point of failure
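On the NUMA point: a back-of-the-envelope model shows why a NUMA-aware scheduler such as ESX Server's matters on a glueless 8-socket Opteron. The latencies below are assumed for illustration, not measured figures.

```python
# Toy model (not ESX internals) of NUMA placement cost on a multi-socket
# Opteron: a remote memory access pays an extra HyperTransport hop.
# Latency numbers are assumptions chosen only to illustrate the effect.

LOCAL_NS, PER_HOP_NS = 60, 40  # assumed local latency and per-hop penalty

def avg_latency(access_mix):
    """access_mix maps hop count -> fraction of memory accesses."""
    return sum(frac * (LOCAL_NS + hops * PER_HOP_NS)
               for hops, frac in access_mix.items())

# A NUMA-aware scheduler keeps a VM's vCPUs near its pages...
aware = avg_latency({0: 0.9, 1: 0.1})
# ...while naive placement scatters the VM's pages across nodes.
naive = avg_latency({0: 0.25, 1: 0.5, 2: 0.25})
```

Under these assumptions the NUMA-aware mix averages 64 ns per access against 100 ns for the scattered one, which is why ESX Server's NUMA awareness matters more as socket counts grow.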
Fat Node Advantages
High density (e.g. 16-core 4U X4600 vs. 8-core 4U systems: HP DL585, IBM x440, Dell 6950)

Amortize expensive interconnects:
FC
10G Ethernet (& increasing use of TOE with iSCSI etc.)
More affordable to dual home everything (FC, 10G)

Strong I/O capability – typically high slot counts, e.g. 8 on an X4600
More robust components, e.g. large fans have higher MTBF than small
Typically more modular, longer chassis life – to be discussed
What to look for…
What to look for in a fat node
Expect modular construction
Same chassis should support at least 2 generations of CPU
• NB not speed bumps, 2 generations minimum

Density: expect at least 4 cores per RU (rack unit)
"Big" I/O
Prior to 2008 and Intel CSI, choose Opteron for 4 sockets and above
Expect roadmap to
Big I/O
Bigger workloads are coming
VMware 64-bit support
I/O virtualization

Strong I/O connectivity
> 12 GB/sec of I/O
2 x PCI-X
6 x PCI-E (40 lanes)

Allows strong multipathing, e.g.
2+ x 4Gb FC
2+ x fast networking
• e.g. 10 GigE
• IB (not currently supported for ESX Server)

Strong onboard Ethernet connectivity
4 x 10/100/1000
Modular Construction
Modular construction; the chassis has a long life (2-4 way boxes are typically disposable)
Rev E (single/dual core) -> Rev F (dual core) -> Rev "Next" (quad core)

Also allows multiple memory types; the X4600 can use DDR or DDR2
VMware DRS helps here: offline, upgrade, rejoin resource pool
(Don't VMotion – these represent new CPU steppings!)
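The offline/upgrade/rejoin pattern can be sketched as a rolling upgrade. All names here are invented; a real workflow would go through VirtualCenter, and VMs are cold-migrated (powered off and re-registered) rather than VMotioned, because the new modules are a different CPU stepping.

```python
# Hypothetical orchestration sketch of a rolling chassis upgrade:
# take one host out of the resource pool, swap its CPU modules,
# rejoin, repeat. Assumes the pool has at least two hosts so there
# is always somewhere to move VMs.

def rolling_upgrade(pool, upgrade):
    """pool maps host name -> list of VM names; upgrade(host) swaps modules."""
    for host in list(pool):
        vms = pool.pop(host)                  # take the host offline
        for vm in vms:                        # cold-migrate, not VMotion
            target = min(pool, key=lambda h: len(pool[h]))
            pool[target].append(vm)
        upgrade(host)                         # e.g. Rev E -> Rev F modules
        pool[host] = []                       # rejoin the resource pool
```

After the pass completes, every host has new modules and no VM was lost, at the cost of one restart per VM.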
Why Opteron
Above 2 sockets, Opteron has clearly superior performance
Limited benchmarking available yet (2.5 EULA etc.)
On-board memory controller, glueless DirectConnect architecture

Superior SWaP to Xeon, including Woodcrest
DDR/DDR2 has a huge heat/power advantage over FBDIMMs

Superior virtualization assistance, at least until Intel CSI (2008?)
VMware ESX Server does not take advantage of AMD-V yet...
Onboard memory controller has significant implications for virtualization
• Much of the overhead in virtualization is around memory management
• e.g. AMD-V adds Tagged Translation Lookaside Buffers
• e.g. AMD-V Device Exclusion Vectors
• AMD will add an IOMMU during 2007
• http://www.devx.com/amd/Article/32146
AMD’s Direct Connect
Combination of on-board memory controllers & HyperTransport
Glueless, i.e. no additional chipsets, up to 8 sockets (16 cores today)

HyperTransport is a parallel, point-to-point, chip-to-chip interconnect built using dual, unidirectional links
HyperTransport version 2.0 provides:
2, 4, 8, 16 or 32 data bits, at 200 MHz to 1.4 GHz DDR, in both directions
Aggregate bandwidth of 400 MB/sec - 22.4 GB/sec
Daisy chaining using HyperTransport tunnels
Asymmetric upstream / downstream connections
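The top of the bandwidth range quoted above follows from simple arithmetic: link width x clock x 2 (DDR) x 2 (dual unidirectional links). A quick check:

```python
# Worked example of the HyperTransport 2.0 aggregate-bandwidth figure:
# width (bits) -> bytes, doubled for DDR, doubled again for the two
# unidirectional links running in opposite directions.

def ht_aggregate_gb_s(width_bits, clock_ghz):
    bytes_per_transfer = width_bits / 8
    transfers_per_sec = clock_ghz * 2        # DDR: two transfers per clock
    one_direction = bytes_per_transfer * transfers_per_sec
    return one_direction * 2                 # dual unidirectional links

# Widest, fastest case: 32 bits at 1.4 GHz DDR -> 22.4 GB/sec aggregate,
# matching the top of the quoted range.
```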
HyperTransport
AMD-V, Now and Future
Starts with Socket F systems (X4600 available with Socket F modules)
AMD Rev F CPUs (Rev F does not equal Socket F!)
Virtualization enabled; ESX Server currently (2.5.x, 3.0/3.0.1) does not utilize:
• VMRUN instruction etc. (VMCB)
• Tagged Translation Lookaside Buffers
• Device Exclusion Vector (DEV)

AMD's onboard memory controller – a key feature of DirectConnect
Allows tasks to be done in hardware which Intel VT does in software
For example, both Tagged TLB lookup and DEV are not done by VT...
AMD-V In Action
Not supported today in ESX Server 3, expect in CY07?
Xen embracing AMD-V & Intel VT, as will Viridian (Microsoft)
[Diagram: hypervisor and guest VMs running on the CPU]
1) Hypervisor executes VMRUN
2) Guest runs directly on the CPU
3) World switch back to the hypervisor on privileged instructions, register access, interrupts etc.
4) Virtual memory: Tagged TLBs, DEV etc.
Conclusion
As VMware ESX Server moves into the enterprise, it's time to move onto enterprise server platforms.
Fat nodes offer many compelling advantages:

Density
Typically superior engineering (more modular, more redundancy)
High performance design (large memory footprints, "big I/O")
Hardware virtualization will create demand for powerful ESX Server hosts

VI 3 features ease the move to fat nodes
VMware DRS, VMware HA (plus intelligent use of VMotion and VirtualCenter)
Best of all, we are so confident in these machines that you can "try and buy"! Go to the X4600 page and click "Free 60 day trial"
http://www.sun.com/servers/x64/x4600/
Come see the X4600 on the Sun stand...
Still Nervous?
The X4600 has a "little" brother... the SB8000

10 x 4-socket Opteron Rev E blades
• 64GB memory per blade
• 192 Gb/sec of I/O per blade
• 2 x 8-lane PCI-E per blade
• 4 NEMs (FC, Ethernet, InfiniBand)
Rev F next
(and its already on the 3.0.1 HCL....)
Presentation Download
Please remember to complete your session evaluation form and return it to the room monitors as you exit the session
The presentation for this session can be downloaded at http://www.vmware.com/vmtn/vmworld/sessions/
Enter the following to download (case-sensitive):
Username: cbv_rep
Password: cbvfor9v9r