
Hyper-V Performance, Scale & Architecture Changes
Benjamin Armstrong, Senior Program Manager Lead, Microsoft Corporation

VIR413

Hyper-V Scale Comparison
Massive Scale in the Box

(Windows Server 2008 / Windows Server 2008 R2 / Windows Server 2012)

HW Logical Processor Support: 16 LPs / 64 LPs / 320 LPs
Physical Memory Support: 1 TB / 1 TB / 4 TB
Cluster Scale: 16 nodes, up to 1,000 VMs / 16 nodes, up to 1,000 VMs / 64 nodes, up to 8,000 VMs
Virtual Machine Processor Support: up to 4 VPs / up to 4 VPs / up to 64 VPs
VM Memory: up to 64 GB / up to 64 GB / up to 1 TB
Live Migration: yes, one at a time / yes, one at a time / yes, with no limits (as many as hardware will allow)
Live Storage Migration: no, Quick Storage Migration via SCVMM / no, Quick Storage Migration via SCVMM / yes, with no limits (as many as hardware will allow)
Servers in a Cluster: 16 / 16 / 64
VP:LP Ratio: 8:1 / 8:1 for Server, 12:1 for Client (VDI) / no limits (as many as hardware will allow)

Agenda

Supporting 320 LPs

Hypervisor Early Launch

In previous versions of Hyper-V, the OS in the parent partition booted first and then launched the hypervisor via a driver. In Windows Server 2012, the hypervisor starts first.

The hypervisor initializes the BSP, applies any microcode update needed, and enables virtualization. The OS is then booted on a virtual processor.
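For reference (not on the slides), a minimal sketch of the boot setting that controls whether the hypervisor is launched at all; the standard switch is the hypervisorlaunchtype BCD option, run from an elevated PowerShell prompt:

    # Show the current boot entry, including hypervisor settings
    bcdedit /enum {current}

    # Launch the hypervisor automatically at boot
    bcdedit /set {current} hypervisorlaunchtype Auto

    # Disable hypervisor launch (for example, to test another hypervisor)
    bcdedit /set {current} hypervisorlaunchtype Off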

Minimal Parent Hypervisor

The parent partition will run with no more than 64 parent VPs, regardless of the number of LPs present in the system. The default value is 64 and is not user-configurable.

The hypervisor continues to manage all LPs, schedule guest VPs on all processors, etc. All (most) UI and APIs that run in the parent return only what the parent sees.

Task Manager has been updated to show the full set of host logical processors.

Why was this implemented? Managing more than 64 parent VPs presents a scalability bottleneck, and beyond 64 LPs the parent should not need more VPs to handle I/O for the system.

Minimal Parent Hypervisor

8-LP machine, artificially constrained to 2 parent VPs via BCD flag

Hypervisor counters correctly show:
• The total number of system LPs
• The number of parent VPs

MSINFO32, WMI, POWERSHELL

Minimal Parent Hypervisor

8-LP machine, artificially constrained to 2 parent VPs via BCD flag

Name                      : Genuine Intel(R) CPU @ 1.80GHz
Description               : Intel64 Family 6 Model 58 Stepping 2
NumberOfCores             : 2
NumberOfLogicalProcessors : 2
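The output above is what a WMI query for Win32_Processor returns inside the parent partition in this configuration. A minimal sketch of comparing the parent's view with the hypervisor's view; the exact performance counter name is an assumption and may differ between Windows versions:

    # Parent-partition view: only the parent VPs are visible
    Get-WmiObject Win32_Processor |
        Select-Object Name, Description, NumberOfCores, NumberOfLogicalProcessors

    # Hypervisor view: the total number of logical processors in the machine
    Get-Counter -Counter '\Hyper-V Hypervisor\Logical Processors'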

Agenda

64 Virtual Processors in a VM

Scheduling Basics

The hypervisor schedules virtual processors that have code they need to run. This is simple when you can schedule virtual processors separately.

[Diagram, animated over several frames: the hypervisor scheduling individual virtual processors onto four physical cores as they have work to do.]

Multi-processor scheduling

Historically, operating systems have assumed that all of their processors are running at the same time. This is a problem for virtualization.

[Diagram, animated over several frames: trying to schedule all virtual processors of multi-processor VMs onto four physical cores at the same time.]

This doesn’t work…

Solution

Fix the guest operating system (Windows Server 2008 and later)

Schedule virtual processors independently

Agenda

1TB of memory in a VM
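Before the NUMA discussion, a minimal sketch of what these two scale points look like when configuring a VM with the Hyper-V PowerShell module; the VM name is a placeholder and the VM must be off:

    # 64 virtual processors and 1 TB of startup memory for a single VM
    Set-VMProcessor -VMName "BigVM" -Count 64
    Set-VMMemory    -VMName "BigVM" -StartupBytes 1TB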

Scaling up: Physical NUMA

NUMA (Non-uniform memory access)

Helps hosts scale up the number of cores and the amount of memory

Partitions cores and memory into “nodes”

Memory allocation and access latency depend on the memory location relative to a processor

High-performance applications detect NUMA and minimize cross-node memory access

[Diagram: host NUMA topology, with memory and processors partitioned into NUMA node 1 and NUMA node 2.]

Scaling up: Physical NUMA

This is optimal… the system is balanced:

Memory allocations and thread allocations stay within the same NUMA node

Memory is populated in each NUMA node

[Diagram: balanced host NUMA topology, with memory and processors evenly populated across NUMA nodes 1-4.]

Scaling up: Physical NUMA

This isn’t optimal… the system is imbalanced:

Memory allocations and thread allocations span different NUMA nodes

Multiple node hops

NUMA node 2 has an odd number of DIMMs

NUMA node 3 doesn’t have enough memory

NUMA node 4 has no local memory (the worst case)

[Diagram: imbalanced host NUMA topology, with memory unevenly populated across NUMA nodes 1-4.]

Scaling Up: Guest NUMA

Guest NUMA: presenting a NUMA topology within the VM

Guest operating systems & apps can make intelligent NUMA decisions about thread and memory allocation

Guest NUMA nodes are aligned with host resources

Policy driven per host – best effort, or force alignment

[Diagram: two VMs, each with vNUMA nodes A and B, aligned to host NUMA nodes 1-4.]
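A minimal sketch of inspecting the topology and setting the host-side policy with the Hyper-V PowerShell module (Windows Server 2012 cmdlets assumed):

    # Show the host's NUMA topology as Hyper-V sees it
    Get-VMHostNumaNode

    # Force alignment: do not allow VMs to span NUMA nodes
    Set-VMHost -NumaSpanningEnabled $false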

Live Migration

Live migration works for virtual machines at any scale point
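A minimal sketch of the related host settings and cmdlets (host and VM names are placeholders, and the limits shown are arbitrary examples):

    # Raise the number of simultaneous live migrations and storage migrations
    Set-VMHost -MaximumVirtualMachineMigrations 4 -MaximumStorageMigrations 4

    # Live-migrate a running VM, and move a VM's storage while it runs
    Move-VM        -Name "BigVM" -DestinationHost "Host2"
    Move-VMStorage -VMName "BigVM" -DestinationStoragePath "D:\VMs\BigVM"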

Live Migration

Faster I/O

Agenda

Single-Root I/O Virtualization (SR-IOV)

Reduces latency of the network path
Reduces CPU utilization for processing network traffic
Increases throughput
Direct device assignment to virtual machines without compromising flexibility
Supports Live Migration

[Diagram: network I/O path without SR-IOV (physical NIC → root partition → Hyper-V switch with routing, VLAN filtering and data copy → VMBus → virtual NIC in the VM's network stack) compared with SR-IOV (a Virtual Function on the SR-IOV physical NIC assigned directly to the VM).]

SR-IOV Enabling & Live Migration

SR-IOV enabling: enable IOV (a VM NIC property); a Virtual Function is “assigned”, a team is automatically created, and traffic flows through the VF.

Live migration: break the team, remove the VF from the VM, and migrate as normal.

Post-migration: turn on IOV and reassign a Virtual Function, assuming resources are available.

The VM has connectivity even if:
the switch is not in IOV mode
an IOV physical NIC is not present
the NIC vendor is different
the NIC firmware is different

[Diagram: source and destination hosts, each with an SR-IOV physical NIC and a software switch in IOV mode; inside the VM a “team” pairs the software NIC with the Virtual Function, and the software path is not used while the VF is assigned.]
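A minimal sketch of enabling SR-IOV end to end with the Hyper-V PowerShell module; the switch, adapter and VM names are placeholders, and the NIC, firmware and chipset must all support IOV:

    # Create a virtual switch in IOV mode (IOV can only be enabled at creation time)
    New-VMSwitch -Name "IOV-Switch" -NetAdapterName "Ethernet 1" -EnableIov $true

    # Request a Virtual Function for the VM's network adapter
    Set-VMNetworkAdapter -VMName "WebVM" -IovWeight 100

    # Check whether the IOV settings took effect
    Get-VMNetworkAdapter -VMName "WebVM" | Format-List Name, IovWeight, Status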

The New Default Format for Virtual Hard Disks

VHDX

Larger virtual disks: up to 64 TB

Enhanced performance: MB alignment, large sector support, larger block sizes

Enhanced resiliency: internal log

User-defined metadata: embed custom metadata
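A minimal sketch of creating and converting to the new format (paths are placeholders):

    # Create a dynamically expanding VHDX at the new 64 TB maximum size
    New-VHD -Path "D:\VHDs\data.vhdx" -SizeBytes 64TB -Dynamic

    # Convert an existing VHD to VHDX
    Convert-VHD -Path "D:\VHDs\old.vhd" -DestinationPath "D:\VHDs\old.vhdx"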

VHDX Performance - 32KB Random Writes

[Chart: IOPS at queue depth 16 for PassThru, Fixed, Dynamic and Differencing configurations, comparing Disk, VHD and VHDX; the y-axis runs from 125,000 to 160,000 IOPS, with two 10% deltas annotated.]

VHDX Performance - 1MB Sequential Writes

[Chart: throughput in MB/s at queue depth 16 for PassThru, Fixed, Dynamic and Differencing configurations, comparing Disk, VHD and VHDX; the y-axis runs from 0 to 1,800 MB/s, with two 25% deltas annotated.]

IO Scaling

[Diagram: Hyper-V host, with the parent partition's virtual storage stack and VHD stack serving the devices attached to a VM.]

IO throughput was limited by:
1 channel per VM
a fixed VP for IO interrupt handling
a 256 queue depth per SCSI controller, shared by all attached devices

Windows Server 2012:
1 channel per 16 VPs, per SCSI controller
a 256 queue depth per device, per SCSI controller
IO interrupt handling distributed amongst VPs
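A minimal sketch of spreading a VM's data disks across SCSI controllers to take advantage of the per-controller channels and per-device queues; the VM name and paths are placeholders:

    # Add a second SCSI controller to the VM and attach a data disk to it
    Add-VMScsiController -VMName "SqlVM"
    Add-VMHardDiskDrive -VMName "SqlVM" -ControllerType SCSI -ControllerNumber 1 -Path "D:\VHDs\data1.vhdx"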

Offloaded Data Transfer (ODX)

Traditional data copy model:
The server issues a read request to the SAN
Data is read into memory
Data is written from memory back to the SAN

Problems:
Increased CPU and memory utilization
Increased storage traffic
Inefficient for the SAN

[Diagram: Hyper-V host with its VHD stack copying data between LUN1 and LUN2 on an external storage array through host memory.]

Offloaded Data Transfer (ODX)

Offload-enabled data copy model:
The server issues an offload read request to the SAN
The SAN returns a token representing the request
The server issues a write request to the SAN using the token
The SAN completes the data copy internally
The SAN confirms the data was copied

Reduced maintenance time: merge, mirror, and VHD/VHDX creation
Increased workload performance: VMs are fully ODX aware and enabled

[Diagram: Hyper-V host with its VHD stack passing a token to the external storage array, which copies data between LUN1 and LUN2 internally.]

Agenda

Results

Hyper-V Performance Testing

Virtualization Overhead with active SQL

                          32 LP/VP             64 LP/VP
                          Native    Hyper-V    Native    Hyper-V
Throughput                960       840        1589      1496
CPU Utilization           97.4%     98.6%      79%       86.8%
Throughput Loss                     12.5%                6%
Path Length Overhead                15.7%                16.9%

Related Content

VIR312: What's New in Windows Server 2012 Hyper-V, Part 1

VIR315: What's New in Windows Server 2012 Hyper-V, Part 2

VIR321: Enabling Disaster Recovery using Hyper-V Replica

VIR314: WS2012 Hyper-V Live Migration and Live Storage Migration

Find Me Later At @VirtualPCGuy

© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.