
Windows Virtualization Best Practices And Future Hardware Directions

Benjamin Armstrong
Program Manager
Virtualization
Microsoft Corporation

David Wooten
Hardware Architect
System Integrity Group
Microsoft Corporation

Goals

After this session, you will

Better understand how a Microsoft Windows virtualization virtual machine (VM) environment differs from a physical machine

Know what to do to ensure that your software works well within a VM

Agenda

Virtual machine hardware

Virtualization impacts on

Processor

Storage

Networking

Video

Understanding isolation

Development opportunities

Hardware Equivalency

Virtual machines (VMs) should aim to achieve a high level of hardware equivalency

Most software solutions ‘just work’

Not always possible to have 100% equivalency

Awareness of differences in the VM environment can help you to deliver a better solution for your customers on a virtual platform

VSPs And VSCs

Windows virtualization will provide a set of core VSPs and VSCs plus emulated hardware on supported platforms

Core VSP/VSCs will be included for storage, networking, input and video

You cannot modify the core VSP/VSCs

Emulated Hardware

The initial release of Windows virtualization will always expose a limited set of emulated hardware

S3 Trio 64 Video card

DEC 21140 Network card

Etc.

It is possible to reconfigure the emulated hardware

It is not possible to change the type of hardware being emulated

Processor Topology

Changing processor type and topology inside of VMs under Windows virtualization is possible

Processor changes require a cold boot of the VM

Do not make assumptions that

The number of processors won’t change

The core-to-processor ratio won’t change

The processor type won’t change

Hot add of virtual processors planned for Windows Server 2003 and Windows Server codenamed “Longhorn” guest operating system

Each VM is a single NUMA node
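
One way to avoid baking in a processor count is to ask the operating system each time a pool is sized, rather than storing a number at install time or in configuration. The following is a minimal sketch of that idea, not a prescribed pattern; the helper names `current_worker_count` and `run_batch` are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def current_worker_count() -> int:
    """Return a worker count based on the processors visible right now."""
    # os.cpu_count() may return None when the count cannot be determined.
    return max(1, os.cpu_count() or 1)


def run_batch(tasks):
    """Run callables with a pool sized at call time, not at install time."""
    with ThreadPoolExecutor(max_workers=current_worker_count()) as pool:
        return list(pool.map(lambda task: task(), tasks))


results = run_batch([lambda: 1 + 1, lambda: 2 * 3])
```

Because the count is read when the batch starts, a VM that is cold booted with a different topology is picked up without any configuration change.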

Processor Scheduling

Each virtual processor “believes” that it has 100% of its physical processor resources and that time is accurate

This is not always true

Physical processors can be oversubscribed

Resource limits can be configured

Hypervisor is responsible for scheduling of virtual processors

High-precision timing inside of VMs is usually, but not always, accurate
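
Code that infers elapsed time from loop iterations or calibrated delays can misbehave when a virtual processor is descheduled. A hedged sketch of the alternative is to bound waits with the operating system's monotonic clock; `wait_until` is a hypothetical helper, and the sketch assumes the guest clock is kept reasonably accurate.

```python
import time


def wait_until(predicate, timeout_s: float, poll_s: float = 0.01) -> bool:
    """Poll predicate until it is true or timeout_s of monotonic time passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            # The deadline is based on the clock, not on how many
            # iterations this loop happened to get through.
            return False
        time.sleep(poll_s)
```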

Processor

User-mode code

Mostly, no noticeable change to user-mode code

Use CPUID to determine what is available

Processor features might be subset of physical machine

Do not assume all processors are always running at the same time

Affects parallel execution code
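
For parallel code, one consequence is that a thread spinning while it waits for another thread can burn its time slice if the other virtual processor is not currently scheduled. The sketch below illustrates the blocking alternative with a standard `threading.Event`; the function name `run_handoff` and the 5 second timeout are illustrative assumptions.

```python
import threading


def run_handoff(value):
    """Pass a value between threads using a blocking wait, not a spin loop."""
    ready = threading.Event()
    box = []

    def producer():
        box.append(value)
        ready.set()

    worker = threading.Thread(target=producer)
    worker.start()
    # Block in the OS instead of spinning; the waiting thread gives up
    # its processor until the producer signals.
    if not ready.wait(timeout=5.0):
        raise TimeoutError("producer did not signal in time")
    worker.join()
    return box[0]
```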

Processor

Kernel-mode code

Don’t access processor structures directly (CRs, DRs, MSRs, PMC)

This is very expensive

Don’t use CPUID as a synchronizing instruction

Use fences instead

Don’t assume CLI/STI gives accurate timing

Interrupts will still happen

Don’t use RDTSC accesses for timing

This is highly volatile

Don't rely on processor performance counters

Counters don't work outside of the parent partition

Storage

Storage is completely encapsulated and the VM is not aware of this

Unless you are using pass-through storage

Do not assume performance characteristics of storage devices

Do not assume that CDs are slow (ISOs are fast)

Do not assume that hard disks are fast (might be on a network)

Do not assume that floppy disks are slow

Emulated storage controllers are

Intel 440BX controller

AIC 7870 SCSI controller

Persistency Of Storage

Technologies like differencing disks and snapshots mean that traditionally persistent storage might not be persistent any more

Your software may find itself arbitrarily moved back to an older point in time

Patches may be applied and then ‘undone’

Changes to storage persistency are always user initiated

Networking

Routing through host network adapter performed at OSI Layer 2

Host network security software provides no protection

Unless the host is manually configured to route the VM’s network traffic at a higher OSI Layer

Windows virtualization will only support 802.3 networking devices

Networking

Each virtual network card has its own separate MAC address

This will be changed in the event of a MAC address conflict

MAC addresses can be configured to be static, but default to dynamic

Emulated network controller is

DEC/Intel 21140 Network controller

Performance not limited to 100 Mbit

Video

In Windows Server virtualization video capabilities will be targeted at server scenarios

2D video support only

All video will be remoted over RDP

Emulated video controller

S3 Trio 64 Video controller

VGA and Text Mode performance is not optimized

Non-planar video modes perform best

Isolation

By default, VMs are isolated entities

Child partitions are not able to access memory in any other partitions

Child partitions are not able to crash any other partitions

Only methods for inter-virtual machine communication are

Traditional networking

Hypercalls

Integration Components

Integration components operate over VMBus to provide basic integration features

Time synchronization

Operating System (OS) shutdown

Registry updating

OS heartbeat

OS identification

Development Opportunities

VM neutral development

Software that is not dependent on specific hardware will continue to function inside of VMs

External VM management

Software can utilize WMI interfaces to control and monitor VMs

Integrated VM solutions

VM-aware solutions can be developed that provide enhanced features for users of VMs

Virtualization Hardware Futures

David Wooten
Hardware Architect
System Integrity Group
Microsoft

david.wooten @ microsoft.com

Future Technologies

The topics in this presentation relate to hardware to support possible features in version 2 of the Windows hypervisor (HV2)

The hardware “requirements” discussed are expected to be needed to support the features of HV2 but future events may change these requirements

Topics

“Execution Environment” and why it needs protection

Protections in Root Complex with DMA Remapping

Protections in Fabric to Regulate Routing

Roots of Trust and the SMM Conundrum

The Environment

The software running on a computer has control of the hardware on which it is running – its “execution environment”

If that software is running on a virtual computer, it is important to preserve the illusion of control over the virtualized execution environment

Prevents unexpected behavior

Preserves meaning of local attestation used for sealing

Preservation Of Environment

The preservation of the apparent execution environment of a virtual computer in a partition is the responsibility of the hypervisor

The hypervisor must be able to enforce isolation between partitions to ensure adequate fidelity of the virtualization

The main isolation tool for the hypervisor is memory management

Memory virtualization by the MMU (and associated registers) can prevent inappropriate changes to the memory of another partition through direct access by the CPU

Memory virtualization extensions are needed in IO hardware to complete the memory protections

The IO Problem

[Figure: The IO Problem. Diagram labels: Hypervisor, Partition 1, Partition 2, IO, MMU, Memory; addresses 100, 4200, FA00; legend: Address, Control.]

Evolution Of IO Protection

In initial implementation of Windows virtualization, the IO mapping problem is finessed by

“Assign” all IO devices to the Parent partition

Give Parent partition a special mapping of Guest Physical = System Physical

Place a lot of “trust” in the Parent

In HV2, the Parent may not have special rights to see into other partitions

Other partition may be a “peer” to the Parent

In HV2, devices may be assigned to partitions other than the Parent

Partitions doing IO may not have the same level of assumed “trust” as the V1 Parent

Mechanisms For IO Protection

Main new mechanism is DMA remapping (DMAr)

Adds address translation to DMA

Lets hypervisor limit device access to memory

PCI Routing Control and ID Checking

Restrict peer-to-peer (P2P) access so that devices can’t do P2P with untranslated addresses

Check ID of requester in switches

DMA Remapping

[Figure: DMA Remapping. Diagram labels: Hypervisor, Partition 1, Partition 2, IO, DMAr, MMU, Memory; addresses 100, 4200, FA00.]

HV2 DMAr Requirements

Chipset must support either IOMMU (AMD) or VT-d (Intel)

DMAr is not processor specific so IOMMU can be used with Intel processor and VT-d can be used with AMD

All IO devices must access memory through DMAr

Chipset may have more than one DMAr unit but they must use the same type of programming interface
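
The idea behind DMA remapping can be sketched as a per-device translation table that the hypervisor programs: a device's address is looked up under its requester ID, and anything unmapped is rejected. This is a toy model under stated assumptions, not the IOMMU or VT-d programming interface; the class `DmaRemapper`, the page size, and the device names are all hypothetical, and the addresses are illustrative only.

```python
PAGE_SIZE = 0x100  # toy page size, chosen only to keep the numbers small


class DmaRemapper:
    """Hypothetical model of a DMA remapping unit (not a real chipset API)."""

    def __init__(self):
        # requester ID (bus, device, function) -> {device page: system page}
        self._tables = {}

    def map_page(self, requester_id, device_page, system_page):
        """Hypervisor grants one device access to one system page."""
        self._tables.setdefault(requester_id, {})[device_page] = system_page

    def translate(self, requester_id, device_addr):
        """Translate a device address, rejecting anything not mapped."""
        table = self._tables.get(requester_id, {})
        page, offset = divmod(device_addr, PAGE_SIZE)
        if page not in table:
            raise PermissionError(
                f"device {requester_id} has no mapping for {device_addr:#x}"
            )
        return table[page] * PAGE_SIZE + offset


dmar = DmaRemapper()
nic = (1, 0, 0)   # assigned to one partition
disk = (2, 0, 0)  # assigned to another partition
dmar.map_page(nic, 0x1, 0x42)
dmar.map_page(disk, 0x1, 0xFA)
```

In this model both devices can use the same device address 0x100, yet they reach different system memory (0x4200 and 0xFA00), and neither can reach a page the hypervisor did not map for it.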

PCI Routing Control

PCI devices are accessed using System Physical Addresses (SPA)

Drivers will program devices with Device Physical Addresses (DPA) – DPA may be equal to Guest Physical Address (GPA) or be device specific

To prevent a DPA from accessing a PCI device, switches must not route based on DPA

PCI Routing Control

Microsoft is working through the PCI-SIG to define a modification to switches and Functions so that DPA-based routing between PCI Functions can be disabled

Devices that must do P2P can get SPA from RC by using Address Translation Services (ATS)

With ATS, Function can ask DMAr in RC for the SPA corresponding to DPA and then use that SPA to directly access another device

Requester ID Checking

DMAr hardware uses the Requester ID (Bus-Dev-Func or “BDF”) to choose a translation table

A device could write to wrong memory address if the BDF is wrong

A switch can check the Requester ID and prevent errors of this sort

Bus number of requester must be >= the secondary bus number and <= the subordinate bus number of a switch port

Microsoft is working with the PCI-SIG to have this checking capability added to switches
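
The bus-number rule above amounts to a simple range check. The sketch below assumes the standard 16-bit requester ID layout (8-bit bus, 5-bit device, 3-bit function); the `SwitchPort` class and `bus_of` helper are hypothetical names for illustration, not part of any PCI-SIG interface.

```python
from dataclasses import dataclass


def bus_of(requester_id: int) -> int:
    """Extract the bus number from a 16-bit requester ID (bus:8, dev:5, func:3)."""
    return (requester_id >> 8) & 0xFF


@dataclass(frozen=True)
class SwitchPort:
    secondary_bus: int
    subordinate_bus: int

    def accepts(self, requester_id: int) -> bool:
        """True if the requester's bus lies within this port's bus range."""
        return self.secondary_bus <= bus_of(requester_id) <= self.subordinate_bus


port = SwitchPort(secondary_bus=2, subordinate_bus=4)
behind_port = (3 << 8) | (1 << 3) | 0  # bus 3, device 1, function 0
spoofed = (5 << 8) | (1 << 3) | 0      # bus 5 is outside the port's range
```

A request claiming a bus number outside the range cannot legitimately have come from behind that port, so the switch can drop it before DMAr selects the wrong translation table.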

Static And Dynamic Roots Of Trust

Static Root of Trust Measurement (SRTM) and Dynamic Root of Trust Measurement (DRTM) are different ways to start a chain of trust

To start a chain of trust, the CPU must be in a known state, running known code, and the system must be in a state in which the code can “defend” itself

From this initial condition, we can measure each of the state changes and be able to make assertions about the state of the computer

Static Root Of Trust Measurement

This is a chain of trust that is started by computer system reset – puts CPU in a known state

The first code executed (The Core Root of Trust for Measurement – CRTM) measures the next thing to be executed – CRTM is known code

Hardware is reset and peripheral access to memory is not allowed – CRTM can “defend” itself

Significant issue with SRTM is that, once trust is lost (e.g., unknown code executed), only way to get it back is to reboot the system
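
A measurement chain of this kind can be sketched with the TPM 1.2 style "extend" operation, where the new register value is the SHA-1 hash of the old value concatenated with the measurement. The sketch below is a software illustration only; the function names and the component byte strings are hypothetical, and a real platform performs the extend inside the TPM.

```python
import hashlib


def extend(pcr: bytes, component: bytes) -> bytes:
    """Fold the hash of a component into the running PCR value."""
    measurement = hashlib.sha1(component).digest()
    return hashlib.sha1(pcr + measurement).digest()


def measure_chain(components) -> bytes:
    """Start from the reset value (all zeros) and measure each stage in order."""
    pcr = bytes(20)
    for component in components:
        pcr = extend(pcr, component)
    return pcr


known_boot = measure_chain([b"CRTM", b"BIOS v1", b"boot loader"])
changed_boot = measure_chain([b"CRTM", b"BIOS v2", b"boot loader"])
```

Because each value depends on everything measured before it, a change to any stage yields a different final value, and there is no way to "un-extend" a register short of resetting it.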

Dynamic Root Of Trust Measurement

Uses new CPU instructions to put the CPU in a known state

Code to be executed is sent to TPM to be “measured” into a special Platform Configuration Register (PCR)

This PCR is accessible only when in the DRTM initialization state and only by CPU

Initial, measured DRTM code is protected by hardware – method varies by vendor

With DMAr, hypervisor can “defend” itself from IO devices

With DRTM, if trust is lost, can restart chain of trust without rebooting

Secure Launch

“Secure Launch” refers to the act of starting the hypervisor using the DRTM

A Secure Launch allows the hypervisor to come up in a trusted state, with control of the system, regardless of what code has run previously

Allows arbitrary initialization code to run without affecting the trust state of the system

Major benefit of DRTM is that attestation of the platform can exclude lots of meaningless information that can’t be ignored by SRTM

Add-in cards

BIOS updates

Driver code used to boot hypervisor

DRTM And Trust State

The attestation of a partition must include the partition state and anything that can affect the execution of that partition

Would like attestation only to include software that is loaded after the DRTM

This allows sealing to exclude pre-launch actions

Can maintain chain of trust when code is updated

Bring system up in trusted state, verify that changes are within policy, then make changes and update sealed blobs

Isolation And SMM

SMM can be more privileged than the hypervisor

SMM can access any memory location without mediation by the hypervisor

The privilege level of SMM means that SMM code may have to be included in the seal-to state

Because SMM loads before the DRTM is initiated, almost all of the code update problems related to the SRTM are reinserted into the attestation/sealing process

SRTM problems arise because of changes to BIOS code which is not vetted by the OS/hypervisor

When OS/hypervisor loads, the changed BIOS means that PCRs no longer match, which means that blobs can’t be unsealed
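
The unsealing failure can be illustrated with a toy model in which a blob records the PCR value it was sealed to and is released only when the current value matches. This is a simplified sketch: a real TPM binds the secret cryptographically rather than comparing values in software, and the `SealedBlob` class and component strings are hypothetical.

```python
import hashlib
import hmac


def boot_pcr(components) -> bytes:
    """Compute a SHA-1 extend chain over the boot components, from zeros."""
    pcr = bytes(20)
    for component in components:
        pcr = hashlib.sha1(pcr + hashlib.sha1(component).digest()).digest()
    return pcr


class SealedBlob:
    """Toy stand-in for data sealed to a PCR value."""

    def __init__(self, secret: bytes, pcr_at_seal: bytes):
        self._secret = secret
        self._pcr = pcr_at_seal

    def unseal(self, current_pcr: bytes) -> bytes:
        if not hmac.compare_digest(current_pcr, self._pcr):
            raise PermissionError("PCR mismatch: blob cannot be unsealed")
        return self._secret


blob = SealedBlob(b"disk key", boot_pcr([b"BIOS v1", b"SMM handler v1"]))
same_boot = boot_pcr([b"BIOS v1", b"SMM handler v1"])
after_update = boot_pcr([b"BIOS v2", b"SMM handler v1"])
```

Unsealing succeeds with `same_boot` and fails with `after_update`, which is why including BIOS or SMM code in the seal-to state brings back the update problem.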

What To Do About SMM?

One approach to dealing with SMM is to make it run in a “container” that is controlled by the hypervisor

Hypervisor can prevent SMM from accessing anything that it shouldn’t

Issue that OEMs have with this approach is that it could allow the hypervisor to prevent SMM from accessing the parts of the hardware that it must access

Will CPU melt if hypervisor is broken or rogue?

OEMs consider SMM to be part of the hardware and just as “trustworthy” as the hardware

Trust isn’t the issue, the attestation and security evaluation of SMM is the issue

“SMM is hardware” position begs the question of whether this applies equally to SMM “applications”.

SMM In HV2

Microsoft does not yet have a complete solution for dealing with SMM privilege in HV2

Likely to have to evolve the solution by working with processor, chipset, BIOS, and computer system vendors

Call To Action

Chipset vendors: Start planning DMAr deployment

Switch vendors: Look to PCI-SIG for ECRs to implement access controls

Device vendors: Consider impact of DMAr and evaluate need for ATS

BIOS, CPU, system vendors: Help with SMM problem

Attend other virtualization presentations, especially VIR046 – HyperCall APIs Explained

© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.