
Hardware virtualization technology and its security

Dr. Qingni Shen

Peking University

Intel UPO Supported

Main Points

VMM technology

Intel VT technology

Security analysis of Intel VT-d

Virtual Machine Monitors (VMMs)

VMM is a software layer

Allow many virtual machine to share hardware

Allow unmodified software directly compatible

...

[Figure: Virtual machines VM0, VM1, ..., VMn, each running applications (App0 ... Appn) on a guest OS (Guest OS0 ... Guest OSn), sit on top of the Virtual Machine Monitor (VMM), which runs on the platform hardware (Processor/CS, Memory, I/O Devices)]

Purpose of Virtualization

[Figure: four panels showing Workload Isolation, Workload Consolidation, Workload Migration, and Workload Embedding, each drawn as applications and OSes placed on or moved between VMMs and hardware (HW, HW1, HW2)]

Virtualization has powerful capabilities

Virtualization Usage Models

Legacy software support

Test

Active partition

Manageability

Server consolidation

Failure recovery architecture

Highly elastic data center

Manageability

[Figure: the usage models are arranged in CLIENT and SERVER groups, and each is tagged with the capabilities it uses: Isolation, Consolidation, Migration, Embedding]

What is Intel VT technology

Formerly known by the codenames Vanderpool* & Silvervale*

VT is a collection of hardware enhancements

VT is designed to simplify virtualization software

VT brings new value and new opportunities

VT-x and VT-i are the first VT products, implemented in Intel processors and chipsets.

VT-x: virtualization enhancement for IA-32 CPUs

VT-i: virtualization enhancement for IPF (Itanium Processor Family) CPUs

Main components of Intel-VT

Intel-VT, designed by Intel Corporation, is a hardware-assisted virtualization solution. It includes:

VT-x/VT-i for the CPU

VT-d for the chipset (directed I/O)

VT-c for the network (connectivity)

Core functions of VT-x/VT-i

Intel flexible priority technology (Intel VT FlexPriority)

Intel flexible migration technology (Intel VT FlexMigration)

Intel extended page tables (Extended Page Tables)

Intel VT FlexPriority

While the processor executes a task, it receives requests, or interrupts, from other devices and applications that need attention. To minimize the performance impact, a special register in the processor, the Task Priority Register (TPR), tracks the priority of the current task, so that only interrupts with a higher priority than the running task are serviced immediately. Intel FlexPriority creates a virtual copy of the TPR that the guest OS can read, and in some cases modify, without VMM intervention. This significantly improves performance for 32-bit OSes that access the TPR frequently. (For instance, application performance on Windows Server* 2000 is reported to improve by 35%.)
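The priority filtering described above can be sketched in a few lines. This is a simplified model, not Intel's implementation: it assumes the usual APIC convention that the priority class is bits 7:4 of a vector or TPR value, it ignores interrupts already in service, and the `VirtualTpr` class with its `threshold_class` parameter is a hypothetical stand-in for the virtual TPR copy.

```python
def priority_class(value):
    """Priority class = bits 7:4 of an 8-bit interrupt vector or TPR value."""
    return (value >> 4) & 0xF


def is_deliverable(vector, tpr):
    """Simplified filter: deliver only if the vector's class exceeds the TPR class.

    A real local APIC also takes in-service interrupts into account.
    """
    return priority_class(vector) > priority_class(tpr)


class VirtualTpr:
    """Hypothetical guest-visible TPR copy (illustration only)."""

    def __init__(self, threshold_class):
        self.value = 0
        self.threshold_class = threshold_class  # assumed to be set by the VMM
        self.vmm_interventions = 0

    def guest_read(self):
        # Served from the virtual copy, no VMM involvement.
        return self.value

    def guest_write(self, new_value):
        self.value = new_value & 0xFF
        # Only a drop below the threshold involves the VMM in this model.
        if priority_class(self.value) < self.threshold_class:
            self.vmm_interventions += 1


vtpr = VirtualTpr(threshold_class=2)
vtpr.guest_write(0x50)  # class 5: handled without the VMM
vtpr.guest_write(0x10)  # class 1: below the threshold, the VMM is involved
print(is_deliverable(0x61, vtpr.guest_read()), vtpr.vmm_interventions)
```

Most reads and writes stay inside the guest, which is where the performance gain for TPR-heavy operating systems comes from.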

Intel VT FlexMigration

An important advantage of virtualization is that running applications can be migrated between physical machines with no downtime. Intel VT FlexMigration aims to make migration seamless between current and future servers based on Intel processors, even when the newer systems include an enhanced instruction set. With this technology, the management software can present a consistent set of instructions across all servers in the migration pool, so workloads migrate seamlessly. The result is a more flexible, unified server resource pool that spans multiple hardware generations.
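One way to picture the "consistent set of instructions" is as the intersection of the feature sets of all hosts in the pool. The sketch below is only an illustration of that idea: the host names are hypothetical, and real FlexMigration works by masking what the processor reports to guests rather than by comparing Python sets.

```python
def common_feature_set(pool):
    """Intersect the instruction-set features of every host in the pool."""
    feature_sets = [set(features) for features in pool.values()]
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)


def can_migrate(vm_features, host_features):
    """A VM can move to a host that offers every feature the VM was shown."""
    return set(vm_features) <= set(host_features)


# Hypothetical two-generation migration pool.
pool = {
    "host-old": {"sse2", "sse3", "ssse3"},
    "host-new": {"sse2", "sse3", "ssse3", "sse4_1"},
}

baseline = common_feature_set(pool)
print(sorted(baseline))
```

A VM that only ever sees `baseline` can run on either host; a VM that was shown `sse4_1` on the newer host could not safely move to the older one.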

Challenges in developing a VMM

• The OS and applications should not know that they are sharing CPU resources with others

• The VMM should be able to protect itself from threats posed by guest software

• The VMM should be able to keep the software stacks in different VMs independent of one another

• The VMM should be able to provide a virtual hardware platform interface to guest software

[Figure: VM0 and VM1, each with applications on a guest OS (Guest OS0, Guest OS1), on top of the VM Monitor and the platform hardware]

CPU virtualization of the current IA architecture requires complex software design.

Software solution: guest deprivileging

• The guest OS runs above ring 0, so sensitive instructions misbehave or fault

• The VMM runs in ring 0 to handle the faults raised during guest OS operation

• The VMM intercepts privileged instructions and executes them on behalf of the guest software

Virtualization holes of the IA architecture:

• Ring aliasing
• Non-trapping instructions
• Excessive faulting
• Interrupt virtualization
• Context switching of CPU state
• Address-space compression

Complex software techniques:

• Source code modification
• Binary code modification

[Figure: VM0 and VM1, each with applications on a guest OS (Guest OS0, Guest OS1), on top of the VM Monitor and the platform hardware]
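The difference between a privileged instruction that faults and a sensitive instruction that does not trap can be shown with a toy model. Everything below is a hypothetical simulation: `ToyCpu` models only the interrupt flag, assumes IOPL is 0, and uses CLI and POPF as examples (CLI faults outside ring 0, while POPF silently leaves the interrupt flag unchanged).

```python
class GeneralProtectionFault(Exception):
    """Stands in for the fault raised by a privileged instruction."""


class ToyCpu:
    """Toy CPU that models only the interrupt flag (assumes IOPL = 0)."""

    def __init__(self, ring):
        self.ring = ring
        self.interrupt_flag = True

    def cli(self):
        # Privileged: faults when executed outside ring 0.
        if self.ring != 0:
            raise GeneralProtectionFault("CLI outside ring 0")
        self.interrupt_flag = False

    def popf(self, wanted_interrupt_flag):
        # Sensitive but non-trapping: outside ring 0 the change is silently dropped.
        if self.ring == 0:
            self.interrupt_flag = wanted_interrupt_flag


class ToyVmm:
    """Runs a deprivileged guest and emulates whatever faults."""

    def __init__(self):
        self.cpu = ToyCpu(ring=1)      # guest OS pushed out of ring 0
        self.virtual_interrupt_flag = True
        self.traps = 0

    def run_guest(self, instruction, *args):
        try:
            getattr(self.cpu, instruction)(*args)
        except GeneralProtectionFault:
            self.traps += 1
            if instruction == "cli":
                self.virtual_interrupt_flag = False  # emulate on the guest's behalf


vmm = ToyVmm()
vmm.run_guest("cli")          # faults, so the VMM can emulate it
vmm.run_guest("popf", True)   # no fault: the VMM never learns about it
print(vmm.traps, vmm.virtual_interrupt_flag)
```

After the second call the guest believes interrupts are enabled again, but the VMM's virtual flag is stale. Closing gaps like this is what forces source or binary modification in a software-only VMM.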

VT removes the virtualization holes by design and eliminates the need for complex software techniques

Intel® Virtualization Technology

Guest software runs deprivileged in a new operating mode:

• Applications still run in ring 3
• The guest OS runs deprivileged, but in ring 0
• The VMM runs in a new mode with all privileges

[Figure: VM0 and VM1, each with applications on a guest OS (Guest OS0, Guest OS1), on top of the VM Monitor and the platform hardware]

An overview of VT-x

Operation Mode

Guest OS / VMM transitions

VM control structure

Virtual-machine control structure

Principle of VM exit

Benefits

Operation mode

VMX root mode:

Has all privileges; intended for running the VMM

VMX non-root mode:

Has a subset of privileges; intended for running guest software

Reduces guest software privileges without relying on ring levels

Removes the need for ring aliasing and ring compression

VMX operation mode

Root operation mode: the VMM runs in root operation mode

Non-root operation mode: guest software runs in non-root operation mode

VM Entry and VM Exit

VM Entry
➤From VMM into Guest
➤Enters VMX non-root mode
➤Loads guest state from the VMCS
➤VMLAUNCH is used for the initial entry
➤VMRESUME is used to re-enter a virtual machine that has already been launched

VM Exit
➤From Guest into VMM
➤Enters VMX root mode
➤Saves guest state into the VMCS
➤Loads VMM state from the VMCS

[Figure: VM Exit and VM Entry arrows between the guests (VM0 with Guest OS0, VM1 with Guest OS1, each running applications) and the VM Monitor on the physical host hardware]

[Figure sequence: VT-x Operation]

1. IA-32 operation: ordinary Ring 0 and Ring 3.
2. VMXON: the processor enters VMX root operation (Ring 0 and Ring 3).
3. VMLAUNCH: the processor enters VMX non-root operation and runs VM 1, which has its own Ring 0 and Ring 3.
4. VM Exit: control returns from VM 1 to VMX root operation.
5. VMRESUME: the processor re-enters VM 1 in VMX non-root operation.
6. VMLAUNCH: further virtual machines (VM 2 ... VM n) are started, each with its own Ring 0 and Ring 3.
7. Each virtual machine has its own VMCS (VMCS1, VMCS2, ..., VMCSn).
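The sequence of VMXON, VMLAUNCH, VM exit, and VMRESUME can be modelled as a small state machine. This is a hypothetical sketch, not processor behaviour in full: the class names and mode strings are illustrative, and the only rules encoded are that VMLAUNCH needs a VMCS that has not been launched yet while VMRESUME needs one that has.

```python
class VmxError(Exception):
    """Raised when an operation is attempted in the wrong state."""


class Vmcs:
    def __init__(self, name):
        self.name = name
        self.launched = False
        self.exit_reason = None


class LogicalProcessor:
    def __init__(self):
        self.mode = "outside-vmx"   # ordinary IA-32 operation
        self.current_vmcs = None

    def _require(self, mode):
        if self.mode != mode:
            raise VmxError(f"requires {mode}, processor is in {self.mode}")

    def vmxon(self):
        self._require("outside-vmx")
        self.mode = "root"

    def vmptrld(self, vmcs):
        self._require("root")
        self.current_vmcs = vmcs    # only one VMCS is current at a time

    def vmlaunch(self):
        self._enter(expect_launched=False)

    def vmresume(self):
        self._enter(expect_launched=True)

    def _enter(self, expect_launched):
        self._require("root")
        vmcs = self.current_vmcs
        if vmcs is None or vmcs.launched != expect_launched:
            raise VmxError("no current VMCS or wrong launch state")
        vmcs.launched = True
        self.mode = "non-root"      # VM entry

    def vm_exit(self, reason):
        self._require("non-root")
        self.current_vmcs.exit_reason = reason
        self.mode = "root"          # back in the VMM


cpu = LogicalProcessor()
vmcs1 = Vmcs("VM 1")
cpu.vmxon()
cpu.vmptrld(vmcs1)
cpu.vmlaunch()        # first entry into VM 1
cpu.vm_exit("HLT")    # guest halts, VMM regains control
cpu.vmresume()        # re-enter VM 1
print(cpu.mode, vmcs1.exit_reason)
```

Running a second VM in this model means loading a different `Vmcs` with `vmptrld` while in root mode and launching it, which mirrors the per-VM VMCS shown in the last figure.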

Virtual Machine Control Structure (VMCS)

A VMCS is a control structure stored in memory

Only one VMCS is active at a time on each logical processor

VMCS payload:

VM execution, exit, and entry controls

Guest and host state

VM-exit information fields

The in-memory format of the VMCS is not architecturally defined, so different processor implementations may lay it out differently

VMPTRLD: loads the pointer to a VMCS, making that VMCS the active one

VMREAD/VMWRITE: new instructions for accessing VMCS fields

Virtual machine control structure (VMCS)

Intel defines the VMCS to support VMX operation. This structure can only be manipulated with VMCLEAR, VMPTRLD, VMREAD, and VMWRITE.

a) Guest-state area: the processor state used when the processor changes from root mode to non-root mode (VM entry); the guest state is saved here again on VM exit.

b) Host-state area: the processor state loaded when the processor changes from non-root mode to root mode (VM exit).

c) VM-execution control fields: determine which events force the processor to exit from non-root operation to root operation while a VM is running.

d) VM-exit control fields: control how the processor exits from non-root operation, including what information is stored.

e) VM-entry control fields: control how the processor enters non-root operation, including what information is read.

f) VM-exit information fields: record the reason whenever a VM exits from non-root operation to root operation.
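A rough way to keep the six areas apart is to model them as fields of one record and watch which ones a VM entry and a VM exit touch. The sketch below is purely illustrative: `ToyVmcs`, the register names `rip` and `cr3`, and the dictionary-based CPU state are assumptions, and the real VMCS is an opaque region accessed only through VMREAD and VMWRITE.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVmcs:
    """Illustrative record with one field per VMCS area."""
    guest_state: dict = field(default_factory=dict)
    host_state: dict = field(default_factory=dict)
    execution_controls: set = field(default_factory=set)
    exit_controls: set = field(default_factory=set)
    entry_controls: set = field(default_factory=set)
    exit_information: dict = field(default_factory=dict)


def vm_entry(cpu_state, vmcs):
    """Root to non-root: load the guest-state area into the processor."""
    cpu_state.clear()
    cpu_state.update(vmcs.guest_state)


def vm_exit(cpu_state, vmcs, reason):
    """Non-root to root: save guest state, record the reason, load host state."""
    vmcs.guest_state = dict(cpu_state)
    vmcs.exit_information["reason"] = reason
    cpu_state.clear()
    cpu_state.update(vmcs.host_state)


vmcs = ToyVmcs(
    guest_state={"rip": 0x1000, "cr3": 0xA000},
    host_state={"rip": 0xFFFF0000, "cr3": 0xB000},
    execution_controls={"hlt-exiting"},
)
cpu_state = dict(vmcs.host_state)   # the VMM is running

vm_entry(cpu_state, vmcs)
cpu_state["rip"] = 0x1004           # the guest makes progress
vm_exit(cpu_state, vmcs, "HLT")
print(vmcs.guest_state["rip"], vmcs.exit_information["reason"])
```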

Reasons for VM exits

Exits on paging state, so that the VMM can manage the page tables

• CR3 accesses, INVLPG instruction (TLB invalidation)
• Page faults
• CR0/CR4 accesses

Some state needs virtualization

• CPUID, RDMSR, WRMSR, RDPMC, RDTSC, MOV DRx

Exceptions and I/O accesses

• 32-entry exception bitmap, I/O-port access bitmap

Control of asynchronous events

• The VMM must be able to handle interrupts even while the guest is blocking them

Detection of guest states to facilitate VM scheduling

• HLT, MWAIT, PAUSE
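The two bitmaps mentioned above can be sketched as simple bit tests. This is a simplified illustration under stated assumptions: bit n of the 32-entry exception bitmap selects exits for exception vector n, and the I/O bitmap is split into two 4 KB halves covering ports 0x0000-0x7FFF and 0x8000-0xFFFF. The helper names are hypothetical, and the extra filtering that applies to page faults is left out.

```python
IO_BITMAP_BYTES = 4096  # each half covers 32768 ports, one bit per port


def exception_causes_exit(exception_bitmap, vector):
    """Bit n of the 32-entry bitmap selects a VM exit for exception vector n."""
    if not 0 <= vector < 32:
        raise ValueError("exception vectors are 0..31")
    return bool((exception_bitmap >> vector) & 1)


def _locate(port):
    """Return (use_second_half, byte_index, bit_mask) for an I/O port."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("I/O ports are 0x0000..0xFFFF")
    index = port & 0x7FFF
    return port >= 0x8000, index >> 3, 1 << (index & 7)


def trap_port(bitmap_a, bitmap_b, port):
    """Mark a port so that guest accesses to it cause a VM exit."""
    second_half, byte_index, mask = _locate(port)
    (bitmap_b if second_half else bitmap_a)[byte_index] |= mask


def io_port_causes_exit(bitmap_a, bitmap_b, port):
    second_half, byte_index, mask = _locate(port)
    return bool((bitmap_b if second_half else bitmap_a)[byte_index] & mask)


exception_bitmap = 1 << 6                 # exit on invalid opcode (vector 6)
io_a = bytearray(IO_BITMAP_BYTES)
io_b = bytearray(IO_BITMAP_BYTES)
trap_port(io_a, io_b, 0x3F8)              # hypothetical choice: a serial port
trap_port(io_a, io_b, 0x8001)

print(exception_causes_exit(exception_bitmap, 6),
      io_port_causes_exit(io_a, io_b, 0x3F8),
      io_port_causes_exit(io_a, io_b, 0x60))
```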

Benefits: VT helps improve VMMs

VT reduces the dependency on the guest OS

• No need for binary patching or translation
• Provides support for legacy systems

VT improves robustness

• No need for complex software techniques
• Simpler VMMs
• Smaller Trusted Computing Base (TCB)

VT improves performance

• Fewer transitions between VM and VMM

Device Virtualization (VT-d)

For a server, I/O is an important component. Greater CPU computing capability leads to faster data processing only if data reaches the CPU smoothly. As a result, whether for storage, networking, graphics cards, or memory, I/O capability is a critical part of enterprise-level architecture.

Without VT-d, the VMM must take part directly in every interaction with I/O, which not only slows down data transmission but also increases the processor's workload because of frequent VMM activity. VT-d gives the guest OS a mechanism for direct access to real hardware, which greatly reduces the server processor's workload.

Current ways of I/O virtualization

Emulate the I/O device: The VMM emulates an I/O device for the guest, fully reproducing the device's functionality so that the guest can use the corresponding real drivers. This approach provides excellent compatibility (regardless of whether the device physically exists), but the emulation noticeably hurts performance.

Additional software interface: This model resembles the I/O emulation model, but the VMM software provides a set of direct device interfaces to the VM to make virtualization more efficient. It is a bit like the DirectX technology of Windows: it offers better performance than the I/O emulation model but reduces compatibility.

[Figures: Emulate the I/O device; Additional software interface]

Design of VT-d

The key to I/O virtualization is handling DMA and interrupt requests (IRQs).

Intel VT-d is a hardware-assisted virtualization technology built into the north bridge. The DMA-virtualization and IRQ-virtualization hardware in the north bridge greatly enhances the reliability, flexibility, and performance of I/O.

Traditional IOMMUs (I/O memory management units) distinguish devices by memory address range. This is easy to implement but makes DMA isolation difficult. VT-d therefore updates the IOMMU architecture to support multiple DMA protection domains, which makes DMA virtualization possible. This is also called DMA remapping.

An I/O device generates many interrupt requests, so I/O virtualization must separate these requests correctly and route them to the right virtual machines. Traditional devices deliver interrupt requests in two ways: through the I/O interrupt controller, or through MSI (message signaled interrupts), which the device sends directly as a DMA write request. Because the target memory address must be embedded in the DMA request, this architecture requires full access to all memory addresses and cannot provide interrupt isolation.

VT-d's interrupt-remapping architecture solves this problem by redefining the MSI format. The new MSI is still a DMA write request, but instead of embedding the target memory address it carries a message ID. By maintaining a table structure, the hardware can use the message ID to identify the VM domain the interrupt belongs to. The interrupt-remapping architecture implemented by VT-d supports all I/O sources, including IOAPICs, and all interrupt types, such as ordinary MSI and extended MSI-X.

DMA Remapping

DMA remapping provides hardware isolation for device access to memory. Every device is assigned to a specific domain, and each domain has its own I/O page tables. When a device attempts to access system memory, the DMA-remapping hardware intercepts the access, decides whether to allow it, and at the same time determines the real address. Frequently used I/O page-table entries are cached. The DMA-remapping mechanism can be configured independently for every device.
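The intercept, check, and translate steps described above can be sketched as a toy model. This is a minimal illustration under assumptions: the class `ToyIommu`, the device identifier string, the flat per-domain page dictionary, and the 4 KB page size are all choices made for the example, and real VT-d hardware uses multi-level tables and caches.

```python
PAGE_SHIFT = 12
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class DmaFault(Exception):
    """Raised when the remapping hardware blocks a DMA access."""


class ToyIommu:
    def __init__(self):
        self.domain_of = {}     # device id -> domain
        self.page_tables = {}   # domain -> {io page: (host page, writable)}

    def assign(self, device_id, domain):
        self.domain_of[device_id] = domain
        self.page_tables.setdefault(domain, {})

    def map_page(self, domain, io_addr, host_addr, writable):
        table = self.page_tables.setdefault(domain, {})
        table[io_addr >> PAGE_SHIFT] = (host_addr >> PAGE_SHIFT, writable)

    def translate(self, device_id, io_addr, is_write):
        """Intercept a DMA access, check permission, return the host address."""
        domain = self.domain_of.get(device_id)
        if domain is None:
            raise DmaFault("device is not assigned to any domain")
        entry = self.page_tables[domain].get(io_addr >> PAGE_SHIFT)
        if entry is None:
            raise DmaFault("no translation for this address")
        host_page, writable = entry
        if is_write and not writable:
            raise DmaFault("write to a read-only page")
        return (host_page << PAGE_SHIFT) | (io_addr & PAGE_MASK)


iommu = ToyIommu()
iommu.assign("00:19.0", "guest1")                       # hypothetical NIC
iommu.map_page("guest1", 0x1000, 0x7F000, writable=False)
print(hex(iommu.translate("00:19.0", 0x1234, is_write=False)))
```

A device in another domain, or a write to the read-only page, raises `DmaFault` instead of reaching memory, which is the isolation property the slide describes.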

Interrupt Remapping

Interrupt remapping provides the functions of remapping and routing the interrupt requests from I/O devices.
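The table lookup behind interrupt remapping can be sketched as follows. This is a hypothetical model: `RemapEntry`, its field names, and the device identifier strings are illustrative, and the only behaviour shown is that a request carries a message ID, the table supplies the destination and vector, and a request from the wrong source is blocked.

```python
from dataclasses import dataclass


class InterruptBlocked(Exception):
    """Raised when a request does not match a valid table entry."""


@dataclass(frozen=True)
class RemapEntry:
    allowed_source: str     # device permitted to use this entry
    destination_cpu: int
    vector: int


class ToyInterruptRemapper:
    def __init__(self):
        self.table = {}     # message ID -> RemapEntry, programmed by the VMM

    def program(self, message_id, entry):
        self.table[message_id] = entry

    def remap(self, source_id, message_id):
        """Look up the message ID and validate the requesting device."""
        entry = self.table.get(message_id)
        if entry is None or entry.allowed_source != source_id:
            raise InterruptBlocked(f"message ID {message_id} from {source_id}")
        return entry.destination_cpu, entry.vector


remapper = ToyInterruptRemapper()
remapper.program(7, RemapEntry("00:19.0", destination_cpu=1, vector=0x41))
print(remapper.remap("00:19.0", 7))
```

Because the device supplies only an index, it cannot choose an arbitrary vector or destination CPU; those come from the table the VMM controls.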

New design of IOMMU

The IOMMU manages device access to system memory. It sits between the peripheral devices and the host, translates the addresses in device requests to system memory addresses, and checks the permissions for each access.

With an IOMMU, every device can be assigned to a protection domain. The domain defines the I/O page translations used by every device in it and specifies the read and write permissions of every I/O page. For virtualization, the VMM can place all devices assigned to a specific guest OS in the same protection domain, which establishes one set of address translations and access restrictions for the devices used by that guest OS.

Two new kinds of device virtualization based on VT-d

Direct assignment of I/O devices: A physical I/O device is assigned directly to a VM. In this model, drivers inside the VM communicate directly with the hardware device, with little or no involvement of the VMM. For the sake of robustness, hardware virtualization support is needed to isolate and protect hardware resources so that only the designated VM can use them. The hardware also needs multiple I/O container partitions to serve multiple VMs simultaneously. This model almost completely eliminates the need to run drivers in the VMM. The CPU, for example, although not an I/O device in the usual sense, is allocated to VMs in just this way, while CPU resources remain under the management of the VMM.

Shared I/O devices: This model extends the direct-assignment model. It requires a device to support multiple function interfaces, each of which can be assigned to a VM independently. This model provides very high virtualization performance.

Network Virtualization (VT-c)

Intel VT-c further optimizes networking for virtualization. In essence, this set of technologies works like a post office: it sorts all incoming letters, packages, and envelopes and delivers them to their respective destinations. Because these functions are implemented in dedicated network chips, Intel VT-c significantly increases the speed of delivery and reduces the workload of the VMM and the server processor. VT-c includes:

Virtual Machine Device Queues (VMDq)

Virtual Machine Direct Connect (VMDc)

VMDq

In a traditional virtualized server environment, the VMM must classify every individual data packet and deliver it to its assigned VM, which takes up many processor cycles. With VMDq, this classification is performed by dedicated hardware in the Intel server network card, and the VMM is only responsible for delivering the presorted packet groups to the appropriate guest OS. This reduces I/O latency and frees processor cycles for business applications. Intel VT-c can more than double I/O throughput, so virtualized applications can approach the throughput of the native host. Each server can consolidate more applications with fewer I/O bottlenecks.

Network virtualization model

Currently, all VM software with network capabilities has a built-in virtual switch, and most products also provide router functions on that basis. Their aim is to connect multiple virtual machines into one or more networks, as a real switch or router would.

[Figure: General network virtualization model]

Structure of VMDq

VMDq provides a classification/sorting engine that belongs to layer 2 of the ISO OSI 7-layer model and implements part of the functions of a switch. To deliver adequate performance it must use multiple buffer queues, so network cards that support VMDq also support the RSS (Receive Side Scaling) receive extension.

The layer 2 classifier/sorter is implemented in hardware on the network card that supports VMDq. It uses the MAC address or VLAN to send each packet to the queue of the specified VM (this queue is called a pool). The VMM software that performs the virtual switch task then only needs a simple data copy as the final step, which greatly improves the efficiency of the virtual network.

Network cards that support VMDq queues usually support RSS queues as well. For example, the Intel 82576EB network card supports 8 VM queues and 16 RSS queues. There are essentially 16 send/receive queue pairs, which means every VM can be assigned two pairs.
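The layer 2 sorting step can be sketched as a lookup from (MAC address, VLAN) to a pool. This is a software illustration of what the slide attributes to NIC hardware: the class `ToyVmdqSorter`, the packet dictionaries, the MAC addresses, and the default-pool behaviour for unmatched packets are assumptions, while the counts of 8 pools and 16 queue pairs come from the 82576EB example above.

```python
NUM_POOLS = 8        # VM queues (pools) in the 82576EB example
QUEUE_PAIRS = 16     # send/receive queue pairs in the same example


class ToyVmdqSorter:
    """Sorts received packets into per-VM pools by MAC address and VLAN."""

    def __init__(self, default_pool=0):
        self.filters = {}   # (mac, vlan) -> pool index
        self.default_pool = default_pool
        self.pools = {pool: [] for pool in range(NUM_POOLS)}

    def add_filter(self, mac, vlan, pool):
        if not 0 <= pool < NUM_POOLS:
            raise ValueError("pool index out of range")
        self.filters[(mac.lower(), vlan)] = pool

    def receive(self, packet):
        """Place a packet in the pool selected by its destination MAC and VLAN."""
        key = (packet["dst_mac"].lower(), packet.get("vlan"))
        pool = self.filters.get(key, self.default_pool)
        self.pools[pool].append(packet)
        return pool


queue_pairs_per_vm = QUEUE_PAIRS // NUM_POOLS

sorter = ToyVmdqSorter()
sorter.add_filter("52:54:00:00:00:01", 10, pool=1)   # hypothetical VM 1
sorter.add_filter("52:54:00:00:00:02", 10, pool=2)   # hypothetical VM 2
chosen = sorter.receive({"dst_mac": "52:54:00:00:00:02", "vlan": 10, "payload": b"hi"})
print(chosen, queue_pairs_per_vm)
```

With the packets already grouped per pool, the VMM's remaining work is the copy into the right guest, as the slide notes.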

[Figure: Diagram of VMDq acceleration structure. Hardware takes over part of the work of software routing.]

Virtual Machine Direct Connect (VMDc)

Building on the PCI-SIG Single Root I/O Virtualization (SR-IOV) standard, VMDc lets VMs access network I/O hardware directly, which improves performance significantly. As mentioned before, Intel VT-d supports a direct communication channel between a guest OS and an I/O port. SR-IOV extends this by supporting multiple communication channels per I/O port. For example, each of 10 guest OSes can be assigned a protected, private 1 Gb/s link on a single Intel 10 Gigabit server network card. These links bypass the VMM switch, which further improves I/O performance and reduces the workload of server processors.

Security Analysis of VT-d

Hardware virtualization addresses security problems of virtual systems and provides better isolation of system hardware resources.

But the hardware is complicated, so some security problems remain to be solved, and attackers have already discovered loopholes in hardware virtualization.

Attack Scenario

Assume a virtual system that builds a driver domain with the aid of Intel VT-d. Driver domains are similar to traditional VMs, but selected devices, such as a network card or a disk controller, are assigned to them.

The attack attempts to gain complete control of the whole system by means of such a driver domain.

In this attack scenario, we suppose that the attacker has already gained full control of a driver domain.

[Figure: Diagram of the attack scenario]

MSI (Message Signaled Interrupts)

MSI format (from the Intel developer manual): [Figure]

All three attacks described later use I/O devices to generate MSIs in order to carry out the attack.
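Since the MSI format figure is not reproduced here, the following sketch shows how an MSI is put together, assuming the commonly documented compatibility format on Intel platforms: the address is 0xFEE in bits 31:20 with the destination ID in bits 19:12, and the data word carries the vector in bits 7:0 and the delivery mode in bits 10:8. The function names are hypothetical and the remaining fields are left at zero.

```python
MSI_ADDRESS_BASE = 0xFEE00000   # bits 31:20 of an MSI address are 0xFEE
DELIVERY_FIXED = 0b000


def encode_msi(destination_id, vector, delivery_mode=DELIVERY_FIXED):
    """Build the (address, data) pair a device writes to signal an interrupt."""
    if not 0 <= destination_id <= 0xFF:
        raise ValueError("destination ID is 8 bits")
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector is 8 bits")
    if not 0 <= delivery_mode <= 0b111:
        raise ValueError("delivery mode is 3 bits")
    address = MSI_ADDRESS_BASE | (destination_id << 12)
    data = (delivery_mode << 8) | vector
    return address, data


def decode_msi(address, data):
    """Recover the fields from an (address, data) pair."""
    if address >> 20 != 0xFEE:
        raise ValueError("not an MSI address")
    return {
        "destination_id": (address >> 12) & 0xFF,
        "delivery_mode": (data >> 8) & 0x7,
        "vector": data & 0xFF,
    }


address, data = encode_msi(destination_id=1, vector=0x82)
print(hex(address), hex(data), decode_msi(address, data))
```

The point relevant to the attacks is that the device chooses the vector and the destination itself: whatever it writes to this address range is treated as an interrupt.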

1) Threat based on SIPI construction

The SIPI (Start-up Inter-Processor Interrupt) is a key function of any multiprocessor (or multi-core) system based on Intel processors. The BIOS uses SIPIs to initialize all processors and distribute tasks to them at startup. When the system starts, only one processor, called the bootstrap processor (BSP), is active, and its job is to initialize the other processors so that they work properly.

A SIPI tells the target processor to start executing boot code at the address 0xVV000, where VV is the vector passed with the SIPI. For the SIPI to take effect, the target CPU must first be sent an INIT interrupt, which resets the CPU and puts it into the wait-for-SIPI state. Under normal circumstances, the BSP sends SIPIs to all other processors.

The only mechanism for sending a SIPI is through the local advanced programmable interrupt controller (local APIC).
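The relationship between the SIPI vector and the start address, and the requirement that INIT comes first, can be sketched as follows. The `ToyProcessor` class and its state names are hypothetical; the address computation follows the 0xVV000 rule stated above.

```python
def sipi_start_address(vector):
    """A SIPI with vector VV starts execution at physical address 0xVV000."""
    if not 0 <= vector <= 0xFF:
        raise ValueError("the SIPI vector is 8 bits")
    return vector << 12


class ToyProcessor:
    """Toy application processor that honours a SIPI only after INIT."""

    def __init__(self):
        self.state = "running"
        self.instruction_pointer = None

    def receive_init(self):
        self.state = "wait-for-sipi"

    def receive_sipi(self, vector):
        if self.state != "wait-for-sipi":
            return False                      # SIPI is ignored
        self.instruction_pointer = sipi_start_address(vector)
        self.state = "running"
        return True


cpu = ToyProcessor()
ignored = cpu.receive_sipi(0x9F)   # no INIT yet, so nothing happens
cpu.receive_init()
accepted = cpu.receive_sipi(0x9F)
print(ignored, accepted, hex(cpu.instruction_pointer))
```

Whoever can deliver a SIPI therefore chooses where the target processor starts executing, which is why being able to construct one from a device is a threat.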

SIPI format (from the Intel developer manual):

Diagram of Attack

2) System call injection attack

[Figure: system call injection. Elements shown: Driver Domain, Dom0, Hypervisor, CPU#0, CPU#1, CPU#2, and a NIC generating an interrupt with vector 0x82 (hypercall)]

3) #AC-based injection attack

The #AC (alignment check) exception can be used to confuse the stack layout of the exception handler.

#AC is the only exception that meets both of the following requirements:

Its vector value is greater than 15, so it can be delivered by MSI;

It is interpreted as an exception whose handler expects an error code on the stack.

[Figure: stack layout on normal delivery of an #AC exception, which stores an error code. From LOW to HIGH addresses: ErrorCode, RIP, CS, RFLAGS, RSP, SS]

If a device sends an MSI with the vector value 0x11 (#AC), the #AC handler is triggered on the target CPU. Because the handler expects an error code on top of the stack, and an interrupt delivered by MSI pushes none, the handler misinterprets the other values on the stack. In this case, CS is interpreted as RIP, RFLAGS is treated as CS, and so on. When an exception handler finishes, it executes the IRET instruction to pop the saved register values and jump back to CS:RIP, which means that the handler actually returns to RFLAGS:CS.
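The shift in the stack layout can be reproduced with a small simulation. This is a sketch under assumptions: the stack is modelled as a Python list ordered from low to high addresses, the handler is assumed to discard one slot as the error code before IRET, only the first three values popped by IRET are modelled, and all register values are made up.

```python
FRAME_FIELDS = ("rip", "cs", "rflags", "rsp", "ss")


def push_frame(saved, error_code=None):
    """Return the stack contents from low to high addresses.

    A genuine #AC exception pushes an error code; an interrupt delivered
    by MSI with the same vector does not.
    """
    frame = [saved[name] for name in FRAME_FIELDS]
    if error_code is not None:
        frame.insert(0, error_code)
    return frame


def ac_handler_return(stack):
    """Model the end of the #AC handler: drop the error code, then IRET."""
    remaining = list(stack)
    remaining.pop(0)                 # what the handler believes is the error code
    rip, cs, rflags = remaining[:3]  # IRET pops RIP, CS, RFLAGS in this order
    return {"rip": rip, "cs": cs, "rflags": rflags}


# Made-up register values for illustration.
saved = {"rip": 0x401000, "cs": 0x10, "rflags": 0x202, "rsp": 0x7FF000, "ss": 0x18}

genuine = ac_handler_return(push_frame(saved, error_code=0))
injected = ac_handler_return(push_frame(saved))
print(genuine, injected)
```

In the injected case the real RIP is thrown away as if it were the error code, so IRET uses the old CS value as RIP and the old RFLAGS value as CS, which is the RFLAGS:CS return described above.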

Mapping

