A Comparison of Software and Hardware Techniques for
x86 Virtualization
Paper by Keith Adams & Ole Agesen (VMware)
Presentation by Jason Agron
Presentation Overview
• What is virtualization?
• Traditional virtualization techniques.
• Overview of the Software VMM.
• Overview of the Hardware VMM.
• Evaluation of VMMs.
• Conclusions.
• Questions.
“Virtualization”
• Defined by Popek & Goldberg in 1974.
• Establishes 3 essential characteristics of a VMM:
  • Fidelity: running on the VMM == running directly on HW.
  • Performance: performance on the VMM == performance on HW.
  • Safety: the VMM manages all hardware resources (correctly?).
Is This Definition Correct?
• Yes, but its scope should be taken into account.
• It assumes the traditional “trap-and-emulate” style of full virtualization.
  • This was extremely popular circa 1974.
  • Completely “transparent”.
• It does not account for…
  • Paravirtualization.
    • Not transparent.
    • Guest software is modified.
Full Virtualization
• Full == Transparent.
• Must be able to “detect” when the VMM must intervene.
• Definitions:
  • Sensitive instruction: accesses and/or modifies privileged state.
  • Privileged instruction: traps when run in an unprivileged mode.
Traditional Techniques
• De-privileging
  • Run guest programs at a reduced privilege level so that privileged instructions trap.
  • The VMM intercepts the trap and emulates the functionality of the original instruction.
  • Very similar to the way programs transfer control to the OS kernel during a system call.
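The de-privileging loop above can be sketched as a toy model. This is a sketch, not real x86 semantics: the instruction names, the `Trap` exception, and the virtual-state dictionary are all invented for illustration.

```python
# Toy model of trap-and-emulate. Privileged instructions raise a "trap" when
# the guest runs deprivileged, and the VMM emulates them against virtual state.

class Trap(Exception):
    """Raised when a deprivileged guest executes a privileged instruction."""

PRIVILEGED = {"cli", "sti", "hlt"}   # hypothetical privileged subset

def cpu_execute(insn, state, privileged):
    """Hardware model: privileged instructions trap in unprivileged mode."""
    if insn in PRIVILEGED and not privileged:
        raise Trap(insn)
    # (real hardware would execute the instruction here)
    state["executed"].append(insn)

def vmm_run(guest_code):
    vstate = {"executed": [], "if_flag": True}   # virtual CPU state
    for insn in guest_code:
        try:
            cpu_execute(insn, vstate, privileged=False)  # guest is deprivileged
        except Trap as t:
            # VMM intercepts the trap and emulates against the *virtual*
            # state, so the guest never touches real privileged HW state.
            if str(t) == "cli":
                vstate["if_flag"] = False
            elif str(t) == "sti":
                vstate["if_flag"] = True
            vstate["executed"].append(f"emulated:{t}")
    return vstate

print(vmm_run(["mov", "cli", "add", "sti"]))
```

The key property is the one the slide names: the trap is the only mechanism by which the VMM regains control, which is exactly what breaks on x86 when a sensitive instruction fails to trap.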
Traditional Techniques
• Primary & shadow structures
  • Each virtual system’s privileged state differs from that of the underlying HW.
  • Therefore, the VMM must provide the “correct” environment to meet the guests’ expectations.
  • Guest-level primary structures reflect the state that a guest sees.
  • VMM-level shadow structures are copies of the primary structures.
    • Kept coherent via “memory traces”.
Traditional Techniques
• Memory traces
  • Traps occur when on-chip privileged state is accessed/modified.
  • What about off-chip privileged state, e.g., page tables?
    • It can be accessed by LOADs/STOREs, either by the CPU or by DMA-capable devices.
  • HW page-protection schemes are employed to “detect” when this happens.
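The primary/shadow coherence mechanism described on the last two slides can be sketched as a toy model. The `TracedMemory` and `VMM` classes, the addresses, and the fault mechanics are all invented for illustration; a real MMU write-protects whole pages rather than individual addresses.

```python
# Sketch of shadow structures kept coherent via memory traces. The guest's
# primary page table lives in "guest memory"; the VMM write-protects traced
# locations so guest STOREs to them fault, letting the VMM propagate the
# change into its shadow copy.

class ProtectionFault(Exception):
    pass

class TracedMemory:
    def __init__(self):
        self.pages = {}        # guest address -> value (primary structures)
        self.traced = set()    # addresses under a write trace

    def store(self, addr, value):
        if addr in self.traced:
            raise ProtectionFault(addr)   # HW page protection fires
        self.pages[addr] = value

class VMM:
    def __init__(self, mem):
        self.mem = mem
        self.shadow = {}       # VMM-level shadow copy

    def trace(self, addr):
        self.mem.traced.add(addr)

    def guest_store(self, addr, value):
        try:
            self.mem.store(addr, value)
        except ProtectionFault:
            # Emulate the guest's write, then refresh the shadow entry,
            # keeping primary and shadow structures coherent.
            self.mem.pages[addr] = value
            self.shadow[addr] = value

mem = TracedMemory()
vmm = VMM(mem)
vmm.trace(0x1000)                      # primary page-table entry at 0x1000
vmm.guest_store(0x1000, "PTE->frame7")
print(mem.pages[0x1000] == vmm.shadow[0x1000])  # → True
```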
Refinements to Classical Virtualization
• Traps are expensive!
• Improve the guest/VMM interface:
  • AKA paravirtualization.
  • Allows higher-level information to be passed to the VMM.
  • Can provide features beyond the baseline of “classic” virtualization.
• Improve the VMM/HW interface:
  • IBM’s System/370 Interpretive Execution mode.
  • Guests are allowed safe, direct access to certain pieces of privileged information w/o trapping.
Software VMM
• x86 is not “classically” virtualizable.
• Visibility of privileged state:
  • e.g., a guest can observe its privilege level via the unprotected %cs register.
• Not all sensitive instructions trap:
  • e.g., privileged execution of the popf (pop flags) instruction modifies on-chip privileged state.
  • Unprivileged execution must trap so that the VMM can emulate its effects.
  • Unfortunately, no trap occurs; the privileged update is silently dropped (a NO-OP).
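The popf problem can be illustrated with a toy model. This is a sketch of the behavior the slide describes, not real x86 flag semantics: only two hypothetical flag bits are modeled, and real popf involves IOPL and many more flags.

```python
# Toy illustration of a sensitive-but-unprivileged instruction. At CPL 0,
# popf updates the interrupt-enable flag (IF); at CPL 3 the same instruction
# silently drops that update instead of trapping, so a deprivileging VMM
# never gets a chance to intervene and emulate it.

def popf(flags, new_value, cpl):
    """Sketch of popf: IF is privileged state, ZF is not."""
    flags["ZF"] = new_value["ZF"]      # unprivileged bits always update
    if cpl == 0:
        flags["IF"] = new_value["IF"]  # privileged bit updates at CPL 0
    # At CPL > 0 the IF update is dropped silently -- no trap is raised.
    return flags

host = popf({"IF": True, "ZF": False}, {"IF": False, "ZF": True}, cpl=0)
guest = popf({"IF": True, "ZF": False}, {"IF": False, "ZF": True}, cpl=3)
print(host["IF"], guest["IF"])   # → False True
```

A deprivileged guest that executes popf thus behaves differently from the same code at CPL 0, violating fidelity, and since no trap occurs the VMM cannot fix it up.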
Software VMM
• How can x86’s faults be overcome?
• What if guests execute on an interpreter?
• The interpreter can…
  • Prevent leakage of privileged state.
  • Ensure that all sensitive instructions are correctly detected.
• Therefore it can provide…
  • Fidelity.
  • Safety.
  • Performance??
Interpreter-Based Software VMM
• Authors’ statement:
  • An interpreter-based VMM will not provide adequate performance.
  • A single native x86 instruction will take N instructions to interpret.
• Question: is this necessarily true?
• Authors’ solution: binary translation (BT).
Properties of This BT
• Dynamic and on-demand
  • Run-time translation is interleaved with code execution.
  • Code is translated only when it is about to execute.
  • Laziness avoids the problem of distinguishing code from data.
• System-level
  • All translation rules are set by the x86 ISA.
• Subsetting
  • Input is the x86 ISA; output is a “safe” subset of the ISA.
  • Mostly user-mode instructions.
• Adaptive
  • Generated code can be optimized over time.
BT Process
• Input a TU (Translation Unit), stopping at either:
  • 12 instructions, or
  • a terminating instruction (usually control flow).
• Translate the TU into a CCF (Compiled Code Fragment).
• Place the generated CCF into the TC (Translation Cache).
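The TU → CCF → TC pipeline above can be sketched as follows. The instruction names and the sensitive/control-flow sets are illustrative; a real translator works on x86 machine code, and most translations are IDENT (identical) copies.

```python
# Toy sketch of the BT pipeline: carve a Translation Unit (up to 12
# instructions or a terminating control-flow instruction), translate it
# into a Compiled Code Fragment, and cache the CCF in the Translation
# Cache keyed by guest address.

CONTROL_FLOW = {"jmp", "call", "ret"}
SENSITIVE = {"popf", "cli", "sti"}

def carve_tu(code, start):
    """Return (instructions, next_addr) for one Translation Unit."""
    tu, addr = [], start
    while addr < len(code) and len(tu) < 12:
        insn = code[addr]
        tu.append(insn)
        addr += 1
        if insn in CONTROL_FLOW:       # terminating instruction ends the TU
            break
    return tu, addr

def translate(tu):
    """Most instructions pass through identically (IDENT translation);
    sensitive ones are rewritten as call-outs into the VMM."""
    return [f"callout_{i}" if i in SENSITIVE else i for i in tu]

tc = {}                                 # Translation Cache: addr -> CCF

def run(code, addr=0):
    executed = []
    while addr < len(code):
        if addr not in tc:              # translate lazily, on demand
            tu, nxt = carve_tu(code, addr)
            tc[addr] = (translate(tu), nxt)
        ccf, addr = tc[addr]
        executed.extend(ccf)
    return executed

print(run(["mov", "popf", "add", "jmp", "sub"]))
# → ['mov', 'callout_popf', 'add', 'jmp', 'sub']
```

Note the laziness: nothing is translated until execution reaches it, which is how the real translator sidesteps the code-versus-data problem mentioned on the previous slide.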
BT Process
• CCFs must be chained together to form a “complete” program.
• Each CCF ends in a continuation that acts as a link.
• Continuations are evaluated at run time:
  • They can be translated into jumps.
  • They can be “removed” (code merely falls through to the next CCF).
• If a continuation is never “hit”, it is never translated.
  • Thus, the BT acts like a just-in-time compiler.
• The software VMM can switch between BT mode and direct execution.
  • Performance optimization.
Adaptive BT
• Traps are expensive.
• BT can avoid some traps (e.g., the rdtsc instruction):
  • TC emulation << call-out-and-emulate << trap-and-emulate (in cost).
• Sensitive non-privileged instructions are harder to avoid:
  • e.g., LOADs/STOREs to privileged data.
  • Use adaptive BT to re-work the code.
Adaptive BT
• Detect instructions that trap frequently.
• Adapt the translation of these instructions:
  • Re-translate to avoid trapping.
  • Either jump directly to a translation, or call out to the interpreter.
• Adaptive BT eliminates more and more traps over time.
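A minimal sketch of the adaptation policy: the threshold and the cost numbers below are invented for illustration, not measurements from the paper.

```python
# Sketch of adaptive BT. The translator first emits an IDENT translation
# for a sensitive LOAD/STORE; when that translation traps repeatedly at
# run time, it is re-translated into a call-out that emulates inside the
# TC without trapping.

THRESHOLD = 3
trap_counts = {}
translations = {}     # guest addr -> "ident" | "callout"

def execute(addr):
    """Return the cost of executing the translation at addr (toy units)."""
    mode = translations.setdefault(addr, "ident")
    if mode == "ident":
        # IDENT translation of a sensitive access: traps into the VMM.
        trap_counts[addr] = trap_counts.get(addr, 0) + 1
        if trap_counts[addr] >= THRESHOLD:
            translations[addr] = "callout"   # adapt: re-translate
        return 1000                           # trap-and-emulate is expensive
    return 10                                 # in-TC emulation is cheap

costs = [execute(0x40) for _ in range(5)]
print(costs)   # → [1000, 1000, 1000, 10, 10]
```

The point of the sketch is the shape of the curve: a hot trapping site pays the trap cost only a bounded number of times before its translation is adapted.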
Hardware VMM
• Experimental VMM based on new x86 virtualization extensions:
  • AMD’s SVM & Intel’s VT.
• New HW features:
  • Virtual Machine Control Blocks (VMCBs).
  • Guest-mode privilege level.
  • Ability to transfer control to/from guest mode:
    • vmrun: host to guest.
    • exit: guest to host.
Hardware VMM
• The VMM executes vmrun to start a guest.
  • Guest state is loaded into HW from the in-memory VMCB.
  • Guest mode is entered and the guest continues execution.
• Guests execute until they perform an operation flagged in the VMCB’s control bits.
  • An exit operation occurs.
  • Guest state is saved to the VMCB.
  • VMM state is loaded into HW; the CPU switches to host mode.
  • The VMM begins executing.
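The vmrun/exit world-switch loop can be sketched as a toy model. The VMCB fields, the exit conditions, and the instruction stream below are all illustrative; they are not the real SVM/VT formats.

```python
# Toy model of the hardware VMM control loop: vmrun loads guest state from
# the in-memory VMCB and runs the guest until an instruction matching an
# exit condition set in the VMCB; the exit saves guest state back to the
# VMCB and returns control to the VMM, which emulates and resumes.

def vmrun(vmcb, guest_code):
    """Run the guest from vmcb['rip'] until an exiting instruction."""
    rip = vmcb["rip"]                       # load guest state from VMCB
    while rip < len(guest_code):
        insn = guest_code[rip]
        if insn in vmcb["exit_on"]:         # control bit says: exit to host
            vmcb["rip"] = rip               # save guest state to VMCB
            vmcb["exit_reason"] = insn
            return "exit"
        rip += 1                            # direct execution, no VMM work
    vmcb["rip"] = rip
    return "done"

def vmm_loop(guest_code):
    vmcb = {"rip": 0, "exit_on": {"cpuid", "io"}, "exit_reason": None}
    exits = []
    while True:
        if vmrun(vmcb, guest_code) == "done":
            return exits
        exits.append(vmcb["exit_reason"])   # VMM decodes and emulates...
        vmcb["rip"] += 1                    # ...then skips the instruction
                                            # and resumes the guest

print(vmm_loop(["mov", "cpuid", "add", "io", "ret"]))  # → ['cpuid', 'io']
```

Note that every round trip goes through the in-memory VMCB, which is the “stateless” memory bottleneck criticized later in the talk.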
x86 Architecture Extensions
Qualitative Comparison
• Software wins in…
  • Trap elimination via adaptive BT.
    • HW merely replaces traps with exits.
  • Emulation speed.
    • Translations and call-outs essentially jump to pre-decoded emulation routines.
    • The HW VMM must fetch the VMCB and decode the trapping instruction before emulating.
Qualitative Comparison
• Hardware wins in…
  • Code density.
    • No translation = no replicated code segments.
  • Precise exceptions.
    • The BT approach must perform extra work to recover guest state for faults and interrupts.
    • The HW approach can just examine the VMCB.
  • System calls.
    • [Can] run w/o VMM intervention.
Qualitative Comparison (Summary)
• Hardware VMMs…
  • Native performance for anything that avoids exits.
  • However, exits are still costly (currently).
  • Strongly targeted towards the “trap-and-emulate” style.
• Software VMMs…
  • Carefully engineered to be efficient.
  • Flexible (because they aren’t HW).
Experiments
• 3.8 GHz Intel Pentium 4.
  • HT disabled (because most virtualization products can’t handle it).
• The contenders…
  • A mature commercial software VMM.
  • A recently developed hardware VMM.
• A fair battle?
SPECint & SPECjbb
• Primarily user-level computation.
  • Unaffected by VMMs.
  • Therefore, performance should be near native.
• Experimental results confirm this:
  • 4% average slowdown for the software VMM.
  • 5% average slowdown for the hardware VMM.
  • The cause is “host background activity”.
• Windows jiffy rate << Linux jiffy rate:
  • The Windows test is closer to native than the Linux test.
Apache ab Benchmark
• Tests I/O efficiency.
  • The SW VMM (and HW VMM?) uses the host as the I/O controller.
  • Therefore, ~2x the overhead of normal I/O.
• Experimental results confirm this:
  • ~2x slowdown; both the HW and SW VMMs perform poorly.
• The Windows and Linux tests differ widely:
  • Windows: a single process (less paging); the HW VMM is better.
  • Linux: multiple processes (more paging); the SW VMM is better.
  • Why? (Hint: VMCB.)
PassMark Benchmarks
• A synthetic suite of microbenchmarks used to pinpoint various aspects of workstation performance.
• Large RAM test: exhausts memory.
  • Intended to test paging capability.
  • The SW VMM wins.
• 2D Graphics test: hits system calls.
  • The HW VMM wins.
Compile Jobs Test
• A “less synthetic” test.
  • Compilation time of the Linux kernel, Apache, etc.
• The SW VMM beats the HW VMM again.
  • A big compilation job with lots of files = lots of page faults.
  • The SW VMM handles these better than the HW VMM.
• Compared to native speed…
  • The SW VMM is ~60% as fast.
  • The HW VMM is ~55% as fast.
ForkWait Test
• A test to stress process creation/destruction.
  • System calls, context switching, page table modifications, page faults, etc.
• Native = 6.0 seconds.
• SW VMM = 36.9 seconds.
• HW VMM = 106.4 seconds.
Nanobenchmarks
• Tests that exercise single “virtualization-sensitive” operations.
• All tests are conducted using a specially developed guest OS: FrobOS.
Nanobenchmarks
• syscall (Native == HW << SW)
  • The HW VMM doesn’t intervene.
  • The SW VMM traps.
• in (SW << Native << HW)
  • Native goes off-chip.
  • The SW VMM interacts with a virtual CPU model.
  • The HW VMM intervenes.
• ptemod (Native << SW << HW)
  • Both take a hit (both use shadowing).
  • The SW VMM can adapt, but is still far from ideal.
  • The HW VMM can’t adapt, so it must always pay an exit/vmrun round trip.
Analysis of Results
• The SW and HW VMMs are “even” except…
  • when BT adaptation helps.
  • e.g., page-table faults vs. exit/vmrun round trips.
• The authors claim that “we have found few workloads that benefit from current HW extensions”.
• BUT… HW extensions are getting faster all the time.
  • Still, the “stateless” HW VMM approach has a memory bottleneck in VMCB access!
• The trouble with the HW VMM is MMU virtualization.
  • A HW-assisted MMU could relieve the VMM of a lot of work!
  • Being proposed by both AMD and Intel.
Future/Related Work
• CISC/RISC?
  • Should the HW be more complex to support virtualization?
  • Or should a complex SW VMM be used?
• Open source?
  • Open-source OS code allows for paravirtualization.
  • What should the OS/VMM interface be?
    • It should be investigated, standardized, documented, and, most importantly, SUPPORTED!
  • What should the OS/HW interface be?
    • This should be investigated as well!
Conclusions
• Hardware extensions now allow x86 to execute guests directly (trap-and-emulate style).
• Comparison of SW and HW VMMs:
  • Both execute computation-bound workloads at near-native speed.
  • When I/O and process management are involved, SW prevails.
  • When there are a lot of system calls, HW prevails.
Conclusions
• SW VMM techniques are very mature, and also very flexible.
• The new x86 extensions are relatively immature and present a fixed (inflexible) interface.
• Future work on HW extensions promises to improve performance.
• Hybrid SW/HW VMMs promise to provide the benefits of both worlds.
• There is no “clear” winner at this time.
Questions?
• Reference:
  • K. Adams and O. Agesen (2006). A comparison of software and hardware techniques for x86 virtualization. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XII). ACM Press, New York, NY, 2–13.