the turtles project: design and implementation of …...mode, linux/kvm) to be able to run other...

63
The Turtles Project: Design and Implementation of Nested Virtualization Muli Ben-Yehuda Michael D. Day Zvi Dubitzky Michael Factor Nadav Har’El Abel Gordon Anthony Liguori Orit Wasserman Ben-Ami Yassour IBM Research – Haifa IBM Linux Technology Center Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 1 / 22

Upload: others

Post on 07-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

The Turtles Project:Design and Implementation of Nested Virtualization

Muli Ben-Yehuda† Michael D. Day‡ Zvi Dubitzky† Michael Factor†

Nadav Har’El† Abel Gordon† Anthony Liguori‡ Orit Wasserman†

Ben-Ami Yassour†

†IBM Research – Haifa

‡IBM Linux Technology Center

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 1 / 22

Page 2: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

What is nested x86 virtualization?

Running multiple unmodifiedhypervisorsWith their associatedunmodified VM’sSimultaneouslyOn the x86 architectureWhich does not supportnesting in hardware. . .. . . but does support a singlelevel of virtualization Hardware

Hypervisor

GuestHypervisor

GuestOS

GuestOSGuestOS

GuestOS

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 2 / 22

Page 3: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Why?

Operating systems are already hypervisors (Windows 7 with XPmode, Linux/KVM)To be able to run other hypervisors in cloudsSecurity (e.g., hypervisor-level rootkits)Co-design of x86 hardware and system softwareTesting, demonstrating, debugging, live migration of hypervisors

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 3 / 22

Page 4: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Why?

Operating systems are already hypervisors (Windows 7 with XPmode, Linux/KVM)To be able to run other hypervisors in cloudsSecurity (e.g., hypervisor-level rootkits)Co-design of x86 hardware and system softwareTesting, demonstrating, debugging, live migration of hypervisors

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 3 / 22

Page 5: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Why?

Operating systems are already hypervisors (Windows 7 with XPmode, Linux/KVM)To be able to run other hypervisors in cloudsSecurity (e.g., hypervisor-level rootkits)Co-design of x86 hardware and system softwareTesting, demonstrating, debugging, live migration of hypervisors

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 3 / 22

Page 6: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Why?

Operating systems are already hypervisors (Windows 7 with XPmode, Linux/KVM)To be able to run other hypervisors in cloudsSecurity (e.g., hypervisor-level rootkits)Co-design of x86 hardware and system softwareTesting, demonstrating, debugging, live migration of hypervisors

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 3 / 22

Page 7: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Why?

Operating systems are already hypervisors (Windows 7 with XPmode, Linux/KVM)To be able to run other hypervisors in cloudsSecurity (e.g., hypervisor-level rootkits)Co-design of x86 hardware and system softwareTesting, demonstrating, debugging, live migration of hypervisors

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 3 / 22

Page 8: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Related work

First models for nested virtualization [PopekGoldberg74,BelpaireHsu75, LauerWyeth73]First implementation in the IBM z/VM; relies on architecturalsupport for nested virtualization (sie)Microkernels meet recursive VMs [FordHibler96]: assumes wecan modify software at all levelsx86 software based approaches (slow!) [Berghmans10]KVM [KivityKamay07] with AMD SVM [RoedelGraf09]Early Xen prototype [He09]Blue Pill rootkit hiding from other hypervisors [Rutkowska06]

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 4 / 22

Page 9: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Related work

First models for nested virtualization [PopekGoldberg74,BelpaireHsu75, LauerWyeth73]First implementation in the IBM z/VM; relies on architecturalsupport for nested virtualization (sie)Microkernels meet recursive VMs [FordHibler96]: assumes wecan modify software at all levelsx86 software based approaches (slow!) [Berghmans10]KVM [KivityKamay07] with AMD SVM [RoedelGraf09]Early Xen prototype [He09]Blue Pill rootkit hiding from other hypervisors [Rutkowska06]

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 4 / 22

Page 10: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Related work

First models for nested virtualization [PopekGoldberg74,BelpaireHsu75, LauerWyeth73]First implementation in the IBM z/VM; relies on architecturalsupport for nested virtualization (sie)Microkernels meet recursive VMs [FordHibler96]: assumes wecan modify software at all levelsx86 software based approaches (slow!) [Berghmans10]KVM [KivityKamay07] with AMD SVM [RoedelGraf09]Early Xen prototype [He09]Blue Pill rootkit hiding from other hypervisors [Rutkowska06]

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 4 / 22

Page 11: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Related work

First models for nested virtualization [PopekGoldberg74,BelpaireHsu75, LauerWyeth73]First implementation in the IBM z/VM; relies on architecturalsupport for nested virtualization (sie)Microkernels meet recursive VMs [FordHibler96]: assumes wecan modify software at all levelsx86 software based approaches (slow!) [Berghmans10]KVM [KivityKamay07] with AMD SVM [RoedelGraf09]Early Xen prototype [He09]Blue Pill rootkit hiding from other hypervisors [Rutkowska06]

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 4 / 22

Page 12: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Related work

First models for nested virtualization [PopekGoldberg74,BelpaireHsu75, LauerWyeth73]First implementation in the IBM z/VM; relies on architecturalsupport for nested virtualization (sie)Microkernels meet recursive VMs [FordHibler96]: assumes wecan modify software at all levelsx86 software based approaches (slow!) [Berghmans10]KVM [KivityKamay07] with AMD SVM [RoedelGraf09]Early Xen prototype [He09]Blue Pill rootkit hiding from other hypervisors [Rutkowska06]

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 4 / 22

Page 13: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Related work

First models for nested virtualization [PopekGoldberg74,BelpaireHsu75, LauerWyeth73]First implementation in the IBM z/VM; relies on architecturalsupport for nested virtualization (sie)Microkernels meet recursive VMs [FordHibler96]: assumes wecan modify software at all levelsx86 software based approaches (slow!) [Berghmans10]KVM [KivityKamay07] with AMD SVM [RoedelGraf09]Early Xen prototype [He09]Blue Pill rootkit hiding from other hypervisors [Rutkowska06]

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 4 / 22

Page 14: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Related work

First models for nested virtualization [PopekGoldberg74,BelpaireHsu75, LauerWyeth73]First implementation in the IBM z/VM; relies on architecturalsupport for nested virtualization (sie)Microkernels meet recursive VMs [FordHibler96]: assumes wecan modify software at all levelsx86 software based approaches (slow!) [Berghmans10]KVM [KivityKamay07] with AMD SVM [RoedelGraf09]Early Xen prototype [He09]Blue Pill rootkit hiding from other hypervisors [Rutkowska06]

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 4 / 22

Page 15: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

What is the Turtles project?

Efficient nested virtualization for Intel x86 based on KVMMultiple guest hypervisors and VMs: VMware, Windows, . . .Code publicly available

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 5 / 22

Page 16: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

What is the Turtles project? (cont’)

Nested VMX virtualization for nested CPU virtualizationMulti-dimensional paging for nested MMU virtualizationMulti-level device assignment for nested I/O virtualizationMicro-optimizations to make it go fast (see paper)

+ + =

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 6 / 22

Page 17: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

What is the Turtles project? (cont’)

Nested VMX virtualization for nested CPU virtualizationMulti-dimensional paging for nested MMU virtualizationMulti-level device assignment for nested I/O virtualizationMicro-optimizations to make it go fast (see paper)

+ + =

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 6 / 22

Page 18: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

What is the Turtles project? (cont’)

Nested VMX virtualization for nested CPU virtualizationMulti-dimensional paging for nested MMU virtualizationMulti-level device assignment for nested I/O virtualizationMicro-optimizations to make it go fast (see paper)

+ + =

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 6 / 22

Page 19: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

What is the Turtles project? (cont’)

Nested VMX virtualization for nested CPU virtualizationMulti-dimensional paging for nested MMU virtualizationMulti-level device assignment for nested I/O virtualizationMicro-optimizations to make it go fast (see paper)

+ + =

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 6 / 22

Page 20: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Theory of nested CPU virtualization

Single-level architectural support (x86) vs. multi-level architecturalsupport (e.g., z/VM)Single level ⇒ one hypervisor, many guestsTurtles approach: L0 multiplexes the hardware between L1 and L2,running both as guests of L0—without either being aware of it(Scheme generalized for n levels; Our focus is n=2)

Hardware

Host Hypervisor

Guest

Hardware

Host Hypervisor

Multiplexed on a single level Multiple logical levels

L0

L1

L2

L1

GuestL2

GuestL2

L0

GuestL2L2

Guest Hypervisor

GuestHypervisor GuestGuest

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 7 / 22

Page 21: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Theory of nested CPU virtualization

Single-level architectural support (x86) vs. multi-level architecturalsupport (e.g., z/VM)Single level ⇒ one hypervisor, many guestsTurtles approach: L0 multiplexes the hardware between L1 and L2,running both as guests of L0—without either being aware of it(Scheme generalized for n levels; Our focus is n=2)

Hardware

Host Hypervisor

Guest

Hardware

Host Hypervisor

Multiplexed on a single level Multiple logical levels

L0

L1

L2

L1

GuestL2

GuestL2

L0

GuestL2L2

Guest Hypervisor

GuestHypervisor GuestGuest

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 7 / 22

Page 22: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Theory of nested CPU virtualization

Single-level architectural support (x86) vs. multi-level architecturalsupport (e.g., z/VM)Single level ⇒ one hypervisor, many guestsTurtles approach: L0 multiplexes the hardware between L1 and L2,running both as guests of L0—without either being aware of it(Scheme generalized for n levels; Our focus is n=2)

Hardware

Host Hypervisor

Guest

Hardware

Host Hypervisor

Multiplexed on a single level Multiple logical levels

L0

L1

L2

L1

GuestL2

GuestL2

L0

GuestL2L2

Guest Hypervisor

GuestHypervisor GuestGuest

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 7 / 22

Page 23: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Theory of nested CPU virtualization

Single-level architectural support (x86) vs. multi-level architecturalsupport (e.g., z/VM)Single level ⇒ one hypervisor, many guestsTurtles approach: L0 multiplexes the hardware between L1 and L2,running both as guests of L0—without either being aware of it(Scheme generalized for n levels; Our focus is n=2)

Hardware

Host Hypervisor

Guest

Hardware

Host Hypervisor

Multiplexed on a single level Multiple logical levels

L0

L1

L2

L1

GuestL2

GuestL2

L0

GuestL2L2

Guest Hypervisor

GuestHypervisor GuestGuest

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 7 / 22

Page 24: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Nested VMX virtualization: flow

L0 runs L1 with VMCS0→1

L1 prepares VMCS1→2 andexecutes vmlaunchvmlaunch traps to L0

L0 merges VMCS’s:VMCS0→1 merged withVMCS1→2 is VMCS0→2

L0 launches L2

L2 causes a trapL0 handles trap itself orforwards it to L1

. . .eventually, L0 resumes L2

repeat

Hardware

Host Hypervisor

GuestOS

VMCS

MemoryTables

VMCS

MemoryTables

VMCSMemoryTables

L0

L1 L2

1-2 State

0-2 State0-1 State

GuestHypervisor

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22

Page 25: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Nested VMX virtualization: flow

L0 runs L1 with VMCS0→1

L1 prepares VMCS1→2 andexecutes vmlaunchvmlaunch traps to L0

L0 merges VMCS’s:VMCS0→1 merged withVMCS1→2 is VMCS0→2

L0 launches L2

L2 causes a trapL0 handles trap itself orforwards it to L1

. . .eventually, L0 resumes L2

repeat

Hardware

Host Hypervisor

GuestOS

VMCS

MemoryTables

VMCS

MemoryTables

VMCSMemoryTables

L0

L1 L2

1-2 State

0-2 State0-1 State

GuestHypervisor

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22

Page 26: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Nested VMX virtualization: flow

L0 runs L1 with VMCS0→1

L1 prepares VMCS1→2 andexecutes vmlaunchvmlaunch traps to L0

L0 merges VMCS’s:VMCS0→1 merged withVMCS1→2 is VMCS0→2

L0 launches L2

L2 causes a trapL0 handles trap itself orforwards it to L1

. . .eventually, L0 resumes L2

repeat

Hardware

Host Hypervisor

GuestOS

VMCS

MemoryTables

VMCS

MemoryTables

VMCSMemoryTables

L0

L1 L2

1-2 State

0-2 State0-1 State

GuestHypervisor

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22

Page 27: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Nested VMX virtualization: flow

L0 runs L1 with VMCS0→1

L1 prepares VMCS1→2 andexecutes vmlaunchvmlaunch traps to L0

L0 merges VMCS’s:VMCS0→1 merged withVMCS1→2 is VMCS0→2

L0 launches L2

L2 causes a trapL0 handles trap itself orforwards it to L1

. . .eventually, L0 resumes L2

repeat

Hardware

Host Hypervisor

GuestOS

VMCS

MemoryTables

VMCS

MemoryTables

VMCSMemoryTables

L0

L1 L2

1-2 State

0-2 State0-1 State

GuestHypervisor

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22

Page 28: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Nested VMX virtualization: flow

L0 runs L1 with VMCS0→1

L1 prepares VMCS1→2 andexecutes vmlaunchvmlaunch traps to L0

L0 merges VMCS’s:VMCS0→1 merged withVMCS1→2 is VMCS0→2

L0 launches L2

L2 causes a trapL0 handles trap itself orforwards it to L1

. . .eventually, L0 resumes L2

repeat

Hardware

Host Hypervisor

GuestOS

VMCS

MemoryTables

VMCS

MemoryTables

VMCSMemoryTables

L0

L1 L2

1-2 State

0-2 State0-1 State

GuestHypervisor

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22

Page 29: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Nested VMX virtualization: flow

L0 runs L1 with VMCS0→1

L1 prepares VMCS1→2 andexecutes vmlaunchvmlaunch traps to L0

L0 merges VMCS’s:VMCS0→1 merged withVMCS1→2 is VMCS0→2

L0 launches L2

L2 causes a trapL0 handles trap itself orforwards it to L1

. . .eventually, L0 resumes L2

repeat

Hardware

Host Hypervisor

GuestOS

VMCS

MemoryTables

VMCS

MemoryTables

VMCSMemoryTables

L0

L1 L2

1-2 State

0-2 State0-1 State

GuestHypervisor

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22

Page 30: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Nested VMX virtualization: flow

L0 runs L1 with VMCS0→1

L1 prepares VMCS1→2 andexecutes vmlaunchvmlaunch traps to L0

L0 merges VMCS’s:VMCS0→1 merged withVMCS1→2 is VMCS0→2

L0 launches L2

L2 causes a trapL0 handles trap itself orforwards it to L1

. . .eventually, L0 resumes L2

repeat

Hardware

Host Hypervisor

GuestOS

VMCS

MemoryTables

VMCS

MemoryTables

VMCSMemoryTables

L0

L1 L2

1-2 State

0-2 State0-1 State

GuestHypervisor

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22

Page 31: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Nested VMX virtualization: flow

L0 runs L1 with VMCS0→1

L1 prepares VMCS1→2 andexecutes vmlaunchvmlaunch traps to L0

L0 merges VMCS’s:VMCS0→1 merged withVMCS1→2 is VMCS0→2

L0 launches L2

L2 causes a trapL0 handles trap itself orforwards it to L1

. . .eventually, L0 resumes L2

repeat

Hardware

Host Hypervisor

GuestOS

VMCS

MemoryTables

VMCS

MemoryTables

VMCSMemoryTables

L0

L1 L2

1-2 State

0-2 State0-1 State

GuestHypervisor

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22

Page 32: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Nested VMX virtualization: flow

L0 runs L1 with VMCS0→1

L1 prepares VMCS1→2 andexecutes vmlaunchvmlaunch traps to L0

L0 merges VMCS’s:VMCS0→1 merged withVMCS1→2 is VMCS0→2

L0 launches L2

L2 causes a trapL0 handles trap itself orforwards it to L1

. . .eventually, L0 resumes L2

repeat

Hardware

Host Hypervisor

GuestOS

VMCS

MemoryTables

VMCS

MemoryTables

VMCSMemoryTables

L0

L1 L2

1-2 State

0-2 State0-1 State

GuestHypervisor

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22

Page 33: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Nested VMX virtualization: flow

L0 runs L1 with VMCS0→1

L1 prepares VMCS1→2 andexecutes vmlaunchvmlaunch traps to L0

L0 merges VMCS’s:VMCS0→1 merged withVMCS1→2 is VMCS0→2

L0 launches L2

L2 causes a trapL0 handles trap itself orforwards it to L1

. . .eventually, L0 resumes L2

repeat

Hardware

Host Hypervisor

GuestOS

VMCS

MemoryTables

VMCS

MemoryTables

VMCSMemoryTables

L0

L1 L2

1-2 State

0-2 State0-1 State

GuestHypervisor

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 8 / 22

Page 34: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Exit multiplication makes angry turtle angry

To handle a single L2 exit, L1 does many things: read and writethe VMCS, disable interrupts, . . .Those operations can trap, leading to exit multiplicationExit multiplication: a single L2 exit can cause 40-50 L1 exits!Optimize: make a single exit fast and reduce frequency of exits

……

L0

L1

L2

L3

Two Levels

Three LevelsSingle Level

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 9 / 22

Page 35: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Exit multiplication makes angry turtle angry

To handle a single L2 exit, L1 does many things: read and writethe VMCS, disable interrupts, . . .Those operations can trap, leading to exit multiplicationExit multiplication: a single L2 exit can cause 40-50 L1 exits!Optimize: make a single exit fast and reduce frequency of exits

……

L0

L1

L2

L3

Two Levels

Three LevelsSingle Level

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 9 / 22

Page 36: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Exit multiplication makes angry turtle angry

To handle a single L2 exit, L1 does many things: read and writethe VMCS, disable interrupts, . . .Those operations can trap, leading to exit multiplicationExit multiplication: a single L2 exit can cause 40-50 L1 exits!Optimize: make a single exit fast and reduce frequency of exits

……

L0

L1

L2

L3

Two Levels

Three LevelsSingle Level

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 9 / 22

Page 37: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Exit multiplication makes angry turtle angry

To handle a single L2 exit, L1 does many things: read and writethe VMCS, disable interrupts, . . .Those operations can trap, leading to exit multiplicationExit multiplication: a single L2 exit can cause 40-50 L1 exits!Optimize: make a single exit fast and reduce frequency of exits

……

L0

L1

L2

L3

Two Levels

Three LevelsSingle Level

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 9 / 22

Page 38: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

MMU virtualization via multi-dimensional paging

Three logical translations: L2 virt → phys, L2 → L1, L1 → L0

Only two tables in hardware with EPT:virt → phys and guest physical→ host physicalL0 compresses three logical translations onto two hardware tables

SPT12

L2 virtual

L2 physical

L1 physical

L0 physical

GPT

L2 virtual

L2 physical

L1 physical

L0 physical

GPT

SPT02

Shadow on top of shadow

SPT12

EPT01

L2 virtual

L2 physical

L1 physical

L0 physical

GPT

EPT02

EPT12

Multi-dimensional paging

Shadow on top of EPT

EPT01

baseline better best

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 10 / 22

Page 39: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Baseline: shadow-on-shadow

SPT12

L2 virtual

L2 physical

L1 physical

L0 physical

GPT

SPT02

Assume no EPT table; all hypervisors use shadow pagingUseful for old machines and as a baselineMaintaining shadow page tables is expensiveCompress: three logical translations⇒ one table in hardware

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 11 / 22

Page 40: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Better: shadow-on-EPT

L2 virtual

L2 physical

L1 physical

L0 physical

SPT12

EPT01

GPT

Instead of one hardware table we have twoCompress: three logical translations⇒ two in hardwareSimple approach: L0 uses EPT, L1 uses shadow paging for L2

Every L2 page fault leads to multiple L1 exits

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 12 / 22

Page 41: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Best: multi-dimensional pagingL2 virtual

L2 physical

L1 physical

L0 physical

GPT

EPT02

EPT12

EPT01

EPT table rarely changes; guest page table changes a lotAgain, compress three logical translations⇒ two in hardwareL0 emulates EPT for L1

L0 uses EPT0→1 and EPT1→2 to construct EPT0→2

End result: a lot less exits!Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 13 / 22

Page 42: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Introduction to I/O virtualization

Device emulation [Sugerman01]GUEST

HOST

1

2

34

deviceemulation

driverdevice

driverdevice

Para-virtualized drivers [Barham03, Russell08]GUEST

HOST

driver

1

23

back−end

virtualdriver

front−end

virtualdevicedriver

Direct device assignment [Levasseur04,Yassour08]GUEST

HOST

devicedriver

Direct assignment best performing optionDirect assignment requires IOMMU for safe DMA bypass

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 14 / 22

Page 43: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Introduction to I/O virtualization

Device emulation [Sugerman01]GUEST

HOST

1

2

34

deviceemulation

driverdevice

driverdevice

Para-virtualized drivers [Barham03, Russell08]GUEST

HOST

driver

1

23

back−end

virtualdriver

front−end

virtualdevicedriver

Direct device assignment [Levasseur04,Yassour08]GUEST

HOST

devicedriver

Direct assignment best performing optionDirect assignment requires IOMMU for safe DMA bypass

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 14 / 22

Page 44: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Introduction to I/O virtualization

Device emulation [Sugerman01]GUEST

HOST

1

2

34

deviceemulation

driverdevice

driverdevice

Para-virtualized drivers [Barham03, Russell08]GUEST

HOST

driver

1

23

back−end

virtualdriver

front−end

virtualdevicedriver

Direct device assignment [Levasseur04,Yassour08]GUEST

HOST

devicedriver

Direct assignment best performing optionDirect assignment requires IOMMU for safe DMA bypass

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 14 / 22

Page 45: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Introduction to I/O virtualization

Device emulation [Sugerman01]GUEST

HOST

1

2

34

deviceemulation

driverdevice

driverdevice

Para-virtualized drivers [Barham03, Russell08]GUEST

HOST

driver

1

23

back−end

virtualdriver

front−end

virtualdevicedriver

Direct device assignment [Levasseur04,Yassour08]GUEST

HOST

devicedriver

Direct assignment best performing option

Direct assignment requires IOMMU for safe DMA bypass

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 14 / 22

Page 46: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Introduction to I/O virtualization

Device emulation [Sugerman01]GUEST

HOST

1

2

34

deviceemulation

driverdevice

driverdevice

Para-virtualized drivers [Barham03, Russell08]GUEST

HOST

driver

1

23

back−end

virtualdriver

front−end

virtualdevicedriver

Direct device assignment [Levasseur04,Yassour08]GUEST

HOST

devicedriver

Direct assignment best performing optionDirect assignment requires IOMMU for safe DMA bypass

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 14 / 22

Page 47: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Multi-level device assignment

With nested 3x3 options for I/O virtualization (L2 ⇔ L1 ⇔ L0)Multi-level device assignment means giving an L2 guest directaccess to L0’s devices, safely bypassing both L0 and L1

L1 hypervisor

physical device

L0 hypervisor

L2 device driver

MMIOs and PIOs

L0 IOMMUL1 IOMMU

Device DMA via platform IOMMU

How? L0 emulates an IOMMU for L1 [Amit10]L0 compresses multiple IOMMU translations onto the singlehardware IOMMU page tableL2 programs the device directlyDevice DMA’s into L2 memory space directly

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22

Page 48: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Multi-level device assignment

With nested 3x3 options for I/O virtualization (L2 ⇔ L1 ⇔ L0)Multi-level device assignment means giving an L2 guest directaccess to L0’s devices, safely bypassing both L0 and L1

L1 hypervisor

physical device

L0 hypervisor

L2 device driver

MMIOs and PIOs

L0 IOMMUL1 IOMMU

Device DMA via platform IOMMU

How? L0 emulates an IOMMU for L1 [Amit10]L0 compresses multiple IOMMU translations onto the singlehardware IOMMU page tableL2 programs the device directlyDevice DMA’s into L2 memory space directly

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22

Page 49: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Multi-level device assignment

With nested 3x3 options for I/O virtualization (L2 ⇔ L1 ⇔ L0)Multi-level device assignment means giving an L2 guest directaccess to L0’s devices, safely bypassing both L0 and L1

L1 hypervisor

physical device

L0 hypervisor

L2 device driver

MMIOs and PIOs

L0 IOMMUL1 IOMMU

Device DMA via platform IOMMU

How? L0 emulates an IOMMU for L1 [Amit10]L0 compresses multiple IOMMU translations onto the singlehardware IOMMU page tableL2 programs the device directlyDevice DMA’s into L2 memory space directly

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22

Page 50: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Multi-level device assignment

With nested 3x3 options for I/O virtualization (L2 ⇔ L1 ⇔ L0)Multi-level device assignment means giving an L2 guest directaccess to L0’s devices, safely bypassing both L0 and L1

L1 hypervisor

physical device

L0 hypervisor

L2 device driver

MMIOs and PIOs

L0 IOMMUL1 IOMMU

Device DMA via platform IOMMU

How? L0 emulates an IOMMU for L1 [Amit10]L0 compresses multiple IOMMU translations onto the singlehardware IOMMU page tableL2 programs the device directlyDevice DMA’s into L2 memory space directly

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22

Page 51: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Multi-level device assignment

With nested 3x3 options for I/O virtualization (L2 ⇔ L1 ⇔ L0)Multi-level device assignment means giving an L2 guest directaccess to L0’s devices, safely bypassing both L0 and L1

L1 hypervisor

physical device

L0 hypervisor

L2 device driver

MMIOs and PIOs

L0 IOMMUL1 IOMMU

Device DMA via platform IOMMU

How? L0 emulates an IOMMU for L1 [Amit10]L0 compresses multiple IOMMU translations onto the singlehardware IOMMU page tableL2 programs the device directlyDevice DMA’s into L2 memory space directly

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22

Page 52: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Multi-level device assignment

With nested 3x3 options for I/O virtualization (L2 ⇔ L1 ⇔ L0)Multi-level device assignment means giving an L2 guest directaccess to L0’s devices, safely bypassing both L0 and L1

L1 hypervisor

physical device

L0 hypervisor

L2 device driver

MMIOs and PIOs

L0 IOMMUL1 IOMMU

Device DMA via platform IOMMU

How? L0 emulates an IOMMU for L1 [Amit10]L0 compresses multiple IOMMU translations onto the singlehardware IOMMU page tableL2 programs the device directlyDevice DMA’s into L2 memory space directly

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 15 / 22

Page 53: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Experimental Setup

Running Linux, Windows,KVM, VMware, SMP, . . .Macro workloads:

kernbenchSPECjbbnetperf

Multi-dimensional paging?Multi-level device assignment?KVM as L1 vs. VMware as L1?

See paper for full experimentaldetails, more benchmarks andanalysis, including worst casesynthetic micro-benchmark

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 16 / 22

Page 54: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Macro: SPECjbb and kernbench

kernbenchHost Guest Nested NestedDRW

Run time 324.3 355 406.3 391.5% overhead vs. host - 9.5 25.3 20.7% overhead vs. guest - - 14.5 10.3

SPECjbbHost Guest Nested NestedDRW

Score 90493 83599 77065 78347% degradation vs. host - 7.6 14.8 13.4% degradation vs. guest - - 7.8 6.3

Table: kernbench and SPECjbb results

Exit multiplication effect not as bad as we feared

Direct vmread and vmwrite (DRW) give an immediate boost

Take-away: each level of virtualization adds approximately the sameoverhead!

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 17 / 22

Page 55: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Macro: multi-dimensional paging

Shadow on EPTMulti−dimensional paging

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

kernbench specjbb netperf

Impr

ovem

ent r

atio

Impact of multi-dimensional paging depends on rate of page faultsShadow-on-EPT: every L2 page fault causes L1 multiple exitsMulti-dimensional paging: only EPT violations cause L1 exitsEPT table rarely changes: #(EPT violations) << #(page faults)Multi-dimensional paging huge win for page-fault intensivekernbench

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 18 / 22

Page 56: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Macro: multi-level device assignment

throughput (Mbps)%cpu

0

100

200

300

400

500

600

700

800

900

1,000

nativesingle level guest

emulation

single level guest

virtio

single level guest

direct access

nested guest emulation / emulation

nested guest virtio / emulation

nested guest virtio / virtio

nested guest direct / virtio

nested guest direct / direct

0

20

40

60

80

100

throug

hput (M

bps)

% cpu

Benchmark: netperf TCP_STREAM (transmit)Multi-level device assignment best performing optionBut: native at 20%, multi-level device assignment at 60% (x3!)Interrupts considered harmful, cause exit multiplication

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 19 / 22

Page 57: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Macro: multi-level device assignment (sans interrupts)

100

200

300

400

500

600

700

800

900

1000

16 32 64 128 256 512

Thro

ughp

ut (M

bps)

Message size (netperf -m)

L0 (bare metal)L2 (direct/direct)L2 (direct/virtio)

What if we could deliver device interrupts directly to L2?Only 7% difference between native and nested guest!

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 20 / 22

Page 58: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Conclusions

Efficient nested x86virtualization is challenging butfeasibleA whole new ballpark openingup many excitingapplications—security, cloud,architecture, . . .Current overhead of 6-14%

Negligible for someworkloads, not yet for othersWork in progress—expect atmost 5% eventually

Code is availableIt’s turtles all the way down

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 21 / 22

Page 59: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Conclusions

Efficient nested x86virtualization is challenging butfeasibleA whole new ballpark openingup many excitingapplications—security, cloud,architecture, . . .Current overhead of 6-14%

Negligible for someworkloads, not yet for othersWork in progress—expect atmost 5% eventually

Code is availableIt’s turtles all the way down

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 21 / 22

Page 60: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Conclusions

Efficient nested x86virtualization is challenging butfeasibleA whole new ballpark openingup many excitingapplications—security, cloud,architecture, . . .Current overhead of 6-14%

Negligible for someworkloads, not yet for othersWork in progress—expect atmost 5% eventually

Code is availableIt’s turtles all the way down

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 21 / 22

Page 61: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Conclusions

Efficient nested x86virtualization is challenging butfeasibleA whole new ballpark openingup many excitingapplications—security, cloud,architecture, . . .Current overhead of 6-14%

Negligible for someworkloads, not yet for othersWork in progress—expect atmost 5% eventually

Code is availableIt’s turtles all the way down

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 21 / 22

Page 62: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Conclusions

Efficient nested x86virtualization is challenging butfeasibleA whole new ballpark openingup many excitingapplications—security, cloud,architecture, . . .Current overhead of 6-14%

Negligible for someworkloads, not yet for othersWork in progress—expect atmost 5% eventually

Code is availableIt’s turtles all the way down

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 21 / 22

Page 63: The Turtles Project: Design and Implementation of …...mode, Linux/KVM) To be able to run other hypervisors inclouds Security (e.g., hypervisor-level rootkits) Co-design of x86 hardware

Questions?

Ben-Yehuda et al. (IBM Research) The Turtles Project: Nested Virtualization OSDI ’10 22 / 22