Lguest64 - A new breed of puppies

Glauber de Oliveira Costa
gcosta@redhat.com

Red Hat Inc.

January, 2008

The need for a 64-bit PV

- x86_64 PV not nearly as efficient as i386.
- Not strictly. But we wanted it (HVM enabled x86_64 hardware slightly more common)
- Where’s the hardware?
- Testbed for the pvops64 patch
- lguest64 - smp from the very beginning
- Ideas exported into lguest32 (For ex: get rid of the ugly elf loader)

x86_64 - Intrinsically more complicated!

- No segment limit protection
- swapgs all-in-one instruction
- syscall instruction always present
- syscalls bounce to hypervisor
- 4-level page tables
- Much room for code sharing, but hard in 2.6.22

No segment limit protection

- Forced to use page tables for protection
- lguest32 also benefited from it.

host2guest comm

3 pages (guest perspective):

- HV text - Executable
- guest ro area - the vcpu struct, Read Only
- guest scratch pad - mapped in the same virtual address for all vcpus, RW
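The three pages above can be sketched as a small permission table. This is only an illustration of the slide, not lguest64 code: the enum and function names are hypothetical, and marking the two data pages no-execute is an assumption (the slide only says "Executable", "Read Only" and "RW"). The bit positions are the architectural x86_64 page-table entry bits.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86_64 page-table entry bits. */
#define PTE_PRESENT (1ULL << 0)
#define PTE_RW      (1ULL << 1)
#define PTE_NX      (1ULL << 63)

/* Hypothetical names for the three pages listed on the slide. */
enum shared_page { HV_TEXT, GUEST_RO, GUEST_SCRATCH, NR_SHARED_PAGES };

/* Protection bits as seen by the guest (illustrative, NX use is assumed). */
static uint64_t shared_page_prot(enum shared_page p)
{
    switch (p) {
    case HV_TEXT:       return PTE_PRESENT;                   /* executable, not writable */
    case GUEST_RO:      return PTE_PRESENT | PTE_NX;          /* vcpu struct, read only */
    case GUEST_SCRATCH: return PTE_PRESENT | PTE_RW | PTE_NX; /* scratch pad, read-write */
    default:            return 0;                             /* not mapped */
    }
}

/* True if the guest may write to the page. */
static int guest_can_write(enum shared_page p)
{
    return (shared_page_prot(p) & PTE_RW) != 0;
}

/* True if the guest may execute from the page. */
static int guest_can_exec(enum shared_page p)
{
    uint64_t prot = shared_page_prot(p);
    return (prot & PTE_PRESENT) != 0 && (prot & PTE_NX) == 0;
}
```

Only the scratch pad is writable from the guest, which is why state the guest must update (irq state, for example) has to live there rather than in the vcpu struct.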

What do you mean?

Why map in the same virtual address?

Consider the code: (It’s guest code)

    ENTRY(lguest_iret)
            pushl   %eax
            movl    12(%esp), %eax
            movl    %eax,%ss:lguest_data+LGUEST_DATA_irq_enabled
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            popl    %eax
            iret

How do you know where to write? Userspace stack, userspace gs, etc.

No segment limit protection - Guest kernel

- When guest kernel runs: all rw pages can be touched.
- Map hypervisor (vcpu_data) RO (with a RW scratch pad - irq state, etc)

No segment limit protection - switcher

Hypervisor has a lot of updates to do → all of them have to happen before cr3 switch

No segment limit protection - userapp

When userspace app runs, no kernel pages are mapped.

Like this:

What does 32-bit do?

Communications

- Extended set of hypercalls over plain lguest
- setup hypercalls use int 0x80, switch to syscall ASAP.

syscall always present

and always goes to privilege 0!

- write msr at every run → no mess with userspace host apps
- guest kernel and guest userspace differentiate through a flag
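A rough simulation of the two points above, under stated assumptions. `MSR_LSTAR` (0xC0000082) is the architectural MSR holding the 64-bit syscall entry point; everything else here is hypothetical: the MSR is faked with a variable because `wrmsr` is privileged, and the routing rule (a syscall from the guest kernel is treated as a hypercall, one from guest userspace is bounced to the guest kernel) is one plausible reading of the slides, not something they spell out.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_LSTAR 0xC0000082u /* architectural: 64-bit syscall entry point */

/* Stand-in for the real MSR: wrmsr/rdmsr are privileged, so this is simulated. */
static uint64_t fake_lstar;

static void wrmsr_sim(uint32_t msr, uint64_t val)
{
    if (msr == MSR_LSTAR)
        fake_lstar = val;
}

static uint64_t rdmsr_sim(uint32_t msr)
{
    return msr == MSR_LSTAR ? fake_lstar : 0;
}

enum syscall_route { ROUTE_HYPERCALL, ROUTE_TO_GUEST_KERNEL };

/* Hypothetical per-vcpu state, not the real lguest64 layout. */
struct vcpu_sim {
    uint64_t hv_syscall_entry; /* hypervisor's syscall trampoline address */
    uint64_t seen_during_run;  /* entry point in effect while the guest ran */
    int in_guest_kernel;       /* the flag from the slide */
};

/* One guest run: install the hypervisor entry, "run", restore the host value. */
static void run_guest_once(struct vcpu_sim *v)
{
    uint64_t host_entry = rdmsr_sim(MSR_LSTAR);

    wrmsr_sim(MSR_LSTAR, v->hv_syscall_entry);
    v->seen_during_run = rdmsr_sim(MSR_LSTAR); /* the guest would execute here */
    wrmsr_sim(MSR_LSTAR, host_entry);          /* host apps see their own entry again */
}

/* Assumed routing: the flag tells the hypervisor who issued the syscall. */
static enum syscall_route route_syscall(const struct vcpu_sim *v)
{
    return v->in_guest_kernel ? ROUTE_HYPERCALL : ROUTE_TO_GUEST_KERNEL;
}
```

Because the host value is put back after every run, a syscall issued by an ordinary host process never lands in the hypervisor trampoline.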

swapgs

Before: Access to kernel data structures
After: Forget about it
(And the other way around too)

- Hard to call functions (stack is kernel data)
- We made pvops have a symbol that points to syscall after swapgs
- syscall handler trampoline goes straight there
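To make the "before/after" point concrete, here is a minimal simulation of what swapgs does architecturally: it exchanges the active GS base with the value held in the `MSR_KERNEL_GS_BASE` MSR (0xC0000102), taking no operands and using no stack. The struct and function names are hypothetical and the addresses in any usage are made up; this is a sketch of the instruction's effect, not of the pvops hook itself.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_KERNEL_GS_BASE 0xC0000102u /* architectural MSR number, for reference */

/* Hypothetical model of the two GS bases a CPU holds. */
struct cpu_sim {
    uint64_t gs_base;        /* base used by %gs-relative accesses right now */
    uint64_t kernel_gs_base; /* hidden value stored in MSR_KERNEL_GS_BASE */
};

/* swapgs: exchange the active GS base with the hidden one. */
static void swapgs_sim(struct cpu_sim *c)
{
    uint64_t tmp = c->gs_base;

    c->gs_base = c->kernel_gs_base;
    c->kernel_gs_base = tmp;
}
```

Once the swap has happened, %gs no longer reaches the previous data area until the instruction runs again, which is why a replacement for swapgs cannot simply be a function call that needs a kernel stack.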

x86_64 system call

#define SWAPGS_UNSAFE_STACK swapgs

ENTRY(system_call)
        SWAPGS_UNSAFE_STACK

ENTRY(system_call_after_swapgs)
        movq    %rsp,%gs:pda_oldrsp
        movq    %gs:pda_kernelstack,%rsp

4-level page tables

- The nastier one: page table updates have to find their corresponding pmd, pud, pgd.
- We keep a hash binding to upper level
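The two points above can be sketched as follows. The index split is the standard x86_64 layout for 4 KiB pages (9 bits per level, 12-bit page offset). The hash part is purely an assumption about what "a hash binding to upper level" might look like: the slides do not show the data structure, so the table size, struct fields and function names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86_64 4-level paging: 512 entries (9 bits) per level, 4 KiB pages. */
#define PAGE_SHIFT 12
#define LEVEL_BITS 9
#define LEVEL_MASK ((1u << LEVEL_BITS) - 1)

enum pt_level { PT_PTE = 0, PT_PMD = 1, PT_PUD = 2, PT_PGD = 3 };

/* Index into the table at the given level for a virtual address. */
static unsigned pt_index(uint64_t vaddr, enum pt_level level)
{
    unsigned shift = PAGE_SHIFT + LEVEL_BITS * (unsigned)level;

    return (unsigned)((vaddr >> shift) & LEVEL_MASK);
}

#define PARENT_BUCKETS 64 /* hypothetical size, must be a power of two */

/* Hypothetical record binding a lower-level table to its upper level. */
struct parent_link {
    uint64_t child_pfn;  /* frame holding the lower-level table */
    uint64_t parent_pfn; /* frame holding the upper-level table */
    unsigned parent_idx; /* slot in the parent that points at the child */
    int used;
};

static struct parent_link parent_map[PARENT_BUCKETS];

static unsigned hash_pfn(uint64_t pfn)
{
    return (unsigned)(pfn & (PARENT_BUCKETS - 1));
}

/* Record child -> parent with linear probing; 0 on success, -1 if full. */
static int bind_parent(uint64_t child_pfn, uint64_t parent_pfn, unsigned idx)
{
    unsigned start = hash_pfn(child_pfn);

    for (unsigned i = 0; i < PARENT_BUCKETS; i++) {
        struct parent_link *l = &parent_map[(start + i) % PARENT_BUCKETS];

        if (!l->used || l->child_pfn == child_pfn) {
            l->child_pfn = child_pfn;
            l->parent_pfn = parent_pfn;
            l->parent_idx = idx;
            l->used = 1;
            return 0;
        }
    }
    return -1;
}

/* Look up the upper-level binding for a table page, or NULL if unknown. */
static const struct parent_link *find_parent(uint64_t child_pfn)
{
    unsigned start = hash_pfn(child_pfn);

    for (unsigned i = 0; i < PARENT_BUCKETS; i++) {
        const struct parent_link *l = &parent_map[(start + i) % PARENT_BUCKETS];

        if (!l->used)
            return NULL;
        if (l->child_pfn == child_pfn)
            return l;
    }
    return NULL;
}
```

With such a binding, an update to a pte page can be followed upward (pte page to pmd, pmd page to pud, pud page to pgd) by repeated lookups instead of a walk from the top of the tree.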

Other features

- strong statistics
- NMI handling
- But features kill puppies, so not much more.

Current Status

- Long winter due to the need to get pvops64 upstream (x86 merge)
- Strategy is to not even keep trees separated
- Rusty took first part of smp patches (missing the scratch pad)
- Work in progress to make lguest hv functions less 32-bit centric

That’s all, Folks!

... Unless you have questions!

Many thanks to Steven Rostedt, who unfortunately could not be here
