seattle2015 xen
TRANSCRIPT
2 PVHVM Linux guest: why doesn't kexec work?
Why?
● We support Red Hat Enterprise Linux.
● Bare hardware, virtualized and cloud environments, ...
● Kernel issues happen.
● Analyse stack traces.
● In complicated cases use kdump!
Kexec/kdump
● “kexec … is a mechanism of the Linux kernel that allows "live" booting of a new kernel "over" the currently running kernel”
● Kdump uses kexec:
● Some memory is reserved at boot (crashkernel=).
● Crash kernel/initrd are loaded into that area.
● On crash we trigger the crash kernel's boot.
● Crash initrd dumps all of the domain's memory and reboots.
● You have a crash file to analyse! (profit!!!)
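The reservation step can be checked from inside a guest by looking at the kernel command line. Below is a minimal sketch that parses only the simple `crashkernel=size[KMG][@offset[KMG]]` form; the function name `parse_crashkernel` is made up for this illustration, and range-based forms are deliberately treated as "not recognised".

```python
import re

# Simple form only: crashkernel=size[KMG][@offset[KMG]].
# Range-based forms (e.g. 512M-2G:64M) are not handled and yield None.
_CRASHKERNEL_RE = re.compile(
    r"(?:^|\s)crashkernel=(\d+)([KMGkmg]?)(?:@(\d+)([KMGkmg]?))?(?=\s|$)"
)
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_crashkernel(cmdline):
    """Return (size_bytes, offset_bytes_or_None), or None if nothing matched."""
    m = _CRASHKERNEL_RE.search(cmdline)
    if m is None:
        return None
    size = int(m.group(1)) * _UNITS[m.group(2).upper()]
    offset = None
    if m.group(3) is not None:
        offset = int(m.group(3)) * _UNITS[m.group(4).upper()]
    return size, offset
```

On a running Linux guest the input would typically come from `open("/proc/cmdline").read()`.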
Issues with Kexec on PVHVM
● Structures set up by the previous kernel cause problems; there is no good way to transfer that knowledge to the kexec kernel.
● ... and we need these interfaces working!
● Xen/guest interfaces we need to re-establish:
● shared_info frame (XENMAPSPACE_shared_info)
● VCPU_info (VCPUOP_register_vcpu_info)
● Event channels (EVTCHNOP_bind_*, ABI)
● + Emuirq/pirq mappings (PHYSDEVOP_map_pirq)
● Granted pages
shared_info page:
● 4k page, belongs to Xen hypervisor.
● Required for events, vcpu_info for first 32 VCPUs lives here.
● Upon boot the guest chooses one of its pages to sacrifice.
● XENMEM_add_to_physmap(XENMAPSPACE_shared_info) frees the guest's frame and maps shared_info there.
● The kexec kernel does the same for another frame → we get a hole, as shared_info is unmapped from its previous place.
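The hole can be illustrated with a toy model of the guest physmap. This is only a sketch of the behaviour described on the slide, not Xen's implementation: `ToyPhysmap` and its methods are invented names, and a frame is reduced to a label.

```python
RAM = "ram"
SHARED_INFO = "shared_info"
HOLE = None


class ToyPhysmap:
    """Toy guest physmap: guest frame number -> what currently backs it."""

    def __init__(self, nr_frames):
        self.frames = {gfn: RAM for gfn in range(nr_frames)}
        self.shared_info_gfn = None

    def add_shared_info(self, gfn):
        """Toy model of XENMEM_add_to_physmap(XENMAPSPACE_shared_info)."""
        # shared_info is mapped in one place only: the old location is
        # unmapped and nothing is put back there, so it becomes a hole.
        if self.shared_info_gfn is not None and self.shared_info_gfn != gfn:
            self.frames[self.shared_info_gfn] = HOLE
        # The guest's own RAM frame at gfn is freed and replaced.
        self.frames[gfn] = SHARED_INFO
        self.shared_info_gfn = gfn

    def holes(self):
        return sorted(g for g, backing in self.frames.items() if backing is HOLE)


pm = ToyPhysmap(8)
pm.add_shared_info(3)  # first kernel sacrifices frame 3
pm.add_shared_info(5)  # kexec kernel sacrifices frame 5; frame 3 is now a hole
```

In this model the guest ends up with one frame fewer of usable RAM after each kexec, which is the effect the slide points at.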
Event channels:
● Already bound event channels
● “(XEN) event_channel.c:370:d2v0 EVTCHNOP failure: error -17”
● 2 level → FIFO ABI switch at boot
● Mapped control block, event array pages.
● Some INTERDOMAIN channels are being set up by the toolstack:
● Xenstore, xenconsole,..
● EVTCHNOP_reset resets everything, there is no way back.
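Error -17 is -EEXIST. A toy model shows one plausible way to hit it: the binding state lives in the hypervisor, so it survives kexec, and the new kernel's attempt to bind the same VIRQ fails. It also shows why a blanket reset is not a fix by itself. The class and method names are invented for the sketch and do not mirror Xen's data structures.

```python
import errno


class ToyEventChannels:
    """Toy per-domain event channel table kept on the hypervisor side."""

    def __init__(self):
        self.ports = {}   # port -> description of the binding
        self.virqs = {}   # (vcpu, virq) -> port

    def _alloc(self, what):
        port = len(self.ports) + 1
        self.ports[port] = what
        return port

    def toolstack_setup(self, name):
        """Channel created by the toolstack, e.g. for xenstore."""
        return self._alloc(("interdomain", name))

    def bind_virq(self, vcpu, virq):
        """Return a new port, or -EEXIST if this VIRQ is already bound."""
        if (vcpu, virq) in self.virqs:
            return -errno.EEXIST
        port = self._alloc(("virq", vcpu, virq))
        self.virqs[(vcpu, virq)] = port
        return port

    def reset(self):
        """Drop everything, including channels the guest did not create."""
        self.ports.clear()
        self.virqs.clear()


ec = ToyEventChannels()
xenstore_port = ec.toolstack_setup("xenstore")
first = ec.bind_virq(0, 0)    # original kernel binds the VIRQ
second = ec.bind_virq(0, 0)   # kexec kernel tries again: -EEXIST
ec.reset()                    # the xenstore channel is gone as well
```

After `reset()` the guest can bind its VIRQs again, but it has no way to recreate the toolstack's channels on its own.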
Grant pages:
● Memory sharing mechanism in Xen.
● We can't do anything guest-side:
● Forcibly unmapping a page from backend domain will crash it.
● Requesting new pages requires additional memory.
● Some grants are “persistent”.
● Maybe not an issue for kdump, because its memory region is separate, but we still need functional backends for the kexec kernel!
“Obvious solution”
● Implement set of hypercalls to tear all interfaces down:
● reset_vcpu_info
● evtchn_switch_to_2l
● unmap_shared_info
● do_something_with_granted_pages
● …
● Good from “if there is a way to set something up there should be one to tear it down” PoV.
● Good for hypervisor testing :-)
“Obvious solution”
● Issues:
● Domain needs to follow a special protocol – what if it doesn't?
● Granted pages story is complicated.
● Not all bits are being set up by the domain.
● Too many possible issues (including security).
“New domain with the same memory”
● Destroy the original domain leaving its memory intact.
● Create new domain, reassign all memory pages, copy vcpu contexts.
● Benefits:
● No cumbersome teardown required!
● Migration path is being reused!
● Supportability: new interfaces/objects should “just work”.
“New domain with the same memory”
● Issues:
● Memory reassignment appears to be cumbersome :-(
● Superpages, PoD, mem_access issues.
● No m2p on ARM.
● Non-trivial toolstack part repeating migration code.
● Too complicated.
“Reset everything”
● No cumbersome memory reassignment.
● Explicit list of interfaces to reset with one hypercall:
● shared_info, vcpu_info, event channels, pirq_to_emuirq, ioreq servers.
● Toolstack involvement required:
● Restart device model.
● Reopen xenstore/xenconsole event channels.
● ..
● Hypervisor maintainers like it :-)
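The shape of this approach can be sketched as a toy model: one call clears the listed interfaces, guest memory stays where it is, and the remaining work is handed to the toolstack. `ToyDomain`, `toy_reset` and all field names are hypothetical and chosen only to mirror the list on the slide; they are not the names used in the patch series.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDomain:
    """Toy stand-in for per-domain state; field names are illustrative."""
    memory: dict = field(default_factory=dict)          # guest pages
    shared_info_gfn: object = None                      # where shared_info is mapped
    vcpu_info: dict = field(default_factory=dict)       # vcpu -> registered location
    event_channels: dict = field(default_factory=dict)  # port -> binding
    pirq_to_emuirq: dict = field(default_factory=dict)
    ioreq_servers: list = field(default_factory=list)


def toy_reset(dom):
    """Reset the interfaces named on the slide; guest memory is not touched."""
    dom.shared_info_gfn = None
    dom.vcpu_info.clear()
    dom.event_channels.clear()
    dom.pirq_to_emuirq.clear()
    dom.ioreq_servers.clear()
    # Work left for the toolstack afterwards, as listed on the slide.
    return ["restart device model",
            "reopen xenstore/xenconsole event channels"]
```

The point of the design is visible in what `toy_reset` leaves alone: `memory` is never reassigned or copied, which is what made the "new domain with the same memory" variant cumbersome.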
“Reset everything”
● Granted pages - let's do (almost) nothing!
● Remove the domain from xenstore and add it back – all backends are supposed to release all mappings.
● Xenconsoled doesn't release its mapping (but that's fine).
● Special debug print to find future issues.
● Hunt for misbehaving backends! (if there are such)
Current status and future work
● [PATCH v10 00/11] “toolstack-assisted approach to PVHVM guest kexec” is out waiting for reviewers!
● … and testers too!
● PVH (as "HVM without device model") should "just work".
● Not tested, minor issues are possible.
● ARM-specific part is -ENOSYS stub for now.
● shared_info page needs handling (same as x86).
● Some GIC cleanup?