XPDS16: Review and Analysis of Performance Metrics of the Xen Hypervisor on Zynq® UltraScale+™...
TRANSCRIPT
REVIEW AND ANALYSIS OF PERFORMANCE METRICS OF THE XEN HYPERVISOR ON ZYNQ ULTRASCALE+ MPSOC
where hardware and software design meet
Introduction
12 Jul 2016
(c) 2016, DornerWorks Ltd.
Presentation Outline
Introduction
Overview
Methodology & Results
Summary
Overview
Motivation
Why embedded Xen?
Same reasons as for server Xen
Customers are asking about it
Why performance metrics?
To know the cost of adding a new software layer
Which metrics?
Boot Time
Interrupt Latency
Context Switch Overhead
Xilinx Zynq UltraScale+ MPSoC
ZCU102
Cortex-A53 clocked at 1.1 GHz
4 GiB DDR4 at 400 MHz
2 GiB DDR4 configured*
XenTrace/XenAlyze
XenTrace
Real-Time event capture of Xen kernel
Multiple event types
Binary output
XenAlyze
Binary parser
Human readable output
XenTrace – ARM Support
XenTrace was not supported on ARM
x86 based code incompatible
Memory translation for Xen Domain
PVH on ARM instead of PV
Page mapping
Trace timestamp
XenAlyze
XenTrace – Patch
Patch Created
Previous work by Pavlo Suikov
Changes
Special case for Xen Domain
Page access type detection
TSC timestamp
XenAlyze build
XenTrace – Patch Status
Patch Submitted - Currently in Review
Special case for DOMID_XEN
Only non auto-translated domain
Fundamental re-work of DOMID_XEN memory mapping needed?
Working version available at:
https://github.com/dornerworks/xen
branch: dornerworks/xentrace
Methodology & Results
Methodology
SW Configuration
Xen Zynq Distro (www.xen.world)
beta_02_19_2016
Petalinux 2015.4
Linux version 4.0
Xen Version 4.7, prerelease
HW Configuration
ZCU102 Rev C.2
Boot Time, Xen
Time stamps
End time captured in arch/arm/setup.c
/* C entry point for boot CPU */
void __init start_xen(unsigned long boot_phys_offset,
                      unsigned long fdt_paddr,
                      unsigned long cpuid)
{
    …
    /* Record the physical counter value just before leaving the boot path */
    xen_done_t = READ_SYSREG64(CNTPCT_EL0);
    switch_stack_and_jump(idle_vcpu[0]-> …);
}
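The value read from CNTPCT_EL0 is a raw tick count, so it has to be scaled by the counter frequency before it can be reported in milliseconds or microseconds. The following is a minimal sketch of that conversion, not code from the talk: the helper names `ticks_to_usec` and `elapsed_ticks` are hypothetical, and the counter frequency is passed in as a parameter (on real hardware it would typically be read from CNTFRQ_EL0).

```c
#include <stdint.h>

/* Hypothetical helper (not from the slides): convert a generic-timer tick
 * count to microseconds. freq_hz is the counter frequency in Hz. Splitting
 * into whole seconds and a remainder avoids overflowing 64 bits for large
 * tick counts. */
static uint64_t ticks_to_usec(uint64_t ticks, uint64_t freq_hz)
{
    uint64_t secs = ticks / freq_hz;
    uint64_t rem  = ticks % freq_hz;
    return secs * 1000000ULL + (rem * 1000000ULL) / freq_hz;
}

/* Hypothetical helper: ticks elapsed between two counter reads. Unsigned
 * subtraction stays correct even if the counter wrapped once in between. */
static uint64_t elapsed_ticks(uint64_t start, uint64_t end)
{
    return end - start;
}
```

With an assumed 100 MHz counter, for example, 87,660,000 ticks would correspond to 876,600 µsec.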
Boot Time, Xen
Samples: 404, Avg: 876.6 +/- 0.16 msec
Min: 876.3 msec, Max: 877.1 msec
Figure: histogram "Xen Boot"; x-axis: msec (bins from 876 to 877.9 in 0.1 msec steps, plus "More"); y-axis: Frequency (0 to 140)
Boot Time, dom0
static int __ref kernel_init(void *unused)
{
    …
    if (execute_command) {
        ret = run_init_process(execute_command);
        if (!ret)
            return 0;
        panic("Requested init %s failed (error %d).",
              execute_command, ret);
    }
    /* Capture the dom0 boot end time just before init is started */
    end_time = arch_timer_read_counter_abs();
    if (!try_to_run_init_process("/sbin/init") ||
        !try_to_run_init_process("/etc/init") ||
    …
}
Boot Time, dom0
Samples: 148, Avg: 5.13 +/- 0.18 seconds
Max: 6.31 seconds, Min: 4.77 seconds
Dom0 Boot timeline (milliseconds)
Xen boot end time: 0.00
Dom0 start: 35.11
Dom0 kernel threads spawned: 339.14
Dom0 init call: 5125.70
Interrupt Latency
Interrupt Latency, Timer
arch/arm/arm64/entry.S:
hyp_irq:
        entry   hyp=1
        ldr     x1, =irq_entry
        mrs     x2, cntpct_el0
        str     x2, [x1]
        …
arch/arm/gic.c:
static inline void gic_set_lr(int lr, struct pending_irq *p,
                              unsigned int state)
{
    ASSERT(!local_irq_is_enabled());
    TRACE_2D_64(TRC_HW_IRQ_GUEST_HANDOFF, irq_entry, (get_cycles() + boot_count));
    gic_hw_ops->update_lr(lr, p, state);
    …
Interrupt Latency, Timer
Timer test: 2.35 +/- 0.43 µsec
Interrupt Latency, GPIO
Context Switch Overhead
Figure: two CPU timelines along a Time axis, each showing switches between dom0, idle, and dom1 at numbered points 0 to 3
Context Switch Overhead, Credit
Two domains, same core
6.24 +/- 1.3 µsec every ~15 msec on average
Expected overhead of 0.04%
Two domains, different cores
dom0: 5.5 +/- 0.72 µsec every 7.3 msec on average
Expected overhead of 0.08%
dom1: 2.48 +/- 0.10 µsec every 30.003 +/- 0.0001 msec
Expected overhead of 0.008%
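The expected-overhead figures above are consistent with dividing the measured context switch time by the average interval between switches. A small sketch of that arithmetic follows, assuming this simple ratio is the model being used (the slides do not spell out the formula); the function name is hypothetical.

```c
/* Hypothetical helper (not from the slides): expected context switch
 * overhead as a percentage, assuming
 *   overhead = switch time / interval between switches. */
static double expected_overhead_pct(double switch_usec, double interval_msec)
{
    double interval_usec = interval_msec * 1000.0; /* msec to usec */
    return 100.0 * switch_usec / interval_usec;
}
```

For 6.24 µsec every 15 msec this gives roughly 0.04%, and for 2.48 µsec every 30.003 msec roughly 0.008%, matching the values listed for the Credit scheduler.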
Context Switch Overhead, A653
Two domains, same core
4.70 +/- 1.1 µsec every 4.7 msec
Expected overhead of 0.1%
Two domains, different cores
dom0: 4.76 +/- 0.6 µsec every ~5.1 msec on average
Expected overhead of 0.093%
dom1: 1.17 +/- 0.1 µsec every 200.00 +/- 4E-5 msec
Expected overhead of 0.0006%
Expected vs Actual
Run simple bare-metal Dhrystone application natively to get a baseline
Run simple Dhrystone app as a Xen guest
Using Credit Scheduler with different time slices
Dhrystone guest pinned to its own core
Compare guest results against native baseline
Smaller time slices result in greater overhead
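The comparison described above reduces to the relative drop in Dhrystone throughput between the native run and the guest run. A minimal sketch, assuming the "actual" overhead is defined as that relative drop (the slides do not give the exact formula); the function name is hypothetical and the scores in the usage note are made-up illustration values, not measurements from the talk.

```c
/* Hypothetical helper (not from the slides): measured overhead as a
 * percentage, assuming it is the relative loss in Dhrystone throughput of
 * the guest run compared with the native baseline. */
static double measured_overhead_pct(double native_score, double guest_score)
{
    if (native_score <= 0.0)
        return 0.0; /* no valid baseline; avoid dividing by zero */
    return 100.0 * (native_score - guest_score) / native_score;
}
```

With an illustrative native score of 1,000,000 and a guest score of 995,000, the function reports an overhead of 0.5%.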
Expected vs Actual Comparison
Figure: "Overhead as a Function of Timeslice Size"; x-axis: Timeslice (msec), ticks at 1, 10, 100, 1000; y-axis: Context Switch Overhead, 0.000% to 0.800%; series: Expected, Actual
Summary
Implications
Boot times
Xen ~0.88 sec, dom0 ~5.1 sec
Interrupt Latency
2.3 µsec
Context Switch Overhead
~0% to 0.6%
Future Work
Optimize dom0 boot time
Optimize Xen boot time
Reduce interrupt latency for certain special configurations
Develop a better model to estimate performance loss
Measure jitter
Calculate WCET
Q&A
Closing Thoughts
There is a growing interest in embedded Xen
XenTrace now working for ARMv8!
Available at https://github.com/dornerworks/xen
branch: dornerworks/xentrace
Xen potentially viable for a good percentage of embedded projects
Future Work
Optimize dom0 (Linux) boot time
Reduce interrupt latency for certain configurations
More Info
Xen Zynq Distribution
http://xen.world
Jarvis Roach
https://www.linkedin.com/in/jarvis-roach-33bb64
Ben Sanda
https://www.linkedin.com/in/benjamin-sanda-a2a1b920
Presentation
http://xzdforums.dornerworks.com/showthread.php?tid=599
Thank You!