![Page 1: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/1.jpg)
SFO15-407:Performance Overhead
of ARM VirtualizationLinaro Connect SFO15 Christoffer Dall, Linaro
![Page 2: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/2.jpg)
Virtualization Use Cases
• Resource Sharing
• Isolation
• High Availability
• Provisioning
• Load balancing
![Page 3: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/3.jpg)
No Free Lunches
There’s a cost: Performance
![Page 4: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/4.jpg)
Virtualization on ARM
Virtualization Extensions
![Page 5: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/5.jpg)
ARM Virtualization Extensions
Kernel
User
![Page 6: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/6.jpg)
ARM Virtualization Extensions
Kernel
User
Hyp
EL0
EL1
EL2
![Page 7: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/7.jpg)
Kernel
User
Hypervisor
Kernel
User
VM 0 VM 1
ARM Virtualization Extensions
EL0
EL1
EL2
![Page 8: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/8.jpg)
Xen ARM
Kernel
User
Kernel
User
Dom0 DomU
EL0
EL1
XenEL2
![Page 9: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/9.jpg)
KVM
ARM Hardware
Kernel
VMGuest Kernel
Hypervisor
VMGuest Kernel
HYP mode?
![Page 10: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/10.jpg)
KVM
Kernel / KVM
User
Kernel
User
Host VM
EL0
EL1
EL2 KVM
![Page 11: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/11.jpg)
Hypercall on Xen
Kernel
User
Xen
Kernel
User
Dom0 DomU
EL0
EL1
EL2
HVC Ret
![Page 12: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/12.jpg)
Hypercall on KVM
Kernel / KVM
User
Kernel
User
Host VM
EL0
EL1
EL2 KVMHVCswitch
state
Ret
switch state Ret
HVC
![Page 13: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/13.jpg)
Hypercall Comparison
1. HVC Instruction 2. Xen handler 3. Return to VM
1. HVC Instruction 2. Switch EL1 state in KVM EL2 3. Return to host kernel 4. KVM handler 5. HVC Instruction 6. Switch EL1 state in KVM EL2 7. Return to VM
Xen KVM
![Page 14: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/14.jpg)
![Page 15: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/15.jpg)
Measurement Methodology
• Compare virtual to native
• CloudLab cluster with both ARM64 servers and x86 servers
![Page 16: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/16.jpg)
Hardware UsedARM Server x86 Server
Type HP Moonshot m400 Dell r320
CPU 2.4 GHz APM Atlas 2.1 GHz Xeon ES-2450
SMP 8-way 8-way
Memory 64 GB 16 GB
Disk SATA SSD 7200 RPM SATA HDD
Network Mellanox ConnectX-3 10GbE Mellanox MX354A 10GbE
![Page 17: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/17.jpg)
Configuration
• Same configuration across hardware
• Max 12GB of RAM
• Max 4 CPUs
• Hyperthreading disabled on x86
![Page 18: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/18.jpg)
VM Configuration
• 4 vCPUs per VM/host, 8 physical CPUs
• Pin all VCPUs to dedicated PCPUs
• VHOST enabled for KVM
![Page 19: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/19.jpg)
Software Configurations
• Same software version
• Linux v4.1-rc2+
• Ubuntu Trusty
• Same kernel config, manually tweaked x86 and arm64 options
![Page 20: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/20.jpg)
Micro NumbersARM 64-bit x86 64-bit
Microbenchmark KVM Xen KVM Xen
Hypercall NA NA NA NA
Interrupt Trap NA NA NA NA
IPI NA NA NA NA
EOI+ACK NA NA NA NA
VM Switch NA NA - -
I/O Latency Out NA NA - -
I/O Latency In NA NA - -
All numbers shown in cycles
![Page 21: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/21.jpg)
Hypercall BreakdownKVM ARM
State Save RestoreGP Regs NA NA
System Regs NA NAFP Regs NA NA
VGIC Regs NA NATimer Regs NA NA
EL2 Config Regs NA NAStage-2 MMU Regs NA NA
Save = Save state to MemoryRestore = Restore state From Memory
![Page 22: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/22.jpg)
ARM 64-bit x86 64-bit
Microbenchmark KVM Xen KVM Xen
Hypercall NA NA NA NA
Interrupt Trap NA NA NA NA
IPI NA NA NA NA
EOI+ACK NA NA NA NA
VM Switch NA NA - -
I/O Latency Out NA NA - -
I/O Latency In NA NA - -
All numbers shown in cycles
Micro Numbers
![Page 23: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/23.jpg)
I/O Latency Out Xen
Kernel
User
Xen
Kernel
User
Dom0 - PCPU 0 DomU - PCPU 4
EL0
EL1
EL2
HVCIRQIPI
vIRQ
![Page 24: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/24.jpg)
I/O Latency Out KVM
Kernel / KVM
User
Kernel
User
Host - PCPU 4 VM - PCPU 4
EL0
EL1
EL2 KVMMMIO Trapswitch
state
Ret
Same physical CPU!
![Page 25: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/25.jpg)
ARM 64-bit x86 64-bit
Microbenchmark KVM Xen KVM Xen
Hypercall NA NA NA NA
Interrupt Trap NA NA NA NA
IPI NA NA NA NA
EOI+ACK NA NA NA NA
VM Switch NA NA - -
I/O Latency Out NA NA - -
I/O Latency In NA NA - -
All numbers shown in cycles
Micro Numbers
![Page 26: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/26.jpg)
I/O Latency In Xen
Kernel
User
Xen
Kernel
User
Dom0 - PCPU 0 DomU - PCPU 4
EL0
EL1
EL2
IRQHVCIPI
vIRQ
![Page 27: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/27.jpg)
I/O Latency In KVM
Kernel / KVM
User
Kernel
User
Host - PCPU 0 VM - PCPU 4
EL0
EL1
EL2 KVMIRQ Exitswitch
state
Ret
IPI from VHOST
switch state
Ret vIRQ
HVC
![Page 28: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/28.jpg)
ARM 64-bit x86 64-bit
Microbenchmark KVM Xen KVM Xen
Hypercall NA NA NA NA
Interrupt Trap NA NA NA NA
IPI NA NA NA NA
EOI+ACK NA NA NA NA
VM Switch NA NA - -
I/O Latency Out NA NA - -
I/O Latency In NA NA - -
All numbers shown in cycles
Micro Numbers
![Page 29: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/29.jpg)
CPU Intensive Benchmarks
Normalized overhead (lower is better)
• Kernbench
• Hackbench
• SpecJVM2088
Results are NA
![Page 30: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/30.jpg)
Netperf
Normalized overhead (lower is better)
• TCP_STREAM
• TCP_MAERTS
• TCP_RR
Results are NA
![Page 31: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/31.jpg)
Netperf Study
• TCP_STREAM sends bulk data from client to VM
• Xen does not support zero-copy
![Page 32: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/32.jpg)
Netperf Study
• TCP_MAERTS sends bulk data from VM to client
• Xen performance is regression in Linux v4.0 from patch to fight buffer bloat
• Can be reduced to XX% by tuning sysfs
![Page 33: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/33.jpg)
Netperf Study• TCP_RR sends byte-by-byte
on open connection
Native KVM XenTrans/sec NA NA NATime/trans NA NA NAOverhead - NA NA
recv to send NA NA NAVM recv to VM send NA NA
recv to VM recv - NA NAVM send to send NA NA
Numbers in µseconds
![Page 34: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/34.jpg)
Application Benchmarks
Normalized overhead (lower is better)
• Apache
• memcached
• MySQL 20 Threads
Results are NA
![Page 35: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/35.jpg)
Conclusions
• Despite better hypercall performance, Xen does not necessarily outperform KVM on ARM.
• ARM servers do not exhibit worse overhead than x86 and is a viable choice.
• Latency is significant with paravirtualized I/O
![Page 36: SFO15-407: Performance Overhead of ARM Virtualization](https://reader031.vdocuments.site/reader031/viewer/2022021918/58a8a2c31a28ab7f458b5383/html5/thumbnails/36.jpg)
Future Work
• Further application benchmark analysis
• Device Assignment
• Upstream support for micro-benchmarks
• Automation and regression monitoring