Transkernel: Bridging Monolithic Kernels to Peripheral Cores
Liwei Guo, Shuang Zhai, Yi Qiao, and Felix Xiaozhu Lin
Purdue ECE
What is Transkernel?
➔ A novel OS model to run the unmodified binary of a monolithic kernel
➔ on a microcontroller-like core …
➔ of a heterogeneous SoC
➔ Key techniques: dynamic binary translation + kernel service emulation
Motivation: Ephemeral tasks in smart things
● Prevalent: push notifications, periodic data logging, etc.
○ User tasks running on a monolithic kernel (e.g., Linux)
● Energy-hungry: ~30% or higher battery drain in smart things [1]
● Device suspend/resume is the key bottleneck [2]
[Figure: timeline from wakeup to sleep showing device resume, the ephemeral task, and device suspend]
[1] Smartphone background activities in the wild: Origin, energy drain, and optimization, Chen et al., MobiCom’15
[2] Decelerating Suspend and Resume, Zhai et al., HotMobile’17
Why is device suspend/resume so inefficient?
● Slow power state transitions keep the CPU waiting
● Difficult to parallelize due to device dependencies
Understanding modern device drivers, Kadav et al., ASPLOS’12
Our proposal: suspend/resume on a peripheral core
● Benefit: Lower idle power and higher busy execution efficiency
● Asymmetric processors
○ CPU + peripheral core
● Heterogeneous, yet similar ISAs
○ Same family, different profile
○ e.g., ARMv7a + v7m
● Loose coupling
○ CPU can be turned on/off independently
● Shared platform resources
○ IRQs
○ DRAM
[Figure: Apple A9 (Chipworks)]
Many SoCs fit this hardware model
[Figure: CPU and peripheral core attached to an interconnect, together with DRAM and IO devices]
● OMAP4460: Cortex M3 + A9 (2010)
● AM572x: Cortex M4 + A15 (2014)
● iPhone 6: Cortex M3 (2014)
● i.MX 7: Cortex M4 + A7 (2017)
● Azure Sphere: Cortex M4 + A7 (2019)
Problem statement
➔ On a heterogeneous SoC
➔ Given a commodity monolithic kernel (e.g., Linux)
➔ How to offload the device suspend/resume kernel phase to the peripheral core?
[Figure: timelines of device resume, user task, and device suspend. Existing: all on the CPU. The desired workflow: split between the CPU and the peripheral core]
Design space exploration: multikernel
However…
[Figure: multikernel design. Labels: main kernel and kernel state (CPU); peripheral kernel with suspend/resume and kernel state (peripheral core); IO]
Design space exploration: code transplant
However…
[Figure: code transplant design. Labels: Linux kernel with suspend/resume, drivers, driver lib, kernel services & lib, and kernel state (CPU); peripheral kernel with suspend/resume (peripheral core); IO]
Design space exploration: full virtual execution
However…
● More than 25x overhead with current DBT
[Figure: full virtual execution design. Labels: Linux kernel and kernel state (CPU); DBT and translated code (peripheral core); IO]
Our proposal: Transkernel
● Goal: Linux kernel offloading with affordable overhead
● Approach: the peripheral core dynamically translates the kernel binary, supported by a small set of emulated kernel services
[Figure: transkernel design. Labels: commodity kernel with suspend/resume (CPU); dynamic binary translation, translated code, and Emu (peripheral core); kernel state in DRAM; IO devs]
Transkernel in the design space
[Figure: design space with axes "Linux kernel compatibility" and "Execution overhead". Points: Barrelfish [SOSP’09], M3 [ASPLOS’16], K2 [ASPLOS’14], Popcorn [EuroSys’15], QEMU [ATC’05], Transkernel]
Principle 1: translate stateful code, emulate stateless
● Stateful vs. stateless: whether the states of the kernel are shared across cores
● Translated code: state-sharing made easy
● Emulated services: drop-in replacement
[Figure: Labels: commodity kernel with kmalloc (CPU); dynamic binary translation, kmalloc, and sched (peripheral core); kernel state in DRAM; IO devs]
Principle 2: identify narrow trans/emulation interface
● The interface has to be:
○ Narrow
○ Stable
● Maintenance of emulated services made easy
[Figure: Labels: commodity kernel with suspend/resume (CPU); dynamic binary translation, translated code, and emu (peripheral core); kernel state in DRAM; IO devs]
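One way to picture Principles 1 and 2 together is a dispatch step inside the DBT engine: a call whose target belongs to the small emulated interface is served natively, and everything else is translated from the unmodified kernel binary. The Python sketch below is only an illustration under that assumption. The symbol names, the `EMULATED` table, and the functions `dispatch` and `translate_and_run` are hypothetical and are not taken from ARK.

```python
# Toy model of the translation/emulation boundary.
# All names here are hypothetical and only illustrate the idea.

def emu_sleep(ms):
    """Emulated service: implemented natively on the peripheral core."""
    return ("emulated", "sleep", ms)

def emu_schedule():
    """Emulated scheduler entry point."""
    return ("emulated", "schedule")

# The narrow interface: only these symbols are intercepted and emulated.
# Keeping this table small and stable is what keeps maintenance cheap.
EMULATED = {
    "msleep": emu_sleep,
    "schedule": emu_schedule,
}

def translate_and_run(symbol, *args):
    """Stand-in for DBT: stateful code runs from the kernel binary itself."""
    return ("translated", symbol, args)

def dispatch(symbol, *args):
    """Route a call to an emulated service or to translated code."""
    handler = EMULATED.get(symbol)
    if handler is not None:
        return handler(*args)
    return translate_and_run(symbol, *args)
```

For example, `dispatch("msleep", 10)` is served by the emulated handler, while a driver function such as a hypothetical `mmc_suspend` falls through to translation.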
Principle 3: specialize for hot paths
● Hot paths: 99% of executions
○ Encounter no errors
○ All needed resources acquired
● Going off? Fall back to CPU
● Simplify DBT implementation on a peripheral core
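The hot-path idea can be sketched as a guard around the offloaded routine: the peripheral core handles only the common case and hands anything else back to the CPU. This is a minimal Python illustration, not ARK's mechanism. The exception `FallbackToCPU`, the `device` dictionary, and its keys are hypothetical.

```python
# Toy model of "specialize for hot paths, fall back to the CPU otherwise".
# The device dictionary and its keys are hypothetical.

class FallbackToCPU(Exception):
    """Raised when execution leaves the hot path."""

def suspend_on_peripheral_core(device):
    # Hot-path assumptions: no errors, all needed resources acquired.
    if device.get("error", False):
        raise FallbackToCPU("device reported an error")
    if not device.get("resources_ready", True):
        raise FallbackToCPU("a needed resource is not available")
    return "suspended on peripheral core"

def suspend(device):
    try:
        return suspend_on_peripheral_core(device)
    except FallbackToCPU:
        # Cold path: wake the CPU and let the native kernel take over.
        return "suspended on CPU (fallback)"
```

Because the cold path never runs on the peripheral core, the translator in this sketch only needs to support what the hot path actually executes.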
Principle 4: exploit ISA similarity
● Between the ISAs of CPU & peripheral core:
○ General purpose registers
○ Control flow registers (SP, LR, PC)
○ Flag semantics (NZCV)
● Reducing the number of emitted instructions in DBT
● Key to low overhead!
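To make the benefit concrete, the Python sketch below models a toy translator in which most instructions pass through unchanged because registers and flags map one to one, and only a form without a direct counterpart is expanded. It assumes that a v7a data-processing instruction whose second operand is shifted by a register has no single-instruction v7m equivalent and is split into a shift plus the operation. The tuple encoding and the scratch register choice are hypothetical, and flag-setting variants are ignored.

```python
# Toy v7a -> v7m translator over symbolic instructions.
# Instruction format (hypothetical): (op, rd, rn, rm, shift), where shift is
# None or (kind, amount); amount is an int (immediate) or a register name.

SCRATCH = "r12"  # hypothetical scratch register reserved by the translator

def translate(insn):
    """Return the list of target instructions for one source instruction."""
    op, rd, rn, rm, shift = insn
    if shift is None or isinstance(shift[1], int):
        # Identity rule: same registers, same semantics, one instruction out.
        return [insn]
    # Register-specified shift: emit the shift first, then the operation.
    kind, amount_reg = shift
    return [
        (kind, SCRATCH, rm, amount_reg, None),
        (op, rd, rn, SCRATCH, None),
    ]
```

In this sketch `("add", "r0", "r1", "r2", None)` translates to itself, while `("add", "r0", "r1", "r2", ("lsl", "r3"))` becomes two instructions. Without register and flag compatibility, even the identity case would need extra instructions to load and store emulated state.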
ARK: an ARm transKernel
Platform: OMAP4460 (Cortex A9+M3)
ARK instantiates the principles on Linux
● Execute unmodified Linux kernel binary on the peripheral core
● Depend on stable ABIs (only 12 functions + 1 variable)
● Focus on hot paths; may fall back to CPU
● Low-overhead ARM v7a -> v7m DBT
[Figure: ARK architecture. Labels: Linux kernel binary (device-specific, driver libs, kernel libs); translated code (stateful); emulation (stateless); sched, spinlock, virt addr, deferred work, IRQ handler, IRQ handler (early), mutex/sem, mem alloc, delay/sleep, fallback; stable ABI; private lib accessing Linux kernel state; DBT contexts; DBT engine]
ARK: the cross-ISA DBT engine
● Systematize similar semantics of ARM v7m & v7a from formal specification [1]
● Most instructions have identical semantics (447)
● Other instructions …
○ Side effect
○ Constant constraints
○ Shift modes
● Our DBT engine correctly executes over 200 million instructions!

| Category | v7a insn count | Each translated to # of v7m insns |
| --- | --- | --- |
| Identity | 447 | 1 |
| Side effect | 52 | 3-5 |
| Const constraints | 22 | 2-5 |
| Shift modes | 10 | 2 |
| No counterparts | 27 | 2-5 |
| Total (v7a) | 558 | |

[1] Trustworthy Specifications of ARM v8-A and v8-M System Level Architecture, Reid et al., FMCAD’16
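The table's numbers can be cross-checked with a few lines of arithmetic. The Python sketch below recomputes the total, the share of identity translations, and the range of the average expansion per v7a instruction. It weights every instruction form equally, which is an assumption made here for illustration; the dynamic expansion depends on how often each form actually executes.

```python
# (category, v7a instruction count, min v7m insns, max v7m insns),
# copied from the table on the slide.
TABLE = [
    ("Identity", 447, 1, 1),
    ("Side effect", 52, 3, 5),
    ("Const constraints", 22, 2, 5),
    ("Shift modes", 10, 2, 2),
    ("No counterparts", 27, 2, 5),
]

total = sum(count for _, count, _, _ in TABLE)
identity_share = TABLE[0][1] / total

# Unweighted (static) average number of v7m instructions per v7a instruction.
low = sum(count * lo for _, count, lo, _ in TABLE) / total
high = sum(count * hi for _, count, _, hi in TABLE) / total

print(total)                        # 558
print(f"{identity_share:.1%}")      # 80.1%
print(f"{low:.2f} to {high:.2f}")   # 1.29 to 1.74
```

Roughly four out of five instruction forms translate one to one, which is consistent with the slide's point that ISA similarity keeps the translated code compact.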
Evaluation
● Does ARK …
a. Incur low overhead?
b. Require tractable engineering effort?
c. Yield efficiency benefit?
● Benchmark setup
○ Test the whole suspend/resume phase, driven by a userspace test harness
○ Diverse drivers: SD card, Flash drive, MMC controller, USB controller, Regulator, Keyboard, Camera, Bluetooth NIC, Wi-Fi NIC
ARK’s DBT achieves low execution overhead
[Figure: two bar charts, Suspend and Resume, showing overhead (X) of Baseline vs. ARK for SD Card, Flash, MMC-Ctrl, USB-Ctrl, Regulator, KB, Cam, BT, and Wi-Fi]
● 25x -> 2.7x
ARK reuses Linux with low effort
● Good code reuse: 10K SLoC of new code vs. 40K SLoC (and even more) of existing code
● Good compatibility: multiple versions and configurations of the Linux kernel
● Existing code (unchanged)
○ Translated: 15K SLoC
○ Substituted w/ emu: 25K SLoC
● New implementation
○ DBT: 9K SLoC
○ Emulation: 1K SLoC
End-to-end execution time & energy
● Time: prolonged execution time
● Energy: 34% energy saved
● Interesting finding: ARK sees higher DRAM energy
[Figure: accumulated time (s), split into idle and busy, for Baseline, ARK, and Native]
[Figure: energy (mJ), split into IO, DRAM, core busy, and core idle, for Baseline, ARK, and Native; one bar exceeds the axis and is labeled 681]
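Why a longer run can still cost less energy follows from a simple power-times-time model. The Python sketch below is a back-of-the-envelope illustration only. It ignores IO and DRAM energy, assumes idle time is unchanged by offloading, and uses placeholder power values in arbitrary units that are not measurements from the talk. The function names are hypothetical.

```python
def energy(busy_time, idle_time, busy_power, idle_power):
    """Core energy as power x time, summed over busy and idle periods."""
    return busy_time * busy_power + idle_time * idle_power

def offload_energy_ratio(busy_time, idle_time, cpu, peripheral, overhead):
    """Energy on the peripheral core relative to the CPU.

    cpu and peripheral are (busy_power, idle_power) pairs; overhead is the
    factor by which busy time grows when the code runs under DBT.
    """
    baseline = energy(busy_time, idle_time, *cpu)
    offloaded = energy(busy_time * overhead, idle_time, *peripheral)
    return offloaded / baseline

# Placeholder powers in arbitrary units (NOT measured values).
CPU = (10.0, 1.0)
PERIPHERAL = (1.0, 0.1)

with_low_overhead = offload_energy_ratio(1.0, 1.0, CPU, PERIPHERAL, 2.7)
with_high_overhead = offload_energy_ratio(1.0, 1.0, CPU, PERIPHERAL, 25.0)
print(with_low_overhead < 1.0)    # offloading saves energy in this setting
print(with_high_overhead > 1.0)   # too much overhead erases the saving
```

Under these placeholder values, a 2.7x slowdown still saves energy while a 25x slowdown does not, which illustrates why lowering DBT overhead matters more than matching native speed.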
What-if analysis
[Figure: what-if analysis. Annotations: ARK at (2.7x, 41%), energy: 66%; w/o optimization at (13.9x, 41%), energy: 333%]
Take-home messages
● Transkernel & its key techniques
○ An appropriate translation/emulation boundary inside a monolithic kernel
○ Exploit ISA similarity
● To OS
○ A new model to span a monolithic kernel over heterogeneous cores
● To DBT
○ Efficiency loss can enable efficiency gain
○ DBT can be applied to translate a specific path of a complex software stack!
● To architects
○ A heterogeneous SoC friendly to transkernel