Helios: Heterogeneous Multiprocessing with Satellite Kernels

Ed Nightingale, Orion Hodson, Ross McIlroy, Chris Hawblitzel, Galen Hunt

Microsoft Research




Problem: HW now heterogeneous

Heterogeneity ignored by operating systems

Programming models are fragmented

Standard OS abstractions are missing

[Figure: "Once upon a time… hardware was homogeneous." Hardware progresses from a single CPU, through SMP, CMP, and NUMA machines built from many identical CPUs, to systems that add a GP-GPU and a programmable NIC, each with its own RAM.]


Solution

Helios manages a ‘distributed system in the small’
- Simplify app development, deployment, and tuning
- Provide a single programming model for heterogeneous systems

4 techniques to manage heterogeneity
- Satellite kernels: same OS abstraction everywhere
- Remote message passing: transparent IPC between kernels
- Affinity: easily express arbitrary placement policies to the OS
- 2-phase compilation: run apps on arbitrary devices


Results

Helios offloads processes with zero code changes
- Entire networking stack
- Entire file system
- Arbitrary applications

Improve performance on NUMA architectures
- Eliminate resource contention with multiple kernels
- Eliminate remote memory accesses


Outline

Motivation

Helios design
- Satellite kernels
- Remote message passing
- Affinity
- Encapsulating many ISAs

Evaluation

Conclusion


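Driver interface is poor app interface

Hard to perform basic tasks: debugging, I/O, IPC

Driver encompasses services and runtime… an OS!

[Figure: a kernel on the CPU reaches a programmable I/O device through a driver that passes raw bits (1010); the device side hosts apps together with a JIT, scheduler, memory manager, and IPC.]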


Satellite kernels provide single interface

Satellite kernels:
- Efficiently manage local resources
- Apps developed for single system call interface
- μkernel: scheduler, memory manager, namespace manager

[Figure: satellite kernels run on two NUMA domains and on a programmable device; apps, the file system (FS), and TCP run on top of them.]


Remote Message Passing

Local IPC uses zero-copy message passing

Remote IPC transparently marshals data

Unmodified apps work with multiple kernels

[Figure: satellite kernels on two NUMA domains and a programmable device; apps, the file system (FS), and TCP on different kernels communicate over IPC channels.]
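Transparency means the sender invokes the same operation whether or not the receiver lives on its own kernel. The Python sketch below is only an analogy, built on stated assumptions. The Endpoint class and the connect helper are hypothetical. Handing over an object reference stands in for zero-copy local IPC, and a pickle round trip stands in for marshaling data between kernels.

```python
import pickle
from collections import deque


class Endpoint:
    """One end of a channel, owned by a process on some kernel (hypothetical)."""

    def __init__(self, kernel: str) -> None:
        self.kernel = kernel
        self.inbox: deque = deque()
        self.peer: "Endpoint | None" = None

    def send(self, msg: object) -> None:
        if self.peer.kernel == self.kernel:
            # Same kernel: hand over the object itself, no copy is made.
            self.peer.inbox.append(msg)
        else:
            # Different kernel: marshal to bytes (what would cross the interconnect),
            # then unmarshal a fresh copy on the receiving side.
            wire = pickle.dumps(msg)
            self.peer.inbox.append(pickle.loads(wire))

    def receive(self) -> object:
        return self.inbox.popleft()


def connect(kernel_a: str, kernel_b: str) -> tuple[Endpoint, Endpoint]:
    """Create a bidirectional channel between processes on two kernels."""
    a, b = Endpoint(kernel_a), Endpoint(kernel_b)
    a.peer, b.peer = b, a
    return a, b


msg = {"op": "read", "path": "/fs/log"}

a, b = connect("numa0", "numa0")
a.send(msg)
print(b.receive() is msg)  # prints True

c, d = connect("numa0", "nic0")
c.send(msg)
received = d.receive()
print(received == msg, received is msg)  # prints True False
```

The application code calls `send` the same way in both cases. Only the channel decides between passing a reference and marshaling, which mirrors the slide's point that unmodified apps work across kernels.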


Connecting processes and services

Applications register in a namespace as services

Namespace is used to connect IPC channels

Example entries: /fs, /dev/nic0, /dev/disk0, /services/TCP, /services/PNGEater, /services/kernels/ARMv5

Satellite kernels register in namespace
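A toy namespace makes the registration and connection steps concrete. Everything here is a hypothetical sketch. The Namespace class, its method names, and the idea that each registered entry is a factory that returns a channel endpoint are all assumptions. The paths come from the talk's examples.

```python
from typing import Callable


class Namespace:
    """Toy path -> provider registry (hypothetical, not the Helios API)."""

    def __init__(self) -> None:
        self._entries: dict[str, Callable[[], object]] = {}

    def register(self, path: str, provider: Callable[[], object]) -> None:
        # Services and satellite kernels announce themselves under a unique path.
        if path in self._entries:
            raise KeyError(f"{path} is already registered")
        self._entries[path] = provider

    def connect(self, path: str) -> object:
        # A client looks up the path and asks the provider for a channel endpoint.
        return self._entries[path]()

    def entries_under(self, prefix: str) -> list[str]:
        # List registered paths below a prefix, e.g. all satellite kernels.
        return sorted(p for p in self._entries if p.startswith(prefix))


ns = Namespace()
ns.register("/services/TCP", lambda: "channel to TCP")
ns.register("/services/kernels/ARMv5", lambda: "channel to ARMv5 satellite kernel")
ns.register("/services/kernels/x86", lambda: "channel to x86 satellite kernel")

print(ns.connect("/services/TCP"))  # prints channel to TCP
print(ns.entries_under("/services/kernels/"))
# prints ['/services/kernels/ARMv5', '/services/kernels/x86']
```

Because satellite kernels register alongside ordinary services, a placement policy can discover the available platforms with the same lookup that applications use to find services.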


Where should a process execute?

Three constraints impact the initial placement decision:
1. Heterogeneous ISAs make migration difficult
2. Fast message passing may be expected
3. Processes might prefer a particular platform

Helios exports an affinity metric to applications
- Affinity is expressed in application metadata and acts as a hint
- Positive affinity represents an emphasis on communication: zero-copy IPC
- Negative affinity represents a desire for non-interference


Affinity Expressed in Manifests

Affinity easily edited by dev, admin, or user

<?xml version="1.0" encoding="utf-8"?>
<application name="TcpTest" runtime="full">
  <endpoints>
    <inputPipe id="0" affinity="0" contractName="PipeContract"/>
    <endpoint id="2" affinity="+10" contractName="TcpContract"/>
  </endpoints>
</application>
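To show how little machinery the manifest format needs, here is a minimal Python sketch that reads per-endpoint affinities from the TcpTest manifest on this slide. It uses the standard-library xml.etree.ElementTree parser. Keying the result by contractName is an assumption made for illustration. This section does not describe how Helios itself indexes endpoint affinities.

```python
import xml.etree.ElementTree as ET

# A well-formed copy of the TcpTest manifest from the slide.
MANIFEST = """<?xml version="1.0" encoding="utf-8"?>
<application name="TcpTest" runtime="full">
  <endpoints>
    <inputPipe id="0" affinity="0" contractName="PipeContract"/>
    <endpoint id="2" affinity="+10" contractName="TcpContract"/>
  </endpoints>
</application>"""


def endpoint_affinities(manifest_xml: str) -> dict[str, int]:
    """Map each endpoint's contractName to its integer affinity (0 if absent)."""
    # Parse bytes so the declared utf-8 encoding matches the input.
    root = ET.fromstring(manifest_xml.encode("utf-8"))
    result = {}
    for ep in root.find("endpoints"):
        # int() accepts a leading '+', so "+10" parses as 10.
        result[ep.get("contractName")] = int(ep.get("affinity", "0"))
    return result


print(endpoint_affinities(MANIFEST))  # prints {'PipeContract': 0, 'TcpContract': 10}
```

Editing the policy means changing one attribute value, which is why a developer, an administrator, or a user can retune placement without touching code.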


Platform Affinity

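Platform affinity processed first

Guarantees certain performance characteristics

Example: /services/kernels/vector-CPU platform affinity = +2; /services/kernels/x86 platform affinity = +1

[Figure: candidate kernels on two x86 NUMA domains, a GP-GPU, and a programmable NIC; the GP-GPU kernel scores +2 and each x86 NUMA kernel scores +1.]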


Positive Affinity

Represents ‘tight coupling’ between processes

Ensures fast message passing between processes

Positive affinities on each kernel summed

Example: /services/TCP communication affinity = +1; /services/PNGEater communication affinity = +2; /services/antivirus communication affinity = +3

[Figure: TCP, PNGEater (PNG), and antivirus (A/V) processes placed on a programmable NIC and x86 NUMA kernels, annotated with summed per-kernel affinities of +1, +2, and +5.]
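The per-kernel summation fits in a few lines of Python. The kernel names (nic0, numa0) are hypothetical, and so is the layout, with TCP on the NIC and PNGEater and the antivirus sharing one NUMA kernel. The layout was chosen so that the sums come out as the +1 and +5 in the figure. The sketch covers only this one stage, not the full placement logic.

```python
from collections import defaultdict


def positive_scores(affinities: dict[str, int], placement: dict[str, str]) -> dict[str, int]:
    """Sum a new process's positive affinities per kernel.

    affinities: service path -> affinity from the new process's manifest
    placement:  service path -> kernel currently hosting that service
    """
    scores: dict[str, int] = defaultdict(int)
    for service, value in affinities.items():
        if value > 0 and service in placement:
            scores[placement[service]] += value
    return dict(scores)


# Hypothetical layout: TCP offloaded to the NIC, PNGEater and antivirus on one NUMA kernel.
placement = {
    "/services/TCP": "nic0",
    "/services/PNGEater": "numa0",
    "/services/antivirus": "numa0",
}
affinities = {"/services/TCP": 1, "/services/PNGEater": 2, "/services/antivirus": 3}

scores = positive_scores(affinities, placement)
best = max(scores, key=scores.get)
print(scores, best)  # prints {'nic0': 1, 'numa0': 5} numa0
```

Summing lets several moderate ties to co-located services outweigh a single tie to a service elsewhere.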


Negative Affinity

Expresses a preference for non-interference

Used as a means of avoiding resource contention

Negative affinities on each kernel summed

Example: /services/kernels/x86 platform affinity = +100; /services/antivirus non-interference affinity = -1

[Figure: of the candidate kernels (two x86 NUMA domains, a GP-GPU, and a programmable NIC), the x86 NUMA kernel already hosting the antivirus (A/V) scores -1.]
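Here is a sketch of the negative-affinity stage. It assumes the +100 x86 platform affinity has already narrowed the candidates to the two x86 NUMA kernels, which are given the hypothetical names numa0 and numa1, and that the antivirus runs on numa0. Among the remaining candidates, the kernel whose summed negative affinity is closest to zero wins.

```python
def pick_non_interfering(candidates: list[str], placement: dict[str, str],
                         affinities: dict[str, int]) -> str:
    """Return the candidate kernel with the least negative summed affinity."""

    def penalty(kernel: str) -> int:
        # Sum only negative affinities toward services hosted on this kernel.
        return sum(v for svc, v in affinities.items()
                   if v < 0 and placement.get(svc) == kernel)

    # max() keeps the first candidate among ties.
    return max(candidates, key=penalty)


placement = {"/services/antivirus": "numa0"}
affinities = {"/services/antivirus": -1}
print(pick_non_interfering(["numa0", "numa1"], placement, affinities))  # prints numa1
```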


Self-Reference Affinity

Simple scale-out policy across available processors

Example: /services/webserver non-interference affinity = -1

[Figure: webserver instances W1, W2, and W3, each with a -1 affinity to /services/webserver, spread across the available kernels.]
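Each webserver instance carries a -1 affinity toward /services/webserver itself, so every new instance is pushed away from kernels that already host one. The Python sketch below assumes hypothetical, otherwise identical kernels (k0, k1, k2) and breaks ties by list order. It illustrates the effect and does not reproduce the real placer.

```python
def scale_out(kernels: list[str], instances: int, self_affinity: int = -1) -> dict[str, list[str]]:
    """Place copies of a service whose manifest has a negative affinity to itself."""
    placement: dict[str, list[str]] = {k: [] for k in kernels}
    for i in range(1, instances + 1):
        # Summed negative affinity on a kernel = self_affinity * copies already there.
        # max() returns the first kernel among ties, so ties go to list order.
        target = max(kernels, key=lambda k: self_affinity * len(placement[k]))
        placement[target].append(f"W{i}")
    return placement


print(scale_out(["k0", "k1", "k2"], 3))
# prints {'k0': ['W1'], 'k1': ['W2'], 'k2': ['W3']}
```

If a fourth instance is added, every kernel ties at -1, so W4 wraps around to k0. A single self-referencing entry in the manifest therefore yields round-robin scale-out.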


Turning policies into actions

A priority-based algorithm reduces the set of candidate kernels by:
1. Platform affinities
2. Other positive affinities
3. Negative affinities
4. CPU utilization

Attempts to balance simplicity and optimality
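The four-stage reduction can be sketched as follows. The sketch assumes that each stage keeps only the kernels tied for the best score and passes them to the next stage. CPU utilization then picks a single winner. The Kernel class, the platform names, and the utilization figures are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    name: str
    platform: str                    # e.g. "x86" or "ARMv5"
    cpu_utilization: float           # fraction of CPU busy, 0.0 to 1.0
    services: set[str] = field(default_factory=set)


def _keep_best(candidates, score):
    """Keep only the candidates tied for the highest score."""
    best = max(score(k) for k in candidates)
    return [k for k in candidates if score(k) == best]


def place(kernels, platform_aff, service_aff):
    """Choose a kernel for a new process.

    platform_aff: platform name -> affinity, e.g. {"x86": 100}
    service_aff:  service path -> affinity (> 0 communicate, < 0 avoid)
    """
    def summed(kernel, sign):
        # Sum affinities of one sign toward services hosted on this kernel.
        return sum(v for s, v in service_aff.items()
                   if v * sign > 0 and s in kernel.services)

    candidates = list(kernels)
    # 1. Platform affinities.
    candidates = _keep_best(candidates, lambda k: platform_aff.get(k.platform, 0))
    # 2. Other positive affinities, summed per kernel.
    candidates = _keep_best(candidates, lambda k: summed(k, +1))
    # 3. Negative affinities, summed per kernel (closest to zero survives).
    candidates = _keep_best(candidates, lambda k: summed(k, -1))
    # 4. CPU utilization: the least busy remaining kernel wins.
    return min(candidates, key=lambda k: k.cpu_utilization)


kernels = [
    Kernel("numa0", "x86", 0.20, {"/services/antivirus"}),
    Kernel("numa1", "x86", 0.60),
    Kernel("nic0", "ARMv5", 0.10, {"/services/TCP"}),
]
chosen = place(kernels, {"x86": 100}, {"/services/antivirus": -1})
print(chosen.name)  # prints numa1
```

In this example numa1 wins even though it is busier than numa0, because negative affinity is checked before CPU utilization. The strict stage order keeps the policy simple at the cost of never trading one criterion against another.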


Encapsulating many architectures

Two-phase compilation strategy
- All apps first compiled to MSIL
- At install time, apps compiled down to available ISAs

MSIL encapsulates multiple versions of a method

Example: ARM and x86 versions of the Interlocked.CompareExchange function
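The install-time selection step can be mimicked in Python. One logical method carries several ISA-specific bodies. "Installing" for a target ISA binds each method to the matching body and fails if a version is missing. The registry, the install helper, and the ISA labels are hypothetical. A real toolchain generates native code from MSIL rather than choosing between Python functions. Here both bodies share the same Python logic, which stands in for ISA-specific code.

```python
from typing import Callable

# Hypothetical stand-in for an MSIL assembly: method name -> {ISA: implementation}.
VERSIONS: dict[str, dict[str, Callable]] = {}


def isa_version(method: str, isa: str):
    """Register the decorated function as the body of `method` for one ISA."""
    def register(fn: Callable) -> Callable:
        VERSIONS.setdefault(method, {})[isa] = fn
        return fn
    return register


def _compare_exchange(cell: list, value, comparand):
    """Store value in cell[0] if it equals comparand; return the previous value.

    Not atomic: a real version relies on the target ISA's atomic instructions.
    """
    previous = cell[0]
    if previous == comparand:
        cell[0] = value
    return previous


@isa_version("Interlocked.CompareExchange", "x86")
def compare_exchange_x86(cell, value, comparand):
    # Stands in for x86-specific code.
    return _compare_exchange(cell, value, comparand)


@isa_version("Interlocked.CompareExchange", "ARM")
def compare_exchange_arm(cell, value, comparand):
    # Stands in for ARM-specific code.
    return _compare_exchange(cell, value, comparand)


def install(isa: str) -> dict[str, Callable]:
    """Second phase: bind every method to its version for the target ISA."""
    bound = {}
    for method, versions in VERSIONS.items():
        if isa not in versions:
            raise LookupError(f"{method} has no {isa} version")
        bound[method] = versions[isa]
    return bound


arm = install("ARM")
cell = [0]
old = arm["Interlocked.CompareExchange"](cell, 42, 0)
print(arm["Interlocked.CompareExchange"].__name__, old, cell)
# prints compare_exchange_arm 0 [42]
```

Deferring the choice to install time means a single shipped package can run on any kernel whose ISA it carries a version for.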


Implementation

Based on Singularity operating system
- Added satellite kernels, remote message passing, and affinity

XScale programmable I/O card
- 2.0 GHz ARM processor, Gig E, 256 MB of DRAM
- Satellite kernel identical to x86 (except for ARM asm bits)
- Roughly 7x slower than comparable x86

NUMA support on 2-socket, dual-core AMD machine
- 2 GHz CPU, 1 GB RAM per domain
- Satellite kernel on each NUMA domain


Limitations

Satellite kernels require timer, interrupts, exceptions
- Balance device support with support for basic abstractions
- GPUs headed in this direction (e.g., Intel Larrabee)

Only supports two platforms
- Need new compiler support for new platforms

Limited set of applications
- Create satellite kernels out of commodity system
- Access to more applications



Evaluation platform

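[Figure: two evaluation setups, each comparing configuration A with configuration B. XScale evaluation: (A) a single kernel on x86 with the NIC; (B) satellite kernels on x86 and on the XScale NIC. NUMA evaluation: (A) a single kernel spanning two x86 NUMA domains; (B) a satellite kernel on each x86 NUMA domain.]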


Offloading Singularity applications

Helios applications offloaded with very little effort

Name               LOC     LOC changed   LOM changed
Networking stack   9600    0             1
FAT 32 FS          14200   0             1
TCP test harness   300     5             1
Disk indexer       900     0             1
Network driver     1700    0             0
Mail server        2700    0             1
Web server         1850    0             1

(LOC = lines of code; LOM = lines of manifest)


Netstack offload

Offloading improves performance as cycles are freed

Affinity made it easy to experiment with offloading

PNG size   x86 only (uploads/sec)   x86 + XScale (uploads/sec)   Speedup   Reduction in context switches
28 KB      161                      171                          6%        54%
92 KB      55                       61                           12%       58%
150 KB     35                       38                           10%       65%
290 KB     19                       21                           10%       53%


Email NUMA benchmark

Satellite kernels improve performance by 39%

[Figure: two bar charts comparing No Sat. Kernel with Sat. Kernel: emails per second (axis 0 to 90) and instructions per cycle (IPC, axis 0 to 0.8).]


Related Work

Hive [Chapin et al. ’95]
- Multiple kernels, single system image

Multikernel [Baumann et al. ’09]
- Focus on scale-out performance on large NUMA architectures

Spine [Fiuczynski et al. ’98], Hydra [Weinsberg et al. ’08]
- Custom run-time on programmable device


Conclusions

Helios manages a ‘distributed system in the small’
- Simplify application development, deployment, and tuning

4 techniques to manage heterogeneity
- Satellite kernels: same OS abstraction everywhere
- Remote message passing: transparent IPC between kernels
- Affinity: easily express arbitrary placement policies to the OS
- 2-phase compilation: run apps on arbitrary devices

Offloading applications with zero code changes

Helios code release soon.