

Page 1: GPU-Accelerated Signal Processing in OpenStackon-demand.gputechconf.com/...accelerated-signal-processing-opensta… · Title: GPU-Accelerated Signal Processing in OpenStack Author:

GPU Accelerated Signal Processing in OpenStack

John Paul Walters, Computer Scientist, USC Information Sciences Institute

[email protected]  


Outline

§  Motivation
§  OpenStack Background
§  Heterogeneous OpenStack
§  GPU Performance across hypervisors
§  Current Status
§  Future work


Motivation

§  Scientific workloads demand increasing performance with greater power efficiency
   –  Architectures have been driven towards specialization, heterogeneity
§  Infrastructure-as-a-Service (IaaS) clouds can democratize access to the latest, most powerful accelerators
   –  Then why are most of today’s clouds homogeneous?
   –  Of the major providers, only Amazon offers virtual machine access to GPUs in the public cloud


Cloud Computing and GPUs

§  GPU passthrough has historically been hard
   –  Specific to particular GPUs, hypervisors, host OS
   –  Legacy VGA BIOS support, etc.
§  Today we can access GPUs through most of the major hypervisors
   –  KVM, VMware ESXi, Xen, LXC
§  Combine this with a heterogeneous cloud


OpenStack Background

§  OpenStack founded by Rackspace and NASA
§  In use by Rackspace, HP, and others for their public clouds
§  Open source with hundreds of participating companies
§  In use for both public and private clouds
§  Current stable release: OpenStack Havana
   –  OpenStack Icehouse to be released in April

[Figure: Google Trends searches for common open source IaaS projects. Series: openstack, cloudstack, opennebula, eucalyptus cloud.]


OpenStack Architecture

Image source: http://docs.openstack.org/training-guides/content/module001-ch004-openstack-architecture.html


Supporting GPUs

§  We’re pursuing multiple approaches for GPU support in OpenStack
   –  LXC support for container-based VMs
   –  Xen support for fully virtualized guests
   –  KVM support for fully virtualized guests, SR-IOV
§  Also compare against VMware ESXi
§  Our OpenStack work currently supports GPU-enabled LXC containers
   –  Xen prototype implementation as well
§  Given widespread support for GPUs across hypervisors, does hypervisor choice impact performance?


GPU Performance Across Hypervisors

§  2 CPU architectures, 2 GPU architectures, 4 hypervisors
   –  Sandy Bridge + Kepler, Westmere + Fermi
   –  KVM, Xen, LXC, VMware ESXi
§  Standardize on a common CentOS 6.4 base system for comparison
   –  Same 2.6.32-358.23.2 kernel across all guests and LXC


Hardware Setup

                 Sandy Bridge + Kepler    Westmere + Fermi
CPU (cores)      2xE5-2670 (16)           2xX5660 (12)
Clock Speed      2.6 GHz                  2.6 GHz
RAM              48 GB                    192 GB
NUMA Nodes       2                        2
GPU              1xK20m                   2xC2075


Hypervisor Configuration

Hypervisor           Linux Kernel        Linux Distro
KVM                  3.12                Arch 2013.10.01
Xen 4.3.0-7          3.12 (dom0)         Arch 2013.10.01
VMware ESXi 5.5.0    N/A                 N/A
LXC                  2.6.32-358.23.2     CentOS 6.4


Benchmarks

§  3 Benchmarks
   –  SHOC OpenCL: signal processing
   –  GPU-LIBSVM: big data, machine learning
   –  HOOMD: molecular dynamics, GPUDirect
§  Virtual machines: CentOS 6.4 with 2.6.32-358.23.2 kernel, 20 GB RAM, and 1 CPU socket
   –  Control for NUMA effects


K20 Results - SHOC

[Figure: SHOC performance for common signal processing kernels. Y-axis: relative performance, 0.96 to 1.01. Series: KVM, Xen, LXC, VMware.]


K20 Results – SHOC Outliers

[Figure: SHOC OpenCL Level 1, Level 2 outliers. Y-axis: relative performance, 0.94 to 1.06. Series: KVM, Xen, LXC, VMware.]


C2075 Results - SHOC

[Figure: SHOC performance for common signal processing kernels. Y-axis: relative performance, 0.82 to 1.02. Series: KVM, Xen, LXC, VMware.]


C2075 Results – SHOC Outliers

[Figure: SHOC OpenCL Level 1, Level 2 outliers. Y-axis: relative performance, 0.6 to 1.1. Series: KVM, Xen, LXC, VMware.]


SHOC Observations

§  Overall both Fermi and Kepler systems perform near-native
   –  This is especially true for KVM and LXC
§  Xen on the C2075 system shows some overhead
   –  Likely because Xen couldn’t activate large page tables
§  Some unexpected performance improvement for Kepler Spmv
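
The "relative performance" axis in the result charts can be read as a ratio against the bare-metal base system. The slides do not spell out the formula, so the following is only a minimal sketch under the assumption that the metric is the virtualized measurement divided by the native one (inverted for runtimes). The function name and the sample numbers are illustrative and are not results from the talk.

```python
def relative_performance(virtualized, native, higher_is_better=True):
    """Return performance relative to the bare-metal base system (1.0 = native)."""
    if virtualized <= 0 or native <= 0:
        raise ValueError("measurements must be positive")
    # Throughput-style metrics (GFLOPS, GB/s): larger is better, so divide directly.
    # Runtime-style metrics: invert the ratio so that 1.0 still means native speed.
    return virtualized / native if higher_is_better else native / virtualized


# Illustrative numbers only, not measurements from the slides.
print(relative_performance(97.0, 100.0))                          # 0.97
print(relative_performance(12.5, 10.0, higher_is_better=False))   # 0.8
```

Under this reading, values above 1.0 (as in the Kepler Spmv case) mean the virtualized run was faster than the base system.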


K20 Results – GPU-LIBSVM

[Figure: GPU-LIBSVM relative performance. X-axis: # of training instances (1800, 3600, 4800, 6000). Y-axis: relative performance, 0.88 to 1.02. Series: KVM, Xen, LXC, VMware.]


C2075 Results – GPU-LIBSVM

[Figure: GPU-LIBSVM relative performance. X-axis: # of training instances (1800, 3600, 4800, 6000). Y-axis: relative performance, 0 to 1.4. Series: KVM, Xen, LXC, VMware.]


GPU-LIBSVM Observations

§  Unexpected performance improvement for KVM on both systems
   –  Most pronounced on the Westmere/Fermi platform
§  This is due to the use of transparent hugepages (THP)
   –  Back the entire guest memory with hugepages
   –  Improves TLB performance
§  Disabling hugepages on the Westmere/Fermi platform reduces performance to 80-87% of the base system
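
Since the KVM result depends on whether THP is active on the host, it can help to check the host setting before comparing runs. The sketch below reads the standard Linux sysfs control file and extracts the bracketed active mode. The helper names are illustrative, and the path is an assumption about the host: it matches mainline kernels such as the 3.12 hosts listed earlier, while RHEL/CentOS 6 kernels expose the setting under `/sys/kernel/mm/redhat_transparent_hugepage` instead.

```python
from pathlib import Path

# Mainline sysfs location (assumption about the host; see note above for CentOS 6).
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text):
    """Extract the active mode from a line such as 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def thp_mode(path=THP_ENABLED):
    """Return 'always', 'madvise' or 'never', or None if the file cannot be read."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("THP mode on this host:", thp_mode())
```

A mode of `never` on the KVM host would correspond to the "hugepages disabled" configuration described above.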


Multi-GPU with GPUDirect

§  Many real applications extend beyond a single node’s capabilities
§  Test multi-node performance with InfiniBand SR-IOV and GPUDirect
§  2 Sandy Bridge nodes equipped with K20 GPUs
   –  ConnectX-3 IB with SR-IOV enabled
   –  Ported Mellanox OFED 2.1-1 to 3.13 kernel
   –  KVM hypervisor
§  Test with HOOMD, a commonly used GPUDirect-enabled particle dynamics simulator


GPUDirect Advantage

Image source: http://old.mellanox.com/content/pages.php?pg=products_dyn&product_family=116


HOOMD Performance

[Figure: HOOMD Lennard-Jones liquid MD relative performance. X-axis: number of particles (16k, 32k, 64k, 128k, 256k, 512k). Y-axis: relative performance, 0.92 to 1.02.]


Current Status

§  Source code is available now
   –  https://github.com/usc-isi/nova
§  Includes support for heterogeneity
   –  GPU-enabled LXC instances
   –  Bare-metal provisioning
   –  Architecture-aware scheduler
   –  Prototype Xen with GPU passthrough implementation
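
To make the idea of an architecture-aware scheduler concrete, here is a toy sketch of the filtering step: keep only hosts whose CPU architecture and free GPU count satisfy the request, then pick one. Everything in it (the `Host` and `Request` types, the field names, the tie-breaking rule) is hypothetical and written for illustration. It is not the code in the usc-isi/nova repository and does not use the OpenStack scheduler API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    cpu_arch: str
    free_gpus: int = 0


@dataclass(frozen=True)
class Request:
    cpu_arch: str
    gpus: int = 0


def host_passes(host, request):
    """A host qualifies if the architecture matches and enough GPUs are free."""
    return host.cpu_arch == request.cpu_arch and host.free_gpus >= request.gpus


def schedule(hosts, request):
    """Pick the qualifying host with the fewest spare GPUs, or None if none fit."""
    candidates = [h for h in hosts if host_passes(h, request)]
    if not candidates:
        return None
    # Preferring the tightest fit keeps GPU-rich hosts free for GPU requests.
    return min(candidates, key=lambda h: (h.free_gpus, h.name))


hosts = [
    Host("cpu-only", "x86_64"),
    Host("kepler-node", "x86_64", free_gpus=1),
    Host("fermi-node", "x86_64", free_gpus=2),
]
print(schedule(hosts, Request("x86_64", gpus=1)).name)  # kepler-node
print(schedule(hosts, Request("x86_64")).name)          # cpu-only
print(schedule(hosts, Request("x86_64", gpus=4)))       # None
```

The tightest-fit rule is one possible design choice; a real scheduler would also weigh memory, cores, and other resources.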


Future Work

§  Primary focus: multi-node
   –  Greater range of applications, larger systems
§  Integrate GPU passthrough support for KVM
   –  This might come for free with the existing OpenStack PCI passthrough work
§  NUMA support
   –  This work assumes perfect NUMA mapping
   –  OpenStack should be NUMA-aware
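
The NUMA assumption above can be checked on a Linux host: sysfs reports which NUMA node a PCI device is attached to and which CPUs belong to each node. The following is a minimal sketch, assuming the standard sysfs layout (`bus/pci/devices/<addr>/numa_node` and `devices/system/node/node<N>/cpulist`). The function names are illustrative, and the PCI address in the usage note is a placeholder rather than a device from the test systems.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-7,16-23' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def device_numa_node(pci_addr, sysfs=Path("/sys")):
    """NUMA node of a PCI device; the kernel reports -1 when it is unknown."""
    return int((sysfs / "bus/pci/devices" / pci_addr / "numa_node").read_text())


def local_cpus(node, sysfs=Path("/sys")):
    """CPU ids that belong to the given NUMA node."""
    path = sysfs / "devices/system/node" / f"node{node}" / "cpulist"
    return parse_cpulist(path.read_text())


def is_numa_local(pci_addr, vcpu_pins, sysfs=Path("/sys")):
    """True if every pinned CPU is on the device's node, None if the node is unknown."""
    node = device_numa_node(pci_addr, sysfs)
    if node < 0:
        return None
    return set(vcpu_pins) <= local_cpus(node, sysfs)
```

For example, `is_numa_local("0000:04:00.0", range(0, 8))` would report whether a guest pinned to CPUs 0 through 7 sits on the same node as the device at that (placeholder) address. A NUMA-aware scheduler would need this kind of information for each host.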