Microsecond Latency, Real-Time, Multi-Input/Output Control using GPU Processing

Nikolaus Rath

March 20th, 2013

N. Rath (Columbia University) µs Latency Control using GPU Processing March 20th, 2013 1 / 23

Outline

1 Motivation

2 GPU Control System Architecture

3 Performance

1 Motivation

Fusion keeps the Sun Burning

Nuclear fusion is the process that keeps the sun burning.

Very hot hydrogen atoms (the “plasma”) collide to form helium, releasing lots of energy.

Would be great to replicate this on Earth. Plenty of fuel available, and no risk of nuclear meltdown.

Challenges: heat things to millions of degrees (not so hard), and keep them confined (very hard).

Reaction (from the figure): $^2\mathrm{H} + {}^3\mathrm{H} \rightarrow {}^4\mathrm{He}\ (3.5\ \mathrm{MeV}) + n\ (14.1\ \mathrm{MeV})$

At Millions of Degrees, Small Plasmas Evaporate Away

Magnetic Fields Constrain Plasma Movement to One Dimension

Closed Magnetic Fields Can Confine Plasmas

Tokamaks Confine Plasmas Using Magnetic Fields

Orange, Magenta, Green: magnetic field generating coils

Violet: plasma; Blue: single magnetic field line (example)

1 meter radius, 1 million °C, 15000 Ampere current

Self Generated Fields Cause Instabilities

Electric currents (which generate magnetic fields) flow not just in the coils, but also in the plasma itself

The plasma thus modifies the fields that confine it

... sometimes in a self-amplifying way – instability

Typical shape: rotating, helical deformation. Timescale: 50 microseconds.

Only High-Speed Feedback Control Can Preserve Confinement

Sensors detect deformations due to plasma currents

Control coils dynamically push back – “feedback control”

2 GPU Control System Architecture

Real-Time Performance is Determined By Latency and Sampling Period

[Figure: sample packets (S) passing from the digitizer through the GPU processing pipelines to the analog output, with the sampling period and the latency marked.]

Latency is response time of feedback system

Sampling period determines smoothness

Algorithmic complexity limits latency, not sampling period

Need both latency and sampling period on the order of a few microseconds
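The distinction between latency and sampling period can be made concrete with a little arithmetic. The sketch below is not from the talk; the function name `samples_in_flight` and the numbers in the usage note are assumptions chosen for illustration. If each packet needs `latency_us` to travel from digitizer to analog output and a new packet arrives every `period_us`, then roughly ceil(latency / period) packets have to be in the processing pipelines at once.

```c
#include <assert.h>

/* Hypothetical helper (not from the talk): number of sample packets that are
 * in flight at the same time when a new packet arrives every period_us
 * microseconds and each packet needs latency_us microseconds to pass through.
 * Computed as ceil(latency_us / period_us) using integer arithmetic. */
unsigned samples_in_flight(unsigned latency_us, unsigned period_us)
{
    assert(period_us > 0);
    return (latency_us + period_us - 1) / period_us;
}
```

With an assumed latency of 10 µs and a sampling period of 4 µs, for example, three packets overlap in the pipeline, which is why a longer algorithm raises the latency without necessarily lowering the sampling rate.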

Control Algorithm is Implemented in One Kernel

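[Figure: two CPU/GPU timelines, with time running downward.]

Left timeline (one kernel launch per step): the CPU reads input data, processes data, sends data to GPU memory, starts GPU kernel A and waits for it while the GPU computes result a. The CPU then reads the results from GPU memory, processes them, sends new data to GPU memory, starts GPU kernel B and waits for it while the GPU computes result b. It reads the results from GPU memory again, and so on, until the output data is written.

Right timeline (single kernel), labels in order: Read data, Send parameters to GPU memory, Start GPU kernel, Wait for GPU kernel, Compute result a, Process results, Compute result b, ..., Process data, Write output data.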

Redundant PCIe Transfers have to be Avoided To Reduce Latency

Traditional

Data bounces through host RAM

PCIe bus has multi-GB/s throughput

Transfer setup takes several µs

Okay if data chunks are big, transfer and processing takes long

Bad if latency is longer than transfer time

New

Peer-to-peer transfers eliminate need for bounce buffer

Good performance even for small amounts of data

Can be implemented in software (kernel)

Required peer-to-peer capable root complex is present in most mid- to high-end mainboards.

Peer-to-peer PCIe transfers are set up by sharing BARs

[Figure: the A/D module, the GPU and the D/A module each expose BARs, which are initialized from the BIOS by the CPU. The DMA controller of the A/D module writes into GPU memory; the DMA controller of the D/A module reads from it.]

PCIe devices communicate via “BARs” in the PCI address space

GPU can map part of its memory into a BAR

AD/DA modules can transfer to/from arbitrary PCI address

CPU establishes communication by telling AD/DA modules about GPU BAR.

This required some trickery in the past, but is officially supported as of CUDA 5.

Example: Userspace

/* Allocate buffer with extra space for 64 kB alignment */
CUdeviceptr dev_addr;
cuMemAlloc(&dev_addr, size + 0xFFFF);

/* Prepare mapping */
CUDA_POINTER_ATTRIBUTE_P2P_TOKENS tokens;
cuPointerGetAttribute(&tokens, CU_POINTER_ATTRIBUTE_P2P_TOKENS,
                      dev_addr);

/* Align to 64 kB */
dev_addr += 0xFFFF;
dev_addr &= ~0xFFFF;

/* Call custom kernel module to get bus address,
 * @fd refers to open device file */
struct rdma_info s;
s.dev_addr = dev_addr;
s.p2pToken = tokens.p2pToken;
s.vaSpaceToken = tokens.vaSpaceToken;
s.size = size;
ioctl(fd, RDMA_TRANSLATE_TOKEN, &s);
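The alignment step in the userspace code is easy to get wrong, so it may help to see it in isolation. The following is a minimal sketch, not part of the talk: it uses a plain `uint64_t` as a stand-in for `CUdeviceptr`, and the names `align_up_64k` and `ALIGN_64K_MASK` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN_64K_MASK ((uint64_t)0xFFFF)

/* Round addr up to the next multiple of 64 kB (0x10000). The caller must
 * have allocated at least 0xFFFF extra bytes so that the aligned region
 * still lies inside the allocation. */
uint64_t align_up_64k(uint64_t addr)
{
    assert(addr <= UINT64_MAX - ALIGN_64K_MASK);  /* guard against wrap-around */
    return (addr + ALIGN_64K_MASK) & ~ALIGN_64K_MASK;
}
```

An address that is already aligned is returned unchanged, and any other address moves up by less than 64 kB, which is why the allocation asks for `size + 0xFFFF` bytes.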

Example: Kernelspace

long rtm_t_dma_ioctl(struct file *filp, unsigned int cmd,
                     unsigned long arg) {
    nvidia_p2p_page_table_t *page_table;
    // ....
    switch (cmd) {
    case RDMA_TRANSLATE_TOKEN: {
        /* Fetch tokens, device address and size from userspace */
        COPY_FROM_USER(&rdma_info, varg, sizeof(struct rdma_info));
        /* Pin the GPU pages and obtain their page table */
        nvidia_p2p_get_pages(rdma_info.p2pToken, rdma_info.vaSpaceToken,
                             rdma_info.dev_addr, rdma_info.size,
                             &page_table, rdma_free_callback, tdev);
        /* Return the bus address of the first page to userspace */
        rdma_info.bus_addr = page_table->pages[0]->physical_address;
        COPY_TO_USER(varg, &rdma_info, sizeof(struct rdma_info));
        return 0;
    }
    // Other ioctls
    }
}

Userspace Continued

/* Call custom kernel module to get bus address,
 * @fd refers to open device file */
rtm_t_rdma_info s;
s.dev_addr = dev_addr;
ioctl(fd, RTM_T_TRANSLATE_TOKEN, &s);

/* Retrieve bus address */
uint64_t bus_addr;
bus_addr = s.bus_addr;

/* Send bus address to digitizer */
init_rtm_t(bus_addr, other, stuff, here);

// Start GPU kernel

// Kernel polls input buffer

// Wait for kernel to complete
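The step "Kernel polls input buffer" is what lets one long-running kernel replace a launch per sample: the kernel spins until the digitizer's DMA write changes the buffer. The slides do not show the kernel, so the following is only a host-side C sketch of such a polling loop. The `sample_buffer` layout with a sequence counter, the function name `poll_for_sample` and the spin limit are all assumptions made for illustration, not the code used in the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_CHANNELS 96  /* assumed: one value per digitizer channel */

/* Hypothetical buffer layout: the writer updates data first and bumps seq
 * last, so a changed seq signals that a new sample set is available. */
struct sample_buffer {
    volatile uint32_t seq;
    volatile int16_t data[N_CHANNELS];
};

/* Spin until buf->seq differs from *last_seq or max_spins polls have passed.
 * On success, copy the samples to out, store the new sequence number in
 * *last_seq and return 1; on timeout return 0. */
int poll_for_sample(const struct sample_buffer *buf, uint32_t *last_seq,
                    int16_t out[N_CHANNELS], unsigned long max_spins)
{
    assert(buf != NULL && last_seq != NULL && out != NULL);
    for (unsigned long i = 0; i < max_spins; i++) {
        uint32_t seq = buf->seq;
        if (seq != *last_seq) {
            for (size_t ch = 0; ch < N_CHANNELS; ch++)
                out[ch] = buf->data[ch];
            *last_seq = seq;
            return 1;
        }
    }
    return 0;
}
```

`volatile` only forces the loop to re-read memory on every iteration; a real implementation would also need whatever memory-ordering guarantees the writer and the hardware require.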

3 Performance

The HBT-EP Plasma Control System was Built with Commodity Hardware.

Hardware:

Workstation PC

NVIDIA GeForce GTX 580

D-TACQ ACQ196 A-D Converter (96 channels, 16 bit)

2 D-TACQ AO32CPCI D-A Converters (2 x 32 channels, 16 bit)

Standard Linux host system (no real-time kernel required!)

P2P Transfers Reduce Latency by 50%

[Figure: latency [µs] versus sampling period [µs] for transfers via GPU RAM and via host RAM.]

Optimal latency when using host memory: 16 µs

Optimal latency when using GPU memory: 10 µs

50% difference does not mean having to wait twice as long, it is the difference between things blowing up or not.

GPU Beats CPU in Computational and Real-Time Performance even in the Microsecond Regime

Performance tested with repeated matrix application

GPU beats CPU down to 5 µs

Missed samples counted in 1000 runs

Missed samples with GPU: None, with CPU: up to 2.5%
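The benchmark workload is described only as "repeated matrix application". The slides do not give the code, so the plain-C sketch below is an assumption about what such a workload computes: a dense matrix-vector product applied several times in a row. The function names, the row-major layout and the `MAX_N` bound are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_N 100  /* assumed upper bound on the matrix size */

/* y = A * x for a row-major n x n matrix A. */
void matvec(size_t n, const float *A, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++) {
        float acc = 0.0f;
        for (size_t j = 0; j < n; j++)
            acc += A[i * n + j] * x[j];
        y[i] = acc;
    }
}

/* Apply A to x reps times in place, i.e. x <- A^reps * x. */
void matvec_repeated(size_t n, const float *A, float *x, unsigned reps)
{
    float tmp[MAX_N];
    assert(n <= MAX_N);
    for (unsigned r = 0; r < reps; r++) {
        matvec(n, A, x, tmp);
        memcpy(x, tmp, n * sizeof(float));
    }
}
```

Each application costs on the order of n² multiply-adds, so both the matrix size and the repetition count scale the work that has to fit into one sampling period.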

[Figure: sampling period [µs] versus matrix size (30 to 100) for GPU and CPU.]

[Figure: histogram of missed samples [%] for CPU and GPU, with the count on a logarithmic axis.]

Summary

1 The advantages of GPUs are not restricted to large problems requiring long calculations.

2 Even when processing kB-sized batches under microsecond latency constraints, GPUs can be faster than CPUs, while at the same time offering better real-time performance.

3 In these regimes, data transfer overhead becomes the dominating factor, and using peer-to-peer transfers improves performance by more than 50%.

4 A GPU-based real-time control system has been developed at Columbia University and tested for the control of magnetically confined plasmas. Contact us for details.

4 Backup Slides

Latency and Sampling Period are Measured Experimentally by Copying Square Waves

[Figure: shot 76504, voltage [V] versus time [µs] from 2380 to 2405 µs, showing the control input, control output and sample clock traces, with markers A and B.]

Control algorithm set up to copy input to output 1:1

Blue trace is input square wave

Green trace is output square wave

Output lags behind input by control system latency

Red trace is sampling interval (sampling on downward edge)
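The same measurement can be mimicked offline on recorded traces. The sketch below is an illustration rather than the analysis used for the talk: it assumes both traces are sampled on a common time grid with spacing `dt_us`, and the function names and the threshold parameter are hypothetical. It finds the first rising edge of the input and of the output and reports the delay between them.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the first sample at which the trace crosses threshold upward,
 * or -1 if the trace contains no rising edge. */
long first_rising_edge(const double *trace, size_t n, double threshold)
{
    for (size_t i = 1; i < n; i++)
        if (trace[i - 1] < threshold && trace[i] >= threshold)
            return (long)i;
    return -1;
}

/* Delay in microseconds between the first rising edge of the input trace
 * and the first rising edge of the output trace. Returns -1.0 if either
 * edge is missing or the output edge comes before the input edge. */
double estimate_latency_us(const double *in, const double *out, size_t n,
                           double dt_us, double threshold)
{
    assert(dt_us > 0.0);
    long e_in = first_rising_edge(in, n, threshold);
    long e_out = first_rising_edge(out, n, threshold);
    if (e_in < 0 || e_out < e_in)
        return -1.0;
    return (double)(e_out - e_in) * dt_us;
}
```

The resolution of such an estimate is limited by the spacing of the recorded samples, so the traces need to be recorded much faster than the control system's own sampling period.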

Plasma Physics Results: Dominant Mode Amplitude Reduced by up to 60%

[Figure: mode amplitude versus frequency [kHz] from -20 to 20 kHz, without feedback (No FB) and with feedback gains g=144 and g=577.]

Feedback Control uses Measurements to Determine Control Signals

[Figure: feedback loop. The controller's output (control signal / control output) drives the actuators, which act on the system through physical interaction. Sensors observe the system through physical interaction and return measurements (control input) to the controller.]

Goal: keep system in specific state

If system is perfectly known, can calculate required control signals (open-loop control)

In practice, need to use measurements to determine effects and update signals: feedback control

A control system acquires measurements, performs computations, and generates control output to manipulate the system state.
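The measure, compute, actuate cycle can be sketched with a toy model. Everything in the block below is an assumption for illustration: the one-variable plant `x[k+1] = growth * x[k] + u[k]` is not a plasma model, and the proportional controller is simply the most basic feedback law, not the algorithm used in the talk.

```c
#include <assert.h>

/* Toy plant (an assumption, not the plasma model):
 * x[k+1] = growth * x[k] + u[k]. With growth > 1 the state grows on its
 * own, like a self-amplifying instability. */
double plant_step(double x, double u, double growth)
{
    return growth * x + u;
}

/* Proportional controller: the control output opposes the measurement. */
double controller(double measurement, double gain)
{
    assert(gain >= 0.0);  /* a negative gain would amplify, not oppose */
    return -gain * measurement;
}

/* Run the feedback loop for the given number of steps and return the
 * final state. */
double run_loop(double x0, double growth, double gain, unsigned steps)
{
    double x = x0;
    for (unsigned k = 0; k < steps; k++) {
        double u = controller(x, gain);  /* measure and compute */
        x = plant_step(x, u, growth);    /* actuate */
    }
    return x;
}
```

With `growth = 1.5` and no feedback (`gain = 0`) the state grows by a factor of 1.5 per step, while `gain = 1.0` turns the effective factor into 0.5 and the state decays, which is the qualitative behaviour feedback control is meant to achieve.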

Data Passthrough Establishes 8 µs Lower Latency Limit

[Figure: latency [µs] versus sampling period [µs] for data passthrough via GPU RAM and via host RAM.]

Control system uses same buffer to write input and read output

No GPU processing, so no difference between host and GPU memory

Jump: 4 µs required for A-D conversion and data push

Offset: 4 µs required for data pull and D-A conversion