Download - Bifrost: Easy GPU Pipeline Development · 2017-08-23 · Bifrost is deployed in the wild: • Backend for newest LWA station in NM • Bifrost-powered data capture for live all-sky

8/14/17 Miles Cranmer 1

Bifrost: Easy GPU Pipeline Development

github.com/ledatelescope/bifrost

• Presenter: Miles Cranmer (CfA/McGill)
• On behalf of: Ben Barsdell (NVIDIA), Danny Price (Berkeley), Jayce Dowell (UNM), Hugh Garsden (CfA), Frank Schinzel (NRAO), Greg Taylor (UNM), Lincoln Greenhill (CfA)

Stream-processing and real-time GPU computing

• Stream-processing: operating on data which is potentially unlimited in extent
  • E.g., time stream of digitized voltages
• Nontrivial for CPU/GPU systems:
  • Creation of data structures for buffer memory management, packet capture
  • Additional complexities for asynchronous copies and kernel execution
  • Manual parallelization/core binding of algorithms and pipelines
  • Potential issues include memory leaks and race conditions
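The stream-processing idea above can be sketched in plain Python: the input is consumed in fixed-size chunks (the pipeline example later in these slides calls the chunk size gulp_nframe), so memory use stays bounded however long the stream runs. The helper names gulps and running_mean_power are illustrative only and are not part of Bifrost.

```python
from itertools import islice

def gulps(stream, gulp_size):
    """Yield successive lists of at most gulp_size samples from any iterable."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, gulp_size))
        if not chunk:
            return
        yield chunk

def running_mean_power(stream, gulp_size=4):
    """Accumulate the mean power of a stream without holding it all in memory."""
    total, count = 0.0, 0
    for chunk in gulps(stream, gulp_size):
        total += sum(v * v for v in chunk)
        count += len(chunk)
    return total / count if count else 0.0

# A generator stands in for a digitizer producing voltages on demand
voltages = (v for v in [1, -1, 2, -2, 3, -3, 0, 0, 1])
mean_power = running_mean_power(voltages)
```

Only one chunk is alive at a time, which is the property a real-time pipeline needs; the hard parts listed above (packet capture, asynchronous GPU copies, core binding) are what a framework has to add on top.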


Bifrost is deployed in the wild:
• Backend for newest LWA station in NM
  • Bifrost-powered data capture for live all-sky image
  • Google: “LWA TV 2”
• Pulsar detection:
  • Validation timing within 0.0001 ms of canonical for PSR B0834+06 (well within 1σ of measurement)


Bifrost core concepts

• Blocks
  • Independent thread
  • “Black box” algorithm
• Ring buffers (Rings)
  • Emulates wrap-around in memory
• Memory spaces
  • Rings assigned to specific “space”
• Pipelines
  • Combination of the above
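As a rough mental model of these concepts, and not Bifrost's actual implementation, each block can be pictured as a thread and each ring as a bounded queue between threads. The sketch below uses only Python's standard threading and queue modules; every function name in it is hypothetical.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream

def source(out_q, data, gulp):
    """Source block: emit the data in spans of `gulp` items."""
    for i in range(0, len(data), gulp):
        out_q.put(data[i:i + gulp])
    out_q.put(SENTINEL)

def transform(in_q, out_q, func):
    """Transform block: apply a black-box function to every span."""
    while True:
        span = in_q.get()
        if span is SENTINEL:
            out_q.put(SENTINEL)
            return
        out_q.put([func(x) for x in span])

def sink(in_q, results):
    """Sink block: collect whatever arrives."""
    while True:
        span = in_q.get()
        if span is SENTINEL:
            return
        results.extend(span)

def run_pipeline(data, gulp=3):
    """Pipeline: three blocks, each in its own thread, joined by two queues."""
    q1, q2 = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
    results = []
    threads = [
        threading.Thread(target=source, args=(q1, data, gulp)),
        threading.Thread(target=transform, args=(q1, q2, lambda x: x * x)),
        threading.Thread(target=sink, args=(q2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

squared = run_pipeline(list(range(8)))
```

The bounded queues give back-pressure: a fast source blocks once its downstream neighbor falls two spans behind. A real ring additionally lives in a chosen memory space and can feed several readers at once.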

The Bifrost framework

• Python frontend wraps fast C/C++/CUDA backend
• Frontend:
  • Blocks and Pipelines are Python object abstractions for the backend
  • ND-array object for memory management (span of ring buffer)
  • ctypes wraps all C calls
• Backend:
  • Common type definitions and “BFarray” generic data structure
  • “Ring buffer” used for inter-block communication
  • Several common modules implemented


Ring Buffer implementation

• Multiple readers, single writer ⇒ branched pipelines OK
• Thread safe
• Allocated in system (CPU), cuda (GPU), or cuda_host (pinned CPU) memory
• What’s unique?
  • Read/write any location, any size
  • Ringlets
  • Optional guarantee
  • Metadata rich
  • Multi-“sequence” = resizable, changeable metadata
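The single-writer, multiple-reader behavior and the wrap-around can be illustrated with a toy ring in plain Python. This is a sketch under simplifying assumptions: it is not thread safe, it stores Python objects rather than bytes in a memory space, and ToyRing and its methods are hypothetical names, not Bifrost's ring API.

```python
class ToyRing:
    """Fixed-capacity ring: one writer, any number of independent readers."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0        # total number of items ever written
        self.read_pos = {}   # reader name -> absolute read offset

    def open_reader(self, name):
        """Register a reader that starts at the current write position."""
        self.read_pos[name] = self.head

    def write(self, items):
        for item in items:
            self.buf[self.head % self.capacity] = item  # wrap-around
            self.head += 1

    def read(self, name, n):
        """Return up to n unread items for this reader and advance it."""
        pos = self.read_pos[name]
        if self.head - pos > self.capacity:
            raise RuntimeError("reader %r fell behind and was overwritten" % name)
        n = min(n, self.head - pos)
        out = [self.buf[(pos + k) % self.capacity] for k in range(n)]
        self.read_pos[name] = pos + n
        return out

ring = ToyRing(capacity=4)
ring.open_reader('a')
ring.open_reader('b')
ring.write([1, 2, 3])
first_a = ring.read('a', 2)   # reader 'a' keeps up
ring.write([4, 5])            # the fifth item wraps around onto slot 0
rest_a = ring.read('a', 3)
try:
    ring.read('b', 1)         # reader 'b' never read, so its data is gone
    b_overwritten = False
except RuntimeError:
    b_overwritten = True
```

Here the slow reader simply loses data. The "optional guarantee" on the slide presumably addresses this case by preventing the writer from overwriting data a reader still needs, which this toy does not attempt.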


API example 1: block


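    from copy import deepcopy

    import bifrost as bf
    from bifrost.pipeline import TransformBlock

    class QuantizeBlock(TransformBlock):
        def __init__(self, iring, dtype, scale=1., *args, **kwargs):
            TransformBlock.__init__(self, iring, *args, **kwargs)
            self.dtype = dtype
            self.scale = scale

        def on_sequence(self, isequence):
            # Copy the input header and change only the output data type
            ohdr = deepcopy(isequence.header)
            ohdr['_tensor']['dtype'] = self.dtype
            return ohdr

        def on_data(self, ispan, ospan):
            # Quantize one span of input data into the output span
            bf.quantize.quantize(ispan.data, ospan.data, self.scale)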
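What a quantize step does to the data can be illustrated without Bifrost. The sketch assumes that quantization means scale, round to the nearest integer, and clip to the signed 8-bit range; the exact rounding and saturation rules of bf.quantize.quantize are not stated on the slide, so treat that as an assumption. quantize_i8 is a hypothetical helper.

```python
def quantize_i8(values, scale=1.0):
    """Scale, round, and clip a sequence of floats to the int8 range."""
    lo, hi = -128, 127
    return [max(lo, min(hi, round(v * scale))) for v in values]

samples = [0.4, 1.6, -300.0, 200.0]
unit_scale = quantize_i8(samples)             # out-of-range values saturate
half_scale = quantize_i8(samples, scale=0.5)  # scaling first keeps 200.0 in range
```

Choosing the scale is the design decision: too large and strong signals saturate, too small and weak signals round to zero.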

API example 2: pipeline

• Read in file
• Copy to GPU
• FFT
• Square modulus
• Transpose
• Copy back to CPU
• Convert to 8-bit integer
• Save
• Run the pipeline


    import bifrost as bf

    bc = bf.BlockChainer()
    bc.blocks.read_wav(['audio_file.wav'], gulp_nframe=4096)  # read in file
    bc.blocks.copy(space='cuda')                              # copy to GPU
    bc.views.split_axis('time', 256, label='fine_time')
    bc.blocks.fft(axes='fine_time', axis_labels='freq')       # FFT
    bc.blocks.detect(mode='scalar')                           # square modulus
    bc.blocks.transpose(['time', 'pol', 'freq'])              # transpose
    bc.blocks.copy(space='cuda_host')                         # copy back to CPU
    bc.blocks.quantize('i8')                                  # convert to 8-bit integer
    bc.blocks.write_sigproc()                                 # save

    # Run the pipeline
    pipeline = bf.get_default_pipeline()
    pipeline.shutdown_on_signals()
    pipeline.run()
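The numerical content of this pipeline can be checked on the CPU with a small reference in plain Python. It is a sketch only: it handles a single channel (so there is no 'pol' axis to transpose), it skips the file I/O and memory copies, it uses a naive DFT in place of a GPU FFT, and dft and toy_pipeline are hypothetical names.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
            for f in range(n)]

def toy_pipeline(samples, nfine=8):
    """Return one quantized power spectrum per frame of nfine samples."""
    spectra = []
    # split_axis: cut the time axis into frames of nfine samples each
    for start in range(0, len(samples) - nfine + 1, nfine):
        frame = samples[start:start + nfine]
        spectrum = dft(frame)                       # fft over 'fine_time'
        power = [abs(z) ** 2 for z in spectrum]     # detect: square modulus
        # quantize: round and clip to the signed 8-bit range
        spectra.append([max(-128, min(127, round(p))) for p in power])
    return spectra

# Two frames of a cosine that completes 2 cycles per 8-sample frame
tone = [math.cos(2 * math.pi * 2 * k / 8) for k in range(16)]
spectra = toy_pipeline(tone)
```

For a real-valued tone the power appears in the matching frequency bin and its mirror image, which gives a quick sanity check before moving the same chain onto the GPU.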

bf.map

• Easy CUDA kernel generation from Bifrost• JIT compiler uses NVRTC


    # Create three arrays on the GPU: inputs a and b, and an empty output c
    a = bf.ndarray([1,2,3,4,5], space='cuda')
    b = bf.ndarray([1,0,1,0,1], space='cuda')
    c = bf.empty(5, space='cuda')
    # Add a and b together
    bf.map("c = a + b", data={'c': c, 'a': a, 'b': b})

bf.map

Explicit indexing also supported. Outer product:


    bf.map("c(i,j) = a(i) * b(j)",
           {'c': c, 'a': a, 'b': b},
           axis_names=('i','j'))
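The two bf.map expressions can be read as loops over output elements. The plain-Python sketch below spells out that reading on the CPU; it says nothing about how the generated CUDA kernel is organized, and map_add and map_outer are hypothetical names.

```python
def map_add(a, b):
    """Meaning of "c = a + b": one output element per input element."""
    return [a[i] + b[i] for i in range(len(a))]

def map_outer(a, b):
    """Meaning of "c(i,j) = a(i) * b(j)": one output element per (i, j) pair."""
    return [[a[i] * b[j] for j in range(len(b))] for i in range(len(a))]

a = [1, 2, 3, 4, 5]
b = [1, 0, 1, 0, 1]
c_sum = map_add(a, b)
c_outer = map_outer(a, b)
```

In the outer-product case the output has one axis per index name, so for these inputs c would need to be a 5 x 5 array.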

Why Bifrost?


Astronomy-specific

• Bifrost developed in parallel with LWA-SV, driven by radio astronomy applications
  • ⇒ Core structural advantages for astronomy
• Ring features
  • Metadata describes the units of ring buffer dimensions; used in algorithms (e.g., dedispersion)
  • Multi-sequence ring buffers, useful for different observations. The metadata will propagate down the pipeline.
  • Time-tagged sequences in ring buffers ⇒ can dump section of data to disk based on time range, observation name
    • Useful for detections of transient phenomena
• Ndarray is a child of numpy.ndarray ⇒ compatibility with many numpy functions, matplotlib, etc.
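Selecting data by time range, as described above, comes down to converting timestamps into frame indices using the sequence metadata. The sketch below assumes a hypothetical header with a start time, a time step per frame, and a frame count; the key names time_offset, time_step, and nframe are invented for illustration and are not Bifrost's header layout.

```python
import math

def frames_for_time_range(header, t_begin, t_end):
    """Return the range of frame indices whose timestamps lie in [t_begin, t_end]."""
    t0, dt = header['time_offset'], header['time_step']
    nframe = header['nframe']
    first = max(0, math.ceil((t_begin - t0) / dt))
    last = min(nframe, math.floor((t_end - t0) / dt) + 1)
    return range(first, max(first, last))

# 20 frames starting at t = 100.0 s, one every 0.5 s
header = {'time_offset': 100.0, 'time_step': 0.5, 'nframe': 20}
selected = list(frames_for_time_range(header, 102.0, 104.0))
missed = list(frames_for_time_range(header, 200.0, 201.0))
```

A transient trigger would supply t_begin and t_end, and only the selected frames would be copied out of the ring before they are overwritten.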



Block library

Many astronomy and general processing blocks already built
• State of the art and flexible high-performance implementations
• Metadata rich
• Well-documented
• Flexible dimensions

These include:


• accumulate • audio • binary_io • detect • fdmt • fft • fftshift • guppi_raw • quantize • reduce • reverse • serialize • sigproc • transpose • unpack • wav


Logging and performance benchmarking


• getirq • getsiblings • like_bmon • like_ps • like_top • pipeline2dot • setirq


Rapid development speed; high performance


Bifrost code vs. C++ legacy:





Conclusion

• Future work
  • PSRDADA – Bifrost block
    • To enable capture with PSRDADA to a Bifrost ring for post-processing
  • Additional options for visualization, “ScopeBlock”
    • Visualize ring contents in real-time
  • Aiming for full support of correlation, pulsar/transient backend pipelines

github.com/ledatelescope/bifrost (or, Google: “leda telescope bifrost”)

