8/14/17 Miles Cranmer 1
Bifrost: Easy GPU Pipeline Development
github.com/ledatelescope/bifrost
• Presenter: Miles Cranmer (CfA/McGill)
• On behalf of: Ben Barsdell (NVIDIA), Danny Price (Berkeley), Jayce Dowell (UNM), Hugh Garsden (CfA), Frank Schinzel (NRAO), Greg Taylor (UNM), Lincoln Greenhill (CfA)
Stream-processing and real-time GPU computing
• Stream-processing: operating on data which is potentially unlimited in extent
  • E.g., time stream of digitized voltages
• Nontrivial for CPU/GPU systems:
  • Creation of data structures for buffer memory management, packet capture
  • Additional complexities for asynchronous copies and kernel execution
  • Manual parallelization/core binding of algorithms and pipelines
  • Potential issues include memory leaks and race conditions
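To make "potentially unlimited in extent" concrete, here is a minimal CPU-only sketch of stream processing in fixed-size chunks ("gulps"). It does not use Bifrost; `voltage_stream`, `gulps`, and `mean_power` are hypothetical names chosen for illustration.

```python
import itertools

def voltage_stream():
    """Hypothetical unbounded source: yields one digitized sample at a time."""
    for n in itertools.count():
        yield (n % 16) - 8  # stand-in for a signed 4-bit ADC sample

def gulps(stream, gulp_size):
    """Group an unbounded iterator into fixed-size lists ("gulps")."""
    while True:
        chunk = list(itertools.islice(stream, gulp_size))
        if len(chunk) < gulp_size:
            return  # source ended mid-gulp; drop the partial chunk
        yield chunk

def mean_power(chunk):
    """Per-gulp reduction: average of the squared samples."""
    return sum(x * x for x in chunk) / len(chunk)

# Process only the first three gulps; the source itself never ends.
first_gulps = itertools.islice(gulps(voltage_stream(), 16), 3)
powers = [mean_power(g) for g in first_gulps]
```

Memory use stays bounded by the gulp size regardless of how long the stream runs, which is the property a real-time pipeline needs.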
Bifrost is deployed in the wild:
• Backend for newest LWA station in NM
• Bifrost-powered data capture for live all-sky image
  • Google: “LWA TV 2”
• Pulsar detection:
  • Validation timing within 0.0001 ms of canonical for PSR B0834+06 (well within 1σ of measurement)
Bifrost core concepts
• Blocks
  • Independent thread
  • “Black box” algorithm
• Ring buffers (Rings)
  • Emulates wrap-around in memory
• Memory spaces
  • Rings assigned to specific “space”
• Pipelines
  • Combination of the above
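The relationship between these concepts can be sketched with the Python standard library alone: each block runs in its own thread, and a bounded `queue.Queue` stands in for a ring. This is an analogy under stated assumptions, not Bifrost code; all function names are hypothetical.

```python
import queue
import threading

SENTINEL = None  # marks end of stream

def source_block(out_q, n_gulps):
    """Produce a few small gulps, then signal end of stream."""
    for i in range(n_gulps):
        out_q.put([i, i + 1, i + 2])
    out_q.put(SENTINEL)

def square_block(in_q, out_q):
    """'Black box' algorithm: square every sample of each gulp."""
    while True:
        gulp = in_q.get()
        if gulp is SENTINEL:
            out_q.put(SENTINEL)
            return
        out_q.put([x * x for x in gulp])

def sink_block(in_q, results):
    """Consume gulps and record one summary value per gulp."""
    while True:
        gulp = in_q.get()
        if gulp is SENTINEL:
            return
        results.append(sum(gulp))

def run_pipeline(n_gulps=3):
    # Bounded queues apply back-pressure, loosely like a fixed-size ring.
    q1, q2 = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
    results = []
    threads = [
        threading.Thread(target=source_block, args=(q1, n_gulps)),
        threading.Thread(target=square_block, args=(q1, q2)),
        threading.Thread(target=sink_block, args=(q2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

results = run_pipeline()
```

A queue hands each item to exactly one consumer, so unlike a ring it cannot feed several readers; it is used here only to show blocks as independent threads joined into a pipeline.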
The Bifrost framework
• Python frontend wraps fast C/C++/CUDA backend
• Frontend:
  • Blocks and Pipelines are Python object abstractions for the backend
  • ND-array object for memory management (span of ring buffer)
  • ctypes wraps all C calls
• Backend:
  • Common type definitions and “BFarray” generic data structure
  • “Ring buffer” used for inter-block communication
  • Several common modules implemented
Ring Buffer implementation
• Multiple readers, single writer ⇒ branched pipelines OK
• Thread safe
• Allocated in system (CPU), cuda (GPU), or cuda_host (pinned CPU) memory
• What’s unique?
  • Read/write any location, any size
  • Ringlets
  • Optional guarantee
  • Metadata rich
  • Multi-“sequence” = resizable, changeable metadata
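Wrap-around and "read any location, any size" can be illustrated with a toy single-writer ring in pure Python. This sketch is not thread safe, has no guarantee mechanism, ringlets, or metadata, and is not how the Bifrost backend is written; `ToyRing` is a hypothetical name.

```python
class ToyRing:
    """Fixed-capacity byte ring: one writer, readers addressed by offset."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.head = 0  # total bytes ever written (monotonically increasing)

    def write(self, data):
        """Append bytes, overwriting the oldest data once the ring is full."""
        for byte in data:
            self.buf[self.head % self.capacity] = byte
            self.head += 1

    def read(self, offset, size):
        """Return `size` bytes starting at absolute stream `offset`."""
        if offset + size > self.head:
            raise ValueError("span not yet written")
        if offset < self.head - self.capacity:
            raise ValueError("span already overwritten")
        return bytes(self.buf[(offset + i) % self.capacity] for i in range(size))

ring = ToyRing(8)
ring.write(b"abcdef")
ring.write(b"ghij")            # wraps around: 'a' and 'b' are overwritten

reader_a = ring.read(2, 4)     # two readers at independent offsets
reader_b = ring.read(6, 4)     # this span crosses the physical end of the buffer
```

Because readers address the ring by absolute stream offset, any number of them can read the same data independently, which is what allows branched pipelines.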
API example 1: block
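from copy import deepcopy

import bifrost as bf
from bifrost.pipeline import TransformBlock

class QuantizeBlock(TransformBlock):
    def __init__(self, iring, dtype, scale=1., *args, **kwargs):
        TransformBlock.__init__(self, iring, *args, **kwargs)
        self.dtype = dtype
        self.scale = scale

    def on_sequence(self, isequence):
        # Called for each new input sequence: copy the header and
        # record the output data type in it
        ohdr = deepcopy(isequence.header)
        ohdr['_tensor']['dtype'] = self.dtype
        return ohdr

    def on_data(self, ispan, ospan):
        # Called for each span of data: quantize the input span
        # into the output span
        bf.quantize.quantize(ispan.data, ospan.data, self.scale)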
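The callback pattern in this block (one header callback per sequence, one data callback per span) can be mimicked without Bifrost. The sketch below is a toy driver, not the library's `TransformBlock`: `ToyTransformBlock` and `ToyQuantize` are hypothetical, `on_data` returns a list instead of writing into an output span, and the scale/round/clip behavior is an assumption about what quantization does here.

```python
from copy import deepcopy

class ToyTransformBlock:
    """Hypothetical stand-in for a framework base class (not Bifrost's)."""

    def run(self, sequences):
        outputs = []
        for header, spans in sequences:
            ohdr = self.on_sequence(header)                 # once per sequence
            odata = [self.on_data(span) for span in spans]  # once per span
            outputs.append((ohdr, odata))
        return outputs

class ToyQuantize(ToyTransformBlock):
    def __init__(self, dtype, scale=1.0, lo=-128, hi=127):
        self.dtype = dtype
        self.scale = scale
        self.lo, self.hi = lo, hi  # assumed 8-bit signed range

    def on_sequence(self, header):
        # Copy the header so the input sequence's metadata is untouched.
        ohdr = deepcopy(header)
        ohdr['_tensor']['dtype'] = self.dtype
        return ohdr

    def on_data(self, span):
        # Assumed semantics: scale, round to nearest, clip to [lo, hi].
        return [max(self.lo, min(self.hi, round(x * self.scale))) for x in span]

in_header = {'_tensor': {'dtype': 'f32'}}
in_spans = [[0.4, 1.6, 300.0], [-200.0, -1.2]]
out_header, out_spans = ToyQuantize('i8').run([(in_header, in_spans)])[0]
```

Separating the two callbacks lets metadata be computed once per sequence while the per-span work stays a tight data-only operation.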
API example 2: pipeline
import bifrost as bf

bc = bf.BlockChainer()
# Read in file
bc.blocks.read_wav(['audio_file.wav'], gulp_nframe=4096)
# Copy to GPU
bc.blocks.copy(space='cuda')
# FFT (over 256-sample windows split out of the time axis)
bc.views.split_axis('time', 256, label='fine_time')
bc.blocks.fft(axes='fine_time', axis_labels='freq')
# Square modulus
bc.blocks.detect(mode='scalar')
# Transpose
bc.blocks.transpose(['time', 'pol', 'freq'])
# Copy back to CPU
bc.blocks.copy(space='cuda_host')
# Convert to 8-bit integer
bc.blocks.quantize('i8')
# Save
bc.blocks.write_sigproc()
# Run the pipeline
pipeline = bf.get_default_pipeline()
pipeline.shutdown_on_signals()
pipeline.run()
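The axis bookkeeping in this chain can be traced with a small pure-Python sketch that manipulates only shapes and labels. It assumes a two-channel WAV file whose gulp has axes ('time', 'pol'), and an FFT that keeps the axis length; both are assumptions, and the helper functions are hypothetical rather than Bifrost's implementation.

```python
def split_axis(shape, labels, axis, n, label):
    """Split `axis` of length L into (L // n, n); the new inner axis gets `label`."""
    i = labels.index(axis)
    if shape[i] % n:
        raise ValueError("axis length must be a multiple of n")
    new_shape = shape[:i] + [shape[i] // n, n] + shape[i + 1:]
    new_labels = labels[:i] + [axis, label] + labels[i + 1:]
    return new_shape, new_labels

def transpose(shape, labels, order):
    """Reorder axes so that their labels appear in `order`."""
    perm = [labels.index(name) for name in order]
    return [shape[p] for p in perm], list(order)

# One gulp of 4096 frames from an assumed stereo file.
shape, labels = [4096, 2], ['time', 'pol']

# View: 4096 time samples become 16 windows of 256 fine-time samples.
shape, labels = split_axis(shape, labels, 'time', 256, 'fine_time')
after_split = (list(shape), list(labels))

# FFT over 'fine_time' relabels that axis as 'freq' (length assumed unchanged).
labels[labels.index('fine_time')] = 'freq'

# Transpose to the order requested in the pipeline.
shape, labels = transpose(shape, labels, ['time', 'pol', 'freq'])
```

A view such as this only changes how the data is described, so no samples are copied until a block actually consumes them.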
bf.map
• Easy CUDA kernel generation from Bifrost
• JIT compiler uses NVRTC
import bifrost as bf

# Create three arrays on the GPU: inputs a and b, and an empty output c
a = bf.ndarray([1,2,3,4,5], space='cuda')
b = bf.ndarray([1,0,1,0,1], space='cuda')
c = bf.empty(5, space='cuda')
# Add a and b together, element by element
bf.map("c = a + b", data={'c': c, 'a': a, 'b': b})
bf.map
Explicit indexing also supported. Outer product:
bf.map("c(i,j) = a(i) * b(j)",
{'c': c, 'a': a, 'b': b},
axis_names=('i','j'))
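As a CPU reference for what the indexed expression means, the same outer product can be written as explicit loops over i and j. This is plain Python, not generated CUDA, and `outer_reference` is a hypothetical helper; note that the output needs shape (len(a), len(b)).

```python
def outer_reference(a, b):
    """c[i][j] = a[i] * b[j] for every pair of indices (i, j)."""
    return [[a[i] * b[j] for j in range(len(b))] for i in range(len(a))]

c_ref = outer_reference([1, 2, 3], [1, 0, 1])
```

Each (i, j) pair is independent of the others, which is why an expression of this form maps naturally onto one GPU thread per output element.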
Why Bifrost?
Astronomy-specific
• Bifrost developed in parallel with LWA-SV, driven by radio astronomy applications
• ⇒ Core structural advantages for astronomy
• Ring features
  • Metadata describes the units of ring buffer dimensions; used in algorithms (e.g., dedispersion)
  • Multi-sequence ring buffers, useful for different observations. The metadata will propagate down the pipeline.
  • Time-tagged sequences in ring buffers ⇒ can dump section of data to disk based on time range, observation name
    • Useful for detections of transient phenomena
• Ndarray is a child of numpy.ndarray ⇒ compatibility with many numpy functions, matplotlib, etc.
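To show why unit-bearing metadata matters for an algorithm like dedispersion, the sketch below computes per-channel dispersion delays from a header-like dictionary using the standard cold-plasma relation Δt ≈ 4.148808×10³ s × DM × (ν⁻² − ν_ref⁻²), with frequencies in MHz and DM in pc cm⁻³. The header keys (`f0_mhz`, `df_mhz`, `nchan`) are hypothetical and do not reflect Bifrost's actual header layout.

```python
K_DM = 4.148808e3  # dispersion constant in s MHz^2 pc^-1 cm^3

def channel_delays(header, dm):
    """Delay in seconds of each channel relative to the highest frequency."""
    freqs = [header['f0_mhz'] + k * header['df_mhz']
             for k in range(header['nchan'])]
    f_ref = max(freqs)
    return [K_DM * dm * (f ** -2 - f_ref ** -2) for f in freqs]

# Hypothetical two-channel band: 1500 MHz and 1400 MHz.
header = {'f0_mhz': 1500.0, 'df_mhz': -100.0, 'nchan': 2}
delays = channel_delays(header, dm=10.0)
```

If the frequency axis were stored without units, the caller would have to guess whether values were in MHz or GHz, and the delays would be wrong by a factor of 10⁶.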
Block library
Many astronomy and general processing blocks already built
• State of the art and flexible high-performance implementations
• Metadata rich
• Well-documented
• Flexible dimensions
These include:
• accumulate • audio • binary_io • detect • fdmt • fft • fftshift • guppi_raw • quantize • reduce • reverse • serialize • sigproc • transpose • unpack • wav
Logging and performance benchmarking
• getirq • getsiblings • like_bmon • like_ps • like_top • pipeline2dot • setirq
Rapid development speed; high performance
Bifrost code vs. C++ legacy:
Conclusion
• Future work
  • PSRDADA – Bifrost block
    • To enable capture with PSRDADA to a Bifrost ring for post-processing
  • Additional options for visualization, “ScopeBlock”
    • Visualize ring contents in real-time
  • Aiming for full support of correlation, pulsar/transient backend pipelines
github.com/ledatelescope/bifrost (or, Google: “leda telescope bifrost”)