Brook for GPUs
Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Pat Hanrahan
February 10th, 2003
February 11th, 2004 2
Brook: general purpose streaming language
• developed for PCA Program/Merrimac
– compiler: RStream
• Reservoir Labs
– DARPA PCA Program
• Stanford: SmartMemories
• UT Austin: TRIPS
• MIT: RAW
– Brook version 0.2 spec: http://merrimac.stanford.edu
– Brook for GPUs: http://brook.sourceforge.net
[Figure: block diagram with Stream Execution Unit, Stream Register File, Memory System, Network Interface, Scalar Execution Unit, DRDRAM, Network]
Brook: general purpose streaming language
• stream programming model
– enforce data parallel computing
• streams
– encourage arithmetic intensity
• kernels
• C with streams
Brook for GPUs
• demonstrate GPU streaming coprocessor
– make programming GPUs easier
• hide texture/pbuffer data management
• hide graphics based constructs in CG/HLSL
• hide rendering passes
• virtualize resources
– performance!
• … on applications that matter
– highlight GPU areas for improvement
• features required for general purpose stream computing
system outline
.br: Brook source files
brcc: source to source compiler
brt: Brook run-time library
Brook language
streams
• streams
– collection of records requiring similar computation
• particle positions, voxels, FEM cell, …
float3 positions<200>;
float3 velocityfield<100,100,100>;
– encourage data parallelism
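As a rough mental model only (this is not a description of how the Brook runtime stores streams), the two declarations above can be pictured in plain C as flat collections of records with a shape. The `float3` struct, the helper names, and the row-major layout are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type standing in for Brook's float3. */
typedef struct { float x, y, z; } float3;

/* float3 positions<200>;  -> a 1D stream of 200 records */
enum { NUM_POSITIONS = 200 };

/* float3 velocityfield<100,100,100>;  -> a 3D stream of records */
enum { VX = 100, VY = 100, VZ = 100 };

/* Number of records in the 3D stream. */
size_t velocityfield_length(void) {
    return (size_t)VX * VY * VZ;
}

/* Assumed row-major mapping from a 3D stream position to a flat index. */
size_t stream3d_index(size_t i, size_t j, size_t k) {
    assert(i < VX && j < VY && k < VZ);
    return (i * VY + j) * VZ + k;
}
```

Every record has the same type and receives the same computation, which is what makes the collection a natural unit of data parallelism.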
Brook language
kernels
• kernels
– functions applied to streams
• similar to for_all construct

kernel void foo (float a<>, float b<>, out float result<>) {
  result = a + b;
}

float a<100>;
float b<100>;
float c<100>;

foo(a,b,c);

/* equivalent loop in plain C */
for (i=0; i<100; i++)
  c[i] = a[i]+b[i];

– no dependencies between stream elements
• encourage high arithmetic intensity
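One way to see what "no dependencies between stream elements" buys: because each output depends only on the inputs at the same position, the elements can be processed in any order (or all at once) with the same result. The sketch below is a plain C emulation, not Brook runtime code; `kernel2_fn`, `run_forward`, and `run_reverse` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C stand-in for a two-input, one-output Brook kernel. */
typedef void (*kernel2_fn)(float a, float b, float *result);

/* C analogue of: kernel void foo(float a<>, float b<>, out float result<>) */
void foo(float a, float b, float *result) {
    *result = a + b;
}

/* Apply the kernel to elements 0, 1, ..., n-1. */
void run_forward(kernel2_fn k, const float *a, const float *b,
                 float *out, size_t n) {
    assert(k != NULL);
    for (size_t i = 0; i < n; i++)
        k(a[i], b[i], &out[i]);
}

/* Apply the kernel to elements n-1, ..., 1, 0. */
void run_reverse(kernel2_fn k, const float *a, const float *b,
                 float *out, size_t n) {
    assert(k != NULL);
    for (size_t i = n; i-- > 0; )
        k(a[i], b[i], &out[i]);
}
```

Both drivers fill `out` with identical values, which is the property a GPU exploits when it evaluates many elements in parallel.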
Brook language
kernels
• Ray Triangle Intersection

kernel void krnIntersectTriangle(Ray ray<>, Triangle tris[],
                                 RayState oldraystate<>,
                                 GridTrilist trilist[],
                                 out Hit candidatehit<>) {
  float idx, det, inv_det;
  float3 edge1, edge2, pvec, tvec, qvec;
  if (oldraystate.state.y > 0) {
    idx = trilist[oldraystate.state.w].trinum;
    edge1 = tris[idx].v1 - tris[idx].v0;
    edge2 = tris[idx].v2 - tris[idx].v0;
    pvec = cross(ray.d, edge2);
    det = dot(edge1, pvec);
    inv_det = 1.0f/det;
    tvec = ray.o - tris[idx].v0;
    candidatehit.data.y = dot( tvec, pvec ) * inv_det;
    qvec = cross( tvec, edge1 );
    candidatehit.data.z = dot( ray.d, qvec ) * inv_det;
    candidatehit.data.x = dot( edge2, qvec ) * inv_det;
    candidatehit.data.w = idx;
  } else {
    candidatehit.data = float4(0,0,0,-1);
  }
}
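For readers who want to trace the arithmetic on a CPU, here is a minimal C sketch of the same computation for a single ray and a single triangle. It follows the kernel's steps (the arithmetic of the Möller-Trumbore test, without its rejection checks). The struct layouts and helper names are assumptions; the slide does not show the definitions of `Ray`, `Triangle`, or `Hit`. The `assert` on `det` is an addition for the sketch; the kernel itself has no such check.

```c
#include <assert.h>

/* Assumed layouts; the slide does not define these types. */
typedef struct { float x, y, z; } float3;
typedef struct { float x, y, z, w; } float4;
typedef struct { float3 o, d; } Ray;            /* origin, direction */
typedef struct { float3 v0, v1, v2; } Triangle;

float3 sub3(float3 a, float3 b) {
    float3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

float3 cross3(float3 a, float3 b) {
    float3 r = { a.y * b.z - a.z * b.y,
                 a.z * b.x - a.x * b.z,
                 a.x * b.y - a.y * b.x };
    return r;
}

float dot3(float3 a, float3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Same steps as the kernel body; the result is packed like candidatehit.data:
   x = distance along the ray, y and z = barycentric coordinates, w = index. */
float4 intersect_triangle(Ray ray, Triangle tri, float idx) {
    float3 edge1 = sub3(tri.v1, tri.v0);
    float3 edge2 = sub3(tri.v2, tri.v0);
    float3 pvec = cross3(ray.d, edge2);
    float det = dot3(edge1, pvec);
    assert(det != 0.0f);  /* sketch only: ray assumed not parallel to triangle */
    float inv_det = 1.0f / det;
    float3 tvec = sub3(ray.o, tri.v0);
    float3 qvec = cross3(tvec, edge1);
    float4 hit;
    hit.y = dot3(tvec, pvec) * inv_det;
    hit.z = dot3(ray.d, qvec) * inv_det;
    hit.x = dot3(edge2, qvec) * inv_det;
    hit.w = idx;
    return hit;
}
```

A ray starting at (0.25, 0.25, 1) and pointing down the negative z axis hits the unit right triangle in the z = 0 plane at distance 1 with barycentric coordinates (0.25, 0.25).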
Brook language
additional features
• reductions
– scalar
– stream
• stride & repeat
• GatherOp & ScatterOp
– a[i] += p
– p = a[i]++
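The semantics named above can be sketched sequentially in C. This is an illustration of the slide's notation, not the Brook API: the function names are hypothetical, sum is chosen arbitrarily as the reduction operator, and processing the updates in index order is an assumption (a streaming implementation need not guarantee any order for conflicting updates).

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reduction: collapse a whole stream to one value (sum as example). */
float reduce_sum(const float *s, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += s[i];
    return total;
}

/* ScatterOp, in the slide's notation: a[i] += p */
void scatter_add(float *a, size_t a_len,
                 const size_t *idx, const float *p, size_t n) {
    for (size_t k = 0; k < n; k++) {
        assert(idx[k] < a_len);
        a[idx[k]] += p[k];
    }
}

/* GatherOp, in the slide's notation: p = a[i]++ (read, then modify) */
void gather_inc(float *a, size_t a_len,
                const size_t *idx, float *p, size_t n) {
    for (size_t k = 0; k < n; k++) {
        assert(idx[k] < a_len);
        p[k] = a[idx[k]]++;
    }
}
```

Unlike a plain kernel, both operations let several stream elements touch the same array location, which is why they need a combining operator rather than a simple write.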
brcc compiler
infrastructure
• based on ctool
– http://ctool.sourceforge.net
• parser
– build code tree
– extend C grammar to accept Brook
• convert
– tree transformations
• codegen
– generate cg & hlsl code
– call cgc, fxc
– generate stub function
Applications
Ray-tracer
FFT
Segmentation
Linear Algebra:
– BLAS, LINPACK, LAPACK
GPU Gotchas
NVIDIA:
• Register Penalty
• Render to Texture Limitation
– Requires explicit copy or heavy pbuffer solution
– Superbuffer extension needed
  http://mirror.ati.com/developer/SIGGRAPH03/Percy_OpenGL_Extensions SIG03.pdf
GPU Gotchas
ATI Radeon 9800 Pro
• Limited dependent texture lookup
• 96 instructions
• 24-bit floating point
– s16e7
  Integers up to 131,072 (s23e8: 16,777,216)
[Figure: four stages numbered 1 to 4, each consisting of Memory Refs followed by Math Ops]
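The integer limits quoted on this slide follow from the mantissa width: with m stored mantissa bits plus an implicit leading 1, every integer up to 2^(m+1) is exactly representable, giving 2^17 = 131,072 for s16e7 and 2^24 = 16,777,216 for s23e8. The sketch below checks that arithmetic; the implicit-leading-bit assumption and the function names are illustrative, not taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Largest N such that every integer in [0, N] is exactly representable,
   assuming mantissa_bits stored bits plus one implicit leading bit. */
uint32_t max_contiguous_int(unsigned mantissa_bits) {
    assert(mantissa_bits < 31);
    return (uint32_t)1 << (mantissa_bits + 1);
}

/* Shows the limit for a 32-bit float (s23e8): 2^24 + 1 is not
   representable and rounds back to 2^24. Returns 1 when that happens. */
int float_loses_integer_at_limit(void) {
    volatile float limit = 16777216.0f;   /* 2^24 */
    volatile float next = limit + 1.0f;   /* stored back as a 32-bit float */
    return next == limit;
}
```

The practical consequence on a 24-bit part is that array indices or counters held in floats stop being exact beyond 131,072.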
GPU Catch-Up!
• Integer & Bit Ops & Double Precision
• Memory Addressing
• CGC/FXC Performance
– Hand code performance critical code
• No native reduction support
• No native scatter support
– p[i] = a (indirect write)
• No programmable blend
– GatherOp / ScatterOp
• Limited 4x4 output
– Brook virtualized kernel outputs
• Readback still slow
– NV35 OpenGL: 600 MB/sec Download, 170 MB/sec Readback
– ATI DirectX: 550 MB/sec Download, 50 MB/sec Readback
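To put the readback figures in perspective, this sketch estimates how long one transfer takes at a given bandwidth. The buffer size used in the note below (a 1024 x 1024 buffer of 4-component floats) and the convention 1 MB = 10^6 bytes are assumptions; the slides state only the bandwidth numbers.

```c
#include <assert.h>

/* Bytes in a width x height buffer of float4 values (4 floats, 4 bytes each). */
double float4_buffer_bytes(unsigned width, unsigned height) {
    return (double)width * (double)height * 4.0 * 4.0;
}

/* Seconds needed to move `bytes` at `mb_per_sec`, assuming 1 MB = 1e6 bytes. */
double transfer_seconds(double bytes, double mb_per_sec) {
    assert(mb_per_sec > 0.0);
    return bytes / (mb_per_sec * 1e6);
}
```

For that assumed buffer (about 16.8 MB), readback would take roughly 0.10 s at 170 MB/sec and roughly 0.34 s at 50 MB/sec, so an application that reads results back every pass can easily be bound by the transfer and not by the kernel.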
GPUs of the future (we hope)
• Complete Instruction Sets
– Integers, Bit Ops, Doubles, Mem Access
• Integration
– Streaming coprocessor not just a rendering device
• Streaming architectures
[Figure: streaming architecture diagram with four SDRAM banks, a Stream Register File, and three ALU Clusters]
Brook for GPUs
• Release v0.3 available on Sourceforge
• Project Page
– http://graphics.stanford.edu/projects/brook
• Source
– http://www.sourceforge.net/projects/brook
• Over 4K downloads!
• Questions?
Fly-fishing fly images from The English Fly Fishing Shop