Mackenzie, HPEC, 22-Sep-2005
Reservoir Labs
A Streaming Virtual Machine for GPUs
Kenneth Mackenzie (Reservoir Labs, Inc.)
Dan Campbell (Georgia Tech Research Institute)
Peter Szilagyi (Reservoir Labs, Inc.)
Copyright © 2005 Government Purpose Rights, All Other Rights Reserved
Goal: Compile to PCs w/GPUs
[Figure: foo.c is compiled for a PC in which a CPU (12 GFLOPS) with DRAM (6.4 GB/s) is paired with a GPU (45 GFLOPS) with VRAM (38 GB/s).]
Barriers to General-Purpose Use
• Hardware:
  – Severe GPU programming restrictions! y=f(x) applied in parallel over an array, y.
  – CPU<->GPU bottleneck: 4GB/s
• Compiler:
  – No existing streaming compiler
• Abstraction:
  – GPU drivers built for graphics
  – Driver and hardware details are proprietary
[Figure: host processor (12 GFLOPS) with DRAM (6.4 GB/s) connected to a GPU (45 GFLOPS) with VRAM (38 GB/s).]
Subgoal: Build and Evaluate an Abstraction atop GPUs
• Hardware:
  – Severe GPU programming restrictions! y=f(x) applied in parallel over an array.
  – CPU<->GPU pipe: 4GB/s
• Compiler:
  – No existing streaming compiler
• Abstraction:
  – GPU drivers built for graphics
  – Driver and hardware details are proprietary
GPU vendors working on more general functionality
Reservoir and others working under DARPA Polymorphous Computing Architectures (PCA) program
This project: implement PCA’s Streaming Virtual Machine (SVM) abstraction atop GPUs and evaluate it.
Status; Related Work
• Status: in-progress
  – Runs simple programs end-to-end
    • Must spoon-feed programs through the not-quite-GPU-aware streaming compiler.
  – Experimenting with feedback
• Related Work:
  – BrookGPU, Ian Buck, et al. (Stanford), SIGGRAPH, 2004.
  – PUG, Mark Harris (nVidia), GPU Gems 2, 2005.
  – Sh, Michael McCool, et al. (Waterloo), Graphics Hardware 2002.
  – All are programmer interfaces, not compiler targets.
Outline
• Background on GPUs (2 slides)
• Streaming Virtual Machine
• Prototype SVM Toolchain
• Results
• Future Work
GPUs
• GPUs implement the last few stages of a standard 3D graphics rendering pipeline.
• Recent GPUs employ embedded multiprocessors (e.g. 24-way SIMD) for programmability in several of the stages.
• Trend is toward more generality and wider multiprocessing.
Illustration: from Cg Toolkit User’s Manual, nVidia corp.
GPUs for non-Graphics Programs
• Use the “fragment processor” embedded multiprocessor only.
  – Ignore for now potentially useful but mind-bending hardware goodies.
• Place data arrays in textures.
• Compute y=f(x1, x2, ...) where y and the xs are textures and f() is a function of any entries in the xs, evaluated once for each entry in y.
• Many and serious restrictions:
  – No-scatter constraint: gather from xs but no scatter to y
  – No local storage; no loop-carried dependencies.
  – Ops are 32-bit, not-quite-IEEE floating-point; no integer.
  – Branches permitted but penalized by SIMD architecture
  – Byzantine limits/costs on the complexity of f()
  – Substantial startup overhead; N1/2 (the array length at which half of peak throughput is reached) in the 1000s
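The gather-only restriction can be illustrated in plain C. The following is a minimal CPU-side sketch, not GPU code; the names `f3` and `gather_kernel` and the 3-point sum are hypothetical choices made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-element function f(): a 3-point sum that gathers
 * from neighbouring entries of the input (clamped at the edges). */
static float f3(const float *x, size_t n, size_t i)
{
    size_t lo = (i == 0) ? 0 : i - 1;
    size_t hi = (i + 1 == n) ? i : i + 1;
    return x[lo] + x[i] + x[hi];
}

/* GPU-style evaluation: output entry i is written exactly once, by the
 * invocation for index i. Reads may come from anywhere in x (gather);
 * the kernel never chooses where to write (no scatter). */
static void gather_kernel(float *y, const float *x, size_t n)
{
    assert(y != x);                  /* output must not alias the input */
    for (size_t i = 0; i < n; i++)   /* iterations are independent */
        y[i] = f3(x, n, i);
}

/* By contrast, a scatter such as y[p[i]] = x[i] lets the kernel pick the
 * output location, which the restrictions listed above rule out. */
```

Because no iteration depends on another, the loop in `gather_kernel` could run in any order or in parallel, which is the property the fragment processor relies on.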
Streaming Virtual Machine
DARPA Polymorphous Computing Architectures (PCA): Tiled Multiprocessors
• Chip multiprocessors built of replicated tiles
• Architectural novelty: mechanisms for combining tiles into larger units
• “Polymorphous”: configure the hardware to match the application, e.g. “threaded” vs. “streaming”
[Figure: PCA chips: MIT RAW, ISI/Raytheon Monarch, UT Austin TRIPS, Stanford Smart Memories.]
PCA Toolchain
• Two-level compilation factors the compilation problem.
• SVM is one abstraction and path through the toolchain.
[Figure: PCA toolchain. Source languages (StreamIt, Brook, C/C++, others) enter through the Stable APIs (SAPI) to the High Level Compilers (HLC). The Stable Architecture Abstraction Layer (SAAL) is labeled with the Virtual Machine API (UVM, SVM, TVM-HAL), Machine Model, Metadata and Context. Low Level Compilers (LLC) produce binaries for TRIPS, MONARCH, Smart Memories, RAW and others.]
SVM Slice of the PCA Toolchain
[Figure: source foo.c and machine model mm.xml feed the High-Level Compiler, which emits foo.svm.c (the SVM abstraction); the Low-Level Compiler turns that into foo.svm.exe.]
• Machine Model: processors, memories, interconnect in SVM-specified format.
• SVM Code: C “kernels” for the stream processors, C w/SVM API calls for control.
• LLC-to-HLC feedback (undefined)
SVM Details
• Machine Model: abstract architecture description in terms of processors, memory units, DMA units and interconnect in some topology.
• High Level Compiler: parallelizes, maps and schedules computation, storage and communication onto the machine model resources.
• Low Level Compiler: a hardware-specific uniprocessor compiler.
SVM Detail: R-Stream High-Level Compiler
• Map and schedule computation, storage and communication
• Reservoir’s R-Stream
  – Oriented to static computation, e.g. radar front-end.
  – Converts loop bodies to kernels sized to fit local memory constraints.
  – Modulo-schedules kernels on stream processors in a macro-pipeline.
[Figure: loop nests 1 to 4 modulo-scheduled across processors over time; the initiation interval is marked on the time axis.]
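The steady-state pattern of such a macro-pipeline can be sketched with a little index arithmetic. This is only an illustration of the indexing, not R-Stream's scheduling algorithm; it assumes one loop nest (stage) per processor and that every stage fits within one initiation interval. All function names are hypothetical.

```c
#include <assert.h>

/* Start time of pipeline stage `stage` working on data block `block`,
 * assuming every stage takes at most one initiation interval `ii`. */
static int stage_start(int stage, int block, int ii)
{
    assert(stage >= 0 && block >= 0 && ii > 0);
    return (block + stage) * ii;
}

/* Block that a stage works on during time slot `slot` (slot = time / ii),
 * or -1 if the stage is idle while the pipeline fills or drains. */
static int block_in_slot(int stage, int slot, int nblocks)
{
    int block = slot - stage;
    return (block >= 0 && block < nblocks) ? block : -1;
}

/* Number of slots needed to push nblocks through nstages:
 * pipeline fill plus one slot per remaining block. */
static int total_slots(int nstages, int nblocks)
{
    assert(nstages > 0 && nblocks > 0);
    return nstages + nblocks - 1;
}
```

With four stages and eight blocks, for example, the pipeline occupies `total_slots(4, 8)` slots, and in every slot after the fill phase all four processors are busy on different blocks.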
SVM Detail: R-Stream High-Level Compiler
Input is “Gumdrop”: an annotated C

    #pragma res parallel
    doloop (int i = 0; i < N; ++i) {
        z[[i]] = a * x[[i]] + y[[i]];
    }

Output is SVM: C for kernels (shown) plus C w/API calls to invoke kernels (not shown)

    static void main_kernel_work_0(struct kernel_data_tag_0 *d) {
        int i;
        int const hlc_hi_i = d->i_max;
        for (i = d->i_min; i < hlc_hi_i; i++) {
            float _t, _t_1, _t_2;
            SVM_BLOCK_READ(d->x_block, i - d->x_block_offset_0, &_t_2);
            SVM_BLOCK_READ(d->y_block, i - d->y_block_offset_0, &_t_1);
            _t = d->a * _t_2 + _t_1;
            SVM_BLOCK_WRITE(d->z_block, i - d->z_block_offset_0, &_t);
        }
    }
    // ...
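To experiment with the generated kernel without an SVM runtime, it can be wrapped in stand-in definitions. The sketch below is an assumption throughout: the slide does not show the real `SVM_BLOCK_READ` / `SVM_BLOCK_WRITE` macros or the layout of `struct kernel_data_tag_0`, so the `svm_block` type, the macro bodies, the field types and the helper `saxpy_via_kernel` are all hypothetical. Only the kernel body follows the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for an SVM block: a plain float array (assumption). */
typedef struct { float *data; size_t len; } svm_block;

/* Stand-ins for the block accessors named on the slide (assumption:
 * the real ones come from the SVM runtime headers). */
#define SVM_BLOCK_READ(blk, idx, dst)  (*(dst) = (blk)->data[(idx)])
#define SVM_BLOCK_WRITE(blk, idx, src) ((blk)->data[(idx)] = *(src))

/* Kernel argument record: field names follow the slide, types are guessed. */
struct kernel_data_tag_0 {
    int i_min, i_max;
    float a;
    svm_block *x_block, *y_block, *z_block;
    int x_block_offset_0, y_block_offset_0, z_block_offset_0;
};

/* Kernel body as shown on the slide: z = a * x + y over [i_min, i_max). */
static void main_kernel_work_0(struct kernel_data_tag_0 *d)
{
    int i;
    int const hlc_hi_i = d->i_max;
    for (i = d->i_min; i < hlc_hi_i; i++) {
        float _t, _t_1, _t_2;
        SVM_BLOCK_READ(d->x_block, i - d->x_block_offset_0, &_t_2);
        SVM_BLOCK_READ(d->y_block, i - d->y_block_offset_0, &_t_1);
        _t = d->a * _t_2 + _t_1;
        SVM_BLOCK_WRITE(d->z_block, i - d->z_block_offset_0, &_t);
    }
}

/* Hypothetical control code: wrap three arrays as blocks, run the kernel. */
static void saxpy_via_kernel(float a, float *x, float *y, float *z, int n)
{
    assert(n >= 0);
    svm_block xb = { x, (size_t)n }, yb = { y, (size_t)n }, zb = { z, (size_t)n };
    struct kernel_data_tag_0 d = { 0, n, a, &xb, &yb, &zb, 0, 0, 0 };
    main_kernel_work_0(&d);
}
```

The real control code would issue SVM API calls rather than call the kernel directly; the direct call here only makes the data flow visible.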
Prototype SVM-GPU Toolchain
1. Machine Model
2. Low-Level Compiler
3. Runtime
Toolchain (HLC)
[Figure: foo.c and svmgpu.xml feed the High-Level Compiler (R-Stream), which emits foo.svm.c (the SVM abstraction).]
• Source: R-Stream’s “Gumdrop” (C + abstract arrays)
• 1. Machine Model: Processors, Memories/Interconnect in SVM-specified format
• SVM Code: control + kernels
Toolchain (all)
[Figure: foo.c and svmgpu.xml feed the High-Level Compiler (R-Stream), which emits foo.svm.c (the SVM abstraction). The SVMGPU translator turns that into foo.svmgpu.c, which MSVC or another compiler builds into foo.svmgpu.exe. The diagram also shows svmgpu.dll, the Cg compiler and the OpenGL runtime.]
• Source: R-Stream’s “Gumdrop” (C + abstract arrays)
• 1. Machine Model: Processors, Memories/Interconnect in SVM-specified format
• SVM Code: control + kernels
• 2. Low-Level Compiler:
  – Translator to C + Cg
  – MSVC compiler
  – nVidia Cg compiler
• SVMGPU Code: C control code + Cg kernel code.
• 3. Runtime: SVM implementation w/ extensions for Cg
1. Machine Model
• Model the GPU as one fast processor (the fragment shader).
• Model the VRAM as local memory.
• Model a GPU “i-cache” to indicate limited program store.
• Model DMA between DRAM and VRAM although hidden by driver.
• Handles multiple GPUs (duplicate VRAM and DMA to match)
• Handles multiple CPUs
[Figure: machine model. CPU: 12 GFLOPs. GPU: 48 GFLOPs. DMA: 4 GB/s. DRAM: 1 GB, 6.4 GB/s. VRAM: 256 MB, 38 GB/s. “i-cache”: 64 KB.]
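The numbers in this machine model already suggest where a simple kernel will be limited. The sketch below mirrors the figures in a hypothetical C struct (the real model is an XML file in an SVM-specified format that is not shown on the slides) and computes a bandwidth-only upper bound for SAXPY, assuming 2 floating-point operations and 12 bytes of traffic (two 4-byte reads, one 4-byte write) per element.

```c
#include <assert.h>

/* Hypothetical C mirror of the machine model figures on the slide. */
struct machine_model {
    double cpu_gflops;   /* 12  GFLOPs */
    double gpu_gflops;   /* 48  GFLOPs */
    double dma_gbps;     /* 4   GB/s between DRAM and VRAM */
    double dram_gbps;    /* 6.4 GB/s */
    double vram_gbps;    /* 38  GB/s */
};

static const struct machine_model svmgpu_model = { 12.0, 48.0, 4.0, 6.4, 38.0 };

/* Upper bound in GFLOPS for a kernel that performs flops_per_elem
 * operations and moves bytes_per_elem bytes per element over a link
 * with link_gbps of bandwidth (ignores startup cost and overlap). */
static double bandwidth_bound_gflops(double link_gbps,
                                     double flops_per_elem,
                                     double bytes_per_elem)
{
    assert(link_gbps > 0.0 && bytes_per_elem > 0.0);
    return link_gbps * flops_per_elem / bytes_per_elem;
}

/* SAXPY bound when every operand crosses the DMA link, capped at GPU peak. */
static double saxpy_bound_over_dma(const struct machine_model *m)
{
    double bound = bandwidth_bound_gflops(m->dma_gbps, 2.0, 12.0);
    return bound < m->gpu_gflops ? bound : m->gpu_gflops;
}
```

Under these assumptions SAXPY streamed through the 4 GB/s DMA link is bounded near 0.67 GFLOPS, and near 6.3 GFLOPS if the data stays in VRAM; both are well below the modeled 48 GFLOPs peak, which is why the model includes the DMA path even though the driver hides it.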
Machine Model Approximations
• No model of extra hardware features, e.g. interpolation, z-sort
  – Use of these features is likely limited to libraries
• No model of SIMD details: startup cost, branch cost
  – Fixable
• No model of the no-scatter constraint
  – Conceivable in SVM’s machine model schema but R-Stream does not currently understand it.
• No model of detailed resource constraints
  – Number of registers (shader programs cannot spill registers)
  – Cost of instruction combinations
  – Cost of register usage vs. # of threads
  – Note: much of this detail is impossible to model precisely!
2. Translator
• What it is:
  – SVM (C) to SVMGPU (C + Cg) translator
  – Combines with vendor C and Cg compilers to form an SVM “Low-Level Compiler”
• Compact experimental prototype
  – 1400 lines of SML
Translator Operation
• Translates kernel bodies to Cg fragment shader programs
  – Outermost loop in a kernel removed (becomes hardware rasterization)
  – Input arrays become Cg textures
  – Input loop-invariant values become Cg uniform parameters
  – Output arrays become Cg out parameters
• Translates the outermost loop in kernels to hardware rasterization
  – Fragment program invocation over a block of data
  – Block extents given by loop bounds
• Checks correctness conditions at compile time and/or at runtime
  – Checks the no-scatter constraint
  – A kernel that fails this check is run on the CPU instead of the GPU
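The split described above can be emulated on the CPU to make the roles concrete. This is a sketch in plain C rather than Cg, using SAXPY as the kernel; `saxpy_fragment`, `rasterize` and `struct saxpy_uniforms` are hypothetical names, and real rasterization is performed by the hardware rather than by a loop.

```c
#include <assert.h>
#include <stddef.h>

/* Loop-invariant inputs: the counterpart of Cg uniform parameters. */
struct saxpy_uniforms { float a; };

/* "Fragment program": the former loop body. It is handed the index of
 * the output element, gathers from the input arrays (the counterpart of
 * textures) and returns the single value for that element. */
static float saxpy_fragment(size_t i, const float *x, const float *y,
                            const struct saxpy_uniforms *u)
{
    return u->a * x[i] + y[i];
}

/* "Rasterization": stands in for the removed outermost loop. The block
 * extents [lo, hi) come from the original loop bounds, and the driver,
 * not the fragment program, decides which output element is written. */
static void rasterize(float *out, size_t lo, size_t hi,
                      const float *x, const float *y,
                      const struct saxpy_uniforms *u)
{
    assert(lo <= hi);
    for (size_t i = lo; i < hi; i++)
        out[i] = saxpy_fragment(i, x, y, u);
}
```

Because `saxpy_fragment` cannot name an output location, a kernel written in this shape satisfies the no-scatter constraint by construction.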
3. Runtime
• Implements SVM functionality
• Includes support for SMP/clusters of CPUs and multiple GPUs
• Built atop OpenGL, Cg, nVidia/ATI drivers, and Windows.
• Compact experimental prototype
  – 2300 lines of C
Runtime Operation
• Manages textures as storage for SVM blocks
• Executes Cg code for translated SVM kernels
  – Falls back to running the kernel on the CPU if Cg compilation fails
• Implements DMA kernels using OpenGL calls
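The CPU fallback can be sketched as a small dispatch routine. This is a hypothetical structure, not the prototype's code: `struct svm_kernel`, `run_kernel` and the demo kernel `count_call` are invented for illustration, and a `NULL` GPU entry stands in for a failed Cg compilation.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*kernel_fn)(void *args);

enum run_target { RAN_ON_GPU, RAN_ON_CPU };

/* Hypothetical per-kernel record kept by the runtime. */
struct svm_kernel {
    kernel_fn cpu_impl;               /* original C kernel, always present */
    kernel_fn gpu_impl;               /* NULL if Cg compilation failed     */
    int passes_no_scatter_check;      /* result of the translator's check  */
};

/* Run on the GPU path only when a compiled kernel exists and the
 * no-scatter check passed; otherwise fall back to the CPU kernel. */
static enum run_target run_kernel(const struct svm_kernel *k, void *args)
{
    assert(k != NULL && k->cpu_impl != NULL);
    if (k->gpu_impl != NULL && k->passes_no_scatter_check) {
        k->gpu_impl(args);
        return RAN_ON_GPU;
    }
    k->cpu_impl(args);
    return RAN_ON_CPU;
}

/* Demo kernel for exercising the dispatch: counts its invocations. */
static void count_call(void *args)
{
    ++*(int *)args;
}
```

Keeping the CPU kernel as a mandatory field means every SVM kernel stays runnable, so a GPU failure costs speed but not correctness.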
Results
• Quantitative:
  – Successfully executes simple programs.
  – Still tuning to reduce overhead to the level of BrookGPU.
• Qualitative:
  – GPUs
    • The no-scatter constraint is the most serious.
    • The no-local-storage constraint is the next worst.
  – R-Stream
    • Needs to recognize the basic GPU constraints to be automatic.
    • We can work around this in source code for experiments.
  – SVM
    • C is tough to translate; the HLC’s analyses are lost.
    • Feedback is necessary.
Result: SAXPY Execution Time
[Figure: SAXPY execution time in milliseconds (axis 0 to 350) against log2(nelements) (axis 0 to 20), for SVM-GPU and Brook.]
GPU Kernel Constraints
• Fragment programs write outputs exactly once, in-order.
• Fragment programs have no local storage.
• R-Stream currently doesn’t recognize the constraints and will, e.g., fuse together GPU-friendly loops into one GPU-unfriendly loop.
• Workaround: mark loops separately.
    #pragma res parallel
    {
        for (i = 1; i < N; i++) {
            y[i] = x[i - 1] + x[i];
        }
        for (i = 1; i < N; i++) {
            z[i] = y[i - 1] + y[i];
        }
    }
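Why fusion hurts can be seen by writing both forms in plain C. The sketch below is an assumption about what a fused loop might look like, not R-Stream output; `two_loops` and `fused_loop` are hypothetical names, and the caller is assumed to have initialized `y[0]`.

```c
#include <assert.h>
#include <stddef.h>

/* Separate loops: each is a pure gather with a single output array, so
 * each could become its own fragment program. y is complete before the
 * second loop starts reading it. */
static void two_loops(float *z, float *y, const float *x, size_t n)
{
    assert(n >= 1);
    for (size_t i = 1; i < n; i++)
        y[i] = x[i - 1] + x[i];
    for (size_t i = 1; i < n; i++)
        z[i] = y[i - 1] + y[i];
}

/* Fused loop: same result on a CPU, but iteration i reads y[i - 1],
 * which iteration i - 1 has just produced (a loop-carried dependence),
 * and each iteration writes two outputs. */
static void fused_loop(float *z, float *y, const float *x, size_t n)
{
    assert(n >= 1);
    for (size_t i = 1; i < n; i++) {
        y[i] = x[i - 1] + x[i];
        z[i] = y[i - 1] + y[i];
    }
}
```

Both functions compute the same `z` when run sequentially, but only the first form maps onto fragment programs that have no local storage and no ordering between invocations.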
Feedback
• Feed-forward via the machine model is preferable
• Feedback is inevitable
  – Some constraints are impractical to model or to solve
  – Some constraints are unknown/proprietary
  – Conservative interpretation of constraints is sub-optimal
• Feedback makes the compilation process a search
• What kind of feedback is available when:
  – From the translator (arbitrary but imprecise)
  – From Cg (pass/fail, little else without vendor assist)
  – From trial execution of code (performance)
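One way to picture compilation as a search is a loop that proposes candidates and consumes the two cheapest feedback signals listed above: pass/fail from the low-level compiler and measured time from a trial run. The talk leaves the feedback mechanism undefined, so everything here is hypothetical: the callback types, `search_kernel_size`, and the toy stand-ins that replace a real compiler and GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feedback signals for one candidate kernel size. */
typedef int    (*accept_fn)(int kernel_size);  /* pass/fail, e.g. from Cg  */
typedef double (*trial_fn)(int kernel_size);   /* measured time in ms      */

/* Try every candidate; keep the fastest one the low-level compiler
 * accepts. Returns -1 if none is accepted, in which case the caller
 * would fall back to the CPU. */
static int search_kernel_size(const int *candidates, int n,
                              accept_fn accepts, trial_fn trial_ms)
{
    int best = -1;
    double best_ms = 0.0;
    assert(candidates != NULL && n > 0);
    for (int k = 0; k < n; k++) {
        int size = candidates[k];
        if (!accepts(size))
            continue;                    /* pass/fail feedback */
        double ms = trial_ms(size);      /* performance feedback */
        if (best < 0 || ms < best_ms) {
            best = size;
            best_ms = ms;
        }
    }
    return best;
}

/* Toy stand-ins so the sketch can be exercised without a GPU. */
static int toy_accepts(int size) { return size <= 64; }
static double toy_trial_ms(int size) { return 100.0 / size + 0.5; }
```

An exhaustive scan is the simplest possible search; a practical high-level compiler would presumably prune candidates using the machine model first and use feedback only for what the model cannot express.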
Summary and Future Work
• A Streaming Virtual Machine for GPUs
  – Machine model
  – Low-level compiler built via a translator to C + Cg
  – Runtime atop ATI/nVidia targets
• Work in progress:
  – Characterize feedback requirements and propose mechanisms
• Future work:
  – Supporting library code; optimization across libraries.
  – Exporting special hardware features via SVM.