GRAMPS: A Programming Model For Graphics Pipelines

Jeremy Sugerman, Kayvon Fatahalian, Solomon Boulos, Kurt Akeley, Pat Hanrahan


Post on 11-Jan-2016



TRANSCRIPT

Page 1: GRAMPS: A Programming Model For Graphics Pipelines

GRAMPS: A Programming Model For Graphics Pipelines

Jeremy Sugerman, Kayvon Fatahalian, Solomon Boulos, Kurt Akeley, Pat Hanrahan

Page 2: GRAMPS: A Programming Model For Graphics Pipelines


“The Graphics Pipeline”

Central to the rise of 3D hardware and software.

A stable and universal abstraction

Shaped the evolution of the field…

… while leaving enormous room for innovation.

[Diagram: pipeline stages Input Assembler, Vertex Shader, Rast, Pixel Shader, Output Merger]

Page 3: GRAMPS: A Programming Model For Graphics Pipelines

The Graphics Pipeline is evolving

[Diagram: successive pipeline generations labeled Fixed Function, Programmable Shading, Direct3D 10, and Direct3D 11. Stage labels shown: Input Assembler, Vertex Shader, Hull Shader, Tessellator, Domain Shader, Geometry Shader, Stream Output, Rast, Pixel Shader, Output Merger]

Page 4: GRAMPS: A Programming Model For Graphics Pipelines


“GPU” is evolving, too

Continued drive for algorithmic innovation and advanced rendering techniques

First-class programming models for compute:

– OpenCL, compute shaders, vendor-specific, …

New / different hardware implementations:

– E.g., Larrabee, CPU-GPU combinations / hybrids

– Even NVIDIA and AMD GPUs are very different

Page 5: GRAMPS: A Programming Model For Graphics Pipelines


From fixed to programmable (again)

Idea: Evolve the pipeline itself from preset configurations to a programmable entity

Page 6: GRAMPS: A Programming Model For Graphics Pipelines

GRAMPS

Programming model and run-time for parallel hardware

Graphs of stages and queues

GRAMPS handles scheduling, parallelism, data-flow

Example: Simple GRAMPS Graph

[Diagram legend: Shader, Thread, Fixed Function]
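The "graphs of stages and queues" idea can be sketched as a small data structure. This is only an illustration, not the GRAMPS API: the class `Graph`, its methods, and the stage names `Producer`, `Transform`, and `Sink` are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class StageKind(Enum):
    # The three stage kinds named in the slide's legend.
    SHADER = "shader"
    THREAD = "thread"
    FIXED = "fixed-function"


@dataclass
class Graph:
    """Hypothetical container: named stages joined by named queues."""
    stages: dict = field(default_factory=dict)  # stage name -> StageKind
    queues: dict = field(default_factory=dict)  # queue name -> (producer, consumer)

    def add_stage(self, name, kind):
        self.stages[name] = kind

    def add_queue(self, name, producer, consumer):
        # A queue may only connect stages that were declared first.
        for stage in (producer, consumer):
            if stage not in self.stages:
                raise KeyError(f"unknown stage: {stage}")
        self.queues[name] = (producer, consumer)

    def consumers_of(self, stage):
        """Stages fed directly by `stage`, in queue insertion order."""
        return [c for (p, c) in self.queues.values() if p == stage]


def simple_graph():
    # Assumed three-stage example; the stage names are made up.
    g = Graph()
    g.add_stage("Producer", StageKind.THREAD)
    g.add_stage("Transform", StageKind.SHADER)
    g.add_stage("Sink", StageKind.FIXED)
    g.add_queue("q0", "Producer", "Transform")
    g.add_queue("q1", "Transform", "Sink")
    return g
```

In this sketch the application only declares topology; scheduling and data movement would be left to the run-time, which matches the division of labor the slide describes.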

Page 7: GRAMPS: A Programming Model For Graphics Pipelines


The Graphics Pipeline becomes an app!

Structure/setup is (application) software

– Customized or completely novel renderers

Reuses current hardware: FIFOs, shader cores, rast, …

Analogous to the transition to programmable shading

– Proliferation of new use cases and parameters

– Not (unthinkably) radical

[Diagram: Input, Vertex, Rast, Pixel, Merge, Frame Buffer]

Page 8: GRAMPS: A Programming Model For Graphics Pipelines


Writing a GRAMPS application

Design the execution graph:

Design the stages:

– Shaders

– Threads (and Fixed Function stages)

Instantiate and launch.

[Diagram: Input, Vertex, Rast, Pixel, Merge, Frame Buffer]

Page 9: GRAMPS: A Programming Model For Graphics Pipelines

More Detail – Queues

Queues operate at a “packet” granularity

– “Large bundles of coherent work”

GRAMPS can optionally enforce ordering

– Required for some workloads, adds overhead

[Diagram: Input, Vertex, Rast, Pixel, Merge, Frame Buffer]
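One way to picture optional ordering at packet granularity is a ticket scheme: producers take tickets in order, may finish out of order, and an ordered queue holds packets back until every earlier ticket has been committed. The sketch below is an assumption made for illustration; `PacketQueue` and its methods are hypothetical names, not the GRAMPS interface.

```python
class PacketQueue:
    """Hypothetical queue that hands whole packets to its consumer."""

    def __init__(self, ordered=False):
        self.ordered = ordered
        self._next_ticket = 0    # ticket given to the next producer
        self._next_release = 0   # lowest ticket not yet released
        self._pending = {}       # ticket -> packet, committed but held back
        self._ready = []         # packets visible to the consumer

    def reserve(self):
        """Take the next ticket; tickets define the required order."""
        ticket = self._next_ticket
        self._next_ticket += 1
        return ticket

    def commit(self, ticket, packet):
        if not self.ordered:
            # Unordered: a packet becomes visible as soon as it is committed.
            self._ready.append(packet)
            return
        self._pending[ticket] = packet
        # Ordered: release the longest run of consecutive tickets available.
        while self._next_release in self._pending:
            self._ready.append(self._pending.pop(self._next_release))
            self._next_release += 1

    def drain(self):
        """Return and clear every packet currently visible to the consumer."""
        out, self._ready = self._ready, []
        return out
```

In this model the overhead of ordering shows up as packets parked in `_pending`: they occupy storage while waiting for an earlier producer to commit.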

Page 10: GRAMPS: A Programming Model For Graphics Pipelines

More Detail – Shaders

Shaders: Like pixel (or compute) shaders, stateless

– Automatic instancing, pre-reserve/post-commit

“Collection” packets: shared header and N elements

New: “Push” operation to coalesce variable outputs

[Diagram: Input, Vertex, Rast, Pixel, Merge, Frame Buffer; callout on Vertex]
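The "push" idea, where shader instances emit a variable number of outputs and the run-time packs them into collection packets, might look roughly like this. It is a sketch under stated assumptions: `PushCoalescer`, the packet capacity, and the tuple layout `(header, elements)` are invented for illustration.

```python
class PushCoalescer:
    """Hypothetical helper that packs pushed elements into collection packets."""

    def __init__(self, header, capacity):
        self.header = header      # shared header attached to every packet
        self.capacity = capacity  # N: maximum elements per packet
        self._partial = []        # elements waiting for a full packet
        self.packets = []         # emitted packets as (header, elements)

    def push(self, element):
        # Called zero or more times per shader instance.
        self._partial.append(element)
        if len(self._partial) == self.capacity:
            self._emit()

    def flush(self):
        # Emit a final, possibly short, packet once the producers are done.
        if self._partial:
            self._emit()

    def _emit(self):
        self.packets.append((self.header, self._partial))
        self._partial = []


def run_instances(outputs_per_instance, capacity):
    """Simulate instances that each push a different number of elements."""
    coalescer = PushCoalescer(header="hdr", capacity=capacity)
    for instance, count in enumerate(outputs_per_instance):
        for k in range(count):
            coalescer.push((instance, k))
    coalescer.flush()
    return coalescer.packets
```

For example, `run_instances([2, 0, 3, 1], capacity=4)` pushes six elements in total, so this sketch emits one full packet of four and one short packet of two, regardless of which instance produced them.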

Page 11: GRAMPS: A Programming Model For Graphics Pipelines

More Detail – Thread/Fixed Function

Threads: Like POSIX threads, stateful

– Explicit reserve/commit on queues

Fixed Function: Effectively non-programmable Threads

[Diagram: Input, Vertex, Rast, Pixel, Merge, Frame Buffer; callout on Rast]
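Explicit reserve/commit for a stateful thread stage can be sketched with a bounded queue: the stage reserves an input packet and output space, does its work, then commits both. Everything named here (`BoundedQueue`, `RunningSum`, the method names) is a hypothetical stand-in rather than the real GRAMPS calls.

```python
from collections import deque


class BoundedQueue:
    """Hypothetical fixed-capacity queue with explicit reserve/commit."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._reserved = 0          # output slots claimed but not yet filled
        self._committed = deque()   # packets visible to the consumer

    def reserve_output(self):
        """Claim space for one packet; False means the queue is full."""
        if self._reserved + len(self._committed) >= self.capacity:
            return False
        self._reserved += 1
        return True

    def commit_output(self, packet):
        if self._reserved == 0:
            raise RuntimeError("commit without a matching reserve")
        self._reserved -= 1
        self._committed.append(packet)

    def reserve_input(self):
        """Peek at the oldest packet without removing it, or None if empty."""
        return self._committed[0] if self._committed else None

    def commit_input(self):
        """Release the packet returned by the last reserve_input."""
        self._committed.popleft()


class RunningSum:
    """Stateful thread-like stage: keeps a total across packets."""

    def __init__(self):
        self.total = 0

    def step(self, inq, outq):
        packet = inq.reserve_input()
        if packet is None or not outq.reserve_output():
            return False            # nothing to read, or no room to write
        self.total += sum(packet)
        outq.commit_output([self.total])
        inq.commit_input()
        return True
```

The state kept in `total` is what separates this from a shader in the slide's sense: a stateless stage could be instanced automatically, whereas this one must see packets one at a time.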

Page 12: GRAMPS: A Programming Model For Graphics Pipelines


More Detail – Queue Sets

Queue sets enable binning-style algorithms

One logical queue with multiple lanes (or bins)

– One consumer at a time per lane

– Many lanes with data allow many parallel consumers
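A queue set used for screen-space binning could be modeled as below: one logical queue, several lanes, and at most one consumer holding a lane at any time. This is a sketch under assumptions; `QueueSet`, `bin_for`, and the tile parameters are hypothetical.

```python
from collections import deque


class QueueSet:
    """Hypothetical queue set: one logical queue split into lanes (bins)."""

    def __init__(self, num_lanes):
        self._lanes = [deque() for _ in range(num_lanes)]
        self._busy = [False] * num_lanes  # True while a consumer holds the lane

    def push(self, lane, packet):
        self._lanes[lane].append(packet)

    def acquire(self):
        """Claim the first idle lane that has data.

        Returns (lane, packet), or None when every lane with data is busy.
        """
        for lane, packets in enumerate(self._lanes):
            if packets and not self._busy[lane]:
                self._busy[lane] = True
                return lane, packets.popleft()
        return None

    def release(self, lane):
        """Called by a consumer once it has finished with its packet."""
        self._busy[lane] = False


def bin_for(x, y, tile, tiles_per_row):
    """Map a pixel coordinate to a lane index by screen tile."""
    return (y // tile) * tiles_per_row + (x // tile)
```

Because a busy lane is skipped even when it still holds packets, work within one bin is serialized while different bins proceed in parallel, which is the property the bullets describe.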

[Diagram: Input, Vertex, Rast, Pixel, Merge, Frame Buffer]

Page 13: GRAMPS: A Programming Model For Graphics Pipelines


Quick Comparison to “Streaming”

Streaming: “squeeze out every FLOP”

– Goals: throughput, bulk transfer, arithmetic intensity

– Intensive static analysis, program transformation

– Bound space, data access, execution time

GRAMPS: “interesting applications are irregular”

– Goals: throughput, dynamic, data-dependent code

– Aggregate work at run-time, heterogeneous hardware

– Streaming apps are GRAMPS apps

Page 14: GRAMPS: A Programming Model For Graphics Pipelines


Evaluation: Design Goals

Broad application scope: preferable to roll-your-own

Multi-platform: suits many hardware configurations

High performance: competitive with roll-your-own

Tunable: expert users can optimize their apps

Optimized Implementations: inform, and are informed by, hardware

Page 15: GRAMPS: A Programming Model For Graphics Pipelines

Broad Application Scope

Rasterization Pipeline (with ray tracing extension)

[Diagram: stage labels IA 1 … IA N, VS 1 … VS N, RO, Rast, PS, OM, Trace, PS2; data labels Vertex Buffers, Frame Buffer; one region marked Ray Tracing Extension]

Ray Tracing Graph

[Diagram: stage labels Tiler, Sampler, Camera, Intersect, Shadow Intersect, Shade, Blend; data label Frame Buffer]

[Legend: Thread, Shader, Fixed-Func; Queue, Stage Output, Push Output]

Page 16: GRAMPS: A Programming Model For Graphics Pipelines


Multi-Platform: Two (Simulated) Machines

CPU-Like: 8 Fat Cores, Rast

GPU-Like: 1 Fat Core, 4 Micro Cores, Rast, Sched

Page 17: GRAMPS: A Programming Model For Graphics Pipelines


High Performance – Metrics

Priority #1: Show scale-out parallelism

– Can GRAMPS exploit the application parallelism and fill the machine?

Priority #2: Show ‘reasonable’ bandwidth / storage requirements for queueing

– What is the worst case total footprint of all queues?

– A scheduling problem: trade-off with possible parallelism
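The two metrics can be stated as simple calculations over a run-time trace. The definitions below are one plausible reading, not the ones used in the evaluation: `utilization` assumes per-core busy-cycle counts, and `peak_footprint` assumes periodic samples of each queue's size in bytes.

```python
def utilization(busy_cycles_per_core, total_cycles):
    """Fraction of all core-cycles spent doing work (assumed definition)."""
    cores = len(busy_cycles_per_core)
    return sum(busy_cycles_per_core) / (cores * total_cycles)


def peak_footprint(samples):
    """Largest combined size of all queues seen in any one sample.

    samples: list of dicts mapping queue name -> bytes held at that instant.
    """
    return max(sum(sample.values()) for sample in samples)
```

Taking the sum per sample before the maximum matters: adding up each queue's individual peak would overstate the footprint when the queues do not fill at the same time.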

Page 18: GRAMPS: A Programming Model For Graphics Pipelines


High Performance – Scheduling

Very simple static prototype scheduler (both platforms):

Static stage priorities:

Limited pre-emption points

No dynamic weighting of current queue depths
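A static-priority scheduler of the kind listed above can be sketched in a few lines. The sketch assumes that each step consumes one input packet and produces one packet for the next stage, and that later stages are given higher priority; both are assumptions made for illustration, and `pick_stage` and `run` are hypothetical names.

```python
def pick_stage(priorities, queue_depths):
    """Return the runnable stage with the highest static priority, or None.

    priorities:   stage name -> int (larger runs first)
    queue_depths: stage name -> packets waiting on that stage's input
    Queue depth only decides whether a stage is runnable; it is never
    used to weight the choice.
    """
    runnable = [s for s in priorities if queue_depths.get(s, 0) > 0]
    if not runnable:
        return None
    return max(runnable, key=lambda s: priorities[s])


def run(priorities, queue_depths, downstream):
    """Execute one packet at a time until nothing is runnable.

    downstream: stage name -> next stage name, or None for the last stage.
    Returns the order in which stages ran.
    """
    depths = dict(queue_depths)
    order = []
    while True:
        stage = pick_stage(priorities, depths)
        if stage is None:
            return order
        depths[stage] -= 1                       # consume one input packet
        nxt = downstream.get(stage)
        if nxt is not None:
            depths[nxt] = depths.get(nxt, 0) + 1  # produce one output packet
        order.append(stage)
```

With priorities `{"A": 1, "B": 2, "C": 3}`, two packets waiting at `A`, and the chain A, B, C, this sketch runs each packet through the whole chain before starting the next, which keeps intermediate queues shallow at the cost of running fewer stages side by side.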

[Diagram: stages numbered 1 through 7 with static priorities ranging from (Lowest) to (Highest), plus the Frame Buffer]

Page 19: GRAMPS: A Programming Model For Graphics Pipelines


High Performance – Results

Three scenes x { Rasterization, Ray Tracer, Hybrid }

Parallelism is 95+% for all but rasterized fairy (~80%).

Queues are small: < 600KB CPU-like, < 1.5MB GPU-like

Order costs footprint

Page 20: GRAMPS: A Programming Model For Graphics Pipelines

Tunability – Understanding Performance

GRAMPSviz: [figure not preserved in transcript]

Also: raw counters, statistics, text log of run-time activity

Page 21: GRAMPS: A Programming Model For Graphics Pipelines

Tunability – Lessons Learned

Execution Graph topology / design:

Sizing critical queues:

[Diagram: ray tracing graph with stage labels Tiler, Sampler, Camera, Intersect, Shadow Intersect, Shade, Blend, and the Frame Buffer]

[Diagram: Sort-Middle and Sort-Last rasterization graphs; labels shown: Rast, PS, OM, PS + OM, Frame Buffer]

Page 22: GRAMPS: A Programming Model For Graphics Pipelines


Summary

After a long era of stability, the Graphics Pipeline is undergoing rapid change.

GRAMPS enables software-defined custom pipelines.

– The Graphics Pipeline becomes an app

– Prototypes show plausible performance, resource needs

– Handles heterogeneous parallelism well

– Applicable beyond rendering and beyond GPUs

Page 23: GRAMPS: A Programming Model For Graphics Pipelines


Thank You

Our funding agencies:

Stanford Pervasive Parallelism Lab

Department of the Army Research

Intel Corporation

Rambus Stanford Graduate Fellowship

Intel PhD Fellowship

NSF Graduate Research Fellowship

http://graphics.stanford.edu/papers/gramps-tog/