Many-Core Programming with GRAMPS Jeremy Sugerman Stanford University September 12, 2008


Page 1: Many-Core Programming with GRAMPS Jeremy Sugerman Stanford University September 12, 2008

Many-Core Programming with GRAMPS

Jeremy Sugerman, Stanford University

September 12, 2008

Page 2

Background, Outline

Stanford Graphics / Architecture Research

– Collaborators: Kayvon Fatahalian, Solomon Boulos, Kurt Akeley, Pat Hanrahan

To appear in ACM Transactions on Graphics

CPU, GPU trends… and collision? Two research areas:

– HW/SW Interface, Programming Model
– Future Graphics API

Page 3

Problem Statement

Drive efficient development and execution in many-/multi-core systems. Support homogeneous and heterogeneous cores. Inform future hardware.

Status Quo:
– GPU Pipeline (good for GL, otherwise hard)
– CPU (no guidance, fast is hard)

Page 4

GRAMPS

Software-defined graphs
– Producer-consumer, data-parallelism
– Initial focus on rendering

[Diagram] Rasterization Pipeline: Input → Rasterize → Shade → FB Blend, connected by Input, Fragment, and Output Fragment Queues.

[Diagram] Ray Tracing Graph: Camera → Intersect → Shade → FB Blend, connected by Ray, Ray Hit, and Fragment Queues.

Legend: Thread Stage, Shader Stage, Fixed-func Stage; Queue, Stage Output.
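A minimal sketch of what "software defined graphs" could look like as code. This is a toy graph builder with hypothetical names (Queue, Stage, Graph, add_stage, connect), not the actual GRAMPS API; it wires up the ray tracing graph from this slide.

```python
# Illustrative only: hypothetical builder, not real GRAMPS code.
class Queue:
    def __init__(self, name):
        self.name, self.items = name, []

class Stage:
    def __init__(self, name, kind):  # kind: "thread", "shader", "fixed"
        self.name, self.kind = name, kind
        self.inputs, self.outputs = [], []

class Graph:
    def __init__(self):
        self.stages, self.queues = [], []
    def add_stage(self, name, kind):
        s = Stage(name, kind)
        self.stages.append(s)
        return s
    def connect(self, src, dst, queue_name):
        # A queue is the only link between two stages (producer-consumer).
        q = Queue(queue_name)
        src.outputs.append(q)
        dst.inputs.append(q)
        self.queues.append(q)
        return q

g = Graph()
camera    = g.add_stage("Camera",    "thread")
intersect = g.add_stage("Intersect", "shader")
shade     = g.add_stage("Shade",     "shader")
blend     = g.add_stage("FB Blend",  "thread")
g.connect(camera, intersect, "RayQueue")
g.connect(intersect, shade,  "RayHitQueue")
g.connect(shade, blend,      "FragmentQueue")
```

The application author declares only stages and the queues between them; scheduling the stages onto cores is left to the runtime.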

Page 5

As a Graphics Evolution

Not (too) radical for 'graphics'. Like the fixed → programmable shading transition.
– Pipeline undergoing massive shake-up
– Diversity of new parameters and use cases

Bigger picture than 'graphics':
– Rendering is more than GL/D3D
– Compute is more than rendering
– Some 'GPUs' are losing their innate pipeline

Page 6

As a Compute Evolution (1)

Sounds like streaming: execution graphs, kernels, data-parallelism.

Streaming: "squeeze out every FLOP"
– Goals: bulk transfer, arithmetic intensity
– Intensive static analysis, custom chips (mostly)
– Bounded space, data access, execution time

Page 7

As a Compute Evolution (2)

GRAMPS: "interesting apps are irregular"
– Goals: dynamic, data-dependent code
– Aggregate work at run-time
– Heterogeneous commodity platforms

Naturally allows streaming when applicable

Page 8

GRAMPS’ Role

A ‘graphics pipeline’ is now an app! GRAMPS models parallel state machines.

Compared to the status quo:
– More flexible than a GPU pipeline
– More guidance than bare metal
– Portability in between
– Not domain specific

Page 9

GRAMPS Interfaces

Host/Setup: Create execution graph

Thread: Stateful, singleton

Shader: Data-parallel, auto-instanced
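A toy illustration (hypothetical, not GRAMPS code) of the difference between the two programmable stage types: a Thread stage runs as one stateful, singleton instance over its whole input, while a Shader stage is conceptually auto-instanced once per input element and its instances cannot share mutable state.

```python
def run_thread_stage(state, inputs):
    # Stateful singleton: a single sequential loop that may carry
    # state from one input item to the next.
    out = []
    for x in inputs:
        state["count"] += 1
        out.append(x + state["count"])
    return out

def run_shader_stage(kernel, inputs):
    # Data-parallel: the runtime conceptually launches one independent
    # instance of the kernel per element, so order cannot matter.
    return [kernel(x) for x in inputs]

state = {"count": 0}
threaded = run_thread_stage(state, [10, 10, 10])   # [11, 12, 13]
shaded   = run_shader_stage(lambda x: x * 2, [1, 2, 3])  # [2, 4, 6]
```

The statefulness of the Thread stage is exactly what forces it to be a singleton; the statelessness of the Shader kernel is what lets the runtime instance it freely.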

Page 10

GRAMPS Entities (1)

Accessed via windows.

Queues: connect stages, dynamically sized
– Ordered or unordered
– Fixed max capacity or spill to memory

Buffers: random access, pre-allocated
– RO, RW Private, RW Shared (not supported)
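A toy sketch (hypothetical names, not GRAMPS code) of the two queue capacity policies this slide mentions: a fixed-capacity queue that rejects pushes when full, so the scheduler must run consumers to drain it, versus one that spills overflow to a slower in-memory backing store and never blocks the producer.

```python
from collections import deque

class FixedQueue:
    def __init__(self, capacity):
        self.capacity, self.items = capacity, deque()
    def push(self, item):
        if len(self.items) >= self.capacity:
            return False        # full: producer must yield to consumers
        self.items.append(item)
        return True

class SpillQueue:
    def __init__(self, capacity):
        self.capacity, self.items, self.spill = capacity, deque(), deque()
    def push(self, item):
        # Overflow goes to the spill area instead of failing.
        (self.items if len(self.items) < self.capacity
         else self.spill).append(item)
        return True

fq, sq = FixedQueue(2), SpillQueue(2)
results = [fq.push(i) for i in range(3)]   # third push fails
for i in range(3):
    sq.push(i)                             # third item lands in spill
```

Bounding queue capacity bounds footprint and gives the scheduler backpressure; spilling trades that bound for producer progress.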

Page 11

GRAMPS Entities (2)

Queue Sets: independent sub-queues
– Instanced parallelism plus mutual exclusion
– Hard to fake with just multiple queues
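A sketch of the queue-set idea (hypothetical code, not the GRAMPS API): one logical queue split into N independent sub-queues, where items with the same key always land in the same sub-queue. One consumer instance per sub-queue then gets instanced parallelism with built-in mutual exclusion per key, e.g. all fragments for one screen tile.

```python
class QueueSet:
    def __init__(self, num_subqueues):
        self.sub = [[] for _ in range(num_subqueues)]
    def push(self, key, item):
        # Same key -> same sub-queue, so a single consumer per
        # sub-queue never races on that key's data.
        self.sub[hash(key) % len(self.sub)].append(item)

qs = QueueSet(4)
for tile, frag in [(0, "a"), (1, "b"), (0, "c")]:
    qs.push(tile, frag)
# Fragments "a" and "c" (both tile 0) share one sub-queue, so the single
# consumer of that sub-queue can blend them without any locking.
```

Faking this with plain queues would require either one consumer total (no parallelism) or explicit locks around each tile (no built-in exclusion), which is the point of the "hard to fake" bullet.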

Page 12

What We’ve Built (System)

Page 13

GRAMPS Scheduler

Tiered scheduler:
– ‘Fat’ cores: per-thread, per-core schedulers
– ‘Micro’ cores: shared hardware scheduler
– Top level: tier N
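A toy model of the tiered split (an assumed structure for illustration, not the real GRAMPS scheduler): each 'fat' core drains its own per-core run queue of Thread-stage work, while all 'micro' cores pull Shader instances from a single shared dispatcher, mirroring a shared hardware scheduler.

```python
from collections import deque

# One private run queue per fat core (per-core scheduling).
fat_queues = [deque(["CameraStep"]), deque(["BlendStep"])]
# One shared pool of shader instances for all micro cores.
shared_shader_work = deque(["Shade#0", "Shade#1", "Shade#2"])

done = []
# Fat cores: each core touches only its own queue.
for q in fat_queues:
    while q:
        done.append(q.popleft())
# Micro cores: all cores pull from the single shared queue until empty.
while shared_shader_work:
    done.append(shared_shader_work.popleft())
```

The private queues keep stateful Thread stages pinned to one core; the shared queue load-balances the many small, interchangeable shader instances.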

Page 14

What We’ve Built (Apps)

[Diagram] Direct3D Pipeline (with ray-tracing extension): IA 1 … IA N → VS 1 … VS N → RO → Rast → PS → OM, connected by Input Vertex Queues 1…N, Primitive Queues 1…N, a Sample Queue Set, and a Fragment Queue. The ray-tracing extension adds Trace and PS2 stages with a Ray Queue and a Ray Hit Queue.

[Diagram] Ray-tracing Graph: Camera, Sampler, Tiler, Intersect, Shade, and FB Blend stages, connected by Sample, Tile, Ray, Ray Hit, and Fragment Queues.

Legend: Thread Stage, Shader Stage, Fixed-func; Queue, Stage Output, Push Output.

Page 15

Initial Results

Queues are small, utilization is good.

Page 16

GRAMPS Visualization

Page 17

GRAMPS Visualization

Page 18

GRAMPS Portability

Portability really means performance.

Less portable than GL/D3D:
– GRAMPS graph is (more) hardware sensitive

More portable than bare metal:
– Enforces modularity
– Best case, just works
– Worst case, saves boilerplate

Page 19

High-level Challenges

Is GRAMPS a suitable GPU evolution?
– Enable pipelines competitive with bare metal?
– Enable innovation: advanced / alternative methods?

Is GRAMPS a good parallel compute model?
– Map well to hardware and hardware trends?
– Support important apps?
– Concepts influence developers?

Page 20

What’s Next: Implementation

Better scheduling:
– Less bursty, better slot filling
– Dynamic priorities
– Handle graphs with loops better

More detailed costs:
– Bill for scheduling decisions
– Bill for (internal) synchronization

More statistics

Page 21

What’s Next: Programming Model

Yes: Graph modification (state change)

Probably: Data sharing / ref-counting

Maybe: Blocking inter-stage calls (join)
Maybe: Intra-/inter-stage synchronization primitives

Page 22

What’s Next: Possible Workloads

REYES, hybrid graphics pipelines. Image / video processing.

Game physics:
– Collision detection or particles

Physics and scientific simulation. AI, finance, sort, search or database query, …

Heavy dynamic data manipulation:
– k-D tree / octree / BVH build
– Lazy / adaptive / procedural tree or geometry