SGRT: A Scalable Mobile GPU Architecture based on Ray Tracing

Won-Jong Lee†, Shi-Hwa Lee†, Jae-Ho Nah*, Jin-Woo Kim*, Youngsam Shin†, Jaedon Lee†, Seok-Yoon Jung†

SAIT, SAMSUNG Electronics†, Yonsei Univ.*, Korea

Talks, ACM SIGGRAPH 2012

Outline

• Introduction

• SGRT Core Architecture

– T&I Engine: H/W Accelerator

– SRP : Programmable DSP

– SMK : Parallelization Framework

• Experimental Results

• Conclusion

Introduction

Graphics Trends

[Figure: timeline from '10 to '20 of graphics realism ("Reality" axis) for two tracks. PC/Console: 3D Game ('04), Realistic 3D Game ('10), Immersive AR/MR. Mobile/CE: Smart Phone ('09), Smart TV ('10), Realistic 3D Game on Mobile/CE, Immersive AR/MR on Mobile/CE.]

• Graphics is becoming more important as smart devices proliferate

• Graphics is evolving toward greater realism

• Mobile graphics follows PC graphics with a lag of about 5~6 years

Mobile SoC

[Figure: Apple A5X die photo. Image courtesy: Chipworks]

Current Mobile GPU for Ray Tracing

• Inadequate Performance

– Flagship mobile GPU: ~256 GFLOPS (ARM Mali T658)

– Real-time ray tracing @HD: >300 Mrays/sec (1~2 TFLOPS)

• Unsuitable Execution Model

– “Multithreaded SIMD” is a poor fit for processing incoherent rays

• Weak Branch Support

– Performance drops with recursion, function calls, and control flow
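
As a rough check on the ray budget quoted on this slide, the required ray rate is pixels x frame rate x rays per pixel. The sketch below assumes a 1280x720 HD frame, 30 frames per second, and 11 rays per pixel (primary plus shadow and secondary rays). All three numbers and the function name are illustrative assumptions, not figures from the talk.

```python
def required_mrays_per_sec(width, height, fps, rays_per_pixel):
    """Ray throughput, in millions of rays per second, needed to render
    width x height pixels at fps frames per second."""
    return width * height * fps * rays_per_pixel / 1e6

# Assumed workload (not from the talk): 720p HD, 30 fps, 11 rays per pixel.
budget = required_mrays_per_sec(1280, 720, 30, 11)
print(f"{budget:.1f} Mrays/sec")
```

Under these assumptions the budget lands just above 300 Mrays/sec; a different resolution or ray count per pixel shifts it proportionally.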

Need a New Architecture?

• Dedicated, Fixed-Function H/W

– Good performance and power efficiency, but limited flexibility

– RPU [Woop, SIGGRAPH 2005]

• Fully Programmable Processor

– Flexible, but inadequate performance and power consumption

– Reconfigurable stream processor [Kim, CICC 2012]: 1~2 Mrays/sec

– MIMD threaded processor [Spjut, SHAW-3 2012]: ~30 Mrays/sec

Requirements

• Performance for Real-Time Rendering

– 200~300 Mrays/sec

• Reasonable Flexibility

– Programmable shading and ray generation

– Support for various BVHs: SAH, binned, SBVH, LBVH, ...

– Easy to extend to GI (path tracing, photon mapping, ...)

– Easy to combine a rasterizer (OpenGL ES) with ray tracing

• Low Power & Cost

SGRT

Our Approach

• Combination of CPU, H/W and DSP (Mobile SoC)

– Tree Build: sorting, irregular work → Multi-core CPU (with multi-level cache)

– Refit, Traversal, Intersection: embarrassingly parallel → Dedicated H/W

– Ray Gen. & Shading: need for flexibility → Programmable DSP

[Figure: three blocks, Multi-core CPU (Tree Build), Dedicated H/W (Traversal & Intersection), and Programmable DSP (Ray Gen. & Shading), with two Memory blocks.]

System Architecture

• SGRT (Samsung reconfigurable GPU based on Ray Tracing)

– T&I Engine: fast, compact H/W to accelerate traversal & intersection

– SRP: Samsung Reconfigurable Processor to support flexible shading

– SMK: parallelization framework

[Figure: system block diagram. Components shown: SGRT Cores #1 to #4; within a core, a T&I Engine (four Traversal Units and one Intersection Unit, each with an L1 cache, plus an L2 cache) and an SRP (VLIW engine, coarse-grained reconfigurable array, internal SRAM, I-Cache, C-Mem, and a Texture Unit with an L1 cache); a Refitting Unit; an AXI system bus; external DRAM; and a multi-core ARM host (Cores #1 to #4) with host DRAM on the host system bus.]

T&I Engine : A MIMD H/W Accelerator

• Newly designed H/W accelerator based on our previous work, a kd-tree H/W engine [Nah, SIGGRAPH ASIA 2011]

– Single-ray-based MIMD architecture: efficient processing of incoherent rays

– Ray Accumulation Unit (RAU): hardware multithreading

• Optimized restart & short stack algorithm

– Adaptive restart trail [Lee, HPG 2012]

• Early Intersection Test

– Reduces the number of expensive ray-primitive intersection (IST) tests

[Figure: T&I Engine block diagram (MIMD arch.). A Ray Dispatcher passes rays to several Traversal Units and an Intersection Unit; each unit has an L1 cache, a RAU, and a pipeline, and the Traversal Units also have a stack. An L2 cache sits behind the L1 caches, and hit info is returned.]

Early (Two-Pass) Intersection Test

[Figure: example BVH with inner nodes, leaf nodes, primitive AABBs, and primitives T0 to T7, comparing two pipelines that run across the Traversal Unit and the Intersection Unit. Conventional IST: ray-node AABB test, then ray-primitive test. Early IST: ray-node AABB test, then ray-primitive AABB test, then ray-primitive test.]
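
The two-pass idea can be sketched in software: test the ray against each primitive's own bounding box first, and run the expensive ray-primitive test only when that cheap test passes. The sketch below is a minimal illustration, not the hardware design. It assumes triangles as primitives, a standard slab test for the AABB, and the Möller-Trumbore test for the triangle; all function names are hypothetical.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_aabb_hit(orig, direction, box_min, box_max):
    """Cheap first pass: slab test of a ray (t >= 0) against an AABB."""
    t_near, t_far = 0.0, float("inf")
    for a in range(3):
        if direction[a] == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if not (box_min[a] <= orig[a] <= box_max[a]):
                return False
            continue
        t0 = (box_min[a] - orig[a]) / direction[a]
        t1 = (box_max[a] - orig[a]) / direction[a]
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def ray_triangle_t(orig, direction, tri, eps=1e-9):
    """Expensive second pass: Moller-Trumbore. Returns hit distance or None."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None
    inv = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None

def tri_aabb(tri):
    """Bounding box of one triangle (assumed to be precomputed in practice)."""
    return (tuple(min(v[a] for v in tri) for a in range(3)),
            tuple(max(v[a] for v in tri) for a in range(3)))

def intersect_leaf(orig, direction, tris, early=True):
    """Returns (closest hit distance or None, number of full triangle tests)."""
    best, full_tests = None, 0
    for tri in tris:
        if early and not ray_aabb_hit(orig, direction, *tri_aabb(tri)):
            continue  # rejected by the primitive AABB, skip the full test
        full_tests += 1
        t = ray_triangle_t(orig, direction, tri)
        if t is not None and (best is None or t < best):
            best = t
    return best, full_tests

leaf = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)),
        ((5.0, 5.0, 0.0), (6.0, 5.0, 0.0), (5.0, 6.0, 0.0))]
ray_o, ray_d = (0.25, 0.25, -1.0), (0.0, 0.0, 1.0)
print(intersect_leaf(ray_o, ray_d, leaf, early=True))
print(intersect_leaf(ray_o, ray_d, leaf, early=False))
```

Both calls find the same closest hit; with the early test enabled, the far-away triangle is rejected by its bounding box and only one full triangle test runs instead of two.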

Ray Accumulation Unit

• Specialized H/W multi-threading for latency hiding [Nah, 2011]

– Rays that miss the cache are accumulated in the RA buffer; other rays can be processed in the meantime

– Coherence can be increased: rays that reference the same cache line are accumulated in the same row of the RA buffer

– Experimental results show up to a 3x performance gain

[Figure: RAU block diagram. Labels: rays, input buffer, control buffer, non-blocking cache with cache hit and cache miss paths, RA buffer rows (cache address, cache data, occupation counter), and the traversal or intersection pipeline receiving ray + data.]
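
The accumulation behaviour described above can be modelled in a few lines of software. This is a toy model under stated assumptions, not the hardware: the class name, the row and row-capacity parameters, and the stall-on-full policy are all hypothetical. It keeps one row per outstanding cache line and releases every ray in a row together when that line arrives.

```python
class RayAccumulationBuffer:
    """Toy software model of an RA buffer: one row per outstanding cache line."""

    def __init__(self, rows, rays_per_row):
        self.rows = rows                  # assumed number of rows
        self.rays_per_row = rays_per_row  # assumed capacity of each row
        self.buffer = {}                  # cache-line address -> waiting rays

    def on_miss(self, line_addr, ray):
        """Park a ray whose data missed the cache.
        Returns False when the ray cannot be accepted and must stall."""
        row = self.buffer.get(line_addr)
        if row is None:
            if len(self.buffer) == self.rows:
                return False              # no free row left
            row = self.buffer[line_addr] = []
        if len(row) == self.rays_per_row:
            return False                  # row is full
        row.append(ray)
        return True

    def on_fill(self, line_addr):
        """The cache line arrived: release all rays waiting on it as one group."""
        return self.buffer.pop(line_addr, [])

rau = RayAccumulationBuffer(rows=4, rays_per_row=8)
rau.on_miss(0x40, "ray0")
rau.on_miss(0x80, "ray1")
rau.on_miss(0x40, "ray2")   # same cache line as ray0, so same row
print(rau.on_fill(0x40))
```

Rays that wait on the same line leave the buffer together, which is the coherence gain the slide refers to; rays that hit the cache never enter the buffer and keep the pipeline busy in the meantime.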

Samsung Reconfigurable Processor

• A flexible architecture template [Lee, HPG 2011/2012]

• The ISA covers arithmetic, special-function, and texture instructions.

• The VLIW engine is useful for general-purpose computation (function invocation, control flow).

• The CGRA makes full use of software pipelining to accelerate loops.

[Figure: SRP block diagram. A grid of function units (FU), each with a local register file (RF), plus a row of FUs sharing a central register file, with instruction and data paths. Execution alternates between VLIW mode (control processing) and CGA mode (data processing of for-loops).]

Packet Stream Tracing on SRP

• Remove recursion → Job-Q based streamed iteration

• Rays are classified according to the type of operation → CGA kernel

• A packet of rays is batched

• Each kernel is mapped onto the CGA and loop-accelerated

– Achieves a high IPC, up to the maximum number of FUs in the array

[Figure: flow from the intersection result through CGA kernels and VLIW code: classify hit rays and update colors; compute normal vectors; classify secondary rays & texture; generate secondary rays; compute texture color; compute N·L and classify shading; shade and generate shadow rays. Outputs: reflection rays, refraction rays, shadow rays.]
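
Replacing recursion with a job queue can be sketched as follows. This is a minimal illustration under stated assumptions, not the SRP code: the scene is a stand-in in which every ray hits a surface that emits 1.0 and reflects half of the light, and `MAX_DEPTH`, `PACKET_SIZE`, and all function names are hypothetical. Secondary rays are not traced by a recursive call; they are batched into a new packet and appended to the queue, so each kernel stays a flat loop of the kind that can be software-pipelined.

```python
from collections import deque

MAX_DEPTH = 2      # assumed bounce limit
PACKET_SIZE = 4    # assumed number of rays per packet

def intersect(ray):
    """Stand-in for the T&I Engine: every ray hits a surface that emits 1.0
    and reflects half of the incoming light."""
    return {"emitted": 1.0, "reflectance": 0.5}

def radiance_recursive(depth=0):
    """Reference: the classic recursive formulation."""
    hit = intersect(None)
    color = hit["emitted"]
    if depth < MAX_DEPTH:
        color += hit["reflectance"] * radiance_recursive(depth + 1)
    return color

def shade_kernel(packet, color):
    """One flat loop over a packet: update colors, then emit secondary rays."""
    secondary = []
    for pixel, depth, weight in packet:
        hit = intersect((pixel, depth, weight))
        color[pixel] += weight * hit["emitted"]
        if depth < MAX_DEPTH:
            secondary.append((pixel, depth + 1, weight * hit["reflectance"]))
    return secondary

def render_streamed(num_pixels):
    """Iterate over a job queue of (ray type, packet) instead of recursing."""
    color = [0.0] * num_pixels
    jobs = deque()
    primary = [(p, 0, 1.0) for p in range(num_pixels)]
    for i in range(0, num_pixels, PACKET_SIZE):
        jobs.append(("primary", primary[i:i + PACKET_SIZE]))
    while jobs:
        kind, packet = jobs.popleft()  # kind could select a per-type kernel
        secondary = shade_kernel(packet, color)
        if secondary:
            jobs.append(("reflection", secondary))
    return color

print(render_streamed(6))
print(radiance_recursive())
```

The streamed loop produces the same per-pixel value as the recursive reference because each ray carries its accumulated weight with it, which is what lets the recursion be flattened.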

Parallelization Framework

• Parallel ray tracing with a multi-tasking system

– Uses an embedded RTOS, SMK (Samsung Multi-Platform Kernel) [Shin, SAC 2011]

– Supports multi-tasking through systematic scheduling of task queues

• The task on each SGRT core is responsible for different pixels (or pixel tiles)

– The scheduler distributes the next task to an idle SGRT core first (dynamic load balancing)

[Figure: three SGRT cores, each a T&I Engine plus an SRP running SMK.]
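
The effect of handing the next tile to whichever core goes idle first can be illustrated with a small simulation. This is a sketch under stated assumptions: the tile costs are made up, the function names are hypothetical, and SMK's actual scheduling policy is not described in the talk beyond the bullet above.

```python
import heapq

def schedule_tiles(tile_costs, num_cores):
    """Dynamic load balancing: the core that becomes idle first takes the next
    tile from the shared queue. Returns (finish time, tiles per core)."""
    idle_at = [(0.0, core) for core in range(num_cores)]  # (idle time, core id)
    heapq.heapify(idle_at)
    assigned = [[] for _ in range(num_cores)]
    for tile, cost in enumerate(tile_costs):
        t, core = heapq.heappop(idle_at)      # earliest-idle core
        assigned[core].append(tile)
        heapq.heappush(idle_at, (t + cost, core))
    return max(t for t, _ in idle_at), assigned

def static_finish_time(tile_costs, num_cores):
    """Baseline for comparison: tile i is fixed to core i % num_cores."""
    loads = [0.0] * num_cores
    for tile, cost in enumerate(tile_costs):
        loads[tile % num_cores] += cost
    return max(loads)

# Assumed per-tile costs (arbitrary time units); tile 0 is much heavier.
costs = [8, 1, 1, 1, 1, 1, 1, 2]
finish, per_core = schedule_tiles(costs, num_cores=4)
print(finish, per_core)
print(static_finish_time(costs, num_cores=4))
```

With these assumed costs the dynamic schedule finishes earlier than the fixed assignment, because the core stuck on the heavy tile is never handed more work while other cores are idle.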

Evaluation

Simulation Environment

• Built a cycle-accurate simulator (T&I Engine) and an in-house cycle-accurate compiled simulator called csim (SRP)

• Test conditions with two benchmark scenes

– Full SAH, cost ratio 5:1 (TRV:IST) for a shallow tree

– Ferrari scene (210K triangles, 1 light source)

– Fairy scene (170K triangles, 2 light sources)

– Shadow, reflection, refraction @WVGA (800x640)

Preliminary Results

• Architecture configuration

– 4 SGRT cores, traversal units : intersection units = 4:1 per SGRT core

– 1 GHz core clock

• Achieved around 170 MRPS (T&I) and 255 MRPS (RGS) for Fairy

– Recent GPU ray tracer: 156~317 MRPS on NVIDIA Kepler [Aila, HPG 2012]

| Scene   | # of tri. | # of rays | T&I pipeline usage | T&I TRV $ hit ratio | T&I IST $ hit ratio | T&I MRPS | SRP MRPS | Simulated FPS |
|---------|-----------|-----------|--------------------|---------------------|---------------------|----------|----------|---------------|
| Fairy   | 170K      | 1.7M      | 87.27              | 93.83               | 96.53               | 171.32   | 255.72   | 87.82         |
| Ferrari | 210K      | 1.5M      | 79.75              | 92.56               | 92.92               | 122.48   | 319.56   | 67.83         |

FPGA

• Currently, we are also testing the SGRT on an FPGA board

Conclusion

• SGRT: a novel mobile GPU based on ray tracing

– The first mobile GPU to realize real-time ray tracing

• Carefully designed to suit the mobile SoC environment

• Currently implementing the T&I Engine at the RTL level

• Future work

– Analyze cost and power consumption

– Support dynamic scenes with a fast BVH build algorithm optimized for the mobile environment

– Higher-level shading/ecosystem

• Poster (#103) session: 8/7, 8/8, 12:15-13:15

Acknowledgements

• This project is a collaboration with two universities (Yonsei, National Kongju). The authors thank two professors (Tack-Don Han, Hyun-Sang Park) for their valuable advice.

• Thanks