Bart Goossens, Hiep Luong, Jan Aelterman and Wilfried Philips QUASAR A High-level Programming Language and Development Environment for Designing Smart Vision Systems on Embedded Platforms

Post on 21-Jun-2020

TRANSCRIPT

Page 1: DATE Conference Template by Jano Gebelein · bgoossen/pub/GoossensDATE2018_slides.pdf

Bart Goossens, Hiep Luong, Jan Aelterman and Wilfried Philips

QUASAR: A High-level Programming Language and Development Environment for Designing Smart Vision Systems on Embedded Platforms

Page 2

The benefits of GPUs: a growing application domain

• Exploits massive parallelism, e.g. to process large amounts of data in parallel
• Speed-ups of 10× to 100×
• Energy efficient (calculations per watt)
• Applied in many domains: multimedia, finance, big data, scientific computing, …
• Commodity HW: standard in desktops and laptops; also available on embedded platforms

See: https://blogs.nvidia.com/blog/2017/05/24/ai-revolution-eating-software

[Figure: NVIDIA, "1000X by 2025"]

Page 3

The drawbacks of GPU programming, solved by Quasar

• Low-level coding experts needed
• Strong coupling between algorithm development and implementation
• Long development lead times
• Each HW platform requires new optimizations

Page 4

Quasar: overview

[Diagram: Algorithm (Quasar code), Code analysis, Compilation, Runtime]

• Algorithm (Quasar code): programming language; compact code, easy to learn, easy to use; high level of abstraction; hardware-agnostic programming
• Code analysis: kernel generation; loop parallelization; target-specific optimization; developer feedback
• Compilation: OpenMP & SIMD; OpenCL; CUDA
• Runtime: memory management; load balancing; scheduling; kernel parameter optimization

Target devices: CPU, multi-core CPU, many-core accelerator, GPU, embedded device

Other labels in the diagram: hardware characteristics, kernel characteristics, data characteristics, data

Page 5

Quasar – High-level scripting language

• Same abstraction level as Python and Matlab
• Matrix slices!
• Matrix multiplication
• Heterogeneous compilation: code is parallelized and optimized for both CPU and GPU!
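As a rough illustration of what a matrix slice and a matrix multiplication compute, here is a minimal sketch in plain Python. It is not Quasar syntax, and the helper names `column` and `matmul` are hypothetical, chosen only for this example. In an array language each helper would be a single expression, and the loops are the part a compiler would be free to parallelize.

```python
# Plain-Python sketch (not Quasar syntax) of what a column slice and a
# matrix product compute. The helper names are hypothetical.

def column(A, j):
    """Return column j of a row-major matrix, like the slice A[:, j]."""
    return [row[j] for row in A]

def matmul(A, B):
    """Return the matrix product of A and B using the textbook triple loop."""
    n, k, m = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions do not match")
    # Every output element C[i][j] is independent of the others, which is
    # what makes this operation a good fit for parallel hardware.
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
col = column(A, 1)   # second column of A
C = matmul(A, B)     # 2x2 product
```

Here `col` is the second column of `A` and `C` is the 2x2 product of `A` and `B`.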

Page 6

Quasar – High-level to low-level translation

• High-level array programming:
  • manipulating vectors and matrices
  • concise expressions (easy to write, easy to understand)
  • without special compiler support, this comes at a performance cost (poor data locality)!
• Quasar code analysis:
  • Automatic parallelization
  • Aggregation operations (sum, mean, prod, min, max) → parallel reduction
  • Cumulative operations (cumsum, cummin, ...) → parallel prefix sum
  • Vector slices, matrix slices, vector arithmetic → automatic kernel generation
  • [Figure: slices A[:,1] and B[:,:,1]; sum(B[:,:,1])]
• Flexible algorithmic development with optimized CPU/GPU code
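The two parallel patterns named on this slide can be sketched in plain Python. The slides do not say which algorithms the Quasar compiler actually emits, so this is only a textbook illustration: a pairwise (tree) reduction and a Hillis-Steele inclusive scan, simulated sequentially. The function names `tree_reduce` and `inclusive_scan` are hypothetical.

```python
# Sequential simulation of two data-parallel patterns (textbook versions,
# not necessarily what Quasar generates). On a GPU, each element of the
# list comprehensions below would be computed by its own thread.
import operator

def tree_reduce(values, op=operator.add):
    """Pairwise (tree) reduction: about log2(n) rounds of independent ops."""
    data = list(values)
    if not data:
        raise ValueError("cannot reduce an empty sequence")
    while len(data) > 1:
        # Pairs do not depend on each other, so one round is fully parallel.
        paired = [op(data[i], data[i + 1]) for i in range(0, len(data) - 1, 2)]
        if len(data) % 2:
            paired.append(data[-1])  # odd element carries over to next round
        data = paired
    return data[0]

def inclusive_scan(values, op=operator.add):
    """Hillis-Steele inclusive scan: the stride doubles every round."""
    data = list(values)
    stride = 1
    while stride < len(data):
        # The new list is built only from the previous round's values,
        # which mimics double buffering on a GPU.
        data = [data[i] if i < stride else op(data[i - stride], data[i])
                for i in range(len(data))]
        stride *= 2
    return data

xs = [3, 1, 4, 1, 5, 9, 2, 6]
total = tree_reduce(xs)                # same result as sum(xs)
smallest = tree_reduce(xs, min)        # same result as min(xs)
running = inclusive_scan(xs)           # cumulative sum
running_min = inclusive_scan(xs, min)  # cumulative minimum
```

Both routines take roughly log2(n) rounds instead of n sequential steps, which is why aggregation and cumulative operations map well onto many-core hardware.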

Page 7

Quasar – High-level to low-level translation

• Compiler intermediate results can be visualized

Page 8

Quasar – Redshift IDE

• Code editing, debugging, variable inspection, immediate evaluation

Page 9

Quasar – Redshift IDE

• Profiler: analyze CPU and GPU activity (bottleneck kernels, timeline)

Page 10

Quasar – test case results

Test case: a parallel MRI reconstruction algorithm, implemented by an experienced CUDA programmer and by a Quasar novice

[Chart: work days (axis 0 to 70), CUDA programmer vs. Quasar novice]

[Chart: execution time in ms (axis 0 to 4), CUDA programmer vs. Quasar novice; annotation: "4% faster"]

Future proof: program once, run everywhere

Page 11

Quasar – benchmark results

[Chart: execution time (s), CUDA vs. Quasar, axis 0 to 6, for three benchmarks: filter with 32 taps (global memory); 2D spatial filter, 32x32 separable (global memory); wavelet filter (global memory)]

[Chart: lines of code, CUDA vs. Quasar, axis 0 to 400, for the same three benchmarks]

Page 12

Quasar – demo

Video link: https://www.youtube.com/watch?v=IEwrMHXgyoU

Page 13

Conclusion

• GPUs: performance revolution
• High-level code gives the compiler more flexibility for code generation and optimization
  • Targeting heterogeneous systems (GPUs)
• Quasar: new programming paradigm for heterogeneous systems (CPU, GPU, ...)
  • Low barrier to entry (Matlab-like)
  • Reduces development time
  • Future proof!
• Redshift IDE: powerful code editing, debugging, profiling!
• Available for tryout: http://gepura.io/quasar/try-quasar/

Page 14

• Thank you for your attention!

• Questions?

Try out Quasar: http://gepura.io/quasar/try-quasar/