high-productivity supercomputing: metaprogramming gpus · 2020. 8. 21. · scripting: python for...

180

Upload: others

Post on 12-Oct-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

High-Productivity Supercomputing:

Metaprogramming GPUs

Andreas Klöckner

Applied Mathematics, Brown University

January 28, 2009

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 2: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Thanks

Jan Hesthaven (Brown)

Tim Warburton (Rice)

Nico Gödel (HSU Hamburg)

Lucas Wilcox (UT Austin)

Akil Narayan (Brown)

PyCuda contributors

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 3: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Outline

1 Scripting Languages

2 Scripting CUDA

3 Metaprogramming CUDA

4 Discontinuous Galerkin on CUDA

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 4: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Outline

1 Scripting LanguagesScripting: what and why?

2 Scripting CUDA

3 Metaprogramming CUDA

4 Discontinuous Galerkin on CUDA

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 5: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Goals

Scripting languages aim to reduce the load on the programmer:

Reduce required knowledge

Encourage experimentation

Eliminate sources of error

Encourage abstraction wherever possible

Value programmer time over computer time

Think about the tools you use.Use the right tool for the job.

How are these goals achieved?

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 6: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Goals

Scripting languages aim to reduce the load on the programmer:

Reduce required knowledge

Encourage experimentation

Eliminate sources of error

Encourage abstraction wherever possible

Value programmer time over computer time

Think about the tools you use.Use the right tool for the job.

How are these goals achieved?

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 7: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Goals

Scripting languages aim to reduce the load on the programmer:

Reduce required knowledge

Encourage experimentation

Eliminate sources of error

Encourage abstraction wherever possible

Value programmer time over computer time

Think about the tools you use.Use the right tool for the job.

How are these goals achieved?

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 8: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Goals

Scripting languages aim to reduce the load on the programmer:

Reduce required knowledge

Encourage experimentation

Eliminate sources of error

Encourage abstraction wherever possible

Value programmer time over computer time

Think about the tools you use.Use the right tool for the job.

How are these goals achieved?

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 9: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Goals

Scripting languages aim to reduce the load on the programmer:

Reduce required knowledge

Encourage experimentation

Eliminate sources of error

Encourage abstraction wherever possible

Value programmer time over computer time

Think about the tools you use.Use the right tool for the job.

How are these goals achieved?

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 10: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Goals

Scripting languages aim to reduce the load on the programmer:

Reduce required knowledge

Encourage experimentation

Eliminate sources of error

Encourage abstraction wherever possible

Value programmer time over computer time

Think about the tools you use.Use the right tool for the job.

How are these goals achieved?

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 11: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Goals

Scripting languages aim to reduce the load on the programmer:

Reduce required knowledge

Encourage experimentation

Eliminate sources of error

Encourage abstraction wherever possible

Value programmer time over computer time

Think about the tools you use.Use the right tool for the job.

How are these goals achieved?

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 12: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Means

A scripting language. . .

is discoverable and interactive.

is interpreted, not compiled.

has comprehensive built-in functionality.

manages resources automatically.

is dynamically typed.

works well for gluing lower-level blocks together.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 13: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Means

A scripting language. . .

is discoverable and interactive.

is interpreted, not compiled.

has comprehensive built-in functionality.

manages resources automatically.

is dynamically typed.

works well for gluing lower-level blocks together.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 14: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Means

A scripting language. . .

is discoverable and interactive.

is interpreted, not compiled.

has comprehensive built-in functionality.

manages resources automatically.

is dynamically typed.

works well for gluing lower-level blocks together.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 15: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Means

A scripting language. . .

is discoverable and interactive.

is interpreted, not compiled.

has comprehensive built-in functionality.

manages resources automatically.

is dynamically typed.

works well for gluing lower-level blocks together.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 16: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Means

A scripting language. . .

is discoverable and interactive.

is interpreted, not compiled.

has comprehensive built-in functionality.

manages resources automatically.

is dynamically typed.

works well for gluing lower-level blocks together.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 17: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Means

A scripting language. . .

is discoverable and interactive.

is interpreted, not compiled.

has comprehensive built-in functionality.

manages resources automatically.

is dynamically typed.

works well for gluing lower-level blocks together.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 18: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Interpreted, not Compiled

Program creation workow:

Edit

Compile

Link

Run

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 19: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Interpreted, not Compiled

Program creation workow:

Edit

Compile

Link

Run

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 20: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Interpreted, not Compiled

Program creation workow:

Edit

Compile

Link

Run

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 21: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Batteries Included

Scripting languages come with batteries included(or easily available):

Data structures: Lists, Sets, Dictionaries

Linear algebra: Vectors, Matrices

OS Interface: Files, Networks, Databases

Persistence: Store, send and retrieve objects

Dened, usable C interface

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 22: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Run-Time Typing

Typing Discipline

If it walks like a duck and quacks like a duck, it is a duck.

def print_all ( iterable ):for i in iterable :

print i

print_all ([6, 7, 19])print_all (1: "a",2: "b",3: "c")

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 23: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Python

For this talk, Python is the scripting language of choice.

Mature language

Has a large and active community

Emphasizes readability

Written in widely-portable C

A `multi-paradigm' language

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 24: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Python

For this talk, Python is the scripting language of choice.

Mature language

Has a large and active community

Emphasizes readability

Written in widely-portable C

A `multi-paradigm' language

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 25: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Python

For this talk, Python is the scripting language of choice.

Mature language

Has a large and active community

Emphasizes readability

Written in widely-portable C

A `multi-paradigm' language

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 26: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Python

For this talk, Python is the scripting language of choice.

Mature language

Has a large and active community

Emphasizes readability

Written in widely-portable C

A `multi-paradigm' language

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 27: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Python

For this talk, Python is the scripting language of choice.

Mature language

Has a large and active community

Emphasizes readability

Written in widely-portable C

A `multi-paradigm' language

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 28: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Python

For this talk, Python is the scripting language of choice.

Mature language

Has a large and active community

Emphasizes readability

Written in widely-portable C

A `multi-paradigm' language

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 29: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Speed

Speed(C) Speed(Python)

For most code, it does notmatter.

It does matter for inner loops.

One solution: hybrid (glued)code.

Python + CUDA hybrids? PyCuda!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 30: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Speed

Speed(C) Speed(Python)

For most code, it does notmatter.

It does matter for inner loops.

One solution: hybrid (glued)code.

Python + CUDA hybrids? PyCuda!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 31: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Speed

Speed(C) Speed(Python)

For most code, it does notmatter.

It does matter for inner loops.

One solution: hybrid (glued)code.

Python + CUDA hybrids? PyCuda!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 32: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Speed

Speed(C) Speed(Python)

For most code, it does notmatter.

It does matter for inner loops.

One solution: hybrid (glued)code.

Python + CUDA hybrids? PyCuda!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 33: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Speed

Speed(C) Speed(Python)

For most code, it does notmatter.

It does matter for inner loops.

One solution: hybrid (glued)code.

Python + CUDA hybrids?

PyCuda!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 34: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Scripting: Speed

Speed(C) Speed(Python)

For most code, it does notmatter.

It does matter for inner loops.

One solution: hybrid (glued)code.

Python + CUDA hybrids? PyCuda!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 35: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Scripting: what and why?

Questions?

?

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 36: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Whetting your Appetite

Outline

1 Scripting Languages

2 Scripting CUDAWhetting your AppetiteWorking with PyCudaA peek under the hood

3 Metaprogramming CUDA

4 Discontinuous Galerkin on CUDA

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 37: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Whetting your Appetite

Whetting your appetite

1 import pycuda.driver as cuda2 import pycuda.autoinit3 import numpy45 a = numpy.random.randn(4,4).astype(numpy.oat32)6 a_gpu = cuda.mem_alloc(a.size ∗ a.dtype.itemsize)7 cuda.memcpy_htod(a_gpu, a)

[This is examples/demo.py in the PyCuda distribution.]

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 38: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Whetting your Appetite

Whetting your appetite

9 mod = cuda.SourceModule("""10 __global__ void doublify(oat ∗a)11 12 int idx = threadIdx.x + threadIdx.y∗4;13 a[ idx ] ∗= 2;14 15 """)1617 func = mod.get_function("doublify")18 func(a_gpu, block=(4,4,1))1920 a_doubled = numpy.empty_like(a)21 cuda.memcpy_dtoh(a_doubled, a_gpu)22 print a_doubled23 print a

Compute kernel

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 39: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Whetting your Appetite

Whetting your appetite

9 mod = cuda.SourceModule("""10 __global__ void doublify(oat ∗a)11 12 int idx = threadIdx.x + threadIdx.y∗4;13 a[ idx ] ∗= 2;14 15 """)1617 func = mod.get_function("doublify")18 func(a_gpu, block=(4,4,1))1920 a_doubled = numpy.empty_like(a)21 cuda.memcpy_dtoh(a_doubled, a_gpu)22 print a_doubled23 print a

Compute kernel

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 40: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Whetting your Appetite

Whetting your appetite, Part II

Did somebody say Abstraction is good?

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 41: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Whetting your Appetite

Whetting your appetite, Part II

1 import numpy2 import pycuda.autoinit3 import pycuda.gpuarray as gpuarray45 a_gpu = gpuarray.to_gpu(6 numpy.random.randn(4,4).astype(numpy.oat32))7 a_doubled = (2∗a_gpu).get()8 print a_doubled9 print a_gpu

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 42: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Outline

1 Scripting Languages

2 Scripting CUDAWhetting your AppetiteWorking with PyCudaA peek under the hood

3 Metaprogramming CUDA

4 Discontinuous Galerkin on CUDA

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 43: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda Philosophy

Provide complete access

Automatically manage resources

Provide abstractions

Allow interactive use

Check for and report errorsautomatically

Integrate tightly with numpy

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 44: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda Philosophy

Provide complete access

Automatically manage resources

Provide abstractions

Allow interactive use

Check for and report errorsautomatically

Integrate tightly with numpy

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 45: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda Philosophy

Provide complete access

Automatically manage resources

Provide abstractions

Allow interactive use

Check for and report errorsautomatically

Integrate tightly with numpy

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 46: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda Philosophy

Provide complete access

Automatically manage resources

Provide abstractions

Allow interactive use

Check for and report errorsautomatically

Integrate tightly with numpy

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 47: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda Philosophy

Provide complete access

Automatically manage resources

Provide abstractions

Allow interactive use

Check for and report errorsautomatically

Integrate tightly with numpy

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 48: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda Philosophy

Provide complete access

Automatically manage resources

Provide abstractions

Allow interactive use

Check for and report errorsautomatically

Integrate tightly with numpy

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 49: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda: Completeness

PyCuda exposes all of CUDA.

For example:

Arrays and Textures

Pagelocked host memory

Memory transfers (asynchronous, structured)

Streams and Events

Device queries

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 50: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda: Completeness

PyCuda exposes all of CUDA.

For example:

Arrays and Textures

Pagelocked host memory

Memory transfers (asynchronous, structured)

Streams and Events

Device queries

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 51: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda: Completeness

PyCuda supports every OS that CUDA supports.

Linux

Windows

OS X

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 52: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda: Completeness

PyCuda supports every OS that CUDA supports.

Linux

Windows

OS X

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 53: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda: Documentation

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 54: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda: Workow

Edit

PyCuda

Run

SourceModule("...")

Cache

nvcc .cubin

Upload to GPU

Run on GPU

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 55: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda: Workow

Edit

PyCuda

Run

SourceModule("...")

Cache

nvcc .cubin

Upload to GPU

Run on GPU

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 56: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda: Workow

Edit

PyCuda

Run

SourceModule("...")

Cache

nvcc .cubin

Upload to GPU

Run on GPU

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 57: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda: Workow

Edit

PyCuda

Run

SourceModule("...")

Cache?

nvcc .cubin

Upload to GPU

Run on GPU

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 58: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda: Workow

Edit

PyCuda

Run

SourceModule("...")

Cache?

nvcc

.cubin

Upload to GPU

Run on GPU

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 59: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda: Workow

Edit

PyCuda

Run

SourceModule("...")

Cache?

nvcc .cubin

Upload to GPU

Run on GPU

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 60: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda: Workow

Edit

PyCuda

Run

SourceModule("...")

Cache!

nvcc .cubin

Upload to GPU

Run on GPU

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 61: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda: Workow

Edit

PyCuda

Run

SourceModule("...")

Cache!

nvcc .cubin

Upload to GPU

Run on GPU

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 62: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda: Workow

Edit

PyCuda

Run

SourceModule("...")

Cache!

nvcc .cubin

Upload to GPU

Run on GPU

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 63: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Kernel Invocation

mod = pycuda.driver.SourceModule("__global__ my_func(int x, oat ∗y)...")

func = mod.get_function("my_func")mem = pycuda.driver.mem_alloc(20000)

Two ways:

Immediate:

func(numpy.int32(17), mem, block=(tx,ty,tz ), grid=(bx,by))

Prepared:

func. prepare("iP", block=(tx, ty , tz)) # see: pydoc struct

func. prepared_call ((bx,by), 17, mem)

Fast, Safe

Convenient :-)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 64: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Kernel Invocation

mod = pycuda.driver.SourceModule("__global__ my_func(int x, oat ∗y)...")

func = mod.get_function("my_func")mem = pycuda.driver.mem_alloc(20000)

Two ways:

Immediate:

func(numpy.int32(17), mem, block=(tx,ty,tz ), grid=(bx,by))

Prepared:

func. prepare("iP", block=(tx, ty , tz)) # see: pydoc struct

func. prepared_call ((bx,by), 17, mem)

Fast, Safe

Convenient :-)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 65: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Kernel Invocation

mod = pycuda.driver.SourceModule("__global__ my_func(int x, oat ∗y)...")

func = mod.get_function("my_func")mem = pycuda.driver.mem_alloc(20000)

Two ways:

Immediate:

func(numpy.int32(17), mem, block=(tx,ty,tz ), grid=(bx,by))

Prepared:

func. prepare("iP", block=(tx, ty , tz)) # see: pydoc struct

func. prepared_call ((bx,by), 17, mem)

Fast, Safe

Convenient :-)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 66: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Kernel Invocation

mod = pycuda.driver.SourceModule("__global__ my_func(int x, oat ∗y)...")

func = mod.get_function("my_func")mem = pycuda.driver.mem_alloc(20000)

Two ways:

Immediate:

func(numpy.int32(17), mem, block=(tx,ty,tz ), grid=(bx,by))

Prepared:

func. prepare("iP", block=(tx, ty , tz)) # see: pydoc struct

func. prepared_call ((bx,by), 17, mem)

Fast, Safe

Convenient :-)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 67: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Kernel Invocation

mod = pycuda.driver.SourceModule("__global__ my_func(int x, oat ∗y)...")

func = mod.get_function("my_func")mem = pycuda.driver.mem_alloc(20000)

Two ways:

Immediate:

func(numpy.int32(17), mem, block=(tx,ty,tz ), grid=(bx,by))

Prepared:

func. prepare("iP", block=(tx, ty , tz)) # see: pydoc struct

func. prepared_call ((bx,by), 17, mem)

Fast, Safe

Convenient :-)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 68: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Kernel Invocation

mod = pycuda.driver.SourceModule("__global__ my_func(int x, oat ∗y)...")

func = mod.get_function("my_func")mem = pycuda.driver.mem_alloc(20000)

Two ways:

Immediate:

func(numpy.int32(17), mem, block=(tx,ty,tz ), grid=(bx,by))

Prepared:

func. prepare("iP", block=(tx, ty , tz)) # see: pydoc struct

func. prepared_call ((bx,by), 17, mem)

Fast, Safe

Convenient :-)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 69: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Kernel Invocation: Automatic Copies

mod = pycuda.driver.SourceModule("__global__ my_func(oat ∗out, oat ∗in)...")

func = mod.get_function("my_func")

src = numpy.random.randn(400).astype(numpy.oat32)dest = numpy.empty_like(src)

my_func(cuda.Out(dest),cuda.In( src ),block=(400,1,1))

InOut exists, too.

Only for immediate invocation style.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 70: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Kernel Invocation: Automatic Copies

mod = pycuda.driver.SourceModule("__global__ my_func(oat ∗out, oat ∗in)...")

func = mod.get_function("my_func")

src = numpy.random.randn(400).astype(numpy.oat32)dest = numpy.empty_like(src)

my_func(cuda.Out(dest),cuda.In( src ),block=(400,1,1))

InOut exists, too.

Only for immediate invocation style.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 71: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Kernel Invocation: Automatic Copies

mod = pycuda.driver.SourceModule("__global__ my_func(oat ∗out, oat ∗in)...")

func = mod.get_function("my_func")

src = numpy.random.randn(400).astype(numpy.oat32)dest = numpy.empty_like(src)

my_func(cuda.Out(dest),cuda.In( src ),block=(400,1,1))

InOut exists, too.

Only for immediate invocation style.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 72: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Kernel Invocation: Automatic Copies

mod = pycuda.driver.SourceModule("__global__ my_func(oat ∗out, oat ∗in)...")

func = mod.get_function("my_func")

src = numpy.random.randn(400).astype(numpy.oat32)dest = numpy.empty_like(src)

my_func(cuda.Out(dest),cuda.In( src ),block=(400,1,1))

InOut exists, too.

Only for immediate invocation style.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 73: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Automatic Cleanup

Reachable objects (memory,streams, . . . ) are never destroyed.

Once unreachable, released at anunspecied future time.

Scarce resources (memory) can beexplicitly freed. (obj.free())(partially true now, in VC and nextrelease)

Correctly deals with multiplecontexts and dependencies.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 74: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Automatic Cleanup

Reachable objects (memory,streams, . . . ) are never destroyed.

Once unreachable, released at anunspecied future time.

Scarce resources (memory) can beexplicitly freed. (obj.free())(partially true now, in VC and nextrelease)

Correctly deals with multiplecontexts and dependencies.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 75: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Automatic Cleanup

Reachable objects (memory,streams, . . . ) are never destroyed.

Once unreachable, released at anunspecied future time.

Scarce resources (memory) can beexplicitly freed. (obj.free())(partially true now, in VC and nextrelease)

Correctly deals with multiplecontexts and dependencies.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 76: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Automatic Cleanup

Reachable objects (memory,streams, . . . ) are never destroyed.

Once unreachable, released at anunspecied future time.

Scarce resources (memory) can beexplicitly freed. (obj.free())(partially true now, in VC and nextrelease)

Correctly deals with multiplecontexts and dependencies.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 77: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)

mod = cuda.SourceModule("...")tr = mod.get_texref("my_tex")tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 78: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)

mod = cuda.SourceModule("...")tr = mod.get_texref("my_tex")tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 79: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)

mod = cuda.SourceModule("...")tr = mod.get_texref("my_tex")tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 80: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)mod = cuda.SourceModule("...")

tr = mod.get_texref("my_tex")tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 81: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)mod = cuda.SourceModule("...")

tr = mod.get_texref("my_tex")tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 82: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)mod = cuda.SourceModule("...")

tr = mod.get_texref("my_tex")tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 83: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)mod = cuda.SourceModule("...")

tr = mod.get_texref("my_tex")tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()

Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 84: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)mod = cuda.SourceModule("...")

tr = mod.get_texref("my_tex")tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()

Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 85: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)mod = cuda.SourceModule("...")tr = mod.get_texref("my_tex")

tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()

Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 86: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)mod = cuda.SourceModule("...")tr = mod.get_texref("my_tex")

tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()

Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 87: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)mod = cuda.SourceModule("...")tr = mod.get_texref("my_tex")tr.set_address(mem, size)

tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()

Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 88: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)mod = cuda.SourceModule("...")tr = mod.get_texref("my_tex")tr.set_address(mem, size)

tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()

Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 89: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)mod = cuda.SourceModule("...")tr = mod.get_texref("my_tex")tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()

Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 90: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)mod = cuda.SourceModule("...")tr = mod.get_texref("my_tex")tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")

f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()

Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 91: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)mod = cuda.SourceModule("...")tr = mod.get_texref("my_tex")tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")

f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()

Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 92: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)mod = cuda.SourceModule("...")tr = mod.get_texref("my_tex")tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()

Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 93: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)mod = cuda.SourceModule("...")tr = mod.get_texref("my_tex")tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()

Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 94: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)mod = cuda.SourceModule("...")tr = mod.get_texref("my_tex")tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()

Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 95: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

Working with Textures

mem = cuda.mem_alloc(size)mod = cuda.SourceModule("...")tr = mod.get_texref("my_tex")tr.set_address(mem, size)tr.set_format(...oat, 2)tr.set_ags(...)

f = mod.get_function("f")f.prepare(arg_types"",block=(bx,by,bz), texrefs=[tr])

f()

GPU Memory

SourceModule

texture<float2> my_tex

__global__ void f()Go!

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 96: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

gpuarray: Simple Linear Algebra

pycuda.gpuarray:

Meant to look and feel just like numpy.

gpuarray.to_gpu(numpy_array)

numpy_array = gpuarray.get()

No: indexing, slicing, etc. (yet)

Yes: +, -, ∗, /, ll, sin, exp, log, rand, . . .print gpuarray for debugging.

Memory behind gpuarray available as.gpudata attribute.

Use as kernel arguments, textures, etc.

Control concurrency through streams.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 97: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

gpuarray: Simple Linear Algebra

pycuda.gpuarray:

Meant to look and feel just like numpy.

gpuarray.to_gpu(numpy_array)

numpy_array = gpuarray.get()

No: indexing, slicing, etc. (yet)

Yes: +, -, ∗, /, ll, sin, exp, log, rand, . . .print gpuarray for debugging.

Memory behind gpuarray available as.gpudata attribute.

Use as kernel arguments, textures, etc.

Control concurrency through streams.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 98: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

gpuarray: Simple Linear Algebra

pycuda.gpuarray:

Meant to look and feel just like numpy.

gpuarray.to_gpu(numpy_array)

numpy_array = gpuarray.get()

No: indexing, slicing, etc. (yet)

Yes: +, -, ∗, /, ll, sin, exp, log, rand, . . .print gpuarray for debugging.

Memory behind gpuarray available as.gpudata attribute.

Use as kernel arguments, textures, etc.

Control concurrency through streams.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 99: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

gpuarray: Simple Linear Algebra

pycuda.gpuarray:

Meant to look and feel just like numpy.

gpuarray.to_gpu(numpy_array)

numpy_array = gpuarray.get()

No: indexing, slicing, etc. (yet)

Yes: +, -, ∗, /, ll, sin, exp, log, rand, . . .

print gpuarray for debugging.

Memory behind gpuarray available as.gpudata attribute.

Use as kernel arguments, textures, etc.

Control concurrency through streams.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 100: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

gpuarray: Simple Linear Algebra

pycuda.gpuarray:

Meant to look and feel just like numpy.

gpuarray.to_gpu(numpy_array)

numpy_array = gpuarray.get()

No: indexing, slicing, etc. (yet)

Yes: +, -, ∗, /, ll, sin, exp, log, rand, . . .print gpuarray for debugging.

Memory behind gpuarray available as.gpudata attribute.

Use as kernel arguments, textures, etc.

Control concurrency through streams.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 101: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

gpuarray: Simple Linear Algebra

pycuda.gpuarray:

Meant to look and feel just like numpy.

gpuarray.to_gpu(numpy_array)

numpy_array = gpuarray.get()

No: indexing, slicing, etc. (yet)

Yes: +, -, ∗, /, ll, sin, exp, log, rand, . . .print gpuarray for debugging.

Memory behind gpuarray available as.gpudata attribute.

Use as kernel arguments, textures, etc.

Control concurrency through streams.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 102: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

gpuarray: Simple Linear Algebra

pycuda.gpuarray:

Meant to look and feel just like numpy.

gpuarray.to_gpu(numpy_array)

numpy_array = gpuarray.get()

No: indexing, slicing, etc. (yet)

Yes: +, -, ∗, /, ll, sin, exp, log, rand, . . .print gpuarray for debugging.

Memory behind gpuarray available as.gpudata attribute.

Use as kernel arguments, textures, etc.

Control concurrency through streams.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 103: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Working with PyCuda

PyCuda: Vital Information

http://mathema.tician.de/software/

pycuda

X Consortium License(no warranty, free for all use)

Requires: numpy, Boost C++,Python 2.4+.

Support via mailing list.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 104: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

A peek under the hood

Outline

1 Scripting Languages

2 Scripting CUDAWhetting your AppetiteWorking with PyCudaA peek under the hood

3 Metaprogramming CUDA

4 Discontinuous Galerkin on CUDA

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 105: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

A peek under the hood

CUDA APIs

Hardware

Kernel Driver

Driver API

Runtime API PyCuda

C/C++ Python

CUDA has two ProgrammingInterfaces:

Runtime

high-level(libcudart.so, in thetoolkit)

Driver

low-level(libcuda.so, comes withGPU driver)

(mutually exclusive)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 106: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

A peek under the hood

CUDA APIs

Hardware

Kernel Driver

Driver API

Runtime API PyCuda

C/C++ Python CUDA has two ProgrammingInterfaces:

Runtime

high-level(libcudart.so, in thetoolkit)

Driver

low-level(libcuda.so, comes withGPU driver)

(mutually exclusive)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 107: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

A peek under the hood

CUDA APIs

Hardware

Kernel Driver

Driver API

Runtime API PyCuda

C/C++ Python CUDA has two ProgrammingInterfaces:

Runtime high-level

(libcudart.so, in thetoolkit)

Driver low-level

(libcuda.so, comes withGPU driver)

(mutually exclusive)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 108: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

A peek under the hood

CUDA APIs

Hardware

Kernel Driver

Driver API

Runtime API PyCuda

C/C++ Python CUDA has two ProgrammingInterfaces:

Runtime high-level(libcudart.so, in thetoolkit)

Driver low-level

(libcuda.so, comes withGPU driver)

(mutually exclusive)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 109: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

A peek under the hood

CUDA APIs

Hardware

Kernel Driver

Driver API

Runtime API PyCuda

C/C++ Python CUDA has two ProgrammingInterfaces:

Runtime high-level(libcudart.so, in thetoolkit)

Driver low-level(libcuda.so, comes withGPU driver)

(mutually exclusive)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 110: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

A peek under the hood

CUDA APIs

Hardware

Kernel Driver

Driver API

Runtime API PyCuda

C/C++ Python CUDA has two ProgrammingInterfaces:

Runtime high-level(libcudart.so, in thetoolkit)

Driver low-level(libcuda.so, comes withGPU driver)

(mutually exclusive)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 111: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

A peek under the hood

Runtime vs. Driver API

Runtime ↔ Driver dierences:

Explicit initialization.

Code objects (Modules) become programming languageobjects.

Texture handling requires slightly more work.

Only needs nvcc for compiling GPU code.

Driver API:

Conceptually cleaner

Less sugar-coating (provide in Python)

Not very dierent otherwise

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 112: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

A peek under the hood

Runtime vs. Driver API

Runtime ↔ Driver dierences:

Explicit initialization.

Code objects (Modules) become programming languageobjects.

Texture handling requires slightly more work.

Only needs nvcc for compiling GPU code.

Driver API:

Conceptually cleaner

Less sugar-coating (provide in Python)

Not very dierent otherwise

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 113: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

A peek under the hood

Runtime vs. Driver API

Runtime ↔ Driver dierences:

Explicit initialization.

Code objects (Modules) become programming languageobjects.

Texture handling requires slightly more work.

Only needs nvcc for compiling GPU code.

Driver API:

Conceptually cleaner

Less sugar-coating (provide in Python)

Not very dierent otherwise

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 114: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

A peek under the hood

Runtime vs. Driver API

Runtime ↔ Driver dierences:

Explicit initialization.

Code objects (Modules) become programming languageobjects.

Texture handling requires slightly more work.

Only needs nvcc for compiling GPU code.

Driver API:

Conceptually cleaner

Less sugar-coating (provide in Python)

Not very dierent otherwise

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 115: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

A peek under the hood

Runtime vs. Driver API

Runtime ↔ Driver dierences:

Explicit initialization.

Code objects (Modules) become programming languageobjects.

Texture handling requires slightly more work.

Only needs nvcc for compiling GPU code.

Driver API:

Conceptually cleaner

Less sugar-coating (provide in Python)

Not very dierent otherwise

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 116: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

A peek under the hood

PyCuda: API Tracing

With ./configure --cuda-trace=1:

import pycuda. driver as cudaimport pycuda. autoinitimport numpy

a = numpy.random.randn(4,4).astype(numpy.oat32)a_gpu = cuda.mem_alloc(a.size ∗ a.dtype.itemsize)cuda.memcpy_htod(a_gpu, a)

mod = cuda.SourceModule("""__global__ void doublify(oat ∗a)int idx = threadIdx.x + threadIdx.y∗4;a[ idx ] ∗= 2;

""")

func = mod.get_function("doublify")func(a_gpu, block=(4,4,1))

a_doubled = numpy.empty_like(a)cuda.memcpy_dtoh(a_doubled, a_gpu)print a_doubledprint a

cuInit

cuDeviceGetCount

cuDeviceGet

cuCtxCreate

cuMemAlloc

cuMemcpyHtoD

cuCtxGetDevice

cuDeviceComputeCapability

cuModuleLoadData

cuModuleGetFunction

cuFuncSetBlockShape

cuParamSetv

cuParamSetSize

cuLaunchGrid

cuMemcpyDtoH

cuCtxPopCurrent

cuCtxPushCurrent

cuMemFree

cuCtxPopCurrent

cuCtxPushCurrent

cuModuleUnload

cuCtxPopCurrent

cuCtxDestroy

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 117: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

A peek under the hood

PyCuda: API Tracing

With ./configure --cuda-trace=1:

import pycuda. driver as cudaimport pycuda. autoinitimport numpy

a = numpy.random.randn(4,4).astype(numpy.oat32)a_gpu = cuda.mem_alloc(a.size ∗ a.dtype.itemsize)cuda.memcpy_htod(a_gpu, a)

mod = cuda.SourceModule("""__global__ void doublify(oat ∗a)int idx = threadIdx.x + threadIdx.y∗4;a[ idx ] ∗= 2;

""")

func = mod.get_function("doublify")func(a_gpu, block=(4,4,1))

a_doubled = numpy.empty_like(a)cuda.memcpy_dtoh(a_doubled, a_gpu)print a_doubledprint a

cuInit

cuDeviceGetCount

cuDeviceGet

cuCtxCreate

cuMemAlloc

cuMemcpyHtoD

cuCtxGetDevice

cuDeviceComputeCapability

cuModuleLoadData

cuModuleGetFunction

cuFuncSetBlockShape

cuParamSetv

cuParamSetSize

cuLaunchGrid

cuMemcpyDtoH

cuCtxPopCurrent

cuCtxPushCurrent

cuMemFree

cuCtxPopCurrent

cuCtxPushCurrent

cuModuleUnload

cuCtxPopCurrent

cuCtxDestroy

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 118: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

A peek under the hood

Questions?

?

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 119: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Outline

1 Scripting Languages

2 Scripting CUDA

3 Metaprogramming CUDAPrograms that write Programs

4 Discontinuous Galerkin on CUDA

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 120: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Metaprogramming

Idea

Python Code

CUDA C Code

nvcc

.cubin

GPU

Result

Machine

HumanEasy to write

In PyCuda,

CUDA C codedoes not need tobe a compile-time

constant.

(unlike the CUDA Runtime API)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 121: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Metaprogramming

Idea

Python Code

CUDA C Code

nvcc

.cubin

GPU

Result

Machine

HumanEasy to write

In PyCuda,

CUDA C codedoes not need tobe a compile-time

constant.

(unlike the CUDA Runtime API)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 122: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Metaprogramming

Idea

Python Code

CUDA C Code

nvcc

.cubin

GPU

Result

Machine

HumanEasy to write

In PyCuda,

CUDA C codedoes not need tobe a compile-time

constant.

(unlike the CUDA Runtime API)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 123: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Metaprogramming

Idea

Python Code

CUDA C Code

nvcc

.cubin

GPU

Result

Machine

HumanEasy to write

In PyCuda,

CUDA C codedoes not need tobe a compile-time

constant.

(unlike the CUDA Runtime API)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 124: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Metaprogramming

Idea

Python Code

CUDA C Code

nvcc

.cubin

GPU

Result

Machine

HumanEasy to write

In PyCuda,

CUDA C codedoes not need tobe a compile-time

constant.

(unlike the CUDA Runtime API)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 125: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Metaprogramming

Idea

Python Code

CUDA C Code

nvcc

.cubin

GPU

Result

Machine

HumanEasy to write

In PyCuda,

CUDA C codedoes not need tobe a compile-time

constant.

(unlike the CUDA Runtime API)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 126: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Metaprogramming

Idea

Python Code

CUDA C Code

nvcc

.cubin

GPU

Result

Machine

HumanEasy to write

In PyCuda,

CUDA C codedoes not need tobe a compile-time

constant.

(unlike the CUDA Runtime API)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 127: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Metaprogramming

Idea

Python Code

CUDA C Code

nvcc

.cubin

GPU

Result

Machine

HumanEasy to write

In PyCuda,

CUDA C codedoes not need tobe a compile-time

constant.

(unlike the CUDA Runtime API)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 128: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Metaprogramming

Idea

Python Code

CUDA C Code

nvcc

.cubin

GPU

Result

Machine

HumanEasy to write

In PyCuda,

CUDA C codedoes not need tobe a compile-time

constant.

(unlike the CUDA Runtime API)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 129: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Metaprogramming

Idea

Python Code

CUDA C Code

nvcc

.cubin

GPU

Result

Machine

HumanEasy to write

In PyCuda,

CUDA C codedoes not need tobe a compile-time

constant.

(unlike the CUDA Runtime API)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 130: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Metaprogramming

Idea

Python Code

CUDA C Code

nvcc

.cubin

GPU

Result

Machine

Human

Easy to write

In PyCuda,

CUDA C codedoes not need tobe a compile-time

constant.

(unlike the CUDA Runtime API)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 131: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Metaprogramming

Idea

Python Code

CUDA C Code

nvcc

.cubin

GPU

Result

Machine

Human

Easy to writeIn PyCuda,

CUDA C codedoes not need tobe a compile-time

constant.

(unlike the CUDA Runtime API)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 132: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Machine-generated Code

Why machine-generate code?

Automated Tuning(cf. ATLAS, FFTW)

Data types

Specialize code for given problem

Constants faster than variables(→ register pressure)

Loop Unrolling

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 133: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Machine-generated Code

Why machine-generate code?

Automated Tuning(cf. ATLAS, FFTW)

Data types

Specialize code for given problem

Constants faster than variables(→ register pressure)

Loop Unrolling

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 134: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Machine-generated Code

Why machine-generate code?

Automated Tuning(cf. ATLAS, FFTW)

Data types

Specialize code for given problem

Constants faster than variables(→ register pressure)

Loop Unrolling

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 135: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Machine-generated Code

Why machine-generate code?

Automated Tuning(cf. ATLAS, FFTW)

Data types

Specialize code for given problem

Constants faster than variables(→ register pressure)

Loop Unrolling

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 136: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Machine-generated Code

Why machine-generate code?

Automated Tuning(cf. ATLAS, FFTW)

Data types

Specialize code for given problem

Constants faster than variables(→ register pressure)

Loop Unrolling

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 137: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

PyCuda: Support for Metaprogramming

Access properties of compiled code:func.registers,lmem,smem

Exact GPU timing via events

Can calculate hardware-dependent MP occupancy

codepy:

Build C syntax trees from PythonGenerates readable, indented CAlso: CPU metaprogramming (so far Linux only)Unreleased (but in public VCask me)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 138: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

PyCuda: Support for Metaprogramming

Access properties of compiled code:func.registers,lmem,smem

Exact GPU timing via events

Can calculate hardware-dependent MP occupancy

codepy:

Build C syntax trees from PythonGenerates readable, indented CAlso: CPU metaprogramming (so far Linux only)Unreleased (but in public VCask me)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 139: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

PyCuda: Support for Metaprogramming

Access properties of compiled code:func.registers,lmem,smem

Exact GPU timing via events

Can calculate hardware-dependent MP occupancy

codepy:

Build C syntax trees from PythonGenerates readable, indented CAlso: CPU metaprogramming (so far Linux only)Unreleased (but in public VCask me)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 140: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

PyCuda: Support for Metaprogramming

Access properties of compiled code:func.registers,lmem,smem

Exact GPU timing via events

Can calculate hardware-dependent MP occupancy

codepy:

Build C syntax trees from PythonGenerates readable, indented CAlso: CPU metaprogramming (so far Linux only)Unreleased (but in public VCask me)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 141: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

PyCuda: Support for Metaprogramming

Access properties of compiled code:func.registers,lmem,smem

Exact GPU timing via events

Can calculate hardware-dependent MP occupancy

codepy:

Build C syntax trees from Python

Generates readable, indented CAlso: CPU metaprogramming (so far Linux only)Unreleased (but in public VCask me)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 142: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

PyCuda: Support for Metaprogramming

Access properties of compiled code:func.registers,lmem,smem

Exact GPU timing via events

Can calculate hardware-dependent MP occupancy

codepy:

Build C syntax trees from PythonGenerates readable, indented C

Also: CPU metaprogramming (so far Linux only)Unreleased (but in public VCask me)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 143: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

PyCuda: Support for Metaprogramming

Access properties of compiled code:func.registers,lmem,smem

Exact GPU timing via events

Can calculate hardware-dependent MP occupancy

codepy:

Build C syntax trees from PythonGenerates readable, indented CAlso: CPU metaprogramming (so far Linux only)

Unreleased (but in public VCask me)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 144: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

PyCuda: Support for Metaprogramming

Access properties of compiled code:func.registers,lmem,smem

Exact GPU timing via events

Can calculate hardware-dependent MP occupancy

codepy:

Build C syntax trees from PythonGenerates readable, indented CAlso: CPU metaprogramming (so far Linux only)Unreleased (but in public VCask me)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 145: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Programs that write Programs

Questions?

?

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 146: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

Outline

1 Scripting Languages

2 Scripting CUDA

3 Metaprogramming CUDA

4 Discontinuous Galerkin on CUDAIntroductionResultsConclusions

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 147: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

Discontinuous Galerkin Method

Let Ω :=⋃

iDk ⊂ Rd .

Goal

Solve a conservation law on Ω: ut +∇ · F (u) = 0

Example

Maxwell's Equations: EM eld: E (x , t), H(x , t) on Ω governed by

∂tE −1

ε∇× H = − j

ε, ∂tH +

1

µ∇× E = 0,

∇ · E =ρ

ε, ∇ · H = 0.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 148: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

Discontinuous Galerkin Method

Let Ω :=⋃

iDk ⊂ Rd .

Goal

Solve a conservation law on Ω: ut +∇ · F (u) = 0

Example

Maxwell's Equations: EM eld: E (x , t), H(x , t) on Ω governed by

∂tE −1

ε∇× H = − j

ε, ∂tH +

1

µ∇× E = 0,

∇ · E =ρ

ε, ∇ · H = 0.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 149: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

Discontinuous Galerkin Method

Let Ω :=⋃

iDk ⊂ Rd .

Goal

Solve a conservation law on Ω: ut +∇ · F (u) = 0

Example

Maxwell's Equations: EM eld: E (x , t), H(x , t) on Ω governed by

∂tE −1

ε∇× H = − j

ε, ∂tH +

1

µ∇× E = 0,

∇ · E =ρ

ε, ∇ · H = 0.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 150: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

Discontinuous Galerkin Method

Multiply by test function, integrate by parts:

0 =

ˆDk

utϕ+ [∇ · F (u)]ϕ dx

=

ˆDk

utϕ− F (u) · ∇ϕ dx +

ˆ∂Dk

(n · F )∗ϕ dSx ,

Integrate by parts again, subsitute in basis functions, introduceelementwise dierentiation and lifting matrices D, L:

∂tuk = −

∑ν

D∂ν ,k [F (uk)] + Lk [n · F − (n · F )∗]|A⊂∂Dk.

For straight-sided simplicial elements:Reduce D∂ν and L to reference matrices.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 151: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

Discontinuous Galerkin Method

Multiply by test function, integrate by parts:

0 =

ˆDk

utϕ+ [∇ · F (u)]ϕ dx

=

ˆDk

utϕ− F (u) · ∇ϕ dx +

ˆ∂Dk

(n · F )∗ϕ dSx ,

Integrate by parts again, subsitute in basis functions, introduceelementwise dierentiation and lifting matrices D, L:

∂tuk = −

∑ν

D∂ν ,k [F (uk)] + Lk [n · F − (n · F )∗]|A⊂∂Dk.

For straight-sided simplicial elements:Reduce D∂ν and L to reference matrices.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 152: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

Discontinuous Galerkin Method

Multiply by test function, integrate by parts:

0 =

ˆDk

utϕ+ [∇ · F (u)]ϕ dx

=

ˆDk

utϕ− F (u) · ∇ϕ dx +

ˆ∂Dk

(n · F )∗ϕ dSx ,

Integrate by parts again, subsitute in basis functions, introduceelementwise dierentiation and lifting matrices D, L:

∂tuk = −

∑ν

D∂ν ,k [F (uk)] + Lk [n · F − (n · F )∗]|A⊂∂Dk.

For straight-sided simplicial elements:Reduce D∂ν and L to reference matrices.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 153: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

Discontinuous Galerkin Method

Multiply by test function, integrate by parts:

0 =

ˆDk

utϕ+ [∇ · F (u)]ϕ dx

=

ˆDk

utϕ− F (u) · ∇ϕ dx +

ˆ∂Dk

(n · F )∗ϕ dSx ,

Integrate by parts again, subsitute in basis functions, introduceelementwise dierentiation and lifting matrices D, L:

∂tuk = −

∑ν

D∂ν ,k [F (uk)] + Lk [n · F − (n · F )∗]|A⊂∂Dk.

For straight-sided simplicial elements:Reduce D∂ν and L to reference matrices.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 154: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

Discontinuous Galerkin Method

Multiply by test function, integrate by parts:

0 =

ˆDk

utϕ+ [∇ · F (u)]ϕ dx

=

ˆDk

utϕ− F (u) · ∇ϕ dx +

ˆ∂Dk

(n · F )∗ϕ dSx ,

Integrate by parts again, subsitute in basis functions, introduceelementwise dierentiation and lifting matrices D, L:

∂tuk = −

∑ν

D∂ν ,k [F (uk)] + Lk [n · F − (n · F )∗]|A⊂∂Dk.

For straight-sided simplicial elements:Reduce D∂ν and L to reference matrices.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 155: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

Decomposition of a DG operator into Subtasks

DG's execution decomposes into two (mostly) separate branches:

uk

Flux Gather Flux Lifting

F (uk) Local Dierentiation

∂tuk

Green: Element-local parts of the DG operator.

Note: Explicit timestepping.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 156: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

DG: Properties

Flexible:

Variable order of accuracy

Unstructured discretizations

Usable for many types of equations

Implementation-friendly:

Good stability properties

Parallelizes well

Simple (compared to otherhigh-order unstructured methods)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 157: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

DG: Properties

Flexible:

Variable order of accuracy

Unstructured discretizations

Usable for many types of equations

Implementation-friendly:

Good stability properties

Parallelizes well

Simple (compared to otherhigh-order unstructured methods)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 158: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

DG: Properties

Flexible:

Variable order of accuracy

Unstructured discretizations

Usable for many types of equations

Implementation-friendly:

Good stability properties

Parallelizes well

Simple (compared to otherhigh-order unstructured methods)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 159: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

DG: Properties

Flexible:

Variable order of accuracy

Unstructured discretizations

Usable for many types of equations

Implementation-friendly:

Good stability properties

Parallelizes well

Simple (compared to otherhigh-order unstructured methods)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 160: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

DG: Properties

Flexible:

Variable order of accuracy

Unstructured discretizations

Usable for many types of equations

Implementation-friendly:

Good stability properties

Parallelizes well

Simple (compared to otherhigh-order unstructured methods)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 161: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

DG: Properties

Flexible:

Variable order of accuracy

Unstructured discretizations

Usable for many types of equations

Implementation-friendly:

Good stability properties

Parallelizes well

Simple (compared to otherhigh-order unstructured methods)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 162: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

DG: Properties

Flexible:

Variable order of accuracy

Unstructured discretizations

Usable for many types of equations

Implementation-friendly:

Good stability properties

Parallelizes well

Simple (compared to otherhigh-order unstructured methods)

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 163: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

Why do DG on Graphics Cards?

DG on GPUs: Why?

GPUs have deep Memory Hierarchy

The majority of DG is local.

Compute Bandwidth Memory Bandwidth

DG is arithmetically intense.

GPUs favor local workloads.

DG has very limited communication.

A match made in heaven?

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 164: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

Why do DG on Graphics Cards?

DG on GPUs: Why?

GPUs have deep Memory Hierarchy

The majority of DG is local.

Compute Bandwidth Memory Bandwidth

DG is arithmetically intense.

GPUs favor local workloads.

DG has very limited communication.

A match made in heaven?

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 165: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

Why do DG on Graphics Cards?

DG on GPUs: Why?

GPUs have deep Memory Hierarchy

The majority of DG is local.

Compute Bandwidth Memory Bandwidth

DG is arithmetically intense.

GPUs favor local workloads.

DG has very limited communication.

A match made in heaven?

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 166: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

Why do DG on Graphics Cards?

DG on GPUs: Why?

GPUs have deep Memory Hierarchy

The majority of DG is local.

Compute Bandwidth Memory Bandwidth

DG is arithmetically intense.

GPUs favor local workloads.

DG has very limited communication.

A match made in heaven?

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 167: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Introduction

Why do DG on Graphics Cards?

DG on GPUs: Why?

GPUs have deep Memory Hierarchy

The majority of DG is local.

Compute Bandwidth Memory Bandwidth

DG is arithmetically intense.

GPUs favor local workloads.

DG has very limited communication.

A match made in heaven?

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 168: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Results

Outline

1 Scripting Languages

2 Scripting CUDA

3 Metaprogramming CUDA

4 Discontinuous Galerkin on CUDAIntroductionResultsConclusions

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 169: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Results

GTX280 vs. single core of Intel Core 2 Duo E8400

2 4 6 8Polynomial Order N

0

50

100

150

200

250

300

GFl

ops/

s

GPUCPU

2 4 6 820

25

30

35

40

45

50

55

60

Speedup F

act

or

Speedup

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 170: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Results

Memory Bandwidth on a GTX 280

1 2 3 4 5 6 7 8 9Polynomial Order N

20

40

60

80

100

120

140

160

180

200

Glo

bal M

em

ory

Bandw

idth

[G

B/s

]

GatherLiftDiffAssy.Peak

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 171: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Results

Real-World Scattering Calculation

Order N = 4,78745 elements,2.7M · 6 DOFs,single Tesla C1060.

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 172: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Conclusions

Outline

1 Scripting Languages

2 Scripting CUDA

3 Metaprogramming CUDA

4 Discontinuous Galerkin on CUDAIntroductionResultsConclusions

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 173: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Conclusions

Conclusions

Fun time to be in computational science

Use Python and PyCuda to have even more fun :-)

With no compromise in performance

CUDA tuning too tedious? Need more speed?

Automate it: Metaprogramming

Further work in CUDA-DG:

Multi-GPUOther equations (Euler, Poisson, possibly Navier-Stokes?)Double Precision

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 174: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Conclusions

Conclusions

Fun time to be in computational science

Use Python and PyCuda to have even more fun :-)

With no compromise in performance

CUDA tuning too tedious? Need more speed?

Automate it: Metaprogramming

Further work in CUDA-DG:

Multi-GPUOther equations (Euler, Poisson, possibly Navier-Stokes?)Double Precision

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 175: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Conclusions

Conclusions

Fun time to be in computational science

Use Python and PyCuda to have even more fun :-)

With no compromise in performance

CUDA tuning too tedious? Need more speed?

Automate it: Metaprogramming

Further work in CUDA-DG:

Multi-GPUOther equations (Euler, Poisson, possibly Navier-Stokes?)Double Precision

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 176: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Conclusions

Conclusions

Fun time to be in computational science

Use Python and PyCuda to have even more fun :-)

With no compromise in performance

CUDA tuning too tedious? Need more speed?

Automate it: Metaprogramming

Further work in CUDA-DG:

Multi-GPUOther equations (Euler, Poisson, possibly Navier-Stokes?)Double Precision

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 177: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Conclusions

Conclusions

Fun time to be in computational science

Use Python and PyCuda to have even more fun :-)

With no compromise in performance

CUDA tuning too tedious? Need more speed?

Automate it: Metaprogramming

Further work in CUDA-DG:

Multi-GPUOther equations (Euler, Poisson, possibly Navier-Stokes?)Double Precision

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 178: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Conclusions

Where to from here?

PyCuda Homepage

(also these slides, tonight)→ http://mathema.tician.de/software/pycuda

CUDA-DG Preprint

AK, T. Warburton, J. Bridge, J.S. Hesthaven, Nodal DiscontinuousGalerkin Methods on Graphics Processors, submitted.→ http://arxiv.org/abs/0901.1024

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 179: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Conclusions

Questions?

?Thank you for your attention!

http://mathema.tician.de/software/pycuda

http://arxiv.org/abs/0901.1024

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs

Page 180: High-Productivity Supercomputing: Metaprogramming GPUs · 2020. 8. 21. · Scripting: Python For this talk, Python is the scripting language of choice. Mature language Has a large

Scripting Languages Scripting CUDA Metaprogramming CUDA Discontinuous Galerkin on CUDA

Conclusions

Image Credits I

Batteries: ickr.com/thebmag

Python logo: python.org

Snail: ickr.com/hadi_fooladi

Old Books: ickr.com/ppdigital

Adding Machine: ickr.com/thomashawk

Floppy disk: ickr.com/ethanhein

Machine: ickr.com/13521837@N00

Andreas Klöckner Applied Math, Brown

High-Productivity Supercomputing: Metaprogramming GPUs