Intro to CUDA

Author: Dionisio E Alonso <[email protected]>

Date: June 2011

Known methods of parallelism

MPI

• Clusters

• Over networks

OpenMP

• One computer

• Parallelism over multiple cores

GPGPU - FaMAF© 2011


What is CUDA?

• CUDA means: Compute Unified Device Architecture.

• CUDA is developed by NVIDIA for general-purpose computing on graphics devices.

• The architecture has been in use since the G8x GPUs.

• There are many flavors (C, Fortran, OpenCL, Python, etc.)


Former Graphics Pipelines


NVIDIA GPUs structure

2006: unified structure.


Let's see a CPU core


CPU on diet


Parallelism


SIMD


More parallelism


What if...

not everyone executes the same code?


And the memory access?

• No more cache, more(?) latency

• More parallelism available

• Many more threads than execution units.


Hiding memory latency


Hardware examples

GF110 (GeForce GTX 580, Tesla C2090):

• 16 Streaming Multiprocessors (SM)

• 32 CUDA cores per SM

• Two instructions per CUDA core per cycle @ 1544 MHz (GTX 580) = 1581 GFLOPS

• 192 GB/s to memory = 33 instructions per float access

• Additional information:

• 128KB registers per SM

• 64KB shared memory / cache L1 per SM

• 12KB read only cache per SM

• 768KB global cache L2
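The headline figures above follow from simple arithmetic. As a rough check, assuming a single-precision float of 4 bytes and the GTX 580 shader clock of 1544 MHz (the value that reproduces the 1581 GFLOPS figure):

```latex
\begin{align*}
\text{peak throughput}
  &= 16~\text{SM} \times 32~\tfrac{\text{cores}}{\text{SM}}
     \times 2~\tfrac{\text{instructions}}{\text{core}\cdot\text{cycle}}
     \times 1.544~\text{GHz}
   \approx 1581~\text{GFLOPS} \\[4pt]
\text{float accesses per second}
  &= \frac{192~\text{GB/s}}{4~\text{B per float}} = 48~\text{Gfloat/s} \\[4pt]
\text{instructions per float access}
  &= \frac{1581}{48} \approx 33
\end{align*}
```

In other words, a kernel needs roughly 33 arithmetic instructions per float fetched from global memory before it stops being limited by memory bandwidth.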


Compared to a high-end CPU

~15 times more throughput, ~10 times more memory bandwidth


Some differences with CPU

• More threads are better

• Cost of launching a thread: ~0

• Cost of a context switch between threads: ~0

• Cost of terminating a thread: ~0

• It is often better to recalculate a value than to load it from memory

• Use registers or shared memory instead of global memory
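As a rough illustration of the last point, the sketch below sums each row of a matrix while keeping the running total in a local variable, which the compiler would normally place in a register, and writes to global memory only once per row. The kernel name `row_sum`, the helper `row_sum_demo` and the matrix sizes are assumptions made for this example; they are not part of the original slides. The allocation and copy calls are the ones introduced in the memory management slide.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: thread `row` sums one row of a rows x cols matrix.
__global__ void row_sum(const float *m, float *out, int rows, int cols)
{
    int row = threadIdx.x + blockIdx.x * blockDim.x;
    if (row >= rows) return;

    float acc = 0.0f;                  // running total, normally kept in a register
    for (int j = 0; j < cols; ++j)
        acc += m[row * cols + j];      // one global read per element
    out[row] = acc;                    // one global write per row
}

// Host helper (assumed sizes): fills the matrix with ones and checks each row sum.
bool row_sum_demo()
{
    const int rows = 128, cols = 32;
    std::vector<float> h_m(rows * cols, 1.0f), h_out(rows, 0.0f);

    float *d_m = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_m, h_m.size() * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, h_out.size() * sizeof(float)) != cudaSuccess) {
        cudaFree(d_m);
        return false;
    }
    cudaMemcpy(d_m, h_m.data(), h_m.size() * sizeof(float), cudaMemcpyHostToDevice);

    row_sum<<<(rows + 63) / 64, 64>>>(d_m, d_out, rows, cols);

    // This copy also reports any error raised by the kernel launch.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, h_out.size() * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_m);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < rows; ++i)
        if (h_out[i] != (float)cols) return false;  // 32 ones sum to exactly 32.0f
    return true;
}
```

Writing `out[row] += m[row * cols + j]` inside the loop instead would compute the same result but go through global memory on every iteration.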


CUDA C

• C++ Syntax & Semantics


Kernels

• The kernel is the function that runs in each thread

• Kernels don't return values (the return type is void)

• Kernels are declared with the __global__ prefix

• A kernel can call __device__ functions

• No recursion allowed


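An example

// Device-only helper: absolute value of an int.
__device__ int abs(int a)
{
    return a < 0 ? -a : a;
}

// Each thread computes one element of c = |a - b|.
__global__ void distance(int *a, int *b, int *c)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    c[idx] = abs(a[idx] - b[idx]);
}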


Blocks

Threads are grouped in blocks of 1 to 3 dimensions, and all threads of a block run on the same SM

• Data declared as __shared__ is shared between the threads of a block

• Threads in the same block can synchronize using __syncthreads()

• The predefined variable threadIdx holds the thread's coordinates within its block

• The predefined variable blockDim holds the block size
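The four points above can be seen together in a small sketch that reverses an array inside a single block. The names `reverse_block`, `reverse_block_demo` and `BLOCK_SIZE` are assumptions made for this example; the host-side allocation and copy calls are the ones covered in the memory management slide.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 64  // assumed block size for this sketch

// Hypothetical kernel: reverses BLOCK_SIZE ints in place using one block.
__global__ void reverse_block(int *data)
{
    __shared__ int tile[BLOCK_SIZE];     // visible to every thread of the block
    int t = threadIdx.x;                 // this thread's coordinate in the block

    tile[t] = data[t];
    __syncthreads();                     // wait until the whole tile is loaded

    data[t] = tile[blockDim.x - 1 - t];  // blockDim.x is the block size
}

// Host helper: runs the kernel on 0..BLOCK_SIZE-1 and checks the result.
bool reverse_block_demo()
{
    int h[BLOCK_SIZE];
    for (int i = 0; i < BLOCK_SIZE; ++i) h[i] = i;

    int *d = nullptr;
    if (cudaMalloc(&d, sizeof(h)) != cudaSuccess) return false;
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);

    reverse_block<<<1, BLOCK_SIZE>>>(d);  // one block of BLOCK_SIZE threads

    cudaError_t err = cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < BLOCK_SIZE; ++i)
        if (h[i] != BLOCK_SIZE - 1 - i) return false;
    return true;
}
```

Without the `__syncthreads()` call a thread could read a `tile` slot that its neighbour has not written yet.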


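An example

__global__ void sum_all(int *a)
{
    __shared__ int s[256];   // one copy per block, visible to all its threads
    int idx = threadIdx.x;

    s[idx] = a[idx];         // each thread stages one element
    __syncthreads();         // wait until all 256 elements are staged

    int sum = 0;
    for (int i = 0; i < 256; ++i) {
        sum += s[i];
    }
    a[idx] = sum;            // every element ends up holding the total
}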
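Because this kernel indexes with threadIdx.x only and hard-codes 256, it is meant to be launched as a single block of 256 threads. A minimal host-side driver, assuming an input of 256 ones (the helper name `sum_all_demo` is hypothetical, and the kernel is repeated so the sketch is self-contained), might look like this:

```cuda
#include <cuda_runtime.h>

// Kernel from the slide: every thread sums all 256 staged values.
__global__ void sum_all(int *a)
{
    __shared__ int s[256];
    int idx = threadIdx.x;

    s[idx] = a[idx];
    __syncthreads();

    int sum = 0;
    for (int i = 0; i < 256; ++i) {
        sum += s[i];
    }
    a[idx] = sum;
}

// Hypothetical host helper: feeds 256 ones and expects 256 in every slot.
bool sum_all_demo()
{
    const int n = 256;
    int h[n];
    for (int i = 0; i < n; ++i) h[i] = 1;

    int *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return false;
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);

    sum_all<<<1, n>>>(d);    // exactly one block of 256 threads

    cudaError_t err = cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h[i] != n) return false;
    return true;
}
```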


Launching kernels

• Specify the block size

• Arrange the blocks in a grid of up to 2 dimensions (up to 3 on newer cards)

• The predefined variable blockIdx holds the block's coordinates within the grid

• The predefined variable gridDim holds the grid size
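One way to see blockIdx and gridDim working together is a grid-stride loop, a common idiom that is not taken from the slides: each thread starts at its global index and then jumps ahead by the total number of threads in the grid. The names `scale` and `scale_demo` and the launch sizes are assumptions made for this sketch.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element of v by k.
__global__ void scale(float *v, int n, float k)
{
    int first  = threadIdx.x + blockIdx.x * blockDim.x;  // global index of this thread
    int stride = blockDim.x * gridDim.x;                 // total threads in the grid
    for (int i = first; i < n; i += stride)
        v[i] *= k;
}

// Host helper: doubles 0..n-1 with a grid smaller than the data and checks it.
bool scale_demo()
{
    const int n = 10000;
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    dim3 dim_block(128), dim_grid(4);   // 512 threads cover 10000 elements
    scale<<<dim_grid, dim_block>>>(d, n, 2.0f);

    cudaError_t err = cudaMemcpy(h.data(), d, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h[i] != 2.0f * i) return false;  // exact for these small integers
    return true;
}
```

With this pattern the grid size can be chosen independently of the problem size.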


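An example

__global__ void distance(int *a, int *b, int *c, int n)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < n)    // the last block may have threads past the end
        c[idx] = abs(a[idx] - b[idx]);
}

int main(...)
{
    // ...
    dim3 dim_block, dim_grid;
    dim_block.x = 256;
    // Integer round-up; ceil(N / dim_block.x) would truncate before rounding.
    dim_grid.x = (N + dim_block.x - 1) / dim_block.x;
    distance<<<dim_grid, dim_block>>>(vector1, vector2, result, N);
    // ...
}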
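The number of blocks has to be N divided by the block size, rounded up. With integer operands a plain `/` truncates, so the rounding is usually done in integer arithmetic. A minimal sketch follows; `div_up` and `div_up_examples` are hypothetical names, not part of the CUDA API.

```cuda
#include <cassert>

// Hypothetical helper: smallest number of blocks of size `block` that covers
// n items. Assumes n >= 0 and block > 0.
__host__ __device__ inline int div_up(int n, int block)
{
    return (n + block - 1) / block;
}

// A few spot checks of the rounding behaviour.
inline void div_up_examples()
{
    assert(div_up(1000, 256) == 4);  // 3 full blocks plus a partial one
    assert(div_up(1024, 256) == 4);  // exact multiple: no extra block
    assert(div_up(1025, 256) == 5);  // one element over: one more block
}
```

Whenever N is not a multiple of the block size, the last block has surplus threads, so the kernel needs a bounds check on its index.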


Memory management

• The GPU has its own memory address space, in which you can allocate device memory or map host memory

• Kernels can't receive ordinary host pointers

• The programmer must track which pointers are host pointers and which are device pointers

• GPU memory management is similar to C's malloc, free and memcpy

• cudaMalloc allocates device memory and returns the pointer through an output parameter

• cudaFree frees the device memory a pointer refers to

• cudaMemcpy copies within, from, or to the device
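All three calls return a cudaError_t, so it is worth checking them. The sketch below round-trips a buffer host to device, device to device, and back to the host. The `CUDA_CHECK` macro and the `roundtrip_demo` helper are hypothetical conveniences written for this example; `cudaGetErrorString` is the runtime call that turns an error code into text.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical convenience macro: abort with a message if a runtime call fails.
#define CUDA_CHECK(call)                                        \
    do {                                                        \
        cudaError_t err_ = (call);                              \
        if (err_ != cudaSuccess) {                              \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,  \
                    cudaGetErrorString(err_));                  \
            exit(EXIT_FAILURE);                                 \
        }                                                       \
    } while (0)

// Copies n ints (n > 0 assumed) to the device, between two device buffers,
// and back. Returns true if the data comes back intact.
bool roundtrip_demo(int n)
{
    std::vector<int> h_in(n), h_out(n, 0);
    for (int i = 0; i < n; ++i) h_in[i] = 3 * i;

    size_t bytes = n * sizeof(int);
    int *d_a = nullptr, *d_b = nullptr;   // device pointers: never dereferenced on the host
    CUDA_CHECK(cudaMalloc(&d_a, bytes));
    CUDA_CHECK(cudaMalloc(&d_b, bytes));

    CUDA_CHECK(cudaMemcpy(d_a, h_in.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_b, d_a, bytes, cudaMemcpyDeviceToDevice));
    CUDA_CHECK(cudaMemcpy(h_out.data(), d_b, bytes, cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(d_a));
    CUDA_CHECK(cudaFree(d_b));
    return h_in == h_out;
}
```

The `d_` and `h_` prefixes are just a naming convention to help track which pointers belong to the device and which to the host.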


An example

int h_a[N];
initialize(h_a);

int *d_a;
cudaMalloc(&d_a, N * sizeof(int));
cudaMemcpy(d_a, h_a, N * sizeof(int), cudaMemcpyHostToDevice);

modify<<<grid, block>>>(d_a);

cudaMemcpy(h_a, d_a, N * sizeof(int), cudaMemcpyDeviceToHost);
cudaFree(d_a);
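The fragment above leaves `N`, `initialize`, `modify`, `grid` and `block` undefined. A self-contained version, under the assumption that `modify` adds 1 to every element and `initialize` fills the array with 0..N-1 (both are stand-ins invented for this sketch, with an extra length parameter the slide does not show), could look like this:

```cuda
#include <cuda_runtime.h>

#define N 1000  // assumed element count

// Hypothetical stand-in for the slide's `modify`: adds 1 to each element.
__global__ void modify(int *a, int n)
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < n) a[idx] += 1;
}

// Hypothetical stand-in for the slide's `initialize`: a[i] = i.
void initialize(int *a, int n)
{
    for (int i = 0; i < n; ++i) a[i] = i;
}

// Runs the copy-in, launch, copy-out sequence and checks the result.
bool modify_demo()
{
    int h_a[N];
    initialize(h_a, N);

    int *d_a = nullptr;
    if (cudaMalloc(&d_a, N * sizeof(int)) != cudaSuccess) return false;
    cudaMemcpy(d_a, h_a, N * sizeof(int), cudaMemcpyHostToDevice);

    int block = 256;
    int grid = (N + block - 1) / block;   // enough blocks to cover N elements
    modify<<<grid, block>>>(d_a, N);

    cudaError_t err = cudaMemcpy(h_a, d_a, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < N; ++i)
        if (h_a[i] != i + 1) return false;
    return true;
}
```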


Q & A:

Questions?


