INTRODUCTION TO CUDA Prepared for Geek Camp Singapore 2011

Raymond Tay

THE FREE LUNCH IS OVER – HERB SUTTER

WE NEED TO THINK BEYOND MULTI-CORE CPUS … WE NEED TO THINK MANY-CORE GPUS

…

NVIDIA GPUS FLOPS

• FLOPS: floating-point operations per second. A measure of how many floating-point operations a GPU can perform each second. More is better.

GPUs beat CPUs

NVIDIA GPUS MEMORY BANDWIDTH

• Nvidia’s GPUs are massively parallel processors, so they need high memory bandwidth to stay supplied with data. That bandwidth plays a big role in high-performance computing.

GPUs beat CPUs

GPU VS CPU

CPU

• Optimised for low-latency access to cached data sets

• Control logic for out-of-order and speculative execution

GPU

• Optimised for data-parallel, throughput computation

• Architecture tolerant of memory latency

• More transistors dedicated to computation

I DON’T KNOW C/C++, SHOULD I LEAVE?

• Relax, no worries. No need to fret.

Your Brain Asks: Wait a minute, why should I learn the C/C++ SDK?

CUDA Answers: Efficiency!!!

WHAT DO I NEED TO BEGIN WITH CUDA?

• An Nvidia CUDA-enabled graphics card, e.g. one based on the Fermi architecture
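One way to confirm that a machine has a CUDA-enabled card is to ask the CUDA runtime. The sketch below is an illustration rather than part of the original slides: the function names `count_cuda_devices` and `print_cuda_devices` are hypothetical, while `cudaGetDeviceCount`, `cudaGetDeviceProperties` and `cudaGetErrorString` are CUDA runtime calls. Fermi-class cards report compute capability 2.x.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the number of CUDA-capable devices,
// or -1 if the runtime reports an error (for example, no driver installed).
int count_cuda_devices()
{
    int device_count = 0;
    cudaError_t status = cudaGetDeviceCount(&device_count);
    if (status != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(status));
        return -1;
    }
    return device_count;
}

// Hypothetical helper: prints the name and compute capability of each device.
void print_cuda_devices()
{
    int device_count = count_cuda_devices();
    for (int dev = 0; dev < device_count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) == cudaSuccess) {
            std::printf("Device %d: %s (compute capability %d.%d)\n",
                        dev, prop.name, prop.major, prop.minor);
        }
    }
}
```

Calling `print_cuda_devices()` from `main` and compiling with `nvcc` should list each card; if nothing is printed, no usable CUDA device was found.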

HOW DOES CUDA WORK

1.  Copy input data from CPU memory to GPU memory

2.  Load GPU program and execute, caching data on chip for performance

3.  Copy results from GPU memory to CPU memory

Data travels between CPU memory and GPU memory over the PCI bus.
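The three steps can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the presenter's code: the kernel `double_elements` and the wrapper `double_on_gpu` are hypothetical names, the block size of 256 is an arbitrary choice, and the sketch assumes `n > 0`. It does not show on-chip caching (shared memory), only the copy, execute, copy-back pattern.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration: doubles each element.
__global__ void double_elements(float *data, unsigned int n)
{
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = 2.0f * data[i];
}

// Runs the three steps on a host array of n floats (n > 0), in place.
// Returns cudaSuccess, or the first error encountered.
cudaError_t double_on_gpu(float *host_data, unsigned int n)
{
    float *device_data = NULL;
    size_t bytes = n * sizeof(float);

    cudaError_t status = cudaMalloc((void **)&device_data, bytes);
    if (status != cudaSuccess)
        return status;

    // Step 1: copy input data from CPU memory to GPU memory.
    status = cudaMemcpy(device_data, host_data, bytes, cudaMemcpyHostToDevice);

    if (status == cudaSuccess) {
        // Step 2: load the GPU program and execute it.
        const unsigned int block_size = 256;  // assumed value
        const unsigned int grid_size = (n + block_size - 1) / block_size;
        double_elements<<<grid_size, block_size>>>(device_data, n);
        status = cudaGetLastError();  // reports launch failures
    }

    if (status == cudaSuccess) {
        // Step 3: copy results from GPU memory to CPU memory.
        // This cudaMemcpy waits for the kernel to finish before copying.
        status = cudaMemcpy(host_data, device_data, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(device_data);
    return status;
}
```

Both copies cross the PCI bus, so for a kernel this small the transfers are likely to cost more than the computation itself; the pattern pays off when the GPU does substantial work per byte moved.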

EXAMPLE: BLOCK CYPHER

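CPU Program

void host_shift_cypher(unsigned int *input_array, unsigned int *output_array,
                       unsigned int shift_amount, unsigned int alphabet_max,
                       unsigned int array_length)
{
    for (unsigned int i = 0; i < array_length; i++)
    {
        unsigned int element = input_array[i];
        unsigned int shifted = element + shift_amount;

        /* Wrap around when the shifted value leaves the alphabet. */
        if (shifted > alphabet_max)
        {
            shifted = shifted % (alphabet_max + 1);
        }
        output_array[i] = shifted;
    }
}

int main() {
    /* input_array, output_array and the other arguments are assumed to be
       declared and initialised before this call. */
    host_shift_cypher(input_array, output_array, shift_amount, alphabet_max, array_length);
}

GPU Program

__global__ void shift_cypher(unsigned int *input_array, unsigned int *output_array,
                             unsigned int shift_amount, unsigned int alphabet_max,
                             unsigned int array_length)
{
    /* One thread per array element. */
    unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;

    /* The grid is rounded up to whole blocks, so surplus threads must not touch memory. */
    if (tid < array_length)
    {
        unsigned int shifted = input_array[tid] + shift_amount;
        if (shifted > alphabet_max)
            shifted = shifted % (alphabet_max + 1);
        output_array[tid] = shifted;
    }
}

int main() {
    /* input_array and output_array are assumed to be device pointers
       (e.g. allocated with cudaMalloc), and block_size to be set beforehand. */

    /* Round up so that a partial final block is still launched. */
    dim3 dimGrid((array_length + block_size - 1) / block_size);
    dim3 dimBlock(block_size);

    shift_cypher<<<dimGrid, dimBlock>>>(input_array, output_array, shift_amount, alphabet_max, array_length);
}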
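The launch has to cover every array element even when the length is not a multiple of the block size, so the block count must be rounded up and the kernel must ignore the surplus threads. The sketch below checks this by comparing the GPU output against the CPU loop for a length of the caller's choosing. The helpers `blocks_for` and `gpu_matches_cpu`, the test data, and the block size of 256 are assumptions made for illustration; the two cypher functions follow the slide code, with a bounds check in the kernel.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// CPU reference, same logic as host_shift_cypher in the slides.
void host_shift_cypher(const unsigned int *in, unsigned int *out,
                       unsigned int shift_amount, unsigned int alphabet_max,
                       unsigned int n)
{
    for (unsigned int i = 0; i < n; i++) {
        unsigned int shifted = in[i] + shift_amount;
        if (shifted > alphabet_max)
            shifted = shifted % (alphabet_max + 1);
        out[i] = shifted;
    }
}

__global__ void shift_cypher(const unsigned int *in, unsigned int *out,
                             unsigned int shift_amount, unsigned int alphabet_max,
                             unsigned int n)
{
    unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < n) {  // threads past the end of the array do nothing
        unsigned int shifted = in[tid] + shift_amount;
        if (shifted > alphabet_max)
            shifted = shifted % (alphabet_max + 1);
        out[tid] = shifted;
    }
}

// Hypothetical helper: blocks needed to cover n elements,
// i.e. n / block_size rounded up, in integer arithmetic.
unsigned int blocks_for(unsigned int n, unsigned int block_size)
{
    return (n + block_size - 1) / block_size;
}

// Hypothetical helper: returns true if the GPU and CPU results agree (n > 0).
bool gpu_matches_cpu(unsigned int n, unsigned int shift_amount,
                     unsigned int alphabet_max)
{
    std::vector<unsigned int> input(n), expected(n), actual(n);
    for (unsigned int i = 0; i < n; i++)
        input[i] = i % (alphabet_max + 1);  // made-up test data
    host_shift_cypher(input.data(), expected.data(), shift_amount, alphabet_max, n);

    unsigned int *d_in = NULL, *d_out = NULL;
    size_t bytes = n * sizeof(unsigned int);
    if (cudaMalloc((void **)&d_in, bytes) != cudaSuccess)
        return false;
    if (cudaMalloc((void **)&d_out, bytes) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, input.data(), bytes, cudaMemcpyHostToDevice);

    const unsigned int block_size = 256;  // assumed value
    shift_cypher<<<blocks_for(n, block_size), block_size>>>(
        d_in, d_out, shift_amount, alphabet_max, n);
    cudaError_t status = cudaMemcpy(actual.data(), d_out, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return status == cudaSuccess && actual == expected;
}
```

For example, `gpu_matches_cpu(1000, 3, 25)` launches 4 blocks of 256 threads; the last 24 threads fall outside the array and are filtered out by the `tid < n` check. Rounding the block count down instead would leave the final 232 elements unprocessed.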

EXAMPLE: VECTOR ADDITION

// CUDA CODE
__global__ void VecAdd(const float* A, const float* B, float* C, unsigned int N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
        C[i] = A[i] + B[i];
}

// C CODE
void VecAdd(const float* A, const float* B, float* C, unsigned int N)
{
    for (int i = 0; i < N; ++i)
        C[i] = A[i] + B[i];
}

DEBUGGER

CUDA-GDB

• Based on GDB

• Linux

• Mac OS X

Parallel Nsight

• Plugin inside Visual Studio

VISUAL PROFILER & MEMCHECK

Profiler

• Microsoft Windows

• Linux

• Mac OS X

• Analyze performance

CUDA-MEMCHECK

• Microsoft Windows

• Linux

• Mac OS X

• Detect memory access errors

WHERE’S CUDA AT IN 2011?

• 60,000 researchers use it to aid drug discovery

• 470 universities teach CUDA

WHERE’S CUDA AT IN 2011? (PART 2)

• NVIDIA Showcase (1000+ applications)

ADDITIONAL RESOURCES

• CUDA FAQ (http://tegradeveloper.nvidia.com/cuda-faq)

• CUDA Tools & Ecosystem (http://tegradeveloper.nvidia.com/cuda-tools-ecosystem)

• CUDA Downloads (http://tegradeveloper.nvidia.com/cuda-downloads)

• NVIDIA Forums (http://forums.nvidia.com/index.php?showforum=62)

• GPGPU (http://gpgpu.org)

• CUDA By Example (http://tegradeveloper.nvidia.com/content/cuda-example-introduction-general-purpose-gpu-programming-0), by Jason Sanders & Edward Kandrot

• GPU Computing Gems Emerald Edition (http://www.amazon.com/GPU-Computing-Gems-Emerald-Applications/dp/0123849888/), Editor in Chief: Prof Hwu Wen-Mei

CUDA LIBRARIES

• Visit this site: http://developer.nvidia.com/cuda-tools-ecosystem#Libraries

• Thrust, CUFFT, CUBLAS, CUSP, NPP, OpenCV, GPU AI-Tree Search, GPU AI-Path Finding

• A lot of the libraries are hosted on Google Code. Many more gems in there too!
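To give a feel for what a library such as Thrust saves you, here is the vector addition example rewritten with it. This is a sketch under assumptions: the wrapper name `thrust_vec_add` is hypothetical, and the two inputs are assumed to have the same length. The Thrust calls used (`thrust::device_vector`, `thrust::transform`, `thrust::plus`, `thrust::copy`) are part of the library.

```cuda
#include <vector>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/transform.h>

// Hypothetical wrapper: adds two equally sized host vectors on the GPU
// using Thrust and returns the element-wise sum.
std::vector<float> thrust_vec_add(const std::vector<float> &a,
                                  const std::vector<float> &b)
{
    // Constructing a device_vector from host iterators copies the data to the GPU.
    thrust::device_vector<float> d_a(a.begin(), a.end());
    thrust::device_vector<float> d_b(b.begin(), b.end());
    thrust::device_vector<float> d_c(a.size());

    // Element-wise C[i] = A[i] + B[i]; Thrust chooses the launch configuration.
    thrust::transform(d_a.begin(), d_a.end(), d_b.begin(), d_c.begin(),
                      thrust::plus<float>());

    // Copy the result from GPU memory back to CPU memory.
    std::vector<float> c(a.size());
    thrust::copy(d_c.begin(), d_c.end(), c.begin());
    return c;
}
```

Compared with a hand-written kernel, there is no explicit `cudaMalloc`, `cudaMemcpy`, grid size or thread index; the trade-off is less direct control over how the work is mapped to the hardware.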

THANK YOU @RaymondTayBL
