introduction to cuda geek camp singapore 2011

INTRODUCTION TO CUDA Prepared for Geek Camp Singapore 2011 Raymond Tay

Upload: raymond-tay

Posted on 23-Jun-2015


DESCRIPTION

This presentation was prepared for Geek Camp Singapore 2011 (1 October 2011).

TRANSCRIPT

Page 1: Introduction to cuda   geek camp singapore 2011

INTRODUCTION TO CUDA Prepared for Geek Camp Singapore 2011

Raymond Tay

Page 2

THE FREE LUNCH IS OVER – HERB SUTTER

Page 3

WE NEED TO THINK BEYOND MULTI-CORE CPUS … WE NEED TO THINK MANY-CORE GPUS

Page 4

NVIDIA GPUS FLOPS

• FLOPS: floating-point operations per second, a measure of how many floating-point operations a GPU can perform each second. More is better.

[Chart: peak FLOPS, NVIDIA GPUs vs. CPUs. GPUs beat CPUs.]
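Peak FLOPS is normally quoted as a theoretical ceiling: the number of cores, times the clock rate, times the floating-point operations each core can complete per cycle (a fused multiply-add counts as two). The figures below are hypothetical round numbers for a Fermi-class GPU, chosen only to show the arithmetic; they are not the specification of any particular card.

$$
\text{peak FLOPS} = N_{\text{cores}} \times f_{\text{clock}} \times \text{FLOPs per core per cycle}
$$

$$
512 \times \left(1.5 \times 10^{9}\ \text{Hz}\right) \times 2 = 1.536 \times 10^{12}\ \text{FLOPS} \approx 1.5\ \text{TFLOPS}
$$

Real applications reach only a fraction of this peak, depending on how well they keep the cores busy.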

Page 5

NVIDIA GPUS MEMORY BANDWIDTH

• Nvidia’s GPUs are massively parallel processors, and keeping thousands of threads supplied with data requires high memory bandwidth. That bandwidth plays a big role in high-performance computing.

[Chart: memory bandwidth, NVIDIA GPUs vs. CPUs. GPUs beat CPUs.]
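Theoretical peak memory bandwidth is worked out in a similar way: bytes moved per transfer (bus width in bits divided by 8) times the effective transfer rate. Using hypothetical round numbers, not a specific product, a 384-bit bus running at an effective 4.0 billion transfers per second gives:

$$
B = \frac{w_{\text{bus}}}{8} \times r_{\text{eff}} = \frac{384}{8}\ \text{bytes} \times 4.0 \times 10^{9}\ \text{s}^{-1} = 1.92 \times 10^{11}\ \text{bytes/s} = 192\ \text{GB/s}
$$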

Page 6

GPU VS CPU

CPU
• Optimised for low-latency access to cached data sets
• Control logic for out-of-order and speculative execution

GPU
• Optimised for data-parallel, throughput computation
• Architecture tolerant of memory latency
• More transistors dedicated to computation

Page 7

I DON’T KNOW C/C++, SHOULD I LEAVE?

 Relax, no worries. Not to fret.

Your Brain Asks: Wait a minute, why should I learn the C/C++ SDK?

CUDA Answers: Efficiency!!!

Page 8

WHAT DO I NEED TO BEGIN WITH CUDA?

• An NVIDIA CUDA-enabled graphics card, e.g. one based on the Fermi architecture
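A quick way to confirm that a card is usable is to ask the CUDA runtime what it can see. The sketch below assumes the CUDA toolkit and driver are already installed and uses only `cudaGetDeviceCount` and `cudaGetDeviceProperties`; `count_cuda_devices` is a hypothetical helper name. Build it with `nvcc`.

```cuda
// Minimal device query using the CUDA runtime API.
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of CUDA devices the runtime can see,
// or 0 if the runtime reports an error (e.g. no driver or no CUDA GPU).
int count_cuda_devices()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess)
        return 0;
    return count;
}

int main()
{
    int count = count_cuda_devices();
    assert(count >= 0);
    if (count == 0) {
        std::printf("No CUDA-capable device found\n");
        return 0;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess)
            continue;
        // Fermi-generation GPUs report compute capability 2.x
        std::printf("Device %d: %s, compute capability %d.%d, %d multiprocessors, %zu MB\n",
                    dev, prop.name, prop.major, prop.minor, prop.multiProcessorCount,
                    prop.totalGlobalMem / (1024 * 1024));
    }
    return 0;
}
```

A compute capability of 2.0 or 2.1 in the output indicates a Fermi-generation card.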

Page 9

HOW DOES CUDA WORK?

1.  Copy input data from CPU memory to GPU memory

2.  Load the GPU program and execute it, caching data on chip for performance

3.  Copy results from GPU memory to CPU memory

[Diagram: data moves between CPU memory and GPU memory over the PCI bus]
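The three steps map onto a handful of CUDA runtime calls. Below is a minimal sketch, not the talk's own code: the kernel `double_elements` and the helper `double_on_gpu` are hypothetical names, error handling is kept short, and the on-chip caching mentioned in step 2 (shared memory) is not shown, only the data movement and the launch.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: each GPU thread doubles one element.
__global__ void double_elements(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                          // the last block may have spare threads
        data[i] *= 2.0f;
}

// Hypothetical helper: doubles n floats held in host memory using the GPU.
// Returns false if any CUDA call fails.
bool double_on_gpu(float *host, int n)
{
    size_t bytes = n * sizeof(float);
    float *device = nullptr;
    if (cudaMalloc(&device, bytes) != cudaSuccess)
        return false;

    // 1. Copy input data from CPU memory to GPU memory
    cudaError_t err = cudaMemcpy(device, host, bytes, cudaMemcpyHostToDevice);

    // 2. Launch the GPU program: enough 256-thread blocks to cover n elements
    if (err == cudaSuccess) {
        const int threads = 256;
        double_elements<<<(n + threads - 1) / threads, threads>>>(device, n);
        err = cudaGetLastError();       // catches launch failures
    }

    // 3. Copy results from GPU memory back to CPU memory
    //    (this cudaMemcpy waits for the kernel to finish first)
    if (err == cudaSuccess)
        err = cudaMemcpy(host, device, bytes, cudaMemcpyDeviceToHost);

    cudaFree(device);
    return err == cudaSuccess;
}

int main()
{
    float values[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    if (!double_on_gpu(values, 4)) {
        std::printf("CUDA call failed\n");
        return 1;
    }
    assert(values[3] == 8.0f);
    std::printf("%g %g %g %g\n", values[0], values[1], values[2], values[3]);  // 2 4 6 8
    return 0;
}
```

Both copies cross the PCI bus, so for small inputs the transfers can cost more than the computation itself.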

Page 10

EXAMPLE: SHIFT CYPHER

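CPU Program

```c
#include <stdio.h>

void host_shift_cypher(unsigned int *input_array, unsigned int *output_array, unsigned int shift_amount, unsigned int alphabet_max, unsigned int array_length)
{
    // Visit every element in turn on a single CPU core
    for (unsigned int i = 0; i < array_length; i++)
    {
        unsigned int element = input_array[i];
        unsigned int shifted = element + shift_amount;
        if (shifted > alphabet_max)
        {
            // Wrap around past the end of the alphabet
            shifted = shifted % (alphabet_max + 1);
        }
        output_array[i] = shifted;
    }
}

int main() {
    unsigned int input_array[4] = {0, 1, 24, 25};   // example data: 'a', 'b', 'y', 'z' as 0..25
    unsigned int output_array[4];
    unsigned int shift_amount = 3, alphabet_max = 25, array_length = 4;
    host_shift_cypher(input_array, output_array, shift_amount, alphabet_max, array_length);
    printf("%u %u %u %u\n", output_array[0], output_array[1], output_array[2], output_array[3]);   // 3 4 1 2
    return 0;
}
```

GPU Program

```cuda
#include <cuda_runtime.h>

__global__ void shift_cypher(unsigned int *input_array, unsigned int *output_array, unsigned int shift_amount, unsigned int alphabet_max, unsigned int array_length)
{
    // One thread per element: global index from block and thread coordinates
    unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid >= array_length) return;   // the last block may have more threads than elements left
    unsigned int shifted = input_array[tid] + shift_amount;
    if (shifted > alphabet_max)
        shifted = shifted % (alphabet_max + 1);
    output_array[tid] = shifted;
}

int main() {
    unsigned int h_input[4] = {0, 1, 24, 25}, h_output[4];   // example data
    unsigned int shift_amount = 3, alphabet_max = 25, array_length = 4, block_size = 256;
    size_t bytes = array_length * sizeof(unsigned int);
    unsigned int *input_array, *output_array;   // the kernel needs GPU (device) pointers
    cudaMalloc(&input_array, bytes);
    cudaMalloc(&output_array, bytes);
    cudaMemcpy(input_array, h_input, bytes, cudaMemcpyHostToDevice);
    dim3 dimGrid((array_length + block_size - 1) / block_size);   // integer round-up: every element gets a thread
    dim3 dimBlock(block_size);
    shift_cypher<<<dimGrid, dimBlock>>>(input_array, output_array, shift_amount, alphabet_max, array_length);
    cudaMemcpy(h_output, output_array, bytes, cudaMemcpyDeviceToHost);   // h_output = {3, 4, 1, 2}
    cudaFree(input_array); cudaFree(output_array);
    return 0;
}
```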
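One way to keep the CPU and GPU versions of the cypher from drifting apart is to write the per-element rule once as a `__host__ __device__` function and call it from both sides. This is a sketch, not the slide's code: `shift_one`, `shift_cypher_kernel` and `gpu_matches_cpu` are hypothetical names, and the check simply compares the GPU result with the CPU result element by element.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper: the per-element cypher rule, compiled for both CPU and GPU
__host__ __device__ unsigned int shift_one(unsigned int element, unsigned int shift_amount,
                                           unsigned int alphabet_max)
{
    unsigned int shifted = element + shift_amount;
    if (shifted > alphabet_max)
        shifted = shifted % (alphabet_max + 1);   // wrap past the end of the alphabet
    return shifted;
}

// GPU version: one thread per element, each calling the shared rule
__global__ void shift_cypher_kernel(const unsigned int *in, unsigned int *out,
                                    unsigned int shift_amount, unsigned int alphabet_max,
                                    unsigned int n)
{
    unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid < n)
        out[tid] = shift_one(in[tid], shift_amount, alphabet_max);
}

// Hypothetical helper: runs the kernel and compares every element with the CPU result
bool gpu_matches_cpu(const unsigned int *in, unsigned int n,
                     unsigned int shift_amount, unsigned int alphabet_max)
{
    size_t bytes = n * sizeof(unsigned int);
    unsigned int *d_in = nullptr, *d_out = nullptr;
    unsigned int *gpu_out = new unsigned int[n];
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const unsigned int block = 256;
        shift_cypher_kernel<<<(n + block - 1) / block, block>>>(d_in, d_out, shift_amount, alphabet_max, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(gpu_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    for (unsigned int i = 0; ok && i < n; ++i)
        ok = gpu_out[i] == shift_one(in[i], shift_amount, alphabet_max);  // CPU reference
    cudaFree(d_in);
    cudaFree(d_out);
    delete[] gpu_out;
    return ok;
}

int main()
{
    assert(shift_one(24, 3, 25) == 1);    // 'y' shifted by 3 wraps to 'b'
    unsigned int letters[26];
    for (unsigned int i = 0; i < 26; ++i)
        letters[i] = i;                   // 'a'..'z' encoded as 0..25
    return gpu_matches_cpu(letters, 26, 3, 25) ? 0 : 1;
}
```

Because `shift_one` is plain host-callable code as well, the wrap-around rule can be unit-tested on a machine without a GPU.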

Page 11

EXAMPLE: VECTOR ADDITION

```cuda
// CUDA CODE
__global__ void VecAdd(const float* A, const float* B, float* C, unsigned int N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;   // each thread handles one element
    if (i < N)                                       // the grid may have more threads than elements
        C[i] = A[i] + B[i];
}
```

```c
// C CODE
void VecAdd(const float* A, const float* B, float* C, unsigned int N)
{
    for (int i = 0; i < N; ++i)   // one element per iteration, executed sequentially
        C[i] = A[i] + B[i];
}
```
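To run the slide's `VecAdd` kernel, the host has to choose a launch configuration. A common pattern, shown here as a sketch, rounds the number of blocks up so that every element gets a thread, which is exactly why the kernel needs its `if (i < N)` guard. `blocks_for` is a hypothetical helper, and the block size of 256 threads is an assumption rather than a requirement.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// The slide's kernel, unchanged
__global__ void VecAdd(const float* A, const float* B, float* C, unsigned int N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
        C[i] = A[i] + B[i];
}

// Hypothetical helper: blocks needed so that every element gets a thread
// (integer ceiling of n / threads_per_block)
unsigned int blocks_for(unsigned int n, unsigned int threads_per_block)
{
    return (n + threads_per_block - 1) / threads_per_block;
}

int main()
{
    const unsigned int N = 1000, threads = 256;    // 4 blocks = 1024 threads, 24 idle
    const size_t bytes = N * sizeof(float);
    float *hA = new float[N], *hB = new float[N], *hC = new float[N];
    for (unsigned int i = 0; i < N; ++i) { hA[i] = (float)i; hB[i] = 2.0f * i; }

    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    VecAdd<<<blocks_for(N, threads), threads>>>(dA, dB, dC, N);
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // Compare with what the sequential C version would produce
    for (unsigned int i = 0; i < N; ++i)
        assert(hC[i] == hA[i] + hB[i]);
    std::printf("C[%u] = %g\n", N - 1, hC[N - 1]);   // C[999] = 2997
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    delete[] hA; delete[] hB; delete[] hC;
    return 0;
}
```

With N = 1000 and 256 threads per block, the last block has 24 threads with nothing to do; without the guard they would read and write past the end of the arrays.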

Page 12

DEBUGGER

CUDA-GDB
• Based on GDB
• Linux
• Mac OS X

Parallel Nsight
• Plugin inside Visual Studio
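A typical CUDA-GDB workflow is to build with device debug information and set a breakpoint on a kernel. The sketch below assumes a Linux or Mac OS X machine with the CUDA toolkit installed; the file name `debug_demo.cu`, the kernel and the session in the comments are illustrative. Note that `-G` turns off most device-code optimisation, so debug builds run slower.

```cuda
// Hypothetical file debug_demo.cu. Build with host (-g) and device (-G) debug info:
//   nvcc -g -G -o debug_demo debug_demo.cu
// One possible CUDA-GDB session:
//   (cuda-gdb) break square_kernel   <- stop when GPU threads enter the kernel
//   (cuda-gdb) run
//   (cuda-gdb) info cuda threads     <- list GPU threads and where they stopped
//   (cuda-gdb) next                  <- step past the index computation
//   (cuda-gdb) print i               <- inspect the focused thread's index
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void square_kernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * data[i];
}

// Host-side reference, useful for checking values seen in the debugger
int square_host(int x) { return x * x; }

int main()
{
    const int n = 8;
    int host[n];
    for (int i = 0; i < n; ++i)
        host[i] = i;
    int *device = nullptr;
    cudaMalloc(&device, n * sizeof(int));
    cudaMemcpy(device, host, n * sizeof(int), cudaMemcpyHostToDevice);
    square_kernel<<<2, 4>>>(device, n);   // 2 blocks of 4 threads
    cudaMemcpy(host, device, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(device);
    for (int i = 0; i < n; ++i)
        assert(host[i] == square_host(i));
    std::printf("host[7] = %d\n", host[7]);   // host[7] = 49
    return 0;
}
```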

Page 13

VISUAL PROFILER & MEMCHECK

Profiler
• Microsoft Windows
• Linux
• Mac OS X
• Analyze performance

CUDA-MEMCHECK
• Microsoft Windows
• Linux
• Mac OS X
• Detect memory access errors
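The classic error CUDA-MEMCHECK catches is a kernel that forgets its bounds check. The sketch below is illustrative: the file name, both kernels and the `overflow_threads` helper are hypothetical, and the faulty kernel only runs when the program is given the argument `bug`, so the default run is well behaved. Recent toolkits ship `compute-sanitizer` in place of `cuda-memcheck`.

```cuda
// Hypothetical file oob_demo.cu. Run the faulty kernel under the memory checker:
//   nvcc -o oob_demo oob_demo.cu
//   cuda-memcheck ./oob_demo bug
#include <cassert>
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Deliberate bug for the demo: no bounds check, so threads with i >= n
// write past the end of the allocation
__global__ void fill_unchecked(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    data[i] = i;
}

// Fixed version: threads beyond the end of the array do nothing
__global__ void fill_checked(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = i;
}

// Hypothetical helper: how many launched threads fall past the end of n elements
int overflow_threads(int blocks, int threads_per_block, int n)
{
    int launched = blocks * threads_per_block;
    return launched > n ? launched - n : 0;
}

int main(int argc, char **argv)
{
    const int n = 100, threads = 64;
    const int blocks = (n + threads - 1) / threads;   // 2 blocks = 128 threads
    assert(overflow_threads(blocks, threads, n) == 28);

    int *data = nullptr;
    cudaMalloc(&data, n * sizeof(int));
    bool demo_bug = argc > 1 && std::strcmp(argv[1], "bug") == 0;
    if (demo_bug)
        fill_unchecked<<<blocks, threads>>>(data, n);   // 28 out-of-bounds writes
    else
        fill_checked<<<blocks, threads>>>(data, n);
    cudaError_t err = cudaDeviceSynchronize();
    std::printf("kernel status: %s\n", cudaGetErrorString(err));
    cudaFree(data);
    return 0;
}
```

Run without the checker, the faulty launch may appear to succeed, because small overruns do not always fault; that silent corruption is what a tool like CUDA-MEMCHECK is there to expose.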

Page 14

WHERE’S CUDA AT IN 2011?

• 60,000 researchers use it to aid drug discovery
• 470 universities teach CUDA

Page 15

WHERE’S CUDA AT IN 2011? (PART 2..)

• NVIDIA Showcase (1,000+ applications)

Page 16

ADDITIONAL RESOURCES

• CUDA FAQ (http://tegradeveloper.nvidia.com/cuda-faq)
• CUDA Tools & Ecosystem (http://tegradeveloper.nvidia.com/cuda-tools-ecosystem)
• CUDA Downloads (http://tegradeveloper.nvidia.com/cuda-downloads)
• NVIDIA Forums (http://forums.nvidia.com/index.php?showforum=62)
• GPGPU (http://gpgpu.org)
• CUDA By Example, Jason Sanders & Edward Kandrot (http://tegradeveloper.nvidia.com/content/cuda-example-introduction-general-purpose-gpu-programming-0)
• GPU Computing Gems Emerald Edition, Editor in Chief: Prof Hwu Wen-Mei (http://www.amazon.com/GPU-Computing-Gems-Emerald-Applications/dp/0123849888/)

Page 17

CUDA LIBRARIES

• Visit this site: http://developer.nvidia.com/cuda-tools-ecosystem#Libraries

• Thrust, CUFFT, CUBLAS, CUSP, NPP, OpenCV, GPU AI-Tree Search, GPU AI-Path Finding

• A lot of the libraries are hosted on Google Code. Many more gems in there too!
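As a taste of what these libraries save you, here is the vector addition from earlier in the talk written with Thrust, which allocates device memory, copies data and launches the kernel on your behalf. This is a minimal sketch assuming a CUDA toolkit that includes Thrust; `add_with_thrust` is a hypothetical helper name.

```cuda
#include <cassert>
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/host_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>

// Hypothetical helper: C = A + B computed on the GPU with Thrust.
// No explicit kernel, cudaMalloc or cudaMemcpy is needed.
thrust::host_vector<float> add_with_thrust(const thrust::host_vector<float>& A,
                                           const thrust::host_vector<float>& B)
{
    thrust::device_vector<float> d_A = A;          // host -> device copy
    thrust::device_vector<float> d_B = B;
    thrust::device_vector<float> d_C(A.size());
    thrust::transform(d_A.begin(), d_A.end(), d_B.begin(), d_C.begin(),
                      thrust::plus<float>());     // element-wise addition on the GPU
    return thrust::host_vector<float>(d_C);       // device -> host copy
}

int main()
{
    thrust::host_vector<float> A(5), B(5);
    thrust::sequence(A.begin(), A.end());          // 0 1 2 3 4
    thrust::sequence(B.begin(), B.end(), 10.0f);   // 10 11 12 13 14
    thrust::host_vector<float> C = add_with_thrust(A, B);
    assert(C[4] == 18.0f);
    for (size_t i = 0; i < C.size(); ++i)
        std::printf("%g ", C[i]);                  // 10 12 14 16 18
    std::printf("\n");
    return 0;
}
```

The trade-off is control: Thrust picks the launch configuration itself, which is usually fine for simple element-wise work like this.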

Page 18

THANK YOU @RaymondTayBL