PROGRAMMING GPGPUS USING CUDA — …research.nesc.ac.uk/files/cuda_programming.pdf
TRANSCRIPT
Fan Zhu, 2012-11-20
PROGRAMMING GPGPUS USING CUDA
WHY GPGPUS
• GPGPUs - General Purpose Computing on Graphics Processing Units (GPUs)
From NVIDIA: CUDA C Programming Guide
GPUS VS. CPUS
• NVIDIA claims 10x to 1000x speedups
• Intel counters: about 2.5x speedups
CUDA
• CUDA - Compute Unified Device Architecture
• C for CUDA is the programming language
• Fortran for CUDA also exists
• Version 1.0 released in 2007
• Version 5.0 released in 2012
• Shared Memory Architecture
CUDA CODE PORTABILITY
• Hardware independent
• Change the launch configuration to achieve the best performance on each device
CUDA WORKFLOW
1. A CPU thread copies data from main memory to GPU memory.
2. A CPU thread instructs GPU threads to start processing.
3. GPU threads execute in parallel on different GPU cores.
3*. The CPU thread and any idle GPU threads wait for the running GPU threads to finish. This step happens concurrently with step 3.
4. The CPU thread copies the results from GPU memory to main memory.
5. The CPU thread acts on the results, and may return to step 1 in order to execute another GPU function.
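The five steps above can be sketched in host code roughly as follows. This is a hedged outline, not code from the slides; the kernel name `process`, the element count, and the doubling computation are all placeholders.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

__global__ void process(float *d_data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d_data[i] *= 2.0f;               // placeholder computation
}

int main(void) {
    const int N = 1 << 20;
    size_t size = N * sizeof(float);
    float *h_data = (float *)malloc(size);      // fill h_data as needed
    float *d_data;
    cudaMalloc(&d_data, size);

    cudaMemcpy(d_data, h_data, size, cudaMemcpyHostToDevice);  // step 1
    process<<<(N + 255) / 256, 256>>>(d_data, N);              // steps 2-3
    cudaDeviceSynchronize();                                   // step 3*
    cudaMemcpy(h_data, d_data, size, cudaMemcpyDeviceToHost);  // step 4
    // step 5: act on h_data; possibly loop back to step 1

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```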
FUNCTION TYPES
• __host__
  • Executed on the host (CPU)
  • Callable from the host only
• __global__
  • Executed on the device (GPU)
  • Callable from the host only
• __device__
  • Executed on the device
  • Callable from the device only
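The three qualifiers side by side; the function names here are invented for illustration.

```cuda
__host__ void prepare(void) { /* runs on the CPU, called from CPU code */ }

__device__ float square(float x) { return x * x; }  // GPU-only helper

__global__ void kernel(float *out, int n) {         // launched from the host
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = square(out[i]);  // __device__ call from device code
}
```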
FUNCTIONS: MEMORY COPY
• Executed on the CPU
• Allocate and free GPU memory
  • cudaMalloc() and cudaFree()
• Copy CPU memory to GPU memory
  • cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
• Copy GPU memory to CPU memory
  • cudaMemcpy(h_B, d_B, size, cudaMemcpyDeviceToHost);
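A minimal allocate/copy/free round trip built from the calls above, assuming host arrays `h_A` and `h_B` of `size` bytes already exist (the `h_`/`d_` prefixes follow the slide's host/device naming convention):

```cuda
float *d_A, *d_B;
cudaMalloc(&d_A, size);
cudaMalloc(&d_B, size);
cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);  // CPU -> GPU
// ... launch a kernel that reads d_A and writes d_B ...
cudaMemcpy(h_B, d_B, size, cudaMemcpyDeviceToHost);  // GPU -> CPU
cudaFree(d_A);
cudaFree(d_B);
```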
FUNCTIONS
• __syncthreads()
  • Called from device code, not the host
  • Barrier: waits until all threads in the block reach it
• clock(); clock64();
  • Called from device code
  • Read the GPU cycle counter, e.g. for timing
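A hedged sketch showing both functions inside one kernel; the block-sum pattern and all names are illustrative, not from the slides.

```cuda
__global__ void timedSum(const float *in, float *out, long long *cycles) {
    __shared__ float buf[256];            // one slot per thread in the block
    long long start = clock64();          // device-side cycle counter

    buf[threadIdx.x] = in[blockIdx.x * blockDim.x + threadIdx.x];
    __syncthreads();                      // barrier: all threads in the block

    if (threadIdx.x == 0) {
        float sum = 0.0f;
        for (int i = 0; i < blockDim.x; ++i)
            sum += buf[i];
        out[blockIdx.x] = sum;
        cycles[blockIdx.x] = clock64() - start;  // elapsed GPU cycles
    }
}
```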
CUDA EXAMPLES: VECTOR ADD
[Code slide: the kernel ("On GPU") and the setup/launch code ("On CPU") were shown here.]
• You can request __shared__ memory inside the kernel — limited to 16 KB per block on this generation of hardware
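The slide's exact code is not reproduced in this transcript; a standard vector-add kernel of the kind it presents looks like this (names are assumptions):

```cuda
__global__ void vecAdd(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // guard: grid may overshoot n
        C[i] = A[i] + B[i];
}

// Host-side launch, with d_A, d_B, d_C already allocated and filled:
// vecAdd<<<(n + 255) / 256, 256>>>(d_A, d_B, d_C, n);
```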
GRID AND BLOCK
[Figure: a grid of 2x2 blocks — Block(0,0), Block(0,1), Block(1,0), Block(1,1) — each block containing a 4x4 array of threads indexed (0,0) through (3,3)]
• Grid
  • Share memory
• Block (<= 1024 threads)
  • Share cache
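How a thread locates itself inside this grid/block hierarchy — an illustrative kernel, not from the slides, assuming a 2-D launch over a `width`-column array:

```cuda
__global__ void whereAmI(int width, int *out) {
    // global 2-D coordinates from block index, block size, thread index
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    out[row * width + col] = row * width + col;  // unique global index
}
```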
BLOCKS
[Figure: one block shown as a 4x4 array of threads, indexed (0,0) through (3,3)]
CUDA EXAMPLE: MATRIX ADD
• Block = 1x1 vs. Block = 16x16 — same kernel, different launch configuration
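The comparison can be sketched as one kernel launched two ways; kernel and names are assumptions, not the slide's exact code. With 1x1 blocks each block holds a single thread and the hardware is badly underutilized; 16x16 blocks (256 threads) use it far better.

```cuda
__global__ void matAdd(const float *A, const float *B, float *C, int n) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < n && col < n)
        C[row * n + col] = A[row * n + col] + B[row * n + col];
}

// Block = 1x1 (one thread per block):
//   matAdd<<<dim3(n, n), dim3(1, 1)>>>(d_A, d_B, d_C, n);
// Block = 16x16 (256 threads per block, n divisible by 16):
//   matAdd<<<dim3(n / 16, n / 16), dim3(16, 16)>>>(d_A, d_B, d_C, n);
```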
COMPLETE CODE
All in the same .cu file!
THANK YOU.
• CUDA C Programming Guide: http://docs.nvidia.com/cuda/index.html