1 ITCS 6/8010 CUDA Programming, UNC-Charlotte, B. Wilkinson, April 7, 2011, OpenCL.ppt OpenCL These notes will introduce OpenCL

Page 1

ITCS 6/8010 CUDA Programming, UNC-Charlotte, B. Wilkinson, April 7, 2011, OpenCL.ppt

OpenCL

These notes will introduce OpenCL

Page 2

OpenCL (Open Computing Language)

A standard, based on C, for portable parallel applications

Supports both task-parallel and data-parallel applications

Focuses on multi-platform support (multiple CPUs, GPUs, …)

Development initiated by Apple.

Developed by the Khronos Group, which also manages OpenGL.

OpenCL 1.0: 2008, released with Mac OS X 10.6 (Snow Leopard).

OpenCL 1.1: June 2010.

Similarities with CUDA

Implementation available for NVIDIA GPUs

Wikipedia, “OpenCL,” http://en.wikipedia.org/wiki/OpenCL

Page 3

OpenCL Programming Model

Uses data parallel programming model, similar to CUDA

Host program launches kernel routines as in CUDA, but allows for just-in-time compilation during host execution.

OpenCL “work items” correspond to CUDA threads

OpenCL “work groups” correspond to CUDA thread blocks

Work items in the same work group can be synchronized with a barrier (barrier() in OpenCL C), as with __syncthreads() in CUDA.
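A minimal sequential sketch in C of how work items map onto work groups, assuming a one-dimensional NDRange with zero global offset. The helper names are hypothetical, not OpenCL API. It shows the index relation behind get_global_id(0): global ID = work-group ID * work-group size + local ID, the analogue of blockIdx.x * blockDim.x + threadIdx.x in CUDA. A real device runs the work items concurrently, not in nested loops.

```c
#include <assert.h>
#include <stddef.h>

/* Global ID of a work item in a 1-D NDRange with zero offset
   (hypothetical helper mirroring get_global_id(0)). */
size_t emulated_global_id(size_t group_id, size_t local_size, size_t local_id)
{
    return group_id * local_size + local_id;
}

/* Sequentially emulates launching the vectorAdd kernel over num_groups
   work groups of local_size work items each. Returns how many work
   items executed the kernel body. */
size_t emulate_vector_add(const int *a, const int *b, int *c,
                          size_t num_groups, size_t local_size)
{
    assert(a != NULL && b != NULL && c != NULL);
    size_t executed = 0;
    for (size_t group = 0; group < num_groups; group++) {
        for (size_t lid = 0; lid < local_size; lid++) {
            size_t gid = emulated_global_id(group, local_size, lid);
            c[gid] = a[gid] + b[gid];   /* body of the vectorAdd kernel */
            executed++;
        }
    }
    return executed;
}
```

For example, with 256 work items per group, local ID 5 in work group 2 has global ID 517.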

Page 4

Sample OpenCL code to add two vectors

To illustrate OpenCL commands, we will use OpenCL code to add two vectors, A and B, which are transferred to the device (GPU); the result, C, is returned to the host (CPU), similar to CUDA vector addition.

Page 5

Structure of OpenCL main program

Get information about platform and devices available on system

Select devices to use

Create an OpenCL context

Create an OpenCL command queue

Create memory buffers on device

Transfer data from host to device memory buffers

Create kernel program object

Build (compile) kernel in-line (or load precompiled binary)

Create OpenCL kernel object

Set kernel arguments

Execute kernel

Read results from device memory buffers and copy them to host memory

Page 6

Platform

"The host plus a collection of devices managed by the OpenCL framework that allow an application to share resources and execute kernels on devices in the platform."

Platforms are represented by a cl_platform_id object, obtained with clGetPlatformIDs()

http://opencl.codeplex.com/wikipage?title=OpenCL%20Tutorials%20-%201

Page 7

Simple code for identifying platform

//Platform

cl_platform_id platform;

clGetPlatformIDs (1, &platform, NULL);

Arguments of clGetPlatformIDs():

1: number of platform entries

&platform: list of OpenCL platforms (platform IDs) found; in our case just one platform, identified by platform

NULL: returns the number of OpenCL platforms available; if NULL, ignored

Page 8

Context

“The environment within which the kernels execute and the domain in which synchronization and memory management is defined.

The context includes a set of devices, the memory accessible to those devices, the corresponding memory properties and one or more command-queues used to schedule execution of a kernel(s) or operations on memory objects.”

The OpenCL Specification version 1.1 http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf

Page 9

//Context
// Property list: (name, value) pairs terminated by 0
cl_context_properties props[3];
props[0] = (cl_context_properties) CL_CONTEXT_PLATFORM;
props[1] = (cl_context_properties) platform;
props[2] = (cl_context_properties) 0;
// Create a context containing the GPU devices of this platform
cl_context GPUContext =
    clCreateContextFromType(props, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);
//Context info
// First call returns the size in bytes of the device list,
// second call fills in the list of device IDs
size_t ParmDataBytes;
clGetContextInfo(GPUContext, CL_CONTEXT_DEVICES, 0, NULL, &ParmDataBytes);
cl_device_id* GPUDevices = (cl_device_id*)malloc(ParmDataBytes);
clGetContextInfo(GPUContext, CL_CONTEXT_DEVICES, ParmDataBytes, GPUDevices, NULL);

Code for context

Page 10

Command Queue

“An object that holds commands that will be executed on a specific device.

The command-queue is created on a specific device in a context.

Commands to a command-queue are queued in-order but may be executed in-order or out-of-order. ...”

The OpenCL Specification version 1.1 http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf

Page 11

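// Create command-queue on the first GPU device in the context
// (properties 0: in-order execution, no profiling; last argument returns an error code if not NULL)
cl_command_queue GPUCommandQueue = clCreateCommandQueue(GPUContext, GPUDevices[0], 0, NULL);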

Simple code for creating a command queue

Page 12

Allocating memory on device

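Use clCreateBuffer():

cl_mem clCreateBuffer(cl_context context,
                      cl_mem_flags flags,
                      size_t size,
                      void *host_ptr,
                      cl_int *errcode_ret)

context: OpenCL context, from clCreateContextFromType()

flags: bit field specifying the type of allocation and usage (CL_MEM_READ_WRITE, …)

size: number of bytes in the buffer memory object

host_ptr: pointer to buffer data that may already be allocated by the application (used with CL_MEM_USE_HOST_PTR or CL_MEM_COPY_HOST_PTR, otherwise NULL)

errcode_ret: returns an error code if an error occurs (ignored if NULL)

Return value: the memory object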

Page 13

Sample code for allocating memory on device for source data

// source data on host, two vectors
int *A, *B;
A = new int[N];
B = new int[N];
for (int i = 0; i < N; i++) {
    A[i] = rand() % 1000;
    B[i] = rand() % 1000;
}
…
// Allocate GPU memory for source vectors; CL_MEM_COPY_HOST_PTR copies
// A and B into the new buffers, so no separate write to the device is needed
cl_mem GPUVector1 = clCreateBuffer(GPUContext, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(int)*N, A, NULL);
cl_mem GPUVector2 = clCreateBuffer(GPUContext, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(int)*N, B, NULL);

Page 14

Sample code for allocating device memory for the results

// Allocate GPU memory for output vector
cl_mem GPUOutputVector =
    clCreateBuffer(GPUContext, CL_MEM_WRITE_ONLY, sizeof(int)*N, NULL, NULL);

Page 15

Kernel Program

Simple programs might be in the same file as the host code, as in our CUDA examples.

In that case the kernel source must be written as strings in a character array.

If the kernel is in a separate file, the host program can read that file into a character string.

Page 16

Kernel program

const char* OpenCLSource[] = {
    "__kernel void vectorAdd (const __global int* a,",
    " const __global int* b,",
    " __global int* c)",
    "{",
    " unsigned int gid = get_global_id(0);",
    " c[gid] = a[gid] + b[gid];",
    "}"
};
…
int main(int argc, char **argv) {…}

If in the same program as the host code, the kernel needs to be given as strings; it can also be a single string (passed with a count of 1 to clCreateProgramWithSource()).

__kernel: OpenCL qualifier indicating kernel code

__global: OpenCL qualifier indicating kernel memory (memory objects allocated from the global memory pool)

The double underscores are optional in OpenCL qualifiers

get_global_id(0): returns the global work-item ID in the given dimension (0 here)

Page 17

Kernel in a separate file

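// Load the kernel source code into the array source_str
// (MAX_SOURCE_SIZE must be #defined by the program, e.g. as 0x100000)
FILE *fp; char *source_str; size_t source_size;
fp = fopen("vector_add_kernel.cl", "r");
if (!fp) { fprintf(stderr, "Failed to load kernel.\n"); exit(1); }
source_str = (char*)malloc(MAX_SOURCE_SIZE);
source_size = fread(source_str, 1, MAX_SOURCE_SIZE, fp);
fclose(fp);
// source_str is not null-terminated: pass &source_size as the
// lengths argument of clCreateProgramWithSource()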

http://mywiki-science.wikispaces.com/OpenCL
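The fread() call above leaves source_str without a terminating null character, so its length has to be passed separately. A hedged alternative sketch: load_kernel_source (a hypothetical helper, not part of OpenCL) sizes the buffer from the file itself with fseek()/ftell() instead of a fixed MAX_SOURCE_SIZE, and null-terminates the result.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads a whole kernel source file into a newly allocated,
   null-terminated string and stores its length (excluding the
   terminator) in *size_out. Returns NULL on failure; the caller
   must free() the result. */
char *load_kernel_source(const char *path, size_t *size_out)
{
    assert(path != NULL && size_out != NULL);
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return NULL;
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    long len = ftell(fp);
    if (len < 0 || fseek(fp, 0, SEEK_SET) != 0) { fclose(fp); return NULL; }
    char *src = (char *)malloc((size_t)len + 1);
    if (!src) { fclose(fp); return NULL; }
    size_t n = fread(src, 1, (size_t)len, fp);
    fclose(fp);
    src[n] = '\0';          /* terminate so the string can be used as-is */
    *size_out = n;
    return src;
}
```

Because the result is null-terminated, it could be passed as clCreateProgramWithSource(GPUContext, 1, (const char **)&src, NULL, NULL), where a NULL lengths argument means the string is null-terminated.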

Page 18

Create kernel program object

const char* OpenCLSource[] = {…};

int main(int argc, char **argv)…

// Create OpenCL program object

cl_program OpenCLProgram = clCreateProgramWithSource(GPUContext,7,OpenCLSource,NULL,NULL);

7: number of strings in the kernel program array

NULL (lengths): used if the strings are not null-terminated, to give the length of each string

NULL (errcode_ret): used to return an error code if an error occurs

This example uses a single file for both host and kernel code. clCreateProgramWithSource() can also be used with a separate kernel file read into the host program.
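Two points from these slides can be made concrete in C: the count 7 passed to clCreateProgramWithSource() can be computed from the array instead of hard-coded, and the strings can be joined into one string if preferred. A minimal sketch; NUM_SOURCE_STRINGS and join_source are hypothetical names, not OpenCL API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Kernel source as an array of strings, as in the slides. */
static const char *OpenCLSource[] = {
    "__kernel void vectorAdd (const __global int* a,",
    " const __global int* b,",
    " __global int* c)",
    "{",
    " unsigned int gid = get_global_id(0);",
    " c[gid] = a[gid] + b[gid];",
    "}"
};

/* Number of strings in the array, computed rather than hard-coded. */
#define NUM_SOURCE_STRINGS (sizeof(OpenCLSource) / sizeof(OpenCLSource[0]))

/* Concatenates count strings into one newly allocated, null-terminated
   string. Returns NULL if allocation fails; the caller frees the result. */
char *join_source(const char **strings, size_t count)
{
    assert(strings != NULL || count == 0);
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += strlen(strings[i]);
    char *out = (char *)malloc(total + 1);
    if (out == NULL)
        return NULL;
    size_t pos = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(strings[i]);
        memcpy(out + pos, strings[i], len);
        pos += len;
    }
    out[pos] = '\0';
    return out;
}
```

Either form could then be used: clCreateProgramWithSource(GPUContext, (cl_uint)NUM_SOURCE_STRINGS, OpenCLSource, NULL, NULL), or, with the joined string s, clCreateProgramWithSource(GPUContext, 1, (const char **)&s, NULL, NULL).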

Page 19

Build kernel program

// Build the program (OpenCL JIT compilation)

clBuildProgram(OpenCLProgram,0,NULL,NULL,NULL,NULL);

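OpenCLProgram: program object from clCreateProgramWithSource()

0: number of devices

NULL: list of devices, if more than one; with NULL, the program is built for all devices associated with the program

NULL: build options

NULL: function pointer to a notification routine called when the build is complete. If one is supplied, clBuildProgram may return immediately; otherwise it returns only when the build is complete

NULL: arguments (user data) for the notification routine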

Page 20

Creating Kernel Objects

// Create a handle to the compiled OpenCL function

cl_kernel OpenCLVectorAdd = clCreateKernel(OpenCLProgram, "vectorAdd", NULL);

OpenCLProgram: built program from clBuildProgram()

"vectorAdd": name of the function with the __kernel qualifier

NULL: used to return an error code

Page 21

Set Kernel Arguments

// Set kernel arguments

clSetKernelArg(OpenCLVectorAdd,0,sizeof(cl_mem), (void*)&GPUVector1);

clSetKernelArg(OpenCLVectorAdd,1,sizeof(cl_mem), (void*)&GPUVector2);

clSetKernelArg(OpenCLVectorAdd,2,sizeof(cl_mem), (void*)&GPUOutputVector);

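OpenCLVectorAdd: kernel object from clCreateKernel()

0, 1, 2: which argument (index)

sizeof(cl_mem): size of the argument

(void*)&GPUVector1, …: pointer to the argument value, here a memory object from clCreateBuffer()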

Page 22

Enqueue a command to execute kernel on device

// Launch the kernel
size_t WorkSize[1] = {N};        // Total number of work items
size_t localWorkSize[1] = {256}; // Number of work items in a work group
clEnqueueNDRangeKernel(GPUCommandQueue, OpenCLVectorAdd, 1, NULL,
                       WorkSize, localWorkSize, 0, NULL, NULL);

GPUCommandQueue: command queue from clCreateCommandQueue()

OpenCLVectorAdd: kernel object from clCreateKernel()

1: number of dimensions of the work items

NULL: offset used with the work-item IDs

WorkSize: array describing the number of global work items

localWorkSize: array describing the number of work items that make up a work group

0: number of events to complete before this command

NULL: event wait list

NULL: event
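In OpenCL 1.x, each global work size must be evenly divisible by the corresponding local work size, so the launch above assumes N is a multiple of 256. A hedged sketch of the usual workaround, with hypothetical helper names: round the global size up to a multiple of the work-group size, and guard the kernel body with an extra length argument, for example if (gid < n) c[gid] = a[gid] + b[gid]; (the slides' kernel has no such n parameter, so this assumes a modified kernel).

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of local_size that is >= n. */
size_t round_up_global_size(size_t n, size_t local_size)
{
    assert(local_size > 0);
    return ((n + local_size - 1) / local_size) * local_size;
}

/* Number of work groups launched for a 1-D NDRange whose global size
   is already a multiple of local_size. */
size_t num_work_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}
```

With this, WorkSize[0] = round_up_global_size(N, 256); would replace the {N} initializer; the extra work items beyond N then do nothing because of the guard in the kernel.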

Page 23

Function to copy from buffer object to host memory

The following function enqueues commands to read from a buffer object to host memory:

cl_int clEnqueueReadBuffer(cl_command_queue command_queue,
                           cl_mem buffer,
                           cl_bool blocking_read,
                           size_t offset,
                           size_t cb,
                           void *ptr,
                           cl_uint num_events_in_wait_list,
                           const cl_event *event_wait_list,
                           cl_event *event)

The OpenCL Specification version 1.1 http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf

Page 24

Function to copy from host memory to buffer object

The following function enqueues commands to write to a buffer object from host memory:

cl_int clEnqueueWriteBuffer(cl_command_queue command_queue,
                            cl_mem buffer,
                            cl_bool blocking_write,
                            size_t offset,
                            size_t cb,
                            const void *ptr,
                            cl_uint num_events_in_wait_list,
                            const cl_event *event_wait_list,
                            cl_event *event)

The OpenCL Specification version 1.1 http://www.khronos.org/registry/cl/specs/opencl-1.1.pdf

Page 25

Copy data back from kernel

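// Copy the output back to CPU memory
int *C;
C = new int[N];
clEnqueueReadBuffer(GPUCommandQueue, GPUOutputVector,
                    CL_TRUE, 0, N*sizeof(int), C, 0, NULL, NULL);

GPUCommandQueue: command queue from clCreateCommandQueue()

GPUOutputVector: device buffer from clCreateBuffer()

CL_TRUE: the read is blocking (the call returns only after the data has been copied into C)

0: byte offset in the buffer

N*sizeof(int): size of the data to read, in bytes

C: pointer to the host memory where the data is written

0: number of events to complete before this command

NULL: event wait list

NULL: event

Because the command queue is in-order, the read starts only after the kernel has finished.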
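clEnqueueReadBuffer() reports CL_INVALID_VALUE when the region given by (offset, cb) is out of bounds for the buffer. A small C sketch, with hypothetical helper names, of the byte arithmetic involved when reading only part of an int buffer, and of the bounds condition (written so that offset + cb cannot overflow).

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if reading cb bytes starting at byte offset lies within a
   buffer of buffer_size bytes, 0 otherwise. */
int read_region_in_bounds(size_t offset, size_t cb, size_t buffer_size)
{
    return offset <= buffer_size && cb <= buffer_size - offset;
}

/* Byte offset and byte count for elements [first, first + count)
   of a buffer of ints. */
void int_range_to_bytes(size_t first, size_t count,
                        size_t *offset, size_t *cb)
{
    assert(offset != NULL && cb != NULL);
    *offset = first * sizeof(int);
    *cb = count * sizeof(int);
}
```

For example, reading only the last 10 results would use offset (N-10)*sizeof(int), cb 10*sizeof(int), and C + (N-10) as the host pointer.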

Page 26

Results from GPU

cout << "C[" << 0 << "]: " << A[0] << "+" << B[0] << "=" << C[0]
     << "\n";
cout << "C[" << N-1 << "]: " << A[N-1] << "+" << B[N-1] << "="
     << C[N-1] << "\n";

C++ here (cout, from <iostream>)
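Printing C[0] and C[N-1] spot-checks only two elements. A hedged sketch of checking all N results against a sequential host computation; first_mismatch is a hypothetical helper, written in C so it can also be called from the C++ host program.

```c
#include <assert.h>
#include <stddef.h>

/* Compares every element of C with A[i] + B[i] computed on the host.
   Returns the index of the first mismatch, or -1 if all n match. */
long first_mismatch(const int *A, const int *B, const int *C, size_t n)
{
    assert(n == 0 || (A != NULL && B != NULL && C != NULL));
    for (size_t i = 0; i < n; i++) {
        if (C[i] != A[i] + B[i])
            return (long)i;
    }
    return -1;
}
```

It would be called as first_mismatch(A, B, C, N) after the blocking clEnqueueReadBuffer() returns; -1 means every element matched.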

Page 27

Clean-up

// Cleanup
free(GPUDevices);
clReleaseKernel(OpenCLVectorAdd);
clReleaseProgram(OpenCLProgram);
clReleaseCommandQueue(GPUCommandQueue);
clReleaseMemObject(GPUVector1);
clReleaseMemObject(GPUVector2);
clReleaseMemObject(GPUOutputVector);
clReleaseContext(GPUContext);   // release the context last
delete[] A;                     // host arrays allocated with new[]
delete[] B;
delete[] C;

Page 28

Compiling

Need OpenCL header:

#include <CL/cl.h>

(For mac: #include <OpenCL/opencl.h> )

and link to the OpenCL library.

Compile the OpenCL host program main.c using gcc, in two phases:

gcc -c -I /path-to-include-dir-with-cl.h/ main.c -o main.o

gcc main.o -L /path-to-lib-folder-with-OpenCL-libfile/ -l OpenCL -o host

(Listing -l OpenCL after main.o lets the linker resolve the OpenCL calls made in main.o.)

Ref: http://www.thebigblob.com/getting-started-with-opencl-and-gpu-computing/

Page 29

Makefile (program called scalarmulocl)

CC = g++
LD = g++ -lm
CFLAGS = -Wall -shared
CDEBUG =
LIBOCL = -L/nfs-home/mmishra2/NVIDIA_GPU_Computing_SDK/OpenCL/common/lib
INCOCL = -I/nfs-home/mmishra2/NVIDIA_GPU_Computing_SDK/OpenCL/common/inc
SRCS = scalarmulocl.cpp
OBJS = scalarmulocl.o
EXE = scalarmulocl.a

all: $(EXE)

$(OBJS): $(SRCS)
	$(CC) $(CFLAGS) $(INCOCL) -I/usr/include -c $(SRCS)

$(EXE): $(OBJS)
	$(LD) -L/usr/local/lib $(OBJS) $(LIBOCL) -o $(EXE) -l OpenCL

clean:
	rm -f $(OBJS) *~
	clear

Reference: http://mywiki-science.wikispaces.com/OpenCL

Submitted by: Manisha Mishra

Page 30

To compile: make

To Run: ./scalarmulocl.a

Snapshot:

Submitted by: Manisha Mishra

Compiling and Executing the program

Page 31

Questions

Page 32

Chapter 11 of Programming Massively Parallel Processors by D. B. Kirk and W-M W. Hwu, Morgan Kaufmann, 2010

More Information

Page 33

clGetPlatformIDs: Obtain the list of platforms available.

cl_int clGetPlatformIDs(cl_uint num_entries, cl_platform_id *platforms, cl_uint *num_platforms)

Parameters

num_entries: The number of cl_platform_id entries that can be added to platforms. If platforms is not NULL, num_entries must be greater than zero.

platforms: Returns a list of OpenCL platforms found. The cl_platform_id values returned in platforms can be used to identify a specific OpenCL platform. If the platforms argument is NULL, this argument is ignored. The number of OpenCL platforms returned is the minimum of the value specified by num_entries and the number of OpenCL platforms available.

num_platforms: Returns the number of OpenCL platforms available. If num_platforms is NULL, this argument is ignored.

http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/
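The main program asks for just one platform. A hedged sketch of enumerating every platform with the two-call pattern implied by these parameters: call once with platforms set to NULL to learn the count, allocate, then call again. Real code would call clGetPlatformIDs() with cl_platform_id and cl_uint; to keep the sketch runnable without an OpenCL installation, mock_get_platform_ids, mock_platform_id and the MOCK_ status codes are hypothetical stand-ins that mimic the documented contract.

```c
#include <assert.h>
#include <stdlib.h>

typedef int mock_platform_id;      /* hypothetical stand-in for cl_platform_id */
#define MOCK_SUCCESS 0
#define MOCK_INVALID_VALUE (-1)    /* hypothetical error code */

/* Hypothetical mock following the clGetPlatformIDs contract: copies
   min(num_entries, available) IDs and reports the total available. */
static int mock_get_platform_ids(unsigned num_entries, mock_platform_id *platforms,
                                 unsigned *num_platforms)
{
    static const mock_platform_id available[] = {101, 102, 103};
    const unsigned total = 3;
    if ((platforms != NULL && num_entries == 0) ||
        (platforms == NULL && num_platforms == NULL))
        return MOCK_INVALID_VALUE;
    if (platforms != NULL)
        for (unsigned i = 0; i < num_entries && i < total; i++)
            platforms[i] = available[i];
    if (num_platforms != NULL)
        *num_platforms = total;
    return MOCK_SUCCESS;
}

/* Two-call pattern: query the count, allocate, then fetch all IDs.
   Returns a malloc'd array (caller frees) and stores the count. */
mock_platform_id *list_all_platforms(unsigned *count_out)
{
    unsigned count = 0;
    assert(count_out != NULL);
    if (mock_get_platform_ids(0, NULL, &count) != MOCK_SUCCESS || count == 0)
        return NULL;
    mock_platform_id *ids = malloc(count * sizeof *ids);
    if (ids == NULL)
        return NULL;
    if (mock_get_platform_ids(count, ids, NULL) != MOCK_SUCCESS) {
        free(ids);
        return NULL;
    }
    *count_out = count;
    return ids;
}
```

With the real API the same two calls would be clGetPlatformIDs(0, NULL, &count) followed by clGetPlatformIDs(count, ids, NULL).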

Page 34

#include <stdio.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h> //OpenCL header on Mac OS X
#else
#include <CL/cl.h>         //OpenCL header for C
#endif
#include <iostream>        //C++ input/output
using namespace std;

Includes