![Page 1: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/1.jpg)
OpenCL
Joseph Kider
University of Pennsylvania
CIS 565 - Fall 2011
![Page 2: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/2.jpg)
Sources
Patrick Cozzi, Spring 2011
NVIDIA CUDA Programming Guide
CUDA by Example
Programming Massively Parallel Processors
![Page 3: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/3.jpg)
Image from: http://www.khronos.org/developers/library/overview/opencl_overview.pdf
![Page 4: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/4.jpg)
OpenCL
Open Computing Language
For heterogeneous parallel-computing systems
Cross-platform
Implementations for ATI GPUs, NVIDIA GPUs, and x86 CPUs
Is cross-platform really one size fits all?
Image from: http://developer.apple.com/softwarelicensing/agreements/opencl.html
![Page 5: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/5.jpg)
OpenCL
Standardized
Initiated by Apple
Developed by the Khronos Group
![Page 6: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/6.jpg)
Image from: http://www.khronos.org/developers/library/overview/opencl_overview.pdf
![Page 7: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/7.jpg)
Image from: http://www.khronos.org/developers/library/overview/opencl_overview.pdf
![Page 8: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/8.jpg)
Image from: http://www.khronos.org/developers/library/overview/opencl_overview.pdf
![Page 9: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/9.jpg)
OpenCL
API similar to OpenGL
Based on the C language
Easy transition from CUDA to OpenCL
![Page 10: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/10.jpg)
OpenCL and CUDA
Many OpenCL features have a one-to-one mapping to CUDA features
OpenCL:
• More complex platform and device management
• More complex kernel launch
![Page 11: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/11.jpg)
OpenCL and CUDA
A Compute Unit (CU) corresponds to
• a CUDA streaming multiprocessor (SM)
• a CPU core
• etc.
A Processing Element corresponds to
• a CUDA streaming processor (SP)
• a CPU ALU
![Page 12: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/12.jpg)
OpenCL and CUDA
Image from: http://developer.amd.com/zones/OpenCLZone/courses/pages/Introductory-OpenCL-SAAHPC10.aspx
![Page 13: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/13.jpg)
OpenCL and CUDA
| CUDA | OpenCL |
|---|---|
| Kernel | Kernel |
| Host program | Host program |
| Thread | Work item |
| Block | Work group |
| Grid | NDRange (index space) |
![Page 14: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/14.jpg)
OpenCL and CUDA
Work Item (CUDA thread) – executes kernel code
Index Space (CUDA grid) – defines work items and how data is mapped to them
Work Group (CUDA block) – work items in a work group can synchronize
![Page 15: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/15.jpg)
OpenCL and CUDA
CUDA: threadIdx and blockIdx
Combine to create a global thread ID
Example:
blockIdx.x * blockDim.x + threadIdx.x
![Page 16: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/16.jpg)
OpenCL and CUDA
OpenCL: each work item has a unique global index
Retrieve with get_global_id()

| CUDA | OpenCL |
|---|---|
| threadIdx.x | get_local_id(0) |
| blockIdx.x * blockDim.x + threadIdx.x | get_global_id(0) |
![Page 17: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/17.jpg)
OpenCL and CUDA
| CUDA | OpenCL |
|---|---|
| gridDim.x | get_num_groups(0) |
| blockIdx.x | get_group_id(0) |
| blockDim.x | get_local_size(0) |
| gridDim.x * blockDim.x | get_global_size(0) |
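The mappings above imply that, in one dimension and with no global offset, a work item's global ID equals its group ID times the local size plus its local ID. A minimal host-side C sketch of that arithmetic follows. It does not call OpenCL; the struct `WorkItemIndex` and the functions `indexFor` and `globalSize` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: the 1D indices a single work item would observe. */
typedef struct {
    size_t global_id; /* get_global_id(0) ~ blockIdx.x * blockDim.x + threadIdx.x */
    size_t local_id;  /* get_local_id(0)  ~ threadIdx.x */
    size_t group_id;  /* get_group_id(0)  ~ blockIdx.x */
} WorkItemIndex;

/* Indices of the work item at position local_id inside work group group_id,
   assuming a 1D index space with no global offset. */
static WorkItemIndex indexFor(size_t group_id, size_t local_id, size_t local_size)
{
    WorkItemIndex idx;
    assert(local_id < local_size); /* a local ID is always below the group size */
    idx.group_id = group_id;
    idx.local_id = local_id;
    idx.global_id = group_id * local_size + local_id;
    return idx;
}

/* get_global_size(0) ~ gridDim.x * blockDim.x */
static size_t globalSize(size_t num_groups, size_t local_size)
{
    return num_groups * local_size;
}
```

For example, with a work group size of 64, the work item with local ID 3 in work group 2 has global ID 131.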
![Page 18: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/18.jpg)
OpenCL and CUDA
Image from: http://courses.engr.illinois.edu/ece498/al/textbook/Chapter2-CudaProgrammingModel.pdf
Recall CUDA:
![Page 19: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/19.jpg)
OpenCL and CUDA
Index Space In OpenCL:
[Figure: a 2D index space of get_global_size(0) by get_global_size(1) work items, divided into Work Groups (0, 0) through (2, 1). Each work group holds get_local_size(0) by get_local_size(1) work items; Work Group (0, 0) is shown expanded into Work Items (0, 0) through (4, 2).]
![Page 20: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/20.jpg)
Image from http://developer.amd.com/zones/OpenCLZone/courses/pages/Introductory-OpenCL-SAAHPC10.aspx
![Page 21: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/21.jpg)
OpenCL and CUDA
Image from http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
Mapping to NVIDIA hardware:
![Page 22: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/22.jpg)
OpenCL and CUDA
Recall the CUDA memory model:
Image from: http://courses.engr.illinois.edu/ece498/al/textbook/Chapter2-CudaProgrammingModel.pdf
![Page 23: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/23.jpg)
OpenCL and CUDA
In OpenCL:
Image from http://developer.amd.com/zones/OpenCLZone/courses/pages/Introductory-OpenCL-SAAHPC10.aspx
![Page 24: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/24.jpg)
OpenCL and CUDA

| CUDA | OpenCL |
|---|---|
| Global memory | Global memory |
| Constant memory | Constant memory |
| Shared memory | Local memory |
| Local memory | Private memory |
![Page 25: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/25.jpg)
OpenCL and CUDA

| CUDA | OpenCL |
|---|---|
| __syncthreads() | barrier() |

Both also have fences. In OpenCL:
mem_fence()
read_mem_fence()
write_mem_fence()
![Page 26: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/26.jpg)
Image from: http://www.khronos.org/developers/library/overview/opencl_overview.pdf
![Page 27: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/27.jpg)
OpenCL and CUDA
Kernel functions. Recall CUDA:
__global__ void vecAdd(float *a, float *b, float *c)
{
    int i = threadIdx.x;
    c[i] = a[i] + b[i];
}
![Page 28: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/28.jpg)
OpenCL and CUDA
In OpenCL:
__kernel void vecAdd(__global const float *a, __global const float *b, __global float *c)
{
    int i = get_global_id(0);
    c[i] = a[i] + b[i];
}
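One way to read the OpenCL kernel: its body runs once per work item, with get_global_id(0) taking each value in the 1D index space. The host-side C sketch below imitates that with a sequential loop, which could also serve as a CPU reference when checking device results. It does not use the OpenCL API; `vecAddReference` and its parameter `n` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential stand-in for running vecAdd over a 1D index space of n work items. */
static void vecAddReference(const float *a, const float *b, float *c, size_t n)
{
    size_t i;
    assert(a != NULL && b != NULL && c != NULL);
    for (i = 0; i < n; ++i) { /* i plays the role of get_global_id(0) */
        c[i] = a[i] + b[i];
    }
}
```

On a real device the iterations run in parallel and in no guaranteed order, which is safe here because each work item writes only its own element of c.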
![Page 30: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/30.jpg)
Slide from: http://developer.amd.com/zones/OpenCLZone/courses/pages/Introductory-OpenCL-SAAHPC10.aspx
![Page 31: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/31.jpg)
Slide from: http://developer.amd.com/zones/OpenCLZone/courses/pages/Introductory-OpenCL-SAAHPC10.aspx
![Page 32: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/32.jpg)
Slide from: http://developer.amd.com/zones/OpenCLZone/courses/pages/Introductory-OpenCL-SAAHPC10.aspx
![Page 33: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/33.jpg)
Slide from: http://developer.amd.com/zones/OpenCLZone/courses/pages/Introductory-OpenCL-SAAHPC10.aspx
![Page 34: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/34.jpg)
Slide from: http://developer.amd.com/zones/OpenCLZone/courses/pages/Introductory-OpenCL-SAAHPC10.aspx
![Page 35: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/35.jpg)
Slide from: http://developer.amd.com/zones/OpenCLZone/courses/pages/Introductory-OpenCL-SAAHPC10.aspx
![Page 36: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/36.jpg)
Slide from: http://developer.amd.com/zones/OpenCLZone/courses/pages/Introductory-OpenCL-SAAHPC10.aspx
CUDA Streams
OpenGL Buffers
OpenGL Shader Programs
![Page 37: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/37.jpg)
OpenCL API
Walk through OpenCL host code for running our vecAdd kernel:
__kernel void vecAdd(__global const float *a, __global const float *b, __global float *c)
{
    int i = get_global_id(0);
    c[i] = a[i] + b[i];
}
See NVIDIA OpenCL JumpStart Guide for full code example: http://developer.download.nvidia.com/OpenCL/NVIDIA_OpenCL_JumpStart_Guide.pdf
![Page 39: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/39.jpg)
OpenCL API
// create OpenCL device & context
cl_context hContext;
hContext = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, 0, 0, 0);
Create a context for a GPU
![Page 41: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/41.jpg)
OpenCL API
// query all devices available to the context
size_t nContextDescriptorSize;
clGetContextInfo(hContext, CL_CONTEXT_DEVICES, 0, 0, &nContextDescriptorSize);
cl_device_id *aDevices = (cl_device_id *)malloc(nContextDescriptorSize);
clGetContextInfo(hContext, CL_CONTEXT_DEVICES,
nContextDescriptorSize, aDevices, 0);
Retrieve an array of each GPU
![Page 43: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/43.jpg)
OpenCL API
// create a command queue for first
// device the context reported
cl_command_queue hCmdQueue;
hCmdQueue = clCreateCommandQueue(hContext,
aDevices[0], 0, 0);
Create a command queue (CUDA stream) for the first GPU
![Page 45: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/45.jpg)
OpenCL API
// create & compile program
cl_program hProgram;
hProgram = clCreateProgramWithSource(hContext,
1, source, 0, 0);
clBuildProgram(hProgram, 0, 0, 0, 0, 0);
• A program contains one or more kernels. Think dll.
• Provide kernel source as a string
• Can also compile offline
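Since the kernel source is provided as a string, one common approach is to keep kernels in a separate text file and read it at run time. The following is a minimal C sketch using only the standard library; the function name `loadKernelSource` is hypothetical, and it is not taken from the JumpStart Guide.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into a NUL-terminated heap buffer.
   Returns NULL on failure; the caller frees the result. */
static char *loadKernelSource(const char *path)
{
    FILE *f;
    char *text = NULL;

    assert(path != NULL);
    f = fopen(path, "rb");
    if (f == NULL) {
        return NULL;
    }
    if (fseek(f, 0, SEEK_END) == 0) {
        long size = ftell(f);
        if (size >= 0 && fseek(f, 0, SEEK_SET) == 0) {
            text = (char *)malloc((size_t)size + 1);
            if (text != NULL) {
                size_t got = fread(text, 1, (size_t)size, f);
                text[got] = '\0'; /* terminate after the bytes actually read */
            }
        }
    }
    fclose(f);
    return text;
}
```

The returned pointer could then be passed by address as the strings argument, for example `clCreateProgramWithSource(hContext, 1, (const char **)&text, 0, 0)`, where a null lengths argument tells OpenCL the strings are NUL-terminated.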
![Page 47: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/47.jpg)
OpenCL API
// create kernel
cl_kernel hKernel;
hKernel = clCreateKernel(hProgram, "vecAdd", 0);
Create kernel from program
![Page 48: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/48.jpg)
OpenCL API
// allocate host vectors
float* pA = new float[cnDimension];
float* pB = new float[cnDimension];
float* pC = new float[cnDimension];
// initialize host memory
randomInit(pA, cnDimension);
randomInit(pB, cnDimension);
![Page 50: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/50.jpg)
OpenCL API
cl_mem hDeviceMemA = clCreateBuffer(
hContext,
CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
cnDimension * sizeof(cl_float),pA, 0);
cl_mem hDeviceMemB = /* ... */
Create buffers for kernel input. Read only in the kernel. Written by the host.
![Page 52: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/52.jpg)
OpenCL API
cl_mem hDeviceMemC = clCreateBuffer(hContext, CL_MEM_WRITE_ONLY,
cnDimension * sizeof(cl_float),
0, 0);
Create buffer for kernel output.
![Page 54: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/54.jpg)
OpenCL API
// setup parameter values
clSetKernelArg(hKernel, 0, sizeof(cl_mem), (void *)&hDeviceMemA);
clSetKernelArg(hKernel, 1, sizeof(cl_mem), (void *)&hDeviceMemB);
clSetKernelArg(hKernel, 2, sizeof(cl_mem), (void *)&hDeviceMemC);
Kernel arguments set by index
![Page 56: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/56.jpg)
OpenCL API
// execute kernel
clEnqueueNDRangeKernel(hCmdQueue, hKernel, 1, 0, &cnDimension, 0, 0, 0, 0);
// copy results from device back to host
clEnqueueReadBuffer(hCmdQueue, hDeviceMemC, CL_TRUE, 0,
cnDimension * sizeof(cl_float),
pC, 0, 0, 0);
Let OpenCL pick work group size
Blocking read
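Passing 0 for the local work size lets the implementation choose the work group size. If a work group size is specified explicitly instead, OpenCL 1.x requires the global size to be a multiple of it, so a common approach is to round the global size up. A minimal C sketch of that rounding follows; the helper name `roundUpGlobalSize` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of local_size that is greater than or equal to n. */
static size_t roundUpGlobalSize(size_t n, size_t local_size)
{
    size_t remainder;
    assert(local_size > 0); /* a work group must contain at least one work item */
    remainder = n % local_size;
    return remainder == 0 ? n : n + (local_size - remainder);
}
```

With a padded global size, the extra work items fall outside the arrays, so the kernel would need a length argument and a bounds check, which the vecAdd kernel in these slides does not have.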
![Page 57: OpenCL Joseph Kider University of Pennsylvania CIS 565 - Fall 2011](https://reader036.vdocuments.site/reader036/viewer/2022062305/5697c0091a28abf838cc71d6/html5/thumbnails/57.jpg)
OpenCL API
delete [] pA;
delete [] pB;
delete [] pC;
clReleaseMemObject(hDeviceMemA);
clReleaseMemObject(hDeviceMemB);
clReleaseMemObject(hDeviceMemC);