programming models for next generation of gpgpu architectures · programming models for next...

18
Programming models for next generation of GPGPU architectures Benedict R. Gaster February, 2011

Upload: trinhduong

Post on 21-Apr-2018

242 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

Programming models for next generation

of GPGPU architecturesBenedict R. Gaster

February, 2011

Page 2: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

2 | Programing models for next generation GPGPU | February, 2011 | Public

Motivation

Page 3: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

3 | Programing models for next generation GPGPU | February, 2011 | Public

OPENCL™ PROGRAM STRUCTURE

DEVICE

(OpenCL C)

CPU

(Platform and

Runtime APIs)

Host C/C++ Code OpenCL™ C Device Code

Page 4: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

4 | Programing models for next generation GPGPU | February, 2011 | Public

HELLO WORLD OPENCL™ C SOURCE

__constant char hw[] = "Hello World\n";

__kernel void hello(__global char * out) {

size_t tid = get_global_id(0);

out[tid] = hw[tid];

}

Page 5: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

5 | Programing models for next generation GPGPU | February, 2011 | Public

HELLO WORLD OPENCL™ C SOURCE

__constant char hw[] = "Hello World\n";

__kernel void hello(__global char * out) {

size_t tid = get_global_id(0);

out[tid] = hw[tid];

}

• This is a separate source file (or string)

• Cannot directly access host data

• Compiled at runtime

Page 6: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

6 | Programing models for next generation GPGPU | February, 2011 | Public

HELLO WORLD - HOST PROGRAM// create the OpenCL context on a GPU device

cl_context = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);

// get the list of GPU devices associated with context

clGetContextInfo(context, CL_CONTEXT_DEVICES, 0,

NULL, &cb);

devices = malloc(cb);

clGetContextInfo(context, CL_CONTEXT_DEVICES, cb, devices, NULL);

// create a command-queue

cmd_queue = clCreateCommandQueue(context, devices[0], 0, NULL);

memobjs[0] = clCreateBuffer(context,CL_MEM_WRITE_ONLY,

sizeof(cl_char)*strlen(“Hello World”, NULL,NULL);

// create the program

program = clCreateProgramWithSource(context, 1, &program_source, NULL, NULL);

// build the program

err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);

// create the kernel

kernel = clCreateKernel(program, “vec_add”, NULL);

// set the args values

err = clSetKernelArg(kernel, 0, (void *) &memobjs[0],

sizeof(cl_mem));

// set work-item dimensions

global_work_size[0] = strlen(“Hello World”);;

// execute kernel

err = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, global_work_size, NULL, 0, NULL, NULL);

// read output array

err = clEnqueueReadBuffer(cmd_queue, memobjs[0], CL_TRUE, 0, strlen(“Hello World”) *sizeof(cl_char), dst, 0, NULL, NULL);

Page 7: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

7 | Programing models for next generation GPGPU | February, 2011 | Public

HELLO WORLD - HOST PROGRAM

// create the OpenCL context on a GPU device

cl_context = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);

// get the list of GPU devices associated with context

clGetContextInfo(context, CL_CONTEXT_DEVICES, 0,

NULL, &cb);

devices = malloc(cb);

clGetContextInfo(context, CL_CONTEXT_DEVICES, cb, devices, NULL);

// create a command-queue

cmd_queue = clCreateCommandQueue(context, devices[0], 0, NULL);

// allocate the buffer memory objects

memobjs[0] = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(cl_char)*strlen(“Hello World”), srcA, NULL);}

// create the program

program = clCreateProgramWithSource(context, 1, &program_source, NULL, NULL);

// build the program

err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);

// create the kernel

kernel = clCreateKernel(program, “vec_add”, NULL);

// set the args values

err = clSetKernelArg(kernel, 0, (void *) &memobjs[0],

sizeof(cl_mem));

// set work-item dimensions

global_work_size[0] = n;

// execute kernel

err = clEnqueueNDRangeKernel(cmd_queue, kernel, 1, NULL, global_work_size, NULL, 0, NULL, NULL);

// read output array

err = clEnqueueReadBuffer(context, memobjs[2], CL_TRUE, 0, n*sizeof(cl_float), dst, 0, NULL, NULL);

Define platform and queues

Define Memory objects

Create the program

Build the program

Create and setup kernel

Execute the kernel

Read results on the host

Page 8: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

8 | Programing models for next generation GPGPU | February, 2011 | Public

What can we learn

Page 9: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

9 | Programing models for next generation GPGPU | February, 2011 | Public

LEARN FROM CURRENT GENERATION ARCHITECTURE

Page 10: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

10 | Programing models for next generation GPGPU | February, 2011 | Public

COMMON USE CASES

In OpenCL™ we generally see:

– Pick single device (often GPU or CL_DEVICE_TYPE_DEFAULT)

– All “kernels” in cl_program object are used in application

In CUDA the default for runtime mode is:

– Pick single device (always GPU)

– All “kernels” in scope are exported to the host application for specific

translation unit, i.e. calling kernels is syntactic and behave similar to

static linkage.

Page 11: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

11 | Programing models for next generation GPGPU | February, 2011 | Public

A look into the future

Page 12: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

12 | Programing models for next generation GPGPU | February, 2011 | Public

NEXT GENERATION GPGPU PROGRAM STRUCTURE

DEVICE

(C++0x)

CPU

(C++0x)

C++0x Code

Page 13: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

13 | Programing models for next generation GPGPU | February, 2011 | Public

HELLO WORLD C++0X SOURCE

hw[] = "Hello World\n";

void __attribute__(gpu) hello(

Index<1> index,

char * out)

{

size_t id = index.getX();

out[id] = hw[id];

}

int main(void)

{

char output[100];

parallelFor(Range<1>(length(hw)),

[output] (Index<1> index) {

hello(index, output);

});

}

Page 14: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

14 | Programing models for next generation GPGPU | February, 2011 | Public

HELLO WORLD C++0X SOURCE

hw[] = "Hello World\n";

void __attribute__(gpu) hello(

Index<1> index,

char * out)

{

size_t id = index.getX();

out[id] = hw[id];

}

int main(void)

{

char output[100];

parallelFor(Range<1>(length(hw)),

[output] (Index<1> index) {

hello(index, output);

});

}

• A single C++0x program

• Can directly access data on host and device

• Compiled offline

Page 15: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

15 | Programing models for next generation GPGPU | February, 2011 | Public

What questions still need

to be answered

Page 16: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

16 | Programing models for next generation GPGPU | February, 2011 | Public

WHAT QUESTIONS NEED TO BE ANSWERED

How close can the CPU and GPU really be?

– How does it effect the models of today:

GPU implies through put computing!

CPU implies local latency hiding in branchy code!

What effect does it have on the kind of applications one can run on these

Fusion systems?

How does this all fit with managed languages?

Is C++0x enough on its own?

What about languages like Haskell or other high-level models?

Page 17: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

17 | Programing models for next generation GPGPU | February, 2011 | Public

QUESTIONS

Page 18: Programming models for next generation of GPGPU architectures · Programming models for next generation of GPGPU architectures ... Programing models for next ... // create the OpenCL

18 | Programing models for next generation GPGPU | February, 2011 | Public

Trademark Attribution

AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States

and/or other jurisdictions. OpenCL is a trademark of Apple Inc. used with permission by Khronos. Other names used in this

presentation are for identification purposes only and may be trademarks of their respective owners.

©2011 Advanced Micro Devices, Inc. All rights reserved.