gmac global memory for accelerators
DESCRIPTION
GMAC Global Memory for Accelerators. Isaac Gelado PUMPS Summer School - Barcelona. Vector Addition CUDA code. Vector addition Really simple kernel code But, what about the CPU code? GMAC is a complement to the CUDA run-time Simplifies the CPU code Exploits advanced CUDA features for free. - PowerPoint PPT PresentationTRANSCRIPT
GMACGlobal Memory for Accelerators
Isaac GeladoPUMPS Summer School - Barcelona
Vector Addition CUDA code• Vector addition– Really simple kernel code– But, what about the CPU code?
• GMAC is a complement to the CUDA run-time– Simplifies the CPU code– Exploits advanced CUDA features for free
__global__ void vector(float *c, float *a, float *b, size_t size){
int idx = threadIdx.x + blockIdx.x * blockDim.x;if(idx < size) c[idx] = a[idx] + b[idx];
}
7/8/10 2PUMPS Summer School
Some easy CUDA code (I)• Read from disk, transfer to GPU and computeint main(int argc, char *argv[]) {
float *h_a, *h_b, *h_c, *d_a, *d_b, *d_c;size_t size = LENGTH * sizeof(float);
assert((h_a = malloc(size) != NULL);assert((h_b = malloc(size) != NULL);assert((h_c = malloc(size) != NULL);
assert(cudaMalloc((void **)&d_a, size) == cudaSuccess));assert(cudaMalloc((void **)&d_b, size) == cudaSuccess));assert(cudaMalloc((void **)&d_c, size) == cudaSuccess));
read_file(argv[A], h_a);read_file(argv[B], h_b);
assert(cudaMemcpy(d_a, h_a, size, cudaMemcpyHostToDevice) == cudaSuccess);
assert(cudaMemcpy(d_b, h_b, size, cudaMemcpyHostToDevice) == cudaSuccess);
7/8/10 3PUMPS Summer School
Some easy CUDA code (and II)• Read from disk, transfer to GPU and compute
Db(BLOCK_SIZE);Dg(LENGTH / BLOCK_SIZE);if(LENGTH % BLOCK_SIZE) Dg.x++;vector<<<Dg, Db>>>(d_c, d_a, d_b, LENGTH);assert(cudaThreadSynchronize() == cudaSuccess);
assert(cudaMemcpy(d_c, h_c, LENGTH * sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess);
save_file(argv[C], h_c);
free(h_a); cudaFree(d_a);free(h_b); cudaFree(d_b);free(h_c); cudaFree(d_c);
return 0;}
7/8/10 4PUMPS Summer School
Some really easy GMAC code
7/8/10 6PUMPS Summer School
int main(int argc, char *argv[]) {float *a, *b, *c;size_t size = LENGTH * sizeof(float);
assert(gmacMalloc((void **)&a, size) == gmacSuccess));assert(gmacMalloc((void **)&b, size) == gmacSuccess));assert(gmacMalloc((void **)&c, size) == gmacSuccess));
read_file(argv[A], a);read_file(argv[B], b);
Db(BLOCK_SIZE);Dg(LENGTH / BLOCK_SIZE);if(LENGTH % BLOCK_SIZE) Dg.x++;vector<<<Dg, Db>>>(c, a, b, LENGTH);assert(gmacThreadSynchronize() == gmacSuccess);
save_file(argv[C], c);
gmacFree(a); gmacFree(b); gmacFree(c);
return 0;}
There is no memory
copy
There is no memory
copy
Getting GMAC• GMAC is at http://adsm.googlecode.com/
• Debian / Ubuntu binary and development .deb files
• UNIX (also MacOS X) source code package
• Experimental versions from mercurial repository
7/8/10 7PUMPS Summer School
Outline• Introduction• GMAC Memory Model– Asymmetric Memory– Global Memory
• GMAC Execution Model– Multi-threading– Inter-thread communication
• Conclusions
7/8/10 8PUMPS Summer School
GMAC Memory Model• Unified CPU / GPU virtual address space• Asymmetric address space visibility
CPU
Memory
GPU
Shared Data
CPU Data
7/8/10 9PUMPS Summer School
GMAC Consistency Model• Implicit acquire / release primitives at
accelerator call / return boundaries
7/8/10 PUMPS Summer School 10
CPU ACC
CPU ACC
GMAC Memory API• Memory allocationgmacError_t gmacMalloc(void **ptr, size_t size)– Allocated memory address (returned by reference)– Gets the size of the data to be allocated– Error code, gmacSuccess if no error
• Example usage#include <gmac.h>
int main(int argc, char *argv[]) { float *foo = NULL; gmacError_t error; if((error = gmacMalloc((void **)&foo, FOO_SIZE)) != gmacSuccess) FATAL(“Error allocating memory %s”, gmacErrorString(error)); . . .}
7/8/10 11PUMPS Summer School
GMAC Memory API• Memory releasegmacError_t gmacFree(void *ptr)– Memory address to be release– Error code, gmacSuccess if no error
• Example usage#include <gmac.h>
int main(int argc, char *argv[]) { float *foo = NULL; gmacError_t error; if((error = gmacMalloc((void **)&foo, FOO_SIZE)) != gmacSuccess) FATAL(“Error allocating memory %s”, gmacErrorString(error)); . . . gmacFree(foo);}
7/8/10 12PUMPS Summer School
GMAC Memory API• Memory translation (workaround for Fermi)Void *gmacPtr(void *ptr)template<typename T> T *gmacPtr(T *ptr)– CPU memory address– GPU memory address
• Example usage#include <gmac.h>
int main(int argc, char *argv[]) { . . . kernel<<<Dg, Db>>>(gmacPtr(buffer), size); . . .}
7/8/10 13PUMPS Summer School
GMAC Execution Example• Get advanced CUDA features for free– Asynchronous data transfers– Pinned memory
7/8/10 14PUMPS Summer School
Outline• Introduction• GMAC Memory Model– Asymmetric Memory– Global Memory
• GMAC Execution Model– Multi-threading– Inter-thread communication
• Conclusions
7/8/10 15PUMPS Summer School
GMAC Global Memory• Data accessible by all accelerators, but owned
by the CPU
CPU
Memory
GPU
GPU
7/8/10 16PUMPS Summer School
GMAC Global memory API• Memory allocationgmacError_t gmacGlobalMalloc(void **ptr, size_t size)– Allocated memory address (returned by reference)– Gets the size of the data to be allocated– Error code, gmacSuccess if no error
• Example usage#include <gmac.h>
int main(int argc, char *argv[]) { float *foo = NULL; gmacError_t error; if((error = gmacGlobalMalloc((void **)&foo, FOO_SIZE)) != gmacSuccess) FATAL(“Error allocating memory %s”, gmacErrorString(error)); . . .}7/8/10 17PUMPS Summer School
Outline• Introduction• GMAC Memory Model– Asymmetric Memory– Global Memory
• GMAC Execution Model– Multi-threading– Inter-thread communication
• Conclusions
7/8/10 18PUMPS Summer School
GMAC and Multi-threading• In the past, one thread one CPU• In GMAC, one thread:– One CPU– One GPU
• A thread is running in the GPU or the CPU, but not in both at the same time
• Create threads using what you already know– pthread_create(...)
7/8/10 19PUMPS Summer School
GMAC and Multi-threading• Virtual memory accessibility:– Complete address space in CPU code– Partial address space in GPU code
CPU CPU
GPU GPUMemory
7/8/10 20PUMPS Summer School
Outline• Introduction• GMAC Memory Model– Asymmetric Memory– Global Memory
• GMAC Execution Model– Multi-threading– Inter-thread communication
• Conclusions
7/8/10 21PUMPS Summer School
GPU Passing and Copying• GPU passing:– Send the thread’s virtual GPU to another thread– Do not move data, move computation
• API Calls– Virtual GPU sendinggmacError_t gmacSend(thread_id dest)– Virtual GPU receivinggmacError_t gmacReceive()– Virtual GPU copyinggmacError_t gmacCopy(thread_id dest)
7/8/10 22PUMPS Summer School
Outline• Introduction• GMAC Memory Model– Asymmetric Memory– Global Memory
• GMAC Execution Model– Multi-threading– Inter-thread communication
• Conclusions
7/8/10 23PUMPS Summer School
Conclusions• Single virtual address space for CPUs and
GPUs• Use CUDA advanced features– Automatic overlap data communication and
computation– Get access to any GPU from any CPU thread
• Get more performance from your application more easily
• Go: http://adsm.googlecode.com
7/8/10 24PUMPS Summer School
GMACGlobal Memory for Accelerators
Isaac GeladoPUMPS Summer School - Barcelona
Backup Slides
GMAC Unified Address Space• When allocating memory
1. Allocate accelerator memory2. Allocate CPU memory at the same virtual address
CPU AcceleratorSystemMemory
AcceleratorMemory
7/8/10 27PUMPS Summer School
GMAC Unified Address Space• Use fixed-size segments to map accelerator memory• Implement and export Accelerator Virtual Memory
Accelerator
SystemMemory
CPU
AcceleratorMemory
AcceleratorAcceleratorMemory
0x100100000 0x00100000
0x001000000x200100000
7/8/10 28PUMPS Summer School
GMAC Data Transfers• Avoid unnecessary data copies• Lazy-update:
– Call: transfer modified data– Return: transfer when needed
7/8/10 PUMPS Summer School 29
AcceleratorSystemMemory
AcceleratorMemory
CPU
GMAC Data Transfers• Overlap CPU execution and data transfers• Minimal transfer on-demand• Rolling-update:
– Memory-block size granularity
7/8/10 PUMPS Summer School 30
AcceleratorSystemMemory
AcceleratorMemory
CPU
GMACGlobal Memory for Accelerators
Isaac GeladoPUMPS Summer School - Barcelona