
Mapping OpenMP to the Stream Programming Model

Hu Ming, Zhang Fangzhou, Yue Kun


Post on 21-Apr-2015


Objective

1. Study how the parallel mechanisms of OpenMP map to the stream programming model (CUDA).
2. Identify which parts are suitable for translation.
3. Analyze typical scientific applications.


Outline

OpenMP vs CUDA: Execution model
OpenMP vs CUDA: Semantics
OpenMP vs CUDA: Performance
Analysis of Benchmarks


OpenMP vs CUDA Execution Model


OpenMP vs CUDA Semantics

Parallel construct: parallel

Worksharing constructs: loop, sections, single

Master and synchronization constructs: critical, barrier, taskwait, atomic, flush, ordered

Data environment clauses: shared, private, firstprivate, lastprivate, reduction, copyin, copyprivate


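#include <omp.h>
#include <stdio.h>

int main(void)
{
    int x = 0;

    /* Every thread in the team executes the block below. */
    #pragma omp parallel shared(x)
    {
        /* Only one thread at a time may update the shared x. */
        #pragma omp critical
        x = x + 1;
    }   /* end of parallel region */

    printf("x = %d\n", x);  /* equals the number of threads in the team */
    return 0;
}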


#pragma omp for ordered [clauses...]
    (loop region)
    #pragma omp ordered
        structured_block
    (end of loop region)


Most of the directives and clauses can be mapped onto stream programs.


OpenMP vs CUDA Performance

CUDA: lightweight hardware threads; data-centric processing model; simple control logic; inefficient at handling branches.

OpenMP: OS-level threads; thread-centric parallel processing model; the work of each thread can be complicated.

Therefore, map those constructs that offer large parallelism and uniform processing among threads.


Not suitable:

single, sections: small parallelism and different processing among threads
master: parallelism is 1
barrier, taskwait: require all threads to be grouped into one block
lastprivate: processing is not uniform among threads


OpenMP vs CUDA

To judge whether it is reasonable to translate an OpenMP program into a CUDA program, we should analyze the application's pattern.


Conclusion

1. A majority of scientific applications are suitable to be mapped to the stream programming model.
2. Heterogeneous architectures using both CPU and GPU will become more common.


Comments

1. This paper's work is mainly analysis.
2. We think more real applications should be considered, not just benchmarks.
3. Automatically translating OpenMP programs to CUDA programs may be possible.
