Parallelism in the Standard C++: What to Expect in C++ 17 Artur Laksberg Microsoft Corp. May 8th, 2014

Page 1: Parallelism in the Standard C++: What to Expect in C++ 17

Parallelism in the Standard C++: What to Expect in C++ 17

Artur Laksberg
Microsoft Corp.
May 8th, 2014

Page 2: Parallelism in the Standard C++: What to Expect in C++ 17

Agenda

Fundamentals
    Task regions

Parallel Algorithms
    Parallelization
    Vectorization

Page 3: Parallelism in the Standard C++: What to Expect in C++ 17

Part 1: The Fundamentals

Page 4: Parallelism in the Standard C++: What to Expect in C++ 17

OpenMP
TBB
PPL
MPI
OpenCL
OpenACC
CUDA
C++ AMP
Renderscript
Cilk Plus
GCD

Page 5: Parallelism in the Standard C++: What to Expect in C++ 17

Parallelism in C++11/14

Fundamentals:
    Memory model
    Atomics

Basics:
    thread
    mutex
    condition_variable
    async
    future
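As an illustration of the async and future building blocks listed above, here is a minimal sketch, not taken from the talk. The function name parallel_sum is hypothetical. It hands the first half of a vector to std::async and sums the second half on the calling thread.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical helper: sum a vector using one extra task.
long long parallel_sum(const std::vector<int>& v) {
    const auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);

    // std::launch::async asks for the task to run on another thread; the
    // future carries the result (or an exception) back to the caller.
    std::future<long long> front = std::async(std::launch::async, [&v, mid] {
        return std::accumulate(v.begin(), mid, 0LL);
    });

    // The calling thread sums the second half in the meantime.
    const long long back = std::accumulate(mid, v.end(), 0LL);

    return front.get() + back;  // get() blocks until the task has finished
}
```

Because the vector is only read, no mutex is needed in this sketch.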

Page 6: Parallelism in the Standard C++: What to Expect in C++ 17

Quicksort: Serial

void quicksort(int *v, int start, int end) {
    if (start < end) {
        int pivot = partition(v, start, end);
        quicksort(v, start, pivot - 1);
        quicksort(v, pivot + 1, end);
    }
}

Page 7: Parallelism in the Standard C++: What to Expect in C++ 17

Quicksort: Use Threads

void quicksort(int *v, int start, int end) {
    if (start < end) {
        int pivot = partition(v, start, end);
        std::thread t1([&] { quicksort(v, start, pivot - 1); });
        std::thread t2([&] { quicksort(v, pivot + 1, end); });
        t1.join();
        t2.join();
    }
}

Problem 1: expensive

Problem 2: fork-join not enforced

Problem 3: exceptions??
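One way to soften these three problems with C++11 tools alone is sketched below. This is an illustration rather than the approach proposed in the talk: the depth parameter and the Lomuto partition are assumptions added for the example. A depth cutoff bounds the number of threads, and std::async returns a future whose get() rethrows a child's exception and whose destructor still joins the child.

```cpp
#include <cassert>
#include <future>
#include <utility>

// Lomuto partition (assumed here; the slides do not show partition).
// Moves v[end] to its final position and returns that position.
int partition(int* v, int start, int end) {
    const int pivot = v[end];
    int i = start;
    for (int j = start; j < end; ++j) {
        if (v[j] < pivot) std::swap(v[i++], v[j]);
    }
    std::swap(v[i], v[end]);
    return i;
}

// depth is a hypothetical parameter: how many levels may still spawn a thread.
void quicksort(int* v, int start, int end, int depth = 3) {
    if (start >= end) return;
    const int pivot = partition(v, start, end);
    if (depth <= 0) {
        // Below the cutoff, recurse serially to avoid creating more threads.
        quicksort(v, start, pivot - 1, 0);
        quicksort(v, pivot + 1, end, 0);
        return;
    }
    // The left half runs as an asynchronous task...
    auto left = std::async(std::launch::async,
                           [=] { quicksort(v, start, pivot - 1, depth - 1); });
    // ...while the right half runs on the calling thread.
    quicksort(v, pivot + 1, end, depth - 1);
    left.get();  // join; rethrows an exception thrown by the left half
}
```

With the default depth of 3, at most seven extra tasks are launched regardless of the input size.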

Page 8: Parallelism in the Standard C++: What to Expect in C++ 17

Quicksort: Fork-Join Parallelism

void quicksort(int *v, int start, int end) {
    if (start < end) {                        // parallel region
        int pivot = partition(v, start, end);
        quicksort(v, start, pivot - 1);       // task
        quicksort(v, pivot + 1, end);         // task
    }
}

Page 9: Parallelism in the Standard C++: What to Expect in C++ 17

Quicksort: Using Task Regions (N3832)

void quicksort(int *v, int start, int end) {
    if (start < end) {
        task_region([&] (auto& r) {                          // parallel region
            int pivot = partition(v, start, end);
            r.run([&] { quicksort(v, start, pivot - 1); });  // task
            r.run([&] { quicksort(v, pivot + 1, end); });    // task
        });
    }
}
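To make the fork-join guarantee of a task region concrete, here is a rough emulation built on std::async. It is not the N3832 implementation: the names region_handle and task_region_sketch are hypothetical, tasks are assumed to return void, and every run() starts a thread instead of going through a work-stealing scheduler.

```cpp
#include <cassert>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for the handle that a task region passes to its body.
class region_handle {
public:
    // Start f as a child task of the region. f must return void.
    template <class F>
    void run(F&& f) {
        tasks_.push_back(std::async(std::launch::async, std::forward<F>(f)));
    }

    // Wait for every child, then rethrow the first stored exception, if any.
    void wait() {
        for (auto& t : tasks_) t.wait();
        for (auto& t : tasks_) t.get();
    }

private:
    std::vector<std::future<void>> tasks_;
};

// Runs body and joins all children before returning, so fork-join is enforced.
// If body itself throws, the futures' destructors still wait for the children.
template <class Body>
void task_region_sketch(Body&& body) {
    region_handle r;
    body(r);
    r.wait();
}
```

The quicksort from the slide would call task_region_sketch([&](auto& r) { ... }) in place of task_region; a real implementation would add scheduling so that deep recursion does not create one thread per task.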

Page 10: Parallelism in the Standard C++: What to Expect in C++ 17

Under The Hood…

Page 11: Parallelism in the Standard C++: What to Expect in C++ 17

Work Stealing Scheduling

[Figure: proc 1, proc 2, proc 3, proc 4]

Page 12: Parallelism in the Standard C++: What to Expect in C++ 17

Work Stealing Scheduling

[Figure: proc 1, proc 2, proc 3, proc 4; labels "Old items" and "New items"]

Page 13: Parallelism in the Standard C++: What to Expect in C++ 17

Work Stealing Scheduling

[Figure: proc 1, proc 2, proc 3, proc 4; labels "Old items" and "New items"]

Page 14: Parallelism in the Standard C++: What to Expect in C++ 17

Work Stealing Scheduling

[Figure: proc 1, proc 2, proc 3, proc 4; labels "Old items", "New items" and "Thief"]
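The "Old items", "New items" and "Thief" labels suggest the usual work-stealing arrangement: each processor owns a double-ended queue, works at the new end, and lets idle processors steal from the old end. The sketch below assumes that arrangement. The class name work_queue is hypothetical, and a mutex stands in for the lock-free deques that production schedulers typically use.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// One queue per worker. The owner pushes and pops at the "new" end;
// thieves take from the "old" end.
class work_queue {
public:
    using task = std::function<void()>;

    // Owner: add the newest item.
    void push(task t) {
        std::lock_guard<std::mutex> lock(m_);
        items_.push_back(std::move(t));
    }

    // Owner: take the newest item (LIFO order).
    bool pop(task& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (items_.empty()) return false;
        out = std::move(items_.back());
        items_.pop_back();
        return true;
    }

    // Thief: take the oldest item (FIFO order).
    bool steal(task& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    std::mutex m_;
    std::deque<task> items_;
};
```

In a recursive algorithm the oldest items tend to be the largest pieces of work, so a thief that takes from the old end gets a big chunk per steal.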

Page 15: Parallelism in the Standard C++: What to Expect in C++ 17

Fork-Join Parallelism and Work Stealing

e();
task_region([] (auto& r) {
    r.run(f);
    g();
});
h();

[Figure: task graph in which e() forks into f() and g(), which join before h()]

Q1: What thread runs f?

Q2: What thread runs g?

Q3: What thread runs h?

Page 16: Parallelism in the Standard C++: What to Expect in C++ 17

Work Stealing Design Choices

What Thread Executes After a Spawn?
    Child Stealing
    Continuation (parent) Stealing

What Thread Executes After a Join?
    Stalling: initiating thread waits
    Greedy: the last thread to reach join continues

task_region([] (auto& r) {
    for (int i = 0; i < n; ++i)
        r.run(f);
});

Page 17: Parallelism in the Standard C++: What to Expect in C++ 17

Part 2: The Algorithms

Page 18: Parallelism in the Standard C++: What to Expect in C++ 17

Alex Stepanov: Start With The Algorithms

Page 19: Parallelism in the Standard C++: What to Expect in C++ 17

Inspiration

Performing Parallel Operations On Containers

Intel Threading Building Blocks

Microsoft Parallel Patterns Library, C++ AMP

Nvidia Thrust

Page 20: Parallelism in the Standard C++: What to Expect in C++ 17

Parallel STL

Just like STL, only parallel…

Can be faster
    If you know what you’re doing

Two Execution Policies:
    std::par
    std::vec

Page 21: Parallelism in the Standard C++: What to Expect in C++ 17

Parallelization: What’s a Big Deal?

Why not already parallel?

std::sort(begin, end, [](int a, int b) { return a < b; });

User-provided closures must be thread safe:

int comparisons = 0;
std::sort(begin, end, [&](int a, int b) { comparisons++; return a < b; });

But also special-member functions, std::swap etc.
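The comparison counter can be made safe for concurrent use with std::atomic. The sketch below is an illustration, not code from the talk: the function sort_and_count is hypothetical, and two plain threads stand in for a parallel sort so that the comparator really is called concurrently.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sorts the two halves of v on two threads, merges them, and returns the
// number of comparisons performed.
int sort_and_count(std::vector<int>& v) {
    // With a plain int, the two threads below would race on the increment.
    std::atomic<int> comparisons{0};
    auto counted_less = [&comparisons](int a, int b) {
        comparisons.fetch_add(1, std::memory_order_relaxed);
        return a < b;
    };

    const auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);

    // Both threads call the same closure at the same time.
    std::thread front([&] { std::sort(v.begin(), mid, counted_less); });
    std::sort(mid, v.end(), counted_less);
    front.join();

    std::inplace_merge(v.begin(), mid, v.end(), counted_less);
    return comparisons.load();
}
```

Relaxed ordering is enough here because the counter is only read after join(), which already synchronizes the two threads.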

Page 22: Parallelism in the Standard C++: What to Expect in C++ 17

It’s a Contract

What the user can do

What the implementer can do

Asymptotic Guarantees:
    std::sort: O(n*log(n))
    std::stable_sort: O(n*log^2(n))
    what about parallel sort?

What is a valid implementation? (see next slide)

Page 23: Parallelism in the Standard C++: What to Expect in C++ 17

Chaos Sort

template<typename Iterator, typename Compare>
void chaos_sort( Iterator first, Iterator last, Compare comp ) {
    auto n = last-first;
    std::vector<char> c(n);
    for(;;) {
        bool flag = false;
        for( size_t i=1; i<n; ++i ) {
            c[i] = comp(first[i],first[i-1]);
            flag |= c[i];
        }
        if( !flag ) break;
        for( size_t i=1; i<n; ++i )
            if( c[i] )
                std::swap( first[i-1], first[i] );
    }
}

Page 24: Parallelism in the Standard C++: What to Expect in C++ 17

Execution Policies

Built-in Execution Policies:

extern const sequential_execution_policy seq;
extern const parallel_execution_policy par;
extern const vector_execution_policy vec;

Dynamic Execution Policy:

class execution_policy
{
public:
    // ...
    const type_info& target_type() const;
    template<class T> T *target();
    template<class T> const T *target() const;
};

Page 25: Parallelism in the Standard C++: What to Expect in C++ 17

Using Execution Policy To Write Parallel Code

std::vector<int> v = ...;

// standard sequential sort
std::sort(v.begin(), v.end());

using namespace std::experimental::parallel;

// explicitly sequential sort
sort(seq, v.begin(), v.end());

// permitting parallel execution
sort(par, v.begin(), v.end());

// permitting vectorization as well
sort(vec, v.begin(), v.end());

Page 26: Parallelism in the Standard C++: What to Expect in C++ 17

Picking Execution Policy Dynamically

size_t threshold = ...;

execution_policy exec = seq;

if (vec.size() > threshold)
{
    exec = par;
}

sort(exec, vec.begin(), vec.end());
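The same run-time choice can be imitated in standard C++11 without the proposed library. In the sketch below everything is hypothetical: the enum policy stands in for execution_policy, and the "par" branch is a simple two-way split, not a real parallel sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

enum class policy { seq, par };  // hypothetical stand-in for execution_policy

void sort_with(policy p, std::vector<int>& v) {
    if (p == policy::seq) {
        std::sort(v.begin(), v.end());
        return;
    }
    // "par": sort the two halves concurrently, then merge them.
    const auto mid = v.begin() + static_cast<std::ptrdiff_t>(v.size() / 2);
    auto front = std::async(std::launch::async,
                            [&v, mid] { std::sort(v.begin(), mid); });
    std::sort(mid, v.end());
    front.get();
    std::inplace_merge(v.begin(), mid, v.end());
}

// Pick the policy at run time from the input size, as on the slide.
void sort_adaptive(std::vector<int>& v, std::size_t threshold) {
    policy exec = policy::seq;
    if (v.size() > threshold) {
        exec = policy::par;
    }
    sort_with(exec, v);
}
```

The threshold matters because starting a task has a fixed cost that small inputs cannot amortize; a suitable value has to be measured on the target machine.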

Page 27: Parallelism in the Standard C++: What to Expect in C++ 17

Exception Handling

In C++ philosophy, no exception is silently ignored

Exception list: container of exception_ptr objects

try
{
    r = std::inner_product(std::par, a.begin(), a.end(), b.begin(), 0, func1, func2);
}
catch(const exception_list& list)
{
    for(auto& exptr : list)
    {
        // process exception pointer exptr
    }
}
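The idea behind an exception list can be sketched with C++11 facilities only. The function run_all below is hypothetical and is not the proposed exception_list class: it runs several tasks concurrently and collects one std::exception_ptr per failed task, so no exception is dropped.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <future>
#include <stdexcept>
#include <vector>

// Runs every task concurrently and returns the exceptions they threw.
std::vector<std::exception_ptr>
run_all(const std::vector<std::function<void()>>& tasks) {
    std::vector<std::future<void>> pending;
    for (const auto& t : tasks) {
        pending.push_back(std::async(std::launch::async, t));
    }

    // This vector plays the role of the exception list.
    std::vector<std::exception_ptr> errors;
    for (auto& f : pending) {
        try {
            f.get();  // rethrows whatever the task threw
        } catch (...) {
            errors.push_back(std::current_exception());
        }
    }
    return errors;
}
```

A caller can inspect each entry with std::rethrow_exception inside a try block, which is how the loop body on the slide would process exptr.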

Page 28: Parallelism in the Standard C++: What to Expect in C++ 17

Vectorization: A Tale From Agriculture

Page 29: Parallelism in the Standard C++: What to Expect in C++ 17

A Tale From Agriculture

Page 30: Parallelism in the Standard C++: What to Expect in C++ 17

A Tale From Agriculture

Page 31: Parallelism in the Standard C++: What to Expect in C++ 17

Idea: Fewer Tractors, Wider Plows

Page 32: Parallelism in the Standard C++: What to Expect in C++ 17

Vectorization: What’s a Big Deal?

int a[n] = ...;
int b[n] = ...;
for(int i=0; i<n; ++i)
{
    a[i] = b[i] + c;
}

movdqu xmm1, XMMWORD PTR _b$[esp+eax+132]
movdqu xmm0, XMMWORD PTR _a$[esp+eax+132]
paddd  xmm1, xmm2
paddd  xmm1, xmm0
movdqu XMMWORD PTR _a$[esp+eax+132], xmm1

a[i:i+3] = b[i:i+3] + c;
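The array-section notation a[i:i+3] is not standard C++. A portable way to picture the transformation is to split the loop into blocks of four, as in the sketch below. The function name add_scalar and the remainder loop are additions for the example, and a and b are assumed not to overlap.

```cpp
#include <cassert>
#include <cstddef>

// Computes a[i] = b[i] + c in blocks of four elements.
// Assumes a and b do not overlap.
void add_scalar(int* a, const int* b, int c, std::size_t n) {
    constexpr std::size_t width = 4;  // four 32-bit ints per 128-bit register
    std::size_t i = 0;

    // Main loop: one iteration corresponds to a[i:i+3] = b[i:i+3] + c.
    for (; i + width <= n; i += width) {
        for (std::size_t lane = 0; lane < width; ++lane) {
            a[i + lane] = b[i + lane] + c;
        }
    }

    // Remainder loop: the last n % 4 elements are handled one at a time.
    for (; i < n; ++i) {
        a[i] = b[i] + c;
    }
}
```

A vectorizing compiler can turn the inner four-lane loop into a single packed add such as paddd; whether it does so depends on the compiler and its flags.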

Page 33: Parallelism in the Standard C++: What to Expect in C++ 17

Vector Lane is not a Thread!

Taking locks
    Thread with thread_id x takes a lock…
    Then another “thread” with the same thread_id enters the lock…
    Deadlock!!!

Exceptions
    Can we unwind 1/4th of the stack?

Page 34: Parallelism in the Standard C++: What to Expect in C++ 17

Vectorization: Not So Easy Any More…

void f(int* a, int* b)
{
    for(int i=0; i<n; ++i)
    {
        a[i] = b[i] + c;
        func();
    }
}

mov  ecx, DWORD PTR _b$[esp+esi+140]
add  ecx, edi
add  DWORD PTR _a$[esp+esi+140], ecx
call func

Aliasing?
Side effects?
Dependence?
Exceptions?

Page 35: Parallelism in the Standard C++: What to Expect in C++ 17

Vectorization Hazard: Locks

for(int i=0; i<n; ++i)
{
    lock.enter();
    a[i] = b[i] + c;
    lock.release();
}

for(int i=0; i<n; i+=4)
{
    for(int j=0; j<4; ++j)
        lock.enter();

    a[i:i+3] = b[i:i+3] + c;

    for(int j=0; j<4; ++j)
        lock.release();
}

This transformation is not safe!

Consider: f takes a lock, g releases the lock:

?

Page 36: Parallelism in the Standard C++: What to Expect in C++ 17

How Do We Get This?

void f(int* a, int* b)
{
    for(int i=0; i<n; ++i)
    {
        a[i] = b[i] + c;
        func();
    }
}

for(int i=0; i<n; i+=4)
{
    a[i:i+3] = b[i:i+3] + c;
    for(int j=0; j<4; ++j)
        func();
}

Need a helping hand from the programmer, because…

Page 37: Parallelism in the Standard C++: What to Expect in C++ 17

Vector Loop with Parallel STL

void f(int* a, int* b)
{
    integer_iterator begin {0};
    integer_iterator end {n};

    std::for_each( std::vec, begin, end, [&](int i)
    {
        a[i] = b[i] + c;
        func();
    });
}
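The slide does not show integer_iterator. A minimal guess at what such a type could look like is given below, as an input iterator that yields consecutive integers. The class body and the function add_scalar_indexed are assumptions for illustration, and std::for_each is called here without an execution policy, so the loop runs sequentially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical counting iterator: dereferencing yields the current index.
class integer_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = int;

    explicit integer_iterator(int value = 0) : value_(value) {}

    int operator*() const { return value_; }
    integer_iterator& operator++() { ++value_; return *this; }
    integer_iterator operator++(int) { auto old = *this; ++value_; return old; }

    friend bool operator==(integer_iterator x, integer_iterator y) {
        return x.value_ == y.value_;
    }
    friend bool operator!=(integer_iterator x, integer_iterator y) {
        return !(x == y);
    }

private:
    int value_;
};

// Index-based loop over [0, n) written as an algorithm call.
void add_scalar_indexed(int* a, const int* b, int c, int n) {
    std::for_each(integer_iterator{0}, integer_iterator{n},
                  [&](int i) { a[i] = b[i] + c; });
}
```

Passing indices instead of element references lets one lambda touch both arrays, which is what the loop body on the slide needs.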

Page 38: Parallelism in the Standard C++: What to Expect in C++ 17

Parallelization vs. Vectorization

Parallelization
    Threads
    Stack
    Good for divergent code
    Relatively heavy-weight

Vectorization
    Vector Lanes
    No stack
    Lock-step execution
    Very light-weight

Page 39: Parallelism in the Standard C++: What to Expect in C++ 17

When To Vectorize

std::par
    No race conditions
    No aliasing

std::vec
    Same as std::par, plus:
    No Exceptions
    No Locks
    No/Little Divergence

Page 40: Parallelism in the Standard C++: What to Expect in C++ 17

References

N3832: Task Region
N3872: A Primer on Scheduling Fork-Join Parallelism with Work Stealing
N3724: A Parallel Algorithms Library
N3850: Working Draft, Technical Specification for C++ Extensions for Parallelism
parallelstl.codeplex.com

Page 41: Parallelism in the Standard C++: What to Expect in C++ 17