openmp dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/didattica/hpc/slidespdf/09....
TRANSCRIPT
![Page 2: OpenMP dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/Didattica/HPC/SlidesPDF/09. OMP dy… · Dynamic loops ›Assign iterations to threads in a dynamic manner](https://reader034.vdocuments.site/reader034/viewer/2022042622/5f8b96fcc8d03832cf677745/html5/thumbnails/2.jpg)
Outline
› Expressing parallelism– Understanding parallel threads
› Memory Data management– Data clauses
› Synchronization– Barriers, locks, critical sections
› Work partitioning– Loops, sections, single work, tasks…
› Execution devices– Target
2
![Page 3: OpenMP dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/Didattica/HPC/SlidesPDF/09. OMP dy… · Dynamic loops ›Assign iterations to threads in a dynamic manner](https://reader034.vdocuments.site/reader034/viewer/2022042622/5f8b96fcc8d03832cf677745/html5/thumbnails/3.jpg)
Let's talk about performance
› We already saw how parallelism ≠> performance– Example: a loop
– If one thread is delayed, it prevents other threads to do useful work!!
3
TT TT
#pragma omp parallel num_threads(4)
{
#pragma omp for
for(int i=0; i<N; i++)
{
...
} // (implicit) barrier
// USEFUL WORK!!
} // (implicit) barrier
![Page 4: OpenMP dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/Didattica/HPC/SlidesPDF/09. OMP dy… · Dynamic loops ›Assign iterations to threads in a dynamic manner](https://reader034.vdocuments.site/reader034/viewer/2022042622/5f8b96fcc8d03832cf677745/html5/thumbnails/4.jpg)
Unbalanced loop partitioning
› Iterations are statically assigned before entering the loop– Might not be effective nor efficient
4
T T TT#pragma omp parallel for num_threads (4)
for (int i=0; i<16; i++)
{
/* UNBALANCED LOOP CODE */
} /* (implicit) Barrier */
I
D
L
E
I
D
L
E
I
D
L
E
![Page 5: OpenMP dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/Didattica/HPC/SlidesPDF/09. OMP dy… · Dynamic loops ›Assign iterations to threads in a dynamic manner](https://reader034.vdocuments.site/reader034/viewer/2022042622/5f8b96fcc8d03832cf677745/html5/thumbnails/5.jpg)
Dynamic loops
› Assign iterations to threads in a dynamic manner– At runtime!!
› Static semantic– "Partition the loop in Nthreads parts threads and assign them to the team"
– Naive and passive
› Dynamic semantic– "Each thread in the team fetches an iteration (or a block of) when he's idle"
– Proactive
– Work-conservative
5
![Page 6: OpenMP dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/Didattica/HPC/SlidesPDF/09. OMP dy… · Dynamic loops ›Assign iterations to threads in a dynamic manner](https://reader034.vdocuments.site/reader034/viewer/2022042622/5f8b96fcc8d03832cf677745/html5/thumbnails/6.jpg)
Dynamic loops
› Activated using the schedule clause
6
T T TT#pragma omp parallel for num_threads (4) \
schedule(dynamic)
for (int i=0; i<16; i++)
{
/* UNBALANCED LOOP CODE */
} /* (implicit) Barrier */
15
![Page 7: OpenMP dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/Didattica/HPC/SlidesPDF/09. OMP dy… · Dynamic loops ›Assign iterations to threads in a dynamic manner](https://reader034.vdocuments.site/reader034/viewer/2022042622/5f8b96fcc8d03832cf677745/html5/thumbnails/7.jpg)
The schedule clause
› The iteration space is divided according to the schedule clause– kind can be : { static | dynamic | guided | auto | runtime }
7
#pragma omp for [clause [[,] clause]...] new-line
for-loops
Where clauses can be:
private(list)
firstprivate(list)
lastprivate(list)
linear(list[ : linear-step])
reduction(reduction-identifier : list)
schedule([modifier [, modifier]:]kind[, chunk_size])
collapse(n)
ordered[(n)]
nowait
![Page 8: OpenMP dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/Didattica/HPC/SlidesPDF/09. OMP dy… · Dynamic loops ›Assign iterations to threads in a dynamic manner](https://reader034.vdocuments.site/reader034/viewer/2022042622/5f8b96fcc8d03832cf677745/html5/thumbnails/8.jpg)
OMP loop schedule policies
› schedule(static[, chunk_size])– Iterations are divided into chunks of chunk_size, and chunks are assigned to
threads before entering the loop
– If chunk_size unspecified, = NITER/NTHREADS (with some adjustement…)
› schedule(dynamic[, chunk_size])– Iterations are divided into chunks of chunk_size
– At runtime, each thread requests for a new chunk after finishing one
– If chunk_size unspecified, then = 1
8
![Page 9: OpenMP dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/Didattica/HPC/SlidesPDF/09. OMP dy… · Dynamic loops ›Assign iterations to threads in a dynamic manner](https://reader034.vdocuments.site/reader034/viewer/2022042622/5f8b96fcc8d03832cf677745/html5/thumbnails/9.jpg)
Static vs. Dynamic
9
4
5
6
7
T T
0
1
2
3
1
6
7
T T
0
3
4
5
2
#pragma omp parallel for num_threads (2) \
schedule( ... )
for (int i=0; i<8; i++)
{
// ...
} /* (implicit) Barrier */
ID 0ID 1
![Page 10: OpenMP dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/Didattica/HPC/SlidesPDF/09. OMP dy… · Dynamic loops ›Assign iterations to threads in a dynamic manner](https://reader034.vdocuments.site/reader034/viewer/2022042622/5f8b96fcc8d03832cf677745/html5/thumbnails/10.jpg)
OMP loop schedule policies (cont'd)
› schedule(guided[, chunk_size])– A mix of static and dynamic
– chunk_size determined statically, assignment done dynamically
› schedule(auto)– Programmer let compiler and/or runtime decide
– Chunk size, thread mapping..
– "I wash my hands"
› schedule(runtime)– Only runtime decides according to run-sched-var ICV
– If run-sched-var = auto, then implementation defined
10
![Page 11: OpenMP dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/Didattica/HPC/SlidesPDF/09. OMP dy… · Dynamic loops ›Assign iterations to threads in a dynamic manner](https://reader034.vdocuments.site/reader034/viewer/2022042622/5f8b96fcc8d03832cf677745/html5/thumbnails/11.jpg)
Loops chunking
11
1
3
5
7
TT
0
2
4
6
schedule(static)
schedule(dynamic, NITER/NTRHD)
schedule(dynamic, 2)
schedule(dynamic, 1)
Schedule(dynamic)
2
3
6
7
TT
0
1
4
5
4
5
6
7
TT
0
1
2
3
0
1
2
3
TT
4
5
6
7
ID 1ID 0
chunk
![Page 12: OpenMP dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/Didattica/HPC/SlidesPDF/09. OMP dy… · Dynamic loops ›Assign iterations to threads in a dynamic manner](https://reader034.vdocuments.site/reader034/viewer/2022042622/5f8b96fcc8d03832cf677745/html5/thumbnails/12.jpg)
Modifiers, collapsed and ordered
› These we won't see– E.g., modifier can be : { monothonic | nonmonothonic | simd }
– Let you tune the loop and give more information to the OMP stack
– To maximize performance12
#pragma omp for [clause [[,] clause]...] new-line
for-loops
Where clauses can be:
private(list)
firstprivate(list)
lastprivate(list)
linear(list[ : linear-step])
reduction(reduction-identifier : list)
schedule([modifier [, modifier]:]kind[, chunk_size])
collapse(n)
ordered[(n)]
nowait
![Page 13: OpenMP dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/Didattica/HPC/SlidesPDF/09. OMP dy… · Dynamic loops ›Assign iterations to threads in a dynamic manner](https://reader034.vdocuments.site/reader034/viewer/2022042622/5f8b96fcc8d03832cf677745/html5/thumbnails/13.jpg)
Static vs. dynamic loops
› So, why not always dynamic?– For unbalanced workloads, they are more flexible– "For balanced workload, in the worst case, they behave like static loops!"
Not always true!
› Static loops loops have a (light) cost only before the loop– Actually, the lighter way you can distribute work in OpenMP!!– Often a performance reference..
› Dynamic loops have a cost:– For initializing the loop– For fetching a(nother) chunk of work– At the end of the loop
13
![Page 14: OpenMP dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/Didattica/HPC/SlidesPDF/09. OMP dy… · Dynamic loops ›Assign iterations to threads in a dynamic manner](https://reader034.vdocuments.site/reader034/viewer/2022042622/5f8b96fcc8d03832cf677745/html5/thumbnails/14.jpg)
OpenMP loops overhead
14
4
5
6
7
TT
0
1
3
4
1
3
5
7
0
2
4
6
2
3
6
7
0
1
4
5
4
5
6
7
TT
0
1
2
3
schedule(static)
schedule(dynamic, NITER/NTHRD)
schedule(dynamic, 2)
schedule(dynamic, 1)
schedule(dynamic)
TTTT
![Page 15: OpenMP dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/Didattica/HPC/SlidesPDF/09. OMP dy… · Dynamic loops ›Assign iterations to threads in a dynamic manner](https://reader034.vdocuments.site/reader034/viewer/2022042622/5f8b96fcc8d03832cf677745/html5/thumbnails/15.jpg)
Exercise
› Create an array of N elements– Put inside each array element its index, multiplied by '2'
– arr[0] = 0; arr[1] = 2; arr[2] = 4; ...and so on..
› Now, simulate unbalanced workload– Use both static and dynamic loops
– Each thread prints iteration index i
– What do you (should) see?
15
Let's
code!
#pragma omp parallel for schedule(...)
for (int i=0; i<NUM; i++)
{
// ...
// Simulate iteration-dependant work
volatile long a = i * 1000000L;
while(a--)
;
}
![Page 16: OpenMP dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/Didattica/HPC/SlidesPDF/09. OMP dy… · Dynamic loops ›Assign iterations to threads in a dynamic manner](https://reader034.vdocuments.site/reader034/viewer/2022042622/5f8b96fcc8d03832cf677745/html5/thumbnails/16.jpg)
How to run the examples
› Download the Code/ folder from the course website
› Compile
› $ gcc –fopenmp code.c -o code
› Run (Unix/Linux)
$ ./code
› Run (Win/Cygwin)
$ ./code.exe
16
Let's
code!
![Page 17: OpenMP dynamic loops - algo.ing.unimo.italgo.ing.unimo.it/people/andrea/Didattica/HPC/SlidesPDF/09. OMP dy… · Dynamic loops ›Assign iterations to threads in a dynamic manner](https://reader034.vdocuments.site/reader034/viewer/2022042622/5f8b96fcc8d03832cf677745/html5/thumbnails/17.jpg)
References
› "Calcolo parallelo" website
– http://hipert.unimore.it/people/paolob/pub/Calcolo_Parallelo/
› My contacts
– http://hipert.mat.unimore.it/people/paolob/
› Useful links
– http://www.openmp.org
– http://www.google.com
– http://gcc.gnu.org
17