openmp examples - part 1 · #pragma omp barrier performs a barrier synchronization between all the...


Post on 16-Mar-2020



OpenMP Examples - Part 1

Mirto Musci, PhD Candidate

Department of Computer Science, University of Pavia

Processors Architecture Class, Fall 2011

Mirto Musci, PhD Candidate OpenMP Examples - Part 1


Outline

1 Recap: Syntax, Parallelization Constructs, Data Environment, Synchronization

2 Examples: Basic, Bug Fixing

3 Assignments: Assignment 1: Pi, Assignment 2: Quicksort


OpenMP Syntax

Most OpenMP constructs are compiler directives (pragmas):

#pragma omp construct [clause [clause] ...]

(Fortran: !$OMP, not covered here)

An OpenMP construct applies to a structured block, usually enclosed by { }

In addition:

Several omp_<something> runtime library function calls
Several OMP_<something> environment variables


Controlling OpenMP Behavior

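Runtime library functions and, for some of them, matching environment variables:

omp_set_num_threads(int) / omp_get_num_threads()

omp_set_num_threads sets the number of threads used by subsequent parallel regions (the maximum, in case of dynamic adjustment); omp_get_num_threads returns the size of the current team (1 in sequential code)

Call omp_set_num_threads from sequential code: it affects the parallel regions that follow

Can also be set by the OMP_NUM_THREADS environment variable

omp_get_thread_num()

Returns the ID of the calling thread within its team, from 0 (the master thread) to omp_get_num_threads() - 1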


Controlling OpenMP Behavior II

omp_get_num_procs()

How many processors are currently available?

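omp_set_nested(int) / omp_get_nested()

Enable (or query) nested parallelism; deprecated since OpenMP 5.0 in favor of omp_set_max_active_levels()

omp_in_parallel()

Am I currently running inside an active parallel region?

omp_get_wtime()

A portable way to measure wall-clock time: it returns elapsed seconds, so the difference of two calls times a region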


Parallel Regions

Main construct: #pragma omp parallel

Defines a parallel region over a structured block of code

A team of threads is created when the "parallel" pragma is encountered

Threads wait at the end of the region (implicit barrier)


Work Sharing: For

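Used to assign each thread an independent set of loop iterations

Threads wait at the end (implicit barrier, unless the nowait clause is given)

Can combine the directives:

#pragma omp parallel for

Only simple kinds of for loops:

Only one signed integer variable (OpenMP 3.0 and later also allow unsigned integers, pointers and C++ random-access iterators)
Initialization: var = init
Comparison: var op last, with op one of <, >, <=, >=
Increment: var++, var--, var += incr, var -= incr, etc.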
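A loop in exactly this canonical form, sketched as a hypothetical vector-add helper: one integer loop variable, a simple initialization, comparison and increment, and independent iterations.

```c
/* Hypothetical helper: c[i] = a[i] + b[i]. The loop has the canonical form
 * listed above: int i = 0; i < n; i++. */
void vec_add(int n, const double *a, const double *b, double *c)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];   /* each iteration writes a different c[i] */
}
```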


Work Sharing: Sections

    answer1 = long_computation_1();
    answer2 = long_computation_2();
    if (answer1 != answer2) { ... }

How to parallelize? These are just two independent computations!

    /* sections must be inside a parallel region: "parallel sections" combines both */
    #pragma omp parallel sections
    {
        #pragma omp section
        answer1 = long_computation_1();
        #pragma omp section
        answer2 = long_computation_2();
    }
    if (answer1 != answer2) { ... }
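A runnable version of the same pattern. The bodies of long_computation_1 and long_computation_2 are hypothetical stand-ins (a sum and a sum of squares), chosen only so the results are easy to check.

```c
/* Hypothetical stand-ins for the slide's two independent computations. */
static long long_computation_1(void)
{
    long s = 0;
    for (long k = 1; k <= 1000; k++) s += k;        /* 1 + 2 + ... + 1000 */
    return s;
}

static long long_computation_2(void)
{
    long s = 0;
    for (long k = 1; k <= 1000; k++) s += k * k;    /* 1^2 + ... + 1000^2 */
    return s;
}

void run_sections(long *answer1, long *answer2)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        *answer1 = long_computation_1();   /* may run on one thread ... */
        #pragma omp section
        *answer2 = long_computation_2();   /* ... while this runs on another */
    }   /* implicit barrier: both answers are ready here */
}
```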


Schedule Clause: Controlling Work Distribution

schedule(static [, chunksize])

Default: chunks of approximately equal size, one per thread
If there are more chunks than threads: chunks are assigned round-robin to the threads
Why might we want to use chunks of different size?

schedule(dynamic [, chunksize])

Threads receive chunk assignments dynamically: an idle thread grabs the next chunk
Default chunk size = 1 (why?)

schedule(guided [, chunksize])

Start with large chunks
Threads receive chunks dynamically
Chunk size decreases exponentially, down to chunksize


Data Visibility

Shared-memory programming model

Most variables (including locals declared before the parallel region) are shared by default (unlike Pthreads!)

    {
        int sum = 0;
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            sum += i;   /* sum is shared: these unsynchronized updates are a data race */
    }

Global variables are shared

Some variables can be private:

Automatic variables declared inside the statement block
Automatic variables in the called functions
The loop variable of a parallel for
Variables explicitly declared as private: in that case, a local copy is created for each thread


Overriding Storage Attributes

private:

A copy of the variable is created for each thread

No connection between the original variable and the private copies (the copies start uninitialized)

Can achieve the same using variables declared inside { }

    int i;

    #pragma omp parallel for private(i)
    for (i = 0; i < n; i++) { ... }


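Overriding Storage Attributes II

firstprivate:

Same, but each private copy is initialized from the original variable

lastprivate:

Same, but the value from the sequentially last iteration is copied back to the original variable

    int idx = 1;
    int x = 10;
    int i;

    #pragma omp parallel for firstprivate(x) lastprivate(idx)
    for (i = 0; i < n; i++) {
        if (data[i] == x) idx = i;
    }
    /* Caution: idx receives the private copy from the last iteration
       (i == n-1). If that iteration does not assign idx, the value of
       idx after the loop is unspecified. */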
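A sketch where both clauses are well defined: the lastprivate variable is assigned in every iteration, so the value copied back is the one from i == n-1. The helper name shifted_last and its parameters are hypothetical.

```c
/* Hypothetical helper. offset is firstprivate: every thread's copy starts at
 * the original value. last is lastprivate: after the loop it holds the value
 * assigned in the sequentially last iteration (i == n - 1). */
int shifted_last(int n, const int data[], int out[], int offset)
{
    int last = -1;

    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (int i = 0; i < n; i++) {
        out[i] = data[i] + offset;   /* private copy initialized from the original */
        last = out[i];               /* assigned in every iteration: well defined */
    }
    return last;                     /* equals data[n-1] + offset */
}
```

Since offset is only read here, leaving it shared would also work; firstprivate matters when each thread goes on to modify its own copy.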


Single

#pragma omp single

Only one of the threads will execute the following block of code

The rest will wait for it to complete (implicit barrier at the end, unless nowait is given)
Good for non-thread-safe regions of code (such as I/O)
Must be used in a parallel region
Cannot be closely nested inside a for or sections work-sharing region


Master

#pragma omp master

The following block will be executed by the master thread (thread 0) only

No synchronization involved: no implied barrier on entry or exit

Applicable only inside parallel regions

    #pragma omp parallel
    {
        do_preprocessing();

        #pragma omp single
        read_input();
        #pragma omp master
        notify_input_consumed();

        do_processing();
    }


Critical Sections

#pragma omp critical [(name)]

Standard critical section functionality: only one thread at a time executes the block

Critical sections are global in the program

Can be used to protect a single resource in different functions

Critical sections are identified by their name

All unnamed critical sections are mutually exclusive throughout the program
All critical sections having the same name are mutually exclusive with each other

    int x = 0;
    #pragma omp parallel shared(x)
    {
        #pragma omp critical
        x++;
    }
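A sketch of named critical sections: two counters guarded by two differently named sections, so an update to one never waits for an update to the other. The helper name count_events and the section names are hypothetical.

```c
/* Hypothetical example: counts even and odd indices in [0, n) with two
 * independently named critical sections. */
void count_events(int n, int *evens, int *odds)
{
    *evens = 0;
    *odds = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0) {
            #pragma omp critical (even_counter)
            *evens += 1;
        } else {
            #pragma omp critical (odd_counter)
            *odds += 1;
        }
    }
}
```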


Atomic Execution

Critical sections on the cheap

Protects a single variable update
Can be much more efficient (a dedicated assembly instruction on some architectures)

#pragma omp atomic
update_statement

The update statement is one of: var = var op expr, var op= expr, var++, var--, ++var, --var

The variable must be a scalar
The operation op is one of: +, -, *, /, ^, &, |, <<, >>
The evaluation of expr is not atomic!
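A sketch of atomic on a scalar array element: a histogram where each bin update is a single "var += expr". The helper name histogram is hypothetical, and values are assumed non-negative.

```c
/* Hypothetical example: counts values (assumed >= 0) into nbins bins. */
void histogram(int n, const int values[], int nbins, int bins[])
{
    for (int b = 0; b < nbins; b++)
        bins[b] = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        int b = values[i] % nbins;   /* computed privately, outside the atomic */
        #pragma omp atomic
        bins[b] += 1;                /* only this update is atomic */
    }
}
```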


Ordered

#pragma omp ordered
statement

Executes the statement in the sequential order of iterations; the enclosing loop directive must have the ordered clause

Example:

    #pragma omp parallel for ordered
    for (j = 0; j < N; j++) {
        int result = heavy_computation(j);
        #pragma omp ordered
        printf("computation(%d) = %d\n", j, result);
    }
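A checkable variant of the example: the per-iteration work runs in parallel, but the results are appended to an array strictly in iteration order. The helper name squares_in_order is hypothetical, and j * j stands in for heavy_computation(j).

```c
/* Hypothetical example: out[] receives 0, 1, 4, 9, ... in iteration order. */
int squares_in_order(int n, int out[])
{
    int count = 0;   /* shared position in out[] */

    #pragma omp parallel for ordered schedule(dynamic)
    for (int j = 0; j < n; j++) {
        int result = j * j;          /* stands in for heavy_computation(j) */
        #pragma omp ordered
        out[count++] = result;       /* one iteration at a time, j = 0, 1, 2, ... */
    }
    return count;
}
```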


Barrier synchronization

#pragma omp barrier

Performs a barrier synchronization between all the threads in a team at the given point: no thread proceeds until all of them have reached it.

Example:

    #pragma omp parallel
    {
        int result = heavy_computation_part1();
        #pragma omp atomic
        sum += result;
        #pragma omp barrier
        heavy_computation_part2(sum);
    }


Reduction

    for (j = 0; j < N; j++) {
        sum = sum + a[j] * b[j];
    }

How to parallelize this code?

sum is not private, but accessing it atomically is too expensive
Have a private copy of sum in each thread, then add them up

Use the reduction clause! #pragma omp parallel for reduction(+: sum)

Only the supported reduction operators can be used: +, -, *, &, |, ^, && and || (the combination order is unspecified, so the operation should be associative)
The private copy is initialized automatically to the operator's identity value (0 for +, 1 for *, ~0 for &, ...)
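The slide's loop with the reduction clause applied, wrapped in a hypothetical dot() function so it can be run and checked.

```c
/* Each thread adds into a private copy of sum (initialized to 0 for +);
 * at the end of the loop the copies are added into the original sum. */
double dot(int n, const double a[], const double b[])
{
    double sum = 0.0;

    #pragma omp parallel for reduction(+: sum)
    for (int j = 0; j < n; j++)
        sum = sum + a[j] * b[j];

    return sum;
}
```

Because the partial sums are combined in an unspecified order, floating-point results can differ in the last bits from the serial loop.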


Practical Stuff

Log in to your machine, open a terminal

Copy the whole content of the shared exercise directory to your home directory

    cp -r /home/etc/scambio/parallelismo/* ~

Open the source files with your favorite editor (e.g. gedit)

In case you need to consult the OpenMP reference manual, change your browser connection settings to use odino.unipv.it as proxy (port 8080)


Exercise 1: Hello World

Take a moment to examine the source code and note how OpenMP directives and library routines are being used.

Use the following command to compile the code:

    gcc -fopenmp omp_hello.c -o hello

To run the code, type ./hello and the program should run.

How many threads were created? Why?


Exercise 1: Hello World

Vary the number of threads and re-run Hello World

Set the number of threads to use by means of the OMP_NUM_THREADS environment variable:

    export OMP_NUM_THREADS=4

(bash syntax; in csh/tcsh use setenv OMP_NUM_THREADS 4)

Do you know other ways to set the number of threads?

Your output should look similar to the one below. The actual order of the output lines may vary.

    Hello World from thread = 0
    Number of threads = 4
    Hello World from thread = 3
    Hello World from thread = 1
    Hello World from thread = 2


Exercise 2: Environment Information

Starting from scratch, write a simple program that obtains information about your OpenMP environment.

Alternatively, you can modify the "hello" program to do this.

Using the appropriate OpenMP functions, have the master thread query and print the following:

The number of processors available
The number of threads being used
The maximum number of threads available
Whether you are in a parallel region
Whether dynamic threads are enabled
Whether nested parallelism is supported

If you need help, you can consult the omp_getEnvInfo example file.


Exercise 3: Parallel For

This example demonstrates the use of the OpenMP for work-sharing construct.

It specifies dynamic scheduling of threads and assigns a specific number of iterations (the chunk size) to be done by each thread at a time.

After reviewing the source code, compile and run the executable (assuming OMP_NUM_THREADS is still set to 4):

    gcc -fopenmp omp_workshare1.c -o workshare1
    ./workshare1 | sort

Review the output. Note that it is piped through the sort utility, which makes it easier to see how loop iterations were actually scheduled.


Exercise 3: Parallel For

Run the program a couple more times and review the output. What do you see?

Typically, dynamic scheduling is not deterministic.

Every time you run the program, different threads can run different chunks of work.

It is even possible that a thread does no work at all, because another thread is quicker and takes more work.

It might even be possible for one thread to do all of the work.


Exercise 3: Parallel For

Edit the workshare1 source file and switch to static scheduling.

Recompile and run the modified program. Notice the difference in output compared to dynamic scheduling.

Specifically, notice that thread 0 gets the first chunk, thread 1 the second chunk, and so on.

Rerun the program. Does the output change?

With static scheduling, the allocation of work is deterministic and should not change between runs.

Every thread gets work to do (as long as there are at least as many chunks as threads).

Reflect on possible performance differences between dynamic and static scheduling.


Exercise 4: Sections

This example demonstrates the use of the OpenMP sections work-sharing construct.

Note how the parallel region is divided into separate sections, each of which will be executed by one thread.

As before, compile and execute the program after reviewing it:

    gcc -fopenmp omp_workshare2.c -o workshare2
    ./workshare2

Run the program several times and observe any differences in output.


Exercise 4: Sections

Because there are only two sections, you should notice that some threads do not do any work.

You may or may not notice that the threads doing the work can vary.

For example, the first time thread 0 and thread 1 may do the work, and the next time it may be thread 0 and thread 3.

It is even possible for one thread to do all of the work.

Which thread does the work is non-deterministic in this case.


Exercise 5: Orphan Directive

This example computes a dot product in parallel.

It differs from previous examples because the parallel loop construct is orphaned:

it is contained in a subroutine outside the lexical extent of the main program's parallel region.

After reviewing the source code, compile and run the program:

    gcc -fopenmp omp_orphan.c -o orphan
    ./orphan | sort

Note the result... and the fact that this example will come back to haunt us as omp_bug6 later.
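A self-contained sketch of an orphaned loop (not the actual omp_orphan.c; the names VECLEN, dotprod and run_dotprod and the data are assumptions). The reduction variable lives at file scope, so it is shared in the parallel region the orphaned for binds to.

```c
#include <omp.h>

#define VECLEN 100   /* assumed vector length */

static float a[VECLEN], b[VECLEN];
static float sum;    /* file scope: shared in any enclosing parallel region */

/* The "omp for" is not lexically inside a parallel construct: it binds to
 * the parallel region active when dotprod() is called (or runs serially
 * if there is none). */
void dotprod(void)
{
    #pragma omp for reduction(+: sum)
    for (int i = 0; i < VECLEN; i++)
        sum += a[i] * b[i];
}

float run_dotprod(void)
{
    for (int i = 0; i < VECLEN; i++)
        a[i] = b[i] = 1.0f * i;
    sum = 0.0f;

    #pragma omp parallel
    dotprod();       /* every thread calls it; the iterations are split among them */

    return sum;      /* 0^2 + 1^2 + ... + 99^2 */
}
```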


Exercise 6: Matrix Multiply

This example performs a matrix multiply by distributing the iterations of the operation among the available threads.

After reviewing the source code, compile and run the program:

    gcc -fopenmp omp_mm.c -o matmult

Review the output. It shows which thread did each iteration and the final result matrix.

Run the program again, but this time sort the output to clearly see which threads execute which iterations:

    ./matmult | sort | grep Thread

Do the loop iterations match the schedule(static, chunk) clause for the matrix multiply loop in the code?
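A compact sketch of the same idea (not the exercise's omp_mm.c; the sizes NRA, NCA, NCB and the name matmul are assumptions): rows of the result are dealt out in chunks of chunk rows per thread, round-robin.

```c
#define NRA 4   /* rows of a (assumed size) */
#define NCA 3   /* columns of a = rows of b */
#define NCB 2   /* columns of b */

/* c = a * b; the outer loop over rows is shared among the threads. */
void matmul(double a[NRA][NCA], double b[NCA][NCB], double c[NRA][NCB], int chunk)
{
    #pragma omp parallel for schedule(static, chunk)
    for (int i = 0; i < NRA; i++) {          /* i is private: loop variable */
        for (int j = 0; j < NCB; j++) {      /* j, k, s declared inside: private */
            double s = 0.0;
            for (int k = 0; k < NCA; k++)
                s += a[i][k] * b[k][j];
            c[i][j] = s;
        }
    }
}
```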


When things go wrong...

There are many things that can go wrong when developing OpenMP programs.

The omp_bugX.X series of programs demonstrates just a few.

See if you can figure out what the problem is in each case and then fix it.

The buggy behavior will differ for each example. Some hints are provided in the next slide.

More detailed explanations are provided in the following slides.

Don't cheat!


Hints

| Code | Hint |
|---|---|
| omp_bug1 | Fails compilation. Solution provided. |
| omp_bug2 | Thread identifiers are wrong. Wrong answers. |
| omp_bug3 | Run-time error, hang. |
| omp_bug4 | Causes a segmentation fault. Script provided. |
| omp_bug5 | Program hangs. Solution provided. |
| omp_bug6 | Failed compilation. |


Explanations: omp_bug1

This exercise attempts to show the use of the combined parallel for directive.

It fails because the loop does not come immediately after the directive.

Corrections include removing all statements between the parallel for directive and the actual loop.

Logic is added to preserve the ability to query the thread id and print it from inside the loop.

Notice the use of the firstprivate clause to initialize the flag.


Explanations: omp_bug2

The bugs in this case are caused by neglecting to scope the tid and total variables as private.

By default, most OpenMP variables are scoped as shared.

These variables need to be unique for each thread.


Explanations: omp_bug3

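The run-time error is caused by the omp barrier directive in the print_results subroutine.

A barrier must be encountered by all threads of the team, so it may not be closely nested inside a work-sharing region such as sections.

In this case it is orphaned outside the calling sections block: it is lexically in print_results, but it executes inside a section, where only the thread running that section can reach it.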


Explanations: omp_bug4

OpenMP thread stack size is an implementation-dependent resource (OpenMP 3.0 and later expose it through the OMP_STACKSIZE environment variable).

In this case, the array is too large to fit into the thread stack space and causes the segmentation fault.

Solution provided: note that it is a script and will need to be "sourced".

For example: source omp_bug4fix

Be sure to examine the solution file to see what's going on.

In the last line you may need to change the name of the executable to match yours.


Explanations: omp_bug5

The problem is that the first thread acquires locka and then tries to get lockb before releasing locka.

Meanwhile, the second thread has acquired lockb and then tries to get locka before releasing lockb.

Each thread waits forever for the lock held by the other (deadlock). The solution overcomes the deadlock by using the locks correctly.


Explanations: omp_bug6

With orphaned directives, the correct scoping of variables is critical.

The error occurs because the sum variable is scoped incorrectly: the reduction variable of an orphaned for must be shared in the enclosing parallel region.

See the omp_orphan routine for one example of correct scoping.

Note that there are other ways.


Numerical Integration

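Mathematically, we know that:

$$\int_0^1 \frac{4.0}{1+x^2}\,dx = \pi$$

We can approximate the integral as a sum of rectangles:

$$\sum_{i=0}^{N-1} F(x_i)\,\Delta x \approx \pi$$

where $F(x) = \frac{4.0}{1+x^2}$, each rectangle has width $\Delta x = 1/N$, and its height $F(x_i)$ is taken at the middle of interval $i$, i.e. $x_i = (i + 0.5)\,\Delta x$.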
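A worked instance of the midpoint sum with N = 4 (chosen only to keep the arithmetic short), using F(x) = 4/(1+x^2):

$$
\begin{aligned}
\Delta x &= \tfrac{1}{4}, \qquad x_i = (i + 0.5)\,\Delta x \in \{0.125,\ 0.375,\ 0.625,\ 0.875\} \\
\sum_{i=0}^{3} F(x_i)\,\Delta x &\approx \tfrac{1}{4}\,(3.93846 + 3.50685 + 2.87640 + 2.26549) \\
&= \tfrac{1}{4}\cdot 12.58720 \approx 3.14680
\end{aligned}
$$

That is about 0.005 above π; the error shrinks as N grows, which is why the code uses num_steps = 100000.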


Serial Code

    #include <stdio.h>

    static long num_steps = 100000;
    double step, pi;

    int main(void)
    {
        int i;
        double x, sum = 0.0;

        step = 1.0 / (double) num_steps;

        for (i = 0; i < num_steps; i++) {
            x = (i + 0.5) * step;
            sum = sum + 4.0 / (1.0 + x * x);
        }
        pi = step * sum;
        printf("Pi = %f\n", pi);
        return 0;
    }

Parallelize the numerical integration code using OpenMP

What variables can be shared?

What variables need to be private?

What variables should be set up for reductions?


Parallel Code

    #include <stdio.h>

    static long num_steps = 100000;
    double step, pi;

    int main(void)
    {
        int i;
        double x, sum = 0.0;

        step = 1.0 / (double) num_steps;

        #pragma omp parallel for private(x) reduction(+:sum)
        for (i = 0; i < num_steps; i++) {
            x = (i + 0.5) * step;
            sum = sum + 4.0 / (1.0 + x * x);
        }
        pi = step * sum;
        printf("Pi = %f\n", pi);
        return 0;
    }

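Parallelization code is a one-liner!

sum is a reduction variable: each thread accumulates into a private copy, and the copies are added into the shared sum at the end

i is private since it is the loop variable; x is made private explicitly, while step and num_steps are only read and can stay shared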


Assignment

Modify the pi calculation example so that you can:

vary the number of steps at run time

Will the calculated value of pi change?

get the total time for the calculation using omp_get_wtime

Implement the computational core in a separate function, and call it varying the number of threads spawned

Observe differences in elapsed time
What happens if you use more threads than available processors?

Advanced: reimplement without using the reduction clause

Is it slower? Faster?


Quicksort Algorithm

Given an array of n elements (e.g., integers): if the array contains at most one element, return. Else:

Pick one element to use as pivot.

Partition elements into two sub-arrays:

Elements less than or equal to the pivot
Elements greater than the pivot

Recursively quicksort the two sub-arrays

Return results


Considerations

There are a number of ways to pick the pivot element

Commonly the first or last element, but this gives bad performance if the array is already ordered
A random index or the middle-point index avoids this problem

After partitioning, the sub-arrays can be stored in the original data array (in place)

Partitioning loops through the elements, swapping them
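A minimal in-place partition sketch, assuming a Lomuto-style scheme with the middle element as pivot (one of several valid choices; the helper swap_int is hypothetical). It returns the pivot's final index, which is what the serial quicksort on the next slide expects; that quicksort is repeated here so the block runs on its own.

```c
/* Hypothetical helper: swap two ints. */
static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Partitions a[lower..upper] (inclusive) around the middle element and
 * returns its final index p: a[lower..p-1] <= a[p] < a[p+1..upper]. */
int partition(int a[], int lower, int upper)
{
    int mid = lower + (upper - lower) / 2;
    swap_int(&a[mid], &a[upper]);      /* move the pivot to the end */
    int pivot = a[upper];
    int store = lower;                 /* next slot for an element <= pivot */

    for (int j = lower; j < upper; j++) {
        if (a[j] <= pivot) {
            swap_int(&a[store], &a[j]);
            store++;
        }
    }
    swap_int(&a[store], &a[upper]);    /* put the pivot in its final place */
    return store;
}

void quicksort(int a[], int lower, int upper)
{
    if (upper > lower) {
        int i = partition(a, lower, upper);
        quicksort(a, lower, i - 1);
        quicksort(a, i + 1, upper);
    }
}
```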


Example (figure not reproduced in this transcript)


Serial Code

    int partition(int a[], int lower, int upper);   /* not shown: returns the pivot's final index */

    void quicksort(int a[], int lower, int upper)
    {
        int i;
        if (upper > lower) {
            i = partition(a, lower, upper);
            quicksort(a, lower, i - 1);
            quicksort(a, i + 1, upper);
        }
    }


Assignment

Refine the serial implementation provided

Try to parallelize the code using OpenMP... it is not easy!

Try with section constructs, or experiment with tasks

Remember the code is recursive!

Call omp_set_nested(1)
Somehow limit thread spawning

Carefully measure performance with omp_get_wtime
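One possible task-based approach (OpenMP 3.0 tasks), sketched under assumptions and not a reference solution: one thread starts the recursion inside single, large sub-arrays become tasks that idle threads pick up, and sub-arrays below a cutoff are sorted serially to limit task creation. The cutoff of 1000 and the names quicksort_task and parallel_quicksort are arbitrary choices.

```c
#define CUTOFF 1000   /* assumed threshold: smaller sub-arrays are sorted serially */

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Lomuto-style partition around the middle element; returns the pivot index. */
static int partition(int a[], int lower, int upper)
{
    int mid = lower + (upper - lower) / 2;
    swap_int(&a[mid], &a[upper]);
    int pivot = a[upper], store = lower;
    for (int j = lower; j < upper; j++)
        if (a[j] <= pivot)
            swap_int(&a[store++], &a[j]);
    swap_int(&a[store], &a[upper]);
    return store;
}

static void quicksort_task(int a[], int lower, int upper)
{
    if (upper <= lower)
        return;
    int i = partition(a, lower, upper);
    if (upper - lower < CUTOFF) {            /* small: recurse serially */
        quicksort_task(a, lower, i - 1);
        quicksort_task(a, i + 1, upper);
    } else {                                 /* large: spawn two tasks */
        #pragma omp task                     /* a, lower, i are firstprivate by default */
        quicksort_task(a, lower, i - 1);
        #pragma omp task
        quicksort_task(a, i + 1, upper);
    }
}

void parallel_quicksort(int a[], int n)
{
    #pragma omp parallel
    {
        #pragma omp single                   /* one thread starts the recursion */
        quicksort_task(a, 0, n - 1);
    }   /* all tasks are guaranteed complete at this barrier */
}
```

Timing parallel_quicksort against the serial version with omp_get_wtime, for several cutoff values, is a reasonable way to explore the "limit thread spawning" trade-off.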
