Programming with OmpSs
Alejandro Duran
Barcelona Supercomputing Center
28th May 2011
1 Installation
2 Tasking in OpenMP
    Waiting for tasks
    The dependency clauses
3 Other OpenMP useful concepts
Alex Duran (BSC) Programming with OmpSs 28th May 2011 2 / 35
Installation
You need to install the following:
    Mercurium
    Nanos++
Decide on a TARGET installation directory:
TARGET=$HOME/soft
Installing Nanos++
1 git clone http://pm.bsc.es/git/nanox.git
2 cd nanox
3 autoreconf -f -i
4 ./configure --prefix=$TARGET
5 make install
Installing Mercurium
1 git clone http://pm.bsc.es/git/mcxx.git
2 cd mcxx
3 autoreconf -f -i
4 ./configure --prefix=$TARGET --enable-ompss
5 make install
Ready to use
1 Add to path
    PATH=$TARGET/bin:$PATH
2 Compile your application
    mcc --ompss -o app app.c
3 Run
    OMP_NUM_THREADS=4 ./app
Tasking in OpenMP
What is a task in OpenMP?
Tasks are work units whose execution may be deferred
    they can also be executed immediately
Tasks are composed of:
    code to execute
    a data environment
        initialized at creation time
Threads cooperate to execute them
Creating tasks
The task construct
#pragma omp task [clauses]
    structured-block | function-declaration | function-definition

Where clauses can be:
    Data-sharing clauses:
        shared
        private
        firstprivate
        default
    if(expression)
    untied
Data-sharing clauses
Each variable in OpenMP has a data-sharing attribute:
    shared        The task shares the variable of its parent
    private       Each task has its own copy of the variable, with an undefined value
    firstprivate  The private copy is initialized with the value of the parent variable
        This value is captured when the task is created
        This is the default for tasks
void foo()
{
    int a, b, c;

    a = 1; b = 2; c = 3;

    #pragma omp task shared(a) private(b) firstprivate(c)
    {
        printf("%d\n", a);    /* prints 1 or 2: requires synchronization */
        printf("%d\n", b);    /* prints whatever: undefined value */
        printf("%d\n", c);    /* prints 3 */
    }

    a++; b++; c++;
}
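The two deterministic cases can be isolated in a small sketch. It is not part of the original slides: it uses only the standard task and taskwait constructs, and the function names shared_with_taskwait and firstprivate_capture are hypothetical. It shows one way to remove the race on a shared variable and why a firstprivate copy is unaffected by later changes in the parent.

```c
/* Sketch only: function names are illustrative, not from the slides. */

/* The task reads the shared variable; taskwait orders it before a++. */
int shared_with_taskwait(void)
{
    int a = 1;
    int seen = 0;

    #pragma omp task shared(a, seen)
    seen = a;

    #pragma omp taskwait      /* the child task has completed here */

    a++;                      /* can no longer race with the task */
    return seen;              /* always 1 */
}

/* firstprivate captures the value at task creation, so the later
 * increment in the parent is never visible inside the task. */
int firstprivate_capture(void)
{
    int c = 3;
    int seen = 0;

    #pragma omp task firstprivate(c) shared(seen)
    seen = c;

    c++;                      /* does not affect the task's copy */

    #pragma omp taskwait      /* wait before reading seen */
    return seen;              /* always 3 */
}
```

Without the taskwait in the first function, the task could observe either 1 or 2, which is the situation the slide warns about.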
List traversal
void traverse_list(List l)
{
    Element e;
    for (e = l->first; e; e = e->next)
        #pragma omp task
        process(e);    /* e is firstprivate */
}
Task clauses
untied  Allows the task to be stolen once started
if      Allows a task to be created conditionally
Tree traversal
void postorder_traversal(Node *n, int depth)
{
    #pragma omp task if(depth < MAX_DEPTH) untied
    postorder_traversal(n->left, depth + 1);

    #pragma omp task if(depth < MAX_DEPTH) untied
    postorder_traversal(n->right, depth + 1);

    process(n->data);    /* this could be executed before the children */
}

untied: helps with load balancing
if clause: a regular if statement works better with fine-grain tasks
Waiting for tasks
Waiting for children
The taskwait construct
#pragma omp taskwait
Suspends the current task until all its child tasks are completed
    Just direct children, not descendants
Taskwait
Example
void traverse_list(List l)
{
    Element e;
    for (e = l->first; e; e = e->next)
        #pragma omp task
        process(e);

    #pragma omp taskwait
    /* all tasks guaranteed to be completed here */
}
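For readers who want to compile the traversal, here is a minimal self-contained sketch. The slides do not define List, Element or process, so the type definitions and the body of process (doubling a stored value) are assumptions made purely for illustration.

```c
#include <stddef.h>

/* Hypothetical list types: the slides do not show their definitions. */
struct element { int value; struct element *next; };
typedef struct element *Element;
struct list { Element first; };
typedef struct list *List;

/* Placeholder work: doubles the stored value. */
static void process(Element e)
{
    e->value *= 2;
}

void traverse_list(List l)
{
    Element e;
    for (e = l->first; e; e = e->next) {
        /* e is captured by value when the task is created */
        #pragma omp task firstprivate(e)
        process(e);
    }
    #pragma omp taskwait    /* every element has been processed here */
}
```

Because each task touches a different element, no further synchronization is needed; the taskwait only guarantees that the caller sees all the updates.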
Tree traversal
void postorder_traversal(Node *n, int depth)
{
    #pragma omp task if(depth < MAX_DEPTH) untied
    postorder_traversal(n->left, depth + 1);

    #pragma omp task if(depth < MAX_DEPTH) untied
    postorder_traversal(n->right, depth + 1);

    #pragma omp taskwait
    /* children completed here */
    process(n->data);
}
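A runnable variant of this pattern might look as follows. This is a sketch under assumptions: the Node layout, the sum field, the value of MAX_DEPTH, the NULL check and the function name postorder_sum are not in the slides. Each node's sum depends on its children's sums, so the taskwait is what makes the result correct.

```c
#include <stddef.h>

#define MAX_DEPTH 4    /* assumed cut-off; the slides do not give a value */

/* Hypothetical node type: sum holds data plus the sums of both subtrees. */
typedef struct node {
    int data;
    int sum;
    struct node *left, *right;
} Node;

static int child_sum(const Node *n)
{
    return n ? n->sum : 0;
}

void postorder_sum(Node *n, int depth)
{
    if (n == NULL)
        return;

    #pragma omp task if(depth < MAX_DEPTH) untied firstprivate(n, depth)
    postorder_sum(n->left, depth + 1);

    #pragma omp task if(depth < MAX_DEPTH) untied firstprivate(n, depth)
    postorder_sum(n->right, depth + 1);

    #pragma omp taskwait    /* both children are complete here */
    n->sum = n->data + child_sum(n->left) + child_sum(n->right);
}
```

Each call waits only for its own two child tasks; deeper descendants are covered because every child performs its own taskwait before it completes.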
The dependency clauses
Clauses
We have four clauses:
    input(lvalue-expression)
        specifies that the task will block until its inputs are fulfilled
    output(lvalue-expression)
        specifies that the task will release tasks waiting on its outputs
    inout(lvalue-expression)
        combination of the above
    concurrent(lvalue-expression)
Variables specified as dependencies are shared
Correctness depends on the programmer
Expressions are evaluated at runtime!
Example
x = 0;
#pragma omp task input(x)
printf("x = %d\n", x);

#pragma omp task inout(x)
x++;

#pragma omp task output(x)
x = 2;

#pragma omp task input(x)
printf("x = %d\n", x);

#pragma omp task input(x)
write(x);
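The input, output and inout clauses are OmpSs syntax. As a rough illustration for compilers without OmpSs support, the same chain can be sketched with the depend clause of standard OpenMP (4.0 and later), which plays a similar role. This is a swapped-in spelling, not the syntax of the slides, and the function name dependency_chain and its out-parameters are hypothetical.

```c
/* Sketch using the standard OpenMP depend clause (OpenMP 4.0+),
 * not the OmpSs input/output/inout spelling from the slides. */
void dependency_chain(int *first_read, int *last_read)
{
    int x = 0;
    int r1 = -1, r2 = -1;

    #pragma omp parallel shared(x, r1, r2)
    #pragma omp single
    {
        #pragma omp task depend(in: x) shared(x, r1)
        r1 = x;                 /* runs before any writer task: reads 0 */

        #pragma omp task depend(inout: x) shared(x)
        x++;                    /* waits for the reader above */

        #pragma omp task depend(out: x) shared(x)
        x = 2;                  /* ordered after the increment */

        #pragma omp task depend(in: x) shared(x, r2)
        r2 = x;                 /* ordered after the last writer: reads 2 */

        #pragma omp taskwait
    }

    *first_read = r1;
    *last_read = r2;
}
```

The two reads are ordered against the writers but not against each other, which matches the behaviour described for input dependencies.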
Extended expressions
OmpSs extends the valid expressions in the context of dependency clauses
Array sections. Two syntaxes:
    [ lower : upper ]
        A[3:5]     <=> A[3], A[4], A[5]
        A[:upper]  <=> A[0:upper]
        A[lower:]  <=> A[lower : elements of (A) - 1]
        A[:]       <=> A <=> A[0 : elements of (A) - 1]
    [ lower ; size ]
        A[3;5]     <=> A[3], A[4], A[5], A[6], A[7]
Shaping expressions
    Allow defining the shape of the memory behind a pointer
    Multiple dimensions can be specified
        [N][N] p
Right now the sections must overlap perfectly!
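The difference between the two section syntaxes is easy to get wrong, so the arithmetic is spelled out here in plain C. This is only a model of the index ranges listed above; the struct and function names are hypothetical and have nothing to do with how Mercurium represents sections internally.

```c
/* Inclusive index range covered by an array section. */
struct range { int first; int last; };

/* A[lower : upper] names elements lower..upper, both included. */
struct range section_lower_upper(int lower, int upper)
{
    struct range r = { lower, upper };
    return r;
}

/* A[lower ; size] names size elements starting at lower. */
struct range section_lower_size(int lower, int size)
{
    struct range r = { lower, lower + size - 1 };
    return r;
}

/* Number of elements in a range. */
int range_length(struct range r)
{
    return r.last - r.first + 1;
}
```

With these helpers, A[3:5] covers three elements (3 to 5) while A[3;5] covers five elements (3 to 7).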
How the graph is built
The graph is built in sequential order
    Ensures that no deadlock can occur
For each dependency:
    1 Evaluate the expression
    2 If input, look up a previous writer on the evaluated address
    3 If output/inout, look up a previous task on that address
    4 If found, connect
    5 If output/inout, become last writer
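The steps above can be sketched as a toy model in plain C. This is not the Nanos++ implementation: the data structures, the fixed capacities and every name are assumptions, and "a previous task on that address" in step 3 is interpreted here as the last writer plus any readers registered since that write.

```c
#include <string.h>

#define MAX_ADDRS 16   /* capacity limits of this toy model (unchecked) */
#define MAX_TASKS 16

enum dep_kind { DEP_INPUT, DEP_OUTPUT, DEP_INOUT };

struct addr_entry {
    const void *addr;
    int last_writer;            /* task id, or -1 if none yet */
    int readers[MAX_TASKS];     /* readers since the last write */
    int n_readers;
};

struct graph {
    struct addr_entry addrs[MAX_ADDRS];
    int n_addrs;
    int edge[MAX_TASKS][MAX_TASKS];  /* edge[p][s] != 0: s waits for p */
};

void graph_init(struct graph *g)
{
    memset(g, 0, sizeof *g);
}

/* Find the bookkeeping entry for an address, creating it if needed. */
static struct addr_entry *lookup(struct graph *g, const void *addr)
{
    int i;
    struct addr_entry *e;

    for (i = 0; i < g->n_addrs; i++)
        if (g->addrs[i].addr == addr)
            return &g->addrs[i];

    e = &g->addrs[g->n_addrs++];
    e->addr = addr;
    e->last_writer = -1;
    e->n_readers = 0;
    return e;
}

/* Step 4: connect predecessor to successor, ignoring "no task" and self. */
static void add_edge(struct graph *g, int pred, int succ)
{
    if (pred >= 0 && pred != succ)
        g->edge[pred][succ] = 1;
}

/* Register one dependency of a task; call in sequential program order. */
void add_dependency(struct graph *g, int task, enum dep_kind kind,
                    const void *addr)
{
    int i;
    struct addr_entry *e = lookup(g, addr);  /* step 1: address is evaluated */

    add_edge(g, e->last_writer, task);       /* steps 2-4: previous writer */

    if (kind == DEP_INPUT) {
        e->readers[e->n_readers++] = task;
    } else {
        for (i = 0; i < e->n_readers; i++)   /* steps 3-4: previous readers */
            add_edge(g, e->readers[i], task);
        e->n_readers = 0;
        e->last_writer = task;               /* step 5: become last writer */
    }
}
```

Because tasks are registered in program order, edges only ever point from an earlier task to a later one, so the graph cannot contain a cycle. Feeding it the five tasks of the earlier x example (input, inout, output, input, input) yields the chain 0 -> 1 -> 2 and then 2 -> 3 and 2 -> 4, with no edge between the two final readers.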
SparseLU
The problem
    Perform an LU factorization on a blocked sparse matrix

    lu0   fwd   fwd   fwd
    bdiv  bmod  bmod  bmod
    bdiv  bmod  bmod  bmod
    bdiv  bmod  bmod  bmod
for (kk = 0; kk < NB; kk++) {
    lu0(A[kk][kk]);
    for (jj = kk + 1; jj < NB; jj++)
        if (A[kk][jj] != NULL)
            fwd(A[kk][kk], A[kk][jj]);
    for (ii = kk + 1; ii < NB; ii++)
        if (A[ii][kk] != NULL)
            bdiv(A[kk][kk], A[ii][kk]);
    for (ii = kk + 1; ii < NB; ii++)
        for (jj = kk + 1; jj < NB; jj++)
            if (A[kk][jj] != NULL) {
                if (A[ii][jj] == NULL) A[ii][jj] = allocate_clean_block();
                bmod(A[ii][kk], A[kk][jj], A[ii][jj]);
            }
}
SparseLU
With dependences
for (kk = 0; kk < NB; kk++) {
    #pragma omp task inout(A[kk][kk])
    lu0(A[kk][kk]);
    for (jj = kk + 1; jj < NB; jj++)
        if (A[kk][jj] != NULL)
            #pragma omp task input(A[kk][kk]) inout(A[kk][jj])
            fwd(A[kk][kk], A[kk][jj]);
    for (ii = kk + 1; ii < NB; ii++)
        if (A[ii][kk] != NULL)
            #pragma omp task input(A[kk][kk]) inout(A[ii][kk])
            bdiv(A[kk][kk], A[ii][kk]);
    for (ii = kk + 1; ii < NB; ii++)
        for (jj = kk + 1; jj < NB; jj++)
            if (A[kk][jj] != NULL) {
                if (A[ii][jj] == NULL) A[ii][jj] = allocate_clean_block();
                #pragma omp task input(A[ii][kk], A[kk][jj]) inout(A[ii][jj])
                bmod(A[ii][kk], A[kk][jj], A[ii][jj]);
            }
}
#pragma omp taskwait
SparseLU
Generated graph
[Figure: task dependency graph generated for SparseLU]
Waiting on some data
#pragma omp taskwait on(expression)

Expressions allowed are the same as for the dependency clauses
Blocks the encountering task until the data is available

#pragma omp task output(x)
x = 3;

#pragma omp taskwait on(x)
printf("%d\n", x);
The concurrent clause
Sometimes the dependencies are too restrictive because of some known property
    E.g., reductions
The concurrent clause is a less restrictive inout clause:
    Concurrent dependencies are ordered with respect to previous outputs
    Concurrent dependencies are ordered with respect to subsequent outputs
    Concurrent dependencies are not ordered between themselves
        The task may require additional synchronization because of this ordering
#pragma omp task output([n] vec)
void generate_vector(int *vec, int n);

#pragma omp task input([n] vec) concurrent(*results)
void sum_task(int *vec, int n, int *results)
{
    int i;
    int local_sum = 0;

    for (i = 0; i < n; i++)
        local_sum += vec[i];

    #pragma omp atomic
    *results += local_sum;
}
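The concurrent clause itself is OmpSs-specific, but the reason the atomic update is needed can be seen with standard OpenMP tasks alone. The following sketch is an assumption-laden illustration: sum_chunk and sum_in_chunks are hypothetical names, chunk is assumed to be positive, and the ordering that concurrent would provide against other writers is replaced here by a plain taskwait.

```c
/* Adds the elements of vec[0..n-1] to *results. Several of these may run
 * at the same time, so the final update is atomic. */
static void sum_chunk(const int *vec, int n, int *results)
{
    int i;
    int local_sum = 0;

    for (i = 0; i < n; i++)
        local_sum += vec[i];

    #pragma omp atomic
    *results += local_sum;
}

/* Splits vec into chunks and sums each chunk in its own task.
 * chunk must be greater than zero. */
int sum_in_chunks(const int *vec, int n, int chunk)
{
    int total = 0;
    int start;

    for (start = 0; start < n; start += chunk) {
        int len = (n - start < chunk) ? n - start : chunk;

        #pragma omp task firstprivate(vec, start, len) shared(total)
        sum_chunk(vec + start, len, &total);
    }

    #pragma omp taskwait     /* all partial sums have been added */
    return total;
}
```

The tasks are free to run in any order relative to each other, which is exactly the freedom concurrent grants; the atomic keeps the unordered updates to the shared total correct.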
Dependencies & nesting
Dependencies are only computed between siblings
    There are multiple graphs in practice
Each task must specify the coarse-grain dependencies
    Its children can specify a subset of those
Dependencies across branches of tasks result in undefined behavior
Other OpenMP useful concepts
Synchronization
#pragma omp atomic
x++;

#pragma omp critical
{
    foo();
}
Only works on SMP
Loop worksharings
#pragma omp for schedule(static)
for (i = 0; i < N; i++)
    f(i);

Iterations distributed among threads
Dependencies can be applied to the whole loop
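As a small compilable illustration, the loop below uses the combined parallel for form of standard OpenMP so that it works on its own outside any parallel region. The function name fill_squares and the work done per iteration are hypothetical.

```c
/* Fills out[i] = i * i for 0 <= i < n. With schedule(static) the
 * iterations are divided among the threads in contiguous blocks. */
void fill_squares(int *out, int n)
{
    int i;

    #pragma omp parallel for schedule(static)
    for (i = 0; i < n; i++)
        out[i] = i * i;
}
```

Each iteration writes a distinct element, so the loop needs no extra synchronization.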
Loop dependencies
/* a loop with a dependency clause implies a nowait clause */
#pragma omp for output(a)
for (i = 0; i < N; i++)
    a[i] = 0;

#pragma omp task input(a)
for (i = 0; i < N; i++)
    printf("%d\n", a[i]);
API Calls
omp_get_num_threads  Returns the total number of threads
omp_get_thread_num   Returns the id of the thread
omp_get_wtime        Returns the wall-clock time
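A minimal sketch of how these three calls might be used together follows. The struct run_info and the function timed_loop are hypothetical. The serial fallbacks are an assumption added so the file also builds without OpenMP support; in that case the timer falls back to clock(), which measures CPU time rather than wall-clock time.

```c
#include <time.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP support. */
static int omp_get_num_threads(void) { return 1; }
static int omp_get_thread_num(void) { return 0; }
static double omp_get_wtime(void) { return (double)clock() / CLOCKS_PER_SEC; }
#endif

struct run_info {
    int num_threads;    /* team size seen by the caller */
    int thread_id;      /* id of the calling thread within its team */
    double seconds;     /* elapsed time of the loop */
};

/* Times a small loop and records which thread ran it. */
struct run_info timed_loop(int n)
{
    struct run_info info;
    volatile long acc = 0;
    double start = omp_get_wtime();
    int i;

    for (i = 0; i < n; i++)
        acc += i;

    info.seconds = omp_get_wtime() - start;
    info.num_threads = omp_get_num_threads();
    info.thread_id = omp_get_thread_num();
    return info;
}
```

Under standard OpenMP, calling timed_loop outside a parallel region reports a team of one thread with id 0; inside a parallel region each thread would report its own id.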