Lecture 3: Designing Parallel Programs
Methodological Design
Designing and Building Parallel Programs
by Ian Foster
www-unix.mcs.anl.gov/dbpp
Methodological Design
Partitioning Communication Agglomeration Mapping
Methodological Design Partitioning
The computation that is to be performed and the data operated on by this computation are decomposed into small tasks.
Practical issues such as the number of processors in the target computer are ignored, and attention is focused on recognizing opportunities for parallel execution.
Partitioning
Functional Decomposition
Data Decomposition
Methodological Design
Single Program Multiple Data (SPMD) programming model
A fixed number of identical tasks are created at program startup. Each task executes the same program but operates on different data.
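The SPMD model can be illustrated with a minimal Python sketch. The names `spmd_task` and `run_spmd`, the strided data split, and the sum-of-squares workload are all hypothetical choices for illustration; threads stand in here for the processes of a real parallel machine.

```python
from concurrent.futures import ThreadPoolExecutor


def spmd_task(rank, num_tasks, data):
    """Same program for every task; only the rank (and so the data) differs."""
    chunk = data[rank::num_tasks]        # assumed decomposition: strided slice
    return sum(x * x for x in chunk)     # local work: sum of squares


def run_spmd(data, num_tasks=4):
    # A fixed number of identical tasks is created at startup.
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        return list(pool.map(lambda r: spmd_task(r, num_tasks, data),
                             range(num_tasks)))


partials = run_spmd(list(range(10)))
total = sum(partials)                    # combine the per-task results
```

Each task runs the same function and picks its own data from its rank, which is the defining feature of the model.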
Methodological Design Communication
The communication required to coordinate task execution is determined, and appropriate communication structures and algorithms are defined.
Communication
Latency (L): How long does it take to start sending a "message"? (in microseconds)
Bandwidth (B): What data rate can be sustained once the message is started? (in Mbytes/sec)
Transfer time = L + (1/B) * data size
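The cost model above can be tried out with a short sketch. The function name `transfer_time_us` and the sample latency and bandwidth figures are hypothetical, and 1 Mbyte is assumed to be 10**6 bytes, so that 1 Mbyte/sec equals 1 byte per microsecond.

```python
def transfer_time_us(latency_us, bandwidth_mb_per_s, size_bytes):
    """Transfer time = L + (1/B) * data size, returned in microseconds."""
    # With 1 Mbyte = 10**6 bytes, B Mbytes/sec is B bytes per microsecond.
    return latency_us + size_bytes / bandwidth_mb_per_s


# Hypothetical machine: L = 50 microseconds, B = 100 Mbytes/sec.
one_large = transfer_time_us(50, 100, 1_000_000)      # one 1-Mbyte message
many_small = 1000 * transfer_time_us(50, 100, 1_000)  # 1000 messages of 1 Kbyte
```

Under these assumed numbers the same amount of data costs about six times more when sent as many small messages, because the latency term is paid once per message.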
Communication
Local Communication
5-point stencil:
$X_{i,j} = (4X_{i,j} + X_{i-1,j} + X_{i+1,j} + X_{i,j-1} + X_{i,j+1})/8$
Each point $X_{i,j}$ is updated from its four neighbors $X_{i-1,j}$, $X_{i+1,j}$, $X_{i,j-1}$ and $X_{i,j+1}$.
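One update step of the 5-point stencil might look like the following sequential sketch. The function name `stencil_step` is hypothetical, and holding the boundary points fixed is an assumption, since the slide does not say how boundaries are treated.

```python
def stencil_step(x):
    """Apply one 5-point stencil update to the interior of a 2-D grid."""
    rows, cols = len(x), len(x[0])
    new = [row[:] for row in x]          # boundary values are copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = (4 * x[i][j]
                         + x[i - 1][j] + x[i + 1][j]
                         + x[i][j - 1] + x[i][j + 1]) / 8
    return new


grid = [[0, 0, 0],
        [0, 8, 0],
        [0, 0, 0]]
updated = stencil_step(grid)
```

Every new value is read from the old grid, so each point needs only its four neighbors' previous values. This is what makes the communication local.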
Communication
Global Communication
$\sum_{i=0}^{n} X_i$
Tree-structured summation of X0 ... X7:
X0 X1 X2 X3 X4 X5 X6 X7
∑(0..1) ∑(2..3) ∑(4..5) ∑(6..7)
∑(0..3) ∑(4..7)
∑(0..7)
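The tree-structured summation can be sketched as a sequential simulation. The function name `tree_sum` is hypothetical and a non-empty input is assumed. The pairs combined within one level are independent of each other, so on a parallel machine each level could run concurrently.

```python
def tree_sum(values):
    """Sum values pairwise, level by level; return (total, number of levels)."""
    level = list(values)                 # assumes at least one value
    steps = 0
    while len(level) > 1:
        # Pairs within a level are independent of one another.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:               # odd element is carried up unchanged
            nxt.append(level[-1])
        level = nxt
        steps += 1
    return level[0], steps


total, steps = tree_sum([1, 2, 3, 4, 5, 6, 7, 8])
```

For eight inputs the simulation takes three levels, compared with seven additions done one after another in a purely sequential sum.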
Communication
Static Communication: Communicating tasks do not change over time.
Dynamic Communication: Communicating tasks are determined by data computed at run time.
Communication
Dynamic Communication
B[i] = A[k[i]]
k:  1  2  1  3  4
A: 10 15 20 25 30
B: 10 15 10 20 25
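The indexed read B[i] = A[k[i]] can be reproduced with a short sketch. The function name `gather` is hypothetical. The indices in k are treated as 1-based, which is what the values on the slide imply.

```python
def gather(a, k):
    """Return b with b[i] = a[k[i]], where the indices in k are 1-based."""
    return [a[idx - 1] for idx in k]


A = [10, 15, 20, 25, 30]
k = [1, 2, 1, 3, 4]          # only known once it has been computed at run time
B = gather(A, k)
```

If A were distributed with one element per task, the task computing B[i] would need data from whichever task holds A[k[i]], so the communication partners depend on the contents of k.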
Methodological Design Agglomeration
The task and communication structures defined in the first two stages of a design are evaluated with respect to performance requirements and implementation costs.
If necessary, tasks are combined into larger tasks to improve performance or to reduce development costs.
Agglomeration
Goals are:
Increasing granularity (fine grain / coarse grain)
Decreasing communication/computation ratio
Agglomeration
$X_{i,j} = (4X_{i,j} + X_{i-1,j} + X_{i+1,j} + X_{i,j-1} + X_{i,j+1})/8$
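The effect of agglomeration on the communication/computation ratio for this stencil can be estimated with a small sketch. It assumes square blocks of grid points, an interior block with four neighboring blocks, and counts values received per update step; the function name `comm_comp_ratio` is hypothetical.

```python
def comm_comp_ratio(block):
    """Values received per point updated, for a block x block task."""
    computed = block * block     # points updated by the task in one step
    received = 4 * block         # one edge of `block` values from each neighbor
    return received / computed


ratios = {b: comm_comp_ratio(b) for b in (1, 4, 16)}
```

Under these assumptions the ratio is 4/block, so it falls as the block grows: communication scales with the perimeter of the block while computation scales with its area.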
Methodological Design Mapping
Each task is assigned to a processor in a manner that minimizes the total execution time.
Mapping can be specified statically or determined at runtime by load-balancing algorithms.
Mapping
If each task performs the same amount of computation and communicates:
Cyclic Mapping
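A cyclic mapping can be sketched in a few lines. The block mapping shown next to it is not named on the slide and is included only as an assumed point of comparison; both function names are hypothetical, and tasks are numbered along one dimension for simplicity.

```python
def cyclic_mapping(num_tasks, num_procs):
    """Task t goes to processor t mod P (round-robin)."""
    return [t % num_procs for t in range(num_tasks)]


def block_mapping(num_tasks, num_procs):
    """Contiguous runs of tasks go to the same processor (for comparison)."""
    size = -(-num_tasks // num_procs)    # ceiling division
    return [t // size for t in range(num_tasks)]


cyclic = cyclic_mapping(8, 4)
block = block_mapping(8, 4)
```

The cyclic form spreads neighboring tasks across processors, which can even out load when the work per task varies with position. The block form keeps neighbors together, which tends to reduce communication.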
Mapping
The goal of mapping algorithms is to minimize total execution time.
Mapping algorithms attempt to satisfy the competing goals of:
maximizing processor utilization
minimizing communication costs
Mapping is NP-complete.
Mapping
Two strategies to achieve this goal:
Place tasks that are able to execute concurrently on different processors.
Place tasks that communicate frequently on the same processor.
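The tension between the two strategies can be made concrete with a toy cost measure. This is a sketch under simple assumptions, not a mapping algorithm: every task is taken to cost one unit of work, every edge in the task graph to cost one message, and the name `mapping_cost` and the four-task chain are hypothetical.

```python
from collections import Counter


def mapping_cost(mapping, edges):
    """Return (max tasks on any processor, edges that cross processors)."""
    max_load = max(Counter(mapping).values())
    cut = sum(1 for a, b in edges if mapping[a] != mapping[b])
    return max_load, cut


chain = [(0, 1), (1, 2), (2, 3)]                     # 4 tasks in a line
all_on_one = mapping_cost([0, 0, 0, 0], chain)       # no communication, no concurrency
neighbors_together = mapping_cost([0, 0, 1, 1], chain)
neighbors_apart = mapping_cost([0, 1, 0, 1], chain)
```

Putting every task on one processor removes all communication but leaves nothing running concurrently. Spreading tasks lowers the maximum load but raises the number of cross-processor edges unless communicating tasks are kept together.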
Mapping
Consider the topology
Methodological Design
Case Study: Atmosphere Model
Partitioning
Nx x Ny x Nz tasks
Communication
9-point stencil in horizontal dimension
3-point stencil in vertical dimension
Communication
Computations
1. Global operations: a total-mass computation over the mass at each grid point (i,j,k).
2. Physics calculations: the total clear sky (TCS) at level k≥1 is defined in terms of cld_i, the cloud fraction at level i, where level 0 is the top of the atmosphere.
In total, the physics component of the model requires on the order of 30 communications per grid point and per time step.
Communication
Each task obtains data from eight other tasks.
Only 4 communications are required per task.
Agglomeration
The vertical dimension requires communication for the 3-point stencil (2 messages) and for the various "physics" computations (30 messages).
These communications can be avoided by agglomerating tasks within each vertical column.
Therefore Nx x Ny tasks will be created.
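The bookkeeping behind this agglomeration step can be checked with a short sketch. It assumes one task per vertical column after agglomeration, and takes the slide's message counts (2 for the vertical stencil, 30 for physics) as per-task, per-time-step figures. The function name `agglomerate_columns` and the grid dimensions are hypothetical.

```python
def agglomerate_columns(nx, ny, nz, stencil_msgs=2, physics_msgs=30):
    """Return (tasks before, tasks after, vertical messages avoided per task)."""
    tasks_before = nx * ny * nz          # one task per grid point
    tasks_after = nx * ny                # one task per vertical column
    avoided = stencil_msgs + physics_msgs
    return tasks_before, tasks_after, avoided


before, after, avoided = agglomerate_columns(nx=4, ny=3, nz=5)
```

With all levels of a column inside one task, the vertical stencil and physics exchanges become local memory accesses rather than messages.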
Mapping
Methodological Design
Agglomeration
Increase granularity
Decrease communication/computation ratio