Parallel Computing—Higher-level Concepts of MPI
Post on 19-Dec-2015
MPI—Presentation Outline
• Communicators, Groups, and Contexts
• Collective Communication
• Derived Datatypes
• Virtual Topologies
Communicators, groups, and contexts
• MPI provides higher-level abstractions for creating parallel libraries:
  • Safe communication space
  • Group scope for collective operations
  • Process naming
• Communicators + groups provide:
  • Process naming (instead of IP addresses + ports)
  • Group scope for collective operations
• Contexts provide:
  • Safe communication
What are communicators?
• A data structure that contains groups (and thus processes)
• Why is it useful?
  • Process naming: ranks are names for application programmers, easier than IP addresses + ports
  • Group communication as well as point-to-point communication
• There are two types of communicators:
  • Intracommunicators: communication within a group
  • Intercommunicators: communication between two groups (the groups must be disjoint)
What are contexts?
• A unique integer:
  • An additional tag on the messages
• Each communicator has a distinct context that provides a safe communication universe:
  • A context is agreed upon by all processes when a communicator is built
• Intracommunicators have two contexts:
  • One for point-to-point communications
  • One for collective communications
• Intercommunicators also have two contexts:
  • Explained in the coming slides
Intracommunicators
• Contain one group
• Allow point-to-point and collective communications between processes within this group
• Communicators can only be built from existing communicators:
  • MPI.COMM_WORLD is the first intracommunicator to start with
• Creation of an intracommunicator is a collective operation:
  • All processes in the existing communicator must call it for it to execute successfully
• Intracommunicators can have process topologies:
  • Cartesian
  • Graph
Creating new Intracommunicators

[Figure: COMM_WORLD contains ranks 0–3; a new communicator, newComm, is created with processes 0 and 3 only, which become ranks 0 and 1 in newComm.]

```java
MPI.Init(args);
int[] incl1 = {0, 3};
Group grp1 = MPI.COMM_WORLD.Group();              // group of all processes in COMM_WORLD
Group grp2 = grp1.Incl(incl1);                    // subgroup containing ranks 0 and 3
Intracomm newComm = MPI.COMM_WORLD.Create(grp2);  // collective over COMM_WORLD
```
How do processes agree on the context for a new intracommunicator?
• Each process has a static context variable which is incremented whenever an Intracomm is created
• Each process increments this variable and sends it to all the other processes
• The maximum integer is agreed upon as the context
• An existing communicator's context is used for sending the "context agreement" messages:
  • What about MPI.COMM_WORLD? It is safe anyway, because it is the first intracommunicator and there is no chance of conflicts
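The agreement rule above can be sketched in plain Java with no MPI dependency (the class and method names here are ours, purely for illustration): every process proposes its incremented counter, and the group settles on the maximum, so all processes end up with the same new context even if their counters have drifted apart.

```java
import java.util.Arrays;

public class ContextAgreement {
    // Each process proposes its own incremented context counter; the
    // maximum proposal wins, so every process adopts the same context
    // for the new communicator.
    static int agree(int[] proposals) {
        return Arrays.stream(proposals).max().getAsInt();
    }

    public static void main(String[] args) {
        // Counters differ because some processes have created more
        // communicators than others; taking the max keeps the result unique.
        int[] proposals = {4, 7, 5, 7};
        System.out.println("agreed context = " + agree(proposals));
    }
}
```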
Intercommunicators
• Contain two groups:
  • Local group (the local process is in this group)
  • Remote group
  • The two groups must be disjoint
• Only allow point-to-point communications
• Intercommunicators cannot have process topologies
• Next slide: how to create intercommunicators
Creating intercommunicators

```java
MPI.Init(args);
int[] incl2 = {0, 2, 4, 6};
int[] incl3 = {1, 3, 5, 7};
Group grp1 = MPI.COMM_WORLD.Group();
int rank = MPI.COMM_WORLD.Rank();
Group grp2 = grp1.Incl(incl2);                    // even-ranked processes
Group grp3 = grp1.Incl(incl3);                    // odd-ranked processes
// Build an intracommunicator for each group (collective over COMM_WORLD)
Intracomm comm1 = MPI.COMM_WORLD.Create(grp2);
Intracomm comm2 = MPI.COMM_WORLD.Create(grp3);
Intercomm icomm = null;
if (rank == 0 || rank == 2 || rank == 4 || rank == 6) {
    icomm = MPI.COMM_WORLD.Create_intercomm(comm1, 0, 1, 56);
} else {
    icomm = MPI.COMM_WORLD.Create_intercomm(comm2, 1, 0, 56);
}
```

[Figure: Comm1 holds the even processes as ranks 0(a) 1(b) 2(c) 3(d) and Comm2 the odd processes as ranks 0(e) 1(f) 2(g) 3(h); in the resulting intercommunicator, newComm, each side sees its own group as the local group and the other as the remote group.]
Creating intercomms …
• What are the arguments to the Create_intercomm method?
  • Local communicator (which contains the current process)
  • local_leader (rank)
  • remote_leader (rank)
  • Tag for the messages sent during selection of contexts
• But the groups are disjoint, so how can they communicate?
  • That is where a peer communicator is required
  • At least the local_leader and remote_leader are part of this peer communicator
  • In the last figure, MPI.COMM_WORLD is the peer communicator, and processes 0 and 1 (ranks relative to MPI.COMM_WORLD) are the leaders of their respective groups
Selecting contexts for intercomms
• An intercommunicator has two contexts:
  • send_context (used for sending messages)
  • recv_context (used for receiving messages)
• In intercommunicators, processes in the local group can only send messages to the remote group
• How is the context agreed upon?
  • Each group decides its own context
  • The leaders (local and remote) exchange the contexts agreed upon
  • The greater of the two is selected as the context
[Figure: the eight processes of COMM_WORLD partitioned into Group1 and Group2, each process also holding a rank (0, 1, 2, …) within its own group.]
MPI—Presentation Outline
• Point to Point Communication
• Communicators, Groups, and Contexts
• Collective Communication
• Derived Datatypes
• Virtual Topologies
Collective communications
• Provided as a convenience for application developers:
  • Save significant development time
  • Efficient algorithms may be used
  • Stable (tested)
• Built on top of point-to-point communications
• These operations include:
  • Broadcast, Barrier, Reduce, Allreduce, Alltoall, Scatter, Scan, Allgather
  • Versions that allow displacements between the data
[Figure: broadcast, scatter, gather, allgather, and alltoall; image from the MPI standard document.]
Reduce collective operations

[Figure: five processes hold the values 1, 2, 3, 4, 5. reduce with MPI.SUM leaves the result 15 on the root process only; allreduce leaves 15 on every process. Predefined reduction operations: MPI.PROD, MPI.SUM, MPI.MIN, MPI.MAX, MPI.LAND, MPI.BAND, MPI.LOR, MPI.BOR, MPI.LXOR, MPI.BXOR, MPI.MINLOC, MPI.MAXLOC.]
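The difference between reduce and allreduce is only in who receives the combined value. A minimal plain-Java sketch of the MPI.SUM semantics (method names are ours; real MPI would do this with messages, not a local loop):

```java
import java.util.Arrays;

public class ReduceDemo {
    // Reduce with MPI.SUM semantics: one contribution per process is
    // combined into a single value (delivered to the root in real MPI).
    static int reduceSum(int[] contributions) {
        int sum = 0;
        for (int v : contributions) sum += v;
        return sum;
    }

    // Allreduce semantics: same combined value, but every process
    // receives a copy of the result.
    static int[] allreduceSum(int[] contributions) {
        int[] result = new int[contributions.length];
        Arrays.fill(result, reduceSum(contributions));
        return result;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};                  // one value per process
        System.out.println(reduceSum(data));           // root's result: 15
        System.out.println(Arrays.toString(allreduceSum(data)));
    }
}
```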
A Typical Barrier() Implementation

[Figure: eight processes (0–7) in Group A exchanging messages over time.]
• Eight processes, thus forming only one group
• Each process exchanges an integer 4 times
• Overlaps communications well
Intracomm.Bcast( … )
• Sends data from one process to all the other processes
• Code from adlib:
  • A communication library for HPJava
• The current implementation is based on an n-ary tree:
  • Limitation: broadcasts only from rank = 0
  • Generated dynamically
  • Cost: O(log2(N))
• MPICH 1.2.5 uses a linear algorithm:
  • Cost: O(N)
• MPICH2 has much improved algorithms
• LAM/MPI uses n-ary trees:
  • Limitation: broadcasts only from rank = 0
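The O(log2(N)) versus O(N) costs can be made concrete by counting communication rounds: in a binary-tree broadcast the set of informed processes doubles each round, while in a linear broadcast the root informs one process per round. A small counting sketch (names are ours, not from any MPI implementation):

```java
public class BcastCost {
    // Rounds for a binomial/binary-tree broadcast: informed processes
    // double each round, so ceil(log2(n)) rounds reach n processes.
    static int treeRounds(int n) {
        int rounds = 0, informed = 1;
        while (informed < n) {
            informed *= 2;
            rounds++;
        }
        return rounds;
    }

    // Rounds for a linear broadcast: the root sends to each of the
    // remaining n - 1 processes in turn.
    static int linearRounds(int n) {
        return n - 1;
    }

    public static void main(String[] args) {
        for (int n : new int[]{2, 8, 9, 1024}) {
            System.out.println(n + " procs: tree=" + treeRounds(n)
                               + " linear=" + linearRounds(n));
        }
    }
}
```

For 1024 processes the tree needs 10 rounds against 1023 for the linear algorithm, which is why MPICH2 and LAM/MPI moved to tree-based schemes.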
A Typical Broadcast Implementation

[Figure: an n-ary broadcast tree rooted at process 0, fanning out to the remaining processes.]
MPI—Presentation Outline
• Point to Point Communication
• Communicators, Groups, and Contexts
• Collective Communication
• Derived Datatypes
• Virtual Topologies
MPI Datatypes
• What kind (type) of data can be sent using MPI messaging?
• Basically two types:
  • Basic (primitive) datatypes
  • Derived datatypes
MPI Basic Datatypes
• MPI_CHAR
• MPI_SHORT
• MPI_INT
• MPI_LONG
• MPI_UNSIGNED_CHAR
• MPI_UNSIGNED_SHORT
• MPI_UNSIGNED_LONG
• MPI_UNSIGNED
• MPI_FLOAT
• MPI_DOUBLE
• MPI_LONG_DOUBLE
• MPI_BYTE
Derived Datatypes
• Besides basic datatypes, it is possible to communicate heterogeneous and non-contiguous data:
  • Contiguous
  • Indexed
  • Vector
  • Struct
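As an illustration of what a derived datatype describes, here is a plain-Java sketch of the gather step a Vector type performs: `count` blocks of `blocklen` elements, with consecutive blocks `stride` elements apart. This mirrors what MPI's vector type constructor describes; the class and method names are ours:

```java
import java.util.Arrays;

public class VectorType {
    // Pack a strided pattern out of a flat buffer, the way a Vector
    // derived datatype would describe it: count blocks of blocklen
    // elements, with consecutive blocks stride elements apart.
    static int[] pack(int[] buf, int offset, int count, int blocklen, int stride) {
        int[] out = new int[count * blocklen];
        int k = 0;
        for (int i = 0; i < count; i++)
            for (int j = 0; j < blocklen; j++)
                out[k++] = buf[offset + i * stride + j];
        return out;
    }

    public static void main(String[] args) {
        // A 3x3 matrix stored row-major; extract its middle column
        // (count = 3 blocks of 1 element, stride 3, starting at offset 1).
        int[] matrix = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        System.out.println(Arrays.toString(pack(matrix, 1, 3, 1, 3))); // [2, 5, 8]
    }
}
```

This is exactly the kind of non-contiguous access (a matrix column) that would otherwise need an explicit copy loop before every send.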
MPI—Presentation Outline
• Point to Point Communication
• Communicators, Groups, and Contexts
• Collective Communication
• Derived Datatypes
• Virtual Topologies
Virtual topologies
• Used to arrange processes in a geometric shape
• Virtual topologies have no necessary connection with the physical layout of machines:
  • It is possible to make use of the underlying machine architecture
• These virtual topologies can be assigned to processes in an intracommunicator
• MPI provides:
  • Cartesian topology
  • Graph topology
Cartesian topology: mapping four processes onto a 2x2 topology
• Each process is assigned a coordinate:
  • Rank 0: (0,0)
  • Rank 1: (1,0)
  • Rank 2: (0,1)
  • Rank 3: (1,1)
• Uses:
  • Calculate a rank from its grid position
  • Calculate grid positions from ranks
  • Easier to locate the ranks of neighbours
  • Applications may have communication patterns with lots of messaging between immediate neighbours

[Figure: the four processes of Comm1 laid out on a 2x2 grid, x-axis horizontal and y-axis vertical.]
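The rank/coordinate conversions listed above can be written down directly. A sketch using the slide's (x, y) convention, where rank = y * dimsX + x (class and method names are ours, not MPI API):

```java
public class CartMap {
    // Coordinates (x, y) of a rank on a dimsX-wide grid, matching the
    // slide's layout: rank 1 -> (1,0), rank 2 -> (0,1) on a 2x2 grid.
    static int[] coords(int rank, int dimsX) {
        return new int[]{rank % dimsX, rank / dimsX};
    }

    // Inverse mapping: grid position back to rank.
    static int rank(int x, int y, int dimsX) {
        return y * dimsX + x;
    }

    public static void main(String[] args) {
        for (int r = 0; r < 4; r++) {
            int[] c = coords(r, 2);
            System.out.println("rank " + r + " -> (" + c[0] + "," + c[1] + ")");
        }
    }
}
```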
Periods in Cartesian topology
• Axis 1 (the y-axis) is periodic:
  • Processes in the top and bottom rows have valid neighbours towards the top and bottom respectively
• Axis 0 (the x-axis) is non-periodic:
  • Processes in the right and left columns have undefined neighbours towards the right and left respectively

```java
periodicity[0] = false;
periodicity[1] = true;
```
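A neighbour lookup under these periodicity flags can be sketched as follows: a periodic axis wraps around, while a non-periodic axis yields "no neighbour" at the edge (we return -1 where real MPI would return a null process; the names here are ours):

```java
public class PeriodicShift {
    static final int NO_NEIGHBOUR = -1;  // stands in for MPI's null process

    // Coordinate of the neighbour at coord + disp along an axis of
    // length dim; wraps around if the axis is periodic, otherwise
    // falls off the edge.
    static int neighbour(int coord, int disp, int dim, boolean periodic) {
        int c = coord + disp;
        if (periodic) return ((c % dim) + dim) % dim;
        return (c >= 0 && c < dim) ? c : NO_NEIGHBOUR;
    }

    public static void main(String[] args) {
        // 2x2 grid from the slides: x-axis non-periodic, y-axis periodic.
        System.out.println(neighbour(1, 1, 2, false)); // off the right edge: -1
        System.out.println(neighbour(1, 1, 2, true));  // wraps around to 0
    }
}
```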
Graph topology

nnodes = 4
index  = 2, 3, 4, 6
edges  = 1, 3, 0, 3, 0, 2

[Figure: a four-node graph (nodes 0–3) with the adjacency encoded by the arrays above.]
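The index/edges arrays encode adjacency lists: index[i] is the running total of edges after node i, so node i's neighbours are the slice of edges between the previous total and index[i]. A decoding sketch (names are ours):

```java
import java.util.Arrays;

public class GraphTopology {
    // Decode MPI-style (index, edges) arrays into per-node adjacency
    // lists: node i owns the slice edges[start .. index[i]).
    static int[][] adjacency(int nnodes, int[] index, int[] edges) {
        int[][] adj = new int[nnodes][];
        int start = 0;
        for (int i = 0; i < nnodes; i++) {
            adj[i] = Arrays.copyOfRange(edges, start, index[i]);
            start = index[i];
        }
        return adj;
    }

    public static void main(String[] args) {
        // The example from the slide: nnodes=4, index=2,3,4,6, edges=1,3,0,3,0,2
        int[][] adj = adjacency(4, new int[]{2, 3, 4, 6},
                                   new int[]{1, 3, 0, 3, 0, 2});
        for (int i = 0; i < adj.length; i++)
            System.out.println(i + " -> " + Arrays.toString(adj[i]));
    }
}
```

For the slide's example this gives node 0 the neighbours {1, 3} and node 3 the neighbours {0, 2}.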
Doing Matrix Multiplication using MPI
• Just to give you an idea of how MPI-based applications are designed …
Basically how it works!

    [1 0 2]   [0 1 0]   [2 3 2]
    [2 1 0] x [0 0 1] = [0 2 1]
    [0 2 2]   [1 1 1]   [2 2 4]
Matrix Multiplication MxN ..

```java
int rank = MPI.COMM_WORLD.Rank();
int size = MPI.COMM_WORLD.Size();

if (master_mpi_process) {
    // initialize matrices M and N
    for (int i = 1; i < size; i++) {
        // send rows of matrix M to process i
    }
    // broadcast matrix N to all non-zero processes
    for (int i = 1; i < size; i++) {
        // receive rows of the resultant matrix from process i
    }
    // .. print results ..
} else {
    // receive rows of matrix M
    // call broadcast to receive matrix N
    // compute matrix multiplication for the sub-matrix (done in parallel)
    // send resultant rows back to the master process
}
```

..
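Stripped of the messaging, what each worker computes is ordinary row-times-matrix multiplication: it multiplies its assigned rows of M by the broadcast matrix N. A sequential sketch of that per-row kernel, using the 3x3 example from the earlier slide (class and method names are ours):

```java
import java.util.Arrays;

public class RowMultiply {
    // What each worker computes for one assigned row of M:
    // resultRow[j] = sum over k of row[k] * n[k][j].
    static int[] multiplyRow(int[] row, int[][] n) {
        int[] out = new int[n[0].length];
        for (int j = 0; j < out.length; j++)
            for (int k = 0; k < row.length; k++)
                out[j] += row[k] * n[k][j];
        return out;
    }

    public static void main(String[] args) {
        int[][] m = {{1, 0, 2}, {2, 1, 0}, {0, 2, 2}};
        int[][] n = {{0, 1, 0}, {0, 0, 1}, {1, 1, 1}};
        // In the MPI version each row (or block of rows) goes to a
        // different worker; here we simply loop over them.
        for (int[] row : m)
            System.out.println(Arrays.toString(multiplyRow(row, n)));
    }
}
```

The three printed rows reproduce the product matrix from the earlier slide, which is why distributing rows of M (while broadcasting all of N) is enough to parallelise the computation.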