Lecture 27: Halo Exchange and Contention
William Gropp www.cs.illinois.edu/~wgropp
Unexpected Hot Spots
• Even simple operations can give surprising performance behavior.
• Examples arise even in common grid exchange patterns
• Message passing illustrates problems present even in shared memory
  ♦ Blocking operations may cause unavoidable stalls
Mesh Exchange
• Exchange data on a mesh
Sample Code
```fortran
do i = 1, n_neighbors
   call MPI_Send(edge(1,i), len, MPI_REAL, &
                 nbr(i), tag, comm, ierr)
enddo
do i = 1, n_neighbors
   call MPI_Recv(edge(1,i), len, MPI_REAL, &
                 nbr(i), tag, comm, status, ierr)
enddo
```
Deadlocks!
• All of the sends may block, waiting for a matching receive (and will, for large enough messages)
• The variation

```fortran
if (has down nbr) then
   call MPI_Send( … down … )
endif
if (has up nbr) then
   call MPI_Recv( … up … )
endif
```

sequentializes: all except the bottom process block.
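The sequentialization is easy to see in a toy timing model. The sketch below is hypothetical (not from the slides): it assumes a send completes one time unit after the destination posts its receive, and that each process posts its receive only after its own send completes; the bottom process, with no down neighbor, receives immediately.

```python
def finish_times(nprocs):
    """Toy model of the send-down-then-recv-up pattern.

    Process p sends to p+1 (its "down" neighbor) and only then receives
    from p-1.  A send completes one time unit after the destination
    posts its receive; a process posts its receive only after its own
    send has completed.  The bottom process (nprocs-1) has no down
    neighbor, so it posts its receive at time 0.
    """
    recv_posted = [None] * nprocs
    recv_posted[nprocs - 1] = 0              # bottom process receives at once
    for p in range(nprocs - 2, -1, -1):
        send_done = recv_posted[p + 1] + 1   # blocked until neighbor's receive
        recv_posted[p] = send_done           # only then post own receive
    return recv_posted

# With 8 processes, the top process cannot even post its receive until
# time 7: the exchange is fully sequentialized, O(P) instead of O(1).
print(finish_times(8))   # → [7, 6, 5, 4, 3, 2, 1, 0]
```

Each process is idle until the entire chain below it has drained, which is exactly the staircase the next slide's figure shows.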
Sequentialization
(Figure: each process starts its send, and the matched send/receive pairs then complete one after another down the chain.)
Fix 1: Use Irecv
```fortran
do i = 1, n_neighbors
   call MPI_Irecv(inedge(1,i), len, MPI_REAL, nbr(i), tag, &
                  comm, requests(i), ierr)
enddo
do i = 1, n_neighbors
   call MPI_Send(edge(1,i), len, MPI_REAL, nbr(i), tag, &
                 comm, ierr)
enddo
call MPI_Waitall(n_neighbors, requests, statuses, ierr)
```
• Does not perform well in practice. Why?
Understanding the Behavior: Timing Model
• Sends interleave
• Sends block (data larger than buffering will allow)
• Sends control timing
• Receives do not interfere with sends
• Exchange can be done in 4 steps (down, right, up, left)
Mesh Exchange - Steps 1-6

• Exchange data on a mesh (six figures, one per step, showing the exchange proceeding one step at a time)
Timeline from IBM SP
• Note that process 1 finishes last, as predicted
Distribution of Sends
Why Six Steps?
• Ordering of sends introduces delays when there is contention at the receiver
• Takes roughly twice as long as it should
• Bandwidth is being wasted
• Same thing would happen if using memcpy and shared memory
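A toy step model (hypothetical, not from the slides) reproduces the effect. Assume each process issues its sends strictly in the fixed order down, right, up, left, only its head-of-queue send is eligible each step, and each receiver can accept at most one message per step (ties broken by rank). Boundary processes, having fewer neighbors, run ahead in the ordering and contend with interior receivers, so the exchange needs more than the four steps an ideal phased schedule (everyone down, then right, then up, then left) would take.

```python
def send_order(r, c, rows, cols):
    """Neighbor coordinates in the fixed send order: down, right, up, left."""
    cand = [(r + 1, c), (r, c + 1), (r - 1, c), (r, c - 1)]
    return [(i, j) for i, j in cand if 0 <= i < rows and 0 <= j < cols]

def exchange_steps(rows, cols):
    """Steps to finish the whole halo exchange when every process sends
    in the fixed order above and each receiver accepts at most one
    message per step (lower rank wins ties)."""
    pending = {(r, c): send_order(r, c, rows, cols)
               for r in range(rows) for c in range(cols)}
    steps = 0
    while any(pending.values()):
        steps += 1
        claimed = {}                      # receiver -> sender, this step
        for src in sorted(pending):       # rank order = row-major order
            if pending[src] and pending[src][0] not in claimed:
                claimed[pending[src][0]] = src
        for dst, src in claimed.items():
            pending[src].pop(0)           # that send has completed
    return steps

# The phased schedule needs only 4 steps on this mesh; the fixed
# per-process ordering needs more because of receiver contention.
print(exchange_steps(4, 4))
```

The exact count depends on the tie-breaking rule, so this model shows the qualitative delay rather than the precise six steps measured on the SP.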
Fix 2: Use Isend and Irecv
```fortran
do i = 1, n_neighbors
   call MPI_Irecv(inedge(1,i), len, MPI_REAL, nbr(i), tag, &
                  comm, requests(i), ierr)
enddo
do i = 1, n_neighbors
   call MPI_Isend(edge(1,i), len, MPI_REAL, nbr(i), tag, &
                  comm, requests(n_neighbors+i), ierr)
enddo
call MPI_Waitall(2*n_neighbors, requests, statuses, ierr)
```
Mesh Exchange - Steps 1-4
• Four interleaved steps
Timeline from IBM SP
Note processes 5 and 6 are the only interior processors; these perform more communication than the other processors
Lesson: Defer Synchronization
• Send-receive accomplishes two things:
  ♦ Data transfer
  ♦ Synchronization
• In many cases, there is more synchronization than required
• Use nonblocking operations and MPI_Waitall to defer synchronization
• Effect still common; recently observed on Blue Waters
More Flexibility
• MPI_Waitall forces the process (strictly, the thread) to wait until all requests have completed
• At the cost of extra code complexity, can use
  ♦ MPI_Waitany – return when any one of the requests completes
  ♦ MPI_Waitsome – return all completed requests once at least one is complete
• Now available data can be processed while the rest arrives
  ♦ Works best when there is asynchronous progress by the MPI implementation
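The payoff of processing data as it arrives can be seen in a small timing model (hypothetical, not from the slides): suppose the messages complete at known times and each needs one unit of local processing. Waiting for all of them before processing, Waitall-style, serializes all the work after the last arrival; a Waitany-style loop overlaps the processing of early arrivals with the transfers still in flight.

```python
def waitall_finish(arrivals, work=1.0):
    """Process nothing until every message has arrived (MPI_Waitall style)."""
    return max(arrivals) + work * len(arrivals)

def waitany_finish(arrivals, work=1.0):
    """Process each message as soon as it arrives (MPI_Waitany style),
    overlapping local work with the transfers still in flight."""
    t = 0.0
    for a in sorted(arrivals):
        t = max(t, a) + work   # start when both we and the data are ready
    return t

arrivals = [1.0, 2.0, 3.0, 4.0]
print(waitall_finish(arrivals))   # → 8.0
print(waitany_finish(arrivals))   # → 5.0
```

The Waitany variant is never slower in this model, and the gap grows with the per-message work; in real codes it also needs the MPI implementation to make asynchronous progress, as the slide notes.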