Parallel Computing Through MPI Technologies
Author: Nyameko Lisa
Supervisors: Prof. Elena Zemlyanaya, Prof. Alexandr P. Sapozhnikov and Tatiana F. Sapozhnikov
Outline – Parallel Computing through MPI Technologies
Introduction
Overview of MPI
General Implementation Examples
Application to Physics Problems
Concluding Remarks
Introduction – Need for Parallelism
There are more stars in the sky than grains of sand on all the beaches of the world
Introduction – Need for Parallelism
It requires approximately 204 billion atoms to encode the human genome sequence
Vast number of problems from a wide range of fields have significant computational requirements
Introduction – Aim of Parallelism
Attempt to divide a single problem into multiple parts
Distribute the segments of said problem amongst various processes or nodes
Provide a platform layer to manage data exchange between multiple processes that solve a common problem simultaneously
Introduction – Serial Computation
Problem is divided into a discrete, serial sequence of instructions
Each instruction is executed individually, on a single CPU
Introduction – Parallel Computation
The same problem is distributed amongst several processes (each a program with its allocated data)
Introduction – Implementation
Main goal is to save time and hence money
– Furthermore, can solve larger problems that would deplete the resources of a single machine
– Overcome intrinsic limitations of serial computation
– Distributed systems provide redundancy, concurrency and access to non-local resources, e.g. SETI, Facebook, etc.
3 methodologies for the implementation of parallelism:
– Physical Architecture
– Framework
– Algorithm
In practice it will almost always be a combination of the above
Greatest hurdle is managing the distribution of information and data exchange, i.e. overhead
Introduction – Top 500
Japan’s K Computer (Kei = 10 quadrillion)
Currently the fastest supercomputer cluster in the world
8.162 petaflops (~8 × 10^15 calculations per second)
Overview – What is MPI?
Message Passing Interface
One of many frameworks and technologies for implementing parallelization
A library of subroutines (FORTRAN), classes (C/C++) and bindings for Python packages that mediate communication (via messages) between single-threaded processes, executing independently and in parallel
Overview – What is needed?
Common user accounts with the same password
Administrator / root privileges for all accounts
Common directory structure and paths
MPICH2 installed on all machines
– MPICH2 combines the MPI-1 and MPI-2 standards
– CH, the Chameleon portability layer, provides backward compatibility to existing MPI frameworks
Overview – What is needed?
MPICC & MPIF77 – provide the options and special libraries needed to compile and link MPI programs
MPIEXEC – initializes parallel jobs and spawns copies of the executable to all of the processes
Each process executes its own copy of the code
By convention, the root process (rank 0) is chosen to serve as the master process
General Implementation – Hello World (C++)
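The slide's source listing was shown as an image; below is a minimal sketch of a typical MPI "Hello World" in C++ (using the MPI C API from C++; the file name and compile commands are assumptions based on a standard MPICH2 installation):

```cpp
// Minimal MPI "Hello World" sketch (hello.cpp is an assumed file name).
// Compile and run, assuming a standard MPICH2 installation:
//   mpicxx hello.cpp -o hello     (mpicc / mpif77 for the C / FORTRAN versions)
//   mpiexec -n 4 ./hello
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);                    // start the MPI runtime

    int rank = 0, numProcs = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      // this process's id (0..numProcs-1)
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);  // total number of processes

    if (rank == 0) {
        // By convention, rank 0 serves as the master process
        std::cout << "Master: job spawned on " << numProcs << " processes\n";
    }
    std::cout << "Hello World from process " << rank << "\n";

    MPI_Finalize();                            // shut down the MPI runtime
    return 0;
}
```

Each process prints independently, so the order of the output lines is nondeterministic from run to run.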
General Implementation – Hello World (FORTRAN)
General Implementation – Hello World (Output)
Example – Broadcast Routine
Point-to-point (send & recv) and collective (bcast) routines are contained in the MPI library
Source node mediates distribution of data to/from all other nodes
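For reference, a call to the built-in collective broadcast might look like the following sketch (the buffer contents and size are illustrative):

```cpp
// Sketch: MPI's built-in collective broadcast. After the call, every
// process holds the root's copy of 'data'. Buffer size is illustrative.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double data[4] = {0.0, 0.0, 0.0, 0.0};
    if (rank == 0) {                      // source node fills the buffer
        for (int i = 0; i < 4; ++i) data[i] = i + 1.0;
    }
    MPI_Bcast(data, 4, MPI_DOUBLE, 0, MPI_COMM_WORLD);  // root = 0
    std::cout << "Process " << rank << " has data[0] = " << data[0] << "\n";

    MPI_Finalize();
    return 0;
}
```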
Example – Broadcast Routine (Linear Case)
Apart from the root and last nodes, each node receives from the previous node and sends to the next node
Point-to-point library routines can be used to build a custom collective routine
MPI_RECV(myProc - 1)
MPI_SEND(myProc + 1)
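A runnable C++ sketch of this linear chain (the message value and tag are illustrative):

```cpp
// Sketch: custom linear-chain broadcast built from point-to-point calls.
// Each process receives from its predecessor, then forwards to its successor.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int myProc = 0, numProcs = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &myProc);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    int msg = 0;
    const int tag = 0;
    if (myProc == 0) {
        msg = 42;                          // root originates the message
    } else {
        MPI_Recv(&msg, 1, MPI_INT, myProc - 1, tag, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);       // receive from previous node
    }
    if (myProc < numProcs - 1) {           // last node only receives
        MPI_Send(&msg, 1, MPI_INT, myProc + 1, tag, MPI_COMM_WORLD);
    }
    std::cout << "Process " << myProc << " received " << msg << "\n";

    MPI_Finalize();
    return 0;
}
```

The message must traverse the whole chain, so the linear scheme takes a number of steps proportional to the number of processes; the binary tree on the next slide reduces this to a logarithmic number of steps.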
Example – Broadcast Routine (Binary Tree)
Each parent node sends message to two child nodes
MPI_SEND(2 * myProc)
MPI_SEND(2 * myProc + 1)
IF( MOD(myProc, 2) == 0 ) MPI_RECV( myProc / 2 )
ELSE MPI_RECV( (myProc - 1) / 2 )
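The slide's formulas assume 1-based process numbers with process 1 as the root; the C++ sketch below maps MPI's 0-based ranks onto that numbering (the message value and tag are illustrative):

```cpp
// Sketch: binary-tree broadcast built from point-to-point calls.
// The slide numbers processes from 1 (root = 1); MPI ranks are 0-based,
// so we work with id = rank + 1 and convert back in the MPI calls.
#include <mpi.h>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, numProcs = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    const int id = rank + 1;               // 1-based id as on the slide
    const int tag = 0;
    int msg = 0;

    if (id == 1) {
        msg = 42;                          // root originates the message
    } else {
        // Parent is id/2 for even ids and (id-1)/2 for odd ids; both
        // reduce to integer division id/2.
        const int parent = id / 2;
        MPI_Recv(&msg, 1, MPI_INT, parent - 1, tag, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }
    // Send to child nodes 2*id and 2*id + 1, if they exist.
    const int children[2] = {2 * id, 2 * id + 1};
    for (int child : children) {
        if (child <= numProcs) {
            MPI_Send(&msg, 1, MPI_INT, child - 1, tag, MPI_COMM_WORLD);
        }
    }
    std::cout << "Process " << id << " received " << msg << "\n";

    MPI_Finalize();
    return 0;
}
```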
Example – Broadcast Routine (Output)
Applications to Physics Problems
Quadrature
– Discretize the interval [a,b] into N steps and divide these amongst the processes: FOR loop from 1 + myProc to N, in increments of numProcs
– E.g. with N = 10 and numProcs = 3 (see the sketch after this list):
  Process 0: iterations 1, 4, 7, 10
  Process 1: iterations 2, 5, 8
  Process 2: iterations 3, 6, 9
Finite Difference problems – Similarly divide mesh/grid amongst processes
Many applications, limited only by our ingenuity
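A C++ sketch of the cyclic quadrature split described above (the integrand, interval and step count are illustrative; the midpoint rule stands in for whichever quadrature formula is used):

```cpp
// Sketch: parallel quadrature with a cyclic iteration split. Each process
// handles iterations 1+myProc, 1+myProc+numProcs, ... and a reduction
// collects the partial sums on the master process.
#include <mpi.h>
#include <cmath>
#include <iostream>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int myProc = 0, numProcs = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &myProc);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    // Illustrative problem: integrate sin(x) over [0, pi] in N midpoint steps.
    const double pi = 3.14159265358979323846;
    const double a = 0.0, b = pi;
    const int N = 10;
    const double h = (b - a) / N;

    double localSum = 0.0;
    for (int i = 1 + myProc; i <= N; i += numProcs) {  // cyclic distribution
        const double x = a + (i - 0.5) * h;            // midpoint of step i
        localSum += std::sin(x) * h;
    }

    double total = 0.0;
    MPI_Reduce(&localSum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myProc == 0) {
        std::cout << "Integral ~ " << total << " (exact value: 2)\n";
    }

    MPI_Finalize();
    return 0;
}
```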
Closing Remarks
In the 1970s, Intel co-founder Gordon Moore correctly predicted that the "number of transistors that can be inexpensively placed on an integrated circuit doubles approximately every 2 years"
10-Core Xeon E7 processor family chips are currently commercially available
MPI is easy to implement and well suited to problems with many independent operations that can be executed simultaneously
The only limitations are the overhead incurred by inter-process communications, our ingenuity, and the strictly sequential segments of the program
Acknowledgements and Thanks
NRF and the South African Department of Science and Technology
JINR, University Center
Dr. Jacobs and Prof. Lekala
Prof. Elena Zemlyanaya, Prof. Alexandr P. Sapozhnikov and Tatiana F. Sapozhnikov
Last but not least, my fellow colleagues