Porting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen II
Post on 15-Feb-2017
KAUST Supercomputing Laboratory
Porting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen II
George Markomanolis
Computational Scientist
June 23rd, 2016
Outline
KAUST King Abdullah University of Science and Technology 2
❖ Introduction
❖ Test case
❖ Reveal
Introduction - Components of CrayPat
❖ Module perftools-base
• pat_build – Instruments the program to be analyzed
• pat_report – Generates text reports from the performance data captured during program execution and exports data for use in other programs
• Cray Apprentice2 – A graphical analysis tool that can be used to visualize and explore the performance data captured during program execution
• Reveal – A graphical source code analysis tool that can be used to correlate performance analysis data with annotated source code listings, to identify key opportunities for optimization (it works only with the Cray compiler)
Case study
❖ Application from the seismic group related to an acoustic wave solver
• Why this application? A user asked for it
• MPI application
• Test on 3 nodes with a total of 96 cores on Shaheen II
Prepare for the tutorial
• Connect to Shaheen II and copy the material:
• ssh -X username@shaheen.kaust.edu.sa
• cp /scratch/tmp/model_reveal.tgz .
• tar zxvf model_reveal.tgz
• cd model_reveal
• Reservation name: k1056_141
Reveal
A tool to port your application to the OpenMP programming model
❖ Reveal is Cray’s next-generation integrated performance analysis and code optimization tool.
• Source code navigation using whole program analysis (data provided by the Cray compilation environment only)
• Coupling with performance data collected during execution by CrayPAT. Understand which high-level serial loops could benefit from parallelism.
• Enhanced loop mark listing functionality
• Dependency information for targeted loops
• Assist users in optimizing code by providing variable scoping feedback and suggested compile directives
Prepare for Reveal
❖ Load Perftools
• module unload darshan
• module load perftools-base/6.3.2
• module load perftools/6.3.2
❖ Execute the MPI version
• cd model_reveal
• make clean
• make
• In the submit.sh file, change to your account number and submit the job
§ sbatch submit.sh
• tail -n 10 testdata.XXX.err
§ 1m46.361s
Reservation: k1056_141
Prepare the application for Reveal
❖ Compile the version for Reveal tool
• make clean -f Makefile_reveal
• In the Makefile_reveal file
§ $(CC) -h profile_generate -hpl=data.pl -h noomp $< -o $@ $(CFLAGS)
§ ${CC} -h profile_generate -hpl=data.pl -h noomp -c $< CrayData.c
§ Reveal needs the object files, so modify the Makefile if needed
• make -f Makefile_reveal
• The folder data.pl is created in the working folder
• Instrument your application
§ pat_build -w CrayData.exe
§ The new executable is called CrayData.exe+pat; use it in submit.sh in place of the original executable
Submit the job for Reveal tool
❖ Submit your job script and do not forget the reservation name (--reservation=…)
• sbatch submit.sh
❖ A performance file (extension .xf) is created; if not, something went wrong in the previous steps
❖ Generate the report and the ap2 file
• pat_report -o report.txt CrayData.exe+pat+58072-37t.xf
❖ Execute Reveal
• reveal data.pl CrayData.exe+pat+58072-37t.ap2
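The report and Reveal steps can be chained in a small wrapper. This is a minimal sketch, assuming the .ap2 file shares its base name with the .xf file, as in the file names on this slide. The helper name ap2_name and the variable XF_FILE are hypothetical, and the Cray commands run only when pat_report is on the PATH and the .xf file exists; otherwise the script just prints what it would run.

```shell
#!/bin/sh
# Derive the .ap2 file name from an .xf file name
# (assumption: same base name, as in the slide's example).
ap2_name() {
    printf '%s\n' "${1%.xf}.ap2"
}

# Hypothetical default: the .xf name shown on the slide; yours will differ.
XF_FILE="${XF_FILE:-CrayData.exe+pat+58072-37t.xf}"
AP2_FILE="$(ap2_name "$XF_FILE")"

if command -v pat_report >/dev/null 2>&1 && [ -e "$XF_FILE" ]; then
    # Generate the text report and the .ap2 file.
    pat_report -o report.txt "$XF_FILE"
    # Open Reveal with the program library and the performance data.
    reveal data.pl "$AP2_FILE"
else
    echo "Would run: pat_report -o report.txt $XF_FILE"
    echo "Would run: reveal data.pl $AP2_FILE"
fi
```

Set XF_FILE to the name of the performance file your own job produced before running the script.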
Reveal – Loop Performance
Reveal – Scoping
Reveal – Program View
Reveal – Function View
Reveal – Array View
Reveal – Compiler Messages
Reveal – Loop Performance
Reveal – Scoping Tool
Reveal – Scoping Results
Reveal – OpenMP pragmas
Reveal – Inserted OpenMP pragmas
Clean the code from unresolved issues and observe OpenMP pragmas
❖ vim CrayData.c
❖ Remove the lines with unresolved, only if you are sure.
#pragma omp parallel for default(none) \
        private (i1,i2,u) \
        shared (nxpad,nzpad)
#pragma omp parallel for default(none) \
        private (ix,ib,ibz) \
        shared (nxpad,nb,nzpad,bndr,p0) \
        lastprivate (w)
Check an OpenMP pragma and its validation
#pragma omp parallel for default(none) private (ix,ib,ibz) \
        shared (nxpad,nb,nzpad,bndr,p0) \
        lastprivate (w)
for(ix=0; ix<nxpad; ix++) {
    for(ib=0; ib<nb; ib++) {
        w = bndr[nb-ib-1];
        ibz = nzpad-ib-1;
        p0[ix][ib ] *= w; /* top sponge */
        p0[ix][ibz] *= w; /* bottom sponge */
    }
}
for(ib=0; ib<nb; ib++) {
    ibx = nxpad-ib-1;
    for(iz=0; iz<nzpad; iz++) {
        p0[ib ][iz] *= w; /* left sponge */
        p0[ibx][iz] *= w; /* right sponge */
    }
}
Clean the code from unresolved issues, compile and run
❖ vim CrayData.c
❖ Remove the lines with unresolved if you are sure.
❖ Compile your application with MPI and OpenMP
• make -f Makefile_omp
• The new executable is called CrayData_omp.exe
• Comment out the active srun line in submit.sh and uncomment the next srun call.
• Also uncomment the line with OMP_NUM_THREADS=2
• Now we will execute the application with 48 MPI processes (ntasks) and 2 threads per MPI process (cpus-per-task)
• srun --ntasks=48 --ntasks-per-node=16 --ntasks-per-socket=8 --hint=nomultithread --cpus-per-task=2 ./CrayData_omp.exe
Different cases and results
❖ Results for 2 threads
• Change accordingly:
§ export OMP_NUM_THREADS=2
§ srun --ntasks=48 --ntasks-per-node=16 --ntasks-per-socket=8 --hint=nomultithread --cpus-per-task=2 ./CrayData_omp.exe
• 51.211s (2.86X)
❖ Results for 4 threads
• Change accordingly:
§ export OMP_NUM_THREADS=4
§ srun --ntasks=24 --ntasks-per-node=8 --ntasks-per-socket=4 --hint=nomultithread --cpus-per-task=4 ./CrayData_omp.exe
• 24.815s (5.9X)
❖ Results for 8 threads
• 12.222s (11.98X)
❖ Results for 16 threads
• Change accordingly:
§ export OMP_NUM_THREADS=16
§ srun --ntasks=6 --ntasks-per-node=2 --ntasks-per-socket=1 --hint=nomultithread --cpus-per-task=16 ./CrayData_omp.exe
• 8.895s (16.45X)
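The srun lines for the different thread counts follow one pattern: the 96 cores stay fully used while the MPI task counts shrink as the threads per task grow. The sketch below reproduces that pattern, assuming 3 nodes with 32 cores each and 16 cores per socket, as implied by the srun lines for 2, 4 and 16 threads. The function name srun_line is hypothetical, and the 8-thread line is derived from the pattern; it is not shown in the slides.

```shell
#!/bin/sh
# Node geometry implied by the slides (96 cores on 3 nodes, two sockets
# per node); treat these values as assumptions on any other system.
NODES=3
CORES_PER_NODE=32
CORES_PER_SOCKET=16

# Print the srun line for a given number of OpenMP threads per MPI process.
srun_line() {
    threads=$1
    echo "srun --ntasks=$((NODES * CORES_PER_NODE / threads))" \
         "--ntasks-per-node=$((CORES_PER_NODE / threads))" \
         "--ntasks-per-socket=$((CORES_PER_SOCKET / threads))" \
         "--hint=nomultithread --cpus-per-task=$threads ./CrayData_omp.exe"
}

# Show the thread setting and the matching srun line for each case.
for t in 2 4 8 16; do
    echo "export OMP_NUM_THREADS=$t"
    srun_line "$t"
done
```

The script only prints the commands, so it can be run on a login node to check the numbers before editing submit.sh.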
The original version was improved 19.19 times
[Bar chart: Execution time. Time (in sec.): Original version 170.67, Optimized MPI version 106.36, MPI+OpenMP 8.895]
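The headline factor can be checked from the three execution times in the chart. This is a minimal sketch using awk for the floating-point division; the function name speedup is hypothetical. The first ratio reproduces the 19.19 in the slide title.

```shell
#!/bin/sh
# Ratio of a baseline time to a new time, rounded to two decimals.
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

# Times in seconds, taken from the execution time chart.
echo "Original vs MPI+OpenMP:      $(speedup 170.67 8.895)x"
echo "Original vs optimized MPI:   $(speedup 170.67 106.36)x"
echo "Optimized MPI vs MPI+OpenMP: $(speedup 106.36 8.895)x"
```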
Validation
[Figure panels: Original version; Optimized MPI+OpenMP]
Summary
❖ Reveal is an easy-to-use tool
❖ The user should be careful, though, and pay attention to compiler messages
❖ You can achieve a great speedup with this tool
❖ We need to investigate more complicated applications
KAUST Supercomputing Laboratory