The Mantevo Project: Tools for codesign
Richard F. Barrett, Scalable Computer Architectures, Sandia National Laboratories, NM, [email protected]
SAND 2013-2296P
ASCR X-Stack PI and coordination meeting, March 22, 2013, Lawrence Berkeley National Laboratory
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for
the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
Development Team
Daniel Barnette (Sandia), Richard Barrett (Sandia, Co-lead), David Beckingsale (U. Warwick), Jim Belak (LLNL), Mike Boulton (U. Bristol), Paul Crozier (Sandia), Doug Doerfler (Sandia), Carter Edwards (Sandia), Wayne Gaudin (AWE), Tim Germann (LANL), Si Hammond (Sandia), Andy Herdman (AWE), Mike Heroux (Sandia, Co-lead), Stephen Jarvis (U. Warwick), Paul Lin (Sandia), Justin Luitjens (Nvidia), Andrew Mallinson (U. Warwick), Simon McIntosh-Smith (U. Bristol), Sue Mniszewski (LANL), Jamal Mohd-Yusof (LANL), David Richards (LLNL), Christopher Sewell (LANL), Sriram Swaminarayan (LANL), Heidi Thornquist (Sandia), Christian Trott (Sandia), Courtenay Vaughan (Sandia), Alan Williams, and your name here.
Pillars of science: Theory, Simulation, Experiment
• Performance models
• Miniapps on testbeds
• Architecture simulators, e.g. SST
Application codes are…
• 1M+ SLOC,
• written using MPI + Fortran/C/C++ (+ OpenMP),
• depending on multiple TPLs,
• executing on multiple generations of machines from multiple vendors,
• capturing the work of generations of scientists, and
• getting mission-driven work done today!
Miniapps: Tools enabling exploration
Focus: proxy for a key app performance issue
Intent: tool for codesign (output is information)
Scope of change: any and all
Size: K SLOC
Availability: open source (LGPL)
Developer/owner: application team
Life span: until it's no longer useful
Related:
Benchmark: output is a metric to be ranked.
Compact app: application-relevant answer.
Skeleton app: inter-process comm, application “fake”
Proxy app: über notion
Note:
Goal is to tune application, not miniapp.
Assessing the predictive capabilities of miniapps: http://www.sandia.gov/~rfbarre/PAPERS/Miniapp_vv_SAND.pdf
Trinity procurement using miniapps: http://www.nersc.gov/systems/trinity-nersc-8-rfp/draft-nersc-8-trinity-benchmarks/
Mantevo project miniapps
CloverLeaf: Compressible Euler eqns on Cartesian grid, explicit, 2nd order
CoMD: Molecular dynamics, mimics SPaSM
HPCCG: Conjugate gradient solver
miniFE: Implicit finite element solver
miniGhost: FDM/FVM explicit
miniMD: Molecular dynamics (Lennard-Jones)
miniXyce: SPICE-style circuit simulator
mini”Aero”*: tbd
miniAMR*: Eulerian on structured grid with AMR
miniExDyn-FE: Explicit Dynamics (Kokkos-based)
miniITC-FE: Implicit Thermal Conduction (Kokkos-based)
miniWave*: Lagrangian FEM for the hyperbolic wave equation
PathFinder*: Data analytics
phdMesh: Explicit FEM: contact detection
* in development
Version 1.0
Snapshot of Technologies
[Table: platforms and miniapps (rows) versus technologies (columns). Individual cell entries are not recoverable here; several platform entries are marked “?”.]
Column groups: Vectorization, Threading, Node Level, Future Languages
Technologies: MPI, Intrins., CUDA, OpenACC, OpenCL, KokkosArray, Cilk+, OpenMP, TBB, ArBB/CEAN, qthreads, pthreads, MKL/Math Lib., Adv. Lang., DSLs, DP, TP
Platforms: NVIDIA (GPU), AMD (CPU), AMD (APU), Intel (CPU), Intel (MIC), IBM (BG), ARM
Miniapps and applications: miniFE, miniMD, miniGhost, LULESH, S3D, CoMD
Impact of memory speeds, Charon and miniFE
[Figure: panels for Nehalem (Charon and miniFE) and MagnyCours (miniFE). Red: FEA; blue: CG.]
Communication patterns
[Figure: panels for CTH (and miniGhost), original and re-ordered; Charon; miniFE.]
Revisiting halo exchange strategies:
CTH: Eulerian multi-material modeling application.
• 3-d, finite volume, stencil computation.
• BSP with message aggregation (BSPMA).
• miniGhost Mantevo miniapp configured as proxy.
    do i = 1, num_tsteps
        exchange boundary data (num_vars variables aggregated per message, multi-MBytes)
        do j = 1, num_vars
            compute
        end do
    end do
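
The BSPMA pattern described on this slide can be illustrated with a small in-process sketch. It is not miniGhost code: the 1-D periodic decomposition, the three-point averaging stencil, and all names (`bspma_step`, `ranks`, `msg_log`) are assumptions chosen for illustration, and no MPI is involved. Each simulated rank packs the boundary cells of all variables into one message per neighbor, and only then updates every variable.

```python
# Toy, in-process model of BSPMA time steps (no MPI involved).
# Assumptions: 1-D periodic decomposition, 3-point averaging stencil.

def bspma_step(ranks, msg_log):
    """ranks[r][v] is the list of local cells of variable v on rank r."""
    n = len(ranks)
    # Pack: one aggregated message per neighbor, holding one boundary
    # cell from every variable.
    to_right = [[var[-1] for var in rank] for rank in ranks]
    to_left = [[var[0] for var in rank] for rank in ranks]
    msg_log.extend(len(m) for m in to_right + to_left)

    new_ranks = []
    for r, rank in enumerate(ranks):
        from_left = to_right[(r - 1) % n]   # left neighbor's right edge
        from_right = to_left[(r + 1) % n]   # right neighbor's left edge
        new_rank = []
        for v, var in enumerate(rank):      # do j = 1, num_vars
            padded = [from_left[v]] + var + [from_right[v]]
            new_rank.append([
                (padded[k - 1] + padded[k] + padded[k + 1]) / 3.0
                for k in range(1, len(padded) - 1)
            ])
        new_ranks.append(new_rank)
    return new_ranks


num_ranks, num_vars, cells = 4, 3, 5
ranks = [[[float(r * cells + k + v) for k in range(cells)]
          for v in range(num_vars)] for r in range(num_ranks)]
msg_log = []
for _ in range(2):                          # do i = 1, num_tsteps
    ranks = bspma_step(ranks, msg_log)
print(len(msg_log), set(msg_log))
```

The message log records one entry per message, with its length in values. In this toy setup the message count per step depends only on the number of ranks and neighbors, while the message size grows with `num_vars`.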
CTH and miniGhost
[Figure: CTH and miniGhost on Chama and Cielo.]
Profiling shows that computation time remains constant, but more experiments are needed to make a stronger claim.
Alternative inter-node strategy: Single Variable Aggregated Faces (SVAF)
Future architectures: less global bandwidth in proportion to message injection rate and bandwidth
    do i = 1, num_tsteps
        do j = 1, num_vars
            exchange boundary data (faces of variable j)
            compute
        end do
    end do
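
The difference between aggregating all variables per neighbor (BSPMA) and sending one message per variable (SVAF) can be sketched with a simple latency-bandwidth (alpha-beta) cost model. The model, the function name `exchange_cost`, and every parameter value below are assumptions for illustration, not measurements from the talk.

```python
# Message counts and modeled communication time per time step for the
# two halo exchange strategies. alpha (latency, s) and beta (s/byte)
# are hypothetical values, not measured on Cielo or Chama.

def exchange_cost(strategy, num_vars, face_cells, neighbors=6,
                  bytes_per_cell=8, alpha=2e-6, beta=1e-9):
    face_bytes = face_cells * bytes_per_cell
    if strategy == "BSPMA":      # all variables aggregated per neighbor
        msgs = neighbors
        msg_bytes = num_vars * face_bytes
    elif strategy == "SVAF":     # one message per variable per neighbor
        msgs = neighbors * num_vars
        msg_bytes = face_bytes
    else:
        raise ValueError(strategy)
    time = msgs * (alpha + beta * msg_bytes)
    return msgs, msg_bytes, time


for name in ("BSPMA", "SVAF"):
    msgs, size, t = exchange_cost(name, num_vars=40, face_cells=128 * 128)
    print(f"{name}: {msgs} messages of {size} bytes, {t:.6f} s per step")
```

Under this model both strategies move the same total number of bytes and differ only in message count and message size. The model ignores contention and any overlap of communication with computation, which only measurements on real machines would capture.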
SVAF performance
[Figure: SVAF performance on Cielo and Chama.]
miniGhost: Difference stencils on XK6 node
[Figure annotation: Not enough memory]
miniGhost: Strong scaling on XK6, 1024³ domain, 20 variables
[Figure annotation: Begin gpu]
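
The arithmetic behind a strong-scaling run of this size can be sketched as follows. The 1024³ domain and 20 variables come from the slide; 8-byte (double precision) values, cubic process grids, a single ghost layer, and the name `per_rank` are assumptions.

```python
# Strong scaling: fixed global problem, growing number of ranks.
# Assumes 8-byte values, cubic process grids, one ghost layer.

def per_rank(global_n, num_vars, ranks_per_dim, bytes_per_value=8):
    local_n = global_n // ranks_per_dim            # cells per side per rank
    interior = local_n ** 3
    with_halo = (local_n + 2) ** 3
    mem_bytes = with_halo * num_vars * bytes_per_value
    halo_fraction = (with_halo - interior) / interior
    return local_n, mem_bytes, halo_fraction


global_bytes = 1024 ** 3 * 20 * 8
print(f"global state: {global_bytes / 2**30:.0f} GiB")
for p in (2, 4, 8, 16):
    n, mem, frac = per_rank(1024, 20, p)
    print(f"{p**3:5d} ranks: {n}^3 cells, "
          f"{mem / 2**30:.3f} GiB per rank, halo/interior = {frac:.3f}")
```

As the rank count grows, each rank holds less data but the halo becomes a larger fraction of it, so under these assumptions the boundary exchange accounts for a growing share of the work per rank.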
Summary
• Concrete steps can be taken to prepare applications for what we see impacting exascale computation.
• These steps lead to stronger performance on current and emerging architectures.
• Miniapps and testbeds provide a tractable means for exploring these issues.
Mailing lists: http://mantevo.org/mail_lists.php
On-going work
• Intel φ, Kepler, ARM, XC, Tilera, Convey, Xeon, Opteron, …
• Revolutionary programming models, languages, mechanisms