dynamic performance measurement control dynamic event grouping multiple configurable counters...
TRANSCRIPT
Dynamic performance measurement control
• Dynamic event grouping• Multiple configurable counters• Selective instrumentation
Application-Level Performance Access• Incremental profile dumping• Runtime profile access
Multi-Level Performance Instrumentation and Mapping• Optimized instrumentation• Tracing library enhancement• Performance mapping profile display
System and Hardware Performance Integration• Callback registration for system and hardware counters• Trace recording of counts
Project Goals and Recent Accomplishments
Performance Technology for Tera-Class Parallel Computers: Evolution of the TAU Performance System
Application/ Library
C / C++parser
Fortran 77/90parser
C / C++IL analyzer
Fortran 77/90IL analyzer
ProgramDatabase
Files
IL IL
DUCTAPE
PDBhtml
SILOON
CHASM
TAU_instr
Programdocumentation
Applicationcomponent glue
C++ / F90interoperability
Automatic sourceinstrumentation
Program Database Toolkit (PDT) Platforms: IBM SP3, SGI Origin 2K/3K, Compaq Alpha Cluster, Cray T3E / SV-1, Linux x86 / IA-64 clusters,
Hewlett-Packard Superdome/V-class, Hitachi SR8000, NEC SX-5
Languages: C, C++, Fortran 77, F90, OpenMP, Java, Python
Thread libraries: pthreads, SGI sproc, OpenMP, Java, Windows
Communications libraries: MPI, PVM, SHMEM
Parallelism paradigms: shared memory multi-threading, distributed memory message passing, mixed-mode
Performance technologies: Dyninst dynamic instrumentation, PAPI and PCL hardware counter libraries, Opari automatic OpenMP instrumentation, EPILOG tracing library, EXPERT trace analyzer, Vampir trace visualization, Paraver trace visualization
TAU Performance System Status
SAMRAI (LLNL, Andy Wissink). TAU is being integrated in SAMRAI framework as the main performance measurement facility, replacing hand-instrumented counters and timers. TAU is included as part of the SAMRAI distribution.
OVERTURE (LLNL, Brian Miller). TAU is being used in the Overture framework for object-oriented performance measurement. Miller has also integrated TAU in the adaptive-mesh refinement simulator, AMRSim, and is using TAU for another (undisclosed) software development project.
ALPS (LLNL, James Schek). The Adaptive Laser Plasma Simulator uses SAMRAI as a software component. It gains access to TAU’s measurement support in SAMRAI as well as uses TAU directly in other ALPS code.
C-SAFE (ASCI/ASAP; Utah, Chris Johnson and Steve Parker). TAU is being integrated in the Uintah Computation Framework (UCF) for use in performance measurement and analysis of C-SAFE applications. This work also includes new performance regression analysis and results reporting tools for cross-version and cross-experiment performance evaluation.
PERC (DOE SciDAC; PERC consortium). TAU has been installed at PERC sites and is being applied in performance analysis studies of SciDAC applications, including the EVH11 astrophysics code. Oregon is an affiliate member of PERC.
VTF (ASCI/ASAP; Caltech, Julian Cummings and Michael Aivazis). TAU is being integrated in the Virtual Test Facility (VTF) infrastructure because of its sole ability to portably measure performance across HPC systems with dynamically-loaded libraries and Python-controlled execution.
OpenMP (ASCI Path Forward; Intel/KAI, Bob Kuhn; Pallas, Hans-Christian Hoppe). A performance monitoring interface for OpenMP is being defined and will be submitted to the OpenMP Architecture Review Board for standardization. A prototype has been developed and demonstrated with the TAU performance system and the EXPERT analysis tool.
SAGE (LANL, Jack Horner). TAU has been applied to performance scalability analysis of the SAGE application. TAU is the only performance system that reliably provided a portable parallel profiling capability across all ASCI platforms.
POOMA II (LANL / Code Sourcery LLC, Jeffrey Oldham and Mark Mitchel). TAU has been used as the primary tool for performance analysis throughout the POOMA project, most recently in the POOMA 2 development. TAU is the only tool able to instrument and measure the complex use of C++ templates and expression template parallelization in the POOMA framework.
TAU Applications
EPILOG
Paraver
TAU Performance System Architecture
Work packet computation events colored by task type
Distinct phases of computation can be identifed based on task
Performance Mapping in Uintah
• Utah ASCI/ASAP Center for Simulation of Accidental Fires and Explosions
• Uintah Computational Framework (UCF)
• Taskgraph macro dataflow
• Performance mapping based on task semantics
Performance Profiling of EVH1
• Enhanced Virginia Hydrodynamics #1 benchmark
• IBM SP3, 16 processors
• jRacy parallel profile display tools
TAU Integration in SAMRAI
• Structured Adaptive Mesh Refinement Application Infrastructure
• Group-based SAMRAI performance timers and counts
• Seamless integration in routine and MPI performance measurement
IntegratedOpenMP +MPI events
Threadmessagepairing
Mixed-Mode Performance Analysis (OpenMP + MPI)
• 2-D Stommel model of ocean circulation
• Jacobi iteration,5-point stencil
• Integrated OpenMP and MPI events
• Uses OpenMP performance tools interface
• Automatic instrumentation with Opari
Dr. Allen D. Malony, PIDr. Sameer Shende, Post-docRobert Bell, Research associateKai Li, Ph.D. studentLi Li, Ph.D. student
Computational Science InstituteDepartment Computer & Information ScienceUniversity of Oregon
DOE Grant No. DE FG03-01ER25501
. . .
Raw performance data
PerfDMLdata
description
Performance analysis programs
PerfDMLtranslators
Performance analysisand query toolkit
ORDB
PostgreSQL
TAU Performance Database Framework
• XML profile data representation
• Multiple experiment performance database
Performance Technology for Tera-Class Parallel Computers: Evolution of the TAU Performance System
. . .
Raw performance data
PerfDMLdata
description
Performance analysis programs
PerfDMLtranslators
Performance analysisand query toolkit
ORDBPostgreSQL
Performanceresults
Application/ Library
C / C++parser
Fortran 77/90parser
C / C++IL analyzer
Fortran 77/90IL analyzer
ProgramDatabase
Files
IL IL
DUCTAPE
PDBhtml
SILOON
CHASM
TAU_instr
Programdocumentation
Applicationcomponent glue
C++ / F90interoperability
Automatic sourceinstrumentation
JVMPI
Thread API
Eventnotification
Java program
TAU package mpiJava package
MPI profiling interface
TAU wrapper
Native MPI library
Profile DB
JNI
TAU
Allen D. Malony, PISameer Shende, Research AssociateRobert Bell, Research AssociateDOE Grant No. DE FG03-01ER25501
Performance mapping for different tasks
Mapped task performance across processes
Work packet computation events colored by task type
Distinct phases of computation can be identifed based on task
8 processes
8 processes
32 processes32 processes
32 processes
EPILOG
Paraver. . .
Raw performance data
PerfDMLdata
description
Performance analysis programs
PerfDMLtranslators
Performance analysisand query toolkit
ORDBPostgreSQL
TAU Performance Database Framework
System and Hardware Performance Integration•Callback registration for system and hardware counters•Trace recording of counts
Multi-Level Performance Instrumentation and Mapping•Optimized instrumentation•Tracing library enhancement•Performance mapping profile display