the tau performance systemthe tau performance technology for complex parallel systems (performance...
TRANSCRIPT
The TAU Performance Technology for Complex Parallel Systems
(Performance Analysis Bring Your Own Code Workshop,NRL Washington D.C.)
Sameer Shende, Allen D. Malony, Robert BellUniversity of Oregon
{sameer, malony, bertie}@cs.uoregon.edu
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 20042
Outline
MotivationPart I: InstrumentationPart II: MeasurementPart III: Analysis Tools Conclusion
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 20043
TAU Performance System Framework
Tuning and Analysis UtilitiesPerformance system framework for scalable parallel and distributed high-performance computingTargets a general complex system computation model
nodes / contexts / threadsMulti-level: system / software / parallelismMeasurement and analysis abstraction
Integrated toolkit for performance instrumentation, measurement, analysis, and visualization
Portable, configurable performance profiling/tracing facilityOpen software approach
University of Oregon, LANL, FZJ Germanyhttp://www.cs.uoregon.edu/research/paracomp/tau
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 20044
TAU Performance System Architecture
EPILOG
Paraver
paraprof
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 20045
TAU Analysis
Parallel profile analysispprof
parallel profiler with text-based displayparaprof
Graphical, scalable, parallel profile analysis and displayTrace analysis and visualization
Trace merging and clock adjustment (if necessary)Trace format conversion (ALOG, SDDF, VTF, Paraver)Trace visualization using Vampir (Pallas/Intel)
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 20046
Pprof Output (ESMF CoupledFlowSolver)IBM AIXF95,C++,C, MPIProfile- Node- Context- ThreadEvents- code- MPI
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 20047
Terminology – Exampleint main( ){ /* takes 100 secs */
f1(); /* takes 20 secs */f2(); /* takes 50 secs */f1(); /* takes 20 secs */
/* other work */}
/*Time can be replaced by counts */
For routine “int main( )”:Exclusive time
100-20-50-20=10 secsInclusive time
100 secsCalls
1 callSubrs (no. of child routines called)
3Inclusive time/call
100secs
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 20048
Performance Analysis and Visualization
Analysis of parallel profile and trace measurementParallel profile analysis
ParaProfCube Profile Browser (UTK, FZJ)Profile generation from trace data
Performance data management framework (PerfDMF)Parallel trace analysis
Translation to VTF 3.0 and EPILOGIntegration with VNG (Technical University of Dresden)
Online parallel analysis and visualization
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 20049
TAU’s ParaProf Framework Architecture
Portable, extensible, and scalable tool for profile analysisTry to offer “best of breed” capabilities to analystsBuild as profile analysis framework for extensibility
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200410
Profile Manager Window
Structured AMR toolkit (SAMRAI++), LLNL
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200411
Paraprof: CoupledFlowApp (ESMF) on 4 Nodes
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200412
Paraprof Mean Profile (4 nodes)
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200413
Individual Node (0) Profile in Paraprof
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200414
MPI Routines
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200415
Text Profile Window
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200416
k-Level Callpath Implementation in TAU
TAU maintains a performance event (routine) callstackProfiled routine (child) looks in callstack for parent
Previous profiled performance event is the parentA callpath profile structure created first time parent callsTAU records parent in a callgraph map for childString representing k-level callpath used as its key
“a( )=>b( )=>c()” : name for time spent in “c” when called by “b” when “b” is called by “a”
Map returns pointer to callpath profile structurek-level callpath is profiled using this profiling dataSet environment variable TAU_CALLPATH_DEPTH to depth
Build upon TAU’s performance mapping technologyMeasurement is independent of instrumentationUse –PROFILECALLPATH to configure TAU
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200417
k-Level Callpath Implementation in TAU
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200418
Examining Callpaths
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200419
Unique Callpaths
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200420
Gprof Style Parent, Routine, Children Display
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200421
Clickable Callpath Entities
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200422
Paraprof
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200423
Tracking I/O on Node 0 in ESMF
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200424
Calling Path for MPI_Recv( )
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200425
CUBE (UTK, FZJ) Browser [Sept. 2004]
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200426
Using TAU with Vampir (Intel Trace Analyzer)
Configure TAU with -TRACE option% configure –TRACE –mpi …
Execute application% poe CoupledFlowApp –procs 4
This generates TAU traces and event descriptors Merge all traces using tau_merge% tau_merge *.trc app.trc
Convert traces to Vampir Trace format using tau_convert% tau_convert –pv app.trc tau.edf app.pvNote: Use –vampir instead of –pv for multi-threaded traces
Load generated trace file in Vampir% vampir app.pv
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200427
Global Timeline Display with Parallelism View
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200428
Vampir: Zooming In…
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200429
Vampir: IO on Node 0
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200430
Vampir: Communication Matrix Display
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200431
Vampir: Calltree View
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200432
Summary Chart
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200433
TAU Performance System Status
Computing platforms (selected)IBM SP / pSeries, SGI Origin 2K/3K, Cray T3E / SV-1 / X1, HP (Compaq) SC (Tru64), Sun, Hitachi SR8000, NEC SX-5/6, Linux clusters (IA-32/64, Alpha, PPC, PA-RISC, Power, Opteron), Apple (G4/5, OS X), Windows
Programming languagesC, C++, Fortran 77/90/95, HPF, Java, OpenMP, Python
Thread librariespthreads, SGI sproc, Java,Windows, OpenMP
Compilers (selected)Intel KAI (KCC, KAP/Pro), PGI, GNU, Fujitsu, Sun, Microsoft, SGI, Cray, IBM (xlc, xlf), Compaq, NEC, Intel
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200434
Concluding Remarks
Complex parallel systems and software pose challenging performance analysis problems that require robust methodologies and toolsTo build more sophisticated performance tools, existing proven performance technology must be utilizedPerformance tools must be integrated with software and systems models and technology
Performance engineered softwareFunction consistently and coherently in software and system environments
TAU performance system offers robust performance technology that can be broadly integrated
Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200435
Support Acknowledgements
Department of Energy (DOE)Office of Science contractsUniversity of Utah DOE ASCI Level 1 sub-contractDOE ASCI Level 3 (LANL, LLNL)
NSF National Young Investigator (NYI) awardResearch Centre Juelich
John von Neumann Institute for ComputingDr. Bernd Mohr
Los Alamos National Laboratory