the tau performance systemthe tau performance technology for complex parallel systems (performance...

35
The TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende , Allen D. Malony, Robert Bell University of Oregon {sameer, malony, bertie}@cs.uoregon.edu

Upload: others

Post on 22-Feb-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

The TAU Performance Technology for Complex Parallel Systems

(Performance Analysis Bring Your Own Code Workshop,NRL Washington D.C.)

Sameer Shende, Allen D. Malony, Robert BellUniversity of Oregon

{sameer, malony, bertie}@cs.uoregon.edu

Page 2: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 20042

Outline

MotivationPart I: InstrumentationPart II: MeasurementPart III: Analysis Tools Conclusion

Page 3: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 20043

TAU Performance System Framework

Tuning and Analysis UtilitiesPerformance system framework for scalable parallel and distributed high-performance computingTargets a general complex system computation model

nodes / contexts / threadsMulti-level: system / software / parallelismMeasurement and analysis abstraction

Integrated toolkit for performance instrumentation, measurement, analysis, and visualization

Portable, configurable performance profiling/tracing facilityOpen software approach

University of Oregon, LANL, FZJ Germanyhttp://www.cs.uoregon.edu/research/paracomp/tau

Page 4: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 20044

TAU Performance System Architecture

EPILOG

Paraver

paraprof

Page 5: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 20045

TAU Analysis

Parallel profile analysispprof

parallel profiler with text-based displayparaprof

Graphical, scalable, parallel profile analysis and displayTrace analysis and visualization

Trace merging and clock adjustment (if necessary)Trace format conversion (ALOG, SDDF, VTF, Paraver)Trace visualization using Vampir (Pallas/Intel)

Page 6: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 20046

Pprof Output (ESMF CoupledFlowSolver)IBM AIXF95,C++,C, MPIProfile- Node- Context- ThreadEvents- code- MPI

Page 7: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 20047

Terminology – Exampleint main( ){ /* takes 100 secs */

f1(); /* takes 20 secs */f2(); /* takes 50 secs */f1(); /* takes 20 secs */

/* other work */}

/*Time can be replaced by counts */

For routine “int main( )”:Exclusive time

100-20-50-20=10 secsInclusive time

100 secsCalls

1 callSubrs (no. of child routines called)

3Inclusive time/call

100secs

Page 8: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 20048

Performance Analysis and Visualization

Analysis of parallel profile and trace measurementParallel profile analysis

ParaProfCube Profile Browser (UTK, FZJ)Profile generation from trace data

Performance data management framework (PerfDMF)Parallel trace analysis

Translation to VTF 3.0 and EPILOGIntegration with VNG (Technical University of Dresden)

Online parallel analysis and visualization

Page 9: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 20049

TAU’s ParaProf Framework Architecture

Portable, extensible, and scalable tool for profile analysisTry to offer “best of breed” capabilities to analystsBuild as profile analysis framework for extensibility

Page 10: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200410

Profile Manager Window

Structured AMR toolkit (SAMRAI++), LLNL

Page 11: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200411

Paraprof: CoupledFlowApp (ESMF) on 4 Nodes

Page 12: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200412

Paraprof Mean Profile (4 nodes)

Page 13: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200413

Individual Node (0) Profile in Paraprof

Page 14: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200414

MPI Routines

Page 15: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200415

Text Profile Window

Page 16: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200416

k-Level Callpath Implementation in TAU

TAU maintains a performance event (routine) callstackProfiled routine (child) looks in callstack for parent

Previous profiled performance event is the parentA callpath profile structure created first time parent callsTAU records parent in a callgraph map for childString representing k-level callpath used as its key

“a( )=>b( )=>c()” : name for time spent in “c” when called by “b” when “b” is called by “a”

Map returns pointer to callpath profile structurek-level callpath is profiled using this profiling dataSet environment variable TAU_CALLPATH_DEPTH to depth

Build upon TAU’s performance mapping technologyMeasurement is independent of instrumentationUse –PROFILECALLPATH to configure TAU

Page 17: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200417

k-Level Callpath Implementation in TAU

Page 18: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200418

Examining Callpaths

Page 19: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200419

Unique Callpaths

Page 20: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200420

Gprof Style Parent, Routine, Children Display

Page 21: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200421

Clickable Callpath Entities

Page 22: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200422

Paraprof

Page 23: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200423

Tracking I/O on Node 0 in ESMF

Page 24: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200424

Calling Path for MPI_Recv( )

Page 25: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200425

CUBE (UTK, FZJ) Browser [Sept. 2004]

Page 26: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200426

Using TAU with Vampir (Intel Trace Analyzer)

Configure TAU with -TRACE option% configure –TRACE –mpi …

Execute application% poe CoupledFlowApp –procs 4

This generates TAU traces and event descriptors Merge all traces using tau_merge% tau_merge *.trc app.trc

Convert traces to Vampir Trace format using tau_convert% tau_convert –pv app.trc tau.edf app.pvNote: Use –vampir instead of –pv for multi-threaded traces

Load generated trace file in Vampir% vampir app.pv

Page 27: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200427

Global Timeline Display with Parallelism View

Page 28: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200428

Vampir: Zooming In…

Page 29: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200429

Vampir: IO on Node 0

Page 30: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200430

Vampir: Communication Matrix Display

Page 31: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200431

Vampir: Calltree View

Page 32: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200432

Summary Chart

Page 33: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200433

TAU Performance System Status

Computing platforms (selected)IBM SP / pSeries, SGI Origin 2K/3K, Cray T3E / SV-1 / X1, HP (Compaq) SC (Tru64), Sun, Hitachi SR8000, NEC SX-5/6, Linux clusters (IA-32/64, Alpha, PPC, PA-RISC, Power, Opteron), Apple (G4/5, OS X), Windows

Programming languagesC, C++, Fortran 77/90/95, HPF, Java, OpenMP, Python

Thread librariespthreads, SGI sproc, Java,Windows, OpenMP

Compilers (selected)Intel KAI (KCC, KAP/Pro), PGI, GNU, Fujitsu, Sun, Microsoft, SGI, Cray, IBM (xlc, xlf), Compaq, NEC, Intel

Page 34: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200434

Concluding Remarks

Complex parallel systems and software pose challenging performance analysis problems that require robust methodologies and toolsTo build more sophisticated performance tools, existing proven performance technology must be utilizedPerformance tools must be integrated with software and systems models and technology

Performance engineered softwareFunction consistently and coherently in software and system environments

TAU performance system offers robust performance technology that can be broadly integrated

Page 35: The TAU Performance SystemThe TAU Performance Technology for Complex Parallel Systems (Performance Analysis Bring Your Own Code Workshop, NRL Washington D.C.) Sameer Shende, Allen

Using TAU Performance Technology in ESMF ESMF Team Meeting July 14, 200435

Support Acknowledgements

Department of Energy (DOE)Office of Science contractsUniversity of Utah DOE ASCI Level 1 sub-contractDOE ASCI Level 3 (LANL, LLNL)

NSF National Young Investigator (NYI) awardResearch Centre Juelich

John von Neumann Institute for ComputingDr. Bernd Mohr

Los Alamos National Laboratory