TAU Performance System

  • TAU Performance System
    Workshop on Petascale Architectures and Performance Strategies
    4-5pm, Tuesday, July 24, 2007, Snowbird, UT

    Sameer S. Shende
    [email protected]
    http://www.cs.uoregon.edu/research/tau
    Performance Research Laboratory
    University of Oregon

    Petascale Architectures and Performance Strategies*TAU Performance System

    Acknowledgements
    • Dr. Allen D. Malony, Professor
    • Alan Morris, Senior software engineer
    • Wyatt Spear, Software engineer
    • Scott Biersdorff, Software engineer
    • Kevin Huck, Ph.D. student
    • Aroon Nataraj, Ph.D. student
    • Brad Davidson, Systems administrator

    Outline
    • Overview of features
    • Instrumentation
    • Measurement
    • Analysis tools
      • Parallel profile analysis (ParaProf)
      • Performance data management (PerfDMF)
      • Performance data mining (PerfExplorer)
    • Application examples
    • Kernel monitoring and KTAU

    Performance Evaluation
    • Profiling
      • Presents summary statistics of performance metrics
        • number of times a routine was invoked
        • exclusive and inclusive time, or hardware performance monitor (HPM) counts, spent executing it
        • number of instrumented child routines invoked, etc.
        • structure of invocations (calltrees/callgraphs)
        • memory and message communication sizes are also tracked
    • Tracing
      • Presents when and where events took place along a global timeline
        • timestamped log of events
        • message communication events (sends/receives) are tracked
        • shows when and where messages were sent
        • the large volume of performance data generated leads to more perturbation in the program

    Definitions: Profiling
    • Profiling
      • Recording of summary information during execution
        • inclusive, exclusive time, # calls, hardware statistics, ...
      • Reflects performance behavior of program entities
        • functions, loops, basic blocks
        • user-defined semantic entities
      • Very good for low-cost performance assessment
      • Helps to expose performance bottlenecks and hotspots
      • Implemented through
        • sampling: periodic OS interrupts or hardware counter traps
        • instrumentation: direct insertion of measurement code

    Definitions: Tracing
    • Tracing
      • Recording of information about significant points (events) during program execution
        • entering/exiting code region (function, loop, block, ...)
        • thread/process interactions (e.g., send/receive message)
      • Save information in event record
        • timestamp
        • CPU identifier, thread identifier
        • event type and event-specific information
    • Event trace is a time-sequenced stream of event records
    • Can be used to reconstruct dynamic program behavior
    • Typically requires code instrumentation
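
To make the event-record idea concrete, here is a minimal Python sketch. It assumes a record with the fields named on this slide (timestamp, CPU identifier, thread identifier, event type, event-specific information). The `EventRecord` class and `inclusive_times` function are hypothetical and do not reflect TAU's trace format. Replaying the time-ordered stream reconstructs the nesting of regions on each thread and, from it, a per-region inclusive time.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    timestamp: float  # seconds on a global timeline
    cpu: int          # CPU identifier
    thread: int       # thread identifier
    kind: str         # event type: "enter" or "exit"
    region: str       # event-specific information: the code region


def inclusive_times(trace):
    """Replay a trace and return {(cpu, thread, region): inclusive seconds}."""
    stacks = defaultdict(list)   # one stack of open regions per (cpu, thread)
    totals = defaultdict(float)
    for ev in sorted(trace, key=lambda e: e.timestamp):
        stack = stacks[(ev.cpu, ev.thread)]
        if ev.kind == "enter":
            stack.append(ev)
        elif ev.kind == "exit":
            if not stack or stack[-1].region != ev.region:
                raise ValueError(f"unmatched exit for {ev.region!r}")
            entered = stack.pop()
            totals[(ev.cpu, ev.thread, ev.region)] += ev.timestamp - entered.timestamp
    return dict(totals)


# Illustrative data, not measurements.
trace = [
    EventRecord(0.0, 0, 0, "enter", "main"),
    EventRecord(0.5, 1, 0, "enter", "main"),
    EventRecord(1.0, 0, 0, "enter", "solve"),
    EventRecord(3.5, 1, 0, "exit", "main"),
    EventRecord(4.0, 0, 0, "exit", "solve"),
    EventRecord(5.0, 0, 0, "exit", "main"),
]
profile = inclusive_times(trace)
print(profile[(0, 0, "solve")])  # 3.0
```

The trace keeps every event, so a profile can be derived from it but not the other way round; that is also why trace volume grows with run length while a profile does not.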

    Event Tracing: Instrumentation, Monitor, Trace

    Event Tracing: Timeline Visualization

    Steps of Performance Evaluation
    • Collect basic routine-level timing profile to determine where most time is being spent
    • Collect routine-level hardware counter data to determine types of performance problems
    • Collect callpath profiles to determine sequence of events causing performance problems
    • Conduct finer-grained profiling and/or tracing to pinpoint performance bottlenecks
      • Loop-level profiling with hardware counters
      • Tracing of communication operations

    TAU Performance System
    • Tuning and Analysis Utilities (15+ year project effort)
    • Performance system framework for HPC systems
      • Integrated, scalable, flexible, and parallel
    • Targets a general complex system computation model
      • Entities: nodes / contexts / threads
      • Multi-level: system / software / parallelism
      • Measurement and analysis abstraction
    • Integrated toolkit for performance problem solving
      • Instrumentation, measurement, analysis, and visualization
      • Portable performance profiling and tracing facility
      • Performance data management and data mining
    • Partners: LLNL, ANL, LANL, Research Center Jülich

    TAU Parallel Performance System Goals
    • Portable (open source) parallel performance system
      • Computer system architectures and operating systems
      • Different programming languages and compilers
    • Multi-level, multi-language performance instrumentation
    • Flexible and configurable performance measurement
    • Support for multiple parallel programming paradigms
      • Multi-threading, message passing, mixed-mode, hybrid, object oriented (generic), component-based
    • Support for performance mapping
    • Integration of leading performance technology
    • Scalable (very large) parallel performance analysis

    TAU Performance System Architecture

    TAU Performance System Architecture

    Program Database Toolkit (PDT)
    [Figure: PDT data flow. Labels: Application / Library; C / C++ parser; Fortran parser (F77/90/95); IL; C / C++ IL analyzer; Fortran IL analyzer; Program Database Files; DUCTAPE; PDBhtml (program documentation); SILOON (application component glue); CHASM (C++ / F90/95 interoperability); TAU_instr (automatic source instrumentation)]

    Building Bridges to Other Tools: TAU

    TAU Instrumentation Approach
    • Support for standard program events
      • Routines, classes and templates
      • Statement-level blocks
    • Support for user-defined events
      • Begin/End events (user-defined timers)
      • Atomic events (e.g., size of memory allocated/freed)
      • Selection of event statistics
    • Support for hardware performance counters (PAPI)
    • Support definition of semantic entities for mapping
    • Support for event groups (aggregation, selection)
    • Instrumentation optimization
      • Eliminate instrumentation in lightweight routines
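
The difference between a begin/end event and an atomic event is easier to see in code. The sketch below is a hypothetical Python model, not the TAU API: an atomic event has no duration, it is triggered with a value (here, an allocation size in bytes) and keeps running statistics over those values. The `AtomicEvent` class and the choice of statistics are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class AtomicEvent:
    """Running statistics over the values an event was triggered with."""
    name: str
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def trigger(self, value):
        # Each trigger records one sample; no start/stop pairing is involved.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


# Illustrative allocation sizes in bytes, not measured data.
alloc = AtomicEvent("bytes allocated")
for size in (64, 256, 160):
    alloc.trigger(size)
print(alloc.count, alloc.minimum, alloc.maximum, alloc.mean)
```

Keeping only count, sum, minimum, and maximum means the storage cost is constant no matter how often the event fires.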

    PAPI
    • Performance Application Programming Interface
      • The purpose of the PAPI project is to design, standardize and implement a portable and efficient API to access the hardware performance monitor counters found on most modern microprocessors.
    • Parallel Tools Consortium project started in 1998
    • Developed by University of Tennessee, Knoxville
    • http://icl.cs.utk.edu/papi/

    TAU Instrumentation Mechanisms
    • Source code
      • Manual (TAU API, TAU component API)
      • Automatic (robust)
        • C, C++, F77/90/95 (Program Database Toolkit (PDT))
        • OpenMP (directive rewriting (Opari), POMP2 spec)
    • Object code
      • Pre-instrumented libraries (e.g., MPI using PMPI)
      • Statically-linked and dynamically-linked
    • Executable code
      • Dynamic instrumentation (pre-execution) (DynInstAPI)
      • Virtual machine instrumentation (e.g., Java using JVMPI)
    • TAU_COMPILER to automate instrumentation process

    Using TAU: A brief Introduction
    • To instrument source code using PDT
      • Choose an appropriate TAU stub makefile in /lib:
        % setenv TAU_MAKEFILE /usr/tau-2.x/xt3/lib/Makefile.tau-mpi-pdt-pgi
        % setenv TAU_OPTIONS -optVerbose (see tau_compiler.sh)
      • And use tau_f90.sh, tau_cxx.sh or tau_cc.sh as Fortran, C++ or C compilers:
        % mpif90 foo.f90
        changes to
        % tau_f90.sh foo.f90
    • Execute application and analyze performance data:
      % pprof (for text based profile display)
      % paraprof (for GUI)

    TAU Measurement System Configuration
    • configure [OPTIONS]
      {-c++=, -cc=}          Specify C++ and C compilers
      -pdt=                  Specify location of PDT
      -opari=                Specify location of Opari OpenMP tool
      -papi=                 Specify location of PAPI
      -vampirtrace=          Specify location of VampirTrace
      -mpi[inc/lib]=         Specify MPI library instrumentation
      -dyninst=              Specify location of DynInst Package
      -shmem[inc/lib]=       Specify PSHMEM library instrumentation
      -python[inc/lib]=      Specify Python instrumentation
      -tag=                  Specify a unique configuration name
      -epilog=               Specify location of EPILOG
      -slog2                 Build SLOG2/Jumpshot tracing package
      -otf=                  Specify location of OTF trace package
      -arch=                 Specify architecture explicitly (bgl, xt3, ibm64, ibm64linux)
      {-pthread, -sproc}     Use pthread or SGI sproc threads
      -openmp                Use OpenMP threads
      -jdk=                  Specify Java instrumentation (JDK)
      -fortran=[vendor]      Specify Fortran compiler

    TAU Measurement System Configuration
    • configure [OPTIONS]
      -TRACE                 Generate binary TAU traces
      -PROFILE (default)     Generate profiles (summary)
      -PROFILECALLPATH       Generate call path profiles
      -PROFILEPHASE          Generate phase based profiles
      -PROFILEPARAM          Generate parameter based profiles
      -PROFILEMEMORY         Track heap memory for each routine
      -PROFILEHEADROOM       Track memory headroom to grow
      -MULTIPLECOUNTERS      Use hardware counters + time
      -COMPENSATE            Compensate timer overhead
      -CPUTIME               Use usertime+system time
      -PAPIWALLCLOCK         Use PAPI's wallclock time
      -PAPIVIRTUAL           Use PAPI's process virtual time
      -SGITIMERS             Use fast IRIX timers
      -LINUXTIMERS           Use fast x86 Linux timers

    Performance Evaluation Alternatives
    [Figure: alternatives placed along an axis labeled "Volume of performance data": flat profile, depth-limited profile, parameter profile, callpath/callgraph profile, phase profile, trace]
    • Each alternative has:
      • one metric/counter
      • multiple counters
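
How a flat, depth-limited, and callpath profile relate can be sketched as follows. The model is an assumption made for illustration: a callpath profile is a mapping from a tuple of routine names to exclusive time, and limiting the depth keeps only the last `depth` routines of each path. This is one plausible reading of "depth limit profile", not a statement of how TAU truncates paths; `limit_depth` is a hypothetical helper.

```python
from collections import defaultdict


def limit_depth(callpath_profile, depth):
    """Collapse callpaths to their last `depth` routines, summing exclusive time."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    limited = defaultdict(float)
    for path, exclusive in callpath_profile.items():
        limited[path[-depth:]] += exclusive
    return dict(limited)


# Exclusive seconds per full callpath (illustrative numbers, not measurements).
callpaths = {
    ("main",): 1.0,
    ("main", "solve"): 4.0,
    ("main", "solve", "mpi_send"): 2.0,
    ("main", "io"): 0.5,
    ("main", "io", "mpi_send"): 0.25,
}

flat = limit_depth(callpaths, 1)      # depth 1 is the flat profile
two_deep = limit_depth(callpaths, 2)  # caller/callee pairs
print(len(flat), len(two_deep), len(callpaths))
```

In the flat profile the two `mpi_send` entries merge into one, so the information about which caller was responsible is lost. Each extra level of depth keeps more of that context at the cost of more entries, which matches the growth in data volume the slide refers to.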

    TAU Measurement Configuration Examples
    • ./configure -pdt=/usr/pkgs/pkgs/pdtoolkit-3.11 -mpiinc=/usr/pkgs/mpich/include -mpilib=/usr/pkgs/mpich/lib -mpilibrary='-lmpich -L/usr/gm/lib64 -lgm -lpthread -ldl'
      • Configure using PDT and MPI for x86_64 Linux
    • ./configure -arch=xt3 -papi=/opt/xt-tools/papi/3.2.1 -mpi -MULTIPLECOUNTERS; make clean install
      • Use PAPI counters (one or more) with C/C++/F90 automatic instrumentation for XT3. Also instrument the MPI library. Use PGI compilers.
    • Typically configure multiple measurement libraries
    • Each configuration creates a unique /lib/Makefile.tau stub makefile that corresponds to the configuration options used, e.g.,
      • /usr/pkgs/tau/x86_64/lib/Makefile.tau-mpi-pdt-pgi
      • /usr/pkgs/tau/x86_64/lib/Makefile.tau-multiplecounters-mpi-papi-pdt-pgi

    TAU Measurement Configuration Examples
    % cd /usr/pkgs/tau/x86_64/lib; ls Makefile.*pgi
    Makefile.tau-pdt-pgi
    Makefile.tau-mpi-pdt-pgi
    Makefile.tau-callpath-mpi-pdt-pgi
    Makefile.tau-mpi-pdt-trace-pgi
    Makefile.tau-mpi-compensate-pdt-pgi
    Makefile.tau-multiplecounters-mpi-papi-pdt-pgi
    Makefile.tau-multiplecounters-mpi-papi-pdt-trace-pgi
    Makefile.tau-mpi-papi-pdt-epilog-trace-pgi
    • For an MPI+F90 application, you may want to start with:
      Makefile.tau-mpi-pdt-pgi
      • Supports MPI instrumentation & PDT for automatic source instrumentation for PGI compilers
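
Choosing among several stub makefiles can be scripted. The sketch below assumes that the part of the file name after `Makefile.tau-` is a hyphen-separated list of configuration tags. That is inferred from the names in the listing above, not a documented naming rule, and `stub_tags`, `stubs_with`, and `simplest_stub` are hypothetical helpers.

```python
PREFIX = "Makefile.tau-"


def stub_tags(name):
    """Return the set of configuration tags encoded in a stub makefile name."""
    if not name.startswith(PREFIX):
        raise ValueError(f"not a TAU stub makefile name: {name!r}")
    return frozenset(name[len(PREFIX):].split("-"))


def stubs_with(names, required):
    """All stub makefiles whose tags include every required tag, in input order."""
    required = frozenset(required)
    return [n for n in names if required <= stub_tags(n)]


def simplest_stub(names, required):
    """The matching stub makefile with the fewest extra tags."""
    candidates = stubs_with(names, required)
    if not candidates:
        raise LookupError(f"no stub makefile provides {sorted(required)}")
    return min(candidates, key=lambda n: len(stub_tags(n)))


# File names copied from the listing above.
available = [
    "Makefile.tau-pdt-pgi",
    "Makefile.tau-mpi-pdt-pgi",
    "Makefile.tau-callpath-mpi-pdt-pgi",
    "Makefile.tau-mpi-pdt-trace-pgi",
    "Makefile.tau-mpi-compensate-pdt-pgi",
    "Makefile.tau-multiplecounters-mpi-papi-pdt-pgi",
    "Makefile.tau-multiplecounters-mpi-papi-pdt-trace-pgi",
    "Makefile.tau-mpi-papi-pdt-epilog-trace-pgi",
]

start_here = simplest_stub(available, {"mpi", "pdt"})
tracing = stubs_with(available, {"mpi", "trace"})
print(start_here)
```

For an MPI application with automatic source instrumentation this picks `Makefile.tau-mpi-pdt-pgi`, the same starting point the slide recommends; asking for `{"mpi", "trace"}` instead returns the three tracing configurations.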

    Configuration Parameters in Stub Makefiles
    • Each TAU stub Makefile resides in //lib directory
    • Variables:
      TAU_CXX                Specify the C++ compiler used by TAU
      TAU_CC, TAU_F90        Specify the C, F90 compilers
      TAU_DEFS               Defines used by TAU. Add to CFLAGS
      TAU_LDFLAGS            Linker options. Add to LDFLAGS
      TAU_INCLUDE            Header files include path. Add to CFLAGS
      TAU_LIBS               Statically linked TAU library. Add to LIBS
      TAU_SHLIBS             Dynamically linked TAU library
      TAU_MPI_LIBS           TAU's MPI wrapper library for C/C++
      TAU_MPI_FLIBS          TAU's MPI wrapper library for F90
      TAU_FORTRANLIBS        Must be linked in with C++ linker for F90
      TAU_CXXLIBS            Must be linked in with F90 linker
      TAU_INCLUDE_MEMORY     Use TAU's malloc/free wrapper lib
      TAU_DISABLE            TAU's dummy F90 stub library
      TAU_COMPILER           Instrument using tau_compiler.sh script
    • Each stub makefile encapsulates the parameters that TAU was configured with
    • It represents a specific instance of the TAU libraries. TAU scripts use stub makefiles to identify what performance measurements are to be performed.

    Automatic Instrumentation
    • We now provide compiler wrapper scripts
      • Simply replace mpxlf90 with tau_f90.sh
      • Automatically instruments Fortran source code, links with TAU MPI Wrapper libraries.
    • Use tau_cc.sh and tau_cxx.sh for C/C++
    Before
      CXX = mpCC
      F90 = mpxlf90_r
      CFLAGS =
      LIBS = -lm
      OBJS = f1.o f2.o f3.o fn.o

      app: $(OBJS)
          $(CXX) $(LDFLAGS) $(OBJS) -o $@ $(LIBS)
      .cpp.o:
          $(CC) $(CFLAGS) -c $