11/17/02

PAPI and Dynaprof

Application Signatures and Performance Analysis of Scientific Applications

Philip J. Mucci
Innovative Computing Laboratory, UTK
Performance Evaluation Research Center, LBL

[email protected]
http://icl.cs.utk.edu/~mucci/dynaprof/snapshots/sc2002.ppt

Goals

● Understanding the behavior of the application
– Identification of bottlenecks.
– Usage of the hardware resources.
– Effects of that usage on performance.
● Using Dynaprof to achieve that goal
– Command line usage
– 3 Dynaprof probes
  ● Wallclock Time
  ● Hardware performance counters
  ● Resource usage traces

Motivation

● Optimize the application's performance.
● Evaluate the algorithm's efficiency.
● Generate an application signature.
– A collection of data that represents the major terms in the performance model.
● Develop a performance model.

Overview of Hardware Counters

● Data is NOT PORTABLE, but PAPI is...
● Small number of registers dedicated to performance monitoring functions.
– AMD Athlon, 4 counters
– Pentium <= III, 2 counters
– Pentium IV, 18 counters
– IA64, 4 counters
– Alpha 21x64, 2 counters
– Power 3, 8 counters
– Power 4, 8 counters to a group
– UltraSparc II, 2 counters
– MIPS R14K, 2 counters

Applications used in this Tutorial

● Serial:
– FSPX: A binary alloy solidification benchmark.
– SWIM: The SPEC shallow water benchmark.
● Parallel (MPI):
– Ex19 from PETSc distribution.
– Solves nonlinear driven cavity with multigrid. A 2D driven cavity problem solved in a velocity-vorticity formulation.

FSPX Execution Environment

● Intel PIII, 1.2 GHz
– FP Results/Clock: 1 (1.2 Gflips)
  ● 4 SP/clk with SSE, 2 DP/clk with SSE2
– Caches: 16K/16K, 256K
● G77 version 2.96: -g -O -malign-double -mpentiumpro -funroll-loops -fexpensive-optimizations
● Execution time:

> /bin/time fspx
115.370u 0.030s 1:58.17 97.6% 0+0k 0+0io 162pf+0w
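
The peak rates quoted on the execution environment slides are the clock rate multiplied by the floating point results per clock. A minimal sketch of that arithmetic follows; the function name is illustrative and not part of PAPI or Dynaprof.

```python
def peak_gflips(clock_ghz: float, fp_results_per_clock: float) -> float:
    """Peak floating point rate in Gflips = clock (GHz) * FP results per clock."""
    return clock_ghz * fp_results_per_clock


# Clock rates and results/clock are the values stated on the slides.
print("Intel PIII, 1.2 GHz:   ", peak_gflips(1.2, 1), "Gflips")
print("IBM Power 3, 375 MHz:  ", peak_gflips(0.375, 4), "Gflips")
```

Comparing a measured rate against this peak gives a first rough estimate of how much of the machine an application uses.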

swim Execution Environment

● IBM Nighthawk, 16-way Power 3, 375MHz
– FP Results/Clock: 4 (1.5 Gflips)
– Caches: 32K/64K, 8MB
– MPI over TCP/IP via switch
● Xlc 5.0.2.1 built with -g -O3 -qstrict -qarch=pwr3 -qtune=pwr3
● Execution time:

> /bin/time poe swim -procs 2
0.4u 0.0s 0:15 3% 217+3933k 0+0io 1pf+0w

ex19 Execution Environment

● IBM Nighthawk, 16-way Power 3, 375MHz
– FP Results/Clock: 4 (1.5 Gflips)
– Caches: 32K/64K, 8MB
● Xlc 5.0.2.1 built with -g
● Execution time:

> /bin/time poe ex19 -procs 2 -da_grid_x 56 -da_grid_y 56
0.520u 0.200s 0:44.18 1.6% 297+3580k 0+0io 0pf+0w
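
The percentage in each /bin/time line is (user + system) / elapsed. The sketch below recomputes it from the lines shown on the execution environment slides. The parsing assumes the csh-style layout seen here (`<user>u <sys>s <elapsed> ...`), and the helper names are illustrative.

```python
import re


def parse_elapsed(text: str) -> float:
    """Convert an elapsed time such as '1:58.17' or '0:15' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def cpu_utilization(time_line: str) -> float:
    """Return (user + system) / elapsed as a percentage."""
    match = re.match(r"\s*([\d.]+)u\s+([\d.]+)s\s+([\d:.]+)", time_line)
    if match is None:
        raise ValueError("unrecognized time output: " + time_line)
    user, system = float(match.group(1)), float(match.group(2))
    return 100.0 * (user + system) / parse_elapsed(match.group(3))


fspx = cpu_utilization("115.370u 0.030s 1:58.17 97.6% 0+0k 0+0io 162pf+0w")
ex19 = cpu_utilization("0.520u 0.200s 0:44.18 1.6% 297+3580k 0+0io 0pf+0w")
print(f"fspx: {fspx:.2f}%   ex19: {ex19:.2f}%")
```

The very low figures for the poe runs are presumably because /bin/time accounts only for the poe launcher, not for the MPI tasks it starts.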

Gprof

● Gathers timer interrupts vs. text address.
● Recompile with -p option.
● Gprof profile is useful for a high level overview
● Does it tell us why?

Gprof Profile of FSPX

%time  cumulative   self      calls  ms/call  tot/call  name
21.71       18.93  18.93       6080     3.11      3.11  flux_
19.99       36.36  17.43       9124     1.91      3.91  proflux_
 8.26       43.56   7.20       6080     1.18      1.18  pde_
 8.11       50.63   7.07       6080     1.16      4.17  phase_
 7.96       57.57   6.94  100061386     0.00      0.00  cplintg_
 7.46       64.08   6.51  100061388     0.00      0.00  cpsintg_
 6.05       69.36   5.28   49807360     0.00      0.00  tsofx_
 5.60       74.24   4.88   49807362     0.00      0.00  tlofx_
 4.07       77.79   3.55   62202877     0.00      0.00  cpl_
 2.44       79.92   2.13   37371906     0.00      0.00  cps_
 1.67       81.38   1.46   37371904     0.00      0.00  hl_
 1.43       82.63   1.25   37371904     0.00      0.00  hs_
 1.07       83.56   0.93   24903681     0.00      0.00  elqds_
 0.89       84.34   0.78   37371904     0.00      0.00  aks_

FSPX: Top 4 functions

● Top 4 functions make up more than 50% of execution time (about 58% in the gprof profile)
● In module update.F
– flux
– proflux
– pde
● In module phase.F
– phase
● Use the list command to explore modules and functions
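
The share of time spent in the top functions can be checked directly from the %time column of the gprof flat profile. A small sketch, using the first six rows of the FSPX profile; the helper name is illustrative.

```python
# (function, % of execution time) taken from the gprof flat profile of FSPX.
profile = [
    ("flux_", 21.71),
    ("proflux_", 19.99),
    ("pde_", 8.26),
    ("phase_", 8.11),
    ("cplintg_", 7.96),
    ("cpsintg_", 7.46),
]


def top_share(rows, n):
    """Sum the %time of the n most expensive functions."""
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    return sum(percent for _, percent in ranked[:n])


print(f"top 4 functions: {top_share(profile, 4):.2f}% of execution time")
```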

Gprof Profile of SWIM

   %  cumulative     self
time     seconds  seconds  name
37.3        3.22     3.22  .calc2 [1]
33.4        6.10     2.88  .calc1 [2]
24.7        8.23     2.13  .calc3 [3]
 1.3        8.34     0.11  .kickpipes [4]
 1.0        8.43     0.09  .inital [5]

Gprof Profile of ex19

   %  cumulative   self  name
70.6       22.57  22.57  .MatLUFactorNumeric_SeqAIJ_Inode
 6.4       24.61   2.04  .MatFDColoringCreate_MPIAIJ [2]
 5.2       26.26   1.65  .MatSetValues_MPIAIJ [3]
 3.4       27.35   1.09  .MatLUFactorSymbolic_SeqAIJ [4]
 2.3       28.09   0.74  .MatSolve_SeqAIJ_Inode [5]
 2.3       28.82   0.73  .FormFunctionLocal [6]
 1.7       29.35   0.53  .memset [7]
 1.2       29.74   0.39  .MatSetValues [8]
 0.9       30.02   0.28  .MatFDColoringApply [9]
 0.7       30.24   0.22  .kickpipes [10]

Dynaprof Environment Variables

● LD_LIBRARY_PATH: Colon-separated list of directories to search for shared libraries. We need to find:
– DynInst library
– PAPI library
– Any dependencies of the above (libperfctr.so, libcpc.so)
● DYNINSTAPI_RT_LIB: Full pathname of DynInst runtime library.
● No settings necessary for AIX/DPCL port
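
Before starting the tool it can help to check that the libraries are actually reachable through LD_LIBRARY_PATH. The sketch below does only that. The file name libdyninstAPI.so is an assumption about how the DynInst library is named on a given system, and the dynamic loader also searches locations other than LD_LIBRARY_PATH, so a "missing" result here is a hint rather than a verdict.

```python
import os


def missing_libraries(search_path, names):
    """Return the names not found in any directory of a colon-separated path."""
    dirs = [d for d in search_path.split(":") if d]
    return [
        name
        for name in names
        if not any(os.path.isfile(os.path.join(d, name)) for d in dirs)
    ]


# libpapi.so and libperfctr.so are named in this tutorial;
# libdyninstAPI.so is an assumed file name for the DynInst library.
wanted = ["libdyninstAPI.so", "libpapi.so", "libperfctr.so"]
print("not found on LD_LIBRARY_PATH:",
      missing_libraries(os.environ.get("LD_LIBRARY_PATH", ""), wanted))
```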

Running Dynaprof

● Usage:

dynaprof [-d] [serial_application]

● -d enables debugging output
● Specifying an application automatically loads it into the tool immediately after initialization.

Command Line Interface

● Uses GNU Readline library for input
● Full-featured command line editing
– File and command completion: <Tab>
– History: <Up>/<Down>
● Settings, macros and aliases in ~/.inputrc
● Allows Emacs or VI style bindings
– set editing-mode emacs
– set editing-mode vi
● See man page, TexInfo file or home page.

Load command

● Starts the application and stops it at the first instruction.

● Usage:

load <application> [args]

> dynaprof

(dynaprof) load tests/fspx

Poeload command

● For use with MPI applications on AIX and DPCL.
– DPCL < 3.2.5 requires full path

● Usage:

poeload <application> [args]

(dynaprof) poeload tests/swim -procs 2

Mpiload command

● For use with MPI applications.
● Stops the application after it calls PMPI_Init().

● Mostly useful for script driven execution of MPI jobs

● Usage:

mpiload <application> [args]

(dynaprof) mpiload tests/mpicount

Attach command

● Attaches to a running application (or poe process) and stops it.

● Usage:

attach <application> <pid>

(dynaprof) ^Z

> tests/fspx &

[2] 17500

> fg

(dynaprof) attach tests/fspx 17500

Poeattach Command

● For use with MPI applications on AIX and DPCL.
– DPCL < 3.2.5 requires full path

● Usage:

poeattach <application> <pid_of_poe>

(dynaprof) ^Z

> poe ex19 -da_grid_x 56 -da_grid_y 56 -procs 2 &

[2] 17500

> fg

(dynaprof) poeattach ex19 17500

List command

● list
– List all modules in process
● list <pattern>
– List all matching modules
● list <module>
– List all functions in module
● list <module> <pattern>
– List all matching functions in module
● list <module> <function>
– List instrumentable points in function

Exploring FSPX

(dynaprof) list
DEFAULT_MODULE
eos.F
phase.F
setup.F
supmain.F
io.F
properties.F
solveT.F
update.F
libm.so.6
libc.so.6

● DEFAULT_MODULE: G77's Fortran runtime support. Code compiled with g77 without -g ends up in the DEFAULT_MODULE.
● eos.F through update.F: Application code
● libm.so.6, libc.so.6: Shared libraries

Exploring FSPX 2

(dynaprof) list DEFAULT_MODULE
call_gmon_start
fini_dummy
copy
ap_end
op_gen
gt_num
f_s
ne_d
e_d
i_tem
f_list
type_f
l_R
rd_count

● G77's Fortran runtime support. Code compiled with g77 without -g ends up in the DEFAULT_MODULE.

Exploring FSPX 3

(dynaprof) list phase.F
phase_
(dynaprof) list update.F
proflux_
flux_
pde_
(dynaprof) list phase.F phase_
Entry
Call tsofx_
Call tlofx_
Call eslds_
Call elqds_
Call tinsol_
Call s_wsle
Call do_lio
Call do_lio
Call do_lio

Function Calls

Use command

● Loads a probe shared library into address space

(dynaprof) use [probe [args]]

● Use by itself displays current probe.
● To change options, respecify probe.
● 4 probes in this release
– Wallclock: Real time clock
– PAPI: Hardware metrics
– Perfometer: Real-time visualization of streaming hardware metrics

Instr command

● instr
– List all instrumented functions
● instr module <pattern> [arg]
– Instrument all functions in modules matching pattern
● instr function <module> <pattern> [arg]
– Instrument all functions matching pattern in module

Threads and Dynaprof Probes

● For threaded code, use the same probe!
● Dynaprof detects threads and loads a special version of the probe library.
● Each probe specifies what to do when a new thread is discovered.
● Each thread gets the same instrumentation.

Probe Warning

● Instrumentation is not free.
● Consider granularity of region being measured.
● Overhead for PAPI 2.3 is O(100) cycles.
– Between 500 and 2000 cycles for a 2 counter read.
● Overhead for Wallclock is O(100) cycles.
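
One way to judge granularity is to compare the average cycles per call of a region with the cost of a counter read. The sketch below uses inclusive PAPI_TOT_CYC totals and call counts from the FSPX profiles shown later in this tutorial. The "10x the read cost" threshold is a hypothetical rule of thumb chosen for illustration, not a figure from PAPI or Dynaprof.

```python
READ_COST_CYCLES = (500, 2000)  # range quoted for a 2 counter read


def cycles_per_call(total_cycles, calls):
    """Average cycles spent per call of a measured region."""
    return total_cycles / calls


# (name, inclusive PAPI_TOT_CYC, calls) from the FSPX call tree profile.
regions = [("proflux_", 2.876e11, 9124), ("tsofx_", 2.425e10, 2.49e7)]

for name, cycles, calls in regions:
    per_call = cycles_per_call(cycles, calls)
    # Hypothetical threshold: flag regions shorter than 10x the worst-case read.
    too_fine = per_call < 10 * READ_COST_CYCLES[1]
    print(f"{name:10s} {per_call:12.4g} cycles/call  probe cost significant: {too_fine}")
```

A region whose calls last only about as long as a counter read, as tsofx_ appears to here, is better measured at a coarser level such as its caller.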

Wallclock Probe

● High resolution, low latency timer
● Usage:

use wallclockprobe

● Reports time in microseconds (1.0x10^-6 s).

PAPI Probe

● Count PAPI Presets or Native Events
● Usage:

use papiprobe [event,event,...]

● Default argument is PAPI_FP_INS, or PAPI_TOT_INS if the architecture doesn't support PAPI_FP_INS.
● Available events can be obtained by using:

papi_avail -a

PAPI Probe and Multiplexing

● Requesting more metrics than there are physical counters automatically enables multiplexing.
● Minimum runtime of instrumented regions must be observed, such that all virtual counters get a chance to run at least once.

run-time_min = num_events * 0.01 s

● Automatic warning functionality is being rolled into PAPI.
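
The minimum runtime rule above is simple enough to check before instrumenting. A minimal sketch, assuming the 0.01 s time slice stated on this slide; the function name and the example event counts are illustrative.

```python
def multiplex_plan(num_events, physical_counters, slice_seconds=0.01):
    """Return (multiplexing needed?, minimum region run time in seconds)."""
    multiplexed = num_events > physical_counters
    min_runtime = num_events * slice_seconds if multiplexed else 0.0
    return multiplexed, min_runtime


# Example: six events on a platform with 2 physical counters (e.g. Pentium III).
needed, min_runtime = multiplex_plan(6, 2)
print(f"multiplexing: {needed}, minimum region run time: {min_runtime:.2f} s")
```

Regions that run for less than the minimum should either be measured with fewer events or instrumented at a coarser level.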

PAPI Native Events

● Look in the PAPI distribution
● See the README file for your architecture in the src directory
● See the example program native.c in the src/tests directory

Power 3 Events

PAPI_L1_DCM   Yes  Level 1 data cache misses (PM_LD_MISS_L1,PM_ST_L1MISS)
PAPI_L1_ICM   No   Level 1 instruction cache misses (PM_IC_MISS)
PAPI_L1_TCM   Yes  Level 1 cache misses (PM_IC_MISS,PM_LD_MISS_L1,PM_ST_L1MISS)
PAPI_CA_SNP   No   Requests for a snoop (PM_SNOOP)
PAPI_CA_SHR   No   Requests for exclusive access to shared cache line (PM_SNOOP_E_TO_S)
PAPI_CA_ITV   No   Requests for cache line intervention (PM_SNOOP_PUSH_INT)
PAPI_BRU_IDL  No   Cycles branch units are idle (PM_BRU_IDLE)
PAPI_FXU_IDL  No   Cycles integer units are idle (PM_FXU_IDLE)
PAPI_FPU_IDL  No   Cycles floating point units are idle (PM_FPU_IDLE)
PAPI_LSU_IDL  No   Cycles load/store units are idle (PM_LSU_IDLE)
PAPI_TLB_TL   No   Total translation lookaside buffer misses (PM_TLB_MISS)
PAPI_L1_LDM   No   Level 1 load misses (PM_LD_MISS_L1)
PAPI_L1_STM   No   Level 1 store misses (PM_ST_L1MISS)
PAPI_L2_LDM   No   Level 2 load misses (PM_LD_MISS_EXCEED_L2)
PAPI_L2_STM   No   Level 2 store misses (PM_ST_MISS_EXCEED_L2)
PAPI_BTAC_M   No   Branch target address cache misses (PM_BTAC_MISS)
PAPI_PRF_DM   No   Data prefetch cache misses (PM_PREF_MATCH_DEM_MISS)
PAPI_TLB_SD   No   Translation lookaside buffer shootdowns (PM_TLBSYNC_RERUN)
PAPI_CSR_FAL  No   Failed store conditional instructions (PM_ST_COND_FAIL)
PAPI_CSR_SUC  No   Successful store conditional instructions (PM_RESRV_CMPL)
PAPI_CSR_TOT  No   Total store conditional instructions (PM_RESRV_RQ)
PAPI_MEM_SCY  Yes  Cycles Stalled Waiting for memory accesses (PM_CMPLU_WT_LD,PM_CMPLU_WT_ST)
PAPI_MEM_RCY  No   Cycles Stalled Waiting for memory Reads (PM_CMPLU_WT_LD)
PAPI_MEM_WCY  No   Cycles Stalled Waiting for memory writes (PM_CMPLU_WT_ST)
PAPI_STL_ICY  No   Cycles with no instruction issue (PM_0INST_DISP)
PAPI_STL_CCY  No   Cycles with no instructions completed (PM_0INST_CMPL)
PAPI_BR_CN    No   Conditional branch instructions (PM_CBR_DISP)
PAPI_BR_MSP   No   Conditional branch instructions mispredicted (PM_MPRED_BR_CAUSED_GC)
PAPI_BR_PRC   No   Conditional branch instructions correctly predicted (PM_BR_PRED)

Power 3 Events 2

PAPI_FMA_INS  No   FMA instructions completed (PM_EXEC_FMA)
PAPI_TOT_IIS  No   Instructions issued (PM_INST_DISP)
PAPI_TOT_INS  No   Instructions completed (PM_INST_CMPL)
PAPI_INT_INS  Yes  Integer instructions (PM_FXU0_PROD_RESULT,PM_FXU1_PROD_RESULT,PM_FXU2_PROD_RESULT)
PAPI_FP_INS   Yes  Floating point instructions (PM_FPU0_CMPL,PM_FPU1_CMPL)
PAPI_LD_INS   No   Load instructions (PM_LD_CMPL)
PAPI_SR_INS   No   Store instructions (PM_ST_CMPL)
PAPI_BR_INS   No   Branch instructions (PM_BR_CMPL)
PAPI_FLOPS    Yes  Floating point instructions per second (PM_CYC,PM_FPU0_CMPL,PM_FPU1_CMPL)
PAPI_TOT_CYC  No   Total cycles (PM_CYC)
PAPI_IPS      Yes  Instructions per second (PM_CYC,PM_INST_CMPL)
PAPI_LST_INS  Yes  Load/store instructions completed (PM_LD_CMPL,PM_ST_CMPL)
PAPI_SYC_INS  No   Synchronization instructions completed (PM_SYNC)
PAPI_FDV_INS  No   Floating point divide instructions (PM_FPU_FDIV)
PAPI_FSQ_INS  No   Floating point square root instructions (PM_FPU_FSQRT)

Power 4 Events

PAPI_L1_DCM   Yes  Level 1 data cache misses (PM_LD_MISS_L1,PM_ST_MISS_L1)
PAPI_FXU_IDL  No   Cycles integer units are idle (PM_FXU_IDLE)
PAPI_TLB_DM   No   Data translation lookaside buffer misses (PM_DTLB_MISS)
PAPI_TLB_IM   No   Instruction translation lookaside buffer misses (PM_ITLB_MISS)
PAPI_TLB_TL   Yes  Total translation lookaside buffer misses (PM_DTLB_MISS,PM_ITLB_MISS)
PAPI_L1_LDM   No   Level 1 load misses (PM_LD_MISS_L1)
PAPI_L1_STM   No   Level 1 store misses (PM_ST_MISS_L1)
PAPI_STL_ICY  No   Cycles with no instruction issue (PM_0INST_FETCH)
PAPI_HW_INT   No   Hardware interrupts (PM_EXT_INT)
PAPI_FMA_INS  No   FMA instructions completed (PM_FPU_FMA)
PAPI_TOT_IIS  No   Instructions issued (PM_INST_DISP)
PAPI_TOT_INS  No   Instructions completed (PM_INST_CMPL)
PAPI_INT_INS  No   Integer instructions (PM_FXU_FIN)
PAPI_FP_INS   No   Floating point instructions (PM_FPU_FIN)
PAPI_FLOPS    Yes  Floating point instructions per second (PM_CYC,PM_FPU_FIN)
PAPI_TOT_CYC  No   Total cycles (PM_CYC)
PAPI_IPS      Yes  Instructions per second (PM_CYC,PM_INST_CMPL)
PAPI_L1_DCA   Yes  Level 1 data cache accesses (PM_LD_REF_L1,PM_ST_REF_L1)
PAPI_L1_DCR   No   Level 1 data cache reads (PM_LD_REF_L1)
PAPI_L1_DCW   No   Level 1 data cache writes (PM_ST_REF_L1)
PAPI_FDV_INS  No   Floating point divide instructions (PM_FPU_FDIV)
PAPI_FSQ_INS  No   Floating point square root instructions (PM_FPU_FSQRT)

Pentium III Events

PAPI_L1_DCM   No   Level 1 data cache misses (0x45,0x45)
PAPI_L1_ICM   No   Level 1 instruction cache misses (0xf28,0xf28)
PAPI_L2_ICM   No   Level 2 instruction cache misses (0x68,0x68)
PAPI_L1_TCM   No   Level 1 cache misses (0xf2e,0xf2e)
PAPI_L2_TCM   No   Level 2 cache misses (0x24,0x24)
PAPI_CA_SHR   No   Requests for exclusive access to shared cache line (0x22e,0x22e)
PAPI_CA_CLN   No   Requests for exclusive access to clean cache line (0x66,0x66)
PAPI_CA_INV   No   Requests for cache line invalidation (0x69,0x69)
PAPI_CA_ITV   No   Requests for cache line intervention (0x4007b,0x4007b)
PAPI_TLB_IM   No   Instruction translation lookaside buffer misses (0x85,0x85)
PAPI_L1_LDM   No   Level 1 load misses (0xf29,0xf29)
PAPI_L1_STM   No   Level 1 store misses (0xf2a,0xf2a)
PAPI_L2_LDM   Yes  Level 2 load misses (0x24,0x25)
PAPI_L2_STM   No   Level 2 store misses (0x25,0x25)
PAPI_BTAC_M   No   Branch target address cache misses (0xe2,0xe2)
PAPI_HW_INT   No   Hardware interrupts (0xc8,0xc8)
PAPI_BR_CN    No   Conditional branch instructions (0xc4,0xc4)
PAPI_BR_TKN   No   Conditional branch instructions taken (0xc9,0xc9)
PAPI_BR_NTK   Yes  Conditional branch instructions not taken (0xc4,0xc9)
PAPI_BR_MSP   No   Conditional branch instructions mispredicted (0xc5,0xc5)
PAPI_BR_PRC   Yes  Conditional branch instructions correctly predicted (0xc4,0xc5)
PAPI_TOT_IIS  No   Instructions issued (0xd0,0xd0)
PAPI_TOT_INS  No   Instructions completed (0xc0,0xc0)
PAPI_FP_INS   No   Floating point instructions (0xc1,0x0)
PAPI_BR_INS   No   Branch instructions (0xc4,0xc4)
PAPI_VEC_INS  No   Vector/SIMD instructions (0xb0,0xb0)
PAPI_FLOPS    Yes  Floating point instructions per second (0xc1,0x79)

Intel Pentium IV Events

PAPI_L1_DCM   No   Level 1 data cache misses (0x0003b000/0x12000204@0x8000000c)
PAPI_L2_DCM   No   Level 2 data cache misses (0x0003b000/0x12000204@0x8000000c)
PAPI_L1_LDM   No   Level 1 load misses (0x0003b000/0x12000204@0x8000000c)
PAPI_L1_STM   No   Level 1 store misses (0x0003b000/0x12000204@0x8000000c)
PAPI_L2_LDM   No   Level 2 load misses (0x0003b000/0x12000204@0x8000000c)
PAPI_L2_STM   No   Level 2 store misses (0x0003b000/0x12000204@0x8000000c)
PAPI_TOT_INS  No   Instructions completed (0x00039000/0x04000204@0x8000000c)
PAPI_FP_INS   No   Floating point instructions (0x0003b000/0x18000204@0x8000000c 0x00033000/0x09000034@0x80000008)
PAPI_TOT_CYC  No   Total cycles (0x00ff9000/0x7e000004@0x8000000d)

(Arguments to perfex -e from PerfCtr distribution)

Sun UltraSparc II Events

PAPI_L1_ICM   Yes  Level 1 instruction cache misses (0x8,0x8)
PAPI_L2_TCM   Yes  Level 2 cache misses (0xc,0xc)
PAPI_CA_SNP   No   Requests for a snoop (-1,0xe)
PAPI_CA_INV   No   Requests for cache line invalidation (0xe,-1)
PAPI_L1_LDM   Yes  Level 1 load misses (0x9,0x9)
PAPI_L1_STM   Yes  Level 1 store misses (0xa,0xa)
PAPI_BR_MSP   No   Conditional branch instructions mispredicted (-1,0x2)
PAPI_TOT_IIS  No   Instructions issued (-1,0x1)
PAPI_TOT_INS  No   Instructions completed (-1,0x1)
PAPI_LD_INS   No   Load instructions (0x9,-1)
PAPI_SR_INS   No   Store instructions (0xa,-1)
PAPI_TOT_CYC  No   Total cycles (0x0,0x0)
PAPI_IPS      Yes  Instructions per second (0x0,0x1)
PAPI_L1_DCR   No   Level 1 data cache reads (0x9,-1)
PAPI_L1_DCW   No   Level 1 data cache writes (0xa,-1)
PAPI_L1_ICH   No   Level 1 instruction cache hits (-1,0x8)
PAPI_L2_ICH   No   Level 2 instruction cache hits (-1,0xf)
PAPI_L1_ICA   No   Level 1 instruction cache accesses (0x8,-1)
PAPI_L2_TCH   No   Level 2 total cache hits (-1,0xc)
PAPI_L2_TCA   No   Level 2 total cache accesses (0xc,-1)

Sun UltraSparc III Events

PAPI_L1_ICM   No   Level 1 instruction cache misses (-1,0x8)
PAPI_L2_ICM   No   Level 2 instruction cache misses (-1,0xf)
PAPI_L2_TCM   No   Level 2 cache misses (-1,0xc)
PAPI_TLB_DM   No   Data translation lookaside buffer misses (-1,0x12)
PAPI_TLB_IM   No   Instruction translation lookaside buffer misses (-1,0x11)
PAPI_L1_LDM   No   Level 1 load misses (-1,0x9)
PAPI_L1_STM   No   Level 1 store misses (-1,0xa)
PAPI_BR_MSP   No   Conditional branch instructions mispredicted (-1,0x2)
PAPI_TOT_IIS  No   Instructions issued (0x1,0x1)
PAPI_TOT_INS  No   Instructions completed (0x1,0x1)
PAPI_FP_INS   Yes  Floating point instructions (0x18,0x27)
PAPI_TOT_CYC  No   Total cycles (0x0,0x0)
PAPI_IPS      Yes  Instructions per second (0x0,0x1)
PAPI_L1_DCR   No   Level 1 data cache reads (0x9,-1)
PAPI_L1_DCW   No   Level 1 data cache writes (0xa,-1)
PAPI_L1_ICH   No   Level 1 instruction cache hits (0x8,-1)
PAPI_L1_ICA   Yes  Level 1 instruction cache accesses (0x8,0x8)
PAPI_L2_TCH   Yes  Level 2 total cache hits (0xc,0xc)
PAPI_L2_TCA   No   Level 2 total cache accesses (0xc,-1)
PAPI_FML_INS  No   Floating point multiply instructions (-1,0x27)
PAPI_FAD_INS  No   Floating point add instructions (0x18,-1)

MIPS R12K Events

PAPI_L1_DCM   No   Level 1 data cache misses (25)
PAPI_L1_ICM   No   Level 1 instruction cache misses (9)
PAPI_L2_DCM   No   Level 2 data cache misses (26)
PAPI_L2_ICM   No   Level 2 instruction cache misses (10)
PAPI_L1_TCM   Yes  Level 1 cache misses (9,25)
PAPI_L2_TCM   Yes  Level 2 cache misses (10,26)
PAPI_CA_SHR   No   Requests for exclusive access to shared cache line (31)
PAPI_CA_INV   No   Requests for cache line invalidation (13)
PAPI_CA_ITV   No   Requests for cache line intervention (12)
PAPI_TLB_TL   No   Total translation lookaside buffer misses (23)
PAPI_PRF_DM   No   Data prefetch cache misses (17)
PAPI_CSR_FAL  No   Failed store conditional instructions (5)
PAPI_CSR_SUC  Yes  Successful store conditional instructions (20,5)
PAPI_CSR_TOT  No   Total store conditional instructions (20)
PAPI_BR_CN    No   Conditional branch instructions (6)
PAPI_BR_MSP   No   Conditional branch instructions mispredicted (24)
PAPI_BR_PRC   Yes  Conditional branch instructions correctly predicted (6,24)
PAPI_TOT_IIS  No   Instructions issued (1)
PAPI_TOT_INS  No   Instructions completed (15)
PAPI_FP_INS   No   Floating point instructions (21)
PAPI_LD_INS   No   Load instructions (18)
PAPI_SR_INS   No   Store instructions (19)
PAPI_FLOPS    Yes  Floating point instructions per second (0,21)
PAPI_TOT_CYC  No   Total cycles (0)
PAPI_IPS      Yes  Instructions per second (0,15)
PAPI_LST_INS  Yes  Load/store instructions completed (18,19)

Alpha/DADD 21264 Events

PAPI_L1_ICM   No   Level 1 instruction cache misses (0x3)
PAPI_L2_TCM   No   Level 2 cache misses (0x1)
PAPI_TLB_DM   No   Data translation lookaside buffer misses (0x2)
PAPI_BR_UCN   No   Unconditional branch instructions (0x15)
PAPI_BR_CN    No   Conditional branch instructions (0x16)
PAPI_BR_NTK   No   Conditional branch instructions not taken (0x18)
PAPI_BR_MSP   No   Conditional branch instructions mispredicted (0x19)
PAPI_BR_PRC   No   Conditional branch instructions correctly predicted (0x1a)
PAPI_TOT_IIS  No   Instructions issued (0x7)
PAPI_TOT_INS  No   Instructions completed (0x8)
PAPI_INT_INS  No   Integer instructions (0x9)
PAPI_FP_INS   No   Floating point instructions (0x14)
PAPI_LD_INS   No   Load instructions (0xa)
PAPI_SR_INS   No   Store instructions (0xb)
PAPI_TOT_CYC  No   Total cycles (0x0)
PAPI_LST_INS  No   Load/store instructions completed (0xc)
PAPI_SYC_INS  No   Synchronization instructions completed (0xd)
PAPI_FML_INS  No   Floating point multiply instructions (0x11)
PAPI_FAD_INS  No   Floating point add instructions (0x10)
PAPI_FDV_INS  No   Floating point divide instructions (0x12)
PAPI_FSQ_INS  No   Floating point square root instructions (0x13)

Perfometer Probe

● Sends a stream of performance data every N seconds to the Perfometer GUI.
● Functions can be colored at instrumentation time.
– Default color is white, 0xFFFFFF
● Usage:

use perfometerprobe [0xRRGGBB]
instr <args> <0xRRGGBB>
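
The color argument is a 24-bit value with one byte each for red, green and blue. The sketch below splits such a value into its components. It is only an illustration of the 0xRRGGBB encoding, not the probe's own parsing code, and the function name is hypothetical.

```python
def parse_color(text="0xFFFFFF"):
    """Split a 0xRRGGBB string into (red, green, blue), each 0..255."""
    value = int(text, 16)
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError("color out of range: " + text)
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


print(parse_color("0xff0000"))  # pure red
print(parse_color())            # the default, white
```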

Perfometer Probe 2

● Perfometer GUI is NOT launched automatically.
● showrgb in X11 lists colors and names.
● Run the Java GUI
– java -jar Perfometer.jar
● Connect to the specified hostname and port.

Instrumenting SWIM with perfometerprobe

Module perfometerprobe.so was loaded.
Module libperfometer.so was loaded.
Module libpapi.so was loaded.
(dynaprof) instr function swim.F calc1_ 0xff0000
swim.F, inserted 1 instrumentation points
(dynaprof) instr function swim.F calc2_ 0x00ff00
swim.F, inserted 1 instrumentation points
(dynaprof) instr function swim.F calc3_ 0x0000ff
swim.F, inserted 1 instrumentation points
(dynaprof) run
Module libnss_files.so.2 was loaded.
Module libnss_nisplus.so.2 was loaded.
Module libnsl.so.1 was loaded.
Module libnss_dns.so.2 was loaded.
Module libresolv.so.2 was loaded.
Perfometer client awaiting connection on port #33733

Instrumenting FSPX for Instructions Per Cycle

(dynaprof) use probes/papiprobe PAPI_TOT_CYC, PAPI_TOT_INS
Module papiprobe.so was loaded.
Module libpapi.so was loaded.
Module libperfctr.so was loaded.
(dynaprof) instr module update.F
update.F, inserted 3 instrumentation points
(dynaprof) instr module pde.F
(dynaprof) instr
proflux_
flux_
pde_
(dynaprof) instr module phase.F
phase.F, inserted 1 instrumentation points
(dynaprof) instr
proflux_
flux_
pde_
phase_

Instrumenting SWIM for Instructions Per Cycle

(dynaprof) use probes/papiprobe PAPI_TOT_CYC, PAPI_TOT_INS
Module papiprobe.so was loaded.
Module libpapi.so was loaded.
Module libperfctr.so was loaded.
(dynaprof) instr function swim.F calc*
swim.F, inserted 3 instrumentation points
(dynaprof) instr
calc1_
calc2_
calc3_
calc3z_
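
The `calc*` argument above selects functions by pattern. As a rough illustration of how such a pattern picks out names, the sketch below uses Python's shell-style matching. That Dynaprof patterns follow the same rules is an assumption, and the `inital_` entry is an assumed symbol name added only to show a non-match.

```python
from fnmatch import fnmatchcase


def select(pattern, names):
    """Return the names matching a shell-style pattern, in original order."""
    return [name for name in names if fnmatchcase(name, pattern)]


# calc* names are those listed by instr above; inital_ is an assumed name.
functions = ["calc1_", "calc2_", "calc3_", "calc3z_", "inital_"]
print(select("calc*", functions))
```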

Reporting Probe Data

● The wallclock and PAPI probes produce very similar data.
● Both use a parsing script written in Perl.
– wallclockrpt <file>
– papiproberpt <file>
● Produce 3 profiles
– Inclusive: T_function = T_self + T_children
– Exclusive: T_function = T_self
– 1-Level Call Tree: T_child = Inclusive T_function
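
The relation between the inclusive and exclusive profiles can be sketched on a toy call tree. The self times and call graph below are hypothetical values chosen for illustration, and the code assumes a plain tree (no recursion, no function shared between two parents). It is not how the Perl report scripts are implemented.

```python
def inclusive(name, self_time, children):
    """T_function (inclusive) = T_self + sum of the children's inclusive T."""
    return self_time[name] + sum(
        inclusive(child, self_time, children) for child in children.get(name, [])
    )


# Hypothetical exclusive (self) times in microseconds and a toy call tree.
self_time = {"main": 5.0, "solve": 20.0, "kernel": 70.0, "io": 5.0}
children = {"main": ["solve", "io"], "solve": ["kernel"]}

for fn in self_time:
    print(f"{fn:8s} exclusive {self_time[fn]:6.1f}   "
          f"inclusive {inclusive(fn, self_time, children):6.1f}")
```

In the 1-level call tree, each child line reports that child's inclusive value as seen from one parent.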

FSPX Cycles & Instrs.

Exclusive Profile of Metric PAPI_TOT_INS.

Name           Percent    Total      Calls
-------------  -------    -----      --------
TOTAL          100        3.031e+11  1
unknown        53.81      1.631e+11  1
proflux_       27.75      8.411e+10  9124
phase_         15.44      4.68e+10   6080
flux_          2.507      7.598e+09  6080
pde_           0.4884     1.48e+09   6080

Inclusive Profile of Metric PAPI_TOT_INS.

Name           Percent    Total      SubCalls
-------------  -------    -----      --------
TOTAL          100        3.031e+11  0
proflux_       59.31      1.797e+11  2.242e+08
phase_         37.69      1.142e+11  1.247e+08
flux_          2.507      7.598e+09  0
pde_           0.4884     1.48e+09   0

1-Level Inclusive Call Tree of Metric PAPI_TOT_INS.

Parent/-Child  Percent    Total      Calls
-------------  -------    -----      --------
TOTAL          100        3.031e+11  1
proflux_       100        1.797e+11  9124
- akl_         8.504      1.529e+10  3.737e+07
- aks_         8.4        1.51e+10   3.737e+07
- cpl_         8.525      1.532e+10  3.737e+07
- cps_         8.525      1.532e+10  3.737e+07
- hl_          9.689      1.742e+10  3.737e+07
- hs_          9.564      1.719e+10  3.737e+07
flux_          100        7.598e+09  6080
pde_           100        1.48e+09   6080
phase_         100        1.142e+11  6080
- tsofx_       11.72      1.339e+10  2.49e+07
- tlofx_       11.49      1.312e+10  2.49e+07
- eslds_       12.88      1.471e+10  2.49e+07
- elqds_       12.69      1.449e+10  2.49e+07
- tinsol_      4.999e-07  571        1
- tinmush_     1.114      1.273e+09  7.271e+04
- xsoft_       0.121      1.383e+08  7.271e+04
- xloft_       0.1031     1.178e+08  7.271e+04
- cpl_         8.913      1.018e+10  2.483e+07

Exclusive Profile of Metric PAPI_TOT_CYC.

Name           Percent    Total      Calls
-------------  -------    -----      --------
TOTAL          100        5.017e+11  1
unknown        53.62      2.69e+11   1
proflux_       27.75      1.393e+11  9124
phase_         14.9       7.475e+10  6080
flux_          3.096      1.554e+10  6080
pde_           0.6356     3.189e+09  6080

Inclusive Profile of Metric PAPI_TOT_CYC.

Name           Percent    Total      SubCalls
-------------  -------    -----      --------
TOTAL          100        5.017e+11  0
proflux_       57.32      2.876e+11  2.242e+08
phase_         38.92      1.953e+11  1.247e+08
flux_          3.096      1.554e+10  0
pde_           0.6356     3.189e+09  0

1-Level Inclusive Call Tree of Metric PAPI_TOT_CYC.

Parent/-Child  Percent    Total      Calls
-------------  -------    -----      --------
TOTAL          100        5.017e+11  1
proflux_       100        2.876e+11  9124
- akl_         7.945      2.285e+10  3.737e+07
- aks_         7.871      2.264e+10  3.737e+07
- cpl_         8.84       2.542e+10  3.737e+07
- cps_         8.705      2.503e+10  3.737e+07
- hl_          9.252      2.661e+10  3.737e+07
- hs_          8.967      2.579e+10  3.737e+07
flux_          100        1.554e+10  6080
pde_           100        3.189e+09  6080
phase_         100        1.953e+11  6080
- tsofx_       12.42      2.425e+10  2.49e+07
- tlofx_       12.42      2.425e+10  2.49e+07
- eslds_       13.41      2.618e+10  2.49e+07
- elqds_       13.41      2.62e+10   2.49e+07
- tinsol_      1.013e-06  1978       1
- tinmush_     1.716      3.351e+09  7.271e+04
- xsoft_       0.1749     3.415e+08  7.271e+04
- xloft_       0.151      2.95e+08   7.271e+04
- cpl_         8.032      1.569e+10  2.483e+07

fspx IPC

Function  IPC
proflux   0.61
phase     0.63
flux      0.49
pde       0.46
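The IPC figures can be approximated as instructions divided by cycles. The sketch below does this with the exclusive PAPI_TOT_INS and PAPI_TOT_CYC totals printed in the Fspx profiles; whether the slide's values were derived from the exclusive or the inclusive totals is an assumption here, and the function name ipc is hypothetical.

```python
# Exclusive totals copied from the Fspx PAPI_TOT_INS and PAPI_TOT_CYC profiles.
instructions = {"proflux_": 8.411e10, "phase_": 4.68e10, "flux_": 7.598e9, "pde_": 1.48e9}
cycles = {"proflux_": 1.393e11, "phase_": 7.475e10, "flux_": 1.554e10, "pde_": 3.189e9}


def ipc(ins, cyc):
    """Hypothetical helper: instructions per cycle for every function in ins."""
    return {name: ins[name] / cyc[name] for name in ins}


fspx_ipc = ipc(instructions, cycles)
for name, value in fspx_ipc.items():
    print(f"{name:<10} {value:.2f}")
```

Each ratio lands within about 0.01 of the value listed for that function, which is what one would expect when the inputs are totals rounded to four digits.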

Swim Cycles & Instrs.

Exclusive Profile of Metric PAPI_TOT_INS.

Name           Percent    Total      Calls
-------------  -------    -----      --------
TOTAL          100        1.723e+09  1
calc2          38.28      6.598e+08  120
calc1          32.31      5.567e+08  120
calc3          22.33      3.847e+08  118
unknown        7.084      1.221e+08  1

Inclusive Profile of Metric PAPI_TOT_INS.

Name           Percent    Total      SubCalls
-------------  -------    -----      --------
TOTAL          100        1.723e+09  0
calc2          39.42      6.793e+08  1680
calc1          35.28      6.08e+08   1800
calc3          22.87      3.942e+08  1652

1-Level Inclusive Call Tree of Metric PAPI_TOT_INS.

Parent/-Child  Percent    Total      Calls
-------------  -------    -----      --------
TOTAL          100        1.723e+09  1
calc1          100        6.08e+08   120
- fsav         0.02065    1.255e+05  120
- mpi_irecv    0.03132    1.904e+05  120
- mpi_isend    0.05911    3.593e+05  120
- mpi_isend    0.06434    3.912e+05  120
- mpi_waitall  0.9013     5.479e+06  120
- mpi_irecv    0.03132    1.904e+05  120
- mpi_irecv    0.03132    1.904e+05  120
- mpi_isend    0.05356    3.256e+05  120
- mpi_isend    0.05079    3.088e+05  120
- mpi_waitall  6.813      4.142e+07  120
- mpi_irecv    0.03132    1.904e+05  120
- mpi_irecv    0.03132    1.904e+05  120
- mpi_isend    0.07504    4.562e+05  120
- mpi_isend    0.06757    4.108e+05  120
- mpi_waitall  0.161      9.791e+05  120
calc2          100        6.793e+08  120
- fsav         0.01848    1.255e+05  120
- mpi_irecv    0.02804    1.904e+05  120
- mpi_irecv    0.02804    1.904e+05  120
- mpi_isend    0.07762    5.273e+05  120
- mpi_isend    0.048      3.26e+05   120
- mpi_waitall  0.8084     5.491e+06  120
- mpi_irecv    0.02804    1.904e+05  120
- mpi_isend    0.05213    3.541e+05  120

Exclusive Profile of Metric PAPI_TOT_CYC.

Name           Percent    Total      Calls
-------------  -------    -----      --------
TOTAL          100        3.181e+09  1
calc2          34.85      1.108e+09  120
calc1          33.48      1.065e+09  120
calc3          26.1       8.301e+08  118
unknown        5.568      1.771e+08  1

Inclusive Profile of Metric PAPI_TOT_CYC.

Name           Percent    Total      SubCalls
-------------  -------    -----      --------
TOTAL          100        3.181e+09  0
calc2          35.98      1.144e+09  1680
calc1          35.61      1.133e+09  1800
calc3          26.88      8.55e+08   1652

1-Level Inclusive Call Tree of Metric PAPI_TOT_CYC.

Parent/-Child  Percent    Total      Calls
-------------  -------    -----      --------
TOTAL          100        3.181e+09  1
calc1          100        1.133e+09  120
- fsav         0.03432    3.887e+05  120
- mpi_irecv    0.07356    8.332e+05  120
- mpi_isend    0.0663     7.51e+05   120
- mpi_isend    0.0739     8.371e+05  120
- mpi_waitall  0.7189     8.143e+06  120
- mpi_irecv    0.1646     1.864e+06  120
- mpi_irecv    0.03407    3.859e+05  120
- mpi_isend    0.1867     2.115e+06  120
- mpi_isend    0.06067    6.872e+05  120
- mpi_waitall  4.22       4.78e+07   120
- mpi_irecv    0.03979    4.506e+05  120
- mpi_irecv    0.03008    3.407e+05  120
- mpi_isend    0.1014     1.148e+06  120
- mpi_isend    0.07568    8.573e+05  120
- mpi_waitall  0.1076     1.219e+06  120
calc2          100        1.144e+09  120
- fsav         0.03382    3.87e+05   120
- mpi_irecv    0.03222    3.687e+05  120
- mpi_irecv    0.03554    4.067e+05  120
- mpi_isend    0.0959     1.097e+06  120
- mpi_isend    0.05655    6.471e+05  120
- mpi_waitall  0.7268     8.317e+06  120
- mpi_irecv    0.1865     2.134e+06  120
- mpi_isend    0.2616     2.993e+06  120
- mpi_isend    0.06976    7.983e+05  120
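The 1-level call tree lists the same MPI routine several times under one parent. Assuming each repeated row is a separate call site of that routine (the report does not say so explicitly), a per-routine share can be obtained by summing the rows. The sketch below does this for the calc1 rows of the PAPI_TOT_CYC call tree; all variable names are illustrative.

```python
from collections import defaultdict

# Inclusive PAPI_TOT_CYC total for calc1, as printed in the call tree.
calc1_inclusive_cycles = 1.133e9

# (child, inclusive cycles) rows under calc1, in the order printed.
calc1_children = [
    ("fsav", 3.887e5),
    ("mpi_irecv", 8.332e5),
    ("mpi_isend", 7.51e5),
    ("mpi_isend", 8.371e5),
    ("mpi_waitall", 8.143e6),
    ("mpi_irecv", 1.864e6),
    ("mpi_irecv", 3.859e5),
    ("mpi_isend", 2.115e6),
    ("mpi_isend", 6.872e5),
    ("mpi_waitall", 4.78e7),
    ("mpi_irecv", 4.506e5),
    ("mpi_irecv", 3.407e5),
    ("mpi_isend", 1.148e6),
    ("mpi_isend", 8.573e5),
    ("mpi_waitall", 1.219e6),
]

# Sum the cycles of rows that share a child name.
cycles_by_child = defaultdict(float)
for name, cyc in calc1_children:
    cycles_by_child[name] += cyc

# Print each routine's share of calc1's inclusive cycles, largest first.
for name, cyc in sorted(cycles_by_child.items(), key=lambda kv: -kv[1]):
    print(f"{name:<12} {100.0 * cyc / calc1_inclusive_cycles:6.3f}%")
```

Under that assumption, mpi_waitall accounts for roughly 5% of calc1's inclusive cycles, most of it from a single row.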

Swim IPC

Function  IPC
calc2     0.59
calc1     0.53
calc3     0.46

Perfometer Screenshot

Dynaprof 0.8 SC Release

● Binary distribution for 4 Platforms on the website
– AIX 3.x / DPCL 3.2.5 on Power 3
– Linux / DynInst 3.0 on Pentium <= III
– Solaris 2.8 / DynInst 3.0 on UltraSparc II/III
– IRIX / DynInst 3.0 on MIPS R10/12/14k
– Power 4 and Pentium 4 are coming...

● Xdynaprof Java/Swing GUI included
● perfometerprobe and GUI included
● Updated documentation

References

● The Dynaprof Homepage
  http://www.cs.utk.edu/~mucci/dynaprof
● The PAPI Homepage
  http://icl.cs.utk.edu/projects/papi
● The DynInst Homepage
  http://www.dyninst.org
● The DPCL Homepage
  http://oss.software.ibm.com/developerworks/opensource/dpcl
● The Vprof Homepage
  http://aros.ca.sandia.gov/~cljanss/perf/vprof
● The GNU Readline Homepage
  http://cnswww.cns.cwru.edu/~chet/readline/rltop.html