IBM Haifa Tools Update and Directions



Page 1: IBM Haifa Tools Update and Directions

IBM Haifa Labs © 2005 IBM Corporation

IBM Haifa Tools Update and Directions

http://www.haifa.il.ibm.com/dept/svt/code_paot.html

Gad Haber ([email protected])

Page 2: IBM Haifa Tools Update and Directions


IBM Haifa Performance Tools

FDPR-Pro
  Feedback-based optimizer operating on binary executable files
  Part of AIX 5L
  Available on Linux on Power via alphaWorks
  Under development for z/OS
  To be available in SDK 2.0 for the Cell platform

CodeAnalyzer
  Eclipse plugin for analyzing executable files and shared libraries
  Part of the Visual Performance Analyzer (VPA)
  To be available in the Cell SDK 2.0

ESTO
  Utility for identifying the optimal set of optimization options
  Embedded into FDPR-Pro
  Under development for tuning compilers’ options

BProber
  Utility for instrumenting binary executable files
  Under development

PDT – Performance Debugging Tool for the Cell
  Operates on trace files from the Cell SPEs

Page 3: IBM Haifa Tools Update and Directions


FDPR-Pro

Feedback Directed Program Restructuring

Page 4: IBM Haifa Tools Update and Directions


FDPR-Pro - Feedback Directed Program Restructuring

Using a global view of the entire program
Operating on the executable file after linkage
These properties enable FDPR-Pro to do:
  Global code reordering
  Optimizations across procedure boundaries
  Static data rearrangement
  Constant area rearrangement
  Data prefetching
Examples of additional FDPR-Pro optimizations:
  Usage of branch tables
  Usage of TOC load instructions
  More...

Page 5: IBM Haifa Tools Update and Directions


Method

Phase 1: Code instrumentation
  Basic block level
Phase 2: Profile information gathering
  Selection of the "right" input set (representative workload)
  Accumulation over several input sets
Phase 3: Global code and data optimizations
  Complements the compiler
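The accumulation step in Phase 2 can be pictured with a minimal sketch. It assumes each profiling run yields a mapping from basic-block name to execution count; the block names, the sample counts, and the 90% coverage threshold are hypothetical and are not taken from FDPR-Pro.

```python
from collections import Counter


def accumulate_profiles(runs):
    """Sum per-basic-block execution counts over several profiling runs."""
    total = Counter()
    for counts in runs:
        total.update(counts)  # Counter.update adds counts key by key
    return total


def hot_blocks(profile, threshold=0.9):
    """Return the hottest blocks that together cover `threshold` of all executions."""
    grand_total = sum(profile.values())
    covered = 0
    hot = []
    for block, count in profile.most_common():
        if grand_total == 0 or covered / grand_total >= threshold:
            break
        hot.append(block)
        covered += count
    return hot


# Two hypothetical input sets for the same instrumented program.
runs = [
    {"entry": 10, "loop_body": 9000, "error_path": 0},
    {"entry": 12, "loop_body": 7000, "error_path": 1},
]
profile = accumulate_profiles(runs)
print(profile.most_common())
print(hot_blocks(profile))
```

Summing over several runs keeps a block that is cold in one workload but hot in another from being misclassified, which is one plausible reason to accumulate over input sets.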

Page 6: IBM Haifa Tools Update and Directions


Partial list of FDPR-Pro Optimizations

-RC                       Reorder code
-bf                       Branch folding
-bp                       Branch prediction bit setting
-align                    Code alignment
-uce                      Unreachable code elimination
-i_resched                Instruction re-scheduling
-RD, -build_dcg           Static data reordering
-tocload, -reduce_toc     Tocload optimizations
-si, -ipht, -ihf, -isf    Function inlining options
-ptrgl_optimization       Optimize function calls via pointers
-dp                       Data prefetching
-link_reg_optimization    Eliminate stores/restores of link register
-volatile_regs            Eliminate stores/restores using available volatile regs
-killed_regs              Eliminate stores/restores of killed registers
-load_after_store         Separate between frequent load and store to same address
-loop_unroll              Loop unrolling
-stack_opt                Reduce stack frame size of hot functions
-dce                      Dead code elimination
-cp                       Constant propagation
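To give a feel for what profile-driven code reordering (the -RC entry above) can mean, here is a small sketch of a generic greedy chaining heuristic: edges of the control flow graph are visited from hottest to coldest, and two chains of basic blocks are joined when that makes the hot successor the fall-through. This is an illustration under stated assumptions, not FDPR-Pro's actual algorithm; the function name, block names, and edge counts are hypothetical.

```python
def reorder_blocks(edge_counts, entry):
    """Greedy chaining: lay out blocks so the hottest edges become fall-throughs."""
    # Collect blocks in a deterministic order, entry first.
    blocks = [entry]
    for src, dst in edge_counts:
        for b in (src, dst):
            if b not in blocks:
                blocks.append(b)

    chain_of = {b: [b] for b in blocks}  # block -> chain (list) containing it

    # Visit edges from hottest to coldest.
    for (src, dst), _count in sorted(edge_counts.items(), key=lambda kv: -kv[1]):
        head, tail = chain_of[src], chain_of[dst]
        # Join only if src ends one chain and dst starts a different one;
        # the entry block must stay at the start of its chain.
        if head is not tail and head[-1] == src and tail[0] == dst and dst != entry:
            head.extend(tail)
            for b in tail:
                chain_of[b] = head

    # Heat of a block = total count of the edges entering it.
    heat = {b: 0 for b in blocks}
    for (_src, dst), count in edge_counts.items():
        heat[dst] += count

    # Gather the distinct chains, entry chain first, the rest hottest first.
    chains = []
    for b in blocks:
        c = chain_of[b]
        if not any(c is other for other in chains):
            chains.append(c)
    entry_chain = chain_of[entry]
    rest = [c for c in chains if c is not entry_chain]
    rest.sort(key=lambda c: -sum(heat[b] for b in c))
    return [b for c in [entry_chain] + rest for b in c]


# Hypothetical profile: the A -> C -> D -> E path is hot, B is rarely executed.
edges = {
    ("A", "B"): 5,
    ("A", "C"): 95,
    ("B", "D"): 5,
    ("C", "D"): 95,
    ("D", "E"): 100,
}
print(reorder_blocks(edges, "A"))
```

In this example the cold block B ends up after the hot path, so the frequently executed blocks sit next to each other in memory.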

Page 7: IBM Haifa Tools Update and Directions


FDPR-Pro Directions

New heavy analyses to enable more optimizations (under development)
  Value propagation
  Constant evaluation
  Stack aliasing
FDPR-Pro for multi-core
  FDPR-Pro for the Cell processor, to be available in SDK 2.0
    Special options for profile gathering on the Cell
    New optimizations for SPE code
    Auto-parallelization optimizations
FDPR-Pro for embedded PowerPC is available
  Special features added to FDPR-Pro
    Accepting a sampled profile and complementing it
    Optimizations taking into account pipeline stalls of embedded PowerPC
  New optimizations for space reduction are added

Page 8: IBM Haifa Tools Update and Directions


Code Analyzer

Page 9: IBM Haifa Tools Update and Directions


Why Code Analyzer?

Architectures are becoming more complex
  Now upcoming multi-core platforms
Using only hardware simulators to detect information about potential performance bottlenecks in a given program is hard
There is a need for performance tools that can statically analyze and visualize programs for a platform design, to be used by:
  Hardware architects
  Compiler writers
  Application developers

Page 10: IBM Haifa Tools Update and Directions


What is Code Analyzer?

Code Analyzer is an Eclipse plugin which performs comprehensive static analysis on given executable files and DLLs
  Relies on FDPR-Pro as the engine for the analysis phase
Code Analyzer displays the analyzed information together with profiling data collected by:
  tprof/OProfile (in VPA XML format - ETM files)
  FDPR-Pro (in binary or XML format)
The code is then colored according to:
  Frequency counters - gathered by FDPR-Pro
  Hardware event ticks - gathered by tprof/OProfile

Page 11: IBM Haifa Tools Update and Directions


Code Analyzer Views

Provides several views of the input binary
  Assembly instructions
  Basic blocks
  Procedures
  CSECT modules
  Control flow graph
  Hot loops
  Call graph
  Annotated source code
  Dispatch group formation
  Pipeline slots and functional units

Page 12: IBM Haifa Tools Update and Directions


Grouping, Performance Comments and Pipeline Views

Page 13: IBM Haifa Tools Update and Directions


Code Analyzer opened up from Profile Analyzer

Page 14: IBM Haifa Tools Update and Directions


Code Analyzer (on the right) synchronized with Profile Analyzer (on the left)

Page 15: IBM Haifa Tools Update and Directions


Code Analyzer - Available Performance Comments

Comments which do not require profiling
  Pipeline stalls for the Power architecture
  Pipeline stalls for the z9 platform
  Unreachable code and unused data
  Misaligned targets
Profile-based comments
  Invariant instructions within hot loops
  Hot function calls proceeded by overwriting non-volatile registers
  Hot saves and restores of registers which could be relocated to cold spill areas
  Hot instructions that could be scheduled to colder areas in the code
  Removable hot branches
  Hot direct unconditional branches
  Hot direct conditional branches that are taken, which have a colder fall-through
  Hot call sites that are appropriate candidates for function inlining
  Hot TOC load instructions that can be replaced by immediate add instructions
  Hot branch-to-branch instructions
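As an illustration of how one profile-based comment from the list above could be derived, the sketch below flags hot conditional branches that are mostly taken while their fall-through is colder. The record layout, the hotness threshold, and the wording of the suggestion are assumptions made for this example; they do not describe Code Analyzer's internals.

```python
from dataclasses import dataclass


@dataclass
class CondBranch:
    address: int         # address of the branch instruction
    taken_count: int     # times the branch was taken (from the profile)
    fallthru_count: int  # times execution fell through instead


def hot_taken_branches(branches, hot_threshold):
    """Return one comment per hot conditional branch with a colder fall-through."""
    comments = []
    for br in branches:
        total = br.taken_count + br.fallthru_count
        if total >= hot_threshold and br.taken_count > br.fallthru_count:
            comments.append(
                f"{br.address:#x}: hot conditional branch is usually taken "
                f"({br.taken_count} taken vs {br.fallthru_count} fall-through); "
                "consider making the hot path the fall-through"
            )
    return comments


# Hypothetical profile data for three branches.
branches = [
    CondBranch(0x1000, taken_count=9000, fallthru_count=100),   # hot, mostly taken
    CondBranch(0x1040, taken_count=5, fallthru_count=3),        # cold
    CondBranch(0x1080, taken_count=200, fallthru_count=8000),   # hot, mostly falls through
]
comments = hot_taken_branches(branches, hot_threshold=1000)
for line in comments:
    print(line)
```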

Page 16: IBM Haifa Tools Update and Directions


Code Analyzer Directions

Enablement of more comments (under development)
  Using FDPR-Pro added analyses
    Value propagation
    Constant evaluation
    Stack aliasing
Code Analyzer for multi-core
  Code Analyzer for the Cell processor, to be available in SDK 2.0
    Special views for distribution of instructions’ frequency on SPE code
    New stall comments relevant to the PPE and SPEs

Page 17: IBM Haifa Tools Update and Directions


ESTO

Expert System for Tuning Optimizations

Page 18: IBM Haifa Tools Update and Directions


Why an automatic tool for tuning optimizations?

Optimization is controlled by a large number of options
The problem is finding the option set that maximizes performance
Parameterized (ranged) options complicate and multiply the possibilities
Each option performs a rather small change in the object program
Typical users do not know which options are best for their programs
The default (e.g. -O3) is adequate, but not best for a specific program
Optimizer (compiler) developers need to find the optimal option sets for the default combinations (e.g. -O3) and benchmarking (e.g. SPEC)

Page 19: IBM Haifa Tools Update and Directions


ESTO - Expert System for Tuning Optimizations

Purpose
  Enable a typical user to utilize the actual optimization potential
  Automate the search in the very complex option space
  Produce a ‘close to optimal’ program in a reasonable time
Method
  Trial-and-error search in the multidimensional options space
  In each step another option set is used to optimize the same program
  The program runtime is measured and compared to other results
  The algorithm converges to some ‘close to optimal’ option set
Features
  Flexible configuration for applications and running environments
  Possibility to extend the components, run parallel processes, etc.
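The trial-and-error method can be sketched as a simple hill climb over on/off options. The slides do not describe ESTO's actual search strategy, so this is only one possible reading: each step toggles one option, "measures" the result, and keeps the change if the runtime improved. The option names and the synthetic cost model standing in for real program runs are hypothetical.

```python
import random


def tune(options, measure, max_steps=200, seed=0):
    """Hill-climb over option sets; `measure` returns a runtime (lower is better)."""
    rng = random.Random(seed)
    best = frozenset()
    best_time = measure(best)
    for _ in range(max_steps):
        candidate = best ^ {rng.choice(options)}  # toggle one option on or off
        t = measure(candidate)
        if t < best_time:  # keep the new option set only if it is faster
            best, best_time = candidate, t
    return best, best_time


def fake_runtime(option_set):
    """Synthetic stand-in for 'optimize the program, run it, and time it'."""
    effect = {"-optA": -1.5, "-optB": 0.7, "-optC": -0.4}
    t = 10.0 + sum(effect[o] for o in option_set)
    if {"-optA", "-optC"} <= option_set:
        t += 0.2  # options interact, so gains are not simply additive
    return t


best, best_time = tune(["-optA", "-optB", "-optC"], fake_runtime)
print(sorted(best), round(best_time, 2))
```

In a real setting `measure` would rebuild or re-optimize the program with the candidate options and time a representative workload, which is why the number of steps has to be bounded.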

Page 20: IBM Haifa Tools Update and Directions


ESTO today

Embedded into FDPR-Pro
  By using the command line option --tune
Reaches impressive speed-ups on some benchmarks
Provides a good average

[Chart: ESTO gain % over FDPR-Pro -O3 on Linux with SPEC2000 train workload, 64 bit. Vertical axis: gain from 0.00 to 16.00 %. Bars: bzip2, crafty, eon, gap, gcc, gzip, mcf, parser, perlbmk, twolf, vortex, ammp, applu, apsi, art, equake, mesa, mgrid, swim, wupwise, average.]

Page 21: IBM Haifa Tools Update and Directions


ESTO directions

Enabling ESTO to tune compiler optimizations
  Under development
  Requires a configuration file with descriptions of all optimization flags
Initial adaptation for GCC
  Looked at GCC “binary” (on/off) options: ~60 affect performance
  Runtime speed-up on SPEC BMs relative to -O1

spec        64 runtime   gain over -O1
applu       10.25        35.71%
apsi        10.88        25.75%
art          4.92        30.38%
bzip2       30.20        75.26%
equake      17.61        21.55%
gap          7.48         3.53%
gcc          3.51         0.11%
mcf         13.41        25.41%
mesa        68.29        10.82%
mgrid       16.30        39.38%
perlbmk     72.42         4.39%
sixtrack    66.02        15.76%
swim         9.89        17.60%
twolf       12.22         6.71%
vpr         22.80        19.77%
average                  22.14%
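The average in the table above can be re-derived from the per-benchmark gains. The short check below assumes the reported average is a plain arithmetic mean of the fifteen gain percentages; the slides do not say how it was computed, but the numbers agree under that assumption.

```python
# Gain over GCC -O1 per benchmark, in percent, copied from the table.
gains = {
    "applu": 35.71, "apsi": 25.75, "art": 30.38, "bzip2": 75.26,
    "equake": 21.55, "gap": 3.53, "gcc": 0.11, "mcf": 25.41,
    "mesa": 10.82, "mgrid": 39.38, "perlbmk": 4.39, "sixtrack": 15.76,
    "swim": 17.60, "twolf": 6.71, "vpr": 19.77,
}

average = sum(gains.values()) / len(gains)
best = max(gains, key=gains.get)
worst = min(gains, key=gains.get)

print(f"average gain: {average:.2f}%")  # prints "average gain: 22.14%"
print(f"largest gain: {best}, smallest gain: {worst}")
```

The spread is wide: bzip2 gains over 75% while gcc gains almost nothing, so the average alone says little about any single program.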

[Chart: ESTO gain over GCC -O1 (train 64). Vertical axis: gain from 0.00% to 80.00%. Bars: applu, apsi, art, bzip2, equake, gap, gcc, mcf, mesa, mgrid, perlbmk, sixtrack, swim, twolf, vpr, average.]

Page 22: IBM Haifa Tools Update and Directions


BProber

Binary Prober

Page 23: IBM Haifa Tools Update and Directions


Why is binary probing technology needed?

Analysis
  Each application has its own characteristics
  Insert tailored instrumentation stubs
Simulation
  New architectures
  Insert code that simulates new functionality
Optimization
  Performing optimizations locally
  Function level down to instruction level
  Insert code to be executed instead of the existing one

Page 24: IBM Haifa Tools Update and Directions


BProber Today

Based on FDPR-Pro technology
Enables insertion of code at
  Specific address
  Specific function (entry and exit points)
The inserted code is defined as a function in a separate library
  Can be written in any language
  Control transfer to the code is done via an inserted call
  Parameters passed to the function
    Original address of instrumentation
    Save area of the registers prior to the call
A definition file of user code (libraries and functions) and insertion locations is used
Availability
  IBM internal use (alpha)
  Supports very large programs, including 64-bit applications
  Both AIX and Linux on Power
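The calling convention described on this slide (an inserted call that receives the original address and a save area of the registers) can be modelled with a toy interpreter. This is a conceptual sketch only: the program representation, register names, and probe signature are hypothetical and do not reflect BProber's real interface or its definition file format.

```python
def run_with_probes(program, probes, registers):
    """Run a toy program, calling a probe before each instrumented address."""
    for address, operation in program:
        probe = probes.get(address)
        if probe is not None:
            save_area = dict(registers)  # snapshot of the registers before the call
            probe(address, save_area)    # probe gets the original address and save area
        operation(registers)             # then the original instruction executes
    return registers


# Toy "binary": each entry is (address, operation on the register file).
program = [
    (0x100, lambda r: r.update(r3=r["r3"] + 1)),
    (0x104, lambda r: r.update(r4=r["r3"] * 2)),
]

# User-supplied probe attached to address 0x104; it records what it observes.
trace = []
probes = {0x104: lambda addr, saved: trace.append((addr, saved["r3"]))}

result = run_with_probes(program, probes, {"r3": 1, "r4": 0})
print(trace, result)
```

Passing a copy of the registers mirrors the idea of a save area: the probe can inspect the state at the insertion point without disturbing the instrumented program.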

Page 25: IBM Haifa Tools Update and Directions


PDT

Performance Debugging Tool for the Cell

Page 26: IBM Haifa Tools Update and Directions


PDT – Performance Debugging Tool

PDT enables analysis and visualization of traces from the various SPEs and the interactions between them