OpenMP Performance Visualization with Paraver

Jesús Labarta, Jordi Caubet, Judit Gimenez

Sergi Girona, Francesc Escale

CEPBA-UPC

Jesús Labarta, SPSciComp2000

PARAVER

(1992- )

Flexible performance visualization tool

Functions of time

Precedence relationships

Quantitative, comparative

Powerful / not trivial

You drive the analysis

MPI + OpenMP, System activity, performance counters,…

Distributed by CEPBA

Process model

Multithreaded + message passing + multiprogramming

Objects: Thread, Task, Ptask (application)

Tracefile

Records

State (Object, time_start, time_end, state)

Events: Flag (Object, time, type, value)

Precedence (Object_src, Object_dst, time_src, time_dst, tag, size)

Instrumented codes: MPI + OpenMP, Java, Pthreads, shmem

Monitoring tools: System activity (SCPUs), InfoPerfex

Simulators: Dimemas, Simplescalar

Filters: par2Paraver, UTE2Paraver
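
The three record kinds on this slide can be pictured as plain data types. The following is a minimal Python sketch that only mirrors the field lists given above; the class names, the `duration` helper and the sample values are hypothetical, and this is not Paraver's actual trace file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    obj: str         # object the record belongs to (for example a thread)
    time_start: int
    time_end: int
    state: int


@dataclass(frozen=True)
class Flag:
    obj: str
    time: int
    type: int
    value: int


@dataclass(frozen=True)
class Precedence:
    obj_src: str
    obj_dst: str
    time_src: int
    time_dst: int
    tag: int
    size: int


def duration(record: State) -> int:
    """Length of a state record (hypothetical helper)."""
    return record.time_end - record.time_start


# Hypothetical sample trace: two state records, one event, one precedence.
trace = [
    State("thread-1", 0, 40, 1),
    Flag("thread-1", 40, 100, 7),
    Precedence("thread-1", "thread-2", 40, 55, 3, 1024),
    State("thread-2", 0, 55, 2),
]
```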

Structure

[Diagram: Tracefile → Filter → Reduced tracefile → Semantics → Function of time (semantic value), Events → Representation (Visualization, Analysis, Textual)]

Demand-driven evaluation

Filter module

Events

by type

by value

Communications

by tag

by size

by source / destination

logical / physical
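
The filter criteria above can be sketched as simple predicates over event and communication records. This is an illustrative Python sketch, not Paraver's implementation: the `Event` and `Comm` types, the function names and the parameters are all assumptions, and the logical / physical distinction is not modeled.

```python
from typing import Collection, List, NamedTuple, Optional


class Event(NamedTuple):
    obj: str
    time: int
    type: int
    value: int


class Comm(NamedTuple):
    src: str
    dst: str
    time_src: int
    time_dst: int
    tag: int
    size: int


def filter_events(events: List[Event],
                  types: Optional[Collection[int]] = None,
                  values: Optional[Collection[int]] = None) -> List[Event]:
    """Keep events by type and/or by value; None means no restriction."""
    return [e for e in events
            if (types is None or e.type in types)
            and (values is None or e.value in values)]


def filter_comms(comms: List[Comm],
                 tags: Optional[Collection[int]] = None,
                 min_size: int = 0,
                 max_size: Optional[int] = None,
                 src: Optional[str] = None,
                 dst: Optional[str] = None) -> List[Comm]:
    """Keep communications by tag, size range, and source / destination."""
    return [c for c in comms
            if (tags is None or c.tag in tags)
            and c.size >= min_size
            and (max_size is None or c.size <= max_size)
            and (src is None or c.src == src)
            and (dst is None or c.dst == dst)]


# Hypothetical sample data.
events = [Event("t1", 10, 100, 1), Event("t1", 20, 200, 5), Event("t2", 30, 100, 0)]
comms = [Comm("t1", "t2", 5, 9, 3, 64), Comm("t2", "t1", 12, 20, 4, 4096)]

type_100 = filter_events(events, types={100})
large_to_t1 = filter_comms(comms, min_size=1024, dst="t1")
```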

Semantic module

Semantic value: $f(t)$

$f = f_{comp2} \circ f_{comp1} \circ f_{Ptask} \circ f_{task} \circ f_{thread}$

Semantic functions

$f_{comp2}$, $f_{comp1}$: sign, mod, div, in range

$f_{Ptask}$: add, average, max, select

$f_{task}$: add, average, max, select

$f_{thread}$: in state, useful, given state, last event value, next event value, average next event value

[Diagram: tree of semantic functions, with $f_{thread}$ at the leaves feeding $f_{task}$, which feeds $f_{Ptask}$ and then $f_{comp1}$]
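
The layered composition on this slide can be sketched with ordinary closures. The Python below is a minimal illustration under stated assumptions: only "in state", "add" and "in range" are modeled, their exact definitions in Paraver may differ, and the state code `RUNNING` and the sample records are hypothetical.

```python
from typing import Callable, Sequence, Tuple

StateRecord = Tuple[int, int, int]       # (time_start, time_end, state)
TimeFunction = Callable[[float], float]  # a semantic value as a function of time

RUNNING = 1  # hypothetical state code


def in_state(records: Sequence[StateRecord], wanted: int) -> TimeFunction:
    """Thread level: 1.0 while the thread is in state `wanted`, else 0.0."""
    def f_thread(t: float) -> float:
        for start, end, state in records:
            if start <= t < end:
                return 1.0 if state == wanted else 0.0
        return 0.0  # no record covers t
    return f_thread


def add(children: Sequence[TimeFunction]) -> TimeFunction:
    """Task or Ptask level: sum the values of the child objects."""
    return lambda t: float(sum(f(t) for f in children))


def in_range(f: TimeFunction, low: float, high: float) -> TimeFunction:
    """Compose level: 1.0 where low <= f(t) <= high, else 0.0."""
    return lambda t: 1.0 if low <= f(t) <= high else 0.0


# One application (Ptask) with two tasks of two threads each.
task0 = [in_state([(0, 10, 1), (10, 20, 2)], RUNNING),
         in_state([(0, 20, 1)], RUNNING)]
task1 = [in_state([(0, 5, 2), (5, 20, 1)], RUNNING),
         in_state([(0, 20, 2)], RUNNING)]

# Ptask-level add over task-level add over thread-level "in state":
# the number of running threads at time t.
running_threads = add([add(task0), add(task1)])

# Compose step: flag the instants with 3 or 4 running threads.
mostly_busy = in_range(running_threads, 3, 4)
```

Nothing is computed until a function is called with a time value, so each level is evaluated only for the instants that are actually requested.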

Visualization

Type of window

Ptask / Task / thread: one row per object of selected type

Object selection (scalability)

Representation

Color encoded / Gradient / Function of time

Multiple windows

Synchronised

Forward/backward animation

Precise time measurement

Within/between windows

Textual

Textual detail of area around point within window

Semantic value and duration / flag / communication

Numeric / translated text (.pcf file)

Analysis

Time and object range selected by pointing on the window

Analysis function applied to output of semantic module

Average semantic value

Average duration/variance/number of bursts (if within range)

Number of events

Number of communications

...
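
Two of the analysis functions listed above can be sketched over a piecewise-constant semantic value. This Python example is an assumption-laden illustration: the burst representation `(start, end, value)`, the function names, and the reading of "if within range" as "semantic value inside a given interval" are not taken from Paraver itself.

```python
from statistics import mean, pvariance
from typing import Sequence, Tuple

Burst = Tuple[float, float, float]  # (time_start, time_end, semantic value)


def average_value(bursts: Sequence[Burst], t0: float, t1: float) -> float:
    """Time-weighted average of the semantic value over the window [t0, t1)."""
    total = 0.0
    for start, end, value in bursts:
        overlap = min(end, t1) - max(start, t0)
        if overlap > 0:
            total += overlap * value
    return total / (t1 - t0)


def burst_stats(bursts: Sequence[Burst], t0: float, t1: float,
                low: float, high: float) -> Tuple[int, float, float]:
    """Count, mean duration and variance of the bursts that lie fully inside
    the window and whose value is within [low, high]."""
    durations = [end - start for start, end, value in bursts
                 if t0 <= start and end <= t1 and low <= value <= high]
    if not durations:
        return 0, 0.0, 0.0
    return len(durations), mean(durations), pvariance(durations)


# Hypothetical semantic value: 1.0 while "useful", 0.0 otherwise.
bursts = [(0.0, 10.0, 1.0), (10.0, 15.0, 0.0), (15.0, 30.0, 1.0), (30.0, 40.0, 0.0)]

avg = average_value(bursts, 0.0, 40.0)
count, avg_duration, variance = burst_stats(bursts, 0.0, 40.0, 1.0, 1.0)
```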

OpenMP instrumentation

Compiler instrumentation: NANOS compiler

Dynamic interception: SGI native OpenMP (MP library)

Tracing of thread status: running, idle (busy wait), scheduling, blocked

OpenMP analysis

Application structure: Stamping code

OpenMP analysis

Loop scheduling: Antenna design

OpenMP analysis

How do bees see flowers?

OpenMP analysis

What bees don’t see

Function            A      B      C      D
Av. L2 misses/ms    62     52     163    14
FLOPS/ms            41K    21K    8K     1K
Loads/ms            57K    52K    18K    100K
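
The counter rates in this table can be combined into per-load ratios, which makes the four functions easier to compare. The Python sketch below uses only the figures from the table; the derived metric names are hypothetical, and reading "K" as 1000 is an assumption.

```python
# Counter rates per millisecond, copied from the table (K read as 1000).
counters = {
    "A": {"l2_misses": 62, "flops": 41_000, "loads": 57_000},
    "B": {"l2_misses": 52, "flops": 21_000, "loads": 52_000},
    "C": {"l2_misses": 163, "flops": 8_000, "loads": 18_000},
    "D": {"l2_misses": 14, "flops": 1_000, "loads": 100_000},
}


def derived(c):
    """Ratios that do not depend on how long each function runs."""
    return {
        "l2_misses_per_1000_loads": 1000 * c["l2_misses"] / c["loads"],
        "flops_per_load": c["flops"] / c["loads"],
    }


ratios = {name: derived(c) for name, c in counters.items()}

# Function with the most L2 misses per load.
most_misses = max(ratios, key=lambda n: ratios[n]["l2_misses_per_1000_loads"])

for name, r in ratios.items():
    print(f"{name}: {r['l2_misses_per_1000_loads']:.2f} L2 misses per 1000 loads, "
          f"{r['flops_per_load']:.2f} FLOPs per load")
```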

Static vs. Dynamic Parallelism

More on hardware counters

Fewer misses, more time

More on hardware counters

More memory accesses per second

Fewer coherence state changes

MPI + OpenMP

NAS FT

Quantitative data:

% MPI collective comm: 18%

% OMP fork/join: 5%

% non-parallelized: 32%

Avg. || loop: 50 ms

# || loops: 38

# || loops < 5 ms: 6

Other uses

System activity

InfoPerfex

Pthreads

Average : 33 MFLOPS

Peak: 60 MFLOPS

Paraver on IBM

DPCL + PAPI: Sequential programs, OpenMP

UTE: MPI, MPI + OpenMP

UTE Paraver

Filter Thread states

Executing application code

Executing MPI Receive

Executing MPI Send

Descheduled

Statistics

Appl. Code    MPI Rec.    MPI Send    Descheduled
97%           1%          0%          1%
38%           9%          1%          52%
46%           1%          1%          52%
47%           8%          0%          45%
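
A per-thread breakdown like the statistics above can be derived from state records by summing the time spent in each state. The Python below is a minimal sketch: the tuple layout follows the State record fields from the Tracefile slide, while the state labels, the function name and the sample records are hypothetical and do not reproduce the measured trace.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

StateRecord = Tuple[str, float, float, str]  # (object, time_start, time_end, state)


def state_percentages(records: Iterable[StateRecord]) -> Dict[str, Dict[str, float]]:
    """Percentage of traced time that each object spends in each state."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for obj, start, end, state in records:
        totals[obj][state] += end - start
    result = {}
    for obj, per_state in totals.items():
        whole = sum(per_state.values())
        result[obj] = {state: 100.0 * spent / whole
                       for state, spent in per_state.items()}
    return result


# Hypothetical records for two threads (times in ms).
records = [
    ("thread-1", 0.0, 97.0, "application"),
    ("thread-1", 97.0, 98.0, "mpi_receive"),
    ("thread-1", 98.0, 100.0, "descheduled"),
    ("thread-2", 0.0, 40.0, "application"),
    ("thread-2", 40.0, 50.0, "mpi_receive"),
    ("thread-2", 50.0, 100.0, "descheduled"),
]

shares = state_percentages(records)
```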

UTE Analysis

Communication pattern: exchanges between 1 and 2; 3 and 4

Load balance: more load on thread 1

MPI implementation: busy wait on receives

Scheduling: threads 2 and 3 time-share one CPU; thread 4 time-shares one CPU with other processes. OS quantum: 10 ms.

More information

http://www.cepba.upc.es/paraver

cepbatools@cepba.upc.es
