Post on 20-Dec-2015
Computer and Automation Research Institute
Hungarian Academy of Sciences
The P-GRADE Visual Parallel Programming Environment
Péter Kacsuk
Laboratory of Parallel and Distributed Systems
MTA SZTAKI Research Institute
www.lpds.sztaki.hu
Problems of Developing Parallel Programs
[Figure labels: High-Speed Switch; Observing?; Programming?]
Our Solution: P-GRADE
P-GRADE is a parallel programming environment which supports the whole life-cycle of parallel program development
For non-specialist programmers it provides a complete solution for efficient and easy parallel program development
Fast reengineering of sequential programs for parallel computers
Unified graphical support in program design, debugging and performance analysis
Portability on supercomputers and heterogeneous workstation/PC clusters based on PVM and MPI
Tools of P-GRADE
• GRAPNEL: Hybrid Parallel Programming Language
  – Graphics to express parallelism
  – C/C++ to describe sequential parts
• GRED: Graphical Editor
• GRP2C: Pre-compiler to (C/C++)+(PVM/MPI)
• DIWIDE: Integrated distributed debugger and animation system
• GRM: Distributed monitoring system
• PROVE: Integrated visualisation tool
Life-cycle of Parallel Program Development and its support in P-GRADE
• Parallel program design: GRAPNEL, GRED
• Mapping: user mapping → GRP file
• Pre-compilation: GRP2C → C source code, cross-ref file, make file
• Building executables: C compiler, linker → executables
  – GRP-PVM: GRM library, PVM library
  – GRP-MPI: GRM library, MPI library
• Debugging: DIWIDE (GRP file)
• Monitoring: GRM → trace file
• Visualisation: PROVE
Design Goals of GRAPNEL
• Graphical interface
  – to define all parallel activities
  – Strong support for hierarchical design
  – Visual abstractions to hide the low level details of message-passing
• C/C++ (or Fortran) to describe sequential parts
  – Strong support for parallelizing sequential applications
  – Support for programming in large
  – No steep learning curve
• GRAPNEL = (C/C++) + graphics
GRAPNEL: GRaphical Process NEt Language
• Programming paradigm: message-passing
  – component processes run in parallel and can interact only by means of sending and receiving messages
• Communication model:
  – point-to-point, synchronous/asynchronous
  – collective (e.g. multicast, scatter, reduce, etc.)
• Process model:
  – single processes
  – process groups
  – predefined process communication templates
Three layers of GRAPNEL
GRAPNEL
• Hierarchical design levels:
  – Graphics used at application level: defines interprocess communication topology and port protocols
• Graphics hides PVM/MPI function calls
• Support for SPMD programming style
  – Predefined communication patterns
  – Automatic scaling of parallel programs
Communication Templates
• Pre-defined regular process topologies
  – process farm
  – pipeline
  – 2D mesh
  – tree
• User defines:
  – representative processes
  – actual size
• Automatic scaling
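As a rough illustration of what scaling a template to an "actual size" could involve, the sketch below generates the channel list of each regular topology from a size parameter. The function names and the edge-list representation are assumptions made for this example; they are not GRAPNEL's internal format.

```python
def farm(n_slaves):
    """Channels of a process farm: master 0 is linked to slaves 1..n_slaves."""
    return [(0, s) for s in range(1, n_slaves + 1)]


def pipeline(n_stages):
    """Channels of a pipeline: stage i sends to stage i + 1."""
    return [(i, i + 1) for i in range(n_stages - 1)]


def mesh2d(rows, cols):
    """Channels of a rows x cols mesh: each node links to its right and lower neighbour."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            if c + 1 < cols:
                edges.append((node, node + 1))
            if r + 1 < rows:
                edges.append((node, node + cols))
    return edges


def tree(depth, arity=2):
    """Channels of a complete tree; depth 0 is a single root, nodes numbered level by level."""
    n_nodes = sum(arity ** d for d in range(depth + 1))
    return [((child - 1) // arity, child) for child in range(1, n_nodes)]
```

Changing only the size argument, for example `mesh2d(3, 4)` instead of `mesh2d(2, 2)`, yields the larger topology without redrawing any process by hand, which is the idea the slide attributes to automatic scaling.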
Mesh Template
Tree Template
The process farm parallelisation approach
• Master: spawn(N);
  – Send work packages: send();
  – Collect results: recv();
• Slave1, Slave2, ..., SlaveN
The code of each slave is the same.
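The farm pattern can be tried out without PVM or MPI. In the minimal sketch below, threads and queues stand in for processes and messages (an assumption of this example, chosen so that it runs anywhere); `run_farm`, `slave` and the `None` termination message are hypothetical names, not part of P-GRADE.

```python
import queue
import threading


def slave(work_q, result_q, compute):
    """Every slave runs this same code: receive a package, compute, send the result."""
    while True:
        package = work_q.get()              # plays the role of recv()
        if package is None:                 # hypothetical termination message
            break
        index, data = package
        result_q.put((index, compute(data)))  # plays the role of send()


def run_farm(packages, n_slaves, compute):
    """Master: spawn the slaves, send work packages, collect results."""
    work_q, result_q = queue.Queue(), queue.Queue()
    slaves = [threading.Thread(target=slave, args=(work_q, result_q, compute))
              for _ in range(n_slaves)]     # plays the role of spawn(N)
    for t in slaves:
        t.start()
    for index, data in enumerate(packages):
        work_q.put((index, data))
    # Results arrive in no fixed order, so each one is stored by its index.
    results = [None] * len(packages)
    for _ in range(len(packages)):
        index, value = result_q.get()
        results[index] = value
    for _ in slaves:
        work_q.put(None)                    # one termination message per slave
    for t in slaves:
        t.join()
    return results
```

For example, `run_farm([1, 2, 3, 4, 5], 3, lambda x: x * x)` returns the squares in input order regardless of which slave handled which package. Python threads illustrate the communication structure only; they do not show the speed-up a real farm on a cluster would give.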
Parallelising the Mandelbrot set computation
[Figure panels: Draw process output; Compute process input; Compute process output; Draw process input]
Process Groups
• Hierarchical design (subgraph abstraction)
• Collective communication (group ports)
  – multicast
  – scatter
  – gather
  – reduce
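The four collective operations can be summarised by what each group member ends up holding. The sketch below models a group as a plain Python list indexed by rank; this is an assumption made for illustration and says nothing about how GRAPNEL group ports are implemented.

```python
from functools import reduce as fold


def multicast(value, group_size):
    """Every member of the group receives the same value."""
    return [value for _ in range(group_size)]


def scatter(items, group_size):
    """Split items into group_size contiguous chunks, one per member."""
    base, extra = divmod(len(items), group_size)
    chunks, start = [], 0
    for rank in range(group_size):
        size = base + (1 if rank < extra else 0)  # first `extra` ranks get one more
        chunks.append(items[start:start + size])
        start += size
    return chunks


def gather(chunks):
    """Concatenate the members' chunks in rank order."""
    return [x for chunk in chunks for x in chunk]


def reduce_group(values, op):
    """Combine one value per member into a single result using op."""
    return fold(op, values)
```

A typical combination is scatter, local computation, then reduce: `reduce_group([sum(c) for c in scatter(data, 3)], lambda a, b: a + b)` computes the same total as `sum(data)`.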
GRAPNEL
• Hierarchical design levels:
  – Graphics used at process internal level
  – C/C++ used at the text level
• Synch/asynch. comm.
• Programming in large:
  – Any C/C++ library call can be included in text blocks
  – Graphical support for object-based programming
GRAPNEL
Structuring facility by macro graphs
[Figure labels: multicast, gather, Userdef (grp_in), reduce, scatter, point-to-point, gather, scatter, Userdef (grp_out)]
GRAPNEL
Parallelising the Mandelbrot set computation
GRED Editor
Supports the creation of all the elements of GRAPNEL
Drag-and-drop style of drawing
Cut/copy/paste/move on graphical objects
Automatic port positioning with minimal lengths and crossing of communication channels
GRED Editor
• Extremely easy and fast construction of process graph
  – Automatic arrangement of the process graph
  – Automatic resizing of process windows
• Cut/copy/paste on graphical objects
• Macro graph construction at arbitrarily nested level
• C/C++ code can be edited by any standard text editor
GRP2C Pre-compiler
• Automatic generation of PVM and MPI calls based on GRAPNEL graphics
• Automatic code instrumentation for debugging and performance monitoring
[Figure: GRAPNEL (C/C++ + graphics) → GRP2C → generated code (C/C++ + PVM/MPI)]
Debugging Parallel Programs
[Figure labels: High-Speed Switch; Observing?]
Principle of sequential program debugging
• Reproducibility - determinism
  – For the same input set the sequential program always delivers the same output set (even if the program is incorrect)
• Used technique: cyclic debugging
  – breakpoints
  – step-by-step execution
Problem of parallel program debugging
• Non-reproducibility (non-determinism)
  – For the same input set the incorrect parallel program can deliver different output sets
• Cyclic debugging cannot be used
  – breakpoints
  – step-by-step execution
Classification of parallel debuggers
• Parallel running seq. debuggers
• Replayable debuggers
  – Monitor&replay
  – Control&replay
DIWIDE Debugger
• Graphical and C/C++ level debug support (breakpoints, variable inspection, etc.)
• 3 kinds of “step by step execution”, according to the programmer’s demand:
  – Instruction by instruction
  – Graphical item by graphical item
  – Macrostep by macrostep
• Visualisation and animation support
Hierarchical Debugging by DIWIDE
Classification of parallel breakpoints
• Local breakpoints
• Global breakpoints
• Individual breakpoints
• Collective breakpoints
Principle of Macrostep Debugging
Parallel debugging is as easy as debugging traditional sequential programs.
Macrosteps                              Collective Breakpoints
M0 = {S1 -> A1, S2 -> A2, S3 -> A3}     A1 A2 A3
M1 = {A1 -> B1, A2 -> B2, A3 -> B3}     B1 B2 B3
M2 = {B1 -> B1, B2 -> C2, B3 -> B3}     B1 C2 B3
M3 = {B1 -> B1, C2 -> D2, B3 -> E3}     B1 D2 E3
M4 = {B1 -> E1, D2 -> E2}               E1 E2
where Si = Start_i and Ei = End_i
[Figure: breakpoints of processes P1 (S1, A1, B1, E1), P2 (S2, A2, B2, C2, D2, E2) and P3 (S3, A3, B3, E3)]
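The macrostep table above can be replayed mechanically. The sketch below encodes each macrostep as a mapping from a process's current breakpoint to the one it reaches next and recomputes the collective breakpoint after every step. The dictionary encoding and the `replay` function are assumptions made for this illustration, not DIWIDE's data structures.

```python
# Macrosteps copied from the table: current breakpoint -> next breakpoint.
MACROSTEPS = [
    {"S1": "A1", "S2": "A2", "S3": "A3"},   # M0
    {"A1": "B1", "A2": "B2", "A3": "B3"},   # M1
    {"B1": "B1", "B2": "C2", "B3": "B3"},   # M2
    {"B1": "B1", "C2": "D2", "B3": "E3"},   # M3
    {"B1": "E1", "D2": "E2"},               # M4
]


def replay(start, macrosteps):
    """Apply the macrosteps in order and return the collective breakpoint after each."""
    current = list(start)
    history = []
    for step in macrosteps:
        unreached = [bp for bp in step if bp not in current]
        if unreached:
            raise ValueError(f"macrostep starts from unreached breakpoints: {unreached}")
        # Only processes named in the step take part in the next collective breakpoint;
        # a process that has already ended (such as P3 at E3) drops out.
        current = [step[bp] for bp in current if bp in step]
        history.append(current)
    return history
```

Calling `replay(["S1", "S2", "S3"], MACROSTEPS)` reproduces the right-hand column of the table, ending with the collective breakpoint `["E1", "E2"]`.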
Macrostep Debugging
• Support for systematic debugging to handle non-deterministic behaviour of parallel applications
• Systematic and automatic generation of Execution Trees
• Testing parallel programs for all timing conditions
• Replay technique with collective breakpoints
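To make "testing for all timing conditions" concrete, the sketch below takes the simplest source of non-determinism, a receiver that accepts one message from each of several senders in whatever order they arrive, and enumerates every arrival order as a leaf of a small execution tree. This is a deliberately reduced model under stated assumptions: real macrostep debugging builds its tree over collective breakpoints, and the function names here are hypothetical.

```python
from itertools import permutations


def execution_tree_leaves(senders):
    """Every order in which the receiver could accept one message per sender."""
    return list(permutations(senders))


def run_receiver(order, messages, combine, initial):
    """Replay one arrival order and return the receiver's final state."""
    state = initial
    for sender in order:
        state = combine(state, messages[sender])
    return state


def explore_timings(messages, combine, initial):
    """Map each distinct final state to the arrival orders that produce it."""
    outcomes = {}
    for order in execution_tree_leaves(sorted(messages)):
        final = run_receiver(order, messages, combine, initial)
        outcomes.setdefault(final, []).append(order)
    return outcomes
```

If `explore_timings` returns a single outcome, the receiver is insensitive to message timing (as with addition); several outcomes mean the same input can produce different outputs, which is exactly the situation where a single test run proves little.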
Automatic Deadlock Detection by Macrostep Debugging
Integration of Macrostep Debugging and PROVE
Performance monitoring and analysis of Parallel Programs
[Figure labels: High-Speed Switch; Observing?]
Visualisation Systems
• What to visualise?
  – Scientific (Data Oriented) Visualisation
  – Program Visualisation
  – Problem Visualisation (Alg. Animation)
• Goal of visualisation? (Program Visualisation)
  – Correctness Debugging
  – Performance (Debugging) Visualisation
  – Combined Visualisation
• When to visualise?
  – Off-line
  – On-line
  – Semi On-line
Phases of Performance Visualisation
• Source Code Instrumentation (GRAPNEL/GRED)
• Runtime Monitoring (GRM)
• Visualisation (PROVE)
• Data Analysis (PROVE)
Evaluation Criteria
Performance Visualisation
• Scalability (Data handling)
• Source Code Instrumentation
• Versatility (Visualisation)
Source Code Instrumentation
• Manual or Automatic
• Monitoring modes
• Filtering
• Click-back facility
• Selectable program units
• Individual Events
• On/off facility
• Statistics
Scalability
• Data Acquisition: turning tracing on/off, filtering
• Data Analysis & Display: zooming, filtering
• Interactive / Non-Interactive (tools named on the slide: VISTOP, Nupshot)
Versatility
• Interoperate with other tools: No / Yes
• Different views: Event views, Statistics views
Standalone Performance Analysis Tools
• VAMPIR
• Pablo
• ParaGraph
• AIMS
• Paradyn
VAMPIR
Integrated Performance Analysis Tools
• VISTOP (TOPSYS)
• PVMVis (EDPEPPS)
• PROVE (GRADE)
Source Code Instrumentation
• Automatic
• Monitoring modes
• Filtering
• Click-back facility
• Selectable program units
• Individual Events
• On/off facility
• Statistics
Source code click-back facility and click-forward facility
Behaviour Window of PROVE
• Scrolling visualisation windows forward and backwards
• User controlled focus on processors, processes and messages
• Zooming, event filtering facilities
Filtering in PROVE
PROVE Performance analyser
• Various views for displaying performance information
Synchronised multi-window visualisation
PROVE Summary Windows
Various views for displaying summary information
Synchronised multi-window visualisation
PROVE Statistics Windows
• Profiling based on counters
• Analysis of very long running programs is enabled
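Counter-based profiling suits long runs because the stored data grows with the number of distinct (process, event) pairs rather than with the number of events. The class below is a minimal sketch of that idea; `CounterProfile` and its methods are hypothetical names, not the interface of GRM or PROVE.

```python
from collections import defaultdict


class CounterProfile:
    """Keep a count and a total duration per (process, event) instead of every event."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total_time = defaultdict(float)

    def record(self, process, event, duration):
        """Fold one observed event into the counters; nothing else is stored."""
        key = (process, event)
        self.count[key] += 1
        self.total_time[key] += duration

    def mean(self, process, event):
        """Average duration of an event type, or None if it was never recorded."""
        key = (process, event)
        if self.count.get(key, 0) == 0:
            return None
        return self.total_time[key] / self.count[key]
```

The trade-off is that individual events can no longer be inspected afterwards; only the aggregated figures remain, which is why statistics views complement event views rather than replace them.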
The GRM Monitor
• Off-line monitoring (GRADE)
  – stores trace events in a (local or global) storage and
  – makes it available after execution for post-mortem processing.
• Semi-on-line monitoring (P-GRADE)
  – stores trace events in a storage but
  – makes it available for the visualisation tool any time during execution if the user asks for it
  – interactive usage of PROVE
  – user can remove already inspected part of the trace
  – evaluation of long-running programs
  – macrostep debugging in P-GRADE with execution visualisation
• Application-level monitor
• Tracing + statistics collection
• Semi-on-line
[Figure: P-GRADE monitoring architecture. Labels: DIWIDE, GRED, PROVE, Prove-rdd, GRM (MM), trace file; LM on Host 1, Host 2, ..., Host n; local host, server host, remote cluster; connection types: socket, file operation]
GRM monitor
[Figure labels: MM, LM, proc 1, proc 2, shared-memory buffer, host; connection types: socket, memory op., pipe]
Trace collection
[Figure labels: MM; LM, LM; Process 1, Process 2, Process 3; trace file]
• Buffer is full (to a certain threshold)
• Process notifies LM
• LM notifies MM
• MM asks all LMs to stop application
• MM for each LM:
  – asks the LM to send trace
  – receives trace from LM
  – sets timestamps to a global time
  – writes trace into the trace file
• MM asks LMs to continue application
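The per-LM part of a collection round described above can be sketched as a small simulation. The slides do not say how GRM derives the global time, so the additive per-host clock offset used here is an assumption of this example, as are the function name and the tuple layout of an event.

```python
def collect_trace(local_buffers, clock_offsets):
    """Simulate the MM gathering the buffered events of every LM in one round.

    local_buffers: {host: [(local_time, event), ...]} as held by each LM
    clock_offsets: {host: offset}, assuming global_time = local_time + offset
    """
    trace = []
    for host in sorted(local_buffers):
        events = local_buffers[host]        # MM receives the trace from this LM
        for local_time, event in events:
            # Set the timestamp to global time before writing the record.
            trace.append((local_time + clock_offsets[host], host, event))
        local_buffers[host] = []            # the LM's buffer is free again
    trace.sort()                            # records ordered by global time
    return trace
```

In this model the application is assumed to be stopped while `collect_trace` runs, mirroring the stop and continue requests that bracket the round; the returned list plays the role of the trace file.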
Portability
Supported Hardware/Software Platforms
• Workstation clusters
  – SGI MIPS / IRIX 5.x/6.x (MTA SZTAKI, Univ. of Vienna)
  – Sun UltraSPARC / Solaris 2.x (Univ. of Athens)
  – Intel x86 / Linux (MTA SZTAKI)
• Supercomputers
  – Hitachi SR2201 / HI-UX/MPP (Polish-Japanese School, Warsaw)
  – Cray T3E / UNICOS (Jülich, Germany)
International installations
• Current
  – UK
  – Austria
  – Spain
  – Portugal
  – Poland
  – Germany
  – Slovakia
  – Greece
  – Japan
  – Mexico
  – USA
• Planned
  – Australia
  – Korea
Further Developments
• Family of parallel programming environments: P-GRADE, VisualMP, VisualGrid
  – checkpointing
  – dynamic load balancing
  – fault tolerance
  – grid resource management
  – grid monitoring
  – mobile processes
Conclusion
• Current applications in physics
  – Efficiency lost due to high level graphical programming is less than 2 %
• Weather forecast application under development
• Download version:
  – www.lpds.sztaki.hu
• P-GRADE (Professional GRADE)
  – Project with Silicon Graphics Hungary
  – Current developments to support
    • SPMD style programming
    • Object based programming
Thank You ...
?