Pablo and Autopilot: Performance Tuning in Distributed Computing Environments

Ruth Aydt
Pablo Research Group
Department of Computer Science
University of Illinois at Urbana-Champaign
http://www-pablo.cs.uiuc.edu

Pablo Research Group - Department of Computer Science - UIUC



Presentation Outline

• Requirements for successful performance tuning

• Pablo toolkit components - how we got here

• Autopilot
  – Basic concepts
  – Component interactions
  – Fuzzy Logic decision infrastructure

• Pablo-provided monitor/control programs
  – Autodriver
  – Virtue

• Case study of Parallel Rocket Simulation Code

• Current Work


Requirements for Successful Performance Tuning in a Distributed Environment:

• top to bottom and end to end real-time performance data capture

• “appropriate” performance data detail and granularity… just enough but not too much!

• tools to help correlate and interpret captured data

• dynamic policy selection in response to current resource availability and application demands


Pablo Toolkit Components: a Decade of Performance Monitoring and Analysis Tools


Pablo Trace Library and Extensions

• Libraries linked with application to trace “generic events” and also loops, message passing, procedure calls, Unix I/O, MPI I/O, HDF routines

• For C codes, a preprocessor replaces standard function names (e.g. read) with tracing versions (e.g. traceREAD). For Fortran, calls are bracketed manually with traceReadBegin / traceReadEnd

• Timestamped event data written to buffer and flushed periodically to per-processor files
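
A minimal Python sketch of the buffering idea described above, not the Pablo trace library itself. A wrapper times a read call and appends a timestamped record to an in-memory buffer, which is flushed to a sink once it fills. The names TraceBuffer and trace_read, the record layout, and the capacity-based flush policy are assumptions made for illustration; the slide only says that the buffer is flushed periodically.

```python
import io
import time


class TraceBuffer:
    """Holds timestamped event records and writes them out in batches."""

    def __init__(self, sink, capacity=4):
        self.sink = sink          # any object with a write(str) method
        self.capacity = capacity  # hypothetical flush threshold
        self.events = []

    def record(self, name, start, duration, nbytes):
        self.events.append((start, name, duration, nbytes))
        if len(self.events) >= self.capacity:
            self.flush()

    def flush(self):
        for start, name, duration, nbytes in self.events:
            self.sink.write(f"{start:.6f} {name} {duration:.6f} {nbytes}\n")
        self.events.clear()


def trace_read(buffer, stream, size):
    """Tracing stand-in for stream.read(size): same result, plus one event."""
    start = time.time()
    data = stream.read(size)
    buffer.record("read", start, time.time() - start, len(data))
    return data
```

In a real run the sink would be a per-processor file, and a final flush() at exit would write any records still held in the buffer.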


Pablo I/O, MPI I/O, HDF Analysis

• Produce reports from I/O event data

• Sample MPI-IO summary report shown:

[Figure: sample MPI-IO summary report]


Pablo Self-Defining Data Format

• A performance data metaformat that specifies both data record structures and data record instances

• Unlimited set of event types supported depending on the “interesting” performance data

• SDDF library provides classes to read and write files in SDDF format

• General-purpose tools can be written using the library and the Record/Field names in the SDDF files


Sample SDDF File showing Data Structure and Data Instance

    SDDFA

    #337:
    // "description" "IO Read"
    "Read" {
        // "Seconds" "Timestamp"
        double "Seconds";
        // "Event ID" "Corresponding event"
        // "700009" "read"
        // "700011" "fread"
        int "Event Identifier";
        // "Node" "Processor number"
        int "Processor Number";
        // "Duration" "Event duration in seconds"
        double "Duration";
        // "File ID" "Unique file identifier"
        int "File ID";
        // "Number Bytes" "Number of bytes read"
        int "Number Bytes";
    };;

    "Read" { 0.019991, 700011, 0, 0.000203, 3, 3 };;

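
To show how the field names in a record descriptor let a general-purpose tool interpret a data instance, here is a small Python sketch that pairs the values of the sample "Read" instance with the field names from its descriptor. It is not the SDDF library and handles only flat records of scalar values like the one in the sample; the function name parse_instance and the regular expression are assumptions for illustration.

```python
import re

# Field names copied from the "Read" record descriptor in the sample file.
READ_FIELDS = ["Seconds", "Event Identifier", "Processor Number",
               "Duration", "File ID", "Number Bytes"]

# Matches one flat ASCII record instance: "Name" { v1, v2, ... };;
RECORD_RE = re.compile(r'"(?P<name>[^"]+)"\s*\{(?P<body>[^{}]*)\}\s*;;')


def parse_instance(line, fields):
    """Return (record name, {field name: value}) for one record instance."""
    match = RECORD_RE.search(line)
    if match is None:
        raise ValueError("not a flat record instance")
    raw = [v.strip() for v in match.group("body").split(",")]
    if len(raw) != len(fields):
        raise ValueError("field count does not match the descriptor")
    # Values containing a decimal point are read as floats, others as ints.
    values = [float(v) if "." in v else int(v) for v in raw]
    return match.group("name"), dict(zip(fields, values))
```

Applied to the sample instance, this yields a "Read" record whose "Event Identifier" is 700011, which the descriptor comments identify as fread.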

SDDFStatistics Analysis Program for SDDF Files


SvPablo

• A graphical source code browser and performance capture/correlation tool

• Allows user to select loops and procedures to instrument in C, F77, F90 code. Automatic instrumentation for HPF via PGI performance interface.

• Collects performance data and later displays it relative to source code line

• Option for real-time data transmission via Autopilot tagged sensors (more later)


SvPablo GUI


Virtue

• A collaborative virtual environment for direct software manipulation
  – Hierarchical graph representations that show software structure, dynamics, and performance
  – Manipulation tools for augmented interactions with the virtual environment
  – Annotation tools for distributed, collaborative exploration and recording

• Uses OpenGL and EVL CAVE library for 3-d effects in CAVE, ImmersaDesk, and desktop environments


Autopilot: Performance Tuning in Distributed Computing Environments


Autopilot Toolkit

• Provides a framework for the capture and analysis of real-time application and infrastructure data in a multi-threaded distributed environment

• Offers the ability to control volume of performance data through
  – selective registration and property matching
  – analysis and data reduction at point of collection
  – constant, periodic, or on-demand transmission of data
  – ability to dynamically enable/disable data collection

• Includes a control interface to allow steering of infrastructure policies and applications, either interactively or via automated decision procedures


Basic Autopilot Concepts

• Sensors: provide data to remote processes, allowing real-time monitoring
  – intrinsic (procedural - push)
  – extrinsic (threaded - push)
  – transfer data when requested by remote process (pull)

• Sensor Attached Functions: transform sensed data via user-defined functions before it is recorded by the sensor, providing an important data-reduction technique
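
A rough Python sketch of a push-style sensor with an attached function, to make the data-reduction idea concrete. Autopilot itself is not written in Python, and the class names Sensor and SlidingMean, the callback-based delivery, and the enabled flag are all hypothetical stand-ins rather than the toolkit's interface. The attached function here reduces raw samples to a mean over the last n values before anything is delivered to clients.

```python
from collections import deque


class SlidingMean:
    """Attached function: replace each raw sample by the mean of the last n."""

    def __init__(self, n):
        self.window = deque(maxlen=n)

    def __call__(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)


class Sensor:
    """Hypothetical push-style sensor; delivery is a plain callback here."""

    def __init__(self, attached=None):
        self.attached = attached  # optional transformation applied first
        self.clients = []
        self.enabled = True       # collection can be switched off at run time

    def connect(self, callback):
        self.clients.append(callback)

    def record(self, value):
        if not self.enabled:
            return
        if self.attached is not None:
            value = self.attached(value)
        for deliver in self.clients:
            deliver(value)
```

Because the reduction happens inside the instrumented process, only the transformed values would cross the network in a distributed setting.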


Basic Autopilot Concepts

• Actuators: provide remote processes the ability to invoke local functions or update data, allowing remote steering
  – synchronous (application controls when updates are made; requests may be held in pending buffer)
  – asynchronous (updates are made when request received from external agent)

• Properties: key-value pairs that are associated with and used to identify a sensor or actuator, allowing remote processes to be selective about the sensors and actuators they connect to
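
The difference between synchronous and asynchronous actuators can be sketched in a few lines of Python. This is an illustration under assumptions, not Autopilot's interface: the class Actuator and the methods request and process_pending are hypothetical names. An asynchronous actuator applies a request as soon as it arrives, while a synchronous one holds requests in a pending buffer until the application reaches a point where an update is safe.

```python
class Actuator:
    """Hypothetical actuator wrapping a local update function."""

    def __init__(self, apply, synchronous=False):
        self.apply = apply              # local function that performs the update
        self.synchronous = synchronous
        self.pending = []               # used only in synchronous mode

    def request(self, value):
        """Called on behalf of a remote actuator client."""
        if self.synchronous:
            self.pending.append(value)  # held until the application asks
        else:
            self.apply(value)           # applied immediately

    def process_pending(self):
        """Called by the application where updates are safe; returns the count applied."""
        applied = len(self.pending)
        for value in self.pending:
            self.apply(value)
        self.pending.clear()
        return applied
```

An iterative solver, for example, might call process_pending() once per time step so that a steered parameter never changes in the middle of an iteration.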


Basic Autopilot Concepts

• Sensor Client: a process that connects to one or more sensors with matching properties and receives data from those sensors

• Actuator Client: a process that connects to one or more actuators with matching properties and sends data to those actuators, causing application variables controlled by the actuators to be updated or functions to be invoked


Basic Autopilot Concepts

• Autopilot Manager: a daemon process that is responsible for handling registration requests from sensors and actuators, and matching sensor client and actuator client requests to registered sensors and actuators.

* AutopilotManager daemons may be run on multiple hosts throughout the computational grid, allowing sensors, actuators, and clients to tailor data transfer volumes to appropriate levels for local and distant tasks.
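
Property matching by the manager can be pictured as a registry lookup. The Python sketch below is an assumption-laden illustration: the class Manager, its methods, and the use of plain strings in place of the global pointers that a real manager returns are all hypothetical. A client request matches a registered sensor or actuator when every requested key-value pair appears among its properties.

```python
class Manager:
    """Hypothetical registry standing in for an Autopilot Manager daemon."""

    def __init__(self):
        self.registered = []  # list of (handle, properties) pairs

    def register(self, handle, properties):
        """Called by a sensor or actuator when it starts up."""
        self.registered.append((handle, dict(properties)))

    def match(self, wanted):
        """Return handles whose properties include every requested pair."""
        return [handle for handle, props in self.registered
                if all(k in props and props[k] == v for k, v in wanted.items())]
```

A client interested only in processor utilization on one host would pass both a metric and a host property, and would connect to nothing else.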


Tagged Sensors, Actuators, Clients

• Information about the structure of the data is forwarded when a client first connects to a matching sensor or actuator, allowing the client to perform verification checks and ignore unwanted data.

• Tagged data sets map naturally into what we normally think of as event trace records.

• Sometimes called “SDDF-enabled” because the buffer contents can easily be translated to SDDF


Autopilot and Nexus/Globus

• Autopilot uses the Nexus component of the Globus toolkit (http://www-globus.org) to provide
  – communication substrate & multithreading capabilities

• Nexus creates a global address space that encompasses all processes executing on a distributed network

• Nexus Remote Service Requests (RSRs) are used by Autopilot classes to transmit messages, ensuring optimal underlying transfer protocol

• Nexus multi-threaded handlers are used by Autopilot classes to process RSRs

• Most Nexus details are hidden by Autopilot classes


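Autopilot Component Interactions

[Diagram: Instrumented Task, Autopilot Manager, Monitor/Control Task]

1. sensors and actuators register with their properties (Instrumented Task to Autopilot Manager)

2. clients request matching sensors and actuators (Monitor/Control Task to Autopilot Manager)

3. global pointers returned for matches (Autopilot Manager to Monitor/Control Task)

4. sensor and actuator controls and actuator data (Monitor/Control Task to Instrumented Task)

5. sensor data (Instrumented Task to Monitor/Control Task)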


Instrumented Tasks

• May contain multiple sensors and/or actuators

• Many instrumented tasks may be active at any given time

• May register sensors and actuators with multiple Autopilot Managers running on different hosts

• May be application code or infrastructure resource monitor (lmon)


Monitor/Control Tasks

• May contain multiple sensor clients and/or actuator clients

• Many monitor/control tasks may be active at any given time

• May query multiple Autopilot Managers running on different hosts

• May implement “human in the loop” (Autodriver, Virtue) or automated fuzzy logic decision server (PPFS II)

• May be monitor only, writing collected data to a file or displaying it


Fuzzy Logic Decision Infrastructure

[Diagram labels: Knowledge Repository; Fuzzy Logic Rule Base; Fuzzy Logic Decision Process; Fuzzifier; Defuzzifier; Inputs; Outputs; System; Sensors; Actuators; Instrumented Task(s); Monitor/Control Task(s)]

Sample Fuzzy Logic Rule Base for Temperature Control

    rulebase FurnaceRules;

    // decide what to do based on roomtemp which falls into 3 ranges
    var roomtemp(0,100) {
        set trapez cold  ( 0, 50, 0, 20 );
        set trapez medium( 50, 70, 10, 10 );
        set trapez hot   ( 80, 100, 20, 0 );
    };

[Plot: roomtemp truth values (0 to 1) against roomtemp (0 to 100) for the cold, medium, and hot sets]

Sample Fuzzy Logic Rule Base for Temperature Control (continued)

    // control the furnace value in a range of 0-1, with 0 = off
    var furnace(0,1) {
        set triangle off ( 0, 0, 0.1 );
        set triangle half( 0.5, 0.1, 0.1 );
        set triangle full( 1, 0.1, 0 );
    };

    // the rules
    if ( roomtemp == cold ) { furnace = full; }
    if ( roomtemp == medium ) { furnace = half; }
    if ( roomtemp == hot ) { furnace = off; }


Fuzzy Logic Decision Infrastructure


• Autopilot sensors provide a stream of room temperature readings. After fuzzification, this stream defines the value of the roomtemp fuzzy variable.

• Rules whose conditions are non-zero all contribute to determining the value of the output fuzzy variable furnace. After defuzzification, the value of furnace defines the action taken by the Autopilot actuator.

• Fuzzy logic handles noisy data and conflicting goals.

• Fuzzy logic separates data sets (definition of fuzzy variables) and rules (assertions and consequents) allowing each to be independently adjusted for a particular computing environment without re-coding the decision procedure.
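
The fuzzify / apply rules / defuzzify cycle for the furnace example can be sketched in Python. Several things here are assumptions, not taken from the slides: trapez(a, b, lw, rw) is read as a set that is fully true on [a, b] and falls linearly to zero over widths lw and rw on either side, triangle(c, lw, rw) as the same shape with a single peak at c, and the decision step uses a common min/max scheme with a sampled centroid as a stand-in for whatever defuzzifier Autopilot actually applies.

```python
def trapez(a, b, lw, rw):
    """Assumed reading: truth 1 on [a, b], linear fall to 0 at a - lw and b + rw."""
    def mu(x):
        if a <= x <= b:
            return 1.0
        if lw > 0 and a - lw < x < a:
            return (x - (a - lw)) / lw
        if rw > 0 and b < x < b + rw:
            return ((b + rw) - x) / rw
        return 0.0
    return mu


def triangle(c, lw, rw):
    """Assumed reading: a trapezoid whose plateau is the single point c."""
    return trapez(c, c, lw, rw)


ROOMTEMP = {"cold": trapez(0, 50, 0, 20),
            "medium": trapez(50, 70, 10, 10),
            "hot": trapez(80, 100, 20, 0)}
FURNACE = {"off": triangle(0, 0, 0.1),
           "half": triangle(0.5, 0.1, 0.1),
           "full": triangle(1, 0.1, 0)}
RULES = [("cold", "full"), ("medium", "half"), ("hot", "off")]


def furnace_setting(temp, steps=1000):
    """Fuzzify temp, fire every rule, and defuzzify by a sampled centroid."""
    # Each rule contributes its output set, clipped at the rule's truth value.
    fired = [(ROOMTEMP[cond](temp), FURNACE[out]) for cond, out in RULES]
    num = den = 0.0
    for i in range(steps + 1):
        y = i / steps
        mu = max(min(weight, out(y)) for weight, out in fired)
        num += y * mu
        den += mu
    if den == 0.0:
        raise ValueError("no rule fired for this temperature")
    return num / den
```

Under these assumptions a reading that is partly cold and partly medium yields a furnace value between half and full, which is the blending of rule contributions that the slide describes. Changing the set definitions retunes the controller without touching furnace_setting.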

Autodriver Monitor and Control Architecture

[Diagram labels: Autopilot Manager; Autodriver-Autopilot Adapter Task; Instrumented Task; Java Remote Method Invocation; Unix; Autodriver Java GUI]

Autodriver Startup

• User specifies hosts for Autopilot Manager and, if remote, Adapter

• Main window displays currently registered sensors and actuators

• User selects sensors and/or actuators they are interested in

Autodriver Field Selection

• When a tagged sensor is selected, a new window showing the list of fields in that sensor is displayed

• The user selects the field(s) they want to view


Autodriver Numeric Display


• Data can be displayed as numeric values

• The user can choose to save the data values to a file for later analysis

Autodriver Plot Display

• Using ptplot package from Berkeley, values can be plotted as connected or unconnected points

• Multiple fields can be plotted to a single window

• User can control number of points to display in window and zoom in on area of graph


Autodriver Actuator Interaction


• User may enter value for selected actuator and transmit it to the remote process

• Interface may be customized for non-numeric data entry such as pull-down menu choice of LRU or MRU for actuator controlling cache replacement policy

Virtue Monitor and Control Architecture

[Diagram: Autopilot Manager, Virtue, and Instrumented Task; tagged sensor data flows from the Instrumented Task to Virtue, and actuator controls flow from Virtue to the Instrumented Task]


Virtue Display and Control


• Each sphere in the ring represents a workstation

• lmon collects processor utilization data and makes it available via sensors

• Virtue maps the data to the display

• Data transmission frequency can be adjusted via slider connected to lmon actuator


Case study: Rocket Simulation Code


• Code developed by DOE ASCI Center for Simulation of Advanced Rockets (CSAR) at UIUC

• 40,000 lines of Fortran, MPI for communication between processes, runs on SGI Origin

• 200+ hours on 128 PEs to simulate 1/2 second of burn

• Ultimately want to model 2 minutes for complete booster burn-off
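
A back-of-the-envelope Python calculation shows why tuning matters for this code. It assumes, purely for illustration, that cost scales linearly with simulated time and that the machine size stays at 128 PEs; the slides state neither. The 200-hour figure is the lower bound quoted above ("200+").

```python
HOURS_PER_RUN = 200              # lower bound from the slide, on 128 PEs
SIMULATED_SECONDS_PER_RUN = 0.5  # burn time covered by one such run
TARGET_BURN_SECONDS = 2 * 60     # complete booster burn-off


def projected_hours(target_seconds):
    """Wall-clock hours needed under the linear-scaling assumption."""
    return HOURS_PER_RUN * (target_seconds / SIMULATED_SECONDS_PER_RUN)
```

Under that assumption, projected_hours(TARGET_BURN_SECONDS) comes to at least 48,000 hours, which is more than five years of continuous execution.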

[Flowchart: Init; Fluids Code (10 fluid iterations); Interpolation; Solids Code (Do 3:1 Multigrid Solution for Each of the Meshes); Convergence Test (n / Y); Output]

* Check Against a Residual
* Best Case, Converge on First Try
* Saves Date
* Advances Time Step
* 3 for coarse grain mesh; 1 for fine grain
* Could Modify Iterations with Actuator

Execution Environment

[Diagram, with component-to-host pairing inferred from label order:]

• CSAR code instrumented via SvPablo: running on SGI Origin at NCSA

• Virtue: running on SGI Octane and ImmersaDesk in Pablo group

• Autopilot Manager: running on SPARC in Pablo group

• lmon gathering network data: running on systems around the country


Wide-area Network Performance Data


• Network latency statistics gathered via modified traceroute and made available via Autopilot sensors

• Edge color represents latency -- warm colors for high latency

• Cutting plane shows max value of intersected edges

Time Tunnel in Display Hierarchy

• Time tunnel is second level in Virtue display hierarchy, showing application behavior on a single parallel system

• Notice long delays for some MPI allreduce calls (shown in white)


Application Phases and Communication Patterns


View from “inside” Time Tunnel

• User can fly around within the virtual environment to get different views

• MPI profiling wrappers provide MPI call information via Autopilot Sensors

• SvPablo provides code region information via Autopilot Sensors

Call Graph in Display Hierarchy

• For each processor in the time tunnel, you can “drill-down” to the procedure call graph

• SvPablo provides call graph layout and dynamic updates via Autopilot sensors

Call Graph Close-Up View

• Color mapped to inclusive procedure execution time

• Size mapped to number of times procedure called

• Magic lens exposes the procedure names

Source Code Text Billboard

• The user can select a procedure in the call graph display and “drill-down” to the final level, which is the source code for the procedure


Current Efforts

• SvPablo: version with output via Autopilot sensors generally available

• Virtue: new displays and controls for interacting with Autopilot sensors and actuators

• Autodriver: integrated event definition, recognition, adaptation, and notification

• Trace Library and Extensions: rework to use Autopilot as infrastructure, providing “automatic” instrumentation of I/O, MPI I/O, and HDF calls with corresponding well-defined sensor data structures


Current Efforts

• Integrate sensors and actuators into Globus infrastructure

• Provide translators from
  – (appropriate) tagged sensor data to NetLogger format
  – NetLogger format to SDDF
  – SDDF to XML
  – XML to SDDF

• Continue to explore analysis, visualization, and control techniques in dynamic, distributed environments


Pablo Group Participants

• Professor Dan Reed, Pablo Project Director

• Randy Ribler*
• Huseyin Simitci
• Jim Oly
• Nancy Tran
• Guoyi Wang
• Don Schmidt
• Jeff Vetter*
• Luiz DeRose*
• Ying Zhang
• Mario Pantano*
• Eric Shaffer
• Shannon Whitmore
• Ben Schaeffer
• Dan Wells
• Deb Israel
• and lots of others who have been part of the Pablo group over the years

* postdocs previously with the Pablo group