
Detecting and Solving Memory Problems in Linux Clusters

Chris Gottbrath, Product Manager


What is a Memory Bug?

• A memory bug is a mistake in the management of heap memory:
  – Failure to check for error conditions
  – Relying on non-standard behavior
  – Leaking: failure to free memory
  – Dangling references: failure to clear pointers
  – Memory corruption: writing to memory not owned, or overrunning array bounds
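The first item, failure to check for error conditions, is the easiest to show in code. The following is a minimal C sketch; the helper name alloc_array is hypothetical and not part of any library. It checks both the size arithmetic and the value returned by malloc():

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n elements of elem_size bytes each.
 * It checks two error conditions that are easy to forget:
 *   1. the size computation n * elem_size wrapping around, and
 *   2. malloc() itself returning NULL. */
void *alloc_array(size_t n, size_t elem_size)
{
    if (elem_size != 0 && n > SIZE_MAX / elem_size) {
        fprintf(stderr, "alloc_array: %zu x %zu bytes overflows size_t\n",
                n, elem_size);
        return NULL;
    }

    size_t bytes = n * elem_size;
    void *p = malloc(bytes);
    if (p == NULL && bytes != 0) {
        fprintf(stderr, "alloc_array: out of memory (%zu bytes)\n", bytes);
    }
    return p;   /* callers must still test for NULL before use */
}
```

The overflow test runs before malloc() is called. Without it, a wrapped size would quietly produce a block far smaller than the caller expects, and that is a memory-corruption bug waiting to happen.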


Heap Memory

• The heap is managed by the program
  – C: malloc() and free()
  – C++: new and delete
  – Fortran 90: allocatable arrays

• Malloc usage is something like:
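As an illustration only, and not the presenter's original example, a typical malloc() lifecycle in C runs allocate, check, use, free, and clear. The function sum_of_squares below is hypothetical:

```c
#include <stdlib.h>

/* Returns the sum of i*i for i in [0, n), or -1 if the temporary
 * buffer cannot be allocated. */
long sum_of_squares(size_t n)
{
    long *squares = malloc(n * sizeof *squares);   /* 1. allocate */
    if (squares == NULL && n != 0)                 /* 2. check for failure */
        return -1;

    for (size_t i = 0; i < n; i++)                 /* 3. use, staying in bounds */
        squares[i] = (long)(i * i);

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += squares[i];

    free(squares);                                 /* 4. release exactly once */
    squares = NULL;                                /* 5. clear the stale pointer */
    return total;
}
```

Each memory bug listed earlier amounts to skipping or misordering one of these five steps.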


What is a Memory Leak?

[Figure: three panels, "Normal Allocation", "Correct Behavior" and "Leaked Block", each showing a pointer (ptr) and a heap block]
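A leak occurs when the last pointer to a heap block is overwritten, or goes out of scope, before free() is called. realloc() is a common real-world source: when it fails it returns NULL but leaves the original block allocated. The sketch below keeps the old pointer until the call is known to have succeeded. grow_buffer is a hypothetical helper:

```c
#include <stdlib.h>

/* Resize *buf to new_len bytes without leaking it if realloc() fails.
 *
 * Leaky idiom:  buf = realloc(buf, new_len);
 * On failure realloc() returns NULL but leaves the old block allocated,
 * so that assignment overwrites the only pointer to it.
 * Returns 0 on success, -1 on failure (with *buf unchanged and still owned
 * by the caller). new_len == 0 is rejected because realloc(p, 0) behaves
 * differently across implementations. */
int grow_buffer(char **buf, size_t new_len)
{
    if (new_len == 0)
        return -1;
    char *tmp = realloc(*buf, new_len);
    if (tmp == NULL)
        return -1;
    *buf = tmp;
    return 0;
}
```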


What is a Dangling Pointer?

[Figure: three panels, "Normal Allocation", "Dangling Pointer" and "Correct Behavior", showing a pointer (ptr), a heap block, and an unrelated heap block]
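A dangling pointer still holds the address of a block that has been freed. The allocator may later reuse that memory for an unrelated heap block, so writes through the stale pointer silently corrupt other data. One common defensive habit is to null the pointer immediately after freeing it. Here it is sketched with a hypothetical macro, FREE_AND_CLEAR:

```c
#include <stdlib.h>

/* Hypothetical macro: free the block and null out the pointer variable that
 * referenced it. A later use then typically crashes at once on a NULL
 * dereference instead of silently touching memory that may now belong to
 * an unrelated block. The argument is evaluated twice, so pass a plain
 * variable, not an expression with side effects. */
#define FREE_AND_CLEAR(p) \
    do {                  \
        free(p);          \
        (p) = NULL;       \
    } while (0)
```

This clears only the one pointer passed in. Any other copies of the same address still dangle, which is why tools that track references remain useful.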


Memory Problems in Clusters

• Moving an application to a cluster increases the problem complexity
  – Distributed algorithms are more complex
  – Application data set size may push the limits of available memory even when everything is functioning correctly
  – Porting to a cluster may involve moving to a new architecture/OS
• The cluster environment is different
  – Many potentially useful memory tools aren't designed for use in a cluster
    • May simply fail
    • May require extreme 'workarounds'
  – Report-based tools need cluster-aware filtering mechanisms
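The data-set-size point can be checked with simple arithmetic before a job is launched. The sketch below assumes a row-block decomposition of an n x n matrix of doubles; the functions and the decomposition are illustrative and do not come from the slides. It estimates each rank's share and compares it with the memory available per rank on a node:

```c
#include <stddef.h>

/* Bytes one rank needs for its share of an n x n matrix of doubles split by
 * rows across nranks ranks; the first (n % nranks) ranks take one extra row.
 * Assumes rank < nranks and ignores size_t overflow for very large n. */
size_t rank_matrix_bytes(size_t n, size_t nranks, size_t rank)
{
    size_t rows = n / nranks + (rank < n % nranks ? 1 : 0);
    return rows * n * sizeof(double);
}

/* Does the per-rank footprint fit when ranks_per_node ranks share
 * node_bytes of usable memory on one node? */
int fits_on_node(size_t per_rank_bytes, size_t ranks_per_node, size_t node_bytes)
{
    return per_rank_bytes <= node_bytes / ranks_per_node;
}
```

For example, with n = 1000 and 3 ranks, rank 0 holds 334 rows: 334 × 1000 × 8 = 2,672,000 bytes with 8-byte doubles. Packing more ranks onto each node shrinks the memory each one can use, even though no memory bug is present.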


3/29/06 7

What is MemoryScape?

• MemoryScape is memory debugging that is:
  – Streamlined
  – Lightweight
  – Intuitive
  – Collaborative
• Features
  – Shows
    • Memory errors
    • Memory status
    • Memory leaks
    • Bounds violations
  – MPI memory debugging
  – Remote memory debugging
• Tech
  – Low overhead
  – No instrumentation
• Interface
  – Inductive
  – Collaboration
  – Multi-process


What is TotalView?

• Source code debugger
  – C, C++, Fortran 77, Fortran 90, UPC
    • Complex language features
  – Wide compiler and platform support
  – Multi-threaded debugging
  – Parallel debugging
    • MPI, PVM, others
  – Remote debugging
  – Memory debugging capabilities
    • Integrated into the debugger
  – Powerful and easy GUI
    • Visualization
  – CLI for scripting


Architecture for Cluster Debugging

• Cluster architecture
  – Single front end (TotalView)
    • GUI and debug engine
  – Debugger agents (tvdsvr)
    • Low overhead, one per node
    • Each traces multiple rank processes
  – TotalView communicates directly with the tvdsvr agents
    • Not using MPI
    • Optimized protocol
• Provides: robust, scalable, minimal interaction

[Figure: TotalView on the interface node starts a set of lightweight debugger servers on the compute nodes]


Memory Status

• Multiple reports
  – Memory statistics
  – Interactive graphical display
  – Source code display
  – Backtrace display
• Allow the user to
  – Understand program memory usage behavior
  – Discover allocation layout
  – Look for inefficient allocation
  – Look for memory leaks


Leak Detection

• MemoryScape leak detection
  – Based on conservative garbage collection
  – Can be performed at any point in runtime
    • Helps localize leaks in time
  – Multiple reports
    • Backtrace report
    • Source code structure
    • Graphical memory location
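Conservative garbage collection works like this: any word in the program's roots (stacks, registers, globals), or in a block already reached, that looks like an address inside a tracked block is treated as a real pointer. Blocks that nothing reaches are reported as leaks. The sketch below shows only that mark-and-report idea. It runs over an explicit, hypothetical allocation table, with a plain array standing in for the roots. It is not MemoryScape's implementation:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry of a hypothetical allocation table. */
struct block {
    uintptr_t start;   /* first address of the block */
    size_t    size;    /* size in bytes */
    int       marked;  /* set once some word appears to point into it */
};

/* Index of the tracked block containing addr, or -1.
 * Interior pointers count, as in a conservative collector. */
static int find_block(const struct block *tab, size_t nblocks, uintptr_t addr)
{
    for (size_t i = 0; i < nblocks; i++)
        if (addr >= tab[i].start && addr - tab[i].start < tab[i].size)
            return (int)i;
    return -1;
}

/* Treat every word-sized chunk of [mem, mem + len) as a possible pointer.
 * Each newly reached block is marked and its own contents scanned in turn. */
static void scan_words(struct block *tab, size_t nblocks,
                       const unsigned char *mem, size_t len)
{
    for (size_t off = 0; off + sizeof(uintptr_t) <= len; off += sizeof(uintptr_t)) {
        uintptr_t word;
        memcpy(&word, mem + off, sizeof word);
        int i = find_block(tab, nblocks, word);
        if (i >= 0 && !tab[i].marked) {
            tab[i].marked = 1;
            scan_words(tab, nblocks, (const unsigned char *)tab[i].start,
                       tab[i].size);
        }
    }
}

/* Mark everything reachable from the root region, then return the number
 * of unmarked (unreachable, i.e. leaked) blocks. */
size_t count_leaks(struct block *tab, size_t nblocks,
                   const void *roots, size_t roots_len)
{
    for (size_t i = 0; i < nblocks; i++)
        tab[i].marked = 0;
    scan_words(tab, nblocks, roots, roots_len);

    size_t leaks = 0;
    for (size_t i = 0; i < nblocks; i++)
        if (!tab[i].marked)
            leaks++;
    return leaks;
}
```

Because any matching value counts as a reference, the scan can miss a leak (an integer that happens to look like an address keeps a block alive). As long as the roots are complete, though, it will not report a reachable block as leaked. Running it at different points in a program's execution is what lets leaks be localized in time.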


Array Bounds Violations

• Heap guard blocks
  – Before and/or after each allocation
  – All allocations or just a few
  – Variable size
  – Check at any time
  – Reports
    • By memory address
    • Only corrupted blocks
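Guard blocks surround each allocation with bytes of a known pattern, which are later checked to see whether the pattern is still intact. The slides do not describe MemoryScape's own layout, so the sketch below uses hypothetical function names, guard width, and fill pattern. It places guards before and after each block and can be checked at any time:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HEADER_SIZE 16   /* stores the user size; 16 preserves common alignment */
#define GUARD_SIZE  16   /* hypothetical guard width in bytes */
#define GUARD_BYTE  0xA5 /* hypothetical fill pattern */

/* Layout of each raw block:
 * [header: user size][front guard][user bytes ...][rear guard] */
void *guarded_malloc(size_t size)
{
    size_t overhead = HEADER_SIZE + 2 * GUARD_SIZE;
    if (size > SIZE_MAX - overhead)
        return NULL;
    unsigned char *raw = malloc(overhead + size);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &size, sizeof size);
    memset(raw + HEADER_SIZE, GUARD_BYTE, GUARD_SIZE);
    memset(raw + HEADER_SIZE + GUARD_SIZE + size, GUARD_BYTE, GUARD_SIZE);
    return raw + HEADER_SIZE + GUARD_SIZE;
}

/* Can be called at any time. Returns 0 if both guards are intact,
 * 1 if the front guard was overwritten (underrun),
 * 2 if the rear guard was overwritten (overrun), 3 if both. */
int guarded_check(const void *user)
{
    const unsigned char *p = user;
    const unsigned char *raw = p - GUARD_SIZE - HEADER_SIZE;
    const unsigned char *front = raw + HEADER_SIZE;
    size_t size;
    memcpy(&size, raw, sizeof size);
    const unsigned char *rear = p + size;

    int status = 0;
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (front[i] != GUARD_BYTE) status |= 1;
        if (rear[i] != GUARD_BYTE)  status |= 2;
    }
    return status;
}

void guarded_free(void *user)
{
    if (user != NULL)
        free((unsigned char *)user - GUARD_SIZE - HEADER_SIZE);
}
```

An overrun such as writing s[10] on a 10-byte block is caught only when guarded_check() runs, not at the moment of the write. A write that jumps past the guard entirely goes unnoticed.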


MemoryScape Technology

• Based on TotalView Technologies HIA technology
  – Heap Interposition Agent
  – Also used in TotalView
• Advantages of HIA technology
  – Use it with your existing builds
    • No source code or binary instrumentation
  – Programs run at nearly full speed
    • Low performance overhead
  – Efficient memory usage
    • Low memory overhead
  – Supports a wide range of platforms and compilers


The Agent and Interposition

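[Figure: within the Process, the Heap Interposition Agent (HIA) sits between the User Code and Libraries and the Malloc API and keeps an Allocation Table and a Deallocation Table; MemoryScape is shown outside the process]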
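The idea in the diagram can be approximated in ordinary C. Every allocation and deallocation passes through a layer that records it in an allocation table and a deallocation table, and statistics and error reports follow from those tables. In this sketch the layer is a pair of explicitly called wrappers with hypothetical names. A real interposition agent sits in front of the malloc API itself, so no source changes are needed ("No source code or binary instrumentation"). That interception mechanism is not shown here:

```c
#include <stddef.h>
#include <stdlib.h>

#define TABLE_CAP 1024   /* hypothetical fixed capacity for this sketch */

struct record {
    void  *addr;
    size_t size;
};

static struct record alloc_table[TABLE_CAP];   /* blocks currently live */
static size_t n_live;
static struct record dealloc_table[TABLE_CAP]; /* history of freed blocks */
static size_t n_freed;

/* Stand-in for an interposed malloc(): allocate, then record the block.
 * If the table is full the block is returned untracked, and this sketch
 * cannot later release it through tracked_free(). */
void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL && n_live < TABLE_CAP)
        alloc_table[n_live++] = (struct record){ p, size };
    return p;
}

/* Stand-in for an interposed free(). Returns 0 on success, or -1 if ptr is
 * not a live tracked block, in which case it is reported, not freed. */
int tracked_free(void *ptr)
{
    if (ptr == NULL)
        return 0;
    for (size_t i = 0; i < n_live; i++) {
        if (alloc_table[i].addr == ptr) {
            if (n_freed < TABLE_CAP)
                dealloc_table[n_freed++] = alloc_table[i];
            alloc_table[i] = alloc_table[--n_live];   /* swap-remove */
            free(ptr);
            return 0;
        }
    }
    return -1;
}

/* Memory statistics derived from the tables. */
size_t live_blocks(void)  { return n_live; }
size_t freed_blocks(void) { return n_freed; }

size_t live_bytes(void)
{
    size_t total = 0;
    for (size_t i = 0; i < n_live; i++)
        total += alloc_table[i].size;
    return total;
}
```

A linear table keeps the sketch short. A production agent would need a faster lookup structure and thread safety.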


Graphical User Interface

• Inductive, task-based approach
  – Walks the user through specific tasks
  – Easy to pick up and use
  – Sidebar for secondary tasks
  – Homepage-like summary report


Script Mode

• MemoryScape supports automation
  – MemoryScape lets users run tests and check programs for memory leaks without having to be in front of the program
  – Simple command-line program called memscript
    • Doesn’t start up the GUI
    • Can be run from within a script or test harness
  – The user defines
    • What configuration options are active
    • What things MemoryScape is looking for
    • What actions MemoryScape should take for each type of event that may occur


Multi-Process and Multi-Thread

• Memory debug many processes at the same time
  – MPI
  – Client-server
  – Fork-exec
• Compare two runs
• Remote applications
• Multi-threaded applications


• The TotalView Technologies Solution: A new approach to debugging for the next wave of HPC development

• Defines five core technologies required to develop the next generation of multi-threaded, multi-process applications

• Comprehensive, integrated software development tools to improve development productivity and quality



For more info

• General info
  – See the TotalView Technologies website: www.totalviewtech.com
• Documentation
  – See the TotalView Technologies website: www.totalviewtech.com
  – Full documentation in HTML and PDF format
  – Order hard-copy documentation
• Webcasts
  – See the TotalView Technologies website: www.totalviewtech.com
• Training
  – Onsite MemoryScape training will be available soon.
• Contact us
  – Sales: [email protected]
  – Support: [email protected]


MemoryScape Supports

• Linux
  – Red Hat and SuSE
  – x86 and x86-64
  – Power and ia64
• UNIX
  – Solaris AMD and SPARC
  – AIX
• Apple
  – Power and Intel
• GCC
• Vendor compilers
  – Sun Studio
  – Intel C/C++
  – Intel Fortran
  – XL C/C++
  – XL Fortran
• See the platforms document on the www.totalviewtech.com site for details


TotalView Debugger Supported Compilers, Distros and Architectures

• Platform support
  – Linux x86, x86-64, ia64, Power
  – Mac Power and Intel
  – Solaris SPARC and AMD64
  – AIX, Tru64, IRIX
  – Cray X1, XT3, IBM BG/L
• Languages / compilers
  – C/C++, Fortran, UPC, assembly
  – Many commercial and open source compilers
• Parallel environments
  – MPI (MPICH1 and 2, LAM, Open MPI, poe, MPT, Quadrics, MVAPICH, and many others)
  – UPC