Detecting and Solving Memory Problems in Linux Clusters
Chris Gottbrath, Product Manager
What is a Memory Bug?
• A Memory Bug is a mistake in the management of heap memory
• Failure to check for error conditions
• Relying on non-standard behavior
• Leaking: Failure to free memory
• Dangling references: Failure to clear pointers
• Memory Corruption: Writing to memory the program does not own / Overrunning array bounds
Heap Memory
• Heap is managed by the program
  – C: malloc() and free()
  – C++: new and delete
  – Fortran90: Allocatable arrays
• malloc usage is something like:
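The slide's own code did not survive in this transcript. As a stand-in, here is a minimal sketch of the usual malloc()/free() pattern in C. The function name `sum_first_n` and its -1.0 error convention are hypothetical, chosen only for illustration.

```c
#include <stdlib.h>

/* Allocate n doubles on the heap, fill them with 1..n, and return the sum.
 * Returns 0.0 for n == 0 and -1.0 if the allocation fails
 * (a hypothetical error convention for this sketch). */
double sum_first_n(size_t n)
{
    if (n == 0) {
        return 0.0;
    }
    double *ptr = malloc(n * sizeof *ptr);  /* request a heap block */
    if (ptr == NULL) {                      /* check the error condition */
        return -1.0;
    }
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        ptr[i] = (double)(i + 1);
        total += ptr[i];
    }
    free(ptr);    /* return the block to the heap */
    ptr = NULL;   /* clear the pointer so it cannot dangle */
    return total;
}
```

Each malloc() is paired with exactly one free(), and the result of malloc() is checked before use.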
What is a Memory Leak?
[Diagram: ptr and a heap block shown in three states: Normal Allocation, Correct Behavior, Leaked Block]
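A leak of the kind described here can be sketched in C as follows. Both function names are hypothetical. The first version loses its only pointer to the block on an early-return path; the second frees the block on every path.

```c
#include <stdlib.h>
#include <string.h>

/* Leaky version: the early return for an empty string skips free(). */
int copy_length_leaky(const char *src)
{
    char *buf = malloc(strlen(src) + 1);
    if (buf == NULL) {
        return -1;
    }
    strcpy(buf, src);
    if (buf[0] == '\0') {
        return 0;              /* BUG: buf goes out of scope, block is leaked */
    }
    int len = (int)strlen(buf);
    free(buf);
    return len;
}

/* Fixed version: a single exit path always releases the block. */
int copy_length_fixed(const char *src)
{
    char *buf = malloc(strlen(src) + 1);
    if (buf == NULL) {
        return -1;
    }
    strcpy(buf, src);
    int len = (int)strlen(buf);  /* 0 for an empty string */
    free(buf);
    return len;
}
```

Both versions return the same values. The difference is only visible to a memory tool, which is why leaks of this kind tend to go unnoticed.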
What is a Dangling Pointer?
[Diagram: ptr and a heap block shown in three states: Normal Allocation, Dangling Pointer, Correct Behavior; the labels also include two pointers, a heap block, and an Unrelated Heap Block]
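One common defence against dangling pointers is to free the block and clear the pointer in the same step. The sketch below assumes a hypothetical helper named `release`. It also shows how a second reference (`alias`) to the same block is left dangling unless it is cleared too.

```c
#include <stdlib.h>

/* Free the block and clear the caller's pointer in one step. */
void release(int **pp)
{
    if (pp != NULL) {
        free(*pp);   /* free(NULL) is a no-op, so a repeated release is harmless */
        *pp = NULL;
    }
}

/* Returns 1 when both references have been cleared, -1 on allocation failure. */
int dangling_demo(void)
{
    int *ptr = malloc(sizeof *ptr);
    if (ptr == NULL) {
        return -1;
    }
    *ptr = 42;
    int *alias = ptr;   /* second reference to the same heap block */

    release(&ptr);      /* ptr is now NULL ... */
    /* ... but alias still holds the old address: it is dangling.
     * Dereferencing it here would be undefined behavior, because the
     * allocator may already have handed the block to unrelated code. */
    alias = NULL;       /* clear every remaining reference */

    return ptr == NULL && alias == NULL;
}
```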
Memory Problems in Clusters
• Moving an application to a cluster increases the problem complexity
  – Distributed algorithms are more complex
  – Application data set size may push the limits of available memory even when everything is functioning correctly
  – Porting to a cluster may involve moving to a new architecture/OS
• The Cluster Environment is different
  – Many potentially useful memory tools aren't designed for use in a cluster
    • May simply fail
    • May require extreme 'workarounds'
  – Report based tools need cluster-aware filtering mechanisms
3/29/06
What is MemoryScape?
• What is MemoryScape?
  – Streamlined
  – Lightweight
  – Intuitive
  – Collaborative
  – Memory Debugging
• Features
  – Shows
    • Memory Errors
    • Memory Status
    • Memory Leaks
    • Bounds Violations
  – MPI Memory Debugging
  – Remote Memory Debugging
• Tech
  – Low Overhead
  – No Instrumentation
• Interface
  – Inductive
  – Collaboration
  – Multi-process
What is TotalView?
• Source Code Debugger
  – C, C++, Fortran 77, Fortran90, UPC
    • Complex Language Features
  – Wide Compiler and Platform Support
  – Multi-Threaded Debugging
  – Parallel Debugging
    • MPI, PVM, Others
  – Remote Debugging
  – Memory Debugging capabilities
    • Integrated into the debugger
  – Powerful and Easy GUI
    • Visualization
  – CLI for Scripting
Architecture for Cluster Debugging
• Cluster Architecture
  – Single Front End (TotalView)
    • GUI and debug engine
  – Debugger Agents (tvdsvr)
    • Low overhead, 1 per node
    • Traces multiple rank processes
  – TotalView communicates directly with tvdsvrs
    • Not using MPI
    • Optimized Protocol
• Provides: Robust, Scalable, Minimal Interaction
[Diagram: an Interface Node and Compute Nodes. TotalView starts a set of lightweight debugger servers.]
Memory Status
• Multiple Reports
  – Memory Statistics
  – Interactive Graphical Display
  – Source Code Display
  – Backtrace Display
• Allow the user to
  – Understand Program Memory Usage Behavior
  – Discover Allocation Layout
  – Look for Inefficient Allocation
  – Look for Memory Leaks
Leak Detection
• MemoryScape Leak Detection
  – Based on Conservative Garbage Collection
  – Can be performed at any point in runtime
    • Helps localize leaks in time
  – Multiple Reports
    • Backtrace Report
    • Source Code Structure
    • Graphical Memory Location
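The slides do not describe how MemoryScape implements its leak scan. The sketch below only illustrates the basic test behind conservative garbage collection in general: any word whose value falls inside a known block is treated as a possible pointer to it, and blocks that no word points into become leak candidates. The `struct block` layout and the function name are hypothetical. This is a single pass over a set of root words; a real collector would also scan reachable blocks transitively.

```c
#include <stddef.h>
#include <stdint.h>

/* One known heap block: start address, size in bytes, and a mark bit. */
struct block {
    uintptr_t start;
    size_t    size;
    int       reachable;
};

/* Mark every block that at least one word points into, and return the
 * number of blocks left unmarked (the leak candidates). */
size_t mark_from_roots(struct block *blocks, size_t nblocks,
                       const uintptr_t *words, size_t nwords)
{
    size_t leaked = 0;
    for (size_t b = 0; b < nblocks; b++) {
        blocks[b].reachable = 0;
        for (size_t w = 0; w < nwords; w++) {
            /* Conservative test: does this word look like an address
             * anywhere inside the block, not only at its start? */
            if (words[w] >= blocks[b].start &&
                words[w] - blocks[b].start < blocks[b].size) {
                blocks[b].reachable = 1;
                break;
            }
        }
        if (!blocks[b].reachable) {
            leaked++;
        }
    }
    return leaked;
}
```

Because an integer that merely resembles an address keeps a block marked as reachable, this kind of scan can miss a leak but should not report a block that is still referenced.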
Array Bounds Violations
• Heap Guard Blocks
  – Before and/or After
  – All Allocations or just a few
  – Variable Size
  – Check at Any Time
  – Reports
    • By Memory Address
    • Only Corrupted
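The idea of guard blocks can be sketched in a few lines of C. This is a generic illustration, not MemoryScape's implementation. The function names, the 0xA5 fill pattern, and the guard sizes are all assumptions. Each allocation is padded before and after with a known byte pattern, and a check reports whether either pad has been overwritten.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FRONT      16u   /* size field plus leading guard bytes (assumed size) */
#define REAR       8u    /* trailing guard bytes (assumed size) */
#define GUARD_BYTE 0xA5  /* assumed fill pattern */

/* Layout: [size | leading guard][user bytes][trailing guard] */
void *guarded_malloc(size_t size)
{
    if (size > SIZE_MAX - FRONT - REAR) {
        return NULL;                       /* request would overflow */
    }
    unsigned char *raw = malloc(FRONT + size + REAR);
    if (raw == NULL) {
        return NULL;
    }
    memcpy(raw, &size, sizeof size);       /* remember the user size */
    memset(raw + sizeof size, GUARD_BYTE, FRONT - sizeof size);
    memset(raw + FRONT + size, GUARD_BYTE, REAR);
    return raw + FRONT;
}

/* Returns 0 if both guards are intact, 1 if the leading guard was
 * overwritten, 2 if the trailing guard was overwritten. */
int guarded_check(const void *user)
{
    const unsigned char *raw = (const unsigned char *)user - FRONT;
    size_t size;
    memcpy(&size, raw, sizeof size);
    for (size_t i = sizeof size; i < FRONT; i++) {
        if (raw[i] != GUARD_BYTE) return 1;
    }
    for (size_t i = 0; i < REAR; i++) {
        if (raw[FRONT + size + i] != GUARD_BYTE) return 2;
    }
    return 0;
}

void guarded_free(void *user)
{
    if (user != NULL) {
        free((unsigned char *)user - FRONT);
    }
}
```

Since the check is an ordinary function call, it can be run at any point in the program rather than only when the block is freed. The sketch assumes that a 16-byte offset keeps the user pointer suitably aligned.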
MemoryScape Technology
• Based on TotalView Technologies HIA Tech
  – Heap Interposition Agent
  – Also seen in TotalView
• Advantages of HIA Technology
  – Use it with your existing builds
    • No Source Code or Binary Instrumentation
  – Programs run nearly full speed
    • Low performance overhead
  – Efficient memory usage
    • Low memory overhead
  – Support wide range of platforms and compilers
The Agent and Interposition
[Diagram: a Process containing User Code and Libraries, the Malloc API, and the Heap Interposition Agent (HIA) with its Allocation Table and Deallocation Table]
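To make the allocation-table idea concrete, here is a minimal sketch in C. It is not the HIA itself. A real interposition agent sits in front of the malloc API so that unmodified programs are tracked, whereas this sketch uses explicitly named wrappers (`tracked_malloc` and `tracked_free`, both hypothetical) and a fixed-size table.

```c
#include <stddef.h>
#include <stdlib.h>

#define TABLE_CAPACITY 1024   /* assumed limit for this sketch */

struct record {
    void  *addr;   /* NULL marks a free slot */
    size_t size;
};

static struct record table[TABLE_CAPACITY];
static size_t live_blocks;
static size_t live_bytes;

/* Allocate a block and record it in the allocation table. */
void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < TABLE_CAPACITY; i++) {
        if (table[i].addr == NULL) {
            table[i].addr = p;
            table[i].size = size;
            live_blocks++;
            live_bytes += size;
            break;
        }
    }
    return p;   /* if the table is full, the block is simply untracked */
}

/* Remove the block from the table, then release it. */
void tracked_free(void *p)
{
    if (p == NULL) {
        return;
    }
    for (size_t i = 0; i < TABLE_CAPACITY; i++) {
        if (table[i].addr == p) {
            live_blocks--;
            live_bytes -= table[i].size;
            table[i].addr = NULL;
            break;
        }
    }
    free(p);
}

size_t tracked_live_blocks(void) { return live_blocks; }
size_t tracked_live_bytes(void)  { return live_bytes; }
```

Any block still in the table at exit was allocated but never freed. That bookkeeping is the raw material for the memory status and leak reports.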
MemoryScape
Graphical User Interface
• Inductive Task Based Approach
  – Walks the user through specific tasks
  – Easy to pick up and use
  – Sidebar for secondary tasks
  – Homepage-like summary report
Script Mode
• MemoryScape Supports Automation
  – MemoryScape lets users run tests and check programs for memory leaks without having to be in front of the program
  – Simple command line program called memscript
    • Doesn’t start up the GUI
    • Can be run from within a script or test harness
  – The user defines
    • What configuration options are active
    • What things MemoryScape is looking for
    • What actions MemoryScape should take for each type of event that may occur
Multi-Process and Multi-Thread
• Memory debug many processes at the same time
  – MPI
  – Client-Server
  – Fork-Exec
  – Compare two runs
• Remote applications
• Multi-threaded applications
• The TotalView Technologies Solution: A new approach to debugging for the next wave of HPC development
• Defines five core technologies required to develop the next generation of multi-threaded, multi-process applications
• Comprehensive, integrated software development tools to improve development productivity and quality
For more info
• General Info
  – See the TotalView Technologies Website: www.totalviewtech.com
• Documentation
  – See the TotalView Technologies Website: www.totalviewtech.com
  – Full documentation in HTML and PDF format
  – Order hard-copy documentation
• Webcasts
  – See the TotalView Technologies Website: www.totalviewtech.com
• Training
  – Onsite MemoryScape Training will be available soon.
• Contact us
  – Sales: [email protected]
  – Support: [email protected]
MemoryScape Supports
• Linux
  – RedHat and SuSE
  – x86 and x86-64
  – Power and ia64
• UNIX
  – Solaris AMD and SPARC
  – AIX
• Apple
  – Power and Intel
• GCC
• Vendor Compilers
  – Sun Studio
  – Intel C/C++
  – Intel Fortran
  – XL C/C++
  – XL Fortran
• See platforms document on the www.totalviewtech.com site for details
TotalView Debugger Supported Compilers, Distros and Architectures
• Platform Support
  – Linux x86, x86-64, ia64, Power
  – Mac Power and Intel
  – Solaris Sparc and AMD64
  – AIX, Tru64, IRIX
  – Cray X1, XT3, IBM BGL
• Languages / Compilers
  – C/C++, Fortran, UPC, Assembly
  – Many Commercial & Open Source Compilers
• Parallel Environments
  – MPI (MPICH1 & 2, LAM, Open MPI, poe, MPT, Quadrics, MVAPICH, & many others)
  – UPC