Simplifying debugging for multi-core Linux devices and low-power Linux clusters Embedded World Exhibition & Conference February 24, 2015

Upload: rogue-wave-software

Post on 06-Aug-2015


Page 1: Simplifying debugging for multi-core Linux devices and low-power Linux clusters

Simplifying debugging for multi-core Linux devices and low-power Linux clusters

Embedded World Exhibition & Conference

February 24, 2015


Introduction


Embedded Linux development

Why?

– Reuse

– Community

– Memory constraints

– C and C++

– Device compatibility

– Cost

Where?

– Routers

– Media streaming

– POS

– Hardware control

– Sensor display

Free Electrons.com

© 2015 ROGUE WAVE SOFTWARE, INC. ALL RIGHTS RESERVED 3


Multi-core

• 2- or 4-core devices are now much more common

– Multi-core

– Many-core

• You have a choice

– Leave the core idle

– Run additional processes

– Write multithreaded code to utilize the additional cores

• Graphics processing unit (GPU) accelerators on the device?

How to use the additional cores?


Multi-thread

• Concurrency: execution proceeds asynchronously along two or more sequences

– Parallelism: concurrency with parallel execution

• Interdependencies

– Explicit is generally better than implicit

• Synchronization

– Race Conditions

– Deadlocks

– Live-locks

Taking advantage of parallelism in your device
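
A minimal C++ sketch of the race-condition bullet above, assuming a shared counter incremented by several threads. The function name `count_with_mutex` and its parameters are illustrative and not from the talk. The mutex makes the dependency on the counter explicit; without it, concurrent increments could be lost.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: each of n_threads adds 1 to a shared counter
// `iterations` times. The lock makes each read-modify-write exclusive.
long count_with_mutex(std::size_t n_threads, long iterations) {
    long counter = 0;          // shared state
    std::mutex counter_mutex;  // guards `counter`

    std::vector<std::thread> workers;
    workers.reserve(n_threads);
    for (std::size_t t = 0; t < n_threads; ++t) {
        workers.emplace_back([&counter, &counter_mutex, iterations] {
            for (long i = 0; i < iterations; ++i) {
                std::lock_guard<std::mutex> lock(counter_mutex);
                ++counter;
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();  // wait for every worker before reading the result
    }
    return counter;
}
```

With the lock in place, `count_with_mutex(4, 10000)` returns 40000 on every run. Removing the `lock_guard` line turns the increment into a data race, and the result may then vary from run to run.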


Multi-device

• Computationally challenging problem

– Algorithm is parallelizable

• Requirements

– Power

– Space

– Cooling

• Fault tolerance

• Off-the-shelf parallel runtime vs. custom

Embedded clusters


High performance computing

• Typically linux-x86

– Sometimes with GP-GPU or Intel Xeon Phi accelerators

• Programmed as sets of multi-core nodes

– Data is distributed with communication and synchronization as needed

– Communication typically takes the form of message passing

• Entire system is optimized for app performance

– Low latency interconnect

– Parallel filesystem

• Access is via submitting batch jobs to a resource management system

Supercomputers and clusters with 100s – 1000s of nodes


Rogue Wave Software


Rogue Wave helps organizations simplify complex software development, improve code quality, and shorten cycle times

What we do


Capabilities

[Diagram of product capabilities; products shown: Klocwork, OpenLogic, TotalView, IMSL, SourcePro, Visualization, Stingray, PV-WAVE, HydraExpress]


Used by 3,000 customers in over 57 countries across diverse industries to develop mission-critical applications and software

Financial Services, Telecom, Gov’t / Defense, Technology, Other Verticals

Global, diversified customer base


Embedded use cases


Retail point of sale

• Highly connected

– Operations

– Ad and promotional services

– Sensors (scale, scanner)

• Modern C++

• Many threads

– 1 or more threads for each task

– Responsiveness requirements for the threads reading the sensor data


Industrial device controller

• Expensive equipment

• Used in production testing

• Controller software

– X86-linux

– C++

– Multi-threaded

• Customized at each site

– Customization takes the form of C code that runs in a pre-compiled framework


Sonar console

• Runs on 64-bit Linux and on ARM Linux

– 2G flash memory

• Monolithic C++ with millions of lines of code

– Qt interface (touch displays)

– 100s of threads

• Rich visual data

– Video streaming

– One or more sensors


Signal processing compute cluster

• Computationally demanding

• Sophisticated algorithms

– Translated from 4th generation languages/environments

• Need an answer quickly

• Using industry standard technologies

– C++

– MPI

– X86 processors for development

– Power processors for deployment

• Memory & Power constraints


Techniques and best practices


Debugging distributed applications

• Print debugging doesn’t scale

• You can debug 1 of N processes

– Do all processes exhibit the error?

– Needle in the haystack problem

– Passing the bad apple problem

• You can run N debuggers on N processes

– Frustrating with N=2, impractical above N=4

• You can use a parallel debugger

– One debugger controlling all N processes

Techniques for debugging distributed apps (1/3)


Debugging distributed applications

• Parallel debuggers will

– Let you focus on any process that fails and see its backtrace

– Allow you to synchronize your processes (if the code includes common execution pathways)

– Allow you to focus on any process

– Allow you to compare processes

– Give you ways to find outliers

– Give you ways to group processes and work with those groups

Techniques for debugging distributed apps (2/3)


Debugging distributed applications

• Re-run at different scales

– Debug at lowest scale that exhibits defect

– What is different at that scale?

• Compare program flow in working and non-working cases

• Follow bad data back from the symptom to the cause

• Look closely at communication points and data decomposition

• Racy bugs

– Try out different relative orders of execution

– Add synchronization

• Deadlocks & live-locks

– Examine sync points to make sure all assumptions are valid

– Examine flow control around sync points

• Take careful notes; there can be a lot of subtle factors

Techniques for debugging distributed apps (3/3)
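
One assumption worth checking at a sync point is lock ordering. The sketch below is a hypothetical two-account example, not code from the talk. It uses the standard `std::lock` call to acquire both mutexes without deadlocking, even when two threads name them in opposite order.

```cpp
#include <mutex>
#include <thread>

// Hypothetical shared resource with its own lock.
struct Account {
    std::mutex m;
    long balance = 0;
};

// Moves `amount` between two distinct accounts. std::lock acquires both
// mutexes with a deadlock-avoidance algorithm, so transfer(a, b) and
// transfer(b, a) can run concurrently. Passing the same account twice
// is not supported by this sketch.
void transfer(Account& from, Account& to, long amount) {
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> hold_from(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> hold_to(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Runs transfers in opposite directions and returns the combined balance,
// which must be conserved if the locking is correct.
long run_opposing_transfers(int rounds) {
    Account a;
    Account b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&a, &b, rounds] {
        for (int i = 0; i < rounds; ++i) transfer(a, b, 1);
    });
    std::thread t2([&a, &b, rounds] {
        for (int i = 0; i < rounds; ++i) transfer(b, a, 1);
    });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

Locking `from.m` and then `to.m` with two plain `lock_guard`s would work in most runs and hang in a few, which is the kind of hidden assumption this slide suggests examining.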


Debugging multi-core applications

• Multithreaded applications and shared memory programming

– Data can be shared (higher memory efficiency)

• Shared memory programming

– Complexity: Only some memory is shared

• Multi-threaded programming

– All threads share the same heap and globals

– Separate stacks (but mutually readable)

• Concurrency is the same

– Many of the same challenges and many of the same techniques

• Communication (accidental and intentional) is not as localized

• Memory management (new/delete, malloc/free) is shared

Observations about multi-threaded debugging
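
A small C++ sketch of the sharing rules listed above. The names `shared_hits`, `private_hits`, and `observe_sharing` are illustrative only. A global is one object seen by every thread, a `thread_local` variable exists once per thread, and a variable on one thread's stack can still be written by another thread that holds a reference to it.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> shared_hits{0};    // one instance, visible to all threads
thread_local int private_hits = 0;  // one instance per thread

struct Observed {
    int shared_total;    // global counter after both threads touched it
    int caller_private;  // the calling thread's thread_local copy
    int worker_private;  // the worker's thread_local copy
};

Observed observe_sharing() {
    shared_hits = 0;
    private_hits = 0;
    int worker_private = -1;  // lives on the caller's stack

    std::thread worker([&worker_private] {
        ++shared_hits;                  // same object the caller sees
        private_hits += 5;              // the worker's own copy only
        worker_private = private_hits;  // writes into the caller's stack
    });
    worker.join();

    ++shared_hits;
    ++private_hits;  // the caller's copy never saw the worker's += 5
    return {shared_hits.load(), private_hits, worker_private};
}
```

Calling `observe_sharing()` yields a shared total of 2, a caller-private value of 1, and a worker-private value of 5.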


Debugging multi-core applications

• Print debugging can work for some bugs but can be very confusing for others

– Changes timing

• Look carefully at the thread capabilities of your debugger

• A good multi-thread debugger will give you

– An asynchronous interface

• Doesn’t assume a simple running/stopped state

– Easy access to all threads

– Complete control over threads

– Display of thread states

– Thread aware breakpoints

– Ways to synchronize threads

– Ways to hold threads

– Thread groups

– Display of thread-private data

– Display of data across threads

Techniques for multi-threaded debugging (1/2)


Debugging multi-core applications

• Try to reproduce problems without threads

• Vary the number of threads

• Try different interleaving patterns

• Look at thread synchronization points (mutexes, semaphores, barriers)

• Use watchpoints (aka data breakpoints)

• Make sure resources are cleaned up before thread termination

• Use record and deterministic replay to capture the exact thread execution pattern

Techniques for multi-threaded debugging (2/2)
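
To make "reproduce without threads" and "vary the number of threads" practical, it helps if the thread count is a parameter. The following is a sketch under that assumption. `chunked_sum` is a hypothetical workload, where a count of 0 or 1 runs serially on the calling thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Sums `data` using n_threads workers, each writing only its own slot of
// `partial`. With n_threads <= 1 no thread is created at all, which gives
// a thread-free baseline to compare against.
long long chunked_sum(const std::vector<int>& data, std::size_t n_threads) {
    if (n_threads <= 1) {
        long long total = 0;
        for (int value : data) total += value;
        return total;
    }

    std::vector<long long> partial(n_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(n_threads);
    const std::size_t chunk = (data.size() + n_threads - 1) / n_threads;

    for (std::size_t t = 0; t < n_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        workers.emplace_back([&data, &partial, t, begin, end] {
            long long sum = 0;
            for (std::size_t i = begin; i < end; ++i) sum += data[i];
            partial[t] = sum;  // no other thread touches slot t
        });
    }
    for (std::thread& w : workers) w.join();

    long long total = 0;
    for (long long p : partial) total += p;
    return total;
}
```

If the serial result and the results at 2, 3, or 8 threads ever disagree, the defect is in the decomposition or the sharing, not in the arithmetic.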


Log file debugging

• Recompile with print statements for a log file

• Compile in and toggle on/off with a runtime flag

• Trace with an external tool

– System call tracing

– Debugger assisted tracing: refocus experiments without a recompile

• Tension & Trade-off

– Capture enough context to understand what is happening

– Manage the large volume of output that may be required

• Tips & Techniques

– Binary search to find the site of the error

– Consider file system / file size

– Flush the pipe, otherwise file writing is asynchronous

– The presence of a call sometimes changes the behavior (compiler bugs, optimization, race conditions)

– Print debugging can be hairy with multi-thread or multi-process

• Externally driven tracing tools may be preferable to ensure logging happens

Narrow down the site and capture the context of the bug


Dynamic memory analysis

• Dynamic memory tools help catch hard-to-identify bugs

– Memory leaks can lurk in a code base

– Bounds violations can corrupt data

• Can be an open door for malicious agents

– Dangling pointers lead to racy, hard-to-reproduce symptoms

• Dynamic memory tools can also be used to inspect what is happening in the heap memory

– Normally quite hard to visualize and understand

– Critical for optimizing for low memory environments

• Tips & techniques

– Maintain a policy of eliminating 100% of leaks

– Use with a testing system to make sure you exercise different kinds of input and different code paths

– Compare heap behavior over time to make sure OS and library changes don’t introduce problems

Pinpoint leaks and analyze memory use
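
As a rough illustration of what a leak report is built from, the sketch below keeps a ledger of live allocations. It is an assumption-laden toy and not how any particular memory tool works: real tools typically intercept `malloc`/`free` transparently, while the hypothetical `AllocationLedger` here must be called explicitly.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>

// Hypothetical ledger: remembers every block handed out and forgets it
// on release. Whatever is still listed at exit is a leak candidate.
class AllocationLedger {
public:
    void* allocate(std::size_t bytes, const char* tag) {
        void* block = std::malloc(bytes);
        if (block != nullptr) live_[block] = Record{bytes, tag};
        return block;
    }

    void release(void* block) {
        live_.erase(block);  // no-op if the block was never recorded
        std::free(block);
    }

    std::size_t live_blocks() const { return live_.size(); }

    std::size_t live_bytes() const {
        std::size_t total = 0;
        for (const auto& entry : live_) total += entry.second.bytes;
        return total;
    }

private:
    struct Record {
        std::size_t bytes;
        const char* tag;  // label for the allocation site
    };
    std::map<void*, Record> live_;
};
```

Checking `live_blocks()` at the end of a test run is one way to enforce a zero-leak policy, and sampling `live_bytes()` over time gives a crude view of heap growth.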


Reverse debugging

• Record and deterministically replay execution trajectory through the code

– Record non-deterministic inputs

– Replay those as needed to access any point in the execution

• If you can get a racy bug to reproduce you can examine it at leisure

– Give yourself the full benefit of hindsight

– What steps led to it happening?

– Where did the program go wrong?

• Tips & Techniques

– Use watchpoints (data breakpoints) to find the source of corrupt data

– Wait until you are close to the bug before activating the recording to avoid paying overhead for the entire runtime

– Capture recordings and save them to a file as part of bug reports

– Review recordings of defects in unfamiliar parts of the code with subject matter experts

Get “racy” bugs “on tape”
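
The record-and-replay idea can be shown at toy scale: capture every non-deterministic input once, then feed the same values back. This is only a conceptual sketch with hypothetical names (`InputTape`, `process`). A reverse debugger records at a much lower level than this.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical input channel. In record mode it pulls from a live source
// and logs each value; in replay mode it returns the logged values.
class InputTape {
public:
    explicit InputTape(std::function<int()> live_source)
        : source_(std::move(live_source)), replaying_(false) {}

    explicit InputTape(std::vector<int> recorded)
        : tape_(std::move(recorded)), replaying_(true) {}

    int next() {
        if (replaying_) {
            if (cursor_ >= tape_.size()) {
                throw std::out_of_range("replay ran past end of recording");
            }
            return tape_[cursor_++];
        }
        const int value = source_();
        tape_.push_back(value);  // remember what the program saw
        return value;
    }

    const std::vector<int>& recording() const { return tape_; }

private:
    std::function<int()> source_;
    std::vector<int> tape_;
    std::size_t cursor_ = 0;
    bool replaying_;
};

// Stand-in for the program under test: deterministic given its inputs.
unsigned process(InputTape& in, int steps) {
    unsigned state = 0;
    for (int i = 0; i < steps; ++i) {
        state = state * 31u + static_cast<unsigned>(in.next());
    }
    return state;
}
```

Running `process` on a recording tape and then on `InputTape(live.recording())` gives the same result, so a failure seen once can be revisited as often as needed.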


Remote Debugging / Cross Debugging

• Remote Debugging

– Debugger core runs on your workstation (host) system

– Lightweight agent process runs on the device (target) system

• The agent process is very lightweight

• The debugger core holds all the complex analysis data structures

• Tips & Techniques

– Start with a debug target on the host machine

• Copy and strip the version that goes on the device

– You can start the server and then choose the target process

– Sources may need to be accessible on the host

– Use a tool that does the right thing with host/target library mismatch

– Be aware of security

Limit debugger resource utilization in the target system


Core file debugging

• The core file isn’t always sufficient

– It can be trashed

– It represents the consequence of the defect, but not the cause

• Examine the site of the crash

• Look for ‘suspicious’ variables

• Tips & Techniques

– Compile with debug information

– You can sync up a pre-stripped executable with a core file generated by its stripped counterpart

– Check more than one stack frame

A core file is a good place to start
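
A core file can only help if the process is allowed to write one. This step is not covered in the talk and is added as an assumption. On Linux, a program can raise its own soft core-size limit to the hard limit with the POSIX `getrlimit`/`setrlimit` calls. The helper name `enable_core_dumps` is hypothetical.

```cpp
#include <sys/resource.h>

// Raises the soft RLIMIT_CORE limit to the hard limit for this process.
// Returns false if either system call fails. Whether a file is actually
// written still depends on the hard limit and on system configuration.
bool enable_core_dumps() {
    struct rlimit limit{};
    if (getrlimit(RLIMIT_CORE, &limit) != 0) {
        return false;
    }
    limit.rlim_cur = limit.rlim_max;  // soft limit may not exceed hard limit
    return setrlimit(RLIMIT_CORE, &limit) == 0;
}
```

Calling this early in `main` is one option; setting the limit in the launching shell or service configuration is another.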


Static analysis

• Scan your code with a “sanity checker”

– Identifies patterns which may or will lead to errors

– Can check for compliance with coding standards

• Finds bugs that could lead to a crash, even if they don’t right away

• Finds certain kinds of resource leakage

• The sooner the better

– Faster feedback, easier to correct

– Ideally this should work like a spell checker

Catch defects early on


Rogue Wave solutions

• TotalView

– Asynchronous Thread Control

– Parallel Debugging

– Core file Debugging

– Reverse Debugging

– Dynamic Memory Analysis

• Klocwork

– C and C++ Static Code Analysis

• OpenLogic

– Manage Your Open Source Components

We can help!

Visit us at booth #4-139


Resource slides