Profiling Multicore Systems to Maximize Core Utilization

Colin Walls, colin_walls@mentor.com, mentor.com/embedded

Upload: mentoresd

Post on 06-May-2015


Category: Technology



DESCRIPTION

Profiling Multicore Systems to Maximize Core Utilization, by Colin Walls. Underutilization of cores in a multicore system can be considered a bug. As your system incorporates more cores, you need to make sure that all the cores are being fully utilized. Unexpected interactions between processes, the operating system, and resources can prevent cores from delivering peak performance. In this session, explore how to profile what each core is doing and which processes are running on each core, and understand where core utilization falls below optimum values.

TRANSCRIPT

Page 1: Profiling Multicore Systems to Maximize Core Utilization

mentor.com/embedded

Colin Walls

colin_walls@mentor.com

Profiling Multicore Systems to Maximize Core Utilization

Page 2: Profiling Multicore Systems to Maximize Core Utilization

Multicore Drives Complexity

*Source: VDC Research Group, STRATEGIC INSIGHTS 2012: EMBEDDED SOFTWARE & TOOLS MARKET, TRACK 2: Embedded Software Engineering Technologies, VOLUME 3: Software Development & Multicore Tool.

Almost two-thirds of all future projects plan to use multi-core or multi-processor devices!

Open Source OS Use: Previous Project 23.9%, Current Project 32.2%, Next Two Years 37.2%

Multicore & Multi-processor Use (chart axis 0% to 70%): Current Project 62.7%, Next Two Years 65.1%

Page 3: Profiling Multicore Systems to Maximize Core Utilization

Complexity Stresses Timing

Before: Single Processor, Single Core

– Bare Metal / RTOS

– Single Application

– Manual Debugging

Now: Multi-Processor, Multi-Core

– Complex OS

– Multiple Applications

– Different Defects

– Complex Debugging

Page 4: Profiling Multicore Systems to Maximize Core Utilization

Debuggers: Stop and Stare

Debuggers are indispensable, but they only show a snapshot.

From this photo, can you tell if this building will be completed on schedule?

– How long does it usually take this worker?

– Would better tools help?

– Are other workers sitting idle?

Construction Worker by Rubber Dragon

Page 5: Profiling Multicore Systems to Maximize Core Utilization


Tracing, Instrumenting, Logging

Historically, tracing involved a hardware instrument

– Or on-chip logic

– Buffer size limited

– Completely non-intrusive

– Ideal in an instruction set simulator (ISS)

Instrumenting application code

– Adding custom code

– Possibly conditionally compiled

– Debugging with printf()

Logging option with many RTOSes
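To make the instrumentation idea above concrete, here is a minimal sketch in Python rather than C. It is not taken from the talk: the names `TRACE_ENABLED`, `TRACE_BUFFER`, `trace` and `read_sensor` are all hypothetical. The flag plays the role of a conditional-compilation switch, and the bounded buffer mirrors the "buffer size limited" point.

```python
import time
from collections import deque

# Hypothetical switch, playing the role of a compile-time flag.
TRACE_ENABLED = True

# Bounded buffer: once full, the oldest records are discarded.
TRACE_BUFFER = deque(maxlen=1024)


def trace(event, **payload):
    """Record a timestamped event, or do nothing when tracing is off."""
    if not TRACE_ENABLED:
        return
    TRACE_BUFFER.append((time.monotonic_ns(), event, payload))


def read_sensor(channel):
    """Stand-in for application code that has been instrumented."""
    trace("read_sensor:enter", channel=channel)
    value = channel * 2  # placeholder for the real work
    trace("read_sensor:exit", channel=channel, value=value)
    return value


if __name__ == "__main__":
    read_sensor(3)
    for timestamp_ns, event, payload in TRACE_BUFFER:
        print(timestamp_ns, event, payload)
```

Unlike printf()-style debugging, the records stay in memory and carry timestamps, so they can be examined after the run instead of slowing the code path with console output.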

Page 6: Profiling Multicore Systems to Maximize Core Utilization

Photo by woodleywonderworks

Beyond Debuggers

Answering the higher-level questions requires information that traditional interactive debuggers lack:

– Tracing historical state

– Application awareness

Tracing can help find:

– race conditions

– latencies

– bugs that don't cause traps

– systems where stopping the world isn't feasible

... in both application and platform code

Page 7: Profiling Multicore Systems to Maximize Core Utilization

HRB, Analyzer, Sep 2012

Trace Data Sources – Linux Trace Toolkit

Sourcery Analyzer focuses on LTTng to record and collect trace data on Linux.

– Mature, high-performance tracing system for Linux

– Can record both kernel and userspace events

– Low overhead

Linux Trace Toolkit - next generation

Page 8: Profiling Multicore Systems to Maximize Core Utilization

Sourcery Analyzer with LTTng Architecture

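[Architecture diagram. Labels on the Linux target: C/C++ Application, Linux Kernel, LTTng Consumer Daemon, Storage (memory, flash, disk, network FS). Labels on the host: Sourcery Analyzer. Host and target are joined by a link labeled "network".]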

Page 9: Profiling Multicore Systems to Maximize Core Utilization

LTTng 2.0 Attributes

Tracepoints

• Low overhead
• No trap or system call required
• Suitable for use in realtime systems
• Inactive tracepoints have negligible overhead

Common Trace Format

• New compact binary format
• Flexible data layout
• Network streamable
• Size and seek optimized for very large trace files

Deployment

• Loadable kernel module (2.6.38+)
• Companion target side daemons and libraries

Page 10: Profiling Multicore Systems to Maximize Core Utilization

Linux Kernel Tracing - 3.6.6

250+ Tracepoints

Page 11: Profiling Multicore Systems to Maximize Core Utilization

Sourcery Analyzer - Not Just A Trace Viewer

Trace viewing tools depend on users to find the patterns.

Sourcery Analyzer focuses on analysis. Task-centric Analysis Agents calculate and display the higher-level patterns.

Analysis Agents

Event List
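As an illustration of what "calculating higher-level patterns" from raw events can mean, the sketch below derives per-core utilization from a list of scheduler switch records. It is not Sourcery Analyzer code. The record layout `(timestamp, cpu, next_tid)` and the convention that thread id 0 is the idle task are assumptions made for this example.

```python
from collections import defaultdict

IDLE_TID = 0  # assumption: thread id 0 marks the idle task


def core_utilization(switch_events, trace_start, trace_end):
    """Return {cpu: busy fraction} for the window [trace_start, trace_end].

    switch_events holds (timestamp, cpu, next_tid) records sorted by
    timestamp; each says that `cpu` started running `next_tid`.
    Time before a core's first record is counted as idle.
    """
    busy = defaultdict(float)
    current = {}  # cpu -> (time the running thread was switched in, tid)
    for timestamp, cpu, next_tid in switch_events:
        if cpu in current:
            since, tid = current[cpu]
            if tid != IDLE_TID:
                busy[cpu] += timestamp - since
        current[cpu] = (timestamp, next_tid)
    # Close the last open interval on every core at the end of the window.
    for cpu, (since, tid) in current.items():
        if tid != IDLE_TID:
            busy[cpu] += trace_end - since
    window = trace_end - trace_start
    return {cpu: busy[cpu] / window for cpu in current}


if __name__ == "__main__":
    events = [(0, 0, 101), (0, 1, 0), (4, 1, 102), (6, 0, 0), (8, 1, 0)]
    for cpu, fraction in sorted(core_utilization(events, 0, 10).items()):
        print(f"cpu{cpu}: {fraction:.0%} busy")
```

A trace viewer would show the five records; the calculation turns them into a single figure per core, which is the kind of summary that makes an underused core stand out.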

Page 12: Profiling Multicore Systems to Maximize Core Utilization

Viewing Trace Data

Sourcery Analyzer inherited its engine from Mentor's high-end hardware design tools.

– high-performance event database

– sophisticated measurement tools

– variety of visualization types

Visualize event payloads, not just events.

Lamborghini Engine by Dr. Warner

Page 13: Profiling Multicore Systems to Maximize Core Utilization

Customizability is Important

Most developers are working on the application, but most debugging tools provide only platform awareness.

To compensate, developers often cobble together in-house debugging tools.

Mentor Embedded Sourcery Analyzer provides platform visibility and a rich platform for user-developed analysis tools.

[Diagram labels: application, operating system, hardware, platform, "where most work occurs", Stock 3rd‑party Tools, In-house Tools, Sourcery Analyzer, out-of-the-box Analysis Agents, customized Analysis Agents]

Page 14: Profiling Multicore Systems to Maximize Core Utilization

Analysis Agents

• Out-of-box access to powerful analysis routines

• Ships with library of 15 popular agents

• One-click flow to automatically generate pre-processed analysis views

• Ability to also create and add customized agents to the library


Software thread state

Scheduling

CPU utilization

IRQ rate

Page fault rate

Function call flow

CPU state

Filesystem activity

Network activity

Thread migration rate

or add your own
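To give a feel for what a small custom analysis might compute, the sketch below counts thread migrations between cores. It does not use the Sourcery Analyzer agent API, which the slides do not describe. The input layout `(timestamp, cpu, tid)`, meaning "thread `tid` started running on `cpu`", is an assumption for this example.

```python
from collections import Counter


def migration_counts(run_events):
    """Count, per thread, how often it resumed on a different core.

    run_events holds (timestamp, cpu, tid) records sorted by timestamp.
    """
    last_cpu = {}           # tid -> core the thread last ran on
    migrations = Counter()  # tid -> number of core changes seen
    for _timestamp, cpu, tid in run_events:
        if tid in last_cpu and last_cpu[tid] != cpu:
            migrations[tid] += 1
        last_cpu[tid] = cpu
    return migrations


if __name__ == "__main__":
    events = [(0, 0, 101), (1, 1, 102), (2, 1, 101), (3, 0, 101), (4, 1, 102)]
    for tid, count in migration_counts(events).items():
        print(f"thread {tid}: {count} migrations")
```

Dividing each count by the length of the trace window gives a migration rate. A thread that bounces between cores often may be losing cache contents each time, which is one way a busy-looking system can still underperform.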

Page 15: Profiling Multicore Systems to Maximize Core Utilization

Sourcery Analyzer Graph Types

Multicore Utilization

Statistical

Step

State

Flow

Scatter

Histogram

Spectral

Tick

Digital

Floating

Page 16: Profiling Multicore Systems to Maximize Core Utilization

Real World Example

Old Design: RTOS, single-core

– minimum ~40 ms

– average ~150 ms

– maximum ~200 ms

New Design: Linux, multicore

– minimum ~40 ms

– average ~150 ms

– maximum 7000+ ms
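The slide's numbers show the same minimum and average for both designs but a very different maximum. The sketch below shows how that can happen: one rare outlier barely moves the mean when the typical samples are fast. The sample values are invented for illustration (they are not the measurements behind the slide) and were chosen only so that both sets average 150 ms.

```python
def summarize(samples_ms):
    """Return the minimum, mean and maximum of a list of timings in ms."""
    return {
        "min": min(samples_ms),
        "mean": sum(samples_ms) / len(samples_ms),
        "max": max(samples_ms),
    }


# Hypothetical timings in milliseconds, not measured data.
old_design = [40, 150, 170, 190, 200]
new_design = [40] + [80] * 98 + [7120]  # one rare outlier in 100 samples

if __name__ == "__main__":
    print("old:", summarize(old_design))
    print("new:", summarize(new_design))
```

Both sets report a mean of 150 ms, so a dashboard that tracks only the average would rate the two designs as equal. Only the maximum, or the full distribution, exposes the occasional multi-second stall.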

Page 17: Profiling Multicore Systems to Maximize Core Utilization


Diagnosing Problems: Real-time Response

Common problem: a real-time deadline is occasionally, but rarely, missed.

Approach:

– Instrument the start/stop measurement points (e.g. IRQ and application's “read” function).

– Run the test workload.

– Use Sourcery Analyzer to highlight only the missed deadlines.

– Correlate those occurrences with other system activities on the timeline.

– If more detailed data is needed, add instrumentation and repeat.

[Figure labels: OK, Not OK, User-specified budget]
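The approach above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Sourcery Analyzer implements it. The event layout `(timestamp, name)` and the event names `irq`, `read`, `page_fault` and `flash_write` are hypothetical.

```python
def find_missed_deadlines(events, start_name, stop_name, budget):
    """Pair each start event with the next stop event and return the
    (start, stop) intervals that took longer than `budget`.

    events holds (timestamp, name) records sorted by timestamp.
    """
    missed = []
    start_time = None
    for timestamp, name in events:
        if name == start_name:
            start_time = timestamp
        elif name == stop_name and start_time is not None:
            if timestamp - start_time > budget:
                missed.append((start_time, timestamp))
            start_time = None
    return missed


def events_during(events, interval, ignore=()):
    """Return the events inside `interval`, skipping the names in `ignore`."""
    begin, end = interval
    return [(t, n) for t, n in events if begin <= t <= end and n not in ignore]


if __name__ == "__main__":
    trace = [
        (0, "irq"), (2, "read"),
        (10, "irq"), (11, "page_fault"), (14, "flash_write"), (19, "read"),
        (20, "irq"), (23, "read"),
    ]
    for interval in find_missed_deadlines(trace, "irq", "read", budget=5):
        culprits = events_during(trace, interval, ignore=("irq", "read"))
        print("missed deadline", interval, "overlapping activity:", culprits)
```

Filtering down to the missed intervals first keeps the correlation step small: only the activity that overlaps a miss needs inspection, and if nothing suspicious appears there, that is the cue to add instrumentation and repeat.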

Page 18: Profiling Multicore Systems to Maximize Core Utilization

mentor.com/embedded

Colin Walls, colin_walls@mentor.com

http://blogs.mentor.com/colinwalls

Thank you