Mastering the challenge of multicore SoC debugging Aaron Bauch, Sr FAE

Post on 27-Dec-2021

Page 1: Mastering the challenge of multicore SoC debugging

Mastering the challenge of multicore SoC debugging

• Aaron Bauch, Sr FAE

Page 2: Mastering the challenge of multicore SoC debugging

Agenda

• Different types of multicore processing
• Reasons for using multicore processors
• Challenges of using multicore devices
• Debugging in multicore environments
• Debugging demo
• Questions

Page 3: Mastering the challenge of multicore SoC debugging

Global organization

We are a dedicated team that offers superior technology and service that enable customers to create the products of today and the innovations of tomorrow.

• America: 4 offices, 38 employees
• Asia: 3 offices, 24 employees
• Europe: 4 offices, 138 employees

[Chart: Employees IAR Systems, split into Product / Development, Sales / Market / Support, and Administration]

Page 4: Mastering the challenge of multicore SoC debugging

System performance

Page 5: Mastering the challenge of multicore SoC debugging

Performance is improved by...

• Compiler optimizations
  – Code motion
  – Loop unrolling
  – Function inlining
• Parallelism
  – Bit
  – Instruction
  – Data
  – Task
• Clock frequency

Code motion

Before:

    for (i = 0; i < 10; i++) {
        b = k * c;
        p[i] = b;
    }

After (the loop-invariant computation is moved out of the loop):

    b = k * c;
    for (i = 0; i < 10; i++) {
        p[i] = b;
    }

Loop unrolling

    /* copy 20 elements */
    for (i = 0; i < 20; ++i) {
        a[i] = b[i];
    }

    /* unrolled four times */
    for (i = 0; i < 20; i += 4) {
        a[i]   = b[i];
        a[i+1] = b[i+1];
        a[i+2] = b[i+2];
        a[i+3] = b[i+3];
    }
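The unrolled loop on the slide works because 20 is a multiple of 4. As a minimal sketch for an arbitrary element count, a second loop can pick up the leftover elements. The function name copy_unrolled and the int element type are assumptions for illustration, not part of the slides.

```c
#include <stddef.h>

/* Copy n ints from src to dst, unrolled four times.
   The trailing loop handles the 0 to 3 elements that remain
   when n is not a multiple of 4. */
void copy_unrolled(int *dst, const int *src, size_t n)
{
    size_t i = 0;

    /* Main loop: four copies per iteration. */
    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }

    /* Remainder loop: at most three iterations. */
    for (; i < n; ++i) {
        dst[i] = src[i];
    }
}
```

A compiler that unrolls automatically has to generate the equivalent of the remainder loop itself whenever the trip count is not known to be a multiple of the unroll factor.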

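Recognize coding patterns

Rotate:
    C source:        x = (x >> n) | (x << (32 - n));
    Generated code:  MOV   R0,R0,ROR R2

Conditional execution:
    C source:        if ((x & 0x03) != 0) x >>= 2;
    Generated code:  TST   R0,#+0x3
                     MOVNE R0,R0,LSR #+2

Multiply-accumulate:
    Separate:        MUL   R1,R0,R3
                     ADD   R2,R2,R1
    Combined:        MLA   R2,R0,R3,R2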
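One caveat on the rotate pattern: for a 32-bit x, the C expression x << (32 - n) is undefined when n is 0. A common way to write the idiom safely is to mask both shift counts, sketched below. The name rotr32 is hypothetical, and whether a particular compiler still maps this form to a single ROR instruction is an assumption worth checking in the generated listing.

```c
#include <stdint.h>

/* Rotate x right by n bits.
   Masking keeps both shift counts in the range 0..31, so n == 0
   (and n >= 32) never produces an out-of-range shift. */
uint32_t rotr32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x >> n) | (x << ((32u - n) & 31u));
}
```

For example, rotr32(0x12345678, 8) moves the low byte 0x78 to the top and yields 0x78123456.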

Page 6: Mastering the challenge of multicore SoC debugging

Parallelism

Some tasks are not suitable for parallelization

Page 7: Mastering the challenge of multicore SoC debugging

Parallelism

Other tasks are easily parallelized

Page 8: Mastering the challenge of multicore SoC debugging

SIMD

SIMD parallelism = Single Instruction, Multiple Data
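To make the idea concrete without relying on any particular instruction set, here is a minimal sketch of "SIMD within a register" (SWAR) in portable C: one 32-bit operation sequence adds four independent 8-bit lanes at once. This is an illustration of the principle only, not how hardware SIMD units such as NEON are programmed, and the name add_u8x4 is hypothetical.

```c
#include <stdint.h>

/* Add four packed 8-bit lanes of a and b, each lane wrapping
   modulo 256, without letting a carry spill into the next lane. */
uint32_t add_u8x4(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of every lane; the cleared top bit of
       each lane absorbs the carry out of bit 6. */
    uint32_t low = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);

    /* Fold in the top bit of every lane with XOR, which adds it
       without generating a carry into the neighbouring lane. */
    return low ^ ((a ^ b) & 0x80808080u);
}
```

With a = 0x01020304 and b = 0x10203040, all four byte additions happen in the same pass and the result is 0x11223344.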

Page 9: Mastering the challenge of multicore SoC debugging

Multicores

• Multiple cores on one chip can scale performance
• Each core is a full CPU and can work independently or in concert with other cores

[Diagram: a single Core vs. Core 1 and Core 2]

Page 10: Mastering the challenge of multicore SoC debugging

Different types of multicore processing

Page 11: Mastering the challenge of multicore SoC debugging

Homogeneous multicore

• Two or more identical processors (cores) which can share a main memory, peripherals, interrupt controller etc.

• Each processor has its own registers and function units, and may have its own local memory or cache

[Diagram: cores, each with local memory, sharing I/O and main memory]

Page 12: Mastering the challenge of multicore SoC debugging

Heterogeneous multicore

• Different cores share a main memory and peripherals
• Can be used for applications that need both real-time performance and signal processing capabilities

[Diagram: Core 1 and Core 2, each with local memory, connected to shared memory]

Page 13: Mastering the challenge of multicore SoC debugging

SMP vs. AMP

Symmetric Multi Processing (SMP):
• Each core runs the same code from common memory
• Requires a homogeneous system

Asymmetric Multi Processing (AMP):
• Each core runs its own code or part of the application
• Cores are independent of each other
• Can be done with both homogeneous and heterogeneous multicore processors

Page 14: Mastering the challenge of multicore SoC debugging

Overview

[Diagram: CPU Core 1 through CPU Core N, each with a local cache, connected to shared memory/cache, an interrupt controller and I/O]

                 SMP    AMP
Homogeneous      yes    yes
Heterogeneous    no     yes

Page 15: Mastering the challenge of multicore SoC debugging

Reasons for Using Multicore

Page 16: Mastering the challenge of multicore SoC debugging

Homogeneous SMP

• High performance requirements
  – Max clock is 1-2 GHz per core on ARM Cortex-A9
  – More cores means more performance
• Multicore has easier communication and board layout vs. multi-device

[Diagram: CPU-1 and CPU-2 blocks, each labeled N MHz, shown in side-by-side configurations ("vs.")]

Page 17: Mastering the challenge of multicore SoC debugging

Heterogeneous AMP

• Applications with multiple constraints, e.g.:
  – Throughput vs. interrupt latency
  – Constant sensor data only needs a small core
• Allows for “offloading” of processing by function

[Diagram: CPU-1 and CPU-2 blocks shown in side-by-side configurations ("vs.")]

Page 18: Mastering the challenge of multicore SoC debugging

Challenges using multicore

Page 19: Mastering the challenge of multicore SoC debugging

Multicore considerations

• In general, applications perform faster with more cores
• However:
  – When the application has defects, they are generally much harder to detect and correct
  – Traditional procedural-based coding may not lend itself well to parallelization
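How much faster depends on how much of the work can actually run in parallel. Amdahl's law gives the upper bound: with a parallel fraction p and N cores, the speedup is 1 / ((1 - p) + p / N). The sketch below just evaluates that formula; the function and parameter names are chosen for illustration.

```c
/* Upper bound on speedup according to Amdahl's law.
   parallel_fraction: share of the runtime that can run in parallel (0..1)
   cores:             number of cores working on the parallel part (>= 1) */
double amdahl_speedup(double parallel_fraction, unsigned cores)
{
    double serial_part   = 1.0 - parallel_fraction;
    double parallel_part = parallel_fraction / (double)cores;
    return 1.0 / (serial_part + parallel_part);
}
```

If 80 % of an application parallelizes, two cores give at most about 1.67x and four cores at most 2.5x. No number of cores can push it past 5x, because the remaining 20 % still runs serially.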

Page 20: Mastering the challenge of multicore SoC debugging

Problems with multicore
• Inefficient parallelization
• Data bottlenecks
• I/O bottlenecks
• Imbalanced workload

An RTOS may help:
  – Distribution of tasks/threads across the cores
  – Load balancing
  – Handling of inter-processor communication
  – RTOS example: ThreadX SMP from Express Logic

Page 21: Mastering the challenge of multicore SoC debugging

Multicore multitasking

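[Diagram: two ways to split the work across Core 1 and Core 2]
• Arrangement 1: Core 1 runs "Decode data", Core 2 runs "Filter data"
• Arrangement 2: Core 1 and Core 2 each run both "Decode data" and "Filter data"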
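When every core runs the same decode and filter steps on its own share of the data, the workload has to be divided evenly to avoid the imbalance mentioned earlier. A minimal sketch of such a split is shown below; range_t and core_slice are hypothetical names, and a real application would also have to consider cache lines and data dependencies.

```c
#include <stddef.h>

/* Half-open index range [begin, end) assigned to one core. */
typedef struct {
    size_t begin;
    size_t end;
} range_t;

/* Split `total` items across `cores` cores as evenly as possible.
   The first (total % cores) cores each receive one extra item.
   Assumes cores >= 1 and core_id < cores. */
range_t core_slice(size_t total, unsigned cores, unsigned core_id)
{
    size_t base  = total / cores;   /* items every core gets        */
    size_t extra = total % cores;   /* leftover items to distribute */
    range_t r;

    r.begin = core_id * base + (core_id < extra ? core_id : extra);
    r.end   = r.begin + base + (core_id < extra ? 1u : 0u);
    return r;
}
```

For 11 samples on two cores, core 0 gets indices 0 to 5 and core 1 gets indices 6 to 10, so the slice sizes never differ by more than one item.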

Page 22: Mastering the challenge of multicore SoC debugging

Debugging in Multicore Environments

Page 23: Mastering the challenge of multicore SoC debugging

Debugging multicore processors

Software tool desired features:
• Visibility of all cores
• Start and stop cores simultaneously or individually
• Multicore breakpoints
  – BP on 1 core stops execution on all cores
  – BP on core A with condition on core B
• Multicore trace
  – Very challenging for multicores with different trace capabilities

Page 24: Mastering the challenge of multicore SoC debugging

ARM CoreSight™

Source: ARM Ltd.

Page 25: Mastering the challenge of multicore SoC debugging

V8.40 New Feature Highlights
• Streaming Trace
  – Enhanced Profiling and Code Coverage
• C18 Support
  – Latest C Standard
    o Clarifies some undefined behaviors
    o No compatibility issues with C11
• Full C++17 Support
• Enhanced multicore support
  – More than 2 core “groups”
• Improved source browser
  – Separate thread for dramatically enhanced performance
  – Enhanced diagnostic messages
• Documentation Comments
  – Editor recognizes Doxygen format comments
  – Will appear in tooltips and parameter hints for variables and functions
• Performance monitor enhancements for Cortex-A and Cortex-R

Page 26: Mastering the challenge of multicore SoC debugging

IAR Embedded Workbench SMP Support

IAR Embedded Workbench support today:
• 1 project and debugger instance for all cores
• Cores can be stopped/run individually or together

Page 27: Mastering the challenge of multicore SoC debugging

IAR Embedded Workbench AMP Support

Master (Cortex-A) Slave (Cortex-M4)

Start/stop core0/core1 Start/stop all cores

Page 28: Mastering the challenge of multicore SoC debugging

Demo configuration

• ST Discovery board with dual-core processor
  – Core 1: M7 at 400 MHz
  – Core 2: M4 at 200 MHz
• Both cores running FreeRTOS
  – Each core has its own project
  – Each core running its own copy of FreeRTOS

Page 29: Mastering the challenge of multicore SoC debugging

Demo software

• Two separate projects in the same workspace
  – CM7 project has a task which sends messages to tasks running on CM4
  – CM4 has two instances of receive task running
  – CM7 has “check” task to see if things are still running
• Debugger loads both projects
  – Starts an additional instance of Embedded Workbench for second debugger
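The slides do not show how the CM7 task passes messages to the CM4. One common pattern for this kind of core-to-core traffic is a single-producer, single-consumer ring buffer in shared memory, sketched here with C11 atomics. This is an assumption for illustration, not the demo's actual mechanism: every name below is hypothetical, it serves exactly one sender and one receiver (the demo's two receive tasks would need one queue each or extra locking), and on real hardware the queue must live in memory that both cores see coherently.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SLOTS 8u   /* hypothetical size; must be a power of two */

typedef struct {
    _Atomic uint32_t head;          /* advanced only by the receiver */
    _Atomic uint32_t tail;          /* advanced only by the sender   */
    uint32_t slots[QUEUE_SLOTS];    /* message payloads              */
} spsc_queue_t;

/* Sender side (e.g. the CM7 task). Returns false if the queue is full. */
bool spsc_send(spsc_queue_t *q, uint32_t msg)
{
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (tail - head == QUEUE_SLOTS) {
        return false;                       /* no free slot */
    }
    q->slots[tail % QUEUE_SLOTS] = msg;
    /* Release: the payload is visible before the new tail is. */
    atomic_store_explicit(&q->tail, tail + 1u, memory_order_release);
    return true;
}

/* Receiver side (e.g. a CM4 task). Returns false if the queue is empty. */
bool spsc_receive(spsc_queue_t *q, uint32_t *msg)
{
    uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head == tail) {
        return false;                       /* nothing to read */
    }
    *msg = q->slots[head % QUEUE_SLOTS];
    /* Release: the slot is handed back only after it has been read. */
    atomic_store_explicit(&q->head, head + 1u, memory_order_release);
    return true;
}
```

Because each index is written by only one side, neither core needs a lock. This is also the kind of shared state where a multicore debugger's ability to stop both cores together is useful, since head, tail and the slots can then be inspected in a consistent state.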

Page 30: Mastering the challenge of multicore SoC debugging

AMP Setup for demo

• Pick M7 project as “master”
  – Arbitrary, we need one master to launch a slave project
• Set up Master project in Debug options, Multicore tab to point to other project to launch at debug time
  – Check box to enable master
  – Fill in details of slave project

Page 31: Mastering the challenge of multicore SoC debugging

Demonstration

Page 32: Mastering the challenge of multicore SoC debugging

Summary
• Multicore enables performance gains when Moore’s law runs out...
• However, multicore presents debugging challenges
• Modern hardware and software tools can help you overcome multicore debugging challenges

Page 33: Mastering the challenge of multicore SoC debugging

Thank you for your attention!

www.iar.com