
Page 1: Programming for the Multicore MSC815x and MSC825x DSPs · This session will cover the basics of homogeneous vs heterogeneous multicore programming, the multicore architecture of MSC8156

TM Freescale, the Freescale logo, AltiVec, C-5, CodeTEST, CodeWarrior, ColdFire, C-Ware, mobileGT, PowerQUICC, StarCore, and Symphony are trademarks of Freescale Semiconductor, Inc., Reg. U.S. Pat. & Tm. Off. BeeKit, BeeStack, CoreNet, the Energy Efficient Solutions logo, Flexis, MXC, Platform in a Package, Processor Expert, QorIQ, QUICC Engine, SMARTMOS, TurboLink and VortiQa are trademarks of Freescale Semiconductor, Inc. All other product or service names are the property of their respective owners. © 2010 Freescale Semiconductor, Inc.

FTF-NET-F0436

Overcoming Multicore Challenges: Programming for the Multicore MSC815x and MSC825x DSPs

June 21, 2010

Andrew Temple, NMG DSP Applications


Agenda

►Introduction

►Objectives

►MSC8156 Overview

►Multicore Hardware Models

►Multicore Programming Models

►Concerns for Porting Applications to Multicore

►Scheduling, Messaging, and Data I/O

►Memory Model Symmetry and Multicore Programming

►Porting an Application to Multicore: Motion JPEG

►Demo: Motion JPEG Asymmetric Multiprocessing


Session Introduction

►This session covers the main concepts involved in porting an application to a multicore environment

►The topics covered will help programmers identify the key concepts and ideas to consider in multicore programming

►The session will close with a practical case study of a multicore application

►Presenter: Applications Engineer for MSC81xx multicore platforms

►Duration: 1 hour

Presenter
Presentation Notes
I also helped develop, debug, and maintain the case study application.

Session Objectives

►After completing this session, you will be able to effectively:

• Understand and use different multicore programming models

• Identify major concerns regarding porting to a multicore system

• Optimize a multicore system for performance

Presenter
Presentation Notes
"Overcoming Multicore Challenges: Programming for the MSC815x DSP" Abstract: This session will cover the basics of homogeneous vs. heterogeneous multicore programming and the multicore architecture of the MSC8156. We will cover the aspects of porting a single core application to a multicore system and briefly discuss homogeneous vs. heterogeneous linking. In closing, the session will demonstrate the Motion JPEG demo for the MSC8156(4). The session is based on application note AN3620, which was written for this demo.

MSC8156 Overview

[Figure: MSC8156 block diagram. Six StarCore SC3850 DSP core subsystems (800 MHz to 1 GHz, each with a 32 KB I-cache, 32 KB D-cache, MMU, and 512 KB L2/M2 memory); 1056 KB M3 memory; two 64-bit DDR2/3 controllers at DDR800; MAPLE-B accelerator (TVPE, FFTPE, DFTPE, RISC processors); QUICC Engine (RISC processors, two 1 Gbps Ethernet ports); security engine; High Speed Serial Interfaces (two SRIO 1x/4x ports, PCI Express, 1x/4x SerDes lanes at 3.125 Gb); TDM (1024 channels); 16-channel DMA; SPI; I2C; UART; virtual interrupts and hardware semaphores; BootROM; JTAG/SAP; all connected through a non-blocking 128-bit switching matrix at 500 MHz.]

Presenter
Presentation Notes
Main points: the memory architecture, and master/slave (initiator/target) bus access as it relates to the cores and the QE/HSSI.

Basics: Multicore Hardware Models

►Hardware Models:
• Symmetrical multicore system: identical processors have equal access to the same memory subsystem
• Asymmetrical multicore system: different processors (DSP + RISC) have unequal access to memory

►The MSC8156 qualifies as BOTH:
• The SC3850 cores have equal access to the same memories
• RISC processing can be done by the QUICC Engine
• This session focuses on the symmetric aspect of the MSC8156 with regard to the SC3850 cores


Programming Models for Symmetric Multicore Devices

►Multiple Single Cores – cores execute applications independently

►True Multicore – cores cooperate in some way to execute an application

►Mixed Model – some cores cooperate and some act independently

►Areas for consideration:
• Scheduling
• Inter-core communication
• Input and output
• Memory layout and linking


Multiple Single Cores


►All cores execute an application independently
►Applications can be identical or different on each core
►Simplest way to port from single core to multicore:
• Replicate a single core application on each core of the system
• Avoids interference, deadlock, and starvation concerns

[Figure: Media gateway example]


Multiple Single Cores: Advantages

►Advantages
• Scheduling: task scheduling and load balancing can be avoided or ignored:
– Lower complexity/overhead
– Predictable
– Easy to maintain
– Easy to debug
• Inter-core communication: no need for inter-core communication, which minimizes inter-core data coherency issues
• Input and output: cores are not involved in partitioning I/O; the peripheral/DMA manages I/O for multiple cores


Multiple Single Cores: Disadvantages

►Disadvantages
• Scheduling: uneven loading; some cores may be overloaded while others are idle
• Inter-core communication: with no communication, tasks cannot be dynamically assigned between cores
• Input and output: limited to peripherals capable of partitioning data streams for each core and signaling appropriately (such as the QUICC Engine). The operating system must provide adequate services to manage the I/O devices


Presenter
Presentation Notes
In the case of SDOS, a service that links queues directly to cores through a BIO low-level driver makes this model easy to implement. For example, a core calls the driver and is assigned a channel, and specific channels are always associated with a specific core.
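As a minimal sketch of the static channel-to-core association described above, the C code below assumes channels are numbered from 0 and dealt out round-robin across the six SC3850 cores at start-up. `channel_owner()` and `core_channels()` are hypothetical helpers, not SmartDSP OS or BIO driver calls; on a real system the mapping would be configured in the driver or peripheral rather than recomputed by the application.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CORES 6   /* the MSC8156 has six SC3850 cores */

/* Hypothetical helper: fixed owner of a channel, decided once at init.
 * Round-robin keeps per-core channel counts within one of each other. */
static int channel_owner(int channel, int num_cores)
{
    assert(channel >= 0 && num_cores > 0);
    return channel % num_cores;
}

/* Writes the channels owned by `core` into `out` and returns how many were
 * written. Each core would call this with its own core number at start-up. */
static size_t core_channels(int core, int num_channels, int num_cores,
                            int *out, size_t out_len)
{
    size_t n = 0;
    for (int ch = 0; ch < num_channels; ch++) {
        if (channel_owner(ch, num_cores) == core && n < out_len)
            out[n++] = ch;
    }
    return n;
}
```

With 20 channels, core 2 would own channels 2, 8, and 14. Round-robin is only one choice; a contiguous range per core works equally well, as long as the assignment never changes at runtime.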

Applications for Multiple Single Core Model

►Applications designed for the multiple single core model have the following characteristics:

• A single core in the multicore system is capable of meeting application requirements with just the resources allocated to that core

• I/O for the application must be assignable to each core with no runtime intervention

• Applications will generally not have a complicated control path or very strict real time constraints due to the inability to manage loading across cores

• Applications should be able to use the cache efficiently. If an application cannot, multicore partitioning may become necessary so that the cache is not thrashed


Presenter
Presentation Notes
A single core in the multicore system: the resources allocated to it include a partition of memory, bus bandwidth, I/O, etc. I/O: data is assigned to a core at compile time, at system initialization, or by an external device outside the multicore system. Cache thrashing: imagine two large program/data segments. If the core swaps between them, it thrashes the cache on every swap, which may become overly inefficient.

True Multicore Model

►Cores in the multicore system cooperate with each other to best utilize available system resources

►Cores usually do not perform identical tasks, because processing is partitioned at the application, scheduling, or I/O level, etc.

►Required by applications that are too large/complex to process on a single core

[Figure: MSC8156 true multicore example: Core 0 and Core 1 (each with cache and M2), M3 memory holding data, the QUICC Engine subsystem Rx queue, and HSSI/SRIO]


True Multicore Model Advantages

►Scheduling:
• Able to dynamically manage resources at the system level
• Scheduling implementation possibilities for maximum system performance:
– Centralized control: a single core assigns tasks to the remaining cores in the system
– Distributed control: each core decides which tasks to perform

►Inter-core communications (flexibility):
• Communication is application specific
• Messages can be sent to a specific core or broadcast to all
• Software (OS) provides the mechanism through an API (shared queues and inter-core messages)

►Input and output:
• Having a centralized core manage I/O reduces the system-level overhead of managing I/O
• Allows I/O throughput optimization
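The distributed-control option can be sketched in C, with POSIX threads standing in for cores and a C11 atomic counter standing in for a shared task queue in M3 memory. This is a host-side illustration under those assumptions: on the MSC8156 the equivalent protection would more likely come from the hardware semaphores or OS-provided shared queues, and `worker()` and `run_distributed()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_TASKS   100
#define NUM_WORKERS 4

/* Shared state (would sit in shared M3 memory on the device). */
static atomic_int next_task;          /* index of the next unclaimed task */
static int done_by[NUM_TASKS];        /* which "core" processed each task */

/* Distributed control: each core claims its own next task from the shared
 * counter; no single core decides for the others. */
static void *worker(void *arg)
{
    int core = (int)(intptr_t)arg;
    assert(core >= 0 && core < NUM_WORKERS);
    for (;;) {
        int t = atomic_fetch_add(&next_task, 1);   /* claim exactly one task */
        if (t >= NUM_TASKS)
            break;                                 /* nothing left to claim */
        done_by[t] = core;                         /* stand-in for real work */
    }
    return NULL;
}

/* Runs the simulation and returns how many tasks were processed. */
static int run_distributed(void)
{
    pthread_t th[NUM_WORKERS];
    atomic_store(&next_task, 0);
    for (int i = 0; i < NUM_TASKS; i++)
        done_by[i] = -1;
    for (int c = 0; c < NUM_WORKERS; c++)
        pthread_create(&th[c], NULL, worker, (void *)(intptr_t)c);
    for (int c = 0; c < NUM_WORKERS; c++)
        pthread_join(th[c], NULL);
    int processed = 0;
    for (int i = 0; i < NUM_TASKS; i++)
        if (done_by[i] >= 0)
            processed++;
    return processed;
}
```

Under centralized control, core 0 would instead choose a destination core for each task and post it to that core's queue, trading the shared counter for a single point of decision.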


True Multicore Model Disadvantages

►Scheduling:
• Overhead: system overhead impacts real-time performance. The overhead must not outweigh the gain from distributing processing across cores

►Inter-core communications:
• Overhead: communication overhead due to message passing between cores
• Inter-core dependencies: when dependencies exist between tasks executing on different cores, the real-time performance of the system as a whole is affected


Presenter
Presentation Notes
Use of the true multiple cores model is limited by a point of diminishing returns beyond which the application complexity simply requires too much overhead or renders the system less than deterministic. The complexity is largely due to the required scheduling, inter-core communication, and I/O activities, all of which impose overhead onto the basic application processing
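To make the point of diminishing returns concrete, the C sketch below uses a deliberately simple toy model, which is an assumption and not taken from the MSC8156 documentation: a job of duration `t` divides evenly over `n` cores, while scheduling and messaging add a fixed cost `c` for every core beyond the first. `speedup()` and `best_core_count()` are hypothetical names.

```c
#include <assert.h>

/* Toy model (an assumption for illustration only): work of duration t is
 * split evenly across n cores, and scheduling plus messaging add a cost c
 * for each core beyond the first. Returns the modelled speedup. */
static double speedup(double t, double c, int n)
{
    assert(n >= 1 && t > 0.0 && c >= 0.0);
    return t / (t / n + c * (n - 1));
}

/* Returns the core count in [1, max_cores] with the highest modelled speedup. */
static int best_core_count(double t, double c, int max_cores)
{
    int best = 1;
    for (int n = 2; n <= max_cores; n++)
        if (speedup(t, c, n) > speedup(t, c, best))
            best = n;
    return best;
}
```

With t = 1000 and c = 40 (arbitrary units), the modelled speedup peaks at five cores and drops slightly when a sixth core is added. Real overheads are rarely this regular, so measured profiles should drive the actual partitioning decision.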

Memory Models for Multicore Processing

►Symmetric Memory Model
• Used for both the “multiple single cores” and “true multiple cores” models
• Standard format for the linker command file
• All cores have the same memory map
• Code execution differs by core based on core number and MMU translation

►Asymmetric Memory Model
• Custom linker command file required
• Each core has a unique memory map: Core 0 can have a different amount of memory allocated as “private” and “shared” compared to Core 2, etc.
• More complex code support and maintenance
• Higher configurability
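The symmetric model's idea of one memory map with behavior selected by core number can be sketched on a host in C. The sketch assumes six cores and emulates MMU translation by indexing a per-core array with the core number; on the device, each core would use the same virtual address and the MMU would map it to that core's own private memory. `private_mem`, `shared_mem`, `my_private()`, and `core_init()` are hypothetical, and `PRIVATE_BYTES` is an arbitrary illustration size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_CORES      6
#define PRIVATE_BYTES  1024   /* illustrative size, not a real M2 budget */

/* Host-side stand-in for per-core private memory. On the device, the MMU
 * would translate one virtual address to each core's own private region;
 * here the core number selects the slice explicitly. */
static uint8_t private_mem[NUM_CORES][PRIVATE_BYTES];
static uint8_t shared_mem[PRIVATE_BYTES];   /* single copy visible to all cores */

static uint8_t *my_private(int core_id)
{
    assert(core_id >= 0 && core_id < NUM_CORES);
    return private_mem[core_id];
}

/* The same function body runs on every core; behavior differs only
 * through the core number, as in the symmetric memory model. */
static void core_init(int core_id)
{
    memset(my_private(core_id), core_id, PRIVATE_BYTES);
    if (core_id == 0)
        memset(shared_mem, 0xAA, sizeof shared_mem);  /* master-only setup */
}
```

Because every core links against the same image, adding or removing a core changes only `NUM_CORES`, which is the maintenance advantage the symmetric model trades against the configurability of the asymmetric one.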


Porting a Single Core Application to a Multicore System

General Guidelines:
►Choose tasks with clearly defined real-time characteristics

►Avoid tasks that are too short

►Minimize dependencies between cores (loose coupling)

►When tasks are moved from sequential single core execution to multicore, completion may occur out of order
• Example: tasks A, B, and C, where C can execute only after tasks A and B have completed. On a multicore device, tasks A and B can execute simultaneously on separate cores


Presenter
Presentation Notes
• Choose tasks with clearly defined real-time characteristics, just as in a single core application.
• Avoid tasks that are too short. The overhead associated with short tasks is proportionally more significant than for larger tasks. Over-partitioning an application to provide flexibility and concurrency generally creates a large number of tasks and priorities spread across several cores, which complicates the scheduler, increases overhead, and makes the application harder to implement and debug.
• Minimize the dependencies between cores. Over-designing the tasks and their interactions complicates the application and makes the system more difficult to implement, debug, and maintain. Inter-core dependencies also incur overhead.
• In a single core device, tasks are forced to execute sequentially. In a multicore environment, the same tasks can execute concurrently and do not necessarily complete in the same order as on a single core, so a multicore environment can expose dependencies that are hidden in a single core environment.
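The A, B, C example above can be sketched in C with POSIX threads standing in for cores. The task bodies and result values are hypothetical placeholders; the point is that C must wait explicitly for both A and B, because on a multicore device they can finish in either order. On the MSC8156 the wait would more likely be an inter-core message or semaphore than `pthread_join()`.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical tasks: A and B are independent; C consumes both results. */
static int result_a, result_b;

static void *task_a(void *arg) { (void)arg; result_a = 20; return NULL; }
static void *task_b(void *arg) { (void)arg; result_b = 22; return NULL; }
static int task_c(void) { return result_a + result_b; }

/* A and B run concurrently and may finish in either order, so C waits for
 * both explicitly. Sequential single core execution hid this dependency. */
static int run_abc(void)
{
    pthread_t ta, tb;
    result_a = result_b = 0;
    int rc = pthread_create(&ta, NULL, task_a, NULL);
    assert(rc == 0);
    rc = pthread_create(&tb, NULL, task_b, NULL);
    assert(rc == 0);
    pthread_join(ta, NULL);   /* wait for A to complete */
    pthread_join(tb, NULL);   /* wait for B to complete */
    return task_c();          /* safe: both inputs are now available */
}
```

Removing either join would let C read a stale result whenever the corresponding task finishes late, which is exactly the kind of hidden dependency the notes warn about.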

Motion JPEG Implementation

Tasks:
►DCT
►Zig-Zag
►Quantization
►RLC-Huffman

[Figure: JPEG encoding flow for an 8 x 8-pixel block: DCT, then Zig-Zag reordering into a vector, then Quantization, then RLC-Huffman coding]

Presenter
Presentation Notes
Input data: blocks of pixels called Minimum Coded Units (MCUs), each a 16x16 array of 8-bit pixels (a macroblock). There is no dependency between any two 512-byte MCU blocks. Discrete Cosine Transform: converts raw pixels into a spatial frequency representation (level of detail) and outputs an 8x8 block of integers (DCT coefficients). Zig-zag reordering: outputs a vector of 64 elements ordered from low to high frequency. Quantization:
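As an illustration of the zig-zag reordering step, here is a minimal C sketch that builds the standard JPEG zig-zag scan for an 8 x 8 block by walking its anti-diagonals, then uses it to reorder DCT coefficients into a 64-element vector. `build_zigzag()` and `zigzag_scan()` are hypothetical names; production encoders usually store the 64-entry order as a constant table instead of computing it.

```c
#include <assert.h>

/* Builds the JPEG zig-zag scan for an 8x8 block: order[k] is the row-major
 * index (row * 8 + col) of the k-th coefficient, low to high frequency. */
static void build_zigzag(int order[64])
{
    int k = 0;
    for (int s = 0; s <= 14; s++) {            /* s = row + col (anti-diagonal) */
        int lo = s > 7 ? s - 7 : 0;
        int hi = s < 7 ? s : 7;
        if (s % 2) {                           /* odd: top-right to bottom-left */
            for (int r = lo; r <= hi; r++) order[k++] = r * 8 + (s - r);
        } else {                               /* even: bottom-left to top-right */
            for (int r = hi; r >= lo; r--) order[k++] = r * 8 + (s - r);
        }
    }
    assert(k == 64);
}

/* Reorders an 8x8 block of DCT coefficients into a 64-element vector. */
static void zigzag_scan(const int block[64], int vec[64])
{
    int order[64];
    build_zigzag(order);
    for (int k = 0; k < 64; k++) vec[k] = block[order[k]];
}
```

Since every MCU block is independent, a core that owns a block can run this step, along with the DCT, quantization, and RLC-Huffman stages, without exchanging data with any other core.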

Porting MJPEG to Multicore

Partitioning Options:

►Separation by JPEG Task (Pipelined Approach)

►Separation by JPEG MCU blocks

Considerations:
• Scheduling
• Inter-core communication
• Input/Output


Presenter
Presentation Notes
Partitioning H.264 is another example. Because there are dependencies between macroblocks, H.264 uses slice-based partitioning: a slice of macroblocks is the unit of data partitioning. A pipeline of pixel-based encoding functions does not partition well: a lot of data must pass between cores, and the load of each function can vary depending on certain factors. The advantage of slices is very good load balancing. When encoding, you control how the slices (blocks) are distributed: divide the frame into small pieces and hand them off to each core, and when a core is done, it goes back to look for the remaining blocks. When decoding, you cannot assume multiple slices and are forced to assume one slice per frame. For video conferencing, thanks to the packet size, you can assume x slices per frame. In both cases there is deblocking: when a core finishes a slice, it checks whether there is deblocking to be done and, if so, does it before taking another slice. Deblocking is functional partitioning rather than data partitioning, so in essence the work is split between functional and data partitioning.

Multicore Considerations

►Data Input/Output:
• The frame rate is determined by the PC
• Raw image frames are broken into blocks and sent in packet format
• The rate of transmission of blocks within a frame is fixed
• The MSC815x QUICC Engine Ethernet handles the packets
– It is simple to use a single core to manage data I/O

►Scheduling:
• MCU blocks are independent: there are no restrictions on when a given block is encoded during the encoding of a JPEG frame, as long as the frame is constructed correctly
– No restriction on which core processes a block
– A master/slave approach allows easy management of load balancing

►Inter-core Communication:
• Signaling is required when data is ready


Partitioning for Motion JPEG

►Data Partitioning: each core performs all JPEG encode tasks on an MCU block

►Enables better load balancing
• Unloaded cores can encode more, since any core can encode any block

►Less data-passing overhead
• 2 passes per block (raw block in, encoded block out), as opposed to passing data for each JPEG task per block
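The data-passing argument can be checked with simple arithmetic. The C sketch below assumes that with data partitioning each block moves twice (raw in, encoded out), while a pipeline with one JPEG task per core also hands each block across every stage boundary; the helper names and the 640 x 480 frame size in the example are hypothetical.

```c
#include <assert.h>

/* Data partitioning: each block is handed to one core and handed back. */
static long transfers_data_partitioned(long blocks_per_frame)
{
    assert(blocks_per_frame >= 0);
    return 2 * blocks_per_frame;
}

/* Pipelining with one task per core: each block enters the first stage,
 * crosses (stages - 1) stage boundaries, and leaves the last stage. */
static long transfers_pipelined(long blocks_per_frame, int stages)
{
    assert(blocks_per_frame >= 0 && stages >= 1);
    return (long)(stages + 1) * blocks_per_frame;
}
```

For a 640 x 480 frame split into 16 x 16 MCUs (40 x 30 = 1200 blocks), data partitioning needs 2400 transfers, while a four-stage pipeline (DCT, zig-zag, quantization, RLC-Huffman) needs 6000.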


Implementation

►Master-Slave Model: Application Intelligence on Master Core

[Figure: Master-slave implementation on the MSC815x: Ethernet packets and M3 memory]


Scheduling

►Main scheduler functionality resides in the master core (Core 0)

►Master core manages I/O and processing for the application

►Master core sends raw frames to a shared queue and signals all cores that a block is ready to be encoded

►Slave cores wait for task requests and read data from the queue

Presenter
Presentation Notes
The incoming raw video images are received in blocks by the QUICC Engine network interface. The master core services the QUICC Engine interrupt and then posts a message to a shared task queue with the information pertinent to the received block. The slave cores are notified of messages placed in this queue and then vie to dequeue a message, and with it the task of JPEG-encoding that video data block. A slave core that is already processing a block does not dequeue a new task until it has finished encoding its current block. Core 0, even though it is the master core, is also notified when messages are posted to the task queue and can perform the JPEG encoding of a block if it has available processing bandwidth.
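The task-queue behavior in these notes (the master posts one descriptor per received block, and whichever core is idle, Core 0 included, takes the oldest one) can be sketched as a bounded FIFO. This is a hedged C11 sketch, not the SDOS queue API: the names, the 16-entry depth, and the `atomic_flag` spinlock (standing in for the device's hardware semaphores or OS services) are all assumptions. Only a pointer to each block is queued, in line with Core 0 passing pointers to QUICC Engine data rather than copying it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TASK_QUEUE_DEPTH 16  /* hypothetical capacity */

/* Hypothetical descriptor for one raw block: a pointer, not a copy. */
typedef struct {
    uint32_t block_num;   /* sequence number, used later for serialization */
    const uint8_t *data;  /* raw block in shared memory                    */
    size_t len;           /* length in bytes                               */
} block_task_t;

typedef struct {
    atomic_flag lock;                     /* spinlock guarding the ring */
    block_task_t slots[TASK_QUEUE_DEPTH];
    unsigned head;                        /* next slot to dequeue       */
    unsigned count;                       /* number of queued tasks     */
} task_queue_t;

void task_queue_init(task_queue_t *q)
{
    assert(q != NULL);
    atomic_flag_clear(&q->lock);
    q->head = 0;
    q->count = 0;
}

static void tq_lock(task_queue_t *q)
{
    /* Spin until we are the core that flipped the flag from clear to set. */
    while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
        ;
}

static void tq_unlock(task_queue_t *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Master core: post a block after servicing the QUICC Engine interrupt.
 * Returns false if the queue is full. */
bool task_queue_post(task_queue_t *q, block_task_t t)
{
    bool ok = false;
    tq_lock(q);
    if (q->count < TASK_QUEUE_DEPTH) {
        q->slots[(q->head + q->count) % TASK_QUEUE_DEPTH] = t;
        q->count++;
        ok = true;
    }
    tq_unlock(q);
    return ok;
}

/* Any idle core (Core 0 included): take the oldest block. A core that is
 * still encoding simply does not call this, so it never holds more than
 * one block at a time. Returns false if the queue is empty. */
bool task_queue_take(task_queue_t *q, block_task_t *out)
{
    bool ok = false;
    assert(out != NULL);
    tq_lock(q);
    if (q->count > 0) {
        *out = q->slots[q->head];
        q->head = (q->head + 1) % TASK_QUEUE_DEPTH;
        q->count--;
        ok = true;
    }
    tq_unlock(q);
    return ok;
}
```

The "vie to dequeue" step in the notes corresponds to several cores calling `task_queue_take` at once; the lock guarantees each block goes to exactly one of them.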

Scheduling (cont)

►On completing a block, a slave core notifies the master core via the message queue for encoded blocks

►Master core checks whether the encoded block is the next block due for output (serialization) and, if so, sends it to the QUICC Engine

►If the master core encoded the block itself, it sends the output directly to the QUICC Engine

Inter-core Communication

►Master core is interrupted after packets are received from the QUICC Engine

►Master core sends a message to all cores indicating that data is ready
• Master-slave core messaging via SDOS APIs

►After encoding, slaves send a “data ready” message to the master core
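The signaling pattern above (one "data ready" signal fanned out to every core, and completion signals back to the master) can be sketched with one bit per core in a shared word. The real application uses SDOS messaging APIs, which are not reproduced here; `notify_word_t` and its functions are hypothetical, and C11 atomics stand in for the device's interrupt and semaphore mechanisms. The message payload itself (block number, data pointer) would travel through the shared queues; a bit only tells a core to look at its queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CORES 32  /* one bit per core in a 32-bit word */

/* Hypothetical notification word: bit n set means "core n has a pending
 * message". One word would carry master->all "block ready" signals and a
 * separate word the slave->master "encode complete" signals. */
typedef struct {
    _Atomic uint32_t pending;
} notify_word_t;

void notify_init(notify_word_t *w)
{
    assert(w != NULL);
    atomic_init(&w->pending, 0u);
}

/* Raise the bit for one core (e.g. a slave signalling Core 0). */
void notify_core(notify_word_t *w, int core)
{
    assert(core >= 0 && core < MAX_CORES);
    atomic_fetch_or(&w->pending, UINT32_C(1) << core);
}

/* Raise the bits for cores 0..num_cores-1 (master signalling every core,
 * itself included, that a block is ready). */
void notify_all(notify_word_t *w, int num_cores)
{
    uint32_t mask;
    assert(num_cores > 0 && num_cores <= MAX_CORES);
    mask = (num_cores == MAX_CORES) ? UINT32_MAX
                                    : ((UINT32_C(1) << num_cores) - 1u);
    atomic_fetch_or(&w->pending, mask);
}

/* Called by a core when it checks for work: atomically clears its own
 * bit and reports whether it had been set. */
bool consume_notification(notify_word_t *w, int core)
{
    uint32_t bit;
    assert(core >= 0 && core < MAX_CORES);
    bit = UINT32_C(1) << core;
    return (atomic_fetch_and(&w->pending, ~bit) & bit) != 0;
}
```

Because several notifications raised before a core checks its bit collapse into one, a woken core should drain its queue rather than assume exactly one message is waiting.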

Messaging Queues

Message            Location      Call-back priority   Purpose

Core 0 to Slaves   Core 0        6                    Send / receive message: “Block ready for encode”
Core 0 to Slaves   Slave Cores   5                    Receive message: “Block ready for encode”
Slaves to Core 0   Core 0        5                    Receive message: “Block encode complete”
Slaves to Core 0   Slave Cores   N/A                  Send message: “Block encode complete”

►Purpose of message queues:
• Shared queues for encoded and decoded blocks
• Signaling mechanism from the master core to the slave cores and from the slave cores to the master core

Input and Output

►Input: Core 0 passes pointers to the data received from the QUICC Engine to the shared queues

►Output: Serialization
• Output blocks must be sent to the PC in order, which requires reordering blocks when they finish encoding out of order
• Core 0 uses a FIFO serializer queue to track block numbers and ensure JPEG blocks are transmitted in order

Presenter
Presentation Notes
The serializer concept is similar to the jitter buffer used in voice over IP (VoIP) applications. The difference is that the VoIP jitter buffer is located at the receiving end of the voice connection, whereas this serializer runs on the sending side, in Core 0, before blocks are handed to the QUICC Engine.
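The serializer described above behaves like a reorder buffer: encoded blocks may reach Core 0 in any order, but they leave strictly by block number. A minimal C sketch under stated assumptions: the 8-block window, `serializer_put`/`serializer_pop`, and the block descriptor are hypothetical, and the buffer is assumed to be touched only by Core 0, so no locking is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SER_WINDOW 8  /* hypothetical: max encoded blocks held for reordering */

typedef struct {
    uint32_t block_num;
    const uint8_t *jpeg;  /* encoded block in shared memory */
    size_t len;
} encoded_block_t;

typedef struct {
    bool ready[SER_WINDOW];            /* slot holds a finished block */
    encoded_block_t slot[SER_WINDOW];
    uint32_t next_out;                 /* block number to transmit next */
} serializer_t;

void serializer_init(serializer_t *s, uint32_t first_block)
{
    size_t i;
    assert(s != NULL);
    for (i = 0; i < SER_WINDOW; i++)
        s->ready[i] = false;
    s->next_out = first_block;
}

/* Core 0 calls this for every encoded block, whether it came from a
 * slave's completion message or from Core 0's own encoder. Returns false
 * if the block was already transmitted or lies beyond the window. */
bool serializer_put(serializer_t *s, uint32_t block_num,
                    const uint8_t *jpeg, size_t len)
{
    uint32_t idx;
    /* Unsigned wrap-around makes stale block numbers look huge, so this
     * single test rejects both stale and too-far-ahead blocks. */
    if (block_num - s->next_out >= SER_WINDOW)
        return false;
    idx = block_num % SER_WINDOW;
    s->slot[idx].block_num = block_num;
    s->slot[idx].jpeg = jpeg;
    s->slot[idx].len = len;
    s->ready[idx] = true;
    return true;
}

/* Returns true and fills *out only if the next block in sequence is
 * ready; Core 0 would call this in a loop and hand each block to the
 * QUICC Engine, stopping at the first gap. */
bool serializer_pop(serializer_t *s, encoded_block_t *out)
{
    uint32_t idx = s->next_out % SER_WINDOW;
    if (!s->ready[idx])
        return false;
    *out = s->slot[idx];
    s->ready[idx] = false;
    s->next_out++;
    return true;
}
```

Core 0 would call `serializer_put` for each arriving block and then loop on `serializer_pop`, sending each returned block. As with a jitter buffer, the window size bounds how far ahead of the oldest outstanding block the cores may run; a block outside the window would have to wait until the gap closes.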

Session Summary

In this session we have discussed:

►Multicore programming models

►Major concerns regarding porting to a multicore system

►Optimizing performance at the multicore system level

For Further Information

►MSC815x Reference Manual

►QUICC Engine Reference Manual

►AN3620

[email protected]
