1 introduction to dsp processor 20140919

Post on 18-Aug-2015


1

Introduction to DSP Processor

Hans Kuo (hans.kuo@tatung.com)

2

OUTLINE

Introduction to DSP Processor
C6000 Architecture
C6000 Memory Map
Homework 1

3


Silicon Solutions

Decision table for designers of real-time

“Choosing the Right Architecture for Real-Time Signal Processing Designs”, Leon Adams, Texas Instruments

4

Programmability: GPP > DSP > FPGA > ASIC
Performance: ASIC > FPGA > DSP > GPP

Example: wireless communication
GPP: OS, network protocol
DSP: A/V codec
ASIC, FPGA: Reed-Solomon, Viterbi decoder

Evaluating Category   ASIC   FPGA   DSP   GPP
Programmability       1      4      5     5
Development Cycle     2      3      4     5
Performance           5      5      4     2
Power consumption     4      2      2     2

GPP: general-purpose processor
DSP: digital signal processor
FPGA: field-programmable gate array
ASIC: application-specific IC


5

TI Embedded Processors

Microcontrollers

16-bit, MSP430: ultra-low power; up to 25 MHz; Flash 1 KB to 256 KB; analog I/O, ADC, LCD, USB, RF; measurement, sensing, general purpose; $0.49 to $9.00

32-bit real-time, C2000™: fixed & floating point; up to 300 MHz; Flash 32 KB to 512 KB; PWM, ADC, CAN, SPI, I2C; motor control, digital power, lighting, sensing; $1.50 to $20.00

32-bit ARM (MCU), ARM M3/M4: industry-standard, low power; <100 MHz; Flash 64 KB to 1 MB; USB, ENET, ADC, PWM, SPI; host control; $2.00 to $8.00

ARM-Based

ARM (MPU), ARM9, Cortex A-8: industry-standard core, high-performance GPP; accelerators; MMU; USB, LCD, MMC, EMAC; Linux/WinCE user apps; $8.00 to $35.00

ARM + DSP, DaVinci, OMAP: industry-standard core + DSP for signal processing; 4800 MMACs / 1.07 DMIPS/MHz; MMU, cache; VPSS, USB, EMAC, MMC; Linux/Win + video, imaging, multimedia; $12.00 to $65.00

DSP

DSPs, C647x, C64x+, C674x, C55x: leadership DSP performance; 24,000 MMACS; up to 3 MB L2 cache; 1G EMAC, SRIO, DDR2, PCI-66; comm, WiMAX, industrial/medical imaging; $4.00 to $99.00+

6

7

DSP Applications

8

Why do we need DSP processors?

The Sum of Products (SOP), or multiply-accumulate (MAC), is the key element in most DSP algorithms:

Algorithm: Equation

Finite Impulse Response Filter: $y(n) = \sum_{k=0}^{M} a_k \, x(n-k)$

Infinite Impulse Response Filter: $y(n) = \sum_{k=0}^{M} a_k \, x(n-k) + \sum_{k=1}^{N} b_k \, y(n-k)$

Convolution: $y(n) = \sum_{k=0}^{N} x(k) \, h(n-k)$

Discrete Fourier Transform: $X(k) = \sum_{n=0}^{N-1} x(n) \exp[-j(2\pi/N)nk]$

Discrete Cosine Transform: $F(u) = \sum_{x=0}^{N-1} c(u) \, f(x) \cos\left[\frac{\pi}{2N} u (2x+1)\right]$
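To make the MAC pattern concrete, here is a minimal C sketch of one FIR output sample. The function name `fir_sample`, the 16-bit sample type, and the convention that `history[k]` holds x(n-k) are assumptions made for illustration; they do not come from the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* One FIR output sample: y(n) = sum over k of a[k] * x(n-k).
 * Assumption: history[k] holds x(n-k), so history[0] is the newest sample. */
int32_t fir_sample(const int16_t *a, const int16_t *history, size_t taps)
{
    int32_t acc = 0;                          /* running sum of products */
    for (size_t k = 0; k < taps; k++) {
        acc += (int32_t)a[k] * history[k];    /* one multiply-accumulate */
    }
    return acc;
}
```

Each loop iteration is one multiply followed by one add, which is the operation a DSP processor is built to perform quickly.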

9

Hardware vs. Software multiplication

DSP processors are optimized to perform multiplication and addition operations.

Multiplication and addition are done in hardware and in one cycle.

Example: 4-bit multiply (unsigned), 1011 x 1110.

Hardware: 1011 x 1110 = 10011010, produced in one cycle.

Software: shift and add, one step per cycle.

Cycle 1:     0000
Cycle 2:    1011.
Cycle 3:   1011..
Cycle 4:  1011...
Cycle 5: 10011010 (sum)
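The software column can be sketched in C as a shift-and-add loop. This is only an illustration of the idea, assuming 4-bit unsigned operands; the function name `mul4_shift_add` is hypothetical.

```c
#include <stdint.h>

/* Unsigned 4-bit shift-and-add multiply: one multiplier bit is examined per
 * iteration, and a shifted copy of the multiplicand is added when it is set. */
uint8_t mul4_shift_add(uint8_t multiplicand, uint8_t multiplier)
{
    uint8_t product = 0;
    multiplicand &= 0x0F;                     /* keep only the low 4 bits */
    multiplier &= 0x0F;
    for (int bit = 0; bit < 4; bit++) {
        if (multiplier & (1u << bit)) {
            product = (uint8_t)(product + (multiplicand << bit));
        }
    }
    return product;
}
```

Calling `mul4_shift_add(0x0B, 0x0E)` (binary 1011 and 1110) returns 0x9A, binary 10011010, after four loop iterations. A hardware multiplier produces the same result without the loop.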

10


11

C6000 System Block Diagram

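[Figure: C6000 system block diagram. The CPU holds eight functional units (.D1, .M1, .L1, .S1, .D2, .M2, .L2, .S2), register files A (A0-A15) and B (B0-B15), and the control registers. Internal buses connect the CPU to internal memory, external memory, and the peripherals.]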

12

C6000 Central Processing Unit

[Figure: the same C6000 block diagram, here introducing the CPU: functional units .D1, .M1, .L1, .S1, .D2, .M2, .L2, .S2, register files A (A0-A15) and B (B0-B15), and the control registers.]

13

Implementation of Sum of Products (SOP)

SOP is the key element for most DSP algorithms.

Let’s write the code for this algorithm and, at the same time, discover the C6000 architecture.

The implementation in this module will be done in assembly.

Two basic operations are required for this algorithm:

(1) Multiplication

(2) Addition

Therefore, two basic instructions are required.

$Y = \sum_{n=1}^{N} a_n * x_n = a_1 * x_1 + a_2 * x_2 + \dots + a_N * x_N$

14

Multiply (MPY)

The multiplication of a1 by x1 is done in assembly by the following instruction:

MPY a1, x1, Y

This instruction is performed by a multiplier unit that is called “.M”


15

Multiply (.M unit)

$Y = \sum_{n=1}^{40} a_n * x_n$

The .M unit performs multiplications in hardware.

MPY .M a1, x1, Y

16

Addition (.?)

MPY .M a1, x1, prod
ADD .? Y, prod, Y

17

Add (.L unit)

MPY .M a1, x1, prod
ADD .L Y, prod, Y

The C6000 uses registers to hold the operands, so let’s change this code.

18

Register File - A

MPY .M a1, x1, prod
ADD .L Y, prod, Y

[Figure: the .M and .L units connected to Register File A (A0 to A15, each 32 bits wide), with a1 in A0, x1 in A1, prod in A3, and Y in A4.]

Let us correct this by replacing a1, x1, prod, and Y with the registers shown above.

19

Specifying Register Names

MPY .M A0, A1, A3
ADD .L A4, A3, A4

Register File A contains 16 registers (A0 to A15), each 32 bits wide.

[Figure: the .M and .L units and Register File A, with a1 in A0, x1 in A1, prod in A3, and Y in A4.]

20

Data loading

Q: How do we load the operands into the registers?


21

Load Unit “.D”

[Figure: a .D unit added between Data Memory and Register File A, alongside the .M and .L units.]

Q: How do we load the operands into the registers?

A: The operands are loaded into the registers from memory using the .D unit.

Q: Which instruction(s) can be used for loading operands from memory into the registers?

A: The load instructions (LDB, LDH, LDW, LDDW).

22

Using the Load Instructions

LDH .D *A5, A0      ; load a1 from the address held in A5 into A0
LDH .D *A6, A1      ; load x1 from the address held in A6 into A1
MPY .M A0, A1, A3   ; prod = a1 * x1
ADD .L A4, A3, A4   ; Y = Y + prod
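For readers more comfortable in C, the four instructions above correspond roughly to the following single-tap function. This is a sketch for orientation only: the name `sop_one_tap` and the choice of 16-bit operands (matching the halfword load LDH) are assumptions, and a compiler would not necessarily emit exactly these instructions.

```c
#include <stdint.h>

/* One tap of the sum of products, annotated with the assembly it mirrors. */
int32_t sop_one_tap(const int16_t *a_ptr, const int16_t *x_ptr, int32_t y)
{
    int16_t a = *a_ptr;                 /* LDH .D *A5, A0 */
    int16_t x = *x_ptr;                 /* LDH .D *A6, A1 */
    int32_t prod = (int32_t)a * x;      /* MPY .M A0, A1, A3 */
    return y + prod;                    /* ADD .L A4, A3, A4 */
}
```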

23

Creating a loop

So far we have implemented the SOP for one tap only, i.e.

Y = a1 * x1

So let’s create a loop so that we can implement the SOP for N taps.

LDH .D *A5, A0

LDH .D *A6, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4

24

Create a label to branch

loop LDH .D *A5, A0

LDH .D *A6, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4


25

Add a branch instruction, B.

loop LDH .D *A5, A0

LDH .D *A6, A1

MPY .M A0, A1, A3

ADD .L A4, A3, A4
B .? loop

26

Which unit is used by the B instruction?

The B instruction is executed by the .S unit.

[Figure: an .S unit added alongside the .M, .L, and .D units, Register File A, and Data Memory.]

loop LDH .D *A5, A0
LDH .D *A6, A1
MPY .M A0, A1, A3
ADD .L A4, A3, A4
B .S loop
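As written, the loop branches unconditionally and always reads from the same two addresses. A complete version also needs a loop counter and pointer updates. The C sketch below shows one way those pieces fit together; the function name `sop` and the parameter `count` are assumptions for illustration, not part of the slides.

```c
#include <stdint.h>

/* Sum of products over `count` taps: Y = a[0]*x[0] + ... + a[count-1]*x[count-1]. */
int32_t sop(const int16_t *a, const int16_t *x, int count)
{
    int32_t y = 0;
    while (count > 0) {                            /* conditional branch back to "loop" */
        int32_t prod = (int32_t)(*a++) * (*x++);   /* load both operands, advance pointers, multiply */
        y += prod;                                 /* accumulate */
        count--;                                   /* one tap done */
    }
    return y;
}
```

For the 40-tap example, this would be called as `sop(a, x, 40)`.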

27

How can we add more processing power to this processor?

(1) Increase the clock frequency.

(2) Increase the number of processing units.

28

Increase the number of Processing units

[Figure: a second set of units (.S2, .M2, .L2, .D2) with its own Register File B (B0 to B15, each 32 bits wide), added alongside the original .S, .M, .L, and .D units, Register File A, and Data Memory.]
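One way to picture what a second set of units offers is to split the sum of products into two independent partial sums, so that two multiply-accumulates have no dependency on each other. The C sketch below illustrates the idea only; the name `sop_two_paths` is hypothetical, and whether a given compiler actually schedules the two halves on the A and B sides is not guaranteed.

```c
#include <stdint.h>

/* Sum of products computed as two independent partial sums. */
int32_t sop_two_paths(const int16_t *a, const int16_t *x, int count)
{
    int32_t y_a = 0;                              /* partial sum, imagined on side A */
    int32_t y_b = 0;                              /* partial sum, imagined on side B */
    int n = 0;
    for (; n + 1 < count; n += 2) {
        y_a += (int32_t)a[n] * x[n];              /* even-indexed taps */
        y_b += (int32_t)a[n + 1] * x[n + 1];      /* odd-indexed taps */
    }
    if (n < count) {
        y_a += (int32_t)a[n] * x[n];              /* leftover tap when count is odd */
    }
    return y_a + y_b;
}
```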

29

C6211 Instruction Set (by unit)

.S Unit: ADD, ADDK, ADD2, AND, B, CLR, EXT, MV, MVC, MVK, MVKL, MVKH, MVKLH, NEG, NOT, OR, SET, SHL, SHR, SSHL, SUB, SUB2, XOR, ZERO

.M Unit: MPY, MPYH, SMPY, SMPYH

.L Unit: ABS, ADD, AND, CMPEQ, CMPGT, CMPLT, LMBD, MV, NEG, NORM, NOT, OR, SADD, SAT, SSUB, SUB, SUBC, XOR, ZERO

.D Unit: ADD, ADDA, LDB/H/W, MV, NEG, STB/H/W, SUB, SUBA, ZERO

Other: IDLE, NOP

30

C language vs Assembly

Source       Optimized by         Efficiency   Effort
C            Compiler Optimizer   70-100%      Low
Linear ASM   Assembly Optimizer   95-100%      Med
ASM          Hand Optimize        100%         High

31

'C6x Peripherals

[Figure: the same C6000 block diagram, here introducing the peripherals.]

32

'C6x Peripherals

EMIF (External Memory Interface)

- Glueless access to async/sync memory: EPROM, SRAM, SDRAM, SBSRAM

DMA/EDMA (Enhanced Direct Memory Access)

- 4/16 channels

BOOT

- Boot from 4M external block

- Boot from HPI/XB

McBSP (Multi-Channel Buffered Serial Port)

- High-speed sync serial comm

- T1/E1/MVIP interface

HPI (Host Port Interface) / Expansion Bus (XB)

- 16/32-bit host µP access

Timer/Counters

- Two 32-bit timer/counters

[Figure: the ‘C6x CPU surrounded by the EMIF, DMA, Boot, McBSP, HPI/XB, Timer, and PLL blocks, with External Memory attached.]

33


34

C6000 Memory

[Figure: the same C6000 block diagram, here introducing the memory.]

35

C6416 Memory Map

0000_0000: Internal memory (L2 cache), 1024 KB, unified (data or program)
0180_0000: On-chip peripherals
6000_0000: EMIFB, 64 MB x 4, external
8000_0000: EMIFA, 256 MB x 4, external
FFFF_FFFF: end of map

External memory: async (SRAM, ROM, etc.) or sync (SBSRAM, SDRAM)

Level 1 cache: 16 KB program, 16 KB data, not in map

[Figure: CPU with 16 KB L1P, 16 KB L1D, and 1024 KB L2.]

36

Memory Allocation

[Figure: C source code goes through the compiler and assembler to a COFF object file containing Text, Data, and Bss. The SECTION side lists Stack, Heap, Text, Data, and Bss; the MEMORY side shows target memory from 0x00000 to 0xfffff made up of ROM, external RAM, and internal RAM.]

Memory Layout

MEMORY
{
    ISRAM : origin = 0x00000000, len = 0x00100000
}
SECTIONS
{
    .text > ISRAM
}

37

What is stored in memory?

- Code
- Constants
- Global and static variables
- Local variables
- Dynamic memory

[Figure: memory from 0x00000 to 0xfffff.]

38

How is memory organized?

- text: code and constant data
- data: initialized global and static variables
- bss: uninitialized global and static variables
- stack: local variables, function return addresses, function arguments
- heap: dynamic memory

[Figure: memory from 0x00000 to 0xfffff divided into stack, heap, bss, data, and text.]

39

How is memory allocated?

#include <stdlib.h>

long array[100];        /* bss */
long bufsize = 100;     /* data */

char *f1(int n);

int main(void)
{
    int i;              /* stack */
    char *buf;          /* stack */
    i = 10;
    buf = f1(i);
    return 0;
}

char *f1(int n)
{
    int k;              /* stack */
    return malloc(bufsize);   /* heap: 100-byte block */
}

[Figure: memory from 0x00000 to 0xfffff. text holds the code of main() and f1(); data holds bufsize = 100; bss holds array[100]; heap holds the 100-byte block; stack holds main's return address, i, buf, f1's argument n, f1's return address, and k.]

40

Memory Allocation & Deallocation

How, and when, is memory allocated?

- Global and static variables: at program startup
- Local variables: at function call
- Dynamic memory: at malloc()

How, and when, is memory deallocated?

- Global and static variables: at program finish
- Local variables: at function return
- Dynamic memory: at free()
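The three lifetimes can be seen side by side in a small C sketch. The names `fill_and_sum` and `call_count` are hypothetical and chosen only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

static long call_count = 0;   /* static: exists from program startup to program finish */

/* Allocates a buffer, fills it with ones, sums it, and frees it again.
 * Returns the sum, or -1 if the allocation fails. */
int fill_and_sum(size_t bufsize)
{
    int sum = 0;                              /* local: created at the call, gone at return */
    unsigned char *buf = malloc(bufsize);     /* dynamic: allocated at malloc() */
    if (buf == NULL) {
        return -1;
    }
    memset(buf, 1, bufsize);
    for (size_t i = 0; i < bufsize; i++) {
        sum += buf[i];
    }
    free(buf);                                /* dynamic: deallocated at free() */
    call_count++;
    return sum;
}
```

Without the `free()` call, the block would stay allocated after the function returns, even though the pointer `buf` itself disappears with the stack frame.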

41

When is memory allocated?

In the example program above:

- array (bss): set to 0 at startup
- bufsize (data): set to 100 at startup
- i and buf in main() (stack): at function call
- n and k in f1() (stack): at function call
- malloc(bufsize) (heap): 100 bytes at malloc()

42

When is memory deallocated?

In the example program above:

- array (bss) and bufsize (data): available till termination
- i and buf in main() (stack): deallocated on return from main()
- n and k in f1() (stack): deallocated on return from f1()
- 100-byte block (heap): deallocated on free()

43

Sections defined in C6000 compiler

Initialized sections

- .cinit: initial values for global/static variables
- .const: global and static string literals
- .switch: tables for switch instructions
- .text: code

Uninitialized sections

- .bss: global and static variables
- .stack: stack (local variables, return address, arguments)
- .far: globals and statics declared far
- .sysmem: memory for malloc functions (heap)
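The following C sketch annotates ordinary declarations with the section each one would be expected to land in, based on the list above. The comments are an assumption about how the TI toolchain places objects, not a verified mapping for every compiler version or option; the names `counter`, `gain`, `table`, and `scale` are hypothetical.

```c
#include <stdlib.h>

int counter;                          /* uninitialized global: .bss */
int gain = 3;                         /* initialized global: storage in .bss, initial value in .cinit */
const int table[4] = {1, 2, 4, 8};    /* constant data: assumed to go to .const */

int scale(int mode, int value)        /* code: .text */
{
    int result;                       /* local variable: .stack */
    switch (mode) {                   /* a large switch may generate a table in .switch */
    case 0:  result = value * table[0]; break;
    case 1:  result = value * table[1]; break;
    case 2:  result = value * table[2]; break;
    default: result = value * gain;     break;
    }
    int *copy = malloc(sizeof *copy); /* heap block: .sysmem */
    if (copy != NULL) {
        *copy = result;
        result = *copy;
        free(copy);
    }
    counter++;
    return result;
}
```

The `far` keyword is specific to the TI compiler and is left out so that the sketch also builds with a host compiler.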

44

Example : 6416 DSK

[Figure: C6416 DSK, with the 16 MB and 512 KB external memories labeled.]

45

Example : C6416 DSK

                  Base         Length
Internal Memory   0x00000000   0x00100000 (1024K)
External SDRAM    0x80000000   0x01000000 (16M)
External Flash    0x64000000   0x00080000 (512K)

46

Linker command file (*.cmd)

MEMORY directive: system memory description

Name : origin = address, length = size-in-bytes

MEMORY
{
    ISRAM : origin = 0x00000000, len = 0x00100000
    SDRAM : origin = 0x80000000, len = 0x01000000
    FLASH : origin = 0x64000000, len = 0x00080000
}
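To check which named region an address belongs to, the three ranges can be modelled as origin/length pairs. The sketch below copies the values from the MEMORY directive; the type `mem_region` and the function `region_index` are hypothetical helpers, not part of any TI tool.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t origin;     /* first address of the region */
    uint32_t length;     /* size in bytes */
} mem_region;

static const mem_region regions[] = {
    {"ISRAM", 0x00000000u, 0x00100000u},
    {"SDRAM", 0x80000000u, 0x01000000u},
    {"FLASH", 0x64000000u, 0x00080000u},
};

/* Returns the index into `regions` of the region containing addr, or -1 if none does. */
int region_index(uint32_t addr)
{
    for (size_t i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        /* Unsigned subtraction wraps to a large value when addr < origin,
         * so a single comparison covers both bounds. */
        if (addr - regions[i].origin < regions[i].length) {
            return (int)i;
        }
    }
    return -1;
}
```

For example, `region_index(0x000FFFFF)` returns 0 (the last byte of ISRAM), while `region_index(0x00100000)` returns -1 because that address lies just past the 1024 KB of internal memory.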

47

Linker command file (*.cmd)

SECTIONS directive: binding sections to memory

SECTIONS
{
    .text  > ISRAM
    .bss   > ISRAM
    .cinit > ISRAM
    …
}

48

C6416.cmd

-stack 0x400

MEMORY
{
    ISRAM : origin = 0x00000000, len = 0x00100000
    SDRAM : origin = 0x80000000, len = 0x01000000
    FLASH : origin = 0x64000000, len = 0x00080000
}

SECTIONS
{
    .text  > ISRAM
    .bss   > ISRAM
    .cinit > ISRAM
    .stack > ISRAM
    …
}

49

DSP/BIOS Configure Tool (*.cdb)

ISRAM Properties

System memory description

50

DSP/BIOS Configure Tool (*.cdb)

Properties

Binding sections to memory

Program Cases:

Case 1:

void main()
{
    int Image[1000];
    ….
}

int Image[1000];
void main()
{
    ….
}

stack = ?

stack 0x400 (1024)

Case 2:

void main()
{
    double Image[200000];
    ….
}

double Image[200000];
void main()
{
    ….
}

stack 0x400 (1024)

bss < 0x100000 (1024k)

bss > SDRAM
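The arithmetic behind both cases can be worked through in a few lines of C. The element sizes (4-byte int, 8-byte double) are stated here as assumptions about the C6000 data model, and the helper names `array_bytes` and `fits` are hypothetical.

```c
/* Sizes taken from the linker command file on the earlier slides. */
#define STACK_BYTES  0x400UL         /* -stack 0x400 (1024 bytes) */
#define ISRAM_BYTES  0x00100000UL    /* 1024 KB internal memory */
#define SDRAM_BYTES  0x01000000UL    /* 16 MB external SDRAM */

/* Assumed element sizes on the C6000. */
#define INT_BYTES     4UL
#define DOUBLE_BYTES  8UL

/* Bytes occupied by an array of `elements` items of `element_bytes` each. */
unsigned long array_bytes(unsigned long elements, unsigned long element_bytes)
{
    return elements * element_bytes;
}

/* Nonzero when an object of `bytes` bytes fits in a region of `region_bytes` bytes. */
int fits(unsigned long bytes, unsigned long region_bytes)
{
    return bytes <= region_bytes;
}
```

Under these assumptions, `int Image[1000]` needs 4000 bytes, which does not fit a 1024-byte stack when declared inside main(). `double Image[200000]` needs 1,600,000 bytes, which exceeds the 1,048,576 bytes of internal memory but fits in SDRAM, which is consistent with placing .bss in SDRAM.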

Q&A
