2000/03/05 1 Processor Requirements needed to optimize DSP performance M. R. Smith, Electrical and Computer Engineering, University of Calgary, Alberta, Canada smithmr @ ucalgary.ca

Post on 19-Dec-2015


Page 1: 2000/03/051 This presentation will probably involve audience discussion, which will create action items. Use PowerPoint to keep track of these action items

2000/03/05 1

Processor Requirements needed to optimize DSP performance

M. R. Smith, Electrical and Computer Engineering, University of Calgary, Alberta, Canada

smithmr @ ucalgary.ca

Page 2:

2000/03/05 ENCM515 -- Characteristics needed in DSP processors

Copyright [email protected] 2 / 48

To be tackled today
  Characteristics of DSP algorithms
  Specialized handling of
    Multiplication
    Division (21K has no division instruction)

ENCM515 Reference Material
  How RISCy Is DSP, IEEE Micro (Jan-10)
  Simply Signal Processing (Jan-40)
  Fast Scaling, CCI (Apr-10)
  Saturation Arithmetic (Apr-20)

Page 3:

DSP Algorithms

DSP algorithms require specialized features on processors

Processors are a compromise: speed, cost, silicon

When have you as a designer found a compromise that meets your requirements?

As a consultant, you may have to add DSP characteristics to an existing system or add a DSP coprocessor to an existing system

Page 4:

FIR

Multiply/Addition intensive
Sum operation with high precision -- overflow considerations
Long simple loop
Online operation -- “infinite” amount of data
Store coefficients on-chip for fast access
Complex domain arithmetic
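
The FIR characteristics listed above (multiply/add in a long simple loop, a wide sum, an endless input stream) can be sketched in C roughly as follows. This is only an illustration: the names fir_state and fir_step and the length NTAPS are hypothetical, and the 64-bit accumulator merely stands in for the extended accumulator a DSP would provide.

```c
#include <stddef.h>
#include <stdint.h>

#define NTAPS 4  /* hypothetical filter length, chosen for the example */

typedef struct {
    int16_t delay[NTAPS];  /* circular delay line holding the most recent inputs */
    size_t  head;          /* index of the newest sample */
} fir_state;

/* Process one input sample; coeff[k] weights the sample from k steps ago. */
static int64_t fir_step(fir_state *s, const int16_t coeff[NTAPS], int16_t x)
{
    s->head = (s->head + 1) % NTAPS;    /* modulo update, as a circular buffer would do */
    s->delay[s->head] = x;

    int64_t acc = 0;                    /* wide accumulator so the sum cannot overflow here */
    size_t idx = s->head;
    for (size_t k = 0; k < NTAPS; k++) {
        acc += (int32_t)coeff[k] * s->delay[idx];   /* multiply-accumulate */
        idx = (idx == 0) ? NTAPS - 1 : idx - 1;     /* step back one sample, wrapping */
    }
    return acc;
}
```

Feeding a unit impulse followed by zeros through fir_step returns the coefficients one after another, which is a quick sanity check of the delay-line indexing.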

Page 5:

IIR-1

Interrelated and order dependent multiplications and additions
Small number of delays via register moves?
Short loop -- low number of instructions in loop, which makes it difficult to optimize
Precision -- very important because of feedback
Multiple stages -- i.e. IIR follows IIR etc.
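
As a rough illustration of the points above, here is a second-order IIR stage in C. The direct-form structure and the names biquad_coeff, biquad_state, biquad_step and cascade_step are assumptions made for this sketch, not taken from the slides. Note how each output depends on the previous outputs (feedback) and how the delays are plain variable moves.

```c
typedef struct { float b0, b1, b2, a1, a2; } biquad_coeff;  /* hypothetical layout */
typedef struct { float x1, x2, y1, y2; } biquad_state;      /* past inputs and outputs */

/* y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2] */
static float biquad_step(const biquad_coeff *c, biquad_state *s, float x)
{
    float y = c->b0 * x + c->b1 * s->x1 + c->b2 * s->x2
            - c->a1 * s->y1 - c->a2 * s->y2;
    s->x2 = s->x1;  s->x1 = x;   /* delays implemented as register-style moves */
    s->y2 = s->y1;  s->y1 = y;   /* feedback: y is reused on the next call */
    return y;
}

/* Multiple stages: the output of one IIR stage feeds the next. */
static float cascade_step(const biquad_coeff *c, biquad_state *s, int stages, float x)
{
    for (int i = 0; i < stages; i++)
        x = biquad_step(&c[i], &s[i], x);
    return x;
}
```

The loop body is only a handful of operations, and every one of them depends on a value just computed, which is why this kind of loop is hard to pipeline.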

Page 6:

IIR-2 LDI

Short complicated loop
Many intermediate values
Pipeline issues because of interdependence

Page 7:

FFT

Complex variables (A and B) and fixed coefficients (W)
Address calculations complex
Memory accesses numerous
Multiplication and additions
Need for fast access to many registers, address pointers, constants, variables
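
The A, B and W of the list above can be made concrete with a single radix-2 butterfly. This is a sketch under the assumption of a decimation-in-time butterfly (A' = A + W*B, B' = A - W*B); the type cplx and the function name butterfly are hypothetical.

```c
typedef struct { float re, im; } cplx;   /* hypothetical complex type for the sketch */

/* One radix-2 butterfly: A' = A + W*B, B' = A - W*B (updated in place). */
static void butterfly(cplx *a, cplx *b, cplx w)
{
    cplx t;                                  /* intermediate value W*B */
    t.re = w.re * b->re - w.im * b->im;      /* 4 multiplies, 2 adds */
    t.im = w.re * b->im + w.im * b->re;
    b->re = a->re - t.re;                    /* sum and difference of the same operands */
    b->im = a->im - t.im;
    a->re = a->re + t.re;
    a->im = a->im + t.im;
}
```

Even this small kernel keeps eight values live at once (real and imaginary parts of A, B, W and the temporary), which hints at why many registers and fast memory access matter for the FFT.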

Page 8:

Fast instruction cycle -- needed

DSP chips -- two cycle instructions (on top of FETCH/DECODE) during which the processor performs many parallel operations
  More recent technology -- 1 clock cycle

Many processors take 6 to 32 cycles to handle MULT, FMULT, FDIV or even FADD

Make processor highly pipelined -- pipeline must be started and then kept full
  FIR (easy to pipeline)
  IIR (hard to pipeline)
  FFT (challenging to pipeline)

Page 9:

Loop Overhead -- must be minimized

Use specialized hardware
  specialized decrement and branch instructions occurring in a single cycle
  instruction cached with counter
  superscalar operations
  delayed branches
  hardware loop control

Use specialized software techniques
  loop unrolling
  down counting loops
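
The two software techniques named above can be combined in one loop. The following C sketch is an assumption-laden illustration (the name dot_unrolled is hypothetical, and n is assumed to be a multiple of 4): the body is unrolled by four, and the counter runs down to zero so the loop test is a compare against zero.

```c
#include <stddef.h>

/* Dot product, unrolled by 4 with a down-counting loop.
   Assumes n is a multiple of 4; a real routine would handle the remainder. */
static float dot_unrolled(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t blocks = n / 4; blocks != 0; blocks--) {  /* count down to zero */
        acc += a[0] * b[0];   /* four multiply-adds per loop test and branch */
        acc += a[1] * b[1];
        acc += a[2] * b[2];
        acc += a[3] * b[3];
        a += 4;               /* pointer post-increment instead of indexing */
        b += 4;
    }
    return acc;
}
```

On a processor with hardware loop control the unrolling buys less, since the decrement and branch cost nothing extra there.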

Page 10:

Memory operations -- Many of them

Data/instruction and data/data conflicts
Data caches
Will also have external data memory banks
Harvard architecture
branch target caches
multi-ported memory
register pre-forwarding -- avoid stalls while trying to write back result of ALU operation only to re-access the same register
large register banks -- avoid memory ops associated with just calculated values

Page 11:

Precision -- high but without speed loss

FIR -- accumulated value can grow big
IIR -- recursive use of a value

External Memory bus width
Internal Memory bus width
Data width of registers and ALU
Saturation arithmetic

Page 12:

Saturation Arithmetic

For full discussion see 21K SHARC user manual and also “Being Assertive with your processor” (APR-20)
Internal register 80 bits but external busses only 32 wide
  0xFFFF F0000001 00000000
    stored as F0000001
  0xFFFF 00000001 00000000
    stored as 00000001 (normal math)
    stored as 80000000 (saturation)
Can be good solution (FIR) or bad solution (IIR) to the problem of overflow
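
The two storage behaviours above can be imitated in C. This is a simplified model, not the 21K hardware: a 64-bit integer stands in for the wide internal register and only a 32-bit window is stored; the function names store_truncate and store_saturate are hypothetical.

```c
#include <stdint.h>

/* "Normal math": keep only the low 32 bits, whatever the upper bits say. */
static uint32_t store_truncate(int64_t acc)
{
    return (uint32_t)acc;
}

/* Saturation: clamp to the 32-bit signed range before storing. */
static uint32_t store_saturate(int64_t acc)
{
    if (acc > INT32_MAX) return 0x7FFFFFFFu;   /* most positive value */
    if (acc < INT32_MIN) return 0x80000000u;   /* most negative value */
    return (uint32_t)acc;
}
```

With acc = -4294967295 (upper bits all ones, low word 00000001) truncation stores 00000001, a small positive number with the wrong sign, while saturation stores 80000000, mirroring the second case on the slide.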

Page 13:

Complex arithmetic -- frequency domain operations

Need to fetch real and imaginary parts at different times during the algorithm

Need fast access to adjacent memory locations -- burst memory

Need for many internal registers to temporarily store real/imaginary components (FFT butterfly and last year's exams)

Duplication of resources -- was custom, but consider now 21160

Page 14:

[Figure: TigerSHARC ADSP-21160 Core Architecture (block diagram). Labeled blocks: DAG 1 and DAG 2 (8 x 4 x 32 each), cache memory (32 x 48), program sequencer, PMA bus (32), DMA bus (32), PMD bus, DMD bus (two width labels of 64), bus connect, JTAG test & emulation, flags, timer, and two identical processing units, each with a floating & fixed-point multiplier with fixed-point accumulator, a 16 x 40 register file, a 32-bit barrel shifter and a floating-point & fixed-point ALU.]


Page 16:

Address calculations -- frequent

Complex addressing modes -- take many clock cycles

Use pointers and autoincrement rather than calculating pointer + offset
need many address-related registers
address calculations compete with ALU calculations
group instructions within program
  e.g. read and store often use same or similar addresses so don’t recalculate the addresses.

Page 17:

Specialized addressing modes

standard memory access
premodify
postmodify
circular buffers (modulo arithmetic on the address registers)
bit-reverse addressing
structure handling
auto-increment with size accounted for
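
To show what bit-reverse addressing saves, here is the same index calculation done in software. The function name bit_reverse is hypothetical; a DSP address generator would produce these indices as a side effect of the memory access instead of spending a loop on them.

```c
#include <stdint.h>

/* Reverse the lowest `bits` bits of index i (bits = 3 for an 8-point FFT). */
static uint32_t bit_reverse(uint32_t i, unsigned bits)
{
    uint32_t r = 0;
    for (unsigned k = 0; k < bits; k++) {
        r = (r << 1) | (i & 1u);   /* move the lowest bit of i to the bottom of r */
        i >>= 1;
    }
    return r;
}
```

For an 8-point FFT the indices 0..7 map to 0, 4, 2, 6, 1, 5, 3, 7.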

Page 18:

Key issue -- ease of development

Microcontrollers -- onboard peripherals
Host communication
Multiprocessor communications
Simulators
Multi-processor operations
Application notes
Good working environment
Compatibility with previous processor versions -- legacy code (advantage and a disadvantage)

Page 19:

Multiplication Extensive algorithms

Off-chip multipliers have big bottlenecks
  Get and then give instruction to multiplier
  Get and then give first, second data to multiplier
  Wait till cooked, and then get value

Newer chips have on-board multiplication or intelligent co-processors (F-LINE exceptions)

Many chips do multiplication using specialized techniques introduced by optimizing compiler

Page 20:

Smart Multiplication through optimizing compiler techniques

29K RISC FMULT execution takes 6 cycles + fetch

16bit x 16bit INTEGER multiplication on 68K CISC takes 70 cycles regardless of operations

Use adds and shifts instead since these take less time -- easy with integer, but floats? What are the equivalent operations on 21K?

Discussed in early lecture on Quirks and SHARCs

Page 21:

Smart Integer 68k Multiplication

Multiplication by 2, 4, 8, 16
  Achieved by shifting 1, 2, 3 or 4 times (done in 6 + 2n cycles on 68K)

D2 = D0 * 19

        MOVE.W  D0, D2
        ASL.W   #4, D2      ; D2 = D0 * 16
        ADD.W   D0, D2      ; D2 = D0 * 17
        ASL.W   #1, D0      ; D0 = D0 * 2
        ADD.W   D0, D2      ; D2 = original D0 * 19

(29 cycles compared to 70)

Watch out for overflow, may need conversion to 32 bits (SSI, SSF on some processors -- not only 21k)
Waste of time if have single cycle multipliers (21k?). Careful because multiplication results may end in special register.
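
The same shift-and-add sequence can be written in C, which also makes the overflow warning concrete. This is a sketch: the name mul19 is hypothetical, and the input is widened to 32 bits first so that the intermediate values cannot overflow a 16-bit word.

```c
#include <stdint.h>

/* Multiply by 19 using only shifts and adds: 19*x = 16*x + x + 2*x. */
static int32_t mul19(int16_t x)
{
    /* Widen to 32 bits; unsigned arithmetic keeps shifts of negative values well defined. */
    uint32_t d0 = (uint32_t)(int32_t)x;
    uint32_t d2 = d0 << 4;      /* x * 16 */
    d2 += d0;                   /* x * 17 */
    d0 <<= 1;                   /* x * 2  */
    d2 += d0;                   /* x * 19 */
    return (int32_t)d2;         /* assumes two's complement conversion, as on common compilers */
}
```

A 16-bit result would already overflow for inputs above 1724, which is why the conversion to 32 bits matters.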

Page 22:

Multiplication Extensive algorithms

Highly pipelined, therefore complex instruction interdependence

        R0 = R1 * R2        BUT        R0 = R1 * R2
        R3 = R4 * R5                   R3 = R0 * R5   <- delay dependency

Need automated tools to schedule instructions
Need multiple destinations (registers) for multiplier result
Multiply and Accumulate (MAC) instruction
  Super-scalar operations even on a simpler processor
  Cause problems in short loops
  Many types of MACs needed
Not all processors have the 21061 single cycle multiplication operation
  See “In the AM29050 a FIR-bearing animal” (FEB-80 in class notes)

Page 23:

Typically need “Normalization” of result

N point DFT
  Result = DFT (Input) ; 0 <= n < N

N point inverse DFT
  Result = IDFT (Input) / N ; 0 <= n < N

Division is typically done by the equivalent of repeated subtraction -- 150 cycles on 68K

        result = 0;
        do {
            Numerator = Numerator - Denom;
            result++;
        } while (Numerator >= 0);
        result--;

Special shift-subtract tricks speed operations
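
One common form of the shift-subtract trick is restoring division, which produces one quotient bit per step instead of one subtraction per unit of the quotient. The following C sketch is an assumption about what such a routine looks like, not the 68K microcode; the name div_shift_subtract is hypothetical and denom is assumed to be nonzero.

```c
#include <stdint.h>

/* Unsigned restoring division: 32 shift-and-compare steps, whatever the operands.
   Assumes denom != 0. The remainder is written to *rem if rem is not NULL. */
static uint32_t div_shift_subtract(uint32_t num, uint32_t denom, uint32_t *rem)
{
    uint32_t q = 0;
    uint64_t r = 0;                              /* 64-bit so the shift cannot overflow */
    for (int bit = 31; bit >= 0; bit--) {
        r = (r << 1) | ((num >> bit) & 1u);      /* bring down the next numerator bit */
        if (r >= denom) {                        /* does the denominator fit? */
            r -= denom;                          /* subtract it */
            q |= (1u << bit);                    /* and record a 1 in the quotient */
        }
    }
    if (rem) *rem = (uint32_t)r;
    return q;
}
```

Repeated subtraction needs as many passes as the size of the quotient; this version always needs 32.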

Page 24:

Smart Integer Division

Division by 2, 4, 8, 16 -- shift right

        unsigned            signed
        LSR #1, D0          ASR #1, D0

Need to propagate (or not propagate) the sign bit

        Unsigned    original = 0x80 (128)     final = 0x40 (64)
        Signed      original = 0x80 (-128)    final = 0xC0 (-64)
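
The difference between the logical and the arithmetic right shift can be checked in C. This is a small sketch with hypothetical names lsr8 and asr8; because C leaves the right shift of a negative signed value implementation-defined, the arithmetic shift is built explicitly from shifts of non-negative values.

```c
#include <stdint.h>

/* Logical shift right: zeros enter from the left (unsigned divide by 2^n). */
static uint8_t lsr8(uint8_t x, unsigned n)
{
    return (uint8_t)(x >> n);
}

/* Arithmetic shift right: the sign bit is propagated (signed divide by 2^n,
   rounding toward minus infinity). */
static int8_t asr8(int8_t x, unsigned n)
{
    int v = x;
    int r = (v < 0) ? ~(~v >> n) : (v >> n);   /* ~v is non-negative when v < 0 */
    return (int8_t)r;
}
```

Note that asr8(-1, 1) gives -1, whereas the C expression -1 / 2 gives 0, so the shift is not a drop-in replacement when rounding toward zero is required.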

Page 25:

Floating Point Division

The FDIV on 29K takes 15 cycles
There is not a FDIV on the 21K -- use recursion!!
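
One way to divide without a divide instruction is to refine an estimate of the reciprocal with the Newton-Raphson recurrence x <- x * (2 - d * x) and then multiply. The C sketch below assumes that this is the kind of recurrence meant here; the linear seed, the three iterations and the name recip_newton are choices made for the illustration (a real routine would normally start from a hardware seed), and d is assumed to be finite, nonzero and of moderate magnitude.

```c
/* Approximate 1/d using only multiplies and subtracts. */
static float recip_newton(float d)
{
    float sign = 1.0f, scale = 1.0f;
    if (d < 0.0f) { d = -d; sign = -1.0f; }

    /* Scale d into [0.5, 1) by powers of two; the result is rescaled to match. */
    while (d >= 1.0f) { d *= 0.5f; scale *= 0.5f; }
    while (d <  0.5f) { d *= 2.0f; scale *= 2.0f; }

    float x = 48.0f / 17.0f - (32.0f / 17.0f) * d;   /* linear first guess for 1/d */
    for (int i = 0; i < 3; i++)
        x = x * (2.0f - d * x);                      /* the error roughly squares each pass */
    return sign * scale * x;
}
```

A quotient a / b is then computed as a * recip_newton(b), which costs a few multiplies instead of a long divide.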

Page 26:

Why is floating point so difficult?

Number      Internal representation
1.0         0x3F 80 00 00
32.0        0x42 00 00 00
31.98125    0x41 FF D9 9A
1023.4      0x44 7F D9 9A

31.98125 = 1023.4 / 32 = 1023.4 / 2^5

Page 27:

Why is floating point so difficult?

“Fast scaling Routine for Floating-point RISC and DSP processors” (APR-10)

Floating Point Format

        bits:     31     30 .. 23     22 .. 0
        field:    S      bexp         frac

Page 28:

Floating point number K

K = (-1)^s x 1.frac x 2^(bexp - 127)

1.0 = 0x1.0 x 2^0 = (-1)^0 x 0x1.0000 x 2^(127 - 127)

Page 29:

Floating point number K

K = (-1)^s x 1.frac x 2^(bexp - 127)

10.0 = 0xA.0 = %1010.0 = %1.0100 x 2^3 (0x1.4 x 2^3) = (-1)^0 x 0x1.4000 x 2^(130 - 127)

Page 30:

IEEE Std. 754, 1985

Number      Internal representation    s    bexp    frac
1.0         0x3F 80 00 00              0    0x7F    0x00 00 00
32.0        0x42 00 00 00              0    0x84    0x00 00 00
31.98125    0x41 FF D9 9A              0    0x83    0x7F D9 9A
1023.4      0x44 7F D9 9A              0    0x88    0x7F D9 9A

1.frac -- only fractional part is stored

Remember JAMES BOND helped by M (Smith) “The ONE is remembered and not stored”
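
The s, bexp and frac columns of the table can be reproduced by pulling the fields out of a float's bit pattern. The sketch below assumes the C float type is the 32-bit IEEE 754 format, which holds on most current platforms; the names fp_fields and fp_decode are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t s, bexp, frac; } fp_fields;   /* hypothetical holder for the fields */

/* Split a 32-bit float into sign, biased exponent and stored fraction. */
static fp_fields fp_decode(float k)
{
    uint32_t bits;
    memcpy(&bits, &k, sizeof bits);      /* reinterpret the bit pattern safely */

    fp_fields f;
    f.s    = bits >> 31;                 /* bit 31 */
    f.bexp = (bits >> 23) & 0xFFu;       /* bits 30..23 */
    f.frac = bits & 0x7FFFFFu;           /* bits 22..0; the leading 1 is not stored */
    return f;
}
```

For example, fp_decode(1023.4f) gives bexp 0x88 and frac 0x7FD99A, matching the last row of the table.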

Page 31:

Fast floating pt division possible

Number      Internal representation    s    bexp    frac
1.0         0x3F 80 00 00              0    0x7F    0x00 00 00
32.0        0x42 00 00 00              0    0x84    0x00 00 00    BEXP DIFF = 5
31.98125    0x41 FF D9 9A              0    0x83    0x7F D9 9A
1023.4      0x44 7F D9 9A              0    0x88    0x7F D9 9A    BEXP DIFF = 5

K = K / -1 -- flip the sign bit with XOR instruction
K = K / N where N = 2^p -- decrease bexp by p (here p = 5: bexp = bexp - 5)

Page 32:

Fast Floating Point Division by 32

Doing it

29K -- FP# K is in gr96

        CONST BEXPchange, 5                  ; Setting up the power
        SLL   BEXPchange, BEXPchange, 23     ; Setting up the bexp-diff
        SUB   result, K, BEXPchange          ; result = K / 32    <- REPEATED

Note -- when processing a large array -- only the last step needed for every number (inside the loop)
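
The same three steps can be mimicked in C on the bit pattern of a float. This is a sketch assuming 32-bit IEEE 754 floats; the name scale_array_pow2 is hypothetical. As on the slide, the constant is set up once and only the integer subtract is repeated inside the loop. There is deliberately no range check here, so it is only valid when every exponent stays in range.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Divide every element by 2^p by subtracting p from the biased exponent. */
static void scale_array_pow2(float *data, size_t n, unsigned p)
{
    uint32_t bexp_change = (uint32_t)p << 23;    /* set up once, outside the loop */
    for (size_t i = 0; i < n; i++) {
        uint32_t bits;
        memcpy(&bits, &data[i], sizeof bits);
        bits -= bexp_change;                     /* integer SUB on the float's bit pattern */
        memcpy(&data[i], &bits, sizeof bits);
    }
}
```

With p = 5, an element holding 1023.4 becomes 31.98125 and 32.0 becomes 1.0, with the fraction bits untouched.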

Page 33:

Fast Floating Point Division by FP M when M is known to be 2^p

        F0 = 1.0
        R0 = R8 - R0             // NOTE integer operation -- sets up the bexp-diff
                                 // (no ASHIFT R0 BY 23 needed here: the difference is already in the bexp field)
        R4 = R4 - R0             // result = K / 32

Works because
        F8 = 32.0 (0x42000000)
        F0 = 1.0 (0x3F800000)
        R8 - R0 = 0x02800000 = 5 << 23

Page 34:

PROBLEMS?

Try to do 0 / 32
Get a large negative number

Number              s    bexp    frac
0.0                 0    0x00    0x00 00 00
subtract            0    0x05    0x00 00 00
-2.126 * 10^37      1    0xFB    0x00 00 00

If dividing by 2^p -- problems if number is smaller than 2^(p-127)
Must be overcome on many processors
Non-issue on 21k, which has single cycle multiplication: for division, calculate the reciprocal and then multiply
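
A guard against the underflow shown above can be added in front of the exponent subtraction. The following C sketch assumes 32-bit IEEE 754 floats and finite inputs; the name div_pow2_guarded and the choice to return zero for numbers that are too small are assumptions made for the illustration.

```c
#include <stdint.h>
#include <string.h>

/* Divide k by 2^p via the biased exponent, returning 0 when the result
   would fall below the normalized range. Assumes k is finite. */
static float div_pow2_guarded(float k, unsigned p)
{
    uint32_t bits;
    memcpy(&bits, &k, sizeof bits);

    uint32_t bexp = (bits >> 23) & 0xFFu;
    if (bexp <= p)                      /* too small: the subtraction would borrow into the sign bit */
        return 0.0f;

    bits -= (uint32_t)p << 23;          /* the fast path: one integer subtract */
    memcpy(&k, &bits, sizeof bits);
    return k;
}
```

The price is a compare on every element, which is exactly the overhead that conditional or trapping instructions try to hide.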

Page 35:

Must guarantee result

68K, 29K, MIPS and 21k problems

        ADD.W R0, R1
        ADD gr96, gr97, gr98

Every addition (subtraction) result has the possibility of being out of range -- overflow. Must be tested.

68K solution

        ADD.W R0, R1
        BVS Somewhere            <- Test takes cycles

29K and MIPS solution
  Special instructions -- ADDU and ADDS

21k solution is what?

Page 36:

Specialized coding techniques

e.g. 29k has the ability of “throwing” SWI as part of compare (ASSERT)

Test for FP number too small from previous special Division operation

68K code

                CMP.L   #toosmall, D0
                BGE     okay                <- EXTRA cycles always executed
                MOVE.L  #0, D0
                BRA     continue
        okay:   SUB.L   #b_exp, D0
        continue:

29k code

                ASGE    TRAP#, temp, BEXPchange     <- Only “compare” for 29k
                SUB     gr96, gr96, BEXPchange      <- Not in a delay slot?
        where
        TOOSMALL:       CONST gr96, 0
                        RTI

Extra code only executed in the special case that it is needed

Page 37:

Specialized conditional instructions on 21k

21K -- F4 contains the FP value -- need F4/32

        R0 = 5
        R0 = ASHIFT R0 BY 23
        F1 = minimum value ( 2^(5-127) )
        F2 = ABS F4
        COMP (F2, F1)
        IF GE R4 = R4 - R0 ELSE R4 = R4 - R4     <- NO DELAY

Can’t use ELSE R4 = 0, as this is not a compute operation but uses a 32-bit constant.

Page 38:

LIES -- ALL LIES

        IF GE R4 = R4 - R0 ELSE R4 = R4 - R4

This is not a legal instruction either!!
COMPUTE instructions take 22 bits to describe
IF JUMP/CALL ELSE R4 = R4 - R4 is allowed

Useless approach anyway since there are better ways on 21k to do repeated division by a constant.

Page 39:

Processors compared

IEEE Micro Magazine Special Feature 1992

DSP
  TMS320C25, 030
  DSP56000/1, DSP96002 (Motorola)

RISC
  i860 (Intel)
  MC88100 (Motorola)
  SPARC (Sparc Consortium NOT Sun)
  Am29050

Ideal -- SMITH CRISP

Page 40:

CRISP -- triple pun as well

Comprehensive RISC -- Predicted 1992
  Harvard architecture
  MAC (rather than Super-Scalar instructions)
  Ability to do X = R+S, Y = R-S operations
  many registers for address/values
  FP as well as integer capability
  Bit-reverse addressing
  Peripherals with DMA
  Low power standby
  High precision -- double precision
  Efficient pipeline with parallel completion of many operations (dual-ported memory and register banks)

Page 41:

Comparisons -- 1

Page 42:

FIR/IIR

Page 43:

FFT -- Radix 2 and Radix 4

Page 44:

Requirements for “perfect” DSP

Fast instruction cycle -- different from high clock speed
Cycle time adjustable according to instruction type
Fast hardware multiplier
Floating point for easier algorithm design
High precision, implying wide data buses for memory, internal processor transfers, registers and on-board processing units

Page 45:

Requirements for “perfect” DSP

Several data buses available to reduce bus conflict transfer overhead

Harvard architecture and/or instruction cache to avoid instruction and data-fetch clashes

Duplicate resources for parallel computation of real and imaginary components of complex numbers

Dedicated hardware required for address calculations to avoid APU clash with main algorithm

Page 46:

Requirements for “perfect” DSP

Extensive temporary registers to reduce unwanted fetches of continually used data
  Or single cycle, highly parallel, memory operations
Fast and reliable, easily programmed, developed and upgraded
Inexpensive and easy to develop peripherals
High level of customer support
Inexpensive to purchase
Lower power consumption with a standby mode


Page 48:

Tackled today
  Characteristics of DSP algorithms
  Specialized handling of
    Multiplication
    Division (21K has no division instruction)

ENCM515 Reference Material
  How RISCy Is DSP, IEEE Micro (Jan-10)
  Simply Signal Processing (Jan-40)
  Fast Scaling, CCI (Apr-10)
  Saturation Arithmetic (Apr-20)