02 processor requirements
8/13/2019 02 Processor Requirements
2000/03/05
Processor Requirements
needed to optimize DSP performance
M. R. Smith,
Electrical and Computer Engineering, University of Calgary, Alberta, Canada
smithmr @ ucalgary.ca
2000/03/05 ENCM515 -- Characteristics needed in DSP processors
Copyright [email protected]
To be tackled today
Characteristics of DSP algorithms
Specialized handling of
Multiplication
Division (21K has no division instruction)
ENCM515 Reference Material
How RISCy Is DSP, IEEE Micro (Jan-10)
Simply Signal Processing (Jan-40)
Fast Scaling, CCI (Apr-10)
Saturation Arithmetic (Apr-20)
FIR
Multiply/Addition intensive
Sum operation with high precision -- overflow considerations
Long simple loop
Online operation -- infinite amount of data
Store coefficients on-chip for fast access
Complex domain arithmetic
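A minimal C sketch of the FIR inner loop, assuming 16-bit fixed-point (Q15) samples and coefficients and a 64-bit accumulator standing in for a DSP's wide accumulator. The name `fir_q15`, the Q15 format and the delay-line layout are illustrative assumptions, not taken from the slides. It shows the long run of multiply/add pairs and why the sum needs more precision than the operands.

```c
#include <stdint.h>
#include <stddef.h>

/* One FIR output: y[n] = sum over k of h[k] * d[k], where the delay line
   d[k] holds x[n-k] (d[0] is the newest sample).
   Q15 x Q15 products are Q30; the 64-bit accumulator lets the running sum
   grow well past 32 bits without overflowing before the final scaling. */
int16_t fir_q15(const int16_t *h, const int16_t *d, size_t ntaps)
{
    int64_t acc = 0;                          /* wide accumulator */
    for (size_t k = 0; k < ntaps; k++)
        acc += (int32_t)h[k] * (int32_t)d[k]; /* one multiply/add per tap */

    acc /= 32768;                             /* Q30 -> Q15, truncating toward zero */
    if (acc > INT16_MAX) acc = INT16_MAX;     /* saturate on the way back to 16 bits */
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

With three taps of 32767 x 32767 the intermediate sum exceeds 32 bits, which only the wide accumulator holds; the result is then saturated to 32767.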
IIR-1
Interrelated and order-dependent multiplications and additions
Small number of delays via register moves?
short loop -- low number of instructions in the loop,
which makes it difficult to optimize
Precision -- very important because of feedback
Multiple stages -- i.e. IIR follows IIR, etc.
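A sketch of these IIR properties in C, assuming a cascade of second-order (biquad) sections in Direct Form I; the biquad structure and the names `biquad`, `biquad_step` and `iir_cascade` are choices made for illustration, not the specific filter from the lecture. Each output depends on the previous outputs (feedback), the delays are plain register moves, and one stage feeds the next.

```c
#include <stddef.h>

/* Hypothetical biquad section (Direct Form I):
   y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2] */
typedef struct {
    float b0, b1, b2, a1, a2;   /* coefficients (a0 normalised to 1) */
    float x1, x2, y1, y2;       /* delay elements */
} biquad;

static float biquad_step(biquad *s, float x)
{
    /* y needs the previous y values: this feedback is what serialises the
       loop and makes rounding errors recirculate */
    float y = s->b0 * x + s->b1 * s->x1 + s->b2 * s->x2
            - s->a1 * s->y1 - s->a2 * s->y2;
    /* the delays are just register moves */
    s->x2 = s->x1;  s->x1 = x;
    s->y2 = s->y1;  s->y1 = y;
    return y;
}

/* Multiple stages: the output of one IIR section is the input of the next */
float iir_cascade(biquad *stages, size_t nstages, float x)
{
    for (size_t i = 0; i < nstages; i++)
        x = biquad_step(&stages[i], x);
    return x;
}
```

With b0 = 1 and a1 = -0.5 a single section gives y[n] = x[n] + 0.5 y[n-1], so an impulse produces 1, 0.5, 0.25, ...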
IIR-2 LDI
Short complicated loop
Many intermediate values
Pipeline issues because of interdependence
FFT
Complex variables (A and B) and fixed coefficients (W)
Address calculations complex
Numerous memory accesses
Multiplications and additions
Need for fast access to many registers, address pointers, constants, variables
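A minimal C sketch of one radix-2 decimation-in-time butterfly on complex A and B with a fixed coefficient W, assuming separate real/imaginary float fields; `cplx` and `butterfly` are hypothetical names. It makes the register pressure visible: six values are live at once, and one butterfly costs 4 real multiplies and 6 real add/subtracts.

```c
/* Complex value kept as separate real and imaginary parts, the way a DSP
   holds them in pairs of registers */
typedef struct { float re, im; } cplx;

/* Radix-2 DIT butterfly, in place:  A' = A + W*B,  B' = A - W*B */
void butterfly(cplx *a, cplx *b, cplx w)
{
    /* complex multiply W*B: 4 real multiplies, 2 real add/subtracts */
    float tre = w.re * b->re - w.im * b->im;
    float tim = w.re * b->im + w.im * b->re;
    /* 4 more real add/subtracts; B is written first because it still
       needs the old value of A */
    b->re = a->re - tre;  b->im = a->im - tim;
    a->re = a->re + tre;  a->im = a->im + tim;
}
```

With A = B = 1 and W = 1 the butterfly returns A' = 2 and B' = 0; with W = -j it returns A' = 1 - j and B' = 1 + j.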
Fast instruction cycle -- needed
DSP chips -- two-cycle instructions (on top of
FETCH/DECODE) during which the processor performs many parallel operations
More recent technology -- 1 clock cycle
Many processors take 6 to 32 cycles to handle MULT, FMULT, FDIV or even FADD
Make processor highly pipelined -- pipeline must be started and then kept full
FIR (easy to pipeline)
IIR (hard to pipeline)
FFT (challenging to pipeline)
Loop Overhead -- must be minimized
Use specialized hardware
specialized decrement-and-branch instructions occurring in a single cycle
instruction cached with counter
superscalar operations
delayed branches
hardware loop control
Use specialized software techniques
loop unrolling
down counting loops
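The two software techniques can be sketched in C as follows, assuming a dot-product loop and, to keep the sketch short, a length that is a multiple of 4 (a hypothetical restriction). `dot_simple` and `dot_unrolled` are illustrative names. Unrolling pays the loop test once per four multiply/adds, and counting down to zero turns the test into "decrement and branch if not zero".

```c
#include <stddef.h>

/* Straightforward loop: one compare-and-branch per multiply/add */
float dot_simple(const float *h, const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += h[i] * x[i];
    return acc;
}

/* Unrolled by 4, with a down-counting loop counter.
   Assumes n is a multiple of 4. The additions happen in the same order
   as in dot_simple, so the two functions give identical results. */
float dot_unrolled(const float *h, const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t count = n / 4; count != 0; count--) {
        acc += h[0] * x[0];
        acc += h[1] * x[1];
        acc += h[2] * x[2];
        acc += h[3] * x[3];
        h += 4;               /* pointer post-increment, not index + offset */
        x += 4;
    }
    return acc;
}
```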
Memory operations -- Many of them
Data/instruction and data/data conflicts
Data caches
Will also have external data memory banks
Harvard architecture
branch target caches
multi-ported memory
register pre-forwarding -- avoid stalls caused by writing back the result of an ALU operation only to re-access the same register
large register banks -- avoid memory operations associated with just-calculated values
Precision -- high but without speed loss
FIR -- accumulated value can grow big
IIR -- recursive use of a value
External Memory bus width
Internal Memory bus width
Data width of registers and ALU
Saturation arithmetic
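A minimal C sketch contrasting wrap-around and saturation arithmetic on 16-bit values; `add16_wrap` and `add16_sat` are hypothetical names. An overflowing wrap-around sum flips sign (a large positive value becomes large negative), which is most damaging in an IIR feedback path; saturation clamps to the nearest representable value instead.

```c
#include <stdint.h>

/* Wrap-around (modular) 16-bit add: what a plain ALU does on overflow.
   Done in unsigned arithmetic so the wrap is well defined in C. */
int16_t add16_wrap(int16_t a, int16_t b)
{
    uint16_t r = (uint16_t)((uint16_t)a + (uint16_t)b);
    return (int16_t)(r > INT16_MAX ? (int32_t)r - 65536 : (int32_t)r);
}

/* Saturating 16-bit add: clamp to the largest/smallest representable
   value instead of wrapping */
int16_t add16_sat(int16_t a, int16_t b)
{
    int32_t r = (int32_t)a + b;       /* exact in 32 bits */
    if (r > INT16_MAX) return INT16_MAX;
    if (r < INT16_MIN) return INT16_MIN;
    return (int16_t)r;
}
```

For 30000 + 10000 the wrapped result is -25536, while the saturated result is 32767.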
Complex arithmetic -- frequency domain operations
Need to fetch real and imaginary parts at different times during the algorithm
Need fast access to adjacent memory locations -- burst memory
Need for many internal registers to temporarily store real/imaginary components (FFT butterfly and last year's exams)
Duplication of resources -- was custom, but now consider the 21160
[Figure: ADSP-21160 SHARC core architecture. Two processing elements, each with a floating- and fixed-point multiplier with fixed-point accumulator, a 16 x 40 register file, a 32-bit barrel shifter and a floating-/fixed-point ALU; DAG 1 and DAG 2 (8 x 4 x 32); 32 x 48 cache memory; program sequencer; 64-bit PMD and DMD data buses with 32-bit PMA and DMA address buses; bus connect; JTAG test & emulation; flags; timer]
Address calculations -- frequent
Complex addressing modes -- take many clock cycles
Use pointers and autoincrement rather than calculating pointer + offset
need many address-related registers
address calculations compete with ALU calculations
group instructions within program
e.g. read and store often use the same or similar addresses, so don't recalculate the addresses.
Specialized addressing modes
standard memory access
premodify
postmodify
circular buffers (modulo arithmetic on the address registers)
bit-reverse addressing
structure handling -- auto-increment with structure size accounted for
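Two of these addressing modes sketched in C, to show the work that the address-generation hardware does for free; `circ_next` and `bit_reverse` are hypothetical names, and the circular version assumes the index and step are already smaller than the buffer length.

```c
#include <stddef.h>

/* Circular buffer: post-modify the index by 'step' with modulo wrap,
   as a DAG does in hardware for a buffer of 'length' entries.
   Assumes 0 <= index < length and 0 <= step < length. */
size_t circ_next(size_t index, size_t step, size_t length)
{
    index += step;
    if (index >= length)     /* wrap with a compare/subtract, no division */
        index -= length;
    return index;
}

/* Bit-reverse addressing: reverse the low 'bits' bits of an index,
   the reordering needed by an in-place radix-2 FFT */
unsigned bit_reverse(unsigned index, unsigned bits)
{
    unsigned r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (r << 1) | (index & 1u);   /* move lowest bit of index into r */
        index >>= 1;
    }
    return r;
}
```

For an 8-point FFT (3 address bits), index 1 (%001) maps to 4 (%100) and index 3 (%011) maps to 6 (%110).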
Key issue -- ease of development
Microcontrollers -- onboard peripherals
Host communication
Multiprocessor communications
Simulators
Multi-processor operations
Application notes
Good working environment
Compatibility with previous processor versions -- legacy code (both an advantage and a disadvantage)
Multiplication-intensive algorithms
Off-chip multipliers have big bottlenecks
Get and then give instruction to multiplier
Get and then give first, second data to multiplier
Wait till cooked, and then get value
Newer chips have on-board multiplication or intelligent co-processors (F-LINE exceptions)
Many chips do multiplication using specialized techniques introduced by an optimizing compiler
Smart Multiplication through optimizing compiler techniques
29K RISC FMULT execution takes 6 cycles + fetch
16-bit x 16-bit INTEGER multiplication on the 68K CISC takes up to 70 cycles, depending on the operands
Use adds and shifts instead since these take less time -- easy with integers, but floats?
What are the equivalent operations on the 21K? Discussed in an earlier lecture on Quirks and SHARCs
Smart Integer 68k Multiplication
Multiplication by 2, 4, 8, 16
Achieved by shifting 1, 2, 3 or 4 times (done in 6 + 2n clock cycles on 68K)
D2 = D0 * 19
    MOVE.W D0, D2      ; D2 = D0
    ASL.W  #4, D2      ; D2 = D0 * 16
    ADD.W  D0, D2      ; D2 = D0 * 17
    ASL.W  #1, D0      ; D0 = D0 * 2 (D0 is overwritten)
    ADD.W  D0, D2      ; D2 = D0 * 19 (original D0)
(29 cycles compared to 70)
Watch out for overflow; may need conversion to 32 bits (SSI, SSF on some processors -- not only 21k)
Waste of time if you have single-cycle multipliers (21k?). Careful, because multiplication results may end up in a special register.
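The 68K shift-and-add sequence for D0 * 19 written as C, assuming an unsigned operand so that every shift is well defined in C; `times19` is a hypothetical name. The result is kept in 32 bits, which sidesteps the overflow warned about above for 16-bit (.W) registers.

```c
#include <stdint.h>

/* 19*x = 16*x + x + 2*x, mirroring the 68K sequence step by step.
   Unsigned operand assumed; 32-bit result avoids 16-bit overflow. */
uint32_t times19(uint16_t d0)
{
    uint32_t x  = d0;
    uint32_t d2 = x;     /* MOVE.W D0, D2 : D2 = D0      */
    d2 <<= 4;            /* ASL.W #4, D2  : D2 = D0 * 16 */
    d2 += x;             /* ADD.W D0, D2  : D2 = D0 * 17 */
    x <<= 1;             /* ASL.W #1, D0  : D0 = D0 * 2  */
    d2 += x;             /* ADD.W D0, D2  : D2 = D0 * 19 */
    return d2;
}
```

An input as small as 1725 already gives 32775, which would not fit in a signed 16-bit register.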
Multiplication-intensive algorithms
Highly pipelined, therefore complex instruction interdependence
Independent:            BUT dependent (second multiply must wait for R0):
    R0 = R1 * R2            R0 = R1 * R2
    R3 = R4 * R5            R3 = R0 * R5
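Typically need normalization of result
N-point DFT: Result = DFT(Input)
$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi nk/N}, \quad 0 \le k \le N-1$
Normalized result $X[k]/N$ requires a division by $N$ (often a power of 2)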
Smart Integer Division
Division by 2, 4, 8, 16
    unsigned          signed
    LSR #1, D0        ASR #1, D0
Need to propagate (or not propagate) the sign bit
Unsigned original = 0x80 (128)    final = 0x40 (64)
Signed   original = 0x80 (-128)   final = 0xC0 (-64)
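The LSR/ASR difference reproduced in C on 8-bit values; `lsr8` and `asr8` are hypothetical names. The arithmetic shift is written portably because `>>` on a negative signed integer is implementation-defined in C.

```c
#include <stdint.h>

/* Unsigned divide by 2^n: logical shift right (LSR), zeros shifted in */
uint8_t lsr8(uint8_t x, unsigned n)
{
    return (uint8_t)(x >> n);
}

/* Signed divide by 2^n: arithmetic shift right (ASR), sign bit copied in.
   For v < 0, ~v is non-negative, so shift that and complement back. */
int8_t asr8(int8_t x, unsigned n)
{
    int v = x;
    return (int8_t)(v < 0 ? ~(~v >> n) : v >> n);
}
```

One caution: ASR rounds toward minus infinity, so -1 ASR 1 gives -1 and -3 ASR 1 gives -2, whereas C's integer division -1 / 2 gives 0. For negative odd values a signed shift is not quite the same as a signed divide.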
Floating Point Division
The FDIV on the 29K takes 15 cycles
There is no FDIV on the 21K -- use recursion (iterative refinement of a reciprocal)!!
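A sketch of division by iteration in C, assuming Newton-Raphson refinement of a reciprocal, x_{k+1} = x_k (2 - d x_k), which roughly doubles the number of correct bits on each pass. The linear seed 48/17 - (32/17) m is a textbook choice swapped in here as an assumption; on a DSP a hardware seed would take its place. `recip_nr` and `div_nr` are hypothetical names, and `d > 0` is assumed.

```c
/* Reciprocal by Newton-Raphson iteration. Assumes d > 0 and finite. */
double recip_nr(double d, int iterations)
{
    double m = d;
    int e = 0;
    /* scale so that d = m * 2^e with 0.5 <= m < 1 (exact: powers of two) */
    while (m >= 1.0) { m *= 0.5; e++; }
    while (m < 0.5)  { m *= 2.0; e--; }

    double x = 48.0 / 17.0 - (32.0 / 17.0) * m;  /* seed: within 1/17 of 1/m */
    for (int i = 0; i < iterations; i++)
        x = x * (2.0 - m * x);                   /* relative error squares */

    /* undo the scaling: 1/d = (1/m) * 2^-e */
    while (e > 0) { x *= 0.5; e--; }
    while (e < 0) { x *= 2.0; e++; }
    return x;
}

/* a / d computed as a * (1/d): only multiplies and adds */
double div_nr(double a, double d)
{
    return a * recip_nr(d, 4);
}
```

The seed error of at most 1/17 becomes roughly (1/17)^16 after four passes, below double precision.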
Why is floating point so difficult?
Number      Internal representation
1.0         0x3F 80 00 00
32.0        0x42 00 00 00
31.98125    0x41 FF D9 9A
1023.4      0x44 7F D9 9A
31.98125 = 1023.4 / 32 = 1023.4 / 2^5
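The internal representations in the table can be checked in C by reinterpreting a float's bits, assuming `float` is an IEEE-754 single; `float_bits` and `bits_float` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* View the 32 bits of a float as an unsigned integer; memcpy is the
   well-defined way to do this in C. Assumes IEEE-754 single precision. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* The reverse: build a float from a 32-bit pattern */
float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

`float_bits(1023.4f)` and `float_bits(1023.4f / 32.0f)` differ only in the exponent bits (0x447FD99A versus 0x41FFD99A); the low 23 fraction bits are identical.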
Why is floating point so difficult?
Fast Scaling Routine for Floating-point RISC and DSP processors (Apr-10)
Floating Point Format (32 bits)
bit 31    bits 30-23    bits 22-0
S         bexp          frac
Floating point number K
$K = (-1)^{s} \times 1.\mathrm{frac} \times 2^{(\mathrm{bexp}-127)}$
$1.0 = \mathrm{0x1.0} \times 2^{0} = (-1)^{0} \times \mathrm{0x1.0000} \times 2^{(127-127)}$
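Floating point number K
$K = (-1)^{s} \times 1.\mathrm{frac} \times 2^{(\mathrm{bexp}-127)}$
$10.0 = \mathrm{0xA.0} = \%1010.0 = \%1.0100 \times 2^{3} = \mathrm{0x1.4} \times 2^{3}$
$10.0 = (-1)^{0} \times \mathrm{0x1.4000} \times 2^{(130-127)}$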
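A C sketch that splits a float into the s, bexp and frac fields and rebuilds K from the formula above, putting back the hidden leading 1. It assumes IEEE-754 single precision and normalized numbers only (0 < bexp < 255); `fp_fields`, `fp_split` and `fp_value` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE-754 single: s (bit 31), bexp (bits 30-23), frac (bits 22-0) */
typedef struct { unsigned s, bexp; uint32_t frac; } fp_fields;

fp_fields fp_split(float k)
{
    uint32_t u;
    memcpy(&u, &k, sizeof u);
    fp_fields f = { u >> 31, (u >> 23) & 0xFFu, u & 0x7FFFFFu };
    return f;
}

/* K = (-1)^s * 1.frac * 2^(bexp - 127); the ONE is not stored, so add it back */
double fp_value(fp_fields f)
{
    double mant = 1.0 + (double)f.frac / 8388608.0;   /* 1.frac, 2^23 = 8388608 */
    int e = (int)f.bexp - 127;
    double scale = 1.0;
    while (e > 0) { scale *= 2.0; e--; }
    while (e < 0) { scale *= 0.5; e++; }
    return (f.s ? -1.0 : 1.0) * mant * scale;
}
```

For 10.0 this gives s = 0, bexp = 130 and frac = 0x200000 (the .0100 after the binary point), matching 0x1.4 x 2^3.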
IEEE Std. 754, 1985
Number     Internal representation   s   bexp   frac
1.0        0x3F 80 00 00             0   0x7F   0x00 00 00
32.0       0x42 00 00 00             0   0x84   0x00 00 00
31.98125   0x41 FF D9 9A             0   0x83   0x7F D9 9A
1023.4     0x44 7F D9 9A             0   0x88   0x7F D9 9A
1.frac -- only fractional part is stored
Remember JAMES BOND helped by M (Smith)
The ONE is remembered and not stored
Fast floating pt division possible
Number     Internal representation   s   bexp   frac
1.0        0x3F 80 00 00             0   0x7F   0x00 00 00
32.0       0x42 00 00 00             0   0x84   0x00 00 00
                                         BEXP DIFF = 5
31.98125   0x41 FF D9 9A             0   0x83   0x7F D9 9A
1023.4     0x44 7F D9 9A             0   0x88   0x7F D9 9A
                                         BEXP DIFF = 5
K = K / -1 -- flip the sign bit with an XOR instruction
K = K / N where N = 2^5 -- decrease bexp: bexp = bexp - 5
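Both tricks written in C on the raw bits, assuming IEEE-754 single precision; `fp_negate` and `fp_div_pow2` are hypothetical names. Dividing by 2^p is an integer subtraction of p shifted into the bexp field (p << 23), and negation is an XOR of the sign bit. As written, the division has no range check, so it is only valid while bexp > p.

```c
#include <stdint.h>
#include <string.h>

static uint32_t to_bits(float f)   { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    from_bits(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* K / -1: flip the sign bit with XOR */
float fp_negate(float k)
{
    return from_bits(to_bits(k) ^ 0x80000000u);
}

/* K / 2^p: subtract p from bexp with an integer subtract of p << 23.
   No check is made: only valid while the bexp field is larger than p. */
float fp_div_pow2(float k, unsigned p)
{
    return from_bits(to_bits(k) - ((uint32_t)p << 23));
}
```

`fp_div_pow2(1023.4f, 5)` produces exactly 31.98125f, and `fp_div_pow2(32.0f, 5)` produces 1.0f.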
Fast Floating Point Division by 32 -- Doing it
29K -- FP number K is in gr96
    CONST BEXPchange, 5                  ; setting up the power
    SLL   BEXPchange, BEXPchange, 23     ; setting up the bexp-diff
    SUB   result, K, BEXPchange          ; result = K / 32
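Fast Floating Point Division by FP M when M is known to be 2^p
21K -- FP number K is in F4, M = 32.0 is in F8
    F0 = 1.0
    R0 = R8 - R0        // NOTE integer operation: the bexp-diff
    R4 = R4 - R0        // result = K / 32
Works because F8 = 32.0 (0x42000000)
              F0 = 1.0  (0x3F800000)
and 0x42000000 - 0x3F800000 = 0x02800000 = 5 << 23, i.e. the bexp difference (5) already positioned in the bexp field
PROBLEMS?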
PROBLEMS?
Try to do 0 / 32
Get a large negative number
Number                       s   bexp   frac
0.0                          0   0x00   0x00 00 00
subtract                     0   0x05   0x00 00 00
-2^124 (about -2.127 * 10^37) 1   0xFB   0x00 00 00
If dividing by 2^p -- problems if number is smaller than 2^(p-127)
Must be overcome on many processors
Non-issue on the 21k, which has single-cycle multiplication: calculate the reciprocal and then multiply
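A C sketch of the 0 / 32 failure and one possible guard, assuming IEEE-754 single precision; the names are hypothetical, and flushing too-small values to a signed zero (rather than producing a denormal) is a choice made for this sketch. The guard tests the bexp field directly: if it is not larger than p, the subtraction would borrow into the sign bit. Infinities and NaNs are not handled.

```c
#include <stdint.h>
#include <string.h>

static uint32_t to_bits(float f)   { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    from_bits(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Unguarded trick: 0.0 has bexp = 0, so subtracting 5 << 23 borrows
   into the sign bit and gives 0xFD800000 = -2^124 */
float fp_div_pow2_raw(float k, unsigned p)
{
    return from_bits(to_bits(k) - ((uint32_t)p << 23));
}

/* Guarded: if the bexp field cannot absorb the subtraction (this
   includes 0.0), return a zero with the same sign instead */
float fp_div_pow2_safe(float k, unsigned p)
{
    uint32_t u = to_bits(k);
    uint32_t bexp = (u >> 23) & 0xFFu;
    if (bexp <= p)
        return from_bits(u & 0x80000000u);   /* signed zero */
    return from_bits(u - ((uint32_t)p << 23));
}
```

The extra compare and branch is exactly the overhead that multiplying by a precomputed reciprocal avoids.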
Must guarantee result
68K, 29K, MIPS and 21k problems
68K: ADD.W D0, D1        29K: ADD gr96, gr97, gr98
Every addition (subtraction) result has the possibility of being out of range -- overflow. Must be tested.
68K solution
    ADD.W D0, D1
    BVS   Somewhere      ; branch if the overflow (V) flag is set
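A C analogue of ADD.W followed by BVS: a 16-bit add that also reports signed overflow using the same rule the V flag implements (both operands have the same sign and the result's sign differs). `add16_overflows` is a hypothetical name.

```c
#include <stdint.h>
#include <stdbool.h>

/* 16-bit add modulo 2^16 that also returns the would-be V flag */
bool add16_overflows(int16_t a, int16_t b, int16_t *sum)
{
    uint16_t ua = (uint16_t)a, ub = (uint16_t)b;
    uint16_t ur = (uint16_t)(ua + ub);                  /* wrapping add */
    *sum = (int16_t)(ur > INT16_MAX ? (int32_t)ur - 65536 : (int32_t)ur);
    /* sign bit of (a ^ r) & (b ^ r) is set exactly when signed overflow
       occurred, i.e. when the V flag would be set */
    return (((ua ^ ur) & (ub ^ ur)) & 0x8000u) != 0;
}
```

A caller then plays the role of BVS: `if (add16_overflows(x, y, &s)) { /* handle overflow */ }`.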
Specialized coding techniques, e.g. the 29k has the ability to throw a SWI (software interrupt) as part of a compare (ASSERT)
Test for FP number too small from the previous special division operation
    CMP.L #toosmall, D0      ; 68K code
    BGE   okay
Specialized conditional instructions on 21k
21K -- F4 contains the FP value -- need F4/32
R0 = 5
R0 = ASHIFT R0 BY 23
F1 = minimum value (2^(5-127))
F2 = ABS F4
COMP (F2, F1)
IF GE R4 = R4 - R0 ELSE R4 = R4 - R4
LIES -- ALL LIES
IF GE R4 = R4 - R0 ELSE R4 = R4 - R4
This is not a legal instruction either!!
COMPUTE instructions take 22 bits to describe;
IF cond JUMP/CALL ..., ELSE R4 = R4 - R4 is allowed
Useless approach anyway, since there are better ways on the 21k to do repeated division by a constant.
Comparisons -- 1
FIR/IIR
FFT -- Radix 2 and Radix 4
Requirements for perfect DSP
Fast instruction cycle -- different from high clock speed
Cycle time adjustable according to instruction type
Fast hardware multiplier
Floating point for easier algorithm design
High precision, implying wide data buses for memory, internal processor transfers, registers and on-board processing units
Requirements for perfect DSP
Several data buses available to reduce bus-conflict transfer overhead
Harvard architecture and/or instruction cache to avoid instruction and data-fetch clashes
Duplicate resources for parallel computation of real and imaginary components of complex numbers
Dedicated hardware required for address calculations to avoid APU clash with main algorithm
Requirements for perfect DSP
Extensive temporary registers to reduce unwanted fetches of continually used data
Or single-cycle, highly parallel memory operations
Fast and reliable; easily programmed, developed and upgraded
Inexpensive and easy to develop peripherals
High level of customer support
Inexpensive to purchase
Lower power consumption with a standby mode