02 processor requirements

Upload: vaishali-wagh

Post on 04-Jun-2018


  • 8/13/2019 02 Processor Requirements

    1/48

    2000/03/05 1

    Processor Requirements

    needed to optimize DSP performance

    M. R. Smith,

    Electrical and Computer Engineering, University of Calgary, Alberta, Canada

    smithmr @ ucalgary.ca


    2000/03/05 ENCM515 -- Characteristics needed in DSP processors

    Copyright [email protected] 2 / 48

    To be tackled today

    Characteristics of DSP algorithms

    Specialized handling of

    Multiplication

    Division (21K has no division instruction)

    ENCM515 Reference Material

    How RISCy Is DSP, IEEE Micro (Jan-10)

    Simply Signal Processing (Jan-40)

    Fast Scaling, CCI (Apr-10)

    Saturation Arithmetic (Apr-20)


    FIR

    Multiply/Addition intensive

    Sum operation with high precision -- overflow considerations

    Long simple loop

    Online operation -- infinite amount of data

    Store coefficients on-chip for fast access

    Complex domain arithmetic
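The FIR characteristics listed above (multiply/add intensive, one long simple loop, online operation) can be sketched as follows. This is a minimal illustration only, not code from the course: the names `fir_stream`, `coeffs` and `delay_line` are hypothetical, and Python floats stand in for whatever wide accumulator a real DSP would provide.

```python
from collections import deque

def fir_stream(samples, coeffs):
    """Yield one FIR output per incoming sample (online operation)."""
    taps = len(coeffs)
    # Delay line holds the most recent samples, newest first.
    delay_line = deque([0.0] * taps, maxlen=taps)
    for x in samples:
        delay_line.appendleft(x)   # oldest sample drops off the far end
        acc = 0.0                  # accumulator: on fixed-point parts, overflow is the concern here
        for c, d in zip(coeffs, delay_line):
            acc += c * d           # the multiply/add that dominates the loop
        yield acc

# Impulse in, coefficients out: a quick sanity check of the structure.
out = list(fir_stream([1.0, 0.0, 0.0, 0.0], [0.5, 0.25, 0.125]))
```

Because the generator consumes samples one at a time, it never needs the whole input in memory, which mirrors the "infinite amount of data" point.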


    IIR-1

    Interrelated and order-dependent multiplications and additions

    Small number of delays via register moves?

    Short loop -- low number of instructions in loop, which makes it difficult to optimize

    Precision -- very important because of feedback

    Multiple stages -- i.e. IIR follows IIR, etc.


    IIR-2 LDI

    Short, complicated loop

    Many intermediate values

    Pipeline issues because of interdependence


    FFT

    Complex variables (A and B) and fixed coefficients (W)

    Address calculations complex

    Memory accesses numerous

    Multiplications and additions

    Need for fast access to many registers, address pointers, constants, variables
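To make the register pressure concrete, here is a hedged sketch of a single radix-2 butterfly on complex values A and B with coefficient W, as named on the slide. The function name `butterfly` is an assumption for illustration; the arithmetic is the standard decimation-in-time form.

```python
def butterfly(a, b, w):
    """Radix-2 butterfly: returns (a + w*b, a - w*b) for complex a, b, w."""
    t = w * b            # one complex multiply = 4 real multiplies + 2 real adds
    return a + t, a - t  # two complex adds = 4 more real adds

# 2-point DFT, where the only coefficient is W = 1.
w = complex(1, 0)
top, bottom = butterfly(1 + 2j, 3 - 1j, w)
```

Even this smallest case touches six real values on input and four on output, which is why fast access to many registers matters.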


    Fast instruction cycle -- needed

    DSP chips -- two-cycle instructions (on top of FETCH/DECODE) during which the processor performs many parallel operations

    More recent technology -- 1 clock cycle

    Many processors take 6 to 32 cycles to handle MULT, FMULT, FDIV or even FADD

    Make processor highly pipelined -- pipeline must be started and then kept full

    FIR (easy to pipeline)

    IIR (hard to pipeline)

    FFT (challenging to pipeline)


    Loop Overhead -- must be minimized

    Use specialized hardware

    specialized decrement and branch instructions occurring in a single cycle

    instruction cached with counter

    superscalar operations

    delayed branches

    hardware loop control

    Use specialized software techniques

    loop unrolling

    down counting loops
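The two software techniques can be sketched together. This is an illustration under assumptions: the function names are hypothetical, and Python will not show the speed benefit a compiled or assembly loop would; the point is only the shape of the transformed loop.

```python
def dot_simple(a, b):
    """Reference loop: one multiply/add and one loop test per element."""
    total = 0
    for i in range(len(a)):
        total += a[i] * b[i]
    return total

def dot_unrolled_downcount(a, b):
    """Same sum, unrolled by 4, with counters that run down to zero."""
    total = 0
    i = 0
    count = len(a) // 4            # number of full groups of four
    while count != 0:              # compare against zero is often free after a decrement
        total += (a[i] * b[i] + a[i + 1] * b[i + 1]
                  + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3])
        i += 4
        count -= 1
    remaining = len(a) % 4         # leftover elements when length is not a multiple of 4
    while remaining != 0:
        total += a[i] * b[i]
        i += 1
        remaining -= 1
    return total
```

Unrolling cuts the number of loop tests by roughly a factor of four; counting down lets the decrement itself set the condition that the branch tests.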


    Memory operations -- Many of them

    Data/instruction and data/data conflicts

    Data caches

    Will also have external data memory banks

    Harvard architecture

    branch target caches

    multi-ported memory

    register pre-forwarding -- avoid stalls while trying to write back the result of an ALU operation only to re-access the same register

    large register banks -- avoid memory ops associated with just-calculated values


    Precision -- high but without speed loss

    FIR -- accumulated value can grow big

    IIR -- recursive use of a value

    External Memory bus width

    Internal Memory bus width

    Data width of registers and ALU

    Saturation arithmetic
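As a hedged illustration of saturation arithmetic, the sketch below contrasts a 16-bit two's-complement add that wraps around with one that clamps to the largest representable value. The 16-bit width and the names `wrap16` and `sat_add16` are assumptions chosen for the example.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def wrap16(x):
    """Two's-complement wraparound to 16 bits (what a plain integer add leaves behind)."""
    return ((x + 0x8000) & 0xFFFF) - 0x8000

def sat_add16(a, b):
    """Saturating add: clamp the true sum into the 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, a + b))

wrapped = wrap16(30000 + 10000)      # sign flips: a large positive sum becomes negative
saturated = sat_add16(30000, 10000)  # stays pinned at the positive limit
```

For an accumulating FIR sum, the saturated value is wrong but close; the wrapped value has the wrong sign.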


    Complex arithmetic -- frequency domain operations

    Need to fetch real and imaginary parts at different times during the algorithm

    Need fast access to adjacent memory locations -- burst memory

    Need for many internal registers to temporarily store real/imaginary components (FFT butterfly and last year's exams)

    Duplication of resources -- was custom, but consider now 21160


    [Figure: TigerSHARC ADSP-21160 Core Architecture. Block diagram showing DAG 1 and DAG 2 (8 x 4 x 32 each), cache memory (32 x 48), program sequencer, PMA and DMA address buses (32), PMD and DMD data buses (64), bus connect, JTAG test & emulation, flags, timer, and two identical processing elements, each with a floating & fixed-point multiplier, fixed-point accumulator, 16 x 40 register file, 32-bit barrel shifter, and floating-point & fixed-point ALU.]


    Address calculations -- frequent

    Complex addressing modes -- take many clock cycles

    Use pointers and autoincrement rather than calculating pointer + offset

    need many address-related registers

    address calculations compete with ALU calculations

    group instructions within program

    e.g. read and store often use the same or similar addresses, so don't recalculate the addresses.


    Specialized addressing modes

    standard memory access

    premodify

    postmodify

    circular buffers (modulo arithmetic on the address registers)

    bit-reverse addressing

    structure handling -- auto-increment with size accounted for
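Two of these modes are easy to show in software, which also hints at why doing them in hardware saves cycles. The sketch below is illustrative only: `circular_next` and `bit_reverse` are hypothetical helper names, and the base/length values are made up.

```python
def circular_next(pointer, modify, base, length):
    """Post-modify a pointer and wrap it inside [base, base + length)."""
    return base + (pointer - base + modify) % length

def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index (the access order an FFT needs)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

# Access order for an 8-point FFT (3 address bits).
order = [bit_reverse(i, 3) for i in range(8)]
```

In software each access costs a modulo or a bit loop; a DSP address generator does the same update alongside the data access.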


    Key issue -- ease of development

    Microcontrollers -- onboard peripherals

    Host communication

    Multiprocessor communications

    Simulators

    Multi-processor operations

    Application notes

    Good working environment

    Compatibility with previous processor versions -- legacy code (an advantage and a disadvantage)


    Multiplication Extensive algorithms

    Off-chip multipliers have big bottlenecks

    Get and then give instruction to multiplier

    Get and then give first, second data to multiplier

    Wait till cooked, and then get value

    Newer chips have on-board multiplication or intelligent co-processors (F-LINE exceptions)

    Many chips do multiplication using specialized techniques introduced by optimizing compiler


    Smart Multiplication through optimizing compiler techniques

    29K RISC FMULT execution takes 6 cycles + fetch

    16-bit x 16-bit INTEGER multiplication on 68K CISC takes 70 cycles regardless of operations

    Use adds and shifts instead, since these take less time -- easy with integers, but floats?

    What are the equivalent operations on 21K? Discussed in early lecture on Quirks and SHARCs


    Smart Integer 68k Multiplication

    Multiplication by 2, 4, 8, 16

    Achieved by shifting 1, 2, 3 or 4 times (done in 6 + 2n cycles on 68K)

    D2 = D0 * 19

    MOVE.W D0, D2
    ASL.W #4, D2     ; D2 = D0 * 16
    ADD.W D0, D2     ; D2 = D0 * 17
    ASL.W #1, D0     ; D0 = D0 * 2
    ADD.W D0, D2     ; D2 = original D0 * 19

    (29 cycles compared to 70)

    Watch out for overflow, may need conversion to 32 bits (SSI, SSF on some processors -- not only 21k)

    Waste of time if have single cycle multipliers (21k?). Careful because multiplication results may end in special register.
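The same shift-and-add sequence can be traced step by step in a short sketch. The function names are hypothetical; each line mirrors one 68K instruction, and the range check stands in for the overflow warning above, assuming a 16-bit signed result.

```python
def times19_shift_add(d0):
    """Multiply by 19 as 16*x + x + 2*x, using only shifts and adds."""
    d2 = d0          # MOVE.W D0, D2
    d2 = d2 << 4     # ASL.W #4, D2   -> D0 * 16
    d2 = d2 + d0     # ADD.W D0, D2   -> D0 * 17
    d0 = d0 << 1     # ASL.W #1, D0   -> D0 * 2 (D0 is overwritten)
    d2 = d2 + d0     # ADD.W D0, D2   -> original D0 * 19
    return d2

def fits_int16(x):
    """True when x is representable as a signed 16-bit word."""
    return -32768 <= x <= 32767
```

For example, an input of 2000 gives 38000, which no longer fits in a signed 16-bit word, so the result would need 32 bits.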


    Multiplication Extensive algorithms

    Highly pipelined, therefore complex instruction interdependence

    R0 = R1 * R2        BUT        R0 = R1 * R2
    R3 = R4 * R5                   R3 = R0 * R5    (needs R0 from the previous multiply)


    Typically need Normalization of result

    N point DFT Result = DFT (Input) ; 0


    Smart Integer Division

    Division by 2, 4, 8, 16

    unsigned          signed

    LSR #1, D0        ASR #1, D0

    Need to propagate (or not propagate) the sign bit

    Unsigned original = 0x80 (128) final = 0x40 (64)

    Signed original = 0x80 (-128) final = 0xC0 (-64)
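The two byte-sized examples above can be reproduced with a small sketch of logical and arithmetic right shifts. The helper names `lsr8` and `asr8` are assumptions, and values are treated as 8-bit patterns to match the 0x80 example.

```python
def lsr8(pattern, n):
    """Logical shift right on an 8-bit pattern: zeros enter from the left."""
    return (pattern & 0xFF) >> n

def asr8(pattern, n):
    """Arithmetic shift right on an 8-bit pattern: the sign bit is propagated."""
    pattern &= 0xFF
    signed = pattern - 0x100 if pattern & 0x80 else pattern
    return (signed >> n) & 0xFF

unsigned_half = lsr8(0x80, 1)   # 128 / 2
signed_half = asr8(0x80, 1)     # -128 / 2, shown as a bit pattern
```

Using the wrong shift for the data type gives a result with the wrong sign, not just a rounding error.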


    Floating Point Division

    The FDIV on 29K takes 15 cycles

    There is no FDIV on the 21K -- use recursion!!


    Why is floating point so difficult?

    Number        Internal representation

    1.0           0x3F 80 00 00
    32.0          0x42 00 00 00
    31.98125      0x41 FF D9 9A
    1023.4        0x44 7F D9 9A

    31.98125 = 1023.4 / 32 = 1023.4 / 2^5


    Why is floating point so difficult?

    Fast Scaling Routine for Floating-Point RISC and DSP Processors (Apr-10)

    Floating Point Format

    bit 31    bits 30-23    bits 22-0
    S         bexp          frac


    Floating point number K = (-1)^s x 1.frac x 2^(bexp - 127)

    1.0 = 0x1.0 x 2^0 = (-1)^0 x 0x1.0000 x 2^(127 - 127)


    Floating point number K = (-1)^s x 1.frac x 2^(bexp - 127)

    10.0 = 0xA.0 = %1010.0 = %1.0100 x 2^3 (0x1.4 x 2^3)

    10.0 = (-1)^0 x 0x1.4000 x 2^(130 - 127)


    IEEE Std. 754, 1985

    Number      Internal representation   s   bexp   frac

    1.0         0x3F 80 00 00             0   0x7F   0x00 00 00
    32.0        0x42 00 00 00             0   0x84   0x00 00 00
    31.98125    0x41 FF D9 9A             0   0x83   0x7F D9 9A
    1023.4      0x44 7F D9 9A             0   0x88   0x7F D9 9A

    1.frac -- only fractional part is stored

    Remember JAMES BOND helped by M (Smith)

    The ONE is remembered and not stored
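The s, bexp and frac columns can be checked with a few lines of code. This is a sketch using Python's standard `struct` module to obtain the 32-bit single-precision pattern; `fields` and `rebuild` are hypothetical helper names, and `rebuild` only handles normalized numbers.

```python
import struct

def fields(value):
    """Split a value's IEEE 754 single-precision pattern into (s, bexp, frac)."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def rebuild(s, bexp, frac):
    """(-1)^s x 1.frac x 2^(bexp - 127); the leading 1 is implied, not stored."""
    return (-1) ** s * (1 + frac / 2 ** 23) * 2.0 ** (bexp - 127)

s, bexp, frac = fields(1023.4)
```

Note that 31.98125 and 1023.4 share the same frac field and differ only in bexp, which is what makes the fast scaling trick possible.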


    Fast floating pt division possible

    Number      Internal representation   s   bexp   frac

    1.0         0x3F 80 00 00             0   0x7F   0x00 00 00
    32.0        0x42 00 00 00             0   0x84   0x00 00 00

    BEXP DIFF = 5

    31.98125    0x41 FF D9 9A             0   0x83   0x7F D9 9A
    1023.4      0x44 7F D9 9A             0   0x88   0x7F D9 9A

    BEXP DIFF = 5

    K = K / -1 -- flip the sign bit with XOR instruction

    K = K / N where N = 2^p -- decrease bexp: bexp = bexp - p (p = 5 for N = 32)


    Fast Floating Point Division by 32 -- Doing it

    29K -- FP# K is in gr96

    ; Setting up the power
    CONST BEXPchange, 5

    ; Setting up the bexp-diff
    SLL BEXPchange, BEXPchange, 23

    ; result = K / 32
    SUB result, K, BEXPchange
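The three instructions above translate directly into integer operations on the float's bit pattern. The following is a sketch, not the course's code: the helper names are hypothetical and Python's `struct` module supplies the bit pattern that the 29K would simply have in a register.

```python
import struct

def float_to_bits(x):
    """32-bit IEEE 754 single-precision pattern of x, as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]

def fast_div_pow2(k, p):
    """Divide k by 2^p by subtracting p from the bexp field (no range checks)."""
    bexp_change = p << 23                              # CONST + SLL
    return bits_to_float(float_to_bits(k) - bexp_change)  # SUB (integer)

scaled_bits = float_to_bits(1023.4) - (5 << 23)
```

There is deliberately no guard here, so the sketch inherits the same weakness as the assembly when the input is zero or very small.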


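    Fast Floating Point Division by FP M when M is known to be 2^p

    F0 = 1.0

    R0 = R8 - R0 // NOTE integer operation; 0x42000000 - 0x3F800000 = 0x02800000, so p = 5 already sits in the bexp field

    Setting up the bexp-diff

    R0 = ASHIFT R0 BY 23 // needed only when R0 holds the integer p (e.g. R0 = 5), not the bit-pattern difference above

    result = K / 32

    R4 = R4 - R0

    Works because

    F8 = 32.0 (0x42000000)

    F0 = 1.0 (0x3F800000)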

    PROBLEMS?


    PROBLEMS?

    Try to do 0 / 32

    Get a large negative number

    Number            s   bexp   frac

    0.0               0   0x00   0x00 00 00
    subtract          0   0x05   0x00 00 00
    -2.126 * 10^37    1   0xFB   0x00 00 00

    If dividing by 2^p -- problems if the number's magnitude is smaller than 2^(p-127)

    Must be overcome on many processors

    Non-issue on 21k, which has single-cycle multiplication: calculate the reciprocal and then multiply
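The 0 / 32 failure can be reproduced by doing the subtraction on the bit pattern of 0.0 by hand. This is a minimal sketch; `bits_to_float` is a hypothetical helper built on Python's standard `struct` module.

```python
import struct

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]

zero_bits = 0x00000000
# Integer subtract borrows through the sign bit and wraps around in 32 bits.
result_bits = (zero_bits - (5 << 23)) & 0xFFFFFFFF
sign = result_bits >> 31
bexp = (result_bits >> 23) & 0xFF
result = bits_to_float(result_bits)
```

The borrow out of the bexp field sets the sign bit, so zero "divided" by 32 comes out as -2^124, about -2.126 * 10^37.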


    Must guarantee result

    68K, 29K, MIPS and 21k problems

    ADD.W R0, R1        ADD gr96, gr97, gr98

    Every addition (subtraction) result has the possibility of being out of range -- overflow. Must be tested.

    68K solution

    ADD.W R0, R1
    BVS Somewhere


    Specialized coding techniques, e.g. 29k has the ability of throwing SWI as part of compare (ASSERT)

    Test for FP number too small from previous special division operation

    CMP.L #toosmall, D0    ; 68K code
    BGE okay


    Specialized conditional instructions on 21k

    21K -- F4 contains the FP value -- need F4/32

    R0 = 5

    R0 = ASHIFT R0 BY 23

    F1 = minimum value ( 2^(5-127) )

    F2 = ABS F4

    COMP (F2, F1)

    IF GE R4 = R4 - R0 ELSE R4 = R4 - R4
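The intent of the compare-then-select sequence can be sketched in a higher-level form. This shows only the logic, under assumptions: it says nothing about which 21k instruction encodings are legal, the helper names are hypothetical, and the threshold 2^(p-127) is taken from the slide.

```python
import struct

def float_to_bits(x):
    """32-bit IEEE 754 single-precision pattern of x, as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]

def guarded_div_pow2(k, p):
    """Divide by 2^p via the bexp field, returning 0.0 when k is too small."""
    minimum = 2.0 ** (p - 127)       # F1 = minimum value
    if abs(k) >= minimum:            # F2 = ABS F4; COMP (F2, F1); IF GE ...
        return bits_to_float(float_to_bits(k) - (p << 23))   # R4 = R4 - R0
    return 0.0                       # ELSE R4 = R4 - R4
```

With the guard in place, a zero input stays zero instead of turning into a large negative number.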


    LIES -- ALL LIES

    IF GE R4 = R4 - R0 ELSE R4 = R4 - R4

    This is not a legal instruction either!!

    COMPUTE instructions take 22 bits to describe

    IF JUMP/CALL ELSE R4 = R4 - R4 is allowed

    Useless approach anyway since there are better ways on 21k to do repeated division by a constant.


    Comparisons -- 1


    FIR/IIR


    FFT -- Radix 2 and Radix 4


    Requirements for perfect DSP

    Fast instruction cycle -- different from high clock speed

    Cycle time adjustable according to instruction type

    Fast hardware multiplier

    Floating point for easier algorithm design

    High precision, implying wide data buses for memory, internal processor transfers, registers and on-board processing units


    Several data buses available to reduce bus conflict transfer overhead

    Harvard architecture and/or instruction cache to avoid instruction and data-fetch clashes

    Duplicate resources for parallel computation of real and imaginary components of complex numbers

    Dedicated hardware required for address calculations to avoid APU clash with main algorithm


    Extensive temporary registers to reduce unwanted fetches of continually used data

    Or single cycle, highly parallel, memory operations

    Fast and reliable, easily programmed, developed and upgraded

    Inexpensive and easy to develop peripherals

    High level of customer support

    Inexpensive to purchase

    Lower power consumption with a standby mode


    Tackled today

    Characteristics of DSP algorithms

    Specialized handling of

    Multiplication

    Division (21K has no division instruction)

    ENCM515 Reference Material

    How RISCy Is DSP, IEEE Micro (Jan-10)

    Simply Signal Processing (Jan-40)

    Fast Scaling, CCI (Apr-10)

    Saturation Arithmetic (Apr-20)