dsp alg arch mm1 slides

Upload: srinivasa-raonookala

Post on 03-Apr-2018


  • 7/29/2019 DSP Alg Arch Mm1 Slides

    1/29

    1MM1 DSP Algorithms and Architectures

    DSP Algorithms and Architectures, Minimodule 1

    Anders Brødløs Olsen

    [email protected]


    P2-18 DSP Algorithms and Architectures

    Purpose

    The purpose of the course is to help the student understand the concepts needed to map a DSP algorithm onto a real-time architecture, with good interaction between the two.

    Objectives

    After the course the student should demonstrate:

    Comprehension of basic and advanced concepts in algorithmic and architectural interaction

    Application of methods for designing and optimizing data and control paths for DSP algorithms


    The course content

    Two-part course

    1. (ABO) The architectural aspects

    2. (PK) The more algorithmic aspects

    Contents

    Design concepts for DSP Systems from specs to prototype

    Cost functions: Area, Time, Power, Numeric

    General DSP Architectures

    Algorithmic representation: SFG, DFG, SDF, Precedence Graphs

    Critical path, critical loop

    Timing in algorithms: retiming, unfolding, pipelining

    Look-ahead transformations

    Allocation, Assignment, and Scheduling

    Finite State Machine with Datapath

    Methods for data-path and control-path optimization

    Memory management in real-time DSP systems


    Course practicalities

    Literature:

    Gajski book

    Additional reading will be provided in the form of papers

    Course web: http://kom.aau.dk/~abo/Teaching/DSP_alg_arch/index.htm


    Topics of today

    From functionality to silicon

    An introduction

    Motivation for application-specific architecture

    Cost functions (Noise, Power, Area, Time)

    Representing a design

    Abstraction levels


    The First Computer

    The Babbage Difference Engine (1832)

    25,000 parts

    cost: £17,470


    ENIAC - The first electronic computer (1946)


    Intel 4004 Micro-Processor

    1971

    ~2000 transistors


    Today's processors

    2008

    > 300M transistors

    > 3000 MHz operation

    ~150 mm²


    The A3 Paradigm

    Application: LP-filter (specification)

    Algorithm: FIR, IIR (parallel), IIR (cascade)

    Architecture: DSP, µ-Controller, ASIC/FPGA

    Design dedicated architectures that fit our algorithmic demands.

    CAD tools typically help us, but we need to know why and how.


    The A3 Paradigm

    Application: LP-filter (specification)

    Algorithm: FIR, IIR (parallel), IIR (cascade). Attributes: numerical properties

    Architecture: DSP, µ-Controller, ASIC. Attributes: size, execution time

    1:many mapping

    This course

    Specifications


    Design representation


    Design Abstraction Levels

    [Figure: abstraction levels SYSTEM, MODULE, GATE, CIRCUIT, DEVICE; the device level shows a transistor with S, G, D terminals and n+ regions]


    The design process

    Top-down design strategies

    Refine the specification successively

    Decompose each component into smaller components

    Lowest-level primitive components

    Over-sold methodology: only works with plenty of experience

    Bottom-up design strategies

    Build up from primitive components

    Combine them to form more complex components

    Risk: wrong interpretation of specifications

    Mixed strategies

    Mostly top-down, but also bits of bottom-up

    Reality: need to know both top-level and bottom-level constraints


    Typical signal processing algorithms

    [Figure: X(n) → H → Y(n); analog input → sampler (quantizer) → digital signal processor → analog reconstructor, with example bit streams 1001101000101001 and 1010101011101011]

    Typical filter operations: IIR, FIR

    Additions and products (control)

    REAL-TIME

    Vector, Matrix: y = Mx
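The filter operations named above reduce to additions and products. A minimal sketch, assuming the standard direct-form difference equations (the function names, coefficient conventions, and test signal are illustrative and not taken from the slides):

```python
def fir(x, b):
    """FIR filter: y[n] = sum_k b[k] * x[n-k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]      # one product and one addition per tap
        y.append(acc)
    return y


def iir(x, b, a):
    """IIR filter: y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k].

    Assumes a[0] == 1 (normalized feedback coefficients).
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]    # feedback from earlier outputs
        y.append(acc)
    return y


impulse = [1.0, 0.0, 0.0, 0.0]
fir_out = fir(impulse, [0.5, 0.5])          # two-tap moving average
iir_out = iir(impulse, [1.0], [1.0, -0.5])  # single pole at 0.5
```

Both loops consist only of multiply and accumulate steps, which is why the architectures on the following slides centre on the multiplier and the buses that feed it.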


    General architectures

    µ-controllers

    General Purpose Processors (GPP)

    Application Specific Instruction-set Processor (ASIP)

    Digital Signal Processors (DSP)

    Application Specific Integrated Circuit (ASIC)

    Field Programmable Gate Array (FPGA)


    µ-controllers and GPPs

    Known as a Von Neumann architecture

    Product calculations on an ALU!

    Shared instruction and data bus

    [Figure: Control, Mem, and ALU blocks connected by one shared bus]

    Operation cycle:

    C1: Instruction fetch

    C2: Data 1 fetch

    C3: Data 2 fetch

    C4: Operation execution

    C5: Output data storage

    Computational capacity

    Bus capacity


    8-bit PIC controller


    µ-controllers and GPPs

    Introducing a multiplier in the architecture

    Precision

    M=N: Single precision

    M=2N: Double precision

    M>2N: Overflow precision

    [Figure: Control, Mem, ALU, and MUL blocks]

    Computational capacity

    Bus capacity

    (Still using the same bus for instruction and data)
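The precision cases can be checked with a little integer arithmetic. The sketch below assumes that N is the operand word length and M the width of the multiplier result or accumulator (the slide does not define them), and uses a hypothetical 16-bit word and 256-term sum:

```python
N = 16  # assumed operand word length in bits (two's complement)


def bits_needed(value):
    """Smallest two's-complement width that can hold the signed integer `value`."""
    if value >= 0:
        return value.bit_length() + 1
    return (-value - 1).bit_length() + 1


# Worst-case product of two N-bit operands: (-2^(N-1)) * (-2^(N-1)) = 2^(2N-2)
most_negative = -(1 << (N - 1))
largest_product = most_negative * most_negative
product_bits = bits_needed(largest_product)      # full product needs 2N bits

# Accumulating many products needs extra guard bits beyond 2N
TERMS = 256                                      # hypothetical number of accumulated products
worst_case_sum = TERMS * largest_product
accumulator_bits = bits_needed(worst_case_sum)
guard_bits = accumulator_bits - 2 * N
```

Under these assumptions a single product fits only in M = 2N bits, and a 256-term sum of products needs 8 further guard bits, which is one reading of the M > 2N case.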


    ARM7


    Digital Signal Processors [1]

    Harvard architecture

    Individual data and instruction busses

    Simultaneous fetch of instruction and data

    Micro parallelism (architectural)

    [Figure: control path (Control, program Mem) and data path (data Mem, ALU, MUL)]

    Operation cycle:

    C1: Instruction fetch || Data 1

    C2: Data 2

    C3: Execution || inst fetch

    C4: Output data storage

    Computational capacity

    Bus capacity

    (Two operands!)


    TMS32010


    Digital Signal Processors [2]

    Modified Harvard architecture

    Duplicated data busses

    Multiple data memory banks

    [Figure: Control with program Mem, two data memories (Data 1, Data 2), ALU, and MUL]

    Operation cycle:

    C1: Instruction fetch || Data 1 || Data 2

    C2: Execution || inst fetch

    C3: Output data storage

    Computational capacity

    Bus capacity
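The three operation cycles listed for the Von Neumann, Harvard, and modified Harvard architectures take 5, 4, and 3 cycles per two-operand operation. A rough sketch of what that means for a block of operations, assuming no further overlap between consecutive operations (a simplification) and a hypothetical 64-tap workload:

```python
# Cycles per two-operand operation, read off the operation-cycle lists on the slides
CYCLES_PER_OPERATION = {
    "von_neumann": 5,       # C1..C5, everything over one shared bus
    "harvard": 4,           # instruction and data 1 fetched together
    "modified_harvard": 3,  # instruction, data 1, and data 2 fetched together
}


def total_cycles(architecture, operations):
    """Cycle count for `operations` back-to-back operations (no extra pipelining assumed)."""
    return CYCLES_PER_OPERATION[architecture] * operations


taps = 64  # hypothetical workload: one multiply per filter tap
report = {name: total_cycles(name, taps) for name in CYCLES_PER_OPERATION}
```

The computation is identical in all three cases; only the bus capacity changes, which is the point the slides make with the "Bus capacity" label.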


    Blackfin architecture


    Digital Signal Processors [3]

    Utilizing algorithmic and architectural properties

    Using the address arithmetic unit, the core of the above algorithm becomes a single line of parallel instructions:

    ...

    A0 += data1*data2 || A1 += data3*data4;

    ...
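The parallel instruction above keeps two accumulators busy at once. A minimal sketch of the same idea for one FIR output sample, assuming two MAC units named after the A0 and A1 accumulators; the function names and the step counting are illustrative only:

```python
def fir_sample_single_mac(coeffs, samples):
    """One multiply-accumulate per step: len(coeffs) steps."""
    acc = 0
    for c, s in zip(coeffs, samples):
        acc += c * s
    return acc, len(coeffs)


def fir_sample_dual_mac(coeffs, samples):
    """Two multiply-accumulates per step, mirroring A0 += ... || A1 += ..."""
    a0 = 0
    a1 = 0
    steps = 0
    for i in range(0, len(coeffs) - 1, 2):
        a0 += coeffs[i] * samples[i]          # MAC unit 0
        a1 += coeffs[i + 1] * samples[i + 1]  # MAC unit 1, same step
        steps += 1
    if len(coeffs) % 2:                       # odd tap count: one MAC idles in the last step
        a0 += coeffs[-1] * samples[-1]
        steps += 1
    return a0 + a1, steps


coeffs = [1, 2, 3, 4]
samples = [5, 6, 7, 8]
single_result, single_steps = fir_sample_single_mac(coeffs, samples)
dual_result, dual_steps = fir_sample_dual_mac(coeffs, samples)
```

The two partial sums are independent, so the result is unchanged while the step count halves. That independence is the algorithmic parallelism the hardware exploits.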


    Dual-core DSP processors


    Digital Signal Processors [4]

    Question: Is it always possible to utilize two (or more) MACs?

    Condition: As long as the inherent algorithmic parallelism is not fully utilized, additional hardware may provide a performance optimization!


    ASIC and FPGAs [1]

    ASIC

    Customized for a particular use, in silicon

    Specific combination of functional units, routed by busses

    FPGA

    Customized for a particular use, using programmable logic components and programmable interconnecting busses

    From an algorithmic point of view, the design methodologies are more or less similar for the two


    ASIC and FPGA [2]

    Mapping of an algorithm onto a custom-designed HW architecture!

    Example alg.

    1:1 mapping (fully utilizing parallelism)

    Cost: T ↓, A ↑

    Multiplexed (HW-sharing)

    Cost: T ↑, A ↓ (+ Control)
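The trade-off between the 1:1 mapping and the multiplexed mapping can be tabulated with a toy resource model. This is a sketch only: it assumes a FIR-style sum of products, counts one "step" per pass through the multipliers, and ignores adder-chain delay. The function names and returned fields are hypothetical.

```python
def direct_mapping(taps):
    """1:1 mapping: one multiplier per tap, all products formed in the same step."""
    return {
        "multipliers": taps,
        "adders": taps - 1,
        "steps": 1,
        "needs_control": False,
    }


def multiplexed_mapping(taps, macs=1):
    """HW sharing: `macs` multiply-accumulate units are reused over several steps."""
    steps = -(-taps // macs)  # ceiling division
    return {
        "multipliers": macs,
        "adders": macs,
        "steps": steps,
        "needs_control": True,  # a controller must sequence operands into the shared units
    }


four_tap_direct = direct_mapping(4)
four_tap_shared = multiplexed_mapping(4, macs=1)
```

For four taps the direct mapping spends four multipliers to finish in one step, while the shared version uses one multiplier over four steps plus control logic: area traded against time.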


    Algorithmic parallelism [1]

    X[n] → H → Y[n]

    X[n] → Ha → Hb → Hc → Hd → Y[n]

    Time of operation, throughput:

    T1

    T2 = Ta+Tb+Tc+Td

    The operation time of a given transfer function is obviously

    dependent on the algorithmic complexity, but also on the

    implementation technology used.


    Algorithmic parallelism [2]

    Factorization (cascade with a latch between stages): X[n] → Ha → Hb → Hc → Hd → Y[n-3]

    The latency is increased

    Can be parallelized

    Partial Fraction Expansion (Ha, Hb, Hc, Hd in parallel): X[n] → Y[n]

    The latency is not increased

    Can be parallelized

    Algorithmic manipulation is a very important tool when optimizing architecture designs
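The equivalence of the cascade (factorized) form and the parallel (partial fraction) form can be checked numerically. The sketch below assumes a hypothetical two-pole transfer function H(z) = 1 / ((1 - a z^-1)(1 - b z^-1)) with a = 0.5 and b = 0.25; none of these values come from the slides.

```python
def first_order(x, pole, gain=1.0):
    """First-order section gain / (1 - pole z^-1): y[n] = gain*x[n] + pole*y[n-1]."""
    y, prev = [], 0.0
    for sample in x:
        prev = gain * sample + pole * prev
        y.append(prev)
    return y


a, b = 0.5, 0.25
x = [1.0] + [0.0] * 7  # unit impulse

# Factorization: the two sections run one after the other
cascade = first_order(first_order(x, a), b)

# Partial fraction expansion: H(z) = A / (1 - a z^-1) + B / (1 - b z^-1)
A = a / (a - b)
B = -b / (a - b)
parallel = [u + v for u, v in zip(first_order(x, a, A), first_order(x, b, B))]

max_error = max(abs(p - q) for p, q in zip(cascade, parallel))
```

Both structures realize the same transfer function. The branches of the parallel form have no data dependency on each other, so they can run on separate hardware without added latency.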


    Representation methods of alg. [1]

    Block diagram

    Consists of functional blocks connected with directed

    edges, which represent data flow from its input block

    to its output block


    Representation methods of alg. [2]

    Signal-Flow Graph

    Nodes: represent computations or tasks; a node sums all incoming signals

    Edges: denote a linear transformation from the input to the output


    Graphical Representations

    Data Flow Graphs (DFG)

    Control Flow Graphs (CFG)

    Control Data Flow Graphs (CDFG)

    State Transition Graphs (STG)

    nodes (or vertices)

    edges (or arcs)


    Data Flow Graph

    Nodes: represent computations (or functions)

    Edges: represent data paths (or communications)

    Models data dependencies: a node can perform its operation whenever data is present

    Data flow forms a directed acyclic graph (DAG):

    x1=a+b

    y=a*c

    z=x1+d

    x2=y-d

    x3=x2+c
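The five statements above can be written down as a data flow graph and levelled so that each node fires as soon as its operands are present. A minimal sketch, assuming every operation takes one time step; the dictionary layout and the helper name `asap_levels` are illustrative, not from the slides:

```python
# Each node lists the two values it depends on (the incoming edges of the DFG)
operations = {
    "x1": ("a", "b"),   # x1 = a + b
    "y":  ("a", "c"),   # y  = a * c
    "z":  ("x1", "d"),  # z  = x1 + d
    "x2": ("y", "d"),   # x2 = y - d
    "x3": ("x2", "c"),  # x3 = x2 + c
}
inputs = {"a", "b", "c", "d"}


def asap_levels(operations, inputs):
    """Earliest step at which each node can fire, one step per operation."""
    levels = {name: 0 for name in inputs}  # primary inputs are available at step 0
    pending = dict(operations)
    while pending:
        ready = [n for n, deps in pending.items() if all(d in levels for d in deps)]
        if not ready:
            raise ValueError("graph has a cycle or an undefined operand")
        for node in ready:
            levels[node] = 1 + max(levels[d] for d in pending[node])
            del pending[node]
    return {n: levels[n] for n in operations}


schedule = asap_levels(operations, inputs)
critical_path_length = max(schedule.values())
```

Nodes on the same level (x1 and y, then z and x2) have no mutual dependency and could execute in parallel. The longest chain, y → x2 → x3, sets the minimum number of steps.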


    CDFG, DFG, CFG


    Cost Functions


    Cost Functions

    Implementation quality is determined by cost functions: noise, power, area, time

    a_i depends on the importance of the associated cost parameter

    Noise: wordlength

    Power: technology

    Area: circuit

    Time: the three above

    Interaction
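A minimal sketch of how such weights might be used, assuming the total cost is a weighted sum of the four parameters with weights a_i. The weighted-sum form, the normalized cost values, the weights, and the design names below are all assumptions made for illustration:

```python
def total_cost(costs, weights):
    """Weighted sum of cost parameters: sum_i a_i * cost_i (assumed form)."""
    return sum(weights[name] * costs[name] for name in costs)


# Hypothetical weights a_i: execution time matters most in this example
weights = {"noise": 1.0, "power": 2.0, "area": 1.0, "time": 4.0}

# Hypothetical normalized costs (0 = best, 1 = worst) for two candidate designs
candidates = {
    "parallel_datapath": {"noise": 0.2, "power": 0.5, "area": 0.8, "time": 0.1},
    "shared_datapath":   {"noise": 0.2, "power": 0.3, "area": 0.3, "time": 0.6},
}

scores = {name: total_cost(costs, weights) for name, costs in candidates.items()}
best = min(scores, key=scores.get)
```

Changing the weights changes the winner. With area weighted heavily instead of time, the shared datapath would score better, which is the sense in which a_i encodes the importance of each parameter.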


    Minimizing the cost function

    Choice of alg. / alg. Manipulation / wordlength

    Extraction and utilization of inherent parallelism

    Number and types of execution units

    Scheduling

    Application

    Algorithm

    Architecture


    Sources of Power Consumption

    Dynamic

    Short circuit

    Leakage

    [Figure: inverter circuits and waveforms for the three sources, labelled with Vin, Vout, Vdd, I, Ioff (Vin=0, Vout=Vdd), Ids versus Vgs with Vth, and i(t), v(t) between t0 and t1]


    Controlling Energy Consumption

    Largest contributing component to CMOS power consumption is switching power:

    $P_{avg} = n_{avg} \, f \, c_{avg} \, V_{dd}^2$

    What control do you have over each factor?

    How does each affect the total energy? (Think about f.)

    What control do you have as a designer?

    Circuit delay:


    Energy and Power

    Warning! In everyday language, the term power is used incorrectly in place of energy.

    Power is not energy.

    Power is not something you can run out of.

    Power can not be lost or used up.

    Power is not a thing, it is merely a rate.

    Power can not be put into a battery any more than velocity can be put in the gas tank of a car.
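The difference between power and energy shows up when the clock frequency changes. The sketch below reads the switching power relation from the previous slide as P = n_avg · f · c_avg · V_dd², taking n_avg as the average number of nodes switching per cycle and c_avg as the average switched capacitance. These interpretations and all numeric values are assumptions.

```python
def switching_power(n_avg, f, c_avg, v_dd):
    """Average switching power in watts: n_avg * f * c_avg * v_dd^2."""
    return n_avg * f * c_avg * v_dd ** 2


def energy_for_task(n_avg, f, c_avg, v_dd, cycles):
    """Energy in joules for a task of `cycles` clock cycles: power * (cycles / f)."""
    return switching_power(n_avg, f, c_avg, v_dd) * (cycles / f)


n_avg, c_avg, cycles = 1000, 1e-15, 1_000_000  # hypothetical circuit and workload

p_fast = switching_power(n_avg, 100e6, c_avg, 1.2)
p_slow = switching_power(n_avg, 50e6, c_avg, 1.2)
e_fast = energy_for_task(n_avg, 100e6, c_avg, 1.2, cycles)
e_slow = energy_for_task(n_avg, 50e6, c_avg, 1.2, cycles)
e_low_voltage = energy_for_task(n_avg, 50e6, c_avg, 0.9, cycles)
```

Halving f halves the power, but the task takes twice as long, so the energy is unchanged. Only lowering V_dd (or n_avg, c_avg) reduces the energy per task in this model.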


    Design Representation and Abstraction levels


    Design Representation

    Behavioral or functional representation

    Specifies the behavior or the functions of a design without any implementation information

    Structural representation

    Specifies the implementation of a design in terms of components and their interactions

    Physical representation

    Specifies the physical characteristics of the design (blueprint for manufacturing)


    Digital System Design

    Design flow: IDEA → Behavioral Design → Structural Design → Logic Design → Physical Design → Fabrication → Product

    Representations along the flow: Plain English; Algorithm; State machine, ALU, Regs; Gate-level netlist; Transistor list


    Levels of Design Abstractions


    Implementation Technologies


    HW Design Abstraction

    Levels of design abstraction:

    Processor-Memory Level

    RT Level

    Logic Gates

    Transistors

    Polygons of Silicon


    Representation and Abstraction

    Y-Chart

    Functional: Algorithm, RT Language, Boolean Eqn, Differential Eqn

    Structural: Proc. Mem. Switch, RT, Gate, Transistor

    Geometric: Floorplan, Standard Cells, Sticks, Polygons


    Heterogeneous HW/SW Implementations

    [Figure: cost versus performance]

    Only SW: low cost and low performance.

    Only HW: high cost and high performance.

    Mixed HW-SW: medium cost and performance.

    Additionally, flexibility and tight time-to-market requirements favour SW implementations.


    System-level HW-SW Co-design

    IDEA

    Constraints, Specification

    System-level HW-SW Co-design

    Memory hierarchy and mapping

    SW behavior, RTOS, schedule policy and processors

    Interconnect and buses

    HW behavior and components

    Components (HW, SW)


    Issues in System-level HW-SW Co-design

    Specification of functionality and constraints.

    Simulation of functionality.

    Components as building blocks

    SW processors: DSP and micro-controllers

    HW co-processors: ASICs, FPGA

    Storage elements: Cache, Scratchpad, SRAM, DRAM

    Interconnection elements: Buses and arbiters

    Interface and I/O units: DMA, UART, D/A, A/D, wireless communication

    Software platform: RTOS and scheduling


    Issues in System-level HW-SW Co-design

    Performance analysis (timing, power, area)

    Design and optimization (timing, power, area)

    Architecture selection: processing elements,

    memory units and inter-connect.

    RTOS and schedule scheme.


    Design flow and Abstraction levels


    The A3 model and design flows

    Application: LP-filter (specification)

    Algorithm: FIR, IIR (parallel), IIR (cascade)

    Architecture: DSP, µ-Controller, ASIC

    Design flows


    Summary

    Algorithms and Architectures

    Data path

    Control path

    Algorithmic properties

    Cost functions

    Design flows and representations

    Design representations

    Design abstractions

    Following courses

    Architectural optimization (mm2-mm3)

    Scheduling concepts (mm4-mm5)


    Exercises

    Gajski: 1.1, 1.4, and 1.8

    Cost functions: Discuss power vs. energy optimization.

    Why is there a difference?

    How can you optimize energy, taking only the dynamic contribution into account?

    Take as your starting point the paper by C.H. Wang, Algorithmic Implementation of Low-Power High Performance FIR Filtering IP Cores (hint: only sections 1 and 2). For these exercises you should prepare a few notes so that you can present your findings next Thursday (no more than 5 minutes).

    Gr840:

    Find the various representation forms of the FIR filter used, write them in mathematical form, and make a block-diagram representation.

    Gr841:

    Discuss or verify that the data path in figure 2 is reasonable, and try to map the algorithms onto it.

    Gr842:

    Make a 1:1 mapping and propose an architecture for a four-tap FIR filter.