![Page 1: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/1.jpg)
An introduction to DSPs
Examples of DSP applications
Why a DSP?
Characteristics of a DSP
Architectures
![Page 2: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/2.jpg)
DSP example: mobile phone
![Page 3: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/3.jpg)
DSP example: mobile phone with video camera
![Page 4: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/4.jpg)
DSP: applications
![Page 5: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/5.jpg)
Why a DSP?
• It’s easy: we want an architecture optimized for digital signal processing
• Some versions are further optimized for specific applications
- e.g. very low power consumption for mobile phones
![Page 6: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/6.jpg)
What is the difference between a DSP and a general-purpose processor? (1/4)
Memory architecture and buses
• The first processors (in the 1940s) had a Harvard architecture: separate memories for program and data
• But it is complex -> soon replaced by the Von Neumann architecture: no real distinction between program and data, which share the same memory (an instruction has two fields: operation and data)
• Problem: the processor cannot access instructions and data simultaneously
• To improve performance: Harvard architecture again!
In particular:
- separate memories and buses for program and data
- possibly, another separate bus for the DMA
![Page 7: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/7.jpg)
What is the difference between a DSP and a general-purpose processor? (2/4)
A DSP is often used to implement a linear filter
For sampled signals, the convolution integral is actually a sum:
$$y_n = \sum_i x_{n-i} h_i$$
- if the number of terms in the sum is finite: FIR filter (finite impulse response),
- otherwise: IIR filter (infinite impulse response),
- which can be implemented using two finite sums (the second one uses only past outputs, i.e. it starts at i = 1):
$$y_n = \sum_i x_{n-i} b_i + \sum_i y_{n-i} a_i$$
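The two difference equations above can be sketched directly in Python. This is only an illustrative reference implementation, not how a DSP would execute them; the function names `fir_filter` and `iir_filter` are made up for this example, samples before n = 0 are assumed to be zero, and the feedback list `a` is assumed to hold the coefficients for i = 1, 2, ... (sign convention as on the slide).

```python
def fir_filter(x, h):
    """y[n] = sum_i x[n-i] * h[i], assuming x[k] = 0 for k < 0."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, h_i in enumerate(h):
            if n - i >= 0:
                acc += x[n - i] * h_i  # one multiply-accumulate per tap
        y.append(acc)
    return y


def iir_filter(x, b, a):
    """y[n] = sum_i x[n-i] * b[i] + sum_j y[n-j] * a[j-1], j starting at 1.

    `a` holds the feedback coefficients for the past outputs
    y[n-1], y[n-2], ... (an assumption of this sketch).
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, b_i in enumerate(b):
            if n - i >= 0:
                acc += x[n - i] * b_i
        for j, a_j in enumerate(a, start=1):
            if n - j >= 0:
                acc += y[n - j] * a_j  # feedback from past outputs
        y.append(acc)
    return y


impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
fir_out = fir_filter(impulse, [0.5, 0.25])   # response ends after 2 samples
iir_out = iir_filter(impulse, [1.0], [0.5])  # response keeps decaying
```

Feeding an impulse shows the difference between the two families: the FIR output becomes exactly zero after as many samples as there are coefficients, while the IIR output halves at every step and never reaches zero in exact arithmetic.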
![Page 8: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/8.jpg)
What is the difference between a DSP and a general-purpose processor? (3/4)
• A common operation in a FIR or IIR filter is A=BC+D: we need
- a hardware multiplier (introduced in DSPs in the 1970s)
- a multiply and accumulate in only one clock cycle: MAC instruction.
Actually, the MAC is in a loop: we also need a zero-overhead loop:
- H/W for address generation (the access to memory is not random)
- loop management
- auto-increment; circular addressing
• Other possible H/W:
- H/W saturation
- Instructions to perform a division quickly
- Bit reversal for FFT
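As a software illustration of what this hardware does, here is a minimal sketch of a MAC loop over a circular delay line, of 16-bit saturation, and of bit-reversed indexing. All names (`CircularFir`, `saturate16`, `bit_reverse`) are hypothetical; on a real DSP the modulo address update and the loop control would be done by dedicated hardware at no cycle cost, which Python can only imitate.

```python
def saturate16(value):
    """Clamp to the signed 16-bit range instead of wrapping around."""
    return max(-32768, min(32767, value))


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index` (FFT data reordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


class CircularFir:
    """FIR filter whose delay line is addressed circularly."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.delay = [0] * len(self.coeffs)
        self.newest = 0  # index of the most recent sample

    def step(self, sample):
        n = len(self.delay)
        self.newest = (self.newest + 1) % n  # circular address update
        self.delay[self.newest] = sample     # overwrites the oldest sample
        acc = 0
        idx = self.newest
        for c in self.coeffs:
            acc += c * self.delay[idx]       # multiply-accumulate (MAC)
            idx = (idx - 1) % n              # auto-decrement with wrap-around
        return acc


fir = CircularFir([1, 2, 3])
impulse_response = [fir.step(s) for s in [1, 0, 0, 0]]
fft_order = [bit_reverse(i, 3) for i in range(8)]
```

The circular buffer avoids shifting the whole delay line at every sample: only one write and one index update are needed, and the MAC loop simply walks backwards through the buffer.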
![Page 9: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/9.jpg)
What is the difference between a DSP and a general-purpose processor? (4/4)
Other possible features:
• Often, data are 16- or 8-bit wide (e.g., audio or images)
- a 32-bit ALU can be split into two 16-bit ALUs or four 8-bit ALUs -> 2 or 4 operations in parallel
• several ALUs which work in parallel
• fixed-point ALUs, or 16-bit ALUs, to reduce power consumption and cost
• optimized versions:
- cost: for consumer applications
- power: for mobile applications
- for specific applications, e.g. electric motor control
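The idea of splitting a 32-bit ALU into two 16-bit ALUs can be imitated in software by keeping carries from crossing the boundary between the two halves. The sketch below is only an illustration under that assumption; `pack`, `unpack` and `add_2x16` are hypothetical helper names, and real hardware simply cuts the carry chain instead of masking.

```python
MASK32 = 0xFFFFFFFF
LOW15 = 0x7FFF7FFF  # every bit except the top bit of each 16-bit lane
HIGH1 = 0x80008000  # the top bit of each 16-bit lane


def pack(hi, lo):
    """Pack two 16-bit values into one 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def unpack(word):
    """Return the (hi, lo) 16-bit lanes of a 32-bit word."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF


def add_2x16(a, b):
    """Add two pairs of 16-bit lanes at once, each lane wrapping on its own."""
    low = (a & LOW15) + (b & LOW15)  # carries cannot cross a lane boundary
    return (low ^ ((a ^ b) & HIGH1)) & MASK32


a = pack(1, 0xFFFF)
b = pack(2, 1)
split_result = unpack(add_2x16(a, b))        # lanes stay independent
plain_result = unpack((a + b) & MASK32)      # ordinary 32-bit addition
```

With an ordinary 32-bit addition the overflow of the low half leaks into the high half; with the split addition each half wraps independently, so one operation does the work of two 16-bit additions.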
![Page 10: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/10.jpg)
• Example: ‘C30 (Texas Instruments, 1988)
![Page 11: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/11.jpg)
• Example: FIR filter using a ‘C30
![Page 12: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/12.jpg)
Note: several of these characteristics, which originated in DSPs, have been carried over to general-purpose processors
E.g.: the cache in the Pentium processor is Harvard-like (separate instruction and data caches)
![Page 13: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/13.jpg)
• Another example: several units working in parallel, and splittable ALUs (see the MMX extensions) in the Pentium 4 processor
![Page 14: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/14.jpg)
Pipeline…
• Example of a 4-stage pipeline (TI ‘C30)
• each instruction takes 4 clock cycles to execute, but (normally) it can be issued just 1 cycle after the previous one (data are needed only 3 cycles later)
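A small sketch can make the timing concrete. It assumes an ideal pipeline with no stalls, and the stage names used here (fetch, decode, read, execute) are an assumption of this example rather than something stated on the slide; the function names are hypothetical.

```python
STAGES = ("fetch", "decode", "read", "execute")  # assumed stage names


def pipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed when a new instruction is issued every cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


def unpipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed when each instruction must finish before the next starts."""
    return n_stages * n_instructions


def pipeline_diagram(n_instructions):
    """For each cycle, map every busy stage to the instruction it holds."""
    diagram = []
    for cycle in range(pipelined_cycles(n_instructions)):
        row = {}
        for depth, stage in enumerate(STAGES):
            instr = cycle - depth  # instruction that entered `depth` cycles ago
            if 0 <= instr < n_instructions:
                row[stage] = instr
        diagram.append(row)
    return diagram


diagram = pipeline_diagram(3)
```

For long instruction sequences the cost per instruction approaches one cycle: 100 instructions need 103 cycles in this model, against 400 without pipelining.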
![Page 15: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/15.jpg)
Pipeline: branch (e.g. on the ‘C30)
• Standard branch: the pipeline is flushed to correctly handle the PC -> 4 cycles
• Delayed branch: the pipeline is not flushed, and the 3 following instructions are loaded before modifying the PC -> only 1 cycle needed!
BRD label ; delayed branch
MPYF ; executed
ADDF ; executed
SUBF ; executed
AND ; not executed
…
…
label MPYF ; fetched after SUBF
…
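The behaviour of the listing above can be mimicked with a toy interpreter. This is only a sketch: the program encoding, the `run` and `cycles` helpers and the `NOP` filler are hypothetical, the cycle costs (4 for a standard branch, 1 for everything else) are taken from the slide and ignore any other stall, and a branch placed inside the delay slots is not handled.

```python
DELAY_SLOTS = 3  # instructions executed after BRD before the jump takes effect


def run(program, labels, max_steps=50):
    """Return the mnemonics in the order in which they are executed."""
    trace = []
    pc = 0
    slots_left = None  # remaining delay slots of a pending delayed branch
    target = None
    while pc < len(program) and len(trace) < max_steps:
        op, *args = program[pc]
        trace.append(op)
        pc += 1
        if op == "BR":                    # standard branch: jump immediately
            pc = labels[args[0]]
        elif op == "BRD":                 # delayed branch: jump later
            slots_left, target = DELAY_SLOTS, labels[args[0]]
        elif slots_left is not None:
            slots_left -= 1
            if slots_left == 0:           # delay slots used up: modify the PC
                pc, slots_left = target, None
    return trace


def cycles(trace):
    """Cycle count using the costs quoted on the slide."""
    return sum(4 if op == "BR" else 1 for op in trace)


labels = {"label": 6}
delayed = [("BRD", "label"), ("MPYF",), ("ADDF",), ("SUBF",),
           ("AND",), ("NOP",), ("MPYF",)]
standard = [("MPYF",), ("ADDF",), ("SUBF",), ("BR", "label"),
            ("AND",), ("NOP",), ("MPYF",)]

delayed_trace = run(delayed, labels)
standard_trace = run(standard, labels)
```

Both programs do the same useful work, but in this model the delayed version takes 5 cycles and the standard one 8, because the three instructions after `BRD` fill the slots that a flush would otherwise waste.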
![Page 16: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/16.jpg)
Two architectures
• In order to exploit instruction-level parallelism (ILP): two possible architectures
- Superscalar: the parallelism is dynamically managed by the hardware
- Very Long Instruction Word (VLIW): the parallelism is statically managed by the compiler
What is the problem?
• Dependences in data or control can generate conflicts
- on data (an instruction needs the result of a previous instruction, but the result is not ready yet), or
- on control (conditional jump, but the condition is not ready yet)
-> pipeline stall
![Page 17: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/17.jpg)
Superscalar
• The analysis of the independent instructions is done dynamically by the hardware (which is complex!)
• The sequence of instructions can be executed out of order; then, the completion of the instructions (commit) is done in order, to correctly update the state of the CPU
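The "execute out of order, commit in order" rule can be illustrated with a very small model. It assumes that any number of instructions may commit in the same cycle and that the cycle in which each instruction finishes executing is already known; `commit_cycles` is a hypothetical name, and a real reorder buffer is considerably more involved.

```python
def commit_cycles(finish_cycles):
    """Commit cycle of each instruction, given in program order.

    finish_cycles[i] is the cycle in which instruction i finishes executing.
    An instruction may commit only once it and all earlier instructions
    have finished, so the commit time is a running maximum.
    """
    commits = []
    latest = 0
    for finish in finish_cycles:
        latest = max(latest, finish)
        commits.append(latest)
    return commits


# Instructions 1 and 2 finish early but must wait for instruction 0.
commits = commit_cycles([5, 2, 3, 7, 4])
```

The commit cycles never decrease along the program, which is what keeps the architectural state consistent even though execution itself was reordered.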
![Page 18: 01 dsp intro_1](https://reader034.vdocuments.site/reader034/viewer/2022052507/558fc96d1a28ab9b198b462e/html5/thumbnails/18.jpg)
VLIW
• Very Long Instruction Word (VLIW): the parallelism is statically managed by the compiler
• The analysis of independent instructions is done statically during the compilation phase;
- the instructions which can be executed in parallel are assembled into long instructions and sent to the various functional units in order
• Convenient solution for DSP programs (fixed-length loops, few conditional operations); less convenient for general-purpose applications
• Simpler hardware! But a specific compilation for each platform is needed
• Deterministic behaviour -> exact computation of execution times
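What the compiler does for a VLIW machine can be sketched as a greedy, in-order packer that groups independent instructions into long words. This is a deliberately simplified illustration: the `Instr` record, the register names and the `pack_bundles` function are hypothetical, all functional units are assumed identical, every operation is assumed to take one cycle, and any register dependence (read-after-write, write-after-write, write-after-read) is conservatively treated as a conflict.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "name dest srcs")


def conflicts(earlier, later):
    """True if `later` cannot share a long instruction with `earlier`."""
    raw = earlier.dest in later.srcs   # later reads what earlier writes
    waw = earlier.dest == later.dest   # both write the same register
    war = later.dest in earlier.srcs   # later overwrites an input of earlier
    return raw or waw or war


def pack_bundles(instrs, width):
    """Pack instructions, in program order, into bundles of at most `width`."""
    bundles = [[]]
    for instr in instrs:
        current = bundles[-1]
        if len(current) == width or any(conflicts(e, instr) for e in current):
            current = []               # start a new long instruction
            bundles.append(current)
        current.append(instr)
    return bundles


program = [
    Instr("mpy", "r1", ("r2", "r3")),
    Instr("add1", "r4", ("r5", "r6")),
    Instr("sub", "r7", ("r1", "r4")),   # needs the results of mpy and add1
    Instr("add2", "r8", ("r2", "r5")),
]
bundles = pack_bundles(program, width=2)
names = [[i.name for i in bundle] for bundle in bundles]
```

Because the grouping is fixed at compile time, the number of long instructions (2 here, against 4 on a single-issue machine) is known before the program runs, which is the deterministic timing mentioned above.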