
DSP C5000

Chapter 14

Finite Impulse Response (FIR) Filter Implementation

Copyright © 2003 Texas Instruments. All rights reserved.

ESIEE, Slide 2

Outline

Digital Filters and FIR filters

Implementation of FIR Filters on C54x

Comparison of C54x and C55x


ESIEE, Slide 3

Outline of FIR Filters

Generalities on Digital Filters
FIR Filters with Matlab
Implementation of FIR Filters


ESIEE, Slide 4

Digital Filters

[Figure: a digital filter maps the input sequence x_n to the output sequence y_n. Complete chain at sampling frequency f_S: x(t) -> analog anti-aliasing filter -> ADC -> x_n -> digital filter -> y_n -> DAC -> analog smoothing filter -> y(t).]


ESIEE, Slide 5

Linear, Time-Invariant Digital Systems

Linearity:

$\alpha_1 x_1(n) + \alpha_2 x_2(n) \rightarrow \alpha_1 y_1(n) + \alpha_2 y_2(n), \quad \forall\, \alpha_1, \alpha_2 \in \mathbb{R}$

Time Invariance:

$x(n) \rightarrow y(n) \;\Rightarrow\; x(n - n_0) \rightarrow y(n - n_0)$


ESIEE, Slide 6

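Impulse Response

Impulse sequence $u_n$: $u_0 = 1$ and $u_n = 0$ for $n \neq 0$.

[Figure: the impulse $u_n$ (a single nonzero sample at n = 0) applied to the digital filter produces the impulse response $h_n$.]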


ESIEE, Slide 7

Input-Output Relationship, Convolution

[Figure: the sequence $x_n$ (shown for n = -1, 0, 1, 2) decomposed as the sum of shifted, weighted impulses $x_{-1} u_{n+1} + x_0 u_n + x_1 u_{n-1} + x_2 u_{n-2}$.]

$x_n = \sum_{k=-\infty}^{\infty} x_k\, u_{n-k}$


ESIEE, Slide 8

Input-Output Relationship, Convolution

Using linearity and time invariance:

$y_n = \mathrm{output}\Big(\sum_{k} x_k\, u_{n-k}\Big) = \sum_{k} x_k\, h_{n-k}$

$y_n = \sum_{k=-\infty}^{\infty} x_k\, h_{n-k} = \sum_{k=-\infty}^{\infty} h_k\, x_{n-k}$
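As an illustration of the convolution sum, here is a minimal Python sketch for finite-length sequences. The function name `convolve` and the sample values are assumptions made for this example; they do not come from the slides.

```python
def convolve(x, h):
    """Full linear convolution y[n] = sum_k x[k] * h[n - k] of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):        # each input sample x[k] ...
        for m, hm in enumerate(h):    # ... contributes a scaled copy of h shifted by k
            y[k + m] += xk * hm
    return y


h = [1.0, 1.0, 1.0, 1.0]              # impulse response (illustrative values)
impulse = [1.0, 0.0, 0.0]             # u_n truncated to 3 samples
print(convolve(impulse, h))           # the response to the impulse reproduces h, zero padded
```

Swapping the two arguments gives the same result, which is the commutativity expressed by the two forms of the sum.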


ESIEE, Slide 9

Output for a Single Frequency Input

Single frequency input: $x_n = e^{j \omega_0 n T_e}$

Single frequency output: $y_n = x_n\, H(\omega_0)$

$H(\omega_0) = \sum_{k=-\infty}^{\infty} h_k\, e^{-j \omega_0 k T_e}$

$H(\omega_0) = |H(\omega_0)|\, e^{j \arg(H(\omega_0))} = A(\omega_0)\, e^{j \varphi(\omega_0)}$


ESIEE, Slide 10

Frequency Transfer Function

For a digital filter the frequency transfer function is periodic.

$H(\omega) = |H(\omega)|\, e^{j \arg(H(\omega))} = A(\omega)\, e^{j \varphi(\omega)}$

Amplitude: $A(\omega) = |H(\omega)|$

Phase: $\varphi(\omega) = \arg H(\omega)$

Group delay: $\tau(\omega) = -\dfrac{d\varphi(\omega)}{d\omega}$

$h_n = \dfrac{1}{2\pi f_e} \int_{-\pi f_e}^{\pi f_e} H(\omega)\, e^{j \omega n T_e}\, d\omega$


ESIEE, Slide 11

Relationship Between Fourier Transforms of Input and Output

$X(\omega) = \sum_{n=-\infty}^{\infty} x_n\, e^{-j \omega n T_e}$

$Y(\omega) = \sum_{n=-\infty}^{\infty} y_n\, e^{-j \omega n T_e}$

$Y(\omega) = H(\omega)\, X(\omega)$


ESIEE, Slide 12

Z Transfer Function

$H(z) = \sum_{n=-\infty}^{\infty} h_n\, z^{-n}$

$H(\omega) = \sum_{n=-\infty}^{\infty} h_n\, e^{-j \omega n T_e} = H(z)\big|_{z = e^{j \omega T_e}}$

$Y(z) = X(z)\, H(z)$


ESIEE, Slide 13

Basic Relationships of a Digital Filter

$y_n = \sum_{k=-\infty}^{\infty} x_k\, h_{n-k} = \sum_{k=-\infty}^{\infty} h_k\, x_{n-k}$

$Y(\omega) = H(\omega)\, X(\omega)$

$Y(z) = X(z)\, H(z)$


ESIEE, Slide 14

Rational z Transfer Function

Linear difference equation with constant coefficients.

$H(z) = \dfrac{N(z)}{D(z)} = \dfrac{\sum_{i=0}^{Q} b_i\, z^{-i}}{1 + \sum_{k=1}^{P} a_k\, z^{-k}}$

$y_n = \sum_{i=0}^{Q} b_i\, x_{n-i} - \sum_{k=1}^{P} a_k\, y_{n-k}$


ESIEE, Slide 15

IIR and FIR Filters

IIR = Infinite Impulse Response
FIR = Finite Impulse Response

FIR:

$H(z) = \sum_{i=0}^{Q} b_i\, z^{-i} = \sum_{n} h_n\, z^{-n}$

$h_n = b_n$ for $0 \le n \le Q$, and $h_n = 0$ for $n < 0$ or $n > Q$.

IIR:

$H(z) = \dfrac{N(z)}{D(z)}$

The FIR filter is the special case with $D(z)$ constant.


ESIEE, Slide 16

FIR and IIR

FIR: output $y_n$ is a linear combination of a finite number of input samples.

$y_n = \sum_{i=0}^{Q} h_i\, x_{n-i} = \sum_{i=0}^{Q} b_i\, x_{n-i}, \quad b_i = h_i$

IIR: output $y_n$ is a linear combination of a finite number of input and of output samples. Recursive form.

$y_n = \sum_{i=0}^{Q} b_i\, x_{n-i} - \sum_{k=1}^{P} a_k\, y_{n-k}$


ESIEE, Slide 17

Causality and Stability

A filter is causal if $h_n = 0$ for $n < 0$.
A filter is stable if the output is bounded for any bounded input.
Condition for stability:
    All the poles of H(z) are inside the unit circle.
    Or: $\sum_{n} |h_n| < A$, with A a finite constant.
    FIR filters are always stable.


ESIEE, Slide 18

Representation of Poles and Zeroes of H(z) in the Complex Plane

[Figure: pole-zero plot in the complex plane; horizontal axis: real part (-1 to 1), vertical axis: imaginary part (-1 to 1).]


ESIEE, Slide 19

Some Useful Matlab Functions

Example for a FIR filter:

$N(z) = b_0 + b_1 z^{-1} + b_2 z^{-2} + b_3 z^{-3}$

$b = [b_0\ b_1\ b_2\ b_3] = [1\ 1\ 1\ 1]$

Enter the filter coefficients vector b:
    b=[1 1 1 1]; a=1;

Calculate the transfer function Hf, its amplitude and phase on 256 samples, with fs=1:
    [Hf,f]=freqz(b,a,256,1);
    HfA=abs(Hf);
    Hfphi=angle(Hf);


ESIEE, Slide 20

Some Useful Matlab Functions

Plot the impulse response:
    stem(b)
Plot the amplitude and phase of the transfer function:
    plot(f,HfA) and plot(f,Hfphi)

[Figure: two plots versus frequency from 0 to 0.5 (FS = 1): amplitude of the transfer function (vertical axis 0 to 4) and phase of the transfer function (vertical axis -2.5 to 1).]


ESIEE, Slide 21

Some Useful Matlab Functions

Generate a test signal = sum of cosines:
    x=cos(2*pi*[0:99]*0.25)+2*cos(2*pi*[0:99]*0.1);

Apply the filter to x. Output is y:
    y=filter(b,a,x);

Plot the results:
    plot(x); plot(y)

[Figure: input x (amplitude between -3 and 3) and output y (amplitude between -6 and 6) versus time, for samples 0 to 100.]

x is the sum of 2 frequencies: 0.25 and 0.1.

The filter cancels the frequency 0.25.

y has only the frequency 0.1.
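The cancellation at 0.25 can be checked numerically by evaluating the transfer function of b = [1 1 1 1] directly. The following Python sketch is only an illustration; `freq_response` is a hypothetical helper written for this example, not a Matlab or library function.

```python
import cmath


def freq_response(b, f):
    """H(f) = sum_k b[k] * exp(-j*2*pi*f*k), with f normalized to fs = 1."""
    return sum(bk * cmath.exp(-2j * cmath.pi * f * k) for k, bk in enumerate(b))


b = [1, 1, 1, 1]
for f in (0.0, 0.1, 0.25):
    # |H(f)| is the gain applied to a cosine of normalized frequency f
    print(f, abs(freq_response(b, f)))
```

The gain at 0.25 is zero up to rounding error, while the component at 0.1 is amplified by a factor of about 3, which matches the larger amplitude of y in the plot.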


ESIEE, Slide 22

Calculation of a FIR using Matlab

For given attenuation and frequency response characteristics, the transfer function can be calculated using different methods:
    Mean square error, minimax (Chebyshev)
    Empirical window method

Corresponding Matlab functions:
    firls and remez.
    fir and fir1.


ESIEE, Slide 23

Example using Matlab

Design a low pass filter:
    Sampling frequency = 9600 Hz
    Maximum attenuation (passband) = 0.1 dB
    Minimum attenuation (stopband) = 50 dB
    Limit frequencies of passband and stopband = 1200 Hz and 2600 Hz.

[Figure: attenuation template in dB versus f in Hz, with band edges at 1200 Hz and 2600 Hz.]


ESIEE, Slide 24

Example using Matlab

Vector of limit frequencies (normalized):
    F=[0 1200 2600 4800]/4800;

Vector of required amplitudes:
    A=[1 1 0 0];

Least square calculation of filter:
    Bls=firls(23,F,A);

Minimax calculation of filter:
    Bre=remez(21,F,A);

Window method (Hamming):
    Bwin=fir1(25,(1200+2600)/9600);


ESIEE, Slide 25

Results of Matlab Example

The minimum orders to satisfy the constraints are 23 for LS, 21 for minimax and 25 for the window method.

[Figure: attenuation (vertical axis, -20 to 140) versus frequency (0 to 5000 Hz) for the three designs: minimax method, least square method and window method.]


ESIEE, Slide 26

Results of Matlab Example

Impulse Response

[Figure: impulse response h_n versus n, for n from 0 to 25; values between -0.1 and 0.4.]


ESIEE, Slide 27

FIR Filters with Constant Group Delay or Linear Phase

For many applications, it is desirable to use a filter with a constant group delay (independent of the frequency).
    The phase will be linear or affine.

2 possible cases: symmetrical or antisymmetrical FIR.
    Constant group delay = T_S (N-1)/2
    Symmetrical: h(n) = h(N-1-n)
    Antisymmetrical: h(n) = -h(N-1-n)


ESIEE, Slide 28

FIR Filters with Constant Group Delay or Linear Phase

Symmetrical case: linear phase

$\varphi(f) = k f$

Antisymmetrical case: affine phase

$\varphi(f) = k f + \dfrac{\pi}{2}$


ESIEE, Slide 29

Fixed Point Implementation of FIR Filters: Numerical Issues

Fixed point implementation:
    16 bits for data and coefficients
    Accumulators have size 40 bits

Fixed point representation of data:
    Size B = 16 bits, format Qk: k fractional bits

Quantization of coefficients:
    Maximum magnitude coefficient = hmax
    Number of bits of the integer part of coefficients is Bi: Bi = log2(hmax)
    Coefficients in Qk' with k' = 16 - Bi


ESIEE, Slide 30

Matlab Example

The coefficients Bre can be quantized using 16-bit fixed point with 15 fractional bits:
    Bre=round(Bre*2^15);

To store the result in a text file for CCS (the minimax filter of order 21 has 22 coefficients):
    fp=fopen('coef.asm','wt');
    for i=1:22
        fprintf(fp,' .word %d \n',Bre(i));
    end
    fclose(fp);
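The same Q15 quantization can be sketched in Python as follows. This is an illustration under stated assumptions: the helper names `to_q15` and `word_lines` are hypothetical, the rounding imitates Matlab's `round` (half away from zero), and the saturation to the 16-bit range is an addition that the Matlab lines above do not perform.

```python
import math


def to_q15(value):
    """Quantize a real coefficient to a 16-bit Q15 integer."""
    scaled = value * 2 ** 15
    # Round half away from zero, like Matlab's round()
    rounded = int(math.copysign(math.floor(abs(scaled) + 0.5), scaled))
    # Assumption: saturate to the signed 16-bit range (not done in the Matlab lines)
    return max(-32768, min(32767, rounded))


def word_lines(coefficients):
    """Format quantized coefficients as assembler .word directives."""
    return [" .word %d" % to_q15(c) for c in coefficients]


print("\n".join(word_lines([0.5, -0.25, 1.0 / 32])))
```

A coefficient equal to +1.0 cannot be represented in Q15 and is clipped to 32767 by this sketch.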


ESIEE, Slide 31

Matlab Example

File coef.asm
Can be edited to be used with CCS.

 .word 39
 .word -92
 .word -242
 .word 25
 .word 668
 .word 579
 .word -978
 .word -2229
 .word 86
 .word 6374
 .word 12127
 .word 12127
 .word 6374
 .word 86
 .word -2229
 .word -978
 .word 579
 .word 668
 .word 25
 .word -242
 .word -92
 .word 39


ESIEE, Slide 32

FIR Implementation, Numerical Issues, FRCT Bit

Common case:
    Data and coefficients in Q15 format
    Product h(i)x(n-i) in Q30 (2 sign bits)
    By shifting the products 1 bit left, the products are in Q31 format with only 1 sign bit.
    If the FRCT bit (Fraction) is set to 1, products are automatically shifted 1 bit left.
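The effect of the 1-bit left shift can be illustrated with plain integer arithmetic. This Python sketch only models the scaling; `q15_multiply` is a hypothetical helper, and the overflow of the single case (-1) x (-1) is deliberately not handled.

```python
def q15_multiply(a, b, frct=True):
    """Multiply two Q15 integers and return the 32-bit product.

    The result is in Q31 when frct is True, in Q30 otherwise.
    """
    product = a * b          # Q15 * Q15 gives Q30, with two sign bits
    if frct:
        product <<= 1        # models FRCT = 1: one shift left gives Q31, one sign bit
    return product


half = 0x4000                # 0.5 in Q15
p = q15_multiply(half, half)
print(hex(p), hex(p >> 16))  # the upper 16 bits are the Q15 result
```

With the shift, the upper 16 bits of the product read directly as a Q15 value (here 0x2000, i.e. 0.25); without it they would be off by a factor of 2.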


ESIEE, Slide 33

Structures for FIR Implementation

Common structures for FIR filters:
    Transversal structures
    Lattice structure
        Useful in some adaptive situations.

Transversal structures using:
    Linear buffers
    Circular buffers

Special case for symmetrical or antisymmetrical FIRs.


ESIEE, Slide 34

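Transversal Structures of FIR

Structure with a delay line

[Figure: the input x_n feeds a chain of delays giving x_{n-1}, x_{n-2}, ..., x_{n-N+1}; the taps are weighted by b_0, b_1, b_2, b_3, ..., b_{N-1} and summed to give y_n.]

Transposed structure

[Figure: the input x_n is multiplied by b_{N-1}, b_{N-2}, ..., b_3, b_2, b_1, b_0 and the products are accumulated along the delay chain to give y_n.]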


ESIEE, Slide 35

Implementation of a FIR with a Delay Line

Most common structure used in DSP.
The delay line can be implemented using a linear or a circular buffer.
Basic operations:
    Read a new data value x(n) every T_S
    ACCU = 0
    for i = 0 to N-1:
        Multiply h(i) by x(n-i) and add it to the accumulator
    Output y(n)


ESIEE, Slide 36


Implementation of General Transversal FIR filters
    Using linear buffers
    Using circular buffers

Implementation of Symmetrical FIR filters


ESIEE, Slide 37

Operations using a Linear Buffer for a FIR with N Coefficients

Length of the delay line = N samples
Read a new sample x(n) and store it in the delay line in the first position.
ACCU = 0
for i = 0 to N-1
    Read h(i) and x(n-i)
    Multiply h(i) by x(n-i) and add it to ACCU
Output y(n)
N-1 shifts in the delay line.
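The linear-buffer steps above can be sketched in Python as follows. This is a behavioural model only, not C54x code; the class name `LinearBufferFir` and the floating-point arithmetic are assumptions of this example.

```python
class LinearBufferFir:
    """FIR filter with a linear delay line: delay[i] holds x(n - i)."""

    def __init__(self, h):
        self.h = list(h)
        self.delay = [0.0] * len(h)

    def step(self, x_new):
        self.delay[0] = x_new                  # new sample in the first position
        acc = 0.0                              # ACCU = 0
        for i in range(len(self.h)):           # for i = 0 to N-1
            acc += self.h[i] * self.delay[i]   # h(i) * x(n-i)
        # N-1 shifts: move every sample one place towards the end of the line
        for i in range(len(self.delay) - 1, 0, -1):
            self.delay[i] = self.delay[i - 1]
        return acc


fir = LinearBufferFir([1.0, 2.0, 3.0])
print([fir.step(x) for x in (1.0, 0.0, 0.0, 0.0)])   # impulse in, coefficients out
```

The second loop is the cost that the MACD and circular-buffer variants avoid or hide.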


ESIEE, Slide 38

Linear Buffer, MACD Mode

Instead of shifting N-1 samples at the end, do the shift in the loop one by one.

Read a new sample x(n) and store it in the delay line in the first position.
ACCU = 0
for i = N-1 to 0
    Read h(i) and x(n-i)
    Multiply h(i) by x(n-i) and add it to ACCU
    Shift x(n-i) in the delay line
Output y(n)
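A behavioural sketch of this shift-inside-the-loop variant, in Python. The function name `fir_macd_step` is hypothetical; the delay line is given one extra word, an assumption made so that the copy of the oldest sample has somewhere to go.

```python
def fir_macd_step(h, delay, x_new):
    """One output sample; delay has len(h) + 1 words, delay[i] holds x(n - i)."""
    n = len(h)
    delay[0] = x_new                   # new sample in the first position
    acc = 0.0                          # ACCU = 0
    for i in range(n - 1, -1, -1):     # for i = N-1 down to 0: oldest sample first
        acc += h[i] * delay[i]         # multiply and accumulate
        delay[i + 1] = delay[i]        # delay move: copy the sample to the next address
    return acc


h = [1.0, 2.0, 3.0]
delay = [0.0] * (len(h) + 1)
print([fir_macd_step(h, delay, x) for x in (1.0, 0.0, 0.0, 0.0)])
```

Starting from the oldest sample matters: each word is read before the younger neighbour is copied over it.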


ESIEE, Slide 39

MACD Instruction

MACD: Multiply, Accumulate and Delay move.
MACD Smem, pmad, src
    src = src + Smem*pmad
    T = Smem
    (Smem+1) = Smem

If MACD is used in a loop with RPT, the program memory address (pmad) is automatically incremented.
    MACD alone = 3 cycles
    In a RPT loop = 1 cycle


ESIEE, Slide 40

Implementing a FIR with MACD

Memory organization of data and coefficients

Program Memory
    Addresses     Content
    i = pmad      b(N-1)
    i+1           b(N-2)
    i+2           b(N-3)
    ...           ...
    i+N-1         b(0)

Data Memory
    Addresses     Content
    k = Smem      x(n)
    k+1           x(n-1)
    k+2           x(n-2)
    ...           ...
    k+N-1         x(n-N+1)
    k+N           dummy place for copy of x(n-N+1)


ESIEE, Slide 41

Initialization of Registers

STM: stores #value to the MMR early in the pipeline to avoid latencies.
    2 words, 2 cycles.

Initialization of the FRCT bit (fractional mode):
    Instructions SSBX (Set Status Bit) and RSBX (Reset Status Bit).

Initialization of ACCU:
    Using RPTZ: RePeaT after initializing ACCU at 0
    Or via LD #0,A


ESIEE, Slide 42

RPT, RPTZ Instructions

RPT #n
    Repeat the next instruction n+1 times.
    The repetition counter is set to n and decreases until 0.
    1 or 2 cycles, not interruptible.
RPTZ src, #n
    Same as RPT, except that the src ACCU is cleared to zero before the repeat.
    2 cycles, not interruptible.
Some instructions execute faster when in repeat mode (pipeline).


ESIEE, Slide 43

Implementing a FIR Filter with MACD

        .bss    adr_debut_dat, N+1
adr_fin_dat .set adr_debut_dat+N-1

        .text
* Initialization of AR1 and FRCT
        STM     #adr_fin_dat, AR1
        SSBX    FRCT
* Filter loop
        RPTZ    A, #N-1
        MACD    *AR1-, adr_coef, A

Test with CCS
    Filter with N=32 coefficients all equal to 1/32
    Create a file fircoef.asm, address of coefficients in program memory = adr_coef


ESIEE, Slide 44


File containing coefficients fircoef.asm

        .global adr_coef
        .sect   ".coef"
adr_coef .word  0X400, 0X400
        .word   0X400,0X400,0X400,0X400,0X400
        .word   0X400,0X400,0X400,0X400,0X400
        .word   0X400,0X400,0X400,0X400,0X400
        .word   0X400,0X400,0X400,0X400,0X400
        .word   0X400,0X400,0X400,0X400,0X400
        .word   0X400,0X400,0X400,0X400,0X400


ESIEE, Slide 45


File firmacd.asm with the program
2 files to compile and link: fircoef.asm and firmacd.asm

Test by associating files with the ports DRR0 and DXR0
    File infir.dat attached to DRR0
    File outfir.dat attached to DXR0


ESIEE, Slide 46


Program file firmacd.asm: initializations

        .mmregs
        .global adr_debut_dat
        .global adr_fin_dat
        .global adr_coef
N       .set    32
        .bss    adr_debut_dat, N+1
adr_fin_dat .set adr_debut_dat+N-1

        .text
* Initialization of DP and FRCT
        LD      #0, DP
        SSBX    FRCT
* Initialization of AR0, AR1, AR2
        STM     #(adr_debut_dat), AR2
        STM     #(adr_debut_dat-1), AR1
        STM     #N, AR0


ESIEE, Slide 47


Program file firmacd.asm: endless loop

debut:
* Set AR1 at adr_fin_dat (AR1 = AR1 + AR0)
        MAR     *AR1+0
* Read x(n) at DRR
        LDM     DRR0, A
        STL     A, *AR2
* Filter loop
        RPTZ    A, #N-1
        MACD    *AR1-, adr_coef, A
* Write y(n) in DXR
* by saving the high part of ACCU in DXR
        STH     A, DXR0
* Go back to the beginning of the loop
        B       debut

See files firmacd.asm and fircoef.asm for the test in directory tutorial.


ESIEE, Slide 48

FIR with MACD, Test with CCS

Create project, create command file, compile and link.
To test the impulse response:
    Create a file infir.dat with:
        A value 0.5 (0x4000) then zeros (at least 40)
    Set 2 probe points
        1 at reading of DRR: LDM DRR
        1 at end of loop: B debut
    Attach files to probe points
        infir.dat at the first probe point (the value read is stored at address 0x20, DRR)
        outfir.dat at the second probe point (the data at address 0x21, DXR, is stored in the file)


ESIEE, Slide 49

Results

Let the program run until the end of file infir.dat

Load file outfir.dat at some address in the DSP data memory (File-Data-Load)

Plot the content of this memory area (View-Graph-Time/Frequency).
    Plot a time graph (Single Time)
    Plot a frequency graph (FFT: Magnitude and Phase)


ESIEE, Slide 50

Results for the impulse response and its FFT


ESIEE, Slide 51

Second Test

New test with a sine input.
    Replace infir.dat by file insinus.dat containing 80 samples of a sine with 40 samples per period.
    Name the result file outsine.dat.
    Repeat the same operations as in the preceding test.


ESIEE, Slide 52

Second Test

Observe that the output is attenuated and phase shifted by the values corresponding to H(f) at f = f_S/40.


ESIEE, Slide 53

Implementation using a Circular Buffer

A circular buffer of length N is a block of contiguous memory words addressed by a pointer using a modulo N addressing mode.
    The 2 extreme words of the memory block are considered as contiguous.
Characteristics of a circular buffer:
    Instead of moving the N data in memory, just modify the pointers.
    When a new data x(n) arrives, the pointer is incremented and the new data is written in place of the oldest one.


ESIEE, Slide 54

Trace of Memory and Pointer in a Circular Buffer of Length 3

    Time n      Time n+1    Time n+2    Time n+3
    x(n-1)      x(n-1)      x(n+2)      x(n+2)
    x(n)        x(n)        x(n)        x(n+3)
    x(n-2)      x(n+1)      x(n+1)      x(n+1)


ESIEE, Slide 55

FIR with Circular Buffers

2 circular buffers
    1 for data
    1 for coefficients

[Figure: data memory starting at adr_deb_data, addressed by pnt_data; coefficient memory holding b(N-1), b(N-2), ..., b(0) from adr_deb_coef to adr_fin_coef, addressed by pnt_coef.]


ESIEE, Slide 56

Operation of FIR with Circular Buffer

Read a new input sample x(n)
Store it at address of pnt_data
ACCU = 0
for i = 0 to N-1
    multiply the data pointed to by pnt_data by the coefficient pointed to by pnt_coef; add the product to ACCU
    decrement pointers pnt_data and pnt_coef (modulo N)
end
output y(n) from ACCU
increment pnt_data by 1
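These circular-buffer operations can be modelled in Python with explicit modulo-N pointer arithmetic. This is a behavioural sketch only; the class name `CircularBufferFir` is hypothetical, and the coefficients are stored in the order b(N-1) ... b(0) as in the memory layout shown earlier.

```python
class CircularBufferFir:
    """FIR filter using two circular buffers (data and coefficients)."""

    def __init__(self, b):
        self.n = len(b)
        self.coef = list(reversed(b))    # memory order b(N-1) ... b(0)
        self.data = [0.0] * self.n
        self.pnt_data = 0

    def step(self, x_new):
        n = self.n
        self.data[self.pnt_data] = x_new             # overwrites the oldest sample
        pnt_coef = n - 1                             # last word of the buffer: b(0)
        acc = 0.0                                    # ACCU = 0
        for _ in range(n):
            acc += self.data[self.pnt_data] * self.coef[pnt_coef]
            self.pnt_data = (self.pnt_data - 1) % n  # modulo-N decrement
            pnt_coef = (pnt_coef - 1) % n
        # After N decrements both pointers are back where they started
        self.pnt_data = (self.pnt_data + 1) % n      # now points at the oldest sample
        return acc


fir = CircularBufferFir([1.0, 2.0, 3.0])
print([fir.step(x) for x in (1.0, 0.0, 0.0, 0.0, 0.0)])
```

No sample is ever moved in memory: only `pnt_data` advances by one position per input sample.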


ESIEE, Slide 57

Instruction MAC with 2 Operands in Indirect Addressing Mode

MAC: Multiply and Accumulate
MAC Xmem, Ymem, src[, dst]
    dst = src + Xmem*Ymem
    T = Xmem
    With Xmem, Ymem use only AR2 to AR5
    Can be executed in 1 cycle.

Dual operand instructions: indirect addressing restricted to:
    AR2, AR3, AR4, AR5
    none, +, -, +0%


ESIEE, Slide 58

Circular Buffer with C54x

Circular indirect addressing mode:
    *ARi-%, *ARi+%, *ARi-0%, *ARi+0%, *ARi(lk)%

In dual operand mode Xmem, Ymem:
    *ARi+0% is the only valid circular mode
    To perform a decrement, store a negative value in AR0.

BK register:
    Stores the size N of the circular buffer.
    Must be initialized before use.
    There may be several circular buffers at different addresses at the same time, but with the same length.


ESIEE, Slide 59

Limitations on Start Addresses of Circular Buffers

If N is written on nb bits in binary, the start address must have its nb least significant bits (LSB) at 0.
    Examples:
        for N=32, 6 LSB of start address = 0
        for N=30, 5 LSB of start address = 0

To access a circular buffer:
    Initialize BK with N (nb bits)
    Choose 1 ARi as a pointer
        The effective start address of the buffer is the value in ARi with its nb LSB at 0.
        The end address = start address + N-1.
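The alignment rule can be checked with a few lines of Python. The helper `circular_buffer_layout` and the example pointer value 0x0142 are assumptions made for this illustration.

```python
def circular_buffer_layout(n, ar):
    """Return (nb, start, end) for a circular buffer of length n addressed through ar."""
    nb = n.bit_length()          # number of bits needed to write N in binary
    mask = (1 << nb) - 1
    start = ar & ~mask           # value of ARi with its nb LSBs forced to 0
    end = start + n - 1          # end address = start address + N - 1
    return nb, start, end


for n in (32, 30):
    nb, start, end = circular_buffer_layout(n, 0x0142)
    print(n, nb, hex(start), hex(end))
```

For N = 32 the buffer must therefore start on a multiple of 64, and for N = 30 on a multiple of 32.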


ESIEE, Slide 60

Circular Buffer on C54x

[Figure: circular buffer in data memory for N = 30 (BK = 11110 in binary). Start_address = xxxxxxxxxxx00000, pointer ARi = xxxxxxxxxxx00010, End_address shown as xxxxxxxxxxx11111.]


ESIEE, Slide 61

Implementation of FIR Filter with 2 Circular Buffers

Same filter as in the preceding example, coefficients in section .coef (in program memory) in file fircoef.asm.

N=32
2 buffers are allocated in data memory for the coefficients and the data of the filter.
    Start addresses must be a multiple of 64.

First step of the program after initialization:
    Transfer coefficients from program to data memory, from adr_coef to adr_debut_coef.


ESIEE, Slide 62

Move Instructions

MVPD #pmad, Smem
    Copy values from program to data memory
    In RPT mode pmad is automatically incremented.

    Transfer            Instructions
    Program <-> Data    MVPD, MVDP, READA, WRITEA
    Data <-> Data       MVKD, MVDK, MVDD
    MMR <-> Data        MVMD, MVDM
    MMR <-> MMR         MVMM


ESIEE, Slide 63

Implementation of FIR with 2 Circular Buffers, Initializations

        .mmregs
        .global adr_debut_dat
        .global adr_fin_dat
        .global adr_debut_coef
        .global adr_fin_coef
        .global adr_coef
N               .set    32
adr_debut_dat   .usect  "buf_data", N
adr_debut_coef  .usect  "buf_coef", N
adr_fin_dat     .set    adr_debut_dat+N-1
adr_fin_coef    .set    adr_debut_coef+N-1

        .text
* Initialization of BK, AR0, FRCT
        STM     #N, BK
        STM     #-1, AR0
        SSBX    FRCT
* Initialization of AR2, AR3
        STM     #(adr_debut_dat), AR2
        STM     #(adr_fin_coef), AR3


ESIEE, Slide 64

Implementation of FIR with 2 Circular Buffers, Program

* Transfer of coefficients from
* program to data memory
        STM     #adr_debut_coef, AR4
        RPT     #N-1
        MVPD    adr_coef, *AR4+
* Endless loop
debut:
* Read x(n) at DRR
        LDM     DRR0, A
        STL     A, *AR2
* Calculation of y(n)
        RPTZ    A, #N-1
        MAC     *AR2+0%, *AR3+0%, A
* Write y(n) in DXR
* by saving high part of ACCU
        STH     A, DXR0
* Increment the data pointer (circular, so that it stays inside the buffer)
        MAR     *AR2+%
* Go back to the beginning of the loop
        B       debut

See files fircirc.asm and fircoef.asm for the test.


ESIEE, Slide 65

Command File for Circular Buffer Addressing Constraint

The addresses adr_debut_dat and adr_debut_coef have to be aligned with a multiple of 64 in the example.
    adr_debut_dat is the start address of uninitialized section buf_data.
    adr_debut_coef is the start address of uninitialized section buf_coef.
    To align the 2 sections on a multiple of 64, in the command file add align(64) after the name of the sections in the SECTIONS directive, for example:
        buf_data align(64) > DATA page 1


ESIEE, Slide 66

Implementation of a Symmetrical FIR Filter

The symmetry of coefficients is used to decrease the computational load:
    b(n) = b(N-1-n)
    A general FIR filter with N coefficients takes N cycles (in good conditions).
    A symmetrical FIR filter takes N/2 cycles.
    Use of the specific instruction FIRS.

N even:

$y(n) = \sum_{i=0}^{\frac{N}{2}-1} b(i)\,\big[x(n-i) + x(n-N+1+i)\big]$

N odd:

$y(n) = \sum_{i=0}^{\frac{N-1}{2}-1} b(i)\,\big[x(n-i) + x(n-N+1+i)\big] + b\!\left(\tfrac{N-1}{2}\right) x\!\left(n - \tfrac{N-1}{2}\right)$
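The folding of a symmetrical filter can be compared with the direct sum in a few lines of Python. This is a sketch under the assumption b(i) = b(N-1-i); the function names `fir_direct` and `fir_symmetric` and the test values are hypothetical.

```python
def fir_direct(b, x_hist):
    """Direct form: x_hist[i] holds x(n - i), b has N coefficients."""
    return sum(b[i] * x_hist[i] for i in range(len(b)))


def fir_symmetric(b, x_hist):
    """Same output, assuming b(i) == b(N-1-i); about N/2 multiplications."""
    n = len(b)
    acc = 0
    for i in range(n // 2):
        # x(n-i) and x(n-N+1+i) share the coefficient b(i): add first, multiply once
        acc += b[i] * (x_hist[i] + x_hist[n - 1 - i])
    if n % 2:                        # N odd: the middle tap has no partner
        mid = (n - 1) // 2
        acc += b[mid] * x_hist[mid]
    return acc


b_even = [1, 2, 3, 3, 2, 1]
x_hist = [5, -1, 4, 2, 0, 7]
print(fir_direct(b_even, x_hist), fir_symmetric(b_even, x_hist))
```

Both functions return the same value for any symmetrical coefficient set; only the number of multiplications differs.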


ESIEE, Slide 67

FIRS Instruction to Work with RPT(Z)

FIRS Xmem, Ymem, pmad
    Xmem, Ymem correspond to: x(n-i), x(n-N+1+i)
    Coefficients in program memory at pmad
    Operations of FIRS:
        pmad -> PAR
        while RC ≠ 0
            B = B + A(32:16) x Pmem addressed by PAR
            A = (Xmem+Ymem)<<16
            PAR = PAR+1
            RC = RC-1


ESIEE, Slide 68

Using FIRS for a Symmetrical FIR Filter

3 arrays:
    N/2 first coefficients,
    N/2 newest data and N/2 oldest data.

Example for N = 6, with 2 circular buffers:

    Program memory (PAR), from adr_debut_coef:        b(0), b(1), b(2)
    Data memory, buffer AR2, from adr_debut_dat0:     x(n-2), x(n), x(n-1)
    Data memory, buffer AR3, from adr_debut_dat1:     x(n-3), x(n-5), x(n-4)


ESIEE, Slide 69

Using FIRS for a Symmetrical FIR Filter

BK = N/2
At the beginning AR2 and AR3 point to:
    the newest data x(n)
    and the oldest data x(n-N+1)

[Figure: contents of the two circular buffers (samples x(n) to x(n-N/2-1) and x(n-N/2) to x(n-N+1)) with the positions of AR2 and AR3, at the beginning and after N/2 + 1 incrementations.]


ESIEE, Slide 70

Using FIRS for a Symmetrical FIR Filter

FIRS is repeated N/2 times.
The first sum x(n) + x(n-N+1) is done before entering the loop.
N/2 iterations (AR2 and AR3 incremented by 1):
  At the first iteration AR2 points to x(n-1) and AR3 to x(n-N+2).
  After N/2 iterations: AR2 is decremented by 2 and AR3 by 1.
  The oldest sample x(n-N/2+1) of the 1st buffer is stored in the 2nd buffer in place of x(n-N+1). Then AR is incremented by 1.
New sample x(n+1) is stored in place of x(n).
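The two-buffer bookkeeping can be checked with a small Python model. This is a sketch under stated assumptions, not the C54x pointer sequence: pointers move in steps of 1 with an explicit modulo, the new sample is written into the slot vacated by the oldest sample of the first buffer, and the class name SymFirBuffers is hypothetical.

```python
class SymFirBuffers:
    """Two circular half-buffers of length N/2 (simplified pointer model)."""

    def __init__(self, coef_half):
        self.b = list(coef_half)         # b(0)..b(N/2-1)
        self.h = len(self.b)
        self.new = [0] * self.h          # x(n) .. x(n-N/2+1)
        self.old = [0] * self.h          # x(n-N/2) .. x(n-N+1)
        self.p_new = 0                   # index of the newest sample x(n)
        self.p_old = 0                   # index of the oldest sample x(n-N+1)

    def step(self, sample):
        h = self.h
        last = (self.p_new - 1) % h      # oldest sample of the 1st buffer
        # it replaces the oldest sample of the 2nd buffer
        self.old[self.p_old] = self.new[last]
        self.p_old = (self.p_old + 1) % h
        # the new sample takes the slot just vacated in the 1st buffer
        self.new[last] = sample
        self.p_new = last
        # folded sum: b(k) * (x(n-k) + x(n-N+1+k))
        return sum(
            self.b[k] * (self.new[(self.p_new + k) % h]
                         + self.old[(self.p_old + k) % h])
            for k in range(h)
        )
```

Feeding a signal sample by sample through step() gives the same outputs as a direct N-tap FIR with the full symmetric coefficient set and zero initial conditions.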


ESIEE, Slide 71

Symmetrical FIR Implementation with FIRS, Initializations

        .mmregs
        .global adr_debut_coef
        .global adr_debut_dat0
        .global adr_debut_dat1

N               .set 32
Nsur2           .set 16
adr_debut_coef  .set adr_coef
adr_debut_dat0  .usect "buf_data0", N
adr_debut_dat1  .usect "buf_data1", N

        .text
* Initialization of BK, AR0, FRCT
        STM #Nsur2, BK
        STM #-2, AR0
        SSBX FRCT
* Initialization of AR2, AR3
        STM #(adr_debut_dat0),AR2
        STM #(adr_debut_dat1),AR3


ESIEE, Slide 72

Symmetrical FIR Implementation using FIRS, Program

* Endless loop
debut:
* Read x(n) at DRR
        LDM DRR0, A
        STL A, *AR2
* Calculation of y(n)
* Calculation of the first sum
        ADD *AR2+0%,*AR3+0%,A
* Repeat N/2 times FIRS
        RPTZ B, #(Nsur2-1)
        FIRS *AR2+0%, *AR3+0%, adr_coef
* Write y(n) at DXR
* by saving high part of ACCU in DXR
        STH B, DXR0
* Transfer of the oldest value of 1st array
* to the oldest value of the 2nd array
        MAR *+AR2(-2)%
        MAR *AR3-%
        MVDD *AR2, *AR3+0%
* Go back to the beginning of the loop
        B debut

See files firsym.asm and fircoef.asm for the test.


ESIEE, Slide 73

Tutorial

The listing files for the preceding examples can be found in the tutorial directory:
  Tutorial > Dsk5416 > Chapter 14 > Labs_fir


ESIEE, Slide 74


Implementation of block filters

Implementation of symmetrical or anti-symmetrical FIR filters


ESIEE, Slide 75

Implementation of FIR Filters using C55x

2 MAC units accessed using 3 data buses (D, B, C) make it possible to:
  Calculate 2 output samples y at a time using the same set of coefficients and different data x.
  Calculate 2 output samples y at a time using the same input data x but 2 sets of coefficients.

[Figure: the data read buses feed two MAC units, which accumulate into AC0 and AC1.]


ESIEE, Slide 76

Using the 2 MAC Units

C54x (one output sample at a time):
    y_n = b_0 x_n + b_1 x_{n-1} + b_2 x_{n-2} + b_3 x_{n-3}

        MAC *AR2+, *AR3+, A

C55x (two output samples at a time):
    y_n     = b_0 x_n     + b_1 x_{n-1} + b_2 x_{n-2} + b_3 x_{n-3}
    y_{n+1} = b_0 x_{n+1} + b_1 x_n     + b_2 x_{n-1} + b_3 x_{n-2}

        MAC *AR2+, *CDP+, AC0
        :: MAC *AR3+, *CDP+, AC1

[Figure: the data read buses feed two MAC units, which accumulate into AC0 and AC1.]

Use of block filtering in order to calculate 2 output samples at a time.


ESIEE, Slide 77

Block Filter

Calculate a block of M output samples:
  Avoids interrupts sample by sample
  Allows calculation of 2 samples at a time

\[ y_{n-m} = \sum_{i=0}^{N-1} b_i\, x_{n-m-i}, \quad m = 0, 1, \dots, M-1 \]

M+N-1 inputs are necessary to calculate M output samples, because of the N-1 initial conditions.
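To make the M+N-1 input count concrete, here is a minimal Python sketch of a block filter. The function name block_fir and the storage order (oldest sample first) are assumptions chosen for the example.

```python
def block_fir(b, x):
    """Return the M = len(x) - N + 1 outputs that a block of inputs allows.

    x is stored oldest first; output j uses x[j] .. x[j+N-1],
    with x[j+N-1] as the newest sample of that output.
    """
    N = len(b)
    M = len(x) - N + 1               # M + N - 1 inputs give M outputs
    if M < 1:
        raise ValueError("need at least N input samples")
    return [sum(b[i] * x[j + N - 1 - i] for i in range(N)) for j in range(M)]
```

For N = 4 and 6 input samples the function returns M = 3 outputs, which matches the N = 4, M = 3 example in this chapter.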


ESIEE, Slide 78

Block Filter, example N=4, M=3

    Coefficients        Input data
    CDP -> b0           AR2 -> x_n
           b1           AR3 -> x_{n-1}
           b2                  x_{n-2}
           b3                  x_{n-3}
                               x_{n-4}
                               x_{n-5}
                               …

    y_n     = b0 x_n     + b1 x_{n-1} + b2 x_{n-2} + b3 x_{n-3}
    y_{n-1} = b0 x_{n-1} + b1 x_{n-2} + b2 x_{n-3} + b3 x_{n-4}
    y_{n-2} = b0 x_{n-2} + b1 x_{n-3} + b2 x_{n-4} + b3 x_{n-5}


ESIEE, Slide 79

Block Filter Example

Double loop: on coefficients and on m.

Coefficients accessed by CDP:
  CDP (Cmem) modifications are limited to: *CDP, *CDP+, *CDP-, *(CDP+T0).
  CDP uses B bus only for dual-MAC. Because the B bus is internal only, coefficients must also be internal.

Place data operands carefully to avoid memory conflicts (SA/DARAM).


ESIEE, Slide 80

Using Dual MAC

[Figure: CDP, AR2 and AR3 address the operands carried by the B, C and D buses to the two MAC units, which accumulate into AC0 and AC1.]

        MAC *AR2+, *CDP+, AC0
        :: MAC *AR3+, *CDP+, AC1

    y_n     = b_0 x_n     + b_1 x_{n-1} + b_2 x_{n-2} + b_3 x_{n-3}
    y_{n+1} = b_0 x_{n+1} + b_1 x_n     + b_2 x_{n-1} + b_3 x_{n-2}

    Coefficients        Input data
    CDP -> b0           AR2 -> x_n
           b1           AR3 -> x_{n-1}
           b2                  x_{n-2}
           b3                  x_{n-3}
                               x_{n-4}
                               x_{n-5}
                               …


ESIEE, Slide 81

Initialization of Pointers

        AMOV #x,XAR2        ;AR2 points to the 1st value of input data (x)
        AMOV #(x+1),XAR3    ;AR3 points to the 2nd value of input data (x+1)
        AMOV #a0,XCDP       ;CDP points to the coefficient array (a0)

Use AMOV to do transfers during the "AD" pipeline phase.


ESIEE, Slide 82

Inner Loop on Coefficients

        RPT #3
        MAC *AR2+,*CDP+,AC0
        :: MAC *AR3+,*CDP+,AC1

Pointers at the end of the repeat instruction, and after reinitialization:

    Coefficients          Input data
    b0                    x_n
    b1                    x_{n-1}
    b2                    x_{n-2}   <- AR2 after reinitialization
    b3                    x_{n-3}   <- AR3 after reinitialization
    <- CDP at end         x_{n-4}   <- AR2 at end of repeat
                          x_{n-5}   <- AR3 at end of repeat
                          …

Reinitialization of pointers for the next output sample:

        ASUB #2,AR2
        ASUB #2,AR3
        MOV #a0,CDP
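The pointer movement of the dual-MAC loop can be traced with a short Python model. It is only a sketch: the function name dual_mac_block is hypothetical, x is indexed in memory order starting where AR2 starts, and integer arithmetic replaces the fractional accumulators.

```python
def dual_mac_block(b, x, passes):
    """Model of the dual-MAC loop: each pass yields two outputs.

    x is laid out as in memory (x[0] is where AR2 starts); requires
    len(x) >= 2 * passes + len(b) - 1.
    """
    taps = len(b)
    ar2, ar3, cdp = 0, 1, 0          # AMOV #x, AMOV #(x+1), AMOV #a0
    y = []
    for _ in range(passes):
        ac0 = ac1 = 0
        for _ in range(taps):        # RPT #(taps-1)
            ac0 += x[ar2] * b[cdp]   # MAC *AR2+,*CDP+,AC0
            ac1 += x[ar3] * b[cdp]   # :: MAC *AR3+,*CDP+,AC1
            ar2 += 1
            ar3 += 1
            cdp += 1                 # one shared coefficient fetch per cycle
        ar2 -= taps - 2              # ASUB #2 when taps = 4
        ar3 -= taps - 2
        cdp = 0                      # MOV #a0,CDP (or circular wrap of CDP)
        y += [ac0, ac1]
    return y
```

Each pass advances the data pointers by a net 2 positions, so two new outputs are produced per pass with a single coefficient read per cycle.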


ESIEE, Slide 83

Circular Addressing Mode for Coefficients

Initialize the size of the circular buffer: BK
Set up the Buffer Start Address: BSA and Xeven
Set up ARi or CDP
No memory alignment constraint

[Figure: coefficient buffer b0 to b3; Xeven:BSAxx gives the start address, ARn/CDP the position in the buffer, BKzz the length.]


ESIEE, Slide 84

Circular Buffer Addressing Mode

    Buffer Start Address  =  Xeven[22:16] : BSAxx[15:0]
    Buffer Length         =  BKzz[15:0]
    Offset into Buffer    =  ARn/CDP
    Calculated Address    =  Xeven[22:16] : (BSAxx + ARn/CDP)
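The address calculation above can be sketched in Python as follows. The function names are hypothetical, and the wrap rule is a simplification that assumes the step is no larger than the buffer length.

```python
def circ_address(xeven_hi, bsa, offset):
    """Xeven[22:16] : (BSAxx + ARn/CDP), low part kept to 16 bits."""
    return ((xeven_hi & 0x7F) << 16) | ((bsa + offset) & 0xFFFF)


def circ_step(offset, step, bk):
    """Offset after a pointer modification, wrapped into 0 .. BK-1."""
    return (offset + step) % bk
```

For example, with a 7-bit page of 01h, BSA = 2000h and an offset of 3, the calculated address is 01_2003h; stepping past the end of a 4-word buffer brings the offset back to 0.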


ESIEE, Slide 85

Circular Buffer Addressing Mode

    Offset   Xeven          Buffer Start Address   Block size register
    AR0      XAR0[22:16]    BSA01                  BK03
    AR1      XAR0[22:16]    BSA01                  BK03
    AR2      XAR2[22:16]    BSA23                  BK03
    AR3      XAR2[22:16]    BSA23                  BK03
    AR4      XAR4[22:16]    BSA45                  BK47
    AR5      XAR4[22:16]    BSA45                  BK47
    AR6      XAR6[22:16]    BSA67                  BK47
    AR7      XAR6[22:16]    BSA67                  BK47
    CDP      XCDP[22:16]    BSAC                   BKC

The even XARn (i.e. 0, 2, 4, 6) determines the 64K page.


ESIEE, Slide 86

Selecting Circular or Linear Addressing Mode

Use the LSBs of status word ST2_55:
  0 = linear mode (default)
  1 = circular mode

    ST2_55 bit:  15-9                 8      7      6      5      4      3      2      1      0
    Field:       other bits or rsvd   CDPLC  AR7LC  AR6LC  AR5LC  AR4LC  AR3LC  AR2LC  AR1LC  AR0LC

Set or reset status bits:
        BSET AR5LC      ;AR5 in circular mode
        BCLR AR3LC      ;AR3 in linear mode


ESIEE, Slide 87

Circular Buffer Exercise

Use AR4 as a circular pointer to x{5}:

[Figure: buffer x holds 7, 1, 9, 6, 2 at offsets 0 to 4, addressed by AR4.]

        .sect "data"
x       .int 7,1,9,6,2          ;init data
        .sect "code"
        __________________      ;init XAR
        __________________      ;init start addr
        __________________      ;init length
        __________________      ;init AR4 to top
        __________________      ;set AR4 to circ
        MOV #3,T0               ;index
        MOV *(AR4+T0),AC0       ;AC0 = 7, AR4 = 3
        MOV *+AR4(#4h),AC1      ;AC1 = 9, AR4 = 2
        MOV *AR4(T0),AC2        ;AC2 = 7, AR4 = 2

Results are cumulative.

Solution for the blank lines:

        AMOV #x,XAR4
        MOV #x,BSA45
        MOV #5,BK47
        MOV #0,AR4
        BSET AR4LC
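The exercise answers can be reproduced with a small Python model of the three addressing forms. The class CircPtr and its method names are hypothetical and only mimic the offset arithmetic, not the instruction timing.

```python
class CircPtr:
    """Offset register (like AR4) over a circular buffer of length bk."""

    def __init__(self, data, bk, offset=0):
        self.data, self.bk, self.off = data, bk, offset

    def post_add(self, t):           # *(AR4+T0): read, then AR4 += T0
        value = self.data[self.off]
        self.off = (self.off + t) % self.bk
        return value

    def pre_add(self, k):            # *+AR4(#k): AR4 += k, then read
        self.off = (self.off + k) % self.bk
        return self.data[self.off]

    def indexed(self, t):            # *AR4(T0): read at AR4+T0, AR4 unchanged
        return self.data[(self.off + t) % self.bk]


ar4 = CircPtr([7, 1, 9, 6, 2], bk=5)
t0 = 3
ac0 = ar4.post_add(t0)    # reads x[0], offset becomes 3
ac1 = ar4.pre_add(4)      # offset becomes (3 + 4) mod 5 = 2, reads x[2]
ac2 = ar4.indexed(t0)     # reads x[(2 + 3) mod 5] = x[0], offset stays 2
```

The three reads give 7, 9 and 7, and the offset ends at 2, in agreement with the values filled in on the slide.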


ESIEE, Slide 88

Circular Buffer for Coefficients

Table of coefficients b0 … b3:
  Circular buffer addressed by CDP.
  Initialize XCDP: 7 MSBs
  Initialize CDP to 0: offset in the buffer
  Set up CDP in circular addressing mode

s1:     AMOV #x,XAR2
        AMOV #a0,XCDP
        AMOV #(x+1),XAR3
        MOV #a0,BSAC
        MOV #0,CDP
        MOV #4,BKC
        BSET CDPLC


ESIEE, Slide 89

Store Results, 32-bit Moves

Assuming fractional mode, the 2 results are in the high parts of AC0 and AC1.

AC0 and AC1 can be saved separately:

        MOV HI(AC0), *AR4+
        MOV HI(AC1), *AR4+

AC0 and AC1 can be saved at the same time:
  Pairs: (AC0,AC1), (AC2,AC3)
  ARi incremented by 2
  Even align y

        MOV pair(hi(AC0)),dbl(*AR4+)
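Why the result sits in the high part can be seen from Q15 arithmetic. The sketch below is a Python illustration with hypothetical function names; it assumes signed Q15 operands given as Python integers, a one-bit left shift of each product in fractional mode, and that the paired store places the first accumulator's high word in the most significant word.

```python
def q15_mac_hi(pairs):
    """Accumulate Q15 x Q15 products with FRCT = 1; return the high word."""
    acc = 0
    for a, b in pairs:
        acc += (a * b) << 1          # FRCT: product shifted left by one bit
    return (acc >> 16) & 0xFFFF      # HI(ACx): bits 31..16, again Q15


def pack_pair(hi0, hi1):
    """32-bit word as written by a paired store (assumed: hi0 is the MSW)."""
    return (hi0 << 16) | hi1
```

For instance 0.5 x 0.5 is 4000h x 4000h, which gives 2000h (0.25) in the high word.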


ESIEE, Slide 90

Block Filter Inner Loop

s1:     AMOV #x,XAR2
        AMOV #a0,XCDP
        AMOV #(x+1),XAR3
        AMOV #y,XAR4
        MOV #a0,BSAC
        MOV #0,CDP
        MOV #4,BKC
        BSET CDPLC

        MOV #0,AC0
        MOV #0,AC1
        RPT #3
        MAC *AR2+,*CDP+,AC0
        :: MAC *AR3+,*CDP+,AC1
        ASUB #2,AR2
        ASUB #2,AR3
e1:     MOV pair(hi(AC0)),dbl(*AR4+)


ESIEE, Slide 91

Outer Loop Using RPTB or RPTBlocal

Use the RPTB (Repeat Block) instruction. We must specify:
  Start address of the block: the next instruction
  End address: a label specifies the last instruction
  The repetition counter: BRC0, loop counter initialized with count-1
  Min count = 2

RPTBlocal: executes from the IBU (instruction buffer unit)
  56 bytes maximum (if > 56 bytes use RPTB)
  Reduces power consumption


ESIEE, Slide 92

Outer Loop on m: Calculate M Samples y_{n-m}

s1:     AMOV #x,XAR2
        AMOV #a0,XCDP
        AMOV #(x+1),XAR3
        AMOV #y,XAR4
        MOV #a0,BSAC
        MOV #0,CDP
        MOV #4,BKC
        BSET CDPLC
        MOV #((samps-taps)/2),BRC0

        RPTBLOCAL e1
        MOV #0,AC0
        MOV #0,AC1
        RPT #3
        MAC *AR2+,*CDP+,AC0
        :: MAC *AR3+,*CDP+,AC1
        ASUB #2,AR2
        ASUB #2,AR3
e1:     MOV pair(hi(AC0)),dbl(*AR4+)
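The loop count loaded into BRC0 can be worked out with a few lines of Python. This is only an arithmetic sketch; the function name block_repeat_count is hypothetical, and it assumes that a block repeat runs BRC0 + 1 times and that each pass stores two outputs.

```python
def block_repeat_count(samps, taps, per_pass=2):
    """BRC0 value used above, with the passes and outputs it yields."""
    brc0 = (samps - taps) // per_pass
    passes = brc0 + 1                # the block runs BRC0 + 1 times
    return brc0, passes, passes * per_pass
```

With 200 samples and 16 taps this gives BRC0 = 92, 93 passes and 186 stored values. Since a 200-sample block fully covers 200 - 16 + 1 = 185 outputs, the last stored value would depend on one sample beyond the block, assuming the input holds exactly 200 samples.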


ESIEE, Slide 93

More Nested Loops?

Nesting RPTB or RPTBlocal:
  2 levels supported using BRC0 (outer) and BRC1/BRS1 (inner)
  No saving of registers required for nested block repeat.

        MOV #outer_cnt,BRC0     ;load outer loop count
        MOV #inner_cnt,BRC1     ;load BRC1, auto-load BRS1
        RPTBLOCAL outer         ;use BRC0
        ...
        RPTBLOCAL inner         ;BRC1: decrements, BRS1-no change
        ...
inner:  last_inner
        ...
outer:  last_outer
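The role of BRS1 can be illustrated with a Python model of the two counters. It is a sketch with a hypothetical function name, assuming each block runs (counter + 1) times and that BRC1 is reloaded from BRS1 every time the inner block is entered.

```python
def nested_block_repeat(outer_cnt, inner_cnt):
    """Count inner-body executions for BRC0 = outer_cnt, BRC1 = inner_cnt."""
    brc0 = outer_cnt
    brs1 = inner_cnt                 # BRS1 is loaded together with BRC1
    inner_runs = 0
    while True:                      # outer block, BRC0 + 1 passes
        brc1 = brs1                  # BRC1 reloaded from BRS1 on each entry
        while True:                  # inner block, BRC1 + 1 passes
            inner_runs += 1
            if brc1 == 0:
                break
            brc1 -= 1
        if brc0 == 0:
            break
        brc0 -= 1
    return inner_runs
```

With outer_cnt = 2 and inner_cnt = 3 the inner body runs 3 x 4 = 12 times, without any register being saved or restored by the program.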


ESIEE, Slide 94

Laboratory on Block Filter

Implement a block FIR with 16 coefficients and input block size = 200.
Implement it as a subroutine.

[Figure: C5510 memory map for the laboratory. All addresses and lengths are shown in bytes. Recoverable labels: x{200} at 4000h in DARAM2 (8Kx8); SP/SSP at 6000h in DARAM3 (8Kx8); a{16}; y; table{16}; code; vectors at FF_FF00h; SARAM0 (8Kx8); CE0 (16Kx8); EPROM (64Kx8); addresses 1_0000h, 5_0000h and FF_0000h.]


ESIEE, Slide 95

Using the Stack and Subroutines

Subroutines require call and ret.
During a call the return address is stored on the stack (SP).
Let us call the subroutine fir:

        call fir


ESIEE, Slide 96

Initialize the Stack

Declare an uninitialized section (.usect) of appropriate length to reserve space.

Initialize the stack pointer to point to the top of stack + 1.

Recommendation: place the stack in internal memory and align it on a 4-byte boundary:
  ALIGN= specifies bytes

[Figure: the STK section in memory, with SP pointing just past it.]

size    .set 100h
stack   .usect "STK",size

        AMOV #(stack+size),XSP


ESIEE, Slide 97

The System Stack SSP

When a call occurs, PC[15:0] is pushed on the stack.

The upper 8 bits PC[23:16] are pushed on the system stack, accessed by SSP (System Stack Pointer).

CFCT is used to store the active loop context.

XSP and XSSP share the same upper 7 bits.

Place SP and SSP with care to avoid dual-access delays.


ESIEE, Slide 98

Data Types

Byte: 8 bits
Word: 16 bits
Long: 32 bits
  A long access assumes the address points to the MSW.
  The LSW is read from the same address with the LSB toggled.
    Ptr=100h, MSW=100h, LSW=101h
    Ptr=101h, MSW=101h, LSW=100h

To ensure proper alignment:
  Constants (int, long) are automatically aligned on type boundaries.
  Variables:
    16 bits: no problem
    32 bits: use the even-align flag:

        .usect "vars",Nwords,,1
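The MSW/LSW rule for long accesses amounts to toggling the least significant address bit, as in this short Python sketch (function names are hypothetical).

```python
def long_access(ptr):
    """Return (MSW address, LSW address) for a 32-bit access at ptr."""
    return ptr, ptr ^ 1              # LSW: same address with the LSB toggled


def is_even_aligned(ptr):
    """True when the MSW sits at an even address, LSW right after it."""
    return (ptr & 1) == 0
```

An even pointer gives the natural layout (MSW at 100h, LSW at 101h); an odd pointer swaps the two words, which is why 32-bit variables are declared with the even-align flag.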


ESIEE, Slide 99

Solution: Declarations

        .sect "indata"
x0      .copy in7.dat
        .def start
        .cpl_off
        .arms_off
        .c54cm_off

stklen  .set 100
a0      .usect "coeffs",16,1,1
y0      .usect "results",200,1,1
BOS     .usect "STK",stklen,1,1
BOSS    .usect "SSTK",stklen,1,1

        .sect "init"
table   .int 7FCh, 7FDh, 7FEh, 7FFh
        .int 800h, 801h, 802h, 803h
        .int 803h, 802h, 801h, 800h
        .int 7FFh, 7FEh, 7FDh, 7FCh


ESIEE, Slide 100

Solution: Code

        .sect "code"
        .DP a0

start:  AMOV #BOS+stklen,XSP    ;set up Stack +
        MOV #BOSS+stklen,SSP    ;System Stack Ptrs

        CALL copy               ;copy coeffs

        BSET FRCT               ;turn on mult. shift
        BSET M40                ;turn on 40 bit math
        BSET SXMD               ;turn on sign exten.

        CALL fir                ;perform fir
        nop
here:   B here                  ;stop


ESIEE, Slide 101

Solution: Subroutine copy

copy:   AMOV #table,XAR2            ;load pointers
        AMOV #a0,XAR3
        RPT #7
        MOV dbl(*AR2+),dbl(*AR3+)   ;move from table to a
        RET


ESIEE, Slide 102

Solution: Subroutine fir

fir:    MOV #92,BRC0                    ;block repeat count
        AMOV #x0,XAR2                   ;initialize pointers
        AMOV #x0+1,XAR3                 ;for data,
        AMOV #y0,XAR4                   ;results
        AMOV #a0,XCDP                   ;and coefficients
        MOV #a0,BSAC                    ;buffer start address
        MOV #16,BKC                     ;buffer size
        MOV #0,CDP                      ;index
        BSET CDPLC                      ;turn on circ adr CDP

        RPTBlocal end
        MPYM *AR2+,*CDP,AC0             ;AC0 1st product
        MPYM *AR3+,*CDP+,AC1            ;AC1 gets 2nd prd
        RPT #14
        MAC *AR2+,*CDP+,AC0             ;form results
        :: MAC *AR3+,*CDP+,AC1
        MOV pair(hi(AC0)),dbl(*AR4+)    ;store AC0/AC1
        ASUB #14,AR2                    ;wrap data pointers
end     ASUB #14,AR3                    ;next calculation
        RET


ESIEE, Slide 103

Implementation of Symmetrical and Anti-symmetrical FIR Filters on ‘C55x

[Figure: coefficient plots b0 to b7 for a symmetrical filter and for an anti-symmetrical filter.]

These filters may be "folded" and performed with N adds and N/2 MACs.
Filters need to be designed as even length.

\[ y(n) = \sum_{i=0}^{N/2-1} b(i)\,[x(n-i) + x(n-N+1+i)] \quad (N \text{ even}) \]


ESIEE, Slide 104

Instructions FIRSADD and FIRSSUB

FIRSADD Xmem,Ymem,coef,ACx,ACy
    ACy = ACy + (ACx x (*CDP))
    || ACx = Xmem + Ymem
  For symmetrical FIR

FIRSSUB Xmem,Ymem,coef,ACx,ACy
    ACy = ACy + (ACx x (*CDP))
    || ACx = Xmem - Ymem
  For anti-symmetrical FIR

If performing a block FIR, dual MAC has better performance than FIRS.

A design consideration for migration from ‘C54x.
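The two instructions differ only in the sign used when folding the data. The following Python sketch models a repeated FIRSADD or FIRSSUB; the function name firs_fold and the integer test data are assumptions, and fractional scaling is ignored.

```python
def firs_fold(coef, xs, ys, subtract=False):
    """Model a repeated FIRSADD (or FIRSSUB when subtract=True).

    coef : b(0)..b(N/2-1), read through CDP
    xs   : Xmem sequence x(n), x(n-1), ...
    ys   : Ymem sequence x(n-N+1), x(n-N+2), ...
    """
    def combine(x, y):
        return x - y if subtract else x + y

    half = len(coef)
    acx = combine(xs[0], ys[0])      # first sum or difference, before the loop
    acy = 0
    for k in range(half):
        acy += acx * coef[k]         # ACy = ACy + ACx x (*CDP)
        if k + 1 < half:             # || ACx = Xmem +/- Ymem for the next pass
            acx = combine(xs[k + 1], ys[k + 1])
    return acy
```

With coefficients [1, 2, 3], xs = [10, 20, 30] and ys = [60, 50, 40], the symmetric case gives 420 and the anti-symmetric case gives -140, the same values as the full 6-tap sums with coefficients [1, 2, 3, 3, 2, 1] and [1, 2, 3, -3, -2, -1].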


ESIEE, Slide 105

Comparison of C54x and C55x

2 MACs in ‘C55x versus 1 in C54x:
  Well suited for block filtering, with 2 taps per cycle instead of 1 (for large N).

Circular addressing modes:
  3 BK registers in C55x instead of 1 in ‘C54x: allows several simultaneous circular buffers of different sizes.
  In C54x, circular addressing mode is specified by the indirect addressing type % in the instructions.
  In C55x, the mode is set in status register ST2_55 for each register (linear or circular). No memory alignment constraint.


ESIEE, Slide 106

Comparison of C54x and C55x: Symmetrical and Anti-symmetrical FIR Filters

In C54x, instruction FIRS:
  Allows 2 taps/cycle for a symmetrical FIR.

In C55x, instructions FIRSADD + FIRSSUB:
  Allow us to efficiently implement symmetrical and anti-symmetrical FIRs.
  Despite the 2 MACs, as there is only 1 ALU, again 2 taps/cycle for symmetrical or anti-symmetrical FIRs.


ESIEE, Slide 107

Follow On Activities on 5416 DSK

Laboratory 3 for TMS320C5416 DSK
  To determine by practical experiment the best FIR window functions for audio.

Laboratory 4 for TMS320C5416 DSK
  To determine by experiment how many FIR coefficients are required for acceptable audio quality.

Application 4 for TMS320C5416 DSK
  Electronic crossover for a multiple-loudspeaker system. Divides the audio signal into treble and bass at 16 different selectable frequencies using FIR filters.


ESIEE, Slide 108

Follow On Activities on 5510 DSK

Application "delays and echo" for TMS320C5510 DSK
  Simulates delays in communications networks and the reflection of sound heard in a canyon. Introduces circular buffers and the configuration used for a Finite Impulse Response (FIR) filter.
