copyright © 2003 texas instruments. all rights reserved. dsp c5000 chapter 5 assembly language

Copyright © 2003 Texas Instruments. All rights reserved.

DSP C5000DSP C5000

Chapter 5Chapter 5

Assembly LanguageAssembly Language


ESIEE, Slide 2

Assembly LanguageAssembly Language

Two Main types of assembly language :Two Main types of assembly language : AlgebraicAlgebraic MnemonicMnemonic

Both C54x and C55x can use either type Both C54x and C55x can use either type of assembly language.of assembly language.

C54C54

Running C54 code on the C55

C55C55


ESIEE, Slide 3

C54x Assembly LanguageC54x Assembly Language

The instruction set is divided into four The instruction set is divided into four basic types :basic types : ArithmeticArithmetic LogicLogic LoadLoad, Store & , Store & MoveMove ProgramProgram control control

C54x has a fixed length instruction C54x has a fixed length instruction wordword

Instruction must be encoded in one 16-bit word Instruction must be encoded in one 16-bit word in order to be executed in one cyclein order to be executed in one cycle


ESIEE, Slide 4

Instructions and OperandsInstructions and Operands

General syntax of an instruction :General syntax of an instruction : InstrInstr Op1,[Op2,[Op3,[…]]] Op1,[Op2,[Op3,[…]]]

For For InstrInstr field refer to TI documentation or field refer to TI documentation or following slidesfollowing slides

The The Op1,[Op2,[Op3,[…]]]Op1,[Op2,[Op3,[…]]] field syntax is field syntax is specified in instruction documentation and specified in instruction documentation and specifies the way (type of addressing mode) specifies the way (type of addressing mode) you could use for the operands.you could use for the operands.


ESIEE, Slide 5

Operand SyntaxOperand Syntax

Term Operand nature and access mode

Smem Data memory accessed by any of addressing mode.

pmad Absolute addressing of program memory

dmad Absolute addressing of data memory

Ymem Data memory accessed only by indirect addressing (AR2 through . AR5 only).

Xmem Data memory accessed only by indirect addressing (AR2 through AR5 only).

PA 16-bit port (I/O) immediate address (0 - 65,535)

src Source accumulator (A or B) dst Destination accumulator (A or B)

k,lk Short and long immediate addressing (#constant)


ESIEE, Slide 6

Arithmetic InstructionsArithmetic Instructions General purpose Arithmetic :General purpose Arithmetic :

Addition/Addition/subtractionsubtraction MultiplyMultiply ( (and accumulateand accumulate)) SquareSquare DivideDivide

Application Specific Arithmetic :Application Specific Arithmetic : MiscellaneousMiscellaneous Polynomial Polynomial evaluationevaluation Distance computationDistance computation Specific filtersSpecific filters ButterflyButterfly computation computation (Viterbi) (Viterbi)


ESIEE, Slide 7

General Purpose ArithmeticGeneral Purpose Arithmetic Addition/subtractionAddition/subtraction

field4=field3 field4=field3 field1*2field1*2field2field2

Result is stored in field4 if present else in field3, shift is done Result is stored in field4 if present else in field3, shift is done according to field2 if present.according to field2 if present.

Shift field is detailed on the next slide …Shift field is detailed on the next slide …

Inst field1 field2 field3 field4 words1 ADD/SUB Smem, src 12 ADD/SUB Smem, TS, src 13 ADD/SUB Smem, 16 src [,dst] 14. ADD/SUB Smem, [SHIFT,] src [,dst] 25 ADD/SUB Xmem, SHIFT1, src 16 ADD/SUB Xmem, Ymem, dst 17 ADD/SUB #lk, [SHIFT1,] src [,dst] 28 ADD/SUB #lk, 16 src [,dst] 29 ADD/SUB src, [SHIFT] [,dst] 110 ADD/SUB src, ASM [,dst] 1


ESIEE, Slide 8

Shift FieldShift Field

Many instructions use shift on one operand. Many instructions use shift on one operand. This shift is specified in operands field and This shift is specified in operands field and could be :could be : Immediate if specified by the keywords: Immediate if specified by the keywords:

-16 -16 SHIFTSHIFT 15 15 0 0 SHIFT1SHIFT1 15 15

Register indirect:Register indirect: -16 -16 ASMASM 15 ( 15 (AAccumulator ccumulator SShift hift MMode field of ode field of ST1ST1)) -16 -16 TSTS 31 ( 31 (TSTS are ths 6 LSBs of are ths 6 LSBs of T T register)register)


ESIEE, Slide 9

General Purpose ArithmeticGeneral Purpose Arithmetic Addition/subtraction (special cases)Addition/subtraction (special cases)

Arithmetic with unsigned operand :Arithmetic with unsigned operand :

field2=field2 field2=field2 unsigned(field1)unsigned(field1)

Direct computation on memory :Direct computation on memory :

With :With :

Inst field1 field2 words1 ADDS/SUBS Smem, src 1

Inst field1 field2 words1 ADDM #lk, Smem 2

SXMSXM=1=1

field2=field2 field2=field2 +signed(field1)+signed(field1)

-32768 -32768 lk lk 3276732767

SXMSXM=0=0

field2=field2 +unsigned(field1)field2=field2 +unsigned(field1)

00 lk lk 6553565535


ESIEE, Slide 10

General Purpose ArithmeticGeneral Purpose Arithmetic Addition/subtraction (extended precision) :Addition/subtraction (extended precision) :

32 bits : 32 bits :

1 : field2=field2 1 : field2=field2 field1field1 2 : field2=field1-field22 : field2=field1-field2

If If C16C16=0 field1 and field2 are considered as 32-bit =0 field1 and field2 are considered as 32-bit operands and 32 bits operands and 32 bits is realized. is realized.

If If C16C16=1 field1 and field2 are considered as a pair of 16-bit =1 field1 and field2 are considered as a pair of 16-bit operands and SIMD computation take place.operands and SIMD computation take place.

64 bits :64 bits :

field2=field2 field2=field2 unsigned(field1) unsigned(field1) carry/borrowcarry/borrow

Inst field1 field2 words1 DADD/DSUB Lmem src 12 DRSUB Lmem src 1

Inst field1 field2 words1 ADDC/SUBB Smem, src 1


ESIEE, Slide 11

General Purpose ArithmeticGeneral Purpose Arithmetic Addition/subtraction (extended precision contd)Addition/subtraction (extended precision contd)

64 bits addition/subtraction :64 bits addition/subtraction :

is realized as follows: is realized as follows: ((Look at a code exemple) at a code exemple)


ESIEE, Slide 12

General Purpose ArithmeticGeneral Purpose Arithmetic MultiplyMultiply

1,3 : field3=field11,3 : field3=field1TREGTREG 22

2,4 : field3=field12,4 : field3=field1field2field2

Multiply and Accumulate/SubtractMultiply and Accumulate/Subtract

1 : field3=field3 1 : field3=field3 (field1 (field1TREGTREG) ) 22

3 : field4=field3 3 : field4=field3 (field1 (field1TREGTREG))11

2,4 : field4=field3 2,4 : field4=field3 (field1 (field1field2) field2) 1, 21, 2

Inst field1 field2 field3 words1 MPY[R] Smem, dst 12 MPY Xmem, Ymem, dst 13 MPY #lk dst 24 MPY Smem, #lk, dst 2

1 1 Result is stored in field4 if present else in field3Result is stored in field4 if present else in field322 [R] : rounding result on the 16bits MSB of [R] : rounding result on the 16bits MSB of dst,dst, 16bits LSB are zeroed 16bits LSB are zeroed

Inst field1 field2 field3 field4 words1 MAC[R]/MAS[R] Smem, src 12 MAC[R]/MAS[R] Xmem, Ymem, src [,dst] 13 MAC/MAS #lk src [,dst] 24 MAC/MAS Smem, #lk, src [,dst] 2


ESIEE, Slide 13

General Purpose ArithmeticGeneral Purpose Arithmetic Multiply and Accumulate (with program memory)Multiply and Accumulate (with program memory)

1,2 : field4=field3 1,2 : field4=field3 (field1 (field1field2)field2) 2 : contents of data memory pointed by field1 operand (Smem) 2 : contents of data memory pointed by field1 operand (Smem)

is copied in next following data memory address.is copied in next following data memory address.

Delay of data together with scalar product are needed Delay of data together with scalar product are needed for FIR filter computationfor FIR filter computation

1 : 1 : TREGTREG=field1 and contents of data memory pointed by =field1 and contents of data memory pointed by field1 is copied in the next following data memory address.field1 is copied in the next following data memory address.

Inst field1 field2 field3 words

1 MACP Smem, pmad, src 22 MACD Smem, pmad, src 2

Inst field1 words1 LTD Smem 1


ESIEE, Slide 14

General Purpose ArithmeticGeneral Purpose Arithmetic Multiply, Accumulate and DelayMultiply, Accumulate and Delay

In all cases In all cases x(n-k)x(n-k) are in data memory are in data memory Case 1: Case 1: h(k)h(k) are in program memory are in program memory

Case 2: Case 2: h(k)h(k) are in data memory are in data memory

y n h k x n kk

N

( ) ( ) ( )

0

1

y n h k x n kk

N

( ) ( ) ( )

0

1

RPTRPT #N-1#N-1

MACDMACD *AR1-,coef,A*AR1-,coef,ARPTRPT #N-1#N-1

MACDMACD *AR1-,coef,A*AR1-,coef,A

RPTBRPTB endLoop-1endLoop-1

LTDLTD *AR1-*AR1-

MACMAC *AR2+,A*AR2+,A

endLoop:endLoop:

RPTBRPTB endLoop-1endLoop-1

LTDLTD *AR1-*AR1-

MACMAC *AR2+,A*AR2+,A

endLoop:endLoop:


ESIEE, Slide 15

General Purpose ArithmeticGeneral Purpose Arithmetic Multiply (with accumulator)Multiply (with accumulator)

1 : 1 : BB=field1 =field1 A(32-16)A(32-16)11

2 : field1=2 : field1=TREGTREGA(32-16)A(32-16)11

Multiply and Accumulate/Subtract(with Multiply and Accumulate/Subtract(with accumulator)accumulator)

1 : 1 : BB==BB (field1 (field1 A(32-16)A(32-16)))1,21,2

2 : field3=field2 2 : field3=field2 ( (TREGTREGA(32-16)A(32-16)))1,2,31,2,3

Inst field1 words1 MPYA Smem 12 MPYA dst 1

11A(32-16)A(32-16) stands for the 16bits MSB of accumulator stands for the 16bits MSB of accumulator AA, , BB stands for accumulator stands for accumulator BB..22 [R] : rounding result on the 16bits MSB of [R] : rounding result on the 16bits MSB of dst,dst, 16bits LSB are zeroed 16bits LSB are zeroed3 3 Result is stored in field3 if present else in field2Result is stored in field3 if present else in field2

Inst field1 field2 field3 words1 MACA[R]/MASA Smem [,B] 12 MACA[R]/MASA[R] T, src [,dst] 1


ESIEE, Slide 16

General Purpose ArithmeticGeneral Purpose Arithmetic

Extended precision multiplicationExtended precision multiplication

1: field3=unsigned(field1) 1: field3=unsigned(field1) unsigned(unsigned(TREGTREG)) 2: field3=field3 +(unsigned(field1) 2: field3=field3 +(unsigned(field1) signed(field2))signed(field2))

MPYUMPYU is equivalent to is equivalent to MPYMPY syntax 1, but with unsigned syntax 1, but with unsigned operands.operands.

MACSUMACSU is equivalent to is equivalent to MACMAC syntax 2, but with field1 syntax 2, but with field1 operand unsigned.operand unsigned.

Inst field1 field2 field3 words1 MPYU Smem, src 12 MACSU Xmem, Ymem, src 1


ESIEE, Slide 17

General Purpose ArithmeticGeneral Purpose Arithmetic Extended precision multiplicationExtended precision multiplication

Principle :Principle :

LookLook at a code exemple at a code exemple

MPYU

MACSU

MAC

MACSU


ESIEE, Slide 18


SquareSquare

1: field2=field1 1: field2=field1 field1 field1 2: field2=2: field2=A(32-16) A(32-16) A(32-16) A(32-16) 11

3: field2=field2 3: field2=field2 (field1(field1field1)field1)

Inst field1 field2 words1 SQUR Smem, dst 12 SQUR A, dst 13 SQURA/SQURS Smem, src 1

11A(32-16)A(32-16) stands for the 16bits MSB of accumulator stands for the 16bits MSB of accumulator AA


ESIEE, Slide 19


DivideDivide Division is implemented by using repeated conditional Division is implemented by using repeated conditional

subtraction. subtraction.

Perform a single cycle 1-bit unsigned divide instruction:Perform a single cycle 1-bit unsigned divide instruction: Dividend (numerator) is in LSB of Dividend (numerator) is in LSB of srcsrc and divisor in and divisor in SmemSmem, ,

then :then :

after operation the quotient is LSB of after operation the quotient is LSB of srcsrc and remainder in and remainder in MSB of MSB of src.src.

Inst field1 field2 words1 SUBC Smem, src 1

(src) - (Smem) << 15 --> ALU output(src) - (Smem) << 15 --> ALU output

If If ALU output ALU output 00

ThenThen (ALU output) << 1 + 1 --> src(ALU output) << 1 + 1 --> src

ElseElse (src) << 1 --> src(src) << 1 --> src


ESIEE, Slide 20

General Purpose ArithmeticGeneral Purpose Arithmetic Division Routine Division Routine ((MoreMore examples) examples)

LD @den,16,A

MPYA @num B = num*den (tells sign)

ABS A Strip sign of denominator

STH A,@den

LD @num,A

ABS A Strip sign of numerator

RPT #15 16 iterations

SUBC @den,A 1-bit divide

XC 1,BLT If result needs to be negative

NEG A Invert sign

STL A,@quot Store negative result


ESIEE, Slide 21

Miscellaneous ArithmeticMiscellaneous Arithmetic ABSABS src,[dst]src,[dst] ; compute the absolute ; compute the absolute

value of value of srcsrc and store it in and store it in dst dst if specified if specified srcsrc otherwise (otherwise (dst=|src|dst=|src|).).

NEG src,[dst]NEG src,[dst] ; store the 2s complement of ; store the 2s complement of srcsrc in in dstdst if specified, if specified, srcsrc otherwise ( otherwise (dst=-srcdst=-src).).

MAX(dst)MAX(dst),,MIN(dst)MIN(dst) ; store in ; store in dstdst the greatest the greatest ((resp.resp. the lowest) between the lowest) between AA and and BB accumulator accumulator ((dst=MAX(A,B)dst=MAX(A,B),, dst=MIN(A,B) dst=MIN(A,B)). ).


ESIEE, Slide 22

Miscellaneous ArithmeticMiscellaneous Arithmetic

EXPEXP//NORMNORM :Tools for fixed point to (block)- :Tools for fixed point to (block)-floating point conversionfloating point conversion11 Store the high part of the accumulator (Store the high part of the accumulator (AA or or BB) in ) in

« Mantissa*2^Exponent » form.« Mantissa*2^Exponent » form.

EXP srcEXP src ; compute the number of shift necessary ; compute the number of shift necessary to normalize the high part of the accumulator to normalize the high part of the accumulator srcsrc and store it in and store it in T T register (register (T=EXP(src)T=EXP(src)) . Here ) . Here T=3T=3 after operation. Because of the guard bits, after operation. Because of the guard bits, TT could be negative after operation.could be negative after operation.

1 1 see ch13 «Numerical Issues » for in depth explanation of floating point formatsee ch13 «Numerical Issues » for in depth explanation of floating point format

Guard bitsGuard bits High partHigh part low partlow part

……/…/…


ESIEE, Slide 23

Miscellaneous ArithmeticMiscellaneous Arithmetic NORM src,[dst] NORM src,[dst] ; The contents of the accumu-; The contents of the accumu-

lator lator srcsrc is shifted according to the value in is shifted according to the value in TT reg reg ans stored in ans stored in dstdst if specified, if specified, src src otherwise . otherwise . ((dst=src<<TSdst=src<<TS).).Accumulator after operation :Accumulator after operation :

Guard bitsGuard bits High partHigh part low partlow part

.bss Mantissa,2,1.bss Mantissa,2,1ExpoExpo .set Mantissa+1.set Mantissa+1

.text.text

.bss Mantissa,2,1.bss Mantissa,2,1ExpoExpo .set Mantissa+1.set Mantissa+1

.text.text

; A = 1234h; A = 1234hformat: LD #Mantissa,DPformat: LD #Mantissa,DP

EXP AEXP A NORM ANORM A STST T,@ExpoT,@Expo STH A,@MantissaSTH A,@Mantissa

; A = 1234h; A = 1234hformat: LD #Mantissa,DPformat: LD #Mantissa,DP

EXP AEXP A NORM ANORM A STST T,@ExpoT,@Expo STH A,@MantissaSTH A,@Mantissa 0012

48D0MantissaMantissaExpoExpo


ESIEE, Slide 24

Miscellaneous ArithmeticMiscellaneous Arithmetic

Rounding and saturation are intended for Rounding and saturation are intended for finite precision and finite dynamic number finite precision and finite dynamic number representation :representation : RND src,[dst] RND src,[dst] ; the high part of the accumulator ; the high part of the accumulator

srcsrc is rounded up and stored either in is rounded up and stored either in dst dst if if specified or in specified or in src src ((dst=rnd(src)dst=rnd(src)).).

SAT src SAT src ; If ; If srcsrc is greater than 32767 then is greater than 32767 then src src is is set to 32767 (007FFFh). If set to 32767 (007FFFh). If srcsrc is lower than –32768 is lower than –32768 then then src src is set to –32768 (FF8000h) is set to –32768 (FF8000h) ((SATURATE(src)SATURATE(src)). ).

Before roundingBefore rounding After roundingAfter rounding


ESIEE, Slide 25

Polynomial EvaluationPolynomial Evaluation Considering the 3rd order polynomial:Considering the 3rd order polynomial:

It can be computed as:It can be computed as:

in an iterative way:in an iterative way:

012

23

3 axaxaxay 01

22

33 axaxaxay

0123 axaxaxay 0123 axaxaxay

;

;

;

0

1

23

ayxy

ayxy

axay

;

;

;

0

1

23

ayxy

ayxy

axay

……/…/…


ESIEE, Slide 26

Polynomial EvaluationPolynomial Evaluation Before using Before using POLYPOLY instruction, we have to load instruction, we have to load TT

reg. with the proper value for reg. with the proper value for xx.. POLY SmemPOLY Smem ; The high part of accumulator ; The high part of accumulator AA is is

multiplied by multiplied by TT reg. then added with the high part reg. then added with the high part of of BB and stored in and stored in A. A. The high part of accumula-The high part of accumula-tor tor BB is loaded with the contents of is loaded with the contents of SmemSmem (current (current coefficient) (coefficient) (POLY(Smem)POLY(Smem)).).

coefcoef .sect “COEF”.sect “COEF”.word 1234h,3456h;.word 1234h,3456h;.word 4567h,5678h;.word 4567h,5678h;.bss.bss y,1y,1.text.text

coefcoef .sect “COEF”.sect “COEF”.word 1234h,3456h;.word 1234h,3456h;.word 4567h,5678h;.word 4567h,5678h;.bss.bss y,1y,1.text.text

; A(15-0) = 7FFCh (x); A(15-0) = 7FFCh (x)PoEval: STLM A,TPoEval: STLM A,T

STM #COEF,AR1STM #COEF,AR1 LD *AR1+,16,ALD *AR1+,16,A LD *AR1+,16,BLD *AR1+,16,B RPT #2RPT #2 POLY *AR1+POLY *AR1+ STHSTH A,*(y) A,*(y)

; A(15-0) = 7FFCh (x); A(15-0) = 7FFCh (x)PoEval: STLM A,TPoEval: STLM A,T

STM #COEF,AR1STM #COEF,AR1 LD *AR1+,16,ALD *AR1+,16,A LD *AR1+,16,BLD *AR1+,16,B RPT #2RPT #2 POLY *AR1+POLY *AR1+ STHSTH A,*(y) A,*(y)


ESIEE, Slide 27

Distance ComputationDistance Computation ABDSTABDST Xmem,YmemXmem,Ymem ; computes the L1 ; computes the L1

norm of the distance between 2 vectors according norm of the distance between 2 vectors according to (to (ABDST(Xmem,Ymem)ABDST(Xmem,Ymem)):):

1

0

N

iii yxD

1

0

N

iii yxD.bss X,10.bss X,10

.bss Y,10.bss Y,10

.bss D,1.bss D,1

.text.text

.bss X,10.bss X,10

.bss Y,10.bss Y,10

.bss D,1.bss D,1

.text.text

dist: STMdist: STM #X,AR2#X,AR2STMSTM #Y,AR3#Y,AR3RPTRPT #10#10ABDSTABDST *AR2+,*AR3+*AR2+,*AR3+STHSTH B,*(D)B,*(D)

dist: STMdist: STM #X,AR2#X,AR2STMSTM #Y,AR3#Y,AR3RPTRPT #10#10ABDSTABDST *AR2+,*AR3+*AR2+,*AR3+STHSTH B,*(D)B,*(D)


ESIEE, Slide 28

Distance ComputationDistance Computation SQDSTSQDST Xmem,YmemXmem,Ymem ; computes the ; computes the

squared Lsquared L22 norm of the distance between 2 vectors norm of the distance between 2 vectors according to (according to (SQDST(Xmem,Ymem)SQDST(Xmem,Ymem)):):

1

0

2N

iii yxD

1

0

2N

iii yxD.bss X,10.bss X,10

.bss Y,10.bss Y,10

.bss D,1.bss D,1

.text.text

.bss X,10.bss X,10

.bss Y,10.bss Y,10

.bss D,1.bss D,1

.text.text

dist: STMdist: STM #X,AR2#X,AR2STMSTM #Y,AR3#Y,AR3RPTRPT #10#10SQDSTSQDST *AR2+,*AR3+*AR2+,*AR3+STHSTH B,*(D)B,*(D)

dist: STMdist: STM #X,AR2#X,AR2STMSTM #Y,AR3#Y,AR3RPTRPT #10#10SQDSTSQDST *AR2+,*AR3+*AR2+,*AR3+STHSTH B,*(D)B,*(D)


ESIEE, Slide 29

Specific Filters IntructionsSpecific Filters Intructions

Symmetric FIR filtersSymmetric FIR filters11::An even length symetric FIR filter can be computed An even length symetric FIR filter can be computed

according to:according to:

which yields which yields NN multiplications, because of the multiplications, because of the symmetry of symmetry of h(k)h(k), the equation can be rewritten:, the equation can be rewritten:

yielding only yielding only N/2N/2 multiplications. This optimization multiplications. This optimization is handled by is handled by FIRSFIRS instruction. instruction.

y n h k x n kk

N

( ) ( ) ( )

0

1

y n h k x n kk

N

( ) ( ) ( )

0

1

12

0

1)()()(N

k

kNnxknxkhny

12

0

1)()()(N

k

kNnxknxkhny

1 1 see ch14 «FIR filter implementation » for a full treatment of this topicsee ch14 «FIR filter implementation » for a full treatment of this topic

……/…/…


ESIEE, Slide 30

Specific Filters IntructionsSpecific Filters Intructions FIRS Xmem,Ymem,pmad FIRS Xmem,Ymem,pmad ;The high part of ;The high part of

accumulator accumulator AA is multiplied by the content of is multiplied by the content of pmadpmad and accumulated in accumulator and accumulated in accumulator BB. . XmemXmem and and YmemYmem are added together and stored in the are added together and stored in the high part of accumulator high part of accumulator A A ((FIRS (Xmem,Ymem,pmad)FIRS (Xmem,Ymem,pmad)).).

At each step, At each step, FIRSFIRS do the following computation: do the following computation:

where where y(n)y(n) is in accumulator is in accumulator BB and and tmptmp in in accumulator accumulator AA..

;111

);()()(

kNnxknxtmp

khtmpnyny

;111

);()()(

kNnxknxtmp

khtmpnyny


ESIEE, Slide 31

Specific Filters IntructionsSpecific Filters Intructions

LMS AlgorithmLMS Algorithm11::LMS Adaptive filtering require to update coeffi-LMS Adaptive filtering require to update coeffi-

cients of the filter according to an error signal cients of the filter according to an error signal e(n)e(n) while computing the output of the filter while computing the output of the filter y(n). y(n). This This involves the following computations:involves the following computations:

At each step we have two computations: one for the At each step we have two computations: one for the filter tap and update of the coefficient tap.filter tap and update of the coefficient tap.

1 1 see ch16 «Adaptive Filter Implementation » for a full treatment of this topicsee ch16 «Adaptive Filter Implementation » for a full treatment of this topic

;each for )()( 2)()1(

ncomputatioerror );()(ˆ)(

outputfilter );()()(

kkk

kk

hknxnenhnh

nynyne

knxnhny

;each for )()( 2)()1(

ncomputatioerror );()(ˆ)(

outputfilter );()()(

kkk

kk

hknxnenhnh

nynyne

knxnhny

……/…/…


ESIEE, Slide 32

Specific Filters IntructionsSpecific Filters Intructions LMS Xmem,YmemLMS Xmem,Ymem ; ; XmemXmem is accumulated to the is accumulated to the

high part of accumulator high part of accumulator AA with rounding while with rounding while XmemXmem and and YmemYmem are multiplied and accumula-ted are multiplied and accumula-ted into accumulator into accumulator B B ((LMS(Xmem,Ymem)LMS(Xmem,Ymem)).).

At each step At each step LMSLMS do the following computations: do the following computations:

where where y(n)y(n) is in accumulator is in accumulator BB and and tmptmp in in accumu-lator accumu-lator AA. In addition others instructions . In addition others instructions have to store in accumulator have to store in accumulator AA the error times the the error times the adaptation step and store in adaptation step and store in XmemXmem the updated the updated coefficient value (coefficient value (ST||MPYST||MPY).).

);(

);()()()(

nhtmptmp

knxnhnyny

k

k

);(

);()()()(

nhtmptmp

knxnhnyny

k

k


ESIEE, Slide 33

Butterfly ComputationButterfly Computation

These instructions are only useful in dual 16 bits mode These instructions are only useful in dual 16 bits mode ((C16C16=1)=1)

1: 1: dstdst(31-16)=Lmem(31-16)+(31-16)=Lmem(31-16)+TREG TREG

dstdst(15-0)=Lmem(15-0)-(15-0)=Lmem(15-0)-TREGTREG

2: 2: dstdst(31-16)=Lmem(31-16)-(31-16)=Lmem(31-16)-TREG TREG

dstdst(15-0)=Lmem(15-0)+(15-0)=Lmem(15-0)+TREGTREG

3: 3: dstdst(31-16)=Lmem(31-16)-(31-16)=Lmem(31-16)-TREG TREG

dstdst(15-0)=Lmem(15-0)-(15-0)=Lmem(15-0)-TREGTREG

1 1 see ch22 «Viterbi Algorithm» for in depth explanation and see ch22 «Viterbi Algorithm» for in depth explanation and CMPS for other for other

Viterbi related instructionsViterbi related instructions

Inst field1 field2 words1 DADST Lmem, dst 12 DSADT Lmem, dst 13 DSUBT Lmem, dst 1

dd

-d-d

NN N+1N+1

-d-d

dd

NN N+1N+1


ESIEE, Slide 34

Logic InstructionsLogic Instructions

OverviewOverview LogicLogic

Comparison andComparison and bit test bit test

Shift Shift and rotateand rotate

ANDAND OROR XORXOR CMPLCMPL

ANDMANDM ORMORM XORMXORM

SFTLSFTL SFTASFTA SSFTCFTC

RORROR ROLROL ROLTCROLTC

CMPMCMPM CMPRCMPR CMPSCMPS

BITBIT BITFBITF


ESIEE, Slide 35

Logic InstructionsLogic Instructions Logic operations on accumulatorsLogic operations on accumulators

1,2,3: field4=field3 [1,2,3: field4=field3 [ + + ] field1*2] field1*2field2field2

Result is stored in field4 if present else in field3, shift is done Result is stored in field4 if present else in field3, shift is done according to field2 if present.according to field2 if present.

4: field4=field4 [4: field4=field4 [ + + ] field1*2] field1*2field2field2

field4 is used if present else field1is used instead, shift is done field4 is used if present else field1is used instead, shift is done according to field2 if present.according to field2 if present.

Shift field is recalled on the this Shift field is recalled on the this slideslide … …

field2=field1field2=field111

Result is stored in field2 if present else in field1Result is stored in field2 if present else in field1

Inst field1 field2 field3 field4 words1 AND/OR/XOR Smem, src 12 AND/OR/XOR #lk [,SHIFT1] src [,dst] 23 AND/OR/XOR #lk, 16, src [,dst] 24 AND/OR/XOR src [,SHIFT] [,dst] 1

Inst field1 field2 words1 CMPL src, [,dst] 1

11bit complementbit complement


ESIEE, Slide 36


Logic with memoryLogic with memory

field2=field1 [field2=field1 [ + + ] field2] field2 About About ANDMANDM look at look at BITFBITF

Inst field1 field2 words1 ANDM/ORM/XORM #lk, Smem 2


ESIEE, Slide 37


Comparison (memory)Comparison (memory)

Equality testEquality test TCTC=1 if field1==filed2, else =1 if field1==filed2, else TCTC=0=0

Comparison (auxiliary register)Comparison (auxiliary register)

Versatile comparisonVersatile comparison ARxARx is compared against is compared against AR0AR0 according to according to CC CC (field1) and (field1) and TCTC

is set if compare successis set if compare success

Inst field1 field2 words1 CMPM Smem, #lk 2

Inst field1 field2 words1 CMPR CC, ARx 2

00 =01 !=10 >

11 <


ESIEE, Slide 38


Compare, select, store (and remember)Compare, select, store (and remember)Intended for Viterbi algorithm (see Chapter 22 for an in depth treatment, Intended for Viterbi algorithm (see Chapter 22 for an in depth treatment,

see see DSADT and DADST for other Viterbi related instructions) for other Viterbi related instructions)xx

yy

??NN N+1N+1

Two paths arrive to a node of stage Two paths arrive to a node of stage N+1N+1

from stage from stage NN. Only one will be retained. Only one will be retained

according to its weight according to its weight xx or or yy..

Inst field1 field2 words1 CMPS src, Smem 1

We guess that src(32-16)=We guess that src(32-16)=xx and src(15-0)= and src(15-0)=yy then then

If src(32-16) > src(15-0) thenIf src(32-16) > src(15-0) then

Smem=src(31-16)Smem=src(31-16)

TRN=(TRN) << 1TRN=(TRN) << 1

TRN(0)=0TRN(0)=0

TC=0TC=0

Else src(32-16) Else src(32-16) src(15-0) then src(15-0) then

Smem=src(15-0)Smem=src(15-0)

TRN=(TRN) << 1TRN=(TRN) << 1

TRN(0)=1TRN(0)=1

TC=1TC=1

TransitionTransition

registerregister


ESIEE, Slide 39


Bit testBit test

BITBIT and and BITTBITT set set TCTC according to a bit value in a word according to a bit value in a word specified by the operand in field1. The bit number is specified specified by the operand in field1. The bit number is specified either by either by BITCBITC in case of syntax 1 or by in case of syntax 1 or by TT[3..0] register in case [3..0] register in case of syntax 2.of syntax 2.

Bit numbering is in reverse order, with 0 corresponding to the Bit numbering is in reverse order, with 0 corresponding to the MSB and 15 to the LSB.MSB and 15 to the LSB.

Bit field testBit field test

TCTC is set according the result of (field1 is set according the result of (field1 field2) field2) For this instruction look also at For this instruction look also at ANDMANDM

Inst field1 field2 words1 BIT Xmem, BITC 12 BITT Smem 1

Inst field1 field2 words1 BITF Smem, #lk 2


ESIEE, Slide 40


Shift and rotateShift and rotate

field3=field1 field3=field1 2 2field2field2

Inst field1 field2 field3 words1 SFTL/SFTA src, SHIFT [,dst] 1

For shift field content information see this For shift field content information see this slide

• Field 1 is left or right shifted according to the sign of Field 1 is left or right shifted according to the sign of SHIFTSHIFT and stored and stored

in field3 if present, field 1 otherwisein field3 if present, field 1 otherwise

• SFTLSFTL stands for LOGICAL shift : input bits are equal to 0 stands for LOGICAL shift : input bits are equal to 0

• SFTASFTA stands for ARITHMETIC shift : input low order bits are 0 in stands for ARITHMETIC shift : input low order bits are 0 in

case of positive case of positive SHIFTSHIFT. Input high order bits are equal to the sign bits. Input high order bits are equal to the sign bits

(if (if SXMSXM11 is set) when is set) when SHIFTSHIFT is negative. is negative.

11Sign Extension ModeSign Extension Mode

Inst field1 words1 SFTC src 1

•Shift conditionaly (Shift conditionaly (SFTCSFTC) apply to signed, one left shift is done to remove one ) apply to signed, one left shift is done to remove one

redundant sign bit (redundant sign bit (TCTC is then set) otherwise nothing is done and is then set) otherwise nothing is done and TCTC is reset. is reset.


ESIEE, Slide 41


Shift and rotateShift and rotate

RORROR perform one right rotate through the carry C on perform one right rotate through the carry C on src src (guard bits=0,(guard bits=0,srcsrc(31)=(31)=CC,,CC==srcsrc(0)).(0)).

ROLROL perform one left rotate through the carry C on perform one left rotate through the carry C on src src (guard bits=0,(guard bits=0,srcsrc(0)=(0)=CC,,CC==srcsrc(31)).(31)).

ROLTCROLTC perform one left rotate with perform one left rotate with TCTC as input and as input and CC as ouput. (guard bits=0,as ouput. (guard bits=0,srcsrc(0)=(0)=TCTC,,CC==srcsrc(31)).(31)).

Inst field1 words1 ROR/ROL/ROLTC src 1


ESIEE, Slide 42

Load, Store & Move Instructions Load, Store & Move Instructions Load & Store accumulatorLoad & Store accumulator

field3=field1* 2field3=field1* 2field2.field2.

shift is done according to field2 if present.shift is done according to field2 if present. Recall on Recall on SHIFTSHIFT field can be found at this field can be found at this placeplace

Inst field1 field2 field3 words1 LD Smem, dst 12 LD Smem, TS, dst 13 LD Smem, 16 dst 14. LD Smem, [SHIFT,] dst 25 LD Xmem, SHIFT1, dst 16 LD #K dst 17 LD #lk, [SHIFT1,] dst 28 LD #lk, 16, dst 29 LD src, [SHIFT] [,dst] 110 LD src, ASM [,dst] 1


ESIEE, Slide 43

Load, Store & Move InstructionsLoad, Store & Move Instructions

Load & Store accumulatorLoad & Store accumulator

1: field2=field1*21: field2=field1*21616+ 2+ 21515 or or dstdst(32-16)=field1+0.5(32-16)=field1+0.5 2: field2=unsigned(field1) or 2: field2=unsigned(field1) or dstdst(32-16)=0 and (32-16)=0 and

dstdst(15-0)=field1.(15-0)=field1. 3: particular case of syntax1 3: particular case of syntax1 LDLD for Memory Map for Memory Map

Register. Register.

Inst field1 field2 words1 LDR Smem, dst 12 LDU Smem, dst 1

3 LDM MMR, dst 1


ESIEE, Slide 44


Load & Store accumulatorLoad & Store accumulator

STLSTL store store srcsrc(15-0) and (15-0) and STH STH srcsrc(32-16)(32-16) field3= field1* 2field3= field1* 2field2field2

shift is done according to field2 if present.shift is done according to field2 if present. Recall on Recall on SHIFTSHIFT field can be found at this field can be found at this placeplace

Same as syntax1 Same as syntax1 STL STL above except that field3 is a above except that field3 is a Memory Map Register.Memory Map Register.

Inst field1 field2 field3 words1 STL/STH src, Smem 12 STL/STH src, ASM, Smem 13 STL/STH src, SHIFT1, Xmem 14. STL/STH src, [SHIFT,] Smem 2

Inst field1 field2 field3 words1 STLM src, MMR 1


ESIEE, Slide 45


Load & Store other registersLoad & Store other registers

Allows initialization of Allows initialization of TT, , DPDP or or ASMASM either from either from memory (memory (SmemSmem) or from an immediate value.) or from an immediate value.

##k3k3, #, #k5k5 and # and #k9k9 stand respectively for 3, 5 and 9 bits stand respectively for 3, 5 and 9 bits immediate value.immediate value.

ARPARP is only intended for ‘C25 compatibility mode and is only intended for ‘C25 compatibility mode and is not of interest in native ‘C54x software.is not of interest in native ‘C54x software.

Inst field1 field2 words1 LD Smem, T 12 LD Smem, DP 13 LD #k9 DP 14 LD #k5 ASM 15 LD Smem, ASM 16 LD #k3 ARP 1


ESIEE, Slide 46

Load, Store & Move InstructionsLoad, Store & Move Instructions Save others registers or write immediate to Save others registers or write immediate to

memory.memory.

field1=field2.field1=field2. Syntax 3 allows initialization of any data memory location with Syntax 3 allows initialization of any data memory location with

an immediate value.an immediate value.

Write an immediate 16 bit value into any Memory Map Write an immediate 16 bit value into any Memory Map Register. Register.

Inst field1 field2 words1 STM #lk, MMR 2

Inst field1 field2 words1 ST T, Smem 1

2 ST TRN, Smem 13 ST #lk, Smem 2


ESIEE, Slide 47

Load, Store & Move InstructionsLoad, Store & Move Instructions Direct transfer from memory to memoryDirect transfer from memory to memory

IO prog data data (MMR)IO PORTR

progMVPD

READA

data PORTWMVDP WRITA

MVDD MVDK MVKD

MVDM

data(MMR) MVMD MVMM

Destination SpaceDestination Space

Sour

ce s

pace

Sour

ce s

pace


ESIEE, Slide 48


Data space Data space IO space IO space

field2=field1field2=field111

Data space Data space Prog. space Prog. space

1,3: field2=field11,3: field2=field111

2: source prog. memory address is specified by 2: source prog. memory address is specified by AA(15-0)(15-0) 4: destination prog. memory address is specified by 4: destination prog. memory address is specified by

AA(15-0)(15-0)

Inst field1 field2 words1 PORTR PA, Smem 22 PORTW Smem, PA 2

11 0 0 PA PA 65535, 0 65535, 0 pmad pmad 65535 65535

Inst field1 field2 words1 MVPD pmad, Smem 22 READA Smem 13 MVDP Smem, pmad 24 WRITA Smem 1


ESIEE, Slide 49


Data space Data space Data space Data space

Data space Data space MMR MMR

MMR MMR MMR MMR11

Inst field1 field2 words1 MVDD Xmem, Ymem 12 MVKD pmad, Smem 23 MVDK Smem, pmad 2

Inst field1 field2 words1 MVDM dmad MMR 22 MVMD MMR, dmad 2

Inst field1 field2 words1 MVMM MMR1 MMR2 1

11MMR1,MMR2:AR0-AR7, SP onlyMMR1,MMR2:AR0-AR7, SP only


ESIEE, Slide 50

Program Control InstructionsProgram Control Instructions

† Values for words (W) and cycles assume the use of DARAM for data.

‡ Conditions true

§ Condition false

¶ Delayed instruction

Syntax Expression W† Cycles

B[D] pmad PC = pmad(15–0) 2 4/[2¶]BACC[D] src PC = src(15–0) 1 6/[4¶]

BANZ[D] pmad, Sind if (Sind !=00) then PC = pmad(15–0) 24‡/2§/[

2¶]

BC[D] pmad, cond [ , cond [ , cond] ] if (cond(s)) then PC = pmad(15–0) 25‡/3§/[

3¶]FB[D] extpmad PC = pmad(15–0), XPC = pmad(22–16) 2 4/[2¶]FBACC[D] src PC = src(15–0), XPC = src(22–16) 1 6/[4¶]

Branch instructions


ESIEE, Slide 51



‡ Conditions true, § Condition false, ¶ Delayed instruction


CALA[D] src– –SP, PC + 1[3¶] = TOS,

PC = src(15–0)1 6/[4¶]

CALL[D] pmad– –SP, PC + 2[4¶] = TOS,

PC = pmad(15–0)2 4/[2§]

CC[D] pmad, cond [ , cond [ , cond]]if (cond(s)) then – –SP,

PC + 2[4¶] = TOS, PC = pmad(15–0)

25‡/3§/[

3¶]

FCALA[D] src– –SP, PC + 1 [3¶] = TOS,

PC = src(15–0), XPC = src(22–16)

1 6/[4¶]

FCALL[D] extpmad– –SP, PC + 2[4¶] = TOS,

PC = pmad(15–0), XPC = pmad(22–16)

2 6/[4¶]

FBACC[D] srcPC = src(15–0),

XPC = src(22–16)2 6/[4¶]

Call instructions


ESIEE, Slide 52



INTR K– –SP, + + PC = TOS,

PC = IPTR(15–7) + K << 2, INTM = 1

1 3

TRAP K– –SP, + + PC = TOS,

PC = IPTR(15–7) + K << 21 3

Interrupt instructions



ESIEE, Slide 53

Program Control InstructionsProgram Control InstructionsSyntax Expression W† Cycles

FRET[D]XPC = TOS, ++ SP,

PC = TOS, ++SP1 6/[4¶]

FRETE[D]XPC = TOS, ++ SP, PC = TOS, ++SP,

INTM = 01 6/[4¶]

RC[D] cond [ , cond [ , cond] ]if (cond(s)) then PC

= TOS, ++SP2

5‡/3§/[3¶]

RET[D] PC = TOS, ++SP 1 5/[3¶]

RETE[D]PC = TOS, ++SP,

INTM = 01 5/[3¶]

RETF[D]PC = RTN, ++SP,

INTM = 01 3/[1¶]

Return instructions




ESIEE, Slide 54


RPT SmemRepeat single, RC = Smem

1 3

RPT # KRepeat single,

RC = #K1 1

RPT # lkRepeat single,

RC = #lk2 2

RPTB[D] pmad

Repeat block, RSA = PC + 2[4¶],

REA = pmad, BRAF = 1

2 4/[2¶]

RPTZ dst, # lkRepeat single,

RC = #lk, dst = 02 2

Repeat instructions




ESIEE, Slide 55



FRAME K SP = SP + K 1 1

POPD Smem Smem = TOS, ++SP 1 1

POPM MMR MMR = TOS, ++SP 1 1

PSHD Smem – –SP, Smem = TOS 1 1

PSHM MMR – –SP, MMR = TOS 1 1

Stack-Manipulating instructions


Add 1 word and 1 cycle when using long-offset indirect

addressing or absolute addressing with an Smem.


ESIEE, Slide 56


IDLE K idle(K) 1 4

MAR Smem

If CMPT = 0, then modify ARx If CMPT = 1 and ARx != AR0

then modify ARx, ARP = x If CMPT = 1 and ARx = AR0,

then modify AR(ARP)

1 1

NOP no operation 1 1RESET software reset 1 3RSBX N, SBIT STN (SBIT) = 0 1 1SSBX N, SBIT STN (SBIT) = 1 1 1

XC n , cond [ , cond[ , cond]If (cond(s)) then execute the next n instructions; n = 1 or 2

1 1

Miscellaneous Program-Control instructions

† Values for words (W) and cycles assume the use of DARAM for data. Add 1 word and 1 cycle when using long-offset indirect addressing or absolute addressing with an Smem.


ESIEE, Slide 57

C55x Assembly LanguageC55x Assembly Language

Instructions set Instructions set generalitiesgeneralities C55 C55 Parallelism features and rulesParallelism features and rules C55 C55 mnemonicmnemonic instruction set instruction set summarysummary


ESIEE, Slide 58

C55 Instruction set generalitiesC55 Instruction set generalities

General syntax of an instruction :General syntax of an instruction : InstrInstr Op1,[Op2,[Op3,[…]]] Op1,[Op2,[Op3,[…]]]

For For InstrInstr field refer to TI documentation or field refer to TI documentation or following slides.following slides.

The The Op1,[Op2,[Op3,[…]]]Op1,[Op2,[Op3,[…]]] field syntax is field syntax is specified in instruction documentation and specified in instruction documentation and specifies the way (type of addressing mode) specifies the way (type of addressing mode) you could use for the operands.you could use for the operands.

Parallelism is presented in a following sectionParallelism is presented in a following section Terms, symbols and abbreviations are Terms, symbols and abbreviations are

presented in spru374.pdfpresented in spru374.pdf


ESIEE, Slide 59

C55 Instruction set generalitiesC55 Instruction set generalities

Instruction size can be:Instruction size can be: 8, 16, 24, 32, 40 or 48 bis.8, 16, 24, 32, 40 or 48 bis.

In the mnemonic syntax, instruction names In the mnemonic syntax, instruction names are constituted with:are constituted with: a root and potentially a prefix and a suffix.a root and potentially a prefix and a suffix. There are:There are:

37 roots (see next slides)37 roots (see next slides) 2 Prefixes (see next slides)2 Prefixes (see next slides) 13 Suffixes (see next slides)13 Suffixes (see next slides)


ESIEE, Slide 60

List of roots for instruction names 1/2List of roots for instruction names 1/2Root MeaningABS Absolute valueADD AdditionAND Bitwise AND

B BranchCALL Function callCLR Assign the value to 0CMP CompareCNT CountEXP ExponentMAC Multiply and accumulateMAR Modify auxiliary register contentMAS Multiply and subtractMAX MaximumMIN MinimumMOV Move dataMPY MultiplyNEG Negate (2s complement)NOT Bitwise complement (1s complement)


ESIEE, Slide 61

List of roots for instruction names 2/2List of roots for instruction names 2/2Root MeaningOR Bitwise OR

POP Pop from top of the stackPSH Push to top of the stackRET ReturnROL Rotate leftROR Rotate rightRPT RepeatSAT SaturateSET Assign the value to 1

SFT Shift (left or right depending on sign of shift count)

SQA Square and addSQR SquareSQS Square and subtractSUB Subtraction

SWAP Swap register contentsTST Test bitXOR Bitwise exclusive-OR (XOR)XPA ExpandXTR Extract


ESIEE, Slide 62

List of prefixes for instruction namesList of prefixes for instruction names

Prefix Meaning

A

Instruction happens in address phase and is subject to circular addressing effects. Also, it occurs in the DAGEN functional unit and cannot be placed in parallel with any instruction that uses dual addressing mode.

BBit instruction. Note that B is also a root (branch), suffix (borrow), and prefix (bit). The differences in context should prevent any confusion.


ESIEE, Slide 63

List of suffixes for instruction namesList of suffixes for instruction namesSuffix Meaning

40 Enables the M40 mode (all 40 bits of the accumulator count)

B Borrow

C CarryCC Conditional

I Enable interruptsK Multiply has a constant operandL Logical shift (left or right depending on sign of shift count)

MThis instruction has the option of assigning a memory operand to T3 regardless of whether that assignment actually occurs.

R RoundS Signed shift (left or right depending on sign of shift count)U UnsignedV Absolute valueZ Delay on the memory operand


ESIEE, Slide 64

Operand modifiersOperand modifiers

Modifier Meaningdlb Access a true 32-bit memory operand

dualAccess a 32-bit memory operand for use as two independent 16-bit halves of the given operation

Hi Access upper 16 bits of the accumulatorhigh_byte Access the high byte of the memory location

LO Access lower 16 bits of the accumulatorlow_byte Access the low byte of the memory location

pair Dual register accessrnd Round

saturate Saturateuns Unsigned operand (not used in MOV instructions)


ESIEE, Slide 65

Addressing-mode operandsAddressing-mode operandsSyntax element Description

Baddr

When an instruction contains Baddr, that instruction can access one or two bits in an accumulator (AC0–AC3), an auxiliary register (AR0–AR7), or a temporary register (T0–T3). Only the register bit test/set/clear/complement instructions support Baddr. As y

CmemWhen an instruction contains Cmem, that instruction can access a single word (16 bits) of data from data memory. As you write the instruction, replace Cmem with a compatible operand.

LmemWhen an instruction contains Lmem, that instruction can access a long word (32 bits) of data from data memory or from a memory-mapped registers. As you write the instruction, replace Lmem with a compatible operand.

Smem

When an instruction contains Smem, that instruction can access a single word (16 bits) of data from data memory, from I/O space, or from a memory-mapped register. As you write the instruction, replace Smem with a compatible operand.

Xmem and YmemWhen an instruction contains Xmem and Ymem, that instruction can perform two simultaneous 16-bit accesses to data memory. As you write the instruction, replace Xmem and Ymem with compatible operands.


ESIEE, Slide 66

Absolute addressing modesAbsolute addressing modes

Addressing mode Description

k16 absolute

k16 absolute This mode uses the 7-bit register called DPH (high part of the extended data page register) and a 16-bit unsigned constant to form a 23-bit data-space address. This mode is used to access a memory location or a memory-mapped register.

k23 absoluteThis mode enables you to specify a full address as a 23-bit unsigned constant. This mode is used to access a memory location or a memory-mapped register.

I/O absoluteThis mode enables you to specify an I/O address as a 16-bit unsigned constant. This mode is used to access a location in I/O space.


ESIEE, Slide 67

Direct addressing modesDirect addressing modes


DP direct

This mode uses the main data page specified by DPH (high part of the extended data page register) in conjunction with the data page register (DP). This mode is used to access a memory location or a memory-mapped register.

SP direct

This mode uses the main data page specified by SPH (high part of the extended stack pointers) in conjunction with the data stack pointer (SP). This mode is used to access stack values in data memory.

Register-bit directThis mode uses an offset to specify a bit address. This mode is used to access one register bit or two adjacent register bits.

PDP directThis mode uses the peripheral data page register (PDP) and an offset to specify an I/O address. This mode is used to access a location in I/O space.


ESIEE, Slide 68

Indirect addressing modesIndirect addressing modes


AR indirect

This mode uses one of eight auxiliary registers (AR0–AR7) to point to data. The way the CPU uses the auxiliary register to generate an address depends on whether you are CPU uses the auxiliary register to generate an address depends on whether you are or

Dual AR indirectThis mode uses the same address-generation process as the AR i ndirect addressing mode. This mode is used with instructions that access two or more data-memory locations.

CDP indirect

This mode uses the coefficient data pointer (CDP) to p oint to data. The way the CPU uses CDP to generate an address depends on whether you are accessing data space (memory or memory-mapped registers), individual register bits, or I/O space.

Coefficient indirect

This mode uses the same address-generation process as the CDP indirect addressing mode. This mode is available to support instructions that can access a coefficient in data memory at the same time they access two other data-memory values using the dual AR


ESIEE, Slide 69

C55 Parallelism features and rulesC55 Parallelism features and rules The C55x DSP architecture enables to execute The C55x DSP architecture enables to execute

two instructions intwo instructions in parallel within the same parallel within the same cycle of execution. The types of parallelism cycle of execution. The types of parallelism are:are: Built-in parallelism within a single instruction.Built-in parallelism within a single instruction.

Some instructions perform two different operations in Some instructions perform two different operations in parallel. Doubleparallel. Double colons, ::, are used to separate the two colons, ::, are used to separate the two operations.operations.

User-defined parallelism between two User-defined parallelism between two instructions.instructions. Two instructions may be paralleled by Two instructions may be paralleled by the programmerthe programmer

or the C compiler. The parallelor the C compiler. The parallel bars, ||, are used to bars, ||, are used to separate the two instructions to be executed in parallel.separate the two instructions to be executed in parallel.

Built-in parallelism can be combined with user-Built-in parallelism can be combined with user-defined parallelism. defined parallelism.


ESIEE, Slide 70

Rules for Simple and parallel instructionsRules for Simple and parallel instructions Simple instructions are not allowed to span multiple Simple instructions are not allowed to span multiple

lines. One exception,lines. One exception, single instructions that use the single instructions that use the double colons, ::, notation to imply parallelism.double colons, ::, notation to imply parallelism. These instructions may be split up following the :: These instructions may be split up following the ::

notation.notation. E Example xample of aof a single instruction single instruction on on two lines:two lines: MPYR40 uns(Xmem), uns(Cmem), ACx :: MPYR40 uns(Ymem), uns(Cmem), Acy

User-defined parallelism instructions (using || User-defined parallelism instructions (using || notation) are allowed tonotation) are allowed to span multiple lines. For span multiple lines. For example, all of the following instructions are legal:example, all of the following instructions are legal: MOV AC0, AC1 || MOV AC2, AC3 MOV AC0, AC1 || MOV AC2, AC3 MOV AC0, AC1 || MOV AC2, AC3 MOV AC0, AC1 || MOV AC2, AC3


ESIEE, Slide 71

Built-In vs. User-Defined ParallelismBuilt-In vs. User-Defined Parallelism

Built-In (Implied)Built-In (Implied) - separate, unique instructions (you can look ‘em up) that - separate, unique instructions (you can look ‘em up) that are built-in to the instruction set (actually ONE instruction) are built-in to the instruction set (actually ONE instruction) - use a “::” to separate the pieces: - use a “::” to separate the pieces:

User-Defined (Explicit)User-Defined (Explicit) - determined by the user or compiler - determined by the user or compiler - user places a double “pipe” | | to separate the two instructions: - user places a double “pipe” | | to separate the two instructions:

built_in: MAC *AR3+,*CDP+,AC0 :: MAC ...built_in: MAC *AR3+,*CDP+,AC0 :: MAC ...

user_defined: user_defined: MPYM *AR1+,*AR2+,AC1MPYM *AR1+,*AR2+,AC1 |||| AND AR3,T1 AND AR3,T1

ComboCombo - built-in and user-defined can be combined: - built-in and user-defined can be combined:

combo: ADD *AR3<<#16, AC0, AC1combo: ADD *AR3<<#16, AC0, AC1 :: MOV HI(AC1 << T2), *AR4:: MOV HI(AC1 << T2), *AR4 || MOV #3,AR1|| MOV #3,AR1

Let’s find out more about user-defined parallelism...Let’s find out more about user-defined parallelism...

Parallel instructions execute in a single cycle:Parallel instructions execute in a single cycle:


ESIEE, Slide 72

User-Defined ParallelismUser-Defined Parallelism

Can these two instructions be used in parallel?Can these two instructions be used in parallel?

First, we need to know which processorFirst, we need to know which processorresources are being used:resources are being used:

OperatorsOperators “Architecture components that perform“Architecture components that performtasks.” For example: ALUs, addresstasks.” For example: ALUs, addressgenerators, Dual-MAC (DMAC), etc.generators, Dual-MAC (DMAC), etc.

RegistersRegisters Tx, ARx, ACx, BRCx, BSAxx, etc.Tx, ARx, ACx, BRCx, BSAxx, etc.

BusesBuses Memory Read/Write, Constant, etc.Memory Read/Write, Constant, etc.

Let’s explore the types of operators first...Let’s explore the types of operators first...

MOV *AR1,AC1MOV *AR1,AC1|| || ADD @var,AR2 ADD @var,AR2


ESIEE, Slide 73

OperatorsOperators

Each unit has multiple load/store operators (shown later)Each unit has multiple load/store operators (shown later) All other operators are single useAll other operators are single use

LOADLOAD STORESTORE

REGISTERREGISTERFILEFILE

CONTROLCONTROL

PUPU

REGISTERREGISTERFILEFILE SWAPSWAP

ALU, SHIFTER, DMACALU, SHIFTER, DMAC

STORESTOREw/SHFTw/SHFT

DUDU

LOADLOAD STORESTORE


ALU w/SHIFTALU w/SHIFT

AUAU

LOADLOAD STORESTORE SSWWAAPP

DAGENDAGEN

XX YY CC

Which operators are used in the instruction above?Which operators are used in the instruction above?

Let’s review the contents of the register files...Let’s review the contents of the register files...

MOV *AR1,AC1MOV *AR1,AC1|| || ADD @var,AR2 ADD @var,AR2


ESIEE, Slide 74

Register FilesRegister Files

LOADLOAD STORESTORE


PUPU



DUDU

LOADLOAD STORESTORE


AUAU


DAGENDAGEN

XX YY CC

RPTC RPTC BRC0-1 BRC0-1

RSA0-1RSA0-1 REA0-1 REA0-1

RETARETA CSR CSR

AC0-3 TRN0-1AC0-3 TRN0-1

Following is the list of registers contained in each unit:Following is the list of registers contained in each unit:

Next, let’s review the buses...Next, let’s review the buses...

T0-3 XAR0-7 T0-3 XAR0-7 XSP/SSPXSP/SSP

Xreg XCDPXreg XCDP XDPXDP

BKxx BSAxxBKxx BSAxx ST0-3ST0-3

PDPPDP


ESIEE, Slide 75

Read/Write BusesRead/Write Buses

LOADLOAD STORESTORE


CONTROLCONTROL

PUPU




DUDU

LOADLOAD STORESTORE



AUAU


DAGENDAGEN

XX YY CC

Cross-Unit BusesCross-Unit Buses

Read (BB, CB, DB) Write (EB, FB)Read (BB, CB, DB) Write (EB, FB)

One load/store operator exists forOne load/store operator exists foreach bus entering/leaving the unit.each bus entering/leaving the unit.

EE FF EE FF EE FFCC DD CC DD CC DDBB


ESIEE, Slide 76

Examples of Operator/Bus UsageExamples of Operator/Bus UsageInstructionsInstructions OperatorsOperators BusesBuses

MPYM *AR1,T0,AC0 D-Unit ALU/MAC/SH D-Bus

A-Unit DAGEN X

MOV @var,BRC1MOV @var,BRC1 P-Unit LoadP-Unit Load D-BusD-Bus

A-Unit DAGEN XA-Unit DAGEN X

MOV T1,@varMOV T1,@var A-Unit StoreA-Unit Store E-BusE-Bus

A-Unit DAGEN XA-Unit DAGEN X

MOV dbl(@long),AC0MOV dbl(@long),AC0 D-Unit LoadD-Unit Load C-BusC-Bus

A-Unit DAGEN X A-Unit DAGEN X D-BusD-Bus


ESIEE, Slide 77

User-Defined Parallelism - 3 Basic RulesUser-Defined Parallelism - 3 Basic RulesTwo instructions can be written in parallel if:Two instructions can be written in parallel if:

2.2. Total size is less than or equal to Total size is less than or equal to 6 bytes6 bytes

1.1. There is There is no hardware resourceno hardware resource conflict conflict (operands, buses and registers) (operands, buses and registers)

3.3. For instructions with 2 memory accesses, each mustFor instructions with 2 memory accesses, each mustuse use indirectindirect addressing (soft-dual addressing) addressing (soft-dual addressing) Ex: Ex: MOV *AR2,AC0 || MOV T1,*AR1MOV *AR2,AC0 || MOV T1,*AR1

If the instructions meet these rules, If the instructions meet these rules, write it and assemblewrite it and assemble. .

If the assembler doesn’t like it, refer to the “advanced”If the assembler doesn’t like it, refer to the “advanced”rules at the end of this module.rules at the end of this module.


ESIEE, Slide 78

Parallelism - Exercise 1Parallelism - Exercise 1

MOV *AR1,AC1MOV *AR1,AC1 ;bytes;bytes:2 :2 || || ADD @var,AR2ADD @var,AR2 ;bytes;bytes:3 :3

LOADLOAD STORESTORE


CONTROLCONTROL

PUPU




DUDU

LOADLOAD STORESTORE



AUAU


DAGENDAGEN

XX YY CC





ESIEE, Slide 79

Parallelism - Exercise 1 (Solution)Parallelism - Exercise 1 (Solution)

MOV *AR1,AC1MOV *AR1,AC1 ;bytes;bytes:2 :2 || || ADD @var,AR2ADD @var,AR2 ;bytes;bytes:3 :3

Reason: No soft-dual addressingReason: No soft-dual addressing

LOADLOAD STORESTORE


CONTROLCONTROL

PUPU




DUDU

LOADLOAD STORESTORE



AUAU


DAGENDAGEN

XX YY CC




DAGENDAGEN

XX YY


LOADLOADLOADLOAD


ESIEE, Slide 80


MOV dbl(*(AR1+T0)),AC0 MOV dbl(*(AR1+T0)),AC0 ;bytes;bytes:3:3|||| MOV AC2,dbl(*AR2+) MOV AC2,dbl(*AR2+) ;bytes;bytes:3:3

LOADLOAD STORESTORE


CONTROLCONTROL

PUPU




DUDU

LOADLOAD STORESTORE



AUAU


DAGENDAGEN

XX YY CC





ESIEE, Slide 81


MOV dbl(*(AR1+T0)),AC0 MOV dbl(*(AR1+T0)),AC0 ;bytes;bytes:3:3|||| MOV AC2,dbl(*AR2+) MOV AC2,dbl(*AR2+) ;bytes;bytes:3:3

It works.It works.

LOADLOAD STORESTORE


CONTROLCONTROL

PUPU




DUDU

LOADLOAD STORESTORE



AUAU


DAGENDAGEN

XX YY CC




LOADLOAD STORESTORE

XX YY

DAGENDAGEN


ESIEE, Slide 82


CALL AC0 CALL AC0 ;bytes:2;bytes:2 |||| MOV AC1,AR1 MOV AC1,AR1 ;bytes:2;bytes:2

LOADLOAD STORESTORE


CONTROLCONTROL

PUPU




DUDU

LOADLOAD STORESTORE



AUAU


DAGENDAGEN

XX YY CC





ESIEE, Slide 83


CALL AC0 CALL AC0 ;bytes:2;bytes:2 |||| MOV AC1,AR1 MOV AC1,AR1 ;bytes:2;bytes:2

Reason: cross-unit bus conflictReason: cross-unit bus conflict

LOADLOAD STORESTORE


CONTROLCONTROL

PUPU




DUDU

LOADLOAD STORESTORE



AUAU


DAGENDAGEN

XX YY CC






CONTROLCONTROL


ESIEE, Slide 84


MAC *AR1+, *CDP+, AC0>>#16 MAC *AR1+, *CDP+, AC0>>#16 :::: MAC *AR2+, *CDP+, AC1>>#16 ;bytes:MAC *AR2+, *CDP+, AC1>>#16 ;bytes:44|||| RPTADD CSR,T1 RPTADD CSR,T1 ;bytes:2 ;bytes:2

LOADLOAD STORESTORE


CONTROLCONTROL

PUPU




DUDU

LOADLOAD STORESTORE



AUAU


DAGENDAGEN

XX YY CC





ESIEE, Slide 85


MAC *AR1+, *CDP+, AC0>>#16 MAC *AR1+, *CDP+, AC0>>#16 :::: MAC *AR2+, *CDP+, AC1>>#16 ;bytes:MAC *AR2+, *CDP+, AC1>>#16 ;bytes:44|||| RPTADD CSR,T1 RPTADD CSR,T1 ;bytes:2 ;bytes:2

LOADLOAD STORESTORE


CONTROLCONTROL

PUPU




DUDU

LOADLOAD STORESTORE



AUAU


DAGENDAGEN

XX YY CC




It works...It works...

ALU, SHIFTER, DMACALU, SHIFTER, DMACCONTROLCONTROL ALU w/SHIFTALU w/SHIFT

XX YY CC

DAGENDAGEN


ESIEE, Slide 86

Optimization SuggestionsOptimization Suggestions There are two main types of code you can optimize: There are two main types of code you can optimize:

SetupSetup

Inner LoopInner Loop

Usually contains instructions w/large constants such as:Usually contains instructions w/large constants such as: AMOV #x,XAR5AMOV #x,XAR5which cannot be placed in parallel anywaywhich cannot be placed in parallel anyway

Creating a long sequences of 5-6byte instructions Creating a long sequences of 5-6byte instructions couldcouldstall the IBQ (4-byte fetch only) and negate the benefitstall the IBQ (4-byte fetch only) and negate the benefit

Only run once - i.e. usually not inside a loopOnly run once - i.e. usually not inside a loop

Recommendation: don’t spend time on setup codeRecommendation: don’t spend time on setup code

Largest impact for parallel instructions and avoidingLargest impact for parallel instructions and avoidingpipeline stalls based on #loopspipeline stalls based on #loops

Usually contains simple/math instructions which haveUsually contains simple/math instructions which havean increased chance of placing in parallelan increased chance of placing in parallel

Usually inside RPTB/LOCAL - easy to locate exactly Usually inside RPTB/LOCAL - easy to locate exactly which code you should focus your efforts onwhich code you should focus your efforts on

Recommendation: FOCUS ALL YOUR TIME HERERecommendation: FOCUS ALL YOUR TIME HERE


ESIEE, Slide 87

User-Defined Parallelism - Advanced RulesUser-Defined Parallelism - Advanced RulesA. Address modifiers can not use 16/24-bit constants. A. Address modifiers can not use 16/24-bit constants.

B. If D-unit shift-and-store operator is used:B. If D-unit shift-and-store operator is used:

(a) D-unit ALU shifter can not be used and,(a) D-unit ALU shifter can not be used and,

(b) only two source accumulators in total are allowed.(b) only two source accumulators in total are allowed.

C. Instruction priority: C. Instruction priority:

(a) MMR access takes priority over internal access(a) MMR access takes priority over internal access MOV AR1,AC0 MOV AR1,AC0

|| MOV #0,mmap(@AC0L)|| MOV #0,mmap(@AC0L) ;takes ;takes prioritypriority

(b) In a bus conflict, 2nd value is used in both (b) In a bus conflict, 2nd value is used in both instructions (assembler flags warning)instructions (assembler flags warning)

MOV #3,AC0MOV #3,AC0

|| MOV #4,AC1 || MOV #4,AC1 ;takes priority ;takes priority

D. DAGEN tag combination must be allowedD. DAGEN tag combination must be allowed

E. One of the instructions must have a parallel enable bitE. One of the instructions must have a parallel enable bit


ESIEE, Slide 88

Soft-dual parallelismSoft-dual parallelism Instructions that reference memory operands Instructions that reference memory operands

do not have parallel enable bits.do not have parallel enable bits. Two such instructions may still be combined Two such instructions may still be combined

with a type of parallelism calledwith a type of parallelism called soft-dual soft-dual parallelism. parallelism.

The constraints of soft-dual parallelism are:The constraints of soft-dual parallelism are: Both memory operands must meet the constraints Both memory operands must meet the constraints

of the dual AR indirectof the dual AR indirect addressing mode (Xmem addressing mode (Xmem and Ymem). and Ymem).

Neither instruction can contain any of the Neither instruction can contain any of the following:following: Instructions embedding high_byte(Smem) and Instructions embedding high_byte(Smem) and

low_byte(Smemlow_byte(Smem). These instructions that read and write the same These instructions that read and write the same

memory locationmemory location..


ESIEE, Slide 89

Soft-dual parallelismSoft-dual parallelism With regard to soft-dual parallelism, theWith regard to soft-dual parallelism, the AMAR

Smem instruction has theinstruction has the same properties as any same properties as any memory reference instruction.memory reference instruction.

Soft-Dual Parallelism of MAR InstructionsSoft-Dual Parallelism of MAR Instructions Although the following modify auxiliary register (MAR) Although the following modify auxiliary register (MAR)

instructions do notinstructions do not reference memory and do not have reference memory and do not have parallel enable bits, they may be combinedparallel enable bits, they may be combined together or together or with any other memory reference instructions (not limited with any other memory reference instructions (not limited to Xmem/Ymem) to form soft-dual parallelismto Xmem/Ymem) to form soft-dual parallelism::

AADD TAx, TAy AADD k8, TAx AMOV TAx, TAy AMOV k8, TAx ASUB TAx, TAy ASUB k8, TAx


ESIEE, Slide 90

C55 mnemonic instruction set summaryC55 mnemonic instruction set summary The C55 mnemonic instruction set can be The C55 mnemonic instruction set can be

divided into six basic types of operations:divided into six basic types of operations: Arithmetical operationsArithmetical operations Bit manipulations Bit manipulations operationsoperations Extended auxiliary registerExtended auxiliary register (XAR) (XAR) operationsoperations Logical operationsLogical operations Move operationsMove operations ProgramProgram-control -control operationsoperations

With each instruction listing, you will find the With each instruction listing, you will find the availability of a parallel enable bit, word availability of a parallel enable bit, word count(size), cycle time, what pipeline stage the count(size), cycle time, what pipeline stage the instruction is executed, and in what unit the instruction is executed, and in what unit the instruction is executedinstruction is executed.


ESIEE, Slide 91

Arithmetic InstructionsArithmetic Instructions

SyntaxParallel enable

bitSize Cycles Pipeline

Executed in

ABDST Xmem, Ymem, ACx, ACy No 4 1 XD unit MAC

and ALU

ABS [src,] dst Yes 2 1 XA or D unit

ALU

Absolute distance

Absolute Value

The ABDST instruction executes two operations in parallel: The ABDST instruction executes two operations in parallel: one in the D-unit MAC and one in the D-unit ALUone in the D-unit MAC and one in the D-unit ALU:

ACy = ACy + |HI(ACx)| ACx = (Xmem << #16) - (Ymem << #16)

1

0

N

iii yxD

1

0

N

iii yxD


ESIEE, Slide 92

Arithmetic Instructions: ADDArithmetic Instructions: ADDSyntax

Parallel enable bit

Size Cycles Pipeline Executed in

ADD [src,] dst Yes 2 1 XA or D unit

ALU

ADD k4, dst Yes 2 1 XA or D unit

ALU

ADD K16, [src,] dst No 4 1 XA or D unit

ALU

ADD Smem, [src,] dst No 3 1 XA or D unit

ALU

ADD ACx << Tx, ACy Yes 2 1 XD unit ALU and Shiter

ADD ACx << #SHIFTW, ACy Yes 3 1 XD unit ALU and Shiter

ADD K16 << #16, [ACx,] ACy No 4 1 X D unit ALU

ADD K16 << #SHFT, [ACx,] ACy No 4 1 XD unit ALU and shifter

ADD Smem << Tx, [ACx,] ACy No 3 1 XD unit ALU and shifter

ADD Smem << #16, [ACx,] ACy No 3 1 X D unit ALUADD [uns(]Smem[)], CARRY, [ACx,] ACy No 3 1 X D unit ALUADD [uns(]Smem[)], [ACx,] ACy No 3 1 X D unit ALU

ADD [uns(]Smem[)] << #SHIFTW, [ACx,] ACy No 4 1 XD unit ALU and shifter

ADD dbl(Lmem), [ACx,] ACy No 3 1 X D unit ALUADD Xmem, Ymem, ACx No 3 1 X D unit ALUADD K16, Smem No 4 1 X D unit ALUADD[R]V [ACx,] ACy Yes 2 1 X D unit MAC

Addition


ESIEE, Slide 93

Some examples for the 17 forms of ADD Some examples for the 17 forms of ADD ADD AC1, AC0 The content of AC1 is added to the content The content of AC1 is added to the content

of AC0 and the result is stored in AC0.of AC0 and the result is stored in AC0. ADD #15, AC0 The content of AC0 is added to an unsigned The content of AC0 is added to an unsigned

4-bit value (15) and the result is stored in AC0.4-bit value (15) and the result is stored in AC0. ADD #2E00h, AC0, AC1 The content of AC0 is added to The content of AC0 is added to

the signed 16-bit value (2E00h) and the result is stored in the signed 16-bit value (2E00h) and the result is stored in AC1.AC1.

ADD *AR3+, T0, T1 The content of T0 is added to the The content of T0 is added to the content addressed by AR3 and the result is stored in T1. AR3 content addressed by AR3 and the result is stored in T1. AR3 is incremented by 1.is incremented by 1.

ADD AC1 << T0, AC0 The content of AC1 shifted by the The content of AC1 shifted by the content of T0 is added to the content of AC0 and the result is content of T0 is added to the content of AC0 and the result is stored in AC0.stored in AC0.

ADD AC1 << #31, AC0 The content of AC1 shifted left by The content of AC1 shifted left by 31 bits is added to the content of AC0 and the result is stored 31 bits is added to the content of AC0 and the result is stored in AC0.in AC0.


ESIEE, Slide 94

Some examples for the 17 forms of ADDSome examples for the 17 forms of ADD ADD #FFFFh << #16, AC1, AC0 A signed 16-bit value A signed 16-bit value

(FFFFh) shifted left by 16 bits is added to the content of AC1 (FFFFh) shifted left by 16 bits is added to the content of AC1 and the result is stored in AC0.and the result is stored in AC0.

ADD #FFFFh << #15, AC1, AC0 A signed 16-bit value A signed 16-bit value (FFFFh) shifted left by 15 bits is added to the content of AC1 (FFFFh) shifted left by 15 bits is added to the content of AC1 and the result is stored in AC0.and the result is stored in AC0.

ADD *AR1 << T0, AC1, AC0 The content addressed by The content addressed by AR1 shifted left by the content of T0 is added to the content of AR1 shifted left by the content of T0 is added to the content of AC1 and the result is stored in AC0.AC1 and the result is stored in AC0.

ADD *AR3 << #16, AC1, AC0 The content addressed by The content addressed by AR3 shifted left by 16 bits is added to the content of AC1 and AR3 shifted left by 16 bits is added to the content of AC1 and the result is stored in AC0.the result is stored in AC0.

ADD uns(*AR3), CARRY, AC1, AC0 The CARRY status The CARRY status bit and the unsigned content addressed by AR3 are added to bit and the unsigned content addressed by AR3 are added to the content of AC1 and the result is stored in AC0.the content of AC1 and the result is stored in AC0.

ADD uns(*AR3), AC1, AC0 The unsigned content The unsigned content addressed by AR3 is added to the content of AC1 and the addressed by AR3 is added to the content of AC1 and the result is stored in AC0.result is stored in AC0.


ESIEE, Slide 95

Some examples for the 17 forms of ADDSome examples for the 17 forms of ADD ADD uns(*AR3) << #31, AC1, AC0 The unsigned content The unsigned content

addressed by AR3 shifted left by 31 bits is added to the addressed by AR3 shifted left by 31 bits is added to the content of AC1 and the result is stored in AC0.content of AC1 and the result is stored in AC0.

ADD dbl(*AR3+), AC1, AC0 The content (long word) The content (long word) addressed by AR3 and AR3 + 1 is added to the content of AC1 addressed by AR3 and AR3 + 1 is added to the content of AC1 and the result is stored in AC0. Because this instruction is a and the result is stored in AC0. Because this instruction is a long-operand instruction, AR3 is incremented by 2 after the long-operand instruction, AR3 is incremented by 2 after the execution.execution.

ADD *AR3, *AR4, AC0 The content addressed by AR3 The content addressed by AR3 shifted left by 16 bits is added to the content addressed by shifted left by 16 bits is added to the content addressed by AR4 shifted left by 16 bits and the result is stored in AC0.AR4 shifted left by 16 bits and the result is stored in AC0.

ADD #FFFFh, *AR3 The content addressed by AR3 is added The content addressed by AR3 is added to a signed 16-bit value and the result is stored back into the to a signed 16-bit value and the result is stored back into the location addressed by AR3.location addressed by AR3.

ADDV AC1, AC0 The absolute value of AC1 is added to the The absolute value of AC1 is added to the content of AC0 and the result is stored in AC0.content of AC0 and the result is stored in AC0.


ESIEE, Slide 96

Arithmetic instructions: compare and select Arithmetic instructions: compare and select extremumextremum

SyntaxParallel

enable bitSize Cycles Pipeline

Executed in

MAXDIFF ACx, ACy, ACz, ACw Yes 3 1 X D unit ALUDMAXDIFF ACx, ACy, ACz, ACw, TRNx Yes 3 1 X D unit ALUMINDIFF ACx, ACy, ACz, ACw Yes 3 1 X D unit ALUDMINDIFF ACx, ACy, ACz, ACw, TRNx Yes 3 1 X D unit ALU

Compare and Select Extremum

See examples and more details on next slidesSee examples and more details on next slides


ESIEE, Slide 97

Index Search - DMAXDIFF, MAXDIFFIndex Search - DMAXDIFF, MAXDIFF

DMAXDIFF ACx,ACy,ACz,ACw,TRNxDMAXDIFF ACx,ACy,ACz,ACw,TRNx 32-bit Search32-bit Search

ACx/ACy = values, ACz = max, ACw = ACy-ACx, TRNx = indexACx/ACy = values, ACz = max, ACw = ACy-ACx, TRNx = index

~N cycles for N taps with index~N cycles for N taps with index

TRN0/1TRN0/1001515

11 (If new max, shift “1” into MSB)(If new max, shift “1” into MSB)

Splits AC’s into two 16-bit registersSplits AC’s into two 16-bit registers

Max of hi/low halves placed in ACz, ACy-ACx placed in ACw, Max of hi/low halves placed in ACz, ACy-ACx placed in ACw, TRN0/1 hold indexTRN0/1 hold index

TRN0 (tracks AC high), TRN1 (tracks AC low)TRN0 (tracks AC high), TRN1 (tracks AC low)

~N/2 cycles for N taps with index~N/2 cycles for N taps with index

MAXDIFF ACx,ACy,ACz,ACwMAXDIFF ACx,ACy,ACz,ACw Dual 16-bit SearchDual 16-bit Search

MINDIFF and DMINDIFF also supportedMINDIFF and DMINDIFF also supported


ESIEE, Slide 98

Example of MAXDIFF instructionExample of MAXDIFF instruction MAXDIFF AC0, AC1, AC2, AC1

Before After AC0 10 2400 2222 AC0 10 2400 2222 AC1 90 0000 0000 AC1 FF 8000 DDDE AC2 00 0000 0000 AC2 10 2400 2222 SATD 1 SATD 1 TRN0 1000 TRN0 0800 TRN1 0100 TRN1 0080 ACOV1 0 ACOV1 1 CARRY 1 CARRY 0 The difference is stored in AC1. The difference is stored in AC1. The content of AC0(39–16) is subtracted from the content of AC1(39–16) and The content of AC0(39–16) is subtracted from the content of AC1(39–16) and

the result is stored in AC1(39–16). Since SATD = 1 and an overflow is the result is stored in AC1(39–16). Since SATD = 1 and an overflow is detected, AC1(39–16) = FF 8000h (saturation). detected, AC1(39–16) = FF 8000h (saturation).

The content of AC0(15–0) is subtracted from the content of AC1(15–0) and the The content of AC0(15–0) is subtracted from the content of AC1(15–0) and the result is stored in AC1(15–0). result is stored in AC1(15–0).

The maximum is stored in AC2. The maximum is stored in AC2. The content of TRN0 and TRN1 is shifted right 1 bit. AC0(31–16) is greater The content of TRN0 and TRN1 is shifted right 1 bit. AC0(31–16) is greater

than AC1(31–16), AC0(39–16) is stored in AC2(39–16) and TRN0(15) is than AC1(31–16), AC0(39–16) is stored in AC2(39–16) and TRN0(15) is cleared to 0. AC0(15–0) is greater than AC1(15–0), AC0(15–0) is storedcleared to 0. AC0(15–0) is greater than AC1(15–0), AC0(15–0) is stored inAC2(15–0) and TRN1(15) is cleared to 0.inAC2(15–0) and TRN1(15) is cleared to 0.


ESIEE, Slide 99

Example of DMAXDIFF instructionExample of DMAXDIFF instruction DMAXDIFF AC0, AC1, AC2, AC3, TRN1

Before After AC0 10 2400 2222 AC0 10 2400 2222 AC1 00 8000 DDDE AC1 00 8000 DDDE AC2 00 0000 0000 AC2 10 2400 2222 AC3 00 0000 0000 AC3 F0 5C00 BBBC M40 1 M40 1 SATD 1 SATD 1 TRN1 0080 TRN1 0040 ACOV3 0 ACOV3 0 CARRY 0 CARRY 0 The difference is stored in AC3. The difference is stored in AC3. The content of AC0 is subtracted from the content of AC1 and the The content of AC0 is subtracted from the content of AC1 and the

result is stored in AC3.result is stored in AC3. The maximum is stored in AC2. The maximum is stored in AC2. The content of TRN1 is shifted right 1 bit. AC0 is greater than AC1, The content of TRN1 is shifted right 1 bit. AC0 is greater than AC1,

AC0 is stored in AC2 and TRN1(15) is cleared to 0.AC0 is stored in AC2 and TRN1(15) is cleared to 0.


ESIEE, Slide 100

Arithmetic instructions: Conditional Arithmetic instructions: Conditional Addition/subtraction, Shift, SubtractAddition/subtraction, Shift, Subtract

SyntaxParallel


Executed in

ADDSUBCC Smem, ACx, TC1, ACy No 3 1 X D unit ALUADDSUBCC Smem, ACx, TC2, ACy No 3 1 X D unit ALUADDSUBCC Smem, ACx, TC1, TC2, ACy No 3 1 X D unit ALU

ADDSUB2CC Smem, ACx, Tx, TC1, TC2, ACy No 3 1 XD unit Shifter

SFTCC ACx, TCx Yes 2 1 XD unit shifter

SUBC Smem, [ACx,] ACy No 3 1 X D unit ALU

Conditional addition/subtraction

Conditional Shift

Conditional Subtract

These instructions are useful for division.These instructions are useful for division.


ESIEE, Slide 101

Example for conditional shift SFTCCExample for conditional shift SFTCC

SFTCC AC0, TC1 Before Before AfterAfter AC0 AC0 FF 8765 0055 FF 8765 0055 AC0 AC0 FF 8765 0055FF 8765 0055 TC1 TC1 0 0 TC1 TC1 11

Because AC0(31) XORed with AC0(30) equals 1, the Because AC0(31) XORed with AC0(30) equals 1, the content of AC0 is not shifted left and TC1 is set to 1.content of AC0 is not shifted left and TC1 is set to 1.


ESIEE, Slide 102

Example for SUBCExample for SUBC

SUBC *AR1, AC0, AC1 Before After AC0 23 4300 0000 AC0 23 4300 0000 AC1 00 0000 0000 AC1 46 8400 0001 AR1 300 AR1 300 300 200 300 200 SXMD 0 SXMD 0 ACOV1 0 ACOV1 1 CARRY 0 CARRY 1 The content addressed by AR1 shifted left by 15 bits is subtracted from The content addressed by AR1 shifted left by 15 bits is subtracted from

the content of AC0. The result is greater than 0; therefore, the result is the content of AC0. The result is greater than 0; therefore, the result is shifted left by 1 bit, added to 1, and the new result stored in AC1. The shifted left by 1 bit, added to 1, and the new result stored in AC1. The result generated an overflow and a carry.result generated an overflow and a carry.


ESIEE, Slide 103

Dual 16-Bit ArithmeticDual 16-Bit Arithmetic

SyntaxParallel


Executed in

ADDSUB Tx, Smem, Acx No 3 1 X D unit ALUSUBADD Tx, Smem, Acx No 3 1 X D unit ALUADD dual(Lmem), [ACx,] Acy No 3 1 X D unit ALUSUB dual(Lmem), [ACx,] Acy No 3 1 X D unit ALUSUB ACx, dual(Lmem), Acy No 3 1 X D unit ALUSUB dual(Lmem), Tx, Acx No 3 1 X D unit ALUADD dual(Lmem), Tx, ACx No No 3 1 X D unit ALUSUB Tx, dual(Lmem), Acx No 3 1 X D unit ALUADDSUB Tx, dual(Lmem), Acx No 3 1 X D unit ALUSUBADD Tx, dual(Lmem), ACx No 3 1 X D unit ALU

Dual 16-Bit Arithmetic

See next slides for more details on ADDSUB and SUBADD.See next slides for more details on ADDSUB and SUBADD.

See chapter 22 for application to Viterbi algorithm.See chapter 22 for application to Viterbi algorithm.


ESIEE, Slide 104

Viterbi DecodingViterbi DecodingChannelChannelENCENCDataData

GG00GG11 DECDEC ~Data~Data~G~G00GG11

Know: Received data, and how the original data was encoded Need: Derive the original data from the received data Viterbi: “Deriving the most likely path taken through a Viterbi trellis” Process: Establish path through trellis (using metric/penalties) to allow

traceback to determine the original data that determined this path

SelectSelectMaxMax

SelectSelectMaxMax

NextNextStateState

+T3+T3

+T3+T3

- T3

- T3

- T3- T3

ACACxxHH

ACACyyHH

ACACxxLL

ACACyyLL

PrelimPrelimValuesValues

TRN0/1TRN0/1

1. Get current metric1. Get current metric

CurrentCurrentStateState

*ptr

*ptr

ProcedureProcedure

2. Add/sub local2. Add/sub local distance (T3) distance (T3)

3. Compare and3. Compare and select min/max select min/max

4. Note which path4. Note which path was taken (TRNx) was taken (TRNx)


ESIEE, Slide 105

Viterbi DecodingViterbi Decoding ADDSUB *AR0+,T3,AC0ADDSUB *AR0+,T3,AC0 ;hi(AC0) <- p0(J);hi(AC0) <- p0(J)

;lo(AC0) <- p0(J+N/2) ;lo(AC0) <- p0(J+N/2)

SUBADD *AR0+,T3,AC1SUBADD *AR0+,T3,AC1 ;hi(AC1) <- p1(J);hi(AC1) <- p1(J);lo(AC1) <- p1(J+N/2);lo(AC1) <- p1(J+N/2)

MAXDIFF AC0,AC1,AC2,AC3MAXDIFF AC0,AC1,AC2,AC3 ;put “best path” in AC2;put “best path” in AC2

- Use ABDST/SQDST to determine metric; metric update and traceback not shown- Use ABDST/SQDST to determine metric; metric update and traceback not shown

SelectSelectMaxMax

SelectSelectMaxMax

NextNextStateState

+T3+T3

+T3+T3

- T3

- T3

- T3- T3

ACACxxHH

ACACyyHH

ACACxxLL

ACACyyLL

PrelimPrelimValuesValues

TRN0/1TRN0/1

1. Get current metric1. Get current metric

CurrentCurrentStateState

*ptr

*ptr

ProcedureProcedure

2. Add/sub local2. Add/sub local distance (T3) distance (T3)

3. Compare and3. Compare and select min/max select min/max

4. Note which path4. Note which path was taken (TRNx) was taken (TRNx)


ESIEE, Slide 106

Dual Multiply/Accumulate/Subtract 1/2Dual Multiply/Accumulate/Subtract 1/2Syntax

Parallel enable bit


AMAR Xmem No 4 1 X D unit MACs

:: MAS[R][40] [uns(]Ymem[)], [uns(]Cmem[)], ACx

MAC[R][40] [uns(]Xmem[)], [uns(]Cmem[)], ACx >> #16 No 4 1 X D unit MACs

:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)], ACyMPY[R][40] [uns(]Xmem[)], [uns(]Cmem[)], ACx No 4 1 X D unit MACs:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)],ACy >> #16MAC[R][40] [uns(]Xmem[)], [uns(]Cmem[)], ACx >> #16 No 4 1 X D unit MACs:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)],ACy >> #16

MAS[R][40] [uns(]Xmem[)], [uns(]Cmem[)], ACx No 4 1 X D unit MACs:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)],ACy >> #16

AMAR Xmem No 4 1 X D unit MACs:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)],ACx >> #16

AMAR Xmem, Ymem, Cmem No 4 1 X D unit MACs

Dual Multiply (Accumulate/Subtract)


ESIEE, Slide 107

Dual Multiply/Accumulate/Subtract 2/2Dual Multiply/Accumulate/Subtract 2/2Syntax

Parallel enable bit


AMAR Xmem No 4 1 X D unit MACs

:: MAS[R][40] [uns(]Ymem[)], [uns(]Cmem[)], ACx

MAC[R][40] [uns(]Xmem[)], [uns(]Cmem[)], ACx >> #16 No 4 1 X D unit MACs

:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)], ACyMPY[R][40] [uns(]Xmem[)], [uns(]Cmem[)], ACx No 4 1 X D unit MACs:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)],ACy >> #16MAC[R][40] [uns(]Xmem[)], [uns(]Cmem[)], ACx >> #16 No 4 1 X D unit MACs:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)],ACy >> #16

MAS[R][40] [uns(]Xmem[)], [uns(]Cmem[)], ACx No 4 1 X D unit MACs:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)],ACy >> #16

AMAR Xmem No 4 1 X D unit MACs:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)],ACx >> #16

AMAR Xmem, Ymem, Cmem No 4 1 X D unit MACs

Dual Multiply (Accumulate/Subtract)


ESIEE, Slide 108

FIR symetrical and antisymetrical filtersFIR symetrical and antisymetrical filters

SyntaxParallel

enable bitSize Cycles Pipeline Executed in

FIRSADD Xmem, Ymem, Cmem, ACx, Acy No 4 1 XD unit MAC

and ALU

FIRSSUB Xmem, Ymem, Cmem, ACx, Acy No 4 1 XD unit MAC

and ALU

Finite Impulse Response Filter, Symmetrical/Antisymmetrical

See chapter 14 on FIR filters for detailed explanation of these instructions See chapter 14 on FIR filters for detailed explanation of these instructions and next slide for illustration.and next slide for illustration.


ESIEE, Slide 109

Symmetrical/Antisymmetrical FIRSymmetrical/Antisymmetrical FIR

CoeffsCoeffs

a4a4 a5a5 a6a6 a7a7a3a3a2a2a1a1a0a0

SymmetricalSymmetrical

CoeffsCoeffs

a4a4 a5a5 a6a6 a7a7a3a3a2a2a1a1a0a0

Anti-symmetricalAnti-symmetrical

These filters may be “folded” and performed with N adds and N/2 MACs Filters need to be designed as even length

Y(n) = Y(n) = a0a0(x7 + x0) + (x7 + x0) + a1a1(x6 + x1) + (x6 + x1) + a2a2(x5 + x2) + (x5 + x2) + a3a3(x4 + x3)(x4 + x3)

Anti-symmetrical: use FIRSSUB (e.g. a0(x7-x0)), FIRSADD: 2 taps/cycle Anti-symmetrical: use FIRSSUB (e.g. a0(x7-x0)), FIRSADD: 2 taps/cycle

FIRSADD Xmem,Ymem,coef,ACx,ACyFIRSADD Xmem,Ymem,coef,ACx,ACy

FIRSADD = ACy = ACy + (ACx * (*CDP))FIRSADD = ACy = ACy + (ACx * (*CDP)) || ACx = Xmem + Ymem|| ACx = Xmem + Ymem

PseudoPseudoCode:Code:


ESIEE, Slide 110

Implied parallelism instructionsImplied parallelism instructions

SyntaxParallel


MPYM[R] [T3 = ]Xmem, Tx, ACy No 4 1 X D unit MAC:: MOV HI(ACx << T2), Ymem and shifterMACM[R] [T3 = ]Xmem, Tx, ACy No 4 1 X D unit MAC:: MOV HI(ACx << T2), Ymem and shifterMASM[R] [T3 = ]Xmem, Tx, ACy No 4 1 X D unit MAC:: MOV HI(ACx << T2), Ymem and shifterADD Xmem << #16, ACx, ACy No 4 1 X D unit ALU:: MOV HI(ACy << T2), Ymem and shifterSUB Xmem << #16, ACx, ACy No 4 1 X D unit ALU:: MOV HI(ACy << T2), Ymem and shifterMOV Xmem << #16, ACy No 4 1 X D unit ALU:: MOV HI(ACx << T2), Ymem and shifterMACM[R] [T3 = ]Xmem, Tx, ACx No 4 1 X D unit MAC:: MOV Ymem << #16, ACyMASM[R] [T3 = ]Xmem, Tx, ACx No 4 1 X D unit MAC:: MOV Ymem << #16, ACy

Implied Paralleled Instructions


ESIEE, Slide 111

LMS, MAX and MIN instructionsLMS, MAX and MIN instructions

SyntaxParallel


LMS Xmem, Ymem, ACx, Acy No 4 1 XD unit MAC

and ALU

MAX [src,] dst Yes 2 1 XA or D unit

ALU

MIN [src,] dst Yes 2 1 XA or D unit

ALU

Least Mean Square (LMS)

Maximum Comparison

Minimum Comparison

See chapter 16 on adaptive FIR filters for detailed explanation of the LMS instruction See chapter 16 on adaptive FIR filters for detailed explanation of the LMS instruction and next slide for illustration.and next slide for illustration.

See next sides for more explanation on min and max.See next sides for more explanation on min and max.


ESIEE, Slide 112

Adaptive Filtering Using LMS - ConceptAdaptive Filtering Using LMS - ConceptA least mean square (LMS) approach is widely used for adaptive filter routines. A least mean square (LMS) approach is widely used for adaptive filter routines.

The technique minimizes an error term by tuning the filter coefficients.The technique minimizes an error term by tuning the filter coefficients.

e*xe*x00e*xe*x11 e*xe*xnn

inputinput Modeled SystemModeled System

dd

ee

--yy

a0

z-1 z-1 z-1...

a1 an

x0 x1 xk

Provide input to theProvide input to thereal system to getreal system to getdesired output: d(n)desired output: d(n)

LMS = LMS = FIRFIR , , Coeff updateCoeff update = (ACy = ACy + (ai*xi)), ACx = rnd(ei+ai)= (ACy = ACy + (ai*xi)), ACx = rnd(ei+ai)

Filter input through FIRFilter input through FIRmodel to get estimatedmodel to get estimatedoutput: y(n)output: y(n)

Compare results toCompare results toget error term: e(n)get error term: e(n)

Correlate the errorCorrelate the errorterm with x0, x1 …,term with x0, x1 …,to update coefficientsto update coefficients

input + noiseinput + noise

-input - noise-input - noise

eei i = e*x= e*xii


ESIEE, Slide 113

LMS Adaptive Filter CodeLMS Adaptive Filter Code ... ... .asg AR3, Coefs.asg AR3, Coefs .asg AR4, Data.asg AR4, Data

MOV @B2e,T3MOV @B2e,T3

|| MOV #(N-2),BRC0|| MOV #(N-2),BRC0

MPYM *Data,T3,AC0MPYM *Data,T3,AC0

|| MOV #0,AC1|| MOV #0,AC1

MOV port(@X_new),*Data+MOV port(@X_new),*Data+

LMS *Coefs,*Data,AC0,AC1LMS *Coefs,*Data,AC0,AC1

|| RPTBLOCAL e1|| RPTBLOCAL e1

MOV HI(AC0),*Coefs+MOV HI(AC0),*Coefs+

|| MPYM *Data+,T3,AC0|| MPYM *Data+,T3,AC0

e1: LMS *Coefs,*Data,AC0,AC1e1: LMS *Coefs,*Data,AC0,AC1

MOV HI(AC0),*CoefsMOV HI(AC0),*Coefs

|| MOV HI(AC1),*Result+|| MOV HI(AC1),*Result+

Pre-calculate Pre-calculate 2*2**e(n) ... *e(n) ... . . AR3 pts to coeff table: a[n]AR3 pts to coeff table: a[n]AR4 pts to data table: x[n]AR4 pts to data table: x[n]

T3 holds error step amount...T3 holds error step amount...

… … while loading BRC0while loading BRC0

ACO = error * oldest sample: x(n)... ACO = error * oldest sample: x(n)...

… while clearing AC1 (running FIR)… while clearing AC1 (running FIR)

Overwrite x(n) with x(0)Overwrite x(n) with x(0)

Start FIR calc, update oldest coeff…Start FIR calc, update oldest coeff…

… … and start repeat blockand start repeat block

Store update coefficient ...Store update coefficient ...

...while calculating next update term...while calculating next update term

Calc FIR, update coefficientCalc FIR, update coefficient

Store final coefficient...Store final coefficient...

……while storing FIR outputwhile storing FIR output


ESIEE, Slide 114

Search - MIN, MAXSearch - MIN, MAX Goal: find the max (or min) value in an array

162824

24893588

*ptrsrc dst

max

Y Carry=0

MAX src,dstMAX src,dst

Operands: src/dst can be AC0-3, AR0-7, T0-3

RPT #(N-2)RPT #(N-2)

MAX AC0,AC1MAX AC0,AC1

|| MOV *ptr+,AC0 || MOV *ptr+,AC0

Benchmark: ~N cycles to find the min/max of N elements

How do you determine WHICH value was the min/max?How do you determine WHICH value was the min/max?


ESIEE, Slide 115

Memory comparison, modify auxiliary Memory comparison, modify auxiliary register, modify data stack pointerregister, modify data stack pointer

SyntaxParallel


CMP Smem == K16, TC1 No 4 1 X A unit ALUCMP Smem == K16, TC2 No 4 1 X A unit ALU

AADD TAx, Tay Yes 3 1 AD A unit DAGENs

ASUB TAx, Tay Yes 3 1 AD A unit DAGENsAMOV TAx, Tay Yes 3 1 AD A unit DAGENs

AADD k8, Tax Yes 3 1 AD A unit DAGENs

ASUB k8, Tax Yes 3 1 AD A unit DAGENs

AMOV k8, Tax Yes 3 1 AD A unit DAGENsAMOV D16, Tax No 4 1 AD A unit DAGENsAMAR Smem No 2 1 AD A unit DAGENs

AADD K8, SP Yes 2 1 AD A unit ALU

Modify Auxiliary Register (MAR)

Modify Data Stack Pointer (SP)

Memory Comparison


ESIEE, Slide 116

Multiply instructionsMultiply instructions

SyntaxParallel


Executed in

SQR[R] [ACx,] Acy Yes 2 1 X D unit MACMPY[R] [ACx,] Acy Yes 2 1 X D unit MACMPY[R] Tx, [ACx,] Acy Yes 2 1 X D unit MACMPYK[R] K8, [ACx,] Acy Yes 3 1 X D unit MACMPYK[R] K16, [ACx,] Acy No 4 1 X D unit MACMPYM[R] [T3 = ]Smem, Cmem, Acx No 3 1 X D unit MACSQRM[R] [T3 = ]Smem, Acx No 3 1 X D unit MACMPYM[R] [T3 = ]Smem, [ACx,] Acy No 3 1 X D unit MACMPYMK[R] [T3 = ]Smem, K8, Acx No 4 1 X D unit MACMPYM[R][40] [T3 = ][uns(]Xmem[)], [uns(]Ymem[)], Acx No 4 1 X D unit MACMPYM[R][U] [T3 = ]Smem, Tx, Acx No 3 1 X D unit MAC

Multiply


ESIEE, Slide 117

Multiply and accumulate instructionsMultiply and accumulate instructions

SyntaxParallel


Executed in

SQA[R] [ACx,] Acy Yes 2 1 X D unit MACMAC[R] ACx, Tx, ACy[, ACy] Yes 2 1 X D unit MACMAC[R] ACy, Tx, ACx, Acy Yes 2 1 X D unit MACMACK[R] Tx, K8, [ACx,] Acy Yes 3 1 X D unit MACMACK[R] Tx, K16, [ACx,] Acy No 4 1 X D unit MACMACM[R] [T3 = ]Smem, Cmem, Acx No 3 1 X D unit MACMACM[R]Z [T3 = ]Smem, Cmem, Acx No 3 1 X D unit MACSQAM[R] [T3 = ]Smem, [ACx,] Acy No 3 1 X D unit MACMACM[R] [T3 = ]Smem, [ACx,] Acy No 3 1 X D unit MACMACM[R] [T3 = ]Smem, Tx, [ACx,] Acy No 3 1 X D unit MACMACMK[R] [T3 = ]Smem, K8, [ACx,] Acy No 4 1 X D unit MACMACM[R][40] [T3 = ][uns(]Xmem[)], [uns(]Ymem[)], No 4 1 X D unit MAC[ACx,] ACyMACM[R][40] [T3 = ][uns(]Xmem[)], [uns(]Ymem[)], No 4 1 X D unit MACACx >> #16[, ACy]

Multiply and Accumulate (MAC)


ESIEE, Slide 118

Multiply and subtract, negationMultiply and subtract, negation

SyntaxParallel


SQS[R] [ACx,] Acy Yes 2 1 X D unit MACMAS[R] Tx, [ACx,] Acy Yes 2 1 X D unit MACMASM[R] [T3 = ]Smem, Cmem, ACx No 3 1 X No 3 1 X D unit MACSQSM[R] [T3 = ]Smem, [ACx,] ACy No 3 1 X No 3 1 X D unit MACMASM[R] [T3 = ]Smem, [ACx,] ACy No 3 1 X No 3 1 X D unit MACMASM[R] [T3 = ]Smem, Tx, [ACx,] ACy No 3 1 X No 3 1 X D unit MACMASM[R][40] [T3 = ][uns(]Xmem[)], [uns(]Ymem[)], No 4 1 X D unit MAC[ACx,] ACy

NEG [src,] dst Yes 2 1 X A or D unit ALU

Multiply and Subtract (MAS)

Negation


ESIEE, Slide 119

Normalization, Register comparisonNormalization, Register comparison

SyntaxParallel


MANT ACx, ACy Yes 3 1 X A unit ALU:: NEXP ACx, Tx and D unit shifterEXP ACx, Tx Yes 3 1 X A unit ALU

and D unit shifter

Normalization

SyntaxParallel


CMP[U] src RELOP dst, TCx Yes 3 1 X A or D unit ALUCMPAND[U] src RELOP dst, TCy, TCx Yes 3 1 X A or D unit ALUCMPAND[U] src RELOP dst, !TCy, TCx Yes 3 1 X A or D unit ALUCMPOR[U] src RELOP dst, TCy, TCx Yes 3 1 X A or D unit ALUCMPOR[U] src RELOP dst, !TCy, TCx Yes 3 1 X A or D unit ALU

Register comparison

These normalization instructions are useful when working with floating point numbers.These normalization instructions are useful when working with floating point numbers.


ESIEE, Slide 120

Round and saturate, signed shiftRound and saturate, signed shift

SyntaxParallel


ROUND [ACx,] Acy Yes 2 1 X D unit ALU

SAT[R] [ACx,] Acy Yes 2 1 X D unit ALU

Round

Saturate

SyntaxParallel


SFTS dst, #–1 Yes 2 1 X A unit ALU or D unit shifterSFTS dst, #1 Yes 2 1 X A unit ALU or D unit shifterSFTS ACx, Tx[, ACy] Yes 2 1 X D unit shifterSFTSC ACx, Tx[, ACy] Yes 2 1 X D unit shifterSFTS ACx, #SHIFTW[, ACy] Yes 3 1 X D unit shifterSFTSC ACx, #SHIFTW[, ACy] Yes 3 1 X D unit shifter

Signed shift


ESIEE, Slide 121

Square distanceSquare distance

SyntaxParallel


SQDST Xmem, Ymem, ACx, Acy No 4 1 X D unit MAC and ALUSquare distance


ESIEE, Slide 122

SubtractionSubtractionSyntax

Parallel enable bit


SUB [src,] dst Yes 2 1 X A or D unit ALU

SUB k4, dst Yes 2 1 X A or D unit ALU

SUB K16, [src,] dst No 4 1 X A or D unit ALU

SUB Smem, [src,] dst No 3 1 X A or D unit ALU

SUB src, Smem, dst No 3 1 X A or D unit ALU

SUB ACx << Tx, ACy Yes 2 1 X D unit ALU and Shiter

SUB ACx << #SHIFTW, ACy Yes 3 1 X D unit ALU and Shiter

SUB K16 << #16, [ACx,] ACy No 4 1 X D unit ALU

SUB K16 << #SHFT, [ACx,] ACy No 4 1 X D unit ALU and shifter

SUB Smem << Tx, [ACx,] ACy No 3 1 X D unit ALU and shifter

SUB Smem << #16, [ACx,] ACy No 3 1 X D unit ALU

SUB [uns(]Smem[)], CARRY, [ACx,] ACy No 3 1 X D unit ALU

SUB [uns(]Smem[)], [ACx,] ACy No 3 1 X D unit ALU

SUB [uns(]Smem[)] << #SHIFTW, [ACx,] ACy No 4 1 X D unit ALU and shifter

SUB dbl(Lmem), [ACx,] ACy No 3 1 X D unit ALU

SUB Xmem, Ymem, ACx No 3 1 X D unit ALU

SUB K16, Smem No 4 1 X D unit ALU

SUB[R]V [ACx,] ACy Yes 2 1 X D unit MAC

Subtraction

Similar to ADD instruction, except for form n°5Similar to ADD instruction, except for form n°5


ESIEE, Slide 123

Example for 5Example for 5thth form of SUB form of SUB

Syntax:Syntax: SUB src, Smem, dst SUB AC1, *AR3, AC0 The content of AC1 is subtracted from the The content of AC1 is subtracted from the

content addressed by AR3 and the result is content addressed by AR3 and the result is stored in AC0.stored in AC0.


ESIEE, Slide 124

Bit manipulations operationsBit manipulations operations

SyntaxParallel


BAND Smem, k16, TC1 No 4 1 X A unit ALU

BAND Smem, k16, TC2 No 4 1 X A unit ALU

BFXPA k16, ACx, dst No 4 1 X D unit shifter

BFXTR k16, ACx, dst No 4 1 X D unit shifter

Bit field comparison

Bit field expand

Bit field extract


ESIEE, Slide 125

Memory Bit Test/Set/Clear/Complement

SyntaxParallel


BTST src, Smem, TCx No 3 1 X A unit ALU

BNOT src, Smem No 3 1 X A unit ALU

BCLR src, Smem No 3 1 X A unit ALU

BSET src, Smem No 3 1 X A unit ALU

BTSTSET k4, Smem, TC1 No 3 1 X A unit ALU

BTSTSET k4, Smem, TC2 No 3 1 X A unit ALU

BTSTCLR k4, Smem, TC1 No 3 1 X A unit ALU

BTSTCLR k4, Smem, TC2 No 3 1 X A unit ALU

BTSTNOT k4, Smem, TC1 No 3 1 X A unit ALU

BTSTNOT k4, Smem, TC2 No 3 1 X A unit ALU

BTST k4, Smem, TC1 No 3 1 X A unit ALU

BTST k4, Smem, TC2 No 3 1 X A unit ALU

Memory Bit Test/Set/Clear/Complement


ESIEE, Slide 126

Register Bit test/Set/Clear/ComplementRegister Bit test/Set/Clear/Complement

SyntaxParallel


BTST Baddr, src, TCx No 3 1 X A or D unit ALU

BNOT Baddr, src No 3 1 X A or D unit ALU

BCLR Baddr, src No 3 1 X A or D unit ALU

BSET Baddr, src No 3 1 X A or D unit ALU

BTSTP Baddr, src No 3 1 X A or D unit ALU

Register Bit Test/Set/Clear/Complement


ESIEE, Slide 127

Status bit Set/ClearStatus bit Set/Clear

SyntaxParallel


BCLR k4, ST0_55 Yes 2 1 X A unit ALU

BSET k4, ST0_55 Yes 2 1 X A unit ALU





BCLR k4, ST3_55 Yes 2 1† X A unit ALU

BSET k4, ST3_55 Yes 2 1† X A unit ALU

BCLR f–name Yes 2 1† X A unit ALU

BSET f–name Yes 2 1† X A unit ALU

Status Bit Set/Clear

† † When these instructions are decoded to modify status bit CAFRZ (15), When these instructions are decoded to modify status bit CAFRZ (15), CAEN (14), or CACLR (13), the CPU pipeline is flushed and the CAEN (14), or CACLR (13), the CPU pipeline is flushed and the

instruction is executed in 5 cycles regardless of the instruction contextinstruction is executed in 5 cycles regardless of the instruction context.


ESIEE, Slide 128

Extended auxiliary register (XAR) operationsExtended auxiliary register (XAR) operations

SyntaxParallel


MOV xsrc, xdst No 2 1 X

AMAR Smem, Xadst No 3 1 ADAMOV k23, XAdst No 6 1 AD

MOV dbl(Lmem), Xadst No 3 1 X

POPBOTH xdst Yes 2 1 X

PSHBOTH xsrc Yes 2 1 X

MOV XAsrc, dbl(Lmem) No 3 1 X

Extended Auxiliary Register Move

Load Effective Address to Extended Auxiliary Register

Load Extended Auxiliary Register from Memory

Pop Extended Auxiliary Register from Stack Pointers

Push Extended Auxiliary Register to Stack Pointers

Store Extended Auxiliary Register to Memory


ESIEE, Slide 129

Logical operationsLogical operations

SyntaxParallel


BCNT ACx, ACy,TCx, Tx Yes 3 1 X D unit shifter and A unit ALU

NOT [src,] dst Yes 2 1 X A or D unit ALU

AND src, dst Yes 2 1 X A or D unit ALU

AND k8, src, dst Yes 3 1 X A or D unit ALU

AND k16, src, dst No 4 1 X A or D unit ALU

AND Smem, src, dst No 3 1 X A or D unit ALU

AND ACx << #SHIFTW[, ACy] Yes 3 1 X D unit shifter

AND k16 << #16, [ACx,] Acy No 4 1 X D unit ALU

AND k16 << #SHFT, [ACx,] Acy No 4 1 X D unit shifter

AND k16, Smem No 4 1 X A unit ALU

Bitwise Complement

Bitwise AND

Bit Field Counting


ESIEE, Slide 130

Logical operationsLogical operationsSyntax

Parallel enable bit


OR src, dst Yes 2 1 X A or D unit ALU

OR k8, src, dst Yes 3 1 X A or D unit ALU

OR k16, src, dst No 4 1 X A or D unit ALU

OR Smem, src, dst No 3 1 X A or D unit ALU

OR ACx << #SHIFTW[, ACy] Yes 3 1 X D unit shifter

OR k16 << #16, [ACx,] Acy No 4 1 X D unit ALU

OR k16 << #SHFT, [ACx,] Acy No 4 1 X D unit shifter

OR k16, Smem No 4 1 X A unit ALU

XOR src, dst Yes 2 1 X A or D unit ALU

XOR k8, src, dst Yes 3 1 X A or D unit ALU

XOR k16, src, dst No 4 1 X A or D unit ALU

XOR Smem, src, dst No 3 1 X A or D unit ALU

XOR ACx << #SHIFTW[, ACy] Yes 3 1 X D unit shifter

OR k16 << #16, [ACx,] Acy No 4 1 X D unit ALU

XOR k16 << #SHFT, [ACx,] Acy No 4 1 X D unit shifter

XOR k16, Smem No 4 1 X A unit ALU

BitwiseXOR

Bitwise OR


ESIEE, Slide 131

Logival operationsLogival operations

SyntaxParallel


SFTL dst, #1 Yes 2 1 X A unit ALU or D unit shifter

SFTL dst, #–1 Yes 2 1 X A unit ALU or D unit shifter

SFTL ACx, Tx[, ACy] Yes 2 1 X D unit shifter

SFTL ACx, #SHIFTW[, ACy] Yes 3 1 X D unit shifter

NEG [src,] dst Yes 2 1 X A or D unit ALU

ROL BitOut, src, BitIn, dst Yes 3 1 X A unit ALU or D unit shifter

ROR BitIn, src, BitOut, dst Yes 3 1 X A unit ALU or D unit shifterRotate Right

Logical Shift

Rotate Left

Negation


ESIEE, Slide 132

Miscelaneous operationsMiscelaneous operations

SyntaxParallel


<instruction>.LR No 1 1 AD<instruction>.CR No 1 1 AD

mmap No 1 1 D

port(Smem) No 1 1 D

Linear/Circular Addressing Qualifiers

Memory-Mapped Register Access Qualifier

Peripheral Port Register Access Qualifier


ESIEE, Slide 133

Move operationsMove operations

SyntaxParallel


SWAP ARx, Tx Yes 2 1 AD A unit register file

SWAP Tx, Ty Yes 2 1 AD A unit register file

SWAP ARx, ARy Yes 2 1 AD A unit register file

SWAP ACx, ACy Yes 2 1 X D unit register file

SWAPP ARx, Tx Yes 2 1 AD A unit register file

SWAPP T0, T2 Yes 2 1 AD A unit register file

SWAPP AR0, AR2 Yes 2 1 AD A unit register file

SWAPP AC0, AC2 Yes 2 1 X D unit register file

SWAP4 AR4, T0 Yes 2 1 AD A unit register file

Accumulator, Auxiliary, or Temporary Register Content Swap


ESIEE, Slide 134


SyntaxParallel


MOV k4, dst Yes 2 1 X A or D unit register file

MOV –k4, dst Yes 2 1 X A or D unit register file

MOV K16, dst No 4 1 X A or D unit register file

MOV Smem, dst No 2 1 X A or D unit register file

MOV [uns(]high_byte(Smem)[)], dst No 3 1 X A or D unit register file

MOV [uns(]low_byte(Smem)[)], dst No 3 1 X A or D unit register file

MOV K16 << #16, ACx No 4 1 X D unit ALU

MOV K16 << #SHFT, ACx No 4 1 X D unit shifter

MOV [rnd(]Smem << Tx[)], ACx No 3 1 X D unit shifter

Accumulator, Auxiliary, or Temporary Register Load 1/2


ESIEE, Slide 135


SyntaxParallel


MOV low_byte(Smem) << #SHIFTW, Acx No 3 1 X D unit shifter

MOV high_byte(Smem) << #SHIFTW, Acx No 3 1 X D unit shifter

MOV Smem << #16, Acx No 2 1 X D unit ALU

MOV [uns(]Smem[)], Acx No 3 1 X D unit register file

MOV [uns(]Smem[)] << #SHIFTW, ACx No 4 1 X D unit shifter

MOV[40] dbl(Lmem), ACx No 3 1 X D unit register file

MOV Xmem, Ymem, ACx No 3 1 1 D unit register file

MOV dbl(Lmem), pair(HI(ACx)) No 3 1 X D unit register file

MOV dbl(Lmem), pair(LO(ACx)) No 3 1 X D unit register file

MOV dbl(Lmem), pair(TAx) No 3 1 X A unit register file

Accumulator, Auxiliary, or Temporary Register Load 2/2


ESIEE, Slide 136


SyntaxParallel


MOV src, dst Yes 2 1 X A or D unit ALU

MOV HI(ACx), Tax Yes 2 1 X A unit ALU

MOV TAx, HI(ACx) Yes 2 1 X D unit ALU

Accumulator, Auxiliary, or Temporary Register Move


ESIEE, Slide 137


SyntaxParallel


MOV src, Smem No 2 1 X A or D unit register file

MOV src, high_byte(Smem) No 3 1 X A or D unit register file

MOV src, low_byte(Smem) No 3 1 X A or D unit register file

MOV HI(ACx), Smem No 2 1 X D unit register file

MOV [rnd(]HI(ACx)[)], Smem No 3 1 X D unit shifter

MOV ACx << Tx, Smem No 3 1 X D unit shifter

MOV [rnd(]HI(ACx << Tx)[)], Smem No 3 1 X D unit shifter

MOV ACx << #SHIFTW, Smem No 3 1 X D unit shifter

MOV HI(ACx << #SHIFTW), Smem No 3 1 X D unit shifter

MOV [rnd(]HI(ACx << #SHIFTW)[)], Smem No 4 1 X D unit shifter

Accumulator, Auxiliary, or Temporary Register Store 1/2


ESIEE, Slide 138


SyntaxParallel


MOV [uns(] [rnd(]HI(saturate(ACx))[))], Smem No 3 1 X D unit shifter

MOV [uns(] [rnd(]HI(saturate(ACx << Tx))[))], Smem No 3 1 X D unit shifter

MOV [uns(](rnd(]HI(saturate(ACx << #SHIFTW))[))],Smem No 4 1 X D unit shifter

MOV ACx, dbl(Lmem) No 3 1 X D unit shifter

MOV [uns(]saturate(ACx)[)], dbl(Lmem) No 3 1 X D unit shifter

MOV ACx >> #1, dual(Lmem) No 3 1 X D unit register file

MOV pair(HI(ACx)), dbl(Lmem) No 3 1 X D unit register file

MOV pair(LO(ACx)), dbl(Lmem) No 3 1 X A unit register file

MOV pair(TAx), dbl(Lmem) No 3 1 X D unit shifter

MOV ACx, Xmem, Ymem No 3 1 X D unit register file

Accumulator, Auxiliary, or Temporary Register Store 2/2


ESIEE, Slide 139


SyntaxParallel


DELAY Smem No 2 1 X A or D unit register file

MOV Cmem, Smem No 3 1 X A or D unit register file

MOV Smem, Cmem No 3 1 X A or D unit register file

MOV K8, Smem No 3 1 X A or D unit register file

MOV K16, Smem No 4 1 X A or D unit register file

MOV Cmem,dbl(Lmem) No 3 1 X A or D unit register file

MOV dbl(Lmem), Cmem No 3 1 X A or D unit register file

MOV dbl(Xmem), dbl(Ymem) No 3 1 X A or D unit register file

MOV Xmem, Ymem No 3 1 X A or D unit register file

Memory Delay

Memory-to-Memory Move/Memory Initialization


ESIEE, Slide 140


SyntaxParallel


POP dst1,dst2 Yes 2 1 X A or D unit register file

POP dst Yes 2 1 X A or D unit register file

POP dst, Smem No 3 1 X A or D unit register file

POP dbl(ACx) Yes 2 1 X A or D unit register file

POP Smem No 2 1 X A or D unit register file

POP dbl(Lmem) No 2 1 X A or D unit register file

PSH src1,src2 Yes 2 1 X A or D unit register file

PSH src Yes 2 1 X A or D unit register file

PSH src,Smem No 3 1 X A or D unit register file

PSH dbl(ACx) Yes 2 1 X A or D unit register file

PSH Smem No 2 1 X A or D unit register file

PSH dbl(Lmem) No 2 1 X A or D unit register file

Pop Top of Stack (TOS)

Push to Top of Stack (TOS)


ESIEE, Slide 141


SyntaxParallel


MOV TAx, BRC0 Yes 2 1 X A unit ALU

MOV TAx, BRC1 Yes 2 1 X A unit ALU

MOV TAx, CDP Yes 2 1 X A unit ALU

MOV TAx, CSR Yes 2 1 X A unit ALU

MOV TAx, SP Yes 2 1 X A unit ALU

MOV TAx, SSP Yes 2 1 X A unit ALU

MOV BRC0, Tax Yes 2 1 X A unit ALU

MOV BRC1, Tax Yes 2 1 X A unit ALU

MOV CDP, Tax Yes 2 1 X A unit ALU

MOV RPTC, Tax Yes 2 1 X A unit ALU

MOV SP, Tax Yes 2 1 X A unit ALU

MOV SSP, Tax Yes 2 1 X A unit ALU

Specific CPU Register Move


ESIEE, Slide 142


SyntaxParallel


MOV BK03, Smem No 3 1 X A or D unit register file

MOV BK47, Smem No 3 1 X A or D unit register file

MOV BKC, Smem No 3 1 X A or D unit register file

MOV BSA01, Smem No 3 1 X A or D unit register file




MOV BSAC, Smem No 3 1 X A or D unit register file

MOV BRC0, Smem No 3 1 X A or D unit register file

MOV BRC1, Smem No 3 1 X A or D unit register file

Specific CPU Register Store 1/2


ESIEE, Slide 143


SyntaxParallel


MOV CDP, Smem No 3 1 X A or D unit register file

MOV CSR, Smem No 3 1 X A or D unit register file

MOV DP, Smem No 3 1 X A or D unit register file

MOV DPH, Smem No 3 1 X A or D unit register file

MOV PDP, Smem No 3 1 X A or D unit register file

MOV SP, Smem No 3 1 X A or D unit register file

MOV SSP, Smem No 3 1 X A or D unit register file

MOV TRN0, Smem No 3 1 X A or D unit register file

MOV TRN1, Smem No 3 1 X A or D unit register file

MOV RETA, dbl(Lmem) No 3 5 X A or D unit register file

Specific CPU Register Store 2/2


ESIEE, Slide 144

Program-control operationsProgram-control operations

x/y cycles: x cycles = condition true, y cycles = condition falsex/y cycles: x cycles = condition true, y cycles = condition false

† † These instructions execute in 3 cycles if the addressed instruction is These instructions execute in 3 cycles if the addressed instruction is in the instruction buffer unit.in the instruction buffer unit.

SyntaxParallel


BCC l4, cond No 2 6/5 RBCC L8, cond Yes 3 6/5 RBCC L16, cond No 4 6/5 RBCC P24, cond No 5 5/5 R

B Acx No 2 10 XB L7 Yes 2 6† ADB L16 Yes 3 6† ADB P24 No 4 5 D

BCC L16, ARn_mod ! = #0 No 4 6/5 AD

Branch Conditionally

Branch Unconditionally

Branch on Auxiliary Register Not Zero


ESIEE, Slide 145


SyntaxParallel


CALLCC L16, cond No 4 6/5 RCALLCC P24, cond No 5 5/5 R

CALL Acx No 2 10 XCALL L16 Yes 3 6 ADCALL P24 No 4 5 D

BCC[U] L8, src RELOP K8 No 4 7/6 X

Call Conditionally

Call Unconditionally

Compare and Branch

x/y cycles: x cycles = condition true, y cycles = condition false.x/y cycles: x cycles = condition true, y cycles = condition false.


ESIEE, Slide 146


SyntaxParallel


XCC [label, ]cond No 2 1 ADXCCPART [label, ]cond No 2 1 X

IDLE No 4 ? D No 4 ? D

NOP Yes 1 1 DNOP_16 Yes 2 1 D

Idle

No Operation (NOP)

Execute Conditionally


ESIEE, Slide 147


SyntaxParallel


RPTBLOCAL pmad Yes 2 1 ADRPTB pmad Yes 3 1 AD

RPTCC k8, cond Yes 3 1 AD

RPT CSR Yes 2 1 ADRPTADD CSR, Tax Yes 2 1 XRPT k8 Yes 2 1 ADRPTADD CSR, k4 Yes 2 1 XRPTSUB CSR, k4 Yes 2 1 XRPT k16 Yes 3 1 AD

Repeat Single Instruction Unconditionally

Repeat Block of Instructions Unconditionally

Repeat Single Instruction Conditionally


ESIEE, Slide 148


x/y cycles: x cycles = condition true, y cycles = condition falsex/y cycles: x cycles = condition true, y cycles = condition false

SyntaxParallel


RETCC cond Yes 3 5/5 R

RET Yes 2 5 D

RETI Yes 2 5 D

INTR k5 No 2 3 D

RESET No 2 ? D

TRAP k5 No 2 ? D

Software Interrupt

Software Reset

Software Trap

Return Conditionally

Return Unconditionally

Return from Interrupt


ESIEE, Slide 149

Running C54 code on the C55Running C54 code on the C55

Assembling/optimizing a C54x algorithmAssembling/optimizing a C54x algorithm

System issues (interrupts, peripherals, etc)System issues (interrupts, peripherals, etc)


ESIEE, Slide 150

Running C54 Code on the C55Running C54 Code on the C55 C55 assembler accepts all C54x mnemonics (no translationC55 assembler accepts all C54x mnemonics (no translation

necessary: C55 instructions are a superset of C54)necessary: C55 instructions are a superset of C54)

C54 source will execute correctly on C55 (bit exact)C54 source will execute correctly on C55 (bit exact)

User must optimize C54 code to use new C55 featuresUser must optimize C54 code to use new C55 features

What % of commonly used C54 instructions assemble asWhat % of commonly used C54 instructions assemble asONE instruction on the C55 ?ONE instruction on the C55 ?

One C54x instruction assembles as:

1 C55x instruction(97%)

2 C55x instructions(2%)

3+ C55x instructions(1%)

2 areas of focus:2 areas of focus: AlgorithmsAlgorithms SystemSystem

Let’s look at Let’s look at algorithms first...algorithms first...


ESIEE, Slide 151

C54 Filter Algorithm ExampleC54 Filter Algorithm Example

fir: fir: STM #184,BRCSTM #184,BRCSTM #x,AR3STM #x,AR3STM #y,AR4STM #y,AR4RPTB done-1RPTB done-1 STM #a0,AR2STM #a0,AR2 MPY *AR2+,*AR3+,AMPY *AR2+,*AR3+,A RPTRPT #14#14 MACMAC *AR2+,*AR3+,A*AR2+,*AR3+,A

MAR *+AR3(#-15)MAR *+AR3(#-15) STHSTH A,*AR4+ A,*AR4+

done:done: B next_routineB next_routine

fir: fir: AMOV #184,BRC0AMOV #184,BRC0AMOV #x,XAR3AMOV #x,XAR3AMOV #y,XAR4AMOV #y,XAR4RPTB doneRPTB done AMOV #a0,XAR2AMOV #a0,XAR2 MPYM *AR2+,*AR3+,AC0MPYM *AR2+,*AR3+,AC0 RPT #14RPT #14 MACM *AR2+,*AR3+,AC0MACM *AR2+,*AR3+,AC0

AMAR *+AR3(#-15)AMAR *+AR3(#-15)done:done: MOVH AC0,*AR4+ MOVH AC0,*AR4+

B next_routineB next_routine

Original C54 CodeOriginal C54 Code Resulting C55 DisassemblyResulting C55 Disassembly

The code above assembles AS IS - no errors, no warningsThe code above assembles AS IS - no errors, no warnings User must modify code to use C55 dual-MAC (pointers, MAC:MAC)User must modify code to use C55 dual-MAC (pointers, MAC:MAC)

RPTB kernel: 20 bytes, 4072 cyclesRPTB kernel: 20 bytes, 4072 cycles

Does “anything go” with C54 source? Nope...Does “anything go” with C54 source? Nope...

kernel (w/o B): 17 bytes, 3701 cycleskernel (w/o B): 17 bytes, 3701 cycles


ESIEE, Slide 152

C54 Coding Practices - No-No’sC54 Coding Practices - No-No’s

hard_addr: B 1000hhard_addr: B 1000h

B loop+10B loop+10

ERROR! at line 39: [E9999]ERROR! at line 39: [E9999] Illegal operand or Illegal operand or operand combinationoperand combination

Original C54 CodeOriginal C54 Code C55 Assembler OutputC55 Assembler Output

Cannot use hard-coded address or program label offsetsCannot use hard-coded address or program label offsets

C55 uses variable-length instructions. Location of hardC55 uses variable-length instructions. Location of hardaddress and/or offset will be different.address and/or offset will be different.

pipe_trick: STM #100h,AR5pipe_trick: STM #100h,AR5

STLM A,AR5STLM A,AR5

ADD *AR5,BADD *AR5,B

pipe_trick: AMOV #100h,AR5pipe_trick: AMOV #100h,AR5

MOV AC0,AR5 MOV AC0,AR5

ADD *AR5,AC1ADD *AR5,AC1


Pipe Trick: C54 Pipe Trick: C54 ADDADD uses old value of AR5 (#100h) uses old value of AR5 (#100h) Code assembles fine, but will NOT yield same resultsCode assembles fine, but will NOT yield same results C55 PPU stalls C55 PPU stalls ADDADD until AC0 until AC0 AR5 AR5


ESIEE, Slide 153

Other C55 Assembler ExamplesOther C55 Assembler Examples

b_delay: BD nextb_delay: BD next

STM #x,AR5STM #x,AR5


No delayed operations on C55No delayed operations on C55

Delay slot instruction (Delay slot instruction (STMSTM) moved ahead of) moved ahead ofstandard branch (same with standard branch (same with CALLD, RETD, BCDCALLD, RETD, BCD))

b_delay: AMOV #x,AR5b_delay: AMOV #x,AR5

B nextB next

mv: MVDK *AR3,#800hmv: MVDK *AR3,#800h

Resulting C55 .lst FileResulting C55 .lst File

““PORT”PORT” used to signify 1-to-multiple translation used to signify 1-to-multiple translation Some temporary registers may be used (e.g. Some temporary registers may be used (e.g. XCDPXCDP) )

57 ****** PORT MVDK *AR4,#800h 57 ****** PORT MVDK *AR4,#800h 57 000038 EC31 AMAR *(#(2048)),XCDP57 000038 EC31 AMAR *(#(2048)),XCDP 00003a 7E00 00003a 7E00 00003c 1000 00003c 1000 57 00003e EF81 MOV *AR4,*CDP57 00003e EF81 MOV *AR4,*CDP

Original C54 CodeOriginal C54 Code

Next, let’s examine some system issues...Next, let’s examine some system issues...


ESIEE, Slide 154

System Issues (1)System Issues (1) The user must make the following modifications to the The user must make the following modifications to the

existing C54 code to achieve proper execution on a C55:existing C54 code to achieve proper execution on a C55:

Add System Stack (XSSP)Add System Stack (XSSP) Must also initialize the system stack (XSSP) for CALL/RETMust also initialize the system stack (XSSP) for CALL/RET

instructions to operate properly on C55instructions to operate properly on C55

SP and SSP must reside on the same 64K (word) pageSP and SSP must reside on the same 64K (word) page

22

rsv: BD reset_ISRrsv: BD reset_ISR

STM #TOS,SPSTM #TOS,SP

Vector Table - C54Vector Table - C54 Vector Table - C55Vector Table - C55

.ivec specifies vector address and aligns vector packet.ivec specifies vector address and aligns vector packeton an 8-byte boundary.on an 8-byte boundary.

C55 vector packet allows one instruction (up to 4 bytes)C55 vector packet allows one instruction (up to 4 bytes)to be executedto be executed

rsv: .ivec reset_ISRrsv: .ivec reset_ISR

AMOV #TOS,XSPAMOV #TOS,XSP

11 Re-Write Vector TableRe-Write Vector Table


ESIEE, Slide 155

System Issues (2)System Issues (2)33 Re-write any instructions which access peripheralsRe-write any instructions which access peripherals

C55 does not support sub-bank addressingC55 does not support sub-bank addressing

C55 peripheral registers are located in I/OC55 peripheral registers are located in I/O

C55 may have a different mix of peripheralsC55 may have a different mix of peripherals

44 Re-write linker.CMD file to use BYTE addresses/lengthsRe-write linker.CMD file to use BYTE addresses/lengths C55 uses a unified memory map (program/data share same map)C55 uses a unified memory map (program/data share same map)

C55 linker.CMD file uses BYTE addresses and BYTE lengthsC55 linker.CMD file uses BYTE addresses and BYTE lengths

55 Possibly re-write C-callable assembly routinesPossibly re-write C-callable assembly routines C55 compiler passes parameters differently than the C54C55 compiler passes parameters differently than the C54

by defaultby default

C55 can support C54 parameter passing rules if specifiedC55 can support C54 parameter passing rules if specifiedby the userby the user


ESIEE, Slide 156

User Must Optimize C54 CodeUser Must Optimize C54 Code

The final step in the migration process is to optimize theThe final step in the migration process is to optimize thecode to take advantage of the new features of the C55:code to take advantage of the new features of the C55:

Dual MACDual MAC Additional registersAdditional registers Additional memory address rangeAdditional memory address range Address unit ALUAddress unit ALU Instruction-level parallelismInstruction-level parallelism Nested repeat blocksNested repeat blocks Etc.Etc.


ESIEE, Slide 157

For More InformationFor More Information This introduction to C54-to-C55 code migration isThis introduction to C54-to-C55 code migration is

NOTNOT a complete list of all issues a complete list of all issues

Additional details can be found in the following locations:Additional details can be found in the following locations:

SPRU280b - TMS320C55x Assembly Language Tools User’s GuideSPRU280b - TMS320C55x Assembly Language Tools User’s Guide

Chapter 6 - Running C54x Code on C55xChapter 6 - Running C54x Code on C55x

Chapter 7 - Migrating a C54x System to a C55x SystemChapter 7 - Migrating a C54x System to a C55x System

CCS On-line HelpCCS On-line Help

Code Generation Tools Code Generation Tools Running C54x Code on C55x Running C54x Code on C55x

copyright © 2003 texas instruments. all rights reserved. dsp c5000 chapter 5 assembly language

Documents

slide slide

field2 unsignedfield1

field2 field1

field2 field4

slide shift field

field2 result

cycle slide

assembly language slide