copyright © 2003 texas instruments. all rights reserved. dsp c5000 chapter 5 assembly language
TRANSCRIPT
Copyright © 2003 Texas Instruments. All rights reserved.
DSP C5000DSP C5000
Chapter 5Chapter 5
Assembly LanguageAssembly Language
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 2
Assembly LanguageAssembly Language
Two Main types of assembly language :Two Main types of assembly language : AlgebraicAlgebraic MnemonicMnemonic
Both C54x and C55x can use either type Both C54x and C55x can use either type of assembly language.of assembly language.
C54C54
Running C54 code on the C55
C55C55
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 3
C54x Assembly LanguageC54x Assembly Language
The instruction set is divided into four The instruction set is divided into four basic types :basic types : ArithmeticArithmetic LogicLogic LoadLoad, Store & , Store & MoveMove ProgramProgram control control
C54x has a fixed length instruction C54x has a fixed length instruction wordword
Instruction must be encoded in one 16-bit word Instruction must be encoded in one 16-bit word in order to be executed in one cyclein order to be executed in one cycle
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 4
Instructions and OperandsInstructions and Operands
General syntax of an instruction :General syntax of an instruction : InstrInstr Op1,[Op2,[Op3,[…]]] Op1,[Op2,[Op3,[…]]]
For For InstrInstr field refer to TI documentation or field refer to TI documentation or following slidesfollowing slides
The The Op1,[Op2,[Op3,[…]]]Op1,[Op2,[Op3,[…]]] field syntax is field syntax is specified in instruction documentation and specified in instruction documentation and specifies the way (type of addressing mode) specifies the way (type of addressing mode) you could use for the operands.you could use for the operands.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 5
Operand SyntaxOperand Syntax
Term Operand nature and access mode
Smem Data memory accessed by any of addressing mode.
pmad Absolute addressing of program memory
dmad Absolute addressing of data memory
Ymem Data memory accessed only by indirect addressing (AR2 through . AR5 only).
Xmem Data memory accessed only by indirect addressing (AR2 through AR5 only).
PA 16-bit port (I/O) immediate address (0 - 65,535)
src Source accumulator (A or B) dst Destination accumulator (A or B)
k,lk Short and long immediate addressing (#constant)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 6
Arithmetic InstructionsArithmetic Instructions General purpose Arithmetic :General purpose Arithmetic :
Addition/Addition/subtractionsubtraction MultiplyMultiply ( (and accumulateand accumulate)) SquareSquare DivideDivide
Application Specific Arithmetic :Application Specific Arithmetic : MiscellaneousMiscellaneous Polynomial Polynomial evaluationevaluation Distance computationDistance computation Specific filtersSpecific filters ButterflyButterfly computation computation (Viterbi) (Viterbi)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 7
General Purpose ArithmeticGeneral Purpose Arithmetic Addition/subtractionAddition/subtraction
field4=field3 field4=field3 field1*2field1*2field2field2
Result is stored in field4 if present else in field3, shift is done Result is stored in field4 if present else in field3, shift is done according to field2 if present.according to field2 if present.
Shift field is detailed on the next slide …Shift field is detailed on the next slide …
Inst field1 field2 field3 field4 words1 ADD/SUB Smem, src 12 ADD/SUB Smem, TS, src 13 ADD/SUB Smem, 16 src [,dst] 14. ADD/SUB Smem, [SHIFT,] src [,dst] 25 ADD/SUB Xmem, SHIFT1, src 16 ADD/SUB Xmem, Ymem, dst 17 ADD/SUB #lk, [SHIFT1,] src [,dst] 28 ADD/SUB #lk, 16 src [,dst] 29 ADD/SUB src, [SHIFT] [,dst] 110 ADD/SUB src, ASM [,dst] 1
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 8
Shift FieldShift Field
Many instructions use shift on one operand. Many instructions use shift on one operand. This shift is specified in operands field and This shift is specified in operands field and could be :could be : Immediate if specified by the keywords: Immediate if specified by the keywords:
-16 -16 SHIFTSHIFT 15 15 0 0 SHIFT1SHIFT1 15 15
Register indirect:Register indirect: -16 -16 ASMASM 15 ( 15 (AAccumulator ccumulator SShift hift MMode field of ode field of ST1ST1)) -16 -16 TSTS 31 ( 31 (TSTS are ths 6 LSBs of are ths 6 LSBs of T T register)register)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 9
General Purpose ArithmeticGeneral Purpose Arithmetic Addition/subtraction (special cases)Addition/subtraction (special cases)
Arithmetic with unsigned operand :Arithmetic with unsigned operand :
field2=field2 field2=field2 unsigned(field1)unsigned(field1)
Direct computation on memory :Direct computation on memory :
With :With :
Inst field1 field2 words1 ADDS/SUBS Smem, src 1
Inst field1 field2 words1 ADDM #lk, Smem 2
SXMSXM=1=1
field2=field2 field2=field2 +signed(field1)+signed(field1)
-32768 -32768 lk lk 3276732767
SXMSXM=0=0
field2=field2 +unsigned(field1)field2=field2 +unsigned(field1)
00 lk lk 6553565535
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 10
General Purpose ArithmeticGeneral Purpose Arithmetic Addition/subtraction (extended precision) :Addition/subtraction (extended precision) :
32 bits : 32 bits :
1 : field2=field2 1 : field2=field2 field1field1 2 : field2=field1-field22 : field2=field1-field2
If If C16C16=0 field1 and field2 are considered as 32-bit =0 field1 and field2 are considered as 32-bit operands and 32 bits operands and 32 bits is realized. is realized.
If If C16C16=1 field1 and field2 are considered as a pair of 16-bit =1 field1 and field2 are considered as a pair of 16-bit operands and SIMD computation take place.operands and SIMD computation take place.
64 bits :64 bits :
field2=field2 field2=field2 unsigned(field1) unsigned(field1) carry/borrowcarry/borrow
Inst field1 field2 words1 DADD/DSUB Lmem src 12 DRSUB Lmem src 1
Inst field1 field2 words1 ADDC/SUBB Smem, src 1
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 11
General Purpose ArithmeticGeneral Purpose Arithmetic Addition/subtraction (extended precision contd)Addition/subtraction (extended precision contd)
64 bits addition/subtraction :64 bits addition/subtraction :
is realized as follows: is realized as follows: ((Look at a code exemple) at a code exemple)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 12
General Purpose ArithmeticGeneral Purpose Arithmetic MultiplyMultiply
1,3 : field3=field11,3 : field3=field1TREGTREG 22
2,4 : field3=field12,4 : field3=field1field2field2
Multiply and Accumulate/SubtractMultiply and Accumulate/Subtract
1 : field3=field3 1 : field3=field3 (field1 (field1TREGTREG) ) 22
3 : field4=field3 3 : field4=field3 (field1 (field1TREGTREG))11
2,4 : field4=field3 2,4 : field4=field3 (field1 (field1field2) field2) 1, 21, 2
Inst field1 field2 field3 words1 MPY[R] Smem, dst 12 MPY Xmem, Ymem, dst 13 MPY #lk dst 24 MPY Smem, #lk, dst 2
1 1 Result is stored in field4 if present else in field3Result is stored in field4 if present else in field322 [R] : rounding result on the 16bits MSB of [R] : rounding result on the 16bits MSB of dst,dst, 16bits LSB are zeroed 16bits LSB are zeroed
Inst field1 field2 field3 field4 words1 MAC[R]/MAS[R] Smem, src 12 MAC[R]/MAS[R] Xmem, Ymem, src [,dst] 13 MAC/MAS #lk src [,dst] 24 MAC/MAS Smem, #lk, src [,dst] 2
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 13
General Purpose ArithmeticGeneral Purpose Arithmetic Multiply and Accumulate (with program memory)Multiply and Accumulate (with program memory)
1,2 : field4=field3 1,2 : field4=field3 (field1 (field1field2)field2) 2 : contents of data memory pointed by field1 operand (Smem) 2 : contents of data memory pointed by field1 operand (Smem)
is copied in next following data memory address.is copied in next following data memory address.
Delay of data together with scalar product are needed Delay of data together with scalar product are needed for FIR filter computationfor FIR filter computation
1 : 1 : TREGTREG=field1 and contents of data memory pointed by =field1 and contents of data memory pointed by field1 is copied in the next following data memory address.field1 is copied in the next following data memory address.
Inst field1 field2 field3 words
1 MACP Smem, pmad, src 22 MACD Smem, pmad, src 2
Inst field1 words1 LTD Smem 1
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 14
General Purpose ArithmeticGeneral Purpose Arithmetic Multiply, Accumulate and DelayMultiply, Accumulate and Delay
In all cases In all cases x(n-k)x(n-k) are in data memory are in data memory Case 1: Case 1: h(k)h(k) are in program memory are in program memory
Case 2: Case 2: h(k)h(k) are in data memory are in data memory
y n h k x n kk
N
( ) ( ) ( )
0
1
y n h k x n kk
N
( ) ( ) ( )
0
1
RPTRPT #N-1#N-1
MACDMACD *AR1-,coef,A*AR1-,coef,ARPTRPT #N-1#N-1
MACDMACD *AR1-,coef,A*AR1-,coef,A
RPTBRPTB endLoop-1endLoop-1
LTDLTD *AR1-*AR1-
MACMAC *AR2+,A*AR2+,A
endLoop:endLoop:
RPTBRPTB endLoop-1endLoop-1
LTDLTD *AR1-*AR1-
MACMAC *AR2+,A*AR2+,A
endLoop:endLoop:
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 15
General Purpose ArithmeticGeneral Purpose Arithmetic Multiply (with accumulator)Multiply (with accumulator)
1 : 1 : BB=field1 =field1 A(32-16)A(32-16)11
2 : field1=2 : field1=TREGTREGA(32-16)A(32-16)11
Multiply and Accumulate/Subtract(with Multiply and Accumulate/Subtract(with accumulator)accumulator)
1 : 1 : BB==BB (field1 (field1 A(32-16)A(32-16)))1,21,2
2 : field3=field2 2 : field3=field2 ( (TREGTREGA(32-16)A(32-16)))1,2,31,2,3
Inst field1 words1 MPYA Smem 12 MPYA dst 1
11A(32-16)A(32-16) stands for the 16bits MSB of accumulator stands for the 16bits MSB of accumulator AA, , BB stands for accumulator stands for accumulator BB..22 [R] : rounding result on the 16bits MSB of [R] : rounding result on the 16bits MSB of dst,dst, 16bits LSB are zeroed 16bits LSB are zeroed3 3 Result is stored in field3 if present else in field2Result is stored in field3 if present else in field2
Inst field1 field2 field3 words1 MACA[R]/MASA Smem [,B] 12 MACA[R]/MASA[R] T, src [,dst] 1
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 16
General Purpose ArithmeticGeneral Purpose Arithmetic
Extended precision multiplicationExtended precision multiplication
1: field3=unsigned(field1) 1: field3=unsigned(field1) unsigned(unsigned(TREGTREG)) 2: field3=field3 +(unsigned(field1) 2: field3=field3 +(unsigned(field1) signed(field2))signed(field2))
MPYUMPYU is equivalent to is equivalent to MPYMPY syntax 1, but with unsigned syntax 1, but with unsigned operands.operands.
MACSUMACSU is equivalent to is equivalent to MACMAC syntax 2, but with field1 syntax 2, but with field1 operand unsigned.operand unsigned.
Inst field1 field2 field3 words1 MPYU Smem, src 12 MACSU Xmem, Ymem, src 1
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 17
General Purpose ArithmeticGeneral Purpose Arithmetic Extended precision multiplicationExtended precision multiplication
Principle :Principle :
LookLook at a code exemple at a code exemple
MPYU
MACSU
MAC
MACSU
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 18
General Purpose ArithmeticGeneral Purpose Arithmetic
SquareSquare
1: field2=field1 1: field2=field1 field1 field1 2: field2=2: field2=A(32-16) A(32-16) A(32-16) A(32-16) 11
3: field2=field2 3: field2=field2 (field1(field1field1)field1)
Inst field1 field2 words1 SQUR Smem, dst 12 SQUR A, dst 13 SQURA/SQURS Smem, src 1
11A(32-16)A(32-16) stands for the 16bits MSB of accumulator stands for the 16bits MSB of accumulator AA
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 19
General Purpose ArithmeticGeneral Purpose Arithmetic
DivideDivide Division is implemented by using repeated conditional Division is implemented by using repeated conditional
subtraction. subtraction.
Perform a single cycle 1-bit unsigned divide instruction:Perform a single cycle 1-bit unsigned divide instruction: Dividend (numerator) is in LSB of Dividend (numerator) is in LSB of srcsrc and divisor in and divisor in SmemSmem, ,
then :then :
after operation the quotient is LSB of after operation the quotient is LSB of srcsrc and remainder in and remainder in MSB of MSB of src.src.
Inst field1 field2 words1 SUBC Smem, src 1
(src) - (Smem) << 15 --> ALU output(src) - (Smem) << 15 --> ALU output
If If ALU output ALU output 00
ThenThen (ALU output) << 1 + 1 --> src(ALU output) << 1 + 1 --> src
ElseElse (src) << 1 --> src(src) << 1 --> src
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 20
General Purpose ArithmeticGeneral Purpose Arithmetic Division Routine Division Routine ((MoreMore examples) examples)
LD @den,16,A
MPYA @num B = num*den (tells sign)
ABS A Strip sign of denominator
STH A,@den
LD @num,A
ABS A Strip sign of numerator
RPT #15 16 iterations
SUBC @den,A 1-bit divide
XC 1,BLT If result needs to be negative
NEG A Invert sign
STL A,@quot Store negative result
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 21
Miscellaneous ArithmeticMiscellaneous Arithmetic ABSABS src,[dst]src,[dst] ; compute the absolute ; compute the absolute
value of value of srcsrc and store it in and store it in dst dst if specified if specified srcsrc otherwise (otherwise (dst=|src|dst=|src|).).
NEG src,[dst]NEG src,[dst] ; store the 2s complement of ; store the 2s complement of srcsrc in in dstdst if specified, if specified, srcsrc otherwise ( otherwise (dst=-srcdst=-src).).
MAX(dst)MAX(dst),,MIN(dst)MIN(dst) ; store in ; store in dstdst the greatest the greatest ((resp.resp. the lowest) between the lowest) between AA and and BB accumulator accumulator ((dst=MAX(A,B)dst=MAX(A,B),, dst=MIN(A,B) dst=MIN(A,B)). ).
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 22
Miscellaneous ArithmeticMiscellaneous Arithmetic
EXPEXP//NORMNORM :Tools for fixed point to (block)- :Tools for fixed point to (block)-floating point conversionfloating point conversion11 Store the high part of the accumulator (Store the high part of the accumulator (AA or or BB) in ) in
« Mantissa*2^Exponent » form.« Mantissa*2^Exponent » form.
EXP srcEXP src ; compute the number of shift necessary ; compute the number of shift necessary to normalize the high part of the accumulator to normalize the high part of the accumulator srcsrc and store it in and store it in T T register (register (T=EXP(src)T=EXP(src)) . Here ) . Here T=3T=3 after operation. Because of the guard bits, after operation. Because of the guard bits, TT could be negative after operation.could be negative after operation.
1 1 see ch13 «Numerical Issues » for in depth explanation of floating point formatsee ch13 «Numerical Issues » for in depth explanation of floating point format
Guard bitsGuard bits High partHigh part low partlow part
……/…/…
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 23
Miscellaneous ArithmeticMiscellaneous Arithmetic NORM src,[dst] NORM src,[dst] ; The contents of the accumu-; The contents of the accumu-
lator lator srcsrc is shifted according to the value in is shifted according to the value in TT reg reg ans stored in ans stored in dstdst if specified, if specified, src src otherwise . otherwise . ((dst=src<<TSdst=src<<TS).).Accumulator after operation :Accumulator after operation :
Guard bitsGuard bits High partHigh part low partlow part
.bss Mantissa,2,1.bss Mantissa,2,1ExpoExpo .set Mantissa+1.set Mantissa+1
.text.text
.bss Mantissa,2,1.bss Mantissa,2,1ExpoExpo .set Mantissa+1.set Mantissa+1
.text.text
; A = 1234h; A = 1234hformat: LD #Mantissa,DPformat: LD #Mantissa,DP
EXP AEXP A NORM ANORM A STST T,@ExpoT,@Expo STH A,@MantissaSTH A,@Mantissa
; A = 1234h; A = 1234hformat: LD #Mantissa,DPformat: LD #Mantissa,DP
EXP AEXP A NORM ANORM A STST T,@ExpoT,@Expo STH A,@MantissaSTH A,@Mantissa 0012
48D0MantissaMantissaExpoExpo
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 24
Miscellaneous ArithmeticMiscellaneous Arithmetic
Rounding and saturation are intended for Rounding and saturation are intended for finite precision and finite dynamic number finite precision and finite dynamic number representation :representation : RND src,[dst] RND src,[dst] ; the high part of the accumulator ; the high part of the accumulator
srcsrc is rounded up and stored either in is rounded up and stored either in dst dst if if specified or in specified or in src src ((dst=rnd(src)dst=rnd(src)).).
SAT src SAT src ; If ; If srcsrc is greater than 32767 then is greater than 32767 then src src is is set to 32767 (007FFFh). If set to 32767 (007FFFh). If srcsrc is lower than –32768 is lower than –32768 then then src src is set to –32768 (FF8000h) is set to –32768 (FF8000h) ((SATURATE(src)SATURATE(src)). ).
Before roundingBefore rounding After roundingAfter rounding
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 25
Polynomial EvaluationPolynomial Evaluation Considering the 3rd order polynomial:Considering the 3rd order polynomial:
It can be computed as:It can be computed as:
in an iterative way:in an iterative way:
012
23
3 axaxaxay 01
22
33 axaxaxay
0123 axaxaxay 0123 axaxaxay
;
;
;
0
1
23
ayxy
ayxy
axay
;
;
;
0
1
23
ayxy
ayxy
axay
……/…/…
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 26
Polynomial EvaluationPolynomial Evaluation Before using Before using POLYPOLY instruction, we have to load instruction, we have to load TT
reg. with the proper value for reg. with the proper value for xx.. POLY SmemPOLY Smem ; The high part of accumulator ; The high part of accumulator AA is is
multiplied by multiplied by TT reg. then added with the high part reg. then added with the high part of of BB and stored in and stored in A. A. The high part of accumula-The high part of accumula-tor tor BB is loaded with the contents of is loaded with the contents of SmemSmem (current (current coefficient) (coefficient) (POLY(Smem)POLY(Smem)).).
coefcoef .sect “COEF”.sect “COEF”.word 1234h,3456h;.word 1234h,3456h;.word 4567h,5678h;.word 4567h,5678h;.bss.bss y,1y,1.text.text
coefcoef .sect “COEF”.sect “COEF”.word 1234h,3456h;.word 1234h,3456h;.word 4567h,5678h;.word 4567h,5678h;.bss.bss y,1y,1.text.text
; A(15-0) = 7FFCh (x); A(15-0) = 7FFCh (x)PoEval: STLM A,TPoEval: STLM A,T
STM #COEF,AR1STM #COEF,AR1 LD *AR1+,16,ALD *AR1+,16,A LD *AR1+,16,BLD *AR1+,16,B RPT #2RPT #2 POLY *AR1+POLY *AR1+ STHSTH A,*(y) A,*(y)
; A(15-0) = 7FFCh (x); A(15-0) = 7FFCh (x)PoEval: STLM A,TPoEval: STLM A,T
STM #COEF,AR1STM #COEF,AR1 LD *AR1+,16,ALD *AR1+,16,A LD *AR1+,16,BLD *AR1+,16,B RPT #2RPT #2 POLY *AR1+POLY *AR1+ STHSTH A,*(y) A,*(y)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 27
Distance ComputationDistance Computation ABDSTABDST Xmem,YmemXmem,Ymem ; computes the L1 ; computes the L1
norm of the distance between 2 vectors according norm of the distance between 2 vectors according to (to (ABDST(Xmem,Ymem)ABDST(Xmem,Ymem)):):
1
0
N
iii yxD
1
0
N
iii yxD.bss X,10.bss X,10
.bss Y,10.bss Y,10
.bss D,1.bss D,1
.text.text
.bss X,10.bss X,10
.bss Y,10.bss Y,10
.bss D,1.bss D,1
.text.text
dist: STMdist: STM #X,AR2#X,AR2STMSTM #Y,AR3#Y,AR3RPTRPT #10#10ABDSTABDST *AR2+,*AR3+*AR2+,*AR3+STHSTH B,*(D)B,*(D)
dist: STMdist: STM #X,AR2#X,AR2STMSTM #Y,AR3#Y,AR3RPTRPT #10#10ABDSTABDST *AR2+,*AR3+*AR2+,*AR3+STHSTH B,*(D)B,*(D)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 28
Distance ComputationDistance Computation SQDSTSQDST Xmem,YmemXmem,Ymem ; computes the ; computes the
squared Lsquared L22 norm of the distance between 2 vectors norm of the distance between 2 vectors according to (according to (SQDST(Xmem,Ymem)SQDST(Xmem,Ymem)):):
1
0
2N
iii yxD
1
0
2N
iii yxD.bss X,10.bss X,10
.bss Y,10.bss Y,10
.bss D,1.bss D,1
.text.text
.bss X,10.bss X,10
.bss Y,10.bss Y,10
.bss D,1.bss D,1
.text.text
dist: STMdist: STM #X,AR2#X,AR2STMSTM #Y,AR3#Y,AR3RPTRPT #10#10SQDSTSQDST *AR2+,*AR3+*AR2+,*AR3+STHSTH B,*(D)B,*(D)
dist: STMdist: STM #X,AR2#X,AR2STMSTM #Y,AR3#Y,AR3RPTRPT #10#10SQDSTSQDST *AR2+,*AR3+*AR2+,*AR3+STHSTH B,*(D)B,*(D)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 29
Specific Filters IntructionsSpecific Filters Intructions
Symmetric FIR filtersSymmetric FIR filters11::An even length symetric FIR filter can be computed An even length symetric FIR filter can be computed
according to:according to:
which yields which yields NN multiplications, because of the multiplications, because of the symmetry of symmetry of h(k)h(k), the equation can be rewritten:, the equation can be rewritten:
yielding only yielding only N/2N/2 multiplications. This optimization multiplications. This optimization is handled by is handled by FIRSFIRS instruction. instruction.
y n h k x n kk
N
( ) ( ) ( )
0
1
y n h k x n kk
N
( ) ( ) ( )
0
1
12
0
1)()()(N
k
kNnxknxkhny
12
0
1)()()(N
k
kNnxknxkhny
1 1 see ch14 «FIR filter implementation » for a full treatment of this topicsee ch14 «FIR filter implementation » for a full treatment of this topic
……/…/…
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 30
Specific Filters IntructionsSpecific Filters Intructions FIRS Xmem,Ymem,pmad FIRS Xmem,Ymem,pmad ;The high part of ;The high part of
accumulator accumulator AA is multiplied by the content of is multiplied by the content of pmadpmad and accumulated in accumulator and accumulated in accumulator BB. . XmemXmem and and YmemYmem are added together and stored in the are added together and stored in the high part of accumulator high part of accumulator A A ((FIRS (Xmem,Ymem,pmad)FIRS (Xmem,Ymem,pmad)).).
At each step, At each step, FIRSFIRS do the following computation: do the following computation:
where where y(n)y(n) is in accumulator is in accumulator BB and and tmptmp in in accumulator accumulator AA..
;111
);()()(
kNnxknxtmp
khtmpnyny
;111
);()()(
kNnxknxtmp
khtmpnyny
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 31
Specific Filters IntructionsSpecific Filters Intructions
LMS AlgorithmLMS Algorithm11::LMS Adaptive filtering require to update coeffi-LMS Adaptive filtering require to update coeffi-
cients of the filter according to an error signal cients of the filter according to an error signal e(n)e(n) while computing the output of the filter while computing the output of the filter y(n). y(n). This This involves the following computations:involves the following computations:
At each step we have two computations: one for the At each step we have two computations: one for the filter tap and update of the coefficient tap.filter tap and update of the coefficient tap.
1 1 see ch16 «Adaptive Filter Implementation » for a full treatment of this topicsee ch16 «Adaptive Filter Implementation » for a full treatment of this topic
;each for )()( 2)()1(
ncomputatioerror );()(ˆ)(
outputfilter );()()(
kkk
kk
hknxnenhnh
nynyne
knxnhny
;each for )()( 2)()1(
ncomputatioerror );()(ˆ)(
outputfilter );()()(
kkk
kk
hknxnenhnh
nynyne
knxnhny
……/…/…
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 32
Specific Filters IntructionsSpecific Filters Intructions LMS Xmem,YmemLMS Xmem,Ymem ; ; XmemXmem is accumulated to the is accumulated to the
high part of accumulator high part of accumulator AA with rounding while with rounding while XmemXmem and and YmemYmem are multiplied and accumula-ted are multiplied and accumula-ted into accumulator into accumulator B B ((LMS(Xmem,Ymem)LMS(Xmem,Ymem)).).
At each step At each step LMSLMS do the following computations: do the following computations:
where where y(n)y(n) is in accumulator is in accumulator BB and and tmptmp in in accumu-lator accumu-lator AA. In addition others instructions . In addition others instructions have to store in accumulator have to store in accumulator AA the error times the the error times the adaptation step and store in adaptation step and store in XmemXmem the updated the updated coefficient value (coefficient value (ST||MPYST||MPY).).
);(
);()()()(
nhtmptmp
knxnhnyny
k
k
);(
);()()()(
nhtmptmp
knxnhnyny
k
k
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 33
Butterfly ComputationButterfly Computation
These instructions are only useful in dual 16 bits mode These instructions are only useful in dual 16 bits mode ((C16C16=1)=1)
1: 1: dstdst(31-16)=Lmem(31-16)+(31-16)=Lmem(31-16)+TREG TREG
dstdst(15-0)=Lmem(15-0)-(15-0)=Lmem(15-0)-TREGTREG
2: 2: dstdst(31-16)=Lmem(31-16)-(31-16)=Lmem(31-16)-TREG TREG
dstdst(15-0)=Lmem(15-0)+(15-0)=Lmem(15-0)+TREGTREG
3: 3: dstdst(31-16)=Lmem(31-16)-(31-16)=Lmem(31-16)-TREG TREG
dstdst(15-0)=Lmem(15-0)-(15-0)=Lmem(15-0)-TREGTREG
1 1 see ch22 «Viterbi Algorithm» for in depth explanation and see ch22 «Viterbi Algorithm» for in depth explanation and CMPS for other for other
Viterbi related instructionsViterbi related instructions
Inst field1 field2 words1 DADST Lmem, dst 12 DSADT Lmem, dst 13 DSUBT Lmem, dst 1
dd
-d-d
NN N+1N+1
-d-d
dd
NN N+1N+1
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 34
Logic InstructionsLogic Instructions
OverviewOverview LogicLogic
Comparison andComparison and bit test bit test
Shift Shift and rotateand rotate
ANDAND OROR XORXOR CMPLCMPL
ANDMANDM ORMORM XORMXORM
SFTLSFTL SFTASFTA SSFTCFTC
RORROR ROLROL ROLTCROLTC
CMPMCMPM CMPRCMPR CMPSCMPS
BITBIT BITFBITF
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 35
Logic InstructionsLogic Instructions Logic operations on accumulatorsLogic operations on accumulators
1,2,3: field4=field3 [1,2,3: field4=field3 [ + + ] field1*2] field1*2field2field2
Result is stored in field4 if present else in field3, shift is done Result is stored in field4 if present else in field3, shift is done according to field2 if present.according to field2 if present.
4: field4=field4 [4: field4=field4 [ + + ] field1*2] field1*2field2field2
field4 is used if present else field1is used instead, shift is done field4 is used if present else field1is used instead, shift is done according to field2 if present.according to field2 if present.
Shift field is recalled on the this Shift field is recalled on the this slideslide … …
field2=field1field2=field111
Result is stored in field2 if present else in field1Result is stored in field2 if present else in field1
Inst field1 field2 field3 field4 words1 AND/OR/XOR Smem, src 12 AND/OR/XOR #lk [,SHIFT1] src [,dst] 23 AND/OR/XOR #lk, 16, src [,dst] 24 AND/OR/XOR src [,SHIFT] [,dst] 1
Inst field1 field2 words1 CMPL src, [,dst] 1
11bit complementbit complement
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 36
Logic InstructionsLogic Instructions
Logic with memoryLogic with memory
field2=field1 [field2=field1 [ + + ] field2] field2 About About ANDMANDM look at look at BITFBITF
Inst field1 field2 words1 ANDM/ORM/XORM #lk, Smem 2
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 37
Logic InstructionsLogic Instructions
Comparison (memory)Comparison (memory)
Equality testEquality test TCTC=1 if field1==filed2, else =1 if field1==filed2, else TCTC=0=0
Comparison (auxiliary register)Comparison (auxiliary register)
Versatile comparisonVersatile comparison ARxARx is compared against is compared against AR0AR0 according to according to CC CC (field1) and (field1) and TCTC
is set if compare successis set if compare success
Inst field1 field2 words1 CMPM Smem, #lk 2
Inst field1 field2 words1 CMPR CC, ARx 2
00 =01 !=10 >
11 <
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 38
Logic InstructionsLogic Instructions
Compare, select, store (and remember)Compare, select, store (and remember)Intended for Viterbi algorithm (see Chapter 22 for an in depth treatment, Intended for Viterbi algorithm (see Chapter 22 for an in depth treatment,
see see DSADT and DADST for other Viterbi related instructions) for other Viterbi related instructions)xx
yy
??NN N+1N+1
Two paths arrive to a node of stage Two paths arrive to a node of stage N+1N+1
from stage from stage NN. Only one will be retained. Only one will be retained
according to its weight according to its weight xx or or yy..
Inst field1 field2 words1 CMPS src, Smem 1
We guess that src(32-16)=We guess that src(32-16)=xx and src(15-0)= and src(15-0)=yy then then
If src(32-16) > src(15-0) thenIf src(32-16) > src(15-0) then
Smem=src(31-16)Smem=src(31-16)
TRN=(TRN) << 1TRN=(TRN) << 1
TRN(0)=0TRN(0)=0
TC=0TC=0
Else src(32-16) Else src(32-16) src(15-0) then src(15-0) then
Smem=src(15-0)Smem=src(15-0)
TRN=(TRN) << 1TRN=(TRN) << 1
TRN(0)=1TRN(0)=1
TC=1TC=1
TransitionTransition
registerregister
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 39
Logic InstructionsLogic Instructions
Bit testBit test
BITBIT and and BITTBITT set set TCTC according to a bit value in a word according to a bit value in a word specified by the operand in field1. The bit number is specified specified by the operand in field1. The bit number is specified either by either by BITCBITC in case of syntax 1 or by in case of syntax 1 or by TT[3..0] register in case [3..0] register in case of syntax 2.of syntax 2.
Bit numbering is in reverse order, with 0 corresponding to the Bit numbering is in reverse order, with 0 corresponding to the MSB and 15 to the LSB.MSB and 15 to the LSB.
Bit field testBit field test
TCTC is set according the result of (field1 is set according the result of (field1 field2) field2) For this instruction look also at For this instruction look also at ANDMANDM
Inst field1 field2 words1 BIT Xmem, BITC 12 BITT Smem 1
Inst field1 field2 words1 BITF Smem, #lk 2
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 40
Logic InstructionsLogic Instructions
Shift and rotateShift and rotate
field3=field1 field3=field1 2 2field2field2
Inst field1 field2 field3 words1 SFTL/SFTA src, SHIFT [,dst] 1
For shift field content information see this For shift field content information see this slide
• Field 1 is left or right shifted according to the sign of Field 1 is left or right shifted according to the sign of SHIFTSHIFT and stored and stored
in field3 if present, field 1 otherwisein field3 if present, field 1 otherwise
• SFTLSFTL stands for LOGICAL shift : input bits are equal to 0 stands for LOGICAL shift : input bits are equal to 0
• SFTASFTA stands for ARITHMETIC shift : input low order bits are 0 in stands for ARITHMETIC shift : input low order bits are 0 in
case of positive case of positive SHIFTSHIFT. Input high order bits are equal to the sign bits. Input high order bits are equal to the sign bits
(if (if SXMSXM11 is set) when is set) when SHIFTSHIFT is negative. is negative.
11Sign Extension ModeSign Extension Mode
Inst field1 words1 SFTC src 1
•Shift conditionaly (Shift conditionaly (SFTCSFTC) apply to signed, one left shift is done to remove one ) apply to signed, one left shift is done to remove one
redundant sign bit (redundant sign bit (TCTC is then set) otherwise nothing is done and is then set) otherwise nothing is done and TCTC is reset. is reset.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 41
Logic InstructionsLogic Instructions
Shift and rotateShift and rotate
RORROR perform one right rotate through the carry C on perform one right rotate through the carry C on src src (guard bits=0,(guard bits=0,srcsrc(31)=(31)=CC,,CC==srcsrc(0)).(0)).
ROLROL perform one left rotate through the carry C on perform one left rotate through the carry C on src src (guard bits=0,(guard bits=0,srcsrc(0)=(0)=CC,,CC==srcsrc(31)).(31)).
ROLTCROLTC perform one left rotate with perform one left rotate with TCTC as input and as input and CC as ouput. (guard bits=0,as ouput. (guard bits=0,srcsrc(0)=(0)=TCTC,,CC==srcsrc(31)).(31)).
Inst field1 words1 ROR/ROL/ROLTC src 1
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 42
Load, Store & Move Instructions Load, Store & Move Instructions Load & Store accumulatorLoad & Store accumulator
field3=field1* 2field3=field1* 2field2.field2.
shift is done according to field2 if present.shift is done according to field2 if present. Recall on Recall on SHIFTSHIFT field can be found at this field can be found at this placeplace
Inst field1 field2 field3 words1 LD Smem, dst 12 LD Smem, TS, dst 13 LD Smem, 16 dst 14. LD Smem, [SHIFT,] dst 25 LD Xmem, SHIFT1, dst 16 LD #K dst 17 LD #lk, [SHIFT1,] dst 28 LD #lk, 16, dst 29 LD src, [SHIFT] [,dst] 110 LD src, ASM [,dst] 1
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 43
Load, Store & Move InstructionsLoad, Store & Move Instructions
Load & Store accumulatorLoad & Store accumulator
1: field2=field1*21: field2=field1*21616+ 2+ 21515 or or dstdst(32-16)=field1+0.5(32-16)=field1+0.5 2: field2=unsigned(field1) or 2: field2=unsigned(field1) or dstdst(32-16)=0 and (32-16)=0 and
dstdst(15-0)=field1.(15-0)=field1. 3: particular case of syntax1 3: particular case of syntax1 LDLD for Memory Map for Memory Map
Register. Register.
Inst field1 field2 words1 LDR Smem, dst 12 LDU Smem, dst 1
3 LDM MMR, dst 1
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 44
Load, Store & Move InstructionsLoad, Store & Move Instructions
Load & Store accumulatorLoad & Store accumulator
STLSTL store store srcsrc(15-0) and (15-0) and STH STH srcsrc(32-16)(32-16) field3= field1* 2field3= field1* 2field2field2
shift is done according to field2 if present.shift is done according to field2 if present. Recall on Recall on SHIFTSHIFT field can be found at this field can be found at this placeplace
Same as syntax1 Same as syntax1 STL STL above except that field3 is a above except that field3 is a Memory Map Register.Memory Map Register.
Inst field1 field2 field3 words1 STL/STH src, Smem 12 STL/STH src, ASM, Smem 13 STL/STH src, SHIFT1, Xmem 14. STL/STH src, [SHIFT,] Smem 2
Inst field1 field2 field3 words1 STLM src, MMR 1
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 45
Load, Store & Move InstructionsLoad, Store & Move Instructions
Load & Store other registersLoad & Store other registers
Allows initialization of Allows initialization of TT, , DPDP or or ASMASM either from either from memory (memory (SmemSmem) or from an immediate value.) or from an immediate value.
##k3k3, #, #k5k5 and # and #k9k9 stand respectively for 3, 5 and 9 bits stand respectively for 3, 5 and 9 bits immediate value.immediate value.
ARPARP is only intended for ‘C25 compatibility mode and is only intended for ‘C25 compatibility mode and is not of interest in native ‘C54x software.is not of interest in native ‘C54x software.
Inst field1 field2 words1 LD Smem, T 12 LD Smem, DP 13 LD #k9 DP 14 LD #k5 ASM 15 LD Smem, ASM 16 LD #k3 ARP 1
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 46
Load, Store & Move InstructionsLoad, Store & Move Instructions Save others registers or write immediate to Save others registers or write immediate to
memory.memory.
field1=field2.field1=field2. Syntax 3 allows initialization of any data memory location with Syntax 3 allows initialization of any data memory location with
an immediate value.an immediate value.
Write an immediate 16 bit value into any Memory Map Write an immediate 16 bit value into any Memory Map Register. Register.
Inst field1 field2 words1 STM #lk, MMR 2
Inst field1 field2 words1 ST T, Smem 1
2 ST TRN, Smem 13 ST #lk, Smem 2
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 47
Load, Store & Move InstructionsLoad, Store & Move Instructions Direct transfer from memory to memoryDirect transfer from memory to memory
IO prog data data (MMR)IO PORTR
progMVPD
READA
data PORTWMVDP WRITA
MVDD MVDK MVKD
MVDM
data(MMR) MVMD MVMM
Destination SpaceDestination Space
Sour
ce s
pace
Sour
ce s
pace
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 48
Load, Store & Move InstructionsLoad, Store & Move Instructions
Data space Data space IO space IO space
field2=field1field2=field111
Data space Data space Prog. space Prog. space
1,3: field2=field11,3: field2=field111
2: source prog. memory address is specified by 2: source prog. memory address is specified by AA(15-0)(15-0) 4: destination prog. memory address is specified by 4: destination prog. memory address is specified by
AA(15-0)(15-0)
Inst field1 field2 words1 PORTR PA, Smem 22 PORTW Smem, PA 2
11 0 0 PA PA 65535, 0 65535, 0 pmad pmad 65535 65535
Inst field1 field2 words1 MVPD pmad, Smem 22 READA Smem 13 MVDP Smem, pmad 24 WRITA Smem 1
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 49
Load, Store & Move InstructionsLoad, Store & Move Instructions
Data space Data space Data space Data space
Data space Data space MMR MMR
MMR MMR MMR MMR11
Inst field1 field2 words1 MVDD Xmem, Ymem 12 MVKD pmad, Smem 23 MVDK Smem, pmad 2
Inst field1 field2 words1 MVDM dmad MMR 22 MVMD MMR, dmad 2
Inst field1 field2 words1 MVMM MMR1 MMR2 1
11MMR1,MMR2:AR0-AR7, SP onlyMMR1,MMR2:AR0-AR7, SP only
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 50
Program Control InstructionsProgram Control Instructions
† Values for words (W) and cycles assume the use of DARAM for data.
‡ Conditions true
§ Condition false
¶ Delayed instruction
Syntax Expression W† Cycles
B[D] pmad PC = pmad(15–0) 2 4/[2¶]BACC[D] src PC = src(15–0) 1 6/[4¶]
BANZ[D] pmad, Sind if (Sind !=00) then PC = pmad(15–0) 24‡/2§/[
2¶]
BC[D] pmad, cond [ , cond [ , cond] ] if (cond(s)) then PC = pmad(15–0) 25‡/3§/[
3¶]FB[D] extpmad PC = pmad(15–0), XPC = pmad(22–16) 2 4/[2¶]FBACC[D] src PC = src(15–0), XPC = src(22–16) 1 6/[4¶]
Branch instructions
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 51
Program Control InstructionsProgram Control Instructions
† Values for words (W) and cycles assume the use of DARAM for data.
‡ Conditions true, § Condition false, ¶ Delayed instruction
Syntax Expression W† Cycles
CALA[D] src– –SP, PC + 1[3¶] = TOS,
PC = src(15–0)1 6/[4¶]
CALL[D] pmad– –SP, PC + 2[4¶] = TOS,
PC = pmad(15–0)2 4/[2§]
CC[D] pmad, cond [ , cond [ , cond]]if (cond(s)) then – –SP,
PC + 2[4¶] = TOS, PC = pmad(15–0)
25‡/3§/[
3¶]
FCALA[D] src– –SP, PC + 1 [3¶] = TOS,
PC = src(15–0), XPC = src(22–16)
1 6/[4¶]
FCALL[D] extpmad– –SP, PC + 2[4¶] = TOS,
PC = pmad(15–0), XPC = pmad(22–16)
2 6/[4¶]
FBACC[D] srcPC = src(15–0),
XPC = src(22–16)2 6/[4¶]
Call instructions
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 52
Program Control InstructionsProgram Control Instructions
Syntax Expression W† Cycles
INTR K– –SP, + + PC = TOS,
PC = IPTR(15–7) + K << 2, INTM = 1
1 3
TRAP K– –SP, + + PC = TOS,
PC = IPTR(15–7) + K << 21 3
Interrupt instructions
† Values for words (W) and cycles assume the use of DARAM for data.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 53
Program Control InstructionsProgram Control InstructionsSyntax Expression W† Cycles
FRET[D]XPC = TOS, ++ SP,
PC = TOS, ++SP1 6/[4¶]
FRETE[D]XPC = TOS, ++ SP, PC = TOS, ++SP,
INTM = 01 6/[4¶]
RC[D] cond [ , cond [ , cond] ]if (cond(s)) then PC
= TOS, ++SP2
5‡/3§/[3¶]
RET[D] PC = TOS, ++SP 1 5/[3¶]
RETE[D]PC = TOS, ++SP,
INTM = 01 5/[3¶]
RETF[D]PC = RTN, ++SP,
INTM = 01 3/[1¶]
Return instructions
† Values for words (W) and cycles assume the use of DARAM for data.
‡ Conditions true, § Condition false, ¶ Delayed instruction
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 54
Program Control InstructionsProgram Control InstructionsSyntax Expression W† Cycles
RPT SmemRepeat single, RC = Smem
1 3
RPT # KRepeat single,
RC = #K1 1
RPT # lkRepeat single,
RC = #lk2 2
RPTB[D] pmad
Repeat block, RSA = PC + 2[4¶],
REA = pmad, BRAF = 1
2 4/[2¶]
RPTZ dst, # lkRepeat single,
RC = #lk, dst = 02 2
Repeat instructions
† Values for words (W) and cycles assume the use of DARAM for data.
‡ Conditions true, § Condition false, ¶ Delayed instruction
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 55
Program Control InstructionsProgram Control Instructions
Syntax Expression W† Cycles
FRAME K SP = SP + K 1 1
POPD Smem Smem = TOS, ++SP 1 1
POPM MMR MMR = TOS, ++SP 1 1
PSHD Smem – –SP, Smem = TOS 1 1
PSHM MMR – –SP, MMR = TOS 1 1
Stack-Manipulating instructions
† Values for words (W) and cycles assume the use of DARAM for data.
Add 1 word and 1 cycle when using long-offset indirect
addressing or absolute addressing with an Smem.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 56
Program Control InstructionsProgram Control InstructionsSyntax Expression W† Cycles
IDLE K idle(K) 1 4
MAR Smem
If CMPT = 0, then modify ARx If CMPT = 1 and ARx != AR0
then modify ARx, ARP = x If CMPT = 1 and ARx = AR0,
then modify AR(ARP)
1 1
NOP no operation 1 1RESET software reset 1 3RSBX N, SBIT STN (SBIT) = 0 1 1SSBX N, SBIT STN (SBIT) = 1 1 1
XC n , cond [ , cond[ , cond]If (cond(s)) then execute the next n instructions; n = 1 or 2
1 1
Miscellaneous Program-Control instructions
† Values for words (W) and cycles assume the use of DARAM for data. Add 1 word and 1 cycle when using long-offset indirect addressing or absolute addressing with an Smem.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 57
C55x Assembly LanguageC55x Assembly Language
Instructions set Instructions set generalitiesgeneralities C55 C55 Parallelism features and rulesParallelism features and rules C55 C55 mnemonicmnemonic instruction set instruction set summarysummary
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 58
C55 Instruction set generalitiesC55 Instruction set generalities
General syntax of an instruction :General syntax of an instruction : InstrInstr Op1,[Op2,[Op3,[…]]] Op1,[Op2,[Op3,[…]]]
For For InstrInstr field refer to TI documentation or field refer to TI documentation or following slides.following slides.
The The Op1,[Op2,[Op3,[…]]]Op1,[Op2,[Op3,[…]]] field syntax is field syntax is specified in instruction documentation and specified in instruction documentation and specifies the way (type of addressing mode) specifies the way (type of addressing mode) you could use for the operands.you could use for the operands.
Parallelism is presented in a following sectionParallelism is presented in a following section Terms, symbols and abbreviations are Terms, symbols and abbreviations are
presented in spru374.pdfpresented in spru374.pdf
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 59
C55 Instruction set generalitiesC55 Instruction set generalities
Instruction size can be:Instruction size can be: 8, 16, 24, 32, 40 or 48 bis.8, 16, 24, 32, 40 or 48 bis.
In the mnemonic syntax, instruction names In the mnemonic syntax, instruction names are constituted with:are constituted with: a root and potentially a prefix and a suffix.a root and potentially a prefix and a suffix. There are:There are:
37 roots (see next slides)37 roots (see next slides) 2 Prefixes (see next slides)2 Prefixes (see next slides) 13 Suffixes (see next slides)13 Suffixes (see next slides)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 60
List of roots for instruction names 1/2List of roots for instruction names 1/2Root MeaningABS Absolute valueADD AdditionAND Bitwise AND
B BranchCALL Function callCLR Assign the value to 0CMP CompareCNT CountEXP ExponentMAC Multiply and accumulateMAR Modify auxiliary register contentMAS Multiply and subtractMAX MaximumMIN MinimumMOV Move dataMPY MultiplyNEG Negate (2s complement)NOT Bitwise complement (1s complement)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 61
List of roots for instruction names 2/2List of roots for instruction names 2/2Root MeaningOR Bitwise OR
POP Pop from top of the stackPSH Push to top of the stackRET ReturnROL Rotate leftROR Rotate rightRPT RepeatSAT SaturateSET Assign the value to 1
SFT Shift (left or right depending on sign of shift count)
SQA Square and addSQR SquareSQS Square and subtractSUB Subtraction
SWAP Swap register contentsTST Test bitXOR Bitwise exclusive-OR (XOR)XPA ExpandXTR Extract
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 62
List of prefixes for instruction namesList of prefixes for instruction names
Prefix Meaning
A
Instruction happens in address phase and is subject to circular addressing effects. Also, it occurs in the DAGEN functional unit and cannot be placed in parallel with any instruction that uses dual addressing mode.
BBit instruction. Note that B is also a root (branch), suffix (borrow), and prefix (bit). The differences in context should prevent any confusion.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 63
List of suffixes for instruction namesList of suffixes for instruction namesSuffix Meaning
40 Enables the M40 mode (all 40 bits of the accumulator count)
B Borrow
C CarryCC Conditional
I Enable interruptsK Multiply has a constant operandL Logical shift (left or right depending on sign of shift count)
MThis instruction has the option of assigning a memory operand to T3 regardless of whether that assignment actually occurs.
R RoundS Signed shift (left or right depending on sign of shift count)U UnsignedV Absolute valueZ Delay on the memory operand
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 64
Operand modifiersOperand modifiers
Modifier Meaningdlb Access a true 32-bit memory operand
dualAccess a 32-bit memory operand for use as two independent 16-bit halves of the given operation
Hi Access upper 16 bits of the accumulatorhigh_byte Access the high byte of the memory location
LO Access lower 16 bits of the accumulatorlow_byte Access the low byte of the memory location
pair Dual register accessrnd Round
saturate Saturateuns Unsigned operand (not used in MOV instructions)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 65
Addressing-mode operandsAddressing-mode operandsSyntax element Description
Baddr
When an instruction contains Baddr, that instruction can access one or two bits in an accumulator (AC0–AC3), an auxiliary register (AR0–AR7), or a temporary register (T0–T3). Only the register bit test/set/clear/complement instructions support Baddr. As y
CmemWhen an instruction contains Cmem, that instruction can access a single word (16 bits) of data from data memory. As you write the instruction, replace Cmem with a compatible operand.
LmemWhen an instruction contains Lmem, that instruction can access a long word (32 bits) of data from data memory or from a memory-mapped registers. As you write the instruction, replace Lmem with a compatible operand.
Smem
When an instruction contains Smem, that instruction can access a single word (16 bits) of data from data memory, from I/O space, or from a memory-mapped register. As you write the instruction, replace Smem with a compatible operand.
Xmem and YmemWhen an instruction contains Xmem and Ymem, that instruction can perform two simultaneous 16-bit accesses to data memory. As you write the instruction, replace Xmem and Ymem with compatible operands.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 66
Absolute addressing modesAbsolute addressing modes
Addressing mode Description
k16 absolute
k16 absolute This mode uses the 7-bit register called DPH (high part of the extended data page register) and a 16-bit unsigned constant to form a 23-bit data-space address. This mode is used to access a memory location or a memory-mapped register.
k23 absoluteThis mode enables you to specify a full address as a 23-bit unsigned constant. This mode is used to access a memory location or a memory-mapped register.
I/O absoluteThis mode enables you to specify an I/O address as a 16-bit unsigned constant. This mode is used to access a location in I/O space.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 67
Direct addressing modesDirect addressing modes
Addressing mode Description
DP direct
This mode uses the main data page specified by DPH (high part of the extended data page register) in conjunction with the data page register (DP). This mode is used to access a memory location or a memory-mapped register.
SP direct
This mode uses the main data page specified by SPH (high part of the extended stack pointers) in conjunction with the data stack pointer (SP). This mode is used to access stack values in data memory.
Register-bit directThis mode uses an offset to specify a bit address. This mode is used to access one register bit or two adjacent register bits.
PDP directThis mode uses the peripheral data page register (PDP) and an offset to specify an I/O address. This mode is used to access a location in I/O space.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 68
Indirect addressing modesIndirect addressing modes
Addressing mode Description
AR indirect
This mode uses one of eight auxiliary registers (AR0–AR7) to point to data. The way the CPU uses the auxiliary register to generate an address depends on whether you are CPU uses the auxiliary register to generate an address depends on whether you are or
Dual AR indirectThis mode uses the same address-generation process as the AR i ndirect addressing mode. This mode is used with instructions that access two or more data-memory locations.
CDP indirect
This mode uses the coefficient data pointer (CDP) to p oint to data. The way the CPU uses CDP to generate an address depends on whether you are accessing data space (memory or memory-mapped registers), individual register bits, or I/O space.
Coefficient indirect
This mode uses the same address-generation process as the CDP indirect addressing mode. This mode is available to support instructions that can access a coefficient in data memory at the same time they access two other data-memory values using the dual AR
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 69
C55 Parallelism features and rulesC55 Parallelism features and rules The C55x DSP architecture enables to execute The C55x DSP architecture enables to execute
two instructions intwo instructions in parallel within the same parallel within the same cycle of execution. The types of parallelism cycle of execution. The types of parallelism are:are: Built-in parallelism within a single instruction.Built-in parallelism within a single instruction.
Some instructions perform two different operations in Some instructions perform two different operations in parallel. Doubleparallel. Double colons, ::, are used to separate the two colons, ::, are used to separate the two operations.operations.
User-defined parallelism between two User-defined parallelism between two instructions.instructions. Two instructions may be paralleled by Two instructions may be paralleled by the programmerthe programmer
or the C compiler. The parallelor the C compiler. The parallel bars, ||, are used to bars, ||, are used to separate the two instructions to be executed in parallel.separate the two instructions to be executed in parallel.
Built-in parallelism can be combined with user-Built-in parallelism can be combined with user-defined parallelism. defined parallelism.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 70
Rules for Simple and parallel instructionsRules for Simple and parallel instructions Simple instructions are not allowed to span multiple Simple instructions are not allowed to span multiple
lines. One exception,lines. One exception, single instructions that use the single instructions that use the double colons, ::, notation to imply parallelism.double colons, ::, notation to imply parallelism. These instructions may be split up following the :: These instructions may be split up following the ::
notation.notation. E Example xample of aof a single instruction single instruction on on two lines:two lines: MPYR40 uns(Xmem), uns(Cmem), ACx :: MPYR40 uns(Ymem), uns(Cmem), Acy
User-defined parallelism instructions (using || User-defined parallelism instructions (using || notation) are allowed tonotation) are allowed to span multiple lines. For span multiple lines. For example, all of the following instructions are legal:example, all of the following instructions are legal: MOV AC0, AC1 || MOV AC2, AC3 MOV AC0, AC1 || MOV AC2, AC3 MOV AC0, AC1 || MOV AC2, AC3 MOV AC0, AC1 || MOV AC2, AC3
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 71
Built-In vs. User-Defined ParallelismBuilt-In vs. User-Defined Parallelism
Built-In (Implied)Built-In (Implied) - separate, unique instructions (you can look ‘em up) that - separate, unique instructions (you can look ‘em up) that are built-in to the instruction set (actually ONE instruction) are built-in to the instruction set (actually ONE instruction) - use a “::” to separate the pieces: - use a “::” to separate the pieces:
User-Defined (Explicit)User-Defined (Explicit) - determined by the user or compiler - determined by the user or compiler - user places a double “pipe” | | to separate the two instructions: - user places a double “pipe” | | to separate the two instructions:
built_in: MAC *AR3+,*CDP+,AC0 :: MAC ...built_in: MAC *AR3+,*CDP+,AC0 :: MAC ...
user_defined: user_defined: MPYM *AR1+,*AR2+,AC1MPYM *AR1+,*AR2+,AC1 |||| AND AR3,T1 AND AR3,T1
ComboCombo - built-in and user-defined can be combined: - built-in and user-defined can be combined:
combo: ADD *AR3<<#16, AC0, AC1combo: ADD *AR3<<#16, AC0, AC1 :: MOV HI(AC1 << T2), *AR4:: MOV HI(AC1 << T2), *AR4 || MOV #3,AR1|| MOV #3,AR1
Let’s find out more about user-defined parallelism...Let’s find out more about user-defined parallelism...
Parallel instructions execute in a single cycle:Parallel instructions execute in a single cycle:
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 72
User-Defined ParallelismUser-Defined Parallelism
Can these two instructions be used in parallel?Can these two instructions be used in parallel?
First, we need to know which processorFirst, we need to know which processorresources are being used:resources are being used:
OperatorsOperators “Architecture components that perform“Architecture components that performtasks.” For example: ALUs, addresstasks.” For example: ALUs, addressgenerators, Dual-MAC (DMAC), etc.generators, Dual-MAC (DMAC), etc.
RegistersRegisters Tx, ARx, ACx, BRCx, BSAxx, etc.Tx, ARx, ACx, BRCx, BSAxx, etc.
BusesBuses Memory Read/Write, Constant, etc.Memory Read/Write, Constant, etc.
Let’s explore the types of operators first...Let’s explore the types of operators first...
MOV *AR1,AC1MOV *AR1,AC1|| || ADD @var,AR2 ADD @var,AR2
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 73
OperatorsOperators
Each unit has multiple load/store operators (shown later)Each unit has multiple load/store operators (shown later) All other operators are single useAll other operators are single use
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
CONTROLCONTROL
PUPU
REGISTERREGISTERFILEFILE SWAPSWAP
ALU, SHIFTER, DMACALU, SHIFTER, DMAC
STORESTOREw/SHFTw/SHFT
DUDU
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
ALU w/SHIFTALU w/SHIFT
AUAU
LOADLOAD STORESTORE SSWWAAPP
DAGENDAGEN
XX YY CC
Which operators are used in the instruction above?Which operators are used in the instruction above?
Let’s review the contents of the register files...Let’s review the contents of the register files...
MOV *AR1,AC1MOV *AR1,AC1|| || ADD @var,AR2 ADD @var,AR2
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 74
Register FilesRegister Files
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
PUPU
REGISTERREGISTERFILEFILE SWAPSWAP
STORESTOREw/SHFTw/SHFT
DUDU
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
AUAU
LOADLOAD STORESTORE SSWWAAPP
DAGENDAGEN
XX YY CC
RPTC RPTC BRC0-1 BRC0-1
RSA0-1RSA0-1 REA0-1 REA0-1
RETARETA CSR CSR
AC0-3 TRN0-1AC0-3 TRN0-1
Following is the list of registers contained in each unit:Following is the list of registers contained in each unit:
Next, let’s review the buses...Next, let’s review the buses...
T0-3 XAR0-7 T0-3 XAR0-7 XSP/SSPXSP/SSP
Xreg XCDPXreg XCDP XDPXDP
BKxx BSAxxBKxx BSAxx ST0-3ST0-3
PDPPDP
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 75
Read/Write BusesRead/Write Buses
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
CONTROLCONTROL
PUPU
REGISTERREGISTERFILEFILE SWAPSWAP
ALU, SHIFTER, DMACALU, SHIFTER, DMAC
STORESTOREw/SHFTw/SHFT
DUDU
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
ALU w/SHIFTALU w/SHIFT
AUAU
LOADLOAD STORESTORE SSWWAAPP
DAGENDAGEN
XX YY CC
Cross-Unit BusesCross-Unit Buses
Read (BB, CB, DB) Write (EB, FB)Read (BB, CB, DB) Write (EB, FB)
One load/store operator exists forOne load/store operator exists foreach bus entering/leaving the unit.each bus entering/leaving the unit.
EE FF EE FF EE FFCC DD CC DD CC DDBB
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 76
Examples of Operator/Bus UsageExamples of Operator/Bus UsageInstructionsInstructions OperatorsOperators BusesBuses
MPYM *AR1,T0,AC0 D-Unit ALU/MAC/SH D-Bus
A-Unit DAGEN X
MOV @var,BRC1MOV @var,BRC1 P-Unit LoadP-Unit Load D-BusD-Bus
A-Unit DAGEN XA-Unit DAGEN X
MOV T1,@varMOV T1,@var A-Unit StoreA-Unit Store E-BusE-Bus
A-Unit DAGEN XA-Unit DAGEN X
MOV dbl(@long),AC0MOV dbl(@long),AC0 D-Unit LoadD-Unit Load C-BusC-Bus
A-Unit DAGEN X A-Unit DAGEN X D-BusD-Bus
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 77
User-Defined Parallelism - 3 Basic RulesUser-Defined Parallelism - 3 Basic RulesTwo instructions can be written in parallel if:Two instructions can be written in parallel if:
2.2. Total size is less than or equal to Total size is less than or equal to 6 bytes6 bytes
1.1. There is There is no hardware resourceno hardware resource conflict conflict (operands, buses and registers) (operands, buses and registers)
3.3. For instructions with 2 memory accesses, each mustFor instructions with 2 memory accesses, each mustuse use indirectindirect addressing (soft-dual addressing) addressing (soft-dual addressing) Ex: Ex: MOV *AR2,AC0 || MOV T1,*AR1MOV *AR2,AC0 || MOV T1,*AR1
If the instructions meet these rules, If the instructions meet these rules, write it and assemblewrite it and assemble. .
If the assembler doesn’t like it, refer to the “advanced”If the assembler doesn’t like it, refer to the “advanced”rules at the end of this module.rules at the end of this module.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 78
Parallelism - Exercise 1Parallelism - Exercise 1
MOV *AR1,AC1MOV *AR1,AC1 ;bytes;bytes:2 :2 || || ADD @var,AR2ADD @var,AR2 ;bytes;bytes:3 :3
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
CONTROLCONTROL
PUPU
REGISTERREGISTERFILEFILE SWAPSWAP
ALU, SHIFTER, DMACALU, SHIFTER, DMAC
STORESTOREw/SHFTw/SHFT
DUDU
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
ALU w/SHIFTALU w/SHIFT
AUAU
LOADLOAD STORESTORE SSWWAAPP
DAGENDAGEN
XX YY CC
Cross-Unit BusesCross-Unit Buses
Read (BB, CB, DB) Write (EB, FB)Read (BB, CB, DB) Write (EB, FB)
EE FF EE FF EE FFCC DD CC DD CC DDBB
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 79
Parallelism - Exercise 1 (Solution)Parallelism - Exercise 1 (Solution)
MOV *AR1,AC1MOV *AR1,AC1 ;bytes;bytes:2 :2 || || ADD @var,AR2ADD @var,AR2 ;bytes;bytes:3 :3
Reason: No soft-dual addressingReason: No soft-dual addressing
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
CONTROLCONTROL
PUPU
REGISTERREGISTERFILEFILE SWAPSWAP
ALU, SHIFTER, DMACALU, SHIFTER, DMAC
STORESTOREw/SHFTw/SHFT
DUDU
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
ALU w/SHIFTALU w/SHIFT
AUAU
LOADLOAD STORESTORE SSWWAAPP
DAGENDAGEN
XX YY CC
Cross-Unit BusesCross-Unit Buses
Read (BB, CB, DB) Write (EB, FB)Read (BB, CB, DB) Write (EB, FB)
EE FF EE FF EE FFCC DD CC DD CC DDBB
DAGENDAGEN
XX YY
ALU w/SHIFTALU w/SHIFT
LOADLOADLOADLOAD
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 80
Parallelism - Exercise 2Parallelism - Exercise 2
MOV dbl(*(AR1+T0)),AC0 MOV dbl(*(AR1+T0)),AC0 ;bytes;bytes:3:3|||| MOV AC2,dbl(*AR2+) MOV AC2,dbl(*AR2+) ;bytes;bytes:3:3
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
CONTROLCONTROL
PUPU
REGISTERREGISTERFILEFILE SWAPSWAP
ALU, SHIFTER, DMACALU, SHIFTER, DMAC
STORESTOREw/SHFTw/SHFT
DUDU
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
ALU w/SHIFTALU w/SHIFT
AUAU
LOADLOAD STORESTORE SSWWAAPP
DAGENDAGEN
XX YY CC
Cross-Unit BusesCross-Unit Buses
Read (BB, CB, DB) Write (EB, FB)Read (BB, CB, DB) Write (EB, FB)
EE FF EE FF EE FFCC DD CC DD CC DDBB
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 81
Parallelism - Exercise 2 (Solution)Parallelism - Exercise 2 (Solution)
MOV dbl(*(AR1+T0)),AC0 MOV dbl(*(AR1+T0)),AC0 ;bytes;bytes:3:3|||| MOV AC2,dbl(*AR2+) MOV AC2,dbl(*AR2+) ;bytes;bytes:3:3
It works.It works.
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
CONTROLCONTROL
PUPU
REGISTERREGISTERFILEFILE SWAPSWAP
ALU, SHIFTER, DMACALU, SHIFTER, DMAC
STORESTOREw/SHFTw/SHFT
DUDU
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
ALU w/SHIFTALU w/SHIFT
AUAU
LOADLOAD STORESTORE SSWWAAPP
DAGENDAGEN
XX YY CC
Cross-Unit BusesCross-Unit Buses
Read (BB, CB, DB) Write (EB, FB)Read (BB, CB, DB) Write (EB, FB)
EE FF EE FF EE FFCC DD CC DD CC DDBB
LOADLOAD STORESTORE
XX YY
DAGENDAGEN
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 82
Parallelism - Exercise 3Parallelism - Exercise 3
CALL AC0 CALL AC0 ;bytes:2;bytes:2 |||| MOV AC1,AR1 MOV AC1,AR1 ;bytes:2;bytes:2
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
CONTROLCONTROL
PUPU
REGISTERREGISTERFILEFILE SWAPSWAP
ALU, SHIFTER, DMACALU, SHIFTER, DMAC
STORESTOREw/SHFTw/SHFT
DUDU
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
ALU w/SHIFTALU w/SHIFT
AUAU
LOADLOAD STORESTORE SSWWAAPP
DAGENDAGEN
XX YY CC
Cross-Unit BusesCross-Unit Buses
Read (BB, CB, DB) Write (EB, FB)Read (BB, CB, DB) Write (EB, FB)
EE FF EE FF EE FFCC DD CC DD CC DDBB
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 83
Parallelism - Exercise 3 (Solution)Parallelism - Exercise 3 (Solution)
CALL AC0 CALL AC0 ;bytes:2;bytes:2 |||| MOV AC1,AR1 MOV AC1,AR1 ;bytes:2;bytes:2
Reason: cross-unit bus conflictReason: cross-unit bus conflict
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
CONTROLCONTROL
PUPU
REGISTERREGISTERFILEFILE SWAPSWAP
ALU, SHIFTER, DMACALU, SHIFTER, DMAC
STORESTOREw/SHFTw/SHFT
DUDU
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
ALU w/SHIFTALU w/SHIFT
AUAU
LOADLOAD STORESTORE SSWWAAPP
DAGENDAGEN
XX YY CC
Cross-Unit BusesCross-Unit Buses
Read (BB, CB, DB) Write (EB, FB)Read (BB, CB, DB) Write (EB, FB)
EE FF EE FF EE FFCC DD CC DD CC DDBB
ALU w/SHIFTALU w/SHIFT
Cross-Unit BusesCross-Unit Buses
CONTROLCONTROL
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 84
Parallelism - Exercise 4Parallelism - Exercise 4
MAC *AR1+, *CDP+, AC0>>#16 MAC *AR1+, *CDP+, AC0>>#16 :::: MAC *AR2+, *CDP+, AC1>>#16 ;bytes:MAC *AR2+, *CDP+, AC1>>#16 ;bytes:44|||| RPTADD CSR,T1 RPTADD CSR,T1 ;bytes:2 ;bytes:2
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
CONTROLCONTROL
PUPU
REGISTERREGISTERFILEFILE SWAPSWAP
ALU, SHIFTER, DMACALU, SHIFTER, DMAC
STORESTOREw/SHFTw/SHFT
DUDU
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
ALU w/SHIFTALU w/SHIFT
AUAU
LOADLOAD STORESTORE SSWWAAPP
DAGENDAGEN
XX YY CC
Cross-Unit BusesCross-Unit Buses
Read (BB, CB, DB) Write (EB, FB)Read (BB, CB, DB) Write (EB, FB)
EE FF EE FF EE FFCC DD CC DD CC DDBB
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 85
Parallelism - Exercise 4Parallelism - Exercise 4
MAC *AR1+, *CDP+, AC0>>#16 MAC *AR1+, *CDP+, AC0>>#16 :::: MAC *AR2+, *CDP+, AC1>>#16 ;bytes:MAC *AR2+, *CDP+, AC1>>#16 ;bytes:44|||| RPTADD CSR,T1 RPTADD CSR,T1 ;bytes:2 ;bytes:2
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
CONTROLCONTROL
PUPU
REGISTERREGISTERFILEFILE SWAPSWAP
ALU, SHIFTER, DMACALU, SHIFTER, DMAC
STORESTOREw/SHFTw/SHFT
DUDU
LOADLOAD STORESTORE
REGISTERREGISTERFILEFILE
ALU w/SHIFTALU w/SHIFT
AUAU
LOADLOAD STORESTORE SSWWAAPP
DAGENDAGEN
XX YY CC
Cross-Unit BusesCross-Unit Buses
Read (BB, CB, DB) Write (EB, FB)Read (BB, CB, DB) Write (EB, FB)
EE FF EE FF EE FFCC DD CC DD CC DDBB
It works...It works...
ALU, SHIFTER, DMACALU, SHIFTER, DMACCONTROLCONTROL ALU w/SHIFTALU w/SHIFT
XX YY CC
DAGENDAGEN
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 86
Optimization SuggestionsOptimization Suggestions There are two main types of code you can optimize: There are two main types of code you can optimize:
SetupSetup
Inner LoopInner Loop
Usually contains instructions w/large constants such as:Usually contains instructions w/large constants such as: AMOV #x,XAR5AMOV #x,XAR5which cannot be placed in parallel anywaywhich cannot be placed in parallel anyway
Creating a long sequences of 5-6byte instructions Creating a long sequences of 5-6byte instructions couldcouldstall the IBQ (4-byte fetch only) and negate the benefitstall the IBQ (4-byte fetch only) and negate the benefit
Only run once - i.e. usually not inside a loopOnly run once - i.e. usually not inside a loop
Recommendation: don’t spend time on setup codeRecommendation: don’t spend time on setup code
Largest impact for parallel instructions and avoidingLargest impact for parallel instructions and avoidingpipeline stalls based on #loopspipeline stalls based on #loops
Usually contains simple/math instructions which haveUsually contains simple/math instructions which havean increased chance of placing in parallelan increased chance of placing in parallel
Usually inside RPTB/LOCAL - easy to locate exactly Usually inside RPTB/LOCAL - easy to locate exactly which code you should focus your efforts onwhich code you should focus your efforts on
Recommendation: FOCUS ALL YOUR TIME HERERecommendation: FOCUS ALL YOUR TIME HERE
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 87
User-Defined Parallelism - Advanced RulesUser-Defined Parallelism - Advanced RulesA. Address modifiers can not use 16/24-bit constants. A. Address modifiers can not use 16/24-bit constants.
B. If D-unit shift-and-store operator is used:B. If D-unit shift-and-store operator is used:
(a) D-unit ALU shifter can not be used and,(a) D-unit ALU shifter can not be used and,
(b) only two source accumulators in total are allowed.(b) only two source accumulators in total are allowed.
C. Instruction priority: C. Instruction priority:
(a) MMR access takes priority over internal access(a) MMR access takes priority over internal access MOV AR1,AC0 MOV AR1,AC0
|| MOV #0,mmap(@AC0L)|| MOV #0,mmap(@AC0L) ;takes ;takes prioritypriority
(b) In a bus conflict, 2nd value is used in both (b) In a bus conflict, 2nd value is used in both instructions (assembler flags warning)instructions (assembler flags warning)
MOV #3,AC0MOV #3,AC0
|| MOV #4,AC1 || MOV #4,AC1 ;takes priority ;takes priority
D. DAGEN tag combination must be allowedD. DAGEN tag combination must be allowed
E. One of the instructions must have a parallel enable bitE. One of the instructions must have a parallel enable bit
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 88
Soft-dual parallelismSoft-dual parallelism Instructions that reference memory operands Instructions that reference memory operands
do not have parallel enable bits.do not have parallel enable bits. Two such instructions may still be combined Two such instructions may still be combined
with a type of parallelism calledwith a type of parallelism called soft-dual soft-dual parallelism. parallelism.
The constraints of soft-dual parallelism are:The constraints of soft-dual parallelism are: Both memory operands must meet the constraints Both memory operands must meet the constraints
of the dual AR indirectof the dual AR indirect addressing mode (Xmem addressing mode (Xmem and Ymem). and Ymem).
Neither instruction can contain any of the Neither instruction can contain any of the following:following: Instructions embedding high_byte(Smem) and Instructions embedding high_byte(Smem) and
low_byte(Smemlow_byte(Smem). These instructions that read and write the same These instructions that read and write the same
memory locationmemory location..
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 89
Soft-dual parallelismSoft-dual parallelism With regard to soft-dual parallelism, theWith regard to soft-dual parallelism, the AMAR
Smem instruction has theinstruction has the same properties as any same properties as any memory reference instruction.memory reference instruction.
Soft-Dual Parallelism of MAR InstructionsSoft-Dual Parallelism of MAR Instructions Although the following modify auxiliary register (MAR) Although the following modify auxiliary register (MAR)
instructions do notinstructions do not reference memory and do not have reference memory and do not have parallel enable bits, they may be combinedparallel enable bits, they may be combined together or together or with any other memory reference instructions (not limited with any other memory reference instructions (not limited to Xmem/Ymem) to form soft-dual parallelismto Xmem/Ymem) to form soft-dual parallelism::
AADD TAx, TAy AADD k8, TAx AMOV TAx, TAy AMOV k8, TAx ASUB TAx, TAy ASUB k8, TAx
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 90
C55 mnemonic instruction set summaryC55 mnemonic instruction set summary The C55 mnemonic instruction set can be The C55 mnemonic instruction set can be
divided into six basic types of operations:divided into six basic types of operations: Arithmetical operationsArithmetical operations Bit manipulations Bit manipulations operationsoperations Extended auxiliary registerExtended auxiliary register (XAR) (XAR) operationsoperations Logical operationsLogical operations Move operationsMove operations ProgramProgram-control -control operationsoperations
With each instruction listing, you will find the With each instruction listing, you will find the availability of a parallel enable bit, word availability of a parallel enable bit, word count(size), cycle time, what pipeline stage the count(size), cycle time, what pipeline stage the instruction is executed, and in what unit the instruction is executed, and in what unit the instruction is executedinstruction is executed.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 91
Arithmetic InstructionsArithmetic Instructions
SyntaxParallel enable
bitSize Cycles Pipeline
Executed in
ABDST Xmem, Ymem, ACx, ACy No 4 1 XD unit MAC
and ALU
ABS [src,] dst Yes 2 1 XA or D unit
ALU
Absolute distance
Absolute Value
The ABDST instruction executes two operations in parallel: The ABDST instruction executes two operations in parallel: one in the D-unit MAC and one in the D-unit ALUone in the D-unit MAC and one in the D-unit ALU:
ACy = ACy + |HI(ACx)| ACx = (Xmem << #16) - (Ymem << #16)
1
0
N
iii yxD
1
0
N
iii yxD
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 92
Arithmetic Instructions: ADDArithmetic Instructions: ADDSyntax
Parallel enable bit
Size Cycles Pipeline Executed in
ADD [src,] dst Yes 2 1 XA or D unit
ALU
ADD k4, dst Yes 2 1 XA or D unit
ALU
ADD K16, [src,] dst No 4 1 XA or D unit
ALU
ADD Smem, [src,] dst No 3 1 XA or D unit
ALU
ADD ACx << Tx, ACy Yes 2 1 XD unit ALU and Shiter
ADD ACx << #SHIFTW, ACy Yes 3 1 XD unit ALU and Shiter
ADD K16 << #16, [ACx,] ACy No 4 1 X D unit ALU
ADD K16 << #SHFT, [ACx,] ACy No 4 1 XD unit ALU and shifter
ADD Smem << Tx, [ACx,] ACy No 3 1 XD unit ALU and shifter
ADD Smem << #16, [ACx,] ACy No 3 1 X D unit ALUADD [uns(]Smem[)], CARRY, [ACx,] ACy No 3 1 X D unit ALUADD [uns(]Smem[)], [ACx,] ACy No 3 1 X D unit ALU
ADD [uns(]Smem[)] << #SHIFTW, [ACx,] ACy No 4 1 XD unit ALU and shifter
ADD dbl(Lmem), [ACx,] ACy No 3 1 X D unit ALUADD Xmem, Ymem, ACx No 3 1 X D unit ALUADD K16, Smem No 4 1 X D unit ALUADD[R]V [ACx,] ACy Yes 2 1 X D unit MAC
Addition
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 93
Some examples for the 17 forms of ADD Some examples for the 17 forms of ADD ADD AC1, AC0 The content of AC1 is added to the content The content of AC1 is added to the content
of AC0 and the result is stored in AC0.of AC0 and the result is stored in AC0. ADD #15, AC0 The content of AC0 is added to an unsigned The content of AC0 is added to an unsigned
4-bit value (15) and the result is stored in AC0.4-bit value (15) and the result is stored in AC0. ADD #2E00h, AC0, AC1 The content of AC0 is added to The content of AC0 is added to
the signed 16-bit value (2E00h) and the result is stored in the signed 16-bit value (2E00h) and the result is stored in AC1.AC1.
ADD *AR3+, T0, T1 The content of T0 is added to the The content of T0 is added to the content addressed by AR3 and the result is stored in T1. AR3 content addressed by AR3 and the result is stored in T1. AR3 is incremented by 1.is incremented by 1.
ADD AC1 << T0, AC0 The content of AC1 shifted by the The content of AC1 shifted by the content of T0 is added to the content of AC0 and the result is content of T0 is added to the content of AC0 and the result is stored in AC0.stored in AC0.
ADD AC1 << #31, AC0 The content of AC1 shifted left by The content of AC1 shifted left by 31 bits is added to the content of AC0 and the result is stored 31 bits is added to the content of AC0 and the result is stored in AC0.in AC0.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 94
Some examples for the 17 forms of ADDSome examples for the 17 forms of ADD ADD #FFFFh << #16, AC1, AC0 A signed 16-bit value A signed 16-bit value
(FFFFh) shifted left by 16 bits is added to the content of AC1 (FFFFh) shifted left by 16 bits is added to the content of AC1 and the result is stored in AC0.and the result is stored in AC0.
ADD #FFFFh << #15, AC1, AC0 A signed 16-bit value A signed 16-bit value (FFFFh) shifted left by 15 bits is added to the content of AC1 (FFFFh) shifted left by 15 bits is added to the content of AC1 and the result is stored in AC0.and the result is stored in AC0.
ADD *AR1 << T0, AC1, AC0 The content addressed by The content addressed by AR1 shifted left by the content of T0 is added to the content of AR1 shifted left by the content of T0 is added to the content of AC1 and the result is stored in AC0.AC1 and the result is stored in AC0.
ADD *AR3 << #16, AC1, AC0 The content addressed by The content addressed by AR3 shifted left by 16 bits is added to the content of AC1 and AR3 shifted left by 16 bits is added to the content of AC1 and the result is stored in AC0.the result is stored in AC0.
ADD uns(*AR3), CARRY, AC1, AC0 The CARRY status The CARRY status bit and the unsigned content addressed by AR3 are added to bit and the unsigned content addressed by AR3 are added to the content of AC1 and the result is stored in AC0.the content of AC1 and the result is stored in AC0.
ADD uns(*AR3), AC1, AC0 The unsigned content The unsigned content addressed by AR3 is added to the content of AC1 and the addressed by AR3 is added to the content of AC1 and the result is stored in AC0.result is stored in AC0.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 95
Some examples for the 17 forms of ADDSome examples for the 17 forms of ADD ADD uns(*AR3) << #31, AC1, AC0 The unsigned content The unsigned content
addressed by AR3 shifted left by 31 bits is added to the addressed by AR3 shifted left by 31 bits is added to the content of AC1 and the result is stored in AC0.content of AC1 and the result is stored in AC0.
ADD dbl(*AR3+), AC1, AC0 The content (long word) The content (long word) addressed by AR3 and AR3 + 1 is added to the content of AC1 addressed by AR3 and AR3 + 1 is added to the content of AC1 and the result is stored in AC0. Because this instruction is a and the result is stored in AC0. Because this instruction is a long-operand instruction, AR3 is incremented by 2 after the long-operand instruction, AR3 is incremented by 2 after the execution.execution.
ADD *AR3, *AR4, AC0 The content addressed by AR3 The content addressed by AR3 shifted left by 16 bits is added to the content addressed by shifted left by 16 bits is added to the content addressed by AR4 shifted left by 16 bits and the result is stored in AC0.AR4 shifted left by 16 bits and the result is stored in AC0.
ADD #FFFFh, *AR3 The content addressed by AR3 is added The content addressed by AR3 is added to a signed 16-bit value and the result is stored back into the to a signed 16-bit value and the result is stored back into the location addressed by AR3.location addressed by AR3.
ADDV AC1, AC0 The absolute value of AC1 is added to the The absolute value of AC1 is added to the content of AC0 and the result is stored in AC0.content of AC0 and the result is stored in AC0.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 96
Arithmetic instructions: compare and select Arithmetic instructions: compare and select extremumextremum
SyntaxParallel
enable bitSize Cycles Pipeline
Executed in
MAXDIFF ACx, ACy, ACz, ACw Yes 3 1 X D unit ALUDMAXDIFF ACx, ACy, ACz, ACw, TRNx Yes 3 1 X D unit ALUMINDIFF ACx, ACy, ACz, ACw Yes 3 1 X D unit ALUDMINDIFF ACx, ACy, ACz, ACw, TRNx Yes 3 1 X D unit ALU
Compare and Select Extremum
See examples and more details on next slidesSee examples and more details on next slides
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 97
Index Search - DMAXDIFF, MAXDIFFIndex Search - DMAXDIFF, MAXDIFF
DMAXDIFF ACx,ACy,ACz,ACw,TRNxDMAXDIFF ACx,ACy,ACz,ACw,TRNx 32-bit Search32-bit Search
ACx/ACy = values, ACz = max, ACw = ACy-ACx, TRNx = indexACx/ACy = values, ACz = max, ACw = ACy-ACx, TRNx = index
~N cycles for N taps with index~N cycles for N taps with index
TRN0/1TRN0/1001515
11 (If new max, shift “1” into MSB)(If new max, shift “1” into MSB)
Splits AC’s into two 16-bit registersSplits AC’s into two 16-bit registers
Max of hi/low halves placed in ACz, ACy-ACx placed in ACw, Max of hi/low halves placed in ACz, ACy-ACx placed in ACw, TRN0/1 hold indexTRN0/1 hold index
TRN0 (tracks AC high), TRN1 (tracks AC low)TRN0 (tracks AC high), TRN1 (tracks AC low)
~N/2 cycles for N taps with index~N/2 cycles for N taps with index
MAXDIFF ACx,ACy,ACz,ACwMAXDIFF ACx,ACy,ACz,ACw Dual 16-bit SearchDual 16-bit Search
MINDIFF and DMINDIFF also supportedMINDIFF and DMINDIFF also supported
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 98
Example of MAXDIFF instructionExample of MAXDIFF instruction MAXDIFF AC0, AC1, AC2, AC1
Before After AC0 10 2400 2222 AC0 10 2400 2222 AC1 90 0000 0000 AC1 FF 8000 DDDE AC2 00 0000 0000 AC2 10 2400 2222 SATD 1 SATD 1 TRN0 1000 TRN0 0800 TRN1 0100 TRN1 0080 ACOV1 0 ACOV1 1 CARRY 1 CARRY 0 The difference is stored in AC1. The difference is stored in AC1. The content of AC0(39–16) is subtracted from the content of AC1(39–16) and The content of AC0(39–16) is subtracted from the content of AC1(39–16) and
the result is stored in AC1(39–16). Since SATD = 1 and an overflow is the result is stored in AC1(39–16). Since SATD = 1 and an overflow is detected, AC1(39–16) = FF 8000h (saturation). detected, AC1(39–16) = FF 8000h (saturation).
The content of AC0(15–0) is subtracted from the content of AC1(15–0) and the The content of AC0(15–0) is subtracted from the content of AC1(15–0) and the result is stored in AC1(15–0). result is stored in AC1(15–0).
The maximum is stored in AC2. The maximum is stored in AC2. The content of TRN0 and TRN1 is shifted right 1 bit. AC0(31–16) is greater The content of TRN0 and TRN1 is shifted right 1 bit. AC0(31–16) is greater
than AC1(31–16), AC0(39–16) is stored in AC2(39–16) and TRN0(15) is than AC1(31–16), AC0(39–16) is stored in AC2(39–16) and TRN0(15) is cleared to 0. AC0(15–0) is greater than AC1(15–0), AC0(15–0) is storedcleared to 0. AC0(15–0) is greater than AC1(15–0), AC0(15–0) is stored inAC2(15–0) and TRN1(15) is cleared to 0.inAC2(15–0) and TRN1(15) is cleared to 0.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 99
Example of DMAXDIFF instructionExample of DMAXDIFF instruction DMAXDIFF AC0, AC1, AC2, AC3, TRN1
Before After AC0 10 2400 2222 AC0 10 2400 2222 AC1 00 8000 DDDE AC1 00 8000 DDDE AC2 00 0000 0000 AC2 10 2400 2222 AC3 00 0000 0000 AC3 F0 5C00 BBBC M40 1 M40 1 SATD 1 SATD 1 TRN1 0080 TRN1 0040 ACOV3 0 ACOV3 0 CARRY 0 CARRY 0 The difference is stored in AC3. The difference is stored in AC3. The content of AC0 is subtracted from the content of AC1 and the The content of AC0 is subtracted from the content of AC1 and the
result is stored in AC3.result is stored in AC3. The maximum is stored in AC2. The maximum is stored in AC2. The content of TRN1 is shifted right 1 bit. AC0 is greater than AC1, The content of TRN1 is shifted right 1 bit. AC0 is greater than AC1,
AC0 is stored in AC2 and TRN1(15) is cleared to 0.AC0 is stored in AC2 and TRN1(15) is cleared to 0.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 100
Arithmetic instructions: Conditional Arithmetic instructions: Conditional Addition/subtraction, Shift, SubtractAddition/subtraction, Shift, Subtract
SyntaxParallel
enable bitSize Cycles Pipeline
Executed in
ADDSUBCC Smem, ACx, TC1, ACy No 3 1 X D unit ALUADDSUBCC Smem, ACx, TC2, ACy No 3 1 X D unit ALUADDSUBCC Smem, ACx, TC1, TC2, ACy No 3 1 X D unit ALU
ADDSUB2CC Smem, ACx, Tx, TC1, TC2, ACy No 3 1 XD unit Shifter
SFTCC ACx, TCx Yes 2 1 XD unit shifter
SUBC Smem, [ACx,] ACy No 3 1 X D unit ALU
Conditional addition/subtraction
Conditional Shift
Conditional Subtract
These instructions are useful for division.These instructions are useful for division.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 101
Example for conditional shift SFTCCExample for conditional shift SFTCC
SFTCC AC0, TC1 Before Before AfterAfter AC0 AC0 FF 8765 0055 FF 8765 0055 AC0 AC0 FF 8765 0055FF 8765 0055 TC1 TC1 0 0 TC1 TC1 11
Because AC0(31) XORed with AC0(30) equals 1, the Because AC0(31) XORed with AC0(30) equals 1, the content of AC0 is not shifted left and TC1 is set to 1.content of AC0 is not shifted left and TC1 is set to 1.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 102
Example for SUBCExample for SUBC
SUBC *AR1, AC0, AC1 Before After AC0 23 4300 0000 AC0 23 4300 0000 AC1 00 0000 0000 AC1 46 8400 0001 AR1 300 AR1 300 300 200 300 200 SXMD 0 SXMD 0 ACOV1 0 ACOV1 1 CARRY 0 CARRY 1 The content addressed by AR1 shifted left by 15 bits is subtracted from The content addressed by AR1 shifted left by 15 bits is subtracted from
the content of AC0. The result is greater than 0; therefore, the result is the content of AC0. The result is greater than 0; therefore, the result is shifted left by 1 bit, added to 1, and the new result stored in AC1. The shifted left by 1 bit, added to 1, and the new result stored in AC1. The result generated an overflow and a carry.result generated an overflow and a carry.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 103
Dual 16-Bit ArithmeticDual 16-Bit Arithmetic
SyntaxParallel
enable bitSize Cycles Pipeline
Executed in
ADDSUB Tx, Smem, Acx No 3 1 X D unit ALUSUBADD Tx, Smem, Acx No 3 1 X D unit ALUADD dual(Lmem), [ACx,] Acy No 3 1 X D unit ALUSUB dual(Lmem), [ACx,] Acy No 3 1 X D unit ALUSUB ACx, dual(Lmem), Acy No 3 1 X D unit ALUSUB dual(Lmem), Tx, Acx No 3 1 X D unit ALUADD dual(Lmem), Tx, ACx No No 3 1 X D unit ALUSUB Tx, dual(Lmem), Acx No 3 1 X D unit ALUADDSUB Tx, dual(Lmem), Acx No 3 1 X D unit ALUSUBADD Tx, dual(Lmem), ACx No 3 1 X D unit ALU
Dual 16-Bit Arithmetic
See next slides for more details on ADDSUB and SUBADD.See next slides for more details on ADDSUB and SUBADD.
See chapter 22 for application to Viterbi algorithm.See chapter 22 for application to Viterbi algorithm.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 104
Viterbi DecodingViterbi DecodingChannelChannelENCENCDataData
GG00GG11 DECDEC ~Data~Data~G~G00GG11
Know: Received data, and how the original data was encoded Need: Derive the original data from the received data Viterbi: “Deriving the most likely path taken through a Viterbi trellis” Process: Establish path through trellis (using metric/penalties) to allow
traceback to determine the original data that determined this path
SelectSelectMaxMax
SelectSelectMaxMax
NextNextStateState
+T3+T3
+T3+T3
- T3
- T3
- T3- T3
ACACxxHH
ACACyyHH
ACACxxLL
ACACyyLL
PrelimPrelimValuesValues
TRN0/1TRN0/1
1. Get current metric1. Get current metric
CurrentCurrentStateState
*ptr
*ptr
ProcedureProcedure
2. Add/sub local2. Add/sub local distance (T3) distance (T3)
3. Compare and3. Compare and select min/max select min/max
4. Note which path4. Note which path was taken (TRNx) was taken (TRNx)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 105
Viterbi DecodingViterbi Decoding ADDSUB *AR0+,T3,AC0ADDSUB *AR0+,T3,AC0 ;hi(AC0) <- p0(J);hi(AC0) <- p0(J)
;lo(AC0) <- p0(J+N/2) ;lo(AC0) <- p0(J+N/2)
SUBADD *AR0+,T3,AC1SUBADD *AR0+,T3,AC1 ;hi(AC1) <- p1(J);hi(AC1) <- p1(J);lo(AC1) <- p1(J+N/2);lo(AC1) <- p1(J+N/2)
MAXDIFF AC0,AC1,AC2,AC3MAXDIFF AC0,AC1,AC2,AC3 ;put “best path” in AC2;put “best path” in AC2
- Use ABDST/SQDST to determine metric; metric update and traceback not shown- Use ABDST/SQDST to determine metric; metric update and traceback not shown
SelectSelectMaxMax
SelectSelectMaxMax
NextNextStateState
+T3+T3
+T3+T3
- T3
- T3
- T3- T3
ACACxxHH
ACACyyHH
ACACxxLL
ACACyyLL
PrelimPrelimValuesValues
TRN0/1TRN0/1
1. Get current metric1. Get current metric
CurrentCurrentStateState
*ptr
*ptr
ProcedureProcedure
2. Add/sub local2. Add/sub local distance (T3) distance (T3)
3. Compare and3. Compare and select min/max select min/max
4. Note which path4. Note which path was taken (TRNx) was taken (TRNx)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 106
Dual Multiply/Accumulate/Subtract 1/2Dual Multiply/Accumulate/Subtract 1/2Syntax
Parallel enable bit
Size Cycles Pipeline Executed in
AMAR Xmem No 4 1 X D unit MACs
:: MAS[R][40] [uns(]Ymem[)], [uns(]Cmem[)], ACx
MAC[R][40] [uns(]Xmem[)], [uns(]Cmem[)], ACx >> #16 No 4 1 X D unit MACs
:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)], ACyMPY[R][40] [uns(]Xmem[)], [uns(]Cmem[)], ACx No 4 1 X D unit MACs:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)],ACy >> #16MAC[R][40] [uns(]Xmem[)], [uns(]Cmem[)], ACx >> #16 No 4 1 X D unit MACs:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)],ACy >> #16
MAS[R][40] [uns(]Xmem[)], [uns(]Cmem[)], ACx No 4 1 X D unit MACs:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)],ACy >> #16
AMAR Xmem No 4 1 X D unit MACs:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)],ACx >> #16
AMAR Xmem, Ymem, Cmem No 4 1 X D unit MACs
Dual Multiply (Accumulate/Subtract)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 107
Dual Multiply/Accumulate/Subtract 2/2Dual Multiply/Accumulate/Subtract 2/2Syntax
Parallel enable bit
Size Cycles Pipeline Executed in
AMAR Xmem No 4 1 X D unit MACs
:: MAS[R][40] [uns(]Ymem[)], [uns(]Cmem[)], ACx
MAC[R][40] [uns(]Xmem[)], [uns(]Cmem[)], ACx >> #16 No 4 1 X D unit MACs
:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)], ACyMPY[R][40] [uns(]Xmem[)], [uns(]Cmem[)], ACx No 4 1 X D unit MACs:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)],ACy >> #16MAC[R][40] [uns(]Xmem[)], [uns(]Cmem[)], ACx >> #16 No 4 1 X D unit MACs:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)],ACy >> #16
MAS[R][40] [uns(]Xmem[)], [uns(]Cmem[)], ACx No 4 1 X D unit MACs:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)],ACy >> #16
AMAR Xmem No 4 1 X D unit MACs:: MAC[R][40] [uns(]Ymem[)], [uns(]Cmem[)],ACx >> #16
AMAR Xmem, Ymem, Cmem No 4 1 X D unit MACs
Dual Multiply (Accumulate/Subtract)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 108
FIR symetrical and antisymetrical filtersFIR symetrical and antisymetrical filters
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
FIRSADD Xmem, Ymem, Cmem, ACx, Acy No 4 1 XD unit MAC
and ALU
FIRSSUB Xmem, Ymem, Cmem, ACx, Acy No 4 1 XD unit MAC
and ALU
Finite Impulse Response Filter, Symmetrical/Antisymmetrical
See chapter 14 on FIR filters for detailed explanation of these instructions See chapter 14 on FIR filters for detailed explanation of these instructions and next slide for illustration.and next slide for illustration.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 109
Symmetrical/Antisymmetrical FIRSymmetrical/Antisymmetrical FIR
CoeffsCoeffs
a4a4 a5a5 a6a6 a7a7a3a3a2a2a1a1a0a0
SymmetricalSymmetrical
CoeffsCoeffs
a4a4 a5a5 a6a6 a7a7a3a3a2a2a1a1a0a0
Anti-symmetricalAnti-symmetrical
These filters may be “folded” and performed with N adds and N/2 MACs Filters need to be designed as even length
Y(n) = Y(n) = a0a0(x7 + x0) + (x7 + x0) + a1a1(x6 + x1) + (x6 + x1) + a2a2(x5 + x2) + (x5 + x2) + a3a3(x4 + x3)(x4 + x3)
Anti-symmetrical: use FIRSSUB (e.g. a0(x7-x0)), FIRSADD: 2 taps/cycle Anti-symmetrical: use FIRSSUB (e.g. a0(x7-x0)), FIRSADD: 2 taps/cycle
FIRSADD Xmem,Ymem,coef,ACx,ACyFIRSADD Xmem,Ymem,coef,ACx,ACy
FIRSADD = ACy = ACy + (ACx * (*CDP))FIRSADD = ACy = ACy + (ACx * (*CDP)) || ACx = Xmem + Ymem|| ACx = Xmem + Ymem
PseudoPseudoCode:Code:
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 110
Implied parallelism instructionsImplied parallelism instructions
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
MPYM[R] [T3 = ]Xmem, Tx, ACy No 4 1 X D unit MAC:: MOV HI(ACx << T2), Ymem and shifterMACM[R] [T3 = ]Xmem, Tx, ACy No 4 1 X D unit MAC:: MOV HI(ACx << T2), Ymem and shifterMASM[R] [T3 = ]Xmem, Tx, ACy No 4 1 X D unit MAC:: MOV HI(ACx << T2), Ymem and shifterADD Xmem << #16, ACx, ACy No 4 1 X D unit ALU:: MOV HI(ACy << T2), Ymem and shifterSUB Xmem << #16, ACx, ACy No 4 1 X D unit ALU:: MOV HI(ACy << T2), Ymem and shifterMOV Xmem << #16, ACy No 4 1 X D unit ALU:: MOV HI(ACx << T2), Ymem and shifterMACM[R] [T3 = ]Xmem, Tx, ACx No 4 1 X D unit MAC:: MOV Ymem << #16, ACyMASM[R] [T3 = ]Xmem, Tx, ACx No 4 1 X D unit MAC:: MOV Ymem << #16, ACy
Implied Paralleled Instructions
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 111
LMS, MAX and MIN instructionsLMS, MAX and MIN instructions
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
LMS Xmem, Ymem, ACx, Acy No 4 1 XD unit MAC
and ALU
MAX [src,] dst Yes 2 1 XA or D unit
ALU
MIN [src,] dst Yes 2 1 XA or D unit
ALU
Least Mean Square (LMS)
Maximum Comparison
Minimum Comparison
See chapter 16 on adaptive FIR filters for detailed explanation of the LMS instruction See chapter 16 on adaptive FIR filters for detailed explanation of the LMS instruction and next slide for illustration.and next slide for illustration.
See next sides for more explanation on min and max.See next sides for more explanation on min and max.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 112
Adaptive Filtering Using LMS - ConceptAdaptive Filtering Using LMS - ConceptA least mean square (LMS) approach is widely used for adaptive filter routines. A least mean square (LMS) approach is widely used for adaptive filter routines.
The technique minimizes an error term by tuning the filter coefficients.The technique minimizes an error term by tuning the filter coefficients.
e*xe*x00e*xe*x11 e*xe*xnn
inputinput Modeled SystemModeled System
dd
ee
--yy
a0
z-1 z-1 z-1...
a1 an
x0 x1 xk
Provide input to theProvide input to thereal system to getreal system to getdesired output: d(n)desired output: d(n)
LMS = LMS = FIRFIR , , Coeff updateCoeff update = (ACy = ACy + (ai*xi)), ACx = rnd(ei+ai)= (ACy = ACy + (ai*xi)), ACx = rnd(ei+ai)
Filter input through FIRFilter input through FIRmodel to get estimatedmodel to get estimatedoutput: y(n)output: y(n)
Compare results toCompare results toget error term: e(n)get error term: e(n)
Correlate the errorCorrelate the errorterm with x0, x1 …,term with x0, x1 …,to update coefficientsto update coefficients
input + noiseinput + noise
-input - noise-input - noise
eei i = e*x= e*xii
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 113
LMS Adaptive Filter CodeLMS Adaptive Filter Code ... ... .asg AR3, Coefs.asg AR3, Coefs .asg AR4, Data.asg AR4, Data
MOV @B2e,T3MOV @B2e,T3
|| MOV #(N-2),BRC0|| MOV #(N-2),BRC0
MPYM *Data,T3,AC0MPYM *Data,T3,AC0
|| MOV #0,AC1|| MOV #0,AC1
MOV port(@X_new),*Data+MOV port(@X_new),*Data+
LMS *Coefs,*Data,AC0,AC1LMS *Coefs,*Data,AC0,AC1
|| RPTBLOCAL e1|| RPTBLOCAL e1
MOV HI(AC0),*Coefs+MOV HI(AC0),*Coefs+
|| MPYM *Data+,T3,AC0|| MPYM *Data+,T3,AC0
e1: LMS *Coefs,*Data,AC0,AC1e1: LMS *Coefs,*Data,AC0,AC1
MOV HI(AC0),*CoefsMOV HI(AC0),*Coefs
|| MOV HI(AC1),*Result+|| MOV HI(AC1),*Result+
Pre-calculate Pre-calculate 2*2**e(n) ... *e(n) ... . . AR3 pts to coeff table: a[n]AR3 pts to coeff table: a[n]AR4 pts to data table: x[n]AR4 pts to data table: x[n]
T3 holds error step amount...T3 holds error step amount...
… … while loading BRC0while loading BRC0
ACO = error * oldest sample: x(n)... ACO = error * oldest sample: x(n)...
… while clearing AC1 (running FIR)… while clearing AC1 (running FIR)
Overwrite x(n) with x(0)Overwrite x(n) with x(0)
Start FIR calc, update oldest coeff…Start FIR calc, update oldest coeff…
… … and start repeat blockand start repeat block
Store update coefficient ...Store update coefficient ...
...while calculating next update term...while calculating next update term
Calc FIR, update coefficientCalc FIR, update coefficient
Store final coefficient...Store final coefficient...
……while storing FIR outputwhile storing FIR output
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 114
Search - MIN, MAXSearch - MIN, MAX Goal: find the max (or min) value in an array
162824
24893588
*ptrsrc dst
max
Y Carry=0
MAX src,dstMAX src,dst
Operands: src/dst can be AC0-3, AR0-7, T0-3
RPT #(N-2)RPT #(N-2)
MAX AC0,AC1MAX AC0,AC1
|| MOV *ptr+,AC0 || MOV *ptr+,AC0
Benchmark: ~N cycles to find the min/max of N elements
How do you determine WHICH value was the min/max?How do you determine WHICH value was the min/max?
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 115
Memory comparison, modify auxiliary Memory comparison, modify auxiliary register, modify data stack pointerregister, modify data stack pointer
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
CMP Smem == K16, TC1 No 4 1 X A unit ALUCMP Smem == K16, TC2 No 4 1 X A unit ALU
AADD TAx, Tay Yes 3 1 AD A unit DAGENs
ASUB TAx, Tay Yes 3 1 AD A unit DAGENsAMOV TAx, Tay Yes 3 1 AD A unit DAGENs
AADD k8, Tax Yes 3 1 AD A unit DAGENs
ASUB k8, Tax Yes 3 1 AD A unit DAGENs
AMOV k8, Tax Yes 3 1 AD A unit DAGENsAMOV D16, Tax No 4 1 AD A unit DAGENsAMAR Smem No 2 1 AD A unit DAGENs
AADD K8, SP Yes 2 1 AD A unit ALU
Modify Auxiliary Register (MAR)
Modify Data Stack Pointer (SP)
Memory Comparison
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 116
Multiply instructionsMultiply instructions
SyntaxParallel
enable bitSize Cycles Pipeline
Executed in
SQR[R] [ACx,] Acy Yes 2 1 X D unit MACMPY[R] [ACx,] Acy Yes 2 1 X D unit MACMPY[R] Tx, [ACx,] Acy Yes 2 1 X D unit MACMPYK[R] K8, [ACx,] Acy Yes 3 1 X D unit MACMPYK[R] K16, [ACx,] Acy No 4 1 X D unit MACMPYM[R] [T3 = ]Smem, Cmem, Acx No 3 1 X D unit MACSQRM[R] [T3 = ]Smem, Acx No 3 1 X D unit MACMPYM[R] [T3 = ]Smem, [ACx,] Acy No 3 1 X D unit MACMPYMK[R] [T3 = ]Smem, K8, Acx No 4 1 X D unit MACMPYM[R][40] [T3 = ][uns(]Xmem[)], [uns(]Ymem[)], Acx No 4 1 X D unit MACMPYM[R][U] [T3 = ]Smem, Tx, Acx No 3 1 X D unit MAC
Multiply
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 117
Multiply and accumulate instructionsMultiply and accumulate instructions
SyntaxParallel
enable bitSize Cycles Pipeline
Executed in
SQA[R] [ACx,] Acy Yes 2 1 X D unit MACMAC[R] ACx, Tx, ACy[, ACy] Yes 2 1 X D unit MACMAC[R] ACy, Tx, ACx, Acy Yes 2 1 X D unit MACMACK[R] Tx, K8, [ACx,] Acy Yes 3 1 X D unit MACMACK[R] Tx, K16, [ACx,] Acy No 4 1 X D unit MACMACM[R] [T3 = ]Smem, Cmem, Acx No 3 1 X D unit MACMACM[R]Z [T3 = ]Smem, Cmem, Acx No 3 1 X D unit MACSQAM[R] [T3 = ]Smem, [ACx,] Acy No 3 1 X D unit MACMACM[R] [T3 = ]Smem, [ACx,] Acy No 3 1 X D unit MACMACM[R] [T3 = ]Smem, Tx, [ACx,] Acy No 3 1 X D unit MACMACMK[R] [T3 = ]Smem, K8, [ACx,] Acy No 4 1 X D unit MACMACM[R][40] [T3 = ][uns(]Xmem[)], [uns(]Ymem[)], No 4 1 X D unit MAC[ACx,] ACyMACM[R][40] [T3 = ][uns(]Xmem[)], [uns(]Ymem[)], No 4 1 X D unit MACACx >> #16[, ACy]
Multiply and Accumulate (MAC)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 118
Multiply and subtract, negationMultiply and subtract, negation
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
SQS[R] [ACx,] Acy Yes 2 1 X D unit MACMAS[R] Tx, [ACx,] Acy Yes 2 1 X D unit MACMASM[R] [T3 = ]Smem, Cmem, ACx No 3 1 X No 3 1 X D unit MACSQSM[R] [T3 = ]Smem, [ACx,] ACy No 3 1 X No 3 1 X D unit MACMASM[R] [T3 = ]Smem, [ACx,] ACy No 3 1 X No 3 1 X D unit MACMASM[R] [T3 = ]Smem, Tx, [ACx,] ACy No 3 1 X No 3 1 X D unit MACMASM[R][40] [T3 = ][uns(]Xmem[)], [uns(]Ymem[)], No 4 1 X D unit MAC[ACx,] ACy
NEG [src,] dst Yes 2 1 X A or D unit ALU
Multiply and Subtract (MAS)
Negation
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 119
Normalization, Register comparisonNormalization, Register comparison
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
MANT ACx, ACy Yes 3 1 X A unit ALU:: NEXP ACx, Tx and D unit shifterEXP ACx, Tx Yes 3 1 X A unit ALU
and D unit shifter
Normalization
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
CMP[U] src RELOP dst, TCx Yes 3 1 X A or D unit ALUCMPAND[U] src RELOP dst, TCy, TCx Yes 3 1 X A or D unit ALUCMPAND[U] src RELOP dst, !TCy, TCx Yes 3 1 X A or D unit ALUCMPOR[U] src RELOP dst, TCy, TCx Yes 3 1 X A or D unit ALUCMPOR[U] src RELOP dst, !TCy, TCx Yes 3 1 X A or D unit ALU
Register comparison
These normalization instructions are useful when working with floating point numbers.These normalization instructions are useful when working with floating point numbers.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 120
Round and saturate, signed shiftRound and saturate, signed shift
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
ROUND [ACx,] Acy Yes 2 1 X D unit ALU
SAT[R] [ACx,] Acy Yes 2 1 X D unit ALU
Round
Saturate
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
SFTS dst, #–1 Yes 2 1 X A unit ALU or D unit shifterSFTS dst, #1 Yes 2 1 X A unit ALU or D unit shifterSFTS ACx, Tx[, ACy] Yes 2 1 X D unit shifterSFTSC ACx, Tx[, ACy] Yes 2 1 X D unit shifterSFTS ACx, #SHIFTW[, ACy] Yes 3 1 X D unit shifterSFTSC ACx, #SHIFTW[, ACy] Yes 3 1 X D unit shifter
Signed shift
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 121
Square distanceSquare distance
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
SQDST Xmem, Ymem, ACx, Acy No 4 1 X D unit MAC and ALUSquare distance
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 122
SubtractionSubtractionSyntax
Parallel enable bit
Size Cycles Pipeline Executed in
SUB [src,] dst Yes 2 1 X A or D unit ALU
SUB k4, dst Yes 2 1 X A or D unit ALU
SUB K16, [src,] dst No 4 1 X A or D unit ALU
SUB Smem, [src,] dst No 3 1 X A or D unit ALU
SUB src, Smem, dst No 3 1 X A or D unit ALU
SUB ACx << Tx, ACy Yes 2 1 X D unit ALU and Shiter
SUB ACx << #SHIFTW, ACy Yes 3 1 X D unit ALU and Shiter
SUB K16 << #16, [ACx,] ACy No 4 1 X D unit ALU
SUB K16 << #SHFT, [ACx,] ACy No 4 1 X D unit ALU and shifter
SUB Smem << Tx, [ACx,] ACy No 3 1 X D unit ALU and shifter
SUB Smem << #16, [ACx,] ACy No 3 1 X D unit ALU
SUB [uns(]Smem[)], CARRY, [ACx,] ACy No 3 1 X D unit ALU
SUB [uns(]Smem[)], [ACx,] ACy No 3 1 X D unit ALU
SUB [uns(]Smem[)] << #SHIFTW, [ACx,] ACy No 4 1 X D unit ALU and shifter
SUB dbl(Lmem), [ACx,] ACy No 3 1 X D unit ALU
SUB Xmem, Ymem, ACx No 3 1 X D unit ALU
SUB K16, Smem No 4 1 X D unit ALU
SUB[R]V [ACx,] ACy Yes 2 1 X D unit MAC
Subtraction
Similar to ADD instruction, except for form n°5Similar to ADD instruction, except for form n°5
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 123
Example for 5Example for 5thth form of SUB form of SUB
Syntax:Syntax: SUB src, Smem, dst SUB AC1, *AR3, AC0 The content of AC1 is subtracted from the The content of AC1 is subtracted from the
content addressed by AR3 and the result is content addressed by AR3 and the result is stored in AC0.stored in AC0.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 124
Bit manipulations operationsBit manipulations operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
BAND Smem, k16, TC1 No 4 1 X A unit ALU
BAND Smem, k16, TC2 No 4 1 X A unit ALU
BFXPA k16, ACx, dst No 4 1 X D unit shifter
BFXTR k16, ACx, dst No 4 1 X D unit shifter
Bit field comparison
Bit field expand
Bit field extract
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 125
Memory Bit Test/Set/Clear/Complement
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
BTST src, Smem, TCx No 3 1 X A unit ALU
BNOT src, Smem No 3 1 X A unit ALU
BCLR src, Smem No 3 1 X A unit ALU
BSET src, Smem No 3 1 X A unit ALU
BTSTSET k4, Smem, TC1 No 3 1 X A unit ALU
BTSTSET k4, Smem, TC2 No 3 1 X A unit ALU
BTSTCLR k4, Smem, TC1 No 3 1 X A unit ALU
BTSTCLR k4, Smem, TC2 No 3 1 X A unit ALU
BTSTNOT k4, Smem, TC1 No 3 1 X A unit ALU
BTSTNOT k4, Smem, TC2 No 3 1 X A unit ALU
BTST k4, Smem, TC1 No 3 1 X A unit ALU
BTST k4, Smem, TC2 No 3 1 X A unit ALU
Memory Bit Test/Set/Clear/Complement
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 126
Register Bit test/Set/Clear/ComplementRegister Bit test/Set/Clear/Complement
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
BTST Baddr, src, TCx No 3 1 X A or D unit ALU
BNOT Baddr, src No 3 1 X A or D unit ALU
BCLR Baddr, src No 3 1 X A or D unit ALU
BSET Baddr, src No 3 1 X A or D unit ALU
BTSTP Baddr, src No 3 1 X A or D unit ALU
Register Bit Test/Set/Clear/Complement
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 127
Status bit Set/ClearStatus bit Set/Clear
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
BCLR k4, ST0_55 Yes 2 1 X A unit ALU
BSET k4, ST0_55 Yes 2 1 X A unit ALU
BCLR k4, ST1_55 Yes 2 1 X A unit ALU
BSET k4, ST1_55 Yes 2 1 X A unit ALU
BCLR k4, ST2_55 Yes 2 1 X A unit ALU
BSET k4, ST2_55 Yes 2 1 X A unit ALU
BCLR k4, ST3_55 Yes 2 1† X A unit ALU
BSET k4, ST3_55 Yes 2 1† X A unit ALU
BCLR f–name Yes 2 1† X A unit ALU
BSET f–name Yes 2 1† X A unit ALU
Status Bit Set/Clear
† † When these instructions are decoded to modify status bit CAFRZ (15), When these instructions are decoded to modify status bit CAFRZ (15), CAEN (14), or CACLR (13), the CPU pipeline is flushed and the CAEN (14), or CACLR (13), the CPU pipeline is flushed and the
instruction is executed in 5 cycles regardless of the instruction contextinstruction is executed in 5 cycles regardless of the instruction context.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 128
Extended auxiliary register (XAR) operationsExtended auxiliary register (XAR) operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
MOV xsrc, xdst No 2 1 X
AMAR Smem, Xadst No 3 1 ADAMOV k23, XAdst No 6 1 AD
MOV dbl(Lmem), Xadst No 3 1 X
POPBOTH xdst Yes 2 1 X
PSHBOTH xsrc Yes 2 1 X
MOV XAsrc, dbl(Lmem) No 3 1 X
Extended Auxiliary Register Move
Load Effective Address to Extended Auxiliary Register
Load Extended Auxiliary Register from Memory
Pop Extended Auxiliary Register from Stack Pointers
Push Extended Auxiliary Register to Stack Pointers
Store Extended Auxiliary Register to Memory
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 129
Logical operationsLogical operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
BCNT ACx, ACy,TCx, Tx Yes 3 1 X D unit shifter and A unit ALU
NOT [src,] dst Yes 2 1 X A or D unit ALU
AND src, dst Yes 2 1 X A or D unit ALU
AND k8, src, dst Yes 3 1 X A or D unit ALU
AND k16, src, dst No 4 1 X A or D unit ALU
AND Smem, src, dst No 3 1 X A or D unit ALU
AND ACx << #SHIFTW[, ACy] Yes 3 1 X D unit shifter
AND k16 << #16, [ACx,] Acy No 4 1 X D unit ALU
AND k16 << #SHFT, [ACx,] Acy No 4 1 X D unit shifter
AND k16, Smem No 4 1 X A unit ALU
Bitwise Complement
Bitwise AND
Bit Field Counting
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 130
Logical operationsLogical operationsSyntax
Parallel enable bit
Size Cycles Pipeline Executed in
OR src, dst Yes 2 1 X A or D unit ALU
OR k8, src, dst Yes 3 1 X A or D unit ALU
OR k16, src, dst No 4 1 X A or D unit ALU
OR Smem, src, dst No 3 1 X A or D unit ALU
OR ACx << #SHIFTW[, ACy] Yes 3 1 X D unit shifter
OR k16 << #16, [ACx,] Acy No 4 1 X D unit ALU
OR k16 << #SHFT, [ACx,] Acy No 4 1 X D unit shifter
OR k16, Smem No 4 1 X A unit ALU
XOR src, dst Yes 2 1 X A or D unit ALU
XOR k8, src, dst Yes 3 1 X A or D unit ALU
XOR k16, src, dst No 4 1 X A or D unit ALU
XOR Smem, src, dst No 3 1 X A or D unit ALU
XOR ACx << #SHIFTW[, ACy] Yes 3 1 X D unit shifter
OR k16 << #16, [ACx,] Acy No 4 1 X D unit ALU
XOR k16 << #SHFT, [ACx,] Acy No 4 1 X D unit shifter
XOR k16, Smem No 4 1 X A unit ALU
BitwiseXOR
Bitwise OR
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 131
Logival operationsLogival operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
SFTL dst, #1 Yes 2 1 X A unit ALU or D unit shifter
SFTL dst, #–1 Yes 2 1 X A unit ALU or D unit shifter
SFTL ACx, Tx[, ACy] Yes 2 1 X D unit shifter
SFTL ACx, #SHIFTW[, ACy] Yes 3 1 X D unit shifter
NEG [src,] dst Yes 2 1 X A or D unit ALU
ROL BitOut, src, BitIn, dst Yes 3 1 X A unit ALU or D unit shifter
ROR BitIn, src, BitOut, dst Yes 3 1 X A unit ALU or D unit shifterRotate Right
Logical Shift
Rotate Left
Negation
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 132
Miscelaneous operationsMiscelaneous operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
<instruction>.LR No 1 1 AD<instruction>.CR No 1 1 AD
mmap No 1 1 D
port(Smem) No 1 1 D
Linear/Circular Addressing Qualifiers
Memory-Mapped Register Access Qualifier
Peripheral Port Register Access Qualifier
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 133
Move operationsMove operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
SWAP ARx, Tx Yes 2 1 AD A unit register file
SWAP Tx, Ty Yes 2 1 AD A unit register file
SWAP ARx, ARy Yes 2 1 AD A unit register file
SWAP ACx, ACy Yes 2 1 X D unit register file
SWAPP ARx, Tx Yes 2 1 AD A unit register file
SWAPP T0, T2 Yes 2 1 AD A unit register file
SWAPP AR0, AR2 Yes 2 1 AD A unit register file
SWAPP AC0, AC2 Yes 2 1 X D unit register file
SWAP4 AR4, T0 Yes 2 1 AD A unit register file
Accumulator, Auxiliary, or Temporary Register Content Swap
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 134
Move operationsMove operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
MOV k4, dst Yes 2 1 X A or D unit register file
MOV –k4, dst Yes 2 1 X A or D unit register file
MOV K16, dst No 4 1 X A or D unit register file
MOV Smem, dst No 2 1 X A or D unit register file
MOV [uns(]high_byte(Smem)[)], dst No 3 1 X A or D unit register file
MOV [uns(]low_byte(Smem)[)], dst No 3 1 X A or D unit register file
MOV K16 << #16, ACx No 4 1 X D unit ALU
MOV K16 << #SHFT, ACx No 4 1 X D unit shifter
MOV [rnd(]Smem << Tx[)], ACx No 3 1 X D unit shifter
Accumulator, Auxiliary, or Temporary Register Load 1/2
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 135
Move operationsMove operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
MOV low_byte(Smem) << #SHIFTW, Acx No 3 1 X D unit shifter
MOV high_byte(Smem) << #SHIFTW, Acx No 3 1 X D unit shifter
MOV Smem << #16, Acx No 2 1 X D unit ALU
MOV [uns(]Smem[)], Acx No 3 1 X D unit register file
MOV [uns(]Smem[)] << #SHIFTW, ACx No 4 1 X D unit shifter
MOV[40] dbl(Lmem), ACx No 3 1 X D unit register file
MOV Xmem, Ymem, ACx No 3 1 1 D unit register file
MOV dbl(Lmem), pair(HI(ACx)) No 3 1 X D unit register file
MOV dbl(Lmem), pair(LO(ACx)) No 3 1 X D unit register file
MOV dbl(Lmem), pair(TAx) No 3 1 X A unit register file
Accumulator, Auxiliary, or Temporary Register Load 2/2
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 136
Move operationsMove operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
MOV src, dst Yes 2 1 X A or D unit ALU
MOV HI(ACx), Tax Yes 2 1 X A unit ALU
MOV TAx, HI(ACx) Yes 2 1 X D unit ALU
Accumulator, Auxiliary, or Temporary Register Move
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 137
Move operationsMove operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
MOV src, Smem No 2 1 X A or D unit register file
MOV src, high_byte(Smem) No 3 1 X A or D unit register file
MOV src, low_byte(Smem) No 3 1 X A or D unit register file
MOV HI(ACx), Smem No 2 1 X D unit register file
MOV [rnd(]HI(ACx)[)], Smem No 3 1 X D unit shifter
MOV ACx << Tx, Smem No 3 1 X D unit shifter
MOV [rnd(]HI(ACx << Tx)[)], Smem No 3 1 X D unit shifter
MOV ACx << #SHIFTW, Smem No 3 1 X D unit shifter
MOV HI(ACx << #SHIFTW), Smem No 3 1 X D unit shifter
MOV [rnd(]HI(ACx << #SHIFTW)[)], Smem No 4 1 X D unit shifter
Accumulator, Auxiliary, or Temporary Register Store 1/2
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 138
Move operationsMove operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
MOV [uns(] [rnd(]HI(saturate(ACx))[))], Smem No 3 1 X D unit shifter
MOV [uns(] [rnd(]HI(saturate(ACx << Tx))[))], Smem No 3 1 X D unit shifter
MOV [uns(](rnd(]HI(saturate(ACx << #SHIFTW))[))],Smem No 4 1 X D unit shifter
MOV ACx, dbl(Lmem) No 3 1 X D unit shifter
MOV [uns(]saturate(ACx)[)], dbl(Lmem) No 3 1 X D unit shifter
MOV ACx >> #1, dual(Lmem) No 3 1 X D unit register file
MOV pair(HI(ACx)), dbl(Lmem) No 3 1 X D unit register file
MOV pair(LO(ACx)), dbl(Lmem) No 3 1 X A unit register file
MOV pair(TAx), dbl(Lmem) No 3 1 X D unit shifter
MOV ACx, Xmem, Ymem No 3 1 X D unit register file
Accumulator, Auxiliary, or Temporary Register Store 2/2
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 139
Move operationsMove operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
DELAY Smem No 2 1 X A or D unit register file
MOV Cmem, Smem No 3 1 X A or D unit register file
MOV Smem, Cmem No 3 1 X A or D unit register file
MOV K8, Smem No 3 1 X A or D unit register file
MOV K16, Smem No 4 1 X A or D unit register file
MOV Cmem,dbl(Lmem) No 3 1 X A or D unit register file
MOV dbl(Lmem), Cmem No 3 1 X A or D unit register file
MOV dbl(Xmem), dbl(Ymem) No 3 1 X A or D unit register file
MOV Xmem, Ymem No 3 1 X A or D unit register file
Memory Delay
Memory-to-Memory Move/Memory Initialization
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 140
Move operationsMove operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
POP dst1,dst2 Yes 2 1 X A or D unit register file
POP dst Yes 2 1 X A or D unit register file
POP dst, Smem No 3 1 X A or D unit register file
POP dbl(ACx) Yes 2 1 X A or D unit register file
POP Smem No 2 1 X A or D unit register file
POP dbl(Lmem) No 2 1 X A or D unit register file
PSH src1,src2 Yes 2 1 X A or D unit register file
PSH src Yes 2 1 X A or D unit register file
PSH src,Smem No 3 1 X A or D unit register file
PSH dbl(ACx) Yes 2 1 X A or D unit register file
PSH Smem No 2 1 X A or D unit register file
PSH dbl(Lmem) No 2 1 X A or D unit register file
Pop Top of Stack (TOS)
Push to Top of Stack (TOS)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 141
Move operationsMove operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
MOV TAx, BRC0 Yes 2 1 X A unit ALU
MOV TAx, BRC1 Yes 2 1 X A unit ALU
MOV TAx, CDP Yes 2 1 X A unit ALU
MOV TAx, CSR Yes 2 1 X A unit ALU
MOV TAx, SP Yes 2 1 X A unit ALU
MOV TAx, SSP Yes 2 1 X A unit ALU
MOV BRC0, Tax Yes 2 1 X A unit ALU
MOV BRC1, Tax Yes 2 1 X A unit ALU
MOV CDP, Tax Yes 2 1 X A unit ALU
MOV RPTC, Tax Yes 2 1 X A unit ALU
MOV SP, Tax Yes 2 1 X A unit ALU
MOV SSP, Tax Yes 2 1 X A unit ALU
Specific CPU Register Move
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 142
Move operationsMove operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
MOV BK03, Smem No 3 1 X A or D unit register file
MOV BK47, Smem No 3 1 X A or D unit register file
MOV BKC, Smem No 3 1 X A or D unit register file
MOV BSA01, Smem No 3 1 X A or D unit register file
MOV BSA23, Smem No 3 1 X A or D unit register file
MOV BSA45, Smem No 3 1 X A or D unit register file
MOV BSA67, Smem No 3 1 X A or D unit register file
MOV BSAC, Smem No 3 1 X A or D unit register file
MOV BRC0, Smem No 3 1 X A or D unit register file
MOV BRC1, Smem No 3 1 X A or D unit register file
Specific CPU Register Store 1/2
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 143
Move operationsMove operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
MOV CDP, Smem No 3 1 X A or D unit register file
MOV CSR, Smem No 3 1 X A or D unit register file
MOV DP, Smem No 3 1 X A or D unit register file
MOV DPH, Smem No 3 1 X A or D unit register file
MOV PDP, Smem No 3 1 X A or D unit register file
MOV SP, Smem No 3 1 X A or D unit register file
MOV SSP, Smem No 3 1 X A or D unit register file
MOV TRN0, Smem No 3 1 X A or D unit register file
MOV TRN1, Smem No 3 1 X A or D unit register file
MOV RETA, dbl(Lmem) No 3 5 X A or D unit register file
Specific CPU Register Store 2/2
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 144
Program-control operationsProgram-control operations
x/y cycles: x cycles = condition true, y cycles = condition falsex/y cycles: x cycles = condition true, y cycles = condition false
† † These instructions execute in 3 cycles if the addressed instruction is These instructions execute in 3 cycles if the addressed instruction is in the instruction buffer unit.in the instruction buffer unit.
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
BCC l4, cond No 2 6/5 RBCC L8, cond Yes 3 6/5 RBCC L16, cond No 4 6/5 RBCC P24, cond No 5 5/5 R
B Acx No 2 10 XB L7 Yes 2 6† ADB L16 Yes 3 6† ADB P24 No 4 5 D
BCC L16, ARn_mod ! = #0 No 4 6/5 AD
Branch Conditionally
Branch Unconditionally
Branch on Auxiliary Register Not Zero
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 145
Program-control operationsProgram-control operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
CALLCC L16, cond No 4 6/5 RCALLCC P24, cond No 5 5/5 R
CALL Acx No 2 10 XCALL L16 Yes 3 6 ADCALL P24 No 4 5 D
BCC[U] L8, src RELOP K8 No 4 7/6 X
Call Conditionally
Call Unconditionally
Compare and Branch
x/y cycles: x cycles = condition true, y cycles = condition false.x/y cycles: x cycles = condition true, y cycles = condition false.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 146
Program-control operationsProgram-control operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
XCC [label, ]cond No 2 1 ADXCCPART [label, ]cond No 2 1 X
IDLE No 4 ? D No 4 ? D
NOP Yes 1 1 DNOP_16 Yes 2 1 D
Idle
No Operation (NOP)
Execute Conditionally
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 147
Program-control operationsProgram-control operations
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
RPTBLOCAL pmad Yes 2 1 ADRPTB pmad Yes 3 1 AD
RPTCC k8, cond Yes 3 1 AD
RPT CSR Yes 2 1 ADRPTADD CSR, Tax Yes 2 1 XRPT k8 Yes 2 1 ADRPTADD CSR, k4 Yes 2 1 XRPTSUB CSR, k4 Yes 2 1 XRPT k16 Yes 3 1 AD
Repeat Single Instruction Unconditionally
Repeat Block of Instructions Unconditionally
Repeat Single Instruction Conditionally
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 148
Program-control operationsProgram-control operations
x/y cycles: x cycles = condition true, y cycles = condition falsex/y cycles: x cycles = condition true, y cycles = condition false
SyntaxParallel
enable bitSize Cycles Pipeline Executed in
RETCC cond Yes 3 5/5 R
RET Yes 2 5 D
RETI Yes 2 5 D
INTR k5 No 2 3 D
RESET No 2 ? D
TRAP k5 No 2 ? D
Software Interrupt
Software Reset
Software Trap
Return Conditionally
Return Unconditionally
Return from Interrupt
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 149
Running C54 code on the C55Running C54 code on the C55
Assembling/optimizing a C54x algorithmAssembling/optimizing a C54x algorithm
System issues (interrupts, peripherals, etc)System issues (interrupts, peripherals, etc)
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 150
Running C54 Code on the C55Running C54 Code on the C55 C55 assembler accepts all C54x mnemonics (no translationC55 assembler accepts all C54x mnemonics (no translation
necessary: C55 instructions are a superset of C54)necessary: C55 instructions are a superset of C54)
C54 source will execute correctly on C55 (bit exact)C54 source will execute correctly on C55 (bit exact)
User must optimize C54 code to use new C55 featuresUser must optimize C54 code to use new C55 features
What % of commonly used C54 instructions assemble asWhat % of commonly used C54 instructions assemble asONE instruction on the C55 ?ONE instruction on the C55 ?
One C54x instruction assembles as:
1 C55x instruction(97%)
2 C55x instructions(2%)
3+ C55x instructions(1%)
2 areas of focus:2 areas of focus: AlgorithmsAlgorithms SystemSystem
Let’s look at Let’s look at algorithms first...algorithms first...
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 151
C54 Filter Algorithm ExampleC54 Filter Algorithm Example
fir: fir: STM #184,BRCSTM #184,BRCSTM #x,AR3STM #x,AR3STM #y,AR4STM #y,AR4RPTB done-1RPTB done-1 STM #a0,AR2STM #a0,AR2 MPY *AR2+,*AR3+,AMPY *AR2+,*AR3+,A RPTRPT #14#14 MACMAC *AR2+,*AR3+,A*AR2+,*AR3+,A
MAR *+AR3(#-15)MAR *+AR3(#-15) STHSTH A,*AR4+ A,*AR4+
done:done: B next_routineB next_routine
fir: fir: AMOV #184,BRC0AMOV #184,BRC0AMOV #x,XAR3AMOV #x,XAR3AMOV #y,XAR4AMOV #y,XAR4RPTB doneRPTB done AMOV #a0,XAR2AMOV #a0,XAR2 MPYM *AR2+,*AR3+,AC0MPYM *AR2+,*AR3+,AC0 RPT #14RPT #14 MACM *AR2+,*AR3+,AC0MACM *AR2+,*AR3+,AC0
AMAR *+AR3(#-15)AMAR *+AR3(#-15)done:done: MOVH AC0,*AR4+ MOVH AC0,*AR4+
B next_routineB next_routine
Original C54 CodeOriginal C54 Code Resulting C55 DisassemblyResulting C55 Disassembly
The code above assembles AS IS - no errors, no warningsThe code above assembles AS IS - no errors, no warnings User must modify code to use C55 dual-MAC (pointers, MAC:MAC)User must modify code to use C55 dual-MAC (pointers, MAC:MAC)
RPTB kernel: 20 bytes, 4072 cyclesRPTB kernel: 20 bytes, 4072 cycles
Does “anything go” with C54 source? Nope...Does “anything go” with C54 source? Nope...
kernel (w/o B): 17 bytes, 3701 cycleskernel (w/o B): 17 bytes, 3701 cycles
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 152
C54 Coding Practices - No-No’sC54 Coding Practices - No-No’s
hard_addr: B 1000hhard_addr: B 1000h
B loop+10B loop+10
ERROR! at line 39: [E9999]ERROR! at line 39: [E9999] Illegal operand or Illegal operand or operand combinationoperand combination
Original C54 CodeOriginal C54 Code C55 Assembler OutputC55 Assembler Output
Cannot use hard-coded address or program label offsetsCannot use hard-coded address or program label offsets
C55 uses variable-length instructions. Location of hardC55 uses variable-length instructions. Location of hardaddress and/or offset will be different.address and/or offset will be different.
pipe_trick: STM #100h,AR5pipe_trick: STM #100h,AR5
STLM A,AR5STLM A,AR5
ADD *AR5,BADD *AR5,B
pipe_trick: AMOV #100h,AR5pipe_trick: AMOV #100h,AR5
MOV AC0,AR5 MOV AC0,AR5
ADD *AR5,AC1ADD *AR5,AC1
Original C54 CodeOriginal C54 Code Resulting C55 DisassemblyResulting C55 Disassembly
Pipe Trick: C54 Pipe Trick: C54 ADDADD uses old value of AR5 (#100h) uses old value of AR5 (#100h) Code assembles fine, but will NOT yield same resultsCode assembles fine, but will NOT yield same results C55 PPU stalls C55 PPU stalls ADDADD until AC0 until AC0 AR5 AR5
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 153
Other C55 Assembler ExamplesOther C55 Assembler Examples
b_delay: BD nextb_delay: BD next
STM #x,AR5STM #x,AR5
Original C54 CodeOriginal C54 Code Resulting C55 DisassemblyResulting C55 Disassembly
No delayed operations on C55No delayed operations on C55
Delay slot instruction (Delay slot instruction (STMSTM) moved ahead of) moved ahead ofstandard branch (same with standard branch (same with CALLD, RETD, BCDCALLD, RETD, BCD))
b_delay: AMOV #x,AR5b_delay: AMOV #x,AR5
B nextB next
mv: MVDK *AR3,#800hmv: MVDK *AR3,#800h
Resulting C55 .lst FileResulting C55 .lst File
““PORT”PORT” used to signify 1-to-multiple translation used to signify 1-to-multiple translation Some temporary registers may be used (e.g. Some temporary registers may be used (e.g. XCDPXCDP) )
57 ****** PORT MVDK *AR4,#800h 57 ****** PORT MVDK *AR4,#800h 57 000038 EC31 AMAR *(#(2048)),XCDP57 000038 EC31 AMAR *(#(2048)),XCDP 00003a 7E00 00003a 7E00 00003c 1000 00003c 1000 57 00003e EF81 MOV *AR4,*CDP57 00003e EF81 MOV *AR4,*CDP
Original C54 CodeOriginal C54 Code
Next, let’s examine some system issues...Next, let’s examine some system issues...
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 154
System Issues (1)System Issues (1) The user must make the following modifications to the The user must make the following modifications to the
existing C54 code to achieve proper execution on a C55:existing C54 code to achieve proper execution on a C55:
Add System Stack (XSSP)Add System Stack (XSSP) Must also initialize the system stack (XSSP) for CALL/RETMust also initialize the system stack (XSSP) for CALL/RET
instructions to operate properly on C55instructions to operate properly on C55
SP and SSP must reside on the same 64K (word) pageSP and SSP must reside on the same 64K (word) page
22
rsv: BD reset_ISRrsv: BD reset_ISR
STM #TOS,SPSTM #TOS,SP
Vector Table - C54Vector Table - C54 Vector Table - C55Vector Table - C55
.ivec specifies vector address and aligns vector packet.ivec specifies vector address and aligns vector packeton an 8-byte boundary.on an 8-byte boundary.
C55 vector packet allows one instruction (up to 4 bytes)C55 vector packet allows one instruction (up to 4 bytes)to be executedto be executed
rsv: .ivec reset_ISRrsv: .ivec reset_ISR
AMOV #TOS,XSPAMOV #TOS,XSP
11 Re-Write Vector TableRe-Write Vector Table
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 155
System Issues (2)System Issues (2)33 Re-write any instructions which access peripheralsRe-write any instructions which access peripherals
C55 does not support sub-bank addressingC55 does not support sub-bank addressing
C55 peripheral registers are located in I/OC55 peripheral registers are located in I/O
C55 may have a different mix of peripheralsC55 may have a different mix of peripherals
44 Re-write linker.CMD file to use BYTE addresses/lengthsRe-write linker.CMD file to use BYTE addresses/lengths C55 uses a unified memory map (program/data share same map)C55 uses a unified memory map (program/data share same map)
C55 linker.CMD file uses BYTE addresses and BYTE lengthsC55 linker.CMD file uses BYTE addresses and BYTE lengths
55 Possibly re-write C-callable assembly routinesPossibly re-write C-callable assembly routines C55 compiler passes parameters differently than the C54C55 compiler passes parameters differently than the C54
by defaultby default
C55 can support C54 parameter passing rules if specifiedC55 can support C54 parameter passing rules if specifiedby the userby the user
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 156
User Must Optimize C54 CodeUser Must Optimize C54 Code
The final step in the migration process is to optimize theThe final step in the migration process is to optimize thecode to take advantage of the new features of the C55:code to take advantage of the new features of the C55:
Dual MACDual MAC Additional registersAdditional registers Additional memory address rangeAdditional memory address range Address unit ALUAddress unit ALU Instruction-level parallelismInstruction-level parallelism Nested repeat blocksNested repeat blocks Etc.Etc.
Copyright © 2003 Texas Instruments. All rights reserved.
ESIEE, Slide 157
For More InformationFor More Information This introduction to C54-to-C55 code migration isThis introduction to C54-to-C55 code migration is
NOTNOT a complete list of all issues a complete list of all issues
Additional details can be found in the following locations:Additional details can be found in the following locations:
SPRU280b - TMS320C55x Assembly Language Tools User’s GuideSPRU280b - TMS320C55x Assembly Language Tools User’s Guide
Chapter 6 - Running C54x Code on C55xChapter 6 - Running C54x Code on C55x
Chapter 7 - Migrating a C54x System to a C55x SystemChapter 7 - Migrating a C54x System to a C55x System
CCS On-line HelpCCS On-line Help
Code Generation Tools Code Generation Tools Running C54x Code on C55x Running C54x Code on C55x