arm cortex m4 architecture
Chapter 3
Cortex-M4 Architecture and ASM Programming

Introduction
In this chapter programming the Cortex-M4 in assembly and C will be introduced. Preference will be given to explaining code development for the STM32F4 Discovery and LPC4088 QuickStart. The basis for the material presented in this chapter is the course notes from the ARM LiB program (note 1).

Overview
* Cortex-M4 Memory Map
  * Cortex-M4 Memory Map
  * Bit-band Operations
  * Cortex-M4 Program Image and Endianness
* ARM Cortex-M4 Processor Instruction Set
  * ARM and Thumb Instruction Set
  * Cortex-M4 Instruction Set

1. LiB Low-level Embedded NXP LPC4088 Quick Start

Cortex-M4 Memory Map
* The Cortex-M4 processor has 4 GB of memory address space
  * Support for bit-band operation (detailed later)
* The 4 GB memory space is architecturally defined as a number of regions
  * Each region is given a recommended usage
  * This makes it easy for software programmers to port code between different devices
* Despite the default memory map, the actual usage of the memory map can be flexibly defined by the user, except for some fixed memory addresses, such as the internal private peripheral bus

ECE 5655/4655 Real-Time DSP

M4 Memory Map (cont.)

Address range | Region | Size | Recommended usage
0x00000000-0x1FFFFFFF | Code | 512MB | Mainly used for program code, e.g. on-chip FLASH
0x20000000-0x3FFFFFFF | SRAM | 512MB | Mainly used for data memory, e.g. on-chip SRAM, SDRAM
0x40000000-0x5FFFFFFF | Peripherals | 512MB | Mainly used for on-chip peripherals, e.g. AHB, APB peripherals
0x60000000-0x9FFFFFFF | External RAM | 1GB | Mainly used for external memories, e.g. external DDR, FLASH, LCD
0xA0000000-0xDFFFFFFF | External device | 1GB | Mainly used for external peripherals, e.g. SD card
0xE0000000-0xE00FFFFF | Private Peripheral Bus (PPB) | - | Private peripherals, e.g. NVIC, SCS
0xE0100000-0xFFFFFFFF | Vendor specific memory | - | Reserved for other purposes

The PPB and the vendor specific memory together occupy the top 512MB (0xE0000000-0xFFFFFFFF).

Contents of the Private Peripheral Bus:
* Internal PPB
  * Instrumentation trace macrocell
  * Data watchpoint and trace unit
  * Fetch patch and breakpoint unit
  * Reserved
  * System Control Space, including Nested Vectored Interrupt Controller (NVIC)
  * Reserved
* External PPB
  * Trace port interface unit
  * Embedded trace macrocell
  * External PPB
  * ROM table

M4 Memory Map (cont.)
* Code Region
  * Primarily used to store program code
  * Can also be used for data memory
  * On-chip memory, such as on-chip FLASH
* SRAM Region
  * Primarily used to store data, such as heaps and stacks
  * Can also be used for program code
  * On-chip memory; despite its name SRAM, the actual device could be SRAM, SDRAM or other types
* Peripheral Region
  * Primarily used for peripherals, such as Advanced High-performance Bus (AHB) or Advanced Peripheral Bus (APB) peripherals
* External RAM Region
  * Primarily used to store large data blocks, or memory caches
  * Off-chip memory, slower than on-chip SRAM region
* External Device Region
  * Primarily used to map to external devices
  * Off-chip devices, such as SD card
* Internal Private Peripheral Bus (PPB)
  * Used inside the processor core for internal control
  * Within PPB, a special range of memory is defined as System Control Space (SCS)
  * The Nested Vectored Interrupt Controller (NVIC) is part of SCS

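The region boundaries of the default memory map can be captured in a few comparisons. The following is a minimal sketch in C, not part of the course notes; the function name m4_region_name is hypothetical, and the address limits are the ones from the memory map figure.

```c
#include <stdint.h>

/* Hypothetical helper (not from the notes): return the name of the
   architecturally defined region that contains the given address. */
const char *m4_region_name(uint32_t addr)
{
    if (addr <= 0x1FFFFFFFu) return "Code";             /* 512MB */
    if (addr <= 0x3FFFFFFFu) return "SRAM";             /* 512MB */
    if (addr <= 0x5FFFFFFFu) return "Peripherals";      /* 512MB */
    if (addr <= 0x9FFFFFFFu) return "External RAM";     /* 1GB   */
    if (addr <= 0xDFFFFFFFu) return "External device";  /* 1GB   */
    if (addr <= 0xE00FFFFFu) return "Private Peripheral Bus";
    return "Vendor specific";
}
```

For instance, m4_region_name(0x20000000) falls in the SRAM region, while an address such as 0xE000E100 lies inside the PPB. A real device only populates part of each region, so this classifies by architecture, not by what memory is actually present.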
Cortex-M4 Memory Map Example
[Figure: chip silicon block diagram. The Cortex-M4 core (PPB, SCS, NVIC, debug control) connects over the AHB bus to on-chip FLASH (Code Region), on-chip SRAM (SRAM Region), the Peripheral Region (Timer, UART, GPIO), the external memory interface (External RAM Region: external SRAM, FLASH), and the external device interface (External Device Region: external LCD, SD card).]

Bit-band Operations
* Bit-band operation allows a single load/store operation to access a single bit in the memory, for example, to change a single bit of one 32-bit data:
  * Normal operation without bit-band (read-modify-write)
    * Read the value of 32-bit data
    * Modify a single bit of the 32-bit value (keep other bits unchanged)
    * Write the value back to the address
  * Bit-band operation
    * Directly write a single bit (0 or 1) to the bit-band alias address of the data
* Bit-band alias address
  * Each bit-band alias address is mapped to a real data address
  * When writing to the bit-band alias address, only a single bit of the data will be changed

Bit-band Operation Example
* For example, in order to set bit [3] in word data in address 0x20000000:
  * Read-Modify-Write operation
    * Read the real data address (0x20000000)
    * Modify the desired bit (retain other bits unchanged)
    * Write the modified data back
  * Bit-band operation
    * Directly set the bit by writing 1 to address 0x2200000C, which is the alias address of the fourth bit of the 32-bit data at 0x20000000
    * In effect, this single instruction is mapped to 2 bus transfers: read data from 0x20000000 to the buffer, and then write to 0x20000000 from the buffer with bit [3] set

    ; Read-Modify-Write Operation
    LDR   R1, =0x20000000   ; Setup address
    LDR   R0, [R1]          ; Read
    ORR.W R0, #0x8          ; Modify bit
    STR   R0, [R1]          ; Write back

    ; Bit-band Operation
    LDR   R1, =0x2200000C   ; Setup address
    MOV   R0, #1            ; Load data
    STR   R0, [R1]          ; Write

Bit-band Alias Address
Each bit of the 32-bit data is one-to-one mapped to the bit-band alias address
* For example, the fourth bit (bit [3]) of the data at 0x20000000 is mapped to the bit-band alias address at 0x2200000C
* Hence, to set bit [3] of the data at 0x20000000, we only need to write 1 to address 0x2200000C
* In Cortex-M4, there are two pre-defined bit-band alias regions: one for the SRAM region, and one for the peripherals region

[Figure: mapping from real 32-bit data addresses to bit-band alias addresses. The words at 0x20000000, 0x20000004 and 0x20000008 start at alias addresses 0x22000000, 0x22000080 and 0x22000100; alias addresses 0x2200000C and 0x22000018 select individual bits of the word at 0x20000000.]

Bit-band Alias Address (cont.)
* SRAM region
  * 32MB memory space (0x22000000-0x23FFFFFF) is used as the bit-band alias region for 1MB data (0x20000000-0x200FFFFF)
* Peripherals region
  * 32MB memory space (0x42000000-0x43FFFFFF) is used as the bit-band alias region for 1MB data (0x40000000-0x400FFFFF)

Address range | Use within the SRAM region
0x20000000-0x200FFFFF | 1MB bit-band region
0x20100000-0x21FFFFFF | 31MB non-bit-band region
0x22000000-0x23FFFFFF | 32MB bit-band alias

Address range | Use within the Peripherals region
0x40000000-0x400FFFFF | 1MB bit-band region
0x40100000-0x41FFFFFF | 31MB non-bit-band region
0x42000000-0x43FFFFFF | 32MB bit-band alias

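The alias addresses quoted above follow from one rule: each bit of the bit-band region occupies one 32-bit word in the alias region, so alias = alias base + (byte offset x 32) + (bit number x 4). The sketch below works this out in C. The function and macro names are illustrative assumptions, not taken from the notes; the base addresses are the ones listed for the two regions.

```c
#include <stdint.h>

#define SRAM_BB_BASE      0x20000000u  /* start of SRAM bit-band region   */
#define SRAM_ALIAS_BASE   0x22000000u  /* start of SRAM bit-band alias    */
#define PERIPH_BB_BASE    0x40000000u  /* start of peripheral bit-band    */
#define PERIPH_ALIAS_BASE 0x42000000u  /* start of peripheral alias       */

/* Hypothetical helper: compute the alias address of one bit.
   addr is the address of the data inside the 1MB bit-band region and
   bit is the bit number within that data (0..31 for a word address). */
uint32_t bitband_alias(uint32_t bb_base, uint32_t alias_base,
                       uint32_t addr, uint32_t bit)
{
    uint32_t byte_offset = addr - bb_base;
    return alias_base + byte_offset * 32u + bit * 4u;
}
```

With these definitions, bit [3] of the word at 0x20000000 gives 0x22000000 + 0 + 12 = 0x2200000C, matching the example in the text. On a target, writing 1 to that address through a volatile uint32_t pointer would set the bit; on a host PC the function is only useful for checking the arithmetic.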
Benefits of Bit-Band Operations
* Faster bit operations
* Fewer instructions
* Atomic operation, avoid hazards
  * For example, if an interrupt is triggered and served during the Read-Modify-Write operations, and the interrupt service routine modifies the same data, a data conflict will occur

[Figure: timeline of the conflict. The main program reads the data at 0x00 and modifies bit [1]; an interrupt occurs; the Interrupt Service Routine reads the data at 0x00, modifies bit [1] and writes the data back; the interrupt returns; the main program then writes its data back. Bit [1] modified by ISR is overwritten by the main program.]

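The hazard can be reproduced on a host PC by calling a stand-in for the interrupt service routine in the middle of a read-modify-write sequence. This is only a simulation under stated assumptions: shared_word models the memory word, isr_set_bit1 is a hypothetical ISR that sets bit 1, and the main program is assumed to set bit 0 of the same word.

```c
#include <stdint.h>

static uint32_t shared_word;   /* models the 32-bit word in memory */

/* Hypothetical ISR: sets bit 1 of the shared word. */
static void isr_set_bit1(void)
{
    shared_word |= (1u << 1);
}

/* Main program sets bit 0 by read-modify-write; the "interrupt" is
   served between the read and the write back. */
uint32_t rmw_with_interrupt(void)
{
    shared_word = 0;
    uint32_t tmp = shared_word;   /* read                              */
    tmp |= (1u << 0);             /* modify                            */
    isr_set_bit1();               /* interrupt served here             */
    shared_word = tmp;            /* write back: ISR update is lost    */
    return shared_word;
}

/* Model of an indivisible update: the ISR can only run before or
   after the bit is set, never in the middle. */
uint32_t indivisible_update(void)
{
    shared_word = 0;
    isr_set_bit1();               /* interrupt served first            */
    shared_word |= (1u << 0);     /* stands in for one bit-band store  */
    return shared_word;
}
```

rmw_with_interrupt() leaves only bit 0 set, so the ISR's change has been overwritten, whereas indivisible_update() keeps both bits. A bit-band write behaves like the second case because the processor performs the read and write as one uninterruptible operation.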
Cortex-M4 Program Image
* The program image in Cortex-M4 contains
  * Vector table: includes the starting addresses of exceptions (vectors) and the value of the main stack pointer (MSP);
  * C start-up routine;
  * Program code: application code and data;
  * C library code: program codes for C library functions

[Figure: the program image sits in the Code region starting at 0x00000000. The vector table comes first, followed by the start-up routine, program code and C library code. Vector table entries, from the lowest address upward: Initial MSP value, Reset vector, NMI vector, Hard fault vector, MemManage fault, Bus fault, Usage fault, Reserved, SVCall, Debug monitor, Reserved, PendSV, SysTick, External Interrupts.]

Cortex-M4 Program Image (cont)
* After Reset, the processor:
  * First reads the initial MSP value;
  * Then reads the reset vector;
  * Branches to the start of the program execution address (reset handler);
  * Subsequently executes program instructions

Reset sequence:
1. Reset
2. Fetch initial value for MSP (read address 0x00000000)
3. Fetch reset vector (read address 0x00000004)
4. Fetch 1st instruction (read address of reset vector)
5. Fetch 2nd instruction (read subsequent instructions)

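The first two fetches of the reset sequence can be modelled as two array reads from the start of the program image. This is a hedged sketch for a host PC, not code from the notes: reset_state_t and simulate_reset are hypothetical names, and the image is assumed to be given as an array of 32-bit words whose first word sits at address 0x00000000. Clearing bit 0 of the vector reflects that exception vectors carry the Thumb state bit in bit 0.

```c
#include <stdint.h>

/* Hypothetical record of what the core holds after the two fetches. */
typedef struct {
    uint32_t msp;   /* main stack pointer            */
    uint32_t pc;    /* address of first instruction  */
} reset_state_t;

/* image[0] models address 0x00000000, image[1] models 0x00000004. */
reset_state_t simulate_reset(const uint32_t *image)
{
    reset_state_t s;
    s.msp = image[0];          /* fetch initial value for MSP          */
    s.pc  = image[1] & ~1u;    /* fetch reset vector, drop Thumb bit   */
    return s;
}
```

Given an image whose first two words are 0x20020000 and 0x00000101, the model reports an initial stack pointer of 0x20020000 and a first instruction fetch from 0x00000100. Both values are made up for illustration.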
Cortex-M4 Endianness
* Endian refers to the order of bytes stored in memory
  * Little endian: lowest byte of a word-size data is stored in bit 0 to bit 7
  * Big endian: lowest byte of a word-size data is stored in bit 24 to bit 31
* Cortex-M4 supports both little endian and big endian
* However, endianness only exists at the hardware level

Byte layout of each 32-bit word (Word 1 at address 0x00000000, Word 2 at 0x00000004, Word 3 at 0x00000008):

Bits | [31:24] | [23:16] | [15:8] | [7:0]
Little endian 32-bit memory | Byte3 | Byte2 | Byte1 | Byte0
Big endian 32-bit memory | Byte0 | Byte1 | Byte2 | Byte3

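The two layouts can be made concrete by storing the same word into a byte array in each order. The sketch below is illustrative only; store_word_le and store_word_be are hypothetical helper names, and mem[0] plays the role of Byte0, the byte at the lowest address.

```c
#include <stdint.h>

/* Little endian: Byte0 (lowest address) holds bits [7:0]. */
void store_word_le(uint8_t mem[4], uint32_t w)
{
    mem[0] = (uint8_t)(w & 0xFFu);
    mem[1] = (uint8_t)((w >> 8) & 0xFFu);
    mem[2] = (uint8_t)((w >> 16) & 0xFFu);
    mem[3] = (uint8_t)((w >> 24) & 0xFFu);
}

/* Big endian: Byte0 (lowest address) holds bits [31:24]. */
void store_word_be(uint8_t mem[4], uint32_t w)
{
    mem[0] = (uint8_t)((w >> 24) & 0xFFu);
    mem[1] = (uint8_t)((w >> 16) & 0xFFu);
    mem[2] = (uint8_t)((w >> 8) & 0xFFu);
    mem[3] = (uint8_t)(w & 0xFFu);
}
```

Storing 0x11223344 places 0x44 at the lowest address in the little endian case and 0x11 in the big endian case. Because the helpers use shifts rather than pointer casts, they give the same result whatever the byte order of the machine running them.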
ARM and Thumb Instruction Set
* Early ARM instruction set
  * 32-bit instruction set, called the ARM instructions
  * Powerful and good performance
  * Larger program memory compared to 8-bit and 16-bit processors
  * Larger power consumption
* Thumb-1 instruction set
  * 16-bit instruction set, first used in ARM7TDMI processor in 1995
  * Provides a subset of the ARM instructions, giving better code density compared to 32-bit RISC architecture
  * Code size is reduced by ~30%, but performance is also reduced by ~20%

ARM and Thumb Instruction Set (cont.)
* Mix of ARM and Thumb-1 Instruction sets
  * Benefit from both 32-bit ARM (high performance) and 16-bit Thumb-1 (high code density)
  * A multiplexer is used to switch between two states: ARM state (32-bit) and Thumb state (16-bit), which requires a switching overhead
* Thumb-2 instruction set
  * Consists of both 32-bit Thumb instructions and original 16-bit Thumb-1 instruction sets
  * Compared to the 32-bit ARM instruction set, code size is reduced by ~26%, while keeping a similar performance
  * Capable of handling all processing requirements in one operation state

[Figure: incoming instructions either pass directly to the ARM instruction decoder or first go through a Thumb-remap-to-ARM stage; a multiplexer controlled by the T bit (0: select ARM, 1: select Thumb) chooses the path before the instructions are executed.]

Cortex-M4 Instruction Set
* Cortex-M4 processor
  * ARMv7-M architecture
  * Supports 32-bit Thumb-2 instructions
  * Possible to handle all processing requirements in one operation state (Thumb state)
  * Compared with traditional ARM processors (which use state switching), advantages include:
    * No state switching overhead: both execution time and instruction space are saved
    * No need to separate ARM code and Thumb code source files, which makes the development and maintenance of software easier
    * Easier to get optimized efficiency and performance

Cortex-M4 Instruction Set (cont.)
* ARM assembly syntax:

    label
        mnemonic operand1, operand2, ... ; Comments

  * Label is used as a reference to an address location;
  * Mnemonic is the name of the instruction;
  * Operand1 is the destination of the operation;
  * Operand2 is normally the source of the operation;
  * Comments are written after ";" and do not affect the program;
  * For example

        MOVS R3, #0x11 ; Set register R3 to 0x11

  * Note that the assembly code can be assembled by either the ARM assembler (armasm) or assembly tools from a variety of vendors (e.g. GNU tool chain). When using the GNU tool chain, the syntax for labels and comments is slightly different

Cortex-M4 Instruction Set Tables

Mnemonic | Operands | Brief description | Flags
ADC, ADCS | {Rd,} Rn, Op2 | Add with Carry | N,Z,C,V
ADD, ADDS | {Rd,} Rn, Op2 | Add | N,Z,C,V
ADD, ADDW | {Rd,} Rn, #imm12 | Add | N,Z,C,V
ADR | Rd, label | Load PC-relative Address | -
AND, ANDS | {Rd,} Rn, Op2 | Logical AND | N,Z,C
ASR, ASRS | Rd, Rm, <Rs|#n> | Arithmetic Shift Right | N,Z,C
B | label | Branch | -
BFC | Rd, #lsb, #width | Bit Field Clear | -
BFI | Rd, Rn, #lsb, #width | Bit Field Insert | -
BIC, BICS | {Rd,} Rn, Op2 | Bit Clear | N,Z,C
BKPT | #imm | Breakpoint | -
BL | label | Branch with Link | -
BLX | Rm | Branch indirect with Link | -
BX | Rm | Branch indirect | -
CBNZ | Rn, label | Compare and Branch if Non Zero | -
CBZ | Rn, label | Compare and Branch if Zero | -
CLREX | - | Clear Exclusive | -
CLZ | Rd, Rm | Count Leading Zeros | -
CMN | Rn, Op2 | Compare Negative | N,Z,C,V
CMP | Rn, Op2 | Compare | N,Z,C,V
CPSID | i | Change Processor State, Disable Interrupts | -
CPSIE | i | Change Processor State, Enable Interrupts | -
DMB | - | Data Memory Barrier | -
DSB | - | Data Synchronization Barrier | -
EOR, EORS | {Rd,} Rn, Op2 | Exclusive OR | N,Z,C
ISB | - | Instruction Synchronization Barrier | -
IT | - | If-Then condition block | -
LDM | Rn{!}, reglist | Load Multiple registers, increment after | -
LDMDB, LDMEA | Rn{!}, reglist | Load Multiple registers, decrement before | -
LDMFD, LDMIA | Rn{!}, reglist | Load Multiple registers, increment after | -
LDR | Rt, [Rn, #offset] | Load Register with word | -
LDRB, LDRBT | Rt, [Rn, #offset] | Load Register with byte | -
LDRD | Rt, Rt2, [Rn, #offset] | Load Register with two words | -
LDREX | Rt, [Rn, #offset] | Load Register Exclusive | -
LDREXB | Rt, [Rn] | Load Register Exclusive with Byte | -
LDREXH | Rt, [Rn] | Load Register Exclusive with Halfword | -
LDRH, LDRHT | Rt, [Rn, #offset] | Load Register with Halfword | -
LDRSB, LDRSBT | Rt, [Rn, #offset] | Load Register with Signed Byte | -
LDRSH, LDRSHT | Rt, [Rn, #offset] | Load Register with Signed Halfword | -
LDRT | Rt, [Rn, #offset] | Load Register with word | -
LSL, LSLS | Rd, Rm, <Rs|#n> | Logical Shift Left | N,Z,C
LSR, LSRS | Rd, Rm, <Rs|#n> | Logical Shift Right | N,Z,C
MLA | Rd, Rn, Rm, Ra | Multiply with Accumulate, 32-bit result | -
MLS | Rd, Rn, Rm, Ra | Multiply and Subtract, 32-bit result | -
MOV, MOVS | Rd, Op2 | Move | N,Z,C
MOVT | Rd, #imm16 | Move Top | -
MOVW, MOV | Rd, #imm16 | Move 16-bit constant | N,Z,C
MRS | Rd, spec_reg | Move from Special Register to general register | -
MSR | spec_reg, Rm | Move from general register to Special Register | N,Z,C,V
MUL, MULS | {Rd,} Rn, Rm | Multiply, 32-bit result | N,Z
MVN, MVNS | Rd, Op2 | Move NOT | N,Z,C
NOP | - | No Operation | -
ORN, ORNS | {Rd,} Rn, Op2 | Logical OR NOT | N,Z,C
ORR, ORRS | {Rd,} Rn, Op2 | Logical OR | N,Z,C
PKHTB, PKHBT | {Rd,} Rn, Rm, Op2 | Pack Halfword | -
POP | reglist | Pop registers from stack | -
PUSH | reglist | Push registers onto stack | -
QADD | {Rd,} Rn, Rm | Saturating Add | Q
QADD16 | {Rd,} Rn, Rm | Saturating Add 16 | -
QADD8 | {Rd,} Rn, Rm | Saturating Add 8 | -
QASX | {Rd,} Rn, Rm | Saturating Add and Subtract with Exchange | -
QDADD | {Rd,} Rn, Rm | Saturating double and Add | Q
QDSUB | {Rd,} Rn, Rm | Saturating double and Subtract | Q
QSAX | {Rd,} Rn, Rm | Saturating Subtract and Add with Exchange | -
QSUB | {Rd,} Rn, Rm | Saturating Subtract | Q
QSUB16 | {Rd,} Rn, Rm | Saturating Subtract 16 | -
QSUB8 | {Rd,} Rn, Rm | Saturating Subtract 8 | -
RBIT | Rd, Rn | Reverse Bits | -
REV | Rd, Rn | Reverse byte order in a word | -
REV16 | Rd, Rn | Reverse byte order in each halfword | -
REVSH | Rd, Rn | Reverse byte order in bottom halfword and sign extend | -
ROR, RORS | Rd, Rm, <Rs|#n> | Rotate Right | N,Z,C
RRX, RRXS | Rd, Rm | Rotate Right with Extend | N,Z,C
RSB, RSBS | {Rd,} Rn, Op2 | Reverse Subtract | N,Z,C,V
SADD16 | {Rd,} Rn, Rm | Signed Add 16 | GE
SADD8 | {Rd,} Rn, Rm | Signed Add 8 | GE
SASX | {Rd,} Rn, Rm | Signed Add and Subtract with Exchange | GE
SBC, SBCS | {Rd,} Rn, Op2 | Subtract with Carry | N,Z,C,V
SBFX | Rd, Rn, #lsb, #width | Signed Bit Field Extract | -
SDIV | {Rd,} Rn, Rm | Signed Divide | -
SEV | - | Send Event | -
SHADD16 | {Rd,} Rn, Rm | Signed Halving Add 16 | -
SHADD8 | {Rd,} Rn, Rm | Signed Halving Add 8 | -
SHASX | {Rd,} Rn, Rm | Signed Halving Add and Subtract with Exchange | -
SHSAX | {Rd,} Rn, Rm | Signed Halving Subtract and Add with Exchange | -
SHSUB16 | {Rd,} Rn, Rm | Signed Halving Subtract 16 | -
SHSUB8 | {Rd,} Rn, Rm | Signed Halving Subtract 8 | -
SMLABB, SMLABT, SMLATB, SMLATT | Rd, Rn, Rm, Ra | Signed Multiply Accumulate (halfwords) | Q
SMLAD, SMLADX | Rd, Rn, Rm, Ra | Signed Multiply Accumulate Dual | Q
SMLAL | RdLo, RdHi, Rn, Rm | Signed Multiply with Accumulate (32 x 32 + 64), 64-bit result | -
SMLALBB, SMLALBT, SMLALTB, SMLALTT | RdLo, RdHi, Rn, Rm | Signed Multiply Accumulate Long, halfwords | -
SMLALD, SMLALDX | RdLo, RdHi, Rn, Rm | Signed Multiply Accumulate Long Dual | -
SMLAWB, SMLAWT | Rd, Rn, Rm, Ra | Signed Multiply Accumulate, word by halfword | Q
SMLSD | Rd, Rn, Rm, Ra | Signed Multiply Subtract Dual | Q
SMLSLD | RdLo, RdHi, Rn, Rm | Signed Multiply Subtract Long Dual | -
SMMLA | Rd, Rn, Rm, Ra | Signed Most significant word Multiply Accumulate | -
SMMLS, SMMLSR | Rd, Rn, Rm, Ra | Signed Most significant word Multiply Subtract | -
SMMUL, SMMULR | {Rd,} Rn, Rm | Signed Most significant word Multiply | -
SMUAD | {Rd,} Rn, Rm | Signed dual Multiply Add | Q
SMULBB, SMULBT, SMULTB, SMULTT | {Rd,} Rn, Rm | Signed Multiply (halfwords) | -
SMULL | RdLo, RdHi, Rn, Rm | Signed Multiply (32 x 32), 64-bit result | -
SMULWB, SMULWT | {Rd,} Rn, Rm | Signed Multiply word by halfword | -
SMUSD, SMUSDX | {Rd,} Rn, Rm | Signed dual Multiply Subtract | -
SSAT | Rd, #n, Rm {,shift #s} | Signed Saturate | Q
SSAT16 | Rd, #n, Rm | Signed Saturate 16 | Q
SSAX | {Rd,} Rn, Rm | Signed Subtract and Add with Exchange | GE
SSUB16 | {Rd,} Rn, Rm | Signed Subtract 16 | -
SSUB8 | {Rd,} Rn, Rm | Signed Subtract 8 | -
STM | Rn{!}, reglist | Store Multiple registers, increment after | -
STMDB, STMEA | Rn{!}, reglist | Store Multiple registers, decrement before | -
STMFD, STMIA | Rn{!}, reglist | Store Multiple registers, increment after | -
STR | Rt, [Rn, #offset] | Store Register word | -
STRB, STRBT | Rt, [Rn, #offset] | Store Register byte | -
STRD | Rt, Rt2, [Rn, #offset] | Store Register two words | -
STREX | Rd, Rt, [Rn, #offset] | Store Register Exclusive | -
STREXB | Rd, Rt, [Rn] | Store Register Exclusive Byte | -
STREXH | Rd, Rt, [Rn] | Store Register Exclusive Halfword | -
STRH, STRHT | Rt, [Rn, #offset] | Store Register Halfword | -
STRT | Rt, [Rn, #offset] | Store Register word | -
SUB, SUBS | {Rd,} Rn, Op2 | Subtract | N,Z,C,V
SUB, SUBW | {Rd,} Rn, #imm12 | Subtract | N,Z,C,V
SVC | #imm | Supervisor Call | -
SXTAB | {Rd,} Rn, Rm,{,ROR #} | Extend 8 bits to 32 and add | -
SXTAB16 | {Rd,} Rn, Rm,{,ROR #} | Dual extend 8 bits to 16 and add | -
SXTAH | {Rd,} Rn, Rm,{,ROR #} | Extend 16 bits to 32 and add | -
SXTB16 | {Rd,} Rm {,ROR #n} | Signed Extend Byte 16 | -
SXTB | {Rd,} Rm {,ROR #n} | Sign extend a byte | -
SXTH | {Rd,} Rm {,ROR #n} | Sign extend a halfword | -
TBB | [Rn, Rm] | Table Branch Byte | -
TBH | [Rn, Rm, LSL #1] | Table Branch Halfword | -
TEQ | Rn, Op2 | Test Equivalence | N,Z,C
TST | Rn, Op2 | Test | N,Z,C
UADD16 | {Rd,} Rn, Rm | Unsigned Add 16 | GE
UADD8 | {Rd,} Rn, Rm | Unsigned Add 8 | GE
USAX | {Rd,} Rn, Rm | Unsigned Subtract and Add with Exchange | GE
UHADD16 | {Rd,} Rn, Rm | Unsigned Halving Add 16 | -
UHADD8 | {Rd,} Rn, Rm | Unsigned Halving Add 8 | -
UHASX | {Rd,} Rn, Rm | Unsigned Halving Add and Subtract with Exchange | -
UHSAX | {Rd,} Rn, Rm | Unsigned Halving Subtract and Add with Exchange | -
UHSUB16 | {Rd,} Rn, Rm | Unsigned Halving Subtract 16 | -
UHSUB8 | {Rd,} Rn, Rm | Unsigned Halving Subtract 8 | -
UBFX | Rd, Rn, #lsb, #width | Unsigned Bit Field Extract | -
UDIV | {Rd,} Rn, Rm | Unsigned Divide | -
UMAAL | RdLo, RdHi, Rn, Rm | Unsigned Multiply Accumulate Accumulate Long (32 x 32 + 32 + 32), 64-bit result | -
UMLAL | RdLo, RdHi, Rn, Rm | Unsigned Multiply with Accumulate (32 x 32 + 64), 64-bit result | -
UMULL | RdLo, RdHi, Rn, Rm | Unsigned Multiply (32 x 32), 64-bit result | -
UQADD16 | {Rd,} Rn, Rm | Unsigned Saturating Add 16 | -
UQADD8 | {Rd,} Rn, Rm | Unsigned Saturating Add 8 | -
UQASX | {Rd,} Rn, Rm | Unsigned Saturating Add and Subtract with Exchange | -
UQSAX | {Rd,} Rn, Rm | Unsigned Saturating Subtract and Add with Exchange | -
UQSUB16 | {Rd,} Rn, Rm | Unsigned Saturating Subtract 16 | -
UQSUB8 | {Rd,} Rn, Rm | Unsigned Saturating Subtract 8 | -
USAD8 | {Rd,} Rn, Rm | Unsigned Sum of Absolute Differences | -
USADA8 | {Rd,} Rn, Rm, Ra | Unsigned Sum of Absolute Differences and Accumulate | -
USAT | Rd, #n, Rm {,shift #s} | Unsigned Saturate | Q
USAT16 | Rd, #n, Rm | Unsigned Saturate 16 | Q
UASX | {Rd,} Rn, Rm | Unsigned Add and Subtract with Exchange | GE
USUB16 | {Rd,} Rn, Rm | Unsigned Subtract 16 | GE
USUB8 | {Rd,} Rn, Rm | Unsigned Subtract 8 | GE
UXTAB | {Rd,} Rn, Rm,{,ROR #} | Rotate, extend 8 bits to 32 and Add | -
UXTAB16 | {Rd,} Rn, Rm,{,ROR #} | Rotate, dual extend 8 bits to 16 and Add | -
UXTAH | {Rd,} Rn, Rm,{,ROR #} | Rotate, unsigned extend and Add Halfword | -
UXTB | {Rd,} Rm {,ROR #n} | Zero extend a Byte | -
UXTB16 | {Rd,} Rm {,ROR #n} | Unsigned Extend Byte 16 | -
UXTH | {Rd,} Rm {,ROR #n} | Zero extend a Halfword | -
VABS.F32 | Sd, Sm | Floating-point Absolute | -
VADD.F32 | {Sd,} Sn, Sm | Floating-point Add | -
VCMP.F32 | Sd, <Sm|#0.0> | Compare two floating-point registers, or one floating-point register and zero | FPSCR
VCMPE.F32 | Sd, <Sm|#0.0> | Compare two floating-point registers, or one floating-point register and zero with Invalid Operation check | FPSCR
VCVT.S32.F32 | Sd, Sm | Convert between floating-point and integer | -
VCVT.S16.F32 | Sd, Sd, #fbits | Convert between floating-point and fixed point | -
VCVTR.S32.F32 | Sd, Sm | Convert between floating-point and integer with rounding | -
VCVTB.F32.F16, VCVTT.F32.F16 | Sd, Sm | Converts half-precision value to single-precision | -
VCVTB.F16.F32, VCVTT.F16.F32 | Sd, Sm | Converts single-precision register to half-precision | -
VDIV.F32 | {Sd,} Sn, Sm | Floating-point Divide | -
VFMA.F32 | {Sd,} Sn, Sm | Floating-point Fused Multiply Accumulate | -
VFNMA.F32 | {Sd,} Sn, Sm | Floating-point Fused Negate Multiply Accumulate | -
VFMS.F32 | {Sd,} Sn, Sm | Floating-point Fused Multiply Subtract | -
VFNMS.F32 | {Sd,} Sn, Sm | Floating-point Fused Negate Multiply Subtract | -
VLDM.F<32|64> | Rn{!}, list | Load Multiple extension registers | -
VLDR.F<32|64> | <Dd|Sd>, [Rn] | Load an extension register from memory | -
VMLA.F32 | {Sd,} Sn, Sm | Floating-point Multiply Accumulate | -
VMLS.F32 | {Sd,} Sn, Sm | Floating-point Multiply Subtract | -
VMOV.F32 | Sd, #imm | Floating-point Move immediate | -
VMOV | Sd, Sm | Floating-point Move register | -
VMOV | Sn, Rt | Copy ARM core register to single precision | -
VMOV | Sm, Sm1, Rt, Rt2 | Copy 2 ARM core registers to 2 single precision | -
VMOV | Dd[x], Rt | Copy ARM core register to scalar | -
VMOV | Rt, Dn[x] | Copy scalar to ARM core register | -
VMRS | Rt, FPSCR | Move FPSCR to ARM core register or APSR | N,Z,C,V
VMSR | FPSCR, Rt | Move to FPSCR from ARM Core register | FPSCR
VMUL.F32 | {Sd,} Sn, Sm | Floating-point Multiply | -
VNEG.F32 | Sd, Sm | Floating-point Negate | -
VNMLA.F32 | Sd, Sn, Sm | Floating-point Multiply and Add | -
VNMLS.F32 | Sd, Sn, Sm | Floating-point Multiply and Subtract | -
VNMUL | {Sd,} Sn, Sm | Floating-point Multiply | -
VPOP | list | Pop extension registers | -
VPUSH | list | Push extension registers | -
VSQRT.F32 | Sd, Sm | Calculates floating-point Square Root | -
VSTM | Rn{!}, list | Floating-point register Store Multiple | -
VSTR.F<32|64> | Sd, [Rn] | Stores an extension register to memory | -
VSUB.F<32|64> | {Sd,} Sn, Sm | Floating-point Subtract | -
WFE | - | Wait For Event | -
WFI | - | Wait For Interrupt | -

Note: full explanation of each instruction can be found in Cortex-M4 Devices Generic User Guide (Ref-4)

Cortex-M4 Suffix
* Some instructions can be followed by suffixes to update processor flags or execute the instruction on a certain condition

Suffix | Description | Example | Example explanation
S | Update APSR (flags) | ADDS R1, #0x21 | Add 0x21 to R1 and update APSR
EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE | Condition execution, e.g. EQ = equal, NE = not equal, LT = less than | BNE label | Branch to the label if not equal

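What the S suffix does to the N, Z, C and V flags on a 32-bit addition can be sketched in C. This is a host-side model written for illustration, not code from the notes; apsr_flags_t and adds_model are hypothetical names.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical container for the four APSR condition flags. */
typedef struct {
    bool n;   /* Negative: bit 31 of the result        */
    bool z;   /* Zero: result is 0                     */
    bool c;   /* Carry: unsigned overflow out of bit 31 */
    bool v;   /* oVerflow: signed overflow             */
} apsr_flags_t;

/* Model of ADDS: returns a + b and fills in the flags. */
uint32_t adds_model(uint32_t a, uint32_t b, apsr_flags_t *f)
{
    uint32_t r = a + b;                         /* wraps modulo 2^32 */
    f->n = ((r >> 31) & 1u) != 0;
    f->z = (r == 0);
    f->c = (r < a);                             /* wrapped => carry  */
    /* Signed overflow: operands share a sign that the result lacks. */
    f->v = (((~(a ^ b) & (a ^ r)) >> 31) & 1u) != 0;
    return r;
}
```

Adding 1 to 0xFFFFFFFF yields 0 with Z and C set, while adding 1 to 0x7FFFFFFF yields 0x80000000 with N and V set. A conditional suffix such as NE then simply tests these flags, here Z == 0.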
C Calling Assembly
For real-time DSP applications the most common scenario involving assembly code writing, if needed at all, will be C calling assembly. In simple terms the rules are:
* Formally, the ARM Architecture Procedure Call Standard (AAPCS) defines:
  * Which registers must be saved and restored
  * How to call procedures
  * How to return from procedures

AAPCS Register Use Conventions
* Make it easier to create modular, isolated and integrated code
* Scratch registers are not expected to be preserved upon returning from a called subroutine
  * This applies to r0-r3
* Preserved ("variable") registers are expected to have their original values upon returning from a called subroutine
  * This applies to r4-r8, r10-r11
  * Use PUSH {r4,..} and POP {r4,...}

AAPCS Core Register Use

Register | Synonym | Special | Role in the procedure call standard
r15 | - | PC | The Program Counter.
r14 | - | LR | The Link Register.
r13 | - | SP | The Stack Pointer.
r12 | - | IP | The Intra-Procedure-call scratch register.
r11 | v8 | - | Variable-register 8.
r10 | v7 | - | Variable-register 7.
r9 | - | v6, SB, TR | Platform register. The meaning of this register is defined by the platform standard.
r8 | v5 | - | Variable-register 5.
r7 | v4 | - | Variable register 4.
r6 | v3 | - | Variable register 3.
r5 | v2 | - | Variable register 2.
r4 | v1 | - | Variable register 1.
r3 | a4 | - | Argument / scratch register 4.
r2 | a3 | - | Argument / scratch register 3.
r1 | a2 | - | Argument / result / scratch register 2.
r0 | a1 | - | Argument / result / scratch register 1.

* r10-r11 and r4-r8: must be saved and restored by the callee-procedure if it may modify them. The calling subroutine expects these to retain their value.
* r0-r3: don't need to be saved. May be used for arguments, results, or temporary values.

Example: Vector Norm Squared
In this example we will be computing the squared length of a vector using 16-bit (int16_t) signed numbers. In mathematical terms we are finding

$$\|\mathbf{A}\|^2 = \sum_{n=1}^{N} A_n^2 \qquad (3.1)$$

where

$$\mathbf{A} = \begin{bmatrix} A_1 & \cdots & A_N \end{bmatrix} \qquad (3.2)$$

is an $N$-dimensional vector (column or row vector).
* The solution will be obtained in two different ways:
  * Conventional C programming
  * Cortex-M assembly
* Optimization is not a concern at this point
* The focus here is to see, by way of a simple example, how to call a C routine from C (obvious), and how to call an assembly routine from C

C Version
* We implement this simple routine in C using a declared vector length N and vector contents in the array v
* The C source, which includes the called function norm_sq_c, is given below:

    /******************************************************
    Vector norm-squared routine in C
    ******************************************************/
    int main(void)
    {
      int16_t x = 0;
      int16_t v[5] = {1,2,3,6,7};
      ...
      x = norm_sq_c(v, 5); // call C function
      sprintf(my_debug, "Norm: The answer is %d\n", x);
      TM_USART_Puts(USART6, my_debug);
      ...
    }

    int16_t norm_sq_c(int16_t* v, int16_t n)
    {
      int16_t i;
      int16_t out = 0;
      for(i=0; i

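The listing of norm_sq_c breaks off inside the for loop. One plausible completion, written here as an assumption and not recovered from the original notes, accumulates the square of each element as equation (3.1) prescribes:

```c
#include <stdint.h>

/* Assumed completion of norm_sq_c: sum of squares of n int16_t values.
   Only the signature and the first two declarations come from the
   notes; the loop body and return are a guess at the missing part. */
int16_t norm_sq_c(int16_t *v, int16_t n)
{
    int16_t i;
    int16_t out = 0;
    for (i = 0; i < n; i++) {
        /* The product is computed as int, then narrowed to 16 bits. */
        out = (int16_t)(out + v[i] * v[i]);
    }
    return out;
}
```

For the vector {1, 2, 3, 6, 7} this gives 1 + 4 + 9 + 36 + 49 = 99. Because the result is an int16_t, a sum above 32767 does not fit, which is acceptable for the small test vector but worth keeping in mind for real data.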
* The expected answer is $1 + 4 + 9 + 36 + 49 = 99$

[Figure: terminal output from CoolTerm]

* The cycle count comparison follows the assembly version

Assembly Version
* The parent C routine is the following:

    /******************************************************
    Vector norm-squared routine in assembly
    ******************************************************/
    extern int16_t norm_sq_asm(int16_t *x, int16_t n);

    int main()
    {
      int16_t x = 0;
      int16_t v[5] = {1,2,3,6,7};
      ...
      zInt_sq = simple_sqrt(zInt);
      sprintf(my_debug, "uint16_t SQRT of %d is %d\n", zInt, zInt_sq);
      TM_USART_Puts(USART6, my_debug);
      ...
    }

    ; File demo_asm.s
            PRESERVE8                        ; Preserve 8 byte stack alignment
            THUMB                            ; indicate THUMB code is used
            AREA    |.text|, CODE, READONLY  ; Start of the CODE area
            EXPORT  norm_sq_asm

    norm_sq_asm FUNCTION
            ; Input array address: R0
            ; Number of elements: R1
            MOVS    R2, R0          ; move the address in R0 to R2
            MOVS    R0, #0          ; initialize the result
    sum_loop
            LDRSH   R3, [R2], #0x2  ; load int16_t value pointed to
                                    ; by R2 into R3, then increment
            MLA     R0, R3, R3, R0  ; sq & accum in one step (faster)
            SUBS    R1, R1, #1      ; R1 = R1 - 1, decrement the count
            CMP     R1, #0          ; compare to 0 and set Z flag
            BNE     sum_loop        ; branch if compare not zero
            BX      LR              ; return R0
            ENDFUNC
            END                     ; End of file

* From just the C source it is not obvious that the function prototype for norm_sq_asm is actually an assembly routine
* The answer is again 99

[Figure: terminal output from CoolTerm]

Performance Comparison
* In the Keil IDE debugger we set break points around the function to be timed:

* Then make note of the States and Sec in the registers window:

[Figure: Keil register-window captures for norm_sq_c with O0, norm_sq_asm, and norm_sq_c with O3. Values shown: cycles 86, time 0.51us; cycles 49, time 0.29us; cycles 86, time 0.47us. One result is marked BEST!]

Example: Unsigned Integer Square Root (note 1)

    uint32_t zInt = 64;
    uint32_t zInt_sq;
    ...
    arm_status status;
    // uint16_t Square root experiment
    zInt_sq = simple_sqrt(zInt);
    sprintf(my_debug, "uint16_t SQRT of %d is %d\n", zInt, zInt_sq);
    TM_USART_Puts(USART6, my_debug);

    ; in demo_asm.s
            PRESERVE8                        ; Preserve 8 byte stack alignment
            THUMB                            ; indicate THUMB code is used
            AREA    |.text|, CODE, READONLY  ; Start of the CODE area
            EXPORT  simple_sqrt

    simple_sqrt FUNCTION
            ; Input  : R0
            ; Output : R0 (square root result)
            MOVW    R1, #0x8000     ; R1 = 0x00008000
            MOVS    R2, #0          ; Initialize result
    simple_sqrt_loop
            ADDS    R2, R2, R1      ; M = (M + N)
            MUL     R3, R2, R2      ; R3 = M^2
            CMP     R3, R0          ; If M^2 > Input
            IT      HI              ; Greater Than
            SUBHI   R2, R2, R1      ; M = (M - N)
            LSRS    R1, R1, #1      ; N = N >> 1
            BNE     simple_sqrt_loop
            MOV     R0, R2          ; Copy to R0 and return
            BX      LR              ; Return
            ENDFUNC

1. Yiu Chapter 20, p. 664.

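The assembly loop tries each bit of the result from bit 15 down to bit 0, keeping a bit only when the square of the trial value does not exceed the input. A line-for-line C rendering, offered as a sketch for checking the algorithm on a host PC and using the hypothetical name simple_sqrt_c, looks like this:

```c
#include <stdint.h>

/* C mirror of the simple_sqrt assembly routine (name is an assumption).
   m plays the role of R2 (result M), n the role of R1 (trial bit N). */
uint32_t simple_sqrt_c(uint32_t x)
{
    uint32_t n = 0x8000u;       /* MOVW R1, #0x8000        */
    uint32_t m = 0;             /* MOVS R2, #0             */
    do {
        m += n;                 /* ADDS R2, R2, R1         */
        if (m * m > x) {        /* MUL, CMP, IT HI         */
            m -= n;             /* SUBHI R2, R2, R1        */
        }
        n >>= 1;                /* LSRS R1, R1, #1         */
    } while (n != 0);           /* BNE simple_sqrt_loop    */
    return m;                   /* MOV R0, R2 ; BX LR      */
}
```

The trial value never exceeds 0xFFFF, so its square always fits in 32 bits and the unsigned comparison matches the HI condition used in the assembly. The result is the largest integer whose square is not greater than the input.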
Function Flow Chart
[Figure: flow chart of the simple_sqrt function]

Sample Results
* For an input of 64 the output is 8, as expected
* For an input of 99 the output is 9 (81 is closest to 99), as expected

[Figure: terminal output from CoolTerm and debugger timing. Values shown: cycles 125, time 0.74us]

Useful Resources
* Architecture Reference Manual:
  http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0403c/index.html
* Cortex-M4 Technical Reference Manual:
  http://infocenter.arm.com/help/topic/com.arm.doc.ddi0439d/DDI0439D_cortex_m4_processor_r0p1_trm.pdf
* Cortex-M4 Devices Generic User Guide:
  http://infocenter.arm.com/help/topic/com.arm.doc.dui0553a/DUI0553A_cortex_m4_dgug.pdf