arm

11

ARMARM

ARM ARCHITECTUREARM ARCHITECTURE

ARM INHERITANCEARM INHERITANCE

ARM PROGRAMMERS MODELARM PROGRAMMERS MODEL

22

What Is ARM?What Is ARM?

Advanced RISC MachineAdvanced RISC Machine

First RISC microprocessor First RISC microprocessor

for commercial usefor commercial use

Market-leader for low-powerMarket-leader for low-power

and cost-sensitive embedded applicationsand cost-sensitive embedded applications

33

ARM Powered ProductsARM Powered Products

44

FeaturesFeatures

Architectural simplicityArchitectural simplicity

which allowswhich allows

Very small implementationsVery small implementations

which result inwhich result in

Very low power consumptionVery low power consumption

55

The History of ARMThe History of ARM

Developed at Acorn Computers Limited,

of Cambridge, England,

between 1983 and 1985

Problems with CISC:

Slower then memory parts

Clock cycles per instruction

66

The History of ARM (2)The History of ARM (2)

Solution – the Berkeley RISC I:

Competitive

Easy to develop (less than a year)

Cheap

Pointing the way to the future

77

ARM ArchitectureARM Architecture

Typical RISC architecture:Typical RISC architecture:

Large uniform register fileLarge uniform register file

Load/store architectureLoad/store architecture

Simple addressing modesSimple addressing modes

Uniform and fixed-length instruction fieldsUniform and fixed-length instruction fields

88

ARM Architecture (2)ARM Architecture (2)

Enhancements:Enhancements:

Each instruction controls the ALU and shifterEach instruction controls the ALU and shifter

Auto-incrementAuto-increment

and auto-decrement addressing modesand auto-decrement addressing modes

Multiple Load/StoreMultiple Load/Store

Conditional executionConditional execution

99

ARM Architecture (3)ARM Architecture (3)

Results:Results:

High performanceHigh performance

Low code sizeLow code size

Low power consumptionLow power consumption

Low silicon areaLow silicon area

1010

ARCHITECTURAL INHERITANCEARCHITECTURAL INHERITANCE

FEATURES USEDFEATURES USED A LOAD STORE ARCHITECTUREA LOAD STORE ARCHITECTURE FIXED LENGTH 32 BIT INSTRUCTIONSFIXED LENGTH 32 BIT INSTRUCTIONS 3 ADDRESS INSTRUCTION FORMATS3 ADDRESS INSTRUCTION FORMATSFEATURES REJECTEDFEATURES REJECTED REGISTER WINDOWSREGISTER WINDOWS DELAYED BRANCHESDELAYED BRANCHES SINGLE CYCLE EXECUTION OF ALL SINGLE CYCLE EXECUTION OF ALL

INSTRUCTIONSINSTRUCTIONS

1111

Pipeline OrganizationPipeline Organization

Increases speed – Increases speed –

most instructions executed in single cyclemost instructions executed in single cycle

Versions:Versions: 3-stage (ARM7TDMI and earlier)3-stage (ARM7TDMI and earlier)

5-stage (ARMS, ARM9TDMI)5-stage (ARMS, ARM9TDMI)

6-stage (ARM10TDMI)6-stage (ARM10TDMI)

1212

Pipeline Organization (2)Pipeline Organization (2)

3-stage pipeline: Fetch – Decode - Execute3-stage pipeline: Fetch – Decode - Execute

Three-cycle latency, Three-cycle latency,

one instruction per cycle throughputone instruction per cycle throughput

cycle

Fetch Decode Execute



instruction

t t+1 t+2 t+3 t+4

i

i+1

i+2

1313

Write-backWrite-back

Buffer/dataBuffer/data

ExecuteExecute

DecodeDecode


5-stage pipeline:5-stage pipeline: Reduces work per cycle => Reduces work per cycle =>

allows higher clock frequencyallows higher clock frequency

Separates data and Separates data and instruction memory => instruction memory => reduction of CPI reduction of CPI

(average number (average number of clock Cycles Per Instruction)of clock Cycles Per Instruction)

Stages:Stages:

FetchFetch

1414


Pipeline flushed and refilled on branch,Pipeline flushed and refilled on branch,

causing execution to slow downcausing execution to slow down

Special features in instruction setSpecial features in instruction set

eliminate small jumps in codeeliminate small jumps in code

to obtain the best flow through pipelineto obtain the best flow through pipeline

1515

Operating ModesOperating Modes

Seven operating modes:Seven operating modes: UserUser

Privileged:Privileged:

System (version 4 and above)System (version 4 and above)

FIQFIQ

IRQIRQ

AbortAbort

Undefined Undefined

SupervisorSupervisor

exception modesexception modes

1616

Operating Modes (2)Operating Modes (2)

User mode:User mode:

Normal program Normal program execution modeexecution mode

System resources System resources unavailableunavailable

Mode changed Mode changed by exception onlyby exception only

Exception modes:Exception modes:

Entered Entered upon exceptionupon exception

Full accessFull accessto system resourcesto system resources

Mode changed freelyMode changed freely

1717

ExceptionsExceptions

Table 1 - Exception types, sorted by Interrupt Vector addressesTable 1 - Exception types, sorted by Interrupt Vector addresses

ExceptionException ModeMode PriorityPriority IV AddressIV Address

ResetReset SupervisorSupervisor 11 0x000000000x00000000

Undefined instructionUndefined instruction UndefinedUndefined 66 0x000000040x00000004

Software interruptSoftware interrupt SupervisorSupervisor 66 0x000000080x00000008

Prefetch AbortPrefetch Abort AbortAbort 55 0x0000000C0x0000000C

Data AbortData Abort AbortAbort 22 0x000000100x00000010

InterruptInterrupt IRQIRQ 44 0x000000180x00000018

Fast interruptFast interrupt FIQFIQ 33 0x0000001C0x0000001C

1818

ARM RegistersARM Registers

31 general-purpose 32-bit registers31 general-purpose 32-bit registers

16 visible, R0 – R1516 visible, R0 – R15

Others speed up the exception processOthers speed up the exception process

1919

ARM Registers (2)ARM Registers (2)

Special roles:Special roles: Hardware Hardware

R14 – Link Register (LR): R14 – Link Register (LR): optionally holds return address optionally holds return address for branch instructionsfor branch instructions

R15 – Program Counter (PC)R15 – Program Counter (PC)

SoftwareSoftware

R13 - Stack Pointer (SP)R13 - Stack Pointer (SP)

2020


Current Program Status Register (CPSR)Current Program Status Register (CPSR)

Saved Program Status Register (SPSR)Saved Program Status Register (SPSR)

On exception, entering On exception, entering modmod mode: mode: (PC + 4) (PC + 4) LR LR

CPSR CPSR SPSR_modSPSR_mod

PC PC IV address IV address

R13, R14 replaced by R13_mod, R14_modR13, R14 replaced by R13_mod, R14_mod

In case of FIQ mode R7 – R12 also replacedIn case of FIQ mode R7 – R12 also replaced

2121


R0R1R2R3R4R5R6R7R8R9R10R11R12R13R14

R15 (PC)

CPSR

System & UserR0R1R2R3R4R5R6

R7_fiqR8_fiqR9_fiqR10_fiqR11_fiqR12_fiqR13_fiqR14_fiq

R15 (PC)

CPSRSPSR_fiq

FIQR0R1R2R3R4R5R6R7R8R9R10R11R12

R13_irqR14_irq

R15 (PC)

CPSRSPSR_irq

IRQR0R1R2R3R4R5R6R7R8R9R10R11R12

R13_svcR14_svcR15 (PC)

CPSRSPSR_svc

SupervisorR0R1R2R3R4R5R6R7R8R9R10R11R12

R13_abtR14_abtR15 (PC)

CPSRSPSR_abt

AbortR0R1R2R3R4R5R6R7R8R9R10R11R12

R13_undR14_undR15 (PC)

CPSRSPSR_und

Undefined

2222

Instruction SetInstruction Set

Two instruction sets:Two instruction sets: ARMARM

Standard 32-bit instruction setStandard 32-bit instruction set

THUMBTHUMB

16-bit compressed form16-bit compressed form

Code density better than most CISCCode density better than most CISC

Dynamic decompression in pipelineDynamic decompression in pipeline

2323

ARM Instruction SetARM Instruction Set

Features:Features: Load/Store architectureLoad/Store architecture

3-address data processing instructions3-address data processing instructions

Conditional executionConditional execution

Load/Store multiple registersLoad/Store multiple registers

Shift & ALU operation in single clock cycleShift & ALU operation in single clock cycle

2424

ARM Instruction Set (2)ARM Instruction Set (2)

Conditional execution:Conditional execution: Each data processing instruction Each data processing instruction

prefixed by condition codeprefixed by condition code

Result – smooth flow of instructions through pipelineResult – smooth flow of instructions through pipeline

16 condition codes:16 condition codes:

EQEQ equalequal MIMI negativenegative HIHI unsigned unsigned higherhigher GTGT signed greater signed greater

thanthan

NENE not equalnot equal PLPL positive or positive or zerozero LSLS unsigned lower unsigned lower

or sameor same LELE signed less signed less than or equalthan or equal

CSCSunsigned unsigned higher or higher or samesame

VSVS overflowoverflow GEGE signed greater signed greater than or equalthan or equal ALAL alwaysalways

CCCC unsigned unsigned lowerlower VCVC no overflowno overflow LTLT signed less signed less

thanthan NVNV special special purposepurpose

2525

ARM Instruction Set (3)ARM Instruction Set (3)

ARM instruction set

Data processing instructions

Data transfer instructions

Software interruptinstructions

Block transferinstructions

Multiply instructions

Branching instructions

2626

Data Processing Instructions Data Processing Instructions

Arithmetic and logical operationsArithmetic and logical operations

3-address format:3-address format: Two 32-bit operands Two 32-bit operands

(op1 is register, op2 is register or immediate)(op1 is register, op2 is register or immediate)

32-bit result placed in a register32-bit result placed in a register

Barrel shifter for op2 allows full 32-bit shiftBarrel shifter for op2 allows full 32-bit shiftwithin instruction cyclewithin instruction cycle

2727

Data Processing Instructions (2) Data Processing Instructions (2)

Arithmetic operations:Arithmetic operations: ADD, ADDC, SUB, SUBC, RSB, RSCADD, ADDC, SUB, SUBC, RSB, RSC

Bit-wise logical operations:Bit-wise logical operations: AND, EOR, ORR, BICAND, EOR, ORR, BIC

Register movement operations:Register movement operations: MOV, MVN MOV, MVN

Comparison operations:Comparison operations: TST, TEQ, CMP, CMNTST, TEQ, CMP, CMN

2828

Data Processing Instructions (3)Data Processing Instructions (3)

Conditional codesConditional codes

++Data processing instructionsData processing instructions

++Barrel shifterBarrel shifter

==Powerful tools for efficient coded programsPowerful tools for efficient coded programs

2929

Data Processing Instructions (4)Data Processing Instructions (4)

e.g.:e.g.:

if (z==1) R1=R2+(R3*4) if (z==1) R1=R2+(R3*4)

compiles tocompiles to

EQADDS R1,R2,R3, LSL #2EQADDS R1,R2,R3, LSL #2

( SINGLE INSTRUCTION ! )( SINGLE INSTRUCTION ! )

3030

Data Transfer InstructionsData Transfer Instructions

Load/store instructionsLoad/store instructions

Used to move signed and unsigned Used to move signed and unsigned Word, Half Word and Byte to and from registersWord, Half Word and Byte to and from registers

Can be used to load PC Can be used to load PC (if target address is beyond branch instruction range)(if target address is beyond branch instruction range)

LDRLDR Load WordLoad Word STRSTR Store WordStore Word

LDRHLDRH Load Half WordLoad Half Word STRHSTRH Store Half WordStore Half Word

LDRSHLDRSH Load Signed Half WordLoad Signed Half Word STRSHSTRSH Store Signed Half WordStore Signed Half Word

LDRBLDRB Load ByteLoad Byte STRBSTRB Store ByteStore Byte

LDRSBLDRSB Load Signed ByteLoad Signed Byte STRSBSTRSB Store Signed ByteStore Signed Byte

3131

Block Transfer InstructionsBlock Transfer Instructions

Load/Store Multiple instructions Load/Store Multiple instructions ((LDMLDM//STMSTM) )

Whole register bank Whole register bank or a or a subset subset copied to memory or restored copied to memory or restored with single instructionwith single instruction

R0

R1

R2

R14

R15

Mi

Mi+1

Mi+2

Mi+14

Mi+15

LDM

STM

3232

Swap InstructionSwap Instruction

Exchanges a word Exchanges a word between registers between registers

Two cyclesTwo cycles

butbut

single atomic actionsingle atomic action

Support for RT Support for RT semaphoressemaphores

R0

R1

R2

R7

R8

R15

3333

Modifying the Status RegistersModifying the Status Registers

Only indirectlyOnly indirectly

MSRMSR moves contents moves contents from CPSR/SPSR from CPSR/SPSR to selected GPRto selected GPR

MRSMRS moves contents moves contents from selected GPR from selected GPR to CPSR/SPSRto CPSR/SPSR

Only in privileged Only in privileged modesmodes

R0

R1

R7

R8

R14

R15

CPSRSPSR

MSR

MRS

3434

Multiply InstructionsMultiply Instructions

Integer multiplication (32-bit result)Integer multiplication (32-bit result)

Long integer multiplication (64-bit result) Long integer multiplication (64-bit result)

Built in Multiply Accumulate Unit (MAC)Built in Multiply Accumulate Unit (MAC)

Multiply and accumulate instructions add product Multiply and accumulate instructions add product to running totalto running total

3535

Multiply InstructionsMultiply Instructions

Instructions:Instructions:

MULMUL MultiplyMultiply 32-bit result32-bit result

MULAMULA Multiply accumulateMultiply accumulate 32-bit result32-bit result

UMULLUMULL Unsigned multiplyUnsigned multiply 64-bit result64-bit result

UMLALUMLAL Unsigned multiply accumulateUnsigned multiply accumulate 64-bit result64-bit result

SMULLSMULL Signed multiplySigned multiply 64-bit result64-bit result

SMLALSMLAL Signed multiply accumulateSigned multiply accumulate 64-bit result64-bit result

3636

Software InterruptSoftware Interrupt

SWISWI instruction instruction Forces CPU into supervisor modeForces CPU into supervisor mode Usage: SWI #nUsage: SWI #n

Maximum 2Maximum 22424 calls calls Suitable for running privileged code and Suitable for running privileged code and making OS callsmaking OS calls

Cond Opcode Ordinal

31 28 27 24 23 0

3737

Branching InstructionsBranching Instructions

BranchBranch (B): (B): jumps forwards/backwards jumps forwards/backwards

up to 32 MBup to 32 MB

Branch linkBranch link (BL): (BL): same same + saves (PC+4) in LR+ saves (PC+4) in LR

Suitable for function call/returnSuitable for function call/return

Condition codes for conditional branchesCondition codes for conditional branches

3838

Branching Instructions (2)Branching Instructions (2)

Branch exchange (BX) Branch exchange (BX) and and Branch link exchange (BLX)Branch link exchange (BLX): :

same as same as B/BL B/BL ++ exchange exchange instruction setinstruction set (ARM (ARM THUMB) THUMB)

Only way to swap setsOnly way to swap sets

3939

Thumb Instruction SetThumb Instruction Set

Compressed form of ARMCompressed form of ARM Instructions stored as 16-bit,Instructions stored as 16-bit, Decompressed into ARM instructions andDecompressed into ARM instructions and ExecutedExecuted

Lower performance (ARM 40% faster)Lower performance (ARM 40% faster)

Higher density (THUMB saves 30% space)Higher density (THUMB saves 30% space)

Optimal – Optimal – ““interworkinginterworking” ” (combining two sets) – (combining two sets) – compiler compiler supportedsupported

4040

THUMB Instruction Set (2)THUMB Instruction Set (2)

More traditional:More traditional: No condition codesNo condition codes Two-address data processing instructionsTwo-address data processing instructions

Access to R0 – R8 restricted toAccess to R0 – R8 restricted to MOVMOV, , ADDADD, , CMPCMP

PUSH/POP for stack manipulationPUSH/POP for stack manipulation Descending stack (SP hardwired to R13)Descending stack (SP hardwired to R13)

4141

THUMB Instruction Set (3)THUMB Instruction Set (3)

No No MSR MSR and and MRSMRS, , must change to ARM to modify CPSR must change to ARM to modify CPSR (change using (change using BX BX or or BLXBLX))

ARM entered automatically after RESET ARM entered automatically after RESET or entering exception modeor entering exception mode

Maximum 255 SWI callsMaximum 255 SWI calls

4242

The Next StepThe Next Step

New ARM Cortex family of processors New NEON™ media and signal

processing extensions Thumb®-2 blended 16/32-bit instruction set

for performance and low power Improved Interrupt handling

4343

SummarySummary

Adoption of ARM technology has increased efficiency and lowered costs

ARM is the world’s leading architecture today 3 billion ARM Powered chips and counting

arm

Documents

thumb instruction set

data processing instructions

arm instruction set

bit instruction set

arm architecture

bit result

bit result 64

arm registers